State-dependent doubly weighted stochastic simulation algorithm for automatic characterization of stochastic biochemical rare events

Min K Roh; Bernie J Daigle, Jr; Dan T Gillespie; Linda R Petzold

doi:10.1063/1.3668100

2011 Dec 20;135(23):234108. doi: 10.1063/1.3668100

PMCID: PMC3264419 PMID: 22191865

Abstract

In recent years there has been substantial growth in the development of algorithms for characterizing rare events in stochastic biochemical systems. Two such algorithms, the state-dependent weighted stochastic simulation algorithm (swSSA) and the doubly weighted SSA (dwSSA), are extensions of the weighted SSA (wSSA) by H. Kuwahara and I. Mura [J. Chem. Phys. 129, 165101 (2008); doi:10.1063/1.2987701]. The swSSA substantially reduces estimator variance by implementing system state-dependent importance sampling (IS) parameters, but lacks an automatic parameter identification strategy. In contrast, the dwSSA provides for the automatic determination of IS parameters, but because these parameters are state-independent, it is inefficient for systems whose states vary widely in time. We present a novel modification of the dwSSA, the state-dependent doubly weighted SSA (sdwSSA), that combines the strengths of the swSSA and the dwSSA without inheriting their weaknesses. The sdwSSA automatically computes state-dependent IS parameters via the multilevel cross-entropy method. We apply the method to three examples: a reversible isomerization process, a yeast polarization model, and a lac operon model. Our results demonstrate that the sdwSSA offers substantial improvements over previous methods in terms of both accuracy and efficiency.

INTRODUCTION

Stochasticity plays an important role in many biological processes. The significance of this role is particularly evident when stochastic behavior gives rise to biochemical rare events. These events are often accompanied by profound consequences to the underlying system. For example, a recent study indicates that rare mutations in stem cells can cause blood disorders such as chronic myeloid leukemia and paroxysmal nocturnal hemoglobinuria.¹ Because rare events can require a very long time period to be observed in a natural setting, conducting in vivo or in vitro experiments may not be feasible. As an alternative, properly formulated in silico simulations can provide essential information.

One popular mathematical approach for characterizing biochemical rare events utilizes Monte Carlo methods such as the stochastic simulation algorithm (SSA).² Although the SSA is straightforward to implement, it is not efficient for simulating a rare event, as a huge number of realizations will be required to witness a single occurrence. This inefficiency can be overcome with importance sampling (IS),³ a general technique that uses an alternate distribution to estimate a distribution of interest. In 2008, Kuwahara and Mura developed the weighted stochastic simulation algorithm (wSSA),⁴ which incorporated IS into the SSA for efficient characterization of rare events. The wSSA is designed to determine the probability that, given an initial state, the system reaches a state in a prescribed rare event set before the final simulation time. The wSSA alters the underlying probability distribution of reaction firings at each time step such that the system is artificially shifted toward the desired rare event. The bias introduced in this process is then corrected using a likelihood ratio between the original probability mass function (PMF) and the altered PMF, yielding an unbiased estimator. Although the wSSA can be efficient, its efficiency as well as accuracy largely depend on the choice of IS parameters. In particular, a poorly chosen set of IS parameters can generate a wSSA estimate that is far less accurate than the SSA estimate. For systems where favorable values of IS parameters are unknown, users must adopt the costly trial and test method. The large computational burden imposed by this method limits the size of systems that can be interrogated with the wSSA.

This issue was resolved with the recent development of the doubly weighted SSA (dwSSA).⁵ When combined with the multilevel cross-entropy (CE) method,⁶ the dwSSA automatically discovers IS parameter values that yield low-variance rare event probability estimates for any given system. Use of the dwSSA leads to a dramatic reduction in computation time compared to both the original SSA and the wSSA.⁵ Despite this favorable performance, there is still room for improvement, as the dwSSA employs constant (i.e., system state-independent) IS parameters. In contrast, systems whose species populations vary widely in time require state-dependent IS parameters to yield the lowest possible estimator variance. The state-dependent wSSA (swSSA) (Ref. ⁷) handles variation in the system state by computing IS parameters that depend on the relative propensity of each reaction at the current time step. However, like the wSSA, the swSSA does not offer an efficient strategy for choosing optimal IS parameters.

In this paper we present a novel modification of the dwSSA: the state-dependent doubly weighted stochastic simulation algorithm (sdwSSA), which automatically and efficiently computes state-dependent IS parameters with minimal input from the user. Our presentation is structured as follows: Section 2 provides background on rare event probability estimation using Monte Carlo simulation. Section 3 describes the detailed algorithm of the sdwSSA. In Sec. 4, we apply the sdwSSA to three examples of increasing complexity. Finally in Sec. 5, we summarize our contributions and discuss future areas of research.

BACKGROUND

Rare event probabilities and the SSA

Here, we give a brief review of the SSA and define the class of rare events examined in this paper. We begin by assuming a well-stirred chemical system whose N species populations at time t are represented as $X (t)$ . The system evolves in time by firing M reactions {R₁, …, R_M}, whose propensities at time t are in the set ${a_{j} (X (t)) : j = 1, ..., M}$ , with sum $a_{0} (X (t))$ . Starting at the initial state $x_{0}$ , the “direct method” implementation of the SSA chooses the time to the next reaction τ and the index of the next reaction j^′ as exponential (with mean $1 / a_{0} (x)$ ) and categorical (with probabilities $a_{j} (x) / a_{0} (x)$ ) random variables, respectively. These random draws repeat until some stopping time T, which we define as the smaller of the first time to reach the rare event and the final simulation time (which will later be denoted by t). Given $x_{0}$ , the probability of a single SSA trajectory $J \equiv (τ_{1}, j_{1}^{'}, ..., τ_{N_{T}}, j_{N_{T}}^{'})$ , with the notation τ_i denoting that the dwell time preceding the ith reaction firing lies within dτ of τ_i, can be expressed as follows:

\begin{matrix} P_{SSA} (J) & = & \prod_{i = 1}^{N_{T}} [a_{0} (X (t_{i})) e^{- a_{0} (X (t_{i})) τ_{i}} d τ \times \frac{a_{j_{i}^{'}} (X (t_{i}))}{a_{0} (X (t_{i}))}] \\ = & \prod_{i = 1}^{N_{T}} [a_{j_{i}^{'}} (X (t_{i})) e^{- a_{0} (X (t_{i})) τ_{i}} d τ], \end{matrix}

(1)

where $t_{i} \equiv \sum_{j = 1}^{i} τ_{j}$ and $N_{T}$ is the total number of reactions that fire in the interval $[0, T]$ .

In this paper we are interested in estimating rare event probabilities of the form $p (x_{0}, E; t)$ , defined as the probability that given the initial state $x_{0}$ , the system reaches any state in the rare event set E at least once before time t. The Monte Carlo estimator for this probability using the SSA is given by

{\hat{p}}_{SSA} (x_{0}, E; t) = \frac{1}{K} \sum_{k = 1}^{K} [I_{\{S (J_{k}) \cap E\}}],

(2)

where K is the total number of trajectories, $J_{k}$ is the kth SSA trajectory simulated over time interval $[0, T]$ , and $I_{\{S (J_{k}) \cap E\}}$ is an indicator function that takes a value of 1 if any state in E is visited by $J_{k}$ , and 0 otherwise. We note that the quantity in Eq. 2 is equivalent to the total number of trajectories that reached a state in E by time t, divided by K.
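As a minimal sketch of Eqs. 1 and 2, the following Python code runs the direct method on the two-reaction isomerization system A ⇌ B that is used as the first example in this paper, and counts the fraction of trajectories that reach a threshold population of B. The function names, the `seed` argument, and the default arguments are illustrative assumptions, not part of the original work.

```python
import math
import random

def ssa_hits_threshold(rng, theta, t_final, k1=0.12, k2=1.0, x0=(100, 0)):
    """Run one direct-method SSA trajectory of A <-> B.

    Returns True if the population of B reaches theta before t_final.
    """
    a_count, b_count = x0
    t = 0.0
    while t <= t_final:
        if b_count >= theta:              # state is in the event set E
            return True
        a1, a2 = k1 * a_count, k2 * b_count
        a0 = a1 + a2                      # total propensity
        if a0 == 0.0:
            return False                  # no reaction can fire
        # exponential waiting time with mean 1 / a0
        tau = math.log(1.0 / (1.0 - rng.random())) / a0
        # categorical choice with probabilities a_j / a0
        if rng.random() * a0 < a1:
            a_count, b_count = a_count - 1, b_count + 1   # A -> B
        else:
            a_count, b_count = a_count + 1, b_count - 1   # B -> A
        t += tau
    return False

def ssa_estimate(theta, t_final, K, seed=0):
    """Monte Carlo estimator of Eq. 2: fraction of K trajectories reaching E."""
    rng = random.Random(seed)
    hits = sum(ssa_hits_threshold(rng, theta, t_final) for _ in range(K))
    return hits / K
```

For a threshold that is reached often (for example `ssa_estimate(5, 10.0, 200)`), the estimator behaves well. For a genuinely rare threshold, almost every trajectory contributes 0, which is the inefficiency that motivates importance sampling.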

Doubly weighted SSA

The doubly weighted SSA (dwSSA) as presented in Ref. 5 uses IS to bias both reaction selection and time to the next reaction. The system state under the dwSSA evolves in time according to predilection functions given by

b_{j} (X (t)) \equiv γ_{j} a_{j} (X (t)), b_{0} (X (t)) = \sum_{j = 1}^{M} b_{j} (X (t)),

(3)

where each γ_j is a positive constant. The next reaction index j^′ is chosen using the set of predilection functions, and τ becomes an exponential random variable with mean $1 / b_{0} (X (t))$ . Thus, the probability of a system trajectory J under the dwSSA is given by

\begin{matrix} P_{dwSSA} (J) & = & \prod_{i = 1}^{N_{T}} [b_{0} (X (t_{i})) e^{- b_{0} (X (t_{i})) τ_{i}} d τ \times \frac{b_{j_{i}^{'}} (X (t_{i}))}{b_{0} (X (t_{i}))}] \\ = & \prod_{i = 1}^{N_{T}} [b_{j_{i}^{'}} (X (t_{i})) e^{- b_{0} (X (t_{i})) τ_{i}} d τ] . \end{matrix}

(4)

The bias that was introduced by the predilection functions can be corrected by multiplying Eq. 4 by the following weight:

\begin{matrix} W_{dwSSA} (J) & = & \prod_{i = 1}^{N_{T}} [\frac{a_{j_{i}^{'}} (X (t_{i})) e^{- a_{0} (X (t_{i})) τ_{i}}}{b_{j_{i}^{'}} (X (t_{i})) e^{- b_{0} (X (t_{i})) τ_{i}}}] \\ = & \prod_{i = 1}^{N_{T}} [\exp \{(b_{0} (X (t_{i})) - a_{0} (X (t_{i}))) τ_{i}\} \times {(γ_{j_{i}^{'}})}^{- 1}] . \end{matrix}

(5)
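The two lines of Eq. 5 can be checked numerically. The sketch below computes the dwSSA weight of a recorded trajectory in both forms; the trajectory representation (a list of `(j, tau, propensities)` tuples with a 0-based reaction index) and the function names are assumptions made for illustration.

```python
import math

def dwssa_weight(steps, gamma):
    """Second line of Eq. 5, accumulated in log space.

    steps: list of (j, tau, a), where j is the 0-based index of the reaction
    that fired, tau is the preceding dwell time, and a is the list of
    propensities a_1..a_M evaluated in the state before the firing.
    gamma: list of positive constants gamma_1..gamma_M.
    """
    log_w = 0.0
    for j, tau, a in steps:
        a0 = sum(a)
        b0 = sum(g * aj for g, aj in zip(gamma, a))
        log_w += (b0 - a0) * tau - math.log(gamma[j])
    return math.exp(log_w)

def dwssa_weight_ratio_form(steps, gamma):
    """First line of Eq. 5: product of SSA / dwSSA step probabilities."""
    w = 1.0
    for j, tau, a in steps:
        a0 = sum(a)
        b = [g * aj for g, aj in zip(gamma, a)]
        b0 = sum(b)
        w *= (a[j] * math.exp(-a0 * tau)) / (b[j] * math.exp(-b0 * tau))
    return w
```

Accumulating the logarithm is a design choice made here to reduce the risk of underflow on long trajectories; it is not prescribed by the source. With all γ_j equal to 1, both functions return 1, which recovers the SSA.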

It was shown in Ref. 5 that the multilevel CE method of Rubinstein and Kroese⁶ provides a closed-form solution in the dwSSA for each reaction's optimal parameter estimate:

{\hat{γ}}_{j}^{* (n)} = \frac{\sum_{k}^{'} (W_{dwSSA} (J_{k}^{(n - 1)}; {\hat{γ}}^{(n - 1)}) \times n_{k j})}{\sum_{k}^{'} (W_{dwSSA} (J_{k}^{(n - 1)}; {\hat{γ}}^{(n - 1)}) \times \sum_{i = 1}^{N_{T_{k}}} [a_{j} (X_{k}^{(n - 1)} (t_{k i})) τ_{k i}])} .

(6)

Here, ${\hat{γ}}^{(n - 1)}$ is the estimate of the optimal dwSSA biasing parameters in the (n − 1)th level of the multilevel CE method, $J_{k}^{(n - 1)}$ is the kth dwSSA trajectory parameterized with ${\hat{γ}}^{(n - 1)}$ , and n_kj is the total number of times reaction R_j fires in the kth trajectory. The index k in $\sum_{k}^{'}$ includes only the trajectories reaching the rare event. The multilevel CE method defines a series of intermediate “less rare” events and sequentially biases the system towards these events until the target rare event is reached. The process starts by simulating $K_{CE}$ trajectories of the system in the interval $[0, T]$ using the dwSSA with all parameters set to 1 (≡ SSA). We record the top $⌈ ρ K_{CE} ⌉$ trajectories (where ρ is typically ∼10⁻²) that evolve farthest in the direction of the set E, and we label those states reached by the $⌈ ρ K_{CE} ⌉$ recorded trajectories that are closest to E as $E_{0}$ . The set $E_{0}$ represents a “less rare” event, and we solve for the corresponding optimal dwSSA parameters ${\hat{γ}}^{(0)}$ . This process repeats n times, until the intermediate set of states $E_{n}$ is contained in the target rare event set E. We note that Eq. 6 represents one of M uncoupled equations from the final step of the multilevel algorithm in Ref. 5, where ${\hat{γ}}^{* (n)} \equiv {\hat{γ}}^{*}$ . In practice, each of these equations is solved at every level of the CE method until the final parameter estimates are obtained. For an intuitive explanation of why Eq. 6 works, we note that the numerator represents a weighted sum of the total number of times reaction R_j fires across the successful trajectories, whereas the denominator is a weighted sum of the expected total number of times reaction R_j will fire across those same trajectories. Reactions that are needed to fire more often than their average behavior to reach the rare event will thus acquire a ${\hat{γ}}_{j}^{*}$ greater than 1, whereas reactions needed to fire less often than average will acquire a ${\hat{γ}}_{j}^{*}$ less than 1.
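The update in Eq. 6 reduces to two weighted sums per reaction. The sketch below evaluates it from stored trajectories; the data layout, the function name `ce_update`, and the fallback value of 1 for a reaction with a zero denominator are assumptions introduced for illustration and do not come from Ref. 5.

```python
def ce_update(successful, M):
    """Evaluate Eq. 6 for all M reactions.

    successful: list of (weight, steps), one entry per trajectory that reached
    the (intermediate) rare event. weight is W_dwSSA for that trajectory and
    steps is a list of (j_fired, tau, a), with a 0-based reaction index,
    the dwell time, and the propensities a_1..a_M before the firing.
    """
    num = [0.0] * M   # weighted firing counts: sum_k W_k * n_kj
    den = [0.0] * M   # weighted integrated propensities: sum_k W_k * sum_i a_j * tau
    for weight, steps in successful:
        for j_fired, tau, a in steps:
            num[j_fired] += weight
            for j in range(M):
                den[j] += weight * a[j] * tau
    # Assumption: leave gamma_j at 1 when no information is available.
    return [num[j] / den[j] if den[j] > 0.0 else 1.0 for j in range(M)]
```

A reaction that fires more often along successful trajectories than its integrated propensity predicts gets a value above 1, in line with the interpretation given above.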

SDWSSA FORMULATION AND THE MULTILEVEL CROSS-ENTROPY METHOD

If our goal is simply to transform γ_j in Eq. 3 into a state-dependent IS parameter, the number of possible transformations is infinite. However, it is important that any state-dependent biasing scheme be computationally inexpensive and, more importantly, allow for a closed-form solution for the IS parameters when combined with the cross-entropy method. In Ref. 5, the authors showed that although the wSSA predilection function was simple to compute, its formulation did not give rise to a closed-form solution for the wSSA parameters. In Secs. 3A and 3B, we present a novel state-dependent importance sampling scheme whose IS parameters are easily computable in closed form. Section 3A introduces the sdwSSA and describes its state-dependent IS strategy. Section 3B integrates the sdwSSA into the cross-entropy framework and derives a closed-form solution for the optimal state-dependent IS parameters.

State-dependent doubly weighted SSA

It is well known that the optimal importance biasing scheme for any IS problem is state-dependent. In typical biochemical systems, molecular populations of species change constantly during a simulation as reactions fire. If only one IS parameter is used for each reaction regardless of the system state, then γ_j is a positive constant that is multiplied by $a_{j} (X (t))$ at every time step. In this case, the best choice for γ_j would be a value that perturbs the jth reaction by the “right amount” for “most of the visited states,” which is the precise strategy used by the dwSSA. For a reaction R_j whose propensity changes substantially as the system evolves, this strategy will under- and over-perturb R_j when its propensity takes on values that are much larger or smaller than average. We can improve such sub-optimal biasing with a properly formulated state-dependent biasing scheme.

The most obvious approach for making γ_j state-dependent is to set it to a time-varying function of the system state, i.e., $γ_{j} (X (t))$ . However, this is not a good formulation for two reasons. First, closed-form expressions for $γ_{j}^{*}$ are not available when γ_j is a continuous function. Second, the system state alone is not sufficient to determine the amount of perturbation each reaction requires. Many different configurations of the state vector can yield the same propensity value for a reaction involving more than one species, and the possible number of states may be infinite. A better form for γ_j would be a discrete function that depends on the relative propensity, $a_{j} (X (t)) / a_{0} (X (t))$ , which corresponds to the likelihood of choosing R_j as the next reaction to fire. For simplicity, we assign a new variable $π_{j} (X (t))$ to denote the jth relative propensity, i.e., $π_{j} (X (t)) \equiv a_{j} (X (t)) / a_{0} (X (t))$ .

In short, our proposed state-dependent biasing scheme discretizes $π_{j} (X (t))$ into non-overlapping bins, and we estimate a constant biasing parameter for each bin. Thus, compared to the dwSSA, the sdwSSA requires additional biasing parameters for each reaction as well as bin discretization end points for each reaction. Specifically, the sdwSSA predilection function is given by

\begin{matrix} b_{j} (X (t)) \equiv γ_{j} (π_{j} (X (t))) \times a_{j} (X (t)), \\ b_{0} (X (t)) = \sum_{j = 1}^{M} b_{j} (X (t)), \end{matrix}

(7)

where γ_j(π_j) is a step function defined by

γ_{j} (π_{j}) = \begin{cases} γ_{j 1}, & \text{if } π_{j} \leq c_{j}^{1} \\ γ_{j 2}, & \text{if } c_{j}^{1} < π_{j} \leq c_{j}^{2} \\ \vdots & \vdots \\ γ_{j β_{j}}, & \text{if } c_{j}^{β_{j} - 1} < π_{j} \end{cases} \qquad 0 < c_{j}^{1} < \dots < c_{j}^{β_{j} - 1} < 1 .

(8)

Here, β_j is the total number of bins used to discretize π_j. The number of end points required for β_j bins is β_j + 1, and the minimum and maximum possible values of any relative propensity are 0 and 1, respectively. Therefore, we need only (β_j + 1 − 2) = β_j − 1 end points, which are denoted by $\{c_{j}^{1}, \dots, c_{j}^{β_{j} - 1}\}$ in Eq. 8. These end points are defined by the following process. During the nth round of CE simulations, we determine the relative propensity range of each reaction parameterized with ${\hat{γ}}_{j}^{(n - 1)}$ . After completing these simulations, we divide each relative propensity π_j into β_max bins, where β_max is a global variable denoting the maximum number of bins for any reaction. In the next round of CE simulations, we record the number of reaction firings in each bin. We identify bins with fewer than κ_min firings, where κ_min is the minimum number required for each bin. We then repeatedly merge each of these bins with the adjacent bin having the fewest firings until the aggregate bin has at least κ_min firings. Upon completion, we update the coordinates and numbers of reaction firings in all merged bins. Once we have completed merging, the end points of β_j bins are given by $\{0, c_{j}^{1}, \dots, c_{j}^{β_{j} - 1}, 1\}$ . For convenience, we let c denote a list of vectors $\{[c_{1}^{1}, \dots, c_{1}^{β_{1} - 1}], \dots, [c_{M}^{1}, \dots, c_{M}^{β_{M} - 1}]\}$ .
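The step function of Eq. 8 and the predilection functions of Eq. 7 amount to a sorted-list lookup followed by a multiplication. The sketch below is one way to write this in Python; the function names and the list-of-lists layout for γ and c are illustrative assumptions.

```python
from bisect import bisect_left

def gamma_step(pi_j, gammas_j, c_j):
    """Evaluate gamma_j(pi_j) as in Eq. 8.

    gammas_j: [gamma_j1, ..., gamma_j,beta_j]
    c_j:      interior end points [c_j^1, ..., c_j^(beta_j - 1)], sorted.
    bisect_left returns the number of end points strictly below pi_j, so a
    value equal to an end point falls in the lower bin, matching the
    "c^(r-1) < pi <= c^r" convention of Eq. 8.
    """
    return gammas_j[bisect_left(c_j, pi_j)]

def predilections(a, gammas, c):
    """Return (b, b0) from Eq. 7 for propensities a (requires sum(a) > 0)."""
    a0 = sum(a)
    b = [gamma_step(aj / a0, g_j, c_j) * aj for aj, g_j, c_j in zip(a, gammas, c)]
    return b, sum(b)
```

With a single bin per reaction (`c_j = []`), the lookup always returns the one available parameter, which is the dwSSA case.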

As with the dwSSA, the sdwSSA biases both the index of the next reaction and the time to that reaction. Using the biasing scheme in Eqs. 7, 8, the probability of the reaction trajectory in Eq. 1 under the sdwSSA is given by

\begin{matrix} P_{sdwSSA} (J) & = \prod_{i = 1}^{N_{T}} [b_{j_{i}^{'}} (X (t_{i})) e^{- b_{0} (X (t_{i})) τ_{i}} d τ] \end{matrix}

(9a)

\begin{matrix} = \prod_{i = 1}^{N_{T}} [(γ_{j_{i}^{'}} (π_{j_{i}^{'}} (X (t_{i}))) \times a_{j_{i}^{'}} (X (t_{i}))) \\ \times e^{- (\sum_{j = 1}^{M} γ_{j} (π_{j} (X (t_{i}))) \times a_{j} (X (t_{i}))) τ_{i}} d τ] . \end{matrix}

(9b)

The sdwSSA predilection function in Eq. 9a has been written in detail in Eq. 9b, to emphasize that it depends on the relative propensity of reactions at each time step. Similarly, the trajectory weight to correct the bias has the same dependence:

\begin{matrix} W_{sdwSSA} (J) & = & \prod_{i = 1}^{N_{T}} [\frac{a_{j_{i}^{'}} (X (t_{i})) e^{- a_{0} (X (t_{i})) τ_{i}}}{(γ_{j_{i}^{'}} (π_{j_{i}^{'}} (X (t_{i}))) \times a_{j_{i}^{'}} (X (t_{i}))) e^{- (\sum_{j = 1}^{M} γ_{j} (π_{j} (X (t_{i}))) \times a_{j} (X (t_{i}))) τ_{i}}}] . \end{matrix}

(10)

The product of Eqs. 9a and 10 equals the probability in Eq. 1, which is the probability of an unbiased SSA trajectory. Thus, the sdwSSA weight is the ratio of the SSA trajectory probability to the sdwSSA trajectory probability. As illustrated in Sec. 3B, the value of the biasing scheme of Eqs. 7 and 8 is that it gives rise to closed-form solutions for all $\sum_{j = 1}^{M} β_{j}$ IS parameters. In addition, the biasing scheme imposes very little computational overhead: only the original reaction propensity is required to determine the degree of perturbation in Eq. 8.
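Cancelling the common propensity factor in Eq. 10 gives a form that parallels the second line of Eq. 5. The following simplification uses only Eqs. 7 and 10:

```latex
W_{sdwSSA} (J) = \prod_{i = 1}^{N_{T}} \left[ \frac{\exp \{ (b_{0} (X (t_{i})) - a_{0} (X (t_{i}))) τ_{i} \}}{γ_{j_{i}^{'}} (π_{j_{i}^{'}} (X (t_{i})))} \right],
\qquad
b_{0} (X (t_{i})) = \sum_{j = 1}^{M} γ_{j} (π_{j} (X (t_{i}))) \, a_{j} (X (t_{i})) .
```

Each factor depends on the state only through the propensities and the bin into which each relative propensity falls, so the weight can be accumulated one reaction firing at a time.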

The sdwSSA and the cross-entropy method

The derivation of a closed-form solution for the sdwSSA parameter values can be obtained in a similar manner to that used for the dwSSA. Following the same logic as in Ref. 5, we derive the following system of $\sum_{j = 1}^{M} β_{j}$ equations:

\sum_{k = 1}^{K} [I_{\{S (J_{k}^{(0)}) \cap E\}} \times W_{sdwSSA} (J_{k}^{(0)}; γ^{(0)}) \times \underset{γ}{\nabla} \ln P_{sdwSSA} (J_{k}^{(0)}; \hat{γ^{*}})] = 0,

(11)

where $\hat{γ^{*}} = \{[{\hat{γ}}_{11}^{*}, ..., {\hat{γ}}_{1 β_{1}}^{*}], ..., [{\hat{γ}}_{M 1}^{*}, ..., {\hat{γ}}_{M β_{M}}^{*}]\}$ and $J_{k}^{(0)}$ is the kth sdwSSA trajectory parameterized with $γ^{(0)}$ . Here we note that γ cannot necessarily be expressed as a matrix, because each reaction can have a different value for β_j.

Next, we substitute Eq. 9a into Eq. 11 to obtain

\begin{matrix} 0 & = & \sum_{k = 1}^{K} [I \times W \times \underset{γ}{\nabla} \ln (\prod_{i = 1}^{N_{T_{k}}} [b_{j_{k i}^{'}} (X_{k} (t_{k i})) e^{- b_{0} (X_{k} (t_{k i})) τ_{k i}} d τ])] \\ = & \sum_{k = 1}^{K} [I \times W \times \underset{γ}{\nabla} \ln (\prod_{i = 1}^{N_{T_{k}}} [{\hat{γ}}_{j_{k i}^{'}}^{*} (X_{k} (t_{k i})) a_{j_{k i}^{'}} (X_{k} (t_{k i})) \\ \times \exp \{- τ_{k i} \sum_{j = 1}^{M} [{\hat{γ}}_{j}^{*} (X_{k} (t_{k i})) a_{j} (X_{k} (t_{k i}))]\} d τ])], \end{matrix}

(12)

where the subscripts in the first two factors inside the summation have been removed. Upon taking the logarithm, collecting terms not depending on $\hat{γ^{*}}$ in C_ki, and simplifying, we obtain

\begin{matrix} 0 & = & \sum_{k = 1}^{K} [I \times W \times \underset{γ}{\nabla} (\sum_{i = 1}^{N_{T_{k}}} [\ln ({\hat{γ}}_{j_{k i}^{'}}^{*} (X_{k} (t_{k i}))) \\ - τ_{k i} \sum_{j = 1}^{M} [{\hat{γ}}_{j}^{*} (X_{k} (t_{k i})) a_{j} (X_{k} (t_{k i}))] + C_{k i}])] . \end{matrix}

(13)

After differentiation, we obtain a scalar version of Eq. 13 for all β_j bins of all M reactions, which leads to the following detailed closed-form expression for each optimal parameter estimate:

\begin{matrix} {\hat{γ}}_{j r}^{(n)} = \frac{\sum_{k}^{'} (W_{sdwSSA} (J_{k}^{(n - 1)}; {\hat{γ}}^{(n - 1)}) \times n_{k j r})}{\sum_{k}^{'} (W_{sdwSSA} (J_{k}^{(n - 1)}; {\hat{γ}}^{(n - 1)}) \times \sum_{i}^{'} a_{j} (X_{k}^{(n - 1)} (t_{k i})) τ_{k i})}, \\ j = 1, \dots, M, \quad r = 1, \dots, β_{j}, \end{matrix}

(14)

where n_kjr is the total number of times reaction j fires with ${\hat{γ}}_{j r}^{(n - 1)}$ as its IS parameter in the kth trajectory. We emphasize the similarity between Eqs. 6, 14. In both expressions, the summation operator $\sum_{k}^{'}$ includes only those trajectories that have reached the rare event; in the latter expression, $\sum_{i}^{'}$ also includes only those time steps where reaction R_j fired at time t_ki, with ${\hat{γ}}_{j r}^{(n - 1)}$ as its IS parameter. Equation 14 represents one of $\sum_{j = 1}^{M} β_{j}$ uncoupled equations from the final step of the multilevel CE algorithm, where ${\hat{γ}}^{(n)} \equiv {\hat{γ}}^{*}$ . We note that the numerator in Eq. 14 represents a weighted sum of the total number of times R_j fires with ${\hat{γ}}_{j r}^{(n - 1)}$ as its IS parameter over the trajectories that reached the rare event. Similarly, the denominator is a weighted sum of the expected total number of times R_j will fire with ${\hat{γ}}_{j r}^{(n - 1)}$ as its IS parameter across those same trajectories. Therefore, ${\hat{γ}}_{j r}^{*}$ will be greater than 1 if the rth IS parameter for R_j is chosen more often than on average to reach the rare event and less than 1 otherwise.

An important consideration when using the multilevel CE method is that the system reaches a rare event after passing through multiple rounds of “less rare” events. The relative propensity range spanned by a reaction upon reaching the first less rare event may differ greatly from the one spanned by the same reaction after reaching the final (i.e., target) rare event. We have resolved this difficulty by dynamically measuring the relative propensity range and adjusting bin sizes at every step of the multilevel CE method.

The entire process for rare event characterization with the sdwSSA and the multilevel cross-entropy method can be compactly described using the following two algorithms: core sdwSSA and CE-DB (cross entropy-dynamic binning). Lines 5, 14, 16 (core sdwSSA) and 1, 7, 10, 12, 16, 17, 19, 21 (CE-DB) contain instructions that are specific to the sdwSSA (i.e., that differ from dwSSA). First, we learn the optimal state-dependent biasing parameters γ and bin end points c using CE-DB, which calls the core sdwSSA in each level of the multilevel cross-entropy method with $K_{CE}$ trajectories. Once the CE-DB returns γ and c, they are substituted into the core sdwSSA along with K (the total number of realizations) to estimate the rare event probability $p (x_{0}, E; t)$ . In our experience, the number of realizations K required to accurately compute ${\hat{p}}_{sdwSSA} (x_{0}, E; t)$ is much greater than $K_{CE}$ required to accurately compute ${\hat{γ}}_{sdwSSA}^{*}$ . In core sdwSSA, $ν_{j}$ and t_f represent the state change vector for R_j and the simulation end time, respectively.

Algorithm: Core sdwSSA

Input: K, γ, and c

1: m_K ← 0
2: for k = 1 to K do
3:   t ← 0, $x \leftarrow x_{0}$ , w ← 1
4:   evaluate all $a_{j} (x)$ and calculate $a_{0} (x)$
5:   evaluate all $γ_{j} (π_{j} (x))$ and $b_{j} (x)$ ; calculate $b_{0} (x)$
6:   while t ⩽ t_f do
7:     if $x \in E$ then
8:       m_K ← m_K + w
9:       break out of the while loop
10:     end if
11:     generate two unit-interval uniform random numbers r₁ and r₂
12:     $τ \leftarrow {b_{0}}^{- 1} (x) \ln (1 / r_{1})$
13:     j ← smallest integer satisfying $\sum_{i = 1}^{j} b_{i} (x) \geq r_{2} b_{0} (x)$
14:     $w \leftarrow w \times {(γ_{j} (π_{j} (x)))}^{- 1} \times \exp \{(b_{0} (x) - a_{0} (x)) τ\}$
15:     t ← t + τ, $x \leftarrow x + ν_{j}$
16:     update all $a_{j} (x)$ and $a_{0} (x)$ ; recalculate $γ_{j} (π_{j} (x))$ , $b_{j} (x)$ , and $b_{0} (x)$
17:   end while
18: end for
19: return ${\hat{p}}_{sdwSSA} (x_{0}, E; t) = m_{K} / K$
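The core sdwSSA can be sketched in a few dozen lines of Python. The version below is hard-wired to the reversible isomerization system A ⇌ B from the first example and is not the authors' implementation (the source states that their code is available upon request). The function name, the default arguments, the `seed` parameter, and the list-of-lists layout for γ and c are assumptions.

```python
import math
import random
from bisect import bisect_left

def core_sdwssa(K, gammas, c, theta=30, t_final=10.0,
                k1=0.12, k2=1.0, x0=(100, 0), seed=0):
    """Estimate p(x0, theta; t_final) for A <-> B with state-dependent biasing.

    gammas[j] holds the IS parameters of reaction j (one per bin) and c[j]
    the sorted interior bin end points of its relative propensity.
    """
    rng = random.Random(seed)
    nu = [(-1, 1), (1, -1)]        # state-change vectors: A -> B, B -> A
    m_K = 0.0
    for _ in range(K):
        t, x, w = 0.0, list(x0), 1.0
        while t <= t_final:
            if x[1] >= theta:      # x is in the rare event set E
                m_K += w
                break
            a = [k1 * x[0], k2 * x[1]]
            a0 = sum(a)            # positive here because A + B is conserved
            # state-dependent IS parameters, looked up by relative propensity
            g = [gammas[j][bisect_left(c[j], a[j] / a0)] for j in range(2)]
            b = [g[j] * a[j] for j in range(2)]
            b0 = sum(b)
            tau = math.log(1.0 / (1.0 - rng.random())) / b0
            j = 0 if rng.random() * b0 < b[0] else 1
            # likelihood-ratio correction for this step
            w *= math.exp((b0 - a0) * tau) / g[j]
            t += tau
            x[0] += nu[j][0]
            x[1] += nu[j][1]
    return m_K / K
```

Passing `gammas=[[1.0], [1.0]]` and `c=[[], []]` disables the biasing, so every weight stays at 1 and the function reduces to the SSA estimator.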

Algorithm: CE-DB (cross entropy-dynamic binning)

Input: $K_{CE}$ , ρ, β_max, and κ_min

1: $γ \leftarrow \{v_{1}, \dots, v_{M}\}$ , where v_j is a vector of 1s with length β_max
2: i ← −1
3: repeat
4:   i ← i + 1
5:   while running core sdwSSA with $K = K_{CE}$
6:     mark the $⌈ ρ K_{CE} ⌉$ trajectories evolving farthest in the direction of E
7:     record min(π_j) and max(π_j)
8:   end while
9:   $E_{i} \leftarrow$ at most $⌈ ρ K_{CE} ⌉$ states closest to E reached by the marked trajectories (one per trajectory)
10:   $c_{j}^{r} \leftarrow min (π_{j}) + \frac{max (π_{j}) - min (π_{j})}{β_{max}} \times r, r \in \{1, \dots, (β_{max} - 1)\}$
11:   while running core sdwSSA with $K = K_{CE}$
12:     record the number of firings among β_max bins for each reaction
13:     store information of trajectories that reach $E_{i}$
14:   end while
15:   for j = 1 → M do
16:     merge bins until every bin contains greater than κ_min reaction firings
17:     update $\{c_{j}^{1}, \dots, c_{j}^{β_{j} - 1}\}$ according to the result from step 16
18:   end for
19:   $γ \leftarrow$ result of Eq. 14 evaluated using $E_{i}$ and trajectories from step 13
20: until $E_{i} \subseteq E$
21: return $\hat{γ^{*}} = γ$ and c

The two input parameters in the CE-DB, β_max and κ_min, are specific to the sdwSSA and control the dynamic binning strategy. Based on our experience, we recommend setting β_max = 10 and κ_min = 20. In Sec. 4, we test the sensitivity of the rare event estimate with respect to these parameters to illustrate the robustness of the sdwSSA.
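The roles of β_max and κ_min can be illustrated with a short sketch of steps 10 and 16 of CE-DB. The source does not specify the order in which under-filled bins are processed, so the rule used below (always handle the lowest-indexed under-filled bin first, and merge it with whichever neighbor has fewer firings) is an assumption, as are the function names.

```python
def equal_width_endpoints(pi_min, pi_max, beta_max):
    """Step 10: interior end points of beta_max equal-width bins."""
    width = (pi_max - pi_min) / beta_max
    return [pi_min + width * r for r in range(1, beta_max)]

def merge_bins(endpoints, counts, kappa_min):
    """Step 16: merge bins until each has at least kappa_min firings.

    endpoints: interior end points (one fewer than the number of bins).
    counts:    number of recorded firings in each bin.
    Returns the updated (endpoints, counts).
    """
    endpoints, counts = list(endpoints), list(counts)
    while len(counts) > 1:
        low = [r for r, n in enumerate(counts) if n < kappa_min]
        if not low:
            break
        r = low[0]                                   # assumed processing order
        neighbors = [s for s in (r - 1, r + 1) if 0 <= s < len(counts)]
        s = min(neighbors, key=lambda idx: counts[idx])
        left = min(r, s)
        # fuse bins left and left + 1, dropping the boundary between them
        counts[left:left + 2] = [counts[left] + counts[left + 1]]
        del endpoints[left]
    return endpoints, counts
```

For example, four equal-width bins on [0, 1] with counts `[5, 30, 100, 8]` and κ_min = 20 collapse to two bins separated at 0.5. If the total number of firings is below κ_min, the procedure ends with a single bin, which corresponds to one state-independent parameter for that reaction.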

Once CE-DB completes and returns ${\hat{γ}}^{*}$ and c, we substitute these values into the core sdwSSA to obtain the estimate ${\hat{p}}_{sdwSSA} (x_{0}, E; t)$ . Thus the total complexity of the rare event characterization process is (2n × the complexity of the sdwSSA), where n is the number of steps taken by the CE-DB. All examples we tested used n ⩽ 4. For these examples, the time needed for the CE-DB was much less than the time required for the core sdwSSA.

Finally, we note that the above two algorithms are easily parallelized. Specifically, the K trajectories simulated in the core sdwSSA can be generated independently. In all examples below, we execute the core sdwSSA and the CE-DB using a parallel computing cluster. Source code for CE-DB and the core sdwSSA is available upon request.

EXAMPLES

We illustrate sdwSSA performance on the following three examples: a reversible isomerization process, a yeast polarization model, and a lac operon model. For each example, we compare the sdwSSA results and central processing unit (CPU) time with those of the dwSSA. Unless otherwise mentioned, we use the default parameter values listed in Table 1.

Table 1.

List of input parameters for the core sdwSSA and the CE-DB

Parameter	Default value	Description
K	10⁶	Number of realizations used to compute $\hat{p}$
$K_{CE}$	10⁵	Number of realizations used to compute ${\hat{γ}}^{*}$
ρ	0.01	Fraction of trajectories in CE-DB
β_max	10	Maximum number of reaction bins
κ_min	20	Minimum number of data points per bin

To ensure fair comparison, we computed the dwSSA estimate with the same value of $K_{CE}$ as with the sdwSSA. As the dwSSA required a larger value of K to exhibit comparable accuracy to the sdwSSA, we continued simulating dwSSA trajectories until the dwSSA estimate uncertainty was approximately equal to that of the sdwSSA. When possible, we also estimated rare event probabilities using the swSSA parameterized with the optimal parameter values given in Ref. 7. Finally, we computed a 68% confidence interval for every estimate using the method described in Ref. 8.
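The method of Ref. 8 is not reproduced in this section. Assuming it is the usual one-standard-error interval for a sample mean, which corresponds to roughly 68% coverage under a normal approximation, the estimate and its uncertainty could be computed from the per-trajectory contributions as sketched below. The function name and input format are illustrative.

```python
import math

def estimate_with_uncertainty(contributions):
    """Return (estimate, one-standard-error uncertainty).

    contributions: one value per simulated trajectory, equal to the trajectory
    weight if the rare event was reached and 0 otherwise.
    Assumption: uncertainty = sample standard deviation / sqrt(K).
    """
    K = len(contributions)
    m1 = sum(contributions) / K                      # first sample moment
    m2 = sum(v * v for v in contributions) / K       # second sample moment
    sigma = math.sqrt(max(m2 - m1 * m1, 0.0))
    return m1, sigma / math.sqrt(K)
```

Under this assumption, halving the uncertainty at fixed biasing parameters requires four times as many trajectories, which is why a method with lower per-trajectory variance can reach a target uncertainty with a much smaller K.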

To study the relationship between β_max and the accuracy of an sdwSSA estimate, we compared the uncertainties computed with different values of β_max ∈ {1, 2, …, 15}, where the sdwSSA estimate with β_max = 1 is equivalent to a dwSSA estimate. We performed a similar comparison with κ_min when applicable. All results were obtained by running in parallel on a 54 processor cluster (Intel Xeon 2.27 GHz).

Finally, we simplified our definition of a rare event in the following three examples by limiting the states of interest E to those governed by only a single species S. Specifically, we define a threshold species count θ^S above/below which the event occurs, rewriting $p (x_{0}, E; t)$ as $p (x_{0}, θ^{S}; t)$ .

Reversible isomerization

Our first example is taken from Ref. 7 and concerns isomers A and B that are interconverted according to the following two reactions:

\begin{matrix} A & \overset{k_{1}}{\to} & B, k_{1} = 0.12 \\ B & \overset{k_{2}}{\to} & A, k_{2} = 1 \end{matrix}

with $x_{0} = [100 0]$ .

We examine the rare event probability $p (x_{0}, θ^{B}; t) \equiv p (x_{0}, 30; 10)$ , the probability that the population of species B reaches 30 before time 10, given that the initial population is [100 0]. For this simple system the state space is finite, and it is possible to calculate the exact probability ( $p (x_{0}, 30; 10) = 1.191 \times 10^{- 5}$ ) by constructing a generator matrix⁹.
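Because the total population A + B is conserved, the state can be indexed by the population of B alone, and the generator matrix is tridiagonal with an absorbing state at B = 30. The source does not say which numerical method was applied to the generator matrix; the sketch below uses uniformization, one standard way to evaluate transient probabilities of a continuous-time Markov chain, and is written with the standard library only. The function name and the truncation rule for the Poisson sum are assumptions.

```python
import math

def exact_hit_probability(k1=0.12, k2=1.0, total=100, theta=30, t_final=10.0):
    """Probability that B reaches theta before t_final, starting from B = 0."""
    n = theta + 1                                   # states B = 0..theta
    up = [k1 * (total - b) for b in range(theta)]   # rate of A -> B in state b
    down = [k2 * b for b in range(theta)]           # rate of B -> A in state b
    lam = max(u + d for u, d in zip(up, down))      # uniformization rate
    dist = [0.0] * n
    dist[0] = 1.0                                   # initial state x0 = [100 0]
    weight = math.exp(-lam * t_final)               # Poisson probability of 0 jumps
    absorbed = weight * dist[theta]
    mean = lam * t_final
    n_terms = int(mean + 12.0 * math.sqrt(mean) + 20)   # assumed truncation point
    for m in range(1, n_terms + 1):
        new = [0.0] * n
        new[theta] = dist[theta]                    # theta is absorbing
        for b in range(theta):
            p = dist[b]
            if p == 0.0:
                continue
            new[b + 1] += p * up[b] / lam
            if b > 0:
                new[b - 1] += p * down[b] / lam
            new[b] += p * (1.0 - (up[b] + down[b]) / lam)
        dist = new
        weight *= mean / m                          # Poisson probability of m jumps
        absorbed += weight * dist[theta]
    return absorbed
```

If the sketch is faithful to the model, `exact_hit_probability()` should land close to the exact value quoted above.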

The optimal parameters for the dwSSA and the sdwSSA were learned from running the dwSSA multilevel CE method and the CE-DB, respectively. The optimal parameters for the swSSA were taken from Ref. 7. Table 2 summarizes the results. For each method, the rare event probability estimate, estimate uncertainty, CE-DB run time, and total simulation time are displayed. The optimal set of biasing parameters employed in each method is listed in Table 3.

Table 2.

Results for the reversible isomerization model.

Method	$\hat{p} (x_{0}, θ^{B}; t)$	68% uncertainty	CE-DB time (s)	Total time (s)
swSSA	1.190 × 10⁻⁵	0.002 × 10⁻⁵	NA	27
dwSSA	1.190 × 10⁻⁵	0.002 × 10⁻⁵	36	1.29 × 10⁵
sdwSSA	1.193 × 10⁻⁵	0.002 × 10⁻⁵	22	30

Table 3.

Importance sampling parameters for the reversible isomerization process.

Method	Algorithmic parameters
swSSA	$γ_{1}^{max} = 20, ρ_{1}^{0} = 0.5$
dwSSA	${\hat{γ}}^{*} = [1.301, 0.719]$
sdwSSA	${\hat{γ}}^{*} = \{[2.69, 1.74, 1.26, 1.11, 1.06, 1.03, 1.02, 1.02, 1.06, 1.00], [1.01, 0.99, 0.97, 0.97, 0.95, 0.92, 0.83, 0.62, 0.40]\}$
	$c = \{[0.30, 0.38, 0.46, 0.54, 0.61, 0.69, 0.77, 0.85, 0.92], [0.15, 0.23, 0.31, 0.39, 0.46, 0.54, 0.62, 0.70]\}$

The total sdwSSA runtime is approximately equal to the runtime of the swSSA. Although the sdwSSA did not require any prior information to obtain the estimate in Table 2, the swSSA was given a window of parameter values that contained the optimal biasing parameter. Without this prior information, the number of trials to find the optimal swSSA biasing parameters (and thus the total simulation time) would be significantly greater. The dwSSA runtime to reach the accuracy of the sdwSSA estimate in Table 2 is about 4300 times greater than the sdwSSA runtime. Moreover, the sdwSSA required less time to estimate its optimal biasing parameters than the dwSSA: CE-DB utilized one fewer intermediate rare event than the dwSSA multilevel CE method. For this example, it is clear that a state-dependent biasing strategy significantly improved efficiency in estimating $p (x_{0}, 30; 10)$ .

The sdwSSA estimate in Table 2 was obtained with β_max = 10 and κ_min = 20. If we decrease β_max to 1, the sdwSSA becomes equivalent to the dwSSA regardless of the value used for κ_min. We therefore expected the accuracy of the sdwSSA estimate to increase with increasing β_max, with this trend ending only when the value of β_max leads to the creation and merging of superfluous bins. For this example, we recomputed sdwSSA estimate uncertainties (by re-running the CE-DB and the core sdwSSA) using values of β_max in {1, 2, …, 15}, keeping all other parameters the same as before. Figure 1 compares the results of four independent sdwSSA estimates at each value of β_max to the dwSSA and swSSA.

Figure 1.

Uncertainty vs β_max for the reversible isomerization process. For each β_max, four independent sdwSSA estimates are obtained. The two dotted red lines connect the minimum and maximum uncertainties at each β_max value. The blue line corresponds to the uncertainty obtained from K = 10⁶ swSSA simulations, parameterized with the optimal IS parameters listed in Table 3. The four green lines represent uncertainties of dwSSA estimates.

As expected, the sdwSSA uncertainty decreases with increasing β_max, although the four estimates for β_max ⩽ 6 exhibit high variability. This variability diminishes for larger β_max, indicating a more consistent discretization of relative propensities. As β_max reaches a value of 11, we see a reproducible increase in uncertainty. This increase is specific only to the reversible isomerization model (see examples below). We suspect that it is due to an artifact of our binning strategy. We note that the uncertainty at β_max = 11 is still an order of magnitude less than the lowest uncertainty achieved by dwSSA estimates. As we further increase β_max, the uncertainty again decreases toward that achieved by the swSSA.

Figure 1 does not show the effects of varying κ_min. Each isomerization reaction fires at least 80 000 times in a single simulation, so unless we increase κ_min to 200, the algorithm yields the same estimate as in Table 2 with κ_min = 20. We thus conclude that this rare event probability estimate is not sensitive to κ_min.

Yeast polarization

Next we consider the pheromone-induced G-protein cycle in Saccharomyces cerevisiae. This system is taken from Ref. 5 and consists of seven species x = [R L RL G G_a G_bg G_d], whose dynamics are represented by the following eight reactions:

\begin{matrix}
R_{1}: & \emptyset \overset{k_{1}}{\to} R & k_{1} = 0.0038 \\
R_{2}: & R \overset{k_{2}}{\to} \emptyset & k_{2} = 4.00 \times 10^{-4} \\
R_{3}: & L + R \overset{k_{3}}{\to} RL + L & k_{3} = 0.042 \\
R_{4}: & RL \overset{k_{4}}{\to} R & k_{4} = 0.010 \\
R_{5}: & RL + G \overset{k_{5}}{\to} G_{a} + G_{bg} & k_{5} = 0.011 \\
R_{6}: & G_{a} \overset{k_{6}}{\to} G_{d} & k_{6} = 0.100 \\
R_{7}: & G_{d} + G_{bg} \overset{k_{7}}{\to} G & k_{7} = 1.05 \times 10^{3} \\
R_{8}: & \emptyset \overset{k_{8}}{\to} RL & k_{8} = 3.21,
\end{matrix}

with $x_{0} = [50 2 0 50 0 0 0]$. For this system, we estimated the rare event probability $p (x_{0}, θ^{G_{bg}}; t) \equiv p (x_{0}, 40; 5)$, i.e., the probability that the population of G_bg reaches 40 before time 5. To accurately estimate this probability using any method, it is necessary to bias more than one reaction of the system. However, the tortuous trial and error procedure associated with optimizing two or more parameters of the swSSA prohibits us from estimating $p (x_{0}, θ^{G_{bg}}; t)$ with this method. The dwSSA and the sdwSSA estimates were obtained by first computing their respective optimal biasing parameters using the multilevel CE method with $K_{CE} = 10^{6}$, followed by simulating K = 10⁷ sdwSSA and 6 × 10⁸ dwSSA trajectories to estimate the rare event probability. Because of the high intrinsic stochasticity of this system, we increased the values of $K_{CE}$ and the sdwSSA K to ten times their default values. The simulation results and optimal biasing parameters are summarized in Tables 4 and 5, respectively.
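The reaction list and initial state above translate directly into a state-change table and a propensity function. The following is a minimal sketch of one unbiased direct-method SSA trajectory for this model. It is not the dwSSA or sdwSSA (no biasing or weights are applied). Mass-action propensities are assumed, and the names `propensities` and `ssa_reaches` are hypothetical helpers introduced for illustration.

```python
import random

# Species order follows the text: [R, L, RL, G, G_a, G_bg, G_d].
R, L, RL, G, GA, GBG, GD = range(7)

# Rate constants k1..k8 as listed for reactions R1..R8.
K = [0.0038, 4.00e-4, 0.042, 0.010, 0.011, 0.100, 1.05e3, 3.21]

# State-change vectors, one per reaction, in the species order above.
NU = [
    (1, 0, 0, 0, 0, 0, 0),     # R1: 0 -> R
    (-1, 0, 0, 0, 0, 0, 0),    # R2: R -> 0
    (-1, 0, 1, 0, 0, 0, 0),    # R3: L + R -> RL + L
    (1, 0, -1, 0, 0, 0, 0),    # R4: RL -> R
    (0, 0, -1, -1, 1, 1, 0),   # R5: RL + G -> G_a + G_bg
    (0, 0, 0, 0, -1, 0, 1),    # R6: G_a -> G_d
    (0, 0, 0, 1, 0, -1, -1),   # R7: G_d + G_bg -> G
    (0, 0, 1, 0, 0, 0, 0),     # R8: 0 -> RL
]


def propensities(x):
    """Mass-action propensities a_1..a_8 (assumed form) for state x."""
    return [
        K[0],
        K[1] * x[R],
        K[2] * x[L] * x[R],
        K[3] * x[RL],
        K[4] * x[RL] * x[G],
        K[5] * x[GA],
        K[6] * x[GD] * x[GBG],
        K[7],
    ]


def ssa_reaches(x0, threshold, t_final, rng):
    """Simulate one unbiased trajectory.

    Returns (hit, x) where hit is True if G_bg reached `threshold`
    before `t_final`, and x is the state when the simulation stopped.
    """
    x, t = list(x0), 0.0
    while True:
        if x[GBG] >= threshold:
            return True, x
        a = propensities(x)
        a0 = sum(a)  # always positive: R1 and R8 have constant propensities
        t += rng.expovariate(a0)
        if t > t_final:
            return False, x
        # Choose reaction j with probability a[j] / a0.
        u, acc, j = rng.random() * a0, 0.0, 0
        for j, aj in enumerate(a):
            acc += aj
            if u < acc:
                break
        x = [xi + d for xi, d in zip(x, NU[j])]


rng = random.Random(0)
hit, x_end = ssa_reaches([50, 2, 0, 50, 0, 0, 0], threshold=40, t_final=5.0, rng=rng)
print(hit, x_end)
```

Because the target probability is so small, an unbiased trajectory like this one will almost never reach the threshold, which is why the text turns to importance sampling. The state-change table also shows that G + G_bg and G + G_a + G_d are conserved, which is a convenient sanity check on any implementation.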

Table 4.

Results for the yeast polarization model.

Method	$\hat{p} (x_{0}, θ^{G_{bg}}; t)$	68% uncertainty	CE-DB time (s)	Total time (s)
dwSSA	1.058 × 10⁻¹¹	0.010 × 10⁻¹¹	51	3116
sdwSSA	1.082 × 10⁻¹¹	0.010 × 10⁻¹¹	51	132

Table 5.

Optimal IS parameters for the yeast polarization model.

j	${\hat{γ}}_{dwSSA}^{*}$	${\hat{γ}}_{sdwSSA}^{*}$	c
1	0.786	[0.729 1.216 0.971 0.889 2.137]	[0.000259 0.00031 0.000362 0.000414]
2	0.670	[0.939 0.766 0.790 0.988 0.418 0.778 0.151 0.945]	[0.000822 0.0011 0.00137 0.00164 0.00192 0.00219 0.00246]
3	1.800	[1.477 1.631 1.785 1.899 1.929 1.903 1.839 1.826 1.863]	[0.115 0.172 0.23 0.287 0.344 0.402 0.459 0.517]
4	0.692	[0.366 0.455 0.465 0.547 0.620 0.612 1.194]	[0.00571 0.00857 0.0114 0.0143 0.0171 0.0200]
5	1.687	[1.042 3.455 4.743 3.136 2.237 1.659 1.380 1.277 1.222 1.186]	[0.069 0.138 0.207 0.276 0.345 0.414 0.483 0.552 0.621]
6	0.250	[0.356 0.368 0.343 0.325 0.284 0.246 0.237 0.184 0.148 0.121]	[0.0398 0.0796 0.119 0.159 0.199 0.239 0.279 0.318 0.358]
7	0.987	[1.001]	NA
8	2.048	[2.064 2.072 2.029 1.796 2.413 2.446 2.443]	[0.175 0.219 0.262 0.306 0.35 0.394]

In Table 4, the simulation time the dwSSA needs to yield an estimate with accuracy similar to that of the sdwSSA is 24 times greater than the sdwSSA simulation time. When we compare ${\hat{γ}}_{dwSSA}^{*}$ to ${\hat{γ}}_{sdwSSA}^{*}$ in Table 5, the largest differences are observed in the parameters of R₄, R₅, and R₆ (ignoring the parameters of R₁ and R₂, which were previously shown to have a negligible influence on the accuracy of the probability estimate). The sdwSSA parameters of the other reactions (excluding R₁ and R₂) do not show appreciable variation between bins and are in general agreement with the optimal dwSSA parameters. We conclude that the use of state-dependent IS parameters for R₄, R₅, and R₆ is largely responsible for the almost 25-fold computational gain of the sdwSSA over the dwSSA.
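The comparison above can be checked with a few lines of arithmetic. The sketch below assumes that the reported 68% uncertainty is one standard error, u = σ/√K, where σ is the per-trajectory standard deviation of the weighted estimator. Under that assumption, two methods that reach the same uncertainty have a per-trajectory variance ratio equal to the ratio of their trajectory counts. The input numbers are taken from Table 4 and from the trajectory counts quoted in the text. The helper name `per_trajectory_std` is hypothetical.

```python
import math


def per_trajectory_std(uncertainty, n_trajectories):
    """Invert u = sigma / sqrt(K) to recover sigma (assumed relation)."""
    return uncertainty * math.sqrt(n_trajectories)


# Table 4 uncertainties and total times, plus K values quoted in the text.
u_dw, k_dw, time_dw = 0.010e-11, 6e8, 3116.0
u_sdw, k_sdw, time_sdw = 0.010e-11, 1e7, 132.0

sigma_dw = per_trajectory_std(u_dw, k_dw)
sigma_sdw = per_trajectory_std(u_sdw, k_sdw)

# With equal uncertainties this reduces to k_dw / k_sdw.
variance_ratio = (sigma_dw / sigma_sdw) ** 2
time_ratio = time_dw / time_sdw

print(f"per-trajectory variance ratio (dwSSA / sdwSSA): {variance_ratio:.1f}")
print(f"total run time ratio (dwSSA / sdwSSA): {time_ratio:.1f}")
```

The run time ratio reproduces the roughly 24-fold figure quoted above. The variance ratio is larger than the run time ratio, so the two should not be read as the same quantity: run time presumably also reflects the parameter-estimation phase and any difference in cost per trajectory.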

We next measured the sensitivity of ${\hat{p}}_{sdwSSA}$ to β_max. As before, we computed four independent sdwSSA probability estimates for β_max = {1, …, 15}. Figure 2 displays their uncertainties. As expected, the uncertainties gradually decrease with increasing β_max. Unlike with the reversible isomerization model, we do not display the swSSA uncertainty for the yeast model since it is not feasible to optimize multiple swSSA IS parameters.

Figure 2.

Uncertainty vs β_max for the yeast polarization model. For each β_max, four independent instances of the CE-DB with $K_{CE} = 10^{6}$ were executed, followed by four instances of the core sdwSSA with K = 10⁷. Each red dot corresponds to the uncertainty of one sdwSSA estimate, and the two red lines connect the minimum and the maximum uncertainties at each β_max value. The four green lines represent dwSSA uncertainties.

Finally, we evaluated the sensitivity of ${\hat{p}}_{sdwSSA}$ to the κ_min parameter. We executed the CE-DB and the core sdwSSA as before, using either κ_min = 10 or κ_min = 30. The results are shown in Fig. 3. We see no apparent uncertainty differences among estimates generated with κ_min = 20 (Fig. 2) and κ_min = 30 (Fig. 3b). However, estimate uncertainty varies more strongly between the four ensembles using κ_min = 10 (Fig. 3a). We hypothesize that this higher variability is due to an insufficient number of reaction firings in each bin used to estimate the optimal sdwSSA parameters. Since the yeast system exhibits high intrinsic stochasticity, consistent estimates of IS parameters likely require >10 reaction firings in each bin. This helps to illustrate a general point: whereas setting κ_min too high can decrease β_j, leading to suboptimal exploration of the relative propensity range, setting it too low can increase estimate variance. For the problems we have tested, κ_min = 20 seems to confer proper discretization of the relative propensity range as well as reliable estimation of sdwSSA parameters.
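The trade-off described above, where a larger κ_min leaves fewer bins while a smaller κ_min leaves bins with too few firings, can be illustrated with a toy merging rule. The sketch below is not the CE-DB merging procedure of this paper. It assumes a simple left-to-right rule in which adjacent bins are combined until each merged bin holds at least κ_min firings, and the firing counts are made-up illustrative numbers. The function name `merge_sparse_bins` is hypothetical.

```python
def merge_sparse_bins(counts, kappa_min):
    """Merge adjacent bins left to right until each holds >= kappa_min firings.

    counts: firings observed in each initial bin (illustrative input).
    Returns a list of (first_bin, last_bin, total_firings) tuples.
    """
    merged = []
    start, total = 0, 0
    for i, c in enumerate(counts):
        total += c
        if total >= kappa_min:
            merged.append((start, i, total))
            start, total = i + 1, 0
    if start < len(counts):
        # A sparse tail remains: fold it into the previous merged bin,
        # or keep it as a single bin if nothing has been emitted yet.
        if merged:
            first, _, prev_total = merged.pop()
            merged.append((first, len(counts) - 1, prev_total + total))
        else:
            merged.append((0, len(counts) - 1, total))
    return merged


# Hypothetical firing counts for ten initial bins (beta_max = 10).
counts = [3, 4, 12, 25, 40, 31, 9, 2, 0, 1]
for kappa_min in (10, 20, 50):
    bins = merge_sparse_bins(counts, kappa_min)
    print(kappa_min, len(bins), bins)
```

With these illustrative counts, raising κ_min reduces the number of surviving bins, which corresponds to the decrease in β_j mentioned in the text. Lowering it keeps more bins, but each parameter is then estimated from fewer firings.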

Figure 3.

Uncertainty vs β_max for the yeast polarization model. All data were generated using the same simulation settings as in Fig. 2, except for the value of κ_min. Panel (a) was generated with κ_min = 10, and panel (b) was generated with κ_min = 30. Estimated uncertainties obtained with κ_min = 10 display significantly higher variability among the four ensembles than the estimates obtained with κ_min = 20 or κ_min = 30. As the magnitude of the rare event probability is very small, we suspect that greater numbers of firings are required in each bin to obtain reliable estimates of the state-dependent IS parameters.

Lac operon

Our last example is a lac operon model¹⁰ consisting of 12 species (x = [M_R R R₂ O R₂O I I_ex I₂R₂ M_Y Y YI_ex Y_tot]) and the following 25 reactions:

\begin{matrix}
R_{1}: & \emptyset \overset{k_{1}}{\to} M_{R} & k_{1} = 0.111 \\
R_{2}: & M_{R} \overset{k_{2}}{\to} M_{R} + R & k_{2} = 15.0 \\
R_{3}: & 2R \overset{k_{3}}{\to} R_{2} & k_{3} = 103.8 \\
R_{4}: & R_{2} \overset{k_{4}}{\to} 2R & k_{4} = 0.001 \\
R_{5}: & R_{2} + O \overset{k_{5}}{\to} R_{2}O & k_{5} = 1992.7 \\
R_{6}: & R_{2}O \overset{k_{6}}{\to} R_{2} + O & k_{6} = 2.40 \\
R_{7}: & 2I + R_{2} \overset{k_{7}}{\to} I_{2}R_{2} & k_{7} = 1.293 \times 10^{-6} \\
R_{8}: & I_{2}R_{2} \overset{k_{8}}{\to} 2I + R_{2} & k_{8} = 12.0 \\
R_{9}: & 2I + R_{2}O \overset{k_{9}}{\to} I_{2}R_{2} + O & k_{9} = 1.293 \times 10^{-6} \\
R_{10}: & I_{2}R_{2} + O \overset{k_{10}}{\to} 2I + R_{2}O & k_{10} = 9963.2 \\
R_{11}: & O \overset{k_{11}}{\to} O + M_{Y} & k_{11} = 0.50 \\
R_{12}: & R_{2}O \overset{k_{12}}{\to} R_{2}O + M_{Y} & k_{12} = 0.010 \\
R_{13}: & M_{Y} \overset{k_{13}}{\to} M_{Y} + Y & k_{13} = 30.0 \\
R_{14}: & Y + I_{ex} \overset{k_{14}}{\to} YI_{ex} & k_{14} = 0.249 \\
R_{15}: & YI_{ex} \overset{k_{15}}{\to} Y + I_{ex} & k_{15} = 0.10 \\
R_{16}: & YI_{ex} \overset{k_{16}}{\to} Y + I & k_{16} = 60\,000 \\
R_{17}: & I_{ex} \overset{k_{17}}{\to} I & k_{17} = 0.920 \\
R_{18}: & I \overset{k_{18}}{\to} I_{ex} & k_{18} = 0.920 \\
R_{19}: & M_{R} \overset{k_{19}}{\to} \emptyset & k_{19} = 0.462 \\
R_{20}: & M_{Y} \overset{k_{20}}{\to} \emptyset & k_{20} = 0.462 \\
R_{21}: & R \overset{k_{21}}{\to} \emptyset & k_{21} = 0.20 \\
R_{22}: & R_{2} \overset{k_{22}}{\to} \emptyset & k_{22} = 0.20 \\
R_{23}: & Y \overset{k_{23}}{\to} \emptyset & k_{23} = 0.20 \\
R_{24}: & YI_{ex} \overset{k_{24}}{\to} I & k_{24} = 0.20 \\
R_{25}: & I_{2}R_{2} \overset{k_{25}}{\to} 2I & k_{25} = 0.20
\end{matrix}

The lac operon genetic switch has been widely studied since it was first discovered in 1960. In particular, the positive feedback loop underlying the switch has garnered wide interest among researchers because it is responsible for the all-or-none bistable response of the lac operon. The key player in the positive feedback loop is lacY, which facilitates lactose import.

In this example, we consider the rare event probability $p (x_{0}, θ^{Y_{tot}}; t) \equiv p (x_{0}, 120; 0.5)$, i.e., the probability that the total number of Y molecules reaches 120 before time 0.5 given the initial condition $x_{0} = [0 0 0 1 0 0 48177 0 0 0 0 0]$. We computed dwSSA and sdwSSA probability estimates for this system using the same simulation settings as in the yeast polarization model (except that K = 4.5 × 10⁷ for the dwSSA). The simulation statistics and the optimal biasing parameters are listed in Table 6 and the Appendix, respectively.

Table 6.

Results for the lac operon model.

Method	$\hat{p} (x_{0}, θ^{Y_{tot}}; t)$	68% uncertainty	CE-DB time (h)	Total time (h)
dwSSA	3.75 × 10⁻¹⁵	0.07 × 10⁻¹⁵	9.33	97.36
sdwSSA	3.72 × 10⁻¹⁵	0.07 × 10⁻¹⁵	11.47	59.52

The total simulation time in Table 6 shows that the dwSSA takes ∼4 days to obtain an estimate with accuracy similar to that of the sdwSSA estimate, which required less than 2.5 days. We also note that the dwSSA multilevel CE method used three intermediate rare events ($E_{1} = 18$, $E_{2} = 65$, and $E_{3} = 117$) to estimate its optimal biasing parameters, while the multilevel CE method for the sdwSSA required only two ($E_{1} = 18$ and $E_{2} = 60$).

A close look at the optimal IS parameter values in Table 7 (with associated bin end points in Table 8) reveals that many of the 25 state-dependent biasing parameters are in agreement with the corresponding state-independent dwSSA parameters. However, several sdwSSA parameters show differences that are worthy of discussion. One group of parameters (e.g., ${\hat{γ}}_{1}^{*}$ and ${\hat{γ}}_{11}^{*}$) shows the same biasing directions in both the dwSSA and the sdwSSA but differs in magnitude. The sdwSSA parameter values indicate that R₁ and R₁₁ require significantly more perturbation as they become more likely to fire. In particular, the relative propensity range of R₁₁ is made up of several bins whose optimal IS parameter values are >100. The optimal dwSSA parameter for this reaction is 32.42, which is roughly equivalent to the average sdwSSA parameter value taken over all R₁₁ bins. In the dwSSA regime, setting γ₁₁ ⩾ 32.42 will over-perturb R₁₁ when it is not likely to fire; conversely, decreasing γ₁₁ will under-perturb R₁₁ when it is likely to fire. Reactions R₁ and R₁₁ in the lac operon network are similar to reactions R₄, R₅, and R₆ in the yeast polarization model in that they show the greatest benefit from using state-dependent biasing parameters.

Table 7.

Complete list of optimal IS parameters employed by the dwSSA and the sdwSSA in the lac operon model.

Reaction index	${\hat{γ}}_{dwSSA}^{*}$	${\hat{γ}}_{sdwSSA}^{*}$
1	0.48	[0.41, 0.54, 0.23, 0.14]
2	0.82	[1.15, 0.79, 1.00, 0.78]
3	0.83	[0.81, 5.57]
4	1.66e-83	[1.80e-82]
5	0.88	[1.28, 1.35]
6	1.33	[0.21]
7	1.09	[0.094, 0.54, 1.33, 0.29, 0.58]
8	0.89	[0.63, 0.32]
9	0.95	[1.05, 0.98, 1.13, 1.10, 0.89, 1.09, 0.66, 0.91, 0.79, 1.21]
10	1.00	[1.09, 1.11, 1.07, 0.74, 0.88, 0.88]
11	32.42	[13.14, 22.35, 18.82, 36.02, 17.06, 93.93, 98.64, 127.5, 116.2, 122.3]
12	1.82e-86	[0.0019]
13	1.35	[1.36, 1.40, 1.40, 1.23, 1.30, 1.52, 0.72, 0.29]
14	1.00	[0.96, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 0.99]
15	0.95	[0.11, 0.46, 0.89, 0.91, 0.52, 0.38]
16	1.00	[1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 0.97, 0.99]
17	1.00	[1.00, 1.00, 1.01, 1.00, 1.00, 1.01, 1.00, 1.02, 1.02, 1.00]
18	1.00	[0.99, 1.00, 1.00, 0.99, 0.99, 1.00, 1.00, 1.00, 0.99, 0.99]
19	0.19	[3.97]
20	0.20	[0.028, 0.13, 0.17, 1.51, 0.66]
21	0.87	[0.057]
22	8.35e-86	[9.01e-85]
23	0.72	[0.80, 0.69, 0.88, 0.72, 0.61, 0.59, 0.34, 1.30, 0.38, 0.48]
24	0.44	[0.47, 1.13, 1.36, 0.42, 0.25, 0.41]
25	1.44e-87	[1.73e-86]

Table 8.

End points of bins associated with the optimal sdwSSA IS parameters in Table 7.

Reaction index	c
1	[3.29, 5.70, 8.11] × 10⁻⁷
2	[4.65, 9.31, 13.8] × 10⁻⁵
3	[0.0012]
4	NA
5	[0.011]
6	NA
7	[0.0031, 0.0063, 0.0093, 0.012]
8	[0.00013]
9	[1.63, 3.26, 4.90, 6.53, 8.17, 9.80, 11.4, 13.0, 14.7] × 10⁻³
10	[0.057, 0.11, 0.17, 0.23, 0.29]
11	[1.13, 2.26, 3.38, 4.51, 5.64, 6.77, 7.89, 9.02, 10.2] × 10⁻⁶
12	NA
13	[0.00067, 0.0013, 0.0020, 0.0027, 0.0034, 0.0040, 0.0047]
14	[0.088, 0.18, 0.27, 0.35, 0.44, 0.53, 0.62, 0.71, 0.80]
15	[4.68, 6.24, 7.80, 9.36, 10.9] × 10⁻⁷
16	[0.19, 0.28, 0.37, 0.47, 0.56, 0.66, 0.75, 0.84]
17	[0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90]
18	[0.05, 0.10, 0.15, 0.21, 0.26, 0.31, 0.36, 0.41, 0.46]
19	NA
20	[1.03, 2.07, 3.10, 4.14] × 10⁻⁵
21	NA
22	NA
23	[2.82, 5.64, 8.46, 11.2, 14.1, 16.9, 19.7, 22.5, 25.4] × 10⁻⁵
24	[0.94, 1.24, 1.56, 1.87, 2.18] × 10⁻⁶
25	NA

A second group of parameters exhibits opposite biasing directions in the dwSSA and the sdwSSA. Reactions R₅ and R₁₉ are discouraged under the optimal dwSSA parameters, yet they are encouraged in the sdwSSA. We note that both reactions utilize fewer than three bins after undergoing merging. If R₅ and R₁₉ were important for accurately estimating the rare event probability, we would expect them to require a greater number of relative propensity bins containing monotonically varying parameter values. Thus, we hypothesize that the probability estimate is relatively insensitive to the parameters of these two reactions. To test this hypothesis, we recomputed parameter values for the dwSSA and sdwSSA with different random number generator seeds. In the second set of dwSSA parameters, γ₅ changed from 0.88 to 1.81 and γ₁₉ from 0.19 to 0.95. The corresponding second set of sdwSSA parameters also showed high variability in their values. Thus, we conclude that $\hat{p} (x_{0}, θ^{Y_{tot}}; t)$ is not sensitive to γ₅ and γ₁₉ in either the dwSSA or the sdwSSA.

CONCLUSION

We have developed a novel modification of the doubly weighted SSA (dwSSA), the state-dependent doubly weighted SSA (sdwSSA), which yields an accurate estimate of a rare event probability even for systems where the relative propensities vary widely over time. The sdwSSA is a natural extension of the dwSSA and reduces to it when β_max, the maximum number of bins used to discretize relative propensity, is set to 1. Consequently, the sdwSSA retains all benefits of the dwSSA while providing greater computational efficiency. It achieves this by automatically computing the optimal set of state-dependent biasing parameters and subsequently using these parameters to accurately estimate a rare event probability. Numerical results reported in Sec. 4 demonstrate the improved accuracy and efficiency of the method. For practical purposes, it is important that the sdwSSA not only improves estimate accuracy but also adds minimal computation to the existing dwSSA framework. We have met this requirement by (1) proposing a specific form for the state-dependent biasing parameter that yields a closed-form solution when coupled with the multilevel cross-entropy method, and (2) incorporating a dynamic binning strategy into the cross-entropy framework that requires no additional realizations over the dwSSA.

The sdwSSA discretizes the relative propensity range of each reaction to arrive at a closed-form expression of its biasing parameters. For each reaction, we interpret the β_j state-dependent biasing parameters as a step function, i.e., any relative propensity value between $c_{j}^{i - 1}$ and $c_{j}^{i}$ is assigned γ_ji. While leaving the discretization scheme unchanged, a higher order interpolation could be subsequently used as an alternative approach for determining the amount of perturbation for each reaction. Future work will focus on customizing the binning strategy for individual reactions such that key reactions for a given rare event are identified and their parameters computed using an optimal interpolation method.
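The step-function interpretation described above amounts to a bin lookup. The following minimal sketch returns the biasing parameter for a given relative propensity from a list of interior bin end points and one parameter per bin. The values are the first of the two sdwSSA parameter vectors in Table 3 and the first dwSSA parameter from the same table. The half-open convention at a bin boundary (a value equal to an end point falls in the upper bin) is an assumption, and `gamma_for` is a hypothetical helper name.

```python
from bisect import bisect_right


def gamma_for(rho, edges, gammas):
    """Return the biasing parameter of the bin containing relative propensity rho.

    edges:  interior bin end points in increasing order
    gammas: one biasing parameter per bin (len(edges) + 1 values)
    """
    if len(gammas) != len(edges) + 1:
        raise ValueError("need exactly one more gamma than interior end points")
    # bisect_right counts the end points <= rho, which is the bin index.
    return gammas[bisect_right(edges, rho)]


# First sdwSSA parameter vector and its bin end points from Table 3.
gammas_first = [2.69, 1.74, 1.26, 1.11, 1.06, 1.03, 1.02, 1.02, 1.06, 1.00]
edges_first = [0.30, 0.38, 0.46, 0.54, 0.61, 0.69, 0.77, 0.85, 0.92]

for rho in (0.10, 0.50, 0.95):
    print(rho, gamma_for(rho, edges_first, gammas_first))

# With a single bin (no interior end points) the lookup is state independent,
# which mirrors the reduction to the dwSSA when beta_max = 1.
print(gamma_for(0.42, [], [1.301]))
```

A higher order interpolation, as suggested in the text, would replace only the last line of `gamma_for` and leave the end points unchanged.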

ACKNOWLEDGMENTS

We acknowledge the following financial support: B.J.D.J. and L.R.P. were supported by the Institute for Collaborative Biotechnologies through Grant No. W911NF-09-0001 from the (U.S.) Army Research Office (USARO). The content of the information does not necessarily reflect the position or the policy of the Government, and no official endorsement should be inferred. M.K.R. and L.R.P. were supported by the National Institutes of Health (NIH) Grant No. 5R01EB007511-03 and the (U.S.) Department of Energy (DOE) Grant No. DE-FG02-04ER25621. L.R.P. was also supported by the National Science Foundation (NSF) Grant No. DMS-1001012. D.T.G. was supported by the California Institute of Technology through Consulting Agreement No. 102-1080890 pursuant to Grant No. R01GM078992 from the National Institute of General Medical Sciences and through Contract No. 82-1083250 pursuant to Grant No. R01EB007511 from the National Institute of Biomedical Imaging and Bioengineering, and also from the University of California at Santa Barbara under Consulting Agreement No. 054281A20 pursuant to funding from the NIH.

APPENDIX: SUPPLEMENTAL INFORMATION ON LAC OPERON MODEL

This section presents the optimal importance sampling parameter values and associated bin end points for the lac operon model.

References

1. Dingli D. and Pacheco J. M., BMC Biology 9, 41 (2011). doi:10.1186/1741-7007-9-41
2. Gillespie D., J. Phys. Chem. 81, 2340 (1977). doi:10.1021/j100540a008
3. Rubino G. and Tuffin B., Rare Event Simulation Using Monte Carlo Methods (Wiley, Chichester, UK, 2009).
4. Kuwahara H. and Mura I., J. Chem. Phys. 129, 165101 (2008). doi:10.1063/1.2987701
5. Daigle B. J., Jr., Roh M. K., Gillespie D. T., and Petzold L. R., J. Chem. Phys. 134, 044110 (2011). doi:10.1063/1.3522769
6. Rubinstein R. Y. and Kroese D. P., The Cross-entropy Method: A Unified Approach to Combinatorial Optimization, Monte-Carlo Simulation, and Machine Learning (Springer, New York, 2004).
7. Roh M. K., Gillespie D. T., and Petzold L. R., J. Chem. Phys. 133, 174106 (2010). doi:10.1063/1.3493460
8. Gillespie D. T., Roh M., and Petzold L. R., J. Chem. Phys. 130, 174103 (2009). doi:10.1063/1.3116791
9. Nelson R., Probability, Stochastic Processes and Queueing Theory (Springer, New York, 1995).
10. Stamatakis M. and Mantzaris N. V., Biophys. J. 96, 887 (2009). doi:10.1016/j.bpj.2008.10.028
