Abstract
Critical events that occur rarely in biological processes are of great importance, but are challenging to study using Monte Carlo simulation. By introducing biases to reaction selection and reaction rates, weighted stochastic simulation algorithms based on importance sampling allow rare events to be sampled more effectively. However, existing methods do not address the important issue of barrier crossing, which often arises from multistable networks and systems with complex probability landscapes. In addition, the proliferation of parameters and the associated computing cost pose significant problems. Here we introduce a general theoretical framework for obtaining optimized biases in sampling individual reactions for estimating probabilities of rare events. We further describe a practical algorithm called the adaptively biased sequential importance sampling (ABSIS) method for efficient probability estimation. By adopting a look-ahead strategy and by enumerating short paths from the current state, we estimate the reaction-specific and state-specific forward and backward moving probabilities of the system, which are then used to bias reaction selections. The ABSIS algorithm can automatically detect barrier-crossing regions, and can adjust biases adaptively at different steps of the sampling process, with the bias determined by the outcome of exhaustively generated short paths. In addition, there are only two bias parameters to be determined, regardless of the number of reactions and the complexity of the network. We have applied the ABSIS method to four biochemical networks: the birth-death process, the reversible isomerization model, the bistable Schlögl model, and the enzymatic futile cycle model. For comparison, we have also applied the recently developed finite buffer discrete chemical master equation (dCME) method to obtain exact numerical solutions of the underlying discrete chemical master equations of these problems. This allows us to assess sampling results objectively by comparing simulation results with the true answers. Overall, ABSIS can accurately and efficiently estimate rare event probabilities for all examples, often with smaller variance than other importance sampling algorithms. The ABSIS method is general and can be applied to study rare events of other stochastic networks with complex probability landscapes.
INTRODUCTION
Many critical events in biological processes occur rarely within the relevant physical time scale. Bacteriophage λ in E. coli1, 2, 3 can maintain a stable dormant lysogenic lifestyle when integrated into the E. coli genome, but can spontaneously transit to the lytic lifestyle of phage outburst2, 4, 5, 6, 7, 8 with a small probability (∼4 × 10−7 per cell cycle9). Crossing barriers in the free energy landscape of some slow-folding proteins may be rare, but critical.10 In tumorigenesis, cells experiencing normal growth rarely transit spontaneously to uncontrolled tumor growth.11, 12 However, environmental changes, e.g., those resulting in the accumulation of DNA hypermethylation in promoter CpG islands,13, 14 can accelerate such transitions. Multi-stable cellular states of endogenous molecular-cellular networks and rare stochastic transitions between them may offer a general framework to study human diseases.15, 16 Accurate assessment of rare event probabilities is, therefore, important for understanding the machineries behind many critical biological processes.
It is challenging to study rare events from the viewpoint of mechanistic theory.17, 18 Here we study networks of biochemical reactions. In principle, the transition probability rates between two states can be calculated exactly, if the state space of the biochemical reaction network is completely accounted for, e.g., when the underlying discrete chemical master equation can be solved exactly.8, 19 However, when the state spaces and the transition matrices are too large to be efficiently computed, a widely used approach to study the stochastic behavior of biochemical reactions is that of Monte Carlo sampling, first formulated as the stochastic simulation algorithm (SSA).21 However, the original SSA21 is ineffective for studying rare events, as most computing time is spent on following high-probability paths.22, 23
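To make the contrast with the biased methods discussed below concrete, a minimal sketch of one step of the direct-method SSA is given here (in Python). The propensity function and stoichiometry vectors are placeholders standing in for whatever reaction network is being simulated, and the function name is illustrative.

import math
import random

def ssa_step(x, propensities, stoichiometry):
    """One step of the direct-method SSA (Gillespie).

    x: current copy-number vector (sequence of ints)
    propensities: function mapping x to the list of reaction rates a_k(x)
    stoichiometry: list of state-change vectors s_k, one per reaction
    Returns (new_state, waiting_time), or (x, None) if no reaction can fire.
    """
    a = propensities(x)
    a0 = sum(a)
    if a0 == 0.0:
        return list(x), None                      # absorbing state: nothing can happen
    tau = -math.log(1.0 - random.random()) / a0   # exponential waiting time with rate a0
    r = random.random() * a0                      # select reaction k with probability a_k / a0
    k, acc = 0, a[0]
    while acc < r and k < len(a) - 1:
        k += 1
        acc += a[k]
    new_x = [xi + si for xi, si in zip(x, stoichiometry[k])]
    return new_x, tau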
The techniques of importance sampling and reweighting can improve sampling efficiency significantly. They have been widely used in equilibrium sampling where the condition of detailed balance holds.24, 25 However, stochastic processes in reaction networks are generally not time reversible and the condition of detailed balance is not valid. Kuwahara and Mura developed the weighted SSA (wSSA) algorithm by applying the importance sampling technique to study stochastic reaction networks, in which each reaction rate is biased by a pre-determined constant, with the overall summation of reaction rates unchanged.22 As the probability for reaction selection can be biased such that rare events are sampled more frequently while the time scale of the underlying reactions is maintained, significantly improved sampling efficiency for rare events was reported.22, 23, 26 However, the choice of bias constants strongly affects the effectiveness of wSSA. When there are many reactions and the network is complex, the heuristic approach of determining bias constants by examining the reactions does not work.22 As there is no general guidance on how bias constants should be chosen, poor choices may lead to estimations that are less accurate than the original SSA.23
Daigle et al. developed the doubly-weighted SSA (dwSSA) algorithm, in which a multilevel cross-entropy (CE) method is used iteratively to provide estimates of bias constants.23 This is achieved by running long trial simulations until a fraction of the sampled trajectories reaches the target states.23 With this automated estimation, both reaction selection and the underlying time scale of reactions can be biased.23
A drawback of methods using constant biases such as wSSA and dwSSA is that the bias coefficients are global and state-independent, and are not influenced by the concentrations of molecules, which evolve with time. As the apparent rate of a reaction can vary dramatically depending on the copy number of molecules, the degree of bias for a reaction therefore needs to be adjusted according to the available copy numbers of reactants. With globally fixed bias constants, a network with reactions of a wide range of rates will have over- and under-biased reactions, depending on the states of the system. As a result, estimated properties of a network will have large variance, making these methods unsuitable for complex networks.27
Roh et al. developed a state-dependent biasing wSSA method (swSSA).27 By empirically classifying reactions into groups of favored, disfavored, and neutral reactions, biases in selection probability for reactions in the first two groups are calculated in a state-dependent fashion. The swSSA method can have better estimation accuracy and efficiency than the wSSA method,27 at the expense of about twice as many biasing parameters as the wSSA.27 Roh et al. further developed the state-dependent doubly weighted SSA method (sdwSSA), where reactions are further grouped into bins according to their selection probabilities, and are assigned different bias constants, which are automatically estimated using the cross-entropy method.26 However, the number of parameters to be estimated using sdwSSA is much larger than that of wSSA, dwSSA, and swSSA. For example, about 20 bias constants need to be estimated for a simple reversible isomerization system with only two reactions.26 Estimating the large number of bias constants needed for complex networks becomes difficult.
In this study, we describe an algorithm named adaptively biased sequential importance sampling (ABSIS) for efficient sampling of rare events. Based on the principle of sequential importance sampling, our approach adopts the look-ahead strategy, a technique well-established in polymer and protein studies,28, 29, 30, 31 to gather future information for the design of bias parameters that enable effective barrier crossings.28, 29, 30, 31, 32, 33 By enumerating short paths from the current state, bias coefficients are generated based on analysis of these short paths. Unlike the dwSSA and sdwSSA methods, in which biases are fixed constants after parameter estimation, the bias in ABSIS for each reaction is dynamically determined based on exact calculation of the total probability of short κ-step forward- and backward-moving reaction paths, without the need to bin reaction rates. Reactions with higher probability of forward-moving are then encouraged, and reactions with higher probability of backward-moving are discouraged. Regardless of the number of reactions in the network, we only need to assign two bias parameters for the whole network: the degree to encourage forward-moving reactions and the degree to discourage backward-moving reactions, both of which can be estimated through an efficient parameter estimation algorithm.
We also take advantage of the recent development of a method that directly solves the discrete chemical master equation.8, 19 With a finite buffer, the rare event probability of a stochastic network of modest size can be computed exactly using this method, allowing us to have a gold standard to objectively assess the accuracy of rare event probabilities estimated through sampling. With errors computed based on exact numerical solutions, we show with four biological examples that the ABSIS method has improved or comparable accuracy compared to other methods (the dwSSA method, and the swSSA and sdwSSA methods when data are available), at significantly reduced overall computational cost and with a much higher success rate.
This article is organized as follows: We briefly discuss the theoretical framework of reaction networks, the principle of sequential importance sampling, and details of the ABSIS method. We then apply our method to study four biological problems, namely, the birth-death process, the reversible isomerization model, the bistable Schlögl model, and the enzymatic futile cycle, and compare the accuracies of estimations and the success rates in generating reaction paths reaching the target states with the SSA and dwSSA methods. We conclude with remarks and discussions.
MODEL FRAMEWORK
Reaction networks
We assume a well-mixed biochemical system with constant volume and temperature. There are n different molecular species: . We use xi(t) to denote the copy number of molecular species Xi at time t. There are m possible different reactions in the system: . Each reaction Rk has an intrinsic reaction rate constant rk. The microstate of the system at time t is represented by a non-negative integer column vector: , where T denotes the transpose. An arbitrary reaction Rk (k = 1, 2, ⋯, m) with intrinsic rate rk takes the general form:
which brings the system from a microstate to . The difference between and is the stoichiometry vector of the reaction Rk: The stoichiometry matrix S for the reaction network is defined as: where each column represents a single reaction. The rate of reaction Rk that transforms microstate from to is determined by the intrinsic rate constant rk and the combination number of relevant reactants in the current microstate : assuming the convention .
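As an illustration of the rate expression just described, the following sketch computes the propensity of a single reaction as the intrinsic rate constant times the number of distinct combinations of the required reactants; the argument names are illustrative, and the combinatorial convention follows the description above.

from math import comb

def propensity(x, rate_const, reactant_coeffs):
    """Propensity of one reaction in microstate x: the intrinsic rate constant
    times the number of distinct combinations of the required reactants.
    math.comb(xi, ci) is 0 whenever xi < ci, so the propensity vanishes when
    reactant copies are insufficient, and comb(0, 0) = 1 for species not consumed.
    """
    a = rate_const
    for xi, ci in zip(x, reactant_coeffs):
        a *= comb(xi, ci)
    return a

# Hypothetical example: a dimerization 2X -> Y with rate constant 0.01 and
# state (x_X, x_Y) = (5, 0) gives 0.01 * C(5, 2) * C(0, 0) = 0.1.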
State space and probability landscape
The state space S of a reaction network is defined as the set of all possible microstates that the system can visit from a given initial condition: . We denote the probability of each microstate at time t as , and the probability distribution at t over the whole state space as is also called the probability landscape of the network.8 It can be visualized as a time-evolving scalar surface over the state space, with the value at each state x taken to be . The volume integral under the surface at any arbitrary time t is always 1:
In general, there is no assumption of detailed balance. For a reaction network with arbitrary stoichiometries and a specific initial state , its probability landscape is governed by the discrete chemical master equation (dCME)
dp(t)/dt = A p(t),   (1)
from which the time-evolving probability landscape and its steady state can be directly obtained.8, 19 Here A is the transition rate matrix .
Transition paths and transition probabilities
A transition path π(S, T) consists of a sequence of states: , starting from and ending at , along with a sequence of time points T = (t0, ⋅⋅⋅, tN) when each of these states is visited. Here N is the length of the transition path. When the beginning state and the ending state , as well as the sequence of states S and time points T are unambiguous from the context, we use π(0, N) to denote the transition path π(S, T) for convenience. This transition path is understood to move from state to state through a total of N steps following the specific sequence of states S and sequence of time points T. The sequence of time points can be alternatively specified by the corresponding sequence of time intervals: {τ0, τ1, ⋯, τN − 1} = {t1 − t0, t2 − t1, ⋯, tN − tN − 1}. In implementation, these time intervals are not predefined but are small random values generated by sampling Poisson processes, whose rates are governed by the underlying chemical reaction rates (see below). We assume that there is a unique reaction connecting each neighboring pair of microstates and along the reaction path. The probability p(π(S, T)) of a given transition path π(S, T) can be calculated as the product of the probability of the initial state , and the probabilities of all subsequent transitions between neighboring states ,
(2) |
Assuming a Poisson process, the probability of each transition occurring during an infinitesimally small dτi − 1 after τi − 1 can be calculated as22, 34
(3) |
where is the sum of rates of all reactions that could happen at the state , and is the probability that there is exactly one reaction occurring in the next time interval τi − 1.35 The subscript k denotes the reaction Rk that connects state to state . The fraction is the probability that the kth reaction Rk occurs during τi − 1.22, 34 Taken together, the overall probability of the transition path π is
(4) |
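As an illustration, the following sketch evaluates the (log) probability density of a recorded path as the product of the per-step factors described above, an exponential waiting-time factor and a reaction-selection factor, while ignoring the probability of the initial state; the function and argument names are illustrative.

import math

def log_path_probability(states, taus, reactions, propensities):
    """Log probability density of a recorded transition path (cf. Eqs. 3 and 4),
    omitting the probability of the initial state.

    states: microstates x_0, ..., x_{N-1} occupied before each jump
    taus: waiting times spent in each of these states
    reactions: index k of the reaction fired out of each state
    propensities: function mapping a microstate to the list of rates a_k(x)
    Each step contributes a_0 exp(-a_0 tau) for the waiting time and a_k / a_0
    for the choice of reaction, i.e. a_k exp(-a_0 tau) in total.
    """
    logp = 0.0
    for x, tau, k in zip(states, taus, reactions):
        a = propensities(x)
        a0 = sum(a)
        logp += math.log(a[k]) - a0 * tau
    return logp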
Macrostates and probability of rare transitions between macrostates
We define a macrostate B as a set of microstates: Here we are interested in biologically motivated macrostates. For example, in a bistable genetic switch system, most microstates belong to either the “on/off” or the “off/on” metastable states, each of which can be regarded as a macrostate. The probability of a macrostate B can be written as:
For a stochastic network, if a destination macrostate D can be reached from a beginning macrostate B, the probability of the system transiting from B to D is 1 given an infinite amount of time. However, we are interested in the probability of transition from B to D within a finite period of time θ. That is, we wish to estimate , which may be small (Fig. 1).
Calculating exact transition probability
The finite buffer dCME method can be used to enumerate the state space S of stochastic networks of modest size, and is optimal both in time complexity and in space requirement.8, 19 For these networks, we can directly solve the dCME. The transition probabilities of specific paths connecting two macroscopic states can therefore be calculated exactly. In this study, the probabilities of rare events in all examples are computed both by the finite buffer dCME method and by sampling methods. The results of the former are regarded as exact solutions, against which results from sampling methods are compared.
Weighted SSA and doubly-weighted SSA
There are potentially an enormous number of transition paths connecting two macrostates. In general, if enumeration is infeasible, exact calculation of the transition probabilities is not possible. One can estimate the probabilities through Monte Carlo sampling.
A number of sampling methods for rare events have been developed based on the principle of importance sampling. Kuwahara and Mura developed the wSSA algorithm,22 in which the rate of each reaction is biased by a pre-selected predilection constant αk, which increases or decreases the rate of that specific reaction and thereby changes the fraction of sampled paths reaching the target states. Sample paths are generated from the biased reaction rates. The biased probability of the reaction in the time step starting at state is calculated as
(5) |
where reaction Rk leads to , and . A weight for correcting the bias is also kept for this reaction:
(6) |
The true probability is then recovered using the weight. The biased probability p′(π(0, N)) for the full path is:
with the weight:
The true probability of the path is then: p(π(0, N)) = w(π(0, N)) · p′(π(0, N)).
In wSSA, bias is introduced through the second factor in Eq. 5, which represents the biased probability in selecting the next reaction in the wSSA scheme. The time scale of the Poisson process underlying the reaction, namely, the first factor in Eq. 5, remains unchanged.
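A minimal sketch of one wSSA-style step along these lines is given below: the waiting time is drawn from the unbiased total rate, the next reaction is selected from the α-biased distribution, and the ratio of the unbiased to the biased selection probability is recorded as the weight. The names are illustrative, and this is only a sketch of the scheme described above.

import math
import random

def wssa_step(x, propensities, stoichiometry, alpha):
    """One wSSA-style step: unbiased time scale, biased reaction selection.

    alpha: predilection constants alpha_k, one per reaction.
    Returns (new_state, tau, weight); the weight corrects the selection bias.
    """
    a = propensities(x)
    a0 = sum(a)
    if a0 == 0.0:
        return list(x), None, 1.0
    tau = -math.log(1.0 - random.random()) / a0     # time scale is left unbiased
    b = [al * ak for al, ak in zip(alpha, a)]       # biased selection rates
    b0 = sum(b)
    r = random.random() * b0
    k, acc = 0, b[0]
    while acc < r and k < len(b) - 1:
        k += 1
        acc += b[k]
    weight = (a[k] / a0) / (b[k] / b0)              # unbiased / biased selection probability
    new_x = [xi + si for xi, si in zip(x, stoichiometry[k])]
    return new_x, tau, weight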
In the doubly-weighted SSA method (dwSSA),23 both the selection probability and the Poisson time scale are biased, and the biased probability for each step in a dwSSA sampling path is:
(7) |
where is the biased reaction rate. The weight for the kth reaction occurring at step i − 1 is obtained from dividing Eq. 3 by Eq. 7:
The biased probability for a full dwSSA path π(0, N) is:
and its weight is:
The true probability of the path p(π(0, N)) can be recovered as: p(π(0, N)) = w(π(0, N)) · p′(π(0, N)).
A key component of the dwSSA method is an automatic method to estimate the bias constant γk for each reaction Rk. A large number (typically 10^5) of full-length trial simulations are run, with some of them reaching the target macrostate. The number of occurrences of each reaction in those simulations that reached the macrostate is counted and compared to the expected number of occurrences if one were to follow a Poisson process in the same given time under the same initial condition. Reactions that occur more frequently than expected are biased towards; reactions that occur less frequently than expected are biased against. This procedure is repeated with the biases updated iteratively, until a predefined fraction (e.g., 2%) of full-length trial paths reaches the target macrostate. In the further developed swSSA and sdwSSA methods, the bias coefficient is not a constant, but depends on the copy numbers of molecules of the current state.26, 27 In order to assign more effective bias coefficients, a refined scheme of bias assignment is used in sdwSSA, in which each reaction is divided into multiple bins according to its probability of being chosen, with each bin assigned its own bias coefficient.26
There are a number of issues with these methods. First, estimation of the bias parameters relies on counting the number of occurrences of a reaction, which may not be possible, or the estimate may not be reliable, if a reaction happens rarely. For example, gene binding and unbinding reactions in a toggle switch system bring the system from one metastable state to another, but this happens only once or twice during an extended time. It is challenging to sample these binding/unbinding reactions adequately using trial simulations, in which only a limited number of runs are carried out. Second, as the estimated bias parameters are either constant or based on the current state, no consideration of possible future barriers in the probability landscape is incorporated. This becomes problematic for complex systems, for example, those with multistability, where steep barriers need to be crossed. In these systems, the desirable bias may be quite different depending on the neighborhood where the system is currently located in the landscape. Third, there is a proliferation of adjustable bias parameters, for example, on the order of O(βm) for the sdwSSA method, with the number of bins β = 5–20 for each of the m reactions, making the assignment of bias coefficients a challenging task.26 As a result, the overall amount of computation involved is often substantial, the variance of samples generated is high, and the accuracy is still unsatisfactory.
Adaptively biased sequential importance sampling
Here we describe a new method called ABSIS for estimating rare transition probability between macrostates. Our approach is based on the look-ahead strategy and the principle of sequential importance sampling,31 which have found wide applications in studies of polymers and protein biophysics,29, 30 where challenging problems such as RNA loop entropy calculation, generation of protein folding transition state ensemble, and protein packing have been investigated.30, 33, 36 In ABSIS, bias for each reaction is calculated based on present and future information, and is adaptively adjusted automatically, resulting in more efficient sampling of rare events. It can be applied to stochastic networks with complex probability landscapes.
Perfect path sampling
Assume we wish to reach the macrostate D from the microstate . We can classify paths π(0, N) starting at and ending at into two sets and : those that reach the macrostate D before time θ form the set of paths , and those that do not form another set of paths .
Our goal is to assess the transition probability from the microstate to the macrostate D. It can be calculated as
where is an indicator function such that if , and 0 otherwise. Namely, it is 1 if a path π(0, N) starting from state reaches the macrostate D in time, and 0 otherwise. Perfect path samples for calculating can then be drawn as
In general, if our goal is to estimate certain property of the reaction paths, which is expressed as a scalar function of the microstate x, perfect sampling of the reaction paths for this estimation problem is then:
where , and forms the path π(0, N).
Optimal bias strategy and future-perfect adaptive weighting
Similarly, the probability that future paths after a reaction connecting the microstate to will reach the destination macrostate D in time is:
To estimate the transition probability , the next state can be sampled future-perfectly if we draw as
If our goal is to estimate the property f(·) of the reaction path that reaches the macrostate D, can be sampled optimally as
where .
κ-Step look-ahead bias strategy and adaptive weighting
As it is usually impossible to enumerate and examine all paths up to time θ to calculate exactly, we approximate it by adopting a κ-step look-ahead strategy. Briefly, we analyze statistics of exhaustively generated short paths, and design biases based on estimations made on these short paths. We first classify κ-step paths π(i, i + κ), which all have the first step following a specific reaction connecting to , into three types: forward-moving paths , backward-moving paths , and non-moving paths (Fig. 2):
(8) |
Here and are the distances between the states , and the target macrostate D, respectively. For convenience, we define the distance using the 1-norm. The forward-moving, backward-moving, and non-moving probabilities after κ steps, given that the first reaction connects state and , can be calculated as
and
We can then have the approximations and , which will be used to construct the bias functions for accelerating/decelerating the reaction rate and for selecting reactions. We can now have the approximation:
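A minimal sketch of this look-ahead computation is given below. It assumes a 1-norm distance to the target macrostate, conditions on the first reaction having already fired, and accumulates only the reaction-selection probabilities along each exhaustively enumerated short path, as described above; the function and argument names are illustrative.

def lookahead_probs(x_next, kappa, propensities, stoichiometry, dist_to_target, d_start):
    """Forward-, backward-, and non-moving probabilities from exhaustive kappa-step look-ahead.

    x_next: state reached after the first (fixed) reaction has fired
    kappa: number of additional look-ahead steps to enumerate
    dist_to_target: function giving the 1-norm distance from a state to the target macrostate
    d_start: distance from the state before the first reaction to the target macrostate
    Returns (p_forward, p_backward, p_nonmoving); they sum to 1 when reactions can fire.
    """
    p_f = p_b = p_n = 0.0

    def explore(x, depth, prob):
        nonlocal p_f, p_b, p_n
        a = propensities(x)
        a0 = sum(a)
        if depth == 0 or a0 == 0.0:
            d_end = dist_to_target(x)
            if d_end < d_start:
                p_f += prob          # path ends closer to the target: forward-moving
            elif d_end > d_start:
                p_b += prob          # path ends farther from the target: backward-moving
            else:
                p_n += prob          # same distance: non-moving
            return
        for k, ak in enumerate(a):
            if ak > 0.0:
                x_new = [xi + si for xi, si in zip(x, stoichiometry[k])]
                explore(x_new, depth - 1, prob * ak / a0)  # selection probabilities only

    explore(x_next, kappa, 1.0)
    return p_f, p_b, p_n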
Bias function with κ-step look-ahead
Recall that the probability of a path p(π(i, N)) is computed as
To design bias functions that are fast to compute, we consider only the overall probability pr(π(i, N)) of reaction choices accumulated along the path, and ignore the rates of reactions for now:
We have
(9) |
and
(10) |
We can then design a bias function for selecting reaction k, and set as the biased reaction rate. The general biased probability for each step in the ABSIS sampling path is then:
(11) |
Note that calculating the probability and is equivalent to following a κ-step Markov process , where the probability transition matrix is:
in which A is the transition rate matrix of dCME in Eq. 1, , I is the identity matrix, and form the state space S for this κ-step Markov process starting from . The initial probability distribution for this Markov process is such that the probability for the current state is 1 and 0 for all other states.
Biasing strategy.
The bias in selecting reaction k that brings state to is based on the forward-moving and backward-moving probabilities. We first have
(12) |
where , i′ is any reachable state from . Here λ1 ⩾ 0 and λ2 ⩾ 0 are the parameters for biasing towards forward-moving and against backward-moving reactions, respectively. Overall, there are only these two bias parameters, regardless of which reaction k is considered. The surface maps of the bias function with λ1 = λ2 = 0.5 for encouraging and discouraging reaction k at different values of , , and are shown in Figs. 3a, 3b.
The construction of is based on the following consideration. The rate ratio , which is the original probability of choosing the reaction k that reaches , is now modified by the forward probability at , obtained by looking κ steps ahead. The term therefore represents the probability of selecting reaction k and moving forward. To encourage forward-moving reactions with lower reaction rates, we use the term instead. This is then further modified by so that reactions with higher forward-moving probability are favored proportionally (Fig. 3a). The bias coefficient λ1 is used to adjust the bias strength for forward-moving reactions. A larger λ1 gives stronger encouragement. As falls in the interval [0, +∞), we add the constant 1 so that the function is now in the interval [1, +∞) when reaction k should be encouraged. Setting the bias according to will increase the probability for a forward-moving reaction to be selected. Overall, if a larger λ1 value is chosen, a slower reaction with a higher probability of moving forward will be encouraged more (Fig. 3a).
Similarly, backward reactions are biased against, with stronger discouragement when a larger λ2 value is used. The discouragement is also stronger for reactions with larger backward probability and larger rate ratio (Fig. 3b). To ensure that the bias falls within the interval (0, 1], a “min” function is used here to impose an upper bound on the bias. If a reaction neither advances nor backtracks the system, no bias is introduced.
Corrections of biases and biased reaction rates.
In principle, both the reaction selection probability and the Poisson time scale can be biased. In this study, we focus on the effects of directly biasing the reaction selection probability alone. The effects of directly biasing specific reaction rates are the subject of future studies. Specifically, we now insist that the overall reaction rate of the system is unchanged, namely,
(13) |
As and we use a normalizing constant α:
and the biased reaction rate is tentatively set to:
(14) |
This ensures Eq. 13 holds.
As there are occasions where the bias changes direction after normalization, namely, from <1.0 to >1.0, or vice versa, we further insist that:
(15) |
To satisfy this requirement, we partition all reactions into two disjoint sets based on whether their corresponding satisfy the above inequalities:
(16) |
Inequalities in Eq. 15 are maintained by simply assigning no bias to all reactions in , and redistributing their surpluses and deficits evenly to all reactions in . As a result, the total reaction rate is unchanged. We have the final biased reaction rates:
(17) |
where is given by Eq. 14. The final biased probability for each step in an ABSIS sampling path is then calculated as
(18) |
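Under the assumptions just described (a normalizing constant that rescales the biased rates to preserve the total rate, resetting reactions whose bias direction flips after normalization, and spreading the resulting surplus or deficit evenly over the remaining reactions), a minimal sketch of this correction step is given below; the sign convention for detecting a flipped bias is an assumption, and the names are illustrative.

def normalize_biased_rates(a, bias):
    """Rescale biased rates so that the total rate is preserved (cf. Eqs. 13-17).

    a: unbiased reaction rates a_k(x) at the current state
    bias: bias factors (>1 encourages, <1 discourages, 1 leaves unchanged)
    Returns the final biased rates with sum equal to sum(a).
    """
    a0 = sum(a)
    weighted = sum(p * ak for p, ak in zip(bias, a))
    alpha = a0 / weighted                               # normalizing constant
    a_new = [alpha * p * ak for p, ak in zip(bias, a)]  # tentative biased rates

    # Reactions whose bias direction flipped after normalization get no bias ...
    keep, reset = [], []
    for k, (p, ak, ank) in enumerate(zip(bias, a, a_new)):
        flipped = (p >= 1.0 and ank < ak) or (p <= 1.0 and ank > ak)
        (reset if flipped else keep).append(k)
    # ... and their surplus or deficit is spread evenly over the remaining reactions.
    if reset and keep:
        surplus = sum(a_new[k] - a[k] for k in reset)
        for k in reset:
            a_new[k] = a[k]
        for k in keep:
            a_new[k] += surplus / len(keep)
    return a_new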
Weights of ABSIS path
The weight for correcting the bias for taking the kth reaction at step i − 1 is obtained by dividing Eq. 3 by Eq. 18:
(19) |
For the special case when the overall reaction rate is unchanged, namely, when , we have . The weight for a full ABSIS path is then:
(20) |
The biased probability for a full ABSIS path π(0, N) is
The true probability of the path π(0, N) can be recovered as
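The rare event probability is then estimated from the sampled paths as the average of the weights of the successful paths over all sampled paths; a minimal sketch of this estimator and its sample variance (matching the accumulators wM and vM used in Algorithm 1) is given below, with illustrative names.

def estimate_rare_event_probability(weights, successes, M):
    """Importance-sampling estimate of the rare event probability.

    weights: path weights w(pi) of the sampled paths
    successes: booleans, True if the path reached the target macrostate in time
    M: total number of sampled paths
    Unsuccessful paths contribute zero, so the estimator is the mean of
    w(pi) * 1{success} over all M paths.
    """
    wM = sum(w for w, ok in zip(weights, successes) if ok)    # total successful weight
    vM = sum(w * w for w, ok in zip(weights, successes) if ok)
    p_hat = wM / M
    var_hat = vM / M - p_hat ** 2          # sample variance of w * 1{success}
    return p_hat, var_hat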
The ABSIS algorithm
We summarize the ABSIS method in Algorithm 1.
In order to improve computing efficiency, we enumerate the κ-step look-ahead paths for each microstate when it is first encountered. As implementation and data structure greatly affect computing speed, , , , , and are all calculated only once when the microstate is first visited, with their values stored in hash tables using the microstate as the key. All subsequent visits to the microstate need only retrieve the relevant values stored in the hash tables. This leads to dramatically improved time efficiency.
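A minimal sketch of this memoization, keyed by the copy-number tuple of the microstate, is shown below; the names are illustrative, and the look-ahead routine is passed in as a callable such as the one sketched earlier.

# Cache of per-state look-ahead quantities, keyed by the copy-number tuple.
lookahead_cache = {}

def lookahead_for_state(x, kappa, propensities, stoichiometry, dist_to_target, lookahead_probs):
    """Return the per-reaction look-ahead quantities for microstate x, computing
    them only on the first visit; lookahead_probs returns (p_forward, p_backward,
    p_nonmoving) for one candidate first reaction."""
    key = tuple(x)
    if key not in lookahead_cache:
        a = propensities(x)
        d_start = dist_to_target(x)
        per_reaction = []
        for k, ak in enumerate(a):
            if ak > 0.0:
                x_next = [xi + si for xi, si in zip(x, stoichiometry[k])]
                per_reaction.append(lookahead_probs(x_next, kappa, propensities,
                                                    stoichiometry, dist_to_target, d_start))
            else:
                per_reaction.append((0.0, 0.0, 0.0))
        lookahead_cache[key] = (a, per_reaction)   # store rates and probabilities once
    return lookahead_cache[key]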
Determining the look-ahead step κ and bias parameters λ1 and λ2
In ABSIS, we have only one look-ahead step parameter κ and two bias parameters λ1 and λ2 to determine, regardless of the number of reactions and the overall network complexity.
Algorithm 1. The ABSIS sampling procedure.
// Input (X, R, , D, θ, κ, M, λ1, λ2)
Define network
Initialize hash table H
j ← 0, total weight wM ← 0, weight square vM ← 0,
Number of successful paths Ns ← 0
while j < M do
    Path length i ← 1
    Initialize path with the initial state
    Time on current path t ← 0, weight of current path w ← 1
    while t < θ and … do
        if … then
            Calculate reaction rates of state … for all reactions
            Enumerate all possible κ-step paths π(i − 1, i + κ) starting from state … using the algorithm of Ref. 19
            Calculate … and … for each Rk using Eqs. 9 and 10
            Calculate bias strength … for each Rk according to Eq. 12
            Calculate tentative reaction rate … for all Rk according to Eq. 14
            Calculate final biased reaction rate … for all Rk according to Eq. 17
            Calculate …
        end if
        Retrieve …, …, …, …, and … from H using key …
        Generate two uniform random numbers … and …
        t ← t + τi − 1,
        i ← i + 1
    end while
    if t < θ and … then wM ← wM + w, vM ← vM + w², Ns ← Ns + 1
    end if
end while
return …
return …
return …
return Success Rate: s = Ns/M
Algorithm 2. Determination of the look-ahead step κ and the parameter search range l.
1: // Input (X, R, , D, θ)
2: Look-ahead step: κ ← 2
3: Maximum range of parameter search space for determining λ1 and λ2: l ← 1.0
4: Sample size: M ← 1000
5: Success rate: s ← 0
6: while 1 do ▷ Determine the optimal κ
7:     s ← s + success rate of ABSIS(X, R, , D, θ, κ, M, λ1 = 0.0, λ2 = 1.0)
8:     s ← s + success rate of ABSIS(X, R, , D, θ, κ, M, λ1 = 1.0, λ2 = 0.0)
9:     s ← s + success rate of ABSIS(X, R, , D, θ, κ, M, λ1 = 1.0, λ2 = 1.0)
10:    s ← s/3
11:    if s > 0.5 then
12:        break
13:    end if
14:    κ ← κ + 1
15: end while
16: s ← success rate of ABSIS(X, R, , D, θ, κ, M, λ1 = 0.5, λ2 = 0.5)
17: if s > 0.8 then ▷ Determine the maximum range for parameter search: l
18:    l ← 0.5
19: end if
20: return κ
21: return l
To determine κ, we make the reasonable assumption that longer look-ahead paths lead to better bias parameters. Starting from κ = 2, we test different κ values with an increment of 1 using 10^3 ABSIS paths. We take the first value of κ that gives an average success rate s of >0.50 at three different parameter locations of (λ1, λ2) = (0.0, 1.0), (1.0, 0.0), and (1.0, 1.0). This is very efficient, as it typically takes only 3 × 10^3 ABSIS paths to evaluate one κ. This procedure is summarized in Algorithm 2.
To determine the optimal biasing parameters (λ1, λ2) ∈ [0.0, 1.0] × [0.0, 1.0], we use a grid search, in which 10^3 paths are generated at each grid point. The sample variance, success rate, total path weight, and total weight square are stored at each grid point. We assume that the success rate s of ABSIS increases monotonically with the parameters λ1 and λ2, at the cost of reduced diversity among sampled paths. We first evaluate s at (λ1, λ2) = (0.5, 0.5) using 10^3 ABSIS paths. If s > 0.8, we focus on exploring more diverse paths and restrict our search space to (λ1, λ2) ∈ [0.0, 0.5] × [0.0, 0.5]. Otherwise, the search space remains [0.0, 1.0] × [0.0, 1.0].
We start at (λ1, λ2) = (0.0, 0.0), and move first along the direction of λ2, and then continue at an increased λ1 value, all with an interval of Δ = 0.1. We stop our search along the λ2 direction if s > 0.8. If s at a specific point of (λ1, λ2) is 0.5 better than its visited neighbors in either the λ1 or the λ2 directions, we retrospectively increase the number of grid points in that direction with a finer interval of Δ′ = 0.02, and carry out searches on these grid points. After the search concludes, grid points with the smallest variance and s ∈ [0.1, 0.8] are taken as candidates.
We repeat this search process starting at (0.5, 0.5) again. The first candidate grid point that is again identified from a second independent search is taken as our final choice. When no candidate grid points are found in two independent searches, we repeat the overall search process, and update the stored sampling variances and success rates with results from new samples, until an optimal parameter pair is found. To further reduce computing costs, we skip grid points with previous s outside the range of [0.1, 0.8] when updating variance and success rates. The procedure for parameter estimation is summarized in Algorithm 3.
BIOLOGICAL EXAMPLES
Below we describe examples of applying ABSIS to four biochemical reaction networks. We show that ABSIS can provide accurate estimation of transition probabilities with efficient computation. Results are then compared with the true answers obtained from the finite buffer dCME method, and with those obtained using other methods (the dwSSA method,23 as well as the swSSA27 and sdwSSA26 methods when possible), with differences discussed.
Birth-death process
The birth-death process is a simple chemical reaction system that involves a single molecular species and two reactions: synthesis and degradation. The network and parameters are specified as follows:
(21) |
We study the problem of estimating the rare event probability, p(x(t) = 80|x(0) = 40, t ⩽ θ), that the system transits from the initial state x(0) = 40 to the target state x(t) = 80 within the time threshold θ = 100. This same problem was studied in Daigle et al.23 and Roh et al.27
Algorithm 3. Estimation of the bias parameters λ1 and λ2.
1: // Input (X, R, , D, θ, κ, l)
2: Sample size: M ← 1000
3: Bias parameters: λ1 ← 0, λ2 ← 0
4: Grid size: Δ = 0.1 and refined grid size: Δ′ = 0.02
5: Initialize hash tables
6: Initialize hash tables
7: Total sample size for parameter estimation: Mtot ← 0
8: while 1 do
9:     while λ1 ⩽ l do
10:        while λ2 ⩽ l do
11:            for i = 1 → 2 do
12:                if (λ1, λ2) … AND … then
13:                    λ2 ← λ2 + Δ
14:                    i ← i + 1, and go to next iteration.
15:                end if
16:                […, …] = ABSIS(X, R, , D, θ, κ, M, λ1, λ2)
17:                Mtot ← Mtot + M
18:                if … then
19:                    …
20:                else
21:                    Update … using samples
22:                end if
23:                if … then
24:                    Repeat lines 16–22 for refined grids in [λ2 − Δ, λ2] with interval Δ′
25:                end if
26:                if … then
27:                    Repeat lines 16–22 for refined grids in [λ1 − Δ, λ1] with interval Δ′
28:                end if
29:            end for
30:            λ2 ← λ2 + Δ
31:        end while
32:        λ1 ← λ1 + Δ
33:    end while
34:    if … then Exit
35:    end if
36: end while
37: return λ1 and λ2
38: return total sample size: Mtot
Exact probability landscape and transition probability
We first enumerate the full state space S of the birth-death model of Eq. 21, starting from the initial state of x(0) = 40, using the finite state buffer dCME method with a buffer size of 200.8, 19 There are a total of 241 microstates. To calculate the exact rare event transition probability p(80|40, t ⩽ θ), the 241 × 241 transition rate matrix A is modified by making the target state x = 80 an absorbing state, following the approach of Ref. 20. The exact transition probability p(80|40, t ⩽ θ) can then be computed from the modified matrix:
where the initial state probability landscape has and 0 for all other 240 states. p(80|40, t ⩽ θ) is obtained from . We use the matrix exponential software EXPOKIT37 to calculate p(80|40, t ⩽ θ) for θ = 100 numerically. The exact transition probability is found to be 2.986 × 10−7. This indicates that there would be only about 3 successful transition paths observed in 10 million sampled paths if the unmodified original SSA were used.
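A minimal sketch of this calculation is given below, using scipy.linalg.expm in place of EXPOKIT. The synthesis and degradation rate constants are placeholders only (the actual values are those given in Eq. 21), the state space runs from 0 to the initial copy number plus the buffer (241 states), and the generator is assembled column-wise so that dp/dt = A p with the target state made absorbing.

import numpy as np
from scipy.linalg import expm

# Placeholder rate constants only; substitute the values given in Eq. 21.
k_syn, k_deg = 1.0, 0.1
x0, buf, target, theta = 40, 200, 80, 100.0
N = x0 + buf                              # largest reachable copy number (241 states: 0..240)

# Generator A for dp/dt = A p over states 0..N, with the target made absorbing.
A = np.zeros((N + 1, N + 1))
for n in range(N + 1):
    if n == target:
        continue                          # no outflow from the absorbing target state
    if n < N:
        A[n + 1, n] += k_syn              # synthesis: n -> n + 1
        A[n, n] -= k_syn
    if n > 0:
        A[n - 1, n] += k_deg * n          # degradation: n -> n - 1
        A[n, n] -= k_deg * n

p0 = np.zeros(N + 1)
p0[x0] = 1.0                              # probability 1 at the initial state x = 40
p_theta = expm(A * theta) @ p0            # probability landscape at time theta
print("p(target reached by theta) =", p_theta[target])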
The calculated exact time-evolving probability landscape of the system is plotted in Fig. 4a. The blue and black curves show the landscapes at time t = 100 and at the steady state, respectively. There is one high probability region centered at x = 40 in both landscapes (green dots in Fig. 4a). The target state x = 80 (red dots in Fig. 4a) is located in a region with very low probability. The transition from x = 40 to x = 80 therefore has very low probability, as a large barrier between these two states must be crossed.
Determination of look-ahead steps and bias parameters
The look-ahead step for ABSIS is determined to be κ = 2, and the parameter search range is determined to be l = 0.5 by running Algorithm 2. Algorithm 3 is then used to determine λ1 and λ2 from the search space [0, 0.5] × [0, 0.5]. The optimal parameters are determined to be λ1 = 0.50 and λ2 = 0.18, which have a success rate of 0.63. Figures 4b, 4c show the variances of sampling weights and the success rates of reaching the target state at different values of λ1 and λ2. The optimal parameters λ1 = 0.50 and λ2 = 0.18 are located in the lowest variance region of the parameter space (yellow dot in Fig. 4b). The total sample size for the parameter search is 9.5 × 10^4, which is much smaller than the reported sample size of 7 × 10^5 in dwSSA.23
Estimated transition probability
The estimated transition probability and variance from four independent simulations are plotted in Fig. 4d for sample sizes M of 10^4, 10^5, 10^6, 10^7, and 10^8 used for each simulation. The estimated rare transition probability with M = 10^7 is:
which is very close to the exact value of 2.986 × 10−7 (red line in Fig. 4d). In addition, ABSIS converges rapidly as the sample size increases.
We compare our results with those from the dwSSA method, which was implemented following Ref. 23. We use the same bias constants of γ1 = 1.454 and γ2 = 0.686 as in Daigle et al.23 The probability estimated from dwSSA is 2.937 × 10−7 ± 0.017 × 10−7 using a sample size of M = 10^7, which is accurate but less so than that of ABSIS. Additionally, the ABSIS method has a higher success rate (0.63) than the dwSSA method (0.59). The comparisons of mean standard deviations between ABSIS and dwSSA, calculated from four independent simulations using different sample sizes, are plotted in Fig. 4e. ABSIS has a standard deviation about one order of magnitude smaller than dwSSA. In addition, ABSIS requires far fewer samples to achieve the same accuracy as dwSSA. For example, 10^4 ABSIS samples have a smaller standard deviation than 10^6 dwSSA samples. We also compare the ABSIS estimation with the results from swSSA as reported in Roh et al.27 The 95% confidence interval estimated from 10^5 ABSIS samples is 2.986 × 10−7 ± 0.020 × 10−7, which is comparable to the swSSA estimate of 2.986 × 10−7 ± 0.019 × 10−7 using the same sample size.27
The sample variances of the ABSIS method when using different sample sizes are shown in Fig. 4f (blue line), along with variances using dwSSA sampling (red line, Fig. 4f). Overall, ABSIS gives consistently small variance. At M = 10^7, the variance (1.0 × 10−13) is two orders of magnitude smaller than the variance of 3.1 × 10−11 when using the dwSSA method. We further note that the variance of estimated transition probabilities using dwSSA seems to increase with the sample size.
Overall, our results show that the ABSIS method converges rapidly to the true transition probability when sample size is increased, whereas the dwSSA method converges less rapidly and has larger variance.
Bias mechanism of ABSIS
Examining the forward-moving probability (Figs. 5a, 5b, green lines) and the backward-moving probability (red lines) of both reactions R1 and R2 at different states helps to gain insight into how ABSIS works. The synthesis reaction R1 has a much higher forward-moving than backward-moving probability in the majority of states, and the degradation reaction R2 has a much higher backward-moving probability in the majority of states. These observations suggest that in most cases, one should bias to encourage reaction R1 and to discourage reaction R2.
As the system approaches the target state, the forward-moving probability of R1 (green line in Fig. 5a) decreases dramatically, while the backward-moving probability of R2 (red line in Fig. 5b) increases. This is due to the fact that the propensity for backward-moving becomes stronger as the rate of the degradation reaction R2 increases monotonically with the copy number of X, while the rate of the synthesis reaction R1 remains constant.
It is clear that constant biases will not work well for this problem, as the rare event transition requires overcoming the steep probability barrier between the two states of x = 40 and x = 80 (Fig. 5c, blue line). The optimal bias strengths will need to depend on the current propensity of forward moving, and should be adaptive.
For this problem, the ABSIS strategy of designing biases based on estimations from look-ahead paths works well. The bias strengths generated by the ABSIS algorithm for both reactions R1 and R2 are plotted in Figs. 5c, 5d (black lines), along with the steady state probability landscape (blue lines) as reference. In general, the biases for R1 are all favorable (Fig. 5c), and the biases for R2 are all unfavorable (Fig. 5d). However, the strength of the bias is adaptively adjusted following changes in the reaction propensity, as well as the need for overcoming the probability barriers. When approaching the target state, bias is set such that R1 is much more strongly encouraged to produce more X, whereas R2 is severely repressed to reduce the degradation of X.
Overall, by utilizing future information from κ = 2 look-ahead paths, ABSIS can identify automatically the reaction to encourage, as well as the reaction to discourage at any given state. The forward and backward-moving probabilities estimated from look-ahead paths can aid in crossing the probability barrier of rare event transitions. By adaptively changing biases according to changes in reaction propensity and future information about the probability barrier, the ABSIS method can provide estimates for the birth-death model with much smaller sampling variance compared to methods using constant biases such as the dwSSA method.23
Reversible isomerization
We also apply the ABSIS method to the reversible isomerization network taken from the Ref. 26, where the sdwSSA method was applied.26 The reversible isomerization network involves two molecular species and two reactions:
(22) |
Our goal is to estimate the rare event probability that the system transitions from an initial state to any state with 30 copies of B within the time interval of t ⩽ 10.
Exact probability landscape and transition probability
We first enumerate the full state space S of the reversible isomerization model in Eq. 22, starting from the initial state , using the dCME method. This reversible isomerization model is a closed system, and therefore no buffer is needed. There are a total of 101 microstates in the state space S. The exact transition probability of the rare event is calculated by solving the matrix exponential problem using the EXPOKIT software,37 in which the transition rate matrix is modified by making the target states absorbing states, following the approach of Ref. 20. The exact transition probability is found to be 1.1911 × 10−5.
The time-evolving probability landscape of the system is calculated, and its projection onto B is plotted in Fig. 6a. The blue and black curves show the landscape at t = 10 and at the steady state, respectively. There is one high probability region centered at xB = 10 (red circles in Fig. 6a). The target state xB = 30 (red solid dots in Fig. 6a) is located in a region with very low probability. The transition from xB = 0 (green dots in Fig. 6a) to xB = 30 therefore has very low probability, as a large barrier between these two states needs to be crossed.
Determination of look-ahead steps and bias parameters
The look-ahead step for ABSIS is determined to be κ = 2 and the parameter search range to be l = 0.5 after running Algorithm 2. Algorithm 3 is used to determine λ1 and λ2 from the search space [0, 0.5] × [0, 0.5]. The optimal parameters are found to be λ1 = 0.20 and λ2 = 0.16, which have a success rate of 0.76. Figures 6b, 6c show the variances of sampling weights and the success rates of reaching the target state at different values of λ1 and λ2. The optimal parameter pair λ1 = 0.20 and λ2 = 0.16 is located in the lowest variance region of the parameter space (yellow dot in Fig. 6b). The total sample size for the parameter search is 9.1 × 10^4, which is much smaller than the reported 7 × 10^5 samples for parameter estimation for dwSSA and sdwSSA in Ref. 26.
Estimated transition probability
The estimated transition probability and standard deviation obtained by averaging four independent simulations using different sample sizes of M = 10^4, 10^5, 10^6, 10^7, and 10^8 are plotted in Fig. 6d. ABSIS simulation provides an accurate estimate of 1.1909 × 10−5 ± 0.0004 × 10−5 using the sample size of M = 10^7. In addition, ABSIS converges rapidly to the exact rare event probability computed from the dCME method (red line in Fig. 6d) as the sample size increases. When the same sample size M = 10^6 as that of Roh et al.26 is used, the ABSIS method gives an estimate of 1.192 × 10−5 ± 0.001 × 10−5, with a standard deviation only about one half of that of the sdwSSA estimate of 1.193 × 10−5 ± 0.002 × 10−5.26 When using the same sample size M = 10^5 as in Roh et al.,27 the ABSIS method gives the 95% confidence interval estimate of 1.191 × 10−5 ± 0.007 × 10−5, whose 95% confidence interval is much smaller than that of the swSSA estimate of 1.190 × 10−5 ± 0.011 × 10−5 with the same sample size.27
We also compare our results with those from the dwSSA method. For dwSSA sampling, we use the bias constants (γ1 = 1.301, γ2 = 0.719) for the two reactions in the network as reported in Roh et al.26 The rare event probability estimated from dwSSA using the sample size of M = 10^7 is 1.278 × 10−5 ± 0.060 × 10−5, and the success rate is only 0.07. The comparisons of mean standard deviations between ABSIS and dwSSA, calculated from four independent simulations using different sample sizes, are plotted in Fig. 6e. ABSIS results show 1–2 orders of magnitude smaller standard deviations (Fig. 6e) than dwSSA in estimating the rare event probability. In addition, ABSIS requires far fewer samples to achieve the same accuracy as dwSSA. In this example, 10^4 ABSIS samples have a much smaller standard deviation than 10^8 dwSSA samples.
The ABSIS method gives more accurate estimates than dwSSA (1.191 × 10−5 vs 1.278 × 10−5 at M = 10^7, 1.192 × 10−5 vs 1.201 × 10−5 at M = 10^6, and 1.191 × 10−5 vs 1.075 × 10−5 at M = 10^5, compared to the exact value of 1.191 × 10−5), and has a higher success rate (0.76) compared to the dwSSA method (0.07). The ABSIS method also gives estimates with much smaller standard deviations than swSSA (at M = 10^5, where data are reported in Ref. 27) and sdwSSA (at M = 10^6, where data are reported in Ref. 26). In addition, it gives consistently smaller sample variances (1.3 × 10−10 at M = 10^7), four orders of magnitude smaller than the variance of 4.2 × 10−6 obtained when using the dwSSA method. The sample variance of ABSIS using different sample sizes is shown in log scale in Fig. 6f (blue line), along with variances using dwSSA sampling (red line, Fig. 6f).
Bistable Schlögl model
The Schlögl model is a one-dimensional bistable system first proposed in Ref. 38, and extensively studied subsequently.39, 40, 41 It is an auto-catalytic network consisting of one molecular species (X) whose concentration can change through four reactions:38, 39
(23) |
where A and B are species with constant concentrations (set to a = 1 and b = 2, respectively). Values of reaction rate constants k1, k2, k3, and k4 are taken from Vellela and Qian.39 The volume of the system is fixed as V = 25. Following Vellela and Qian,39 the reaction rates are calculated using formulas , , A3(x) = bk3V, and A4(x) = k4x, respectively. Our task is to estimate the probability p(92|0, t ⩽ θ) that the Schlögl system transitions from an initial state x = 0 to the target state x = 92 within a given time threshold of θ = 2.
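A sketch of the four propensity functions is given below. The expressions for A3 and A4 follow the text; the forms assumed here for A1 and A2 are the standard mass-action expressions for the bimolecular and trimolecular steps A + 2X → 3X and 3X → A + 2X, and the rate constants are placeholders for the values given in Eq. 23.

V = 25.0                      # system volume
a_conc, b_conc = 1.0, 2.0     # constant amounts of species A and B (a = 1, b = 2)
k1 = k2 = k3 = k4 = 1.0       # placeholder rate constants; use the values of Eq. 23

def schlogl_propensities(x):
    """Propensities of the four Schlogl reactions at copy number x of X.
    A3 and A4 follow the expressions given in the text; the forms of A1 and A2
    are the mass-action expressions assumed here for A + 2X -> 3X and 3X -> A + 2X."""
    A1 = k1 * a_conc * x * (x - 1) / V           # A + 2X -> 3X
    A2 = k2 * x * (x - 1) * (x - 2) / V ** 2     # 3X -> A + 2X
    A3 = k3 * b_conc * V                         # B -> X
    A4 = k4 * x                                  # X -> B
    return [A1, A2, A3, A4]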
Exact probability landscape and transition probability
We first enumerate the full state space S of the Schlögl model of Eq. 23, starting from the initial state of x = 0, using the dCME method with a buffer size of 1,000. There are 1,001 microstates in the state space S. The exact transition probability of the rare event p(92|0, t ⩽ 2) is calculated by solving the matrix exponential problem using the EXPOKIT software,37 in which the transition rate matrix is modified by making the target state x = 92 an absorbing state, following the approach of Ref. 20. The calculated exact transition probability is 5.419 × 10−5. That is, if 10^5 paths are sampled using the original SSA method, there will only be about 5 successful transition paths.
The calculated exact time-evolving probability landscape of the system is plotted in Fig. 7a. The blue and black curves show the landscape at time t = 2 and at the steady state, respectively (Fig. 7a). There are two high probability regions centered at x = 4 (red circle on black curve) and x = 92 (red solid dot on black curve), respectively, on the steady state probability landscape (black curve). They are separated by a low probability barrier. The probability landscape at time t = 2 (blue curve) shows a much sharper peak centered at x = 3 (red circle on blue curve). It is clear that transition paths from x = 0 to x = 92 within t = 2 have a steep barrier to cross.
Determination of look-ahead steps and bias parameters
The look-ahead step for ABSIS in the Schlögl model is determined to be κ = 2, and the parameter search range is determined to be l = 0.5 after running Algorithm 2. Algorithm 3 is then used to determine λ1 and λ2 from the search space [0, 0.5] × [0, 0.5]. The optimal parameters are found to be λ1 = 0.10 and λ2 = 0.40, which have a success rate of 0.15. Figures 7b, 7c show the variances of sampling weights and the success rates of reaching the target state at different values of λ1 and λ2. The optimal parameter pair λ1 = 0.10 and λ2 = 0.40 is located in the lowest variance region of the parameter space (yellow dot in Fig. 7b). The total sample size for the parameter search is 3.28 × 10^5, which is much smaller than the typical sample size of 7 × 10^5 reported in dwSSA.23
Estimated transition probability
The estimated transition probability and variance obtained by averaging four independent simulations using different sample sizes of M = 10^4, 10^5, 10^6, 10^7, and 10^8 are plotted in Fig. 7d. With the sample size M of 10^7, ABSIS simulation provides an accurate estimate of pABSIS(92|0, t ⩽ 2) = 5.394 × 10−5 ± 0.009 × 10−5, which is very close to the exact value of 5.419 × 10−5. In addition, ABSIS converges rapidly (Fig. 7e) as the sample size increases.
We also compare our results with those obtained using the dwSSA method. For dwSSA sampling, we followed the original authors' recommendation of choosing parameters such that the minimum fraction ρ of trajectories reaching the target states is 0.02.23 This gives the bias constants of γ1 = 1.115, γ2 = 0.967, γ3 = 1.171, and γ4 = 0.872 for reactions 1–4 in the network, respectively. The rare event probability estimated from dwSSA is 5.976 × 10−5 ± 0.342 × 10−5 using a sample size M = 10^7, which is less accurate than that of ABSIS (5.394 × 10−5 ± 0.009 × 10−5 vs. the exact value of 5.419 × 10−5). It also has a lower success rate of 0.02 compared to ABSIS (0.15). The comparisons of mean standard deviations between ABSIS and dwSSA, calculated from four independent simulations using different sample sizes, are plotted in Fig. 7e. ABSIS results show about one order of magnitude smaller standard deviations (Fig. 7e) than dwSSA. In terms of computing efficiency, ABSIS sampling is able to achieve better accuracy than dwSSA with 1/10 of the samples.
The sample variance of ABSIS using different sample sizes is shown in Fig. 7f (blue line), along with variances using dwSSA sampling (red line). Overall, ABSIS sampling gives consistently small sample variances (8.712 × 10−8 at M = 10^7), roughly four orders of magnitude smaller than the variance of 3.233 × 10−4 when using the dwSSA method.
Bias Mechanism of ABSIS
By examining the forward-moving probability (green lines in Figs. 8a, 8b, 8c, 8d) and the backward-moving probability (red lines) of all four reactions at different states, we found that the synthesis reactions R1 and R3 have much higher forward-moving than backward-moving probabilities in the majority of states, and the degradation reactions R2 and R4 have much higher backward-moving probabilities in the majority of states. These observations suggest that reactions R1 and R3 should be encouraged and reactions R2 and R4 should be discouraged.
Obviously, constant biases will not work well for this problem because of the steep barrier crossing region between the initial and the target state (blue curve in Fig. 8e). The optimal bias strengths should be adaptive and should be determined by the complex probability landscape of the system.
For the Schlögl model, the ABSIS strategy works well. The bias strengths calculated by the ABSIS algorithm for all four reactions are plotted in Figs. 8e, 8f, 8g, 8h (black curves), along with the steady state probability landscape as a reference (blue curves). The biases for R1 and R3 are all favorable (Figs. 8e, 8g), and the biases for R2 and R4 are all unfavorable (Figs. 8f, 8h). Interestingly, the most strongly biased regions of R1, R3, and R4 overlap with the steepest barrier-crossing region in the landscape (Figs. 8e, 8g, 8h), which shows that ABSIS can capture the urgent need to overcome the probability barriers there. The insignificant bias of R2 is due to its smaller reaction rates, although it has a similar backward-moving probability to R4 (Fig. 8e).
Estimating the rare event probability for the Schlögl model is a difficult task for methods with constant biases, as reported in Ref. 34. However, by utilizing future information from look-ahead paths, ABSIS successfully estimates the probability of the rare event transition in the bistable Schlögl model accurately and with small sampling variance, and compares favorably to the constant-bias dwSSA method.23
Enzymatic futile cycle
The enzymatic futile cycle is a ubiquitous network motif consisting of six different molecular species and six reactions. Samoilov et al. studied this network in detail.42 The molecular species, reactions, and corresponding reaction rate constants of the enzymatic futile cycle system are as follows:
(24) |
Estimating the rare event probability of the enzymatic futile cycle system has been the subject of recent studies using the wSSA method22 and the dwSSA method.23 Here the goal is to estimate the probability that the system, starting from the initial state , reaches any state with exactly 25 copies of X5 within the time threshold of θ = 100. The same task was also studied in Daigle et al. using dwSSA.23
Exact probability landscape and transition probability
We first enumerate the full state space S of the futile cycle model of Eq. 24, starting from the initial state , using the dCME method. As the futile cycle model is a closed system, no buffer is needed for the dCME method. There are a total of 400 microstates in the state space S. The exact transition probability of the rare event p(x5 = 25|(1, 50, 0, 1, 50, 0), t < 100) is calculated by solving the matrix exponential problem using the EXPOKIT software,37 in which the transition rate matrix is modified by making the target states absorbing states. The exact transition probability is calculated to be 1.738 × 10−7. That is, if we use the original SSA method, there will be only about 2 successful transition paths among 10 million sampled trajectories.
The time-evolving probability landscape of the system is calculated, and its projection onto X5 is plotted in Fig. 9a. The inset figure in Fig. 9a shows the time-evolving landscape from time t = 1 to t = 100, and the main figure shows the time frame from t = 100 to t = 10^4. The blue and black curves show the landscape at t = 100 and at the steady state, respectively. There is only one high probability region in the projected steady state probability landscape (black curve), which is centered at x5 = 50 (green dots). The probability landscape at time θ = 100 (blue curve) shows a much sharper peak centered at the same location x5 = 50. It is clear that transition paths from x5 = 50 to x5 = 25 within t ⩽ 100 have a steep barrier to cross, although there is no such barrier if the sampling time is not restricted.
Determination of look-ahead steps and bias parameters
The number of look-ahead steps for ABSIS in the futile cycle model is determined to be κ = 3, and the parameter search space is determined to be l = 1.0 after running Algorithm 2. Algorithm 3 is then used to determine λ1 and λ2 from the search space [0, 1.0] × [0, 1.0]. The optimal parameters are determined to be λ1 = 0.60 and λ2 = 0.40, which give a success rate of 0.41. Figures 9b, 9c show the variances of sampling weights and the success rates of reaching the target state at different values of λ1 and λ2. The optimal parameter pair λ1 = 0.60 and λ2 = 0.40 is located in the lowest-variance region of the parameter space (yellow dot in Fig. 9b). The total sample size for the parameter search is 2.11 × 10⁵, which is much smaller than the reported sample size of 7 × 10⁵ using dwSSA.23
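A schematic of this kind of two-parameter search is sketched below; the ABSIS sampler itself is abstracted as a callable returning a trajectory weight and a success flag, and the grid resolution and pilot sample size are arbitrary illustrative choices rather than the settings of Algorithm 3.

```python
import numpy as np

def search_bias_parameters(run_absis_trajectory, l=1.0, grid=11, pilot=1000, seed=1):
    """Scan (lambda1, lambda2) over [0, l] x [0, l]; keep the pair whose pilot runs
    give the smallest variance of trajectory weights (weights are 0 for failed runs)."""
    rng = np.random.default_rng(seed)
    best = None
    for lam1 in np.linspace(0.0, l, grid):
        for lam2 in np.linspace(0.0, l, grid):
            weights, successes = [], 0
            for _ in range(pilot):
                w, hit = run_absis_trajectory(lam1, lam2, rng)
                weights.append(w if hit else 0.0)
                successes += int(hit)
            var = np.var(weights)
            if successes > 0 and (best is None or var < best[0]):
                best = (var, lam1, lam2, successes / pilot)
    return best   # (weight variance, lambda1, lambda2, success rate)
```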
Estimated transition probability
The estimated transition probabilities and standard deviations, obtained by averaging four independent simulations at sample sizes of M = 10⁴, 10⁵, 10⁶, 10⁷, and 10⁸, are plotted in Fig. 9d. ABSIS provides an accurate estimate of 1.730 × 10⁻⁷ ± 0.001 × 10⁻⁷ at sample size M = 10⁷, which is very close to the exact value of 1.738 × 10⁻⁷. The success rate is 0.41. In addition, ABSIS converges rapidly to the exact rare event probability (red line in Fig. 9d) as the sample size increases.
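For reference, the quantities plotted in Fig. 9d are the standard importance-sampling estimator and its spread over independent runs; in code form (with trajectory weights taken to be zero for trajectories that never reach the target):

```python
import numpy as np

def rare_event_estimate(weights):
    """Importance-sampling estimate: the mean trajectory weight over all M trajectories."""
    return float(np.mean(weights))

def estimate_and_std(weight_arrays):
    """Average estimate and standard deviation over independent simulations."""
    estimates = [rare_event_estimate(w) for w in weight_arrays]
    return float(np.mean(estimates)), float(np.std(estimates, ddof=1))
```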
We also compared our results with those obtained using the dwSSA method. For dwSSA sampling, we use the bias constants (γ1 = 1.000, γ2 = 1.003, γ3 = 0.320, γ4 = 1.003, γ5 = 0.993, and γ6 = 3.008) taken from Daigle et al.23 for the six reactions in the network. The rare event probability estimated from dwSSA is 1.741 × 10⁻⁷ ± 0.001 × 10⁻⁷ at a sample size of M = 10⁷, which is slightly better than the estimate from ABSIS, with a higher success rate of 0.67. The mean standard deviations of ABSIS and dwSSA, calculated from four independent simulations at different sample sizes, are compared in Fig. 9e. ABSIS shows about 1.5 times larger standard deviations than dwSSA in estimating the rare event probability of the futile cycle model (Fig. 9e). In terms of computing efficiency, ABSIS therefore needs about 1.5 times more samples than dwSSA to achieve the same accuracy.
The sample variances of ABSIS at different sample sizes are shown in Fig. 9f (blue line), along with the variances of dwSSA sampling (red line). ABSIS gives consistently small sample variances (1.708 × 10⁻¹³ at M = 10⁷), although this is about twice the dwSSA variance of 7.901 × 10⁻¹⁴, and ABSIS also has a lower success rate of 0.41.
Bias mechanism of ABSIS
The enzymatic futile cycle network has different characteristics from the other networks studied here. It includes two enzymes, each with an active and an inactive form, for a total of four enzyme molecular species: the first enzyme X1 and its inactive form X3, and the second enzyme X4 and its inactive form X6. However, as each enzyme has only one copy in the system (X1 + X3 = 1 and X4 + X6 = 1), the occurrence of reactions is highly restricted by the availability of the enzymes. To study the biasing mechanism of the reactions, we project the forward-moving probability and the backward-moving probability of each reaction onto the space of species X5 and fixed combinations of X1 and X4 (Figs. 10a, 10b, 10c, 10d, 10e, 10f for forward-moving probabilities and Figs. 10g, 10h, 10i, 10j, 10k, 10l for backward-moving probabilities). We found that the surfaces of forward- and backward-moving probabilities are rather rugged. The forward- and backward-moving probabilities of the same reaction can be very different for microstates that differ by only one copy of X1 or X4. For example, the forward-moving probability of R1 at X1 = 1, X4 = 0 (red histograms in Fig. 10a) is close to 1, but with only one copy difference in X4, the forward-moving probability of R1 at X1 = 1, X4 = 1 is very close to 0 (yellow histograms in Fig. 10a). This ruggedness arises because neighboring microstates have different available enzymes, so very different sets of reactions can occur according to Eq. 24; in fact, no microstate allows all six reactions to occur simultaneously. The ruggedness of the surfaces of forward- and backward-moving probabilities requires biases with large fluctuations (as shown in Figs. 11c, 11f), and our bias scheme offers no improvement in sampling variance over dwSSA for this model.
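To illustrate how such reaction-specific forward- and backward-moving probabilities can be obtained from exhaustive enumeration of κ-step look-ahead paths, a simplified sketch is given below. Classifying a path by the net change in X5 (a decrease counting as forward progress toward the target) is a stand-in used here for illustration only, not the exact criterion used by ABSIS.

```python
from itertools import product
import numpy as np

def kappa_step_moving_probabilities(x, j, propensities, stoich, kappa=3):
    """Illustrative forward/backward-moving probabilities for firing reaction j at
    state x: enumerate every kappa-step reaction sequence that can follow, weight
    each path by the product of its normalized propensities, and classify it by
    the net change in X5 (index 4) relative to x."""
    n_rxn = stoich.shape[0]
    p_forward = 0.0
    p_backward = 0.0
    for path in product(range(n_rxn), repeat=kappa):
        y, prob = np.array(x) + stoich[j], 1.0
        for r in path:
            a = propensities(y)
            a0 = a.sum()
            if a0 == 0.0 or a[r] == 0.0:
                prob = 0.0                     # this path cannot occur
                break
            prob *= a[r] / a0                  # probability of selecting reaction r next
            y = y + stoich[r]
        if prob == 0.0:
            continue
        net_change_x5 = y[4] - x[4]
        if net_change_x5 < 0:                  # net move toward the target x5 = 25
            p_forward += prob
        elif net_change_x5 > 0:                # net move away from the target
            p_backward += prob
    return p_forward, p_backward
```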
Although reactions R1, R2, R4, and R6 have overall larger forward-moving probabilities (Fig. 10) and should be encouraged, and reactions R3 and R5 have overall larger backward-moving probabilities and should be discouraged, the ABSIS biases for reactions R1, R2, R4, and R5 are all very close to 1; only reactions R3 and R6 are significantly biased (Fig. 11), as they have the slowest rates among all six reactions. Biases for R3 cluster into two nearly flat levels with means of 0.3364 and 0.4651, respectively (Fig. 11c), and biases for R6 cluster around two means of 2.6330 and 3.5129, respectively (Fig. 11f). In general, the biases produced by ABSIS for the futile cycle network are not very different from the constant biases of dwSSA used in Daigle et al.23
Overall, the ABSIS method is comparable to the dwSSA method for studying the futile cycle network. It provides accurate estimates of the rare event probability, and its sampling variance stays consistently small at around 1.7 × 10⁻¹³, regardless of sample size. ABSIS also correctly identifies the reactions that need to be encouraged or discouraged, although its sampling variance is about twice that of dwSSA due to the ruggedness of the surfaces of forward- and backward-moving probabilities.
DISCUSSIONS AND CONCLUSIONS
Sampling rare events is an important task for studying critical events in biological processes. In this work, we described a general theoretical framework for obtaining optimized biases in sampling individual reactions to estimate probabilities of rare events. We further developed a practical algorithm named ABSIS for efficient estimation of rare event probabilities. By adopting a look-ahead strategy and examining κ-step look-ahead paths following each reaction from the current microstate, we estimate the reaction-specific and state-dependent forward-moving and backward-moving probabilities of the system. These probabilities are then used to adaptively adjust the bias towards selecting each reaction. Overall, ABSIS is well suited for studying rare events in networks with complex probability landscapes and steep probability barriers.
Our method addresses a major challenge in estimating rare event probabilities in biological networks, namely, the need to cross barriers on the probability landscape. As reactions in a network proceed, the local neighborhood of the probability landscape changes, and different biases are often necessary for barrier crossing. Unlike previous importance sampling methods such as sdwSSA26 and swSSA,27 in which biases are based only on reaction rates in the current state with no consideration of future information, the ABSIS method can detect barrier-crossing regions in the probability landscape by incorporating future information. The bias introduced by the ABSIS method depends not only on the current state, but also on the need to cross the probability barrier, which is detected by the κ-step look-ahead strategy. The calculation of κ-step forward-moving and backward-moving probabilities is equivalent to solving a small, local κ-step version of the chemical master equation.19
Our method also addresses the issue of proliferation of parameters and associated computational costs. Regardless of the number of reactions in the system and the complexity of the network, bias strengths for all reactions in ABSIS are adjusted using only two general parameters: λ1 for promoting forward-moving reactions, and λ2 for repressing backward-moving reactions. The biasing scheme is designed such that forward-moving reactions with lower reaction rates are encouraged, and backward-moving reactions with higher reaction rates are repressed. As κ is small, bias strengths can be determined without lengthy simulations.
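Purely to illustrate the general shape of such a two-parameter scheme (this is not the exact ABSIS bias formula), one can imagine boosting the selection probability of forward-moving reactions with a strength set by λ1, damping backward-moving reactions with a strength set by λ2, and correcting the trajectory weight accordingly:

```python
import numpy as np

def biased_reaction_selection(a, p_forward, p_backward, lam1, lam2, rng):
    """Illustrative two-parameter bias on reaction selection only (waiting times are
    assumed to be drawn from the unbiased total propensity, as in the wSSA family).
    Reactions with large forward-moving probability are encouraged via lam1;
    reactions with large backward-moving probability are suppressed via lam2."""
    gamma = (1.0 + lam1 * np.asarray(p_forward)) / (1.0 + lam2 * np.asarray(p_backward))
    b = a * gamma                                         # biased propensities for selection
    j = rng.choice(len(a), p=b / b.sum())                 # biased reaction choice
    weight_factor = (a[j] / a.sum()) / (b[j] / b.sum())   # importance-weight correction
    return j, weight_factor
```

In such a scheme the trajectory weight is the product of the correction factors along the path, and setting λ1 = λ2 = 0 recovers unbiased SSA reaction selection.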
We have applied the ABSIS method to four biological networks: the birth-death process, the reversible isomerization, the bistable Schlögl model, and the enzymatic futile cycle model. ABSIS can accurately and efficiently estimate rare event probabilities for all examples. For the birth-death process and the Schlögl model, ABSIS estimates the rare event probabilities with a variance of about 1/100 of that of the dwSSA method.23 For the reversible isomerization model, the sampling variance of ABSIS is only about 1/10,000 of that of dwSSA. ABSIS also shows significant improvements in standard deviation in comparison to dwSSA: for the reversible isomerization, ABSIS estimates the rare event probability with a standard deviation only about 1/100 of that of dwSSA, and for the birth-death model and the bistable Schlögl model, the standard deviation of ABSIS sampling is less than 1/10 of that of dwSSA. In terms of computing efficiency, the smaller standard deviations indicate that ABSIS can achieve the same accuracy as dwSSA with only a small fraction of the sample size that dwSSA needs.
Although ABSIS has no significant advantage over constant-bias methods such as dwSSA in studying the futile cycle model, as the current bias scheme in ABSIS gives nearly constant biases there, its sampling variances remain comparable to those of dwSSA. Future work includes designing more sophisticated bias functions that capture the ruggedness of the probability landscape, which may provide better solutions for problems such as the enzymatic futile cycle model. In addition, replacing exhaustive enumeration of κ-step reaction paths with longer-term look-ahead path sampling of comparable computational cost may help explore the potential for barrier crossing at longer time scales; computational costs can be further reduced for larger networks if such long look-ahead sampling is strategically biased. It may also be possible to classify reaction networks based on their topology and rate constants, and to design different bias sub-schemes accordingly.
ACKNOWLEDGMENTS
We thank Dr. Rong Chen and Dr. Ming Lin for thoughtful discussions and suggestions. This work was supported by National Science Foundation (NSF) Grant Nos. DBI 1062328 and DMS-0800257, and National Institutes of Health Grant Nos. GM079804 and GM086145.
References
- Kim K. Y. and Wang J., PLOS Comput. Biol. 3(3), e60 (2007). 10.1371/journal.pcbi.0030060
- Ptashne M., A Genetic Switch: Phage Lambda Revisited, 3rd ed. (Cold Spring Harbor Laboratory Press, 2004).
- Tian T. and Burrage K., Proc. Natl. Acad. Sci. U.S.A. 103, 8372 (2006). 10.1073/pnas.0507818103
- Arkin A., Ross J., and McAdams H. H., Genetics 149(4), 1633 (1998).
- Aurell E., Brown S., Johanson J., and Sneppen K., Phys. Rev. E 65, 051914 (2002). 10.1103/PhysRevE.65.051914
- Aurell E. and Sneppen K., Phys. Rev. Lett. 88, 048101 (2002). 10.1103/PhysRevLett.88.048101
- Zhu X.-M., Yin L., Hood L., and Ao P., Funct. Integr. Genomics 4, 188 (2004). 10.1007/s10142-003-0095-5
- Cao Y., Lu H.-M., and Liang J., Proc. Natl. Acad. Sci. U.S.A. 107, 18445 (2010). 10.1073/pnas.1001455107
- Little J. W., Shepley D. P., and Wert D. W., EMBO J. 18, 4299 (1999). 10.1093/emboj/18.15.4299
- Ghosh K., Ozkan S. B., and Dill K. A., J. Am. Chem. Soc. 129, 11920 (2007). 10.1021/ja066785b
- Warren P. B., Phys. Rev. E 80, 030903 (2009). 10.1103/PhysRevE.80.030903
- Eftimie R., Dushoff J., Bridle B., Bramson J., and Earn D., Bull. Math. Biol. 73, 2932 (2011). 10.1007/s11538-011-9653-5
- Baylin S. and Herman J., Trends Genet. 16, 168 (2000). 10.1016/S0168-9525(99)01971-X
- Jones P. and Baylin S., Nat. Rev. Genet. 3, 415 (2002). 10.1038/nrg816
- Ao P., Galas D., Hood L., and Zhu X., Med. Hypotheses 70, 678 (2008). 10.1016/j.mehy.2007.03.043
- Huang S., Ernberg I., and Kauffman S., Semin. Cell Dev. Biol. 20, 869 (2009). 10.1016/j.semcdb.2009.07.003
- Faradjian A. K. and Elber R., J. Chem. Phys. 120, 10880 (2004). 10.1063/1.1738640
- Allen R. J., Valeriani C., and ten Wolde P. R., J. Phys.: Condens. Matter 21, 463102 (2009). 10.1088/0953-8984/21/46/463102
- Cao Y. and Liang J., BMC Syst. Biol. 2, 30 (2008). 10.1186/1752-0509-2-30
- Gross D. and Miller D., Oper. Res. 32, 343 (1984). 10.1287/opre.32.2.343
- Gillespie D. T., J. Phys. Chem. 81, 2340 (1977). 10.1021/j100540a008
- Kuwahara H. and Mura I., J. Chem. Phys. 129, 165101 (2008). 10.1063/1.2987701
- Daigle B., Roh M., Gillespie D., and Petzold L., J. Chem. Phys. 134, 044110 (2011). 10.1063/1.3522769
- Marshall A., in Symposium on Monte Carlo Methods, edited by Meyer M. (Wiley, 1956), pp. 123–140.
- Torrie G. and Valleau J., J. Comput. Phys. 23, 187 (1977). 10.1016/0021-9991(77)90121-8
- Roh M., Daigle B., Gillespie D., and Petzold L., J. Chem. Phys. 135, 234108 (2011). 10.1063/1.3668100
- Roh M. K., Gillespie D. T., and Petzold L. R., J. Chem. Phys. 133, 174106 (2010). 10.1063/1.3493460
- Meirovitch H., J. Phys. A: Math. Gen. 15, L735 (1982). 10.1088/0305-4470/15/12/014
- Meirovitch H., J. Chem. Phys. 89, 2514 (1988). 10.1063/1.455045
- Liang J., Zhang J., and Chen R., J. Chem. Phys. 117, 3511 (2002). 10.1063/1.1493772
- Lin M., Chen R., and Liu J., “Lookahead strategies for sequential Monte Carlo,” Technical Report (Rutgers University, Peking University and Harvard University, 2009).
- Liu J. S., Chen R., and Logvinenko T., in Sequential Monte Carlo Methods in Practice, Statistics for Engineering and Information Science, edited by Doucet A., Freitas N., and Gordon N. (Springer, New York, 2001), pp. 225–246.
- Liang J., Zhang J., and Chen R., J. Chem. Phys. 117, 3511 (2002). 10.1063/1.1493772
- Gillespie D., Roh M., and Petzold L., J. Chem. Phys. 130, 174103 (2009). 10.1063/1.3116791
- Gillespie D., J. Chem. Phys. 115, 1716 (2001). 10.1063/1.1378322
- Lin M., Zhang J., Lu H.-M., Chen R., and Liang J., J. Chem. Phys. 134, 075103 (2011). 10.1063/1.3519056
- Sidje R. B., ACM Trans. Math. Softw. 24, 130 (1998). 10.1145/285861.285868
- Schlögl F., Z. Phys. 253, 147 (1972). 10.1007/BF01379769
- Vellela M. and Qian H., J. R. Soc., Interface 6, 925 (2009). 10.1098/rsif.2008.0476
- Vlad M. O. and Ross J., J. Chem. Phys. 100, 7268 (1994). 10.1063/1.466873
- Elderfield D. and Vvedensky D. D., J. Phys. A 18, 2591 (1985). 10.1088/0305-4470/18/13/034
- Samoilov M., Plyasunov S., and Arkin A., Proc. Natl. Acad. Sci. U.S.A. 102(7), 2310 (2005). 10.1073/pnas.0406841102