Rule-based Modeling and Simulation of Biochemical Systems with Molecular Finite Automata

Jin Yang; Xin Meng; William S Hlavacek

doi:10.1049/iet-syb.2010.0015

Author manuscript; available in PMC: 2011 Nov 1.

Published in final edited form as: IET Syst Biol. 2010 Nov;4(6):453–466. doi: 10.1049/iet-syb.2010.0015

PMCID: PMC3070173 NIHMSID: NIHMS282728 PMID: 21073243

Abstract

We propose a theoretical formalism, molecular finite automata (MFA), to describe individual proteins as rule-based computing machines. The MFA formalism provides a framework for modeling individual protein behaviors and systems-level dynamics via construction of programmable and executable machines. Models specified within this formalism explicitly represent the context-sensitive dynamics of individual proteins driven by external inputs and represent protein-protein interactions as synchronized machine reconfigurations. Both deterministic and stochastic simulations can be applied to quantitatively compute the dynamics of MFA models. We apply the MFA formalism to model and simulate a simple example of a signal transduction system that involves a MAP kinase cascade and a scaffold protein.

Keywords: Rule-based modeling, executable biology, finite state machine, computational systems biology, formal languages, cell signaling

1 Introduction

In computational systems biology, studying a complex biochemical system involving a large number of interacting proteins often relies on in silico simulations to analyze and predict system behaviors [1]. In recent years, computational models have been increasingly used in cell signaling research and have been developed to study various pathways [2, 3]. However, models often fail to capture the mechanistic details of signal transduction systems [4]. For example, models sometimes inadequately account for the complexities of protein interactions, including interaction details at the level of protein sites and structural relationships among components of signaling proteins [5], particularly multisite protein modification in the context of multiprotein complexation [6]. Proteins in a signal-transduction system often have multiple component parts that enable the protein to interact with other molecules in a modular manner [7, 8, 9]. Models that account for the functions of the component parts of proteins (e.g., linear motifs and protein interaction domains) are needed to better understand the dynamics of signal-transduction systems [10, 11].

Limitations of conventional modeling approaches, which rely on explicit specifications of chemical reaction networks, lie in both model construction and simulation. Conventional models are essentially specified with lists of biochemical species and their reactions. However, representing a biochemical system as a chemical reaction network is often cumbersome and unnecessary [12, 13]. Graphical rule-based modeling formalisms and associated simulation algorithms have been developed to represent biochemical systems in terms of formal rules for biomolecular interactions [12, 13, 14, 15, 16, 17, 18, 19]. In graphical rule-based modeling, graphs (or the equivalents) are used to represent molecules, and graph-rewriting rules (or the equivalents) are used to represent molecular interactions. A rule represents a molecular interaction explicitly and the reactions that can arise from the interaction implicitly, and a rule can be viewed as a coarse-grained description of a class of reactions.

Two common types of protein interactions, multivalent protein binding and multisite post-translational protein modification, cause a combinatorial increase in the size of a reaction network with an increase in the number of interaction modules. It is usually difficult and error-prone to manually construct a full-sized chemical reaction model. As an alternative, such conventional models can be automatically obtained using rule-based reaction generation tools, such as BioNetGen, Virtual Cell or little b [20, 21, 22, 23], Moleculizer or Smoldyn [24, 25], Simmune [26], or Stochastic Simulator Compiler (SSC) [27]. Unfortunately, the number of reactions and biochemical species implied by rules can be enormously large (even infinite or limited only by the number of molecules in a system) for rule-based models of signal transduction systems [12], making it inefficient to construct, simulate and analyze conventional models derived from rules.

In addition to formalisms based on graph rewriting, a number of theoretical frameworks for biomolecular interaction systems have been proposed over the past decade or so to facilitate model building and simulation. Despite the differences in their syntactical and grammatical structures, most formalisms share a common feature: molecular entities are treated as computational agents that interact with one another according to a collection of specific protocols [28, 29, 30, 31, 32]. For example, protein interactions have been viewed as concurrent processes and have been modeled with communication protocols by process algebras, such as π-calculus [29, 30]. Many of these formalisms have been coupled to Gillespie's stochastic simulation algorithm [30, 33] to enable discrete-event simulation.

In engineering and computer science, complex dynamical systems with heterogeneous, modular and reactive components are frequently modeled by state machines and related formal structures. In this paper, we propose a new formalism, referred to as molecular finite automata (MFA), to model individual proteins as structured computing agents and to specify protein-protein interactions in the form of synchronized dynamics of interacting agents. The main goal is to provide an intuitive as well as programmable representation for biomolecular interaction systems. The MFA formalism is developed by incorporating and extending the classic structure of finite automata, which is a well-established formalism that has a wide application range and for which many sophisticated software and hardware tools are available. As we will see, an agent within the MFA framework explicitly represents a protein's activity (or state) induced by external inputs, and a protein interaction is specified as a joint transformation of the states of multiple MFA agents. At the systems level, a collection of reaction rules is used to describe interactions among the MFA agents in a system.

We also report simulation methods that can compute the dynamics of a system modeled within the MFA framework. Using the example of a MAP kinase cascade, we demonstrate how to apply the MFA formalism to model and simulate a cell signaling system.

2 Formal model

In the first part of this section, we introduce a representational framework using MFAs to describe the building blocks, particularly proteins, of biomolecular interaction systems. In the second part, we show how to apply the MFA representations to construct quantitative models for cell signaling systems and how to compute with these models.

2.1 Molecular entities — molecular finite automata

The notion of finite automata, or finite state machines, is well-established in theoretical computer science and has been applied to model the dynamics of diverse discrete systems (i.e., systems with finite numbers of states) [34]. Finite automata were traditionally developed to construct parsers and compilers, conduct formal verifications and mathematical proofs, and design and test software [34, 35, 36]. Because of their simple, adaptive and intuitive structure, finite automata have been applied in a number of areas, including engineering systems design [37], computational linguistics [38] and communication protocols [39]. Interacting state machines have also been used in the field of computational biology to visualize and model cellular level interactions [32].

Our goal in this paper is to propose a formalism based on an extended structure of finite automata that is suited for modeling biomolecular interactions at the submolecular level with consideration of site-specific details. Below, we give a formal definition of a purely reactive finite automaton that will be extended for the later description of biomolecules.

Definition 1 (finite automaton). A finite automaton is a tuple D = (S, X, δ, s₀), where S and X are finite sets of states and inputs, respectively. The function δ is a transition function that maps the current state along with an input in X into a target state, δ : S × X → S. The symbol s₀ denotes a start state.

The automaton of Definition 1 is a so-called “deterministic” finite automaton (DFA). In a DFA, given an input, a state transition is non-ambiguous, and a DFA can only reside in one state at any given time. In contrast, a finite automaton can be nondeterministic, in which multiple state transition paths (or more than one target state) may exist for a single input. Here, assuming responses of a protein to external signals are deterministic, we focus on DFAs as the fundamental structures for modeling proteins.

The above definition characterizes a finite automaton as a reactive model in which state transitions are induced by external events in the form of input signals. Elements in the set of states S are represented using subscripted lower-case letters, S = {s₁, s₂, …}. Throughout, inputs are represented using the lowercase alphabet, ε and subscripted ε's, i.e., X = {a, b, …, ε, ε₁, ε₂, …}. A special input symbol ε is introduced to model a non-specific external signal or a signal from an unknown source that causes a spontaneous state transition. In a model of a signaling system, such a non-specific input can be used to model molecular events such as dissociation of two bound proteins caused by background collisions with solvent molecules or protein modifications catalyzed by unknown enzymes. A finite automaton can be visually represented by a state transition diagram (Fig. 1(a)), a directed graph in which a node denotes a state and an edge denotes an input-induced transition. Equivalently, a finite automaton can be specified by a machine-readable state transition table (Fig. 1(b)).

Fig. 1. An example finite automaton. (a) State transition diagram for a three-state finite automaton. A circle denotes a state. An arrow denotes a state transition. A letter next to an arrow denotes an input. (b) A state transition table for the finite automaton in panel (a). The leftmost column indicates possible current states. The topmost row indicates inputs. A table entry indicates a target state given a current state and an input. The symbol '–' indicates “not applicable.”
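The table form of Definition 1 maps directly onto a dictionary lookup. The following Python sketch is illustrative only: the three states, the inputs and the transitions are hypothetical and are not copied from Fig. 1, and the choice to leave the state unchanged when an input is not applicable is an assumption of the sketch.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical three-state automaton (illustrative; not the automaton of Fig. 1).
# State transition table: (current state, input) -> target state.
# A missing entry plays the role of '-' ("not applicable").
DELTA: Dict[Tuple[str, str], str] = {
    ("s1", "a"): "s2",
    ("s2", "b"): "s3",
    ("s2", "eps"): "s1",   # eps: non-specific (spontaneous) input
    ("s3", "eps"): "s2",
}


def step(state: str, symbol: str) -> Optional[str]:
    """Return the target state, or None if the input is not applicable."""
    return DELTA.get((state, symbol))


def run(start: str, symbols: Iterable[str]) -> str:
    """Feed inputs one by one; inapplicable inputs leave the state unchanged
    (an assumption of this sketch)."""
    state = start
    for symbol in symbols:
        target = step(state, symbol)
        if target is not None:
            state = target
    return state
```

Because δ is a function of the pair (state, input), each applicable input has exactly one target state, which is the deterministic behavior assumed for proteins in the text.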

The dynamics of many reactive systems can be represented using the DFA structure of Definition 1. However, this structure is inefficient for describing signaling proteins. To extend the DFA structure to model a protein, we first look at the correspondence between properties of finite automata and protein functions. A classic finite automaton models a memoryless process, wherein a state transition depends only on the current state and an input. In contrast, protein interactions mostly happen under certain molecular contexts. To see the importance of molecular context, we consider allosteric regulation and protein complexation. Allosteric regulation of a protein or enzyme is a common mechanism in biochemistry. Protein activity in one domain is changed (either activated or inhibited) by binding or unbinding of an effector molecule at another site. The formation of heterogeneous and transient multiprotein complexes is one of the essential functions of protein-protein interactions in signal transduction. Context-sensitive interactions such as co-localization of an enzyme and one of its substrates control both the strength and specificity of molecular signaling. These features of protein interactions require an extension of the DFA structure beyond the representation of information only in terms of a finite number of states.

To capture the contextual sensitivity of protein interactions, internal variables are introduced to record contextual information such as information about binding partners or other local molecular information. An extension of Definition 1 should also include functions that will read and modify the machine variables. Along with state transitions, these machine operations update the configuration of a protein. Based on these considerations, we define an enhanced automaton structure, an “extended finite automaton” (EFA) to amend the classic finite automata structure.

Definition 2 (extended finite automaton). An extended finite automaton is a tuple E = (S, X, δ, s₀, v), where S and X are finite sets of states and inputs, respectively. The transition function, δ : S × X∖P(v) → S/A(v), maps the current state along with an input in X into a target state upon evaluation of a predicate function P(v), and performs an operation A(v) on the variable structure v along with the state transition. The symbol s₀ denotes a start state.

The meanings of operators (e.g., ∖ and /) used in the above definition are given in Table 1. Our definition of EFA is close to the convention of an extended finite state machine [36], which also involves operations on internal variables. Suppose that an EFA E is in state s. Upon receiving an input x, E undergoes a transition δ = (s, q, x, P(v), A(v)), where s and q are the source and target states, respectively. If the predicate P(v) is true (e.g., an evaluation of variables in v indicates that the transition is legitimate), E moves to the target state q and performs an operation A(v) on the variable structure v.
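To make the roles of P(v) and A(v) concrete, the sketch below implements a transition δ = (s, q, x, P(v), A(v)) as a table entry holding a target state, a predicate and an operation. It is a minimal illustration under stated assumptions: the class name `EFA`, the two-state binding site, the variable name `partner`, and the optional `payload` argument (used to pass a partner label to the operation) are all hypothetical and are not part of the formal definition.

```python
from typing import Callable, Dict, Optional, Tuple

Variables = Dict[str, Optional[str]]
Predicate = Callable[[Variables], bool]                  # P(v)
Action = Callable[[Variables, Optional[str]], None]      # A(v)
Entry = Tuple[str, Predicate, Action]                    # (target q, P, A)


class EFA:
    """Extended finite automaton: states plus a variable structure v."""

    def __init__(self, start: str, table: Dict[Tuple[str, str], Entry],
                 variables: Variables) -> None:
        self.state = start
        self.table = table
        self.v = variables

    def receive(self, x: str, payload: Optional[str] = None) -> bool:
        """Apply input x; return True only if a transition was taken."""
        entry = self.table.get((self.state, x))
        if entry is None:              # no transition defined for (state, x)
            return False
        target, predicate, action = entry
        if not predicate(self.v):      # P(v) is false: transition not allowed
            return False
        self.state = target            # move to the target state q
        action(self.v, payload)        # perform the operation A(v)
        return True


def is_free(v: Variables) -> bool:
    return v["partner"] is None


def is_bound(v: Variables) -> bool:
    return v["partner"] is not None


def bind(v: Variables, payload: Optional[str]) -> None:
    v["partner"] = payload             # v := id


def release(v: Variables, payload: Optional[str]) -> None:
    v["partner"] = None                # v := null


# Hypothetical binding site with states "free" and "bound".
SITE_TABLE: Dict[Tuple[str, str], Entry] = {
    ("free", "a"): ("bound", is_free, bind),
    ("bound", "eps"): ("free", is_bound, release),
}
```

For example, `EFA("free", SITE_TABLE, {"partner": None}).receive("a", "L1")` takes the binding transition and records the label `"L1"`, whereas a second `"a"` input is ignored because no transition is defined from the bound state.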

Table 1.

Operators and symbols used in MFA structures

Operator/symbol	Definition
x := a	Assignment of a value a to a variable x
x = a	Comparison between x and a
/a	Delimiter that precedes an operation a
∖ a	Delimiter that precedes a predicate a
a.b	Component operator: b is a member of a
A – B	Bond association between A and B
x → A	Mapping input x to machine A

Ultimately, another important and ubiquitous feature of cell signaling, site-specific interactions, must be incorporated to reflect the modularity of protein interactions. Many signaling proteins possess multiple functional motifs, domains and sites, which serve as modules for combinatoric protein organizations that can potentially generate diverse signaling patterns. A realistic protein automaton should express dynamics at the level of protein sites. To this end, we arrive at the definition of “molecular finite automaton” (MFA), which models the discrete dynamics of a multidomain biomolecule. The relationship between MFA and EFA is as follows: (1) an MFA contains one or multiple EFAs and (2) each EFA in an MFA operates on a common variable structure that is shared by all EFAs. Table 2 summarizes a conceptual mapping between protein functions and the structure of an MFA.

Table 2.

Molecular finite automaton and protein function

MFA component	Protein
State	Conformation
State transition	Conformation change
Input	Biochemical interaction
Variable and predicate	Molecular context
Component machine	Domain or site

Definition 3 (molecular finite automaton). A molecular finite automaton is a tuple M = (E₁,E₂, …,E_n, v), which is composed of n component EFAs and a shared variable structure v. The transition function for E_i is δ_i : S_i × X_i∖P_i(v) → S_i/A_i(v).

Table 1 lists a set of operators and symbols that we will use to describe MFAs in state transition diagrams and state transition tables. In essence, the MFA structure encapsulates multiple finite automata and allows for a hierarchical description of the component substructures of proteins. Internal variables and predicates help to compress the state space and make an MFA more accessible to intuitive understanding. Without using internal variables and predicate functions, one can still build an MFA by expanding the state space assuming that variables store information of finite size. However, such an approach may result in a state expansion that might become intractable for a complex system.

The construction of an MFA requires knowledge and/or a hypothesis about the biochemistry and the component substructure of the protein one wants to model. Although some proteins have established functions in well-studied signaling pathways, biochemical mechanisms for many protein functions still await characterization. For a protein with known structure and function, the corresponding MFA must be designed to faithfully reproduce the reactive dynamics of the protein. For a poorly characterized protein, building an MFA, as in building any model, provides an opportunity to generate testable hypotheses.

As a design issue, MFA models can be constructed with a great deal of flexibility. Equivalent MFAs may differ in the number of states and the topology of state transition diagrams. A protein with multiple sites can be modeled by an MFA that has separate finite automata, each of which describes the dynamics of a domain. Equivalently, instead of using one automaton to model one protein site, the protein can be modeled by an MFA that consists of a single finite automaton that describes the combined behavior of all sites. For example, if a biomolecule has three independent domains that interact with different binding partners, it can be modeled as one eight-state finite automaton plus a variable structure (Fig. 2(a)), where state s₁ indicates that the molecule is in a free form with no binding partners and state s₈ indicates that all sites are occupied. Alternatively, the biomolecule can be modeled with three two-state (a free state and a bound state) EFAs with each EFA describing an individual binding domain (Fig. 2(b)). For the case of three identical and non-cooperative binding domains, it may be preferable to model the protein with a four-state finite automaton with the state space S = {s₁ : free, s₂ : singly-bound, s₃ : doubly-bound, s₄ : triply-bound} for a parsimonious structure in terms of the number of states (Fig. 2(c)). This four-state MFA can be further compressed to a two-state model as shown in Fig. 2(d), where s₁ denotes the unoccupied state and s₂ denotes that the protein is occupied on at least one of its three sites. In this model, the information about how many sites are bound is resolved by a variable c serving as a counter.

Fig. 2. A protein with three binding domains modeled by different MFA structures. (a) An MFA that uses a single eight-state EFA to model the overall state transitions. Unlabeled transitions are induced by the same inputs as those identified for parallel transitions. (b) An MFA that models the protein with three independent internal finite automata, each of which interacts with distinct binding partners implied by input symbols {a, b, c}. Spontaneous inputs (ε₁, ε₂ and ε₃) are distinguished for non-identical individual EFAs. (c) A four-state MFA that models the protein as having three independent and identical binding sites. Machine variables that register binding partners are not shown. (d) A two-state MFA that can replace the model in (c). The two states s₁ and s₂ represent “free” and “bound”, and the variable c counts the number of bound sites.
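The trade-off between states and variables can be illustrated with a short Python sketch. It enumerates the eight product states of three independent two-state sites and then implements a two-state machine with a counter c in the spirit of Fig. 2(d). The class name `CounterMachine`, the input symbols, and the predicates on c (binding allowed while c is below the number of sites, return to s₁ when c reaches zero) are assumptions of the sketch, since the figure itself is not reproduced here.

```python
from itertools import product

# Product state space of three independent two-state binding sites.
SITE_STATES = ("free", "bound")
PRODUCT_STATES = list(product(SITE_STATES, repeat=3))   # 2**3 combinations


class CounterMachine:
    """Two-state machine (s1: free, s2: bound) with a counter variable c."""

    def __init__(self, n_sites: int = 3) -> None:
        self.n_sites = n_sites
        self.state = "s1"
        self.c = 0                      # number of occupied sites

    def receive(self, x: str) -> bool:
        """Apply a binding input 'a' or a spontaneous unbinding input 'eps'."""
        if x == "a" and self.c < self.n_sites:   # assumed predicate: c < n
            self.c += 1
            self.state = "s2"
            return True
        if x == "eps" and self.state == "s2":
            self.c -= 1
            if self.c == 0:                      # last partner released
                self.state = "s1"
            return True
        return False
```

The counter carries the information that the four-state model of Fig. 2(c) stores in its states, so the number of explicit states stays at two regardless of the number of identical sites.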

The state space of an MFA, S_M, is a subset of the product of the state spaces of component EFAs, i.e., S_M ⊆ S₁ × S₂ × … × S_n, where the two sides achieve equality when all component EFAs are independent. The input set of an MFA is the union X_M = X₁ ∪ X₂ ∪ … ∪ X_n. For an input x ∈ X_M, the transition function δ_i is chosen if x only belongs to X_i. A transition function is chosen arbitrarily if x belongs to input sets of multiple component EFAs. For example, if x ∈ X_i ∩ X_j, either δ_i from E_i or δ_j from E_j can be equivalently chosen to react to the input x. This scenario of relaying inputs corresponds to the case where a protein has multiple domains that interact with identical partners.

To present a biological example, Fig. 3 shows the state-transition diagram of an MFA model of the high-affinity IgE receptor, FcεRI, in the model of Goldstein et al. [40] and Faeder et al. [41]. The state transition table for the MFA representation of FcεRI is shown in Table 3. The receptor molecule FcεRI has three functional domains: (1) an extracellular α subunit responsible for binding its ligand, IgE (in fact, an IgE dimer is considered in the models in Refs. [40, 41]); (2) an intracellular β subunit that constitutively binds to Src-family protein tyrosine kinase Lyn when it is unphosphorylated and recruits Lyn with higher affinity upon phosphorylation; and (3) an intracellular γ subunit that recruits another protein tyrosine kinase Syk upon phosphorylation. The machine for FcεRI has three variables, v = (v_α, v_β, v_γ), which record the labels (id's) of binding partners for each of the corresponding domains. We note that recording binding partners using internal variables is equivalent to constructing an adjacency list to store an undirected graph. Tracking protein connectivity by such means allows protein complexes to be represented implicitly. The connectivity of proteins within a complex can be retrieved by a graph traversal. In the FcεRI pathway, some protein state transitions only happen in specific molecular contexts. For example, crosslinking of two receptors by an IgE dimer initiates signaling. On the cytoplasmic side of a crosslinked receptor dimer, a β subunit-associated Lyn can transphosphorylate the β subunit of the other receptor to initiate an intracellular signaling cascade [41]. To incorporate such non-local contextual information into the MFA-based pathway model of a signaling system, one needs to specify reaction rules. In the following section, we introduce a formal definition of reaction rules, which are used to describe interactions between proteins modeled by MFAs.

Fig. 3. State transition diagram for the MFA of a receptor FcεRI with three component EFAs. α subunit: unbound (s₁), bound (s₂); β subunit: unbound and unphosphorylated (s₁), bound and unphosphorylated (s₂), unbound and phosphorylated (s₃), and bound and phosphorylated (s₄); γ subunit: unbound and unphosphorylated (s₁), unbound and phosphorylated (s₂), and bound and phosphorylated (s₃). Internal variables v_α, v_β and v_γ record binding partners of the α, β and γ subunits, respectively. Inputs and operations (if any) are labeled together on the transition edges, separated by a delimiter symbol / (cf. Table 1). For example, ε₁ / v_α := ϕ indicates that the MFA receives a non-specific input ε₁ and then sets the variable v_α to the null value ϕ.

Table 3.

State transition table for the MFA of FcεRI.

FcεRI.α	a	ε₁
s₁	s₂ / v_α := id	–
s₂	–	s₁ / v_α := ϕ

FcεRI.β	b	ε₂	ε₃	ε₄
s₁	s₂ / v_β := id	–	s₃	–
s₂	–	s₁ / v_β := ϕ	–	–
s₃	s₄ / v_β := id	–	–	s₁
s₄	–	s₃ / v_β := ϕ	–	–

FcεRI.γ	c	ε₅	ε₆	ε₇
s₁	–	s₂	–	–
s₂	s₃ / v_γ := id	–	s₁	–
s₃	–	–	–	s₂ / v_γ := ϕ

ϕ: a null symbol indicating a free site. An id is a label assigned to identify an individual MFA agent among a population of agents of one type.
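Because a state transition table is machine-readable, the FcεRI table can be written down almost verbatim as nested dictionaries. The Python sketch below is one possible encoding, not a prescribed data format. It assumes that binding of Lyn to the unbound, phosphorylated β subunit (state s₃, input b) leads to the bound, phosphorylated state s₄. The class name `FceRI`, the string labels for inputs, and the partner labels used in the usage note are hypothetical.

```python
from typing import Dict, Optional, Tuple

PHI = None  # null symbol: free site

# component -> {(current state, input): (target state, operation on v)}
# operation "set" means v := id, "clear" means v := PHI, None means no operation.
FCERI_TABLE: Dict[str, Dict[Tuple[str, str], Tuple[str, Optional[str]]]] = {
    "alpha": {
        ("s1", "a"): ("s2", "set"),
        ("s2", "eps1"): ("s1", "clear"),
    },
    "beta": {
        ("s1", "b"): ("s2", "set"),
        ("s1", "eps3"): ("s3", None),
        ("s2", "eps2"): ("s1", "clear"),
        ("s3", "b"): ("s4", "set"),      # assumed target state (see lead-in)
        ("s3", "eps4"): ("s1", None),
        ("s4", "eps2"): ("s3", "clear"),
    },
    "gamma": {
        ("s1", "eps5"): ("s2", None),
        ("s2", "c"): ("s3", "set"),
        ("s2", "eps6"): ("s1", None),
        ("s3", "eps7"): ("s2", "clear"),
    },
}


class FceRI:
    """MFA with three component EFAs and a shared variable structure v."""

    def __init__(self) -> None:
        self.state = {name: "s1" for name in FCERI_TABLE}
        self.v = {name: PHI for name in FCERI_TABLE}

    def receive(self, x: str, partner_id: Optional[str] = None) -> bool:
        """Relay input x to the first component EFA that can respond to it."""
        for name, table in FCERI_TABLE.items():
            entry = table.get((self.state[name], x))
            if entry is None:
                continue
            target, operation = entry
            self.state[name] = target
            if operation == "set":
                self.v[name] = partner_id
            elif operation == "clear":
                self.v[name] = PHI
            return True
        return False
```

As a usage example, calling `receive("eps3")` and then `receive("b", "Lyn_1")` on a fresh `FceRI()` moves the β component to s₄ and stores `"Lyn_1"` in v_β, while `receive("c", "Syk_1")` has no effect until the γ component has been phosphorylated by `receive("eps5")`.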

2.2 Molecular interactions — reaction rules

An MFA is essentially a discrete state model that characterizes a protein as a reactive agent with state transition protocols. Since specification of an MFA structure does not require consideration of the modeled protein within the larger context of a signaling system, explicit rules are needed to connect individual types of MFAs as parts of an interacting system.

A protein-protein interaction system is composed of a collection of MFAs for different types of proteins in the system. To describe the interactions between these MFAs in terms of biochemical reactions, one can specify protein interactions for the MFAs by means of reaction rules. An interaction between proteins changes the states of all participating molecules. In other words, a reaction synchronizes state transitions and machine reconfigurations among participant MFAs. We can view a reaction rule for an MFA-based interaction as a specification of synchronized state transitions and operations on internal variables. We formally define a reaction rule as follows.

Definition 4 (reaction rule). A reaction rule is an injective function R : X → M∖P. The sets X = {x₁, x₂, …,x_n} and M = {M₁,M₂, …,M_n} are ordered and contain inputs and MFAs, respectively. P is a predicate for the mapping.

The mapping X → M simultaneously sends each individual input x_i to machine M_i and executes the machine reconfigurations if M_i is responsive to x_i. The predicate P specifies an application condition for the mapping, which usually constitutes non-local molecular contexts (or patterns). Although the number of MFAs simultaneously involved in a reaction could in principle be any finite number n, we focus on two types of elementary interactions: (1) unimolecular interactions that involve state transitions of one MFA and (2) bimolecular interactions that involve synchronized state transitions of two MFAs. Execution of a reaction rule changes the configurations of participant MFAs according to the protocols defined in the state transition tables of the MFAs. The above definition of a reaction rule requires that MFA agents be in states that can respond to the inputs. Together with the state transition tables for individual MFA types, reaction rules provide executable and programmable protocols to connect standalone MFAs into an interacting system.

In the example model of Fig. 4, all interactions in the model can be specified by four reaction rules (Table 4). Phosphorylation and dephosphorylation of automaton A are approximated as unimolecular, single-step reactions, which can be defined by two rules, R₁ : {ε₁} → {A} and R₂ : {ε₂} → {A}, respectively. In these cases, a rule is merely a local pairing of a current state and an input for an MFA. The state transition and its associated operations follow the protocol defined in the state transition table. For a bimolecular association reaction between automata A and B, a reaction rule can be formulated as R₃ : {a, a} → {A, B}, where the MFA set and the input set are both ordered and have a one-to-one mapping. We note that the definition of a reaction rule does not specify machine states and therefore only MFAs in proper states will respond to an input. A rate law specifies a mathematical formula to calculate the kinetic rate for a reaction rule, which can be used for quantitative simulations. We note that only machines in states designated by a reaction rule are accounted for when one calculates the rate according to the rate law. For example, R₃ in Table 4 specifies a mass action rate law for the association reaction between proteins A and B, r₃(t) = k₃A(·)B(·), where A(·) and B(·) represent the eligible populations of proteins A and B. Eligible machine states in an MFA can be automatically resolved by searching the state transition table for responsive states with regard to the input symbol specified in the rule. In this case, since the eligible machine states for this reaction rule are s₂ and s₁ for A and B, respectively, the actual rate should be calculated as r₃ = k₃A(s₂)B(s₁). The dissociation rule, R₄ : {ε₃, ε} → {A, B}∖A–B, has a predicate A–B that requires that A and B share a bond. The operator '–' denotes a bond association between the two machines.

Fig. 4. Interactions between MFAs. An example system that is modeled by two interacting MFAs, A and B. Automaton A models a protein that can, upon phosphorylation, bind to another protein modeled by Automaton B. Automaton A has three states: free and unphosphorylated (s₁), free and phosphorylated (s₂), and bound and phosphorylated (s₃). Automaton B has two states: free (s₁) and bound (s₂). The internal variables v in A and B are used to record information about binding partners (i.e., the label of an MFA agent, or ϕ if free).

Table 4.

Formal reaction rules for the model of Fig. 4

Rule description	Formal specification	Rate law
R₁ : Phosphorylation of A	{ε₁} → {A}	r₁(t) = k₁A(·)
R₂ : Dephosphorylation of A	{ε₂} → {A}	r₂(t) = k₂A(·)
R₃ : A and B association	{a, a} → {A, B}	r₃(t) = k₃A(·)B(·)
R₄ : A and B dissociation	{ε₃, ε} → {A, B}∖A–B	r₄(t) = k₄A(·), or k₄B(·)

Rules are shown as one-to-one mappings between ordered sets (e.g., R₄ : {ε₃, ε} → {A, B} indicates that ε₃ is an input for A and ε is an input for B). A–B denotes that machines A and B have a bond association.
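The resolution of eligible states from a state transition table, and the resulting rate calculation, can be sketched in a few lines of Python. The transition tables below are reconstructed from the description of automata A and B (states and inputs as in Table 4); the data layout, the function names, and the `bonded` flag (which stands in for the predicate A–B by counting each bonded pair once, through A) are assumptions of the sketch.

```python
from typing import Dict, List, Tuple

# State transition tables of automata A and B: (state, input) -> target state.
TABLES: Dict[str, Dict[Tuple[str, str], str]] = {
    "A": {("s1", "eps1"): "s2", ("s2", "eps2"): "s1",
          ("s2", "a"): "s3", ("s3", "eps3"): "s2"},
    "B": {("s1", "a"): "s2", ("s2", "eps"): "s1"},
}

# rule -> (ordered inputs, ordered machines, bonded predicate A-B required?)
RULES: Dict[str, Tuple[Tuple[str, ...], Tuple[str, ...], bool]] = {
    "R1": (("eps1",), ("A",), False),
    "R2": (("eps2",), ("A",), False),
    "R3": (("a", "a"), ("A", "B"), False),
    "R4": (("eps3", "eps"), ("A", "B"), True),
}


def eligible_states(machine: str, symbol: str) -> List[str]:
    """States of a machine that respond to the given input symbol."""
    return sorted(s for (s, x) in TABLES[machine] if x == symbol)


def rate(rule: str, k: float, populations: Dict[Tuple[str, str], float]) -> float:
    """Mass-action rate of a rule, counting only machines in eligible states."""
    inputs, machines, bonded = RULES[rule]
    value = k
    for symbol, machine in zip(inputs, machines):
        value *= sum(populations.get((machine, s), 0.0)
                     for s in eligible_states(machine, symbol))
        if bonded:       # each bond is counted once, via the first machine
            break
    return value
```

With this layout, `eligible_states("A", "a")` returns `["s2"]` and `eligible_states("B", "a")` returns `["s1"]`, so `rate("R3", k3, populations)` evaluates k₃A(s₂)B(s₁) as described for R₃.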

In summary, a list of reaction rules assumes three roles: (1) assigning rate laws for quantitative computation; (2) synchronizing state transitions for bimolecular reactions; and (3) determining which subsets of machine transitions are included in a system, so that the specified set of reaction rules reflects the chosen modeling assumptions and scope. For example, some state transitions may never be triggered by a given set of reaction rules even though these transitions may be possible at the machine level.

3 Quantitative modeling

Reaction rules are essentially specifications of coupled chemical processes that can be taken to change the configuration of a system in time. A set of rules can be translated into quantitative models if the rules can be associated with rates via rate laws. A straightforward way to translate a rule-based model into a quantitative model is to automatically generate a conventional chemical reaction network by evaluating reaction rules using a rewriting approach [20, 16]. However, a far more efficient approach is to use reaction rules to directly perform a simulation. Below, we describe how to construct and simulate models specified in terms of MFA structures, for either deterministic or stochastic simulation.

3.1 Deterministic simulation

A biochemical reaction system is conventionally modeled using coupled ordinary differential equations (ODEs) that describe the temporal evolution of all chemical species in the system. Here, we demonstrate that one can use a set of ODEs to instead describe the population dynamics of MFA states. In fact, an MFA state (or a combination of states) corresponds to an ensemble of chemical species, which often manifests as an experimental observable, such as free protein concentration or protein phosphorylation level. This idea is related to the concept of model reduction [19, 42, 43, 44, 45], in which a reduced system of ODEs is formulated to describe the dynamics of a set of observable quantities instead of concentrations of an exhaustive set of chemical species.

In a most general form, we can write the following ODE to model the population level of the agents of MFA M in state s, denoted as M(s, t):

$$\frac{dM(s, t)}{dt} = r_{in}(t) - r_{out}(t), \tag{1}$$

where r_in (r_out) is the rate of population influx (outflux), consistent with the rate laws associated with the transition rules related to state s. For a single MFA agent, the above equation describes the time rate of change of the probability to find the machine in state s.

Considering the simple example model illustrated in Fig. 4, we assume the law of mass action for protein association reactions and single-step protein phosphorylation and dephosphorylation reactions. The following differential equation can be used to describe the concentration of protein A in the machine state s₂, A(s₂):

$$\begin{aligned} \frac{dA(s_{2}, t)}{dt} &= \underbrace{k_{1} A(s_{1}, t) + k_{4} A(s_{3}, t)}_{r_{in}} - \underbrace{A(s_{2}, t) \left( k_{2} + k_{3} B(s_{1}, t) \right)}_{r_{out}} \\ &= r_{1}(t) + r_{4}(t) - \left( r_{2}(t) + r_{3}(t) \right), \end{aligned} \tag{2}$$

where X(s_i, t) is the concentration of protein X in its machine state s_i at time t. The parameter k_i is the rate constant for an elementary reaction process defined by Rule i, and r_i is the overall reaction rate for Rule i (Table 4). One can systematically write down ODEs for other MFA states for the model of Fig. 4. We assume the total numbers of automata A and B are conserved, i.e., A_tot = A(s₁, t) + A(s₂, t) + A(s₃, t) and B_tot = B(s₁, t) + B(s₂, t). Note that A(s₃, t) = B(s₂, t), the number of bonds formed between automata A and B. Based on these constraints, only one more independent ODE is needed to describe the whole system:

\begin{aligned} \frac{dA (s_{3}, t)}{dt} & = k_{3} A (s_{2}, t) B (s_{1}, t) - k_{4} A (s_{3}, t) \\ & = r_{3} (t) - r_{4} (t) . \end{aligned}

(3)

The above procedure of constructing a system of ODEs can be automated. In some systems modeled by MFAs, however, the construction of ODEs may not be straightforward. For example, if an MFA state transition depends on the status of a predicate evaluation, the calculation of the transition rate needs to be adjusted to account for the outcome of the predicate evaluation. In many cases, it is difficult to account accurately for the rates of conditional transitions. We will discuss this issue below when we consider an example model for a MAPK cascade with a scaffold protein.
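As an illustration, Eqs. (2) and (3) can be integrated numerically once the conservation relations are used to eliminate A(s₁) and B(s₁). The following is a minimal sketch, not the authors' implementation: the function names, the fixed-step Runge-Kutta scheme and all numerical values are assumptions chosen for illustration, and the meaning of Rules 1 to 4 is inferred from the terms of Eq. (2).

```python
# Sketch: integrate Eqs. (2) and (3) for the two-automaton model of Fig. 4.
# Function names and numerical values are illustrative assumptions.

def rhs(a2, a3, a_tot, b_tot, k):
    """Right-hand sides of Eqs. (2) and (3).

    a2, a3 : populations A(s2) and A(s3)
    k      : dict mapping rule number (1..4) to its rate constant
    """
    a1 = a_tot - a2 - a3          # conservation of A
    b1 = b_tot - a3               # conservation of B, using B(s2) = A(s3)
    r1 = k[1] * a1                # Rule 1: A(s1) -> A(s2)
    r2 = k[2] * a2                # Rule 2: A(s2) -> A(s1)
    r3 = k[3] * a2 * b1           # Rule 3: A(s2) + B(s1) form a bond
    r4 = k[4] * a3                # Rule 4: the bond dissociates
    da2 = r1 + r4 - (r2 + r3)     # Eq. (2)
    da3 = r3 - r4                 # Eq. (3)
    return da2, da3


def integrate(a_tot, b_tot, k, t_end, dt=1e-3):
    """Fixed-step fourth-order Runge-Kutta, starting with all machines in s1."""
    a2, a3 = 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        p1, q1 = rhs(a2, a3, a_tot, b_tot, k)
        p2, q2 = rhs(a2 + 0.5 * dt * p1, a3 + 0.5 * dt * q1, a_tot, b_tot, k)
        p3, q3 = rhs(a2 + 0.5 * dt * p2, a3 + 0.5 * dt * q2, a_tot, b_tot, k)
        p4, q4 = rhs(a2 + dt * p3, a3 + dt * q3, a_tot, b_tot, k)
        a2 += dt * (p1 + 2 * p2 + 2 * p3 + p4) / 6.0
        a3 += dt * (q1 + 2 * q2 + 2 * q3 + q4) / 6.0
    return a_tot - a2 - a3, a2, a3   # A(s1), A(s2), A(s3)


if __name__ == "__main__":
    rate_constants = {1: 1.0, 2: 1.0, 3: 0.01, 4: 1.0}   # assumed values
    print(integrate(100.0, 100.0, rate_constants, t_end=20.0))
```

Only two state variables are integrated; the remaining populations follow from A_tot and B_tot, which mirrors the reduction described above.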

3.2 Stochastic simulation

The temporal dynamics of a biochemical reaction system can be modeled as a continuous-time discrete-state Markov process to account for the evolution of the system configuration, which can be described by the following master equation [46]:

\frac{dp (c, t)}{dt} = \sum_{c^{'} \neq c} [w (c ∣ c^{'}) p (c^{'}, t) - p (c, t) w (c^{'} ∣ c)],

(4)

where p(c, t) is the probability that the system is found in configuration c at time t, and w(c′|c) gives the transition rate from configuration c to c′. In a chemical reaction system, a configuration is defined by the concentrations of all chemical species. In a system specified by MFAs, a configuration c is determined by the states and connectivities of MFAs. More precisely, the configuration is given by the properties of the individual MFA agents in the system, including the states and internal variables of these MFA agents. Analytical solutions to the above master equation are possible only for very simple systems. For a typical system, direct numerical integration of the master equation is often intractable because of the enormous size of the configuration space. Instead, kinetic Monte Carlo simulation is applied to conduct sequential random walks through the configuration space and to obtain stochastic trajectories for a system of interest.

A system of rate processes can be simulated by the classic kinetic Monte Carlo method [47]. In our case, coupled processes in a biomolecular interaction system are defined by reaction rules that proceed in time at rates r. These rule rates are determined by the current configuration of the system. At each time step, the waiting time T for the next reaction event can be sampled from an exponential distribution with a mean waiting time of 1/r_tot, where r_tot = Σ_i r_i is the overall reaction rate of the system. To select a process that generates the next reaction event, one can sample a rule i with probability proportional to its rate r_i [48].
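The two sampling operations described above can be sketched as follows. This is a minimal illustration under the stated definitions (exponential waiting time with mean 1/r_tot, rule i chosen with probability r_i/r_tot); the function name `sample_event` and its interface are assumptions made for this sketch.

```python
import math
import random


def sample_event(rates, rng=random):
    """Return (waiting_time, rule_index) for one kinetic Monte Carlo step.

    rates : list of non-negative rule rates r_i for the current configuration
    rng   : any object providing random(), e.g. a random.Random instance
    """
    r_tot = sum(rates)
    if r_tot <= 0.0:
        raise ValueError("no rule can fire: total rate is zero")
    # Waiting time T ~ Exponential(mean = 1 / r_tot), by inverse transform.
    waiting_time = -math.log(1.0 - rng.random()) / r_tot
    # Select rule i with probability r_i / r_tot by walking the cumulative sum.
    threshold = rng.random() * r_tot
    cumulative = 0.0
    for i, r in enumerate(rates):
        cumulative += r
        if threshold < cumulative:
            return waiting_time, i
    # Floating-point round-off guard: fall back to the last rule with r_i > 0.
    return waiting_time, max(i for i, r in enumerate(rates) if r > 0.0)
```

Rules with zero rate are never selected, so a rule whose reactants are exhausted drops out of the sampling without special handling.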

For rule-based models defined by MFA structures, additional sampling steps are needed in each step to identify MFA agents that should undergo state transitions. Below, we outline a kinetic Monte Carlo algorithm for simulating systems specified in terms of MFAs.

[Step 1] Initialization. Set time t = 0, set the initial states and copy numbers of individual MFA agents, specify the rate constants of rate laws associated with reaction rules; calculate rule rates r, and specify stopping criteria.

[Step 2] At each time step, select a rule i and a waiting time T, and update the time t ← t + T.

[Step 3] Sample MFA agents from MFA candidates that are in permissible configurations as specified in the reaction rule sampled in Step 2, execute the state transitions of the sampled MFA agents, and recalculate the rate vector r.

[Step 4] Repeat Steps 2 and 3 until a stopping criterion is satisfied.

In the above algorithm, Step 3 describes an agent-based simulation that tracks the states of individual proteins modeled by MFAs. A simulation produces single-protein configurations with details about reactive sites as well as connections between proteins.
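Steps 1 to 4 can be sketched for the two-automaton model of Fig. 4 as an agent-based loop. The code below is an illustrative sketch rather than the implementation used in this work: the dictionary-based agent representation, the function name and the rate constants are assumptions, the rule meanings are inferred from Eqs. (2) and (3), and candidate agents are found by a simple linear scan rather than an optimized data structure.

```python
import random


def simulate_agents(n_a, n_b, k, t_end, seed=0):
    """Agent-based kinetic Monte Carlo for automata A and B (Fig. 4).

    Rules (inferred from Eq. (2)): 1: A s1->s2, 2: A s2->s1,
    3: A(s2) + B(s1) bind, 4: A(s3)-B(s2) dissociate.
    k maps rule number (1..4) to an assumed rate constant.
    """
    rng = random.Random(seed)
    # Step 1: initialization; 'partner' is the internal variable recording a bond.
    a_agents = [{"state": 1, "partner": None} for _ in range(n_a)]
    b_agents = [{"state": 1, "partner": None} for _ in range(n_b)]
    t = 0.0
    while True:
        # Candidate agents in the configurations permitted by each rule.
        a1 = [i for i, a in enumerate(a_agents) if a["state"] == 1]
        a2 = [i for i, a in enumerate(a_agents) if a["state"] == 2]
        a3 = [i for i, a in enumerate(a_agents) if a["state"] == 3]
        b1 = [j for j, b in enumerate(b_agents) if b["state"] == 1]
        rates = [k[1] * len(a1), k[2] * len(a2),
                 k[3] * len(a2) * len(b1), k[4] * len(a3)]
        r_tot = sum(rates)
        if r_tot == 0.0:
            break
        # Step 2: sample a waiting time and a rule, then advance time.
        tau = rng.expovariate(r_tot)
        if t + tau > t_end:          # Step 4: stopping criterion
            break
        t += tau
        rule = rng.choices(range(4), weights=rates)[0]
        # Step 3: sample agents and execute their state transitions.
        if rule == 0:
            a_agents[rng.choice(a1)]["state"] = 2
        elif rule == 1:
            a_agents[rng.choice(a2)]["state"] = 1
        elif rule == 2:
            i, j = rng.choice(a2), rng.choice(b1)
            a_agents[i].update(state=3, partner=j)
            b_agents[j].update(state=2, partner=i)
        else:
            i = rng.choice(a3)
            j = a_agents[i]["partner"]
            a_agents[i].update(state=2, partner=None)
            b_agents[j].update(state=1, partner=None)
    return t, a_agents, b_agents
```

Because each agent stores its bond partner, the final configuration reports not only how many bonds exist but also which A agent is bound to which B agent.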

Several general kinetic Monte Carlo methods for simulation of rule-based models have recently been developed [48, 49, 50, 51, 52], which can be readily adapted to suit the MFA framework. A rule-based kinetic Monte Carlo simulation involves sampling molecular agents or agent components that are permissible for transformation according to a rule [48]. In simulating an agent-based MFA model, after a rule is sampled in Step 2, the algorithm described above involves searching for reactant agents in a population of candidate MFA agents that satisfy the rule protocol. The selected rule is executed by transforming the sampled MFA agents (as in Step 3 in the above procedure). MFA transformations are executed by sending input signals as specified in the rule to the sampled MFA agents.

In some situations, the individual states of proteins and the connections of proteins within protein complexes may not be of interest, or such information may not be experimentally resolvable to verify the predictions generated by an agent-based simulation. In these cases, a kinetic Monte Carlo procedure that incorporates only observable quantities may be adopted. A kinetic Monte Carlo simulation can proceed as long as one is able to update the rates of reaction rules at each time step, which only requires tracking the populations of those machine states indicated in the rules (see Table 4). These local MFA states usually consist of experimentally accessible quantities such as the number of protein bonds or phosphorylated sites. Other quantities, such as the concentration of a complex, may in some cases be synthesized from basic MFA configurations.
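For the model of Fig. 4, a population-level procedure of this kind needs only three counters. The sketch below is illustrative: the function name and rate constants are assumptions, and the rule meanings are inferred from Eqs. (2) and (3).

```python
import random


def simulate_counts(a_tot, b_tot, k, t_end, seed=0):
    """Population-level kinetic Monte Carlo for the model of Fig. 4.

    Tracks only the counts A(s1), A(s2), A(s3); B(s1) follows from
    B_tot - A(s3). k maps rule number (1..4) to an assumed rate constant.
    """
    rng = random.Random(seed)
    a1, a2, a3 = a_tot, 0, 0
    t = 0.0
    trajectory = [(t, a1, a2, a3)]
    while True:
        b1 = b_tot - a3
        rates = [k[1] * a1, k[2] * a2, k[3] * a2 * b1, k[4] * a3]
        r_tot = sum(rates)
        if r_tot == 0.0:
            break
        tau = rng.expovariate(r_tot)
        if t + tau > t_end:
            break
        t += tau
        rule = rng.choices(range(4), weights=rates)[0]
        if rule == 0:                 # Rule 1: A(s1) -> A(s2)
            a1, a2 = a1 - 1, a2 + 1
        elif rule == 1:               # Rule 2: A(s2) -> A(s1)
            a1, a2 = a1 + 1, a2 - 1
        elif rule == 2:               # Rule 3: bond formation
            a2, a3 = a2 - 1, a3 + 1
        else:                         # Rule 4: bond dissociation
            a2, a3 = a2 + 1, a3 - 1
        trajectory.append((t, a1, a2, a3))
    return trajectory
```

No individual agent is represented here, so the identity of binding partners is lost, but every rule rate can still be updated exactly because none of these rules carries a predicate.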

4 Example: MAPK cascade

In this section, we use the example of a scaffold-mediated MAPK cascade to demonstrate how to construct and simulate an MFA-based model of a signaling pathway.

The system is inspired by the scaffold-mediated MAPK cascade in yeast. The scaffold protein Ste5 possesses three domains that bind the MAP kinases Ste11 (MAPKKK), Ste7 (MAPKK) and Fus3 (MAPK) in the signaling pathway for the mating response [53]. We consider a scaffold protein with three independent binding sites: α, β and γ sites, and three MAP kinases: a MAPKKK that binds to the α site of the scaffold protein, a MAPKK that binds to the β site of the scaffold protein and can be phosphorylated by MAPKKK, and a MAPK that binds to the γ site of the scaffold protein and can be phosphorylated by MAPKK. We assume that (1) binding reactions of the different kinases to the scaffold protein are independent processes, (2) MAPKKK can only be phosphorylated when it is bound to the scaffold protein, (3) phosphorylation of MAPKK (MAPK) can happen only when its kinase, phosphorylated MAPKKK (MAPKK), colocates on the same scaffold protein, and (4) phosphorylation and dephosphorylation of kinases can be modeled as single-step processes.

To simplify notation, we will use SCF to denote the scaffold protein and M3K, M2K and MPK to denote MAPKKK, MAPKK and MAPK, respectively. Figure 5 illustrates state transition diagrams of MFA models for the four proteins involved in the system. A total of 12 reaction rules describing protein interactions in the system are listed in Table 5.

Figure 5. MFA state transition diagrams for proteins in a MAPK cascade with a scaffold protein. The scaffold protein has three submolecular binding sites: the α, β and γ sites that bind to M3K, M2K and MPK, respectively. State s₁ (s₂) for each scaffold site indicates that the site is free (bound). Each kinase has four states: s₁ (free and unphosphorylated), s₂ (bound and unphosphorylated), s₃ (bound and phosphorylated), and s₄ (free and phosphorylated). Internal variables (v_α, v_β, v_γ in the scaffold, and v's in the kinases) record contextual information, in particular binding partners. Note that the names of inputs and internal variables are local to the MFAs in which they appear. In other words, inputs or internal variables in different MFAs with the same name are not identical.

Table 5.

Reaction rules for the MAPK cascade with a scaffold protein

Rule description	Formal specification	Rate law
1. M3K recruitment	{a, a}→{SCF.α,M3K}	k₁M3K(·)SCF.α(·)
2. M3K dissociation	{ε₁, ε₁}→{SCF.α,M3K}∖SCF.α-M3K	k₂SCF.α(·)
3. M2K recruitment	{b, a}→{SCF.β,M2K}	k₃M2K(·)SCF.β(·)
4. M2K dissociation	{ε₂, ε₁}→{SCF.β,M2K}∖SCF.β-M2K	k₄SCF.β(·)
5. MPK recruitment	{c, a}→{SCF.γ,MPK}	k₅MPK(·)SCF.γ(·)
6. MPK dissociation	{ε₃, ε₁}→{SCF.γ,MPK}∖SCF.γ-MPK	k₆SCF.γ(·)
7. M3K phosphorylation	{ε₂}→{M3K}	k₇M3K(·)
8. M3K dephosphorylation	{ε₃}→{M3K}	k₈M3K(·)
9. M2K phosphorylation	{b}→{M2K}∖M2K-SCF-M3K(s₃)	k₉M2K(·)
10. M2K dephosphorylation	{ε₂}→{M2K}	k₁₀M2K(·)
11. MPK phosphorylation	{b}→{MPK}∖MPK-SCF-M2K(s₃)	k₁₁MPK(·)
12. MPK dephosphorylation	{ε₂}→{MPK}	k₁₂MPK(·)

In particular, we note that assumption (3) above requires checking molecular contexts to determine whether protein state transitions are permissible. This requirement exemplifies an important feature of the MFA framework: it allows modeling of context-sensitive interactions. Consequently, Rules 9 and 11 in Table 5 require predicate evaluations that examine nonlocal machine configurations. For example, Rule 9 validates an M2K agent by checking whether it is bound to an SCF agent that in turn is bound to an M3K agent in state s₃. This contextual information is provided in the rule as a pattern of bond associations, M2K–SCF–M3K(s₃). In general, a pattern of a multiprotein complex can be specified by a data structure that represents a connectivity graph.
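One possible way to evaluate the Rule 9 predicate is to follow the bond references stored in the agents' internal variables. The sketch below is an assumption-laden illustration, not the data structure used in this work: the classes `Scaffold` and `Kinase`, the helper `bind` and the function `matches_rule9` are hypothetical names, and kinase states follow the s₁ to s₄ convention of Fig. 5.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class Scaffold:
    # Internal variables recording the occupant of each site (None if free).
    alpha: Optional["Kinase"] = None   # site that binds M3K
    beta: Optional["Kinase"] = None    # site that binds M2K
    gamma: Optional["Kinase"] = None   # site that binds MPK


@dataclass(eq=False)
class Kinase:
    kind: str                          # "M3K", "M2K" or "MPK"
    state: int = 1                     # 1..4, i.e. s1..s4 as in Fig. 5
    scaffold: Optional[Scaffold] = None


SITE_OF = {"M3K": "alpha", "M2K": "beta", "MPK": "gamma"}


def bind(scaffold, kinase):
    """Recruit a free kinase: s1 -> s2 (unphosphorylated) or s4 -> s3."""
    site = SITE_OF[kinase.kind]
    if getattr(scaffold, site) is not None or kinase.scaffold is not None:
        raise ValueError("site occupied or kinase already bound")
    setattr(scaffold, site, kinase)
    kinase.scaffold = scaffold
    kinase.state = 2 if kinase.state == 1 else 3


def matches_rule9(m2k):
    """True if the agent matches the pattern M2K(s2)-SCF-M3K(s3)."""
    if m2k.kind != "M2K" or m2k.state != 2 or m2k.scaffold is None:
        return False
    m3k = m2k.scaffold.alpha           # follow the bond M2K -> SCF -> M3K
    return m3k is not None and m3k.state == 3
```

The predicate traverses two bonds, which is why it is nonlocal: its value can change through events that do not involve the M2K agent itself, such as dissociation or dephosphorylation of the M3K on the same scaffold.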

4.1 Deterministic ODEs

We first show how to write deterministic ODEs to describe the evolution of the population levels of MFA states. Our MFA model of the MAPK cascade consists of 15 independent machine states, compared to a total of 33 distinct chemical species implied by the same set of reaction rules. Furthermore, we assume that the total amount of each protein is conserved, which corresponds to $M_{tot} = Σ_{i = 1}^{n} M (s_{i})$ for an MFA with n states, where M_tot is the total number of the MFA agents. Kinetic parameters that appear in the following equations are defined in Table 5. The following equation characterizes M3K binding to the α site of the scaffold protein:

\frac{d SCF . α (s_{1})}{dt} = - k_{1} [M 3 K (s_{1}) + M 3 K (s_{4})] SCF . α (s_{1}) + k_{2} SCF . α (s_{2}) .

(5)

The equations for M2K and MPK binding to the β and γ sites of the scaffold are similar. Together with two algebraic relationships, ${M 3 K}_{tot} = Σ_{i = 1}^{4} M 3 K (s_{i})$ and SCF.α(s₂) = M3K(s₂) + M3K(s₃), two additional ODEs are needed to completely account for the populations of the four possible states of machine M3K:

\frac{d M 3 K (s_{1})}{dt} = k_{2} M 3 K (s_{2}) + k_{8} M 3 K (s_{4}) - k_{1} SCF . α (s_{1}) M 3 K (s_{1})

(6)

\frac{d M 3 K (s_{2})}{dt} = k_{1} SCF . α (s_{1}) M 3 K (s_{1}) + k_{8} M 3 K (s_{3}) - (k_{2} + k_{7}) M 3 K (s_{2}) .

(7)

For the case of M2K or MPK, because phosphorylation of a kinase bound to the scaffold (transition from state s₂ to s₃) is a conditional process that requires colocalization of an upstream kinase on the same scaffold protein, only a fraction of all kinases in state s₂ are candidates for transition to s₃. One can use the MFA state transition diagrams of Fig. 5 to write the ODEs that track the populations of states s₁ and s₂ for M2K as follows:

\frac{d M 2 K (s_{1})}{dt} = k_{4} M 2 K (s_{2}) + k_{10} M 2 K (s_{4}) - k_{3} SCF . β (s_{1}) M 2 K (s_{1})

(8)

\frac{d M 2 K (s_{2})}{dt} = k_{3} SCF . β (s_{1}) M 2 K (s_{1}) + k_{10} M 2 K (s_{3}) - (k_{4} + k_{9} θ_{1}) M 2 K (s_{2}) .

(9)

Similar ODEs can be written to describe the dynamics of MPK states. The handling of Rule 9 (11) in Table 5 for M2K (MPK) phosphorylation requires special attention. Not all M2Ks (or MPKs) in state s₂ are subject to transition to s₃ at a given time. The eligible fraction must satisfy the model assumption that requires colocalization of an M2K (MPK) with its enzyme, a phosphorylated M3K (M2K), on the same scaffold protein. The factors θ₁ and θ₂ are introduced to account for the fractions of M2K(s₂) and MPK(s₂) that are eligible for transition to state s₃. In general, it is non-trivial to obtain analytical equations for factors such as θ₁ and θ₂. In this example, we simply approximate these factors as follows: θ₁ ≈ M3K(s₃)/SCF_tot and θ₂ ≈ M2K(s₃)/SCF_tot. These approximations are exact only if kinase phosphorylation reactions are independent and context insensitive. Figure 6 shows results from deterministic ODE simulations, compared to those from kinetic Monte Carlo simulations (performed as described below). We note that the stochastic simulation results are exact. The time trajectories for phosphorylated M3K from the deterministic and the stochastic simulations agree with each other on average. However, the ODE-based simulations of M2K and MPK phosphorylation deviate from those of the exact stochastic simulations due to the approximations used for θ₁ and θ₂ (Fig. 6(b)). Writing ODEs to directly describe MFA states as variables provides a way of quickly constructing a quantitative (sometimes approximate) model for simulation. One can avoid using approximations by expanding reaction rules into a chemical reaction network [16, 20]. In such cases, the variables in these ODEs would correspond to concentrations of chemical species rather than MFA states. Generating these ODEs may be impractical, however, because the number of ODEs needed to capture the dynamics of the chemical species implied by a set of MFAs can be much larger than the number of ODEs needed to capture the dynamics of MFA states.
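The gap between the approximation θ₁ ≈ M3K(s₃)/SCF_tot and the exact eligible population can be seen by counting on a small, explicit configuration. The sketch below is purely illustrative: the function name and the tuple-based encoding of scaffold occupancy are assumptions, and the four-scaffold configuration in the usage example is invented for the purpose.

```python
def eligible_m2k(scaffolds):
    """Compare exact and approximate counts of M2K(s2) eligible for Rule 9.

    scaffolds : list of (m3k_state, m2k_state) pairs, one per scaffold agent;
                an entry is None when the corresponding site is free.
    Returns (exact, approx), where approx = theta_1 * M2K(s2) with
    theta_1 = M3K(s3) / SCF_tot.
    """
    scf_tot = len(scaffolds)
    m3k_s3 = sum(1 for m3k, _ in scaffolds if m3k == 3)
    m2k_s2 = sum(1 for _, m2k in scaffolds if m2k == 2)
    # Exact count: M2K(s2) sharing a scaffold with a phosphorylated M3K.
    exact = sum(1 for m3k, m2k in scaffolds if m3k == 3 and m2k == 2)
    theta1 = m3k_s3 / scf_tot
    return exact, theta1 * m2k_s2


if __name__ == "__main__":
    # Invented configuration: M3K(s3) = 2, M2K(s2) = 3, SCF_tot = 4.
    config = [(3, 2), (3, None), (None, 2), (2, 2)]
    print(eligible_m2k(config))
```

In this configuration only one M2K(s₂) shares a scaffold with a phosphorylated M3K, whereas the approximation yields 0.5 × 3 = 1.5. The two values coincide only when site occupancies on a scaffold are statistically independent.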

Figure 6. Simulation of the model for the MAP kinase cascade with a scaffold of Fig. 5. Results from ODE simulations (smooth curves) vs. those from kinetic Monte Carlo simulations (fluctuating curves). (a) Kinases bound to the scaffold: M3K(s₂ + s₃) (solid line), M2K(s₂ + s₃) (dashed line) and MPK(s₂ + s₃) (dash-dot line), each normalized by the total number of kinases. (b) Phosphorylated kinases: M3K(s₃ + s₄) (solid line), M2K(s₃ + s₄) (dashed line) and MPK(s₃ + s₄) (dash-dot line), each normalized by the total number of kinases. SCF_tot = 1000, M3K_tot = 2000, M2K_tot = 1500 and MPK_tot = 1000. Initially, all machines are in state s₁. Kinetic parameters (k₁ to k₁₂, defined in Table 5) are chosen for the purpose of illustration, not to model a specific system. k₁ = k₃ = k₅ = 1.66 × 10⁻⁶ nL·s⁻¹, k₂ = k₄ = k₆ = 1.0 s⁻¹, k₇ = k₉ = k₁₁ = 3.0 s⁻¹ and k₈ = k₁₀ = k₁₂ = 1.0 s⁻¹.

4.2 Kinetic Monte Carlo simulation

The stochastic simulations were carried out using the kinetic Monte Carlo algorithm described in the previous section. The system-specific implementation was an agent-based simulation procedure that samples the reaction rule list in Table 5 and transforms individual MFA agents. We note that M2K_θ(s₂) and MPK_θ(s₂) in Rules 9 and 11 (Table 5), in contrast to the approximations used in the ODE simulations, are updated by on-the-fly bookkeeping that tracks reaction events immediately coupled to the two rules. The bookkeeping accounts for the numbers of M2K and MPK kinases in state s₂ that are eligible for the transitions specified by Rules 9 and 11. This implementation corresponds to a rejection-free algorithm for kinetic Monte Carlo simulation of rule-based models [49, 52], in which the exact rates of rules are calculated. This scheme becomes difficult to implement and computationally inefficient when predicates can be potentially affected by many types of reaction events. For example, Rule 9 may be affected by events from eight other distinct rules, as indicated in the dependence matrix $D$ below. When the algorithm executes an event defined by any of these eight rules, the algorithm also needs to evaluate whether the predicate of Rule 9 is affected.

Reaction rules are coupled in most systems. In other words, an event from one rule may affect the rates of others. Such coupling relationships between rules can be summarized by a “dependency graph,” or “influence map” [13]. For the example of the MAPK cascade model, we can summarize the rule dependence in the form of the following adjacency matrix:

D = [\begin{matrix} 1 & 1 & 0 & 0 & 0 & 0 & x & 0 & x & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 & 0 & 0 & x & 0 & x & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & x & 0 & x & 0 \\ 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & x & 0 & x & 0 \\ 0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & x & 0 \\ 0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & x & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & x & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & x & 1 & x & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & x & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & x & 1 & x & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & x & 1 \end{matrix}] .

(10)

A matrix entry d_ij at row i and column j represents the influence of an event from rule i on the rate of rule j. An entry with a Boolean value of 1 (0) indicates that an event in a rule has (does not have) an immediate impact on the rate of another rule, whereas x indicates that an event may or may not alter the rate of another rule, depending on whether the event changes the predicate of the other rule. For the example of the MAPK cascade, an event from Rule 1 can alter the rate of Rule 9 only in the case that a recruited M3K is phosphorylated and an unphosphorylated M2K is bound to the same scaffold agent. A dependency graph can in general be automatically derived by systematically analyzing a reaction rule set. In practice, the adjacency matrix $D$ can be made more quantitative by replacing the Boolean entries with pre-calculated values of rate changes. For example, the entry d₈₉ (the influence of an event from Rule 8 on the rate of Rule 9) can be (conditionally) replaced by the numerical value −k₉ (the minus sign indicates a reduction in the rule rate), because an M3K dephosphorylation may decrement the eligible population of M2K agents.
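The matrix of Eq. (10) can be used directly as a lookup table that tells a simulator which rule rates to revisit after an event. The sketch below encodes the rows of D as strings and is only an illustration; the names `D_ROWS` and `affected_rules` are assumptions.

```python
# Rows of the dependency matrix D of Eq. (10); row i describes the influence
# of an event from Rule i on Rules 1..12 ('1' definite, 'x' conditional).
D_ROWS = [
    "110000x0x000",  # Rule 1
    "110000x0x000",  # Rule 2
    "00110000x0x0",  # Rule 3
    "00110000x0x0",  # Rule 4
    "0000110000x0",  # Rule 5
    "0000110000x0",  # Rule 6
    "00000011x000",  # Rule 7
    "000000x1x000",  # Rule 8
    "0000000011x0",  # Rule 9
    "00000000x1x0",  # Rule 10
    "000000000011",  # Rule 11
    "0000000000x1",  # Rule 12
]


def affected_rules(fired_rule):
    """Return (definite, conditional) lists of rule numbers for a fired rule.

    definite    : rules whose rates always change (entries equal to 1)
    conditional : rules whose rates change only if a predicate is affected
                  (entries equal to x), so the predicate must be re-evaluated
    """
    row = D_ROWS[fired_rule - 1]
    definite = [j + 1 for j, entry in enumerate(row) if entry == "1"]
    conditional = [j + 1 for j, entry in enumerate(row) if entry == "x"]
    return definite, conditional


if __name__ == "__main__":
    print(affected_rules(1))   # rules to update after an M3K recruitment
```

After an event from Rule 1, for instance, only the rates of Rules 1 and 2 must be updated unconditionally, and only Rules 7 and 9 need a predicate check; the remaining eight rules can be skipped.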

To reduce bookkeeping, one may use a rejection algorithm [48] as an alternative. This algorithm allows one to use rejection sampling to simplify the firing of reactions when state transitions are associated with predicate functions. In this algorithm, one calculates rule rates and samples participant MFA agents without considering constraints imposed by the predicate functions. Sampled trial agents are rejected for state transition if the predicate function does not evaluate as true. For example, the number of all MPK agents in state s₂ can be used to calculate the rate of Rule 11 as ${\tilde{r}}_{11} = k_{11} M P K (s_{2})$ . Once an event from Rule 11 is sampled, a trial MPK agent in state s₂ will be chosen, which will then undergo a state transition only if the transition condition is satisfied. Otherwise, the transition is rejected. Implementation of a rejection algorithm is easier compared to that of a rejection-free algorithm. The rejection algorithm enforces the conditions of state transitions only when a rule that has conditional transitions is sampled. Our experience suggests that a rejection algorithm is more computationally efficient than a rejection-free algorithm as long as rejections do not comprise the vast majority of all Monte Carlo steps [49].
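The acceptance test at the heart of the rejection algorithm can be sketched for Rule 11 as follows. This is an illustrative sketch only: the dictionary-based agents and the function names are assumptions, and the surrounding loop that samples waiting times and rules is omitted.

```python
import random


def trial_rate_rule11(k11, mpk_s2_agents):
    """Trial rate k11 * MPK(s2), ignoring the predicate of Rule 11."""
    return k11 * len(mpk_s2_agents)


def fire_rule11(mpk_s2_agents, rng):
    """Attempt one Rule 11 event; return the transformed agent or None.

    mpk_s2_agents : list of MPK agents in state s2; each agent is a dict with
                    a 'state' and a 'scaffold' dict whose 'm2k_state' entry is
                    the state of the M2K on the same scaffold (None if absent).
    """
    idx = rng.randrange(len(mpk_s2_agents))
    trial = mpk_s2_agents[idx]
    # Predicate: a phosphorylated, scaffold-bound M2K (state s3) is colocated.
    if trial["scaffold"].get("m2k_state") == 3:
        trial["state"] = 3             # accepted: s2 -> s3
        mpk_s2_agents.pop(idx)         # agent leaves the s2 candidate pool
        return trial
    return None                        # rejected: configuration unchanged


if __name__ == "__main__":
    rng = random.Random(0)
    pool = [{"state": 2, "scaffold": {"m2k_state": 3}},
            {"state": 2, "scaffold": {"m2k_state": 2}}]
    print(trial_rate_rule11(3.0, pool), fire_rule11(pool, rng))
```

The predicate is evaluated only for the single sampled agent, so no eligibility counts need to be maintained between events; the cost is that some sampled events leave the configuration unchanged.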

The sparsity of dependency matrix $D$ in Eq. (10) (with many entries being zeros) indicates that rule couplings are largely localized. Similar to an efficient implementation of a conventional stochastic simulation algorithm for chemical reaction systems [54], for a large system specified by a considerable number of rules, weak dependency between rules can be used to optimize the procedure for updating rule rates after each Monte Carlo step.

5 Discussion

Information processing in living cells can be viewed as protein computation, in which proteins act as computing machines that react to external signals by local computation [31, 55]. Thus, a protein-protein interaction system is an integrative and distributed system with numerous computing devices interacting with each other under certain protocols. This perspective suggests that formal computing models can help archive, organize and interpret protein functions and their interactions. Formal structures also facilitate the process of computational modeling of complex and large-scale protein-protein interaction systems.

In this work, we introduce an extension to the traditional computing model of finite state automata to describe protein behaviors in response to external inputs and protein interactions. The MFA formalism offers a rule-based platform for modeling and simulating biochemical systems, especially for signal transduction. An MFA is in essence a data structure that specifies protocols to define the activity of a protein in a discrete state space. An MFA can be used as a representation of knowledge or hypotheses about a protein and can serve as a building block for biomolecular interaction models. At the systems level, reaction rules connect separate MFA-represented proteins and model the dynamics of a protein interaction system as synchronized MFA state transitions. The MFA formalism allows for a clearer and more natural representation of proteins and the combinatorics of protein interactions. The MFA formalism aids in the intuitive formulation of mechanistic models of signal transduction, which can be accessible to those who are familiar with the biological knowledge underlying the models. For quantitative model computation, as our example model of a MAP kinase cascade system demonstrates, ODEs for deterministic simulations can be constructed to track concentrations of machine states that often directly correspond to experimentally resolvable quantities. (For some states, the ODE solutions give approximations.) For exact and stochastic simulations, rule-based kinetic Monte Carlo methods can be applied.

Formalisms derived from finite automata theory have been proposed earlier for applications in biology. Notably, Harel and coworkers [56] have applied statecharts, a graphical and hierarchical (extended) finite automata structure [50], to model and simulate cell development and dynamics of cell populations. Recently, use of finite automata to model biomolecular interactions was studied by Cardelli [57], in which the author introduced the concept of polyautomata. The concepts of polyautomata and MFA are closely related. In Ref. [57], polyautomata are used to represent SPiM scripts, which specify stochastic simulations via the stochastic pi-calculus approach implemented in the SPiM software tool [58]. Here, we have shown that MFAs can be used to specify both deterministic and stochastic simulation approaches (Fig. 6). Cardelli [57] demonstrated that polyautomata are useful for modeling protein complex formation, including polymerization-like reactions. Here, we have shown that MFAs can be used to model protein complex formation as well as post-translational modifications of proteins, as illustrated in Figs. 3 and 5. Finally, the MFA formalism extends the polyautomata concept in an important way by allowing for the explicit representation of the functional components of proteins and the structural relationships among these components (Fig. 2).

A potential strength of the MFA framework is the mature development and many applications of finite automata. Various forms of finite state automata have been industry standards for modeling reactive systems for many years. In particular, MFAs are amenable to hardware implementations using programmable logic devices including widely-used programmable array logic (PAL), generic array logic (GAL) and field programmable gate array (FPGA) devices. Because of the sequential nature of discrete event-driven simulation, the performance of stochastic simulation of large-scale complex biochemical reaction systems is poor, even when it is not prohibitive. Hardware implementation may yield an advantage in speed. The first hardware (FPGA) stochastic simulations of biochemical networks were implemented by Keane and co-workers [59] and Salwinski and Eisenberg [60]. Taking advantage of the parallel architecture of FPGA, implementations by Salwinski and Eisenberg [60] allowed improvement in the speed of simulation of a simple bimolecular reaction by at least one order of magnitude compared to a conventional software implementation on a benchmarking platform. Recently, Yoshimi et al. [61] implemented the next reaction method of Gibson and Bruck [54] in an FPGA-based simulator and were able to achieve a significant speed-up. Implementation of rule-based models in programmable circuits has yet to be realized, but finite state machines are routinely implemented in hardware. Hardware implementation stores state variables and embeds state transition protocols into digital electronic circuits. In the design process, the MFA structures need to be translated into binary logic to apply digital computing to achieve the defined machine dynamics. As proteins can be modeled as MFAs, in principle, the dynamics of a protein interaction network may be simulated by electronic circuits.

Another potential use of the MFA structure is to archive proteins in terms of their reaction dynamics. Since an MFA is a standalone structure for storing the discrete dynamics of a protein, we expect that it can be used to systematically archive protein functions, with the MFA structure serving as an elementary record type for a database. Protein records in current protein databases are mostly annotations including amino acid sequences, functional domains, associated functions, etc. However, such information cannot be readily used to construct mechanistic biomolecular interaction models. The MFA structure offers an alternative way of storing protein records. Using a database with MFA records, one can construct a model by querying the database for MFA structures to obtain a set of desired molecular building blocks. One can then specify reaction rules to connect these machines. Such a rule-based model can be efficiently revised and incrementally refined when records of MFAs involved in the model are updated to reflect new knowledge.

Acknowledgment

This work was supported by grants GM076570, GM085273 and RR18754 from the National Institutes of Health, by the Department of Energy through contract DE-AC52-06NA25396, and by grant 30870477 from the National Science Foundation of China to J.Y. We thank the Center for Nonlinear Studies for support that made it possible for J.Y. to visit Los Alamos.

References

[1] Kitano H. Computational systems biology. Nature. 2002;420:206–210. doi: 10.1038/nature01254.
[2] Kholodenko BN. Cell signalling dynamics in time and space. Nat. Rev. Mol. Cell Biol. 2006;7:165–176. doi: 10.1038/nrm1838.
[3] Aldridge BB, Burke JM, Lauffenburger DA, Sorger PK. Physico-chemical modelling of cell signalling pathways. Nat. Cell Biol. 2006;8:1195–1203. doi: 10.1038/ncb1497.
[4] Breitling R, Hoeller D. Current challenges in quantitative modeling of epidermal growth factor signaling. FEBS Lett. 2005;579:6289–6294. doi: 10.1016/j.febslet.2005.10.034.
[5] Hlavacek WS, Faeder JR, Blinov ML, Perelson AS, Goldstein B. The complexity of complexes in signal transduction. Biotechnol. Bioeng. 2003;84:783–794. doi: 10.1002/bit.10842.
[6] Yang XJ. Multisite protein modification and intramolecular signaling. Oncogene. 2005;24:1653–1662. doi: 10.1038/sj.onc.1208173.
[7] Hunter T. Signaling—2000 and beyond. Cell. 2000;100:113–127. doi: 10.1016/s0092-8674(00)81688-8.
[8] Pawson T, Nash P. Assembly of cell regulatory systems through protein interaction domains. Science. 2003;300:445–452. doi: 10.1126/science.1083653.
[9] Bhattacharyya RP, Reményi A, Yeh BJ, Lim WA. Domains, motifs, and scaffolds: The role of modular interactions in the evolution and wiring of cell signaling circuits. Annu. Rev. Biochem. 2006;75:655–80. doi: 10.1146/annurev.biochem.75.103004.142710.
[10] Hlavacek WS, Faeder JR. The complexity of cell signaling and the need for a new mechanics. Sci. Signal. 2009;2:pe46. doi: 10.1126/scisignal.281pe46.
[11] Mayer BJ, Blinov ML, Loew LM. Molecular machines or pleiomorphic ensembles: signaling complexes revisited. J. Biol. 2009;8:81. doi: 10.1186/jbiol185.
[12] Hlavacek WS, Faeder JR, Blinov ML, Posner RG, Hucka M, Fontana W. Rules for modeling signal-transduction systems. Sci. STKE. 2006;2006:re6. doi: 10.1126/stke.3442006re6.
[13] Danos V, Feret J, Fontana W, Harmer R, Krivine J. Rule-based modelling of cellular signalling. Lect. Notes Comput. Sci. 2007;4703:17–41.
[14] Danos V, Laneve C. Formal molecular biology. Theor. Comput. Sci. 2004;325:69–110.
[15] Priami C, Quaglia P. Beta binders for biological interactions. Lect. Notes Comput. Sci. 2005:20–33.
[16] Faeder JR, Blinov ML, Goldstein B, Hlavacek WS. Rule-based modeling of biochemical networks. Complexity. 2005;10:22–41.
[17] Blinov ML, Yang J, Faeder JR, Hlavacek WS. Graph theory for rule-based modeling of biochemical networks. Lect. Notes Comput. Sci. 2006;4230:89–106.
[18] Andrei O, Kirchner H. Graph rewriting and strategies for modeling biochemical networks. 9th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing; IEEE Computer Society; 2007. pp. 407–414.
[19] Feret J, Danos V, Krivine J, Harmer R, Fontana W. Internal coarse-graining of molecular systems. Proc. Natl. Acad. Sci. USA. 2009;106:6453–6458. doi: 10.1073/pnas.0809908106.
[20] Blinov ML, Faeder JR, Goldstein B, Hlavacek WS. BioNetGen: software for rule-based modeling of signal transduction based on the interactions of molecular domains. Bioinformatics. 2004;20:3289–3291. doi: 10.1093/bioinformatics/bth378.
[21] Moraru II, Schaff JC, Slepchenko BM, Blinov ML, Morgan F, Lakshminarayana A, Gao F, Li Y, Loew LM. Virtual Cell modelling and simulation software environment. IET Syst. Biol. 2008;2:352–362. doi: 10.1049/iet-syb:20080102.
[22] Faeder JR, Blinov ML, Hlavacek WS. Rule-based modeling of biochemical systems with BioNetGen. Methods Mol. Biol. 2009;500:113–167. doi: 10.1007/978-1-59745-525-1_5.
[23] Mallavarapu A, Thomson M, Ullian B, Gunawardena J. Programming with models: modularity and abstraction provide powerful capabilities for systems biology. J. Roy. Soc. Interface. 2009;6:257–270. doi: 10.1098/rsif.2008.0205.
[24] Lok L, Brent R. Automatic generation of cellular reaction networks with moleculizer 1.0. Nat. Biotechnol. 2005;23:131–136. doi: 10.1038/nbt1054.
[25] Andrews SS, Addy NJ, Brent R, Arkin AP, Sauro HM. Detailed simulations of cell biology with Smoldyn 2.1. PLoS Comput. Biol. 2010;6:e1000705. doi: 10.1371/journal.pcbi.1000705.
[26] Meier-Schellersheim M, Xu X, Angermann B, Kunkel EJ, Jin T, Germain RN. Key role of local regulation in chemosensing revealed by a new molecular interaction-based modeling method. PLoS Comput. Biol. 2006;2:e82. doi: 10.1371/journal.pcbi.0020082.
[27] Lis M, Artyomov MN, Devadas S, Chakraborty AK. Efficient stochastic simulation of reaction-diffusion processes via direct compilation. Bioinformatics. 2009;25:2289–2291. doi: 10.1093/bioinformatics/btp387.
[28] Morton-Firth CJ, Bray D. Predicting temporal fluctuations in an intracellular signalling pathway. J. Theor. Biol. 1998;192:117–128. doi: 10.1006/jtbi.1997.0651.
[29].Regev A, Silverman W, Shapiro E. Representation and simulation of biochemical processes using the π-calculus process algebra. Pacific Symposium on Biocomputing. 2001;6:459–470. doi: 10.1142/9789814447362_0045. [DOI] [PubMed] [Google Scholar]
[30]. Priami C, Regev A, Shapiro E, Silverman W. Application of a stochastic name-passing calculus to representation and simulation of molecular processes. Inform. Process. Lett. 2001;80:25–31.
[31]. Regev A, Shapiro E. Cellular abstractions: Cells as computation. Nature. 2002;419:343. doi: 10.1038/419343a.
[32]. Fisher J, Henzinger TA. Executable cell biology. Nat. Biotechnol. 2007;25:1239–1250. doi: 10.1038/nbt1356.
[33]. Gillespie DT. Stochastic simulation of chemical kinetics. Annu. Rev. Phys. Chem. 2007;58:35–55. doi: 10.1146/annurev.physchem.58.032806.104637.
[34]. Hopcroft JE, Motwani R, Ullman JD. Introduction to Automata Theory, Languages, and Computation. 2nd edition. Addison-Wesley; New York: 2000.
[35]. Sipser M. Introduction to the Theory of Computation. 2nd edition. Course Technology; 2005.
[36]. Lee D, Yannakakis M. Principles and methods of testing finite state machines — a survey. Proc. IEEE. 1996;84:1090–1123.
[37]. Börger E, Stärk RF. Abstract State Machines: A Method for High-Level System Design and Analysis. Springer Verlag; New York: 2003.
[38]. Mohri M. Finite-state transducers in language and speech processing. Computational Linguistics. 1997;23:269–311.
[39]. Brand D, Zafiropulo P. On communicating finite-state machines. J. ACM. 1983;30:323–342.
[40]. Goldstein B, Faeder JR, Hlavacek WS, Blinov ML, Redondo A, Wofsy C. Modeling the early signaling events mediated by FcεRI. Mol. Immunol. 2002;38:1213–1219. doi: 10.1016/s0161-5890(02)00066-4.
[41]. Faeder JR, Hlavacek WS, Reischl I, Blinov ML, Metzger H, Redondo A, Wofsy C, Goldstein B. Investigation of early events in FcεRI-mediated signaling using a detailed mathematical model. J. Immunol. 2003;170:3769–3781. doi: 10.4049/jimmunol.170.7.3769.
[42]. Borisov NM, Markevich NI, Hoek JB, Kholodenko BN. Signaling through receptors and scaffolds: independent interactions reduce combinatorial complexity. Biophys. J. 2005;89:951–966. doi: 10.1529/biophysj.105.060533.
[43]. Conzelmann H, Saez-Rodriguez J, Sauter T, Kholodenko BN, Gilles E. A domain-oriented approach to the reduction of combinatorial complexity in signal transduction networks. BMC Bioinformatics. 2006;7:34. doi: 10.1186/1471-2105-7-34.
[44]. Borisov NM, Markevich NI, Hoek JB, Kholodenko BN. Trading the micro-world of combinatorial complexity for the macro-world of protein interaction domains. BioSystems. 2006;83:152–166. doi: 10.1016/j.biosystems.2005.03.006.
[45]. Borisov NM, Chistopolsky AS, Faeder JR, Kholodenko BN. Domain-oriented reduction of rule-based network models. IET Syst. Biol. 2008;2:342–351. doi: 10.1049/iet-syb:20070081.
[46]. van Kampen NG. Stochastic processes in physics and chemistry. 3rd edition. North-Holland, Boston; 2007.
[47]. Voter AF. Introduction to the kinetic Monte Carlo method. Radiation Effects in Solids. 2007;235:1–23.
[48]. Yang J, Monine MI, Faeder JR, Hlavacek WS. Kinetic Monte Carlo method for rule-based modeling of biochemical networks. Phys. Rev. E. 2008;78:031910. doi: 10.1103/PhysRevE.78.031910.
[49]. Yang J, Hlavacek WS. Rejection-free kinetic Monte Carlo simulation of multivalent biomolecular interactions. arXiv preprint arXiv:0812.4619v5. 2010.
[50]. Danos V, Feret J, Fontana W, Krivine J. Scalable simulation of cellular signaling networks. Lect. Notes Comput. Sci. 2007;4807:139–157.
[51]. Colvin J, Monine MI, Faeder JR, Hlavacek WS, Von Hoff DD, Posner RG. Simulation of large-scale rule-based models. Bioinformatics. 2009;25:910–917. doi: 10.1093/bioinformatics/btp066.
[52]. Colvin J, Monine MI, Gutenkunst RN, Hlavacek WS, Von Hoff DD, Posner RG. RuleMonkey: software for stochastic simulation of rule-based models. BMC Bioinformatics. 2010;11:404. doi: 10.1186/1471-2105-11-404.
[53]. Garrington TP, Johnson GL. Organization and regulation of mitogen-activated protein kinase signaling pathways. Curr. Opin. Cell Biol. 1999;11:211–218. doi: 10.1016/s0955-0674(99)80028-3.
[54]. Gibson MA, Bruck J. Efficient exact stochastic simulation of chemical systems with many species and many channels. J. Phys. Chem. A. 2000;104:1876–1889.
[55]. Bray D. Protein molecules as computational elements in living cells. Nature. 1995;376:307–312. doi: 10.1038/376307a0.
[56]. Fisher J, Harel D. On statecharts for biology. Symbolic systems biology: theory and methods; Jones and Bartlett; 2010. in press.
[57]. Cardelli L. Artificial biochemistry. In: Condon A, Harel D, Kok JN, Salomaa A, Winfree E, editors. Algorithmic Bioprocesses, Natural Computing Series. Springer; Berlin: 2009. pp. 429–462.
[58]. Phillips A, Cardelli L. Efficient, correct simulation of biological processes in the stochastic pi-calculus. Lect. Notes Comput. Sci. 2007;4695:184–199.
[59]. Keane JF, Bradley C, Ebeling C. A compiled accelerator for biological cell signaling simulations. 2004 ACM/SIGDA 12th International Symposium on Field Programmable Gate Arrays; New York: ACM; 2004. pp. 233–241.
[60]. Salwinski L, Eisenberg D. In silico simulation of biological network dynamics. Nat. Biotechnol. 2004;22:1017–1019. doi: 10.1038/nbt991.
[61]. Yoshimi M, Iwaoka Y, Nishikawa Y, Kojima T, Osana Y, Funahashi A, Hiroi N, Shibata Y, Iwanaga N, Yamada H, Kitano H, Amano H. FPGA implementation of a data-driven stochastic biochemical simulator with the next reaction method. International Conference on Field Programmable Logic and Applications; IEEE; 2007. pp. 254–259.

