Phenotype control techniques for Boolean gene regulatory networks

Daniel Plaugher; David Murrugarra

doi:10.1101/2023.04.17.537158

This is a preprint.

It has not yet been peer reviewed by a journal.


[Preprint]. 2023 Apr 18:2023.04.17.537158. [Version 1] doi: 10.1101/2023.04.17.537158


PMCID: PMC10153207 PMID: 37131770

Abstract

Modeling cell signal transduction pathways via Boolean networks (BNs) has become an established method for analyzing intracellular communications over the last few decades. What’s more, BNs provide a coarse-grained approach, not only for understanding molecular communications, but also for targeting pathway components that alter the long-term outcomes of the system. This has come to be known as phenotype control theory. In this review we study the interplay of various approaches for controlling gene regulatory networks such as: algebraic methods, control kernel, feedback vertex set, and stable motifs. The study also includes a comparative discussion of the methods, using an established cancer model of T-Cell Large Granular Lymphocyte (T-LGL) Leukemia. Further, we explore possible options for making the control search more efficient using reduction and modularity. Finally, we discuss open challenges such as the computational complexity of each control technique and the availability of software for implementing it.

1. Introduction and Motivation

In biology, phenotypes represent observable features such as apoptosis, proliferation, senescence, autophagy, and more. Mathematically, a phenotype is associated with a group of attractors where a subset of the system’s variables have a shared state. We define an attractor as a set of states from which there is no escape as the system evolves, and an attractor with a singleton state is called a fixed point. These shared states are then used as biomarkers that indicate diverse hallmarks of the system that one might view as rolling a ball down Waddington’s epigenetic landscape [1]. Thus, phenotype control is the ability to drive the system to a predetermined phenotype from any initial state by inducing the appropriate gene knockouts or knock-ins [2].
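To make the definitions of attractor, fixed point, and phenotype concrete, the following is a minimal Python sketch under a synchronous update scheme. The three-node rules in `toy` are hypothetical and chosen only for illustration; they are not taken from any model in this review. A phenotype is represented here as the group of attractors that share a value of one marker node.

```python
from itertools import product

def find_attractors(update, n):
    """Return the attractors of a synchronous Boolean network on n nodes."""
    attractors = set()
    for state in product((0, 1), repeat=n):
        seen = []
        while state not in seen:          # follow the trajectory until it repeats
            seen.append(state)
            state = update(state)
        cycle = seen[seen.index(state):]  # the repeating part is the attractor
        k = cycle.index(min(cycle))       # rotate to a canonical starting state
        attractors.add(tuple(cycle[k:] + cycle[:k]))
    return attractors

def toy(state):
    # Hypothetical rules: x1 <- x2, x2 <- x1, x3 <- x1 AND x2
    x1, x2, x3 = state
    return (x2, x1, x1 & x2)

attractors = find_attractors(toy, 3)
fixed_points = {a[0] for a in attractors if len(a) == 1}

# Phenotype defined by the marker x3: attractors in which x3 is ON in every state.
phenotype_on = {a for a in attractors if all(s[2] == 1 for s in a)}
```

Exhaustive enumeration of all $2^{n}$ states is only feasible for small $n$, which is one reason the efficiency techniques of Section 4 matter.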

One way mathematicians are able to assist biological researchers is through modeling cell signal transduction pathways. However, these pathways can be highly complex due to signaling motifs like feedback loops, crosstalk, and high-dimensional nonlinearity [3]. To address these complexities, mathematical modelers have developed many strategies for creating and analyzing networks, traditionally classified based on the time and population of gene products. For instance, there are techniques for continuous population with continuous time such as ordinary differential equations [3, 4], discrete population with continuous time such as the Gillespie formulation [5, 6], and discrete population with discrete time such as BNs, logical models, and also their related stochastic counterparts [7–11]. There are also numerous well developed statistical, agent based, and PDE models which are outside the scope of this review [2]. For this review, the framework of choice utilizes Boolean networks.

Today increasingly extensive effort is dedicated to understanding more than just the cancer cells themselves. Modelers have developed multicellular models including cancer, stromal, immune, and other cells to study the interplay between cancer cells and their surrounding tumor microenvironment [12–15]. These models are typically referred to as multiscale because they integrate interactions at differing size and time scales, making it possible to simulate clinically relevant spatiotemporal scales, and at the same time simulate the effect of molecular drugs on tumor progression [16–21]. The high complexity of these models generates challenges for model validation such as the need to estimate too many model parameters and controlling variables at differing scales [12, 22].

Understanding such mechanisms is quite convoluted and is not presently well-established. Even though multiscale or hybrid models would likely provide more realistic simulations, there are currently no control methods that apply directly to such models [2, 12, 22]. For this reason, we elect to utilize Boolean networks because they provide a coarse-grained description of gene regulatory networks without the need for tedious parameter discovery [23]. This framework would also allow for approximating multistate, multiscale, or even continuous systems by projecting into a Boolean setting for analysis [12, 24, 25]. While there are many techniques available for controlling Boolean networks, we will highlight methods that provide overarching theory, as well as some emerging techniques. These methods include computational algebra [26,27], control kernel [28,29], feedback vertex set [30,31], and stable motifs [32], where each tactic provides a complementary approach depending on the information available [22,33]. We will also include techniques to address efficiency with network modularity [34] and reduction [33, 35–37].

Phenotype control has two main distinguishing features. Its objectives are related to dynamical attractors of highly nonlinear systems, and it focuses on open-loop interventions. These types of interventions are instances where the protocol is not adjusted based on the state of the system, inducing the control only at the front end. This is contrasted with optimal control, where the goal is to find a control policy that specifies the ideal control action for each state [38–42]. Thus, phenotype control theory is primarily concerned with identifying key markers of the system that aid in understanding the various functions of cells and their molecular mechanisms.

The format of this review will be as follows: Section 2 will provide an initial overview of the methods with discussion of overlapping features and application to a known cancer model (Section 2.1), Section 3 will lay out the different techniques used to find target controls, Section 4 will discuss methods to make the target discovery problem more efficient, Section 5 will address limitations and open problems, Section 6 will have some concluding thoughts and discussion. Finally, readers can find helpful information in the Appendix including: toy models for basic examples of each method (Section 7.1), foundational principles for finite dynamical systems (Section 7.2), simulation techniques of suggested targets (Section 7.3), software with tutorials and how-to documentation (Section 7.4), and lastly Section 7.5 has supplementary tables.

2. Overview of Control Methods

Depending on the specific aims and information available, Table 1 provides a set of complementary approaches for phenotype control and their key features. For instance, if you only have access to the wiring diagram, then feedback vertex set (FVS) is an option for global stabilization. If you have the Boolean rules, and if the objective is to drive the system into one of the existing attractors, then stable motifs (SM) are an option. If you have the Boolean rules, and if the objective is to create a new attractor or to block existing attractors, then algebraic methods (AM), also called computational algebra (CA), are an option.

Table 1:

Phenotype methods and their features.

Method	Control Objective(s)	Control Action(s)	Requirements	Refs
Algebraic Methods	Transform transient state into a steady state; Transform steady state into a transient state; Eliminate transition between two states	Assign node to specified value; Activate or inhibit specific edge	Regulatory network structure; Boolean functions written as polynomials	[26, 43]
Control Kernel	Force the system to have one stable attractor	Assign node to specified value	Boolean functions written as polynomials	[28, 29]
Feedback Vertex Set	Force the system to have one stable attractor	Assign node to its value in the target attractor	Regulatory network structure; Node activities in target attractor	[30, 44]
Stable Motifs	Force any initial state toward a pre-existing attractor; Transform a steady-state into a transient state	Assign node to its stable motif value; Inhibit interaction to disrupt stable motif of a steady-state	Regulatory network structure; Boolean rules written in DNF	[32]


This table contains a summary of the target identification techniques discussed, as well as their key features. Namely, we summarize their objectives, induced control actions, and the necessary components to use each method. Software for these methods can be found in the Appendix.

Despite the shared goals of these methods, each seeks distinct control objectives. They are each based on specific mathematical structures and lack a common theoretical framework that allows their complementary and synergistic application. Yet, we clearly see overlapping outcomes between methods. For example, it has been shown that the FVS establishes the upper bound for the number of targets required to control the system [29]. Indeed, we observe that, among methods using pre-existing attractors, the control sets for CA and SM are subsets of the larger FVS results. On the other hand, CA and CK appear to produce minimal sets. Further, the CA and SM methods can produce the same results, or CA can be a subset of SM. See Tables 2 and 3.

Table 2:

Large T-LGL target tables.

(a) CA Nodes
Name	State

S1P	OFF

KRAS	OFF

SPHK1	OFF

PDGFR	OFF

DISC	ON

Ceramide	ON

GAP	ON

(b) CA Edges
Tail	Head	State

S1P	PDGFR	OFF

KRAS	MEK	OFF

JAK	STAT3	OFF

SPHK1	S1P	OFF

PDGFR	SPHK1	OFF

DISC	Caspase	ON

DISC	MCL1	ON

Ceramide	S1P	ON

GAP	KRAS	ON

(c) FVS
Name	State

TCR	Osc.
ZAP70	OFF
GAP	OFF
NFKB	ON
IL2RB	ON
IL2RA	OFF
JAK	ON
TBET	ON
P2	ON
DISC	ON
BID	ON
S1P	OFF
PDGF	OFF
IL15	ON
Stimuli	ON
Stimuli2	OFF
CD45	OFF
TAX	OFF

(d) SM
Name	State

PDGFR	OFF

S1P	OFF

SPHK1	OFF

TBET	ON
ERK	ON
Ceramide	ON

TBET	ON
GRB2	ON
Ceramide	ON

TBET	ON
IL2RB	ON
Ceramide	ON

TBET	ON
IL2RBT	ON
Ceramide	ON

TBET	ON
KRAS	ON
Ceramide	ON

TBET	ON
PI3K	ON
MEK	ON
Ceramide	ON


Here we list the control targets for the larger T-LGL model, where control sets are separated by double horizontal bars such that Table 2a contains seven singleton controls, Table 2b contains nine singleton controls, Table 2c contains one set of 18 controls (some of which are unnecessary), and Table 2d contains three singleton controls, five triple control sets, and one quadruple control set [2].

Table 3:

Reduced T-LGL target tables.

(a) CA Edges
Tail	Head	State

Ceramide	DISC	ON

Ceramide	S1P	ON

(b) CA Nodes
Name	State

S1P	OFF

DISC	ON

Ceramide	ON

(c) CK
Name	State

S1P	OFF

(d) FVS
Name	State

Ceramide	ON
DISC	ON

Ceramide	ON
FLIP	OFF

S1P	OFF
DISC	ON

S1P	OFF
FLIP	OFF

(e) SM
Name	State

S1P	OFF

Ceramide	ON


As before, we list the control targets for the small T-LGL model, where control sets are separated by double horizontal bars such that Table 3a contains two singleton controls, Table 3b contains three singleton controls, Table 3c contains one singleton, Table 3d contains four sets of dual controls, and Table 3e contains two singleton controls [2].

However, a key unique feature of CA is the creation of new attractors, while other methods discussed rely on pre-existing attractors. This then leads to the potential for new target discovery as the long-term objectives change. Further, CA sets out to solve a system of polynomial equations, whereas FVS and SM rely on strongly connected components to find their targets. To explicitly see these connections, consider the following example.

2.1. Case Study: T-Cell Large Granular Lymphocyte (T-LGL) Leukemia

T-cell large granular lymphocyte (T-LGL) leukemia is a blood cancer in which there is an anomalous surge in white blood cells called T-cells. Cytotoxic T-cells are part of the immune system that fight against antigens, even by killing cancer cells. These T-cells release specific cytokines that alter how the immune system responds to external agents by way of recruiting particular immune cells to fight infection, promoting antibody production, or inhibiting the activation and proliferation of other cells [32]. Once their job is complete, they undergo controlled cell death; T-LGL leukemia occurs when these T-cells evade apoptosis and maintain proliferation [2]. No standard of treatment has been established, although options include immunosuppressive therapy (such as methotrexate), oral cyclophosphamide (an alkylating agent), or cyclosporine (an immunomodulatory drug) [45]. Since the search for standard therapies for this disease continues, the identification of potential therapeutic targets is essential.

In [46], a Boolean dynamic model was constructed consisting of a network of sixty nodes indicating the cellular location, molecular components, and conceptual nodes. For the sake of our analysis, we use the Boolean rules in Table 8 (see Appendix). The main inputs to the network are “Stimuli”, which represent virus or antigen stimulation, and the main output node is “Apoptosis”. Model analysis revealed that the system contains three attractors, two of which are diseased and the other is healthy (determined by apoptosis activation). Table 2 lists the control targets discovered by each of the respective methods for the large T-LGL model, with the objective of activating apoptosis. Individual control methods are found in Tables (2a) – (2d), and control sets are separated by double horizontal bars. Note that the CK method did not produce results for the large model because of its size [2].

Likewise, an analysis of a smaller (reduced) model of T-LGL can also be useful [11,46]. Model analysis indicated that the reduced model in Figure 1 contains two fixed points, one healthy and one diseased. Regulatory functions for the small T-LGL model can be found in Appendix Table 7. Tables (3a) – (3e) list the control targets discovered by each of the respective methods for the small T-LGL model, with the objective of activating apoptosis. The control sets are separated by double horizontal bars as before [2].

For both large and reduced models, we see that FVS provides an upper bound for the number of targets needed to achieve network control, whereas CA and CK can provide minimal sets.

3. Description of Control Methods

3.1. Algebraic Methods (CA)

The method based on computational algebra described in [26, 43] seeks two types of controls: nodes and edges. These can be achieved biologically by blocking effects of the products of genes associated with nodes, or by targeting specific gene communications (see Figure 3). The identification of control targets is achieved by encoding the nodes (or edges) of interest as control variables within the functions. Then, the control objective is expressed as a system of polynomial equations that is solved by computational algebra techniques. Though node and edge control are similar, they provide a range of biological options. One reason is that node control requires an entire node to be knocked out (or knocked-in), but edge control simply requires an edge communication to be blocked (or continually expressed) [2].

Figure 3: — *CA diagram*. Here, we show a toy model that emphasizes the difference between node and edge control. The key difference is that edge control (b) maintains all other communications, whereas node control removes every signal associated with the given target.

Let the function $ℱ : F_{2}^{n} \times U \to F_{2}^{n}$ denote a Boolean network with control, where $U$ is a set of all possible controls. Then, for $u \in U$ , the new system dynamics are given by $x (t + 1) = ℱ (x (t), u)$ . That is, each coordinate $u_{i, j} \in u$ encodes the control of edges as follows: consider the edge $x_{i} \to x_{j}$ in a given wiring diagram. Then, we can encode this edge as a control edge by the following function:

ℱ_{j} (x, u_{i, j}) : = f_{j} (x_{1}, \dots, (u_{i, j} + 1) x_{i}, \dots, x_{n})

which gives

Inactive control:
$u_{i, j} = 0, ℱ_{j} (x, 0) = f_{j} (x_{1}, \dots, x_{i}, \dots, x_{n})$
Active control (edge deletion):
$u_{i, j} = 1, ℱ_{j} (x, 1) = f_{j} (x_{1}, \dots, x_{i} = 0, \dots, x_{n}) .$
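The edge-control encoding can be checked numerically. The sketch below is illustrative only: the regulatory function `f_j` (an OR of two inputs written as a polynomial over $F_{2}$) is a hypothetical choice, and `F_j` applies the substitution $x_{i} \mapsto (u_{i, j} + 1) x_{i}$ to the first input.

```python
from itertools import product

def f_j(x1, x2):
    # Hypothetical rule: x1 OR x2, written as the polynomial x1 + x2 + x1*x2 over F_2
    return (x1 + x2 + x1 * x2) % 2

def F_j(x1, x2, u):
    # Edge control on x1 -> x_j: replace x1 by (u + 1) * x1, reduced mod 2
    return f_j(((u + 1) * x1) % 2, x2)

# Tabulate the controlled function for every input and both control settings.
table = {(x1, x2, u): F_j(x1, x2, u)
         for x1, x2, u in product((0, 1), repeat=3)}

# u = 0 leaves the rule unchanged; u = 1 behaves as if x1 were fixed to 0.
inactive_ok = all(table[(x1, x2, 0)] == f_j(x1, x2)
                  for x1, x2 in product((0, 1), repeat=2))
deleted_ok = all(table[(x1, x2, 1)] == f_j(0, x2)
                 for x1, x2 in product((0, 1), repeat=2))
```

Only the edge from $x_{1}$ into this one function is silenced; any other function that reads $x_{1}$ is untouched, which is the distinction from node control.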

The definition of edge control can therefore be applied to many edges, obtaining $ℱ : F_{2}^{n} \times F_{2}^{e} \to F_{2}^{n}$ where $e$ is the number of edges in the diagram. Next, we consider control of node $x_{i}$ from a given diagram. We can encode the control of node $x_{i}$ by the following function:

ℱ_{j} (x, u_{i}^{-}, u_{i}^{+}) : = (u_{i}^{-} + u_{i}^{+} + 1) f_{j} (x) + u_{i}^{+}

which yields

Inactive control:
$u_{i}^{-} = 0, u_{i}^{+} = 0, ℱ_{j} (x, 0, 0) = f_{j} (x)$
Node $x_{i}$ deletion:
$u_{i}^{-} = 1, u_{i}^{+} = 0, ℱ_{j} (x, 1, 0) = 0$
Node $x_{i}$ expression:
$u_{i}^{-} = 0, u_{i}^{+} = 1, ℱ_{j} (x, 0, 1) = 1$
Negated function value (irrelevant for control):
$u_{i}^{-} = 1, u_{i}^{+} = 1, ℱ_{j} (x, 1, 1) = f_{j} (x) + 1 .$
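The four cases of the node-control encoding follow from arithmetic modulo 2, as the short sketch below confirms. The helper name `node_control` is introduced here for illustration; it takes the uncontrolled function value rather than a specific model.

```python
def node_control(f_value, u_minus, u_plus):
    """Controlled value (u- + u+ + 1) * f + u+ computed over F_2."""
    return ((u_minus + u_plus + 1) * f_value + u_plus) % 2

# For each control setting, list the controlled value when f = 0 and when f = 1.
cases = {
    (u_minus, u_plus): [node_control(f, u_minus, u_plus) for f in (0, 1)]
    for u_minus in (0, 1)
    for u_plus in (0, 1)
}
# (0, 0) reproduces f, (1, 0) forces 0, (0, 1) forces 1, (1, 1) negates f.
```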

Using these definitions, we can achieve three types of objectives. Let $F = (f_{1}, \dots, f_{n}) : F_{2}^{n} \to F_{2}^{n}$ where $F_{2} = \{0, 1\}$ and $μ = \{μ_{1}, \dots, μ_{n}\}$ is a set of controls. Then we may:

Generate new attractors. If $y$ is a desirable state (e.g. apoptosis), but it is not currently an attractor, we find a set $μ$ so that we can solve
$ℱ_{j} (y, μ) - y_{j} = 0, j = 1, \dots, n$ (1)
Block transitions or remove attractors. If $y$ is an undesirable attractor (e.g. proliferation), we want to find a set $μ$ so that $ℱ (y, μ) \neq y$ . In general, we can use this framework to avoid transitions between states (say $y \to z$ ) so that $ℱ (y, μ) \neq z$ . So we can solve
$ℱ_{j} (y, μ) - z_{j} + 1 = 0, j = 1, \dots, n$ (2)
Block regions. If a particular value of a variable, say $x_{k} = a$ , triggers an undesirable pathway, then we need all attractors to satisfy $x_{k} \neq a$ . So we find a set $μ$ so that the following system has no solution
$\begin{array}{rl} ℱ_{j} (x, μ) - x_{j} = 0, & j = 1, \dots, n \\ x_{k} - a = 0 . & \end{array}$ (3)

A subtle change in notation requires attention, because we have now used $x$ to indicate variables rather than specific values.

Notably, the Boolean functions $F$ must be written as polynomials. To complete the control search we then compute the Gröbner basis of the ideal associated with the given objective. For example, if we generate new attractors, we find the Gröbner basis for the ideal

$I = ⟨ℱ_{1} (y, μ) - y_{1}, \dots, ℱ_{n} (y, μ) - y_{n}⟩ .$ (4)

Therefore, we can determine all controls that solve the system of equations and detect combinatorial actions for the given model [2].
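For very small networks, the solutions of system (1) can also be found by exhaustive enumeration, which is a useful sanity check on what the algebraic computation should return. The sketch below is not a Gröbner basis computation: it swaps in brute force over all node-control assignments. The three-node network `f`, the target state `y`, and the helper names are hypothetical.

```python
from itertools import product

def f(x):
    # Hypothetical network as polynomials over F_2:
    # f1 = x2, f2 = x1*x3, f3 = x1 + 1 (i.e. NOT x1)
    x1, x2, x3 = x
    return (x2, (x1 * x3) % 2, (x1 + 1) % 2)

def controlled(x, mu):
    """Apply node controls mu = ((u1-, u1+), (u2-, u2+), (u3-, u3+))."""
    return tuple(((um + up + 1) * fj + up) % 2
                 for fj, (um, up) in zip(f(x), mu))

# Per-node options: no control, knockout, knock-in.
# The (1, 1) setting is skipped because it only negates the function.
OPTIONS = ((0, 0), (1, 0), (0, 1))

def controls_creating_fixed_point(y):
    """All control assignments mu with F(y, mu) = y, fewest interventions first."""
    solutions = [mu for mu in product(OPTIONS, repeat=len(y))
                 if controlled(y, mu) == y]
    return sorted(solutions, key=lambda mu: sum(map(sum, mu)))

y = (1, 1, 1)                      # desired state; f(y) = (1, 1, 0), so not yet fixed
solutions = controls_creating_fixed_point(y)
smallest = solutions[0]
```

In this toy case the smallest solution is a single knock-in of the third node. Enumeration grows as $3^{n}$ in the number of controllable nodes, so it is no substitute for the algebraic approach on realistic models.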

3.2. Control Kernel (CK)

A control kernel (CK) is defined as a set of nodes of minimal order whose pinning reshapes the dynamics such that the basin of attraction of attractor $A$ becomes the entire configuration space. There are three main contributors to the CK: input nodes (nodes with identity function as the updating rule), distinguishing nodes (subset of nodes where a pinning exists that is both compatible with attractor $A$ and incompatible with the other initial attractors of the network), and additional nodes (minimal distinguishing node sets that are needed to remove additional attractors). Note that input and distinguishing nodes provide only a lower bound to CK size because the pinning procedure can create new attractors [2, 29].

To compute CKs, first start with pinning input nodes. Then a brute-force method is used to loop over sets of distinguishing nodes of increasing size for each attractor. A CK has been found when no other attractors exist after pinning. Uncontrollable complex attractors are identified by pinning all constant nodes. If more than one attractor remains, then the cycle does not have a CK [29]. CK discovery works well for small networks; however, larger networks prove more difficult due to the brute-force nature of the algorithm. In fact, the CK size scales logarithmically with the number of attractors in the network [2, 29].
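The search loop can be sketched in a few lines of Python. This is a simplified illustration, not the algorithm of [29]: it skips the input-node and distinguishing-node shortcuts and simply tries every subset of increasing size, pinning nodes to their values in a target fixed point. The three-node network is hypothetical.

```python
from itertools import combinations, product

def update(state):
    # Hypothetical rules: x1 <- x2, x2 <- x1, x3 <- x1 AND x2
    x1, x2, x3 = state
    return (x2, x1, x1 & x2)

def pinned(update, pins):
    """Wrap an update rule so that nodes in pins (index -> value) are held fixed."""
    def step(state):
        nxt = list(update(state))
        for i, v in pins.items():
            nxt[i] = v
        return tuple(nxt)
    return step

def attractors(step, n):
    found = set()
    for state in product((0, 1), repeat=n):
        seen = []
        while state not in seen:
            seen.append(state)
            state = step(state)
        found.add(frozenset(seen[seen.index(state):]))
    return found

def control_kernels(update, n, target):
    """Smallest node sets whose pinning leaves the target as the only attractor."""
    for size in range(n + 1):
        hits = [nodes for nodes in combinations(range(n), size)
                if attractors(pinned(update, {i: target[i] for i in nodes}), n)
                == {frozenset([target])}]
        if hits:
            return hits
    return []

kernels = control_kernels(update, 3, (1, 1, 1))
```

Every candidate subset triggers a full attractor search, which makes the cost of the brute-force approach on larger networks apparent.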

3.3. Feedback Vertex Set (FVS)

FVS control uses only the topological structure of a network and knowledge of target phenotype biomarkers to induce a phenotype change [30, 44]. In FVS control, by manipulating the internal state of the feedback vertex set (i.e. the nodes that intersect every cycle in the network), we disrupt all feedbacks, making the resulting network admit a single steady state, which can be aligned with one of the original system’s dynamic attractors. Thus, an FVS of a graph is a minimal set of nodes whose removal leaves the graph without cycles. FVS control has been successfully applied to a variety of networks and has been shown to provide an upper bound on the cardinality of the single set of control nodes needed to reach all attractors [29, 31]. The FVS method’s advantages include: (i) control simply requires fixing the internal state of the FVS to match that of the desired attractor, and (ii) making robust predictions that depend only on the network structure and not on dynamical details. For a transcription factor network underlying a phenotypic switch, the FVS is a set of transcription factors that, when controlled to match the expression of a desired phenotype, will shift the cell towards that phenotype [2].

We formally define a feedback vertex set of a directed graph $W$ as a possibly empty subset $I$ of vertices such that the di-graph $W \setminus I$ is acyclic, where $W \setminus I$ denotes the resulting di-graph when all vertices of $I$ are removed from $W$ , along with all edges from or towards those vertices. An alternative way to view FVS is as trees and forests. Recall that a tree is an undirected graph in which any two vertices are connected by exactly one path, that is, a connected acyclic undirected graph. A forest is defined as an undirected graph in which any two vertices are connected by at most one path, that is, an acyclic undirected graph, or a disjoint union of trees [47]. Define a graph $G = (V, E)$ that consists of a finite set of vertices $V (G)$ and a set of edges $E (G)$ . Then an FVS of $G$ is a subset of vertices $V' \subseteq V (G)$ such that the removal of $V'$ from $G$ , along with all edges incident to $V'$ , results in a forest [48]. As such, for control purposes an FVS must contain all source nodes and a node in every cycle. In other words, an FVS is a set of “determining nodes” such that if the dynamics of the determining nodes are given for large times, then the dynamics of the whole system are determined uniquely for large times [2, 30, 49].
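The directed-graph definition translates directly into a brute-force search: remove candidate vertex sets of increasing size and test whether the remainder is acyclic. The sketch below uses Kahn's topological-sort test for acyclicity and, as an example, the wiring diagram of the six-node network from Section 4.2 (edges read off its update functions). It is an illustration for small graphs only and is not one of the FVS tools listed in the Appendix; source nodes, which this example lacks, would have to be added to the control set separately.

```python
from itertools import combinations

def is_acyclic(nodes, edges):
    """Kahn's algorithm on the subgraph induced by nodes."""
    nodes = set(nodes)
    indeg = {v: 0 for v in nodes}
    for a, b in edges:
        if a in nodes and b in nodes:
            indeg[b] += 1
    queue = [v for v in nodes if indeg[v] == 0]
    removed = 0
    while queue:
        v = queue.pop()
        removed += 1
        for a, b in edges:
            if a == v and b in nodes:
                indeg[b] -= 1
                if indeg[b] == 0:
                    queue.append(b)
    return removed == len(nodes)      # every node sorted means no cycle remains

def minimum_fvs(nodes, edges):
    """All smallest vertex sets whose removal leaves the digraph acyclic."""
    for size in range(len(nodes) + 1):
        hits = [c for c in combinations(nodes, size)
                if is_acyclic(set(nodes) - set(c), edges)]
        if hits:
            return hits
    return []

# Edge (a, b) means x_a regulates x_b in F = (x3, x1, x2, x1*x6, x4, x5).
nodes = [1, 2, 3, 4, 5, 6]
edges = [(3, 1), (1, 2), (2, 3), (1, 4), (6, 4), (4, 5), (5, 6)]
fvs_sets = minimum_fvs(nodes, edges)
```

Each minimum set here takes one node from each of the two feedback loops.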

3.4. Stable Motifs (SM)

Stable motif (SM) control is based on the identification of self-sustaining generalized positive feedback loops in the dynamic model. Each of these stable motifs determines a region of the state space from which dynamical trajectories cannot escape, called a trap space. Further, a stable motif (or a succession of multiple stable motifs) determines a dynamical attractor (i.e. phenotype). There is a SM control set associated with each attractor of the system, and the impact of numerous regulators on a single node can be addressed and analyzed with the method [32]. By definition, a stable motif is a strongly connected subgraph of the expanded graph that [2]:

contains either a node or its complement but not both
contains all inputs of its composite nodes (if any exist)

First, construct the expanded network, which adds information about the combinatorial interactions and signs of nodes. Composite nodes represent the AND interaction and complementary nodes represent the NOT interaction. Each original node $i$ is denoted by $x_{i}$ in the expanded graph, and a complementary node $(\sim x_{i})$ is added if the original node represented suppression. Then, each NOT function is replaced by the appropriate complementary node in the function. Next, edges are included where each edge is a positive regulation, contrary to the original wiring diagram [2, 50].

The second step is to make distinctions between OR rules and AND rules by using composite nodes for functions involving ANDs. To do this, the functions must be in disjunctive normal form in order to uniquely determine edges. A special node is included for AND rules, and edges are drawn from the non-composite nodes of the network that form the actual composite rule. It is noted that the benefit of such an action is that the reader is able to see all regulatory functions simply from the topology of the expanded network. Now that the expanded graph is complete, using the definition above we can search for SMs within the network. The group of nodes included in the SM represent partial fixed points, from which the remaining nodes can be calculated using the original Boolean functions [2, 50].
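The construction above can be sketched as follows. The input format is an assumption made for this illustration: each node and each complementary node is given by its rule in disjunctive normal form, as a list of clauses, where a clause is a tuple of literals such as `"A"` or `"~B"`. The three-node rules are hypothetical, and `is_stable_motif` only checks a candidate node set against the two conditions listed above plus strong connectivity; it does not search for motifs.

```python
def expanded_graph(dnf):
    """Edges of the expanded graph; multi-literal clauses become composite nodes."""
    edges = set()
    for target, clauses in dnf.items():
        for clause in clauses:
            if len(clause) == 1:
                edges.add((clause[0], target))
            else:
                composite = "&".join(clause)       # e.g. "A&~B"
                edges.add((composite, target))
                edges.update((lit, composite) for lit in clause)
    return edges

def negate(lit):
    return lit[1:] if lit.startswith("~") else "~" + lit

def is_stable_motif(nodes, edges):
    nodes = set(nodes)
    literals = {v for v in nodes if "&" not in v}
    if any(negate(v) in literals for v in literals):
        return False                               # a node together with its complement
    if any(not set(v.split("&")) <= nodes for v in nodes - literals):
        return False                               # a composite node missing an input
    sub = [(a, b) for a, b in edges if a in nodes and b in nodes]
    if not nodes or not sub:
        return False

    def reach(start, forward):
        seen, stack = {start}, [start]
        while stack:
            v = stack.pop()
            for a, b in sub:
                src, dst = (a, b) if forward else (b, a)
                if src == v and dst not in seen:
                    seen.add(dst)
                    stack.append(dst)
        return seen

    start = next(iter(nodes))
    return reach(start, True) == nodes and reach(start, False) == nodes

# Hypothetical rules: A = B, B = A, C = A AND NOT B, with complements in DNF.
dnf = {
    "A": [("B",)], "~A": [("~B",)],
    "B": [("A",)], "~B": [("~A",)],
    "C": [("A", "~B")], "~C": [("~A",), ("B",)],
}
edges = expanded_graph(dnf)
```

In this toy case `{"A", "B"}` and `{"~A", "~B"}` both pass the check, corresponding to the partial fixed points A = B = 1 and A = B = 0.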

4. Efficiency management

In the age of “Big Data”, models are increasingly large and ever more complex. Currently the human genome is estimated to have approximately 25,000 genes, and single genes can encode multiple proteins. What’s more, post-translational modifications add even more complexity to the proteome, with an estimated list of greater than one million proteins [51]. Even networks of merely 100 nodes present a state space much larger than the total estimated cells in the human body [2]. Therefore, the question of control efficiency is an open problem to address. Below, we present possible options for addressing network sizes that are too large for target discovery to be performed in a timely fashion.

4.1. Reduction techniques

The size of the BN state space for $n$ genes is $2^{n}$ . Thus, an increase in GRN size exponentially increases the computational burden of its analysis, which means that brute-force methods suited to small systems are not sufficient. Many reduction techniques decrease the size of the network while preserving dynamical features (e.g., fixed points and periodic attractors); see [36, 37]. Reduction techniques were implemented in a pancreatic cancer model, effectively decreasing the total network size from sixty-nine nodes to twenty-two nodes, a 68% reduction [33]. Critically, when a node was deleted, its function was substituted directly into its downstream signal recipient(s) to maintain key network communications. Further, nodes containing self-loops cannot be removed; this includes input (source) nodes and self-modulating nodes.

First, remove nodes with one input and one output, while maintaining nodes with self-loops and nodes that serve as phenotype biomarkers (see Figure 4) [35]. Next, remove nodes with either one input and multiple outputs, or vice versa (see Figure 5). Lastly, remove nodes with low connectivity relative to the remaining nodes (see Figure 6). These techniques have been shown to preserve fixed points but not necessarily complex attractors, although there are results indicating a conservation of attractors [33, 36, 37]. For an example of one input and one output, consider FGFR from the pancreatic cancer model [33]. The original model’s neighborhood about FGFR is shown in Figure 4a with equations (5) – (6).

Figure 5: — *Single-in-multi-out removal*. Here, we show how to remove MEK from the network and still maintain downstream signaling. See equations 9 – 14 for functional maintenance.

Figure 6: — *Low connectivity removal*. Here, we show how to remove cJUN from the network and still maintain downstream signaling. See equations 15 – 22 for functional maintenance.

FGFR = bFGF (5)

RAS = (EGFR) | (FGFR) (6)

After reduction, we obtain the neighborhood seen in Figure 4 b with equations (7) – (8).

FGFR = bFGF (7)

RAS = (EGFR) | (bFGF) (8)

For an example of either one input and multiple outputs, or vice versa, consider MEK from [33]. The original model’s neighborhood about MEK is shown in Figure 5a with equations (9) – (11).

MEK = RAF (9)

ERK = MEK (10)

JNK = MEK (11)

After reduction, we obtain the neighborhood seen in Figure 5b with equations (12) – (14).

MEK = RAF (12)

ERK = RAF (13)

JNK = RAF (14)
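The claim that this substitution preserves fixed points can be checked directly on the MEK neighborhood. In the sketch below, RAF is treated as a constant input with a self-loop; that is an assumption made only to close this small subnetwork, since in the full model [33] RAF has its own regulators. The helper names are illustrative.

```python
from itertools import product

# Neighborhood before reduction, equations (9)-(11).
original = {
    "RAF": lambda s: s["RAF"],   # assumption: RAF held as an input (self-loop)
    "MEK": lambda s: s["RAF"],
    "ERK": lambda s: s["MEK"],
    "JNK": lambda s: s["MEK"],
}

# Neighborhood after removing MEK, equations (13)-(14).
reduced = {
    "RAF": lambda s: s["RAF"],
    "ERK": lambda s: s["RAF"],
    "JNK": lambda s: s["RAF"],
}

def fixed_points(rules):
    names = sorted(rules)
    points = []
    for values in product((0, 1), repeat=len(names)):
        state = dict(zip(names, values))
        if all(rules[name](state) == state[name] for name in names):
            points.append(state)
    return points

def project(points, keep):
    """Restrict each fixed point to the nodes that survive the reduction."""
    return sorted(tuple(state[name] for name in keep) for state in points)

keep = sorted(reduced)
original_fps = project(fixed_points(original), keep)
reduced_fps = project(fixed_points(reduced), keep)
preserved = original_fps == reduced_fps
```

Substitution shortens the path from RAF to ERK and JNK by one time step, so transient behavior changes even though the fixed points, restricted to the remaining nodes, coincide.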

Lastly, for an example low connectivity removal, consider cJUN [33]. The original model’s neighborhood about cJUN is shown in Figure 6a with equations (15) – (18).

cJUN = (ERK) | (JNK) (15)

EGF = cJUN (16)

mTOR = (~cJUN) & (AKT) (17)

Pro = (CyclinE) & ((JNK) | (cJUN)) (18)

After reduction, we obtain the neighborhood seen in Figure 6b with equations (19) – (22).

cJUN = (ERK) | (JNK) (19)

EGF = (ERK) | (JNK) (20)

mTOR = (~[(ERK) | (JNK)]) & (AKT) (21)

Pro = (CyclinE) & ((JNK) | (ERK) | (JNK)) (22)

4.2. Modularity techniques

Systems biology is capable of building complicated structures from simpler building blocks, even though these simple blocks (i.e. modules) traditionally are not clearly defined. The concept of modularity detailed in [34] is structural by nature, in that, a module of a BN is a subnetwork in which the restriction of the network to the variables of a subgraph has a strongly connected wiring diagram. This framework introduces both a structural and dynamic decomposition that encapsulates the dynamics of the whole system simply from the dynamics of its modules. Consequently, the decomposition yields a hierarchy among modules that can be used to specify controls. That is, by controlling key modules we are able to control the entire network [2].

Within the modularity framework, the dynamics of the state-space for Boolean network $F$ are denoted as $𝒟 (F)$ , which is a collection of all minimal subsets of attractors, $A$ , satisfying $F (A) = A$ . Further, if $F$ is decomposable (say into subnetworks $H$ and $G$ ), then we can write $F = H ⋊ G$ which is called the coupling of $H$ and $G$ . In the case where the dynamics of $G$ are dependent on $H$ , we call $G$ non-autonomous, denoted as $\overline{G}$ . Then we adopt the following notation: let $A = A_{1} \oplus A_{2}$ be a set of attractors of $F$ with $A_{1} \in 𝒟 (H)$ and $A_{2} \in 𝒟 (G^{A_{1}})$ [2].

For an example, consider the network in Figure 7 a with

F (x_{1}, x_{2}, x_{3}, x_{4}, x_{5}, x_{6}) = (x_{3}, x_{1}, x_{2}, x_{1} x_{6}, x_{4}, x_{5}) .

From the given wiring diagram, we derive two SCCs where module one (red in 7b) flows into module two (green in 7b). That is, $F = F_{1} ⋊ F_{2}$ with

F_{1} (x_{1}, x_{2}, x_{3}) = (x_{3}, x_{1}, x_{2})

F_{2} (x_{4}, x_{5}, x_{6}) = (x_{6}, x_{4}, x_{5})

\bar{F_{2}} (x_{4}, x_{5}, x_{6}) = (x_{1} x_{6}, x_{4}, x_{5})

𝒟 (F_{1}) = {000, 111, [001, 100, 010], [011, 101, 110]} .

Suppose we aim to stabilize the system into $y = 000000$ . First we see that either $x_{1} = 0, x_{2} = 0$ or $x_{3} = 0$ stabilize module one (i.e. $F_{1}$ ) to $A_{1} = 000$ by applying the FVS method from Section 3.3. Likewise, $x_{4} = 0, x_{5} = 0$ or $x_{6} = 0$ stabilize module two (i.e. $F_{2}^{A_{1}}$ ) to $A_{2} = 000$ . Thus, we conclude that $u = (x_{1} = 0, x_{6} = 0)$ achieves the desired result [2].
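The example can be verified by simulation. The sketch below enumerates the attractors of module one under synchronous updating and then checks that pinning $x_{1} = 0$ and $x_{6} = 0$ leaves $000000$ as the only attractor of the full network. The function names and the pinning mechanism (overwriting the pinned coordinates after every update) are choices made for this illustration.

```python
from itertools import product

def F(x, pins=None):
    """Full network F = (x3, x1, x2, x1*x6, x4, x5) with optional pinned nodes."""
    x1, x2, x3, x4, x5, x6 = x
    nxt = [x3, x1, x2, x1 & x6, x4, x5]
    for i, v in (pins or {}).items():   # pins maps a 0-based index to a fixed value
        nxt[i] = v
    return tuple(nxt)

def F1(x):
    """Module one in isolation."""
    x1, x2, x3 = x
    return (x3, x1, x2)

def attractors(step, n):
    found = set()
    for state in product((0, 1), repeat=n):
        seen = []
        while state not in seen:
            seen.append(state)
            state = step(state)
        found.add(frozenset(seen[seen.index(state):]))
    return found

D_F1 = attractors(F1, 3)                                   # dynamics of module one
controlled = attractors(lambda x: F(x, {0: 0, 5: 0}), 6)   # u = (x1 = 0, x6 = 0)
```

The first result reproduces the two fixed points and two 3-cycles listed for $𝒟 (F_{1})$ , and the second contains only the target state.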

5. Limitations

Even though phenotype control theory shows massive potential, the field overall has some limitations, along with those of each technique we have described. From a biological and translational perspective, it remains yet to be validated as a viable option for clinical application. Further, the human genome is highly complex, with signaling mechanisms that are far from well understood. This leads modelers to rely on speculative networks and hypothesized functional communication rules.

Regardless of method, the resulting outputs are merely theoretical controls and must be parsed to find tangible targets (or combinations of targets). Efficacy of the resulting targets can be established computationally, which is discussed in Appendix 7.3. The parsing process can include brute-force testing of all controls, knowledge of the regulatory network topology, knowledge of literature pertaining to particular controls, or a mixture of various techniques [22, 26]. Some controls may not be biologically achievable, others may be insufficient if applied independently, while some simply do not perform as desired.

Since we do not apply optimal control, another constraint to address is how to select controls that prioritize certain interventions over others. These criteria might include selection according to effectiveness (e.g. shorter absorption time), total/side effects (e.g. number of changes in the original state space), target “depth” within the network, and practical implementability. Many of the selection criteria will need stochasticity (such as for time to absorption), which can be achieved via SDDS [10, 52] or asynchronous simulations (see Appendix 7.2).

When it comes to network reduction, techniques can prove extremely tedious if networks are notably large. Further, the reduction techniques can change the long-term behavior of key analytical features such as cyclic attractors. It has been shown that the methods in Section 4.1 will maintain fixed points, but they do not necessarily maintain cyclic attractors [33, 36, 37]. Even though examples have been shown to maintain all attractors [2, 33], one can easily show counterexamples that do not (see the small T-LGL model in Section 2.1). Thus, a fully developed methodology for efficient reduction is yet to be seen, which could be important for analyzing large models.

Additionally, computational complexity varies across methods. For instance, the CA method relies on computing Gröbner bases for a system of polynomials, which, depending on the algorithm used, has been shown to have doubly exponential complexity [26]. However, for GRNs with small sets of regulatory nodes, Gröbner bases can be computed in a reasonable time [26, 53].

For CK, the problem of finding the minimal set of controlling nodes was shown to be NP-hard [54], and the problem of the existence of multiple possible minimal control sets is NP-complete. Thus, when computing CKs, no algorithm is expected to run faster in the worst case than checking every possible subset of increasing size, since the rounds of pinning used to find CKs are representative of NP-hard problems. Moreover, the average CK size scales logarithmically with the number of attractors [29].

The computational time to find a single FVS is reasonable; the issue arises when trying to find all possible FVSs. The global stabilization of BNs has been shown to have computational complexity that is exponential with respect to the number of state variables [55, 56]. However, while the problem of exactly identifying the minimal FVS is NP-hard, a variety of fast algorithms exist to find close-to-minimal solutions [31, 57].

Lastly, the complexity of calculating SMs using the domain of influence (DOI), through the expanded graph [50], is bounded by the order of the sum of the number of nodes and edges in the expanded network, $O (N_{ex} + E_{ex})$ . Subsequent calculations for finding control sets from the DOI become more complex. So-called “well behaved degree distribution” networks give calculated order $O (k^{2} N^{2})$ , where $k$ is the number of regulators of each node and $N$ is the number of nodes. Those networks considered to have “skewed degree distribution” are bounded by $O (N^{3})$ [50].

6. Conclusions

In this paper, we reviewed various techniques for implementing target discovery and control of gene regulatory networks. Due to the growing nature of the field, there are always emerging, novel techniques to implement and we acknowledge that the methods included here are not fully exhaustive [58–62]. Even so, we have set out to provide a list of varying options, depending on the specific aims and information available to users, that represent a broad range of applicable theory. We also hope to spark conversations and ideas for solving open problems in the field, as well as inspire application of these concepts across a wide range of disciplines, not strictly biology.

In addition to toy examples for each method (see Appendix), we also applied each approach to a well-known cancer model (T-LGL Leukemia) to explore overlaps and differences among the processes. In particular, we showed that FVS provides an upper bound for the number of targets needed to achieve network control, whereas CA and CK can provide minimal sets. Perhaps the most versatile method shown is CA, where users have wide-ranging options to personalize their search (i.e. nodes vs. edges, use existing attractors, generate new attractors, and block transitions or regions). These overlaps have also been shown in a computational pancreatic cancer model [22, 33].

Even though there is not a common theoretical framework to apply all methods, we do see that each is capable of affirming discoveries across other methods while also suggesting possible novel targets of their own. We believe the future is bright for synthetic modeling and control of cell signaling networks, and the methods reviewed herein are just the beginning.

Supplementary Material

Supplement 1

NIHPP2023.04.17.537158v1-supplement-1.pdf^{(112KB, pdf)}

Figure 2: — *Reduced T-LGL network target overlaps*. We highlight the overlapping control targets from Table 3 by overlaying them with the reduced T-LGL wiring diagram from Figure 1, shown in two diagrams to avoid excessive noise. (a) We show instances of CA edge (blue), CA node (green), and SM (grey). (b) We show instances of CK (black) and FVS (purple). Note that FVS has combinatorial controls with connecting arches, where others are strictly singleton.

Acknowledgements

The authors would like to thank Reinhard Laubenbacher and Reka Albert for their discussions and suggestions during the initial stage of this project. Further, D.P. was supported by the NIH Training Grant T32CA165990. D.M. was partially supported by a Collaboration grant (850896) from the Simons Foundation.

7. Appendix

7.1. Elementary examples for control methods

7.1.1. Computational Algebra

Consider the network in Figure 8, with the following regulatory functions.

f_{1} = (\sim x_{3}) \land (\sim x_{5})

f_{2} = (\sim x_{1}) \lor x_{4}

f_{3} = (\sim x_{2}) \lor x_{5}

f_{4} = x_{3}

f_{5} = \sim x_{4}

Using Table 5, we rewrite our functions as the following simplified polynomials.

Table 5:

Standard Boolean logical rules.

Rule	Symbol	Polynomial
AND	x ∧ y, x&y	xy
OR	x ∨ y, x|y	xy + x + y
NOT	$\sim x$, $\bar{x}$, $\neg x$	x + 1


f_{1} = 1 + x_{3} + x_{5} + x_{3} x_{5}

f_{2} = 1 + x_{1} + x_{1} x_{4}

f_{3} = x_{2} x_{5} + x_{2} + 1

f_{4} = x_{3}

f_{5} = 1 + x_{4}
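
The conversion from logical rules to polynomials over $\mathbb{F}_{2}$ , and the synchronous dynamics of the resulting system, can be checked by brute force. The following is a minimal sketch, assuming synchronous updates; the variable names are illustrative.

```python
from itertools import product

# Boolean rules f1..f5 of the example; x is a tuple (x1, ..., x5) of 0/1 values.
boolean_rules = [
    lambda x: (1 - x[2]) & (1 - x[4]),   # f1 = (~x3) AND (~x5)
    lambda x: (1 - x[0]) | x[3],         # f2 = (~x1) OR x4
    lambda x: (1 - x[1]) | x[4],         # f3 = (~x2) OR x5
    lambda x: x[2],                      # f4 = x3
    lambda x: 1 - x[3],                  # f5 = ~x4
]

# Polynomial forms, with all arithmetic reduced mod 2.
polynomial_rules = [
    lambda x: (1 + x[2] + x[4] + x[2] * x[4]) % 2,
    lambda x: (1 + x[0] + x[0] * x[3]) % 2,
    lambda x: (x[1] * x[4] + x[1] + 1) % 2,
    lambda x: x[2] % 2,
    lambda x: (1 + x[3]) % 2,
]

states = list(product((0, 1), repeat=5))

# Each polynomial must agree with its Boolean rule on all 32 states.
polynomials_match = all(
    b(x) == p(x) for x in states for b, p in zip(boolean_rules, polynomial_rules)
)

def F(x):
    """Synchronous update of all five nodes."""
    return tuple(p(x) for p in polynomial_rules)

fixed_points = [x for x in states if F(x) == x]

def attractor_from(x):
    """Iterate F from x until a state repeats and return the cycle reached."""
    seen = []
    while x not in seen:
        seen.append(x)
        x = F(x)
    return frozenset(seen[seen.index(x):])

attractors = {attractor_from(x) for x in states}
as_strings = sorted(sorted("".join(map(str, s)) for s in a) for a in attractors)
print(polynomials_match, fixed_points)
print(as_strings)
```

Under synchronous update this enumeration finds no fixed points and exactly two cyclic attractors, one with two states and one with six.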

We can then find the fixed points of the system by solving $f_{i} = x_{i}$ for $i = 1, \dots, 5$ . Another way to view this step is as finding the roots of $g_{i} = 0$ , where $g_{i} = f_{i} - x_{i}$ , by computing the Gröbner basis of the ideal $I = ⟨g_{1}, \dots, g_{5}⟩$ . In any case, the example in Figure 8 does not contain any fixed points. However, further state space analysis does reveal two attractors: $\{01011, 01100\}$ and $\{00101, 01010, 01110, 01111, 10001, 11000\}$ . Now, we encode our edge controls as

\begin{array}{l} ℱ_{1} = 1 + (u_{3,1} + 1) x_{3} + (u_{5,1} + 1) x_{5} + (u_{3,1} + 1) x_{3} (u_{5,1} + 1) x_{5} \\ ℱ_{2} = 1 + (u_{1,2} + 1) x_{1} + (u_{1,2} + 1) x_{1} (u_{4,2} + 1) x_{4} \\ ℱ_{3} = (u_{2,3} + 1) x_{2} (u_{5,3} + 1) x_{5} + (u_{2,3} + 1) x_{2} + 1 \\ ℱ_{4} = (u_{3,4} + 1) x_{3} \\ ℱ_{5} = 1 + (u_{4,5} + 1) x_{4} \end{array}

(23)

and node controls as

\begin{array}{l} ℱ_{1} = (u_{1}^{-} + u_{1}^{+} + 1) (1 + x_{3} + x_{5} + x_{3} x_{5}) + u_{1}^{+} \\ ℱ_{2} = (u_{2}^{-} + u_{2}^{+} + 1) (1 + x_{1} + x_{1} x_{4}) + u_{2}^{+} \\ ℱ_{3} = (u_{3}^{-} + u_{3}^{+} + 1) (x_{2} x_{5} + x_{2} + 1) + u_{3}^{+} \\ ℱ_{4} = (u_{4}^{-} + u_{4}^{+} + 1) x_{3} + u_{4}^{+} \\ ℱ_{5} = (u_{5}^{-} + u_{5}^{+} + 1) (1 + x_{4}) + u_{5}^{+} . \end{array}

(24)

Let’s consider the objective of generating new attractors, and assume we want our steady state to be $y = 11110$ . In general, one can search the entire system for controls, but there may be special cases where limiting decisions can be made amongst collaborators. For argument’s sake, suppose we want to find edge knockouts and limit our search to edges $x_{3} \to x_{1}, x_{5} \to x_{1}$ , and $x_{2} \to x_{3}$ . Then the updated edge equations (Eq. 23) become

\begin{array}{l} ℱ_{1} = 1 + (u_{3,1} + 1) x_{3} + (u_{5,1} + 1) x_{5} + (u_{3,1} + 1) x_{3} (u_{5,1} + 1) x_{5} \\ ℱ_{2} = 1 + x_{1} + x_{1} x_{4} \\ ℱ_{3} = (u_{2,3} + 1) x_{2} x_{5} + (u_{2,3} + 1) x_{2} + 1 \\ ℱ_{4} = x_{3} \\ ℱ_{5} = 1 + x_{4} . \end{array}

(25)

Evaluating at $y = 11110$ yields

ℱ_{1} = u_{3,1}, ℱ_{2} = 1, ℱ_{3} = u_{2,3}, ℱ_{4} = 1, ℱ_{5} = 0 .

Therefore, the desired fixed point is achieved if and only if $u_{3,1} = u_{2,3} = 1$ . That is, the controls for $u_{3,1}$ and $u_{2,3}$ are active, such that we must delete both corresponding edges. Similarly, we can determine node control to achieve new fixed point $y = 11110$ . Again, for simplicity, we limit ourselves to $x_{1}$ knock-in, $x_{3}$ knock-out and knock-in, and $x_{4}$ knock-in. The updated node equations (Eq. 24) then become

\begin{array}{l} ℱ_{1} = (u_{1}^{+} + 1) (1 + x_{3} + x_{5} + x_{3} x_{5}) + u_{1}^{+} \\ ℱ_{2} = 1 + x_{1} + x_{1} x_{4} \\ ℱ_{3} = (u_{3}^{-} + u_{3}^{+} + 1) (x_{2} x_{5} + x_{2} + 1) + u_{3}^{+} \\ ℱ_{4} = (u_{4}^{+} + 1) x_{3} + u_{4}^{+} \\ ℱ_{5} = 1 + x_{4} . \end{array}

(26)

Evaluating at $y = 11110$ yields

ℱ_{1} = u_{1}^{+}, ℱ_{2} = 1, ℱ_{3} = u_{3}^{+}, ℱ_{4} = 1, ℱ_{5} = 0 .

Thus, the desired fixed point is achieved if and only if $u_{1}^{+} = 1$ and $u_{3}^{+} = 1$ . Importantly, this means that the controls by themselves are insufficient, but together they achieve the desired goal. One can easily see that requiring numerous controls in much larger systems may not be biologically feasible, which is why alternate objectives can prove useful.
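
Both searches for the new fixed point $y = 11110$ can be reproduced by evaluating the controlled polynomials at $y$ for every choice of control parameters. The following is a minimal sketch that encodes Eq. 25 and Eq. 26 directly; the function names are illustrative, and exhaustive enumeration is used here in place of symbolic computation.

```python
from itertools import product

y = (1, 1, 1, 1, 0)  # desired fixed point 11110, ordered (x1, ..., x5)

def edge_controlled(x, u31, u51, u23):
    """Eq. 25: edge controls on x3->x1, x5->x1, x2->x3 (u = 1 deletes the edge)."""
    x1, x2, x3, x4, x5 = x
    a, b, c = (u31 + 1) * x3, (u51 + 1) * x5, (u23 + 1) * x2
    return (
        (1 + a + b + a * b) % 2,
        (1 + x1 + x1 * x4) % 2,
        (c * x5 + c + 1) % 2,
        x3,
        (1 + x4) % 2,
    )

def node_controlled(x, u1p, u3m, u3p, u4p):
    """Eq. 26: x1 knock-in, x3 knock-out and knock-in, x4 knock-in."""
    x1, x2, x3, x4, x5 = x
    f1 = 1 + x3 + x5 + x3 * x5
    f3 = x2 * x5 + x2 + 1
    return (
        ((u1p + 1) * f1 + u1p) % 2,
        (1 + x1 + x1 * x4) % 2,
        ((u3m + u3p + 1) * f3 + u3p) % 2,
        ((u4p + 1) * x3 + u4p) % 2,
        (1 + x4) % 2,
    )

# Keep every parameter choice for which y is mapped to itself.
edge_solutions = [u for u in product((0, 1), repeat=3) if edge_controlled(y, *u) == y]
node_solutions = [u for u in product((0, 1), repeat=4) if node_controlled(y, *u) == y]
print(edge_solutions)  # tuples (u31, u51, u23)
print(node_solutions)  # tuples (u1+, u3-, u3+, u4+)
```

Every edge solution has $u_{3,1} = u_{2,3} = 1$ and every node solution has $u_{1}^{+} = u_{3}^{+} = 1$ ; the remaining parameters do not affect the value at $y$ .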

Suppose we determine that $y = 01111$ is in a diseased attractor which we want to destroy. We can then aim to block the transition from $y$ to $F (y) = 01110$ . We limit ourselves to considering edges from $x_{3} \to x_{1}, x_{5} \to x_{1}, x_{3} \to x_{4}$ , and $x_{4} \to x_{5}$ . The updated edge equations (Eq. 23) become

\begin{array}{l} ℱ_{1} = 1 + (u_{3,1} + 1) x_{3} + (u_{5,1} + 1) x_{5} + (u_{3,1} + 1) x_{3} (u_{5,1} + 1) x_{5} \\ ℱ_{2} = 1 + x_{1} + x_{1} x_{4} \\ ℱ_{3} = x_{2} x_{5} + x_{2} + 1 \\ ℱ_{4} = (u_{3,4} + 1) x_{3} \\ ℱ_{5} = 1 + (u_{4,5} + 1) x_{4} . \end{array}

(27)

Evaluating at $y = 01111$ yields

ℱ_{1} = u_{3,1} u_{5,1}, ℱ_{2} = 1, ℱ_{3} = 1, ℱ_{4} = u_{3,4} + 1, ℱ_{5} = u_{4,5} .

This means that Eq. 2 becomes

(u_{3,1} u_{5,1} + 1) (u_{3,4}) (u_{4,5} + 1) = 0

giving three possible solutions: $u_{3,1} = u_{5,1} = 1, u_{3,4} = 0$ , or $u_{4,5} = 1$ . Notice that we again have a combinatorial solution in $u_{3,1}, u_{5,1}$ since they are insufficient individually but successful together, $u_{3,4} = 0$ means that the control is inactive, and $u_{4,5}$ is a singleton control.

Lastly, consider the objective of region blocking. Suppose we want to avoid regions where $x_{3} = 0$ , and we will limit ourselves to nodes $x_{2}$ knock-out, $x_{3}$ knock-in, and $x_{4}$ knock-in. Then the updated node equations (Eq. 24) become

\begin{array}{l} ℱ_{1} = 1 + x_{3} + x_{5} + x_{3} x_{5} \\ ℱ_{2} = (u_{2}^{-} + 1) (1 + x_{1} + x_{1} x_{4}) \\ ℱ_{3} = (u_{3}^{+} + 1) (x_{2} x_{5} + x_{2} + 1) + u_{3}^{+} \\ ℱ_{4} = (u_{4}^{+} + 1) x_{3} + u_{4}^{+} \\ ℱ_{5} = 1 + x_{4} . \end{array}

(28)

Next, we see that Eq. 3 yields

\begin{array}{l} 0 = 1 + x_{3} + x_{5} + x_{3} x_{5} + x_{1} \\ 0 = (u_{2}^{-} + 1) (1 + x_{1} + x_{1} x_{4}) + x_{2} \\ 0 = (u_{3}^{+} + 1) (x_{2} x_{5} + x_{2} + 1) + u_{3}^{+} + x_{3} \\ 0 = (u_{4}^{+} + 1) x_{3} + u_{4}^{+} + x_{4} \\ 0 = 1 + x_{4} + x_{5} \\ 0 = x_{3} \end{array}

(29)

Using computational algebra tools, we encode the system of equations and compute the Gröbner basis of the associated ideal, which gives:

I = ⟨x_{1} + 1, u_{2}^{-}, x_{2} + 1, u_{3}^{+}, x_{3}, u_{4}^{+} + 1, x_{4} + 1, x_{5}⟩ .

This means the original system has the same solutions as the following system.

\begin{array}{llll} x_{1} + 1 = 0 & u_{2}^{-} = 0 & x_{2} + 1 = 0 & u_{3}^{+} = 0 \\ x_{3} = 0 & u_{4}^{+} + 1 = 0 & x_{4} + 1 = 0 & x_{5} = 0 \end{array}

Recall that our goal is to block the region $x_{3} = 0$ by finding parameters that guarantee the above system has no solutions. Utilizing the equations that only contain control parameters, we have $u_{2}^{-} = 0, u_{3}^{+} = 0$ , and $u_{4}^{+} + 1 = 0$ . Thus, if we allow either $u_{2}^{-} = 1, u_{3}^{+} = 1$ , or $u_{4}^{+} = 0$ , then our system will have no solution, as needed. Since $x_{3}$ is the limiting criterion and $u_{4}^{+} = 0$ is an inactive control, that leaves $u_{2}^{-} = 1$ as the desired target. As one can see, the computational algebra method is quite versatile [2].
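
The region-blocking result can be cross-checked without a Gröbner basis by enumerating, for each choice of the three control parameters, the fixed points of Eq. 28 that lie in the region $x_{3} = 0$ . The following is a minimal sketch; brute-force enumeration is a stand-in for the algebraic computation and is only practical for small networks.

```python
from itertools import product

def node_controlled(x, u2m, u3p, u4p):
    """Eq. 28: x2 knock-out (u2-), x3 knock-in (u3+), x4 knock-in (u4+)."""
    x1, x2, x3, x4, x5 = x
    return (
        (1 + x3 + x5 + x3 * x5) % 2,
        ((u2m + 1) * (1 + x1 + x1 * x4)) % 2,
        ((u3p + 1) * (x2 * x5 + x2 + 1) + u3p) % 2,
        ((u4p + 1) * x3 + u4p) % 2,
        (1 + x4) % 2,
    )

# For each control setting (u2-, u3+, u4+), list the fixed points with x3 = 0.
region_fixed_points = {
    u: [
        x for x in product((0, 1), repeat=5)
        if x[2] == 0 and node_controlled(x, *u) == x
    ]
    for u in product((0, 1), repeat=3)
}

for u, points in region_fixed_points.items():
    print(u, points)
```

The only setting that admits a fixed point in the region is $(u_{2}^{-}, u_{3}^{+}, u_{4}^{+}) = (0, 0, 1)$ , with state 11010, which matches the generators of the ideal $I$ above; any other setting leaves the region free of fixed points.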

7.1.2. Control Kernel

Consider the network in Figure 9. Steady state analysis reveals two fixed points: 000100 and 111011. Suppose our control objective is $x_{4} = 0$ , which corresponds to the second fixed point. We first notice that there are no input nodes, which means we move on to distinguishing nodes. Then the CK method (correctly) indicates that $x_{1} = 1$ will direct the system into the desired fixed point. Admittedly, while the CK method is straightforward, the software used to implement the search can be difficult to navigate [2].

7.1.3. Feedback Vertex Set

Figure 10 contains a simple example of identifying an FVS. The input node $(x_{1})$ is always in the control set, while the only other node required is one of those in the 3-cycle. As seen in the figure, 10a is the example wiring diagram and 10b - 10d show the three possible FVSs. One can easily see that the strategy for FVS is quite simple, yet it can produce larger control sets than necessary. Further, we may not obtain all FVSs if the system has many attractors [2].
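
The idea can be sketched on a small graph. The wiring diagram below is a hypothetical stand-in (a source node $x_{1}$ feeding a 3-cycle $x_{2} \to x_{3} \to x_{4} \to x_{2}$ ) and is not necessarily the network of Figure 10. The brute-force search over subsets of increasing size is for illustration only and is not the algorithm used by the FVS software in Section 7.4.

```python
from itertools import combinations

# Hypothetical wiring diagram (an assumption): x1 -> x2, and x2 -> x3 -> x4 -> x2.
edges = {"x1": ["x2"], "x2": ["x3"], "x3": ["x4"], "x4": ["x2"]}

def is_acyclic(graph, removed):
    """Kahn's algorithm: True if no directed cycle remains after deleting `removed`."""
    nodes = [v for v in graph if v not in removed]
    indegree = {v: 0 for v in nodes}
    for v in nodes:
        for w in graph[v]:
            if w not in removed:
                indegree[w] += 1
    queue = [v for v in nodes if indegree[v] == 0]
    visited = 0
    while queue:
        v = queue.pop()
        visited += 1
        for w in graph[v]:
            if w in removed:
                continue
            indegree[w] -= 1
            if indegree[w] == 0:
                queue.append(w)
    return visited == len(nodes)

def minimal_feedback_vertex_sets(graph):
    """Return all smallest node sets whose removal leaves the graph acyclic."""
    nodes = sorted(graph)
    for size in range(len(nodes) + 1):
        hits = [set(c) for c in combinations(nodes, size) if is_acyclic(graph, set(c))]
        if hits:
            return hits
    return []

# Source (input) nodes have no incoming edges and always join the control set.
sources = {v for v in edges if all(v not in targets for targets in edges.values())}
control_sets = [sources | fvs for fvs in minimal_feedback_vertex_sets(edges)]
print(control_sets)
```

For this hypothetical graph the search returns three control sets, each pairing the input node with one node of the 3-cycle.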

7.1.4. Stable Motifs

Consider the example network in Figure 11a, with the following functions and negated functions.

\begin{array}{ll} f_{1} = x_{2} \lor x_{3} & \sim f_{1} = (\sim x_{2}) \land (\sim x_{3}) \\ f_{2} = x_{1} \land (\sim x_{3}) & \sim f_{2} = (\sim x_{1}) \lor x_{3} \\ f_{3} = (\sim x_{1}) \lor (\sim x_{2}) & \sim f_{3} = x_{1} \land x_{2} \end{array}

Using the aforementioned steps, the expanded graph obtained is Figure 11b. Notice there are two stable motifs (circled in orange and green), which indicate a fixed point (110) and a partial fixed point (X01). To find the rest of the partial fixed point, substitute the known values into the original functions. Therefore,

f_{1} = x_{2} \lor x_{3} = 0 \lor 1 = 1

which gives 101 as the second fixed point. Since the control sets are subsets of the stable motifs, we have $\{x_{2} = 1, x_{3} = 0\}$ or $\{x_{1} = 1, x_{3} = 0\}$ for fixed point 110, and $\{x_{2} = 0\}$ or $\{x_{3} = 1\}$ for fixed point 101 [2].
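
The fixed points and the listed control sets can be verified by direct simulation. The following is a minimal sketch, assuming synchronous updates and modeling each control set as pinned node values; it does not construct the expanded graph or the stable motifs themselves.

```python
from itertools import product

def F(x):
    """f1 = x2 OR x3, f2 = x1 AND (NOT x3), f3 = (NOT x1) OR (NOT x2)."""
    x1, x2, x3 = x
    return (x2 | x3, x1 & (1 - x3), (1 - x1) | (1 - x2))

def run_with_pins(x, pins, steps=10):
    """Iterate F synchronously while holding pinned nodes (0-based index -> value)."""
    for _ in range(steps):
        nxt = list(F(x))
        for i, v in pins.items():
            nxt[i] = v
        x = tuple(nxt)
    return x

states = list(product((0, 1), repeat=3))
fixed_points = [x for x in states if F(x) == x]

# Control sets from the text, keyed by the fixed point they should reach.
control_sets = {
    (1, 1, 0): [{1: 1, 2: 0}, {0: 1, 2: 0}],  # {x2 = 1, x3 = 0}, {x1 = 1, x3 = 0}
    (1, 0, 1): [{1: 0}, {2: 1}],              # {x2 = 0}, {x3 = 1}
}

# A control set succeeds if every initial state ends at the target fixed point.
results = {
    target: [all(run_with_pins(x, pins) == target for x in states) for pins in sets]
    for target, sets in control_sets.items()
}
print(fixed_points)
print(results)
```

Each of the four control sets drives every initial state to its target fixed point under these assumptions.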

7.2. Finite Dynamical Systems

For the last few decades, a popular modeling approach for gene regulation has been to implement dynamical systems over finite fields. Here, functions can be interpreted as modeling information processing within cells, which determines cellular behavior. As depicted in Figure 12, $\{x_{i_{1}}, \dots, x_{i_{m}}\}$ represent the input genes or predictor genes, $f_{i} (x_{i_{1}}, \dots, x_{i_{m}})$ is the internal update function or predictor rule, and $x_{i}$ is the target gene.

Figure 12: — *FDS for gene regulation* [2]

First, let $X = X_{1} \times X_{2} \times \dots \times X_{n}$ be the Cartesian product of finite sets. A local model over a finite set $X$ is an $n$ -tuple of coordinate functions $F = (f_{1}, f_{2}, \dots, f_{n})$ , where $f_{i} : X^{n} \to X$ . Each function $f_{i}$ uniquely determines a function

F_{i} : (x_{1}, \dots, x_{n}) \mapsto (x_{1}, \dots, f_{i} (x), \dots, x_{n})

and $x = (x_{1}, \dots, x_{n})$ . Every local model defines a canonical finite dynamical system (FDS) map, where the functions are updated as

f : X^{n} \to X^{n}, f : (x_{1}, \dots, x_{n}) \mapsto (f_{1} (x), \dots, f_{n} (x)) .

Note that discrete does not necessarily imply finite. Take the natural numbers $\mathbb{N} = \{1, 2, 3, 4, \dots\}$ , for example. The set is clearly discrete, yet its cardinality is infinite. In general, we cannot always write a function as a tuple if the space is simply “discrete”. In order to provide structure to each $X_{i}$ , we embed $X_{i}$ into a finite field $\mathbb{F}$ where, for some prime $p$ ,

X_{i} ↪ \mathbb{F}, \quad |\mathbb{F}| = p^{k} .

For example, if we desire states of Low, Medium, and High to represent levels of gene expression, then $X_{i} = \{L, M, H\} ↪ \mathbb{F}_{3} = \{0, 1, 2\}$ . We call these mixed-state models when states are non-binary. For the case when states are binary (i.e. ON or OFF, HIGH or LOW, 1 or 0), we call these models Boolean networks [2].

7.2.1. Boolean Networks

Boolean networks (BNs) are popular because we can build effective models without the use of constants or rates. This eliminates the need for tedious parameter discovery. Rather, BNs focus on the mechanics and logic of the system. BN models were originally introduced by Kauffman in 1969 and by Thomas in 1973 to provide a coarse-grained description of gene regulatory networks [23, 63]. Within a BN there are three main components: structure (wiring diagram), functions (regulatory rules), and dynamics (attractors). As we begin to define our terms, it may be helpful to keep Figure 13 in mind as a basic example. Given $n$ binary variables, define a Boolean network as an $n$ -tuple of coordinate functions

F = (f_{1}, \dots, f_{n}) : \{0,1\}^{n} \to \{0,1\}^{n}, \quad f_{i} : \{0,1\}^{n} \to \{0,1\} .

The wiring diagram of $F$ , call it $W$ , is then defined as a directed graph with $n$ nodes $\{x_{1}, x_{2}, \dots, x_{n}\}$ such that there is an edge in $W$ from $x_{j}$ to $x_{i}$ if $f_{i}$ depends on $x_{j}$ . That is,

x_{j} \to x_{i} if f_{i} = f (x_{i_{1}}, \dots, x_{i_{j}}, \dots, x_{i_{k}})

Within $W$ we denote positive edges as $x_{j} \to x_{i}$ and negative edges as $x_{j} ⊣ x_{i}$ (or sometimes $x_{j} ⊸ x_{i}$ ). Biologically, a positive edge is representative of activation while a negative edge represents inhibition. For example, in Figure 13 we see the wiring diagram of $F = (f_{1}, f_{2}) = (x_{2}, x_{1})$ .

Now that we have structure and functions, the dynamics of $F$ are traditionally described as: (1) trajectories for all $2^{n}$ possible initial conditions, or (2) a directed graph with nodes in $\mathbb{F}_{2}^{n} = \{0, 1\}^{n}$ . In the first case, a trajectory is a sequence $(x (t))_{t = 0}^{\infty}$ given by the difference equations $x (t + 1) = F (x (t))$ for all $t \geq 0$ [34]. For example, Figure 13 would yield deterministic trajectories

T_{1} = (00, 00, 00, \dots)

T_{2} = (11, 11, 11, \dots)

T_{3} = (01, 10, 01, 10, \dots)

T_{4} = (10, 01, 10, 01, \dots) .

The phase space (also called state space) of $F$ is the directed graph with vertex set $S$ and edge set $\{(s, f (s)) ∣ s \in S\}$ . Simply put, in a BN, $S$ is the set of all possible states, and their respective transitions according to the model $F$ form the state space (see Figure 14). A node $s \in S$ is called transient if $f^{k} (s) \neq s$ for all $k \geq 1$ , a node $s \in S$ is called periodic (or cyclic) if $f^{k} (s) = s$ for some $k \geq 1$ , and a node $s \in S$ is called a fixed point if $f (s) = s$ . We can also think of the phase space as having strongly connected components (SCCs), where an SCC is said to be terminal if it has no outgoing edges. Thus, a transient state is not in a terminal SCC, a cyclic attractor is a terminal $k$ -cycle ( $k = 1$ is a fixed point), and any other terminal SCC is a complex attractor. In other words, we define an attractor as a set of states from which there is no escape as the system evolves, and an attractor with a single state is called a fixed point. Thus, given sufficient time, the dynamics of a BN always end up in a fixed point or a (complex) attractor.

Figure 14: — Phase space of diagram 13 [2]

For example, it was shown above that $F = (f_{1}, f_{2}) = (x_{2}, x_{1})$ . To find the dynamics of the corresponding state space $S = \{00, 01, 10, 11\}$ , one can construct the truth table in Table 4 using lexicographic ordering. It is important to point out that we denote the states in order of the variables so that

s_{2} = (0, 1) = 01 = \{x_{1} = 0, x_{2} = 1\},

because maintaining order is highly important for correct interpretation of state values. The left columns indicate the possible states of our nodes $x_{1}$ and $x_{2}$ , whereas the right columns indicate their deterministic updates according to the functions $f_{1}$ and $f_{2}$ . Therefore, from the framework we see in Figure 14 that we have two fixed points and one cycle.
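
The truth table and the classification of states for this two-node network can be generated mechanically. The following is a minimal sketch, assuming synchronous updates; the helper names are illustrative.

```python
from itertools import product

def F(x):
    """Two-node network of Figure 13: f1 = x2, f2 = x1."""
    x1, x2 = x
    return (x2, x1)

# States in lexicographic order: 00, 01, 10, 11.
states = list(product((0, 1), repeat=2))
truth_table = [(x, F(x)) for x in states]

def trajectory(x, steps):
    """Return the states x(0), x(1), ..., x(steps) starting from x(0) = x."""
    path = [x]
    for _ in range(steps):
        path.append(F(path[-1]))
    return path

def classify(x):
    """Label a state as 'fixed point', 'periodic', or 'transient'."""
    if F(x) == x:
        return "fixed point"
    y = F(x)
    for _ in range(len(states)):
        if y == x:
            return "periodic"
        y = F(y)
    return "transient"

labels = {"".join(map(str, x)): classify(x) for x in states}
for x, fx in truth_table:
    print(x, "->", fx)
print(labels)
print(trajectory((0, 1), 3))
```

The states 00 and 11 are labeled as fixed points, while 01 and 10 are periodic and together form the single 2-cycle.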

Table 4:

Dynamic truth table for Figure 13.

x ₁	x ₂	f₁ = x₂	f₂ = x₁
0	0	0	0
0	1	1	0
1	0	0	1
1	1	1	1


Up to this point we have only discussed linear BNs, but real-world models are almost always highly nonlinear (see Figure 15). To accommodate these nonlinear regulatory networks, we implement various classes of functions based on three main Boolean logical rules - AND, OR, NOT. Some use XOR (exclusive OR), but for simplicity it is excluded here. Assume the variables $x$ and $y$ are given in a BN. Then Table 5 summarizes the functionality and notation used for each of the three main rules.

A common criticism of using discrete models for regulatory networks such as BNs is that deterministic dynamics are artificial. In reality biological systems do not contain a “central clock”, but instead the concentration levels of gene products change and respond to stimuli on varying time-scales. Thus, the update schedules chosen play a significant role in the accuracy of the model. Synchronous update schedules produce deterministic dynamics, wherein nodes are all updated simultaneously so that

x (0) \to x (1) = F (x (0)) \to x (2) = F (x (1)) \to \dots

On the other hand, asynchronous update schedules produce stochastic dynamics, wherein a randomly selected node is updated at each time step so that

x (0) \to x (1) = (x_{1} (0), \dots, f_{i} (x (0)), \dots, x_{n} (0)) \to \dots .

Lastly, sequential update schedules are performed asynchronously according to a designated permutation $σ = (σ_{1}, \dots, σ_{n})$ of $(1, \dots, n)$ . Specifically, if we define $F_{i} (x_{1}, \dots, x_{n}) = (x_{1}, \dots, f_{i} (x), \dots, x_{n})$ , then the update is given by

F_{σ} (x) = F_{σ_{n}} (F_{σ_{n - 1}} (\dots (F_{σ_{1}} (x)) \dots))

according to the order designated by $σ$ . This is sometimes done when the ordering of gene updates is known, as some genes may update faster than others. For example, using our simple example in Figure 13, Figure 16 shows the varying impacts of these three update schedules.
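
The three update schedules can be compared directly on the two-node network $F = (x_{2}, x_{1})$ . The following is a minimal sketch; the function names are illustrative, node indices are 0-based, and the fixed random seed is only there to make the run repeatable.

```python
import random

def F(x):
    """Figure 13 network: f1 = x2, f2 = x1."""
    return (x[1], x[0])

def synchronous_step(x):
    """Update every node at once."""
    return F(x)

def single_node_update(x, i):
    """F_i: replace only coordinate i by f_i(x) and keep the others."""
    y = list(x)
    y[i] = F(x)[i]
    return tuple(y)

def asynchronous_step(x, rng):
    """Update one node chosen uniformly at random."""
    return single_node_update(x, rng.randrange(len(x)))

def sequential_step(x, sigma):
    """Apply F_{sigma_1} first, then F_{sigma_2}, and so on."""
    for i in sigma:
        x = single_node_update(x, i)
    return x

states = [(0, 0), (0, 1), (1, 0), (1, 1)]
sync = {x: synchronous_step(x) for x in states}
seq = {x: sequential_step(x, (0, 1)) for x in states}

rng = random.Random(0)
# Successor states observed over repeated asynchronous updates from each state.
async_targets = {x: {asynchronous_step(x, rng) for _ in range(50)} for x in states}

print(sync)
print(seq)
print(async_targets)
```

With the sequential order $σ = (1, 2)$ , the state 01 moves to 11 and 10 moves to 00, so the 2-cycle of the synchronous schedule disappears while both fixed points remain.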

We can easily observe from Figure 16 that fixed points are maintained across all update schedules. However, cycles are not necessarily preserved. This is where the framework of Stochastic Discrete Dynamical Systems (SDDS) is beneficial [2, 11, 22, 33]. Developed in [11], SDDS incorporates Markov chain tools to study long-term dynamics of Boolean networks. SDDS uses parameters based on designated propensities to model node (and pathway) signal activation and deactivation, also referred to as degradation. In essence, SDDS merges the synchronous and asynchronous update schedules described above. One propensity is used when the update positively impacts the node, in the sense that the node increases its value from OFF to ON. Another propensity is used when the update negatively affects the node in the sense that the node decreases its value from ON to OFF. More precisely, an SDDS of the variables $(x_{1}, x_{2}, \dots, x_{n})$ is a collection of $n$ triples

\hat{F} = \{f_{k}, p_{k}^{↑}, p_{k}^{↓}\}_{k = 1}^{n}

where for $k = 1, \dots, n$ ,

$f_{k} : \{0, 1\}^{n} \to \{0, 1\}$ is the update function for $x_{k}$
$p_{k}^{↑} \in [0, 1]$ is the activation propensity
$p_{k}^{↓} \in [0, 1]$ is the deactivation propensity

Here, the parameters $p_{k}^{↑}$ and $p_{k}^{↓}$ introduce stochasticity. For example, an activation of $x_{k} (t)$ at the next time step (i.e. $x_{k} (t) = 0, f_{k} (x_{1} (t), \dots, x_{n} (t)) = 1$ , and $x_{k} (t + 1) = 1$ ) occurs with probability $p_{k}^{↑}$ . An SDDS can be represented as a Markov chain via its transition matrix, which can be viewed as transition probabilities between various states of the network. Elements of the transition matrix $A$ are determined as follows: consider the set $S = \{0, 1\}^{n}$ consisting of all possible states of the network. Suppose $x = (x_{1}, \dots, x_{n}) \in S$ and $y = (y_{1}, \dots, y_{n}) \in S$ . Then, the probability of transitioning from $x$ to $y$ is

a_{y, x} = \prod_{i = 1}^{n} P (x_{i} \to y_{i})

(30)

where entries are stored column-wise and

P (x_{i} \to f_{i} (x)) = \begin{cases} p_{i}^{↑}, & \text{if } x_{i} < f_{i} (x) \\ p_{i}^{↓}, & \text{if } x_{i} > f_{i} (x) \\ 1, & \text{if } x_{i} = f_{i} (x) \end{cases} \quad \text{and} \quad P (x_{i} \to x_{i}) = \begin{cases} 1 - p_{i}^{↑}, & \text{if } x_{i} < f_{i} (x) \\ 1 - p_{i}^{↓}, & \text{if } x_{i} > f_{i} (x) \\ 1, & \text{if } x_{i} = f_{i} (x) \end{cases} .

It follows that $P (x_{i} \to y_{i}) = 0$ for any $y_{i} \notin \{x_{i}, f_{i} (x)\}$ . Therefore, we achieve $A = {[a_{y, x}]}_{x, y \in S}$ . Note that when propensities are set to $p = 1$ , we have a traditional BN. With this framework, we built a simulator that takes random initial states as inputs and then tracks the trajectory of each node through time. Long-term phenotype expression probabilities can then be estimated, as well as network dynamics with (and without) controls [2].
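
The construction of the transition matrix in Eq. 30 can be sketched in a few lines. The following is a minimal illustration for Boolean states, not the authors' simulator; the function name `transition_matrix` and the choice of the two-node network $F = (x_{2}, x_{1})$ as a test case are assumptions made here.

```python
from itertools import product

def transition_matrix(functions, p_up, p_down):
    """Build A = [a_{y,x}]; column x holds the distribution of the next state."""
    n = len(functions)
    states = list(product((0, 1), repeat=n))
    index = {s: k for k, s in enumerate(states)}
    A = [[0.0] * len(states) for _ in states]
    for x in states:
        fx = [f(x) for f in functions]
        for y in states:
            prob = 1.0
            for i in range(n):
                if x[i] == fx[i]:
                    # The rule leaves node i unchanged, so it stays with probability 1.
                    p_i = 1.0 if y[i] == x[i] else 0.0
                else:
                    p = p_up[i] if x[i] < fx[i] else p_down[i]
                    # Node i follows its rule with probability p, else keeps its value.
                    p_i = p if y[i] == fx[i] else 1.0 - p
                prob *= p_i
            A[index[y]][index[x]] = prob
    return states, A

functions = [lambda x: x[1], lambda x: x[0]]  # F = (x2, x1)
states, A = transition_matrix(functions, [0.9, 0.9], [0.9, 0.9])
column_sums = [sum(A[r][c] for r in range(len(states))) for c in range(len(states))]

# With all propensities equal to 1 the matrix reduces to the deterministic BN.
_, A_det = transition_matrix(functions, [1.0, 1.0], [1.0, 1.0])

for row in A:
    print([round(v, 2) for v in row])
print(column_sums)
```

Each column sums to one, and setting every propensity to 1 places a single entry of 1 in each column at the row of $F (x)$ , in line with the remark that $p = 1$ gives a traditional BN.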

7.3. Simulating Target Efficacy

To determine the efficacy of controls, we compare uncontrolled simulations with the appropriate target control simulations. Thus, a good control will produce low disease levels and high health levels [2]. We can do so by utilizing a stochastic simulator based on SDDS [2, 11, 22, 33], which requires several inputs before it can begin. The number of input variables in each Boolean function is given by the vector $nv$ . Next, we need the variables for each gene in the form of an $m \times n$ matrix called $varF$ , where $m$ is the maximum number of inputs, $n$ is the number of genes, and information is stored column-wise. The number of variables will vary between functions. Since only the first $nv (i)$ elements of the $i$ th column are relevant, all remaining entries are set to −1. Now we construct the truth table $F$ in compact form with size $2^{m} \times n$ . Again, the length of each column $i$ will vary, but only the first $2^{nv (i)}$ entries are relevant, so all remaining entries are set to −1. It is vitally important to maintain numerical ordering, which is why the columns of $F$ are in lexicographic binary arrays [25].

We must also establish propensities in the form of a $2 \times n$ matrix $c$ that contains values for $p_{k}^{↑}$ and $p_{k}^{↓}$ . The values chosen for propensities may perturb results, as we saw in Figure 16. But for all intents and purposes, we typically use $p_{k}^{↑} = p_{k}^{↓} = 0.9$ (i.e. follow the function rules 90% of the time). Finally, we can run simulations using inputs: $F$ , $varF$ , $nv$ , number of states (usually Boolean), $c$ , $n$ , number of steps, and number of random initializations. We have also implemented versions that allow for mutation induction and specified initial states. As a result, we achieve time-course trajectories, and we can use the Markov chain structure of SDDS to analyze features such as time to absorption, stationary distributions, and more.

As an example, consider the simple 3-cycle in Figure 17. This particular system has two fixed points (000 and 111) as well as two cyclic attractors ({001, 100, 010} and {011, 101, 110}). Simulations were conducted using the variables in Table 6, with 1000 random initializations, 100 time steps (function updates), and 1% injected noise. The overall state space is shown in Figure 18. In Figure 19a, the uncontrolled simulation shows the oscillatory nature of the attractors. However, Figures 19b and 19c show that inducing control on $x_{1}$ is enough to drive the system to one fixed point or the other. Therefore, the SDDS simulator has the ability to show long-term trajectories and the impact of controls over time.
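
The compact inputs for this example can be turned into a small simulation. The following is a minimal sketch and not the authors' simulator: the function names are illustrative, the first listed input of a gene is assumed to be the most significant bit of the truth-table row index, controls are modeled as pinned nodes, and the 1% noise is omitted.

```python
import random

# Compact inputs for the 3-cycle (gene indices in varF are 1-based).
nv = [1, 1, 1]             # number of inputs of each gene
varF = [[3, 1, 2]]         # column i lists the inputs of gene i; unused rows would be -1
F = [[0, 0, 0],            # column i is the truth table of gene i in lexicographic
     [1, 1, 1]]            # input order; unused rows would be -1
c = [[0.9, 0.9, 0.9],      # activation propensities
     [0.9, 0.9, 0.9]]      # deactivation propensities

def rule_value(i, x):
    """Evaluate f_i(x) by looking up the row of F indexed by the inputs of gene i."""
    row = 0
    for r in range(nv[i]):
        row = 2 * row + x[varF[r][i] - 1]
    return F[row][i]

def sdds_step(x, rng, pins=None):
    """One SDDS step: each free node follows its rule with the matching propensity."""
    pins = pins or {}
    y = list(x)
    for i in range(len(x)):
        if i in pins:
            y[i] = pins[i]
            continue
        target = rule_value(i, x)
        if target != x[i]:
            p = c[0][i] if target > x[i] else c[1][i]
            if rng.random() < p:
                y[i] = target
    return tuple(y)

def simulate(x0, steps, rng, pins=None):
    """Run the chain for a number of steps and return the final state."""
    x = x0
    for _ in range(steps):
        x = sdds_step(x, rng, pins)
    return x

rng = random.Random(1)  # fixed seed, only so that the run is repeatable
deterministic_next = tuple(rule_value(i, (0, 0, 1)) for i in range(3))
final_states = {
    simulate(tuple(rng.randrange(2) for _ in range(3)), 100, rng, pins={0: 0})
    for _ in range(200)
}
print(deterministic_next)
print(final_states)
```

With $x_{1}$ pinned to 0, the state 000 is absorbing in this sketch, so after 100 steps every run is expected to end there; pinning $x_{1}$ to 1 would select 111 in the same way.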

Table 6:

Variable tables for simple 3-cycle simulations in Figure 17 [2].

(a) nv

x₁	x₂	x₃
1	1	1

(b) varF

x₁	x₂	x₃
3	1	2

(c) F

x₁	x₂	x₃
0	0	0
1	1	1


Figure 18: — *Phase-space of simple 3-cycle*. Here we show the state-space of the example from Figure 17, using SDDS with transition probabilities, with nodes written in lexicographical ordering.

Figure 19: — *Simulation examples for a simple 3-cycle with 1% noise [2]*.

7.4. Software

Cumulative files for all control techniques and examples, as well as “how-to” documentation [2]
- https://github.com/drplaugher/SMATA_pipeline
CA: used to find fixed points, controls, and run simulations [22, 33, 64]
- use the example files above
- see also https://github.com/drplaugher/PCC_Mutations
CK: used to find control kernels [29]
- https://doi.org/10.5281/zenodo.5172898
FVS: used to find FVSs [30, 44]
- https://github.com/jgtz/FVS_python3
Modularity: used to find strongly connected components (modules) [34]
- use the example files above
SM: used to find stable motifs and dynamic attractors [32, 65]
- https://github.com/jgtz/StableMotifs
- https://github.com/jcrozum/pystablemotifs

8. References

[1]. Waddington C. H. The strategy of the genes: a discussion of some aspects of theoretical biology. Allen & Unwin, London, 1957.
[2]. Plaugher Daniel. An integrated computational pipeline to construct patient-specific cancer models, Dec 2022.
[3]. Rozum Jordan and Albert Réka. Leveraging network structure in nonlinear control. NPJ systems biology and applications, 8(1):36, 2022.
[4]. Motter Adilson E. Networkcontrology. Chaos: An Interdisciplinary Journal of Nonlinear Science, 25(9):097621, 2015.
[5]. Arkin Adam, Ross John, and McAdams Harley H. Stochastic kinetic analysis of developmental pathway bifurcation in phage $λ$ -infected escherichia coli cells. Genetics, 149(4):1633–1648, 1998.
[6]. Taylor Bradford P, Dushoff Jonathan, and Weitz Joshua S. Stochasticity and the limits to confidence when estimating r0 of ebola and other emerging infectious diseases. Journal of theoretical biology, 408:145–154, 2016.
[7]. Shmulevich Ilya, Dougherty Edward R, Kim Seungchan, and Zhang Wei. Probabilistic boolean networks: a rule-based uncertainty model for gene regulatory networks. Bioinformatics, 18(2):261–274, 2002.
[8]. Shmulevich Ilya and Dougherty Edward R. Probabilistic Boolean networks: the modeling and control of gene regulatory networks. SIAM, 2010.
[9]. Saadatpour Assieh, Albert István, and Albert Réka. Attractor analysis of asynchronous boolean models of signal transduction networks. J Theor Biol, 266(4):641–56, Oct 2010.
[10]. Murrugarra David, Veliz-Cuba Alan, Aguilar Boris, Arat Seda, and Laubenbacher Reinhard. Modeling stochasticity and variability in gene regulatory networks. EURASIP Journal on Bioinformatics and Systems Biology, 2012(1):5, 2012.
[11]. Murrugarra David and Aguilar Boris. Algebraic and Combinatorial Computational Biology, chapter 5, pages 149–150. Academic Press, 2018.
[12]. Aguilar Boris, Gibbs David L, Reiss David J, McConnell Mark, Danziger Samuel A, Dervan Andrew, Trotter Matthew, Bassett Douglas, Hershberg Robert, Ratushny Alexander V, and Shmulevich Ilya. A generalizable data-driven multicellular model of pancreatic ductal adenocarcinoma. Gigascience, 9(7), 07 2020.
[13]. Baker Ruth E, Pena Jose-Maria, Jayamohan Jayaratnam, and Jérusalem Antoine. Mechanistic models versus machine learning, a fight worth fighting for the biological community? Biology letters, 14(5):20170660, 2018.
[14]. Gong Chang, Milberg Oleg, Wang Bing, Vicini Paolo, Narwal Rajesh, Roskos Lorin, and Popel Aleksander S. A computational multiscale agent-based model for simulating spatio-temporal tumour immune response to pd1 and pdl1 inhibition. Journal of the Royal Society Interface, 14(134):20170320, 2017.
[15]. Macklin Paul. Key challenges facing data-driven multicellular systems biology. Gigascience, 8(10):giz127, 2019.
[16]. Erkan Mert, Reiser-Erkan Carolin, Michalski Christoph, and Kleeff Jörg. Tumor microenvironment and progression of pancreatic cancer. Experimental oncology, 32:128–31, 09 2010.
[17]. Farrow Buckminster, Albo Daniel, and Berger David H. The role of the tumor microenvironment in the progression of pancreatic cancer. Journal of Surgical Research, 149(2):319–328, 2008.
[18]. Feig Christine, Gopinathan Aarthi, Neesse Albrecht, Chan Derek S., Cook Natalie, and Tuveson David A. The pancreas cancer microenvironment. Clinical Cancer Research, 18(16):4266–4276, 2012.
[19]. Gore Jesse and Korc Murray. Pancreatic cancer stroma: Friend or foe? Cancer cell, 25:711–712, 06 2014.
[20]. Kleeff Jörg, Beckhove Philipp, Esposito Irene, Herzig Stephan, Huber Peter E., Löhr J. Matthias, and Friess Helmut. Pancreatic cancer microenvironment. International Journal of Cancer, 121(4):699–705, 2007.
[21]. Padoan Andrea, Plebani Mario, and Basso Daniela. Inflammation and pancreatic cancer: Focus on metabolism, cytokines, and immunity. International Journal of Molecular Sciences, 20:676, 02 2019.
[22]. Plaugher Daniel, Aguilar Boris, and Murrugarra David. Uncovering potential interventions for pancreatic cancer patients via mathematical modeling. Journal of theoretical biology, 548:111197, 2022.
[23]. Kauffman S.A. Metabolic stability and epigenesis in randomly constructed genetic nets. Journal of Theoretical Biology, 22(3):437–467, 1969.
[24]. Didier Gilles, Remy Elisabeth, and Chaouiya Claudine. Mapping multivalued onto Boolean dynamics. Journal of Theoretical Biology, 270(1):177–184, 2011.
[25]. Veliz-Cuba Alan, Voss Stephen Randal, and Murrugarra David. Building model prototypes from timecourse data. Letters in Biomathematics, 9(1):107–120, 2022.
[26]. Murrugarra David, Veliz-Cuba Alan, Aguilar Boris, and Laubenbacher Reinhard. Identification of control targets in Boolean molecular network models via computational algebra. BMC Syst Biol, 10(1):94, Sep 2016.
[27]. Vieira Luis Sordo, Laubenbacher Reinhard C, and Murrugarra David. Control of intracellular molecular networks using algebraic methods. Bulletin of Mathematical Biology, 82(1):1–22, 2020.
[28]. Choo Sang-Mok, Ban Byunghyun, Joo Jae Il, and Cho Kwang-Hyun. The phenotype control kernel of a biomolecular regulatory network. BMC Syst Biol, 12(1):49, 04 2018.
[29]. Borriello Enrico and Daniels Bryan C. The basis of easy controllability in Boolean networks. Nature Communications, 12(1), December 2021.
[30]. Mochizuki Atsushi, Fiedler Bernold, Kurosawa Gen, and Saito Daisuke. Dynamics and control at feedback vertex sets. II: A faithful monitor to determine the diversity of molecular activities in regulatory networks. Journal of Theoretical Biology, 335:130–146, 2013.
[31]. Zañudo Jorge Gomez Tejeda, Yang Gang, and Albert Réka. Structure-based control of complex networks with nonlinear dynamics. Proc Natl Acad Sci U S A, 114(28):7234–7239, 07 2017.
[32]. Zañudo Jorge G T and Albert Réka. Cell fate reprogramming by control of intracellular network dynamics. PLoS Comput Biol, 11(4):e1004193, Apr 2015.
[33]. Plaugher Daniel and Murrugarra D. Modeling the pancreatic cancer microenvironment in search of control targets. Bulletin of Mathematical Biology, 83, 2021.
[34]. Kadelka Claus, Laubenbacher Reinhard, Murrugarra David, Veliz-Cuba Alan, and Wheeler Matthew. Decomposition of Boolean networks: An approach to modularity of biological systems, 2022.
[35]. Veliz-Cuba A. Reduction of Boolean network models. Journal of Theoretical Biology, 289:167–172, 2011.
[36]. Saadatpour Assieh, Albert Réka, and Reluga Timothy. A reduction method for Boolean network models proven to conserve attractors. SIAM Journal on Applied Dynamical Systems, 12:1997–2011, 01 2013.
[37]. Veliz-Cuba Alan, Aguilar Boris, Hinkelmann Franziska, and Laubenbacher Reinhard. Steady state analysis of Boolean molecular network models via model reduction and computational algebra. BMC Bioinformatics, 15:221, 06 2014.
[38]. Aguilar Boris, Fang Pan, Laubenbacher Reinhard, and Murrugarra David. A near-optimal control method for stochastic Boolean networks. Letters in Biomathematics, 7(1):67, 2020.
[39]. Bertsekas Dimitri. Reinforcement learning and optimal control. Athena Scientific, 2019.
[40]. Sutton Richard S and Barto Andrew G. Reinforcement learning: An introduction. MIT Press, 2018.
[41]. Yousefi Mohammadmahdi R, Datta Aniruddha, and Dougherty Edward R. Optimal intervention strategies for therapeutic methods with fixed-length duration of drug effectiveness. IEEE Transactions on Signal Processing, 60(9):4930–4944, 2012.
[42]. Johnson Kathleen, Plaugher Daniel, and Murrugarra David. Investigating the effect of changes in model parameters on optimal control policies, time to absorption, and mixing times. bioRxiv, 2023.
[43]. Vieira Luis Sordo, Laubenbacher Reinhard C, and Murrugarra David. Control of intracellular molecular networks using algebraic methods. Bull Math Biol, 82(1):2, 12 2019.
[44]. Zañudo Jorge Gomez Tejeda, Yang Gang, and Albert Réka. Structure-based control of complex networks with nonlinear dynamics. Proc Natl Acad Sci U S A, 114(28):7234–7239, 07 2017.
[45]. Loughran Thomas P. Large granular lymphocytic leukemia.
[46]. Saadatpour Assieh, Wang Rui-Sheng, Liao Aijun, Liu Xin, Loughran Thomas P, Albert István, and Albert Réka. Dynamical and structural analysis of a T cell survival network identifies novel candidate therapeutic targets for large granular lymphocyte leukemia. PLoS Comput Biol, 7(11):e1002267, Nov 2011.
[47]. Bender Edward A and Williamson S Gill. Lists, Decisions and Graphs. S. Gill Williamson, 2010.
[48]. Festa Paola, Pardalos Panos, and Resende Mauricio. Feedback set problems. Encyclopedia of Optimization, 2, 06 1999.
[49]. Fiedler Bernold, Mochizuki Atsushi, Kurosawa Gen, and Saito Daisuke. Dynamics and control at feedback vertex sets. I: Informative and determining nodes in regulatory networks. Journal of Dynamics and Differential Equations, 25(3):563–604, Jul 2013.
[50]. Yang Gang, Zañudo Jorge G. T., and Albert Réka. Target control in logical models using the domain of influence of nodes. Frontiers in Physiology, 9, 2018.
[51]. Posted By: cp. Brief introduction of post-translational modifications (PTMs), Jun 2018.
[52]. Murrugarra David, Miller Jacob, and Mueller Alex N. Estimating propensity parameters using Google PageRank and genetic algorithms. Front Neurosci, 10:513, 2016.
[53]. Hinkelmann Franziska, Brandon Madison, Guang Bonny, McNeill Rustin, Blekherman Greg, Veliz-Cuba Alan, and Laubenbacher Reinhard. ADAM: Analysis of discrete models of biological systems using computer algebra. BMC Bioinformatics, 12:295, 07 2011.
[54]. Akutsu Tatsuya, Hayashida Morihiro, Ching Wai-Ki, and Ng Michael K. Control of Boolean networks: Hardness results and algorithms for tree structured networks. Journal of Theoretical Biology, 244(4):670–679, 2007.
[55]. Cheng Daizhan, Qi Hongsheng, Li Zhiqiang, and Liu Jiang B. Stability and stabilization of Boolean networks. International Journal of Robust and Nonlinear Control, 21(2):134–156, 2011.
[56]. Yang Jung-Min, Lee Chun-Kyung, and Cho Kwang-Hyun. Stabilizing control of complex biological networks based on attractor-specific network reduction. IEEE Transactions on Control of Network Systems, 8(2):928–939, 2021.
[57]. Galinier Philippe, Lemamou Eunice, and Bouzidi Mohamed. Applying local search to the feedback vertex set problem. Journal of Heuristics, 19, 10 2013.
[58]. Cifuentes-Fontanals Laura, Tonello Elisa, and Siebert Heike. Node and edge control strategy identification via trap spaces in Boolean networks, 2022.
[59]. Cifuentes-Fontanals Laura, Tonello Elisa, and Siebert Heike. Control in Boolean networks with model checking. Frontiers in Applied Mathematics and Statistics, 8, 2022.
[60]. Murrugarra David and Dimitrova Elena S. Molecular network control through Boolean canalization. EURASIP J Bioinform Syst Biol, 2015(1):9, Dec 2015.
[61]. Yang Jung-Min, Lee Chun-Kyung, and Cho Kwang-Hyun. Stabilizing control of complex biological networks based on attractor-specific network reduction. IEEE Transactions on Control of Network Systems, 8(2):928–939, 2020.
[62]. Murrugarra David and Dimitrova Elena. Quantifying the total effect of edge interventions in discrete multistate networks. Automatica, 125:109453, 2021.
[63]. Thomas René. Boolean formalization of genetic control circuits. Journal of Theoretical Biology, 42(3):563–585, 1973.
[64]. Grayson Daniel R. and Stillman Michael E. Macaulay2, a software system for research in algebraic geometry. Available at http://www.math.uiuc.edu/Macaulay2/.
[65]. Zañudo Jorge and Albert Réka. An effective network reduction approach to find the dynamical repertoire of discrete dynamic networks. Chaos (Woodbury, N.Y.), 23:025111, 06 2013.

Associated Data

Supplementary Materials

Supplement 1

NIHPP2023.04.17.537158v1-supplement-1.pdf (112 KB, PDF)
