Modularity of biological systems: a link between structure and function

Claus Kadelka; Matthew Wheeler; Alan Veliz-Cuba; David Murrugarra; Reinhard Laubenbacher

doi:10.1101/2023.09.11.557227

This is a preprint.

It has not yet been peer reviewed by a journal.

The National Library of Medicine is running a pilot to include preprints that result from research funded by NIH in PMC and PubMed.

[Preprint]. 2023 Sep 12:2023.09.11.557227. [Version 1] doi: 10.1101/2023.09.11.557227

Modularity of biological systems: a link between structure and function

Claus Kadelka ^1,^†,^*, Matthew Wheeler ^2,^†, Alan Veliz-Cuba ³, David Murrugarra ⁴, Reinhard Laubenbacher ²

PMCID: PMC10515856 PMID: 37745485

Abstract

This paper addresses two topics in systems biology, the hypothesis that biological systems are modular and the problem of relating structure and function of biological systems. The focus here is on gene regulatory networks, represented by Boolean network models, a commonly used tool. Most of the research on gene regulatory network modularity has focused on network structure, typically represented through either directed or undirected graphs. But since gene regulation is a highly dynamic process as it determines the function of cells over time, it is natural to consider functional modularity as well. One of the main results is that the structural decomposition of a network into modules induces an analogous decomposition of the dynamic structure, exhibiting a strong relationship between network structure and function. An extensive simulation study provides evidence for the hypothesis that modularity might have evolved to increase phenotypic complexity while maintaining maximal dynamic robustness to external perturbations

Introduction

Building complicated structures from simpler building blocks is a widely observed principle in both natural and engineered systems. In molecular systems biology, it is also widely accepted, even though there has not emerged a clear definition of what constitutes a simple building block, or module. Consequently, it is not clear how the modular structure of a system can be identified, why it is advantageous to an organism to be composed of modular components, and how we could take advantage of modularity to advance our understanding of molecular systems (1–3). In the (graph-theoretic) network representation of molecular systems, such as gene regulatory networks or protein-protein interaction networks, a module is typically considered to be a “highly” connected region of the graph that is “sparsely” connected to the rest of the graph, otherwise known as a community in the graph. Graph theoretic algorithms that depend on the choice of parameters and the specific definition of “highly” and “sparsely” are typically used to define modules (4,5). Similar approaches are used for identifying modules in co-expression networks based on clustering of transcriptomics data (6).

A major limitation of this approach to modularity is that it focuses entirely on a static representation of gene regulatory networks and other systems. However, living organisms are dynamic, and need to be modeled and understood as dynamical systems. Thus, modularity should have an instantiation as a dynamic feature, as advocated in (7). The most common types of models employed for this purpose are systems of ordinary differential equations and discrete models such as Boolean networks and their generalizations, providing the basis for a study of dynamic modularity. In recent years, there have been an increasing number of papers that take this point of view. The authors of (8) argue that dynamic modularity may be independent of structural modularity, and they identify examples of multifunctional circuits in gene regulatory networks that they consider dynamically modular but without any underlying structural modularity. A similar argument is made in (9) by analyzing a small gene regulatory network example. For another example of a similar approach see (10).

The literature on how modularity might have evolved and why it might be useful as an organizational principle cites as the most common reasons robustness, the ability to rapidly respond to changing environmental conditions, and efficiency in the control of response to perturbations (2,11,12). An interesting hypothesis has been put forward in (3), namely that a modular organization of biological structure can be viewed as a symmetry-breaking phase transition, with modularity as the order parameter.

This literature makes clear that research on the topic of modularity in molecular systems, both structural and dynamic, would be greatly advanced by clear definitions of the concept of module, both structural and dynamic. This would in particular help to decide whether and how structural and dynamic modularity are related, and it would provide a basis on which to distinguish between modularity and multistationarity of a dynamic regulatory network. To be of practical use, such a theory should include algorithms to decompose a dynamic network into structural and/or dynamic modules. At the same time, it would be of great practical value, for instance for synthetic biology, to understand how systems can be composed from modules that have specific dynamic properties.

The search for such algorithms has led us to look for guidance to mathematics, as a complement to biology. After all, if the dynamic mathematical models that are widely used to encode gene regulatory networks are appropriate representations, and if modularity is indeed an important feature of such networks, then it should be reflected in the model structure and dynamics. Choosing the widely-used modeling framework of Boolean networks, we asked whether it is possible to identify meaningful concepts of modularity that, ideally, link both the structural and dynamic aspects. Modularity is fundamentally about connectivity. The central dynamic instantiation of connectivity is the feedback loop, which we therefore choose as the defining feature. The concept of module we propose is structural, in terms of special subgraphs of the (directed) graph of dependencies of network nodes. These subgraphs, called strongly connected compoments (SCC), are maximal with respect to the property that every node is connected to every other node in the subgraph through a directed path. In other words, none of the nodes in the SCC are involved in feedback loops that are not entirely within the SCC.

The main result of this paper is that this structural decomposition of the model into modules induces a similar decomposition of model dynamics, explicitly linking the dynamics of the structural modules in a mathematically clearly specified way. This theorem links structural and dynamic modularity, and provides an example of how network structure influences network function. We provide an important application of this theorem to network control by showing that in order to control a network, it is sufficient to control its modules, and we provide an application of this result to a published cancer signaling network. This result is important both for applications to, e.g., medicine, and might provide a candidate for a mechanisms that allows organisms to quickly respond to changes in their external environment. We also discuss our results in the context of published Boolean network models of regulatory networks and provide specific instantiations of our decomposition theorem. Finally, we address the question as to why evolution should favor modularity as a structural and dynamic feature. We carry out an extensive simulation study that provides evidence for the hypothesis that modularity enables phenotypic complexity while maintaining maximal robustness to external perturbations.

Boolean networks

For the purpose of this article, we will focus on the class of Boolean networks as a modeling paradigm. Recall that a Boolean network $F$ on variables $x_{1}, \dots, x_{n}$ can be viewed as a function on binary strings of length $n$ , which is described coordinate-wise by $n$ Boolean update functions $f_{i}$ . Each function $f_{i}$ uniquely determines a map

F_{i} : {0, 1}^{n} \to {0, 1}^{n}, F_{i} (x_{1}, \dots, x_{n}) = (x_{1}, \dots, f_{i} (x), \dots, x_{n}),

where $x = (x_{1}, \dots, x_{n})$ . Every Boolean network defines a canonical map, where the functions are synchronously updated:

F : {0, 1}^{n} \to {0, 1}^{n}, F (x_{1}, \dots, x_{n}) = (f_{1} (x), \dots, f_{n} (x)) .

In this paper, we only consider this canonical map, i.e., we only consider synchronously updated Boolean network models. Two directed graphs can be associated to $F$ (see Fig. 1 for an example). The wiring diagram (also known as dependency graph) contains $n$ nodes corresponding to the $x_{i}$ , and has a directed edge from $x_{i}$ to $x_{j}$ if $f_{j}$ depends on $x_{i}$ . The state space of $F$ contains as nodes the $2^{n}$ binary strings, and has a directed edge from u to v if $F (u) = v$ . Each connected component of the state space gives an attractor basin of $F$ , which consists of a directed loop, the attractor, as well as trees feeding into the attractor. Attractors can be steady states (also known as fixed points) or limit cycles. Each attractor in a biological Boolean network model typically corresponds to a distinct phenotype (13). The set of attractors of $F$ , denoted $𝒜 (F)$ , contains all attractors, i.e., all minimal subsets $𝒞 \subseteq {0,1}^{n}$ satisfying $F (𝒞) = 𝒞$ . Note that a limit cycle of length $k$ represents $k$ trajectories. For example, the 2-cycle (010,101) in Fig. 1 represents (010,101,010,…) and (101,010,101,…). This distinction becomes important later, when decomposing the dynamics of Boolean networks.

Figure 1: — Wiring diagram and state space of the Boolean network $F = (f_{1}, f_{2}, f_{3}) =$ $(x_{2} \land \neg x_{3}, x_{3}, \neg x_{1} \land x_{2})$ . (a) The wiring diagram encodes the dependency between variables. Subnetworks are defined on the basis of the wiring diagram. For example, the subnetwork ${F|}_{\{x_{2}, x_{3}\}} (x_{1}, x_{2}, x_{3}) = (x_{3}, \neg x_{1} \land x_{2})$ is the restriction of $F$ to $\{x_{2}, x_{3}\}$ and contains external parameter $x_{1}$ . (b) The state space is a directed graph with edges between all states and their images. This graph therefore encodes all possible trajectories and attractors. Here, $F$ has two steady states, 000 and 011, and one limit cycle, (010,101), so $𝒜 (F) = {000,011, (010,101)}$ .

A structural definition of modularity for Boolean networks

Given a Boolean network $F$ and a subset $S$ of its variables, we can define a subnetwork of $F$ , denoted ${F|}_{S}$ , as the restriction of $F$ to $S$ . If some variables in $S$ are regulated by variables not in $S$ , then we require these regulations to be included in ${F|}_{S}$ . In this case, the subnetwork is a Boolean network with external parameters. For the example in Fig. 1, the subnetwork ${F|}_{\{x_{2}, x_{3}\}}$ contains $x_{1}$ as external parameter because $x_{1}$ regulates $x_{3}$ . If the variables in $S$ form a SCC (that is, (i) every pair of nodes in $S$ (excluding possible external parameters) is connected by a directed path, and (ii) the inclusion of any additional node in $S$ will break this property), we call the subnetwork a module.

The wiring diagram of any Boolean network $F$ is either strongly connected or it consists of a collection of SCCs where connections between two SCC point in only one direction. Let $W_{1}, \dots, W_{m}$ be the SCCs of the wiring diagram, with $Y_{i}$ denoting the set of variables in SCC $W_{i}$ (note $\cup_{i} Y_{i} = Y$ and $Y_{i} \neq Y_{j}$ for $\neq j$ ). Then, the modules of $F$ are ${F|}_{Y_{1}}, \dots, {F|}_{Y_{n}}$ , the restrictions of $F$ to the $Y_{i}$ . By setting $W_{i} \to W_{j}$ if there exists at least one edge from a node in $W_{i}$ to a node in $W_{j}$ , we obtain a directed acyclic graph

Q = {(i, j) ∣ W_{i} \to W_{j}},

(1)

which describes the connections between the modules of $F$ .

As we will show later, any Boolean network can be decomposed into modules and this structural decomposition implies a decomposition of the network dynamics, which is of practical utility. The main question to be answered at this point, though, is whether there exists biological evidence that our concept of modularity and the structural and dynamic decomposition theory that follows does in fact reflect reality.

Modularity in expert-curated biological networks

A recent study investigated the features of 122 distinct published, expert-curated Boolean network models (14). Analyzing the wiring diagrams of these models, we found that almost all of them (113,92.6%) contained at least one feedback loop and thus at least one non-trivial SCC/module (which contains more than one node). The nine models that only contained single-node SCCs mainly describe signaling pathways. Thirty models (24.6%) contained even more than one non-trivial SCC, with one Influenza A virus replication model possessing eleven (15). The directed acyclic graph structure (Eq. 1) of these models varied widely (Fig. S1). While the average connectivity of a network was not correlated with the number of non-trivial SCCs $(ρ_{Spearman} = 0.05, p = 0.61)$ , network size was positively correlated $(ρ_{Spearman} = 0.35, p < 10^{- 4})$ . The same trends persisted when considering the binary variable “multiple non-trivial SCCs” (multivariable logistic regression: connectivity $p = 0.36$ , size $p = 0.003)$ .

Modules are subnetworks that carry out key control functions in a cell. It would therefore not be surprising if there was a selection bias among systems biologists to focus their attention on such modules. Larger networks are still challenging to build and analyze since an accurate formulation of a biological network model requires a substantial amount of data for a careful inference and calibration of the update rules by a subject expert (16,17). For this reason most published expert-curated models might focus on one specific cellular function of interest and contain therefore only one non-trivial SCC.

Modularity confers phenotypical robustness and a rich dynamic repertoire

To provide additional evidence that SCCs form biologically meaningful modules, we performed a computational study which shows that the presence of several modules confers robust phenotypes and a rich dynamic repertoire, both desirable features for an organism.

Biological networks must harbor multiple phenotypes, allowing the network to dynamically shift from one attractor to another based on its current needs. This shift is typically mitigated by external signals. Many evolutionary innovations are the result of newly evolved attractors of GRNs (18,19). The number of attractors of a Boolean network therefore describes its dynamical complexity.

Furthermore, biological networks need to robustly maintain a certain function (i.e., phenotype) in the presence of intrinsic and extrinsic perturbations (20,21). At any moment, these perturbations may cause a small number of genes to randomly change their expression level. For a Boolean GRN model, this corresponds to an unexpressed gene being randomly expressed, or vice versa. The robustness of the network describes how a perturbation on average affects the network dynamics. One popular robustness measure for Boolean networks, the Derrida value, describes the average Hamming distance between two states after one synchronous update according to the Boolean network rules, given that the two states differed in a single node (22). Due to the finite size of the state space, any state of a BN eventually transitions to an attractor, which corresponds to a distinct biological phenotype. Thus, the Derrida value is a meaningful robustness measure

r (F) = \frac{1}{n 2^{n}} \sum_{x \in {0, 1}^{n}} \sum_{i = 1}^{n} 1 [A (x) = A (x \oplus e_{i})] = \frac{1}{n 2^{n - 1}} \sum_{\begin{matrix} x, y \in {0, 1}^{n}, ∥ x - y ∥ = 1 \end{matrix}} 1 [A (x) = A (y)] \in [0, 1]

Here, $e_{i}$ is the $i$ th unit vector and $A (x)$ labels the attractor that state $x$ transitions to. Geometri-cally, if we consider the Boolean hypercube with each vertex in ${0,1}^{n}$ labeled by the attractor that the vertex-associated state eventually transitions to, then $r (F)$ is the proportion of edges, which connect vertices with the same value.

Clearly, $r (F) = 1$ if a Boolean network $F$ possesses only a single attractor. Moreover, the expected value, $E [r (F)]$ , decreases as the number of attractors of $F$ increases. This implies that the phenotypical robustness and the dynamical complexity are negatively correlated and that there exists a trade-off when trying to maximize both. It is reasonable to hypothesize that evolution favors robust GRNs that give rise to sufficient variety in the phenotype space. In line with this, we hypothesized that modular networks have higher robustness than non-modular networks with the same dynamical complexity.

To test this hypothesis, we generated nested canalizing Boolean networks with $N = 60$ nodes, a fixed in-degree of 3, and $m = 1, \dots, 6$ modules (i.e., SCCs of the wiring diagram) of size $N / m$ . Networks with more modules possessed on average a higher dynamical complexity, quantified here as the number of attractors (Fig. 2A). At a fixed dynamical complexity, the more modular a network the higher was its average phenotypical robustness (Fig. 2B). This finding supports the hypothesis that a modular design serves as an evolutionary answer to a multi-objective optimization problem.

Figure 2: — Modularity confers dynamical complexity and phenotypical robustness. 60-node nested canalizing Boolean networks with a constant in-degree of 3 and with 1–6 modules (i.e., SCCs of the wiring diagram) of equal size were generated (50,000 networks each). For each modular network, a weakly connected directed graph describing the connections between modules, as well as a single edge connecting an upstream with a downstream module were selected uniformly at random. By following the transitions of 500 random initial states to their attractors, the phenotypical robustness and a lower bound for the dynamical complexity (here, number of attractors) were established for each network. (A) Cumulative empirical density function of the number of attractors, stratified by the number of modules or SCCs. (B) The mean phenotypical robustness $(y)$ is plotted against the number of discovered attractors $(x)$ , stratified by the number of modules or SCCs (dots). Since $y (1) = 1$ , the two-parameter function $y = α + (1 - α) e^{- k (x - 1)}$ is fitted to the means of the number of attractors for $x = 1, \dots, 19$ (lines).

Structural decomposition of Boolean networks

Thus far, we have described how to define modules as restrictions of Boolean networks, and provided evidence that modules defined this way are biologically meaningful. To obtain a successful decomposition theory, we also require the inverse operation of a restriction: a semi-direct product that combines two Boolean networks, $F$ and $G$ , such that $F$ is the upstream module and $G$ is the downstream module. The coupling scheme $P$ contains the information which nodes in $F$ regulate which nodes in $G$ . We denote the combined Boolean network as $F ⋊_{P} G$ and refer to this as the coupling of $F$ and $G$ by the coupling scheme $P$ or as the semi-direct product of $F$ and $G$ via $P$ (detailed definition in SI Appendix, section 1). (The motivation for the term “semi-direct product” comes from the fact that the combination of the two subnetworks is like a product, except that $F$ acts on $G$ through $P$ , which is not the case in an actual product. The term is also used in mathematical group theory, which provided the motivation for our decomposition approach.)

As an example, consider the Boolean networks $F (x_{1}, x_{2}) = (x_{2}, x_{1})$ and $G (u_{1}, u_{2}, y_{1}, y_{2}) =$ $(u_{1} \lor (u_{2} \land y_{2}), \neg u_{2} \land y_{1})$ where $G$ possesses two external parameters, $u_{1}$ and $u_{2}$ . With the coupling scheme $P = \{x_{1} \to u_{1}, x_{2} \to u_{2}\}$ , we obtain the combined nested canalizing network $F ⋊_{P} G : {0,1}^{4} \to {0,1}^{4}$ ,

(F ⋊_{P} G) (x_{1}, x_{2}, y_{1}, y_{2}) = (x_{2}, x_{1}, x_{1} \lor (x_{2} \land y_{2}), \neg x_{2} \land y_{1}) .

At the wiring diagram level, this product can be seen as the union of the two wiring diagrams and some added edges determined by the coupling scheme $P$ (Fig. 3. If instead $G (u_{1}, u_{2}, y_{1}, y_{2}) =$ $u_{1} + u_{2} + y_{2}, u_{2} + y_{1}$ with $F$ and $P$ as before, then we obtain the linear network

(F ⋊_{P} G) (x_{1}, x_{2}, y_{1}, y_{2}) = (x_{2}, x_{1}, x_{1} + x_{2} + y_{2}, x_{2} + y_{1}) .

At the wiring diagram level, this product looks exactly the same (Fig. 3).

Figure 3: — Semi-direct product of Boolean networks. Wiring diagrams of independent Boolean networks $F$ and $G$ (where $G$ has external parameters) can be combined into $F ⋊_{P} G$ , the semi-direct product of $F$ and $G$ . The coupling scheme $P$ describes which variables of $F$ take the place of the external parameters and act as inputs to $G$ .

We can prove that every network is either a module or can be decomposed into a semi-direct product of two networks. That is, if a Boolean network $F$ is not a module (i.e., if its wiring diagram is not strongly connected), then there exist $F_{1}, F_{2}, P$ such that $F = F_{1} ⋊_{P} F_{2}$ , and we call such a network $F$ decomposable. We can even find a decomposition such that $F_{1}$ is a module. By induction on the downstream component $F_{2}$ , it follows that any Boolean network is either a module or decomposable into a unique series of semi-direct products of modules. That is, for any Boolean network $F$ , there exist unique modules $F_{1}, \dots, F_{m} (m = 1$ if $F$ is itself a module) such that

F = F_{1} ⋊_{P_{1}} (F_{2} ⋊_{P_{2}} (\dots ⋊_{P_{m - 1}} F_{m})),

(2)

where this representation is unique up to a reordering, which respects the partial order induced by the directed acyclic graph $Q$ (Eq. 11). The collection of coupling schemes $P_{1}, \dots, P_{m - 1}$ depends on the particular choice of ordering, as well as on the placement of parentheses in the decomposition of $F$ , which may be rearranged in any associative manner. SI Appendix, section 1 contains the proofs of these theorems.

Dynamic decomposition of Boolean networks

When the variables of a network $F$ can be partitioned such that $F = F_{1} ⋊_{P} F_{2} = F_{1} \times F_{2}$ is simply the cross product of two networks $F_{1}$ and $F_{2}$ , i.e., the coupling scheme $P = \emptyset$ , then the dynamics of $F$ can be determined directly from the dynamics of $F_{1}$ and $F_{2}$ . The dynamics of $F$ consists of coordinate pairs $(x, y)$ such that

x (t + 1) = F_{1} (x (t)), y (t + 1) = F_{2} (y (t)) .

If trajectories $(x (t))_{t = 0}^{\infty}$ and $(y (t))_{t = 0}^{\infty}$ have periods $l$ and $m$ , respectively, then the periodicity of the trajectory $(x (t), y (t))_{t = 0}^{\infty}$ is the least common multiple of $l$ and $m$ . Moreover, the set of periodic points (i.e., attractors) of $F$ is the Cartesian product of the set of periodic points of $F_{1}$ and periodic points of $F_{2}$ .

For example, the Boolean network $F (x_{1}, x_{2}, x_{3}, x_{4}) = (x_{2}, x_{1}, x_{4}, x_{3})$ can be seen as $=$ $F_{1} \times F_{2}$ , where $F_{1} (x_{1}, x_{2}) = (x_{2}, x_{1})$ and $F_{2} (x_{3}, x_{4}) = (x_{4}, x_{3})$ . The sets of attractors of $F_{1}$ and $F_{2}$ are $𝒜 (F_{1}) = {00,11, (01,10)}$ and $𝒜 (F_{2}) = {00,11, (01,10)}$ (where we omit parentheses around steady states). By concatenating the attractors of $F_{1}$ and $F_{2}$ , we obtain the attractors of $F$ (Fig. 4A). Note that we have two ways of concatenating the limit cycle (01,10) of $F_{1}$ and the limit cycle $(01,10)$ of $F_{2}$ to obtain attractors of $F$ . In general, we have the following equation that formally states that attractors of $F_{1} \times F_{2}$ are given by concatenating attractors of $F_{1}$ and $F_{2}$ .

Figure 4: — Attractors of a Cartesian product and a semi-direct product. (A) The space of attractors of a Cartesian product $F = F_{1} \times F_{2}$ , with $F_{1} (x_{1}, x_{2}) = (x_{2}, x_{1}), F_{2} (x_{3}, x_{4}) = (x_{4}, x_{3})$ , can be seen as a Cartesian product of $𝒜 (F_{1})$ and $𝒜 (F_{2})$ . To illustrate the different ways to combine attractors of $F_{1}$ and $F_{2}$ , in the panel we explicitly write (01,10) and (10,01) for $F_{2}$ . (B) In general, the coupling of networks does not behave as a Cartesian product and the space of attractors depends on this coupling. The crossed-out attractors indicate which attractors from the Cartesian product are lost when using a semi-direct product with coupling scheme $P = \{(x_{3}, x_{2} x_{4})\}$ , and $F_{1}, F_{2}$ as in $A$ .

𝒜 (F_{1} \times F_{2}) = 𝒜 (F_{1}) \times 𝒜 (F_{2})

(3)

The computation of the attractors of $F$ becomes more complicated when $F$ is slightly modified so that $F (x_{1}, x_{2}, x_{3}, x_{4}) = (x_{2}, x_{1}, x_{2} x_{4}, x_{3}) = F_{1} ⋊_{P} F_{2}$ , where $F_{1}$ as before and $F_{2} = (u x_{4}, x_{3})$ with external parameter $u$ and coupling scheme $P = \{x_{2} \to u\}$ . Since the coupling between $F_{1}$ and $F_{2}$ is no longer empty, not every combination of attractors of $F_{1}$ and $F_{2}$ will result in an attractor of $F$ (Fig. 4B). For example, $(01,10) \in 𝒜 (F_{1})$ and $(01,10) \in 𝒜 (F_{2})$ do give rise to an attractor of $F$ , while $(01,10) \in 𝒜 (F_{1})$ and $(10,01) \in 𝒜 (F_{2})$ do not. The set of attractors, $𝒜 (F)$ , is the union of $00 \times 00,11 \times 𝒜 (F_{2})$ , and (01,10)×{00,(01,10)}, and is thus a subset of the attractors of the Cartesian product (Fig. 4A). This is, however, not always the case but depends on the particular coupling between the networks. Hence, Eq. 3 is not valid in general.

In order to study the dynamics of decomposable networks, we need to understand how a trajectory, which describes the behavior of an “upstream” network at an attractor, influences the dynamics of a “downstream” network. The trajectory of an “upstream” $m$ -node network $F_{1}$ at an attractor $𝒞_{1} = (α_{1}, \dots, α_{r})$ can be described by $(g (t))_{t = 0}^{\infty}$ , a sequence with elements in ${0,1}^{m}$ . This trajectory has period $r$ , the length of the attractor. The dynamics of the “downstream” $n$ -node network $F_{2}$ depend on $F_{1}$ . Therefore, $F_{2}$ is a non-autonomous Boolean network, defined by

y (t + 1) = F_{2} (g (t), y (t)),

where $F_{2} : {0,1}^{m + n} \to {0,1}^{n}$ . SI Appendix, section 2 contains a detailed definition and examples of non-autonomous Boolean networks. To make the dependence of $F_{2}$ on the choice of upstream attractor $C_{1} \in 𝒞_{1}$ explicit, we often write $F_{2}^{𝒞_{1}}$ instead of simply $F_{2}$ . If $𝒞_{2} =$ $(β_{1}, \dots, β_{s})$ is an attractor of $F_{2}^{𝒞_{1}}$ , then

𝒞_{1} \oplus 𝒞_{2} = ((α_{1}, β_{1}), (α_{2}, β_{2}), \dots, (α_{l - 1}, β_{l - 1}))

is an attractor of the combined network $F = F_{1} ⋊_{P} F_{2}$ of length $l : = l c m (|𝒞_{1}|, |𝒞_{2}|)$ , the least common multiple of $|𝒞_{1}|$ and $|𝒞_{2}|$ .

Iterating over all attractors of $F_{1}$ (that is, all $𝒞_{1} \in 𝒜 (F_{1})$ ) as well as all attractors of the corresponding non-autonomous networks $F_{2}^{𝒞_{1}}$ (that is, all $𝒞_{2} \in 𝒜 (F_{2}^{𝒞_{1}})$ ) yields all attractors of the combined network $F$ . After the structural decomposition theorem (Eq. 2), this dynamic decomposition theorem constitutes the second main theoretical result. Mathematically, it can be expressed as

𝒜 (F) = \underset{𝒞_{1} \in 𝒜 (F_{1})}{∐} \underset{𝒞_{2} \in 𝒜 (F_{2}^{𝒞_{1}})}{∐} 𝒞_{1} \oplus 𝒞_{2},

(4)

which can be written as $𝒜 (F_{1}) ⋊_{P} 𝒜 (F_{2})$ to highlight the analogy between the structural decomposition of a Boolean network and the decomposition of its dynamics. With this, the dynamic decomposition theorem states $𝒜 (F_{1} ⋊_{P} F_{2}) = 𝒜 (F_{1}) ⋊_{P} 𝒜 (F_{2})$ , which implies a distributive property for the dynamics of decomposable networks. Note that if $P$ is empty, then $𝒜 (F_{2}^{𝒞_{1}}) = 𝒜 (F_{2})$ for all $𝒞_{1}$ and we recover Eq. 3. $𝒜 (F_{1} \times F_{2}) = 𝒜 (F_{1}) \times 𝒜 (F_{2})$ .

The dynamics of a Boolean network $F$ , which decomposes into modules $F_{1}, \dots, F_{m}$ , can thus be computed from the dynamics of its modules. That is,

𝒜 (F) = 𝒜 (F_{1}) ⋊_{P_{1}} (𝒜 (F_{2}) ⋊_{P_{2}} (\dots ⋊_{P_{m - 1}} 𝒜 (F_{m}))),

(5)

where the placement of the parentheses may be rearranged in any associative manner, just as for the structural decomposition in Eq.2. SI Appendix, section 2 contains the proof of the dynamic decomposition theorem as well as instructional examples.

Efficient control of decomposable Boolean networks

The state space of a Boolean network grows exponentially in the number of variables. Therefore, the decomposition theorems can reduce the time needed to perform various computations by orders of magnitude for networks with several larger modules. Besides an efficient strategy to compute all attractors of a Boolean network, the structural decomposition theorem can also be applied to efficiently identify controls of Boolean networks, a topic that has received recent attention (23–25). Drug developers wonder, for example, which nodes in a gene regulatory network need to be controlled by an external drug to ensure the network transitions to a desired phenotype, typically corresponding to a specific network attractor.

Two types of control actions are generally considered: edge controls and node controls. For each type of control, one can consider deletions or constant expressions, as defined in (26). The motivation for considering these control actions is that they represent the common interventions that can be implemented in practice. For instance, edge deletions can be achieved by the use of therapeutic drugs that target specific gene interactions, while node deletions represent the blocking of effects of products of genes associated to these nodes (27,28).

A set of controls $μ$ stabilizes a Boolean network at an attractor $𝒞$ when the resulting network after applying $μ$ possesses $𝒞$ as its only attractor. As described in detail in (29), the decomposition into modules can be used to obtain controls for each module, which can then be combined to obtain a control for an entire network. Specifically, for a decomposable network $F = F_{1} ⋊_{P} F_{2}$ , if $μ_{1}$ is a set of controls that stabilizes $F_{1}$ in $𝒞_{1}$ and $μ_{2}$ is a control that stabilizes $F_{2}^{𝒞_{1}}$ in $𝒞_{2}$ , then $μ = μ_{1} \cup μ_{2}$ is a set of control that stabilizes $F$ in $𝒞 = 𝒞_{1} \oplus 𝒞_{2}$ , as long as $𝒞_{1}$ or $𝒞_{2}$ is a steady state.

A recently published multicellular Boolean network model describes the microenvironment of pancreatic cancer cells by modeling the interactions of pancreatic cancer cells (PCCs), pancreatic stellate cells (PSCs), and their connecting cytokines (30). This network has 69 nodes, 114 edges, and possesses three non-trivial modules (Fig. 5a). Fig. 5b shows the directed acyclic graph, which describes the connections between the modules.

Figure 5: — A multicellular Boolean cancer model (30), which describes the interactions of pancreatic cancer cells (purple nodes), pancreatic stellate cells (blue nodes), and their connecting cytokines (yellow nodes). (a) Wiring diagram describing the regulations between nodes, which are all monotonic, with black and red arrows indicating activation and inhibition, respectively. The non-trivial modules are highlighted by amber, green, and gray boxes. (b) Directed acyclic graph describing the connections between the non-trivial modules.

An effective treatment should induce the cancer cell to undergo apoptosis, which therefore represents the desired attractor of this network. To find a set of controls that stabilizes the network in this attractor, one can exploit the structural decomposition of the network by first controlling the upstream module (module 1), which has four attractors: two steady states and two 3-cycles. This module consists of two feedback loops joined by the node TGFb1. It is thus enough to control TGFb1 to stabilize this module into any of its attractors (31). Using the methods from (32) or (26), the controls of module 2 can be identified. A minimal set of two nodes needs to be controlled to stabilize this module: $R A S$ in the pancreatic cell and $R A S$ in the stellate cell. After applying these controls, the nodes in the downstream module (module 3) are all already constant and do therefore not require additional controls. Using the modular structure of the network, three nodes can be easily identified, which suffice to control the entire network. Notably, this never requires the consideration of the entire network, which saves computation time. Disregarding the decomposition and identifying controls for the whole network instead yields the same minimal set of three controls. However, this may not always be the case. In rare cases, the module-by-module control identification strategy will yield a set of controls that is larger than necessary.

Discussion

The search for “fundamental laws” has been part of systems biology since its beginning, including features of biological systems that are characteristic of most or all systems of a given type, such as gene regulatory networks. The concept of modularity can be considered as such a feature, and has been studied extensively in several different contexts. Another focus of interest has been the relationship between the structure and function of dynamic networks. The results in this paper in essence provide evidence that modularity is in fact a key feature that connects structure and function of networks.

Systems biology has been a field that is making extensive use of mathematical models as descriptive language and analytic tool. Notions such as dynamic modularity are difficult or impossible to study without the use of mathematical models, as is the relationship between structure and function of networks. A limitation of this approach is of course that published models are partial and simplified representations of the requisite biology, so that caution is required when drawing conclusions. But this approach has yielded useful results in studying motifs in static networks (e.g., (20)). The advantage of a mathematical foundation is that it enables an analytical treatment of concepts that might otherwise have to be studied using heuristics, examples, and simulations. This is the essence of our approach in this study. Based on rigorous definitions, we were able to prove the link between structural and functional modularity, as well as the broad application to control of networks. We believe that we have only scratched the surface of results that follow from the mathematical framework we have established. For instance, the flip side of network decomposition is network construction through “concatenation” of modules. This can be done in ways that achieve certain dynamic properties, of potential interest to problems in synthetic biology.

Finally, while we have provided evidence that our concept of structural and functional modularity might have biological relevance, more work remains to be done. For instance, it would be of interest to investigate the biological features of the individual modules found in the repository of Boolean network models from (14) to investigate whether modules in our definition can be viewed as meaningful biological “functional units.” The implications of a functional modular structure also remain to be explored beyond our initial result of control at the modular level. We also believe that many of our results should hold in appropriate form for the modeling framework of ordinary differential equations.

Methods

Meta-analysis of published gene regulatory network models

We used the same repository of 122 published and distinct gene regulatory network models as in (14). Some of these models include non-essential regulators. That is, a node is included as a regulator in an update rule but a change in this node never affects the update rule. We removed all non-essential regulators from the update rules, before computing for each network the number of genes (i.e., size), the average connectivity, all SCCs, as well as the size of each SCC. From this, we derived the primary metric of interest, the number of non-trivial SCCs. Trivial SCCs consist of one node only. Since SCCs are defined as the largest connected component such that there is a path from every node to every other node, it is irrelevant whether the single node in a trivial SCC regulates itself.

The logistic multivariable regression model, implemented in the Python module statsmodels api is given by

\frac{p}{1 - p} = e^{β_{0} + β_{1} x_{1} + β_{2} x_{2}},

where $p$ is the probability of a model having multiple non-trivial SCCs, and $x_{1}, x_{2}$ are average connectivity and network size.

Generation of Boolean networks for simulation study

To understand the effect of modularity on the phenotypical robustness and the dynamical complexity, we resorted to simulation studies of Boolean networks with a specific structure and a defined number of SCCs (i.e., modules). To reduce the number of potential confounders, we fixed the network size at $N = 60$ and the in-degree of each node at $n = 3$ , which is slightly higher than the average in-degree in published gene regulatory network models (14). We further considered only nested canalizing update rules since most rules in published gene regulatory networks are of this type (14). To generate networks with a defined number of $m \in {1,2, \dots, 6}$ modules, each of which consists of $N / m$ nodes, we first generated a random directed acyclic graph of $m$ modules by picking uniformly at random a weakly-connected lower triangular binary $m \times m$ -matrix $D$ with diagonal entries 1. If $D_{i j} = 1$ , a node in module $i$ regulates a node in module $j$ . Otherwise, there is no connection. To ensure that the number of SCCs was indeed $m$ , we required each module to be a single SCC. We achieved this by randomly generating wiring diagrams for a module until the wiring diagram was strongly connected (for the sparsest modules (i.e., $= 1, N / m = 60$ ), this took on average about 22 iterations).

Estimating dynamical complexity and phenotypical robustness

The size of the state space of the 60-node Boolean networks used in the computational study prohibits the exhaustive identification of all attractors. To compute all attractors, we could have exploited the decomposition into smaller modules for decomposable networks. However, this does not help with the identification of attractors for non-decomposable networks consisting of a single module of size 60. To avoid introducing any bias by using different methods, we employed the same sampling technique to estimate a lower bound of the number of attractors for each Boolean network. Specifically for each network $F$ , we generated 500 random initial states $x_{0} \in {0,1}^{60}$ and continued to synchronously update each $x_{0}$ according to $F$ (that is, $x_{i + 1} = F (x_{i}))$ until a recurring state was found, indicating the arrival at an attractor.

Biologically meaningful attractors “attract” a substantial portion of the state space. With a state space size of 2⁶⁰ and when starting from 500 random initial states, we have a $95 %$ chance of finding an attractor, which attracts 0.6% of the state space and even a $99 %$ chance of finding an attractor, which attracts 0.9% of the state space. Relying on sampling and the resulting lower bound of the number of attractors should therefore not limit the validity of our findings.

To estimate the phenotypical robustness, we considered the same 500 random initial states $x_{0} \in {0,1}^{60}$ and generated for each $x_{0}$ a corresponding state $y_{0} = x_{0} \oplus e_{i}$ by randomly flipping one bit $i \in {1, \dots, n}$ (where $e_{i}$ is the $i$ th unit vector and $\oplus$ denotes binary addition). Just as $x_{0}$ , we synchronously updated $y_{0}$ according to $F$ until it reached an attractor and compared the attractors. As a consequence, all estimated phenotypical robustness values are multiples of 1/500.

Supplementary Material

NIHPP2023.09.11.557227V1-supplement-1.pdf^{(465.8KB, pdf)}

Acknowledgments

The authors thank Elena Dimitrova for participating in initial fruitful discussions.

Funding:

C.K. was partially supported by a Collaboration Grant from the Simons foundation (712537). M.W. thanks the University of Florida College of Medicine for travel support. D.M. was partially supported by a Collaboration Grant from the Simons foundation (850896). R.L. received support from NIH and DARPA: 1 R01 HL169974-01, HR00112220038. All authors thank the Banff International Research Station for support through its Focused Research Group program during the week of May 29, 2022 (22frg001), which was of great help in completing this paper.

Funding Statement

Footnotes

Competing interests: The authors declare that they have no competing interests.

Data availability:

All data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary Materials.

References and Notes

1.Hartwell L. H., Hopfield J. J., Leibler S., Murray A. W., Nature 402 (1999). [DOI] [PubMed] [Google Scholar]
2.Hernández U., Posadas-Vidales L., Espinosa-Soto C., Biosystems 212, 104586 (2022). [DOI] [PubMed] [Google Scholar]
3.Lorenz D. M., Jeng A., Deem M. W., Physics of Life Reviews 8, 129 (2011). [DOI] [PubMed] [Google Scholar]
4.Leicht E. A., Newman M. E., Physical review letters 100, 118703 (2008). [DOI] [PubMed] [Google Scholar]
5.Malliaros F. D., Vazirgiannis M., Physics reports 533, 95 (2013). [Google Scholar]
6.Zhang B., Horvath S., Statistical applications in genetics and molecular biology 4 (2005). [DOI] [PubMed] [Google Scholar]
7.Alexander R. P., Kim P. M., Emonet T., Gerstein M. B., Science signaling 2, pe44 (2009). [DOI] [PMC free article] [PubMed] [Google Scholar]
8.Jimenez A., Cotterell J., Munteanu A., Sharpe J., Molecular Systems Biology 13, 925 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
9.Verd B., Monk N. A., Jaeger J., Elife 8, e42832 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
10.Deritei D., Aird W. C., Ercsey-Ravasz M., Regan E. R., Scientific Reports 6 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
11.Wagner G. P., Pavlicev M., Cheverud J. M., Nature Reviews Genetics 8, 921 (2007). [DOI] [PubMed] [Google Scholar]
12.Gilarranz L. J., Rayfield B., Liñán-Cembrano G., Bascompte J., Gonzalez A., Science 357, 199 (2017). [DOI] [PubMed] [Google Scholar]
13.Schwab J. D., Kühlwein S. D., Ikonomi N., Kühl M., Kestler H. A., Computational and structural biotechnology journal 18, 571 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
14.Kadelka C., Butrie T.-M., Hilton E., Kinseth J., Serdarevic H., arXiv preprint arXiv:2009.01216 (2020). [Google Scholar]
15.Madrahimov A., Helikar T., Kowal B., Lu G., Rogers J., Bulletin of mathematical biology 75, 988 (2013). [DOI] [PubMed] [Google Scholar]
16.Lee W.-P., Tzou W.-S., Briefings in bioinformatics 10, 408 (2009). [DOI] [PubMed] [Google Scholar]
17.Pratapa A., Jalihal A. P., Law J. N., Bharadwaj A., Murali T., Nature Methods 17, 147 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
18.Wagner G. P. (Princeton University Press, 2014).
19.Halfon M. S., Trends in Genetics 33, 436 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
20.Alon U., Science 301, 1866 (2003). [DOI] [PubMed] [Google Scholar]
21.Klemm K., Bornholdt S., Proceedings of the National Academy of Sciences 102, 18414 (2005). [DOI] [PMC free article] [PubMed] [Google Scholar]
22.Derrida B., Weisbuch G., Journal de Physique 47, 1297 (1986). [Google Scholar]
23.Borriello E., Daniels B. C., Nature communications 12, 1 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
24.Rozum J., Albert R., NPJ systems biology and applications 8, 36 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
25.Paul S., Su C., Pang J., Mizera A., Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics (2018), pp. 11–20. [Google Scholar]
26.Murrugarra D., Veliz-Cuba A., Aguilar B., Laubenbacher R., BMC Syst Biol 10, 94 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
27.Choi M., Shi J., Jung S. H., Chen X., Cho K.-H., Science signaling 5, ra83 (2012). [DOI] [PubMed] [Google Scholar]
28.Wooten D. J., et al. , PLoS computational biology 17, e1008690 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
29.Kadelka C., Laubenbacher R., Murrugarra D., Veliz-Cuba A., Wheeler M., arXiv preprint arXiv:2206.04217 (2022). [Google Scholar]
30.Plaugher D., Murrugarra D., Bulletin of Mathematical Biology 83, 1 (2021). [DOI] [PubMed] [Google Scholar]
31.Zañudo J. G. T., Yang G., Albert R., Proceedings of the National Academy of Sciences 114, 7234 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
32.Zanudo J. G., Albert R., PLoS computational biology 11, e1004193 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

NIHPP2023.09.11.557227V1-supplement-1.pdf^{(465.8KB, pdf)}

Data Availability Statement

All data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary Materials.

[R1] 1.Hartwell L. H., Hopfield J. J., Leibler S., Murray A. W., Nature 402 (1999). [DOI] [PubMed] [Google Scholar]

[R2] 2.Hernández U., Posadas-Vidales L., Espinosa-Soto C., Biosystems 212, 104586 (2022). [DOI] [PubMed] [Google Scholar]

[R3] 3.Lorenz D. M., Jeng A., Deem M. W., Physics of Life Reviews 8, 129 (2011). [DOI] [PubMed] [Google Scholar]

[R4] 4.Leicht E. A., Newman M. E., Physical review letters 100, 118703 (2008). [DOI] [PubMed] [Google Scholar]

[R5] 5.Malliaros F. D., Vazirgiannis M., Physics reports 533, 95 (2013). [Google Scholar]

[R6] 6.Zhang B., Horvath S., Statistical applications in genetics and molecular biology 4 (2005). [DOI] [PubMed] [Google Scholar]

[R7] 7.Alexander R. P., Kim P. M., Emonet T., Gerstein M. B., Science signaling 2, pe44 (2009). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R8] 8.Jimenez A., Cotterell J., Munteanu A., Sharpe J., Molecular Systems Biology 13, 925 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R9] 9.Verd B., Monk N. A., Jaeger J., Elife 8, e42832 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R10] 10.Deritei D., Aird W. C., Ercsey-Ravasz M., Regan E. R., Scientific Reports 6 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R11] 11.Wagner G. P., Pavlicev M., Cheverud J. M., Nature Reviews Genetics 8, 921 (2007). [DOI] [PubMed] [Google Scholar]

[R12] 12.Gilarranz L. J., Rayfield B., Liñán-Cembrano G., Bascompte J., Gonzalez A., Science 357, 199 (2017). [DOI] [PubMed] [Google Scholar]

[R13] 13.Schwab J. D., Kühlwein S. D., Ikonomi N., Kühl M., Kestler H. A., Computational and structural biotechnology journal 18, 571 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R14] 14.Kadelka C., Butrie T.-M., Hilton E., Kinseth J., Serdarevic H., arXiv preprint arXiv:2009.01216 (2020). [Google Scholar]

[R15] 15.Madrahimov A., Helikar T., Kowal B., Lu G., Rogers J., Bulletin of mathematical biology 75, 988 (2013). [DOI] [PubMed] [Google Scholar]

[R16] 16.Lee W.-P., Tzou W.-S., Briefings in bioinformatics 10, 408 (2009). [DOI] [PubMed] [Google Scholar]

[R17] 17.Pratapa A., Jalihal A. P., Law J. N., Bharadwaj A., Murali T., Nature Methods 17, 147 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R18] 18.Wagner G. P. (Princeton University Press, 2014).

[R19] 19.Halfon M. S., Trends in Genetics 33, 436 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R20] 20.Alon U., Science 301, 1866 (2003). [DOI] [PubMed] [Google Scholar]

[R21] 21.Klemm K., Bornholdt S., Proceedings of the National Academy of Sciences 102, 18414 (2005). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R22] 22.Derrida B., Weisbuch G., Journal de Physique 47, 1297 (1986). [Google Scholar]

[R23] 23.Borriello E., Daniels B. C., Nature communications 12, 1 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R24] 24.Rozum J., Albert R., NPJ systems biology and applications 8, 36 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R25] 25.Paul S., Su C., Pang J., Mizera A., Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics (2018), pp. 11–20. [Google Scholar]

[R26] 26.Murrugarra D., Veliz-Cuba A., Aguilar B., Laubenbacher R., BMC Syst Biol 10, 94 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R27] 27.Choi M., Shi J., Jung S. H., Chen X., Cho K.-H., Science signaling 5, ra83 (2012). [DOI] [PubMed] [Google Scholar]

[R28] 28.Wooten D. J., et al. , PLoS computational biology 17, e1008690 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R29] 29.Kadelka C., Laubenbacher R., Murrugarra D., Veliz-Cuba A., Wheeler M., arXiv preprint arXiv:2206.04217 (2022). [Google Scholar]

[R30] 30.Plaugher D., Murrugarra D., Bulletin of Mathematical Biology 83, 1 (2021). [DOI] [PubMed] [Google Scholar]

[R31] 31.Zañudo J. G. T., Yang G., Albert R., Proceedings of the National Academy of Sciences 114, 7234 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R32] 32.Zanudo J. G., Albert R., PLoS computational biology 11, e1004193 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]

PERMALINK

This is a preprint.

Modularity of biological systems: a link between structure and function

Claus Kadelka

Matthew Wheeler

Alan Veliz-Cuba

David Murrugarra

Reinhard Laubenbacher

Abstract

Introduction

Boolean networks

Figure 1:

A structural definition of modularity for Boolean networks

Modularity in expert-curated biological networks

Modularity confers phenotypical robustness and a rich dynamic repertoire

Figure 2:

Structural decomposition of Boolean networks

Figure 3:

Dynamic decomposition of Boolean networks

Figure 4:

Efficient control of decomposable Boolean networks

Figure 5:

Discussion

Methods

Meta-analysis of published gene regulatory network models

Generation of Boolean networks for simulation study

Estimating dynamical complexity and phenotypical robustness

Supplementary Material

Acknowledgments

Funding:

Funding Statement

Footnotes

Data availability:

References and Notes

Associated Data

Supplementary Materials

Data Availability Statement

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases