Abstract
This paper addresses two topics in systems biology, the hypothesis that biological systems are modular and the problem of relating structure and function of biological systems. The focus here is on gene regulatory networks, represented by Boolean network models, a commonly used tool. Most of the research on gene regulatory network modularity has focused on network structure, typically represented through either directed or undirected graphs. But since gene regulation is a highly dynamic process as it determines the function of cells over time, it is natural to consider functional modularity as well. One of the main results is that the structural decomposition of a network into modules induces an analogous decomposition of the dynamic structure, exhibiting a strong relationship between network structure and function. An extensive simulation study provides evidence for the hypothesis that modularity might have evolved to increase phenotypic complexity while maintaining maximal dynamic robustness to external perturbations.
Keywords: decomposition theory, structure and function of networks, Boolean networks
1. Introduction
Building complicated structures from simpler building blocks is a widely observed principle in both natural and engineered systems. In molecular systems biology, it is also widely accepted, even though there has not emerged a clear definition of what constitutes a simple building block, or module. Consequently, it is not clear how the modular structure of a system can be identified, why it is advantageous to an organism to be composed of modular components, and how we could take advantage of modularity to advance our understanding of molecular systems [1–3]. In the (graph-theoretic) network representation of molecular systems, such as gene regulatory networks or protein–protein interaction networks, a module is typically considered to be a ‘highly’ connected region of the graph that is ‘sparsely’ connected to the rest of the graph, otherwise known as a community in the graph. Graph theoretic algorithms that depend on the choice of parameters, and the specific definition of ‘highly’ and ‘sparsely’ are typically used to define modules [4,5]. Similar approaches are used for identifying modules in co-expression networks based on clustering of transcriptomics data [6].
A major limitation of this approach to modularity is that it focuses entirely on a static representation of gene regulatory networks and other systems. However, living organisms are dynamic, and need to be modelled and understood as dynamical systems. Thus, modularity should have an instantiation as a dynamic feature, as advocated in [7]. The most common types of models employed for this purpose are systems of ordinary differential equations and discrete models such as Boolean networks and their generalizations, providing the basis for a study of dynamic modularity. In recent years, there have been an increasing number of papers that take this point of view. Jimenez et al. [8] argue that dynamic modularity may be independent of structural modularity, and they identify examples of multi-functional circuits in gene regulatory networks that they consider dynamically modular but without any underlying structural modularity. A similar argument is made in [9] by analysing a small gene regulatory network example. For another example of a similar approach see [10].
The literature on how modularity might have evolved and why it might be useful as an organizational principle cites as the most common reasons robustness, the ability to rapidly respond to changing environmental conditions, and efficiency in the control of response to perturbations [2,11,12]. An interesting hypothesis has been put forward in [3], namely that a modular organization of biological structure can be viewed as a symmetry-breaking phase transition, with modularity as the order parameter.
This literature makes clear that research on the topic of modularity in molecular systems, both structural and dynamic, would be greatly advanced by clear definitions of the concept of module, both structural and dynamic. This would in particular help to decide whether and how structural and dynamic modularity are related, and it would provide a basis on which to distinguish between dynamic modularity and multi-stationarity of a dynamic regulatory network. To be of practical use, such a theory should include algorithms to decompose a dynamic network into structural and/or dynamic modules. At the same time, it would be of great practical value, for instance for synthetic biology, to understand how systems can be composed from modules that have specific dynamic properties.
The search for such algorithms has led us to look for guidance to mathematics, as a complement to biology. After all, if the dynamic mathematical models that are widely used to encode gene regulatory networks are appropriate representations, and if modularity is indeed an important feature of such networks, then it should be reflected in the model structure and dynamics. Choosing the widely used modelling framework of Boolean networks, we asked whether it is possible to identify meaningful concepts of modularity that, ideally, link both the structural and dynamic aspects. Modularity is fundamentally about connectivity. The central dynamic instantiation of connectivity is the feedback loop, which we, therefore, choose as the defining feature. The concept of module we propose is structural, in terms of special subgraphs of the (directed) graph of dependencies of network nodes. These subgraphs, called strongly connected components (SCCs), are maximal with respect to the property that every node is connected to every other node in the subgraph through a directed path. In other words, none of the nodes in the SCC are involved in feedback loops that are not entirely within the SCC. These types of decomposition-based approaches are by no means novel and have been employed by computer scientists for developing faster, more efficient algorithms for finding and enumerating attractors. For instance, tree decompositions have been used to find fixed points and attractors of nested canalizing networks [13], and SCCs have been used to enumerate the attractors of an asynchronous network, in a manner very similar to our own approach [14]. Our aim is to highlight the structure that these types of decompositions, in particular that of SCCs, place on the attractor landscape, and to investigate the implications for the modelled biological systems.
The main result of this paper is that this structural decomposition of the model into modules induces a similar decomposition of model dynamics, explicitly linking the dynamics of the structural modules in a mathematically clearly specified way. This theorem links structural and dynamic modularity, and provides an example of how network structure influences network function. We provide an important application of this theorem to network control by showing that, in order to control a network, it is sufficient to control its modules, and we provide an application of this result to a published cancer signalling network. This result is important for applications, e.g. in medicine, and might provide a candidate for a mechanism that allows organisms to quickly respond to changes in their external environment. We also discuss our results in the context of published Boolean network models of regulatory networks and provide specific instantiations of our decomposition theorem. Finally, we address the question as to why evolution should favour modularity as a structural and dynamic feature. We carry out an extensive simulation study that provides evidence for the hypothesis that modularity enables phenotypic complexity while maintaining maximal robustness to external perturbations.
1.1. Boolean networks
For the purpose of this article, we will focus on the class of Boolean networks as a modelling paradigm. Recall that a Boolean network F on variables x1, …, xn can be viewed as a function on binary strings of length n, which is described coordinate-wise by n Boolean update functions fi. Each function fi uniquely determines a map
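$$F_i : \{0,1\}^n \to \{0,1\}^n, \quad F_i(x_1, \ldots, x_n) = (x_1, \ldots, f_i(x), \ldots, x_n),$$
where x = (x1, …, xn). Every Boolean network defines a canonical map, where the functions are synchronously updated,
$$F : \{0,1\}^n \to \{0,1\}^n, \quad F(x_1, \ldots, x_n) = (f_1(x), \ldots, f_n(x)).$$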
In this paper, we only consider this canonical map, i.e. we only consider synchronously updated Boolean network models. Two directed graphs can be associated with F (see figure 1 for an example). The wiring diagram (also known as dependency graph) contains n nodes corresponding to the xi, and has a directed edge from xi to xj if fj depends on xi. The state space of F contains as nodes the $2^n$ binary strings, and has a directed edge from u to v if F(u) = v. Each connected component of the state space gives an attractor basin of F, which consists of a directed loop, the attractor, as well as trees feeding into the attractor. Attractors can be steady states (also known as fixed points) or limit cycles. Each attractor in a biological Boolean network model typically corresponds to a distinct phenotype [15]. The set of attractors of F, denoted $\mathcal{D}(F)$, contains all attractors, i.e. all minimal non-empty subsets $C \subseteq \{0,1\}^n$ satisfying $F(C) = C$. Note that a limit cycle of length k represents k trajectories. For example, the 2-cycle (010, 101) in figure 1 represents (010, 101, 010, …) and (101, 010, 101, …). This distinction becomes important later, when decomposing the dynamics of Boolean networks.
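The definitions above can be illustrated with a minimal Python sketch that enumerates the state space of a small synchronous Boolean network and collects its attractors. The three-node network `update` is a hypothetical example chosen for illustration (it is not the network of figure 1), and the function name `attractors` is likewise an assumption. Exhaustive enumeration of this kind is only feasible for small n, since the state space has $2^n$ states.

```python
from itertools import product

def update(x):
    """Hypothetical 3-node network: f1 = x2, f2 = x1, f3 = x1 AND x2."""
    x1, x2, x3 = x
    return (x2, x1, x1 & x2)

def attractors(F, n):
    """Return all attractors of the synchronous network F on n nodes.

    Each attractor is a tuple of states; steady states have length 1.
    """
    found = []
    seen = set()                      # states whose attractor is already known
    for start in product((0, 1), repeat=n):
        path, index, x = [], {}, start
        # Follow the trajectory until it hits a known state or closes a loop.
        while x not in index and x not in seen:
            index[x] = len(path)
            path.append(x)
            x = F(x)
        if x in index:                # a new loop was closed on this walk
            found.append(tuple(path[index[x]:]))
        seen.update(path)
    return found

all_attractors = attractors(update, 3)
for a in all_attractors:
    kind = "steady state" if len(a) == 1 else f"limit cycle of length {len(a)}"
    print(kind, a)
```

Every state is visited once across all walks, so the running time is linear in the size of the state space.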
2. Results
2.1. A structural definition of modularity for Boolean networks
Given a Boolean network F and a subset S of its variables, we can define a subnetwork of F, denoted F|S, as the restriction of F to S. If some variables in S are regulated by variables not in S, then we require these regulations to be included in F|S. In this case, the subnetwork is a Boolean network with external parameters. For the example in figure 1, the subnetwork contains x1 as external parameter because x1 regulates x3. If the variables in S form an SCC (that is, (i) every pair of nodes in S (excluding possible external parameters) is connected by a directed path and (ii) the inclusion of any additional node in S will break this property), we call the subnetwork a module.
The wiring diagram of any Boolean network F is either strongly connected or it consists of a collection of SCCs where connections between two SCCs point in only one direction. Let W1, …, Wm be the SCCs of the wiring diagram, with Yi denoting the set of variables in SCC Wi (note $\bigcup_i Y_i = \{x_1, \ldots, x_n\}$ and $Y_i \cap Y_j = \emptyset$ for i ≠ j). Then, the modules of F are $F|_{Y_1}, \ldots, F|_{Y_m}$, the restrictions of F to the Yi. By setting Wi → Wj if there exists at least one edge from a node in Wi to a node in Wj, we obtain a directed acyclic graph
$$Q = \{(i, j) \mid i \neq j,\ W_i \rightarrow W_j\}, \tag{2.1}$$
which describes the connections between the modules of F.
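The construction of the modules and of the directed acyclic graph Q can be sketched in a few lines of Python. The sketch below uses Kosaraju's algorithm, a standard choice that the paper does not prescribe, and a hypothetical four-node wiring diagram; the function names `strongly_connected_components` and `module_dag` are assumptions made for illustration. An edge (u, v) means that node u regulates node v.

```python
def strongly_connected_components(nodes, edges):
    """Kosaraju's algorithm. Returns the SCCs as a list of frozensets,
    ordered so that upstream components come before downstream ones."""
    succ = {v: [] for v in nodes}
    pred = {v: [] for v in nodes}
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)

    # First pass: iterative depth-first search recording the finishing order.
    order, visited = [], set()
    for root in nodes:
        if root in visited:
            continue
        visited.add(root)
        stack = [(root, iter(succ[root]))]
        while stack:
            v, it = stack[-1]
            for w in it:
                if w not in visited:
                    visited.add(w)
                    stack.append((w, iter(succ[w])))
                    break
            else:                      # all successors of v are finished
                order.append(v)
                stack.pop()

    # Second pass: search the reversed graph in decreasing finishing order.
    sccs, assigned = [], set()
    for root in reversed(order):
        if root in assigned:
            continue
        comp, todo = set(), [root]
        assigned.add(root)
        while todo:
            v = todo.pop()
            comp.add(v)
            for w in pred[v]:
                if w not in assigned:
                    assigned.add(w)
                    todo.append(w)
        sccs.append(frozenset(comp))
    return sccs

def module_dag(edges, sccs):
    """Edges (i, j) of Q: module i regulates module j, with i != j."""
    comp_of = {v: i for i, comp in enumerate(sccs) for v in comp}
    return {(comp_of[u], comp_of[v]) for u, v in edges if comp_of[u] != comp_of[v]}

# Hypothetical wiring diagram: x1 <-> x2, x3 <-> x4, and x2 -> x3.
nodes = [1, 2, 3, 4]
edges = [(2, 1), (1, 2), (2, 3), (4, 3), (3, 4)]
sccs = strongly_connected_components(nodes, edges)
Q = module_dag(edges, sccs)
print(sccs, Q)
```

In this example the two feedback loops form two modules, and the single edge from x2 to x3 becomes the only edge of Q.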
As we will show later, any Boolean network can be decomposed into modules and this structural decomposition implies a decomposition of the network dynamics, which is of practical utility. The main question to be answered at this point, though, is whether there exists biological evidence that our concept of modularity and the structural and dynamic decomposition theory that follows does in fact reflect reality.
2.2. Modularity in expert-curated biological networks
A recent study investigated the features of 122 distinct published, expert-curated Boolean network models [16]. Analysing the wiring diagrams of these models, we found that almost all of them (113, 92.6%) contained at least one feedback loop and thus at least one non-trivial SCC/module (which contains more than one node). The nine models that only contained single-node SCCs mainly describe signalling pathways. Thirty models (24.6%) contained more than one non-trivial SCC, with one Influenza A virus replication model possessing 11 [17]. The directed acyclic graph structure (equation (2.1)) of these models varied widely (figure 2). While the average connectivity of a network was not correlated with the number of non-trivial SCCs ($\rho_{\text{Spearman}} = -0.08$, p = 0.37), network size was positively correlated ($\rho_{\text{Spearman}} = 0.37$, $p < 10^{-4}$). The same trends persisted when considering the binary variable ‘multiple non-trivial SCCs’ (multi-variable logistic regression: connectivity p = 0.07, size p = 0.002).
Modules are subnetworks that carry out key control functions in a cell. It would, therefore, not be surprising if there was a selection bias among systems biologists to focus their attention on such modules. Larger networks are still challenging to build and analyse since an accurate formulation of a biological network model requires a substantial amount of data for a careful inference and calibration of the update rules by a subject expert [18–21]. For this reason most published expert-curated models might focus on one specific cellular function of interest and contain, therefore, only one non-trivial SCC. Assuming that a principled method for predetermining the modular structure of a biological system existed, one interesting application of this modular decomposition would be to allow Boolean inference algorithms to use this decomposition to focus on one module at a time reducing the complexity of the problem.
2.3. Modularity confers phenotypical robustness and a rich dynamic repertoire
To provide additional evidence that SCCs form biologically meaningful modules, we performed a computational study which shows that the presence of several modules confers robust phenotypes and a rich dynamic repertoire, both desirable features for an organism.
Biological networks must harbour multiple phenotypes, allowing the network to dynamically shift from one attractor to another based on its current needs. This shift is typically mediated by external signals. Many evolutionary innovations are the result of newly evolved attractors of gene regulatory networks (GRNs) [22,23]. The number of attractors of a Boolean network, therefore, describes its dynamical complexity.
Furthermore, biological networks need to robustly maintain a certain function (i.e. phenotype) in the presence of intrinsic and extrinsic perturbations [24,25]. At any moment, these perturbations may cause a small number of genes to randomly change their expression level. For a Boolean GRN model, this corresponds to an unexpressed gene being randomly expressed, or vice versa. The robustness of the network describes how a perturbation on average affects the network dynamics. One popular robustness measure for Boolean networks (BNs), the Derrida value, describes the average Hamming distance between two states after one synchronous update according to the Boolean network rules, given that the two states differed in a single node [26]. Due to the finite size of the state space, any state of a BN eventually transitions to an attractor, which corresponds to a distinct biological phenotype. Thus, while the Derrida value is a meaningful robustness measure, a more phenotype-focused measure describes how frequently a small perturbation (e.g. a single node flip) forces the network to transition to a different attractor. We, therefore, measure the phenotypical robustness of a Boolean network $F : \{0, 1\}^n \to \{0, 1\}^n$ by
$$r(F) = \frac{1}{n\,2^n} \sum_{x \in \{0,1\}^n} \sum_{i=1}^{n} \mathbb{1}\big[A(x), A(x \oplus e_i)\big], \tag{2.2}$$
$$\mathbb{1}[a, b] = \begin{cases} 1 & \text{if } a = b, \\ 0 & \text{otherwise.} \end{cases} \tag{2.3}$$
Here, $e_i$ is the ith unit vector, $\oplus$ denotes binary addition and A(x) labels the attractor that state x transitions to. Geometrically, if we consider the Boolean hypercube with each vertex in $\{0, 1\}^n$ labelled by the attractor that the vertex-associated state eventually transitions to, then r(F) is the proportion of edges that connect vertices with the same value.
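For small networks, the phenotypical robustness can be computed exactly from its geometric description as the proportion of hypercube edges whose two end points reach the same attractor. The following Python sketch does this by exhaustive enumeration. The function names and the three-node example network are assumptions made for illustration, and each attractor is labelled by its smallest state, which is one convenient choice for A(x).

```python
from itertools import product

def attractor_labels(F, n):
    """Map every state to a label of the attractor it eventually reaches.
    The label is the lexicographically smallest state of that attractor."""
    label = {}
    for start in product((0, 1), repeat=n):
        path, on_path, x = [], set(), start
        while x not in label and x not in on_path:
            on_path.add(x)
            path.append(x)
            x = F(x)
        if x in label:                       # joined a basin that is already labelled
            lab = label[x]
        else:                                # closed a new loop starting at x
            lab = min(path[path.index(x):])
        for s in path:
            label[s] = lab
    return label

def phenotypical_robustness(F, n):
    """Proportion of single-bit flips that do not change the attractor reached."""
    label = attractor_labels(F, n)
    same = 0
    for x in label:
        for i in range(n):
            y = x[:i] + (1 - x[i],) + x[i + 1:]   # x with bit i flipped
            same += label[x] == label[y]
    # The double sum counts every hypercube edge twice; n * 2**n normalizes this.
    return same / (n * 2 ** n)

def example(x):
    """Hypothetical 3-node network: f1 = x2, f2 = x1, f3 = x1 AND x2."""
    return (x[1], x[0], x[0] & x[1])

print(phenotypical_robustness(example, 3))
```

A network with a single attractor yields the value 1, and a network in which every state is its own steady state yields 0, which are the two extremes of the measure.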
Clearly, r(F) = 1 if a Boolean network F possesses only a single attractor. Moreover, the expected value, $\mathbb{E}[r(F)]$, decreases as the number of attractors of F increases. This implies that the phenotypical robustness and the dynamical complexity are negatively correlated and that there exists a trade-off when trying to maximize both. It is reasonable to hypothesize that evolution favours robust GRNs that give rise to sufficient variety in the phenotype space. In line with this, we hypothesized that modular networks have higher robustness than non-modular networks with the same dynamical complexity.
To test this hypothesis, we generated Boolean networks with N = 60 nodes, a fixed in-degree of 3, and m = 1, …, 6 modules (i.e. SCCs of the wiring diagram) of size N/m. Since published expert-curated Boolean GRN networks are almost exclusively governed by nested canalizing functions [16], we required all update rules to be of this type. Networks with more modules possessed on average a higher dynamical complexity, quantified here as the number of attractors (figure 3a). At a fixed dynamical complexity, the more modular a network the higher was its average phenotypical robustness (figure 3b). This finding supports the hypothesis that a modular design serves as an evolutionary answer to a multi-objective optimization problem.
2.4. Structural decomposition of Boolean networks
Thus far, we have described how to define modules as restrictions of Boolean networks and provided evidence that modules defined this way are biologically meaningful. To obtain a successful decomposition theory, we also require the inverse operation of a restriction: a semi-direct product that combines two Boolean networks, F and G, such that F is the upstream module and G is the downstream module. The coupling scheme P specifies which nodes in F regulate which nodes in G. We denote the combined Boolean network as $F \rtimes_P G$ and refer to this as the coupling of F and G by the coupling scheme P or as the semi-direct product of F and G via P (detailed definition in appendix A, §A.1). (The motivation for the term ‘semi-direct product’ comes from the fact that the combination of the two subnetworks is like a product, except that F acts on G through P, which is not the case in an actual product. The term is also used in mathematical group theory, which provided the motivation for our decomposition approach.)
As an example, consider the Boolean networks where G possesses two external parameters, u1 and u2. With the coupling scheme P = {x1 → u1, x2 → u2}, we obtain the combined nested canalizing network ,
At the wiring diagram level, this product can be seen as the union of the two wiring diagrams and some added edges determined by the coupling scheme P (figure 4).
If instead G(u1, u2, y1, y2) = u1 + u2 + y2, u2 + y1 with F and P as before, then we obtain the linear network
At the wiring diagram level, this product looks exactly the same (figure 4).
We can prove that every network is either a module or can be decomposed into a semi-direct product of two networks. That is, if a Boolean network F is not a module (i.e. if its wiring diagram is not strongly connected), then there exist F1, F2, P such that $F = F_1 \rtimes_P F_2$, and we call such a network F decomposable. We can even find a decomposition such that F1 is a module. By induction on the downstream component F2, it follows that any Boolean network is either a module or decomposable into a unique series of semi-direct products of modules. That is, for any Boolean network F, there exist unique modules F1, …, Fm (m = 1 if F is itself a module) such that
$$F = F_1 \rtimes_{P_1} \big(F_2 \rtimes_{P_2} (\cdots \rtimes_{P_{m-1}} F_m)\big), \tag{2.4}$$
where this representation is unique up to a reordering, which respects the partial order induced by the directed acyclic graph Q (equation (2.1)). The collection of coupling schemes P1, …, Pm−1 depends on the particular choice of ordering, as well as on the placement of parentheses in the decomposition of F, which may be rearranged in any associative manner. Appendix A, §A.1 contains the proofs of these theorems.
2.5. Dynamic decomposition of Boolean networks
When the variables of a network F can be partitioned such that $F = F_1 \times F_2$ is simply the cross product of two networks F1 and F2, i.e. the coupling scheme $P = \emptyset$, then the dynamics of F can be determined directly from the dynamics of F1 and F2. The dynamics of F consists of coordinate pairs (x, y) such that
$$\big(x(t+1), y(t+1)\big) = \big(F_1(x(t)), F_2(y(t))\big). \tag{2.5}$$
If trajectories $(x(t))_{t \ge 0}$ and $(y(t))_{t \ge 0}$ have periods l and m, respectively, then the periodicity of the trajectory $(x(t), y(t))_{t \ge 0}$ is the least common multiple of l and m. Moreover, the set of periodic points (i.e. attractors) of F is the Cartesian product of the set of periodic points of F1 and periodic points of F2.
For example, the Boolean network F(x1, x2, x3, x4) = (x2, x1, x4, x3) can be seen as F = F1 × F2, where F1(x1, x2) = (x2, x1) and F2(x3, x4) = (x4, x3). The sets of attractors of F1 and F2 are $\mathcal{D}(F_1) = \{00, 11, (01, 10)\}$ and $\mathcal{D}(F_2) = \{00, 11, (01, 10)\}$ (where we omit parentheses around steady states). By concatenating the attractors of F1 and F2, we obtain the attractors of F (figure 5a). Note that we have two ways of concatenating the limit cycle (01, 10) of F1 and the limit cycle (01, 10) of F2 to obtain attractors of F. In general, we have the following equation that formally states that attractors of F1 × F2 are given by concatenating attractors of F1 and F2.
$$\mathcal{D}(F_1 \times F_2) = \mathcal{D}(F_1) \times \mathcal{D}(F_2). \tag{2.6}$$
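The concatenation of attractors in a cross product can be sketched in Python as follows. For an attractor of length l of F1 and an attractor of length m of F2, there are gcd(l, m) distinct relative phases, each giving an attractor of length lcm(l, m); this is why the two limit cycles (01, 10) can be concatenated in two ways. The function names `combine` and `product_attractors`, and the representation of states as bit strings, are assumptions made for illustration.

```python
from math import gcd

def combine(c1, c2):
    """All attractors of F1 x F2 built from attractor c1 of F1 and c2 of F2.

    Attractors are tuples of bit strings; states are concatenated position-wise.
    """
    l, m = len(c1), len(c2)
    g = gcd(l, m)
    period = l * m // g                      # least common multiple of l and m
    return [
        tuple(c1[t % l] + c2[(t + shift) % m] for t in range(period))
        for shift in range(g)                # one attractor per relative phase
    ]

def product_attractors(D1, D2):
    """Attractors of the cross product from the attractor sets D1 and D2."""
    return [c for c1 in D1 for c2 in D2 for c in combine(c1, c2)]

# Attractors of F1(x1, x2) = (x2, x1) and F2(x3, x4) = (x4, x3) from the text.
D1 = [("00",), ("11",), ("01", "10")]
D2 = [("00",), ("11",), ("01", "10")]
D = product_attractors(D1, D2)
for attractor in D:
    print(attractor)
```

For this example the list `D` contains four steady states and six limit cycles of length 2, which together cover all 16 states of the combined network.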
The computation of the attractors of F becomes more complicated when F is slightly modified so that $F = F_1 \rtimes_P F_2$, where F1 is as before and $F_2 = (u x_4, x_3)$ with external parameter u and coupling scheme P = {x2 → u}. Since the coupling between F1 and F2 is no longer empty, not every combination of attractors of F1 and F2 will result in an attractor of F (figure 5b). For example, the steady state 00 of F1 and the steady state 00 of F2 do give rise to an attractor of F (the steady state 0000), while the steady state 00 of F1 and the steady state 11 of F2 do not. The set of attractors, $\mathcal{D}(F)$, is the union of 00 × 00, 11 × {00, 11, (01, 10)}, and (01, 10) × {00, (01, 10)}, and is thus a subset of the attractors of the Cartesian product (figure 5a). This is, however, not always the case but depends on the particular coupling between the networks. Hence, equation (2.6) is not valid in general.
In order to study the dynamics of decomposable networks, we need to understand how a trajectory, which describes the behaviour of an ‘upstream’ network at an attractor, influences the dynamics of a ‘downstream’ network. The trajectory of an ‘upstream’ m-node network F1 at an attractor $C_1$ can be described by $(x(t))_{t \ge 0}$, a sequence with elements in $\{0, 1\}^m$. This trajectory has period $r = |C_1|$, the length of the attractor. The dynamics of the ‘downstream’ n-node network F2 depend on F1. Therefore, F2 is a non-autonomous Boolean network, defined by
$$y(t+1) = F_2\big(x(t), y(t)\big), \tag{2.7}$$
where $F_2 : \{0, 1\}^{m+n} \to \{0, 1\}^n$. Appendix A, §A.2 contains a detailed definition and examples of non-autonomous Boolean networks. To make the dependence of F2 on the choice of upstream attractor explicit, we often write $F_2^{C_1}$ instead of simply F2. If $C_2 = (y(0), y(1), \ldots)$ is an attractor of $F_2^{C_1}$, then
$$C_1 \oplus C_2 = \big((x(0), y(0)), (x(1), y(1)), \ldots, (x(l-1), y(l-1))\big) \tag{2.8}$$
is an attractor of the combined network of length $l = \operatorname{lcm}(|C_1|, |C_2|)$, the least common multiple of $|C_1|$ and $|C_2|$.
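Iterating over all attractors of F1 (that is, all $C_1 \in \mathcal{D}(F_1)$) as well as all attractors of the corresponding non-autonomous networks $F_2^{C_1}$ yields all attractors of the combined network F. After the structural decomposition theorem (equation (2.4)), this dynamic decomposition theorem constitutes the second main theoretical result. Mathematically, it can be expressed as
$$\mathcal{D}(F_1 \rtimes_P F_2) = \bigsqcup_{C_1 \in \mathcal{D}(F_1)} C_1 \oplus \mathcal{D}\big(F_2^{C_1}\big), \tag{2.9}$$
where $C_1 \oplus \mathcal{D}(F_2^{C_1})$ collects the attractors obtained by combining $C_1$ with each attractor of the downstream network driven by $C_1$. The right-hand side can be written as $\mathcal{D}(F_1) \rtimes_P \mathcal{D}(F_2)$ to highlight the analogy between the structural decomposition of a Boolean network and the decomposition of its dynamics. With this, the dynamic decomposition theorem states $\mathcal{D}(F_1 \rtimes_P F_2) = \mathcal{D}(F_1) \rtimes_P \mathcal{D}(F_2)$, which implies a distributive property for the dynamics of decomposable networks. Note that if P is empty, then $F_2^{C_1} = F_2$ for all $C_1 \in \mathcal{D}(F_1)$ and we recover equation (2.6), $\mathcal{D}(F_1 \times F_2) = \mathcal{D}(F_1) \times \mathcal{D}(F_2)$.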
The dynamics of a Boolean network F, which decomposes into modules F1, …, Fm, can thus be computed from the dynamics of its modules. That is,
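$$\mathcal{D}(F) = \mathcal{D}(F_1) \rtimes_{P_1} \Big(\mathcal{D}(F_2) \rtimes_{P_2} \big(\cdots \rtimes_{P_{m-1}} \mathcal{D}(F_m)\big)\Big), \tag{2.10}$$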
where the placement of the parentheses may be rearranged in any associative manner, just as for the structural decomposition in equation (2.4). Appendix A, §A.2 contains the proof of the dynamic decomposition theorem as well as instructional examples.
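The dynamic decomposition can be checked on the coupled four-node example of this section, in which F1(x1, x2) = (x2, x1) drives the downstream module (u x4, x3) through x2 → u. The Python sketch below is one possible way to compute the attractors module by module, under the assumption that the non-autonomous downstream network is handled by tracking the phase of the upstream attractor alongside the downstream state. All function names are hypothetical, and the result is compared with a brute-force enumeration of the full network.

```python
from itertools import product

def cycles(step, states):
    """Attractors (tuples of states) of the map `step` on the finite set `states`."""
    found, seen = [], set()
    for start in states:
        path, index, x = [], {}, start
        while x not in index and x not in seen:
            index[x] = len(path)
            path.append(x)
            x = step(x)
        if x in index:                       # closed a loop not seen before
            found.append(tuple(path[index[x]:]))
        seen.update(path)
    return found

def F1(x):
    """Upstream module: (x1, x2) -> (x2, x1)."""
    return (x[1], x[0])

def F2(u, y):
    """Downstream module with external parameter u: (x3, x4) -> (u AND x4, x3)."""
    return (u & y[1], y[0])

def coupled_attractors():
    """Attractors of the coupled network, computed one upstream attractor at a time."""
    result = set()
    for c1 in cycles(F1, list(product((0, 1), repeat=2))):
        r = len(c1)
        # Downstream network driven by c1: the state is (phase t, y),
        # and the coupling x2 -> u feeds c1[t][1] into F2.
        def step(s, c1=c1, r=r):
            t, y = s
            return ((t + 1) % r, F2(c1[t][1], y))
        driven_states = [(t, y) for t in range(r) for y in product((0, 1), repeat=2)]
        for c in cycles(step, driven_states):
            result.add(frozenset(c1[t] + y for t, y in c))
    return result

def brute_force_attractors():
    """Attractors of the full 4-node network by exhaustive enumeration."""
    def F(x):
        return F1(x[:2]) + F2(x[1], x[2:])
    return {frozenset(c) for c in cycles(F, list(product((0, 1), repeat=4)))}

print(coupled_attractors() == brute_force_attractors())
```

The module-wise computation only ever enumerates state spaces of size 4 and at most 8, rather than the 16 states of the full network; for networks with several larger modules this gap grows exponentially.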
2.6. Efficient control of decomposable Boolean networks
The state space of a Boolean network grows exponentially in the number of variables. Therefore, the decomposition theorems can reduce the time needed to perform various computations by orders of magnitude for networks with several larger modules. Besides an efficient strategy to compute all attractors of a Boolean network, the structural decomposition theorem can also be applied to efficiently identify controls of Boolean networks, a topic that has received recent attention [27–29]. Drug developers wonder, for example, which nodes in a gene regulatory network need to be controlled by an external drug to ensure the network transitions to a desired phenotype, typically corresponding to a specific network attractor.
Two types of control actions are generally considered: edge controls and node controls. For each type of control, one can consider deletions or constant expressions, as defined in [30]. The motivation for considering these control actions is that they represent the common interventions that can be implemented in practice. For instance, edge deletions can be achieved by the use of therapeutic drugs that target specific gene interactions, while node deletions represent the blocking of effects of products of genes associated with these nodes [31,32].
A set of controls μ stabilizes a Boolean network at an attractor $C$ when the resulting network after applying μ possesses $C$ as its only attractor. As described in detail in [33], the decomposition into modules can be used to obtain controls for each module, which can then be combined to obtain a control for an entire network. Specifically, for a decomposable network $F = F_1 \rtimes_P F_2$, if μ1 is a set of controls that stabilizes F1 in $C_1$ and μ2 is a set of controls that stabilizes $F_2^{C_1}$ (the downstream network driven by $C_1$) in $C_2$, then $\mu = (\mu_1, \mu_2)$ is a set of controls that stabilizes F in $C = C_1 \oplus C_2$, as long as $C_1$ or $C_2$ is a steady state.
A recently published multi-cellular Boolean network model describes the microenvironment of pancreatic cancer cells (PCCs) by modelling the interactions of PCCs, pancreatic stellate cells (PSCs), and their connecting cytokines [34]. This network has 69 nodes, 114 edges and possesses three non-trivial modules (figure 6a). Figure 6b shows the directed acyclic graph, which describes the connections between the modules.
An effective treatment should induce the cancer cell to undergo apoptosis, which, therefore, represents the desired attractor of this network. To find a set of controls that stabilizes the network in this attractor, one can exploit the structural decomposition of the network by first controlling the upstream module (module 1), which has four attractors: two steady states and two 3-cycles. This module consists of two feedback loops joined by the node TGFb1. It is thus enough to control TGFb1 to stabilize this module into any of its attractors [35]. Using the methods from Zanudo & Albert [36] or Murrugarra et al. [30], the controls of module 2 can be identified. A minimal set of two nodes needs to be controlled to stabilize this module: RAS in the pancreatic cell and RAS in the stellate cell. After applying these controls, the nodes in the downstream module (module 3) are all already constant and do, therefore, not require additional controls. Using the modular structure of the network, three nodes can be easily identified, which suffice to control the entire network. Notably, this never requires the consideration of the entire network, which saves computation time. Disregarding the decomposition and identifying controls for the whole network instead yields the same minimal set of three controls. However, this may not always be the case. In rare cases, the module-by-module control identification strategy will yield a set of controls that is larger than necessary.
3. Discussion
The search for ‘fundamental laws’ has been part of systems biology since its beginning, including features of biological systems that are characteristic of most or all systems of a given type, such as gene regulatory networks. The concept of modularity can be considered as such a feature, and has been studied extensively in several different contexts. Another focus of interest has been the relationship between the structure and function of dynamic networks. The results in this paper in essence provide evidence that modularity is in fact a key feature that connects structure and function of networks.
Systems biology has been a field that is making extensive use of mathematical models as descriptive language and analytic tool. Notions such as dynamic modularity are difficult or impossible to study without the use of mathematical models, as is the relationship between structure and function of networks. A limitation of this approach is of course that published models are partial and simplified representations of the requisite biology, so that caution is required when drawing conclusions. But this approach has yielded useful results in studying motifs in static networks (e.g. [24]). The advantage of a mathematical foundation is that it enables an analytical treatment of concepts that might otherwise have to be studied using heuristics, examples and simulations. This is the essence of our approach in this study. Based on rigorous definitions, we were able to prove the link between structural and functional modularity, as well as the broad application to control of networks. We believe that we have only scratched the surface of results that follow from the mathematical framework we have established. For instance, the flip side of network decomposition is network construction through ‘concatenation’ of modules. This can be done in ways that achieve certain dynamic properties, of potential interest to problems in synthetic biology.
Finally, while we have provided evidence that our concept of structural and functional modularity might have biological relevance, more work remains to be done. For instance, it would be of interest to investigate the biological features of the individual modules found in the repository of Boolean network models from [16] to investigate whether modules in our definition can be viewed as meaningful biological ‘functional units’. The implications of a functional modular structure also remain to be explored beyond our initial result of control at the modular level. We also believe that many of our results should hold in appropriate form for the modelling framework of ordinary differential equations.
It is also worth observing that our decomposition results do not preclude the existence of emergent properties. Each module is a complex system in itself, capable of exhibiting emergent properties. And as modules perturb other downstream modules, their emergent properties propagate to other modules. Our results simply assert a certain relationship between certain subsystems of the whole system. These subsystems cannot be considered the parts that make up the whole.
4. Methods
4.1. Meta-analysis of published gene regulatory network models
We used the same repository of 122 published and distinct gene regulatory network models as in [16]. Some of these models include non-essential regulators. That is, a node is included as a regulator in an update rule but a change in this node never affects the update rule. We removed all non-essential regulators from the update rules, before computing for each network the number of genes (i.e. size), the average connectivity, all SCCs, as well as the size of each SCC. From this, we derived the primary metric of interest, the number of non-trivial SCCs. Trivial SCCs consist of one node only. Since SCCs are defined as the largest connected component such that there is a path from every node to every other node, it is irrelevant whether the single node in a trivial SCC regulates itself.
The logistic multi-variable regression model, implemented in the Python module statsmodels.api, is given by
$$\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2, \tag{4.1}$$
where p is the probability of a model having multiple non-trivial SCCs, and x1, x2 are average connectivity and network size.
4.2. Generation of Boolean networks for simulation study
To understand the effect of modularity on the phenotypical robustness and the dynamical complexity, we resorted to simulation studies of Boolean networks with a specific structure and a defined number of SCCs (i.e. modules). To reduce the number of potential confounders, we fixed the network size at N = 60 and the in-degree of each node at n = 3, which is slightly higher than the average in-degree in published gene regulatory network models [16]. We further considered only nested canalizing update rules (e.g. [37,38] for a definition) since most rules in published gene regulatory networks are of this type [16]. To generate networks with a defined number of m ∈ {1, 2, …, 6} modules, each of which consists of N/m nodes, we first generated a random directed acyclic graph of m modules by picking uniformly at random a weakly connected lower triangular binary m × m-matrix D with diagonal entries 1. If Dij = 1, a node in module i regulates a node in module j. Otherwise, there is no connection. To ensure that the number of SCCs was indeed m, we required each module to be a single SCC. We achieved this by randomly generating wiring diagrams for a module until the wiring diagram was strongly connected (for the sparsest modules (i.e. m = 1, N/m = 60), this took on average approximately 22 iterations).
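The generation of a single module can be sketched in Python as follows. This is an illustration under several assumptions, not the code used for the study: regulators are drawn uniformly without self-regulation, nested canalizing rules are sampled from the standard layered definition (a random variable order with random canalizing inputs and canalized outputs), and all function names are hypothetical. The coupling of several modules through the matrix D is not covered here.

```python
import random

def random_ncf(k, rng):
    """Random nested canalizing function on k inputs (callable on a k-tuple)."""
    order = rng.sample(range(k), k)              # order in which inputs are checked
    a = [rng.randint(0, 1) for _ in range(k)]    # canalizing input values
    b = [rng.randint(0, 1) for _ in range(k)]    # canalized output values
    def f(inputs):
        for j, i in enumerate(order):
            if inputs[i] == a[j]:
                return b[j]
        return 1 - b[-1]                         # no input took its canalizing value
    return f

def is_strongly_connected(regulators):
    """regulators[v] lists the nodes that regulate node v."""
    n = len(regulators)
    targets = [[] for _ in range(n)]
    for v, regs in enumerate(regulators):
        for u in regs:
            targets[u].append(v)
    def reaches_all(adjacency):
        seen, todo = {0}, [0]
        while todo:
            v = todo.pop()
            for w in adjacency[v]:
                if w not in seen:
                    seen.add(w)
                    todo.append(w)
        return len(seen) == n
    # Node 0 must reach every node, and every node must reach node 0.
    return reaches_all(targets) and reaches_all(regulators)

def random_module(size, in_degree=3, seed=0):
    """Redraw wiring diagrams until one is strongly connected, then add rules."""
    rng = random.Random(seed)
    attempts = 0
    while True:
        attempts += 1
        regulators = [
            rng.sample([u for u in range(size) if u != v], in_degree)
            for v in range(size)
        ]
        if is_strongly_connected(regulators):
            break
    rules = [random_ncf(in_degree, rng) for _ in range(size)]
    return regulators, rules, attempts

def step(state, regulators, rules):
    """One synchronous update of the module."""
    return tuple(rule(tuple(state[u] for u in regs))
                 for regs, rule in zip(regulators, rules))

regulators, rules, attempts = random_module(12, seed=0)
print(attempts, step((0,) * 12, regulators, rules))
```

Rejection sampling of the wiring diagram mirrors the procedure described in the text; the number of attempts needed depends on module size and in-degree.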
4.3. Estimating dynamical complexity and phenotypical robustness
The size of the state space of the 60-node Boolean networks used in the computational study prohibits the exhaustive identification of all attractors. To compute all attractors, we could have exploited the decomposition into smaller modules for decomposable networks. However, this does not help with the identification of attractors for non-decomposable networks consisting of a single module of size 60. To avoid introducing any bias by using different methods, we employed the same sampling technique to estimate a lower bound of the number of attractors for each Boolean network. Specifically, for each network $F$, we generated 500 random initial states $x_0 \in \{0,1\}^{60}$ and continued to synchronously update each $x_0$ according to $F$ (that is, $x_{i+1} = F(x_i)$) until a recurring state was found, indicating the arrival at an attractor.
Biologically meaningful attractors ‘attract’ a substantial portion of the state space. With a state space of size $2^{60}$ and 500 random initial states, an attractor that attracts a fraction $q$ of the state space is found with probability $1 - (1-q)^{500}$; for example, there is a 99% chance of finding an attractor that attracts 0.9% of the state space. Relying on sampling and the resulting lower bound of the number of attractors should, therefore, not limit the validity of our findings.
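The sampling argument can be checked with a few lines of arithmetic. The sketch below assumes that the 500 initial states are independent and uniformly distributed, so that each one falls into a basin covering a fraction q of the state space with probability q; the function names are hypothetical.

```python
def detection_probability(basin_fraction, num_samples=500):
    """Chance that at least one of the samples lands in the basin."""
    return 1.0 - (1.0 - basin_fraction) ** num_samples

def min_basin_fraction(target_probability, num_samples=500):
    """Smallest basin fraction that is detected with the target probability."""
    return 1.0 - (1.0 - target_probability) ** (1.0 / num_samples)

p = detection_probability(0.009)   # basin covering 0.9% of the state space
q = min_basin_fraction(0.99)       # basin size needed for a 99% detection chance
```

Under these assumptions `p` is close to 0.99 and `q` is close to 0.009, consistent with the figures quoted in the text.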
To estimate the phenotypical robustness, we considered the same 500 random initial states $x_0 \in \{0,1\}^{60}$ and generated for each $x_0$ a corresponding state $y_0 = x_0 \oplus e_i$ by randomly flipping one bit $i \in \{1, \ldots, N\}$ (where $e_i$ is the $i$th unit vector and $\oplus$ denotes binary addition). Just as for $x_0$, we synchronously updated $y_0$ according to $F$ until it reached an attractor and compared the two attractors. The estimate is the proportion of the 500 pairs that reach the same attractor; consequently, all estimated phenotypical robustness values are multiples of 1/500.
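The two estimates can be sketched together as follows. This is an illustration under assumptions, not the authors' implementation: a network is represented as a Python function mapping a state tuple to its synchronous successor, attractors are identified by their set of states, and the names `attractor_from`, `estimate` and `toy_network` are hypothetical.

```python
import random

def attractor_from(F, state):
    """Iterate F synchronously until a state recurs; return the cycle as a frozenset."""
    seen, trajectory = {}, []
    while state not in seen:
        seen[state] = len(trajectory)
        trajectory.append(state)
        state = F(state)
    # Everything from the first visit of the recurring state onwards is the attractor.
    return frozenset(trajectory[seen[state]:])

def estimate(F, num_nodes, num_samples=500, seed=None):
    """Return (lower bound on the number of attractors, robustness estimate)."""
    rng = random.Random(seed)
    attractors, same = set(), 0
    for _ in range(num_samples):
        x0 = tuple(rng.randint(0, 1) for _ in range(num_nodes))
        i = rng.randrange(num_nodes)                      # bit to flip
        y0 = x0[:i] + (1 - x0[i],) + x0[i + 1:]
        ax, ay = attractor_from(F, x0), attractor_from(F, y0)
        attractors.add(ax)
        same += (ax == ay)                                # perturbation absorbed?
    return len(attractors), same / num_samples

def toy_network(s):
    """Small 4-node example network, used here only to exercise the functions."""
    x1, x2, y1, y2 = s
    return (x2, x1, x2 * y2, y1)

num_attractors, robustness = estimate(toy_network, 4, seed=0)
```

Storing whole trajectories is adequate for a sketch; for 60-node networks with long transients a cycle-detection method with constant memory may be preferable.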
Acknowledgements
The authors thank Elena Dimitrova for participating in initial fruitful discussions.
Appendix A. Mathematical details and supplementary figures
A.1. Proofs of the structural decomposition theorems
This section contains the proofs of the structural decomposition theorems described in the main text. First, we define in full detail the semi-direct product, used to combine two networks in a hierarchical fashion.
Definition A.1. —
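Consider two Boolean networks,
$$F = (f_1, \ldots, f_k) : \{0,1\}^k \to \{0,1\}^k$$
with variables $x = (x_1, \ldots, x_k)$ and
$$G = (g_1, \ldots, g_m) : \{0,1\}^{\ell+m} \to \{0,1\}^m$$
with external inputs $u = (u_1, \ldots, u_\ell)$ and variables $y = (y_1, \ldots, y_m)$. Let $\Lambda \subseteq \{1, \ldots, k\}$ such that $\Lambda = \{i_1, \ldots, i_\ell\}$ and define $x_\Lambda = (x_{i_1}, \ldots, x_{i_\ell})$. Then,
$$H = (f_1, \ldots, f_k, g_1, \ldots, g_m) : \{0,1\}^{k+m} \to \{0,1\}^{k+m}$$
defines a combined Boolean network by setting
$$H(x, y) = (F(x), G(x_\Lambda, y)). \tag{A 1}$$
That is, the variables $x_\Lambda$ act as the external inputs of $G$. The corresponding coupling scheme is defined to be
$$P = \{x_{i_1} \to u_1, \ldots, x_{i_\ell} \to u_\ell\}. \tag{A 2}$$
We denote $H$ as $F \rtimes_P G$ and refer to this as the coupling of $F$ and $G$ by (the coupling scheme) $P$ or as the semi-direct product of $F$ and $G$ via $P$.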
Theorem A.2. —
If a Boolean network $F$ is not a module, then there exist $F_1, F_2, P$ such that $F = F_1 \rtimes_P F_2$. Furthermore, we can find a decomposition such that $F_1$ is a module.
Proof. —
Let $F = (f_1, \ldots, f_n)$ be a Boolean network with variables $X = \{x_1, \ldots, x_n\}$ and assume $F$ is not a module. Then the wiring diagram of $F$ is not strongly connected, implying there exists at least one node $y$ and one node $x_j \neq y$ such that there exists no path from $x_j$ to $y$ in the wiring diagram of $F$. Let $X_2$ denote the set of all such nodes, i.e. the nodes from which there exists no path to $y$. Further, let $X_1 = X \setminus X_2$ denote the complement of $X_2$. Note that for every $x_i \in X_1$, there exists a path from $x_i$ to $y$ but no path from any node in $X_2$ to $x_i$.
Define $\Lambda \subseteq \{i : x_i \in X_1\}$ to be the subset of indices such that for each $i \in \Lambda$ there exists at least one function $f_j$ with $x_j \in X_2$ which depends on $x_i$.
If $\Lambda = \emptyset$, then the sets $X_1$ and $X_2$ represent two groups of nodes, which are disconnected in the wiring diagram. Hence the network $F$ is a Cartesian product of $F_1$ and $F_2$. It follows that $F = F_1 \rtimes_P F_2$ with $P = \emptyset$.
If $\Lambda \neq \emptyset$, then for any $x_i \in X_1$, the corresponding update function $f_i$ does not depend on $X_2$ by construction, as there are no paths from $X_2$ to $x_i$, and we set $F_1$ to be the restriction of $F$ to $X_1$, $F_1 = F|_{X_1}$. For any $x_i \in X_2$, if the corresponding update function depends on a node $x_j \in X_1$, then $j \in \Lambda$ by the definition of $\Lambda$. It follows by construction that any such function $f_i$ can be written as a Boolean function on $X_2$ with external inputs from $x_\Lambda$; these functions together form $F_2$.
Hence, $F = F_1 \rtimes_P F_2$. Note that in the above proof we can choose the node $y$ such that it belongs to an SCC that receives no edge from any other SCC. $X_1$ will contain the nodes of this SCC and hence $F_1$ will be a module. ▪
The main structural decomposition theorem follows directly from this:
Theorem A.3. —
For any Boolean network $F$, there exist unique modules $F_1, \ldots, F_m$ such that
$$F = F_1 \rtimes_{P_1} (F_2 \rtimes_{P_2} (\cdots \rtimes_{P_{m-1}} F_m)), \tag{A 3}$$
where this representation is unique up to a reordering, which respects the partial order of $Q$ (equation (2.1)), and the collection of coupling schemes $P_1, \ldots, P_{m-1}$ depends on the particular choice of ordering.
Proof. —
If F is a module, then m = 1 and the result follows.
If F is not a module, we use induction on the downstream subnetwork F2 in theorem A.2 to obtain the result. ▪
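Consider as an example a network $F$ with four SCCs, $F_1, F_2, F_3, F_4$, where $F_1$ influences both $F_2$ and $F_3$, $F_2$ and $F_3$ both influence $F_4$, but $F_2$ and $F_3$ have no influence on each other. The network can first be broken up as $F = F_1 \rtimes_{P_1} G$, where $G$ represents the downstream network of $F_2, F_3, F_4$ and $P_1$ includes the connections from $F_1 \to F_2$ and $F_1 \to F_3$. In turn, $G$ can be decomposed as $G = F_2 \rtimes_{P_2} H$, where $H$ is the network consisting of $F_3$ and $F_4$, and $P_2$ denotes the connections from $F_2 \to F_4$. Finally, $H$ can be decomposed as $H = F_3 \rtimes_{P_3} F_4$, where $P_3$ represents the connections from $F_3 \to F_4$. The final decomposition can thus be written as
$$F = F_1 \rtimes_{P_1} (F_2 \rtimes_{P_2} (F_3 \rtimes_{P_3} F_4)).$$
Alternatively, we could have realized the decomposition of $G$ as $G = F_3 \rtimes_{P_2'} H'$, where $H' = F_2 \rtimes_{P_3'} F_4$ and the coupling schemes $P_2'$ and $P_3'$ contain the connections from $F_3 \to F_4$ and $F_2 \to F_4$, respectively. The final decomposition then takes the form
$$F = F_1 \rtimes_{P_1} (F_3 \rtimes_{P_2'} (F_2 \rtimes_{P_3'} F_4)).$$
The ambiguity of choice for decomposing $G$ arises from the ambiguity of choosing a total order for the partially ordered set $Q = \{W_1 \to W_2, W_1 \to W_3, W_2 \to W_4, W_3 \to W_4\}$. Both decompositions are equally valid, and the ordering of the modules in each representation respects the partial order $Q$.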
A.2. Non-autonomous Boolean networks
This section contains the full definition of non-autonomous Boolean networks, as well as two examples.
Definition A.4. —
A non-autonomous Boolean network is defined by
$$y(t+1) = H(g(t), y(t)), \tag{A 4}$$
where $H : \{0,1\}^{k+m} \to \{0,1\}^m$ and $(g(t))_{t=0}^{\infty}$ is a sequence with elements in $\{0,1\}^k$. The network, denoted $H^g$, is non-autonomous because its dynamics depend on $g(t)$.
A state $c \in \{0,1\}^m$ is a steady state of $H^g$ if $H(g(t), c) = c$ for all $t$. Similarly, an ordered set with $r$ elements, $C = (c_1, \ldots, c_r)$, is an attractor of length $r$ of $H^g$ if $c_2 = H(g(1), c_1)$, $c_3 = H(g(2), c_2)$, …, $c_r = H(g(r-1), c_{r-1})$, $c_1 = H(g(r), c_r)$, $c_2 = H(g(r+1), c_1)$, …. In general, $g(t)$ is not necessarily of period $r$ and may not even be periodic.
If H(g(t), y) = G(y) for some network G for all t (that is, it does not depend on g(t)), then y(t + 1) = H(g(t), y(t)) = G(y(t)) and this definition of attractors coincides with the classical definition of attractors for (autonomous) Boolean networks.
Example A.5. —
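Consider the non-autonomous network defined by
$$H(u_1, u_2, y_1, y_2) = (u_2 y_2, y_1)$$
and the two-periodic sequence $(g(t))_{t=0}^{\infty} = (01, 10, 01, 10, \ldots)$, which corresponds to a 2-cycle of the upstream 2-node network. If the initial point is $y(0) = (y_1^*, y_2^*)$, then the dynamics of $H^g$ can be computed as follows:
$$\begin{aligned} y(1) &= H(g(0), y(0)) = (y_2^*, y_1^*), \\ y(2) &= H(g(1), y(1)) = (0, y_2^*), \\ y(3) &= H(g(2), y(2)) = (y_2^*, 0), \\ y(4) &= H(g(3), y(3)) = (0, y_2^*). \end{aligned}$$
Thus for $t \geq 1$, $y(2t) = (0, y_2^*)$ and $y(2t+1) = (y_2^*, 0)$. It follows that the attractors of $H^g$ are given by 00 (one steady state) and (01, 10) (one cycle of length 2). Note that (10, 01) is not an attractor because (10, 01, 10, 01, …) is not a trajectory for this non-autonomous network. This subtlety is easily missed when one does not consider all trajectories that a limit cycle represents.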
Example A.6. —
Consider the non-autonomous network defined by $H(u_1, u_2, y_1, y_2) = (u_2 y_2, y_1)$, as in the previous example, and the one-periodic sequence $(g(t))_{t=0}^{\infty} = (00, 00, \ldots)$, which corresponds to a steady state of the upstream 2-node network. If the initial point is $y(0) = (y_1^*, y_2^*)$, then the dynamics of $H^g$ can be computed as follows:
$$y(1) = H(g(0), y(0)) = (0, y_1^*)$$
and
$$y(2) = H(g(1), y(1)) = (0, 0).$$
Then, $y(t) = (0, 0)$ for $t \geq 2$, and the only attractor of $H^g$ is the steady state 00.
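The two examples can be replayed numerically. The sketch below simulates $y(t+1) = H(g(t), y(t))$ for a periodic input; the input cycle (01, 10) used for example A.5 is an assumption taken from the sequence $g(0) = (0, 1)$, $g(1) = (1, 0)$ that appears in example A.8, the input 00 for example A.6 is inferred from the stated result, and the helper name `trajectory` is hypothetical.

```python
from itertools import product

def H(u, y):
    """H(u1, u2, y1, y2) = (u2 * y2, y1), as in examples A.5 and A.6."""
    (_, u2), (y1, y2) = u, y
    return (u2 * y2, y1)

def trajectory(H, g_cycle, y0, steps):
    """Return [y(0), ..., y(steps)] with y(t+1) = H(g(t), y(t)) and g periodic."""
    ys = [y0]
    for t in range(steps):
        ys.append(H(g_cycle[t % len(g_cycle)], ys[-1]))
    return ys

g_two_cycle = ((0, 1), (1, 0))   # assumed input sequence for example A.5
g_steady = ((0, 0),)             # assumed input sequence for example A.6

for y0 in product((0, 1), repeat=2):
    print(y0, trajectory(H, g_two_cycle, y0, 5), trajectory(H, g_steady, y0, 3))
```

Under the first input, every trajectory ends either in the steady state 00 or in the cycle (01, 10), aligned so that 01 occurs at even times; under the second, every trajectory reaches 00 after at most two steps.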
A.3. Proof of the dynamic decomposition theorem
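For a decomposable network $F = F_1 \rtimes_P F_2$, we introduce the following notation for attractors. First, note that $F$ has the form $F(x, y) = (F_1(x), F_2(x, y))$ where $F_2$ is a non-autonomous network. Let $C_1 = (r_1, \ldots, r_m) \in \mathcal{D}(F_1)$ and $C_2 = (s_1, \ldots, s_n) \in \mathcal{D}(F_2^{C_1})$ be attractors of length $m$ and $n$, respectively. Then, the sequence $((r_t, s_t))_{t \geq 1}$, with the indices of $r$ and $s$ read modulo $m$ and $n$, has period $l = \mathrm{lcm}(m, n)$, so we define the sum (or concatenation) of these attractors to be
$$C_1 \oplus C_2 = ((r_1, s_1), (r_2, s_2), \ldots, (r_l, s_l)). \tag{A 5}$$
Note that the sum of attractors is not a Cartesian product, $C_1 \oplus C_2 \neq C_1 \times C_2$.
Similarly, for an attractor $C_1$ and a collection of attractors $A$ we define
$$C_1 \oplus A = \{C_1 \oplus C_2 : C_2 \in A\}. \tag{A 6}$$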
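The sum of attractors can be written as a small helper. This sketch assumes that an attractor is stored as a tuple of state tuples (a steady state being a tuple of length one) and repeats both attractors until they realign after lcm(m, n) steps; the function names are hypothetical.

```python
from math import gcd

def attractor_sum(c1, c2):
    """Pair up two attractors state by state over one common period lcm(m, n)."""
    m, n = len(c1), len(c2)
    period = m * n // gcd(m, n)
    # Tuple concatenation joins the upstream state with the downstream state.
    return tuple(c1[t % m] + c2[t % n] for t in range(period))

def attractor_sum_with_collection(c1, collection):
    """Sum of one upstream attractor with every attractor in a collection."""
    return [attractor_sum(c1, c2) for c2 in collection]

steady = ((1, 1),)                    # upstream steady state 11
two_cycle = ((0, 1), (1, 0))          # upstream or downstream 2-cycle (01, 10)
combined = attractor_sum(steady, two_cycle)
```

Here `combined` is the 2-cycle (1101, 1110): the result has two states rather than the 1 × 2 pairs of a Cartesian product only by coincidence of sizes, and for two 2-cycles the sum has length 2 whereas the Cartesian product has four elements.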
Our second main theoretical result shows that the dynamics (i.e. the attractor space) of a semi-direct product can be seen as a type of semi-direct product of the dynamics of the decomposable subnetworks. When applied iteratively, this enables a computation of the attractor space from the attractor space of each module.
Theorem A.7. —
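Let $F = F_1 \rtimes_P F_2$ be a decomposable network. Then
$$\mathcal{D}(F) = \bigsqcup_{C_1 \in \mathcal{D}(F_1)} C_1 \oplus \mathcal{D}(F_2^{C_1}). \tag{A 7}$$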
Proof. —
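Let $X_1$ and $X_2$ be the variables of $F_1$ and $F_2$, respectively. Further, let $C = (c_1, \ldots, c_l) \in \mathcal{D}(F)$ be an arbitrary attractor of $F$ with length $l$. We can define $\mathrm{pr}_1(C) = (\mathrm{pr}_1(c_1), \ldots, \mathrm{pr}_1(c_l))$ as the projection of $C$ onto $X_1$, and similarly $\mathrm{pr}_2(C)$ as the projection of $C$ onto $X_2$. By definition, $F_1$ does not depend on $X_2$. Thus, $F_1(\mathrm{pr}_1(x)) = \mathrm{pr}_1(F(x))$, and for any $c_j \in C$,
$$F_1(\mathrm{pr}_1(c_j)) = \mathrm{pr}_1(F(c_j)) = \mathrm{pr}_1(c_{j+1}).$$
Iterating this, we find that in general $F_1^k(\mathrm{pr}_1(c_j)) = \mathrm{pr}_1(c_{j+k})$ (indices modulo $l$), from which it follows that $\mathrm{pr}_1(C)$ traverses an attractor $C_1 \in \mathcal{D}(F_1)$.
Next, we consider the non-autonomous network $F_2^{C_1}$ defined as in definition A.4 where $y(t+1) = \mathrm{pr}_2 F(g(t), y(t))$, and $g(t) = \mathrm{pr}_1(c_t)$. If $y(1) = \mathrm{pr}_2(c_1)$, then
$$y(2) = \mathrm{pr}_2 F(g(1), y(1)) = \mathrm{pr}_2 F(c_1) = \mathrm{pr}_2(c_2)$$
and in general
$$y(t+1) = \mathrm{pr}_2 F(c_t) = \mathrm{pr}_2(c_{t+1}).$$
Hence $y(l+1) = \mathrm{pr}_2 F(c_l) = \mathrm{pr}_2 c_1 = y(1)$ and thus $C_2 := \mathrm{pr}_2(C) \in \mathcal{D}(F_2^{C_1})$. From this we have that $C = C_1 \oplus C_2$ and thus
$$\mathcal{D}(F) \subseteq \bigsqcup_{C_1 \in \mathcal{D}(F_1)} C_1 \oplus \mathcal{D}(F_2^{C_1}).$$
Conversely, let $C_1 = (r_1, \ldots, r_m) \in \mathcal{D}(F_1)$ and $C_2 = (s_1, \ldots, s_n) \in \mathcal{D}(F_2^{C_1})$. We want to show that $C_1 \oplus C_2 \in \mathcal{D}(F)$. Let $g(t) = r_t$, and $y(t+1) = \mathrm{pr}_2 F(g(t), y(t))$. Since $C_2 \in \mathcal{D}(F_2^{C_1})$ then $s_{t+1} = \mathrm{pr}_2 F(r_t, s_t)$ by definition. Let $c_t = (r_t, s_t)$. Then
$$F(c_t) = (F_1(r_t), \mathrm{pr}_2 F(r_t, s_t)) = (r_{t+1}, s_{t+1}) = c_{t+1}.$$
Thus $F(c_l) = c_1$ for $l = \mathrm{lcm}(m, n)$ and hence $C_1 \oplus C_2 = (c_1, \ldots, c_l) \in \mathcal{D}(F)$. It follows that
$$\bigsqcup_{C_1 \in \mathcal{D}(F_1)} C_1 \oplus \mathcal{D}(F_2^{C_1}) \subseteq \mathcal{D}(F),$$
from which we can conclude that the sets are equal. ▪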
The following two examples highlight how theorem A.7 enables the computation of the dynamics of a decomposable network from the dynamics of its modules. To match attractors from the upstream module with the attractor spaces of the corresponding non-autonomous downstream networks, it is useful to consider the space of attractors in a specified order: we use parentheses (curly brackets) to denote an ordered (unordered) space of attractors. If there is no ambiguity, in practice we can use instead of .
Example A.8. —
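Consider the Boolean network $F(x_1, x_2, y_1, y_2) = (x_2, x_1, x_2 y_2, y_1)$. We can decompose $F = F_1 \rtimes_P F_2$ where $F_1(x_1, x_2) = (x_2, x_1)$ is an upstream module and $F_2(u_2, y_1, y_2) = (u_2 y_2, y_1)$ is a downstream module with external parameter $x_2$. To find all attractors of $F$ by using theorem A.7, we find the attractors of $F_1$ and the attractors of $F_2$ induced by each of those attractors. It is easy to see that $\mathcal{D}(F_1) = (00, 11, (01, 10))$ (where we denote steady states simply by $c$).
- For $C_1 = 00$ the corresponding non-autonomous network is $y(t+1) = F_2(0, 0, y(t))$. If $y(0) = (y_1^*, y_2^*)$, then $y(1) = (0, y_1^*)$ and $y(2) = (0, 0)$. Thus, the space of attractors for $F_2^{00}$ is $\mathcal{D}(F_2^{00}) = \{00\}$.
- For $C_1 = 11$ the corresponding non-autonomous network is $y(t+1) = F_2(1, 1, y(t))$. If $y(0) = (y_1^*, y_2^*)$, then $y(1) = (y_2^*, y_1^*)$ and $y(2) = (y_1^*, y_2^*)$. Thus, the corresponding space of attractors is $\mathcal{D}(F_2^{11}) = \{00, 11, (01, 10)\}$.
- For $C_1 = (01, 10)$ we define $g$ by $g(0) = (0, 1)$, $g(1) = (1, 0)$, and $g(t+2) = g(t)$. $F_2^{(01,10)}$ is given by $y(t+1) = F_2(g(t), y(t))$. If $y(0) = (y_1^*, y_2^*)$, then $y(1) = (y_2^*, y_1^*)$, $y(2) = (0, y_2^*)$, $y(3) = (y_2^*, 0)$ and $y(4) = (0, y_2^*)$. Then, the corresponding space of attractors is $\mathcal{D}(F_2^{(01,10)}) = \{00, (01, 10)\}$.
To reconstruct the entire space of attractors for $F$, we have
$$\begin{aligned} \mathcal{D}(F) &= 00 \oplus \mathcal{D}(F_2^{00}) \sqcup 11 \oplus \mathcal{D}(F_2^{11}) \sqcup (01, 10) \oplus \mathcal{D}(F_2^{(01,10)}) \\ &= \{0000, 1100, 1111, (1101, 1110), (0100, 1000), (0101, 1010)\}, \end{aligned}$$
which agrees with the space of attractors shown in figure 5b.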
Example A.9. —
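Consider the linear Boolean network
$$F(x_1, x_2, y_1, y_2) = (x_2, x_1, x_2 + y_2, y_1).$$
We can decompose $F = F_1 \rtimes_P F_2$ into modules $F_1(x_1, x_2) = (x_2, x_1)$ and $F_2(u_2, y_1, y_2) = (u_2 + y_2, y_1)$. The space of attractors of the upstream module $F_1$ is
$$\mathcal{D}(F_1) = (00, 11, (01, 10)).$$
Using the dynamic decomposition theorem (theorem A.7), we can identify all attractors of $F$ as follows (see figure 7 for a graphical description).
- For $C_1 = 00$ the corresponding non-autonomous network is $y(t+1) = F_2(0, 0, y(t)) = (y_2(t), y_1(t))$. If $y(0) = (y_1^*, y_2^*)$ then $y(1) = (y_2^*, y_1^*)$ and $y(2) = y(0)$. Thus, the space of attractors for $F_2^{00}$ is $\mathcal{D}(F_2^{00}) = \{00, 11, (01, 10)\}$.
- Similarly, for $C_1 = 11$ we find that the space of attractors for $F_2^{11}$ is $\mathcal{D}(F_2^{11}) = \{(00, 10, 11, 01)\}$.
- For $C_1 = (01, 10)$ we define $g$ by $g(0) = (0, 1)$, $g(1) = (1, 0)$, and $g(t+2) = g(t)$. $F_2^{(01,10)}$ is given by $y(t+1) = F_2(g(t), y(t))$. If $y(0) = (y_1^*, y_2^*)$, then $y(1) = (1 + y_2^*, y_1^*)$, $y(2) = (y_1^*, 1 + y_2^*)$, $y(3) = (y_2^*, y_1^*)$ and $y(4) = (y_1^*, y_2^*)$, and in general for $t > 0$, $y(t+4) = y(t)$.
It follows that there are only two periodic trajectories in this case: (00, 10, 01, 00, 00, 10, 01, 00, …) and (11, 01, 10, 11, 11, 01, 10, 11, …), which both have period 4. The corresponding attractor space is
$$\mathcal{D}(F_2^{(01,10)}) = \{(00, 10, 01, 00), (11, 01, 10, 11)\}.$$
Note that the repetition of certain states is needed to obtain the correct attractors of the full network $F$.
To reconstruct the space of all attractors for $F$, we have
$$\begin{aligned} \mathcal{D}(F) &= 00 \oplus \mathcal{D}(F_2^{00}) \sqcup 11 \oplus \mathcal{D}(F_2^{11}) \sqcup (01, 10) \oplus \mathcal{D}(F_2^{(01,10)}) \\ &= \{0000, 0011, (0001, 0010), (1100, 1110, 1111, 1101), (0100, 1010, 0101, 1000), (0111, 1001, 0110, 1011)\}. \end{aligned}$$
The linear network $F$ thus possesses two steady states, one 2-cycle and three 4-cycles.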
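Because both example networks have only 16 states, the attractors obtained through the decomposition can be cross-checked by exhaustive enumeration. The sketch below is such a check and not part of the published code; attractors are identified by their set of states, and the function names are hypothetical.

```python
from itertools import product

def attractors(F, num_nodes):
    """Enumerate all attractors of a synchronous Boolean network by brute force."""
    found = set()
    for state in product((0, 1), repeat=num_nodes):
        seen, path = {}, []
        while state not in seen:
            seen[state] = len(path)
            path.append(state)
            state = F(state)
        # The part of the path from the recurring state onwards is the attractor.
        found.add(frozenset(path[seen[state]:]))
    return found

def F_and(s):
    """Network of example A.8: F = (x2, x1, x2*y2, y1)."""
    x1, x2, y1, y2 = s
    return (x2, x1, x2 * y2, y1)

def F_xor(s):
    """Linear network of example A.9: F = (x2, x1, x2 + y2, y1) over GF(2)."""
    x1, x2, y1, y2 = s
    return (x2, x1, (x2 + y2) % 2, y1)

def length_profile(atts):
    """Sorted list of attractor lengths."""
    return sorted(len(a) for a in atts)

print(length_profile(attractors(F_and, 4)))
print(length_profile(attractors(F_xor, 4)))
```

For the linear network the length profile should show two attractors of length 1, one of length 2 and three of length 4, matching the count stated above.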
Ethics
This work did not require ethical approval from a human subject or animal welfare committee.
Data accessibility
The source code for the statistical analysis of the expert-curated biological networks and the generation of their directed acyclic graph structure is available from the Github repository: https://github.com/ckadelka/DesignPrinciplesGeneNetworks [39]. This repository also contains the rules of all the biological networks.
Declaration of AI use
We have not used AI-assisted technologies in creating this article.
Authors' contributions
C.K.: conceptualization, formal analysis, methodology, software, visualization, writing—original draft, writing—review and editing; M.W.: conceptualization, formal analysis, methodology, writing—original draft, writing—review and editing; A.V.-C.: conceptualization, formal analysis, methodology, visualization, writing—original draft, writing—review and editing; D.M.: conceptualization, formal analysis, methodology, visualization, writing—original draft, writing—review and editing; R.L.: conceptualization, formal analysis, methodology, writing—original draft, writing—review and editing.
All authors gave final approval for publication and agreed to be held accountable for the work performed therein.
Conflict of interest declaration
We declare we have no competing interests.
Funding
This work was supported by the Simons Foundation (grant nos. 712537 (to C.K.), 850896 (to D.M.), 516088 (to A.V.)); the National Institutes of Health (grant no. 1 R01 HL169974-01 (to R.L.)) and the Defense Advanced Research Projects Agency (grant no. HR00112220038 (to R.L.)).
References
- 1. Hartwell LH, Hopfield JJ, Leibler S, Murray AW. 1999. From modular to molecular cell biology. Nature 402, C47-C52. (doi:10.1038/35011540)
- 2. Hernández U, Posadas-Vidales L, Espinosa-Soto C. 2022. On the effects of the modularity of gene regulatory networks on phenotypic variability and its association with robustness. Biosystems 212, 104586. (doi:10.1016/j.biosystems.2021.104586)
- 3. Lorenz DM, Jeng A, Deem MW. 2011. The emergence of modularity in biological systems. Phys. Life Rev. 8, 129-160. (doi:10.1016/j.plrev.2011.02.003)
- 4. Leicht EA, Newman ME. 2008. Community structure in directed networks. Phys. Rev. Lett. 100, 118703. (doi:10.1103/PhysRevLett.100.118703)
- 5. Malliaros FD, Vazirgiannis M. 2013. Clustering and community detection in directed networks: a survey. Phys. Rep. 533, 95-142. (doi:10.1016/j.physrep.2013.08.002)
- 6. Zhang B, Horvath S. 2005. A general framework for weighted gene co-expression network analysis. Stat. Appl. Genet. Mol. Biol. 4, 17. (doi:10.2202/1544-6115.1128)
- 7. Alexander RP, Kim PM, Emonet T, Gerstein MB. 2009. Understanding modularity in molecular networks requires dynamics. Sci. Signal. 2, pe44. (doi:10.1126/scisignal.281pe44)
- 8. Jimenez A, Cotterell J, Munteanu A, Sharpe J. 2017. A spectrum of modularity in multi-functional gene circuits. Mol. Syst. Biol. 13, 925. (doi:10.15252/msb.20167347)
- 9. Verd B, Monk NA, Jaeger J. 2019. Modularity, criticality, and evolvability of a developmental gene regulatory network. Elife 8, e42832. (doi:10.7554/eLife.42832)
- 10. Deritei D, Aird WC, Ercsey-Ravasz M, Regan ER. 2016. Principles of dynamical modularity in biological regulatory networks. Sci. Rep. 6, 21957. (doi:10.1038/srep21957)
- 11. Wagner GP, Pavlicev M, Cheverud JM. 2007. The road to modularity. Nat. Rev. Genet. 8, 921-931. (doi:10.1038/nrg2267)
- 12. Gilarranz LJ, Rayfield B, Liñán-Cembrano G, Bascompte J, Gonzalez A. 2017. Effects of network modularity on the spread of perturbation impact in experimental metapopulations. Science 357, 199-201. (doi:10.1126/science.aal4122)
- 13. Akutsu T, Kosub S, Melkman AA, Tamura T. 2012. Finding a periodic attractor of a Boolean network. IEEE/ACM Trans. Comput. Biol. Bioinf. 9, 1410-1421. (doi:10.1109/TCBB.2012.87)
- 14. Mizera A, Pang J, Qu H, Yuan Q. 2019. Taming asynchrony for attractor detection in large Boolean networks. IEEE/ACM Trans. Comput. Biol. Bioinf. 16, 31-42. (doi:10.1109/TCBB.2018.2850901)
- 15. Schwab JD, Kühlwein SD, Ikonomi N, Kühl M, Kestler HA. 2020. Concepts in Boolean network modeling: what do they all mean? Comput. Struct. Biotechnol. J. 18, 571-582. (doi:10.1016/j.csbj.2020.03.001)
- 16. Kadelka C, Butrie TM, Hilton E, Kinseth J, Serdarevic H. 2020. A meta-analysis of Boolean network models reveals design principles of gene regulatory networks. arXiv. (http://arxiv.org/abs/2009.01216)
- 17. Madrahimov A, Helikar T, Kowal B, Lu G, Rogers J. 2013. Dynamics of influenza virus and human host interactions during infection and replication cycle. Bull. Math. Biol. 75, 988-1011. (doi:10.1007/s11538-012-9777-2)
- 18. Pušnik Z, Mraz M, Zimic N, Moškon M. 2022. Review and assessment of Boolean approaches for inference of gene regulatory networks. Heliyon 8, e10222. (doi:10.1016/j.heliyon.2022.e10222)
- 19. Lee WP, Tzou WS. 2009. Computational methods for discovering gene networks from expression data. Brief. Bioinform. 10, 408-423. (doi:10.1093/bib/bbp028)
- 20. Pratapa A, Jalihal AP, Law JN, Bharadwaj A, Murali T. 2020. Benchmarking algorithms for gene regulatory network inference from single-cell transcriptomic data. Nat. Methods 17, 147-154. (doi:10.1038/s41592-019-0690-6)
- 21. Beneš N, Brim L, Huvar O, Pastva S, Šafránek D. 2023. Boolean network sketches: a unifying framework for logical model inference. Bioinformatics 39, btad158. (doi:10.1093/bioinformatics/btad158)
- 22. Wagner GP. 2014. Homology, genes, and evolutionary innovation. Princeton, NJ: Princeton University Press.
- 23. Halfon MS. 2017. Perspectives on gene regulatory network evolution. Trends Genet. 33, 436-447. (doi:10.1016/j.tig.2017.04.005)
- 24. Alon U. 2003. Biological networks: the tinkerer as an engineer. Science 301, 1866-1867. (doi:10.1126/science.1089072)
- 25. Klemm K, Bornholdt S. 2005. Topology of biological networks and reliability of information processing. Proc. Natl Acad. Sci. USA 102, 18 414-18 419. (doi:10.1073/pnas.0509132102)
- 26. Derrida B, Weisbuch G. 1986. Evolution of overlaps between configurations in random Boolean networks. J. Phys. 47, 1297-1303. (doi:10.1051/jphys:019860047080129700)
- 27. Borriello E, Daniels BC. 2021. The basis of easy controllability in Boolean networks. Nat. Commun. 12, 1-15. (doi:10.1038/s41467-021-25533-3)
- 28. Rozum J, Albert R. 2022. Leveraging network structure in nonlinear control. NPJ Syst. Biol. Appl. 8, 36. (doi:10.1038/s41540-022-00249-2)
- 29. Paul S, Su C, Pang J, Mizera A. 2018. A decomposition-based approach towards the control of Boolean networks. In Proc. of the 2018 ACM Int. Conf. on Bioinformatics, Computational Biology, and Health Informatics, Washington, DC, 29 August–1 September, pp. 11–20. New York, NY: ACM. (doi:10.1145/3233547.3233550)
- 30. Murrugarra D, Veliz-Cuba A, Aguilar B, Laubenbacher R. 2016. Identification of control targets in Boolean molecular network models via computational algebra. BMC Syst. Biol. 10, 94. (doi:10.1186/s12918-016-0332-x)
- 31. Choi M, Shi J, Jung SH, Chen X, Cho KH. 2012. Attractor landscape analysis reveals feedback loops in the p53 network that control the cellular response to DNA damage. Sci. Signal. 5, ra83. (doi:10.1126/scisignal.2003363)
- 32. Wooten DJ, Zañudo JGT, Murrugarra D, Perry AM, Dongari-Bagtzoglou A, Laubenbacher R, Nobile CJ, Albert R. 2021. Mathematical modeling of the Candida albicans yeast to hyphal transition reveals novel control strategies. PLoS Comput. Biol. 17, e1008690. (doi:10.1371/journal.pcbi.1008690)
- 33. Kadelka C, Laubenbacher R, Murrugarra D, Veliz-Cuba A, Wheeler M. 2022. Decomposition of Boolean networks: an approach to modularity of biological systems. arXiv. (http://arxiv.org/abs/2206.04217)
- 34. Plaugher D, Murrugarra D. 2021. Modeling the pancreatic cancer microenvironment in search of control targets. Bull. Math. Biol. 83, 1-26. (doi:10.1007/s11538-021-00937-w)
- 35. Zañudo JGT, Yang G, Albert R. 2017. Structure-based control of complex networks with nonlinear dynamics. Proc. Natl Acad. Sci. USA 114, 7234-7239. (doi:10.1073/pnas.1617387114)
- 36. Zanudo JG, Albert R. 2015. Cell fate reprogramming by control of intracellular network dynamics. PLoS Comput. Biol. 11, e1004193. (doi:10.1371/journal.pcbi.1004193)
- 37. Kauffman S, Peterson C, Samuelsson B, Troein C. 2003. Random Boolean network models and the yeast transcriptional network. Proc. Natl Acad. Sci. USA 100, 14 796-14 799. (doi:10.1073/pnas.2036429100)
- 38. Li Y, Adeyeye JO, Murrugarra D, Aguilar B, Laubenbacher R. 2013. Boolean nested canalizing functions: a comprehensive analysis. J. Theor. Comput. Sci. 481, 24-36. (doi:10.1016/j.tcs.2013.02.020)
- 39. Kadelka C, Wheeler M, Veliz-Cuba A, Murrugarra D, Laubenbacher R. 2023. Modularity of biological systems: a link between structure and function. Github repository. (https://github.com/ckadelka/DesignPrinciplesGeneNetworks)