Abstract
Background
It is widely accepted that genetic regulatory systems are 'modular', in that the whole system is made up of smaller 'subsystems' corresponding to specific biological functions. Most attempts to identify modules in genetic regulatory systems have relied on the topology of the underlying network. However, it is the temporal activity (dynamics) of genes and proteins that corresponds to biological functions, and hence it is dynamics that we focus on here for identifying subsystems.
Results
Using Boolean network models as an exemplar, we present a new technique to identify subsystems, based on their dynamical properties. The main part of the method depends only on the stable dynamics (attractors) of the system, thus requiring no prior knowledge of the underlying network. However, knowledge of the logical relationships between the network components can be used to describe how each subsystem is regulated. To demonstrate its applicability to genetic regulatory systems, we apply the method to a model of the Drosophila segment polarity network, providing a detailed breakdown of the system.
Conclusion
We have designed a technique for decomposing any set of discrete-state, discrete-time attractors into subsystems. Having a suitable mathematical model also allows us to describe how each subsystem is regulated and how robust each subsystem is against perturbations. However, since the subsystems are found directly from the attractors, a mathematical model or underlying network topology is not necessarily required to identify them, potentially allowing the method to be applied directly to experimental expression data.
Background
Genetic regulatory systems are assumed to be 'modular', with specific combinations of genes and proteins responsible for different biological functions. Although there is no authoritative definition of a 'module', one common description is a group of genes, proteins and/or molecules that combine to carry out a relatively distinct function (distinguishable from the functions associated with other modules) [1]. Rather than being a protein complex or group of co-expressed genes, such a module can be viewed as the temporal activity (dynamics) of a group of genes/proteins that controls a specific function in different environmental conditions, cell types and/or tissues. For example, the temporal activity of genes/proteins controlling progression through the cell cycle (in many different environmental conditions and cell types) can be viewed as a module. Moreover, since genetic regulatory systems are hierarchical [2], it is likely that these modules also interact in a hierarchical manner.
It is this concept of a 'dynamical module' or 'subsystem' that we consider in this paper. An important challenge in biology is how to identify such subsystems and discover how they combine hierarchically to determine cell types, tissues and organisms. A dynamic approach is the most suitable starting point for decomposing a genetic regulatory system into modules, since it is the activity profiles that control biological functions. The topology of the underlying interaction network is just a description of the interactions associated with the activity profiles.
Most previous attempts at modular decomposition have relied on the topology of the underlying network, deduced from protein-protein interaction and/or transcription factor binding data. From a generic network point of view, algorithms have been created which partition any network into topological modules [3]. From a biological viewpoint, transcription factor networks have been used to find 'network motifs' [4] and 'structural modules' [5], based solely on topological properties of the network. Although informative, topology based approaches do not necessarily allow the functions/dynamics associated with groups of genes/proteins to be inferred [6]; rather, activity/expression levels over a period of time are required. When expression data have been used, the resulting modules tend to be groups of genes whose expression levels change in unison. For example, 'transcription factor modules' [7,8] are groups of genes with common transcription factor binding sites and expression patterns. However, for a biological function, it may be the case that a series of interactions are involved, with different genes being affected at different points in time (e.g. genes involved in cell cycle regulation). Similar issues arise when applying clustering algorithms to expression data, which also groups together genes whose expression levels change in unison. In this paper, we present a method for identifying subsystems ('dynamical modules'), given a set of discrete state, discrete time attractors. The subsystems are found directly from the attractors (Fig. 1), without the need for topological information, making the method applicable both to models and (potentially) to experimental expression data. To demonstrate the method, we consider a class of mathematical models called Boolean network models (defined in the Methods section). These models are used as a starting point because of their relative simplicity and the fact that different dynamics can be easily compared. We also consider how each subsystem is regulated and how subsystems can help investigate robustness in these systems. In order to test our method, we then apply it to a model of the Drosophila segment polarity network.
Results: Identifying subsystems and regulatory interactions between subsystems
Given a set of discrete-state, discrete-time attractors as an input, we have produced a method that optimally breaks the attractors up and identifies subsystems, each one
1. Conserved across some set of attractors,
2. Distinguishable from the stable dynamics for the rest of the system.
In order to demonstrate the key features of the resulting subsystems, we use the simple example network in Fig. 2. Here, A1, ..., A4 are four attractors for the system, but it is evident that the sub-dynamics associated with some sub-networks (S1, ..., S6) can distinguish different sets of attractors and highlight parts of the system that are dynamically/functionally distinct. For example, the subsystem S1 is conserved across three attractors (A1, A2 and A3) and provides a way of distinguishing those attractors from the remaining attractors (A4). Moreover, S1 can be viewed as dynamically distinct from all remaining sub-dynamics, since the dynamics of each remaining node (1, 4, 5 and 6) has a distinguishable profile when viewed alongside S1. In particular,
(a) Nodes 1, 5 and 6 each exhibit different behaviours/dynamics alongside S1 (compare A3 with A1 and A2).
(b) The activity of node 4 oscillates out of phase with the activity of nodes 5 and 6, when viewed alongside S1 in attractors A1 and A2.
The sub-dynamics S1, ..., S6 in Fig. 2 are all subsystems (according to our method) because each one can be viewed as dynamically distinct from all the other sub-dynamics in attractors A1 – A4. The attractors themselves than can be viewed as combinations of subsystems.
The main method does not explicitly identify functional/regulatory relationships between subsystems. However, in the case of logical models such as Boolean network models, the logical functions can be used to identify suitable regulatory relationships.
For the remainder of this section we describe these new techniques in terms of Boolean network models (which we formally define in the Methods section). Algorithms for these new techniques are described in the Methods section of this paper, whilst formal proofs for all the methods can be found in Additional files 1 and 2. It should be noted that all of the definitions and algorithms can be extended to any set of discrete state, discrete time attractors, with or without a detailed mathematical model.
Partial state sequences
Much of this paper involves looking at the sub-dynamics in Boolean network models. Therefore, for a subset of nodes N ⊆ V, we consider partial states.
Definition 1. A partial state, xN ∈ {0, 1}|N| is a set of Boolean states, one for each node ni ∈ N ⊆ V. i.e. xN = {si: ni ∈ N}.
Definition 2. A partial state sequence, , is an ordered set of partial states, for a node set N (⊆ V).
These partial state sequences can be used to represent sub-dynamics in Boolean network models, and therefore, can be used as a starting point when defining subsystems. In particular, we are looking for partial state sequences that occur (or cycle within) attractors.
Definition 3. A partial state sequence occurs in an attractor A = {z0, ..., zp-1} if there exists integers b0, ..., bp-1 ∈ {0, ..., q - 1} for which the following are true
1. For k = 0, ..., p - 1, = {si ∈ zk : ni ∈ N}.
2. For each k ∈ {0,..., p - 1} and j = k - 1 (mod p), either
(a) bk = bj or (b) bk = bj + 1 (mod q)
3. Properties 1 and 2 are not true for any smaller partial state sequence and integers c0, ..., cp-1 ∈ {0, ..., q' - 1} (q' <q).
As an example of a partial state sequence that occurs in an attractor, consider the partial state sequence
and the attractor A1 = {z0, z1, z2, z3} in Fig. 2. As the system enters the attractor A1, we continually cycle through the component states over time (i.e. z0, z1, z2, z3, z0, z1, z2, z3, z0, ...). Therefore, since
- is contained in z0 and z1 (i.e. = {si ∈ z0 : ni ∈ N} = {si ∈ z1 : ni ∈ N})
- is contained in z2 and z3 (i.e. = {si ∈ z2 : ni ∈ N} = {si ∈ z3 : ni ∈ N})
we also continually cycle through the partial states of P over time (i.e. , ...). We note that the time taken to change from one partial state to the next is not considered (2 time steps in this example). Ignoring such time lags could be especially important in genetic regulatory systems, where the same process may take different lengths of time under different conditions (in different attractors). For example, in cell cycle regulation, some mutations can alter the time taken for a round of cell division, without necessarily altering the relationships that exist between other groups of genes/proteins [9]. Properties 1 and 2 allow us to capture the fact that P continually cycles within the the attractor A1 by noting that a sequence of integers {b0, b1, b2, b3} = {0, 0, 1, 1} exists that maps the partial states in P onto the attractor states in A1. Property 2 ensures that the partial states in P occur in the correct order, whilst allowing different lengths of time between each change. Properties 2 and 3 ensure that P is the smallest set of partial states that occurs in the attractor A. This leaves a partial state sequence that just describes the 'order' in which the node states change in A (for nodes in N), ignoring any time lags associated with individual partial states and the number of times P cycles within A. Returning to the example in Fig. 2, S3 cycles twice in A1 and A2. However, just two partial states ( = {s4 = 1} and = {s4 = 0}) are sufficient to capture the fact that S3 occurs in A1 and A2 (letting {b0, b1, b2, b3} = {0, 1, 0, 1} and {1, 0, 1, 0} respectively).
Given a node set N and a set of attractors, the Methods section of this paper gives detailed algorithms of how to identify the unique partial state sequences (within an order of rotation) that occur in each attractor. Essentially this is done by going through each attractor state and writing down the partial states involving the node set N. We then remove partial states when
(a) Multiple adjacent partial states are identical (in which case only one copy is kept)
(b) A sequence of states cycles many times within an attractor (in which case only one copy is kept).
leaving a partial state sequence that just describes the 'order' in which the node states change in that attractor.
Method for identifying subsystems
The new method of identifying subsystems is a two stage process that breaks up the system's attractors, to leave partial state sequences that are optimally distinguishable from one another. A summary of these two stages, along with relevant definitions, is given below using the simple example in Fig. 2 and Fig. 3. Detailed descriptions of the algorithms are given in the Methods section for those wanting to implement this method. A flow diagram of the procedures involved can be seen in Fig. 1
Stage 1: Partition sequences
The first stage of the method involves identifying every partial state sequence that satisfies Definition 5. Each such partition sequence (for a node set N) is both (a) conserved across a set of attractors and (b) has a fundamentally different dynamical profile from that associated with any larger node set M ⊇ N. We first introduce intersection sequences that are used in the definition of partition sequences and set up a hierarchy amongst them.
Definition 4. A partial state sequence P for a node set N is an intersection sequence, if there exists a subset of attractors for which the following hold
1. P occurs in every attractor A ∈
2. P does not occur in any attractor A ∉
3. Given a larger node set M ⊃ N, there is no partial state sequence P' (for the node set M) that satisfies condition 1.
If the above properties hold, we say P intersects at .
In order to identify every intersection sequence, node sets (N) are analysed to identify which partial state sequences occur in which attractors. Any such partial state sequence (P) and corresponding set of attractors () will then satisfy properties 1 and 2 of Definition 4. These pairs {P, } are stored for each node set N and then compared during the procedure to remove those that fail property 3. In the Methods section of this paper we give a detailed algorithm for this as well as providing some notes on improving efficiency. In practice, many node sets N and/or attractors can be ignored because we know that the pairs {P, } will fail property 3.
The set of all intersection sequences provide a hierarchical breakdown of the system's dynamics (see Fig. 3A, B and Table 1). At the top of the hierarchy are partial state sequences that contain only a few nodes but are conserved over (relatively) many attractors. Then as you go down the hierarchy, extra nodes are added but the resulting partial state sequences are conserved over fewer attractors. Property 3 above ensures that each partial state sequence is optimal in the sense that no more extra nodes can be considered without the new extended partial state sequence occurring in strictly fewer attractors.
Table 1.
Intersection sequences | Intersects at ... |
---|---|
P3 | 3 = {A1, A2, A3} |
P4 | 4 = {A3, A4} |
P5 | 5 = {A1, A2} |
P6 | 6 = {A1, A2} |
P7 | 7 = {A1} |
P8 | 8 = {A2} |
P9 | 9 = {A3} |
P10 | 10 = {A4} |
The following table gives all the intersection sequences for the simple example model (in Fig. 2 and Fig. 3). The table shows which partial state sequences (from Fig. 3) are intersection sequences, along with the attractors they occur in. The attractors (A1 - A4) can be seen in Fig. 2.
Although these sequences provide a neat hierarchical breakdown of the system's dynamics, some may be superfluous whilst important 'core' sub-dynamics may be missed. Therefore, we introduce 3 extra constraints when defining partition sequences.
Definition 5. A partial state sequence is a partition sequence if it satisfies any of the following properties, for some set of attractor cycles
A : P is Core to
The following 3 properties hold for P
1.P occurs in an intersection sequence P', which intersects at (P can equal P').
2. If an intersection sequence Q (for a node set M) intersects at (where ), then there exists an intersection sequence Q' (for a node set M' ⊇ M ∪ N) that occurs in every attractor A ∈
3. 1 and 2 are not true for any larger partial state sequence P" (for a node set N" ⊃ N)
B : P is Exclusive to
P is the only intersection sequence that intersects at .
C : P is Independently Oscillating
P intersects at and cycles out of phase with another intersection sequence Q. i.e. ∃ Q that involves the node set M and intersects at , for which
1. || ≥ 2
2. N ∪ M = V (the set of all nodes)
The three parts of this definition are used to describe three ways in which a sub-dynamic (partial state sequence) can be viewed as fundamental to the make up of the attractors and/or distinguishable from the remaining sub-dynamics. For the example in Fig. 2 and Fig. 3, Table 2 shows the partial state sequences that satisfy properties A, B and C above.
Table 2.
Partition sequences | Core to ... | Exclusive to ... | Independently Oscillating with ... |
---|---|---|---|
P1 | {A1, A2, A3} | × | × |
P2 | {A1, A2} | × | × |
P3 | × | {A1, A2, A3} | P5 |
P4 | {A3, A4} | {A3, A4} | × |
P5 | × | × | P3 and P6 |
P6 | × | × | P5 |
P7 | {A1} | {A1} | × |
P8 | {A2} | {A2} | × |
P9 | {A3} | {A3} | × |
P10 | {A4} | {A4} | × |
The following table gives all the partition sequences for the simple example model (in Fig. 2 and Fig. 3). The partition sequences (P1 – P10) can be seen in Fig. 3, whilst the attractors (A1 – A4) can be seen in Fig. 2. The table also shows which properties they satisfy in Definition 5 (× implies that property is not satisfied)
Two partition sequences P1 and P2 are not intersection sequences, but are core to the dynamics associated with different sets of attractors. For example, P2 is the core dynamic spanning attractors A1 and A2, rather than P5 and P6, which both occur in A1 and A2. This is because P5 and P6 involve partially overlapping node sets N5 and N6 (i.e. N5 ⊈ N6, N6 ⊈ N5), and so neither corresponds to a sub-network whose dynamics are core to attractors A1 and A2. On the other hand P2 acts as a central point from which these intersections sequences branch off.
Although in this example every intersection sequence is also a partition sequences, this does not necessarily have to be the case. One example where this is not the case is given in Section S3.2 of Additional file 3. In order to identify every partition sequence, we use the full set of intersection sequences to identify every partial state sequence satisfying property A, B or C of Definition 5. These 3 tests are done independently and the partial state sequences found in each test are grouped together to give the full set of partition sequences. In the Methods section of this paper we give detailed algorithms for these procedures.
Stage 2: Subsystems
The second stage of the method compares the partition sequences, to identify every partial state sequence that satisfies Definition 6. The partition sequences allow the attractors to be broken up in a hierarchical manner. However, these sequences are not subsystems (for a start, all of the attractors themselves are partition sequences). Of more interest are the key sub-dynamics that are unique to a particular partition sequence and that distinguish one level in the hierarchy from the next. Therefore, we define subsystems to be those components that are unique to a partition sequence. i.e.
Definition 6. A partial state sequence S (for a node set N) is a subsystem if it is unique to a partition sequence. i.e. there exists a partition sequence P (for a node set M) for which
1. S occurs in P.
2. If another partition sequence P' (for a node set M' ⊂ M) occurs in P, then M' ∩ N = ∅.
3. 1 and 2 are not true for any partial state sequence S', for a larger node set N' ⊃ N.
In the Methods section of this paper we give detailed algorithm for identifying every such subsystem. This essentially involves running through every partition sequence P (for a node set M) and identifying the nodes (N) that do not belong to any other partition sequence P', that occurs in P. The dynamics associated with this node set N are then a subsystem.
As an example, consider P5 in Fig. 3. Looking at the remaining partition sequences we see that only P1 and P2 occur within it. Therefore, the nodes and dynamics in P5 that are unique to P5 are the node set N = {n5, n6} and the subsystem S5 (from Fig. 2D). Obviously, S5 satisfies Property 1 of Definition 6 because it occurs in the partition sequence P5. Property 2 is satisfied since P1 and P2 are the only partition sequences occurring in P5 and the node sets do not intersect with N = {n5, n6} ({n2, n3} ∩ {n5, n6} = ∅ and {n1, n2, n3} ∩ {n5, n6} = ∅). Property 3 is satisfied since any node set N' ⊆ M, N' ⊃ N would fail Property 2. For the example Boolean network model, all the subsystems can be seen in Fig. 2, whilst Fig. 3C and Table 3 demonstrate which subsystem corresponds to which partition sequences.
Table 3.
Partition sequence | Unique Component |
---|---|
P1 | S1 |
P2 | S2 |
P3 | S3 |
P4 | S4 |
P5 | S5 |
P6 | N/a |
P7 | N/a |
P8 | N/a |
P9 | N/a |
P10 | S6 |
For the partition sequences in Table 2 and Fig. 3, the following table gives components that are unique to each one. These are the subsystems shown in Fig. 2.
Regulation of subsystems
We say a collection of subsystems x = {S1, ..., Sf} triggers an individual subsystem Sy in an attractor A, if the co-occurrence of subsystems S1, ..., Sf ensures the occurrence of Sy in A.
For Boolean network models (or other discrete-state models), such interactions can be identified by considering the Boolean functions (or Logical functions). In the case of the simple example in Fig. 2, it is possible to look at the Boolean functions and say that
(a) S2 can trigger the occurrence of S1 in attractors A1 and A2.
(b) S1 can trigger the occurrence of itself (S1) in attractors A1, A2 and A3.
This is since
(a) The prolonged activation of node 1 (s1 = 1) is sufficient to activate nodes 2 and 3 (s2 = 1, s3 = 1).
(b) If nodes 2 and 3 are both on at time t, then this sufficient to maintain their activation for all time steps t' ≥ t.
As can be seen above, different collections may act in different sets of attractors to fully explain the occurrence of a subsystem Sy.
Definition 7. A set of subsystem collections regulates an (individual) subsystem Sy if the following are true
1. For i = 1, ..., g, ∃ an attractor A for which i triggers Sy in A.
2. If Sy occurs in an attractor A, ∃ i ∈ {1, .., g} for which i triggers Sy in A.
We call the set {} the regulation set of Sy.
In the Methods section, we describes how a suitable regulation set can be found for each subsystem Sy, by looking at how each partial state within Sy is triggered/regulated in each attractor. This approach uses the Boolean functions from a Boolean network model to identify which partial states can trigger the occurrence of those in Sy.
For the simple example in Fig. 2, Table 4 shows a regulation set for each of the subsystems. As can be seen in this example it is often the case that subsystems play a role in their own regulation, although this does not necessarily have to be the case.
Table 4.
Subsystem | Regulation set |
---|---|
S1 | = {S1} |
= {S2} | |
S2 | = {S2} |
S3 | = {S1, S3} |
= {S2, S3} | |
S4 | = {S4} |
S5 | = {S2, S5} |
S6 | = {S4, S6} |
Table showing the collections of subsystems that regulate each individual subsystem Sy (for the example Boolean network model and subsystems in Fig. 2). Column 2: Each row shows a collection of subsystems (group) that can trigger Sy.
Even without the Boolean functions, it is possible to identify relationships between subsystems. On a simple observational level, a subsystem Sx may be hierarchically linked to another subsystem Sy, because Sx only occurs in an attractor in conjunction with the 'higher order' Sy. i.e.
Definition 8. Consider two subsystems Sx and Sy. Sx is hierarchically linked to Sy if the following are true
- Sx occurs in an attractor A ⇒ Sy occurs in an attractor A
Furthermore, such a link can be viewed as direct if it is impossible to find a subsystem Sz for which the following is true
1. Sx is hierarchically linked to Sz
2. Sz is hierarchically linked to Sy
3. There exists attractors A1 and A2 for which
(a) Sy occurs in A1 and A2
(b) Sz occurs in A1 but not A2
(c) Sx occurs in neither A1 nor A2
This terminology can easily be extended to collections of subsystems.
Robustness of attractors and subsystems
Suppose S is a subsystem that involves a set of nodes N and occurs in a set of attractors . For any network state zk = {s1, ..., sv} in any attractor A ∈ , a node nf can have its state sf perturbed to (sf + 1) (mod 2), to give a new network state (say). By looking at how often such a network state converges to an attractor A' ∈ containing the same subsystem, it is possible to measure how robust the subsystem S is to perturbations in the system.
Let r(S, L) be the probability that perturbing the state of a node nf ∈ L in an attractor A ∈ causes the system to converge to an attractor A' ∈ . i.e.
where
Then, using this measure, it is possible to measure Global Robustness = r(S, V), Internal Robustness = r(S, N), and External Robustness = r(S, M) (where V is the set of all nodes and M = V\N). This allows one to distinguish the effects of perturbing nodes within and outside of the subsystem. The measure r(S, L) can also be adapted to give the robustness of an individual attractor A (by letting S = A, = {A} and L = V).
Results: Application to the segment polarity network
During the development of the fruit fly Drosophila melanogaster, the embryo becomes segmented, with the segment polarity genes responsible for specifying the number, spacing and polarity (direction) of these segments. In order to demonstrate the applicability of our technique to genetic regulatory systems, we analyse a previously published Boolean network model of the Drosophila segment polarity network from references [10] and [11]. This model corresponds to a 4-cell ring, where intercellular interactions are allowed between cells 1 and 4. This approach was kept here since the segments start off 4 cells wide and the wild type patterns repeat every 4 cells (before cell proliferation). The Boolean functions for this model are shown in Table S4.1 of Additional file 4, whilst the main interactions are summarised in Fig. 4. This 4 cell model has 10 fixed point attractors. The wild type attractor A1 is shown in Fig. 5, and is characterised by WG and EN/HH expression in 1 cell wide stripes either side of the parasegment boundary (in cell 4 and 1 resp). Two other attractors, A2 and A3, correspond to experimentally observed phenotypes and are also shown in Fig. 5. All 10 attractors are shown in Additional file 4 (Fig. S4.1).
Applying our method, we identified 19 subsystems that satisfied Definition 6, which are shown in Fig. 6 and Table 5. SA, which occurs in all 10 attractors, corresponds to the cellular response to SLP (expressed in cells 3 and 4 only). Of the remaining subsystems, SB1, SB2, SC1 and SC2 appear to be the most interesting since they capture a large proportion of the global dynamics. They correspond to different states and different cells associated with the same positive feedback loop, involving Wingless (WG), Engrailed (EN), Hedgehog (HH) and Cubitus interruptus (CIA). The feedback loop can either be ON (SB1, SC1) or OFF (SB2, SC2) and either involve inter-cellular interactions between cells 4 and 1 (SB1, SB2) or cells 2 and 3 (SC1, SC2). As mentioned earlier, the wild type attractor is characterised by WG and EN/HH expression in 1 cell wide stripes either side of the parasegment boundary (in cell 4 and 1 respectively). This is captured by SB1.
Table 5.
Subsystem | States |
---|---|
SD1 | wg1 = 1, WG1 = 1 |
SD2 | wg1 = 0, WG1 = 0 |
SE1 | wg2 = 1, WG2 = 1 |
SE2 | Wg2 = 0, WG2 = 0 |
SF1 | PTC1 = 1 |
SF2 | PTC1 = 0 |
SG1 | PTC2 = 1 |
SG2 | PTC2 = 0 |
SH1 | CIA1 = 1, ptc1 = 1 |
SH2 | CIA1 = 0, ptc1 = 0 |
SI1 | CIA2 = 1, ptc2 = 1 |
SI2 | CIA2 = 0, ptc2 = 0 |
SJ1 | CIR1 = 1, CIR2 = 1 |
SJ2 | CIR1 = 0, CIR2 = 0 |
Table showing the remaining 14 subsystems for the segment polarity network model. For each subsystem, the second column of the table shows the nodes, cells (subscript) and states involved. The five main subsystems (SA, SB1, SB2, SC1 and SC2) are shown in Fig. 6.
A previous study has considered the modular design of this system [12], but there are significant differences in the component 'modules'. In [12], modules were arbitrarily chosen to be important intra-cellular pathways involving (a) SLP, (b) EN/HH and (c) WG/CI. However, here we have found that the most important subsystems are inter-cellular combinations of these pathways. Firstly, (a) and (b) combine to form SA. Secondly, (b) and (c) combine across cell boundaries to give 2-cell positive feedback loops SB1, SB2, SC1 and SC2. Moreover, the method in the paper captures the states of these pathways.
Interactions exist between these 19 subsystems, which explain how each one is regulated and which of the 10 attractors they occur in. For each of the 19 subsystems, Table 6 shows the sets of subsystems that regulate them, by ensuring their occurrence in an attractor (see previous section). Of the 19, the five subsystems highlighted in Fig. 6 play the largest role in regulating other subsystems with SA, SB1, SB2, SC1 and SC2 involved in the regulation of 12, 9, 9, 9 and 9 subsystems (respectively). Using these five main subsystems as a starting point, it is possible to generate a hierarchical breakdown of the 10 attractors in the model (Fig. 7A). The 10 attractors all contain SA, then split up into 4 main groups depending on the occurrence of SB1, SB2, SC1 and SC2. In two cases, where 1 = {SB1, SC1} or 2 = {SB2, SC2} occur, this is sufficient to distinguish the attractors A2 and A3 (respectively) from the others. In the remaining two cases, where 3 = {SB1, SC2} and 4 = {SB2, SC1} occur, two additional levels of specification determine the individual attractors. Additional file 4 provides further details of the interactions involved in specifying each individual attractor (and ensuring occur in an attractor).
Table 6.
Subsystem | Regulation set |
---|---|
S A | = {SA, SB1, SC1} |
= {SA, SB2, SC1} | |
= {SA, SB1, SC2} | |
= {SA, SB2, SC2} | |
SB1 | = {SA, SB1} |
SB2 | = {SA, SB2, SE2} |
= {SA, SB2, SC2, SF1, SG1} | |
SC1 | = {SA, SC1} |
SC2 | = {SA, SC2, SD2} |
= {SA, SB2, SC2, SF1, SG1} | |
SD1 | = {SD1, SH1, SJ2} |
SD2 | = {SD2} |
= {SB1} | |
= {SE1} | |
= {SA, SC2, SF1} | |
SE1 | = {SE1, SI1, SJ2} |
SE2 | = {SE2} |
= {SC1} | |
= {SD1} | |
= {SA, SB2, SG1} | |
SF1 | = {SB2, SC1} |
= {SA, SC2, SF1} | |
SF2 | = {SF2, SH2} |
= {SB1, SC1} | |
SG1 | = {SB1, SC2} |
= {SA, SB2, SG1} | |
SG2 | = {SG2, SI2} |
= {SB1, SC1} | |
SH1 | = {SB2, SC1} |
SH2 | = {SB1} |
= {SE1} | |
= {SA, SC2, SF1} | |
SI1 | = {SB1, SC2} |
SI2 | = {SC1} |
= {SD1} | |
= {SA, SB2, SG1} | |
SJ1 | = {SA, SB2, SC2, SF1, SG1} |
SJ2 | = {SB1} |
= {SC1} | |
= {SF2, SG2} |
Table showing the sets of subsystem collections responsible for regulating each individual subsystem Sy. Each row shows a collection of subsystems () that can trigger the occurrence of Sy.
Since only 3 of the 10 attractors from the model correspond to observed phenotypes, it is important to ensure that the main subsystems aren't an unrealistic artefact of the model. Re-applying our method of identifying subsystems to the reduced set of observed attractors {A1, A2, A3}, we find that the 5 main subsystems in Fig. 6 are preserved. The only change is that the new subsystems incorporated some of the old smaller subsystems. In particular we get the following 5 main subsystems
- = SA ∪ SD2 ∪ SE2 ∪ SH2
- = SB1 ∪ SF2 ∪ SJ2
- = SB2 ∪ SF1 ∪ SJ1
- = SC1 ∪ SG2
- = SC2 ∪ SG1
Moreover, as can be seen in Fig. 7B, the 5 main subsystems still provide a breakdown of the attractors. Focusing on subsystems allows us to look at the segment polarity network from a different angle. The first thing that stands out is that the subsystems exhibit a symmetry across the cell 1/2 boundary, whereby every subsystem in cell 4/1 has a symmetric counterpart in cell 3/2 (compare the pairs SA-SA, SB1-SC1, SB2-SC2, SD1-SE1, SD2-SE2, SF1-SG1, SF2-SG2, SH1-SI1, SH2-SI2, SJ1-SJ1 and SJ2-SJ2). Moreover, as can be seen in Table 6, these symmetric counterparts are analogously regulated. The protein SLP is crucial to setting up this symmetry, since it sets off a chain of interactions that nullifies hh, HH and en in cells 3 and 4 (see SA in Fig. 6). This then blocks signalling between cells 3 and 4, since (1) HH and WG are the only proteins involved in cell – cell signalling (in this model) and (2) en is the only gene (in this model) that can receive signals from WG. Therefore, it appears that one of the effects of SLP is to partition the cells in the embryo into 'isolated' 4 cell wide blocks (at this developmental stage) with the cell 1/2 boundary at the centre of this block. SLP also imposes restrictions on interactions within the neighbouring cells 1 and 2. This appears to be sufficient to partition the 4 cells into two, 2-cell blocks (cell 4/1 and cell 2/3) that have relatively independent sub-dynamics. These two, 2-cell blocks are forced to choose dynamics from SB1, SB2, SC1 and SC2, which in turn specify the attractor chosen. Important interactions do occur across the 1/2 cell boundary and sub-dynamics are conserved across this boundary. However, the driving force behind the dynamics and specification of the network appear to come from the two, 2-cell wide blocks. Whether this symmetry is inherent in the system or an artefact of the model remains to be verified, since only three attractors have been observed experimentally, and so more data may be required. In order to gain added insight into both the subsystems and the attractors, we calculate the robustness of each attractor and subsystem (see Tables 7 and 8). The robustness score is between 0 and 1 and measures how often the attractor/subsystem can survive after a perturbation to any node state. SB1 and SC1 are maximally robust in that no single node perturbations can destroy them. This also implies that SB1 and SC1 draw in local sub-dynamics that only marginally differ from them. Therefore, since SB1 and SC1 both occur in A2, this partially explains why A2 is so dominant in the state space, with over 98 % of network states converging to it. On the other hand SB2 and SC2 are vulnerable to perturbations, especially to nodes within the subsystems themselves. The wild type attractor is A1, which contains SB1 and SC2. Only a small proportion of the state space (0.01 %) converges to this attractor but it is still relatively robust (0.67). It appears that this robustness is primarily due to SB1, the same subsystem responsible for the characteristic WG and EN/HH expression either side of the parasegment boundary (in cell 4 and 1 resp).
Table 7.
Attractor | Basin of attraction | Robustness |
---|---|---|
A1 | 0.01 % | 0.67 |
A2 | 98.63 % | 1.0 |
A3 | 1.27 % | 1.0 |
A4 | 0.00035 % | 0.60 |
A5 | 0.00005 % | 0.58 |
A6 | 0.04 % | 0.69 |
A7 | 0.0003 % | 0.60 |
A8 | 0.00015 % | 0.58 |
A9 | 0.04 % | 0.69 |
A10 | 0.01 % | 0.67 |
Robustness of each attractor, where the score is between 0 and 1 and defined in the Results section. We also give the percentage of state space converging to each attractor (estimated from a sample of 2,000,000 initial network states).
Table 8.
Subsystem | Robustness | ||
---|---|---|---|
Global | Internal | External | |
S A | 1.0 | 1.0 | 1.0 |
SB1 | 1.0 | 1.0 | 1.0 |
SB2 | 0.78 | 0.56 | 0.85 |
SC1 | 1.0 | 1.0 | 1.0 |
SC2 | 0.78 | 0.56 | 0.85 |
SD1 | 0.63 | 0.5 | 0.63 |
SD2 | 0.99 | 0.88 | 1.0 |
SE1 | 0.63 | 0.5 | 0.63 |
SE2 | 0.99 | 0.88 | 1.0 |
SF1 | 0.76 | 0.71 | 0.76 |
SF2 | 0.97 | 0.33 | 0.99 |
SG1 | 0.76 | 0.71 | 0.76 |
SG2 | 0.97 | 0.33 | 0.99 |
SH1 | 0.73 | 1.0 | 0.72 |
SH2 | 1.0 | 1.0 | 1.0 |
SI1 | 0.73 | 1.0 | 0.72 |
SI2 | 1.0 | 1.0 | 1.0 |
SJ1 | 1.0 | 1.0 | 1.0 |
SJ2 | 1.0 | 1.0 | 1.0 |
Robustness of each subsystem, where the score is between 0 and 1 and defined in the Results section.
Discussion and Conclusions
Described in this paper is a framework for identifying subsystems ('dynamical modules') from a Boolean network model. This method of identifying subsystems is applicable for systems with either fixed point attractors, cyclic attractors or both. The methods are designed to be applicable to genetic regulatory systems. Therefore, we have applied the method to an existing model of the Drosophila segment polarity network. We identify novel subsystems acting across cell boundaries, and demonstrate how they regulate each other to give the attractors. Analysis of the subsystems also allows us to predict which ones underlie robustness in the wild type attractor.
Our methods are ideally suited to analysing multi-stable systems, where different stable states/conditions are captured within different attractors. For a node set with two or more different dynamics in multiple attractors, the different dynamics will form part of different intersection sequences, partition sequences and subsystems. Therefore, some identified subsystems will capture the key components associated with particular stable states (for some set of nodes). Meanwhile, other subsystems will capture shared/conserved dynamics. For example, in the Drosophila segment polarity network model, subsystems SB1, SB2, SC1 and SC2 (Fig. 6) capture different states of 2 positive feedback loops, which underlie multistationarity in this particular system. On the other hand, the subsystem SA captures a conserved sub-state that can co-occur alongside all of the remaining subsystems.
Once the subsystems have been identified, they can provide novel insight into aspects of the model. Firstly, Boolean functions from the model can be used to determine how each subsystem is regulated. This knowledge can then be used to provide a more detailed hierarchical breakdown of the original attractors, allowing the key differences and similarities between different attractors to be highlighted. A second advantage of looking at subsystems is that they can provide a new way of looking at robustness and adaptability. An external perturbation may switch the system from one attractor to another. However, despite this, a subsystem may be robust and remain unchanged. Looking at the effect of external perturbations on subsystems can highlight which parts of the system are most robust, or how certain subsystems can adapt/change without affecting others.
Although we have focussed on Boolean network models in this paper, the method can be adapted to other discrete-state, discrete-time models/systems. Moreover, since subsystems (in our method) are found directly from the attractors, a mathematical model or underlying network topology is not even necessarily required to identify them. Hierarchical links (see Results) between these subsystem can also be found without a mathematical model. These links allow dependencies between the subsystems to be discovered, providing insight into potential functional links. However, to gain a full understanding of how the subsystems interact, and the nature of these links, details of interactions between nodes (genes/proteins) are currently required. Importantly, our method can be applied directly to data on the attractors of a system when these data are incomplete. This can result either from only a subset of the full set of attractors being known, or from information being available for only a subset of nodes. In these cases, the method will still break up the attractors and identify relevant sub-dynamics. However, the more attractors there are, the more detailed and informative the eventual subsystems will be.
For many systems, data may be incomplete and/or noisy, leading to models that are also incomplete and/or unreliable. Even in the case of the extensively studied Drosophila segment polarity network, components may be missing and not fully understood. The most important factor affecting the quality of the results from our method (i.e. whether we find useful subsystems) is the reliability of the data/attractors. Even if there are some inaccuracies in a model or network topology, we can still make some reliable assertions, by focussing on the reliable data/attractors. In the case of the Drosophila segment polarity network model studied in this paper, 3 of the 10 attractors correspond to observed phenotypes. As we have already shown, applying the method directly to these 3 attractors gives us 5 'primary' subsystems (, see Fig. 7B). Then, once extra attractors from any model are taken into account, these 'primary' subsystems are split up so that each new subsystem only comes from a single 'primary' subsystem (e.g. in the existing model, the subsystem splits into SA, SD2, SE2, SH2). Therefore, for any future model of this system (with extra components or interactions), the subsystems obtained from our method should also relate back to these 5 primary subsystems . Therefore, modules extracted from the currently available data/models are still informative, and those that only rely on observed data can be viewed as reliable (even if they are larger/less detailed than the 'true' subsystems). New data and improved models will just increase the precision of the results/subsystems. Another way in which we can assess subsystems (from a current model) is by looking at how robust they are. As well as looking at how robust each subsystem is to perturbations in node states, we can assess how robust each subsystem is to changes in the model itself (such as changes in Boolean functions or the addition of new components). We envisage that this method can be extended in a number of ways. Firstly, the method can be applied to signal transduction pathways by having 'n' attractors representing the state of the pathway(s) under 'n' different environmental/cellular conditions. These attractors can then be analysed with the new method to identify subsystems within pathway(s). Additionally, there is no need to restrict ourselves to studying cellular systems and we envisage that the method will be applicable in other fields.
The main method we have introduced could be applied directly to large scale datasets, to extract novel functional information. For an individual experiment, data on the state of a set of nodes (genes/proteins) are taken from a particular tissue, developmental stage and environmental condition (e.g. normal, high/low temperature, light/dark). Moreover, expression data are typically taken at discrete time points and are often converted to binary (e.g. 1 = 'expressed', 0 = 'not expressed'). Therefore, the results of each individual experiment would correspond to an individual discrete-state discrete-time attractor. Although we have not applied our method to a large datasets in this paper, we believe that following methodology could be used as a starting point
1. Carry out multiple experiments in different environmental conditions/tissues/developmental stages, to give a range of discrete-state attractors,
2. Apply the method described in this paper, directly to this set of attractors (without using any model).
Then, identified subsystems would then be subsets of proteins, whose collective dynamics are conserved across data sets. In principle, this method could be applied to either time series data or fixed point data. However, since it is more difficult to obtain accurate data on time courses rather than steady states, we believe that the approach would be most likely to yield valuable information for fixed point/steady state data. We note here, that data from mutants are not necessary for such an approach. In fact, mutant data may not be as suitable, since each mutant would correspond to a different system. This type of analysis would be complementary to clustering, which primarily looks for groups of genes whose expression levels change in unison.
Reliability of data will be a major issue if trying to apply this method directly to experimental data. Expression data are very noisy and the less reliable the data, the less certain we will be about any identified subsystems. However, if can still be the case that novel groupings of genes/proteins could be found. Moreover, in the future, we envisage that these methods can be adapted so that we estimate the 'probability' of certain partial state sequences occurring in attractors, and then use this data to identify likely subsystems.
One limitation of this method is that it may have difficulties with systems with lots of cyclic attractors, each one containing a similar (but not identical) sub-dynamic. This is because we will get lots of similar intersection sequences (Definition 4). Since this definition provides the hierarchical backbone of the method, the method could struggle to identify informative partition sequences (Definition 5) or subsystems (Definition 6).
The method works best when (1) every attractor is a fixed point or (2) different cyclic sub-dynamics interact hierarchically or independently. This implies that the dynamics are partitioned in a strict hierarchical manner and there is no ambiguity when selecting subsystems. In case (1), an additional advantage is that all the fixed point attractors still occur when any asynchronous/stochastic updating scheme is used in a Boolean network model (once the model reaches a fixed point attractor, no update can cause the system to leave that attractor). Therefore, since subsystems are found directly from the attractors, the subsystems would also be the same. The situation of having all fixed point attractors is often relevant when considering cell type specification in developmental systems, where a cell must settle on one of a number of fixed states.
Methods
Boolean network models
For the purposes of this paper, a Boolean network model is a discrete time, deterministic, synchronous process acting on a directed network of v nodes V = {n1, ..., nv}. At each discrete time step t ≥ 0, each node ni ∈ V has a Boolean state si(t) ∈ {0, 1} and these collectively form a network state x = x(t) = (s1(t), ..., sv(t)). The model progresses, from one time step to the next, by synchronously updating these Boolean states according to a set of Boolean functions f = (f1, ..., fv), as follows
- x(t + 1) = f(x(t)) = (f1(x(t)), ..., fv(x(t))).
As time progresses, x(t) eventually gets trapped in an attractor A = {z0, ..., zp-1}. i.e. there is a time point t' for which
- For all t ≥ t', x(t) = zk (where k = t - t' (mod p)).
Here, each zi ∈ A is called an attractor state
For a given model, there are typically multiple attractors and these correspond to the stable dynamics of the system. For the purposes of this paper = {A1, ..., Ar} is the set of all attractors. A method to identify every attractor in a given Boolean network model is given in [13]. An example of a Boolean network model, along with a sample of its attractors, can be seen in Fig. 2.
It is possible to have models where nodes are updated in an asynchronous fashion. The method of identifying subsystems (below) can still be applied to such models as long as you focus on a specific set of attractors associated with such an updating scheme. The case where node updates are chosen stochastically is not explicitly considered here.
Algorithms
Below, we describe the main algorithms used in our new method. As can be seen in Fig. 1, there are multiple procedures/steps in our method, and so we describe each of these steps separately; namely
- Identify every intersection sequence (satisfying Definition 4),
- Identify every partition sequence (satisfying Definition 5),
- Identify every subsystem (satisfying Definition 6).
Finally, if a suitable mathematical model is available, we show how regulation sets can be found for each subsystem (satisfying Definition 7). Formal proofs for all of the procedures can be found in Additional file 1 (Main Method) and Additional file 2 (Regulation of Subsystems).
However, before describing the main procedure, we need to be able to identify the partial state sequences that occur in each attractor, given a node set N. Therefore, we first describe some procedures relating to partial state sequences.
Algorithms: Partial state sequences
Given a node set N and an attractor A = {z0, z1, ..., zp-1}, the following procedure identifies a partial state sequence that occurs in A (i.e. the 3 properties of Definition 3 are satisfied)
Procedure 1.
Initially let k = 0, b0 = 0 and = {si ∈ z0 : ni ∈ N}. The enter the following loop
Step 1: If k = p - 1, let q* = bp-1 + 1 and go to step 6
Step 2: Let j = k and increment k by 1 (let k = k + 1)
Step 3: If = {si ∈ zk : ni ∈ N}, then let bk = bj and go to step 1 (otherwise go to step 4)
Step 4: Let bk = bj + 1
Step 5: Let = {si ∈ zk : ni ∈ N} and go to step 1
Step 6: If and q* > 1, reduce q* by 1 (let q* = q* - 1)
Step 7: Let q be the smallest integer for which both
(a) q|q* (this can be q = q*)
(b) , whenever f ≤ bp-1, g ≤ bp-1 and f (mod q) = g (mod q)
Step 8: For k = 0, ..., p - 1, let bk = bk (mod q)
end of procedure
A formal proof for this procedure can be be found in Additional file 1 (see Theorem S1.4 in section S1.1.1). However, here, we give a brief justification.
At the end of this procedure occurs in A and the 3 properties of Definition 3 are satisfied. Steps 2–5 ensure that properties 1 and 2 are satisfied and the partial states in P cycle within the attractor A in the correct order (as the attractor progresses over time). Steps 3, 6, 7 and 8 ensures property 3, so that P is the smallest possible set of partial states that cycles within A. In particular
(a) Steps 3 and 6 ensures no two adjacent partial states in P are identical
(b) Step 7 ensures that if a sequence of states cycles many times within an attractor, only one copy is kept.
This leaves a partial state sequence that just describes the 'order' in which the node states change in A (for nodes in N).
This procedure can be easily modified to look at a partial state sequence (for a node set N) occurring in another partial state sequence (where M ⊇ N).
Given a node set N and a set of attractors , we need to be able to find a set of partial state sequences P1, ..., Pk that are all distinguishable from one another and optimally partition into smaller sets .
One such way is to apply the following procedure (Procedure 2). This will identify partial state sequences P1, ..., Pk and sets of attractors that satisfy properties A- F below
A: For i = 1, ..., k, Pi involves the node set N (i.e. )
B: For i = 1, ..., k, Pi occurs in every attractor A ∈ i
C: For i = 1, ..., k, Pi does not occur in any attractor A ∉ i
D: For any i, j (1 ≤ i <j ≤ k),
E:
F: Given the node set N, there are no other partial state sequences P' ∉ {P1, ..., Pk} that occur in any attractor A ∈ (unless P' and some Pi contain the same partial states in the same order, within a rotation)
Procedure 2.
Begin with the node set N and set of attractors and then carry out the following steps
Step 1: For every attractor Aj ∈ , apply Procedure 1 to N and Aj, to get a partial state sequence Qj that occurs in Aj.
Step 2: Put the Qj's into groups i = 1, ..., k, whereby two partial state sequences , go in the same group ⇔ is equivalent to (i.e. contains the same partial states in the same order, within a rotation). Here, k is the minimum number of groups required to hold every Qj.
Step 3: For each group, i, let
i) Pi = any Qj in the group i
ii) i = {Aj : Qj is part of the group i}
end of procedure
A formal proof for this procedure can be be found in Additional file 1 (see Theorem S1.10 in section S1.1.3). However, here, we give a brief justification of why properties A – F are satisfied at the end. If a partial state sequence occurs in an attractor A then so will q - 1 other equivalent partial state sequences that contain the same partial states in the same order (within a rotation). e.g. Moreover, if P occurs in an attractor A then no other partial state sequences for the same node set N can (other than the equivalent ones). Then, because
- Each partial state sequence Qj (from Step 1) occurs in the attractor Aj and involves a node set N,
- Partial state sequences found in Step 1 are only grouped together (in Step 2) if they are equivalent,
- Attractors are only grouped together in Step 3, if equivalent partial state sequences occur in them,
properties A, B, C and F must be satisfied. Properties D and E are satisfied because each attractor is put into exactly one set in Step 3.
Algorithms: Intersection sequences (Stage 1)
Identifying every intersection sequence is equivalent to finding every partial state sequence that satisfies the 3 properties of Definition 4 (for some set of attractors , say).
The method for identifying every intersection sequence can be visualised by considering the tree in Fig. 8. Searching through a tree analogous to this one (for a network with nodes V = {n1, ..., nv}), means that every node set N can be visited at some point. Then, using Procedure 2, we can identify the partial state sequences that occur in each attractor (for each node set N). Then, after the tree has been fully examined, we can pick out the partial state sequences (P) and sets of attractors that satisfy the 3 properties of Definition 4. In reality, many branches of the tree can be ignored, leading to improvements in efficiency (discussed below).
We first give the procedure for identifying every intersection sequence, then give an example and finally discuss ways to make the process more efficient.
Procedure 3
First, consider the tree in Fig. 8 and note that for every node set N, there exists a path from left to right (starting at '-') that corresponds to it. Therefore, searching through a tree analogous to the one in Fig. 8 (for a network with nodes V = {n1, ..., nv}), every node set N can be visited at some point.
The procedure searches through the tree (as in Fig. 8) and carries out the following steps for each node set N. At the end of the procedure the set S contains every intersection sequence
Step 0 (Initialisation): Let S = ∅ and let N = ∅ (-)
Step 1: Move onto the next node set N in the tree (as in Fig. 8).
Step 2: For the node set N, apply Procedure 2 to identify partial state sequences P1, ..., Pk and sets of attractors satisfying
A: For i = 1, ..., k, Pi involves the node set N (i.e. )
B: For i = 1, ..., k, Pi occurs in every attractor A ∈ i
C: For i = 1, ..., k, Pi does not occur in any attractor A ∉ i
D: For any i, j (1 ≤ i <j ≤ k),
E: (the set of all attractors)
F: Given the node set N, there are no other partial state sequences P' ∉ {P1, ..., Pk} that occur in any attractor A ∈
Step 3: For i = 1, ..., k, add the pair {Pi, i} to the set S
Step 4: For i = 1, ..., k, check S to see if there is any pair {} for which either of the following are true
(a) M ⊂ N and
(b) M ⊃ N and
If (a) is true, remove {Q, } from S. If (b) is true, remove {Pi, i} from S
Step 5: If the tree has been completely searched, end procedure. Otherwise, return to step 1.
end of procedure
A formal proof for this procedure can be be found in Additional file 1 (see Theorem S1.14 in section S1.2.1). However, here, we give a brief justification.
At the end of the procedure, S gives a complete set of intersection sequences (satisfying the 3 properties of Definition 4). Step 2 ensures every partial state sequence that satisfies properties 1 and 2 are identified for each node set N. Step 4 then ensures that only those satisfying property 3 remain in S.
As an example of how the procedure finds intersection sequences, consider the model in Fig. 2 and the intersection sequence P3 (from Fig. 3). When the node set N = {n2, n3, n4} is analysed in Step 2 of the algorithm (using Procedure 2) we find two distinct partial state sequences that satisfy properties 1 and 2 of Definition 4, namely
- P3 that intersects at 3 = {A1, A2, A3}
- Px = {s2 = 0, s3 = 0, s4 = 0} that intersects at x = {A4}
The remainder of the algorithm involves deciphering whether of these partial state sequences satisfy the final property (Property 3 of Definition 4). This can be done by looking at the way the attractors are split for different node sets. For the node set N = {n2, n3, n4} (above) we split the attractors as follows - {A1, A2, A3} + {A4}. For larger nodes sets M ⊃ N
M1 = {n1, n2, n3, n4} splits the attractors as follows - {A1, A2} + {A3} + {A4}
M5 = {n2, n3, n4, n5} splits the attractors as follows - {A1} + {A2} + {A3} + {A4}
M6 = {n2, n3, n4, n6} splits the attractors as follows - {A1} + {A2} + {A3} + {A4}
etc
Therefore, when P3 and 3 = {A1, A2, A3} are considered in Step 4, it can never be removed from the set S. This is because when a larger node set M ⊃ N is considered, we will never have a set of attractors (or ). Therefore, P3 satisfies the property 3 and is an intersection sequence. However, Px is not an intersection sequence since the corresponding set of attractors ({A4}) is still associated with a larger node set M ⊃ N. Therefore it will be removed from the set S in Step 4 (either when N or M is analysed in the procedure).
We now explain how the procedure can be made more efficient
Improving efficiency(1)
Consider node sets N ⊂ M and two partial state sequences and that both occur in some attractor A. Then, P' must contain P and hence only occur in an attractor whenever P does (proved in Lemma S1.12 of Additional file 1).
Therefore, if N ⊂ M ⊂ V, and P only occurs in a single attractor A*, neither P nor P' can be intersection sequences. This is since the attractor A* is itself a partial state sequence (for a larger node set V) that occurs in exactly the same set of attractors (i.e. just A*), and so property 3 of Definition 4 fails.
Therefore, if Step 2 of Procedure 3 identifies a partial state sequence that only occurs in a single attractor i = {A*}, we know that re-analysing this attractor {A*} for any node set M ⊃ N is pointless.
We show one way in which this knowledge can improve efficiency in Procedure 3. Suppose, attractors were returned as single attractors in Step 2, when analysing an earlier node set P ⊆ N (including P = N). Then, when a path is extended to the right in the tree (Fig. 8) from a node set N to a node set M ⊃ N, Procedure 2 in Step 2 need only be applied to the smaller set of attractors
Moreover, if every attractor is returned as a single attractor in Step 2, when analysing an earlier node set P ⊆ N (including P = N), there is no need to extend the path to look at node sets M ⊃ N. In Fig. 8, this is equivalent to ignoring all longer paths that include extra nodes to the right. For example, if N = {n1, n3}, there would be no need to look at longer paths (from left to right) that give node sets M = {n1, n3, n4}, M = {n1, n3, n5} or M = {n1, n3, n4, n5}.
However, because all of the attractors are intersection sequence, the full node set V = {n1, ..., nv} should still be fully analysed in Steps 2 – 4 (possibly at the very end of the procedure).
Improving efficiency(2)
As can be seen in Fig. 8, some nodes appear less than others, with the least frequent nodes visited earlier in the tree. Therefore, it is likely to be advantageous to re-index nodes in the tree during the search. At any stage during the search, nodes along paths to the right (from a node set N) can be re-indexed without impairing our ability to search the tree. For example, once N = {n1, n3} has been reached, re-indexing nodes {n4, n5} to {n5, n4} still allows us to reach the same node sets M = {n1, n3, n4}, M = {n1, n3, n5} and M = {n1, n3, n4, n5}, as before. However, they would be visited in a different order (M = {n1, n3, n5} then M = {n1, n3, n4} then M = {n1, n3, n4, n5}).
Once a node set N has been analysed, re-indexing so that the next node nj to be visited maximises c (below) will speed up the search
- For the sets of attractors identified in Step 2 (for the new node set M = N ∪ {nj}), i = {Ai} is a single attractor for c (≤ k) different values of i
Although this involves carrying out Step 2 multiple times (to compare different nj's), selecting an nj that gives lots of single attractor i's will mean less analysis later on (as discussed above). The quicker we can reach a stage where every set i in Step 2 is a single attractor, the more of the tree can be ignored during the rest of the search.
Algorithms: Partition sequences (Stage 1)
In order to identify every partition sequences, we start with the full set of intersection sequences and all the corresponding sets of attractors (obtained from Procedure 3). i.e. every pair {P, } such that P intersects at . Then, these intersection sequences are used to identify all the partial state sequences that satisfy any of properties A, B or C of Definition 5. Using the complete list of intersection sequences, properties A, B and C are considered independently and partition sequences stored after each test.
Part A: Core components
From Procedure 3, we get a set S that contains the complete set of intersection sequences, along with the set of attractors each one intersects at (if {P', } S, then P' intersects at ).
Using this set S as an input, the following procedure identifies every partial state sequence that is core to some set of attractors (i.e every partial state sequence P satisfying Definition 5A)
Procedure 4. A
Initially, let the set T = ∅ (empty set)
Then, for every intersection sequence , carry out the following steps
Step 1: From the complete set of intersection sequences (S), identify every Qi (for the node set Mi) for which
(a) Qi intersects at i, where
(b) There is no intersection sequence Q* (for a larger node set M* ⊃ Mi) that intersects at
Step 2: Let k be the number of partial state sequences from Step 1
Step 3: Let N = M1 ∩ ... ∩ Mk (N ⊆ N' since P' is itself identified in Step 1)
Step 4: If N ≠ ∅ in Step 3, find a partial state sequence that occurs in P' (see Procedure 1).
Step 5: If N ≠ ∅ in Step 3, add the pair {P, } to the set T
end of procedure
At the end of the procedure, T contains every partial state sequence P that is core to some set of attractors (i.e every partial state sequence P satisfying Definition 5A). Essentially, we take each intersection sequence P' in turn and then re-run though the set of all intersection sequences to find those (Qi, for a node set Mi) that satisfy the following
(a) Qi co-occurs with P' in at least one attractor (Qi can be P').
(b) Qi isn't contained in a larger intersection sequence (Q*, say) that co-occurs with P' in the same attractors
Then, the node set N = M1 ∩ ... ∩ Mk is the core set of nodes underlying P' and the attractors it occurs in (). The partial state sequence P that satisfies property A is then just partial state sequence (for the node set N) that occurs in P'
A formal proof for this procedure can be be found in Additional file 1 (see Theorem S1.17 in section S1.2.2). However, here, we give a brief justification.
The 3 properties of Definition 5A are satisfied, for all the identified partial state sequences P. Property 2 is satisfied because of the following. If an intersection sequence Q (for a node set M) intersects at a set of attractors (where ), then there is an intersection sequence Q' (for a node set M') identified in step 1 for which
- M ⊆ M' ⊆ N (because of Steps 1 and 3)
- Q' occurs in every attractor A ∈ (because of Step 1)
and so
- M ∪ N ⊇ M'
- Q' occurs in every attractor A ∈
Property 1 is satisfied because of Step 4. Property 3 is satisfied because of the way N is chosen in Step 3. We note that every partial state sequence satisfying Definition 5A is identified because (a) Property 1 implies that such a sequence must occur in an intersection sequence and (b) every intersection sequence is analysed independently.
Part B: Exclusive (Procedure 4B)
This is simply done by searching through all intersection sequences (in S) and identifying those that satisfy Definition 5B.
Part C: Independently Oscillating (Procedure 4C)
This is simply done by searching through every pair of intersection sequences (in S) to see which pairs satisfy Definition 5C. Where such a pair is found, both of them are partition sequences.
Algorithms: Subsystems (Stage 2)
In order to identify every subsystem, we start with the full set of partition sequences from Stage 1. i.e. every partial state sequence satisfying properties either A, B and C of Definition 5, The following procedure identifies every subsystem (satisfying Definition 6)
Procedure 5.
Initially, let the set U = ∅ (empty set)
Then, for every partition sequence (identified in A, B or C of the previous section), carry out the following steps
Step 1: From the complete set of partition sequences, identify every partition sequence Pi (for a node set Mi) for which
(a) Mi ⊂ M
(b) Pi and P both co-occur in some attractor A
(This also implies that Pi occurs in P)
Step 2: Let k be the number of partition sequences from Step 1
Step 3: Let N = M\(M1 ∪ ... ∪ Mk)
Step 4: If N ≠ ∅, identify that occurs in P (see Procedure 1)
Step 5: If N ≠ ∅, add S (identified in Step 4) to the set U
end of procedure
For each partition sequence P (for a node set M, say), we look to see which other partition sequences (for node sets ) occurs within it. Then the node set N, which is unique to P, can be calculated as N = M\(M1 ∪ ... ∪ Mk). If N ≠ ∅, the partial state sequence S (for the node set N) that occurs in P, is stored as a subsystem.
A formal proof for this procedure can be be found in Additional file 1 (see Theorem S1.19 in section S1.3). However, here, we give a brief justification.
At the end of the procedure, U contains every subsystem P satisfying the 3 properties of Definition 6. Essentially, property 1 is satisfied because of Step 4. Property 2 is satisfied because of Step 1 and the choice of N in step 3. Property 3 is satisfied because N is the largest set for which Mi ∩ N = ∅ for all i = 1, ..., k. We note that every partial state sequence satisfying Definition 6 is identified because (a) Property 1 implies that such a sequence must occur in an partition sequence and (b) every partition sequence is analysed independently.
Algorithms: Regulation of subsystems
In order to show how regulation sets can be identified for each subsystem, we first focus on how partial states are regulated in Boolean network models (although all the methods are applicable to other discrete-state discrete-time logical models).
It may be that a partial state xN controls some Boolean functions {fi : ni ∈ M} and ensure the occurrence of yM in the following time step. i.e xN is a predecessor of yM (or xN triggers the occurrence of yM). For example, in Fig. 2, by following the Boolean functions it is possible to say that
- The occurrence of = {s2 = 1, s3 = 1} can be triggered by the occurrence of = {s1 = 1, s2 = 1}.
- The occurrence of = {s1 = 1, s2 = 1} can be triggered by the occurrence of = {s1 = 1}.
or = {s1 = 1} triggers the occurrence of = {s2 = 1, s3 = 1} after two time steps.
This notion of predecessors can be extended to look for predecessors within attractors and subsystems. Going backwards around an attractor it is possible to identify which partial states are responsible for triggering the occurrence of other partial states in the same attractor state at a later point in time. Returning to the example in Fig. 2, and looking at attractor A1, it is possible to go backwards around the attractor via two different routes to say
Route 1:
The occurrence of {s2 = 1, s3 = 1} in z0 can be triggered by the occurrence of {s1 = 1, s2 = 1} in z3
The occurrence of {s1 = 1, s2 = 1} in z1 can be triggered by the occurrence of {s1 = 1} in z2
The occurrence of {s1 = 1} in z2 can be triggered by the occurrence of {s1 = 1} in z1
The occurrence of {s1 = 1} in z3 can be triggered by the occurrence of {s1 = 1} in z0
Route 2:
The occurrence of {s2 = 1, s3 = 1} in z0 can be triggered by the occurrence of {s2 = 1, s3 = 1} in z3
The occurrence of {s2 = 1, s3 = 1} in z1 can be triggered by the occurrence of {s2 = 1, s3 = 1} in z2
The occurrence of {s2 = 1, s3 = 1} in z2 can be triggered by the occurrence of {s2 = 1, s3 = 1} in z1
The occurrence of {s2 = 1, s3 = 1} in z3 can be triggered by the occurrence of {s2 = 1, s3 = 1} in z0
and so we have
1. {s1 = 1} triggers the occurrence of {s2 = 1, s3 = 1} in the attractor state z0
2. {s2 = 1, s3 = 1} triggers the occurrence of {s2 = 1, s3 = 1} in the attractor state z0
(Note: Similar conclusions could also be made for z1, z1 and z3)
So, more generally, given a partial state yM that occurs in an attractor state z ∈ A, we want to be able to find a suitable collection of predecessors (that trigger the occurrence of yM in z).
Procedure 6. Given a partial state yM that occurs in an attractor state z ∈ A, this procedure identifies partial states that trigger the occurrence of yM in z.
Starting from the partial state yM in z, this procedure involves going backwards around the attractor A (as in the above example) and identifying new predecessors at each step (using a method adapted from [13]). The procedure ends when no more new predecessors (that trigger the occurrence of yM in z) can be found.
Since much of this method involves adapting a previously published algorithm (from [13]), we do not give full details here. More details of this procedure can be found in section S2.1 of Additional file 2 (in particular, Procedures S2.8 and S2.11)
end of procedure
We now want to apply these procedures to subsystems. In particular, we want to find collections of subsystems whose co-occurrence in an attractor triggers chain of interactions that results in the occurrence of Sy.
Returning to the above example (from Fig. 2), we carry out Procedure 6 on S1 = {s2 = 1, s3 = 1} in all of the attractor states in A1, A2 and A3 (which contain S1), which gives the following results.
A1 and A2: S1 = {s2 = 1, s3 = 1} can trigger the occurrence of S1 = {s2 = 1, s3 = 1} in the attractor states z0, z1, z2 and z3
A1 and A2: S2 = {s1 = 1} can trigger the occurrence of S1 = {s2 = 1, s3 = 1} in the attractor states z0, z1, z2 and z3
A3: S1 = {s2 = 1, s3 = 1} can trigger the occurrence of S1 = {s2 = 1, s3 = 1} in the attractor states z0 and z1
Therefore, by looking at all of the attractor states (individually) in each attractor, we can say that
(a) S2 can trigger the occurrence of S1 in attractors A1 and A2.
(b) S1 can trigger the occurrence of itself (S1) in attractors A1, A2 and A3.
This is then sufficient to give a regulation set for S1; namely 1 = {S1} and 2 = {S2}. In this very simple example, the same subsystems/partial states are involved in every attractor state within an attractor. However, it may be case that different subsystems are involved in triggering different partial states (from a subsystem Sy) in different attractor states, and so it is necessary to consider each attractor state individually.
Procedure 7 (below) demonstrates a method of identifying a regulation set for a subsystem . This is done by looking at every attractor state in every attractor (containing Sy) to see how what triggers the occurrence of each relevant partial state .
However, since partial states within subsystems could be subject to different time lags in different attractors (see Definition 3), we first note that we need to consider the precise dynamics (the instances) of subsystems in attractors. i.e.
Definition 9. Consider a collection of subsystems = {S1, ..., Sf} where every involves a node set Ni and occurs in the attractor A = {z0, ..., zp-1}
The instance of in A is the partial state sequence , where
1. M = N1 ∪ ... ∪ Nf
2. For k = 0, ..., p - 1, = {sx ∈ zk : nx ∈ M}.
We can then use this terminology to describe whether a collection of subsystems = {S1, ..., Sf} triggers an individual subsystem Sy in an attractor A. i.e If the co-occurrence of subsystems S1, ..., Sf ensures the occurrence of Sy in A.
Definition 10. Suppose we have
1. An attractor A = {z0, ..., zp-1}
2. A collection of subsystems where
(a) all occur in A
(b) is the instance of x in A
3. An individual subsystem Sy where
(a) Sy occurs in A
(b) is the instance of Sy in A
Then x triggers the occurrence of Sy in A if the following holds for every i ∈ {0, ..., p - 1}
- triggers the occurrence of in the attractor state zi ∈ A.
We now give the procedure for identifying a regulation set of a subsystem Sy (Procedure 7). At the start of the procedure, we take a subsystem Sy (involving a node set My) and a set of attractors y, where
(a) Sy occurs in every attractor A ∈ y
(b) Sy does not occur in any attractor A ∉ y
Moreover, we assume we know every subsystem , along with the node set involved (N) and a list of attractors it occurs in. Each subsystem and node set N is a by-product of the method of identifying subsystems (described previously in this Methods section). A list of attractors can be found by applying Procedure 2 to the node set N (and the full set of attractors). This information is used in Step 2 of the procedure (below).
Procedure 7.
Initially, let the set R = ∅ (empty set)
For every Ai = {z0, ..., zp-1} ∈ y carry out the following steps.
Step 1: Let the set Ri = ∅. Let the sets U0, ..., Up-1 = ∅
Step 2: Identify every subsystem T1, ...., Th that occurs in Ai. Moreover, let M1, ..., Mh be the node sets involved in T1, ...., Th (resp)
Step 3: Identify the instance of Sy in Ai. i.e. .
(The procedure for this is obvious from Definition 9, given the node set My and attractor Ai)
Step 4: For j = 0, ..., p - 1, carry out Procedure 6 to identify predecessors of in zj ∈ Ai (i.e. the partial states that trigger the occurrence of in zj). The resulting predecessors are added to the set Uj
Step 5: For every possible combination of partial states satisfying
- ∈ Uj (for j = 0, ..., p - 1)
carry out the following
(a) Let N = N0 ∪ ... ∪ Np-1
(b) Let = {Ta : Ma ∩ N ≠ ∅}
(c) Add to the set Ri
Step 6: Remove all subsystem collections from Ri that contain other subsystem collections ' ∈ Ri. (i.e. )
Step 7: Add the subsystem collections in Ri to the set R
end of procedure
At the end of this procedure, every subsystem collection ∈ Ri triggers the occurrence of Sy in the attractor Ai (i.e. Definition 10 is satisfied). For every attractor state zj ∈ Ai, we identify the partial state ∈ Sy that occurs in zj (Step 3). Then, starting from the partial state in the attractor state zj, we go backwards around the attractor, identifying suitable predecessors at each time step (this backwards process can go multiple times around the attractor). This will then give us a list of partial states that can trigger the occurrence of in zj (added to Uj Step 4). Having done this for every attractor state zj ∈ Ai, Step 5 then pulls the results together to find collections of subsystems = {S1, ..., Sf} that triggers the occurrence of Sy in A (i.e. those collections that satisfy Definition 10). In particular, finding collections for which the following is true.
For every attractor state zj ∈ Ai, there exists a partial state that satisfies
(a) triggers the occurrence of in zj
(b) only involves nodes and states from the subsystems {S1, ..., Sf}
Step 6 just ensures that only the most informative collections are kept and there is no redundancy. Finally, since these collections of subsystems are added to the set R in Step 7, for every attractor (which contains Sy), R = {} is a regulation set and satisfies the properties of Definition 7.
A formal proof for this procedure can be be found in Additional file 2 (see Theorem S2.20 in section S2.2.2). However, this procedure shows just one way of a regulation set, for a subsystem Sy. There may be more than one possible regulation set for a subsystem and some may be more descriptive than others. In Section S2.2.1 and S2.2.2 of Additional file 2, we give some some extra constraints (and corresponding procedures) that can be applied when looking for regulation sets.
Authors' contributions
DI designed the method of identifying subsystems, co-conceived the study and drafted the manuscript. NM co-conceived the study and drafted the manuscript. All authors read and approved the final manuscript.
Supplementary Material
Contributor Information
David J Irons, Email: d.irons@sheffield.ac.uk.
Nicholas AM Monk, Email: nick.monk@nottingham.ac.uk.
Acknowledgements
We would like to thank C. Cannings, S. Bullock and K. Walters for critical reading. This work was supported by funding from BBSRC (grant ref: 02/B1/B/08370) and EPSRC (grant ref: EP/D003105/1).
References
- Hartwell L, Hopfield J, Leibler S, Murray A. From molecular to modular cell biology. Nature (supp) 1999;402:C47–C52. doi: 10.1038/35011540. [DOI] [PubMed] [Google Scholar]
- Yu H, Gerstein M. Genomic analysis of the hierarchical structure of regulatory networks. PNAS. 2006;103:14724–14731. doi: 10.1073/pnas.0508637103. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Newman M, Girvan M. Finding and evaluating community structure in networks. Phys Rev E. 2004;69:026113. doi: 10.1103/PhysRevE.69.026113. [DOI] [PubMed] [Google Scholar]
- Shen-Orr S, Milo R, Mangan S, Alon U. Network motifs in the transcriptional regulation network of Escherichia coli. Nat Genet. 2002;31:64–68. doi: 10.1038/ng881. [DOI] [PubMed] [Google Scholar]
- Resendis-Antonio O, Freyre-González J, Menchaca-Méndez R, Gutiérrez-Rios R, Martinez-Antonio A, Ávila-Sánchez C, Collado-Vides J. Modular analysis of the transcriptional regulatory network of E. coli. Trends Genet. 2005;21:16–20. doi: 10.1016/j.tig.2004.11.010. [DOI] [PubMed] [Google Scholar]
- Ingram P, Stumpf M, Stark J. Network motifs: structure does not determine function. BMC Genomics. 2006;7:108. doi: 10.1186/1471-2164-7-108. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Bar-Joseph Z, Gerber G, Lee T, Rinaldi N, Yoo J, Robert F, Gordon D, Fraenkel E, Jaakkola T, Young R, Gifford D. Computational discovery of gene modules and regulatory networks. Nat Biotech. 2003;21(11):1337–1342. doi: 10.1038/nbt890. [DOI] [PubMed] [Google Scholar]
- Kato M, Hata N, Banerjee N, Futcher B, Zhang M. Identifying combinatorial regulation of transcription factors and binding motifs. Genome Biol. 2004;5(8):1–13. doi: 10.1186/gb-2004-5-8-r56. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Chen K, Calzone L, Csikasz-Nagy A, Cross F, Novak B, Tyson J. Integrative Analysis of Cell Cycle Control in Budding Yeast. Mol Biol Cell. 2004;15:3841–3862. doi: 10.1091/mbc.E03-11-0794. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Albert R, Othmer H. The topology of the regulatory interactions predicts the expression pattern of the segment polarity genes in Drosophila. J Theor Biol. 2003;223:1–18. doi: 10.1016/S0022-5193(03)00035-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Chaves M, Albert R, Sontag E. Robustness and fragility of Boolean models for genetic regulatory networks. J Theor Biol. 2005;235:431–449. doi: 10.1016/j.jtbi.2005.01.023. [DOI] [PubMed] [Google Scholar]
- Ma W, Lai L, Ouyang Q, Tang C. Robustness and modular design of the Drosophila segment polarity network. Mol Syst Biol. 2006;2:70. doi: 10.1038/msb4100111. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Irons D. Improving the efficiency of attractor cycle identification in Boolean networks. Physica D. 2006;217:7–21. doi: 10.1016/j.physd.2006.03.006. [DOI] [Google Scholar]
- Grossniklaus U, Pearson R, Gehring W. The Drosophila sloppy paired locus encodes two proteins involved in segmentation that show homology with mammalian transcription factors. Genes & Dev. 1992;6:1030–1051. doi: 10.1101/gad.6.6.1030. [DOI] [PubMed] [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.