Abstract
We show how a graph algorithm for finding matching labeled paths in pairs of labeled directed graphs can be used to perform model invalidation for a class of dynamical systems including regulatory network models of relevance to systems biology. In particular, given a partial order of events describing local minima and local maxima of observed quantities from experimental time series data, we produce a labeled directed graph we call the pattern graph for which every path from root to leaf corresponds to a plausible sequence of events. We then consider the regulatory network model, which can itself be rendered into a labeled directed graph we call the search graph via techniques previously developed in computational dynamics. Labels on the pattern graph correspond to experimentally observed events, while labels on the search graph correspond to mathematical facts about the model. We give a theoretical guarantee that failing to find a match invalidates the model. As an application we consider gene regulatory models for the yeast S. cerevisiae.
Keywords: regulatory networks, switching systems, time series, dynamics
AMS subject classifications. 37N25, 37N30
1. Introduction.
One of the fundamental challenges, as we move toward an era of data driven science, is how to make use of imprecise data to select or reject models and parameters that cannot be derived from first principles. Motivated by problems from systems biology we address this challenge in the context of oscillatory data under the assumption that reasonable models prescribe appropriate local behavior of trajectories. We adopt the following strategy. From experimental time series data we extract a partial order of events describing minima and maxima of observed quantities. On the modeling side, as a function of parameters, we construct a directed graph to catalogue the possible dynamics. The main result of this paper is an efficient algorithm to identify if the model dynamics is capable of exhibiting sequences of minima and maxima that are consistent with the experimental data. Failure can then be used for model rejection or parameter reduction.
To provide more detail we consider a particular example. High throughput experimental technology is making the collection of time series of gene expression a routine process. However, this data is noisy, often contains significant measurement error, is typically collected at a coarse time scale, and generally is collected over a relatively short time span. In an attempt to extract robust information from such data we focus on the ordering of extremal events. This paper does not address the difficulty of detecting and eliminating spurious pairs of extrema in data—a challenging problem in its own right. Instead, we assume that a statistically valid procedure is used to identify or impose time intervals during which a local maximum or minimum has occurred. This renders the time series into a set of extremal events where the error bounds determine the time intervals associated with the individual maxima and minima. If two intervals do not overlap, then we can distinguish the relative timing between the associated events, but if they do overlap, we cannot. Therefore, we represent relative timing as a partially ordered set (poset) that we call the poset of extrema (see Definition 3.1). We assume, however, that there is a linear temporal ordering along which the extrema occur and our lack of knowledge is due to experimental constraints. Consequently, we adopt the hypothesis that one of the linear extensions of the poset of extrema represents the correct sequence of events.
Because gene expression data is noisy and often collected at a coarse time scale, in practice there are many nodes in the poset of extrema that are incomparable (i.e., the time ordering cannot be resolved from the data). Roughly speaking, each such ambiguity leads to an additional multiplicative factor in the number of possible linear extensions. Therefore, in general we expect that the set of all linear extensions will be large—in fact, exponential in the number of genes and exponential in the length of the time series. To overcome this ostensibly intractable situation we construct a labeled directed acyclic graph that we call the pattern graph (see Definition 3.2), which gives a compressed representation of the set of all linear extensions. This directed graph is given by the transitive reduction of the lattice of down sets of the poset of extrema, with labels on nodes and edges identifying which variables are increasing or decreasing or have reached extrema. As explained in section 2.3.1, for a fixed number of genes the time to compute the pattern graph is only polynomial in the length of the time series, and the resulting labeled directed acyclic graph may be stored in linear space. This gives rise to efficient (i.e., polynomial time) algorithms for pattern matching among the exponential number of linear extensions.
The poset of extrema and the pattern graph represents the codification of the experimental data. Viewed abstractly this is just a means of formally capturing potential temporal ordering of experimentally observable phenomena and therefore these ideas are potentially applicable to a wide variety of problems within and outside of the life sciences.
Returning to the example of gene regulation, we observe that this is an extremely complex multiscale process and thus it is not reasonable to postulate a precise nonlinear model that describes its behavior. However, as indicated above, we assume that we can make assumptions concerning the local qualitative behavior of the dynamics. With this in mind we introduce the following notion.
Definition 1.1.
A system of trajectories on a space X is a collection of continuous functions from closed intervals to X, i.e., x: [a, b] → X for some a, b ∈ R, called trajectories, such that
the restriction of any trajectory to a smaller closed interval is again a trajectory;
a concatenation of trajectories is again a trajectory (more precisely, if x: [a, b] → X and y: [c, d] → X are trajectories such that x(b) = y(c), then there exists a trajectory z: [0, (b – a) + (d – c)] → X and any time translation of z, that agrees with x on [0, b − a] and y on [b − a, (b − a) + (d − c)]);
the time translation, i.e., x(t −Δt), of any trajectory is again a trajectory; and
every map x:{0} → X is a trajectory.
Notice that the set of solutions to a smooth system of ordinary differential equations can be written as a system of trajectories.
A heuristic description of how we employ this concept (see section 3.2 for formal definitions) is as follows. The phase space X is decomposed into a finite number of rectangular domains . Restricted to each of these rectangular domains each individual trajectory is monotone in every variable (note that even within a domain we are not assuming the same directions of monotonicity for each trajectory). The boundaries of the domains are called walls. On the walls the trajectories are allowed to achieve a local extremum with respect to at most one variable, which we call an extremal event. A system of trajectories that satisfies these conditions is called extrema-pattern-matchable with respect to . The dynamics is then recorded as a labeled directed graph, called the search graph (see Definition 3.6), where vertices correspond to domains and edges are determined by wall trajectories. The labeling acts on nodes and edges and is used to codify our knowledge as to which variables are increasing or decreasing or have reached extrema. Thus, the search graph formalizes the structure of the dynamics that can be expressed by a model that generates the system of trajectories.
The content of this paper is the development of an efficient means of comparing the model dynamics against the experimentally observed dynamics. The fundamental result is Theorem 2.5, which provides a polynomial time algorithm for matching maximal paths in the pattern graph, i.e., a specific ordering of extrema events that is compatible with the experimental data, to paths in the search graph, i.e., trajectories that are realizable by the model for the dynamics, where the matching preserves the labeling. A simplistic description of the applicability of this result is as follows: if the algorithm fails to produce a matching, then this provides a guarantee that the dynamics incorporated in the system of trajectories is incapable of reproducing the sequences of extrema that are compatible with the data. As a consequence we can reject the associated model of dynamics.
Of course, to apply these ideas to realistic problems is much more challenging. While we do not know a particular model that describes the observed dynamics we assume that the dynamics can be modeled by an unknown nonlinear system. One of the fundamental lessons of the theory of dynamical systems is that structure of invariant sets of nonlinear systems can change dramatically as a function of parameters. Thus, to achieve the claims of the title of this paper, we need both a systematic method for generating parameterized models and their associated search graphs, and a robust finite characterization of dynamics that can be computed over all parameter values. For this we make use of the recently developed Dynamics Signatures Generated by Regulatory Network (DSGRN) framework and software [4, 15].
The starting point for DSGRN is a regulatory network RN (see Definition 4.1). In the context of gene regulation this is an annotated directed graph in which the nodes represent genes, edges indicate the interaction between the genes, and the annotation indicates if the interaction involves activation or repression. It is also assumed that a logic, dictating how information received at each node is processed, is given. Given a regulatory network with N nodes and |E| edges DSGRN represents dynamics occurring on the phase space X = (0, ∞)N where the dynamics is parameterized by an N + 3 · |E| dimensional set Z ⊂ (0, ∞)N+3·|E|. In particular, DSGRN computes a finite decomposition of Z where the individual regions are given by explicit semialgebraic sets. DSGRN represents parameter space via an undirected graph PG, called the parameter graph, where the nodes correspond to the above mentioned regions of Z and edges provide adjacency information. The dynamics is represented by a directed graph, called a state transition graph, derived from a parameter dependent rectangular decomposition of X. A fundamental fact is that as a function of parameters the state transition graph is constant over nodes of PG, i.e., it does not change on the individual regions of the decomposition of parameter space. The state transition graph can be large, and therefore DSGRN condenses the information into a directed acyclic graph MG, called a Morse graph. The nodes of the Morse graph correspond to maximal recurrent subgraphs of the state transition graph and the edges indicate reachability, via the state transition graph, from one Morse node to another. The output of DSGRN is called the DSGRN database, which is organized around the parameter graph PG. In particular, for each node in PG the database provides the explicit semialgebraic set in parameter space and the associated Morse graph MG.
Observe that, as desired, DSGRN provides us with a finite description of global dynamics over parameter space. However, the dynamics is described in combinatorial terms and we would like to argue that we are comparing the ordering extrema of continuous trajectories against experimental data. To make this comparison as transparent as possible we use the information encoded in the state transition graph to construct a particularly simple system of trajectories based on classical switching system models [5, 6, 7, 8, 11, 12, 13, 14].
However, we claim that the results for the switching systems immediately extend to a broader class of models. In particular, as is made explicit in the proof of Theorem 4.8, the qualitative properties of the trajectories of alone are sufficient to obtain an extreme-pattern-matchable system of trajectories. Thus, our results apply any model that produces trajectories with the same monotonicity properties as that of the switching system.
This paper is organized as follows. In section 2 we begin by recalling ideas from and establishing notation associated with graphs and posets. We present the algorithms that underlie our approach to matching model dynamics with experimental data and provide worst case complexity bounds for these algorithms.
In section 3 we provide combinatorial formalizations of the experimental data, the dynamics of the models, and the relation that allows us to compare them. To be more specific, in section 3.1 we show how experimental data can furnish a poset of extrema P. Interest in the set of all linear extensions leads us (by Theorem 2.10) to construct the down set graph of the poset of extrema. We label each down set according to whether a function that has experienced the events in the down set but not the events not included in the down set is increasing or decreasing in each variable. We label the edges in the down set graph (which are of the form A → A ∪ {p}) according to the extremal event associated with p ∈ P. We also introduce self-edges that are labeled as not experiencing any extremal events. We call the resulting labeled directed graph the pattern graph.
In section 3.2, we describe a class of dynamical models for which we can characterize possible trajectories in a combinatorial manner via a domain graph. A domain graph discretizes a dynamical system by giving a finite set of domains separated by codimension-1 walls. Vertices in the domain graph correspond to domains, and edges correspond to flow from one domain to another via a wall. We label the vertices of the domain graph according to whether the coordinate functions xi(t) are increasing, decreasing, or possibly both, in the associated domains. We label the edges of the graph according to which local minima or local maxima could occur on the associated walls. We call the resulting labeled directed graph a search graph.
In section 3.3, we present a matching relation between the pattern graph and the search graph. We prove (Theorem 3.8) that if there does not exist a match between a path from root to leaf of the pattern graph and a path in the search graph, then the dynamical model underlying the search graph is incompatible with the experimental observations leading to the pattern graph. Theorem 2.5 shows that we can decide whether such a match exists in polynomial time.
Finally, in section 4 we show how these ideas can be applied. We begin in section 4.1 with a brief review of the mathematical structure underlying DSGRN. In section 4.2 we provide a simple example of how one can pass from experimental time series data to a labeled pattern graph. Finally, in section 4.3 we apply these techniques to a simple wavepool model [19] for the metabolic cycle in S. cerevisiae. Courtesy of the Haase lab [18] we have experimental time series data for mRNA sequences associated with the genes SWI4, HCM1, NDD1, and YOX1 collected at time intervals of 5 minutes (see Figure 4). We take a biologically implausible model and show that our proposed techniques reject it and we take a model that is biologically acceptable and use our techniques to greatly constrain relations between parameters.
2. Graph theory and algorithms.
2.1. Matching paths in labeled graphs.
Definition 2.1.
Given a finite set Σ, we denote by Σn the set of n-tuples consisting of elements of Σ. We denote by Σ* the set of all finite tuples of elements of Σ, i.e., .
For the purpose of this paper a directed graph G = (V, E) consists of a finite set of vertices V and edges E ⊂ V × V. A path in G from s ∈ V to t ∈ V is a finite sequence of vertices (s = v1, v2, …, vn = t) such that vi ∈ V and (vi, vi+1) ∈ E. We denote the set of all such paths by G[s ⇝ t].
Definition 2.2.
A labeled directed graph G is a quadruple (V, E, Σ, ℓ) where V and E denote the vertices and edges of G, Σ is a finite set called labels, and ℓ : V ∪ E → Σ is called a labeling function. Given a path p = (v1, …, vn) in G, the associated labeling is defined to be
Definition 2.3.
A matching relation between two labeled directed graphs G = (V, E, Σ, ℓ) and G′ = (V′, E′, Σ′, ℓ′) is a choice of a relation between the label sets Σ and Σ′. To indicate that the labels a ∈ Σ and b ∈ Σ′ match, i.e., are related, we write a ⌣ b. We extend the matching relation ⌣ onto the tuples of labels Σ* and Σ′* via
Given a matching relation, a path p = (v1, …, vn) in G[s ⇝ t], and a path in G′[s′ ⇝ t′], we say that p matches p′ and write p ⌣ p′ whenever L(p) ⇝ L(p′). Note that we are using the same symbol ⌣ to refer to three matching relations: between Σ and Σ′, between Σ* and Σ′*, and between paths in G[s ⇝ t] and paths in G′[s′ ⇝ t′].
Definition 2.4.
Let ⌣ be a matching relation between two labeled directed graphs G = (V, E, Σ, ℓ) and G′ = (V′,E′, Σ′,ℓ′). Suppose s, t ∈ V and s′, t′ ∈ V′. The alignment problem Alignment(G, G′, ⌣, (s, t), (s′,t′)) is the decision problem of determining if there is a pair of paths p ∈ G[s ⇝ t] and p′ ∈ G′[s′ ⇝ t′] such that p ⌣ p′.
Theorem 2.5.
There exist polynomial time algorithms for the following decision problems:
Let s, t ∈ V, s′, t′ ∈ V′. Decide Alignment(G, G′, ⌣, (s, t), (s′,t′)).
Let s, t ∈ V. Decide ∃s′, t′ ∈ V′ Alignment(G, G′, ⌣, (s, t), (s′, t′)).
Let s, t ∈ V. Decide ∃s′ ∈ V′ Alignment(G, G′, ⌣, (s, t), (s′, s′)).
We postpone the proof of Theorem 2.5 to section 2.3, where we give explicit algorithms.
2.2. Down set graph of a poset.
Definition 2.6.
A poset (P, ≤) is a set P equipped with a transitive, reflexive, antisymmetric relation ≤ called a partial order. A linear extension of ≤ is a total order ≤′ which extends ≤, i.e., for all p0,p1 ∈ P, p0 ≤ p1 implies p0 ≤′ p1. We will use the notation (P, <) to denote a strict partial order.
Definition 2.7.
Let (P, ≤) be a poset. A down set of P is a subset A ⊂ P such that for all p, q ∈ P, p ≤ q and q ∈ A implies p ∈ A. The collection of down sets of P is denoted by O(P).
Definition 2.8.
Let (P, ≤) be a finite poset. The down set graph of (P, ≤), denoted PD, is the directed graph (O(P), F) with vertices O(P) and edges A → A′ iff A ⊊ A′ and there does not exist A″ ∈ O(P) such that A ⊊ A″ ⊊ A′.
Remark 2.9.
If A, A′ ∈ O(P) and there exists an edge A → A′ in PD, then A′ = A ∪ {p} where p ∈ A and q ≤ p implies that q ∈ A.
For completeness, we include an algorithm for constructing the down set graph of a poset in section 2.3.2.
Theorem 2.10.
Given a finite poset (P, ≤), the associated down set graph PD = (O(P),F) is a directed acyclic graph with a unique root ∅ and a unique leaf P. Moreover, there is a bijection between the paths in PD from the root ∅ to the leaf P and the linear extensions of ≤.
Proof.
The directed acyclic property is inherited from the definition via proper set inclusion. ∅ and P are the unique root and leaf since ∅ and P are the unique maximal and minimal elements in O(P), respectively. Now we show the moreover part. Let ≤′ be a linear extension of ≤. Suppose P = {p1, p2, …, pn}, where the indexing has been chosen so that p1 ≤′ p2 ≤′ ⋯ ≤′ pn. Define Pk := {p1, p2, …,pk}. Then ∅ → P1 → P2 → ⋯ → Pn−1 → Pn = P gives a path in PD from root to leaf unique to ≤′. Now the converse. Suppose ∅ → P1 → P2 → ⋯· → Pn−1 → Pn = P is a path from root to leaf in PD. We claim that Pk \ Pk−1 must be a singleton for each k. Suppose otherwise. Then let a, b ∈ Pk \ Pk−1 such that a ≠ b. Without loss, assume either a ≤ b or a and b are incomparable. Then Pk−1 ∪ {a} is a down set and Pk−1 ⊊ Pk−1 ∪ {a} ⊊ Pk which by Definition 2.8 contradicts Pk−1 → Pk. Accordingly, let pk = Pk \ Pk−1 for k = 1, …, n, and see that this sequence of elements completely characterizes the path from root to leaf in PD. Define the total order ≤′ via p1 ≤′ p2 ≤′ ⋯ ≤′ pn; since pk ≤ pk+1 holds for all k, ≤′ is a linear extension of ≤. Thus a path from root to leaf in PD uniquely determines a linear extension ≤′ of ≤. ■
2.3. Algorithms.
2.3.1. Alignment problem.
Definition 2.11.
Let ⌣ be a matching relation between two labeled directed graphs G = (V, E, Σ, ℓ) and G′ = (V′,E′, Σ′, ℓ′). The alignment graph AlignmentGraph(G, G′, ⌣) is defined to be the directed graph (V″,E″) given by
The alignment graph AlignmentGraph(G, G′, ⌣) is a subset of the product graph G × G′, and hence it has at most |V||V′| vertices, |E||E′| edges.
The following proposition follows immediately from the construction of the alignment graph and the definition of matching paths.
Proposition 2.12.
Paths in AlignmentGraph(G, G′, ⌣) are in one-to-one correspondence with pairs of matching paths in G and G′. In particular, is a path in the alignment graph iff p = (v1, v2, …, vn) and are a pair of matching paths in G and G′, respectively.
It immediately follows that the alignment problem is equivalent to a reachability query in the alignment graph.
Proposition 2.13.
The following are equivalent:
Alignment(G, G′, ⌣, (s, t), (s′, t′)),
AlignmentGraph(G, G′, ⌣) [(s, s′) ⇝ (t, t′)] = ∅.
Proposition 2.14.
If the cost of checking whether labels match is constant, then
can be constructed in O(|V||V′| + |E||E′|) time.
Proof.
The vertices of the alignment graph may be determined by checking for each element of (v, v′) ∈ V × V′ whether ℓ(v) = ℓ(v′). The edges of the alignment graph may be determined by checking for each (e, e′) ∈ E × E′ whether ℓ(e) = ℓ(e′). The result follows. ■
Proposition 2.15.
Let G = (V, E) be a directed graph and let s ∈ G. Then, the set Reachable(G, s) := {t ∈ V : G[s ⇝ t] = ∅} can be computed in O(|V| + |E|) time.
Proof.
Depth- or breadth-first search of G beginning at s will find all vertices reachable from s in time linear in the number of vertices and edges of G [3]. For completeness we provide a standard depth-first-search algorithm as Reachable(G, s) in Algorithm 1. ■
Proof of Theorem 2.5.
We show that the procedures Match, PathMatch, and CycleMatch of Algorithm 1 are polynomial time algorithms which decide the decision problems (1), (2), and (3), respectively. Let G″ = AlignmentGraph(G, G′, ⌣). By Proposition 2.13, we may rewrite (1), (2), (3) as follows:
Let s, t ∈ V, s′, t′ ∈ V′. Decide if (t, t′) ∈ Reachable(G″, (s, s′)).
Let s, t ∈ V. Decide ∃s′, t′ ∈ V′ (t, t′) ∈ Reachable(G″, (s, s′)).
Let s, t ∈ V. Decide ∃s′ ∈ V′ (t, s′) ∈ Reachable(G′′, (s, s′)).
The correctness of Match for deciding (1) is now immediate. For (2) and (3), we recognize we can handle the outermost ∃s′ ∈ V′ algorithmically via a for loop over s′ ∈ V′. The algorithms PathMatch and CycleMatch result. This gives correctness.
To see that the algorithms are polynomial time, we refer to Propositions 2.14 and 2.15. In particular, given these it is straightforward to verify (defining |G| = |V| + |E|) that Reachable executes in worst-case O(|G|) time, Match executes in worst-case O(|G||G′|) time, and PathMatch and CycleMatch execute in worst-case O(|G||G′||V′|) time. ■
See Figure 8 for an example of a PathMatch along with the corresponding path giving reachability in the alignment graph.
2.3.2. Construction of down set graph.
Recall that given a poset P a subset I ⊂ P is independent if no two elements of I are comparable.
Proposition 2.16.
Algorithm 2 computes the down set graph of a poset P.
Proof.
We first note there is a one-to-one correspondence between down sets of a poset and the independent sets of a poset. In particular given a down set D we can associate an independent set I = Maxima|E|ementsOf(D), and given an independent set I we can associate a down set D = Downset(I) := {p ∈ P : p ≤ q for some q ∈ I}. It is straightforward to see that this is one to one. Now consider the following recursively defined function:
See first that the recursion terminates since each recursive function call operates on a smaller set. Notice that if the function operates on a down set, then removing the maximal vertices again results in down sets—in fact, precisely the adjacent down sets in the down set graph. Hence E = f (P) is the set of edges in the down set graph. Writing this recursion in terms of independent sets, we have
Now from
the correctness of Algorithm 2 follows: it is just an implementation of this recursion which prevents some redundant recursion paths to save time (by storing them in V). ■
Algorithm 1.
procedure REACHABLE(G, s) | procedure PATHMATCH(G, G′, (s, t)) |
Push s onto stack S. | G″ ← AlignmentGraph(G, G′, ⌣) |
while S is not empty do | for s′ ∊ V′ do |
Pop u from stack S. | R ← Reachable(G″, (s, s′)) |
R ← R ∪ {u} | for (v, v′) ∊ R do |
A ← {v : (u, v) ∊ E} | if v = t then |
for v ∊ A do | return True |
if v ∉ R then | end if |
Push v into stack S | end for |
end if | end for |
end for | return False |
end while | end procedure |
return R | |
end procedure | procedure CYCLEMATCH(G, G′, (s, t)) |
G″ ← AlignmentGraph(G, G′, ⌣) | |
procedure MATCH(G, G′, (s, t), (s, t′)) | for s′ ∊ V′ do |
G″ ← AlignmentGraph(G, G′, ⌣) | R ← Reachable(G″, (s, s′)) |
R ← Reachable(G″, (s, s′)) | for (v, v′) ∊ R do |
if (t, t′) ∊ R then | if v = t and v′ = s′ then |
return True | return True |
else | end if |
return False | end for |
end if | end for |
end procedure | return False |
end procedure |
Algorithm 2 does not run in polynomial time in general, yet it does for the special case of interest for the application of this paper. In particular, as we will describe in the next section, we will consider posets for which each element is associated to one of a small number of variables x1, x2, …, xd, and all elements in the poset associated to the same variable are comparable. Moreover, as we discuss in section 4, associated to poset elements are time intervals determining the partial order such that one time interval (a, b) compares less than another time interval (c, d) iff b ≤ c. Under these assumptions, the incomparability graph of the poset P (i.e., the graph with vertices P and edges u ↔ v whenever u and v are incomparable) is an interval graph [9], which is a special kind of chordal graph. A chordal graph with n vertices has at most n maximal cliques [20, 10]. From these considerations we get the following bound.
Proposition 2.17.
Assume that P is a finite poset. Let d be the cardinality of the maximum independent set in P. Assume that the incomparability graph of P is chordal. Then the down set graph DP has at most 2dn vertices, and for fixed d, Algorithm 2 executes in polynomial time.
Algorithm 2.
procedure POSETTODOWNSETGRAPH(P) |
Let S be an empty stack |
V ← ∅ |
E ← ∅ |
Push MaximalElementsOf(P) onto S |
while S is not empty do |
Pop I from S |
V ← V ∪ {I} |
for v ∈ I do |
I′ ← MaximalElementsOf((I ∪ Predecessors(v)) \ {v}) |
if I′ ∉ V then |
Push I′ onto S |
end if |
E ← E ∪ {(I′, I)} |
end for |
end while |
return (V, E) |
end procedure |
3. Matching posets of extrema against computational dynamics models.
3.1. Labeled directed graphs from posets of extrema.
Assume that we can measure N variables over a time interval [0, T] for the system that we are interested in modeling. If the quantities of these variables change continuously, then there exists a continuous function that represents the dynamics. We will assume that over this time interval each variable attains finitely many local extrema. As discussed in the introduction, in applications we can only sample the system at finite time intervals and the measurements will be subject to noise. We use the following structure to codify the possible orderings of maxima and minima of the coordinates xi of x.
Definition 3.1.
A poset of extrema (P, <τ; μ) is a finite poset (P, <τ) equipped with a sur-jective function μ : P → {−, m, M}N that satisfies the following conditions. For n = 1, …, N, define Pn = {p : μ(p),n ∈ {m, M}}.
.
If n ≠ j, then Pn ∩ Pj =∅.
For each n, Pn ⊂ P is totally ordered by <τ.
Let u, v ∈ Pn. If u <τ v and μ(u) = μ(v), then there exists w ∈ Pn such that u <τ w <τ v and μ(u) ≠ μ(w).
It is worth commenting on the rationale behind Definition 3.1. The poset of extrema is designed to capture orderings with respect to time of minima and maxima of d variables. The symbols −, m, and M stand for not an extremum, local minimum, and local maximum, respectively, and in applications the ordering <τ respects the direction of time. Condition1 implies that every vertex of P is associated with an extremal event, where we define an extremal event to be a local extremum in exactly one coordinate projection. Condition 2 implies that each vertex is associated to an extremal event of precisely one variable, i.e., μ(p)n = − for all but precisely one n ∈ {1,2, …,N}. The assumption that each Pn is totally ordered with respect to <τ implies that for each variable the ordering (with respect to time) of the minima and maxima is known; any ambiguity arises from comparing across variables. The final condition prevents a variable experiencing two local maxima or two local minima consecutively. Note that this is an assumption about the sampling frequency of the experiment.
Returning to the unknown function x that represents the dynamics, one expects, generically, that the maxima and minima of the coordinates xn occur at different times. In the context of the poset of extrema (P, <τ) we interpret this to mean that the true dynamics corresponds to linear extension of <τ. Since, given the data, the linear extension is unknown we consider any linear extension to be a plausible sequence of events. Our goal is to use the machinery of section 2.1 in order to search for linear extensions of P and thus we construct, following Theorem 2.10, the down set graph PD of P, which exhibits a one-to-one correspondence between paths from root to leaf and linear extensions of P.
In order to produce a labeled directed graph suitable for pattern matching algorithms, we require labels on the vertices and edges of PD. We make use of a particular set of labels
called the extrema labels which are intended to carry the following information:
I: increasing;
D: decreasing;
m: minimum;
M: maximum;
−: transitioning;
*: lack of knowledge.
Definition 3.2.
Let (P, <τ; μ) be a poset of extrema with down set graph PD = (O(P), E). The pattern graph P induced by the poset of extrema P is the labeled directed graph (O(P), E∪ {(A, A) : A ∈ O(P)}, ), where the labeling of the vertices is given by
and the labeling of the edges is defined by
and (see Remark 2.9)
Although the pattern graph is (trivially) cyclic due to the presence of self-edges, we will continue to refer to the root and leaf nodes of as ∅ and P, respectively.
We give an example of a poset of extrema and the associated pattern graph using two variables x1 and x2, i.e., N = 2. Later on, we shall relate this example to a yeast dataset [18] that is discussed in section 4.2. For now, assume that x1 and x2 first attain minima, then later attain maxima, but that the timing of the minima cannot be distinguished, and neither can the maxima. This leads to the poset in Figure 1 (left). The associated pattern graph is in Figure 1 (right). The down sets of the poset of extrema are mapped to integers via
3.2. Labeled directed graphs from computational dynamics.
In this section we develop the notion of a search graph, a labeled directed graph suitable for pattern matching sequence of extrema for models arising in computational dynamics.
Definition 3.3.
Let X = (0, ∞)N. Suppose for each n ∈ {1, …, N} we have a finite set
that we call thresholds. The rectangular decomposition of X induced by (Θ1, Θ2, …, ΘN) is the partition of X into cells such that each is a product of intervals
where for each n,
Accordingly, each cell is homeomorphic to an open ball of some dimension, which we call the dimension of the cell. We denote k-dimensional cells . We call the cells in domains and we call the cells in walls. Two domains are said to be adjacent if the intersection of their closures contains a wall. We denote by Xk the union of all k-dimensional cells, i.e., .
Definition 3.4.
Consider X = (0, ∞)N with rectangular decomposition and a system of trajectories on X. A trajectory x : [t0,t1] → X is a domain trajectory if x([t0, t1]) ⊂ ξ for some domain . A trajectory x : [t0,t1] → X is a wall trajectory from a domain ξ to a domain ξ′ if there exists a wall such that x([t0, t1]) ⊂ ξ ∪ σ ∪ ξ′, and x−1(ξ) < x−1(σ) < x−1(ξ′) (in the sense of comparing sets, i.e., A < B iff for all a ∈ A, b ∈ B, a < b). The domain graph generated by on is the directed graph where the vertices are domains and there is an edge ξ → ξ′ iff there exists a wall trajectory from ξ to ξ′.
Definition 3.4 indicates how a domain graph is generated from a system of trajectories. For the applications discussed in this paper we are interested in particular trajectories that can be defined in terms of the domain graph.
Definition 3.5.
Let D = (V, E) be the domain graph generated by on . A trajectory which is the finite concatenation of wall trajectories is said to be a domain-wall trajectory. The associated domain graph path of a domain-wall trajectory x is the path of the domain graph edges corresponding to the wall trajectories which comprise x.
Definition 3.6.
Let be a system of trajectories on X = (0, ∞)N with rectangular decomposition . If every domain trajectory is monotone with respect to every variable and every wall trajectory undergoes at most one extremal event, we say is extrema-pattern-matchable with respect to . In this case, the labeled directed graph is said to be a search graph if (V, E) is the domain graph and ℓ reflects, as follows, our level of knowledge of the behaviors of trajectories:
Observe that * indicates a lack of knowledge. If a system of trajectories is extrema-pattern-matchable, we can always make its domain graph into a search graph by choosing all labels to be *. This would lead to a higher rate of false positives in matching; it is better to assign the strongest labels one can prove.
As an example, consider a system of trajectories over a rectangular decomposition of , with trajectories qualitatively depicted in Figure 2 (left). The walls are shown as dotted lines. The domains are labeled 1–4, and within each domain the trajectories are monotonic in each variable, satisfying the requirements to be extrema-pattern-matchable. Within domain 1, x2 trajectories are decreasing, while x1 trajectories either decrease or increase. The associated search graph is shown in Figure 2 (right); node 1 corresponding to domain 1 is labeled *D and likewise for the other nodes. The edges between nodes in the search graph correspond to concatenation of domain trajectories. Clearly, there cannot be a maximum in x1 as we pass from domain 1 to domain 2, but there could be a minimum, whereas x2 is constantly decreasing and cannot have either a minimum or maximum on that wall. The edge (1 → 2) in the search graph is therefore labeled m-, and similar arguments hold for the other edges.
3.3. Matching pattern graphs against search graphs.
As indicated in the introduction we are interested in identifying whether our model for dynamics is capable of producing sequences of maxima and minima that do not contradict the experimental data. The capability is equivalent to the existence of a matching between a pattern graph and the search graph. To do this we impose a particular matching relation.
Definition 3.7.
The extremal event matching relation ⌣ext on Σext is given by
(vertices) (I ⌣ext *), (I ⌣ext I), (D ⌣ext *), (D ⌣ext D), (*⌣ext *),
(edges) (−⌣ext −), (−⌣ext m), (−⌣ext M), (−⌣ext *), (m ⌣ext m), (m ⌣ext *), (M ⌣ext M), (M ⌣ext *).
Given a pattern graph and a search graph , we extend this relation to by defining a ⌣ext b whenever for all 1 ≤ i ≤ N, ai ⌣ext bi.
Theorem 3.8.
Let be a pattern graph for a poset of extrema (P, <, μ) and let be a search graph for a system of trajectories which is extrema-pattern-matchable with respect to a rectangular decomposition of X = (0, ∞)N. If admits a domain-wall trajectory with a sequence of extremal events corresponding to a linear extension of P, then there exists a path from root to leaf and a path s in such that p ⌣ext s.
Proof.
Let <‘ be a linear extension of P and name the elements of P as e1 <‘e2 <‘·⋯ <‘en. Suppose that ϕ : [t0, t1] → X is a domain-wall trajectory with the sequence of extremal events e1, e2, …, en. We show there exists a path p from root to leaf in and a path s in such that p ⌣ s.
Step 1. We construct a path p from root to leaf in and a path s in . By Definition 3.5, since ϕ is a domain-wall trajectory, it can be written as a concatenation of wall trajectories ϕi : [ti, ti+1] for i = 1, 2, …,m. Because is a search graph, Definition 3.6 implies that extremal events for ϕ(t) can only occur on walls (i.e., during times when ϕ(t) ∊ Xd−1) and at most one kind of extremal event can occur on a given wall. Since it is impossible for the same extremal event to occur twice in a row (e.g., between any two local minima there must be an intervening local maximum), and wall trajectories intersect precisely one wall, it follows that each wall trajectory experiences at most one extremal event. If we denote the set of extremal events which occur on the wall trajectory ϕi as Ei, then card Ei ≤ 1. Therefore, Ei = ∅ or {ej} for some j. Since ϕ experiences the events e1,e2, …, en in order, and ϕ is the concatenation of the trajectories ϕi, it follows that there exists an increasing function μ : {1, …, n} → {1, …, m} such that for i ∊ {1, …, n}, ei ∊ Eμ(i). Define p1 := ∅, and for i ∊ {1, …, m} define .
We show p1 → p2 → ⋯ → pm+1 is a path in from root to leaf. To this end it suffices to show that: (1) for each i ∊ {1, …, m + 1}, pi is a down set of P, (2) for each i ∊ {1, …, m} there is an edge pi → pi+1 in , (3) p1 = ∅, and (4) pm+1 = P.
Let i ∊ {1, …, m + 1}. Define k = max μ−1({1, …, i}). Since μ is increasing it follows that .
Since e1 <‘ e2 <‘ ⋯ <‘ en and <‘ is a linear extension of P, it follows that pi is a down set of P. This demonstrates (1). Now let i ∊ {1, …,m}. We show pi → pi+1 is an edge in . There are two cases: either (a) Ei = ∅ and pi = pi+1, or else (b) or else Ei = {ek} for some k and pi+1 = pi ∪ {ek}. For case (a), pi → pi+1 is an edge in since the pattern graph admits all self-edges. For case (b), pi → pi+1 an edge in since contains the edges present in the down set graph of P. This demonstrates (2). That p1 = ∅ is by definition. This demonstrates (3). Finally, . This demonstrates (4). Since (1), (2), (3), and (4) hold we have that p = p1 → p2 → ⋯ → pm+1 is a path from root to leaf in . Let s be the path in corresponding to the sequence of wall trajectories ϕi (i.e., the path associated with the domain-wall trajectory ϕ). Denote the vertices of the path i in order as s1 → s2 → ⋯ → sm+1. Note that the wall trajectories ϕi correspond to the edges si → si+1 in s. We have constructed a path p from root to leaf in and a path s in , completing Step 1.
Step 2. We show that for p and s so constructed, p ⌣ s holds. By Definitions 2.3 and 3.7, it suffices to show that for each i ∊ {1, …, N}, for each j ∊ {1, …, m + 1}, ℓ(pj)i ⌣′’ ℓ(sj)i (i.e., vertex labels match) and for each i ∊ {1, …, N} for each j ∊ {1, …, m}, ℓ(pj → pj+1)i ⌣′’ ℓ(sj → sj+1)i (i.e., edge labels match).
Proof that edge labels match. Let i ∊ {1, …, N} and j ∊ {1, …, m}. We show
(3.1) |
There are two cases: either (1) Ej = ∅ or (2) Ej = {ek} for some k. For case (1), Ej = ∅ implies pj = pj+1 and hence ℓ(pj → pj+1)i = −. Meanwhile ℓ(sj → sj+1)i ∊ {−,m,M, *}. By Definition 3.7 it follows that (3.1) holds for case (1). For case (2), Ej = {ek} for some k, we distinguish three subcases: (a) ek is local minimum for variable i, (b) ek is a local maximum for variable i, or (c) ek is not a local extremum for variable i. For subcase (a), ℓ(pj → pj+1)i = m. Since the wall trajectory ϕj experienced a local minimum for variable i, it follows that we could not have ruled out a local minimum on the wall corresponding to the edge sj → sj+1. This eliminates the possibility that ℓ(sj → sj+1)i is either − or M, i.e., ℓ(sj → sj+1)i ∊ {m, *}. Since m ⌣′ m and m ⌣′ *, (3.1) holds for subcase (a). Subcase (b) is similar. For subcase (c), ℓ(pj → pj+1)i = −, and the argument of case (1) again applies. Hence (3.1) holds in all cases.
Proof that vertex labels match.
Let i ∊{1, …, N} and j ∊ {1, …, m + 1}. We show ℓ(pj)i ⌣′ ℓ(sj)i.
Let Pi = {p ∊ P : ℓ(p)i ∊ {I, D}}. We consider two cases: (1) Pi = ∅ and (2) Pi = ∅. For case (1), by Definition 3.2, Pi = ∅ implies ℓ(pj)i = *. By Definition 3.6, ℓ(sj)i ∊ {I, D, *}, and by Definition 3.7 * ⌣′ I, * ⌣′ D, and * ⌣′ *. It follows that ℓ(pj)i ⌣′ ℓ(sj)i for case (1). For case (2), we assume Pi = ∅. Then ℓ(pj)i ∊ {I, D}. There are four subcases depending on whether (a) ℓ(pj)i = I or ℓ(pj)i = D and (b) whether pj ∩ Pi = ∅. As they are all similar, we only consider the subcase when ℓ(pj)i = I and and pj ∩ Pi = ∅. Let ϕ’ : [t1, tj] → X be the domain-wall trajectory ϕ’ obtained by concatenating ϕ1, ϕ2, ⋯·, ϕj−1. By construction, pj is the set of events in P which occur on ϕ’. Let e be the maximal element of pj ∩ Pi. By Definition 3.2, ℓ(pj)i = I implies that e is a local minimum. It follows that ϕ’ is increasing in variable i after event e occurs. This implies that for sufficiently small ∊ > 0, is an increasing trajectory with image contained in the domain sj. By Definition 3.6, it follows that ℓ(sj)i ∊ {I, *}. By Definition 3.7, I ⌣′ I and I ⌣′ *, and ℓ(pj)i ⌣′ ℓ(sj)i follows. Similar arguments for the other three subcases show ℓ(pj)i ⌣′ ℓ(sj)i for case (2). We have shown p ⌣ s, which completes Step 2.
Since in Step 1 we constructed a path p in from root to leaf and a path s in and in Step 2 we showed p ⌣ s, the proof is complete. ■
To continue our example, we take the pattern graph from Figure 1 (right) and the search graph from Figure 2 (right) and seek matching paths , . To do this, we form the alignment graph as in Definition 2.11 using the matching relation ⌣ext given in Definition 3.7. We then apply Proposition 2.13 that states that finding paths in the alignment graph is equivalent to finding pairs of matching paths in the pattern and search graphs. In particular, we seek a match to a path that is a linear extension of the poset of extrema in Figure 1 (left), to verify that the system of trajectories in Figure 2 (left) can support the constraints on the order of extrema summarized by the poset.
The alignment graph is given in Figure 3, where each node is labeled by a pair (a, b), where a is a node identifier for the search graph (integers 1–4) and b is a node identifier for the pattern graph (integers 0–6). The red path denotes a match between path p = (0,1,3, 5,6) in the pattern graph in Figure 1 (right) and cyclic path s = (1, 2, 3, 4,1) in the search graph in Figure 2 (right). We notice that p = (0,1, 3, 5, 6) corresponds to a linear extension of the poset of extrema in Figure 1 (left), since it is a path from root to leaf of the pattern graph (Theorem 2.10). Therefore has at least one trajectory with a sequence of extrema respecting the constraints of the poset of extrema.
4. Application to regulatory networks.
As indicated in the introduction, to provide a demonstration of how to apply the combinatorial tools described in the previous sections we make use of DSGRN. A complete description of the mathematical framework can be found in [4]; however, for the benefit of the reader we begin this section with a short review. We then present an application to a simple system associated with the cell cycle of S. cerevisiae using experimental time series data (provided courtesy of the Haase lab; see [18] for data collection methods) for mRNA sequences associated with SWI4, HCM1, NDD1, and YOX1 collected at time intervals of 5 minutes.
4.1. DSGRN model for regulatory networks.
We provide a mathematical definition of a regulatory network, its associated parameter space, and an explicit decomposition of parameter space into a finite set of regions. For the sake of clarity we begin with a discussion of switching systems and demonstrate that they provide a system of trajectories. We then observe that based on the monotonicity assumption of systems of trajectories, the results for switching systems are applicable to a much larger class of dynamical systems. We conclude by relating the system of trajectories to the output of DSGRN, which provides us with a means of analyzing specific data sets.
Definition 4.1.
A regulatory network RN = (V, E) consists of vertices V = {1, …,N} called network nodes, annotated directed edges E ⊂ V × V × {→, ⊣} called interactions, and for each n ∊ V, polynomial monotone increasing functions called node logics, where Sn := {(i, n) ∊ E} is called the nth source set.
An → annotated edge is referred to as an activation and an ⊣ annotated edge is called a repression. We indicate that either i → j or i ⊣ j without specifying which by writing (i, j) ∊ E. We allow self-edges. From one node to another we admit at most one type of annotated edge, e.g., we cannot have both i → j and i ⊣ j simultaneously. The nth target set is given by Tn := {(n, j) ∊ E}.
A parameterized family of dynamics is generated from the regulatory network.
Definition 4.2.
A parameter for a regulatory network RN = (V, E) is a tuple z ∊ Z ⊂ (0, ∞)(N+3·|E|). The coordinates of a parameter z are associated with the nodes and edges of RN and are given by the values of four functions γ : V → (0, ∞), and l, u, Θ : E → (0, ∞) with the constraint that l(e) ≤ u(e) for each e ∊ E.
The functions γ, l, u, and Θ are used to decompose phase space and generate dynamics as follows. Define
and assume that for all n = 1, …, N,
(4.1) |
Then, (Θ1, Θ2, …, ΘN) defines a rectangular decomposition (see Definition 3.3) on X := (0, ∞)N.
Define Γ to be the diagonal N × N matrix with diagonal entries γ(n) for n ∊ {1, …, N}. Define W : E × X → (0, ∞) via
Finally, define componentwise by
The construction up to the point allows us to recall the classical switching system [12, 5, 14],
(4.2) |
where Γ is a diagonal matrix with diagonal entries given by γ, and the parameters l, u, and Θ specify Λ. Thus, (4.2) is implicitly associated with a given parameter value z ∊ Z. A nice feature of the switching system is that the structure of the dynamics over the rectangular domains is easily understood. Observe that if , then Λ is constant on ξ and hence it makes sense to write Λ(ξ).
Definition 4.3.
A parameter value z ∊ Z is regular if (4.1) holds, l(e) < u(e) for all e ∊ E, and
(4.3) |
if an N − 1 dimensional face of lies in the hyperplane defined by xn = Θ(n, k). The set of regular parameter values is denoted by ZR.
From now on we restrict our attention to switching systems for which the parameter value is regular.
Definition 4.4.
An RN switching system domain trajectory is a function x: [t0, t1] → cl(ξ), where , that solves the differential equation
(4.4) |
For z ∊ ZR, the associated RN switching system of trajectories at z is denoted by and is defined to be the smallest system of trajectories (see Definition 1.1) which contains every RN switching system domain trajectory.
Remark 4.5.
It is straightforward to verify that under Definition 1.1, the intersection of two systems of trajectories is again a system of trajectories. Thus the notion of the smallest system of trajectories containing some set of trajectories is well-defined.
Observe that (4.4) is a linear differential equation and for each ξ can be extended to all of . In this case,
is a globally attracting fixed point.
Let be the canonical projection map onto the nth coordinate.
Proposition 4.6.
Let RN be a regulatory network and z ∊ ZR. Consider the switching system of trajectories . Let ξ, be separated by the hyperplane xn = Θ((n, j)) for some (n, j) ∊ E such that πn(ξ) < πn(ξ’). Then,
there exists a wall trajectory from ξ to ξ’ iff ,
there exists a wall trajectory from ξ’ to ξ iff .
Proof.
We show (i) and leave (ii) to the reader. Suppose there exists a wall trajectory x : [0, T] → (0, ∞)N from ξ to ξ’. By Definition 1.1, the restrictions x|A and x|B onto and are again trajectories. By Definition 4.4, x|A and x|B are solutions to (4.4). Such solutions are monotonic, so it follows that x|A and x|B are increasing. This requires πn(Pξ) > Θ((n, j)) and πn(Pξ’) > Θ((n, j)) (with strictness since we reach or leave the wall in finite time), yielding as desired.
To prove the converse suppose . Let , where is the cell between ξ and ξ’. Solve the initial value problem (4.4) with initial value in forward time in ξ’ and in backward time in ξ to obtain solutions x: [t0,t1] → cl(ξ) and y: [t1, t2] → cl(ξ’) such that . By Definition 4.4, x and y are trajectories in . By Definition 1.1 the concatenation of x and y is again a trajectory. This yields a wall trajectory from ξ to ξ’. ■
Proposition 4.7.
Let RN be a regulatory network, z ∊ ZR, and be a domain. Then the switching system of trajectories has the following properties:
Every trajectory x(t) in ξ is monotonic in each variable.
If , then for every trajectory x(t) in ξ, xn(t) is an increasing function.
If , then for every trajectory x(t) in ξ, xn(t) is a decreasing function.
If , there exist trajectories x(t) in ξ where xn(t) may be either an increasing, decreasing, or constant function.
Let w be a wall associated with the hyperplane xn = Θ((n, j)) arising from the regulatory network interaction xn → xj. Then, the only type of extremum a wall trajectory can undergo as it passes through w is a local minimum in the variable xj.
Let w be a wall associated with the hyperplane xn = Θ((n, j)) arising from the regulatory network interaction xn ⊣ x,. Then, the only type of extremum a wall trajectory can undergo as it passes through w is a local maximum in the variable xj.
Proof.
Properties (i)-(iv) follow immediately from (4.4).
We show (v) and leave (vi) to the reader. Let w be a wall associated with the hyperplane xn = Θ((n,j)) arising from the regulatory network interaction xn → xj. Let ξ, ξ’ be the adjacent domains that w separates, such that πn(ξ) < πn(ξ’). Let x : [t0,t1] → X be a wall trajectory from ξ to ξ’. We show that x cannot undergo any kind of extremum except possibly a local minimum in the variable xj.
Since z ∊ ZR, Θ((n, j)) ≠ Θ((n, k)) for j ≠ k. This implies that for all k ≠ j. Define x|A and x|B, where and . Since x|A and x|B each obey (4.4) on their respective domains, it follows that xk obeys everywhere. Thus xk(t) is monotonic and hence experiences no extremal event. Now we show x cannot undergo a local maximum event in variable xj. Since l((i, j)) < u((i, j)) it follows from the definitions that we must have . If x|A is constant or decreasing in the jth coordinate, then there cannot be a local maximum as we pass the wall. So we consider only the case where x|A is increasing in the jth coordinate. This case requires that . Hence, . Since x|B is a solution of the initial value problem corresponding to (4.4) with an initial condition for xj less than , it follows that xj is everywhere increasing. Therefore, xj does not experience a local maximum. ■
While the action of the switching system on top dimensional domains is easy to understand, the matching results only make use of the qualitative properties of the system of trajectories described in the conclusions of Propositions 4.6 and 4.7. With this in mind let denote any system of trajectories that satisfies Proposition 4.6(i)–(ii) and Proposition 4.7(i) – (vi).
Theorem 4.8.
Let (RN, z) be a parameterized regulatory network with z ∊ ZR and let be an associated system of trajectories. Let be the labeled directed graph given by the following:
.
.
For all n ∈ V, whenever .
For all n ∈ V, ℓ(ξ)n = * whenever .
-
For all n, j, k ∈ V,
ℓ(ξ → ξ′)n = − whenever xj → xk, n = k, and xj = Θ((j, k)) separates ξ and ξ′.
-
For all n, j ∈ V,
ℓ(ξ → ξ′)n = m whenever xj → xn and xj = Θ((j, n)) separates ξ and ξ′.
-
For all n, j ∈ V,
ℓ(ξ → ξ′)n = M whenever xj ⊣ xn and xj = Θ((j, n)) separatesξ and ξ′.
Then, is a search graph for with the rectangular decomposition .
Proof.
By Proposition 4.6, it follows that is the domain graph for with the rectangular decomposition . By Proposition 4.7, it follows that the vertex and edge labels satisfy the requirements of Definition 3.6. ■
Theorem 4.8 guarantees that given a regulatory network and regular parameter value there exists a search graph. The next proposition indicates that parameter space admits a finite decomposition, where within each open component of the decomposition the parameters exhibit isomorphic search graphs.
Proposition 4.9.
For a fixed regulatory network the following hold:
The regular parameter values ZR form an open and dense subset of all parameter values Z.
ZR has finitely many connected components.
The connected components of ZR are semialgebraic sets which can be written as systems of strict inequalities involving polynomials of the parameters.
If z1, z2 ∈ZR are in the same connected component of ZR, then the search graph for is isomorphic to the search graph associated with .
We do not provide a proof of Proposition 4.9 as it is a partial summary of results in [4] that describes the mathematical foundations for the DSGRN software [15]. Given a regulatory network for which |Sn| ≤ 3 and |Tn| ≤ 3 the key computational result of [4] is that DSGRN provides an efficient computational scheme for constructing an undirected graph PG, called the parameter graph, where each node represents one of the connected components described in Proposition 4.9(iii) and the edges correspond to a notion of adjacency of the parameter regions. In addition, for each node in the parameter graph DSGRN can be used to compute the associated domain graph, i.e., identify the set of vertices and the set of edges of S as described in Theorem 4.8(i) and (ii).
From the domain graph, it is possible to extract summary data, called a Morse graph, that provides information about the global dynamics. The association of a Morse graph to each node in the parameter graph PG gives rise to the notion of a database of dynamical information; the interested reader is referred to [1, 2, 4] for further details about Morse graphs and dynamical databases. For the purposes of this paper, the notion of the domain graph, and the search graph which arises from it, suffices.
We remark that the system of trajectories qualitatively depicted in Figure 2 (left) arises from the regulatory network RN({x1,x2}, {x1 → x2,x2 ⊣ x1}) for any regular parameter z satisfying
4.2. Labeled pattern graph from experimental data.
We now turn to the task of generating a labeled pattern graph from experimental data. The graph in Figure 4 (left) provides normalized expression level data for mRNA sequences associated with SWI4, HCM1, NDD1, and YOX1 from S. cerevisiae taken at time intervals of 5 minutes. Since we are only concerned with the orderings of the extrema, the normalization of the data makes it easier to identify these extrema.
As indicated in the introduction identifying extrema in data is a serious statistical endeavor that we do not address in this paper. While our techniques require a set of potential sequences of extrema, they are agnostic with respect to how the potential sequences are derived; therefore we are content for the purpose of this paper to use simple heuristics. In particular, the table in Figure 4 (right) provides intervals of time within which we declare a maximum or minimum value of expression has occurred. For example, to allow for noise in the data the tightest time bound we are willing to assume on the maxima for SWI4 and YOX1 is (15, 30). Similarly, we ignore the potential for a local minimum and maximum of NDD1 at time points 70 and 80 and instead assume that a minimum occurs somewhere within the time interval (70,85).
Because we are using intervals to quantify the occurrence in time of extrema we cannot expect to obtain a linear ordering. Instead we define a partial order <τ by
(4.5) |
Note that the poset of extrema in Figure 1 (left) arises from using <τ on rows 1, 4, 5, and 6 in the table; i.e., we form the poset consisting of the first minimum and first maximum of each of x1 = SWI4 and x2 = YOX1.
Using all of the rows in the table in Figure 4 results in the poset indicated in Figure 5. Note that the linear extensions of <τ correspond to ordered sequences of extrema events.
Observe that we have constructed a poset of extrema (P, <τ; μ) (see Definition 3.1) where P consists of the entries of the time interval column in the table in Figure 4 (right), <τ is as defined by (4.5), and the values of μ: P → { −, m, M}4 are obtained from the event column of the table in Figure 4 (right). For example, if the first coordinate of μ corresponds to SWI4, then P1 = {(−∞, 10), (15, 30), (50, 60), (75, ∞)}. Following Definition 3.2 the associated pattern graph is shown in Figure 6. We remark that Proposition 2.17 applies in this situation, i.e., Algorithm 2 can quickly compute the pattern graph .
4.3. Results for wavepool models.
The regulatory network RNW shown in Figure 7 is perhaps the simplest representative of the family of wavepool models proposed by the Haase lab [19] for the metabolic cycle in S. cerevisiae. Our goal is to identify if, for a particular identification of the nodes {1, 2, 3,4} with the genes {SWI4, HCM1, NDD1, YOX1}, a DSGRN model of this form is consistent with the time series data shown in Figure 4 (left) and, if so, under what ranges of parameter values.
Applying the DSGRN database code to RNw produces a parameter graph PG with 1080 nodes. As explained in section 4.1, the phase space of this network is (0, ∞)4 and the parameter space is a subset of (0, ∞)19. The nodes correspond to 1080 distinct regions of parameter space, which in turn give rise to 1080 distinct classes of state transition graphs which may arise from the regulatory network of Figure 7. For each node we may present the associated nonempty connected region of parameter space as the solution set of a system of polynomial inequalities. For each point z in this set, the system of trajectories gives rise to the same associated search graph.
4.3.1. Invalidating a model.
As a simple test we begin by considering a model that can be ruled out based on known biological interactions. Consider the regulatory network RNW where 1 ↔ NDD1, 2 ↔ HCM1, 3 ↔ SWI4, and 4 ↔ YOX1. The evidence table in the supplementary material of [17, Table S2] shows no known regulation of HCM1 by NDD1, despite numerous experiments of different types, so we would expect to see no pattern matches with this model. Applying our pattern matching methodology to the search graphs which arise for each of the 1080 parameter nodes corresponding to this instantiation of the regulatory network RNW and the pattern graph of Figure 6 we obtain no matches. This indicates that no matter how parameters are chosen, the dynamical model cannot give rise to a solution trajectory exhibiting a behavior qualitatively similar to the collected experimental data of Figure 4. Accordingly, we reject the proposed regulatory network model.
4.3.2. Parameter learning.
We now turn to an accepted version of the wavepool regulatory network model RNW where 1 ↔ SWI4, 2 ↔ HCM1, 3 ↔ NDD1, and 4 o YOX1. For this network we expect to find matches (in fact, failure to find any matches would probably suggest that the DSGRN model was inappropriate).
Applying the pattern matching methodology to the search graphs which arise for each of the 1080 parameter nodes corresponding to this revised instantiation of the regulatory network RNW and the pattern graph of Figure 6 results in matches for 22 parameter nodes. By Theorem 3.8, for any parameter z belonging to any of the other 1058 parameter nodes, the system of trajectories does not contain any trajectory passing only through domains and walls which exhibits a sequence of extrema matching a plausible total order of the experimentally observed extrema in the data. Hence, our analysis has dramatically reduced uncertainty about relationships between the underlying parameters.
Furthermore, we can explicitly describe the regions of parameter space that correspond to these 22 matching parameter nodes. For example, for one such parameter node the associated parameter region in (0, ∞)19 is given by the inequalities
(4.6) |
where
l1 := l((NDD1; SWI4)) u1 := u((NDD1; SWI4)) θ1 := Θ((NDD1; SWI4))
l2 := l((YOX1; SWI4)) u2 := u((YOX1; SWI4)) θ2 := Θ((YOX1; SWI4))
l3 := l((SWI4; HCM1)) u3 := u((SWI4; HCM1)) θ3 := Θ((SWI4; HCM1))
l4 := l((HCM1; NDD1)) u4 := u((HCM1; NDD1)) θ4 := Θ((HCM1; NDD1))
l5 := l((SWI4; YOX1)) u5 := u((SWI4; YOX1)) θ5 := Θ((SWI4; YOX1))
γ1 :=γ(SWI4) γ2 :=γ(HCM1)
γ3 :=γ(NDD1) γ4 :=γ(YOX1)
A complete listing of such regions is available in supplementary material [16].
A pair of matching paths between the pattern graph and the search graph corresponding to this parameter region is shown in Figure 8.
5. Concluding remarks.
We presented a general method capable of rejecting models that cannot match coarse data generated by an experimentally measured time series. Our assumptions are very general; we expect that the time series is subject to substantial experimental error and therefore we only assume partial knowledge of the order of extrema of the components of the time series. This information is encoded in a poset of extrema, which we represent as a labeled directed acyclic graph called a pattern graph.
Coming from the modeling side, we start with a concept of system of trajectories. Such a system can be produced by decomposition of the phase space into disjoint domains in which all trajectories are monotone, and on each boundary between domains, at most one component can attain an extremum. Existence of such a decomposition allows extraction of the extremal behavior and its encoding into a search graph. On this level of generality we show that the problem of matching labeled paths between pattern graph and search graph can be solved in polynomial time.
We discuss the applicability of our approach in two directions. First, we provide an example of a class of models which can be used to construct search graphs. Second, we apply our method to expression time series data from cell cycle in yeast. We show how our method can be used to learn parameter regimes consistent with the experimental measurement by rejecting parameter regimes where the dynamics does not align with the data.
In order to ensure our results may be reproduced we adhere to the following recipe: (1) we release our code under an open-source license, (2) we host our code on a publicly available site using version control (i.e., history tracking), (3) we give the version numbers of the code used to produce the result, (4) we provide instructions for installing and running the code, and (5) we produce digital object identifiers (DOIs) of the versioned code for use in bibliographical entries.
The computer codes used to reproduce the results in this paper are stored in two code repositories. The first repository is the DSGRN project [15]. This is an open-source project which, as of writing, is hosted on the code-sharing website GitHub at https://github.com/shaunharker/DSGRN. The version utilized for this paper is 1.0.0. The second repository is the supplement to this paper [16] and houses the code (which relies on DSGRN) which is used to reproduce the above results. This again is open-source and is hosted at https://github.com/shaunharker/2017-DSGRN-ModelRejection. The version utilized for this paper is 1.0.0. The DOIs for these can be found in the references.
Funding:
This work was partially supported by grant NIH-1R01GM126555-01 as part of the Joint DMS/NIGMS Initiative to Support Research at the Interface of the Biological and Mathematical Sciences. The work of the first author was partially supported by DARPA D12AP200025 and USDA 2015-51106-23970. The work of the second author was partially supported by NSF grants DMS-1226213, DMS-1361240, DARPA D12AP200025, and NIH R01 grant 1R01AG040020-01. The work of the third and fourth authors was partially supported by grants NSF-DMS-1125174, 1248071, and 1521771 and DARPA contract HR0011-16-2-0033.
Contributor Information
Bree Cummins, Email: cummins@math.montana.edu.
Tomas Gedeon, Email: gedeon@math.montana.edu.
Shaun Harker, Email: sharker@math.rutgers.edu.
Konstantin Mischaikow, Email: mischaik@math.rutgers.edu.
REFERENCES
- [1].Arai Z, Kalies W, Kokubu H, Mischaikow K, Oka H, and Pilarczyk P, A database schema for the analysis of global dynamics of multiparameter systems, SIAM J. Appl. Dyn. Syst, 8 (2009), pp. 757–789. [Google Scholar]
- [2].Bush J, Gameiro M, Harker S, Kokubu H, Mischaikow K, Obayashi I, and Pilarczyk P, Combinatorial-topological framework for the analysis of global dynamics, Chao, 22 (2012), 047508. [DOI] [PubMed] [Google Scholar]
- [3].Cormen TH, Stein C, Rivest RL, AND Leiserson CE, Introduction to Algorithms, 2nd ed., McGraw-Hill, New York, 2001. [Google Scholar]
- [4].Cummins B, Gedeon T, Harker S, Mischaikow K, and Mok K, Combinatorial representation of parameter space for switching networks, SIAM J. Appl. Dyn. Syst, 15 (2016), pp. 2176–2212. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [5].de Jong H, Modeling and simulation of genetic regulatory systems: A literature review, J. Comput. Biol, 9 (2002), pp. 67–103, 10.1089/10665270252833208. [DOI] [PubMed] [Google Scholar]
- [6].Edwards R, Analysis of continuous-time switching networks, Phys. D, 146 (2000), pp. 165–199, https://www.sciencedirect.com/science/article/pii/S0167278900001305. [Google Scholar]
- [7].Edwards R, Farcot E, and Foxall E, Explicit construction of chaotic attractors in Glass networks, Chaos Solitons Fractals, 45 (2012), pp. 666–680, 10.1016/j.chaos.2012.02.018. [DOI] [Google Scholar]
- [8].Edwards R and Ironi L, Periodic solutions of gene networks with steep sigmoidal regulatory functions, Phys. D, 282 (2014), pp. 1–15, 10.1016/j.physd.2014.04.013. [DOI] [Google Scholar]
- [9].Fulkerson D and Gross O, Incidence matrices and interval graphs, Pacific J. Math, 15 (1965), pp. 835–855. [Google Scholar]
- [10].Gavril F, Algorithms for minimum coloring, maximum clique, minimum covering by cliques, and, maximum independent set of a chordal graph, SIAM J. Comput, 1 (1972), pp. 180–187. [Google Scholar]
- [11].Glass L and Kauffman SA, Co-operative components, spatial localization and oscillatory cellular dynamics, J. Theoret. Biol, 34 (1972), pp. 219–37, https://www.ncbi.nlm.nih.gov/pubmed/5015702. [DOI] [PubMed] [Google Scholar]
- [12].Glass L and Kauffman SA, The logical analysis of continuous, non-linear biochemical control networks, J. Theoret. Biol, 39 (1973), pp. 103–29, https://www.ncbi.nlm.nih.gov/pubmed/4741704. [DOI] [PubMed] [Google Scholar]
- [13].Glass L and Pasternack J, Stable oscillations in mathematical models of biological control systems, J. Math. Biol, 6 (1978), pp. 207–223, https://link.springer.com/article/10.1007/BF02547797. [Google Scholar]
- [14].Gouzé J-L and Sari T, A class of piecewise linear differential equations arising in biological models, Dyn. Syst, 17 (2002), pp. 299–316, 10.1080/1468936021000041681. [DOI] [Google Scholar]
- [15].Harker S, DSGRN Software, , 2017.
- [16].Harker S and Cummins B, Code Supplemental for “Model Rejection and Parameter Reduction via Time Series”. , 2017. [DOI] [PMC free article] [PubMed]
- [17].Kovacs LAS, Mayhew MB, Orlando DA, Jin Y, Li Q, Huang C, Reed SI, Mukher-JEE S, and Haase SB, Cyclin-dependent kinases are regulators and effectors of oscillations driven by a transcription factor network, Molecular Cell, 45 (2012), pp. 669–679. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [18].Leman AR, Bristow SL, and Haase SB, Analyzing transcription dynamics during the budding yeast cell cycle, in Cell Cycle Control. Methods in Molecular Biology (Methods and Protocols), Vol. 1170, Noguchi E and Gadaleta M, eds., Humana Press, New York, NY, 2014, pp. 295–312. [DOI] [PubMed] [Google Scholar]
- [19].Orlando DA, Lin CY, Bernard A, Wang JY, Socolar JE, Iversen ES, Hartemink AJ, and Haase SB, Global control of cell-cycle transcription by coupled cdk and, network oscillators, Nature, 453 (2008), pp. 944–947. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [20].Rose DJ, Tarjan RE, and Lueker GS, Algorithmic aspects of vertex elimination on graphs, SIAM J. Comput, 5 (1976), pp. 266–283. [Google Scholar]