Author manuscript; available in PMC 2021 Apr 1.
Published in final edited form as: J Math Biol. 2020 Feb 1;80(5):1523–1557. doi: 10.1007/s00285-020-01471-4

Using extremal events to characterize noisy time series

Eric Berry*, Bree Cummins*, Robert Nerem*, Lauren M Smith, Steven B Haase, Tomas Gedeon*
PMCID: PMC7093088  NIHMSID: NIHMS1555432  PMID: 32008103

Abstract

Experimental time series provide an informative window into the underlying dynamical system, and the timing of the extrema of a time series (or its derivative) contains information about its structure. However, the time series often contain significant measurement errors. We describe a method for characterizing a time series for any assumed level of measurement error ε by a sequence of intervals, each of which is guaranteed to contain an extremum for any function that ε-approximates the time series. Based on the merge tree of a continuous function, we define a new object called the normalized branch decomposition, which allows us to compute intervals for any level ε.

We show that there is a well-defined total order on these intervals for a single time series, and that it is naturally extended to a partial order across a collection of time series comprising a dataset. We use the order of the extracted intervals in two applications. First, the partial order describing a single dataset can be used to pattern match against switching model output [1], which allows the rejection of a network model. Second, the comparison between graph distances of the partial orders of different datasets can be used to quantify similarity between biological replicates.

Keywords: time series, merge trees, order of extrema, partial orders

1. Introduction

Time series data provide a discrete measurement of a dynamical system. By collecting simultaneous time series measuring different components of a dynamical system, we can infer potentially causal relationships between components [2, 3, 4, 5]. These relationships are represented in the form of a regulatory network, deduced from data experimentally or via network learning [5, 6, 7, 8, 9, 10, 11]. Our recent work [12] has focused on the post hoc study of admissible dynamics for these network models. We associate to each network model a switching system [13, 14, 15, 16], in which each node is modeled by a piecewise linear ODE.

In [12], we introduced a method to describe the global dynamics of a regulatory network for all parameterizations of the switching system. This approach, named DSGRN (Dynamic Signatures Generated by Regulatory Networks) [17], leverages the fact that the switching system admits a finite decomposition of phase space into domains, and each variable of the dynamical system can be assigned one of the finite number of states representing these domains. Furthermore, the dynamics of the switching system can be described by capturing transitions between these states via a state transition graph. Finally, the switching systems admit an explicit finite decomposition of parameter space into regions where the state transition graph is constant. DSGRN provides a complete combinatorialization of dynamics in both phase space and parameter space.

The solution trajectories of the parameterized switching system correspond to the sequences of discrete states determined by paths through the state transition graph. From the sequence of states, one can deduce the sequences of maxima and minima (extrema) of the admissible solution trajectories.

In a recent paper [1], we compared a sequence of extrema generated by paths in the DSGRN switching model to experimentally observed sequences of maxima and minima in time series. We were seeking consistency between a network model and a time series dataset by checking if the ordering of extrema in the dataset is compatible with the ordering of extrema along a path in the state transition graph. If such a match could not be found, we proposed that the network model be rejected as a valid model of the underlying biological system producing time series data.

Because measurements in a time series are sparse, they may not accurately capture the timing of an extremum. As a consequence, it may be difficult to ascertain differences in the ordering of extrema between different time series with high confidence. To account for these issues, [1] proposed that each extremum in the dataset should be assigned a time interval representing the level of uncertainty in the timing of the extremum. If two time intervals overlapped, then the relative ordering of the extrema was taken to be unknown. If the intervals were disjoint, then the relative order was known. This relation formed a partial order of extrema representing the time series dataset.

In [1], the intervals were chosen in an ad hoc fashion. In this manuscript, we describe how to extract the desired time intervals about each extremum in a dataset D in a rigorous and automated fashion.

We consider our dataset D to be a collection of n time series

D = (D1, D2, …, Dn)

measured at a common discrete set of time points (z1, …, zk). We assume that the measurement error of size ε is additive, and thus the "true" time series Ti lies in a band of size 2ε about the measured time series Di. Our goal is to compute a collection of time intervals Iε(Di), each of which is guaranteed to contain an extremum of the true time series Ti. Using the techniques of merge trees [18, 19], branch decompositions [20], and sublevel sets, we construct time intervals Iε(Di) that increase in length as a function of increasing ε (see Section 4). We call this construction the ε-extremal interval method.
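To make the setup concrete, the sketch below normalizes a measured time series to [−0.5, 0.5], as is done for all data in this paper, and forms the ε-band that must contain the true time series Ti. The helper name is hypothetical and is not taken from the released code [21].

```python
import numpy as np

def normalize(series):
    """Rescale a time series to [-0.5, 0.5] (hypothetical helper)."""
    s = np.asarray(series, dtype=float)
    return (s - s.min()) / (s.max() - s.min()) - 0.5

# After normalization, eps = 0.05 means the band f - eps <= T_i <= f + eps
# spans 10% of the min-to-max range of the measured series.
D1 = normalize([2.1, 3.7, 2.9, 5.0, 4.2, 1.8])
lower, upper = D1 - 0.05, D1 + 0.05
```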

As an example, see Figure 1. On top is a time series curve D1 that has been interpolated to smooth it (see Appendix A) and normalized to the range [−0.5, 0.5]. It has five local extrema including the endpoints. In the second row of Figure 1, there is a visualization of Iε(D1) as a function of increasing noise ε. The reddish lines are the intervals associated to local maxima and the gray lines are associated to the local minima.

Figure 1:

Top: A smoothed and normalized time series. Bottom: The ε-extremal intervals as a function of increasing noise ε. The gray lines correspond to local minima and the reddish lines are associated to local maxima.

The bottom row in the lower plot has five points marked, each of which is an interval of length zero corresponding to the case without noise, ε = 0. At a noise level of ±1%, the intervals have widened but remain distinct. At a noise level of ±2%, some of the intervals have started to overlap. Somewhere between ±3% and ±4%, the two intervals corresponding to the maximum and minimum near time point 100 have widened so much that they coincide. When two intervals coincide, it means the ε-band has become so large that the true time series T1 is not guaranteed to have any extrema within that interval, and the interval is removed from Iε(D1). Between ±4% and ±5%, the interval associated to the right endpoint minimum has become a proper subset of the interval of the adjacent maximum. The interval containing the maximum continues to the 5% noise level, while the interval associated to the rightmost minimum has been removed from Iε(D1). Every time an interval becomes a proper subset of another interval, the smaller interval is removed and the larger one is retained in Iε(D1). This is because the true time series T1 is only guaranteed to attain the extremum associated to the larger interval.

The top row in the lower plot provides a description of the time series assuming a measurement error of ±5%. At this level of precision, the time series is characterized by a global minimum well-separated from a global maximum. When the noise level increases enough that these last two intervals coincide, which they will do by a 50% noise level, then we say that the time series is ε-constant. At that point, all information about the extrema of the time series (if they exist) is erased by noise.

As alluded to above, the key property that guides the construction of the collection of intervals Iε(Di) is the following. Let l be the linear interpolation of a time series Di and let e be a local extremum of l. For a fixed measurement error level ε, let I ∈ Iε(Di) be the ε-extremal interval containing e. We prove in Section 4 that every continuous function whose graph lies within the band l ± ε must attain an extremum of the same type as e (maximum or minimum) within the interval I. Moreover, I is the minimal such interval about e with this property, up to the discretization level of the time series. This means that the intervals Iε(Di) describe the position of the extrema of a time series in a way that is robust to noise and as precise as possible given the information in the dataset. Naturally, if important extrema are missing from the dataset due to sparse time points, then this method cannot guess at their existence.

Using our approach, a dataset D is represented as a strict partial order Pε(D) = (Iε(D), ◁) of the collection of intervals representing extrema for all time series in the dataset. If I ∈ Iε(D) is an interval in time series Di and J ∈ Iε(D) is an interval in time series Dj with i ≠ j, then I ◁ J if and only if the right endpoint of I is less than or equal to the left endpoint of J. In other words, I is comparable to J if and only if the intervals are disjoint, except possibly at a single point. When intervals are disjoint, the extrema they represent are well-separated and can be unambiguously distinguished. Since not all pairs of intervals in Iε(D) are comparable, the intervals are only partially ordered.
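The comparability rule can be stated in a few lines. The following is an illustrative sketch of the definition above for intervals from two different time series, not the implementation in [21].

```python
def precedes(I, J):
    """I and J are (left, right) endpoint pairs of eps-extremal intervals
    from two different time series.  I comes before J in the partial
    order exactly when the intervals are disjoint, except possibly at a
    single shared point."""
    return I[1] <= J[0]

def comparable(I, J):
    # Overlapping intervals leave the order of their extrema undecided.
    return precedes(I, J) or precedes(J, I)

assert precedes((0.0, 1.0), (1.0, 2.0))        # touch at one point: ordered
assert not comparable((0.0, 1.5), (1.0, 2.0))  # overlap: incomparable
```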

Each partial order Pε(D) can be represented as a graph HεR(V,E), called the Hasse diagram, with nodes V corresponding to intervals in Iε(D), and an edge (I, J) ∈ E ⊆ V × V whenever I ◁ J. This graph is a coarse representation of the information contained in the dataset, capturing qualitative rather than quantitative phase shifts.

As ε increases, there are fewer disjoint intervals in Iε(D), fewer well-ordered pairs of intervals, and therefore fewer restrictions on the ordering of extrema between time series. This results in a more permissive partial order, where the amount of trusted information decreases as the assumed noise level increases. Eventually, ε increases to the point where some neighboring extrema become indistinguishable and the number of nodes in HεR(V,E) decreases. We illustrate this with an example in Figure 2. In the upper left, there are two interpolated and normalized time series. The blue curve is the same as the one in Figure 1 (top), and has five local extrema. The second, orange curve has four local extrema including endpoints. We calculate the ε-extremal intervals for both curves for ε = 0.0, 0.03, and 0.05 and obtain the Hasse diagrams of the partial orders shown in the figure. In every Hasse diagram, the arrow of time points downward.

Figure 2:

Hasse diagrams of two time series as a function of ε. Upper left: The time series under consideration. Upper right: ε = 0.0. Lower left: ε = 0.03. Lower right: ε = 0.05.

At the upper right in Figure 2 is the Hasse diagram for ε = 0.0, which corresponds to the case without noise, and all nine local extrema are represented. At ε = 0.03 (lower left), all nine extrema are still present, but the greater number of incomparable ε-extremal intervals indicates a more permissive partial order. For example, at ε = 0, the first NDD1 minimum (the one closest to the top of the Hasse diagram) must occur before the first CDC20 maximum. But when ε = 3% those two extrema are incomparable, as indicated by the lack of an arrow between them in the Hasse diagram on the lower left. At the lower right with ε = 5%, there are only six extrema. CDC20 has lost its two middle extrema and its rightmost minimum, as shown in Figure 1 (bottom) where ε = 0.05.

As motivation for the work in Section 4, we discuss the applications of our method after a brief introduction to some standard definitions in graph theory. Application 1 demonstrates that the ε-extremal interval method can be used for consistency-checking a DSGRN model of network dynamics as suggested in [1]. Application 2 shows that the technique can be used to quantify the similarity between different time series datasets.

For the presentation of the applications, we will ask the reader to accept that the collection Iε(D), the partial order Pε(D) = (Iε(D), ◁), and its representation HεR(V,E) can be unambiguously constructed for any dataset D and any ε. Rigorous technical detail for the ε-extremal interval method is given in Section 4, along with information on the computational implementation [21]. Section 4 may be read before the applications in Section 3 if desired.

2. Graph theory preliminaries

Definition 1. A directed graph G(V,E) is a set of nodes (or vertices) V, together with a collection of edges E ⊆ V × V. A labeled, directed graph G(V, E, ℓ) has in addition a labeling function ℓ that assigns labels to the nodes and/or edges of the graph.

We will refer to all of unlabeled, node-labeled, and node- and edge-labeled graphs in this manuscript. One important example is a gene regulatory network, which is a node- and edge-labeled directed graph. Each node is labeled by the gene product that it represents, and every edge is labeled either as an activating (→) or repressing (⊣) edge. The gene regulatory network that we will explore in Application 1 is shown in Figure 3.

Figure 3:

Wavepool model of core genes involved in regulation of the yeast (S. cerevisiae) cell cycle [22].

Definition 2. A partial order P = (S, ≤) is a binary relation ≤ on the set S that is reflexive, antisymmetric, and transitive. A strict partial order P = (S, <) is a binary relation < on the set S that is irreflexive, antisymmetric, and transitive. A (strict) total order is a (strict) partial order where for any pair (a, b) ∈ S × S either a ≤ b (a < b) or b ≤ a (b < a). A linear extension of a (strict) partial order P is a (strict) total order T such that if a ≤ b (a < b) in P, then a ≤ b (a < b) in T.

In this manuscript, we will only be concerned with strict partial and strict total orders. Our notation reflects this, but for brevity we will often refer only to partial and total orders.

Every partial order can be represented as a directed graph called a Hasse diagram. To explain Hasse diagrams, it is useful to know the concepts of transitive closures and transitive reductions. A transitive closure adds a direct edge wherever there is a path between two nodes, and a transitive reduction removes an edge whenever there is a longer path from one node to another.

Definition 3. Let G(V, E) be a directed graph. A node j is reachable from a node i in the graph G if there exists a path

(i, i1), (i1, i2), …, (in, j)

such that each edge (ij, ij+1) ∈ E. The transitive closure of G(V, E) is the directed graph GC(V, E′) with E ⊆ E′ such that (i, j) ∈ E′ if and only if j is reachable from i in G. The transitive reduction of G(V, E) is the directed graph GR(V, E″) with the minimal set of edges E″ ⊆ E such that if the vertex j ∈ V is reachable from i ∈ V in the graph G, then j is reachable from i in GR.

Definition 4. Let P = (S, <) be a strict partial order on a finite set S. Let H(V, E) be a directed graph where the nodes V are in a bijection with the elements of the set S, and an edge (i,j) ∈ E if and only if i < j. The Hasse diagram HR(V, E″) of P is the transitive reduction of H.

Note that the graph H defined in Definition 4 is its own transitive closure, H(V, E) = HC(V, E), because the partial order P is transitively closed. For clarity, we will refer to HC rather than H in the text. The Hasse diagram HR plays a role in both Applications 1 and 2, and the transitive closure of the Hasse diagram HC is important in Application 2. We will use the notation HεR(D) and HεC(D) for a given dataset D and noise level ε.
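Both graphs are standard objects and, for small examples, can be computed with off-the-shelf tools. The snippet below uses networkx purely for illustration.

```python
import networkx as nx

# A strict partial order given by its relation pairs.  H is its own
# transitive closure HC; the Hasse diagram HR is the transitive reduction.
HC = nx.transitive_closure(nx.DiGraph([("a", "b"), ("b", "c")]))
HR = nx.transitive_reduction(HC)
print(sorted(HC.edges()))  # [('a', 'b'), ('a', 'c'), ('b', 'c')]
print(sorted(HR.edges()))  # [('a', 'b'), ('b', 'c')]
```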

Every partial order P is associated to a unique distributive lattice. This is the content of Birkhoff’s Representation Theorem (see Chapter 5 in [23]).

Definition 5. Let P = (S, <) be a strict partial order over a finite set S. A down-set is any set Q ⊆ S such that if a ∈ Q and b < a, then b ∈ Q. A down-set lattice is the partial order 𝒫 = (O(P), ⊆), ordered by set inclusion, where O(P) is the collection of down-sets of P.

The down-set lattice has a minimal element, the empty set, and a maximal element, S. In the Hasse diagram of the down-set lattice, the two nodes associated to these sets are called the root and the leaf respectively. For brevity, we will use the term down-set lattice interchangeably with the Hasse diagram of the down-set lattice.
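For the small partial orders arising in our examples, the down-sets of Definition 5 can be enumerated by brute force. The sketch below is a hypothetical helper, not code from [21].

```python
from itertools import combinations

def down_sets(elements, less_than):
    """Enumerate the down-sets of a strict partial order by brute force
    (adequate for the small Hasse diagrams in this paper)."""
    result = []
    for r in range(len(elements) + 1):
        for subset in combinations(elements, r):
            Q = set(subset)
            # Q is a down-set if it contains every element below its members.
            if all(b in Q for a in Q for b in elements if less_than(b, a)):
                result.append(Q)
    return result

# The poset a < b, a < c (with b and c incomparable) has five down-sets.
lt = lambda x, y: (x, y) in {("a", "b"), ("a", "c")}
print(sorted(map(sorted, down_sets(["a", "b", "c"], lt))))
# [[], ['a'], ['a', 'b'], ['a', 'b', 'c'], ['a', 'c']]
```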

Remark 1. The paths from root to leaf in the down-set lattice are in a one-to-one correspondence with the collection of linear extensions of P. This observation plays a central role in Application 1.

Definition 6. A graph distance is a non-negative, real-valued, symmetric function d(G1, G2) acting on two graphs G1 and G2 that satisfies the triangle inequality and that is zero if and only if G1 = G2.

The graph distance described in Section 4.3.3 for node-labeled, directed graphs is used in Application 2 to assess the similarity between two time series datasets.

3. Applications

3.1. Model rejection via pattern matching

3.1.1. DSGRN

In our previous work [12] we introduced a method to describe the global dynamics of a regulatory network for all parameters. This approach, named DSGRN (Dynamic Signatures Generated by Regulatory Networks) [17], uses combinatorial dynamics generated by switching systems [13, 14, 15, 16] to construct a database of all possible dynamics that a network may exhibit.

The core procedure of DSGRN [12, 17] is the following:

  1. A switching system ODE model is associated to a gene regulatory network. The dimension of phase space is the number of nodes in the network, N. The dimension of parameter space is 3M + N, where M is the number of edges in the network.

  2. The form of switching systems allows the parameter space ℝ+^(3M+N) to be decomposed into a finite number of semi-algebraic regions. We call each region a DSGRN parameter.

  3. The form of switching systems also admits a decomposition of phase space ℝ+^N into a finite number of rectangular regions (boxes). The number of boxes depends on the number of discrete states that each gene product can attain, which is exactly one more than the number of out-edges from the corresponding node.

  4. The dynamics of the switching system for a DSGRN parameter are captured by a state transition graph, in which each node is associated to one of the boxes in phase space, and edges capture the direction of the normal component of the vector field at a boundary between two boxes. This assignment is consistent if there is no negative self-regulation in the regulatory network [14]. A solution trajectory of the switching system then corresponds to a path in the state transition graph.

DSGRN produces a finite collection of state transition graphs that captures the parameter dependence of the dynamics across all of parameter space.
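As a toy illustration of the switching-system form (this is not DSGRN's interface, and the parameter values are made up), the sketch below integrates a two-node network in which X activates Y and Y represses X. Production rates switch between a low value and a high value when a regulator crosses its threshold, so the right-hand side is piecewise linear.

```python
# Illustrative parameter values only.
L, U, gamma = 0.5, 2.0, 1.0          # low/high production, decay rate
theta_x, theta_y = 1.0, 1.0          # switching thresholds

def rhs(x, y):
    dx = -gamma * x + (U if y < theta_y else L)   # Y represses X
    dy = -gamma * y + (U if x > theta_x else L)   # X activates Y
    return dx, dy

# Forward-Euler integration: each threshold crossing of the trajectory
# corresponds to a transition between boxes in the state transition graph.
x, y, dt = 0.2, 0.2, 0.01
trajectory = [(x, y)]
for _ in range(5000):
    dx, dy = rhs(x, y)
    x, y = x + dt * dx, y + dt * dy
    trajectory.append((x, y))
```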

Important to the application here, the nodes of the state transition graph can be labeled by whether each gene product is increasing (I), decreasing (D), or both (*) in the corresponding phase space box. Likewise, the edges can be labeled by which variable is attaining an extremum between boxes (M for maximum, m for minimum, a dash for neither). By assumptions on the switching system, only one gene product may attain an extremum at a time. An example labeled state transition graph for a fixed DSGRN parameter for the network in Figure 3 is shown in Figure 4.

Figure 4:

Example state transition graph for the network in Figure 3. This is a state transition graph as described in the text, with 4-symbol labels indicating whether a variable is increasing (I), decreasing (D), or both (*) in the corresponding partition of phase space. The order of the symbols is SWI4, HCM1, NDD1, and YOX1. The edge labels indicate the possibility of a local maximum (M) or minimum (m) for each variable. The dash indicates that no extremum is possible for that variable.

3.1.2. DSGRN model consistency with a dataset

In a recent paper [1], we compared a DSGRN model of molecular regulation to experimentally observed time series. We proposed that a network can be rejected as a model of a biological system that produces the experimental dataset D if there is no DSGRN parameter at which a path in a DSGRN state transition graph is “consistent” with the partial order Pε(D) derived from a dataset.

A path is consistent with partial order Pε(D) if it is a linear extension of Pε(D). Any linear extension of Pε(D) is a total order of the extrema consistent with the time discretization and the measurement error level ε. Therefore a path in the state transition graph that is a linear extension of Pε(D) represents a solution trajectory that has an order of extrema consistent with noise level ε in the data. If such an extension exists, the network cannot be rejected as a valid model.

Because we are seeking a linear extension, we make use of the down-set lattice structure introduced in Definition 5, since it is a summary of all linear extensions of Pε(D). First, the lattice is augmented with labels. In the Hasse diagram HεR(D) of Pε(D), the nodes can be naturally labeled with extrema. Therefore one can unambiguously label the edges of HεR(D) based on whether the gene product is increasing or decreasing, which is opposite to the labeling on the state transition graph. We showed that there is a natural way to assign the dual labeling to the down-set lattice. In other words, the lattice can be labeled unambiguously with extrema on the edges and increasing or decreasing behavior on the nodes. Second, the lattice is augmented with self-edges at every node. Since gene products may monotonically increase or decrease in concentration across multiple boxes before reaching an extremum, self-loops were added to represent this dwell time. The resulting graph is called a pattern graph, and has a unique root node and a unique leaf node. An example is shown in Figure 6, based on the time series introduced in Figure 5.

Figure 6:

Corresponding pattern graph for H0.15R(D) in Figure 5 (lower right). The 4-symbol node labels with I (increasing) and D (decreasing) and the edge labels with m (minimum) and M (maximum) symbols have the same meaning as in Figure 4. Notice the unique root and unique leaf.

Figure 5:

S. cerevisiae time series data [28] (upper right) and example HεR(D) for the data at ε = 0.0 (left) and ε = 0.15 (lower right). The arrow of time points downward on the Hasse diagrams.

We seek a pair of paths with matching labels on both nodes and edges, where one path goes from root to leaf in the pattern graph and the other path is in the state transition graph. The matching relation between labels that we define allows for non-exact matches; namely the symbol * is allowed to match both I and D. This is a type of approximate graph matching [24, 25, 26, 27]. If such a pair exists, then the model is consistent with the data and cannot be rejected. In [1], we formulate this consistency problem as a graph theory problem with a polynomial time algorithm.
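The matching relation on individual labels is easy to state in code. The sketch below (with hypothetical helper names) captures only the wildcard rule; the full path-matching algorithm is described in [1] and implemented in [17].

```python
def symbols_match(stg_symbol, pattern_symbol):
    """In a state transition graph label, '*' matches 'I', 'D', and itself."""
    return stg_symbol == pattern_symbol or stg_symbol == "*"

def labels_match(stg_label, pattern_label):
    # Labels are strings such as 'ID*D', one symbol per gene.
    return len(stg_label) == len(pattern_label) and all(
        symbols_match(s, p) for s, p in zip(stg_label, pattern_label))

assert labels_match("I*DD", "IIDD")
assert not labels_match("IDDD", "IIDD")
```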

In Figure 5 of [1], the partial order Pε(D) was calculated by hand via visual inspection of the data. We illustrate the approach developed in this paper on the example from [1].

3.1.3. Cell cycle data model

In Figure 5 (upper right), we present microarray time series data from the yeast S. cerevisiae, which was normalized between −0.5 and 0.5. The data were published in [28], and have here undergone shifting via CLOCCS analysis [29] and smoothing via polynomial splines, as described in Appendix A. These data are similar to those in [1] (Section 4.2, data published in [30]), which are data collected on the same genes from the same organism in the same lab, but using RNAseq technology rather than the microarray platform.

We show two partial orders arising from these data at two noise levels using the ε-extremal interval method. The Hasse diagrams are shown in Figure 5 at ε = 0.0 (left) and 0.15 (lower right). The number of extrema at a noise level of 0 is 28, and drops to 16 at a noise level of 0.15. The partial order at ε = 0.0 is far more restrictive (and thus looks closer to a total order), because few of the intervals overlap, and more order relations are known. The pattern graph associated to the partial order at ε = 0.15 is shown in Figure 6.

The regulatory network in Figure 3 is a simplified version of the wavepool model, introduced in [22]. This model has been corroborated experimentally [31, 32, 33] and describes the mechanism for controlling the cell-cycle transcriptional program.

In [1], we verified that the wavepool model is consistent with the data. We find the same result here. In particular, there is a pair of matching paths between Figures 4 and 6 of length 17:

(18,38) → (6,37) → (7,34) → (8,32) → (2,31) → (5,42) → (4,25) → (3,22) → (9,21) → (6,19) → (7,65) → (8,63) → (2,13) → (5,80) → (4,7) → (3,4) → (0,0)

The first number is the integer node label in Figure 4, and the second is the integer node label in Figure 6. The first node pair is at the root of the pattern graph (node 38) and the last node pair is at the leaf of the pattern graph (node 0). It can be verified that the labels at each pair of nodes match, remembering that the wild card character * matches itself, I, and D. Likewise, the edge labels match as well. So at the DSGRN parameter that produced the state transition graph in Figure 4, the wavepool network model cannot be rejected.

3.1.4. Global assessment of the network model

We seek to characterize the performance of the wavepool model across parameter space and across various levels of noise. Recall that we use the variable ε to denote a band of noise around a time series. Recall also that the partial order representing the dataset Pε(D) depends on ε, so that the pattern graph depends on ε as well. We independently normalized each time series in the dataset between −0.5 and 0.5 so that ε represents a percentage of the distance between the global maximum and global minimum of each time series in the dataset.

The wavepool network in Figure 3 has 1080 DSGRN parameters. For ε values ranging from 0.0 to 0.15, we searched for pairs of matching paths between the pattern graph and the state transition graph at each DSGRN parameter using the DSGRN pattern matching algorithm described in [1] and implemented in [17]. See Figure 5 for the partial orders P0.0(D) and P0.15(D) showing the extremes of the representation of the dataset.

The results are summarized in the first row of Table 1. There are 22 DSGRN parameters in the wavepool network that exhibit consistent dynamics with the time series in Figure 5 (upper right) for noise levels ε = 0.01 through 0.04. This is the same number of DSGRN parameters with path matches found in [1].

Table 1:

Number of parameters with at least one match to Pε(D).

ε           0.0   0.01   0.04   0.05   0.06   0.08   0.09   0.10   0.14   0.15
Figure 3      0     22     22     12     12     42     24     24     24     24
Figure 7      0      0      0      0      0      0      0      0     22     22

Although the number of matches is the same, none of the partial orders Pε(D) match the one computed by hand in [1]. This is likely due to the difference in sampling times; the RNAseq data in [1] have a sampling interval of 5 minutes, while the microarray data have a sampling interval of 16 minutes. This discrepancy could produce different resolutions in peak detection, and therefore change the representative partial orders. It is also possible that the CLOCCS shifting of the microarray dataset resulted in a small phase shift with respect to the RNAseq data, which could alter the relative locations of the extrema.

In addition to the difference in partial orders, we also do not know if the collection of parameters at which there are pattern matches is the same between this work and the previous one. However, the fact that the model has comparable performance under different data collection platforms and sampling intervals highlights both the reproducibility of the performance of the S. cerevisiae cell cycle, and the power of this technique to identify/reject models based on time series data.

We note that at ε = 0, the wavepool model has no matches to the partial order in Figure 5 (left). Without considering noise, we would be tempted to reject the hypothesis that the wavepool model is consistent with the experimental data.

However, the ability to scan through different potential noise levels reveals that the wavepool model cannot be rejected, as it consistently matches the data over a range of small noise levels.

Finally, we observe that the number of matches does not increase monotonically with ε. This may at first seem counterintuitive, since as intervals grow larger and overlap, the number of constraints in the partial order is reduced. However, nodes in the graph representing shallow extrema in the time series disappear as ε grows larger, which changes the size of the graph. Under these conditions, there is no guarantee of monotonicity. In our example here, the number of nodes in the partial order decreases from 28 to 16 as ε increases from 0 to 0.15. At ε = 0.08, we find the highest number of parameters with matches, 42.

3.1.5. Model rejection

In a second numerical experiment, we propose a different model for the same time series data, depicted in Figure 7. Here we swapped the positions of the genes NDD1 and SWI4 compared with the first model, thus creating an "incorrect" model for the data. In [1], no matches were found for this model using the partial order computed by hand and shown in Figure 5 of that work. The number of matches that we find now for this network is shown in the second row of Table 1. We observe that there are no matches in this network until noise level ε = 0.14, so that we again agree with [1] in the range ε = 0.01 through 0.04. At a noise level of ε = 0.14, the band of uncertainty around the data is 28% of the difference between the global maximum and the global minimum in the normalized data. This is a very high level of noise, and we hypothesize that the partial order at this noise level does not sufficiently constrain the model, leading to matches with models that are inconsistent with experimental results. The large range of noise levels with zero matches would lead us to reject the network in Figure 7 as a description of the biological mechanism that produced the time series data.

Figure 7:

A model in which two genes of the wavepool model are swapped, which our methodology invalidates.

3.2. Quantifying similarity between replicate experiments

In this application, we seek to quantify the similarity between two replicates of the same experiment with datasets D1 and D2. We construct for each dataset the partially ordered set of ε-extremal intervals, Pε(D1) and Pε(D2), and represent them by the transitive closures of their Hasse diagrams, HεC(D1) and HεC(D2), as in Definitions 3 and 4. We then calculate a graph distance between HεC(D1) and HεC(D2), which is roughly the proportion of non-shared edges between the two graphs; see Section 4.3.3 for a precise definition. The distance d(HεC(D1), HεC(D2)) that we use is scaled so that it has a range between 0 and 1. We say that the similarity between two datasets D1 and D2 is given by

s(D1, D2) = 1 − d(HεC(D1), HεC(D2)).

This similarity measure gives roughly the proportion of shared edges between HεC(D1) and HεC(D2).
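As a rough stand-in for the distance of Section 4.3.3, which is defined precisely there, the sketch below scores two transitive closures by their proportion of shared edges. It assumes that node labels are directly comparable across the two graphs.

```python
import networkx as nx

def similarity(HC1, HC2):
    """Proportion of shared edges between two transitive closures (a
    rough stand-in for the graph distance of Section 4.3.3)."""
    e1, e2 = set(HC1.edges()), set(HC2.edges())
    if not e1 and not e2:
        return 1.0
    return len(e1 & e2) / len(e1 | e2)

G1 = nx.DiGraph([("YOX1_max", "NDD1_min"), ("NDD1_min", "YOX1_min")])
G2 = nx.DiGraph([("YOX1_max", "NDD1_min")])
print(similarity(nx.transitive_closure(G1), nx.transitive_closure(G2)))
```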

In order for the comparison of the two datasets to be biologically relevant, they must be synchronized at the same point in the yeast cell cycle. In the datasets we consider, the time series were processed to align with the yeast cell cycle using the techniques described in Appendix A. After this processing, we normalized the data to the range [−0.5, 0.5] and truncated so that most of the time series exhibited one period, as shown in Figure 8. The truncation limits the computation time of the graph distance, and focuses the analysis on the highest and most synchronized peaks.

Figure 8:

Microarray yeast cell cycle data from [28], processed as described in Appendix A. (Top) Replicate 1. (Bottom) Replicate 2.

To illustrate the properties of the similarity measure, we perform four different comparison experiments.

  1. We first concentrate on only four genes, SWI4, YOX1, NDD1, and HCM1. We denote by D̃1 and D̃2 the time series of these four genes extracted from experiments D1 and D2, respectively (see Figure 9, top row). We calculate the similarity s(D̃1, D̃2) over a range of ε.

  2. We compute s(D̃1, D̃2′), where D̃2′ is the dataset formed by the time series SWI4, CLB2, NDD1, and HCM1. Here we replace the time series of YOX1 in the second dataset by the time series of CLB2, which can be seen in Figure 8.

  3. We compare the same data as in (2), but we mislabel CLB2 in D̃2′ as YOX1. We call the mislabeled dataset D̃3. The calculation of s(D̃1, D̃3) shows the effect of replacing the YOX1 data in D̃2 by a different time series. The comparison between experiments (2) and (3) gives an idea of the impact on the distance measure when there are non-matching gene labels in the partial orders.

  4. Lastly, we compare all of the time series by randomly sampling four genes to construct datasets D̂1 and D̂2 one hundred times, and calculating s(D̂1, D̂2) for each sample over a range of ε (see the sketch after this list). The mean of these curves is taken to be representative of the full dataset. The same experiment is performed with random samples of eight genes to show the dependence of the results on gene sample size.
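The following sketch outlines the sampling loop of experiment (4). The functions partial_order and similarity are hypothetical stand-ins for the constructions of Sections 4.2 and 4.3.3.

```python
import random
from statistics import mean, stdev

def mean_similarity(rep1, rep2, genes, sample_size, eps, trials=100):
    """rep1 and rep2 map gene names to time series; partial_order and
    similarity are hypothetical stand-ins for Sections 4.2 and 4.3.3."""
    sims = []
    for _ in range(trials):
        sample = random.sample(genes, sample_size)
        P1 = partial_order({g: rep1[g] for g in sample}, eps)  # hypothetical
        P2 = partial_order({g: rep2[g] for g in sample}, eps)  # hypothetical
        sims.append(similarity(P1, P2))                        # hypothetical
    return mean(sims), stdev(sims)
```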

Figure 9:

(Top left) Time series for replicate 1. (Top right) Time series for replicate 2. (Bottom left) Hasse diagram H0.01R(D̃1) for replicate 1 at ε = 0.01. (Bottom right) Hasse diagram H0.01R(D̃2) for replicate 2 at ε = 0.01. The arrow of time points downward on the Hasse diagrams.

Experiment 1:

We compute ε-extremal intervals to produce a partially ordered set of extrema for each of D̃1 and D̃2. As an example, the Hasse diagrams H0.01R(D̃1) and H0.01R(D̃2) for ε = 0.01 are shown in the bottom row of Figure 9. Although we use the transitive closures HεC(D̃i) to calculate distances, the transitive reductions are shown for simplicity, so that the structures of the partial orders are easier to compare.

We repeat this procedure at noise levels ε = 0 to 0.15 at intervals of 0.005. The similarity s(D̃1, D̃2) was calculated at each ε and is shown as the blue curve in Figure 10. Notice that the similarity between the partially ordered sets goes to 1 at larger values of ε and varies between 0.7 and 1.0 over the whole range of ε. To assess if this represents strong or weak similarity, we perform two other numerical experiments.

Figure 10:

Similarity as a function of ε. Blue curve: Experiment 1, s(D̃1, D̃2). Orange curve: Experiment 2, s(D̃1, D̃2′). Green curve: Experiment 3, s(D̃1, D̃3).

Experiment 2:

We replace the second dataset D̃2 by the dataset D̃2′ composed of genes SWI4, CLB2, NDD1, and HCM1, and we show the similarity s(D̃1, D̃2′) as the orange curve in Figure 10. Since the YOX1 extrema in replicate 1 are being compared with the CLB2 extrema in replicate 2, the distance between the partially ordered sets is larger than in Experiment 1. The similarity between the partial orders ranges between about 40% and 55%. This gives an idea of the distance when nodes cannot be matched across partial orders because of a single time series swap.

Experiment 3:

We relabeled all "CLB2" extrema in replicate 2 with "YOX1" labels to get the dataset D̃3. This allows us to compare the distance when the curve shape of CLB2 is used in place of the true YOX1 data. The resulting similarities s(D̃1, D̃3) are shown as the green curve in Figure 10. By comparing the orange curve (Experiment 2) with the green curve (Experiment 3) in Figure 10, it can be seen that having mismatched labels contributes substantially to the dissimilarity of time series. However, even when the mismatched labels are artificially removed by relabeling, we see by comparing the blue curve (Experiment 1) with the green curve (Experiment 3) in Figure 10 that using the same set of time series in both replicates leads to a noticeably higher similarity over most of the range of ε.

Experiment 4:

We now consider all the time series in the dataset, as shown in Figure 8, to quantify the similarity of the two replicates as a function of noise. Because our algorithm for graph distance does not scale favorably with the size of the graph, we chose to (a) sample ε more coarsely from 0 to 0.15 at intervals of 0.01, and (b) pick at random one hundred samples of four and eight genes each, and then calculate the mean similarity. The resulting curves are shown in Figure 11 with samples of 4 genes on the left and 8 genes on the right. The thick black line indicates the mean and ±1 standard deviation of the one hundred samples shown in grey. The mean on both curves ranges between about 75-95% similarity, with the bulk of the values over 80% similar. The standard deviation decreases substantially with increasing gene sample sizes, suggesting that sampling subsets of genes is a reasonable proxy for calculating the similarity of the whole dataset.

Figure 11:

Similarity s(D̂1, D̂2) as a function of ε. Thin gray curves: one hundred samples of 4 genes each (left) and 8 genes each (right) from the 16 genes listed in the legend of Figure 8. Thick black curve: Mean of the one hundred samples with ±1 standard deviation.

4. Methods

The applications in the previous section are dependent on the representation of a time series dataset D by a partial order Pε(D) over time intervals Iε(D) representing the location of extrema up to a noise level ε. We now present in detail the construction of the intervals Iε(D).

We begin in Section 4.1 by establishing the theory for continuous functions. For f : [x1, x2] → ℝ a continuous function on a closed interval [x1, x2], we present an approach that finds a collection of intervals Jε(f), called ε-extremal intervals, with the property that any continuous function g whose values are within measurement error ε of f is guaranteed to attain a local extremum in each interval I ∈ Jε(f).

Our main tool is the notion of the merge tree of f [18, 19], which we use to define a new object, called the normalized branch decomposition of f on [x1, x2]. The normalized branch decomposition allows us to construct a collection of ε-minimal intervals on [x1, x2], such that every continuous function g that remains within the bounds f − ε and f + ε is guaranteed to achieve a minimum in each ε-minimal interval. By taking −f and applying the same method, we construct ε-maximal intervals, in which every perturbation g of f bounded by ε is guaranteed to attain a local maximum. The union of the ε-minimal and ε-maximal intervals forms the set Jε(f).

The extension of merge trees and branch decompositions to discrete time series is straightforward [34]. We show in Section 4.2 that ε-extremal intervals Iε can also be assigned to a discrete time series by using the linear interpolation to construct a continuous function f. With this view, most of the same theorems hold for discrete time series as for general continuous functions.

In both discrete and continuous cases, there is a total order on the ε-extremal intervals for a single function f. In other words, the ε-extremal intervals represent a sequence of extrema of f that can be trusted up to measurement error level of ε. The total orders associated to a collection of functions {fi} derived from a time series dataset can be extended to a partial order on the extrema of {fi}.

In Section 4.3, we discuss Algorithms 1 and 2 of [34] that are used to compute merge trees and branch decompositions for a set of time series. We also discuss algorithms derived from Section 4.2 for constructing ε-extremal intervals, partial orders, and a graph distance for partial orders. We provide a repository [21] in Python 3.7 that implements all the algorithms.

4.1. ε-extremal intervals for continuous functions

Consider a continuous function f : [x1, x2] → ℝ defined on a closed interval [x1, x2] ⊂ ℝ.

Definition 7. Let C([x1, x2]) denote the space of continuous functions g : [x1, x2] → ℝ, endowed with the supremum norm. For ε ≥ 0, define

Nε(f) := {g ∈ C([x1, x2]) : ‖f − g‖ < ε}

to be the ε-neighborhood of f. A function g ∈ Nε(f) will be called an ε-perturbation of f.

Given ε ≥ 0, we would like to compute a collection of intervals Jε(f) := {Iiε}, Iiε ⊆ [x1, x2], not necessarily disjoint, such that

  1. every g ∈ Nε(f) attains either a minimum or a maximum in each Iiε, and

  2. for any nonempty J ⊊ Iiε, there exists some h ∈ Nε(f) such that h does not attain a maximum or a minimum in J.

The collection Jε(f) represents a set of extrema corresponding to a noise level of ε.

To construct Jε(f) for all ε, we use merge trees [18, 20, 19, 35] to construct an associated object, which we call the normalized branch decomposition. We will show that the normalized branch decomposition provides the proper framework to allow us to associate a collection Jε(f) to all ε > 0.

4.1.1. Merge trees

The merge tree of a real-valued function, f, captures the connectivity of the sublevel sets, f−1(−∞, h], for each h ∈ ℝ, similar to how the Reeb graph [18] captures the connectivity of the level sets of a function. Here, we recall the definition of the merge tree associated to a function, and refer the reader to [18] and [19] for more details.

Definition 8. The merge tree of f, denoted Tf, is defined to be the quotient space

Tf := Γ(f)/∼,

where Γ(f) denotes the graph of f, and for x, y ∈ Γ(f), we declare x ∼ y if there exists an h ∈ ℝ such that both x and y belong to the same level set of f, f−1(h), and also to the same connected component of the sublevel set, f−1(−∞, h].

See Figure 12 (b) for an example of a merge tree. To visualize the construction of the merge tree, imagine a horizontal line sweeping upward from the bottom of the time series depicted in Figure 12 (a). An intersection of such a line with a local minimum corresponds to a leaf of the merge tree, where the leaves are located at the bottom of the merge tree, and an intersection with a local maximum corresponds to an internal node of the merge tree. Notice that each node of Tf, whether maximum or minimum, is associated to a time ti at which the extremum is located and a height f(ti) of the extremum. Denote by {ℓi} the leaves and by {mi} the internal nodes of the merge tree. For ℓi a leaf, denote its height by ai = f(ti), and similarly for mi an internal node, bi = f(ti).

Figure 12:

(a) The graph of a function with height h vs time t, (b) its corresponding merge tree, (c) the branch decomposition represented as a colored graph, and (d) the normalized branch decomposition.

Figure 14:

From left to right: the graph of a function, its corresponding dual merge tree, the dual branch decomposition, and the normalized dual branch decomposition.

Note that while the left hand endpoint in Figure 12 (a) is a local maximum, it does not play an important role in the merge tree, Figure 12 (b), because no new topological features appear or merge with other features at that height. The endpoints are the only extrema that might not play a significant role in the merge tree.

4.1.2. Normalized branch decomposition

A branch decomposition is a way of partitioning a merge tree that pairs up the appearance (i.e. birth) and the disappearance (i.e. death) of minima as a function of sublevel height h [20]. The birth heights are given by the values {f(ti)} associated to leaves ℓi, and the death heights {f(tk)} correspond to the internal nodes of the merge tree. At each internal node, at least two branches merge. We choose to continue the branch with the lowest birth height and terminate all other branches.

This is unambiguous when the branches start at distinct heights. Although having distinct minima is a generic property in continuous functions, experimental time series may be measured only up to some finite resolution, in which case equal height values at which different minima appear may be common. Therefore, in the case that the branches do not start at distinct heights, we arbitrarily decide to continue the branch with the lowest height that also occurs first in time. Other choices are reasonable and would induce different branch decompositions.

Definition 9. We define a total order ≤ on the leaves of the merge tree {ℓi} by saying ℓi ≤ ℓj if one of the following holds:

  1. ai < aj, or

  2. ai = aj and ti < tj.

This defines an indexing on the leaves of Tf that satisfies ℓi ≤ ℓi+1 for all i, starting at i = 0.

Definition 10. Let mk be an internal node of Tf, with associated height bk. Define Smk to be the subtree of Tf that is rooted at mk. We define the branch decomposition of f to be the collection of intervals

  1. [a0, b0], where b0 is the global maximum of f; and

  2. [ai,bk) whenever Smk is the largest subtree satisfying
    1. ℓi is in the subtree Smk and
    2. ℓi ≤ ℓj for all ℓj ∈ Smk with i ≠ j.

The branch decomposition can be viewed as a partition of the merge tree, as shown in Figure 12 (c).

Definition 11. The normalized branch decomposition of f : [x1, x2] → ℝ is a collection of intervals

B(f) := ⊔i Ji,

where ⊔ denotes disjoint union, J0 := [0, b0 − a0], and Ji := [0, bk − ai) for i > 0 are defined using the branch decomposition of f. Recall that each Ji is uniquely associated to a leaf ℓi, with value ai = f(ti), and its time of occurrence ti. We say that ti is the representative of Ji.

Note that by representing the collection (Ji) as disjoint intervals, we can visualize the normalized branch decomposition as a “barcode”-like summary, which we show in Figure 12 (d).

4.1.3. ε-minimal intervals

We now establish properties of the normalized branch decomposition that allow us to associate a collection of intervals Jε(f) to each parameter ε. First, we describe how to obtain the intervals corresponding to local minima, and then we dualize this procedure to obtain intervals corresponding to the local maxima.

Definition 12. Fix ε > 0. Let Bε ⊆ B(f) be the collection of all intervals in the normalized branch decomposition B(f) that are longer than 2ε,

Bε = {Ji ∈ B(f) : |Ji| > 2ε}.

Let Ji = [0, bk − ai) ∈ Bε and consider its representative ti. Define φ(Ji) to be the connected component of (f − ε)−1(−∞, ai + ε) that contains ti. Clearly, φ(Ji) is a well-defined relatively open interval in [x1, x2]. We define the collection of ε-minimal intervals, denoted

Jεmin(f) := {φ(Ji) | Ji ∈ Bε}

to be the collection of all such intervals. For Ii = φ(Ji), we say that ti is the representative of Ii as well as of Ji.
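On a sampled grid, φ(Ji) can be approximated by expanding outward from the representative while the sampled values of f stay below ai + 2ε, since f(t) − ε < ai + ε is equivalent to f(t) < ai + 2ε. This sketch ignores the exact points where the linear interpolation crosses the level, which the rigorous construction uses.

```python
def phi_on_grid(heights, times, rep_idx, a_i, eps):
    """Approximate phi(J_i): the connected component of
    {t : f(t) - eps < a_i + eps} containing the representative t_i,
    evaluated only at the sample points (a sketch)."""
    below = [h < a_i + 2 * eps for h in heights]
    lo = hi = rep_idx
    while lo > 0 and below[lo - 1]:
        lo -= 1
    while hi < len(heights) - 1 and below[hi + 1]:
        hi += 1
    return times[lo], times[hi]
```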

See Figure 13 for a depiction of the action of φ. The following Proposition shows that we cannot have overlapping ε-minimal intervals.

Figure 13:

A graph of a function, f, as well as f ± ε. While (f − ε)−1(−∞, ai + ε) consists of both the dark blue and light blue intervals, φ(Ji) is just the connected component that contains the representative, ti.

Proposition 1. For any Ii, Ij ∈ Jεmin(f) with i ≠ j, Ii ∩ Ij = ∅.

Proof. Consider Ii = φ(Ji) for some Ji = [0, bk − ai) ∈ Bε, and Ij = φ(Jj) for some Jj = [0, bℓ − aj) ∈ Bε, their representatives ti and tj, and their leaves in the merge tree ℓi and ℓj, respectively. Since ti and tj are locations of minima of the continuous function f, there must exist at least one maximum of f between ti and tj. Let bq denote the height of the highest local maximum between the two minima, with location tq ∈ (ti, tj). Notice that if bq < bk or bq < bℓ, then the two leaves ℓi and ℓj are in the same subtree rooted at one of the internal nodes mk or mℓ associated to bk and bℓ respectively. This contradicts the fact that Ii, Ij correspond to distinct branches in the branch decomposition. So bq ≥ bk and bq ≥ bℓ. Then by Definition 12,

bq − ai ≥ bk − ai > 2ε  and  bq − aj ≥ bℓ − aj > 2ε,

so that tq ∉ (f − ε)−1(−∞, ai + ε) ⊇ Ii and tq ∉ (f − ε)−1(−∞, aj + ε) ⊇ Ij. Since tq ∈ (ti, tj) with ti ∈ Ii and tj ∈ Ij, this establishes that Ii ∩ Ij = ∅. □

Assume f : [x1, x2] → ℝ is a continuous function and assume B(f) is its corresponding normalized branch decomposition.

Proposition 2. Fix ε > 0. Then any g ∈ Nε(f) attains a local minimum in the relative interior of every interval I ∈ Jεmin(f).

Proof. Note that any interval I ∈ Jεmin(f) has a corresponding interval Ji ∈ Bε ⊆ B(f) given by I = φ(Ji). By construction of B(f), f attains a local minimum ai in I, since ai = f(ti) and ti ∈ I.

We now consider several cases.

  • First, assume I = (y1, y2) lies away from the endpoints of [x1, x2]; i.e., yi ≠ xi, i = 1, 2. Since g ∈ Nε(f) and f(ti) = ai, it follows that g(ti) < ai + ε. But by definition of I we have ai + ε = f(y1) − ε, because otherwise y1 ∈ I. Therefore
    ai + ε = f(y1) − ε < g(y1),
    and by a similar argument ai + ε < g(y2). It follows that
    g(ti) < g(y1)  and  g(ti) < g(y2),
    and by continuity g attains a local minimum in I.
  • Next, assume that I = [x1, y), and g ∈ Nε(f). If g attains a local minimum at x1, then the proof is complete; so assume that g(x1) is not a local minimum of g in I. As before, we have that g(y) > ai + ε and g(ti) < ai + ε. If g(x1) ≥ ai + ε, then by continuity, g must attain a local minimum in the interior of I. Finally, assume that g(x1) < ai + ε. Since g does not attain a local minimum at x1, there exists some x̃ ∈ I such that g(x̃) < g(x1) < g(y). By continuity, g must attain a minimum in the interior of I.

  • The case where I = (y, x2] follows from a similar argument to that of the previous case.

Let Ī denote the closure of the interval I.

Proposition 3. Consider I ∈ Jεmin(f). For any J̄ ⊊ Ī, there exists a strictly monotone function g : J̄ → ℝ such that g ∈ Nε(f|J̄).

Proof. Let Ī = [y1, y2] and J̄ = [z1, z2]. We prove the case z1 = y1, with an analogous argument proving the case z2 = y2. Recall that f attains a minimum in I, ai = f(ti). First assume that the minimum is located in the interior of the subinterval, ti ∈ (z1, z2). Note that f(z2) < ai + 2ε, and for all z ∈ (ti, z2), we have f(z) ≥ ai. Therefore there exists δ2 > 0 such that

ai + ε + δ2 ∈ Nε(f(z2)) = (f(z2) − ε, f(z2) + ε).

Since z1 = y1, there exists some δ0 > 0, such that

ai + ε − δ0 ∈ Nε(f(z1)).

Finally, there is δ1 < δ0 such that

ai + ε − δ1 ∈ Nε(f(ti)).

We define two increasing linear functions g1 : [z1, ti] → ℝ and g2 : [ti, z2] → ℝ by

g1(z1) = ai + ε − δ0,  g1(ti) = ai + ε − δ1;  and  g2(ti) = ai + ε − δ1,  g2(z2) = ai + ε + δ2.

The function g1 is increasing since δ1 < δ0, and g2 is also clearly increasing. Most importantly, the graph of g1 is contained in Nε(f|[z1, ti]) and the graph of g2 is contained in Nε(f|[ti, z2]). Since g1(ti) = g2(ti), the continuous function g : J̄ → ℝ defined by

g(x) = { g1(x), if x ∈ [z1, ti];  g2(x), if x ∈ [ti, z2] }

is in Nε(f|J̄). By construction, g is a piecewise linear, strictly increasing function. This establishes the result for the case ti ∈ (z1, z2). To finish the proof, we note that if ti < z1, the construction of g2 produces the desired function, and if z2 < ti, then the construction of g1 produces the desired function.

Corollary 1. For any non-empty, proper subinterval J ⊂ I with I ∈ Jεmin(f), such that J̄ ⊊ Ī, there exists some g ∈ Nε(f) such that g does not attain a local minimum in J.

We conclude that Jεmin(f) consists of intervals on which every ε-perturbation of f attains a local minimum, and furthermore, that these intervals are the smallest such intervals for which this is true. This justifies the name and notation of Jεmin(f). This collection robustly represents the minima of f up to precision ε.

4.1.4. Dual construction for local maxima

The simple observation that the maxima of f are the minima of −f leads to the following definition.

Definition 13. We define a collection of ε-maximal intervals Jεmax(f) by

Jεmax(f) := Jεmin(−f).

Given the above definition, we have the following corollary from Proposition 2 and Corollary 1.

Corollary 2. Fix ε > 0. Then any g ∈ Nε(f) attains a local maximum in every interval I ∈ Jεmax(f). Furthermore, for any nonempty, proper subinterval J ⊂ I with J̄ ⊊ Ī, there exists some h ∈ Nε(f) such that h does not attain a local maximum in J.

Therefore the collection Jεmax(f) robustly represents the maxima of f, which is a natural dual of Jεmin(f). The corresponding dual merge tree, dual branch decomposition, and dual normalized branch decomposition of the function f in Figure 12 are shown in Figure 14. To visualize the construction of the merge tree, imagine a horizontal line sweeping down from the top of the time series depicted in Figure 14 (a). An intersection of such a line with a local maximum corresponds to a leaf of a merge tree, where the leaves are located at the top of the merge tree, and an intersection with a local minimum corresponds to an internal node of the merge tree.

Definition 14. The collection of ε-extremal intervals is the collection

Jε(f) := Jεmin(f) ∪ Jεmax(f).

If Jε(f) = ∅, then we say f is ε-constant.

The motivation for the definition of ε-constant is that when Jε(f) = ∅, then b − a ≤ 2ε for any minimum a and maximum b, and so the extrema cannot be distinguished at noise level ε. We now show that the ε-minimal intervals are distinct from the ε-maximal intervals.

Proposition 4. Consider I ∈ Jεmin(f), represented by u, and J ∈ Jεmax(f), represented by v. Then u ∉ J and v ∉ I.

Proof. By the definition of representative, f(u) = a is a local minimum with u ∈ I and f(v) = b is a local maximum with v ∈ J. Then v ∉ I since otherwise

v ∈ (f − ε)−1(−∞, a + ε) ⟹ b = f(v) ≤ a + 2ε ⟹ b − a ≤ 2ε ⟹ I, J ∉ Jε(f).

A similar argument shows u ∉ J. □

Corollary 3. Jεmin(f) ∩ Jεmax(f) = ∅.

4.1.5. Total ordering in Jε(f)

Since f(x) is continuous, we expect that the ε-minimal and ε-maximal intervals must alternate. In this section, we define a total order on these intervals and prove that it is well-defined and that the extremal intervals alternate as expected. Technicalities occur because the ε-minimal intervals can overlap with the ε-maximal intervals; see Figure 15.

Figure 15:

A graph of a function, f, as well as f ± ε. Here, ti is a representative of a minimum with associated ε-minimal interval φ(Ji), and tj is a representative of the ε-maximal interval φ(Jj). Notice that φ(Ji) ∩ φ(Jj) ≠ ∅.

Definition 15. We define an order ◁ on Jε(f) as follows. Consider two relatively open intervals I, J ∈ Jε(f), with y1 < y2 the endpoints of I and z1 < z2 the endpoints of J. Then I ◁ J if and only if either y1 < z1 or y2 < z2.

Theorem 1. The order ◁ on Jε(f) is a well-defined total order.

Proof. Consider two relatively open intervals I, J ∈ Jε(f), with y1 < y2 the endpoints of I and z1 < z2 the endpoints of J. Let u represent I and let v represent J. To prove that ◁ is well-defined, we must show that y1 < z1 if and only if y2 < z2.

The case where I ∩ J = ∅ is trivial, so we consider I ∩ J ≠ ∅. It follows from Proposition 1 that when I ∩ J ≠ ∅, then either I ∈ Jεmin(f) and J ∈ Jεmax(f), or vice versa. Now it follows from Proposition 4 that

u < v ⟹ y1 < u ≤ z1 and y2 ≤ v < z2,

and similarly

v < u ⟹ z1 < v ≤ y1 and z2 ≤ u < y2.

This shows that the order ◁ is well-defined. It now follows from Corollary 3 that the order is total. □

Theorem 2. ε-minimal intervals alternate with ε-maximal intervals.

Proof. Consider two adjacent ε-minimal intervals I1 ◁ I2 with representatives t1 and t2 and associated local minima a1 = f(t1) and a2 = f(t2). Assume without loss of generality that a1 > a2. Then there exists t ∈ (t1, t2) such that t ∉ (f − ε)−1(−∞, a1 + ε), since otherwise I1 and I2 are not distinct. This implies that f(t) > a1 + 2ε and therefore b := max{f(x) : x ∈ [t1, t2]} > a1 + ε > a2 + ε. Thus there is an ε-maximal interval J containing the location of b. Since ◁ is a total order, this implies I1 ◁ J ◁ I2.

The argument ruling out two adjacent maxima is similar. □

4.1.6. Partial ordering between Jε(fi)

So far we have considered the representation of a single function f(t) in terms of a noise-level dependent collection of ε-minimal and ε-maximal intervals. However, many datasets include M functions fi on the same domain [x1, x2]. There is a natural partial order on the union of the ε-extremal intervals.

Definition 16. Define the partially ordered set

(⋃i≤M Jε(fi), ◁)

to be an extension of the total order on each Jε(fi) as follows. If I, J ∈ Jε(fi) for some i, define ◁ as in Definition 15. Now consider I ∈ Jε(fi) and J ∈ Jε(fj) for i ≠ j, with y1 < y2 the endpoints of I and z1 < z2 the endpoints of J. Then

I ◁ J  if and only if  y2 ≤ z1.

We choose this order because overlapping ε-extremal intervals across functions indicate that the order of the respective local extrema is not decidable at the noise level of ε.

4.2. ε-extremal intervals for discrete time series

Time series have values measured at only a finite collection of times. In this section, we consider this case in more detail.

Definition 17. A set D := {Di}, i = 1, …, M, is a dataset on the interval [x1, x2] if Di := {(zk, hki)}, k = 1, …, N, where

Z := {z1 = x1, z2, …, zN−1, zN = x2}

is an ordered set with zj < zj+1 and the heights hki are measurements of the ith variable at zk. The components Di will be referred to as time series.

The collection Z is independent of i. An example would be a collection of time series of gene expression, such as comes from RNAseq data. We note, however, that the independent variable is not required to be time.

Definition 18. Let fi be the linear interpolation of Di, and let Jε(fi) be the ε-extremal intervals of the linear interpolation. Define Kεmin(Di) to be the set of relatively open intervals in [x1, x2] with endpoints in the set Z such that for each Ii ∈ Jεmin(fi), there exists Ji ∈ Kεmin(Di) satisfying

  1. Ji ⊇ Ii, and

  2. Ji is the minimal such interval; i.e., there does not exist an interval Ki with endpoints in Z such that Ji ⊋ Ki ⊇ Ii.

We define a function βmin : Kεmin(Di) → Jεmin(fi) by βmin(Ji) = Ii that captures this relationship. Considering the linear interpolation −fi analogously yields a set of intervals Kεmax(Di) and a map βmax.
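The passage from Jεmin(fi) to Kεmin(Di) amounts to snapping each interval outward to the nearest grid points. A minimal Python sketch of this step, assuming a sorted list Z with x1 ≤ a < b ≤ x2 (the function name is hypothetical):

import bisect

def snap_to_grid(interval, Z):
    # smallest interval with endpoints in Z that contains (a, b),
    # cf. Definition 18
    a, b = interval
    left = Z[bisect.bisect_right(Z, a) - 1]   # largest z in Z with z <= a
    right = Z[bisect.bisect_left(Z, b)]       # smallest z in Z with z >= b
    return (left, right)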

Since every local extremum of the linear interpolation fi occurs at one of the points zj ∈ Z, it is easy to see that the proof of Proposition 1 is still valid. Therefore we have

Proposition 5. For any distinct I, J ∈ Kεmin(Di), I ∩ J = ∅.

Note that choosing fi to be the linear interpolation of Di is critical in order for this proposition to hold.

Definition 18 is conservative in the sense that a minimum is guaranteed to occur within each interval I of Kεmin(Di), since I contains βmin(I) ∈ Jεmin(fi). In other words, Proposition 2 still holds for Kεmin(Di). However, minimality is lost in discretization, and Proposition 3 does not hold for Kεmin(Di).

This widening of the ε-extremal intervals means that an ε-minimal interval and an ε-maximal interval can coincide. In other words, given I ∈ Kεmin(Di) and J ∈ Kεmax(Di), we can have that I = J, so that Proposition 3 does not hold. To address this issue, we remove these intervals from the sets Kεmin(Di) and Kεmax(Di).

Definition 19. Let Kε(Di) be the intersection Kεmin(Di) ∩ Kεmax(Di). Then define

Iεmin(Di) := Kεmin(Di) \ Kε(Di)
Iεmax(Di) := Kεmax(Di) \ Kε(Di)
Iε(Di) := Iεmin(Di) ∪ Iεmax(Di).

We say that a time series Di is ε-constant if and only if

Iε(Di) = ∅.

In a slight abuse of nomenclature, we will refer to Iε(Di) as the collection of ε-extremal intervals of Di, and Iεmax(Di) and Iεmin(Di) will be called the ε-maximal and ε-minimal intervals of Di, respectively. We say that

Iε(D) := ⋃i≤M Iε(Di)

is the set of ε-extremal intervals of the dataset.

Since Iε(D) ⊆ Kε(D), all the results for Kε(D) hold on Iε(D).

The total order described in Definition 15, now applied to Iε(Di), and the associated Theorem 1 proving the total order is well-defined, hold without changes – provided we again use the fact that all extrema of fi occur at some zj in the discretized time interval. Using that observation, we remark that the ε-minimal and ε-maximal intervals of Iε(Di) still alternate, as in Theorem 2. We are now free to apply Definition 16 for the partial order ◁ on Iε(D), as restated here.

Definition 20. Define a partial order ◁ on the set Iε(D) as follows: Let I ∈ Iε(Di) and J ∈ Iε(Dj), with y1 < y2 the endpoints of I and z1 < z2 the endpoints of J. If i = j, then I ◁ J if and only if either y1 < z1 or y2 < z2. If i ≠ j, then I ◁ J if and only if y2 ≤ z1.
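As a sketch, the comparison in Definition 20 can be implemented directly on endpoint pairs; the function below is illustrative and is not taken from [21].

def precedes(I, J, same_series):
    # Partial order of Definition 20. I and J are (left, right) endpoint
    # pairs; same_series indicates whether both intervals come from the
    # same time series Di.
    (y1, y2), (z1, z2) = I, J
    if same_series:
        # total order within one series; Theorem 1 shows that the two
        # endpoint comparisons agree
        return y1 < z1 or y2 < z2
    # across series, the order is decided only when the intervals do not overlap
    return y2 <= z1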

4.3. Algorithms and software

4.3.1. Merge trees and branch decompositions

To calculate merge trees and branch decompositions for a discrete time series Di = {(zj, hji)}, we follow Smirnov and Morozov [34], who provide pseudocode for Kruskal's algorithm and a helper function called FindDeepest. This combination of algorithms has O(m log n) complexity for a merge tree with n nodes and m edges. We briefly summarize these algorithms here.

Let Z be the ordered set of time points {z1, …, zN} and let fi be the linear interpolation as before, with fi(zj) = hji. We form a linear graph G, where the vertices of G are labeled by the elements of Z, and edges in G connect zj and zj+1 for all j. For brevity, we will drop subscripts where the context allows, and we will refer to a vertex in G as an element z ∈ Z.

Definition 21 ([34]). For h ∈ ℝ, the sublevel graph at h, denoted Gh, is the subgraph induced by the vertices Z′ ⊆ Z whose function values fi(z), z ∈ Z′, do not exceed h. The representative of vertex z at level h ≥ fi(z) is the vertex y ∈ Z with the minimum function value in the connected component of Gh containing z.

Definition 22 ([34]). The merge tree of fi on G is the tree on the vertex set of G that has an edge (z, y), with fi(z) < fi(y), if the connected component of z in Gfi(z) is a subset of the connected component of y in Gfi(y), and there is no vertex x with fi(z) < fi(x) < fi(y) such that the connected component of z is a subset of the connected component of x in Gfi(x).

The algorithm of Smirnov and Morozov [34] is based on a representation of each vertex z in the merge tree by a triplet of vertices (z, s, y), where vertex z represents itself at levels h ∈ [fi(z), fi(s)), and y becomes its representative at level fi(s). We make the following observations:

  • if z is a vertex representing the global minimum of fi, then the triplet attached to z will be (z, z, z);

  • if z is any other local minimum of fi and hence a leaf of the merge tree, then the triplet associated to z is (z, s, y) with z ≠ s ≠ y and fi(y) < fi(z) < fi(s);

  • if z is any other non-leaf vertex of G, then its triplet representation is (z, z, y) with fi(y) < fi(z).

Note that for any leaf z of the merge tree with triplet (z, s, y), the interval [fi(z), fi(s)) represents a branch in the branch decomposition. For z with a triplet (z, z, z), the branch is [fi(z), b0], where b0 is the global maximum of fi. The normalized branch decomposition is then the collection {[0, fi(s) − fi(z))} ∪ {[0, b0 − fi(z)]} over all triplets in the branch decomposition. Therefore the algorithms of [34] calculate both the merge tree and the normalized branch decomposition simultaneously.
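The following Python sketch illustrates the idea behind the triplet computation on the path graph G: sweep the vertices by increasing height and merge sublevel components with a union-find structure. It is a simplified stand-in for Algorithms 1 and 2 of [34], not a transcription of them, and all names are ours.

def merge_tree_triplets(h):
    # h[0..N-1] are the heights hji at consecutive time points. Returns
    # {z: (z, s, y)}: vertex z represents itself on [h[z], h[s]) and the
    # deeper minimum y represents it afterwards. Ties are broken toward
    # the earlier time point, cf. Definition 9.
    order = sorted(range(len(h)), key=lambda j: (h[j], j))
    parent, deepest, triplets = {}, {}, {}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path compression
            v = parent[v]
        return v

    for z in order:                  # sweep by increasing height
        parent[z] = z
        deepest[z] = z
        for nbr in (z - 1, z + 1):   # edges of the path graph G
            if nbr not in parent:    # neighbor not yet in the sublevel graph
                continue
            a, b = find(z), find(nbr)
            if a == b:
                continue
            # the shallower of the two component minima dies at level h[z]
            win, lose = sorted((deepest[a], deepest[b]),
                               key=lambda j: (h[j], j))
            triplets[lose] = (lose, z, win)
            parent[a] = b
            deepest[b] = win
    g = min(range(len(h)), key=lambda j: (h[j], j))
    triplets[g] = (g, g, g)          # global minimum
    return triplets

Leaves of the merge tree are exactly the triplets with s ≠ z, together with the global minimum; non-leaf vertices come out in the form (z, z, y) with h[y] < h[z], matching the observations above.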

We implemented Algorithms 1 and 2 of [34] in Python 3 [21], along with a post-processing function to isolate the leaves of the merge tree using the fact that non-leaf vertices always have the triplet form (z, z, y) with fi(y) < fi(z). When there are two minima of identical depth, the one that occurs first in time is chosen to represent the triplet as in Definition 9.

4.3.2. ε-extremal intervals

Once the leaves are isolated, we calculate, for a specified noise level ε, the ε-minimal interval associated to each minimum. First, we remove all normalized branches that are not greater than 2ε. Then we take each representative z in the remaining triplets (z, s, y) and grow a ball around z until the associated function values fi(z − δ1) and fi(z + δ2) meet or exceed a distance of 2ε from fi(z). This constraint arises because we are constructing the connected component of the sublevel set (fi − ε)−1(−∞, fi(z) + ε) that contains z. That is, we seek the largest set of vertices Z′ ⊆ Z such that the subgraph of G induced by Z′ is connected and every v ∈ Z′ satisfies fi(v) − ε ∈ (−∞, fi(z) + ε), or equivalently, fi(v) − ε < fi(z) + ε.
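A sketch of this interval-growing step on the discrete series (function and variable names are illustrative): starting from the index z of a surviving minimum, extend in both directions while the neighboring heights stay below fi(z) + 2ε.

def eps_minimal_interval(h, z, eps):
    # connected component of the sublevel set containing the minimum at
    # index z: keep indices v with h[v] - eps < h[z] + eps
    lo = z
    while lo - 1 >= 0 and h[lo - 1] < h[z] + 2 * eps:
        lo -= 1
    hi = z
    while hi + 1 < len(h) and h[hi + 1] < h[z] + 2 * eps:
        hi += 1
    return lo, hi

Applying the same routine to −h yields the ε-maximal intervals, as described next.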

In order to find the dual merge tree, dual branch decomposition, and associated ε-maximal intervals, we simply reflect the curve fi over the z-axis to get −fi, and repeat exactly the same procedure. The calculation of ε-extremal intervals given the triplets is at worst linear in the number of time points in the time series.

Once we have all of the ε-extremal intervals for a dataset, Iε(D), we impose the partial order in Definition 20. All of this functionality is in the open source software [21], along with Jupyter notebooks that generate the figures for the applications in Section 3.

4.3.3. Graph distance

In Application 2, we use a graph distance to calculate similarity between the transitive closures of Hasse diagrams. This graph distance gives roughly the proportion of dissimilar edges between HεC(D1) and HεC(D2).

Definition 23. For two node-labeled graphs G(V, E, ℓ) and G′(V′, E′, ℓ′), a bijection ϕ : V → V′ is a graph isomorphism if and only if

  • (v1, v2) ∈ E ⇔ (ϕ(v1), ϕ(v2)) ∈ E′ for all v1, v2 ∈ V,

  • ℓ(v) = ℓ′(ϕ(v)) for all v ∈ V.

Definition 24. For two node-labeled graphs G(V, E, ℓ) and G′(V′, E′, ℓ′), the directed maximum common edge-induced subgraph (DMCES) problem is to find some ordered pair (W, W′) with W ⊆ E and W′ ⊆ E′ such that if

U = {v1 ∈ V | (v1, v2) ∈ W or (v2, v1) ∈ W}
U′ = {v1 ∈ V′ | (v1, v2) ∈ W′ or (v2, v1) ∈ W′}

then the graphs H = (U, W, ℓ|U) and H′ = (U′, W′, ℓ′|U′) are isomorphic and the value of |W| = |W′| is maximized. Here H and H′ are edge-induced subgraphs of G and G′. |W| is maximized if for all Z ⊆ E and Z′ ⊆ E′ such that Z and Z′ induce isomorphic subgraphs of G and G′,

|Z| ≤ |W|.

We define DMCES(G, G′) = |W| where |W| is maximized.

It is shown in [36] that

d(G, G′) = 1 − DMCES(G, G′) / max(|E|, |E′|)

is a metric on the space of node-labeled, directed graphs. Since by definition

DMCES(G, G′) ≤ max(|E|, |E′|),

the distance varies between 0 (most similar) and 1 (least similar).

As shown in [36], there is a polynomial-time reduction of the DMCES problem to the maximum clique problem and a polynomial-time reduction of the graph isomorphism problem to the DMCES problem. Thus DMCES is no easier than the graph isomorphism problem and no harder than the maximum clique problem, which suggests that solving it exactly is computationally expensive but no harder than NP-complete problems. We use an algorithm from [36] to solve the DMCES problem that leverages the special structure of the Hasse diagrams produced from datasets, namely that the partial order is built from the collection of total orders arising from the individual time series.
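To make the reduction concrete, the sketch below computes the distance d(G, G′) by building the standard edge-compatibility graph and finding a maximum clique by enumeration. It is a brute-force illustration under our own naming, not the specialized algorithm of [36]; it assumes networkx directed graphs whose nodes carry a 'label' attribute.

import networkx as nx

def dmces_distance(G1, G2):
    # vertices of the compatibility graph: label-compatible edge pairs
    def ok(e, f):
        return (G1.nodes[e[0]]["label"] == G2.nodes[f[0]]["label"] and
                G1.nodes[e[1]]["label"] == G2.nodes[f[1]]["label"])

    pairs = [(e, f) for e in G1.edges for f in G2.edges if ok(e, f)]

    def compatible(p, q):
        # two matched edge pairs must use distinct edges and induce a
        # consistent, injective correspondence on their endpoints
        (u1, v1), (a1, b1) = p
        (u2, v2), (a2, b2) = q
        if (u1, v1) == (u2, v2) or (a1, b1) == (a2, b2):
            return False
        fwd, bwd = {}, {}
        for x, y in ((u1, a1), (v1, b1), (u2, a2), (v2, b2)):
            if fwd.setdefault(x, y) != y or bwd.setdefault(y, x) != x:
                return False
        return True

    C = nx.Graph()
    C.add_nodes_from(range(len(pairs)))
    C.add_edges_from((i, j) for i in range(len(pairs))
                     for j in range(i + 1, len(pairs))
                     if compatible(pairs[i], pairs[j]))

    # a maximum clique in C corresponds to a maximum common edge subgraph
    dmces = max((len(c) for c in nx.find_cliques(C)), default=0)
    return 1 - dmces / max(G1.number_of_edges(), G2.number_of_edges())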

5. Discussion

The method described in this paper assigns to a time series a collection of partially ordered intervals that are dependent upon a level of measurement uncertainty ε. Each interval is guaranteed to contain either a maximum or a minimum of every continuous function that is ε-close to the time series. We are particularly focused on applications in molecular and cellular biology where ’omics data can measure expression levels of thousands of genes; however, this approach is widely applicable.

Due to experimental challenges, a typical time series has 10–20 time points with a time resolution of minutes to hours, and significant levels of measurement error. We do not assume that the measured variables are statistically independent; in fact, this dependence carries information about the interactions between the components of the system, which are of great interest. There are methods that use time series to deduce the structure of the underlying causal network [2, 3, 4, 5], i.e., which genes up-regulate or down-regulate other genes. These methods rely on the sequence of time points at which genes achieve their peak expression, their lowest expression, or pass the half-saturation point. Since the latter is the time at which the maximal rate of change occurs, we may alternatively approximate the derivative of the time series and turn the problem of finding the sequence of times with maximal rates of change into the problem of finding extrema.
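For instance, one can estimate the derivative by finite differences and feed the derivative series to the same ε-extremal interval machinery; a brief numpy sketch on synthetic data (all names and values illustrative):

import numpy as np

z = np.linspace(0, 10, 21)   # measurement times (synthetic)
h = np.sin(z)                # synthetic expression-like signal
dh = np.gradient(h, z)       # second-order finite-difference derivative
# extrema of dh locate the steepest rises and falls of h, so dh can be
# analyzed with the same machinery as h itself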

Our work is closely related to work on merge trees and persistent homology [18, 37, 38]. Persistence-type methods use topological data analysis to extract the most prominent extrema. Since the most persistent features have large amplitude, while we are interested in all levels at which extrema appear and disappear, the 0-persistence of f gives closely related, but complementary, results to ours.

Similar ideas to those presented in this paper have been used in [39] in the context of 2D uncertain scalar fields. Their goal is to describe mandatory critical points for 2D data for which lower and upper bounding scalar fields f− and f+ are given. The mandatory critical points are characterized by regions of the plane, called critical components, where a critical point is guaranteed to occur for any realization of a scalar field between f− and f+, and by critical intervals in ℝ which bound the admissible height of the critical point. These critical components correspond in our approach to ε-minimal intervals when f+ = f + ε and f− = f − ε.

Our approach can be viewed as an adaptation of the technique in [39] to time series data, with some important differences. Since the time series data are assumed to arise from a continuous process, we first analyze continuous functions and only then extend the analysis to discrete time series data. Moreover, we do not assume that the upper (f+) and lower (f−) bounds are given; rather, we parameterize them by a single parameter ε and analyze a range of values of ε. This aspect is very important in the applications presented in Section 3, in which we analyze multiple time series, construct partial orders of ε-extremal intervals, and use the partial orders to compare models to time series as well as to quantify differences between time series replicates.

The two applications of our technique use microarray data. In the first application, the presented analysis allows the rejection of DSGRN network models [1, 12] that cannot reproduce the experimentally observed sequences of minima and maxima of microarray time series. For a proposed network model, DSGRN computes all possible sequences of minima and maxima that the network can produce across all parameters. These data can then be compared to a partial order computed from the experimental time series at different levels of assumed experimental measurement uncertainty ε. We illustrate our approach on data from the yeast cell cycle, where the regulatory network is well described and has substantial experimental validation [22, 40, 28, 41, 31, 42]. We show that the network model can reproduce the experimental data at a low level of assumed experimental noise, but cannot reproduce data in which we artificially swap labels on the time series. Swapping labels is equivalent to making the model incorrect, and such a model is consistent with the data only at very high levels of noise (28% of the total signal amplitude).

In our second application, we again study the gene regulatory network that controls the cell-cycle transcriptional program [32, 33, 28, 31]. This network of serially activated transcription factors turns on other transcription factors at the appropriate phase of the cell cycle, and thus plays a key role in ordering cellular transcription across the cell cycle. There is an ongoing debate in the field on the role of this network in controlling cell-cycle-regulated transcription and in the ordering of cell-cycle events [43, 44]. A reproducible ordering of gene expression, which we observe in this paper, argues for precise control of the transcriptional program and provides further supporting evidence for the importance of a cell-cycle gene regulatory network.

Aligning data from different experiments and evaluating the similarity of two experiments is a problem in biology that goes beyond the study of the cell cycle. Circadian clock networks also control large, well-ordered programs of phase-specific gene expression [45, 46], and perturbations to those programs are likely to be found in clock-associated diseases. Ordering the expression of genes is also a fundamental mechanism for the assembly of a variety of protein complexes [47]. Thus, the ability to accurately compare ordering in gene expression will be useful for identifying perturbations in complex formation, circadian regulation, and cell-cycle control. The approach presented in this paper is applicable to any experimental time series data where comparison and evaluation of similarity of ordering is desired.

Acknowledgments:

T. G. was partially supported by NSF grant DMS-1361240, USDA 2015-51106-23970, DARPA grant FA8750-17-C-0054, NIH grant 1R01GM126555-01, and NSF TRIPODS+X grant 1839299. B.C. was partially supported by grants USDA 2015-51106-23970, DARPA grant FA8750-17-C-0054, NIH 1R01GM126555-01, and NSF TRIPODS+X grant 1839299. E. B. was partially supported by NSF grants DMS-1508040, DMS-1664858, DMS-1557716, DMS-1812055 and DMS-1945639. R. N. was supported by Montana State University’s Undergraduate Scholars Program (USP) during the Fall 2018 funding cycle. S.H. was partially supported by NIH 5 R01 GM126555-03 and DARPA FA8750-17-C-0054. L.S. was supported by NIH 5 R01 GM126555-03. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

A. Yeast data analysis

Time-series transcriptomic data for one replicate of wild-type yeast Saccharomyces cerevisiae were previously published in [28]. Microarray (Affymetrix Yeast Genome 2.0) expression data were normalized as previously described [28], although for this study Affymetrix probe IDs were re-annotated using Affymetrix Yeast Genome 2.0 microarray annotation 35. Expression data were aligned to a common cell-cycle timeline using the CLOCCS (Characterizing Loss of Cell Cycle Synchrony) [29] population synchrony model, as previously described [28]. Briefly, the CLOCCS model allows multiple time-series experiments to be aligned to a common cell-cycle timeline using experimentally derived yeast budding data. The CLOCCS model converts time points in the series to life points, which indicate progression through the cell cycle. Expression data for both replicates were interpolated to integer life points with an interval of one using a Piecewise Cubic Hermite Interpolating Polynomial (PCHIP) spline. Life points were then trimmed so that both replicate time series covered an identical location and duration in the cell cycle.
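A sketch of this interpolation step using scipy; the numbers shown are placeholders, not the published measurements.

import numpy as np
from scipy.interpolate import PchipInterpolator

lifepts = np.array([3.2, 5.1, 7.4, 9.9, 12.6, 15.0])  # CLOCCS life points (placeholder)
expr = np.array([0.8, 1.4, 0.6, 0.2, 0.9, 1.3])       # one gene's expression (placeholder)

pchip = PchipInterpolator(lifepts, expr)              # shape-preserving cubic Hermite spline
grid = np.arange(np.ceil(lifepts[0]), np.floor(lifepts[-1]) + 1)  # integer life points
expr_on_grid = pchip(grid)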
