Inferring sparse networks for noisy transient processes

Hoang M Tran; Satish TS Bukkapatnam

Sci Rep. 2016 Feb 26;6:21963. doi: 10.1038/srep21963

PMCID: PMC4768174 PMID: 26916813

Abstract

Inferring the causal structure of real-world complex networks from measured time series signals remains an open issue. Current approaches are inadequate for discerning direct from indirect influences (i.e., the presence or absence of a directed arc connecting two nodes) in the presence of noise, sparse interactions, and the nonlinear and transient dynamics of real-world processes. We report a sparse regression approach (referred to as the ℓ1-min approach) for recovering the network structure, together with theoretical bounds on the allowable perturbation in its constraints that guarantee sparsity and robustness to noise. We also introduce averaging and perturbation procedures that further enhance prediction scores (i.e., reduce inference errors) and the numerical stability of the ℓ1-min approach. Extensive investigations were conducted with multiple benchmark simulated genetic regulatory networks and Michaelis-Menten dynamics, as well as with real-world data sets from the DREAM5 challenge. These investigations suggest that our approach, which accounts for sparsity, transients, noise and high dimensionality, can significantly improve on the methods previously reported for inferring the structure of dynamic networks, such as Bayesian networks, network deconvolution, silencing and modular response analysis, in some cases by as much as 5 orders of magnitude.

Many real-world processes, including biological¹,², socio-economic³,⁴ and engineering systems⁵, can be represented as large-scale dynamic networks⁶. The state variables of the process represent the network nodes, and the arcs represent the dynamic coupling between pairs of state variables. Inferring the structure of these networks is critical for multiple purposes, such as identifying key causal relationships and clustering, partitioning or reducing the system state space, thereby facilitating effective prediction, control and intervention in the underlying processes. For example, inferring the signaling pathways of the gene p53 has been noted to be crucial for advancing cancer treatment⁷.

Real-world processes exhibit nonlinear dynamics, and they almost always operate under transient conditions. Identifying the structure of such systems, especially the existence or absence of a direct dynamic coupling between their variables, has been noted as a standing challenge of modern science⁸, and the underlying causal mechanisms remain largely undiscovered. Most often, only noisy measurements of the network outputs, in the form of a small ensemble of time series, are available for network inference⁸,⁹,¹⁰,¹¹,¹²,¹³. Conventional system identification approaches can produce many spurious links because influences propagate transitively among the nodes. Several methods for network inference, notably those based on Bayesian updating¹⁴,¹⁵,¹⁶,¹⁷,¹⁸,¹⁹, Granger causality and multivariate autoregression²⁰,²¹,²²,²³,²⁴, partial correlation²⁵, network deconvolution (ND)²⁶, network silencing²⁷ and conditional causal relations²⁸,²⁹,³⁰,³¹, have been investigated to filter out the effect of indirect influences. When time series gathered under transient conditions are available, a Modular Response Analysis (MRA)³²,³³,³⁴ method was proposed to infer the network structure at each time point. However, these methods suffer from serious drawbacks: most of them assume that the system exhibits linear, time-invariant dynamics²⁶, that it is deterministic (noise-free)³³,³⁴,³⁵, and/or that a point attractor exists under steady state²⁷. While the MRA method can be employed to reconstruct dynamics under transient conditions³³, its performance deteriorates sharply in the presence of noise, and it encounters severe numerical stability issues, especially when the underlying dynamics are highly nonlinear. This severely restricts its applicability to real-world processes.
Notably, earlier methods address each of the following scenarios separately: transient time series³³, noisy measurements¹⁴,¹⁵,¹⁶,¹⁷,¹⁸,¹⁹, and the removal of indirect influences¹⁴,¹⁵,¹⁶,¹⁷,¹⁸,¹⁹,²⁰,²¹,²²,²³,²⁴,²⁵,³³. The realistic scenario that combines all of them has not been considered, and all available methods break down when presented with it.

To address this gap, we introduce an approach that modifies the ND, silencing and MRA methods to account for sparsity, transients, noise and high dimensionality. Specifically, we investigate a sparse regression formulation (henceforth referred to as ℓ1-min) to recover the structure of dynamic networks from noisy data gathered under transient conditions. Our main contribution is a theoretical bound on the constraints of the ℓ1-min formulation, together with stable numerical procedures that overcome the effects of nonlinear couplings in large interconnected processes, the availability of only a small sample of short time series ensembles, and inaccuracies in estimating noise levels. These bounds mitigate the tedious trial-and-error procedures customarily employed in ℓ1-min implementations¹,³⁴,³⁵,³⁶. The theoretical results and subsequent experimental studies suggest that the present ℓ1-min approach is more robust to noise than the contemporary dynamic Bayesian network¹⁴,¹⁵,¹⁶,¹⁷,¹⁸,¹⁹ and ND²⁶,²⁷,³² methods. Reductions in inference error of up to 5 orders of magnitude are shown to be possible with the present approach, leading to more accurate inference of the structure of complex real-world networks.

Methods

Towards a more formal treatment, we define a real-world system as a high-dimensional system of coupled differential equations of the form

$$\frac{dx}{dt} = f(x, p), \qquad x(0) = x_0, \qquad (1)$$

where x is the state vector, p is the parameter vector and x_0 is the initial condition. As noted in the foregoing, such dynamics can also be represented as a network³⁷, as shown in Fig. 1, where node i represents the state variable x_i and a directed arc represents the existence and the strength of the coupling (direct influence) s_ij of node j on node i. In this context, the direct influence of node j on node i around a point x in the state space defined in Eq. (1) can be expressed as

$$s_{ij}(x) = \frac{\partial f_i}{\partial x_j}(x)$$

Figure 1. (a) Direct influences and (b) total influences. The total influences in (b) are the accumulation of the influences transmitted through all paths in (a); for example, the total influence between a pair of nodes in (b) accumulates the influence transmitted along every path connecting them in (a).

Note that node j is connected to node i at time t if s_ij(t) ≠ 0. Hence, the direct influence matrix S(t) = [s_ij(t)] captures the physical structure of the dynamical system (1) at time t. In practice, S(t) needs to be inferred from measurements of the total influence between every pair of nodes²⁶,²⁷, or estimated from time series outputs of the dynamic system gathered under transient conditions³³. The total influence g_ij(t) is the sum of the direct influence of node j on node i and all indirect influences from node j to node i through other nodes connected to both of them (see Fig. 1b). For example, the total influence of one node on another is the sum of the influences along all the paths that connect them, whether direct or through intermediate nodes. In other words, the total influence that node j has on node i around a point x in the state space defined in Eq. (1) is defined recursively as

graphic file with name srep21963-m25.jpg

graphic file with name srep21963-m26.jpg

graphic file with name srep21963-m27.jpg

which is similar to the expression given in Barzel and Barabási²⁷. Conventionally, under stationarity assumptions, the total influences can be approximated using similarity measures, such as correlation and mutual information⁸, estimated from raw samples of the time series. The direct and total influence matrices are related at every time t by the following equation:

graphic file with name srep21963-m29.jpg

where B(t) and C(t) are matrix functions whose definitions depend on the context. Pertinently, when the underlying dynamical system is linear and time-invariant, these matrices do not depend on time. Eq. (7) generalizes previous network deconvolution formulations: those of Feizi et al.²⁶, Barzel and Barabási²⁷ and Sontag et al.³³ each correspond to a particular choice of B and C. For simplicity, we henceforth write S, B and C instead of S(t), B(t) and C(t) in this subsection. The “true” network structure can be estimated by solving the following ℓ1-min formulation:

graphic file with name srep21963-m44.jpg

where the bound in the constraint is the allowable perturbation that captures the effects of noise in the measured data. We note that in the absence of noise, this formulation is equivalent to ND and MRA. In the following sections we present two alternative ℓ1-min formulations for direct influence inference. The first, presented in Eqs (9, 10), addresses the estimation of S in real-world scenarios where the total influence is directly measurable (e.g., based on the strengths of co-excitations). The second, Eqs (21, 22), addresses the inference of the network structure (i.e., determining all node pairs with s_ij ≠ 0) under one of the most generic scenarios: multiple ensembles of time series realizations of the state variables, collected under noisy and transient conditions with different parameter settings. Inferring the network structure under such generic conditions has not been investigated to date.
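The distinction between direct and total influence can be made concrete with one of the special cases of Eq. (7): the path-sum model of Feizi et al.²⁶, in which the total influence is the sum over all path lengths, G = S + S² + S³ + … = S(I − S)⁻¹ (valid when the spectral radius of S is below 1), so that S = G(I + G)⁻¹. The NumPy sketch below assumes that model only for illustration; the three-node chain and its weights are hypothetical. It shows a spurious indirect link appearing in G and disappearing again after deconvolution.

```python
import numpy as np

def total_influence(S, n_terms=None):
    """Total influence under the path-sum model G = S + S^2 + S^3 + ...

    With n_terms=None the closed form S (I - S)^-1 is used; otherwise the
    series is truncated after n_terms powers. Requires spectral radius of S < 1.
    """
    n = S.shape[0]
    if n_terms is None:
        return S @ np.linalg.inv(np.eye(n) - S)
    G = np.zeros_like(S)
    P = np.eye(n)
    for _ in range(n_terms):
        P = P @ S            # P = S^k
        G = G + P
    return G

def deconvolve(G):
    """Invert the path-sum model: S = G (I + G)^-1."""
    n = G.shape[0]
    return G @ np.linalg.inv(np.eye(n) + G)

# Hypothetical chain 2 -> 1 -> 0, with s_ij = direct influence of node j on node i
S0 = np.zeros((3, 3))
S0[0, 1] = 0.5   # node 1 acts directly on node 0
S0[1, 2] = 0.4   # node 2 acts directly on node 1

G = total_influence(S0)
print(np.round(G, 3))              # G[0, 2] = 0.5 * 0.4 = 0.2, an indirect link
print(np.round(deconvolve(G), 3))  # the indirect 0-2 entry is removed again
```

Because this chain is acyclic, S0 is nilpotent and the series terminates after two terms, so the truncated and closed-form totals coincide.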

Network inference when total influence matrix is available

When measurements of the total influence matrix G are available²⁶, the relaxed ℓ1-min formulation can be written as

graphic file with name srep21963-m52.jpg

or in vector form as

graphic file with name srep21963-m53.jpg

where g_i denotes the i-th column of G. To solve Eq. (9) or (10) for an accurate estimate of S using standard solvers³⁸,³⁹, the two constraint bounds must be estimated correctly. Specifically, when noisy measurements of the total influence matrix differ from the “true” total influence by a perturbation ΔG, the estimated direct influence matrix differs from the true direct influence matrix S⁰ by a corresponding error term, and

graphic file with name srep21963-m61.jpg

graphic file with name srep21963-m62.jpg

The quantity defined above is called the total perturbation. In vector form, the corresponding vector quantity represents the total perturbation for computing row i of S. The bounds on these quantities are as follows (see Theorem 1 in the Supplementary Information):

graphic file with name srep21963-m68.jpg

graphic file with name srep21963-m69.jpg

graphic file with name srep21963-m70.jpg

where γ is the largest eigenvalue of ΔG, δ_K is the restricted isometry constant⁴⁰ and ‖·‖_F denotes the Frobenius norm of a matrix. By employing these bounds, we can set the constraint bounds for effective network inference. As subsequent numerical investigations indicate, the performance of the method does not degrade significantly in the presence of noise; this is the major advantage of the present approach. Our method is designed to provide the sparsest network structure that replicates the measured total influence G within a bound (specified in terms of the allowable total perturbation). This is important because, in most real-world applications, only a small set of noisy observations is available. For example, in the case of genetic regulatory networks, only a subset of the dynamic regimes of the underlying process (i.e., those marked by the active degrees of freedom) is captured. Therefore, no approach can guarantee identification of the true network structure, and among the network structures that replicate the observed total influence within a specified bound, the sparsest is of most interest. Although sparser than the network derived by ND, the ℓ1-min-derived structure may be adequate to uncover all the dynamic couplings of the process captured in the observed data.
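The constrained problems in Eqs (9, 10) are normally handed to a convex solver³⁸,³⁹. As a self-contained stand-in, the sketch below solves the closely related penalized (Lagrangian, LASSO) form, min ½‖Bᵀs_i − c_i‖² + λ‖s_i‖₁, row by row with the iterative shrinkage-thresholding algorithm (ISTA). This is a swapped-in technique rather than the authors' solver: the penalty λ plays the role of the constraint bound only implicitly, and the choice B = I + G, C = G (a Feizi-type path-sum model) is an assumption made for illustration. The network, noise level and λ are hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (entrywise shrinkage towards zero)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(A, b, lam, n_iter=5000):
    """Minimise 0.5 * ||A x - b||_2^2 + lam * ||x||_1 with ISTA."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b)
        x = soft_threshold(x - grad / L, lam / L)
    return x

def l1_deconvolve(G, lam=1e-3, n_iter=5000):
    """Sparse row-wise estimate of S from S B = C with B = I + G, C = G (assumed)."""
    n = G.shape[0]
    B = np.eye(n) + G
    # Row i of S B = C reads B^T s_i = c_i, where c_i is row i of C.
    return np.vstack([lasso_ista(B.T, G[i], lam, n_iter) for i in range(n)])

# Hypothetical sparse, acyclic direct-influence network (s_ij: effect of j on i)
n = 8
S0 = np.zeros((n, n))
for i, j, w in [(0, 3, 0.4), (1, 4, -0.3), (2, 5, 0.5), (3, 6, 0.3), (4, 7, -0.4)]:
    S0[i, j] = w
G0 = S0 @ np.linalg.inv(np.eye(n) - S0)       # noise-free total influence
rng = np.random.default_rng(0)
G = G0 + 1e-3 * rng.standard_normal((n, n))    # small independent noise

S_hat = l1_deconvolve(G)
print("max abs error:", np.abs(S_hat - S0).max())
```

A dedicated solver for the constrained form would let the bound be set directly from the theory above; the penalized form is shown only because it needs nothing beyond NumPy.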

In real-world scenarios, the noise level is not always known. Overestimating it can lead to network structures that are sparser than the original; however, we show that the effects of underestimating it can be alleviated to a great extent. When the noise level is unknown but multiple realizations of the noisy measurements of G are available, the inference error can be reduced further by averaging the estimates obtained from the different realizations of G, i.e., by taking (1/N) Σ_k Ŝ^(k) (see Proposition 1 in the Supplementary Information), where Ŝ^(k) is the direct influence matrix computed from G^(k), and G^(1), …, G^(N) are N different measurements or estimates of the total influence matrix. This result assumes that the measurement perturbations are bounded; however, even when they are arbitrarily large, the averaged estimate is at least as good as the individual estimates. This averaging procedure improves the network inference accuracy whenever multiple measurements of the total influence matrix are available. For example, when the network structure does not change significantly as the system approaches a steady state, the total influence matrix can be measured multiple times, each measurement corresponding to one time window.
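The averaging step can be checked numerically. The sketch below assumes the closed-form path-sum deconvolution S = G(I + G)⁻¹ as the per-realization estimator, independent Gaussian noise on G, and a relative Frobenius-norm error as the accuracy measure (the paper's own error definitions are not reproduced here); the network size, noise level and N = 40 are hypothetical. By the triangle inequality, the error of the averaged estimate can never exceed the mean of the individual errors.

```python
import numpy as np

def deconvolve(G):
    """Closed-form estimate S = G (I + G)^-1 (path-sum model, assumed here)."""
    n = G.shape[0]
    return G @ np.linalg.inv(np.eye(n) + G)

def rel_error(S_hat, S_true):
    """Relative Frobenius error ||S_hat - S_true||_F / ||S_true||_F (assumed measure)."""
    return np.linalg.norm(S_hat - S_true) / np.linalg.norm(S_true)

rng = np.random.default_rng(1)
n, N, sigma = 10, 40, 0.02                  # hypothetical size, realizations, noise
S0 = np.triu(rng.uniform(-0.3, 0.3, (n, n)), k=1)
S0[np.abs(S0) < 0.2] = 0.0                  # keep only the strong links (sparse)
G0 = S0 @ np.linalg.inv(np.eye(n) - S0)     # noise-free total influence

# N noisy measurements of G, one direct-influence estimate per measurement
estimates = [deconvolve(G0 + sigma * rng.standard_normal((n, n))) for _ in range(N)]
individual = [rel_error(S_k, S0) for S_k in estimates]
S_avg = np.mean(estimates, axis=0)          # (1/N) * sum_k S_hat^(k)

print(f"mean individual error:      {np.mean(individual):.4f}")
print(f"error of averaged estimate: {rel_error(S_avg, S0):.4f}")
```

With independent noise, the averaged error in such a run is typically several times smaller than the individual errors, in line with the reductions reported in the Results.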

Network inference when the time series under transient conditions are available (total influence matrix not given)

In practice, the total influences are often estimated using convenient similarity measures, such as the correlation or mutual information between the time series of pairs of nodes, as stated in the foregoing section. These estimates have very low accuracy owing to nonstationarity (transients), low sampling rates and limited sample sizes, and they cannot capture the total influence in the system. Moreover, in most real-world applications only finite samples of the time series are available, and existing ND methods cannot be employed in these scenarios. To overcome these drawbacks, we adapt an approach³³ that estimates the direct influence from multiple time series ensembles obtained by perturbing the parameters of the dynamical system in Eq. (1). We first modify the perturbation procedure proposed by Sontag et al.³³ to make it more robust to numerical error, and then further improve the accuracy of network inference by introducing a sparse regression formulation and the averaging scheme.

A robust perturbation procedure

According to Sontag et al.³³, the direct influence matrix S can be derived from the following equation:

graphic file with name srep21963-m92.jpg

where

graphic file with name srep21963-m93.jpg

and

graphic file with name srep21963-m94.jpg

Note that Γ plays the role of the total influence matrix G in the previous section. To compute row i of the matrix S, the parameters to be perturbed are chosen so that they do not directly affect node i³³. As a consequence, changes in these parameters affect x_i only indirectly, and the resulting changes in x_i are much smaller than those in the other state variables. As a result, the i-th column of the matrix R is much smaller than the other columns (by 2 orders of magnitude for the network studied in Case I, as shown in Table 1). The numerical issue this poses can be understood by considering the following linear system of equations

Table 1. The matrix R for computing the first row of S, estimated using the perturbation procedure of Sontag et al.³³

	r.₁	r.₂	r.₃	r.₄	r.₅	r.₆	r.₇	max_j |r_ij|
r₁.	−2.868e-4	−7.284e-5	−3.106e-4	−1.578e-4	2.443e-4	−8.315e-5	−4.896e-4	0.0005
r₂.	1.160e-4	0.1050	−4.261e-4	−1.803e-4	6.490e-4	2.261e-4	−2.379e-4	0.1050
r₃.	−1.136e-4	−2.658e-4	0.1179	−2.766e-4	1.370e-4	−2.524e-4	2.776e-4	0.1179
r₄.	−4.431e-4	−4.543e-4	−4.138e-4	0.0961	6.824e-4	6.710e-5	1.609e-4	0.0961
r₅.	−2.397e-4	−4.439e-4	−1.225e-4	−7.024e-4	0.1100	3.256e-4	2.069e-4	0.1100
r₆.	4.053e-4	−3.773e-4	−2.577e-4	−5.065e-5	0.0012	0.1195	4.481e-4	0.1195
r₇.	−1.030e-4	3.312e-5	−2.900e-4	−5.258e-5	0.0100	−1.651e-4	0.0820	0.0820

The first row and column of R are two orders of magnitude smaller than the others, which poses major numerical issues for inferring the structure of large networks.

graphic file with name srep21963-m105.jpg

Here, the sensitivity of the solution u to changes in A can be quantified as follows⁴¹:

graphic file with name srep21963-m106.jpg

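where C is the matrix appearing in the expression above. Whenever A contains a column whose entries are much smaller in magnitude than those of the other columns (as for the first column of R in Table 1), C contains a row whose entries are correspondingly much larger. As a consequence, that row becomes several orders of magnitude larger than the other rows. Therefore, the perturbation procedure proposed by Sontag et al.³³ is highly sensitive to noise or numerical error in the measured x_i.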
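The ill-conditioning can be checked directly on the matrix R reported in Table 1. The NumPy sketch below (reading the entry printed as 2.443-e4 as 2.443e-4) computes the row and column norms and the 2-norm condition number of R. Since the largest singular value is at least the largest entry (0.1195) and the smallest is at most the norm of the first row (about 7 × 10⁻⁴), the condition number must exceed 10². The final perturbation experiment is only illustrative; its right-hand side and noise level are hypothetical.

```python
import numpy as np

# Matrix R from Table 1 (rows r_1. ... r_7.); "2.443-e4" is read as 2.443e-4.
R = np.array([
    [-2.868e-4, -7.284e-5, -3.106e-4, -1.578e-4, 2.443e-4, -8.315e-5, -4.896e-4],
    [ 1.160e-4,  0.1050,   -4.261e-4, -1.803e-4, 6.490e-4,  2.261e-4, -2.379e-4],
    [-1.136e-4, -2.658e-4,  0.1179,   -2.766e-4, 1.370e-4, -2.524e-4,  2.776e-4],
    [-4.431e-4, -4.543e-4, -4.138e-4,  0.0961,   6.824e-4,  6.710e-5,  1.609e-4],
    [-2.397e-4, -4.439e-4, -1.225e-4, -7.024e-4, 0.1100,    3.256e-4,  2.069e-4],
    [ 4.053e-4, -3.773e-4, -2.577e-4, -5.065e-5, 0.0012,    0.1195,    4.481e-4],
    [-1.030e-4,  3.312e-5, -2.900e-4, -5.258e-5, 0.0100,   -1.651e-4,  0.0820],
])

row_norms = np.linalg.norm(R, axis=1)
col_norms = np.linalg.norm(R, axis=0)
cond = np.linalg.cond(R)          # largest / smallest singular value
print("row norms:   ", np.round(row_norms, 4))
print("column norms:", np.round(col_norms, 4))
print(f"condition number: {cond:.1f}")

# Illustrative sensitivity check (hypothetical right-hand side and noise level)
rng = np.random.default_rng(0)
b = R @ np.ones(7)
u = np.linalg.solve(R, b)
u_noisy = np.linalg.solve(R, b + 1e-5 * rng.standard_normal(7))
print("largest change in u:", np.abs(u_noisy - u).max())
```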

The following modification to the perturbation procedure addresses the aforementioned issue. Consider the case where the dynamics of node i depend linearly on x_i, as in the following system⁴²:

graphic file with name srep21963-m113.jpg

This system describes common biochemical reactions in which the activity of a chemical species is inhibited by its own concentration⁴³,⁴⁴. To compute the i-th row of the Jacobian, the parameter p_i is also perturbed. Note that

graphic file with name srep21963-m115.jpg

graphic file with name srep21963-m116.jpg

The remaining parameters are perturbed as in Eqs (17, 18). Therefore, to compute row i of S, we can solve the system of equations (16) with

graphic file with name srep21963-m118.jpg

with the remaining terms defined as in Eqs (17, 18).

A robust network identification approach

In addition to the perturbation procedure proposed in Eqs (17, 18, 19), we present a method for solving Eq. (16) that is more robust to noise. In the present context, the ℓ1-min formulation of Eq. (16) takes the following form:

graphic file with name srep21963-m122.jpg

graphic file with name srep21963-m123.jpg

As noted in the foregoing section, estimating the two constraint bounds from the noise levels of the measurements is essential to ensure that the solution to Eq. (21) is a viable estimator of the “true” direct influence S⁰. The following bounds and approximation allow these quantities to be specified (Theorems 4 and 5 in the Supplementary Information):

graphic file with name srep21963-m130.jpg

graphic file with name srep21963-m131.jpg

graphic file with name srep21963-m132.jpg

where

graphic file with name srep21963-m133.jpg

and the remaining terms are the errors incurred when measuring the corresponding quantities. As stated in the foregoing, the noise level is not known a priori in most real-world systems. In this situation, the network structure is deduced from the entries of the estimated S(t) that are zero for all t; these can be identified as the entries of the time-averaged estimate (1/T) Σ_k Ŝ(t_k) that converge to zero, where Ŝ(t_k) is the direct influence matrix computed from the measurement or approximation of the total influence matrix at time t_k (see Proposition 2 in the Supplementary Information). This averaging procedure improves the accuracy of predicting which pairs of nodes are not connected when the measurement noise level is unknown. As a result, our method ensures low false positive rates on the arcs. As noted in the context of Proposition 1, network inference with the averaged estimate tends to be at least as good as with the individual estimates, even when the noise is arbitrarily large.
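The zero-pattern idea behind Proposition 2 can be illustrated with a minimal sketch. It assumes that per-time-sample estimates Ŝ(t_k) are already available and, as a simplification, models them as the true matrix plus zero-mean estimation noise; a link is flagged as absent when the magnitude of the time-averaged entry falls below a tolerance. The tolerance, noise level, number of samples and network are hypothetical choices, not values from the paper.

```python
import numpy as np

def absent_links(S_hats, tol):
    """Flag entries whose time-averaged estimate is (numerically) zero.

    S_hats: array of shape (T, n, n) holding the estimates S_hat(t_k).
    Returns a boolean (n, n) mask that is True where no link is inferred.
    """
    S_bar = S_hats.mean(axis=0)          # (1/T) * sum_k S_hat(t_k)
    return np.abs(S_bar) < tol

rng = np.random.default_rng(2)
n, T, noise = 6, 30, 0.05                # hypothetical sizes and noise level
S_true = np.zeros((n, n))
S_true[0, 2], S_true[3, 1], S_true[5, 4] = 0.5, -0.6, 0.4

# Simulated per-sample estimates: truth plus zero-mean estimation noise
S_hats = S_true + noise * rng.standard_normal((T, n, n))

single = np.abs(S_hats[0]) < 0.05        # one time sample, same tolerance
averaged = absent_links(S_hats, tol=0.05)
truly_absent = S_true == 0
print("correct 'absent' calls, single sample:", np.sum(single & truly_absent))
print("correct 'absent' calls, averaged:     ", np.sum(averaged & truly_absent))
```

Averaging shrinks the noise on each entry by roughly the square root of the number of samples, so a fixed tolerance separates absent from present links far more reliably than a single sample does.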

Results

We considered two case studies to validate the theoretical results and evaluate the performance of the ℓ1-min approach. The first case study contains two simulation scenarios. The first scenario simulates a scale-free network whose structure resembles that of the genetic regulatory network of E. coli⁴⁵. Here, the challenge is to estimate the true network structure, i.e., the direct influence matrix S⁰, from a noisy total influence matrix G. This scenario is well suited for assessing how close the bounds stated in Eqs (14, 15) are to the true bounds on the constraints, and for comparing the ℓ1-min formulation with recent ND methods in terms of inference error and sparsity. The second scenario simulates a system of Hill-type differential equations modeling a gene interaction network; here, the challenge is to estimate the true network structure from noisy and transient time series data. The second case study applies our method to infer genetic regulatory networks (GRNs) from empirical data in the context of the DREAM5 challenge⁴⁶, a standard framework for evaluating GRN inference methods.

Case I: simulation studies

Inferring direct influence networks from total influence network

First, we adapted the procedure specified by Muchnik⁴⁷ to generate 500 random realizations of scale-free networks with a degree exponent of 2.2. In each realization, the weights of the true direct influence network S⁰ were drawn from a prescribed distribution, and the true total influence matrix was obtained from S⁰. The noisy total influence matrix G was then generated by contaminating the true total influence matrix with noise of two types: (1) proportional and (2) independent. We considered cases where the measurement noise level is known as well as cases where there is uncertainty in estimating it.
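The two noise types can be sketched as follows. Because the exact expressions are not reproduced here, the forms below are assumptions: proportional noise scales each entry of the true total influence matrix (written G⁰ here), G = G⁰ ∘ (1 + σE), and independent noise is additive, G = G⁰ + σE, with E a matrix of i.i.d. standard normal entries. One consequence worth noting is that proportional noise leaves the zero entries of G⁰ untouched, whereas independent noise does not. The chain network and σ are hypothetical.

```python
import numpy as np

def noisy_total_influence(G0, sigma, kind, rng):
    """Contaminate a true total-influence matrix G0 (assumed noise models).

    kind="proportional": G = G0 * (1 + sigma * E)   (entrywise)
    kind="independent":  G = G0 + sigma * E
    where E has i.i.d. standard normal entries.
    """
    E = rng.standard_normal(G0.shape)
    if kind == "proportional":
        return G0 * (1.0 + sigma * E)
    if kind == "independent":
        return G0 + sigma * E
    raise ValueError(f"unknown noise kind: {kind!r}")

rng = np.random.default_rng(3)
S0 = np.zeros((4, 4))
S0[0, 1], S0[1, 2], S0[2, 3] = 0.5, 0.4, 0.3    # hypothetical chain 3 -> 2 -> 1 -> 0
G0 = S0 + S0 @ S0 + S0 @ S0 @ S0                # exact path sum (S0^4 = 0 here)

G_prop = noisy_total_influence(G0, 0.1, "proportional", rng)
G_ind = noisy_total_influence(G0, 0.1, "independent", rng)
print("zeros preserved (proportional):", np.all(G_prop[G0 == 0] == 0))
print("zeros preserved (independent): ", np.all(G_ind[G0 == 0] == 0))
```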

We first compared the “true” bound on the total perturbation (computed using S⁰) with the bounds estimated from Eqs (13, 14). In the presence of noise, the bounds are of the same order of magnitude for all simulated networks (Table 2). The results also suggest that the bound in Eq. (13) closely matches the “true” bound and can be used to approximate the feasible region with high accuracy when S⁰ is unknown. Although the bound in Eq. (14) tends to be loose, it can serve as an upper bound on the total perturbation.

Table 2. Comparison of the bounds on total perturbation obtained using Eqs (13) and (14), suggesting that Eq. (13) provides a good approximation and Eq. (14) serves as an upper bound.

Bound	Mean
“True” bound (computed using S⁰)	9.79 × 10⁻³
Eq. (13)	8.89 × 10⁻³
Eq. (14)	8.61 × 10⁻²

We next compared the performance of the ND and ℓ1-min approaches (using our bounds, Eqs (13) and (14)) in terms of the inference error ρ, which compares the estimate Ŝ obtained with each of the methods being compared against the true matrix S⁰. The ℓ1-min approach with the “true” constraint bound significantly outperforms ND (the mean and variance of ρ were reduced by 45% and 99%, respectively; Fig. 2). Employing the bound based on Eq. (13), the ℓ1-min approach still performs much better than ND (the mean and variance of ρ are reduced by 33.5% and 87.5%, respectively). More importantly, the inference errors of the ℓ1-min approaches were concentrated around 0.15 (within ±0.05), whereas those of ND were spread over a larger range, from 0.3 to 0.6. This suggests that the ℓ1-min approach using our bound in Eq. (13) is more robust than ND to noise and to the approximation error incurred when measuring the total influence matrix.

Figure 2. Histograms summarizing the relative performance of the ND and ℓ1-min approaches for the benchmark numerical case in terms of (a) the inference error, which quantifies accuracy, and (b) the Hoyer measure, which quantifies the sparsity of the solution. The solution from the ℓ1-min approach is more precise and sparser than that of ND: compared to ND, the mean and variance of the inference error are reduced by 45% and 99%, respectively, when using ℓ1-min with the true bound, and by 33.5% and 87.5%, respectively, when using ℓ1-min with the bound from Eq. (13); the mean of the Hoyer measure is increased by 16.38% and its variance reduced by 69% when using ℓ1-min with the true bound, and the mean is increased by 15.90% and the variance reduced by 75.69% when using the bound from Eq. (13).

We also compared the sparsity of the recovered networks in terms of the Hoyer sparsity measure⁴⁸, defined as follows:

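$$H(S) = \frac{\sqrt{N} - \|S\|_1 / \|S\|_F}{\sqrt{N} - 1}$$

where N is the number of entries of S, ‖S‖₁ = Σ_{i,j} |s_ij| and ‖S‖_F is the Frobenius norm.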

Note that the Hoyer measure lies between 0 and 1; the closer it is to 1, the sparser S is. In terms of this measure, the solution of the ℓ1-min approach is much sparser than that of ND (Fig. 2b): the mean is 16.38% larger and the variance 69% smaller when using the true bound, and the mean is 15.90% larger and the variance 75.69% smaller when using the approximated bound. Moreover, the Hoyer measure of the ℓ1-min approach is concentrated around a much higher value (sparser matrices) than that of ND, indicating that the ℓ1-min approach using our bound gives a significantly sparser solution. This yields a more interpretable connection structure without loss of performance.
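The Hoyer measure is simple to compute. The sketch below applies Hoyer's definition, (√N − ‖x‖₁/‖x‖₂)/(√N − 1), to the N entries of a matrix flattened into one vector; treating the matrix this way is an assumption about how S is vectorized. It returns 1 for a matrix with a single nonzero entry and 0 when all entries have the same magnitude, and it is undefined for an all-zero matrix.

```python
import numpy as np

def hoyer(S):
    """Hoyer sparsity of a matrix, treating its N entries as one vector."""
    x = np.abs(np.asarray(S, dtype=float)).ravel()
    N = x.size
    l1 = x.sum()
    l2 = np.sqrt((x ** 2).sum())
    return (np.sqrt(N) - l1 / l2) / (np.sqrt(N) - 1)

one_link = np.zeros((5, 5))
one_link[0, 3] = 0.7                  # a single direct influence
dense = np.full((5, 5), 0.2)          # every pair equally coupled
print(hoyer(one_link), hoyer(dense))  # approximately 1.0 and 0.0
```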

We also studied the effect of the bounds of the ℓ1-min formulation on the inference error to verify Eq. (40) numerically. When the bound exceeds the true bound, the inference error trends almost linearly with it (see Fig. 3), which confirms the conclusion of Theorem 3. When the bound tends toward 0, the inference error increases, which is evidence of over-fitting.

Figure 3. The inference error attains a minimum near the true bound, and it trends almost linearly with the bound as the bound is increased beyond the true value. As the bound tends to 0, the inference error increases exponentially, which is evidence of over-fitting.

Subsequently, we studied the effect of averaging (Proposition 1) on the ℓ1-min and ND methods. We conducted N = 40 simulations, in each of which the true and noisy total influence matrices were generated as described above. For each simulation, we compared the inference errors without and with averaging, defined as follows:

graphic file with name srep21963-m193.jpg

graphic file with name srep21963-m194.jpg

where Ŝ^(k) is the k-th realization of the estimated direct influence matrix, and the averaged estimate is computed as stated in Proposition 1. The results suggest that averaging reduces the inference error of both methods by a factor of about 8 in all cases, supporting the validity of Proposition 1 (Fig. 4). The inference errors of ND and ℓ1-min were almost the same.

Figure 4. Box plots summarizing the effects of averaging on (a) ND and (b) ℓ1-min. The inference errors of ℓ1-min were almost the same as those of ND. Averaging (light/red) reduced the inference error by a factor of about 8 compared with no averaging (dark/blue). The values were 0.1196 with ND and 0.0259 with ℓ1-min (p-values of the paired t-tests between the inference errors without and with averaging were ≤10⁻⁵ in all cases).

Inferring direct influence network structure from multiple time series under transient conditions

In this section we present the performance of the ℓ1-min approach in inferring network structure from transient time series with an unknown noise level. We used the Michaelis-Menten dynamic system given by²⁷:

graphic file with name srep21963-m202.jpg

where the “true” network is a randomly generated scale-free network⁴⁵ with about 70 edges, whose weights were drawn from a prescribed distribution.
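Transient, noisy trajectories and the matching direct influences can be generated as sketched below. Because the system equation is not reproduced above, the code assumes a common Michaelis-Menten form, dx_i/dt = −B x_i + Σ_j A_ij x_j/(1 + x_j), which may differ from the paper's exact equation; the rate B, the random sparse A, the step size and the horizon are hypothetical. Noise of magnitude 10⁻⁴ is added to the sampled states, as in the text. The Jacobian, whose (i, j) entry is the direct influence s_ij, is computed analytically and checked against central finite differences.

```python
import numpy as np

def mm_rhs(x, A, B):
    """Assumed model: dx_i/dt = -B x_i + sum_j A_ij x_j / (1 + x_j)."""
    return -B * x + A @ (x / (1.0 + x))

def mm_jacobian(x, A, B):
    """Direct influences s_ij = df_i/dx_j of the assumed model at state x."""
    J = A / (1.0 + x) ** 2               # column j scaled by 1 / (1 + x_j)^2
    J[np.diag_indices_from(J)] -= B      # self-degradation term on the diagonal
    return J

def simulate(x0, A, B, dt=0.01, n_steps=500, noise=1e-4, rng=None):
    """Forward-Euler trajectory and a copy with additive Gaussian measurement noise."""
    rng = np.random.default_rng() if rng is None else rng
    X = np.empty((n_steps + 1, x0.size))
    X[0] = x0
    for k in range(n_steps):
        X[k + 1] = X[k] + dt * mm_rhs(X[k], A, B)
    return X, X + noise * rng.standard_normal(X.shape)

rng = np.random.default_rng(4)
n = 10
# Hypothetical sparse network with positive weights and no self-loops
A = np.where(rng.random((n, n)) < 0.15, rng.uniform(0.5, 1.5, (n, n)), 0.0)
np.fill_diagonal(A, 0.0)
B = 1.0
X, X_noisy = simulate(rng.uniform(0.5, 2.0, n), A, B, rng=rng)

# Compare the analytic Jacobian with central finite differences at one state
x, h = X[100], 1e-6
J_fd = np.column_stack([
    (mm_rhs(x + h * e, A, B) - mm_rhs(x - h * e, A, B)) / (2 * h)
    for e in np.eye(n)
])
print("max Jacobian mismatch:", np.abs(J_fd - mm_jacobian(x, A, B)).max())
```

Forward Euler with dt·B = 0.01 keeps the states nonnegative for nonnegative weights; a stiffer or larger model would call for an adaptive integrator.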

We obtained 30 different variants of this network. For each of these variants (trials), a perturbed network was obtained by changing (perturbing) the parameters according to Eqs (18, 19, 20). Every solution trajectory, obtained from a given initial condition, was contaminated with noise to simulate a noisy measurement; the noise magnitude was chosen to be 10⁻⁴. The direct influence matrices were estimated using Sontag et al.’s³³ method as well as the ℓ1-min formulations with different values of the bounds. Next, the averaged estimate was computed as in Proposition 2 by averaging over 30 randomly chosen time samples. For performance evaluation, we used the inference errors without and with averaging, given by

graphic file with name srep21963-m220.jpg

graphic file with name srep21963-m221.jpg

in which the Heaviside step function is used. These error measures quantify the number of absent links that are correctly identified.

As summarized in Fig. 5, the ℓ1-min approach performs better than Sontag et al.’s³³ method in all cases tested; in fact, the inference errors were reduced by a factor of 10⁵. The poor performance of Sontag et al.’s³³ method is attributed to the numerical issues noted in the earlier section. Averaging reduced the inference error by a further 30% in both cases. Cases (c) and (d) were designed to simulate real situations where the noise magnitude is unknown: the noise level was under- or overestimated by one order of magnitude. Sontag et al.’s³³ method is not applicable in such cases, and ℓ1-min without averaging was found to lead to suboptimal inference. When the noise level was underestimated, averaging reduced the inference error by about 70%, bringing it to the same level as that obtained when the noise level is known. This result is consistent with, and clearly verifies, Proposition 2. When the noise level is overestimated, the resulting network tends to be highly sparse, offering excellent specificity in identifying the absence of direct couplings. The inference errors are therefore low even without averaging, and averaging reduces them by only 5%. The p-values of the paired t-tests between the inference errors with and without averaging were below 0.0282 in all cases, suggesting that averaging helps improve network inference.

Figure 5. (a) Sontag et al.’s³³ method; (b) ℓ1-min with the noise magnitude given; (c) ℓ1-min with the noise magnitude underestimated as 10% of the actual value; and (d) ℓ1-min with the noise magnitude overestimated as 10 times the actual value. The inference error was reduced by a factor of 10⁵ when using the ℓ1-min approach (Eqs (21, 22)) compared with Sontag et al.’s³³ method. Averaging further reduced the inference error by at least 30% in all cases (p-values of the paired t-tests were consistently below 0.0282).

Case II: Application to empirical genetic regulatory network inference

Next, we applied our method to infer real-world GRNs and compared its performance with that of other methods, including ND²⁶, Bayesian network inference, and Pearson and Spearman correlation networks⁸, using the framework of the DREAM5 challenge. The Pearson and Spearman correlations were considered because they are the most widely used methods for network inference and can provide a reasonable estimate of the total influence matrix²⁶,²⁷. In addition, ND has been most effective in inferring network topology when the total influence matrix G is estimated using Pearson and Spearman correlations. These therefore serve as challenging test cases for evaluating ℓ1-min in settings where ND is already effective. The DREAM5 challenge contains gene-expression microarray data for three networks: an in silico benchmark, a prokaryotic model organism (E. coli) and a eukaryotic model organism (S. cerevisiae). Besides the ρ and Hoyer metrics, we employed the following score, used in earlier work⁸ to assess how well a network inference method recovers the structure underlying these data sets:

$$\xi = -\frac{1}{2}\log_{10}\left(p_{\mathrm{AUROC}}\, p_{\mathrm{AUPR}}\right) \qquad (31)$$

where p_AUROC and p_AUPR are the p-values computed from the AUROC (area under the receiver operating characteristic curve) and the AUPR (area under the precision-recall curve), respectively.
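Once the two p-values are known, the score is a one-line computation. The sketch below assumes the DREAM5 convention ξ = −½ log₁₀(p_AUROC · p_AUPR); computing the p-values themselves, which requires the gold-standard network and a null distribution from random predictions, is outside its scope. Summing the logarithms rather than multiplying the p-values avoids floating-point underflow for the very small p-values that good predictions produce.

```python
import math

def dream5_score(p_auroc, p_aupr):
    """xi = -0.5 * log10(p_AUROC * p_AUPR); larger means better inference."""
    if not (0.0 < p_auroc <= 1.0 and 0.0 < p_aupr <= 1.0):
        raise ValueError("p-values must lie in (0, 1]")
    return -0.5 * (math.log10(p_auroc) + math.log10(p_aupr))

print(dream5_score(1e-2, 1e-2))    # approximately 2.0
print(dream5_score(1e-300, 1e-300))  # 300.0-ish; the product itself would underflow
```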

The results of the performance evaluation are summarized in Fig. 6. To compute the performance metrics, we first generated, for each data set, 30 different G matrices with Pearson correlation, 30 with Spearman correlation and another 30 with mutual information. Each G matrix was estimated from a sample comprising 75% of the data set. The averaging procedure is applied to the S matrices estimated from these G matrices by each method. The ξ-score (Eq. (31)) quantifies how well the estimated direct influence matrix captures the gold-standard network in terms of low false negative rates (FNR, related to sensitivity) and low false positive rates (FPR, related to specificity), i.e., in terms of the true positive rate (TPR) and true negative rate (TNR). In terms of this score, the ℓ1-min approach yields ξ-scores at least 18.53% higher than ND in all cases tested except the in silico case (see Fig. 6). Both ND and ℓ1-min performed better than the Bayesian network approach, whose ξ-scores were 14.891, 0.029 and 0.0001 for the three data sets⁸, respectively. In terms of the ρ-score (Eq. (29)), which quantifies the false positive rates (i.e., the specificity), the ℓ1-min approach reduces ρ by 2 to 3 orders of magnitude compared with ND in all cases. These results provide strong evidence for the relevance of the ℓ1-min approach for network structure inference. In terms of sparsity, the ℓ1-min approach increased the Hoyer measure by about 20% in most cases, yielding values much closer to the Hoyer measure of the gold-standard network than those of ND.

Figure 6. Performance comparison of (1) original G matrix, (2) ND, (3) ND with averaging, (4) ℓ₁-min and (5) ℓ₁-min with averaging for the DREAM5 challenge datasets. The total influence matrix G is estimated by Pearson correlation (blue/dark), Spearman correlation (red/light) and mutual information (green/light). Compared to ND, the prediction scores with ℓ₁-min are increased by 23.94% (G from Pearson correlation), 53.03% (G from Spearman correlation) and 18.53% (G from mutual information) for *E. coli*, and by 89.09%, 249.7% and 116.74%, respectively, for *S. cerevisiae*; the inference errors ρ (Eq. (29)) are reduced by 2 to 3 orders of magnitude in all cases; the Hoyer measures are increased by 34%, 36.41% and 322.91% for *E. coli* and by 18.85%, 19.59% and 96.65% for *S. cerevisiae*, respectively. For the *in silico* data, ND gives a solution with an 11% higher prediction score but 33% lower sparsity than the ℓ₁-min approach. Averaging slightly improves the performance of all methods (<10%).

As noted earlier, for the in silico data the ξ-score with ℓ₁-min was slightly (10%) lower than with ND, even though its ρ-score was at least 1160% lower (i.e., higher specificity) and its Hoyer measure was 33% higher (i.e., higher sparsity). The lower ξ-score of ℓ₁-min is perhaps a consequence of the method's susceptibility to over-specification of the noise level. In this context, note that the solutions from both ND and ℓ₁-min can replicate the observed total influence G within a specified bound on the total perturbation. However, the ℓ₁-min solutions tend to be much sparser and have lower false positive rates. Given that only 805 sample measurements were available to reconstruct G matrices for the 1643 nodes of the in silico network, it is highly likely that several dynamic modes (degrees of freedom) are not observable from the data. The ℓ₁-min approach therefore generated a much sparser network which, by formulation, is guaranteed to capture the observed modes of the dynamics within the specified total perturbation limits. Intriguingly, the ND-derived networks for the in silico and other cases with higher ξ-scores were consistently found to have much lower Hoyer scores (hence sparsity), even lower than that of the given total influence matrix. Thus, ℓ₁-min-generated solutions provide a significant improvement in specificity, although their sensitivity was at times slightly lower than with ND.
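The Hoyer measure⁴⁸ used throughout this comparison is, for a vector x of length n, (√n − ‖x‖₁/‖x‖₂)/(√n − 1): it equals 1 when a single entry is nonzero and 0 when all entries have equal magnitude. A small Python sketch follows, assuming a matrix is scored by flattening all of its entries (whether the paper includes the diagonal is not stated in this section). The percentage helper shows how relative increases such as those quoted for Fig. 6 would be computed; the 3-node matrices are hypothetical.

```python
import math

def hoyer(values):
    """Hoyer (2004) sparseness of a vector: 1 = one nonzero entry, 0 = all magnitudes equal."""
    x = [abs(float(v)) for v in values]
    n = len(x)
    l2 = math.sqrt(sum(v * v for v in x))
    if n < 2 or l2 == 0.0:
        raise ValueError("need at least two entries, not all zero")
    return (math.sqrt(n) - sum(x) / l2) / (math.sqrt(n) - 1)

def hoyer_matrix(M):
    """Score a matrix (list of rows) by flattening all of its entries."""
    return hoyer([v for row in M for v in row])

def percent_change(new, old):
    """Relative change in percent, e.g. for 'Hoyer measure increased by X%'."""
    return 100.0 * (new - old) / old

# Hypothetical 3-node example: a sparse direct network S versus a total-influence
# matrix G that also contains the indirect arc 0 -> 2 (0.8 * 0.5 = 0.4).
S = [[0.0, 0.8, 0.0], [0.0, 0.0, 0.5], [0.0, 0.0, 0.0]]
G = [[0.0, 0.8, 0.4], [0.0, 0.0, 0.5], [0.0, 0.0, 0.0]]
h_S, h_G = hoyer_matrix(S), hoyer_matrix(G)
print(f"Hoyer(S)={h_S:.3f}  Hoyer(G)={h_G:.3f}  change={percent_change(h_S, h_G):.1f}%")
```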

Averaging improves the ξ-scores (Eq. (31)) of all methods by at most 10%. This is perhaps due to the near-stationarity of the total influence matrix G when it is computed from data over long time windows, which smooths out various higher-order transient effects. Note also that averaging makes the network inferred by ND less sparse than without averaging. This is because, under noise, transients and data sparsity, ND yields vastly different network topologies depending on the samples employed, and averaging over these different networks reduces sparsity. Taken together, these results suggest that the ℓ₁-min approach is a particularly effective means of providing specificity in network inference from transient and noisy data. The utility of the approach would be to provide a minimal set of arcs (dynamic couplings or direct influences) to be considered in further network dynamics reconstruction.
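The sparsity loss from averaging unstable estimates can be seen in a toy experiment. The sketch below, with entirely synthetic data, averages 30 hypothetical 10×10 estimates that share 5 true arcs but each contain 5 spurious arcs at random positions, a crude stand-in for topologies that change from one subsample to the next. In this setting every estimate has the same ℓ₁ and ℓ₂ norms; averaging non-negative matrices keeps the ℓ₁ norm fixed while the ℓ₂ norm can only shrink (by convexity of the norm), so the Hoyer measure of the average cannot exceed that of the individual estimates and is strictly lower unless all estimates coincide.

```python
import math
import numpy as np

def hoyer(M):
    """Hoyer sparseness of all entries of M (1 = one nonzero, 0 = all magnitudes equal)."""
    x = np.abs(np.asarray(M, dtype=float)).ravel()
    n = x.size
    return (math.sqrt(n) - x.sum() / np.linalg.norm(x)) / (math.sqrt(n) - 1)

rng = np.random.default_rng(1)
n = 10
true_idx = [3, 17, 28, 46, 71]                   # flattened positions of 5 hypothetical true arcs
others = np.setdiff1d(np.arange(n * n), true_idx)

estimates = []
for _ in range(30):                              # 30 realizations, as in the text
    S_hat = np.zeros(n * n)
    S_hat[true_idx] = 1.0
    S_hat[rng.choice(others, size=5, replace=False)] = 1.0   # sample-dependent spurious arcs
    estimates.append(S_hat.reshape(n, n))

individual = [hoyer(S) for S in estimates]
S_avg = np.mean(estimates, axis=0)
print(f"individual Hoyer = {individual[0]:.3f}, averaged Hoyer = {hoyer(S_avg):.3f}")
```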

Discussion and Concluding remarks

In this paper, we have investigated a method to robustly infer the structure of a network representing a sparse dynamical system from noisy, transient time series data. When the noise level is known, the ℓ₁-min formulation, employing our theoretical formula for the bound on the total perturbation, improves on the recently reported ND method in terms of both accuracy and sparsity. When the noise level is unknown, we have shown that averaging the networks inferred from different time points or conditions makes it feasible to infer the network structure of real world processes.
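For intuition about why a sparse solver separates direct from indirect influences, the sketch below assumes the ND-style relation G = S + S² + ⋯ = S(I − S)⁻¹, which implies S(I + G) = G, and recovers S row by row from a noisy G. It is a swapped-in technique, not the authors' solver: it solves the penalized (LASSO) counterpart of an ℓ₁-min problem by iterative soft-thresholding (ISTA), whereas the paper's formulation bounds the total perturbation by ε_i and its cited solvers are ℓ₁-magic³⁸ and Gurobi³⁹. The weight λ, the noise level, the 0.05 support threshold and the test matrix are all hypothetical.

```python
import numpy as np

def soft_threshold(V, tau):
    """Elementwise soft-thresholding, the proximal operator of tau * ||.||_1."""
    return np.sign(V) * np.maximum(np.abs(V) - tau, 0.0)

def l1_direct_influence(G, lam=1e-3, n_iter=3000):
    """ISTA for min_S 0.5*||S(I+G) - G||_F^2 + lam*||S||_1.
    The Frobenius objective decouples over rows, so this solves one
    LASSO problem per node (row) simultaneously."""
    n = G.shape[0]
    A = np.eye(n) + G
    step = 1.0 / np.linalg.norm(A, 2) ** 2     # 1 / Lipschitz constant of the gradient
    S = np.zeros_like(G)
    for _ in range(n_iter):
        grad = (S @ A - G) @ A.T               # gradient of the least-squares term
        S = soft_threshold(S - step * grad, step * lam)
    return S

# Hypothetical sparse direct-influence matrix (acyclic chain, so S is nilpotent).
n = 8
S_true = np.zeros((n, n))
for (i, j), w in {(0, 1): 0.4, (1, 2): 0.3, (2, 3): -0.35, (4, 5): 0.3,
                  (5, 6): 0.25, (6, 7): -0.3, (7, 0): 0.2}.items():
    S_true[i, j] = w

# Total influence G = S + S^2 + ... = S (I - S)^{-1}, observed with small additive noise.
G_true = S_true @ np.linalg.inv(np.eye(n) - S_true)
rng = np.random.default_rng(0)
G_obs = G_true + 1e-3 * rng.standard_normal((n, n))

S_hat = l1_direct_influence(G_obs)
print("indirect entry G[0, 2] =", round(G_obs[0, 2], 3), " recovered S[0, 2] =", round(S_hat[0, 2], 4))
```

The indirect path 0 → 1 → 2 shows up in G with weight about 0.12, while the recovered S keeps it near zero.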

Pertinently, for most real world processes the total influence is not known a priori; only time series ensembles gathered under transient conditions are available (e.g., gene expression microarray data⁸,⁴⁹ and protein-protein interaction data⁵⁰, as in the case of Michaelis-Menten dynamics). Most earlier approaches have been noted to present severe accuracy, noise sensitivity and/or numerical stability issues in such realistic scenarios. To overcome these limitations, we have investigated the ℓ₁-min approach with a novel perturbation procedure for time series based network inference. Averaging the solutions estimated over different time windows has been shown to enable structure inference for complex real world networks, especially when the noise levels are unknown or cannot be accurately estimated.

We have applied our method to three benchmark systems: a sparse scale-free network⁵¹ with a specified noise level and a given total influence between every pair of nodes, a genetic regulatory network model formulated as a system of Hill-type differential equations²⁷, and the GRNs of the DREAM5 challenge⁴⁶. These analyses suggest that our proposed bounds on the constraints of the ℓ₁-min formulation, extracted from a few time series samples acquired under transient conditions, are of the same order as (i.e., closely envelop) the constraints estimated from full knowledge of the noise level. Compared with conventional approaches, including various versions of dynamic Bayesian network inference as well as ND, the ℓ₁-min formulation improves the prediction score ξ (Eq. (31)) by at least 18.53% (except for the in silico data), reduces the inference error ρ (Eq. (29)) by 2 to 3 orders of magnitude, and improves the sparsity of the solution (measured by the Hoyer sparsity measure) by 15.9%. If only time series gathered under transient conditions are provided instead of the total influence, as in the case of Michaelis-Menten dynamics, the ℓ₁-min approach achieves a 4-order-of-magnitude reduction in inference error compared to MRA. These theoretical and numerical studies suggest that our proposed method can effectively infer the presence of dynamic couplings (i.e., the arc set, or the direct influences in a dynamic network) from sparse samples.

As with any network reconstruction approach, the method assumes that the time series realizations taken together adequately mirror the salient dynamic regimes of the underlying process⁵², and, as noted earlier, the approach is geared toward ensuring high specificity rather than high sensitivity in identifying the direct influences. Additionally, while the approach is fairly robust to noise, the averaged estimates for arcs whose true direct influence is zero are guaranteed to converge to zero only in the presence of additive noise. More specifically, one of the following conditions needs to hold for the approach to be applicable: (1) the governing equation of the process dynamics is specified, so that the required influence or response matrices can be constructed; (2) one or more realizations of the total influence matrix G (for ND or the silencing method) or of the matrix R (for MRA) are given; in our experience, 30 realizations ensured the convergence of the averaging method; (3) one realization of an n-dimensional time series is available for estimating G using the various alternative methods outlined in Feizi et al.²⁶, or multiple time series realizations with the same initial condition are available for estimating R using Eq. (17). Scenario 1 is useful only for applications such as investigating whether a more compact (sparser) network representation exists that captures the specified process dynamics. In Scenarios 2 and 3, we assume that the noise level or its lower limit is known, and that an adequate number of realizations is available to ensure convergence of the averaging method. In Scenario 3, Eq. (17) yields a finite space-time approximation of the required partial derivatives, which are estimated by perturbing the parameters while keeping the initial condition the same for two time series signals. The time series in this case can be very short, or can simply consist of samples taken over multiple (roughly 30) short time windows, each containing as few as 2 samples. However, the time step (sampling interval) within each time window must be small enough for these derivative estimates to converge locally. The sensitivity of network inference performance to the time step size needs further investigation.
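Eq. (17) is not reproduced in this section, so the following Python sketch only illustrates the generic idea behind Scenario 3: two runs from the same initial condition, one with a perturbed parameter, give a forward finite-difference estimate of the sensitivity ∂x_i/∂p. The two-node cascade dx₁/dt = p − x₁, dx₂/dt = x₁ − x₂, the forward-Euler integrator and all names are hypothetical; the analytic sensitivities 1 − e^{−t} and 1 − e^{−t} − t e^{−t} are used only to check the estimate. Comparing two sampling intervals echoes the caveat that the time step must be small enough for the estimates to converge locally.

```python
import math

def simulate(p, dt, T=1.0):
    """Forward-Euler solution of the hypothetical cascade
    dx1/dt = p - x1, dx2/dt = x1 - x2, starting from x(0) = (0, 0)."""
    x1 = x2 = 0.0
    for _ in range(int(round(T / dt))):
        x1, x2 = x1 + dt * (p - x1), x2 + dt * (x1 - x2)   # right side uses the old x1
    return x1, x2

def fd_sensitivity(p, delta, dt, T=1.0):
    """Forward difference from two runs with the same initial condition: p versus p + delta."""
    base = simulate(p, dt, T)
    perturbed = simulate(p + delta, dt, T)
    return [(xp - xb) / delta for xb, xp in zip(base, perturbed)]

# Analytic sensitivities at T = 1, used only to check the estimate.
exact = [1 - math.exp(-1), 1 - 2 * math.exp(-1)]
fine = fd_sensitivity(1.0, 1e-6, dt=1e-3)
coarse = fd_sensitivity(1.0, 1e-6, dt=1e-2)
print("dt=1e-3:", [round(v, 5) for v in fine])
print("dt=1e-2:", [round(v, 5) for v in coarse])
print("exact:  ", [round(v, 5) for v in exact])
```

Because the toy system is linear in p, the finite difference itself is exact up to rounding; the remaining error comes from the integration step, which is why it shrinks with dt.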

Efforts are underway to address some of the aforementioned limitations of ℓ₁-min. We are investigating a two-stage approach to recover local nonlinear dynamics from sparse time series data. In future research, we will consider the more realistic scenario in which not all state variables can be measured. In GRN inference, for example, only the outputs/activations of genes that have already been discovered are measured, yet unknown genes might significantly influence the network structure. Removing the effects of unmeasured variables, combined with the method proposed in this paper, should lead to a more advanced network inference method.

Additional Information

How to cite this article: Tran, H. M. and Bukkapatnam, S. T.S. Inferring sparse networks for noisy transient processes. Sci. Rep. 6, 21963; doi: 10.1038/srep21963 (2016).

Supplementary Material

Supplementary Information

srep21963-s1.pdf (143.8 KB, PDF)

Supplementary Information

srep21963-s2.doc (72 KB, DOC)

Acknowledgments

The authors thank the anonymous reviewers for their constructive comments that have helped improve the manuscript. They also acknowledge the National Science Foundation CMMI division (Grants 1437139 and 1432914) for the generous support of this research. The open access publishing fees for this article have been covered by the Texas A&M University Online Access to Knowledge (OAK) Fund, supported by the University Libraries and the Office of the Vice President for Research.

Footnotes

Author Contributions H.M.T. and S.T.S.B. designed and performed the research, analyzed the results and wrote the paper.

References

1. Chen T., He H. L. & Church G. M. Modeling gene expression with differential equations. In Pacific Symposium on Biocomputing vol. 4, 4 (1999).
2. Hecker M., Lambeck S., Toepfer S., Van Someren E. & Guthke R. Gene regulatory network inference: data integration in dynamic models - a review. Biosystems 96, 86–103 (2009).
3. Schweitzer F. et al. Economic networks: The new challenges. Science 325, 422 (2009).
4. Carrington P. J., Scott J. & Wasserman S. Models and Methods in Social Network Analysis, vol. 28 (Cambridge University Press, 2005).
5. Guimera R., Mossa S., Turtschi A. & Amaral L. N. The worldwide air transportation network: Anomalous centrality, community structure, and cities’ global roles. Proceedings of the National Academy of Sciences 102, 7794–7799 (2005).
6. Newman M. E. The structure and function of complex networks. SIAM Review 45, 167–256 (2003).
7. Vogelstein B., Lane D. & Levine A. J. Surfing the p53 network. Nature 408, 307–310 (2000).
8. Marbach D. et al. Wisdom of crowds for robust gene network inference. Nature Methods 9, 796–804 (2012).
9. De Smet R. & Marchal K. Advantages and limitations of current network inference methods. Nature Reviews Microbiology 8, 717–729 (2010).
10. Marbach D. et al. Revealing strengths and weaknesses of methods for gene network inference. Proceedings of the National Academy of Sciences 107, 6286–6291 (2010).
11. Faisal F. E. & Milenković T. Dynamic networks reveal key players in aging. Bioinformatics 30, 1721–1729 (2014).
12. Žitnik M. & Zupan B. Gene network inference by probabilistic scoring of relationships from a factorized model of interactions. Bioinformatics 30, i246–i254 (2014).
13. Tang Q., Sun S. & Xu J. Learning scale-free networks by dynamic node specific degree prior. In Proceedings of The 32nd International Conference on Machine Learning, 2247–2255 (2015).
14. Chiuso A. & Pillonetto G. A Bayesian approach to sparse dynamic network identification. Automatica 48, 1553–1565 (2012).
15. Friedman N., Linial M., Nachman I. & Pe’er D. Using Bayesian networks to analyze expression data. Journal of Computational Biology 7, 601–620 (2000).
16. Friedman N. Inferring cellular networks using probabilistic graphical models. Science 303, 799–805 (2004).
17. Zou M. & Conzen S. D. A new dynamic Bayesian network (DBN) approach for identifying gene regulatory networks from time course microarray data. Bioinformatics 21, 71–79 (2005).
18. Young W. C., Raftery A. E. & Yeung K. Y. Fast Bayesian inference for gene regulatory networks using ScanBMA. BMC Systems Biology 8, 47 (2014).
19. Hill S. M. et al. Bayesian inference of signaling network topology in a cancer cell line. Bioinformatics 28, 2804–2810 (2012).
20. Seth A. K. A MATLAB toolbox for Granger causal connectivity analysis. Journal of Neuroscience Methods 186, 262–273 (2010).
21. Basu S., Shojaie A. & Michailidis G. Network Granger causality with inherent grouping structure. Journal of Machine Learning Research 16, 417–453 (2015).
22. Bolstad A., Van Veen B. D. & Nowak R. Causal network inference via group sparse regularization. IEEE Transactions on Signal Processing 59, 2628–2641 (2011).
23. Haufe S., Nolte G., Müller K.-R. & Krämer N. Sparse causal discovery in multivariate time series. JMLR W&CP 6, 97–106 (2010).
24. Lozano A. C., Abe N., Liu Y. & Rosset S. Grouped graphical Granger modeling for gene expression regulatory networks discovery. Bioinformatics 25, i110–i118 (2009).
25. De La Fuente A., Bing N., Hoeschele I. & Mendes P. Discovery of meaningful associations in genomic data using partial correlation coefficients. Bioinformatics 20, 3565–3574 (2004).
26. Feizi S., Marbach D., Médard M. & Kellis M. Network deconvolution as a general method to distinguish direct dependencies in networks. Nature Biotechnology 31, 726–733 (2013).
27. Barzel B. & Barabási A.-L. Network link prediction by global silencing of indirect correlations. Nature Biotechnology 31, 720–725 (2013).
28. Ebert-Uphoff I. & Deng Y. Causal discovery for climate research using graphical models. Journal of Climate 25, 5648–5665 (2012).
29. Runge J., Heitzig J., Petoukhov V. & Kurths J. Escaping the curse of dimensionality in estimating multivariate transfer entropy. Physical Review Letters 108, 258701 (2012).
30. Runge J. et al. Identifying causal gateways and mediators in complex spatio-temporal systems. Nature Communications 6 (2015).
31. Runge J., Petoukhov V. & Kurths J. Quantifying the strength and delay of climatic interactions: the ambiguities of cross correlation and a novel measure based on graphical models. Journal of Climate 27, 720–739 (2014).
32. Kholodenko B. N. et al. Untangling the wires: a strategy to trace functional interactions in signaling and gene networks. Proceedings of the National Academy of Sciences 99, 12841–12846 (2002).
33. Sontag E., Kiyatkin A. & Kholodenko B. N. Inferring dynamic architecture of cellular networks using time series of gene expression, protein and metabolite data. Bioinformatics 20, 1877–1886 (2004).
34. Wang W. X., Yang R., Lai Y. C., Kovanis V. & Grebogi C. Predicting catastrophes in nonlinear dynamical systems by compressive sensing. Physical Review Letters 106, 154101 (2011).
35. Napoletani D. & Sauer T. D. Reconstructing the topology of sparsely connected dynamical networks. Physical Review E 77, 026103 (2008).
36. Wang W.-X., Yang R., Lai Y.-C., Kovanis V. & Harrison M. A. F. Time-series-based prediction of complex oscillator networks via compressive sensing. EPL (Europhysics Letters) 94, 48006 (2011).
37. Boccaletti S., Latora V., Moreno Y., Chavez M. & Hwang D.-U. Complex networks: Structure and dynamics. Physics Reports 424, 175–308 (2006).
38. Candes E. & Romberg J. l₁-magic: Recovery of sparse signals via convex programming (2005), (Date of access: 03/05/2014). Available at: http://users.ece.gatech.edu/justin/l1magic/.
39. Gurobi Optimization Inc. Gurobi optimizer reference manual (2014), (Date of access: 02/03/2014). Available at: http://www.gurobi.com.
40. Herman M. A. & Strohmer T. General deviants: An analysis of perturbations in compressed sensing. IEEE Journal of Selected Topics in Signal Processing 4, 342–349 (2010).
41. Horn R. A. & Johnson C. R. Matrix Analysis (Cambridge University Press, 1985).
42. Barzel B. & Barabási A.-L. Network link prediction by global silencing of indirect correlations. Nature Biotechnology 31, 720–725 (2013).
43. Karlebach G. & Shamir R. Modelling and analysis of gene regulatory networks. Nature Reviews Molecular Cell Biology 9, 770–780 (2008).
44. Alon U. An Introduction to Systems Biology: Design Principles of Biological Circuits (CRC Press, 2006).
45. Jeong H., Tombor B., Albert R., Oltvai Z. N. & Barabási A.-L. The large-scale organization of metabolic networks. Nature 407, 651–654 (2000).
46. Stolovitzky G., Monroe D. & Califano A. Dialogue on reverse-engineering assessment and methods. Annals of the New York Academy of Sciences 1115, 1–22 (2007).
47. Muchnik L. Complex networks package for MATLAB (version 1.6) (2013), (Date of access: 12/08/2014). Available at: http://www.levmuchnik.net/Content/Networks/ComplexNetworksPackage.html.
48. Hoyer P. O. Non-negative matrix factorization with sparseness constraints. The Journal of Machine Learning Research 5, 1457–1469 (2004).
49. Arbeitman M. N. et al. Gene expression during the life cycle of Drosophila melanogaster. Science 297, 2270–2275 (2002).
50. Pagel P. et al. The MIPS mammalian protein-protein interaction database. Bioinformatics 21, 832–834 (2005).
51. Barabási A.-L. & Albert R. Emergence of scaling in random networks. Science 286, 509–512 (1999).
52. Cheng D. et al. Time series forecasting for nonlinear and non-stationary processes: a review and comparative study. IIE Transactions 47, 1053–1071 (2015).
