Identifying Dynamical Time Series Model Parameters from Equilibrium Samples, with Application to Gene Regulatory Networks

William Chad Young; Ka Yee Yeung; Adrian E Raftery

doi:10.1177/1471082x18776577

Author manuscript; available in PMC: 2021 Apr 5.

Published in final edited form as: Stat Modelling. 2018 Jun 17;19(4):444–465. doi: 10.1177/1471082x18776577


PMCID: PMC8021096 NIHMSID: NIHMS1585593 PMID: 33824624

Abstract

Gene regulatory network reconstruction is an essential task of genomics in order to further our understanding of how genes interact dynamically with each other. The most readily available data, however, are from steady state observations. These data are not as informative about the relational dynamics between genes as knockout or over-expression experiments, which attempt to control the expression of individual genes. We develop a new framework for network inference using samples from the equilibrium distribution of a vector autoregressive (VAR) time-series model which can be applied to steady state gene expression data. We explore the theoretical aspects of our method and apply the method to synthetic gene expression data generated using GeneNetWeaver.

Keywords: Gene networks, Network reconstruction, Time series, VAR equilibrium

1. Introduction

With continuing advances in gene expression measurement technologies, more and more expression data are becoming available for analysis. One of the important uses for these data has been in inferring gene regulatory networks. These networks identify relationships between genes and give insight into the complex inner workings of the cell.

Even with the increasing amounts of data, constructing gene regulatory networks is difficult in practice. The large number of genes involved, generally in the thousands, makes methods developed for smaller networks computationally infeasible. Time-series genetic data, with the same population of cells observed at multiple times, is useful in inferring the direction of the relationships (Yeung et al., 2011; Young et al., 2014). Knockdown and over-expression data can also be used to infer directionality of relationships (Young et al., 2016). However, most data take the form of perturbation screens or steady state data at a single timepoint. This makes it harder to identify the direction of the relationship between two genes without prior knowledge.

Many methods have been developed for the analysis of steady state data. Mutual information and correlation-based methods are commonly used, although the resulting networks tend to lack directionality (Tusher et al., 2001; Basso et al., 2005; Margolin et al., 2006; Faith et al., 2007; Meyer et al., 2007). Regression-based methods have also been applied to steady state data to infer network structure (Omranian et al., 2016; Singh and Vidyasagar, 2016). Bayesian network methods (Friedman et al., 2010) can result in directed graphs, but the networks generated are, of necessity, acyclic. This means that any cycles, such as those that are found in real biological systems, will not be captured, so the resulting networks will be unrealistic in this respect. Shojaie et al. (2014) developed a method combining information from both perturbation and steady state data to increase the accuracy of the inferred network.

Here we propose a new method based on an implicit multivariate time-series model that uses steady state data and has the capability to test the existence of edges in directed networks. This method is not restricted to acyclic networks, but can be used to infer information about cycles in the network. We present proofs of consistency and efficiency in a constrained case as well as a likelihood ratio test for the existence of an edge. We also give simulation results and apply the method to a synthetic dataset, showing that the method performs well in practice.

2. First-Order Vector Autoregressive Model

To build a model for equilibrium data, first consider a first-order vector autoregressive, or VAR(1) model for time-series data for a system with p genes with no correlation in the error term between genes (Lütkepohl, 2005). We can write the model as

x_{1} = ε_{1}, x_{t} = A x_{t - 1} + ε_{t} for t > 1, ε_{t} \sim N (0, D),

where D is diagonal, x_t and ε_t are vectors of length p while A and D are p × p matrices. Here, A identifies the relationships between the genes. A non-zero (i, j)-th element of A indicates that there is an edge from gene j to gene i.

Autoregressive models have been broadly used to infer gene networks from time-series data, with approaches ranging from penalized regression (LASSO, elastic net) to Bayesian model averaging (BMA) to Dynamic Bayesian Networks (DBN). Michailidis and d’Alché Buc (2013) review many of these approaches to modeling time-series using autoregressive models.

At first glance, the VAR(1) model does not appear to be applicable to steady state data. However, if the eigenvalues of A are less than 1 in absolute value, then as t → ∞, x_t converges to a stable equilibrium distribution (Anderson, 2000). We can use this equilibrium distribution as the model for steady state data. This can be applied to either a set of experiments with the same perturbation or to the control experiments.

2.1. Equilibrium Distribution

Since this is a Gaussian VAR model, we can write the equilibrium distribution as

x_{\infty} \sim N (0, Σ),

where Σ is also a p × p matrix. To find Σ, we use an iterative relationship between the variances at consecutive time points:

Var [x_{1}] = D, Var [x_{t + 1}] = A Var [x_{t}] A^{T} + D, Σ = \sum_{i = 0}^{\infty} A^{i} D (A^{T})^{i} .

We will call this the summation identity (Sheppard, 2013).

Another expression for the asymptotic variance uses the identity vec(AXB) = (B^T ⊗ A)vec(X):

Σ = \sum_{i = 0}^{\infty} A^{i} D (A^{T})^{i},
vec (Σ) = vec (\sum_{i = 0}^{\infty} A^{i} D (A^{T})^{i})
= \sum_{i = 0}^{\infty} vec (A^{i} D (A^{T})^{i})
= \sum_{i = 0}^{\infty} (A^{i} \otimes A^{i}) vec (D)
= \sum_{i = 0}^{\infty} (A \otimes A)^{i} vec (D)
= (I - A \otimes A)^{- 1} vec (D) .

This Kronecker identity is convenient for calculating the asymptotic variance from a given A and D since it eliminates the infinite sum (Sheppard, 2013). We can also write out a recursive identity for Σ since, at the equilibrium, the variance does not change from one time step to the next:

Σ = A Σ A^{T} + D .

(2.1)
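As a numerical check on these identities, the following minimal Python sketch computes Σ for a small example in two ways, from the Kronecker identity and from a truncated version of the summation identity, and then evaluates the residual of the recursive identity (2.1). The helper names (matmul, solve, sigma_kronecker, sigma_summation) and the truncation at 500 terms are illustrative assumptions rather than part of the original analysis; the values of A and D are those used for the two-dimensional simulation in Section 3.2.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


def solve(M, b):
    """Solve M z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(M[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    z = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][k] * z[k] for k in range(r + 1, n))
        z[r] = (aug[r][n] - tail) / aug[r][r]
    return z


def sigma_kronecker(A, D):
    """Equilibrium variance from vec(Sigma) = (I - A kron A)^{-1} vec(D)."""
    p = len(A)
    # Entry ((i, k), (j, l)) of A kron A is A[i][j] * A[k][l].
    M = [[(1.0 if (i, k) == (j, l) else 0.0) - A[i][j] * A[k][l]
          for j in range(p) for l in range(p)]
         for i in range(p) for k in range(p)]
    vec_d = [D[l][j] for j in range(p) for l in range(p)]  # column-stacked vec(D)
    v = solve(M, vec_d)
    return [[v[j * p + i] for j in range(p)] for i in range(p)]


def sigma_summation(A, D, terms=500):
    """Equilibrium variance from the truncated sum of A^i D (A^T)^i."""
    p = len(A)
    total = [[0.0] * p for _ in range(p)]
    term = [row[:] for row in D]
    At = transpose(A)
    for _ in range(terms):
        total = [[total[i][j] + term[i][j] for j in range(p)] for i in range(p)]
        term = matmul(matmul(A, term), At)
    return total


A = [[0.9, 0.0], [0.7, 0.8]]
D = [[1.0, 0.0], [0.0, 1.0]]
Sigma = sigma_kronecker(A, D)
Sigma_sum = sigma_summation(A, D)

# Residual of the recursive identity Sigma = A Sigma A^T + D.
ASA = matmul(matmul(A, Sigma), transpose(A))
residual = max(abs(Sigma[i][j] - ASA[i][j] - D[i][j])
               for i in range(2) for j in range(2))
```

The Kronecker route needs a single linear solve of size p² × p², which is convenient for small p; for larger systems the fixed-point form of the recursive identity may be the cheaper option.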

3. Identifiability

When looking at steady state data as opposed to time-series data, we must rely on the relationship between (A, D) and Σ in order to infer anything about A and D. This raises an issue of identifiability. As a simple example, substituting −A for A in any of the variance identities above does not change anything, so the sign of A cannot be identified from equilibrium data only. More generally, let Q be any p × p orthogonal matrix, that is, any matrix such that QQ^T = I. For any such Q, the matrix $\tilde{A}$ defined as

\tilde{A} = A Σ^{1 ∕ 2} Q Σ^{- 1 ∕ 2}

will satisfy the recursive identity (Tong et al., 1992).
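The verification is short. Assuming Σ^{1/2} denotes the symmetric square root of Σ, substituting $\tilde{A}$ into the left-hand side of the recursive identity gives the following worked step:

```latex
\tilde{A} Σ \tilde{A}^{T}
  = A Σ^{1/2} Q Σ^{-1/2} \, Σ \, Σ^{-1/2} Q^{T} Σ^{1/2} A^{T}
  = A Σ^{1/2} Q Q^{T} Σ^{1/2} A^{T}
  = A Σ A^{T}
  = Σ - D .
```

Taking Q = −I gives $\tilde{A}$ = −A, which recovers the sign example above.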

We can state the identifiability problem as follows. Under what conditions is the map

(A, D) \leftrightarrow Σ

bijective? As a necessary condition we can count parameters.

Proposition 3.1. A necessary condition for identifiability in the VAR equilibrium model of dimension p, with connected nodes and non-zero diagonal elements of A, is that the number of non-zero off-diagonal elements of A be no more than p(p − 3)/2 and that p ≥ 5.

Proof. The number of parameters in Σ for a p-dimensional model is p(p + 1)/2. For identifiability, the total number of independently estimable parameters in A and D must not exceed that number. D is diagonal and thus accounts for p parameters. Similarly, the diagonal of A is assumed to be non-zero, implying that the expression level of each gene at timepoint t + 1 is dependent on the expression level of that gene at timepoint t for all genes. This accounts for an additional p parameters. Thus a maximum of p(p − 3)/2 off-diagonal elements of A may be non-zero. Now, the graph defined by A must be connected. If it is not, then it can be decomposed into two smaller, independent components, and each of those must be identifiable. We need at least p − 1 off-diagonal elements of A for it to be connected. Thus p must be at least 5, because if p = 4, only a maximum of 2 off-diagonal elements is allowed, which is not enough to fully connect the 4 nodes.
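The counting argument can be written out as a short Python sketch. The function names are hypothetical, and the check is only the necessary condition of Proposition 3.1, not a test of identifiability.

```python
def max_offdiag_edges(p):
    """Upper bound p(p - 3)/2 on the non-zero off-diagonal entries of A."""
    return p * (p - 3) // 2


def passes_counting_condition(p, n_edges):
    """Necessary (not sufficient) condition: enough edges to connect the
    graph, and no more free parameters than Sigma can determine."""
    return (p - 1) <= n_edges <= max_offdiag_edges(p)


# Smallest dimension for which a connected graph can satisfy the bound.
smallest_p = min(p for p in range(1, 20) if passes_counting_condition(p, p - 1))
```

For instance, `passes_counting_condition(4, 3)` is false because a 4-node model allows at most 2 off-diagonal entries, while a 5-node model with 4 or 5 edges passes.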

Parameter counting yields a necessary condition, but it is not a sufficient condition. We have not found sufficient conditions in the most general case. We have, however, been able to examine identifiability in two specific examples, one acyclic and one cyclic. To assess the identifiability of a specific model, we used the recursive identity,

Σ = A Σ A^{T} + D .

If a given model is identifiable, then there will be only one (A, D) pair which satisfies this equation.

The first example is a 5-dimensional model with a simple network structure, given in Figure 1. This model has four off-diagonal non-zero elements in A. The maximum allowable by parameter counting is 5, so we are close to but not at that limit. Writing out the recursive identity elementwise yields a system of equations, the first two of which are:

σ_{11} = a_{11}^{2} σ_{11} + 2 a_{11} a_{12} σ_{12} + a_{12}^{2} σ_{22} + d_{1},
σ_{12} = a_{11} a_{22} σ_{12} + a_{12} a_{22} σ_{22} + a_{11} a_{23} σ_{13} + a_{12} a_{23} σ_{23},
\dots

where a_ij is the entry of A in the i-th row and j-th column, and d_i is the i-th diagonal entry of D. Next, we solve the system of equations to eliminate as many variables as possible. Then we perform a grid search on the free variables to find all potential solutions of the recursive identity. In this example, we were able to solve the system down to a single free parameter, a₅₅.

Figure 1: The network structure for the acyclic identifiability example.

To test the parameter solutions, we used the following model parameters:

A = [\begin{matrix} 0.6 & 0.3 & 0 & 0 & 0 \\ 0 & 0.4 & - 0.4 & 0 & 0 \\ 0 & 0 & 0.8 & 0.5 & 0 \\ 0 & 0 & 0 & 0.5 & - 0.3 \\ 0 & 0 & 0 & 0 & 0.7 \end{matrix}], D = I_{5} .

The grid search was performed by allowing a₅₅ to vary between 0 and 1 in increments of 0.0001. This is the allowable range for a₅₅ since A and −A both result in the same Σ. For each prospective value of a₅₅, the other elements of A and D were calculated. These values were then used to evaluate the objective function log $‖ Σ - A Σ A^{T} - D ‖_{2}^{2}$ . Figure 2 shows the results. From this it is clear that the only value of a₅₅ that satisfies the recursive identity is the true value, 0.7, where the objective function is minimized. The flat region of the objective curve to the left of a₅₅ ≈ 0.65 reflects the fact that there is no valid solution for A and D with that starting value for a₅₅, either because some element of A is not a real value or because a diagonal element of D is negative. Also note that there is a degenerate solution at a₅₅ = 1 such that A = I and D = 0. However, since D represents the variance of the noise added to the VAR system, its diagonal elements must be positive, which rules out this solution. So in this example the model is identifiable.

Figure 2: Grid search for solutions of the acyclic identifiability example. The objective function is minimized at the true value of a₅₅ = 0.7, showing that the model is identifiable in this example.

Our second example involves a cyclic network. We added a single edge to the previous network, making a 3-node cycle, as shown in Figure 3. We used the following model parameters:

A = [\begin{matrix} 0.3 & 0.6 & 0 & 0 & 0 \\ 0 & 0.1 & - 0.3 & 0 & 0 \\ 0 & 0 & 0.7 & 0.2 & 0 \\ 0 & 0.3 & 0 & 0.4 & - 0.6 \\ 0 & 0 & 0 & 0 & 0.5 \end{matrix}], D = I_{5} .

For this model, we were able to solve down to two free parameters, a₄₅ and a₅₅. We again performed a grid search over the appropriate parameter space (a₄₅ ∈ (−1, 1), a₅₅ ∈ (0, 1)) with an initial grid increment of 0.001 and evaluated the objective function at each point. Figure 4 shows the results of the grid search, both in the larger space as well as zoomed in around the truth, where a much smaller grid increment of 10⁻¹¹ was used. There are large areas where a valid solution is not achieved, indicated by the white areas. Again, however, the original solution is the only valid one, indicating that this model is also identifiable.

Figure 4: Grid search for solutions of the cyclic identifiability example showing the value of the objective function at each point searched. The left plot shows the grid search over the entire model space, while the right plot is zoomed in on the true parameter value. The white areas indicate parameter values that result in invalid solutions. The true solution at (−0.6, 0.5) is the only valid solution.

3.1. Simple Case

Although the parameter counting condition is helpful, and models satisfying it proved to be identifiable in the two examples above, it is important to find a more general result for identifiability.

We now find a more general condition for identifiability within a specific subset of VAR(1) models. We do this by restricting the model considerably to make the analysis more tractable. First, we constrain D to be known. Secondly, we require that the model be acyclic, and thus the nodes can be ordered such that A is lower-triangular. Finally, we require that the diagonal entries of A be less than 1 in absolute value and that their signs be known. With these constraints, the model is identifiable and has good asymptotic properties.

Consider the model as constrained above. We order the nodes in the network topologically, meaning that there are no edges from node i to node j if i > j. This creates an A that is lower-triangular. We will use the recursive identity for the equilibrium variance:

A Σ A^{T} + D = Σ,

(3.1)

A Σ A^{T} = Σ - D .

(3.2)

We solve this identity recursively by looking at the leading principal submatrices, an approach inspired by Drton et al. (2011). First, define S_i to be the leading i × i principal submatrix of Σ, that is, the matrix made up of the elements in the first i rows and columns of Σ. Similarly, define B_i to be the leading i × i principal submatrix of B = Σ − D and Γ_i to be the leading i × i principal submatrix of A. Also, define s_{i+1} to be the vector containing the first i elements of the (i + 1)-th column of Σ (equivalently of B, since D is diagonal). That is, $s_{i + 1} = [σ_{1, i + 1}, \dots, σ_{i, i + 1}]^{T}$ . Similarly, $γ_{i + 1}^{T}$ is the vector containing the first i elements of the (i + 1)-th row of A. So,

S_{i + 1} = [\begin{matrix} S_{i} & s_{i + 1} \\ s_{i + 1}^{T} & s_{i + 1, i + 1} \end{matrix}], s_{i i} = σ_{i i},
B_{i + 1} = [\begin{matrix} B_{i} & s_{i + 1} \\ s_{i + 1}^{T} & b_{i + 1, i + 1} \end{matrix}], b_{i i} = σ_{i i} - d_{i i},
Γ_{i + 1} = [\begin{matrix} Γ_{i} & 0 \\ γ_{i + 1}^{T} & γ_{i + 1, i + 1} \end{matrix}], γ_{i i} = a_{i i} .

Now, for i = 1, we get

γ_{i i}^{2} s_{i i} = b_{i i}, a_{i i} = \sqrt{\frac{σ_{i i} - d_{i i}}{σ_{i i}}} .

This gives us Γ₁. Now, suppose we know Γ_i. Then, for i + 1, we have

B_{i + 1} = Γ_{i + 1} S_{i + 1} Γ_{i + 1}^{T}, [\begin{matrix} B_{i} & s_{i + 1} \\ s_{i + 1}^{T} & b_{i + 1, i + 1} \end{matrix}] = [\begin{matrix} Γ_{i} S_{i} Γ_{i}^{T} & Γ_{i} S_{i} γ_{i + 1} + γ_{i + 1, i + 1} Γ_{i} s_{i + 1} \\ - & γ_{i + 1}^{T} S_{i} γ_{i + 1} + 2 γ_{i + 1, i + 1} s_{i + 1}^{T} γ_{i + 1} + γ_{i + 1, i + 1}^{2} s_{i + 1, i + 1} \end{matrix}] .

We solve for γ_{i+1,i+1} and γ_{i+1} to get

γ_{i + 1, i + 1} = \sqrt{\frac{b_{i + 1, i + 1} - s_{i + 1}^{T} (Γ_{i} S_{i} Γ_{i}^{T})^{- 1} s_{i + 1}}{s_{i + 1, i + 1} - s_{i + 1}^{T} S_{i}^{- 1} s_{i + 1}}},
γ_{i + 1} = (Γ_{i} S_{i})^{- 1} (I - γ_{i + 1, i + 1} Γ_{i}) s_{i + 1} .

Notice here that the expression for γ_{i+1,i+1} has two solutions, the positive and negative square roots. These elements are the diagonal entries of A, which is why the signs of the diagonal elements must be specified. If they are not specified, there are 2^p possible solutions. Once the signs of these elements are specified, we have a unique solution for A given Σ and D. However, there still exist positive definite matrices Σ that do not map back to a valid A given D. As a simple example, if σ_ii < d_ii, then no valid A exists, since the equilibrium variance of a gene cannot be less than the error variance added at each step.
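The recursion can be sketched in Python as follows. This is an illustrative implementation under the stated constraints (known diagonal D, lower-triangular A, known signs of the diagonal), not the authors' code. It uses the Schur-complement form of the diagonal update, with Γ_i S_i Γ_i^T = B_i in the numerator and S_i in the denominator; the function names and the default of positive signs are assumptions. The test matrix is the 3-dimensional A from Section 4.

```python
import math


def solve(M, b):
    """Solve M z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(M[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    z = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][k] * z[k] for k in range(r + 1, n))
        z[r] = (aug[r][n] - tail) / aug[r][r]
    return z


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def equilibrium_sigma(A, d, iters=500):
    """Iterate Sigma <- A Sigma A^T + D, starting from D = diag(d)."""
    p = len(d)
    Sigma = [[d[i] if i == j else 0.0 for j in range(p)] for i in range(p)]
    for _ in range(iters):
        AS = [[dot(A[i], [Sigma[k][j] for k in range(p)]) for j in range(p)]
              for i in range(p)]
        Sigma = [[dot(AS[i], A[j]) + (d[i] if i == j else 0.0)
                  for j in range(p)] for i in range(p)]
    return Sigma


def recover_A(Sigma, d, signs=None):
    """Recover lower-triangular A from Sigma and known diagonal D = diag(d)."""
    p = len(d)
    signs = signs or [1] * p  # assumed signs of the diagonal of A
    A = [[0.0] * p for _ in range(p)]
    A[0][0] = signs[0] * math.sqrt((Sigma[0][0] - d[0]) / Sigma[0][0])
    for i in range(1, p):
        S = [row[:i] for row in Sigma[:i]]                    # S_i
        B = [[S[r][c] - (d[r] if r == c else 0.0) for c in range(i)]
             for r in range(i)]                               # B_i = S_i - D_i
        G = [row[:i] for row in A[:i]]                        # Gamma_i
        s = [Sigma[r][i] for r in range(i)]                   # s_{i+1}
        num = Sigma[i][i] - d[i] - dot(s, solve(B, s))
        den = Sigma[i][i] - dot(s, solve(S, s))
        g = signs[i] * math.sqrt(num / den)                   # diagonal entry
        u = solve(G, s)                                       # Gamma_i^{-1} s
        gamma = solve(S, [u[r] - g * s[r] for r in range(i)])
        A[i][:i] = gamma
        A[i][i] = g
    return A


A_true = [[0.6, 0.0, 0.0], [-0.3, 0.4, 0.0], [0.0, -0.5, 0.8]]
d = [1.0, 1.0, 1.0]
Sigma = equilibrium_sigma(A_true, d)
A_hat = recover_A(Sigma, d)
max_error = max(abs(A_hat[i][j] - A_true[i][j])
                for i in range(3) for j in range(3))
```

When Σ is replaced by a sample covariance matrix, the ratios under the square roots can become negative, which is one way the invalid-Σ case described above shows up in practice.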

Now we show that the maximum likelihood estimator (MLE) for A in this model is both consistent and efficient:

Theorem 3.2. Suppose that independent p-dimensional observations x₁,…, x_n are drawn from the distribution

X ∣ A \sim N (0, \sum_{i = 0}^{\infty} A^{i} D (A^{T})^{i}),

where D is known and A is lower-triangular with the signs of the diagonal elements known, and the diagonal elements less than 1 in absolute value. Then $\hat{A}$ , the MLE of A, is consistent and asymptotically efficient with

\sqrt{n} (\hat{a} - a) \to N (0, F (a)^{- 1}),

where a is the vectorization of the non-zero elements of A and $F (a)$ is the Fisher information matrix, whose elements are

F (a)_{i j} = E [(\frac{\partial}{\partial a_{i}} \log f (x ∣ A)) (\frac{\partial}{\partial a_{j}} \log f (x ∣ A))] .

Proof. We will show that the conditions for Cramér’s Theorem, as found in Ferguson (1996, p. 121), are satisfied.

The appendix contains derivations of the first and second partial derivatives and shows that they exist and are continuous. For example, the first partial derivatives take the form

\frac{\partial σ_{i j}}{\partial a_{m n}} = \frac{1}{1 - a_{i i} a_{j j}} [\begin{matrix} (A Σ)_{i n} 1_{[j = m, j \geq n]} + (A Σ)_{j n} 1_{[i = m, i \geq n]} \\ + \sum_{k = 1}^{i} \sum_{l = 1}^{j} a_{i k} a_{j l} \frac{\partial σ_{k l}}{\partial a_{m n}} 1_{[(k, l) \neq (i, j)]} \end{matrix}] .

Both the first and second partial derivatives can be calculated iteratively.

Now let

Ψ (x, a) = \frac{\partial}{\partial a} \log f (x ∣ a) (a vector of length k),
\dot{Ψ} (x, a) = \frac{\partial^{2}}{\partial a^{2}} \log f (x ∣ a) (a k \times k matrix) .

Note that the components of $\dot{Ψ} (x, a)$ , as shown in the appendix, are all of the form

\sum_{i, j} c_{i j} ({xx}^{T})_{i j},

where the c_ij’s are constant. Further, the expectations of the absolute values of the components are finite since E[∣xx^T∣_ij] < ∞ (Li and Wei, 2012). Now let

K (x) = \sum_{i, j} m_{i j} ∣ {xx}^{T} ∣_{i j},

where m_ij is the maximum of all the ∣c_ij∣’s across all components. Then each component of $\dot{Ψ} (x, a)$ is bounded by K(x) in absolute value and K(x) has finite expectation.

These results, along with the earlier proof of identifiability and the form of the constraints on A, satisfy Cramér’s Theorem. Thus the MLE of A is asymptotically consistent and efficient.

In addition to asymptotic consistency and efficiency, we also can form a likelihood-ratio test for the existence of an edge.

Corollary 3.3. Let $A_{1}$ be the set of all valid A in the VAR equilibrium model, possibly with some entries constrained to be 0. Let $A_{0} \subset A_{1}$ with a difference in dimension between the two spaces of c. This corresponds to c additional entries of A constrained to be 0. Then the log-likelihood ratio test statistic

λ = - 2 \log (\frac{{sup}_{A \in A_{0}} L (A ∣ x)}{{sup}_{A \in A_{1}} L (A ∣ x)})

has a $χ_{c}^{2}$ distribution, asymptotically.

Proof. The conditions which satisfy Cramér’s theorem also satisfy Wilks’ theorem concerning the likelihood-ratio test statistic (Ferguson, 1996, p. 145).

3.2. Simulation Results - Asymptotic Variance

We have an asymptotic distribution for the MLE, and now we assess the quality of its approximation in finite samples via simulation. Because of the complexity of the calculations involved in finding the Fisher information matrix, we will restrict ourselves to a two-dimensional model. This model has three parameters for A,

A = [\begin{matrix} a_{11} & 0 \\ a_{21} & a_{22} \end{matrix}],

so a = [a₁₁, a₂₁, a₂₂]^T and the Fisher information matrix will be 3 by 3, with elements

F (a)_{i j} = E [(\frac{\partial}{\partial a_{i}} \log f (x ∣ A)) (\frac{\partial}{\partial a_{j}} \log f (x ∣ A))] .

For our simulations, we used

A = [\begin{matrix} 0.9 & 0 \\ 0.7 & 0.8 \end{matrix}], D = I_{2} .

Using these values, the asymptotic variance of the MLE for A is calculated to be

F (a)^{- 1} = [\begin{matrix} 0.002 & - 0.126 & 0.029 \\ - 0.126 & 3.094 & - 0.850 \\ 0.029 & - 0.850 & 0.251 \end{matrix}] .

To test this, we performed a simulation where we generated n samples from the equilibrium distribution. We then found the value of $\hat{A}$ that maximized the likelihood. To do this, we initialized A using the recursive algorithm described above and used the function optim in R to optimize the log-likelihood of the model, utilizing the BFGS method developed by Broyden (1970); Fletcher (1970); Goldfarb (1970). We repeated the initialization and optimization procedure 1000 times and used the resulting $\hat{A}$ values to calculate an empirical variance. We used three different sample sizes n, resulting in the following empirical variances:

n = 1000,

Var (\hat{A}) = [\begin{matrix} 0.021 & - 0.077 & 0.009 \\ - 0.077 & 2.633 & - 0.717 \\ 0.009 & - 0.717 & 0.219 \end{matrix}],

n = 10000,

Var (\hat{A}) = [\begin{matrix} 0.023 & - 0.074 & 0.007 \\ - 0.074 & 2.705 & - 0.746 \\ 0.007 & - 0.746 & 0.230 \end{matrix}],

n = 100000,

Var (\hat{A}) = [\begin{matrix} 0.023 & - 0.091 & 0.012 \\ - 0.091 & 2.757 & - 0.737 \\ 0.012 & - 0.737 & 0.220 \end{matrix}] .

These results show that the asymptotic variance approximates the finite sample variances quite well, and so provide support for the asymptotic theory.
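The objective maximized in these simulations is the Gaussian log-likelihood of the equilibrium model. The sketch below evaluates it in plain Python from the second-moment matrix S of the data, since for mean-zero Gaussian samples the log-likelihood depends on the data only through S. It does not reproduce the BFGS optimization carried out with optim in R; the function names are hypothetical, and setting S equal to the true Σ is an idealization used only to make the example deterministic.

```python
import math


def equilibrium_sigma(A, d, iters=500):
    """Iterate Sigma <- A Sigma A^T + D, starting from D = diag(d)."""
    p = len(d)
    Sigma = [[d[i] if i == j else 0.0 for j in range(p)] for i in range(p)]
    for _ in range(iters):
        AS = [[sum(A[i][k] * Sigma[k][j] for k in range(p)) for j in range(p)]
              for i in range(p)]
        Sigma = [[sum(AS[i][k] * A[j][k] for k in range(p))
                  + (d[i] if i == j else 0.0) for j in range(p)]
                 for i in range(p)]
    return Sigma


def cholesky(M):
    """Lower-triangular L with L L^T = M, for positive definite M."""
    p = len(M)
    L = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            acc = M[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(acc) if i == j else acc / L[j][j]
    return L


def chol_solve(L, b):
    """Solve (L L^T) x = b by forward and back substitution."""
    p = len(b)
    y = [0.0] * p
    for i in range(p):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * p
    for i in reversed(range(p)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, p))) / L[i][i]
    return x


def log_likelihood(A, d, S, n):
    """Log-likelihood of n mean-zero samples with second-moment matrix S."""
    p = len(d)
    L = cholesky(equilibrium_sigma(A, d))
    logdet = 2.0 * sum(math.log(L[i][i]) for i in range(p))
    # trace(Sigma^{-1} S), built one column of S at a time.
    trace = sum(chol_solve(L, [S[r][j] for r in range(p)])[j] for j in range(p))
    return -0.5 * n * (p * math.log(2.0 * math.pi) + logdet + trace)


A_true = [[0.9, 0.0], [0.7, 0.8]]
A_alt = [[0.9, 0.0], [0.0, 0.8]]      # hypothetical competitor without the edge
d = [1.0, 1.0]
S = equilibrium_sigma(A_true, d)      # idealized data: S equals the true Sigma
ll_true = log_likelihood(A_true, d, S, n=1000)
ll_alt = log_likelihood(A_alt, d, S, n=1000)
```

With real data, S would be the average of x xᵀ over the n samples, and a numerical optimizer would search over the free entries of A.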

3.3. Simulation Results - Coverage

Another way to assess the accuracy of the asymptotic approximation for finite samples is to look at the coverage of the resulting confidence intervals. To do this, we used the same two-dimensional model as above. In each single simulation, a₁₁ and a₂₂ were chosen at random from a Uniform(0.1, 0.9) distribution and a₂₁ was chosen at random from a Uniform(−0.9, 0.9) distribution. We then generated n samples from the equilibrium distribution and used those samples to obtain an MLE for A by maximizing the log-likelihood as before. We then used the empirical asymptotic variance of the MLE to obtain confidence intervals and see if the true values are covered. We repeated this 1000 times with the same A to get coverage results.

We carried out this procedure 10 times each for n equal to 100, 1000 and 10000, generating a different A each time. The coverage results are shown in Figures 5-8. The x-axis is the true value of a₁₁. This value matters more than the other A values because it propagates to all the values of Σ. As n increases, the coverage results clearly improve. The coverage is essentially correct for large n, both marginally and jointly.

Figure 5: Joint coverage for all parameters of A using the 95% confidence interval from the multivariate normal asymptotic distribution of the MLE.

Figure 8: Marginal coverage for a₂₂ using the 95% confidence interval from the asymptotic distribution of the MLE.

We note that with small n and a₁₁, the observed variance of x₁ can be smaller than that of the known D. This can cause problems with optimizing A.

4. Likelihood Ratio Test

We now assess the validity of the likelihood-ratio test for the simple VAR equilibrium model. To test this, we used a 3-dimensional system with

A = [\begin{matrix} 0.6 & 0 & 0 \\ - 0.3 & 0.4 & 0 \\ 0 & - 0.5 & 0.8 \end{matrix}] .

For this system, $A_{0}$ has the correct constraint that the lower-left element is zero while $A_{1}$ does not have that constraint. For a dataset generated from the model, we find the MLE under both hypotheses and construct the likelihood ratio test statistic.

To obtain the empirical distribution of the test statistic, we generated 10,000 samples from the equilibrium distribution and calculated the likelihood ratio for those data. We repeated this 1,000 times to get a distribution of the statistic values. Figure 9 shows the distribution compared with a χ² distribution with one degree of freedom. The distributions are close, indicating that the asymptotic distribution of the likelihood-ratio chi-squared test statistic provides a good approximation in this simulated example.

Figure 9: Comparison of the empirical distribution of the likelihood ratio statistic to test for an additional edge in the VAR equilibrium problem with a χ² distribution with one degree of freedom. The left-hand plot shows the density of the test statistic compared with the χ² distribution while the right shows the empirical CDF comparison.

5. Application to Synthetic Data

We have shown by simulation that the asymptotic distributions of estimators and likelihood-ratio test statistics provide good approximations to finite-sample distributions in several example scenarios. What happens when we try to apply the method in situations where the model is not known to be correct? The matrix D is not likely to be known a priori in a real data setting, and in gene networks the graph is not necessarily restricted to be acyclic. That the general VAR equilibrium model does not require an acyclic graph is a potential advantage of our method over Bayesian network methods, which require a directed acyclic graph.

5.1. GeneNetWeaver Data

To look at the method in a more realistic setting, we need a reasonable size network where the truth is known. For this we used networks from the DREAM4 in silico network challenge competition (http://dreamchallenges.org/project/dream4-in-silico-network-challenge/). The 10-gene networks from the competition are a good size for analysis, include cycles, and are known. The original data used in the competition include time series data simulating a perturbation to the network. The first and last time points do correspond to a stable equilibrium for the network, but these cannot be used for our purposes because the original data do not include enough separate time series from which to get samples.

However, the software used to generate the data, GeneNetWeaver, is available online at http://gnw.sourceforge.net/ (Marbach et al., 2009; Schaffter et al., 2011). The software can be configured to generate any number of independent time series from a given network model. The data are generated according to a dynamical model using ordinary differential equations with added stochastic noise. For the time series data, the network begins at equilibrium and then the network is artificially perturbed for the first half of the time course. At the midpoint of the time course, the perturbation is removed and the network is allowed to return to its equilibrium state by the end of the time course. The GeneNetWeaver software is designed to provide a rich simulation of the dynamics of biological regulatory networks. The ODEs used include terms simulating transcription and translation rates as well as protein degradation rates, and simulate mRNA levels and protein levels simultaneously. Noise is added to the system to simulate both random fluctuations in the actual levels of mRNA and protein as well as measurement error.

We used GeneNetWeaver to generate time series data using the network model used in the first 10-gene network from the DREAM4 competition. The GeneNetWeaver software has this network available as a pre-configured model from which data can be generated. The network has 10 nodes and 15 non-self edges, including a cycle involving three genes. All settings for data generation were the same as those used in the DREAM4 competition, including the amount of added noise to the system and the model of noise in microarrays. We used GeneNetWeaver to generate 100 random time series of 21 points each from the model and took the first and last timepoints as samples from the equilibrium distribution.

5.2. Method Application

To see how the VAR equilibrium method works on these data, we used the known network structure as our initial model for A, which includes cycles and is not lower-triangular. We did not assume a known D, but included it as additional parameters for the model. This resulted in a total of 35 parameters for the model - 10 from D, 10 diagonal elements of A, and 15 off-diagonal elements of A reflecting the 15 non-self edges in the network. The matrix Σ has 55 parameters, so this is well within the parameter counting requirement. This does not guarantee identifiability, but does suggest that it is likely to hold.

To find the MLE for the model, we randomly initialized A and D. The diagonal elements of A were initialized randomly from a Uniform(0,0.9) distribution to reflect the belief that the autoregressive parameter for the effect of any gene’s current expression level on its level at the next time point is positive. The off-diagonal elements were initialized from a Uniform(−0.9, 0.9) distribution. A requirement on D is that each element be less than the variance of the corresponding gene in the equilibrium data, so each element was initialized to be the variance of the corresponding gene times a draw from a Uniform(0.1, 0.9) distribution.
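A minimal Python sketch of this initialization scheme is given below. The function name, the edge list, and the gene variances are hypothetical placeholders; only the sampling ranges come from the description above. An edge (j, i) means gene j regulates gene i, so it fills entry (i, j) of A.

```python
import random


def initialize(edges, variances, rng):
    """Draw random starting values for A and the diagonal of D."""
    p = len(variances)
    A = [[0.0] * p for _ in range(p)]
    for i in range(p):
        A[i][i] = rng.uniform(0.0, 0.9)          # positive self-effect
    for src, dst in edges:
        A[dst][src] = rng.uniform(-0.9, 0.9)     # edge src -> dst
    # Each noise variance starts strictly below the gene's observed variance.
    d = [v * rng.uniform(0.1, 0.9) for v in variances]
    return A, d


rng = random.Random(0)                            # fixed seed for repeatability
edges = [(1, 0), (2, 1), (3, 2), (4, 3), (1, 3)]  # hypothetical 5-gene network
variances = [2.0, 1.5, 3.0, 1.2, 1.9]             # hypothetical sample variances
A0, d0 = initialize(edges, variances, rng)
```

Repeating the draw with different seeds gives the multiple random restarts used to search the model space.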

After A and D were initialized, the log-likelihood of the data was optimized using the optim function in R with the BFGS method. The optimization scheme may get stuck in a local mode since the likelihood is not convex in A and D, so the random initialization and optimization was repeated 1,000 times to provide a better search of the model space. The best A and D were kept as the approximate MLE.

This gave an estimate of the parameters for the true model, but in order to test individual edges, we needed to find the MLE for models adjacent to the true model. These models are defined by taking the original network structure and adding or removing a single edge. When removing edges, we needed to ensure that the connectivity of the graph was not broken. Three of the edges in the original network, if removed, would isolate a single node. This leaves 12 of the original 15 edges available to be tested. Adding an edge does not create problems, and thus we were able to test all 75 extra edges. With the results from the true model and the 87 testable models, we could see which edges the likelihood ratio test identified as significant.
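The set of adjacent models can be enumerated with a short Python sketch. It assumes that "connectivity" means weak connectivity (ignoring edge direction), which is an interpretation rather than something stated explicitly. The function names are hypothetical, and the toy graph is the 5-node cyclic example of Section 3 with zero-based node labels, not the DREAM4 network.

```python
def is_weakly_connected(p, edges):
    """True if the graph on nodes 0..p-1 is connected, ignoring direction."""
    adj = {v: set() for v in range(p)}
    for src, dst in edges:
        adj[src].add(dst)
        adj[dst].add(src)
    seen, stack = {0}, [0]
    while stack:
        v = stack.pop()
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == p


def adjacent_models(p, edges):
    """Edges that can be removed without disconnecting the graph, and
    non-self edges that are absent and can therefore be added."""
    edges = set(edges)
    removable = sorted(e for e in edges if is_weakly_connected(p, edges - {e}))
    addable = sorted((s, t) for s in range(p) for t in range(p)
                     if s != t and (s, t) not in edges)
    return removable, addable


# Edge (j, i) means a non-zero entry a_ij, i.e. gene j regulates gene i.
edges = [(1, 0), (2, 1), (3, 2), (4, 3), (1, 3)]
removable, addable = adjacent_models(5, edges)
```

The same count applied to a 10-node network with 15 non-self edges gives 10 × 9 − 15 = 75 candidate additions, matching the number of extra edges tested above.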

Running the above optimization scheme for each of the 88 models resulted in an approximate MLE for each model. Because these optimizations were run independently for each model and the model space is not necessarily fully searched, this resulted in some mis-ordering of the models. That is, the MLE found for one model resulted in a likelihood that was worse than that of a model in which some of the parameters of the first model are constrained to be zero. This cannot happen for the true MLEs, since the parameters of the smaller model can be used for the larger model, resulting in the same likelihood.

To fix the ordering issue, we performed a second optimization step for each target model. We initialized optim with the parameter values from each of the models. In some cases, this involved throwing out parameter values for extra edges or using zeros from smaller models. The model was then optimized from that starting point. Further, if there were zeros from a smaller model, we re-randomized just the zeros a small number of times to give better search coverage. We iterated over this step until the models were appropriately ordered, which happened within a few iterations.

5.3. Results

We applied the optimization scheme described above to both the initial and final timepoint data from the generated data from GeneNetWeaver. For each model explored, we obtained the best log-likelihood value for comparison. We then compared the true model with each other model, either with one less or one more edge, and computed the likelihood ratio statistic for testing the edge. From the likelihood ratio statistic we computed a p-value by comparing the value to a χ² distribution with one degree of freedom.
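The p-value computation for a single edge can be sketched in Python using only the standard library. For one degree of freedom, the χ² upper tail probability equals erfc(√(λ/2)). The function names and the two log-likelihood values are made up for illustration.

```python
import math


def lrt_statistic(loglik_small, loglik_large):
    """Likelihood ratio statistic for nested models (small inside large)."""
    return 2.0 * (loglik_large - loglik_small)


def chi2_1_pvalue(stat):
    """Upper tail probability of a chi-square with one degree of freedom."""
    return math.erfc(math.sqrt(max(stat, 0.0) / 2.0))


# Hypothetical maximized log-likelihoods without and with the tested edge.
stat = lrt_statistic(loglik_small=-1234.5, loglik_large=-1231.0)
p_value = chi2_1_pvalue(stat)
edge_is_significant = p_value < 0.05
```

Clamping the statistic at zero guards against small negative values that can arise when the larger model's optimization stops slightly short of the smaller model's likelihood.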

Ideally, the p-values corresponding to the 12 testable edges in the true network would be low and the p-values for the 75 edges added to the true model would be high. Figure 10 shows the results for both the first and the last timepoint. Pooling the two timepoints, 7 of the 150 edges added to the true network (circles) were identified as true edges at p = 0.05. This is a Type I error rate of about 5% (7/150 ≈ 4.7%), showing that the test produces approximately the nominal level of false positive results. Five of the 24 true edges which were tested (triangles) were identified as true edges, yielding a power of about 21%, which is quite promising for this area of research.

Figure 10: p-values for each tested edge in the GeneNetWeaver data. Circles correspond to edges added to the true network (false edges) and triangles correspond to edges removed from the true network (true edges). The left panel uses data from the first timepoint and the right panel uses data from the last timepoint. The dashed line corresponds to a p-value of 0.05.

6. Discussion

Steady state gene expression data present a challenge in inferring directional edges. We have presented a model which can be used to test for the existence of edges in such data. We have derived asymptotic properties of estimators and tests for this model in a constrained, but still reasonably large subset of the possible models, and we found that the resulting asymptotic approximations performed well in some finite-sample settings. The derivation of necessary and sufficient conditions for identifiability in the fully general case is a topic for future research.

The identifiability problem is also found in structural equation modeling (SEM) (Bentler and Weeks, 1980). SEM is similar to a VAR model in that it specifies relationships among a set of variables. SEM, however, relates the variables without considering the time element. If X is the collection of random variables observed, then a linear structural equation model with Gaussian noise can be written as

$$X = AX + \varepsilon, \qquad \varepsilon \sim N(0, D).$$

Seeking identifiability conditions has been the subject of a number of papers (Drton et al., 2011; Brito and Pearl, 2012; Drton and Weihs, 2015), and general conditions for this class of models are not known.
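
To make the parallel concrete, the covariance implied by the structural equation model above can be written out. Assuming that $I - A$ is invertible, solving for $X$ gives

$$X = (I - A)^{-1}\varepsilon, \qquad \operatorname{Cov}(X) = (I - A)^{-1} D \,(I - A)^{-\top},$$

so identifiability asks whether $(A, D)$ can be recovered from this covariance matrix. For comparison, and assuming a first-order VAR written as $X_t = A X_{t-1} + \varepsilon_t$ with $\varepsilon_t \sim N(0, D)$ (notation chosen here for illustration, which may differ from that used earlier in the paper), the equilibrium covariance $\Sigma$ satisfies

$$\Sigma = A \Sigma A^{\top} + D.$$

In both cases the parameters enter the observable covariance only through a nonlinear matrix equation, which is why restrictions on the model are needed before they can be determined uniquely.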

We have shown that even in cases where the theory has not been validated, the VAR equilibrium method can still identify true relationships among genes. This is evident in the analysis of the GeneNetWeaver data, where true edges were identified more consistently than extraneous edges.

If the network is partially known, then the likelihood ratio statistic can be used to test for extra edges and thus build up a more comprehensive view of the network. As an example, we may want to learn about a network perturbed with a certain drug. If the unperturbed network is known, we can test edges in and around that network using the perturbation data to learn about the changes induced by the drug. Since we may not expect the entire network to change, the VAR equilibrium method takes advantage of prior knowledge about the network structure.

Applying the VAR equilibrium-based likelihood ratio test method to steady state data without a known network to start from will take further development. One way to go about this would be to take a subset of genes and look at all possible network structures for those genes. For each model, the steady state data would be used to maximize the likelihood. One could then combine the models using a Bayesian model averaging approach. To do this, the integrated likelihood needs to be approximated for each model. As a starting point, the Bayesian Information Criterion (BIC) could be used. The BIC is equal to $-2\log\hat{L} + k\log n$, where $\hat{L}$ is the maximized likelihood for the model, k is equal to 2p plus the number of edges in the network, and n is the number of observations. Averaging over all models would result in a posterior probability for each edge in the network.
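
A minimal sketch of this proposal is given below. It is written in Python for illustration and rests on assumptions not stated in the text: all candidate models are given equal prior probability, the integrated likelihood of each model is approximated by exp(−BIC/2), and the function names and numerical values are hypothetical.

```python
import math


def bic(max_loglik, n_edges, p, n):
    """BIC = -2 log L-hat + k log n, with k = 2p + number of edges."""
    k = 2 * p + n_edges
    return -2.0 * max_loglik + k * math.log(n)


def edge_posteriors(loglik_by_model, p, n):
    """Approximate posterior inclusion probability of each edge.

    `loglik_by_model` maps a frozenset of edges to its maximized
    log-likelihood. Model weights are proportional to exp(-BIC / 2),
    assuming equal prior probabilities across models.
    """
    bics = {m: bic(ll, len(m), p, n) for m, ll in loglik_by_model.items()}
    best = min(bics.values())  # subtract the minimum for numerical stability
    weights = {m: math.exp(-0.5 * (b - best)) for m, b in bics.items()}
    total = sum(weights.values())
    posterior = {}
    for model, w in weights.items():
        for edge in model:
            posterior[edge] = posterior.get(edge, 0.0) + w / total
    return posterior


# Hypothetical example: two genes, 100 observations, models with and without one edge.
edge = ("g1", "g2")
fits = {frozenset(): -300.0, frozenset({edge}): -295.0}
post = edge_posteriors(fits, p=2, n=100)
```

Here the edge improves the log-likelihood by 5 units at the cost of one extra parameter, and its approximate posterior probability is `post[edge]`. Edges that appear in no candidate model are absent from the result and have posterior probability zero.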

Figure 6: Marginal coverage for a₁₁ using the 95% confidence interval from the asymptotic distribution of the MLE.

Figure 7: Marginal coverage for a₂₁ using the 95% confidence interval from the asymptotic distribution of the MLE.

Acknowledgements

This research was supported by NIH grants U54-HL127624, R01-HD054511 and R01-HD070936. Raftery’s research was also partly supported by the Center for Advanced Study in the Behavioral Sciences at Stanford University.

References

Anderson TW (2000). A note on a vector-variate normal distribution and a stationary autoregressive process. Journal of Multivariate Analysis, 72, 149–150.
Basso K, Margolin AA, Stolovitzky G, Klein U, Dalla-Favera R, and Califano A (2005). Reverse engineering of regulatory networks in human B cells. Nature Genetics, 37, 382–390.
Bentler PM and Weeks DG (1980). Linear structural equations with latent variables. Psychometrika, 45, 289–308.
Brito C and Pearl J (2012). Graphical condition for identification in recursive SEM. arXiv preprint arXiv:1206.6821.
Broyden CG (1970). The convergence of a class of double-rank minimization algorithms 1. general considerations. IMA Journal of Applied Mathematics, 6, 76–90.
Drton M and Weihs L (2015). Generic identifiability of linear structural equation models by ancestor decomposition. arXiv preprint arXiv:1504.02992.
Drton M, Foygel R, and Sullivant S (2011). Global identifiability of linear structural equation models. Annals of Statistics, 39, 865–886.
Faith JJ, Hayete B, Thaden JT, Mogno I, Wierzbowski J, Cottarel G, Kasif S, Collins JJ, and Gardner TS (2007). Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles. PLoS Biology, 5, e8.
Ferguson TS (1996). A Course in Large Sample Theory, volume 49. Chapman & Hall London.
Fletcher R (1970). A new approach to variable metric algorithms. The Computer Journal, 13, 317–322.
Friedman J, Hastie T, and Tibshirani R (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33, 1–22.
Goldfarb D (1970). A family of variable-metric methods derived by variational means. Mathematics of Computation, 24, 23–26.
Li WV and Wei A (2012). A Gaussian inequality for expected absolute products. Journal of Theoretical Probability, 25, 92–99.
Lütkepohl H (2005). New Introduction to Multiple Time Series Analysis. Springer Science & Business Media.
Marbach D, Schaffter T, Mattiussi C, and Floreano D (2009). Generating realistic in silico gene networks for performance assessment of reverse engineering methods. Journal of Computational Biology, 16, 229–239.
Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Favera RD, and Califano A (2006). ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics, 7, 1–10.
Meyer PE, Kontos K, Lafitte F, and Bontempi G (2007). Information-theoretic inference of large transcriptional regulatory networks. EURASIP Journal on Bioinformatics and Systems Biology, 2007, 1–9.
Michailidis G and d’Alché Buc F (2013). Autoregressive models for gene regulatory network inference: Sparsity, stability and causality issues. Mathematical Biosciences, 246(2), 326–334.
Omranian N, Eloundou-Mbebi JM, Mueller-Roeber B, and Nikoloski Z (2016). Gene regulatory network inference using fused LASSO on multiple data sets. Scientific Reports, 6.
Schaffter T, Marbach D, and Floreano D (2011). GeneNetWeaver: in silico benchmark generation and performance profiling of network inference methods. Bioinformatics, 27, 2263–2270.
Sheppard K (2013). Financial econometrics notes. URL http://www.kevinsheppard.com/MFE.
Shojaie A, Jauhiainen A, Kallitsis M, and Michailidis G (2014). Inferring regulatory networks by combining perturbation screens and steady state gene expression profiles. PLoS One, 9, article e82393.
Singh N and Vidyasagar M (2016). bLARS: An algorithm to infer gene regulatory networks. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 13, 301–314.
Tong L, Yu S, and Liu R (1992). Identifiability of a set of quadratic equations with unknown coefficients. In Circuits and Systems, 1992. ISCAS’92. Proceedings., 1992 IEEE International Symposium on, volume 1, pages 292–295.
Tusher VG, Tibshirani R, and Chu G (2001). Significance analysis of microarrays applied to the ionizing radiation response. Proceedings of the National Academy of Sciences, 98, 5116–5121.
Yeung KY, Dombek KM, Lo K, Mittler JE, Zhu J, Schadt EE, Bumgarner RE, and Raftery AE (2011). Construction of regulatory networks using expression time-series data of a genotyped population. Proceedings of the National Academy of Sciences, 108, 19436–19441.
Young WC, Raftery AE, and Yeung KY (2014). Fast Bayesian inference for gene regulatory networks using ScanBMA. BMC Systems Biology, 8, 47.
Young WC, Raftery AE, and Yeung KY (2016). A posterior probability approach for gene regulatory network inference in genetic perturbation data. Mathematical Biosciences and Engineering, 13, 1241–1251.
