Direct estimation of differential networks

Sihai Dave Zhao; T Tony Cai; Hongzhe Li

doi:10.1093/biomet/asu009

Author manuscript; available in PMC: 2015 May 26.

Published in final edited form as: Biometrika. 2014 May 12;101(2):253–268. doi: 10.1093/biomet/asu009

PMCID: PMC4443936 NIHMSID: NIHMS685159 PMID: 26023240

Abstract

It is often of interest to understand how the structure of a genetic network differs between two conditions. In this paper, each condition-specific network is modeled using the precision matrix of a multivariate normal random vector, and a method is proposed to directly estimate the difference of the precision matrices. In contrast to other approaches, such as separate or joint estimation of the individual matrices, direct estimation does not require those matrices to be sparse, and thus can allow the individual networks to contain hub nodes. Under the assumption that the true differential network is sparse, the direct estimator is shown to be consistent in support recovery and estimation. It is also shown to outperform existing methods in simulations, and its properties are illustrated on gene expression data from late-stage ovarian cancer patients.

Keywords: Differential network, Graphical model, High dimensionality, Precision matrix

1 Introduction

A complete understanding of the molecular basis of disease will require characterization of the network of interdependencies between genetic components. There are many types of networks that may be considered, such as protein-protein interaction networks or metabolic networks (Emmert-Streib and Dehmer, 2011), but the focus in this paper is on transcriptional regulatory networks. In many cases, interest centers not on a particular network but rather on whether and how the network changes between disease states. Indeed, differential networking analysis has recently emerged as an important complement to differential expression analysis (de la Fuente, 2010; Ideker and Krogan, 2012). For example, Hudson et al. (2009) studied a mutant breed of cattle known to differ from wild-type cattle by a mutation in the myostatin gene. Myostatin was not differentially expressed between the two breeds, but Hudson et al. (2009) showed that a differential network analysis could correctly identify it as the gene containing the causal mutation. In another example, using an experimental technique called differential epistasis mapping, Bandyopadhyay et al. (2010) demonstrated large-scale changes in the genetic networks of yeast cells after perturbation by a DNA-damaging agent.

Transcriptional networks are frequently modeled as Gaussian graphical models (Markowetz and Spang, 2007). Gene expression levels are assumed to be jointly Gaussian, so that two expression levels are conditionally independent given the other genes if and only if the corresponding entry of the precision matrix, or the inverse covariance matrix, is zero. Representing gene expression levels as nodes and conditional dependency relationships as edges in a graph results in a Gaussian graphical model (Lauritzen, 1996). A differential network can be modeled as changes in this graph structure between two conditions.

However, there may be cases where the conditional dependency relationships between pairs of genes change in magnitude but not in structure. For example, two genes may be positively conditionally dependent in one group but negatively conditionally dependent in the other. The supports of the precision matrices of the two groups would be identical, and would not reflect these potentially biologically significant differences in magnitude. Instead, in this paper two genes are defined to be connected in the differential network if the magnitude of their conditional dependency relationship changes between two groups. More precisely, consider independent observations of expression levels of p genes from two groups of subjects: X_i = (X_i₁, . . . , X_ip )^T for i = 1, . . . , n_X from one group and Y_i = (Y_i₁, . . . , Y_ip )^T for i = 1, . . . , n_Y from the other, where X_i ~ N(μ_X, Σ_X) and Y_i ~ N(μ_Y , Σ_Y ). The differential network is defined to be the difference between the two precision matrices, denoted $Δ_{0} = Σ_{Y}^{- 1} - Σ_{X}^{- 1}$ . The entries of Δ₀ can also be interpreted as the differences in the partial covariances of each pair of genes between the two groups. This type of model for a differential network has been adopted by others as well, for example in Li et al. (2007), Danaher et al. (2013), and an unpublished technical report by Städler and Mukherjee (arxiv:1308.2771).

2 Previous approaches

There are currently two main types of approaches to estimating Δ₀. The most straightforward one is to separately estimate $Σ_{X}^{- 1}$ and $Σ_{Y}^{- 1}$ and then to subtract the estimates. A naive estimate of a single precision matrix can be obtained by inverting the sample covariance matrix. However, in most experiments the number of gene expression probes exceeds the number of subjects. In this high-dimensional data setting, the sample covariance matrix is singular and alternative methods are needed to estimate the precision matrix. Theoretical and computational work has shown that estimation is possible under the key assumption that the precision matrix is sparse, meaning that each row and each column has relatively few nonzero entries (Friedman et al., 2008; Ravikumar et al., 2008; Yuan, 2010; Cai et al., 2011).
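The singularity of the sample covariance matrix when p exceeds the number of subjects can be checked numerically. The following is a minimal sketch using NumPy; the dimensions n = 20 and p = 50 and the simulated data are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 50                      # illustrative sizes with p > n (assumed)
X = rng.standard_normal((n, p))    # rows are subjects, columns are genes

# Sample covariance with the 1/n normalisation.
Xc = X - X.mean(axis=0)
S_hat = Xc.T @ Xc / n

# Centring removes one degree of freedom, so rank(S_hat) <= n - 1 < p
# and S_hat cannot be inverted.
rank = np.linalg.matrix_rank(S_hat, tol=1e-10)
print(f"p = {p}, rank of sample covariance = {rank}")
```

Because the rank is at most n − 1, at least p − n + 1 eigenvalues are zero, which is why some form of regularisation or a sparsity assumption is needed.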

The second type of approach is to jointly estimate $Σ_{X}^{- 1}$ and $Σ_{Y}^{- 1}$ , taking advantage of an assumption that they share common features. For example, Chiquet et al. (2011), Guo et al. (2011), and Danaher et al. (2013) penalized the joint log-likelihood of the X_i and Y_i using penalties such as the group lasso (Yuan and Lin, 2006) and group bridge (Huang et al., 2009; Wang et al., 2009), which encourage the estimated precision matrices to have similar supports. Danaher et al. (2013) also introduced the fused graphical lasso, which uses a fused lasso penalty (Tibshirani et al., 2005) to encourage the entries of the estimated precision matrices to have similar magnitudes.

However, most of these approaches assume that both $Σ_{X}^{- 1}$ and $Σ_{Y}^{- 1}$ are sparse, whereas real transcriptional networks often contain hub nodes (Barabási and Oltvai, 2004; Barabási et al., 2011), that is, genes that interact with many other genes. The rows and columns of $Σ_{X}^{- 1}$ and $Σ_{Y}^{- 1}$ corresponding to hub nodes have many nonzero entries and violate the sparsity condition. The method of Danaher et al. (2013) is one exception that does not require individual sparsity. Its estimates ${\hat{Σ}}_{X}^{- 1}$ and ${\hat{Σ}}_{Y}^{- 1}$ minimize

- \sum_{g \in \{X, Y\}} n_{g} \{\log \det Σ_{g}^{- 1} - tr ({\hat{Σ}}_{g} Σ_{g}^{- 1})\} + λ_{1} \sum_{g \in \{X, Y\}} \sum_{j \neq k} |ω_{j k}^{g}| + λ_{2} \sum_{j, k} |ω_{j k}^{X} - ω_{j k}^{Y}|,

(1)

where ${\hat{Σ}}_{X}$ and ${\hat{Σ}}_{Y}$ are sample covariance matrices of the X_i and Y_i, $ω_{j k}^{X}$ and $ω_{j k}^{Y}$ are the (j, k)th entries of $Σ_{X}^{- 1}$ and $Σ_{Y}^{- 1}$ , and det(·) and tr(·) are the determinant and trace of a matrix, respectively. The first term of (1) comes from the joint log-likelihood of the X_i and Y_i, and the second and third terms comprise a fused lasso-type penalty. The parameters λ₁ and λ₂ control the sparsity of the individual precision matrix estimates and the similarities of their entries, respectively, and when λ₁ is set to zero, (1) does not require ${\hat{Σ}}_{X}^{- 1}$ or ${\hat{Σ}}_{Y}^{- 1}$ to be sparse. A referee also pointed out that a recently introduced method (Mohan et al., 2012) was also designed for estimating networks containing hubs. However, theoretical performance guarantees for these methods have not been derived.

The direct estimation method proposed in this paper does not require $Σ_{X}^{- 1}$ and $Σ_{Y}^{- 1}$ to be sparse and does not require separate estimation of these precision matrices. Theoretical performance guarantees are provided for differential network recovery and estimation, and simulations show that when the separate networks include hub nodes, direct estimation is more accurate than fused graphical lasso or separate estimation.

3 Direct estimation of difference of two precision matrices

3.1 Constrained optimization approach

Let |·| denote element-wise norms and let ∥·∥ denote matrix norms. For a p × 1 vector a = (a₁, . . . , a_p)^T, define |a|₀ to be the number of nonzero elements of a, $|a|_{1} = \sum_{j} |a_{j}|$, $|a|_{2} = (\sum_{j} a_{j}^{2})^{1/2}$, and |a|_∞ = max_j |a_j|. For a p × p matrix A with entries a_jk, define |A|₀ to be the number of nonzero entries of A, |A|₁ = Σ_{j,k} |a_jk|, |A|_∞ = max_{j,k} |a_jk|, ∥A∥₁ = max_k Σ_j |a_jk|, ∥A∥_∞ = max_j Σ_k |a_jk|, ∥A∥₂ = sup_{|a|₂≤1} |Aa|₂, and $\|A\|_{F} = (\sum_{j, k} a_{j k}^{2})^{1/2}$.

Let ${\hat{Σ}}_{X} = n_{X}^{- 1} \sum_{i} (X_{i} - \overset{‒}{X}) {(X_{i} - \overset{‒}{X})}^{T}$ , where $\overset{‒}{X} = n_{X}^{- 1} \sum_{i} X_{i}$ , and let ${\hat{Σ}}_{Y}$ be defined similarly. Since the true Δ₀ satisfies Σ_X Δ₀Σ_Y − (Σ_X − Σ_Y) = 0, a sensible estimation procedure would solve ${\hat{Σ}}_{X} Δ {\hat{Σ}}_{Y} - ({\hat{Σ}}_{X} - {\hat{Σ}}_{Y}) = 0$ for Δ. When min(n_X, n_Y ) < p there are an infinite number of solutions, but accurate estimation is still possible when Δ₀ is sparse. Motivated by the constrained $ℓ_{1}$ minimization approach to precision matrix estimation of Cai et al. (2011), one estimator can be obtained by solving

\arg \min {∣ Δ ∣}_{1} subject to {∣ {\hat{Σ}}_{X} Δ {\hat{Σ}}_{Y} - {\hat{Σ}}_{X} + {\hat{Σ}}_{Y} ∣}_{\infty} \leq λ_{n}

and then symmetrizing the solution. This is equivalent to a linear program, as for any three p × p matrices A, B, and C, $vec (A B C) = (C^{T} \otimes A) vec (B)$ , where $\otimes$ denotes the Kronecker product and vec(B) denotes the p² × 1 vector obtained by stacking the columns of B. Therefore Δ₀ could be estimated by solving and then symmetrizing

\arg \min {∣ Δ ∣}_{1} subject to {∣ ({\hat{Σ}}_{Y} \otimes {\hat{Σ}}_{X}) vec (Δ) - vec ({\hat{Σ}}_{X} - {\hat{Σ}}_{Y}) ∣}_{\infty} \leq λ_{n} .

(2)
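Both facts used to arrive at (2), the population identity Σ_X Δ₀ Σ_Y = Σ_X − Σ_Y and the rule vec(ABC) = (C^T ⊗ A) vec(B), can be verified numerically. The sketch below does so with NumPy on randomly generated positive-definite matrices; the helper `random_spd` and the dimension p = 5 are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
p = 5

def random_spd(p, rng):
    """Random symmetric positive-definite matrix (illustrative helper)."""
    A = rng.standard_normal((p, p))
    return A @ A.T + p * np.eye(p)

def vec(M):
    """Stack the columns of M, i.e. column-major (Fortran) order."""
    return M.flatten(order="F")

Sigma_X, Sigma_Y = random_spd(p, rng), random_spd(p, rng)
Delta0 = np.linalg.inv(Sigma_Y) - np.linalg.inv(Sigma_X)

# Population identity: Sigma_X Delta0 Sigma_Y = Sigma_X - Sigma_Y.
lhs = Sigma_X @ Delta0 @ Sigma_Y
identity_gap = np.abs(lhs - (Sigma_X - Sigma_Y)).max()

# vec(A B C) = (C^T kron A) vec(B) with A = Sigma_X, B = Delta0, C = Sigma_Y.
kron_gap = np.abs(np.kron(Sigma_Y.T, Sigma_X) @ vec(Delta0) - vec(lhs)).max()
print(identity_gap, kron_gap)
```

Note that the column-stacking convention matters: with NumPy's default row-major flattening the Kronecker factors would have to be swapped.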

This approach directly estimates the difference matrix without even implicitly estimating the individual precision matrices. The key is that sparsity is assumed for Δ₀ and not for $Σ_{X}^{- 1}$ or $Σ_{Y}^{- 1}$ . Direct estimation thus allows the presence of hub nodes in the individual networks and can still achieve accurate support recovery and estimation in high dimensions, as will be discussed in Section 4. A similar direct estimation approach was also proposed by Cai and Liu (2011) for high-dimensional linear discriminant analysis. Linear discriminant analysis depends on the product of a precision matrix and the difference between two mean vectors, and Cai and Liu (2011) showed that direct estimation of this product is possible even in cases where the precision matrix or the mean difference are not individually estimable.

3.2 A modified problem

The linear program (2) has a p² × p² constraint matrix ${\hat{Σ}}_{Y} \otimes {\hat{Σ}}_{X}$ and can become computationally demanding for large p. A modified procedure can alleviate this burden by requiring the estimate to be symmetric. Denote the (j, k)th entry of a matrix Δ by δ_jk, and define β to be the p(p + 1)/2 × 1 vector with β = (δ_jk)_{1≤j≤k≤p}. Estimating a symmetric Δ is thus equivalent to estimating β, which has only p(p + 1)/2 parameters. Define the p² × p(p + 1)/2 matrix S with columns indexed by 1 ≤ j ≤ k ≤ p and rows indexed by l = 1, . . . , p and m = 1, . . . , p, so that each entry is labeled by S_lm,jk. For j ≤ k, let S_jk,jk = S_kj,jk = 1, and set all other entries of S equal to zero. For example, when p = 3,

S = (\begin{matrix} S_{11, 11} & S_{11, 12} & S_{11, 13} & S_{11, 22} & S_{11, 23} & S_{11, 33} \\ S_{21, 11} & S_{21, 12} & S_{21, 13} & S_{21, 22} & S_{21, 23} & S_{21, 33} \\ S_{31, 11} & S_{31, 12} & S_{31, 13} & S_{31, 22} & S_{31, 23} & S_{31, 33} \\ S_{12, 11} & S_{12, 12} & S_{12, 13} & S_{12, 22} & S_{12, 23} & S_{12, 33} \\ S_{22, 11} & S_{22, 12} & S_{22, 13} & S_{22, 22} & S_{22, 23} & S_{22, 33} \\ S_{32, 11} & S_{32, 12} & S_{32, 13} & S_{32, 22} & S_{32, 23} & S_{32, 33} \\ S_{13, 11} & S_{13, 12} & S_{13, 13} & S_{13, 22} & S_{13, 23} & S_{13, 33} \\ S_{23, 11} & S_{23, 12} & S_{23, 13} & S_{23, 22} & S_{23, 23} & S_{23, 33} \\ S_{33, 11} & S_{33, 12} & S_{33, 13} & S_{33, 22} & S_{33, 23} & S_{33, 33} \end{matrix}) = (\begin{matrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \end{matrix}) .
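The construction of S can be sketched in a few lines of NumPy. The function name `build_S` and the zero-based indexing are choices made here for illustration; the column order (11, 12, 13, 22, 23, 33) and the column-stacking row order follow the p = 3 example above.

```python
import numpy as np

def build_S(p):
    """p^2 x p(p+1)/2 matrix S with vec(Delta) = S @ beta for symmetric Delta."""
    pairs = [(j, k) for j in range(p) for k in range(j, p)]  # columns, j <= k
    S = np.zeros((p * p, len(pairs)))
    for col, (j, k) in enumerate(pairs):
        # vec stacks columns, so entry (l, m) of a p x p matrix is row l + m * p.
        S[j + k * p, col] = 1.0
        S[k + j * p, col] = 1.0
    return S, pairs

S, pairs = build_S(3)

# A symmetric matrix and its upper-triangular parameter vector beta.
Delta = np.array([[1.0, 2.0, 3.0],
                  [2.0, 4.0, 5.0],
                  [3.0, 5.0, 6.0]])
beta = np.array([Delta[j, k] for j, k in pairs])

# S expands beta back into vec(Delta).
print(np.allclose(S @ beta, Delta.flatten(order="F")))
```

Each column for j < k contains two ones and each column for j = k contains one, which is what later makes the diagonal and off-diagonal constraints behave differently.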

When Δ is symmetric some calculation shows that $({\hat{Σ}}_{Y} \otimes {\hat{Σ}}_{X}) vec (Δ) = ({\hat{Σ}}_{Y} \otimes {\hat{Σ}}_{X}) S β$ . Furthermore, if $β_{0} = {(δ_{j k}^{0})}_{1 \leq j \leq k \leq p}$ , where $δ_{j k}^{0}$ is the (j, k)th entry of Δ₀, by Lemma A1 β₀ is the unique solution to $S^{T} (Σ_{Y} \otimes Σ_{X}) S β - S^{T} vec (Σ_{X} - Σ_{Y}) = 0$ . Therefore one reasonable approach to estimate a sparse β₀ in high dimensions is to solve

\hat{β} = \arg \min {∣ β ∣}_{1} subject to {∣ S^{T} ({\hat{Σ}}_{Y} \otimes {\hat{Σ}}_{X}) S β - S^{T} vec ({\hat{Σ}}_{X} - {\hat{Σ}}_{Y}) ∣}_{\infty} \leq λ_{n} .

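However, the inequality constraints can be improved. Let E be the p × p matrix such that $vec (E) = ({\hat{Σ}}_{Y} \otimes {\hat{Σ}}_{X}) S β$ . Multiplying by $S^{T}$ adds together the (j, k)th and (k, j)th entries of $E - ({\hat{Σ}}_{X} - {\hat{Σ}}_{Y})$ for each j < k but leaves each diagonal entry on its own. A common bound λ_n therefore treats the diagonals and off-diagonals of $E - ({\hat{Σ}}_{X} - {\hat{Σ}}_{Y})$ differently, with the diagonals constrained roughly half as tightly as the off-diagonals.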

Therefore the remainder of this paper considers the estimate of Δ₀ obtained by solving

\hat{β} = \arg \min |β|_{1} \quad \text{subject to} \quad |S^{T} \hat{Σ} S β - S^{T} \hat{b}|_{O \infty} \leq λ_{n} \quad \text{and} \quad |S^{T} \hat{Σ} S β - S^{T} \hat{b}|_{D \infty} \leq λ_{n} / 2,

(3)

where $\hat{Σ} = {\hat{Σ}}_{Y} \otimes {\hat{Σ}}_{X}$ , $\hat{b} = vec ({\hat{Σ}}_{X} - {\hat{Σ}}_{Y})$ , and for a p(p+1)/2 × 1 vector c, |c|_O∞ denotes the sup-norm of the entries of c corresponding to the off-diagonal elements of its matrix form, and |c|_D∞ is the sup-norm of the entries corresponding to the diagonal elements. The matrix form of $\hat{β}$ will be denoted by $\hat{Δ}$ . Compared to (2), (3) requires only a p(p + 1)/2 × p(p + 1)/2 constraint matrix, but requires a stronger theoretical condition to guarantee support recovery and estimation consistency, which is discussed in Section 4.
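For small p, problem (3) can be solved directly as a linear program by splitting β into its positive and negative parts. The sketch below uses `scipy.optimize.linprog`; it is not the implementation used in the paper, and the function names `build_S` and `direct_estimate` as well as the toy precision matrices are assumptions for illustration. It is only practical for small p because the Kronecker product is formed explicitly.

```python
import numpy as np
from scipy.optimize import linprog

def build_S(p):
    """Columns indexed by pairs j <= k; rows by column-stacked (l, m)."""
    pairs = [(j, k) for j in range(p) for k in range(j, p)]
    S = np.zeros((p * p, len(pairs)))
    for col, (j, k) in enumerate(pairs):
        S[j + k * p, col] = S[k + j * p, col] = 1.0
    return S, pairs

def direct_estimate(Sig_X, Sig_Y, lam):
    """Solve (3) as a linear program; returns the symmetric estimate of Delta."""
    p = Sig_X.shape[0]
    S, pairs = build_S(p)
    A = S.T @ np.kron(Sig_Y, Sig_X) @ S
    d = S.T @ (Sig_X - Sig_Y).flatten(order="F")
    # Bound lam for off-diagonal constraints, lam / 2 for diagonal ones.
    c = np.array([lam / 2 if j == k else lam for j, k in pairs])
    m = len(pairs)
    # beta = bp - bm with bp, bm >= 0, so |beta|_1 = sum(bp + bm) at the optimum.
    cost = np.ones(2 * m)
    A_ub = np.block([[A, -A], [-A, A]])      # encodes -c <= A beta - d <= c
    b_ub = np.concatenate([c + d, c - d])
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    beta = res.x[:m] - res.x[m:]
    Delta = np.zeros((p, p))
    for b, (j, k) in zip(beta, pairs):
        Delta[j, k] = Delta[k, j] = b
    return Delta

# Sanity check on population covariances, where (3) should return Delta_0.
p = 4
Omega_X = np.eye(p) + 0.3 * (np.eye(p, k=1) + np.eye(p, k=-1))
Delta_0 = np.zeros((p, p))
Delta_0[0, 1] = Delta_0[1, 0] = -0.6         # one edge changes sign
Omega_Y = Omega_X + Delta_0
Delta_hat = direct_estimate(np.linalg.inv(Omega_X), np.linalg.inv(Omega_Y), lam=1e-5)
print(np.round(Delta_hat, 3))
```

With sample covariances in place of the population ones, λ_n would need to be much larger and chosen by a criterion such as the one in Section 3.3.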

3.3 Implementation

The estimator (3) can be computed by slightly modifying code from the R package flare, recently developed by X. Li, T. Zhao, X. Yuan, and H. Liu to implement a variety of high dimensional linear regression and precision matrix estimation methods. Their method uses the alternating direction method of multipliers; for a thorough discussion see Boyd et al. (2011). To apply their algorithm, rewrite (3) as

\hat{β} = \underset{r, β}{\arg \min} f (r) + {∣ β ∣}_{1} subject to r + S^{T} \hat{Σ} S β = S^{T} \hat{b},

where f(r) equals infinity if |r|_O∞ > λ_n or |r|_D∞ > λ_n/2 and zero otherwise. The augmented Lagrangian is then

L_{ρ} (r, β, u) = f (r) + |β|_{1} + u^{T} (S^{T} \hat{b} - r - S^{T} \hat{Σ} S β) + (ρ / 2) \|r + S^{T} \hat{Σ} S β - S^{T} \hat{b}\|_{2}^{2},

where u is the Lagrange multiplier and ρ > 0 is a penalty parameter specified by the user. The alternating direction method of multipliers obtains the solution using the updates

\begin{matrix} r^{t + 1} = & \underset{r}{\arg \min} {‖ u^{t} ∕ ρ + S^{T} \hat{b} - S^{T} \hat{Σ} S β^{t} - r ‖}_{2}^{2} ∕ 2 + f (r) ∕ ρ, \\ β^{t + 1} = & \underset{β}{\arg \min} {‖ u^{t} ∕ ρ - r^{t + 1} + S^{T} \hat{b} - S^{T} \hat{Σ} S β ‖}_{2}^{2} ∕ 2 + {∣ β ∣}_{1} ∕ ρ, \\ u^{t + 1} = & u^{t} + ρ (S^{T} \hat{b} - r^{t + 1} - S^{T} \hat{Σ} S β^{t + 1}), \end{matrix}

for each iteration t. The flare package incorporates several strategies to speed convergence, such as using a closed-form expression for r^{t+1}, using a hybrid coordinate descent and linearization procedure to obtain β^{t+1}, and dynamically adjusting ρ at each iteration.
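The three updates can be sketched as a single function. This is not the flare code: the β-update below is replaced by one proximal gradient (soft-thresholding) step, which is a simplification assumed here, and the names `soft_threshold` and `admm_step` are hypothetical. In the sketch, A stands for $S^{T} \hat{Σ} S$, d for $S^{T} \hat{b}$, and c for the vector of bounds (λ_n for off-diagonal entries, λ_n/2 for diagonal ones).

```python
import numpy as np

def soft_threshold(v, t):
    """Element-wise soft-thresholding, the proximal operator of t * |.|_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def admm_step(A, d, c, beta, u, rho):
    """One pass of the r, beta and u updates for fixed penalty rho."""
    # r-update: f is the indicator of the box |r_i| <= c_i, so the minimiser
    # is the projection of u/rho + d - A beta onto that box (closed form).
    r = np.clip(u / rho + d - A @ beta, -c, c)
    # beta-update: a lasso problem in beta; approximated here by one proximal
    # gradient step with step size 1 / ||A||_2^2 (assumed simplification).
    v = u / rho - r + d
    L = np.linalg.norm(A, 2) ** 2
    beta = soft_threshold(beta + A.T @ (v - A @ beta) / L, 1.0 / (rho * L))
    # u-update: multiplier step on the residual of the constraint r + A beta = d.
    u = u + rho * (d - r - A @ beta)
    return r, beta, u

# Toy call with A equal to the identity, where the beta-step is exact.
A = np.eye(3)
d = np.array([1.0, -2.0, 0.05])
c = np.full(3, 0.1)
r, beta, u = admm_step(A, d, c, beta=np.zeros(3), u=np.zeros(3), rho=1.0)
print(r, beta, u)
```

In practice the step would be iterated until the constraint residual and the change in β are both small, possibly adapting ρ along the way as flare does.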

The direct estimation approach can be tuned using an approximate Akaike information criterion. For the loss functions

L_{\infty} (λ_{n}) = {∣ {\hat{Σ}}_{X} \hat{Δ} (λ_{n}) {\hat{Σ}}_{Y} - {\hat{Σ}}_{X} + {\hat{Σ}}_{Y} ∣}_{\infty}, L_{F} (λ_{n}) = {‖ {\hat{Σ}}_{X} \hat{Δ} (λ_{n}) {\hat{Σ}}_{Y} - {\hat{Σ}}_{X} + {\hat{Σ}}_{Y} ‖}_{F},

(4)

where $\hat{Δ} (λ_{n})$ makes explicit the dependence of the estimator on the tuning parameter, λ_n is chosen to minimize

(n_{X} + n_{Y}) L (λ_{n}) + 2 k,

(5)

where L(λ_n) represents either L_∞ or L_F and k is the effective degrees of freedom, which can be approximated by $k = {∣ \hat{β} ∣}_{0}$ , or the number of nonzero elements in the upper triangle of $\hat{Δ}$ . The loss functions (4) focus on the supremum and Frobenius norms in light of Theorems 2 and 3 in Section 4, but other norms could be used as well.
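As an illustration of how (4) and (5) combine, the following sketch evaluates the criterion for a list of candidate estimates and picks the minimiser. The function names and the toy inputs are assumptions; in practice the candidates would be the estimates $\hat{Δ} (λ_{n})$ computed over a grid of λ_n values.

```python
import numpy as np

def aic_criterion(Delta_hat, S_X, S_Y, n_X, n_Y, loss="inf"):
    """Criterion (5) for one candidate estimate Delta_hat(lambda_n)."""
    R = S_X @ Delta_hat @ S_Y - S_X + S_Y
    if loss == "inf":
        L = np.abs(R).max()               # L_infinity in (4)
    elif loss == "F":
        L = np.linalg.norm(R, "fro")      # L_F in (4)
    else:
        raise ValueError("loss must be 'inf' or 'F'")
    # k: nonzero entries in the upper triangle, diagonal included.
    k = np.count_nonzero(np.triu(Delta_hat))
    return (n_X + n_Y) * L + 2 * k

def select_by_aic(candidates, S_X, S_Y, n_X, n_Y, loss="inf"):
    """Index of the candidate minimising the criterion, plus all scores."""
    scores = [aic_criterion(D, S_X, S_Y, n_X, n_Y, loss) for D in candidates]
    return int(np.argmin(scores)), scores

# Toy example: with S_X = I and S_Y = 2I the exact difference is -0.5 * I.
S_X, S_Y = np.eye(2), 2 * np.eye(2)
candidates = [np.zeros((2, 2)), -0.5 * np.eye(2)]
best, scores = select_by_aic(candidates, S_X, S_Y, n_X=10, n_Y=10)
print(best, scores)
```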

4 Theoretical properties

Let $σ_{j k}^{X}$ and $σ_{j k}^{Y}$ be the (j, k)th entries of Σ_X and Σ_Y , respectively. Define $σ_{\max}^{X} = \max_{j} σ_{j j}^{X}$ and $σ_{\max}^{Y} = \max_{j} σ_{j j}^{Y}$ . Good performance of direct estimation requires the following conditions.

Condition 1

The true difference matrix Δ₀ has s < p nonzero entries in its upper triangle, and |Δ₀|₁ ≤ M, where M does not depend on p.

Condition 2

With s defined as in Condition 1, the constants $μ_{X} = \max_{j \neq k} |σ_{j k}^{X}|$ and $μ_{Y} = \max_{j \neq k} |σ_{j k}^{Y}|$ must satisfy $μ = 4 \max (μ_{X} σ_{\max}^{Y}, μ_{Y} σ_{\max}^{X}) \leq σ_{\min}^{S} (2 s)^{- 1}$ , where $σ_{\min}^{S} = \min_{j, k} (σ_{j j}^{Y} σ_{j j}^{X}, σ_{k k}^{Y} σ_{j j}^{X} + 2 σ_{k j}^{Y} σ_{j k}^{X} + σ_{j j}^{Y} σ_{k k}^{X})$ .

Condition 1 requires the difference matrix to have essentially constant sparsity, which is reasonable because genetic networks are not expected to differ much between two conditions. Condition 2 requires that the true covariances between the covariates are not too high, and can hold even when $Σ_{X}^{- 1}$ and $Σ_{Y}^{- 1}$ are not sparse. In fact, it is sufficient to require only that the magnitude of the largest off-diagonal entry of $S^{T} (Σ_{Y} \otimes Σ_{X}) S$ be less than $σ_{\min}^{S} / (2 s)$ , but Condition 2 is more interpretable.

Condition 2 is closely related to the mutual incoherence property introduced by Donoho and Huo (2001), but is more complicated in the current setting because it involves a linear function of the Kronecker product of two covariance matrices. Solving (2) instead of (3) would require only $\max (μ_{X} σ_{\max}^{Y}, μ_{Y} σ_{\max}^{X}) \leq \min_{j, k} (σ_{j j}^{X} σ_{k k}^{Y}) {(2 \tilde{s})}^{- 1}$ , with s̃ equal to the total number of nonzero entries of Δ₀. If in addition $σ_{j j}^{X} = σ_{j j}^{Y} = 1$ for all j, max(μ_X, μ_Y) ≤ (2s̃)⁻¹ would be required, which is similar to imposing the usual mutual incoherence condition on Σ_X and Σ_Y . Condition 2 is more restrictive, but (3) is easy to compute and still gives good finite-sample results.
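For given covariance matrices, the quantities in Condition 2 can be computed directly. In the sketch below, $σ_{\min}^{S}$ is taken to be the smallest diagonal entry of $S^{T} (Σ_{Y} \otimes Σ_{X}) S$; this reading is an assumption suggested by the sufficient condition stated after Condition 2, and the function names are hypothetical.

```python
import numpy as np

def build_S(p):
    """Columns indexed by pairs j <= k; rows by column-stacked (l, m)."""
    pairs = [(j, k) for j in range(p) for k in range(j, p)]
    S = np.zeros((p * p, len(pairs)))
    for col, (j, k) in enumerate(pairs):
        S[j + k * p, col] = S[k + j * p, col] = 1.0
    return S

def incoherence_quantities(Sigma_X, Sigma_Y):
    """Return mu, sigma_min_S and the largest off-diagonal entry of S^T K S."""
    p = Sigma_X.shape[0]
    off = ~np.eye(p, dtype=bool)
    mu_X = np.abs(Sigma_X[off]).max()
    mu_Y = np.abs(Sigma_Y[off]).max()
    mu = 4 * max(mu_X * np.diag(Sigma_Y).max(), mu_Y * np.diag(Sigma_X).max())
    S = build_S(p)
    G = S.T @ np.kron(Sigma_Y, Sigma_X) @ S
    sigma_min_S = np.diag(G).min()            # assumed reading of sigma^S_min
    max_offdiag = np.abs(G - np.diag(np.diag(G))).max()
    return mu, sigma_min_S, max_offdiag

def condition2_holds(Sigma_X, Sigma_Y, s):
    """Check mu <= sigma_min_S / (2 s) for a given sparsity level s."""
    mu, sigma_min_S, _ = incoherence_quantities(Sigma_X, Sigma_Y)
    return bool(mu <= sigma_min_S / (2 * s))

# Uncorrelated variables satisfy the condition; strongly correlated ones do not.
I3 = np.eye(3)
strong = np.array([[1.0, 0.9], [0.9, 1.0]])
print(condition2_holds(I3, I3, s=2), condition2_holds(strong, strong, s=1))
```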

Under these conditions, a thresholded version of the direct estimator $\hat{Δ}$ can successfully recover the support of Δ₀. Let the (j, k)th entries of Δ₀ and $\hat{Δ}$ be $δ_{j k}^{0}$ and ${\hat{δ}}_{j k}$ , respectively. For a threshold τ_n > 0 define the estimator

{\hat{Δ}}_{τ_{n}} = \{{\hat{δ}}_{j k} I (|{\hat{δ}}_{j k}| > τ_{n})\} .

Let the (j, k)th entry of ${\hat{Δ}}_{τ_{n}}$ be ${\hat{δ}}_{j k}^{τ_{n}}$ , and define the function

sgn (t) = \begin{cases} 1, & t > 0, \\ 0, & t = 0, \\ - 1, & t < 0 . \end{cases}

Then if $M ({\hat{Δ}}_{τ_{n}}) = \{sgn ({\hat{δ}}_{j k}^{τ_{n}}) : j = 1, \dots, p, k = 1, \dots, p\}$ and $M (Δ_{0}) = \{sgn (δ_{j k}^{0}) : j = 1, \dots, p, k = 1, \dots, p\}$ are vectors of the signs of the entries of the estimated and true difference matrices, respectively, the following theorem holds.

Theorem 1

Suppose Conditions 1 and 2 hold, and let $σ_{\min}^{S}$ and μ be defined as in Condition 2. If min(n_X, n_Y ) > log p,

τ_{n} \geq \frac{1}{σ_{\min}^{S}} \left\{1 + \frac{σ_{\min}^{S}}{σ_{\min}^{S} - (2 s - 1) μ}\right\} \{M (|σ_{l^{'} l}^{X}| + |σ_{m^{'} m}^{Y}| + C) + 1\} 8 C \left\{\frac{\log p}{\min (n_{X}, n_{Y})}\right\}^{1 / 2},

and $\min_{j, k : δ_{j k}^{0} \neq 0} |δ_{j k}^{0}| > 2 τ_{n}$ , then $M ({\hat{Δ}}_{τ_{n}}) = M (Δ_{0})$ with probability at least 1 − 8p^{−τ}, where C is defined in Lemma A2 in the Appendix.

Theorem 1 states that with high probability, ${\hat{Δ}}_{τ_{n}}$ can recover not only the support of Δ₀ but also the signs of its nonzero entries, as long as those entries are sufficiently large. In other words, in the context of genetic networks, ${\hat{Δ}}_{τ_{n}}$ can correctly identify genes whose conditional dependencies change in magnitude between two conditions, as well as the directions of those changes, as long as min(n_X, n_Y ) is large relative to log p. In practice the threshold τ_n can be treated as a tuning parameter. In simulations and data analysis τ_n was set to 0·0001.
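The thresholding step and the sign comparison in Theorem 1 amount to a few array operations. The sketch below uses the threshold 0·0001 from the text as a default; the function names and the toy matrices are illustrative assumptions.

```python
import numpy as np

def threshold(Delta_hat, tau):
    """Keep entries with |delta_jk| > tau and set the rest to zero."""
    return np.where(np.abs(Delta_hat) > tau, Delta_hat, 0.0)

def signs_recovered(Delta_hat, Delta0, tau=1e-4):
    """True if the thresholded estimate matches Delta0 in sign, entry by entry."""
    return np.array_equal(np.sign(threshold(Delta_hat, tau)), np.sign(Delta0))

# Toy example: small noisy entries are removed, large entries keep their sign.
Delta0 = np.array([[0.0, 0.5],
                   [0.5, 0.0]])
Delta_hat = np.array([[5e-5, 0.4],
                      [0.4, -2e-5]])
print(signs_recovered(Delta_hat, Delta0))     # signs agree after thresholding
print(signs_recovered(-Delta_hat, Delta0))    # flipped signs do not
```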

A thresholding step is natural in practice because small entries of $\hat{Δ}$ are most likely noisy estimates of zero. This step could be avoided by imposing an irrepresentability condition on $Σ_{X} \otimes Σ_{Y}$ , similar to those assumed in the proofs of the selection consistencies of the lasso (Meinshausen and Bühlmann, 2006; Zhao and Yu, 2006) and the graphical lasso (Ravikumar et al., 2008). However, these types of conditions are stronger than the mutual incoherence-type property assumed in Condition 2, as discussed in Lounici (2008). The thresholded estimators are pursued in this paper because of their milder theoretical requirements.

In addition to identifying the entries of $Σ_{X}^{- 1}$ and $Σ_{Y}^{- 1}$ that change, $\hat{Δ}$ can correctly quantify these changes, in the sense of being consistent for Δ₀ in the Frobenius norm.

Theorem 2

Suppose Conditions 1 and 2 hold and define $σ_{\min}^{S}$ and μ as in Condition 2. If min(n_X, n_Y ) > log p and

λ_{n} = \{M (|σ_{l^{'} l}^{X}| + |σ_{m^{'} m}^{Y}| + C) + 1\} 4 C \left\{\frac{\log p}{\min (n_{X}, n_{Y})}\right\}^{1 / 2},

then

\|\hat{Δ} - Δ_{0}\|_{F} \leq \frac{(5 s)^{1 / 2}}{σ_{\min}^{S}} \left\{1 + \frac{σ_{\min}^{S}}{σ_{\min}^{S} - (2 s - 1) μ}\right\} 2 λ_{n}

with probability at least 1 − 8p^{−τ}, where C is defined in Lemma A2.

The proofs of Theorems 1 and 2 rely on the following bound on the element-wise $ℓ_{\infty}$ norm of the estimation error.

Theorem 3

Suppose Conditions 1 and 2 hold, and define $σ_{\min}^{S}$ and μ as in Condition 2. If min(n_X, n_Y ) > log p and

λ_{n} = \{M (|σ_{l^{'} l}^{X}| + |σ_{m^{'} m}^{Y}| + C) + 1\} 4 C \left\{\frac{\log p}{\min (n_{X}, n_{Y})}\right\}^{1 / 2},

then

|\hat{Δ} - Δ_{0}|_{\infty} \leq \frac{1}{σ_{\min}^{S}} \left\{1 + \frac{σ_{\min}^{S}}{σ_{\min}^{S} - (2 s - 1) μ}\right\} 2 λ_{n}

with probability at least 1 − 8p^{−τ}, where C is defined in Lemma A2 in the Appendix.

Similar theoretical properties have been derived for separate and joint approaches to estimating differential networks (Cai et al., 2011; Guo et al., 2011). However, these require sparsity conditions on each Σ⁻¹, such as ∥Σ⁻¹∥_∞ ≤ M′ < ∞, which can be violated if the individual networks contain hub nodes. In contrast, Theorems 1–3 can still hold in the presence of hubs.

5 Simulations

5.1 Settings

Simulations were conducted to compare direct estimation (3), fused graphical lasso (1), and separate estimation using the procedure of Cai et al. (2011). Data were generated with p = 40, 60, 90, and 120 and X₁, . . . , X_nX and Y₁, . . . , Y_nY were generated from N(0, Σ_X) and N(0, Σ_Y ), respectively, with n_X = n_Y = 100.

For each p, the support of $Σ_{X}^{- 1}$ was first generated according to a network with p(p − 1)/10 edges and a power law degree distribution with an expected power parameter of 2, which should mimic real-world networks (Newman, 2003). This still gives a relatively sparse network, since only 20% of all possible edges are present, but the power law structure creates hub nodes, which make certain rows and columns nonsparse.

The value of each nonzero entry of $Σ_{X}^{- 1}$ was next generated from a uniform distribution with support [−0·5, −0·2] ∪ [0·2, 0·5]. To ensure positive-definiteness, each row was divided by two when p = 40, three when p = 60, four when p = 90, and five when p = 120. The diagonals were then set equal to one and the matrix was symmetrized by averaging it with its transpose. The differential network Δ₀ was generated such that the largest 20%, by magnitude, of the connections of the top two hub nodes of $Σ_{X}^{- 1}$ changed sign between $Σ_{X}^{- 1}$ and $Σ_{Y}^{- 1}$ . In other words, Δ₀ was a sparse matrix, with zero entries everywhere except for 20% of the entries in two rows and columns.

Each method was tuned using an approximate Akaike information criterion. Direct estimation was tuned using (5) and one of the loss functions in (4). For a fair comparison fused graphical lasso was tuned in the same way, after searching across all combinations of three values of λ₁ and 10 values of λ₂. Small values of λ₁ were used because the true precision matrices were nonsparse. Separate estimation was tuned by searching across 10 different values of the tuning parameter to minimize ${AIC}_{X} = n_{X} tr ({\hat{Σ}}_{X} {\hat{Ω}}_{X}) - n_{X} \log \det ({\hat{Ω}}_{X}) + 2 {∣ {\hat{Ω}}_{X} ∣}_{0}$ , where ${\hat{Σ}}_{X}$ was the sample covariance matrix of the X_i and ${\hat{Ω}}_{X}$ was the estimated precision matrix. The same was done for the Y_i, with AIC_Y defined similarly. Results were averaged over 250 replications.

5.2 Results

Figure 1 illustrates the receiver operating characteristic curves of the three estimation methods. Let ${\hat{δ}}_{j k}$ be the (j, k)th entry of a given estimator $\hat{Δ}$ and let $δ_{j k}^{0}$ be the (j, k)th entry of the true Δ₀. The true positive and negative rates of $\hat{Δ}$ were defined as

TP = \frac{\sum_{j k} I ({\hat{δ}}_{j k} \neq 0 and δ_{j k}^{0} \neq 0)}{\sum_{j k} I (δ_{j k}^{0} \neq 0)}, TN = \frac{\sum_{j k} I ({\hat{δ}}_{j k} = 0 and δ_{j k}^{0} = 0)}{\sum_{j k} I (δ_{j k}^{0} = 0)},

respectively. Different points on the curves correspond to different tuning parameter values. The curves for the fused graphical lasso estimator (1) were plotted by varying λ₂. The λ₁ parameter, which controls the sparsity of the individual precision matrix estimates, was fixed at a small value because the individual matrices were not sparse. For a fair comparison with the thresholded direct estimator, the fused graphical lasso was thresholded at 0·0001. The separate estimator performed poorly and its curves were not plotted. Figure 1 shows that direct estimation compared favorably to fused graphical lasso.

Figure 1. Receiver operating characteristic curves for support recovery of $Δ_{0} = Σ_{Y}^{- 1} - Σ_{X}^{- 1}$ ; solid line: thresholded direct estimator; dashed line: thresholded fused graphical lasso estimator with λ₁ = 0; dotted line: fused graphical lasso estimator with λ₁ = 0·1

The true discovery and nondiscovery rates of the three estimators were studied as well, and were defined as

TD = \frac{\sum_{j k} I ({\hat{δ}}_{j k} \neq 0 and δ_{j k}^{0} \neq 0)}{\sum_{j k} I ({\hat{δ}}_{j k} \neq 0)}, TND = \frac{\sum_{j k} I ({\hat{δ}}_{j k} = 0 and δ_{j k}^{0} = 0)}{\sum_{j k} I ({\hat{δ}}_{j k} = 0)},

respectively. These rates were taken to be zero when their denominators were equal to zero. In the analysis of genomic data minimizing the number of false discoveries is a major concern. The direct and fused graphical lasso estimators were thresholded at 0·0001 and the separate estimator was thresholded at 0·0002. In all settings, the true nondiscovery rates of the direct and fused graphical lasso estimators were close to 100%; separate estimation frequently did not identify any zero entries in the differential network. The true discovery rates are reported in Table 1, which also compares the effects of tuning using different loss functions (4). For direct estimation, tuning using L_∞ gave the best true discovery rates for smaller p while L_F was preferable for larger p. For fused graphical lasso, L_∞ was always the better choice. Using either L_∞ or L_F , direct estimation performed well compared to fused graphical lasso and separate estimation, especially for larger p.
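The four rates defined in this section can be computed from the supports of an estimate and the truth as follows. The function name is hypothetical, and applying the zero-denominator convention to TP and TN as well as to TD and TND is an assumption made here for simplicity.

```python
import numpy as np

def support_rates(Delta_hat, Delta0):
    """True positive/negative and true discovery/nondiscovery rates."""
    est, true = Delta_hat != 0, Delta0 != 0

    def ratio(num, den):
        # Rates are taken to be zero when the denominator is zero.
        return float(num) / float(den) if den > 0 else 0.0

    hits = np.sum(est & true)            # estimated nonzero and truly nonzero
    correct_zeros = np.sum(~est & ~true) # estimated zero and truly zero
    return {
        "TP": ratio(hits, np.sum(true)),
        "TN": ratio(correct_zeros, np.sum(~true)),
        "TD": ratio(hits, np.sum(est)),
        "TND": ratio(correct_zeros, np.sum(~est)),
    }

# Toy example with one correct discovery, one false discovery and one miss.
Delta0 = np.array([[0.0, 1.0],
                   [1.0, 0.0]])
Delta_hat = np.array([[0.0, 2.0],
                      [0.0, 3.0]])
rates = support_rates(Delta_hat, Delta0)
print(rates)
```

The estimate passed to this function would normally be the thresholded one, since unthresholded estimates rarely contain exact zeros.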

Table 1.

Average true discovery rates (%) over 250 simulations. Standard errors are in parentheses

p	${\hat{Δ}}_{τ_{n}}$, L_∞	${\hat{Δ}}_{τ_{n}}$, L_F	${\hat{Δ}}_{F G L τ_{n}}$, L_∞	${\hat{Δ}}_{F G L τ_{n}}$, L_F	${\hat{Δ}}_{S τ_{n}}$
40	77(16)	29(16)	83(12)	27(17)	2(0)
60	76(19)	66(21)	74(19)	65(30)	1(0)
90	66(25)	80(31)	55(26)	12(28)	1(0)
120	48(39)	61(45)	33(25)	1(7)	1(0)

The direct and fused graphical lasso estimators were tuned using (5) and either L_∞ or L_F from (4); ${\hat{Δ}}_{τ_{n}}$ : thresholded direct estimator; ${\hat{Δ}}_{F G L τ_{n}}$ : thresholded fused graphical lasso estimator; ${\hat{Δ}}_{S τ_{n}}$ : thresholded separate estimator.

The Frobenius norm estimation accuracies of the unthresholded estimators tuned using different loss functions are reported in Table 2. The results of tuning using L_∞ or L_F were comparable. Direct estimation was much more accurate than separate estimation and slightly more accurate than fused graphical lasso. It is possible for direct estimation to simultaneously give markedly better support recovery but similar estimation compared to fused graphical lasso because estimation error depends on the magnitudes of the estimated entries, while support recovery depends only on whether they are nonzero. For example, suppose $\hat{Δ}$ had the same support as the true Δ₀, but each nonzero entry had magnitude 0·01. Then under the p = 120 simulation setting, $\|\hat{Δ} - Δ_{0}\|_{F} = 1{\cdot}56$ . The estimation error of this $\hat{Δ}$ is even higher than those in Table 2, but it exactly recovers the true support.

Table 2.

Average estimation errors in Frobenius norm over 250 simulations. Standard errors are in parentheses

p	$\hat{Δ}$, L_∞	$\hat{Δ}$, L_F	${\hat{Δ}}_{F G L}$, L_∞	${\hat{Δ}}_{F G L}$, L_F	${\hat{Δ}}_{S}$
40	1·67(0·13)	1·46(0·22)	1·72(0·06)	1·50(0·15)	12·96(0·78)
60	1·68(0·07)	1·62(0·11)	1·68(0·05)	1·69(0·06)	28·45(2·01)
90	1·68(0·04)	1·71(0·03)	1·68(0·03)	1·72(0·03)	57·30(2·75)
120	1·55(0·02)	1·55(0·01)	1·54(0·02)	1·55(0·00)	102·84(4·06)

The direct and fused graphical lasso estimates were tuned using (5) with either L_∞ or L_F from (4); $\hat{Δ}$ : direct estimator; ${\hat{Δ}}_{F G L}$ : fused graphical lasso estimator; ${\hat{Δ}}_{S}$ : separate estimator.

The good performance of direct estimation came at the price of some computational convenience. The memory required by the large constraint matrix behaves like O(p⁴), though the simulations generally needed no more than one gigabyte of memory when p = 120. In an unpublished technical report, Hong and Luo (arxiv:1208.3922) proved the global linear convergence of the alternating direction method of multipliers applied to problems like (3). However, each iteration of the proposed algorithm requires roughly O(sp⁴) computations, where s is the number of nonzero entries in the upper triangle of Δ₀, as defined in Condition 1. The simulations required on average 51, 853, 6231, and 51589 seconds when p = 40, 60, 90, and 120, respectively. On the other hand, these memory and time requirements are still reasonable in practice. Recently, H. Pang, H. Liu, and R. Vanderbei have developed an even faster algorithm for constrained $ℓ_{1}$ -minimization problems such as (3), available in the R package fastclime, which should reduce the computational burden of direct estimation.

6 Gene expression study of ovarian cancer

The proposed approach was applied to gene expression data collected from patients with stage III or IV ovarian cancer. Using these data, Tothill et al. (2008) identified six molecular subtypes of ovarian cancer, which they labeled C1 through C6. They found that the C1 subtype, characterized by differential expression of genes associated with stromal and immune cell types, was associated with much shorter survival times.

The proposed direct estimation procedure was applied to investigate whether this poor prognosis sub-type was also associated with differential wiring of genetic networks. The subjects were divided into a C1 group, with 78 patients, and a C2–C6 group, with 113 patients. Several pathways from the KEGG pathway database (Ogata et al., 1999; Kanehisa et al., 2012) were studied to determine if any differences in the conditional dependency relationships of the gene expression levels existed between the subtypes. All probesets corresponding to the same gene symbol were first averaged to get gene-level expression measurements.

Direct estimation and fused graphical lasso were tuned using (5) with the loss function L_∞ because in simulations this gave the best results for fused graphical lasso and good results for direct estimation. Separate estimation was tuned as described in Section 3.3. The direct and fused graphical lasso estimators were thresholded at 0·0001 to recover the differential network. The separate estimator was not simply thresholded at 0·0002 because simulations showed that this method gave poor true discovery rates. Instead, two genes were defined as being linked in the differential network if they were connected in one group but not the other, or if they were connected in both groups but their conditional dependency relationship changed sign. The procedure of Cai et al. (2011), thresholded at 0·0001, was used to recover the individual networks.
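The linking rule used for the separate estimator can be sketched as follows. This is a minimal illustration, assuming two estimated precision matrices are available as NumPy arrays and that entries with absolute value at most the threshold are treated as zero; the function names are hypothetical.

```python
import numpy as np


def threshold(mat, tau=1e-4):
    """Zero out entries whose absolute value is at most tau."""
    out = np.array(mat, dtype=float)
    out[np.abs(out) <= tau] = 0.0
    return out


def differential_edges_from_separate(omega_a, omega_b, tau=1e-4):
    """Boolean matrix of differential edges from two precision estimates.

    A pair of genes is linked if it is connected in exactly one group,
    or connected in both groups with opposite signs.
    """
    a, b = threshold(omega_a, tau), threshold(omega_b, tau)
    in_a, in_b = a != 0, b != 0
    one_only = in_a ^ in_b
    sign_flip = in_a & in_b & (np.sign(a) != np.sign(b))
    edges = one_only | sign_flip
    np.fill_diagonal(edges, False)  # diagonal entries are not edges
    return edges


omega_a = np.array([[1.0, 0.5, 0.0], [0.5, 1.0, 0.3], [0.0, 0.3, 1.0]])
omega_b = np.array([[1.0, -0.5, 0.2], [-0.5, 1.0, 0.3], [0.2, 0.3, 1.0]])
edges = differential_edges_from_separate(omega_a, omega_b)
print(edges.astype(int))
```

In this toy example the pair (0, 1) is linked because its sign changes, the pair (0, 2) is linked because it is connected in only one group, and the pair (1, 2) is not linked.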

Two illustrative examples are reported in Fig. 2. Only genes included in at least one edge, or which saw a change in partial variance between the two subtypes, were included in the figures. In the results of separate estimation, only genes in the differential network estimated by direct estimation were labeled. To interpret the results, the most highly connected genes in the differential networks were considered to be important.

Fig. 2. Estimates of the differential networks between ovarian cancer subtypes. The direct and fused graphical lasso estimators were thresholded and the separate estimator was further sparsified; see text. Black edges show increase in conditional dependency from ovarian cancer subtype C1 to subtypes C2–C6, gray edges show decrease. (a)–(c): KEGG 04350, TGF-β pathway, (d)–(f): KEGG 04210, Apoptosis pathway. (a)–(e) tuned using L_∞ with (5), (f) tuned with λ₁ = 0 and λ₂ to give the same number of edges as (d); see text. Separate estimator not shown for apoptosis pathway.

Figures 2(a)–2(c) illustrate estimates of the differential network of the TGF-β signaling pathway, which at 82 genes is larger than the sample size of the C1 group. Direct estimation suggested the presence of two hub genes, COMP and THBS2, which have both been found to be related to resistance to platinum-based chemotherapy in epithelial ovarian cancer (Marchini et al., 2013). Fused graphical lasso gave the same number of edges in the differential network as direct estimation and only suggested the importance of COMP. It was hard to draw meaningful conclusions from the results of separate estimation because the denseness of the estimated network made it difficult to identify a small number of important genes.

Figures 2(d)–2(f) give the estimates for the apoptosis pathway, which at 87 genes was also larger than the sample size of the C1 group. The separate estimator again resulted in a dense network and is not included in Fig. 2. Direct estimation pointed to BIRC3 and TNFSF10 as being important genes. Indeed, TNFSF10 encodes the TRAIL protein, which has been studied a great deal because of its potential as an anticancer drug (Yagita et al., 2004; Bellail et al., 2009), and in particular as a therapy for ovarian cancer (Petrucci et al., 2012; Kipps et al., 2013). BIRC3 can inhibit TRAIL-induced apoptosis (Johnstone et al., 2008) and has also been considered for use as a therapeutic target in cancer (Vucic and Fairbrother, 2007). Figure 2(e) shows the fused graphical lasso estimator tuned in the same way as the direct estimator, and suggests only BIRC3 as being important. For a fairer comparison with direct estimation, Fig. 2(f) depicts the fused graphical lasso estimator after fixing λ₁ = 0 and adjusting λ₂ to achieve the same level of sparsity as Fig. 2(d). The result is similar to Fig. 2(d), though it suggests that BIRC3 and PRKAR2B, a protein kinase, are important, rather than BIRC3 and TNFSF10, the gene encoding TRAIL.

7 Discussion

Instead of modeling a differential network as the difference of two precision matrices, as proposed above, another possibility is to use the difference between two directed acyclic graphs. These graphs are natural models for single transcriptional regulatory networks, with nodes representing gene expression levels and edges indicating how the nodes are causally related to each other. Biological changes to a network can be thought of as interventions on some of its nodes, which results in changes to the graphical structure; see the unpublished technical report by Hauser and Bühlmann (arxiv:1303.3216) and references therein. Frequently, however, only observational data are available for gene expression, so it is difficult to estimate the underlying causal structures (Kalisch and Bühlmann, 2007; Maathuis et al., 2009). It would be interesting to develop a direct estimation method for differential networks with interventional data.

For method (3) to have good properties in high dimensions, Δ₀ must be sparse. While reasonable, this assumption will be violated if the biological differences between two groups manifest as global changes that affect a large number of gene-gene dependencies. If some proportion of these global changes are of sufficient magnitude, the method should still be able to detect their presence, though it may not recover all of the changes or accurately estimate their magnitudes. The most challenging case for the proposed method occurs when the network changes are numerous but small. A new statistic could be defined to quantify the degree of global change between two precision matrices, but so far there is little consensus as to what statistic might be most biologically meaningful.

Finally, while the focus has been on directly estimating the difference between two precision matrices, there are situations where interest may center on how a transcriptional regulatory network differs between K conditions, where K > 2. The proposed method could of course be used to estimate all pairwise differential networks, but this could be time-consuming. Another possibility would be to estimate the difference between each precision matrix and some common precision matrix, which could be taken to be the inverse of the pooled covariance matrix of all K groups. In other words, if ${\hat{Σ}}_{k}$ were the sample covariance matrix of the kth group, let ${\hat{Σ}}_{P} = \sum_{k} w_{k} {\hat{Σ}}_{k}$ be some weighted average of the ${\hat{Σ}}_{k}$ and consider solving

{\hat{Δ}}_{k} = \arg \min {∣ Δ ∣}_{1} subject to {∣ {\hat{Σ}}_{k} Δ {\hat{Σ}}_{P} - {\hat{Σ}}_{k} + {\hat{Σ}}_{P} ∣}_{\infty} \leq λ_{n} .

If the difference matrix $Δ_{k} = Σ_{P}^{- 1} - Σ_{k}^{- 1}$ were sparse, where $Σ_{P} = E ({\hat{Σ}}_{P})$ and $Σ_{k}^{- 1}$ is the precision matrix of the kth group, ${\hat{Δ}}_{k}$ would be a direct estimate of Δ_k. The differential network between the jth and kth group could then be estimated as ${\hat{Δ}}_{j} - {\hat{Δ}}_{k}$ .
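A population-level check of this construction can be sketched numerically. The code below replaces the sample covariance matrices by randomly generated positive-definite matrices, uses hypothetical equal weights w_k, and verifies that $Δ_{k} = Σ_{P}^{- 1} - Σ_{k}^{- 1}$ makes the constraint residual vanish, and that the difference of two such matrices equals a difference of group precision matrices. It does not solve the optimization problem itself.

```python
import numpy as np

rng = np.random.default_rng(0)
p, K = 5, 3


def random_spd(p, rng):
    """Random symmetric positive-definite matrix (illustrative only)."""
    a = rng.standard_normal((p, p))
    return a @ a.T + p * np.eye(p)


sigmas = [random_spd(p, rng) for _ in range(K)]
weights = np.full(K, 1.0 / K)  # hypothetical equal weights w_k
sigma_p = sum(w * s for w, s in zip(weights, sigmas))

# Delta_k = Sigma_P^{-1} - Sigma_k^{-1}
deltas = [np.linalg.inv(sigma_p) - np.linalg.inv(s) for s in sigmas]

# Constraint residual Sigma_k Delta_k Sigma_P - Sigma_k + Sigma_P
residuals = [np.max(np.abs(s @ d @ sigma_p - s + sigma_p))
             for s, d in zip(sigmas, deltas)]
print(residuals)

# Delta_j - Delta_k equals Sigma_k^{-1} - Sigma_j^{-1}
pairwise = deltas[0] - deltas[1]
target = np.linalg.inv(sigmas[1]) - np.linalg.inv(sigmas[0])
print(np.allclose(pairwise, target))
```

Note the sign convention implied by the definitions: ${\hat{Δ}}_{j} - {\hat{Δ}}_{k}$ targets $Σ_{k}^{- 1} - Σ_{j}^{- 1}$, since the common term $Σ_{P}^{- 1}$ cancels.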

Acknowledgments

This research is supported by National Institutes of Health grants and a National Science Foundation grant. The authors thank Xingguo Li for his expertise regarding the computational complexity of the proposed algorithm.

APPENDIX: PROOFS OF THEOREMS

Lemma A1

The matrix $S^{T} (Σ_{Y} \otimes Σ_{X}) S$ is invertible, where Σ_Y and Σ_X are p × p covariance matrices and S is defined in Section 3.2.

Proof

Since Σ_X and Σ_Y are positive-definite, there exists a full-rank matrix $Σ^{1 ∕ 2}$ such that $Σ_{Y} \otimes Σ_{X} = {(Σ^{1 ∕ 2})}^{T} Σ^{1 ∕ 2}$. Furthermore, from its construction S has full column rank, so rank(S) = p(p + 1)/2. Therefore

\operatorname{rank} \{S^{T} (Σ_{Y} \otimes Σ_{X}) S\} = \operatorname{rank} \{S^{T} {(Σ^{1 ∕ 2})}^{T} Σ^{1 ∕ 2} S\} = \operatorname{rank} (Σ^{1 ∕ 2} S) = \operatorname{rank} (S) .

Since $S^{T} (Σ_{Y} \otimes Σ_{X}) S$ is p(p + 1)/2 × p(p + 1)/2, it is full rank and therefore invertible.
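The rank argument can be checked numerically on a small example. The sketch below assumes that S is the 0-1 matrix mapping the p(p + 1)/2 upper-triangular entries of a symmetric matrix Δ to vec(Δ), with vec stacking columns; this is an assumption about the construction, which is given outside this appendix, and the helper names are hypothetical.

```python
import numpy as np


def build_S(p):
    """p^2 x p(p+1)/2 matrix mapping upper-triangle entries to vec(Delta).

    Column (j, k), j <= k, has ones in the rows of vec(Delta) that hold
    entries (j, k) and (k, j); vec is taken in column-major order.
    """
    pairs = [(j, k) for j in range(p) for k in range(j, p)]
    S = np.zeros((p * p, len(pairs)))
    for col, (j, k) in enumerate(pairs):
        S[j + k * p, col] = 1.0  # entry (j, k)
        S[k + j * p, col] = 1.0  # entry (k, j); same row when j == k
    return S


def random_spd(p, rng):
    """Random symmetric positive-definite matrix (illustrative only)."""
    a = rng.standard_normal((p, p))
    return a @ a.T + p * np.eye(p)


rng = np.random.default_rng(1)
p = 4
d = p * (p + 1) // 2
sigma_x, sigma_y = random_spd(p, rng), random_spd(p, rng)
S = build_S(p)
M = S.T @ np.kron(sigma_y, sigma_x) @ S

print(M.shape, np.linalg.matrix_rank(S), np.linalg.matrix_rank(M))
```

With p = 4 the matrix is 10 × 10, and both S and the product have rank 10 in this example, as the lemma states.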

The next lemma comes from the proofs of Theorems 1(a) and 4(a) in Cai et al. (2011).

Lemma A2

Let $X_{i} = {(X_{i 1}, \ldots, X_{i p})}^{T}$ for i = 1, . . . , n be independent and identically distributed random vectors with $E (X_{i}) = {(μ_{1}, \ldots, μ_{p})}^{T}$, and let $\bar{X} = n^{- 1} \sum_{i} X_{i}$ and $\hat{Σ} = n^{- 1} \sum_{i} (X_{i} - \bar{X}) {(X_{i} - \bar{X})}^{T}$. If there exists some 0 < η < 1/4 such that log p/n ≤ η and $E \{e^{t {(X_{i j} - μ_{j})}^{2}}\} \leq K < \infty$ for all |t| ≤ η and j = 1, . . . , p, then

{∣ \hat{Σ} - Σ ∣}_{\infty} \leq C {(\log p ∕ n)}^{1 ∕ 2}

with probability at least $1 - 4 p^{- τ}$, where $C = 2 η^{- 2} {(2 + τ + η^{- 1} e^{2} K^{2})}^{2}$ and τ > 0.

Lemma A3

Let $Σ = Σ_{Y} \otimes Σ_{X}$. Label the entries of $S^{T} Σ S$ as $σ_{j^{'} k^{'}, j k}^{S} (1 \leq j^{'} \leq k^{'} \leq p; 1 \leq j \leq k \leq p)$. Then

\begin{matrix} σ_{j^{'} k^{'}, j k}^{S} = σ_{k^{'} k}^{Y} σ_{j^{'} j}^{X} + σ_{k^{'} j}^{Y} σ_{j^{'} k}^{X} + σ_{j^{'} k}^{Y} σ_{k^{'} j}^{X} + σ_{j^{'} j}^{Y} σ_{k^{'} k}^{X}, & j^{'} \neq k^{'}, j \neq k, \\ σ_{j^{'} k^{'}, j j}^{S} = σ_{k^{'} j}^{Y} σ_{j^{'} j}^{X} + σ_{j^{'} j}^{Y} σ_{k^{'} j}^{X}, & j^{'} \neq k^{'}, j = k, \\ σ_{j^{'} j^{'}, j k}^{S} = σ_{j^{'} k}^{Y} σ_{j^{'} j}^{X} + σ_{j^{'} j}^{Y} σ_{j^{'} k}^{X}, & j^{'} = k^{'}, j \neq k, \\ σ_{j^{'} j^{'}, j j}^{S} = σ_{j^{'} j}^{Y} σ_{j^{'} j}^{X}, & j^{'} = k^{'}, j = k . \end{matrix}

Proof

Label the entries of Σ as $σ_{l^{'} m^{'}, l m}$ (l′ = 1, . . . , p; m′ = 1, . . . , p; l = 1, . . . , p; m = 1, . . . , p) and the entries of S as $S_{l m, j k}$ (l = 1, . . . , p; m = 1, . . . , p; 1 ≤ j ≤ k ≤ p), as in Section 3.3. By the definition of the Kronecker product, $σ_{l^{'} m^{'}, l m} = σ_{m^{'} m}^{Y} σ_{l^{'} l}^{X}$, and $σ_{j^{'} k^{'}, j k}^{S} = \sum_{l^{'}, m^{'}, l, m} S_{l^{'} m^{'}, j^{'} k^{'}} σ_{l^{'} m^{'}, l m} S_{l m, j k}$, so the lemma follows from the definition of the entries of S.
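The four cases of the lemma can be verified numerically against a direct computation of $S^{T} (Σ_{Y} \otimes Σ_{X}) S$. The sketch below again assumes that S maps the upper-triangular entries of a symmetric matrix to its column-stacked vec, and takes the fourth term of the first case to be $σ_{j^{'} j}^{Y} σ_{k^{'} k}^{X}$, which is what the Kronecker structure gives. The function names are hypothetical.

```python
import numpy as np


def build_S(p):
    """0-1 matrix mapping upper-triangle entries to column-major vec."""
    pairs = [(j, k) for j in range(p) for k in range(j, p)]
    S = np.zeros((p * p, len(pairs)))
    for col, (j, k) in enumerate(pairs):
        S[j + k * p, col] = 1.0
        S[k + j * p, col] = 1.0
    return S, pairs


def lemma_entry(X, Y, jp, kp, j, k):
    """Entry of S^T (Y kron X) S indexed by (j', k') and (j, k)."""
    if jp != kp and j != k:
        return (Y[kp, k] * X[jp, j] + Y[kp, j] * X[jp, k]
                + Y[jp, k] * X[kp, j] + Y[jp, j] * X[kp, k])
    if jp != kp and j == k:
        return Y[kp, j] * X[jp, j] + Y[jp, j] * X[kp, j]
    if jp == kp and j != k:
        return Y[jp, k] * X[jp, j] + Y[jp, j] * X[jp, k]
    return Y[jp, j] * X[jp, j]


def random_spd(p, rng):
    a = rng.standard_normal((p, p))
    return a @ a.T + p * np.eye(p)


rng = np.random.default_rng(3)
p = 4
X, Y = random_spd(p, rng), random_spd(p, rng)
S, pairs = build_S(p)
M = S.T @ np.kron(Y, X) @ S

# Matrix of entries predicted by the four cases of the lemma.
M_formula = np.array([[lemma_entry(X, Y, jp, kp, j, k) for (j, k) in pairs]
                      for (jp, kp) in pairs])
print(np.allclose(M, M_formula))
```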

Proof of Theorem 3

Let the entries of Δ₀ be denoted $δ_{j k}^{0}$ , and define the p(p + 1)/2 × 1 vector $β_{0} = {(δ_{j k}^{0})}_{1 \leq j \leq k \leq p}$ . Define Σ as in Lemma A3, $\hat{b} = vec ({\hat{Σ}}_{X} - {\hat{Σ}}_{Y})$ , b = vec(Σ_X − Σ_Y ), and $h = \hat{β} - β_{0}$ . The bound on ${∣ \hat{Δ} - Δ_{0} ∣}_{\infty} = {∣ h ∣}_{\infty}$ is obtained following Lounici (2008).

Denote the ath component of $S^{T} Σ S h$ by ${(S^{T} Σ S h)}_{a}$, the (a, b)th entry of $S^{T} Σ S$ by $σ_{a b}^{S}$, and the bth component of h by h_b. Also let $μ = \max_{a \neq b} ∣ σ_{a b}^{S} ∣$. Then ${(S^{T} Σ S h)}_{a} = \sum_{b = 1}^{p (p + 1) ∕ 2} σ_{a b}^{S} h_{b} = σ_{a a}^{S} h_{a} + \sum_{b \neq a} σ_{a b}^{S} h_{b}$, which implies

∣ σ_{a a}^{S} h_{a} ∣ \leq {∣ S^{T} Σ S h ∣}_{\infty} + μ \sum_{b \neq a} ∣ h_{b} ∣ .

(6)

The diagonal terms $σ_{a a}^{S}$ can be relabeled as $σ_{j k, j k}^{S}$, where j may equal k, and from Lemma A3 must satisfy $σ_{j k, j k}^{S} \geq σ_{\min}^{S}$, with $σ_{\min}^{S}$ defined in Condition 2. The off-diagonal terms $σ_{a b}^{S}$, a ≠ b, can be relabeled as $σ_{j^{'} k^{'}, j k}^{S}$ with j′ ≠ j or k′ ≠ k, and from Lemma A3 must satisfy $σ_{j^{'} k^{'}, j k}^{S} \leq 4 \max (μ_{X} σ_{\max}^{X}, μ_{Y} σ_{\max}^{Y}) = μ$, with μ_X and μ_Y defined in Condition 2. Using these facts, and Condition 2, (6) becomes

{∣ h ∣}_{\infty} \leq \frac{1}{σ_{\min}^{S}} ({∣ S^{T} Σ S h ∣}_{\infty} + \frac{σ_{\min}^{S}}{2 s} {∣ h ∣}_{1}) .

(7)

The method of Cai et al. (2010b) is used to bound |h|₁. Let T₀ be the set of indices corresponding to the support of β₀, and for any p × 1 vector $a = {(a_{1}, \ldots, a_{p})}^{T}$ let $a_{T_{0}}$ be the vector with components $a_{T_{0} j} = 0$ for $j \notin T_{0}$ and $a_{T_{0} j} = a_{j}$ for $j \in T_{0}$.

First it must be shown that β₀ is in the feasible set with high probability. Since X_i and Y_i are both Gaussian, they satisfy the conditions of Lemma A2 and thus ${∣ {\hat{Σ}}_{X} - Σ_{X} ∣}_{\infty}$ and ${∣ {\hat{Σ}}_{Y} - Σ_{Y} ∣}_{\infty}$ are both less than $C {\{\log p ∕ \min (n_{X}, n_{Y})\}}^{1 ∕ 2}$ with probability at least $1 - 8 p^{- τ}$. Then

\begin{aligned} {∣ S^{T} \hat{Σ} S β_{0} - S^{T} \hat{b} ∣}_{\infty} & \leq {∣ S^{T} (\hat{Σ} - Σ) S β_{0} ∣}_{\infty} + {∣ S^{T} (\hat{b} - b) ∣}_{\infty} \\ & \leq {‖ S^{T} ‖}_{\infty} {∣ \hat{Σ} - Σ ∣}_{\infty} {‖ S ‖}_{1} {∣ β_{0} ∣}_{1} + {‖ S^{T} ‖}_{\infty} ({∣ {\hat{Σ}}_{X} - Σ_{X} ∣}_{\infty} + {∣ {\hat{Σ}}_{Y} - Σ_{Y} ∣}_{\infty}) \\ & \leq 4 M {∣ \hat{Σ} - Σ ∣}_{\infty} + 4 C {\{\log p ∕ \min (n_{X}, n_{Y})\}}^{1 ∕ 2}, \end{aligned}

where ∥S∥₁ = 2 by the definition of S and |β₀|₁ ≤ M by Condition 1. Next, from the proof of Lemma A3, each entry of Σ can be written as $σ_{l^{'} l}^{X} σ_{m^{'} m}^{Y}$ , so

\begin{aligned} ∣ {\hat{σ}}_{l^{'} l}^{X} {\hat{σ}}_{m^{'} m}^{Y} - σ_{l^{'} l}^{X} σ_{m^{'} m}^{Y} ∣ & = ∣ σ_{l^{'} l}^{X} ({\hat{σ}}_{m^{'} m}^{Y} - σ_{m^{'} m}^{Y}) + ({\hat{σ}}_{l^{'} l}^{X} - σ_{l^{'} l}^{X}) σ_{m^{'} m}^{Y} + ({\hat{σ}}_{l^{'} l}^{X} - σ_{l^{'} l}^{X}) ({\hat{σ}}_{m^{'} m}^{Y} - σ_{m^{'} m}^{Y}) ∣ \\ & \leq [∣ σ_{l^{'} l}^{X} ∣ + ∣ σ_{m^{'} m}^{Y} ∣ + C {\{\log p ∕ \min (n_{X}, n_{Y})\}}^{1 ∕ 2}] C {\{\log p ∕ \min (n_{X}, n_{Y})\}}^{1 ∕ 2} \\ & \leq (∣ σ_{l^{'} l}^{X} ∣ + ∣ σ_{m^{'} m}^{Y} ∣ + C) C {\{\log p ∕ \min (n_{X}, n_{Y})\}}^{1 ∕ 2}, \end{aligned}

since min(n_X, n_Y) > log p. Then β₀ is feasible with probability at least $1 - 8 p^{- τ}$ if $λ_{n} = \{M (∣ σ_{l^{'} l}^{X} ∣ + ∣ σ_{m^{'} m}^{Y} ∣ + C) + 1\} 4 C {\{\log p ∕ \min (n_{X}, n_{Y})\}}^{1 ∕ 2}$.

Now |h|₁ can be bounded. By the definition of (3), ${∣ β_{0} ∣}_{1} - {∣ \hat{β} ∣}_{1} \geq 0$ . This implies that ${∣ β_{0 T_{0}} ∣}_{1} - ({∣ {\hat{β}}_{T_{0}} ∣}_{1} + {∣ {\hat{β}}_{T_{0}^{c}} ∣}_{1}) \geq 0$ . Using the triangle inequality, ${∣ β_{0 T_{0}} - {\hat{β}}_{T_{0}} ∣}_{1} \geq {∣ {\hat{β}}_{T_{0}^{c}} ∣}_{1}$ , or in other words, ${∣ h_{T_{0}^{c}} ∣}_{1} \leq {∣ h_{T_{0}} ∣}_{1}$ . Therefore ${∣ h ∣}_{1} \leq 2 {∣ h_{T_{0}} ∣}_{1} \leq 2 s^{1 ∕ 2} {∣ h_{T_{0}} ∣}_{2}$ . To bound |h_T₀|₂, observe following Cai et al. (2009) that for any s-sparse vector c,

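∣ c^{T} S^{T} Σ S c ∣ \geq \sum_{a} σ_{a a}^{S} c_{a}^{2} - ∣ \sum_{a \neq b} σ_{a b}^{S} c_{a} c_{b} ∣ \geq σ_{\min}^{S} {∣ c ∣}_{2}^{2} - μ \sum_{a \neq b} ∣ c_{a} c_{b} ∣ \geq σ_{\min}^{S} {∣ c ∣}_{2}^{2} - μ (s - 1) {∣ c ∣}_{2}^{2},

where the last step uses $\sum_{a \neq b} ∣ c_{a} c_{b} ∣ = {∣ c ∣}_{1}^{2} - {∣ c ∣}_{2}^{2}$ and ${∣ c ∣}_{1}^{2} \leq s {∣ c ∣}_{2}^{2}$ for s-sparse c.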

This implies that

\begin{aligned} ∣ h_{T_{0}}^{T} S^{T} Σ S h ∣ & \geq ∣ h_{T_{0}}^{T} S^{T} Σ S h_{T_{0}} ∣ - ∣ h_{T_{0}}^{T} S^{T} Σ S h_{T_{0}^{c}} ∣ \\ & \geq \{σ_{\min}^{S} - (s - 1) μ\} {∣ h_{T_{0}} ∣}_{2}^{2} - ∣ \sum_{a, b} σ_{a b}^{S} h_{T_{0} a} h_{T_{0}^{c} b} ∣ \\ & \geq \{σ_{\min}^{S} - (s - 1) μ\} {∣ h_{T_{0}} ∣}_{2}^{2} - μ {∣ h_{T_{0}} ∣}_{1} {∣ h_{T_{0}^{c}} ∣}_{1} \\ & \geq \{σ_{\min}^{S} - (s - 1) μ\} {∣ h_{T_{0}} ∣}_{2}^{2} - μ {∣ h_{T_{0}} ∣}_{1}^{2} \\ & \geq \{σ_{\min}^{S} - (2 s - 1) μ\} {∣ h_{T_{0}} ∣}_{2}^{2} . \end{aligned}

Together with $∣ h_{T_{0}}^{T} S^{T} Σ S h ∣ \leq {∣ h_{T_{0}} ∣}_{1} {∣ S^{T} Σ S h ∣}_{\infty} \leq s^{1 ∕ 2} {∣ h_{T_{0}} ∣}_{2} {∣ S^{T} Σ S h ∣}_{\infty}$ , this implies that

{∣ h ∣}_{1} \leq 2 s^{1 ∕ 2} {∣ h_{T_{0}} ∣}_{2} \leq \frac{2 s {∣ S^{T} Σ S h ∣}_{\infty}}{σ_{\min}^{S} - (2 s - 1) μ},

so (7) becomes

{∣ h ∣}_{\infty} \leq \frac{1}{σ_{\min}^{S}} \left\{1 + \frac{σ_{\min}^{S}}{σ_{\min}^{S} - (2 s - 1) μ}\right\} {∣ S^{T} Σ S h ∣}_{\infty} .

Bounding |S^TΣSh|_∞ uses the proof that β₀ is feasible, because

\begin{aligned} {∣ S^{T} Σ S h ∣}_{\infty} & = {∣ S^{T} Σ S \hat{β} - S^{T} b ∣}_{\infty} \\ & \leq {∣ S^{T} \hat{Σ} S \hat{β} - S^{T} \hat{b} ∣}_{\infty} + {∣ S^{T} (\hat{Σ} - Σ) S \hat{β} ∣}_{\infty} + {∣ S^{T} (\hat{b} - b) ∣}_{\infty} \\ & \leq λ_{n} + {‖ S^{T} ‖}_{\infty} {∣ \hat{Σ} - Σ ∣}_{\infty} {‖ S ‖}_{1} {∣ β_{0} ∣}_{1} + {‖ S^{T} ‖}_{\infty} ({∣ {\hat{Σ}}_{X} - Σ_{X} ∣}_{\infty} + {∣ {\hat{Σ}}_{Y} - Σ_{Y} ∣}_{\infty}) \\ & \leq λ_{n} + \{4 M (∣ σ_{l^{'} l}^{X} ∣ + ∣ σ_{m^{'} m}^{Y} ∣ + C) + 4\} C {\{\log p ∕ \min (n_{X}, n_{Y})\}}^{1 ∕ 2} = 2 λ_{n} \end{aligned}

when ${∣ {\hat{Σ}}_{X} - Σ_{X} ∣}_{\infty}$ and ${∣ {\hat{Σ}}_{Y} - Σ_{Y} ∣}_{\infty}$ are both less than $C {\{\log p ∕ \min (n_{X}, n_{Y})\}}^{1 ∕ 2}$.
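Substituting this bound into the preceding inequality for the sup-norm of h gives the estimation error in explicit form. The display below is only that substitution, written out as a sketch; it holds on the event described above, which has probability at least $1 - 8 p^{- τ}$, and it makes no claim about how the constants are simplified in the statement of Theorem 3.

```latex
\lvert \hat{\Delta} - \Delta_{0} \rvert_{\infty}
  = \lvert h \rvert_{\infty}
  \leq \frac{2 \lambda_{n}}{\sigma_{\min}^{S}}
       \left\{ 1 + \frac{\sigma_{\min}^{S}}{\sigma_{\min}^{S} - (2 s - 1) \mu} \right\}
```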

Proof of Theorem 1

Let ${\hat{δ}}_{j k}^{τ_{n}}$ be the (j, k)th entry of ${\hat{Δ}}_{τ_{n}}$ . Then

pr \{M ({\hat{Δ}}_{τ_{n}}) = M (Δ_{0})\} = pr \{\max_{j, k : δ_{j k}^{0} = 0} ∣ {\hat{δ}}_{j k}^{τ_{n}} ∣ = 0 \cap \min_{j, k : δ_{j k}^{0} > 0} {\hat{δ}}_{j k}^{τ_{n}} > 0 \cap \max_{j, k : δ_{j k}^{0} < 0} {\hat{δ}}_{j k}^{τ_{n}} < 0\} .

Suppose $δ_{j k}^{0} > 0$. Then ${\hat{δ}}_{j k} = δ_{j k}^{0} + ({\hat{δ}}_{j k} - δ_{j k}^{0}) > 2 τ_{n} - τ_{n} = τ_{n}$ with probability going to 1, by Theorem 3, so ${\hat{δ}}_{j k}^{τ_{n}} = {\hat{δ}}_{j k} > 0$. Next suppose $δ_{j k}^{0} < 0$. Then ${\hat{δ}}_{j k} = δ_{j k}^{0} + ({\hat{δ}}_{j k} - δ_{j k}^{0}) < - 2 τ_{n} + τ_{n} = - τ_{n}$ with probability going to 1, so ${\hat{δ}}_{j k}^{τ_{n}} = {\hat{δ}}_{j k} < 0$. Finally, for $δ_{j k}^{0} = 0$, $∣ {\hat{δ}}_{j k} ∣ = ∣ {\hat{δ}}_{j k} - δ_{j k}^{0} ∣ \leq τ_{n}$ with probability going to 1, so ${\hat{δ}}_{j k}^{τ_{n}} = 0$.
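The mechanism of this proof can be illustrated with a small deterministic example. The sketch below assumes, as the argument above suggests, that the thresholded estimator keeps entries whose absolute value exceeds τ_n and sets the rest to zero, that every nonzero entry of Δ₀ exceeds 2τ_n in magnitude, and that the estimation error is at most τ_n in sup-norm. The numerical values and function name are hypothetical.

```python
import numpy as np


def hard_threshold(delta_hat, tau):
    """Keep entries with absolute value above tau; zero out the rest."""
    return np.where(np.abs(delta_hat) > tau, delta_hat, 0.0)


tau = 0.1
# True difference matrix: nonzero entries exceed 2 * tau in magnitude.
delta0 = np.array([[0.0, 0.35, 0.0],
                   [0.35, 0.0, -0.25],
                   [0.0, -0.25, 0.0]])

rng = np.random.default_rng(2)
noise = rng.uniform(-tau, tau, size=delta0.shape)
noise = (noise + noise.T) / 2  # symmetric, entries still within [-tau, tau]
delta_hat = delta0 + noise

recovered = hard_threshold(delta_hat, tau)
signs_match = np.array_equal(np.sign(recovered), np.sign(delta0))
print(signs_match)
```

Whatever perturbation is drawn, the signs of the thresholded estimate agree with those of Δ₀ under these assumptions, which is the sign-recovery statement of the theorem restricted to the event where the sup-norm bound holds.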

Proof of Theorem 2

Solutions to this type of $ℓ_{1}$ -constrained optimization problem have ${∣ h_{T_{0}} ∣}_{1} \geq {∣ h_{T_{0}^{c}} ∣}_{1}$ . Cai et al. (2010a) used this property of h, along with their Lemma 3, to show that ${∣ h ∣}_{2} \leq 2 {∣ h_{T_{0} \cup T^{*}} ∣}_{2}$ , where T* is the set of indices corresponding to the s/4-largest components of $h_{T_{0}^{c}}$ . Then ${∣ h ∣}_{2} \leq 2 {(1.25 s)}^{1 ∕ 2} {∣ h ∣}_{\infty}$ , and combining this with Theorem 3 completes the proof.

Contributor Information

Sihai Dave Zhao, Department of Biostatistics and Epidemiology, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania 19104, USA.

T. Tony Cai, Department of Statistics, The Wharton School, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA.

Hongzhe Li, Department of Biostatistics and Epidemiology, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania 19104, USA.

References

Bandyopadhyay S, Mehta M, Kuo D, Sung M-K, Chuang R, Jaehnig EJ, Bodenmiller B, Licon K, Copeland W, Shales M, Fiedler D, Dutkowski J, Guénolé A, van Attikum H, Shokat KM, Kolodner RD, Huh W-K, Aebersold R, Keogh M-C, Krogan NJ, Ideker T. Rewiring of genetic networks in response to DNA damage. Science. 2010;330(6009):1385–1389. doi: 10.1126/science.1195618.
Barabási A-L, Oltvai ZN. Network biology: understanding the cell’s functional organization. Nat. Rev. Genet. 2004;5(2):101–113. doi: 10.1038/nrg1272.
Barabási A-L, Gulbahce N, Loscalzo J. Network medicine: a network-based approach to human disease. Nat. Rev. Genet. 2011;12(1):56–68. doi: 10.1038/nrg2918.
Bellail AC, Qi L, Mulligan P, Chhabra V, Hao C. TRAIL agonists on clinical trials for cancer therapy: the promises and the challenges. Rev. Recent Clin. Trials. 2009;4(1):34–41. doi: 10.2174/157488709787047530.
Boyd S, Parikh N, Chu E, Peleato B, Eckstein J. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundat. Trends Mach. Learn. 2011;3(1):1–122.
Cai T, Liu W. A direct estimation approach to sparse linear discriminant analysis. J. Am. Statist. Assoc. 2011;106(496):1566–1577. doi: 10.1198/jasa.2011.tm11199.
Cai T, Xu G, Zhang J. On recovery of sparse signals via ℓ1 minimization. IEEE Trans. Inf. Theory. 2009;55(7):3388–3397.
Cai T, Wang L, Xu G. Shifting inequality and recovery of sparse signals. IEEE Trans. Signal Process. 2010a;58(3):1300–1308.
Cai T, Wang L, Xu G. Stable recovery of sparse signals and an oracle inequality. IEEE Trans. Inf. Theory. 2010b;56(7):3516–3522.
Cai T, Liu W, Luo X. A constrained ℓ1 minimization approach to sparse precision matrix estimation. J. Am. Statist. Assoc. 2011;106(494):594–607.
Chiquet J, Grandvalet Y, Ambroise C. Inferring multiple graphical structures. Statist. Comput. 2011;21(4):537–553.
Danaher P, Wang P, Witten DM. The joint graphical lasso for inverse covariance estimation across multiple classes. J. Roy. Statist. Soc. Ser. B. 2013, in press. doi: 10.1111/rssb.12033.
de la Fuente A. From differential expression to differential networking – identification of dysfunctional regulatory networks in diseases. Trends Genet. 2010;26(7):326–333. doi: 10.1016/j.tig.2010.05.001.
Donoho D, Huo X. Uncertainty principles and ideal atomic decomposition. IEEE Trans. Inf. Theory. 2001;47(7):2845–2862.
Emmert-Streib F, Dehmer M. Networks for systems biology: conceptual connection of data and function. IET Syst. Biol. 2011;5(3):185–207. doi: 10.1049/iet-syb.2010.0025.
Friedman JH, Hastie TJ, Tibshirani RJ. Sparse inverse covariance estimation with the graphical lasso. Biostatistics. 2008;9(3):432–441. doi: 10.1093/biostatistics/kxm045.
Guo J, Levina E, Michailidis G, Zhu J. Joint estimation of multiple graphical models. Biometrika. 2011;98(1):1–15. doi: 10.1093/biomet/asq060.
Huang J, Ma S, Xie H, Zhang C-H. A group bridge approach for variable selection. Biometrika. 2009;96(2):339–355. doi: 10.1093/biomet/asp020.
Hudson NJ, Reverter A, Dalrymple BP. A differential wiring analysis of expression data correctly identifies the gene containing the causal mutation. PLoS Comput. Biol. 2009;5(5):e1000382. doi: 10.1371/journal.pcbi.1000382.
Ideker T, Krogan N. Differential network biology. Mol. Syst. Biol. 2012;8(1):565. doi: 10.1038/msb.2011.99.
Johnstone RW, Frew AJ, Smyth MJ. The TRAIL apoptotic pathway in cancer onset, progression and therapy. Nat. Rev. Cancer. 2008;8(10):782–798. doi: 10.1038/nrc2465.
Kalisch M, Bühlmann P. Estimating high-dimensional directed acyclic graphs with the PC-algorithm. J. Mach. Learn. Res. 2007;8:613–636.
Kanehisa M, Goto S, Sato Y, Furumichi M, Tanabe M. KEGG for integration and interpretation of large-scale molecular data sets. Nucleic Acids Res. 2012;40(D1):D109–D114. doi: 10.1093/nar/gkr988.
Kipps E, Tan DSP, Kaye SB. Meeting the challenge of ascites in ovarian cancer: new avenues for therapy and research. Nat. Rev. Cancer. 2013:273–282. doi: 10.1038/nrc3432.
Lauritzen SL. Graphical Models. Oxford University Press; Oxford: 1996.
Li K-C, Palotie A, Yuan S, Bronnikov D, Chen D, Wei X, Choi O-W, Saarela J, Peltonen L. Finding disease candidate genes by liquid association. Genome Biol. 2007;8(10):R205. doi: 10.1186/gb-2007-8-10-r205.
Lounici K. Sup-norm convergence rate and sign concentration property of Lasso and Dantzig estimators. Electron. J. Statist. 2008;2:90–102.
Maathuis MH, Kalisch M, Bühlmann P. Estimating high-dimensional intervention effects from observational data. Ann. Statist. 2009;37(6A):3133–3164.
Marchini S, Fruscio R, Clivio L, Beltrame L, Porcu L, Nerini IF, Cavalieri D, Chiorino G, Cattoretti G, Mangioni C, Milani R, Torri V, Romualdi C, Zambelli A, Romano M, Signorelli M, di Giandomenico S, D’Incalci M. Resistance to platinum-based chemotherapy is associated with epithelial to mesenchymal transition in epithelial ovarian cancer. Eur. J. Cancer. 2013;49:520–530. doi: 10.1016/j.ejca.2012.06.026.
Markowetz F, Spang R. Inferring cellular networks – a review. BMC Bioinformatics. 2007;8(Suppl 6):S5. doi: 10.1186/1471-2105-8-S6-S5.
Meinshausen N, Bühlmann P. High-dimensional graphs and variable selection with the lasso. Ann. Statist. 2006;34(3):1436–1462.
Mohan K, Chung M, Han S, Witten D, Lee S-I, Fazel M. Structured learning of Gaussian graphical models. Adv. Neural Inf. Process. Syst. 2012:629–637.
Newman ME. The structure and function of complex networks. SIAM Rev. 2003;45(2):167–256.
Ogata H, Goto S, Sato K, Fujibuchi W, Bono H, Kanehisa M. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 1999;27(1):29–34. doi: 10.1093/nar/27.1.29.
Petrucci E, Pasquini L, Bernabei M, Saulle E, Biffoni M, Accarpio F, Sibio S, Di Giorgio A, Di Donato V, Casorelli A, Benedetti-Panici P, Testa U. A small molecule SMAC mimic LBW242 potentiates TRAIL- and anticancer drug-mediated cell death of ovarian cancer cells. PLoS One. 2012;7(4):e35073. doi: 10.1371/journal.pone.0035073.
Ravikumar PK, Raskutti G, Wainwright MJ, Yu B. Model selection in Gaussian graphical models: high-dimensional consistency of ℓ1-regularized MLE. Adv. Neural Inf. Process. Syst. 2008;21:1329–1336.
Tibshirani RJ, Saunders M, Rosset S, Zhu J, Knight K. Sparsity and smoothness via the fused lasso. J. Roy. Statist. Soc. Ser. B. 2005;67(1):91–108.
Tothill RW, Tinker AV, George J, Brown R, Fox SB, Lade S, Johnson DS, Trivett MK, Etemadmoghadam D, Locandro B, Traficante N, Fereday S, Hung JA, Chiew Y-E, Haviv I, Australian Ovarian Cancer Study Group, Gertig D, deFazio A, Bowtell DDL. Novel molecular subtypes of serous and endometrioid ovarian cancer linked to clinical outcome. Clin. Cancer Res. 2008;14:5198–5208. doi: 10.1158/1078-0432.CCR-08-0196.
Vucic D, Fairbrother WJ. The inhibitor of apoptosis proteins as therapeutic targets in cancer. Clin. Cancer Res. 2007;13(20):5995–6000. doi: 10.1158/1078-0432.CCR-07-0729.
Wang S, Nan B, Zhu N, Zhu J. Hierarchically penalized Cox regression with grouped variables. Biometrika. 2009;96(2):307–322.
Yagita H, Takeda K, Hayakawa Y, Smyth MJ, Okumura K. TRAIL and its receptors as targets for cancer therapy. Cancer Sci. 2004;95(10):777–783. doi: 10.1111/j.1349-7006.2004.tb02181.x.
Yuan M. Sparse inverse covariance matrix estimation via linear programming. J. Mach. Learn. Res. 2010;11:2261–2286.
Yuan M, Lin Y. Model selection and estimation in regression with grouped variables. J. Roy. Statist. Soc. Ser. B. 2006;68(1):49–67.
Zhao P, Yu B. On model selection consistency of lasso. J. Mach. Learn. Res. 2006;7:2541–2563.

[R1] Bandyopadhyay S, Mehta M, Kuo D, Sung M-K, Chuang R, Jaehnig EJ, Bodenmiller B, Licon K, Copeland W, Shales M, Fiedler D, Dutkowski J, Guénolé A, van Attikum H, Shokat KM, Kolodner RD, Huh W-K, Aebersold R, Keogh M-C, Krogan NJ, Ideker T. Rewiring of genetic networks in response to DNA damage. Sci. Signal. 2010;330(6009):1385–1389. doi: 10.1126/science.1195618. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R2] Barabási A-L, Oltvai ZN. Network biology: understanding the cell’s functional organization. Nat. Rev. Genet. 2004;5(2):101–113. doi: 10.1038/nrg1272. [DOI] [PubMed] [Google Scholar]

[R3] Barabási A-L, Gulbahce N, Loscalzo J. Network medicine: a network-based approach to human disease. Nat. Rev. Genet. 2011;12(1):56–68. doi: 10.1038/nrg2918. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R4] Bellail AC, Qi L, Mulligan P, Chhabra V, Hao C. TRAIL agonists on clinical trials for cancer therapy: the promises and the challenges. Rev. Recent Clin. Trials. 2009;4(1):34–41. doi: 10.2174/157488709787047530. [DOI] [PubMed] [Google Scholar]

[R5] Boyd S, Parikh N, Chu E, Peleato B, Eckstein J. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundat. Trends Mach. Learn. 2011;3(1):1–122. [Google Scholar]

[R6] Cai T, Liu W. A Direct Estimation Approach to Sparse Linear Discriminant Analysis. J. Am. Statist. Assoc. 106(496):1566–1577. Dec. 2011. ISSN 0162-1459. doi: 10.1198/jasa.2011.tm11199. URL http://www.tandfonline.com/doi/abs/10.1198/jasa.2011.tm11199. [Google Scholar]

[R7] Cai T, Xu G, Zhang J. On recovery of sparse signals via l 1 minimization. IEEE Trans. Inf. Theory. 2009;55(7):3388–3397. [Google Scholar]

[R8] Cai T, Wang L, Xu G. Shifting inequality and recovery of sparse signals. IEEE Trans. Signal Process. 2010a;58(3):1300–1308. [Google Scholar]

[R9] Cai T, Wang L, Xu G. Stable recovery of sparse signals and an oracle inequality. IEEE Trans. Inf. Theory. 2010b;56(7):3516–3522. [Google Scholar]

[R10] Cai T, Liu W, Luo X. A constrained ℓ1 minimization approach to sparse precision matrix estimation. Am. Statist. Assoc. 2011;106(494):594–607. [Google Scholar]

[R11] Chiquet J, Grandvalet Y, Ambroise C. Inferring multiple graphical structures. Statist. Comput. 2011;21(4):537–553. [Google Scholar]

[R12] Danaher P, Wang P, Witten DM. The joint graphical lasso for inverse covariance estimation across multiple classes. J. Roy. Statist. Soc. Ser. B. 2013 doi: 10.1111/rssb.12033. in press. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R13] de la Fuente A. From differential expression to differential networking – identification of dysfunctional regulatory networks in diseases. Trends Genet. 2010;26(7):326–333. doi: 10.1016/j.tig.2010.05.001. [DOI] [PubMed] [Google Scholar]

[R14] Donoho D, Huo X. Uncertainty principles and ideal atomic decomposition. IEEE Trans. Inf. Theory. 2001;47(7):2845–2862. [Google Scholar]

[R15] Emmert-Streib F, Dehmer M. Networks for systems biology: conceptual connection of data and function. IET Syst. Biol. 2011;5(3):185–207. doi: 10.1049/iet-syb.2010.0025. [DOI] [PubMed] [Google Scholar]

[R16] Friedman JH, Hastie TJ, Tibshirani RJ. Sparse inverse covariance estimation with the graphical lasso. Biostatistics. 2008;9(3):432–441. doi: 10.1093/biostatistics/kxm045. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R17] Guo J, Levina E, Michailidis G, Zhu J. Joint estimation of multiple graphical models. Biometrika. 2011;98(1):1–15. doi: 10.1093/biomet/asq060. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R18] Huang J, Ma S, Xie H, Zhang C-H. A group bridge approach for variable selection. Biometrika. 2009;96(2):339–355. doi: 10.1093/biomet/asp020. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R19] Hudson NJ, Reverter A, Dalrymple BP. A differential wiring analysis of expression data correctly identifies the gene containing the causal mutation. PLoS Comput. Biol. 2009;5(5):e1000382. doi: 10.1371/journal.pcbi.1000382. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R20] Ideker T, Krogan N. Differential network biology. Mol. Syst. Biol. 2012;8(1):565. doi: 10.1038/msb.2011.99. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R21] Johnstone RW, Frew AJ, Smyth MJ. The TRAIL apoptotic pathway in cancer onset, progression and therapy. Nat. Rev. Cancer. 2008;8(10):782–798. doi: 10.1038/nrc2465. [DOI] [PubMed] [Google Scholar]

[R22] Kalisch M, Bühlmann P. Estimating high-dimensional directed acyclic graphs with the pc-algorithm. J. Mach. Learn. Res. 2007;8:613–636. [Google Scholar]

[R23] Kanehisa M, Goto S, Sato Y, Furumichi M, Tanabe M. KEGG for integration and interpretation of large-scale molecular data sets. Nucleic Acids Res. 2012;40(D1):D109–D114. doi: 10.1093/nar/gkr988. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R24] Kipps E, Tan DSP, Kaye SB. Meeting the challenge of ascites in ovarian cancer: new avenues for therapy and research. Nat. Rev. Cancer. 2013:273–282. doi: 10.1038/nrc3432. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R25] Lauritzen SL. Graphical Models. Oxford University Press; Oxford: 1996.

[R26] Li K-C, Palotie A, Yuan S, Bronnikov D, Chen D, Wei X, Choi O-W, Saarela J, Peltonen L. Finding disease candidate genes by liquid association. Genome Biol. 2007;8(10):R205. doi: 10.1186/gb-2007-8-10-r205.

[R27] Lounici K. Sup-norm convergence rate and sign concentration property of Lasso and Dantzig estimators. Electron. J. Statist. 2008;2:90–102.

[R28] Maathuis MH, Kalisch M, Bühlmann P. Estimating high-dimensional intervention effects from observational data. Ann. Statist. 2009;37(6A):3133–3164.

[R29] Marchini S, Fruscio R, Clivio L, Beltrame L, Porcu L, Nerini IF, Cavalieri D, Chiorino G, Cattoretti G, Mangioni C, Milani R, Torri V, Romualdi C, Zambelli A, Romano M, Signorelli M, di Giandomenico S, D'Incalci M. Resistance to platinum-based chemotherapy is associated with epithelial to mesenchymal transition in epithelial ovarian cancer. Eur. J. Cancer. 2013;49:520–530. doi: 10.1016/j.ejca.2012.06.026.

[R30] Markowetz F, Spang R. Inferring cellular networks – a review. BMC Bioinformatics. 2007;8(Suppl 6):S5. doi: 10.1186/1471-2105-8-S6-S5.

[R31] Meinshausen N, Bühlmann P. High-dimensional graphs and variable selection with the lasso. Ann. Statist. 2006;34(3):1436–1462.

[R32] Mohan K, Chung M, Han S, Witten D, Lee S-I, Fazel M. Structured learning of Gaussian graphical models. Adv. Neural Inf. Process. Syst. 2012:629–637.

[R33] Newman ME. The structure and function of complex networks. SIAM Rev. 2003;45(2):167–256.

[R34] Ogata H, Goto S, Sato K, Fujibuchi W, Bono H, Kanehisa M. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 1999;27(1):29–34. doi: 10.1093/nar/27.1.29.

[R35] Petrucci E, Pasquini L, Bernabei M, Saulle E, Biffoni M, Accarpio F, Sibio S, Di Giorgio A, Di Donato V, Casorelli A, Benedetti-Panici P, Testa U. A small molecule SMAC mimic LBW242 potentiates TRAIL- and anticancer drug-mediated cell death of ovarian cancer cells. PLoS One. 2012;7(4):e35073. doi: 10.1371/journal.pone.0035073.

[R36] Ravikumar PK, Raskutti G, Wainwright MJ, Yu B. Model selection in Gaussian graphical models: high-dimensional consistency of ℓ1-regularized MLE. Adv. Neural Inf. Process. Syst. 2008;21:1329–1336.

[R37] Tibshirani RJ, Saunders M, Rosset S, Zhu J, Knight K. Sparsity and smoothness via the fused lasso. J. Roy. Statist. Soc. Ser. B. 2005;67(1):91–108.

[R38] Tothill RW, Tinker AV, George J, Brown R, Fox SB, Lade S, Johnson DS, Trivett MK, Etemadmoghadam D, Locandro B, Traficante N, Fereday S, Hung JA, Chiew Y-E, Haviv I, Australian Ovarian Cancer Study Group, Gertig D, deFazio A, Bowtell DDL. Novel molecular subtypes of serous and endometrioid ovarian cancer linked to clinical outcome. Clin. Cancer Res. 2008;14:5198–5208. doi: 10.1158/1078-0432.CCR-08-0196.

[R39] Vucic D, Fairbrother WJ. The inhibitor of apoptosis proteins as therapeutic targets in cancer. Clin. Cancer Res. 2007;13(20):5995–6000. doi: 10.1158/1078-0432.CCR-07-0729.

[R40] Wang S, Nan B, Zhu N, Zhu J. Hierarchically penalized Cox regression with grouped variables. Biometrika. 2009;96(2):307–322.

[R41] Yagita H, Takeda K, Hayakawa Y, Smyth MJ, Okumura K. TRAIL and its receptors as targets for cancer therapy. Cancer Sci. 2004;95(10):777–783. doi: 10.1111/j.1349-7006.2004.tb02181.x.

[R42] Yuan M. Sparse inverse covariance matrix estimation via linear programming. J. Mach. Learn. Res. 2010;11:2261–2286.

[R43] Yuan M, Lin Y. Model selection and estimation in regression with grouped variables. J. Roy. Statist. Soc. Ser. B. 2006;68(1):49–67.

[R44] Zhao P, Yu B. On model selection consistency of lasso. J. Mach. Learn. Res. 2006;7:2541–2563.
