Application of fused graphical lasso to statistical inference for multiple sparse precision matrices

Qiuyan Zhang; Lingrui Li; Hu Yang

doi:10.1371/journal.pone.0304264

. 2024 May 31;19(5):e0304264. doi: 10.1371/journal.pone.0304264

Application of fused graphical lasso to statistical inference for multiple sparse precision matrices

Qiuyan Zhang ^1,^*, Lingrui Li ¹, Hu Yang ²

Editor: Debo Cheng³

PMCID: PMC11142621 PMID: 38820407

Abstract

In this paper, the fused graphical lasso (FGL) method is used to estimate multiple precision matrices from multiple populations simultaneously. The lasso penalty in the FGL model is a restraint on sparsity of precision matrices, and a moderate penalty on the two precision matrices from distinct groups restrains the similar structure across multiple groups. In high-dimensional settings, an oracle inequality is provided for FGL estimators, which is necessary to establish the central limit law. We not only focus on point estimation of a precision matrix, but also work on hypothesis testing for a linear combination of the entries of multiple precision matrices. We apply a de-biasing technology, which is used to obtain a new consistent estimator with known distribution for implementing the statistical inference, and extend the statistical inference problem to multiple populations. The corresponding de-biasing FGL estimator and its asymptotic theory are provided. A simulation study and an application of the diffuse large B-cell lymphoma data show that the proposed test works well in high-dimensional situation.

Introduction

Undirected graphical models are popular tools for representing the network structure of data and have been widely applied in many domains, such as machine learning, gene pattern recognition, and financial data analysis. Letting x = (x₁, …, x^p)^T be a p-variate normal random vector with mean vector μ and covariance Σ₀ (Σ₀ is positive definite), the precision matrix (or concentration matrix) is denoted the inverse of the covariance matrix, i.e., $Θ_{0} ≔ Σ_{0}^{- 1}$ . The graphical models capture conditional dependence relationships between random variables via non-zero entries in a precision matrix. If Θ_0ij ≠ 0, xⁱ and x^j, i, j = 1, …, p are dependent on each other, given all other variables. Meanwhile, the zero entries in the precision matrix correspond to pairs of variables that are conditionally independent given other variables. Therefore, the graph model is closely related to the precision matrix. The research of estimating and testing of a precision matrix have been a rapidly growing research direction in the past few years.

Letting x₁, …, x_n be a sequence of independent and identically distributed (i.i.d.) observations from the population x, X_p×n ≔ (x₁, …, x_n). A natural estimator of the precision matrix is the inverse of the sample covariance matrix $\hat{Σ}$ , where $\hat{Σ} = \frac{1}{n} X^{T} X$ . On the one hand, in high-dimensional settings, Johnstone [1] proposed that the eigenvalues of the sample covariance matrix do not converge to the corresponding eigenvalue of the population covariance matrix for Σ = I. Consequently, this estimator becomes invalid when the dimension p is comparable to the sample size n. On the other hand, the sample covariance matrix is singular in a p > n − 1 setting. This will produce non-negligible errors in using ${\hat{Σ}}_{n}^{- 1}$ to estimate Θ₀. In addition, a sparse (i.e., many entries are either zero or nearly so) assumption for a high-dimensional precision matrix is essential, since the zero entries imply the conditional independence structures, which are what we are most concerned with in the graphical model. In general, ${\hat{Σ}}_{n}^{- 1}$ does not have a sparsity construction. How to estimate the sparse precision matrix in high-dimensional settings is an intractable problem.

In recent years, various proposals have been put forward for estimating a precision matrix in high-dimensional situations, among which the graphical model with sparsity-promoting penalties is valid for obtaining a sparse estimator. By applying the l₁ (lasso penalty) to the entries of the concentration matrix, Yuan and Lin [2] proposed a max-det algorithm to obtain the estimator of Θ₀. The convergence result of the estimator is derived under a p fixed assumption. Using a coordinate descent procedure, Friedman et al. [3] provided an algorithm for solving a graphical Lasso estimator that is remarkably fast, even if p > n. Rothman et al. [4] investigated a sparse permutation invariant covariance estimator, and established a convergence rate of the estimator in the Frobenius norm as both data dimension p and sample size n are allowed to grow, and showed that the rate explicitly depends on how sparse the true concentration matrix is. For additional theoretical details on penalized likelihood methods for graphical models, see Fan et al. [5], Ravikumar et al. [6], Xue and Zou [7], and Yuan et al. [8].

The above-mentioned methods focus on estimating a single graphical model. However, joint estimators perform better in recovering the truth graphs for multiple graphical models, when graphs sharing the similar structure. Guo et al. [9] studied joint estimation of precision matrices that have a hierarchical structure assumption. Zhang et al. [10] proposed a new joint group lasso penalty to restore the joint graphical model. Their method was applied for multiple gene networks data with several subpopulations and data types. A fused graphical lasso was proposed by Danaher et al. [11] with a penalty imposing a similar structure of a precision matrix across groups. Supposing that $X_{p \times n_{k}}^{[k]} ≔ (x_{1}^{[k]}, \dots, x_{n_{k}}^{[k]})$ are sample matrices, and $x_{i}^{[k]} \in R^{p} (i = 1, \dots, n_{k})$ are sampled i.i.d. from a distribution with mean μ^[k] and covariance $Σ_{0}^{[k]}$ , for k = 1, …, K, we assume μ^[k] = 0 without loss of generality. To simplify notation, we omit the subscript of $X_{p \times n_{k}}^{[k]}$ , and denote the sample matrices as X^[k]. The population precision matrix is defined as the inverse of the population covariance matrix, i.e., $Θ_{0}^{[k]} = {(Σ_{0}^{[k]})}^{- 1}$ . The estimators of precision matrices ${Θ_{0}^{[k]}}$ are investigated by minimizing the negative penalized log likelihood

\begin{matrix} {{\hat{Θ}}^{[k]}} = {arg min}_{{Θ^{[k]} \in S^{+ +}}} \sum_{k} {t r ({\hat{Σ}}^{[k]} Θ^{[k]}) - log det (Θ^{[k]})} + P ({Θ^{[k]}}), \end{matrix}

(1)

where P({Θ^[k]}) denotes the penalty function, the ${{\hat{Θ}}^{[k]}}$ are the minimizers of (1), and we optimize over the symmetric positive-definite matrices set $S^{+ +}$ . The fused graphical lasso (FGL) is the solution to optimization problem (1) with the fused lasso penalty

\begin{matrix} P ({Θ^{[k]}}) = λ \sum_{k = 1}^{K} | | {(Θ^{[k]})}^{-} {| |}_{1} + ρ \sum_{k < k^{'}} | | {(Θ^{[k]} - Θ^{[k^{'}]})}^{-} {| |}_{1}, \end{matrix}

(2)

where λ and ρ are non-negative regularization parameters, (Θ^[k])⁻ represents the matrix obtained by setting the diagonal elements of (Θ^[k]) to zero, and || ⋅ ||₁ denotes the l₁ norm of a vector or matrix. It is reasonable to restrict non-diagonal elements of Θ^[k], since we are most concerned with the conditional independence cross-different variables. Note that the first term in (2) is the classical lasso penalty, which shrinks the coefficients toward 0 as λ increases. It guarantees discovery of the sparse estimators ${{\hat{Θ}}^{[k]}}$ of the model. The penalty on (Θ^[k] − Θ^[k′])⁻ indicates that the elements of ${\hat{Θ}}^{[1]}, \dots, {\hat{Θ}}^{[K]}$ have a similar network structure across classes.

An approach for the estimation of the joint graphical models largely relies on penalized estimation. The penalty biases the estimates toward the assumed structure, which makes hypothesis tests for precision matrices more challenging. Work on statistical inference for low-dimensional parameters in graphical models has recently been carried out (Janková and van de Geer [12]; Janková and van de Geer [13]; Ren et al. [14]; Yu et al. [15]) based on the l₁-penalized estimator. Janková and van de Geer [12] provided a de-biasing technique to obtain a new consistent estimator with known distribution. However, these approaches were developed only in the setting in which the parameters of one graph are inferred. In contrast, studies of inference techniques using estimators obtained from cross-group penalization are much fewer. The work on statistical inference for multiple graphical models is an interesting area open for future research. Inspired by Janková and van de Geer [12], we not only give FGL estimators of multiple precision matrices from co-movement data, but also test the linear combination of the entries of these precision matrices. The core of the proposed method is based on the de-biasing technique, and we implement statistical inference of the precision matrices under high-dimensional settings according to the proposed central limit theorem.

The rest of this paper is organized as follows. In Main results section, we give the oracle inequality for multiple estimators with a FGL penalty and its weighted version. Testing the hypothesis for the linear combination of corresponding entries of multiple precision matrices is also considered in this section. Based on de-biasing technology, the CLT of the proposed statistics for multiple populations is also derived in this section. In Numerical study part, we report the results of simulations. In Real Data Application, we apply the proposed method to identification of gene-to-gene interaction of the diffuse large B-cell lymphoma data. All technical details are relegated to the Proof of Theorem part.

Main results

We assume following notation throughout the paper. For a matrix $A = {(a_{i j})}_{i, j = 1}^{p}$ , we denote (A)_ij as (i, j)-entry of A, or denote its (i, j)-entry as A_ij to simplify the notation. We write $| A |$ for the determinant of A, and the trace of matrix A is denoted tr(A). Letting A⁺ = diag(A) for a diagonal matrix with the same diagonal as A, A⁻ = A − A⁺. ${| | A | |}_{F}^{2} = \sum_{i, j} a_{i j}^{2}$ denotes the Frobenius norm (also known as the matrix 2-norm). We use the notation ||A||_∞ = max_i,j|a_ij| for the supremum norm of a matrix A, and |||A|||₁ ≔ max_j ∑_i|a_ij| for the l₁-operator norm.

We write $f (n) = O (g (n))$ if f(n) ≤ cg(n) for some constant c < ∞, and f(n) = Ω(g(n)) if f(n) ≥ c′g(n) for some constant c′ > 0. The notation f(n) ≍ g(n) means that $f (n) = O (g (n))$ and f(n) = Ω(g(n)). In the common high-dimensional setting, the dimension p is allowed to grow to infinity. The dimension is comparable, substantially larger or smaller than the sample size. We set sample sizes n₁ ≍ … ≍ n_K ≍ n throughout the paper, and n* = n₁ + … + n_K going to infinity. Furthermore, for notational simplicity, we assume that n₁ = … = n_K = n.

Oracle inequality

To obtain the oracle inequality of multiple estimators of FGL models, we introduce some notation related to the sparsity assumptions on the entries of the true precision matrix. Letting

\begin{matrix} S_{k} ≔ {(i, j) : Θ_{0 i j}^{[k]} \neq 0, i \neq j}, \end{matrix}

where $Θ_{0 i j}^{[k]}$ is the (i, j)-entry of $Θ_{0}^{[k]}$ and s_k = |S_k| is the cardinality of S_k, we adopt the boundedness of the eigenvalues of the true precision matrix and certain tail conditions proposed by Janková and Van De Geer [12].

Condition 1 (Bounded eigenvalues) There exist universal constants L for k such that

\begin{matrix} 0 < L < Λ_{min} (Θ_{0}^{[k]}) \leq Λ_{max} (Θ_{0}^{[k]}) < 1 / L < \infty, \end{matrix}

where Λ_min and Λ_max denote the minimum and maximum eigenvalues of a matrix, respectively.

Condition 2 (Sub-Gaussianity vector condition) The observations $x_{i}^{[k]}$ , i = 1, …, n_k, are uniformly sub-Gaussian vectors in the respective groups.

We propose the oracle inequality for FGL lasso under the K = 2 situation.

Theorem 1 Supposing that Conditions 1 and 2 hold, for k = 1, 2, tuning parameter λ satisfying 2(ρ + λ₀) ≤ λ ≤ c/8L, and $\frac{8 λ^{2} (s_{1} + s_{2})}{c} + \frac{4 p λ_{0}^{2}}{c} \leq λ_{0} / 2 L$ . On the set ${{max}_{k} | | {\hat{Σ}}^{[k]} - Σ_{0}^{[k]} | |_{\infty} \leq λ_{0}}$ , k = 1, 2, it holds that

\begin{matrix} c \sum_{k = 1}^{2} | | {\hat{Θ}}^{[k]} - Θ_{0}^{[k]} {| |}_{F}^{2} + λ \sum_{k = 1}^{2} | | {({\hat{Θ}}^{[k]} - Θ_{0}^{[k]})}^{-} {| |}_{1} \leq \frac{8 λ^{2} (s_{1} + s_{2})}{c} + \frac{4 p λ_{0}^{2}}{c}, \end{matrix}

and

\begin{matrix} \begin{matrix} \sum_{k = 1}^{2} | | | {\hat{Θ}}^{[k]} - Θ_{0}^{[k]} {| | |}_{1} \leq \frac{4 λ (8 s_{1} + 8 s_{2} + p)}{c}, \end{matrix} \end{matrix}

where c = 1/(8L²).

Remark 1 From the inequality, we must select λ so that λp → 0 as n → ∞ to ensure consistency, which is not satisfied by a sub-Gaussianity random vector. Thus, the condition λp → 0 excludes the p ≫ n situation.

The FGL does not take into account that the variables have, in general, different scaling. Thus, we consider the weighted FGL. The minimizer of the optimization problem (1) with weighted FGL penalty

\begin{matrix} P ({Θ^{[k]}}) = λ \sum_{k} \sum_{i \neq j} {\hat{W}}_{i i}^{[k]} {\hat{W}}_{j j}^{[k]} | Θ_{i j}^{[k]} | + ρ \sum_{k < k^{'}} \sum_{i \neq j} | {\hat{W}}_{i i}^{[k]} {\hat{W}}_{j j}^{[k]} Θ_{i j}^{[k]} - {\hat{W}}_{i i}^{[k^{'}]} {\hat{W}}_{j j}^{[k^{'}]} Θ_{i j}^{[k^{'}]} | \end{matrix}

(3)

is denoted ${{\hat{Θ}}_{w}^{[k]}}$ , where ${\hat{W}}^{[k]} = [d i a g ({\hat{Σ}}^{[k]})]^{\frac{1}{2}}$ . Further, the population correlation matrix is denoted $R_{0}^{[k]}$ and the sample correlation matrix is denoted

\begin{matrix} {\hat{R}}^{[k]} = {({\hat{W}}^{[k]})}^{- 1} {\hat{Σ}}^{[k]} {({\hat{W}}^{[k]})}^{- 1} . \end{matrix}

If we substitute ${\hat{R}}^{[k]}$ for ${\hat{Σ}}^{[k]}$ , the minimizer of

\begin{matrix} {arg min}_{{Θ^{[k]} \in S^{+ +}}} \sum_{k} {t r ({\hat{R}}^{[k]} Θ^{[k]}) - log det (Θ^{[k]})} + P ({Θ^{[k]}}) \end{matrix}

(4)

with a FGL penalty (2) is denoted ${{\hat{Θ}}_{R}^{[k]}}$ , which is a matter of estimating the parameter by the normalized data. Then,

\begin{matrix} {\hat{Θ}}_{R}^{[k]} = {\hat{W}}^{[k]} {\hat{Θ}}_{w}^{[k]} {\hat{W}}^{[k]}, \end{matrix}

which means, essentially, that ${\hat{Θ}}_{R}^{[k]}$ are the estimators of $Θ_{R 0}^{[k]} ≔ (R_{0}^{[k]})^{- 1}$ .

Theorem 2 Under the conditions of Theorem 1, on the set ${{max}_{k} | | {\hat{R}}^{[k]} - R_{0}^{[k]} | |_{\infty} \leq λ_{0}}$ , k = 1, 2, it holds that

\begin{matrix} \begin{matrix} c \sum_{k = 1}^{2} | | {\hat{Θ}}_{R}^{[k]} - Θ_{R 0}^{[k]} {| |}_{F}^{2} + λ \sum_{k = 1}^{2} | | {({\hat{Θ}}_{R}^{[k]} - Θ_{R 0}^{[k]})}^{-} {| |}_{1} \leq \frac{8 λ^{2} (s_{1} + s_{2})}{c}, \end{matrix} \end{matrix}

(5)

\begin{matrix} \begin{matrix} \sum_{k = 1}^{2} | | | {\hat{Θ}}_{R}^{[k]} - Θ_{R 0}^{[k]} {| | |}_{1} \leq \frac{32 λ (s_{1} + s_{2})}{c}, \end{matrix} \end{matrix}

(6)

and

\begin{matrix} \begin{matrix} \sum_{k = 1}^{2} | | | {\hat{Θ}}_{w}^{[k]} - Θ_{0}^{[k]} {| | |}_{1} \leq \frac{32 λ (s_{1} + s_{2})}{c} . \end{matrix} \end{matrix}

(7)

It is natural to extend this conclusion to the K > 2 FGL model. For k = 1, …, K and the K > 2 situation, we obtain the following theorem.

Theorem 3 (Multiple FGL model) Supposing that Conditions 1 and 2 hold, for K > 2, $2 (\frac{K (K - 1)}{2} ρ + λ_{0}) \leq λ \leq c / 8 L$ , and $\frac{8 λ^{2} \sum_{k = 1}^{K} s_{k}}{c} + \frac{2 K p λ_{0}^{2}}{c} \leq λ_{0} / 2 L$ , on the set ${{max}_{k} | | {\hat{Σ}}^{[k]} - Σ_{0}^{[k]} | |_{\infty} \leq λ_{0}}$ , k = 1, …, K, it holds that

\begin{matrix} \begin{matrix} c \sum_{k = 1}^{K} | | {\hat{Θ}}^{[k]} - Θ_{0}^{[k]} {| |}_{F}^{2} + λ \sum_{k = 1}^{K} | | {({\hat{Θ}}^{[k]} - Θ_{0}^{[k]})}^{-} {| |}_{1} \leq \frac{8 λ^{2} \sum_{k = 1}^{K} s_{k}}{c} + \frac{2 K p λ_{0}^{2}}{c} \end{matrix} \end{matrix}

(8)

and

\begin{matrix} \begin{matrix} \sum_{k = 1}^{K} | | | {\hat{Θ}}^{[k]} - Θ_{0}^{[k]} {| | |}_{1} \leq \frac{2 K λ (8 \sum_{k = 1}^{K} s_{k} + \frac{K p}{2})}{c} . \end{matrix} \end{matrix}

(9)

Theorem 4 (Multiple FGL model for weighted version) Under the conditions of Theorem 3, on the set ${{max}_{k} | | {\hat{R}}^{[k]} - R_{0}^{[k]} | |_{\infty} \leq λ_{0}}$ , k = 1, 2, it holds that

\begin{matrix} \begin{matrix} c \sum_{k = 1}^{K} | | {\hat{Θ}}_{R}^{[k]} - Θ_{R 0}^{[k]} {| |}_{F}^{2} + λ \sum_{k = 1}^{K} | | {({\hat{Θ}}_{R}^{[k]} - Θ_{R 0}^{[k]})}^{-} {| |}_{1} \leq \frac{8 λ^{2} \sum_{k = 1}^{K} s_{k}}{c}, \end{matrix} \end{matrix}

(10)

\begin{matrix} \begin{matrix} \sum_{k = 1}^{K} | | | {\hat{Θ}}_{R}^{[k]} - Θ_{R 0}^{[k]} {| | |}_{1} \leq \frac{16 K λ \sum_{k = 1}^{K} s_{k}}{c}, \end{matrix} \end{matrix}

(11)

and

\begin{matrix} \begin{matrix} \sum_{k = 1}^{K} | | | {\hat{Θ}}_{w}^{[k]} - Θ_{0}^{[k]} {| | |}_{1} \frac{16 K λ \sum_{k = 1}^{K} s_{k}}{c} . \end{matrix} \end{matrix}

(12)

Asymptotic property

We not only focus on the point estimation of multiple precision matrices, but also on hypothesis testing for the linear combination of the entries of the precision matrices over two groups. One may want to test whether the elements of the precision matrix over two groups are equal:

\begin{matrix} H_{0} : Θ_{0 i j}^{[1]} = Θ_{0 i j}^{[2]} v s . H_{1} : Θ_{0 i j}^{[1]} \neq Θ_{0 i j}^{[2]} . \end{matrix}

(13)

To test Hypothesis (13), we aim to obtain confidence intervals for estimators based on the de-biasing technique, which is imposed for eliminating the bias associated with the penalty. The de-biasing estimator is defined as ${\hat{Θ}}_{d}^{[k]} = 2 {\hat{Θ}}^{[k]} - {\hat{Θ}}^{[k]} {\hat{Σ}}^{[k]} {\hat{Θ}}^{[k]}$ . The difference between the de-biasing estimator and the true value can be decomposed into two parts as follows:

\begin{matrix} {\hat{Θ}}_{d}^{[k]} - Θ_{0}^{[k]} = Ξ^{[k]} + ϒ^{[k]}, \end{matrix}

where

\begin{matrix} Ξ^{[k]} = - Θ_{0}^{[k]} ({\hat{Σ}}^{[k]} - Σ_{0}^{[k]}) Θ_{0}^{[k]}, \\ ϒ^{[k]} = - ({\hat{Θ}}^{[k]} - Θ_{0}^{[k]}) ({\hat{Σ}}^{[k]} - Σ_{0}^{[k]}) Θ_{0}^{[k]} - ({\hat{Θ}}^{[k]} - Θ_{0}^{[k]}) ({\hat{Σ}}^{[k]} {\hat{Θ}}^{[k]} - I_{p}) . \end{matrix}

Under the compatibility conditions, Janková and van de Geer [16] proposed that the (i, j)-entry of ${\hat{Θ}}_{d}^{[k]} - Θ_{0}^{[k]}$ has an asymptotic normality property, and $\sqrt{n} | | ϒ^{[k]} {| |}_{\infty}$ converges to zero in probability. Thus, for testing Hypothesis (13), we construct the testing statistic

\begin{matrix} T_{i j} ≔ ({\hat{Θ}}_{d}^{[1]} - {\hat{Θ}}_{d}^{[2]})_{i j} = [2 {\hat{Θ}}^{[1]} - {\hat{Θ}}^{[1]} {\hat{Σ}}^{[1]} {\hat{Θ}}^{[1]} - (2 {\hat{Θ}}^{[2]} - {\hat{Θ}}^{[2]} {\hat{Σ}}^{[2]} {\hat{Θ}}^{[2]})]_{i j} \end{matrix}

(14)

using de-biasing estimators.

For K = 2, we let

\begin{matrix} s = max {s_{1}, s_{2}}, d = max {d_{1}, d_{2}}, \end{matrix}

where

\begin{matrix} d_{k} = max_{j = 1, \dots, p} | D_{j}^{[k]} |, D_{j}^{[k]} = {(i, j) : Θ_{0 i j}^{[k]} \neq 0, i \neq j} . \end{matrix}

Next, we establish the central limit theorem for T_ij.

Theorem 5 Assuming Conditions 1, 2, and $λ ≍ ρ ≍ \sqrt{log p / n}$ and $(p + s) \sqrt{d} = o (\sqrt{n} / log p)$ , it holds that

\begin{matrix} {\hat{Θ}}_{d}^{[1]} - {\hat{Θ}}_{d}^{[2]} - (Θ_{0}^{[1]} - Θ_{0}^{[2]}) = Ξ^{[1]} - Ξ^{[2]} + r e m, \end{matrix}

(15)

where

\begin{matrix} {| | r e m | |}_{\infty} = | | ϒ^{[1]} - ϒ^{[2]} {| |}_{\infty} = o_{p} (1 / \sqrt{n}), \end{matrix}

(16)

and o_p denotes the convergence in probability. Moreover,

\begin{matrix} \sqrt{n} [T_{i j} - Θ_{0 i j}] \to_{D} N (0, σ_{i j}^{2}), \end{matrix}

(17)

where $Θ_{0 i j} = {(Θ_{0}^{[1]} - Θ_{0}^{[2]})}_{i j}$ .

To complete the testing procedure, we use the consistent estimator ${\hat{σ}}_{i j}^{2} = {({\hat{Θ}}^{[1]})}_{i i} {({\hat{Θ}}^{[1]})}_{j j} + {({\hat{Θ}}^{[1]})}_{i j}^{2} + {({\hat{Θ}}^{[2]})}_{i i} {({\hat{Θ}}^{[2]})}_{j j} + {({\hat{Θ}}^{[2]})}_{i j}^{2}$ for Theorem 5. Theorem 5 provide a practical and efficient way of obtaining the p value and critical value for the test statistic. Under a null hypothesis, we observe that $Θ_{0 i j}^{[1]} - Θ_{0 i j}^{[2]} = 0$ . For an α level of significance, we reject H₀ if $| \sqrt{n} T_{i j} / {\hat{σ}}_{i j}^{2} | > ξ_{α / 2}$ , where ξ_α is the 1 − α upper quantile of the standard normal distribution.

Theorem 5 requires a stronger sparsity condition than the corresponding oracle-type inequality in Theorem 1. According to the convergence rate of $(p + s) \sqrt{d}$ , Theorem 5 applies to the p ≪ n situation. For p ≫ n, we provide the following theorem.

Theorem 6 Assuming Conditions 1, 2, and $λ ≍ ρ ≍ \sqrt{log p / n}$ and $s \sqrt{d} = o (\sqrt{n} / log p)$ , for the p ≪ n regime, the Eq (22) holds with ${\hat{Θ}}_{w}^{[k]}$ , where

\begin{matrix} {| | r e m | |}_{\infty} = o_{p} (1 / \sqrt{n}) . \end{matrix}

(18)

In addition,

\begin{matrix} \sqrt{n} [T_{w i j} - Θ_{0 i j}] \to_{D} N (0, σ_{i j}^{2}), \end{matrix}

(19)

where $T_{w i j} = {(2 {\hat{Θ}}_{w}^{[1]} - {\hat{Θ}}_{w}^{[1]} {\hat{Σ}}^{[1]} {\hat{Θ}}_{w}^{[1]})}_{i j} - {(2 {\hat{Θ}}_{w}^{[2]} - {\hat{Θ}}_{w}^{[2]} {\hat{Σ}}^{[2]} {\hat{Θ}}_{w}^{[2]})}_{i j}$ .

We do not need to impose the so-called irrepresentability condition on Σ to derive the theoretical properties of our estimators, in contrast to Brownlees et al. [17].

In addition, for the multi-sample precision matrix hypothesis problem, one may want to test a linear hypothesis testing problem:

\begin{matrix} H_{0} : a_{1} Θ_{0 i j}^{[1]} + \dots + a_{K} Θ_{0 i j}^{[K]} = 0 v s . H_{1} : not H_{0}, \end{matrix}

(20)

where a₁, …, a_K are known constants. Similar to the two-sample case, we proposed the test statistic

\begin{matrix} a_{1} {\hat{Θ}}_{d i j}^{[1]} + \dots + a_{K} {\hat{Θ}}_{d i j}^{[K]} . \end{matrix}

(21)

For the K > 2 multiple situation, we assume s = max{s₁, …, s_K} and d = max{d₁, …, d_K}. Consequently, we establish the asymptotic normality of the proposed statistic in the following corollary, i.e., Corollary 1.

Corollary 1 Under the assumptions of Theorem 5, it holds that

\begin{matrix} f ({\hat{Θ}}_{d}^{[1]}, \dots, {\hat{Θ}}_{d}^{[K]}) - f (Θ_{0}^{[1]}, \dots, Θ_{0}^{[K]}) = f (Ξ^{[1]}, \dots, Ξ^{[K]}) + r e m, \end{matrix}

(22)

\begin{matrix} {| | r e m | |}_{\infty} = | | f (ϒ^{[1]}, \dots, ϒ^{[K]}) {| |}_{\infty} = o_{p} (1 / \sqrt{n}), \end{matrix}

(23)

where f(x₁, …, x_K) = a₁x₁ + … + a_Kx_K. In addition,

\begin{matrix} \sqrt{n} [T_{i j} - Θ_{0 i j}] \to_{D} N (0, σ_{i j}^{2}), \end{matrix}

(24)

where $T_{i j} = f ({\hat{Θ}}_{d i j}^{[1]}, \dots, {\hat{Θ}}_{d i j}^{[K]})$ and $Θ_{0 i j} = f (Θ_{0 i j}^{[1]}, \dots, Θ_{0 i j}^{[K]})$ .

The asymptotic variance σ_ij in Corollary 1 is unknown, so to construct confidence intervals we use a consistent estimator

\begin{matrix} {\hat{σ}}_{i j}^{2} = f_{v} ([{({\hat{Θ}}^{[1]})}_{i i} {({\hat{Θ}}^{[1]})}_{j j} + {({\hat{Θ}}^{[1]})}_{i j}^{2}], \dots, [{({\hat{Θ}}^{[K]})}_{i i} {({\hat{Θ}}^{[K]})}_{j j} + {({\hat{Θ}}^{[K]})}_{i j}^{2}]), \end{matrix}

where $f_{v} (x_{1}, \dots, x_{K}) = a_{1}^{2} x_{1} + \dots + a_{K}^{2} x_{K}$ . In addition, a weighted version is proposed as follows.

Corollary 2 Under the assumptions of Theorem 6, the residual term in (23) converges in probability with rate $1 / \sqrt{n}$ , and CLT in (24) holds by replacing ${\hat{Θ}}^{[k]}$ by ${\hat{Θ}}_{w}^{[k]}$ , which is obtained by solving the weighted FGL optimization problem.

Numerical study

Simulation experiments were carried out to evaluate the performance of the proposed de-biasing FGL test. We considered the sparse graphical model, and a random sample was generated from the multivariate normal distribution $N (0_{p}, {(Θ_{0}^{[k]})}^{- 1})$ with a population covariance matrix defined as the inverse of the population precision matrix.

To solve the graphical lasso problem with a certain penalty, we refer to the alternating direction method of multiplier (ADMM) algorithm, since it is guaranteed to converge to the global optimum. For more details, the reader is referred to Boyd et al. [18] and Danaher et al. [11]. When an objective method for selecting tuning parameters λ and ρ is required, the approximations of the Akaike information criterion (AIC), Bayesian information criterion, or cross-validation method can be used to select tuning parameters. The AIC method was chosen for the following simulation, and λ and ρ both range from 0.05 to 0.3 with a step of 0.0086, where the step is derived by (0.3 − 0.05)/(30 − 1).

In addition, all the reported simulation results are based on 500 simulations with a nominal significance level of 0.05, and we set the dimension to 100.

Fluctuations of test

We illustrated the theoretical asymptotic normality result on simulated data for testing the two-sample problem (13), and we set precision matrices equal under a null hypothesis, i.e., $Θ_{0}^{[1]} = Θ_{0}^{[2]}$ .

Letting G be a p × p symmetric graph matrix with diagonal entries 0 and $\tilde{α}$ percent of off-diagonal elements 1, and U be p × p matrix with elements i.i.d. generated from the uniformly distribution on the interval (0, 1), i.e., U(0, 1), we denote the elements of the symmetric matrix $\tilde{Θ}$ as ${\tilde{θ}}_{i j}$ . For i > j,

\begin{matrix} {\tilde{θ}}_{i j} = \frac{g_{i j} u_{i j} + g_{j i} u_{j i}}{2} - 1_{{\frac{g_{i j} u_{i j} + g_{j i} u_{j i}}{2} < 0.5}}, \end{matrix}

(25)

where g_ij and u_ij are the (i, j)-entry of G and U, respectively, and 1_{·} is the indicator function. For i < j, we set ${\tilde{θ}}_{i j} = {\tilde{θ}}_{j i}$ . The diagonal entries of matrix $\tilde{Θ}$ are zeros. Then, the precision matrix is generated as

\begin{matrix} Θ_{0}^{[k]} = \tilde{Θ} + (| Λ_{min} (\tilde{Θ}) | + 0.1) I_{p} . \end{matrix}

(26)

This shows that the matrix generated is symmetric and positive definite. To make the non-zero entries go away from 0 and to generate a sparse matrix, we subtract 1 from the non-zero elements. In addition, the precision matrix generation procedure shows that $\tilde{α}$ is a parameter controlling the sparsity. When $\tilde{α} = 1$ , a dense matrix is generated. As is well known, the sparsity of a matrix not only requires a small quantity of non-zero elements, but also a large absolute value of non-zero elements. The parameter $\tilde{α}$ controls sparsity in terms of the number of sparse elements.

We examined the fluctuation of $\sqrt{n} T_{i j} / {\hat{σ}}_{i j}$ under (p, n) = (100, 200) and (p, n) = (100, 400) settings for the extremely sparse and dense precision matrix cases, respectively. For the extremely sparse precision matrix case, we set the parameter $\tilde{α} = 0.01$ , and for dense case we use $\tilde{α} = 1$ .

We simulated the fluctuation for the extremely sparse case as shown in Fig 1 and the dense case in Fig 2. The index (i, j) in the simulation was intermittently chosen. In fact, the CLT provides the method for testing any element of the linear combination of the precision matrix. Theoretically, we can test for any index (i, j)-entry of Θ₀ whether the true value is zero or not.

Average coverage probabilities

We demonstrate the performance of the test method for the K = 2 situation on testing the hypothesis as follows.

Equal Null. Testing hypothesis (13);
Linear Null. Testing the linear null hypothesis $H_{0} : a_{1} Θ_{0 i j}^{[1]} + a_{2} Θ_{0 i j}^{[2]} = 0$ , i.e., $H_{0} : Θ_{0 i j}^{[2]} = - \frac{a_{1}}{a_{2}} Θ_{0 i j}^{[1]}$ . Without loss generation, we chose $- \frac{a_{1}}{a_{2}} = 0.5$ and $Θ_{0 i j}^{[1]}$ generated from (26).

From the global perspective, we used the average coverage, which is also considered in Janková and van de Geer [12]. Letting

\begin{matrix} I_{i j} ≔ [T_{i j} - 1.96 \frac{σ_{i j}}{\sqrt{n}}, T_{i j} + 1.96 \frac{σ_{i j}}{\sqrt{n}}] \end{matrix}

(27)

be the 95% asymptotic confidence interval for Θ_0ij, we substitute the estimator ${\hat{σ}}_{i j}$ for σ_ij to obtain the empirical version. The frequency of the true value being covered by the confidence interval (27) is defined as ${\hat{ϑ}}_{i j}$ . Then, the average coverage over a set A is denoted

\begin{matrix} A v g c o v_{A} = \frac{1}{| A |} \sum_{(i, j) \in A} {\hat{ϑ}}_{i j} . \end{matrix}

(28)

S denotes the set of non-zero entries of $Θ_{0 i j}^{[1]}$ . It is easy to check that S = S₁ = S₂ for the reason that $Θ_{0 i j}^{[1]}$ and $Θ_{0 i j}^{[2]}$ have same structure of sparsity for the Equal Null and Linear Null cases. Thus, for the different null hypotheses, we simulated the average coverage over S and its complementary set S^c. The parameter of sparsity is $\tilde{α} = 0.1, 0.5,$ and 0.9.

Partial results in Table 1 meet our expectation. However, we do not deny that the simulations are affected by randomness. In addition, the proposed method is based on the combination of estimation and hypothesis testing, which accumulates error. The simulation results provide guidance for practice.

Table 1. Estimated average coverage probabilities for K = 2 situation.

2* $\tilde{α}$	2*n	Equal Null		Linear Null
2* $\tilde{α}$	2*n	S	S ^c	S	S ^c
2*0.1	200	0.9886	0.9875	0.9101	0.9824
2*0.1	400	0.9885	0.9867	0.8607	0.9762
2*0.5	200	0.9880	0.9878	0.9384	0.9745
2*0.5	400	0.9870	0.9868	0.8820	0.9647
2*0.9	200	0.9901	0.9899	0.9509	0.9751
2*0.9	400	0.9889	0.9890	0.9091	0.9639

Open in a new tab

Multiple FGL case

For the multiple FGL case, we examined the fluctuation of the statistic T_ij for the K = 3 situation on testing the hypothesis as follows.

Three-sample Linear Null. Testing hypothesis $H_{0} : Θ_{0 i j}^{[3]} = - \frac{a_{1}}{a_{3}} Θ_{0 i j}^{[1]} - \frac{a_{2}}{a_{3}} Θ_{0 i j}^{[2]}$ , where $- \frac{a_{1}}{a_{3}} = 0.6$ and $- \frac{a_{2}}{a_{3}} = 0.9$ are both generated from U(0, 1). $Θ_{0 i j}^{[1]}$ and $Θ_{0 i j}^{[2]}$ are both generated from (26) with parameters 0.01 and 0.1, respectively.

We set $- \frac{a_{1}}{a_{3}}$ and $- \frac{a_{2}}{a_{3}}$ to positive numbers, since the setting of hypothesis testing should guarantee that ${Θ_{0 i j}^{[k]}}_{k = 1}^{3}$ are symmetric positive-definite matrices. Besides, for Three-sample Linear Null, S denotes the set of non-zero entries of $Θ_{0 i j}^{[1]} + \frac{a_{2}}{a_{3}} Θ_{0 i j}^{[2]}$ . The dimension and sample size are (p, n) = (100, 200) and (p, n) = (100, 400), respectively. Histograms of the proposed statistic T_ij at the

\begin{matrix} (i, j) \in {(1, 1), (1, 10), (1, 20), (1, 30)} \end{matrix}

locations of the precision matrix are presented in Fig 3.

Real data application

The lymphoma is a malignant tumor with increasing incidence and mortality year by year. In this part, we apply the proposed method to two sets of diffuse large B-cell lymphoma (DLBCL) data, denoted DLBCL-A [19] and DLBCL-B [20], which is available at http://portals.broadinstitute.org/cgibin/cancer/datasets.cgi. Some brief information on these datasets can be found in Table 2. The DLBCL-A and DLBCL-B datasets have 3 subgroups, and the label and sample size of each subgroup are shown in the 5th column in Table 2. Both DLBCL-A dataset and DLBCL-B dataset have a high dimension with 662 genes but only a few observations with the sample size 141 for the DLBCL-A dataset and 180 for the DLBCL-B dataset.

Table 2. Brief introduction to the gene profile expression datasets.

Dataset	n	p	Subgroups	Subgroup label (sample size)
DLBCL-A	141	662	3	I (49), II (50), III (42)
DLBCL-B	180	662	3	I (42), II (51), III (87)

Open in a new tab

Typically, we test for differences in mean vectors across different disease subgroups, however, the role of gene-to-gene interactions in the data across different subtypes remains unclear. In this section, we use our test approach to identify whether the gene-to-gene interactions that controls lymphoma most behave the same across different disease subtypes. For distinct gene subtypes of the same disease gene data, we focus on testing the equality of two precision matrices. The hypothesis testing problem is

\begin{matrix} H_{0} : Θ_{0 i j}^{t y p e i} = Θ_{0 i j}^{t y p e j} v s . H_{1} : Θ_{0 i j}^{t y p e i} \neq Θ_{0 i j}^{t y p e j} \end{matrix}

where type i and type i are chosen from I, II, III set in Table 2 and type i ≠ type j. We tune parameters with weighted FGL penalty in (3) by AIC criterion. After the tuning procedure, we estimate precision matrices, and then return a p × p matrix, whose (i, j)-th elements are p-value of statistic T_ij. The results are demonstrated in Figs 4 and 5.

Fig 4 — P-values of T_ij by comparing subtype I and subtype II (left), subtype II and subtype III (middle), and subtype I and subtype III (right) with DLBCL-A dataset.

Fig 5 — P-values of T_ij by comparing subtype I and subtype II (left), subtype II and subtype III (middle), and subtype I and subtype III (right) with DLBCL-B dataset.

As can be seen in the figure, the interactions between genes of DLBCL-A dataset are not the same among three different subtypes, while for DLBCL-B dataset, the interactions between genes of three different subtypes are mostly similar.

Proof of theorem

Proof of Theorem 1

To prove Theorem 1, we need a lemma of Janková and Van de Geer [16], which is present as follow.

Lemma 7 Let f(Δ) ≔ tr(ΔΣ₀) − [log det(Δ + Θ₀) − log det(Θ₀)]. Assume that 1/L ≤ λ_min(Θ₀) ≤ λ_max(Θ₀) ≤ L for some constant L ≥ 1. Then for all Δ such that ||Δ||_F ≤ 1/(2L), f(Δ) is well defined and

\begin{matrix} f (Δ) \geq \frac{1}{2 {(L + 1 / (2 L))}^{2}} {| | Δ | |}_{F}^{2} . \end{matrix}

To simplify the notation, we substitute ${\hat{Σ}}_{k}$ , Σ_0k, ${\hat{Θ}}_{k}$ , Θ_0k for ${\hat{Σ}}^{[k]}$ , $Σ_{0}^{[k]}$ , ${\hat{Θ}}^{[k]}$ , $Θ_{0}^{[k]}$ respectively.

Proof 1 Note that ${\hat{Θ}}_{k}$ is the minimum value of the fused graphical Lasso for k = 1, 2. Let ${\tilde{Θ}}_{k} = α_{k} {\hat{Θ}}_{k} + (1 - α_{k}) Θ_{0 k}$ , and $α_{k} = \frac{M}{M + | | {\hat{Θ}}_{k} - Θ_{0 k} {| |}_{F}}$ . According to the definitions of ${\tilde{Θ}}_{k}$ , and the convexity of loss function

\begin{matrix} F_{n} (Θ_{1}, Θ_{2}) \\ = & t r (Θ_{1} {\hat{Σ}}_{1}) - log det (Θ_{1}) + t r (Θ_{2} {\hat{Σ}}_{2}) - log det (Θ_{2}) \\ + λ | | Θ_{1}^{-} {| |}_{1} + λ | | Θ_{2}^{-} {| |}_{1} + ρ | | Θ_{1}^{-} - Θ_{2}^{-} {| |}_{1}, \end{matrix}

we obtain

\begin{matrix} F_{n} ({\tilde{Θ}}_{1}, {\tilde{Θ}}_{2}) \leq F_{n} (Θ_{01}, Θ_{02}) . \end{matrix}

That is

\begin{matrix} \sum_{k = 1}^{2} {t r ({\tilde{Θ}}_{k} - Θ_{0 k}) {\hat{Σ}}_{k} - (log det ({\tilde{Θ}}_{k}) - log det (Θ_{0 k})) + λ | | {\tilde{Θ}}_{k}^{-} {| |}_{1}} + ρ | | {\tilde{Θ}}_{1}^{-} - {\tilde{Θ}}_{2}^{-} {| |}_{1} \\ \leq & λ | | Θ_{01}^{-} {| |}_{1} + λ | | Θ_{02}^{-} {| |}_{1} + ρ | | Θ_{01}^{-} - Θ_{02}^{-} {| |}_{1} . \end{matrix}

(29)

Let $Δ_{k} = {\tilde{Θ}}_{k} - Θ_{0 k}$ , and

\begin{matrix} f (Δ_{k}) ≔ t r (Δ_{k} Σ_{0 k}) - [log det (Δ_{k} + Θ_{0 k}) - log det (Θ_{0 k})], \end{matrix}

subtracting $t r (Δ_{1} ({\hat{Σ}}_{1} - Σ_{01})) + t r (Δ_{2} ({\hat{Σ}}_{2} - Σ_{02}))$ on the both sides of the inequality (29), we get

\begin{matrix} \begin{matrix} f (Δ_{1}) + f (Δ_{2}) + λ | | {\tilde{Θ}}_{1}^{-} {| |}_{1} + λ | | {\tilde{Θ}}_{2}^{-} {| |}_{1} + ρ | | {\tilde{Θ}}_{1}^{-} - {\tilde{Θ}}_{2}^{-} {| |}_{1} \\ \leq & - t r (Δ_{1} ({\hat{Σ}}_{1} - Σ_{01})) - t r (Δ_{2} ({\hat{Σ}}_{2} - Σ_{02})) + λ | | Θ_{01}^{-} {| |}_{1} + λ | | Θ_{02}^{-} {| |}_{1} + ρ | | Θ_{01}^{-} - Θ_{02}^{-} {| |}_{1} . \end{matrix} \end{matrix}

(30)

For $t r (Δ_{k} ({\hat{Σ}}_{k} - Σ_{0 k}))$ term, we have

\begin{matrix} \begin{matrix} | t r (Δ_{k} ({\hat{Σ}}_{k} - Σ_{0 k})) | & = | G (Δ_{k} \circ ({\hat{Σ}}_{k} - Σ_{0 k})) | \\ \leq | G (Δ_{k}^{-} \circ ({\hat{Σ}}_{k}^{-} - Σ_{0 k}^{-})) | + | G (Δ_{k}^{+} \circ ({\hat{Σ}}_{k}^{+} - Σ_{0 k}^{+})) |, \end{matrix} \end{matrix}

where function G(M) takes the summation of all the elements of the matrix M, and ∘ is Hadamard product. According to Cauchy-Schwarz inequality, on the sets ${{max}_{k} {| | {\hat{Σ}}_{k} - Σ_{0 k} {| |}_{\infty}} \leq λ_{0}}$ ,

\begin{matrix} \begin{matrix} | G (Δ_{k}^{-} \circ ({\hat{Σ}}_{k}^{-} - Σ_{0 k}^{-})) | + | G (Δ_{k}^{+} \circ ({\hat{Σ}}_{k}^{+} - Σ_{0 k}^{+})) | \\ \leq | | {\hat{Σ}}_{k}^{-} - Σ_{0 k}^{-} {| |}_{\infty} | | Δ_{k}^{-} {| |}_{1} + | | {\hat{Σ}}_{k}^{+} - Σ_{0 k}^{+} {| |}_{F} | | Δ_{k}^{+} {| |}_{F} \\ \leq λ_{0} | | Δ_{k}^{-} {| |}_{1} + | | {\hat{Σ}}_{k}^{+} - Σ_{0 k}^{+} {| |}_{F} | | Δ_{k}^{+} {| |}_{F} . \end{matrix} \end{matrix}

Hence,

\begin{matrix} \begin{matrix} - t r (Δ_{k} ({\hat{Σ}}_{k} - Σ_{0 k})) & \leq | t r (Δ_{k} ({\hat{Σ}}_{k} - Σ_{0 k})) | \\ \leq λ_{0} | | Δ_{k}^{-} {| |}_{1} + | | {\hat{Σ}}_{k}^{+} - Σ_{0 k}^{+} {| |}_{F} | | Δ_{k}^{+} {| |}_{F} . \end{matrix} \end{matrix}

(31)

Next, for L_k ≥ 1 satisfying condition

\begin{matrix} 1 / L_{k} \leq λ_{m i n} (Θ_{0 k}) \leq λ_{m a x} (Θ_{0 k}) \leq L_{k}, \end{matrix}

we choose L > 1 satisfying 1/L ≤ 1/L_k and L_k ≤ L, k = 1, 2. Based on the definitions of Δ_k and ${\tilde{Θ}}_{k}$ , we get

\begin{matrix} \begin{matrix} | | Δ_{k} {| |}_{F} = α_{k} | | {\hat{Θ}}_{k} - Θ_{0 k} {| |}_{F} = \frac{| | {\hat{Θ}}_{k} - Θ_{0 k} {| |}_{F}}{M + | | {\hat{Θ}}_{k} - Θ_{0 k} {| |}_{F}} M, \end{matrix} \end{matrix}

(32)

for arbitrary M in (0, 1/2L]. Thus, ||Δ_k||_F is bounded by M, i.e., ||Δ_k||_F ≤ M. For f(Δ_k) term, based on Lemma 7, we have

\begin{matrix} f (Δ_{k}) \geq c | | {\tilde{Θ}}_{k} - Θ_{0 k} {| |}_{F}^{2}, \end{matrix}

(33)

where $c = \frac{1}{2 {(L + 1 / (2 L))}^{2}}$ . In particular, we choose c = 1/(8L²), and the inequality (33) still holds.

Using bounds (31) and (33), the inequality (30) turns to be

\begin{matrix} \begin{matrix} c | | {\tilde{Θ}}_{1} - Θ_{01} {| |}_{F}^{2} + c | | {\tilde{Θ}}_{2} - Θ_{02} {| |}_{F}^{2} + λ | | {\tilde{Θ}}_{1}^{-} {| |}_{1} + λ | | {\tilde{Θ}}_{2}^{-} {| |}_{1} + ρ | | {\tilde{Θ}}_{1}^{-} - {\tilde{Θ}}_{2}^{-} {| |}_{1} \\ \leq & λ_{0} | | Δ_{1}^{-} {| |}_{1} + λ_{0} | | Δ_{2}^{-} {| |}_{1} + | | {\hat{Σ}}_{1}^{+} - Σ_{01}^{+} {| |}_{F} | | Δ_{1}^{+} {| |}_{F} + | | {\hat{Σ}}_{2}^{+} - Σ_{02}^{+} {| |}_{F} | | Δ_{2}^{+} {| |}_{F} \\ + λ | | Θ_{01}^{-} {| |}_{1} + λ | | Θ_{02}^{-} {| |}_{1} + ρ | | Θ_{01}^{-} - Θ_{02}^{-} {| |}_{1} . \end{matrix} \end{matrix}

(34)

We move some terms of the inequality (34) and combine them to get the following inequality

\begin{matrix} \begin{matrix} c | | {\tilde{Θ}}_{1} - Θ_{01} {| |}_{F}^{2} + c | | {\tilde{Θ}}_{2} - Θ_{02} {| |}_{F}^{2} \\ + λ {| | {\tilde{Θ}}_{1}^{-} {| |}_{1} - | | Θ_{01}^{-} {| |}_{1} + | | {\tilde{Θ}}_{2}^{-} {| |}_{1} - | | Θ_{02}^{-} {| |}_{1}} \\ \leq & λ_{0} {| | {\tilde{Θ}}_{1}^{-} - Θ_{01}^{-} {| |}_{1} + | | {\tilde{Θ}}_{2}^{-} - Θ_{02}^{-} {| |}_{1}} + ρ {| | Θ_{01}^{-} - Θ_{02}^{-} {| |}_{1} - | | {\tilde{Θ}}_{1}^{-} - {\tilde{Θ}}_{2}^{-} {| |}_{1}} \\ + | | {\hat{Σ}}_{1}^{+} - Σ_{01}^{+} {| |}_{F} | | {\tilde{Θ}}_{1}^{+} - Θ_{01}^{+} {| |}_{F} + | | {\hat{Σ}}_{2}^{+} - Σ_{02}^{+} {| |}_{F} | | {\tilde{Θ}}_{2}^{+} - Θ_{02}^{+} {| |}_{F} . \end{matrix} \end{matrix}

(35)

Next we need to prove three inequations:

\begin{matrix} | | {\tilde{Θ}}_{k}^{-} {| |}_{1} - | | Θ_{0 k}^{-} {| |}_{1} \geq | | Δ_{k S_{k}^{c}}^{-} {| |}_{1} - | | Δ_{k S_{k}}^{-} {| |}_{1}, \end{matrix}

(36)

\begin{matrix} | | {\tilde{Θ}}_{k}^{-} - Θ_{0 k}^{-} {| |}_{1} \leq | | Δ_{k S_{k}^{c}}^{-} {| |}_{1} + | | Δ_{k S_{k}}^{-} {| |}_{1}, \end{matrix}

(37)

\begin{matrix} | | Θ_{01}^{-} - Θ_{02}^{-} {| |}_{1} - | | {\tilde{Θ}}_{1}^{-} - {\tilde{Θ}}_{2}^{-} {| |}_{1} \leq | | {\tilde{Θ}}_{1}^{-} - Θ_{01}^{-} {| |}_{1} + | | {\tilde{Θ}}_{2}^{-} - Θ_{02}^{-} {| |}_{1} . \end{matrix}

(38)

Because

\begin{matrix} \begin{matrix} | | {\tilde{Θ}}_{k}^{-} {| |}_{1} & = | | Θ_{0 k}^{-} + Δ_{k}^{-} {| |}_{1} \\ = | | Θ_{0 k S_{k}}^{-} + Δ_{k S_{k}}^{-} {| |}_{1} + | | Δ_{k S_{k}^{c}}^{-} {| |}_{1}, \end{matrix} \end{matrix}

and

\begin{matrix} | | Θ_{0 k}^{-} {| |}_{1} = | | Θ_{0 k S_{k}}^{-} {| |}_{1} \end{matrix}

hold. Thus,

\begin{matrix} \begin{matrix} | | {\tilde{Θ}}_{k}^{-} {| |}_{1} - | | Θ_{0 k}^{-} {| |}_{1} = & | | Θ_{0 k S_{k}}^{-} + Δ_{k S_{k}}^{-} {| |}_{1} + | | Δ_{k S_{k}^{c}}^{-} {| |}_{1} - | | Θ_{0 k S_{k}}^{-} {| |}_{1} \\ \geq & | | Δ_{k S_{k}^{c}}^{-} {| |}_{1} - | | | Θ_{0 k S_{k}}^{-} + Δ_{k S_{k}}^{-} {| |}_{1} - | | Θ_{0 k S_{k}}^{-} {| |}_{1} | \\ \geq & | | Δ_{k S_{k}^{c}}^{-} {| |}_{1} - | | Δ_{k S_{k}}^{-} {| |}_{1}, \end{matrix} \end{matrix}

which proves inequality (36). By the triangle inequality, we naturally obtain

\begin{matrix} \begin{matrix} | | {\tilde{Θ}}_{k}^{-} - Θ_{0 k}^{-} {| |}_{1} = & | | Δ_{k}^{-} {| |}_{1} \\ = & | | Δ_{k S_{k}^{c}}^{-} + Δ_{k S_{k}}^{-} {| |}_{1} \\ \leq & | | Δ_{k S_{k}^{c}}^{-} {| |}_{1} + | | Δ_{k S_{k}}^{-} {| |}_{1} . \end{matrix} \end{matrix}

Thus, the inequation (37) holds. For inequation (38), we have

\begin{matrix} \begin{matrix} | | Θ_{01}^{-} - Θ_{02}^{-} {| |}_{1} - | | {\tilde{Θ}}_{1}^{-} - {\tilde{Θ}}_{2}^{-} {| |}_{1} \\ = & | | Θ_{01}^{-} - {\tilde{Θ}}_{1}^{-} + {\tilde{Θ}}_{1}^{-} - {\tilde{Θ}}_{2}^{-} + {\tilde{Θ}}_{2}^{-} - Θ_{02}^{-} {| |}_{1} - | | {\tilde{Θ}}_{1}^{-} - {\tilde{Θ}}_{2}^{-} {| |}_{1} \\ \leq & | | {\tilde{Θ}}_{1}^{-} - Θ_{01}^{-} {| |}_{1} + | | {\tilde{Θ}}_{2}^{-} - Θ_{02}^{-} {| |}_{1} . \end{matrix} \end{matrix}

Thus, the inequality (35) yields

\begin{matrix} \begin{matrix} c | | {\tilde{Θ}}_{1} - Θ_{01} {| |}_{F}^{2} + c | | {\tilde{Θ}}_{2} - Θ_{02} {| |}_{F}^{2} \\ + λ {| | Δ_{1 S_{1}^{c}}^{-} {| |}_{1} - | | Δ_{1 S_{1}}^{-} {| |}_{1} + | | Δ_{2 S_{2}^{c}}^{-} {| |}_{1} - | | Δ_{2 S_{2}}^{-} {| |}_{1}} \\ \leq & (ρ + λ_{0}) {| | Δ_{1 S_{1}^{c}}^{-} {| |}_{1} + | | Δ_{1 S_{1}}^{-} {| |}_{1} + | | Δ_{2 S_{2}^{c}}^{-} {| |}_{1} + | | Δ_{2 S_{2}}^{-} {| |}_{1}} \\ + | | {\hat{Σ}}_{1}^{+} - Σ_{01}^{+} {| |}_{F} | | {\tilde{Θ}}_{1}^{+} - Θ_{01}^{+} {| |}_{F} + | | {\hat{Σ}}_{2}^{+} - Σ_{02}^{+} {| |}_{F} | | {\tilde{Θ}}_{2}^{+} - Θ_{02}^{+} {| |}_{F} . \end{matrix} \end{matrix}

By taking 2(ρ + λ₀) < λ, we conclude that

\begin{matrix} \begin{matrix} 2 c {| | {\tilde{Θ}}_{1} - Θ_{01} {| |}_{F}^{2} + | | {\tilde{Θ}}_{2} - Θ_{02} {| |}_{F}^{2}} + λ {| | Δ_{1 S_{1}^{c}}^{-} {| |}_{1} + | | Δ_{2 S_{2}^{c}}^{-} {| |}_{1}} \\ \leq & 3 λ {| | Δ_{1 S_{1}}^{-} {| |}_{1} + | | Δ_{2 S_{2}}^{-} {| |}_{1}} \\ + 2 {| | {\hat{Σ}}_{1}^{+} - Σ_{01}^{+} {| |}_{F} | | {\tilde{Θ}}_{1}^{+} - Θ_{01}^{+} {| |}_{F} + | | {\hat{Σ}}_{2}^{+} - Σ_{02}^{+} {| |}_{F} | | {\tilde{Θ}}_{2}^{+} - Θ_{02}^{+} {| |}_{F}} . \end{matrix} \end{matrix}

By the definition of Δ_k, we have

\begin{matrix} \begin{matrix} | | Δ_{k}^{-} {| |}_{1} = & | | Δ_{k S_{k}}^{-} + Δ_{k S_{k}^{c}}^{-} {| |}_{1} \\ \leq & | | Δ_{k S_{k}}^{-} {| |}_{1} + | | Δ_{k S_{k}^{c}}^{-} {| |}_{1} . \end{matrix} \end{matrix}

(39)

So we deduce

\begin{matrix} \begin{matrix} 2 c {| | {\tilde{Θ}}_{1} - Θ_{01} {| |}_{F}^{2} + | | {\tilde{Θ}}_{2} - Θ_{02} {| |}_{F}^{2}} + λ {| | Δ_{1}^{-} {| |}_{1} + | | Δ_{2}^{-} {| |}_{1}} \\ \leq & 4 λ {| | Δ_{1 S_{1}}^{-} {| |}_{1} + | | Δ_{2 S_{2}}^{-} {| |}_{1}} \\ + 2 {| | {\hat{Σ}}_{1}^{+} - Σ_{01}^{+} {| |}_{F} | | {\tilde{Θ}}_{1}^{+} - Θ_{01}^{+} {| |}_{F} + | | {\hat{Σ}}_{2}^{+} - Σ_{02}^{+} {| |}_{F} | | {\tilde{Θ}}_{2}^{+} - Θ_{02}^{+} {| |}_{F}} \end{matrix} \end{matrix}

holds. Since the inequality of arithmetic and geometric means, the inequality $| | Δ_{k S_{k}}^{-} {| |}_{1} \leq \sqrt{s_{k}} | | Δ_{k S_{k}}^{-} {| |}_{F}$ holds. Thus

\begin{matrix} \begin{matrix} 2 c {| | {\tilde{Θ}}_{1} - Θ_{01} {| |}_{F}^{2} + | | {\tilde{Θ}}_{2} - Θ_{02} {| |}_{F}^{2}} + λ {| | Δ_{1}^{-} {| |}_{1} + | | Δ_{2}^{-} {| |}_{1}} \\ \leq & 4 λ {\sqrt{s_{1}} | | Δ_{1 S_{1}}^{-} {| |}_{F} + \sqrt{s_{2}} | | Δ_{2 S_{2}}^{-} {| |}_{F}} \\ + 2 {| | {\hat{Σ}}_{1}^{+} - Σ_{01}^{+} {| |}_{F} | | {\tilde{Θ}}_{1}^{+} - Θ_{01}^{+} {| |}_{F} + | | {\hat{Σ}}_{2}^{+} - Σ_{02}^{+} {| |}_{F} | | {\tilde{Θ}}_{2}^{+} - Θ_{02}^{+} {| |}_{F}} . \end{matrix} \end{matrix}

(40)

Using xy ≤ (x² + y²)/2, the inequality (40) infer that

\begin{matrix} \begin{matrix} 2 c {| | {\tilde{Θ}}_{1} - Θ_{01} {| |}_{F}^{2} + | | {\tilde{Θ}}_{2} - Θ_{02} {| |}_{F}^{2}} + λ {| | Δ_{1}^{-} {| |}_{1} + | | Δ_{2}^{-} {| |}_{1}} \\ \leq & \frac{1}{2} (c | | Δ_{1 S_{1}}^{-} {| |}_{F}^{2} + \frac{16 λ^{2} s_{1}}{c} + c | | Δ_{2 S_{2}}^{-} {| |}_{F}^{2} + \frac{16 λ^{2} s_{2}}{c}) \\ + \frac{1}{2} (c | | {\tilde{Θ}}_{1}^{+} - Θ_{01}^{+} {| |}_{F}^{2} + \frac{4 | | {\hat{Σ}}_{1}^{+} - Σ_{01}^{+} {| |}_{F}^{2}}{c} + c | | {\tilde{Θ}}_{2}^{+} - Θ_{02}^{+} {| |}_{F}^{2} + \frac{4 | | {\hat{Σ}}_{2}^{+} - Σ_{02}^{+} {| |}_{F}^{2}}{c}) . \end{matrix} \end{matrix}

Because

\begin{matrix} \begin{matrix} c | | {\tilde{Θ}}_{k}^{+} - Θ_{0 k}^{+} {| |}_{F}^{2} + c | | Δ_{k S_{k}}^{-} {| |}_{F}^{2} \leq & {c | | {\tilde{Θ}}_{k}^{+} - Θ_{0 k}^{+} {| |}_{F}^{2} + c | | Δ_{k}^{-} {| |}_{F}^{2}} \\ + {c | | Δ_{k S_{k}}^{-} {| |}_{F}^{2} + c | | Δ_{k S_{k}^{c}}^{-} {| |}_{F}^{2} + c | | Δ_{k}^{+} {| |}_{F}^{2}} \\ = & 2 c | | Δ_{k} {| |}_{F}^{2}, \end{matrix} \end{matrix}

(41)

we obtain

\begin{matrix} \begin{matrix} 2 c {| | {\tilde{Θ}}_{1} - Θ_{01} {| |}_{F}^{2} + | | {\tilde{Θ}}_{2} - Θ_{02} {| |}_{F}^{2}} + λ {| | Δ_{1}^{-} {| |}_{1} + | | Δ_{2}^{-} {| |}_{1}} \\ \leq & c {| | Δ_{1} {| |}_{F}^{2} + | | Δ_{2} {| |}_{F}^{2}} + \frac{8 λ^{2} (s_{1} + s_{2})}{c} + \frac{2 | | {\hat{Σ}}_{1}^{+} - Σ_{01}^{+} {| |}_{F}^{2}}{c} + \frac{2 | | {\hat{Σ}}_{2}^{+} - Σ_{02}^{+} {| |}_{F}^{2}}{c} . \end{matrix} \end{matrix}

Thus,

\begin{matrix} \begin{matrix} c {| | Δ_{1} {| |}_{F}^{2} + | | Δ_{2} {| |}_{F}^{2}} + λ {| | Δ_{1}^{-} {| |}_{1} + | | Δ_{2}^{-} {| |}_{1}} \\ \leq & \frac{8 λ^{2} (s_{1} + s_{2})}{c} + \frac{2 | | {\hat{Σ}}_{1}^{+} - Σ_{01}^{+} {| |}_{F}^{2}}{c} + \frac{2 | | {\hat{Σ}}_{2}^{+} - Σ_{02}^{+} {| |}_{F}^{2}}{c} . \end{matrix} \end{matrix}

(42)

Based on the inequality $| | {\hat{Σ}}_{k}^{+} - Σ_{0 k}^{+} {| |}_{F} \leq \sqrt{p} | | {\hat{Σ}}_{k}^{+} - Σ_{0 k}^{+} {| |}_{\infty}$ , we have

\begin{matrix} \begin{matrix} c {| | Δ_{1} {| |}_{F}^{2} + | | Δ_{2} {| |}_{F}^{2}} + λ {| | Δ_{1}^{-} {| |}_{1} + | | Δ_{2}^{-} {| |}_{1}} \leq \frac{8 λ^{2} (s_{1} + s_{2})}{c} + \frac{4 p λ_{0}^{2}}{c} . \end{matrix} \end{matrix}

(43)

Next, we prove that substituting ${\hat{Θ}}_{k}$ for ${\tilde{Θ}}_{k}$ , the conclusion still holds. According to the condition,

\begin{matrix} \begin{matrix} | | Δ_{1} {| |}_{F}^{2} + | | Δ_{2} {| |}_{F}^{2} \leq & \frac{λ_{0}}{2 c L} \leq \frac{λ}{4 c L} \leq \frac{1}{32 L^{2}} . \end{matrix} \end{matrix}

Taking $M = 1 / (2 \sqrt{2} L) < 1 / 2 L$ , we have

\begin{matrix} \begin{matrix} | | Δ_{1} {| |}_{F}^{2} + | | Δ_{2} {| |}_{F}^{2} \leq & M^{2} / 4 . \end{matrix} \end{matrix}

Thus, ||Δ_k||_F is bounded by M/2. In addition,

\begin{matrix} \begin{matrix} | | {\hat{Θ}}_{k} - Θ_{0 k} {| |}_{F} = \frac{M | | Δ_{k} {| |}_{F}}{M - | | Δ_{k} {| |}_{F}}, \end{matrix} \end{matrix}

which means $| | {\hat{Θ}}_{k} - Θ_{0 k} {| |}_{F}$ is monotone increasing function of ||Δ_k||_F on set (0, M). We obtain that $| | {\hat{Θ}}_{k} - Θ_{0 k} {| |}_{F} \leq M$ . Therefore, we can substitute ${\hat{Θ}}_{k}$ for ${\tilde{Θ}}_{k}$ , and that leads to the inequality (43) holds for ${\hat{Θ}}_{k}$ .

According to inequality (43), we get

\begin{matrix} \begin{matrix} | | {\hat{Θ}}_{k} - Θ_{0 k} {| |}_{F}^{2} \leq & \frac{8 λ^{2} (s_{1} + s_{2})}{c^{2}} + \frac{4 p λ_{0}^{2}}{c^{2}} \\ \leq & \frac{λ^{2} (8 s_{1} + 8 s_{2} + p)}{c^{2}}, \end{matrix} \end{matrix}

and

\begin{matrix} \begin{matrix} | | {\hat{Θ}}_{k}^{-} - Θ_{0 k}^{-} {| |}_{1} \leq & \frac{8 λ (s_{1} + s_{2})}{c} + \frac{4 p λ_{0}^{2}}{λ c} \\ \leq & \frac{λ (8 s_{1} + 8 s_{2} + p)}{c} . \end{matrix} \end{matrix}

Thus, we conclude the upper bound of $\sum_{k = 1}^{2} | | | {\hat{Θ}}_{k} - Θ_{0 k} {| | |}_{1}$ ,

\begin{matrix} \begin{matrix} \sum_{k = 1}^{2} | | | {\hat{Θ}}_{k} - Θ_{0 k} {| | |}_{1} \leq & \sum_{k = 1}^{2} (| | {\hat{Θ}}_{k}^{+} - Θ_{0 k}^{+} {| |}_{\infty} + | | {\hat{Θ}}_{k}^{-} - Θ_{0 k}^{-} {| |}_{1}) \\ \leq & \sum_{k = 1}^{2} (| | {\hat{Θ}}_{k} - Θ_{0 k} {| |}_{F} + | | {\hat{Θ}}_{k}^{-} - Θ_{0 k}^{-} {| |}_{1}) \\ \leq & \frac{2 λ \sqrt{8 s_{1} + 8 s_{2} + p}}{c} + \frac{2 λ (8 s_{1} + 8 s_{2} + p)}{c} \\ \leq & \frac{4 λ (8 s_{1} + 8 s_{2} + p)}{c} . \end{matrix} \end{matrix}

Proof of Theorem 2

Proof 2 The minimizer $({\hat{Θ}}_{R}^{[1]}, {\hat{Θ}}_{R}^{[2]})$ satisfying inequality (42), that is

\begin{matrix} \begin{matrix} c {| | {\hat{Θ}}_{R}^{[1]} - Θ_{R 0}^{[1]} {| |}_{F}^{2} + | | {\hat{Θ}}_{R}^{[2]} - Θ_{R 0}^{[2]} {| |}_{F}^{2}} + λ {| | {({\hat{Θ}}_{R}^{[1]} - Θ_{R 0}^{[1]})}^{-} {| |}_{1} + | | {({\hat{Θ}}_{R}^{[2]} - Θ_{R 0}^{[2]})}^{-} {| |}_{1}} \\ \leq & \frac{8 λ^{2} (s_{1} + s_{2})}{c} + \frac{2 | | {({\hat{R}}^{[1]} - R_{0}^{[1]})}^{+} {| |}_{F}^{2}}{c} + \frac{2 | | {({\hat{R}}^{[2]} - R_{0}^{[2]})}^{+} {| |}_{F}^{2}}{c} . \end{matrix} \end{matrix}

The diagonal elements of ${\hat{R}}^{[k]}$ and $R_{0}^{[k]}$ are all 1. Thus

\begin{matrix} \begin{matrix} c {| | {\hat{Θ}}_{R}^{[1]} - Θ_{R 0}^{[1]} {| |}_{F}^{2} + | | {\hat{Θ}}_{R}^{[2]} - Θ_{R 0}^{[2]} {| |}_{F}^{2}} + λ {| | {({\hat{Θ}}_{R}^{[1]} - Θ_{R 0}^{[1]})}^{-} {| |}_{1} + | | {({\hat{Θ}}_{R}^{[2]} - Θ_{R 0}^{[2]})}^{-} {| |}_{1}} \\ \leq & \frac{8 λ^{2} (s_{1} + s_{2})}{c} . \end{matrix} \end{matrix}

Moreover, for the conclusion of the l₁-operator norm, we get

\begin{matrix} \begin{matrix} | | | {\hat{Θ}}_{R}^{[1]} - Θ_{R 0}^{[1]} {| | |}_{1} + | | | {\hat{Θ}}_{R}^{[2]} - Θ_{R 0}^{[2]} {| | |}_{1} \\ \leq & \sum_{k = 1}^{2} (| | {({\hat{Θ}}_{R}^{[k]} - Θ_{R 0}^{[k]})}^{+} {| |}_{\infty} + | | {({\hat{Θ}}_{R}^{[k]} - Θ_{R 0}^{[k]})}^{-} {| |}_{1}) \\ \leq & \sum_{k = 1}^{2} (| | {\hat{Θ}}_{R}^{[k]} - Θ_{R 0}^{[k]} {| |}_{F} + | | {({\hat{Θ}}_{R}^{[k]} - Θ_{R 0}^{[k]})}^{-} {| |}_{1}) \\ \leq & \frac{32 λ (s_{1} + s_{2})}{c} . \end{matrix} \end{matrix}

For the minimizer $({\hat{Θ}}_{w}^{[1]}, {\hat{Θ}}_{w}^{[2]})$ , following inequality holds

\begin{matrix} \begin{matrix} | | | {\hat{Θ}}_{R}^{[k]} - Θ_{R 0}^{[k]} {| | |}_{1} \\ = & | | | {\hat{W}}^{[k]} {\hat{Θ}}_{w}^{[k]} {\hat{W}}^{[k]} - W_{0}^{[k]} Θ_{w 0}^{[k]} W_{0}^{[k]} {| | |}_{1} \\ \leq & | | {\hat{W}}^{[k]} {| |}_{\infty}^{2} | | | {\hat{Θ}}_{w}^{[k]} - Θ_{w 0}^{[k]} {| | |}_{1} + | | {\hat{W}}^{[k]} - W_{0}^{[k]} {| |}_{\infty} | | | Θ_{w 0}^{[k]} {| | |}_{1} | | {\hat{W}}^{[k]} {| |}_{\infty} \\ + | | W_{0}^{[k]} {| |}_{\infty} | | | Θ_{w 0}^{[k]} {| | |}_{1} | | {\hat{W}}^{[k]} - W_{0}^{[k]} {| |}_{\infty} . \end{matrix} \end{matrix}

(44)

To draw the conclusion, we have the following facts:

The Sub-Gaussian vector with covariance $Σ_{0}^{[k]}$ implies that $\sqrt{n / log p} | | ({\hat{Σ}}^{[k]} - Σ_{0}^{[k]}) {| |}_{\infty}$ is bounded in probability.
The eigenvalues of $Θ_{w 0}^{[k]}$ are bounded by a constant.

Thus, $| | | {\hat{Θ}}_{R}^{[k]} - Θ_{R 0}^{[k]} {| | |}_{1}$ and $| | | {\hat{Θ}}_{w}^{[k]} - Θ_{w 0}^{[k]} {| | |}_{1}$ share the same boundary.

Proof of Theorem 3

Proof 3 Similarly, ${\hat{Θ}}_{k}$ are the minimum value of the fused graphical Lasso for k = 1, 2, ⋯, K. Let ${\tilde{Θ}}_{k} = α_{k} {\hat{Θ}}_{k} + (1 - α_{k}) Θ_{0 k}$ , and $α_{k} = \frac{M}{M + | | {\hat{Θ}}_{k} - Θ_{0 k} {| |}_{F}}$ . Denotes

\begin{matrix} F_{n} (Θ_{1}, \dots, Θ_{K}) = \sum_{k = 1}^{K} {t r (Θ_{k} {\hat{Σ}}_{k}) - log det (Θ_{k})} + λ \sum_{k = 1}^{K} | | Θ_{k}^{-} {| |}_{1} + ρ \sum_{k < k^{'}} | | Θ_{k}^{-} - Θ_{k^{'}}^{-} {| |}_{1}, \end{matrix}

we obtain

\begin{matrix} F_{n} ({\tilde{Θ}}_{1}, {\tilde{Θ}}_{2}, \dots, {\tilde{Θ}}_{K}) \leq F_{n} (Θ_{01}, Θ_{02}, \dots, Θ_{0 K}) . \end{matrix}

Thus,

\begin{matrix} \sum_{k = 1}^{K} {t r ({\tilde{Θ}}_{k} - Θ_{0 k}) {\hat{Σ}}_{k} - (log det ({\tilde{Θ}}_{k}) - log det (Θ_{0 k})) + λ | | {\tilde{Θ}}_{k}^{-} {| |}_{1}} \\ + ρ \sum_{k < k^{'}} | | {\tilde{Θ}}_{k}^{-} - {\tilde{Θ}}_{k^{'}}^{-} {| |}_{1} \\ \leq & λ \sum_{k = 1}^{K} | | Θ_{0 k}^{-} {| |}_{1} + ρ \sum_{k < k^{'}} | | Θ_{0 k}^{-} - Θ_{0 k^{'}}^{-} {| |}_{1} . \end{matrix}

Using the notations that $Δ_{k} = {\tilde{Θ}}_{k} - Θ_{0 k}$ and

f (Δ_{k}) ≔ t r (Δ_{k} Σ_{0 k}) - [log det (Δ_{k} + Θ_{0 k}) - log det (Θ_{0 k})],

we yield the following expression

\begin{matrix} \begin{matrix} \sum_{k = 1}^{K} f (Δ_{k}) + λ \sum_{k = 1}^{K} | | {\tilde{Θ}}_{k}^{-} {| |}_{1} + ρ \sum_{k < k^{'}} | | {\tilde{Θ}}_{k}^{-} - {\tilde{Θ}}_{k^{'}}^{-} {| |}_{1} \\ \leq & - \sum_{k = 1}^{K} (t r (Δ_{k} ({\hat{Σ}}_{k} - Σ_{0 k}))) - t r (Δ_{2} ({\hat{Σ}}_{2} - Σ_{02})) \\ + λ \sum_{k = 1}^{K} | | Θ_{0 k}^{-} {| |}_{1} + ρ \sum_{k < k^{'}} | | Θ_{0 k}^{-} - Θ_{0 k^{'}}^{-} {| |}_{1} . \end{matrix} \end{matrix}

(45)

For L_k ≥ 1, k = 1, 2, ⋯, K, the minimum and maximum eigenvalues of Θ_0k hold that

\begin{matrix} 1 / L_{k} \leq λ_{m i n} (Θ_{0 k}) \leq λ_{m a x} (Θ_{0 k}) \leq L_{k} . \end{matrix}

For multiple case, we select a constant L satisfying 1/L ≤ 1/L_k and L_k ≤ L. By similar analysis, for M in (0, 1/2L], the inequality (32) and the inequality (33) still hold.

For K groups data, based on the inequalities (31) and (33). Then, the inequality (45) turns to be

\begin{matrix} \begin{matrix} c \sum_{k = 1}^{K} | | {\tilde{Θ}}_{k} - Θ_{0 k} {| |}_{F}^{2} + λ \sum_{k = 1}^{K} | | {\tilde{Θ}}_{k}^{-} {| |}_{1} + ρ \sum_{k < k^{'}} | | {\tilde{Θ}}_{k}^{-} - {\tilde{Θ}}_{k^{'}}^{-} {| |}_{1} \\ \leq & \sum_{k = 1}^{K} {λ_{0} | | Δ_{k}^{-} {| |}_{1} + | | {\hat{Σ}}_{k}^{+} - Σ_{0 k}^{+} {| |}_{F} | | Δ_{k}^{+} {| |}_{F}} + λ \sum_{k = 1}^{K} | | Θ_{0 k}^{-} {| |}_{1} + ρ \sum_{k < k^{'}} | | Θ_{0 k}^{-} - Θ_{0 k^{'}}^{-} {| |}_{1} . \end{matrix} \end{matrix}

Thus,

\begin{matrix} \begin{matrix} c \sum_{k = 1}^{K} | | {\tilde{Θ}}_{k} - Θ_{0 k} {| |}_{F}^{2} + λ \sum_{k = 1}^{K} {| | {\tilde{Θ}}_{k}^{-} {| |}_{1} - | | Θ_{0 k}^{-} {| |}_{1}} \\ \leq & λ_{0} \sum_{k = 1}^{K} | | {\tilde{Θ}}_{k}^{-} - Θ_{0 k}^{-} {| |}_{1} + ρ \sum_{k < k^{'}} {| | Θ_{0 k}^{-} - Θ_{0 k^{'}}^{-} {| |}_{1} - | | {\tilde{Θ}}_{k}^{-} - {\tilde{Θ}}_{k^{'}}^{-} {| |}_{1}} \\ + \sum_{k = 1}^{K} {| | {\hat{Σ}}_{k}^{+} - Σ_{0 k}^{+} {| |}_{F} | | {\tilde{Θ}}_{k}^{+} - Θ_{0 k}^{+} {| |}_{F}} . \end{matrix} \end{matrix}

(46)

When k = 1, 2, ⋯, K, the inequations (36) and (37) still hold. Similarly, we have the following inequality

\begin{matrix} \begin{matrix} | | Θ_{0 k}^{-} - Θ_{0 k^{'}}^{-} {| |}_{1} - | | {\tilde{Θ}}_{k}^{-} - {\tilde{Θ}}_{k^{'}}^{-} {| |}_{1} \\ = & | | Θ_{0 k}^{-} - {\tilde{Θ}}_{k}^{-} + {\tilde{Θ}}_{k}^{-} - {\tilde{Θ}}_{k^{'}}^{-} + {\tilde{Θ}}_{k^{'}}^{-} - Θ_{0 k^{'}}^{-} {| |}_{1} - | | {\tilde{Θ}}_{k}^{-} - {\tilde{Θ}}_{k^{'}}^{-} {| |}_{1} \\ \leq & | | {\tilde{Θ}}_{k}^{-} - Θ_{0 k}^{-} {| |}_{1} + | | {\tilde{Θ}}_{k^{'}}^{-} - Θ_{0 k^{'}}^{-} {| |}_{1} . \end{matrix} \end{matrix}

(47)

Thus, by the Eqs (36), (37) and (47) the inequality (46) yields

\begin{matrix} \begin{matrix} c \sum_{k = 1}^{K} | | {\tilde{Θ}}_{k} - Θ_{0 k} {| |}_{F}^{2} + λ \sum_{k = 1}^{K} {| | Δ_{k S_{k}^{c}}^{-} {| |}_{1} - | | Δ_{k S_{k}}^{-} {| |}_{1}} \\ \leq & λ_{0} \sum_{k = 1}^{K} {| | Δ_{k S_{k}^{c}}^{-} {| |}_{1} + | | Δ_{k S_{k}}^{-} {| |}_{1}} \\ + ρ \sum_{k < k^{'}} {| | Δ_{k S_{k}^{c}}^{-} {| |}_{1} + | | Δ_{k S_{k}}^{-} {| |}_{1} + | | Δ_{k^{'} S_{k^{'}}^{c}}^{-} {| |}_{1} + | | Δ_{k^{'} S_{k^{'}}}^{-} {| |}_{1}} \\ + \sum_{k = 1}^{K} {| | {\hat{Σ}}_{k}^{+} - Σ_{0 k}^{+} {| |}_{F} | | {\tilde{Θ}}_{k}^{+} - Θ_{0 k}^{+} {| |}_{F}} \\ \leq & (\frac{K (K - 1)}{2} ρ + λ_{0}) \sum_{k = 1}^{K} {| | Δ_{k S_{k}^{c}}^{-} {| |}_{1} + | | Δ_{k S_{k}}^{-} {| |}_{1}} \\ + \sum_{k = 1}^{K} {| | {\hat{Σ}}_{k}^{+} - Σ_{0 k}^{+} {| |}_{F} | | {\tilde{Θ}}_{k}^{+} - Θ_{0 k}^{+} {| |}_{F}} . \end{matrix} \end{matrix}

Since K is a fixed constant, and $2 (\frac{K (K - 1)}{2} ρ + λ_{0}) < λ$ , we can obtain

\begin{matrix} \begin{matrix} 2 c \sum_{k = 1}^{K} | | {\tilde{Θ}}_{k} - Θ_{0 k} {| |}_{F}^{2} + λ \sum_{k = 1}^{K} | | Δ_{k S_{k}^{c}}^{-} {| |}_{1} \\ \leq & 3 λ \sum_{k = 1}^{K} | | Δ_{k S_{k}}^{-} {| |}_{1} + 2 \sum_{k = 1}^{K} {| | {\hat{Σ}}_{k}^{+} - Σ_{0 k}^{+} {| |}_{F} | | {\tilde{Θ}}_{k}^{+} - Θ_{0 k}^{+} {| |}_{F}} . \end{matrix} \end{matrix}

On the basis of the inequality (39), we deduce

\begin{matrix} \begin{matrix} 2 c \sum_{k = 1}^{K} | | {\tilde{Θ}}_{k} - Θ_{0 k} {| |}_{F}^{2} + λ \sum_{k = 1}^{K} | | Δ_{k}^{-} {| |}_{1} \\ \leq & 4 λ \sum_{k = 1}^{K} | | Δ_{k S_{k}}^{-} {| |}_{1} + 2 \sum_{k = 1}^{K} {| | {\hat{Σ}}_{k}^{+} - Σ_{0 k}^{+} {| |}_{F} | | {\tilde{Θ}}_{k}^{+} - Θ_{0 k}^{+} {| |}_{F}} \end{matrix} \end{matrix}

holds. In addition, one can get the inequality $| | Δ_{k S_{k}}^{-} {| |}_{1} \leq \sqrt{s_{k}} | | Δ_{k S_{k}}^{-} {| |}_{F}$ . Thus

\begin{matrix} \begin{matrix} 2 c \sum_{k = 1}^{K} | | {\tilde{Θ}}_{k} - Θ_{0 k} {| |}_{F}^{2} + λ \sum_{k = 1}^{K} | | Δ_{k}^{-} {| |}_{1} \\ \leq & 4 λ \sum_{k = 1}^{K} (\sqrt{s_{k}} | | Δ_{k S_{k}}^{-} {| |}_{F}) + 2 \sum_{k = 1}^{K} {| | {\hat{Σ}}_{k}^{+} - Σ_{0 k}^{+} {| |}_{F} | | {\tilde{Θ}}_{k}^{+} - Θ_{0 k}^{+} {| |}_{F}} . \end{matrix} \end{matrix}

(48)

Based on xy ≤ (x² + y²)/2 and the inequality (41), the inequality (48) infer that

\begin{matrix} \begin{matrix} 2 c \sum_{k = 1}^{K} | | {\tilde{Θ}}_{k} - Θ_{0 k} {| |}_{F}^{2} + λ \sum_{k = 1}^{K} | | Δ_{k}^{-} {| |}_{1} \\ \leq & \frac{1}{2} \sum_{k = 1}^{K} (c | | Δ_{k S_{k}}^{-} {| |}_{F}^{2} + \frac{16 λ^{2} s_{k}}{c}) + \frac{1}{2} \sum_{k = 1}^{K} (c | | {\tilde{Θ}}_{k}^{+} - Θ_{0 k}^{+} {| |}_{F}^{2} + \frac{4 | | {\hat{Σ}}_{k}^{+} - Σ_{0 k}^{+} {| |}_{F}^{2}}{c}) \\ \leq & c \sum_{k = 1}^{K} | | Δ_{k} {| |}_{F}^{2} + \frac{8 λ^{2} \sum_{k = 1}^{K} s_{k}}{c} + \frac{2 \sum_{k = 1}^{K} | | {\hat{Σ}}_{k}^{+} - Σ_{0 k}^{+} {| |}_{F}^{2}}{c} . \end{matrix} \end{matrix}

Thus,

\begin{matrix} c \sum_{k = 1}^{K} | | Δ_{k} {| |}_{F}^{2} + λ \sum_{k = 1}^{K} | | Δ_{k}^{-} {| |}_{1} \leq \frac{8 λ^{2} \sum_{k = 1}^{K} s_{k}}{c} + \frac{2 \sum_{k = 1}^{K} | | {\hat{Σ}}_{k}^{+} - Σ_{0 k}^{+} {| |}_{F}^{2}}{c} . \end{matrix}

(49)

Using the relation between the Frobenius norm and the supremum norm, we have

\begin{matrix} \begin{matrix} c \sum_{k = 1}^{K} | | Δ_{k} {| |}_{F}^{2} + λ \sum_{k = 1}^{K} | | Δ_{k}^{-} {| |}_{1} \leq \frac{8 λ^{2} \sum_{k = 1}^{K} s_{k}}{c} + \frac{2 K p λ_{0}^{2}}{c} . \end{matrix} \end{matrix}

(50)

According to the inequality (50), we get

\begin{matrix} \sum_{k = 1}^{K} | | Δ_{k} {| |}_{F}^{2} \leq \frac{λ_{0}}{2 c L} . \end{matrix}

According to λ₀ ≤ λ/2 and the condition λ ≤ c/8L, we get

\begin{matrix} \sum_{k = 1}^{K} | | Δ_{k} {| |}_{F}^{2} \leq \frac{1}{32 L^{2}} . \end{matrix}

Taking $M = 1 / (2 \sqrt{2} L) < 1 / 2 L$ , we have

\begin{matrix} \sum_{k = 1}^{K} | | Δ_{k} {| |}_{F}^{2} \leq M^{2} / 4 . \end{matrix}

Thus, ||Δ_k||_F is bounded by M/2. Further, we can derive $| | {\hat{Θ}}_{k} - Θ_{0 k} {| |}_{F} \leq M$ which means that we can substitute ${\hat{Θ}}_{k}$ for ${\tilde{Θ}}_{k}$ , and that leads to the inequality (50) holds for ${\hat{Θ}}_{k}$ , i.e.

\begin{matrix} \begin{matrix} c \sum_{k = 1}^{K} | | {\hat{Θ}}_{k} - Θ_{0 k} {| |}_{F}^{2} + λ \sum_{k = 1}^{K} | | {({\hat{Θ}}_{k} - Θ_{0 k})}^{-} {| |}_{1} \leq \frac{8 λ^{2} \sum_{k = 1}^{K} s_{k}}{c} + \frac{2 K p λ_{0}^{2}}{c}, \end{matrix} \end{matrix}

That implies

\begin{matrix} \begin{matrix} \sum_{k = 1}^{K} | | | {\hat{Θ}}_{k} - Θ_{0 k} {| | |}_{1} \leq & \sum_{k = 1}^{K} (| | {\hat{Θ}}_{k}^{+} - Θ_{0 k}^{+} {| |}_{\infty} + | | {\hat{Θ}}_{k}^{-} - Θ_{0 k}^{-} {| |}_{1}) \\ \leq & \sum_{k = 1}^{K} (| | {\hat{Θ}}_{k} - Θ_{0 k} {| |}_{F} + | | {\hat{Θ}}_{k}^{-} - Θ_{0 k}^{-} {| |}_{1}) \\ \leq & K [\frac{λ \sqrt{8 \sum_{k = 1}^{K} s_{k} + \frac{K p}{2}}}{c} + \frac{λ (8 \sum_{k = 1}^{K} s_{k} + \frac{K p}{2})}{c}] \\ \leq & \frac{2 K λ (8 \sum_{k = 1}^{K} s_{k} + \frac{K p}{2})}{c}, \end{matrix} \end{matrix}

which completes the proof.

Proof of Theorem 4

Proof 4 We get from (49)

\begin{matrix} \begin{matrix} c \sum_{k = 1}^{K} | | {\hat{Θ}}_{R}^{[k]} - Θ_{R 0}^{[k]} {| |}_{F}^{2} + λ \sum_{k = 1}^{K} | | {({\hat{Θ}}_{R}^{[k]} - Θ_{R 0}^{[k]})}^{-} {| |}_{1} \\ \leq & \frac{8 λ^{2} \sum_{k = 1}^{K} s_{k}}{c} + \frac{2 \sum_{k = 1}^{K} | | {({\hat{Θ}}_{R}^{[k]} - Θ_{R 0}^{[k]})}^{+} {| |}_{F}^{2}}{c}, \end{matrix} \end{matrix}

and similarly derive

\begin{matrix} c \sum_{k = 1}^{K} | | {\hat{Θ}}_{R}^{[k]} - Θ_{R 0}^{[k]} {| |}_{F}^{2} + λ \sum_{k = 1}^{K} | | {({\hat{Θ}}_{R}^{[k]} - Θ_{R 0}^{[k]})}^{-} {| |}_{1} \leq \frac{8 λ^{2} \sum_{k = 1}^{K} s_{k}}{c} . \end{matrix}

Using

\begin{matrix} \sum_{k = 1}^{K} | | | {\hat{Θ}}_{R}^{[k]} - Θ_{R 0}^{[k]} {| | |}_{1} \leq \sum_{k = 1}^{K} (| | {\hat{Θ}}_{R}^{[k]} - Θ_{R 0}^{[k]} {| |}_{F} + | | {({\hat{Θ}}_{R}^{[k]} - Θ_{R 0}^{[k]})}^{-} {| |}_{1}) \end{matrix}

we have

\begin{matrix} \sum_{k = 1}^{K} | | | {\hat{Θ}}_{R}^{[k]} - Θ_{R 0}^{[k]} {| | |}_{1} \leq \frac{16 K λ \sum_{k = 1}^{K} s_{k}}{c} . \end{matrix}

At last, using the inequality (44), based on the analysis of the upper bound of $| | W_{0}^{[k]} {| |}_{\infty}$ and $| | {\hat{W}}^{[k]} {| |}_{\infty}$ , and the convergence rate of $| | ({\hat{Σ}}^{[k]} - Σ_{0}^{[k]}) {| |}_{\infty}$ , we draw the conclusion that

\begin{matrix} \sum_{k = 1}^{K} | | | {\hat{Θ}}_{w}^{[k]} - Θ_{0}^{[k]} {| | |}_{1} \frac{16 K λ \sum_{k = 1}^{K} s_{k}}{c} . \end{matrix}

Proof of Theorem 5

Proof 5 First of all, we prove that the remainder converge in probability with a $1 / \sqrt{n}$ convergence rate. On account of Theorem 1, we get

\begin{matrix} \begin{matrix} {| | r e m | |}_{\infty} \leq \sum_{k = 1}^{2} | | ({\hat{Θ}}^{[k]} - Θ_{0}^{[k]}) ({\hat{Σ}}^{[k]} - Σ_{0}^{[k]}) Θ_{0}^{[k]} {| |}_{\infty} + \sum_{k = 1}^{2} | | ({\hat{Θ}}^{[k]} - Θ_{0}^{[k]}) ({\hat{Σ}}^{[k]} {\hat{Θ}}^{[k]} - I_{p}) {| |}_{\infty} \end{matrix} \end{matrix}

Define

\begin{matrix} l (Θ) = \sum_{k = 1}^{2} {t r ({\hat{Σ}}^{[k]} Θ^{[k]}) - log det (Θ^{[k]})} + λ \sum_{k = 1}^{2} | | {(Θ^{[k]})}^{-} {| |}_{1} + ρ | | {(Θ^{[1]} - Θ^{[2]})}^{-} {| |}_{1} . \end{matrix}

By the Karush-Kuhn-Tucker (KKT) conditions, we yield

\begin{matrix} {\hat{Σ}}^{[1]} - {({\hat{Θ}}^{[1]})}^{- 1} + (λ + ρ) {\hat{Z}}^{[1]} = 0, \end{matrix}

(51)

and

\begin{matrix} {\hat{Σ}}^{[2]} - {({\hat{Θ}}^{[2]})}^{- 1} + (λ - ρ) {\hat{Z}}^{[2]} = 0, \end{matrix}

(52)

where ${\hat{Z}}_{i j}^{[k]} = s i g n ({\hat{Θ}}_{i j}^{[k]})$ if ${\hat{Θ}}_{i j}^{[k]} \neq 0$ , and satisfying $| | {\hat{Z}}^{[k]} {| |}_{\infty} \leq 1$ . Multiplying by ${\hat{Θ}}^{[1]}$ on the Eq (51), we get

\begin{matrix} I_{p} - {\hat{Σ}}^{[1]} {\hat{Θ}}^{[1]} = (λ + ρ) {\hat{Z}}^{[1]} {\hat{Θ}}^{[1]} . \end{matrix}

Similarly, we have

\begin{matrix} I_{p} - {\hat{Σ}}^{[2]} {\hat{Θ}}^{[2]} = (λ - ρ) {\hat{Z}}^{[2]} {\hat{Θ}}^{[2]} . \end{matrix}

Thus,

\begin{matrix} \begin{matrix} {| | r e m | |}_{\infty} \leq & \sum_{k = 1}^{2} | | | ({\hat{Θ}}^{[k]} - Θ_{0}^{[k]}) {| | |}_{1} | | ({\hat{Σ}}^{[k]} - Σ_{0}^{[k]}) {| |}_{\infty} | | | Θ_{0}^{[k]} {| | |}_{1} \\ + (λ + ρ) \sum_{k = 1}^{2} | | | ({\hat{Θ}}^{[k]} - Θ_{0}^{[k]}) {| | |}_{1} | | {\hat{Z}}^{[k]} {| |}_{\infty} | | | {\hat{Θ}}^{[k]} {| | |}_{1} . \end{matrix} \end{matrix}

To draw the conclusion, we have

\begin{matrix} | | | ({\hat{Θ}}^{[k]} - Θ_{0}^{[k]}) {| | |}_{1} \leq b (p + s) λ, \end{matrix}

(53)

where b is a constant and is related to L. According to the Schwarz inequality and Weyl inequality, we get

\begin{matrix} | | | Θ_{0}^{[k]} {| | |}_{1} \leq \sqrt{d + 1} Λ_{max} (Θ_{0}^{[k]}) . \end{matrix}

(54)

The bound of $| | | {\hat{Θ}}^{[k]} {| | |}_{1}$ is derived by

\begin{matrix} | | | {\hat{Θ}}^{[k]} {| | |}_{1} \leq | | | {\hat{Θ}}^{[k]} - Θ_{0}^{[k]} {| | |}_{1} + | | | Θ_{0}^{[k]} {| | |}_{1} . \end{matrix}

(55)

According to the rate of λ, we conclude that

\begin{matrix} | | | {\hat{Θ}}^{[k]} {| | |}_{1} \leq \sqrt{d + 1} Λ_{max} (Θ_{0}^{[k]}) . \end{matrix}

(56)

Besides, the Sub-Gaussian random vector with covariance $Σ_{0}^{[k]}$ implies that $| | {\hat{Σ}}^{[k]} - Σ_{0}^{[k]} {| |}_{\infty} = O_{p} (\sqrt{log (p) / n})$ , where O_p denotes bounded in probability. We get

\begin{matrix} {| | r e m | |}_{\infty} & \leq & \frac{4 λ (8 s_{1} + 8 s_{2} + p)}{c} \sqrt{\frac{log p}{n}} \sqrt{d + 1} max {Λ_{max} (Θ_{0}^{[1]}), Λ_{max} (Θ_{0}^{[2]})} \\ + (λ + ρ) \frac{4 λ (8 s_{1} + 8 s_{2} + p)}{c} \sqrt{d + 1} max {Λ_{max} (Θ_{0}^{[1]}), Λ_{max} (Θ_{0}^{[2]})} . \end{matrix}

For λ ≍ ρ, ||rem||_∞ is bounded by $\tilde{b} (p + s) \sqrt{d + 1} λ^{2}$ in probability, where $\tilde{b}$ is a constant related to L. Based on the condition $(p + s) \sqrt{d} = o (\sqrt{n} / log p)$ , ${| | r e m | |}_{\infty} = o_{p} (1 / \sqrt{n})$ . According to the bounded fourth moments of ${({\hat{Θ}}^{[k]})}_{i i} {({\hat{Θ}}^{[k]})}_{j j} + {({\hat{Θ}}^{[k]})}_{i j}^{2}$ and Lindeberg central limit theorem, we complete the proof of the Theorem 5.

Proof of Theorem 6

Proof 6 The conclusions of Theorem 6 can be obtained from the arguments (53)–(56). For weighted version, ||rem||_∞ can be bounded by $\tilde{b} s \sqrt{d + 1} λ^{2}$ , which completes the proof.

Data Availability

All data come from website http://portals.broadinstitute.org/cgibin/cancer/datasets.cgi. However, we do not have the right to share this data.

Funding Statement

the author Q.Y. Zhang is supported by Program for youth innovation Research in Capital University of Economics and Business(QNTD202207).

References

1. Johnstone I. M. On the distribution of the largest eigenvalue in principal components analysis. Annals of statistics. 2001;29(2):295–327. doi: 10.1214/aos/1009210544 [DOI] [Google Scholar]
2. Yuan M., Lin Y. Model selection and estimation in the gaussian graphical model. Biometrika. 2007;94(1):19–35. doi: 10.1093/biomet/asm018 [DOI] [Google Scholar]
3. Friedman J., Hastie T., Tibshirani R. Sparse inverse covariance estimation with the graphical lasso. Biostatistics. 2008;9(3):432–441. doi: 10.1093/biostatistics/kxm045 [DOI] [PMC free article] [PubMed] [Google Scholar]
4. Rothman A. J., Bickel P. J., Levina E., Zhu J. Sparse permutation invariant covariance estimation. Electronic Journal of Statistics. 2008;2:494–515. doi: 10.1214/08-EJS176 [DOI] [Google Scholar]
5. Fan J. Q., Feng Y., Wu Y. C. Network exploration via the adaptive lasso and scad penalties. Annals of applied statistics. 2009;3(2):521–541. doi: 10.1214/08-AOAS215SUPP [DOI] [PMC free article] [PubMed] [Google Scholar]
6. Ravikumar P., Wainwright M. J., Raskutti G., Yu B. High-dimensional covariance estimation by minimizing l₁-penalized log-determinant divergence. Electronic Journal of Statistics. 2011;5:935–980. [Google Scholar]
7. Xue L. Z., Zou H. Regularized rank-based estimation of high-dimensional nonparanormal graphical models. Annals of Statistics. 2012;40(5):2541–2571. doi: 10.1214/12-AOS1041 [DOI] [Google Scholar]
8. Yuan Y. P., Shen X. T., Pan W., Wang Z. Z. Constrained likelihood for reconstructing a directed acyclic gaussian graph. Biometrika. 2019;106(1):109–125. doi: 10.1093/biomet/asy057 [DOI] [PMC free article] [PubMed] [Google Scholar]
9. Guo J., Levina E., Michailidis G., Zhu J. Joint estimation of multiple graphical models. Biometrika. 2011;98(1):1–15. doi: 10.1093/biomet/asq060 [DOI] [PMC free article] [PubMed] [Google Scholar]
10. Zhang X.-F., Ou-Yang L., Yan T., Hu X. T., Yan H. A joint graphical model for inferring gene networks across multiple subpopulations and data types. IEEE Transactions on Cybernetics. 2019;51(2):1043–1055. doi: 10.1109/TCYB.2019.2952711 [DOI] [PubMed] [Google Scholar]
11. Danaher P., Wang P., Witten D. M. The joint graphical lasso for inverse covariance estimation across multiple classes. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 2014;76(2):373–397. doi: 10.1111/rssb.12033 [DOI] [PMC free article] [PubMed] [Google Scholar]
12. Janková J., van de Geer S. Confidence intervals for high-dimensional inverse covariance estimation. Electronic Journal of Statistics. 2015;9(1):1205–1229. [Google Scholar]
13. Janková J., van de Geer S. Honest confidence regions and optimality in high-dimensional precision matrix estimation. Test. 2017;26(1):143–162. doi: 10.1007/s11749-016-0503-5 [DOI] [Google Scholar]
14. Ren Z., Sun T., Zhang C.-H., Zhou H. H. Asymptotic normality and optimalities in estimation of large gaussian graphical models. Annals of Statistics. 2015;43(3):991–1026. doi: 10.1214/14-AOS1286 [DOI] [Google Scholar]
15. Yu M., Gupta V., Kolar M. Simultaneous inference for pairwise graphical models with generalized score matching. Journal of Machine Learning Research. 2020;21(91):1–51.34305477 [Google Scholar]
16.Janková, J., van de Geer, S. Inference in high-dimensional graphical models. http://arxiv.org/abs/arXiv:1801.08512
17. Brownlees C., Nualart E., Sun Y. C. Realized networks. Journal of Applied Econometrics. 2018;33(7):986–1006. doi: 10.1002/jae.2642 [DOI] [Google Scholar]
18. Boyd S., Vandenberghe L. Convex optimization. Cambridge university press; 2004. [Google Scholar]
19. Monti S., Savage K. J., Kutok J. L., Feuerhake F., Kurtin P., Mihm M., et al. Molecular profiling of diffuse large b-cell lymphoma identifies robust subtypes including one characterized by host inflammatory response. Blood. 2005;105(5):1851–1861. doi: 10.1182/blood-2004-07-2947 [DOI] [PubMed] [Google Scholar]
20. Rosenwald A., Wright G., Chan W. C., Connors J. M., Campo E., Fisher R. I., et al. The use of molecular profiling to predict survival after chemotherapy for diffuse large-b-cell lymphoma. The New England Journal of Medicine. 2002;346(25):1937–1947. doi: 10.1056/NEJMoa012914 [DOI] [PubMed] [Google Scholar]

PLoS One. doi: 10.1371/journal.pone.0304264.r001

Decision Letter 0

Debo Cheng

3 Apr 2024

PONE-D-23-38136Application of fused graphical lasso to statistical inference for multiple sparse precision matricesPLOS ONE

Dear Dr. Zhang,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by May 18 2024 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Debo Cheng

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Note from Emily Chenette, Editor in Chief of PLOS ONE, and Iain Hrynaszkiewicz, Director of Open Research Solutions at PLOS:

Did you know that depositing data in a repository is associated with up to a 25% citation advantage (https://doi.org/10.1371/journal.pone.0230416)? If you’ve not already done so, consider depositing your raw data in a repository to ensure your work is read, appreciated and cited by the largest possible audience. You’ll also earn an Accessible Data icon on your published paper if you deposit your data in any participating repository (https://plos.org/open-science/open-data/#accessible-data).

3. Please note that PLOS ONE has specific guidelines on code sharing for submissions in which author-generated code underpins the findings in the manuscript. In these cases, all author-generated code must be made available without restrictions upon publication of the work.

Please review our guidelines at https://journals.plos.org/plosone/s/materials-and-software-sharing#loc-sharing-code and ensure that your code is shared in a way that follows best practice and facilitates reproducibility and reuse.

4. Please update your submission to use the PLOS LaTeX template. The template and more information on our requirements for LaTeX submissions can be found at http://journals.plos.org/plosone/s/latex.

5. We note that the grant information you provided in the ‘Funding Information’ and ‘Financial Disclosure’ sections do not match.

When you resubmit, please ensure that you provide the correct grant numbers for the awards you received for your study in the ‘Funding Information’ section.

6. Thank you for stating the following in the Acknowledgments Section of your manuscript:

"Q.Y. Zhang was supported by NSFC 12201430."

We note that you have provided funding information that is not currently declared in your Funding Statement. However, funding information should not appear in the Acknowledgments section or other areas of your manuscript. We will only publish funding information present in the Funding Statement section of the online submission form.

Please remove any funding-related text from the manuscript and let us know how you would like to update your Funding Statement. Currently, your Funding Statement reads as follows:

"The author(s) received no specific funding for this work."

Please include your amended statements within your cover letter; we will change the online submission form on your behalf.

7. Thank you for uploading your study's underlying data set. Unfortunately, the repository you have noted in your Data Availability statement does not qualify as an acceptable data repository according to PLOS's standards.

At this time, please upload the minimal data set necessary to replicate your study's findings to a stable, public repository (such as figshare or Dryad) and provide us with the relevant URLs, DOIs, or accession numbers that may be used to access these data. For a list of recommended repositories and additional information on PLOS standards for data deposition, please see https://journals.plos.org/plosone/s/recommended-repositories.

8. Your abstract cannot contain citations. Please only include citations in the body text of the manuscript, and ensure that they remain in ascending numerical order on first mention.

Additional Editor Comments:

Both reviewers acknowledged the paper's contributions, yet they also highlighted certain drawbacks. Therefore, they recommend a major revision to address these issues.

[Note: HTML markup is below. Please do not edit.]

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The authors study the problem of estimating multiple precision matrices from

multiple populations. They propose to use the fused graphical lasso (FGL) method, which is a lasso penalty on sparsity of precision matrices plus

a refined lasso penalty

on the two precision matrices that restrains the

similar structure across multiple groups.

They obtain some inequalities for multiple estimators of FGL models in terms of $L_1$ and Frobenius matrix norms

under several conditions.

Built on the work of Jankov´a and van de Geer (2015) who investigated a de-biasing

technology to obtain a new consistent estimator with known distribution, the authors extend the statistical inference problem to multiple populations, and propose the de-biasing

FGL estimators. The corresponding asymptotic property of de-biasing FGL estimators is provided. A

simulation study and an application of the diffuse large B-cell lymphoma data demonstrate theoretical results.

It is a nice contribution. Some comments are in the attached file.

Reviewer #2: The authors use the fused graphical lasso (FGL) method to estimate multiple precision matrices from multiple populations simultaneously. In high-dimensional setting, an oracle inequality is provided for FGL estimators, which is necessary to establish the central limit law. They not only focus on point estimation of precision matrix, but also work on hypothesis testing for a linear combination of the entries of multiple precision matrices. They extend the Jankov$\\acute{a}$ and van de Geer's [Confidence intervals for high-dimensional inverse covariance estimation, {\\it Electron. J. Stat.} {\\bf 9}(1) (2015) 1205-1229.] de-biasing technology to multiple populations statistical inference problem and propose the de-biasing FGL estimators. The corresponding asymptotic property of de-biasing FGL estimators is provided. A simulation study and an application of the diffuse large B-cell lymphoma data show that the proposed test works well in high-dimensional situations.

However, I still have some questions about the manuscript.

1. As mentioned in the paper, the authors need to minimize the negative penalized log-likelihood function (1) and get the algorithm solution as an estimator in both numerical study and real data application. The authors use the ADMM algorithm to execute the estimating process with the classical AIC tuning method. Could the author demonstrate the ADMM algorithm for fused lasso? Will the solutions with different tuning methods affect the accuracy of the hypothesis testing process? Is there any better approach to tune the parameters?

2. In page 2, line 13: ``$\\hat\\Sigma^{-1}$" $\\to$ ``$\\hat\\Sigma^{-1}_{n}$".

3. In page 4, line 2: ``we denote $(A)_{ij}$ its $(i,j)$-entry" $\\to$ ``we denote $(A)_{ij}$ as $(i,j)$-entry of A".

4. In page 4, line 3 and line 18: The authors confused the symbols of the determinant and the cardinality, please distinguish them.

5. In page 11, line 17: what does notation $|A|$ mean? Please clarify the meaning of the symbol.

6. There are too many formulas with numbers. Please remove the numbers of formulas that are not used.

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

Attachment

Submitted filename: PONE-D-23-38136-review.pdf

pone.0304264.s001.pdf^{(70.2KB, pdf)}

PLoS One. 2024 May 31;19(5):e0304264. doi: 10.1371/journal.pone.0304264.r002

Author response to Decision Letter 0

15 Apr 2024

Thanks for the comments of reviewers and editor. We have uploaded the “respond to reviewers.pdf” to the attachment. Please check the attachment. Thanks again.

Attachment

Submitted filename: Response to Reviewers_PONE_D_23_38136.pdf

pone.0304264.s002.pdf^{(174.8KB, pdf)}

PLoS One. doi: 10.1371/journal.pone.0304264.r003

Decision Letter 1

Debo Cheng

9 May 2024

Application of fused graphical lasso to statistical inference for multiple sparse precision matrices

PONE-D-23-38136R1

Dear Dr. Zhang,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice will be generated when your article is formally accepted. Please note, if your institution has a publishing partnership with PLOS and your article meets the relevant criteria, all or part of your publication costs will be covered. Please make sure your user information is up-to-date by logging into Editorial Manager at Editorial Manager® and clicking the ‘Update My Information' link at the top of the page. If you have any questions relating to publication charges, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Debo Cheng

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Both reviewers agreed to accept this manuscript. I agree to recommend acceptance.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: (No Response)

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

Reviewer #1: (No Response)

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: (No Response)

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

Reviewer #1: (No Response)

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

Reviewer #1: (No Response)

Reviewer #2: Yes

**********

6. Review Comments to the Author

Reviewer #1: (No Response)

Reviewer #2: (No Response)

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

**********

PLoS One. doi: 10.1371/journal.pone.0304264.r004

Acceptance letter

Debo Cheng

14 May 2024

PONE-D-23-38136R1

PLOS ONE

Dear Dr. Zhang,

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now being handed over to our production team.

At this stage, our production department will prepare your paper for publication. This includes ensuring the following:

* All references, tables, and figures are properly cited

* All relevant supporting information is included in the manuscript submission,

* There are no issues that prevent the paper from being properly typeset

If revisions are needed, the production department will contact you directly to resolve them. If no revisions are needed, you will receive an email when the publication date has been set. At this time, we do not offer pre-publication proofs to authors during production of the accepted work. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few weeks to review your paper and let you know the next and final steps.

Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

If we can help with anything else, please email us at customercare@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Debo Cheng

Academic Editor

PLOS ONE

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Attachment

Submitted filename: PONE-D-23-38136-review.pdf

pone.0304264.s001.pdf^{(70.2KB, pdf)}

Attachment

Submitted filename: Response to Reviewers_PONE_D_23_38136.pdf

pone.0304264.s002.pdf^{(174.8KB, pdf)}

Data Availability Statement

All data come from website http://portals.broadinstitute.org/cgibin/cancer/datasets.cgi. However, we do not have the right to share this data.

[pone.0304264.ref001] 1. Johnstone I. M. On the distribution of the largest eigenvalue in principal components analysis. Annals of statistics. 2001;29(2):295–327. doi: 10.1214/aos/1009210544 [DOI] [Google Scholar]

[pone.0304264.ref002] 2. Yuan M., Lin Y. Model selection and estimation in the gaussian graphical model. Biometrika. 2007;94(1):19–35. doi: 10.1093/biomet/asm018 [DOI] [Google Scholar]

[pone.0304264.ref003] 3. Friedman J., Hastie T., Tibshirani R. Sparse inverse covariance estimation with the graphical lasso. Biostatistics. 2008;9(3):432–441. doi: 10.1093/biostatistics/kxm045 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0304264.ref004] 4. Rothman A. J., Bickel P. J., Levina E., Zhu J. Sparse permutation invariant covariance estimation. Electronic Journal of Statistics. 2008;2:494–515. doi: 10.1214/08-EJS176 [DOI] [Google Scholar]

[pone.0304264.ref005] 5. Fan J. Q., Feng Y., Wu Y. C. Network exploration via the adaptive lasso and scad penalties. Annals of applied statistics. 2009;3(2):521–541. doi: 10.1214/08-AOAS215SUPP [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0304264.ref006] 6. Ravikumar P., Wainwright M. J., Raskutti G., Yu B. High-dimensional covariance estimation by minimizing l₁-penalized log-determinant divergence. Electronic Journal of Statistics. 2011;5:935–980. [Google Scholar]

[pone.0304264.ref007] 7. Xue L. Z., Zou H. Regularized rank-based estimation of high-dimensional nonparanormal graphical models. Annals of Statistics. 2012;40(5):2541–2571. doi: 10.1214/12-AOS1041 [DOI] [Google Scholar]

[pone.0304264.ref008] 8. Yuan Y. P., Shen X. T., Pan W., Wang Z. Z. Constrained likelihood for reconstructing a directed acyclic gaussian graph. Biometrika. 2019;106(1):109–125. doi: 10.1093/biomet/asy057 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0304264.ref009] 9. Guo J., Levina E., Michailidis G., Zhu J. Joint estimation of multiple graphical models. Biometrika. 2011;98(1):1–15. doi: 10.1093/biomet/asq060 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0304264.ref010] 10. Zhang X.-F., Ou-Yang L., Yan T., Hu X. T., Yan H. A joint graphical model for inferring gene networks across multiple subpopulations and data types. IEEE Transactions on Cybernetics. 2019;51(2):1043–1055. doi: 10.1109/TCYB.2019.2952711 [DOI] [PubMed] [Google Scholar]

[pone.0304264.ref011] 11. Danaher P., Wang P., Witten D. M. The joint graphical lasso for inverse covariance estimation across multiple classes. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 2014;76(2):373–397. doi: 10.1111/rssb.12033 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0304264.ref012] 12. Janková J., van de Geer S. Confidence intervals for high-dimensional inverse covariance estimation. Electronic Journal of Statistics. 2015;9(1):1205–1229. [Google Scholar]

[pone.0304264.ref013] 13. Janková J., van de Geer S. Honest confidence regions and optimality in high-dimensional precision matrix estimation. Test. 2017;26(1):143–162. doi: 10.1007/s11749-016-0503-5 [DOI] [Google Scholar]

[pone.0304264.ref014] 14. Ren Z., Sun T., Zhang C.-H., Zhou H. H. Asymptotic normality and optimalities in estimation of large gaussian graphical models. Annals of Statistics. 2015;43(3):991–1026. doi: 10.1214/14-AOS1286 [DOI] [Google Scholar]

[pone.0304264.ref015] 15. Yu M., Gupta V., Kolar M. Simultaneous inference for pairwise graphical models with generalized score matching. Journal of Machine Learning Research. 2020;21(91):1–51.34305477 [Google Scholar]

[pone.0304264.ref016] 16.Janková, J., van de Geer, S. Inference in high-dimensional graphical models. http://arxiv.org/abs/arXiv:1801.08512

[pone.0304264.ref017] 17. Brownlees C., Nualart E., Sun Y. C. Realized networks. Journal of Applied Econometrics. 2018;33(7):986–1006. doi: 10.1002/jae.2642 [DOI] [Google Scholar]

[pone.0304264.ref018] 18. Boyd S., Vandenberghe L. Convex optimization. Cambridge university press; 2004. [Google Scholar]

[pone.0304264.ref019] 19. Monti S., Savage K. J., Kutok J. L., Feuerhake F., Kurtin P., Mihm M., et al. Molecular profiling of diffuse large b-cell lymphoma identifies robust subtypes including one characterized by host inflammatory response. Blood. 2005;105(5):1851–1861. doi: 10.1182/blood-2004-07-2947 [DOI] [PubMed] [Google Scholar]

[pone.0304264.ref020] 20. Rosenwald A., Wright G., Chan W. C., Connors J. M., Campo E., Fisher R. I., et al. The use of molecular profiling to predict survival after chemotherapy for diffuse large-b-cell lymphoma. The New England Journal of Medicine. 2002;346(25):1937–1947. doi: 10.1056/NEJMoa012914 [DOI] [PubMed] [Google Scholar]

PERMALINK

Application of fused graphical lasso to statistical inference for multiple sparse precision matrices

Qiuyan Zhang

Lingrui Li

Hu Yang

Roles

Abstract

Introduction

Main results

Oracle inequality

Asymptotic property

Numerical study

Fluctuations of test

Fig 1. The fluctuation for two-sample case with sparse precision matrix.

Fig 2. The fluctuation for two-sample case with dense precision matrix.

Average coverage probabilities

Table 1. Estimated average coverage probabilities for K = 2 situation.

Multiple FGL case

Fig 3. The fluctuation for multiple-sample case with dense precision matrix.

Real data application

Table 2. Brief introduction to the gene profile expression datasets.

Fig 4. The p-values of proposed test for DLBCL-A dataset.

Fig 5. The p-values of proposed test for DLBCL-B dataset.

Proof of theorem

Proof of Theorem 1

Proof of Theorem 2

Proof of Theorem 3

Proof of Theorem 4

Proof of Theorem 5

Proof of Theorem 6

Data Availability

Funding Statement

References

Decision Letter 0

Debo Cheng

Roles

Author response to Decision Letter 0

Decision Letter 1

Debo Cheng

Roles

Acceptance letter

Debo Cheng

Roles

Associated Data

Supplementary Materials

Data Availability Statement

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases