Abstract
In this paper, the fused graphical lasso (FGL) method is used to estimate multiple precision matrices from multiple populations simultaneously. The lasso penalty in the FGL model enforces sparsity of the precision matrices, while a moderate fusion penalty between precision matrices from distinct groups encourages a similar structure across groups. In high-dimensional settings, an oracle inequality is provided for the FGL estimators, which is necessary for establishing the central limit law. We not only focus on point estimation of a precision matrix, but also address hypothesis testing for a linear combination of the entries of multiple precision matrices. We apply a de-biasing technique, which yields a new consistent estimator with a known distribution for carrying out statistical inference, and extend the statistical inference problem to multiple populations. The corresponding de-biased FGL estimator and its asymptotic theory are provided. A simulation study and an application to diffuse large B-cell lymphoma data show that the proposed test works well in high-dimensional situations.
Introduction
Undirected graphical models are popular tools for representing the network structure of data and have been widely applied in many domains, such as machine learning, gene pattern recognition, and financial data analysis. Letting x = (x1, …, xp)T be a p-variate normal random vector with mean vector μ and covariance Σ0 (Σ0 is positive definite), the precision matrix (or concentration matrix) Θ0 is the inverse of the covariance matrix, i.e., Θ0 = Σ0−1. Graphical models capture conditional dependence relationships between random variables via the non-zero entries of the precision matrix. If Θ0ij ≠ 0, then xi and xj, i, j = 1, …, p, are conditionally dependent given all other variables. Conversely, the zero entries of the precision matrix correspond to pairs of variables that are conditionally independent given the other variables. Therefore, the graphical model is closely related to the precision matrix, and the estimation and testing of precision matrices has been a rapidly growing research direction in the past few years.
Letting x1, …, xn be a sequence of independent and identically distributed (i.i.d.) observations from the population x, Xp×n ≔ (x1, …, xn). A natural estimator of the precision matrix is the inverse of the sample covariance matrix $\hat{\Sigma} = n^{-1}\sum_{i=1}^{n}(x_i - \bar{x})(x_i - \bar{x})^{T}$, where $\bar{x} = n^{-1}\sum_{i=1}^{n} x_i$. On the one hand, in high-dimensional settings, Johnstone [1] showed that the eigenvalues of the sample covariance matrix do not converge to the corresponding eigenvalues of the population covariance matrix, even for Σ = I. Consequently, this estimator becomes invalid when the dimension p is comparable to the sample size n. On the other hand, the sample covariance matrix is singular when p > n − 1, which produces non-negligible errors when its inverse is used to estimate Θ0. In addition, a sparsity assumption (i.e., many entries are zero or nearly so) is essential for a high-dimensional precision matrix, since the zero entries encode the conditional independence structures that are of primary interest in the graphical model. In general, the inverse of the sample covariance matrix is not sparse. Estimating a sparse precision matrix in high-dimensional settings is therefore a challenging problem.
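The singularity of the sample covariance matrix when p > n − 1 can be seen directly in a small numerical sketch (the dimensions below are illustrative, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 50                    # p > n - 1, so the sample covariance is singular
X = rng.standard_normal((n, p))  # n observations of a p-variate vector

Xc = X - X.mean(axis=0)          # center each variable
S = Xc.T @ Xc / n                # sample covariance matrix (p x p)

rank = np.linalg.matrix_rank(S)  # at most n - 1 after centering, far below p
```

Since the rank is bounded by n − 1 < p, the matrix has no inverse, and the naive precision-matrix estimator does not exist.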
In recent years, various proposals have been put forward for estimating a precision matrix in high-dimensional situations, among which the graphical model with sparsity-promoting penalties is valid for obtaining a sparse estimator. By applying the l1 (lasso penalty) to the entries of the concentration matrix, Yuan and Lin [2] proposed a max-det algorithm to obtain the estimator of Θ0. The convergence result of the estimator is derived under a p fixed assumption. Using a coordinate descent procedure, Friedman et al. [3] provided an algorithm for solving a graphical Lasso estimator that is remarkably fast, even if p > n. Rothman et al. [4] investigated a sparse permutation invariant covariance estimator, and established a convergence rate of the estimator in the Frobenius norm as both data dimension p and sample size n are allowed to grow, and showed that the rate explicitly depends on how sparse the true concentration matrix is. For additional theoretical details on penalized likelihood methods for graphical models, see Fan et al. [5], Ravikumar et al. [6], Xue and Zou [7], and Yuan et al. [8].
The above-mentioned methods focus on estimating a single graphical model. However, joint estimation performs better at recovering the true graphs of multiple graphical models when the graphs share a similar structure. Guo et al. [9] studied joint estimation of precision matrices under a hierarchical structure assumption. Zhang et al. [10] proposed a new joint group lasso penalty to recover the joint graphical model; their method was applied to multiple gene network data with several subpopulations and data types. A fused graphical lasso was proposed by Danaher et al. [11], with a penalty that encourages a similar precision-matrix structure across groups. Suppose that X[k], k = 1, …, K, are sample matrices whose columns are sampled i.i.d. from a distribution with mean μ[k] and covariance Σ[k]0; we assume μ[k] = 0 without loss of generality. To simplify notation, we omit the dimension subscript and denote the sample matrices by X[k]. The population precision matrix is defined as the inverse of the population covariance matrix, i.e., Θ[k]0 = (Σ[k]0)−1. The estimators of the precision matrices are obtained by minimizing the penalized negative log likelihood
$$\{\hat{\Theta}^{[k]}\} = \arg\min_{\{\Theta^{[k]}\}} \sum_{k=1}^{K} n_k \left[\operatorname{tr}\!\left(\hat{\Sigma}^{[k]}\Theta^{[k]}\right) - \log\det\Theta^{[k]}\right] + P(\{\Theta^{[k]}\}), \qquad (1)$$
where P({Θ[k]}) denotes the penalty function, the are the minimizers of (1), and we optimize over the symmetric positive-definite matrices set . The fused graphical lasso (FGL) is the solution to optimization problem (1) with the fused lasso penalty
$$P(\{\Theta^{[k]}\}) = \lambda \sum_{k=1}^{K} \left\|(\Theta^{[k]})^{-}\right\|_1 + \rho \sum_{k < k'} \left\|(\Theta^{[k]} - \Theta^{[k']})^{-}\right\|_1, \qquad (2)$$
where λ and ρ are non-negative regularization parameters, (Θ[k])− denotes the matrix obtained by setting the diagonal elements of Θ[k] to zero, and || ⋅ ||1 denotes the l1 norm of a vector or matrix. It is reasonable to penalize only the off-diagonal elements of Θ[k], since we are primarily concerned with conditional independence across different variables. Note that the first term in (2) is the classical lasso penalty, which shrinks the coefficients toward 0 as λ increases and encourages sparse estimators. The penalty on (Θ[k] − Θ[k′])− encourages the estimators to share a similar network structure across classes.
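The two terms of the fused lasso penalty (2) can be evaluated directly; the following sketch (with a hypothetical helper name `fgl_penalty`) makes their roles concrete:

```python
import numpy as np

def fgl_penalty(thetas, lam, rho):
    """Fused graphical lasso penalty (2) on a list of precision matrices.

    The off-diagonal l1 terms promote sparsity within each group; the fused
    terms shrink corresponding off-diagonal entries across groups together.
    """
    off = lambda M: M - np.diag(np.diag(M))   # (Theta)^- : zero out the diagonal
    sparsity = lam * sum(np.abs(off(T)).sum() for T in thetas)
    fusion = rho * sum(np.abs(off(thetas[k] - thetas[l])).sum()
                       for k in range(len(thetas))
                       for l in range(k + 1, len(thetas)))
    return sparsity + fusion

A = np.eye(3)
B = np.eye(3)
B[0, 1] = B[1, 0] = 0.5
val = fgl_penalty([A, B], lam=1.0, rho=1.0)   # 1.0 (sparsity) + 1.0 (fusion) = 2.0
```

With ρ = 0 the problem decouples into K separate graphical lassos; increasing ρ pulls the off-diagonal entries of the two estimates toward one another.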
Estimation of joint graphical models relies largely on penalization. The penalty biases the estimates toward the assumed structure, which makes hypothesis testing for precision matrices more challenging. Work on statistical inference for low-dimensional parameters in graphical models has recently been carried out (Janková and van de Geer [12]; Janková and van de Geer [13]; Ren et al. [14]; Yu et al. [15]) based on the l1-penalized estimator. Janková and van de Geer [12] provided a de-biasing technique that yields a new consistent estimator with a known distribution. However, these approaches were developed only for inferring the parameters of a single graph. In contrast, studies of inference techniques using estimators obtained from cross-group penalization are much scarcer, and statistical inference for multiple graphical models remains an open research area. Inspired by Janková and van de Geer [12], we not only give FGL estimators of multiple precision matrices from co-moving data, but also test linear combinations of the entries of these precision matrices. The proposed method rests on the de-biasing technique, and we carry out statistical inference for the precision matrices in high-dimensional settings according to the proposed central limit theorem.
The rest of this paper is organized as follows. In the Main results section, we give the oracle inequality for multiple estimators with the FGL penalty and its weighted version; testing hypotheses about linear combinations of corresponding entries of multiple precision matrices is also considered there, and, based on the de-biasing technique, the central limit theorem (CLT) of the proposed statistics for multiple populations is derived. In the Numerical study section, we report simulation results. In the Real data application section, we apply the proposed method to the identification of gene-to-gene interactions in diffuse large B-cell lymphoma data. All technical details are relegated to the Proof of theorem section.
Main results
We use the following notation throughout the paper. For a matrix A, we denote its (i, j)-entry by (A)ij, or by Aij to simplify the notation. We write det(A) for the determinant of A, and the trace of A is denoted tr(A). Letting A+ = diag(A) be the diagonal matrix with the same diagonal as A, we set A− = A − A+. $\|A\|_F = (\sum_{i,j} a_{ij}^2)^{1/2}$ denotes the Frobenius norm. We use the notation ||A||∞ = maxi,j|aij| for the supremum norm of a matrix A, and |||A|||1 ≔ maxj ∑i|aij| for the l1-operator norm.
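The three matrix norms above are easy to confuse, so a small numerical check may help (the example matrix is illustrative):

```python
import numpy as np

A = np.array([[1.0, -2.0],
              [3.0,  4.0]])

fro = np.linalg.norm(A, 'fro')        # ||A||_F  = sqrt(1 + 4 + 9 + 16) = sqrt(30)
sup = np.abs(A).max()                 # ||A||_inf = max_ij |a_ij| = 4
l1_op = np.abs(A).sum(axis=0).max()   # |||A|||_1 = max column sum of |a_ij| = 6
```

Note that the l1-operator norm maximizes over columns (sum over the row index i), which is why it differs from the entrywise supremum norm.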
We write f(n) = O(g(n)) if f(n) ≤ cg(n) for some constant c < ∞, and f(n) = Ω(g(n)) if f(n) ≥ c′g(n) for some constant c′ > 0. The notation f(n) ≍ g(n) means that f(n) = O(g(n)) and f(n) = Ω(g(n)). In the common high-dimensional setting, the dimension p is allowed to grow to infinity; it may be comparable to, substantially larger than, or smaller than the sample size. We set the sample sizes n1 ≍ … ≍ nK ≍ n throughout the paper, with n* = n1 + … + nK tending to infinity. Furthermore, for notational simplicity, we assume that n1 = … = nK = n.
Oracle inequality
To obtain the oracle inequality for the multiple FGL estimators, we introduce some notation related to the sparsity assumptions on the entries of the true precision matrices. Letting
$$S_k = \{(i, j) : i \neq j,\ \Theta^{[k]}_{0ij} \neq 0\},$$
where $\Theta^{[k]}_{0ij}$ is the (i, j)-entry of $\Theta^{[k]}_0$ and sk = |Sk| is the cardinality of Sk, we adopt the boundedness of the eigenvalues of the true precision matrix and certain tail conditions proposed by Janková and van de Geer [12].
Condition 1 (Bounded eigenvalues) There exists a universal constant L ≥ 1 such that, for each k,
$$1/L \leq \Lambda_{\min}(\Theta^{[k]}_0) \leq \Lambda_{\max}(\Theta^{[k]}_0) \leq L,$$
where Λmin and Λmax denote the minimum and maximum eigenvalues of a matrix, respectively.
Condition 2 (Sub-Gaussianity vector condition) The observations , i = 1, …, nk, are uniformly sub-Gaussian vectors in the respective groups.
We now present the oracle inequality for the FGL estimator in the K = 2 situation.
Theorem 1 Suppose that Conditions 1 and 2 hold for k = 1, 2, and that the tuning parameter λ satisfies 2(ρ + λ0) ≤ λ ≤ c/(8L). On the set , k = 1, 2, it holds that
and
where c = 1/(8L2).
Remark 1 From the inequality, we must select λ so that λp → 0 as n → ∞ to ensure consistency, which is not attainable for a sub-Gaussian random vector when p grows much faster than n. Thus, the condition λp → 0 excludes the p ≫ n situation.
The FGL does not take into account that the variables have, in general, different scaling. Thus, we consider the weighted FGL. The minimizer of the optimization problem (1) with weighted FGL penalty
| (3) |
is denoted , where . Further, the population correlation matrix is denoted and the sample correlation matrix is denoted
If we substitute for , the minimizer of
| (4) |
with the FGL penalty (2) is denoted , which amounts to estimating the parameter from the normalized data. Then,
which means, essentially, that are the estimators of .
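The rescaling from the sample covariance matrix to the sample correlation matrix that underlies the weighted FGL can be sketched as follows (the helper name `sample_correlation` is hypothetical):

```python
import numpy as np

def sample_correlation(S):
    """Rescale a sample covariance matrix to the sample correlation matrix.

    R = W^{-1/2} S W^{-1/2}, where W = diag(S) holds the sample variances,
    so each variable is normalized to unit scale before estimation.
    """
    w = 1.0 / np.sqrt(np.diag(S))
    return S * np.outer(w, w)

S = np.array([[4.0, 1.0],
              [1.0, 9.0]])
R = sample_correlation(S)   # diagonal becomes 1; off-diagonal 1/(2*3) = 1/6
```

Working on the correlation scale removes the dependence of the penalty on the (generally different) variances of the variables.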
Theorem 2 Under the conditions of Theorem 1, on the set , k = 1, 2, it holds that
| (5) |
| (6) |
and
| (7) |
It is natural to extend these conclusions to the K > 2 FGL model; for k = 1, …, K, we obtain the following theorem.
Theorem 3 (Multiple FGL model) Supposing that Conditions 1 and 2 hold, for K > 2, , and , on the set , k = 1, …, K, it holds that
| (8) |
and
| (9) |
Theorem 4 (Multiple FGL model, weighted version) Under the conditions of Theorem 3, on the set , k = 1, …, K, it holds that
| (10) |
| (11) |
and
| (12) |
Asymptotic property
We focus not only on the point estimation of multiple precision matrices, but also on hypothesis testing for linear combinations of the entries of the precision matrices over two groups. One may want to test whether corresponding elements of the precision matrices of the two groups are equal:
$$H_0: \Theta^{[1]}_{0ij} = \Theta^{[2]}_{0ij} \quad \text{versus} \quad H_1: \Theta^{[1]}_{0ij} \neq \Theta^{[2]}_{0ij}. \qquad (13)$$
To test Hypothesis (13), we construct confidence intervals for the estimators based on the de-biasing technique, which eliminates the bias induced by the penalty. The de-biased estimator is defined as $\hat{T}^{[k]} = 2\hat{\Theta}^{[k]} - \hat{\Theta}^{[k]} \hat{\Sigma}^{[k]} \hat{\Theta}^{[k]}$. The difference between the de-biased estimator and the true value can be decomposed into two parts as follows:
where
Under the compatibility conditions, Janková and van de Geer [16] showed that the (i, j)-entry of the de-biased estimator enjoys asymptotic normality and that the remainder term converges to zero in probability. Thus, for testing Hypothesis (13), we construct the test statistic
$$T_{ij} = \hat{T}^{[1]}_{ij} - \hat{T}^{[2]}_{ij} \qquad (14)$$
using de-biasing estimators.
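The computation can be sketched in a few lines; the form 2Θ̂ − Θ̂Σ̂Θ̂ is a common de-sparsified estimator (following Janková and van de Geer) and is an assumption here, as are the helper names:

```python
import numpy as np

def debias(theta_hat, sigma_hat):
    # De-sparsified precision estimator: removes the first-order bias of
    # the penalized estimate via 2*Theta - Theta @ Sigma_hat @ Theta.
    return 2 * theta_hat - theta_hat @ sigma_hat @ theta_hat

def two_sample_stat(theta1, S1, theta2, S2, i, j):
    # Plug-in statistic T_ij for H0: Theta[1]_0ij = Theta[2]_0ij in (13):
    # the difference of the two de-biased (i, j)-entries.
    return debias(theta1, S1)[i, j] - debias(theta2, S2)[i, j]
```

As a sanity check, if `theta_hat` is the exact inverse of `sigma_hat`, the correction term equals `theta_hat` and `debias` returns the estimate unchanged.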
For K = 2, we let
where
Next, we establish the central limit theorem for Tij.
Theorem 5 Assuming Conditions 1, 2, and and , it holds that
| (15) |
where
| (16) |
and oP denotes a term converging to zero in probability. Moreover,
| (17) |
where .
To complete the testing procedure, we use a consistent estimator of σij in Theorem 5. Theorem 5 provides a practical and efficient way of obtaining the p-value and critical value for the test statistic. Under the null hypothesis, we observe that . For an α level of significance, we reject H0 if , where ξα is the 1 − α upper quantile of the standard normal distribution.
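The rejection rule above can be sketched as follows; the normalization sqrt(n)·|T|/σ is an assumed illustrative scaling, since the exact standardization displayed in the paper did not survive extraction:

```python
from math import sqrt
from statistics import NormalDist

def gaussian_test(T_ij, sigma_ij, n, alpha=0.05):
    """Two-sided Gaussian test based on the CLT for the de-biased statistic.

    Returns (reject, p_value) for the null hypothesis that the true
    difference is zero; the scaling here is a sketch, not the paper's
    exact formula.
    """
    nd = NormalDist()
    z = sqrt(n) * abs(T_ij) / sigma_ij
    p_value = 2 * (1 - nd.cdf(z))             # two-sided p-value
    reject = z > nd.inv_cdf(1 - alpha / 2)    # compare with normal quantile
    return reject, p_value
```

A statistic of 0 gives a p-value of 1 and never rejects; a large standardized statistic rejects at any conventional level.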
Theorem 5 requires a stronger sparsity condition than the corresponding oracle-type inequality in Theorem 1. According to the convergence rate of , Theorem 5 applies to the p ≪ n situation. For p ≫ n, we provide the following theorem.
Theorem 6 Assuming Conditions 1, 2, and and , for the p ≫ n regime, Eq (22) holds with , where
| (18) |
In addition,
| (19) |
where .
We do not need to impose the so-called irrepresentability condition on Σ to derive the theoretical properties of our estimators, in contrast to Brownlees et al. [17].
In addition, for the multi-sample precision matrix problem, one may want to test the linear hypothesis:
$$H_0: a_1 \Theta^{[1]}_{0ij} + \cdots + a_K \Theta^{[K]}_{0ij} = 0 \quad \text{versus} \quad H_1: a_1 \Theta^{[1]}_{0ij} + \cdots + a_K \Theta^{[K]}_{0ij} \neq 0, \qquad (20)$$
where a1, …, aK are known constants. Similar to the two-sample case, we propose the test statistic
$$T_{ij} = a_1 \hat{T}^{[1]}_{ij} + \cdots + a_K \hat{T}^{[K]}_{ij}. \qquad (21)$$
For the K > 2 multiple-sample situation, we set s = max{s1, …, sK} and d = max{d1, …, dK}. We then establish the asymptotic normality of the proposed statistic in Corollary 1.
Corollary 1 Under the assumptions of Theorem 5, it holds that
| (22) |
| (23) |
where f(x1, …, xK) = a1x1 + … + aKxK. In addition,
| (24) |
where and .
The asymptotic variance σij in Corollary 1 is unknown, so to construct confidence intervals we use a consistent estimator
where . In addition, a weighted version is proposed as follows.
Corollary 2 Under the assumptions of Theorem 6, the residual term in (23) converges in probability at rate , and the CLT in (24) holds with the estimators replaced by those obtained from solving the weighted FGL optimization problem.
Numerical study
Simulation experiments were carried out to evaluate the performance of the proposed de-biasing FGL test. We considered the sparse graphical model, and a random sample was generated from the multivariate normal distribution with a population covariance matrix defined as the inverse of the population precision matrix.
To solve the graphical lasso problem with a given penalty, we use the alternating direction method of multipliers (ADMM) algorithm, since it is guaranteed to converge to the global optimum; for details, the reader is referred to Boyd et al. [18] and Danaher et al. [11]. The tuning parameters λ and ρ can be selected objectively by the Akaike information criterion (AIC), the Bayesian information criterion, or cross-validation. We chose the AIC for the following simulations, with λ and ρ both ranging from 0.05 to 0.3 in steps of (0.3 − 0.05)/(30 − 1) ≈ 0.0086.
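The tuning grid, and an AIC score of the common form used for joint graphical lassos (following Danaher et al.; the paper's exact definition is assumed here), can be sketched as:

```python
import numpy as np

# 30 equally spaced candidate values on [0.05, 0.3]; the step is
# (0.3 - 0.05)/(30 - 1), roughly 0.0086, as stated in the text.
grid = np.linspace(0.05, 0.3, 30)
step = grid[1] - grid[0]

def aic(n, S, theta):
    """AIC score for one group's graphical-lasso fit (assumed form):
    n*tr(S Theta) - n*logdet(Theta) + 2 * (#non-zero upper off-diagonal entries).
    Summing this over groups scores one (lambda, rho) pair on the grid."""
    sign, logdet = np.linalg.slogdet(theta)
    nonzero = np.count_nonzero(np.triu(theta, k=1))
    return n * np.trace(S @ theta) - n * logdet + 2 * nonzero
```

The pair (λ, ρ) minimizing the summed score over the 30 × 30 grid would then be selected.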
In addition, all the reported simulation results are based on 500 simulations with a nominal significance level of 0.05, and we set the dimension to 100.
Fluctuations of test
We illustrate the theoretical asymptotic normality result on simulated data for the two-sample testing problem (13), setting the two precision matrices equal under the null hypothesis.
Letting G be a p × p symmetric graph matrix with diagonal entries 0 and a given percentage of off-diagonal elements equal to 1, and U be a p × p matrix with elements generated i.i.d. from the uniform distribution on the interval (0, 1), i.e., U(0, 1), we define the elements of the symmetric matrix as follows. For i > j,
| (25) |
where gij and uij are the (i, j)-entry of G and U, respectively, and 1{·} is the indicator function. For i < j, we set . The diagonal entries of matrix are zeros. Then, the precision matrix is generated as
| (26) |
The matrix so generated is symmetric and positive definite. To push the non-zero entries away from 0 and to generate a sparse matrix, we subtract 1 from the non-zero elements. In addition, the generation procedure shows that is a parameter controlling the sparsity. When , a dense matrix is generated. Sparsity of a matrix requires not only few non-zero elements but also non-zero elements of large absolute value; the parameter controls sparsity in terms of the number of non-zero elements.
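A sketch of such a generation scheme is given below. Since the displayed formulas (25)–(26) were lost in extraction, the thresholds and the diagonal shift are assumptions; the sketch only illustrates the idea of thinning a symmetric 0/1 pattern by uniform draws and then shifting the diagonal to enforce positive definiteness:

```python
import numpy as np

def make_precision(p, sparsity, seed=0):
    """Generate a symmetric positive-definite precision matrix (sketch).

    A symmetric 0/1 edge pattern is thinned by uniform draws (an entry
    survives with probability 1 - sparsity, so larger `sparsity` gives a
    sparser matrix), then a diagonal shift makes the result positive
    definite. The exact construction in the paper is assumed, not quoted.
    """
    rng = np.random.default_rng(seed)
    G = np.triu((rng.uniform(size=(p, p)) < 0.5).astype(float), k=1)
    U = rng.uniform(size=(p, p))
    B = np.where(U > sparsity, G, 0.0)          # thin the edge pattern
    B = B + B.T                                  # symmetrize; diagonal stays 0
    shift = np.abs(np.linalg.eigvalsh(B)).max() + 0.1
    return B + shift * np.eye(p)                 # shift makes all eigenvalues > 0

Theta = make_precision(20, sparsity=0.9)
```

Because the shift exceeds the largest eigenvalue magnitude of B, the smallest eigenvalue of the result is at least 0.1.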
We examined the fluctuation of under (p, n) = (100, 200) and (p, n) = (100, 400) settings for the extremely sparse and dense precision matrix cases, respectively. For the extremely sparse precision matrix case, we set the parameter , and for dense case we use .
We plot the simulated fluctuation for the extremely sparse case in Fig 1 and for the dense case in Fig 2. The index (i, j) in the simulation was chosen at scattered locations. In fact, the CLT provides a method for testing any element of a linear combination of the precision matrices; theoretically, we can test whether the true value of any (i, j)-entry of Θ0 is zero or not.
Fig 1. The fluctuation for two-sample case with sparse precision matrix.
Histogram of for . Here, T(i,j) = Tij and . The setting is (p, n) = (100, 200) with (i, j) ∈ {(1, 1), (1, 30), (1, 60), (1, 90)} for four graphs in the first line. The sample size and dimension were set as (p, n) = (100, 400) for four graphs in the second line.
Fig 2. The fluctuation for two-sample case with dense precision matrix.
Histogram of for . Here, T(i,j) = Tij and . The setting is (p, n) = (100, 200) with (i, j) ∈ {(1, 1), (1, 30), (1, 60), (1, 90)} for four graphs in the first line. The sample size and dimension were set to (p, n) = (100, 400) for four graphs in the second line.
Average coverage probabilities
We demonstrate the performance of the test method in the K = 2 situation on the following hypotheses.
Equal Null. Testing hypothesis (13);
Linear Null. Testing the linear null hypothesis , i.e., . Without loss of generality, we chose and generated from (26).
From the global perspective, we used the average coverage, which is also considered in Janková and van de Geer [12]. Letting
| (27) |
be the 95% asymptotic confidence interval for Θ0ij, we substitute the estimator for σij to obtain the empirical version. The frequency with which the true value is covered by the confidence interval (27) is recorded. Then, the average coverage over a set A is denoted
| (28) |
S denotes the set of non-zero entries of . It is easy to check that S = S1 = S2, since the two precision matrices have the same sparsity structure in the Equal Null and Linear Null cases. Thus, for the different null hypotheses, we computed the average coverage over S and its complement Sc. The sparsity parameter takes the values 0.1, 0.5, and 0.9.
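The empirical average coverage over an index set can be computed as in the following sketch (the helper name `average_coverage` is hypothetical):

```python
import numpy as np

def average_coverage(lower, upper, truth, index_set):
    # Empirical average coverage over an index set A: the fraction of
    # entries (i, j) in A whose true value lies inside its confidence
    # interval [lower_ij, upper_ij].
    hits = [lower[i, j] <= truth[i, j] <= upper[i, j] for (i, j) in index_set]
    return float(np.mean(hits))

truth = np.zeros((2, 2))
lower, upper = -np.ones((2, 2)), np.ones((2, 2))
cov = average_coverage(lower, upper, truth, [(0, 0), (0, 1)])  # 1.0
```

Averaging separately over S and its complement Sc, as in Table 1, shows whether coverage differs between truly non-zero and truly zero entries.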
Most results in Table 1 meet our expectations, although the simulations are inevitably affected by randomness. In addition, the proposed method combines estimation and hypothesis testing, which accumulates error. The simulation results nevertheless provide guidance for practice.
Table 1. Estimated average coverage probabilities for K = 2 situation.
| Sparsity | n | Equal Null (S) | Equal Null (Sc) | Linear Null (S) | Linear Null (Sc) |
|---|---|---|---|---|---|
| 0.1 | 200 | 0.9886 | 0.9875 | 0.9101 | 0.9824 |
| 0.1 | 400 | 0.9885 | 0.9867 | 0.8607 | 0.9762 |
| 0.5 | 200 | 0.9880 | 0.9878 | 0.9384 | 0.9745 |
| 0.5 | 400 | 0.9870 | 0.9868 | 0.8820 | 0.9647 |
| 0.9 | 200 | 0.9901 | 0.9899 | 0.9509 | 0.9751 |
| 0.9 | 400 | 0.9889 | 0.9890 | 0.9091 | 0.9639 |
Multiple FGL case
For the multiple FGL case, we examined the fluctuation of the statistic Tij in the K = 3 situation on the following hypothesis.
Three-sample Linear Null. Testing hypothesis , where and are both generated from U(0, 1). and are both generated from (26) with parameters 0.01 and 0.1, respectively.
We set and to positive numbers, since the setting of hypothesis testing should guarantee that are symmetric positive-definite matrices. Besides, for Three-sample Linear Null, S denotes the set of non-zero entries of . The dimension and sample size are (p, n) = (100, 200) and (p, n) = (100, 400), respectively. Histograms of the proposed statistic Tij at the
locations of the precision matrix are presented in Fig 3.
Fig 3. The fluctuation for multiple-sample case with dense precision matrix.
Histogram of for . Here, T(i,j) = Tij and . The setting is (p, n) = (100, 200) with (i, j) ∈ {(1, 1), (1, 10), (1, 20), (1, 30)} for four graphs in the first line. The sample size and dimension were set to (p, n) = (100, 400) for four graphs in the second line.
Real data application
Lymphoma is a malignant tumor whose incidence and mortality increase year by year. In this section, we apply the proposed method to two sets of diffuse large B-cell lymphoma (DLBCL) data, denoted DLBCL-A [19] and DLBCL-B [20], which are available at http://portals.broadinstitute.org/cgibin/cancer/datasets.cgi. Brief information on these datasets is given in Table 2. Both DLBCL-A and DLBCL-B have 3 subgroups; the label and sample size of each subgroup are shown in the 5th column of Table 2. Both datasets are high dimensional, with 662 genes but only a few observations: 141 for DLBCL-A and 180 for DLBCL-B.
Table 2. Brief introduction to the gene profile expression datasets.
| Dataset | n | p | Subgroups | Subgroup label (sample size) |
|---|---|---|---|---|
| DLBCL-A | 141 | 662 | 3 | I (49), II (50), III (42) |
| DLBCL-B | 180 | 662 | 3 | I (42), II (51), III (87) |
Typically, one tests for differences in mean vectors across disease subgroups; however, the role of gene-to-gene interactions across different subtypes remains unclear. In this section, we use our test approach to identify whether the gene-to-gene interactions most relevant to lymphoma behave the same across disease subtypes. For distinct subtypes of the same disease, we focus on testing the equality of two precision matrices. The hypothesis testing problem is
where type i and type j are chosen from the set {I, II, III} in Table 2 with type i ≠ type j. We tune the parameters of the weighted FGL penalty (3) by the AIC criterion. After tuning, we estimate the precision matrices and return a p × p matrix whose (i, j)-th element is the p-value of the statistic Tij. The results are shown in Figs 4 and 5.
Fig 4. The p-values of proposed test for DLBCL-A dataset.
P-values of Tij by comparing subtype I and subtype II (left), subtype II and subtype III (middle), and subtype I and subtype III (right) with DLBCL-A dataset.
Fig 5. The p-values of proposed test for DLBCL-B dataset.
P-values of Tij by comparing subtype I and subtype II (left), subtype II and subtype III (middle), and subtype I and subtype III (right) with DLBCL-B dataset.
As can be seen in the figures, the interactions between genes in the DLBCL-A dataset are not the same among the three subtypes, while for the DLBCL-B dataset, the interactions between genes of the three subtypes are mostly similar.
Proof of theorem
Proof of Theorem 1
To prove Theorem 1, we need a lemma of Janková and van de Geer [16], which is presented as follows.
Lemma 7 Let f(Δ) ≔ tr(ΔΣ0) − [log det(Δ + Θ0) − log det(Θ0)]. Assume that 1/L ≤ λmin(Θ0) ≤ λmax(Θ0) ≤ L for some constant L ≥ 1. Then for all Δ such that ||Δ||F ≤ 1/(2L), f(Δ) is well defined and
To simplify the notation, we substitute , Σ0k, , Θ0k for , , , respectively.
Proof 1 Note that is the minimizer of the fused graphical lasso objective for k = 1, 2. Let , and . According to the definitions of , and the convexity of the loss function
we obtain
That is
| (29) |
Let , and
subtracting from both sides of the inequality (29), we get
| (30) |
For term, we have
where the function G(M) sums all the elements of the matrix M, and ∘ denotes the Hadamard product. By the Cauchy-Schwarz inequality, on the sets ,
Hence,
| (31) |
Next, for Lk ≥ 1 satisfying condition
we choose L > 1 satisfying 1/L ≤ 1/Lk and Lk ≤ L, k = 1, 2. Based on the definitions of Δk and , we get
| (32) |
for arbitrary M in (0, 1/2L]. Thus, ||Δk||F is bounded by M, i.e., ||Δk||F ≤ M. For f(Δk) term, based on Lemma 7, we have
| (33) |
where . In particular, we choose c = 1/(8L2), and the inequality (33) still holds.
Using bounds (31) and (33), the inequality (30) becomes
| (34) |
Rearranging and combining terms in inequality (34) yields the following inequality
| (35) |
Next we need to prove three inequalities:
| (36) |
| (37) |
| (38) |
Because
and
hold. Thus,
which proves inequality (36). By the triangle inequality, we naturally obtain
Thus, inequality (37) holds. For inequality (38), we have
Thus, the inequality (35) yields
By taking 2(ρ + λ0) < λ, we conclude that
By the definition of Δk, we have
| (39) |
So we deduce
holds. By the inequality of arithmetic and geometric means, the inequality holds. Thus
| (40) |
Using xy ≤ (x2 + y2)/2, the inequality (40) implies that
Because
| (41) |
we obtain
Thus,
| (42) |
Based on the inequality , we have
| (43) |
Next, we prove that substituting for , the conclusion still holds. According to the condition,
Taking , we have
Thus, ||Δk||F is bounded by M/2. In addition,
which means that is a monotonically increasing function of ||Δk||F on (0, M). We obtain that . Therefore, we can substitute for , which leads to the inequality (43) holding for .
According to inequality (43), we get
and
Thus, we obtain the upper bound of ,
Proof of Theorem 2
Proof 2 The minimizer satisfies inequality (42); that is,
The diagonal elements of and are all 1. Thus
Moreover, for the conclusion of the l1-operator norm, we get
For the minimizer , the following inequality holds
| (44) |
To draw the conclusion, we have the following facts:
The sub-Gaussian vector with covariance implies that is bounded in probability.
The eigenvalues of are bounded by a constant.
Thus, and share the same bound.
Proof of Theorem 3
Proof 3 Similarly, are the minimizers of the fused graphical lasso for k = 1, 2, ⋯, K. Let , and . Denoting
we obtain
Thus,
Using the notations that and
we obtain the following expression
| (45) |
For Lk ≥ 1, k = 1, 2, ⋯, K, the minimum and maximum eigenvalues of Θ0k satisfy
For the multiple-group case, we select a constant L satisfying 1/L ≤ 1/Lk and Lk ≤ L. By a similar analysis, for M in (0, 1/2L], inequalities (32) and (33) still hold.
For K groups of data, based on inequalities (31) and (33), inequality (45) becomes
Thus,
| (46) |
When k = 1, 2, ⋯, K, inequalities (36) and (37) still hold. Similarly, we have the following inequality
| (47) |
Thus, by (36), (37) and (47), inequality (46) yields
Since K is a fixed constant, and , we can obtain
On the basis of the inequality (39), we deduce
holds. In addition, one can get the inequality . Thus
| (48) |
Based on xy ≤ (x2 + y2)/2 and inequality (41), inequality (48) implies that
Thus,
| (49) |
Using the relation between the Frobenius norm and the supremum norm, we have
| (50) |
According to the inequality (50), we get
According to λ0 ≤ λ/2 and the condition λ ≤ c/8L, we get
Taking , we have
Thus, ||Δk||F is bounded by M/2. Further, we can derive , which means that we can substitute for , so that inequality (50) holds for , i.e.
That implies
which completes the proof.
Proof of Theorem 4
Proof 4 We get from (49)
and similarly derive
Using
we have
Finally, using inequality (44), based on the analysis of the upper bounds of and , and the convergence rate of , we conclude that
Proof of Theorem 5
Proof 5 First, we establish the rate at which the remainder converges in probability. By Theorem 1, we get
Define
By the Karush-Kuhn-Tucker (KKT) conditions, we obtain
| (51) |
and
| (52) |
where if , and satisfies . Multiplying Eq (51) by , we get
Similarly, we have
Thus,
To draw the conclusion, we have
| (53) |
where b is a constant and is related to L. According to the Schwarz inequality and Weyl inequality, we get
| (54) |
The bound of is derived by
| (55) |
According to the rate of λ, we conclude that
| (56) |
Besides, the sub-Gaussian random vector with covariance implies that , where Op denotes boundedness in probability. We get
For λ ≍ ρ, ||rem||∞ is bounded by in probability, where is a constant related to L. Based on the condition , . According to the bounded fourth moments of and the Lindeberg central limit theorem, we complete the proof of Theorem 5.
Proof of Theorem 6
Proof 6 The conclusions of Theorem 6 follow from the arguments (53)–(56). For the weighted version, ||rem||∞ can be bounded by , which completes the proof.
Data Availability
All data come from website http://portals.broadinstitute.org/cgibin/cancer/datasets.cgi. However, we do not have the right to share this data.
Funding Statement
The author Q.Y. Zhang is supported by the Program for Youth Innovation Research at the Capital University of Economics and Business (QNTD202207).
References
- 1. Johnstone I. M. On the distribution of the largest eigenvalue in principal components analysis. Annals of Statistics. 2001;29(2):295–327. doi: 10.1214/aos/1009210544
- 2. Yuan M., Lin Y. Model selection and estimation in the Gaussian graphical model. Biometrika. 2007;94(1):19–35. doi: 10.1093/biomet/asm018
- 3. Friedman J., Hastie T., Tibshirani R. Sparse inverse covariance estimation with the graphical lasso. Biostatistics. 2008;9(3):432–441. doi: 10.1093/biostatistics/kxm045
- 4. Rothman A. J., Bickel P. J., Levina E., Zhu J. Sparse permutation invariant covariance estimation. Electronic Journal of Statistics. 2008;2:494–515. doi: 10.1214/08-EJS176
- 5. Fan J. Q., Feng Y., Wu Y. C. Network exploration via the adaptive lasso and SCAD penalties. Annals of Applied Statistics. 2009;3(2):521–541. doi: 10.1214/08-AOAS215SUPP
- 6. Ravikumar P., Wainwright M. J., Raskutti G., Yu B. High-dimensional covariance estimation by minimizing l1-penalized log-determinant divergence. Electronic Journal of Statistics. 2011;5:935–980.
- 7. Xue L. Z., Zou H. Regularized rank-based estimation of high-dimensional nonparanormal graphical models. Annals of Statistics. 2012;40(5):2541–2571. doi: 10.1214/12-AOS1041
- 8. Yuan Y. P., Shen X. T., Pan W., Wang Z. Z. Constrained likelihood for reconstructing a directed acyclic Gaussian graph. Biometrika. 2019;106(1):109–125. doi: 10.1093/biomet/asy057
- 9. Guo J., Levina E., Michailidis G., Zhu J. Joint estimation of multiple graphical models. Biometrika. 2011;98(1):1–15. doi: 10.1093/biomet/asq060
- 10. Zhang X.-F., Ou-Yang L., Yan T., Hu X. T., Yan H. A joint graphical model for inferring gene networks across multiple subpopulations and data types. IEEE Transactions on Cybernetics. 2019;51(2):1043–1055. doi: 10.1109/TCYB.2019.2952711
- 11. Danaher P., Wang P., Witten D. M. The joint graphical lasso for inverse covariance estimation across multiple classes. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 2014;76(2):373–397. doi: 10.1111/rssb.12033
- 12. Janková J., van de Geer S. Confidence intervals for high-dimensional inverse covariance estimation. Electronic Journal of Statistics. 2015;9(1):1205–1229.
- 13. Janková J., van de Geer S. Honest confidence regions and optimality in high-dimensional precision matrix estimation. Test. 2017;26(1):143–162. doi: 10.1007/s11749-016-0503-5
- 14. Ren Z., Sun T., Zhang C.-H., Zhou H. H. Asymptotic normality and optimalities in estimation of large Gaussian graphical models. Annals of Statistics. 2015;43(3):991–1026. doi: 10.1214/14-AOS1286
- 15. Yu M., Gupta V., Kolar M. Simultaneous inference for pairwise graphical models with generalized score matching. Journal of Machine Learning Research. 2020;21(91):1–51.
- 16. Janková J., van de Geer S. Inference in high-dimensional graphical models. arXiv:1801.08512.
- 17. Brownlees C., Nualart E., Sun Y. C. Realized networks. Journal of Applied Econometrics. 2018;33(7):986–1006. doi: 10.1002/jae.2642
- 18. Boyd S., Vandenberghe L. Convex Optimization. Cambridge University Press; 2004.
- 19. Monti S., Savage K. J., Kutok J. L., Feuerhake F., Kurtin P., Mihm M., et al. Molecular profiling of diffuse large B-cell lymphoma identifies robust subtypes including one characterized by host inflammatory response. Blood. 2005;105(5):1851–1861. doi: 10.1182/blood-2004-07-2947
- 20. Rosenwald A., Wright G., Chan W. C., Connors J. M., Campo E., Fisher R. I., et al. The use of molecular profiling to predict survival after chemotherapy for diffuse large-B-cell lymphoma. The New England Journal of Medicine. 2002;346(25):1937–1947. doi: 10.1056/NEJMoa012914