PLOS One. 2024 May 31;19(5):e0304264. doi: 10.1371/journal.pone.0304264

Application of fused graphical lasso to statistical inference for multiple sparse precision matrices

Qiuyan Zhang 1,*, Lingrui Li 1, Hu Yang 2
Editor: Debo Cheng
PMCID: PMC11142621  PMID: 38820407

Abstract

In this paper, the fused graphical lasso (FGL) method is used to estimate multiple precision matrices from multiple populations simultaneously. The lasso penalty in the FGL model imposes sparsity on the precision matrices, while a moderate penalty on the difference between precision matrices from distinct groups encourages a similar structure across groups. In high-dimensional settings, an oracle inequality is provided for FGL estimators, which is necessary to establish the central limit law. We not only focus on point estimation of a precision matrix, but also work on hypothesis testing for a linear combination of the entries of multiple precision matrices. We apply a de-biasing technique to obtain a new consistent estimator with a known distribution for implementing statistical inference, and we extend the statistical inference problem to multiple populations. The corresponding de-biased FGL estimator and its asymptotic theory are provided. A simulation study and an application to diffuse large B-cell lymphoma data show that the proposed test works well in high-dimensional situations.

Introduction

Undirected graphical models are popular tools for representing the network structure of data and have been widely applied in many domains, such as machine learning, gene pattern recognition, and financial data analysis. Letting x = (x1, …, xp)T be a p-variate normal random vector with mean vector μ and covariance matrix Σ0 (Σ0 is positive definite), the precision matrix (or concentration matrix) is defined as the inverse of the covariance matrix, i.e., $\Theta_0 \coloneqq \Sigma_0^{-1}$. Graphical models capture conditional dependence relationships between random variables via the non-zero entries of the precision matrix. If Θ0,ij ≠ 0, then xi and xj (i, j = 1, …, p) are dependent given all other variables. Meanwhile, the zero entries of the precision matrix correspond to pairs of variables that are conditionally independent given the other variables. Therefore, the graphical model is closely related to the precision matrix. Estimation and testing of precision matrices have been a rapidly growing research direction in the past few years.

Letting x1, …, xn be a sequence of independent and identically distributed (i.i.d.) observations from the population x, set Xp×n ≔ (x1, …, xn). A natural estimator of the precision matrix is the inverse of the sample covariance matrix $\hat{\Sigma} = \frac{1}{n}XX^T$. On the one hand, in high-dimensional settings, Johnstone [1] showed that the eigenvalues of the sample covariance matrix do not converge to the corresponding eigenvalues of the population covariance matrix, even for Σ = I. Consequently, this estimator becomes invalid when the dimension p is comparable to the sample size n. On the other hand, the sample covariance matrix is singular when p > n − 1, which produces non-negligible errors when $\hat{\Sigma}^{-1}$ is used to estimate Θ0. In addition, a sparsity assumption (i.e., many entries are zero or nearly so) for a high-dimensional precision matrix is essential, since the zero entries imply the conditional independence structures, which are what we are most concerned with in the graphical model. In general, $\hat{\Sigma}^{-1}$ does not have a sparse structure. Estimating a sparse precision matrix in high-dimensional settings is therefore an intractable problem.
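The singularity issue above is easy to verify numerically. The following standalone sketch (simulated Gaussian data; all names are ours) checks the rank of the sample covariance matrix when p > n − 1:

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 50, 20                            # dimension exceeds sample size

X = rng.standard_normal((p, n))          # columns x_1, ..., x_n
X = X - X.mean(axis=1, keepdims=True)    # center each variable
Sigma_hat = X @ X.T / n                  # p x p sample covariance

# after centering, rank(Sigma_hat) <= n - 1 < p, so the inverse does not exist
rank = np.linalg.matrix_rank(Sigma_hat)
print(rank < p)   # True
```

Since the matrix is rank-deficient, any attempt to invert it directly fails, which is the motivation for penalized estimators below.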

In recent years, various proposals have been put forward for estimating a precision matrix in high-dimensional situations, among which the graphical model with sparsity-promoting penalties is effective for obtaining a sparse estimator. By applying the l1 (lasso) penalty to the entries of the concentration matrix, Yuan and Lin [2] proposed a max-det algorithm to obtain an estimator of Θ0; the convergence of this estimator is derived under the assumption of fixed p. Using a coordinate descent procedure, Friedman et al. [3] provided a remarkably fast algorithm for solving the graphical lasso, even when p > n. Rothman et al. [4] investigated a sparse permutation-invariant covariance estimator, established a convergence rate of the estimator in the Frobenius norm as both the dimension p and the sample size n are allowed to grow, and showed that the rate depends explicitly on how sparse the true concentration matrix is. For additional theoretical details on penalized likelihood methods for graphical models, see Fan et al. [5], Ravikumar et al. [6], Xue and Zou [7], and Yuan et al. [8].

The above-mentioned methods focus on estimating a single graphical model. However, joint estimators perform better in recovering the true graphs for multiple graphical models when the graphs share a similar structure. Guo et al. [9] studied joint estimation of precision matrices under a hierarchical structure assumption. Zhang et al. [10] proposed a new joint group lasso penalty to recover the joint graphical model; their method was applied to multiple gene network datasets with several subpopulations and data types. A fused graphical lasso was proposed by Danaher et al. [11], with a penalty that encourages a similar precision matrix structure across groups. Suppose that $X_{p\times n_k}^{[k]} \coloneqq (x_1^{[k]}, \ldots, x_{n_k}^{[k]})$ are sample matrices, and $x_i^{[k]} \in \mathbb{R}^p$ (i = 1, …, nk) are sampled i.i.d. from a distribution with mean μ[k] and covariance Σ0[k], for k = 1, …, K; we assume μ[k] = 0 without loss of generality. To simplify notation, we omit the subscript of $X_{p\times n_k}^{[k]}$ and denote the sample matrices by X[k]. The population precision matrix is defined as the inverse of the population covariance matrix, i.e., $\Theta_0^{[k]} = (\Sigma_0^{[k]})^{-1}$. The estimators of the precision matrices {Θ0[k]} are obtained by minimizing the penalized negative log-likelihood

$$\{\hat{\Theta}^{[k]}\} = \operatorname*{arg\,min}_{\{\Theta^{[k]} \in S_{++}\}} \sum_k\big\{\mathrm{tr}(\hat{\Sigma}^{[k]}\Theta^{[k]}) - \log\det(\Theta^{[k]})\big\} + P(\{\Theta^{[k]}\}), \quad (1)$$

where P({Θ[k]}) denotes the penalty function, the {Θ^[k]} are the minimizers of (1), and we optimize over the symmetric positive-definite matrices set S++. The fused graphical lasso (FGL) is the solution to optimization problem (1) with the fused lasso penalty

$$P(\{\Theta^{[k]}\}) = \lambda\sum_{k=1}^{K}\|(\Theta^{[k]})^-\|_1 + \rho\sum_{k<k'}\|(\Theta^{[k]} - \Theta^{[k']})^-\|_1, \quad (2)$$

where λ and ρ are non-negative regularization parameters, (Θ[k])− represents the matrix obtained by setting the diagonal elements of Θ[k] to zero, and || ⋅ ||1 denotes the l1 norm of a vector or matrix. It is reasonable to penalize only the off-diagonal elements of Θ[k], since we are most concerned with the conditional independence across different variables. Note that the first term in (2) is the classical lasso penalty, which shrinks the coefficients toward 0 as λ increases; it guarantees sparsity of the estimators {Θ̂[k]}. The penalty on (Θ[k] − Θ[k′])− encourages the elements of Θ̂[1], …, Θ̂[K] to have a similar network structure across classes.
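To make the penalty concrete, here is a minimal sketch (the function name `fgl_penalty` is ours, not from the paper) that evaluates (2) for a list of K precision matrices:

```python
import numpy as np

def fgl_penalty(Thetas, lam, rho):
    """Evaluate the fused graphical lasso penalty (2): a lasso term on the
    off-diagonal entries of each Theta^[k], plus a fusion term on the
    off-diagonal entries of every pairwise difference Theta^[k] - Theta^[k']."""
    off = lambda A: A - np.diag(np.diag(A))   # the (.)^- operation
    K = len(Thetas)
    lasso = sum(np.abs(off(T)).sum() for T in Thetas)
    fusion = sum(np.abs(off(Thetas[k] - Thetas[kp])).sum()
                 for k in range(K) for kp in range(k + 1, K))
    return lam * lasso + rho * fusion

A = np.array([[1., 2.], [2., 1.]])
I = np.eye(2)
print(fgl_penalty([A, A], 1.0, 10.0))   # 8.0: identical groups, fusion term vanishes
```

When the group matrices are identical, only the lasso term contributes, which is why the fusion penalty biases the estimates toward a shared structure.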

Estimation of joint graphical models relies largely on penalized estimation. The penalty biases the estimates toward the assumed structure, which makes hypothesis testing for precision matrices more challenging. Work on statistical inference for low-dimensional parameters in graphical models has recently been carried out (Janková and van de Geer [12]; Janková and van de Geer [13]; Ren et al. [14]; Yu et al. [15]) based on the l1-penalized estimator. Janková and van de Geer [12] provided a de-biasing technique to obtain a new consistent estimator with a known distribution. However, these approaches were developed only in the setting in which the parameters of a single graph are inferred. In contrast, studies of inference techniques using estimators obtained from cross-group penalization are far fewer, and statistical inference for multiple graphical models remains an interesting area open for future research. Inspired by Janková and van de Geer [12], we not only give FGL estimators of multiple precision matrices from co-movement data, but also test linear combinations of the entries of these precision matrices. The core of the proposed method is the de-biasing technique, and we implement statistical inference for the precision matrices in high-dimensional settings according to the proposed central limit theorem.

The rest of this paper is organized as follows. In the Main results section, we give the oracle inequality for multiple estimators with the FGL penalty and its weighted version; testing hypotheses for linear combinations of corresponding entries of multiple precision matrices is also considered there, and, based on the de-biasing technique, the CLT of the proposed statistics for multiple populations is derived. In the Numerical study section, we report the results of simulations. In the Real data application section, we apply the proposed method to the identification of gene-to-gene interactions in diffuse large B-cell lymphoma data. All technical details are relegated to the Proof of theorem section.

Main results

We use the following notation throughout the paper. For a matrix $A = (a_{ij})_{i,j=1}^p$, we denote its (i, j)-entry by (A)ij, or by Aij to simplify notation. We write |A| for the determinant of A, and the trace of A is denoted tr(A). Letting A+ = diag(A) denote the diagonal matrix with the same diagonal as A, we set A− = A − A+. The Frobenius norm (the entrywise l2 norm) is defined by $\|A\|_F^2 = \sum_{i,j}a_{ij}^2$. We use the notation $\|A\|_\infty = \max_{i,j}|a_{ij}|$ for the supremum norm of a matrix A, and $|||A|||_1 \coloneqq \max_j\sum_i|a_{ij}|$ for the l1-operator norm.
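The matrix norms and the A+/A− decomposition above can be checked numerically; a small sketch with NumPy:

```python
import numpy as np

A = np.array([[1., -2.],
              [3.,  4.]])

A_plus = np.diag(np.diag(A))            # A^+ : diagonal part
A_minus = A - A_plus                    # A^- : off-diagonal part

fro = np.sqrt((A ** 2).sum())           # ||A||_F  (entrywise l2 norm)
sup = np.abs(A).max()                   # ||A||_inf = max_ij |a_ij|
l1_op = np.abs(A).sum(axis=0).max()     # |||A|||_1 = max_j sum_i |a_ij|

print(round(fro, 3), sup, l1_op)        # 5.477 4.0 6.0
```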

We write $f(n) = O(g(n))$ if f(n) ≤ cg(n) for some constant c < ∞, and f(n) = Ω(g(n)) if f(n) ≥ c′g(n) for some constant c′ > 0. The notation f(n) ≍ g(n) means that both $f(n) = O(g(n))$ and f(n) = Ω(g(n)) hold. In the common high-dimensional setting, the dimension p is allowed to grow to infinity, and may be comparable to, substantially larger than, or smaller than the sample size. We set the sample sizes n1 ≍ … ≍ nK ≍ n throughout the paper, with n* = n1 + … + nK going to infinity. Furthermore, for notational simplicity, we assume that n1 = … = nK = n.

Oracle inequality

To obtain the oracle inequality of multiple estimators of FGL models, we introduce some notation related to the sparsity assumptions on the entries of the true precision matrix. Letting

$$S_k \coloneqq \{(i,j): \Theta_{0ij}^{[k]} \ne 0,\; i \ne j\},$$

where Θ0ij[k] is the (i, j)-entry of Θ0[k] and sk = |Sk| is the cardinality of Sk, we adopt the boundedness of the eigenvalues of the true precision matrix and certain tail conditions proposed by Janková and Van De Geer [12].

Condition 1 (Bounded eigenvalues) There exists a universal constant L such that, for each k,

$$0 < L < \Lambda_{\min}(\Theta_0^{[k]}) \le \Lambda_{\max}(\Theta_0^{[k]}) < 1/L < \infty,$$

where Λmin and Λmax denote the minimum and maximum eigenvalues of a matrix, respectively.

Condition 2 (Sub-Gaussianity vector condition) The observations xi[k] , i = 1, …, nk, are uniformly sub-Gaussian vectors in the respective groups.

We present the oracle inequality for the FGL under the K = 2 situation.

Theorem 1 Suppose that Conditions 1 and 2 hold for k = 1, 2, that the tuning parameter λ satisfies $2(\rho+\lambda_0) \le \lambda \le \frac{c}{8L}$, and that $\frac{8\lambda^2(s_1+s_2)}{c} + \frac{4p\lambda_0^2}{c} \le \frac{\lambda_0}{2L}$. On the set $\{\max_k\|\hat{\Sigma}^{[k]} - \Sigma_0^{[k]}\|_\infty \le \lambda_0\}$, k = 1, 2, it holds that

$$c\sum_{k=1}^{2}\|\hat{\Theta}^{[k]}-\Theta_0^{[k]}\|_F^2 + \lambda\sum_{k=1}^{2}\|(\hat{\Theta}^{[k]}-\Theta_0^{[k]})^-\|_1 \le \frac{8\lambda^2(s_1+s_2)}{c} + \frac{4p\lambda_0^2}{c}$$

and

$$\sum_{k=1}^{2}|||\hat{\Theta}^{[k]}-\Theta_0^{[k]}|||_1 \le \frac{4\lambda(8s_1+8s_2+p)}{c},$$

where $c = 1/(8L^2)$.

Remark 1 From the inequality, we must select λ so that $\lambda\sqrt{p} \to 0$ as n → ∞ to ensure consistency. For sub-Gaussian vectors, λ0 (and hence λ) is of order $\sqrt{\log p/n}$, so the condition $\lambda\sqrt{p} \to 0$ excludes the p ≫ n situation.

The FGL does not take into account that the variables may, in general, have different scales. Thus, we consider the weighted FGL. The minimizer of the optimization problem (1) with the weighted FGL penalty

$$P(\{\Theta^{[k]}\}) = \lambda\sum_k\sum_{i \ne j}\hat{W}_{ii}^{[k]}\hat{W}_{jj}^{[k]}|\Theta_{ij}^{[k]}| + \rho\sum_{k<k'}\sum_{i \ne j}\big|\hat{W}_{ii}^{[k]}\hat{W}_{jj}^{[k]}\Theta_{ij}^{[k]} - \hat{W}_{ii}^{[k']}\hat{W}_{jj}^{[k']}\Theta_{ij}^{[k']}\big| \quad (3)$$

is denoted $\{\hat{\Theta}_w^{[k]}\}$, where $\hat{W}^{[k]} = [\mathrm{diag}(\hat{\Sigma}^{[k]})]^{1/2}$. Further, the population correlation matrix is denoted $R_0^{[k]}$ and the sample correlation matrix is

$$\hat{R}^{[k]} = (\hat{W}^{[k]})^{-1}\hat{\Sigma}^{[k]}(\hat{W}^{[k]})^{-1}.$$

If we substitute $\hat{R}^{[k]}$ for $\hat{\Sigma}^{[k]}$, the minimizer of

$$\operatorname*{arg\,min}_{\{\Theta^{[k]} \in S_{++}\}} \sum_k\big\{\mathrm{tr}(\hat{R}^{[k]}\Theta^{[k]}) - \log\det(\Theta^{[k]})\big\} + P(\{\Theta^{[k]}\}) \quad (4)$$

with the FGL penalty (2) is denoted $\{\hat{\Theta}_R^{[k]}\}$, which amounts to estimating the parameter from the normalized data. Then,

$$\hat{\Theta}_R^{[k]} = \hat{W}^{[k]}\hat{\Theta}_w^{[k]}\hat{W}^{[k]},$$

which means, essentially, that the $\hat{\Theta}_R^{[k]}$ are estimators of $\Theta_{R0}^{[k]} \coloneqq (R_0^{[k]})^{-1}$.
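The normalization step can be sketched as follows (simulated data with deliberately unequal scales; variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 200, 5
scales = np.array([1., 2., 5., 0.5, 10.])       # unequal variable scalings
X = rng.standard_normal((n, p)) * scales

Sigma_hat = np.cov(X, rowvar=False, bias=True)  # sample covariance
w = np.sqrt(np.diag(Sigma_hat))                 # diagonal of W_hat
R_hat = Sigma_hat / np.outer(w, w)              # (W^-1) Sigma (W^-1)

# the sample correlation matrix has unit diagonal, as used in problem (4)
print(np.allclose(np.diag(R_hat), 1.0))   # True
```

Dividing by the outer product of standard deviations is numerically equivalent to the sandwich product with the diagonal matrix inverse, and puts all variables on a common scale before penalization.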

Theorem 2 Under the conditions of Theorem 1, on the set $\{\max_k\|\hat{R}^{[k]} - R_0^{[k]}\|_\infty \le \lambda_0\}$, k = 1, 2, it holds that

$$c\sum_{k=1}^{2}\|\hat{\Theta}_R^{[k]}-\Theta_{R0}^{[k]}\|_F^2 + \lambda\sum_{k=1}^{2}\|(\hat{\Theta}_R^{[k]}-\Theta_{R0}^{[k]})^-\|_1 \le \frac{8\lambda^2(s_1+s_2)}{c}, \quad (5)$$
$$\sum_{k=1}^{2}|||\hat{\Theta}_R^{[k]}-\Theta_{R0}^{[k]}|||_1 \le \frac{32\lambda(s_1+s_2)}{c}, \quad (6)$$

and

$$\sum_{k=1}^{2}|||\hat{\Theta}_w^{[k]}-\Theta_0^{[k]}|||_1 \le \frac{32\lambda(s_1+s_2)}{c}. \quad (7)$$

It is natural to extend this conclusion to the K > 2 FGL model. For k = 1, …, K and the K > 2 situation, we obtain the following theorem.

Theorem 3 (Multiple FGL model) Suppose that Conditions 1 and 2 hold, for K > 2, that $2\big(\frac{K(K-1)}{2}\rho + \lambda_0\big) \le \lambda \le \frac{c}{8L}$, and that $\frac{8\lambda^2\sum_{k=1}^K s_k}{c} + \frac{2Kp\lambda_0^2}{c} \le \frac{\lambda_0}{2L}$. On the set $\{\max_k\|\hat{\Sigma}^{[k]}-\Sigma_0^{[k]}\|_\infty \le \lambda_0\}$, k = 1, …, K, it holds that

$$c\sum_{k=1}^{K}\|\hat{\Theta}^{[k]}-\Theta_0^{[k]}\|_F^2 + \lambda\sum_{k=1}^{K}\|(\hat{\Theta}^{[k]}-\Theta_0^{[k]})^-\|_1 \le \frac{8\lambda^2\sum_{k=1}^{K}s_k}{c} + \frac{2Kp\lambda_0^2}{c} \quad (8)$$

and

$$\sum_{k=1}^{K}|||\hat{\Theta}^{[k]}-\Theta_0^{[k]}|||_1 \le \frac{2K\lambda\big(8\sum_{k=1}^{K}s_k + \frac{Kp}{2}\big)}{c}. \quad (9)$$

Theorem 4 (Multiple FGL model for weighted version) Under the conditions of Theorem 3, on the set $\{\max_k\|\hat{R}^{[k]}-R_0^{[k]}\|_\infty \le \lambda_0\}$, k = 1, …, K, it holds that

$$c\sum_{k=1}^{K}\|\hat{\Theta}_R^{[k]}-\Theta_{R0}^{[k]}\|_F^2 + \lambda\sum_{k=1}^{K}\|(\hat{\Theta}_R^{[k]}-\Theta_{R0}^{[k]})^-\|_1 \le \frac{8\lambda^2\sum_{k=1}^{K}s_k}{c}, \quad (10)$$
$$\sum_{k=1}^{K}|||\hat{\Theta}_R^{[k]}-\Theta_{R0}^{[k]}|||_1 \le \frac{16K\lambda\sum_{k=1}^{K}s_k}{c}, \quad (11)$$

and

$$\sum_{k=1}^{K}|||\hat{\Theta}_w^{[k]}-\Theta_0^{[k]}|||_1 \le \frac{16K\lambda\sum_{k=1}^{K}s_k}{c}. \quad (12)$$

Asymptotic property

We not only focus on the point estimation of multiple precision matrices, but also on hypothesis testing for the linear combination of the entries of the precision matrices over two groups. One may want to test whether the elements of the precision matrix over two groups are equal:

$$H_0: \Theta_{0ij}^{[1]} = \Theta_{0ij}^{[2]} \quad \text{vs.} \quad H_1: \Theta_{0ij}^{[1]} \ne \Theta_{0ij}^{[2]}. \quad (13)$$

To test Hypothesis (13), we aim to obtain confidence intervals for the estimators based on the de-biasing technique, which is employed to eliminate the bias associated with the penalty. The de-biased estimator is defined as $\hat{\Theta}_d^{[k]} = 2\hat{\Theta}^{[k]} - \hat{\Theta}^{[k]}\hat{\Sigma}^{[k]}\hat{\Theta}^{[k]}$. The difference between the de-biased estimator and the true value can be decomposed into two parts as follows:

$$\hat{\Theta}_d^{[k]} - \Theta_0^{[k]} = \Xi^{[k]} + \Upsilon^{[k]},$$

where

$$\Xi^{[k]} = -\Theta_0^{[k]}(\hat{\Sigma}^{[k]}-\Sigma_0^{[k]})\Theta_0^{[k]}, \qquad \Upsilon^{[k]} = -(\hat{\Theta}^{[k]}-\Theta_0^{[k]})(\hat{\Sigma}^{[k]}-\Sigma_0^{[k]})\Theta_0^{[k]} - (\hat{\Theta}^{[k]}-\Theta_0^{[k]})(\hat{\Sigma}^{[k]}\hat{\Theta}^{[k]} - I_p).$$

Under the compatibility conditions, Janková and van de Geer [16] showed that the (i, j)-entry of $\hat{\Theta}_d^{[k]} - \Theta_0^{[k]}$ has an asymptotic normality property, and that $\sqrt{n}\,\|\Upsilon^{[k]}\|_\infty$ converges to zero in probability. Thus, for testing Hypothesis (13), we construct the test statistic

$$T_{ij} \coloneqq (\hat{\Theta}_d^{[1]} - \hat{\Theta}_d^{[2]})_{ij} = \big[2\hat{\Theta}^{[1]} - \hat{\Theta}^{[1]}\hat{\Sigma}^{[1]}\hat{\Theta}^{[1]} - (2\hat{\Theta}^{[2]} - \hat{\Theta}^{[2]}\hat{\Sigma}^{[2]}\hat{\Theta}^{[2]})\big]_{ij} \quad (14)$$

using the de-biased estimators.
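A useful sanity check on the de-biasing map: if the plug-in estimator were exact, de-biasing would leave it unchanged, since 2Θ − ΘΣΘ = Θ when Θ = Σ⁻¹. A sketch (the helper name `debias` is ours):

```python
import numpy as np

def debias(Theta_hat, Sigma_hat):
    """De-biased estimator: 2*Theta_hat - Theta_hat @ Sigma_hat @ Theta_hat."""
    return 2 * Theta_hat - Theta_hat @ Sigma_hat @ Theta_hat

rng = np.random.default_rng(2)
p = 4
A = rng.standard_normal((p, p))
Sigma = A @ A.T + p * np.eye(p)          # a positive-definite covariance
Theta_exact = np.linalg.inv(Sigma)

# fixed point: an exact inverse is unchanged by the de-biasing correction
print(np.allclose(debias(Theta_exact, Sigma), Theta_exact))   # True
```

For a penalized estimator the correction term Θ̂Σ̂Θ̂ − Θ̂ is non-zero, and it is exactly this term that removes the first-order bias introduced by the penalty.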

For K = 2, we let

$$s = \max\{s_1, s_2\}, \qquad d = \max\{d_1, d_2\},$$

where

$$d_k = \max_{j=1,\ldots,p}|D_j^{[k]}|, \qquad D_j^{[k]} = \{(i,j): \Theta_{0ij}^{[k]} \ne 0,\; i \ne j\}.$$

Next, we establish the central limit theorem for Tij.

Theorem 5 Assume Conditions 1 and 2, $\lambda \asymp \rho \asymp \sqrt{\log p/n}$, and $(p+s)d = o(\sqrt{n}/\log p)$. Then it holds that

$$\hat{\Theta}_d^{[1]} - \hat{\Theta}_d^{[2]} - (\Theta_0^{[1]} - \Theta_0^{[2]}) = \Xi^{[1]} - \Xi^{[2]} + \mathrm{rem}, \quad (15)$$

where

$$\|\mathrm{rem}\|_\infty = \|\Upsilon^{[1]} - \Upsilon^{[2]}\|_\infty = o_p(1/\sqrt{n}), \quad (16)$$

and $o_p$ denotes a term converging to zero in probability. Moreover,

$$\sqrt{n}\,[T_{ij} - \Theta_{0ij}] \xrightarrow{D} N(0, \sigma_{ij}^2), \quad (17)$$

where $\Theta_{0ij} = (\Theta_0^{[1]} - \Theta_0^{[2]})_{ij}$.

To complete the testing procedure, we use the consistent estimator $\hat{\sigma}_{ij}^2 = (\hat{\Theta}^{[1]})_{ii}(\hat{\Theta}^{[1]})_{jj} + (\hat{\Theta}^{[1]})_{ij}^2 + (\hat{\Theta}^{[2]})_{ii}(\hat{\Theta}^{[2]})_{jj} + (\hat{\Theta}^{[2]})_{ij}^2$ in Theorem 5. Theorem 5 provides a practical and efficient way of obtaining the p-value and critical value for the test statistic. Under the null hypothesis, $\Theta_{0ij}^{[1]} - \Theta_{0ij}^{[2]} = 0$. For an α level of significance, we reject H0 if $|\sqrt{n}\,T_{ij}/\hat{\sigma}_{ij}| > \xi_{\alpha/2}$, where ξα denotes the upper α quantile (i.e., the 1 − α quantile) of the standard normal distribution.
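The two-sample testing step can be sketched end to end (helper names are ours; `Theta1`, `Theta2` stand for FGL estimates and `T` for the de-biased difference in (14)):

```python
import math
import numpy as np

def two_sample_entry_test(Theta1, Theta2, T, i, j, n):
    """z-statistic and two-sided p-value for H0: Theta0_ij^[1] = Theta0_ij^[2],
    using the plug-in variance estimator of Theorem 5."""
    sigma2 = (Theta1[i, i] * Theta1[j, j] + Theta1[i, j] ** 2
              + Theta2[i, i] * Theta2[j, j] + Theta2[i, j] ** 2)
    z = math.sqrt(n) * T[i, j] / math.sqrt(sigma2)
    p_value = math.erfc(abs(z) / math.sqrt(2))   # 2 * (1 - Phi(|z|))
    return z, p_value

# toy call: identical estimates and a zero statistic give p-value 1
p = 3
z, pv = two_sample_entry_test(np.eye(p), np.eye(p), np.zeros((p, p)), 0, 1, n=200)
print(z, pv)   # 0.0 1.0
```

Comparing the p-value against α is equivalent to the quantile rule above, and avoids computing the inverse normal CDF.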

Theorem 5 requires a stronger sparsity condition than the corresponding oracle-type inequality in Theorem 1. Given the rate condition on (p + s)d, Theorem 5 applies to the p ≪ n situation. For p ≫ n, we provide the following theorem.

Theorem 6 Assume Conditions 1 and 2, $\lambda \asymp \rho \asymp \sqrt{\log p/n}$, and $sd = o(\sqrt{n}/\log p)$. For the p ≫ n regime, the decomposition (15) holds with $\hat{\Theta}_w^{[k]}$, where

$$\|\mathrm{rem}\|_\infty = o_p(1/\sqrt{n}). \quad (18)$$

In addition,

$$\sqrt{n}\,[T_{w,ij} - \Theta_{0ij}] \xrightarrow{D} N(0, \sigma_{ij}^2), \quad (19)$$

where $T_{w,ij} = (2\hat{\Theta}_w^{[1]} - \hat{\Theta}_w^{[1]}\hat{\Sigma}^{[1]}\hat{\Theta}_w^{[1]})_{ij} - (2\hat{\Theta}_w^{[2]} - \hat{\Theta}_w^{[2]}\hat{\Sigma}^{[2]}\hat{\Theta}_w^{[2]})_{ij}$.

We do not need to impose the so-called irrepresentability condition on Σ to derive the theoretical properties of our estimators, in contrast to Brownlees et al. [17].

In addition, for the multi-sample precision matrix hypothesis problem, one may want to test the linear hypothesis

$$H_0: a_1\Theta_{0ij}^{[1]} + \cdots + a_K\Theta_{0ij}^{[K]} = 0 \quad \text{vs.} \quad H_1: \text{not } H_0, \quad (20)$$

where a1, …, aK are known constants. Similar to the two-sample case, we propose the test statistic

$$a_1\hat{\Theta}_{d,ij}^{[1]} + \cdots + a_K\hat{\Theta}_{d,ij}^{[K]}. \quad (21)$$

For the K > 2 multiple situation, we assume s = max{s1, …, sK} and d = max{d1, …, dK}. Consequently, we establish the asymptotic normality of the proposed statistic in the following corollary, i.e., Corollary 1.

Corollary 1 Under the assumptions of Theorem 5, it holds that

$$f(\hat{\Theta}_d^{[1]},\ldots,\hat{\Theta}_d^{[K]}) - f(\Theta_0^{[1]},\ldots,\Theta_0^{[K]}) = f(\Xi^{[1]},\ldots,\Xi^{[K]}) + \mathrm{rem}, \quad (22)$$
$$\|\mathrm{rem}\|_\infty = \|f(\Upsilon^{[1]},\ldots,\Upsilon^{[K]})\|_\infty = o_p(1/\sqrt{n}), \quad (23)$$

where f(x1, …, xK) = a1x1 + … + aKxK. In addition,

$$\sqrt{n}\,[T_{ij} - \Theta_{0ij}] \xrightarrow{D} N(0, \sigma_{ij}^2), \quad (24)$$

where $T_{ij} = f(\hat{\Theta}_{d,ij}^{[1]},\ldots,\hat{\Theta}_{d,ij}^{[K]})$ and $\Theta_{0ij} = f(\Theta_{0ij}^{[1]},\ldots,\Theta_{0ij}^{[K]})$.

The asymptotic variance σij² in Corollary 1 is unknown, so to construct confidence intervals we use the consistent estimator

$$\hat{\sigma}_{ij}^2 = f_v\big([(\hat{\Theta}^{[1]})_{ii}(\hat{\Theta}^{[1]})_{jj} + (\hat{\Theta}^{[1]})_{ij}^2],\ldots,[(\hat{\Theta}^{[K]})_{ii}(\hat{\Theta}^{[K]})_{jj} + (\hat{\Theta}^{[K]})_{ij}^2]\big),$$

where $f_v(x_1,\ldots,x_K) = a_1^2x_1 + \cdots + a_K^2x_K$. In addition, a weighted version is proposed as follows.

Corollary 2 Under the assumptions of Theorem 6, the residual term in (23) is of order $o_p(1/\sqrt{n})$, and the CLT in (24) holds with $\hat{\Theta}^{[k]}$ replaced by $\hat{\Theta}_w^{[k]}$, which is obtained by solving the weighted FGL optimization problem.
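The plug-in variance for a general linear combination follows directly from the formula above; a minimal sketch (function names ours):

```python
def f_v(xs, a):
    """f_v(x_1, ..., x_K) = a_1^2 x_1 + ... + a_K^2 x_K."""
    return sum(ak ** 2 * xk for ak, xk in zip(a, xs))

def sigma2_hat(Thetas, a, i, j):
    """Consistent variance estimator for a_1*Theta_d,ij^[1] + ... + a_K*Theta_d,ij^[K]."""
    terms = [T[i][i] * T[j][j] + T[i][j] ** 2 for T in Thetas]
    return f_v(terms, a)

# with K = 2 and a = (1, -1), this reduces to the two-sample variance
# Theta_ii * Theta_jj + Theta_ij^2 summed over the two groups
I2 = [[1.0, 0.0], [0.0, 1.0]]
print(sigma2_hat([I2, I2], (1.0, -1.0), 0, 1))   # 2.0
```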

Numerical study

Simulation experiments were carried out to evaluate the performance of the proposed de-biasing FGL test. We considered the sparse graphical model, and a random sample was generated from the multivariate normal distribution N(0p,(Θ0[k])-1) with a population covariance matrix defined as the inverse of the population precision matrix.

To solve the graphical lasso problem with a given penalty, we use the alternating direction method of multipliers (ADMM) algorithm, since it is guaranteed to converge to the global optimum; for more details, the reader is referred to Boyd et al. [18] and Danaher et al. [11]. When an objective method for selecting the tuning parameters λ and ρ is required, approximations of the Akaike information criterion (AIC), the Bayesian information criterion, or cross-validation can be used. The AIC was chosen for the following simulations, and λ and ρ both range over a 30-point grid from 0.05 to 0.3 with a step of (0.3 − 0.05)/(30 − 1) ≈ 0.0086.
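The tuning grid described above is simply an evenly spaced 30-point grid; as a quick check:

```python
import numpy as np

# 30 evenly spaced candidate values for lambda (and likewise for rho)
grid = np.linspace(0.05, 0.3, 30)
step = grid[1] - grid[0]                  # (0.3 - 0.05) / (30 - 1)
print(len(grid), round(step, 4))          # 30 0.0086
```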

In addition, all the reported simulation results are based on 500 simulations with a nominal significance level of 0.05, and we set the dimension to 100.

Fluctuations of test

We illustrated the theoretical asymptotic normality result on simulated data for testing the two-sample problem (13), and we set precision matrices equal under a null hypothesis, i.e., Θ0[1]=Θ0[2].

Letting G be a p × p symmetric graph matrix with diagonal entries 0 and a proportion α̃ of off-diagonal elements equal to 1, and U be a p × p matrix with elements generated i.i.d. from the uniform distribution on the interval (0, 1), i.e., U(0, 1), we denote the elements of the symmetric matrix Θ̃ by θ̃ij. For i > j,

$$\tilde{\theta}_{ij} = \frac{g_{ij}u_{ij} + g_{ji}u_{ji}}{2} - 1\left\{0 < \frac{g_{ij}u_{ij} + g_{ji}u_{ji}}{2} < 0.5\right\}, \quad (25)$$

where gij and uij are the (i, j)-entry of G and U, respectively, and 1{·} is the indicator function. For i < j, we set θ˜ij=θ˜ji. The diagonal entries of matrix Θ˜ are zeros. Then, the precision matrix is generated as

Θ0[k]=Θ˜+(|Λmin(Θ˜)|+0.1)Ip. (26)

The matrix generated in this way is symmetric and positive definite. To push the non-zero entries away from 0 and to generate a sparse matrix, we subtract 1 from the small non-zero elements. In addition, the generation procedure shows that α̃ is a parameter controlling sparsity: when α̃ = 1, a dense matrix is generated. As is well known, sparsity of a matrix requires not only a small number of non-zero elements, but also non-zero elements with large absolute values. The parameter α̃ controls sparsity in terms of the number of non-zero elements.
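The generation scheme (25)-(26) can be sketched as follows; per the text, the subtract-1 shift is applied only to the small non-zero entries (the function name is ours):

```python
import numpy as np

def make_precision(p, alpha, seed=0):
    """Sparse, symmetric, positive-definite precision matrix via (25)-(26):
    alpha controls the proportion of non-zero off-diagonal entries."""
    rng = np.random.default_rng(seed)
    G = (rng.random((p, p)) < alpha).astype(float)   # graph pattern (0/1)
    U = rng.random((p, p))                           # Uniform(0,1) weights
    M = (G * U + (G * U).T) / 2                      # symmetrized entries
    # subtract 1 from small non-zero entries so they move away from 0
    Theta_t = M - ((M > 0) & (M < 0.5)).astype(float)
    np.fill_diagonal(Theta_t, 0.0)
    shift = abs(np.linalg.eigvalsh(Theta_t).min()) + 0.1   # as in (26)
    return Theta_t + shift * np.eye(p)

Theta0 = make_precision(p=30, alpha=0.05)
print(np.allclose(Theta0, Theta0.T), np.linalg.eigvalsh(Theta0).min() > 0)  # True True
```

Shifting by |Λmin(Θ̃)| + 0.1 guarantees a smallest eigenvalue of at least 0.1, so the result is always a valid precision matrix.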

We examined the fluctuation of $\sqrt{n}\,T_{ij}/\hat{\sigma}_{ij}$ under the (p, n) = (100, 200) and (p, n) = (100, 400) settings for the extremely sparse and dense precision matrix cases, respectively. For the extremely sparse precision matrix case, we set the parameter α̃ = 0.01, and for the dense case we use α̃ = 1.

We simulated the fluctuation for the extremely sparse case, shown in Fig 1, and for the dense case, shown in Fig 2. The indices (i, j) in the simulation were chosen at intervals. In fact, the CLT provides a method for testing any element of a linear combination of the precision matrices: theoretically, we can test whether the true value of any (i, j)-entry of Θ0 is zero or not.

Fig 1. The fluctuation for two-sample case with sparse precision matrix.


Histogram of $\sqrt{n}\,T_{ij}/\hat{\sigma}_{ij}$ for α̃ = 0.01. Here, T(i,j) = Tij and $\hat{\sigma}_{(i,j)} = \hat{\sigma}_{ij}$. The setting is (p, n) = (100, 200) with (i, j) ∈ {(1, 1), (1, 30), (1, 60), (1, 90)} for the four graphs in the first row. The sample size and dimension were set to (p, n) = (100, 400) for the four graphs in the second row.

Fig 2. The fluctuation for two-sample case with dense precision matrix.


Histogram of $\sqrt{n}\,T_{ij}/\hat{\sigma}_{ij}$ for α̃ = 1. Here, T(i,j) = Tij and $\hat{\sigma}_{(i,j)} = \hat{\sigma}_{ij}$. The setting is (p, n) = (100, 200) with (i, j) ∈ {(1, 1), (1, 30), (1, 60), (1, 90)} for the four graphs in the first row. The sample size and dimension were set to (p, n) = (100, 400) for the four graphs in the second row.

Average coverage probabilities

We demonstrate the performance of the test method in the K = 2 situation on the following hypotheses.

  • Equal Null. Testing hypothesis (13);

  • Linear Null. Testing the linear null hypothesis $H_0: a_1\Theta_{0ij}^{[1]} + a_2\Theta_{0ij}^{[2]} = 0$, i.e., $H_0: \Theta_{0ij}^{[2]} = -\frac{a_1}{a_2}\Theta_{0ij}^{[1]}$. Without loss of generality, we chose $-\frac{a_1}{a_2} = 0.5$ and Θ0[1] generated from (26).

From the global perspective, we used the average coverage, which is also considered in Janková and van de Geer [12]. Letting

$$I_{ij} \coloneqq \left[T_{ij} - 1.96\frac{\sigma_{ij}}{\sqrt{n}},\; T_{ij} + 1.96\frac{\sigma_{ij}}{\sqrt{n}}\right] \quad (27)$$

be the 95% asymptotic confidence interval for Θ0ij, we substitute the estimator σ^ij for σij to obtain the empirical version. The frequency of the true value being covered by the confidence interval (27) is defined as ϑ^ij. Then, the average coverage over a set A is denoted

AvgcovA=1|A|(i,j)Aϑ^ij. (28)

Let S denote the set of non-zero entries of Θ0[1]. It is easy to check that S = S1 = S2, since Θ0[1] and Θ0[2] have the same sparsity structure under the Equal Null and Linear Null cases. Thus, for the different null hypotheses, we simulated the average coverage over S and over its complement Sc. The sparsity parameter is α̃ = 0.1, 0.5, and 0.9.
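The average-coverage computation in (27)-(28) can be sketched as follows (array shapes and names are ours):

```python
import math
import numpy as np

def avg_coverage(T_reps, sigma_reps, theta0, n, A):
    """Empirical average coverage (28) of the 95% intervals (27) over a set A
    of matrix indices.  T_reps and sigma_reps have shape (reps, p, p)."""
    half = 1.96 * sigma_reps / math.sqrt(n)
    covered = (T_reps - half <= theta0) & (theta0 <= T_reps + half)
    return float(np.mean([covered[:, i, j].mean() for (i, j) in A]))

# toy check: when the statistic always equals the truth, coverage is 1
reps, p, n = 10, 4, 200
T = np.zeros((reps, p, p)); sig = np.ones((reps, p, p))
print(avg_coverage(T, sig, np.zeros((p, p)), n, [(0, 1), (2, 3)]))   # 1.0
```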

Most of the results in Table 1 meet our expectations, although the simulations are inevitably affected by randomness. In addition, the proposed method combines estimation and hypothesis testing, which accumulates error. The simulation results nevertheless provide guidance for practice.

Table 1. Estimated average coverage probabilities for K = 2 situation.

α̃	n	Equal Null (S)	Equal Null (Sc)	Linear Null (S)	Linear Null (Sc)
0.1	200	0.9886	0.9875	0.9101	0.9824
0.1	400	0.9885	0.9867	0.8607	0.9762
0.5	200	0.9880	0.9878	0.9384	0.9745
0.5	400	0.9870	0.9868	0.8820	0.9647
0.9	200	0.9901	0.9899	0.9509	0.9751
0.9	400	0.9889	0.9890	0.9091	0.9639

Multiple FGL case

For the multiple FGL case, we examined the fluctuation of the statistic Tij in the K = 3 situation on the following hypothesis.

  • Three-sample Linear Null. Testing the hypothesis $H_0: \Theta_{0ij}^{[3]} = -\frac{a_1}{a_3}\Theta_{0ij}^{[1]} - \frac{a_2}{a_3}\Theta_{0ij}^{[2]}$, where $-\frac{a_1}{a_3} = 0.6$ and $-\frac{a_2}{a_3} = 0.9$ are both generated from U(0, 1). Θ0[1] and Θ0[2] are both generated from (26) with sparsity parameters 0.01 and 0.1, respectively.

We set $-\frac{a_1}{a_3}$ and $-\frac{a_2}{a_3}$ to positive numbers, since the hypothesis-testing setting should guarantee that $\{\Theta_0^{[k]}\}_{k=1}^3$ are symmetric positive-definite matrices. Besides, for the Three-sample Linear Null, S denotes the set of non-zero entries of $-\frac{a_1}{a_3}\Theta_0^{[1]} - \frac{a_2}{a_3}\Theta_0^{[2]}$. The dimension and sample size are (p, n) = (100, 200) and (p, n) = (100, 400), respectively. Histograms of the proposed statistic Tij at the

(i,j){(1,1),(1,10),(1,20),(1,30)}

locations of the precision matrix are presented in Fig 3.

Fig 3. The fluctuation for multiple-sample case with dense precision matrix.


Histogram of $\sqrt{n}\,T_{ij}/\hat{\sigma}_{ij}$. Here, T(i,j) = Tij and $\hat{\sigma}_{(i,j)} = \hat{\sigma}_{ij}$. The setting is (p, n) = (100, 200) with (i, j) ∈ {(1, 1), (1, 10), (1, 20), (1, 30)} for the four graphs in the first row. The sample size and dimension were set to (p, n) = (100, 400) for the four graphs in the second row.

Real data application

Lymphoma is a malignant tumor whose incidence and mortality increase year by year. In this part, we apply the proposed method to two sets of diffuse large B-cell lymphoma (DLBCL) data, denoted DLBCL-A [19] and DLBCL-B [20], which are available at http://portals.broadinstitute.org/cgibin/cancer/datasets.cgi. Brief information on these datasets can be found in Table 2. The DLBCL-A and DLBCL-B datasets each have 3 subgroups, and the label and sample size of each subgroup are shown in the 5th column of Table 2. Both datasets are high-dimensional, with 662 genes but only a few observations: sample size 141 for DLBCL-A and 180 for DLBCL-B.

Table 2. Brief introduction to the gene profile expression datasets.

Dataset n p Subgroups Subgroup label (sample size)
DLBCL-A 141 662 3 I (49), II (50), III (42)
DLBCL-B 180 662 3 I (42), II (51), III (87)

Typically, one tests for differences in mean vectors across disease subgroups; however, the role of gene-to-gene interactions across different subtypes remains unclear. In this section, we use our test approach to identify whether the gene-to-gene interactions that most influence lymphoma behave the same across disease subtypes. For distinct subtypes of the same disease gene data, we focus on testing the equality of two precision matrices. The hypothesis testing problem is

$$H_0: \Theta_{0ij}^{\text{type } i} = \Theta_{0ij}^{\text{type } j} \quad \text{vs.} \quad H_1: \Theta_{0ij}^{\text{type } i} \ne \Theta_{0ij}^{\text{type } j},$$

where type i and type j are chosen from the set {I, II, III} in Table 2 and type i ≠ type j. We tune the parameters of the weighted FGL penalty in (3) by the AIC criterion. After the tuning procedure, we estimate the precision matrices and then return a p × p matrix whose (i, j)-th element is the p-value of the statistic Tij. The results are shown in Figs 4 and 5.

Fig 4. The p-values of proposed test for DLBCL-A dataset.


P-values of Tij by comparing subtype I and subtype II (left), subtype II and subtype III (middle), and subtype I and subtype III (right) with DLBCL-A dataset.

Fig 5. The p-values of proposed test for DLBCL-B dataset.


P-values of Tij by comparing subtype I and subtype II (left), subtype II and subtype III (middle), and subtype I and subtype III (right) with DLBCL-B dataset.

As can be seen in the figures, the interactions between genes in the DLBCL-A dataset are not the same among the three subtypes, while for the DLBCL-B dataset, the interactions between genes in the three subtypes are mostly similar.

Proof of theorem

Proof of Theorem 1

To prove Theorem 1, we need a lemma of Janková and van de Geer [16], which is presented as follows.

Lemma 7 Let f(Δ) ≔ tr(ΔΣ0) − [log det(Δ + Θ0) − log det(Θ0)]. Assume that 1/L ≤ λmin0) ≤ λmax0) ≤ L for some constant L ≥ 1. Then for all Δ such that ||Δ||F ≤ 1/(2L), f(Δ) is well defined and

$$f(\Delta) \ge \frac{1}{2(L + 1/(2L))^2}\|\Delta\|_F^2.$$

To simplify the notation, we write $\hat{\Sigma}_k$, $\Sigma_{0k}$, $\hat{\Theta}_k$, $\Theta_{0k}$ for $\hat{\Sigma}^{[k]}$, $\Sigma_0^{[k]}$, $\hat{\Theta}^{[k]}$, $\Theta_0^{[k]}$, respectively.

Proof 1 Note that $\hat{\Theta}_k$ is the minimizer of the fused graphical lasso objective for k = 1, 2. Let $\tilde{\Theta}_k = \alpha_k\hat{\Theta}_k + (1-\alpha_k)\Theta_{0k}$, with $\alpha_k = \frac{M}{M + \|\hat{\Theta}_k - \Theta_{0k}\|_F}$. According to the definition of $\tilde{\Theta}_k$ and the convexity of the loss function

$$F_n(\Theta_1, \Theta_2) = \mathrm{tr}(\Theta_1\hat{\Sigma}_1) - \log\det(\Theta_1) + \mathrm{tr}(\Theta_2\hat{\Sigma}_2) - \log\det(\Theta_2) + \lambda\|\Theta_1^-\|_1 + \lambda\|\Theta_2^-\|_1 + \rho\|\Theta_1^- - \Theta_2^-\|_1,$$

we obtain

$$F_n(\tilde{\Theta}_1, \tilde{\Theta}_2) \le F_n(\Theta_{01}, \Theta_{02}).$$

That is,

$$\sum_{k=1}^{2}\Big\{\mathrm{tr}\big((\tilde{\Theta}_k-\Theta_{0k})\hat{\Sigma}_k\big) - \big(\log\det(\tilde{\Theta}_k) - \log\det(\Theta_{0k})\big) + \lambda\|\tilde{\Theta}_k^-\|_1\Big\} + \rho\|\tilde{\Theta}_1^- - \tilde{\Theta}_2^-\|_1 \le \lambda\|\Theta_{01}^-\|_1 + \lambda\|\Theta_{02}^-\|_1 + \rho\|\Theta_{01}^- - \Theta_{02}^-\|_1. \quad (29)$$

Let $\Delta_k = \tilde{\Theta}_k - \Theta_{0k}$ and

$$f(\Delta_k) \coloneqq \mathrm{tr}(\Delta_k\Sigma_{0k}) - [\log\det(\Delta_k + \Theta_{0k}) - \log\det(\Theta_{0k})].$$

Subtracting $\mathrm{tr}(\Delta_1(\hat{\Sigma}_1 - \Sigma_{01})) + \mathrm{tr}(\Delta_2(\hat{\Sigma}_2 - \Sigma_{02}))$ from both sides of inequality (29), we get

$$f(\Delta_1) + f(\Delta_2) + \lambda\|\tilde{\Theta}_1^-\|_1 + \lambda\|\tilde{\Theta}_2^-\|_1 + \rho\|\tilde{\Theta}_1^- - \tilde{\Theta}_2^-\|_1 \le -\mathrm{tr}(\Delta_1(\hat{\Sigma}_1-\Sigma_{01})) - \mathrm{tr}(\Delta_2(\hat{\Sigma}_2-\Sigma_{02})) + \lambda\|\Theta_{01}^-\|_1 + \lambda\|\Theta_{02}^-\|_1 + \rho\|\Theta_{01}^- - \Theta_{02}^-\|_1. \quad (30)$$

For the $\mathrm{tr}(\Delta_k(\hat{\Sigma}_k-\Sigma_{0k}))$ term, we have

$$|\mathrm{tr}(\Delta_k(\hat{\Sigma}_k-\Sigma_{0k}))| = |G(\Delta_k \odot (\hat{\Sigma}_k-\Sigma_{0k}))| \le |G(\Delta_k^- \odot (\hat{\Sigma}_k^- - \Sigma_{0k}^-))| + |G(\Delta_k^+ \odot (\hat{\Sigma}_k^+ - \Sigma_{0k}^+))|,$$

where the function G(M) takes the summation of all the elements of the matrix M, and ⊙ is the Hadamard product. By the Cauchy–Schwarz inequality, on the set $\{\max_k\|\hat{\Sigma}_k-\Sigma_{0k}\|_\infty \le \lambda_0\}$,

$$|G(\Delta_k^- \odot (\hat{\Sigma}_k^- - \Sigma_{0k}^-))| + |G(\Delta_k^+ \odot (\hat{\Sigma}_k^+ - \Sigma_{0k}^+))| \le \|\hat{\Sigma}_k^- - \Sigma_{0k}^-\|_\infty\|\Delta_k^-\|_1 + \|\hat{\Sigma}_k^+ - \Sigma_{0k}^+\|_F\|\Delta_k^+\|_F \le \lambda_0\|\Delta_k^-\|_1 + \|\hat{\Sigma}_k^+ - \Sigma_{0k}^+\|_F\|\Delta_k^+\|_F.$$

Hence,

$$-\mathrm{tr}(\Delta_k(\hat{\Sigma}_k-\Sigma_{0k})) \le |\mathrm{tr}(\Delta_k(\hat{\Sigma}_k-\Sigma_{0k}))| \le \lambda_0\|\Delta_k^-\|_1 + \|\hat{\Sigma}_k^+ - \Sigma_{0k}^+\|_F\|\Delta_k^+\|_F. \quad (31)$$

Next, for Lk ≥ 1 satisfying the condition

$$1/L_k \le \lambda_{\min}(\Theta_{0k}) \le \lambda_{\max}(\Theta_{0k}) \le L_k,$$

we choose L > 1 satisfying 1/L ≤ 1/Lk and Lk ≤ L, k = 1, 2. Based on the definitions of Δk and $\tilde{\Theta}_k$, we get

$$\|\Delta_k\|_F = \alpha_k\|\hat{\Theta}_k - \Theta_{0k}\|_F = \frac{M\|\hat{\Theta}_k - \Theta_{0k}\|_F}{M + \|\hat{\Theta}_k - \Theta_{0k}\|_F} \le M \quad (32)$$

for arbitrary M in (0, 1/(2L)]. Thus, ‖Δk‖F is bounded by M. For the f(Δk) term, based on Lemma 7, we have

$$f(\Delta_k) \ge c\|\tilde{\Theta}_k - \Theta_{0k}\|_F^2, \quad (33)$$

where $c = \frac{1}{2(L + 1/(2L))^2}$. In particular, we may take c = 1/(8L²), and inequality (33) still holds.

Using bounds (31) and (33), inequality (30) becomes

$$c\|\tilde{\Theta}_1-\Theta_{01}\|_F^2 + c\|\tilde{\Theta}_2-\Theta_{02}\|_F^2 + \lambda\|\tilde{\Theta}_1^-\|_1 + \lambda\|\tilde{\Theta}_2^-\|_1 + \rho\|\tilde{\Theta}_1^--\tilde{\Theta}_2^-\|_1 \le \lambda_0\|\Delta_1^-\|_1 + \lambda_0\|\Delta_2^-\|_1 + \|\hat{\Sigma}_1^+-\Sigma_{01}^+\|_F\|\Delta_1^+\|_F + \|\hat{\Sigma}_2^+-\Sigma_{02}^+\|_F\|\Delta_2^+\|_F + \lambda\|\Theta_{01}^-\|_1 + \lambda\|\Theta_{02}^-\|_1 + \rho\|\Theta_{01}^--\Theta_{02}^-\|_1. \quad (34)$$

Rearranging and combining terms in (34) gives the following inequality:

$$c\|\tilde{\Theta}_1-\Theta_{01}\|_F^2 + c\|\tilde{\Theta}_2-\Theta_{02}\|_F^2 + \lambda\{\|\tilde{\Theta}_1^-\|_1 - \|\Theta_{01}^-\|_1 + \|\tilde{\Theta}_2^-\|_1 - \|\Theta_{02}^-\|_1\} \le \lambda_0\{\|\tilde{\Theta}_1^--\Theta_{01}^-\|_1 + \|\tilde{\Theta}_2^--\Theta_{02}^-\|_1\} + \rho\{\|\Theta_{01}^--\Theta_{02}^-\|_1 - \|\tilde{\Theta}_1^--\tilde{\Theta}_2^-\|_1\} + \|\hat{\Sigma}_1^+-\Sigma_{01}^+\|_F\|\tilde{\Theta}_1^+-\Theta_{01}^+\|_F + \|\hat{\Sigma}_2^+-\Sigma_{02}^+\|_F\|\tilde{\Theta}_2^+-\Theta_{02}^+\|_F. \quad (35)$$

Next we need to prove three inequalities:

$$\|\tilde{\Theta}_k^-\|_1 - \|\Theta_{0k}^-\|_1 \ge \|\Delta_{kS_k^c}^-\|_1 - \|\Delta_{kS_k}^-\|_1, \quad (36)$$
$$\|\tilde{\Theta}_k^- - \Theta_{0k}^-\|_1 \le \|\Delta_{kS_k^c}^-\|_1 + \|\Delta_{kS_k}^-\|_1, \quad (37)$$
$$\|\Theta_{01}^--\Theta_{02}^-\|_1 - \|\tilde{\Theta}_1^--\tilde{\Theta}_2^-\|_1 \le \|\tilde{\Theta}_1^--\Theta_{01}^-\|_1 + \|\tilde{\Theta}_2^--\Theta_{02}^-\|_1. \quad (38)$$

Because

$$\|\tilde{\Theta}_k^-\|_1 = \|\Theta_{0k}^- + \Delta_k^-\|_1 = \|\Theta_{0kS_k}^- + \Delta_{kS_k}^-\|_1 + \|\Delta_{kS_k^c}^-\|_1$$

and

$$\|\Theta_{0k}^-\|_1 = \|\Theta_{0kS_k}^-\|_1$$

hold, we have

$$\|\tilde{\Theta}_k^-\|_1 - \|\Theta_{0k}^-\|_1 = \|\Theta_{0kS_k}^- + \Delta_{kS_k}^-\|_1 + \|\Delta_{kS_k^c}^-\|_1 - \|\Theta_{0kS_k}^-\|_1 \ge \|\Delta_{kS_k^c}^-\|_1 - \big|\,\|\Theta_{0kS_k}^- + \Delta_{kS_k}^-\|_1 - \|\Theta_{0kS_k}^-\|_1\big| \ge \|\Delta_{kS_k^c}^-\|_1 - \|\Delta_{kS_k}^-\|_1,$$

which proves inequality (36). By the triangle inequality, we naturally obtain

$$\|\tilde{\Theta}_k^- - \Theta_{0k}^-\|_1 = \|\Delta_k^-\|_1 = \|\Delta_{kS_k^c}^- + \Delta_{kS_k}^-\|_1 \le \|\Delta_{kS_k^c}^-\|_1 + \|\Delta_{kS_k}^-\|_1.$$

Thus, inequality (37) holds. For inequality (38), we have

$$\|\Theta_{01}^- - \Theta_{02}^-\|_1 - \|\tilde{\Theta}_1^- - \tilde{\Theta}_2^-\|_1 = \|\Theta_{01}^- - \tilde{\Theta}_1^- + \tilde{\Theta}_1^- - \tilde{\Theta}_2^- + \tilde{\Theta}_2^- - \Theta_{02}^-\|_1 - \|\tilde{\Theta}_1^- - \tilde{\Theta}_2^-\|_1 \le \|\tilde{\Theta}_1^- - \Theta_{01}^-\|_1 + \|\tilde{\Theta}_2^- - \Theta_{02}^-\|_1.$$

Thus, inequality (35) yields

$$c\|\tilde{\Theta}_1-\Theta_{01}\|_F^2 + c\|\tilde{\Theta}_2-\Theta_{02}\|_F^2 + \lambda\{\|\Delta_{1S_1^c}^-\|_1 - \|\Delta_{1S_1}^-\|_1 + \|\Delta_{2S_2^c}^-\|_1 - \|\Delta_{2S_2}^-\|_1\} \le (\rho+\lambda_0)\{\|\Delta_{1S_1^c}^-\|_1 + \|\Delta_{1S_1}^-\|_1 + \|\Delta_{2S_2^c}^-\|_1 + \|\Delta_{2S_2}^-\|_1\} + \|\hat{\Sigma}_1^+-\Sigma_{01}^+\|_F\|\tilde{\Theta}_1^+-\Theta_{01}^+\|_F + \|\hat{\Sigma}_2^+-\Sigma_{02}^+\|_F\|\tilde{\Theta}_2^+-\Theta_{02}^+\|_F.$$

By taking 2(ρ + λ0) ≤ λ, we conclude that

$$2c\{\|\tilde{\Theta}_1-\Theta_{01}\|_F^2 + \|\tilde{\Theta}_2-\Theta_{02}\|_F^2\} + \lambda\{\|\Delta_{1S_1^c}^-\|_1 + \|\Delta_{2S_2^c}^-\|_1\} \le 3\lambda\{\|\Delta_{1S_1}^-\|_1 + \|\Delta_{2S_2}^-\|_1\} + 2\{\|\hat{\Sigma}_1^+-\Sigma_{01}^+\|_F\|\tilde{\Theta}_1^+-\Theta_{01}^+\|_F + \|\hat{\Sigma}_2^+-\Sigma_{02}^+\|_F\|\tilde{\Theta}_2^+-\Theta_{02}^+\|_F\}.$$

By the definition of Δk, we have

$$\|\Delta_k^-\|_1 = \|\Delta_{kS_k}^- + \Delta_{kS_k^c}^-\|_1 \le \|\Delta_{kS_k}^-\|_1 + \|\Delta_{kS_k^c}^-\|_1. \quad (39)$$

So we deduce that

$$2c\{\|\tilde{\Theta}_1-\Theta_{01}\|_F^2 + \|\tilde{\Theta}_2-\Theta_{02}\|_F^2\} + \lambda\{\|\Delta_1^-\|_1 + \|\Delta_2^-\|_1\} \le 4\lambda\{\|\Delta_{1S_1}^-\|_1 + \|\Delta_{2S_2}^-\|_1\} + 2\{\|\hat{\Sigma}_1^+-\Sigma_{01}^+\|_F\|\tilde{\Theta}_1^+-\Theta_{01}^+\|_F + \|\hat{\Sigma}_2^+-\Sigma_{02}^+\|_F\|\tilde{\Theta}_2^+-\Theta_{02}^+\|_F\}$$

holds. By the Cauchy–Schwarz inequality, $\|\Delta_{kS_k}^-\|_1 \le \sqrt{s_k}\,\|\Delta_{kS_k}^-\|_F$. Thus,

$$2c\{\|\tilde{\Theta}_1-\Theta_{01}\|_F^2 + \|\tilde{\Theta}_2-\Theta_{02}\|_F^2\} + \lambda\{\|\Delta_1^-\|_1 + \|\Delta_2^-\|_1\} \le 4\lambda\{\sqrt{s_1}\|\Delta_{1S_1}^-\|_F + \sqrt{s_2}\|\Delta_{2S_2}^-\|_F\} + 2\{\|\hat{\Sigma}_1^+-\Sigma_{01}^+\|_F\|\tilde{\Theta}_1^+-\Theta_{01}^+\|_F + \|\hat{\Sigma}_2^+-\Sigma_{02}^+\|_F\|\tilde{\Theta}_2^+-\Theta_{02}^+\|_F\}. \quad (40)$$

Using xy ≤ (x2 + y2)/2, the inequality (40) infer that

2c{||Θ˜1-Θ01||F2+||Θ˜2-Θ02||F2}+λ{||Δ1-||1+||Δ2-||1}12(c||Δ1S1-||F2+16λ2s1c+c||Δ2S2-||F2+16λ2s2c)+12(c||Θ˜1+-Θ01+||F2+4||Σ^1+-Σ01+||F2c+c||Θ˜2+-Θ02+||F2+4||Σ^2+-Σ02+||F2c).

Because

c||Θ˜k+-Θ0k+||F2+c||ΔkSk-||F2{c||Θ˜k+-Θ0k+||F2+c||Δk-||F2}+{c||ΔkSk-||F2+c||ΔkSkc-||F2+c||Δk+||F2}=2c||Δk||F2, (41)

we obtain

\[
2c\bigl\{\|\tilde{\Theta}_1-\Theta_{01}\|_F^2+\|\tilde{\Theta}_2-\Theta_{02}\|_F^2\bigr\}+\lambda\bigl\{\|\Delta_1^-\|_1+\|\Delta_2^-\|_1\bigr\} \le c\bigl\{\|\Delta_1\|_F^2+\|\Delta_2\|_F^2\bigr\}+\frac{8\lambda^2(s_1+s_2)}{c}+\frac{2\|\hat{\Sigma}_1^+-\Sigma_{01}^+\|_F^2}{c}+\frac{2\|\hat{\Sigma}_2^+-\Sigma_{02}^+\|_F^2}{c}.
\]

Thus,

\[
c\bigl\{\|\Delta_1\|_F^2+\|\Delta_2\|_F^2\bigr\}+\lambda\bigl\{\|\Delta_1^-\|_1+\|\Delta_2^-\|_1\bigr\} \le \frac{8\lambda^2(s_1+s_2)}{c}+\frac{2\|\hat{\Sigma}_1^+-\Sigma_{01}^+\|_F^2}{c}+\frac{2\|\hat{\Sigma}_2^+-\Sigma_{02}^+\|_F^2}{c}. \tag{42}
\]

Based on the inequality \(\|\hat{\Sigma}_k^+-\Sigma_{0k}^+\|_F \le \sqrt{p}\,\|\hat{\Sigma}_k^+-\Sigma_{0k}^+\|_\infty\), we have
\[
c\bigl\{\|\Delta_1\|_F^2+\|\Delta_2\|_F^2\bigr\}+\lambda\bigl\{\|\Delta_1^-\|_1+\|\Delta_2^-\|_1\bigr\} \le \frac{8\lambda^2(s_1+s_2)}{c}+\frac{4p\lambda_0^2}{c}. \tag{43}
\]
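The step from (42) to (43) uses the Frobenius-versus-supremum bound for the diagonal part: since \(A^+\) has at most \(p\) nonzero entries, \(\|A^+\|_F \le \sqrt{p}\,\|A^+\|_\infty\). A quick numerical check (illustrative names):

```python
import numpy as np

rng = np.random.default_rng(1)
p = 30
A = rng.normal(size=(p, p))
A_plus = np.diag(np.diag(A))   # diagonal ("+") part of A

fro = np.linalg.norm(A_plus, "fro")   # ||A^+||_F
sup = np.abs(A_plus).max()            # ||A^+||_inf (entrywise supremum norm)
# the diagonal part has at most p nonzero entries, hence ||A^+||_F <= sqrt(p) * ||A^+||_inf
assert fro <= np.sqrt(p) * sup + 1e-12
```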

Next, we prove that the conclusion still holds when \(\hat{\Theta}_k\) is substituted for \(\tilde{\Theta}_k\). According to the condition,
\[
\|\Delta_1\|_F^2+\|\Delta_2\|_F^2 \le \frac{\lambda_0}{2cL} \le \frac{\lambda}{4cL} \le \frac{1}{32L^2}.
\]
Taking \(M=1/(2\sqrt{2}L)<1/(2L)\), we have
\[
\|\Delta_1\|_F^2+\|\Delta_2\|_F^2 \le M^2/4.
\]
Thus, \(\|\Delta_k\|_F\) is bounded by \(M/2\). In addition,
\[
\|\hat{\Theta}_k-\Theta_{0k}\|_F=\frac{M\|\Delta_k\|_F}{M-\|\Delta_k\|_F},
\]
which means \(\|\hat{\Theta}_k-\Theta_{0k}\|_F\) is a monotonically increasing function of \(\|\Delta_k\|_F\) on the set \((0, M)\). We obtain \(\|\hat{\Theta}_k-\Theta_{0k}\|_F \le M\). Therefore, we can substitute \(\hat{\Theta}_k\) for \(\tilde{\Theta}_k\), so inequality (43) also holds for \(\hat{\Theta}_k\).

According to inequality (43), we get

\[
\|\hat{\Theta}_k-\Theta_{0k}\|_F^2 \le \frac{8\lambda^2(s_1+s_2)}{c^2}+\frac{4p\lambda_0^2}{c^2} \le \frac{\lambda^2(8s_1+8s_2+p)}{c^2},
\]

and

\[
\|\hat{\Theta}_k^--\Theta_{0k}^-\|_1 \le \frac{8\lambda(s_1+s_2)}{c}+\frac{4p\lambda_0^2}{\lambda c} \le \frac{\lambda(8s_1+8s_2+p)}{c}.
\]

Thus, we conclude the upper bound of \(\sum_{k=1}^2|||\hat{\Theta}_k-\Theta_{0k}|||_1\):
\[
\sum_{k=1}^2|||\hat{\Theta}_k-\Theta_{0k}|||_1 \le \sum_{k=1}^2\bigl(\|\hat{\Theta}_k^+-\Theta_{0k}^+\|_\infty+\|\hat{\Theta}_k^--\Theta_{0k}^-\|_1\bigr) \le \sum_{k=1}^2\bigl(\|\hat{\Theta}_k-\Theta_{0k}\|_F+\|\hat{\Theta}_k^--\Theta_{0k}^-\|_1\bigr) \le \frac{2\lambda\sqrt{8s_1+8s_2+p}}{c}+\frac{2\lambda(8s_1+8s_2+p)}{c} \le \frac{4\lambda(8s_1+8s_2+p)}{c}.
\]

Proof of Theorem 2

Proof 2 The minimizer \((\hat{\Theta}_R^{[1]},\hat{\Theta}_R^{[2]})\) satisfies inequality (42); that is,
\[
c\bigl\{\|\hat{\Theta}_R^{[1]}-\Theta_{R0}^{[1]}\|_F^2+\|\hat{\Theta}_R^{[2]}-\Theta_{R0}^{[2]}\|_F^2\bigr\}+\lambda\bigl\{\|(\hat{\Theta}_R^{[1]}-\Theta_{R0}^{[1]})^-\|_1+\|(\hat{\Theta}_R^{[2]}-\Theta_{R0}^{[2]})^-\|_1\bigr\} \le \frac{8\lambda^2(s_1+s_2)}{c}+\frac{2\|(\hat{R}^{[1]}-R_0^{[1]})^+\|_F^2}{c}+\frac{2\|(\hat{R}^{[2]}-R_0^{[2]})^+\|_F^2}{c}.
\]

The diagonal elements of R^[k] and R0[k] are all 1. Thus

\[
c\bigl\{\|\hat{\Theta}_R^{[1]}-\Theta_{R0}^{[1]}\|_F^2+\|\hat{\Theta}_R^{[2]}-\Theta_{R0}^{[2]}\|_F^2\bigr\}+\lambda\bigl\{\|(\hat{\Theta}_R^{[1]}-\Theta_{R0}^{[1]})^-\|_1+\|(\hat{\Theta}_R^{[2]}-\Theta_{R0}^{[2]})^-\|_1\bigr\} \le \frac{8\lambda^2(s_1+s_2)}{c}.
\]

Moreover, for the conclusion in the \(\ell_1\)-operator norm, we get
\[
|||\hat{\Theta}_R^{[1]}-\Theta_{R0}^{[1]}|||_1+|||\hat{\Theta}_R^{[2]}-\Theta_{R0}^{[2]}|||_1 \le \sum_{k=1}^2\bigl(\|(\hat{\Theta}_R^{[k]}-\Theta_{R0}^{[k]})^+\|_\infty+\|(\hat{\Theta}_R^{[k]}-\Theta_{R0}^{[k]})^-\|_1\bigr) \le \sum_{k=1}^2\bigl(\|\hat{\Theta}_R^{[k]}-\Theta_{R0}^{[k]}\|_F+\|(\hat{\Theta}_R^{[k]}-\Theta_{R0}^{[k]})^-\|_1\bigr) \le \frac{32\lambda(s_1+s_2)}{c}.
\]

For the minimizer \((\hat{\Theta}_w^{[1]},\hat{\Theta}_w^{[2]})\), the following inequality holds:
\[
|||\hat{\Theta}_R^{[k]}-\Theta_{R0}^{[k]}|||_1=|||\hat{W}^{[k]}\hat{\Theta}_w^{[k]}\hat{W}^{[k]}-W_0^{[k]}\Theta_{w0}^{[k]}W_0^{[k]}|||_1 \le \|\hat{W}^{[k]}\|^2\,|||\hat{\Theta}_w^{[k]}-\Theta_{w0}^{[k]}|||_1+\|\hat{W}^{[k]}-W_0^{[k]}\|\,|||\Theta_{w0}^{[k]}|||_1\,\|\hat{W}^{[k]}\|+\|W_0^{[k]}\|\,|||\Theta_{w0}^{[k]}|||_1\,\|\hat{W}^{[k]}-W_0^{[k]}\|. \tag{44}
\]

To draw the conclusion, we have the following facts:

  • A Sub-Gaussian vector with covariance \(\Sigma_0^{[k]}\) implies that \(\sqrt{n/\log p}\,\|\hat{\Sigma}^{[k]}-\Sigma_0^{[k]}\|_\infty\) is bounded in probability.

  • The eigenvalues of Θw0[k] are bounded by a constant.

Thus, \(|||\hat{\Theta}_R^{[k]}-\Theta_{R0}^{[k]}|||_1\) and \(|||\hat{\Theta}_w^{[k]}-\Theta_{w0}^{[k]}|||_1\) share the same bound.

Proof of Theorem 3

Proof 3 Similarly, let \(\hat{\Theta}_k\) be the minimizers of the fused graphical lasso for k = 1, 2, ⋯, K. Let \(\tilde{\Theta}_k=\alpha_k\hat{\Theta}_k+(1-\alpha_k)\Theta_{0k}\), with \(\alpha_k=\frac{M}{M+\|\hat{\Theta}_k-\Theta_{0k}\|_F}\). Denote
\[
F_n(\Theta_1,\ldots,\Theta_K)=\sum_{k=1}^K\bigl\{\mathrm{tr}(\Theta_k\hat{\Sigma}_k)-\log\det(\Theta_k)\bigr\}+\lambda\sum_{k=1}^K\|\Theta_k^-\|_1+\rho\sum_{k<k'}\|\Theta_k^--\Theta_{k'}^-\|_1;
\]

we obtain

\[
F_n(\tilde{\Theta}_1,\tilde{\Theta}_2,\ldots,\tilde{\Theta}_K) \le F_n(\Theta_{01},\Theta_{02},\ldots,\Theta_{0K}).
\]

Thus,

\[
\sum_{k=1}^K\bigl\{\mathrm{tr}\bigl((\tilde{\Theta}_k-\Theta_{0k})\hat{\Sigma}_k\bigr)-\bigl(\log\det(\tilde{\Theta}_k)-\log\det(\Theta_{0k})\bigr)+\lambda\|\tilde{\Theta}_k^-\|_1\bigr\}+\rho\sum_{k<k'}\|\tilde{\Theta}_k^--\tilde{\Theta}_{k'}^-\|_1 \le \lambda\sum_{k=1}^K\|\Theta_{0k}^-\|_1+\rho\sum_{k<k'}\|\Theta_{0k}^--\Theta_{0k'}^-\|_1.
\]
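The objective \(F_n\) defined above is straightforward to evaluate numerically. The following numpy sketch (function and variable names are ours, not from the paper) computes it for K groups:

```python
import numpy as np
from itertools import combinations

def offdiag_l1(T):
    """l1 norm of the off-diagonal ("-") part of a matrix T."""
    return np.abs(T).sum() - np.abs(np.diag(T)).sum()

def fgl_objective(Thetas, Sigmas, lam, rho):
    """Fused graphical lasso objective F_n for K groups (a sketch)."""
    loss = sum(np.trace(T @ S) - np.linalg.slogdet(T)[1]
               for T, S in zip(Thetas, Sigmas))
    sparsity = lam * sum(offdiag_l1(T) for T in Thetas)
    fusion = rho * sum(offdiag_l1(T1 - T2) for T1, T2 in combinations(Thetas, 2))
    return loss + sparsity + fusion

# toy evaluation: two groups, identity precision matrices (both penalties vanish)
rng = np.random.default_rng(2)
p, K = 5, 2
Sigmas = [np.cov(rng.normal(size=(100, p)), rowvar=False) for _ in range(K)]
Thetas = [np.eye(p) for _ in range(K)]
val = fgl_objective(Thetas, Sigmas, lam=0.1, rho=0.05)
assert val > 0   # log det(I) = 0, so the value is the sum of traces, which is positive
```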

Using the notations \(\Delta_k=\tilde{\Theta}_k-\Theta_{0k}\) and
\[
f(\Delta_k)\triangleq\mathrm{tr}(\Delta_k\Sigma_{0k})-\bigl[\log\det(\Delta_k+\Theta_{0k})-\log\det(\Theta_{0k})\bigr],
\]
we obtain the following expression:
\[
\sum_{k=1}^K f(\Delta_k)+\lambda\sum_{k=1}^K\|\tilde{\Theta}_k^-\|_1+\rho\sum_{k<k'}\|\tilde{\Theta}_k^--\tilde{\Theta}_{k'}^-\|_1 \le -\sum_{k=1}^K\mathrm{tr}\bigl(\Delta_k(\hat{\Sigma}_k-\Sigma_{0k})\bigr)+\lambda\sum_{k=1}^K\|\Theta_{0k}^-\|_1+\rho\sum_{k<k'}\|\Theta_{0k}^--\Theta_{0k'}^-\|_1. \tag{45}
\]
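The function \(f\) is the increase of the convex function \(g(\Theta)=\mathrm{tr}(\Theta\Sigma_{0k})-\log\det\Theta\) away from its minimizer \(\Theta_{0k}=\Sigma_{0k}^{-1}\) (where the gradient \(\Sigma_{0k}-\Theta^{-1}\) vanishes), so \(f(\Delta_k)\ge 0\) with equality only at \(\Delta_k=0\). A quick numerical sanity check (our own sketch; all matrices are synthetic):

```python
import numpy as np

rng = np.random.default_rng(3)
p = 6
A = rng.normal(size=(p, p))
Sigma0 = A @ A.T + p * np.eye(p)     # a synthetic positive definite covariance
Theta0 = np.linalg.inv(Sigma0)       # corresponding true precision matrix

def f(Delta):
    """f(Delta) = tr(Delta Sigma0) - [logdet(Theta0 + Delta) - logdet(Theta0)]."""
    s1 = np.linalg.slogdet(Theta0 + Delta)[1]
    s0 = np.linalg.slogdet(Theta0)[1]
    return np.trace(Delta @ Sigma0) - (s1 - s0)

for _ in range(20):
    B = rng.normal(size=(p, p)) * 0.001
    Delta = (B + B.T) / 2            # small symmetric perturbation keeping Theta0 + Delta pd
    assert f(Delta) >= -1e-10        # nonnegative up to floating-point noise
assert abs(f(np.zeros((p, p)))) < 1e-10   # and exactly zero at Delta = 0
```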

For \(L_k \ge 1\), k = 1, 2, ⋯, K, the minimum and maximum eigenvalues of \(\Theta_{0k}\) satisfy
\[
1/L_k \le \lambda_{\min}(\Theta_{0k}) \le \lambda_{\max}(\Theta_{0k}) \le L_k.
\]
For the multiple-group case, we select a constant \(L\) satisfying \(1/L \le 1/L_k\) and \(L_k \le L\) for all k. By similar analysis, for \(M\) in \((0, 1/(2L)]\), inequalities (32) and (33) still hold.

For \(K\) groups of data, based on inequalities (31) and (33), inequality (45) becomes
\[
c\sum_{k=1}^K\|\tilde{\Theta}_k-\Theta_{0k}\|_F^2+\lambda\sum_{k=1}^K\|\tilde{\Theta}_k^-\|_1+\rho\sum_{k<k'}\|\tilde{\Theta}_k^--\tilde{\Theta}_{k'}^-\|_1 \le \sum_{k=1}^K\bigl\{\lambda_0\|\Delta_k^-\|_1+\|\hat{\Sigma}_k^+-\Sigma_{0k}^+\|_F\|\Delta_k^+\|_F\bigr\}+\lambda\sum_{k=1}^K\|\Theta_{0k}^-\|_1+\rho\sum_{k<k'}\|\Theta_{0k}^--\Theta_{0k'}^-\|_1.
\]

Thus,

\[
c\sum_{k=1}^K\|\tilde{\Theta}_k-\Theta_{0k}\|_F^2+\lambda\sum_{k=1}^K\bigl\{\|\tilde{\Theta}_k^-\|_1-\|\Theta_{0k}^-\|_1\bigr\} \le \lambda_0\sum_{k=1}^K\|\tilde{\Theta}_k^--\Theta_{0k}^-\|_1+\rho\sum_{k<k'}\bigl\{\|\Theta_{0k}^--\Theta_{0k'}^-\|_1-\|\tilde{\Theta}_k^--\tilde{\Theta}_{k'}^-\|_1\bigr\}+\sum_{k=1}^K\bigl\{\|\hat{\Sigma}_k^+-\Sigma_{0k}^+\|_F\|\tilde{\Theta}_k^+-\Theta_{0k}^+\|_F\bigr\}. \tag{46}
\]

When k = 1, 2, ⋯, K, inequalities (36) and (37) still hold. Similarly, we have the following inequality:
\[
\|\Theta_{0k}^--\Theta_{0k'}^-\|_1-\|\tilde{\Theta}_k^--\tilde{\Theta}_{k'}^-\|_1=\|\Theta_{0k}^--\tilde{\Theta}_k^-+\tilde{\Theta}_k^--\tilde{\Theta}_{k'}^-+\tilde{\Theta}_{k'}^--\Theta_{0k'}^-\|_1-\|\tilde{\Theta}_k^--\tilde{\Theta}_{k'}^-\|_1 \le \|\tilde{\Theta}_k^--\Theta_{0k}^-\|_1+\|\tilde{\Theta}_{k'}^--\Theta_{0k'}^-\|_1. \tag{47}
\]

Thus, by (36), (37) and (47), inequality (46) yields
\[
c\sum_{k=1}^K\|\tilde{\Theta}_k-\Theta_{0k}\|_F^2+\lambda\sum_{k=1}^K\bigl\{\|\Delta_{kS_k^c}^-\|_1-\|\Delta_{kS_k}^-\|_1\bigr\} \le \lambda_0\sum_{k=1}^K\bigl\{\|\Delta_{kS_k^c}^-\|_1+\|\Delta_{kS_k}^-\|_1\bigr\}+\rho\sum_{k<k'}\bigl\{\|\Delta_{kS_k^c}^-\|_1+\|\Delta_{kS_k}^-\|_1+\|\Delta_{k'S_{k'}^c}^-\|_1+\|\Delta_{k'S_{k'}}^-\|_1\bigr\}+\sum_{k=1}^K\bigl\{\|\hat{\Sigma}_k^+-\Sigma_{0k}^+\|_F\|\tilde{\Theta}_k^+-\Theta_{0k}^+\|_F\bigr\} \le \Bigl(\frac{K(K-1)}{2}\rho+\lambda_0\Bigr)\sum_{k=1}^K\bigl\{\|\Delta_{kS_k^c}^-\|_1+\|\Delta_{kS_k}^-\|_1\bigr\}+\sum_{k=1}^K\bigl\{\|\hat{\Sigma}_k^+-\Sigma_{0k}^+\|_F\|\tilde{\Theta}_k^+-\Theta_{0k}^+\|_F\bigr\}.
\]

Since \(K\) is a fixed constant and \(2\bigl(\frac{K(K-1)}{2}\rho+\lambda_0\bigr)<\lambda\), we can obtain
\[
2c\sum_{k=1}^K\|\tilde{\Theta}_k-\Theta_{0k}\|_F^2+\lambda\sum_{k=1}^K\|\Delta_{kS_k^c}^-\|_1 \le 3\lambda\sum_{k=1}^K\|\Delta_{kS_k}^-\|_1+2\sum_{k=1}^K\bigl\{\|\hat{\Sigma}_k^+-\Sigma_{0k}^+\|_F\|\tilde{\Theta}_k^+-\Theta_{0k}^+\|_F\bigr\}.
\]

On the basis of the inequality (39), we deduce

\[
2c\sum_{k=1}^K\|\tilde{\Theta}_k-\Theta_{0k}\|_F^2+\lambda\sum_{k=1}^K\|\Delta_k^-\|_1 \le 4\lambda\sum_{k=1}^K\|\Delta_{kS_k}^-\|_1+2\sum_{k=1}^K\bigl\{\|\hat{\Sigma}_k^+-\Sigma_{0k}^+\|_F\|\tilde{\Theta}_k^+-\Theta_{0k}^+\|_F\bigr\}
\]

holds. In addition, one can get the inequality ||ΔkSk-||1sk||ΔkSk-||F. Thus

2ck=1K||Θ˜k-Θ0k||F2+λk=1K||Δk-||14λk=1K(sk||ΔkSk-||F)+2k=1K{||Σ^k+-Σ0k+||F||Θ˜k+-Θ0k+||F}. (48)

Based on \(xy \le (x^2+y^2)/2\) and inequality (41), inequality (48) implies that
\[
2c\sum_{k=1}^K\|\tilde{\Theta}_k-\Theta_{0k}\|_F^2+\lambda\sum_{k=1}^K\|\Delta_k^-\|_1 \le \frac{1}{2}\sum_{k=1}^K\Bigl(c\|\Delta_{kS_k}^-\|_F^2+\frac{16\lambda^2 s_k}{c}\Bigr)+\frac{1}{2}\sum_{k=1}^K\Bigl(c\|\tilde{\Theta}_k^+-\Theta_{0k}^+\|_F^2+\frac{4\|\hat{\Sigma}_k^+-\Sigma_{0k}^+\|_F^2}{c}\Bigr) \le c\sum_{k=1}^K\|\Delta_k\|_F^2+\frac{8\lambda^2\sum_{k=1}^K s_k}{c}+\frac{2\sum_{k=1}^K\|\hat{\Sigma}_k^+-\Sigma_{0k}^+\|_F^2}{c}.
\]

Thus,

\[
c\sum_{k=1}^K\|\Delta_k\|_F^2+\lambda\sum_{k=1}^K\|\Delta_k^-\|_1 \le \frac{8\lambda^2\sum_{k=1}^K s_k}{c}+\frac{2\sum_{k=1}^K\|\hat{\Sigma}_k^+-\Sigma_{0k}^+\|_F^2}{c}. \tag{49}
\]

Using the relation between the Frobenius norm and the supremum norm, we have

\[
c\sum_{k=1}^K\|\Delta_k\|_F^2+\lambda\sum_{k=1}^K\|\Delta_k^-\|_1 \le \frac{8\lambda^2\sum_{k=1}^K s_k}{c}+\frac{2Kp\lambda_0^2}{c}. \tag{50}
\]

According to inequality (50), we get
\[
\sum_{k=1}^K\|\Delta_k\|_F^2 \le \frac{\lambda_0}{2cL}.
\]
According to \(\lambda_0 \le \lambda/2\) and the condition \(\lambda \le c/(8L)\), we get
\[
\sum_{k=1}^K\|\Delta_k\|_F^2 \le \frac{1}{32L^2}.
\]
Taking \(M=1/(2\sqrt{2}L)<1/(2L)\), we have
\[
\sum_{k=1}^K\|\Delta_k\|_F^2 \le M^2/4.
\]

Thus, \(\|\Delta_k\|_F\) is bounded by \(M/2\). Further, we can derive \(\|\hat{\Theta}_k-\Theta_{0k}\|_F \le M\), which means that we can substitute \(\hat{\Theta}_k\) for \(\tilde{\Theta}_k\), so inequality (50) also holds for \(\hat{\Theta}_k\), i.e.,
\[
c\sum_{k=1}^K\|\hat{\Theta}_k-\Theta_{0k}\|_F^2+\lambda\sum_{k=1}^K\|(\hat{\Theta}_k-\Theta_{0k})^-\|_1 \le \frac{8\lambda^2\sum_{k=1}^K s_k}{c}+\frac{2Kp\lambda_0^2}{c}.
\]

This implies
\[
\sum_{k=1}^K|||\hat{\Theta}_k-\Theta_{0k}|||_1 \le \sum_{k=1}^K\bigl(\|\hat{\Theta}_k^+-\Theta_{0k}^+\|_\infty+\|\hat{\Theta}_k^--\Theta_{0k}^-\|_1\bigr) \le \sum_{k=1}^K\bigl(\|\hat{\Theta}_k-\Theta_{0k}\|_F+\|\hat{\Theta}_k^--\Theta_{0k}^-\|_1\bigr) \le K\Biggl[\frac{\lambda\sqrt{8\sum_{k=1}^K s_k+\frac{Kp}{2}}}{c}+\frac{\lambda\bigl(8\sum_{k=1}^K s_k+\frac{Kp}{2}\bigr)}{c}\Biggr] \le \frac{2K\lambda\bigl(8\sum_{k=1}^K s_k+\frac{Kp}{2}\bigr)}{c},
\]

which completes the proof.

Proof of Theorem 4

Proof 4 From (49), we get

\[
c\sum_{k=1}^K\|\hat{\Theta}_R^{[k]}-\Theta_{R0}^{[k]}\|_F^2+\lambda\sum_{k=1}^K\|(\hat{\Theta}_R^{[k]}-\Theta_{R0}^{[k]})^-\|_1 \le \frac{8\lambda^2\sum_{k=1}^K s_k}{c}+\frac{2\sum_{k=1}^K\|(\hat{R}^{[k]}-R_0^{[k]})^+\|_F^2}{c},
\]

and similarly derive

\[
c\sum_{k=1}^K\|\hat{\Theta}_R^{[k]}-\Theta_{R0}^{[k]}\|_F^2+\lambda\sum_{k=1}^K\|(\hat{\Theta}_R^{[k]}-\Theta_{R0}^{[k]})^-\|_1 \le \frac{8\lambda^2\sum_{k=1}^K s_k}{c}.
\]

Using

\[
\sum_{k=1}^K|||\hat{\Theta}_R^{[k]}-\Theta_{R0}^{[k]}|||_1 \le \sum_{k=1}^K\bigl(\|\hat{\Theta}_R^{[k]}-\Theta_{R0}^{[k]}\|_F+\|(\hat{\Theta}_R^{[k]}-\Theta_{R0}^{[k]})^-\|_1\bigr),
\]

we have

\[
\sum_{k=1}^K|||\hat{\Theta}_R^{[k]}-\Theta_{R0}^{[k]}|||_1 \le \frac{16K\lambda\sum_{k=1}^K s_k}{c}.
\]

Finally, using inequality (44), the analysis of the upper bounds of \(\|W_0^{[k]}\|\) and \(\|\hat{W}^{[k]}\|\), and the convergence rate of \(\|\hat{\Sigma}^{[k]}-\Sigma_0^{[k]}\|_\infty\), we draw the conclusion that

\[
\sum_{k=1}^K|||\hat{\Theta}_w^{[k]}-\Theta_0^{[k]}|||_1 \le \frac{16K\lambda\sum_{k=1}^K s_k}{c}.
\]

Proof of Theorem 5

Proof 5 First of all, we prove that the remainder converges in probability at the rate \(o_p(1/\sqrt{n})\). On account of Theorem 1, we get
\[
\|\mathrm{rem}\|_\infty \le \sum_{k=1}^2\|(\hat{\Theta}^{[k]}-\Theta_0^{[k]})(\hat{\Sigma}^{[k]}-\Sigma_0^{[k]})\Theta_0^{[k]}\|_\infty+\sum_{k=1}^2\|(\hat{\Theta}^{[k]}-\Theta_0^{[k]})(\hat{\Sigma}^{[k]}\hat{\Theta}^{[k]}-I_p)\|_\infty.
\]

Define

\[
l(\Theta)=\sum_{k=1}^2\bigl\{\mathrm{tr}(\hat{\Sigma}^{[k]}\Theta^{[k]})-\log\det(\Theta^{[k]})\bigr\}+\lambda\sum_{k=1}^2\|(\Theta^{[k]})^-\|_1+\rho\|(\Theta^{[1]}-\Theta^{[2]})^-\|_1.
\]

By the Karush-Kuhn-Tucker (KKT) conditions, we obtain
\[
\hat{\Sigma}^{[1]}-(\hat{\Theta}^{[1]})^{-1}+(\lambda+\rho)\hat{Z}^{[1]}=0, \tag{51}
\]
and
\[
\hat{\Sigma}^{[2]}-(\hat{\Theta}^{[2]})^{-1}+(\lambda-\rho)\hat{Z}^{[2]}=0, \tag{52}
\]
where \(\hat{Z}_{ij}^{[k]}=\mathrm{sign}(\hat{\Theta}_{ij}^{[k]})\) if \(\hat{\Theta}_{ij}^{[k]}\neq 0\), and \(\|\hat{Z}^{[k]}\|_\infty \le 1\). Multiplying Eq (51) by \(\hat{\Theta}^{[1]}\), we get
\[
I_p-\hat{\Sigma}^{[1]}\hat{\Theta}^{[1]}=(\lambda+\rho)\hat{Z}^{[1]}\hat{\Theta}^{[1]}.
\]
Similarly, we have
\[
I_p-\hat{\Sigma}^{[2]}\hat{\Theta}^{[2]}=(\lambda-\rho)\hat{Z}^{[2]}\hat{\Theta}^{[2]}.
\]
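The step of multiplying the stationarity condition by \(\hat{\Theta}^{[1]}\) is pure matrix algebra and can be sanity-checked numerically. In the sketch below all matrices are synthetic, and \(\hat{Z}\) is constructed directly from the stationarity equation rather than from an actual FGL solver:

```python
import numpy as np

rng = np.random.default_rng(4)
p, lam, rho = 5, 0.3, 0.1
B = rng.normal(size=(p, p))
Theta_hat = B @ B.T + p * np.eye(p)          # synthetic positive definite "estimate"
Sigma_hat = rng.normal(size=(p, p))
Sigma_hat = (Sigma_hat + Sigma_hat.T) / 2    # synthetic symmetric "sample covariance"

# define Z_hat so that Sigma_hat - Theta_hat^{-1} + (lam + rho) * Z_hat = 0 holds exactly
Z_hat = (np.linalg.inv(Theta_hat) - Sigma_hat) / (lam + rho)

# multiplying the condition by Theta_hat gives I - Sigma_hat Theta_hat = (lam + rho) Z_hat Theta_hat
lhs = np.eye(p) - Sigma_hat @ Theta_hat
rhs = (lam + rho) * Z_hat @ Theta_hat
assert np.allclose(lhs, rhs)
```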

Thus,

\[
\|\mathrm{rem}\|_\infty \le \sum_{k=1}^2|||\hat{\Theta}^{[k]}-\Theta_0^{[k]}|||_1\,\|\hat{\Sigma}^{[k]}-\Sigma_0^{[k]}\|_\infty\,|||\Theta_0^{[k]}|||_1+(\lambda+\rho)\sum_{k=1}^2|||\hat{\Theta}^{[k]}-\Theta_0^{[k]}|||_1\,\|\hat{Z}^{[k]}\|_\infty\,|||\hat{\Theta}^{[k]}|||_1.
\]

To draw the conclusion, we have

\[
|||\hat{\Theta}^{[k]}-\Theta_0^{[k]}|||_1 \le b(p+s)\lambda, \tag{53}
\]
where \(b\) is a constant related to \(L\). According to the Cauchy–Schwarz inequality and Weyl's inequality, we get
\[
|||\Theta_0^{[k]}|||_1 \le \sqrt{d+1}\,\Lambda_{\max}(\Theta_0^{[k]}). \tag{54}
\]

The bound of |||Θ^[k]|||1 is derived by

\[
|||\hat{\Theta}^{[k]}|||_1 \le |||\hat{\Theta}^{[k]}-\Theta_0^{[k]}|||_1+|||\Theta_0^{[k]}|||_1. \tag{55}
\]

According to the rate of λ, we conclude that

\[
|||\hat{\Theta}^{[k]}|||_1 \lesssim \sqrt{d+1}\,\Lambda_{\max}(\Theta_0^{[k]}). \tag{56}
\]

Besides, the Sub-Gaussian random vector with covariance \(\Sigma_0^{[k]}\) implies that \(\|\hat{\Sigma}^{[k]}-\Sigma_0^{[k]}\|_\infty=O_p(\sqrt{\log p/n})\), where \(O_p\) denotes boundedness in probability. We get
\[
\|\mathrm{rem}\|_\infty \le \frac{4\lambda(8s_1+8s_2+p)}{c}\sqrt{\frac{\log p}{n}}\,\sqrt{d+1}\,\max\bigl\{\Lambda_{\max}(\Theta_0^{[1]}),\Lambda_{\max}(\Theta_0^{[2]})\bigr\}+(\lambda+\rho)\frac{4\lambda(8s_1+8s_2+p)}{c}\sqrt{d+1}\,\max\bigl\{\Lambda_{\max}(\Theta_0^{[1]}),\Lambda_{\max}(\Theta_0^{[2]})\bigr\}.
\]

For \(\lambda \asymp \rho\), \(\|\mathrm{rem}\|_\infty\) is bounded in probability by \(\tilde{b}(p+s)\sqrt{d+1}\,\lambda^2\), where \(\tilde{b}\) is a constant related to \(L\). Based on the condition \((p+s)\sqrt{d}=o(\sqrt{n}/\log p)\), we have \(\|\mathrm{rem}\|_\infty=o_p(1/\sqrt{n})\). According to the bounded fourth moments of \((\hat{\Theta}^{[k]})_{ii}(\hat{\Theta}^{[k]})_{jj}+(\hat{\Theta}^{[k]})_{ij}^2\) and the Lindeberg central limit theorem, we complete the proof of Theorem 5.

Proof of Theorem 6

Proof 6 The conclusions of Theorem 6 can be obtained from the arguments (53)–(56). For the weighted version, \(\|\mathrm{rem}\|_\infty\) can be bounded by \(\tilde{b}s\sqrt{d+1}\,\lambda^2\), which completes the proof.

Data Availability

All data come from the website http://portals.broadinstitute.org/cgibin/cancer/datasets.cgi. However, we do not have the right to share these data.

Funding Statement

The author Q.Y. Zhang is supported by the Program for Youth Innovation Research in Capital University of Economics and Business (QNTD202207).

References

  • 1. Johnstone I. M. On the distribution of the largest eigenvalue in principal components analysis. Annals of Statistics. 2001;29(2):295–327. doi: 10.1214/aos/1009210544
  • 2. Yuan M., Lin Y. Model selection and estimation in the Gaussian graphical model. Biometrika. 2007;94(1):19–35. doi: 10.1093/biomet/asm018
  • 3. Friedman J., Hastie T., Tibshirani R. Sparse inverse covariance estimation with the graphical lasso. Biostatistics. 2008;9(3):432–441. doi: 10.1093/biostatistics/kxm045
  • 4. Rothman A. J., Bickel P. J., Levina E., Zhu J. Sparse permutation invariant covariance estimation. Electronic Journal of Statistics. 2008;2:494–515. doi: 10.1214/08-EJS176
  • 5. Fan J. Q., Feng Y., Wu Y. C. Network exploration via the adaptive lasso and SCAD penalties. Annals of Applied Statistics. 2009;3(2):521–541. doi: 10.1214/08-AOAS215SUPP
  • 6. Ravikumar P., Wainwright M. J., Raskutti G., Yu B. High-dimensional covariance estimation by minimizing l1-penalized log-determinant divergence. Electronic Journal of Statistics. 2011;5:935–980.
  • 7. Xue L. Z., Zou H. Regularized rank-based estimation of high-dimensional nonparanormal graphical models. Annals of Statistics. 2012;40(5):2541–2571. doi: 10.1214/12-AOS1041
  • 8. Yuan Y. P., Shen X. T., Pan W., Wang Z. Z. Constrained likelihood for reconstructing a directed acyclic Gaussian graph. Biometrika. 2019;106(1):109–125. doi: 10.1093/biomet/asy057
  • 9. Guo J., Levina E., Michailidis G., Zhu J. Joint estimation of multiple graphical models. Biometrika. 2011;98(1):1–15. doi: 10.1093/biomet/asq060
  • 10. Zhang X.-F., Ou-Yang L., Yan T., Hu X. T., Yan H. A joint graphical model for inferring gene networks across multiple subpopulations and data types. IEEE Transactions on Cybernetics. 2019;51(2):1043–1055. doi: 10.1109/TCYB.2019.2952711
  • 11. Danaher P., Wang P., Witten D. M. The joint graphical lasso for inverse covariance estimation across multiple classes. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 2014;76(2):373–397. doi: 10.1111/rssb.12033
  • 12. Janková J., van de Geer S. Confidence intervals for high-dimensional inverse covariance estimation. Electronic Journal of Statistics. 2015;9(1):1205–1229.
  • 13. Janková J., van de Geer S. Honest confidence regions and optimality in high-dimensional precision matrix estimation. Test. 2017;26(1):143–162. doi: 10.1007/s11749-016-0503-5
  • 14. Ren Z., Sun T., Zhang C.-H., Zhou H. H. Asymptotic normality and optimalities in estimation of large Gaussian graphical models. Annals of Statistics. 2015;43(3):991–1026. doi: 10.1214/14-AOS1286
  • 15. Yu M., Gupta V., Kolar M. Simultaneous inference for pairwise graphical models with generalized score matching. Journal of Machine Learning Research. 2020;21(91):1–51.
  • 16. Janková J., van de Geer S. Inference in high-dimensional graphical models. http://arxiv.org/abs/arXiv:1801.08512
  • 17. Brownlees C., Nualart E., Sun Y. C. Realized networks. Journal of Applied Econometrics. 2018;33(7):986–1006. doi: 10.1002/jae.2642
  • 18. Boyd S., Vandenberghe L. Convex Optimization. Cambridge University Press; 2004.
  • 19. Monti S., Savage K. J., Kutok J. L., Feuerhake F., Kurtin P., Mihm M., et al. Molecular profiling of diffuse large B-cell lymphoma identifies robust subtypes including one characterized by host inflammatory response. Blood. 2005;105(5):1851–1861. doi: 10.1182/blood-2004-07-2947
  • 20. Rosenwald A., Wright G., Chan W. C., Connors J. M., Campo E., Fisher R. I., et al. The use of molecular profiling to predict survival after chemotherapy for diffuse large-B-cell lymphoma. The New England Journal of Medicine. 2002;346(25):1937–1947. doi: 10.1056/NEJMoa012914

Decision Letter 0

Debo Cheng

3 Apr 2024

PONE-D-23-38136

Application of fused graphical lasso to statistical inference for multiple sparse precision matrices

PLOS ONE

Dear Dr. Zhang,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by May 18 2024 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Debo Cheng

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at 

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and 

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Note from Emily Chenette, Editor in Chief of PLOS ONE, and Iain Hrynaszkiewicz, Director of Open Research Solutions at PLOS: 

Did you know that depositing data in a repository is associated with up to a 25% citation advantage (https://doi.org/10.1371/journal.pone.0230416)? If you’ve not already done so, consider depositing your raw data in a repository to ensure your work is read, appreciated and cited by the largest possible audience. You’ll also earn an Accessible Data icon on your published paper if you deposit your data in any participating repository (https://plos.org/open-science/open-data/#accessible-data).

3. Please note that PLOS ONE has specific guidelines on code sharing for submissions in which author-generated code underpins the findings in the manuscript. In these cases, all author-generated code must be made available without restrictions upon publication of the work. 

Please review our guidelines at https://journals.plos.org/plosone/s/materials-and-software-sharing#loc-sharing-code and ensure that your code is shared in a way that follows best practice and facilitates reproducibility and reuse.

4. Please update your submission to use the PLOS LaTeX template. The template and more information on our requirements for LaTeX submissions can be found at http://journals.plos.org/plosone/s/latex.

5. We note that the grant information you provided in the ‘Funding Information’ and ‘Financial Disclosure’ sections do not match. 

When you resubmit, please ensure that you provide the correct grant numbers for the awards you received for your study in the ‘Funding Information’ section.

6. Thank you for stating the following in the Acknowledgments Section of your manuscript: 

"Q.Y. Zhang was supported by NSFC 12201430."

We note that you have provided funding information that is not currently declared in your Funding Statement. However, funding information should not appear in the Acknowledgments section or other areas of your manuscript. We will only publish funding information present in the Funding Statement section of the online submission form. 

Please remove any funding-related text from the manuscript and let us know how you would like to update your Funding Statement. Currently, your Funding Statement reads as follows: 

"The author(s) received no specific funding for this work."

Please include your amended statements within your cover letter; we will change the online submission form on your behalf.

7. Thank you for uploading your study's underlying data set. Unfortunately, the repository you have noted in your Data Availability statement does not qualify as an acceptable data repository according to PLOS's standards.

At this time, please upload the minimal data set necessary to replicate your study's findings to a stable, public repository (such as figshare or Dryad) and provide us with the relevant URLs, DOIs, or accession numbers that may be used to access these data. For a list of recommended repositories and additional information on PLOS standards for data deposition, please see https://journals.plos.org/plosone/s/recommended-repositories.

8. Your abstract cannot contain citations. Please only include citations in the body text of the manuscript, and ensure that they remain in ascending numerical order on first mention.

Additional Editor Comments:

Both reviewers acknowledged the paper's contributions, yet they also highlighted certain drawbacks. Therefore, they recommend a major revision to address these issues.

[Note: HTML markup is below. Please do not edit.]

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The authors study the problem of estimating multiple precision matrices from multiple populations. They propose to use the fused graphical lasso (FGL) method, which is a lasso penalty on sparsity of precision matrices plus a refined lasso penalty on the two precision matrices that restrains the similar structure across multiple groups. They obtain some inequalities for multiple estimators of FGL models in terms of $L_1$ and Frobenius matrix norms under several conditions. Built on the work of Janková and van de Geer (2015), who investigated a de-biasing technology to obtain a new consistent estimator with known distribution, the authors extend the statistical inference problem to multiple populations, and propose the de-biasing FGL estimators. The corresponding asymptotic property of de-biasing FGL estimators is provided. A simulation study and an application of the diffuse large B-cell lymphoma data demonstrate theoretical results.

It is a nice contribution. Some comments are in the attached file.

Reviewer #2: The authors use the fused graphical lasso (FGL) method to estimate multiple precision matrices from multiple populations simultaneously. In the high-dimensional setting, an oracle inequality is provided for FGL estimators, which is necessary to establish the central limit law. They not only focus on point estimation of precision matrices, but also work on hypothesis testing for a linear combination of the entries of multiple precision matrices. They extend Janková and van de Geer's [Confidence intervals for high-dimensional inverse covariance estimation, {\it Electron. J. Stat.} {\bf 9}(1) (2015) 1205-1229.] de-biasing technology to the multiple-population statistical inference problem and propose the de-biasing FGL estimators. The corresponding asymptotic property of de-biasing FGL estimators is provided. A simulation study and an application of the diffuse large B-cell lymphoma data show that the proposed test works well in high-dimensional situations.

However, I still have some questions about the manuscript.

1. As mentioned in the paper, the authors need to minimize the negative penalized log-likelihood function (1) and get the algorithm solution as an estimator in both numerical study and real data application. The authors use the ADMM algorithm to execute the estimating process with the classical AIC tuning method. Could the author demonstrate the ADMM algorithm for fused lasso? Will the solutions with different tuning methods affect the accuracy of the hypothesis testing process? Is there any better approach to tune the parameters?

2. In page 2, line 13: ``$\hat\Sigma^{-1}$" $\to$ ``$\hat\Sigma^{-1}_{n}$".

3. In page 4, line 2: ``we denote $(A)_{ij}$ its $(i,j)$-entry" $\to$ ``we denote $(A)_{ij}$ as $(i,j)$-entry of A".

4. In page 4, line 3 and line 18: The authors confused the symbols of the determinant and the cardinality, please distinguish them.

5. In page 11, line 17: what does notation $|A|$ mean? Please clarify the meaning of the symbol.

6. There are too many formulas with numbers. Please remove the numbers of formulas that are not used.

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

Attachment

Submitted filename: PONE-D-23-38136-review.pdf

pone.0304264.s001.pdf (70.2KB, pdf)
PLoS One. 2024 May 31;19(5):e0304264. doi: 10.1371/journal.pone.0304264.r002

Author response to Decision Letter 0


15 Apr 2024

Thanks for the comments of reviewers and editor. We have uploaded the “respond to reviewers.pdf” to the attachment. Please check the attachment. Thanks again.

Attachment

Submitted filename: Response to Reviewers_PONE_D_23_38136.pdf

pone.0304264.s002.pdf (174.8KB, pdf)

Decision Letter 1

Debo Cheng

9 May 2024

Application of fused graphical lasso to statistical inference for multiple sparse precision matrices

PONE-D-23-38136R1

Dear Dr. Zhang,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice will be generated when your article is formally accepted. Please note, if your institution has a publishing partnership with PLOS and your article meets the relevant criteria, all or part of your publication costs will be covered. Please make sure your user information is up-to-date by logging into Editorial Manager at Editorial Manager® and clicking the ‘Update My Information' link at the top of the page. If you have any questions relating to publication charges, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Debo Cheng

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Both reviewers agreed to accept this manuscript. I agree to recommend acceptance.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: (No Response)

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?


Reviewer #1: (No Response)

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: (No Response)

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?


Reviewer #1: (No Response)

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?


Reviewer #1: (No Response)

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: (No Response)

Reviewer #2: (No Response)

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

**********

Acceptance letter

Debo Cheng

14 May 2024

PONE-D-23-38136R1

PLOS ONE

Dear Dr. Zhang,

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now being handed over to our production team.

At this stage, our production department will prepare your paper for publication. This includes ensuring the following:

* All references, tables, and figures are properly cited

* All relevant supporting information is included in the manuscript submission,

* There are no issues that prevent the paper from being properly typeset

If revisions are needed, the production department will contact you directly to resolve them. If no revisions are needed, you will receive an email when the publication date has been set. At this time, we do not offer pre-publication proofs to authors during production of the accepted work. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few weeks to review your paper and let you know the next and final steps.

Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

If we can help with anything else, please email us at customercare@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Debo Cheng

Academic Editor

PLOS ONE
