Published in final edited form as: J Multivar Anal. 2024 Feb 13;202:105298. doi: 10.1016/j.jmva.2024.105298

Estimation of multiple networks with common structures in heterogeneous subgroups

Xing Qin a, Jianhua Hu b, Shuangge Ma c, Mengyun Wu b,*
PMCID: PMC10907012  NIHMSID: NIHMS1968373  PMID: 38433779

Abstract

Network estimation has been a critical component of high-dimensional data analysis and can provide an understanding of the underlying complex dependence structures. Among the existing studies, Gaussian graphical models have been highly popular. However, they still have limitations, owing to the homogeneous distribution assumption and the fact that they are applicable only to small-scale data. For example, cancers have various levels of unknown heterogeneity, and biological networks, which include thousands of molecular components, often differ across subgroups while also sharing some commonalities. In this article, we propose a new joint estimation approach for multiple networks with unknown sample heterogeneity, by decomposing the Gaussian graphical model (GGM) into a collection of sparse regression problems. A reparameterization technique and a composite minimax concave penalty are introduced to effectively accommodate the specific and common information across the networks of multiple subgroups. The proposed estimator thus advances significantly beyond the existing heterogeneous network analyses based directly on the regularized likelihood of the GGM, and enjoys scale-invariance, tuning-insensitivity, and convexity of optimization. The proposed analysis can be effectively realized using parallel computing. The estimation and selection consistency properties are rigorously established. The proposed approach allows the theoretical studies to focus on independent network estimation only and has the significant advantage of being both theoretically and computationally applicable to large-scale data. Extensive numerical experiments with simulated data and the TCGA breast cancer data demonstrate the prominent performance of the proposed approach in both subgroup and network identification.

Keywords: Gaussian graphical models, Heterogeneity analysis, High-dimensional data, Network estimation, Primary 62H30, Secondary 62H12

1. Introduction

Many modern applications involve analyzing a network structure for a set of high-dimensional variables. For example, biological networks specific to disease contexts explore the patterns of association in molecular data (e.g., genes, proteins, etc.) and play a critical role in understanding the underlying biological processes [4, 16, 30]. Several statistical methods have been developed for network estimation. Among them, Gaussian graphical models have been widely employed, where the precision matrix describes the conditional dependencies between variables [8, 38]. To capture sparsity patterns in the precision matrix, one strategy is to conduct estimation based on the penalized likelihood function [1, 12]. Another strategy is to reduce the estimation of the precision matrix to a collection of sparse regression problems [26], which is not only easy to optimize but also more amenable to theoretical analysis. Recent methodological developments include CLIME [6], Tiger [22], SCIO [24], and the scaled Lasso [34], among which Tiger is computationally the most efficient, owing to its square root loss function and asymptotically tuning-free property.

Despite considerable successes, these studies are limited to homogeneous analysis and assume that the high-dimensional data have only a single network. In practical applications, it is common that the observed data come from multiple subgroups and have heterogeneous dependencies. For example, in biomedical studies, cancer heterogeneity has received extensive attention, and biological networks often differ across subtypes of the same cancer while also having certain commonalities. To accommodate both the specific and common information among multiple networks, based on Gaussian graphical models, significant progress has been made recently using group regularization techniques, including the more popular likelihood-based strategies [2, 7, 14, 28] and relatively limited investigations from a sparse linear regression perspective in a column-by-column fashion [25].

However, all the aforementioned approaches rely on subgroup memberships that are known a priori, which are usually not available for practical high-dimensional data with complex and unknown group structures. To accommodate the unknown heterogeneity, the Gaussian mixture model serves as a suitable choice for simultaneously conducting subgroup and network identification. To this end, [13] develops a joint graphical lasso penalty on multiple precision matrices to extract both homogeneous and heterogeneous information across all subgroups. [15] takes a further step, developing an efficient ECM algorithm and establishing non-asymptotic statistical properties for the estimated networks. Recently, [29] develops a penalized fusion method for heterogeneity analysis based on the Gaussian mixture model, which has the advantage of being capable of automatically determining the number of subgroups. These approaches are based on likelihood regularization strategies and are often applicable only to small-scale data with dimensions less than 100.

In this study, we develop a new joint estimation approach for multiple networks with unknown sample heterogeneity. Based on the Gaussian mixture model, we propose estimating multiple precision matrices in a column-by-column manner with a reparameterization technique, and introduce a composite minimax concave penalty (MCP) to effectively accommodate both specific and common structures of the networks. This is significantly different from the existing heterogeneous network estimation publications [15, 29], which directly maximize a penalized likelihood with respect to large matrices, and has the advantage of automatically imposing an adaptive penalty on the parameters, resulting in scale-invariant estimators and potentially tuning-insensitive properties. This study is much more than an extension of the homogeneous column-wise Gaussian graphical model [22, 25]. Specifically, in contrast to [22], which is based on the square root loss function, the proposed approach is based on the ordinary squared loss, enjoying simplicity and achieving convexity of optimization. In addition, advancing significantly beyond [25], the proposed approach provides equivalent estimates under scaling in mixture models, which is much more important than in homogeneous analysis. The proposed approach allows theoretical studies to focus on independent network estimation only (disregarding the diagonal elements of the precision matrices) and can take advantage of parallel computation for greater efficiency, making it theoretically and computationally feasible for large-scale data. Overall, this study provides a practically useful tool for the joint estimation of multiple networks from heterogeneous subgroups.

2. Methods

Suppose that there are $n$ independent subjects, which come from $K$ subgroups. For the $i$th subject, denote $x_i=(x_{i,1},\ldots,x_{i,p})^\top$ as the $p$-dimensional vector of predictor measurements and $C_i\in\{1,\ldots,K\}$ as the subgroup assignment. Assume that

$$x_i \mid C_i=k \sim \mathcal{N}_p\big(\mu_k,\ \Omega_k^{-1}\big), \qquad (1)$$

where $\mu_k=(\mu_{k,1},\ldots,\mu_{k,p})^\top$ and $\Omega_k=(\omega_{k,\ell j})_{p\times p}$ are the mean vector and precision matrix for the $k$th subgroup, respectively. This Gaussian graphical model has been popular in existing network analysis publications. For the $k$th subgroup, the relationships between the predictors in the network are measured using the subgroup-specific precision matrix $\Omega_k$, which describes the conditional independence between any two predictors given the rest. Specifically, if $\omega_{k,\ell j}\neq 0$, there is a connection between predictors $\ell$ and $j$ for the $k$th subgroup, and there is no connection if $\omega_{k,\ell j}=0$.

Based on (1), when $C_i=k$, the conditional distribution of $x_{i,j}$ given $x_{i,\backslash j}=(x_{i,\ell}:\ell\neq j)\in\mathbb{R}^{(p-1)\times 1}$ is

$$x_{i,j}\mid x_{i,\backslash j};\,C_i=k \sim \mathcal{N}\Big(\mu_{k,j}-\sum_{\ell\neq j}\frac{\omega_{k,\ell j}}{\omega_{k,jj}}\,(x_{i,\ell}-\mu_{k,\ell}),\ \frac{1}{\omega_{k,jj}}\Big). \qquad (2)$$

For $\ell\neq j$, define

$$\ddot\beta_{k,j,\ell}:=-\frac{\omega_{k,\ell j}}{\omega_{k,jj}},\qquad \ddot\beta_{k,j,j}:=\mu_{k,j}-\sum_{\ell\neq j}\ddot\beta_{k,j,\ell}\,\mu_{k,\ell},\qquad \text{and}\qquad \sigma_{k,j}^2:=\frac{1}{\omega_{k,jj}}. \qquad (3)$$

Then (2) can be rewritten as:

$$x_{i,j}\mid x_{i,\backslash j};\,C_i=k \sim \mathcal{N}\big(\ddot\beta_{k,j,j}+x_{i,\backslash j}^\top\ddot\beta_{k,j,\backslash j},\ \sigma_{k,j}^2\big), \qquad (4)$$

where $\ddot\beta_{k,j,\backslash j}=(\ddot\beta_{k,j,\ell}:\ell\neq j)\in\mathbb{R}^{(p-1)\times 1}$. Based on (3), estimation of the $\Omega_k$'s can be achieved by considering the conditional distribution (4) for $j\in\{1,\ldots,p\}$, where $\omega_{k,\ell j}\neq 0$ corresponds to $\ddot\beta_{k,j,\ell}\neq 0$ for $\ell\neq j$.
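To make this correspondence concrete, here is a minimal R sketch (the function name is ours, for illustration only) that recovers one column of a precision matrix from the node-wise regression parameters in (3):

```r
# Minimal sketch: recover column j of a precision matrix from the node-wise
# regression parameters in (3). beta_j is the (p-1)-vector of slopes from
# regressing x_j on the remaining predictors; sigma2_j is the residual
# variance 1/omega_jj.
precision_column <- function(beta_j, sigma2_j, j, p) {
  omega_j <- numeric(p)
  omega_j[j] <- 1 / sigma2_j          # diagonal entry: omega_jj = 1 / sigma_j^2
  omega_j[-j] <- -beta_j / sigma2_j   # off-diagonal: omega_lj = -beta_l * omega_jj
  omega_j                             # zero slopes map to zero entries (no edge)
}
```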

In practice, the subgroup assignments $C_i$ are not always observed. To accommodate the unknown sample heterogeneity, we consider the following conditional Gaussian mixture model:

$$x_{i,j}\mid x_{i,\backslash j} \sim \sum_{k=1}^K \pi_k\,\mathcal{N}\big(\ddot\beta_{k,j,j}+x_{i,\backslash j}^\top\ddot\beta_{k,j,\backslash j},\ \sigma_{k,j}^2\big),$$

where $\pi_k=\Pr(C_i=k)$ is the probability that a subject belongs to the $k$th subgroup, with $\sum_{k=1}^K\pi_k=1$. For more effective heterogeneous network estimation while accommodating the common structure across the $K$ networks, we consider the reparameterization, for each $k\in\{1,\ldots,K\}$ and $j,\ell\in\{1,\ldots,p\}$:

$$\tau_{k,j}=1/\sigma_{k,j},\qquad \beta_{k,j,\ell}=\ddot\beta_{k,j,\ell}/\sigma_{k,j}, \qquad (5)$$

and propose the following penalized objective function for each $j\in\{1,\ldots,p\}$:

$$L_n(\theta_j)=\sum_{i=1}^n\ln\Big\{\sum_{k=1}^K\pi_k\,g\big(x_{i,j};\ x_{i,\backslash j}^\top\beta_{k,j,\backslash j}+\beta_{k,j,j},\ \tau_{k,j}\big)\Big\}-n\sum_{\ell\neq j}\rho\Big\{\sum_{k=1}^K\rho(|\beta_{k,j,\ell}|;\lambda,\gamma);\ \lambda,\ \frac{K\lambda\gamma}{2}\Big\}, \qquad (6)$$

where $\theta_j=\mathrm{vec}(\pi_1,\ldots,\pi_K,\alpha_j)$ is the vector of the overall unknown parameters in the $j$th subproblem, with $\alpha_j=\mathrm{vec}(\tau_{1,j},\ldots,\tau_{K,j},\beta_j)$ and $\beta_j=\mathrm{vec}(\beta_{1,j,j},\beta_{1,j,\backslash j},\ldots,\beta_{K,j,j},\beta_{K,j,\backslash j})$; $g(x_{i,j};\ x_{i,\backslash j}^\top\beta_{k,j,\backslash j}+\beta_{k,j,j},\ \tau_{k,j})=\ln\tau_{k,j}-\big(\tau_{k,j}x_{i,j}-\beta_{k,j,j}-x_{i,\backslash j}^\top\beta_{k,j,\backslash j}\big)^2/2$; and $\rho(v;\lambda,\gamma)=\lambda\int_0^{v}\{1-x/(\lambda\gamma)\}_+\,dx$ is the minimax concave penalty (MCP), with tuning parameter $\lambda$ and regularization parameter $\gamma$. Here, following the existing literature [3], the regularization parameter of the outer penalty is chosen to be $K\lambda\gamma/2$, ensuring that the group-level penalty reaches its maximum if and only if each of its components is at its maximum.

The proposed estimation procedure has been motivated by the following considerations. In (6), the first term is a reparameterized version of the log-likelihood function based on the distribution of $x_{i,j}$ given $x_{i,\backslash j}$. Different from the studies that estimate the precision matrix based on the joint distribution of $x_i$ directly, the proposed conditional distribution-based strategy reduces the estimation problem of a large matrix to a collection of sparse linear regression problems, enjoying not only great computational efficiency but also satisfactory theoretical properties. Specifically, as demonstrated in [33], scale-invariance considerations and theoretical results indicate the necessity of taking penalty levels proportional to noise levels in high-dimensional problems. Thus, taking advantage of the column-wise strategy, we utilize reparameterization to automatically scale the penalty to the noise level and make the proposed estimator scale-invariant [32, 42]. Different from the existing heterogeneous network analysis studies [13, 15, 29], such a strategy achieves "adaptive adjustment" of the coefficients based on the data characteristics, resulting in more accurate estimation. The proposed model is closely related to scaled sparse linear regression [33], which does not require any knowledge of the noise and has theoretical properties comparable to those established under known noise. In contrast, penalizing the $\ddot\beta_{k,j,\ell}$'s directly on the basis of (4) leads to an indirect effect on the estimation of the scale parameters $\sigma_{k,j}$ and to non-convex optimization, which may have serious consequences in mixture model analysis. Additionally, such a reparameterization strategy enjoys the advantage that the identification and estimation results are insensitive to the choice of tuning parameters, which simplifies the tuning procedure. In the second term, we apply the composite MCP [3] to capture heterogeneity and homogeneity across all $K$ subgroups. According to the relationship between $\beta_{k,j,\ell}$ and $\omega_{k,\ell j}$, the outer MCP encourages the $K$ estimated precision matrices to share a similar pattern of sparsity, accommodating the potential common network structure across the subgroups: $\beta_{1,j,\ell},\ldots,\beta_{K,j,\ell}$ are shrunk to zero simultaneously or not at all. On the other hand, the inner MCP imposed on the $\beta_{k,j,\ell}$'s further sparsifies each precision matrix to accommodate the subgroup-specific network structures. Here, different from the sparse group lasso penalty commonly used in the literature [13, 15], the adopted composite MCP improves the accuracy and interpretability of the model by allowing important relationships to have large coefficients without overshrinking. As our primary goal is to estimate subgroup-specific precision matrices and investigate heterogeneous networks, the intercept terms $\beta_{k,j,j}$ are not subject to penalization. The model can be easily extended to accommodate sparse learning for the subgroup means as well.
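For concreteness, the MCP and the composite penalty in (6) can be written in closed form; the following R sketch (function names are ours) is one way to do so:

```r
# MCP rho(v; lambda, gamma) in closed form: quadratic up to lambda * gamma,
# constant (lambda^2 * gamma / 2) beyond, so large coefficients are not shrunk.
mcp <- function(v, lambda, gamma) {
  v <- abs(v)
  ifelse(v <= lambda * gamma,
         lambda * v - v^2 / (2 * gamma),
         lambda^2 * gamma / 2)
}

# Composite MCP for one (j, l) pair across K subgroups: an outer MCP with
# regularization parameter K * lambda * gamma / 2 applied to the sum of the
# inner MCPs, so the group penalty saturates iff every component saturates.
composite_mcp <- function(beta_kl, lambda, gamma = 3) {
  K <- length(beta_kl)
  mcp(sum(mcp(beta_kl, lambda, gamma)), lambda, K * lambda * gamma / 2)
}
```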

2.1. Computation

We develop an Expectation Maximization (EM) algorithm to optimize the objective function (6). Specifically, we introduce a latent variable $z_{ki}=I\{C_i=k\}$, where $I\{\cdot\}$ is the indicator function, and then consider the following complete objective function:

$$Q_n(\theta_j)=\sum_{i=1}^n\sum_{k=1}^K z_{ki}\ln\big\{\pi_k\,g\big(x_{i,j};\ x_{i,\backslash j}^\top\beta_{k,j,\backslash j}+\beta_{k,j,j},\ \tau_{k,j}\big)\big\}-n\sum_{\ell\neq j}\rho\Big\{\sum_{k=1}^K\rho(|\beta_{k,j,\ell}|;\lambda,\gamma);\ \lambda,\ \frac{K\lambda\gamma}{2}\Big\}. \qquad (7)$$

Denote $\Psi=\mathrm{vec}(\mu_1,\ldots,\mu_K,\Omega_1,\ldots,\Omega_K,\pi_1,\ldots,\pi_K)$ and $\alpha=\mathrm{vec}(\alpha_1,\ldots,\alpha_p)$. The computation proceeds with the following steps.

  1. Initialization: Set the iteration counter $t=0$. For each $k\in\{1,\ldots,K\}$, initialize $\pi_k^{(0)}=1/K$ and $\Omega_k^{(0)}=I_p$, where $I_p$ is the identity matrix of dimension $p$. Randomly divide the subjects into $K$ subgroups of equal size. For each $k\in\{1,\ldots,K\}$, take the mean of each subgroup as the initial value $\mu_k^{(0)}$.

  2. In the $t$th E step, the conditional expectation of (7) with respect to $\Psi^{(t-1)}$ obtained at the $(t-1)$th iteration is

    $$E_{\Psi^{(t-1)}}\{Q_n(\theta_j)\}=\sum_{i=1}^n\sum_{k=1}^K\rho_{ki}^{(t)}\ln\big\{\pi_k\,g\big(x_{i,j};\ x_{i,\backslash j}^\top\beta_{k,j,\backslash j}+\beta_{k,j,j},\ \tau_{k,j}\big)\big\}-n\sum_{\ell\neq j}\rho\Big\{\sum_{k=1}^K\rho(|\beta_{k,j,\ell}|;\lambda,\gamma);\ \lambda,\ \frac{K\lambda\gamma}{2}\Big\} \qquad (8)$$

    for $j\in\{1,\ldots,p\}$. Here $\rho_{ki}^{(t)}=\Pr(z_{ki}=1\mid x_i;\Psi^{(t-1)})=\pi_k^{(t-1)}f_k(x_i;\mu_k^{(t-1)},\Omega_k^{(t-1)})\big/\sum_{\ell=1}^K\pi_\ell^{(t-1)}f_\ell(x_i;\mu_\ell^{(t-1)},\Omega_\ell^{(t-1)})$, with $f_k$ being the multivariate Gaussian density function with subgroup-specific mean vector $\mu_k^{(t-1)}$ and precision matrix $\Omega_k^{(t-1)}$, where $\mu_k^{(t-1)}$ and $\Omega_k^{(t-1)}$ are computed based on the estimate of $\alpha$ at the $(t-1)$th iteration and equations (3) and (5) (see the supplementary materials for more details). Here, in consideration of the symmetry of the $\Omega_k$'s, following [22], we compute $\omega_{k,\ell j}^{(t-1)}=\{\omega_{k,\ell j}^{(t-1)}+\omega_{k,j\ell}^{(t-1)}\}/2$. To further ensure the positive definiteness of the precision matrices, a small diagonal matrix is added to the $\Omega_k$'s [25].
  3. In the $t$th M step,
    • Maximize (8) with respect to $\pi_k$ and compute $\pi_k^{(t)}=\sum_{i=1}^n\rho_{ki}^{(t)}/n$.
    • Maximize (8) with respect to $\alpha_j$. For each $j\in\{1,\ldots,p\}$, the following two steps are carried out sequentially.
      (1) With the other parameters fixed, optimizing $Q_n(\theta_j)$ with respect to $\tau_{k,j}$ for each $k\in\{1,\ldots,K\}$ yields

      $$\tau_{k,j}^{(t)}=\arg\min_{\tau_{k,j}}\Big\{\sum_{i=1}^n\rho_{ki}^{(t)}\frac{\big(\tau_{k,j}x_{i,j}-\beta_{k,j,j}^{(t-1)}-x_{i,\backslash j}^\top\beta_{k,j,\backslash j}^{(t-1)}\big)^2}{2}-\sum_{i=1}^n\rho_{ki}^{(t)}\ln\tau_{k,j}\Big\}=\frac{\tilde b_k^{(t)}+\sqrt{(\tilde b_k^{(t)})^2+4n\tilde a_k^{(t)}\pi_k^{(t)}}}{2\tilde a_k^{(t)}},$$

      where $\tilde a_k^{(t)}=\sum_{i=1}^n\rho_{ki}^{(t)}x_{i,j}^2$ and $\tilde b_k^{(t)}=\sum_{i=1}^n\rho_{ki}^{(t)}\big(x_{i,\backslash j}^\top\beta_{k,j,\backslash j}^{(t-1)}+\beta_{k,j,j}^{(t-1)}\big)x_{i,j}$.
      (2) With the other parameters fixed, optimizing $Q_n(\theta_j)$ with respect to $\beta_j$ yields

      $$\beta_j^{(t)}=\arg\min_{\beta_j}\Big[\sum_{i=1}^n\sum_{k=1}^K\rho_{ki}^{(t)}\frac{\big(\tau_{k,j}^{(t)}x_{i,j}-\beta_{k,j,j}-x_{i,\backslash j}^\top\beta_{k,j,\backslash j}\big)^2}{2}+n\sum_{\ell\neq j}\rho\Big\{\sum_{k=1}^K\rho(|\beta_{k,j,\ell}|;\lambda,\gamma);\ \lambda,\ \frac{K\lambda\gamma}{2}\Big\}\Big], \qquad (9)$$

      which is a weighted linear regression problem with the composite MCP and can be solved using the R package grpreg [3], as sketched below.
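The following R sketch illustrates how the update (9) can be cast for grpreg: the $K$ weighted regressions are stacked, the weights $\sqrt{\rho_{ki}^{(t)}}$ are absorbed into the design and response, and the coefficients of each predictor across the $K$ subgroups form one group under the "cMCP" penalty. This is an illustration under our own variable names, not the exact implementation in MultiNet; in particular, the per-subgroup intercepts $\beta_{k,j,j}$ are omitted for brevity.

```r
library(grpreg)

# Sketch of the M-step update (9) for one column j.
# X:   n x (p-1) matrix of x_{i,\j};  y: n-vector of x_{i,j}
# rho: n x K matrix of posterior weights rho_{ki}^{(t)}
# tau: length-K vector of current tau_{k,j}^{(t)}
update_beta_j <- function(X, y, rho, tau, lambda, gamma = 3) {
  n <- nrow(X); p1 <- ncol(X); K <- ncol(rho)
  Xs <- matrix(0, n * K, p1 * K)
  ys <- numeric(n * K)
  for (k in seq_len(K)) {
    w <- sqrt(rho[, k])                      # sqrt-weights absorb rho_{ki}
    rows <- (k - 1) * n + seq_len(n)
    cols <- seq(k, by = K, length.out = p1)  # predictor-major column layout
    Xs[rows, cols] <- w * X
    ys[rows] <- w * tau[k] * y               # response scaled by tau_{k,j}^{(t)}
  }
  # One group per predictor l, collecting its K subgroup coefficients;
  # the unpenalized per-subgroup intercepts would enter as extra columns.
  grp <- rep(seq_len(p1), each = K)
  fit <- grpreg(Xs, ys, group = grp, penalty = "cMCP",
                family = "gaussian", lambda = lambda, gamma = gamma)
  matrix(coef(fit)[-1], nrow = K)            # K x (p-1) matrix of beta_{k,j,l}
}
```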

By iterating the E and M steps, convergence is usually achieved within a moderate number of iterations; in our numerical study, convergence is declared when $\sum_{k=1}^K\big\{\|\mu_k^{(t)}-\mu_k^{(t-1)}\|_2/\|\mu_k^{(t)}\|_2+\|\Omega_k^{(t)}-\Omega_k^{(t-1)}\|_F/\|\Omega_k^{(t)}\|_F\big\}\leq 0.01$, with $\|\Omega_k^{(t)}\|_F=\big\{\sum_{i,j}(\omega_{k,ij}^{(t)})^2\big\}^{1/2}$ being the Frobenius norm of $\Omega_k^{(t)}$. To improve the performance of the proposed EM algorithm, following the published literature [42], we consider multiple random initializations of the subjects' subgroup memberships and choose the estimate with the smallest AIC, defined below, as the final estimate.
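The stopping rule translates directly into code; a minimal R sketch, assuming lists of subgroup-wise mean vectors and precision matrices (names ours):

```r
# Sketch of the EM stopping rule: relative changes in the subgroup means
# (Euclidean norm) plus relative changes in the precision matrices
# (Frobenius norm), summed over subgroups.
em_converged <- function(mu_new, mu_old, Omega_new, Omega_old, tol = 0.01) {
  crit <- sum(vapply(seq_along(mu_new), function(k) {
    sqrt(sum((mu_new[[k]] - mu_old[[k]])^2)) / sqrt(sum(mu_new[[k]]^2)) +
      norm(Omega_new[[k]] - Omega_old[[k]], type = "F") /
        norm(Omega_new[[k]], type = "F")
  }, numeric(1)))
  crit <= tol
}
```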

Following [3], we set $\gamma=3$ in the proposed algorithm. For $\lambda$, following [7], we adopt the AIC criterion defined as $-2\sum_{i=1}^n\ln\big\{\sum_{k=1}^K\hat\pi_k f_k(x_i;\hat\mu_k,\hat\Omega_k)\big\}+\sum_{k=1}^K 2\hat s_k$, where the $\hat\pi_k$'s, $\hat\mu_k$'s, and $\hat\Omega_k$'s are the final updates from the proposed algorithm, and $\hat s_k=\#\{(\ell,j):\hat\omega_{k,\ell j}\neq 0,\ 1\leq\ell<j\leq p\}$. Given fixed tuning parameters, in each iteration of the proposed EM algorithm, for each $j\in\{1,\ldots,p\}$, updating the regression parameters $\beta_j$ costs at most $O\{nK(p-1)\}$ operations, resulting in a complete cost of up to $O\{nKp(p-1)\}$ operations. In comparison, the existing popular SCAN approach [15], which conducts heterogeneous network analysis based on the Gaussian mixture model and an EM algorithm, has a computational complexity of $O(nKp^2+Kp^3)$ in each iteration of the EM algorithm. The proposed approach is computationally more feasible for large datasets with $p>n$. Since the optimization of (8) can be carried out separately for $j\in\{1,\ldots,p\}$, we develop a parallel computing strategy for the EM algorithm to further improve efficiency. The R package MultiNet, which implements the proposed approach in a parallel manner, is available at https://github.com/mengyunwu2020/MultiNet.
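A sketch of the AIC computation used for selecting $\lambda$, assuming the mvtnorm package for the multivariate Gaussian density (the precision matrices are inverted here purely for illustration):

```r
# Sketch of the AIC criterion: mixture log-likelihood at the final estimates
# plus 2 * (number of nonzero upper-triangular precision entries) per subgroup.
aic_mixture <- function(X, pi_hat, mu_hat, Omega_hat) {
  dens <- sapply(seq_along(pi_hat), function(k)
    pi_hat[k] * mvtnorm::dmvnorm(X, mean = mu_hat[[k]],
                                 sigma = solve(Omega_hat[[k]])))
  s_hat <- sum(sapply(Omega_hat, function(Om) sum(Om[upper.tri(Om)] != 0)))
  -2 * sum(log(rowSums(dens))) + 2 * s_hat
}
```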

3. Statistical Properties

For a set $\mathcal{S}$, denote $\mathcal{S}^c$ and $|\mathcal{S}|$ as its complement and cardinality, respectively. For a vector $v$, denote $v_{\mathcal{S}}$ as the components of $v$ indexed by $\mathcal{S}$. For a matrix $M=(M_{ij})_{p_1\times p_2}$, denote $M_{\mathcal{S}_1,\mathcal{S}_2}$ as the submatrix of $M$ indexed by $\mathcal{S}_1$ and $\mathcal{S}_2$, and $\|M\|_\infty=\max_{i\in\{1,\ldots,p_1\}}\sum_{j=1}^{p_2}|M_{ij}|$ as the maximum induced norm of $M$.

Denote $\Omega_k^0$ as the true precision matrix for the $k$th subgroup and $E_k^0=\{(\ell,j):1\leq\ell\neq j\leq p,\ \omega_{k,\ell j}^0\neq 0\}$ as the set of the nonzero off-diagonal elements of $\Omega_k^0$. For the $j$th subproblem, let the vector of the true parameters be $\theta_j^0=\mathrm{vec}(\pi_1^0,\ldots,\pi_K^0,\tau_{1,j}^0,\ldots,\tau_{K,j}^0,\beta_{1,j,j}^0,\ldots,\beta_{K,j,j}^0,\beta_{1,j,\backslash j}^0,\ldots,\beta_{K,j,\backslash j}^0)$, and denote $C_j=\{q:\theta_{j,q}^0\neq 0\}$ as the non-zero element set, where $\theta_{j,q}^0$ is the $q$th element of $\theta_j^0$. In addition, for the $j$th subproblem and $k$th subgroup, denote $\mathcal{S}_{j,k}=\{\ell:\beta_{k,j,\ell}^0\neq 0\}=\{\ell:\omega_{k,\ell j}^0\neq 0\ \text{and}\ \ell\neq j\}$. Let $s_0=\max_{j\in\{1,\ldots,p\},k\in\{1,\ldots,K\}}|\mathcal{S}_{j,k}|$.

Let $\theta_{j,C_j}^*=\mathrm{vec}(\pi_1^*,\ldots,\pi_K^*,\tau_{1,j}^*,\ldots,\tau_{K,j}^*,\beta_{1,j,j}^*,\ldots,\beta_{K,j,j}^*,\beta_{1,j,\mathcal{S}_{j,1}}^*,\ldots,\beta_{K,j,\mathcal{S}_{j,K}}^*)$ be the maximizer of the oracle counterpart of the objective function (6), defined as:

$$\tilde L_n(\theta_{j,C_j})=\sum_{i=1}^n\ln f(x_{i,j};x_{i,\backslash j},\theta_{j,C_j})=\sum_{i=1}^n\ln\Big\{\sum_{k=1}^K\pi_k\,g\big(x_{i,j};\ x_{i,\mathcal{S}_{j,k}}^\top\beta_{k,j,\mathcal{S}_{j,k}}+\beta_{k,j,j},\ \tau_{k,j}\big)\Big\},$$

where $f(x_{i,j};x_{i,\backslash j},\theta_{j,C_j})=\sum_{k=1}^K\pi_k\,g\big(x_{i,j};\ x_{i,\mathcal{S}_{j,k}}^\top\beta_{k,j,\mathcal{S}_{j,k}}+\beta_{k,j,j},\ \tau_{k,j}\big)$.

To establish the theoretical results of the proposed approach in terms of identification and estimation, we need the following technical assumptions.

Assumption 1. The probability density function $f(x_{i,j};x_{i,\backslash j},\theta_j)$ has a common support and is identifiable with respect to $\theta_j$. Moreover, the first and second derivatives of $\ln f(x_{i,j};x_{i,\backslash j},\theta_j)$ satisfy that, for any $q,\ell\in\{1,\ldots,K(p+2)\}$,

$$E\Big\{\frac{\partial\ln f(x_{i,j};x_{i,\backslash j},\theta_j)}{\partial\theta_{j,q}}\Big|_{\theta_j=\theta_j^0}\Big\}=0,\qquad E\Big\{\frac{\partial\ln f(x_{i,j};x_{i,\backslash j},\theta_j)}{\partial\theta_{j,q}}\frac{\partial\ln f(x_{i,j};x_{i,\backslash j},\theta_j)}{\partial\theta_{j,\ell}}\Big|_{\theta_j=\theta_j^0}\Big\}=-E\Big\{\frac{\partial^2\ln f(x_{i,j};x_{i,\backslash j},\theta_j)}{\partial\theta_{j,q}\partial\theta_{j,\ell}}\Big|_{\theta_j=\theta_j^0}\Big\}.$$

Assumption 2. The Fisher information matrix

$$\mathcal{I}(\theta_{j,C_j})=E\Big[\Big\{\frac{\partial\ln f(x_{i,j};x_{i,\backslash j},\theta_{j,C_j})}{\partial\theta_{j,C_j}}\Big\}\Big\{\frac{\partial\ln f(x_{i,j};x_{i,\backslash j},\theta_{j,C_j})}{\partial\theta_{j,C_j}}\Big\}^\top\Big]$$

is finite and positive definite at $\theta_{j,C_j}=\theta_{j,C_j}^0$. Furthermore, there exist finite constants $m_1$ and $m_2$ such that, for any $q,\ell\in C_j$,

$$E\Big\{\frac{\partial^2\ln f(x_{i,j};x_{i,\backslash j},\theta_{j,C_j})}{\partial(\theta_{j,q})^2}\Big|_{\theta_{j,C_j}=\theta_{j,C_j}^0}\Big\}<m_1,\qquad E\Big[\Big\{\frac{\partial\ln f(x_{i,j};x_{i,\backslash j},\theta_{j,C_j})}{\partial\theta_{j,\ell}}\frac{\partial\ln f(x_{i,j};x_{i,\backslash j},\theta_{j,C_j})}{\partial\theta_{j,q}}\Big\}^2\Big|_{\theta_{j,C_j}=\theta_{j,C_j}^0}\Big]<m_2.$$

Assumption 3. There exists an open set $\mathcal{N}_0$ containing the true parameter $\theta_j^0$ such that, for almost all $v_i=\mathrm{vec}(x_{i,j},x_{i,\backslash j})$ and $\theta_j\in\mathcal{N}_0$, the density $f(x_{i,j};x_{i,\backslash j},\theta_j)$ admits all third derivatives with respect to $\theta_j$. Moreover, there exist integrable functions $M_1(v_i)$ and $M_2(v_i)$ such that, for any $\theta_j\in\mathcal{N}_0$ and $q,\ell,m\in\{1,\ldots,K(p+2)\}$,

$$\Big|\frac{\partial^2}{\partial\theta_{j,q}\partial\theta_{j,\ell}}\ln f(x_{i,j};x_{i,\backslash j},\theta_j)\Big|\leq M_1(v_i),\qquad \Big|\frac{\partial^3}{\partial\theta_{j,q}\partial\theta_{j,\ell}\partial\theta_{j,m}}\ln f(x_{i,j};x_{i,\backslash j},\theta_j)\Big|\leq M_2(v_i).$$

Assumption 4. For some constant $b\in(0,1/4)$, $K=O(n^b)$ and $s_0=o(n^{1/4-b})$.

Assumption 5. $Ks_0/(\sqrt{n}\lambda^2)\to 0$ and $n^{a/2-1/2}\ln n/\lambda^2\to 0$ as $n\to\infty$, where $a\in(0,1/2)$.

Assumption 6. $\min_{k\in\{1,\ldots,K\},\,j\in\{1,\ldots,p\}}\min\{|\beta_{k,j,\ell}^0|:\ell\in\mathcal{S}_{j,k}\}>\gamma\lambda$, and $\theta_{j,q}^0$ is bounded for each $j\in\{1,\ldots,p\}$ and $q\in\{1,\ldots,K(p+2)\}$.

Assumption 7. $\ln(p)=O(n^a)$.

Assumptions 1–3 impose conditions on the mixture distribution and have been commonly assumed in the field of mixture modeling [18, 20, 42]. Assumption 4 constrains the sparsity parameter $s_0$ and allows the number of subgroups $K$ to grow slowly with $n$. It suggests that, across the sub-regression problems, as long as the number of non-zero parameters (i.e., $s_0K$) is equal, the required sample size $n$ is the same. Similar conditions have also been assumed in published multiple network analysis studies, such as Condition (C1) and the assumptions in Proposition 1 of [5]. Assumption 5 restricts the rate of $\lambda$ relative to the sample size. A similar condition is often assumed in high-dimensional studies [9, 42]. Assumption 6 imposes constraints on the true parameters: the first subcondition restricts the rate at which the nonzero coefficients can be distinguished from zero, and the second restricts the true parameters to a bounded range; both are also considered in the literature [9, 15]. Assumption 7 allows the dimension $p$ to grow exponentially.

Theorem 1. Under Assumptions 1–4, there exists a strict local maximizer $\theta_{j,C_j}^*$ of $\tilde L_n(\theta_{j,C_j})$ such that $\|\theta_{j,C_j}^*-\theta_{j,C_j}^0\|=O_p(\sqrt{Ks_0/n})$.

Theorem 2. Under Assumptions 1–7, the oracle estimator $\tilde\theta_j$, with $\tilde\theta_{j,C_j}=\theta_{j,C_j}^*$ and $\tilde\theta_{j,C_j^c}=0$, is a strict local maximizer of $L_n(\theta_j)$ with probability tending to 1.

Theorem 3. Under Assumptions 1–7, there exists a strict local maximizer $\hat\theta_j$ of $L_n(\theta_j)$ for each $j\in\{1,\ldots,p\}$ such that, for each $k\in\{1,\ldots,K\}$, the corresponding estimated precision matrix $\hat\Omega_k$ obtained from (5) and the resulting edge set $\hat E_k=\{(\ell,j):1\leq\ell\neq j\leq p\ \text{and}\ \hat\omega_{k,\ell j}\neq 0\}$ satisfy:

  1. $\max_{k,j,\ell}|\hat\omega_{k,\ell j}-\omega_{k,\ell j}^0|=O_p(\sqrt{Ks_0/n})$, $\sum_{k=1}^K\|\hat\Omega_k-\Omega_k^0\|_\infty=O_p(\sqrt{K^3s_0^3/n})$, and $\sum_{k=1}^K\|(\hat\Omega_k-\Omega_k^0)_{\mathcal{A}}\|_F=O_p(\sqrt{K^3s^2/n})$;

  2. with probability tending to 1, $\hat E_k=E_k^0$.

Here, $\mathcal{A}=\{(\ell,j):1\leq\ell\neq j\leq p\}$ is the off-diagonal index set of a $p\times p$ precision matrix, and $s=\max_{k\in\{1,\ldots,K\}}|E_k^0|$ is the sparsity parameter of the true precision matrices.

The proofs of Theorems 1, 2, and 3 are provided in Section 7. In our study, we reduce the estimation problem of a large matrix to a collection of sparse linear regression problems. For each regression problem, we examine the theoretical properties based on the oracle estimator. A similar strategy has been considered in published high-dimensional data analysis studies, such as [9, 17, 42]. Theorem 1 establishes the estimation consistency of the oracle estimator θj,Cj*, where the true sparsity structure is known and thus the number of unknown parameters is in the same order as Ks0. Theorem 2 then demonstrates that the proposed estimator θˆj is asymptotically as efficient as the oracle one. In Theorem 3, we further study the estimation errors of the precision matrices under the elementwise sup-norm and maximum induced norm, as well as that of the off-diagonal elements under the Frobenius norm, which are useful for the graph recovery from the precision matrices.

Remark 1. As the spectral norm is dominated by the maximum induced norm, the estimation error of the proposed estimators under the spectral norm also follows a rate of $O_p(\sqrt{K^3s_0^3/n})$, which is important for the consistency of the eigenvalue and eigenvector estimates and can be further used to analyze the theoretical properties of downstream inferences.

Remark 2. Considering the goal of network estimation and taking advantage of the column-wise strategy, the proposed theoretical study can focus on the off-diagonal elements of the $\Omega_k$'s. Under the assumption $\ln(p)=O(n^a)$, the accuracy of the off-diagonal elements ($\mathcal{A}$) can reach $\sum_{k=1}^K\|(\hat\Omega_k-\Omega_k^0)_{\mathcal{A}}\|_F=O_p(\sqrt{K^3s^2/n})$, which depends on the sparsity parameter of the true precision matrices, making the proposed approach applicable to large-scale datasets. In contrast, the existing heterogeneous network analysis approaches, including [13, 15, 29], rely heavily on the estimation properties of the precision matrices as a whole, which requires consideration of the estimation error of the diagonal elements and needs the assumption $p\ln(p)=o(n)$.

4. Simulation

We first consider $p=100$ and the number of subgroups $K\in\{2,3,4\}$. Two settings for the subgroup sizes are considered: the first is a balanced design with 200 subjects in each subgroup, and the second is an imbalanced design with subgroup sizes $(150,200)$, $(150,200,250)$, and $(155,185,215,245)$ for $K=2,3,4$, respectively. The networks for the $K$ subgroups are generated as follows. First, following [7], we simulate each network with ten unconnected subnetworks and consider three settings S1–S3 with different levels of similarity across the subgroups. Specifically, under settings S1 and S2, there are two and five subnetworks, respectively, in which all the subgroups share the same sparsity structures. Under setting S3, for $K=2$, eight of the ten subnetworks have the same sparsity structures in all the subgroups; for $K=3$ and $4$, there are five subnetworks with the same sparsity structures in all the subgroups, and another three subnetworks with the same sparsity structures shared only by the first two subgroups. Second, for each subnetwork, we consider three types of network structure: the power-law network, for which the degree distribution follows a power law; the nearest-neighbor network, where $p/10$ points are first randomly generated on a unit square and, based on the calculated $p/10\times(p/10-1)/2$ pairwise distances, the 2 nearest neighbors of each point are found and connected; and the Erdös-Rényi network, where the edge between each pair of nodes is added independently with probability 0.2.

For each $k\in\{1,\ldots,K\}$, the observations of the $k$th subgroup are generated from $\mathcal{N}_p(\mu_k,\Omega_k^{-1})$, where the first four elements of $\mu_k$ are non-zero and the remaining $p-4$ elements are all zero (details are provided in the supplementary materials, see Appendix), and the $\Omega_k$'s are generated based on the networks. Specifically, following [22], for the $k$th network, we generate an adjacency matrix $A_k$ whose non-zero off-diagonal entries (corresponding to edges) are generated from $\mathrm{Uniform}([-0.6,-0.3]\cup[0.3,0.6])$ and whose diagonal entries are set to 0. Then, $\Omega_k$ is constructed as $\Omega_k=D\{A_k+(|\lambda_{\min}(A_k)|+0.2)I_p\}D$, where $D$ is a diagonal matrix whose diagonal elements are $(1.5\mathbf{1}_5^\top,3\mathbf{1}_5^\top,1.5\mathbf{1}_5^\top,3\mathbf{1}_5^\top,\ldots,1.5\mathbf{1}_5^\top,3\mathbf{1}_5^\top)$, with $\mathbf{1}_d$ being a $d$-dimensional vector of all ones, and $\lambda_{\min}(A_k)$ is the smallest eigenvalue of $A_k$.
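A rough R sketch of this construction (the alternating pattern of $D$ and the eigenvalue shift follow our reading of the setup above and should be treated as assumptions):

```r
# Sketch of the simulated precision matrix: weighted adjacency matrix A_k,
# shifted to be positive definite, then rescaled by the diagonal matrix D.
make_precision <- function(A, d_pattern = c(rep(1.5, 5), rep(3, 5))) {
  p <- nrow(A)
  lam_min <- min(eigen(A, symmetric = TRUE, only.values = TRUE)$values)
  D <- diag(rep_len(d_pattern, p))
  D %*% (A + (abs(lam_min) + 0.2) * diag(p)) %*% D
}

# Example: Erdos-Renyi-style adjacency with edge probability 0.2, p = 100.
p <- 100
A <- matrix(0, p, p)
edge <- upper.tri(A) & matrix(runif(p^2) < 0.2, p, p)
A[edge] <- sample(c(-1, 1), sum(edge), TRUE) * runif(sum(edge), 0.3, 0.6)
A <- A + t(A)
Omega <- make_precision(A)
```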

In addition to the proposed approach, five alternatives are also considered. SCAN is a heterogeneous network analysis approach based on the penalized log-likelihood of the Gaussian mixture model and an EM-ADMM algorithm [15]. Tiger is a single network analysis method that operates via column-wise linear regressions and exploits the Lasso with a square root loss function for sparse network estimation [22]. True+Tiger, True+JGL, and True+JSEM apply Tiger, JGL, and JSEM for multiple network estimation based on the true subgroup memberships of the subjects. Specifically, Tiger is conducted separately for each subgroup. JGL is the likelihood-based joint graphical lasso for multiple network estimation with known subgroup memberships [7]. JSEM conducts a joint analysis of multiple networks using neighborhood selection based on the group lasso penalty [25]. Both SCAN and True+JGL can be realized using the R package JGL, and Tiger and True+Tiger can be realized using the R package huge. The R code implementing True+JSEM can be downloaded from https://github.com/drjingma/JSEM. Among these approaches, SCAN achieves subgroup identification and multiple network estimation simultaneously, while also accommodating common structures among networks; it is the most direct competitor of the proposed approach. In contrast, the other alternatives were originally developed for homogeneous network analysis or for heterogeneous network analysis with known subgroup memberships. Tiger, True+Tiger, and True+JSEM estimate the sparse precision matrices in a column-by-column fashion. The first two perform sparse regression based on the square root loss function, but they ignore either the differences or the commonalities across subgroups. True+JSEM utilizes the squared loss function and does not consider the sparsity structure within each subgroup.

To evaluate the performance of the different approaches, we consider the following measures: (1) the clustering error (CE) for evaluating subgroup identification performance, which measures the distance between the estimated and true subgroup memberships $\hat\varphi$ and $\varphi$ and is defined as $\mathrm{CE}=\binom{n}{2}^{-1}\#\big\{(i,j):I\{\hat\varphi(x_i)=\hat\varphi(x_j)\}\neq I\{\varphi(x_i)=\varphi(x_j)\},\ i<j\big\}$; (2) the precision matrix square error (PME) for measuring estimation performance, defined as $\mathrm{PME}=\sum_{k=1}^K\|\hat\Omega_k-\Omega_k^0\|_F/K$; (3) the true and false positive rates (TPR and FPR) for evaluating network identification performance, defined as $\mathrm{TPR}=\sum_{k=1}^K\big\{\sum_{\ell<j}I(\omega_{k,\ell j}^0\neq 0,\hat\omega_{k,\ell j}\neq 0)\big/\sum_{\ell<j}I(\omega_{k,\ell j}^0\neq 0)\big\}/K$ and $\mathrm{FPR}=\sum_{k=1}^K\big\{\sum_{\ell<j}I(\omega_{k,\ell j}^0=0,\hat\omega_{k,\ell j}\neq 0)\big/\sum_{\ell<j}I(\omega_{k,\ell j}^0=0)\big\}/K$.
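These measures translate directly into R; a minimal sketch, assuming integer membership vectors and lists of estimated and true precision matrices:

```r
# Sketch of the evaluation measures in Section 4.
CE <- function(est, truth) {            # pairwise clustering error
  u <- upper.tri(diag(length(est)))
  mean(outer(est, est, "==")[u] != outer(truth, truth, "==")[u])
}
PME <- function(Om_hat, Om0)            # average Frobenius-norm error
  mean(mapply(function(A, B) norm(A - B, "F"), Om_hat, Om0))
TPR <- function(Om_hat, Om0) mean(mapply(function(A, B) {
  u <- upper.tri(A)
  sum(B[u] != 0 & A[u] != 0) / sum(B[u] != 0)
}, Om_hat, Om0))
FPR <- function(Om_hat, Om0) mean(mapply(function(A, B) {
  u <- upper.tri(A)
  sum(B[u] == 0 & A[u] != 0) / sum(B[u] == 0)
}, Om_hat, Om0))
```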

For each scenario, 100 replicates are conducted. For a fair comparison, the true value of $K$ is supplied to all approaches. The results under the three types of network structure for $K=3$ are shown in Tables 1–3, respectively. The rest of the results, for $K=2$ and $K=4$, are provided in the supplementary materials. It can be observed that the proposed approach has superior or competitive performance compared to SCAN in terms of subgroup identification accuracy under all the simulation scenarios. The improvement is more significant under the scenarios with more complex network structures (e.g., the power-law network), lower levels of subgroup differences (e.g., S3), or a more imbalanced sample design. For example, for $K=3$ and the power-law network (Table 1), under the scenario with setting S3 and an imbalanced design, the median CEs are 0.000 (proposed) and 0.166 (SCAN), respectively. Additionally, the proposed approach also performs better in network estimation and identification accuracy: it is able to identify most true positives while keeping the false positives much lower than the alternatives. For example, for $K=3$ and the Erdös-Rényi network (Table 3), under the scenario with setting S2 and a balanced design, the proposed approach has (PME, TPR, FPR) = (16.751, 0.987, 0.039), versus (26.975, 0.772, 0.076) for SCAN, (31.025, 0.789, 0.169) for Tiger, (21.104, 0.870, 0.075) for True+Tiger, (20.795, 0.901, 0.184) for True+JGL, and (16.403, 0.965, 0.087) for True+JSEM.

Table 1:

Simulation results under the scenarios with the power-law networks and $K=3$. In each cell, we show the median (median absolute deviation) of the CE, PME, TPR, and FPR values based on 100 replicates; see Section 4 for the definitions of these measures.

Approach CE PME TPR FPR
S1 with the balanced design
proposed 0.000(0.000) 20.377(1.058) 0.961(0.004) 0.037(0.001)
SCAN 0.001(0.001) 32.988(0.207) 0.757(0.006) 0.066(0.002)
Tiger - 36.751(0.268) 0.763(0.016) 0.193(0.004)
True+Tiger - 28.146(0.608) 0.832(0.008) 0.067(0.002)
True+JGL - 27.372(0.539) 0.888(0.008) 0.181(0.003)
True+JSEM - 22.451(0.635) 0.943(0.006) 0.084(0.003)
S2 with the balanced design
proposed 0.000(0.000) 19.884(1.528) 0.957(0.010) 0.036(0.002)
SCAN 0.000(0.000) 28.831(0.359) 0.791(0.009) 0.061(0.002)
Tiger - 34.823(0.302) 0.779(0.012) 0.183(0.006)
True+Tiger - 25.802(0.289) 0.848(0.005) 0.064(0.001)
True+JGL - 25.367(0.453) 0.910(0.004) 0.172(0.003)
True+JSEM - 20.182(0.549) 0.960(0.005) 0.070(0.007)
S3 with the balanced design
proposed 0.000(0.000) 21.495(0.770) 0.963(0.006) 0.037(0.001)
SCAN 0.000(0.000) 30.745(0.258) 0.778(0.008) 0.058(0.001)
Tiger - 36.745(0.320) 0.739(0.012) 0.189(0.005)
True+Tiger - 28.153(0.319) 0.839(0.006) 0.063(0.001)
True+JGL - 26.963(0.559) 0.907(0.008) 0.168(0.004)
True+JSEM - 19.588(0.497) 0.973(0.005) 0.068(0.004)
S1 with the imbalanced design
proposed 0.000(0.000) 22.823(1.167) 0.952(0.007) 0.037(0.001)
SCAN 0.002(0.002) 35.378(0.475) 0.739(0.007) 0.058(0.002)
Tiger - 38.486(0.219) 0.756(0.011) 0.190(0.005)
True+Tiger - 29.594(0.574) 0.837(0.005) 0.064(0.001)
True+JGL - 28.999(0.487) 0.900(0.007) 0.173(0.002)
True+JSEM - 24.722(0.767) 0.940(0.007) 0.079(0.006)
S2 with the imbalanced design
proposed 0.000(0.000) 21.155(1.522) 0.949(0.009) 0.037(0.001)
SCAN 0.086(0.084) 31.318(2.751) 0.766(0.025) 0.060(0.005)
Tiger - 34.915(0.306) 0.796(0.012) 0.187(0.005)
True+Tiger - 26.661(0.522) 0.846(0.005) 0.067(0.001)
True+JGL - 26.007(0.662) 0.906(0.006) 0.173(0.003)
True+JSEM - 20.472(0.552) 0.959(0.004) 0.068(0.012)
S3 with the imbalanced design
proposed 0.000(0.000) 22.736(2.614) 0.954(0.009) 0.038(0.002)
SCAN 0.166(0.004) 34.468(4.191) 0.753(0.063) 0.063(0.009)
Tiger - 37.710(0.241) 0.744(0.010) 0.180(0.004)
True+Tiger - 28.955(0.590) 0.837(0.006) 0.065(0.002)
True+JGL - 28.095(0.539) 0.908(0.010) 0.173(0.004)
True+JSEM - 20.138(0.685) 0.977(0.007) 0.068(0.003)

Table 3:

Simulation results under the scenarios with the Erdös-Rényi networks and $K=3$. In each cell, we show the median (median absolute deviation) of the CE, PME, TPR, and FPR values based on 100 replicates; see Section 4 for the definitions of these measures.

Approach CE PME TPR FPR
S1 with the balanced design
proposed 0.002(0.002) 16.805(1.169) 0.988(0.004) 0.040(0.001)
SCAN 0.004(0.002) 26.831(1.337) 0.779(0.016) 0.076(0.004)
Tiger - 31.927(0.190) 0.797(0.015) 0.176(0.005)
True+Tiger - 17.662(0.593) 0.918(0.008) 0.073(0.001)
True+JGL - 18.989(0.619) 0.918(0.008) 0.181(0.003)
True+JSEM - 17.261(0.455) 0.945(0.008) 0.098(0.003)
S2 with the balanced design
proposed 0.002(0.002) 16.751(0.906) 0.987(0.004) 0.039(0.002)
SCAN 0.004(0.002) 26.975(0.412) 0.772(0.007) 0.076(0.001)
Tiger - 31.025(0.220) 0.789(0.017) 0.169(0.005)
True+Tiger - 21.104(0.451) 0.870(0.005) 0.075(0.001)
True+JGL - 20.795(0.460) 0.901(0.020) 0.184(0.021)
True+JSEM - 16.403(0.586) 0.965(0.009) 0.087(0.002)
S3 with the balanced design
proposed 0.002(0.002) 14.699(0.628) 0.996(0.004) 0.039(0.002)
SCAN 0.006(0.006) 17.155(2.467) 0.768(0.027) 0.061(0.008)
Tiger - 20.312(0.181) 0.796(0.012) 0.186(0.006)
True+Tiger - 17.670(0.467) 0.919(0.008) 0.074(0.001)
True+JGL - 17.271(0.459) 0.930(0.010) 0.158(0.005)
True+JSEM - 15.435(0.433) 0.971(0.004) 0.098(0.003)
S1 with the imbalanced design
proposed 0.004(0.002) 16.979(1.322) 0.986(0.006) 0.038(0.003)
SCAN 0.004(0.002) 26.194(0.787) 0.786(0.008) 0.076(0.002)
Tiger - 31.834(0.169) 0.797(0.012) 0.175(0.004)
True+Tiger - 18.193(0.564) 0.918(0.005) 0.077(0.002)
True+JGL - 19.442(0.488) 0.914(0.008) 0.184(0.003)
True+JSEM - 17.479(0.499) 0.942(0.008) 0.097(0.002)
S2 with the imbalanced design
proposed 0.002(0.002) 18.227(1.626) 0.983(0.007) 0.039(0.003)
SCAN 0.004(0.002) 27.080(0.547) 0.764(0.010) 0.076(0.002)
Tiger - 31.129(0.221) 0.797(0.009) 0.168(0.004)
True+Tiger - 21.627(0.477) 0.870(0.005) 0.077(0.002)
True+JGL - 21.139(0.567) 0.893(0.013) 0.182(0.008)
True+JSEM - 17.005(0.579) 0.962(0.007) 0.088(0.003)
S3 with the imbalanced design
proposed 0.002(0.002) 14.946(0.959) 0.992(0.004) 0.039(0.003)
SCAN 0.008(0.004) 25.161(0.934) 0.791(0.016) 0.079(0.003)
Tiger - 32.592(0.195) 0.794(0.018) 0.159(0.006)
True+Tiger - 18.603(0.628) 0.919(0.008) 0.078(0.001)
True+JGL - 18.169(0.440) 0.931(0.009) 0.161(0.004)
True+JSEM - 15.791(0.668) 0.972(0.004) 0.099(0.002)

In general, under scenarios with a smaller number of subgroups ($K=2$), the performance of all approaches is significantly improved, with the proposed approach still performing the best and performing better as the similarity across networks increases (from S1 to S3). The superiority of the proposed approach becomes more prominent under scenarios with a larger number of subgroups, suggesting the validity of the proposed strategy under complex situations. Tiger tends to perform the worst, as it ignores heterogeneity. Comparatively, SCAN can simultaneously conduct subgroup identification and multiple network estimation, and thus has better performance. Under the ideal scenarios with true subgroup memberships, True+Tiger, True+JGL, and True+JSEM perform the second best, with True+JSEM being the most prominent, as it conducts joint network analysis from an effective sparse linear regression perspective. In most cases, benefiting from the satisfactory scale-invariant property, the proposed approach, although faced with unknown sample heterogeneity, can exhibit more accurate network identification than these methods that depend on the true subgroup memberships. In summary, our approach can effectively capture both shared and unique network structures in heterogeneous network data across diverse scenarios with various degrees of within-group similarity, network structures, and numbers of subgroups.

In addition to the above analyses, we investigate scenarios where the predictors are higher-dimensional, with $p=500$ and $p=1{,}000$. The detailed settings and results (Table S7) are provided in the supplementary materials, where the simulated networks for $p=500$ are denser than those for $p=1{,}000$. As it is not computationally feasible to conduct the analysis with SCAN for scenarios with $p=1{,}000$, the corresponding results are not available. For larger-scale data, the performance of the regularized likelihood-based approaches such as SCAN and True+JGL decays as expected; this is especially true for SCAN, whose performance deteriorates dramatically and whose computational cost becomes unaffordable in the case of $p=1{,}000$. However, the network analysis approaches based on column-wise linear regressions continue to maintain their satisfactory performance. Under the scenarios with denser networks and high dimension ($p=500$), the proposed approach exhibits slightly inferior performance compared to True+JSEM, as it is more difficult to accurately identify the subgroup memberships of subjects, resulting in reduced network identification accuracy. Under the scenario with sparser networks ($p=1{,}000$), although the dimension is high, the proposed approach still behaves more favorably. Even when the networks exhibit a higher degree of similarity (S3), the TPR of the proposed approach drops slightly due to the misclassification of subjects, but it still yields a smaller number of false positives for the networks.

To gain a deeper insight into the benefit of considering common structures in different networks, we compare the proposed approach with an ad-hoc approach called “ad-hoc MCP” that estimates each network separately using only MCP based on the identified subgroups with the proposed approach. The comparison results under the scenario with K=3 are provided in Supplementary Table S8, which demonstrates that the proposed approach has superior identification and estimation performance compared to the ad-hoc MCP method. Furthermore, we take one replicate under the scenario with setting S3, a balanced design, and the power-law network as an example, and provide the heatmaps of the true sparse structures and estimated results with the proposed approach and the ad-hoc MCP approach in Supplementary Figs. S1 and S2. It is observed that the proposed approach can more accurately identify the true sparse structures and more effectively accommodate the common structures across different networks.

4.1. Computer time

To examine the computational superiority of the proposed approach, we conduct simulations with various values of (n,p,K), which are all implemented on a computer with an Intel Core i7 processor and 24 GB of RAM. The computer time results of the proposed approach and the alternatives with fixed tuning parameters are reported in Supplementary Table S9. In general, analysis with the proposed approach is observed to take more time than Tiger and those methods based on known subgroup memberships, which is due to the fact that our approach aims to perform both subgroup and multiple network identifications, while the others either only analyze a single network or rely on prior subgroup information. Compared to the most direct competitor SCAN, which is also based on the Gaussian mixture model and EM algorithm, the proposed approach is significantly faster, especially for large-scale data. For example, under the scenario with n=600,p=500, and K=3, the average computer time is 1794.367 (proposed) and 5169.790 (SCAN) seconds. When the dimension increases to 1,000, SCAN is computationally infeasible (more than 24 hours) even with a sample size of n=300. However, the proposed algorithm is still computationally affordable, with the computer time being around 2.6 hours when n=1,000.

The computational cost of the proposed approach can be significantly reduced by using parallel computing, benefiting from the column-wise strategy. For $K=3$, we compare the computational cost of the single-thread version, $c_{\mathrm{single}}$, and its parallel version, $c_{\mathrm{parallel}}$, under scenarios with various sample sizes, dimensions, and numbers of cores. The cost ratio $r_c=c_{\mathrm{parallel}}/c_{\mathrm{single}}$ as a function of dimension is provided in Supplementary Fig. S3. As can be observed, the proposed approach speeds up considerably over the single-thread version of the algorithm, especially in high-dimensional settings and with multi-core processors. The corresponding costs of the proposed analysis with eight cores are also provided in Supplementary Table S9. For high-dimensional data, the proposed approach can sometimes be as efficient as the True+JGL and True+JSEM methods. For example, under the scenario with $n=300$, $p=1{,}000$, and $K=2$, the average computer time is 172.217 seconds (proposed, with parallelization), 678.119 seconds (True+JGL), and 553.785 seconds (True+JSEM).

4.2. Tuning-insensitive regularization path

We further numerically examine the tuning-insensitive properties of the proposed approach. Motivated by [22, 33], for finite samples, we consider the tuning parameter of the form $\zeta\sqrt{\ln\{K(p-1)\}/n}$, where $\zeta$ is a positive constant independent of all unknown parameters, and introduce the graph recovery accuracy: Accuracy = TPR − FPR. Taking the simulation scenario under the power-law network and setting S1 as an example, we examine the values of Accuracy and PME as functions of $\zeta$ for the proposed approach, as well as for SCAN and True+JSEM, in Fig. 1. It is observed that, for the proposed approach, the regularization paths are flat, without significant change, when $\zeta$ lies in the range $(0.4,0.7)$, which suggests that the proposed approach is empirically insensitive to the tuning parameter $\lambda$. In contrast, for SCAN and True+JSEM, a larger range of $(\zeta_1,\zeta_2)$ (or $\zeta$) is required in the search for the optimal value, and the paths present more irregular changes, which indicates more sensitivity of SCAN and True+JSEM to the choice of tuning parameters. In general, the proposed approach makes it easier than SCAN and True+JSEM to find reasonable tuning parameters.
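For reference, a grid of tuning parameters of this form can be generated as follows (the range of $\zeta$ reflects the flat region observed above):

```r
# Sketch of the tuning grid lambda = zeta * sqrt(log(K * (p - 1)) / n).
lambda_grid <- function(n, p, K, zeta = seq(0.4, 0.7, by = 0.05))
  zeta * sqrt(log(K * (p - 1)) / n)
```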

Fig. 1:


Column one: values of Accuracy and PME as functions of $\zeta$ for the proposed approach. Columns two and three: values of Accuracy and PME as functions of $\zeta_1$, for the individual-level sparsity parameter of the form $\zeta_1\sqrt{\ln\{K(p-1)\}/n}$, and of $\zeta_2$, for the group-level sparsity parameter of the form $\zeta_2\sqrt{\ln(p-1)/n}$, for the SCAN approach. Column four: values of Accuracy and PME as functions of $\zeta$ for the group-level sparsity parameter of the form $\zeta\sqrt{\ln(p-1)/n}$ for the True+JSEM approach, under the scenario with the power-law network and setting S1.

5. Data Analysis

We focus on reconstructing gene networks for cancer, which is an important task for better understanding the underlying biological processes. Specifically, we analyze breast cancer data from The Cancer Genome Atlas (TCGA), whose underlying heterogeneity poses an increasing public health concern. The mRNA gene expression measurements are considered, which are downloaded from the TCGA website using the R package cgdsr. In total, 1,100 breast cancer subjects are available, with 18,506 gene expression measurements. As the number of connected genes is not expected to be large, to improve stability as well as to reduce computational cost, we conduct prescreening, which has been a common technique in published studies. Specifically, following [25], we focus on the genes in the Wnt signaling, oxidative phosphorylation, mTOR signaling, and citrate cycle (TCA cycle) pathways, which have been demonstrated to play an important role in all cancer types in the literature. This results in 316 genes for downstream analysis.

In practice, the number of subgroups $K$ is usually unknown. In this study, we adopt the gap statistic, using the R package NbClust, to select the optimal value of $K$. Given the candidate set $K\in\{2,3,4,5,6,7,8\}$, with the gap statistic, the proposed approach identifies two subgroups with group sizes 805 (subgroup 1) and 295 (subgroup 2). In addition, 1,171 and 666 edges among 309 and 289 genes are discovered for the two subgroups, respectively, and 222 common edges across the two subgroups are identified. The graphical representation of the identified networks is provided in Fig. 2.
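A sketch of this $K$-selection step; the clustering method feeding NbClust's gap statistic ("kmeans" here) is our assumption, and expr denotes the $n\times p$ expression matrix:

```r
library(NbClust)

# Sketch: select the number of subgroups K by the gap statistic over the
# candidate set {2, ..., 8}.
res <- NbClust(data = expr, distance = "euclidean",
               min.nc = 2, max.nc = 8, method = "kmeans", index = "gap")
res$Best.nc  # suggested K
```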

Fig. 2:


Data analysis: gene networks for the two subgroups identified with the proposed approach. The black lines represent the common edges shared by the two subgroups, and the blue lines represent the specific edges for each subgroup.

To gain more insight into the identified networks, we examine the related genes' functional and biological connections by conducting Gene Ontology (GO) enrichment analysis, which is implemented using DAVID 2021 [31]. Our analysis first suggests that the genes involved in the common edges are functionally and biologically connected with certain significantly enriched GO terms. For example, in one common subnetwork, genes ATP6V1A, ATP5F1A, ATP6V1B2, and NDUFS1 are enriched with the ATP metabolic process (GO:0046034, P-value: $7.37\times 10^{-3}$), which has been demonstrated to be closely linked to the selective killing of respiratory competent cancer cells that are critical for tumor progression, including breast cancer [21]. In addition, in this subnetwork, genes ROCK1, ROCK2, and RAC1 are enriched with regulation of stress fiber assembly (GO:0051492, P-value: $1.06\times 10^{-3}$), and studies have shown that stress fiber-mediated cellular stiffness can promote tumor growth in the precancerous stage, including breast cancer [35]. Genes NDUFA11, COX6A1, RPS6KA3, FRAT1, STK11, SOX17, RPS6KA1, EP300, and ATP6V1E1 are enriched with the protein binding function (GO:0005515, P-value: $3.14\times 10^{-12}$), which has been reported to play an important role in cancer treatment; in fact, several proteins have been shown to be effective in delivery to tumor sites [37]. Genes GSK3B, PRKAA1, CHD8, TCF7, LRP6, NKD2, SOX17, DVL1, and DVL3 are enriched with the Wnt signaling pathway (GO:0016055, P-value: $1.63\times 10^{-21}$), which has been reported to be activated in over half of breast cancer patients and plays an important role in triple negative breast cancer development [39]. Moreover, genes COX7B, NDUFA11, COX4I1, NDUFA10, COX6A1, COX7C, COX8A, NDUFC2, NDUFC1, and SDHD are enriched with the mitochondrial inner membrane (GO:0005743, P-value: $1.39\times 10^{-58}$), which has been reported to be essential in the regulation of cancer cell migration and invasion [36], including in breast cancer [41]. The GO enrichment analysis also finds that the genes involved in the specific edges of different subgroups are associated with some distinct significantly enriched GO terms. For example, genes WNT5A, TSC2, ULK1, AXIN2, CSNK1E, and TP53 are enriched with the protein localization process (GO:0008104, P-value: $3.58\times 10^{-3}$) in subgroup 1, which has been reported to be implicated in the pathogenesis of human diseases, including breast cancer, and therapeutic strategies targeting protein localization have been conceptualized as promising for the treatment of a variety of human diseases [19]. In addition, genes LEF1, PSEN1, DKK1, and LRP6 are enriched with the embryonic limb morphogenesis process (GO:0030326, P-value: 0.02) in subgroup 2, where some embryonic genes are significantly upregulated in estrogen receptor negative breast cancer [43]. These biologically sensible findings provide support for the validity of the proposed network identification analysis.

Analyses are also conducted using the alternative approaches with $K=2$. Here, as the true subgroup memberships are not available, for comparative analyses under known subgroup memberships, as well as for indirect support of the subgroup identification results, we follow [25] and consider three clinical breast cancer subgroups, namely ER+, ER−, and other unevaluated cases, based on whether the subjects have estrogen receptors. In our data analysis, the sizes of these three subgroups are 812, 238, and 50, respectively. The multiple network estimation procedure using the alternatives is performed based on the ER+ and ER− subgroups. The summary comparison results of the heterogeneity analysis and network analysis are reported in Tables S10 and S11 of the supplementary materials, where the numbers of subjects in each subgroup and of edges in each network identified by the different approaches, together with their overlaps, are provided. Here, the subgroups identified using the different approaches are matched by correlation, and the different approaches lead to different subgroup memberships. For ER+ and ER−, we present the proportions of subjects that are identified in different subgroups using the proposed and SCAN approaches in Fig. S4 of the supplementary materials. It is observed that the two approaches can discriminate the ER+ and ER− subgroups effectively, with the proposed approach demonstrating more advantageous results. The P-value of the Chi-square test for the proposed approach is $6.08\times 10^{-69}$. Additionally, by matching the identified subgroups with ER status, the CE of the proposed approach is 0.29, compared to 0.37 for SCAN. These results suggest the effectiveness of the proposed analysis.

In Table S11 of the supplementary materials, we observe that the different approaches identify a moderate number of overlapping edges. To indirectly support the network identification results, following [11], we adopt a resampling strategy and compute the negative log-likelihood statistic (NLS) to evaluate “prediction accuracy”, with a smaller value indicating a better performance. Specifically, we first randomly divide the data into a training and a testing set, estimate the parameters on the training set, and finally compute the NLS using the testing set. Based on 100 resamplings, the proposed approach has an average NLS value of 77449.60, compared to 78278.28 for SCAN, 99115.53 for Tiger, 97528.56 for ER+Tiger, 97385.31 for ER+JGL, and 96463.68 for ER+JSEM, with the proposed approach having competitive prediction accuracy.
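A sketch of one resampling round of the NLS computation; fit_model stands in for whichever estimation procedure is being evaluated and is hypothetical, and mvtnorm is assumed for the density:

```r
# Sketch: split the data, fit on the training half, and evaluate the negative
# log-likelihood statistic (NLS) of the fitted mixture on the test half.
nls_once <- function(X, fit_model, frac = 0.5) {
  tr <- sample(nrow(X), floor(frac * nrow(X)))
  fit <- fit_model(X[tr, , drop = FALSE])   # returns pi_hat, mu_hat, Omega_hat
  Xte <- X[-tr, , drop = FALSE]
  dens <- sapply(seq_along(fit$pi_hat), function(k)
    fit$pi_hat[k] * mvtnorm::dmvnorm(Xte, fit$mu_hat[[k]],
                                     solve(fit$Omega_hat[[k]])))
  -sum(log(rowSums(dens)))
}
# Average over 100 random splits:
# mean(replicate(100, nls_once(X, fit_model)))
```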

6. Discussion

Network analysis of heterogeneous subgroups is still a wide-open problem in various research fields. In this article, we have proposed a new joint estimation approach for multiple networks based on multivariate Gaussian mixture modeling. Different from the existing regularized likelihood-based approaches, which are usually applicable only to small-scale data, the proposed approach provides a more effective and useful tool for estimating multiple precision matrices by solving a collection of simpler sparse regression subproblems. Specifically, based on the Gaussian graphical model, we have proposed a reparameterized mixture of regression models with the composite MCP to effectively accommodate both similarities and differences across undiscovered distinct subgroups. Such a strategy enjoys the scale-invariant and tuning-insensitive properties and can be solved in parallel to largely reduce computational cost. The theoretical properties of the proposed approach are investigated, indicating that the proposed estimator enjoys the oracle property. A number of numerical experiments demonstrate the superior performance of the proposed approach in terms of subgroup and network identification accuracy. The application to the TCGA breast cancer data reveals different gene networks for breast cancer and rediscovers biologically sensible gene relationships associated with the heterogeneous subgroups.

In the proposed approach, we have adopted the composite MCP to accommodate the common and specific structures among different networks. Other penalties, such as $\rho(\|\mathrm{vec}(\beta_{1,j,\ell},\ldots,\beta_{K,j,\ell})\|_2;\lambda_1,\gamma_1)+\sum_{k=1}^K\rho(|\beta_{k,j,\ell}|;\lambda_2,\gamma_2)$ (the sparse group MCP), can also achieve identification at both the group and individual levels and may have satisfactory statistical and numerical properties. Both the composite MCP and the sparse group MCP have been adopted in published studies, such as [3, 17] for the composite MCP and [23] for the sparse group MCP. It is expected that the performance of these two penalties may depend on the underlying model, data settings, and other factors. We adopt the composite MCP, as it has been popular in the literature and leads to satisfactory numerical performance. In future work, some other penalties, including the sparse group MCP, can be further investigated. In our data analysis, the TCGA breast cancer data has been analyzed, which has often been used for Gaussian distribution-based network analysis [27, 40]. We have followed these studies and conducted the analysis without data transformation or normalization, and some results with important biological implications have been found. We acknowledge the complexity of gene expression data with potential non-Gaussian properties. It would be of interest to implement robust techniques, such as the t-distribution mixture model or other nonparanormal mixture graphical models, to further study non-Gaussian distributed data. We have mainly focused on the identification of gene networks using expression data. Many other biological measurements, such as mutation and DNA methylation, can be further used for a better understanding of the mechanisms of cancer.

7. Technical details

Proof of Theorem 1. Recall that $\tilde L_n(\theta_{j,C_j})=\sum_{i=1}^n\ln f(x_{i,j};x_{i,\backslash j},\theta_{j,C_j})$. Let $\delta_n=\sqrt{Ks_0/n}$ and $h$ be an $s_j$-dimensional vector, where $s_j=3K+\sum_{k=1}^K|\mathcal{S}_{j,k}|$ is the nonsparsity size for the $j$th subproblem. It suffices to show that $\tilde L_n(\theta_{j,C_j}^0+\delta_n h)<\tilde L_n(\theta_{j,C_j}^0)$ everywhere on the boundary $\{h:\|h\|=C\}$, where $C$ is a sufficiently large positive constant. By Taylor expansion, we have:

$$\tilde L_n(\theta_{j,C_j}^0+\delta_n h)-\tilde L_n(\theta_{j,C_j}^0)=\delta_n h^\top\Big\{\frac{\partial\tilde L_n(\theta_{j,C_j})}{\partial\theta_{j,C_j}}\Big|_{\theta_{j,C_j}=\theta_{j,C_j}^0}\Big\}+\frac{1}{2}\delta_n^2 h^\top\Big\{\frac{\partial^2\tilde L_n(\theta_{j,C_j})}{\partial\theta_{j,C_j}\partial\theta_{j,C_j}^\top}\Big|_{\theta_{j,C_j}=\theta_{j,C_j}^0}\Big\}h+\frac{\delta_n^3}{6}\sum_{q,\ell,m\in C_j}\frac{\partial^3\tilde L_n(\theta_{j,C_j})}{\partial\theta_{j,q}\partial\theta_{j,\ell}\partial\theta_{j,m}}\Big|_{\theta_{j,C_j}=\check\theta_{j,C_j}}h_q h_\ell h_m:=I+II+III,$$

where $\check\theta_{j,C_j}$ lies on the line segment connecting $\theta_{j,C_j}^0+\delta_n h$ and $\theta_{j,C_j}^0$.

For $I$, by Assumption 1, as $n\to\infty$,

$$\frac{1}{\sqrt n}\frac{\partial\tilde L_n(\theta_{j,C_j})}{\partial\theta_{j,C_j}}\Big|_{\theta_{j,C_j}=\theta_{j,C_j}^0}=\sqrt n\Big[\frac{1}{n}\sum_{i=1}^n\frac{\partial\ln f(x_{i,j};x_{i,\backslash j},\theta_{j,C_j})}{\partial\theta_{j,C_j}}\Big|_{\theta_{j,C_j}=\theta_{j,C_j}^0}-E\Big\{\frac{\partial\ln f(x_{i,j};x_{i,\backslash j},\theta_{j,C_j})}{\partial\theta_{j,C_j}}\Big|_{\theta_{j,C_j}=\theta_{j,C_j}^0}\Big\}\Big]\rightarrow_d\mathcal{N}\big(0,\mathcal{I}(\theta_{j,C_j}^0)\big).$$

Thus,

$$\frac{\partial\tilde L_n(\theta_{j,C_j})}{\partial\theta_{j,C_j}}\Big|_{\theta_{j,C_j}=\theta_{j,C_j}^0}=O_p(\sqrt n).$$

Then, it is easy to see that

$$|I|\leq O_p(n\delta_n^2)\|h\|.$$

For $II$,

$$II=-\frac{1}{2}n\delta_n^2 h^\top\mathcal{I}(\theta_{j,C_j}^0)h+\frac{1}{2}n\delta_n^2 h^\top\Big\{\frac{1}{n}\frac{\partial^2\tilde L_n(\theta_{j,C_j})}{\partial\theta_{j,C_j}\partial\theta_{j,C_j}^\top}\Big|_{\theta_{j,C_j}=\theta_{j,C_j}^0}+\mathcal{I}(\theta_{j,C_j}^0)\Big\}h.$$

Following similar arguments as in the proof of Lemma 8 in [10], with Assumptions 1–2, we have:

$$\Big\|\frac{1}{n}\frac{\partial^2\tilde L_n(\theta_{j,C_j})}{\partial\theta_{j,C_j}\partial\theta_{j,C_j}^\top}\Big|_{\theta_{j,C_j}=\theta_{j,C_j}^0}+\mathcal{I}(\theta_{j,C_j}^0)\Big\|=o_p(1/s_j).$$

Therefore,

$$II=-\frac{1}{2}n\delta_n^2 h^\top\mathcal{I}(\theta_{j,C_j}^0)h+\frac{1}{2}n\delta_n^2\|h\|^2\times o_p(1).$$

For $III$, by Assumption 3 and applying the Cauchy-Schwarz inequality, we have:

$$|III|=\frac{\delta_n^3}{6}\Big|\sum_{q,\ell,m\in C_j}\frac{\partial^3\tilde L_n(\theta_{j,C_j})}{\partial\theta_{j,q}\partial\theta_{j,\ell}\partial\theta_{j,m}}\Big|_{\theta_{j,C_j}=\check\theta_{j,C_j}}h_q h_\ell h_m\Big|=\frac{\delta_n^3}{6}\Big|\sum_{q,\ell,m\in C_j}\sum_{i=1}^n\frac{\partial^3\ln f(x_{i,j};x_{i,\backslash j},\theta_{j,C_j})}{\partial\theta_{j,q}\partial\theta_{j,\ell}\partial\theta_{j,m}}\Big|_{\theta_{j,C_j}=\check\theta_{j,C_j}}h_q h_\ell h_m\Big|\leq\frac{\delta_n^3}{6}\sum_{i=1}^n\Big\{\sum_{q,\ell,m\in C_j}M_2^2(v_i)\Big\}^{1/2}\|h\|^3=O_p(s_j^{3/2}\delta_n)\times n\delta_n^2\times\|h\|^3.$$

By Assumption 4, for $b\in(0,1/4)$, $K=O(n^b)$ and $s_0=o(n^{1/4-b})$, so that $(Ks_0)^2=o(n^{1/2})$ and hence $s_j^{3/2}\delta_n=o(1)$. Thus, since $\|h\|=C$,

$$III=o_p(n\delta_n^2)\|h\|^2.$$

Due to the positive definiteness of the Fisher information matrix $\mathcal{I}(\theta_{j,C_j})$ at $\theta_{j,C_j}=\theta_{j,C_j}^0$, for a sufficiently large $C$, the quadratic term $-n\delta_n^2 h^\top\mathcal{I}(\theta_{j,C_j}^0)h/2$ in $II$ dominates $I$ and $III$. This completes the proof.

Proof of Theorem 2. To prove that $\tilde\theta_j$ is a local maximizer of $L_n(\theta_j)$, we consider $\theta_j$ in a small neighborhood of $\tilde\theta_j$ such that $\|\theta_j-\tilde\theta_j\|=O\big\{n^{a/2-1/2}\ln n/\sqrt{Kp}+\sqrt{s_0K/(pn)}\big\}$, and let $\underline\theta_j$ satisfy $\underline\theta_{j,C_j}=\theta_{j,C_j}$ and $\underline\theta_{j,C_j^c}=0$. According to Theorem 1, we have $L_n(\underline\theta_j)<L_n(\tilde\theta_j)$. Hence it suffices to show $L_n(\underline\theta_j)>L_n(\theta_j)$. Notice that

$$L_n(\theta_j)-L_n(\underline\theta_j)=\tilde L_n(\theta_j)-\tilde L_n(\underline\theta_j)-n\sum_{\ell\neq j}\Big[\rho\Big\{\sum_{k=1}^K\rho(|\beta_{k,j,\ell}|;\lambda,\gamma);\ \lambda,\ \frac{K\lambda\gamma}{2}\Big\}-\rho\Big\{\sum_{k=1}^K\rho(|\underline\beta_{k,j,\ell}|;\lambda,\gamma);\ \lambda,\ \frac{K\lambda\gamma}{2}\Big\}\Big]:=L_{n1}+L_{n2}.$$

For $L_{n1}$, we have:

$$L_{n1}=\tilde L_n(\theta_j)-\tilde L_n(\underline\theta_j)=\sum_{k}\sum_{\ell\in\mathcal{S}_{j,k}^c}\frac{\partial\tilde L_n(\theta_j)}{\partial\beta_{k,j,\ell}}\Big|_{\theta_j=\bar\theta_j}\beta_{k,j,\ell}=\sum_{k}\sum_{\ell\in\mathcal{S}_{j,k}^c}\Big\{\frac{\partial\tilde L_n(\theta_j)}{\partial\beta_{k,j,\ell}}\Big|_{\theta_j=\tilde\theta_j}+(\bar\theta_j-\tilde\theta_j)^\top\frac{\partial^2\tilde L_n(\theta_j)}{\partial\beta_{k,j,\ell}\partial\theta_j}\Big|_{\theta_j=\dot\theta_j}\Big\}\beta_{k,j,\ell},$$

where $\bar\theta_j$ lies on the line segment connecting $\theta_j$ and $\underline\theta_j$, and $\dot\theta_j$ lies on the line segment connecting $\bar\theta_j$ and $\tilde\theta_j$.

First consider $\partial\tilde L_n(\theta_j)/\partial\beta_{k,j,\ell}\big|_{\theta_j=\tilde\theta_j}$. With a second order Taylor expansion, for $\ell\in\mathcal{S}_{j,k}^c$, we have:

$$\frac{\partial\tilde L_n(\theta_j)}{\partial\beta_{k,j,\ell}}\Big|_{\theta_j=\tilde\theta_j}=\frac{\partial\tilde L_n(\theta_j)}{\partial\beta_{k,j,\ell}}\Big|_{\theta_j=\theta_j^0}+(\tilde\theta_{j,C_j}-\theta_{j,C_j}^0)^\top\frac{\partial^2\tilde L_n(\theta_j)}{\partial\beta_{k,j,\ell}\partial\theta_{j,C_j}}\Big|_{\theta_j=\theta_j^0}+(\tilde\theta_{j,C_j}-\theta_{j,C_j}^0)^\top\frac{\partial^3\tilde L_n(\theta_j)}{\partial\beta_{k,j,\ell}\partial\theta_{j,C_j}\partial\theta_{j,C_j}^\top}\Big|_{\theta_j=\grave\theta_j}(\tilde\theta_{j,C_j}-\theta_{j,C_j}^0):=I+II+III, \qquad (10)$$

where $\grave\theta_j$ lies on the line segment connecting $\tilde\theta_j$ and $\theta_j^0$.

For $I$ in (10), define the following event:

$$\Omega_1=\Big\{\max_{k\in\{1,\ldots,K\}}\max_{\ell\in\mathcal{S}_{j,k}^c}\Big|\frac{\partial\tilde L_n(\theta_j)}{\partial\beta_{k,j,\ell}}\Big|_{\theta_j=\theta_j^0}\Big|\leq\zeta_n\sqrt n\Big\},$$

where $\zeta_n=n^{a/2}\ln n$ with $a\in(0,1/2)$. By Assumption 3 and $\ln(p)=O(n^a)$ in Assumption 7, together with Bernstein's inequality, we can obtain that, when $n\to\infty$, there exists $\kappa>0$ such that:

$$\Pr(\Omega_1)=1-\Pr\Big\{\max_{k\in\{1,\ldots,K\}}\max_{\ell\in\mathcal{S}_{j,k}^c}\Big|\frac{\partial\tilde L_n(\theta_j)}{\partial\beta_{k,j,\ell}}\Big|_{\theta_j=\theta_j^0}\Big|>\zeta_n\sqrt n\Big\}\geq 1-\sum_{k=1}^K\sum_{\ell\in\mathcal{S}_{j,k}^c}\Pr\Big\{\Big|\frac{1}{\sqrt n}\frac{\partial\tilde L_n(\theta_j)}{\partial\beta_{k,j,\ell}}\Big|_{\theta_j=\theta_j^0}\Big|>\zeta_n\Big\}\geq 1-2\Big(Kp-\sum_{k=1}^K|\mathcal{S}_{j,k}|\Big)\exp\Big(-\frac{\zeta_n^2}{2\kappa}\Big)\geq 1-2Kp\exp\Big(-\frac{\zeta_n^2}{2\kappa}\Big)\to 1.$$

Thus, with probability tending to 1,

$$\max_{k\in\{1,\ldots,K\}}\max_{\ell\in\mathcal{S}_{j,k}^c}\Big|\frac{\partial\tilde L_n(\theta_j)}{\partial\beta_{k,j,\ell}}\Big|_{\theta_j=\theta_j^0}\Big|=O(n^{a/2+1/2}\ln n).$$

For $II$ in (10), applying Assumption 3 and the Cauchy-Schwarz inequality yields:

$$\max_{k\in\{1,\ldots,K\}}\max_{\ell\in\mathcal{S}_{j,k}^c}\Big|(\tilde\theta_{j,C_j}-\theta_{j,C_j}^0)^\top\frac{\partial^2\tilde L_n(\theta_j)}{\partial\beta_{k,j,\ell}\partial\theta_{j,C_j}}\Big|_{\theta_j=\theta_j^0}\Big|\leq\max_{k\in\{1,\ldots,K\}}\max_{\ell\in\mathcal{S}_{j,k}^c}\sum_{i=1}^n\Big|\sum_{m\in C_j}\frac{\partial^2\ln f(x_{i,j};x_{i,\backslash j},\theta_j)}{\partial\beta_{k,j,\ell}\partial\theta_{j,m}}\Big|_{\theta_j=\theta_j^0}(\tilde\theta_{j,m}-\theta_{j,m}^0)\Big|\leq\sum_{i=1}^n\big[s_j\{M_1(v_i)\}^2\big]^{1/2}\|\tilde\theta_{j,C_j}-\theta_{j,C_j}^0\|=O_p(Ks_0\sqrt n). \qquad (11)$$

Similarly, for III in (10), we have:

maxk{1,,K}max𝒮j,kc|(θ˜j,Cjθj,Cj0)3L˜n(θj)βk,j,2θj,Cj|θj=θ`j(θ˜j,Cjθj,Cj0)|maxk{1,,K}max𝒮j,kci=1n[q,mCj{3lnf(xi,j;xi,\j,θj)βk,j,θj,qθj,m|θj=θ`j}2]12θ˜j,Cjθj,Cj02i=1n[sj2{M2(vi)}2]12θ˜j,Cjθj,Cj02=op(Kns0).

Then, we have:

\[
\max_{k \in \{1,\dots,K\}}\max_{\ell \in \mathcal{S}_{j,k}^c} \left|\frac{\partial \tilde{L}_n(\theta_j)}{\partial \beta_{k,j,\ell}}\bigg|_{\theta_j=\tilde{\theta}_j}\right|
\le O_p(n^{a/2+1/2}\ln n) + O_p(Ks_0\sqrt{n}) + o_p(Ks_0\sqrt{n}).
\]

Moreover, using a similar argument to that in (11), we can obtain:

\[
\max_{k \in \{1,\dots,K\}}\max_{\ell \in \mathcal{S}_{j,k}^c} \left|(\bar{\theta}_j - \tilde{\theta}_j)^\top \frac{\partial^2 \tilde{L}_n(\theta_j)}{\partial \beta_{k,j,\ell}\,\partial \theta_j}\bigg|_{\theta_j=\dot{\theta}_j}\right|
\le \sum_{i=1}^n \left[K(p+2)\{M_1(v_i)\}^2\right]^{1/2} \|\bar{\theta}_j - \tilde{\theta}_j\|
= O_p(n\sqrt{Kp})\,\|\bar{\theta}_j - \tilde{\theta}_j\|.
\]

Together with $\|\bar{\theta}_j - \tilde{\theta}_j\| \le \|\theta_j - \tilde{\theta}_j\| = O\{n^{a/2-1/2}\ln n/\sqrt{Kp} + s_0\sqrt{K/(pn)}\}$, we obtain that:

\[
L_{n1} = \tilde{L}_n(\theta_j) - \tilde{L}_n(\underline{\theta}_j)
= \sum_{k=1}^K \sum_{\ell \in \mathcal{S}_{j,k}^c} \left\{O_p(n^{a/2+1/2}\ln n) + O_p(Ks_0\sqrt{n})\right\} \beta_{k,j,\ell}. \tag{12}
\]

For $L_{n2}$, let $\dot{\rho}(\beta; \lambda, \gamma)$ denote the derivative of $\rho(\beta; \lambda, \gamma)$ with respect to $\beta$; specifically, for $\beta \ge 0$,

\[
\dot{\rho}(\beta; \lambda, \gamma) = \left(\lambda - \frac{\beta}{\gamma}\right) I\{\beta \le \lambda\gamma\}.
\]
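To make this calculus concrete, the following is a minimal numerical sketch in Python (our illustration, not the authors' code; the values of $K$, $\lambda$, $\gamma$, and the coefficients are arbitrary) of $\rho$, $\dot{\rho}$, and the chain-rule product for the composite penalty that appears in (13)–(14) below:

```python
import numpy as np

# Minimal sketch (our illustration, not the authors' code) of the MCP
# rho(t; lambda, gamma) and its derivative, and of the chain-rule product
# for the composite penalty with outer parameter K*lambda*gamma/2.

def rho(t, lam, gam):
    t = np.abs(t)
    return np.where(t <= gam * lam, lam * t - t ** 2 / (2 * gam), gam * lam ** 2 / 2)

def rho_dot(t, lam, gam):
    t = np.abs(t)
    return (lam - t / gam) * (t <= gam * lam)

K, lam, gam = 3, 0.5, 2.0                  # arbitrary illustrative values
beta = np.array([1.8, -1.5, 0.0])          # hypothetical beta_{k,j,l}, k = 1,...,K
m = int(np.sum(np.abs(beta) > gam * lam))  # m_{j,l}: coordinates past the MCP knot

inner = rho(beta, lam, gam).sum()          # sum_k rho(|beta_k|; lambda, gamma)
chain = rho_dot(inner, lam, K * lam * gam / 2) * rho_dot(0.0, lam, gam)

# For a zero coordinate, the chain-rule derivative approaches
# (1 - m/K) * lambda^2, the limit derived in (14) below:
print(chain, (1 - m / K) * lam ** 2)       # both equal 1/12 here
```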

By Taylor expansion, we have:

\[
L_{n2} = -n\sum_{k=1}^K \sum_{\ell \in \mathcal{S}_{j,k}^c} \dot{\rho}\left\{\sum_{k'=1}^K \rho(|\breve{\beta}_{k',j,\ell}|; \lambda, \gamma); \lambda, \frac{K\lambda\gamma}{2}\right\} \dot{\rho}(|\breve{\beta}_{k,j,\ell}|; \lambda, \gamma)\, |\beta_{k,j,\ell}|, \tag{13}
\]

where $\breve{\beta}$ lies between $\underline{\beta}$ and $\beta$. Result (13) also relies on the following fact: by Assumption 6, $|\breve{\beta}_{k,j,\ell}| > \gamma\lambda$ for $\ell \in \mathcal{S}_{j,k}$, and thus $\dot{\rho}(|\breve{\beta}_{k,j,\ell}|; \lambda, \gamma) = 0$ for $\ell \in \mathcal{S}_{j,k}$.

Note that for $\ell \in \mathcal{S}_{j,k}^c$, as $n \to \infty$,

\[
\dot{\rho}(|\breve{\beta}_{k,j,\ell}|; \lambda, \gamma) \to \lambda,
\]

and

\[
\sum_{k'=1}^K \rho(|\breve{\beta}_{k',j,\ell}|; \lambda, \gamma)
= \sum_{k' \neq k} \rho(|\breve{\beta}_{k',j,\ell}|; \lambda, \gamma) + \rho(|\breve{\beta}_{k,j,\ell}|; \lambda, \gamma)
\to m_{j,\ell}\,\frac{\gamma\lambda^2}{2},
\]

where $m_{j,\ell} = \#\{k' : \beta_{k',j,\ell}^0 \neq 0\} < K$ is the number of nonzero elements among $\beta_{1,j,\ell}^0, \dots, \beta_{K,j,\ell}^0$. Therefore, for $\ell \in \mathcal{S}_{j,k}^c$, as $n \to \infty$,

\[
\dot{\rho}\left\{\sum_{k'=1}^K \rho(|\breve{\beta}_{k',j,\ell}|; \lambda, \gamma); \lambda, \frac{K\lambda\gamma}{2}\right\} \dot{\rho}(|\breve{\beta}_{k,j,\ell}|; \lambda, \gamma)
\to \left(1 - \frac{m_{j,\ell}}{K}\right)\lambda^2. \tag{14}
\]

Following similar arguments as in the proof of Theorem 1 in [9], and combining them with the results in (12) and (14), we can obtain that, as $n \to \infty$,

\[
L_n(\theta_j) - L_n(\underline{\theta}_j) = L_{n1} + L_{n2}
\le \sum_{k=1}^K \sum_{\ell \in \mathcal{S}_{j,k}^c} \left[\left\{O_p(n^{a/2+1/2}\ln n) + O_p(Ks_0\sqrt{n})\right\} - n\left(1 - \frac{m_{j,\ell}}{K}\right)\lambda^2\right] |\beta_{k,j,\ell}|.
\]

By Assumption 5, $n^{a/2-1/2}\ln n/\lambda^2 \to 0$ and $Ks_0/(\sqrt{n}\lambda^2) \to 0$ as $n \to \infty$, so that both $O_p$ terms in the display above are $o_p(n\lambda^2)$. We can thus prove that $L_n(\theta_j) - L_n(\underline{\theta}_j) < 0$, which indicates that $\tilde{\theta}_j$ is a local maximizer of $L_n(\theta_j)$. This completes the proof.

Proof of Theorem 3. From Theorem 2, we know that there exists a strict local maximizer $\hat{\theta}_j$ of $L_n(\theta_j)$, which is asymptotically as efficient as $\tilde{\theta}_j$. Next we consider the properties of the estimators $\hat{\Omega}_k$, $k = 1, \dots, K$, obtained from (5) and the resulting edge sets $\hat{E}_k = \{(\ell, j) : 1 \le \ell \neq j \le p \text{ and } \hat{\omega}_{k,\ell j} \neq 0\}$. Recall that

\[
\beta_{k,j,\ell} = -\omega_{k,\ell j}/\sqrt{\omega_{k,jj}}, \qquad \tau_{k,j} = \sqrt{\omega_{k,jj}}.
\]

Thus,

\[
\omega_{k,\ell j} = -\beta_{k,j,\ell}\,\tau_{k,j}, \qquad \omega_{k,jj} = \tau_{k,j}^2.
\]
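As an illustration of this reconstruction step, the following is a minimal Python sketch (the sign convention and the averaging-based symmetrization are our assumptions, not necessarily the authors' exact rule in (5)):

```python
import numpy as np

# Minimal sketch: rebuild the k-th precision matrix from the regression-scale
# parameters, using omega_jj = tau_j**2 and omega_lj = -beta[j, l] * tau_j as
# in the display above. The symmetrization is our assumption, since the p
# subproblems are fit separately and need not return a symmetric matrix.

def rebuild_precision(beta, tau):
    """beta: (p, p) array, beta[j, l] from the j-th subproblem (diagonal unused);
    tau: (p,) array with tau[j]**2 the j-th diagonal entry of Omega."""
    p = len(tau)
    omega = np.diag(tau ** 2)
    for j in range(p):
        for l in range(p):
            if l != j:
                omega[l, j] = -beta[j, l] * tau[j]
    return (omega + omega.T) / 2  # symmetrize by averaging

# toy example with p = 3
tau = np.array([1.0, 1.2, 0.9])
beta = np.array([[0.0, 0.4, 0.0],
                 [0.3, 0.0, 0.2],
                 [0.0, 0.5, 0.0]])
print(rebuild_precision(beta, tau))
```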

Together with the result in Theorem 2, it is then clear that, with probability tending to 1, $\hat{E}_k = E_k^0$. Furthermore, since $|\hat{\tau}_{k,j} - \tau_{k,j}^0| = O_p(\sqrt{Ks_0/n})$ and $\tau_{k,j}^0$ is bounded by Assumption 6, we have $|\hat{\tau}_{k,j}/\tau_{k,j}^0 - 1| = O_p(\sqrt{Ks_0/n})$. Thus, for the diagonal elements $\omega_{k,jj}$, we have:

\[
|\hat{\omega}_{k,jj} - \omega_{k,jj}^0| = |\hat{\tau}_{k,j}^2 - (\tau_{k,j}^0)^2|
= |\hat{\tau}_{k,j} - \tau_{k,j}^0|\,(\hat{\tau}_{k,j} + \tau_{k,j}^0)
= O_p\left\{\sqrt{Ks_0/n}\left(\max_k \max_j \tau_{k,j}^0\right)\right\}
= O_p(\sqrt{Ks_0/n}).
\]

Together with $\|\hat{\beta}_{k,j,\cdot} - \beta_{k,j,\cdot}^0\| = O_p(\sqrt{Ks_0/n})$, for the off-diagonal elements $\omega_{k,\ell j}$ with $\ell \neq j$, we have:

\[
|\hat{\omega}_{k,\ell j} - \omega_{k,\ell j}^0| = |\hat{\beta}_{k,j,\ell}\hat{\tau}_{k,j} - \beta_{k,j,\ell}^0\tau_{k,j}^0|
\le |\hat{\beta}_{k,j,\ell}\hat{\tau}_{k,j} - \beta_{k,j,\ell}^0\hat{\tau}_{k,j}| + |\beta_{k,j,\ell}^0\hat{\tau}_{k,j} - \beta_{k,j,\ell}^0\tau_{k,j}^0|
\le \hat{\tau}_{k,j}|\hat{\beta}_{k,j,\ell} - \beta_{k,j,\ell}^0| + |\hat{\tau}_{k,j}/\tau_{k,j}^0 - 1|\,|\omega_{k,\ell j}^0|
= O_p(\sqrt{Ks_0/n}).
\]

Thus, we conclude that $\max_{k,j,\ell}|\hat{\omega}_{k,\ell j} - \omega_{k,\ell j}^0| = O_p(\sqrt{Ks_0/n})$.

Similarly,

\[
\sum_{k=1}^K \|\hat{\Omega}_k - \Omega_k^0\|
= \sum_{k=1}^K \max_j \left\{\sum_{\ell=1}^p |\hat{\omega}_{k,\ell j} - \omega_{k,\ell j}^0|\right\}
= \sum_{k=1}^K \max_j \left\{\sum_{\ell \neq j} |\hat{\beta}_{k,j,\ell}\hat{\tau}_{k,j} - \beta_{k,j,\ell}^0\tau_{k,j}^0| + |\hat{\tau}_{k,j}^2 - (\tau_{k,j}^0)^2|\right\}
\le \sum_{k=1}^K \max_j \left\{\sum_{\ell \neq j}\left(|\hat{\beta}_{k,j,\ell}\hat{\tau}_{k,j} - \beta_{k,j,\ell}^0\hat{\tau}_{k,j}| + |\beta_{k,j,\ell}^0\hat{\tau}_{k,j} - \beta_{k,j,\ell}^0\tau_{k,j}^0|\right) + |\hat{\tau}_{k,j} - \tau_{k,j}^0|(\hat{\tau}_{k,j} + \tau_{k,j}^0)\right\}
\le \sum_{k=1}^K \max_j \left\{|\hat{\tau}_{k,j} - \tau_{k,j}^0|\sum_{\ell \neq j}|\beta_{k,j,\ell}^0| + \hat{\tau}_{k,j}\sum_{\ell \neq j}|\hat{\beta}_{k,j,\ell} - \beta_{k,j,\ell}^0| + |\hat{\tau}_{k,j} - \tau_{k,j}^0|(\hat{\tau}_{k,j} + \tau_{k,j}^0)\right\}
\le O_p\left(\sqrt{Ks_0/n}\,\max_j \sum_{k=1}^K \sum_{\ell \neq j}|\beta_{k,j,\ell}^0| + \sqrt{K^3 s_0^2/n} + \sqrt{K^3 s_0/n}\right)
= O_p\left(\sqrt{K^3 s_0^3/n}\right).
\]

In addition,

\[
\sum_{k=1}^K \|(\hat{\Omega}_k - \Omega_k^0)_{\mathcal{A}}\|_F
= \sum_{k=1}^K \left\{\sum_{j=1}^p \sum_{\ell \neq j} (\hat{\omega}_{k,\ell j} - \omega_{k,\ell j}^0)^2\right\}^{1/2}
= \sum_{k=1}^K \left\{\sum_{j=1}^p \sum_{\ell \neq j} (\hat{\beta}_{k,j,\ell}\hat{\tau}_{k,j} - \beta_{k,j,\ell}^0\tau_{k,j}^0)^2\right\}^{1/2}
\le \sum_{k=1}^K \left\{2\sum_{j=1}^p (\hat{\tau}_{k,j} - \tau_{k,j}^0)^2 \sum_{\ell \neq j}(\beta_{k,j,\ell}^0)^2 + 2\sum_{j=1}^p \hat{\tau}_{k,j}^2 \sum_{\ell \neq j}(\hat{\beta}_{k,j,\ell} - \beta_{k,j,\ell}^0)^2\right\}^{1/2}
\le \sum_{k=1}^K \left\{2\sum_{j=1}^p (\hat{\tau}_{k,j}/\tau_{k,j}^0 - 1)^2 \sum_{\ell \neq j}(\omega_{k,\ell j}^0)^2 + 2\sum_{j=1}^p \hat{\tau}_{k,j}^2 \sum_{\ell \neq j}(\hat{\beta}_{k,j,\ell} - \beta_{k,j,\ell}^0)^2\right\}^{1/2}
\le O_p(\sqrt{Ks_0/n}) \sum_{k=1}^K \left\{\sum_{j=1}^p \sum_{\ell \neq j}(\omega_{k,\ell j}^0)^2\right\}^{1/2}
+ O_p\left[\sum_{k=1}^K \left\{\sum_{j=1}^p \sum_{\ell \neq j}(\hat{\beta}_{k,j,\ell} - \beta_{k,j,\ell}^0)^2\right\}^{1/2}\right]
\le O_p\left(\sqrt{K^3 s_0 s/n}\right) = O_p\left(\sqrt{K^3 s^2/n}\right).
\]

This completes the proof.


Table 2:

Simulation results under the scenarios with the nearest-neighbor networks and $K = 3$. In each cell, we show the median (median absolute deviation) of the CE, PME, TPR, and FPR values based on 100 replicates, where $\mathrm{CE} = \binom{n}{2}^{-1}\#\{(i, i') : I\{\hat{\varphi}(x_i) = \hat{\varphi}(x_{i'})\} \neq I\{\varphi(x_i) = \varphi(x_{i'})\},\ i < i'\}$, $\mathrm{PME} = \sum_{k=1}^K \|\hat{\Omega}_k - \Omega_k^0\|_F/K$, $\mathrm{TPR} = \sum_{k=1}^K \{\sum_{\ell<j} I(\omega_{k,\ell j}^0 \neq 0, \hat{\omega}_{k,\ell j} \neq 0)/\sum_{\ell<j} I(\omega_{k,\ell j}^0 \neq 0)\}/K$, and $\mathrm{FPR} = \sum_{k=1}^K \{\sum_{\ell<j} I(\omega_{k,\ell j}^0 = 0, \hat{\omega}_{k,\ell j} \neq 0)/\sum_{\ell<j} I(\omega_{k,\ell j}^0 = 0)\}/K$. A minimal code sketch of these metrics is given after the table.

Approach CE PME TPR FPR
S1 with the balanced design
proposed 0.000(0.000) 18.171(0.524) 0.982(0.005) 0.037(0.002)
SCAN 0.000(0.000) 34.619(0.392) 0.710(0.008) 0.076(0.001)
Tiger - 37.508(0.207) 0.817(0.016) 0.184(0.004)
True+Tiger - 22.575(0.616) 0.862(0.007) 0.068(0.001)
True+JGL - 22.317(0.562) 0.904(0.008) 0.178(0.004)
True+JSEM - 18.468(0.498) 0.964(0.005) 0.084(0.003)
S2 with the balanced design
proposed 0.000(0.000) 17.441(0.515) 0.987(0.003) 0.037(0.001)
SCAN 0.002(0.002) 36.546(0.699) 0.713(0.008) 0.073(0.002)
Tiger - 40.345(0.245) 0.825(0.013) 0.183(0.004)
True+Tiger - 23.529(0.654) 0.867(0.006) 0.066(0.001)
True+JGL - 21.983(0.497) 0.925(0.016) 0.190(0.020)
True+JSEM - 17.508(0.603) 0.980(0.005) 0.071(0.003)
S3 with the balanced design
proposed 0.000(0.000) 17.637(0.507) 0.990(0.003) 0.037(0.002)
SCAN 0.002(0.002) 34.577(0.543) 0.739(0.005) 0.078(0.001)
Tiger - 38.052(0.304) 0.808(0.015) 0.192(0.006)
True+Tiger - 24.609(0.612) 0.852(0.008) 0.072(0.001)
True+JGL - 22.986(0.589) 0.919(0.014) 0.215(0.008)
True+JSEM - 19.003(0.477) 0.957(0.005) 0.073(0.003)
S1 with the imbalanced design
proposed 0.000(0.000) 18.582(0.864) 0.977(0.003) 0.037(0.002)
SCAN 0.000(0.000) 33.991(0.492) 0.710(0.008) 0.075(0.001)
Tiger - 37.336(0.196) 0.824(0.010) 0.181(0.004)
True+Tiger - 23.089(0.447) 0.862(0.005) 0.071(0.001)
True+JGL - 22.730(0.562) 0.907(0.006) 0.179(0.002)
True+JSEM - 18.693(0.413) 0.962(0.005) 0.083(0.004)
S2 with the imbalanced design
proposed 0.000(0.000) 17.526(0.909) 0.980(0.008) 0.038(0.002)
SCAN 0.002(0.002) 35.934(0.459) 0.718(0.005) 0.072(0.002)
Tiger - 40.089(0.176) 0.817(0.008) 0.178(0.003)
True+Tiger - 24.006(0.455) 0.863(0.007) 0.071(0.002)
True+JGL - 22.817(0.562) 0.919(0.013) 0.176(0.005)
True+JSEM - 17.546(0.663) 0.980(0.003) 0.071(0.004)
S3 with the imbalanced design
proposed 0.000(0.000) 17.688(0.609) 0.987(0.003) 0.038(0.001)
SCAN 0.002(0.002) 34.453(0.414) 0.739(0.008) 0.076(0.002)
Tiger - 38.031(0.239) 0.824(0.012) 0.192(0.005)
True+Tiger - 25.031(0.467) 0.857(0.005) 0.076(0.002)
True+JGL - 23.854(0.716) 0.921(0.010) 0.219(0.005)
True+JSEM - 19.367(0.456) 0.957(0.005) 0.074(0.004)
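
As referenced in the caption, the following is a minimal Python sketch (our own helper, not the authors' evaluation code) of how the four metrics can be computed from subgroup label vectors and lists of true/estimated precision matrices:

```python
import numpy as np
from itertools import combinations

# Minimal sketch of the metrics in the caption of Table 2. Assumes each
# network has at least one true edge and one true non-edge.

def clustering_error(labels_hat, labels_true):
    """CE: fraction of sample pairs whose co-clustering status is wrong."""
    n = len(labels_true)
    wrong = sum((labels_hat[i] == labels_hat[j]) != (labels_true[i] == labels_true[j])
                for i, j in combinations(range(n), 2))
    return wrong / (n * (n - 1) / 2)

def network_metrics(Omega_hat, Omega0):
    """PME, TPR, and FPR averaged over the K subgroup networks."""
    K = len(Omega0)
    pme = sum(np.linalg.norm(Omega_hat[k] - Omega0[k], "fro") for k in range(K)) / K
    tpr = fpr = 0.0
    for k in range(K):
        iu = np.triu_indices_from(Omega0[k], k=1)   # off-diagonal pairs l < j
        true_edge = Omega0[k][iu] != 0
        est_edge = Omega_hat[k][iu] != 0
        tpr += np.mean(est_edge[true_edge])          # recovered true edges
        fpr += np.mean(est_edge[~true_edge])         # spurious edges
    return pme, tpr / K, fpr / K
```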

Acknowledgments

The authors thank the editors and reviewers for their careful review and insightful comments. This work was supported by the National Natural Science Foundation of China [12071273], the Shanghai Rising-Star Program [22QA1403500], the Shanghai Research Center for Data Science and Decision Technology, the National Institutes of Health [CA204120], and the National Science Foundation [2209685].


Appendix

The Supplementary Materials, available online, contain additional results on the proposed algorithm, the simulations, and the data analysis.

References

[1] Banerjee O, El Ghaoui L, d'Aspremont A, Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data, The Journal of Machine Learning Research 9 (2008) 485–516.
[2] Bilgrau AE, Peeters CF, Eriksen PS, Bøgsted M, van Wieringen WN, Targeted fused ridge estimation of inverse covariance matrices from multiple high-dimensional data classes, Journal of Machine Learning Research 21 (2020) 1–52.
[3] Breheny P, Huang J, Penalized methods for bi-level variable selection, Statistics and Its Interface 2 (2009) 369–380.
[4] Buphamalai P, Kokotovic T, Nagy V, Menche J, Network analysis reveals rare disease signatures across multiple levels of biological organization, Nature Communications 12 (2021) 1–15.
[5] Cai T, Li H, Liu W, Xie J, Joint estimation of multiple high-dimensional precision matrices, Statistica Sinica 26 (2016) 445–464.
[6] Cai T, Liu W, Luo X, A constrained ℓ1 minimization approach to sparse precision matrix estimation, Journal of the American Statistical Association 106 (2011) 594–607.
[7] Danaher P, Wang P, Witten DM, The joint graphical lasso for inverse covariance estimation across multiple classes, Journal of the Royal Statistical Society: Series B (Statistical Methodology) 76 (2014) 373–397.
[8] Engelke S, Hitz AS, Graphical models for extremes, Journal of the Royal Statistical Society: Series B (Statistical Methodology) 82 (2020) 871–932.
[9] Fan J, Lv J, Nonconcave penalized likelihood with NP-dimensionality, IEEE Transactions on Information Theory 57 (2011) 5467–5484.
[10] Fan J, Peng H, Nonconcave penalized likelihood with a diverging number of parameters, The Annals of Statistics 32 (2004) 928–961.
[11] Fan X, Fang K, Ma S, Wang S, Zhang Q, Assisted graphical model for gene expression data analysis, Statistics in Medicine 38 (2019) 2364–2380.
[12] Friedman J, Hastie T, Tibshirani R, Sparse inverse covariance estimation with the graphical lasso, Biostatistics 9 (2008) 432–441.
[13] Gao C, Zhu Y, Shen X, Pan W, Estimation of multiple networks in Gaussian mixture models, Electronic Journal of Statistics 10 (2016) 1133–1154.
[14] Gibberd AJ, Nelson JD, Regularized estimation of piecewise constant Gaussian graphical models: The group-fused graphical lasso, Journal of Computational and Graphical Statistics 26 (2017) 623–634.
[15] Hao B, Sun WW, Liu Y, Cheng G, Simultaneous clustering and estimation of heterogeneous graphical models, Journal of Machine Learning Research 18 (2018) 1–58.
[16] Hill SM, Heiser LM, Cokelaer T, Unger M, Nesser NK, Carlin DE, Zhang Y, Sokolov A, Paull EO, Wong CK, et al., Inferring causal molecular networks: empirical assessment through a community-based effort, Nature Methods 13 (2016) 310–318.
[17] Huang Y, Zhang Q, Zhang S, Huang J, Ma S, Promoting similarity of sparsity structures in integrative analysis with penalization, Journal of the American Statistical Association 112 (2017) 342–350.
[18] Hui FKC, Warton DI, Foster SD, Multi-species distribution modeling using penalized mixture of regressions, The Annals of Applied Statistics 9 (2015) 866–882.
[19] Hung M-C, Link W, Protein localization in disease and therapy, Journal of Cell Science 124 (2011) 3381–3392.
[20] Khalili A, Lin S, Regularization in finite mixture of regression models with diverging number of parameters, Biometrics 69 (2013) 436–446.
[21] Kim MS, Gernapudi R, Cedeño YC, Polster BM, Martinez R, Shapiro P, Kesari S, Nurmemmedov E, Passaniti A, Targeting breast cancer metabolism with a novel inhibitor of mitochondrial ATP synthesis, Oncotarget 11 (2020) 3863–3885.
[22] Liu H, Wang L, Tiger: A tuning-insensitive approach for optimally estimating Gaussian graphical models, Electronic Journal of Statistics 11 (2017) 241–294.
[23] Liu J, Huang J, Xie Y, Ma S, Sparse group penalized integrative analysis of multiple cancer prognosis datasets, Genetics Research 95 (2013) 68–77.
[24] Liu W, Luo X, Fast and adaptive sparse precision matrix estimation in high dimensions, Journal of Multivariate Analysis 135 (2015) 153–162.
[25] Ma J, Michailidis G, Joint structural estimation of multiple graphical models, The Journal of Machine Learning Research 17 (2016) 5777–5824.
[26] Meinshausen N, Bühlmann P, High-dimensional graphs and variable selection with the lasso, The Annals of Statistics 34 (2006) 1436–1462.
[27] Niu Y, Ni Y, Pati D, Mallick BK, Covariate-assisted Bayesian graph learning for heterogeneous data, Journal of the American Statistical Association (2023) 1–15.
[28] Price BS, Molstad AJ, Sherwood B, Estimating multiple precision matrices with cluster fusion regularization, Journal of Computational and Graphical Statistics 30 (2021) 823–834.
[29] Ren M, Zhang S, Zhang Q, Ma S, et al., Gaussian graphical model-based heterogeneity analysis via penalized fusion, Biometrics 78 (2022) 524–535.
[30] Shen J, Liu F, Tu Y, Tang C, Finding gene network topologies for given biological function with recurrent neural network, Nature Communications 12 (2021) 1–10.
[31] Sherman BT, Hao M, Qiu J, Jiao X, Baseler MW, Lane HC, Imamichi T, Chang W, DAVID: a web server for functional enrichment analysis and functional annotation of gene lists (2021 update), Nucleic Acids Research 50 (2022) W216–W221.
[32] Städler N, Bühlmann P, Van De Geer S, ℓ1-penalization for mixture regression models, Test 19 (2010) 209–256.
[33] Sun T, Zhang C-H, Scaled sparse linear regression, Biometrika 99 (2012) 879–898.
[34] Sun T, Zhang C-H, Sparse matrix inversion with scaled lasso, The Journal of Machine Learning Research 14 (2013) 3385–3418.
[35] Tavares S, Vieira AF, Taubenberger AV, Araújo M, Martins NP, Brás-Pereira C, Polónia A, Herbig M, Barreto C, Otto O, et al., Actin stress fiber organization promotes cell stiffening and proliferation of pre-invasive breast cancer cells, Nature Communications 8 (2017) 1–18.
[36] Wallace DC, Mitochondria and cancer, Nature Reviews Cancer 12 (2012) 685–698.
[37] Wang X, Li S, Wang S, Zheng S, Chen Z, Song H, Protein binding nanoparticles as an integrated platform for cancer diagnosis and treatment, Advanced Science 9 (2022) 2202453.
[38] Yi H, Zhang Q, Lin C, Ma S, Information-incorporated Gaussian graphical model for gene expression data, Biometrics 78 (2022) 512–523.
[39] Zhan T, Rindtorff N, Boutros M, Wnt signaling in cancer, Oncogene 36 (2017) 1461–1473.
[40] Zhang X-F, Ou-Yang L, Yan T, Hu XT, Yan H, A joint graphical model for inferring gene networks across multiple subpopulations and data types, IEEE Transactions on Cybernetics 51 (2019) 1043–1055.
[41] Zhao J, Zhang J, Yu M, Xie Y, Huang Y, Wolff DW, Abel PW, Tu Y, Mitochondrial dynamics regulates migration and invasion of breast cancer cells, Oncogene 32 (2013) 4814–4824.
[42] Zhong T, Zhang Q, Huang J, Wu M, Ma S, Heterogeneity analysis via integrating multi-sources high-dimensional data with applications to cancer studies, Statistica Sinica 33 (2023) 729–758.
[43] Zvelebil M, Oliemuller E, Gao Q, Wansbury O, Mackay A, Kendrick H, Smalley MJ, Reis-Filho JS, Howard BA, Embryonic mammary signature subsets are activated in Brca1−/− and basal-like breast cancers, Breast Cancer Research 15 (2013) 1–17.
