Author manuscript; available in PMC: 2022 Sep 8.
Published in final edited form as: J Am Stat Assoc. 2022 Jan 5;117(538):533–546. doi: 10.1080/01621459.2021.2000866

Bayesian Edge Regression in Undirected Graphical Models to Characterize Interpatient Heterogeneity in Cancer

Zeya Wang 1, Ahmed O Kaseb 2, Hesham M Amin 3, Manal M Hassan 4, Wenyi Wang 5, Jeffrey S Morris 6
PMCID: PMC9454401  NIHMSID: NIHMS1759119  PMID: 36090952

Abstract

It is well established that interpatient heterogeneity in cancer can significantly affect genomic data analyses and, in particular, network topologies. Most existing graphical model methods estimate a single population-level graph for a genomic or proteomic network. In many investigations, however, these networks vary across subjects, and this heterogeneity can be characterized through subject-level covariates. Examples include assessing how the network varies with patient-specific prognostic scores, or comparing tumor and normal graphs while accounting for tumor purity as a continuous predictor. In this paper, we propose a novel edge regression model for undirected graphs, which estimates conditional dependencies as a function of subject-level covariates. We evaluate the performance of our model through simulation studies focused on comparing tumor and normal graphs while adjusting for tumor purity. In an application to a dataset of proteomic measurements on plasma samples from patients with hepatocellular carcinoma (HCC), we ascertain how blood protein networks vary with disease severity, as measured by HepatoScore, a novel biomarker signature of disease severity. Our case study shows that network connectivity increases with HepatoScore, and it identifies a set of hub genes and important gene connections at different HepatoScore levels, which may provide biological insights for the development of precision therapies for HCC.

Keywords: Undirected graphical models, Non-static graph, Gene regulatory network, Tumor heterogeneity, Bayesian adaptive shrinkage

1. Introduction

The proliferation of new technologies that can simultaneously measure genetic, transcriptomic, and proteomic markers has revolutionized biomedical research and contributed to the advent of precision therapy, whereby medical treatment strategies are tailored to individual patients on the basis of the molecular characteristics of their disease. While certain individual genes have key biological roles in healthy and/or diseased cells, the molecular processes underlying the functional behavior of multicellular organisms or complex diseases are determined not by individual genetic factors but by complex interactions among various molecules at multiple molecular resolution levels. Network biology is a nascent and burgeoning subfield of systems biology that involves the discovery and characterization of molecular interactions underlying complex diseases, including cancer. Graphical models, which characterize the conditional dependency structure among random variables, are widely used in genomic studies to build networks representing interactions among biological units such as genes and proteins.

There has been a great deal of work on graphical models over the past decade. One model class that has proven useful for discovering biological networks is the undirected graphical model, in which nodes index random variables and edges connecting nodes represent the global conditional dependency structure among the variables (Lauritzen 1996). A popular tool for studying undirected graphs is the Gaussian graphical model, in which conditional independence, that is, the absence of an edge, corresponds to a zero entry in the precision (or concentration) matrix of the multivariate Gaussian distribution (Dempster 1972). Gaussian graphical models have also attracted growing interest in recent work on distributed statistical learning (Lee et al. 2017).

However, most graphical model work in the existing literature involves estimating a single network for a population, while inter-patient heterogeneity in many complex diseases, including cancer, suggests that these networks may vary across patients. Characterizing this heterogeneity has the potential to reveal differences in molecular processes across patients that can lead to the discovery of novel precision therapy strategies. One way to characterize inter-patient network heterogeneity is to assess how the networks vary with patient-level covariates. Two specific examples that motivated this work are tumor purity and prognostic indices that explain inter-patient heterogeneity in cancer.

Accounting for tumor purity in biological networks.

Tumor samples are inherently heterogeneous: a clinically derived sample contains different types of cells, which can substantially confound downstream analyses of gene expression or protein profiles of solid tumors (Farley 2015, Junttila & de Sauvage 2013). In practice, tumor samples invariably contain some contaminating normal tissue, and the proportion of a sample that is pure tumor, called tumor purity, varies from sample to sample. For this reason, many standard tumor versus normal comparisons are biased, typically attenuated, because they assume that the tumor samples are pure and do not adjust for tumor purity. The same holds for more advanced analyses such as gene or protein networks: any comparison of normal and tumor networks is similarly biased by this factor. Deconvolution models such as DeMixT (Wang et al. 2018) can be fit to molecular data from tumor and normal samples to obtain an estimate of the tumor purity, πi, for each sample i = 1, …, N. Including this measurement as a continuous covariate in a graph regression model enables estimation of pure normal and pure tumor networks in a way that adjusts for this heterogeneous contamination.

Differential biological networks by severity of disease.

Characterizing inter-patient heterogeneity within a given cancer can contribute to new precision therapy strategies. For example, important biological mechanisms can be revealed by assessing how various gene-gene or protein-protein networks strengthen or weaken with advancing disease. In hepatocellular carcinoma (HCC), Morris et al. (2020) developed a novel prognostic signature, called the HepatoScore, that is computed from a patient’s plasma protein profile. The HepatoScore for a given patient is a score πi ∈ [0, 1] that quantifies the degree of aberration in the patient’s blood protein profile relative to healthy subjects: π ≈ 0 indicates a protein profile essentially no different from that of a healthy subject, π ≈ 1 indicates a maximally aberrant profile, and intermediate values (e.g., π ≈ 0.5) indicate a moderate level of aberration. Although it was determined without consideration of any patient-level clinical factors (i.e., in an unsupervised manner), HepatoScore has demonstrated remarkable prognostic separability in a set of 767 HCC patients, with median survival of 38.2, 18.3, and 7.1 months for low, medium, and high HepatoScore, respectively. This biological score contains more prognostic information than standard factors such as metastasis or nodal involvement and provides a significant refinement of existing staging systems; for example, metastatic HCC patients with low HepatoScore have a substantially better prognosis than non-metastatic HCC patients with high HepatoScore. HepatoScore can be shown to be driven by a number of key proteins, including some in pathways relevant to HCC such as growth hormone (GH), angiogenesis, and immune response. By modeling how protein networks vary with HepatoScore as a continuous covariate, we can assess which protein-protein connections characterize advanced disease and gain molecular insights into this inter-patient heterogeneity.

Literature on heterogeneous graphical models.

There are a number of papers in the existing literature on heterogeneous graphs, but none precisely solves the problem underlying our motivating examples. Many of them concern “group graphs”, in which graphical models are jointly estimated for discretized groups of subjects in a way that yields group-specific graphs while borrowing strength between groups on common edges. Two-sample inference can be used to test for differential edges between groups. Xia et al. (2015) developed a multiple testing procedure to detect gene-by-gene interactions with binary traits, while Narayan et al. (2015) proposed a resampling, random penalization, and random effects testing method to identify functional brain connections that differ between two groups in neuroimaging data. Many other works consider more than two groups (Guo et al. 2011, Danaher et al. 2014, Cai et al. 2016, Liu et al. 2017, Saegusa & Shojaie 2016). Liu et al. (2017) extended the two-sample test to capture the structural similarities and differences among multiple Gaussian graphical models. Guo et al. (2011) jointly estimated multiple graphical models by incorporating a hierarchical penalty for common factors and group-specific factors. Danaher et al. (2014) developed a more general model by employing fused lasso or group lasso penalties to encourage shared edges across the estimated precision matrices. Saegusa & Shojaie (2016) applied a Laplacian shrinkage penalty to encourage similarity among estimates from related subpopulations, and further proposed a Laplacian penalty based on hierarchical clustering for unknown population structures. More recently, Bayesian approaches to modeling group graphs have become popular for constructing differential biological networks (Peterson et al. 2015, Tan et al. 2017, Mitra et al. 2016, Lin et al. 2017). Peterson et al. (2015) used a Markov random field prior to encourage common structure across groups. Tan et al. (2017) investigated the effect of cadmium on metabolic associations by inducing multiplicative priors on the graph space. Lin et al. (2017) proposed a Bayesian neighborhood selection method that jointly estimates multiple Gaussian graphical models for data with both spatial and temporal structure by incorporating this structure directly. Building on this progress in the joint estimation of multiple graphs, there is growing work on heterogeneous graphical models with more relaxed assumptions on the observations (Yang et al. 2014, Hao et al. 2017). Hao et al. (2017) proposed a method that learns the cluster structure of the data while estimating multiple graphical models, without requiring the group membership of observations to be specified. Yang et al. (2014) developed a class of mixed graphical models in which each node-conditional distribution belongs to a possibly different univariate exponential family, thereby allowing the random variables to take values in heterogeneous domains. While groups can be viewed as categorical covariates, these methods do not model graphical variation across continuous covariates, which are of primary interest in our motivating examples and in the present paper.

Other methods model heterogeneity in graphs or covariance matrices across covariates, but they are not suitable for our setting. Hoff & Niu (2012) proposed a covariance regression model that regresses a covariance matrix on a set of explanatory variables using a factor analysis formulation. Zou et al. (2017) studied different estimators that parameterize the covariance matrix as a function of predictors. Cai et al. (2012) proposed a covariate-adjusted Gaussian graphical model that regresses a p-dimensional vector of responses on a q-dimensional vector of covariates; however, the precision matrix does not depend on the predictors, so this method evaluates only how the nodes change with covariates, not how node-to-node dependencies do. Zhou et al. (2010) and Kolar & Xing (2009) developed dynamic undirected graphical models that vary with time. Cheng et al. (2014) modeled multivariate binary data using an Ising model to study how dependencies change with covariates. While some of these designs model covariance heterogeneity, they either do not provide node-specific inference, deal with covariance rather than precision matrices, or cannot be applied to general regression settings with multiple covariates for Gaussian graphical models.

Liu et al. (2010) proposed Graph-optimized classification and regression trees that partition the covariate space and estimate a graph within each partitioned subspace. While quite flexible, as reported by Cheng et al. (2014), this model lacks interpretability with respect to the graphical model and covariates, and it has the undesirable property that graphs constructed for covariate values close to each other are not necessarily similar. Kolar et al. (2010) proposed a machine learning method that applies a penalized kernel smoothing approach and allows the precision matrix to change with covariates. One weakness of this method is that it ignores the intrinsic symmetry of the precision matrix, which may produce contradictory results in neighborhood selection and obscure subsequent interpretation. Also using a kernel regression-based approach, Lee & Xue (2018) proposed another covariate-dependent graphical model that uses a nonparametric mixture of Gaussian graphical models, with a single scalar covariate controlling the mixture probabilities and distributions. The finite mixture model effectively clusters the subjects into discrete subgroups based on a partition of the covariate space and then estimates a separate graph for each partition. This method shares the limitations of kernel-based methods: it lacks a clear interpretation of how the graph structure changes with the covariates. More fundamentally, it can handle only a single covariate, not multiple covariates as in the general graph regression framework we develop in this paper. Ni et al. (2018) constructed Bayesian graphical regression models for directed acyclic graphs (DAGs), which allow directed graphs to vary with general covariates, but their approach does not carry over to the undirected graph setting, which poses additional challenges and is our primary interest here.
To our knowledge, no existing work has built a regression model for the edges of undirected graphs that allows general linear model-based effects and multiple covariates; developing such a model is the primary goal of this manuscript.

Outline:

In this paper, we present a Bayesian method to perform edge regression for undirected graphical models. We define edge-specific conditional precision functions that allow the edge strengths of an undirected graphical model to vary with extraneous covariates. We estimate these elements of the precision matrix using a joint regression model in which the two coefficients corresponding to each edge, (i, j) and (j, i), are constrained to be equal, which guarantees the symmetry of the precision matrix. We induce sparsity in both edges and covariates through Bayesian global-local priors that introduce nonlinear shrinkage; posterior edge selection then generates predicted graphs for given sets of covariates while accounting for multiple testing across edges and covariates through Bayesian false discovery rate (FDR) considerations. We demonstrate the performance of this method in a simulation study that estimates gene networks specific to the tumor and stromal components by accounting for the proportions of the two components in the observed mixed data, and in an HCC case study in which we assess the heterogeneity of protein networks across the prognostic index HepatoScore.

The rest of the paper is structured as follows. In Section 2, we formally describe edge regression for undirected graphical models along with several theoretical properties, and then present our model, its sampling scheme, and our posterior inference technique. We present our simulation studies in Section 3 and our HCC case study in Section 4. Section 5 contains a discussion and conclusions.

2. Methods

2.1. Edge regression

A graphical model for a random p-vector Y is defined by a tuple $\mathcal{G}_Y = \{G, P(Y)\}$, where G is a graph and P(Y) denotes its associated distribution. The graph $G = (V, E)$ represents a conditional independence structure among random variables by specifying a set of nodes $V = \{1, 2, \ldots, p\}$ and a set of edges $E \subseteq V \times V$. In this work, our intended application focus is on moderate-sized graphs G, with dozens to somewhat more than 100 nodes and thus hundreds to thousands of potential edges; this is useful in practice for studying genetic pathways, many of which are of that order of magnitude. Each node in G corresponds to a random variable in Y. In an undirected graph, the edges are undirected, so that $(i, j) \in E$ if and only if $(j, i) \in E$. For example, a Gaussian graphical model is defined by assuming that P(Y) is a Gaussian distribution with mean $\mu \in \mathbb{R}^p$ and covariance matrix $\Sigma \in \mathbb{R}^{p \times p}$: $Y_n \sim N(\mu, \Omega^{-1})$, $n = 1, \ldots, N$, where $Y_n$ is the observed data and $\Omega = \Sigma^{-1} \in \mathbb{R}^{p \times p}$ is the inverse covariance matrix (a.k.a. the precision or concentration matrix). In a Gaussian graphical model, Ω is a p × p symmetric positive definite matrix with elements $(\omega_{ij})$. If $\omega_{ij} = 0$, then random variables i and j are conditionally independent given all the other variables of Y, which indicates that there is no edge in G between nodes i and j. Therefore, the conditional independence structure of G can be inferred from models for the precision matrix Ω, an approach well known as covariance selection.
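As a small illustration of covariance selection (a sketch, not part of the proposed method), the NumPy snippet below encodes a hypothetical 4-node chain graph in a sparse precision matrix and reads the edge set off its nonzero off-diagonal entries. The helper `edges_from_precision` and the tolerance `tol` are illustrative names. The covariance matrix $\Sigma = \Omega^{-1}$ of this example is dense even though Ω is sparse, which is why edges are read from Ω rather than from Σ.

```python
import numpy as np

def edges_from_precision(omega, tol=1e-10):
    """Hypothetical helper: undirected edges {(i, j): i < j, |omega_ij| > tol}."""
    p = omega.shape[0]
    return {(i, j) for i in range(p) for j in range(i + 1, p)
            if abs(omega[i, j]) > tol}

# Hypothetical chain graph 0 - 1 - 2 - 3 encoded in a sparse, positive-definite precision matrix.
omega = np.array([[2.0, -0.8, 0.0, 0.0],
                  [-0.8, 2.0, -0.8, 0.0],
                  [0.0, -0.8, 2.0, -0.8],
                  [0.0, 0.0, -0.8, 2.0]])
sigma = np.linalg.inv(omega)

print(sorted(edges_from_precision(omega)))  # [(0, 1), (1, 2), (2, 3)]
# Nodes 0 and 3 are conditionally independent (omega_03 = 0) yet marginally correlated.
print(abs(sigma[0, 3]) > 1e-8)               # True
```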

In our proposed edge regression model, given another q-dimensional random vector $X = (x_1, \ldots, x_q)^T$, we consider $\mathcal{G}_Y(X) = \{G(X), P(Y \mid X)\}$, where the precision matrix for each observation $Y_n$ given $X = x_n$ is a function of X, allowing the conditional independence structure to vary from observation to observation over different realizations of X. In the following discussion, we refer to X as extraneous covariates. We denote the precision matrix that depends on X by Ω(X), with elements $\omega_{ij}(X)$. Here we focus on dependence that is linear in X, leaving extensions to nonparametric representations to future work. Ni et al. (2018) established a functional pairwise Markov property for directed acyclic graphs, which implies that the pairwise Markov property still holds given the covariates when the graph $\mathcal{G}_Y$ is modeled with P(Y) as a function of the external covariates X. Similarly, we have the following lemma on functional covariance selection, which we use to represent edge regression for the covariance selection problem.

Lemma 1 (Functional covariance selection rule). Assume that Y has a multivariate Gaussian distribution given extraneous covariates X, with precision matrix Ω(X). Then $Y_i \perp\!\!\!\perp Y_j \mid Y_{-(i,j)}, X \iff \omega_{ij}(X) = 0$. This follows from the covariance selection rule when a set of extraneous covariates X is given. Edge regression includes the following special cases:

  1. If X = ∅, then we have an ordinary undirected graphical model;

  2. If X is a set of discrete covariates (e.g., binary/categorical), then the edge regression model reduces to the problem of estimating multiple graphical models.
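To make the two special cases concrete, the sketch below anticipates the linear parameterization of Section 2.3, writing the off-diagonal elements as $\omega_{ij}(x) = \sum_s \beta_s^{ij} x_s$ with a diagonal that does not depend on x. With only a constant covariate, every subject shares one precision matrix; with one-hot group indicators, each group's precision matrix is picked out by its own coefficient matrix. The helpers `precision_at` and `sym` and all numerical values are hypothetical.

```python
import numpy as np

def precision_at(x, B, omega_diag):
    """Hypothetical helper: Omega(x) with off-diagonal entries sum_s x_s * B[s]
    and a diagonal omega_diag that does not depend on x."""
    om = np.tensordot(x, B, axes=1)   # sum_s x_s * B[s], shape (p, p)
    np.fill_diagonal(om, omega_diag)  # covariate-independent diagonal
    return om

def sym(p, i, j, value):
    """p x p symmetric matrix with a single off-diagonal pair set to value."""
    m = np.zeros((p, p))
    m[i, j] = m[j, i] = value
    return m

p = 3
omega_diag = np.full(p, 2.0)

# Case 1: X holds only a constant, so every subject shares a single graph.
B_single = np.stack([sym(p, 0, 1, -0.7)])
omega_single = precision_at(np.array([1.0]), B_single, omega_diag)

# Case 2: one-hot group indicators X = (1[group A], 1[group B]);
# each indicator's coefficient matrix holds that group's own edges.
B_groups = np.stack([sym(p, 0, 1, -0.7), sym(p, 1, 2, -0.5)])
omega_A = precision_at(np.array([1.0, 0.0]), B_groups, omega_diag)
omega_B = precision_at(np.array([0.0, 1.0]), B_groups, omega_diag)
print(omega_A[0, 1], omega_A[1, 2])
print(omega_B[0, 1], omega_B[1, 2])
```

In the one-hot case, estimating the coefficient matrices is the same as estimating one graph per group, which is the multiple-graph problem named in case 2.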

2.2. Regression model for undirected graphs

In this section, we introduce a sparse regression model to perform edge regression for undirected graphical models. From now on, we assume μ = 0 for simplicity. Denote the partial correlation between random variables $Y_i$ and $Y_j$ by $\rho_{ij}$ ($1 \le i \ne j \le p$), where $\rho_{ij} = -\omega_{ij}/\sqrt{\omega_{ii}\omega_{jj}}$. Hence, by the covariance selection rule, $(i, j) \in E$ if and only if $\rho_{ij} \ne 0$. A well-known lemma states that when $y_i$ ($1 \le i \le p$) is expressed in the linear regression form $y_i = \sum_{j \ne i} \theta_{ij} y_j + \epsilon_i$, then $\theta_{ij} = -\omega_{ij}/\omega_{ii}$ and $\rho_{ij} = \operatorname{sign}(\theta_{ij})\sqrt{\theta_{ij}\theta_{ji}}$ (Peng et al. 2012). We can extend this lemma to edge regression by including the extraneous covariates X in the regression, as stated formally in the following lemma.

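Lemma 2. For $1 \le i \le p$, consider predicting $y_i$ from the other variables $y_{-i}$, given extraneous covariates $X = x$, with the varying-coefficient model $y_i = \sum_{j \ne i} \theta_{ij}(x) y_j + \epsilon_i$. Then $\epsilon_i$ is uncorrelated with $y_{-i}$ given $X = x$ if and only if the optimal prediction rule gives

$$\theta_{ij}(x) = -\frac{\omega_{ij}(x)}{\omega_{ii}(x)} = \rho_{ij}(x)\sqrt{\frac{\omega_{jj}(x)}{\omega_{ii}(x)}},$$

where $\omega_{ij}(x)$ and $\omega_{ii}(x)$ are the off-diagonal and diagonal elements of Ω(x), respectively. Hence, $\rho_{ij}(x) = \operatorname{sign}(\theta_{ij}(x))\sqrt{\theta_{ij}(x)\theta_{ji}(x)}$. Additionally, $\operatorname{var}(\epsilon_i) = 1/\omega_{ii}(x)$ and $\operatorname{cov}(\epsilon_i, \epsilon_j) = \omega_{ij}(x)/(\omega_{ii}(x)\omega_{jj}(x))$. The function $\theta_{ij}(\cdot)$ is a conditional precision function (CPF) that defines $\rho_{ij}$ through X.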
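Lemma 2 can be checked numerically at a fixed covariate value x, where Ω(x) is an ordinary precision matrix. The sketch below uses an arbitrary positive-definite 3 × 3 matrix (an assumption for illustration) and verifies that the population least-squares coefficients of $y_i$ on $y_{-i}$ equal $\theta_{ij} = -\omega_{ij}/\omega_{ii}$, that the residual variance is $1/\omega_{ii}$, that the residual covariance is $\omega_{ij}/(\omega_{ii}\omega_{jj})$, and that $\operatorname{sign}(\theta_{ij})\sqrt{\theta_{ij}\theta_{ji}}$ recovers $\rho_{ij}$.

```python
import numpy as np

# Arbitrary positive-definite precision matrix Omega(x) at one fixed covariate value x.
omega = np.array([[2.0, -0.6, 0.3],
                  [-0.6, 1.5, -0.4],
                  [0.3, -0.4, 1.0]])
sigma = np.linalg.inv(omega)

d = np.sqrt(np.diag(omega))
rho = -omega / np.outer(d, d)               # partial correlations (off-diagonal entries)
theta = -omega / np.diag(omega)[:, None]    # theta[i, j] = -omega_ij / omega_ii

def regression_coefs(sigma, i):
    """Population least-squares coefficients and residual variance of y_i on y_{-i}."""
    rest = [j for j in range(sigma.shape[0]) if j != i]
    coefs = np.linalg.solve(sigma[np.ix_(rest, rest)], sigma[rest, i])
    resid_var = sigma[i, i] - sigma[i, rest] @ coefs
    return rest, coefs, resid_var

for i in range(omega.shape[0]):
    rest, coefs, resid_var = regression_coefs(sigma, i)
    assert np.allclose(coefs, theta[i, rest])           # theta_ij = -omega_ij / omega_ii
    assert np.isclose(resid_var, 1.0 / omega[i, i])     # var(eps_i) = 1 / omega_ii

# Residual vector eps = R @ y with R = I - Theta (Theta has zero diagonal).
theta_off = theta - np.diag(np.diag(theta))
R = np.eye(3) - theta_off
eps_cov = R @ sigma @ R.T                   # should equal omega_ij / (omega_ii * omega_jj)

i, j = 0, 1
rho_from_theta = np.sign(theta[i, j]) * np.sqrt(theta[i, j] * theta[j, i])
print(np.isclose(rho_from_theta, rho[i, j]))  # True
```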

Lemma 2 is also self-evident when the partial correlation is calculated given X. By Lemma 2, X changes both the partial correlation $\rho_{ij}$ and the regression coefficient of $y_i$ on $y_j$ through the function $\theta_{ij}(\cdot)$. We call $\theta_{ij}(\cdot)$ a CPF because it defines the relationship between the partial correlation and the extraneous covariates through $\omega_{ij}(\cdot)$ and $\omega_{ii}(\cdot)$ from the precision matrix (i.e., the inverse covariance matrix). In this sense, $\theta_{ij}(\cdot)$ can be estimated to characterize the conditional dependency structure of a subject-level graph given X. Under this setting, the covariance selection problem for a subject-level graph becomes a feature selection problem for regression with varying coefficients; that is, the sparsity structure of an undirected graph can be learned through sparse regression. Under the assumption of sparse edges, in a Bayesian regression framework we can apply variable selection or nonlinear shrinkage to $\theta_{ij}(x)$ and threshold the posterior probabilities (posterior thresholding) to establish the nonzero entries of the graph and their magnitudes (Morris et al. 2008). The diagonal element $\omega_{ii}$ is determined by the multiple correlation $R_{i,-i}$ of $Y_i$ with the remaining variables and by the node variance $\sigma_i^2$. Similarly to the assumption of homoscedastic node-level variances made for graphical regression in the DAG setting (Ni et al. 2018), we assume that $R_{i,-i}$ and $\sigma_i^2$ do not change with X, and we model $\omega_{ii}$ as constant for simplicity and parsimony. Since our primary interest is in the edge structure, this assumption lets the CPF vary with X only through $\omega_{ij}$, which simplifies modeling and speeds up computation. Thus, learning how covariates affect edge selection is equivalent to learning how the sparsity structure of the off-diagonal elements of the precision matrix varies with X.

2.3. Parameterization of the conditional precision function

In Lemma 2, we defined the conditional precision function. Suppose we have a set of extraneous covariates X, which can be continuous or discrete. Under our assumptions, $\omega_{ii}(x) = \omega_{ii}$. With $\theta_{ij}(x) = -\omega_{ij}(x)/\omega_{ii}$, the functional form of $\theta_{ij}(\cdot)$ is determined by $\omega_{ij}(\cdot)$, which is constrained to equal $\omega_{ji}(\cdot)$ in the precision matrix. Using a linear function to model the relationship between the partial correlations and the extraneous covariates, we parameterize the dependence of $\omega_{ij}(\cdot)$ on X as

$$\omega_{ij}(X) = \sum_{s=1}^{q} \beta_s^{ij} X_s, \qquad (1)$$

where $\beta_s^{ij}$ is the effect of covariate $X_s$ (continuous or discrete) on edge (i, j). In the regression for each edge (i, j), the conditional precision function can be considered a type of varying coefficient model (Hastie & Tibshirani 1993).
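For intuition about Equation (1), the sketch below assumes a hypothetical design X = (1, π) consisting of an intercept and tumor purity π, as in the tumor purity example of Section 1. With $\omega_{ii}$ held constant, each partial correlation $\rho_{ij}(x) = -\omega_{ij}(x)/\sqrt{\omega_{ii}\omega_{jj}}$ is linear in π: one edge keeps the same strength at every purity level, while another is absent at π = 0 (pure normal) and strongest at π = 1 (pure tumor). The coefficient values are invented for illustration, and the eigenvalue check only confirms that this particular Ω(x) is positive definite on the grid.

```python
import numpy as np

# Hypothetical 3-node example with design X = (1, pi): X_1 is an intercept,
# X_2 is tumor purity pi in [0, 1]. beta[s] holds beta_s^{ij} as a symmetric matrix.
p = 3
omega_diag = np.ones(p)                      # omega_ii, constant in x
beta = np.zeros((2, p, p))
beta[0, 0, 1] = beta[0, 1, 0] = -0.3         # edge (0, 1): same strength for all pi
beta[1, 1, 2] = beta[1, 2, 1] = -0.5         # edge (1, 2): absent at pi = 0, grows with pi

def precision(pi):
    """Omega(x) from Eq. (1): omega_ij(x) = sum_s beta_s^{ij} x_s."""
    om = np.tensordot(np.array([1.0, pi]), beta, axes=1)
    np.fill_diagonal(om, omega_diag)
    return om

def partial_corr(om):
    """rho_ij = -omega_ij / sqrt(omega_ii * omega_jj)."""
    d = np.sqrt(np.diag(om))
    return -om / np.outer(d, d)

for pi in (0.0, 0.5, 1.0):
    om = precision(pi)
    # Sanity check for this illustrative example: Omega(x) must be positive definite.
    assert np.all(np.linalg.eigvalsh(om) > 0)
    r = partial_corr(om)
    print(f"pi={pi:.1f}  rho_01={r[0, 1]:.2f}  rho_12={r[1, 2]:.2f}")
```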

Joint regression models:

By regressing $Y_i$ on $Y_{-i}$ given X, we can write the regression model as

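$$Y_i = \sum_{j \ne i} \theta_{ij}(X) Y_j + \epsilon_i, \quad i = 1, \ldots, p; \qquad \theta_{ij}(X) = -\frac{\omega_{ij}(X)}{\omega_{ii}}; \qquad \epsilon_i \sim N\left(0, \frac{1}{\omega_{ii}}\right). \qquad (2)$$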

As mentioned previously, only $\omega_{ij}$ is assumed to vary across X, so that we can focus on learning the dynamic sparsity structure of the off-diagonal elements and reduce computational complexity; hence $\theta_{ij}$ and $\epsilon_i$ share the same scaling parameter $\omega_{ii}$. Since $\omega_{ij}(\cdot)$ is the off-diagonal element of the precision matrix corresponding to vertices i and j, we have $\omega_{ji}(\cdot) = \omega_{ij}(\cdot)$. We therefore constrain these two functions to be identical in the sampling scheme when we jointly perform these regressions. Consequently, $\beta_s^{ij} = \beta_s^{ji}$ for every $i \ne j$. We rewrite the full conditional distribution of $Y_i$ as

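$$Y_i \mid Y_{-i}, \{\beta_s^{i,-i}\}_{s=1}^{q}, \omega_{ii}, \{X_s\}_{s=1}^{q} \sim N\left(-\sum_{j \ne i}^{p} \sum_{s=1}^{q} \frac{\beta_s^{ij} X_s y_j}{\omega_{ii}}, \; \frac{1}{\omega_{ii}}\right). \qquad (3)$$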
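The sketch below evaluates the conditional mean in Equation (3) for every node using one set of symmetric coefficients, so that the regressions for $Y_i$ and $Y_j$ share $\beta_s^{ij} = \beta_s^{ji}$. As a cross-check, it compares each mean with the usual Gaussian conditional mean $\Sigma_{i,-i}\Sigma_{-i,-i}^{-1}y_{-i}$ computed from $\Sigma(x) = \Omega(x)^{-1}$. The covariates, coefficients, and data values are hypothetical and chosen so that Ω(x) is strictly diagonally dominant, hence positive definite.

```python
import numpy as np

p = 4
omega_diag = np.array([2.0, 2.5, 3.0, 2.0])   # omega_ii, shared by theta_ij and eps_i

def sym(i, j, value):
    """p x p matrix with beta_s^{ij} = beta_s^{ji} = value and zeros elsewhere."""
    m = np.zeros((p, p))
    m[i, j] = m[j, i] = value
    return m

# Hypothetical coefficients for X = (intercept, purity); each beta[s] is symmetric
# with zero diagonal, so the regressions for Y_i and Y_j share beta_s^{ij}.
beta = np.stack([sym(0, 1, -0.4) + sym(1, 2, -0.3) + sym(2, 3, 0.2),
                 sym(0, 2, -0.5) + sym(1, 3, 0.3)])
x = np.array([1.0, 0.4])
y = np.array([0.5, -1.0, 0.3, 1.2])

def conditional_mean(i, y, x):
    """Mean in Eq. (3): -sum_{j != i} sum_s beta_s^{ij} x_s y_j / omega_ii."""
    w = np.tensordot(x, beta[:, i, :], axes=1)    # omega_ij(x) for all j; w[i] = 0
    return -(w @ y) / omega_diag[i]

# Cross-check against Gaussian conditioning under Omega(x).
omega_x = np.tensordot(x, beta, axes=1) + np.diag(omega_diag)
sigma_x = np.linalg.inv(omega_x)
for i in range(p):
    rest = [j for j in range(p) if j != i]
    gauss_mean = sigma_x[i, rest] @ np.linalg.solve(sigma_x[np.ix_(rest, rest)], y[rest])
    assert np.isclose(conditional_mean(i, y, x), gauss_mean)
```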

2.4. Bayesian adaptive shrinkage

It has been widely observed that genomic and proteomic graphs tend to be sparse, and, as discussed above, sparsity of a graph corresponds to sparsity in the precision matrix given the extraneous covariates. We induce sparsity in the subject-specific precision matrix using a Bayesian approach with shrinkage priors on the coefficients corresponding to the off-diagonal precision matrix elements.

The spike-and-slab prior (Mitchell & Beauchamp 1988), a mixture of a point mass at 0 and a continuous slab, is a popular Bayesian sparsity prior. It yields exact zero estimates for some variables, providing automatic edge selection in our graphical setting, and it has some desirable theoretical properties (Johnstone et al. 2004, Scott et al. 2010, Narisetty et al. 2014, Castillo et al. 2012). However, in high-dimensional settings involving a large number of variables or, as in our setting, a large number of potential graph edges, this prior can pose computational problems because the underlying state space to be searched is enormous. An alternative is to use global-local priors (Polson & Scott 2010), which are scale mixtures of normals. These priors are absolutely continuous, which makes them computationally easy to work with even in high-dimensional settings, and their shape induces a type of nonlinear shrinkage in which small-magnitude coefficients are shrunk strongly towards zero while large-magnitude coefficients are left largely unaffected. As described in Section 2.5, this nonlinear shrinkage effectively induces a type of sparsity in the graph edges, and posterior selection rules can be used to set edges exactly to zero in the estimated graph structure for specific covariate levels (Polson & Scott 2010).

There are many potential global-local prior choices, including the Bayesian Lasso (Park & Casella 2008), Horseshoe (Carvalho et al. 2010), Dirichlet Laplace (Bhattacharya et al. 2015), Normal-Exponential-Gamma (Griffin & Brown 2011), and Normal-Gamma priors (Griffin et al. 2010). Here, we use the Normal-Gamma prior, which has been shown to have strong sparsity properties (Griffin et al. 2010). This distribution is indexed by two parameters that together provide useful flexibility in capturing varying degrees of sparsity and heavy tails in the distribution of coefficients. Furthermore, efficient Gibbs sampling schemes are available for the Normal-Gamma prior (Griffin et al. 2010). Specifically, assuming $\omega_{ij}(X) = \sum_{s=1}^{q} \beta_s^{ij} X_s$, we place the following Normal-Gamma prior on the coefficients $\beta_s^{ij}$:

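$$\beta_s^{ij} \sim N(0, \psi_s^{ij}); \qquad \psi_s^{ij} \sim \operatorname{Gamma}\left(\lambda_s, \frac{1}{2\gamma^2}\right). \qquad (4)$$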

The CPF $\theta_{ij}(x) = -\omega_{ij}(x)/\omega_{ii} = -\sum_{s=1}^{q} \beta_s^{ij} X_s/\omega_{ii} = -\sum_{s=1}^{q} (\beta_s^{ij}/\omega_{ii}) X_s$ is still a linear function of X. For each $\beta_s^{ij}$ in the edge regression, the latent scale parameter $\psi_s^{ij}$ serves as an adaptive shrinkage parameter across both edges and covariates. We allow the shape parameter $\lambda_s$ to vary across covariates while borrowing strength across edges, and we set the scale parameter γ to be common across covariates and edges. This hierarchical structure is designed to be flexible while borrowing strength across edges within covariates, and then across covariates. For $\omega_{ii}$, which controls the variance parameter in the neighborhood selection model, we choose the vague prior $p(\omega_{ii}) \propto 1$, as in Griffin et al. (2010). If $\omega_{ii}$ is instead given a conjugate $\operatorname{Gamma}(a^*, b^*)$ prior, the full conditional distribution of $\omega_{ii}$ keeps the same form, so our sampling scheme can still be implemented with a Gibbs step. A graphical representation of this hierarchical formulation is shown in Figure 1.
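To see how the shape parameter controls sparsity, the sketch below draws from the Normal-Gamma prior in Equation (4). It assumes that the second argument of the Gamma distribution is a rate, so that $E[\psi_s^{ij}] = 2\lambda_s\gamma^2$ and the prior variance of $\beta_s^{ij}$ is $2\lambda_s\gamma^2$; the parameter values are illustrative. Holding that variance at 1, a small shape (λ = 0.1) puts far more mass near zero than λ = 1, which is the Laplace (Bayesian Lasso-type) special case.

```python
import numpy as np

def sample_normal_gamma(lam, gamma2, size, rng):
    """Draw beta ~ N(0, psi), psi ~ Gamma(shape=lam, rate=1/(2*gamma2)).
    NumPy's gamma uses a scale parameter, so scale = 1/rate = 2*gamma2."""
    psi = rng.gamma(shape=lam, scale=2.0 * gamma2, size=size)
    return rng.normal(0.0, np.sqrt(psi))

rng = np.random.default_rng(1)
n = 200_000
# Both settings share the prior variance 2 * lam * gamma2 = 1.
heavy = sample_normal_gamma(lam=0.1, gamma2=5.0, size=n, rng=rng)
lasso = sample_normal_gamma(lam=1.0, gamma2=0.5, size=n, rng=rng)

for name, b in (("lam=0.1", heavy), ("lam=1.0", lasso)):
    print(name, "var=%.2f" % b.var(), "P(|beta|<0.1)=%.2f" % np.mean(np.abs(b) < 0.1))
```

Because both priors have the same variance, the difference in mass near zero reflects the shape parameter alone, which is the flexibility that lets $\lambda_s$ adapt the degree of sparsity for each covariate.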

Figure 1:

A graphical representation of edge regression with normal-gamma prior. Single arrows are probabilistic edges; double arrows are deterministic edges; squares are observed data; circles are random variables. The total number of instances of each variable that is enclosed in the same plate is given by the constant in the corner of that plate. ρij is the partial correlation for edge (i, j).

Sampling scheme:

We adapt the scheme of Griffin et al. (2010) to sample $\lambda_s$ and γ simultaneously by specifying exponential and inverse-gamma hyperpriors. Enabled by this hierarchical specification of the Normal-Gamma prior, we implement a Metropolis-within-Gibbs sampler that updates each parameter in turn. The Gibbs steps involve a multivariate Gaussian for $\beta^{ij}$, generalized inverse Gaussians for $\omega_{ii}$ and $\psi_s^{ij}$, and a Gamma distribution for the scale parameter $\gamma^{-2}$, while a Metropolis-Hastings step is used to update the shape parameter $\lambda_s$. After sampling the parameters of the edge regression model, we obtain posterior samples of the subject-specific precision matrices for the subjects of interest in the dataset, and we can also produce posterior precision matrices for hypothetical subjects with any specified covariate levels x. The steps of the sampler are summarized in Algorithm 1 (Supplementary Section B), and the corresponding computational details are given in Supplementary Sections B and C. Recall that the CPF $\theta_{ij}(x)$ defines the edge strength $\rho_{ij}(x)$ through X. When the edge strength of the subject-level graph varies across x, the adaptive shrinkage priors on each $\beta_s^{ij}$ induce different degrees of shrinkage on $\theta_{ij}(x)$ across x (Equation (1)). Note that, conditional on $\psi_s^{ij}$, the Normal-Gamma prior is a ridge-type prior $N(0, \psi_s^{ij})$ on each $\beta_s^{ij}$, so with a linear CPF the prior induced on $\theta_{ij}(X = x)$ is still a ridge-type prior whose variance is controlled by x. The intercept term is also given a sparsity prior, inducing sparsity across the edges overall. This implies that, a priori, we expect most edges not to vary strongly with any given covariate, with only a subset of edges doing so. Given the shrinkage induced on the edges across subjects, we propose a Bayesian FDR control procedure to select edges for each subject-level graph, which ultimately induces sparsity in each subject-level graph.
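The two Gibbs steps for β and ψ can be illustrated on a much simpler problem. The sketch below is not Algorithm 1: it fits a single linear regression $y = Z\beta + \epsilon$ with a Normal-Gamma prior, holds λ, γ² and the noise precision fixed, and omits the Metropolis-Hastings step for $\lambda_s$, the Gamma step for $\gamma^{-2}$, the $\omega_{ii}$ update, and the symmetric node-wise regressions. Assuming the rate parameterization of Equation (4), the full conditional of each $\psi_k$ is generalized inverse Gaussian with density proportional to $\psi^{\lambda - 3/2}\exp\{-(\psi/\gamma^2 + \beta_k^2/\psi)/2\}$, which the code samples with `scipy.stats.geninvgauss` after rescaling. The simulated data are hypothetical.

```python
import numpy as np
from scipy.stats import geninvgauss

def gibbs_normal_gamma(Z, y, lam=1.0, gamma2=1.0, omega=1.0,
                       n_iter=1500, burn=500, seed=0):
    """Sketch: Gibbs sampler for y = Z @ beta + eps, eps ~ N(0, 1/omega), with
    beta_k ~ N(0, psi_k) and psi_k ~ Gamma(shape=lam, rate=1/(2*gamma2)).
    lam, gamma2 and omega are held fixed here for simplicity."""
    rng = np.random.default_rng(seed)
    k = Z.shape[1]
    ZtZ, Zty = Z.T @ Z, Z.T @ y
    psi = np.ones(k)
    draws = []
    for it in range(n_iter):
        # beta | psi, y ~ N(A^{-1} omega Z'y, A^{-1}) with A = omega Z'Z + diag(1/psi).
        A = omega * ZtZ + np.diag(1.0 / psi)
        chol = np.linalg.cholesky(A)                       # A = chol @ chol.T
        mean = np.linalg.solve(A, omega * Zty)
        beta = mean + np.linalg.solve(chol.T, rng.standard_normal(k))
        # psi_k | beta_k ~ GIG(p = lam - 1/2, a = 1/gamma2, b = beta_k^2).
        # scipy's geninvgauss(p, c) has density ~ x^(p-1) exp(-c (x + 1/x) / 2);
        # scaling by sqrt(b/a) with c = sqrt(a*b) gives the (a, b) form above.
        a = 1.0 / gamma2
        b = np.maximum(beta ** 2, 1e-12)                   # guard against b = 0
        for j in range(k):
            psi[j] = geninvgauss.rvs(lam - 0.5, np.sqrt(a * b[j]),
                                     scale=np.sqrt(b[j] / a), random_state=rng)
        if it >= burn:
            draws.append(beta.copy())
    return np.array(draws)

rng = np.random.default_rng(42)
Z = rng.standard_normal((200, 5))
beta_true = np.array([2.0, 0.0, 0.0, -1.5, 0.0])
y = Z @ beta_true + rng.standard_normal(200)
draws = gibbs_normal_gamma(Z, y)
print(np.round(draws.mean(axis=0), 2))
```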

2.5. Posterior inference and thresholding

We perform edge selection to estimate covariate-specific graphs by thresholding the posterior probability of inclusion (PPI) of each edge, based on the MCMC samples. For a given set of covariate levels x, we have L posterior samples of the partial correlations derived from the precision matrix elements, $\rho_x^{ij,l}$, $l = 1, \ldots, L$, after burn-in and thinning. Recall that our Normal-Gamma prior does not yield $\rho_x^{ij,l} = 0$ exactly, but it does nonlinearly shrink ρ towards zero, so that $\rho_{ij} \approx 0$ for a large number of (i, j) and $\rho_{ij}$ is large in magnitude for a relatively small number of (i, j). If we choose a minimum magnitude of interest κ below which we consider the conditional dependence negligible, so that $(i, j) \in E_{x,\kappa}$ if $|\rho_x^{ij}| > \kappa$ (Hoti & Sillanpää 2006), we can estimate the marginal PPI, $P_{x,\kappa}^{ij} = \Pr\{(i, j) \in E_{x,\kappa} \mid Y, x\}$, by $\frac{1}{L}\sum_{l=1}^{L} I(|\rho_x^{ij,l}| > \kappa)$. The quantity $q_{x,\kappa}^{ij} = 1 - P_{x,\kappa}^{ij}$ can be considered an estimate of the Bayesian local FDR for selecting edge (i, j) in the graph under covariate levels x. For a given global Bayesian FDR level δ, we flag as present in the inferred graph any edge for which $P_{x,\kappa}^{ij} \ge 1 - \phi_{x,\delta}$, where the threshold $\phi_{x,\delta}$ is chosen as follows. First, sort $\{q_{x,\kappa}^{ij}\}$ in ascending order to obtain $\{q_{x,\kappa}^{(t)}, t = 1, \ldots, p(p-1)/2\}$. Second, for the given δ, find the largest $t^*$ such that $\frac{1}{t^*}\sum_{t=1}^{t^*} q_{x,\kappa}^{(t)} \le \delta$. Third, set $\phi_{x,\delta} = q_{x,\kappa}^{(t^*)}$ and select the edges with $q_{x,\kappa}^{ij} \le \phi_{x,\delta}$. This choice implies that we expect at most 100δ% of the edges in the estimated edge set $\hat{E}_{x,\kappa}$ to be false positives, as defined above.
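The thresholding rule above translates directly into a few lines of NumPy. In this sketch, `select_edges` (a hypothetical name) takes an array of posterior draws of the partial correlations at one covariate level x, with one column per candidate edge, and returns the indices of the selected edges. The toy draws are constructed so that the posterior inclusion probabilities are exactly 0.99, 0.95, 0.90, 0.50, and 0.10.

```python
import numpy as np

def select_edges(rho_samples, kappa, delta):
    """Bayesian FDR edge selection at a fixed covariate level x.

    rho_samples: array of shape (L, E) with posterior draws of the partial
    correlations for E candidate edges. Returns indices of selected edges."""
    ppi = np.mean(np.abs(rho_samples) > kappa, axis=0)   # P_{x,kappa}^{ij}
    q = 1.0 - ppi                                         # local FDR q_{x,kappa}^{ij}
    q_sorted = np.sort(q)                                 # ascending order
    running_fdr = np.cumsum(q_sorted) / np.arange(1, q.size + 1)
    ok = np.nonzero(running_fdr <= delta)[0]
    if ok.size == 0:
        return np.array([], dtype=int)                    # no edge meets the FDR level
    phi = q_sorted[ok[-1]]                                # phi_{x,delta} = q^{(t*)}
    return np.nonzero(q <= phi)[0]

L = 100
ppi_true = [0.99, 0.95, 0.90, 0.50, 0.10]
samples = np.zeros((L, len(ppi_true)))
for e, prob in enumerate(ppi_true):
    samples[: int(round(prob * L)), e] = 0.5   # |rho| = 0.5 > kappa in these draws

print(select_edges(samples, kappa=0.1, delta=0.05))  # [0 1]
print(select_edges(samples, kappa=0.1, delta=0.10))  # [0 1 2]
```

With δ = 0.05, the running mean of the sorted local FDRs first exceeds δ at the third edge (0.16/3 ≈ 0.053), so two edges are selected; with δ = 0.10, three are.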

Selection of δ and κ:

To apply this selection rule, δ and κ must be chosen. δ should correspond to the desired expected FDR, and κ to the minimal partial correlation below which we consider the association practically negligible in the context of the given application. In practice, it is a good idea to assess the sensitivity of the results to the choice of κ and δ, and to prioritize edges that persist even with smaller δ and larger κ for any subsequent follow-up. In simulation studies, the average area under the curve (bAUC) of the receiver operating characteristic (ROC) curves (McGuffey et al. 2018) can be used to assess model performance over the entire range of possible choices for δ and κ.

3. Simulations

In this section, we present simulation studies, designed to mimic the setting of tumor heterogeneity discussed in the introduction, that investigate the performance of our Bayesian edge regression. As previously stated, most researchers interested in contrasting tumor and normal networks would not account for tumor purity, but instead would estimate normal and tumor graphs from the respective samples using either independent or group graphical models. Thus, we compare our Bayesian edge regression method with three commonly used approaches for estimating multiple graphical models, namely the fused graphical lasso and the group graphical lasso (Danaher et al. 2014) and the Laplacian shrinkage for inverse covariance matrices from heterogeneous populations (LASICH) (Saegusa & Shojaie 2016), as well as with an approach for fitting covariate-dependent graphical models, the nonparametric finite mixture of Gaussian graphical models (NFMGGM) (Lee & Xue 2018). Additionally, we include a comparison with Bayesian inference of multiple Gaussian graphical models (BIMGGM) (Peterson et al. 2015), a Bayesian approach to inference on group graphs. We further run the proposed Bayesian edge regression model with binary covariates mimicking the group definition used by the other group models; we refer to this as Bayesian edge regression (group case) in the following discussion. For each simulation, we run 20,000 MCMC iterations, discard the first 10,000 as a burn-in period, and thin the chain by keeping every 10th sample.

Data generation for simulation.

To test the ability of our Bayesian edge regression method to account for tumor purity in network estimation, we simulate data that mimic the real-life setting of normal contamination in tumor samples. We construct precision matrices in a setting similar to that of Peterson et al. (2015), with 20 nodes representing 20 genes, which yields a suitable degree of sparsity, with roughly 10% to 20% of the possible edges present in each generated precision matrix. From here on, we use the more general term “normal” to represent the stroma component. Let $Y_n$ be the log2-transformed expressions from the clinically derived tumor sample $n$. According to Ahn et al. (2013) and Wang et al. (2018), gene expression data before log2-transformation are well modeled as a linear mixture of the expressions from the normal and tumor components. It follows that

$2^{Y_n} = (1-\pi_n)\,2^{N_n} + \pi_n\,2^{T_n}$, (5)

where, in a multivariate extension that captures gene interactions, the log2-transformed expressions from the normal component follow $N_n \sim \mathcal{N}(\mu_N, \Omega_N^{-1})$ and those from the tumor component follow $T_n \sim \mathcal{N}(\mu_T, \Omega_T^{-1})$, with the exponentiation in (5) applied elementwise. Here $\pi_n \in [0, 1]$ is the proportion of the tumor component before log2-transformation, i.e., the measured tumor purity of sample $n$. Following Equation (5), we generate $Y_n$ for each sample $n$ from the simulated expressions $N_n$ and $T_n$. For simplicity, we set $\mu_N = 0$ and $\mu_T = 0$ in our simulation. We also generate $N'_n \sim \mathcal{N}(0, \Omega_N^{-1})$ as a reference group for the normal component, for which $\pi_n = 0$. We provide two simulations with different precision matrix set-ups.
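Equation (5) can be simulated directly: draw the normal and tumor components on the log2 scale, mix them on the linear scale with weight $\pi_n$, and take log2 again. The helper below is a hypothetical sketch (`simulate_mixed_samples` is not from the paper's code) that uses zero means, as in the simulations; a reference normal sample $N'_n$ corresponds to $\pi_n = 0$.

```python
import numpy as np

def simulate_mixed_samples(omega_normal, omega_tumor, purity, rng):
    """Draw log2-scale mixed samples following Equation (5) with zero means.

    Row n mixes N_n ~ N(0, Omega_N^{-1}) and T_n ~ N(0, Omega_T^{-1}) on the
    linear (2^x) scale with tumor proportion purity[n], then returns log2.
    """
    p = omega_normal.shape[0]
    n = purity.size
    normal = rng.multivariate_normal(np.zeros(p), np.linalg.inv(omega_normal), size=n)
    tumor = rng.multivariate_normal(np.zeros(p), np.linalg.inv(omega_tumor), size=n)
    w = purity[:, None]                      # broadcast purity across genes
    return np.log2((1.0 - w) * 2.0 ** normal + w * 2.0 ** tumor)
```

For example, given precision matrices `omega_n` and `omega_t`, calling `simulate_mixed_samples(omega_n, omega_t, np.linspace(0.01, 0.99, 150), rng)` produces a tumor sample of size 150 with purities on the arithmetic sequence used in Simulation 1.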

Simulation 1. Without overlap in tumor and normal graphs. In $\Omega_T$, the off-diagonal elements $\omega^T_{i,i+2} = \omega^T_{i+2,i}$ are sampled uniformly from $[-0.5, -0.3] \cup [0.3, 0.5]$ for $i = 1, \ldots, 18$. In $\Omega_N$, the off-diagonal elements $\omega^N_{i,i+1} = \omega^N_{i+1,i}$ are sampled uniformly from $[-0.5, -0.3] \cup [0.3, 0.5]$ for $i = 1, \ldots, 19$. For both $\Omega_T$ and $\Omega_N$, all diagonal elements are one and all other elements are zero, so $\Omega_T$ and $\Omega_N$ are truly sparse, with just 18 and 19 edges, respectively, and by construction they share no edges. We simulate reference normal samples of size $N_N = 50$ and mixed tumor samples of size $N_Y = 150$, with $\{\pi_n\}_{n=1}^{150}$ forming an arithmetic sequence from 0.01 to 0.99. The purity $\pi_n$ is the extraneous covariate in our edge regression model and is treated as fixed in the following experiments. We randomly generate 100 datasets for this simulation.
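The Simulation 1 precision matrices are banded, so a short helper suffices. In this sketch, a value uniform on $[-0.5,-0.3] \cup [0.3,0.5]$ is drawn as a uniform magnitude on $[0.3, 0.5)$ with a random sign, which gives the same distribution; `banded_precision` is a hypothetical name, and indices are zero-based, so offset 2 gives the 18 tumor edges and offset 1 the 19 normal edges for $p = 20$.

```python
import numpy as np

def banded_precision(p, offset, rng, low=0.3, high=0.5):
    """Unit-diagonal precision matrix with nonzeros only at distance `offset`.

    Each off-diagonal value is uniform on [-high, -low] U [low, high]:
    a uniform magnitude combined with a random sign.
    """
    omega = np.eye(p)
    for i in range(p - offset):
        value = rng.uniform(low, high) * rng.choice([-1.0, 1.0])
        omega[i, i + offset] = omega[i + offset, i] = value
    return omega

rng = np.random.default_rng(2)
omega_tumor = banded_precision(20, 2, rng)    # 18 edges (i, i+2)
omega_normal = banded_precision(20, 1, rng)   # 19 edges (i, i+1)
purity = np.linspace(0.01, 0.99, 150)         # arithmetic sequence of pi_n
```

Because each nonzero has magnitude at most 0.5 and each matrix decomposes into tridiagonal chains, both matrices are positive definite.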

Simulation 2. With overlap in tumor and normal graphs. In $\Omega_T$, the off-diagonal elements are $\omega^T_{i,i+1} = \omega^T_{i+1,i} = 0.5$ for $i = 1, \ldots, 19$ and $\omega^T_{i,i+2} = \omega^T_{i+2,i} = 0.4$ for $i = 1, \ldots, 18$; all diagonal elements are one and all other elements are zero. To obtain $\Omega_N$, we remove 30 randomly chosen edges from $\Omega_T$ by setting these 30 nonzero elements to zero, and we add 30 random edges by replacing 30 zero elements with values sampled uniformly from $[-0.6, -0.4] \cup [0.4, 0.6]$. To make $\Omega_N$ positive definite, following Danaher et al. (2014), we divide each off-diagonal element by 1.5 times the sum of the absolute values of all off-diagonal elements in its row, and then average the transformed matrix with its transpose to make it symmetric. Although this procedure is designed to yield a positive definite matrix, it can weaken the signals in $\Omega_N$, which makes estimation even more difficult. Thus $\Omega_T$ and $\Omega_N$ share seven edges, and $\Omega_N$ has relatively weak edge strengths. We simulate reference normal samples of size $N_N = 100$ and mixed tumor samples of size $N_Y = 200$, and generate $\pi_n$ as in Simulation 1. We randomly generate 100 datasets for this simulation. Supplementary Section D provides two further simulations in which $\Omega_N$ and $\Omega_T$ are scale-free and nearest-neighbor networks (Guo, Levina, Michailidis & Zhu 2011). The graph structures for all simulation settings are shown in Supplementary Figure 1.
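The row-rescaling step used for $\Omega_N$ can be written as below. This sketch assumes a unit diagonal, as in the simulations, and both function names are hypothetical. Rescaling makes every row of the unsymmetrized matrix strictly diagonally dominant, but averaging with the transpose can break that property when one node has many neighbors that each have few connections, so the sketch also includes a Cholesky-based check.

```python
import numpy as np

def rescale_off_diagonal(omega, factor=1.5):
    """Danaher et al. (2014)-style rescaling, assuming a unit diagonal.

    Each off-diagonal entry in row i is divided by `factor` times the sum of
    the absolute off-diagonal entries in row i; the result is then averaged
    with its transpose so that it is symmetric.
    """
    p = omega.shape[0]
    off = omega - np.diag(np.diag(omega))
    row_sums = np.abs(off).sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0           # rows without edges stay zero
    scaled = off / (factor * row_sums)
    return np.eye(p) + (scaled + scaled.T) / 2.0

def is_positive_definite(mat):
    """True if a Cholesky factorization exists (mat assumed symmetric)."""
    try:
        np.linalg.cholesky(mat)
        return True
    except np.linalg.LinAlgError:
        return False
```

For instance, a 10-node star whose center is linked with weight 0.5 to nine otherwise isolated nodes is not positive definite after this rescaling (its smallest eigenvalue is about −0.11), whereas the banded $\Omega_T$ of Simulation 2 passes the check.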

We compare the results of our edge regression with those of the fused and group graphical lassos¹, LASICH², NFMGGM³, and BIMGGM⁴, each applied to the tumor and normal measurements, Y and N′, respectively. Applying these group graph methods corresponds to the usual practice of estimating tumor graphs from tumor samples without adjusting for tumor purity and normal contamination, and estimating the normal graph from normal controls, and so has practical scientific relevance. For our method and NFMGGM, all genes are normalized to have a mean of zero and a standard deviation of one across all samples. For all the group graphical models, the data are normalized to have a mean of zero and a standard deviation of one separately within the tumor and normal groups. More details on the implementations of all methods are given in Supplementary Section D.

In our Bayesian edge regression model, following Equation (1), we parameterize the dependence of $\omega_{ij}(\cdot)$ on X as

$\omega_{ij}(\pi) = \beta_{ij}(1-\pi) + \alpha_{ij}\pi$. (6)

Under this parameterization, αij represents the precision element for pure tumor samples, and βij the precision element for a pure normal sample, with the sample specific edges given by a linear combination as determined by their tumor purity π. This model allows us to reweight samples based on tumor purity to estimate the pure normal and pure tumor graphs. For an additional comparison of our approach using discrete predictors, we also ran our Bayesian edge regression model with binary covariates mimicking the group definition used when applying the other group models (i.e., Bayesian edge regression (group case)). In this application, we use two binary covariates to encode the membership of tumor and normal groups and add an additional covariate to capture the interaction effects between tumor and normal groups (see more details in Supplementary Section D).
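Because Equation (6) is linear in π, each posterior draw of $(\alpha_{ij}, \beta_{ij})$ maps to a draw of the whole curve $\omega_{ij}(\pi)$, so pointwise posterior means and credible bands follow directly. A minimal sketch, assuming the draws for one edge are available as 1-D arrays; the function name is hypothetical.

```python
import numpy as np

def edge_strength_band(alpha_draws, beta_draws, grid, level=0.95):
    """Posterior mean and equal-tailed credible band of omega_ij(pi) on a grid.

    Uses Equation (6): omega_ij(pi) = beta_ij * (1 - pi) + alpha_ij * pi.
    alpha_draws, beta_draws: length-L arrays of MCMC draws for one edge (i, j).
    """
    grid = np.asarray(grid, dtype=float)
    draws = np.outer(beta_draws, 1.0 - grid) + np.outer(alpha_draws, grid)  # (L, G)
    tail = (1.0 - level) / 2.0
    lower, upper = np.quantile(draws, [tail, 1.0 - tail], axis=0)
    return draws.mean(axis=0), lower, upper
```

At π = 0 the band reduces to that of $\beta_{ij}$ (pure normal) and at π = 1 to that of $\alpha_{ij}$ (pure tumor).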

We apply these methods to the 100 simulated datasets in each of the two simulations. We compare the methods in terms of accuracy in estimating the graph structure, using the area under the ROC curve (AUC), the true positive rate (TPR), and the false positive rate (FPR). Our edge regression (κ and δ), NFMGGM (λ and h), LASICH, and the two graphical lasso methods (λ1 and λ2) each have two regularization parameters. Thus, for all these methods we compute a bivariate AUC (McGuffey et al. 2018) by varying both parameters simultaneously, and then compute the expected AUC (bAUC) by binning the results on a grid of 1 − specificity and averaging the sensitivity within each bin. BIMGGM is reported with a univariate AUC because it requires only one tuning parameter. To better show how each tuning parameter affects the performance of the methods with two hyperparameters, we further report the best univariate AUC over one hyperparameter with the other held fixed. To compare TPR and FPR for a single choice of regularization parameters, we use κ = 0.1 and δ = 0.1 for both versions of the Bayesian edge regression (with continuous covariates and with discrete covariates), and we choose λ1 and λ2, or λ and h, for the compared methods following the previously mentioned guidelines (Supplementary Section D). We report the results for BIMGGM with the same δ = 0.1 for posterior thresholding with the Bayesian local FDR. In addition to the TPR and FPR of the selected model, we also report the TPR at FPR ≈ 0.1 for each method, which provides a fair comparison of the methods under a common criterion. The ROC curves for the two simulations are given in Figure 2, and tables showing the AUC, TPR, and FPR for the normal graph, the tumor graph, and overall are given in Supplementary Section D.
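The binned bAUC described above can be approximated as follows. The number of bins, the interpolation over empty bins, and anchoring the curve at (0, 0) and (1, 1) are assumptions of this sketch rather than the exact procedure of McGuffey et al. (2018); the inputs are the (FPR, TPR) pairs obtained over the full two-dimensional grid of tuning parameters. A helper for the TPR at FPR ≈ 0.1 is included; both names are hypothetical.

```python
import numpy as np

def binned_auc(fpr, tpr, n_bins=20):
    """Approximate AUC from (FPR, TPR) pairs collected over a 2-D tuning grid."""
    fpr = np.asarray(fpr, dtype=float)
    tpr = np.asarray(tpr, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    centers = (edges[:-1] + edges[1:]) / 2.0
    # Bin index of each point on the 1 - specificity axis; FPR == 1 goes last.
    idx = np.clip(np.digitize(fpr, edges) - 1, 0, n_bins - 1)
    means = np.full(n_bins, np.nan)
    for b in range(n_bins):
        in_bin = idx == b
        if in_bin.any():
            means[b] = tpr[in_bin].mean()        # average sensitivity in the bin
    # Anchor the curve at (0, 0) and (1, 1); interpolate over empty bins.
    x = np.concatenate(([0.0], centers, [1.0]))
    y = np.concatenate(([0.0], means, [1.0]))
    known = ~np.isnan(y)
    y = np.interp(x, x[known], y[known])
    return float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0))  # trapezoidal rule

def tpr_near_fpr(fpr, tpr, target=0.1):
    """TPR of the tuning setting whose FPR is closest to `target`."""
    fpr = np.asarray(fpr, dtype=float)
    return float(np.asarray(tpr, dtype=float)[np.argmin(np.abs(fpr - target))])
```

Binning first and then averaging prevents regions of the ROC plane that happen to contain many tuning settings from dominating the summary.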

Figure 2:


Simulation results for Section 3. ROC curves for structure learning of the simulated normal graphs (with respect to $\Omega_N$) and tumor graphs (with respect to $\Omega_T$) in Simulation 1 and Simulation 2.

In Simulation 1, all methods yield relatively high bAUC for the normal graph, but the Bayesian edge regression with continuous covariates has a much better bAUC for the tumor graph (0.915) than the group graph methods (0.706, 0.793, 0.808, and 0.813) (see Supplementary Table 2). This reflects the fact that, unlike the group graph methods, our edge regression can adjust for continuous variables such as tumor purity, and thus reduce bias in estimating the tumor graph. The proposed method also outperforms the kernel regression-based method NFMGGM (0.810) for the tumor graph, which suggests that a parametric edge regression performs better in this case. Note also that the FPRs of the Bayesian edge regression method are all below 0.100, whereas NFMGGM and the graphical lasso methods have high FPRs at their chosen regularization parameters. Furthermore, when each model is tuned so that FPR ≈ 0.1, the proposed method achieves the highest overall TPR among all methods (0.903 versus 0.660, 0.697, 0.735, 0.772, 0.781, and 0.740) (see Supplementary Table 3).

In Simulation 2, a more challenging setting with weaker signals, our Bayesian edge regression method with continuous covariates again produces higher bAUC for both the normal (0.801 versus 0.748, 0.770, 0.751, 0.738, and 0.770) and tumor (0.911 versus 0.758, 0.813, 0.852, 0.799, and 0.868) graphs (see Supplementary Table 4). Once again, the graphical lasso methods with the chosen regularization parameters yield much higher FPRs (> 0.500), whereas those of the Bayesian edge regression are much lower (0.047 for normal and 0.269 for tumor; see Supplementary Section D). Similarly, the proposed method achieves the highest overall TPR (0.700 versus 0.423, 0.417, 0.443, 0.397, 0.638, and 0.592) at FPR ≈ 0.1 (see Supplementary Table 5). Across the two simulations, implementing a group graphical model within our Bayesian edge regression framework (i.e., Bayesian edge regression (group case)) performs similarly to the other group graph methods but worse than the regression model with continuous covariates (overall AUC: 0.874 versus 0.931 in Simulation 1; 0.819 versus 0.856 in Simulation 2) (Supplementary Tables 2 and 4), which again highlights the importance of adjusting for tumor purity as a continuous covariate when estimating tumor and normal networks.

4. Proteomic Networks in Hepatocellular Carcinoma

Markers in HCC:

Hepatocellular carcinoma (HCC) is the fifth most common cancer worldwide and the third-leading cause of cancer-related death, and it has an increasing incidence in developing countries. In the USA, it is the fastest growing cause of cancer-related mortality in men, and with the alarming increase of hepatitis C, it is expected to continue to grow in incidence in the coming years. Over 80% of patients present with advanced disease and underlying cirrhosis (Fattovich et al. 2004, Sanyal et al. 2010), which prevents curative treatment options. There are only a few approved systemic therapies for HCC, e.g., sorafenib, and various targeted therapies are being assessed in combination, including therapies targeting angiogenesis pathways. New, targeted therapies and precision therapy strategies clearly are needed. Cytokines are blood proteins secreted by various types of cells in the immune system that have an effect on other cells. There is significant evidence that numerous cytokines mediate processes involved in the liver, including inflammation, necrosis, cholestasis, fibrosis, and regeneration, and are a key factor in many liver diseases, including HCC (Tilg 2001). Other biological pathways instrumental for HCC include inflammation, metabolic pathways, immune response, growth factor, and angiogenesis (Dhanasekaran et al. 2016, Aravalli et al. 2008). A deeper characterization of the molecular basis of interpatient heterogeneity of HCC, including behavior within these pathways, has the potential to contribute to new, targeted precision therapy strategies for HCC.

A recently developed prognostic measure characterizing interpatient heterogeneity in HCC from blood protein profiles is the HepatoScore (Morris et al. 2020). This biological prognostic score has been shown to dramatically refine HCC staging systems, e.g., by accurately delineating a subset of metastatic patients with low HepatoScore who have a substantially better prognosis than non-metastatic patients with high HepatoScore. Although it is a biological score based only on blood protein levels, with no clinical factors, HepatoScore by itself outperforms all existing staging systems and prognostic factors (Morris et al. 2020). While it is a global score computed from the entire panel of circulating proteins, HepatoScore is primarily driven by a subset of key circulating proteins within various molecular pathways relevant to HCC, most notably the immune response, GH, and angiogenesis pathways. These pathways are thought to play a major role in characterizing the patient’s cancer and prognosis, and a deeper characterization of the interrelationships among these proteins may yield important biological insights.

The dataset analyzed in this paper consists of protein measurements obtained using CytokineMAP (Myriad RBM, Austin, TX) on plasma samples from 767 HCC patients (Morris et al. 2020). The proteins considered here include 71 proteins from the immune, GH, and angiogenesis pathways, plus alpha-fetoprotein (AFP), an important protein for HCC used for early detection and prognosis. After scaling all proteins to have a mean of zero and a variance of one, we ran our graph edge regression model as outlined above. Given HepatoScore $\pi \in [0, 1]$, our model for the conditional precision of edge $(i, j)$ is $\omega_{ij}(\pi) = \beta_{ij}(1-\pi) + \alpha_{ij}\pi$.

Under this parameterization, $\beta_{ij}$ represents the edge strength for a low HepatoScore ($\pi = 0$) and $\alpha_{ij}$ the edge strength for a high HepatoScore ($\pi = 1$), with a linear combination assumed for intermediate HepatoScores; e.g., the edge strength for $\pi = 0.5$ is $0.5\beta_{ij} + 0.5\alpha_{ij}$. The graph edge strengths for any continuous HepatoScore $\pi \in [0, 1]$ can be computed from this model. We used the same prior setting as in Section 3 and chose a tuning parameter $\sigma_\lambda$ that gives an acceptance rate of around 20% to 30% for the Metropolis step, with the acceptance rate determined after the burn-in period. We ran the MCMC sampler for 10,000 iterations after a burn-in of 10,000 iterations, keeping every 10th sample.
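The following is only an illustration of one common way to pick a proposal scale such as $\sigma_\lambda$: adapt a random-walk Metropolis scale in batches during burn-in until the batch acceptance rate falls in the 20%–30% window, then freeze it and check the post-burn-in rate. The target density (a Gamma(2, 1) distribution for a positive parameter λ, sampled on the log scale), the batch size, and the multiplicative adaptation factors are hypothetical stand-ins, not the authors' settings.

```python
import math
import numpy as np

def log_target(u):
    """Hypothetical target: log density of u = log(lambda), lambda ~ Gamma(2, 1).

    p(lambda) is proportional to lambda * exp(-lambda); the Jacobian of
    lambda = exp(u) adds another factor of lambda, giving 2u - exp(u).
    """
    return 2.0 * u - math.exp(u)

def rw_step(x, sigma, rng):
    """One random-walk Metropolis step; returns (new_state, accepted)."""
    proposal = x + sigma * rng.normal()
    log_u = math.log(1.0 - rng.uniform())   # 1 - U lies in (0, 1], so log is finite
    if log_u < log_target(proposal) - log_target(x):
        return proposal, True
    return x, False

def tune_scale(n_burn=5000, batch=100, window=(0.2, 0.3), seed=0):
    """Adapt sigma in batches during burn-in toward the target acceptance window."""
    rng = np.random.default_rng(seed)
    x, sigma, accepted = 0.0, 1.0, 0
    for it in range(1, n_burn + 1):
        x, acc = rw_step(x, sigma, rng)
        accepted += acc
        if it % batch == 0:
            rate = accepted / batch
            if rate < window[0]:
                sigma *= 0.8                 # too few acceptances: smaller steps
            elif rate > window[1]:
                sigma *= 1.25                # too many acceptances: larger steps
            accepted = 0
    return sigma, x

def acceptance_rate(sigma, x0, n_iter=10000, seed=1):
    """Acceptance rate with sigma frozen, as would be reported after burn-in."""
    rng = np.random.default_rng(seed)
    x, accepted = x0, 0
    for _ in range(n_iter):
        x, acc = rw_step(x, sigma, rng)
        accepted += acc
    return accepted / n_iter
```

Freezing the scale after burn-in keeps the retained chain a valid Metropolis chain, since adaptation that continued indefinitely would need additional conditions to preserve the stationary distribution.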

To assess convergence, we inspected trace plots and ran a Geweke convergence diagnostic for all parameters. The histogram of the Geweke p-values suggests that the chain converged satisfactorily (Supplementary Figure 4). Within the sampler, we also obtained posterior samples of the predicted precision matrices for a low (π = 0), medium (π = 0.5), and high (π = 1) HepatoScore as described above, and applied our posterior edge selection approach with δ = 0.1 and κ = 0.15, also considering κ = 0.1 and κ = 0.2 for sensitivity. We also ran NFMGGM for comparison, with results in Supplementary Section E.
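A Geweke diagnostic compares the mean of an early segment of a chain (by default the first 10%) with the mean of a late segment (the last 50%) using a z-score. The sketch below swaps the spectral-density variance estimate of the original diagnostic for a simpler batch-means estimate, which is an assumption of this illustration; applying it to every parameter gives a set of p-values whose histogram can be inspected as described above. Function names are hypothetical.

```python
import math
import numpy as np

def batch_means_var(x, n_batches=20):
    """Batch-means estimate of Var(mean(x)) for an autocorrelated chain."""
    m = len(x) // n_batches
    means = np.asarray(x[: m * n_batches]).reshape(n_batches, m).mean(axis=1)
    return means.var(ddof=1) / n_batches

def geweke_pvalue(chain, first=0.1, last=0.5):
    """Two-sided p-value comparing early and late segment means of a chain."""
    chain = np.asarray(chain, dtype=float)
    n = chain.size
    a = chain[: int(first * n)]
    b = chain[n - int(last * n):]
    z = (a.mean() - b.mean()) / math.sqrt(batch_means_var(a) + batch_means_var(b))
    return math.erfc(abs(z) / math.sqrt(2.0))   # equals 2 * (1 - Phi(|z|))
```

Under convergence the p-values should look roughly uniform across parameters; a pile-up near zero points to chains whose early and late segments disagree.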

While our method produces predicted graphs for any π ∈ [0, 1], for interpretation we focus on three levels, π ∈ {0, 0.5, 1}. Figure 3 shows the estimated graphs for a low (π = 0), medium (π = 0.5), and high (π = 1) HepatoScore, with edge sign indicated by color (green = positive, red = negative), edge strength by line width, and node size indicating the number of connecting edges. Blue lines indicate edges shared across all levels of π, and their signs are given in panel (d). The number of graph edges clearly increases with HepatoScore, indicating that protein network connectivity increases for more invasive forms of HCC. Figure 4 summarizes the number of connections within each pathway and between each pair of pathways as a function of HepatoScore π, and Supplementary Table 15 shows the number of edges within and between pathways in the graphs for π = 0, 0.5, and 1. The number of intra-pathway connections within each of the three pathways increases strongly with HepatoScore, especially for a high HepatoScore (π > 0.8), which has more than twice as many edges as a low HepatoScore. The number of inter-pathway edges increases with HepatoScore even more strongly, with a three- to fourfold increase, most notably between the angiogenesis and immune pathways, with 40 edges for π = 1 and only 15 edges for π = 0. The increased connectivity could correspond to increased activity within these important pathways and increased cross talk between them. This could have important implications for the underlying molecular biology, and it needs follow-up to validate and assess the biological implications of these associations. Supplementary Section E contains the results using κ = 0.1 and 0.2, which show the same substantive effects, although with more and fewer total edges, respectively.
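Summaries such as those in Figure 4 and Supplementary Table 15 reduce to counting the edges of an estimated graph by the pathway labels of their endpoints; node degrees, used for node sizes and for identifying hubs, come from the same adjacency matrix. A minimal sketch with hypothetical function names and labels:

```python
import numpy as np

def pathway_edge_counts(adj, labels):
    """Count selected edges within and between pathways.

    adj: symmetric boolean (p, p) adjacency matrix of one estimated graph.
    labels: length-p sequence of pathway names for the nodes.
    Returns a dict keyed by sorted (pathway_a, pathway_b) tuples.
    """
    adj = np.asarray(adj, dtype=bool)
    counts = {}
    i_idx, j_idx = np.nonzero(np.triu(adj, k=1))   # each undirected edge once
    for i, j in zip(i_idx, j_idx):
        key = tuple(sorted((labels[i], labels[j])))
        counts[key] = counts.get(key, 0) + 1
    return counts

def node_degrees(adj):
    """Number of edges attached to each node (node size, hub identification)."""
    adj = np.asarray(adj, dtype=bool)
    return adj.sum(axis=1) - np.diag(adj).astype(int)
```

Applying both functions to the selected graphs at π = 0, 0.5, and 1 gives the within- and between-pathway counts and the degree rankings for each HepatoScore level.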

Figure 3:


Estimated graphs (δ = 0.10, κ = 0.15) from the Bayesian edge regression for the GH, immune, and angiogenesis pathways with (a) π = 0; (b) π = 0.5; (c) π = 1; (d) common edges. Colors indicate positive (green), negative (red), and common (blue) edges. The thickness of each edge is proportional to $\hat\rho_{ij}$ for edge $(i, j)$, and the size of each node is proportional to its degree.

Figure 4:


The number of connected edges within each pathway and across different pathways as a function of HepatoScore (from π = 0 to 1). The color gradient of each table cell changes with the square root of the proportion of connected edges. κ = 0.15.

Hub proteins are proteins with many connections in the graph, and they may be involved in multiple regulatory activities. Different hub genes are identified for the three graphs (Supplementary Table 16). IGFBP-3 is identified as a hub gene in the high-HepatoScore graph and shows a moderate degree of connectivity in the low-HepatoScore graph, where its connected nodes are different. Serum IGFBP-3 has been reported to be an effective predictor of HCC development in patients with chronic HCV infection, and IGFBP-3 has been reported to suppress HCC cell proliferation, so that reduced IGFBP-3 is significantly associated with the development of HCC (Aleem et al. 2012, Ma et al. 2016). Many edges vary with HepatoScore; we highlight a few notable ones here and present the rest in Supplementary Section E. Figure 5 plots, for three edges, the edge strength as a function of HepatoScore π with 95% credible intervals, together with the corresponding PPI (δ = 0.1, κ = 0.15).
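The PPI curves in the bottom panels of Figure 5 can be approximated in the same spirit as the selection rule: evaluate each posterior draw of the edge on a grid of π values and record how often its magnitude exceeds κ. This sketch applies κ directly to $\omega_{ij}(\pi) = \beta_{ij}(1-\pi) + \alpha_{ij}\pi$, which is an assumption (the selection rule is stated for $\rho$), and the per-π thresholds $1 - \phi_{\pi,\delta}$ must come from running the FDR rule on the full graph at each π. Function names are hypothetical.

```python
import numpy as np

def ppi_curve(alpha_draws, beta_draws, grid, kappa):
    """Share of posterior draws with |omega_ij(pi)| > kappa at each pi in grid."""
    grid = np.asarray(grid, dtype=float)
    draws = np.outer(beta_draws, 1.0 - grid) + np.outer(alpha_draws, grid)  # (L, G)
    return (np.abs(draws) > kappa).mean(axis=0)

def edge_present(ppi, thresholds):
    """Edge flagged at each pi where its PPI reaches the threshold 1 - phi."""
    return np.asarray(ppi) >= np.asarray(thresholds)
```

An edge such as AFP/MIP-3α, absent at low HepatoScore and present at high HepatoScore, would show a PPI curve that crosses its threshold somewhere in the interior of [0, 1].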

Figure 5:


Predicted Edges. Top panels show predicted edge strengths, with bottom panels plotting posterior probabilities of inclusion (PPI) with blue lines indicating FDR=0.10 thresholds. For a value of π, the head of the corresponding red line above the blue curve indicates a nonzero edge strength.

6Ckine is strongly associated with MIP-3β for medium and high HepatoScores, whereas this edge is not apparent in the graph for a low HepatoScore. The interplay between 6Ckine and MIP-3β has previously been reported to play a determining role in the accumulation of antigen-loaded mature dendritic cells (Caux et al. 2000). AFP/MIP-3α is another pair that shows no correlation in the graph for a low HepatoScore but a positive correlation for a high HepatoScore. AFP (α-fetoprotein) is a tumor marker for liver cancer, and AFP levels have been reported to be related to MIP-3α levels in HCC, in which serum MIP-3α levels are increased (Yamauchi et al. 2003). CA-15-3 is a well-known marker for detecting breast cancer and distinguishing it from non-cancerous lesions, and its level has been shown to be increased in patients with end-stage liver disease (Pissaia et al. 2009, Szekanecz et al. 2008). MCP-1 is a protein secreted by the HCC microenvironment that can promote cancer progression, angiogenesis, and metastasis by recruiting and modifying mesenchymal stromal cells (MSCs). CA-15-3/MCP-1 shows a negative correlation for π = 0 and a positive correlation for π = 1, which is consistent with these previous empirical findings of elevated CA-15-3 and MCP-1 levels in liver disease.

In Supplementary Figure 8, we show several additional connections that we consider biologically meaningful, as discussed in Supplementary Section E. Figure 4 further summarizes how edge connectedness varies with HepatoScore (see Supplementary Figures 9 and 10 for κ = 0.1 and 0.2). These results suggest that the protein networks in these important pathways differ between HCC patients with more and less advanced disease: connectivity increases with HepatoScore, with dramatically more connections for patients with higher HepatoScores and the poorest prognoses. Biological studies investigating these differences have the potential to reveal insights into the molecular heterogeneity distinguishing these patients from those with a much better prognosis, and this knowledge can contribute to efforts to identify sorely needed new precision therapy strategies for HCC. These discoveries were made possible by the novel modeling framework introduced in this paper.

5. Discussion

In this article, we introduce a Bayesian edge regression model for constructing non-static undirected graphs whose edge strengths vary with extraneous covariates. To deal with potential high dimensionality, we use global-local priors to induce sparsity in the underlying graphs, and we use posterior probabilities to infer important graph edges for a given set of covariates. Because it is based on node-wise regressions, which have been shown to perform well for graph reconstruction (Leday et al. 2015, Meinshausen & Bühlmann 2006, Ha et al. 2020), our method is primarily recommended for applied settings in which the focus is on edge detection rather than on estimation of the full precision or covariance matrix. This modeling framework allows researchers to study how clinical and biological factors lead to heterogeneous genomic or proteomic networks across patients. We demonstrate how the method can be used to incorporate tumor purity when estimating structural differences between tumor and normal graphs, and to assess how graphs vary with the continuous prognostic factor HepatoScore to explain interpatient heterogeneity in HCC. Our construction is general and can be used in any setting with multivariate data and covariates in which it is of interest to assess how the conditional dependencies among the variables vary with continuous or discrete covariates. While motivated by the setting of continuous covariates, the method is based on a general regression framework in which any number of continuous or discrete covariates or interactions can be included. We also provide freely available code for fitting our models.

Our sampling scheme employs a Gibbs sampler, which yields posterior samples of the model parameters and of the predicted graphs for any set of covariates, and these can be used for posterior inference. Our hyperpriors for the normal-gamma shrinkage prior are set to allow borrowing of information across edges for the regularization parameters of the covariate coefficients. Our sampling procedure is also easy to implement and requires only minimal tuning of the shrinkage hyperparameters. Through simulation studies, we show the practicality of our parameterization of the conditional precision function and demonstrate that our method provides reasonable sensitivity and specificity in edge selection. The parameterization is flexible and, by introducing interactions, can borrow strength in a group-specific setting.

Our fully Bayesian method is designed for moderate-sized graphs with dozens to over 100 nodes, the scale of our motivating examples, and it scales well to these sizes. It is not intended for enormous graphs with thousands or tens of thousands of nodes, a setting that would in any case require enormous sample sizes for estimability. As is commonly done in genomic settings (Peterson et al. 2015, Telesca et al. 2012, Chun et al. 2015), we recommend that researchers select a subset of genes from pre-specified pathways of interest. For example, researchers without an a priori list of 100 or so genes could first download pathway genes from a public database such as KEGG, Reactome, BioCyc, or Pathway Commons, and then perform manual curation to arrive at a gene list on the order of dozens to hundreds of genes or fewer for each model fit.

In this paper, we focus on the setting in which graph edge strengths are linearly related to the covariates. In some settings, one may wish to relax this linearity assumption and use nonparametric regression approaches such as generalized additive models, so that edge strengths can vary nonlinearly with the covariates, exploiting the well-known connection between penalized splines and random effect models. This would involve significant changes to the methodological framework and computational schemes.

Additionally, we note that, like many previous works in graphical modeling and especially in graphical regression (Ni et al. 2018, Kolar et al. 2010) that focus primarily on graph structure, our current implementation regresses only the off-diagonal elements of the precision matrices after standardizing all variables, and so does not account for changes in the diagonal elements. Our simulated data are generated such that the diagonals vary with the covariates, and the strong performance of our method in the simulations demonstrates robustness to this type of heteroscedasticity. In principle, we could avoid standardization and regress the diagonal variances on the covariates as well, although this would require substantially reworking the modeling framework and is outside the scope of this paper.

We also acknowledge that, like numerous other node-wise regression methods in the literature (Leday et al. 2015, Meinshausen & Bühlmann 2006, Peng et al. 2012, Kolar et al. 2010, Ha et al. 2020), our model does not explicitly constrain positive definiteness for all possible covariate levels. One possible route is to construct a joint prior (and hence a generative model) on the entire precision matrix elements, Ω(X), ∀X, such that it lies in the cone of positive definite matrices. Although theoretically sound, this would, in principle, invoke a joint sampling scheme to generate precision matrix (and its elements) for each subject and would add considerable computational expense. Specifically, in contrast to fitting one population level graphical model or a few in the case of multiple graphical models, this scenario involves a fitting schema that scales both in the number of nodes and subjects, thus significantly increasing the complexity of the problem and enforcement of sparsity, and without additional structural assumptions, would make it untenable for many practical settings, including ours. Instead, we focus on identifying edges whose strengths vary across covariates, and we have found in practice our method tends to yield positive definite predicted precision matrices in a vast majority of the cases, e.g., for 99.8% of covariate levels for Simulation 1 (Supplementary Table 6). Given the general utility of our model and outstanding performance in simulations, we believe that our method is a substantial addition to the literature even without this explicit constraint, so we leave its consideration for future work.
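Checking how often predicted precision matrices are positive definite, as in the 99.8% figure above, only requires attempting a Cholesky factorization at each covariate level. In this sketch, `precision_at` is a hypothetical user-supplied function returning the predicted $p \times p$ matrix for a covariate value, and the two-node example in the usage note is purely illustrative.

```python
import numpy as np

def is_positive_definite(mat):
    """True if a Cholesky factorization exists (mat assumed symmetric)."""
    try:
        np.linalg.cholesky(mat)
        return True
    except np.linalg.LinAlgError:
        return False

def pd_fraction(precision_at, covariate_grid):
    """Share of covariate levels whose predicted precision matrix is PD."""
    checks = [is_positive_definite(precision_at(x)) for x in covariate_grid]
    return float(np.mean(checks))
```

For example, a hypothetical two-node matrix with unit diagonal and off-diagonal element $1.5\pi - 0.2$ is positive definite only for $\pi < 0.8$, so evaluating it on `np.linspace(0, 1, 5)` gives a fraction of 0.8.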

Supplementary Material

Supplementary
HepatoScore Supplementary

Footnotes

1

Available in the R package JGL

2

Available in the R package LASICH

3

Implementation requested from the authors

Contributor Information

Zeya Wang, Department of Statistics, Rice University; Department of Bioinformatics and Computational Biology, University of Texas MD Anderson Cancer Center.

Veerabhadran Baladandayuthapani, Department of Biostatistics, University of Michigan.

Ahmed O. Kaseb, Department of Gastrointestinal Medical Oncology, The University of Texas MD Anderson Cancer Center

Hesham M. Amin, Department of Hematopathology, The University of Texas MD Anderson Cancer Center

Manal M. Hassan, Department of Epidemiology, The University of Texas MD Anderson Cancer Center

Wenyi Wang, Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center.

Jeffrey S. Morris, Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania

References

  1. Ahn J et al. (2013), ‘Demix: deconvolution for mixed cancer transcriptomes using raw measured data’, Bioinformatics 29(15), 1865–1871.
  2. Aleem E et al. (2012), ‘Serum igfbp-3 is a more effective predictor than igf-1 and igf-2 for the development of hepatocellular carcinoma in patients with chronic hcv infection’, Oncology letters 3(3), 704–712.
  3. Aravalli RN et al. (2008), ‘Molecular mechanisms of hepatocellular carcinoma’, Hepatology 48(6), 2047–2063.
  4. Bhattacharya A et al. (2015), ‘Dirichlet–laplace priors for optimal shrinkage’, Journal of the American Statistical Association 110(512), 1479–1490.
  5. Cai TT et al. (2012), ‘Covariate-adjusted precision matrix estimation with an application in genetical genomics’, Biometrika 100(1), 139–156.
  6. Cai TT et al. (2016), ‘Joint estimation of multiple high-dimensional precision matrices’, Statistica Sinica 26(2), 445.
  7. Carvalho CM et al. (2010), ‘The horseshoe estimator for sparse signals’, Biometrika 97(2), 465–480.
  8. Castillo I et al. (2012), ‘Needles and straw in a haystack: Posterior concentration for possibly sparse sequences’, The Annals of Statistics 40(4), 2069–2101.
  9. Caux C et al. (2000), Dendritic cell biology and regulation of dendritic cell trafficking by chemokines, in ‘Springer seminars in immunopathology’, Vol. 22, Springer, pp. 345–369.
  10. Cheng J et al. (2014), ‘A sparse ising model with covariates’, Biometrics 70(4), 943–953.
  11. Chun H et al. (2015), ‘Gene regulation network inference with joint sparse gaussian graphical models’, Journal of Computational and Graphical Statistics 24(4), 954–974.
  12. Danaher P et al. (2014), ‘The joint graphical lasso for inverse covariance estimation across multiple classes’, Journal of the Royal Statistical Society: Series B (Statistical Methodology) 76(2), 373–397.
  13. Dempster AP (1972), ‘Covariance selection’, Biometrics pp. 157–175.
  14. Dhanasekaran R et al. (2016), ‘Molecular pathogenesis of hepatocellular carcinoma and impact of therapeutic advances’, F1000Research 5.
  15. Farley P (2015), “purity’ of tumor samples may significantly bias genomic analyses’. URL: http://medicalxpress.com/news/2015-12-purity-tumor-samples-significantly-bias.html [Google Scholar]
  16. Fattovich G et al. (2004), ‘Hepatocellular carcinoma in cirrhosis: incidence and risk factors’, Gastroenterology 127(5), S35–S50. [DOI] [PubMed] [Google Scholar]
  17. Griffin JE, et al. (2010), ‘Inference with normal-gamma prior distributions in regression problems’, Bayesian Analysis 5(1), 171–188. [Google Scholar]
  18. Griffin JE & Brown PJ (2011), ‘Bayesian hyper-lassos with non-convex penalization’, Australian and New Zealand Journal of Statistics 53(4), 423–442. [Google Scholar]
  19. Guo J, Levina E, Michailidis G & Zhu J (2011), ‘Joint estimation of multiple graphical models’, Biometrika 98(1), 1–15. [DOI] [PMC free article] [PubMed] [Google Scholar]
  20. Guo J et al. (2011), ‘Joint estimation of multiple graphical models’, Biometrika 98(1), 1–15. [DOI] [PMC free article] [PubMed] [Google Scholar]
  21. Ha MJ, Stingo FC & Baladandayuthapani V (2020), ‘Bayesian structure learning in multilayered genomic networks’, Journal of the American Statistical Association (just-accepted), 1–33. [DOI] [PMC free article] [PubMed] [Google Scholar]
  22. Hao B et al. (2017), ‘Simultaneous clustering and estimation of heterogeneous graphical models.’, Journal of Machine Learning Research 18, 217–1. [PMC free article] [PubMed] [Google Scholar]
  23. Hastie T & Tibshirani R (1993), ‘Varying-coefficient models’, Journal of the Royal Statistical Society. Series B (Methodological) pp. 757–796. [Google Scholar]
  24. Hoff PD & Niu X (2012), ‘A covariance regression model’, Statistica Sinica pp. 729–753. [Google Scholar]
  25. Hoti F & Sillanpää M (2006), ‘Bayesian mapping of genotype× expression interactions in quantitative and qualitative traits’, Heredity 97(1), 4–18. [DOI] [PubMed] [Google Scholar]
  26. Johnstone IM et al. (2004), ‘Needles and straw in haystacks: Empirical bayes estimates of possibly sparse sequences’, The Annals of Statistics 32(4), 1594–1649. [Google Scholar]
  27. Junttila MR & de Sauvage FJ (2013), ‘Influence of tumour micro-environment heterogeneity on therapeutic response’, Nature 501(7467), 346–354. [DOI] [PubMed] [Google Scholar]
  28. Kolar M & Xing EP (2009), ‘Sparsistent estimation of time-varying discrete markov random fields’, arXiv preprint arXiv:0907.2337. [Google Scholar]
  29. Kolar M et al. (2010), On sparse nonparametric conditional covariance selection, in ‘Proceedings of the 27th International Conference on Machine Learning (ICML-10)’, pp. 559–566. [Google Scholar]
  30. Lauritzen SL (1996), Graphical models, Oxford University Press. [Google Scholar]
  31. Leday GG et al. (2015), ‘Gene network reconstruction using global-local shrinkage priors’, arXiv preprint arXiv:1510.03771 . [DOI] [PMC free article] [PubMed] [Google Scholar]
  32. Lee JD et al. (2017), ‘Communication-efficient sparse regression’, The Journal of Machine Learning Research 18(1), 115–144. [Google Scholar]
  33. Lee KH & Xue L (2018), ‘Nonparametric finite mixture of gaussian graphical models’, Technometrics 60(4), 511–521. [Google Scholar]
  34. Lin Z et al. (2017), ‘On joint estimation of gaussian graphical models for spatial and temporal data’, Biometrics 73(3), 769–779. [DOI] [PMC free article] [PubMed] [Google Scholar]
  35. Liu H et al. (2010), Graph-valued regression, in ‘Advances in Neural Information Processing Systems’, pp. 1423–1431. [Google Scholar]
  36. Liu W et al. (2017), ‘Structural similarity and difference testing on multiple sparse gaussian graphical models’, The Annals of Statistics 45(6), 2680–2707. [Google Scholar]
  37. Ma Y et al. (2016), ‘Insulin-like growth factor-binding protein-3 inhibits igf-1-induced proliferation of human hepatocellular carcinoma cells by controlling bfgf and pdgf autocrine/paracrine loops’, Biochemical and biophysical research communications 478(2), 964–969. [DOI] [PubMed] [Google Scholar]
  38. McGuffey EJ et al. (2018), ‘pibag: Pathway-based integrative bayesian modeling of multiplatform genomics data’, M.D. Anderson Technical Report. [Google Scholar]
  39. Meinshausen N & Bühlmann P (2006), ‘High-dimensional graphs and variable selection with the lasso’, The annals of statistics pp. 1436–1462. [Google Scholar]
  40. Mitchell TJ & Beauchamp JJ (1988), ‘Bayesian variable selection in linear regression’, Journal of the American Statistical Association 83(404), 1023–1032. [Google Scholar]
  41. Mitra R et al. (2016), ‘Bayesian graphical models for differential pathways’, Bayesian Analysis 11(1), 99–124. [Google Scholar]
  42. Morris JS et al. (2008), ‘Bayesian analysis of mass spectrometry proteomic data using wavelet-based functional mixed models’, Biometrics 64(2), 479–489. [DOI] [PMC free article] [PubMed] [Google Scholar]
  43. Morris JS et al. (2020), ‘Hepatoscore-14: Measures of biological heterogeneity significantly improve prediction of hepatocellular carcinoma risk’, Hepatology . [DOI] [PMC free article] [PubMed] [Google Scholar]
  44. Narayan M et al. (2015), ‘Two sample inference for populations of graphical models with applications to functional connectivity’, arXiv preprint arXiv:1502.03853 . [Google Scholar]
  45. Narisetty NN et al. (2014), ‘Bayesian variable selection with shrinking and diffusing priors’, The Annals of Statistics 42(2), 789–817. [Google Scholar]
  46. Ni Y et al. (2018), ‘Bayesian graphical regression’, Journal of the American Statistical Association pp. 1–14. [DOI] [PMC free article] [PubMed] [Google Scholar]
  47. Park T & Casella G (2008), ‘The bayesian lasso’, Journal of the American Statistical Association 103(482), 681–686. [Google Scholar]
  48. Peng J et al. (2012), ‘Partial correlation estimation by joint sparse regression models’, Journal of the American Statistical Association . [DOI] [PMC free article] [PubMed] [Google Scholar]
  49. Peterson C et al. (2015), ‘Bayesian inference of multiple gaussian graphical models’, Journal of the American Statistical Association 110(509), 159–174. [DOI] [PMC free article] [PubMed] [Google Scholar]
  50. Pissaia A et al. (2009), Significance of serum tumor markers carcinoembryonic antigen, ca 19-9, ca 125, and ca 15-3 in pre-orthotopic liver transplantation evaluation, in ‘Transplantation proceedings’, Vol. 41, pp. 682–684. [DOI] [PubMed] [Google Scholar]
  51. Polson NG & Scott JG (2010), ‘Shrink globally, act locally: Sparse bayesian regularization and prediction’, Bayesian Statistics 9, 501–538. [Google Scholar]
  52. Saegusa T & Shojaie A (2016), ‘Joint estimation of precision matrices in heterogeneous populations’, Electronic journal of statistics 10(1), 1341. [DOI] [PMC free article] [PubMed] [Google Scholar]
  53. Sanyal AJ et al. (2010), ‘The etiology of hepatocellular carcinoma and consequences for treatment’, The oncologist 15(Supplement 4), 14–22. [DOI] [PubMed] [Google Scholar]
  54. Scott JG et al. (2010), ‘Bayes and empirical-bayes multiplicity adjustment in the variable-selection problem’, The Annals of Statistics 38(5), 2587–2619. [Google Scholar]
  55. Szekanecz É, et al. (2008), ‘Tumor-associated antigens in systemic sclerosis and systemic lupus erythematosus: associations with organ manifestations, immunolaboratory markers and disease activity indices’, Journal of autoimmunity 31(4), 372–376. [DOI] [PubMed] [Google Scholar]
  56. Tan LS et al. (2017), ‘Bayesian inference for multiple gaussian graphical models with application to metabolic association networks’, The Annals of Applied Statistics 11(4), 2222–2251. [Google Scholar]
  57. Telesca D et al. (2012), ‘Modeling protein expression and protein signaling pathways’, Journal of the American Statistical Association 107(500), 1372–1384. [DOI] [PMC free article] [PubMed] [Google Scholar]
  58. Tilg H (2001), ‘Cytokines and liver disease’, Canadian Journal of Gastroenterology and Hepatology 15(10), 661–668. [DOI] [PubMed] [Google Scholar]
  59. Wang Z et al. (2018), ‘Transcriptome deconvolution of heterogeneous tumor samples with immune infiltration’, iScience 9, 451–460. [DOI] [PMC free article] [PubMed] [Google Scholar]
  60. Xia Y et al. (2015), ‘Testing differential networks with applications to the detection of gene-gene interactions’, Biometrika 102(2), 247–266. [DOI] [PMC free article] [PubMed] [Google Scholar]
  61. Yamauchi K et al. (2003), ‘Increased serum levels of macrophage inflammatory protein-3α in hepatocellular carcinoma: relationship with clinical factors and prognostic importance during therapy’, 11(5), 601–605. [PubMed] [Google Scholar]
  62. Yang E et al. (2014), Mixed graphical models via exponential families, in ‘Artificial Intelligence and Statistics’, pp. 1042–1050. [Google Scholar]
  63. Zhou S et al. (2010), ‘Time varying undirected graphs’, Machine Learning 80(2), 295–319. [Google Scholar]
  64. Zou T. et al. (2017), ‘Covariance regression analysis’, Journal of the American Statistical Association 112(517), 266–281. [Google Scholar]

Supplementary Materials

Supplementary
HepatoScore Supplementary