Published in final edited form as: J Mach Learn Res. 2022;23(242). https://www.jmlr.org/papers/v23/21-0102.html

Bayesian Covariate-Dependent Gaussian Graphical Models with Varying Structure

Yang Ni 1, Francesco C Stingo 2, Veerabhadran Baladandayuthapani 3

Abstract

We introduce Bayesian Gaussian graphical models with covariates (GGMx), a class of multivariate Gaussian distributions with a covariate-dependent sparse precision matrix. We propose a general construction of a functional mapping from the covariate space to the cone of sparse positive definite matrices, which encompasses many existing graphical models for heterogeneous settings. Our methodology is based on a novel mixture prior for precision matrices with a non-local component that admits attractive theoretical and empirical properties. The flexible formulation of GGMx allows both the strength and the sparsity pattern of the precision matrix (hence the graph structure) to change with the covariates. Posterior inference is carried out with a carefully designed Markov chain Monte Carlo algorithm, which ensures the positive definiteness of sparse precision matrices at any given covariate values. Extensive simulations and a case study in cancer genomics demonstrate the utility of the proposed model.

Keywords: Covariate-dependent graphs, Markov random fields, Random thresholding, Subject-level inference, Undirected graphs

1. Introduction

Undirected Gaussian graphical models (GGMs), also known as Gaussian Markov random fields, are one of the common tools to analyze multivariate data with complex structure and find many useful applications across biomedicine, finance, and public health. A GGM can simply be expressed as a multivariate Gaussian distribution with a sparse precision (inverse-covariance) matrix. The zero entries of the precision matrix have a probabilistic interpretation as conditional independence between the Gaussian random variables (nodes of a graph). Moreover, all the conditional independence relationships can be directly read from the accompanying undirected graph, in which a zero entry in the precision matrix corresponds to a missing edge. This equivalence essentially reduces the problem of graph structure learning in GGMs to finding zeros in the precision matrix.

Many existing GGM approaches (Dobra et al., 2004; Sudderth et al., 2004; Meinshausen and Bühlmann, 2006; Yuan and Lin, 2007; Friedman et al., 2008; Scott and Carvalho, 2008; Dobra et al., 2011; Green and Thomas, 2013; Drton and Maathuis, 2017; Khare et al., 2018; Massam, 2018; Gan et al., 2019) assume an independent and identically distributed (i.i.d.) sampling scheme $y_i = (y_{i1}, \ldots, y_{ip}) \sim N(0, \Omega^{-1})$ for $i = 1, \ldots, n$, where $\Omega$ is the precision matrix. However, the independence assumption does not hold in many applications. For example, observations in multivariate time series exhibit temporal correlations, and similarly spatial data exhibit spatial correlation. In addition, the assumption of identical distribution implies homogeneity across observations and is often violated as well. For instance, tumor heterogeneity is a well-known characteristic of cancer: patients with the same cancer type can be rather different in their genetic/genomic architecture. Forcing the same GGM (i.e., the same precision matrix $\Omega$) onto every patient is a restrictive assertion when modeling cancer genomic networks.

Attempts have been made to extend GGMs or other types of graphical models beyond i.i.d. data. If there is a natural grouping of the observations, multiple graphical models (Guo et al., 2011; Danaher et al., 2014; Oates et al., 2014; Peterson et al., 2015; Yajima et al., 2015; Xie et al., 2016; Ni et al., 2018; Shaddox et al., 2018) can be applied to learn group-specific graphs, assuming observations within each group are i.i.d. Another line of work incorporates additional covariates $x_i$ in estimating graphs. Conditional Gaussian graphical models (Rothman et al., 2010; Yin and Li, 2011; Bhadra and Mallick, 2013) are multivariate linear regression models whose error terms follow an i.i.d. GGM (they can be viewed as chain graphical models). While graph estimation is conditional on the covariates, the covariates only enter the model via the mean structure. As a consequence, the graph topology and the precision matrix stay the same across observations. In this paper, we take a more direct approach, in the sense that the latent graph, and hence the sparse precision matrix, are explicit functions of covariates.

There are a few recent works in this direction. Liu et al. (2010a) proposed a tree-based method that partitions the covariate space into a finite number of subspaces by classification and regression trees and fits GGMs separately to subsets of data. However, the estimated graphs may be unstable and lack similarity for similar covariates due to the separate graph estimation, as reported by Cheng et al. (2014). Kolar et al. (2010) proposed a penalized kernel smoothing approach that allows the precision matrix to vary with covariates. Cheng et al. (2014) developed a conditional Ising model for binary data where the dependencies are linear functions of covariates. Although the methods of Kolar et al. (2010) and Cheng et al. (2014) allow edge strength to vary with covariates, the graph structure is assumed to be constant across all observations. Recently, Ni et al. (2019) proposed a graphical regression framework that allows both edge strength and graph structure to vary with covariates in (directed) Bayesian networks. They assumed there exists a natural ordering of the nodes; given this assumption, Bayesian networks can be written as systems of recursive linear regressions, and a conditional independence function was introduced to connect regression coefficients with covariates.

In this paper, we consider the general problem of estimating undirected GGMs conditional on covariates (GGMx). GGMx allows not only the edge strength (i.e., the off-diagonal elements of the precision matrix) but also the graph structure (i.e., the sparsity pattern of the precision matrix) to vary as functions of covariates, which is illustrated in Figure 1 using graphs with four nodes and two covariates. Figure 1 also illustrates the generative mechanism underlying GGMx: covariates $x_i$ generate sparse precision matrices $\Omega_i$ (hence the graphs $G_i$), which in turn generate responses $y_i$. The major challenge in this context is the positive definiteness constraint on precision matrices – a sine qua non for GGMs – in the presence of covariates. We propose a simple strategy: we specify a matrix-valued function f(·) such that $\Omega_i = f(x_i)$ is a positive definite matrix for any $x_i$ almost surely, where f(·) includes a random thresholding component that encourages sparse precision matrix estimation, specifically enforcing the zero pattern that corresponds to missing edges. The sparse functional relationship between $\Omega_i$ and $x_i$ allows for novel graph interpolation for an unseen observation with covariates $x^*$. We show that the random thresholding gives rise to a discrete mixture of non-local priors (Johnson and Rossell, 2010) for precision matrices. We also carefully design a Markov chain Monte Carlo (MCMC) algorithm for posterior inference, which is guaranteed to propose positive definite precision matrices for any $x_i$. GGMx allows for subject-level inference on unknown graphs. Moreover, GGMx is a general class of graphical models, which subsumes at least five special cases: standard GGMs, group-specific GGMs (Guo et al., 2011; Danaher et al., 2014; Peterson et al., 2015), time-varying GGMs (Zhou et al., 2010), covariate-dependent GGMs (Kolar et al., 2010), and context-specific GGMs (Nyman et al., 2017). Extensive simulation studies show strong and robust performance of GGMx compared with competing methods. Using a cancer genomics case study, we demonstrate how GGMx can be used to infer subject-specific gene networks, which can facilitate deeper investigations into the genomic foundation of precision medicine.

Figure 1: Illustration of GGMx. Subject-level sparse precision matrices $\Omega_i$ and graphs $G_i$ of $Y$ vary with covariates $X$. The edge thickness is proportional to the strength of association $\omega_{ijk}$. Both edge strength and graph structure change with $X$. GGMx can also be viewed as a generative model: $X$ generates graphs, which in turn generate $Y$.

The rest of this article is organized as follows. We introduce the background and notations in Section 2. We present the proposed GGMx in Section 3 and discuss the link between the random thresholding prior and non-local priors in Section 4. We summarize the posterior inference and graph interpolation in Section 5. We demonstrate the utility and robustness of GGMx with extensive simulation studies in Section 6. GGMx is illustrated by a real data application in Section 7. Section 8 provides our closing discussion.

2. Background and notation

A GGM is a multivariate Gaussian distribution with a sparse precision matrix. Let $Y = (Y_1, \ldots, Y_p) \sim N(0, \Omega^{-1})$ be multivariate Gaussian random variables with mean zero and precision matrix $\Omega = [\omega_{jk}]$. Since the off-diagonal elements of $\Omega$ are proportional to partial correlations, a zero entry $\omega_{jk} = 0$ indicates that $Y_j$ and $Y_k$ are conditionally independent given all other variables. A GGM graphically represents the zero patterns of $\Omega$ by an undirected graph. An undirected graph $G = (V, E)$ consists of a set of nodes $V = \{1, \ldots, p\}$ and a set of undirected edges $E \subseteq \{\{j,k\} \mid j, k \in V\}$. The nodes $V$ represent the variables $Y$, and an edge $\{j,k\}$ is present in the graph if and only if $\omega_{jk} \neq 0$. This is not an arbitrary way of drawing a graph: the conditional independence relationships encoded in the multivariate Gaussian distribution can be directly read off from $G$ using the notion of graph separation. Importantly, learning the graph structure is equivalent to finding the zero patterns of $\Omega$.
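To make the correspondence concrete, here is a minimal sketch (not from the paper; the matrix values are illustrative assumptions) of reading an undirected graph off a sparse precision matrix and converting its entries to partial correlations:

```python
import numpy as np

# A sparse 4x4 precision matrix; zeros encode conditional independence.
Omega = np.array([[2.0, 0.6, 0.0, 0.0],
                  [0.6, 2.0, 0.5, 0.0],
                  [0.0, 0.5, 2.0, 0.4],
                  [0.0, 0.0, 0.4, 2.0]])
assert np.all(np.linalg.eigvalsh(Omega) > 0)  # positive definite

# Edge {j,k} is present iff omega_jk != 0.
p = Omega.shape[0]
edges = [(j, k) for j in range(p) for k in range(j + 1, p)
         if Omega[j, k] != 0]
print("edges:", edges)  # [(0, 1), (1, 2), (2, 3)]

# Partial correlations: rho_jk = -omega_jk / sqrt(omega_jj * omega_kk).
d = np.sqrt(np.diag(Omega))
print(np.round(-Omega / np.outer(d, d), 2))
```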

Under the Bayesian paradigm, several prior distributions (Roverato, 2002; Wang et al., 2012; Wang, 2015) for sparse precision matrices have been developed, which all take the same general form,

$$\pi(\Omega) = \frac{\tilde{\pi}(\Omega)\, I(\Omega \in M^+)}{\int \tilde{\pi}(\Omega)\, I(\Omega \in M^+)\, d\Omega} \propto \tilde{\pi}(\Omega)\, I(\Omega \in M^+), \tag{1}$$

with $M^+$ being the collection of positive definite matrices (PDMs). For example, the G-Wishart prior (Roverato, 2002) takes $\tilde{\pi}(\Omega)$ to be a Wishart distribution $\mathrm{Wishart}(b, \Omega_0)$ and $M^+ \equiv M_G^+$ to be the PDMs consistent with a graph $G$, which leads to $\pi(\Omega \mid G, b, \Omega_0) \propto \mathrm{Wishart}(\Omega \mid b, \Omega_0)\, I(\Omega \in M_G^+)$. The Bayesian graphical lasso (Wang et al., 2012) takes $\tilde{\pi}(\Omega)$ to be a product of independent exponential priors $\mathrm{Exp}(\cdot \mid \lambda)$ on the diagonal elements and double-exponential priors $\mathrm{DE}(\cdot \mid \lambda)$ on the off-diagonal elements of $\Omega$: $\pi(\Omega \mid \lambda) \propto \prod_{j<k} \mathrm{DE}(\omega_{jk} \mid \lambda) \prod_j \mathrm{Exp}(\omega_{jj} \mid \lambda/2)\, I(\Omega \in M^+)$. The graphical spike-and-slab prior (Wang, 2015) replaces the double-exponential priors in the Bayesian graphical lasso with spike-and-slab priors, $\pi(\Omega \mid G, v_1, v_0, \lambda) \propto \prod_{\{j,k\} \in E} N(\omega_{jk} \mid 0, v_1) \prod_{\{j,k\} \notin E} N(\omega_{jk} \mid 0, v_0) \prod_j \mathrm{Exp}(\omega_{jj} \mid \lambda/2)\, I(\Omega \in M^+)$, where $v_1 \gg v_0$. Priors on $\Omega$ can be defined either conditionally on the graph $G$ or marginally; in what follows we do not use a model indicator parameter $G$ but infer the graph structure directly from the zero patterns in the precision matrices.

3. Gaussian graphical models with covariates

Let $y_1, \ldots, y_n$ be $n$ realizations of a random vector $Y = (Y_1, \ldots, Y_p)$. We assume an independent multivariate Gaussian distribution for each observation, $y_i \sim p(y_i \mid \Omega_i) = N(0, \Omega_i^{-1})$, with the precision matrix $\Omega_i = [\omega_{ijk}]$, importantly, indexed by $i = 1, \ldots, n$. A subject-level graph $G_i = (V, E_i)$ is embedded in the subject-level precision matrix $\Omega_i$: $\{j,k\} \in E_i$ if and only if $\omega_{ijk} \neq 0$.

Without further modeling assumptions, $\Omega_i$ cannot be estimated from a single observation $i$. Let $x_1, \ldots, x_n$ be $n$ realizations of covariates $X = (1, X_1, \ldots, X_q)$. Note that when $X = 1$ (i.e., there are no covariates), the proposed GGMx reduces to a standard GGM; more discussion of special cases of GGMx will be given later. We model $\Omega_i \equiv f(x_i)$ through a symmetric matrix-valued function $f(\cdot)$, which is estimable as a population-level parameter shared across all observations.

General construction of covariate-dependent priors.

The key is the construction of the function $f(\cdot) = [f_{jk}(\cdot)]$ such that $\Omega_i = f(x_i)$ is a PDM for any $x_i$, $i = 1, \ldots, n$. Let $\mathcal{F}^+$ denote the collection of all such functions. This can be achieved by specifying a prior $f \sim \Pi$ that assigns positive mass only to functions satisfying this requirement, $\Pi(\mathcal{F}^+) = 1$. We consider the following generalization of the prior density in (1),

$$\pi(f) = \frac{\tilde{\pi}(f)\, I(f \in \mathcal{F}^+)}{\int \tilde{\pi}(f)\, I(f \in \mathcal{F}^+)\, df}, \tag{2}$$

where $\tilde{\pi}$ is a distribution on matrix-valued functions. Note that the support of $\tilde{\pi}$ is not limited to $\mathcal{F}^+$, offering great flexibility in the choice of $\tilde{\pi}$. For example, we can start from independent distributions a priori such that $\tilde{\pi}(f) = \prod_{j \le k} \tilde{\pi}(f_{jk})$; with this construction, the marginal distribution $\tilde{\pi}(f_{jk})$ need not be defined with a constrained range. Because of the deterministic relationship $\Omega_i = f(x_i)$, the prior $\pi(f)$ induces a conditional prior on $\Omega_i$ given $x_i$.

Two additional critical properties are desired for f(·). (i) Smoothness: similar inputs should give rise to similar PDMs. Without smoothness, similar subjects may have vastly different networks, which is difficult to interpret in many applications including ours. (ii) Sparsity: $\pi(f)$ should place positive probability on sparse PDMs. Sparsity is a common assumption in high-dimensional models including GGMs, and it improves statistical efficiency and interpretability compared to dense models. In order to encourage sparsity of $\Omega_i$, positive mass has to be placed on sparse PDMs a priori, because otherwise there would be zero mass on sparse PDMs a posteriori even if the data strongly favored sparse PDMs. To equip f(·) with these two properties, we decompose each off-diagonal element $f_{jk}(\cdot)$ of f(·) into two components,

$$f_{jk}(x_i) = g_{jk}(x_i)\, I\big(|g_{jk}(x_i)| > t_{jk}\big), \quad \text{for } j < k, \tag{3}$$

where $g_{jk}(\cdot)$ is some smooth function, the hard thresholding $I(|g_{jk}(x_i)| > t_{jk})$ promotes sparsity in $f_{jk}(\cdot)$, and $t_{jk}$ is a random threshold, which can be interpreted as a minimum effect size of $\omega_{ijk}$. Specifically, whenever $g_{jk}(x_i)$ is less than $t_{jk}$ in magnitude, the hard thresholding truncates $f_{jk}(x_i)$ to zero and hence induces a missing edge between nodes $j$ and $k$ for subject $i$. Our use of a thresholding function to induce sparsity in the precision matrices $\Omega_i = f(x_i)$ is novel and crucially different from conventional GGM priors, including the G-Wishart prior and the graphical spike-and-slab prior (Wang, 2015): in order to construct observation-specific graphs, conventional priors would require a latent indicator for each potential edge and each observation, which would greatly increase the model complexity. For example, in our application to the multiple myeloma dataset, conventional priors would need $n \cdot p \cdot (p-1)/2 = 79{,}728$ latent indicators, whereas the proposed GGMx needs far fewer, $p \cdot (p-1)/2 = 528$, thresholding parameters. Moreover, as will be introduced later, GGMx enables undirected graph interpolation for unseen covariates, a new feature that is difficult to obtain with conventional priors. Other choices of thresholding functions are possible, such as soft thresholding and nonnegative garrote thresholding. The main motivation for choosing hard thresholding over the alternatives is its theoretical connection with mixtures of non-local priors; see Section 4.

For the diagonal elements (inverse partial variances, Whittaker 2009) $f_{jj}(\cdot)$ of f(·), we assume the following model to ensure their positivity,

$$f_{jj}(x_i) = \exp\{g_{jj}(x_i)\}. \tag{4}$$

Note that, unlike the off-diagonal elements in (3), the diagonal element $f_{jj}(\cdot)$ is not subject to thresholding.
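The construction in (3)-(4) is easy to prototype. The sketch below, with an assumed linear $g_{jk}(x) = \beta_{jk}^T x$ and illustrative parameter values, builds $\Omega_i = f(x_i)$ and checks whether the result lands in $M^+$; the prior (2) restricts $f$ to functions for which this holds for every $x_i$.

```python
import numpy as np

rng = np.random.default_rng(0)
p, q = 4, 2
x = np.concatenate([[1.0], rng.uniform(-1, 1, q)])  # X = (1, X1, ..., Xq)

beta = rng.normal(0, 1, size=(p, p, q + 1))   # beta_jk, symmetrized below
beta = (beta + beta.transpose(1, 0, 2)) / 2
t = np.full((p, p), 0.5)                      # thresholds t_jk

Omega = np.zeros((p, p))
for j in range(p):
    for k in range(p):
        g = beta[j, k] @ x
        if j == k:
            Omega[j, j] = np.exp(g)               # f_jj(x) = exp{g_jj(x)}, eq (4)
        else:
            Omega[j, k] = g * (abs(g) > t[j, k])  # hard thresholding, eq (3)

# f(x) is only a valid precision matrix if it is positive definite;
# the prior (2) assigns mass only to functions with this property.
print("positive definite:", np.all(np.linalg.eigvalsh(Omega) > 0))
```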

Remark 1 Our formulation encompasses covariate-dependent priors on both the off-diagonal (inverse-covariance) and diagonal (inverse-partial-variance) elements, thus conducting both graphical and inverse-partial-variance regression, simultaneously.

Remark 2 The proposed prior has two advantages over the more commonly used G-Wishart prior: (i) the induced prior on $\Omega_i$ from (2) explicitly incorporates covariates $x_i$, and (ii) the normalizing constant of the G-Wishart is not constant with respect to the graph $G$, so comparing two graphs requires explicit evaluation of an intractable normalizing constant, whereas $\pi(f)$, due to the thresholding function, does not have this complication.

Given f(·) and X, the proposed GGMx satisfies functional Markov properties, e.g., the pairwise functional Markov property, which is stated formally in the following lemma.

Lemma 1 If $f_{jk}(X) = 0$, then $Y_j \perp Y_k \mid Y_{\mathrm{rest}}, X$, where $Y_{\mathrm{rest}}$ is the subvector of $Y$ without $Y_j$ and $Y_k$.

The proof of Lemma 1 follows directly from the fact that $f_{jk}(X) = 0$ implies a missing edge between nodes $j$ and $k$ given covariates $X$, which in turn implies $Y_j \perp Y_k \mid Y_{\mathrm{rest}}, X$ by standard GGM theory.

A natural choice of $g_{jk}(\cdot)$ is a linear function $g_{jk}(x_i) = \beta_{jk}^T x_i$, although, in general, $g_{jk}(\cdot)$ can be any smooth function. Given the limited sample size of the case study, we consider $g_{jk}(\cdot)$ to be linear for parsimony (see Section 8 for a brief discussion on modeling a nonlinear $g_{jk}$) and interpretability ($\beta_{jk}$ are the rates of change of $\omega_{ijk}$ with respect to $x_i$). If the focus is on learning the graph structure and strength, i.e., the off-diagonal elements of $\Omega_i$, one can further simplify the model by taking the diagonal elements $g_{jj}(x_i)$ to be constant with respect to the covariates.

GGMx is a fairly flexible class of models and has at least five special cases (see Table 1). (i) If X contains only the intercept, then GGMx reduces to the standard GGM because the graph is a function of a constant and hence is constant. (ii) If X is categorical, then GGMx is a multiple graphical model (also known as a group-specific GGM), as the categorical covariate defines the groups. (iii) If X is a univariate time point, then GGMx can be used for modeling time-varying GGMs (Zhou et al., 2010) by treating time as a covariate¹. (iv) If the thresholds $t_{jk}$ are fixed to 0, then GGMx is a covariate-dependent GGM in which the strength of the graph varies continuously with the covariates but the structure is constant, because a non-zero linear function is non-zero almost everywhere. (v) If X is a subset of Y, then GGMx can be interpreted as a context-specific GGM (Nyman et al., 2017) in which the graph structure varies with (discretized) X.

Table 1:

Five special cases of GGMx.

| Special case of GGMx | Conditions | Mapping $X \mapsto \Omega$ |
|---|---|---|
| Standard GGM | $x_i = 1$ | $g_{jk}(x_i) = \beta_{jk}$, $\omega_{ijk} = \beta_{jk}\, I(|\beta_{jk}| > t_{jk})$ |
| Group-specific GGM | $x_i = c$, $c \in \{1, \ldots, C\}$ | $g_{jk}(x_i) = \beta_{jkc}$, $\omega_{ijk} = \beta_{jkc}\, I(|\beta_{jkc}| > t_{jk})$ |
| Time-varying GGM | $x_i = x$, time $x$ | $g_{jk}(x) = x\beta_{jk}$, $\omega_{ijk} = x\beta_{jk}\, I(|x\beta_{jk}| > t_{jk})$ |
| Covariate-dependent GGM | $t_{jk} = 0$ | $g_{jk}(x_i) = \beta_{jk}^T x_i$, $\omega_{ijk} = \beta_{jk}^T x_i$ |
| Context-specific GGM | $x_i = y_{i1}$ | $g_{jk}(x_i) = y_{i1}\beta_{jk}$, $\omega_{ijk} = y_{i1}\beta_{jk}\, I(|y_{i1}\beta_{jk}| > t_{jk})$ |

Priors. We assign priors to $\beta_{jk}$ and $t_{jk}$, which in turn define $\tilde{\pi}(f)$. We assume an independent multivariate Gaussian prior $\beta_{jk} \sim \pi(\beta_{jk}) = N(\beta_{jk} \mid 0, \tau_{jk} I_q)$. The thresholding parameter $t_{jk}$ can be interpreted as the minimum size of the off-diagonal elements of $\Omega_i$. Since its value is usually unknown in practice, we assign a truncated normal prior $t_{jk} \sim \pi(t_{jk}) = N(\mu_t, \sigma_t^2)\, I(t_{jk} > 0)$ to reflect this uncertainty. As we will show in the next section, the priors of $\beta_{jk}$ and $t_{jk}$ induce a mixture of non-local priors on $\Omega_i$.

To complete the prior formulation, for the hyperparameter $\tau_{jk}$, we assign the hyperprior

$$\tau = \{\tau_{jk}\}_{j \le k} \sim \pi(\tau) = \frac{C_\tau \prod_{j \le k} \mathrm{IG}(\tau_{jk} \mid a_\tau, b_\tau)}{\int C_\tau \prod_{j \le k} \mathrm{IG}(\tau_{jk} \mid a_\tau, b_\tau)\, d\tau},$$

where IG(a, b) denotes an inverse-gamma density with shape a and scale b, and Cτ is the normalizing constant in (2),

$$C_\tau = \int \tilde{\pi}(f)\, I(f \in \mathcal{F}^+)\, df.$$

Including $C_\tau$ in the prior $\pi(\tau)$ serves to cancel out $C_\tau^{-1}$ in (2) so that the full conditional of $\tau_{jk}$ is inverse-gamma. A similar cancellation trick has been used and thoroughly investigated for the Bayesian graphical lasso (Wang et al., 2012).

A schematic representation of the proposed GGMx is provided in Figure 2.

Figure 2: A schematic representation of GGMx.

4. Theoretical Properties

We establish a general result connecting the proposed prior on precision matrices induced by (2) and (3) with non-local alternative priors in GGMs. A non-local prior assigns a vanishing density (under the alternative hypothesis) to the neighborhood of the null hypothesis. In variable selection contexts, this density vanishes around 0 and therefore shrinks small effects to zero, which is appealing because we are interested in a parsimonious estimate of the graph (i.e., a sparse network). Non-local priors have been shown, both theoretically and empirically, to outperform local priors in various applications including hypothesis testing, high-dimensional sparse regression, and Bayesian networks (Johnson and Rossell, 2010, 2012; Altomare et al., 2013; Rossell and Telesca, 2017; Shin et al., 2018; Ni et al., 2019). However, to the best of our knowledge, all existing priors for sparse precision matrices in GGMs (G-Wishart, Bayesian graphical lasso, and the stochastic search structure learning prior) are local, i.e., $\pi(\Omega)$ does not approach 0 as $\omega_{jk} \to 0$ for $\{j,k\} \in E$. Conceptually, local priors have a seemingly "contradictory" representation of one's prior belief. On the one hand, $\{j,k\} \in E$ suggests $\omega_{jk}$ is non-zero. On the other hand, local priors fail to assign zero mass at $\omega_{jk} = 0$; in fact, local priors often assign the maximum mass at zero. The practical implication of this "contradiction" is that local priors tend to favor denser models and be more susceptible to false discoveries than non-local priors, especially for high-dimensional models like GGMx.

Let $\pi_\theta$ and $\pi_t$ generically denote the priors for $\theta_{jk}$ and $t_{jk}$: $\theta_{jk} \sim \pi_\theta(\theta_{jk})$ and $t_{jk} \sim \pi_t(t_{jk})$. Let $T = [t_{jk}]$. We now show the connection between non-local priors and the proposed prior of the following general form,

$$\pi(\Omega \mid T) = \frac{\tilde{\pi}(\Omega \mid T)\, I(\Omega \in M^+)}{\int \tilde{\pi}(\Omega \mid T)\, I(\Omega \in M^+)\, d\Omega},$$

and

$$\tilde{\pi}(\Omega \mid T) = \prod_{j=1}^p \pi_d(\omega_{jj}) \prod_{j<k} \pi_{\omega \mid t}(\omega_{jk} \mid t_{jk}),$$

where

$$\omega_{jk} = \theta_{jk}\, I(|\theta_{jk}| > t_{jk}), \quad \text{for } j < k.$$

Note that the equations above make no reference to covariates. We deliberately do so for clarity and generality; all the following theoretical results apply to the marginal distribution $\pi(\Omega_i)$ in GGMx by letting $\theta_{jk} = g_{jk}(x_i) = \beta_{jk}^T x_i$ and $\omega_{jj} = \exp\{g_{jj}(x_i)\}$. Conditional on $t_{jk}$, the prior $\pi_\theta$ induces a spike-and-slab mixture distribution,

$$\pi_{\omega \mid t}(\omega_{jk} \mid t_{jk}) = \rho\, \delta_0(\omega_{jk}) + (1 - \rho)\, \tilde{\pi}_{\omega \mid t}(\omega_{jk} \mid t_{jk}),$$

where the mixture weight $\rho = \Pr(|\omega_{jk}| < t_{jk} \mid t_{jk})$ is computed under the conditional distribution of $\omega_{jk}$ induced by $\pi_\theta(\cdot)$, and hence is a function of $t_{jk}$ (not $\omega_{jk}$), and the slab is a truncated distribution,

$$\tilde{\pi}_{\omega \mid t}(\omega_{jk} \mid t_{jk}) = \frac{\pi_\theta(\omega_{jk})\, I(|\omega_{jk}| > t_{jk})}{\Pr(|\omega_{jk}| > t_{jk} \mid t_{jk})}.$$

Slightly abusing notation, let $\omega = (\omega_1, \ldots, \omega_M) = (\omega_{12}, \ldots, \omega_{1p}, \omega_{23}, \ldots, \omega_{2p}, \ldots, \omega_{p-1,p})$ be an $M$-dimensional vector containing the upper-triangular elements of $\Omega$ with $M = \binom{p}{2}$. Let $S \subseteq \{1, \ldots, M\}$ denote the indices of the non-zero elements of $\Omega$ (or equivalently of $\omega$), i.e., $\omega_m = 0$ if and only if $m \in S^c$. Then the conditional prior of $\Omega$ given $T$ can be written as a mixture over all possible subsets $S$,

$$\pi(\Omega \mid T) = \frac{1}{g(T)}\, I(\Omega \in M^+) \prod_{j=1}^p \pi_d(\omega_{jj}) \tag{5}$$
$$\times \sum_{S \in 2^{\{1,\ldots,M\}}} \prod_{m \in S} \pi_\theta(\omega_m)\, I(|\omega_m| > t_m) \prod_{m \in S^c} \Pr(|\omega_m| < t_m \mid t_m)\, \delta_0(\omega_m), \tag{6}$$

where $g(T) = \int \tilde{\pi}(\Omega \mid T)\, I(\Omega \in M^+)\, d\Omega$ is the normalizing constant and $2^{\{1,\ldots,M\}}$ is the power set of $\{1, \ldots, M\}$. Our main theorem shows that under very mild conditions, the marginal prior $\pi(\Omega)$ is a discrete mixture of non-local priors. Before presenting the main theorem, we first state a lemma that is useful in proving it.

Lemma 2 $E[1/g(T)] < \infty$ if the distribution $\pi_\theta(\cdot)$ of $\theta_{jk}$ has positive mass around zero, i.e., there exists $\delta^* > 0$ such that for any $0 < \delta < \delta^*$, $\int_{-\delta}^{\delta} \pi_\theta(\theta)\, d\theta > 0$, and the distribution $\pi_d(\cdot)$ of $\omega_{jj}$ is not a point mass at zero, i.e., $\pi_d(\cdot) \neq \delta_0(\cdot)$.

Proof Consider

$$\begin{aligned}
g(T) &= \int \tilde{\pi}(\Omega \mid T)\, I(\Omega \in M^+)\, d\Omega = \Pr(\Omega \in M^+ \mid T) \\
&\ge \Pr\big(\{\omega_{jj} > (p-1)\lambda\}_{j=1}^p,\ \{|\omega_{jk}| \le \lambda\}_{j<k} \mid T\big), \quad \forall\, \lambda \ge 0 \\
&\ge \Pr\big(\{\omega_{jj} > (p-1)\lambda\}_{j=1}^p,\ \{|\theta_{jk}| \le \lambda\}_{j<k}\big) \\
&= \prod_{j=1}^p \Pr\big(\omega_{jj} > (p-1)\lambda\big) \prod_{j<k} \Pr\big(|\theta_{jk}| \le \lambda\big) \overset{\mathrm{def}}{=} L(\lambda).
\end{aligned}$$

The first inequality holds because a diagonally dominant symmetric matrix with positive diagonal entries is positive definite, and the second inequality holds because $|\theta_{jk}| \le \lambda$ implies $|\omega_{jk}| \le \lambda$ by design and $T$ is independent of $\omega_{jj}$ and $\theta_{jk}$. If $\pi_\theta$ has positive mass around zero and $\pi_d$ is not a point mass at zero, we can pick a sufficiently small (but positive) $\lambda^* > 0$ such that the lower bound $L(\lambda^*)$ of $g(T)$ is positive. It then follows that $E[1/g(T)] < \infty$.
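As a quick numerical sanity check of the bound (our illustration, not part of the proof): a symmetric matrix with $\omega_{jj} > (p-1)\lambda$ and $|\omega_{jk}| \le \lambda$ is diagonally dominant and hence positive definite.

```python
import numpy as np

rng = np.random.default_rng(1)
p, lam = 6, 0.3
Omega = rng.uniform(-lam, lam, size=(p, p))
Omega = (Omega + Omega.T) / 2                  # symmetric, |omega_jk| <= lam
np.fill_diagonal(Omega, (p - 1) * lam + 0.01)  # omega_jj > (p-1)*lambda
print(np.linalg.eigvalsh(Omega).min() > 0)     # True: diagonally dominant => PD
```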

Theorem 1 The marginal prior π(Ω) is given by

$$\pi(\Omega) = \sum_{S \in 2^{\{1,\ldots,M\}}} \rho_S\, \pi_S(\Omega),$$

where $\pi_S(\Omega)$ is the prior under the hypothesis $H_S: \omega_m \neq 0$ for $m \in S$ and $\omega_m = 0$ for $m \in S^c$. Moreover, $\pi_S(\Omega)$ is a non-local prior for any $S \in 2^{\{1,\ldots,M\}} \setminus \emptyset$, that is, $\pi_S(\Omega) \to 0$ as $\omega_m \to 0$ for $m \in S$, provided (i) $\Pr(t = 0) = 0$, (ii) $\pi_\theta(\cdot)$ is bounded and has positive mass near 0, and (iii) $\pi_d(\cdot) \neq \delta_0(\cdot)$.

Proof The marginal distribution of Ω is given by

$$\pi(\Omega) = \int \pi(\Omega \mid T)\, \pi_t(T)\, dT = \int I(\Omega \in M^+) \prod_{j=1}^p \pi_d(\omega_{jj})\, \frac{1}{g(T)} \prod_{j<k} \pi_{\omega \mid t}(\omega_{jk} \mid t_{jk})\, \pi_t(T)\, dT.$$

Let $m(\Omega) = I(\Omega \in M^+) \prod_{j=1}^p \pi_d(\omega_{jj})$; then

$$\begin{aligned}
\pi(\Omega) &= m(\Omega) \int \frac{1}{g(T)} \prod_{j<k} \pi_{\omega \mid t}(\omega_{jk} \mid t_{jk})\, \pi_t(T)\, dT \\
&= m(\Omega) \int \frac{1}{g(T)} \prod_{j<k} \left\{\Pr(|\omega_{jk}| < t_{jk} \mid t_{jk})\, \delta_0(\omega_{jk}) + \pi_\theta(\omega_{jk})\, I(|\omega_{jk}| > t_{jk})\right\} \pi_t(t_{jk})\, dT \\
&= m(\Omega) \int \frac{1}{g(T)} \sum_{S \in 2^{\{1,\ldots,M\}}} \prod_{m \in S} \pi_\theta(\omega_m)\, I(|\omega_m| > t_m)\, \pi_t(t_m) \prod_{m \in S^c} \Pr(|\omega_m| < t_m \mid t_m)\, \delta_0(\omega_m)\, \pi_t(t_m)\, dT \\
&= \sum_{S \in 2^{\{1,\ldots,M\}}} m(\Omega)\, E_T\!\left[\frac{1}{g(T)} \prod_{m \in S} I(|\omega_m| > t_m) \prod_{m \in S^c} \Pr(|\omega_m| < t_m \mid t_m)\right] \prod_{m \in S} \pi_\theta(\omega_m) \prod_{m \in S^c} \delta_0(\omega_m) \\
&\overset{\mathrm{def}}{=} \sum_{S \in 2^{\{1,\ldots,M\}}} h_S(\Omega) = \sum_{S \in 2^{\{1,\ldots,M\}}} \int h_S(\Omega)\, d\Omega \times \frac{h_S(\Omega)}{\int h_S(\Omega)\, d\Omega} \overset{\mathrm{def}}{=} \sum_{S \in 2^{\{1,\ldots,M\}}} \rho_S \times \pi_S(\Omega).
\end{aligned}$$

We will show that for any $m^* \in S$ and any sequence $\omega_{m^*}^{(n)} \to 0$ as $n \to \infty$, $\pi_S(\Omega^{(n)}) \to 0$ as $n \to \infty$, where $\Omega^{(n)}$ contains $\omega_{m^*}^{(n)}$ as an element. Note that

$$\frac{1}{g(T)} \prod_{m \in S} I(|\omega_m| > t_m) \prod_{m \in S^c} \Pr(|\omega_m| < t_m \mid t_m) \le \frac{1}{g(T)}.$$

Since $E[1/g(T)] < \infty$ by conditions (ii)-(iii) and Lemma 2, and $\lim_{n \to \infty} I(|\omega_{m^*}^{(n)}| > t_{m^*}) = 0$ almost surely by condition (i), the dominated convergence theorem gives

$$\begin{aligned}
&\lim_{n\to\infty} E_T\!\left[\frac{1}{g(T)}\, I(|\omega_{m^*}^{(n)}| > t_{m^*}) \prod_{m \in S,\, m \neq m^*} I(|\omega_m| > t_m) \prod_{m \in S^c} \Pr(|\omega_m| < t_m \mid t_m)\right] \\
&\quad = E_T\!\left[\frac{1}{g(T)} \left\{\lim_{n\to\infty} I(|\omega_{m^*}^{(n)}| > t_{m^*})\right\} \prod_{m \in S,\, m \neq m^*} I(|\omega_m| > t_m) \prod_{m \in S^c} \Pr(|\omega_m| < t_m \mid t_m)\right] = 0.
\end{aligned}$$

Finally, condition (ii) renders $\pi_S(\Omega^{(n)}) \to 0$.

Conditions (i)-(iii) in Theorem 1 are very mild and are satisfied by a wide range of $\pi_t$, $\pi_\theta$, and $\pi_d$. Condition (i) is trivially satisfied if $\pi_t$ is continuous (e.g., gamma, inverse-gamma, log-normal, and truncated normal distributions). Condition (ii) holds for the Cauchy, normal, and most scale mixtures of normal distributions such as the Laplace, normal-gamma, and t distributions. Condition (iii) only excludes a point mass at zero, $\delta_0(\cdot)$, from the possible choices of $\pi_d(\cdot)$.

A simple illustrative example

As a concrete example, $\pi_S(\Omega)$ is non-local under the prior distributions specified in Section 3, namely $\pi_\theta(\theta_{jk}) = N(0, \tau)$, $\pi_d(\omega_{jj}) = \text{log-normal}(0, \tau)$, and $\pi_t(t_{jk}) = N(\mu_t, \sigma_t^2)\, I(t_{jk} > 0)$. To visualize the proposed non-local prior, we consider a small precision matrix with $p = 3$ and perform a prior simulation to generate $\Omega$ from $\pi(\Omega)$; the procedure is a special case of the posterior simulation procedure (ignoring the likelihood) described in Section 5. We visualize $\pi_S(\Omega)$ for $S = \{1, \ldots, M\}$, i.e., a complete graph. The marginal densities of pairs of off-diagonal elements of $\Omega$ (normalized to partial correlations) are depicted in the top panel of Figure 3, which shows vanishing density as $\omega_{jk}$ approaches 0. By contrast, a local prior on $\Omega$ (simulated by fixing $t_{jk} = 0$) has increasing density as $\omega_{jk}$ approaches 0, as shown in the bottom panel of Figure 3.
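The prior simulation can be sketched as follows (an assumed rejection-sampling implementation consistent with (2); hyperparameter values are illustrative): draw the diagonal from the log-normal, draw $\theta_{jk}$ and $t_{jk}$, threshold, and keep draws that are positive definite.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
p, tau, mu_t, sigma_t = 3, 1.0, 1.0, 0.2
draws = []
while len(draws) < 5000:
    # Diagonal: log-normal(0, tau); off-diagonals: thresholded normals.
    Omega = np.diag(stats.lognorm.rvs(s=np.sqrt(tau), size=p, random_state=rng))
    for j in range(p):
        for k in range(j + 1, p):
            theta = rng.normal(0, np.sqrt(tau))
            t = stats.truncnorm.rvs(-mu_t / sigma_t, np.inf, loc=mu_t,
                                    scale=sigma_t, random_state=rng)
            Omega[j, k] = Omega[k, j] = theta * (abs(theta) > t)
    if np.all(np.linalg.eigvalsh(Omega) > 0):  # I(Omega in M+), eq (2)
        draws.append(Omega)

# Nonzero off-diagonals are bounded away from 0 (non-local behavior).
w01 = np.array([d[0, 1] for d in draws])
print("min |omega_01| among nonzeros:", np.abs(w01[w01 != 0]).min())
```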

Figure 3: Non-local (top) and local (bottom) prior distributions of Ω.

Remark The connection between non-local priors and random thresholding has been investigated in the regression context (Rossell and Telesca, 2017; Ni et al., 2019). We make a nontrivial extension to precision matrix estimation for undirected GGMs. One major difference between our theory and those of Rossell and Telesca (2017) and Ni et al. (2019) is the complexity of the intractable prior normalizing constant $g(T)$ in (5). Intractable prior normalizing constants are a common challenge in standard Bayesian GGMs (Dobra et al., 2011; Wang et al., 2012; Wang, 2015), both theoretically and computationally. In order to show the equivalence between non-local priors and random thresholding for GGMs, we make extra assumptions, namely that $\pi_\theta(\cdot)$ has positive mass around zero and $\pi_d(\cdot) \neq \delta_0(\cdot)$, in order to bound $E[1/g(T)]$. These mild assumptions are not required in previous works. Also note that Rossell and Telesca (2017) truncate the probability density, whereas we threshold the random variables. Consequently, the resulting marginal prior of Rossell and Telesca (2017) is a non-local prior, while ours is a discrete mixture of non-local priors and point masses at 0. Computationally, the issue of the intractable normalizing constant is resolved by a carefully designed MCMC algorithm, which is discussed in the next section.

5. Posterior Inference

The proposed GGMx is parameterized by three sets of parameters, $\{\beta_{jk}\}_{j \le k}$, $\{t_{jk}\}_{j<k}$, and $\{\tau_{jk}\}_{j \le k}$. The joint posterior distribution of these parameters is given by

$$p\big(\{\beta_{jk}\}_{j \le k}, \{t_{jk}\}_{j<k}, \{\tau_{jk}\}_{j \le k} \mid \{y_i, x_i\}_{i=1}^n\big) \propto \prod_{i=1}^n N(y_i \mid 0, \Omega_i^{-1}) \prod_{j<k} N(t_{jk} \mid \mu_t, \sigma_t^2)\, I(t_{jk} > 0) \prod_{j \le k} N(\beta_{jk} \mid 0, \tau_{jk} I_q)\, \mathrm{IG}(\tau_{jk} \mid a_\tau, b_\tau),$$

where the right-hand side depends on $x_i$ through $\Omega_i = f(x_i)$, and $f(\cdot)$ is defined by $\{\beta_{jk}\}_{j \le k}$ and $\{t_{jk}\}_{j<k}$. Posterior inference on the model parameters is carried out by MCMC. We need to carefully choose a proposal distribution that can propose $f \in \mathcal{F}^+$ efficiently. This is not a trivial task because the probability of generating $f \in \mathcal{F}^+$ is practically zero if we propose $\beta_{jk}$ and $t_{jk}$ from naive proposals such as standard random walks. Here, we introduce a proposal that always proposes $f \in \mathcal{F}^+$.

For illustration, suppose we are currently updating the $(j,k)$th element of $\Omega_i$. Let $\omega_{i,-k,k}$ denote the $k$th column of $\Omega_i$ without the $k$th row and let $\Omega_{i,-k,-k}$ denote the submatrix of $\Omega_i$ without the $k$th row and column. Let $\phi_{ik} = \omega_{ikk} - u_{ik}$ with $u_{ik} = \omega_{i,-k,k}^T \Omega_{i,-k,-k}^{-1} \omega_{i,-k,k}$. We first propose new $\beta_{jk\ell}^*$ and $t_{jk}^*$ from some proposal densities $q_\beta(\beta_{jk\ell}^* \mid \beta_{jk\ell})$ and $q_t(t_{jk}^* \mid t_{jk})$, such as random walks, for $\ell = 1, \ldots, q+1$. The resulting new values of $\omega_{i,-k,k}$ and $u_{ik}$ are denoted by $\omega_{i,-k,k}^*$ and $u_{ik}^*$. Notice that $\Omega_i$ is positive definite if and only if $\phi_{ik} > 0$ for $k = 1, \ldots, p$. This is due to Sylvester's criterion: a symmetric matrix is positive definite if and only if all of its leading principal minors are positive. Without loss of generality, assume $k$ is the last column, all previous leading principal minors are positive, and the covariates $x_i$ are positive. Then the last leading principal minor $\det(\Omega_i^*) = (\omega_{ikk}^* - u_{ik}^*) \det(\Omega_{i,-k,-k}^*)$ is positive if and only if $\omega_{ikk}^* > u_{ik}^*$. Therefore, in order to ensure positive definiteness of $\Omega_i$ for all $i$ when updating its $(j,k)$th element, we additionally propose a new $\beta_{kk\ell}^*$ such that $\omega_{ikk}^* = \exp\{g_{kk}^*(x_i)\} > u_{ik}^*$, where $g_{kk}^*(x_i) = \sum_{\ell' \neq \ell} x_{i\ell'} \beta_{kk\ell'} + x_{i\ell} \beta_{kk\ell}^*$. The solution to this inequality for all $i$ is the constraint that the proposal of $\beta_{kk\ell}^*$ needs to respect. Specifically, we propose $\beta_{kk\ell}^* \sim q_\beta(\beta_{kk\ell}^* \mid \beta_{kk\ell})\, I(\beta_{kk\ell}^* \in S_{k\ell})$ where

$$S_{k\ell} = \left\{\beta : \beta > \max_i \frac{\log(u_{ik}^*) - \sum_{\ell' \neq \ell} x_{i\ell'} \beta_{kk\ell'}}{x_{i\ell}}\right\}.$$
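A sketch of the constraint computation is given below; `proposal_lower_bound` is a hypothetical helper (not the authors' code) that evaluates the lower endpoint of $S_{k\ell}$, assuming positive covariates as stated above.

```python
import numpy as np

def proposal_lower_bound(u_star, X, beta_kk, ell):
    """Lower bound of S_{k,ell} over observations (assumes x_{i,ell} > 0).

    u_star  : (n,) proposed u*_ik per observation
    X       : (n, q+1) covariates (first column is the intercept)
    beta_kk : (q+1,) current coefficients of g_kk
    ell     : index of the coefficient being updated
    """
    rest = X @ beta_kk - X[:, ell] * beta_kk[ell]   # g_kk without the ell term
    return np.max((np.log(u_star) - rest) / X[:, ell])

# Usage: propose beta*_{kk,ell} from a random walk truncated below at this
# bound (e.g., via rejection), which guarantees omega*_{ikk} > u*_{ik} for all i.
```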

We summarize this property of the proposal density in the following proposition, the proof of which is given by the construction above.

Proposition 1 The proposal density $q(\beta_{jk\ell}^*, t_{jk}^*, \beta_{kk\ell}^* \mid \beta_{jk\ell}, t_{jk}, \beta_{kk\ell}) = q_\beta(\beta_{jk\ell}^* \mid \beta_{jk\ell})\, q_t(t_{jk}^* \mid t_{jk})\, q_\beta(\beta_{kk\ell}^* \mid \beta_{kk\ell})\, I(\beta_{kk\ell}^* \in S_{k\ell})$ and the full conditional density $p(\beta_{jk\ell}, t_{jk}, \beta_{kk\ell} \mid \text{rest})$ have the same support.

We now provide the MCMC (Metropolis-within-Gibbs) algorithm below; its validity is guaranteed by Proposition 1 and standard MCMC theory.

The MCMC Algorithm.

Initialize model parameters. Repeat the following steps until practical convergence.

(I) Update the precision matrices $\Omega_i$. Scanning through each column $k = 1, \ldots, p$, each row $j \le k$, and each covariate $\ell = 1, \ldots, q+1$, we propose $\beta_{jk\ell}^*$, $t_{jk}^*$, and $\beta_{kk\ell}^*$ from $q_\beta(\beta_{jk\ell}^* \mid \beta_{jk\ell})$, $q_t(\log t_{jk}^* \mid \log t_{jk})$, and $q_\beta(\beta_{kk\ell}^* \mid \beta_{kk\ell})\, I(\beta_{kk\ell}^* \in S_{k\ell})$, where $q_t(\log t_{jk}^* \mid \log t_{jk}) = N(\log t_{jk}^* \mid \log t_{jk}, \eta_t^2)$, $q_\beta(\beta_{jk\ell}^* \mid \beta_{jk\ell}) = N(\beta_{jk\ell}^* \mid \beta_{jk\ell}, \eta_\beta^2)$, and $q_\beta(\beta_{kk\ell}^* \mid \beta_{kk\ell}) = N(\beta_{kk\ell}^* \mid \beta_{kk\ell}, \eta_\beta^2)$. We accept the proposal with probability $\min(1, \alpha)$ where

$$\alpha = \frac{\prod_{i=1}^n p(y_i \mid \Omega_i^*)\, \pi(\beta_{jk\ell}^*)\, \pi(t_{jk}^*)\, \pi(\beta_{kk\ell}^*)\, q_\beta(\beta_{jk\ell} \mid \beta_{jk\ell}^*)\, q_t(t_{jk} \mid t_{jk}^*)\, q_\beta(\beta_{kk\ell} \mid \beta_{kk\ell}^*)\, I(\beta_{kk\ell} \in S_{k\ell})}{\prod_{i=1}^n p(y_i \mid \Omega_i)\, \pi(\beta_{jk\ell})\, \pi(t_{jk})\, \pi(\beta_{kk\ell})\, q_\beta(\beta_{jk\ell}^* \mid \beta_{jk\ell})\, q_t(t_{jk}^* \mid t_{jk})\, q_\beta(\beta_{kk\ell}^* \mid \beta_{kk\ell})\, I(\beta_{kk\ell}^* \in S_{k\ell})}.$$

The proposal standard deviations $\eta_t$ and $\eta_\beta$ can be tuned to achieve a desired acceptance rate (say, 20%-40%).

(II) Update the hypervariances $\tau_{jk}$ from the inverse-gamma full conditional, $\tau_{jk} \sim \mathrm{IG}(a_\tau + 1/2,\ b_\tau + \beta_{jk}^2/2)$.

Graph estimation.

A point estimate of $G_i$ can be obtained by thresholding the posterior probability of inclusion. Specifically, we select $\{j,k\} \in E_i$ if $\Pr(\{j,k\} \in E_i \mid y_i, x_i) > c$, where $c \in [0,1]$ is the probability cutoff². The posterior probability of inclusion can be approximated from the MCMC samples,

$$\Pr(\{j,k\} \in E_i \mid y_i, x_i) = \Pr(\omega_{ijk} \neq 0 \mid y_i, x_i) \approx \frac{1}{R} \sum_{r=1}^R I\{\omega_{ijk}^{(r)} \neq 0\},$$

where the superscript (r) indexes the posterior samples.
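A sketch of this computation (assuming the posterior draws of $\Omega_i$ are stored in an array `omega_samples` of shape $R \times p \times p$):

```python
import numpy as np

def edge_inclusion_prob(omega_samples):
    """Approximate Pr(omega_ijk != 0 | y_i, x_i) by the Monte Carlo average."""
    return (omega_samples != 0).mean(axis=0)   # (p, p) inclusion frequencies

def point_estimate_graph(omega_samples, cutoff=0.5):
    """Adjacency of the estimated graph: include edge if PPI > cutoff c."""
    return edge_inclusion_prob(omega_samples) > cutoff
```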

Graph interpolation.

Since the precision matrix $\Omega_i = f(x_i)$ is modeled as a function of $x_i$, we can interpolate a graph $G^* = (V, E^*)$ for an unseen observation with covariates $x^*$. This is achieved through the posterior predictive distribution of $f(\cdot)$, which can be approximated from the MCMC samples,

$$\Pr(\{j,k\} \in E^* \mid y, x, x^*) = \Pr\{f_{jk}(x^*) \neq 0 \mid y, x\} \approx \frac{1}{R} \sum_{r=1}^R I\{f_{jk}^{(r)}(x^*) \neq 0\}.$$

Graph interpolation requires only the covariates $x^*$, since the right-hand side of the equation above does not depend on $y^*$. In practice, this is a desirable property. For example, one can predict the gene network for new patients without sequencing the whole genome; measuring the covariates (e.g., blood biomarkers) suffices.
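A sketch of graph interpolation from stored draws of $(\beta, t)$ (the array shapes are assumptions for illustration):

```python
import numpy as np

def interpolate_graph(beta_samples, t_samples, x_star, cutoff=0.5):
    """beta_samples: (R, p, p, q+1) draws; t_samples: (R, p, p) draws."""
    g = np.einsum('rjkl,l->rjk', beta_samples, x_star)  # g_jk(x*) per draw
    nonzero = np.abs(g) > t_samples                     # thresholded edges
    ppi = nonzero.mean(axis=0)                          # inclusion probability
    return ppi > cutoff                                 # interpolated adjacency
```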

6. Simulations

6.1. Simulation Setup

We assessed the utility and operating characteristics of GGMx in seven simulation scenarios with different levels of sparsity and types of covariates. The same dataset size as the application was used: $n = 151$, $p = 33$, and $q = 2$ ($q$ was set to 1 for the last scenario). Note that even with a moderate dataset, the number of parameters $(\beta_{jk\ell}, t_{jk}, \tau_{jk})$ that need to be estimated is $\frac{p(p+1)(q+2)}{2} + \frac{p(p-1)}{2} = 2{,}772$, which is substantially larger than the sample size. We focused on graph structure learning in the first five scenarios by assuming constant diagonal elements $g_{jj}(\cdot)$ for simplicity; the non-constant case (i.e., simultaneous inverse-partial-variance and graphical regression) is considered in the last two scenarios. We fixed the probability cutoff $c$ at 0.5 in all scenarios.

Scenario I.

We generated the simulated data from our model. We randomly set 2% of the $\beta_{jk\ell}$ for $j < k$ to $\pm 1$ with equal probability. We set $t_{jk} = 0.5$ and all the diagonal elements of $\Omega_i$ to 1. The covariate $x_{ij}$ was generated from a uniform distribution, $x_{ij} \overset{iid}{\sim} U(-1, 1)$. The resulting precision matrix $\Omega_i$ might not be positive definite for all observations $i = 1, \ldots, n$; we repeated the process until $\Omega_i > 0$ for all $i$. Then the observation $y_i$ was drawn from a normal distribution, $y_i \overset{ind}{\sim} N(0, \Omega_i^{-1})$. Using the same procedure, we generated a similar independent dataset with sample size 50 for testing the graph interpolation of GGMx.

Scenario II.

The procedure in Scenario I was inefficient for generating a denser network. In addition, it may not mimic the data in the application well. In this scenario, we used one posterior draw from GGMx applied to the multiple myeloma data as the simulation truth. The true $\beta_{jk\ell}$'s are shown as heatmaps in Figure 4a, where $\ell = 1$ corresponds to the intercept and $\ell = 2, 3$ correspond to the two covariates. Since the heatmap of $\beta_{jk1}$ is denser than those of $\beta_{jk2}$ and $\beta_{jk3}$, there were more nearly constant edges than highly varying edges. The true $t_{jk}$'s are shown in Figure 4b. The covariates $x_i$ of the multiple myeloma dataset were used, and $y_i$ was drawn from the model $y_i \overset{ind}{\sim} N(0, \Omega_i^{-1})$ with $\Omega_i = f(x_i)$.

Figure 4: Simulation truths for Scenario II. Heatmaps of (a) the true $\beta_{jk\ell}$'s and (b) the true $t_{jk}$'s, taken from one posterior draw of GGMx applied to the multiple myeloma data.

Scenario III.

This scenario considered a simulation truth from an ordinary GGM, i.e., $\Omega_i = \Omega$ for all $i$. We generated a true $\Omega$ as follows; a code sketch of this procedure follows the list.

  1. Generate an Erdős–Rényi graph G with connection probability 5%.

  2. Set the diagonal entries of Ω to 1. For each edge {j, k} in G, draw the corresponding off-diagonal entry $\omega_{jk}$ uniformly from $[-1, -0.5] \cup [0.5, 1]$.

  3. Since Ω might not be positive definite, we kept adding $0.1 I$ to Ω until it became positive definite. The resulting partial correlations were less than 0.4 in magnitude. We then simulated $y_i \overset{iid}{\sim} N(0, \Omega^{-1})$ and $x_{ij} \overset{iid}{\sim} U(-1, 1)$. GGMx took the independently generated $x_{ij}$ as covariates, which were pure "noise" for constructing the graph of $y_i$.
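A sketch of this generation procedure (the seed is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 151, 33
G = np.triu(rng.random((p, p)) < 0.05, k=1)      # Erdos-Renyi graph, 5% edges
Omega = np.eye(p)
m = G.sum()
Omega[G] = rng.uniform(0.5, 1.0, m) * rng.choice([-1, 1], m)
Omega = np.triu(Omega) + np.triu(Omega, k=1).T   # symmetrize
while np.linalg.eigvalsh(Omega).min() <= 0:      # add 0.1*I until PD
    Omega += 0.1 * np.eye(p)
y = rng.multivariate_normal(np.zeros(p), np.linalg.inv(Omega), size=n)
x = rng.uniform(-1, 1, size=(n, 2))              # pure-noise covariates
```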

Scenario IV.

We extended Scenario III to multiple graphs with $C = 3$ groups. The sample sizes of the groups were $n_1 = 50$, $n_2 = 50$, and $n_3 = 51$. Graph $G_1$ was generated as an Erdős–Rényi graph with connection probability 10%, which led to 63 edges. We randomly turned 3 edges on and 3 edges off from $G_1$ to obtain $G_2$, and similarly constructed $G_3$ from $G_2$. As a result, each of the pairs $(G_1, G_2)$ and $(G_2, G_3)$ shares about 90% of its edges, whereas $(G_1, G_3)$ shares about 80%. Given the graphs, the precision matrices and observations $y_i$ were generated in the same way as in Scenario III. To apply GGMx in this setting, we let $x_{ij}$ be a binary indicator such that $x_{ij} = 1$ if observation $i$ belongs to group $j$ for $j = 1, 2$, and $x_{ij} = 0$ for $j = 1, 2$ if observation $i$ belongs to group 3.

Scenario V.

We have considered continuous covariates (Scenarios I-II), a discrete covariate (Scenario IV), and no relevant covariates (Scenario III). Here, we included a scenario with one continuous covariate and one discrete covariate. We generated the data following Scenario I with one covariate replaced by a Bernoulli(0.5) variable and the corresponding coefficients $\beta_{jk\ell}$ set to $\pm 0.5$ with equal probability.

Scenario VI.

We considered a scenario without assuming $\omega_{ijj}$ to be constant; instead we set $g_{jj}(x_i) = 0.1 + 0.2 x_{i1} + 0.2 x_{i2}$ and $\omega_{ijj} = \exp\{g_{jj}(x_i)\}$. For the off-diagonal elements, we randomly included 2% of the edges, and the corresponding $\beta_{jk\ell}$ for $j < k$ was set to 0.7. The covariate $x_{i\ell}$ was generated as $x_{i\ell} \overset{iid}{\sim} 2\,\mathrm{Beta}(2, 1)$. The resulting precision matrix $\Omega_i$ might not be positive definite for all observations $i = 1, \ldots, n$; we repeated the process until $\Omega_i > 0$ for all $i$. Then the observation $y_i$ was drawn as $y_i \overset{ind}{\sim} N(0, \Omega_i^{-1})$.

Scenario VII.

To illustrate that GGMx can be used to recover time-varying GGMs, we reduced the number of covariates from Scenario VI to $q = 1$.

6.2. Methods under Consideration

We compared the proposed GGMx with six competing methods: Bayesian Gaussian graphical models (Mohammadi et al., 2015), graphical lasso (Friedman et al., 2008), kernel graphical lasso (Liu et al., 2010a), fused graphical lasso, group graphical lasso (Danaher et al., 2014), and Bayesian multiple Gaussian graphical model (Shaddox et al., 2018).

Bayesian Gaussian graphical models (BGGMs) assume an i.i.d. multivariate Gaussian likelihood, the G-Wishart prior on the precision matrix, $\Omega \sim W_G(b, D)$, and a uniform prior on the graph $G$. The G-Wishart prior is conjugate to the multivariate Gaussian likelihood. However, due to the intractable prior normalizing constant of the G-Wishart prior, a non-trivial MCMC algorithm is required for posterior inference. We use an efficient trans-dimensional MCMC algorithm proposed by Mohammadi et al. (2015) based on a continuous-time birth-death process.

Graphical lasso (glasso) is a penalized likelihood approach that maximizes the objective function $\log|\Omega| - \mathrm{tr}(S\Omega) - \lambda \|\Omega\|_1$, where $S$ is the sample covariance matrix. The first two terms are the Gaussian log-likelihood and the last term is an $\ell_1$ penalty, which induces sparsity in $\Omega$. The optimization is solved using a coordinate descent algorithm.

Both BGGM and glasso assume i.i.d. sampling and are designed to infer networks that do not change with covariates. For a fairer comparison, we implemented the kernel graphical lasso (k-glasso) approach outlined in Liu et al. (2010a). K-glasso is a modification of glasso with the sample covariance matrix $S$ replaced by a covariate-dependent covariance matrix obtained via kernel smoothing. Specifically, let

$$S(x) = \frac{\sum_{i=1}^n K\!\left(\frac{\|x - x_i\|}{h}\right)(y_i - \mu(x))(y_i - \mu(x))^T}{\sum_{i=1}^n K\!\left(\frac{\|x - x_i\|}{h}\right)},$$

with

$$\mu(x) = \frac{\sum_{i=1}^n K\!\left(\frac{\|x - x_i\|}{h}\right) y_i}{\sum_{i=1}^n K\!\left(\frac{\|x - x_i\|}{h}\right)},$$

where $\|\cdot\|$ is the Euclidean norm, $h > 0$ is the bandwidth, and $K(\cdot)$ is a Gaussian kernel. A sparse estimate of $\Omega_i$ is then obtained by applying glasso with $S = S(x_i)$: $\hat{\Omega}_i = \arg\max_\Omega \{\log|\Omega| - \mathrm{tr}(S(x_i)\Omega) - \lambda_i \|\Omega\|_1\}$.
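A sketch of the kernel-weighted covariance computation (with a Gaussian kernel as above; glasso would then be applied to $S(x_i)$ for each observation):

```python
import numpy as np

def kernel_cov(x, X, Y, h=0.5):
    """Kernel-smoothed covariance S(x). X: (n, q) covariates, Y: (n, p) data."""
    w = np.exp(-0.5 * (np.linalg.norm(X - x, axis=1) / h) ** 2)  # Gaussian kernel
    w = w / w.sum()
    mu = w @ Y                          # kernel-smoothed mean mu(x)
    Yc = Y - mu
    return (w[:, None] * Yc).T @ Yc     # weighted covariance S(x)
```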

As pointed out in Section 3, the proposed GGMx is a multiple graphical model when the covariates are categorical. Multiple graphical models assume that observations are divided into $C$ groups; the goal is to jointly estimate group-specific sparse precision matrices $\Omega^{(c)}$, $c = 1, \ldots, C$. Since the grouping of observations can be represented by a categorical variable, GGMx is able to learn group-specific graphs. For comparison, we consider three alternative multiple graphical model approaches: the two penalized approaches proposed in Danaher et al. (2014), fused graphical lasso (FGL) and group graphical lasso (GGL), and the Bayesian multiple Gaussian graphical model (MGGM) proposed by Shaddox et al. (2018). Both penalized algorithms maximize the following objective with respect to positive definite matrices $\{\Omega^{(c)}\}_{c=1}^C$,

$$\sum_{c=1}^C n_c \left\{\log|\Omega^{(c)}| - \mathrm{tr}(S^{(c)} \Omega^{(c)})\right\} - P\big(\{\Omega^{(c)}\}_{c=1}^C\big),$$

where $n_c$ is the sample size of group $c$, $S^{(c)}$ is the sample covariance matrix of group $c$, and $P(\cdot)$ is a penalty that encourages sparsity and similarity of $\{\Omega^{(c)}\}_{c=1}^C$. The penalty is $\lambda_1 \sum_{c=1}^C \sum_{j \neq k} |\omega_{jk}^{(c)}| + \lambda_2 \sum_{c < c'} \sum_{j,k} |\omega_{jk}^{(c)} - \omega_{jk}^{(c')}|$ for FGL and $\lambda_1 \sum_{c=1}^C \sum_{j \neq k} |\omega_{jk}^{(c)}| + \lambda_2 \sum_{j \neq k} \big(\sum_{c=1}^C \omega_{jk}^{(c)2}\big)^{1/2}$ for GGL.

Finally, MGGM uses local priors on sparse precision matrices (Wang, 2015) and can be thought of as the local-prior counterpart of the proposed method in the multiple graphs setting; comparisons with this method only pertain to Scenario IV.

For GGMx, we set the hyperparameters $a_\tau = b_\tau = 10^{-1}$, $\mu_t = 1$, and $\sigma_t = 0.2$; these choices are tested in sensitivity analyses at the end of this section. Both GGMx and BGGM were run for 10,000 iterations with 5,000 burn-in. The regularization parameter of glasso was selected by the stability approach (Liu et al., 2010b) implemented in the R package huge. The tuning parameters $\lambda_1$ and $\lambda_2$ of FGL and GGL were selected based on the approximated Akaike Information Criterion (AIC) as suggested by Danaher et al. (2014); a 20 × 20 grid evenly spaced between 0.05 and 0.5 for $\lambda_1$, and between 0.001 and 0.01 for $\lambda_2$, was used. Likewise, the tuning parameters $\lambda_i$ and $h_i$ of k-glasso were selected based on AIC on a 20 × 20 grid over $[0.1, 1] \times [0.1, 1]$ for each observation $i = 1, \ldots, n$. All results are based on 50 repeated simulations.

6.3. Simulation Results

To assess the graph recovery performance, we computed true positive rate (TPR), false discovery rate (FDR), and Matthews correlation coefficient (MCC),

$$\mathrm{TPR} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}, \quad \mathrm{FDR} = \frac{\mathrm{FP}}{\mathrm{TP} + \mathrm{FP}}, \quad \mathrm{MCC} = \frac{\mathrm{TP} \times \mathrm{TN} - \mathrm{FP} \times \mathrm{FN}}{\sqrt{(\mathrm{TP}+\mathrm{FP})(\mathrm{TP}+\mathrm{FN})(\mathrm{TN}+\mathrm{FP})(\mathrm{TN}+\mathrm{FN})}},$$

where TP, FP, TN, and FN stand for true positives, false positives, true negatives, and false negatives. MCC takes values between −1 and 1, with 1 indicating perfect graph recovery and 0 indicating random guessing. In addition, we scrutinized the edges whose inclusion probability is considerably affected by the covariates' values. Hence, we introduced three further measures: partial TPR (pTPR), partial FDR (pFDR), and partial MCC (pMCC), which are simply TPR, FDR, and MCC restricted to the edges whose true frequency of inclusion across observations lies between 0.1 and 0.9. We report all the metrics in Figure 5. Overall, GGMx had robust, superior performance (with high true positive and low false discovery rates) across all scenarios.
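A sketch of these metrics computed from estimated and true adjacency matrices (upper triangles only, since the graphs are undirected):

```python
import numpy as np

def graph_metrics(est, truth):
    """TPR, FDR, and MCC for edge recovery from boolean adjacency matrices."""
    iu = np.triu_indices_from(truth, k=1)
    e, t = est[iu].astype(bool), truth[iu].astype(bool)
    TP, FP = (e & t).sum(), (e & ~t).sum()
    TN, FN = (~e & ~t).sum(), (~e & t).sum()
    tpr = TP / (TP + FN)
    fdr = FP / max(TP + FP, 1)
    denom = np.sqrt(float((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN)))
    mcc = (TP * TN - FP * FN) / denom if denom > 0 else 0.0
    return tpr, fdr, mcc
```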

Figure 5: Simulations. Operating characteristics averaged over 50 repeated simulations under seven scenarios. Graph interpolation is not shown as it is similar to graph estimation. pMCC is 0 when no missing edge is detected. pTPR, pFDR, and pMCC are not available for Scenarios III and IV.

In Scenario I, GGMx clearly outperformed BGGM and glasso in all six measures. This was expected because the data were generated from the proposed model and all edges were associated with covariates. BGGM and glasso assume i.i.d. sampling and therefore did not perform well. Although k-glasso was much better than BGGM and glasso, it is clear that GGMx performed significantly better than k-glasso in all metrics. In addition, GGMx can interpolate graph structure given new covariates. The results of graph interpolation (not shown) were very similar to those of graph estimation.

In Scenario II, BGGM appeared comparable to GGMx in TPR, FDR, and MCC. This is because, in the simulation truth, there were many more nearly constant edges than highly varying edges. In many cases, including the application, it is interesting to focus on highly varying edges as they are the most differential across observations. Not surprisingly, GGMx had favorable performance compared to BGGM and glasso in terms of pTPR, pFDR, and pMCC. K-glasso was not able to pick up the signals in this scenario, which mimicked the real data.

In Scenario III where there was no relationship between graph and covariates, BGGM outperformed GGMx, glasso, and k-glasso. But GGMx still had a reasonably good performance with the lowest FDR and substantially better overall performance than glasso and k-glasso.

In Scenario IV (multiple graphical models), GGL, FGL, and MGGM had higher TPR compared to GGMx, however, at the price of higher FDR. Consequently, GGMx outperformed GGL, FGL, and MGGM in terms of FDR and the overall measure MCC.

In Scenario V, GGL and FGL were applied ignoring the continuous covariate whereas k-glasso was applied ignoring the discrete covariate. GGMx was able to simultaneously incorporate both continuous and discrete covariates in estimating graphs and therefore as expected it had the best performance compared to BGGM, glasso, k-glasso, GGL, and FGL in practically all measures.

In Scenario VI, where the diagonal elements of $\Omega_i$ were not constrained to be constant, the results were consistent with those in Scenarios I-V. BGGM and glasso outperformed k-glasso overall, but k-glasso was much better with respect to the selection of edges with substantial variability (measured by pTPR, pFDR, and pMCC). The proposed GGMx was clearly the best in both the overall and the partial measures. For example, GGMx had considerably higher MCC as well as pMCC than all the competing methods. In addition, we also evaluated the estimation accuracy of $\Omega_i$ by computing the mean squared error (MSE), again focusing on edges with true frequency of inclusion across observations between 0.1 and 0.9. The resulting MSEs were 0.10, 1.24, 0.44, and 0.82 for GGMx, BGGM, glasso, and k-glasso, respectively, which demonstrates the capability of the proposed GGMx in capturing the heterogeneity in $\Omega_i$.

In Scenario VII, the main conclusions stayed the same as in Scenario VI, although k-glasso had significantly reduced FDR, at the price of significantly reduced TPR. GGMx, on the other hand, demonstrated stable performance across all scenarios and all measures.

Lastly, we assessed the sensitivity of GGMx to the choice of the hyperparameters $(a_\tau, b_\tau)$ and $(\mu_t, \sigma_t)$. We picked Scenario VII and varied the hyperparameters over the following ranges: $(a_\tau, b_\tau) \in \{(10^{-2}, 10^{-2}), (10^{-3}, 10^{-3}), (10^{-4}, 10^{-4})\}$ and $(\mu_t, \sigma_t) \in \{(1.0, 0.5), (1.0, 1.0), (1.5, 1.0)\}$³. The performance of GGMx with different hyperparameters is reported in Table 2, which shows that GGMx is robust within the considered range.

Table 2:

Sensitivity Analysis. Operating characteristics for simulations under six alternative hyperparameter settings. The numbers are calculated on the basis of 50 repetitions; standard deviations are within parentheses. The first row shows the performance of GGMx in Scenario VII with the default hyperparameter setting $(a_\tau, b_\tau) = (10^{-1}, 10^{-1})$ and $(\mu_t, \sigma_t) = (1, 0.2)$.

| Hyperparameters | TPR | FDR | MCC | pTPR | pFDR | pMCC |
|---|---|---|---|---|---|---|
| Default setting | 0.94 (0.04) | 0.20 (0.12) | 0.86 (0.07) | 0.94 (0.04) | 0.01 (0.01) | 0.81 (0.09) |
| $(a_\tau, b_\tau) = (10^{-2}, 10^{-2})$ | 0.93 (0.04) | 0.15 (0.10) | 0.89 (0.06) | 0.93 (0.04) | 0.01 (0.01) | 0.81 (0.09) |
| $(a_\tau, b_\tau) = (10^{-3}, 10^{-3})$ | 0.93 (0.04) | 0.10 (0.08) | 0.91 (0.05) | 0.93 (0.04) | 0.01 (0.01) | 0.80 (0.08) |
| $(a_\tau, b_\tau) = (10^{-4}, 10^{-4})$ | 0.93 (0.04) | 0.07 (0.07) | 0.93 (0.04) | 0.93 (0.04) | 0.01 (0.01) | 0.80 (0.08) |
| $(\mu_t, \sigma_t) = (1.0, 0.5)$ | 0.97 (0.03) | 0.33 (0.12) | 0.80 (0.08) | 0.97 (0.03) | 0.02 (0.02) | 0.82 (0.07) |
| $(\mu_t, \sigma_t) = (1.0, 1.0)$ | 0.98 (0.02) | 0.30 (0.13) | 0.82 (0.08) | 0.98 (0.02) | 0.03 (0.02) | 0.81 (0.08) |
| $(\mu_t, \sigma_t) = (1.5, 1.0)$ | 0.97 (0.03) | 0.19 (0.11) | 0.88 (0.07) | 0.97 (0.03) | 0.03 (0.02) | 0.81 (0.08) |

7. Application in Multiple Myeloma

We present an application of GGMx to modeling transcriptomic regulation in multiple myeloma (MM), a late-stage malignancy of plasma cells. Recent research has shifted the focus from traditional "one size fits all" therapies to precision medicine strategies because MM is a highly heterogeneous genetic disease at the individual level (Hervé et al., 2011). To find better personalized treatments and more accurate prescriptive recommendations for MM patients, a better understanding of the heterogeneity based on genomically defined pathways is needed (Lohr et al., 2014). We use data generated by the Multiple Myeloma Research Consortium, a multi-institutional collaborative research effort that collected data on, among other things, gene expression and clinical parameters from MM patients (Chapman et al., 2011).

We focus our analyses on the genes mapped to one of the most important pathways in MM, the NF-κB signaling pathway. Activation of the NF-κB pathway has been implicated in MM, but the genomic foundation of this activation is only partially understood (Demchenko et al., 2010; Roy et al., 2018). The clinical information includes measurements of two important prognostic factors, serum beta-2 microglobulin (Sβ2M) and serum albumin. The International Staging System (Greipp et al., 2005) uses these two prognostic factors to stage MM: stage I, Sβ2M < 3.5 mg/L and serum albumin ≥ 3.5 g/dL; stage II, neither stage I nor III; and stage III, Sβ2M ≥ 5.5 mg/L. The observed values of Sβ2M and serum albumin, and the staging partition, are depicted in Figure 6. We use these two prognostic factors as covariates (q = 2).

Figure 6: Observed prognostic factors are shown as crosses and dots. Dots are chosen as representative cases for network visualization in Figure 8. Triangles are used to interpolate networks for unseen patients, shown in Figure 9. The prognostic covariate space is partitioned into Stages I, II, and III according to the International Staging System for multiple myeloma.

The goal of this study was to infer subject-level gene expression networks whose structures are modified by the prognostic factors. After removing outliers and samples with missing gene expression or clinical information, we had n = 151 samples and p = 33 genes. We ran two separate MCMC chains, each with 50,000 iterations, discarded the first 50% as burn-in, and saved every 50th sample after burn-in. To check MCMC convergence, we calculated the potential scale reduction factor (PSRF, Gelman et al. 1992) for each entry in $\Omega_i$, $i = 1, \ldots, n$. The median PSRF was 1.00 with an interquartile range of 0.01, which indicated no lack of convergence. We then concatenated the two chains, and all subsequent inference was based on the combined Monte Carlo samples. The probability cutoff c was chosen to control the posterior expected FDR at 1%.

Population-level inference

The estimated graphs had 30 edges per subject on average, with a minimum of 20 edges (from a stage III patient) and a maximum of 37 edges (from a stage I patient). We summarized a population-level gene expression network $G = (V, E)$ as the union of all networks across subjects, $E = \cup_{i=1}^n E_i$. There were $|E| = 42$ edges in $G$. To visualize the graph variability, we computed the variance of edge inclusion. Specifically, let $e_{jk} = (e_{1jk}, \ldots, e_{njk})$ be a binary vector such that $e_{ijk} = 1$ if $\{j,k\} \in E_i$. Then, for edge $\{j,k\}$, the variance of edge inclusion was defined as the sample variance of $e_{jk}$. The population-level network is reported in Figure 7, with the edge width proportional to the edge inclusion variability. We found 14 out of 42 edges with variance greater than 0.2 (note that the maximum variance is 0.25 for a Bernoulli random variable). These 14 edges appeared in about 30%-70% of the patients. In line with our simulation studies, traditional GGMs are unlikely to accurately capture these differential edges.
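A sketch of the edge-inclusion variance summary (assuming the subject-level adjacency matrices are stacked in an $n \times p \times p$ array):

```python
import numpy as np

def edge_inclusion_variance(adjacencies):
    """adjacencies: (n, p, p) boolean array, adjacencies[i] = graph of subject i.

    Returns the (p, p) matrix of sample variances of e_jk across subjects;
    the value is at most 0.25 for a binary indicator.
    """
    e = adjacencies.astype(float)
    return e.var(axis=0, ddof=1)
```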

Figure 7: Population-level summary of the gene expression network. The network is the union of all networks across subjects. The edge width is proportional to the edge inclusion variability.

Subject-level inference

Next, we focus on subject-level inference. We chose 6 representative patients, 2 from each stage, and show their respective networks in Figure 8. The values of their prognostic factors are represented by the dots in Figure 6. We set the edge width proportional to the absolute value of the partial correlation $\rho_{ijk} = \omega_{ijk} / \sqrt{\omega_{ijj}\, \omega_{ikk}}$, and use solid lines to represent positive partial correlations and dashed lines negative partial correlations.

Figure 8: Subject-level networks for six representative patients, represented as dots in Figure 6. The edge width is proportional to the absolute value of the partial correlation. The sign of the partial correlation is represented by the line type: solid for positive and dashed for negative.

We highlight several interesting biological findings. RELB was found to be a highly connected gene across all patients (Figures 7 and 8). RELB is a core member of the NF-κB family, so it is not surprising that RELB played an important role in the NF-κB pathway. In fact, many MM patients have abnormal NF-κB target gene expression associated with genetic aberrations of NFKB1 and NFKB2 (Annunziata et al., 2007), which is consistent with our finding that RELB was consistently positively associated with NFKB1 and NFKB2. In addition, NFKBIA is an inhibitor of NF-κB, which is consistent with our finding that NFKBIA was negatively associated with RELB across patients. It is also known that genes in the same family tend to be positively associated with each other; our study found positive links such as BIRC2—BIRC3 and NFKBIA—NFKBIZ. As the disease progresses, some paths get blocked and some new connections are acquired. Among others, the link between LTB and TNFRSF13B was found in stage III patients but not in stage I patients, whereas the link between NFKBIL2 and MAP3K7IP2 was lost in stage III patients. While some of these links are well documented in the biological literature (Liu et al., 2017), their gain and loss mechanisms need further validation and investigation.

Finally, as new patients come into the clinic, GGMx can be used to quickly predict the individualized gene network based only on blood test results for Sβ2M and serum albumin, without costly and time-consuming whole-genome sequencing. For illustration, we picked two sets of covariates that were unobserved in our collected data; they are represented by triangles in Figure 6. The estimated gene expression networks of the two hypothetical patients are shown in Figure 9, enabled by the unique graph interpolation feature of the proposed GGMx.

Figure 9: Network interpolation for two sets of unseen prognostic factors, represented as triangles in Figure 6.

8. Discussion

In this article, we introduced a general regression framework for (undirected) Gaussian graphical models with covariates (GGMx). This generalization of regular GGMs beyond i.i.d. data allows the graph structure and strength to change with covariates and is particularly challenging in the undirected graph context due to the positive definiteness constraint on the precision matrix. We addressed this challenge through a novel prior that is theoretically connected to non-local priors for precision matrices, paired with a carefully designed MCMC algorithm for efficient posterior inference. GGMx includes at least five special cases: standard GGMs, group-specific GGMs, time-varying GGMs, covariate-dependent GGMs, and context-specific GGMs. We demonstrated the utility and robustness of GGMx through extensive simulations and an application in precision oncology. Our GGMx framework is broadly applicable to many other scientific domains. For example, in brain functional magnetic resonance imaging data, GGMx could be used to study how brain connectivity networks change with covariates such as time and stimuli.

We remark that covariance regression (Hoff and Niu, 2012; Fox and Dunson, 2015) is a closely related model. It is, however, fundamentally different from the proposed GGMx in at least two ways. First, covariance regression assumes the covariance matrix, rather than the precision matrix, to be a function of $x_i$, which takes the specific form $\Sigma_i = \Omega_i^{-1} = \Psi + \Lambda(x_i)\Lambda(x_i)^T$ for some PDM $\Psi$ and matrix-valued function $\Lambda(x_i)$. Second, covariance regression assumes a dense $\Sigma_i$, whereas GGMx allows $\Omega_i$ to be sparse and, moreover, allows the sparsity pattern to change with covariates. Note that zeros in $\Lambda(x_i)$ generally do not translate to zeros in $\Sigma_i$ or $\Omega_i$; therefore, it is not straightforward to extend the covariance regression framework to allow sparsity.

While our work is a useful first step for undirected graphical regression, several extensions and refinements are possible. We have chosen the smooth covariate-dependent functions $g_{jk}(\cdot)$ to be linear for simplicity and parsimony; the same choice has been made in similar papers (Cheng et al., 2014). In general, however, they can be replaced by nonlinear functions. For example, letting $\tilde{x}$ denote some basis expansion of $x$, such as splines or wavelets, we can model $g_{jk}(x) = \beta_{jk}^T \tilde{x}$, and the same inference procedure as with linear functions applies. We plan to incorporate nonlinearity in future work. Furthermore, we have worked with a moderate number of variables for several reasons. First, the number of parameters that need to be estimated in GGMx is on the order of $\frac{p(p+1)(q+2)}{2} + \frac{p(p-1)}{2}$, which can be large even for a moderate number of variables and covariates. Second, from an application perspective, we focus on a specific signaling pathway in multiple myeloma, NF-κB, for deeper scientific interpretation. The small sample size (relative to the number of parameters) does not allow reliable inference for a much larger number of variables (e.g., the entire transcriptomic profile). Finally, the scalability of the proposed GGMx also limits the number of variables under consideration. The scalability can potentially be improved by adopting more efficient MCMC algorithms such as the Metropolis-adjusted Langevin algorithm (Roberts et al., 1996) or Hamiltonian Monte Carlo (Duane et al., 1987). Both algorithms take advantage of gradient information of the target distribution; however, the hard thresholding function in (3) is discontinuous. This difficulty can potentially be overcome by considering a continuous relaxation of the hard thresholding function (Cai et al., 2018). Another potential solution is to resort to variational Bayes algorithms, which approximate the posterior distribution by simpler variational distributions through minimizing the Kullback–Leibler divergence between them. We hope to address the scalability issue in future work.

Acknowledgement

YN was partially supported by NSF DMS-2112943. VB was partially supported by NIH grants R01CA244845-01A1 and P30 CA-046592 and start-up funds from the U-M Rogel Cancer Center and School of Public Health.

Footnotes

1. This is a conceptual statement. Note that existing time-varying GGM methods typically assume that graphs vary non-linearly with time, whereas this paper considers linearly varying graphs.

2. Note that the probability cutoff $c$ is different from the random threshold $t_{jk}$. The random threshold is a model parameter used to induce sparsity, whereas the probability cutoff is introduced to obtain a posterior point estimate of the graph.

3. The resulting prior means (variances) of the minimum effect size $t_{jk}$ are 1.0 (0.22), 1.3 (0.63), and 1.6 (0.77).

Contributor Information

Yang Ni, Department of Statistics, Texas A&M University, College Station, TX 77843, USA.

Francesco C. Stingo, Department of Statistics, Computer Science, Applications “G. Parenti”, University of Florence, Florence, Italy.

Veerabhadran Baladandayuthapani, Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA.

References

  1. Altomare Davide, Consonni Guido, and La Rocca Luca. Objective Bayesian search of Gaussian directed acyclic graphical models for ordered variables with non-local priors. Biometrics, 69(2):478–487, 2013.
  2. Annunziata Christina M, Davis R Eric, Demchenko Yulia, Bellamy William, Gabrea Ana, Zhan Fenghuang, Lenz Georg, Hanamura Ichiro, Wright George, Xiao Wenming, et al. Frequent engagement of the classical and alternative NF-κB pathways by diverse genetic abnormalities in multiple myeloma. Cancer Cell, 12(2):115–130, 2007.
  3. Bhadra Anindya and Mallick Bani K. Joint high-dimensional Bayesian variable and covariance selection with an application to eQTL analysis. Biometrics, 69(2):447–457, 2013.
  4. Cai Qingpo, Kang Jian, Yu Tianwei, et al. Bayesian network marker selection via the thresholded graph Laplacian Gaussian prior. Bayesian Analysis, 2018.
  5. Chapman Michael A, Lawrence Michael S, Keats Jonathan J, Cibulskis Kristian, Sougnez Carrie, Schinzel Anna C, Harview Christina L, Brunet Jean-Philippe, Ahmann Gregory J, Adli Mazhar, et al. Initial genome sequencing and analysis of multiple myeloma. Nature, 471(7339):467–472, 2011.
  6. Cheng Jie, Levina Elizaveta, Wang Pei, and Zhu Ji. A sparse Ising model with covariates. Biometrics, 2014.
  7. Danaher Patrick, Wang Pei, and Witten Daniela M. The joint graphical lasso for inverse covariance estimation across multiple classes. Journal of the Royal Statistical Society: Series B, 76(2):373–397, 2014.
  8. Demchenko Yulia N, Glebov Oleg K, Zingone Adriana, Keats Jonathan J, Bergsagel P Leif, and Kuehl W Michael. Classical and/or alternative NF-κB pathway activation in multiple myeloma. Blood, 115(17):3541–3552, 2010.
  9. Dobra Adrian, Hans Chris, Jones Beatrix, Nevins Joseph R, Yao Guang, and West Mike. Sparse graphical models for exploring gene expression data. Journal of Multivariate Analysis, 90(1):196–212, 2004.
  10. Dobra Adrian, Lenkoski Alex, and Rodriguez Abel. Bayesian inference for general Gaussian graphical models with application to multivariate lattice data. Journal of the American Statistical Association, 106(496):1418–1433, 2011.
  11. Drton Mathias and Maathuis Marloes H. Structure learning in graphical modeling. Annual Review of Statistics and Its Application, 4:365–393, 2017.
  12. Duane Simon, Kennedy Anthony D, Pendleton Brian J, and Roweth Duncan. Hybrid Monte Carlo. Physics Letters B, 195(2):216–222, 1987.
  13. Fox Emily B and Dunson David B. Bayesian nonparametric covariance regression. The Journal of Machine Learning Research, 16(1):2501–2542, 2015.
  14. Friedman Jerome, Hastie Trevor, and Tibshirani Robert. Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3):432–441, 2008.
  15. Gan Lingrui, Narisetty Naveen N, and Liang Feng. Bayesian regularization for graphical models with unequal shrinkage. Journal of the American Statistical Association, 114(527):1218–1231, 2019.
  16. Gelman Andrew and Rubin Donald B. Inference from iterative simulation using multiple sequences. Statistical Science, 7(4):457–472, 1992.
  17. Green Peter J and Thomas Alun. Sampling decomposable graphs using a Markov chain on junction trees. Biometrika, 100(1):91, 2013.
  18. Greipp Philip R, San Miguel Jesus, Durie Brian GM, et al. International staging system for multiple myeloma. Journal of Clinical Oncology, 23(15):3412–3420, 2005.
  19. Guo Jian, Levina Elizaveta, Michailidis George, and Zhu Ji. Joint estimation of multiple graphical models. Biometrika, 98(1):1–15, 2011.
  20. Avet-Loiseau Hervé, Magrangeas Florence, Moreau Philippe, Attal Michel, Facon Thierry, Anderson Kenneth, Harousseau Jean-Luc, Munshi Nikhil, and Minvielle Stéphane. Molecular heterogeneity of multiple myeloma: pathogenesis, prognosis, and therapeutic implications. Journal of Clinical Oncology, 29(14):1893–1897, 2011.
  21. Hoff Peter D and Niu Xiaoyue. A covariance regression model. Statistica Sinica, pages 729–753, 2012.
  22. Johnson Valen E and Rossell David. On the use of non-local prior densities in Bayesian hypothesis tests. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72(2):143–170, 2010.
  23. Johnson Valen E and Rossell David. Bayesian model selection in high-dimensional settings. Journal of the American Statistical Association, 107(498):649–660, 2012.
  24. Khare Kshitij, Rajaratnam Bala, and Saha Abhishek. Bayesian inference for Gaussian graphical models beyond decomposable graphs. Journal of the Royal Statistical Society: Series B, 80(4):727–747, 2018.
  25. Kolar Mladen, Parikh Ankur P, and Xing Eric P. On sparse nonparametric conditional covariance selection. In ICML-10, pages 559–566, 2010.
  26. Liu Han, Chen Xi, Wasserman Larry, and Lafferty John D. Graph-valued regression. In Advances in Neural Information Processing Systems, pages 1423–1431, 2010a.
  27. Liu Han, Roeder Kathryn, and Wasserman Larry. Stability approach to regularization selection (StARS) for high dimensional graphical models. In Advances in Neural Information Processing Systems, pages 1432–1440, 2010b.
  28. Liu Ting, Zhang Lingyun, Joo Donghyun, and Sun Shao-Cong. NF-κB signaling in inflammation. Signal Transduction and Targeted Therapy, 2:17023, 2017.
  29. Lohr Jens G, Stojanov Petar, Carter Scott L, et al. Widespread genetic heterogeneity in multiple myeloma: implications for targeted therapy. Cancer Cell, 25(1):91–101, 2014.
  30. Massam Hélène. Bayesian inference in graphical Gaussian models. Handbook of Graphical Models, pages 257–282, 2018.
  31. Meinshausen Nicolai and Bühlmann Peter. High-dimensional graphs and variable selection with the lasso. The Annals of Statistics, 34(3):1436–1462, 2006.
  32. Mohammadi Abdolreza, Wit Ernst C, et al. Bayesian structure learning in sparse Gaussian graphical models. Bayesian Analysis, 10(1):109–138, 2015.
  33. Ni Yang, Müller Peter, Zhu Yitan, and Ji Yuan. Heterogeneous reciprocal graphical models. Biometrics, 74(2):606–615, 2018.
  34. Ni Yang, Stingo Francesco C, and Baladandayuthapani Veerabhadran. Bayesian graphical regression. Journal of the American Statistical Association, 114(525):184–197, 2019.
  35. Nyman Henrik, Pensar Johan, and Corander Jukka. Stratified Gaussian graphical models. Communications in Statistics-Theory and Methods, 46(11):5556–5578, 2017.
  36. Oates Chris J, Korkola Jim, Gray Joe W, Mukherjee Sach, et al. Joint estimation of multiple related biological networks. The Annals of Applied Statistics, 8(3):1892–1919, 2014.
  37. Peterson Christine, Stingo Francesco C, and Vannucci Marina. Bayesian inference of multiple Gaussian graphical models. Journal of the American Statistical Association, 110(509):159–174, 2015.
  38. Roberts Gareth O, Tweedie Richard L, et al. Exponential convergence of Langevin distributions and their discrete approximations. Bernoulli, 2(4):341–363, 1996.
  39. Rossell David and Telesca Donatello. Nonlocal priors for high-dimensional estimation. Journal of the American Statistical Association, 112(517):254–265, 2017.
  40. Rothman Adam J, Levina Elizaveta, and Zhu Ji. Sparse multivariate regression with covariance estimation. Journal of Computational and Graphical Statistics, 19(4):947–962, 2010.
  41. Roverato Alberto. Hyper inverse Wishart distribution for non-decomposable graphs and its application to Bayesian inference for Gaussian graphical models. Scandinavian Journal of Statistics, 29(3):391–411, 2002.
  42. Roy Payel, Sarkar Uday Aditya, and Basak Soumen. The NF-κB activating pathways in multiple myeloma. Biomedicines, 6(2):59, 2018.
  43. Scott James G and Carvalho Carlos M. Feature-inclusion stochastic search for Gaussian graphical models. Journal of Computational and Graphical Statistics, 17(4):790–808, 2008.
  44. Shaddox Elin, Stingo Francesco C, Peterson Christine B, Jacobson Sean, Cruickshank-Quinn Charmion, Kechris Katerina, Bowler Russell, and Vannucci Marina. A Bayesian approach for learning gene networks underlying disease severity in COPD. Statistics in Biosciences, 10(1):59–85, 2018.
  45. Shin Minsuk, Bhattacharya Anirban, and Johnson Valen E. Scalable Bayesian variable selection using nonlocal prior densities in ultrahigh-dimensional settings. Statistica Sinica, 28(2):1053, 2018.
  46. Sudderth Erik B, Wainwright Martin J, and Willsky Alan S. Embedded trees: Estimation of Gaussian processes on graphs with cycles. IEEE Transactions on Signal Processing, 52(11):3136–3150, 2004.
  47. Wang Hao. Scaling it up: Stochastic search structure learning in graphical models. Bayesian Analysis, 10(2):351–377, 2015.
  48. Wang Hao et al. Bayesian graphical lasso models and efficient posterior computation. Bayesian Analysis, 7(4):867–886, 2012.
  49. Whittaker Joe. Graphical models in applied multivariate statistics. Wiley Publishing, 2009.
  50. Xie Yuying, Liu Yufeng, and Valdar William. Joint estimation of multiple dependent Gaussian graphical models with applications to mouse genomics. Biometrika, 103(3):493–511, 2016.
  51. Yajima Masanao, Telesca Donatello, Ji Yuan, and Müller Peter. Detecting differential patterns of interaction in molecular pathways. Biostatistics, 16(2):240–251, 2015.
  52. Yin Jianxin and Li Hongzhe. A sparse conditional Gaussian graphical model for analysis of genetical genomics data. The Annals of Applied Statistics, 5(4):2630, 2011.
  53. Yuan Ming and Lin Yi. Model selection and estimation in the Gaussian graphical model. Biometrika, 94(1):19–35, 2007.
  54. Zhou Shuheng, Lafferty John, and Wasserman Larry. Time varying undirected graphs. Machine Learning, 80(2-3):295–319, 2010.
