Author manuscript; available in PMC: 2017 Jun 12.
Published in final edited form as: Stat (Int Stat Inst). 2016 Nov 27;5(1):322–337. doi: 10.1002/sta4.119

Semiparametric Bayes conditional graphical models for imaging genetics applications

Suprateek Kundu a,*, Jian Kang b
PMCID: PMC5467734  NIHMSID: NIHMS865227  PMID: 28616224

Abstract

Motivated by the need to understand neurological disorders, large-scale imaging genetics studies are being increasingly conducted. A salient objective in such studies is to identify important neuroimaging biomarkers, such as the brain functional connectivity, as well as genetic biomarkers, which are predictive of disorders. However, typical approaches for estimating the group level brain functional connectivity do not account for potential variation resulting from demographic and genetic factors, while usual methods for discovering genetic biomarkers do not factor in the influence of the brain network on the imaging phenotype. We propose a novel semiparametric Bayesian conditional graphical model for joint variable selection and graph estimation, which simultaneously estimates the brain network after accounting for heterogeneity and infers significant genetic biomarkers. The proposed approach specifies a prior on the regression coefficients that clusters brain regions having similar covariate-dependent activation patterns, leading to dimension reduction. A novel graphical prior is proposed, which encourages modularity in brain organization by specifying denser connections within clusters and sparser connections across clusters. The posterior computation proceeds via Markov chain Monte Carlo. We apply the approach to data obtained from the Alzheimer’s disease neuroimaging initiative and demonstrate numerical advantages via simulation studies.

Keywords: brain functional network, conditional graphical model, imaging genetics, modularity, semiparametric Bayes, variable selection

1 Introduction

During the last two decades, tremendous progress has been made in both neuroimaging and high-throughput genotyping technology, which has resulted in the development of an emergent interdisciplinary field known as imaging genetics, focusing on the genetic dissection of neuroimaging and clinical phenotypes. The goal of imaging genetics studies is to discover the brain-wide, genome-wide association patterns, which drive complex neurological disorders [such as Alzheimer’s disease (AD), autism spectrum disorder and major depressive disorder]. A key objective in these studies is to characterize important neuroimaging and genetic biomarkers, which are associated with psychological disorders.

One important neuroimaging biomarker that has shown tremendous promise is the group level brain functional connectivity (Biswal et al., 1995; Smith et al., 2012; Huang et al., 2010; Kim et al., 2015), which characterizes the coherence of the neural activities among distinct brain regions for a collection of subjects. However, typical approaches for estimating the group level brain network often fail to account for heterogeneity across subjects resulting from demographic, clinical and genetic variations. This may lead to spurious associations and erroneous inferences. In addition to functional connectivity, several genetic biomarkers have been shown to be predictive of neurological disorders. Such biomarkers are often inferred by modelling the association between gene products/variants and the brain imaging phenotype (Stein et al., 2010; Zhu et al., 2014; Stingo et al., 2013), because some neuroimaging traits are known to be closer to the action of the gene than clinical phenotypes (Mier et al., 2010; Munafo et al., 2008). However, existing approaches for detecting such associations usually do not take into account the underlying brain functional network influencing the imaging phenotype.

To our knowledge, the body of work for inferring genetic associations with the imaging phenotype to discover genetic biomarkers for neurological disorders and the literature on estimating the brain functional connectivity have developed in a largely independent manner. In fact, there is a scarcity of approaches that can achieve the two goals simultaneously. To bridge this gap, we propose to jointly (i) estimate the group level brain functional network, after accounting for extrinsic sources of variation, and (ii) infer significant genetic and demographic associations with the imaging phenotype, leading to the discovery of important biomarkers. In addition, the proposed approach identifies functional modules, which contain brain regions having similar covariate-dependent activation patterns, and deciphers the connectivity within each of these modules. The emphasis on functional modules is motivated by the modular organization of brain networks, a well-known feature of brain organization confirmed by several previous studies (Meunier et al., 2010).

A natural approach to fulfilling the aforementioned goals is the conditional Gaussian graphical model, which structures the multivariate outcome as a sum of a linear term involving covariates and a Gaussian residual encapsulating the graphical structure. The estimated graph under the conditional Gaussian graphical model provides a meaningful group level brain network comprising intrinsic connections, after teasing out effects of third-party nodes and external sources of variation. Another advantage of this model is the ability to compare brain networks across multiple groups, where it is imperative to account for variation because of genetic and demographic factors in order to make the comparisons meaningful. This is particularly useful for our real-data application, where we seek to compare the functional connectivity between subjects with AD, subjects with mild cognitive impairment and healthy individuals. In this work, we impose sparsity on brain networks, which is supported by findings showing that a brain region usually interacts with only a few other regions in neurological processes (Stam et al., 2007; Supekar et al., 2008).

To our knowledge, the literature on conditional graphical models is limited, with the primary focus being on genetic studies. Frequentist approaches such as Yin & Li (2011), Li et al. (2012) and Cai et al. (2013) mainly focus on graph estimation after adjusting for covariates. However, the performance of the variable selection by those methods is not well assessed. On the other hand, the Bayesian approach proposed by Bhadra & Mallick (2013) assumes the same inclusion status for each covariate across all nodes (imaging phenotypes in our case), which makes this approach clearly inadequate for imaging genetics applications.

A key challenge in conditional graphical models is achieving good variable selection and graphical model estimation simultaneously, with these two goals being closely intertwined. In particular, model misspecification in terms of an overly sparse coefficient matrix can lead to spurious associations between nodes because of the lack of adjustment for confounders (as noted by Yin & Li, 2011), while an artificially dense coefficient matrix is expected to cause overfitting, which may result in poor estimates for the strength of associations (as evidenced in Section 3). Ideally, a parsimonious Bayesian approach is desirable, which balances the two goals while providing uncertainty quantification to address the heterogeneity inherent in imaging genetics applications.

To achieve objectives (i) and (ii), we propose a flexible Bayesian conditional graphical model for joint covariate selection and graphical model estimation. The proposed model clusters the columns of the regression coefficient matrix under an infinite mixture of Laplace prior, which results in groups of nodes having similar activation patterns depending on covariates. The prior on the regression coefficients shrinks unimportant effect sizes to zero and yields groups of nodes that are related to covariates by similar magnitudes. The brain functional connectivity is estimated by a novel class of semiparametric graphical priors depending on the unknown cluster allocations, which specify sparse associations across clusters, but allow for denser connectivity within a cluster. Such a prior is motivated by modular brain networks (Nicolini & Bifone, 2016) and encourages functional modules characterized by distinct subnetworks. The method is straightforward to implement via a Markov chain Monte Carlo (MCMC). We design a post-MCMC approach involving multiplicity corrections for variable selection, which is able to identify important covariates influencing the imaging phenotype, as well as the functional modules.

We note that the emphasis on functional modules and corresponding subnetworks stems from the concept of modular brain networks, which describes the brain as a network of interconnected components comprising anatomically and/or functionally related brain regions (Sporns, 2010). Modular systems naturally tend to be small-world networks (Pan & Sinha, 2009), which is a well-established property in human brain organization. Moreover, widely used multivariate methods based on principal component analysis or independent component analysis have confirmed that the brain can typically be decomposed into subsystems of functionally connected brain regions (Guo, 2011; Calhoun et al., 2008). Another recent work by Wang et al. (2016) demonstrated that the brain network (derived using partial correlations) can be divided into different modules, with sparse connections across these modules but denser connections within each module. Motivated by such existing studies, we propose a data adaptive approach to estimate a modular brain network after accounting for covariate information, where the functional modules are learned from the data.

The paper is organized as follows. Section 2 proposes our semiparametric conditional graphical model and develops a posterior computation scheme, Section 3 lays out our simulation study, Section 4 applies our methods to the analysis of Alzheimer’s disease neuroimaging initiative (ADNI) data, and Section 5 concludes with a discussion.

2 Methodology

2.1 Semiparametric conditional graphical models

Let X and Z be the n × p and n × q dimensional outcome and covariate matrices, respectively, with the i-th row of X and Z being denoted as xi and zi, i = 1, …, n. In our imaging genetics applications, xi corresponds to the multivariate imaging phenotype, while zi denotes the supplementary genetic and demographic information for the i-th individual. We assume that the rows of X have been centred; thus, it is not necessary to include an intercept term. Consider the following conditional graphical model:

x_i = z_i (\beta_1, \ldots, \beta_p) + \epsilon_i, \quad \epsilon_i \sim N(0, \Sigma_G), \quad \Sigma_G \sim \pi(\Sigma_G \mid G), \quad \beta_k \sim F, \quad F = \sum_{l=1}^{\infty} w_l \, \delta_{\eta_l}, \quad \eta_l \sim \prod_{j=1}^{q} \mathrm{DE}(\eta_{lj}; \lambda_l),     (1)

where δθ denotes a point mass at θ, N(·) and DE(·) denote Gaussian and double exponential/Laplace distributions, respectively, εi denotes the residual, βk = (βk1, …, βkq)T is the vector of regression coefficients, which captures the effect of covariates on the k-th outcome measurement (k = 1, …, p), and ΣG is the covariance matrix, which is defined conditional on the graph G. The prior on the covariance matrix and the graph space is discussed in detail in (2). We denote B = (β1, …, βp) as the coefficient matrix, so that xi = ziB + εi in (1).

The prior on the regression coefficients in (1) follows an infinite mixture of Laplace distributions, with the k-th component having a shrinkage parameter λk, and an associated weight wk, k = 1, …, ∞. The weights are structured as stick-breaking weights, so that wk = νk Π_{l<k}(1 − νl), νk ~ Beta(1, M), with Σ_{k=1}^{∞} wk = 1. The mixture of Laplace prior enables dimension reduction by clustering the columns of B into distinct groups and encourages shrinkage of unimportant effect sizes towards zero. Each resulting cluster comprises nodes, which have similar activation patterns and are related to the covariates by similar magnitudes. In the special case when λk = λ for all k, the prior on the regression coefficients reduces to a Dirichlet process mixture of Laplace distributions (Sethuraman, 1994). The total number of clusters (H) is random and increases with the precision parameter M. The ingenuity of our approach lies in proposing a novel class of semiparametric graphical priors in (2), which translates the parsimony implied by the clustering of the columns of B into sparsity in the precision matrix, via a modular structure.
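
To make the prior construction concrete, the sketch below (our own illustration, not the authors' code) simulates a draw of B from a truncated version of this prior: stick-breaking weights with precision M, one shrinkage parameter per cluster atom, and each column of B assigned to a cluster atom drawn as q independent Laplace variates. The truncation level H_trunc and the Gamma draw for the λ's are illustrative choices.

import numpy as np

def sample_mixture_of_laplace_prior(p, q, M=1.0, H_trunc=20, seed=None):
    # Draw a q x p coefficient matrix B from a truncated stick-breaking
    # mixture of Laplace distributions (illustrative sketch of the prior in (1)).
    rng = np.random.default_rng(seed)
    # Stick-breaking weights: w_k = v_k * prod_{l<k} (1 - v_l), v_k ~ Beta(1, M)
    v = rng.beta(1.0, M, size=H_trunc)
    w = v * np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
    w = w / w.sum()                                   # renormalize after truncation
    # One shrinkage parameter per cluster atom (illustrative Gamma draw)
    lam = rng.gamma(2.0, 1.0, size=H_trunc)
    # Cluster atoms: eta_h is a q-vector of Laplace variates with scale 1 / lambda_h
    eta = np.stack([rng.laplace(0.0, 1.0 / lam[h], size=q) for h in range(H_trunc)])
    # Assign each of the p columns of B to a cluster and copy the cluster atom
    s = rng.choice(H_trunc, size=p, p=w)
    B = eta[s].T                                      # q x p coefficient matrix
    return B, s, w

B, s, w = sample_mixture_of_laplace_prior(p=42, q=546, M=1.0, seed=0)
print(B.shape, "columns grouped into", len(np.unique(s)), "clusters")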

We assume that the support of the graph space is restricted to the class of decomposable graphs ℳ. To construct the prior on ℳ, first, fix the number of clusters induced under the mixture prior in (1) to be H. Further, denote the clusters as (S1, …, SH), with Sh containing the indices of the ph nodes belonging to cluster h (Σ_{h=1}^{H} ph = p). Define the edge set E under the graph G as E := {e(k, l), k < l, k, l = 1, …, p}, where e(k, l) takes values 1 or 0 depending on whether the (k, l)-th edge is present in E or not. We formalize the semiparametric graphical prior π(G | S1, …, SH), defined conditional on cluster allocations, as follows:

e(k, l) \sim \mathrm{Ber}(\omega_1) \, 1\Big( \textstyle\bigcup_{h=1}^{H} \{ k \in S_h, \, l \in S_h \} \Big) + \mathrm{Ber}(\omega_0) \, 1\Big( \textstyle\bigcup_{h \neq h'} \{ k \in S_h, \, l \in S_{h'} \} \Big), \quad k \neq l,
\omega_1 \sim \mathrm{Be}(a_{\omega,1}, b_{\omega,1}), \quad \omega_0 \sim \mathrm{Be}(a_{\omega,0}, b_{\omega,0}), \quad \Sigma_G \mid G \sim \mathrm{HIW}_G(b, D),     (2)

where 1(·) is the indicator function, Be(·) denotes a beta distribution, Ber(ω) denotes the Bernoulli distribution with inclusion probability ω, and HIW_G(b, D) refers to the hyper inverse-Wishart prior (Dawid & Lauritzen, 1993; Lauritzen, 1996) with scale matrix D and b degrees of freedom. The scale matrix is assumed to be diagonal in our work, that is, D = diag(d1, …, dp), with dj ~ π(dj), j = 1, …, p. The hyper inverse-Wishart prior in (2) restricts the support of Σ_G^{−1} to a space of positive definite matrices having zero off-diagonal elements corresponding to absent edges. We refer to the prior on the covariance in (2) as the semiparametric hyper inverse-Wishart prior or spHIW, because it is defined conditional on the unknown clustering parameters. We note that a change in the number of clusters and cluster memberships under the mixture distribution in (1) will result in corresponding changes in the graphical prior in (2).

Formulation (2) specifies the edge inclusion probabilities as ω1 or ω0, depending on whether the edge corresponds to two nodes belonging to the same cluster or different clusters. We choose hyper-parameters aω,0, bω,0, to have a small prior mean, and aω,1, bω,1, to have a larger prior mean, so as to encourage a higher density of edges within clusters, and sparse edges across clusters. The proposed approach thus results in a modular structure for the graph, such that there are sparse connections between modules, but denser connections within each module. The composition of the functional modules, as well as the subnetworks associated with the modules, are learned from the data.
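
The following sketch (our own illustration; the hyper-parameter values are placeholders, and the decomposability restriction is ignored for simplicity) samples an adjacency matrix from the edge-level prior in (2) given a cluster allocation, so that within-cluster pairs use a draw of ω1 with a large prior mean and between-cluster pairs use a draw of ω0 with a small prior mean.

import numpy as np

def sample_modular_graph(clusters, a1=8.0, b1=2.0, a0=1.0, b0=20.0, seed=None):
    # Sample a p x p adjacency matrix with denser edges within clusters
    # and sparse edges across clusters, in the spirit of the prior (2).
    rng = np.random.default_rng(seed)
    clusters = np.asarray(clusters)
    p = clusters.size
    omega1 = rng.beta(a1, b1)                         # within-cluster inclusion probability
    omega0 = rng.beta(a0, b0)                         # between-cluster inclusion probability
    same = clusters[:, None] == clusters[None, :]
    prob = np.where(same, omega1, omega0)
    upper = np.triu(rng.random((p, p)) < prob, k=1)   # sample each pair k < l once
    adj = (upper | upper.T).astype(int)               # symmetrize, zero diagonal
    return adj, omega1, omega0

clusters = np.repeat([0, 1, 2], [15, 15, 12])         # a toy allocation of p = 42 nodes
A, w1, w0 = sample_modular_graph(clusters, seed=1)
print(A.sum() // 2, "edges; within prob %.2f, across prob %.2f" % (w1, w0))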

Because the prior in (2) is defined conditional on the clustering parameters, it is appealing to derive the marginal prior π(G) after integrating out these parameters. First, note that

\pi(G \mid S_1, \ldots, S_H) = K^{-1} \left( \int_0^1 \omega_1^{\, a_{\omega,1} + t_{1G} - 1} (1 - \omega_1)^{\, b_{\omega,1} + \sum_{h=1}^{H} p_h (p_h - 1)/2 - t_{1G} - 1} \, d\omega_1 \right) \times \left( \int_0^1 \omega_0^{\, a_{\omega,0} + t_{0G} - 1} (1 - \omega_0)^{\, b_{\omega,0} + p(p-1)/2 - \sum_{h=1}^{H} p_h (p_h - 1)/2 - t_{0G} - 1} \, d\omega_0 \right),     (3)

where G ∈ ℳ, K is the normalizing constant, and t1G, t0G, represent the number of edges within and across clusters, respectively. We noted previously that, when λj = λ for all j = 1, …,∞, the prior on the regression coefficients in (1) is a Dirichlet process mixture of Laplace distributions. In this special case, we can use results from Kyung et al. (2010) to obtain the following marginal form of the prior on the graph space:

\pi(G) = \frac{\Gamma(M)}{\Gamma(M + p)} \sum_{H=1}^{p} M^{H} \sum_{(S_1, \ldots, S_H) \in \mathcal{C}_H} \prod_{h=1}^{H} \Gamma(p_h) \, \pi(G \mid S_1, \ldots, S_H) \, 1(G \in \mathcal{M}),

where π(G | S1, …, SH) is defined as in (3), Γ(·) denotes the Gamma function, and the set 𝒞H contains all possible clustering allocations S1, …, SH, given H clusters.
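
Assuming the two factors in (3) are evaluated as beta functions (the standard beta-binomial marginalization of ω1 and ω0; the code is our own sketch rather than the authors' implementation), the conditional log prior of a graph given a clustering depends only on the within- and across-cluster edge counts:

import numpy as np
from scipy.special import betaln

def log_graph_prior_given_clusters(adj, clusters, a1=8.0, b1=2.0, a0=1.0, b0=20.0):
    # log pi(G | S_1,...,S_H) up to an additive constant, with omega_1 and
    # omega_0 integrated out against their beta priors (cf. equation (3)).
    adj = np.asarray(adj, dtype=bool)
    clusters = np.asarray(clusters)
    p = clusters.size
    iu = np.triu_indices(p, k=1)
    within = (clusters[:, None] == clusters[None, :])[iu]   # is pair (k, l) in the same cluster?
    t1 = int(adj[iu][within].sum())                   # t_{1G}: edges within clusters
    t0 = int(adj[iu][~within].sum())                  # t_{0G}: edges across clusters
    m1 = int(within.sum())                            # number of within-cluster pairs
    m0 = p * (p - 1) // 2 - m1                        # number of across-cluster pairs
    return (betaln(a1 + t1, b1 + m1 - t1) - betaln(a1, b1)
            + betaln(a0 + t0, b0 + m0 - t0) - betaln(a0, b0))

# toy check: an empty graph on 10 nodes split into two clusters of 5
empty = np.zeros((10, 10), dtype=int)
print(log_graph_prior_given_clusters(empty, np.repeat([0, 1], 5)))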

2.2 Variable selection

We propose a post-MCMC variable selection approach, which proceeds by constructing credible regions accounting for multiplicity corrections. The variable selection enables us to infer (i) covariate effects on individual nodes and (ii) subsets of covariates determining clusters of nodes. It is understood that the covariates that affect a particular cluster may influence one or more connections between nodes in that cluster. Further, a group of covariates may affect more than one cluster, in which case it is possible that these covariates explain the dependence between such codependent clusters.

We construct rectangular credible regions incorporating multiplicity corrections as 𝒟 := {βjk : |βjk|/std(βjk) ≤ Uα*, j = 1, …, q, k = 1, …, p}, where std(βjk) is the standard deviation of βjk, and α* is the multiplicity adjusted width of the credible intervals. These credible intervals enable us to test a set of local hypotheses H0,jk : |βjk| ≤ Ujk versus H1,jk : |βjk| > Ujk for j = 1, …, q, k = 1, …, p, where the threshold Ujk for each regression coefficient is adjusted according to its standard deviation and hence differs from hard thresholding approaches, which choose a fixed threshold. The local hypothesis tests can be performed using a t-test at a significance level α* = α/(pq) under a Bonferroni correction. Although it is straightforward to use more sophisticated alternatives such as the false discovery rate approach (Benjamini & Hochberg, 1995), a simple Bonferroni correction performs adequately for our simulation studies and data applications.
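
A minimal sketch of this selection rule follows (our own illustration; beta_samples is a hypothetical array of retained posterior draws of B, and a normal quantile is used in place of the t quantile for simplicity): a coefficient is selected when its posterior mean, standardized by its posterior standard deviation, exceeds the Bonferroni-adjusted two-sided threshold.

import numpy as np
from scipy import stats

def select_covariates(beta_samples, alpha=0.05):
    # Bonferroni-corrected post-MCMC variable selection.
    # beta_samples: posterior draws of B with shape (T, q, p).
    T, q, p = beta_samples.shape
    mean = beta_samples.mean(axis=0)
    sd = beta_samples.std(axis=0, ddof=1)
    alpha_star = alpha / (p * q)                      # Bonferroni correction over all (j, k)
    u = stats.norm.ppf(1.0 - alpha_star / 2.0)        # two-sided threshold (normal approximation)
    return np.abs(mean) / sd > u                      # q x p indicator of selected effects

# hypothetical usage: 1,000 retained draws for q = 50 covariates and p = 20 nodes
draws = 0.1 * np.random.default_rng(2).normal(size=(1000, 50, 20))
print(select_covariates(draws).sum(), "coefficients selected")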

2.3 Posterior computation

We propose an efficient approximate posterior computation scheme using a parameter expansion strategy. Under the original formulation (1), the computation of cluster membership probabilities for the different columns of B would require p matrix inversions of order p − 1 each, which can be computationally restrictive. We devise a parameter expanded model that bypasses the need to invert matrices when computing cluster memberships. We fit the modified model:

x_i = z_i B + \alpha_i + \epsilon_i, \quad \epsilon_i \sim N(0, \delta I_p), \quad \alpha_i \sim N(0, \Sigma_G), \quad i = 1, \ldots, n,     (4)

where αi = (αi1, …, αip)T can be interpreted as the intercept term, which captures the graph information, and δ ~ Be(aδ, bδ) is the residual variance. The prior on the graph and the covariance matrix is defined similarly as in (2). Marginalizing out the intercept in (4) yields xi ~ N(ziB, ΣG + δIp) ≈ N(ziB, ΣG) when δ ≈ 0, which essentially gives back our original formulation (1).

The computational advantage of (4) stems from the fact that all elements in the data matrix X are independent conditionally on B, α1, …, αn, δ. This allows the following form of the posterior distribution, conditional on the clustering (S1, …, SH) :

L \propto \left( \prod_{i=1}^{n} N(x_i; z_i (\eta_{s_1}, \ldots, \eta_{s_p}) + \alpha_i, \delta I_p) \, N(\alpha_i; 0, \Sigma_G) \right) \left( \prod_{h=1}^{H} \prod_{l=1}^{q} \mathrm{DE}(\eta_{hl}; \lambda_h) \right) \pi(\Sigma_G \mid G) \, \pi(G \mid S_1, \ldots, S_H),     (5)

where sj ∈ {1, …, H} denotes the cluster membership for the j-th column in B, j = 1, …, p. Under (5), it is straightforward to compute the cluster membership for a particular column independently of the other columns, in a computationally inexpensive manner that does not involve matrix inversions. In practice, the approximation under (4) is implemented by specifying a conjugate prior on δ with mode near zero and having a small variance, which results in posterior samples of δ = O(10^{−3}), where a1 = O(a2) implies that |a1/a2| is bounded. In our experience, this choice works adequately for a variety of scenarios.

We use an MCMC algorithm for the posterior computation, which proceeds by (i) updating the cluster memberships and cluster atoms conditional on other parameters; (ii) updating the graph conditional on the cluster memberships, and then updating the inverse covariance matrix conditional on the graph; and (iii) updating the intercepts and residual variance conditional on the other parameters. We update the graph using a Metropolis–Hastings step in a manner similar to Bhadra & Mallick (2013), where the proposal distribution changes a non-zero element in the adjacency matrix to a zero element with probability 1 − aG, and the reverse proposal occurs with probability aG. Except for the graph, all remaining parameters in (5) can be sampled via closed form posteriors. The MCMC steps are described in detail in the Supporting Information section.
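
To illustrate the add/delete move, the sketch below (our own simplified version; it ignores the decomposability restriction and the marginal likelihood terms, which the full sampler would include in the acceptance ratio) proposes a new graph by flipping one edge and returns the log proposal ratio needed for the Metropolis–Hastings acceptance step.

import numpy as np

def propose_edge_flip(adj, a_G=0.5, seed=None):
    # Propose deleting an existing edge with probability 1 - a_G or adding an
    # absent edge with probability a_G; assumes the current graph has at least
    # one present and one absent edge. Returns the proposed adjacency matrix
    # and log q(G | G') - log q(G' | G).
    rng = np.random.default_rng(seed)
    p = adj.shape[0]
    iu = np.triu_indices(p, k=1)
    present = np.flatnonzero(adj[iu] == 1)
    absent = np.flatnonzero(adj[iu] == 0)
    add = rng.random() < a_G
    pool = absent if add else present
    pick = pool[rng.integers(pool.size)]
    k, l = iu[0][pick], iu[1][pick]
    new = adj.copy()
    new[k, l] = new[l, k] = 1 - new[k, l]
    if add:   # forward: choose "add" and one absent edge; reverse: delete that edge
        log_q_ratio = np.log((1 - a_G) / (present.size + 1)) - np.log(a_G / absent.size)
    else:     # forward: choose "delete" and one present edge; reverse: add it back
        log_q_ratio = np.log(a_G / (absent.size + 1)) - np.log((1 - a_G) / present.size)
    return new, log_q_ratio

adj0 = np.zeros((10, 10), dtype=int); adj0[0, 1] = adj0[1, 0] = 1   # toy graph with one edge
new_adj, log_q = propose_edge_flip(adj0, a_G=0.5, seed=4)
print("edges before/after:", adj0.sum() // 2, new_adj.sum() // 2)

In the full sampler, such a proposal would be accepted with probability min{1, exp(change in log target + log_q)}, where the target combines the likelihood contribution and the conditional graph prior π(G | S1, …, SH).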

2.3.1 Inferring optimal clustering and point estimate for the graph

Our computation yields posterior samples of the cluster membership allocations for each column of B. In order to estimate functional modules, we compute the optimal clustering over MCMC iterations using the least squares criterion of Dahl (2006). Denote by 𝒮(m) the vector of cluster allocations at the m-th MCMC iteration. The optimal clustering is selected as

\mathcal{S}^{*} = \arg\min_{\mathcal{S}^{(m)}, \, m = 1, \ldots, T} \sum_{i=1}^{p} \sum_{j=1}^{p} \left( \Delta_{i,j}(\mathcal{S}^{(m)}) - \hat{\pi}_{i,j} \right)^{2},

where Δi,j(𝒮(m)) = 1 if (i, j) belong to the same cluster under 𝒮(m), and 0 otherwise, m = 1, …, T, and π̂ is the estimated matrix of pair-wise probabilities of belonging to the same cluster, computed over all MCMC iterations. The final estimated graph structure is computed in a manner consistent with this optimal clustering, by computing the marginal inclusion probabilities of edges using MCMC samples corresponding to the clustering 𝒮*, and including edges with high probabilities.
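
The following sketch (our own illustration) implements this least squares selection: it forms the pairwise co-clustering indicator for each saved iteration, averages them to estimate π̂, and returns the saved clustering closest to π̂.

import numpy as np

def dahl_optimal_clustering(allocations):
    # Select the saved clustering closest, in least squares, to the matrix of
    # estimated pairwise co-clustering probabilities (Dahl, 2006).
    # allocations: integer array of shape (T, p), one clustering per iteration.
    allocations = np.asarray(allocations)
    # delta[m, i, j] = 1 if nodes i and j share a cluster at iteration m
    delta = (allocations[:, :, None] == allocations[:, None, :]).astype(float)
    pi_hat = delta.mean(axis=0)                       # estimated pairwise probabilities
    losses = ((delta - pi_hat) ** 2).sum(axis=(1, 2)) # least squares loss per iteration
    best = int(losses.argmin())
    return allocations[best], best

# hypothetical usage with T = 200 saved clusterings of p = 42 nodes
alloc = np.random.default_rng(3).integers(0, 3, size=(200, 42))
s_star, m_star = dahl_optimal_clustering(alloc)
print("optimal clustering taken from saved iteration", m_star)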

3 Simulation studies

3.1 Description

We consider three simulation settings (Cases I–III) with varying dimensions involving a true model of the form xi ~ N(ziB0, Σ0), where Σ0 is the true covariance matrix and B0 = (β01, …, β0p) is the true coefficient matrix. For Cases I and II, the number of non-zero rows in the coefficient matrix B0 is 10 and 5, respectively, where the elements in these non-zero rows are randomly set to 2, 3 or 0, and the proportion of zeros is high to ensure a sparse coefficient matrix. For both cases, the inverse covariance matrix Ω0 = Σ0^{−1} is generated as follows. First, we generate Σ* having elements σ*(l, l′) = 0.5(| |l − l′| + 1 |^{1.4} − 2 |l − l′|^{1.4} + | |l − l′| − 1 |^{1.4}), l, l′ = 1, …, p, which corresponds to a fractional Gaussian noise process with Hurst parameter 0.7. We then invert Σ* to obtain Ω* and subsequently set all off-diagonal elements of Ω* with absolute value less than 0.05 to zero, to obtain Ω1. Finally, we rescale the diagonal elements of Ω1 as ω_{1,kk} = 0.1 + Σ_{j≠k} |ω_{1,jk}| to obtain a diagonally dominant matrix, which is positive definite and is denoted as Ω0 = Σ0^{−1}. This is the true precision matrix that is used to generate the data. The true graph G0 is obtained by including all edges corresponding to an absolute partial correlation greater than 0. Note that the true model violates the clustering as well as the block diagonal assumptions inherent in the proposed methodology. We consider dimensions (n, p, q) = (100, 80, 100), (100, 80, 200) for Cases I and II.
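
The data-generating precision matrix for Cases I and II can be reproduced along the following lines (our own reading of the description above; the absolute values in the diagonal rescaling reflect our interpretation of "diagonally dominant").

import numpy as np

def simulate_fgn_precision(p, hurst=0.7, cut=0.05, diag_pad=0.1):
    # Fractional Gaussian noise covariance, inverted, thresholded, and made
    # diagonally dominant, following the Case I-II description.
    idx = np.arange(p)
    d = np.abs(idx[:, None] - idx[None, :]).astype(float)
    # fGn autocovariance: 0.5 * (|d + 1|^(2H) - 2 |d|^(2H) + |d - 1|^(2H)), 2H = 1.4
    sigma = 0.5 * (np.abs(d + 1) ** (2 * hurst) - 2 * d ** (2 * hurst)
                   + np.abs(d - 1) ** (2 * hurst))
    omega = np.linalg.inv(sigma)
    off = ~np.eye(p, dtype=bool)
    omega[off & (np.abs(omega) < cut)] = 0.0          # sparsify small off-diagonal entries
    # diagonal rescaling: omega_kk = 0.1 + sum over j != k of |omega_jk|
    np.fill_diagonal(omega, diag_pad + np.abs(omega * off).sum(axis=1))
    return omega

Omega0 = simulate_fgn_precision(p=80)
true_edges = (np.abs(Omega0) > 0) & ~np.eye(80, dtype=bool)   # non-zero partial correlations
print(true_edges.sum() // 2, "edges in the true graph G0")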

For Case III, we fit our model (1) and (2) to the positron emission tomography (PET) data for individuals with mild cognitive impairment (MCI) obtained from the ADNI dataset, and then use the fitted model to simulate data. The dataset in question contains PET measurements recorded for n = 121 MCI samples, with additional information on q = 546 single nucleotide polymorphisms (SNPs). The imaging measurements were summarized into p = 42 regions of interest in the brain as in Huang et al. (2010), which are outlined in Table I. These regions are distributed in the frontal, parietal, occipital and temporal lobes and are considered to be potentially related to AD. We fit our model using dichotomized SNP data taking the value 1 if the minor allele count is 1 or 2, and 0 otherwise. The fitted model used to generate the data corresponds to a high dimensional multivariate response regression model, having 546 covariates and 140 edges.

Table I.

List of 42 regions of interest for simulation Case III.

Frontal lobe: Frontal_Sup_L, Frontal_Sup_R, Frontal_Mid_L, Frontal_Mid_R, Frontal_Sup_Medial_L, Frontal_Sup_Medial_R, Frontal_Mid_Orb_L, Frontal_Mid_Orb_R, Rectus_L, Rectus_R, Cingulum_Ant_L, Cingulum_Ant_R
Parietal lobe: Parietal_Sup_L, Parietal_Sup_R, Parietal_Inf_L, Parietal_Inf_R, Precuneus_L, Precuneus_R, Cingulum_Post_L, Cingulum_Post_R
Occipital lobe: Occipital_Sup_L, Occipital_Sup_R, Occipital_Mid_L, Occipital_Mid_R, Occipital_Inf_L, Occipital_Inf_R
Temporal lobe: Temporal_Sup_L, Temporal_Sup_R, Temporal_Pole_Sup_L, Temporal_Pole_Sup_R, Temporal_Mid_L, Temporal_Mid_R, Temporal_Pole_Mid_L, Temporal_Pole_Mid_R, Temporal_Inf_L (8301), Temporal_Inf_R (8302), Fusiform_L, Fusiform_R, Hippocampus_L, Hippocampus_R, ParaHippocampal_L, ParaHippocampal_R

We compare our approach (spHIW) with (i) the sparse seemingly unrelated regression (SSUR) method of Bhadra & Mallick (2013) for simultaneous graphical model estimation and variable selection; (ii) a multivariate version of the Bayesian lasso (Park & Casella, 2008), denoted as BLASSO, designed to perform variable selection; and (iii) the frequentist graphical lasso (Friedman et al., 2008), denoted as GLASSO, for graphical model estimation without accounting for covariates. We implemented spHIW and BLASSO in MATLAB, while the MATLAB code for SSUR was obtained from the authors of that article. The GLASSO was implemented using the R package glasso.

For the Bayesian approaches, we ran 25,000 MCMC iterations with a burn-in of 5,000. The initial adjacency matrix for the proposed approach and SSUR was chosen to be the identity matrix, corresponding to a null graph, and the parameters in the hyper inverse-Wishart prior for these approaches were set as b = 3, D = dIp. We imposed a conjugate Gamma prior on d, which seemed to work well in a variety of scenarios. In addition, we specify independent Gamma priors on λl, l = 1, …, q, as well as M ~ Ga(1, 1) and δ^{−1} ~ Ga(1000, 1). All results are reported over 50 replicates.

3.2 Comparison criteria

We looked at several metrics for comparisons, including (i) out-of-sample prediction in terms of mean squared error or MSE; (ii) estimation of true regression coefficients in terms of L2 error; (iii) estimation of the precision matrix in terms of L1 error; (iv) area under the receiver operating characteristic (ROC) curve for variable selection; and (v) area under the ROC curve for graphical model estimation. The predicted test samples were obtained using posterior predictive distributions under Bayesian approaches, and this was used to compute MSE. However, it was not possible to report MSE under GLASSO because it does not incorporate covariate information. Estimation of the precision matrix and regression coefficients under our approach was based on MCMC samples corresponding to the optimal clustering as outlined in Section 2.3, while it was based on all MCMC samples for the other Bayesian approaches.

To compute the area under the curve (AUC) for variable selection accuracy under our approach and BLASSO, we examine a series of regression models obtained by including all covariates for which |β̂kl|/se(β̂kl) > tkl and excluding the remaining variables. Here, tkl is a threshold that controls the sparsity of the regression model, and β̂kl, se(β̂kl) are the estimated mean and standard errors for βkl, k = 1, …, q, l = 1, …, p. For SSUR, the AUC for variable selection was computed by examining a series of regression models obtained by varying the threshold for the posterior inclusion probabilities. On the other hand, we computed the AUC for graphical model estimation under spHIW and SSUR by examining a series of graphs obtained by varying the threshold for the posterior inclusion probabilities of edges. Again, only the MCMC samples corresponding to the optimal clustering were used to compute the graph under our approach. The AUC for graph estimation under GLASSO was obtained by examining a series of models corresponding to different values of the penalty parameter.
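
For concreteness, a sketch of the variable selection AUC computation is given below (our own illustration; scikit-learn's roc_auc_score sweeps the score |β̂|/se(β̂) over all thresholds, which is equivalent to varying tkl).

import numpy as np
from sklearn.metrics import roc_auc_score

def variable_selection_auc(beta_samples, beta_true):
    # Rank coefficients by |posterior mean| / posterior sd and score the ranking
    # against the indicator of truly non-zero coefficients.
    score = np.abs(beta_samples.mean(axis=0)) / beta_samples.std(axis=0, ddof=1)
    truth = (np.abs(beta_true) > 0).astype(int)
    return roc_auc_score(truth.ravel(), score.ravel())

# toy usage: draws centred away from zero for the first three rows of a 10 x 5 coefficient matrix
rng = np.random.default_rng(5)
true_beta = np.zeros((10, 5)); true_beta[:3, :] = 2.0
samples = rng.normal(loc=true_beta, scale=0.5, size=(500, 10, 5))
print(round(variable_selection_auc(samples, true_beta), 2))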

3.3 Results

The numerical results under all approaches are reported in Table II. Our approach exhibits a lower error for estimating the true regression coefficients compared with BLASSO under Case I, but slightly higher error under Case II. The error under our approach was lower compared with SSUR for all scenarios. The AUC for variable selection is the highest under BLASSO, while it is the lowest for SSUR (close to 0.5). The poor AUC under SSUR is due to the inclusion of almost all covariates in the model, which indicates the inability of the approach to differentiate between important and unimportant covariates. In spite of having a lower AUC for variable selection compared with BLASSO, the proposed approach does significantly better in terms of out-of-sample prediction. This points to the advantage of incorporating the graph structure for prediction purposes, as opposed to assuming independence within the outcome measurements. SSUR has the largest out-of-sample MSE, in spite of reporting an overly large regression model.

Table II.

Numerical comparison for different approaches under all cases.

Case   (n, p, q)         Method    MSE     ‖β̂‖²L2    AUC (graph)   AUC (var)   ‖Ω̂‖L1
I      (100, 80, 100)    spHIW     0.27    0.05       0.80          0.82        0.0138
                         BLASSO    0.55    0.07       N/A           0.98        N/A
                         SSUR      0.62    0.12       0.78          0.50        0.0389
                         GLASSO    N/A     N/A        0.79          N/A         0.0447
I      (100, 80, 200)    spHIW     0.28    0.05       0.79          0.80        0.0141
                         BLASSO    0.56    0.09       N/A           0.96        N/A
                         SSUR      0.60    0.10       0.76          0.52        0.0917
                         GLASSO    N/A     N/A        0.78          N/A         0.0401
II     (100, 80, 100)    spHIW     0.26    0.10       0.80          0.85        0.0112
                         BLASSO    0.54    0.08       N/A           0.96        N/A
                         SSUR      0.61    0.13       0.77          0.50        0.0271
                         GLASSO    N/A     N/A        0.79          N/A         0.0432
II     (100, 80, 200)    spHIW     0.28    0.10       0.79          0.92        0.0127
                         BLASSO    0.57    0.08       N/A           0.97        N/A
                         SSUR      0.65    0.11       0.78          0.52        0.12
                         GLASSO    N/A     N/A        0.80          N/A         0.0391
III    (121, 42, 546)    spHIW     0.01    0.0015     0.88          0.94        0.17
                         BLASSO    0.02    0.0028     N/A           0.51        N/A
                         SSUR      0.02    0.0031     0.87          0.78        0.97
                         GLASSO    N/A     N/A        0.89          N/A         0.56

Note: MSE stands for out-of-sample mean squared error; ‖β̂‖²L2 denotes the squared L2 error in estimating the regression coefficients; AUC (graph) and AUC (var) denote the area under the curve for graphical model estimation and variable selection, respectively; and ‖Ω̂‖L1 denotes the L1 error in estimating the precision matrix.

For graphical model estimation, we note the proposed approach, SSUR, and GLASSO all have a similar AUC. However, the proposed approach produces the smallest error for estimating the precision matrix, which points to a superior ability to accurately estimate partial correlations. We conjecture that a lower error in estimating the partial correlations is due to the removal of external sources of variation, which when unaccounted for can potentially lead to erroneous estimates for strength of associations.

In summary, for Cases I and II, the proposed conditional graphical model has superior out-of-sample prediction performance by incorporating the underlying graph structure but suffers from a poorer variable selection performance due to the presence of a sizable number of additional covariance parameters. In contrast, the competing SSUR approach does very poorly in terms of variable selection and out-of-sample prediction. Both SSUR, which includes almost all covariates in the regression model, and GLASSO, which does not include any covariate at all, have comparable graph estimation performance, but higher errors when estimating the strength of associations. This underlines the role of accurate variable selection as an important factor in the estimation of conditional associations.

For Case III where the data resembles a real-world application, the proposed approach has superior performance compared with all approaches. In particular, the method has comparable out-of-sample prediction, but a lower error for estimating true regression coefficients, and a significantly higher AUC for variable selection. The higher AUC compared with Cases I and II points to the increased ability of the proposed method to differentiate between important and unimportant variables when the dimension of the multivariate outcome is moderate compared with the sample size, even when the number of candidate predictors is large. We also observe that the proposed approach has an improved graphical model estimation performance relative to competing approaches, as evident from significantly higher AUC, and a substantially smaller error for estimating partial correlations.

4 Application to imaging genetics

4.1 Description of Alzheimer’s disease neuroimaging initiative data

The ADNI collected a large amount of imaging, genetic and clinical data. The goal of the ADNI study is to determine whether different imaging biomarkers, along with genetic variants and clinical markers, are strongly associated with AD and the progression of MCI. In this article, we primarily concentrate on identifying (i) important connections in the functional brain network after accounting for age, gender, handedness, weight and genes; (ii) functional modules, or collections of regions of interest in the brain which work together to drive brain functions, and the corresponding subnetworks; and (iii) important genes influencing the imaging phenotype and the functional modules. The brain network is computed using PET measurements; however, it is straightforward to apply the method to other imaging modalities such as magnetic resonance imaging (MRI). We perform the analysis separately for the MCI, AD and healthy control (HC) groups and compare results across the three groups. We begin with a data description.

4.1.1 Imaging data

ADNI 1 collected longitudinal PET scans at multiple time points across different imaging sites. To study the association between the imaging biomarkers and genetic variants, we used the baseline PET scans for 49 AD patients, 121 MCI patients and 71 healthy subjects. The standard preprocessing steps, including co-registration, normalization and spatial smoothing (8 mm full width at half maximum (FWHM)), were applied to the PET dataset. We considered 90 brain regions that are defined according to the automated anatomical labelling system. We computed the PET regional summaries using the first principal component scores over all voxels within each region, in a similar fashion as in Bowman et al. (2012). This 90 × 1 dimensional summary vector of PET measurements is our outcome variable.
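
A sketch of the regional summary step is given below (our own illustration, assuming the voxel-level PET intensities have already been mapped to region labels; the actual weighting scheme follows Bowman et al., 2012): for each region, the first principal component score across its voxels is retained as the subject-level summary.

import numpy as np
from sklearn.decomposition import PCA

def regional_pc_summaries(voxel_data, region_labels):
    # voxel_data: (n_subjects, n_voxels) array of PET intensities;
    # region_labels: length n_voxels vector assigning each voxel to a region.
    # Returns an (n_subjects, n_regions) matrix of first principal component scores.
    region_labels = np.asarray(region_labels)
    regions = np.unique(region_labels)
    summaries = np.empty((voxel_data.shape[0], regions.size))
    for j, r in enumerate(regions):
        block = voxel_data[:, region_labels == r]
        summaries[:, j] = PCA(n_components=1).fit_transform(block).ravel()
    return summaries, regions

# toy usage: 20 subjects, 300 voxels split evenly over 3 regions
rng = np.random.default_rng(6)
scores, regions = regional_pc_summaries(rng.normal(size=(20, 300)), np.repeat([1, 2, 3], 100))
print(scores.shape)                                   # (20, 3)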

4.1.2 Genetics data

The SNPs in the ADNI study were genotyped using the Human 610-Quad BeadChip (Illumina, Inc., San Diego, CA, USA). Following Zhu et al. (2014) and Wang et al. (2012), we only focused on SNPs that belong to the top 40 candidate genes reported in the AlzGene database (http://www.alzgene.org) as of 10 June 2010. Before the data analysis, we performed standard preprocessing steps (Wang et al., 2012) on the SNP data for quality control. We removed the SNPs having (i) more than 1% missing values, (ii) minor allele frequency less than 5% and (iii) a Hardy–Weinberg equilibrium p-value less than 10^{−6}. The final dataset includes 614 SNPs on 37 genes. Figure 1 shows the number of SNPs per gene in the analysis. The total number of covariates is 618, including the 614 preselected SNPs and four demographic variables: handedness, age, gender and weight.
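
A generic sketch of the three quality-control filters follows (our own illustration using a simple one degree of freedom chi-square test for Hardy–Weinberg equilibrium; the actual ADNI pipeline follows Wang et al., 2012 and may differ in details).

import numpy as np
from scipy.stats import chi2

def snp_quality_filter(geno, max_missing=0.01, min_maf=0.05, min_hwe_p=1e-6):
    # geno: (n_subjects, n_snps) minor allele counts in {0, 1, 2}, NaN = missing.
    # Returns a boolean mask of SNPs passing all three filters.
    geno = np.asarray(geno, dtype=float)
    n, q = geno.shape
    keep = np.zeros(q, dtype=bool)
    for j in range(q):
        g = geno[~np.isnan(geno[:, j]), j]
        missing = 1.0 - g.size / n
        f = g.mean() / 2.0                            # estimated minor allele frequency
        maf = min(f, 1.0 - f)
        obs = np.array([(g == 0).sum(), (g == 1).sum(), (g == 2).sum()], dtype=float)
        exp = g.size * np.array([(1 - f) ** 2, 2 * f * (1 - f), f ** 2])
        stat = ((obs - exp) ** 2 / np.maximum(exp, 1e-12)).sum()
        p_hwe = chi2.sf(stat, df=1)                   # 3 genotype classes - 1 - 1 estimated frequency
        keep[j] = (missing <= max_missing) and (maf >= min_maf) and (p_hwe >= min_hwe_p)
    return keep

# toy usage: 200 subjects, 10 SNPs simulated in Hardy-Weinberg proportions
rng = np.random.default_rng(7)
geno = rng.binomial(2, 0.3, size=(200, 10)).astype(float)
print(snp_quality_filter(geno).sum(), "of 10 SNPs retained")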

Figure 1.

Top 37 genes in the analysis and the number of SNPs per gene. There are a total of 614 SNPs included in the analysis.

4.2 Analysis results

Brain network identification

Based on the MCMC samples, we computed the posterior edge inclusion probabilities of the brain network for each group. By thresholding the probabilities at 0.5, we obtain the group-specific brain networks shown in Figure 2. Specifically, the AD, MCI and HC networks have 79, 102 and 73 important edges, respectively. There are 14 edges shared by all three groups. Some of the common edges lie in the default mode network (Buckner et al., 2008). For example, the functional connection between the left and right precuneus (related to self-consciousness) appears in all three networks, which implies that AD and MCI subjects have functional activity between the two precuneus regions similar to that of the HC subjects. Also, all three networks contain edges between the left and right hippocampus, which indicates that the two regions remain functionally connected in the AD and MCI groups, although damage to the hippocampus has been confirmed to be related to AD.

Figure 2.

Functional brain network estimation for Alzheimer’s disease (AD), mild cognitive impairment (MCI) and healthy control (HC) groups and the common edges shared by the three networks.

As one of the defining features of the proposed method, it can identify functional modules or communities for each group-specific network. In our analysis, there are three, three and two functional modules identified in the AD, MCI and HC groups, respectively, which are shown in Figure 3. It can be seen that AD and MCI have two similar functional modules, communities 1 and 2, while the modules related to HC are quite different from those of the AD and MCI groups. Functional module 1 for the AD and MCI groups collects many regions in the parietal and temporal lobes, with the number of connections being 29 and 42, respectively. We observe that some functional connections between the two hemispheres are missing in the AD group compared with the MCI group. For example, the AD network does not have the functional connection between the right and left fusiform gyrus, the functionality of which is mainly related to face and body recognition (McCarthy et al., 1997).

Figure 3.

Functional modules or communities for Alzheimer’s disease (AD), mild cognitive impairment (MCI) and healthy control (HC) groups, along with important genes. The arrows relate the subnetworks to the significant genes influencing them. Each such gene can influence one or more connections in the functional modules, as well as one or more phenotypes in that module.

Functional module 2 in both the AD and MCI networks includes the hippocampus and parahippocampal gyrus in the temporal lobe. The total number of connections between these two regions and all other regions is six in the MCI network and only two in the AD network. This implies that these regions become more isolated in the AD group compared with the MCI group, which has been confirmed by previous findings (Supekar et al., 2008; Huang et al., 2010). Functional module 3 for the AD group mainly includes four regions: the postcentral gyrus, precentral gyrus, paracentral lobule and supplementary motor area, whose functions are mainly related to motor skills and the sense of touch. Because this is a separate module in the AD group, these regions have far fewer connections compared with the HC group, which potentially implies reduced motor skills and sense of touch for AD subjects. Compared with the brain networks for the AD and MCI groups, our analysis only detects two functional modules for the HC group, which implies increased connectivity compared with the AD and MCI patients.

Important genes for brain networks

Based on the MCMC samples, we identify important SNPs associated with each functional module for the AD, MCI and HC groups (Table III). For example, SNP “rs2018334” on NEDD9 is significantly associated with the subnetwork of community 2 in the AD group, which is supported by the findings of Wang et al. (2012). GAB2 was also recognized as an important gene for both the AD and MCI groups, but not the HC group; this corresponds to prior evidence implying that the gene modifies late onset AD risk in APOE ε4 carriers and influences Alzheimer’s neuropathology (Reiman et al., 2007). In line with prior findings, CH25H was found to be significantly associated with AD risk but not with MCI or HC (Wollmer, 2010). Further, genes that promote MCI disease risk but are not associated with HC individuals include ECE1, which is associated with cognitive ability in elderly individuals and disease risk (Hamilton et al., 2012); ADAM10, which regulates neuronal plasticity affecting AD (Marcello et al., 2013); and the phosphatidylinositol binding clathrin assembly protein (PICALM), which was one of the first AD loci identified by genome-wide association studies and which has also been validated in independent samples. In addition, SORL1, which is known to be a potential tool for identifying MCI subjects at high risk of conversion to AD (Piscopo et al., 2015), is found to be significant in the MCI and HC groups, but not in the AD group. In addition to the aforementioned genes, we found that age is a significant predictor for community 1 in the MCI group.

Table III.

Important SNPs (genes) that are significantly associated with the subnetwork communities for each group.

Community 1
  AD: rs4933497 (CH25H)
  MCI: rs11590928 (ECE1), rs3026886 (ECE1), rs1015477 (DAPK1), rs10868609 (DAPK1), rs1105384 (DAPK1), rs10509825 (SORCS1), rs10501608 (PICALM), rs1790213 (SORL1), rs12594742 (ADAM10), rs4309 (ACE)
  HC: rs4428180 (TF), rs7748486 (NEDD9), rs661319 (SORCS1), rs4713432 (NEDD9), rs10868644 (DAPK1), rs11601559 (SORL1)
Community 2
  AD: rs2018334 (NEDD9), rs11603112 (GAB2)
  MCI: rs3026868 (ECE1), rs3026886 (ECE1), rs871495 (DAPK1), rs12248564 (SORCS1), rs821962 (SORCS1), rs1015477 (DAPK1)
  HC: rs16871157 (NEDD9), rs6691117 (CR1), rs729211 (CALHM1), rs7036781 (DAPK1)
Community 3
  AD: rs212518 (ECE1)
  MCI: rs1015477 (DAPK1), rs1415020 (SORCS1), rs821962 (SORCS1), rs2450135 (GAB2), rs7941639 (GAB2), rs3026886 (ECE1)
  HC: rs10509825 (SORCS1), rs34634755 (GAB2), rs666682 (PICALM)

Note: AD, Alzheimer’s disease; MCI, mild cognitive impairment; HC, healthy control.

5 Discussion

We have developed a new Bayesian semiparametric conditional graphical model for imaging genetics studies and applied it to the analysis of the ADNI dataset. Our approach can jointly estimate the brain network after accounting for external sources of variation and infer important genetic and demographic factors associated with the imaging phenotype and the brain network. It can also simultaneously discover functional modules in the brain and infer the connectivity within each such module. To our knowledge, the proposed method is among the first to jointly address the aforementioned aims and is expected to provide deeper insights in imaging genetics studies compared with existing approaches. Given the high dimensional nature of the data in imaging genetics applications, it would be interesting to explore the scalability of the proposed approach as p and/or q increases. Our initial experiments suggest that the semiparametric conditional graphical model scales well with the number of covariates q, but the computational speed may become slower as p is increased. However, more effort is needed to examine such scalability issues, and we leave this topic for future investigation.

Acknowledgments

We would like to thank the Alzheimer’s Disease Neuroimaging Initiative for providing us access to the data.


Supporting information

Additional supporting information may be found in the online version of this article at the publisher’s web-site.

References

1. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B (Statistical Methodology). 1995;57:289–300.
2. Biswal B, Zerrin Yetkin F, Haughton VM, Hyde JS. Functional connectivity in the motor cortex of resting human brain using echo-planar MRI. Magnetic Resonance in Medicine. 1995;34:537–541. doi: 10.1002/mrm.1910340409.
3. Bhadra A, Mallick BK. Joint high-dimensional Bayesian variable and covariance selection with an application to eQTL analysis. Biometrics. 2013;69:447–457. doi: 10.1111/biom.12021.
4. Bowman FD, Zhang L, Derado G, Chen S. Determining functional connectivity using fMRI data with diffusion-based anatomical weighting. NeuroImage. 2012;62:1769–1779. doi: 10.1016/j.neuroimage.2012.05.032.
5. Buckner RL, Andrews-Hanna JR, Schacter DL. The brain’s default network. Annals of the New York Academy of Sciences. 2008;1124:1–38. doi: 10.1196/annals.1440.011.
6. Cai T, Li H, Liu W, Xie J. Covariate-adjusted precision matrix estimation with an application in genetical genomics. Biometrika. 2013;100:139–156. doi: 10.1093/biomet/ass058.
7. Calhoun VD, Kiehl KA, Pearlson GD. Modulation of temporally coherent brain networks estimated using ICA at rest and during cognitive tasks. Human Brain Mapping. 2008;29:828–838. doi: 10.1002/hbm.20581.
8. Dahl DB. Model-based clustering for expression data via a Dirichlet process mixture model. In: Do K-A, Müller P, Vannucci M, editors. Bayesian Inference for Gene Expression and Proteomics. Cambridge University Press; 2006.
9. Dawid AP, Lauritzen SL. Hyper Markov laws in the statistical analysis of decomposable graphical models. Annals of Statistics. 1993;21:1272–1317.
10. Friedman J, Hastie T, Tibshirani R. Sparse inverse covariance estimation with the lasso. Biostatistics. 2008;9:432–441. doi: 10.1093/biostatistics/kxm045.
11. Guo Y. A general probabilistic model for group independent component analysis and its estimation methods. Biometrics. 2011;67:1532–1542. doi: 10.1111/j.1541-0420.2011.01601.x.
12. Hamilton G, Harris SE, Davies G, Liewald DC, Tenesa A, Payton A. The role of ECE1 variants in cognitive ability in old age and Alzheimer’s disease risk. American Journal of Medical Genetics, Part B. 2012;159B:676–709. doi: 10.1002/ajmg.b.32073.
13. Huang S, Li J, Sun L, Ye J, Fleisher A, Wu T, Chen K, Reiman E; Alzheimer’s Disease Neuroimaging Initiative. Learning brain connectivity of Alzheimer’s disease by sparse inverse covariance estimation. NeuroImage. 2010;50:935–949. doi: 10.1016/j.neuroimage.2009.12.120.
14. Kim J, Wozniak JR, Mueller BA, Pan W. Testing group differences in brain functional connectivity: using correlations or partial correlations? Brain Connectivity. 2015;5:214–231. doi: 10.1089/brain.2014.0319.
15. Kyung M, Gill J, Casella G. Estimation in Dirichlet random effects models. Annals of Statistics. 2010;38:979–1009.
16. Lauritzen SL. Graphical Models. Oxford University Press; United Kingdom: 1996.
17. Li B, Chun H, Zhao H. Sparse estimation of conditional graphical models with application to gene networks. Journal of the American Statistical Association. 2012;497:152–167. doi: 10.1080/01621459.2011.644498.
18. Marcello E, Saraceno C, Musardo S, Vara H, de la Fuente AG, Pelucchi S, Di Marino D, Borroni B, Tramontano A, Pérez-Otaño I, Padovani A, Giustetto M, Gardoni F, Di Luca M. Endocytosis of synaptic ADAM10 in neuronal plasticity and Alzheimer’s disease. Journal of Clinical Investigation. 2013;123:2523–2538. doi: 10.1172/JCI65401.
19. McCarthy G, Puce A, Gore JC, Allison T. Face-specific processing in the human fusiform gyrus. Journal of Cognitive Neuroscience. 1997;9:605–610. doi: 10.1162/jocn.1997.9.5.605.
20. Mier D, Kirsch P, Meyer-Lindenberg A. Neural substrates of pleiotropic action of genetic variation in COMT: a meta-analysis. Molecular Psychiatry. 2010;15:918–927. doi: 10.1038/mp.2009.36.
21. Munafo MR, Attwood AS, Flint J. Bias in genetic association studies: effects of research location and resources. Psychological Medicine. 2008;38:1213–1214. doi: 10.1017/S003329170800353X.
22. Meunier D, Lambiotte R, Bullmore ET. Modular and hierarchically modular organization of brain networks. Frontiers in Neuroscience. 2010;4:200. doi: 10.3389/fnins.2010.00200.
23. Nicolini C, Bifone A. Modular structure of brain functional networks: breaking the resolution limit by surprise. Scientific Reports. 2016;6:19250. doi: 10.1038/srep19250.
24. Pan RK, Sinha S. Modularity produces small-world networks with dynamical time-scale separation. Europhysics Letters. 2009;85:68006.
25. Park T, Casella G. The Bayesian lasso. Journal of the American Statistical Association. 2008;103:681–686.
26. Piscopo P, Tosto G, Belli C, Talarico G, Galimberti D, Gasparini M, Canevelli M, Poleggi A, Crestini A, Albani D, Forloni G, Lucca U, Quadri P, Tettamanti M, Fenoglio C, Scarpini E, Bruno G, Vanacore N, Confaloni A. SORL1 gene is associated with the conversion from mild cognitive impairment to Alzheimer’s disease. Journal of Alzheimer’s Disease. 2015;46:771–776. doi: 10.3233/JAD-141551.
27. Reiman EM, Webster JA, Myers AJ, Hardy J, Dunckley T, Zismann VL, Joshipura KD, Pearson JV, Hu-Lince D, Huentelman MJ, Craig DW, Coon KD, Liang WS, Herbert RH, Beach T, Rohrer KC, Zhao AS, Leung D, Bryden L, Marlowe L, Kaleem M, Mastroeni D, Grover A, Heward CB, Ravid R, Rogers J, Hutton ML, Melquist S, Petersen RC, Alexander GE, Caselli RJ, Kukull W, Papassotiropoulos A, Stephan DA. GAB2 alleles modify Alzheimer’s risk in APOE ε4 carriers. Neuron. 2007;54:713–720. doi: 10.1016/j.neuron.2007.05.022.
28. Sethuraman J. A constructive definition of Dirichlet priors. Statistica Sinica. 1994;4:639–650.
29. Smith SM, Beckmann CF, Andersson J, Auerbach EJ, Bijsterbosch J, Douaud G, Duff E, Feinberg DA, Griffanti L, Harms MP, Kelly M, Laumann T, Miller KL, Moeller S, Petersen S, Power J, Salimi-Khorshidi G, Snyder AZ, Vu AT, Woolrich MW, Xu J, Yacoub E, Uğurbil K, Van Essen DC, Glasser MF. Resting-state fMRI in the human connectome project. NeuroImage. 2012;80:144–168. doi: 10.1016/j.neuroimage.2013.05.039.
30. Sporns O. Networks of the Brain. MIT Press; Cambridge, MA: 2010.
31. Stam CJ, Jones BF, Nolte G, Breakspear M, Scheltens P. Small-world networks and functional connectivity in Alzheimer’s disease. Cerebral Cortex. 2007;17:92–99. doi: 10.1093/cercor/bhj127.
32. Stein JL, Hua X, Lee S, Ho AJ, Leow AD, Toga AW, Saykin AJ, Shen L, Foroud T, Pankratz N, Huentelman MJ, Craig DW, Gerber JD, Allen AN, Corneveaux JJ, Dechairo BM, Potkin SG, Weiner MW, Thompson P. Voxelwise genome-wide association study (vGWAS). NeuroImage. 2010;53:1160–1174. doi: 10.1016/j.neuroimage.2010.02.032.
33. Stingo FC, Guindani M, Vannucci M, Calhoun VD. An integrative Bayesian modeling approach to imaging genetics. Journal of the American Statistical Association. 2013;108:875–891. doi: 10.1080/01621459.2013.804409.
34. Supekar K, Menon V, Rubin D, Musen M, Greicius MD. Network analysis of intrinsic functional brain connectivity in Alzheimer’s disease. PLoS Computational Biology. 2008;4:1–11. doi: 10.1371/journal.pcbi.1000100.
35. Wang Y, Bi L, Wang H, Li Y, Di Q, Xu W, Qian Y. NEDD9 rs760678 polymorphism and the risk of Alzheimer’s disease: a meta-analysis. Neuroscience Letters. 2012;527:121–125. doi: 10.1016/j.neulet.2012.08.044.
36. Wang H, Jin X, Zhang Y, Wang J. Single-subject morphological brain networks: connectivity mapping, topological characterization and test–retest reliability. Brain and Behavior. 2016;6:e00448. doi: 10.1002/brb3.448.
37. Wollmer MA. Cholesterol-related genes in Alzheimer’s disease. Biochimica et Biophysica Acta. 2010;1808:762–773. doi: 10.1016/j.bbalip.2010.05.009.
38. Yin J, Li H. A sparse conditional graphical model for analysis of genetical genomics data. Annals of Applied Statistics. 2011;5:2630–2650. doi: 10.1214/11-AOAS494.
39. Zhu H, Khondker Z, Lu Z, Ibrahim JG. Bayesian generalized low rank regression models for neuroimaging phenotypes and genetic markers. Journal of the American Statistical Association. 2014;109:977–990.
