Biostatistics (Oxford, England). 2024 Jul 9;26(1):kxae023. doi: 10.1093/biostatistics/kxae023

Bayesian estimation of covariate assisted principal regression for brain functional connectivity

Hyung G Park 1,

Abstract

This paper presents a Bayesian reformulation of covariate-assisted principal regression for covariance matrix outcomes, to identify low-dimensional components in the covariance associated with covariates. By introducing a geometric approach to the covariance matrices and leveraging the Euclidean geometry of a tangent space, we estimate dimension reduction parameters and model covariance heterogeneity based on covariates. This method enables joint estimation and uncertainty quantification of the relevant model parameters associated with heteroscedasticity. We demonstrate our approach through simulation studies and apply it to analyze associations between covariates and brain functional connectivity using data from the Human Connectome Project.

Keywords: brain functional connectivity, dimension reduction, heteroscedasticity

1 Introduction

This paper reformulates covariate-assisted principal (CAP) regression of Zhao et al. (2021b) in the Bayesian paradigm. The approach identifies covariate-relevant components of the covariance of multivariate response data. Specifically, the method estimates a set of linear projections of multivariate response signals whose variances are related to external covariates. In neuroscience, there is interest in analyzing the statistical dependency between time series of brain signals from distinct regions of the brain, which we refer to as functional connectivity (FC) (Lindquist 2008; Fornito and Bullmore 2012; Fornito et al. 2013; Monti et al. 2014; Fox and Dunson 2015). The brain signals underlying FC are multivariate, and in analyzing FC each brain activity is considered relative to the others (Varoquaux et al. 2010), as this statistical dependency is related to behavioral characteristics (covariates). This paper develops a Bayesian approach to supervised dimension reduction for the response signals, to analyze the association between external covariates and the FC characterized by the covariances of the multivariate signals.

Typically, the first step in analyzing brain FC is to define a set of nodes corresponding to spatial regions of interest (ROIs), where each node is associated with its own time course of imaging data. The network connections (or the "edge" structure between the nodes) are subsequently estimated based on the statistical dependency between the nodes' time courses (van den Heuvel and Hulshoff Pol 2010; Friston 2011). FC networks have been inferred using Pearson's correlation coefficients (Hutchison et al. 2013) and also with partial correlations in the context of Gaussian graphical models (Whittaker 1990; Hinne et al. 2014), summarized in the precision (inverse covariance) matrix. In recent years, there has been a focus on subject-level graphical models in which the node-to-node dependencies vary with respect to subject-level covariates. This line of research includes methods to estimate or test group-specific graphs (Guo et al. 2011; Danaher et al. 2014; Narayan et al. 2015; Peterson et al. 2015; Xia et al. 2015; Cai et al. 2016; Saegusa and Shojaie 2016; Lin et al. 2017; Tan et al. 2017; Xia and Li 2017; Durante and Dunson 2018; Xia et al. 2018), as well as general Gaussian graphical models for graph edges that allow both continuous and discrete covariates, estimated based on trees (Liu et al. 2010), kernels (Kolar et al. 2010; Lee and Xue 2018), and linear or additive regression (Ni et al. 2019; Wang et al. 2022; Zhang and Li 2023). However, like other standard node-wise regression methods in Gaussian graphical models (e.g. Meinshausen and Buhlmann 2006; Peng et al. 2009; Kolar et al. 2010; Cheng et al. 2014; Leday et al. 2017; Ha et al. 2021), these approaches focus on edge detection (i.e. estimation of the off-diagonal elements) rather than estimating the full precision or covariance matrix, and they do not explicitly constrain the positive definiteness of the precision or covariance matrices. Works on general tensor outcome regression (Li and Zhang 2017; Sun and Li 2017; Lock 2018) also do not generally guarantee the positive definiteness of the outcomes. While the problem of dimension reduction of individual covariances has been studied in brain dynamic connectivity analysis (Dai et al. 2020), in computer vision (Harandi et al. 2017; Li and Lu 2018; Gao et al. 2023), in brain-computer interfaces (Davoudi et al. 2017; Xie et al. 2017), and in multi-group covariance estimation (Flury 1984, 1986; Boik 2002; Pourahmadi et al. 2007; Hoff 2009; Franks and Hoff 2019), covariate information was either not utilized in conducting the dimension reduction, or the data were viewed at the group level, which does not account for subject-level heterogeneity in the brain networks. Gaussian graphical models have also been applied to study brain connectivity networks in fMRI data (e.g. Li and Solea 2018; Zhang et al. 2020); however, the focus was on analyzing connectivity networks without explicitly considering their relationship with subject-level covariates.

In this paper, in line with the covariance regression literature (see, e.g. Engle and Kroner 1995; Fong et al. 2006; Varoquaux et al. 2010; Pourahmadi 2011; Hoff and Niu 2012; Fox and Dunson 2015; Zou et al. 2017; Zhao et al. 2021a,b, 2024), we frame the problem of analyzing FC as modeling heteroscedasticity, i.e. estimating a covariance function $\Sigma_x = \mathrm{var}[Y \mid x]$ across a range of values of an explanatory $x$-variable. In contrast to the approach developed in Zhao et al. (2021b), where each projection vector for $\Sigma_i$ is estimated sequentially, and to Franks (2022), where statistical inference is conducted conditionally on the estimated dimension-reduced subspace, the proposed framework allows coherent and simultaneous inference on all model parameters within the Bayesian paradigm.

One typical approach to associating brain FC with behavior is the massive univariate test approach, which relates each connectivity matrix element to subject-level covariates (e.g. Woodward et al. 2011; Grillon et al. 2013). However, this "massive edgewise regression" lacks statistical power, as it (i) ignores dependencies among the connectivity elements; and (ii) involves a quadratically increasing number of regressions that exacerbates the problem of multiple testing. On the other hand, multivariate methods such as principal component analysis (PCA), as considered in Crainiceanu and Punjabi (2011), consider the data from all ROIs at once, reducing the dimensionality of the original outcome to a smaller number of "network" components; however, these common components may be associated with small eigenvalues, or the corresponding eigenvalues may not be associated with covariates.

The outcome data of interest are multivariate time-series resting-state fMRI (rs-fMRI) data in $\mathbb{R}^p$, measured simultaneously across the $p$ ROIs (or parcels) defined based on an anatomical parcellation (Eickhoff et al. 2018) or "network nodes" (Smith et al. 2012) derived from a data-driven algorithm such as independent component analysis (ICA) (Calhoun et al. 2009; Smith et al. 2013). As in Seiler and Holmes (2017), we will apply the Bayesian CAP regression to data from the Human Connectome Project (HCP) (Van Essen et al. 2013) to compare short sleepers (i.e. $\leq 6$ hours) with conventional sleepers (i.e. 7 to 9 hours) with respect to their FC.

2 Method

2.1 Covariance regression models

We consider $n$ subjects, with subject-specific covariances for brain activity time series from $p$ ROIs, $\{\Sigma_i \in \mathbb{R}^{p\times p},\, i=1,\ldots,n\}$. The space of valid covariance matrices is the space of symmetric positive definite (SPD) matrices, denoted $\mathrm{Sym}_p^+$ in this paper. The rs-fMRI time series for a given subject $i$ are drawn from a Gaussian distribution: $Y_{it} \sim N(\mu_i, \Sigma_i)$ with $\mu_i \in \mathbb{R}^p$ and $\Sigma_i \in \mathrm{Sym}_p^+$. For centered data, the mean $\mu_i = 0$, and the covariance $\Sigma_i$ captures FC. Without loss of generality, we assume that the observed signal is mean-centered, so that $\sum_{t=1}^{T_i} Y_{it} = 0 \in \mathbb{R}^p$ for each subject $(i=1,\ldots,n)$, as our focus is on FC characterized by the covariance between the brain signals. We observe $Y_{it}$ over $T_i$ time points for each subject $i$ $(i=1,\ldots,n)$, along with subject-level vectors of covariates $x_i \in \mathbb{R}^q$ $(i=1,\ldots,n)$.

In this paper, instead of directly modeling the subject-specific covariances $\Sigma_i = \mathrm{cov}(Y_{it})$ (as in Seiler and Holmes (2017); Fox and Dunson (2015); Zou et al. (2017)), in which most of the covariance heterogeneity may be unrelated to $x_i$, we aim to extract a lower dimensional component whose covariance heterogeneity is related to $x_i$. We characterize this lower dimensional structure by a dimension reducing matrix $\Gamma \in \mathbb{R}^{p\times d}$ with $\Gamma^\top \Gamma = I_d$ (i.e. $\Gamma$ lies in a Stiefel manifold) and $d \ll p$. Specifically, we consider a latent factor model for $Y_{it}$,

$$Y_{it} = \Gamma \Psi_i^{1/2} s_{it} + L_i \epsilon_{it} \qquad (2.1)$$

with latent factors $s_{it} \sim N(0, I_d)$ and $\epsilon_{it} \sim N(0, I_{p-d})$, of dimensions $d$ and $p-d$, respectively, where

$$\Psi_i^{1/2} = \exp\big(\mathrm{diag}\big((B x_i + z_i)/2\big)\big) \qquad (2.2)$$

models the $x$-related heteroscedasticity along the projection directions $\Gamma \in \mathbb{R}^{p\times d}$. In (2.2), $\mathrm{diag}\big((B x_i + z_i)/2\big) \in \mathbb{R}^{d\times d}$ is a diagonal matrix whose diagonal elements are given by the linear predictor vector $(B x_i + z_i)/2 \in \mathbb{R}^d$. In (2.1), $\Gamma \in \mathbb{R}^{p\times d}$ specifies the Principal Directions of Covariance (PDCs) of $Y_{it}$ related to $x_i$, whereas the other orthogonal components $L_i \in \mathbb{R}^{p\times (p-d)}$, which satisfy $L_i \perp \Gamma$, are included to account for the "noise" directions and magnitudes of the heteroscedasticity that are unrelated to $x_i$.

In (2.2), the matrix $B = [\beta_0, \check{B}] \in \mathbb{R}^{d\times q}$ (where $\beta_0 \in \mathbb{R}^d$ represents the intercept) is a regression coefficient matrix that relates $x_i \in \mathbb{R}^q$ (with its first element being 1) to the subject-level outcome covariance $\Sigma_i$. Under model (2.1), the subject-level covariance is given by

$$\Sigma_i = \Gamma \Psi_i \Gamma^\top + L_i L_i^\top, \qquad (2.3)$$

which decomposes the individual covariance matrix $\Sigma_i$ into two components, covariate-related and covariate-unrelated: a principal factor decomposition of $\Sigma_i$. In (2.3), unlike the more general structure of $L_i L_i^\top$, whose variability is unrelated to $x_i$, the PDCs $\Gamma$ serve as features (i.e. "subnetworks") that we expect to be consistent across subjects. Along $\Gamma$, model (2.2) incorporates subject-level random effects $z_i \sim N(0, \Omega)$ to capture additional heteroscedasticity not explained by $x_i$. In model (2.2), the diagonality of the $d \times d$ core tensor $\Psi_i \in \mathrm{Sym}_d^+$ is needed as an identifiability condition, since any non-diagonal SPD $\tilde{\Psi}_i$ can be diagonalized by its normalized eigenvectors $\tilde{A} \in \mathbb{R}^{d\times d}$ (assuming common eigenvectors $\tilde{A}$ for $\tilde{\Psi}_i$ across subjects), and $\Gamma \tilde{A} \in \mathbb{R}^{p\times d}$ can instead be used as the orthonormal dimension reduction matrix. While we impose the diagonality of $\Psi_i$, we allow $z_i \sim N(0, \Omega)$, where $\Omega$ may have off-diagonal elements that allow residual correlation in the projected signals $\Gamma^\top Y_{it} \in \mathbb{R}^d$ beyond what is modeled by the common covariates $x_i$.
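To make the generative structure of (2.1)-(2.3) concrete, the following minimal sketch (in Python/NumPy, with illustrative sizes and coefficient values that are not from the paper) simulates one subject's data under the model: an orthonormal basis is split into signal directions $\Gamma$ and noise directions $L$, the diagonal core tensor $\Psi_i$ follows the log-linear model (2.2), and the responses are drawn from $N(0, \Sigma_i)$.

```python
import numpy as np

rng = np.random.default_rng(0)
p, d, q, T = 10, 2, 5, 50          # illustrative sizes (not from the paper)

# Orthonormal basis of R^p, split into signal (Gamma) and noise (L) directions
Q, _ = np.linalg.qr(rng.standard_normal((p, p)))
Gamma, L = Q[:, :d], Q[:, d:]

B = 0.3 * rng.standard_normal((d, q))            # regression coefficients in (2.2)
x = np.r_[1.0, rng.standard_normal(q - 1)]       # covariates, first element 1
z = rng.multivariate_normal(np.zeros(d), 0.25 * np.eye(d))  # random effects z_i

Psi = np.diag(np.exp(B @ x + z))                 # diagonal core tensor (2.2)

# Noise component along the orthogonal complement of Gamma
noise_scale = np.exp(rng.normal(0.0, 0.5, size=p - d))
Sigma = Gamma @ Psi @ Gamma.T + (L * noise_scale) @ L.T      # model (2.3)

Y = rng.multivariate_normal(np.zeros(p), Sigma, size=T)      # responses (2.1)
```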

Remark 2.1.

The covariance model (2.1)-(2.2) should be distinguished from principal component (PC) regression, which relates $x_i$ to the PCs $\Gamma^\top Y_{it} \in \mathbb{R}^d$; our interest is in the association between the covariates $x_i$ and the variance of the components (i.e. heteroscedasticity), rather than with the components $\Gamma^\top Y_{it} \in \mathbb{R}^d$ themselves.

For a multivariate outcome signal $Y_{it} \in \mathbb{R}^p$ at time point $t$ for subject $i$, Seiler and Holmes (2017) utilized a heteroscedasticity model, $\mathrm{cov}(Y_{it}) = B x_i x_i^\top B^\top + \sigma^2 I_p$ $(t=1,\ldots,T_i)$ $(i=1,\ldots,n)$, where the outcome covariance matrix $\Sigma_i = \mathrm{cov}(Y_{it})$ is modeled by a quadratic function of $B x_i \in \mathbb{R}^p$, in which $B \in \mathbb{R}^{p\times q}$ is the regression coefficient matrix associated with $x_i \in \mathbb{R}^q$, and $\sigma^2 > 0$. However, this model is quite restrictive, as its outer product term $B x_i x_i^\top B^\top \in \mathbb{R}^{p\times p}$ is of rank 1, and the noise covariance term $\sigma^2 I_p$ is diagonal and isotropic. On the other hand, model (2.3) identifies a covariate-associated rank-$d$ (where $d \geq 1$) structure via $\Gamma$ and allows a less restrictive noise covariance structure, which makes the covariance modeling with $x_i$ more flexible than that of Seiler and Holmes (2017). In particular, the outcome dimension reduction via $\Gamma$ implicit in model (2.3) offers computational advantages by working with low dimensional ($d$-by-$d$) covariances (rather than full $p$-by-$p$ covariances), which can be particularly advantageous when the number of within-subject time points ($T_i$) is relatively small compared to the signal dimension $p$. The general outer product approach proposed by Hoff and Niu (2012) replaces $\sigma^2 I_p$ by a $p \times p$ SPD matrix, requiring a large number of parameters (which can scale quadratically in $p$). The approaches proposed in Fox and Dunson (2015) and Zou et al. (2017) also model the whole $p \times p$ matrix $\Sigma_i$, which may make interpretation challenging for large matrices (Zhao et al. 2021b).

Zhao et al. (2021b) considered CAP regression, $\mathrm{var}(\gamma^{(k)\top} Y_{it}) = \exp(x_i^\top \beta^{(k)})$, where the PDCs $\gamma^{(k)} \in \mathbb{R}^p$ $(k=1,\ldots,d)$ are sequentially estimated subject to the identifiability constraints $\gamma^{(k)\top} \bar{\Sigma} \gamma^{(k)} = 1$ (in which $\bar{\Sigma}$ is a $p \times p$ covariance representative of the overall study population) and $\gamma^{(k)} \perp \gamma^{(k')}$ $(k \neq k')$. However, under a sequential optimization framework, joint inference on the outcome projection matrix $\Gamma = [\gamma^{(1)},\ldots,\gamma^{(d)}] \in \mathbb{R}^{p\times d}$ and the regression coefficient matrix $B = [\beta^{(1)},\ldots,\beta^{(d)}] \in \mathbb{R}^{q\times d}$ is not straightforward; thus, Zhao et al. (2021a,b, 2024) conducted bootstrap-based statistical inference only on the coefficients $B$, and not on $\Gamma$. On the other hand, the proposed model (2.1), coupled with the core tensor model (2.2), further accounts for additional heteroscedasticity in the projected outcomes by using subject-level random effects $z_i$ to relax the model assumptions, while simultaneously modeling all the relevant parameters $(\Gamma, B, \Psi_i, \Omega)$, allowing for a more coherent downstream analysis that improves model interpretability, as we discuss in Section 4.

2.2 Tangent space parametrization of dimension-reduced covariance

Due to the constraint $v^\top \Sigma_i v > 0$ for all nonzero $v \in \mathbb{R}^p$, the space $\mathrm{Sym}_p^+$ of covariance matrices $\{\Sigma_i\}$ forms a curved manifold which does not conform to Euclidean geometry; for example, the negative of an SPD matrix and some linear combinations of SPD matrices are not SPD (Schwartzman 2016). Thus, analyzing $\Sigma_i$ in a Euclidean vector space is not adequate to capture the curved nature of the space and leads to biased estimation of PDCs (Zhao et al. 2021b). However, $\mathrm{Sym}_p^+$ is a Riemannian manifold under the affine-invariant Riemannian metric (AIRM) (Pennec et al. 2006), whose tangent space forms a vector space. We will use a Riemannian parametrization of SPD matrices in estimating the PDCs in this paper. A tangent space projection requires selection of a reference point that is close to the $\Sigma_i$ $(i=1,\ldots,n)$ to be projected. A sensible reference point on $\mathrm{Sym}_p^+$ is a mean of the $\Sigma_i$ $(i=1,\ldots,n)$, denoted $\bar{\Sigma} \in \mathrm{Sym}_p^+$. We will use the matrix whitening transport of Ng et al. (2016) to bring the covariances $\Sigma_i$ $(i=1,\ldots,n)$ close to $I_p$, by applying matrix whitening based on $\bar{\Sigma}$. The resulting whitened covariances $\bar{\Sigma}^{-1/2} \Sigma_i \bar{\Sigma}^{-1/2}$ are close to the identity matrix $I_p$, at which we can construct a common tangent space for projection.

Remark 2.2.

Here we briefly review some relevant concepts of Riemannian geometry. Let $A \in \mathrm{Sym}_p^+$, and let $T_A(\mathrm{Sym}_p^+)$ be the tangent space at $A$. Given two tangent vectors $X_1, X_2 \in T_A(\mathrm{Sym}_p^+)$ at $A$, the AIRM inner product is $\langle X_1, X_2 \rangle_A = \mathrm{tr}(A^{-1} X_1 A^{-1} X_2)$. Given $X \in T_A(\mathrm{Sym}_p^+)$, there is a unique geodesic, denoted $\gamma(t) \in \mathrm{Sym}_p^+$, such that $\gamma(0) = A$ and $\gamma'(0) = X$,

$$\gamma(t) = \mathrm{Exp}_A(tX) = A^{1/2} \exp\big(t\, A^{-1/2} X A^{-1/2}\big) A^{1/2} \qquad (2.4)$$

which connects $A$ to a point $B = \gamma(1) \in \mathrm{Sym}_p^+$ when evaluated at $t = 1$. For $X \in T_A(\mathrm{Sym}_p^+)$, the Exponential map, defined as $\mathrm{Exp}_A(X) := \gamma(1) \in \mathrm{Sym}_p^+$, projects the given $X$ to a point $B \in \mathrm{Sym}_p^+$, in such a way that the distance between $A$ and $X$ on the tangent plane equals the geodesic distance between $A$ and $B$ on the manifold. The (AIRM) Log map, the inverse of $\mathrm{Exp}_A(X)$, projects the point $B \in \mathrm{Sym}_p^+$ back to the tangent vector,

$$X = \mathrm{Log}_A(B) = A^{1/2} \log\big(A^{-1/2} B A^{-1/2}\big) A^{1/2} \in T_A(\mathrm{Sym}_p^+), \qquad (2.5)$$

and we can re-express the geodesic (2.4) as $\gamma(t) = \mathrm{Exp}_A\big(t\, \mathrm{Log}_A(B)\big)$, $t \in [0,1]$. The corresponding geodesic distance between $A$ and $B$ is $d(A,B) = \langle \mathrm{Log}_A(B), \mathrm{Log}_A(B) \rangle_A^{1/2} = \|\log(A^{-1/2} B A^{-1/2})\|_F$, where $\|\cdot\|_F$ is the Frobenius norm.
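For readers who want to experiment with these maps, the following is a minimal sketch of the AIRM Exponential map (2.4), Log map (2.5), and geodesic distance in plain NumPy, using eigendecompositions of symmetric matrices (our own helper functions, not the paper's code):

```python
import numpy as np

def _sym_fun(A, f):
    # apply the scalar function f to the eigenvalues of a symmetric matrix A
    w, V = np.linalg.eigh(A)
    return (V * f(w)) @ V.T

def exp_map(A, X):
    # Exp_A(X) = A^{1/2} exp(A^{-1/2} X A^{-1/2}) A^{1/2}   (eq. 2.4 at t = 1)
    Ah, Ahi = _sym_fun(A, np.sqrt), _sym_fun(A, lambda w: w ** -0.5)
    return Ah @ _sym_fun(Ahi @ X @ Ahi, np.exp) @ Ah

def log_map(A, B):
    # Log_A(B) = A^{1/2} log(A^{-1/2} B A^{-1/2}) A^{1/2}   (eq. 2.5)
    Ah, Ahi = _sym_fun(A, np.sqrt), _sym_fun(A, lambda w: w ** -0.5)
    return Ah @ _sym_fun(Ahi @ B @ Ahi, np.log) @ Ah

def geodesic_dist(A, B):
    # d(A, B) = || log(A^{-1/2} B A^{-1/2}) ||_F
    Ahi = _sym_fun(A, lambda w: w ** -0.5)
    return np.linalg.norm(_sym_fun(Ahi @ B @ Ahi, np.log), 'fro')
```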

In this paper, for each dimension reducing matrix $\Gamma \in \mathbb{R}^{p\times d}$, we will use $\bar{\Psi} := \Gamma^\top \bar{\Sigma} \Gamma \in \mathrm{Sym}_d^+$, where $\bar{\Sigma}$ is a fixed representative population-level covariance, to "whiten" the individual-level dimension-reduced covariances $\Psi_i = \Gamma^\top \Sigma_i \Gamma \in \mathrm{Sym}_d^+$ $(i=1,\ldots,n)$ of model (2.3). Specifically, we will normalize $\Psi_i$ by $\bar{\Psi}^{-1/2}$ (where $\bar{\Psi}^{-1/2}$ is computed from the eigendecomposition of $\bar{\Psi} = \Gamma^\top \bar{\Sigma} \Gamma$), so that the resulting individual "whitened" SPD matrix $\Psi_i^* := \bar{\Psi}^{-1/2} \Psi_i \bar{\Psi}^{-1/2}$ $(= \Gamma^\top \bar{\Sigma}^{-1/2} \Sigma_i \bar{\Sigma}^{-1/2} \Gamma)$ $(i=1,\ldots,n)$ is close to the identity matrix $I_d$. We will parametrize these $\Psi_i^*$ $(i=1,\ldots,n)$ in the tangent space at $I_d$, by projecting $\Psi_i^*$ at $I_d$ using the Log map,

$$\mathrm{Log}_{I_d}(\Psi_i^*) = \log(\Psi_i^*) = \log\big(\bar{\Psi}^{-1/2} \Psi_i \bar{\Psi}^{-1/2}\big) \;\big(=: \phi_{\bar{\Psi}}(\Psi_i)\big), \qquad (2.6)$$

locally mapping the bipoint $(\bar{\Psi}, \Psi_i) \in \mathrm{Sym}_d^+ \times \mathrm{Sym}_d^+$ to an element of the tangent space at $I_d$. For notational convenience, in (2.6) we denote the Log map $\log(\bar{\Psi}^{-1/2} \Psi_i \bar{\Psi}^{-1/2})$ given $\bar{\Psi}$ as $\phi_{\bar{\Psi}}(\Psi_i) \in \mathbb{R}^{d\times d}$, which is no longer bound by the positive definiteness constraint (Pervaiz et al. 2020) and lies in a vector space. Then, treating $\Psi_i$ as a local perturbation of $\bar{\Psi}$ in tangent space, we model $\phi_{\bar{\Psi}}(\Psi_i)$ in (2.6) by a linear model of the form

$$\phi_{\bar{\Psi}}(\Psi_i) = \mathrm{diag}(\tilde{B} x_i + \tilde{z}_i), \qquad (2.7)$$

where the linear predictor $\tilde{B} x_i + \tilde{z}_i \in \mathbb{R}^d$ lies in an (unrestricted) Euclidean vector space. Upon parametrizing $\phi_{\bar{\Psi}}(\Psi_i)$ (with appropriate priors on $\tilde{B}$ and $\tilde{z}_i \sim N(0,\Omega)$), we re-map these covariate-parametrized objects $\phi_{\bar{\Psi}}(\Psi_i)$ in (2.7) to the original space $\mathrm{Sym}_d^+$, by first taking the Exponential map, $\mathrm{Exp}\big(\phi_{\bar{\Psi}}(\Psi_i)\big) = \exp\big(\phi_{\bar{\Psi}}(\Psi_i)\big)$ (i.e. taking (2.4) at $t = 1$ and $A = I_d$), and then translating back to the base point $\bar{\Psi} = \Gamma^\top \bar{\Sigma} \Gamma$ through "de-whitening," yielding

$$\Psi_i = \bar{\Psi}^{1/2} \exp\big(\phi_{\bar{\Psi}}(\Psi_i)\big)\, \bar{\Psi}^{1/2}, \qquad \bar{\Psi} = \Gamma^\top \bar{\Sigma}\, \Gamma, \qquad (2.8)$$

which completes our parametrization of the core tensor $\Psi_i$ in (2.3). To define the mapping (2.6), we select $\bar{\Sigma}$ to be an estimate of the Euclidean average of the $\Sigma_i$. Among the estimators examined in previous works (Dadi et al. 2019; Pervaiz et al. 2020), this choice of $\bar{\Sigma}$ showed stable performance across various scenarios. We set $\bar{\Sigma} = \frac{1}{n}\sum_{i=1}^n \hat{\Sigma}_i$, where $\hat{\Sigma}_i = \frac{1}{T_i}\sum_{t=1}^{T_i} Y_{it} Y_{it}^\top$.
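The whitening, tangent space projection (2.6), and de-whitening (2.8) steps can be sketched as follows (an illustration under the reconstruction above, not the paper's implementation; the eigendecomposition helper is the same as in the previous sketch):

```python
import numpy as np

def _sym_fun(A, f):
    # apply the scalar function f to the eigenvalues of a symmetric matrix A
    w, V = np.linalg.eigh(A)
    return (V * f(w)) @ V.T

def phi(Psi_i, Psi_bar):
    # phi_{Psi_bar}(Psi_i) = log(Psi_bar^{-1/2} Psi_i Psi_bar^{-1/2})   (2.6)
    W = _sym_fun(Psi_bar, lambda w: w ** -0.5)   # whitening factor
    return _sym_fun(W @ Psi_i @ W, np.log)

def dewhiten(phi_i, Psi_bar):
    # Psi_i = Psi_bar^{1/2} exp(phi_i) Psi_bar^{1/2}                    (2.8)
    H = _sym_fun(Psi_bar, np.sqrt)               # de-whitening factor
    return H @ _sym_fun(phi_i, np.exp) @ H
```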

2.3 Posterior inference

2.3.1 Prior and likelihood specification

We perform posterior inference on the tangent space parametrized model (2.7), which is then mapped to parametrization (2.2). Let $\mathcal{D}$ represent the observed data, let $\Psi$ denote the collection $\{\Psi_i\}_{i=1}^n$, and let $Y_i = \{Y_{i1},\ldots,Y_{iT_i}\}$. The posterior of the parameters $(\Gamma, \Psi, \tilde{B}, \Omega)$ can be expressed as the product of a prior and the likelihood,

$$p(\Gamma, \Psi, \tilde{B}, \Omega \mid \mathcal{D}) \propto p(\Gamma, \Psi, \tilde{B}, \Omega) \prod_{i=1}^n p(Y_i \mid \Gamma, \Psi, \tilde{B}, \Omega). \qquad (2.9)$$

The covariate-relevant component of the likelihood for subject $i$ under (2.1) is

$$
\begin{aligned}
p(Y_i \mid \Gamma, \Psi, \tilde{B}, \Omega)
&\propto \big|\Gamma \Psi_i \Gamma^\top + L_i L_i^\top\big|^{-T_i/2}
\exp\Big(-\frac{1}{2}\sum_{t=1}^{T_i} Y_{it}^\top \big(\Gamma \Psi_i^{-1} \Gamma^\top + L_i (L_i^\top L_i)^{-1} L_i^\top\big) Y_{it}\Big) \\
&\propto |\Psi_i|^{-T_i/2} \exp\Big(-\frac{1}{2}\sum_{t=1}^{T_i} Y_{it}^\top \Gamma \Psi_i^{-1} \Gamma^\top Y_{it}\Big)\,
|\Xi_i|^{-T_i/2} \exp\Big(-\frac{1}{2}\sum_{t=1}^{T_i} Y_{it}^\top L_i \Xi_i^{-1} L_i^\top Y_{it}\Big) \\
&\propto |\Psi_i|^{-T_i/2} \exp\Big(-\frac{1}{2}\sum_{t=1}^{T_i} \mathrm{tr}\big(Y_{it}^\top \Gamma \Psi_i^{-1} \Gamma^\top Y_{it}\big)\Big) \\
&\propto \big|\exp\big(\phi_{\bar{\Psi}}(\Psi_i)\big)\big|^{-T_i/2}
\exp\Big(-\frac{1}{2}\sum_{t=1}^{T_i} \mathrm{tr}\Big(Y_{it}^\top \bar{\Sigma}^{-1/2}\Gamma \big(\exp\big(\phi_{\bar{\Psi}}(\Psi_i)\big)\big)^{-1}\Gamma^\top \bar{\Sigma}^{-1/2} Y_{it}\Big)\Big) \qquad (2.10)
\end{aligned}
$$

where the last line follows from the tangent-space parametrization (2.8) of $\Psi_i$. Equation (2.10) indicates that the likelihood takes the form of a Gaussian likelihood of transformed responses,

$$\Gamma^\top \bar{\Sigma}^{-1/2} Y_{it} \sim N\big(0, \exp(\phi_{\bar{\Psi}}(\Psi_i))\big) = N\big(0, \exp(\mathrm{diag}(\tilde{B} x_i + \tilde{z}_i))\big), \qquad (2.11)$$

and no attempt is made to estimate the parameters $L_i$ in (2.1), which are unrelated to $x_i$.

We specify the prior $p(\Gamma, \Psi, \tilde{B}, \Omega) = p(\Gamma, \tilde{B}, \Omega)\, p(\Psi \mid \Gamma, \tilde{B}, \Omega)$ in (2.9) as

$$p(\Gamma)\, p(\tilde{B})\, p(\Omega)\, \exp\Big\{-\frac{1}{2}\sum_{i=1}^n \big(\vec{\phi}_{\bar{\Psi}}(\Psi_i) - \tilde{B} x_i\big)^\top \Omega^{-1} \big(\vec{\phi}_{\bar{\Psi}}(\Psi_i) - \tilde{B} x_i\big) - \frac{n}{2} \log|\Omega|\Big\}, \qquad (2.12)$$

using independent priors $p(\Gamma, \tilde{B}, \Omega) = p(\Gamma)\,p(\tilde{B})\,p(\Omega)$ and a conditional prior on $\Psi = \{\Psi_i\}_{i=1}^n$ given $(\Gamma, \tilde{B}, \Omega)$ based on $\phi_{\bar{\Psi}}(\Psi_i) = \mathrm{diag}(\tilde{B} x_i + \tilde{z}_i)$. In (2.12), $\vec{\phi}_{\bar{\Psi}}(\Psi_i) \in \mathbb{R}^d$ denotes the vector of diagonal elements of $\phi_{\bar{\Psi}}(\Psi_i) \in \mathbb{R}^{d\times d}$. For $\tilde{B} \in \mathbb{R}^{d\times q}$, we use a mean-zero matrix Gaussian prior with element-wise standard deviations $\sigma_{\tilde{B}_{jk}} > 0$. For $\Omega \in \mathrm{Sym}_d^+$, which we decompose into $\mathrm{diag}(\omega)\, \tilde{\Omega}\, \mathrm{diag}(\omega)$, we use a unit-scale half-Cauchy prior (Gelman 2006; Polson and Scott 2012) on each element of the standard deviation vector $\omega \in \mathbb{R}^d$ (allowing for the possibility of extreme values) and a Lewandowski-Kurowicka-Joe (LKJ) prior (Lewandowski et al. 2009) on the correlation matrix $\tilde{\Omega}$ with hyperparameter $\eta > 0$ (specifying the amount of expected prior correlation). For $\Gamma \in \mathbb{R}^{p\times d}$, we use a matrix angular central Gaussian (MACG) prior (Chikuse 1990; Jupp and Mardia 1999) with hyperparameter $\Phi \in \mathrm{Sym}_p^+$. An orthonormal random matrix $\Gamma$ is said to be distributed as MACG (with parameter $\Phi$) if $\Gamma \overset{d}{=} U(U^\top U)^{-1/2}$, where $U \in \mathbb{R}^{p\times d}$ follows a $p \times d$ matrix normal distribution, whose density is

$$f_U(U) = (2\pi)^{-pd/2} |\Phi|^{-d/2} \exp\big(-\mathrm{tr}(U^\top \Phi^{-1} U)/2\big). \qquad (2.13)$$

If the row covariance $\Phi = I_p$, then the prior on $U$ encodes no spatial information. In our illustrations, we employed flat priors on $\Gamma$ and the correlation matrix $\tilde{\Omega}$ (with $\Phi = I_p$ and $\eta = 1$, respectively), and weakly informative priors on $\tilde{B}$, using $\sigma^2_{\tilde{B}_{jk}} = 2.5^2$.

2.3.2 Posterior computation via polar expansion

Markov chain Monte Carlo (MCMC) sampling of $\Gamma$ from the posterior (2.9) is challenging due to the restriction that $\Gamma$ lies in a Stiefel manifold. We use polar expansion to transform the orthonormal parameter $\Gamma$ into an unconstrained object ($U$) to work around this restriction. Generally, "parameter expansion" of a statistical model refers to methods that expand the parameter space by introducing redundant working parameters for computational purposes (Jauch et al. 2021). By the polar decomposition (Higham 1986), any matrix $U \in \mathbb{R}^{p\times d}$ can be decomposed into two components,

$$U = \Gamma_U S_U, \qquad (2.14)$$

where the first component $\Gamma_U := U(U^\top U)^{-1/2} \in \mathbb{R}^{p\times d}$ is an orthonormal (rotation) matrix, and the second, $S_U := (U^\top U)^{1/2} \in \mathbb{R}^{d\times d}$, is a symmetric positive semidefinite (stretch tensor) matrix.
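In practice, the polar factors in (2.14) are conveniently computed from a singular value decomposition; a minimal sketch (assuming $U$ has full column rank; our own illustration):

```python
import numpy as np

def polar(U):
    # Polar decomposition U = Gamma_U S_U (eq. 2.14) via the SVD U = W D V^T:
    # Gamma_U = U (U^T U)^{-1/2} = W V^T (orthonormal factor),
    # S_U = (U^T U)^{1/2} = V D V^T (symmetric positive semidefinite factor).
    W, D, Vt = np.linalg.svd(U, full_matrices=False)
    Gamma_U = W @ Vt
    S_U = Vt.T @ (D[:, None] * Vt)
    return Gamma_U, S_U
```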

The MACG prior on $\Gamma$, with the prior on $U$ in (2.13), allows for posterior inference on $U$ (rather than directly on $\Gamma$). By employing the polar expansion of $\Gamma_U$ to $U$ in (2.14), we "parameter expand" the orthonormal $\Gamma_U$ to an unconstrained $U$. The expanded parameter maintains the same model likelihood $p(\mathcal{D} \mid \Gamma_U, \Psi, \tilde{B}, \Omega)$ as in (2.10). However, the prior $p(\Gamma_U, \Psi, \tilde{B}, \Omega)$ in (2.12) expands to $p(U, \Psi, \tilde{B}, \Omega)$ under parametrization (2.14), leading to the corresponding posterior expansion from $p(\Gamma_U, \Psi, \tilde{B}, \Omega \mid \mathcal{D})$ in (2.9) to $p(U, \Psi, \tilde{B}, \Omega \mid \mathcal{D})$. Using MCMC, we first approximate samples from the expanded posterior $p(U, \Psi, \tilde{B}, \Omega \mid \mathcal{D})$, and then apply the polar decomposition (2.14) to obtain samples from the posterior of $\Gamma_U$, which can be verified via a change of variables from $U$ to $\Gamma_U$. Specifically, given a Markov chain $\{U^s, \Psi^s, \tilde{B}^s, \Omega^s\}$ with stationary distribution proportional to $p(U, \Psi, \tilde{B}, \Omega \mid \mathcal{D})$, we approximate the posterior of $\Gamma$ by $\{\Gamma^s\}$, where $\Gamma^s = U^s(U^{s\top} U^s)^{-1/2}$ for each $s$, yielding approximate samples from $p(\Gamma, \Psi, \tilde{B}, \Omega \mid \mathcal{D})$.

In this paper, we approximate the posterior distribution of the parameters $(U, \Psi, \tilde{B}, \Omega)$ using an adaptive Hamiltonian Monte Carlo (HMC) sampler (Neal 2011) with automatic differentiation and adaptive tuning, implemented in Stan (2023). Consequently, we obtain HMC posterior samples of $(\Gamma, \Psi, B, \Omega)$. The mapping between $B$ and $\tilde{B}$ is given in Supplementary Materials S1. As in any PCA-type analysis, there is a sign non-identifiability in $\Gamma$: the matrix $\Gamma$ is identifiable only up to sign changes of each component; that is, the component vectors $\gamma^{(k)}$ and $-\gamma^{(k)}$ correspond to the same direction. We align the posterior samples $\{\gamma^{(k)}_s\}$ as follows. For the first post-warmup sample $\gamma^{(k)}_1$, let $j_1 = \arg\max_j |\gamma^{(k)}_{j,1}|$. For $s \geq 2$, we compare the sign of $\gamma^{(k)}_{j_1,s}$ with that of $\gamma^{(k)}_{j_1,1}$, and if the signs disagree, we multiply $\gamma^{(k)}_s$ by $-1$. The aligned $\gamma^{(k)}_s$'s are used to construct the credible intervals of $\gamma^{(k)}$. In Sections 3 and 4, we employed a burn-in of 700 steps, during which Stan optimizes the tuning parameters of the HMC sampler. After burn-in, we ran HMC for an additional 1300 steps to generate 1300 post-warmup samples. Convergence was assessed by examining traceplots of random parameter subsets.
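A small sketch of the sign alignment rule just described, applied to an array of posterior draws of one loading vector (hypothetical shapes; our own illustration, not the paper's code):

```python
import numpy as np

def align_signs(gamma_draws):
    """Align posterior draws of one loading vector gamma^(k) up to sign.

    gamma_draws: array of shape (S, p), one row per post-warmup draw."""
    ref = gamma_draws[0]
    j1 = int(np.argmax(np.abs(ref)))            # anchor coordinate of draw 1
    flip = np.sign(gamma_draws[:, j1]) != np.sign(ref[j1])
    out = gamma_draws.copy()
    out[flip] *= -1.0                           # flip draws whose anchor sign disagrees
    return out
```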

Unlike ICA, where the order of the extracted components is relatively arbitrary, the components $\gamma^{(k)\top} Y_{it}$ $(k=1,\ldots,d)$ in (2.1), specified by $\Gamma = [\gamma^{(1)},\ldots,\gamma^{(d)}] \in \mathbb{R}^{p\times d}$, can be ranked by the sample variance of the expected log-variance $E[\log \check{\Psi}_i^{(k)} \mid \mathcal{D}]$ they explain across observations $i=1,\ldots,n$, where $\log \check{\Psi}_i^{(k)} = x_i^\top \beta^{(k)}$ $(k=1,\ldots,d)$; here we exclude the subject-level random effects $z_i^{(k)} \in \mathbb{R}$ to quantify only covariate-associated heteroscedasticity. Specifically, we sort the $d$ estimated components in decreasing order of the magnitude of the sample variance $V^{(k)} = \sum_{i=1}^n \big\{E[\log \check{\Psi}_i^{(k)} \mid \mathcal{D}] - \frac{1}{n}\sum_{i'=1}^n E[\log \check{\Psi}_{i'}^{(k)} \mid \mathcal{D}]\big\}^2$ $(k=1,\ldots,d)$ of the expected log-variance $E[\log \check{\Psi}_i^{(k)} \mid \mathcal{D}]$ attributable to $x_i$.

2.3.3 Determination of the number d of components

We propose a selection criterion based on the Watanabe-Akaike information criterion (WAIC) (Watanabe 2010), which can be used to estimate the expected log posterior. Given a fixed $d$, we compute the log pointwise predictive density (LPPD) of the dimension-reduced model, penalized by the WAIC effective degrees of freedom, $\hat{r}_{\mathrm{waic}}$ (e.g. Gelman et al. (2014)). Specifically, to select the dimensionality $d$ of the covariate-assisted outcome projection, we compare two models in the projected outcome space: one incorporating covariate-explained heteroscedasticity, $\Gamma^\top Y_{it} \sim N\big(0, \check{\Psi}_i = \exp(\mathrm{diag}(B x_i))\big)$, and the other without heteroscedasticity, $\Gamma^\top Y_{it} \sim N\big(0, \bar{\Psi} = \Gamma^\top \bar{\Sigma}\Gamma\big)$. The expected deviance (scaled by $-2$) is estimated by

$$-2\sum_{i=1}^n \sum_{t=1}^{T_i} \frac{1}{S}\sum_{s=1}^S \log R^{(s)} + 2\hat{r}_{\mathrm{waic}}, \qquad (2.15)$$

where $\hat{r}_{\mathrm{waic}} = \sum_{i=1}^n \sum_{t=1}^{T_i} \frac{1}{S}\sum_{s=1}^S \big(\log R^{(s)} - \frac{1}{S}\sum_{s'=1}^S \log R^{(s')}\big)^2$, in which $R^{(s)} = \frac{p(\Gamma^{(s)\top} Y_{it} \mid \check{\Psi}_i^{(s)})}{p(\Gamma^{(s)\top} Y_{it} \mid \bar{\Psi}^{(s)})}$, i.e. the posterior ratio of the two models with vs. without covariate-explained heteroscedasticity, computed from the MCMC posterior parameter samples $(s=1,\ldots,S)$. If the covariates $x_i$ are predictive of the covariances $\Gamma^\top \Sigma_i \Gamma$ along all PDCs $\Gamma \in \mathbb{R}^{p\times d}$ of rank $d$, then the corresponding expected log posterior, $E[\log p(\Gamma^\top Y_{it} \mid \check{\Psi}_i)]$, will be large. However, for a rank $d$ that is too large, the covariates may not predict the covariances $\Gamma^\top \Sigma_i \Gamma$ in all posited directions $\Gamma$, leading to a smaller expected log posterior ratio, $E\big[\log \frac{p(\Gamma^\top Y_{it} \mid \check{\Psi}_i)}{p(\Gamma^\top Y_{it} \mid \bar{\Psi})}\big] = E[\log p(\Gamma^\top Y_{it} \mid \check{\Psi}_i)] - E[\log p(\Gamma^\top Y_{it} \mid \bar{\Psi})]$, compared to that at the optimal projected outcome dimension $d$. Considering the ratio is crucial for making this criterion comparable across different $d$'s, and we select the $d$ that minimizes this expected deviance. In Supplementary Materials S2, we demonstrate the validity of this criterion in selecting the correct number of covariate-relevant heteroscedasticity components.
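A sketch of criterion (2.15), assuming the pointwise log posterior ratios $\log R^{(s)}$ have already been computed from MCMC draws and arranged in an $S \times N$ array ($N = \sum_i T_i$ observation-time pairs); the computation follows (2.15) as reconstructed above:

```python
import numpy as np

def expected_deviance(logR):
    """Criterion (2.15): logR has shape (S, N), one row per posterior draw s,
    one column per observation pair (i, t)."""
    fit_term = -2.0 * np.mean(logR, axis=0).sum()    # -2 x mean log ratio, summed
    r_waic = np.var(logR, axis=0).sum()              # WAIC effective d.o.f.
    return fit_term + 2.0 * r_waic

# d is chosen by evaluating expected_deviance over candidate ranks d = 1, 2, ...
# and taking the minimizer.
```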

3 Simulation illustration

3.1 Simulation setup

For each unit (subject) $i$, we simulate a set of outcome signals $Y_{it} \in \mathbb{R}^p$ $(t=1,\ldots,T_i)$ $(i=1,\ldots,n)$ from a Gaussian distribution with mean zero and $p \times p$ unit-specific covariance $\Sigma_i$. We vary $n \in \{100, 200, 300, 400\}$, $T_i \in \{10, 20, 30\}$, and $p \in \{10, 20\}$. We use model (2.3) to generate $\Sigma_i \in \mathbb{R}^{p\times p}$, where the core SPD matrix $\Psi_i = \exp(\mathrm{diag}(B x_i + z_i)) \in \mathrm{Sym}_d^+$ with $d = 2$, where $x_i = (1, x_{i1}, x_{i2}, x_{i3}, x_{i4})^\top \in \mathbb{R}^q$, is defined based on the subject-level linear predictors $B x_i + z_i$,

$$B x_i + z_i = \begin{pmatrix} 0.1 & 0.4 & 0.5 & 0.5 & 0.5 \\ 0.1 & 0.3 & 0.4 & 0.4 & 0.4 \end{pmatrix}\begin{pmatrix} 1 \\ x_{i1} \\ x_{i2} \\ x_{i3} \\ x_{i4} \end{pmatrix} + z_i = \beta_0 + \begin{pmatrix} (x_{i1}, x_{i2}, x_{i3}, x_{i4})\, \beta^{(1)} \\ (x_{i1}, x_{i2}, x_{i3}, x_{i4})\, \beta^{(2)} \end{pmatrix} + z_i$$

of dimension $d = 2$, where $\beta_0 = (0.1, 0.1)^\top \in \mathbb{R}^2$ is the intercept vector, and $\beta^{(1)} = (0.4, 0.5, 0.5, 0.5)^\top$ and $\beta^{(2)} = (0.3, 0.4, 0.4, 0.4)^\top$ are the regression coefficients for $(x_{i1}, x_{i2}, x_{i3}, x_{i4})^\top \in \mathbb{R}^{q-1}$. We generate covariates $x_{i1} \overset{iid}{\sim} \mathrm{Bernoulli}(0.5)$ and $x_{i2}, x_{i3}, x_{i4} \overset{iid}{\sim} N(0, 1^2)$, and the subject-specific random effects $z_i \overset{iid}{\sim} N(0, \Omega)$, where $\Omega = \begin{pmatrix} \omega_{11} & \omega_{12} \\ \omega_{12} & \omega_{22} \end{pmatrix} = \begin{pmatrix} 0.5^2 & 0.1 \\ 0.1 & 0.5^2 \end{pmatrix}$, to define $\Psi_i$.

For each simulation run, we use the von Mises-Fisher distribution to randomly generate an orthonormal basis matrix $[\Gamma, L] \in \mathbb{R}^{p\times p}$ for $Y_{it} \in \mathbb{R}^p$, and its subcomponent $L \in \mathbb{R}^{p\times(p-d)}$ is further transformed by subject-specific orthonormal matrices $A_i \in \mathbb{R}^{(p-d)\times(p-d)}$, each randomly generated from the von Mises-Fisher distribution. Then the "noise" covariance components $L_i L_i^\top = L A_i \exp(\mathrm{diag}(\epsilon_i)) A_i^\top L^\top \in \mathbb{R}^{p\times p}$ are specified by generating $\epsilon_i \in \mathbb{R}^{p-d}$ with elements $\epsilon_{ij} \overset{iid}{\sim} N(0, 0.5^2)$, whereas $\Gamma \exp(\mathrm{diag}(B x_i + z_i)) \Gamma^\top \in \mathbb{R}^{p\times p}$ specifies the "signal" component. For each simulation run, we compute the base covariance $\bar{\Sigma}$, used for the tangent-space parametrization of model (2.3), as the sample marginal covariance on the training sample.

To investigate the robustness of the method against model misspecification, we further consider the case where there are no common eigenvectors $\Gamma$ across subjects. We apply subject-level random perturbations using the rotation matrices $R(\theta_i) = \begin{pmatrix} \cos(\theta_i) & -\sin(\theta_i) \\ \sin(\theta_i) & \cos(\theta_i) \end{pmatrix}$ with random angles $\theta_i \overset{iid}{\sim} \mathrm{Unif}[-\pi/10, \pi/10]$ $(i=1,\ldots,n)$, and use $\Gamma R(\theta_i) \in \mathbb{R}^{p\times d}$ $(i=1,\ldots,n)$ in place of $\Gamma$ when generating the responses in (2.1); we refer to these as "model misspecification" cases.

3.2 Evaluation metric

We run the simulation 50 times. For each simulation run, we compute, as evaluation metrics, the absolute cosine similarity metric $1 - |\langle \hat{\gamma}^{(k)}, \gamma^{(k)} \rangle|$ for the loading coefficient vectors (where a value close to 0 indicates proximity) and the root mean squared error (RMSE) $\|\hat{\beta}^{(k)} - \beta^{(k)}\|/\sqrt{4}$ $(k = 1, 2)$ for the regression coefficient vectors, as well as the RMSE for the elements of the random effect covariance matrix $\Omega$, $\|(\hat{\omega}_{11}, \hat{\omega}_{12}, \hat{\omega}_{22}) - (\omega_{11}, \omega_{12}, \omega_{22})\|/\sqrt{3}$, where $\hat{\cdot}$ denotes the posterior mean. While we conduct the model estimation using the tangent space parametrization (2.7) with $\tilde{B}$, the results are mapped to the original parametrization with $B$ in (2.3). This approximately amounts to shifting the intercept vector $\beta_0 := (\beta_0^{(1)}, \beta_0^{(2)})^\top \in \mathbb{R}^2$ by the diagonal elements of $\log(\Gamma^\top \bar{\Sigma}^{-1} \Gamma) \in \mathbb{R}^{2\times 2}$ (see Supplementary Materials S1). We report the estimation performance for $\beta_0$ by the RMSE $\|\hat{\beta}_0 - \beta_0\|/\sqrt{2}$ under the original parametrization with $B$. Additionally, to assess whether the constructed credible intervals provide reasonably correct coverage for the true parameter values, we evaluate the posterior credible intervals of the model parameters $(\gamma^{(k)}, \beta^{(k)}, \Omega)$ with respect to the frequentist coverage proportion. Specifically, for each simulation run, we estimate the posterior distribution of the parameters, calculate the 95% posterior credible intervals, and then evaluate how often these intervals contain the true parameter values. We used random initialization of the Markov chains in our posterior sampling.
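The evaluation metrics above can be sketched as follows (a simple illustration of the formulas, not the paper's code):

```python
import numpy as np

def loading_error(gamma_hat, gamma):
    # absolute cosine similarity metric 1 - |<gamma_hat, gamma>| (unit vectors)
    return 1.0 - abs(np.asarray(gamma_hat) @ np.asarray(gamma))

def rmse(theta_hat, theta):
    # ||theta_hat - theta|| / sqrt(m) for an m-dimensional parameter block
    theta_hat, theta = np.asarray(theta_hat), np.asarray(theta)
    return np.linalg.norm(theta_hat - theta) / np.sqrt(theta.size)

def covered(ci_lower, ci_upper, theta):
    # indicator of whether each 95% credible interval contains the truth
    return (np.asarray(ci_lower) <= theta) & (theta <= np.asarray(ci_upper))
```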

3.3 Simulation results

In Fig. 1, as the sample sizes $(n, T)$ increase, the estimation performance tends to improve overall. Particularly when the sample sizes are relatively small (e.g. $n = 100$, $T = 10$), the improvement tends to depend on the magnitude of the covariate effects on the outcome projection components: performance for the parameters of the first component ($\gamma^{(1)}$ and $\beta^{(1)}$) tends to be slightly better than for those of the second component ($\gamma^{(2)}$ and $\beta^{(2)}$), reflecting stronger covariate effects on the first projection component. The number of subjects ($n$) and time points ($T$) both influence performance; increasing $T$ enhances estimation by providing more subject-level information for accurate estimates of the subject-specific random effects and their covariance $\Omega$, and accordingly of the population-level parameters $\gamma^{(k)}$ and $\beta^{(k)}$. The $p = 10$ cases, reported in Supplementary Materials S3, show qualitatively similar results to the $p = 20$ cases.

Fig. 1. The model parameter estimation performance for the p = 20 case, for the loading coefficient vectors $\gamma^{(k)}$ (k = 1, 2), elements of the random effect covariance matrix $\Omega$, regression coefficients $\beta^{(k)}$ (k = 1, 2), and intercept $\beta_0$, averaged across 50 simulation replications, with varying $n \in \{100, 200, 300, 400\}$ and $T \in \{10, 20, 30\}$.

In terms of coverage probability, the results in Table 1 for both the $p = 10$ and $p = 20$ cases indicate that the "actual" coverage probability is reasonably close to the "nominal" coverage probability of 0.95, particularly for the regression coefficients $\beta^{(k)}$ at larger sample sizes (e.g. $n = 400$, $T = 30$). Overall, the results in Table 1 suggest that the Bayesian credible intervals exhibit reasonable frequentist coverage, providing estimates of parameter uncertainty that align with the desired coverage level. In Supplementary Materials S4, we further examine the model's performance under misspecification: 1) when excluding the random effect component $z_i$; and 2) when there are no common "signal" eigenvectors across subjects. Without the random effect, estimation performance remains comparable in terms of bias, but the 95% credible intervals tend to underestimate uncertainty, particularly for the regression coefficients $\beta^{(k)}$. The absence of common covariate-related eigenvectors introduces bias in estimating $\beta^{(k)}$, leading to lower-than-nominal coverage of the credible intervals. The average computation time (on a MacBook with an M3 Max chip and 96 GB unified memory) was about 0.8 hours (SD = 0.16) for obtaining 1300 posterior samples with $n = 400$ subjects, $T = 30$ time points, and $p = 20$.

Table 1.

The proportion of times that the 95% posterior credible intervals contain the true values of the projection loading vectors $\gamma^{(k)}$ (k = 1, 2), regression coefficients $\beta^{(k)}$ (k = 1, 2), and elements of $\Omega$, averaged across 50 simulation replications, with varying $n \in \{100, 200, 300, 400\}$ and $T \in \{10, 20, 30\}$.a

         p = 10                                   p = 20
n    T   γ(1)   γ(2)   β(1)   β(2)   Ω        γ(1)   γ(2)   β(1)   β(2)   Ω
100  10  0.89   0.90   0.93   0.90   0.91     0.86   0.86   0.86   0.84   0.88
100  20  0.85   0.85   0.91   0.87   0.93     0.90   0.89   0.92   0.87   0.94
100  30  0.88   0.87   0.91   0.90   0.91     0.87   0.89   0.88   0.88   0.88
200  10  0.90   0.88   0.95   0.92   0.94     0.90   0.93   0.96   0.94   0.89
200  20  0.92   0.91   0.97   0.92   0.93     0.91   0.92   0.93   0.94   0.93
200  30  0.89   0.89   0.96   0.88   0.94     0.89   0.89   0.90   0.89   0.89
300  10  0.88   0.88   0.90   0.90   0.88     0.90   0.91   0.96   0.92   0.89
300  20  0.90   0.86   0.90   0.84   0.91     0.92   0.90   0.96   0.92   0.90
300  30  0.91   0.90   0.91   0.90   0.91     0.91   0.91   0.94   0.92   0.91
400  10  0.91   0.89   0.96   0.92   0.85     0.90   0.93   0.93   0.92   0.90
400  20  0.94   0.91   0.96   0.96   0.87     0.92   0.91   0.94   0.92   0.93
400  30  0.93   0.92   0.96   0.95   0.89     0.93   0.91   0.92   0.92   0.92
a

Coverage was computed for each entry, then averaged within components ($\gamma^{(k)}$, $\beta^{(k)}$, and $\Omega$) and across the simulation replications (rounded to two decimal places).

4 Application

In this section, we applied the Bayesian CAP regression to data from the HCP. As in Seiler and Holmes (2017), we used the rs-fMRI data from 820 HCP subjects and examined the associations between rs-fMRI and sleep duration. Each subject underwent 4 complete 15-min sessions (with TR = 720 ms, corresponding to 1200 time points per session for each subject), and each 15-min run of each subject's rs-fMRI data was preprocessed according to Smith et al. (2013). We focused on the first session, which is about the typical duration of rs-fMRI studies. We also applied the proposed method to the other three sessions to examine the sensitivity and reliability of this regression (see Supplementary Materials S6, where the covariate-related FC exhibits a high level of consistency across all 4 scanning sessions, with intra-cluster correlation coefficient values of 0.84, 0.72, 0.84, and 0.83 for the 4 identified network components in terms of the log-variance).

We used a data-driven parcellation based on spatial ICA with p = 15 components (i.e. p = 15 data-driven "network nodes"; see Fig. 2 for their most relevant axial slices in MNI152 space) from the HCP PTN (Parcellation + Timeseries + Netmats) dataset, where each subject's rs-fMRI time series were mapped onto the set of ICA maps (Filippini et al. 2009). We refer to Smith et al. (2013) for details about preprocessing and the ICA time series computation. We conduct inference on the association between the FC over these IC network nodes (Smith et al. 2012) and sleep duration, gender, and their interaction.

Fig. 2. Fifteen independent components (ICs) from spatial group-ICA, constituting a data-driven parcellation with 15 components ("network nodes"), provided by the HCP PTN dataset, represented at the most relevant axial slices in MNI152 space. According to Seiler and Holmes (2017), these IC networks correspond to the default network (Net15), cerebellum (Net9), visual areas (Net1, Net3, Net4, and Net8), cognition-language (Net2, Net5, Net10, and Net14), perception-somesthesis-pain (Net2, Net6, Net10, and Net14), sensorimotor (Net7 and Net11), executive control (Net12), and auditory (Net12 and Net13).

As in Seiler and Holmes (2017), we classified the subjects into two groups: 489 conventional sleepers (average sleep duration between 7 and 9 hours each night) and 241 short sleepers (average of 6 hours or less each night). This yielded a total of 730 participants for comparing FC (over the IC networks in Fig. 2) between short and conventional sleepers. Since the time series are temporally correlated, we inferred the equivalent number of independent samples. We computed the effective sample size (ESS) defined by Kass et al. (1998), $\mathrm{ESS} = \min_{i \in \{1,\ldots,n\},\, j \in \{1,\ldots,p\}} T_i \big(1 + 2\sum_{s=1}^{\infty} \mathrm{cor}(Y_{i1}^{(j)}, Y_{i,1+s}^{(j)})\big)^{-1}$, where $Y_{it}^{(j)}$ is the data at time $t$ of the $j$th network node for subject $i$, following a conservative approach that takes the minimum over all $p$ components and $n$ subjects as the overall estimator. Based on the estimated ESS, we performed thinning of the observed time series, subsampling $T_i = T = \mathrm{ESS} = 34$ time points for each subject. The resulting outcome data, $Y_{it}$ $(i=1,\ldots,n)$ $(t=1,\ldots,T)$, were then mean-centered for each subject (so that $\sum_{t=1}^T Y_{it} = 0 \in \mathbb{R}^{15}$ for each $i$), and we focused on the association between their covariances $\Sigma_i$ and the covariates.
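A sketch of the ESS computation and thinning (our own illustration; the lag truncation `max_lag` is an assumption, since the summation horizon is not specified here):

```python
import numpy as np

def ess_kass(Y, max_lag=None):
    """ESS of Kass et al. (1998) for one subject's (T, p) time series:
    per node, T / (1 + 2 * sum of autocorrelations); return the minimum
    over nodes. The overall estimate takes the minimum over subjects."""
    T, p = Y.shape
    L = max_lag if max_lag is not None else T // 2   # assumed truncation
    ess = np.inf
    for j in range(p):
        y = Y[:, j] - Y[:, j].mean()
        acov = np.correlate(y, y, mode='full')[T - 1:] / T   # autocovariances
        rho = acov[1:L + 1] / acov[0]                        # autocorrelations
        ess = min(ess, T / (1.0 + 2.0 * rho.sum()))
    return ess

# thinning to ~ESS time points, e.g. with ESS = 34:
# idx = np.round(np.linspace(0, T - 1, 34)).astype(int); Y_thin = Y[idx]
```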

We used the WAIC criterion (2.15) to identify d = 4 projection components. The models' WAIC values over the range d = 1 to 6 were -227.9, -397.6, -520.4, -602.7, -573.4, and -358.4, with the minimum at d = 4. The parameters ($\Gamma$, $B$, and $\Omega$) with d = 4 are summarized by their posterior means and 95% credible intervals, reported in Supplementary Materials S4. The expected value of the log Deviation from Diagonality (DfD) was 0.60, suggesting a moderate departure from the diagonality of $\Psi_i$ assumed in (2.2), but the deviation is not overly pronounced.

Under model (2.1), for a linear contrast vector $\delta \in \mathbb{R}^q$, we can define the log covariance "contrast" map due to a $\delta$-change in the covariates $x \in \mathbb{R}^q$, which corresponds to $\Gamma\, \mathrm{diag}(B\delta)\, \Gamma^\top \in \mathbb{R}^{p\times p}$ (see Supplementary Materials S7), where $B \in \mathbb{R}^{d\times q}$ is the regression coefficient matrix in (2.2). Specifically, the diagonal elements of this contrast matrix $\Gamma\, \mathrm{diag}(B\delta)\, \Gamma^\top$ can be extracted and exponentiated, representing the response signals' variance ratio (VR) corresponding to a $\delta$-change in the covariates. For the four contrasts derived from the SleepDuration × Gender interaction, the left two column panels in Fig. 3 present the response signals' variance ratios, contrasting (i) short vs. conventional sleepers among males; (ii) short vs. conventional sleepers among females; (iii) male vs. female among short sleepers; and (iv) male vs. female among conventional sleepers.
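Given posterior draws of $(\Gamma, B)$, the contrast map and variance ratios can be computed per draw and then summarized into credible intervals; a minimal sketch (assuming the $\mathrm{diag}(B\delta)$ form reconstructed above):

```python
import numpy as np

def contrast_map(Gamma, B, delta):
    # log covariance contrast Gamma diag(B delta) Gamma^T for a contrast delta
    return Gamma @ np.diag(B @ delta) @ Gamma.T

def variance_ratio(Gamma, B, delta):
    # exponentiated diagonal of the contrast map: per-node variance ratios
    return np.exp(np.diag(contrast_map(Gamma, B, delta)))

# applied to each posterior draw (Gamma_s, B_s), the element-wise 2.5% and
# 97.5% quantiles across draws give the 95% credible intervals
```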

Fig. 3. The response signals' variance ratios (posterior means and 95% credible intervals) corresponding to the four contrasts formed by the Gender-by-SleepDuration interaction. The 95% credible intervals that do not include the variance ratio of 1 are highlighted in red. The sets ("parcel sets") of network nodes whose signals' variances are expected to change in the same direction under the corresponding contrasts are indicated in the last column panels, for the Short vs. Conventional sleeper contrasts in the top row, and for the Male vs. Female contrasts in the bottom row.

In Fig. 3, the nodes, or "parcels," whose VR values were identified (based on 95% credible intervals) as significantly different from 1 all had VR > 1. The third column panels of Fig. 3 indicate the nodes whose signals' variances are expected to change in the same direction, for the Short vs. Conventional sleeper contrasts in the top row panel and the Male vs. Female contrasts in the bottom row panel.

For each contrast $\delta$, we can infer its impact on connectivity via 95% credible intervals on the $p(p+1)/2$ connectivity elements of the contrast matrix $\Gamma\, \mathrm{diag}(B\delta)\, \Gamma^\top$. The first column panels in Fig. 4 display the covariance elements identified as significant, whereas the second column panels display the posterior means of the elements of $\Gamma\, \mathrm{diag}(B\delta)\, \Gamma^\top$, where each row panel corresponds to one contrast $\delta$ in the covariates. The statistical significance maps in Fig. 4 indicate that, overall, there are more substantial connectivity differences between Short and Conventional sleepers (the first two row panels) than between Males and Females (the last two row panels), and the Short vs. Conventional sleeper differences were slightly more pronounced among Males (the first row panel) than among Females (the second row panel). While several connectivity differences between Males and Females were identified among Short sleepers, there were no statistically significant Male vs. Female differences among Conventional sleepers.

Fig. 4. The statistical significance map (left column panels) and the posterior mean (right column panels) of the log covariance contrast $\Gamma\, \mathrm{diag}(B\delta)\, \Gamma^\top$ for each of the four covariate contrasts $\delta$ derived from the SleepDuration × Gender interaction.

One conventional method for analyzing group ICA data involves first computing subject-level Pearson correlations between the ICs, which are then Fisher z-transformed. This is performed on the $p(p-1)/2 = 105$ pairs of correlations (calculated from the 15 ICs), while we apply the element-wise log transformation to the p = 15 diagonal elements. A total of 120 element-wise linear regressions were then conducted on SleepDuration, Gender, and their interaction, and P-values were corrected for multiplicity using the Benjamini-Hochberg (BH) procedure (Benjamini and Hochberg 1995) to control the false discovery rate (FDR) at 0.05. The patterns of connectivity differences implied by each $\delta$-contrast from this mass-univariate approach are presented in Supplementary Material S10 and were similar to the results from the Bayesian CAP in Fig. 4. However, compared with the results from Bayesian CAP, far fewer statistically significant elements (13 vs. 77, out of 480 elements) were identified.
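A sketch of the BH correction step in this mass-univariate comparison, using statsmodels (illustrative only; the p-values here are random placeholders, not the study's regression output):

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

# pvals would hold the 120 p-values from the element-wise regressions of the
# Fisher z-transformed correlations (105) and log-variances (15) on
# SleepDuration, Gender, and their interaction; placeholders for illustration.
pvals = np.random.default_rng(1).uniform(size=120)
reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method='fdr_bh')
```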

While the CAP regression formulation of Zhao et al. (2021b) also alleviates the multiplicity issue and thus can improve statistical power, its inference is limited to the association between covariates and the projected outcome components, making it challenging to interpret covariates' impacts on the measured ROIs directly. Therefore, that approach is not directly comparable with the one proposed here. In Supplementary Materials S8, we display the similarity (between -1 and 1, with 0 indicating orthogonality) of the estimated projection directions from CAP (Zhao et al. 2021b) (in their first four leading components) and those from the proposed Bayesian latent factor model, which shows a positive association for each projection direction, with similarity of at least 0.4. We also report the CAP regression coefficients (with 95% bootstrap confidence intervals) for each estimated projected outcome component.

According to the meta-analysis in Smith et al. (2009), the identified parcel set contrasting Short vs. Conventional sleepers in Fig. 3 mainly corresponds to visual areas (network nodes N1, N3, N4, N8), auditory areas (N12, N13), and sensorimotor areas (N11). Curtis et al. (2016) found that self-reported sleep duration primarily co-varied with FC in auditory, visual, and sensorimotor cortices. Specifically, shorter sleep durations were associated with increased FC among auditory, visual, and sensorimotor cortices (these regions roughly correspond to the network nodes N1, N3, N4, N8, N12, N13, and N11) and decreased FC between these regions and the cerebellum (N9). These positive and negative associations found in Curtis et al. (2016) are consistent with the results in the contrast maps in Fig. 4 that contrast Short vs. Conventional sleepers.

5 Discussion

Extending the frequentist approach developed in Zhao et al. (2021b) under the probabilistic model (2.1), coupled with a geometric formulation of the dimension-reduced covariance objects $\Psi_i$ in (2.3), the proposed Bayesian method provides a framework for conducting inference on all relevant parameters simultaneously, producing more interpretable results regarding how the covariates' effects are expressed in the ROIs. Furthermore, the outcome dimension reduction avoids the need to work with subject-specific full $p$-by-$p$ sample covariance matrices, which can suffer from estimation instability when the number of time points (volumes) is not large (typically the case for fMRI signals). Generally, the CAP formulation of Zhao et al. (2021b) allows for a more targeted and efficient analysis by identifying the specific components of the outcome data relevant to the association between covariates and FC.

Although the computational burden and complexity of working with the full $p$-by-$p$ sample covariance matrix can be significantly alleviated by reducing the dimensionality of the outcome data, the method is generally not suitable for very high-dimensional outcome data, such as voxel-level data, and is better suited for intermediate spaces, such as those produced by ICA or an anatomical parcellation. Overfitting may occur due to the large number of parameters in the estimation of the outcome projection matrix $\Gamma$. Future work will apply prior distributions on the dimension reducing matrix $\Gamma$, as well as on the covariate effect parameters $B$, that promote sparsity, for improved estimation and interpretation in higher dimensional spaces.

As in Zhao et al. (2021a,b, 2024), the assumption made in conducting the inference is that of partially common eigenvectors of the covariance structure (Wang et al. 2021), in which the covariance is decomposed into shared and unique components, where the shared components capture the information related to the covariates. Future endeavors will explore strategies to mitigate concerns related to model misspecification by addressing heterogeneity in these shared components across subjects. We have conducted preliminary thinning of the observed multivariate time series to achieve an effective sample size, involving subsampling to remove temporal dependence. Subsequent investigations will refine this approach to delve into individual differences in dynamic FC (e.g. Zhang et al. 2020; Bahrami et al. 2022), incorporating dimension reduction models that account for both between-subject heterogeneity in spatial patterns and within-subject temporal correlation through state-space modeling of latent factors. This will facilitate a deeper exploration of associations between covariates and FC.

A main challenge in modeling covariance matrices is the positive definiteness constraint. Unlike a mean vector, where a link function can act element-wise, positive definiteness is a constraint on all entries of a covariance matrix jointly (Pourahmadi 2011). One approach is to transform the problem into an unconstrained estimation problem through a transformation such as the Cholesky decomposition, although this requires natural ordering information. An alternative is a more fundamental geometric formulation that views individual covariances as elements of a (nonlinear) manifold. A more global transformation (compared with an entry-wise transformation), such as the matrix log transformation, then maps individual covariances to a tangent space, allowing unconstrained operations. However, a global log transformation poses interpretability challenges, as it generally alters the covariates' impact directions with respect to the measured ROIs. Our geometry-based CAP approach focuses on identifying relevant eigenvectors, while simultaneously estimating eigenvalue-by-covariate associations through a linear model in a tangent space. By assuming and identifying relevant eigenvectors $\Gamma$ that align with the covariates' impact directions, the global log transformation maintains their orientation with respect to the covariates' effects; thus the estimated pairwise covariance contrasts preserve their interpretability as covariate-induced pairwise connectivity differences.

Yet another important challenge is high dimensionality, as the number of covariance elements increases quadratically in the dimension of the response variable. Generally, the CAP regression of Zhao et al. (2021b), and the extension developed here, is useful when there is no need to model the generation of the entire observation vector and one is only interested in isolating a potentially low-dimensional representation of the data that exhibits certain desired characteristics, such as maximizing the model likelihood associated with $x_i$. Such supervised dimension reduction can generally mitigate the curse of dimensionality in covariance modeling.

Supplementary Material

kxae023_Supplementary_Data

Acknowledgements

The author is grateful to Dr Xiaomeng Ju and Dr Thaddeus Tarpey for helpful discussions and to the three reviewers of this manuscript for their constructive reviews.

Supplementary material

Supplementary material is available at Biostatistics Journal online.

Funding

This work was supported by National Institutes of Health (NIH) grant 5 R01 MH099003. Data were provided by the Human Connectome Project, WU-Minn Consortium (Principal Investigators: David Van Essen and Kamil Ugurbil; 1U54MH091657) funded by the 16 NIH Institutes and Centers that support the NIH Blueprint for Neuroscience Research; and by the McDonnell Center for Systems Neuroscience at Washington University.

Conflict of interest statement

None declared.

Data availability

The code used in this paper is accessible at the following GitHub repository: https://github.com/syhyunpark/bcap

References

  1. Bahrami M, Laurienti PJ, Shappell HM, Dagenbach D, Simpson SL. A mixed-modeling framework for whole-brain dynamic network analysis. Network Neurosci. 2022:6(2):591–613. [DOI] [PMC free article] [PubMed] [Google Scholar]
  2. Benjamini Y, Hochberg Y. Controlling the false discovery rate: A practical and powerful approach to multiple testing. J R Stat Soc Ser B (Methodological). 1995:57(1):289–300. [Google Scholar]
  3. Boik RJ. Spectral models for covariance matrices. Biometrika. 2002:89(1):159–182. [Google Scholar]
  4. Cai TT, Li H, Liu W, Xie J. Joint estimation of multiple high-dimensional precision matrices. Stat Sin. 2016:26(2):445–464. [DOI] [PMC free article] [PubMed] [Google Scholar]
  5. Calhoun V, Liu J, Adali T. A review of group ICA for fMRI data and ICA for joint inference of imaging, genetic, and ERP data. Neuroimage. 2009:45(1):S163–S172. [DOI] [PMC free article] [PubMed] [Google Scholar]
  6. Cheng J, Levina E, Wang P, Zhu J. A sparse ising model with covariates. Biometrics. 2014:70(4):943–953. [DOI] [PMC free article] [PubMed] [Google Scholar]
  7. Chikuse Y. The matrix angular central gaussian distribution. J Multivar Anal. 1990:33(2):265–274. [Google Scholar]
  8. Crainiceanu CBS, Luo S, Zipunnikov VMCM, Punjabi NM. Population value decomposition, a framework for the analysis of image populations. J Am Stat Assoc. 2011:106(495):775–790. [DOI] [PMC free article] [PubMed] [Google Scholar]
  9. Curtis BJ, Williams PG, Jones CR, Anderson JS. Sleep duration and resting fMRI functional connectivity: examination of short sleepers with and without perceived daytime dysfunction. Brain Behav. 2016:6(12):e00576. [DOI] [PMC free article] [PubMed] [Google Scholar]
  10. Dadi K, Rahim M, Abraham A, Chyzhyk D, Milham M, Thirion B, Varoquaux G. Benchmarking functional connectome-based predictive models for resting-state fMRI. Neuroimage. 2019:192(15):115–134. [DOI] [PubMed] [Google Scholar]
  11. Dai M, Zhang Z, Srivastava A. Analyzing dynamical functional connectivity as trajectories on space of covariance matrices. IEEE Trans Med Imaging. 2020:39(3):611–620. [DOI] [PMC free article] [PubMed] [Google Scholar]
  12. Danaher P, Wang P, Witten DM. The joint graphical lasso for inverse covariance estimation across multiple classes. J R Stat Soc Ser B. 2014:76:(2)373–397. [DOI] [PMC free article] [PubMed] [Google Scholar]
  13. Davoudi A, Ghidary SS, Sadatnejad K. Dimensionality reduction based on distance preservation to local mean for symmetric positive definite matrices and its application in brain–computer interfaces. J Neural Eng. 2017:14(3):036019. [DOI] [PubMed] [Google Scholar]
  14. Durante D, Dunson DB. Bayesian inference and testing of group differences in brain networks. Bayesian Anal. 2018:13(1):29–58. [Google Scholar]
  15. Eickhoff SB, Yeo BTT, Genon S. Imaging-based parcellations of the human brain. Nat Rev Neurosci. 2018:19(11):672–686. [DOI] [PubMed] [Google Scholar]
  16. Engle RF, Kroner KF. Multivariate simultaneous generalized ARCH. Econ Theory. 1995:11(1):122–150. [Google Scholar]
  17. Filippini N, MacIntosh BJ, Hough MG, Goodwin GM, Frisoni GB, Smith SM, Matthews PM, Beckmann CF, Mackay CE. Distinct patterns of brain activity in young carriers of the apoe-e4 allele. Proc Natl Acad Sci USA. 2009:106(17):7209–7214. [DOI] [PMC free article] [PubMed] [Google Scholar]
  18. Flury BN. Common principal components in k groups. J Am Stat Assoc. 1984:79(388):892–898. [Google Scholar]
  19. Flury BN. Asymptotic theory for common principal component analysis. Ann Stat. 1986:14(2):418–430. [Google Scholar]
  20. Fong PW, Li WK, An HZ. A simple multivariate ARCH model specified by random coefficients. Comput Stat Data Anal. 2006:51(3):1779–1802. [Google Scholar]
  21. Fornito A, Bullmore ET. Connectomic intermediate phenotypes for psychiatric disorders. Front Psychiatry. 2012:3:32. [DOI] [PMC free article] [PubMed] [Google Scholar]
  22. Fornito A, Zalesky A, Breakspear M. Graph analysis of the human connectome: promise, progress, and pitfalls. Neuroimage. 2013:15(80):426–444. [DOI] [PubMed] [Google Scholar]
  23. Fox EB, Dunson DB. Bayesian nonparametric covariance regression. J Mach Learn Res. 2015:16(77):2501–2542. [Google Scholar]
  24. Franks AM. Reducing subspace models for large-scale covariance regression. Biometrics. 2022:78(4):1604–1613. [DOI] [PubMed] [Google Scholar]
  25. Franks AM, Hoff P. Shared subspace models for multi-group covariance estimation. J Mach Learn Res. 2019:20(171):1–37. [Google Scholar]
  26. Friston K. Functional and effective connectivity. Brain Connect. 2011:1(1):13–36. [DOI] [PubMed] [Google Scholar]
  27. Gao W, Ma Z, Xiong C, Gao T. Dimensionality reduction of SPD data based on Riemannian manifold tangent spaces and local affinity. Appl Intell. 2023:53:1887–1911. [Google Scholar]
  28. Gelman A. Prior distributions for variance parameters in hierarchical models. Bayesian Anal. 2006:1(3):515–533. [Google Scholar]
  29. Gelman A, Hwang J, Vehtari A. Understanding predictive information criteria for bayesian models. Stat Comput. 2014:24:997–1016. [Google Scholar]
  30. Grillon ML, Oppenheim C, Varoquaux G, Charbonneau F, Devauchelle AD, Krebs MO, Bayle F, Thirion B, Huron C. Hyperfrontality and hypoconnectivity during refreshing in schizophrenia. Psychiatry Res. 2013:211(3):226–233. [DOI] [PubMed] [Google Scholar]
  31. Guo J., Levina E., Michailidis G., Zhu J. Joint estimation of multiple graphical models. Biometrika. 2011:98:1–15. [DOI] [PMC free article] [PubMed] [Google Scholar]
  32. Ha M, Stingo F, Baladandayuthapani V. Bayesian structure learning in multi-layered genomic networks. J Am Stat Assoc. 2021:116(534):605–618. [DOI] [PMC free article] [PubMed] [Google Scholar]
  33. Harandi M, Salzmann M, Hartley R. Dimensionality reduction on SPD manifolds: the emergence of geometry-aware methods. IEEE Trans Pattern Anal Mach Intell. 2017:40(1):48–62. [DOI] [PubMed] [Google Scholar]
  34. Higham NJ. Computing the polar decomposition—with applications. SIAM J Sci Stat Comput. 1986:7(4):1059–1417. [Google Scholar]
  35. Hinne M, Ambrogioni L, Janssen RJ, Heskes T, van Gerven MA. Structurally-informed bayesian functional connectivity analysis. Neuroimage. 2014:1(86):294–305. [DOI] [PubMed] [Google Scholar]
  36. Hoff P. A hierarchical eigenmodel for pooled covariance estimation. J R Stat Soc Ser B (Stat Methodol). 2009:71(5):971–992. [Google Scholar]
  37. Hoff P, Niu X. A covariance regression model. Stat Sin. 2012:22(2):729–753. [Google Scholar]
  38. Hutchison RM, Womelsdorf T, Allen EA, Bandettini PA, Calhoun VD, Corbetta M, Della PS, Duyn JH, Glover GH, Gonzalez-Castillo J, Handwerker DA, Keilholz S, Kiviniemi V, Leopold DA, de Pasquale F, Sporns O, Walter M. et al. Dynamic functional connectivity: promise, issues, and interpretations. Neuroimage. 2013:80(15):360–378. [DOI] [PMC free article] [PubMed] [Google Scholar]
  39. Jauch M, Hoff PD, Dunson DB. Monte Carlo simulation on the Stiefel manifold via polar expansion. J Comput Graph Stat. 2021:30(3):622–631. [Google Scholar]
  40. Jupp PE, Mardia KV.. Directional Statistics. London: John Wiley & Sons, 1999. [Google Scholar]
  41. Kass RE, Carlin BP, Gelman A, Neal R.. Markov chain monte carlo in practice: a roundtable discussion. Am Stat. 1998:52(2):93–100. [Google Scholar]
  42. Kolar M, Parikh AP, Xing EP.. On sparse nonparametric conditional covariance selection. In: ICML-10, Madison, WI: Omnipress, 2010, 559–566. [Google Scholar]
  43. Leday GG, de Gunst GB, Kpogbezan MC, van der Vaart AW, van Wieringen WN, van de Wiel MA. Gene network reconstruction using global-local shrinkage priors. Ann Appl Stat. 2017:11(1):41–68. [DOI] [PMC free article] [PubMed] [Google Scholar]
  44. Lee KH, Xue L. Nonparametric finite mixture of Gaussian graphical models. Technometrics. 2018:60(4):511–521.
  45. Lewandowski D, Kurowicka D, Joe H. Generating random correlation matrices based on vines and extended onion method. J Multivar Anal. 2009:100(9):1989–2001.
  46. Li B, Solea E. A nonparametric graphical model for functional data with application to brain networks based on fMRI. J Am Stat Assoc. 2018:113(524):1637–1655.
  47. Li L, Zhang X. Parsimonious tensor response regression. J Am Stat Assoc. 2017:112(519):1131–1146.
  48. Li Y, Lu R. Locality preserving projection on SPD matrix Lie group: algorithm and analysis. Sci China Inf Sci. 2018:61:092104.
  49. Lin Z, Wang T, Yang C, Zhao H. On joint estimation of Gaussian graphical models for spatial and temporal data. Biometrics. 2017:73(3):769–779.
  50. Lindquist M. The statistical analysis of fMRI data. Stat Sci. 2008:23(4):439–464.
  51. Liu H, Chen X, Wasserman L, Lafferty J. Graph-valued regression. In: Advances in Neural Information Processing Systems 23 (NIPS 2010), Vancouver, British Columbia, Canada. Curran Associates, Inc.; 2010, 1423–1431.
  52. Lock EF. Tensor-on-tensor regression. J Comput Graph Stat. 2018:27(3):638–647.
  53. Meinshausen N, Buhlmann P. High-dimensional graphs and variable selection with the lasso. Ann Stat. 2006:34(3):1436–1462.
  54. Monti RP, Hellyer P, Sharp D, Leech R, Anagnostopoulos C, Montana G. Estimating time-varying brain connectivity networks from functional MRI time series. Neuroimage. 2014:103:427–443.
  55. Narayan M, Allen GI, Tomson S. Two sample inference for populations of graphical models with applications to functional connectivity. arXiv, arXiv:1502.03853; 2015, preprint: not peer reviewed.
  56. Neal RM. MCMC using Hamiltonian dynamics. In: Handbook of Markov Chain Monte Carlo, Chapter 5. Boca Raton: Chapman and Hall/CRC Press; 2011.
  57. Ng B, Varoquaux G, Poline J, Greicius M, Thirion B. Transport on Riemannian manifold for connectivity-based brain decoding. IEEE Trans Med Imaging. 2016:35(1):208–216.
  58. Ni Y, Stingo FC, Baladandayuthapani V. Bayesian graphical regression. J Am Stat Assoc. 2019:114(525):184–197.
  59. Peng J, Wang P, Zhou N, Zhu J. Partial correlation estimation by joint sparse regression models. J Am Stat Assoc. 2009:104(486):735–746.
  60. Pennec X, Fillard P, Ayache N. A Riemannian framework for tensor computing. Int J Comput Vis. 2006:66(1):41–66.
  61. Pervaiz U, Vidaurre D, Woolrich MW, Smith SM. Optimising network modelling methods for fMRI. Neuroimage. 2020:211:116604.
  62. Peterson CB, Stingo FC, Vannucci M. Bayesian inference of multiple Gaussian graphical models. J Am Stat Assoc. 2015:110(509):159–174.
  63. Polson NG, Scott JG. On the half-Cauchy prior for a global scale parameter. Bayesian Anal. 2012:7(4):887–902.
  64. Pourahmadi M. Covariance estimation: the GLM and regularization perspectives. Stat Sci. 2011:26(3):369–387.
  65. Pourahmadi M, Daniels MJ, Park T. Simultaneous modelling of the Cholesky decomposition of several covariance matrices. J Multivar Anal. 2007:98(3):568–587.
  66. Saegusa T, Shojaie A. Joint estimation of precision matrices in heterogeneous populations. Electronic J Stat. 2016:10(1):1341–1392.
  67. Schwartzman A. Lognormal distributions and geometric averages of symmetric positive definite matrices. Int Stat Rev. 2016:84(3):456–486.
  68. Seiler C, Holmes S. Multivariate heteroscedasticity models for functional brain connectivity. Front Neurosci. 2017:11:696.
  69. Smith SM, Fox PT, Miller KL, Glahn D, Fox PM, Mackay CE, Filippini N, Watkins KE, Toro R, Laird AR, et al. Correspondence of the brain’s functional architecture during activation and rest. Proc Natl Acad Sci USA. 2009:106(31):13040–13045.
  70. Smith SM, Miller KL, Moeller S, Xu J, Auerbach EJ, Woolrich MW, Beckmann CF, Jenkinson M, Andersson J, Glasser MF, Van Essen DC, Feinberg DA, Yacoub ES, et al. Temporally-independent functional modes of spontaneous brain activity. Proc Natl Acad Sci USA. 2012:109(8):3131–3136.
  71. Smith SM, Vidaurre D, Beckmann CF, Glasser MF, Jenkinson M, Miller KL, Nichols TE, Robinson EC, Salimi-Khorshidi G, Woolrich MW, Barch DM, Ugurbil K, et al. Functional connectomics from resting-state fMRI. Trends Cognit Sci. 2013:17(12):666–682.
  72. Stan Development Team. Stan modeling language users guide and reference manual, version 2.35; 2023. https://mc-stan.org.
  73. Sun WW, Li L. STORE: sparse tensor response regression and neuroimaging analysis. J Mach Learn Res. 2017:18(1):4908–4944.
  74. Tan LSL, Jasra A, De Iorio M, Ebbels TMD. Bayesian inference for multiple Gaussian graphical models with application to metabolic association networks. Ann Appl Stat. 2017:11(4):2222–2251.
  75. van der Heuvel M, Hulshoff Pol H. Exploring the brain network: a review on resting-state fMRI functional connectivity. Eur Neuropsychopharmacol. 2010:20(8):519–534.
  76. Van Essen D, Smith S, Barch D, Behrens T, Yacoub E, Ugurbil K, et al. The WU-Minn human connectome project: an overview. Neuroimage. 2013:80:62–79.
  77. Varoquaux G, Baronnet F, Kleinschmidt A, Fillard P, Thirion B. Detection of brain functional-connectivity difference in post-stroke patients using group-level covariance modeling. Med Image Comput Comput Assist Interv. 2010:13(1):200–208.
  78. Wang B, Luo X, Zhao Y, Caffo B. Semiparametric partial common principal component analysis for covariance matrices. Biometrics. 2021:77(4):1175–1186.
  79. Wang Z, Kaseb AO, Amin HM, Hassan MM, Wang W, Morris JS. Bayesian edge regression in undirected graphical models to characterize interpatient heterogeneity in cancer. J Am Stat Assoc. 2022:117(538):533–546.
  80. Watanabe S. Asymptotic equivalence of Bayes cross validation and widely applicable information criterion in singular learning theory. J Mach Learn Res. 2010:11(116):3571–3594.
  81. Whittaker J. Graphical Models in Applied Multivariate Statistics. Wiley Series in Probability and Mathematical Statistics, Chichester: John Wiley and Sons, 1990.
  82. Woodward ND, Rogers B, Heckers S. Functional resting-state networks are differentially affected in schizophrenia. Schizophr Res. 2011:130(1–3):86–93.
  83. Xia Y, Cai T, Cai TT. Testing differential networks with applications to the detection of gene-gene interactions. Biometrika. 2015:102(2):247–266.
  84. Xia Y, Cai T, Cai TT. Multiple testing of submatrices of a precision matrix with applications to identification of between pathway interactions. J Am Stat Assoc. 2018:113(521):328–339.
  85. Xia Y, Li L. Hypothesis testing of matrix graph model with application to brain connectivity analysis. Biometrics. 2017:73(3):780–791.
  86. Xie X, Yu ZL, Lu H, Gu Z, Li Y. Motor imagery classification based on bilinear sub-manifold learning of symmetric positive-definite matrices. IEEE Trans Neural Syst Rehabil Eng. 2017:25(6):504–516.
  87. Zhang J, Li Y. High-dimensional Gaussian graphical regression models with covariates. J Am Stat Assoc. 2023:118(543):2088–2100.
  88. Zhang J, Wei SW, Li L. Mixed-effect time-varying network model and application in brain connectivity analysis. J Am Stat Assoc. 2020:115(532):2022–2036.
  89. Zhao Y, Caffo BS, Luo X. Principal regression for high dimensional covariance matrices. Electronic J Stat. 2021a:15(2):4192–4235.
  90. Zhao Y, Caffo BS, Luo X. Longitudinal regression of covariance matrix outcomes. Biostatistics. 2024:25(2):385–401.
  91. Zhao Y, Wang B, Mostofsky SH, Caffo BS, Luo X. Covariate assisted principal regression for covariance matrix outcomes. Biostatistics. 2021b:22(3):629–645.
  92. Zou T, Lan W, Wang H, Tsai C-L. Covariance regression analysis. J Am Stat Assoc. 2017:112(517):266–281.


Supplementary Materials

kxae023_Supplementary_Data

Data Availability Statement

The code used in this paper is accessible at the following GitHub repository: https://github.com/syhyunpark/bcap

