Abstract
Blind source separation (BSS) aims to separate latent source signals from their mixtures. For spatially dependent signals in high-dimensional and large-scale data, such as neuroimaging data, most existing BSS methods do not take into account the spatial dependence and the sparsity of the latent source signals. To address these major limitations, we propose a Bayesian spatial blind source separation (BSP-BSS) approach for neuroimaging data analysis. We model the expectation of the observed images as a linear mixture of multiple sparse and piece-wise smooth latent source signals, for which we construct a new class of Bayesian nonparametric prior models by thresholding Gaussian processes. We assign von Mises-Fisher (vMF) priors to the mixing coefficients in the model. Under regularity conditions, we show that the proposed method has several desirable theoretical properties, including large support for the priors, consistency of the joint posterior distribution of the latent source intensity functions and the mixing coefficients, and selection consistency for the number of latent sources. We use extensive simulation studies and an analysis of resting-state fMRI data from the Autism Brain Imaging Data Exchange (ABIDE) study to demonstrate that BSP-BSS outperforms existing methods in separating latent brain networks and detecting activated regions in the latent sources.
Keywords: Latent source signal separations, posterior consistency, neuroimaging, sparse signals, spatially dependent signals
1. Introduction
Neuroimaging techniques such as functional magnetic resonance imaging (fMRI) have become an important tool to investigate neural processing in brain functioning. The observed three-dimensional (3D) brain imaging data, such as fMRI blood-oxygen-level-dependent (BOLD) effects, represent the combination of source signals generated by various underlying brain functional networks (Power et al., 2011). A common objective in imaging analysis is to decompose the observed whole-brain 3D images to identify and characterize underlying brain networks, which are organizations of multiple brain regions that demonstrate correlated brain signals measured by functional imaging. This problem can be addressed by blind source separation (BSS) methods (Biswal and Ulmer, 1999), which aim at separating latent source signals from their mixture observations. BSS methods such as principal component analysis (PCA) and independent component analysis (ICA) have been applied in neuroimaging studies for this purpose.
In this work, we propose a new Bayesian spatial blind source separation (BSP-BSS) modeling framework for extracting sparse latent signals from spatially dependent neuroimaging data. Let $\mathcal{B}_v$ be the cubic volume region of voxel v in the brain images. Let $X_{iv}$ represent the observed imaging intensity value at voxel v for the ith image. For example, $X_{iv}$ can represent the BOLD signal intensity at voxel v from the ith fMRI frame in a single-subject study, or the statistical map derived from neuroimaging data for the ith subject in a multi-subject study. We decompose the expectation of $X_{iv}$ as a linear combination of q latent components:
$\mathrm{E}(X_{iv}) = \sum_{j=1}^{q} A_{ij} S_j(\mathcal{B}_v), \quad i = 1, \ldots, n, \; v = 1, \ldots, V, \qquad (1)$
where $S_j(\mathcal{B}_v)$ represents the spatial source signal intensity at voxel v for the jth component. Here, $S_j(\cdot)$ is an intensity measure (Kallenberg, 2017), i.e., a deterministic function mapping a spatial region to the expected intensity of the spatial source signal. The $A_{ij}$'s are the mixing coefficients that mix the q latent source signals to generate the observed data. We make statistical inferences on the model parameters under the Bayesian framework. We adopt the thresholded Gaussian process (TGP) as the prior model to account for the sparsity and spatial dependence of the source signals, where the TGP is a stochastic process constructed by thresholding a smooth Gaussian process; it provides a large probability support for a class of sparse and piecewise-smooth functions.
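To fix ideas, the following is a minimal sketch of the generative structure in (1) under toy dimensions; all sizes, patch locations, and the noise scale are illustrative assumptions rather than the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)
n, V, q = 30, 900, 3                      # images, voxels, sources (assumed)

# Sparse, piece-wise smooth sources: each active on one disjoint patch.
S = np.zeros((q, V))
for j in range(q):
    S[j, 250 * j: 250 * j + 120] = 1.0    # plays the role of S_j(B_v)

A = rng.normal(size=(n, q))               # mixing coefficients
sigma = 0.05                              # common noise scale (assumed)
X = A @ S + sigma * rng.normal(size=(n, V))   # E(X_iv) = sum_j A_ij S_j(B_v)
```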
The proposed model in (1) has fundamental distinctions from commonly used BSS methods in neuroimaging such as ICA (McKeown et al., 1998; Calhoun et al., 2001; Beckmann and Smith, 2004; Shi and Guo, 2016). ICA has become one of the most commonly used tools for decomposing functional neuroimaging data to investigate underlying brain functional networks (Calhoun et al., 2001; Beckmann and Smith, 2005; Guo, 2011; Shi and Guo, 2016; Wang and Guo, 2019). However, the ICA methods have several methodological and practical limitations. First, existing ICA models typically assume that the components' spatial source signals across the brain are independent random samples from a latent distribution (Hyvärinen and Oja, 2000). However, the spatial independence assumption is usually violated in neuroimaging data, where there is known spatial dependence in the neural signals of the brain (Derado et al., 2010). This is due to the similarity of neural functioning in nearby brain locations and also to the spatial smoothing of the images that is commonly performed in the pre-processing of neuroimages to improve the signal-to-noise ratio. The spatial independence assumption inherited from classic ICA is, therefore, a major methodological limitation of existing ICA models in neuroimaging applications. Secondly, given the large number of voxels in the brain, raw results from existing ICA methods are often noisy and require thresholding to identify significant source signals in the brain. In the absence of a unified approach to thresholding, various strategies have been used in different studies (Beckmann and Smith, 2004; Griffanti et al., 2017), which reduces the comparability of results across studies. Thirdly, a well-known challenge in BSS methods such as ICA is the choice of the number of latent components in the decomposition. Although some selection methods have been proposed (Minka, 2001; Beckmann and Smith, 2004; Li et al., 2006, 2007), their performance is not well studied, and the selection criteria often lack intuitive appeal in neuroimaging applications.
Another line of work related to BSP-BSS is the Bayesian spatial factor model, which is a powerful tool for efficient dimension reduction and flexible covariance modeling of high-dimensional data and has been applied to imaging data analysis (Montagna et al., 2018; Guo et al., 2022). In particular, in the meta-analysis of neuroimaging data by Montagna et al. (2018), the latent factors connect the intensity of consistent brain activation patterns to covariates. In the image-on-image regression of Guo et al. (2022), spatial latent factors are introduced to capture the association between the response image and predictor images. However, the existing factor models cannot achieve the goals of BSP-BSS of making inferences on the spatial source signals from high-dimensional imaging data. In other application fields, such as public health (Wang and Wall, 2003), finance (Gelfand et al., 2004, 2007) and environmental statistics (Guhaniyogi et al., 2013; Ren and Banerjee, 2013; Zhang and Banerjee, 2022), spatially-oriented data are collected, and spatial factor models have been developed to capture the spatial variation and dependence of data collected across geographical areas. For example, motivated by the analysis of commercial real estate prices, the spatially varying linear model of coregionalization (SVLMC) (Gelfand et al., 2004) was constructed using latent Gaussian spatial processes. This method has been further extended to analyze air monitoring data in California (Ren and Banerjee, 2013), where indicator variables are included in the model to select the latent processes that capture spatial dependence.
Compared with existing methods, our proposed BSP-BSS has the following appealing features. First, BSP-BSS explicitly models the spatial dependence of the latent source signals via the covariance kernels of the TGP, which can effectively accommodate the complex spatial correlations in neuroimaging data. Furthermore, the TGP prior can outperform the shrinkage priors adopted by other Bayesian BSS methods (Fevotte and Godsill, 2006; Knowles and Ghahramani, 2007; Zayyani et al., 2009; Bhattacharya and Dunson, 2011; Mohammad-Djafari, 2012) in detecting sparse and spatially dependent signals. To select the important brain regions and networks, BSP-BSS provides a unified framework for making Bayesian inferences with theoretical guarantees on the thresholding parameter and provides more accurate measures of the uncertainty in brain region selection. In addition, BSP-BSS utilizes the intrinsic properties of the TGP to assign a positive prior probability to the scenario in which a latent source has zero or negligible effects in terms of brain activation, indicating that the latent source does not effectively generate meaningful source signals and hence can be eliminated. This provides a systematic Bayesian modeling approach to making posterior inferences on the number of effective latent sources.
The main advantages of BSP-BSS stem from using the TGP prior to achieve sparsity and spatial dependence of the latent source signals simultaneously. The TGP is a special stochastic process constructed by thresholding a latent Gaussian process (GP). The GP is a flexible modeling tool for functions, curves, and images that may involve complex correlation structures. Over the past decades, the GP has been applied extensively in spatial statistics and machine learning (Rasmussen, 2003; Banerjee et al., 2008; Boehm Vock et al., 2015; Nychka et al., 2015). Recent works in neuroimaging have also shown the advantage of the GP in modeling spatially correlated neuroimaging data (Marquand et al., 2010; Hyun et al., 2014, 2016; Kang et al., 2018). The soft-TGP has been successfully adopted to specify priors for spatially varying coefficients in scalar-on-image regression (Kang et al., 2018). In general, for Bayesian modeling of sparsity, thresholded Gaussian priors (Nakajima and West, 2013a,b; Ni et al., 2019; Cai et al., 2020) have been shown to be successful alternatives to shrinkage priors. Of note, the "hard" thresholding operator is widely used in the aforementioned literature to construct thresholded prior models, while the "soft" thresholding operator (Kang et al., 2018) is adopted particularly for modeling sparse, continuous, and piece-wise smooth effects of imaging predictors on an outcome variable. A soft-TGP prior ensures the spatially-varying function is continuous with probability one, i.e., it places zero probability on functions with discontinuous jumps. This model assumption is appropriate in scalar-on-image regression, as it reflects the continuous change of the scalar outcome due to changes of the imaging predictors over space. However, it is not a suitable prior specification for the spatial source signal intensity in our model, which reflects the complex activation patterns of brain images. In many neuroimaging studies, important brain activation regions commonly have sharp edges on their boundaries due to intrinsic properties of brain functions and anatomical structures (Smith and Nichols, 2018). Thus, we adopt a hard-TGP, which provides large prior support for a wide range of sparse and piece-wise smooth functions with discontinuous jumps. In addition, the threshold parameter in the hard-TGP has an appealing interpretation as the minimum detectable signal intensity, an interpretation not available with the soft-TGP.
The theoretical properties of BSP-BSS are completely different from those derived under the traditional BSS framework. In traditional BSS such as ICA, the latent source signals are assumed to be a set of random variables that follow a parametric or nonparametric distribution involving unknown parameters, and the theoretical justifications of statistical inference have focused on the mixing coefficients and the distribution of the latent source signals (Samarov and Tsybakov, 2004; Samworth and Yuan, 2012; Shen et al., 2016). In contrast, BSP-BSS treats each latent source signal as a sparse and piece-wise smooth spatially-varying function, so both the latent source signals and the mixing coefficients are unknown parameters of interest. Under the Bayesian inference framework, we assign the von Mises-Fisher (vMF) priors (Fisher, 1953; Watson, 1982) to the mixing coefficients, ensuring model identifiability, and we specify the priors for the latent source signals using the TGP, ensuring their sparsity and spatial dependence simultaneously. We establish the theoretical properties of the proposed model, which enjoys large prior support, leading to the joint posterior consistency of the mixing coefficients and latent source intensity functions, as well as the selection consistency of the effective number of latent sources.
The rest of the paper is organized as follows: Section 2 develops the new model along with the identifiability conditions, the prior specifications, and the posterior inference procedure. Section 3 establishes the theoretical properties of the proposed method. Section 4 focuses on the details of posterior computation, where we adopt the stochastic gradient Hamiltonian Monte Carlo (SGHMC) algorithm. The advantages of the proposed method over existing methods are demonstrated in Section 5 with simulations and in Section 6 with an analysis of resting-state fMRI data from the Autism Brain Imaging Data Exchange (ABIDE) study. Section 7 concludes with a brief discussion of future work.
2. Bayesian Spatial Blind Source Separation
Let $\mathcal{B}$ be a compact region in the d-dimensional Euclidean space $\mathbb{R}^d$ for a positive integer d. Suppose $\mathcal{B}$ is partitioned into V disjoint but spatially contiguous sub-regions, denoted as $\mathcal{B}_1, \ldots, \mathcal{B}_V$, such that $\mathcal{B} = \cup_{v=1}^{V} \mathcal{B}_v$. In neuroimaging applications, $\mathcal{B}$ represents the whole brain region and each $\mathcal{B}_v$ may represent a voxel, i.e. the basic cubic volume element in the 3D image; it may also refer to a brain region of interest, i.e. a collection of spatially contiguous voxels. Suppose we obtain observations from n images on $\mathcal{B}$ and denote by $X_{iv}$ the intensity value over $\mathcal{B}_v$ for the ith observed image (i = 1,… , n). We perform spatial blind source separation on $X_{iv}$ into q latent source components:
$X_{iv} = \sum_{j=1}^{q} A_{ij} S_j(\mathcal{B}_v) + \epsilon_{iv}, \qquad (2)$
where $\epsilon_{iv}$ follows a zero-mean normal distribution whose variance $\sigma^2(\mathcal{B}_v)$ measures the total variability of spatial noise in $\mathcal{B}_v$. We assume $\{\epsilon_{iv}\}$ are independent over i and v. The functions $S_j(\cdot)$ and $\sigma^2(\cdot)$ are intensity measures defined on the measurable space $(\mathcal{B}, \mathscr{B})$, with $\mathscr{B}$ being a σ-field of the brain region $\mathcal{B}$ (Kallenberg, 2017). We assume $S_j(\cdot)$ is a signed measure (Cannarsa and D'Aprile, 2015) in that we allow the observed image $X_{iv}$ to take both positive and negative values. For the jth component, $S_j(\mathcal{B}_v)$ represents the intensity of latent source signals from $\mathcal{B}_v$, and $A_{ij}$ is the unknown mixing coefficient for the ith observed image. For any $\mathcal{B}_v$, let $|\mathcal{B}_v|$ be the Lebesgue measure of $\mathcal{B}_v$, and we assume
$S_j(\mathcal{B}_v) = \int_{\mathcal{B}_v} s_j(\xi)\, d\xi, \qquad \sigma^2(\mathcal{B}_v) = \int_{\mathcal{B}_v} \sigma^2(\xi)\, d\xi, \qquad (3)$
where $s_j(\xi)$ and $\sigma^2(\xi)$ represent the jth spatial source signal intensity and the spatial noise variance intensity at location ξ in the brain region $\mathcal{B}$, respectively. Write $A = (A_{ij}) \in \mathbb{R}^{n \times q}$ as the mixing matrix with $A_j$ being the jth column of A, and $s(\cdot) = \{s_1(\cdot), \ldots, s_q(\cdot)\}$ as the collection of q unknown spatial source intensity functions. For simplicity, we drop "(·)" and write s = s(·) and sj = sj(·) for the rest of the paper. We define the effective number of latent sources as follows:
$q_{\mathrm{eff}} = \sum_{j=1}^{q} I(\|s_j\|_1 > 0), \qquad (4)$
where ∥sj∥1, i.e. the L1 norm of sj, reflects the effect size of the jth latent source signal and qeff counts the number of latent sources with nonzero effects. In neuroimaging applications, qeff represents the number of unique activation patterns that are essential to recovering the observed brain images, potentially providing insights on brain functions and/or structures.
The proposed model is a new modeling framework for spatial blind source signal separation, where the latent source signals are represented as the deterministic intensity functions. This model assumption is fundamentally different from existing BSS methods such as ICA and factor models in which the latent source signals are random variables and only their distributions are identifiable. As the parameters of interest include multiple functions and matrices, it is important to define the parameter space and establish model identifiability conditions as a foundation for developing a formal statistical inference procedure. We discuss these issues in detail in Section 2.1.
2.1. Parameter Space and Model Identifiability
We begin with notation. For a real function f on $\mathcal{B}$, let $\|f\|_p = \{\int_{\mathcal{B}} |f(\xi)|^p\, d\xi\}^{1/p}$, $p \ge 1$, be the $L_p$-norm, and denote by $\|f\|_\infty = \sup_{\xi \in \mathcal{B}} |f(\xi)|$ the $L_\infty$-norm. For an array of real functions $f = (f_1, \ldots, f_q)^\top$, let $\|f\|_p = \sum_{j=1}^{q} \|f_j\|_p$ be the $L_p$-norm, and denote by $\|f\|_\infty = \max_j \|f_j\|_\infty$ the $L_\infty$-norm. Similarly, for any vector $v = (v_1, \ldots, v_n)^\top$, let $\|v\|_p = (\sum_{i} |v_i|^p)^{1/p}$ be the $L_p$-norm and $\|v\|_\infty = \max_i |v_i|$ the $L_\infty$-norm. For any matrix $V = (V_{ij})$, let $\|V\|_p = (\sum_{i,j} |V_{ij}|^p)^{1/p}$ be the $L_p$-norm and $\|V\|_\infty = \max_{i,j} |V_{ij}|$ the $L_\infty$-norm. Let $C^\rho(\mathcal{B})$ denote the class of functions differentiable up to order ρ on $\mathcal{B}$. Let $I(\mathcal{E})$ be an event indicator function, where $I(\mathcal{E}) = 1$ if event $\mathcal{E}$ occurs and $I(\mathcal{E}) = 0$ otherwise. Denote by A*, s*, and σ* the true parameters of the model that generates the data, and let $\Theta = \mathcal{A} \times \mathcal{S}$ be the parameter space of A and s, where $\mathcal{A}$ and $\mathcal{S}$ are defined in the assumptions below.
Assumption 1.
The true intensity functions $s^* = (s_1^*, \ldots, s_q^*)$ belong to a space $\mathcal{S}$, where $\mathcal{S}$ is a set of function arrays defined on a compact closed set $\mathcal{B}$. We say $s \in \mathcal{S}$ if there exists a permutation of {1,… , q}, denoted as {ω1,… , ωq}, such that $(s_{\omega_1}, \ldots, s_{\omega_q})$ satisfies the conditions below. Without loss of generality, we take $\omega_j = j$ for simplicity. We assume there exists a non-empty open set $\mathcal{R}_j$ with $\mathcal{R}_j \subset \mathcal{B}$, and that the $\mathcal{R}_j$'s are non-overlapping across j's, i.e. $\mathcal{R}_j \cap \mathcal{R}_{j'} = \emptyset$ for j ≠ j′. We assume $\mathcal{R}_j$ and sj satisfy the following two conditions: (1) sparsity: there exists a constant ζ > 0 such that $|s_j(\xi)| > \zeta$ for $\xi \in \mathcal{R}_j$ and $s_j(\xi) = 0$ for $\xi \notin \bar{\mathcal{R}}_j$, where $\bar{\mathcal{R}}_j$ is the closure of $\mathcal{R}_j$; and (2) piece-wise smoothness: there exists an integer ρ0 > 0 such that $s_j \in C^{\rho_0}(\bar{\mathcal{R}}_j)$.
Assumption 2.
The jth column of the true mixing matrix A* belongs to a space defined as $\mathcal{A}_n = \{a \in \mathbb{R}^n : \|a\|_2 = \sqrt{n}\}$. Let $\mathcal{A} = \mathcal{A}_n^q$ and $\Theta = \mathcal{A} \times \mathcal{S}$.
Assumptions 1 and 2 define the parameter space of the proposed model. We assume the latent source signal intensity functions belong to a functional space of piece-wise smooth, sparse and bounded functions. This assumption provides flexibility for modeling intensity functions with various shapes. For a general BSS problem, it may be strong to assume that the nonzero signal regions of the intensity functions do not overlap. However, in neuroimaging applications, the well-known functional brain activation regions typically do not overlap (Smith et al., 2009). In the simulation study (see Section 5), we also consider cases where the true intensity functions partially overlap between different source signals, and we show that the proposed model achieves better accuracy in identifying the true sources than the Infomax ICA (Bell and Sejnowski, 1995), a classic ICA algorithm. It is well known that an ICA model is identifiable up to permutation and scaling of the sources. Our model has a similar but more challenging issue: the intensity functions s of interest may not be identifiable, since the model involves the integrals of the intensity functions over brain regions. Thus, to ensure scale identifiability, we assume in Assumption 2 that the L2-norm of each column of the mixing coefficient matrix A equals $\sqrt{n}$. A similar assumption has been made in Bayesian factor analysis, known as the $\sqrt{n}$-orthonormal factor assumption (Ma and Liu, 2021), to avoid magnitude inflation of the posterior samples of the loading matrix. To ensure the intensity function s is uniquely determined by the intensity measure S, we impose additional constraints on the spatial regions $\{\mathcal{B}_v\}_{v=1}^{V}$ in Assumption 3.
Assumption 3.
The spatial regions $\{\mathcal{B}_v\}_{v=1}^{V}$ satisfy: 1) $\cup_{v=1}^{V} \mathcal{B}_v = \mathcal{B}$ and $\mathcal{B}_v \cap \mathcal{B}_{v'} = \emptyset$ for any v ≠ v′. 2) For any v and j, either $\mathcal{B}_v \subseteq \bar{\mathcal{R}}_j$ or $\mathcal{B}_v \cap \mathcal{R}_j = \emptyset$. 3) There exist constants K0 and K1 such that $K_0 \le V |\mathcal{B}_v| \le K_1$ for all v, where $|\mathcal{B}_v|$ is the Lebesgue measure of $\mathcal{B}_v$.
Now we establish the model identifiability in the following proposition.
Proposition 1.
(Identifiability) If Assumptions 1 – 3 hold, model (2) is identifiable up to a joint flipping of the signs of source functions and mixing coefficients, i.e., for any (A, s), (A′, s′) ∈ Θ, n > 0, and V > 0, if $\sum_{j=1}^{q} A_{ij} S_j(\mathcal{B}_v) = \sum_{j=1}^{q} A'_{ij} S'_j(\mathcal{B}_v)$ for all i and v, there exists a flipping indicator $\delta_0 \in \{-1, 1\}^q$ such that (A[δ0], s[δ0]) = (A′, s′), where $A[\delta_0] = (\delta_{0,1} A_1, \ldots, \delta_{0,q} A_q)$ and $s[\delta_0] = (\delta_{0,1} s_1, \ldots, \delta_{0,q} s_q)$ are the flipped versions of A and s, respectively.
The identifiability stated above does not involve permutation of the sources as the order of the sources is specified in Assumption 1. The model is identifiable up to permutation if we reconstruct the true parameter space accordingly.
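To illustrate how Proposition 1 is used in practice, the following is a minimal sketch (with hypothetical array names `A_hat`, `S_hat`, `S_true`) of resolving the sign ambiguity when comparing an estimate to the truth: each source and its mixing column are flipped jointly, which corresponds to choosing the flipping indicator δ0.

```python
import numpy as np

def align_signs(A_hat, S_hat, S_true):
    """Jointly flip each estimated source (rows of S_hat, q x V) and its
    mixing column (columns of A_hat, n x q) so that each source is
    positively correlated with the corresponding true source."""
    delta = np.sign(np.sum(S_hat * S_true, axis=1))   # flipping indicator in {-1, 1}^q
    delta[delta == 0] = 1.0                           # guard against exact zeros
    return A_hat * delta[None, :], S_hat * delta[:, None]
```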
2.2. Prior Specifications
We discuss the prior specifications for BSP-BSS. For the mixing coefficient matrix A, we assign the independent vMF priors to its scaled columns, i.e.,
$A_j / \sqrt{n} \overset{\mathrm{iid}}{\sim} \mathrm{vMF}(\eta, \rho), \quad \text{with density } f(a \mid \eta, \rho) \propto \exp(\eta\, \rho^{\top} a), \quad j = 1, \ldots, q, \qquad (5)$
where η > 0 and ρ is an n-dimensional vector with ∥ρ∥2 = 1. The vMF distribution is chosen because it is defined on the (n − 1)-dimensional unit sphere in $\mathbb{R}^n$. This property ensures that $\|A_j\|_2 = \sqrt{n}$ (Assumption 2) holds with probability one. Let $\mathrm{GP}(\mu, \kappa)$ represent a Gaussian process with mean function μ and covariance kernel κ. We assume sj follows a TGP, denoted as $s_j \sim \mathrm{TGP}(\mu, \kappa, \zeta)$, which can be constructed as follows:
$s_j(\xi) = g_{\zeta}\{B_j(\xi)\}, \quad B_j \sim \mathrm{GP}(\mu, \kappa), \qquad (6)$
where $g_\zeta(x) = I(|x| > \zeta)\,x$ for ζ > 0 is a hard thresholding function. We also assume the sj's are independent across j's. The independence assumption is not strictly required for establishing the asymptotic properties of our model, but it is generally assumed in BSS for neuroimaging and, in our experience, leads to more efficient posterior computation algorithms and satisfactory empirical results for imaging data analysis. If one would like to model dependence across latent sources, a more general covariance structure (Rowe, 2002) can be considered for special applications, but that is beyond the scope of this paper.
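The following minimal sketch (not the paper's implementation) illustrates draws from the two priors: a hard-thresholded GP draw for one source on a 1D grid, and a scaled mixing column; the squared-exponential kernel, the grid, ζ, and the η = 0 (uniform-on-the-sphere) special case are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# TGP draw: sample a latent GP on a grid, then apply g_zeta(x) = 1(|x| > zeta) x.
m = 200
xi = np.linspace(0.0, 1.0, m)[:, None]
K = np.exp(-0.5 * ((xi - xi.T) / 0.05) ** 2) + 1e-6 * np.eye(m)  # SE kernel + jitter
B_j = np.linalg.cholesky(K) @ rng.normal(size=m)                 # latent GP sample
zeta = 0.5
s_j = np.where(np.abs(B_j) > zeta, B_j, 0.0)                     # hard-thresholded source

# vMF draw with eta = 0: a normalized Gaussian vector is uniform on the sphere;
# rescaling gives a mixing column with ||A_j||_2 = sqrt(n), as in Assumption 2.
n = 30
A_j = rng.normal(size=n)
A_j *= np.sqrt(n) / np.linalg.norm(A_j)
```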
To make fully Bayesian inferences on the hyperparameters, we assign independent inverse-gamma priors to $\sigma^2(\mathcal{B}_v)$ across v with shape $a_\sigma$ and scale $b_\sigma$, denoted as $\sigma^2(\mathcal{B}_v) \sim \mathrm{IG}(a_\sigma, b_\sigma)$, which is a conjugate prior, leading to a Gibbs sampling update scheme in the posterior computation. We adopt a noninformative prior for the thresholding parameter ζ by assuming $\pi(\zeta) \propto I(\zeta > 0)$.
2.3. Bayesian Inference
For posterior inference, we resort to Markov chain Monte Carlo (MCMC), which we discuss in detail in Section 4. Suppose we obtain H MCMC samples of (A, s), denoted as $\{(A^{(h)}, s^{(h)})\}_{h=1}^{H}$. We estimate A by the posterior mean, i.e., $\hat{A} = H^{-1}\sum_{h=1}^{H} A^{(h)}$. To estimate s and detect the activation regions, we first compute the posterior inclusion probability (PIP), i.e., $\mathrm{PIP}_j(\xi) = \Pr\{s_j(\xi) \neq 0 \mid X\}$, for each intensity function j at each location ξ, estimated by $\widehat{\mathrm{PIP}}_j(\xi) = H^{-1}\sum_{h=1}^{H} I\{s_j^{(h)}(\xi) \neq 0\}$. Then, we estimate $s_j(\xi)$ by the weighted posterior mean $\hat{s}_j(\xi) = \sum_{h=1}^{H} s_j^{(h)}(\xi) / \sum_{h=1}^{H} I\{s_j^{(h)}(\xi) \neq 0\}$ if $\widehat{\mathrm{PIP}}_j(\xi) \neq 0$, and set $\hat{s}_j(\xi) = 0$ otherwise. We estimate the effective number of latent source signals in (4) using $\hat{s} = (\hat{s}_1, \ldots, \hat{s}_q)$ by
$\hat{q}_{\mathrm{eff}} = \sum_{j=1}^{q} I(\|\hat{s}_j\|_1 > 0). \qquad (7)$
Given a posterior probability level p0 ∈ (0, 1), we estimate the activation region by
$\hat{\mathcal{R}}_j(p_0) = \{\xi \in \mathcal{B} : \widehat{\mathrm{PIP}}_j(\xi) \geq p_0\}, \qquad (8)$
which is interpreted as the jth brain activation region, where each location has a nonzero effect with marginal posterior probability at least p0. A common choice of p0 is 0.5, which has been widely adopted as the marginal median posterior inclusion probability criterion for model selection in linear regression (Barbieri and Berger, 2004). To choose p0 while controlling the false discovery rate (FDR), we suggest adopting the approach of Morris et al. (2008) for Bayesian functional data analysis. Both approaches can accurately select the activation regions, with only slight differences when the sample size is large or the signal-to-noise ratio is high. See detailed comparisons in Section 3 of the Supplementary Material.
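As an illustration, the following minimal sketch computes these posterior summaries, assuming a hypothetical array `s_draws` of shape (H, q, V) holding the MCMC draws of the source intensities at the voxel centers.

```python
import numpy as np

def posterior_summaries(s_draws, p0=0.5):
    """Estimate PIPs, sources, activation regions (8), and q_eff (7)."""
    active = (s_draws != 0)                              # H x q x V indicators
    pip = active.mean(axis=0)                            # estimated PIP_j at each voxel
    n_active = np.maximum(active.sum(axis=0), 1)         # avoid division by zero
    # weighted posterior mean: average of draws over iterations with nonzero signal
    s_hat = np.where(pip > 0, s_draws.sum(axis=0) / n_active, 0.0)
    regions = (pip >= p0)                                # activation regions, as in (8)
    q_eff_hat = int(np.any(s_hat != 0, axis=1).sum())    # estimator (7)
    return pip, s_hat, regions, q_eff_hat
```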
3. Theoretical Properties
We investigate the theoretical properties of the BSP-BSS model. The motivation for this investigation is the nonparametric nature of the proposed model, as it involves unknown intensity functions and the number of unknown mixing coefficients increases with the sample size. Thus, the classic theory of Bayesian inference for parametric models does not apply to BSP-BSS, for which we need to study three main theoretical properties: large support of the TGP and vMF distributions as priors for the true parameter space (Theorem 1), joint posterior consistency of the mixing coefficients and the latent sources (Theorem 2), and selection consistency for the effective number of latent sources (Theorem 3). For simplicity, we assume the hyperparameters are fixed at certain values. We follow the general posterior consistency theorem of Choudhuri et al. (2004) to prove Theorem 2, which requires verifying the prior positivity of neighborhoods (Lemma 1) and constructing uniformly consistent tests whose type I and type II errors have specific bounds (Lemmas 2–8). Theorems 2 and 3 are established in the regime where both the sample size n and the number of spatial regions V grow large. Hence, we assume V depends on n and write V as Vn in the rest of the paper. Details of the proofs can be found in the Supplementary Material.
3.1. Assumptions
We introduce the following additional assumptions for the theorems.
Assumption 4.
There exist constants $M_2, M_3 > 0$ such that $M_2 \le \sigma^{2}(\xi) \le M_3$ for any $\xi \in \mathcal{B}$.
Assumption 5.
There exist constants c1, c2 and ν such that $c_1 n^{1/\nu} \le V_n \le c_2 n^{1/\nu}$ and $0 < \nu < 1 - d/(2\rho_0)$, with d being the dimension of the spatial space $\mathcal{B}$.
Assumption 6.
Given $\xi \in \mathcal{B}$, the kernel function κ(ξ,·) has continuous partial derivatives up to order 2ρ0 + 2.
Assumption 4 imposes restrictions on the noise variance intensity $\sigma^2(\cdot)$, which ensure that the total variance over the brain is bounded away from zero and infinity. Assumption 5 implies that Vn should grow at a polynomial rate in n that is related to the dimension d and the smoothness of the kernel function of the TGP. This assumption is sensible in that the number of voxels of the standard brain template is much larger than the number of images in neuroimaging studies. We specify the smoothness of the kernel function in Assumption 6, following Ghosal and Roy (2006). We summarize all the assumptions along with their interpretations in neuroimaging applications in Table 1.
Table 1.
Summary of assumptions and their interpretations in neuroimaging applications.
| | Assumption | Interpretation |
|---|---|---|
| A.1 | Source intensity functions are sparse and piece-wise smooth, with non-zero regions not overlapping across different latent sources. | Neural signals are sparse across the brain; the signal changes smoothly in regions with the same type of brain tissue; there is little spatial overlap between major brain networks. |
| A.2 | Each column of the mixing matrix A satisfies $\|A_j\|_2 = \sqrt{n}$. | Mixing weights of important brain activation patterns are scale-invariant. |
| A.3 | The sizes of the spatial regions satisfy $K_0 \le V|\mathcal{B}_v| \le K_1$. | Spatial volumes are similar across voxels/brain regions; the volume of a voxel is smaller in higher-resolution images. |
| A.4 | The noise variance intensity satisfies $M_2 \le \sigma^2(\xi) \le M_3$. | The noise level in brain images is bounded. |
| A.5 | The number of regions satisfies $c_1 n^{1/\nu} \le V_n \le c_2 n^{1/\nu}$ with $0 < \nu < 1 - d/(2\rho_0)$. | The number of voxels/ROIs can grow faster than the number of images. |
| A.6 | The kernel function κ(ξ,·) is smooth up to order 2ρ0 + 2. | The change rate of brain signals over voxels is bounded in the smooth transition region. |
3.2. Large Support
We show in Theorem 1 that the TGP and vMF priors have large support. This result ensures that the prior assigns positive probability to arbitrarily small neighborhoods of any value in the true parameter space.
Theorem 1.
(Large support) Suppose A and s are independent and follow the priors (5) and (6) specified in Section 2.2 with some hyperparameters. If Assumptions 1 and 2 hold, then for any (A*, s*) ∈ Θ, any flipping indicator δ and any ε > 0,

$\Pi\{\|A - A^*[\delta]\|_1 / n < \varepsilon, \ \|s - s^*[\delta]\|_1 < \varepsilon\} > 0.$
3.3. Posterior Consistency
Next, we establish the consistency of the joint posterior distribution of A and s, which provides theoretical justifications for large-scale imaging data analysis via BSP-BSS. For any 0 < M1 < ∞, let $\tilde{\Theta} := \{(A, s) \in \Theta : \|A\|_\infty < M_1\}$ be the parameter space of interest.
Theorem 2.
(Consistency) If Assumptions 1 – 6 hold, then for any ε > 0 there exists a flipping indicator δ0 such that, as n → ∞,

$\Pi\{(A, s) \in \tilde{\Theta} : \|A - A^*[\delta_0]\|_1 / n < \varepsilon, \ \|s - s^*[\delta_0]\|_1 < \varepsilon \mid X\} \to 1$

in $\mathbb{P}_{\Theta^*}$-probability, for any true parameter $\Theta^* = (A^*, s^*) \in \tilde{\Theta}$, where $\mathbb{P}_{\Theta^*}$ is the actual distribution of the data X given Θ*.
In Theorem 2, we restrict the parameter space of interest to $\tilde{\Theta}$, which only includes bounded A. This consistency result indicates that the joint posterior distribution of A and s concentrates on an arbitrarily small neighborhood of the true parameter Θ* in $\tilde{\Theta}$, as both the number of voxels (or regions) Vn and the number of observed images n go to infinity. The neighborhood of A* is defined with the L1-norm scaled by 1/n, since the dimension of A increases with the sample size; this implies that the mixing coefficients converge to the truth on average across images. The neighborhood of s* is defined with the functional L1-norm.
3.4. Selection Consistency on the Number of Latent Sources
To perform the theoretical analysis of BSP-BSS for selecting the number of latent source signals, we extend the parameter space by including the zero-intensity function in the functional space of each latent source, and establish the following theorem.
Theorem 3.
If Assumptions 1 – 6 hold, then as n → ∞,

$\Pi\{q_{\mathrm{eff}}(s) = q_{\mathrm{eff}}(s^*) \mid X\} \to 1$

in $\mathbb{P}_{\Theta^*}$-probability, where $q_{\mathrm{eff}}(\cdot)$ is the effective number of latent sources as defined in equation (4), i.e. $q_{\mathrm{eff}}(s) = \sum_{j=1}^{q} I(\|s_j\|_1 > 0)$, and $q_{\mathrm{eff}}(s^*)$ is its value at the true sources.
Theorem 3 implies that BSP-BSS can correctly estimate the effective number of latent sources with a high probability for a sufficiently large number of images. Thus, as long as the specified q is adequately large, BSP-BSS can potentially identify all the effective latent source signals among the observed images. This property is especially useful when there is a lack of prior knowledge on the number of latent sources. The finite-sample performance on latent source signal selection is investigated through a simulation study in Section 5.3.
4. Posterior Computation
Now we discuss the posterior computation details. The spatial resolution of brain imaging data can be high, and the standard brain template may contain hundreds of thousands of voxels. This poses computational challenges for posterior inference on BSP-BSS with voxel-level data. To address this issue, we adopt an equivalent representation of the prior model for the intensity functions in light of the eigendecomposition of the covariance kernel in the TGP. By the intrinsic properties of the GP (Ghosal and Roy, 2006), when the number of eigenfunctions is sufficiently large, the proposed BSP-BSS model can be well approximated by a truncated model representation, for which we develop a computationally feasible posterior computation algorithm for large-scale voxel-level imaging data. The R package BSPBSS is freely available on the author's GitHub at https://github.com/benwu233/BSPBSS.
4.1. Prior Representation of Intensity Functions
Consider the eigendecomposition of the covariance kernel $\kappa(\xi, \xi') = \sum_{l=1}^{\infty} \lambda_l \psi_l(\xi)\psi_l(\xi')$, where $\{\lambda_l\}_{l=1}^{\infty}$ is the set of eigenvalues with λl ≥ λl+1 for l = 1, 2,…, and $\{\psi_l\}_{l=1}^{\infty}$ represents the set of eigenfunctions that satisfy $\int_{\mathcal{B}} \psi_l(\xi)\psi_{l'}(\xi)\, d\xi = I(l = l')$ for any l, l′ ∈ {1, 2,…}. Equivalently, $B_j(\xi)$ can be represented as a linear combination of the eigenfunctions, i.e. $B_j(\xi) = \sum_{l=1}^{\infty} b_{jl}\psi_l(\xi)$, where the $b_{jl} \sim \mathrm{N}(0, \lambda_l)$, j = 1, … , q, l = 1,…, are mutually independent. In practice, we can truncate the summation at a sufficiently large finite number of components L to obtain a fairly good approximation of Bj, i.e. $B_j(\xi) \approx \psi(\xi)^{\top} b_j$, where $b_j = (b_{j1}, \ldots, b_{jL})^{\top}$ and $\psi(\xi) = \{\psi_1(\xi), \ldots, \psi_L(\xi)\}^{\top}$. Since the imaging signals appear to be smooth in many brain regions, to achieve a good approximation, the required number of eigenfunctions L is still much smaller than the number of voxels Vn. Thus, with this approximation, the number of parameters for Bj can be reduced substantially, leading to a feasible posterior computation algorithm. Based on the truncated approximation of Bj, we introduce the truncated TGP prior representation of the intensity functions. Let B = (b1,… , bq); then we approximate sj(ξ) in (3) with
$s_j(\xi) \approx g_{\zeta}\{\psi(\xi)^{\top} b_j\}, \quad b_j \overset{\mathrm{iid}}{\sim} \mathrm{N}(0_L, \Lambda), \qquad (9)$
where Λ = diag{λ1,… , λL} and 0L is a vector of zeros of length L. We resort to numerical approximation to compute the integral in (3). When analyzing voxel-level imaging data, $\mathcal{B}_v$ represents the small cubic region of voxel v, and $S_j(\mathcal{B}_v)$ can be accurately approximated by $|\mathcal{B}_v|\, s_j(\xi_v)$, where ξv is the center location of voxel v.
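The following minimal sketch illustrates the truncated representation (9), using a numerical (Nyström-style) eigendecomposition of an assumed squared-exponential kernel on a 1D grid; the grid size, L, and ζ are illustrative choices, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(2)
V, L, zeta = 400, 50, 0.3

xi = np.linspace(0.0, 1.0, V)[:, None]               # voxel centers
K = np.exp(-0.5 * ((xi - xi.T) / 0.05) ** 2)         # assumed SE kernel
lam, psi = np.linalg.eigh(K)                         # eigenpairs (ascending)
lam, psi = lam[::-1][:L], psi[:, ::-1][:, :L]        # keep the top-L

b_j = rng.normal(size=L) * np.sqrt(np.clip(lam, 0.0, None))  # b_jl ~ N(0, lambda_l)
B_j = psi @ b_j                                      # approximate GP at the voxels
s_j = np.where(np.abs(B_j) > zeta, B_j, 0.0)         # thresholded intensity, as in (9)
```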
4.2. Markov Chain Monte Carlo
We develop a Metropolis-Hastings within Gibbs sampling algorithm to simulate from the posterior distribution of the proposed BSP-BSS model (2) with the prior approximation (9). For A and σ, the full conditional distributions have closed forms, leading to Gibbs sampling update schemes. The parameter ζ is updated with the Metropolis algorithm with random-walk proposals. Updating the parameter B is the most challenging step due to the high dimensionality and the complexity of the full conditional density function, which involves the hard thresholding function and is hence discontinuous. We propose a smooth approximation of the hard thresholding function and adopt the stochastic gradient Hamiltonian Monte Carlo (SGHMC) method proposed by Chen et al. (2014) to update B given the other parameters (Algorithm 1).
Algorithm 1:
SGHMC for updating B at the hth MCMC iteration
sample momentum: ν0 ~ N(0, ηI);
set B0 = B(h−1);
for t = 1,… , T do
update Bt = Bt−1 + νt−1;
sample ωt ~ N(0, 2αηI);
sample index subsets $\mathcal{I}_t \subset \{1, \ldots, n\}$ and $\mathcal{V}_t \subset \{1, \ldots, V_n\}$;
update $\nu_t = (1 - \alpha)\nu_{t-1} - \eta \nabla\tilde{U}(B_t) + \omega_t$;
end
set (B(h), ν(h)) = (BT, νT);
In Algorithm 1, given the samples of the other parameters at the hth iteration, i.e., A(h), σ(h) and ζ(h), we compute the stochastic gradient term with respect to B by subsampling both the indices of images and the indices of regions, i.e.,

$\nabla\tilde{U}(B) = -\frac{n V_n}{|\mathcal{I}_t||\mathcal{V}_t|} \sum_{i \in \mathcal{I}_t} \sum_{v \in \mathcal{V}_t} \nabla_B \log \pi(X_{iv} \mid A^{(h)}, B, \sigma^{(h)}, \zeta^{(h)}) - \nabla_B \log \pi(B),$

where π(·|A, B, σ, ζ) is the density function of the observed image intensity given all parameters and π(B) is the prior of B. The two index sets $\mathcal{I}_t$ and $\mathcal{V}_t$ are random subsets of the image indices $\{1, \ldots, n\}$ and the region indices $\{1, \ldots, V_n\}$, respectively. The number of leapfrog steps T, the learning rate η and the momentum term (1 − α) can be chosen according to the suggestions of Chen et al. (2014). The details of the MCMC sampling scheme are in the Supplementary Material.
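The following is a minimal sketch of one SGHMC update of B in the spirit of Algorithm 1 (Chen et al., 2014); `grad_U` is a placeholder for the stochastic gradient computed on subsampled images and regions, and the tuning values η, α, and T are illustrative, not the paper's settings.

```python
import numpy as np

def sghmc_update(B, grad_U, eta=1e-4, alpha=0.1, T=10, rng=None):
    """One MCMC update of B via T leapfrog-style SGHMC steps."""
    rng = rng or np.random.default_rng()
    nu = rng.normal(scale=np.sqrt(eta), size=B.shape)        # resample momentum
    for _ in range(T):
        B = B + nu                                           # position update
        noise = rng.normal(scale=np.sqrt(2 * alpha * eta), size=B.shape)
        nu = (1 - alpha) * nu - eta * grad_U(B) + noise      # friction + injected noise
    return B, nu
```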
5. Simulations
We conduct simulations to evaluate the performance of the proposed model under various scenarios. In Scenarios I and II, observed images are generated from a mixture of source signals with geometric spatial patterns with either sharp or smooth edges. In Scenario III, we generate fMRI-type data using the neuroimaging Matlab toolbox SimTB (Erhardt et al., 2012), where the spatial source signals and the temporal mixing matrix are generated based on real fMRI spatial signals and time series. We compare the proposed BSP-BSS with ICA implemented using the popular Infomax algorithm (Bell and Sejnowski, 1995). We evaluate the performance of the methods in recovering the underlying source signals as well as the mixing matrices from the observed mixed data. In applications, spatial signal estimates from standard ICA algorithms such as Infomax ICA are often thresholded to identify activated regions in each source signal (McKeown et al., 1998; Kiviniemi et al., 2003). Therefore, we also consider a thresholded Infomax ICA (see the sketch below). Specifically, the estimated spatial source signals from Infomax ICA are transformed to Z-scores (McKeown et al., 1998), and thresholded source signals are obtained by retaining only Z-scores with a magnitude greater than two. For BSP-BSS, we use the modified squared exponential kernel for the TGP prior of s; details can be found in Section 2.6 in the Supplementary Material. The MCMC sampling runs for 4,000 iterations with 2,000 burn-in. The results for all three scenarios suggest that BSP-BSS achieves much better performance than existing methods. Next, we present Scenarios I and II in detail, and include details for Scenario III in Section 3.2 in the Supplementary Material.
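A minimal sketch of the comparator's post hoc rule: z-score each estimated ICA spatial map and keep only entries with |z| ≥ 2. Here `S_ica` is a hypothetical q × V array of Infomax ICA source estimates.

```python
import numpy as np

def threshold_ica_maps(S_ica, cutoff=2.0):
    """Z-score each row (spatial map) and zero out sub-threshold entries."""
    z = (S_ica - S_ica.mean(axis=1, keepdims=True)) / S_ica.std(axis=1, keepdims=True)
    return np.where(np.abs(z) >= cutoff, z, 0.0)
```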
5.1. Scenario I: Geometric Source Signals with Sharp Edges
We first consider a linear mixture of three latent sources with 2D geometric patterns with sharp edges. We generate three 30×30 binary source images as the true latent sources s*, in which the activated regions have planar geometric shapes (square, circle, and triangle) with sharp edges. We set the sample size to n = 30. Each column of the true mixing matrix A* is generated from the vMF prior with concentration parameter η = 0. We consider three cases with different noise levels (high, medium and low) by setting the noise variance σ*² to 1 × 10−3, 5 × 10−4 and 1 × 10−4, respectively. The corresponding average signal-to-noise ratios are around 0.41, 0.83 and 4.34, respectively.
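The following minimal sketch generates Scenario-I-style data: three binary 30 × 30 sources with geometric shapes, mixed with uniformly drawn (η = 0) scaled columns plus Gaussian noise; the shape placements and sizes are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(3)
gx, gy = np.meshgrid(np.arange(30), np.arange(30))

square = (gx > 3) & (gx < 12) & (gy > 3) & (gy < 12)
circle = (gx - 22) ** 2 + (gy - 22) ** 2 < 36
triangle = (gy > 17) & (gx < 12) & (gy - gx > 10)

S = np.stack([square, circle, triangle]).reshape(3, -1).astype(float)
n = 30
A = rng.normal(size=(n, 3))
A *= np.sqrt(n) / np.linalg.norm(A, axis=0)          # eta = 0 vMF columns, norm sqrt(n)
X = A @ S + rng.normal(scale=np.sqrt(1e-3), size=(n, S.shape[1]))
```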
5.2. Scenario II: Geometric Source Signals with Smooth Edges
Then we smooth the latent sources in Scenario I to generate spatial source signals with smooth edges, leading to a more challenging scenario. Specifically, we first generate binary images as in Scenario I and then replace the value on each pixel with the average over its neighbors to smooth the source signals, leading to smooth edges for the activated regions. Other simulation settings remain the same as in Scenario I. The smoothing reduces the signal-to-noise ratios of observations in high, medium, and low noise levels to 0.18, 0.35, and 1.76 respectively, therefore making the estimation more difficult. Figure 1 shows the true latent sources in both Scenarios I and II.
Fig. 1.
True spatial source signal intensities in Scenarios I and II.
Table 2 compares BSP-BSS and ICA over 100 simulation runs in terms of their performance as binary classifiers for separating activated and non-activated regions. Specifically, we compare the means and standard deviations of their positive and negative predictive values (PPV and NPV), sensitivity, and specificity. In Scenario I with sharp edges, both methods show high PPV, NPV, and specificity, with BSP-BSS demonstrating much higher sensitivity. This shows that BSP-BSS has higher statistical power than standard ICA in detecting activated regions with sharp edges while maintaining a similar false positive rate. Scenario II is a more challenging case for both methods as binary classifiers, since the activated regions have smooth edges and the signal-to-noise ratios are lower than in Scenario I. Consequently, we find some reduction in classification accuracy in Scenario II for both methods compared with Scenario I. However, BSP-BSS performs much better than ICA in Scenario II, especially in terms of NPV and sensitivity. In this challenging scenario, BSP-BSS still maintains a sensitivity of 0.5–0.7 across the noise levels, while the sensitivity of ICA drops dramatically to 0.09–0.13. These results indicate that BSP-BSS's advantage in statistical power for detecting activated regions is even more pronounced when the activation regions have smooth edges.
Table 2.
Selection accuracy of the activated regions for BSP-BSS and ICA in both simulation scenarios, shown as the mean (standard deviation) over 100 simulation runs. The ICA source signals are extracted using Infomax, transformed to Z-scores, and thresholded at |z| ≥ 2. All values are multiplied by 10³. Columns labeled (I) correspond to Scenario I (sharp edge) and columns labeled (II) to Scenario II (smooth edge).
| Noise | Method | PPV (I) | NPV (I) | Sens. (I) | Spec. (I) | PPV (II) | NPV (II) | Sens. (II) | Spec. (II) |
|---|---|---|---|---|---|---|---|---|---|
| Low | BSP-BSS | 995 (4) | 998 (1) | 988 (4) | 999 (1) | 998 (4) | 782 (13) | 701 (23) | 999 (3) |
| Low | ICA | 999 (10) | 974 (4) | 841 (27) | 1000 (1) | 1000 (1) | 552 (1) | 130 (4) | 1000 (0) |
| Medium | BSP-BSS | 996 (4) | 997 (1) | 983 (4) | 999 (1) | 999 (2) | 724 (11) | 591 (22) | 999 (1) |
| Medium | ICA | 998 (16) | 952 (4) | 704 (25) | 1000 (2) | 969 (15) | 546 (1) | 109 (5) | 997 (2) |
| High | BSP-BSS | 995 (4) | 996 (1) | 979 (5) | 999 (1) | 999 (4) | 689 (12) | 515 (26) | 999 (2) |
| High | ICA | 993 (5) | 934 (3) | 586 (21) | 999 (0) | 872 (58) | 538 (3) | 90 (9) | 988 (5) |
We also compare the accuracy of BSP-BSS and ICA in estimating the mixing matrix by evaluating the Amari error (Amari et al., 1996). The Amari error, a measure of the difference between two nonsingular matrices, has been used in ICA studies (Bach and Jordan, 2002; Chen and Bickel, 2006; Lee et al., 2011) as a convergence criterion for ICA algorithms and as a measure of accuracy between the true and estimated mixing matrices. The Amari error was originally defined for invertible square matrices, since the mixing matrix of many traditional ICA algorithms is square due to the dimension reduction and whitening pre-processing steps often conducted prior to ICA. In our BSP-BSS, however, such pre-processing steps are not required and the mixing matrix is not necessarily square. Therefore, we extend the Amari error to non-square matrices by considering the generalized inverse of A*. Specifically, we define the extended Amari error between A and A* as:
$d(A, A^*) = \frac{1}{2q}\sum_{i=1}^{q}\left(\frac{\sum_{j=1}^{q}|p_{ij}|}{\max_{j}|p_{ij}|} - 1\right) + \frac{1}{2q}\sum_{j=1}^{q}\left(\frac{\sum_{i=1}^{q}|p_{ij}|}{\max_{i}|p_{ij}|} - 1\right),$

where $(p_{ij}) = P = (A^*)^{+}A$ and $(A^*)^{+}$ denotes the Moore–Penrose generalized inverse of A*. Figure 2 compares the Amari errors of the estimated mixing matrices based on BSP-BSS and ICA over 100 simulation runs for both Scenarios I and II with boxplots. The results show that BSP-BSS has a much smaller Amari error than ICA. In summary, the comparisons between the proposed BSP-BSS and ICA show that our method can potentially provide a more powerful and accurate tool for blind source separation, especially in the presence of smooth source signals.
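A minimal sketch of an Amari-type discrepancy that allows non-square mixing matrices via the Moore–Penrose inverse; the normalization below follows the standard Amari index and may differ slightly from the paper's exact definition.

```python
import numpy as np

def amari_error(A_hat, A_true):
    """Amari-type error between n x q matrices via the generalized inverse."""
    P = np.abs(np.linalg.pinv(A_true) @ A_hat)       # q x q cross-talk matrix
    q = P.shape[0]
    row = (P.sum(axis=1) / P.max(axis=1) - 1.0).sum()
    col = (P.sum(axis=0) / P.max(axis=0) - 1.0).sum()
    return (row + col) / (2.0 * q * (q - 1.0))
```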
Fig. 2.
Amari errors of the estimated mixing coefficients with BSP-BSS and ICA (implemented using Infomax) over 100 simulation runs for simulation Scenarios I and II.
5.3. Selection of the Number of Latent Sources
As discussed in Section 2.3, the BSP-BSS model automatically selects the effective number of latent sources. To evaluate its accuracy, we generate observations as in Scenarios I and II above at the three noise levels, with the true effective number of latent sources qeff = 3. We decompose the observed data with the proposed BSP-BSS model with overspecified numbers of latent sources q = 5, 7, 9. To handle this challenging task, we fit the BSP-BSS model using a longer MCMC chain with 6,000 iterations and 3,000 burn-in, and assign the prior of the thresholding parameter ζ an upper bound such that π(ζ) ∝ I(0 < ζ < Q(s)) to improve convergence. Here, we take Q(s) to be the 95% quantile of the absolute values of all the sj(ξ). Then we estimate the effective number of latent sources using the proposed estimator in (7), which implies each source should have non-zero values on at least one pixel. The results in Table 3 show that the proposed estimator correctly identifies the true number of latent sources qeff = 3 in almost all of the simulation runs, even when the noise level is high and q is significantly overspecified in the BSP-BSS model. For the few simulation runs with incorrect results, the estimated effective number of latent sources is 4, which is very close to the true value. These results suggest that a promising strategy for identifying the effective number of latent sources is to specify a relatively large number of sources q in the proposed BSP-BSS model and then use the proposed estimator to identify the effective number of sources. This automatic selection strategy is very useful in real data applications, where we usually do not know the "true" number of sources prior to conducting the blind source separation.
Table 3.
The frequency of correct selection of the number of latent sources by BSP-BSS over 100 simulation runs. The observations are generated as in simulation Scenarios I (sharp edge, columns labeled (I)) and II (smooth edge, columns labeled (II)). The true number of source signals is qeff = 3. We fit the BSP-BSS model with q = 5, 7, or 9.
| Noise | q = 5 (I) | q = 7 (I) | q = 9 (I) | q = 5 (II) | q = 7 (II) | q = 9 (II) |
|---|---|---|---|---|---|---|
| Low | 100 | 99 | 100 | 100 | 100 | 94 |
| Medium | 100 | 100 | 100 | 100 | 100 | 99 |
| High | 100 | 100 | 100 | 98 | 100 | 100 |
6. Analysis of ABIDE Data
We apply our method to analyze the multi-subject resting-state fMRI (rs-fMRI) data from the Autism Brain Imaging Data Exchange (ABIDE) study (Di Martino et al., 2014) to investigate the differences in brain functional networks between patients with autism and healthy subjects.
6.1. Data Description
ABIDE collected functional and structural brain imaging data from laboratories around the world, aiming to accelerate the understanding of the neural bases of autism. We apply the proposed BSP-BSS to analyze the ABIDE I preprocessed data (Craddock et al., 2013). ABIDE I, released in August 2012, is a collaboration of 16 international imaging sites that have aggregated and are openly sharing neuroimaging data from 539 individuals (ages 7–64 years) with autism spectrum disorder (ASD) and 573 age-matched typical controls (Di Martino et al., 2014). We use the rs-fMRI data from ABIDE I preprocessed with the Configurable Pipeline for the Analysis of Connectomes (CPAC, http://fcp-indi.github.com). The preprocessing steps start with basic processing, including dropping the first several volumes, slice timing correction, motion realignment, and intensity normalization. Then nuisance variable regression is performed to remove confounding variation due to physiological processes (heartbeat and respiration), head motion, and low-frequency scanner drifts. Band-pass filtering is applied after the regression to retain frequencies between 0.01 Hz and 0.1 Hz. All the images are registered to the MNI standard brain space. Our analysis focuses on an imaging statistic, the weighted degree centrality (WDC), defined for each node as the sum of the weights of the edges connecting it to all the other nodes. WDC provides a useful measure of brain intrinsic connectivity networks (Zuo et al., 2012). It has been widely adopted to identify "functional hubs" and study the topology of these hub structures (Fransson et al., 2011; Langer et al., 2012; Li et al., 2016). In addition, WDC is one of the most widely used measures summarizing information in the functional connectivity at the voxel level (Zuo et al., 2012). A challenge in analyzing WDC is that the data may involve complex spatial dependence among voxels. To address this issue, we apply the proposed BSP-BSS method to analyze the WDC data. Specifically, using the ABIDE data preprocessing pipeline (Craddock et al., 2013), we obtain WDC data from unsmoothed preprocessed rs-fMRI data registered to the MNI152 (Grabner et al., 2006) 3mm space. After removing missing values, the data consist of 882 subjects, with 407 ASD patients and 475 healthy controls.
6.2. Results
We apply both the proposed BSP-BSS and ICA to the data described above. The modified squared exponential kernel is applied for the TGP prior of s, as in the simulation study (see details in Section 2.6 in the Supplementary Material). The number of eigenfunctions in the approximate representation is specified as 500, which we find is sufficiently large to capture the characteristics of the data. The number of latent sources is specified as 30. We run the MCMC algorithm for 30,000 iterations with 15,000 burn-in and thin the chain after burn-in to obtain 750 samples. Gelman and Rubin's convergence diagnostic (Gelman and Rubin, 1992) is used to evaluate the convergence of the algorithm. Five zero sources are found among the 30 sources estimated with the model, which implies q = 30 is sufficient to capture the effective number of latent sources. The activation regions are estimated as in (8), by thresholding the PIP at 0.5. Among the latent sources extracted by BSP-BSS, we identify several well-established brain functional networks. Figure 3 compares the functional networks identified by BSP-BSS and ICA, including the medial parietal cortex (MPC), bilateral inferior-lateral-parietal cortex (ilPC), and ventromedial prefrontal cortex (vmPFC), which are known as subregions of the default mode network (DMN). The ICA source signals are shown in Z-scores, and the BSP-BSS source signals are rescaled to [0, 1]. From Figure 3, we find that the spatial source signals estimated by BSP-BSS align better with the spatial distribution of the well-known functional networks reported in the neuroimaging literature (Smith et al., 2009; Iraji et al., 2019). For example, BSP-BSS has successfully identified key regions in the networks. In comparison, ICA has poor spatial coverage in some of the key regions, such as the vmPFC of the DMN, and encounters cross-talk between the MPC and ilPC. Our findings from the real data application are consistent with the simulation results: BSP-BSS has better statistical power than ICA in detecting regions of relevance in brain networks.
Fig. 3.
Common brain functional networks recovered by ICA and BSP-BSS with ABIDE data. The ICA source signals are extracted using Infomax, transformed to Z-scores and thresholded with |z|≥ 2. The BSP-BSS source signals are rescaled to [0, 1].
The default mode network dysfunction has often been reported in ASD studies (Cherkassky et al., 2006; Monk et al., 2009; Lynch et al., 2013; Uddin et al., 2013). In our study, we investigate the difference in the mixing coefficients of DMN subregions between the ASD patient group and the healthy group for both BSP-BSS and ICA. We perform the Wilcoxon test on the mixing coefficients to examine between-group differences. The p-values of the three subregions, i.e., MPC, ilPC and vmPFC, based on the Wilcoxon test are 0.0489, 0.9581, and 0.0138, respectively, for BSP-BSS and 0.0626, 0.7347, and 0.0003, respectively, for ICA. Therefore, both BSP-BSS and ICA find a significant difference in vmPFC between ASD patients and healthy controls, but BSP-BSS demonstrates better power in detecting the difference in MPC as compared to ICA.
7. Discussion
In this paper, we propose a new Bayesian spatial blind source separation method (BSP-BSS) to make inferences on sparse and spatially dependent source signals. We show that BSP-BSS has desirable theoretical properties, including model identifiability, large support of the priors, posterior consistency of the model parameters and selection consistency of the effective number of latent sources. The proposed BSP-BSS has two main advantages over existing BSS methods. First, thanks to the sparse prior, BSP-BSS can automatically identify activated regions in the sources. In comparison, most existing ICA methods generate non-sparse source estimates and require an additional thresholding procedure to identify activated regions, which is typically ad hoc and lacks theoretical justification. The sparsity also guides BSP-BSS to select the effective number of latent sources, providing a solution to a major challenge for traditional algorithms. Secondly, BSP-BSS directly accounts for spatial dependence within each source signal and leads to better recovery of sources with spatial patterns, which are very common in neuroimaging applications. In our model, we assume the noise terms are spatially independent. This implies that major spatially structured signals of brain activation are captured by the extracted source signals in the model, whereas the model residuals mainly reflect random residual variation in the imaging data after accounting for the spatially structured source signals. This is a commonly adopted assumption in blind source separation for brain imaging (Beckmann and Smith, 2004, 2005; Guo, 2011; Shi and Guo, 2016; Wang and Guo, 2019). In our experience, the proposed method still works well when there is mild to moderate spatial dependence in the residuals, as long as the spatial dependence in the source signals is stronger than that in the noise. We have included simulation results demonstrating that our model performance remains strong in the presence of spatially dependent noise (Section 4.2 in the Supplementary Material).
The simulation studies show that BSP-BSS significantly outperforms ICA when the true latent sources have spatially clustered activated regions. The application to fMRI data also demonstrates the advantage of our method: evaluating the proposed method on the multi-subject rs-fMRI data from the ABIDE dataset, BSP-BSS finds a significant difference in the mixing coefficients for a DMN subregion between the healthy and patient groups, while ICA does not. To the best of our knowledge, BSP-BSS is the first BSS method that simultaneously addresses the smoothness and sparsity of neuroimaging latent source signals. Potentially, our proposed method can be extended to multi-modality brain image analysis under the assumption that different modalities share some common spatial traits in the latent sources. Another possible future direction is extending the spatial model to a spatial-temporal model to handle longitudinal imaging data, which is also a very challenging topic in the neuroimaging field.
Acknowledgments
We are grateful to the associate editor and three anonymous reviewers for their valuable comments. This work was partially supported by the NIH grants R01MH105561 (Guo, Kang and Wu), R01GM124061 (Kang), R01DA048993 (Kang), R01MH118771 (Guo) and R01MH120299 (Guo).
Footnotes
Supplementary Material
Proofs of theoretical properties, details of the posterior computation, additional simulation results and sensitivity analysis are included in the Supplementary Material.
References
- Amari S. i., Cichocki A, and Yang HH (1996), “A New Learning Algorithm for Blind Signal Separation,” in Advances in Neural Information Processing Systems, pp. 757–763. [Google Scholar]
- Bach FR and Jordan MI (2002), “Kernel Independent Component Analysis,” Journal of Machine Learning Research, 3, 1–48. [Google Scholar]
- Banerjee S, Gelfand AE, Finley AO, and Sang H (2008), “Gaussian Predictive Process Models for Large Spatial Data Sets,” Journal of the Royal Statistical Society, Series B, 70, 825–848. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Barbieri MM and Berger JO (2004), “Optimal Predictive Model Selection,” The Annals of Statistics, 32, 870–897. [Google Scholar]
- Beckmann CF and Smith SM (2004), “Probabilistic Independent Component Analysis for Functional Magnetic Resonance Imaging,” IEEE Transactions on Medical Imaging, 23, 137–152. [DOI] [PubMed] [Google Scholar]
- — (2005), “Tensorial Extensions of Independent Component Analysis for Multisubject fMRI Analysis,” NeuroImage, 25, 294–311. [DOI] [PubMed] [Google Scholar]
- Bell AJ and Sejnowski TJ (1995), “An Information-Maximization Approach to Blind Separation and Blind Deconvolution,” Neural Computation, 7, 1129–1159. [DOI] [PubMed] [Google Scholar]
- Bhattacharya A and Dunson DB (2011), “Sparse Bayesian Infinite Factor Models,” Biometrika, 98, 291–306. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Biswal BB and Ulmer JL (1999), “Blind Source Separation of Multiple Signal Sources of fMRI Data Sets Using Independent Component Analysis,” Journal of Computer Assisted Tomography, 23, 265–271. [DOI] [PubMed] [Google Scholar]
- Boehm Vock LF, Reich BJ, Fuentes M, and Dominici F (2015), “Spatial Variable Selection Methods for Investigating Acute Health Effects of Fine Particulate Matter Components,” Biometrics, 71, 167–177. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Cai Q, Kang J, and Yu T (2020), “Bayesian Network Marker Selection via the Thresholded Graph Laplacian Gaussian Prior,” Bayesian Analysis, 15, 79–102. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Calhoun VD, Adali T, Pearlson GD, and Pekar JJ (2001), “A Method for Making Group Inferences from Functional MRI Data Using Independent Component Analysis,” Human Brain Mapping, 14, 140–151. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Cannarsa P and D’Aprile T (2015), “Signed Measures,” in Introduction to Measure Theory and Functional Analysis, Springer, pp. 253–270. [Google Scholar]
- Chen A and Bickel PJ (2006), “Efficient Independent Component Analysis,” The Annals of Statistics, 34, 2825–2855. [Google Scholar]
- Chen T, Fox E, and Guestrin C (2014), “Stochastic Gradient Hamiltonian Monte Carlo,” in International Conference on Machine Learning, pp. 1683–1691. [Google Scholar]
- Cherkassky VL, Kana RK, Keller TA, and Just MA (2006), “Functional Connectivity in a Baseline Resting-State Network in Autism,” Neuroreport, 17, 1687–1690. [DOI] [PubMed] [Google Scholar]
- Choudhuri N, Ghosal S, and Roy A (2004), “Bayesian Estimation of the Spectral Density of a Time Series,” Journal of the American Statistical Association, 99, 1050–1059. [Google Scholar]
- Craddock C, Benhajali Y, Chu C, Chouinard F, Evans A, Jakab A, Khundrakpam BS, Lewis JD, Li Q, Milham M, Yan C, and Bellec P (2013), “The Neuro Bureau Preprocessing Initiative: Open Sharing of Preprocessed Neuroimaging Data and Derivatives,” in Neuroinformatics 2013, Stockholm, Sweden. [Google Scholar]
- Derado G, Bowman FD, and Kilts CD (2010), “Modeling the Spatial and Temporal Dependence in fMRI Data,” Biometrics, 66, 949–957. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Di Martino A, Yan CG, Li Q, Denio E, Castellanos FX, Alaerts K, Anderson JS, Assaf M, Bookheimer SY, Dapretto M, Deen B, Delmonte S, Dinstein I, Ertl-Wagner B, Fair DA, Gallagher L, Kennedy DP, Keown CL, Keysers C, Lainhart JE, Lord C, Luna B, Menon V, Minshew NJ, Monk CS, Mueller S, Müller RA, Nebel MB, Nigg JT, O’ Hearn K, Pelphrey KA, Peltier SJ, Rudie JD, Sunaert S, Thioux M, Tyszka JM, Uddin LQ, Verhoeven JS, Wenderoth N, Wiggins JL, Mostofsky SH, and Milham MP (2014), “The Autism brain Imaging Data Exchange: towards a Large-scale Evaluation of the Intrinsic Brain Architecture in Autism,” Molecular Psychiatry, 19, 659. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Erhardt EB, Allen EA, Wei Y, Eichele T, and Calhoun VD (2012), “SimTB, a Simulation Toolbox for fMRI Data under a Model of Spatiotemporal Separability,” NeuroImage, 59, 4160–4167.
- Fevotte C and Godsill SJ (2006), “A Bayesian Approach for Blind Separation of Sparse Sources,” IEEE Transactions on Audio, Speech, and Language Processing, 14, 2174–2188.
- Fisher RA (1953), “Dispersion on a Sphere,” Proceedings of the Royal Society of London. Series A. Mathematical and Physical Sciences, 217, 295–305.
- Fransson P, Åden U, Blennow M, and Lagercrantz H (2011), “The Functional Architecture of the Infant Brain as Revealed by Resting-State fMRI,” Cerebral Cortex, 21, 145–154.
- Gelfand AE, Banerjee S, Sirmans C, Tu Y, and Ong SE (2007), “Multilevel Modeling Using Spatial Processes: Application to the Singapore Housing Market,” Computational Statistics & Data Analysis, 51, 3567–3579.
- Gelfand AE, Schmidt AM, Banerjee S, and Sirmans C (2004), “Nonstationary Multivariate Process Modeling through Spatially Varying Coregionalization,” Test, 13, 263–312.
- Gelman A and Rubin DB (1992), “Inference from Iterative Simulation Using Multiple Sequences,” Statistical Science, 7, 457–472.
- Ghosal S and Roy A (2006), “Posterior Consistency of Gaussian Process Prior for Nonparametric Binary Regression,” The Annals of Statistics, 34, 2413–2429.
- Grabner G, Janke AL, Budge MM, Smith D, Pruessner J, and Collins DL (2006), “Symmetric Atlasing and Model-Based Segmentation: An Application to the Hippocampus in Older Adults,” in International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, pp. 58–66.
- Griffanti L, Douaud G, Bijsterbosch J, Evangelisti S, Alfaro-Almagro F, Glasser MF, Duff EP, Fitzgibbon S, Westphal R, Carone D, Beckmann CF, and Smith SM (2017), “Hand Classification of fMRI ICA Noise Components,” NeuroImage, 154, 188–205.
- Guhaniyogi R, Finley AO, Banerjee S, and Kobe RK (2013), “Modeling Complex Spatial Dependencies: Low-Rank Spatially Varying Cross-Covariances with Application to Soil Nutrient Data,” Journal of Agricultural, Biological, and Environmental Statistics, 18, 274–298.
- Guo C, Kang J, and Johnson TD (2022), “A Spatial Bayesian Latent Factor Model for Image-on-Image Regression,” Biometrics, 78, 72–84.
- Guo Y (2011), “A General Probabilistic Model for Group Independent Component Analysis and Its Estimation Methods,” Biometrics, 67, 1532–1542.
- Hyun JW, Li Y, Gilmore JH, Lu Z, Styner M, and Zhu H (2014), “SGPP: Spatial Gaussian Predictive Process Models for Neuroimaging Data,” NeuroImage, 89, 70–80.
- Hyun JW, Li Y, Huang C, Styner M, Lin W, Zhu H, and the Alzheimer’s Disease Neuroimaging Initiative (2016), “STGP: Spatio-Temporal Gaussian Process Models for Longitudinal Neuroimaging Data,” NeuroImage, 134, 550–562.
- Hyvärinen A and Oja E (2000), “Independent Component Analysis: Algorithms and Applications,” Neural Networks, 13, 411–430.
- Iraji A, Deramus TP, Lewis N, Yaesoubi M, Stephen JM, Erhardt E, Belger A, Ford JM, McEwen S, Mathalon DH, Mueller BA, Pearlson GD, Potkin SG, Preda A, Turner JA, Vaidya JG, van Erp TG, and Calhoun VD (2019), “The Spatial Chronnectome Reveals a Dynamic Interplay between Functional Segregation and Integration,” Human Brain Mapping, 40, 3058–3077.
- Kallenberg O (2017), Random Measures, Theory and Applications, Springer.
- Kang J, Reich BJ, and Staicu A-M (2018), “Scalar-on-Image Regression via the Soft-Thresholded Gaussian Process,” Biometrika, 105, 165–184.
- Kiviniemi V, Kantola J-H, Jauhiainen J, Hyvärinen A, and Tervonen O (2003), “Independent Component Analysis of Nondeterministic fMRI Signal Sources,” NeuroImage, 19, 253–260.
- Knowles D and Ghahramani Z (2007), “Infinite Sparse Factor Analysis and Infinite Independent Components Analysis,” in International Conference on Independent Component Analysis and Signal Separation, Springer, pp. 381–388.
- Langer N, Pedroni A, Gianotti LR, Hänggi J, Knoch D, and Jäncke L (2012), “Functional Brain Network Efficiency Predicts Intelligence,” Human Brain Mapping, 33, 1393–1406.
- Lee S, Shen H, Truong Y, Lewis M, and Huang X (2011), “Independent Component Analysis Involving Autocorrelated Sources with an Application to Functional Magnetic Resonance Imaging,” Journal of the American Statistical Association, 106, 1009–1024.
- Li S, Ma X, Huang R, Li M, Tian J, Wen H, Lin C, Wang T, Zhan W, Fang J, et al. (2016), “Abnormal Degree Centrality in Neurologically Asymptomatic Patients with End-Stage Renal Disease: a Resting-State fMRI Study,” Clinical Neurophysiology, 127, 602–609.
- Li Y-O, Adali T, and Calhoun VD (2006), “Sample Dependence Correction for Order Selection in fMRI Analysis,” in 3rd IEEE International Symposium on Biomedical Imaging: Nano to Macro, IEEE, pp. 1072–1075.
- Li Y-O, Adali T, and Calhoun VD (2007), “Estimating the Number of Independent Components for Functional Magnetic Resonance Imaging Data,” Human Brain Mapping, 28, 1251–1266.
- Lynch CJ, Uddin LQ, Supekar K, Khouzam A, Phillips J, and Menon V (2013), “Default Mode Network in Childhood Autism: Posteromedial Cortex Heterogeneity and Relationship with Social Deficits,” Biological Psychiatry, 74, 212–219.
- Ma Y and Liu JS (2021), “On Posterior Consistency of Bayesian Factor Models in High Dimensions,” Bayesian Analysis.
- Marquand A, Howard M, Brammer M, Chu C, Coen S, and Mourão-Miranda J (2010), “Quantitative Prediction of Subjective Pain Intensity from Whole-Brain fMRI Data Using Gaussian Processes,” NeuroImage, 49, 2178–2189.
- McKeown MJ, Makeig S, Brown GG, Jung T-P, Kindermann SS, Bell AJ, and Sejnowski TJ (1998), “Analysis of fMRI Data by Blind Separation into Independent Spatial Components,” Human Brain Mapping, 6, 160–188.
- Minka TP (2001), “Automatic Choice of Dimensionality for PCA,” in Advances in Neural Information Processing Systems, pp. 598–604.
- Mohammad-Djafari A (2012), “Bayesian Approach with Prior Models which Enforce Sparsity in Signal and Image Processing,” EURASIP Journal on Advances in Signal Processing, 2012, 52.
- Monk CS, Peltier SJ, Wiggins JL, Weng S-J, Carrasco M, Risi S, and Lord C (2009), “Abnormalities of Intrinsic Functional Connectivity in Autism Spectrum Disorders,” NeuroImage, 47, 764–772.
- Montagna S, Wager T, Barrett LF, Johnson TD, and Nichols TE (2018), “Spatial Bayesian Latent Factor Regression Modeling of Coordinate-Based Meta-Analysis Data,” Biometrics, 74, 342–353.
- Morris JS, Brown PJ, Herrick RC, Baggerly KA, and Coombes KR (2008), “Bayesian Analysis of Mass Spectrometry Proteomic Data Using Wavelet-Based Functional Mixed Models,” Biometrics, 64, 479–489.
- Nakajima J and West M (2013a), “Bayesian Analysis of Latent Threshold Dynamic Models,” Journal of Business & Economic Statistics, 31, 151–164.
- — (2013b), “Bayesian Dynamic Factor Models: Latent Threshold Approach,” Journal of Financial Econometrics, 11, 116–153.
- Ni Y, Stingo FC, and Baladandayuthapani V (2019), “Bayesian Graphical Regression,” Journal of the American Statistical Association, 114, 184–197.
- Nychka D, Bandyopadhyay S, Hammerling D, Lindgren F, and Sain S (2015), “A Multiresolution Gaussian Process Model for the Analysis of Large Spatial Datasets,” Journal of Computational and Graphical Statistics, 24, 579–599.
- Power JD, Cohen AL, Nelson SM, Wig GS, Barnes KA, Church JA, Vogel AC, Laumann TO, Miezin FM, Schlaggar BL, et al. (2011), “Functional Network Organization of the Human Brain,” Neuron, 72, 665–678.
- Rasmussen CE (2003), “Gaussian Processes in Machine Learning,” in Summer School on Machine Learning, Springer, pp. 63–71.
- Ren Q and Banerjee S (2013), “Hierarchical Factor Models for Large Spatially Misaligned Data: A Low-Rank Predictive Process Approach,” Biometrics, 69, 19–30.
- Rowe DB (2002), “A Bayesian Approach to Blind Source Separation,” Journal of Interdisciplinary Mathematics, 5, 49–76.
- Samarov A and Tsybakov A (2004), “Nonparametric Independent Component Analysis,” Bernoulli, 10, 565–582.
- Samworth RJ and Yuan M (2012), “Independent Component Analysis via Nonparametric Maximum Likelihood Estimation,” The Annals of Statistics, 40, 2973–3002.
- Shen W, Ning J, and Yuan Y (2016), “Rate-Adaptive Bayesian Independent Component Analysis,” Electronic Journal of Statistics, 10, 3247–3264.
- Shi R and Guo Y (2016), “Investigating Differences in Brain Functional Networks Using Hierarchical Covariate-Adjusted Independent Component Analysis,” The Annals of Applied Statistics, 10, 1930–1957.
- Smith SM, Fox PT, Miller KL, Glahn DC, Fox PM, Mackay CE, Filippini N, Watkins KE, Toro R, Laird AR, and Beckmann CF (2009), “Correspondence of the Brain’s Functional Architecture during Activation and Rest,” Proceedings of the National Academy of Sciences, 106, 13040–13045.
- Smith SM and Nichols TE (2018), “Statistical Challenges in ‘Big Data’ Human Neuroimaging,” Neuron, 97, 263–268.
- Uddin LQ, Supekar K, and Menon V (2013), “Reconceptualizing Functional Brain Connectivity in Autism from a Developmental Perspective,” Frontiers in Human Neuroscience, 7, 458.
- Wang F and Wall MM (2003), “Generalized Common Spatial Factor Model,” Biostatistics, 4, 569–582.
- Wang Y and Guo Y (2019), “A Hierarchical Independent Component Analysis Model for Longitudinal Neuroimaging Studies,” NeuroImage, 189, 380–400.
- Watson GS (1982), “Distributions on the Circle and Sphere,” Journal of Applied Probability, 19, 265–280.
- Zayyani H, Babaie-Zadeh M, and Jutten C (2009), “An Iterative Bayesian Algorithm for Sparse Component Analysis in Presence of Noise,” IEEE Transactions on Signal Processing, 57, 4378–4390.
- Zhang L and Banerjee S (2022), “Spatial Factor Modeling: A Bayesian Matrix-Normal Approach for Misaligned Data,” Biometrics, 78, 560–573.
- Zuo X-N, Ehmke R, Mennes M, Imperati D, Castellanos FX, Sporns O, and Milham MP (2012), “Network Centrality in the Human Functional Connectome,” Cerebral Cortex, 22, 1862–1875.