Abstract
Blind source separation (BSS) aims to separate latent source signals from their mixtures. For spatially dependent signals in high-dimensional and large-scale data, such as neuroimaging data, most existing BSS methods do not take into account the spatial dependence and the sparsity of the latent source signals. To address these major limitations, we propose a Bayesian spatial blind source separation (BSP-BSS) approach for neuroimaging data analysis. We model the expectation of the observed images as a linear mixture of multiple sparse and piece-wise smooth latent source signals, for which we construct a new class of Bayesian nonparametric prior models by thresholding Gaussian processes. We assign von Mises-Fisher (vMF) priors to the mixing coefficients in the model. Under regularity conditions, we show that the proposed method has several desirable theoretical properties, including large support for the priors, consistency of the joint posterior distribution of the latent source intensity functions and the mixing coefficients, and selection consistency for the number of latent sources. We use extensive simulation studies and an analysis of resting-state fMRI data from the Autism Brain Imaging Data Exchange (ABIDE) study to demonstrate that BSP-BSS outperforms existing methods in separating latent brain networks and detecting activated regions in the latent sources.
Keywords: Latent source signal separations, posterior consistency, neuroimaging, sparse signals, spatially dependent signals
1. Introduction
Neuroimaging techniques such as functional magnetic resonance imaging (fMRI) have become an important tool to investigate neural processing in brain functioning. The observed three-dimensional (3D) brain imaging data, such as fMRI blood-oxygen-level-dependent (BOLD) effects, represent the combination of source signals generated by various underlying brain functional networks (Power et al., 2011). A common objective in imaging analysis is to decompose the observed whole-brain 3D images to identify and characterize underlying brain networks, which are organizations of multiple brain regions that demonstrate correlated brain signals measured by functional imaging. This problem can be addressed by blind source separation (BSS) methods (Biswal and Ulmer, 1999), which aim at separating latent source signals from their mixture observations. BSS methods such as principal component analysis (PCA) and independent component analysis (ICA) have been applied in neuroimaging studies for this purpose.
In this work, we propose a new Bayesian spatial blind source separation (BSP-BSS) modeling framework for extracting sparse latent signals from spatially dependent neuroimaging data. Let $\mathcal{B}_v$ be the cubic volume region of voxel v in the brain images. Let $X_{iv}$ represent the observed imaging intensity value at voxel v for the ith image. For example, $X_{iv}$ can represent the BOLD signal intensity at voxel v from the ith fMRI frame in a single-subject study, or the statistical map derived from neuroimaging data for the ith subject in a multi-subject study. We decompose the expectation of $X_{iv}$ as a linear combination of q latent components:
$\mathrm{E}(X_{iv}) = \sum_{j=1}^{q} A_{ij} S_j(\mathcal{B}_v), \quad i = 1, \ldots, n, \; v = 1, \ldots, V, \qquad (1)$
where $S_j(\mathcal{B}_v)$ represents the spatial source signal intensity at voxel v for the jth component. Here, $S_j(\cdot)$ is an intensity measure (Kallenberg, 2017), i.e., a deterministic function mapping a spatial region to the expected intensity of the spatial source signal. The $A_{ij}$'s are the mixing coefficients that mix the q latent source signals to generate the observed data. We make statistical inferences on the model parameters under the Bayesian framework. We adopt the thresholded Gaussian process (TGP) as the prior model to account for the sparsity and spatial dependence of the source signals, where the TGP is a stochastic process constructed by thresholding a smooth Gaussian process; it provides a large probability support for a class of sparse and piecewise-smooth functions.
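To fix ideas, the following is a minimal sketch of the generative structure in (1) under toy dimensions; all sizes, patch locations, and the noise scale are illustrative assumptions rather than the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)
n, V, q = 30, 900, 3                      # images, voxels, sources (assumed)

# Sparse, piece-wise smooth sources: each active on one disjoint patch.
S = np.zeros((q, V))
for j in range(q):
    S[j, 250 * j: 250 * j + 120] = 1.0    # plays the role of S_j(B_v)

A = rng.normal(size=(n, q))               # mixing coefficients
sigma = 0.05                              # common noise scale (assumed)
X = A @ S + sigma * rng.normal(size=(n, V))   # E(X_iv) = sum_j A_ij S_j(B_v)
```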
The proposed model in (1) has fundamental distinctions from commonly used BSS methods in neuroimaging such as ICA (McKeown et al., 1998; Calhoun et al., 2001; Beckmann and Smith, 2004; Shi and Guo, 2016). ICA has become one of the most commonly used tools for decomposing functional neuroimaging data to investigate underlying brain functional networks (Calhoun et al., 2001; Beckmann and Smith, 2005; Guo, 2011; Shi and Guo, 2016; Wang and Guo, 2019). However, the ICA methods have several methodological and practical limitations. First, existing ICA models typically assume that the components' spatial source signals across the brain are independent random samples from a latent distribution (Hyvärinen and Oja, 2000). However, the spatial independence assumption is usually violated in neuroimaging data, where there is known spatial dependence in the neural signals of the brain (Derado et al., 2010). This is due to the similarity of neural functioning in nearby brain locations and also to the spatial smoothing of the images that is commonly performed in the pre-processing of neuroimages to improve the signal-to-noise ratio. The spatial independence assumption inherited from classic ICA is, therefore, a major methodological limitation of existing ICA models in neuroimaging applications. Secondly, given the large number of voxels in the brain, raw results from existing ICA methods are often noisy and require thresholding to identify significant source signals in the brain. In the absence of a unified approach to thresholding, various strategies have been used in different studies (Beckmann and Smith, 2004; Griffanti et al., 2017), which reduces the comparability of results across studies. Thirdly, a well-known challenge in BSS methods such as ICA is the choice of the number of latent components in the decomposition. Although some selection methods have been proposed (Minka, 2001; Beckmann and Smith, 2004; Li et al., 2006, 2007), their performance is not well studied, and the selection criteria often lack intuitive appeal in neuroimaging applications.
Another line of work related to BSP-BSS is the Bayesian spatial factor model, which is a powerful tool for efficient dimension reduction and flexible covariance modeling of high-dimensional data and has been applied to imaging data analysis (Montagna et al., 2018; Guo et al., 2022). In particular, in the meta-analysis of neuroimaging data by Montagna et al. (2018), the latent factors connect the intensity of consistent brain activation patterns to covariates. In the image-on-image regression of Guo et al. (2022), spatial latent factors are introduced to capture the association between the response image and predictor images. However, the existing factor models cannot achieve the goals of BSP-BSS of making inferences on the spatial source signals from high-dimensional imaging data. In other application fields, such as public health (Wang and Wall, 2003), finance (Gelfand et al., 2004, 2007) and environmental statistics (Guhaniyogi et al., 2013; Ren and Banerjee, 2013; Zhang and Banerjee, 2022), spatially-oriented data are collected, and spatial factor models have been developed to capture the spatial variation and dependence of data collected across geographical areas. For example, motivated by the analysis of commercial real estate prices, the spatially varying linear model of coregionalization (SVLMC) (Gelfand et al., 2004) was constructed using latent Gaussian spatial processes. This method has been further extended to analyze air monitoring data in California (Ren and Banerjee, 2013), where indicator variables are included in the model to select the latent processes that capture spatial dependence.
Compared with existing methods, our proposed BSP-BSS has the following appealing features. First, BSP-BSS explicitly models the spatial dependence of the latent source signals via the covariance kernels of the TGP, which can effectively accommodate the complex spatial correlations in neuroimaging data. Furthermore, the TGP prior can outperform the shrinkage priors adopted by other Bayesian BSS methods (Fevotte and Godsill, 2006; Knowles and Ghahramani, 2007; Zayyani et al., 2009; Bhattacharya and Dunson, 2011; Mohammad-Djafari, 2012) in detecting sparse and spatially dependent signals. To select the important brain regions and networks, BSP-BSS provides a unified framework for making Bayesian inferences with theoretical guarantees on the thresholding parameter and provides more accurate measures of the uncertainty in brain region selection. In addition, BSP-BSS utilizes the intrinsic properties of the TGP to assign a positive prior probability to the scenario in which a latent source has zero or negligible effects in terms of brain activation, indicating that the latent source does not effectively generate meaningful source signals and hence can be eliminated. This provides a systematic Bayesian modeling approach to making posterior inferences on the number of effective latent sources.
The main advantages of BSP-BSS stem from using the TGP prior to achieve sparsity and spatial dependence of the latent source signals simultaneously. The TGP is a special stochastic process constructed by thresholding a latent Gaussian process (GP). The GP is a flexible modeling tool for functions, curves, and images that may involve complex correlation structures. Over the past decades, the GP has been applied extensively in spatial statistics and machine learning (Rasmussen, 2003; Banerjee et al., 2008; Boehm Vock et al., 2015; Nychka et al., 2015). Recent works in neuroimaging have also shown the advantage of the GP in modeling spatially correlated neuroimaging data (Marquand et al., 2010; Hyun et al., 2014, 2016; Kang et al., 2018). The soft-TGP has been successfully adopted to specify priors for spatially varying coefficients in scalar-on-image regression (Kang et al., 2018). In general, for Bayesian modeling of sparsity, thresholded Gaussian priors (Nakajima and West, 2013a,b; Ni et al., 2019; Cai et al., 2020) have been shown to be successful alternatives to shrinkage priors. Of note, the "hard" thresholding operator is widely used in the aforementioned literature to construct thresholded prior models, while the "soft" thresholding operator (Kang et al., 2018) is adopted particularly for modeling sparse, continuous, and piece-wise smooth effects of imaging predictors on an outcome variable. A soft-TGP prior ensures the spatially-varying function is continuous with probability one, i.e., it places zero probability on functions with discontinuous jumps. This model assumption is appropriate in scalar-on-image regression, as it reflects the continuous change of the scalar outcome due to changes of the imaging predictors over space. However, it is not a suitable prior specification for the spatial source signal intensity in our model, which reflects the complex activation patterns of brain images. In many neuroimaging studies, important brain activation regions commonly have sharp edges on their boundaries due to intrinsic properties of brain functions and anatomical structures (Smith and Nichols, 2018). Thus, we adopt a hard-TGP, which provides large prior support for a wide range of sparse and piece-wise smooth functions with discontinuous jumps. In addition, the threshold parameter in the hard-TGP has an appealing interpretation as the minimum detectable signal intensity, an interpretation not available with the soft-TGP.
The theoretical properties of BSP-BSS are completely different from those derived under the traditional BSS framework. In traditional BSS such as ICA, the latent source signals are assumed to be a set of random variables that follow a parametric or nonparametric distribution involving unknown parameters, and the theoretical justifications of statistical inference have focused on the mixing coefficients and the distribution of the latent source signals (Samarov and Tsybakov, 2004; Samworth and Yuan, 2012; Shen et al., 2016). In contrast, BSP-BSS treats each latent source signal as a sparse and piece-wise smooth spatially-varying function, so both the latent source signals and the mixing coefficients are unknown parameters of interest. Under the Bayesian inference framework, we assign the von Mises-Fisher (vMF) priors (Fisher, 1953; Watson, 1982) to the mixing coefficients, ensuring model identifiability, and we specify the priors for the latent source signals using the TGP, ensuring their sparsity and spatial dependence simultaneously. We establish the theoretical properties of the proposed model, which enjoys large prior support, leading to the joint posterior consistency of the mixing coefficients and latent source intensity functions, as well as the selection consistency of the effective number of latent sources.
The rest of the paper is organized as follows: Section 2 develops the new model along with the identifiability conditions, the prior specifications, and the posterior inference procedure. Section 3 establishes the theoretical properties of the proposed method. Section 4 focuses on the details of posterior computation, where we adopt the stochastic gradient Hamiltonian Monte Carlo (SGHMC) algorithm. The advantages of the proposed method over existing methods are demonstrated in Section 5 with simulations and in Section 6 with an analysis of resting-state fMRI data from the Autism Brain Imaging Data Exchange (ABIDE) study. Section 7 concludes with a brief discussion of future work.
2. Bayesian Spatial Blind Source Separation
Let $\mathcal{B}$ be a compact region in the d-dimensional Euclidean space $\mathbb{R}^d$ for a positive integer d. Suppose $\mathcal{B}$ is partitioned into V disjoint but spatially contiguous sub-regions, denoted as $\mathcal{B}_1, \ldots, \mathcal{B}_V$, such that $\mathcal{B} = \cup_{v=1}^{V} \mathcal{B}_v$. In neuroimaging applications, $\mathcal{B}$ represents the whole brain region and each $\mathcal{B}_v$ may represent a voxel, i.e. the basic cubic volume element in the 3D image; it may also refer to a brain region of interest, i.e. a collection of spatially contiguous voxels. Suppose we obtain observations from n images on $\mathcal{B}$ and denote by $X_{iv}$ the intensity value over $\mathcal{B}_v$ for the ith observed image (i = 1,… , n). We perform spatial blind source separation on $X_{iv}$ into q latent source components:
$X_{iv} = \sum_{j=1}^{q} A_{ij} S_j(\mathcal{B}_v) + \epsilon_{iv}, \qquad (2)$
where $\epsilon_{iv}$ follows a zero-mean normal distribution whose variance $\sigma^2(\mathcal{B}_v)$ measures the total variability of spatial noise in $\mathcal{B}_v$. We assume $\{\epsilon_{iv}\}$ are independent over i and v. The functions $S_j(\cdot)$ and $\sigma^2(\cdot)$ are intensity measures defined on the measurable space $(\mathcal{B}, \mathscr{B})$, with $\mathscr{B}$ being a σ-field of the brain region $\mathcal{B}$ (Kallenberg, 2017). We assume $S_j(\cdot)$ is a signed measure (Cannarsa and D'Aprile, 2015) in that we allow the observed image $X_{iv}$ to take both positive and negative values. For the jth component, $S_j(\mathcal{B}_v)$ represents the intensity of latent source signals from $\mathcal{B}_v$, and $A_{ij}$ is the unknown mixing coefficient for the ith observed image. For any $\mathcal{B}_v$, let $|\mathcal{B}_v|$ be the Lebesgue measure of $\mathcal{B}_v$, and we assume
$S_j(\mathcal{B}_v) = \int_{\mathcal{B}_v} s_j(\xi)\, d\xi, \qquad \sigma^2(\mathcal{B}_v) = \int_{\mathcal{B}_v} \sigma^2(\xi)\, d\xi, \qquad (3)$
where $s_j(\xi)$ and $\sigma^2(\xi)$ represent the jth spatial source signal intensity and the spatial noise variance intensity at location ξ in the brain region $\mathcal{B}$, respectively. Write $A = (A_{ij}) \in \mathbb{R}^{n \times q}$ as the mixing matrix with $A_j$ being the jth column of A, and $s(\cdot) = \{s_1(\cdot), \ldots, s_q(\cdot)\}$ as the collection of q unknown spatial source intensity functions. For simplicity, we drop "(·)" and write s = s(·) and sj = sj(·) for the rest of the paper. We define the effective number of latent sources as follows:
$q_{\mathrm{eff}} = \sum_{j=1}^{q} I(\|s_j\|_1 > 0), \qquad (4)$
where ∥sj∥1, i.e. the L1 norm of sj, reflects the effect size of the jth latent source signal and qeff counts the number of latent sources with nonzero effects. In neuroimaging applications, qeff represents the number of unique activation patterns that are essential to recovering the observed brain images, potentially providing insights on brain functions and/or structures.
The proposed model is a new modeling framework for spatial blind source signal separation, where the latent source signals are represented as the deterministic intensity functions. This model assumption is fundamentally different from existing BSS methods such as ICA and factor models in which the latent source signals are random variables and only their distributions are identifiable. As the parameters of interest include multiple functions and matrices, it is important to define the parameter space and establish model identifiability conditions as a foundation for developing a formal statistical inference procedure. We discuss these issues in detail in Section 2.1.
2.1. Parameter Space and Model Identifiability
We begin with notation. For a real function f on $\mathcal{B}$, let $\|f\|_p = \{\int_{\mathcal{B}} |f(\xi)|^p\, d\xi\}^{1/p}$, $p \ge 1$, be the $L_p$-norm, and denote by $\|f\|_\infty = \sup_{\xi \in \mathcal{B}} |f(\xi)|$ the $L_\infty$-norm. For an array of real functions $f = (f_1, \ldots, f_q)^\top$, let $\|f\|_p = \sum_{j=1}^{q} \|f_j\|_p$ be the $L_p$-norm, and denote by $\|f\|_\infty = \max_j \|f_j\|_\infty$ the $L_\infty$-norm. Similarly, for any vector $v = (v_1, \ldots, v_n)^\top$, let $\|v\|_p = (\sum_{i} |v_i|^p)^{1/p}$ be the $L_p$-norm and $\|v\|_\infty = \max_i |v_i|$ the $L_\infty$-norm. For any matrix $V = (V_{ij})$, let $\|V\|_p = (\sum_{i,j} |V_{ij}|^p)^{1/p}$ be the $L_p$-norm and $\|V\|_\infty = \max_{i,j} |V_{ij}|$ the $L_\infty$-norm. Let $C^\rho(\mathcal{B})$ denote the class of functions differentiable up to order ρ on $\mathcal{B}$. Let $I(\mathcal{E})$ be an event indicator function, where $I(\mathcal{E}) = 1$ if event $\mathcal{E}$ occurs and $I(\mathcal{E}) = 0$ otherwise. Denote by A*, s*, and σ* the true parameters of the model that generates the data, and let $\Theta = \mathcal{A} \times \mathcal{S}$ be the parameter space of A and s, where $\mathcal{A}$ and $\mathcal{S}$ are defined in the assumptions below.
Assumption 1.
The true intensity functions $s^* = (s_1^*, \ldots, s_q^*)$ belong to a space $\mathcal{S}$, where $\mathcal{S}$ is a set of function arrays defined on a compact closed set $\mathcal{B}$. We say $s \in \mathcal{S}$ if there exists a permutation of {1,… , q}, denoted as {ω1,… , ωq}, such that $(s_{\omega_1}, \ldots, s_{\omega_q})$ satisfies the conditions below. Without loss of generality, we take $\omega_j = j$ for simplicity. We assume there exists a non-empty open set $\mathcal{R}_j$ with $\mathcal{R}_j \subset \mathcal{B}$, and that the $\mathcal{R}_j$'s are non-overlapping across j's, i.e. $\mathcal{R}_j \cap \mathcal{R}_{j'} = \emptyset$ for j ≠ j′. We assume $\mathcal{R}_j$ and sj satisfy the following two conditions: (1) sparsity: there exists a constant ζ > 0 such that $|s_j(\xi)| > \zeta$ for $\xi \in \mathcal{R}_j$ and $s_j(\xi) = 0$ for $\xi \notin \bar{\mathcal{R}}_j$, where $\bar{\mathcal{R}}_j$ is the closure of $\mathcal{R}_j$; and (2) piece-wise smoothness: there exists an integer ρ0 > 0 such that $s_j \in C^{\rho_0}(\bar{\mathcal{R}}_j)$.
Assumption 2.
The jth column of the true mixing matrix A* belongs to a space defined as $\mathcal{A}_n = \{a \in \mathbb{R}^n : \|a\|_2 = \sqrt{n}\}$. Let $\mathcal{A} = \mathcal{A}_n^q$ and $\Theta = \mathcal{A} \times \mathcal{S}$.
Assumptions 1 and 2 define the parameter space of the proposed model. We assume the latent source signal intensity functions belong to a functional space of piece-wise smooth, sparse and bounded functions. This assumption provides flexibility for modeling intensity functions with various shapes. For a general BSS problem, it may be strong to assume that the nonzero signal regions of the intensity functions do not overlap. However, in neuroimaging applications, the well-known functional brain activation regions typically do not overlap (Smith et al., 2009). In the simulation study (see Section 5), we also consider cases where the true intensity functions partially overlap between different source signals, and we show that the proposed model achieves better accuracy in identifying the true sources than the Infomax ICA (Bell and Sejnowski, 1995), a classic ICA algorithm. It is well known that an ICA model is identifiable up to permutation and scaling of the sources. Our model has a similar but more challenging issue: the intensity functions s of interest may not be identifiable, since the model involves the integrals of the intensity functions over brain regions. Thus, to ensure scale identifiability, we assume in Assumption 2 that the L2-norm of each column of the mixing coefficient matrix A equals $\sqrt{n}$. A similar assumption has been made in Bayesian factor analysis, known as the $\sqrt{n}$-orthonormal factor assumption (Ma and Liu, 2021), to avoid magnitude inflation of the posterior samples of the loading matrix. To ensure the intensity function s is uniquely determined by the intensity measure S, we impose additional constraints on the spatial regions $\{\mathcal{B}_v\}_{v=1}^{V}$ in Assumption 3.
Assumption 3.
The spatial regions $\{\mathcal{B}_v\}_{v=1}^{V}$ satisfy: 1) $\cup_{v=1}^{V} \mathcal{B}_v = \mathcal{B}$ and $\mathcal{B}_v \cap \mathcal{B}_{v'} = \emptyset$ for any v ≠ v′. 2) For any v and j, either $\mathcal{B}_v \subseteq \bar{\mathcal{R}}_j$ or $\mathcal{B}_v \cap \mathcal{R}_j = \emptyset$. 3) There exist constants K0 and K1 such that $K_0 \le V |\mathcal{B}_v| \le K_1$ for all v, where $|\mathcal{B}_v|$ is the Lebesgue measure of $\mathcal{B}_v$.
Now we establish the model identifiability in the following proposition.
Proposition 1.
(Identifiability) If Assumptions 1 – 3 hold, model (2) is identifiable up to a joint flipping of the signs of source functions and mixing coefficients, i.e., for any (A, s), (A′, s′) ∈ Θ, n > 0, and V > 0, if $\sum_{j=1}^{q} A_{ij} S_j(\mathcal{B}_v) = \sum_{j=1}^{q} A'_{ij} S'_j(\mathcal{B}_v)$ for all i and v, there exists a flipping indicator $\delta_0 \in \{-1, 1\}^q$ such that (A[δ0], s[δ0]) = (A′, s′), where $A[\delta_0] = (\delta_{0,1} A_1, \ldots, \delta_{0,q} A_q)$ and $s[\delta_0] = (\delta_{0,1} s_1, \ldots, \delta_{0,q} s_q)$ are the flipped versions of A and s, respectively.
The identifiability stated above does not involve permutation of the sources as the order of the sources is specified in Assumption 1. The model is identifiable up to permutation if we reconstruct the true parameter space accordingly.
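To illustrate how Proposition 1 is used in practice, the following is a minimal sketch (with hypothetical array names `A_hat`, `S_hat`, `S_true`) of resolving the sign ambiguity when comparing an estimate to the truth: each source and its mixing column are flipped jointly, which corresponds to choosing the flipping indicator δ0.

```python
import numpy as np

def align_signs(A_hat, S_hat, S_true):
    """Jointly flip each estimated source (rows of S_hat, q x V) and its
    mixing column (columns of A_hat, n x q) so that each source is
    positively correlated with the corresponding true source."""
    delta = np.sign(np.sum(S_hat * S_true, axis=1))   # flipping indicator in {-1, 1}^q
    delta[delta == 0] = 1.0                           # guard against exact zeros
    return A_hat * delta[None, :], S_hat * delta[:, None]
```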
2.2. Prior Specifications
We discuss the prior specifications for BSP-BSS. For the mixing coefficient matrix A, we assign the independent vMF priors to its scaled columns, i.e.,
$A_j / \sqrt{n} \overset{\mathrm{iid}}{\sim} \mathrm{vMF}(\eta, \rho), \quad \text{with density } f(a \mid \eta, \rho) \propto \exp(\eta\, \rho^{\top} a), \quad j = 1, \ldots, q, \qquad (5)$
where η > 0 and ρ is an n-dimensional vector with ∥ρ∥2 = 1. The vMF distribution is chosen because it is defined on the (n − 1)-dimensional unit sphere in $\mathbb{R}^n$. This property ensures that $\|A_j\|_2 = \sqrt{n}$ (Assumption 2) holds with probability one. Let $\mathrm{GP}(\mu, \kappa)$ represent a Gaussian process with mean function μ and covariance kernel κ. We assume sj follows a TGP, denoted as $s_j \sim \mathrm{TGP}(\mu, \kappa, \zeta)$, which can be constructed as follows:
$s_j(\xi) = g_{\zeta}\{B_j(\xi)\}, \quad B_j \sim \mathrm{GP}(\mu, \kappa), \qquad (6)$
where $g_\zeta(x) = I(|x| > \zeta)\,x$ for ζ > 0 is a hard thresholding function. We also assume the sj's are independent across j's. The independence assumption is not strictly required for establishing the asymptotic properties of our model, but it is generally assumed in BSS for neuroimaging and, in our experience, leads to more efficient posterior computation algorithms and satisfactory empirical results for imaging data analysis. If one would like to model dependence across latent sources, a more general covariance structure (Rowe, 2002) can be considered for special applications, but that is beyond the scope of this paper.
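The following minimal sketch (not the paper's implementation) illustrates draws from the two priors: a hard-thresholded GP draw for one source on a 1D grid, and a scaled mixing column; the squared-exponential kernel, the grid, ζ, and the η = 0 (uniform-on-the-sphere) special case are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# TGP draw: sample a latent GP on a grid, then apply g_zeta(x) = 1(|x| > zeta) x.
m = 200
xi = np.linspace(0.0, 1.0, m)[:, None]
K = np.exp(-0.5 * ((xi - xi.T) / 0.05) ** 2) + 1e-6 * np.eye(m)  # SE kernel + jitter
B_j = np.linalg.cholesky(K) @ rng.normal(size=m)                 # latent GP sample
zeta = 0.5
s_j = np.where(np.abs(B_j) > zeta, B_j, 0.0)                     # hard-thresholded source

# vMF draw with eta = 0: a normalized Gaussian vector is uniform on the sphere;
# rescaling gives a mixing column with ||A_j||_2 = sqrt(n), as in Assumption 2.
n = 30
A_j = rng.normal(size=n)
A_j *= np.sqrt(n) / np.linalg.norm(A_j)
```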
To make fully Bayesian inferences on the hyperparameters, we assign independent inverse-gamma priors to $\sigma^2(\mathcal{B}_v)$ across v with shape $a_\sigma$ and scale $b_\sigma$, denoted as $\sigma^2(\mathcal{B}_v) \sim \mathrm{IG}(a_\sigma, b_\sigma)$, which is a conjugate prior, leading to a Gibbs sampling update scheme in the posterior computation. We adopt a noninformative prior for the thresholding parameter ζ by assuming $\pi(\zeta) \propto I(\zeta > 0)$.
2.3. Bayesian Inference
For posterior inference, we resort to Markov chain Monte Carlo (MCMC), which we discuss in detail in Section 4. Suppose we obtain H MCMC samples of (A, s), denoted as $\{(A^{(h)}, s^{(h)})\}_{h=1}^{H}$. We estimate A by the posterior mean, i.e., $\hat{A} = H^{-1}\sum_{h=1}^{H} A^{(h)}$. To estimate s and detect the activation regions, we first compute the posterior inclusion probability (PIP), i.e., $\mathrm{PIP}_j(\xi) = \Pr\{s_j(\xi) \neq 0 \mid X\}$, for each intensity function j at each location ξ, estimated by $\widehat{\mathrm{PIP}}_j(\xi) = H^{-1}\sum_{h=1}^{H} I\{s_j^{(h)}(\xi) \neq 0\}$. Then, we estimate $s_j(\xi)$ by the weighted posterior mean $\hat{s}_j(\xi) = \sum_{h=1}^{H} s_j^{(h)}(\xi) / \sum_{h=1}^{H} I\{s_j^{(h)}(\xi) \neq 0\}$ if $\widehat{\mathrm{PIP}}_j(\xi) \neq 0$, and set $\hat{s}_j(\xi) = 0$ otherwise. We estimate the effective number of latent source signals in (4) using $\hat{s} = (\hat{s}_1, \ldots, \hat{s}_q)$ by
$\hat{q}_{\mathrm{eff}} = \sum_{j=1}^{q} I(\|\hat{s}_j\|_1 > 0). \qquad (7)$
Given a posterior probability level p0 ∈ (0, 1), we estimate the activation region by
$\hat{\mathcal{R}}_j(p_0) = \{\xi \in \mathcal{B} : \widehat{\mathrm{PIP}}_j(\xi) \geq p_0\}, \qquad (8)$
which is interpreted as the jth brain activation region, where each location has a nonzero effect with marginal posterior probability at least p0. A common choice of p0 is 0.5, which has been widely adopted as the marginal median posterior inclusion probability criterion for model selection in linear regression (Barbieri and Berger, 2004). To choose p0 while controlling the false discovery rate (FDR), we suggest adopting the approach of Morris et al. (2008) for Bayesian functional data analysis. Both approaches can accurately select the activation regions, with only slight differences when the sample size is large or the signal-to-noise ratio is high. See detailed comparisons in Section 3 of the Supplementary Material.
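As an illustration, the following minimal sketch computes these posterior summaries, assuming a hypothetical array `s_draws` of shape (H, q, V) holding the MCMC draws of the source intensities at the voxel centers.

```python
import numpy as np

def posterior_summaries(s_draws, p0=0.5):
    """Estimate PIPs, sources, activation regions (8), and q_eff (7)."""
    active = (s_draws != 0)                              # H x q x V indicators
    pip = active.mean(axis=0)                            # estimated PIP_j at each voxel
    n_active = np.maximum(active.sum(axis=0), 1)         # avoid division by zero
    # weighted posterior mean: average of draws over iterations with nonzero signal
    s_hat = np.where(pip > 0, s_draws.sum(axis=0) / n_active, 0.0)
    regions = (pip >= p0)                                # activation regions, as in (8)
    q_eff_hat = int(np.any(s_hat != 0, axis=1).sum())    # estimator (7)
    return pip, s_hat, regions, q_eff_hat
```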
3. Theoretical Properties
We investigate the theoretical properties of the BSP-BSS model. The motivation for this investigation is the nonparametric nature of the proposed model, as it involves unknown intensity functions and the number of unknown mixing coefficients increases with the sample size. Thus, the classic theory of Bayesian inference for parametric models does not apply to BSP-BSS, for which we need to study three main theoretical properties: large support of the TGP and vMF distributions as priors for the true parameter space (Theorem 1), joint posterior consistency of the mixing coefficients and the latent sources (Theorem 2), and selection consistency for the effective number of latent sources (Theorem 3). For simplicity, we assume the hyperparameters are fixed at certain values. We follow the general posterior consistency theorem of Choudhuri et al. (2004) to prove Theorem 2, which requires verifying the prior positivity of neighborhoods (Lemma 1) and constructing uniformly consistent tests whose type I and type II errors have specific bounds (Lemmas 2–8). Theorems 2 and 3 are established in the regime where both the sample size n and the number of spatial regions V grow large. Hence, we assume V depends on n and write V as Vn in the rest of the paper. Details of the proofs can be found in the Supplementary Material.
3.1. Assumptions
We introduce the following additional assumptions for the theorems.
Assumption 4.
There exist constants $M_2, M_3 > 0$ such that $M_2 \le \sigma^{2}(\xi) \le M_3$ for any $\xi \in \mathcal{B}$.
Assumption 5.
There exist constants c1, c2 and ν such that $c_1 n^{1/\nu} \le V_n \le c_2 n^{1/\nu}$ and $0 < \nu < 1 - d/(2\rho_0)$, with d being the dimension of the spatial space $\mathcal{B}$.
Assumption 6.
Given $\xi \in \mathcal{B}$, the kernel function κ(ξ,·) has continuous partial derivatives up to order 2ρ0 + 2.
Assumption 4 imposes restrictions on the noise variance intensity $\sigma^2(\cdot)$, which ensure that the total variance over the brain is bounded away from zero and infinity. Assumption 5 implies that Vn should grow at a polynomial rate in n that is related to the dimension d and the smoothness of the kernel function of the TGP. This assumption is sensible in that the number of voxels of the standard brain template is much larger than the number of images in neuroimaging studies. We specify the smoothness of the kernel function in Assumption 6, following Ghosal and Roy (2006). We summarize all the assumptions along with their interpretations in neuroimaging applications in Table 1.
Table 1.
Summary of assumptions and their interpretations in neuroimaging applications.
| | Assumption | Interpretation |
|---|---|---|
| A.1 | Source intensity functions are sparse and piece-wise smooth, with non-zero regions not overlapping across different latent sources. | Neural signals are sparse across the brain; the signal changes smoothly in regions with the same type of brain tissue; there is little spatial overlap between major brain networks. |
| A.2 | Each column of the mixing matrix A satisfies $\|A_j\|_2 = \sqrt{n}$. | Mixing weights of important brain activation patterns are scale-invariant. |
| A.3 | The sizes of the spatial regions satisfy $K_0 \le V|\mathcal{B}_v| \le K_1$. | Spatial volumes are similar across voxels/brain regions; the volume of a voxel is smaller in higher-resolution images. |
| A.4 | The noise variance intensity satisfies $M_2 \le \sigma^2(\xi) \le M_3$. | The noise level in brain images is bounded. |
| A.5 | The number of regions satisfies $c_1 n^{1/\nu} \le V_n \le c_2 n^{1/\nu}$ with $0 < \nu < 1 - d/(2\rho_0)$. | The number of voxels/ROIs can grow faster than the number of images. |
| A.6 | The kernel function κ(ξ,·) is smooth up to order 2ρ0 + 2. | The change rate of brain signals over voxels is bounded in the smooth transition region. |
3.2. Large Support
We show in Theorem 1 that the TGP and vMF priors have large support. This result ensures that the prior assigns positive probability to arbitrarily small neighborhoods of any value in the true parameter space.
Theorem 1.
(Large support) Suppose A and s are independent and follow the priors (5) and (6) specified in Section 2.2 with some hyperparameters. If Assumptions 1 and 2 hold, then for any (A*, s*) ∈ Θ, any flipping indicator δ and any ε > 0,

$\Pi\{\|A - A^*[\delta]\|_1 / n < \varepsilon, \ \|s - s^*[\delta]\|_1 < \varepsilon\} > 0.$
3.3. Posterior Consistency
Next, we establish the consistency of the joint posterior distribution of A and s, which provides theoretical justifications for large-scale imaging data analysis via BSP-BSS. For any 0 < M1 < ∞, let $\tilde{\Theta} := \{(A, s) \in \Theta : \|A\|_\infty < M_1\}$ be the parameter space of interest.
Theorem 2.
(Consistency) If Assumptions 1 – 6 hold, then for any ε > 0 there exists a flipping indicator δ0 such that, as n → ∞,

$\Pi\{(A, s) \in \tilde{\Theta} : \|A - A^*[\delta_0]\|_1 / n < \varepsilon, \ \|s - s^*[\delta_0]\|_1 < \varepsilon \mid X\} \to 1$

in $\mathbb{P}_{\Theta^*}$-probability, for any true parameter $\Theta^* = (A^*, s^*) \in \tilde{\Theta}$, where $\mathbb{P}_{\Theta^*}$ is the actual distribution of the data X given Θ*.
In Theorem 2, we restrict the parameter space of interest to $\tilde{\Theta}$, which only includes bounded A. This consistency result indicates that the joint posterior distribution of A and s concentrates on an arbitrarily small neighborhood of the true parameter Θ* in $\tilde{\Theta}$, as both the number of voxels (or regions) Vn and the number of observed images n go to infinity. The neighborhood of A* is defined with the L1-norm scaled by 1/n, since the dimension of A increases with the sample size; this implies that the mixing coefficients converge to the truth on average across images. The neighborhood of s* is defined with the functional L1-norm.
3.4. Selection Consistency on the Number of Latent Sources
To perform the theoretical analysis of BSP-BSS for selecting the number of latent source signals, we extend the parameter space by including the zero-intensity function in the functional space of each latent source, and establish the following theorem.
Theorem 3.
If Assumptions 1 – 6 hold, then as n → ∞,

$\Pi\{q_{\mathrm{eff}}(s) = q_{\mathrm{eff}}(s^*) \mid X\} \to 1$

in $\mathbb{P}_{\Theta^*}$-probability, where $q_{\mathrm{eff}}(\cdot)$ is the effective number of latent sources as defined in equation (4), i.e. $q_{\mathrm{eff}}(s) = \sum_{j=1}^{q} I(\|s_j\|_1 > 0)$, and $q_{\mathrm{eff}}(s^*)$ is its value at the true sources.
Theorem 3 implies that BSP-BSS can correctly estimate the effective number of latent sources with a high probability for a sufficiently large number of images. Thus, as long as the specified q is adequately large, BSP-BSS can potentially identify all the effective latent source signals among the observed images. This property is especially useful when there is a lack of prior knowledge on the number of latent sources. The finite-sample performance on latent source signal selection is investigated through a simulation study in Section 5.3.
4. Posterior Computation
Now we discuss the posterior computation details. The spatial resolution of brain imaging data can be high, and the standard brain template may contain hundreds of thousands of voxels. This poses computational challenges for posterior inference on BSP-BSS with voxel-level data. To address this issue, we adopt an equivalent representation of the prior model for the intensity functions in light of the eigendecomposition of the covariance kernel in the TGP. By the intrinsic properties of the GP (Ghosal and Roy, 2006), when the number of eigenfunctions is sufficiently large, the proposed BSP-BSS model can be well approximated by a truncated model representation, for which we develop a computationally feasible posterior computation algorithm for large-scale voxel-level imaging data. The R package BSPBSS is freely available on the author's GitHub at https://github.com/benwu233/BSPBSS.
4.1. Prior Representation of Intensity Functions
Consider the eigendecomposition of the covariance kernel $\kappa(\xi, \xi') = \sum_{l=1}^{\infty} \lambda_l \psi_l(\xi)\psi_l(\xi')$, where $\{\lambda_l\}_{l=1}^{\infty}$ is the set of eigenvalues with λl ≥ λl+1 for l = 1, 2,…, and $\{\psi_l\}_{l=1}^{\infty}$ represents the set of eigenfunctions that satisfy $\int_{\mathcal{B}} \psi_l(\xi)\psi_{l'}(\xi)\, d\xi = I(l = l')$ for any l, l′ ∈ {1, 2,…}. Equivalently, $B_j(\xi)$ can be represented as a linear combination of the eigenfunctions, i.e. $B_j(\xi) = \sum_{l=1}^{\infty} b_{jl}\psi_l(\xi)$, where the $b_{jl} \sim \mathrm{N}(0, \lambda_l)$, j = 1, … , q, l = 1,…, are mutually independent. In practice, we can truncate the summation at a sufficiently large finite number of components L to obtain a fairly good approximation of Bj, i.e. $B_j(\xi) \approx \psi(\xi)^{\top} b_j$, where $b_j = (b_{j1}, \ldots, b_{jL})^{\top}$ and $\psi(\xi) = \{\psi_1(\xi), \ldots, \psi_L(\xi)\}^{\top}$. Since the imaging signals appear to be smooth in many brain regions, to achieve a good approximation, the required number of eigenfunctions L is still much smaller than the number of voxels Vn. Thus, with this approximation, the number of parameters for Bj can be reduced substantially, leading to a feasible posterior computation algorithm. Based on the truncated approximation of Bj, we introduce the truncated TGP prior representation of the intensity functions. Let B = (b1,… , bq); then we approximate sj(ξ) in (3) with
$s_j(\xi) \approx g_{\zeta}\{\psi(\xi)^{\top} b_j\}, \quad b_j \overset{\mathrm{iid}}{\sim} \mathrm{N}(0_L, \Lambda), \qquad (9)$
where Λ = diag{λ1,… , λL} and 0L is a vector of zeros of length L. We resort to numerical approximation to compute the integral in (3). When analyzing voxel-level imaging data, $\mathcal{B}_v$ represents the small cubic region of voxel v, and $S_j(\mathcal{B}_v)$ can be accurately approximated by $|\mathcal{B}_v|\, s_j(\xi_v)$, where ξv is the center location of voxel v.
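The following minimal sketch illustrates the truncated representation (9), using a numerical (Nyström-style) eigendecomposition of an assumed squared-exponential kernel on a 1D grid; the grid size, L, and ζ are illustrative choices, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(2)
V, L, zeta = 400, 50, 0.3

xi = np.linspace(0.0, 1.0, V)[:, None]               # voxel centers
K = np.exp(-0.5 * ((xi - xi.T) / 0.05) ** 2)         # assumed SE kernel
lam, psi = np.linalg.eigh(K)                         # eigenpairs (ascending)
lam, psi = lam[::-1][:L], psi[:, ::-1][:, :L]        # keep the top-L

b_j = rng.normal(size=L) * np.sqrt(np.clip(lam, 0.0, None))  # b_jl ~ N(0, lambda_l)
B_j = psi @ b_j                                      # approximate GP at the voxels
s_j = np.where(np.abs(B_j) > zeta, B_j, 0.0)         # thresholded intensity, as in (9)
```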
4.2. Markov Chain Monte Carlo
We develop a Metropolis-Hastings within Gibbs sampling algorithm to simulate from the posterior distribution of the proposed BSP-BSS model (2) with the prior approximation (9). For A and σ, the full conditional distributions have closed forms, leading to Gibbs sampling update schemes. The parameter ζ is updated with the Metropolis algorithm with random-walk proposals. Updating the parameter B is the most challenging step due to the high dimensionality and the complexity of the full conditional density function, which involves the hard thresholding function and is hence discontinuous. We propose a smooth approximation of the hard thresholding function and adopt the stochastic gradient Hamiltonian Monte Carlo (SGHMC) method proposed by Chen et al. (2014) to update B given the other parameters (Algorithm 1).
Algorithm 1:
SGHMC for updating B at the hth MCMC iteration
sample momentum: ν0 ~ N(0, ηI);
set B0 = B(h−1);
for t = 1,… , T do
update Bt = Bt−1 + νt−1;
sample ωt ~ N(0, 2αηI);
sample index subsets $\mathcal{I}_t \subset \{1, \ldots, n\}$ and $\mathcal{V}_t \subset \{1, \ldots, V_n\}$;
update $\nu_t = (1 - \alpha)\nu_{t-1} - \eta \nabla\tilde{U}(B_t) + \omega_t$;
end
set (B(h), ν(h)) = (BT, νT);
In Algorithm 1, given the samples of the other parameters at the hth iteration, i.e., A(h), σ(h) and ζ(h), we compute the stochastic gradient term with respect to B by subsampling both the indices of images and the indices of regions, i.e.,

$\nabla\tilde{U}(B) = -\frac{n V_n}{|\mathcal{I}_t||\mathcal{V}_t|} \sum_{i \in \mathcal{I}_t} \sum_{v \in \mathcal{V}_t} \nabla_B \log \pi(X_{iv} \mid A^{(h)}, B, \sigma^{(h)}, \zeta^{(h)}) - \nabla_B \log \pi(B),$

where π(·|A, B, σ, ζ) is the density function of the observed image intensity given all parameters and π(B) is the prior of B. The two index sets $\mathcal{I}_t$ and $\mathcal{V}_t$ are random subsets of the image indices $\{1, \ldots, n\}$ and the region indices $\{1, \ldots, V_n\}$, respectively. The number of leapfrog steps T, the learning rate η and the momentum term (1 − α) can be chosen according to the suggestions of Chen et al. (2014). The details of the MCMC sampling scheme are in the Supplementary Material.
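The following is a minimal sketch of one SGHMC update of B in the spirit of Algorithm 1 (Chen et al., 2014); `grad_U` is a placeholder for the stochastic gradient computed on subsampled images and regions, and the tuning values η, α, and T are illustrative, not the paper's settings.

```python
import numpy as np

def sghmc_update(B, grad_U, eta=1e-4, alpha=0.1, T=10, rng=None):
    """One MCMC update of B via T leapfrog-style SGHMC steps."""
    rng = rng or np.random.default_rng()
    nu = rng.normal(scale=np.sqrt(eta), size=B.shape)        # resample momentum
    for _ in range(T):
        B = B + nu                                           # position update
        noise = rng.normal(scale=np.sqrt(2 * alpha * eta), size=B.shape)
        nu = (1 - alpha) * nu - eta * grad_U(B) + noise      # friction + injected noise
    return B, nu
```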
5. Simulations
We conduct simulations to evaluate the performance of the proposed model under various scenarios. In Scenarios I and II, observed images are generated from a mixture of source signals with geometric spatial patterns with either sharp or smooth edges. In Scenario III, we generate fMRI-type data using the neuroimaging Matlab toolbox SimTB (Erhardt et al., 2012), where the spatial source signals and the temporal mixing matrix are generated based on real fMRI spatial signals and time series. We compare the proposed BSP-BSS with ICA implemented using the popular Infomax algorithm (Bell and Sejnowski, 1995). We evaluate the performance of the methods in recovering the underlying source signals as well as the mixing matrices from the observed mixed data. In applications, spatial signal estimates from standard ICA algorithms such as Infomax ICA are often thresholded to identify activated regions in each source signal (McKeown et al., 1998; Kiviniemi et al., 2003). Therefore, we also consider a thresholded Infomax ICA (see the sketch below). Specifically, the estimated spatial source signals from Infomax ICA are transformed to Z-scores (McKeown et al., 1998), and thresholded source signals are obtained by retaining only Z-scores with a magnitude greater than two. For BSP-BSS, we use the modified squared exponential kernel for the TGP prior of s; details can be found in Section 2.6 in the Supplementary Material. The MCMC sampling runs for 4,000 iterations with 2,000 burn-in. The results for all three scenarios suggest that BSP-BSS achieves much better performance than existing methods. Next, we present Scenarios I and II in detail, and include details for Scenario III in Section 3.2 in the Supplementary Material.
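A minimal sketch of the comparator's post hoc rule: z-score each estimated ICA spatial map and keep only entries with |z| ≥ 2. Here `S_ica` is a hypothetical q × V array of Infomax ICA source estimates.

```python
import numpy as np

def threshold_ica_maps(S_ica, cutoff=2.0):
    """Z-score each row (spatial map) and zero out sub-threshold entries."""
    z = (S_ica - S_ica.mean(axis=1, keepdims=True)) / S_ica.std(axis=1, keepdims=True)
    return np.where(np.abs(z) >= cutoff, z, 0.0)
```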
5.1. Scenario I: Geometric Source Signals with Sharp Edges
We first consider a linear mixture of three latent sources with 2D geometric patterns with sharp edges. We generate three 30×30 binary source images as the true latent sources s*, in which the activated regions have planar geometric shapes (square, circle, and triangle) with sharp edges. We set the sample size to n = 30. Each column of the true mixing matrix A* is generated from the vMF prior with concentration parameter η = 0. We consider three cases with different noise levels (high, medium and low) by setting the noise variance σ*² to 1 × 10−3, 5 × 10−4 and 1 × 10−4, respectively. The corresponding average signal-to-noise ratios are around 0.41, 0.83 and 4.34, respectively.
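The following minimal sketch generates Scenario-I-style data: three binary 30 × 30 sources with geometric shapes, mixed with uniformly drawn (η = 0) scaled columns plus Gaussian noise; the shape placements and sizes are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(3)
gx, gy = np.meshgrid(np.arange(30), np.arange(30))

square = (gx > 3) & (gx < 12) & (gy > 3) & (gy < 12)
circle = (gx - 22) ** 2 + (gy - 22) ** 2 < 36
triangle = (gy > 17) & (gx < 12) & (gy - gx > 10)

S = np.stack([square, circle, triangle]).reshape(3, -1).astype(float)
n = 30
A = rng.normal(size=(n, 3))
A *= np.sqrt(n) / np.linalg.norm(A, axis=0)          # eta = 0 vMF columns, norm sqrt(n)
X = A @ S + rng.normal(scale=np.sqrt(1e-3), size=(n, S.shape[1]))
```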
5.2. Scenario II: Geometric Source Signals with Smooth Edges
Then we smooth the latent sources in Scenario I to generate spatial source signals with smooth edges, leading to a more challenging scenario. Specifically, we first generate binary images as in Scenario I and then replace the value on each pixel with the average over its neighbors to smooth the source signals, leading to smooth edges for the activated regions. Other simulation settings remain the same as in Scenario I. The smoothing reduces the signal-to-noise ratios of observations in high, medium, and low noise levels to 0.18, 0.35, and 1.76 respectively, therefore making the estimation more difficult. Figure 1 shows the true latent sources in both Scenarios I and II.
Fig. 1.
True spatial source signal intensities in Scenarios I and II.
Table 2 compares BSP-BSS and ICA over 100 simulation runs in terms of their performance as binary classifiers for separating activated and non-activated regions. Specifically, we compare the means and standard deviations of their positive and negative predictive values (PPV and NPV), sensitivity, and specificity. In Scenario I with sharp edges, both methods show high PPV, NPV, and specificity, with BSP-BSS demonstrating much higher sensitivity. This shows that BSP-BSS has higher statistical power than standard ICA in detecting activated regions with sharp edges while maintaining a similar false positive rate. Scenario II is a more challenging case for both methods as binary classifiers, since the activated regions have smooth edges and the signal-to-noise ratios are lower than in Scenario I. Consequently, we find some reduction in classification accuracy in Scenario II for both methods compared with Scenario I. However, BSP-BSS performs much better than ICA in Scenario II, especially in terms of NPV and sensitivity. In this challenging scenario, BSP-BSS still maintains a sensitivity of 0.5–0.7 across the noise levels, while the sensitivity of ICA drops dramatically to 0.09–0.13. These results indicate that BSP-BSS's advantage in statistical power for detecting activated regions is even more pronounced when the activation regions have smooth edges.
Table 2.
Selection accuracy of the activated regions for BSP-BSS and ICA in both simulation scenarios, shown as the mean (standard deviation) over 100 simulation runs. The ICA source signals are extracted using Infomax, transformed to Z-scores, and thresholded at |z| ≥ 2. All values are multiplied by 10³. Columns labeled (I) correspond to Scenario I (sharp edge) and columns labeled (II) to Scenario II (smooth edge).
| Noise | Method | PPV (I) | NPV (I) | Sens. (I) | Spec. (I) | PPV (II) | NPV (II) | Sens. (II) | Spec. (II) |
|---|---|---|---|---|---|---|---|---|---|
| Low | BSP-BSS | 995 (4) | 998 (1) | 988 (4) | 999 (1) | 998 (4) | 782 (13) | 701 (23) | 999 (3) |
| Low | ICA | 999 (10) | 974 (4) | 841 (27) | 1000 (1) | 1000 (1) | 552 (1) | 130 (4) | 1000 (0) |
| Medium | BSP-BSS | 996 (4) | 997 (1) | 983 (4) | 999 (1) | 999 (2) | 724 (11) | 591 (22) | 999 (1) |
| Medium | ICA | 998 (16) | 952 (4) | 704 (25) | 1000 (2) | 969 (15) | 546 (1) | 109 (5) | 997 (2) |
| High | BSP-BSS | 995 (4) | 996 (1) | 979 (5) | 999 (1) | 999 (4) | 689 (12) | 515 (26) | 999 (2) |
| High | ICA | 993 (5) | 934 (3) | 586 (21) | 999 (0) | 872 (58) | 538 (3) | 90 (9) | 988 (5) |
We also compare the accuracy of BSP-BSS and ICA in estimating the mixing matrix by evaluating the Amari error (Amari et al., 1996). The Amari error, a measure of the difference between two nonsingular matrices, has been used in ICA studies (Bach and Jordan, 2002; Chen and Bickel, 2006; Lee et al., 2011) as a convergence criterion for ICA algorithms and as a measure of accuracy between the true and estimated mixing matrices. The Amari error was originally defined for invertible square matrices, since the mixing matrix of many traditional ICA algorithms is square due to the dimension reduction and whitening pre-processing steps often conducted prior to ICA. In our BSP-BSS, however, such pre-processing steps are not required and the mixing matrix is not necessarily square. Therefore, we extend the Amari error to non-square matrices by considering the generalized inverse of A*. Specifically, we define the extended Amari error between A and A* as:
$d(A, A^*) = \frac{1}{2q}\sum_{i=1}^{q}\left(\frac{\sum_{j=1}^{q}|p_{ij}|}{\max_{j}|p_{ij}|} - 1\right) + \frac{1}{2q}\sum_{j=1}^{q}\left(\frac{\sum_{i=1}^{q}|p_{ij}|}{\max_{i}|p_{ij}|} - 1\right),$

where $(p_{ij}) = P = (A^*)^{+}A$ and $(A^*)^{+}$ denotes the Moore–Penrose generalized inverse of A*. Figure 2 compares the Amari errors of the estimated mixing matrices based on BSP-BSS and ICA over 100 simulation runs for both Scenarios I and II with boxplots. The results show that BSP-BSS has a much smaller Amari error than ICA. In summary, the comparisons between the proposed BSP-BSS and ICA show that our method can potentially provide a more powerful and accurate tool for blind source separation, especially in the presence of smooth source signals.
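A minimal sketch of an Amari-type discrepancy that allows non-square mixing matrices via the Moore–Penrose inverse; the normalization below follows the standard Amari index and may differ slightly from the paper's exact definition.

```python
import numpy as np

def amari_error(A_hat, A_true):
    """Amari-type error between n x q matrices via the generalized inverse."""
    P = np.abs(np.linalg.pinv(A_true) @ A_hat)       # q x q cross-talk matrix
    q = P.shape[0]
    row = (P.sum(axis=1) / P.max(axis=1) - 1.0).sum()
    col = (P.sum(axis=0) / P.max(axis=0) - 1.0).sum()
    return (row + col) / (2.0 * q * (q - 1.0))
```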
Fig. 2.
Amari errors of the estimated mixing coefficients with BSP-BSS and ICA (implemented using Infomax) over 100 simulation runs for simulation Scenarios I and II.
5.3. Selection of the Number of Latent Sources
As discussed in Section 2.3, the BSP-BSS model automatically selects the effective number of latent sources. To evaluate its accuracy, we generate observations as in Scenarios I and II above at the three noise levels, with the true effective number of latent sources qeff = 3. We decompose the observed data with the proposed BSP-BSS model with overspecified numbers of latent sources q = 5, 7, 9. To handle this challenging task, we fit the BSP-BSS model using a longer MCMC chain with 6,000 iterations and 3,000 burn-in, and assign the prior of the thresholding parameter ζ an upper bound such that π(ζ) ∝ I(0 < ζ < Q(s)) to improve convergence. Here, we take Q(s) to be the 95% quantile of the absolute values of all the sj(ξ). Then we estimate the effective number of latent sources using the proposed estimator in (7), which implies each source should have non-zero values on at least one pixel. The results in Table 3 show that the proposed estimator correctly identifies the true number of latent sources qeff = 3 in almost all of the simulation runs, even when the noise level is high and q is significantly overspecified in the BSP-BSS model. For the few simulation runs with incorrect results, the estimated effective number of latent sources is 4, which is very close to the true value. These results suggest that a promising strategy for identifying the effective number of latent sources is to specify a relatively large number of sources q in the proposed BSP-BSS model and then use the proposed estimator to identify the effective number of sources. This automatic selection strategy is very useful in real data applications, where we usually do not know the "true" number of sources prior to conducting the blind source separation.
Table 3.
The frequency of correct selection of the number of latent sources by BSP-BSS over 100 simulation runs. The observations are generated as in simulation Scenarios I (sharp edge, columns labeled (I)) and II (smooth edge, columns labeled (II)). The true number of source signals is qeff = 3. We fit the BSP-BSS model with q = 5, 7, or 9.
| Noise | q = 5 (I) | q = 7 (I) | q = 9 (I) | q = 5 (II) | q = 7 (II) | q = 9 (II) |
|---|---|---|---|---|---|---|
| Low | 100 | 99 | 100 | 100 | 100 | 94 |
| Medium | 100 | 100 | 100 | 100 | 100 | 99 |
| High | 100 | 100 | 100 | 98 | 100 | 100 |
6. Analysis of ABIDE Data
We apply our method to analyze the multi-subject resting-state fMRI (rs-fMRI) data from the Autism Brain Imaging Data Exchange (ABIDE) study (Di Martino et al., 2014) to investigate the differences in brain functional networks between patients with autism and healthy subjects.
6.1. Data Description
ABIDE collected functional and structural brain imaging data from laboratories around the world, aiming to accelerate the understanding of the neural bases of autism. We apply the proposed BSP-BSS to analyze the ABIDE I preprocessed data (Craddock et al., 2013). ABIDE I, released in August 2012, is a collaboration of 16 international imaging sites that have aggregated and are openly sharing neuroimaging data from 539 individuals (ages 7–64 years) with autism spectrum disorder (ASD) and 573 age-matched typical controls (Di Martino et al., 2014). We use the rs-fMRI data from ABIDE I preprocessed with the Configurable Pipeline for the Analysis of Connectomes (CPAC, http://fcp-indi.github.com). The preprocessing steps start with basic processing, including dropping the first several volumes, slice timing correction, motion realignment, and intensity normalization. Then nuisance variable regression is performed to remove confounding variation due to physiological processes (heartbeat and respiration), head motion, and low-frequency scanner drifts. Band-pass filtering is applied after the regression to retain frequencies between 0.01 Hz and 0.1 Hz. All the images are registered to the MNI standard brain space. Our analysis focuses on an imaging statistic, the weighted degree centrality (WDC), defined for each node as the sum of the weights of the edges connecting it to all the other nodes. WDC provides a useful measure of brain intrinsic connectivity networks (Zuo et al., 2012). It has been widely adopted to identify "functional hubs" and study the topology of these hub structures (Fransson et al., 2011; Langer et al., 2012; Li et al., 2016). In addition, WDC is one of the most widely used measures summarizing information in the functional connectivity at the voxel level (Zuo et al., 2012). A challenge in analyzing WDC is that the data may involve complex spatial dependence among voxels. To address this issue, we apply the proposed BSP-BSS method to analyze the WDC data. Specifically, using the ABIDE data preprocessing pipeline (Craddock et al., 2013), we obtain WDC data from unsmoothed preprocessed rs-fMRI data registered to the MNI152 (Grabner et al., 2006) 3mm space. After removing missing values, the data consist of 882 subjects, with 407 ASD patients and 475 healthy controls.
6.2. Results
We apply both the proposed BSP-BSS and ICA to the data described above. The modified squared exponential kernel is applied for the TGP prior of s, as in the simulation study (see details in Section 2.6 in the Supplementary Material). The number of eigenfunctions in the approximate representation is specified as 500, which we find is sufficiently large to capture the characteristics of the data. The number of latent sources is specified as 30. We run the MCMC algorithm for 30,000 iterations with 15,000 burn-in and thin the chain after burn-in to obtain 750 samples. Gelman and Rubin's convergence diagnostic (Gelman and Rubin, 1992) is used to evaluate the convergence of the algorithm. Five zero sources are found among the 30 sources estimated with the model, which implies q = 30 is sufficient to capture the effective number of latent sources. The activation regions are estimated as in (8), by thresholding the PIP at 0.5. Among the latent sources extracted by BSP-BSS, we identify several well-established brain functional networks. Figure 3 compares the functional networks identified by BSP-BSS and ICA, including the medial parietal cortex (MPC), bilateral inferior-lateral-parietal cortex (ilPC), and ventromedial prefrontal cortex (vmPFC), which are known as subregions of the default mode network (DMN). The ICA source signals are shown in Z-scores, and the BSP-BSS source signals are rescaled to [0, 1]. From Figure 3, we find that the spatial source signals estimated by BSP-BSS align better with the spatial distribution of the well-known functional networks reported in the neuroimaging literature (Smith et al., 2009; Iraji et al., 2019). For example, BSP-BSS has successfully identified key regions in the networks. In comparison, ICA has poor spatial coverage in some of the key regions, such as the vmPFC of the DMN, and encounters cross-talk between the MPC and ilPC. Our findings from the real data application are consistent with the simulation results: BSP-BSS has better statistical power than ICA in detecting regions of relevance in brain networks.
Fig. 3.
Common brain functional networks recovered by ICA and BSP-BSS with ABIDE data. The ICA source signals are extracted using Infomax, transformed to Z-scores and thresholded with |z|≥ 2. The BSP-BSS source signals are rescaled to [0, 1].
The default mode network dysfunction has often been reported in ASD studies (Cherkassky et al., 2006; Monk et al., 2009; Lynch et al., 2013; Uddin et al., 2013). In our study, we investigate the difference in the mixing coefficients of DMN subregions between the ASD patient group and the healthy group for both BSP-BSS and ICA. We perform the Wilcoxon test on the mixing coefficients to examine between-group differences. The p-values of the three subregions, i.e., MPC, ilPC and vmPFC, based on the Wilcoxon test are 0.0489, 0.9581, and 0.0138, respectively, for BSP-BSS and 0.0626, 0.7347, and 0.0003, respectively, for ICA. Therefore, both BSP-BSS and ICA find a significant difference in vmPFC between ASD patients and healthy controls, but BSP-BSS demonstrates better power in detecting the difference in MPC as compared to ICA.
7. Discussion
In this paper, we propose a new Bayesian spatial blind source separation method (BSP-BSS) to make inferences on sparse and spatially dependent source signals. We show that BSP-BSS has desirable theoretical properties, including model identifiability, large support of the priors, posterior consistency of the model parameters and selection consistency of the effective number of latent sources. The proposed BSP-BSS has two main advantages over existing BSS methods. First, thanks to the sparse prior, BSP-BSS can automatically identify activated regions in the sources. In comparison, most existing ICA methods generate non-sparse source estimates and require an additional thresholding procedure to identify activated regions, which is typically ad hoc and lacks theoretical justification. The sparsity also guides BSP-BSS to select the effective number of latent sources, providing a solution to a major challenge for traditional algorithms. Secondly, BSP-BSS directly accounts for spatial dependence within each source signal and leads to better recovery of sources with spatial patterns, which are very common in neuroimaging applications. In our model, we assume the noise terms are spatially independent. This implies that major spatially structured signals of brain activation are captured by the extracted source signals in the model, whereas the model residuals mainly reflect random residual variation in the imaging data after accounting for the spatially structured source signals. This is a commonly adopted assumption in blind source separation for brain imaging (Beckmann and Smith, 2004, 2005; Guo, 2011; Shi and Guo, 2016; Wang and Guo, 2019). In our experience, the proposed method still works well when there is mild to moderate spatial dependence in the residuals, as long as the spatial dependence in the source signals is stronger than that in the noise. We have included simulation results demonstrating that our model performance remains strong in the presence of spatially dependent noise (Section 4.2 in the Supplementary Material).
The simulation studies show that BSP-BSS significantly outperforms ICA when the true latent sources have spatially clustered activated regions. The application to fMRI data also demonstrates the advantage of our method: evaluating the proposed method on the multi-subject rs-fMRI data from the ABIDE dataset, BSP-BSS finds a significant difference in the mixing coefficients for a DMN subregion between the healthy and patient groups, while ICA does not. To the best of our knowledge, BSP-BSS is the first BSS method that simultaneously addresses the smoothness and sparsity of neuroimaging latent source signals. Potentially, our proposed method can be extended to multi-modality brain image analysis under the assumption that different modalities share some common spatial traits in the latent sources. Another possible future direction is extending the spatial model to a spatial-temporal model to handle longitudinal imaging data, which is also a very challenging topic in the neuroimaging field.
Acknowledgments
We are grateful to the associate editor and three anonymous reviewers for their valuable comments. This work was partially supported by the NIH grants R01MH105561 (Guo, Kang and Wu), R01GM124061 (Kang), R01DA048993 (Kang), R01MH118771 (Guo) and R01MH120299 (Guo).
Footnotes
Supplementary Material
Proofs of theoretical properties, details of the posterior computation, additional simulation results and sensitivity analysis are included in the Supplementary Material.
References
- Amari S. i., Cichocki A, and Yang HH (1996), “A New Learning Algorithm for Blind Signal Separation,” in Advances in Neural Information Processing Systems, pp. 757–763. [Google Scholar]
- Bach FR and Jordan MI (2002), “Kernel Independent Component Analysis,” Journal of Machine Learning Research, 3, 1–48. [Google Scholar]
- Banerjee S, Gelfand AE, Finley AO, and Sang H (2008), “Gaussian Predictive Process Models for Large Spatial Data Sets,” Journal of the Royal Statistical Society, Series B, 70, 825–848. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Barbieri MM and Berger JO (2004), “Optimal Predictive Model Selection,” The Annals of Statistics, 32, 870–897. [Google Scholar]
- Beckmann CF and Smith SM (2004), “Probabilistic Independent Component Analysis for Functional Magnetic Resonance Imaging,” IEEE Transactions on Medical Imaging, 23, 137–152. [DOI] [PubMed] [Google Scholar]
- — (2005), “Tensorial Extensions of Independent Component Analysis for Multisubject fMRI Analysis,” NeuroImage, 25, 294–311. [DOI] [PubMed] [Google Scholar]
- Bell AJ and Sejnowski TJ (1995), “An Information-Maximization Approach to Blind Separation and Blind Deconvolution,” Neural Computation, 7, 1129–1159. [DOI] [PubMed] [Google Scholar]
- Bhattacharya A and Dunson DB (2011), “Sparse Bayesian Infinite Factor Models,” Biometrika, 98, 291–306. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Biswal BB and Ulmer JL (1999), “Blind Source Separation of Multiple Signal Sources of fMRI Data Sets Using Independent Component Analysis,” Journal of Computer Assisted Tomography, 23, 265–271. [DOI] [PubMed] [Google Scholar]
- Boehm Vock LF, Reich BJ, Fuentes M, and Dominici F (2015), “Spatial Variable Selection Methods for Investigating Acute Health Effects of Fine Particulate Matter Components,” Biometrics, 71, 167–177. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Cai Q, Kang J, and Yu T (2020), “Bayesian Network Marker Selection via the Thresholded Graph Laplacian Gaussian Prior,” Bayesian Analysis, 15, 79–102. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Calhoun VD, Adali T, Pearlson GD, and Pekar JJ (2001), “A Method for Making Group Inferences from Functional MRI Data Using Independent Component Analysis,” Human Brain Mapping, 14, 140–151. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Cannarsa P and D’Aprile T (2015), “Signed Measures,” in Introduction to Measure Theory and Functional Analysis, Springer, pp. 253–270. [Google Scholar]
- Chen A and Bickel PJ (2006), “Efficient Independent Component Analysis,” The Annals of Statistics, 34, 2825–2855. [Google Scholar]
- Chen T, Fox E, and Guestrin C (2014), “Stochastic Gradient Hamiltonian Monte Carlo,” in International Conference on Machine Learning, pp. 1683–1691. [Google Scholar]
- Cherkassky VL, Kana RK, Keller TA, and Just MA (2006), “Functional Connectivity in a Baseline Resting-State Network in Autism,” Neuroreport, 17, 1687–1690. [DOI] [PubMed] [Google Scholar]
- Choudhuri N, Ghosal S, and Roy A (2004), “Bayesian Estimation of the Spectral Density of a Time Series,” Journal of the American Statistical Association, 99, 1050–1059. [Google Scholar]
- Craddock C, Benhajali Y, Chu C, Chouinard F, Evans A, Jakab A, Khundrakpam BS, Lewis JD, Li Q, Milham M, Yan C, and Bellec P (2013), “The Neuro Bureau Preprocessing Initiative: Open Sharing of Preprocessed Neuroimaging Data and Derivatives,” in Neuroinformatics 2013, Stockholm, Sweden. [Google Scholar]
- Derado G, Bowman FD, and Kilts CD (2010), “Modeling the Spatial and Temporal Dependence in fMRI Data,” Biometrics, 66, 949–957. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Di Martino A, Yan CG, Li Q, Denio E, Castellanos FX, Alaerts K, Anderson JS, Assaf M, Bookheimer SY, Dapretto M, Deen B, Delmonte S, Dinstein I, Ertl-Wagner B, Fair DA, Gallagher L, Kennedy DP, Keown CL, Keysers C, Lainhart JE, Lord C, Luna B, Menon V, Minshew NJ, Monk CS, Mueller S, Müller RA, Nebel MB, Nigg JT, O’ Hearn K, Pelphrey KA, Peltier SJ, Rudie JD, Sunaert S, Thioux M, Tyszka JM, Uddin LQ, Verhoeven JS, Wenderoth N, Wiggins JL, Mostofsky SH, and Milham MP (2014), “The Autism brain Imaging Data Exchange: towards a Large-scale Evaluation of the Intrinsic Brain Architecture in Autism,” Molecular Psychiatry, 19, 659. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Erhardt EB, Allen EA, Wei Y, Eichele T, and Calhoun VD (2012), “SimTB, a Simulation Toolbox for fMRI Data under a Model of Spatiotemporal Separability,” NeuroImage, 59, 4160–4167.
- Fevotte C and Godsill SJ (2006), “A Bayesian Approach for Blind Separation of Sparse Sources,” IEEE Transactions on Audio, Speech, and Language Processing, 14, 2174–2188.
- Fisher RA (1953), “Dispersion on a Sphere,” Proceedings of the Royal Society of London. Series A. Mathematical and Physical Sciences, 217, 295–305.
- Fransson P, Åden U, Blennow M, and Lagercrantz H (2011), “The Functional Architecture of the Infant Brain as Revealed by Resting-State fMRI,” Cerebral Cortex, 21, 145–154.
- Gelfand AE, Banerjee S, Sirmans C, Tu Y, and Ong SE (2007), “Multilevel Modeling Using Spatial Processes: Application to the Singapore Housing Market,” Computational Statistics & Data Analysis, 51, 3567–3579.
- Gelfand AE, Schmidt AM, Banerjee S, and Sirmans C (2004), “Nonstationary Multivariate Process Modeling through Spatially Varying Coregionalization,” Test, 13, 263–312.
- Gelman A and Rubin DB (1992), “Inference from Iterative Simulation Using Multiple Sequences,” Statistical Science, 7, 457–472.
- Ghosal S and Roy A (2006), “Posterior Consistency of Gaussian Process Prior for Nonparametric Binary Regression,” The Annals of Statistics, 34, 2413–2429.
- Grabner G, Janke AL, Budge MM, Smith D, Pruessner J, and Collins DL (2006), “Symmetric Atlasing and Model-Based Segmentation: An Application to the Hippocampus in Older Adults,” in International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, pp. 58–66.
- Griffanti L, Douaud G, Bijsterbosch J, Evangelisti S, Alfaro-Almagro F, Glasser MF, Duff EP, Fitzgibbon S, Westphal R, Carone D, Beckmann CF, and Smith SM (2017), “Hand Classification of fMRI ICA Noise Components,” NeuroImage, 154, 188–205.
- Guhaniyogi R, Finley AO, Banerjee S, and Kobe RK (2013), “Modeling Complex Spatial Dependencies: Low-Rank Spatially Varying Cross-Covariances with Application to Soil Nutrient Data,” Journal of Agricultural, Biological, and Environmental Statistics, 18, 274–298.
- Guo C, Kang J, and Johnson TD (2022), “A Spatial Bayesian Latent Factor Model for Image-on-Image Regression,” Biometrics, 78, 72–84.
- Guo Y (2011), “A General Probabilistic Model for Group Independent Component Analysis and Its Estimation Methods,” Biometrics, 67, 1532–1542.
- Hyun JW, Li Y, Gilmore JH, Lu Z, Styner M, and Zhu H (2014), “SGPP: Spatial Gaussian Predictive Process Models for Neuroimaging Data,” NeuroImage, 89, 70–80.
- Hyun JW, Li Y, Huang C, Styner M, Lin W, Zhu H, and the Alzheimer’s Disease Neuroimaging Initiative (2016), “STGP: Spatio-Temporal Gaussian Process Models for Longitudinal Neuroimaging Data,” NeuroImage, 134, 550–562.
- Hyvärinen A and Oja E (2000), “Independent Component Analysis: Algorithms and Applications,” Neural Networks, 13, 411–430.
- Iraji A, Deramus TP, Lewis N, Yaesoubi M, Stephen JM, Erhardt E, Belger A, Ford JM, McEwen S, Mathalon DH, Mueller BA, Pearlson GD, Potkin SG, Preda A, Turner JA, Vaidya JG, van Erp TG, and Calhoun VD (2019), “The Spatial Chronnectome Reveals a Dynamic Interplay between Functional Segregation and Integration,” Human Brain Mapping, 40, 3058–3077.
- Kallenberg O (2017), Random Measures, Theory and Applications, Springer.
- Kang J, Reich BJ, and Staicu A-M (2018), “Scalar-on-Image Regression via the Soft-Thresholded Gaussian Process,” Biometrika, 105, 165–184.
- Kiviniemi V, Kantola J-H, Jauhiainen J, Hyvärinen A, and Tervonen O (2003), “Independent Component Analysis of Nondeterministic fMRI Signal Sources,” NeuroImage, 19, 253–260.
- Knowles D and Ghahramani Z (2007), “Infinite Sparse Factor Analysis and Infinite Independent Components Analysis,” in International Conference on Independent Component Analysis and Signal Separation, Springer, pp. 381–388.
- Langer N, Pedroni A, Gianotti LR, Hänggi J, Knoch D, and Jäncke L (2012), “Functional Brain Network Efficiency Predicts Intelligence,” Human Brain Mapping, 33, 1393–1406.
- Lee S, Shen H, Truong Y, Lewis M, and Huang X (2011), “Independent Component Analysis Involving Autocorrelated Sources with an Application to Functional Magnetic Resonance Imaging,” Journal of the American Statistical Association, 106, 1009–1024.
- Li S, Ma X, Huang R, Li M, Tian J, Wen H, Lin C, Wang T, Zhan W, Fang J, et al. (2016), “Abnormal Degree Centrality in Neurologically Asymptomatic Patients with End-Stage Renal Disease: a Resting-State fMRI Study,” Clinical Neurophysiology, 127, 602–609.
- Li Y-O, Adali T, and Calhoun VD (2006), “Sample Dependence Correction for Order Selection in fMRI Analysis,” in 3rd IEEE International Symposium on Biomedical Imaging: Nano to Macro, IEEE, pp. 1072–1075.
- Li Y-O, Adali T, and Calhoun VD (2007), “Estimating the Number of Independent Components for Functional Magnetic Resonance Imaging Data,” Human Brain Mapping, 28, 1251–1266.
- Lynch CJ, Uddin LQ, Supekar K, Khouzam A, Phillips J, and Menon V (2013), “Default Mode Network in Childhood Autism: Posteromedial Cortex Heterogeneity and Relationship with Social Deficits,” Biological Psychiatry, 74, 212–219.
- Ma Y and Liu JS (2021), “On Posterior Consistency of Bayesian Factor Models in High Dimensions,” Bayesian Analysis.
- Marquand A, Howard M, Brammer M, Chu C, Coen S, and Mourão-Miranda J (2010), “Quantitative Prediction of Subjective Pain Intensity from Whole-Brain fMRI Data Using Gaussian Processes,” NeuroImage, 49, 2178–2189.
- McKeown MJ, Makeig S, Brown GG, Jung T-P, Kindermann SS, Bell AJ, and Sejnowski TJ (1998), “Analysis of fMRI Data by Blind Separation into Independent Spatial Components,” Human Brain Mapping, 6, 160–188.
- Minka TP (2001), “Automatic Choice of Dimensionality for PCA,” in Advances in Neural Information Processing Systems, pp. 598–604.
- Mohammad-Djafari A (2012), “Bayesian Approach with Prior Models which Enforce Sparsity in Signal and Image Processing,” EURASIP Journal on Advances in Signal Processing, 2012, 52.
- Monk CS, Peltier SJ, Wiggins JL, Weng S-J, Carrasco M, Risi S, and Lord C (2009), “Abnormalities of Intrinsic Functional Connectivity in Autism Spectrum Disorders,” NeuroImage, 47, 764–772.
- Montagna S, Wager T, Barrett LF, Johnson TD, and Nichols TE (2018), “Spatial Bayesian Latent Factor Regression Modeling of Coordinate-Based Meta-Analysis Data,” Biometrics, 74, 342–353.
- Morris JS, Brown PJ, Herrick RC, Baggerly KA, and Coombes KR (2008), “Bayesian Analysis of Mass Spectrometry Proteomic Data Using Wavelet-Based Functional Mixed Models,” Biometrics, 64, 479–489.
- Nakajima J and West M (2013a), “Bayesian Analysis of Latent Threshold Dynamic Models,” Journal of Business & Economic Statistics, 31, 151–164.
- — (2013b), “Bayesian Dynamic Factor Models: Latent Threshold Approach,” Journal of Financial Econometrics, 11, 116–153.
- Ni Y, Stingo FC, and Baladandayuthapani V (2019), “Bayesian Graphical Regression,” Journal of the American Statistical Association, 114, 184–197.
- Nychka D, Bandyopadhyay S, Hammerling D, Lindgren F, and Sain S (2015), “A Multiresolution Gaussian Process Model for the Analysis of Large Spatial Datasets,” Journal of Computational and Graphical Statistics, 24, 579–599.
- Power JD, Cohen AL, Nelson SM, Wig GS, Barnes KA, Church JA, Vogel AC, Laumann TO, Miezin FM, Schlaggar BL, et al. (2011), “Functional Network Organization of the Human Brain,” Neuron, 72, 665–678.
- Rasmussen CE (2003), “Gaussian Processes in Machine Learning,” in Summer School on Machine Learning, Springer, pp. 63–71.
- Ren Q and Banerjee S (2013), “Hierarchical Factor Models for Large Spatially Misaligned Data: A Low-Rank Predictive Process Approach,” Biometrics, 69, 19–30.
- Rowe DB (2002), “A Bayesian Approach to Blind Source Separation,” Journal of Interdisciplinary Mathematics, 5, 49–76.
- Samarov A and Tsybakov A (2004), “Nonparametric Independent Component Analysis,” Bernoulli, 10, 565–582.
- Samworth RJ and Yuan M (2012), “Independent Component Analysis via Nonparametric Maximum Likelihood Estimation,” The Annals of Statistics, 40, 2973–3002.
- Shen W, Ning J, and Yuan Y (2016), “Rate-Adaptive Bayesian Independent Component Analysis,” Electronic Journal of Statistics, 10, 3247–3264.
- Shi R and Guo Y (2016), “Investigating Differences in Brain Functional Networks Using Hierarchical Covariate-Adjusted Independent Component Analysis,” The Annals of Applied Statistics, 10, 1930–1957.
- Smith SM, Fox PT, Miller KL, Glahn DC, Fox PM, Mackay CE, Filippini N, Watkins KE, Toro R, Laird AR, and Beckmann CF (2009), “Correspondence of the Brain’s Functional Architecture during Activation and Rest,” Proceedings of the National Academy of Sciences, 106, 13040–13045.
- Smith SM and Nichols TE (2018), “Statistical Challenges in ‘Big Data’ Human Neuroimaging,” Neuron, 97, 263–268.
- Uddin LQ, Supekar K, and Menon V (2013), “Reconceptualizing Functional Brain Connectivity in Autism from a Developmental Perspective,” Frontiers in Human Neuroscience, 7, 458.
- Wang F and Wall MM (2003), “Generalized Common Spatial Factor Model,” Biostatistics, 4, 569–582.
- Wang Y and Guo Y (2019), “A Hierarchical Independent Component Analysis Model for Longitudinal Neuroimaging Studies,” NeuroImage, 189, 380–400.
- Watson GS (1982), “Distributions on the Circle and Sphere,” Journal of Applied Probability, 19, 265–280.
- Zayyani H, Babaie-Zadeh M, and Jutten C (2009), “An Iterative Bayesian Algorithm for Sparse Component Analysis in Presence of Noise,” IEEE Transactions on Signal Processing, 57, 4378–4390.
- Zhang L and Banerjee S (2022), “Spatial Factor Modeling: A Bayesian Matrix-Normal Approach for Misaligned Data,” Biometrics, 78, 560–573.
- Zuo X-N, Ehmke R, Mennes M, Imperati D, Castellanos FX, Sporns O, and Milham MP (2012), “Network Centrality in the Human Functional Connectome,” Cerebral Cortex, 22, 1862–1875.