SUMMARY
Independent component analysis (ICA) has become an important tool for analyzing data from functional magnetic resonance imaging (fMRI) studies. ICA has been successfully applied to single-subject fMRI data. The extension of ICA to group inferences in neuroimaging studies, however, is challenging because there is no pre-specified group design matrix and because of the uncertainty in between-subjects variability in fMRI data. We present a general probabilistic ICA (PICA) model that can accommodate varying group structures of multi-subject spatio-temporal processes. An advantage of the proposed model is that it can flexibly model various types of group structures for different underlying neural source signals and under different experimental conditions in fMRI studies. A maximum likelihood method is used to estimate this general group ICA model, and we propose two EM algorithms to obtain the ML estimates. The first is an exact EM algorithm, which provides an exact E-step and an explicit noniterative M-step. The second is a variational approximation EM algorithm, which is computationally more efficient than the exact EM. In simulation studies, we first compare the performance of the proposed general group PICA model with that of an existing probabilistic group ICA approach. We then compare the two proposed EM algorithms and show that the variational approximation EM achieves comparable accuracy to the exact EM with significantly less computation time. An fMRI data example is used to illustrate application of the proposed methods.
Keywords: Independent component analysis, Multi-subject imaging data, Functional magnetic resonance imaging (fMRI), Group inferences, Maximum likelihood estimation, EM algorithm, Variational approximation
1. Introduction
Functional magnetic resonance imaging (fMRI) is a powerful non-invasive tool for studying behavior-, clinical- and cognitive-related neural activity. Observed fMRI data represent the combination of spatio-temporal processes from various underlying neural signals, such as signals generated in response to experimental stimuli and signals that regulate physiological functions (e.g., breathing and heartbeat). A major goal in fMRI analysis is to distinguish and characterize the spatial distribution and temporal dynamics of these source signals. Independent component analysis (ICA) is a computational technique that has been widely applied in fMRI studies to achieve this goal. As a special case of blind source separation, ICA separates observed data into additive components that are statistically as independent as possible. In fMRI analysis, ICA characterizes the spatio-temporal processes of various brain source signals by decomposing observed fMRI data into the outer product of spatial maps and associated time courses, where statistical independence is typically assumed for the spatial maps of different signals. Compared to conventional analysis tools such as the general linear model (GLM), a key advantage of ICA is that it is data-driven and does not rely on an a priori model of brain activity. ICA is therefore a very useful exploratory tool for analyzing cognitive paradigms or source signals for which prior knowledge of the expected brain temporal response is not available.
ICA has been successfully applied to single-subject fMRI analysis (McKeown et al., 1998; Beckmann and Smith, 2004). The extension of ICA to group inferences is not straightforward because ICA does not have a pre-specified design matrix (Calhoun et al., 2003). Several methods have been proposed to perform group ICA analysis on multi-subject fMRI data (Calhoun et al., 2001; Beckmann and Smith, 2005; Svensén et al., 2002; Schmithorst and Holland, 2002). In particular, GIFT (Calhoun et al., 2001) and tensor PICA (Beckmann and Smith, 2005) are the most frequently used. These two approaches decompose observed fMRI data based on a specific group structure of multi-subject spatio-temporal processes (Guo and Pagnoni, 2008). Recently, Guo and Pagnoni (2008) proposed a unified framework for group ICA that can accommodate various types of group structures and that includes the GIFT and tensor PICA models as special cases.
A major limitation of existing group ICA models is that the same group structure is assumed for all independent components. Previous studies have shown that ICA components extracted from fMRI data represent a wide variety of source signals, including task-related, transiently task-related, physiology-related and artifact-related signals (Calhoun et al., 2003). Since these source signals have quite different biological and neural properties, they often demonstrate distinctive types of between-subjects variability. It is therefore important to assume a suitable group structure for each independent component (IC) based on the characteristics of the underlying source signal. Another limitation of existing group ICA models is that they assume the same group structure under all experimental conditions during the scanning session. In typical task-related fMRI studies, the experimental paradigm often consists of different types of tasks or stimuli, and the between-subjects variability of source signals may change across tasks. For these reasons, a more flexible group ICA modeling framework is needed that allows signal- and/or task-specific group structures in multi-subject ICA decomposition.
The estimation of group ICA models is challenging because both the time courses and the spatial maps need to be estimated from the observed data. For most current group ICA methods, the estimation and inference procedures are tailored specifically for an assumed group structure. In Guo and Pagnoni (2008), we proposed a general maximum likelihood estimation method via an EM algorithm that can be used to estimate different types of group ICA models. Although the proposed EM provided a versatile tool for estimating group structures, the algorithm had two issues that limit its applicability in practice: it relied on a second-order Taylor expansion to approximate the conditional expectation of the log-likelihood in the E-step, resulting in a complex form of the expectation function; and its M-step required numerical methods such as the Newton-Raphson algorithm and was therefore computationally inefficient and prone to numerical problems.
In this paper, we propose a general probabilistic ICA model that can flexibly model various types of signal- and task-specific group structures in fMRI data. The proposed general model can extract common group temporal dynamics across subjects for a task-related source signal, where subjects’ responses are registered to the same task time series, or estimate subject-specific temporal dynamics for physiology-related signals or in resting-state fMRI studies. In multi-group studies, the proposed model allows us to evaluate differences between subject groups in relevant source signals.
A maximum likelihood estimation approach is proposed for the general group probabilistic ICA model, and we propose two EM algorithms to obtain the maximum likelihood estimates. First, we propose an exact EM method which provides exact evaluation of the conditional expectation of the log-likelihood in the E-step and explicit noniterative solutions for the parameter updates in the M-step. We then propose a variational approximation EM to provide fast computation for group ICA models with a large number of ICs. The variational EM uses an optimal factorized approximation distribution to evaluate the conditional probabilities in the E-step and provides explicit forms for the M-step. In simulation studies, we show that the two EM algorithms provide highly accurate estimation of fMRI source signals. Compared to the exact EM, the variational EM significantly reduces computation time while providing comparable results.
2. Methods
2.1 A general group probabilistic ICA model
We propose a general model for group ICA of multi-subject fMRI data that can flexibly model different kinds of between-subjects variability. To set notation, let i = 1, …, N index subjects, t = 1, …, T index time points, and v = 1, …, V index voxels. Let yiv be the T × 1 data vector representing the observed fMRI data from subject i at voxel v, and let yv = (y1v′, …, yNv′)′ be the NT × 1 group data vector concatenated across subjects. The group ICA model is defined as,
yv = M sv + ev,  v = 1, …, V,    (1)
where sv is the q × 1 vector containing the q statistically independent non-Gaussian spatial source signals at voxel v, M is the NT × q group mixing matrix that mixes the q independent spatial source signals to generate the observed multi-subject fMRI images, and ev is the NT × 1 vector representing the N subjects’ noise at voxel v with ev ~ MVN(0, IN ⊗ Σv), where Σv is the T × T error covariance matrix for the vth voxel. The noise term is assumed to be independent across voxels because the spatial correlation across voxels is modelled by the spatial source signals (Hyvärinen et al., 2001; Beckmann and Smith, 2004). Since preliminary analysis such as pre-whitening (Bullmore et al., 1996) can be performed to remove the temporal correlations in the noise term and standardize the variability across voxels, an isotropic covariance is often assumed for the noise (Beckmann and Smith, 2004), i.e. Σv = σ²IT.
The proposed ICA model in (1) is referred to as a noisy or probabilistic ICA model (Hyvärinen et al., 2001) because it includes a Gaussian noise term to account for the background noise that is not accounted for by the source signals. In comparison, the classical noise-free ICA model assumes that the observed data can be fully characterized by the source signals. The noise-free ICA is known to be susceptible to over-fitting and also precludes formal statistical tests of significance for the source estimates (Beckmann and Smith, 2004).
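To make the data-generating mechanism in (1) concrete, the following is a minimal simulation sketch in Python/NumPy. The dimensions, sparsity level and noise variance are illustrative choices, not values used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (not taken from the paper's simulations).
N, T, V, q = 4, 50, 1000, 3          # subjects, time points, voxels, ICs
sigma2 = 0.5                          # isotropic noise variance (Sigma_v = sigma2 * I_T)

# A generic NT x q group mixing matrix; Section 2.2 shows how its columns
# can be structured (single-group, multi-group, subject-specific).
M = rng.standard_normal((N * T, q))

# Sparse, non-Gaussian spatial sources: most voxels are background noise,
# a small fraction carry signal (mimicking the mixture-of-Gaussians idea).
S = rng.normal(0.0, 0.5, size=(q, V))
active = rng.random((q, V)) < 0.05
S[active] += rng.normal(3.0, 1.0, size=active.sum())

# Model (1): y_v = M s_v + e_v, with e_v ~ MVN(0, I_N ⊗ sigma2 I_T),
# independent across voxels.
E = rng.normal(0.0, np.sqrt(sigma2), size=(N * T, V))
Y = M @ S + E                         # NT x V matrix of stacked multi-subject data
print(Y.shape)                        # (200, 1000)
```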
2.2 Modeling different types of group structures
Previous studies have shown that the spatial distributions of source signals are more consistent across subjects than the associated temporal responses. Therefore, the proposed group ICA model (1) assumes common spatial distribution patterns across subjects and models the between-subjects variability in the temporal dynamics through the group mixing matrix M. By parameterizing M, the proposed group ICA model can be used to model various kinds of between-subjects variability in the temporal responses in multi-subject fMRI data. In previous group ICA methods (Calhoun et al., 2001; Beckmann and Smith, 2005; Guo and Pagnoni, 2008), the same type of between-subjects variability is assumed for all ICs. This assumption is often restrictive considering the wide variety of source signals in fMRI data. For example, the temporal dynamics of task-related signals tend to be highly consistent across subjects, whereas other types of signals, such as those representing physiological fluctuations and spontaneous mental activity, are more heterogeneous among subjects. In this paper, we propose signal-specific group structures to accommodate source signals with different types of between-subjects variability. Define M = [m1, …, mq] where mℓ (ℓ = 1, …, q) is the ℓth column of M representing the group mixing coefficients for the ℓth source signal. We propose to parameterize mℓ according to the characteristics of the associated source signal. We consider the following signal-specific group structures (a small construction sketch is given after the list),
(a) Single-group structure

mℓ = cℓ ⊗ aℓ,    (2)

where aℓ is a T × 1 vector representing the group temporal response for the ℓth IC and cℓ is an N × 1 vector representing subject-specific loadings on the group temporal response. The single-group structure assumes that all subjects’ temporal responses for the ℓth signal are proportional to the group temporal response, scaled by subject-specific loading parameters. This structure is applicable to task-related signals that demonstrate similar temporal response patterns across subjects.
(b) Multi-group structure

mℓ = [(c(1) ⊗ aℓ(1))′, …, (c(g) ⊗ aℓ(g))′]′,    (3)

The multi-group structure applies to an fMRI study where the subjects are from g groups and where yv(k) is the concatenated fMRI data across the Nk subjects in group k. In (3), aℓ(k) is the T × 1 vector containing the group time course associated with the ℓth IC for group k and c(k) is the Nk × 1 subject loading vector for group k. The multi-group structure is an extension of the single-group structure which assumes similar responses within a group but allows heterogeneous responses between groups.
(c) Subject-specific structure

mℓ = [a1ℓ′, …, aNℓ′]′,    (4)

where aiℓ (i = 1, …, N) is a T × 1 vector representing the temporal response of the ℓth source signal for subject i. This structure allows subject-specific temporal response patterns and is hence the most flexible. It is suitable for signals with high heterogeneity across subjects, such as signals unrelated to task activations or signals observed in resting-state studies.
(d) Task-specific group structure

In structure (d), we consider the effects of experimental tasks on the between-subjects variability. During an fMRI study, the experimental paradigm is often composed of intervals that are associated with different types or levels of tasks. For example, the experimental paradigm may be composed of task-on and task-off periods, where subjects are exposed to an experimental stimulus or asked to perform tasks during the task-on periods and are disengaged from the stimulus or the tasks during the task-off periods. Source signals may demonstrate varying group structures depending on the type or level of the task. Exploratory analyses have found that brain functional networks tend to have more homogeneous temporal response patterns across subjects when subjects are engaged in the experimental task than during the resting period. Hence, we propose a task-specific group structure. Suppose the experimental paradigm consists of J different types or levels of tasks. Let 𝒯j (j = 1, …, J) represent the subset of time points for the jth task, with |𝒯j| = Tj and ∑j Tj = T. A task-specific group structure for the ℓth IC can be written as,

mℓ = ∑j=1,…,J (IN ⊗ Lj) mℓ,j,    (5)

where Lj is a T × Tj incidence matrix whose (t1, t2)th element is 1 if the t1th time point in the study period is associated with the jth task and corresponds to the t2th ordered time point in 𝒯j, and 0 otherwise; mℓ,j is the NTj × 1 group mixing subvector for the ℓth source signal during the jth task, which can itself take any of the structures (a)–(c). For example, for a task-related source signal, we can specify the single-group structure for the task-on periods and the subject-specific structure for the task-off periods.
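The sketch below illustrates how the mixing column mℓ can be constructed under structures (a)–(c) using Kronecker products and stacking; structure (d) would combine such blocks task by task through the incidence matrices Lj. All dimensions and time courses are hypothetical choices for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 6, 50                                  # subjects, time points

# (a) single-group structure (2): m = c ⊗ a, every subject shares one time course.
a = np.sin(2 * np.pi * np.arange(T) / 20.0)   # group temporal response (T x 1)
c = rng.uniform(0.5, 1.5, size=N)             # subject-specific loadings (N x 1)
m_single = np.kron(c, a)                      # NT x 1

# (b) multi-group structure (3): a separate (c, a) pair per subject group,
# stacked in the order in which the subjects are concatenated.
groups = [np.arange(0, 3), np.arange(3, 6)]   # two groups of 3 subjects each
a_k = [a, np.cos(2 * np.pi * np.arange(T) / 20.0)]
m_multi = np.concatenate(
    [np.kron(rng.uniform(0.5, 1.5, size=len(g)), a_k[k]) for k, g in enumerate(groups)]
)

# (c) subject-specific structure (4): each subject has its own time course.
m_subject = np.concatenate([rng.standard_normal(T) for _ in range(N)])

for m in (m_single, m_multi, m_subject):
    assert m.shape == (N * T,)
```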
3. Estimation
3.1 Maximum likelihood estimation
ICA models are often estimated by maximizing the statistical independence of the extracted ICs, which requires expressing the source signals as the product of the observed data and the unmixing matrix obtained by inverting the ICA model. These algorithms were developed for the classical noise-free ICA and are not appropriate for probabilistic ICA models, which are no longer directly invertible due to the presence of the additive noise. We propose to estimate the proposed group probabilistic ICA models using a maximum likelihood (ML) approach. Compared with alternative approaches, the ML method can be readily extended to probabilistic ICA models and also provides a convenient framework for statistical inference and model comparison in ICA. In previous work, ML estimation has mainly been applied to single-subject ICA (Hyvärinen, 1998; Moulines et al., 1997). Here, we extend the ML method to multi-subject fMRI data.
We construct the likelihood function based on a data augmentation scheme in which the unobserved spatial source signals are treated as latent data. Following Guo and Pagnoni (2008), we specify a mixture of Gaussian distributions as our source distribution model, on the rationale that source signals in functional brain images are generally attributable to a small percentage of voxels in the brain whereas most brain areas exhibit background fluctuations. The mixture of Gaussians is well suited to model such mixed patterns by employing different Gaussian components to capture the distribution of the small proportion of activated voxels and the distribution of the majority of brain areas that are not strongly related to the signal. Denote by sℓv the spatial source signal for the ℓth (ℓ = 1, …, q) IC at voxel v. The pdf of sℓv based on a mixture of Gaussian distributions is,
p(sℓv) = ∑j=1,…,m πℓj φ(sℓv; μℓj, σℓj²),    (6)
where m is the number of Gaussian density components in the mixture, φ(·; μℓj, σℓj²) is the Gaussian density function with mean μℓj and variance σℓj², and πℓj is the proportion of the jth Gaussian density with ∑j πℓj = 1. ϕℓ = {μℓj, σℓj², πℓj : j = 1, …, m} represents the parameters associated with the mixture of Gaussian distribution for the ℓth IC. Our experiments and previous work (Beckmann and Smith, 2004) suggest that a mixture of two to three Gaussian components is usually appropriate to capture the distribution of the spatial signals, because observed fMRI signals across the brain are generally composed of background noise and negative or positive BOLD effects. In the mixture of Gaussians, the different Gaussian components correspond to the hidden states of either activated/deactivated neural responses or background fluctuation in the brain. We define a latent class variable to represent the hidden state. Let zℓv represent the source state for the ℓth (ℓ = 1, …, q) IC at voxel v, where zℓv takes values in {1, …, m} with probability p(zℓv = j) = πℓj (1 ≤ j ≤ m), and let zv = (z1v, …, zqv)′. Conditional on zℓv = j, we have sℓv ~ N(μℓj, σℓj²).
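As a small illustration of the source distribution model (6), the following sketch samples one IC's spatial signal from a two-component mixture and evaluates the mixture density; the particular parameter values are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Two-component mixture for one IC: a broad background component centered at
# zero and a small "activated" component with mean shifted away from zero.
pi    = np.array([0.95, 0.05])     # pi_{l1}, pi_{l2}
mu    = np.array([0.0, 3.0])       # mu_{l1},  mu_{l2}
sigma = np.array([0.7, 1.0])       # sd of each component

V = 5000
z = rng.choice(2, size=V, p=pi)                    # latent states z_{lv}
s = rng.normal(mu[z], sigma[z])                    # s_{lv} | z_{lv}=j ~ N(mu_j, sigma_j^2)

def mog_pdf(x, pi, mu, sigma):
    """Mixture-of-Gaussians density in (6), evaluated pointwise."""
    x = np.atleast_1d(x)[:, None]
    comp = np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)
    return comp @ pi

print(mog_pdf(np.array([0.0, 3.0]), pi, mu, sigma))
```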
Based on the ICA model (1) and the source distribution model, the complete log-likelihood is as follows,
log p(Y, S, Z; θ) = ∑v=1,…,V [ log Ψ(yv; M sv, IN ⊗ Σv) + log Ψ(sv; μzv, Σzv) + ∑ℓ=1,…,q log πℓzℓv ],    (7)
where Y = {yv}, S = {sv}, Z = {zv} and θ = {M, {Σv}, ϕ} with ϕ = {ϕ1, …, ϕq}; Ψ(·; μ, Σ) is the pdf of a multivariate Gaussian distribution with mean μ and covariance Σ, μzv = [μ1z1v, …, μqzqv]′ and Σzv = diag(σ1z1v², …, σqzqv²).
3.2 The exact EM algorithm
The parameters θ of our group ICA model are estimated by maximizing the marginal log-likelihood L(θ; Y). We use the EM algorithm to find the ML estimates. We first derive an exact EM algorithm which provides an explicit E-step and M-step.
3.2.1 The E-step and M-step for the exact EM algorithm
At the E-step, we find the expectation of the complete log-likelihood function by integrating over the conditional distribution p(s, z|y, θ̂). We obtain the joint conditional distribution by deriving the conditional distributions p(sv|zv, yv, θ) and p(zv|yv, θ). From the source distribution model (6) and the Gaussian likelihood (1), we can show that
| (8) |
where z(r) is a realization of the latent state vector zv (r = 1, …, m^q) and
| (9) |
| (10) |
where and .
To derive p(zv|yv, θ), we note that yv|zv follows a multivariate Gaussian distribution after integrating over the source signals. Then, using Bayes’s theorem, we can show that,
| (11) |
with
Combining (8) and (11), we show that,
| (12) |
with .
Based on the conditional distributions, the expectation of the complete log-likelihood function can be obtained as follows,
| (13) |
with and .
For detailed definitions of the functions Q1, Q2 and Q3, please refer to equations (2), (3) and (4) in Section A.1 of Web Appendix A. The E-step thus has an explicit form, obtained by applying the conditional distributions (8), (11) and (12) to evaluate the expectation function in (13).
For the M-step, the EM algorithm obtains the updated estimates by maximizing Q(θ|θ̂(k)) w.r.t. θ. We show in Section A.2 of Web Appendix A that explicit noniterative solutions can be derived for all the parameters. More specifically, the updated estimates for the group mixing matrix and the noise variance-covariance matrix are,
| (14) |
| (15) |
where M̂i(k+1) is the T × q submatrix of M̂(k+1) that corresponds to the ith subject. Under the common assumption Σv = σ²I, we have that,
| (16) |
We can also derive the following explicit estimates for the parameters of the mixture of Gaussian source distribution model,
| (17) |
| (18) |
| (19) |
The conditional moments of the source signals and the conditional probabilities of the latent states in (17)–(19) are evaluated based on the conditional distributions (8), (11) and (12).
3.2.2 The algorithm for the exact EM
The steps of the proposed exact EM algorithm are summarized as follows:
- Step 1. Start with initial parameter values θ̂(0), which could be obtained from results of existing group ICA software.

- Step 2. E-step: compute the conditional expectation function Q(θ|θ̂(k)) based on the conditional distributions (8), (11) and (12).

- Step 3. Modified M-step:

  - 3A. Update the parameter estimates θ̂(k+1) by evaluating the explicit solutions (14)–(19).

  - 3B. Project each estimated mixing column m̂ℓ(k+1) onto the space ϒℓ specified by its group structure (ℓ = 1, …, q).

- Step 4. Iterate between Steps 2–3 until convergence.
Compared to a standard EM, the proposed EM has the additional Step 3B in the M-step, which projects the estimated mixing columns onto the subspace of the specified model structure. We now provide the detailed procedure for Step 3B, illustrated with the single-group structure; similar methods can be used to project onto the subspaces specified by the other model structures. With the single-group structure, we find the projection of m̂ℓ onto ϒℓ = {m : m = c ⊗ a} using the following procedure:
- P1. Reshape m̂ℓ into a T × N matrix Wℓ such that wiℓ, the ith column of Wℓ, is the length-T sub-vector of m̂ℓ that corresponds to the ith subject.
- P2. mℓ ∈ ϒℓ if and only if Wℓ is a matrix of rank 1 with each column being a scaled copy of a. Therefore, we project Wℓ onto the rank-1 matrix space using the singular value decomposition (SVD). Denote by Wℓ = UDV′ the SVD of Wℓ, where D is the diagonal matrix containing the singular values ranked from highest to lowest. The projection of Wℓ onto the rank-1 matrix space is W̃ℓ = UD̃V′, where D̃ retains only the first singular value with all other singular values set to zero (Eckart and Young, 1936). The projected mixing column m̃ℓ is then obtained by reshaping W̃ℓ back into an NT × 1 vector. The group temporal response vector âℓ and the subject loading vector ĉℓ can be obtained from the first columns of U and V, respectively (see the sketch below).
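A minimal NumPy sketch of the P1–P2 projection for the single-group structure is given below; variable names are illustrative.

```python
import numpy as np

def project_single_group(m_hat, N, T):
    """Project an estimated NT x 1 mixing column onto {c ⊗ a} (single-group
    structure) via a rank-1 SVD truncation, as in steps P1-P2."""
    # P1: reshape so that column i holds subject i's length-T sub-vector.
    W = m_hat.reshape(N, T).T                    # T x N
    # P2: rank-1 approximation W ≈ u1 * d1 * v1'.
    U, d, Vt = np.linalg.svd(W, full_matrices=False)
    a_hat = U[:, 0] * d[0]                       # group temporal response (one scaling choice)
    c_hat = Vt[0, :]                             # subject loadings
    W_tilde = np.outer(a_hat, c_hat)             # T x N rank-1 projection
    m_tilde = W_tilde.T.reshape(-1)              # back to NT x 1 subject-block ordering
    return m_tilde, a_hat, c_hat

# Quick check: a vector that already has the c ⊗ a form is left unchanged.
rng = np.random.default_rng(3)
N, T = 5, 40
a = rng.standard_normal(T)
c = rng.standard_normal(N)
m = np.kron(c, a)
m_proj, _, _ = project_single_group(m, N, T)
print(np.allclose(m, m_proj))                    # True
```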
3.2.3 Data dimension reduction and whitening
ICA usually incorporates several preprocessing steps, including centering, dimension reduction and whitening, in order to reduce the complexity of the subsequent ICA decomposition. In Guo and Pagnoni (2008), we introduced a two-stage dimension reduction and whitening procedure for group ICA of multi-subject fMRI data and showed that it is equivalent to multiplying the original group fMRI data by a linear transformation matrix. Therefore, the estimation of the group ICA model can be performed using the proposed exact EM algorithm with slight modifications. Specifically, in Step 3A, the updated parameter estimates are obtained in the reduced-dimension space; in Step 3B, we first transform the estimated mixing matrix back to the original scale and then project the reconstructed mixing matrix onto the space specified by the group structure.
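As a rough illustration only, the following sketch performs a generic two-stage PCA reduction and whitening (subject-level PCA followed by group-level PCA of the stacked reduced data). It is meant to convey that both stages are linear operations; the exact transformation used in Guo and Pagnoni (2008) may differ in detail.

```python
import numpy as np

def pca_whiten(X, k):
    """Reduce a (dim x V) data matrix to k components and whiten them."""
    Xc = X - X.mean(axis=1, keepdims=True)
    U, d, _ = np.linalg.svd(Xc @ Xc.T / Xc.shape[1])
    H = np.diag(1.0 / np.sqrt(d[:k])) @ U[:, :k].T   # k x dim whitening matrix
    return H @ Xc, H

rng = np.random.default_rng(4)
N, T, V, q = 4, 50, 1000, 3
Y = [rng.standard_normal((T, V)) for _ in range(N)]   # per-subject T x V data

# Stage 1: subject-level PCA reduction from T to q dimensions.
stage1 = [pca_whiten(Yi, q) for Yi in Y]
Y_reduced = np.vstack([Xi for Xi, _ in stage1])        # (N*q) x V stacked reduced data

# Stage 2: group-level PCA reduction/whitening of the stacked data to q dimensions.
X_group, H2 = pca_whiten(Y_reduced, q)                 # q x V whitened group data

# Both stages are linear, so their composition is a single linear transformation
# of the original group data, as noted in the text.
print(X_group.shape)                                   # (3, 1000)
```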
3.2.4 Spatial maps for the estimated source signals
After we obtain the parameter estimates θ̂, we can estimate the spatial source signals and their variability based on their conditional distribution (12), i.e. ŝv = E(sv | yv, θ̂) and Var(sv | yv, θ̂).
In fMRI analysis, researchers are often interested in thresholded IC maps that identify “significantly active” voxels associated with each source signal. With the mixture of Gaussian source distribution, the activation of a voxel can be readily evaluated by the probability that the voxel belongs to the Gaussian component representing activation, which is the component whose estimated mean is far away from zero (Guo and Pagnoni, 2008). That is, the probability of activation at voxel v in the ℓth signal is estimated as p̂(zℓv = j1 | yv, θ̂), where the j1th Gaussian component models the activation of interest in a signal. One can then label a voxel as significantly activated if this conditional probability exceeds a pre-specified threshold.
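The sketch below illustrates this thresholding idea. For simplicity it computes the posterior component membership from an estimated source value and fitted mixture parameters, whereas in the paper the probability is conditional on the full data yv; all parameter values are made up.

```python
import numpy as np

def activation_map(s_hat, pi, mu, sigma, j_active, threshold=0.95):
    """Posterior probability that each voxel belongs to the 'activated'
    mixture component, thresholded to yield a binary map.

    s_hat : estimated source values for one IC (length V)
    pi, mu, sigma : fitted mixture parameters for that IC
    j_active : index of the component whose mean is far from zero
    """
    comp = np.exp(-0.5 * ((s_hat[:, None] - mu) / sigma) ** 2) / sigma * pi
    post = comp / comp.sum(axis=1, keepdims=True)       # posterior over latent states
    p_active = post[:, j_active]
    return p_active > threshold, p_active

# Toy usage with made-up parameters.
rng = np.random.default_rng(5)
s_hat = np.concatenate([rng.normal(0, 0.7, 950), rng.normal(3, 1.0, 50)])
active, p = activation_map(s_hat, np.array([0.95, 0.05]),
                           np.array([0.0, 3.0]), np.array([0.7, 1.0]), j_active=1)
print(active.sum())
```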
3.3 The variational approximation EM algorithm
3.3.1 The E-step and M-step for the variational EM algorithm
The exact EM algorithm has the advantage of providing a closed form for the E-step and explicit solutions for the M-step. However, this exact estimation becomes computationally expensive when the number of ICs increases. To address this issue, we consider an approximate EM algorithm for estimating group ICA models with a large number of sources. The main cause of the slow computation in the exact EM algorithm is the co-dependence among the source signals when conditioning on the observed fMRI data. That is, the following conditional distribution
p(sv, zv | yv; θ)    (21)
is no longer factorizable across the source signals. Hence, estimation of the parameters of one source signal involves summing across all possible states of all other q − 1 sources in (17)–(19). To circumvent this time-consuming step, we propose to evaluate the conditional expectation of the log-likelihood function in (13) based on a factorized approximate conditional distribution,
| (22) |
where fsℓ is a probability density function for sℓv and fzℓv is a probability function for the latent state zℓv which satisfies ∑j=1,…,m fzℓv(j) = 1, for ℓ = 1, …, q and v = 1, …, V. The functional form of fsℓ is chosen to minimize the Kullback-Leibler (KL) divergence between the factorized approximation (22) and the true conditional distribution (21). By setting the functional derivative of KL(p̃||p) with respect to fsℓ equal to zero (Csató et al., 2000), we can show that the optimal choice of fsℓ is a Gaussian distribution function. Therefore, the optimal factorized approximation has the following form,
| (23) |
where ϑ̃ = {μ̃ℓjv, σ̃ℓjv², π̃ℓjv} are the new parameters associated with the factorized approximation distribution. From (23), we note that the marginal approximate conditional distribution for the source signals is still a mixture of Gaussians in which each Gaussian component has a diagonal variance-covariance matrix.
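The computational gain can be seen from a simple count of latent-state configurations: the exact E-step must handle all m^q joint states per voxel, whereas the factorized approximation works with the m states of each source separately. A toy illustration:

```python
import numpy as np
from itertools import product

m, q = 2, 10   # mixture components per source, number of ICs

# Exact E-step: conditional probabilities must be evaluated jointly over all
# m^q latent-state combinations z_v = (z_1v, ..., z_qv).
joint_states = list(product(range(m), repeat=q))
print(len(joint_states))          # 1024 combinations per voxel when m=2, q=10

# Factorized (variational) approximation: each source is handled through its
# own marginal over m states, so the per-voxel cost scales like q * m.
print(q * m)                      # 20
```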
The parameters we now need to estimate are ϑ = {θ, ϑ̃}. The estimation is performed by iteratively updating θ and ϑ̃. The estimates of θ are obtained by maximizing the conditional expectation of the log-likelihood function evaluated under the factorized conditional distribution p̃(sv, zv|yv, ϑ̃). We then estimate ϑ̃ by minimizing the KL divergence between the approximate and exact conditional distributions. The derivations for ϑ̃ are provided in Web Appendix B. We can show that the estimates for the variance parameters in the factorized conditional distribution have the following explicit form,
| (24) |
with .
The estimation of the mean parameters {μ̃ℓjv} and the proportion parameters {π̃ℓjv} is co-dependent. The mean parameters {μ̃ℓjv} can be estimated through web equation (16) in Web Appendix B, which involves {π̃ℓjv}. The proportion parameters {π̃ℓjv} are estimated as,
| (25) |
with . Therefore, the estimates for {μ̃ℓjv} and {π̃ℓjv} are obtained by iterating web equation (16) in Web Appendix B and (25) until convergence, which can be achieved after a few iterations.
Guo and Pagnoni (2008) previously developed an approximate EM based on a Taylor series expansion. Compared to the Taylor-series EM, the variational EM has the following advantages: the optimal factorized approximation in the E-step of the variational EM provides an expectation function with a much simpler form that is analytically closer to the exact expectation; and the M-step of the variational EM provides explicit forms and is computationally more efficient and more stable than that of the Taylor-series EM.
3.3.2 The algorithm for the variational approximation EM
- Step 1. Start with initial parameter values ϑ̂(0) = {θ̂(0), ϑ̃(0)}.

- Step 2. E-step: compute the conditional expectation function Q̃(θ|θ̂(k)) based on the factorized conditional distribution p̃(sv, zv|yv, ϑ̃(k)).

- Step 3. Modified M-step:

  - 3A. Update the parameter estimates for ϑ = {θ, ϑ̃}:

    - (3A.1) Obtain θ̂(k+1) = argmaxθ Q̃(θ|θ̂(k)). Specifically, θ̂(k+1) is obtained by evaluating solutions (14)–(19) based on p̃(sv, zv|yv, ϑ̃(k)).

    - (3A.2) Obtain ϑ̃(k+1) by minimizing the KL divergence between the factorized and exact conditional distributions. Specifically, ϑ̃(k+1) is obtained through solution (24), web equation (16) in Web Appendix B and (25).

  - 3B. Project each estimated mixing column m̂ℓ(k+1) onto the space ϒℓ specified by the model structure (ℓ = 1, …, q).

- Step 4. Iterate between Steps 2–3 until convergence.
4. Simulation Studies
We conducted two simulation studies to evaluate the performance of the proposed group probabilistic ICA model and its estimation methods. The simulation results were obtained using in-house MATLAB scripts developed by the author. In the first simulation study, we compared the performance of the proposed group ICA method and the existing probabilistic ICA method, the tensor PICA approach (Beckmann and Smith, 2005). We generated fMRI data for 9 subjects with three underlying source signals. We considered three simulation cases with different group structures which represent various types of source signals. Please refer to Section C.1 of Web Appendix C for detailed information regarding the simulation of the source signals.
With the simulated data, we fitted the proposed group probabilistic ICA model with the exact EM algorithm and the tensor PICA model with the algorithm in Beckmann and Smith (2005). Following previous work (Beckmann and Smith, 2005; Guo and Pagnoni, 2008), the accuracy of the model estimation was assessed by calculating the correlations between the true and estimated spatial maps and time courses. Since ICA recovery is permutation-invariant, each estimated IC is matched with the original source with which it has the highest spatial correlation (a matching sketch is given below). Table 1 shows that the proposed method provides more accurate estimation of the source signals in both the spatial and temporal domains than the tensor PICA in all simulation cases. The proposed model can accurately estimate source signals with different types of between-subjects heterogeneity, while the tensor PICA only provides accurate estimates for signals generated from the single-group structure. The estimates based on the proposed method are also less variable than those from tensor PICA, as reflected by the lower standard deviations of the spatial and temporal correlations. For detailed discussion of the simulation results, please refer to Section C.2 of Web Appendix C.
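A minimal sketch of this matching step, assuming true and estimated spatial maps stored as q × V arrays, is given below; it is not the code used for the reported simulations.

```python
import numpy as np

def match_components(S_true, S_est):
    """Match each estimated IC to the true source with which it has the
    highest absolute spatial correlation (ICA recovery is permutation- and
    sign-invariant). Returns, for each estimated IC, the index of the matched
    true source and the corresponding correlation."""
    q = S_true.shape[0]
    corr = np.corrcoef(np.vstack([S_true, S_est]))[:q, q:]   # q x q cross-correlations
    matches = np.abs(corr).argmax(axis=0)                     # best true source per estimate
    values = np.abs(corr)[matches, np.arange(q)]
    return matches, values

# Toy check: a permuted, sign-flipped, noisy copy is matched back correctly.
rng = np.random.default_rng(6)
S = rng.standard_normal((3, 2000))
S_est = -S[[2, 0, 1]] + 0.1 * rng.standard_normal((3, 2000))
print(match_components(S, S_est))    # matches ≈ [2, 0, 1]
```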
Table 1.
Simulation study comparing the proposed group probabilistic ICA method and the tensor PICA, based on 200 runs. The proposed ICA model is estimated using the exact EM algorithm. Data were simulated for 9 subjects with three underlying source signals.
Spatial correlations, Mean (SD):

| | Proposed group PICA model | | | Tensor PICA | | |
|---|---|---|---|---|---|---|
| | IC1 | IC2 | IC3 | IC1 | IC2 | IC3 |
| Case A | 0.995 (0.002) | 0.994 (0.003) | 0.994 (0.002) | 0.993 (0.004) | 0.993 (0.004) | 0.993 (0.003) |
| Case B | 0.995 (0.002) | 0.995 (0.003) | 0.995 (0.002) | 0.988 (0.011) | 0.990 (0.010) | 0.988 (0.010) |
| Case C | 0.995 (0.002) | 0.994 (0.003) | 0.996 (0.002) | 0.989 (0.010) | 0.980 (0.015) | 0.977 (0.029) |

Temporal correlations, Mean (SD):

| | Proposed group PICA model | | | Tensor PICA | | |
|---|---|---|---|---|---|---|
| | IC1 | IC2 | IC3 | IC1 | IC2 | IC3 |
| Case A | 0.980 (0.004) | 0.979 (0.004) | 0.979 (0.004) | 0.985 (0.008) | 0.985 (0.007) | 0.984 (0.007) |
| Case B | 0.946 (0.009) | 0.942 (0.012) | 0.944 (0.010) | 0.463 (0.029) | 0.555 (0.038) | 0.525 (0.032) |
| Case C | 0.980 (0.005) | 0.973 (0.006) | 0.978 (0.011) | 0.988 (0.006) | 0.743 (0.029) | 0.481 (0.042) |

In Case A, the single-group structure is specified for each IC; in Case B, the subject-specific structure is specified for each IC; in Case C, IC1 has the single-group structure, IC2 has the multi-group structure and IC3 has the subject-specific structure.

For the proposed model, we specified the subject-specific structure for Cases A–B and the signal-specific structure for Case C.
In the second simulation study, we compared the performance of the exact EM algorithm and the variational approximation EM. We generated fMRI data for 9 subjects and considered three model sizes with q = 5, 8 and 10 source signals. Signals were generated from two simulation cases (Table 2). We then fit the proposed ICA model using both the exact EM and the variational EM. Results in Table 2 show that the accuracy of the variational EM is comparable to that of the exact EM in both the spatial and temporal domains. The standard deviations of the spatial and temporal correlations are slightly higher for the variational EM, indicating that the estimates from the approximate algorithm are somewhat more variable than those from the exact EM. The exact EM converged in all 200 simulation runs while the variational EM converged in 91%–99.5% of the runs. The major advantage of the variational EM is that it is much faster than the exact EM, and its computational advantage becomes more pronounced as the number of source signals increases. For q = 10, the variational EM used only 2%–3% of the computation time of the exact EM. We also compared the variational EM with the Taylor-series approximation EM of Guo and Pagnoni (2008) and found that the variational EM provides more accurate estimates, is computationally faster and also has a higher convergence rate.
Table 2.
Simulation study for comparison between the exact EM algorithm and the variational EM based on 200 runs.
| | | Spatial correlations Mean (SD) | | Temporal correlations Mean (SD) | | Computing time (sec.) | | Convergence (%) | |
|---|---|---|---|---|---|---|---|---|---|
| | | Exact EM | Variational EM | Exact EM | Variational EM | Exact EM | Variational EM | Exact EM | Variational EM |
| q=5 | Case A | 0.991 (0.002) | 0.990 (0.003) | 0.970 (0.004) | 0.969 (0.005) | 20.0 | 9.5 | 100 | 94.0 |
| | Case B | 0.994 (0.002) | 0.994 (0.002) | 0.948 (0.008) | 0.948 (0.008) | 18.5 | 7.6 | 100 | 95.5 |
| q=8 | Case A | 0.990 (0.002) | 0.989 (0.003) | 0.965 (0.005) | 0.963 (0.006) | 249.2 | 17.3 | 100 | 91.0 |
| | Case B | 0.993 (0.002) | 0.992 (0.002) | 0.952 (0.007) | 0.951 (0.007) | 248.6 | 13.2 | 100 | 95.0 |
| q=10 | Case A | 0.988 (0.003) | 0.987 (0.004) | 0.961 (0.005) | 0.958 (0.007) | 1278.3 | 35.3 | 100 | 90.5 |
| | Case B | 0.991 (0.003) | 0.991 (0.003) | 0.954 (0.006) | 0.954 (0.006) | 1152.9 | 24.9 | 100 | 99.5 |
q is the number of independent components.
In Case A, the single-group structure is specified for each IC; in Case B, the subject-specific structure is specified for each IC.
The subject-specific group structure was specified when fitting the model.
5. An fMRI data example
We apply the proposed method to real fMRI data from a study on the neurobiological correlates of Zen meditation (Guo and Pagnoni, 2008). The practice of Zen meditation is traditionally considered conducive to a mental state of reduced conceptual processing in which full awareness is retained. We hypothesize that habitual meditators exhibit an enhanced capacity to voluntarily moderate the intensity and duration of the automatic conceptual processing kindled by external stimuli. The study recruited 12 Zen meditators and 12 matched controls. A lexical decision paradigm was used in which word and nonword items were presented visually on a screen and subjects were asked to report whether the item was a real English word by pressing a button box with their left hand. Subjects were instructed to use the awareness of their breathing throughout the session as a reference point to monitor and counteract attentional lapses. Hence, there are two experimental conditions during the scanning: an ongoing meditative baseline condition and a phasic perturbation of this baseline by the visual stimuli. Under our hypothesis, we would expect differential temporal patterns of neural responses to the experimental stimuli between meditators and non-meditators, especially under the meditative condition. For details regarding the experimental task and the collection and preprocessing of the fMRI data, please refer to Web Appendix D.
We performed the two-stage data dimension reduction and whitening procedure on the Zen meditation data prior to the ICA analysis (Web Appendix D). Based on the Laplace approximation (Beckmann and Smith, 2004), 14 ICs were deemed appropriate. The ICs were then estimated using the maximum likelihood method via the proposed exact EM algorithm. By examining the spatial distributions of the 14 ICs, we identified two components of particular interest on the basis of the putative neural systems involved in meditation and the execution of the experimental task. The first component represents the task-related network that is related to the performance of the lexical decision task. The second component represents the attentional network, selected for its central role in attentional control. We specified the multi-group structure for these two components to extract group-specific temporal responses for comparison between meditators and controls. For the other ICs, which are not temporally registered to the task, we specified the subject-specific structure. Spatial maps showing activated brain regions in the two components of interest are presented in Figure 1, where voxels with an estimated conditional probability of activation exceeding 0.95 are labeled active. The task-related network includes the supplementary motor area (SMA), the hand region of the right sensorimotor cortex contralateral to the (left) hand that was pressing the button box, and the visual cortex, which is activated by the presentation of the visual word and nonword stimuli. The attentional network is a fronto-parietal system including the bilateral intraparietal sulcus and the supplementary eye fields, which is consistent with the general architecture of attentional function.
Figure 1.
Thresholded activation maps based on the exact EM algorithm for two selected independent components from the Zen meditation fMRI data. Voxels with a conditional probability of activation exceeding 0.95 are labeled active. The two components correspond to the task-related network and the attentional network. For each component, images are shown from coronal, sagittal and axial views. This figure appears in color in the electronic version of this article.
We display in Figure 2 the group-specific temporal responses for the two networks. Summary statistics of the temporal responses are presented in Table 3. We distinguish between the task active state (task-on), when word and nonword stimuli were presented, and the meditative state (task-off). During the task-on periods, the responses of the task-related network are highly correlated with the task time series for both groups of subjects (Table 3). For the attentional network, the responses of the meditators are much more strongly registered to the task time series than those of the controls, indicating that the meditators had better attentional focus during the task. We also compare the variability of the temporal responses between the task active state and the meditative state (Table 3). We find that for the attentional network the two groups show different kinds of changes when shifting from the task-on state to the meditative state. The controls demonstrated a similar level of activity between the two states, whereas the meditators’ attentional network became much more stabilized during the meditative state, with lower variability (Table 3). Results from both Table 3 and Figure 2 show that the responses of the attentional network during the task-off periods are much more regularized for the meditators than for the controls. A permutation test can be applied to draw a formal statistical conclusion about the significance of the difference between the two groups. Our findings provide supporting evidence for the study hypothesis that meditators can better regularize their conceptual processing during the meditative state.
Figure 2.
Estimated temporal responses based on the exact EM algorithm for the task-related network and the attentional network for the Zen meditation fMRI data. The dashed lines correspond to the estimated temporal responses for the meditation group; the dotted lines correspond to the control group; and the solid line in the task-on figures is the task time series convolved with the hemodynamic response function (HRF). The figures on the left represent responses during task-on periods and the figures on the right represent the responses during task-off periods under the meditative state.
Table 3.
Results for the group ICA of the Zen meditation study
| IC | Subject group | Corr. with the task time series† | Standard deviation‡ (Task-on) | Standard deviation‡ (Task-off) |
|---|---|---|---|---|
| Task-related network | Control | 0.829 | 0.048 | 0.027 |
| | Meditator | 0.855 | 0.046 | 0.020 |
| Attentional network | Control | 0.592 | 0.045 | 0.042 |
| | Meditator | 0.862 | 0.046 | 0.023 |

† Correlation between the estimated group-specific time course and the HRF-convolved task time series.

‡ Standard deviation of the estimated group-specific time course.
We also fit the same group probabilistic ICA model using the variational EM, which resulted in a 99.5% reduction in computation time compared with the exact EM. We then compared the results from the two algorithms. In the spatial domain, the correlation between the estimated spatial maps based on the two algorithms is 0.950 for the task-related network and 0.913 for the attentional network. Web Figure 3 displays the thresholded spatial maps for the networks based on the variational EM. Compared to Figure 1, the estimated networks include the same brain regions as those from the exact EM, with only slight differences in the size of the activated area in a few regions. The estimated temporal dynamics are also very similar between the two algorithms (Figure 2 and Web Figure 4). Specifically, the correlations between the estimated time courses based on the two algorithms are 0.943 and 0.980 for controls and meditators in the task-related network, and 0.963 and 0.932 for controls and meditators in the attentional network (Web Figure 4).
6. Discussion
We have presented a general probabilistic framework for group ICA analysis of multi-subject fMRI data. Our proposed group ICA model takes a probabilistic approach which is able to provide formal statistical inference for the estimated source signals. Compared to the classic noiseless ICA, the probabilistic ICA is more difficult to estimate due to the presence of the noise term. The previous group probabilistic ICA method, tensor PICA, essentially takes an estimation approach similar to noiseless ICA by expressing the source signals as the product of the unmixing matrix and the whitened observations. In this paper, we propose a maximum likelihood method that provides a formal estimation framework for probabilistic ICA models. The likelihood-based framework will also allow us to perform model comparisons between various group structures using likelihood ratio tests (Guo and Pagnoni, 2008).
Our proposed group ICA model assumes signal-specific and task-specific group structures to provide a better fit for fMRI signals related to different sources and experimental tasks. To select a suitable group structure for each IC, we first obtain information on the biological and neural properties of the underlying source signal based on the characteristics of the spatial distribution and temporal dynamics of the IC; an appropriate group structure can then be specified accordingly. One can also consider incorporating experimental paradigm information in specifying the group mixing processes to improve the robustness of the ICA in the presence of noise signals, including artifactual or physiological noise.
A challenge in performing multivariate analyses such as ICA in fMRI studies is how to flexibly model the hemodynamic response function (HRF), which can vary spatially and between subjects. Our proposed ICA model can accommodate different HRFs across subjects by applying the subject-specific structure. A spatially constant HRF is assumed in the proposed model, as in most current ICA methods, since modeling spatially varying HRFs in ICA remains an open research topic.
We develop two EM algorithms to obtain ML estimates for the proposed group ICA model. The exact EM provides an explicit E-step and M-step and has a high convergence rate. To provide a more practical algorithm for ICA models with a moderate to large number of components, we propose the variational approximation EM, which demonstrates comparable accuracy to the exact EM but requires significantly less computation time. When developing statistical methods for neuroimaging studies, computational efficiency is an important consideration due to the enormous amount of data involved. Developing a fast and accurate approximation method provides a feasible solution in many scenarios where the exact method is computationally too costly.
Supplementary Material
Acknowledgments
This work was supported by a URC grant funded by Emory University Research Committee and NIH grant R01-MH079251. The author thanks Dr. Giuseppe Pagnoni for the Zen meditation data.
Footnotes
Web Appendices and Figures referenced in Sections 3, 4 and 5 are available under the Paper Information link at the Biometrics website http://www.biometrics.tibs.org.
References
- Beckmann CF, Smith SM. Probabilistic independent component analysis for functional magnetic resonance imaging. IEEE Transactions on Medical Imaging. 2004;23:137–152. doi: 10.1109/TMI.2003.822821.
- Beckmann CF, Smith SM. Tensorial extensions of independent component analysis for multisubject FMRI analysis. NeuroImage. 2005;25:294–311. doi: 10.1016/j.neuroimage.2004.10.043.
- Bell AJ, Sejnowski TJ. An information-maximization approach to blind separation and blind deconvolution. Neural Computation. 1995;7:1129–1159. doi: 10.1162/neco.1995.7.6.1129.
- Bullmore E, Brammer M, Williams S, Rabe-Hesketh S, Janot N, David A, Mellers J, Howard R, Sham P. Statistical methods of estimation and inference for functional MR image analysis. Magnetic Resonance in Medicine. 1996;35:261–277. doi: 10.1002/mrm.1910350219.
- Calhoun V, Adali T, Hansen LK, Larsen J, Pekar J. ICA of functional MRI data: an overview. 4th International Symposium on Independent Component Analysis and Blind Signal Separation (ICA2003); Nara, Japan. 2003. pp. 281–288.
- Calhoun VD, Adali T, Pearlson GD, Pekar JJ. A method for making group inferences from functional MRI data using independent component analysis. Human Brain Mapping. 2001;14:140–151. doi: 10.1002/hbm.1048.
- Csató L, Fokoué E, Opper M, Schottky B, Winther O. Efficient approaches to Gaussian process classification. In: Solla SA, Leen TK, Müller K-R, editors. Advances in Neural Information Processing Systems. Vol. 12. Cambridge, MA: The MIT Press; 2000.
- Eckart C, Young G. The approximation of one matrix by another of lower rank. Psychometrika. 1936;1:211–218.
- Guo Y, Pagnoni G. A unified framework for group independent component analysis for multi-subject fMRI data. NeuroImage. 2008;42:1078–1093. doi: 10.1016/j.neuroimage.2008.05.008.
- Hyvärinen A. Independent component analysis in the presence of Gaussian noise by maximizing joint likelihood. Neurocomputing. 1998;22:49–67.
- Hyvärinen A, Karhunen J, Oja E. Independent Component Analysis. New York: Wiley; 2001.
- McKeown MJ, Makeig S, Brown GG, Jung TP, Kindermann SS, Bell AJ, Sejnowski TJ. Analysis of fMRI data by blind separation into independent spatial components. Human Brain Mapping. 1998;6:160–188.
- Moulines É, Cardoso J-F, Gassiat E. Maximum likelihood for blind separation and deconvolution of noisy signals using mixture models. Proc. ICASSP; Munich. 1997. pp. 3617–3620.
- Schmithorst V, Holland S. Multiple networks recruited during a story processing task found using group inferences across subjects from independent component analysis. Proc. of the 10th Annual Meeting of ISMRM; Honolulu, HI. 2002. p. 754.
- Svensén M, Kruggel F, Benali H. ICA of fMRI group study data. NeuroImage. 2002;16:551–563. doi: 10.1006/nimg.2002.1122.