A unified framework for group independent component analysis for multi-subject fMRI data

Ying Guo; Giuseppe Pagnoni

doi:10.1016/j.neuroimage.2008.05.008

Author manuscript; available in PMC: 2010 Apr 13.

Published in final edited form as: Neuroimage. 2008 May 16;42(3):1078–1093. doi: 10.1016/j.neuroimage.2008.05.008

PMCID: PMC2853771 NIHMSID: NIHMS67197 PMID: 18650105

Abstract

Independent component analysis (ICA) is becoming increasingly popular for analyzing functional magnetic resonance imaging (fMRI) data. While ICA has been successfully applied to single-subject analysis, the extension of ICA to group inferences is not straightforward and remains an active topic of research. Current group ICA models, such as the GIFT (Calhoun et al., 2001) and tensor PICA (Beckmann and Smith, 2005), make different assumptions about the underlying structure of the group spatio-temporal processes and are thus estimated using algorithms tailored for the assumed structure, potentially leading to diverging results. To our knowledge, there are currently no methods for assessing the validity of different model structures in real fMRI data and selecting the most appropriate one among various choices. In this paper, we propose a unified framework for estimating and comparing group ICA models with varying spatio-temporal structures. We consider a class of group ICA models that can accommodate different group structures and include existing models, such as the GIFT and tensor PICA, as special cases. We propose a maximum likelihood (ML) approach with a modified Expectation-Maximization (EM) algorithm for the estimation of the proposed class of models. Likelihood ratio tests (LRT) are presented for comparing different group ICA models. The LRT can be used to perform model comparison and selection, to assess the goodness-of-fit of a model in a particular data set, and to test group differences in the fMRI signal time courses between subject subgroups. Simulation studies are conducted to evaluate the performance of the proposed method under varying structures of group spatio-temporal processes. We illustrate our group ICA method using data from an fMRI study that investigates changes in neural processing associated with the regular practice of Zen meditation.

Keywords: Independent component analysis, Multi-subject data, Functional magnetic resonance imaging (fMRI), Group comparison, Maximum likelihood estimation, Likelihood ratio test, EM algorithm

Introduction

Independent component analysis is becoming increasingly popular for analyzing functional neuroimaging data. Compared with conventional analysis tools such as the general linear model (GLM), a key advantage of ICA is that it is a data-driven approach and does not rely on an a priori model of brain activity. Therefore, ICA is applicable to cognitive paradigms where prior knowledge of the expected brain time course is not available. ICA can also be used as an exploratory tool to identify and distinguish various types of signals.

ICA has been successfully applied to single-subject fMRI analysis (Beckmann and Smith, 2004; McKeown et al., 1998; Petersen et al., 2000). However, the extension of ICA to group inferences is not as straightforward as in the case of the GLM because ICA does not have a pre-specified design matrix, and both the time courses and the spatial maps need to be estimated for each subject (Calhoun and Adali, 2006). Several methods have been proposed to perform group ICA on fMRI data aggregated across multiple subjects. Calhoun et al. developed the GIFT approach (Calhoun et al., 2001), which consists of an initial data reduction through PCA for each subject, followed by the temporal concatenation of the reduced data across subjects, and a final ICA decomposition of the concatenated data. Back-reconstruction and statistical comparison of individual maps are performed following the ICA estimation. More recently, Beckmann and Smith (2005) proposed a tensor probabilistic ICA (PICA) that factors the multi-subject data as a trilinear combination of three outer products, representing the loadings in the temporal, spatial and subject domains, respectively. Tensor PICA is derived from parallel factor analysis (PARAFAC) (Harshman and Lundy, 1984) and is a natural extension of the two-way product factoring of single-subject ICA, which factors the data as a combination of two outer products of loadings in the temporal and spatial domains. Other group methods that have been proposed include the approach by Svensén and colleagues (Svensén et al., 2002), which concatenates multi-subject fMRI data in the spatial domain and extracts independent components with subject-specific spatial maps associated with common time courses across subjects. Schmithorst and Holland also proposed a group ICA method, which performs PCA reduction and ICA decomposition on the data averaged across subjects (Schmithorst and Holland, 2002).

Among the existing methods, the GIFT and tensor PICA are the most frequently used for group ICA of multi-subject fMRI data. These two methods share several similarities: both are spatial ICA approaches, i.e., they assume statistical independence of the spatial maps of the extracted components, and both estimate group spatial maps by performing ICA on the aggregated group data. On the other hand, the GIFT and tensor PICA have important distinctions. A major difference lies in the structure of the group spatio-temporal processes assumed in the ICA decomposition. The tensor PICA approach decomposes the multi-subject fMRI data as a trilinear combination of three outer products representing group spatial maps, group time courses and subject loadings. That is, subjects are associated with the same set of group spatial maps and time courses but differ in the magnitude of their loadings on the group spatio-temporal processes. The GIFT decomposition of the data, on the other hand, implies group spatial maps and subject-specific time courses. Furthermore, the estimation and statistical inference procedures of the GIFT and tensor PICA also differ, each being tailored to its specific model structure. The GIFT approach belongs to the classical noise-free ICA framework, which assumes that the data are completely characterized by the estimated sources and the mixing matrix. Based on the noise-free assumption, the GIFT reconstructs a subject's spatial map from the group ICA estimation by multiplying the inverse of the block of the mixing matrix corresponding to that subject with the subject's observed data. The statistical inference for spatial activation is then performed through a "random effects" inference on the individual maps.
The tensor PICA approach is a probabilistic ICA approach, which assumes that the observed data are a combination of a set of statistically non-Gaussian sources, mixed through the mixing matrix, and additive Gaussian noise. Voxel-wise Z-scores are calculated by dividing the estimated spatial maps by the noise standard deviation. The Z-scores are then modeled with a Gaussian/Gamma mixture, in which the Gaussian component represents the background noise and the Gamma distribution models brain activation. The statistical inference for spatial activation is performed by calculating the posterior probability of activation based on the mixture model for the voxel-wise Z-scores.

Since the existing group ICA models assume different group structures in the ICA decomposition and their estimation and statistical inference procedures are based on the particular model structure, these methods may produce different results when applied to the same fMRI data. It is desirable to develop a more general framework for group ICA that could accommodate varying structures of group spatio-temporal processes. It is also important to develop a statistical method to select an appropriate group structure for a particular data set.

An important task in analyzing multi-subject fMRI data is to characterize and compare brain activity between subjects from different groups, such as subjects with or without certain psychiatric conditions, or subjects assigned to different treatment arms. Current group ICA methods do not take subjects' group identification into account in the ICA decomposition. Group comparisons are typically performed as a post-estimation analysis, often by comparing independent components estimated separately in each group. There are several limitations associated with the existing group comparison approaches. First, when independent components are estimated separately in each group, matching components must first be identified across groups. Because, unlike principal components, independent components have no natural ordering, this is often done by identifying the independent components in each group that are associated with a pre-specified spatial or temporal template, which requires some prior information on the spatial distribution or temporal dynamics of the underlying source signal (Calhoun et al., 2004). Furthermore, spatial ICA may split a spatio-temporal structure into two or more temporally correlated components, which further complicates the selection of matching components across groups. Recently, Calhoun and colleagues proposed an approach that performs a group ICA on the combined data from both subject groups and then reconstructs subject-specific maps and time courses for group comparisons (Calhoun et al., 2008). This approach avoids the need for matching components between groups. However, the group identification is still not incorporated directly in the ICA decomposition. The second issue with the existing group ICA comparison approaches lies in the interdependence of group comparisons in the temporal and spatial domains.
Preferably, a group comparison in one domain should be performed while controlling for the group difference in the other domain. For example, the GLM estimates and compares group spatial maps by regressing each subject's data against the same temporal paradigm. Such an approach does not naturally apply to ICA because both the time courses and the spatial maps are estimated from the data. Within common group ICA comparison approaches (but see Calhoun et al., 2008), group independent components are estimated separately in each group and thus differ in both their spatial maps and their time courses. Hence, a group comparison in either the temporal or the spatial domain is confounded by the group difference in the other domain. These limitations arise mainly because the group comparisons are performed indirectly, as a second-stage analysis after estimating independent components separately in each group. It is therefore desirable to develop a group ICA method that directly incorporates subjects' group information in the ICA decomposition and thus provides a formal statistical method for group comparisons.

In this paper, we propose a unified framework for fitting group ICA models that are based on varying structures of group spatio-temporal processes. We consider a class of group spatial ICA models, assuming independence in the spatial domain. The proposed models decompose multi-subject fMRI data into group spatial maps and a group mixing matrix that reflects the assumed structure of group spatio-temporal processes, such as the trilinear product structure of tensor PICA. This class of models incorporates existing methods such as the GIFT and tensor PICA as special cases. Furthermore, by specifying an appropriate structure for the group mixing matrix, the proposed model could directly incorporate subjects’ group information in the ICA decomposition. For model estimation and statistical inference, we propose a maximum likelihood approach. Latent spatial source signals are modeled with Gaussian mixture distributions where the various Gaussian components model the probability density of background noise and BOLD effects respectively. We develop a modified EM algorithm to obtain the maximum likelihood estimates of the parameters in the group ICA models.

To evaluate the validity of various structures of group spatio-temporal processes for a particular data set, we present statistical tests for comparing group ICA models that assume different group structures. The tests are based on the difference in the maximized log-likelihoods under two ICA model structures and are known as likelihood ratio tests (LRT). The LRT can be used to assess the validity of a group structure in an fMRI data set. Furthermore, a statistical test based on the LRT is developed to examine differences between subject groups in the time courses associated with the spatial modes. We also develop a local LRT for comparing different group structures on a subset of independent components. The local LRT is motivated by the fact that ICA components extracted from fMRI data often reflect different kinds of signals, including task-related, transiently task-related, physiology-related and artifact-related ones (Calhoun and Adali, 2006; McKeown and Sejnowski, 1998). Given the varying characteristics of ICA components, it may be desirable to impose a specific group structure only on components that are of interest or expected to have particular properties, while not imposing it on components whose properties are unknown or unlikely to conform to the chosen structure. Another motivation for the local LRT is that we may be interested only in a subset of the signals that are relevant to the study objectives. The local LRT provides a more precise evaluation of an assumed group structure on the selected components.

The remainder of this paper is organized as follows. In the Methods section, we introduce a class of group ICA models and show that existing group ICA models such as the GIFT and tensor PICA can be viewed as special cases within this class. We then present a maximum likelihood (ML) approach with a modified Expectation-Maximization (EM) algorithm for the estimation of the proposed class of models. To compare group ICA models, we introduce likelihood ratio tests and show how to use the LRT to assess the goodness-of-fit of a group structure in a data set and to test group differences between subject groups. The Results section evaluates the performance of the proposed method on simulated data and illustrates its application to real fMRI data from a study investigating changes in neural processing associated with the regular practice of Zen meditation. Finally, a concluding section summarizes and provides further discussion of the presented method and findings.

Methods

In this section, we first describe the class of group ICA models that is able to subsume varying structures of the group spatio-temporal processes. We then present the maximum likelihood approach for model estimation and likelihood ratio tests for performing model comparisons.

A class of group ICA models

In the fMRI application of ICA, the data is usually decomposed as a product of spatial maps and associated time courses. Statistical independence is assumed for either the spatial maps or the time courses, leading to the terminology of spatial ICA or temporal ICA, respectively (Calhoun and Adali, 2006). For fMRI data, spatial ICA has become dominant because the spatial independence assumption is well suited to the sparse distributed nature of the spatial pattern for most cognitive activation paradigms (McKeown and Sejnowski, 1998). In this paper, we also assume spatial independence in our group ICA model. To set notation, let i = 1,…, N index subjects, s = 1,…,T index time points, and v = 1,…,V index voxels. Let X_i be the T ×V matrix representing the observed fMRI data from subject i. We consider the following group ICA model

X = M S + E,

(1)

where $X = {[X_{1}^{t}, \dots, X_{N}^{t}]}^{t}$ is the NT×V group data matrix formed by concatenating the N subjects' data in the temporal domain, S is the q×V matrix containing q statistically independent spatial maps in its rows, $M = {[A_{1}^{t}, \dots, A_{N}^{t}]}^{t}$ is the NT×q group mixing matrix, where A_i is the T×q submatrix corresponding to the i th subject, and $E = {[E_{1}^{t}, \dots, E_{N}^{t}]}^{t}$ is the NT×V noise matrix, where E_i is the T×V noise matrix corresponding to the i th subject. Define e_v as the v th column of E, representing the N subjects' noise at voxel v. The noise is assumed to follow a zero-mean multivariate Gaussian distribution within each subject and to be independent across subjects, i.e., e_v ~ MVN(0, I_N ⊗ Σ_v), where Σ_v is the T×T error covariance matrix for the v th voxel. The ICA model in (1) is referred to as noisy or probabilistic ICA (Beckmann and Smith, 2004; Hyvarinen et al., 2001) since it includes a Gaussian noise term to account for background noise that is not represented in the source signals. In noisy ICA, the noise term is generally assumed to be additive and Gaussian because structured noise usually appears in the data as structured non-Gaussian variability and hence can be modeled as one of the source signals (Beckmann and Smith, 2004; Beckmann and Smith, 2005; Hyvarinen, 1998; Hyvarinen et al., 2001). Compared with classical noise-free ICA, the noisy ICA model can help address the issue of overfitting and provides a convenient framework for formal statistical tests of significance for source signal estimates (Beckmann and Smith, 2004).
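To make the shapes in (1) concrete, the following Python/NumPy sketch simulates multi-subject data from the model. It is an illustration, not the authors' code: the helper `simulate_group_data` is hypothetical, the Laplace-distributed sources merely stand in for non-Gaussian spatial maps, and the noise is simplified to Σ_v = σ²I_T at every voxel, a special case of the voxel-specific covariance assumed above.

```python
import numpy as np

def simulate_group_data(A_list, S, noise_sd, seed=0):
    """Hypothetical helper: simulate X = M S + E for N subjects.

    A_list   : list of N subject mixing matrices A_i, each T x q
    S        : q x V matrix whose rows are the spatial maps
    noise_sd : noise standard deviation, i.e. Sigma_v = noise_sd**2 * I_T (assumption)
    """
    rng = np.random.default_rng(seed)
    M = np.vstack(A_list)                                          # NT x q group mixing matrix
    E = noise_sd * rng.standard_normal((M.shape[0], S.shape[1]))   # NT x V, independent across subjects
    return M @ S + E, M

T, q, V, N = 20, 2, 500, 3
rng = np.random.default_rng(1)
S = rng.laplace(size=(q, V))                       # super-Gaussian stand-in for spatial maps
A_list = [rng.standard_normal((T, q)) for _ in range(N)]
X, M = simulate_group_data(A_list, S, noise_sd=0.5)
print(X.shape, M.shape)                            # (60, 500) (60, 2)
```

Rows i·T through (i+1)·T − 1 of X (counting from 0) hold subject i's block X_i = A_i S + E_i.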

Our group ICA model decomposes the multi-subject fMRI data as the product of the group mixing matrix M and the group spatial map matrix S. The group spatial map matrix S is estimated by aggregating information from all subjects’ data and represents the group spatial patterns of brain function. The group mixing matrix M is formed by concatenating the submatrices A_i which represent the mixing processes for the i th subject (i = 1,…, N). The relationship between the subject mixing matrices reflects the nature of the between-subject heterogeneity in the modeled neural activity. Therefore, by specifying different kinds of regularities across subjects for the group mixing matrix, we obtain group ICA models with varying structures of the group spatio-temporal processes. In the following, we consider several special cases within this class of group ICA models.

Connection with existing group ICA methods

By specifying an appropriate structure for the group mixing matrix, the class of group ICA models in (1) relates naturally to some of the existing group ICA methods. For example, the group structure assumed in the GIFT approach is equivalent to the setting of

M = {[A_{1}^{t}, \dots, A_{N}^{t}]}^{t},

(2)

where A_i is the T×q submatrix corresponding to the i th subject. Since there are no restrictions on the relationship between the subject submatrices, the group structure of the GIFT corresponds to the most general model within our class of group ICA models. Note, however, that the GIFT model differs from the proposed group ICA model in that the former is a noise-free classical ICA model whereas the latter includes a noise term.

The tensor PICA is another special case within our class of group ICA models. The trilinear product model of the tensor PICA is equivalent to assuming the following Khatri-Rao product structure (Bro, 1998) for the group mixing matrix,

M = C | \otimes | A

(3)

where C is the N×q subject loading matrix with c_iℓ representing the i th subject's loading on the ℓ th independent component (i = 1,…, N, ℓ = 1,…, q), and A is the T×q matrix containing the group time courses associated with the q independent components. The Khatri-Rao product C|⊗|A is the TN×q matrix formed by stacking N copies of A, the i th copy being column-wise scaled by the i th row C_i· of the subject loading matrix, i.e., $C|\otimes|A = [(A\,diag(C_{1·}))^{t}, \dots, (A\,diag(C_{N·}))^{t}]^{t}$ (Bro, 1998). In other words, we assume that each subject's mixing matrix is equal to the common mixing matrix A scaled by the subject's loading vector, i.e., $A_{i} = A\,diag(C_{i·})$.
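The Khatri-Rao structure in (3) is easy to check numerically. The sketch below (NumPy; the function name `khatri_rao` is our own) builds C|⊗|A as a column-wise Kronecker product and confirms that the i th T×q block equals A diag(C_i·).

```python
import numpy as np

def khatri_rao(C, A):
    """Column-wise Kronecker (Khatri-Rao) product C |x| A (hypothetical helper name).

    C : N x q subject loadings; A : T x q group time courses.
    Returns the NT x q matrix whose i-th T x q block is A @ diag(C[i]).
    """
    N, q = C.shape
    T, q_a = A.shape
    if q != q_a:
        raise ValueError("C and A must have the same number of columns")
    # out[i, t, l] = C[i, l] * A[t, l]; flattening (i, t) stacks the subject blocks
    return np.einsum("il,tl->itl", C, A).reshape(N * T, q)

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))    # T = 5 time points, q = 3 components
C = rng.standard_normal((4, 3))    # N = 4 subjects
M = khatri_rao(C, A)
print(M.shape)                                   # (20, 3)
print(np.allclose(M[5:10], A @ np.diag(C[1])))   # True: the second subject's block
```

SciPy also provides `scipy.linalg.khatri_rao(a, b)`, which computes the same column-wise Kronecker product when called with (C, A).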

Extended group tensor model

In many imaging studies, it is desirable to describe and compare the spatio-temporal processes related to brain activity among subject groups with different characteristics or treatment assignments. Group comparisons are more challenging with ICA than with the standard GLM analysis. In GLM analysis, the subjects’ group identifying labels are incorporated into the design matrix and group-specific estimates of brain activations are readily derived. For ICA, the group information is generally not directly taken into account in the extraction of the independent components and group-specific independent components are typically estimated separately in each group.

Within the model framework of (1), we propose a group ICA model that incorporates the subjects' group information in the structure of the group mixing matrix and thus factors the group labels into the ICA decomposition of the multi-group data. For illustration purposes, we present the model for a two-group scenario, but the model can easily be generalized to more than two groups. Suppose we collected fMRI data from two groups of subjects. Let $X_{i}^{(g)}$ (i = 1,…,N_g) denote the T×V fMRI data matrix from subject i in group g (g = 1,2). Let $X^{(g)} = {[X_{1}^{{(g)}^{t}}, \dots, X_{N_{g}}^{{(g)}^{t}}]}^{t}$ be the N_gT×V data matrix obtained by temporally concatenating the data from the subjects in group g (g = 1,2), and let $X = [X^{(1)^{t}}, X^{(2)^{t}}]^{t}$ be the (N₁ + N₂)T×V data matrix obtained by stacking the data from both groups. We propose a group ICA model within the model framework (1) with the group mixing matrix specified as,

M = \begin{pmatrix} C^{(1)} |\otimes| A^{(1)} \\ C^{(2)} |\otimes| A^{(2)} \end{pmatrix},

(4)

Here A^(g) (g = 1,2) is the T×q matrix containing the group time courses associated with the q independent components for group g and C^(g) is the N_g × q subject loading matrix for group g where $c_{i ℓ}^{(g)}$ represents the loading of the i th subject in group g on the ℓ th independent component (i = 1,…,N_g , ℓ = 1,…,q). With the group mixing matrix (4), we assume a trilinear product structure for the group spatio-temporal processes within each group, with representative time courses being allowed to be different between the two groups. The proposed model can be viewed as an extension of the tensor PICA model for multi-group data, allowing us to capture different temporal responses between subject groups.
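The group mixing matrix (4) is a vertical stack of per-group Khatri-Rao blocks. The hypothetical NumPy sketch below (function names are ours) builds it for any number of groups, which also makes the generalization beyond two groups explicit.

```python
import numpy as np

def khatri_rao(C, A):
    """Hypothetical helper: NT x q matrix whose i-th T x q block is A @ diag(C[i])."""
    return np.einsum("il,tl->itl", C, A).reshape(C.shape[0] * A.shape[0], C.shape[1])

def group_tensor_mixing(C_groups, A_groups):
    """Eq. (4) generalized to G groups: stack C^(g) |x| A^(g) for g = 1..G."""
    return np.vstack([khatri_rao(C, A) for C, A in zip(C_groups, A_groups)])

rng = np.random.default_rng(0)
T, q = 6, 2
A1, A2 = rng.standard_normal((T, q)), rng.standard_normal((T, q))   # group time courses
C1, C2 = rng.random((3, q)), rng.random((4, q))                       # N1 = 3, N2 = 4 subjects
M = group_tensor_mixing([C1, C2], [A1, A2])
print(M.shape)                                                        # (42, 2)
```

If A^(1) = A^(2), this M reduces to the Tensor Model with the two loading matrices stacked, so the Tensor Model is nested within the group tensor model; nesting of this kind is what a likelihood ratio comparison between the two structures relies on.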

To provide a quick overview of the aforementioned special cases of the proposed group ICA models, we summarize their key properties and the structure each imposes on the group mixing matrix in Table 1 below. We will henceforth refer to these models by the names given in Table 1.

Table 1.

Descriptions of some important cases of the proposed class of group ICA models.

Model Name	Definition equation	Properties	Defined structure for the group mixing matrix	Connection with existing methods
Tensor Model	Eq. (3)	Each subject's mixing matrix is equal to the common group time course matrix A scaled by the subject's loadings, which means that the subjects' time courses associated with an IC are scaled repetitions of the common group time course.	$M = C |\otimes| A$	Tensor PICA (Beckmann and Smith, 2005)
Group Tensor Model	Eq. (4)	Subjects within a group are associated with a common group time course matrix, with representative time courses being allowed to differ between the two groups. The model can be generalized to multiple groups.	$M = \begin{pmatrix} C^{(1)} |\otimes| A^{(1)} \\ C^{(2)} |\otimes| A^{(2)} \end{pmatrix}$	Extension of Tensor PICA for multi-group data
Full Model	Eq. (2)	The most general model within our class of group ICA models, with no restrictions on the relationship between the subject mixing matrices.	$M = {[A_{1}^{t}, \dots, A_{N}^{t}]}^{t}$	GIFT (Calhoun et al., 2001)


Estimation

The proposed group ICA models are estimated using maximum likelihood estimation. The maximum likelihood (ML) approach has been used for single-subject ICA analysis (Belouchrani and Cardoso, 1995; Cichocki et al., 1998; Hyvarinen, 1998; Moulines et al., 1997). Here, we extend the ML method for ICA decomposition of multi-subject fMRI data. The likelihood function is constructed based on a data augmentation scheme where unobserved spatial source signals are treated as missing data. The distribution of the source signals is modeled by a mixture of Gaussian distributions. We developed a modified expectation-maximization (EM) algorithm to obtain the maximum likelihood estimates for these sources.

The complete likelihood function

We first construct the likelihood function for the group ICA models in (1). Given the spatial signal s_v, the observed multi-subject fMRI data x_v at voxel v follow a multivariate Gaussian distribution with mean Ms_v and covariance I_N ⊗ Σ_v. Therefore, the complete log-likelihood for the observed data and the latent spatial signals in the group ICA model (1) is as follows

log p (X, S; θ) = \sum_{v = 1}^{V} log Ψ (x_{v}; M s_{v}, I_{N} \otimes Σ_{v}) + \sum_{v = 1}^{V} log f (s_{v}; φ),

(5)

where Ψ(·) is the multivariate Gaussian density function, f(s_v;φ) represents the probability density function (pdf) of the spatial signal s_v, with φ denoting the parameters of the pdf, and θ = (M,{Σ_v},φ) represents all the parameters in the likelihood function. The complete log-likelihood in (5) is composed of two parts. The first part is the log-likelihood of the observed fMRI data given the parameters and the spatial signals, and the second part is the log-likelihood of the latent spatial signals under their prior distribution. In the following, we discuss the choice of the distribution function for the spatial signals.
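When S is known, as in a simulation, the complete log-likelihood (5) can be evaluated directly, which is handy for checking an implementation. In this minimal sketch, `mvn_logpdf`, `complete_loglik` and the `log_prior` callable are hypothetical names; the per-voxel loop favors clarity, and a practical implementation would vectorize it.

```python
import numpy as np

def mvn_logpdf(x, mean, cov):
    """Log density of the multivariate Gaussian N(mean, cov) at x."""
    resid = x - mean
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (x.size * np.log(2 * np.pi) + logdet + resid @ np.linalg.solve(cov, resid))

def complete_loglik(X, S, M, Sigmas, N, log_prior):
    """Eq. (5): sum_v [log Psi(x_v; M s_v, I_N (x) Sigma_v) + log f(s_v; phi)].

    Sigmas    : sequence of V (T x T) noise covariances, one per voxel
    log_prior : hypothetical callable returning log f(s_v; phi) for a length-q vector
    """
    total = 0.0
    for v in range(X.shape[1]):
        cov = np.kron(np.eye(N), Sigmas[v])   # subjects independent, Sigma_v within subject
        total += mvn_logpdf(X[:, v], M @ S[:, v], cov) + log_prior(S[:, v])
    return total

def std_normal_prior(s):
    """Illustrative prior only: independent N(0, 1) sources."""
    return -0.5 * (s @ s) - 0.5 * s.size * np.log(2 * np.pi)

rng = np.random.default_rng(0)
N, T, q, V = 2, 3, 2, 4
M = rng.standard_normal((N * T, q))
S = rng.standard_normal((q, V))
X = M @ S + 0.1 * rng.standard_normal((N * T, V))
ll = complete_loglik(X, S, M, [np.eye(T)] * V, N, std_normal_prior)
```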

Gaussian mixtures for the latent spatial source signals

We follow two major criteria in selecting an appropriate distribution function for the latent spatial signals. First, the distribution should be able to capture the salient features of the underlying spatial source signals; for example, in fMRI most task-related signals are usually attributable to a small percentage of brain voxels, whereas the remaining brain areas are thought to exhibit task-unrelated background fluctuations (Biswal and Ulmer, 1999; Suzuki et al., 2001). Second, due to the enormous amount of data in multi-subject fMRI studies, it is desirable to select a distribution that offers numerical efficiency in the estimation of the model parameters. Based on these two criteria, we model the spatial source signals with a mixture of Gaussian distributions. The Gaussian mixture distribution is well suited to capture different patterns of neural activity across the brain (Xu et al., 1997). Furthermore, the Gaussian mixture offers tractable mathematical properties that facilitate model estimation, as detailed below.

Denote with s_vℓ the spatial source signal for the ℓ th (ℓ = 1,…, q) independent component at voxel v. Under the Gaussian mixture distribution, the pdf of s_vℓ is

f (s_{vℓ}; φ_{ℓ}) = \sum_{j = 1}^{m} π_{ℓj} Ψ (s_{vℓ}; μ_{ℓj}, σ_{ℓj}^{2}),

(6)

where m is the number of Gaussian density components in the mixture. Our experiments and previous work (Beckmann and Smith, 2004) suggest that two to three mixture components are typically appropriate to capture the distribution of the spatial signals. $Ψ (s_{vℓ}; μ_{ℓj}, σ_{ℓj}^{2})$ is the Gaussian density function with mean $μ_{ℓj}$ and variance $σ_{ℓj}^{2}$, and $π_{ℓj}$ is the proportion of the j th Gaussian density, which satisfies $0 \leq π_{ℓj} \leq 1$ and $\sum_{j = 1}^{m} π_{ℓj} = 1$. $φ_{ℓ} = (\{π_{ℓj}\}, \{μ_{ℓj}\}, \{σ_{ℓj}^{2}\})$ represents the parameters of the Gaussian mixture distribution for the ℓ th independent spatial source signal. To facilitate the later derivation with the mixture distribution, it is helpful to introduce a latent class variable $w_{vℓ}$ taking values in {1,…,m} with probabilities $p(w_{vℓ} = j) = π_{ℓj}$ (1 ≤ j ≤ m). The latent class variable $w_{vℓ}$ indicates the membership of voxel v in the mixture components, i.e., when $w_{vℓ} = j$, the conditional distribution of $s_{vℓ}$ corresponds to the j th Gaussian component in the mixture,

p (s_{vℓ} | w_{vℓ} = j) = Ψ (s_{vℓ}; μ_{ℓj}, σ_{ℓj}^{2}), 1 \leq j \leq m

(7)

Due to the statistical independence between the components, the joint distribution of the q independent components at voxel v equals the product of the pdfs of the individual components, i.e.,

f (s_{v}; φ) = \prod_{ℓ = 1}^{q} f (s_{vℓ}; φ_{ℓ}),

(8)

where s_v = [s_v1, …, s_vq]^t and φ = (φ₁, …, φ_q). Substituting the Gaussian mixture density (6) into (8), and (8) into the complete log-likelihood function (5), gives the complete log-likelihood under the Gaussian mixture source model.
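A short sketch of the source model in (6) to (8): evaluating the mixture density, forming the joint density at a voxel as a product over components, and sampling sources through the latent class variable. The function names are hypothetical, and the parameter values (a dominant zero-mean "background" component and a small positive "activation" component) are illustrative assumptions, not estimates from this study.

```python
import numpy as np

def gm_pdf(s, pi, mu, var):
    """Eq. (6): univariate Gaussian-mixture density at s (scalar or array)."""
    s = np.asarray(s, dtype=float)[..., None]          # extra axis for the m components
    comp = np.exp(-0.5 * (s - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
    return (pi * comp).sum(axis=-1)

def joint_source_pdf(s_v, params):
    """Eq. (8): product of the q component densities at one voxel."""
    return np.prod([gm_pdf(s, *p) for s, p in zip(s_v, params)])

def sample_sources(params, V, rng):
    """Draw V voxels per component through the latent class w (Eq. (7))."""
    S = np.empty((len(params), V))
    for ell, (pi, mu, var) in enumerate(params):
        w = rng.choice(len(pi), size=V, p=pi)          # w_vl = j with probability pi_lj
        S[ell] = rng.normal(mu[w], np.sqrt(var[w]))    # s_vl | w_vl = j ~ N(mu_lj, var_lj)
    return S

# Illustrative parameters: background N(0, 1) with weight 0.9, "activation" N(3, 1) with 0.1
bg_act = (np.array([0.9, 0.1]), np.array([0.0, 3.0]), np.array([1.0, 1.0]))
params = [bg_act, bg_act]                              # q = 2 components
S = sample_sources(params, V=100_000, rng=np.random.default_rng(0))
```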

Modified E-M algorithm

When the likelihood function involves unobserved latent variables, the expectation-maximization (EM) algorithm (Dempster et al., 1977; Dempster et al., 1981) is often used to find maximum likelihood estimates of the parameters. The EM algorithm is an iterative algorithm that alternates between an expectation step (E-step) and a maximization step (M-step). In the E-step, one computes the expectation of the complete log-likelihood with respect to the posterior distribution of the latent variables S given the observed data and the current parameter estimates θ̂^(k). In the M-step, the updated parameter estimates θ̂^(k+1) are computed by maximizing the expected log-likelihood found in the E-step. The parameter estimates θ̂^(k+1) found in the M-step are then used to begin another E-step, and the process is iterated until convergence, i.e., until the parameter estimates θ̂^(k) and θ̂^(k+1) from two consecutive iterations are sufficiently close.
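As a generic illustration of the E-step/M-step alternation, the sketch below runs standard EM for a univariate Gaussian mixture, the same family used for the sources in (6). It is not the modified EM algorithm developed here: in this toy problem the values being modeled are observed directly, so both steps have closed forms. The function name and simulated data are hypothetical.

```python
import numpy as np

def em_gaussian_mixture(s, pi, mu, var, tol=1e-8, max_iter=1000):
    """Standard EM for a univariate Gaussian mixture fitted to observed values s."""
    pi, mu, var = (np.array(a, dtype=float) for a in (pi, mu, var))
    for _ in range(max_iter):
        # E-step: posterior membership probabilities under the current estimates
        dens = pi * np.exp(-0.5 * (s[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: closed-form maximizers of the expected complete log-likelihood
        nk = resp.sum(axis=0)
        new_pi = nk / s.size
        new_mu = (resp * s[:, None]).sum(axis=0) / nk
        new_var = (resp * (s[:, None] - new_mu) ** 2).sum(axis=0) / nk
        # Convergence check: consecutive estimates are sufficiently close
        change = max(np.abs(new_pi - pi).max(), np.abs(new_mu - mu).max(),
                     np.abs(new_var - var).max())
        pi, mu, var = new_pi, new_mu, new_var
        if change < tol:
            break
    return pi, mu, var

rng = np.random.default_rng(0)
labels = rng.random(20_000) < 0.2                      # 20% simulated "activated" voxels
s = np.where(labels, rng.normal(4.0, 1.0, 20_000), rng.normal(0.0, 1.0, 20_000))
pi_hat, mu_hat, var_hat = em_gaussian_mixture(s, [0.5, 0.5], [-1.0, 5.0], [1.0, 1.0])
```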

Since the log-likelihood in (5) involves unobserved spatial source signals S , we consider the EM framework for model estimation. Due to the special features of the imaging data and our proposed group ICA model, we develop a modified EM algorithm.

At the E-step, the expected log-likelihood function is computed as

Q (θ | {\hat{θ}}^{(k)}) = E_{S | X, {\hat{θ}}^{(k)}} log p (X, S; θ) = \int log p (X, S; θ) Pr (S | X, {\hat{θ}}^{(k)}) d S

(9)

To calculate the expectation, we first need to derive the posterior distribution of the latent spatial signals. Given the assumed mixture of Gaussian distributions for s_v and the multivariate Gaussian distribution of the data, we show that the posterior distribution of s_v is again a mixture of Gaussian distributions,

Pr (s_{v} | x_{v}, θ) = \sum_{r = 1}^{m^{q}} p_{v}^{(r)} Ψ (s_{v}; α_{v}^{(r)}, Ω_{v}^{(r)}),

(10)

where $w^{(r)} = (w_{1}^{(r)}, \dots, w_{q}^{(r)}) \in \{1, \dots, m\}^{q}$ denotes the r th realization of the latent labels $w_{v} = (w_{v1}, \dots, w_{vq})$, r = 1,…,m^q, and

μ^{(r)} = [μ_{1 w_{1}^{(r)}}, \dots, μ_{q w_{q}^{(r)}}]^{T}, Γ^{(r)} = diag (σ_{1 w_{1}^{(r)}}^{2}, \dots, σ_{q w_{q}^{(r)}}^{2}),

Ω_{v}^{(r)} = {[M^{T} (I_{N} \otimes {Σ_{v}}^{- 1}) M + Γ^{{(r)}^{- 1}}]}^{- 1},

α_{v}^{(r)} = Ω_{v}^{(r)} [M^{T} (I_{N} \otimes {Σ_{v}}^{- 1}) x_{v} + Γ^{{(r)}^{- 1}} μ^{(r)}],

p_{v}^{(r)} = z_{v}^{(r)} / \sum_{r' = 1}^{m^{q}} z_{v}^{(r')}, with

z_{v}^{(r)} = (\prod_{ℓ = 1}^{q} \frac{π_{ℓ w_{ℓ}^{(r)}}}{σ_{ℓ w_{ℓ}^{(r)}}}) \, | Ω_{v}^{(r)} |^{\frac{1}{2}} \exp [- \frac{1}{2} (μ^{{(r)}^{T}} Γ^{{(r)}^{- 1}} μ^{(r)} - α_{v}^{{(r)}^{T}} Ω_{v}^{{(r)}^{- 1}} α_{v}^{(r)})].
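Because (10) enumerates all m^q label configurations, it can be computed by brute force when q is small. The following NumPy sketch (function names are hypothetical) evaluates p_v^(r), α_v^(r) and Ω_v^(r) as defined above, computing z_v^(r) on the log scale to avoid underflow, and then forms the posterior mean and variances of s_v. The m^q enumeration grows quickly with q, so this direct approach is only practical for a modest number of components.

```python
import itertools
import numpy as np

def source_posterior(x_v, M, Sigma_v, N, pi, mu, var):
    """Eq. (10): posterior of s_v as a mixture over all m**q label vectors w^(r).

    pi, mu, var : q x m arrays; row l holds the mixture parameters of component l.
    Returns p (m**q,), alpha (m**q, q) and Omega (m**q, q, q).
    """
    q, m = pi.shape
    W = np.kron(np.eye(N), np.linalg.inv(Sigma_v))     # I_N (x) Sigma_v^{-1}
    MtWM, MtWx = M.T @ W @ M, M.T @ W @ x_v
    rows = np.arange(q)
    log_z, alphas, omegas = [], [], []
    for w in itertools.product(range(m), repeat=q):     # r-th label realization
        idx = np.array(w)
        mu_r, gam_r = mu[rows, idx], var[rows, idx]     # mu^(r) and diagonal of Gamma^(r)
        Omega = np.linalg.inv(MtWM + np.diag(1.0 / gam_r))
        alpha = Omega @ (MtWx + mu_r / gam_r)
        _, logdet_Omega = np.linalg.slogdet(Omega)
        # log z^(r): sum(log pi - log sigma) + 0.5 log|Omega| - 0.5 (mu'G^-1 mu - a'O^-1 a)
        log_z.append(np.sum(np.log(pi[rows, idx]) - 0.5 * np.log(gam_r))
                     + 0.5 * logdet_Omega
                     - 0.5 * (mu_r @ (mu_r / gam_r) - alpha @ np.linalg.solve(Omega, alpha)))
        alphas.append(alpha)
        omegas.append(Omega)
    log_z = np.array(log_z)
    p = np.exp(log_z - log_z.max())                     # normalize stably
    return p / p.sum(), np.array(alphas), np.array(omegas)

def posterior_moments(p, alphas, omegas):
    """Posterior mean of s_v and posterior variances of its q entries."""
    mean = p @ alphas
    second = np.einsum("r,rij->ij", p, omegas + alphas[:, :, None] * alphas[:, None, :])
    return mean, np.diag(second - np.outer(mean, mean))

rng = np.random.default_rng(0)
M = rng.standard_normal((8, 2))                         # N = 2 subjects, T = 4, q = 2
pi = np.array([[0.9, 0.1], [0.9, 0.1]])
mu = np.array([[0.0, 3.0], [0.0, 3.0]])
var = np.array([[1.0, 1.0], [1.0, 1.0]])
p, alphas, omegas = source_posterior(rng.standard_normal(8), M, np.eye(4), 2, pi, mu, var)
mean, variances = posterior_moments(p, alphas, omegas)
```

A useful cross-check is that p_v^(r) must also equal the normalized π-weighted marginal density of x_v given w^(r), which is Gaussian with mean Mμ^(r) and covariance MΓ^(r)M^T + I_N ⊗ Σ_v.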

The expectation of the log-likelihood function is then obtained by integrating the log-likelihood function over the Gaussian mixture posterior distribution. However, the integral in (9) does not have a tractable form. When the E-step is intractable, typical solutions include the Monte Carlo EM (MCEM) algorithm (Wei and Tanner, 1990) or the stochastic EM algorithm (Delyon et al., 1999), which approximate the intractable expectation by simulating random variables from the posterior distribution. Given the large number of voxels in imaging data, these simulation-based algorithms are computationally impractical here. As an alternative, we propose to approximate the intractable expectation Q(θ | θ̂^(k)) using a second-order Taylor expansion, which gives the following explicit form for Q(θ | θ̂^(k)),

\begin{aligned} Q (θ | {\hat{θ}}^{(k)}) = \sum_{v = 1}^{V} \Big\{ & - \frac{1}{2} \Big[ N \log | Σ_{v} | + {x_{v}}^{T} (I_{N} \otimes {Σ_{v}}^{- 1}) x_{v} - 2 {x_{v}}^{T} (I_{N} \otimes Σ_{v}^{- 1}) M {\hat{\bar{s}}}_{v} \\ & + \sum_{r = 1}^{m^{q}} {\hat{p}}_{v}^{(r)} \big( \mathrm{tr} (M^{T} (I_{N} \otimes {Σ_{v}}^{- 1}) M {\hat{Ω}}_{v}^{(r)}) + {\hat{α}}_{v}^{{(r)}^{T}} M^{T} (I_{N} \otimes Σ_{v}^{- 1}) M {\hat{α}}_{v}^{(r)} \big) \Big] \\ & + \sum_{ℓ = 1}^{q} \Big( \log f ({\hat{\bar{s}}}_{vℓ}; φ) + \frac{1}{2} \frac{\widehat{\mathrm{var}} (s_{vℓ})}{f ({\hat{\bar{s}}}_{vℓ}; φ)} \Big[ - \frac{{\big(\sum_{j = 1}^{m} \frac{π_{ℓj}}{σ_{ℓj}^{2}} Ψ ({\hat{\bar{s}}}_{vℓ}; μ_{ℓj}, σ_{ℓj}^{2}) (μ_{ℓj} - {\hat{\bar{s}}}_{vℓ})\big)}^{2}}{f ({\hat{\bar{s}}}_{vℓ}; φ)} \\ & + \sum_{j = 1}^{m} \frac{π_{ℓj}}{σ_{ℓj}^{2}} Ψ ({\hat{\bar{s}}}_{vℓ}; μ_{ℓj}, σ_{ℓj}^{2}) \Big( \frac{{(μ_{ℓj} - {\hat{\bar{s}}}_{vℓ})}^{2}}{σ_{ℓj}^{2}} - 1 \Big) \Big] \Big) \Big\} \end{aligned}

(11)

Because our modified E-step does not require iterative numerical simulation, it is computationally more efficient than simulation-based alternatives such as MCEM and stochastic EM.
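The source-density terms in (11) come from the second-order expansion E[log f(s)] ≈ log f(s̄) + ½ var(s) (log f)''(s̄), with (log f)'' = (f'' − f'²/f)/f for the univariate Gaussian mixture of each source. The sketch below evaluates that approximation for one voxel and one source; the function names are hypothetical and φ is represented by the arrays `pi`, `mu`, `sigma2`.

```python
import numpy as np


def mixture_terms(s, pi, mu, sigma2):
    """Univariate Gaussian-mixture density f(s) and its first two derivatives."""
    psi = np.exp(-0.5 * (s - mu) ** 2 / sigma2) / np.sqrt(2 * np.pi * sigma2)
    f = np.sum(pi * psi)
    f1 = np.sum(pi / sigma2 * psi * (mu - s))                      # f'(s)
    f2 = np.sum(pi / sigma2 * psi * ((mu - s) ** 2 / sigma2 - 1))  # f''(s)
    return f, f1, f2


def taylor_expected_log_density(s_bar, var_s, pi, mu, sigma2):
    """Second-order approximation of E[log f(s)] around the posterior mean s_bar,
    matching the l-th summand of (11)."""
    f, f1, f2 = mixture_terms(s_bar, pi, mu, sigma2)
    return np.log(f) + 0.5 * var_s / f * (-(f1 ** 2) / f + f2)
```

The correction term vanishes when the posterior variance is zero, and it is exact for a single Gaussian component, because log f is then quadratic.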

At the M-step of the standard EM algorithm, the updated estimates are obtained by maximizing the expected log-likelihood function computed in the E-step,

{\hat{θ}}^{(k + 1)} = arg \max_{θ} Q (θ | {\hat{θ}}^{(k)})

(12)

In our class of group ICA models, the group mixing matrix M may have a specific structure, depending on the assumption chosen for the group spatio-temporal processes. For example, the Tensor Model assumes that M can be written as the Khatri-Rao product of the subject loading matrix and the group time course matrix. To estimate a group mixing matrix with a specific structure, an additional step is performed after the M-step of the standard EM algorithm to project the estimated group mixing matrix onto the space of the specified model structure. Continuing with our example, to estimate the group mixing matrix for the Tensor Model such that M̂ = Ĉ |⊗| Â, a rank-1 approximation can be employed (Beckmann and Smith, 2005).
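A minimal sketch of the Khatri-Rao structure and its rank-1 projection follows, assuming the rows of M are ordered subject by subject, i.e. M = [A_1; …; A_N] with A_i = A diag(C_i·). Under that ordering, each column of M reshaped to an N × T matrix equals c_ℓ a_ℓ^T, so the best rank-1 approximation from an SVD gives the projection. The names `khatri_rao` and `project_to_tensor` are hypothetical.

```python
import numpy as np


def khatri_rao(C, A):
    """Column-wise Kronecker product: column l is kron(C[:, l], A[:, l]).

    With C (N x q) and A (T x q) the result is TN x q, and its i-th block of
    T rows equals A @ diag(C[i, :]), subject i's mixing matrix under the Tensor Model.
    """
    N, q = C.shape
    T = A.shape[0]
    return (C[:, None, :] * A[None, :, :]).reshape(N * T, q)


def project_to_tensor(M, N, T):
    """Project each column of M onto the Tensor Model via a rank-1 SVD approximation."""
    q = M.shape[1]
    A_hat, C_hat = np.empty((T, q)), np.empty((N, q))
    for l in range(q):
        U, s, Vt = np.linalg.svd(M[:, l].reshape(N, T), full_matrices=False)
        C_hat[:, l] = U[:, 0] * s[0]   # scale absorbed into the subject loadings
        A_hat[:, l] = Vt[0]            # unit-norm group time course
    return C_hat, A_hat, khatri_rao(C_hat, A_hat)
```

Assigning the singular value to Ĉ is an arbitrary convention: A and C are identified only up to a per-component scale (and sign), so any split of s[0] yields the same projected M̂.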

The steps for the proposed modified E-M algorithm are summarized in the following:

1. Start with initial parameter values θ⁽⁰⁾ = (M⁽⁰⁾, {Σ_v⁽⁰⁾}, φ⁽⁰⁾). The initial values could be obtained by first analyzing the multi-subject fMRI data with existing group ICA software such as GIFT (http://icatb.sourceforge.net/) or FSL’s MELODIC routine (http://www.fmrib.ox.ac.uk/fsl/melodic).
2. Modified E-step: calculate Q(θ | θ̂^(k)) based on the Taylor expansion approximation (11).
3. M-step: update the parameter estimates by maximizing the expectation function obtained in step 2, i.e. θ̂^(k+1) = arg max_θ Q(θ | θ̂^(k)).
4. Project the estimated group mixing matrix M̂^(k+1) onto the space specified by the model structure. This is done by taking each column of M̂^(k+1), which corresponds to a particular independent component, and projecting it onto the specified space.
5. Iterate steps 2–4 until convergence.
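The control flow of steps 1–5 can be sketched as a generic loop. The callables `e_step`, `m_step`, `project` and `loglik` are hypothetical placeholders for the model-specific computations. The stopping rule (relative change in the marginal log-likelihood (17) below `tol`) is an assumption; the text only says "until convergence".

```python
from typing import Any, Callable, Tuple


def modified_em(theta0: Any,
                e_step: Callable[[Any], Any],
                m_step: Callable[[Any], Any],
                project: Callable[[Any], Any],
                loglik: Callable[[Any], float],
                tol: float = 1e-6,
                max_iter: int = 500) -> Tuple[Any, int]:
    """Iterate steps 2-4 until the relative change in log-likelihood is below tol."""
    theta, prev = theta0, loglik(theta0)          # step 1: initial values
    for it in range(1, max_iter + 1):
        stats = e_step(theta)                     # step 2: Taylor-approximated Q(.|theta)
        theta = project(m_step(stats))            # steps 3-4: maximize Q, then project M
        cur = loglik(theta)
        if abs(cur - prev) <= tol * (abs(prev) + tol):
            return theta, it                      # step 5: converged
        prev = cur
    return theta, max_iter
```

Because the projection in step 4 follows the unconstrained M-step, the combined update is not guaranteed to increase the likelihood monotonically, so monitoring (17) is a practical diagnostic rather than a guarantee.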

After obtaining the parameter estimates θ̂ = (M̂, {Σ̂_v}, φ̂) from the modified E-M algorithm, the spatial source signal and its variance can be estimated based on its posterior distribution (10),

{\hat{s}}_{v} = E (s_{v} | x_{v}, \hat{θ}) = \sum_{r = 1}^{m^{q}} {\hat{p}}_{v}^{(r)} {\hat{Ω}}_{v}^{(r)} ({\hat{M}}^{T} (I_{N} \otimes {\hat{Σ}}_{v}^{- 1}) x_{v} + {\hat{Γ}}^{{(r)}^{- 1}} {\hat{μ}}^{(r)}) = \sum_{r = 1}^{m^{q}} {\hat{p}}_{v}^{(r)} {\hat{α}}_{v}^{(r)},

(13)

\widehat{\mathrm{var}} (s_{v}) = \mathrm{Var} (s_{v} | x_{v}, \hat{θ}) = \sum_{r = 1}^{m^{q}} {\hat{p}}_{v}^{(r)} ({\hat{Ω}}_{v}^{(r)} + {\hat{α}}_{v}^{(r)} {\hat{α}}_{v}^{{(r)}^{T}}) - {\hat{s}}_{v} {\hat{s}}_{v}^{T} .

(14)
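Equations (13) and (14) are the mean and covariance of a Gaussian mixture (the law of total variance). A minimal sketch, assuming the posterior weights, component means and covariances are already stacked into arrays (the function name is hypothetical):

```python
import numpy as np


def posterior_moments(p, alphas, omegas):
    """Mean and covariance of a Gaussian mixture, as in eqs. (13)-(14).

    p: (R,) posterior weights p^(r); alphas: (R, q) means alpha^(r);
    omegas: (R, q, q) covariances Omega^(r).
    """
    s_hat = p @ alphas                                           # sum_r p_r alpha_r
    outer = np.einsum('ri,rj->rij', alphas, alphas)              # alpha_r alpha_r^T
    second_moment = np.einsum('r,rij->ij', p, omegas + outer)
    return s_hat, second_moment - np.outer(s_hat, s_hat)
```

A standardized map value for source ℓ would then be `s_hat[l] / np.sqrt(V[l, l])`, where `V` is the second returned value.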

The spatial IC maps can then be obtained by plotting either the raw IC estimates ŝ_v or the ICs standardized by their posterior standard deviations, ŝ_vℓ / s.d.(s_vℓ), to evaluate the relative activity of different brain regions in each signal.

An important goal in fMRI analysis is to identify "active" voxels, i.e. voxels that display significant changes of activity (positive or negative), typically task-related but not necessarily so (e.g., in resting-state studies). We model the spatial source signals using Gaussian mixtures, in which different Gaussian components model the probability density of background noise and of BOLD effects, respectively. In this scheme, the activation of a voxel can be evaluated by calculating the probability that the voxel belongs to the Gaussian component representing activation. The components that model activation and background noise can be identified by examining the estimated mean of each Gaussian: a Gaussian with mean close to 0 represents background noise, whereas Gaussians with means far from 0 represent activation corresponding to positive or negative BOLD effects. Suppose that the j₁th Gaussian component models the activation of interest in each signal. The probability of activation at voxel v in the ℓth signal can then be estimated as the posterior probability that the latent class variable w_vℓ points to the j₁th Gaussian component, i.e.,

\widehat{\Pr} (w_{vℓ} = j_{1} | \hat{θ}, X) = \sum_{r : w_{ℓ}^{(r)} = j_{1}} {\hat{p}}_{v}^{(r)} .

(15)

The spatial distribution of activated voxels can then be inferred by thresholding these posterior activation probabilities and plotting the result.
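Equation (15) marginalizes the configuration weights p̂_v^(r) over all configurations whose ℓth label equals j₁. A small sketch, assuming the weights are ordered like `itertools.product(range(m), repeat=q)` and that labels are 0-based (both are illustrative conventions, and the function name is hypothetical):

```python
import itertools
import numpy as np


def activation_probability(p, q, m, ell, j1):
    """Eq. (15): sum of p^(r) over configurations with label w_ell^(r) == j1.

    p: (m**q,) posterior configuration weights for one voxel, ordered like
    itertools.product(range(m), repeat=q); ell and j1 are 0-based indices.
    """
    configs = np.array(list(itertools.product(range(m), repeat=q)))
    return p[configs[:, ell] == j1].sum()
```

Thresholding this value across voxels (for example, at 0.5) gives the activation map described above; the threshold itself is an analyst's choice.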

Statistical tests between group ICA models

Based on the maximum likelihood method, we can compare various group ICA models using the likelihood ratio test (LRT). The LRT can be used to assess the goodness-of-fit of an assumed spatio-temporal structure in a group data set. Furthermore, we derive a statistical test based on the LRT to test group differences in the fMRI signal time course between subject groups.

Likelihood ratio test

The likelihood ratio test compares two nested models in which the simple model is a special case of the general model, in the sense that the general model differs from the simple model only by the addition of one or more parameters. The two nested models are framed as a null hypothesis H₀ and an alternative hypothesis H₁, where the simple model is represented by H₀ and the general model by the union of the null and alternative hypotheses, i.e. H₀ ∪ H₁. The likelihood ratio test statistic (LR) is twice the difference between the maximum log-likelihoods of the data under the general and the simple model. Under H₀, LR approximately follows a Chi-square distribution with degrees of freedom equal to the number of additional parameters in the general model. If LR exceeds the upper-α critical value of the Chi-square distribution with these degrees of freedom, the null hypothesis is rejected and we conclude that the general model is significantly better supported by the observed data than the simple model.

To perform the likelihood ratio test between nested group ICA models with different structures for the group mixing matrix M, we calculate the marginal likelihood of the observed data by integrating the complete likelihood function with respect to the density of the unobserved spatial source signals. The marginal likelihood for voxel v can be written as:

m (x_{v}; θ) = \int Ψ (x_{v}; M s_{v}, I_{N} \otimes Σ_{v}) f (s_{v}; φ) d s_{v} \propto | Σ_{v} |^{- N / 2} \exp \big(- \frac{1}{2} x_{v}^{T} (I_{N} \otimes {Σ_{v}}^{- 1}) x_{v}\big) \cdot \sum_{r = 1}^{m^{q}} z_{v}^{(r)} .

(16)
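The per-voxel marginal log-likelihood in (16) can be computed from the same quantities z_v^(r) used in the E-step. The sketch below, with a hypothetical function name, includes the normalizing constant −(TN/2) log 2π that (17) absorbs into c. It assumes m^q is small enough to enumerate and uses a log-sum-exp for stability.

```python
import itertools
import numpy as np


def voxel_marginal_loglik(x, M, Sigma_v, N, pi, mu, sigma2):
    """log m(x_v; theta) from eq. (16), including the constant -(TN/2) log(2 pi).

    x: (TN,) data; M: (TN, q); Sigma_v: (T, T) noise covariance of one subject;
    pi, mu, sigma2: (q, m) Gaussian-mixture parameters of the sources.
    """
    q, m = pi.shape
    Sinv_full = np.kron(np.eye(N), np.linalg.inv(Sigma_v))   # I_N (x) Sigma_v^{-1}
    MtSM, MtSx = M.T @ Sinv_full @ M, M.T @ Sinv_full @ x
    rows = np.arange(q)
    log_z = []
    for w in itertools.product(range(m), repeat=q):
        cols = np.array(w)
        mu_r, var_r = mu[rows, cols], sigma2[rows, cols]
        Omega = np.linalg.inv(MtSM + np.diag(1.0 / var_r))
        alpha = Omega @ (MtSx + mu_r / var_r)
        log_z.append(np.sum(np.log(pi[rows, cols]) - 0.5 * np.log(var_r))
                     + 0.5 * np.linalg.slogdet(Omega)[1]
                     - 0.5 * (mu_r @ (mu_r / var_r) - alpha @ np.linalg.solve(Omega, alpha)))
    log_z = np.array(log_z)
    log_sum_z = log_z.max() + np.log(np.exp(log_z - log_z.max()).sum())
    const = -0.5 * x.size * np.log(2 * np.pi)
    return (const - 0.5 * N * np.linalg.slogdet(Sigma_v)[1]
            - 0.5 * x @ Sinv_full @ x + log_sum_z)
```

Summing this quantity over voxels gives the marginal log-likelihood used in the likelihood ratio tests; it should agree with the log-density of the equivalent Gaussian mixture over x_v, which is what the check below verifies.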

Hence, the marginal log-likelihood for our class of group ICA models is

L (X; θ) = c + \sum_{v = 1}^{V} \Big[- \frac{N}{2} \log | Σ_{v} | - \frac{1}{2} x_{v}^{T} (I_{N} \otimes Σ_{v}^{- 1}) x_{v} + \log \sum_{r = 1}^{m^{q}} z_{v}^{(r)}\Big],

(17)

where c is a constant term that does not involve the parameters. The LR test statistic for comparing two nested group ICA models is,

L R = - 2 [L (X; {\hat{θ}}_{0}) - L (X; {\hat{θ}}_{1})]

(18)

where θ̂₀ and θ̂₁ are the MLEs of the parameters under the simple model and the general model, respectively. Under H₀, the LR statistic approximately follows a Chi-square distribution, i.e. LR ~ χ²(τ), where the number of degrees of freedom τ is the difference in the dimensionality of the parameter spaces of the two models, which essentially equals the difference in the number of parameters in the group mixing matrix under the two ICA models.
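Given the two maximized marginal log-likelihoods, the test in (18) reduces to a few lines. The following sketch uses `scipy.stats.chi2`; the function name is hypothetical, and the χ² reference distribution is the usual large-sample approximation.

```python
from scipy.stats import chi2


def likelihood_ratio_test(loglik_null, loglik_general, df, alpha=0.05):
    """LR = -2 [L(X; theta0) - L(X; theta1)], compared with a chi-square(df), eq. (18)."""
    lr = -2.0 * (loglik_null - loglik_general)
    p_value = chi2.sf(lr, df)                 # P(chi2(df) > LR)
    critical = chi2.ppf(1.0 - alpha, df)      # upper-alpha critical value chi2_alpha(df)
    return lr, p_value, lr > critical
```

For example, log-likelihoods of −1000 and −997 with one extra parameter give LR = 6 and p ≈ 0.014, so H₀ would be rejected at α = 0.05.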

Goodness-of-fit test for a group ICA model

Using the LRT, we can evaluate the goodness-of-fit of a specific structure for the group spatio-temporal processes in a multi-subject fMRI data set by comparing the model with that structure against the Full Model. Below, we present the goodness-of-fit test for the Tensor Model; similar tests can be performed to evaluate the appropriateness of other group structures.

The Tensor Model assumes the trilinear product structure for the group ICA decomposition. When the assumption of trilinear structure is violated, the validity of the estimated group spatio-temporal processes becomes questionable. To evaluate the goodness-of-fit of the Tensor Model in a data set, we compare the group ICA model with the trilinear product structure against the Full Model using the likelihood ratio test. The hypotheses of the test are:

\begin{aligned} H_{0} & : M = C | \otimes | A \\ H_{1} & : M = [A_{1}^{T}, \dots, A_{N}^{T}]^{T}, \text{ where } A_{i} \neq A \, \mathrm{diag} (C_{i \cdot}) \text{ for some } i \in \{1, \dots, N\} \end{aligned}

(19)

The null hypothesis H₀ states that the group mixing matrix can be expressed as the Khatri-Rao product of the subject loading matrix and the group time course matrix, which is equivalent to specifying a trilinear product structure for the group spatio-temporal processes. The general model in the test, i.e. H₀ ∪ H₁, is the Full Model, in which the group mixing matrix is the concatenation of the individual subjects’ mixing matrices, without any assumption of a shared structure among them. The trilinear model is a special case of the Full Model in which each subject’s mixing matrix equals the group time course matrix column-wise scaled by the subject’s loading vector. Hence, the test is equivalent to testing whether the subjects’ individual mixing matrices satisfy A_i = A diag(C_i·) for i = 1,…,N. The likelihood ratio test statistic is defined in (18), with θ̂₀ = {Â, Ĉ, {Σ̂_v0}, φ̂₀} and θ̂₁ = {M̂, {Σ̂_v}, φ̂} representing the sets of estimated parameters for the models under H₀ and H₀ ∪ H₁, respectively. Under H₀, the LR statistic approximately follows a Chi-square distribution with degrees of freedom equal to the difference in the number of parameters under the two hypotheses, τ = TNq − (T + N)q. If $L R > χ_{α}^{2} (τ)$, the null hypothesis is rejected and it can be concluded that the data do not satisfy the trilinear product structure.
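As a worked illustration with hypothetical dimensions (T = 100 time points, N = 9 subjects and q = 3 components; these values are chosen for arithmetic only and are not taken from the study), the Full Model has TNq free entries in M, while the Tensor Model has (T + N)q entries in A and C:

```latex
\tau = TNq - (T + N)q = 100 \cdot 9 \cdot 3 - (100 + 9) \cdot 3 = 2700 - 327 = 2373
```

The large τ reflects how many more free parameters the Full Model carries, so the critical value χ²_α(τ) is correspondingly large.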

Testing group differences between subject groups

Here we derive a statistical test of differences between subject groups, constructed as the likelihood ratio test between the Tensor Model and the Group Tensor Model. The defining assumption of the Tensor Model is that all subjects share the same set of group spatial maps and group temporal responses. As an extension, the Group Tensor Model (4) allows heterogeneous temporal responses across subject groups. Hence, by comparing the two models with the LRT, we can examine whether the temporal responses differ significantly between two groups. The hypotheses for the proposed group test are

H_{0} : M = C | \otimes | A, \quad H_{1} : M = \begin{pmatrix} C^{(1)} | \otimes | A^{(1)} \\ C^{(2)} | \otimes | A^{(2)} \end{pmatrix} \text{ and } A^{(1)} \neq A^{(2)} .

(20)

The Tensor Model in the null hypothesis is a special case of the Group Tensor Model with A⁽¹⁾ = A⁽²⁾, i.e. when the two groups are assumed to share the same group time courses. Hence, the test is equivalent to testing whether A⁽¹⁾ = A⁽²⁾. The LR statistic is defined as in (18), with θ̂₀ = {Â, Ĉ, {Σ̂_v0}, φ̂₀} and θ̂₁ = {Â⁽¹⁾, Ĉ⁽¹⁾, Â⁽²⁾, Ĉ⁽²⁾, {Σ̂_v}, φ̂} representing the maximum likelihood estimates of the parameters under the Tensor Model and the Group Tensor Model, respectively. The two models differ in that the null hypothesis provides only one group time course matrix for all subjects, whereas the alternative hypothesis specifies two, one for each group. Hence, the number of degrees of freedom of the Chi-square distribution equals the number of parameters in the additional group time course matrix required by the alternative hypothesis, i.e. τ = Tq. If $L R > χ_{α}^{2} (τ)$, the null hypothesis is rejected and it can be concluded that the temporal responses differ significantly between the two groups.
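The mixing matrix under H₁ in (20) stacks one Khatri-Rao block per group. A short sketch, assuming rows are ordered subject by subject with group 1 first; the function names are hypothetical:

```python
import numpy as np


def khatri_rao(C, A):
    """Column-wise Kronecker product; the i-th block of T rows is A @ diag(C[i])."""
    N, q = C.shape
    return (C[:, None, :] * A[None, :, :]).reshape(N * A.shape[0], q)


def group_tensor_mixing(C1, A1, C2, A2):
    """Group mixing matrix under H1 in (20): group-1 subjects stacked above group 2."""
    return np.vstack([khatri_rao(C1, A1), khatri_rao(C2, A2)])


def group_difference_df(T, q):
    """Extra parameters of H1 over H0: one additional T x q group time course matrix."""
    return T * q
```

When A⁽²⁾ = A⁽¹⁾, the stacked matrix coincides with a single Tensor Model over all subjects, which is exactly the nesting that the test exploits.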

Local likelihood ratio test for a subset of independent components

Observed fMRI data represent the combination of various kinds of signals including task-related, transiently task-related, physiology-related and artifact-related ones. Since the signals stem from different underlying sources, they may be associated with various types of between-subject variability. For example, the temporal dynamics of task-related signals tend to be more consistent across subjects (when the experimental paradigm is fixed), since the neural activity is temporally locked to the same stimulus presentation schedule for all subjects. Other types of signals, such as those representing physiological fluctuations (e.g., cardiac or respiratory effects on the BOLD signal, or spontaneous mental activity unrelated to the task) are typically more heterogeneous among subjects. Given the varying characteristics of the signals, it may be desirable to assume different group spatio-temporal structures for independent components related to different kinds of signals. For example, one may choose to assume the trilinear product structure only for a subset of independent components (e.g., task-related ones), while not imposing this structure upon components for which it seems unreasonable (e.g., head-motion related ones). The assessment of the goodness-of-fit of the trilinear structure is therefore only relevant to the selected subset of components. Similarly, when testing group differences between subject groups, the interest of the researcher may focus on only those signals that are relevant to the study objectives.

To address the above issues, we propose the local LRT that is targeted to a subset of the independent components. The local LRT allows for comparisons between structures of group spatio-temporal processes for a subset of components. Suppose that a researcher is interested in testing the group structures for q₁ independent components where 1 ≤q₁ ≤ q. Without loss of generality, it can be assumed that the selected components correspond to the first q₁ columns in the group mixing matrix due to the permutation invariance of ICA. Hence, the group mixing matrix can be partitioned into two sub-matrices such that M = [M₁,M₂] , where M₁ is the TN × q₁ mixing matrix corresponding to the selected q₁ components and M₂ is the TN × q₂ mixing matrix corresponding to the other components with q₂ = q−q₁. The local LRT focuses on the comparisons of different group structures for M₁ related to the selected components.

As an example, let us consider the local goodness-of-fit test for the trilinear product structure. The hypotheses being tested are:

\begin{array}{l} H_{0} : M_{1} = C^{*} | \otimes | A^{*}, \\ H_{1} : M_{1} = [A_{1}^{* T}, \dots, A_{N}^{* T}]^{T}, \text{ where } A_{i}^{*} \neq A^{*} \, \mathrm{diag} (C_{i \cdot}^{*}) \text{ for some } i \in \{1, \dots, N\}, \end{array}

(21)

where C^* is the N × q₁ subject loading matrix representing the subjects’ loadings on the q₁ selected independent components, A^* is the T × q₁ matrix containing the group time courses associated with the selected components, and $A_{i}^{*}$ (i = 1, …, N) is the T × q₁ individual mixing matrix of the ith subject for the q₁ selected components. The local LR statistic is then constructed from the difference in the maximum log-likelihoods under H₀ and H₀ ∪ H₁. Under H₀, it approximately follows a Chi-square distribution with τ = TNq₁ − (T + N)q₁ degrees of freedom. If $L R > χ_{α}^{2} (τ)$, the null hypothesis is rejected and it can be concluded that the selected subset of independent components does not satisfy the trilinear product structure assumption.

Similarly, the local LRT can be set up to test group differences between subject groups restricted to a subset of selected independent components. The testing hypotheses are in this case:

H_{0} : M_{1} = C^{*} | \otimes | A^{*}, \quad H_{1} : M_{1} = \begin{pmatrix} C^{(1)*} | \otimes | A^{(1)*} \\ C^{(2)*} | \otimes | A^{(2)*} \end{pmatrix} \text{ and } A^{(1)*} \neq A^{(2)*}

(22)

where A^{(g)*} (g = 1, 2) is the T × q₁ matrix containing the group time courses associated with the q₁ selected independent components for group g, and C^{(g)*} is the N_g × q₁ subject loading matrix of group g for the selected components. Under H₀, the LR statistic for this test approximately follows a Chi-square distribution with τ = Tq₁ degrees of freedom.

To perform local LRTs on a subset of selected components, we need to fit group ICA models with a component-specific group spatio-temporal structure. This can be achieved, in step 4 of the modified EM algorithm, by identifying the columns of the updated group mixing matrix that correspond to the selected signals and projecting only these columns onto the space specified by the group structure assumed for them. To identify the column that corresponds to a selected signal, the estimated spatial maps of the independent components can be ranked according to their correlation with a spatial template of the signal of interest.
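A minimal sketch of the component-specific projection: rank the estimated maps by spatial correlation with a template, then apply the rank-1 (Tensor Model) projection only to the selected columns. Ranking by absolute correlation is an assumption made here because ICA maps have an arbitrary sign. The function names are hypothetical, and the rows of M are assumed to be ordered subject by subject.

```python
import numpy as np


def rank_by_template(S_hat, template):
    """Order components by |spatial correlation| between rows of S_hat (q x V) and template (V,)."""
    S_c = S_hat - S_hat.mean(axis=1, keepdims=True)
    t_c = template - template.mean()
    corr = S_c @ t_c / (np.linalg.norm(S_c, axis=1) * np.linalg.norm(t_c))
    return np.argsort(-np.abs(corr)), corr


def project_selected(M, selected, N, T):
    """Rank-1 (Tensor Model) projection applied only to the selected columns of M (TN x q)."""
    M_new = M.copy()
    for l in selected:
        U, s, Vt = np.linalg.svd(M[:, l].reshape(N, T), full_matrices=False)
        M_new[:, l] = s[0] * np.outer(U[:, 0], Vt[0]).ravel()
    return M_new
```

Columns that are not selected are left exactly as the M-step produced them, which corresponds to the Full Model structure for those components.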

Data dimension reduction and whitening

ICA usually incorporates several preprocessing steps, including centering, dimension reduction and whitening, to reduce the complexity of the subsequent ICA decomposition. In group ICA of multi-subject fMRI data, two stages of dimension reduction are often performed to reduce the computational load and avoid overfitting (Beckmann and Smith, 2005; Calhoun et al., 2001). At the first stage, dimension reduction is performed in the temporal domain within each subject. Following Beckmann and Smith (2005), we first perform a probabilistic PCA (PPCA) of the group data matrix obtained by spatially concatenating the individual subjects’ data, i.e. $Y_{T \times NV} = [X_{1}, \dots, X_{N}]$. Each subject’s data is then projected onto the common subspace spanned by U_R, the first R eigenvectors from the PPCA, where R is determined using the Laplace approximation (Minka, 2000),

{\tilde{X}}_{i} = U_{R}^{T} X_{i},

(23)

where X̃_i is the reduced data for the ith subject, of dimension R × V.

The reduced data from all subjects are then concatenated to form the NR × V matrix $\tilde{X} = {[{\tilde{X}}_{1}^{T}, \dots, {\tilde{X}}_{N}^{T}]}^{T}$, and a second-stage dimension reduction and whitening is performed as in a single-subject PICA analysis (Beckmann and Smith, 2004),

X^{*} = {(Λ_{q} - σ^{2} I_{q})}^{- 1 / 2} U_{q}^{T} \tilde{X},

(24)

where q is the number of independent components extracted from the concatenated data set X̃, again estimated using the Laplace approximation; U_q and Λ_q contain the first q eigenvectors and eigenvalues of the sample covariance of X̃ (obtained from its singular value decomposition); and the error variance σ² represents the variability in X̃ not accounted for by the q independent components and is estimated as the average of the NR − q smallest eigenvalues.

The two-stage dimension reduction and whitening is equivalent to multiplying the original group fMRI data by the following transformation matrix,

H = {(Λ_{q} - σ^{2} I_{q})}^{- 1 / 2} U_{q}^{T} (I_{N} \otimes U_{R}^{T}) .

(25)
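The two reduction stages and the equivalence with the single transformation H in (25) can be checked numerically. The sketch below makes simplifying assumptions that depart from the method described above: plain eigendecompositions stand in for PPCA, and R and q are supplied by the user rather than chosen by the Laplace approximation. The function name is hypothetical.

```python
import numpy as np


def two_stage_reduction(X_list, R, q):
    """Sketch of (23)-(25) with plain PCA in place of PPCA and fixed R, q.

    X_list: N centred subject matrices, each T x V. Returns X_star (q x V) and H (q x TN).
    """
    N, V = len(X_list), X_list[0].shape[1]
    Y = np.hstack(X_list)                                   # T x NV, spatial concatenation
    U_R = np.linalg.svd(Y, full_matrices=False)[0][:, :R]   # first R temporal eigenvectors
    X_tilde = np.vstack([U_R.T @ X for X in X_list])        # NR x V, eq. (23) stacked
    evals, evecs = np.linalg.eigh(X_tilde @ X_tilde.T / V)  # ascending order
    evals, evecs = evals[::-1], evecs[:, ::-1]              # sort descending
    sigma2 = evals[q:].mean()                               # mean of the NR - q smallest
    W = np.diag((evals[:q] - sigma2) ** -0.5)
    X_star = W @ evecs[:, :q].T @ X_tilde                   # eq. (24)
    H = W @ evecs[:, :q].T @ np.kron(np.eye(N), U_R.T)      # eq. (25)
    return X_star, H, U_R, evecs[:, :q], evals[:q], sigma2
```

Applying H to the subject-stacked TN × V data reproduces X^* exactly, which is the sense in which the two stages are equivalent to one linear transformation.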

Hence, the group ICA model in (1) can be re-expressed in the reduced and sphered space as follows:

X^{*} = M^{*} S + E^{*},

(26)

where X^* = HX, M^* = HM, and E^* = HE. The transformed noise term in model (26) still follows a multivariate Gaussian distribution, i.e. $e_{v}^{*} \sim MVN (0, Σ_{v}^{*})$ with $Σ_{v}^{*} = H (I_{N} \otimes Σ_{v}) H^{T}$. The new parameter vector for the group ICA model (26) is $θ^{*} = (M^{*}, \{Σ_{v}^{*}\}, φ)$. Note that φ, which contains the parameters of the Gaussian mixture distribution of the spatial source signals, is not affected by the transformation because the dimension reduction is performed in the temporal domain. The parameter vector θ^* can be estimated using the proposed modified EM algorithm, with a slight modification in the projection step (step 4), because the group mixing matrix in the reduced space is no longer composed of concatenated subject submatrices. In step 4, $\hat{M}^{*(k+1)}$ must therefore first be transformed back to the original scale as $\hat{M}^{(k + 1)} = H^{- 1} \hat{M}^{* (k + 1)}$, where $H^{- 1} = (I_{N} \otimes U_{R}) U_{q} (Λ_{q} - σ^{2} I_{q})^{1 / 2}$ denotes the Moore–Penrose (right) inverse of the q × TN matrix H. $\hat{M}^{(k + 1)}$ is then projected onto the specified subspace, giving $\tilde{M}^{(k + 1)}$, and the new estimate in the reduced space is obtained as $\tilde{M}^{* (k + 1)} = H \tilde{M}^{(k + 1)}$. After obtaining the parameter estimates θ̂^* from the modified EM algorithm, the parameter estimates θ̂ can be computed by back-transforming θ̂^* to the original scale.
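Because H is q × TN, H^{-1} here is a right inverse (H H^{-1} = I_q), not a two-sided inverse. The sketch below, with a hypothetical function name, shows the modified step 4 as back-transform, project and re-transform, where `project` could be, for example, a column-wise rank-1 Tensor Model projection. The check confirms that the stated H^{-1} coincides with NumPy's pseudo-inverse for orthonormal U_R and U_q.

```python
import numpy as np


def reduced_space_projection_step(M_star, H, H_inv, project):
    """Modified step 4: back-transform M*, project on the original scale, re-transform."""
    M_full = H_inv @ M_star          # TN x q estimate on the original scale
    return H @ project(M_full)       # projected estimate in the reduced space
```

With an identity `project`, the step returns M^* unchanged, since H H^{-1} = I_q; a genuine structural projection changes M̂ before it is mapped back.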

Simulation Studies

To evaluate the performance of the proposed maximum likelihood method and the likelihood ratio tests for the group ICA models, we conducted two sets of simulation studies. In the first set, we generated a multi-subject dataset for one group of subjects and considered four simulation cases with different structures for the group mixing matrices, representing various types of subject heterogeneity. We fitted the Full Model and the Tensor Model for each simulation case with the proposed maximum likelihood approach. The accuracy of the model estimation was assessed by calculating the correlations between the true and estimated spatial maps and time courses. The goodness-of-fit of the Tensor Model was evaluated through the proposed LRT for each simulation case. In the second set of simulations, we generated data for two groups of subjects. The Tensor Model and Group Tensor Model were fitted and the proposed LRT was applied for testing group differences.

Single-group simulation study

Three source signals were generated with the spatial maps and associated time courses depicted in Figure 1. The time courses in Figure 1 were estimated from an ICA analysis of a real fMRI dataset and hence represent realistic temporal dynamics of source signals. Although the simulated spatial maps are simpler than typical 3D brain activation images from real fMRI data, they were designed to reproduce the sparse nature of fMRI activations, i.e. the fact that a source signal is attributable to a small number of voxels while the majority of voxels exhibit background fluctuations. Gaussian background noise was added to the generated source signals. We simulated data for a group of 9 subjects and considered four simulation cases with different structures for the group mixing matrix, representing varying types of between-subject heterogeneity:

Figure 1. Spatial maps and time courses used for the simulation study. Panel (A) presents the spatial maps of the three independent components, where the areas in dark red represent activated locations in each component; the spatial signals at other locations represent Gaussian background noise. Panel (B) plots the time course associated with each independent component; these are the estimated temporal responses of independent components extracted from real fMRI data.

Case 0

In this case, we generated data according to the Tensor Model. Hence, the group mixing matrix is the Khatri-Rao product of the subject loading matrix C and a group time series matrix A. The group time series matrix consists of the three time courses depicted in Figure 1, i.e. A = (t₁, t₂, t₃), which are associated with the three spatial maps. The subjects’ loadings on the three sources are (1, 0.5, 0.3), (0.3, 1, 0.5), (0.5, 0.3, 1), (0.9, 0.4, 0.3), (0.3, 0.9, 0.4), (0.4, 0.3, 0.9), (0.8, 0.4, 0.2), (0.2, 0.8, 0.4), and (0.4, 0.2, 0.8) times the noise standard deviation for the 9 subjects, respectively.
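The Case 0 mixing matrix can be assembled directly from the listed loadings. In the sketch below, the time courses are hypothetical stand-ins (a sinusoid, a block design and a cosine with T = 100), because the Figure 1 time courses are not reproduced in the text. The function name and `noise_sd` parameter are also illustrative.

```python
import numpy as np

# Case 0 subject loadings (rows: 9 subjects; columns: 3 sources), in units of noise SD.
C = np.array([[1, 0.5, 0.3], [0.3, 1, 0.5], [0.5, 0.3, 1],
              [0.9, 0.4, 0.3], [0.3, 0.9, 0.4], [0.4, 0.3, 0.9],
              [0.8, 0.4, 0.2], [0.2, 0.8, 0.4], [0.4, 0.2, 0.8]])


def case0_mixing(A, C, noise_sd=1.0):
    """Tensor-Model mixing matrix: subject i's T-row block is A @ diag(noise_sd * C[i])."""
    N, q = C.shape
    T = A.shape[0]
    Cs = noise_sd * C
    return (Cs[:, None, :] * A[None, :, :]).reshape(N * T, q)


# Hypothetical stand-ins for t1, t2, t3 of Figure 1 (not the time courses used in the study).
T = 100
t = np.arange(T)
A = np.column_stack([np.sin(2 * np.pi * t / 40),
                     (t // 20) % 2 - 0.5,
                     np.cos(2 * np.pi * t / 25)])
M = case0_mixing(A, C)
```

Simulated data would then follow model (1) as X = MS + E, given spatial maps S and Gaussian noise E.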

Case 1

Each subject’s mixing time course matrix A_i (i = 1,…,9) combines the group time courses A = (t₁, t₂, t₃) in Figure 1 with subject-specific Gaussian noise. Hence, a subject’s time course associated with each source signal is no longer proportional to the group time course, and the group model thus deviates from the Tensor Model.

Case 2

We transformed the group time courses in Figure 1 into the frequency domain, retained the amplitude at each frequency but randomized the phases for each subject, and then transformed them back to the temporal domain to obtain each subject’s time courses. This case resembles resting-state fMRI BOLD signals, where the temporal responses of different subjects may share similar frequency characteristics but have different phase patterns.
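Phase randomization of this kind can be sketched with NumPy's real FFT. Several details here are assumptions rather than the study's settings: phases are uniform on [0, 2π), and the DC (and, for even length, Nyquist) terms are kept unchanged so that the output stays real with the same mean. The function name is hypothetical.

```python
import numpy as np


def phase_randomize(x, rng):
    """Keep the amplitude spectrum of the real series x but draw new random phases."""
    n = x.size
    X = np.fft.rfft(x)
    phases = rng.uniform(0.0, 2.0 * np.pi, size=X.size)
    Y = np.abs(X) * np.exp(1j * phases)
    Y[0] = X[0]                      # keep the DC term (mean) unchanged
    if n % 2 == 0:
        Y[-1] = X[-1]                # the Nyquist term must stay real for even n
    return np.fft.irfft(Y, n=n)
```

Applying this independently to each of t₁, t₂, t₃ for every subject yields subject time courses with matching frequency content but unrelated phases.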

Case 3

Each subject’s data contain the three spatial maps in Figure 1, but modulated by individual time courses A_i (i = 1,…,9) generated from a Gaussian process, which therefore do not reflect common underlying temporal dynamics.

Multi-group simulation study

In the second set of simulations, we generated data from two groups of 3 subjects each. We deliberately chose a small group size to evaluate the performance of the proposed methods in small-sample studies; in typical multi-group fMRI studies, each group usually contains more than 3 subjects, which generally improves the performance of model estimation and statistical tests. The generated data combined three independent components, whose spatial maps are portrayed in Figure 1, and Gaussian background noise was added to the spatial source signals. We considered two simulation cases, exemplifying different types of between-group heterogeneity:

Case 1

We generated data according to the Tensor Model. Hence, the group mixing matrix of all subjects is the Khatri-Rao product of the subject loading matrix C and a group time series matrix A. The time series matrix consists of the three time courses depicted in Figure 1, i.e. A = (t₁, t₂, t₃), which are associated with the three spatial maps. The subject loadings on the three sources are (1, 0.5, 0.3), (0.3, 1, 0.5), (0.5, 0.3, 1), (1, 0.6, 0.8), (0.6, 1, 0.2), and (0.3, 0.4, 1) times the noise standard deviation for the six subjects in the two groups. Note that the same group time course matrix A is shared by the subjects of both groups, representing fMRI signals with the same temporal structure in the two groups.

Case 2

We generated data from the Group Tensor Model. Within group g (g = 1, 2), the group mixing matrix of the 3 subjects is the Khatri-Rao product of the subject loading matrix C^(g) and the group time series matrix A^(g). The group time series matrices A^(g) = (t₁^(g), t₂^(g), t₃^(g)) (g = 1, 2) contain two different sets of time courses (Figure 3) associated with the three spatial source signals. Therefore, the simulated fMRI time courses are similar within each group but different between the two groups.

Results

Simulation studies

In the following, we compare the performance of different group ICA models under the various types of spatio-temporal structure embedded in the simulated datasets. We first consider the single-group simulation study and compare the results from the Tensor Model and the Full Model. The accuracy of the model estimates is measured by the spatial correlations between the estimated and true spatial maps (Figure 2A) and the temporal correlations between the estimated and true time courses (Figure 2B). We then consider the multi-group simulation study, in which subjects came from two groups, and compare the results from the Tensor Model and the Group Tensor Model. In both studies, the proposed likelihood ratio tests were performed between nested group ICA models, and their performance was evaluated by the empirical type I error and power. More specifically, we recorded the rejection rate of the LRT, i.e. the proportion of simulated datasets in which the LRT rejected the null hypothesis (the simple model) in favor of the alternative hypothesis (the more general model). The rejection rate is an empirical estimate of the type I error when the null hypothesis is true and the data were generated from the simple model, and an empirical estimate of the power of the LRT when the null hypothesis is false and the data were generated from the general model.