Abstract
Inter-subject variability in evoked brain responses is attracting attention because it may reflect important variability in structure–function relationships over subjects. This variability could be a signature of degenerate (many-to-one) structure–function mappings in normal subjects or reflect changes that are disclosed by brain damage. In this paper, we describe a non-iterative fuzzy clustering algorithm (FCP: fuzzy clustering with fixed prototypes) for characterizing inter-subject variability in between-subject or second-level analyses of fMRI data. The approach identifies the contribution of each subject to response profiles in voxels surviving a classical F-statistic criterion. The output identifies subjects who drive activation in specific cortical regions (local effects) or in voxels distributed across neural systems (global effects). The sensitivity of the approach was assessed in 38 normal subjects performing an overt naming task. FCP revealed that several subjects had either abnormally high or abnormally low responses. FCP may be particularly useful for characterizing outlier responses in rare patients or heterogeneous populations. In these cases, atypical activations may not be detected by standard tests, under parametric assumptions. The advantage of using FCP is that it searches all voxels systematically and can identify atypical activation patterns in a quantitative and unsupervised manner.
Abbreviations: RFX, random-effect analysis; FCM, fuzzy c-means clustering; FCP, fuzzy clustering with fixed prototypes; D, similarity metric; U, degree of membership; G, degree of contribution
Keywords: Functional magnetic resonance imaging, Overt object naming, Inter-individual variability, Outliers, Atypical activations, Fuzzy clustering, Second-level analysis
Introduction
In functional neuroimaging, group analyses are used to assess effects of interest at the population level (e.g., Benali et al., 2003; Bosch, 2000; Friston et al., 1999; Lazar et al., 2002; McNamee and Lazar, 2004; Penny et al., 2003; Svensen et al., 2002). They assess the reliability or consistency of responses across individuals, in relation to the inter-subject variations that are assumed to be random. In this paper, we are concerned with the identification of ‘outliers’ in group analyses, i.e., data points that deviate markedly (either very high or very low activation) in one subject, compared to the others (Grubbs, 1969; Rasmussen, 1988). Outliers inflate the variance and move the mean towards the outlier, which is problematic for both parametric and nonparametric statistics (e.g., Gastwirth, 1966; Zimmerman, 1994). Here, we present a new approach for identifying the contribution of each subject to group activations in functional imaging studies.
The effect of an outlier on the group activation at a given voxel is illustrated in Fig. 1 using simulated data. When the effect size of one subject (subject 1 in Fig. 1) is increased or decreased, it perturbs the group mean and dispersion across subjects. In this example, the heightened response from one subject increases the variance and reduces the significance of the group effect (i.e., lowers t values). True group effects (i.e., activations that are consistent across subjects) can therefore be lost if one subject has an atypical response.
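The effect described above can be sketched numerically. The following is a minimal illustration, not the simulation behind Fig. 1: the ten effect sizes are hypothetical values chosen for the example, and the group effect is summarized with a one-sample t statistic against zero.

```python
import math
import statistics

def one_sample_t(values):
    """One-sample t statistic against zero: mean / (sd / sqrt(n))."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return mean / (sd / math.sqrt(n))

# Hypothetical effect sizes for 10 subjects at one voxel (illustrative values only).
typical = [1.0, 1.2, 0.8, 1.1, 0.9, 1.3, 0.7, 1.0, 1.1, 0.9]
# Same group, but subject 1 now shows a heightened response.
with_outlier = [6.0] + typical[1:]

t_typical = one_sample_t(typical)
t_outlier = one_sample_t(with_outlier)
print(f"t without outlier: {t_typical:.2f}")
print(f"t with outlier:    {t_outlier:.2f}")
```

Although the heightened response raises the group mean, it inflates the variance more, so the t value drops.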
In real data, outliers can reflect technical artefacts and less obvious problems such as sampling from mixtures of populations (for reviews see Beckman and Cook, 1983; Osborne and Overbay, 2004). The importance of identifying the influence of outliers on group effects has been addressed extensively (e.g., Beckman and Cook, 1983; Osborne and Overbay, 2004) and several techniques have been proposed to identify, modify, or remove outliers, before performing statistical tests (Barnett and Lewis, 1978; Beckman and Cook, 1983; Belsley et al., 1980; Bradlow and Zaslavsky, 1997; Chaloner and Brant, 1988; Davies and Gather, 1993; Gastwirth, 1966; Grubbs, 1969; Hawkins, 1980; Hocking, 1976; Huber, 1981; Rousseeuw and Leroy, 2003; Shapiro and Brady, 1995). Outlier removal furnishes inferences that are more robust to parametric assumptions. However, in fMRI, only a few attempts have been made to assess the contribution of outliers to the mean effect (Kherif et al., 2003) even though it is known that inter-subject variability in activations is higher than within-subject or between-session variability (Smith et al., 2005; Wei et al., 2004).
In this context, we present a new approach based on the classification of all voxels according to the clustering principle of fuzzy logic (Zadeh, 1965). Fuzzy logic classification (Bezdek, 1981) has been employed previously in fMRI data analysis (e.g., Baumgartner et al., 1998; Buerki et al., 2003; Fadili et al., 2000; Golay et al., 1998; Jahanian et al., 2004; Windischberger et al., 2003), but only for data-driven first-level analyses (e.g., time series classification of all voxels without a priori knowledge). Our approach, fuzzy clustering with fixed prototypes (FCP), is specifically tailored to the exploration of inter-subject variability at the second level, where each data point represents an effect (i.e., summary statistic or contrast) from a single subject. By conducting the analysis at the second level, our method also differs from others by being non-iterative and hypothesis-driven with predefined clusters (i.e., prototypes). We demonstrate that FCP can identify regional group effects that are driven or hidden by high or low activation in one subject.
Methods
Subjects
Thirty-eight healthy right-handed subjects (13 males, 25 females, 32 ± 20 years) gave written informed consent to participate in this study. Subjects were native English speakers, had normal or corrected-to-normal vision, with no history of neurological or psychiatric disorders. This study was approved by the Medical Ethics Committee of the Institute of Neurology.
Paradigm and stimuli
There were two conditions of interest: (a) an activation task that involved object naming and (b) a baseline task that controlled for visual and articulatory processing by requiring participants to articulate “1, 2, 3” in response to pictures of meaningless non-objects. To facilitate task switching, the conditions were blocked with twelve pictures (objects or non-objects) per block. Within a block, the twelve pictures were presented as four sequential stimuli (one stimulus per 4.3 s) with three pictures per stimulus, one above and two below. The participants were asked to name the objects in the same order (top, bottom left, bottom right) or to say “1, 2, 3” to the non-objects while looking at the top, bottom left and bottom right picture. Over the experiment, there were 32 object naming stimuli (96 pictures) and 16 non-object stimuli (48 pictures). These conditions of interest were interspersed with fixation, reading aloud and saying “1, 2, 3” to meaningless symbols. The total scanning time for all conditions was 12 min in two separate six-minute sessions.
To ensure that the task was understood correctly, all subjects were provided with detailed instructions and underwent a short training session before entering the scanner. To minimize artefacts from head motion, subjects were instructed to whisper their response with minimal mouth movement. Subject responses were recorded with an in-house MRI-compatible auditory recording system.
MRI acquisition
Data were acquired on a 1.5 T Siemens system (Siemens Medical Systems, Erlangen, Germany). Functional imaging comprised an EPI GRE sequence (TR/TE/flip = 3600 ms/50 ms/90°, FOV = 192 mm, matrix = 64 × 64, 40 axial slices with 3 × 3 × 3 mm³ voxel size). The multi-slice volume was positioned on sagittal scout images. Functional scanning was always preceded by 14.4 s of dummy scans to ensure tissue steady-state magnetization.
Data analysis
Data processing and statistical analyses were performed with the Statistical Parametric Mapping SPM2 software package (Wellcome Department of Imaging Neuroscience, London UK, http://www.fil.ion.ucl.ac.uk/spm/). All functional volumes were spatially realigned, un-warped, normalized to the MNI space, and smoothed with an isotropic 6-mm FWHM Gaussian kernel, with a resulting voxel size of 2 × 2 × 2 mm³. Time-series from each voxel were high-pass filtered (1/128 Hz cut-off) to remove low-frequency noise and signal drift. The pre-processed functional volumes of each subject were then submitted to a fixed-effects analysis, using the general linear model at each voxel. Each stimulus onset was modeled as an event encoded in condition-specific ‘stick-functions’. The resulting stimulus functions were convolved with a canonical hemodynamic response function to form regressors for the linear model.
Our contrast of interest was the main effect of object naming relative to the non-object baseline. The appropriate summary or contrast image (i.e., a contrast of maximum likelihood parameter estimates) was then entered into a second-level analysis (i.e., random-effects analysis) to enable inferences about the population from which our subjects were drawn. From this second-level analysis, we generated a statistical parametric map of the F statistic at each voxel, SPM{F}, which characterized differences in activation (activations and deactivations) for object naming relative to the non-object baseline. The SPM of the F-statistic was used to identify candidate voxels for subsequent fuzzy clustering. It would be perfectly possible to include all brain voxels in the clustering analysis (i.e., without selection based on the F-statistic); however, one is usually interested in detecting outlier responses in regions that are typically engaged by the experimental paradigm. The clustering itself used the contrasts summarizing the activation for each subject at the candidate voxels. In the absence of outliers, we would expect these contrast values to be normally distributed, by the central limit theorem.
Non-iterative fuzzy clustering with fixed prototypes (FCP)
The algorithm for FCP is adapted from Bezdek's fuzzy c-means (FCM) clustering approach (Bezdek, 1981; Bezdek et al., 1997). In contrast to FCM, FCP uses prototypes (i.e., clusters) that are fixed a priori; the number of clusters therefore does not have to be estimated and the procedure can be implemented non-iteratively. See Fig. 2 for a schematic illustration of the FCP algorithm.
In practice, we select Nvox voxels that we want to assign to C clusters. Here, we included all voxels with F > 2.0 in the second-level analysis (about 85,000 voxels). As outlier subjects are unknown, all subjects represent plausible classes in our algorithm. Therefore, the number of clusters (i.e., prototypes) is equal to the number of subjects C = Nsub. This means that each cluster represents the contribution of the corresponding subject to the mean effect at the voxel level. Each voxel i has a vector Xi of Nsub values that correspond to the contrast (i.e., activation) for each subject. The resemblance between each voxel i and each cluster (prototype) j is characterized by a “similarity metric” Dij. The degree of membership Uij is calculated for each voxel i by comparing Dij for each cluster j to all other clusters.
Similarity metric D
We quantify the similarity metric Dij between a voxel i and prototype j as:
(1) |
The real constant α is a “tuning” parameter that can be adjusted to control the sensitivity of the method to outlier values (see below). tanh is the hyperbolic tangent, Xij is the effect for the j-th subject at voxel i, X¯i is the mean over subjects and Xi≠j¯ is the mean effect without subject j. Accordingly, the similarity metric can be interpreted as (i) a measure of how far subject j is from the group mean, scaled by α or; (ii) a measure of how the mean effect of the group is perturbed when subject j is excluded. The latter perspective is important and suggests that our algorithm is formally similar to regression diagnostic methods that assess the extent to which a particular data point influences the model, by determining the change when that point is omitted; for example, the Cook's D-distance (Cook and Weisberg, 1982) and the DFFITS/DFBETAS statistics (Belsley et al., 1980). Note that other resemblance metrics have been employed in previous studies with standard fuzzy classification, including the hyperbolic correlation measure (for more details, see Fadili et al., 2000; Golay et al., 1998).
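The link between interpretations (i) and (ii) can be made explicit with a short identity. The derivation below uses only the definitions of the two means given above, with Nsub denoting the number of subjects; it is a worked step added for clarity and does not depend on the exact form of Eq. (1).

```latex
\bar{X}_{i \neq j} = \frac{N_{sub}\,\bar{X}_i - X_{ij}}{N_{sub} - 1}
\quad\Longrightarrow\quad
\bar{X}_i - \bar{X}_{i \neq j} = \frac{X_{ij} - \bar{X}_i}{N_{sub} - 1}
```

The perturbation of the group mean caused by excluding subject j is therefore proportional to the distance of subject j from the group mean, so the two readings differ only by a constant factor that can be absorbed into α.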
To illustrate the effect of α, we considered a voxel i with a given mean effect X¯i and a standard deviation equal to one. Fig. 3 illustrates the influence of parameter α on D. Increasing α leads to “smooth” D values, suggesting that α can be considered as a “smoothness” parameter. Critically, in order to keep the method independent of the scaling of X, we set α equal to 3 · σ, where σ is the standard deviation of the group (i.e., standard deviation over all voxels and all subjects). Voxels that are driven by high positive activation (i.e., large effects) are identified with a positive α value (3 · σ) and voxels that are driven by low or negative effects (e.g., deactivation) are identified with a negative α value (− 3 · σ).
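The role of the sign and magnitude of the tuning parameter can be sketched in a few lines. The exact expression of Eq. (1) is not reproduced here, so the form used below, D = 1 − tanh((X − voxel mean) / α), is an assumption chosen only because it matches the description (hyperbolic tangent of the distance from the group mean, scaled by α, with small D for the subject that drives the voxel). The data values and the function name `similarity` are hypothetical.

```python
import math

def similarity(x_row, alpha):
    """Assumed form for one voxel i: D_ij = 1 - tanh((X_ij - mean_i) / alpha)."""
    mean_i = sum(x_row) / len(x_row)
    return [1.0 - math.tanh((x - mean_i) / alpha) for x in x_row]

# Hypothetical contrast values for 5 subjects at one voxel; subject 0 is very high.
x_row = [4.0, 0.1, -0.2, 0.3, -0.2]
sigma = 1.0  # assumed group standard deviation

d_pos = similarity(x_row, 3.0 * sigma)    # positive tuning value: targets high responses
d_neg = similarity(x_row, -3.0 * sigma)   # negative tuning value: targets low responses
d_smooth = similarity(x_row, 30.0 * sigma)  # larger value: D flattens towards 1

print("positive:", [round(d, 3) for d in d_pos])
print("negative:", [round(d, 3) for d in d_neg])
print("smooth:  ", [round(d, 3) for d in d_smooth])
```

Under this assumed form, the high-responding subject has the smallest D with a positive tuning value and the largest D with a negative one, and a larger magnitude compresses all D values towards one.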
Degree of membership U
The similarity metric D is then used to quantify the degree of membership Uij of voxel i to cluster j according to the following equation:
(2) $U_{ij} = \dfrac{D_{ij}^{\lambda}}{\sum_{k=1}^{N_{sub}} D_{ik}^{\lambda}}$
The parameter λ is a negative number that represents the degree of fuzziness (e.g., Fadili et al., 2000) or the defuzzification parameter (e.g., Dimitriadou et al., 2004) as defined in the FCM approach. The influence of fuzziness on the clustering has been explored in previous studies (e.g., Bezdek, 1981; Fadili et al., 2000, 2001; Krishnapuram and Keller, 1993): when λ tends to − ∞, the classification becomes hard and Uij takes the value 0 (voxel i is not a member of cluster j) or 1 (voxel i belongs to cluster j), whereas when λ tends to 0, the classification is fuzzy (Uij is close to 1/Nsub). Fig. 4 illustrates the influence of λ on U with a range of D values. In our approach, classification was fuzzy (Uij was a continuous number between 0 and 1) when λ was between − 8 and − 2 (Fig. 4). We held λ constant at − 4 (see below for more details).
To localize regions that are driven by subject j, the jth column of U is thresholded at 0.3 and displayed on a normalized anatomical volume. These maps identify the regions that are driven by a particular subject.
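The behaviour of the membership under different degrees of fuzziness can be sketched as follows. This is a minimal illustration that assumes the FCM-style normalisation U = D^λ / Σ D^λ over subjects, which is consistent with the limits described above (hard when λ → − ∞, close to 1/Nsub when λ → 0); the D values and the function name `membership` are hypothetical.

```python
def membership(d_row, lam):
    """Assumed FCM-style normalisation: U_ij = D_ij**lam / sum_k D_ik**lam."""
    powered = [d ** lam for d in d_row]
    total = sum(powered)
    return [p / total for p in powered]

# Hypothetical D values for 5 subjects at one voxel
# (a small D means that the subject drives the voxel).
d_row = [0.2, 1.2, 1.3, 1.1, 1.3]

for lam in (-0.01, -4.0, -40.0):
    u = membership(d_row, lam)
    print(lam, [round(v, 3) for v in u])

# Subjects whose membership at this voxel exceeds the display threshold of 0.3.
u_fuzzy = membership(d_row, -4.0)
driven_by = [j for j, v in enumerate(u_fuzzy) if v > 0.3]
print("voxel driven by subject index:", driven_by)
```

Applying the same threshold to column j of U over all voxels gives the map of regions driven by subject j.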
Contribution coefficient G
We assess the contribution of each subject to the distributed response by computing the following coefficient:
(3) $G_j = \dfrac{1}{N_{vox}} \sum_{i=1}^{N_{vox}} U_{ij}$
This coefficient Gj, computed for each cluster, is the relative proportion of the brain volume that belongs to the j-th subject (note that the Gj sum to one over subjects). In hard classification (i.e., λ → − ∞), Gj is simply the proportion of voxels that belong to the j-th subject. In fuzzy classification, it is an estimate of how each subject dominates the observed data in a ‘global’ and ‘relative’ way. In this sense, Gj reflects the ‘global’ contribution of the j-th subject relative to other subjects. The profile of Gj over subjects allows one to detect subjects who dominate in their contribution to the overall activations observed.
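The two properties stated above (the coefficients sum to one, and reduce to voxel proportions under hard classification) can be checked with a small sketch. It assumes that Gj is the mean of Uij over voxels, which is what the description implies; the membership matrices and the function name `contribution` are hypothetical.

```python
def contribution(u):
    """G_j: mean membership of subject j over all voxels (column means of u)."""
    n_vox = len(u)
    n_sub = len(u[0])
    return [sum(row[j] for row in u) / n_vox for j in range(n_sub)]

# Hard classification over 4 voxels and 3 subjects (each row sums to one).
u_hard = [[1, 0, 0], [1, 0, 0], [0, 1, 0], [1, 0, 0]]
g_hard = contribution(u_hard)  # proportion of voxels assigned to each subject

# Fuzzy classification for the same layout (each row still sums to one).
u_fuzzy = [[0.6, 0.3, 0.1], [0.5, 0.25, 0.25], [0.2, 0.7, 0.1], [0.4, 0.3, 0.3]]
g_fuzzy = contribution(u_fuzzy)

print("hard: ", g_hard)
print("fuzzy:", [round(g, 3) for g in g_fuzzy])
```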
Simulations with FCP method
Before applying FCP to real fMRI data, we performed several simulations to assess (i) the sensitivity, (ii) the specificity, (iii) the influence of the parameter λ and (iv) the distribution of G values in a group with or without outlier subjects.
To assess the specificity and the sensitivity of FCP, we generated receiver operating characteristic (ROC) curves (Metz, 1978; Sorenson and Wang, 1996) at different values of the parameter λ. ROC curves encode the dependence of the true positive rate (sensitivity) on the false positive rate (one minus specificity) for different thresholds on the degree of membership U. Practically, 100,000 artificial voxels and 38 subjects were generated from a unit normal distribution (mean = 0, σ = 1). In a particular subject (e.g., subject 20), a proportion q of voxels (i.e., q = 5%, five thousand voxels) was sampled from a different normal distribution (mean = 3, σ = 1) and considered as a true positive (i.e., a true outlier). We then ran FCP on these simulated data with α = 3. G values were assessed for each subject. This procedure was repeated for several values of λ = − 1, − 2, − 3, − 4, − 6, − 8, − 10, − 20 and − 40 and for different outlier distributions (mean = 3, 4 and 5).
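The construction of an ROC curve from thresholds on U can be sketched as below. This is a generic illustration rather than the code used for our simulations: the membership values, labels, thresholds and the function name `roc_points` are hypothetical.

```python
def roc_points(scores, labels, thresholds):
    """Return (false positive rate, true positive rate) for each threshold on U."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for thr in thresholds:
        # A voxel is declared an outlier when its membership exceeds the threshold.
        tp = sum(1 for s, y in zip(scores, labels) if y and s > thr)
        fp = sum(1 for s, y in zip(scores, labels) if not y and s > thr)
        points.append((fp / n_neg, tp / n_pos))
    return points

# Hypothetical membership values U for eight voxels of one subject;
# label 1 marks a simulated (true) outlier voxel.
u_values = [0.95, 0.80, 0.40, 0.35, 0.20, 0.10, 0.05, 0.02]
is_outlier = [1, 1, 1, 0, 0, 0, 0, 0]

points = roc_points(u_values, is_outlier, [0.9, 0.3, 0.0])
for fpr, tpr in points:
    print(f"FPR = {fpr:.2f}, TPR = {tpr:.2f}")
```

Lowering the threshold moves along the curve from high specificity towards high sensitivity.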
To assess sensitivity, we treated one subject (e.g., subject 20) as an outlier. Two different proportions were used: q = 1% (1000 outlier voxels) and q = 0.2% (200 outlier voxels). For both proportions, 10,000 simulations were performed. We assessed specificity by generating the null distribution of G with λ = − 4; we generated 100,000 voxels and 38 subjects from a normal distribution (mean = 0, σ = 1) (i.e., with no outliers). This procedure was repeated 10,000 times to provide samples of G under the null hypothesis of no outliers. We repeated the simulations with different numbers of subjects Nsub (with 10,000 iterations for each group size Nsub). In these analyses, Nsub varied from 10 to 50.
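A scaled-down version of the outlier simulation can be sketched as follows. It is not the code used for the reported results: it uses 3000 voxels instead of 100,000, a fixed random seed, and assumed forms for the three quantities (D = 1 − tanh((X − voxel mean) / α), U = D^λ / Σ D^λ over subjects, and G as the mean of U over voxels). The function name `fcp_contributions` is hypothetical.

```python
import math
import random

def fcp_contributions(x, alpha, lam):
    """Return G (one value per subject) for data x[voxel][subject].

    Assumed forms: D = 1 - tanh((X - voxel mean) / alpha),
    U = D**lam / sum_k(D_k**lam), G = mean of U over voxels.
    """
    n_vox, n_sub = len(x), len(x[0])
    g = [0.0] * n_sub
    for row in x:
        mean_i = sum(row) / n_sub
        powered = [(1.0 - math.tanh((v - mean_i) / alpha)) ** lam for v in row]
        total = sum(powered)
        for j, p in enumerate(powered):
            g[j] += p / total / n_vox
    return g

random.seed(0)  # fixed seed so the sketch is reproducible
# Reduced problem size; index 19 corresponds to "subject 20".
n_vox, n_sub, outlier, q = 3000, 38, 19, 0.05
x = [[random.gauss(0.0, 1.0) for _ in range(n_sub)] for _ in range(n_vox)]
for i in range(int(q * n_vox)):
    x[i][outlier] = random.gauss(3.0, 1.0)  # outlier voxels drawn from N(3, 1)

g = fcp_contributions(x, alpha=3.0, lam=-4.0)
largest_other = max(v for j, v in enumerate(g) if j != outlier)
print(f"G for the outlier subject:  {g[outlier]:.4f}")
print(f"Largest G among the others: {largest_other:.4f}")
print(f"Expected G with no outlier: {1.0 / n_sub:.4f}")
```

Setting `q` to zero gives one sample of G under the null hypothesis of no outliers; repeating this over many seeds would approximate the null distribution.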
Although these simulations used normally distributed data, clustering with fuzzy logic does not assume normality of the data. The only assumption we made was the existence of outliers far from the group mean (irrespective of the distribution of the population) and simply translated this assumption to a resemblance measure D.
Results
Simulated data
Influence of λ
As shown in Fig. 4, parameter λ affects the assessment of degree of membership U. Therefore, the coefficient G also depends on this parameter. Fig. 5A illustrates the influence of λ on G values when an outlier subject contained voxels with an effect at three standard deviations from the mean. When λ is small in absolute value (e.g., − 1 or − 2), the difference between the G value of the outlier subject and the G values of other subjects is low, suggesting low discrimination. This is due to the fact that U values are very similar (near to 1/Nsub) when λ tends to zero (Fig. 4). On the other hand, the ROC analyses of sensitivity and specificity at the voxel level showed low sensitivity when λ is too high in absolute value (e.g., − 10). This might be explained by the fact that voxel clustering becomes categorical (i.e., U = 0|1) when λ tends to − ∞ and can miss outlier voxels in the presence of noise, as illustrated by our simulations. Empirically, intermediate values of λ (e.g., − 4) appear the best compromise between high sensitivity at the voxel level and high sensitivity at the subject level. The same conclusions were reached when the effect of an outlier is very far from the mean of the group (see Fig. 5B). When applying FCP to real data, we therefore set λ equal to − 4.
Distribution of G
Fig. 6A shows the distribution of G when all subjects are comparable (i.e., no outliers). Over 10,000 simulations, G values are stable with a mean equal to 1/Nsub (0.026) and a standard deviation of 0.0002. Consequently, we can define a confidence interval with 38 subjects such that G values within the interval [0.025, 0.027] indicate that all subjects behave similarly. In other words, a subject with a G value of more than 0.027 could be considered atypical. Fig. 6B illustrates the mean G values over all realizations for each subject, for different group sizes. For each number of subjects Nsub, a threshold can be computed for G (see gray curve in Fig. 6B). In addition, the sensitivity of coefficient G to the presence of an outlier subject is shown in Figs. 6C and D. For instance, when 1% (Fig. 6C) or 0.2% (Fig. 6D) of voxels of a given subject are atypical, the G coefficient identifies it (the G value of the outlier subject is higher than 0.027). Critically, Fig. 6D suggests that our method can identify outlier subjects even if the outlier effect is present only in a limited number of voxels (e.g., here in 0.2% of 100,000 voxels).
Object naming data
Contribution (global) — G
Group activation for the main effect of object naming, relative to the non-object baseline, is shown in Fig. 7A. Positive activations were observed in bilateral fusiform, inferior occipital gyri, cerebellum and SMA, with left lateralized effects in pre-central, inferior frontal, and middle temporal gyri. Negative activations were observed in bilateral inferior and superior parietal regions, precuneus, posterior cingulate and superior frontal gyri, with right lateralized effects in inferior temporal and pre-central gyri. These regions have been observed in previous studies with object naming tasks; see Price et al., 2005 for review. We also illustrate the percentage overlap between thresholded individual maps (Fig. 7B). Basically, these maps represent how often each voxel has been observed as “activated” across subjects at a given individual threshold (p < 0.01, uncorrected). The common voxels between subjects are not surprisingly less frequent than voxels that have been observed in only one or two subjects, suggesting that, across subjects, activated regions are variable in size, localization and statistical significance.
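The overlap map in Fig. 7B amounts to counting, at each voxel, the percentage of subjects whose individual map is suprathreshold. A minimal sketch, assuming that each individual map has already been thresholded into a binary vector; the four toy maps and the function name `percent_overlap` are hypothetical.

```python
def percent_overlap(binary_maps):
    """Percentage of subjects in which each voxel is suprathreshold."""
    n_sub = len(binary_maps)
    n_vox = len(binary_maps[0])
    return [100.0 * sum(m[v] for m in binary_maps) / n_sub for v in range(n_vox)]

# Hypothetical thresholded maps: 4 subjects (rows) by 5 voxels (columns).
maps = [
    [1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 1, 0, 1, 0],
]
overlap = percent_overlap(maps)
print(overlap)
```

In this toy example, only the first voxel is common to all subjects, whereas two voxels are suprathreshold in a single subject.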
Interestingly, the contribution of each subject, as represented by G, is variable across subjects (Fig. 8). As demonstrated above using simulated data, given that there were 38 subjects, each subject would be expected to contribute equivalently with G values less than 0.027. Some subjects have high G values when α is positive (e.g., subjects 17, 25, 37, and 38), suggesting that regional activation in these subjects is higher than in the other subjects. For example, subject 37 has a G value of 0.09, which suggests that this subject has proportionally higher activation compared to other subjects. Likewise, some subjects have high G values when α is negative (e.g., subjects 3, 19, 33, and 35), suggesting that these subjects have very low activation in some of the regions activated by the group as a whole. For example, subject 19 has a G value of 0.07, which suggests that this subject has a globally low activation level in a relatively large number of voxels, compared to the other subjects.
Contribution (local) — U
In addition to this global measure, our approach identifies local outliers (at the voxel level). Fig. 9 illustrates some of the cortical regions with very different levels of activation in one subject compared to the others. For instance, subject 17 had higher activation in the right supramarginal gyrus (MNI coordinates: x = 58, y = − 30, z = 36) than all the other subjects. In contrast, subject 3 had lower activation in the left inferior occipito-temporal cortex (x = − 50, y = − 60, z = − 10). These regions have high U values (i.e., U > 0.9), which suggests that activation in these subjects is very far from the mean of the group.
Discussion
We have described a new clustering approach, FCP, to identify activations that are driven by one subject relative to the others. Our exploratory analysis for multi-subject fMRI data provides an objective characterization of inter-individual variability. This is usually difficult to achieve by visual inspection alone, particularly when the number of subjects is large. Previous reports have described different ways to discount (i.e., down-weight) the influence of outliers, during first or second-level analyses of fMRI data using, for instance, robust regression approaches (e.g., Diedrichsen and Shadmehr, 2005; Wager et al., 2005). Our method is motivated by the unusual view that outliers are interesting; FCP characterizes variability in individual functional maps to determine whether the variance is meaningful or not. We have illustrated the performance of FCP in a relatively large number of subjects performing an overt naming task. Below, we discuss some potential applications for the analysis of single-case patient studies and the characterization of normal inter-subject variability. We also describe ways that the analysis can be adapted to address specific questions.
Single-case patient studies
In clinical fMRI, the identification of activations that are driven by patients more or less than groups of control subjects is important, for example, when determining the effect of brain damage on neuronal responses and the mechanisms supporting recovery (e.g., Crinion and Price, 2005; Fernandez et al., 2004; Naeser et al., 2004; Seghier et al., 2001). As there is considerable variability in the site and extent of brain damage, conclusions are often sought by comparing activation in a patient to that in a group of control subjects (e.g., Fernandez et al., 2004; Seghier et al., 2001; Zacks et al., 2004). The sensitivity of conventional parametric statistical tests (e.g., the two sample t-test) in these analyses depends on the variance within the control group as well as the degree to which the patient's activation differs from the mean response. If variance in the control group is high (i.e., inflated by outlier values), then atypical activations in the patient may not be detected, even when the patient response lies outside the range of typical responses. We will demonstrate this in an application paper.
Our FCP approach allows a constrained and directed exploration of the data, which can be used to identify where the patient activation pattern is fundamentally different from the regions activated by control subjects. Critically, in this context, FCP facilitates a description of the data in the absence of statistically significant effects, as illustrated here with artificial data (Fig. 6). For example, when a patient has damage to a region that is activated by groups of control subjects, the expectation is that activation in this region will be significantly less in the patient than controls. However, even if the patient activation is atypically low, it will not be significantly different from controls, if the variance within the control group is high. The advantage of using FCP to explore individual variability is that it searches all the voxels (in the SPM{F} or the whole brain), rather than being constrained to a priori regions of interest. Thus, FCP can identify the full set of regions (neural system) where patient activation is atypically high (or low) compared to the controls. Finally, because the identification of regions showing outlier responses is based on fuzzy clustering, there is no multiple comparison problem: U and G are “relative” measures (e.g., Kandel et al., 1995) and there is no categorical declaration of significance of the sort applied to SPMs.
Atypical activations in healthy populations
There are many sources of variability in normal activation patterns that may not be predicted a priori (e.g., Nadeau et al., 1998). These include the availability of different sensorimotor or cognitive strategies for the same task (see Edwards et al., 2005; Noppeney et al., 2004; Price and Friston, 2002; Speer et al., 2003; Tsukiura et al., 2005) and the influence of different behavioral and demographic variables (e.g., Gron et al., 2000; Iaria et al., 2003; Rimol et al., 2006; Rypma et al., 2005). The FCP approach is designed specifically to identify effects that are driven by one or a few subjects. It is also useful for highlighting subtle technical problems that were not apparent during pre-processing or first-level statistical analyses. However, FCP is not designed to look at subgroups of subjects that differ in their cognitive approach to a task. Other approaches are being developed for the classification of subgroups, including those based on Gaussian Mixture Modeling and Bayesian model comparison procedures (Noppeney et al., 2006; see also Bogorodzki et al., 2005; Zhong et al., 2004). In summary, although our approach is not optimized for classifying subjects into subgroups, it can be used to identify (i) subjects with high G values who ‘globally’ drive a significant number of ‘distributed’ voxels, and (ii) subjects with high U values who ‘locally’ drive specific cortical regions.
Adapting the analysis to the question of interest
Our FCP approach is intrinsically parameterized by different factors with fixed numbers of clusters (set to the number of subjects). Principally, the factor α is a tuning parameter that allows the user to adjust the sensitivity of the method to outliers (as detailed in Fig. 3). In addition, one can use alternative similarity measures (D), including the Pearson correlation distance, the hyperbolic correlation distance (Fadili et al., 2000; Golay et al., 1998) or the Cook distance (Cook and Weisberg, 1982). The choice of the similarity measure is obviously related to the definition of the prototypes, based here on the known high sensitivity and influence of regression analysis to outlier values (Devlin et al., 1975; Stevens, 1984). Likewise, the definition of the degree of contribution of each subject (G) can also be reformulated with other measures. The principal motivation here for G was to represent, in one measure, how much a subject contributes to the activation at all voxels. Moreover, the quantification of the degree of membership (U) could also be modified to include spatial constraints, in particular, to take into account the spatial dependency between each voxel and its neighbors (e.g., Ahmed et al., 2002; Liew et al., 2000).
Comparison with previous methods
Several approaches have been proposed previously to deal with the presence of outliers. Wager et al. have used robust statistics to down-weight the influence of outliers and assess the mean effects across subjects more accurately (Wager et al., 2005). In the same way, a diagnosis suite, called SPMd, identifies outlier scans at the first (within-subject) level by examining the stability of the fMRI signal over time (Luo and Nichols, 2003) and then removing outlier scans before statistical analysis. This method has been used recently in the context of multi-subject analysis to identify outlier sessions and subjects (Zhang et al., 2006), specifically by computing an “outlier rate” at the global level and exploring the normality of data with the Shapiro–Wilk statistics. With SPMd, it is also possible to generate voxel-specific measures that indicate how far a given subject is from the group mean. The main perspective of these methods is to down-weight or remove effects that are far from other “normal” effects (Luo and Nichols, 2003; Wager et al., 2005). This contrasts with our approach, which “targets” outlier effects in order to characterize them in further analyses. Other approaches have used alternative methods to identify outlier subjects. For instance, Kherif et al. (2003) proposed temporal and spatial similarity measures in order to assess the similarity between subjects before group analysis. This approach, based on a multivariate analysis framework and applied in a multi-contrasts context, assesses the relative inter-subject distance using multidimensional scaling tools. It also uses Cook's test, at the global level, to identify outlier subjects. 
Moreover, other similarity measures based on independent component analysis (ICA) have been proposed in the context of multi-subject fMRI analysis, including the mutual correlation coefficient between estimated independent components (Esposito et al., 2005) and components from tensor probabilistic independent component analysis (Beckmann and Smith, 2005).
To compare our FCP approach to others, we re-analyzed our data using SPMd (Luo and Nichols, 2003) and the “distance” toolbox provided by Kherif et al. (2003). With respect to Luo and Nichols' method, we used the recent version “spmd2”, which was developed initially for first (within-subject)-level data diagnosis to ensure the stability of fMRI signals over time. We computed different rates following the multi-subject study of Zhang et al. (2006) and compared the “outlier rate” from SPMd with our global G values, as illustrated in Fig. 10A. This demonstrated a number of consistencies between the two approaches. For example, subjects 37, 2, 19, 1, 38 and 29 dominate the activation across voxels, as indicated by our method (Fig. 8). However, we noticed that (i) at the global level, SPMd did not distinguish between outlier effects that are below the mean and those that are above the mean, which may be important when comparing patients to controls, and (ii) there is no quantitative interpretation of this rate, unlike the FCP approach (G values under the null hypothesis are equal to 1/Nsub, see Fig. 6). SPMd generates a normality diagnostic image based on Shapiro–Wilk statistics. This allows voxels violating normality (outliers) to be identified. Then, individual images are generated to assess how far a subject is from the group mean at a given voxel (equivalent to our U value). The assessment of these images by SPMd is based on the assumption that the population effect is normal. In contrast, our FCP does not assume normality.
We also tested the new “distance” toolbox of Kherif et al. (2003) on our 38 subjects. We first displayed the mean distance, which indicates how far a given subject is from the group (i.e., the mean of the distances between that subject and the other subjects). As shown in Fig. 10B, the mean distance plot suggested that there were no outlier subjects (this was also the case according to the Cook test with the default cut-off of 0.5). Fig. 10B also indicated that some subjects with a high mean distance had high G values with our FCP method (e.g., subjects 1, 29 and 33). However, other subjects were not concordant with our results (e.g., subjects 8, 24 and 32). The discrepancy is due mainly to the fact that our method is based on a voxel-by-voxel analysis, whereas the approach in the Kherif et al. toolbox is multivariate. Note also that the distance measure employed does not distinguish between effects that are below the group mean and those that are above it. Moreover, measures at the voxel level (as in our local measure U) are currently not available in this toolbox.
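The point that a mean inter-subject distance cannot separate high from low outliers can be checked with a small Python sketch. Plain Euclidean distance between whole-brain effect vectors is assumed here purely for illustration; it is not the similarity measure implemented in the Kherif et al. toolbox, and the function name and toy data are hypothetical.

```python
import numpy as np


def mean_pairwise_distance(effects):
    """Mean Euclidean distance from each subject to all other subjects.

    effects: array of shape (n_subjects, n_voxels) (hypothetical layout).
    """
    effects = np.asarray(effects, dtype=float)
    n_sub = effects.shape[0]
    # Pairwise differences between subjects, then Euclidean norm over voxels.
    # Fine for a toy example; real images would need a memory-friendlier form.
    diff = effects[:, None, :] - effects[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    # The diagonal is zero, so dividing by (n_sub - 1) averages over the others.
    return dist.sum(axis=1) / (n_sub - 1)


# Toy data (assumed): six subjects, four voxels, four "typical" subjects at zero.
effects = np.zeros((6, 4))
effects[0] = 3.0    # subject 0 responds above the group at every voxel
effects[1] = -3.0   # subject 1 responds below the group at every voxel
mean_dist = mean_pairwise_distance(effects)
```

Subjects 0 and 1 obtain the same mean distance even though they deviate in opposite directions, and any voxel-specific information is lost once the distance is summed over voxels.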
In summary, although outliers can be identified by existing approaches, our approach based on fuzzy set theory is fundamentally different because (i) it acts directly on second (between-subject)-level summary statistics; (ii) it uses fuzzy logic theory, which may be more appropriate for vague and ambiguous concepts like typicality and outliers; (iii) it models all subjects explicitly as potential outliers by fixing the prototypes; (iv) it employs a robust local similarity measure (D) at the voxel level that can be adapted easily to other contexts; (v) it provides a way to identify outlier subjects (i.e., with the G value) at the global level and specifies whether this outlier effect is below or above the group mean (i.e., the sign of parameter α); and (vi) it furnishes a local measure (i.e., a U value) that allows voxels with atypical activation levels to be identified in each subject.
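To make the idea of non-iterative membership with fixed prototypes concrete, the following Python sketch applies the textbook fuzzy c-means membership formula (Bezdek, 1981) once, with the prototypes held fixed. It is an illustration only: the exact definitions of D, U and G used by FCP are those given in the Methods and may differ from this generic formula, and the function name and fuzziness index m = 2 are assumptions.

```python
import numpy as np


def fixed_prototype_memberships(distances, m=2.0):
    """Textbook FCM memberships of one data point to c fixed prototypes.

    distances: length-c sequence of non-negative distances to each prototype.
    m: fuzziness index (> 1); m = 2 is an assumed, commonly used value.
    """
    d = np.asarray(distances, dtype=float)
    zero = d == 0
    if zero.any():
        # The point coincides with one or more prototypes: they share
        # full membership and all other prototypes get zero.
        return zero / zero.sum()
    # u_i = d_i^(-2/(m-1)) / sum_j d_j^(-2/(m-1)); no iteration is needed
    # because the prototypes are never updated.
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum()


# With one prototype per subject and no subject standing out (equal
# distances), every membership equals 1/Nsub.
n_sub = 38
u_null = fixed_prototype_memberships(np.full(n_sub, 2.0))

# A closer prototype receives a larger membership.
u_two = fixed_prototype_memberships([1.0, 2.0])
```

Under this generic formula, equal distances yield memberships of 1/Nsub, which mirrors the null value of 1/Nsub quoted for G above; memberships always sum to one across prototypes.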
Conclusion
Here, we have presented a new approach that identifies subject-driven activations in fMRI data. This method could be used as an exploratory approach in multi-subject fMRI studies. Its sensitivity is illustrated here with both synthetic and real data from a relatively large number of healthy right-handed subjects performing an object naming task. Future investigations will explore the specificity of such approaches in group studies when activation is expected to be heterogeneous, for example in left-handed, multilingual, pediatric or diseased populations.
Acknowledgments
This work was funded by the Wellcome Trust and the James S. McDonnell Foundation (conducted as part of the NRG initiative). The authors would like to thank Will Penny and Ferath Kherif for their valuable suggestions on the manuscript.
Footnotes
In statistics, observations that are far outside the norm of a population have been described under different names, including unrepresentative, influential, “rogue”, contaminant, “maverick”, outlier or fringe-lier observations. For the rest of the paper, we will use the term “outlier” to designate this kind of observation.
References
- Ahmed M.N., Yamany S.M., Mohamed N., Farag A.A., Moriarty T. A modified fuzzy c-mean algorithm for bias field estimation and segmentation of MRI data. IEEE Trans. Med. Imaging. 2002;21:193–199. doi: 10.1109/42.996338.
- Barnett V., Lewis T. Wiley; New York: 1978. Outliers in Statistical Data.
- Baumgartner R., Windischberger C., Moser E. Quantification in functional magnetic resonance imaging: fuzzy clustering vs. correlation analysis. Magn. Reson. Imaging. 1998;16:115–125. doi: 10.1016/s0730-725x(97)00277-4.
- Beckman R.J., Cook R.D. Outlier...........s. Technometrics. 1983;25:119–149.
- Beckmann C.F., Smith S.M. Tensorial extensions of independent component analysis for multisubject FMRI analysis. NeuroImage. 2005;25:294–311. doi: 10.1016/j.neuroimage.2004.10.043.
- Belsley D.A., Kuh E., Welsch R.E. Wiley; New York: 1980. Regression diagnostics.
- Benali H., Mattout J., Pelegrini-Issac M. Multivariate group effect analysis in functional magnetic resonance imaging. Inf. Process Med. Imaging. 2003;18:548–559. doi: 10.1007/978-3-540-45087-0_46.
- Bezdek J.C. Plenum Press; New York: 1981. Pattern Recognition with Fuzzy Objective Functions Algorithms.
- Bezdek J.C., Hall L.O., Clark M.C., Goldgof D.B., Clarke L.P. Medical image analysis with fuzzy models. Stat. Methods Med. Res. 1997;6:191–214. doi: 10.1177/096228029700600302.
- Bogorodzki P., Rogowska J., Yurgelun-Todd D.A. Structural group classification technique based on regional fMRI BOLD responses. IEEE Trans. Med. Imaging. 2005;24:389–398. doi: 10.1109/tmi.2004.843183.
- Bosch V. Statistical analysis of multi-subject fMRI data: assessment of focal activations. J. Magn. Reson. Imaging. 2000;11:61–64. doi: 10.1002/(sici)1522-2586(200001)11:1<61::aid-jmri9>3.0.co;2-c.
- Bradlow E.T., Zaslavsky A.M. Case influence analysis in Bayesian inference. J. Comput. Graph. Stat. 1997;6:314–331.
- Buerki M., Lovblad K.O., Oswald H., Nirkko A.C., Stein P., Kiefer C., Schroth G. Multiresolution fuzzy clustering of functional MRI data. Neuroradiology. 2003;45:691–699. doi: 10.1007/s00234-003-1026-9.
- Chaloner K., Brant R. A Bayesian approach to outlier detection and residual analysis. Biometrika. 1988;75:651–659.
- Cook R.D., Weisberg S. Chapman-Hall; London: 1982. Residuals and Influence in Regression.
- Crinion J., Price C.J. Right anterior superior temporal activation predicts auditory sentence comprehension following aphasic stroke. Brain. 2005;128:2858–2871. doi: 10.1093/brain/awh659.
- Davies L., Gather U. The identification of multiple outliers. J. Am. Stat. Assoc. 1993;88:782–792.
- Devlin S.J., Gnanadesikan R., Kettenring J.R. Robust estimation and outlier detection with correlation coefficients. Biometrika. 1975;62:531–545.
- Diedrichsen J., Shadmehr R. Detecting and adjusting for artifacts in fMRI time series data. NeuroImage. 2005;27:624–634. doi: 10.1016/j.neuroimage.2005.04.039.
- Dimitriadou E., Barth M., Windischberger C., Hornik K., Moser E. A quantitative comparison of functional MRI cluster analysis. Artif. Intell. Med. 2004;31:57–71. doi: 10.1016/j.artmed.2004.01.010.
- Edwards J.D., Pexman P.M., Goodyear B.G., Chambers C.G. An fMRI investigation of strategies for word recognition. Brain Res. Cogn Brain Res. 2005;24:648–662. doi: 10.1016/j.cogbrainres.2005.03.016.
- Esposito F., Scarabino T., Hyvarinen A., Himberg J., Formisano E., Comani S., Tedeschi G., Goebel R., Seifritz E., Di Salle F. Independent component analysis of fMRI group studies by self-organizing clustering. NeuroImage. 2005;25:193–205. doi: 10.1016/j.neuroimage.2004.10.042.
- Fadili M.J., Ruan S., Bloyet D., Mazoyer B. A multistep unsupervised fuzzy clustering analysis of fMRI time series. Hum. Brain Mapp. 2000;10:160–178. doi: 10.1002/1097-0193(200008)10:4<160::AID-HBM20>3.0.CO;2-U.
- Fadili M.J., Ruan S., Bloyet D., Mazoyer B. On the number of clusters and the fuzziness index for unsupervised FCA application to BOLD fMRI time series. Med. Image Anal. 2001;5:55–67. doi: 10.1016/s1361-8415(00)00035-9.
- Fernandez B., Cardebat D., Demonet J.F., Joseph P.A., Mazaux J.M., Barat M., Allard M. Functional MRI follow-up of language processes in health subjects and during recovery in a case of aphasia. Stroke. 2004;35:2171–2176. doi: 10.1161/01.STR.0000139323.76769.b0.
- Friston K.J., Holmes A.P., Price C.J., Buchel C., Worsley K.J. Multisubject fMRI studies and conjunction analyses. NeuroImage. 1999;10:385–396. doi: 10.1006/nimg.1999.0484.
- Gastwirth J.L. On robust procedures. J. Am. Stat. Assoc. 1966;61:929–948.
- Golay X., Kollias S., Stoll G., Meier D., Valavanis A., Boesiger P. A new correlation-based fuzzy logic clustering algorithm for fMRI. Magn. Reson. Med. 1998;40:249–260. doi: 10.1002/mrm.1910400211.
- Gron G., Wunderlich A.P., Spitzer M., Tomczak R., Riepe M.W. Brain activation during human navigation: gender-different neural networks as substrate of performance. Nat. Neurosci. 2000;3:404–408. doi: 10.1038/73980.
- Grubbs F.E. Procedures for detecting outlying observations in samples. Technometrics. 1969;11:1–21.
- Hawkins D.M. Chapman and Hall; London: 1980. Identification of Outliers.
- Hocking R.R. The analysis and selection of variables in linear regression. Biometrics. 1976;32:1–49.
- Huber P.J. John Wiley and Sons; New York: 1981. Robust Statistics.
- Iaria G., Petrides M., Dagher A., Pike B., Bohbot V.D. Cognitive strategies dependent on the hippocampus and caudate nucleus in human navigation: variability and change with practice. J. Neurosci. 2003;23:5945–5952. doi: 10.1523/JNEUROSCI.23-13-05945.2003.
- Jahanian H., Hossein-Zadeh G.A., Soltanian-Zadeh H., Ardekani B.A. Controlling the false positive rate in fuzzy clustering using randomization: application to fMRI activation detection. Magn. Reson. Imaging. 2004;22:631–638. doi: 10.1016/j.mri.2004.01.035.
- Kandel A., Martins A., Pacheco R. Discussion: on the very real distinction between fuzzy and statistical methods. Technometrics. 1995;37:276–281.
- Kherif F., Poline J.P., Mériaux S., Benali H., Flandin G., Brett M. Group analysis in functional neuroimaging: selecting subjects using similarity measures. NeuroImage. 2003;20:2197–2208. doi: 10.1016/j.neuroimage.2003.08.018.
- Krishnapuram R., Keller J.M. A possibilistic approach to clustering. IEEE Trans. Fuzzy Syst. 1993;1:98–110.
- Lazar N.A., Luna B., Sweeney J.A., Eddy W.F. Combining brains: a survey of methods for statistical pooling of information. NeuroImage. 2002;16:538–550. doi: 10.1006/nimg.2002.1107.
- Liew A.W.C., Leung S.H., Lau W.H. Fuzzy image clustering incorporating spatial continuity. IEE Proc. Vision Image Signal Process. 2000;147:185–192.
- Luo W.L., Nichols T. Diagnosis and exploration of massive univariate neuroimaging models. NeuroImage. 2003;19:1014–1032. doi: 10.1016/s1053-8119(03)00149-6.
- McNamee R.L., Lazar N.A. Assessing the sensitivity of fMRI group maps. NeuroImage. 2004;22:920–931. doi: 10.1016/j.neuroimage.2004.02.016.
- Metz C.E. Basic principles of ROC analysis. Semin. Nucl. Med. 1978;8:283–298. doi: 10.1016/s0001-2998(78)80014-2.
- Nadeau S.E., Williamson D.J., Crosson B., Gonzalez Rothi L.J., Heilman K.M. Functional imaging: heterogeneity in task strategy and functional anatomy and the case for individual analysis. Neuropsychiatry Neuropsychol. Behav. Neurol. 1998;11:83–96.
- Naeser M.A., Martin P.I., Baker E.H., Hodge S.M., Sczerzenie S.E., Nicholas M., Palumbo C.L., Goodglass H., Wingfield A., Samaraweera R., Harris G., Baird A., Renshaw P., Yurgelun-Todd D. Overt propositional speech in chronic nonfluent aphasia studied with the dynamic susceptibility contrast fMRI method. NeuroImage. 2004;22:29–41. doi: 10.1016/j.neuroimage.2003.11.016.
- Noppeney U., Friston K.J., Price C.J. Degenerate neuronal systems sustaining cognitive functions. J. Anat. 2004;205:433–442. doi: 10.1111/j.0021-8782.2004.00343.x.
- Noppeney U., Penny W.D., Price C.J., Flandin G., Friston K.J. Identification of degenerate neuronal systems based on intersubject variability. NeuroImage. 2006;30:885–890. doi: 10.1016/j.neuroimage.2005.10.010.
- Osborne J.W., Overbay A. The power of outliers (and why researchers should always check for them). Pract. Assess., Res. Eval. 2004;9:6. (http://PAREonline.net/getvn.asp?v=9&n=6)
- Penny W.D., Holmes A.P., Friston K.J. Random effects analysis. In: Frackowiak R.S.J., Friston K.J., Frith C., Dolan R.J., Price C.J., Zeki S., Ashburner J., Penny W.D., editors. Human Brain Function. Academic Press; 2003.
- Price C.J., Friston K.J. Degeneracy and cognitive anatomy. Trends Cogn. Sci. 2002;6:416–421. doi: 10.1016/s1364-6613(02)01976-9.
- Price C.J., Devlin J.T., Moore C.J., Morton C., Laird A.R. Meta-analyses of object naming: effect of baseline. Hum. Brain Mapp. 2005;25:70–82. doi: 10.1002/hbm.20132.
- Rasmussen J.L. Evaluating outlier identification tests: mahalanobis D squared and comrey Dk. Multivar. Behav. Res. 1988;23:189–202. doi: 10.1207/s15327906mbr2302_4.
- Rimol L.M., Specht K., Hugdahl K. Controlling for individual differences in fMRI brain activation to tones, syllables, and words. NeuroImage. 2006;30:554–562. doi: 10.1016/j.neuroimage.2005.10.021.
- Rousseeuw P.J., Leroy A.M. John Wiley and Sons; New Jersey, USA: 2003. Robust Regression and Outlier Detection.
- Rypma B., Berger J.S., Genova H.M., Rebbechi D., D'Esposito M. Dissociating age-related changes in cognitive strategy and neural efficiency using event-related fMRI. Cortex. 2005;41:582–594. doi: 10.1016/s0010-9452(08)70198-9.
- Seghier M., Lazeyras F., Momjian S., Annoni J.-M., de Tribolet N., Khateb A. Language representation in a patient with a dominant right hemisphere: fMRI evidence for an intrahemispheric reorganisation. NeuroReport. 2001;12:2785–2790. doi: 10.1097/00001756-200109170-00007.
- Shapiro L.S., Brady M. Rejecting outliers and estimating errors in an orthogonal-regression framework. Philos. Trans., Phys. Sci. Eng. 1995;350:407–439.
- Smith S.M., Beckmann C.F., Ramnani N., Woolrich M.W., Bannister P.R., Jenkinson M., Matthews P.M., McGonigle D.J. Variability in fMRI: a re-examination of inter-session differences. Hum. Brain Mapp. 2005;24:248–257. doi: 10.1002/hbm.20080.
- Sorenson J.A., Wang X. ROC methods for evaluation of fMRI techniques. Magn. Reson. Med. 1996;36:737–744. doi: 10.1002/mrm.1910360512.
- Speer N.K., Jacoby L.L., Braver T.S. Strategy-dependent changes in memory: effects on behavior and brain activity. Cogn. Affect. Behav. Neurosci. 2003;3:155–167. doi: 10.3758/cabn.3.3.155.
- Stevens J.P. Outliers and influential data points in regression analysis. Psychol. Bull. 1984;95:334–344.
- Svensen M., Kruggel F., Benali H. ICA of fMRI group study data. NeuroImage. 2002;16:551–563. doi: 10.1006/nimg.2002.1122.
- Tsukiura T., Mochizuki-Kawai H., Fujii T. The effect of encoding strategies on medial temporal lobe activations during the recognition of words: an event-related fMRI study. NeuroImage. 2005;25:452–461. doi: 10.1016/j.neuroimage.2005.01.003.
- Wager T.D., Keller M.C., Lacey S.C., Jonides J. Increased sensitivity in neuroimaging analyses using robust regression. NeuroImage. 2005;26:99–113. doi: 10.1016/j.neuroimage.2005.01.011.
- Wei X., Yoo S.S., Dickey C.C., Zou K.H., Guttmann C.R., Panych L.P. Functional MRI of auditory verbal working memory: long-term reproducibility analysis. NeuroImage. 2004;21:1000–1008. doi: 10.1016/j.neuroimage.2003.10.039.
- Windischberger C., Barth M., Lamm C., Schroeder L., Bauer H., Gur R.C., Moser E. Fuzzy cluster analysis of high-field functional MRI data. Artif. Intell. Med. 2003;29:203–223. doi: 10.1016/s0933-3657(02)00072-6.
- Zacks J.M., Michelon P., Vettel J.M., Ojemann J.G. Functional reorganization of spatial transformations after a parietal lesion. Neurology. 2004;63:287–292. doi: 10.1212/01.wnl.0000129844.11712.d8.
- Zadeh L.A. Fuzzy sets. Inf. Control. 1965;8:338–353.
- Zhang H., Luo W.L., Nichols T.E. Diagnosis of single-subject and group fMRI data with SPMd. Hum. Brain Mapp. 2006;27:442–451. doi: 10.1002/hbm.20253.
- Zhong N., Wu J.L., Nakamura A., Ohshima M., Mizuhara H. Peculiarity oriented fMRI brain data analysis for studying human multi-perception mechanism. Cogn. Syst. Res. 2004;5:241–256.
- Zimmerman D.W. A note on the influence of outliers on parametric and nonparametric tests. J. Gen. Psychol. 1994;121:391–401.