Biostatistics (Oxford, England). 2017 Feb 27;18(3):521–536. doi: 10.1093/biostatistics/kxw050

PCA leverage: outlier detection for high-dimensional functional magnetic resonance imaging data

Amanda F Mejia, Mary Beth Nebel, Ani Eloyan, Brian Caffo, Martin A Lindquist
PMCID: PMC5862350  PMID: 28334131

Summary

Outlier detection for high-dimensional (HD) data is a popular topic in modern statistical research. However, one source of HD data that has received relatively little attention is functional magnetic resonance imaging (fMRI), which consists of hundreds of thousands of measurements sampled at hundreds of time points. At a time when the availability of fMRI data is rapidly growing—primarily through large, publicly available grassroots datasets—automated quality control and outlier detection methods are greatly needed. We propose principal components analysis (PCA) leverage and demonstrate how it can be used to identify outlying time points in an fMRI run. Furthermore, PCA leverage is a measure of the influence of each observation on the estimation of principal components, which are often of interest in fMRI data. We also propose an alternative measure, PCA robust distance, which is less sensitive to outliers and has controllable statistical properties. The proposed methods are validated through simulation studies and are shown to be highly accurate. We also conduct a reliability study using resting-state fMRI data from the Autism Brain Imaging Data Exchange and find that removal of outliers using the proposed methods results in more reliable estimation of subject-level resting-state networks using independent components analysis.

Keywords: fMRI, High-dimensional statistics, Image analysis, Leverage, Outlier detection, Principal component analysis, Robust statistics

1. Introduction

Outliers in high-dimensional (HD) settings, such as genetics, medical imaging, and chemometrics, are a common problem in modern statistics and have been the focus of much recent research (Hubert and others, 2005; Filzmoser and others, 2008; Hadi and others, 2009; Shieh and Hung, 2009; Fritsch and others, 2012; Ro and others, 2015). One such source of HD data is functional magnetic resonance imaging (fMRI). An fMRI run usually contains 100 000–200 000 volumetric elements or “voxels” within the brain, which are sampled at hundreds of time points. Here, we consider voxels to be variables and time points to be observations, in which case the outlier problem is to identify time points that contain high levels of noise or artifacts.

Multiple noise sources related to the hardware and the participant (Lindquist and others, 2008) can corrupt fMRI data, including magnetic field instabilities, head movement, and physiological effects, such as heartbeat and respiration. Noise sources appear as high-frequency "spikes," image distortions and artifacts, and signal drift. fMRI data also undergo a series of complex preprocessing steps before being analyzed; errors during any one of these steps could introduce additional artifacts. Thus, performing adequate quality control prior to statistical analysis is critical.

In recent years, the availability of fMRI data has increased rapidly. The emergence of a number of publicly available fMRI databases, often focusing on a specific disease or disorder, presents a great opportunity to study brain function and organization. However, these datasets are usually collected from multiple sites with varying methods for acquisition, processing, and quality control, resulting in widely varying levels of quality and high rates of artifacts. In the absence of automated outlier detection methods appropriate for fMRI data, quality inspection often takes place in a manual or semi-automated manner by individual research groups. This presents a timely opportunity for statisticians to develop more automated methods.

Here we propose an HD outlier detection method based on dimension reduction through principal components analysis (PCA) and established measures of outlyingness, namely leverage and robust distances. While leverage has not typically been employed for outlier identification outside of a regression framework, we argue for leverage as a meaningful measure when the principal components (PCs) are themselves of interest, which is often true for fMRI data.

Several outlier detection methods for standard and HD data use PCA, including PCA influence functions and other PC sensitivity measures (Brooks, 1994; Gao and others, 2005). However, these methods are often computationally demanding, as they rely on re-estimating the PCs with each observation left out. Similarly, methods that depend on robust covariance estimation (see Hadi and others, 2009 for a review) are usually not suited for HD settings. One such method, the minimum covariance determinant (MCD) estimator, identifies the observation subset whose sample covariance matrix has the smallest determinant (Rousseeuw, 1985). Hubert and others (2005) proposed ROBPCA, a robust PCA method for HD data that can also identify outliers, which lie far from the robust PC space. Filzmoser and others (2008) proposed PCOut and Sign, two computationally efficient methods that robustly scale the data, perform standard PCA, and look for outliers within the principal directions explaining 99% of the variance. Ro and others (2015) proposed the minimum diagonal product estimator, which is related to the MCD but ignores off-diagonal elements and is identifiable when there are more variables than observations. Fritsch and others (2012) proposed an HD adaptation of the MCD through regularization and applied the method to neuroimaging summary statistics.

However, such methods are often validated using only moderately sized data containing more observations than variables. One exception comes from Shieh and Hung (2009) who proposed identifying outlying genes in microarray data by performing PCA dimension reduction prior to robust distance computation on the reduced data. The method was validated on a dataset of approximately 100 observations and 2000 variables and shown to result in fewer false positives and false negatives than ROBPCA.

Existing methods for fMRI artifact identification have focused on head motion and ad-hoc measures of quality. While the removal of affected time points (“scrubbing” or “spike regression”) using these methods appears beneficial (Satterthwaite and others, 2013; Power and others, 2014), a more unified outlier detection framework is needed, as motion is only one potential artifact source in fMRI data. In addition, existing methods result in a collection of measures that must somehow be combined. We propose a single measure of outlyingness related to the influence of each time point on PC estimation, which is the basis of several common brain connectivity measures (see Section 2.2).

The remainder of this paper is organized as follows. We begin with a description of our statistical methodology. We then present a simulation study, which is used to assess the sensitivity and specificity of the proposed methods. Next, we present a reliability analysis employing the Autism Brain Imaging Data Exchange (ABIDE) dataset. We conclude with a brief discussion.

2. Methods

As described in detail below, we propose two PCA-based measures of outlyingness, PCA leverage and PCA robust distance, and develop thresholding rules to label outliers using either measure. For both measures, we begin with PCA dimension reduction. All computations are performed in the R statistical environment version 3.1.1 (R Core Team, 2014).

2.1. Dimension reduction

Let $T$ be the number of 3D "volumes" collected over time in an fMRI run, and let $V$ be the number of voxels in the brain. Let $\mathbf{Y}$ ($T \times V$) represent an fMRI run, where each row of $\mathbf{Y}$ is a vectorized volume. We first center and scale each column of $\mathbf{Y}$ relative to its median and median absolute deviation (Hampel and others, 1986), respectively, to avoid the influence of outliers. The singular value decomposition (SVD) (Golub and Reinsch, 1970) of $\mathbf{Y}$ is given by $\mathbf{Y} = \mathbf{U}\mathbf{D}\mathbf{V}'$, where $\mathbf{D}$ is diagonal with elements $d_1 \ge d_2 \ge \dots \ge d_T \ge 0$. Here $\mathbf{A}'$ denotes the transpose of matrix $\mathbf{A}$. The rows of $\mathbf{V}'$ contain the PCs or eigenimages of $\mathbf{Y}$, and the columns of $\mathbf{U}\mathbf{D}$ contain the corresponding PC scores. Note that to avoid memory limitations, rather than compute the SVD of $\mathbf{Y}$ directly, one generally computes the eigendecomposition of the $T \times T$ matrix $\mathbf{Y}\mathbf{Y}'$ to obtain $\mathbf{U}$ and $\mathbf{D}^2$ and then solves for $\mathbf{V}' = \mathbf{D}^{-1}\mathbf{U}'\mathbf{Y}$.
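In practice $V$ far exceeds $T$, so the Gram-matrix shortcut above matters. The paper's own computations were carried out in R; the following Python/NumPy sketch (notation and function names are ours, purely illustrative) shows the same steps: robust centering and scaling by median and MAD, then an SVD recovered from the small $T \times T$ matrix.

```python
import numpy as np

def robust_scale(Y):
    """Center and scale each column by its median and MAD."""
    med = np.median(Y, axis=0)
    mad = np.median(np.abs(Y - med), axis=0)
    mad[mad == 0] = 1.0  # guard against constant voxels
    return (Y - med) / mad

def svd_via_gram(Y):
    """SVD of a T x V matrix with T << V, via the T x T Gram matrix."""
    G = Y @ Y.T                            # T x T, cheap to store
    evals, U = np.linalg.eigh(G)           # ascending eigenvalues
    order = np.argsort(evals)[::-1]
    evals, U = evals[order], U[:, order]
    d = np.sqrt(np.clip(evals, 0, None))   # singular values
    pos = d > 1e-10
    # Recover V' = D^{-1} U' Y; rows of Vt are the PCs (eigenimages)
    Vt = (U[:, pos] / d[pos]).T @ Y
    return U[:, pos], d[pos], Vt

rng = np.random.default_rng(0)
Y = robust_scale(rng.normal(size=(50, 2000)))
U, d, Vt = svd_via_gram(Y)
# Agrees with the direct (memory-hungry) SVD of Y
assert np.allclose(np.sort(d)[::-1],
                   np.linalg.svd(Y, compute_uv=False)[:len(d)])
```

The Gram route costs $O(T^2 V)$ memory-light operations instead of factorizing the full $T \times V$ matrix at once.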

We retain $Q$ principal components, so that the "reduced data" are given by the submatrices of $\mathbf{U}$ and $\mathbf{V}'$ corresponding to the first $Q$ principal components. For ease of notation, we redefine $\mathbf{U}$ and $\mathbf{V}'$ to represent these submatrices, and $\mathbf{D}$ the corresponding $Q \times Q$ diagonal submatrix. To choose the model order $Q$, we retain only components with a greater-than-average eigenvalue. While more sophisticated cutoff methods exist, we find that this simple cutoff rule works well in practice (Jackson, 1993). To avoid extreme solutions, we require $1 \le Q < T$.
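The greater-than-average-eigenvalue rule is simple to implement; a minimal sketch (our illustration, not the authors' code), taking the singular values of the scaled data as input:

```python
import numpy as np

def choose_Q(d):
    """Retain components whose eigenvalue (squared singular value)
    exceeds the average eigenvalue; keep at least one component."""
    evals = np.asarray(d, dtype=float) ** 2
    Q = int(np.sum(evals > evals.mean()))
    return max(Q, 1)

# Example: singular values from a hypothetical decomposition.
# Eigenvalues are [100, 36, 9, 1, 0.25, 0.04]; the mean is ~24.4,
# so only the first two components are retained.
d = np.array([10.0, 6.0, 3.0, 1.0, 0.5, 0.2])
Q = choose_Q(d)  # Q == 2
```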

2.2. PCA leverage

In regression, leverage is defined as the diagonal elements of the "hat matrix" $\mathbf{H} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'$, where $\mathbf{X}$ is a matrix of explanatory variables (Neter and others, 1996). The hat matrix projects the outcome variable(s) $\mathbf{Y}$ onto the column space of $\mathbf{X}$, yielding the projected data $\hat{\mathbf{Y}} = \mathbf{H}\mathbf{Y}$. Leverage, bounded between $0$ and $1$, is often used to assess the potential influence of an observation on the regression fit, as it is the change in $\hat{y}_i$ due to a 1-unit change in $y_i$ and is proportional to the uncertainty in the $\hat{y}_i$ estimate, since $\mathrm{var}(\hat{\mathbf{Y}}) = \sigma^2\mathbf{H}$. Particularly relevant to our context, leverage is also a measure of outlyingness among the explanatory variables, as it is related to the Mahalanobis distance.

Extending leverage to the PCA context, we treat $\mathbf{U}$ as an estimated design matrix in the estimation of $\mathbf{V}'$. With $\mathbf{U}$ and $\mathbf{D}$ fixed, $\mathbf{D}\mathbf{V}'$ is equivalent to the least squares estimate $\hat{\mathbf{B}}$ in the multivariate regression model $\mathbf{Y} = \mathbf{U}\mathbf{B} + \mathbf{E}$. We therefore define PCA leverage as the diagonal elements $h_{ii}$ of $\mathbf{H} = \mathbf{U}(\mathbf{U}'\mathbf{U})^{-1}\mathbf{U}' = \mathbf{U}\mathbf{U}'$, where the simplification follows because the columns of $\mathbf{U}$ are orthonormal. Note that $\mathbf{D}$ is simply a scaling factor and therefore has no effect on leverage. Continuing the regression analogy, in PCA, $\mathbf{H}$ projects $\mathbf{Y}$ onto the column space of $\mathbf{U}$, the principal directions, as $\hat{\mathbf{Y}} = \mathbf{H}\mathbf{Y} = \mathbf{U}\mathbf{D}\mathbf{V}'$. Furthermore, PCA leverage is a measure of outlyingness among the PCA scores and within $\mathbf{Y}$, since $h_{ii} = \sum_{q=1}^{Q} u_{iq}^2$. While in reality $\mathbf{U}$ and $\mathbf{D}$ are not fixed and PCA leverage therefore only approximately represents the influence of each observation on the PCs and fitted values in $\hat{\mathbf{Y}}$, we find this approximation to be quite close in practice, as illustrated in Figure 1. Note that dimension reduction is essential for PCA leverage to be informative, since $\mathbf{H} = \mathbf{I}_T$ when all $T$ components are retained.
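PCA leverage can be computed directly from the left singular vectors of the (already centered and scaled) data. The following sketch (our illustration, not the authors' implementation) plants an outlying time point and checks that the leverages sum to the number of retained components:

```python
import numpy as np

def pca_leverage(Y, Q):
    """PCA leverage: diagonal of H = U U' for the first Q left
    singular vectors of the centered/scaled data matrix Y."""
    U, d, Vt = np.linalg.svd(Y, full_matrices=False)
    UQ = U[:, :Q]
    return np.sum(UQ ** 2, axis=1)  # h_ii = sum_q u_iq^2

rng = np.random.default_rng(1)
T, V, Q = 100, 500, 5
Y = rng.normal(size=(T, V))
Y[40] += 5.0  # an artificial outlying time point

h = pca_leverage(Y, Q)
# Leverages sum to Q, and the corrupted time point stands out
assert np.isclose(h.sum(), Q)
assert h.argmax() == 40
```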

Fig. 1.


Top panel. For one randomly sampled subject, 50 contiguous time points and 200 contiguous voxels were randomly selected. For the resulting dataset $\mathbf{Y}$ ($50 \times 200$), five PCs were identified, and PCA leverage was computed for each time point, displayed in the column on the left. After centering and scaling $\mathbf{Y}$, each observed value $y_{ij}$ was increased by one unit and the PCs and scores recomputed. The matrix displayed on the right shows the resulting change in the fitted value $\hat{y}_{ij}$, where $\hat{\mathbf{Y}} = \mathbf{U}\mathbf{D}\mathbf{V}'$. Although some variation is seen across voxels, the observed change in fitted values is overall quite similar to the leverage. Other randomly sampled subjects show similar patterns, supporting the analogy with regression (where the relationship is exact) and the concept of PCA leverage as a measure of influence in PCA. Bottom panel. We performed the analysis described above for a number of randomly sampled subjects, then computed the average change in fitted values across voxels, repeating the analysis for several choices of the number of PCs retained. The plot displays the PCA leverage and average change in fitted values for each subject, as well as a linear smoother across subjects. PCA leverage and the average change in fitted values are nearly equal, again supporting PCA leverage as a measure of influence in PCA.

In regression, leverage only represents the potential influence of an observation on regression coefficient estimation; influence points must be outliers in the explanatory variables ("leverage points") as well as in the response variable(s). In contrast, PCA leverage is a more direct measure of influence, as PCA leverage points are outliers in both $\mathbf{U}$ and the original data $\mathbf{Y}$. Thus, PCA leverage points are also influence points. Furthermore, while in regression we discern "good" from "bad" leverage points, fMRI observations with high PCA leverage are unlikely to represent true signal, since the signal change associated with neuronal sources is very small compared with noise and artifacts. Therefore, we assume that all observations with high PCA leverage are "bad" influence points in the fMRI context.

Moreover, the interpretation of PCA leverage as the influence of each observation on PC estimation is particularly relevant for resting-state fMRI (rs-fMRI). PC estimation is a preprocessing step for one of the most common types of analysis for rs-fMRI data: estimation of spatially independent brain networks and the functional connectivity between those networks. In such analyses, PCA leverage is both a measure of influence on the quantity of interest and of outlyingness.

In setting a leverage threshold to identify outliers, it is important to recognize that the sum of the leverages equals the number of variables in the design matrix, here $Q$. The mean leverage of all $T$ observations is therefore fixed at $Q/T$. Outliers wield a large amount of leverage, and their presence reduces the leverage of all remaining observations, such that the mean $Q/T$ may be significantly greater than the leverage of normal observations. Thus, the median is a more appropriate reference quantity than the mean for normal observations. If the leverage of observation $i$ exceeds a fixed multiple of the median leverage, it is labeled a "leverage outlier." In the simulations and experimental data analysis described below, we consider several choices of this multiplier.
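The median-based rule is a one-liner; in this sketch the multiplier value 3 is our illustrative choice, not one prescribed by the text:

```python
import numpy as np

def leverage_outliers(h, multiplier=3.0):
    """Label as outliers the time points whose leverage exceeds a
    multiple of the median leverage. The multiplier is a tuning
    parameter; the text compares several values."""
    return h > multiplier * np.median(h)

h = np.array([0.04, 0.05, 0.05, 0.06, 0.50, 0.05])
flags = leverage_outliers(h, multiplier=3.0)
# Only the fifth time point (leverage 0.50) exceeds 3 x median = 0.15
```

Because the median is resistant to a minority of extreme leverages, the cutoff is not dragged upward by the very outliers it is meant to catch, unlike a mean-based rule.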

While such practical rules may work well in the absence of convenient statistical properties, a formal statistical test for outliers with known and controllable properties is desirable. In the following section, we propose an alternative robust distance measure based on MCD estimators (Rousseeuw, 1985).

2.3. Principal components robust distance

For a design matrix with an intercept or centered variables, leverage is related to the squared empirical Mahalanobis distance (Mahalanobis, 1936), defined for an $n \times p$ matrix $\mathbf{X}$ and observation $\mathbf{x}_i$ as $d_i^2 = (\mathbf{x}_i - \bar{\mathbf{x}})' \mathbf{S}^{-1} (\mathbf{x}_i - \bar{\mathbf{x}})$, where $\bar{\mathbf{x}}$ and $\mathbf{S}$ are the sample mean and covariance matrix of $\mathbf{X}$, respectively. The Mahalanobis distance is known to be sensitive to outliers due to their influence on the sample mean and covariance, which may lead to "masking," a phenomenon in which truly outlying observations appear normal due to the presence of more extreme outliers (Rousseeuw and Van Zomeren, 1990; Rousseeuw and Hubert, 2011).

As an alternative measure, we adopt the MCD distance proposed by Rousseeuw (1985). For a general dataset, let $n$ be the number of observations and $p$ be the number of variables. The MCD estimators of location, $\hat{\boldsymbol{\mu}}_{MCD}$, and scale, $\hat{\mathbf{S}}_{MCD}$, are obtained by computing the sample mean and covariance within a subset of the data of size $h \le n$ for which the confidence ellipsoid determined by $\hat{\mathbf{S}}_{MCD}$ and centered at $\hat{\boldsymbol{\mu}}_{MCD}$ has minimal volume. The maximum breakdown point of MCD estimators is obtained by setting $h = \lfloor (n+p+1)/2 \rfloor$ and approaches $50\%$ as $n \to \infty$. The MCD distance $d_i$ is then computed as a Mahalanobis distance using $\hat{\boldsymbol{\mu}}_{MCD}$ and $\hat{\mathbf{S}}_{MCD}$ in place of $\bar{\mathbf{x}}$ and $\mathbf{S}$. For ease of notation, we write $d_i^2$ for this squared MCD distance.

Let $h = \lfloor (n+p+1)/2 \rfloor$, and let $H = \{i_1, \dots, i_h\}$ be the indices of the observations selected to compute $\hat{\boldsymbol{\mu}}_{MCD}$ and $\hat{\mathbf{S}}_{MCD}$. Let $H^c$ be the indices of the remaining observations, among which we look for outliers. For Gaussian data, the $d_i^2$, $i \in H$, approximately follow a $\chi^2_p$ distribution (Hubert and others, 2005; Shieh and Hung, 2009), while for $i \in H^c$,

$$\tilde{d}_i^2 := \frac{c\,(m - p + 1)}{p\,m}\, d_i^2 \sim F_{p,\, m-p+1},$$

where $c$ and $m$ can be estimated asymptotically or through simulation. (While some previous work has simply assumed a $\chi^2_p$ distribution for $d_i^2$, we find this to result in many false positives.) To estimate $c$ we use its asymptotic form, while to estimate $m$ we use the small sample-corrected asymptotic form given in Hardin and Rocke (2005). To improve the F-distribution fit, Maronna and Zamar (2002) and Filzmoser and others (2008) scale the distances to match the median of the theoretical distribution. However, the observations in $H^c$ comprise at most half of the sample, so the median of the distances within $H^c$ corresponds to roughly the 75th or greater quantile of the full sample and may be contaminated with outliers. Therefore, we instead scale the distances so that a lower sample quantile of $\tilde{d}_i^2$, $i \in H^c$, matches the corresponding quantile of the theoretical distribution. We label as a "distance outlier" any observation in $H^c$ with $\tilde{d}_i^2$ greater than a high quantile of the theoretical F distribution. In our simulations and experimental data analysis, we consider a range of quantile thresholds.
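A hedged sketch of the distance-based rule, substituting scikit-learn's MinCovDet for the authors' R implementation, and a deliberately crude degrees-of-freedom choice where the paper uses the small-sample-corrected estimates of Hardin and Rocke (2005):

```python
import numpy as np
from scipy.stats import f
from sklearn.covariance import MinCovDet

rng = np.random.default_rng(2)
n, p = 300, 5
X = rng.normal(size=(n, p))   # stand-in for PC scores after reduction
X[:10] += 4.0                 # 10 planted outlying observations

mcd = MinCovDet(random_state=0).fit(X)
d2 = mcd.mahalanobis(X)       # squared robust distances
in_h = mcd.support_           # observations supporting the MCD fit

# Illustrative degrees of freedom only; the paper instead uses the
# Hardin and Rocke (2005) small-sample-corrected estimates of c and m.
h = in_h.sum()
m = h - p - 1
d2_scaled = (m - p + 1) / (p * m) * d2

# Flag non-support observations exceeding a high F quantile
cutoff = f.ppf(0.999, p, m - p + 1)
flags = (~in_h) & (d2_scaled > cutoff)
```

The planted outliers have scaled distances far above the cutoff and are excluded from the MCD support, so all ten are flagged; the exact false-positive behavior depends on the $c$ and $m$ estimates, which this sketch simplifies.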

For time series data, where the assumption of independence among observations is violated, the distributional results given in Hardin and Rocke (2005) may be invalid. As the autocorrelation in fMRI time series is often modeled as an AR(1) process with some coefficient $\rho$, we divide each fMRI time series into three subsets, each consisting of every third observation. The lag-1 autocorrelation within each subset is then approximately $\rho^3$, which is negligible for typical values of $\rho$. To obtain the MCD distance for each observation, we use the MCD estimates of center and scale within each subset, averaged across subsets, and we find that this significantly improves the distributional fit.
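The interleaving idea can be checked empirically. In this sketch the AR(1) coefficient 0.7 is an arbitrary illustrative value (the text does not fix one); the simulation confirms that every-third-observation subsampling reduces the lag-1 autocorrelation to roughly the cube of the original coefficient:

```python
import numpy as np

def interleaved_subsets(X, k=3):
    """Split a time series into k subsets of every k-th observation,
    so lag-1 dependence within a subset corresponds to lag-k
    dependence in the original series."""
    return [X[start::k] for start in range(k)]

# AR(1) illustration: rho = 0.7, so rho^3 = 0.343 within subsets
rho = 0.7
T = 200_000
rng = np.random.default_rng(3)
x = np.zeros(T)
for t in range(1, T):
    x[t] = rho * x[t - 1] + rng.normal()

sub = interleaved_subsets(x, k=3)[0]
lag1 = np.corrcoef(sub[:-1], sub[1:])[0, 1]
assert abs(lag1 - rho ** 3) < 0.02
```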

3. Simulation study

3.1. Construction of baseline scans

Our simulated dataset is based on fMRI scans from three subjects collected as part of the ABIDE dataset (described in Section 4). For generalizability, each subject was chosen from a different data collection site. For each scan, we identify a contiguous subset of volumes containing no detectable artifacts, resulting in 141, 171, and 89 volumes, respectively. We reduce dimensionality by using only the 45th axial (horizontal) slice, corresponding roughly to the center of the brain. For scan $k$, let $T_k$ be the resulting length of the scan and $V_k$ be the resulting number of voxels in the brain mask, so that scan $k$ is represented by the $T_k \times V_k$ matrix $\mathbf{Y}_k$. We can separate $\mathbf{Y}_k$ into an anatomical baseline, the mean image $\bar{\mathbf{y}}_k$ ($1 \times V_k$), and the residual $\mathbf{E}_k$, representing primarily functional information. Then $\mathbf{Y}_k = \mathbf{1}\bar{\mathbf{y}}_k + \mathbf{E}_k$, where $\mathbf{1}$ is a vector of $1$s of length $T_k$.

We then use independent components analysis (ICA), a blind-source separation algorithm, to decompose the intrinsic activity in $\mathbf{E}_k$ into a number of spatially independent sources (McKeown and others, 1997) (described in Section 4). Let $Q_k$ be the number of sources of neuronal signal identified for scan $k$. Then $\mathbf{E}_k = \mathbf{M}_k\mathbf{S}_k + \mathbf{R}_k$, where $\mathbf{S}_k$ ($Q_k \times V_k$) contains the spatial maps of each source, and $\mathbf{M}_k$ ($T_k \times Q_k$) contains the time courses of each source. The residual $\mathbf{R}_k$ contains structured (spatially and temporally correlated) noise. Let $\mathbf{F}_k = \mathbf{M}_k\mathbf{S}_k$ denote the functional signal.

3.2. Artifact-free images

For each scan $k$, we construct three simulation setups: baseline image ($\mathbf{1}\bar{\mathbf{y}}_k$) plus white noise (setup 1); baseline image plus functional signal ($\mathbf{F}_k$) plus white noise (setup 2); and baseline image plus functional signal plus structured noise ($\mathbf{R}_k$) (setup 3). In setup 3, a scaling factor $a$ is used to vary the signal-to-noise ratio (SNR) and is defined below.

To test the specificity of each outlier detection method in the artifact-free setting, we generate images with varying SNR in the following way. For scan $k$, we have true signal variance $\sigma^2_{S,k} = \mathrm{var}(\mathbf{F}_k)$ and true noise variance $\sigma^2_{N,k} = \mathrm{var}(\mathbf{R}_k)$. Defining SNR as the ratio of signal variance to noise variance, let $s$ be the desired SNR of the simulated scans. For setups 1 and 2, we generate the white noise matrix $\mathbf{W}_k$ for scan $k$ as independent, mean-zero Gaussian noise with variance $\sigma^2_{S,k}/s$. For setup 3, we generate the structured noise matrix $a\mathbf{R}_k$, where $a = \sqrt{s_k^0/s}$ and $s_k^0 = \sigma^2_{S,k}/\sigma^2_{N,k}$ is the baseline SNR of scan $k$. Therefore, the simulated artifact-free data at SNR $s$ are $\mathbf{1}\bar{\mathbf{y}}_k + \mathbf{W}_k$ for setup 1; $\mathbf{1}\bar{\mathbf{y}}_k + \mathbf{F}_k + \mathbf{W}_k$ for setup 2; and $\mathbf{1}\bar{\mathbf{y}}_k + \mathbf{F}_k + a\mathbf{R}_k$ for setup 3. For setups 1 and 2, we randomly generate $\mathbf{W}_k$ 1000 times; for setup 3, the noise is fixed.
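A minimal sketch of the white-noise construction, with a stand-in array in place of the real functional signal (names and sizes are ours):

```python
import numpy as np

def add_white_noise(signal, snr, rng):
    """Add white Gaussian noise whose variance is chosen so that
    var(signal) / var(noise) equals the target SNR."""
    sig_var = signal.var()
    noise = rng.normal(scale=np.sqrt(sig_var / snr), size=signal.shape)
    return signal + noise

rng = np.random.default_rng(4)
F = rng.normal(scale=2.0, size=(100, 1000))  # stand-in functional signal
sim = add_white_noise(F, snr=0.05, rng=rng)

# Check the empirical SNR of the simulated scan
noise = sim - F
emp_snr = F.var() / noise.var()
assert abs(emp_snr - 0.05) < 0.005
```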

The specificity, or percentage of truly non-outlying observations not labeled as outliers, in this case is simply the percentage of volumes in each scan not labeled as outliers. Figure 2 shows the mean specificity across 1000 iterations, where each line represents a scan and SNR level. The dotted lines correspond to an SNR of 0.05, which is close to the observed SNR of each scan. Specificity is nearly 100% for both leverage and robust distance methods across all thresholds and SNR levels considered. In the presence of structured noise, the specificity of the robust distance method is somewhat lower in some cases, unless the most stringent quantile threshold is used.

Fig. 2.


Specificity of each method in the absence of artifacts by simulation setup. Each line shows the mean across 1000 iterations for a given scan and SNR. The dotted lines correspond to an SNR of 0.05, which is close to the observed SNR of the fMRI scans used to construct the simulated scans.

3.3. Images with artifacts

We generate four common types of fMRI artifacts: spikes, motion, banding, and ghosting. Spikes are created by increasing the intensity of an entire volume by a given percentage. Motion artifacts are created by rotating a volume by a given angle. Banding artifacts are generated by changing the intensity of a coefficient of the Fourier transform of the image, resulting in a striped appearance. Ghosting artifacts are created by superimposing a figure or "ghost" moving through space over time.
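Two of the four artifact types are easy to sketch directly; the banding location and scaling factor below are our illustrative choices within the spirit of the description:

```python
import numpy as np

def spike_artifact(vol, pct_increase):
    """Scale an entire volume's intensity up by a given percentage."""
    return vol * (1.0 + pct_increase / 100.0)

def banding_artifact(img, loc=(5, 5), factor=100.0):
    """Multiply one coefficient of the 2D Fourier transform of the
    image, producing a striped (banding) appearance in image space."""
    k = np.fft.fft2(img)
    k[loc] *= factor
    return np.real(np.fft.ifft2(k))

rng = np.random.default_rng(5)
img = rng.normal(loc=100.0, scale=5.0, size=(64, 64))

spiked = spike_artifact(img, pct_increase=20.0)
banded = banding_artifact(img, loc=(5, 5), factor=100.0)
assert np.allclose(spiked, img * 1.2)
```

Motion artifacts would additionally require an image rotation (e.g. via an interpolating rotation routine), and ghosting a moving superimposed figure; both follow the same pattern of perturbing a handful of randomly chosen volumes.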

At each of 1000 iterations, one simulated fMRI scan is generated for each subject, SNR level, artifact type, and simulation setup. For spike, motion, and banding artifacts, 10 volumes are randomly selected from each scan, and the artifact intensity for each volume is generated from a uniform distribution (a percentage intensity increase for spike artifacts; a rotation angle for motion artifacts; a 50–200 times change at a fixed location of the Fourier-transformed image for banding artifacts). For ghosting artifacts, nine sequential volumes are randomly selected, and the mean intensity of the ghost, relative to the mean intensity of the image, is randomly generated from a uniform distribution (range 0.06–0.32). An example of each artifact type is displayed in Figure 3.

Fig. 3.


Examples of each artifact type. (a) A normal volume (left) and a volume with a spike artifact (right). (b) The image mask before and after rotation for a rotation artifact. (c) A banding artifact. (d) One volume of a ghosting artifact. The spike, rotation, and ghosting artifacts are generated from the maximum artifact intensity as described in Section 3.3; the banding artifact is generated randomly as described in Section 3.3.

We are interested in both the specificity and the sensitivity, or the percentage of true outliers identified as outliers. Figure 4 shows the mean sensitivity and specificity for each outlier detection method, simulation setup, and artifact type, where each line represents a scan and SNR level. The realistic SNR of 0.05 is shown as a dotted line. As the simulation setup becomes more realistic, the sensitivity to outliers tends to decrease, while the specificity is relatively stable. The robust distance method has nearly 100% specificity in all scenarios and tends to display higher sensitivity than the leverage method, particularly for banding and spike artifacts. While differences across artifact types are apparent, these are likely driven by the ranges of artifact intensity chosen.

Fig. 4.


Sensitivity and specificity of each method in the presence of artifacts by simulation setup. Each line shows the mean across 1000 iterations for a given scan and SNR. The dotted lines correspond to an SNR of 0.05, which is close to the observed SNR of the fMRI scans used to construct the simulated scans.

4. Experimental data results

Using a large, multi-site fMRI dataset, we assess the result of outlier removal on the scan-rescan reliability of a common type of analysis. This section is organized as follows. We begin with a description of the dataset employed and show an example. We then describe the reliability analysis. Finally, we quantify the improvement to reliability with the proposed outlier detection methods using a linear mixed model to account for subject and site effects.

4.1. fMRI dataset

ABIDE is a publicly available resource of neuroimaging and phenotypic information from 1112 subjects consisting of 20 datasets collected at 16 sites (Di Martino and others, 2014). Fully anonymized data from 91 children collected at Kennedy Krieger Institute after the ABIDE release were also included. Image acquisition parameters and demographic information are available at http://fcon_1000.projects.nitrc.org/indi/abide/. For each subject, a $T_1$-weighted MPRAGE volume and one or more rs-fMRI sessions were collected on the same day. Details of data pre-processing and quality control are provided in supplementary material Appendix A available at Biostatistics online, where Table 1 lists the number of subjects in each dataset.

For a single example scan, Figure 5 shows the leverage and robust distance functions, along with six motion parameters (roll, pitch, yaw, and translation in each direction) and their derivatives, which are commonly used for artifact detection. Below the plot, the volumes corresponding to the spikes at time points 60, 90, 134, and 150 are shown. Three of the spikes are leverage and distance outliers under any of the thresholds considered, while the spike at time point 90 is a leverage outlier only at the most liberal threshold. Obvious banding artifacts are seen at time points 60 and 150, a moderate banding artifact is seen at time point 134, and no visible artifact is apparent at time point 90. While the artifact at time point 150 would be detected using motion measures, the other spikes would likely go undetected.

Fig. 5.


For a single subject, the motion parameters, leverage function, and robust distance function. Below the plot, the volumes corresponding to the spikes at time points 60, 90, 134, and 150 (shaded on the plot) are shown. Three of the spikes are leverage and distance outliers under any of the thresholds considered, while the spike at time point 90 is a leverage outlier only at the most liberal threshold. Obvious banding artifacts are seen at time points 60 and 150, a moderate banding artifact is seen at time point 134, and no visible artifact is apparent at time point 90. While the artifact at time point 150 would be detected using motion measures, the other spikes would likely go undetected using only motion.

4.2. Estimation of subject-level brain networks and connectivity

Resting-state brain networks represent regions of the brain that act in a coordinated manner during rest. While such networks have traditionally been identified at the group level, there is growing interest in estimating these networks at the subject level, where the higher levels of noise make accurate estimation difficult. There is also interest in estimating the subject-level "functional connectivity," or temporal dependence of neuronal activation, between these different networks (van den Heuvel and Pol, 2010). We assess the benefits of outlier removal on the reliability of these networks and their functional connectivity. Details of the estimation of subject-level resting-state networks are provided in supplementary material Appendix B available at Biostatistics online. Here we briefly describe the procedure. We begin by performing group ICA (GICA) separately for each of the ABIDE datasets. The result of GICA for dataset $d$ is the $Q_d \times V_d$ matrix $\mathbf{G}_d$, where $V_d$ is the number of voxels in the group-level brain mask and $Q_d$ is the number of independent components (ICs). Each row of $\mathbf{G}_d$ may represent a source of noise (e.g. motion, respiration) or a resting-state network. After identification of those ICs corresponding to resting-state networks, let $\mathbf{G}_d^0$ denote the $L_d \times V_d$ matrix containing the $L_d$ resting-state networks identified for dataset $d$. Using dual regression (Beckmann and others, 2009), we obtain $\hat{\mathbf{S}}_i$ and $\hat{\mathbf{M}}_i$, where $\hat{\mathbf{S}}_i$ is the $L_d \times V_d$ matrix whose rows contain the estimated resting-state brain networks for subject $i$, and $\hat{\mathbf{M}}_i$ is the $T_i \times L_d$ "mixing matrix" representing the activation of each network over time. We are interested in reliable estimation of two quantities: $\hat{\mathbf{S}}_i$ and the $L_d \times L_d$ matrix $\mathrm{cor}(\hat{\mathbf{M}}_i)$, which represents the functional connectivity between each pair of networks.

4.3. Measuring reliability of subject-level brain networks and functional connectivity

Let $\hat{\mathbf{S}}_i^{(1)}$ and $\hat{\mathbf{S}}_i^{(2)}$ be two sets of estimated resting-state networks for subject $i$, obtained by performing dual regression separately for two different scanning sessions of subject $i$ in dataset $d$. There is no need to match components between $\hat{\mathbf{S}}_i^{(1)}$ and $\hat{\mathbf{S}}_i^{(2)}$, since the ICs in each correspond to the same group-level ICs in $\mathbf{G}_d^0$. To assess reliability, for each subject $i$ and component $l$ we compute the number of overlapping voxels between $\hat{\mathbf{S}}_i^{(1)}$ and $\hat{\mathbf{S}}_i^{(2)}$ after both have been thresholded (as described in supplementary material Appendix B available at Biostatistics online). We then average over all $L_d$ networks to obtain the average scan-rescan overlap per network, denoted $O_{idm}$ for subject $i$ in dataset $d$ using outlier removal method $m$. Outlier removal methods include no outlier removal, leverage-based outlier removal at each of the thresholds considered, and robust distance-based outlier removal at each of the quantile thresholds considered. Similarly, to assess reliability of functional connectivity between networks, let $\hat{\mathbf{M}}_i^{(1)}$ and $\hat{\mathbf{M}}_i^{(2)}$ be the mixing matrices corresponding to $\hat{\mathbf{S}}_i^{(1)}$ and $\hat{\mathbf{S}}_i^{(2)}$. We compute the mean squared error (MSE) between the upper triangles of $\mathrm{cor}(\hat{\mathbf{M}}_i^{(1)})$ and $\mathrm{cor}(\hat{\mathbf{M}}_i^{(2)})$ and denote the result $E_{idm}$ for subject $i$ in dataset $d$ using outlier removal method $m$.
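The two reliability measures can be written compactly; a Python sketch with stand-in arrays (shapes and names are ours, not the authors'):

```python
import numpy as np

def overlap(S1, S2):
    """Mean number of jointly suprathreshold voxels across networks,
    for two thresholded (binary) L x V network maps."""
    return float(np.mean(np.sum(S1 & S2, axis=1)))

def connectivity_mse(M1, M2):
    """MSE between the upper triangles of the L x L correlation
    matrices of two T x L mixing matrices."""
    C1 = np.corrcoef(M1, rowvar=False)
    C2 = np.corrcoef(M2, rowvar=False)
    iu = np.triu_indices_from(C1, k=1)
    return float(np.mean((C1[iu] - C2[iu]) ** 2))

rng = np.random.default_rng(7)
S1 = rng.normal(size=(4, 500)) > 1.5   # stand-in thresholded maps
S2 = rng.normal(size=(4, 500)) > 1.5
M1 = rng.normal(size=(150, 4))         # stand-in mixing matrices
M2 = M1 + rng.normal(scale=0.1, size=(150, 4))

o = overlap(S1, S2)
e = connectivity_mse(M1, M2)
assert connectivity_mse(M1, M1) == 0.0  # identical sessions: MSE 0
```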

Although most subjects in the ABIDE dataset have only a single scanning session, we can simulate scan-rescan data by splitting each subject's fMRI run into two contiguous subsets consisting of the first and second halves of the time points, respectively. We use the resulting pseudo scan-rescan data to obtain the two sets of estimated networks and mixing matrices, then compute our reliability measures for network overlap and connectivity MSE. While this approach may produce an optimistic estimate of the true scan-rescan reliability, this is not a concern, as we are primarily interested in the change in reliability due to outlier removal.

To test for changes in reliability of resting-state networks or functional connectivity due to outlier removal, we fit a linear mixed effects model with a fixed effect for each outlier removal method, a fixed effect for each dataset, and a random intercept for each subject. We employ this model for its ability to test several datasets and methods simultaneously and to account for within-subject correlation across methods. Using the subjects for whom at least one outlier was identified by any method, we estimate the following model for the reliability measure $M_{ikm}$:

$$M_{ikm} = b_{i0} + \gamma_k + \delta_m I(m>0) + \epsilon_{ikm}, \qquad \epsilon_{ikm} \sim N(0,\sigma^2), \quad b_{i0} \sim N(0,\tau^2),$$

where $m = 0$ indicates no outlier removal. Here, $\gamma_k$ represents the baseline reliability for subjects in dataset $k$ when no outlier removal is performed, and $\delta_m$ represents the change in reliability when outlier removal method $m$ is used. To obtain coefficient estimates, we fit this model using the lme function from the nlme package (Pinheiro and others, 2016). Since we have a large sample size, we compute normal-approximation 95% confidence intervals.
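To make the model structure concrete, one can simulate data from it. The sketch below (Python/NumPy; all parameter values and dimensions are invented for illustration and do not reflect the fitted ABIDE results, which the paper obtains with lme from the R package nlme) generates reliability measures according to the equation above:

```python
import numpy as np

rng = np.random.default_rng(2)
n_subj, n_data, n_meth = 50, 3, 4        # subjects, datasets, methods (m = 0: no removal)

gamma = np.array([2.0, 2.5, 3.0])        # baseline reliability per dataset (invented)
delta = np.array([0.0, 0.4, 0.5, 0.3])   # change per method; delta[0] never enters, I(m>0)=0
sigma, tau = 0.5, 0.8                    # residual and random-intercept SDs (invented)

dataset = rng.integers(0, n_data, size=n_subj)  # dataset membership k of each subject
b = rng.normal(0.0, tau, size=n_subj)           # subject random intercepts b_i0

# M[i, m] = b_i0 + gamma_k + delta_m * I(m > 0) + eps_ikm
M = np.empty((n_subj, n_meth))
for i in range(n_subj):
    for m in range(n_meth):
        M[i, m] = (b[i] + gamma[dataset[i]]
                   + delta[m] * (m > 0)
                   + rng.normal(0.0, sigma))

print(M.shape)
```

The random intercept $b_{i0}$ is what induces correlation among a subject's measurements across methods, which is why a mixed model rather than ordinary least squares is appropriate here.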

4.4. The effect of outlier removal on reliability

Figure 6 displays estimates and 95% confidence intervals for the coefficients of the models for reliability of resting-state networks (a) and functional connectivity (b). For (a), larger overlap values represent greater reliability; for (b), smaller MSE values represent greater reliability. The left-hand panels of (a) and (b) display the fixed effects for each dataset ($\gamma_k$) and illustrate the heterogeneity in baseline reliability across ABIDE datasets before outlier removal. This reflects the substantial differences in acquisition, processing, and quality control methods across the data collection sites contributing to ABIDE. The middle and right-hand panels of (a) and (b) display the coefficients for each outlier removal method ($\delta_m$). The average percentage of volumes labeled as outliers using each method is also displayed in gray. Both leverage-based and distance-based methods significantly improve the reliability of estimates of subject-level brain networks and functional connectivity. Improvement in reliability is maximized at an intermediate multiple of the median for leverage-based outlier removal and at a high quantile for distance-based outlier removal. The maximum improvement in overlap of resting-state networks is similar using either method, while the largest reduction in MSE of functional connectivity is achieved using leverage-based outlier removal.

Fig. 6.

Estimates and 95% confidence intervals for the model coefficients for (a) the scan-rescan overlap of brain networks and (b) the scan-rescan MSE of connectivity between each pair of networks. For both models, the left-hand plot displays the fixed effects for each dataset ($\gamma_k$) and illustrates the heterogeneity in reliability across datasets in ABIDE before outlier removal. The middle and right-hand plots display the coefficients for each outlier removal method ($\delta_m$), which represent the change in reliability due to outlier removal. These plots also show the percentage of volumes in each fMRI run labeled as outliers using each method. Both leverage- and robust distance-based outlier removal methods result in a statistically significant improvement in the reliability of brain networks and connectivity. While both methods appear to be fairly robust to the choice of threshold, reliability is maximized by choosing a cutoff equal to a fixed multiple of the median for PCA leverage and a high quantile for PCA robust distance.

We also stratify the model by those subjects who passed quality inspection and those who failed. Figure 1 of supplementary material Appendix C available at Biostatistics online shows estimates and 95% confidence intervals for the model coefficients after stratification. While subjects who failed quality inspection tend to improve more than those who passed, the differences are not statistically significant, and both groups of subjects benefit substantially from outlier removal.

5. Discussion

We have proposed a method to detect outlying time points in an fMRI scan by drawing on the traditional statistical ideas of PCA, leverage, and outlier detection. The proposed methods have been validated through simulated data and a large, diverse fMRI dataset. We have demonstrated that the proposed methods are accurate and result in improved reliability of two common types of analysis for resting-state fMRI data, namely identification of resting-state networks through ICA and estimation of functional connectivity between these networks.

The proposed techniques are, to the best of our knowledge, the first to provide a single measure of outlyingness for time points in an fMRI scan, which can be easily thresholded to identify outliers. Unlike motion-based outlier detection methods for fMRI, they are agnostic to the source of artifact and therefore may be used as a general method to detect artifacts, including those unrelated to motion. Furthermore, PCA leverage is directly related to the estimation of principal components, which are an important quantity in the analysis of resting-state fMRI, as they are used as input to ICA for the identification of resting-state networks.

One limitation of our approach is that we perform validation on a single dataset, the ABIDE. However, this dataset is in fact a diverse collection of 20 datasets from 16 international sites, which strengthens the generalizability of our results. Another limitation of the proposed methods is that they may be sensitive to the number of principal components retained. However, we have found that the method performs well across different model orders, and we propose an automated method of selecting the model order to provide a fully automated approach.

A limitation of PCA leverage-based outlier removal is that the proposed thresholding rule is, as in regression, somewhat ad hoc. However, the use of the median leverage across observations as a benchmark is a reasonable approach, and we have tested a range of thresholds. Based on our reliability analysis, we expect a threshold equal to a small multiple of the median leverage to work well in practice for fMRI, but for different types of data the researcher may wish to re-evaluate this choice. In particular, fMRI volumes containing artifacts tend to be very different from volumes free of artifacts, so a relatively high threshold for leverage and robust distance tends to work well; in other contexts, outliers may be more subtle.
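The leverage rule discussed above can be sketched in a few lines: take the left singular vectors of the centered data, keep the first $Q$ components, use row sums of squares as the leverage of each time point, and flag points exceeding a chosen multiple of the median. A Python/NumPy illustration (the model order Q and multiplier c below are placeholders, not the paper's recommended values):

```python
import numpy as np

def pca_leverage_outliers(Y, Q=10, c=4.0):
    """Flag outlying time points of a T x V fMRI matrix Y via PCA leverage.

    The leverage of time point t is the squared norm of row t of the first
    Q left singular vectors of the centered data; time points whose leverage
    exceeds c times the median leverage are flagged as outliers.
    """
    Yc = Y - Y.mean(axis=0)                      # center each voxel's time series
    U, s, Vt = np.linalg.svd(Yc, full_matrices=False)
    lev = np.sum(U[:, :Q] ** 2, axis=1)          # PCA leverage per time point
    return lev, lev > c * np.median(lev)

rng = np.random.default_rng(3)
Y = rng.standard_normal((200, 2000))             # T = 200 volumes, V = 2000 voxels
Y[50] += 5.0                                     # inject one artifactual volume
lev, flags = pca_leverage_outliers(Y, Q=10, c=4.0)
print(flags.sum(), flags[50])
```

Because each column of U has unit norm, the leverages always sum to Q, so the median provides a stable benchmark regardless of the scale of the data.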

While the proposed methods have been designed and validated for resting-state fMRI data, they may be easily extended to other types of functional neuroimaging data, such as task fMRI and EEG, as well as other types of HD data. Furthermore, they may also be extended to group analyses. Future work should focus on exploring these directions.

As the availability of large fMRI datasets continues to grow, automated outlier detection methods are becoming essential for the effective use of such data. In particular, the reliability of analyses employing these diverse datasets may be negatively impacted by the presence of poor quality data. The outlier detection methods we propose have the potential to improve the quality of such datasets, thus enhancing the possibilities to use these data to understand neurological diseases and brain function in general.

Supplementary Material

Supplementary material is available at http://biostatistics.oxfordjournals.org.

ACKNOWLEDGMENTS

Conflict of Interest: None declared.

Funding

National Institute of Biomedical Imaging and Bioengineering [R01 EB016061 and P41 EB015909], and the National Institute of Mental Health [R01 MH095836].

References

  1. Beckmann C. F., Mackay C. E., Filippini N. and Smith S. M. (2009). Group comparison of resting-state fMRI data using multi-subject ICA and dual regression. NeuroImage 47(Suppl 1), S148.
  2. Brooks S. P. (1994). Diagnostics for principal components: influence functions as diagnostic tools. The Statistician 43, 483–494.
  3. Di Martino A., Yan C.-G., Li Q., Denio E., Castellanos F. X., Alaerts K., Anderson J. S., Assaf M. and others (2014). The autism brain imaging data exchange: towards a large-scale evaluation of the intrinsic brain architecture in autism. Molecular Psychiatry 19, 659–667.
  4. Filzmoser P., Maronna R. and Werner M. (2008). Outlier identification in high dimensions. Computational Statistics & Data Analysis 52, 1694–1711.
  5. Fritsch V., Varoquaux G., Thyreau B., Poline J.-B. and Thirion B. (2012). Detecting outliers in high-dimensional neuroimaging datasets with robust covariance estimators. Medical Image Analysis 16, 1359–1370.
  6. Gao S., Li G. and Wang D. (2005). A new approach for detecting multivariate outliers. Communications in Statistics: Theory and Methods 34, 1857–1865.
  7. Golub G. H. and Reinsch C. (1970). Singular value decomposition and least squares solutions. Numerische Mathematik 14, 403–420.
  8. Hadi A. S., Imon A. H. M. and Werner M. (2009). Detection of outliers. Wiley Interdisciplinary Reviews: Computational Statistics 1, 57–70.
  9. Hampel F. R., Ronchetti E. M., Rousseeuw P. and Stahel W. A. (1986). Robust Statistics: The Approach Based on Influence Functions. New York: John Wiley.
  10. Hardin J. and Rocke D. M. (2005). The distribution of robust distances. Journal of Computational and Graphical Statistics 14, 1–19.
  11. Hubert M., Rousseeuw P. J. and Vanden Branden K. (2005). ROBPCA: a new approach to robust principal component analysis. Technometrics 47, 64–79.
  12. Jackson D. A. (1993). Stopping rules in principal components analysis: a comparison of heuristical and statistical approaches. Ecology 74, 2204–2214.
  13. Lindquist M. A. (2008). The statistical analysis of fMRI data. Statistical Science 23, 439–464.
  14. Mahalanobis P. C. (1936). On the generalized distance in statistics. Proceedings of the National Institute of Sciences (Calcutta) 2, 49–55.
  15. Maronna R. A. and Zamar R. H. (2002). Robust estimates of location and dispersion for high-dimensional datasets. Technometrics 44, 307–317.
  16. McKeown M. J., Makeig S., Brown G. G., Jung T.-P., Kindermann S. S., Bell A. J. and Sejnowski T. J. (1997). Analysis of fMRI data by blind separation into independent spatial components. Technical Report NHRC-REPT-97-42. San Diego, CA: Naval Health Research Center.
  17. Neter J., Kutner M. H., Nachtsheim C. J. and Wasserman W. (1996). Applied Linear Statistical Models, Volume 4. Chicago: Irwin.
  18. Pinheiro J., Bates D., DebRoy S., Sarkar D. and R Core Team (2016). nlme: Linear and Nonlinear Mixed Effects Models. R package version 3.1-118. http://CRAN.R-project.org/package=nlme.
  19. Power J. D., Mitra A., Laumann T. O., Snyder A. Z., Schlaggar B. L. and Petersen S. E. (2014). Methods to detect, characterize, and remove motion artifact in resting state fMRI. NeuroImage 84, 320–341.
  20. R Core Team (2014). R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing.
  21. Ro K., Zou C., Wang Z. and Yin G. (2015). Outlier detection for high-dimensional data. Biometrika 102, 589–599.
  22. Rousseeuw P. J. (1985). Multivariate estimation with high breakdown point. Mathematical Statistics and Applications 8, 283–297.
  23. Rousseeuw P. J. and Hubert M. (2011). Robust statistics for outlier detection. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 1, 73–79.
  24. Rousseeuw P. J. and Van Zomeren B. C. (1990). Unmasking multivariate outliers and leverage points. Journal of the American Statistical Association 85, 633–639.
  25. Satterthwaite T. D., Elliott M. A., Gerraty R. T., Ruparel K., Loughead J., Calkins M. E., Eickhoff S. B. and others (2013). An improved framework for confound regression and filtering for control of motion artifact in the preprocessing of resting-state functional connectivity data. NeuroImage 64, 240–256.
  26. Shieh A. D. and Hung Y. S. (2009). Detecting outlier samples in microarray data. Statistical Applications in Genetics and Molecular Biology 8, 1–24.
  27. Van Den Heuvel M. P. and Pol H. E. H. (2010). Exploring the brain network: a review on resting-state fMRI functional connectivity. European Neuropsychopharmacology 20, 519–534.
