Evaluating the efficacy of fully automated approaches for the selection of eye blink ICA components

Matthew B Pontifex; Vladimir Miskovic; Sarah Laszlo

doi:10.1111/psyp.12827

Author manuscript; available in PMC: 2018 May 1.

Published in final edited form as: Psychophysiology. 2017 Feb 13;54(5):780–791. doi: 10.1111/psyp.12827

PMCID: PMC5397386 NIHMSID: NIHMS840363 PMID: 28191627

Abstract

Independent component analysis (ICA) offers a powerful approach for the isolation and removal of eye blink artifacts from EEG signals. Manual identification of the eye blink ICA component by inspection of scalp map projections, however, is prone to error, particularly when non-artifactual components exhibit topographic distributions similar to the blink. The aim of the present investigation was to determine the extent to which automated approaches for selecting eye blink related ICA components could be utilized to replace manual selection. We evaluated popular blink selection methods relying on spatial features [EyeCatch()], combined stereotypical spatial and temporal features [ADJUST()], and a novel method relying on time-series features alone [icablinkmetrics()] using both simulated and real EEG data. The results of this investigation suggest that all three methods of automatic component selection are able to accurately identify eye blink related ICA components at or above the level of trained human observers. However, icablinkmetrics(), in particular, appears to provide an effective means of automating ICA artifact rejection while at the same time eliminating human errors inevitable during manual component selection and false positive component identifications common in other automated approaches. Based upon these findings, best practices for 1) identifying artifactual components via automated means and 2) reducing the accidental removal of signal-related ICA components are discussed.

Keywords: Independent Component Analysis, EEG Artifact, EEGLAB

Over the past decade, an increasing number of laboratories have begun to use temporal independent component analysis (ICA) to isolate and remove eye blink artifacts from EEG signals. Indeed, examination of the journal Psychophysiology over the past two years reveals that nearly the same proportion of EEG investigations use temporal ICA approaches for eye blink artifact reduction/correction as use regression-based approaches. This is undoubtedly related in part to the growing adoption of EEGLAB (Delorme & Makeig, 2004), a MATLAB/Octave-based graphical toolbox for data processing that implements temporal ICA in its standard workflow. Of practical importance, this temporal ICA approach to artifact correction may not be appropriate for all artifacts, such as saccadic eye movements and non-stationary artifacts (see Hoffmann & Falkenstein, 2008, for further discussion). However, for eye blink artifact correction, the ICA approach has been found to exhibit superior performance relative to regression-based approaches (Jung et al., 2000). An important distinction, however, is that unlike regression-based approaches, ICA does not inherently perform any form of artifact correction. That is, ICA is simply a blind source separation technique that attempts to dissociate temporally independent yet spatially fixed components (Bell & Sejnowski, 1995). Thus, a critical limitation of temporal ICA-based approaches to artifact correction is their reliance on subjective human judgments to determine which components are associated with noise rather than signal, so that the data can be back-projected to reconstruct EEG signals in the absence of artifactual activity. Although automated approaches exist, little is known about the extent to which these automated ICA component selection approaches are robust to variation in signal-to-noise ratio or across varying electrode densities.
Thus, the aim of the present investigation was to determine if fully automated approaches for selecting eye blink related ICA components can and should be utilized to replace manual selection of eye blink artifact components by human users.

In a common EEGLAB workflow, following separation of the signals using standard ICA algorithms, a human observer must visually sift through the full set of temporal ICA components in order to manually select one or more components for removal. Such an approach is not only labor intensive but also user-dependent, making it prone to error or, potentially, to bias (e.g., quality control depends on the user's level of expertise). Human fallibility is especially relevant for experiments concerned with frontal ERP/EEG activity, where it is particularly difficult to differentiate the scalp projection of the temporal ICA component(s) associated with eye blink activity from the scalp projection of the temporal ICA component(s) associated with genuine, frontally maximal cortical activity, such as the left anterior negativity (LAN; Castellanos & Makarov, 2006). Although these decisions can be informed by inspecting temporal ICA activations (see Figure 1), such approaches are not explicitly detailed within the EEGLAB documentation, so knowledge and implementation of these methods may vary across research laboratories.

Figure 1. Illustration of eye blinks recorded from the VEOG electrode, how they manifest across three midline electrode sites, and how the eye blink related signal is separated from other signals using ICA decomposition. Note the high degree of similarity between the VEOG electrode and ICA component 1.

To address the potential limitations of human-selected ICA artifact rejection, a number of methods for automatic identification of artifact related temporal ICA components have been developed for EEGLAB. These include ADJUST (Mognon, Jovicich, Bruzzone, & Buiatti, 2011), CORRMAP (Viola, Thorne, Edmonds, Schneider, & Eichele, 2009), and EyeCatch (Bigdely-Shamlo, Kreutz-Delgado, Kothe, & Makeig, 2013). It should be pointed out that such approaches are simply added on following the application of ICA to the data in order to automate the selection of artifact related components, and are not used as a replacement for the ICA application. The most widely downloaded software plugin is the ADJUST() method, which attempts to identify a wide array of potential sources of artifact such as eye blinks, eye movements, cardiac induced artifacts, and other stereotypical movements. To this end, temporal independent components are characterized by combining both spatial and temporal information, with the identification of artifactual components based on stereotyped spatio-temporal features such as temporal kurtosis and the spatial average difference. Alternatively, the EyeCatch() plugin attempts to distinguish temporal ICA components based on the correlation between the scalp map projection for each ICA component and a database of 3,452 (as of the writing of this manuscript) exemplar eye-activity related template scalp maps. This approach has been found to exhibit overall performance similar to the CORRMAP() function (Viola, Thorne, Edmonds, Schneider, & Eichele, 2009) but has an advantage over CORRMAP() in that it is fully automated.
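To make the two families of features concrete, the sketch below computes (a) the temporal kurtosis of each component time course, one of the stereotyped temporal features ADJUST() draws on, and (b) the largest absolute correlation between each component's scalp map and a set of template maps, the kind of comparison EyeCatch() performs against its template database. This is a simplified illustration under our own assumptions (NumPy arrays, scalp maps taken as columns of the ICA mixing matrix and passed row-wise, absolute correlation to absorb ICA's arbitrary sign); the function names are hypothetical, and it is not the code, thresholds, or decision rules of either plugin.

```python
import numpy as np


def temporal_kurtosis(activations):
    """Excess kurtosis of each component time course (rows = components).

    Sparse, high-amplitude events such as blinks yield large positive values;
    Gaussian-like activity yields values near zero.
    """
    centered = activations - activations.mean(axis=1, keepdims=True)
    m2 = (centered ** 2).mean(axis=1)
    m4 = (centered ** 4).mean(axis=1)
    return m4 / m2 ** 2 - 3.0


def max_template_correlation(scalp_maps, templates):
    """Largest |Pearson r| between each scalp map and any template map.

    scalp_maps: components x channels; templates: n_templates x channels.
    """
    def center_and_normalize(rows):
        rows = rows - rows.mean(axis=1, keepdims=True)
        return rows / np.linalg.norm(rows, axis=1, keepdims=True)

    # Dot products of centered, unit-length rows are Pearson correlations
    r = center_and_normalize(scalp_maps) @ center_and_normalize(templates).T
    return np.abs(r).max(axis=1)
```

A component with high kurtosis and a scalp map closely matching a frontal blink template would be flagged under either logic; as argued below, a frontally maximal neural component could also match such templates.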

A limitation of the approaches mentioned above is that they largely rely on ancillary indices of ICA components, such as scalp map projections of temporal ICA weights, to differentiate eye blink related components from non-blink related components. This renders them prone to the same potential sources of error as human visual inspection, particularly when non-artifactual, frontally maximal ICA components occur. Because these approaches rely on the scalp topography of the temporal ICA components, the eye blink component can easily be confused with non-artifactual, frontally distributed components. This is a fundamental weakness of any topography-based approach to selecting the eye blink related temporal ICA component. In contrast, a time-domain approach should not be vulnerable to confusion between frontally distributed non-artifact components and eye blink related components. Accordingly, to compare a time-domain approach with the existing spatial approaches, we developed icablinkmetrics(), predicated on two basic premises: 1) that the temporal ICA component(s) associated with eye blinks should be related to the eye blink activity present within the EEG more so than any other temporal ICA component (e.g., via correlation and convolution), and 2) that removal of the temporal ICA component associated with the eye blinks should reduce the eye blink artifact present within the EEG data more so than removal of any other temporal ICA component following back projection (when the data are reconstructed without the artifactual component). Again, this approach is simply applied after ICA decomposition to automate the selection of artifact related components; it is not a replacement for ICA itself.
For a more detailed description of the background and theory underlying these premises, see the Supplementary Appendix. In the interest of transparency, we note that the icablinkmetrics() approach was created by M. P. with input from S. L. The icablinkmetrics() EEGLAB plugin, which can be run from either the command line or the Tools menu of EEGLAB, is available through the EEGLAB Extension Manager or by downloading from http://sccn.ucsd.edu/wiki/EEGLAB_Extensions.
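Premise 1 can be illustrated with a minimal sketch that ranks component time courses by the absolute Pearson correlation of each with the VEOG channel. This is only an illustration of the premise under simplifying assumptions (whole-recording correlation, NumPy arrays, absolute values to absorb ICA's arbitrary sign); the function name is hypothetical, and it is not the icablinkmetrics() implementation, which per the premises above also draws on convolution and on the artifact-reduction criterion of premise 2.

```python
import numpy as np


def rank_components_by_veog_correlation(activations, veog):
    """Rank ICA components by |Pearson r| with the VEOG channel.

    activations: components x samples; veog: samples.
    Returns (indices sorted from most to least correlated, correlations).
    """
    a = activations - activations.mean(axis=1, keepdims=True)
    v = veog - veog.mean()
    # Pearson r for every component at once
    r = (a @ v) / (np.linalg.norm(a, axis=1) * np.linalg.norm(v))
    order = np.argsort(-np.abs(r))
    return order, r
```

Because ICA components have arbitrary polarity, a blink component may correlate strongly negatively with the VEOG, which is why the ranking uses |r|.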

The aim of the present investigation was to assess the efficacy of these automated approaches for the selection of eye blink related artifact components. To this end, we evaluated the relative merits of automatic eye blink component selection methods relying on time-series data [icablinkmetrics()] as compared to those relying on combined stereotypical spatial and temporal features [ADJUST(), (Mognon, Jovicich, Bruzzone, & Buiatti, 2011)] or on spatial features alone [EyeCatch(), (Bigdely-Shamlo, Kreutz-Delgado, Kothe, & Makeig, 2013)]. An intrinsic weakness of a temporal approach to eye blink related component selection is that, with increasing noise in the time series, procedures for identifying when eye blink related activity occurs become more prone to failure. To examine this issue, we used simulated EEG data to investigate the extent to which each of these automated approaches is sensitive to variation in the magnitude of the eye blink artifact amid increasing levels of noise in the signal. Next, we assessed the generalizability of these automated approaches across real EEG data collected with varying electrode densities and in response to different tasks. Finally, for comparison, we assessed the accuracy of the current common method, in which trained observers visually select temporal ICA components. Collectively, these analyses address the critical question of whether fully automated approaches for selecting eye blink related ICA components can and should replace manual selection of eye artifact components by human users and, if so, what their potential vulnerabilities are.

Simulated EEG Varying in the Magnitude of the Artifact and Level of Noise

Method

A total of 3,072 simulated EEG datasets were created, matched to three (real) exemplar EEG datasets (1,024 simulations per exemplar dataset). For each exemplar dataset, simulated data were created representing a wide range of possible eye blink artifact magnitude and noise conditions. In this context, the aim was not to simulate the computational processes by which the EEG signal is actually generated in the brain (e.g., Laszlo & Armstrong, 2014; Laszlo & Plaut, 2012). Rather, our goal was to ensure that the artificial data exhibited the same frequency domain properties and signal-to-noise ratio (prior to the injection of additional noise per the experimental manipulations), in the same amplitude range, as true EEG data. This approach enabled the creation of EEG datasets with properties similar to real EEG, while allowing the level of noise present within the signal to be modulated as a function of the variability found within real EEG datasets. Simulated EEG datasets were created by 1) Fourier decomposing each exemplar dataset at each channel and then 2) producing weighted sums of sines with random phase shifts, resulting in simulated datasets with the same frequency characteristics as the exemplar. The first and last 100 points of each simulated time series were removed to account for edge artifacts from the finite sum of sines, and the simulated time series were scaled to have the same mean and standard deviation as the exemplar datasets, per channel. Each simulated dataset contained 25,480 points for each of 28 channels, allowing 32.5 data points for each ICA weight (data points/channels²). Noise was added to the simulated datasets by randomly perturbing both the phase and amplitude at each point in the time series; phase perturbations were distributed uniformly and amplitude perturbations were distributed normally.
The noise perturbations within the simulated EEG data were scaled to create 32 levels of noise ranging from 0.4 to 10 times the standard deviation of the exemplar EEG dataset in increments of 0.31 standard deviations. Simulated data constructed in this manner do not include eye blink artifacts and thus constitute the "ground truth" for ICA artifact correction. That is, these data can be compared with reconstructed data created by removing each ICA component in turn. The reconstructed data that are most similar to the ground truth data must then reflect removal of the truly artifactual eye blink component (as opposed to the other, non-artifactual components).
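The idea of simulating each channel as a weighted sum of sinusoids with random phase shifts can be sketched by keeping each exemplar channel's Fourier amplitudes and drawing new phases uniformly (a standard phase-randomization surrogate), then trimming and rescaling as described. This is an assumption-laden sketch rather than the authors' code: the inverse real FFT stands in for an explicit sum of sines, the output is 200 samples shorter than the input, the function names are hypothetical, and the separate phase/amplitude noise perturbation is not reproduced. The last lines check the 32-level noise grid and the data-points-per-weight figure quoted above.

```python
import numpy as np


def phase_randomized_surrogate(x, trim=100, rng=None):
    """Simulate one channel with x's amplitude spectrum but random phases."""
    rng = np.random.default_rng() if rng is None else rng
    amplitudes = np.abs(np.fft.rfft(x))
    phases = rng.uniform(0.0, 2.0 * np.pi, size=amplitudes.size)
    phases[0] = 0.0                      # DC term must be real
    if len(x) % 2 == 0:
        phases[-1] = 0.0                 # Nyquist term must be real for even lengths
    surrogate = np.fft.irfft(amplitudes * np.exp(1j * phases), n=len(x))
    surrogate = surrogate[trim:len(x) - trim]   # drop first/last `trim` points
    # Rescale to the exemplar channel's mean and standard deviation
    surrogate = (surrogate - surrogate.mean()) / surrogate.std()
    return surrogate * x.std() + x.mean()


def simulate_dataset(exemplar, trim=100, rng=None):
    """Apply the surrogate channel by channel (exemplar: channels x samples)."""
    rng = np.random.default_rng() if rng is None else rng
    return np.vstack([phase_randomized_surrogate(ch, trim, rng) for ch in exemplar])


# 32 noise levels from 0.4 to 10 SD (step = 9.6 / 31, i.e. about 0.31 SD)
noise_levels_sd = np.linspace(0.4, 10.0, 32)

# Samples available per ICA weight: data points / channels**2
points_per_weight = 25_480 / 28 ** 2     # 32.5
```

Because phase randomization preserves the amplitude spectrum exactly, each simulated channel keeps the exemplar's power at every frequency while its time course is new.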

Eye blink artifacts were then introduced into the simulated data using a Chebyshev window (250 ms in length) as the model eye blink. Twenty eye blinks were introduced into each simulated time series at a rate of roughly one blink every 1.25 seconds, with the propagation of the simulated blinks across the scalp controlled by a spherical head model derived empirically from the exemplar EEG dataset. The simulated eye blinks were scaled to create 32 levels of artifact magnitude ranging from 20 to 300 µV in increments of 9 µV. This approach therefore allowed examination of the automated eye blink component selection algorithms across an extreme range of signal-to-noise ratios. Figure 2 provides exemplars of the simulated EEG across the range of possible eye blink artifact magnitude and noise conditions.
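A minimal sketch of the blink seeding, assuming SciPy's Dolph-Chebyshev window (scipy.signal.windows.chebwin) as the model blink. The sampling rate, the 100 dB sidelobe attenuation, the 1 s offset of the first blink, and the per-channel weight vector (a hypothetical stand-in for the empirically derived spherical head model) are our assumptions, not values reported here.

```python
import numpy as np
from scipy.signal.windows import chebwin


def inject_blinks(data, channel_weights, amplitude_uv, fs, n_blinks=20,
                  interval_s=1.25, blink_ms=250, attenuation_db=100.0, start_s=1.0):
    """Add Chebyshev-window 'blinks' to data (channels x samples, in µV).

    channel_weights (hypothetical) scales the blink at each channel; the blink
    peaks at amplitude_uv on the channel with the largest |weight|.
    The data must be long enough to hold the last blink.
    """
    contaminated = data.copy()
    width = int(round(blink_ms / 1000.0 * fs))
    blink = chebwin(width, at=attenuation_db)
    blink = blink / blink.max()                      # unit peak
    weights = np.asarray(channel_weights, dtype=float)
    weights = weights / np.abs(weights).max()        # unit maximal channel
    onsets = np.round((start_s + np.arange(n_blinks) * interval_s) * fs).astype(int)
    for onset in onsets:
        contaminated[:, onset:onset + width] += amplitude_uv * np.outer(weights, blink)
    return contaminated, onsets


# 32 artifact magnitudes: 20, 29, ..., 299 µV (9 µV steps)
artifact_levels_uv = np.arange(20, 300, 9)
```

Looping this over the 32 artifact magnitudes and the 32 noise levels reproduces the 1,024 conditions per exemplar dataset.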

Figure 2. Representative data illustrating the simulated EEG across the range of possible eye blink artifact magnitude and noise conditions for three electrode sites. For reference, the time points for the seeded eye blinks are highlighted in green.

Following each simulation, ICA decompositions were performed using the extended Infomax algorithm (to extract subgaussian components) with the default settings of the binary instance of this function in EEGLAB. To identify the components related to the simulated artifact, the mean difference (as an absolute value) between the blink-free simulated data and the reconstructed simulated data was computed following back projection of the data without each ICA component, separately. As the eye blink component(s) should be rare relative to the other components, the truly artifactual components were selected by normalizing the differences and computing the probability of each difference occurring given a normal distribution (see Figure 3). Components with a probability less than 0.05 were identified as truly artifactual. Across the 3,072 simulations, the truly artifactual ICA component was identified in 1,700 (55.3%); instances in which it could not be determined occurred when the magnitude of the noise far exceeded the magnitude of the eye blink (see Figures 2 and 4). Comparison of the automated component selection procedures was restricted to those simulations in which the truly artifactual component could be identified.
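The ground-truth selection rule can be sketched as follows. We assume NumPy arrays, a known mixing matrix (channels x components) and component activations, the mean absolute difference as the difference measure, and a lower-tail normal probability, on the reasoning that removing the true blink component should make the reconstruction unusually close to the blink-free data. These choices and the function name are illustrative assumptions.

```python
import math

import numpy as np


def ground_truth_components(mixing, activations, clean, alpha=0.05):
    """Flag components whose removal brings the data unusually close to `clean`.

    mixing: channels x components; activations: components x samples;
    clean: channels x samples (blink-free simulated data).
    """
    full = mixing @ activations
    diffs = np.empty(activations.shape[0])
    for k in range(activations.shape[0]):
        # Back-project without component k
        reconstructed = full - np.outer(mixing[:, k], activations[k])
        diffs[k] = np.abs(reconstructed - clean).mean()
    # Normalize the differences and take the lower-tail normal probability
    z = (diffs - diffs.mean()) / diffs.std()
    p = np.array([0.5 * (1.0 + math.erf(zk / math.sqrt(2.0))) for zk in z])
    return np.flatnonzero(p < alpha), p
```

When noise dominates, no component's removal stands out from the rest, so nothing falls below alpha; this mirrors the simulations in which the ground truth could not be determined.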

Figure 3. Representative data illustrating how the ground truth artifact-related ICA component was identified in the simulated EEG data. Only the removal of a single component returns the simulated data to near its uncontaminated state, with the normalized difference between the uncontaminated data and the contaminated data following removal of the ICA component reflecting that component as an outlier. As most components should be unrelated to the artifact, any component identified as an outlier was considered as related to the artifact.

Figure 4. Graphic illustration of the results of 3,072 simulations of EEG data (1,024 simulations per exemplar dataset) for the likelihood of identifying the artifact (sensitivity) and the likelihood of misidentifying signal as artifact (1-specificity) as a function of eye blink magnitude and noise for each automated procedure. As each exemplar dataset was used to test the full range of signal-to-noise conditions, some data points may reflect a single simulation whereas others may reflect three simulations at that eye blink magnitude and noise level. Areas where the ground truth eye blink component could not be determined (1,372 of the 3,072 simulations) are uncolored.

Each of the three automated procedures (icablinkmetrics() version 3.1, ADJUST() version 1.1.1, and EyeCatch()) was then tested using its default parameters. The icablinkmetrics() function was run using the VEOG channel of the simulated dataset as the artifact comparison channel. It identified eye blinks within the artifact channel by cross-correlating a canonical eye blink waveform using the eyeblinklatencies() function, accepting only seeded eye blinks that exhibited correlations of 0.96 or higher. The efficacy of the automated component selection approaches for reducing the simulated artifact was quantified as the percent reduction in the difference between the blink-free simulated data and the reconstructed simulated data: percent reduction = |D_blink − D_reconstructed| / D_blink × 100, where D_blink is the difference between the data with simulated eye blinks and the blink-free data, and D_reconstructed is the difference between the reconstructed data following artifact removal and the blink-free data (see Table 1). Perfect reconstruction of the simulated data to its blink-free state would thus be reflected by a 100% reduction in the difference between the blink-free simulated data and the reconstructed simulated data following artifact removal. All data processing was conducted using an Apple iMac with a 3.5 GHz Intel Core i7 processor and 32 GB of 1600 MHz DDR3 SDRAM.
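The percent-reduction metric can be written as a short function. We assume that "difference" means the mean absolute difference across channels and samples, as in the ground-truth step; the function name is hypothetical.

```python
import numpy as np


def percent_artifact_reduction(contaminated, reconstructed, clean):
    """|D_blink - D_reconstructed| / D_blink * 100 (all arrays channels x samples)."""
    d_blink = np.abs(contaminated - clean).mean()          # D_blink
    d_reconstructed = np.abs(reconstructed - clean).mean()  # D_reconstructed
    return 100.0 * abs(d_blink - d_reconstructed) / d_blink
```

For example, if the contaminated data differ from the blink-free data by 4 µV on average and the reconstruction differs by 1 µV, the reduction is |4 − 1| / 4 × 100 = 75%.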

Table 1.

ICA component classifications

Method	True Positive (TP): eye blink correctly classified (rejected artifact)	True Negative (TN): non-blink correctly classified (retained signal)	False Positive (FP): classified as eye blink but was not (rejected signal)	False Negative (FN): classified as non-blink but was an eye blink (retained artifact)	Sensitivity: TP / (TP + FN) (identify artifact)	Specificity: TN / (TN + FP) (identify signal)	Reduction of Artifact (based on components selected)
Simulated Data
icablinkmetrics()	1234	45900	0	466	72.6%	100%	89.5%
ADJUST()	1662	45436	464	38	97.8%	99.0%	82.8%
EyeCatch()	1560	45428	472	140	91.8%	99.0%	83.9%
Real Data
icablinkmetrics()	92	4936	0	0	100%	100%	88.0%
32 Channel Array	40	998	0	0	100%	100%	93.4%
64 Channel Array	38	2233	0	0	100%	100%	89.3%
128 Channel Array	14	1705	0	0	100%	100%	69.2%
ADJUST()	89	4738	198	3	96.7%	96.0%	86.6%
32 Channel Array	39	959	39	1	97.5%	96.1%	91.3%
64 Channel Array	38	2110	123	0	100%	94.5%	88.7%
128 Channel Array	12	1669	36	2	85.7%	97.9%	67.2%
EyeCatch()	92	4847	89	0	100%	98.2%	87.2%
32 Channel Array	40	950	48	0	100%	95.2%	93.3%
64 Channel Array	38	2212	21	0	100%	99.1%	89.2%
128 Channel Array	14	1685	20	0	100%	98.8%	64.1%
Expert Observer	89	4930	6	3	96.7%	99.9%	85.4%
32 Channel Array	38	994	4	2	95.0%	99.6%	88.9%
64 Channel Array	38	2233	0	0	100%	100%	89.3%
128 Channel Array	13	1703	2	1	92.9%	99.9%	64.6%
Competent Observer	81	4921	15	11	88.0%	99.7%	79.8%
32 Channel Array	38	994	4	2	95.0%	99.6%	88.9%
64 Channel Array	38	2232	1	0	100%	100%	89.3%
128 Channel Array	5	1695	10	9	35.7%	99.4%	27.7%
Novice Observer	81	4888	48	11	88.0%	99.0%	82.4%
32 Channel Array	38	977	21	2	95.0%	97.9%	89.0%
64 Channel Array	37	2230	3	1	97.4%	99.9%	88.9%
128 Channel Array	6	1681	24	8	42.9%	98.6%	46.0%


Note: Values indicate the number of components. The values for Reduction of Artifact indicate the percentage of the artifact removed following removal of the ICA components identified as artifactual. For the simulated data, this value reflects the percent similarity between the simulated data prior to the introduction of eye blink artifacts and the reconstructed data following removal of the selected ICA components. For the real data, this value reflects the percent reduction of the convolution (i.e., overlap) between the mean eye blink artifact and the EEG activity across all electrode sites during this same period following removal of the selected ICA components.

Statistical Analysis

The efficacy of the automated procedures for identifying the eye blink ICA component was examined statistically by evaluating their sensitivity (the likelihood of correctly identifying the eye blink ICA component(s); i.e., hits) and specificity (the likelihood of correctly not identifying a non-blink component as an eye blink ICA component; i.e., correct rejections) relative to the truly artifactual component. As all simulated datasets were contaminated by eye blink artifact, failure to select an eye blink component was considered a false negative error ('miss'), unless the truly artifactual component could not be determined (e.g., if the Infomax algorithm was unable to separate the seeded eye blink from the background noise).
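The classification scheme can be sketched in plain Python: for one dataset, the components a method selects are compared with the truly artifactual set to yield TP, TN, FP, and FN counts, which are summed across datasets and converted to sensitivity and specificity. The function names are hypothetical; the checks at the end reproduce two values from Table 1.

```python
def confusion_counts(selected, truly_artifactual, n_components):
    """TP, TN, FP, FN for one dataset's component selections."""
    selected, truth = set(selected), set(truly_artifactual)
    tp = len(selected & truth)     # blink component rejected
    fp = len(selected - truth)     # signal component rejected
    fn = len(truth - selected)     # blink component retained
    tn = n_components - tp - fp - fn
    return tp, tn, fp, fn


def sensitivity(tp, fn):
    return tp / (tp + fn)


def specificity(tn, fp):
    return tn / (tn + fp)


# Table 1, simulated data
print(f"{sensitivity(1234, 466):.1%}")    # icablinkmetrics() sensitivity: 72.6%
print(f"{specificity(45436, 464):.1%}")   # ADJUST() specificity: 99.0%
```

For instance, a method that selects components 0 and 3 in a 28-component decomposition whose only blink component is 0 contributes TP = 1, FP = 1, FN = 0, and TN = 26.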

Results

Component selection counts along with the sensitivity and specificity are provided in Table 1. A graphical illustration of the likelihood of identifying the artifact (sensitivity) and the likelihood of misidentifying signal as artifact (1-specificity) as a function of eye blink magnitude and noise for each automated procedure is provided in Figure 4. Results of the simulation indicate that icablinkmetrics() exhibited a lower sensitivity level (72.6%) than ADJUST() and EyeCatch(), which exhibited sensitivities above 91%. The sensitivity of icablinkmetrics() and EyeCatch() was observed to vary as a function of the magnitude of the eye blink artifact and the relative noise level, with both demonstrating perfect sensitivity when the artifact amplitude to noise ratio was high. However, as the artifact amplitude to noise ratio was reduced so too was the sensitivity (see Figure 4). In contrast, ADJUST() exhibited a less interpretable pattern of decreases in sensitivity.

Although icablinkmetrics() exhibited reduced sensitivity relative to the other methods, it also displayed perfect specificity (i.e., it never made a false alarm) regardless of the artifact amplitude or noise level of the simulated data. The specificity of ADJUST() was observed to vary as a function of the magnitude of the eye blink artifact and the relative noise level, with perfect specificity when the artifact amplitude to noise ratio was high. However, as the artifact amplitude to noise ratio was reduced, so too was the specificity (see Figure 4). In contrast, EyeCatch() exhibited a less interpretable pattern of decreases in specificity, with a seemingly greater incidence of falsely identifying components as artifactual when the noise level was lowest. Additionally, icablinkmetrics() exhibited a 0% false discovery rate (FP / (FP + TP)), with removal of the selected components resulting in 89.5% similarity to the original blink-free simulated data, whereas ADJUST() and EyeCatch() exhibited false discovery rates of 21.8% and 23.2%, respectively, with removal of the selected components resulting in less than 84% similarity to the original blink-free simulated data. However, when the analysis was restricted to instances in which all three automated component selection approaches identified a component as artifactual (thereby ensuring equivalent comparisons free from potential bias related to the failure to identify a component), the components selected by icablinkmetrics(), ADJUST(), and EyeCatch() all returned the data to approximately 91% similarity to the original blink-free simulated data.

Discussion

The aim of this section was to evaluate the extent to which automatic eye blink ICA component selection methods would be sensitive to variation in the magnitude of the eye blink artifact amid increasing levels of noise in the signal. Using simulated EEG data with an identifiable, truly artifactual eye blink ICA component revealed that, sensibly, decreases in the ratio between the artifact amplitude and the noise appeared to negatively impact each of the automated selection approaches. For the time-series approach used by icablinkmetrics(), decreases in the ratio between the artifact amplitude and the noise resulted in a reduced ability to identify a component as related to the artifact. However, despite alterations in the amplitude of the artifact and the noise, icablinkmetrics() never falsely identified a non-artifactual component as related to the eye blink. Under fully automated implementations, then, icablinkmetrics() might fail to identify ICA components associated with the eye blink in noisier datasets but would seem to be robust against falsely removing signal-related ICA components (i.e., it errs on the side of caution), as reflected by a 100% positive predictive value and 99% negative predictive value.

EyeCatch(), in contrast, relying on spatial features alone, exhibited greater stability in its ability to identify eye blink related ICA components despite decreases in the ratio between the artifact amplitude and the noise. However, EyeCatch() exhibited the highest false discovery rate of any of the methods, particularly when the dataset exhibited very low levels of noise, suggesting that under fully automated implementations EyeCatch() might encourage the removal of signal-related ICA components, as reflected by a 76.8% positive predictive value and 99.7% negative predictive value.

ADJUST(), which relies on combined stereotypical spatial and temporal features, exhibited more random failures in its ability to identify ICA components associated with the eye blink, whereas only its likelihood of falsely identifying signal-related ICA components was related to the ratio between the artifact amplitude and the noise. Thus, similar to EyeCatch(), ADJUST() exhibited a 78.2% positive predictive value and 99.9% negative predictive value, suggestive of a bias towards detecting the eye blink related component at the expense of occasionally falsely identifying a signal-related component as artifactual. From a signal detection standpoint these results are sensible: the approach that made no false alarms [icablinkmetrics()] also exhibited many misses, while the approaches that had the most hits [ADJUST() & EyeCatch()] also had the most false alarms.
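The predictive values and false discovery rates cited above follow directly from the simulated-data counts in Table 1. The short check below recomputes them; the function name is hypothetical, and note that the false discovery rate is simply 1 − PPV.

```python
def predictive_values(tp, tn, fp, fn):
    """Positive/negative predictive value and false discovery rate."""
    ppv = tp / (tp + fp)   # share of selected components that were truly blinks
    npv = tn / (tn + fn)   # share of retained components that were truly signal
    fdr = fp / (fp + tp)   # equals 1 - PPV
    return ppv, npv, fdr


# (TP, TN, FP, FN) for the simulated data in Table 1
simulated_counts = {
    "icablinkmetrics()": (1234, 45900, 0, 466),
    "ADJUST()": (1662, 45436, 464, 38),
    "EyeCatch()": (1560, 45428, 472, 140),
}

for method, counts in simulated_counts.items():
    ppv, npv, fdr = predictive_values(*counts)
    print(f"{method}: PPV {ppv:.1%}, NPV {npv:.1%}, FDR {fdr:.1%}")
# icablinkmetrics(): PPV 100.0%, NPV 99.0%, FDR 0.0%
# ADJUST(): PPV 78.2%, NPV 99.9%, FDR 21.8%
# EyeCatch(): PPV 76.8%, NPV 99.7%, FDR 23.2%
```

Because true blink components are rare (1,700 of 47,600 components), NPV stays near 100% for every method, so PPV is the more discriminating summary of false positive selections.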

To ensure that the eye blink artifact is fully removed (e.g., in cases where the ICA algorithm separated the eye blink artifact across multiple components), one might consider the bias to remove several ICA components a strength of the ADJUST() and EyeCatch() approaches. However, within the context of the present investigation, the ICA algorithm was effectively able to dissociate the eye blink related activity into a singular component. Thus, other components simply reflect random perturbations of the signal and their removal would have little benefit for restoring the data to its original uncontaminated state. Indeed, when all three automated approaches returned component identifications, removal of additional components by the ADJUST() and EyeCatch() approaches provided no incremental improvement in restoring the data to its uncontaminated state as all approaches exhibited approximately 91% similarity to the original data following removal of the identified components. Such false positive component identifications, however, may be more detrimental within real EEG datasets as the components selected for removal may be associated with important aspects of the neural signal rather than the artifact. Although the use of simulated data allows for determination of the extent to which these selection approaches can identify the truly artifactual component associated with the eye blink, prior to recommending the utilization of any of these fully automated approaches, it is necessary to further examine their efficacy when used with real EEG data varying across common electrode densities (i.e., 32, 64, and 128 channel montages) and in response to different tasks. We address this issue next.