Author manuscript; available in PMC: 2021 Jan 1.
Published in final edited form as: J Abnorm Psychol. 2020 Jan;129(1):29–37. doi: 10.1037/abn0000458

Methodological Choices in Event-Related Potential (ERP) Research and Their Impact on Internal Consistency Reliability and Individual Differences: An Examination of the ERN and Anxiety

Julia Klawohn 1,2, Alexandria Meyer 2, Anna Weinberg 3, Greg Hajcak 1,2
PMCID: PMC6931902  NIHMSID: NIHMS1040338  PMID: 31868385

Abstract

Researchers in clinical psychophysiology make several methodological decisions during the analysis of event-related potentials (ERPs). In the current study, we review these choices from the perspective of individual differences. We focus on baseline period and reference scheme (i.e., average, mastoid, current source density), as well as choices regarding where (i.e., single electrode site vs. pooling of sites), when (i.e., area, area around peak) and how (i.e., subtraction- or regression-based difference scores) to quantify ERPs. To illustrate the impact of these analytic pathways on internal consistency reliability and individual differences, we focus on the error-related negativity (ERN) and anxiety, and present data from two samples: first, in adults with diagnosed generalized anxiety disorder (GAD); second, in relation to continuous self-reported symptoms of GAD in a large community sample of adolescent females. Results generally indicated similar internal consistency and between-subject effect sizes across all evaluated methods. Nonetheless, some patterns of variation emerged: across both datasets, difference-based ERN measures, especially with mastoid reference, yielded more robust associations with GAD diagnosis and symptoms, despite somewhat lower internal consistency. The current analyses suggest that the association between ERN and anxiety is robust across a range of commonly used methodological choices. The present study is an example of how systematic analyses of the effects of analytic strategies on measures of internal consistency and between-subject variability could help inform individual differences ERP research.

Keywords: ERN, event-related potentials, ERP, psychopathology, methods

General Scientific Summary:

This study systematically explores the effects of different EEG analytic choices on the internal consistency of a neural measure and its relation to symptoms of anxiety in two independent data sets. Results indicate psychometric properties of the error-related negativity (ERN) are robust across a range of commonly used methodological choices.


Event-related potentials (ERPs) are measures of neural function that can be used to study distinct neural processes both within and across individuals; they are relatively inexpensive and easy to assess, and feasible and safe to assess across the lifespan. Moreover, ERPs can be used as neurocognitive measures to characterize and differentiate clinical groups and related traits and symptom dimensions (Weinberg, Dieterich, & Riesel, 2015). Further, ERPs can function as neurobiological risk markers that can track patterns of familial risk for psychopathology – and ERPs can be leveraged to prospectively predict the onset and course of psychiatric disorders (Hajcak, Klawohn, & Meyer, 2019).

Although ERPs directly reflect functional electrocortical activity, they are the end-state that results from a number of data processing steps and analytic decisions. While there are commonly accepted processing principles and recommendations (e.g., Keil et al., 2014; Luck, 2014), there are various suitable options to choose from for many of the processing steps. For instance, ERPs reflect the voltage, or electrical difference, between an electrode site and a reference site; an ERP waveform “at” one electrode really reflects the electrical potential between that electrode and the reference used. One of the most common reference schemes is to contrast EEG channels against the averaged activity recorded from the left and right mastoid electrodes. Similar approaches use an electrode placed on the nose or both earlobes as the reference. On the other hand, the average reference scheme uses the mean activity of all channels as the reference for each individual EEG channel (for more detailed descriptions see Luck, 2014). Further, a form of reference-free data analysis is the current source density (CSD) transformation, which estimates radial current flow from the scalp-recorded EEG by using neighboring electrodes as the reference (Kayser & Tenke, 2015). While all these referencing schemes are reasonable for most ERPs, they do alter the appearance of waveforms considerably (Hajcak, Weinberg, MacNamara, & Foti, 2012), and little is known about how the choice of reference might affect both psychometric properties of ERPs and relationships between ERPs and other individual differences measures.
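The two simplest reference schemes described above can be illustrated with a minimal sketch. It assumes a toy recording stored as a dictionary of channel name to sample list; the channel labels M1 and M2 for the mastoids and all voltage values are hypothetical and not taken from the study. The CSD transformation is not covered here because it requires spherical spline interpolation.

```python
from statistics import mean

def rereference_mastoid(data, left="M1", right="M2"):
    """Subtract the mean of the two mastoid channels from every channel."""
    n = len(data[left])
    ref = [(data[left][i] + data[right][i]) / 2 for i in range(n)]
    return {ch: [v - r for v, r in zip(vals, ref)] for ch, vals in data.items()}

def rereference_average(data, exclude=("M1", "M2")):
    """Subtract the mean of all scalp channels from each scalp channel."""
    scalp = {ch: vals for ch, vals in data.items() if ch not in exclude}
    n = len(next(iter(scalp.values())))
    ref = [mean(vals[i] for vals in scalp.values()) for i in range(n)]
    return {ch: [v - r for v, r in zip(vals, ref)] for ch, vals in scalp.items()}

# Hypothetical three-sample recording in microvolts
data = {
    "FCz": [1.0, -4.0, 2.0],
    "Cz": [0.5, -2.0, 1.0],
    "M1": [0.2, 0.4, 0.0],
    "M2": [0.0, 0.2, 0.2],
}
mast = rereference_mastoid(data)
avg = rereference_average(data)
```

After average referencing, the scalp channels sum to zero at every sample, which is one reason waveforms can look quite different under the two schemes.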

ERPs are often quantified in terms of the difference between two within-subject conditions (e.g., ERP activity on emotional versus neutral trials). This scoring approach is used to isolate neural activity associated with one condition relative to another—this is also done to examine the specificity of differences that may exist between people (i.e., to control for potential differences in a baseline condition). Relative measures are often derived by computing subtraction-based difference scores (i.e., subtraction of condition-related mean amplitudes, or by scoring the ERP difference waveform—what we have previously referred to as delta [Δ] measures). Moreover, we have recently suggested using residualized scores as an alternative relative ERP difference approach (Meyer, Lerner, De Los Reyes, Laird, & Hajcak, 2017).
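The two relative scoring approaches can be contrasted in a short sketch. It assumes one mean amplitude per participant and condition; the amplitude values are invented for illustration, and the regression is an ordinary least squares fit with a single predictor, which is an assumption about how residualized scores would typically be computed.

```python
from statistics import mean

def subtraction_scores(error, correct):
    """Delta score: error-trial amplitude minus correct-trial amplitude."""
    return [e - c for e, c in zip(error, correct)]

def residual_scores(error, correct):
    """Unstandardized residuals from regressing error on correct amplitudes."""
    mx, my = mean(correct), mean(error)
    sxx = sum((x - mx) ** 2 for x in correct)
    sxy = sum((x - mx) * (y - my) for x, y in zip(correct, error))
    slope = sxy / sxx
    intercept = my - slope * mx
    return [y - (intercept + slope * x) for x, y in zip(correct, error)]

# Hypothetical per-participant mean amplitudes in microvolts
crn = [4.0, 6.0, 5.0, 8.0, 7.0]    # correct trials
ern = [-3.0, 1.0, -2.0, 2.0, -1.0]  # error trials
delta = subtraction_scores(ern, crn)
resid = residual_scores(ern, crn)
```

By construction, the residual scores are uncorrelated with the correct-trial amplitudes in the sample, whereas subtraction scores can remain correlated with both conditions.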

In addition, there is substantial variability in where and how ERPs are quantified. ERPs are typically measured at the site of maximum (i.e., where an ERP is largest); however, many studies average across neighboring electrode sites and measure the ERP at a pooling of electrodes, often with the presumption that doing so increases signal and reduces noise. In terms of ERP quantification, one very common approach is calculating the mean activity in a specific time period (i.e., mean amplitude). Alternatively, ERPs can be scored using peak-based methods, based on the determination of a local maximum or minimum; this approach includes simple peak (i.e., the single most extreme amplitude value) or peak-to-peak quantification (i.e., the amplitude difference between the peak of the ERP of interest and another peak). In general, the advantage of peak scoring approaches is that they can account for individual variation in the timing of ERP peaks, whereas mean amplitude scoring uses the same window for all individuals; however, peak scoring approaches have been criticized because they weight a single data point and can be biased measures (Luck, 2014). Scoring the area around a peak (i.e., mean activity of a component centered around the peak) is a hybrid method which may benefit from the relative advantages of both the peak and mean amplitude approaches.
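The difference between fixed-window, peak, and area-around-peak scoring can be seen in a toy example. The sketch assumes a waveform sampled every 10 ms with a triangular negativity peaking at 90 ms; the waveform, the window limits, and the function names are illustrative assumptions only.

```python
def mean_amplitude(times, wave, start, end):
    """Mean voltage of all samples with start <= t <= end (ms)."""
    vals = [v for t, v in zip(times, wave) if start <= t <= end]
    return sum(vals) / len(vals)

def negative_peak(times, wave, start, end):
    """Return (latency, amplitude) of the most negative sample in the window."""
    window = [(t, v) for t, v in zip(times, wave) if start <= t <= end]
    return min(window, key=lambda tv: tv[1])

def area_around_peak(times, wave, start, end, width=100):
    """Mean voltage in a window of `width` ms centered on the negative peak."""
    latency, _ = negative_peak(times, wave, start, end)
    return mean_amplitude(times, wave, latency - width / 2, latency + width / 2)

# Hypothetical waveform: triangular negativity of -8 microvolts peaking at 90 ms
times = list(range(-100, 201, 10))
wave = [-8.0 * max(0.0, 1 - abs(t - 90) / 50) for t in times]

fixed = mean_amplitude(times, wave, 0, 100)        # same window for everyone
peak = negative_peak(times, wave, -100, 200)       # single most negative point
hybrid = area_around_peak(times, wave, -100, 200)  # window follows the peak
```

Because this participant's peak is late, the fixed 0 to 100 ms window captures only part of the component, and the area-around-peak score is more negative than the fixed-window mean.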

In the present study, we systematically investigated the impact of these methodological choices on the error-related negativity (ERN) – in terms of both its internal consistency and relationship with anxiety. The ERN is a response-locked ERP that presents as a sharp negative deflection shortly after error commission over fronto-central electrodes and is a well-established electrophysiological marker of error-processing. Increased ERN amplitudes have been reported in adult clinical groups with obsessive-compulsive disorder (OCD; Riesel, 2019), generalized anxiety disorder (GAD; Weinberg, Olvet, & Hajcak, 2010), social anxiety disorder (Endrass, Riesel, Kathmann, & Buhlmann, 2014), and in pediatric OCD and anxiety (Meyer, 2017). Moreover, amplified error signaling can indicate risk for psychopathology, as shown by familial (Riesel et al., 2019) and prospective developmental studies (Meyer, Hajcak, Torpey-Newman, Kujawa, & Klein, 2015). Thus, the ERN has emerged as a neural measure with substantial clinical utility (Hajcak et al., 2019).

ERPs always include baseline activity prior to an event of interest. In ERP studies examining stimulus-related processes, the 200 or 500 ms period prior to stimulus onset is often used as baseline. The choice of baseline is more complicated for the ERN insofar as differences between error and correct trials are evident prior to an incorrect button press; thus, using the 200 ms window prior to responses may include some error-related brain activity. Indeed, studies have used either the mean activity from −200 to 0 ms before response (Hajcak, McDonald, & Simons, 2004) or −500 to −300 ms (Weinberg et al., 2016) as baseline.
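Why the baseline window matters for response-locked data can be shown with a small sketch. It assumes a toy epoch in which activity starts ramping up 300 ms before the response; the waveform and the half-open window convention (start included, end excluded) are assumptions for illustration.

```python
def baseline_correct(times, wave, start, end):
    """Subtract the mean voltage in [start, end) ms from every sample."""
    base = [v for t, v in zip(times, wave) if start <= t < end]
    offset = sum(base) / len(base)
    return [v - offset for v in wave]

# Hypothetical epoch: flat until -300 ms, then a linear ramp (microvolts)
times = list(range(-500, 101, 50))
wave = [0.0 if t <= -300 else (t + 300) / 100.0 for t in times]

early = baseline_correct(times, wave, -500, -300)  # pre-ramp baseline
proximal = baseline_correct(times, wave, -200, 0)  # baseline overlaps the ramp

i50 = times.index(50)
print(early[i50], proximal[i50])
```

With the earlier baseline the post-response value is left unchanged, whereas the response-proximal baseline removes part of the pre-response activity from every sample in the epoch.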

In terms of individual difference studies on the ERN, mastoid reference (Gehring, Himle, & Nisenson, 2000; Olvet & Hajcak, 2010), average reference (Endrass, Klawohn, Schuster, & Kathmann, 2008), and CSD reference schemes (Nelson, Jackson, Amir, & Hajcak, 2017) have all been used. The ERN is scored most commonly at or around the FCz location (Kaczkurkin, 2013) or at pooled electrode sites surrounding FCz (Larson, Steffen, & Primosch, 2013). Some studies have scored the ERN as the average activity (i.e., mean amplitude) in a fixed window (Gehring et al., 2000), whereas others have quantified the ERN in terms of simple peak scoring (Nieuwenhuis, Nielen, Mol, Hajcak, & Veltman, 2005), peak-to-peak scoring (Klawohn, Endrass, Preuss, Riesel, & Kathmann, 2016), or area around the peak (Boksem, Tops, Kostermans, & De Cremer, 2008). Several studies have analyzed relative difference scores (i.e., error relative to correct trials; ΔERN), using the subtraction of mean amplitude (Meyer et al., 2015), regression-based residual scores (Meyer et al., 2017), or area around the peak of the difference waveform (Chong & Meyer, 2018).

For the current investigation, two different samples were analyzed to systematically investigate the impact of common methodological choices – such as different references, baseline periods, and quantification methods – on psychometric properties of the ERN and its relationship to generalized anxiety. Since the ERN relates both to pathological and continuous variability of worry (Moser, Moran, Schroder, Donnellan, & Yeung, 2013) and in line with the most common study approaches in clinical ERP research, the current study first reexamined previously reported data on the ERN in relation to clinical GAD; then, we conducted identical analyses in a new large adolescent data set in relation to continuous self-reported symptoms of worry.

Method

Samples and Measures

Sample 1 combines EEG data of adult participants from two previously published studies (Weinberg, Klein, & Hajcak, 2012; Weinberg et al., 2010), re-analyzed here using different methodological approaches. The sample includes 40 participants with a current diagnosis of generalized anxiety disorder (GAD), without comorbid depression, and 51 participants without current or past diagnosis of a psychiatric disorder (HC). All participants were interviewed with the Structured Clinical Interview for Diagnostic and Statistical Manual of Mental Disorders, fourth edition (SCID-I; First, Spitzer, Gibbon, & Williams, 1995), to ensure they either met diagnostic criteria for current GAD or did not currently or previously meet criteria for any Axis I diagnosis, respectively. For additional information see Weinberg et al. (2010). Participants had a mean age of 25.6 years (GAD: 26.4, SD = 9.9; HC: 25.01, SD = 8.3) and were predominantly female (GAD: 95.0%, HC 84.3%). All participants gave informed consent prior to participation.

Sample 2 is from a large prospective study in female adolescents. Basic descriptions of the first wave of assessment can be found in Meyer et al. (2018). The data for the current study stem from the second wave of assessment and have not been reported before. Self-report and flanker task EEG data were available from 195 adolescent girls. Data from 11 participants had to be excluded from analysis due to insufficient data quality (n = 7), <6 error trials (n = 1), or >45% errors (n = 3). The resulting sample included 184 girls with a mean age of 14.4 (range 10 – 17) years. All participants and their parents provided informed consent and assent, respectively. All adolescents completed the Screen for Child Anxiety Related Emotional Disorders (SCARED; Birmaher et al., 1997), a self-report measure of anxiety symptoms in children aged 9 to 18 years comprising 38 items, each rated on a scale from 0 to 2. For the current study, we examined the association of the ERN with the GAD subscale.

EEG Data Collection and Experimental Task

Participants in both samples completed an identical arrowhead version of a flanker task, administered using Presentation software (Neurobehavioral Systems, Inc., Albany, California, USA). In total, 330 trials were administered. Half of the stimuli were compatible, the other half incompatible, presented in random order in 11 blocks. Performance-sensitive feedback was provided during the breaks between blocks: “Please try to be more accurate” was shown when performance was 75% correct or lower; “Please try to respond faster” when performance was above 90% correct; otherwise “You are doing a great job” was presented. Continuous EEG data were collected from 34 electrodes, positioned with an elastic cap in accordance with the 10/20 system, as well as on the right and left mastoids. Eye movements and blinks were recorded with four electrodes, two placed horizontally at the canthi of the eyes, two placed above and below the right eye. Data were pre-amplified with a BioSemi ActiveTwo System (Biosemi, Amsterdam, Netherlands) and digitized at a sampling rate of 1024 Hz. A common mode sense (CMS) active electrode served as recording reference.
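The feedback rule of the task can be written as a small function. This is a sketch under the assumption that accuracy is computed from the trials of the block just completed (30 trials if the 330 trials were split evenly across 11 blocks); the text does not state whether accuracy was computed per block or cumulatively, and the function name is hypothetical.

```python
def block_feedback(n_correct, n_trials):
    """Return the feedback message for a given number of correct responses."""
    accuracy = n_correct / n_trials
    if accuracy <= 0.75:
        return "Please try to be more accurate"
    if accuracy > 0.90:
        return "Please try to respond faster"
    return "You are doing a great job"

print(block_feedback(22, 30))  # about 73% correct
print(block_feedback(26, 30))  # about 87% correct
print(block_feedback(29, 30))  # about 97% correct
```

Feedback of this kind keeps error rates in a range that yields enough error trials for a stable ERN while discouraging very slow, overly cautious responding.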

EEG analyses (i.e., methodological pathways)

EEG data were processed offline using Brain Vision Analyzer, Version 2.1 (Brain Products, Gilching, Germany). Initially, EEG data were re-referenced to the average of both mastoid electrodes and bandpass filtered from 0.1 to 30 Hz. Data were segmented into epochs from −500 to 1000 ms around responses. Ocular artifacts were corrected using the algorithm by Gratton, Coles, & Donchin (1983), employing the horizontal and vertical eye channels. Further artifact rejection was performed automatically: epochs were rejected when they contained voltage steps of more than 50 µV between sampling points, a maximal absolute difference of more than 300 µV, or low activity, defined as a voltage difference < 0.5 µV over 100 ms. Remaining artifacts were identified and removed based on visual inspection.
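The three automatic rejection criteria can be sketched for a single-channel epoch as follows. This is not the Brain Vision Analyzer implementation; it assumes that the 300 µV criterion applies to the whole epoch and that low activity is checked in every sliding 100 ms window, neither of which is specified in the text. The toy data use a 100 Hz sampling rate to keep the example short.

```python
import math

def reject_epoch(samples, srate_hz=1024, max_step=50.0, max_range=300.0,
                 min_activity=0.5, activity_window_ms=100):
    """Return True if a single-channel epoch (microvolts) violates a criterion."""
    # 1. Voltage step between consecutive samples
    if any(abs(b - a) > max_step for a, b in zip(samples, samples[1:])):
        return True
    # 2. Maximal absolute difference within the epoch (assumed: whole epoch)
    if max(samples) - min(samples) > max_range:
        return True
    # 3. Low activity: range below threshold in any sliding window
    win = max(2, round(activity_window_ms * srate_hz / 1000))
    for i in range(len(samples) - win + 1):
        seg = samples[i:i + win]
        if max(seg) - min(seg) < min_activity:
            return True
    return False

# Hypothetical epochs at 100 Hz
clean = [20.0 * math.sin(2 * math.pi * 5 * i / 100) for i in range(150)]
spike = list(clean)
spike[75] += 100.0                       # sudden voltage step
drift = [3.0 * i for i in range(150)]    # slow drift exceeding 300 microvolts
flat = [0.0] * 150                       # dead channel

print([reject_epoch(e, srate_hz=100) for e in (clean, spike, drift, flat)])
```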

Subsequent analyses were then performed three times, each using a different reference scheme (see Figure S1 in supplement). Data either remained referenced to mastoid electrodes, were re-referenced to the average of all electrode sites, or were submitted to the current source density (CSD) transformation (order of splines = 4; maximal degree of Legendre polynomials = 10; smoothing parameter λ = 10^−5). For all reference schemes, response-locked ERPs were averaged for correct and incorrect responses separately. Two baseline corrections were applied, using intervals from −500 to −300 ms or −200 to 0 ms before response. For all resulting ERP averages, several different quantification methods were used at electrode FCz and at a pool of electrodes (i.e., average of Cz, FC1, FC2, FCz, and Fz). Peak detection determined individual ERN peaks as the most negative deflection from −100 to 200 ms around response onset. Further, a preceding positive peak was identified within −150 to 50 ms relative to response onset. Peaks were visually inspected and corrected if necessary. ERP quantification included the following measures: mean amplitude from 0 to 100 ms after response; mean amplitude over 100 ms centered around the peak of the ERN (i.e., area around the peak of the ERN); and peak-to-peak amplitude (i.e., difference between the ERN peak and preceding positive peak). Further, several difference-score (i.e., ΔERN) measures were calculated: ΔERNsubtract as the difference between error and correct mean amplitude scores in the 0 to 100 ms window relative to response onset; the area around the peak of the difference waveform as the mean amplitude over 100 ms centered around that peak; and ΔERNresid as the unstandardized residuals from a regression in which correct-response mean amplitudes predicted error mean amplitudes.
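Electrode pooling and peak-to-peak scoring with the search windows given above can be sketched as follows. The waveforms are invented, only two of the five pooled electrodes are shown, and the requirement that the positive peak occur before the detected ERN latency is an assumption about how "preceding" would be enforced.

```python
def pool(channels):
    """Average several electrode waveforms sample by sample."""
    return [sum(vals) / len(vals) for vals in zip(*channels)]

def peak_to_peak(times, wave):
    """ERN peak (-100 to 200 ms) minus the preceding positive peak (-150 to 50 ms)."""
    ern_t, ern_v = min(
        ((t, v) for t, v in zip(times, wave) if -100 <= t <= 200),
        key=lambda tv: tv[1],
    )
    _, pos_v = max(
        ((t, v) for t, v in zip(times, wave) if -150 <= t <= 50 and t < ern_t),
        key=lambda tv: tv[1],
    )
    return ern_v - pos_v

# Hypothetical waveforms (microvolts): positivity at -40 ms, ERN at 50 ms
times = list(range(-200, 301, 10))
fcz = [3.0 if t == -40 else -9.0 if t == 50 else 0.0 for t in times]
cz = [v / 2 for v in fcz]

p2p_fcz = peak_to_peak(times, fcz)
p2p_pool = peak_to_peak(times, pool([fcz, cz]))
print(p2p_fcz, p2p_pool)
```

Because both peaks come from the same waveform, adding a constant offset to the whole epoch leaves the score unchanged, which is why the peak-to-peak measure does not depend on the baseline period.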

Statistical Analysis

In Sample 1, group differences in ERN amplitudes for the GAD and HC group were examined for all methodological choices using independent samples t-tests. Effect size estimates (Cohen’s d) and respective 95% confidence intervals were determined (Wuensch, 2012). In Sample 2, the correlation between ERN scores and the SCARED GAD subscale was examined with Pearson’s r; confidence intervals were determined via Fisher’s z’-transformation. Further, internal consistency reliability of ERN scores was examined with a split-half approach, in which the correlation between averages of odd- and even-numbered trials was determined and corrected using the Spearman-Brown prophecy formula (Nunnally, Bernstein, & Berge, 1967). All statistical analyses were conducted with SPSS Statistics, version 23.0 (IBM, Armonk, N.Y.). Statistical tests were two-tailed with α = .05.
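The split-half procedure can be sketched in a few lines. The example assumes single-trial amplitudes are available per participant, which is a simplification: in practice, odd and even trials are averaged into separate ERPs that are then scored. All values are invented, and the corrected reliability is r_SB = 2r / (1 + r), where r is the odd-even correlation.

```python
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation of two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman_brown(r_half):
    """Step the half-test correlation up to full test length."""
    return 2 * r_half / (1 + r_half)

def split_half_reliability(trials_by_subject):
    """Correlate odd- and even-trial averages, then apply Spearman-Brown."""
    odd = [mean(t[0::2]) for t in trials_by_subject]   # trials 1, 3, 5, ...
    even = [mean(t[1::2]) for t in trials_by_subject]  # trials 2, 4, 6, ...
    return spearman_brown(pearson_r(odd, even))

# Hypothetical single-trial ERN amplitudes (microvolts), one list per participant
trials = [
    [-10, -8, -12, -9, -11, -10],
    [-4, -6, -5, -3, -6, -4],
    [-7, -7, -8, -6, -9, -7],
    [-1, -3, -2, -2, 0, -3],
]
reliability = split_half_reliability(trials)
```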

Results

Sample 1

Grand-average waveforms are presented in Figure 1 (upper panel); results of internal consistency and between-subject analyses are presented in Table 1 and Figure 2. For simple ERN measures, the choice of baseline did not impact internal consistency (mean r = .80, for both baseline periods; ranging from .72 to .88). However, the response-proximal baseline (i.e., −200 to 0 ms) appeared to be associated with larger between-subjects effect sizes than the earlier (i.e., −500 to −300 ms) baseline, and this pattern was especially prominent for ERN measures with average reference. Across all reference schemes, the baseline-independent peak-to-peak measures for both single and pooled electrodes had high internal consistency (mean r = .85) and rather large between-group effect sizes (mean d = 0.50).
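For readers who wish to gauge the precision of a reported effect size, the following sketch computes Cohen's d from raw scores and an approximate 95% confidence interval. Note that it uses a large-sample normal approximation to the standard error of d, not the noncentral t method cited in the Statistical Analysis section, so the limits will differ somewhat from those in Figure 2. The group sizes are those given in the sample description.

```python
import math
from statistics import mean, variance

def cohens_d(x, y):
    """Standardized mean difference using the pooled standard deviation."""
    n1, n2 = len(x), len(y)
    pooled_var = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    return (mean(x) - mean(y)) / math.sqrt(pooled_var)

def d_confidence_interval(d, n1, n2, z_crit=1.959964):
    """Approximate CI for d (normal approximation; an assumption, see lead-in)."""
    se = math.sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
    return d - z_crit * se, d + z_crit * se

# Reported d for mastoid-referenced mean amplitude at FCz, -200 to 0 ms baseline
lo, hi = d_confidence_interval(0.515, 40, 51)
print(round(lo, 2), round(hi, 2))
```

With groups of this size the interval is wide, which helps explain why the confidence intervals of the different analytic pathways overlap.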

Figure 1. Grand-average waveforms for Sample 1 (upper panel) and Sample 2 (lower panel) with respect to the main analysis methods. CSD = Current source density transformation. Please note different scales across columns.

Table 1. Results of group comparisons (Sample 1), correlational analyses (Sample 2), and internal consistency analyses for ERPs derived from different methodological approaches. Sample 1: N = 91 (GAD: n = 41, HC: n = 50); Sample 2: N = 184 adolescents.

                          Sample 1 (group comparison)                   Sample 2 (correlation with SCARED GAD subscale)
                          Baseline −200 to 0     Baseline −500 to −300  Baseline −200 to 0     Baseline −500 to −300
REF   Quantification      d      p     rint      d      p     rint      r      p     rint      r      p     rint

ERN measures
MAST  MA FCz              0.515  .017  .72       0.473  .028  .71       −.146  .048  .76       −.114  .124  .83
MAST  MA Pool             0.505  .019  .72       0.439  .040  .73       −.144  .051  .76       −.103  .165  .83
MAST  AAP FCz             0.519  .016  .74       0.444  .038  .70       −.099  .180  .74       −.079  .288  .82
MAST  AAP Pool            0.430  .045  .78       0.355  .096  .76       −.096  .196  .68       −.064  .387  .79
MAST  Peak to Peak FCz    0.567  .009  .84       0.567  .009  .84       −.112  .132  .77       −.112  .132  .77
MAST  Peak to Peak Pool   0.485  .024  .86       0.485  .024  .86       −.104  .160  .72       −.104  .160  .72
AVG   MA FCz              0.507  .019  .76       0.349  .102  .79       −.148  .044  .82       −.070  .346  .86
AVG   MA Pool             0.549  .011  .79       0.349  .102  .84       −.159  .031  .81       −.059  .428  .84
AVG   AAP FCz             0.432  .044  .71       0.270  .203  .77       −.129  .082  .84       −.046  .536  .89
AVG   AAP Pool            0.484  .024  .74       0.275  .198  .84       −.129  .082  .83       −.021  .774  .86
AVG   Peak to Peak FCz    0.437  .041  .82       0.437  .041  .82       −.114  .123  .85       −.114  .123  .85
AVG   Peak to Peak Pool   0.463  .031  .84       0.463  .031  .84       −.100  .175  .82       −.100  .175  .82
CSD   MA FCz              0.500  .020  .85       0.422  .048  .73       −.087  .242  .86       −.047  .525  .90
CSD   MA Pool             0.568  .009  .85       0.463  .031  .85       −.129  .081  .85       −.080  .278  .89
CSD   AAP FCz             0.450  .036  .85       0.391  .068  .71       −.097  .190  .84       −.049  .507  .90
CSD   AAP Pool            0.523  .015  .84       0.422  .049  .84       −.126  .088  .86       −.072  .330  .91
CSD   Peak to Peak FCz    0.514  .017  .88       0.515  .017  .88       −.080  .278  .88       −.080  .278  .88
CSD   Peak to Peak Pool   0.515  .017  .87       0.515  .017  .87       −.103  .163  .87       −.103  .163  .87

ΔERN measures
MAST  ΔERNsubtract FCz    0.509  .018  .79       0.617  .004  .64       −.207  .005  .77       −.225  .002  .71
MAST  ΔERNsubtract Pool   0.541  .012  .79       0.632  .004  .62       −.195  .008  .76       −.205  .005  .69
MAST  AAP Diffwave FCz    0.410  .055  .79       0.511  .017  .63       −.205  .005  .74       −.215  .003  .64
MAST  AAP Diffwave Pool   0.456  .034  .77       0.545  .012  .60       −.182  .013  .72       −.183  .013  .60
MAST  ΔERNresid FCz       0.547  .011  .71       0.642  .003  .60       −.177  .016  .75       −.213  .004  .71
MAST  ΔERNresid Pool      0.558  .010  .71       0.642  .003  .59       −.170  .020  .74       −.195  .008  .70
AVG   ΔERNsubtract FCz    0.361  .091  .79       0.304  .154  .66       −.185  .012  .80       −.174  .018  .65
AVG   ΔERNsubtract Pool   0.439  .040  .79       0.380  .075  .67       −.179  .015  .77       −.151  .041  .66
AVG   AAP Diffwave FCz    0.285  .179  .74       0.224  .293  .59       −.176  .017  .82       −.166  .024  .71
AVG   AAP Diffwave Pool   0.353  .098  .80       0.292  .173  .69       −.159  .031  .82       −.133  .072  .68
AVG   ΔERNresid FCz       0.473  .027  .75       0.346  .105  .65       −.178  .016  .79       −.166  .024  .72
AVG   ΔERNresid Pool      0.539  .012  .77       0.418  .051  .68       −.182  .013  .77       −.144  .051  .67
CSD   ΔERNsubtract FCz    0.376  .079  .86       0.317  .138  .63       −.157  .033  .92       −.191  .009  .94
CSD   ΔERNsubtract Pool   0.427  .047  .86       0.342  .109  .80       −.193  .009  .81       −.217  .003  .66
CSD   AAP Diffwave FCz    0.275  .199  .82       0.285  .180  .57       −.131  .077  .78       −.168  .023  .71
CSD   AAP Diffwave Pool   0.334  .118  .83       0.361  .091  .62       −.147  .046  .83       −.173  .019  .79
CSD   ΔERNresid FCz       0.448  .037  .84       0.203  .341  .64       −.137  .065  .83       −.191  .009  .77
CSD   ΔERNresid Pool      0.520  .016  .84       0.285  .182  .80       −.179  .015  .81       −.212  .004  .76

Note. REF = reference; AVG = average reference; MAST = linked mastoid reference; CSD = current source density; MA = mean amplitude, 0–100 ms; AAP = area around peak, 100 ms; Diffwave = difference waveform ERN-CRN; Pool = frontocentral electrode pool, electrodes Fz, FCz, Cz, FC1, FC2; rint = internal consistency.

Figure 2. Forest plot of effect size estimates and 95% confidence intervals in Sample 1 and Sample 2.

Regarding ΔERN measures, the response-proximal baseline showed higher internal consistency (mean r = .79) than the earlier baseline (mean r = .65). Within the response-proximal baseline period, both the ΔERNsubtract and the ΔERNresid had high effect sizes across reference schemes (mean d = 0.48), whereas for the earlier baseline, only mastoid-referenced ΔERN measures resulted in significant group effects. For all ΔERN measures, across reference schemes and baselines, effect size estimates were somewhat higher for pooled (mean d = 0.45) than single electrode (mean d = 0.40) quantification, whereas no differences in internal consistency emerged. Despite these variations, all confidence intervals overlapped (see Figure 2), indicating that none of the analytic choices in Sample 1 produced significantly different results from the rest.

Sample 2

The SCARED GAD subscale yielded a sample mean of 5.9 (SD = 4.6). Results for the association of ERN amplitudes with the GAD subscale are presented in Table 1, grand-average waveforms in Figure 1 (lower panel). Inspection of results for simple ERN measures indicated acceptable to good internal consistency (r range .68 to .88, mean r = .83), for both baseline periods. Significant correlations with GAD symptoms were only found using the response-proximal (i.e., −200 to 0 ms) baseline for mean-amplitude measures with mastoid or average reference. In contrast, ΔERN measures were characterized by significant or trend-level correlations (all p ≤ .077, r range −.131 to −.225) with self-reported GAD symptoms for all quantification methods and across reference schemes. Further considering the various ΔERN measures, the earlier baseline (i.e., −500 to −300 ms) tended to be associated with larger effect sizes for mastoid and CSD references, but not when using average reference; pooling was generally associated with lower effect sizes for both mastoid and average reference, but higher effect sizes for the CSD reference. Internal consistency was good for the response-proximal baseline (r range .72 to .92) and somewhat more variable but in the acceptable to good range for the earlier baseline period (r range .60 to .94). Similar to findings in Sample 1, confidence intervals for associations between ERN and GAD symptoms all overlapped (Figure 2).
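The confidence intervals for the correlations can be approximated from the reported values alone. The sketch below applies the Fisher z-transformation with standard error 1/sqrt(N − 3) to two correlations from Table 1, using N = 184; the function name is illustrative, and the output is an approximation rather than a reproduction of the values plotted in Figure 2.

```python
import math

def fisher_ci(r, n, z_crit=1.959964):
    """95% confidence interval for a Pearson correlation via Fisher's z."""
    z = math.atanh(r)
    se = 1 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

strongest = fisher_ci(-0.225, 184)  # mastoid, subtraction score at FCz, early baseline
weakest = fisher_ci(-0.131, 184)    # CSD, difference-wave area at FCz, proximal baseline
print([round(v, 3) for v in strongest])
print([round(v, 3) for v in weakest])
```

The interval for the strongest correlation excludes zero while the interval for the weakest just includes it, yet the two intervals overlap substantially, in line with the pattern described above.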

Discussion

The current study examined the impact of common choices in ERP analyses on the internal consistency of the ERN and its relationship with individual differences in anxiety. In particular, we focused on different reference schemes, baseline periods, and quantification methods. Our approach was to first reanalyze previously published data on ERN in a relatively large sample of adults with clinical GAD versus healthy controls—to evaluate the impact of specific analytic choices in these data. The more response-proximal baseline (i.e., −200 to 0 ms) was generally associated with better internal consistency, although all measures – including difference measures – had acceptable to good internal consistency. Almost all measures using a mastoid reference yielded robust between-group differences; the fact that both simple ERN and difference measures of the ERN produced comparable between-group differences when the mastoid reference was utilized is consistent with meta-analytic findings in adults (Moser et al., 2013). Most simple ERN measures using the CSD reference also robustly differentiated GAD from healthy participants, and average referenced data was also associated with significant group differences, though only when using the more response-proximal (i.e., −200 to 0 ms) baseline. There was no clear impact of scoring method (i.e., mean amplitude vs peak-based scoring, or using single versus pooled electrodes) in these data.

We then examined the impact of these analytic decisions in a new data set, in relation to continuous GAD symptoms in a large non-clinical sample of adolescents. In this dataset, all ERN measures had acceptable to excellent internal consistency that was on par with internal reliabilities found in adults in Study 1; pooling and scoring approach again had a negligible impact on internal consistency. Unlike Study 1, difference ERN measures were more reliably related to GAD symptoms in the adolescent sample than simple ERN measures. For simple ERN measures, only mean amplitudes with the response-proximal (i.e., −200 to 0 ms) baseline using mastoid or average reference were significantly correlated with GAD symptoms. In contrast, 32 out of 36 ΔERN measures (18 quantification pathways for each of the two baselines; see Table 1) were significantly correlated with GAD symptoms, thus indicating an overall more robust association with anxiety. It might be an interesting avenue for a meta-analysis to examine whether, in pediatric and adolescent populations, a stronger association of anxiety with ΔERN measures in contrast to simple ERN measures generalizes across studies and age ranges.

It is important to note that the two samples analyzed differed in several ways: Study 1 comprised 91 adults who either had diagnosed GAD or no diagnosed psychopathology, whereas Study 2 was a community sample of adolescent females. Thus, it is somewhat difficult to interpret differences between studies, which could be due to multiple factors (e.g., age, diagnostic status, statistical power). Nonetheless, results of the present studies collectively suggest that the most common ERP analytic choices have only limited impact on both the psychometric properties of the ERN and its association with anxiety. Specifically, effect sizes were generally similar and overlapping. Along the same lines, internal consistency reliability across all analytic strategies was acceptable to good. Several additional consistent findings emerged across samples. Although studies often use a pooling of electrode sites based on the notion that this will increase internal consistency of the ERP score, we found no evidence that pooled electrode sites had superior internal consistency compared to single-site measures of the ERN. Further, across both samples and study designs, almost all effects based on ΔERN measures and mastoid reference were significant, across all baseline periods and scoring approaches. This might be due to the fact that ΔERN measures control for basic response processing as well as possibly overlapping stimulus-locked activity. In particular, the regression-based ERN measure might be best suited to control for suppression effects induced by stimulus-related activity (Meyer et al., 2017). In addition, the mean amplitude ΔERNsubtract and, in adults, the peak-to-peak ERN (in itself a form of difference score and baseline-independent) had overall good internal consistency and were robust regarding between-subject effect sizes. Although the difference-based measures had somewhat lower internal consistency (while still within an acceptable range), it appears that a higher proportion of that reliable variance relates to other individual difference measures. Thus, despite concerns raised regarding the reliability of difference scores, ΔERN measures seem to better isolate the error-specific neural activity relevant to individual differences in anxiety.

There was also evidence that the baseline period impacts between-subject differences, especially in considering simple versus difference ERN measures. For example, in both Study 1 and 2, the response-proximal baseline was associated with larger effect sizes for simple ERN measures when using mastoid reference, whereas the opposite was true for ΔERN measures using mastoid reference. These data suggest that a response-proximal baseline might be preferable for simple ERN measures when using a mastoid reference, whereas an earlier baseline is preferable for difference measures. The choice of baseline might be specifically relevant to the ERN, since error and correct trials can differ well before response onset (see Figure 1). This is further complicated by the possibility that individual differences in stimulus-locked ERPs might overlap with and impact the ERN (Meyer et al., 2017; Riesel, Klawohn, Kathmann, & Endrass, 2017). The take-home suggestions for future research on ERN and anxiety would be to use mastoid reference, a difference-based ERN score, and an earlier baseline period (i.e., −500 to −300 ms).

We undertook these analyses to understand the concrete impact of common ERP analytic choices on both the psychometric properties of the ERN and its relationship with anxiety. In light of various reasonable choices available to ERP researchers, there is concern about the number of possible comparisons, p-hacking, and related concerns about replicability and false positive results (Baldwin, 2017; Luck & Gaspelin, 2017). These concerns could lead researchers to avoid analytic explorations of their data – an approach we took explicitly in the current study. Two things seem true: effect sizes varied numerically, and some analytic paths produced results that did not reach the threshold for statistical significance; however, internal reliability was uniformly high and between-subjects effect sizes were overlapping and generally consistent. It was not the case that only a single analytic path produced statistically significant results, and no outlying result emerged from any specific set of reasonable decisions. It is essential to ensure that methodological choices (i.e., quantification approaches, timeframes, etc.) are not made based on data, as this inflates α-error (Luck & Gaspelin, 2017); we would encourage researchers to specify analytic approaches a priori.

The current studies were limited in their focus on the ERN and anxiety. Therefore, similar analyses would be important in the context of other ERP measures and individual differences. As in the current study, we would only suggest conducting such analyses to understand the impact of methodological choices to guide future studies. Indeed, it might also be important for future studies to examine the impact of other methodological decisions, such as ocular correction methods.

Supplementary Material

Supplemental Material

Acknowledgments

Supported by grants from NIMH R01 MH097767 to GH, F31 MH091837 to AW and F31 MH102880 to AM. Part of the results were presented at the 2018 Annual Meeting of the Society for Psychophysiological Research. Approved by the Institutional Review Board at Florida State University (#2018.26010).

References

  1. Baldwin SA (2017). Improving the rigor of psychophysiology research. International Journal of Psychophysiology, 111, 5–16.
  2. Birmaher B, Khetarpal S, Brent D, Cully M, Balach L, Kaufman J, & Neer SM (1997). The Screen for Child Anxiety Related Emotional Disorders (SCARED): scale construction and psychometric characteristics. J Am Acad Child Adolesc Psychiatry, 36(4), 545–553. 10.1097/00004583-199704000-00018
  3. Boksem MA, Tops M, Kostermans E, & De Cremer D (2008). Sensitivity to punishment and reward omission: evidence from error-related ERP components. Biol Psychol, 79(2), 185–192. 10.1016/j.biopsycho.2008.04.010
  4. Chong LJ, & Meyer A (2018). Understanding the Link between Anxiety and a Neural Marker of Anxiety (The Error-Related Negativity) in 5 to 7 Year-Old Children. Developmental Neuropsychology, 1–17. 10.1080/87565641.2018.1528264
  5. Endrass T, Klawohn J, Schuster F, & Kathmann N (2008). Overactive performance monitoring in obsessive-compulsive disorder: ERP evidence from correct and erroneous reactions. Neuropsychologia, 46(7), 1877–1887. doi:S0028-3932(07)00434-4
  6. Endrass T, Riesel A, Kathmann N, & Buhlmann U (2014). Performance monitoring in obsessive-compulsive disorder and social anxiety disorder. Journal of Abnormal Psychology, 123(4), 705–714. 10.1037/abn0000012
  7. First MB, Spitzer RL, Gibbon M, & Williams JB (1995). Structured clinical interview for DSM-IV axis I disorders. New York: New York State Psychiatric Institute.
  8. Gehring WJ, Himle J, & Nisenson LG (2000). Action-monitoring dysfunction in obsessive-compulsive disorder. Psychological Science, 11(1), 1–6. 10.1111/1467-9280.00206
  9. Gratton G, Coles MG, & Donchin E (1983). A new method for off-line removal of ocular artifact. Electroencephalogr Clin Neurophysiol, 55(4), 468–484.
  10. Hajcak G, Klawohn J, & Meyer A (2019). The utility of event-related potentials (ERPs) in clinical psychology. Annu Rev Clin Psychol.
  11. Hajcak G, McDonald N, & Simons RF (2004). Error-related psychophysiology and negative affect. Brain and Cognition, 56(2), 189–197. 10.1016/j.bandc.2003.11.001
  12. Hajcak G, Weinberg A, MacNamara A, & Foti D (2012). ERPs and the study of emotion. The Oxford handbook of event-related potential components, 441, 474.
  13. Kaczkurkin AN (2013). The effect of manipulating task difficulty on error-related negativity in individuals with obsessive-compulsive symptoms. Biological Psychology, 93(1), 122–131. 10.1016/j.biopsycho.2013.01.001
  14. Kayser J, & Tenke CE (2015). On the benefits of using surface Laplacian (current source density) methodology in electrophysiology. International Journal of Psychophysiology, 97(3), 171–173. 10.1016/j.ijpsycho.2015.06.001
  15. Keil A, Debener S, Gratton G, Junghöfer M, Kappenman ES, Luck SJ, … Yee CM (2014). Committee report: publication guidelines and recommendations for studies using electroencephalography and magnetoencephalography. Psychophysiology, 51(1), 1–21.
  16. Klawohn J, Endrass T, Preuss J, Riesel A, & Kathmann N (2016). Modulation of Hyperactive Error Signals in Obsessive-Compulsive Disorder by Dual-Task Demands. Journal of Abnormal Psychology, 125(2), 292–298. 10.1037/abn0000134
  17. Larson M, Steffen P, & Primosch M (2013). The impact of a brief mindfulness meditation intervention on cognitive control and error-related performance monitoring. Frontiers in Human Neuroscience, 7(308). 10.3389/fnhum.2013.00308
  18. Luck SJ (2014). An introduction to the event-related potential technique. MIT Press.
  19. Luck SJ, & Gaspelin N (2017). How to get statistically significant effects in any ERP experiment (and why you shouldn’t). Psychophysiology, 54(1), 146–157. 10.1111/psyp.12639
  20. Meyer A (2017). A biomarker of anxiety in children and adolescents: A review focusing on the error-related negativity (ERN) and anxiety across development. Dev Cogn Neurosci, 27, 58–68. 10.1016/j.dcn.2017.08.001
  21. Meyer A, Carlton C, Crisler S, & Kallen A (2018). The development of the error-related negativity in large sample of adolescent females: Associations with anxiety symptoms. Biological Psychology, 138, 96–103.
  22. Meyer A, Hajcak G, Torpey-Newman DC, Kujawa A, & Klein DN (2015). Enhanced Error-Related Brain Activity in Children Predicts the Onset of Anxiety Disorders Between the Ages of 6 and 9. Journal of Abnormal Psychology, 124(2), 266–274. 10.1037/abn0000044
  23. Meyer A, Lerner MD, De Los Reyes A, Laird RD, & Hajcak G (2017). Considering ERP difference scores as individual difference measures: Issues with subtraction and alternative approaches. Psychophysiology, 54(1), 114–122. 10.1111/psyp.12664
  24. Moser JS, Moran TP, Schroder HS, Donnellan MB, & Yeung N (2013). On the relationship between anxiety and error monitoring: a meta-analysis and conceptual framework. Frontiers in Human Neuroscience, 7, 466. 10.3389/fnhum.2013.00466
  25. Nelson BD, Jackson F, Amir N, & Hajcak G (2017). Attention bias modification reduces neural correlates of response monitoring. Biological Psychology, 129, 103–110.
  26. Nieuwenhuis S, Nielen MM, Mol N, Hajcak G, & Veltman DJ (2005). Performance monitoring in obsessive-compulsive disorder. Psychiatry Research, 134(2), 111–122.
  27. Nunnally J, Bernstein I, & Berge J. t. (1967). Psychometric theory (Vol. 226): JSTOR.
  28. Olvet DM, & Hajcak G (2010). The effect of trial-to-trial feedback on the error-related negativity and its relationship with anxiety. (vol 9, pg 427, 2009). Cognitive Affective & Behavioral Neuroscience, 10(1), 157–157. 10.3758/Cabn.10.1.157
  29. Riesel A (2019). The erring brain: Error-related negativity as an endophenotype for OCD-A review and meta-analysis. Psychophysiology, 56(4), e13348. 10.1111/psyp.13348
  30. Riesel A, Klawohn J, Grutzmann R, Kaufmann C, Heinzel S, Bey K, … Kathmann N (2019). Error-related brain activity as a transdiagnostic endophenotype for obsessive-compulsive disorder, anxiety and substance use disorder. Psychol Med, 1–11. 10.1017/S0033291719000199
  31. Riesel A, Klawohn J, Kathmann N, & Endrass T (2017). Conflict monitoring and adaptation as reflected by N2 amplitude in obsessive-compulsive disorder. Psychol Med, 1–11. 10.1017/S0033291716003597
  32. Weinberg A, Dieterich R, & Riesel A (2015). Error-related brain activity in the age of RDoC: A review of the literature. Int J Psychophysiol. 10.1016/j.ijpsycho.2015.02.029
  33. Weinberg A, Klein DN, & Hajcak G (2012). Increased error-related brain activity distinguishes generalized anxiety disorder with and without comorbid major depressive disorder. J Abnorm Psychol, 121(4), 885–896. 10.1037/a0028270
  34. Weinberg A, Meyer A, Hale-Rude E, Perlman G, Kotov R, Klein DN, & Hajcak G (2016). Error-related negativity (ERN) and sustained threat: Conceptual framework and empirical evaluation in an adolescent sample. Psychophysiology, 53(3), 372–385. 10.1111/psyp.12538
  35. Weinberg A, Olvet DM, & Hajcak G (2010). Increased error-related brain activity in generalized anxiety disorder. Biological Psychology, 85(3), 472–480. 10.1016/j.biopsycho.2010.09.011
  36. Wuensch KL (2012). Using SPSS to obtain a confidence interval for Cohen’s d.
