Author manuscript; available in PMC: 2018 Apr 1.
Published in final edited form as: Int J Psychophysiol. 2017 Feb 2;114:24–30. doi: 10.1016/j.ijpsycho.2017.01.012

How Many Blinks are Necessary for a Reliable Startle Response? A Test Using the NPU-threat Task

Lynne Lieberman 1, Elizabeth S Stevens 1, Carter J Funkhouser 1, Anna Weinberg 2, Casey Sarapas 1, Ashley A Huggins 3, Stewart A Shankman 1
PMCID: PMC5344746  NIHMSID: NIHMS852388  PMID: 28163133

Abstract

Emotion-modulated startle is a frequently used method in affective science. Although there is a growing literature on the reliability of this measure, it is presently unclear how many startle responses are necessary to obtain a reliable signal. The present study therefore evaluated the reliability of startle responding as a function of number of startle responses (NoS) during a widely used threat-of-shock paradigm, the NPU-threat task, in a clinical (N = 205) and non-clinical (N = 92) sample. In the clinical sample, internal consistency was also examined independently for healthy controls vs. those with panic disorder and/or major depression and retest reliability was assessed as a function of NoS. Although results varied somewhat by diagnosis and for retest reliability, the overall pattern of results suggested that six startle responses per condition were necessary to obtain acceptable reliability in clinical and non-clinical samples during this threat-of-shock paradigm in the present study.

Keywords: anxiety-potentiated startle, emotion-modulated startle, eyeblink startle reflex, fear-potentiated startle, reliability

1. Introduction

Establishing the reliability of a measure is an essential first step towards establishing its validity (Cronbach, 1947; Cronbach & Meehl, 1955). Although this fact is well accepted in the development of self-report and interview measures, the psychometric properties of psychophysiological indices of psychological constructs have received less attention until recently (Hajcak & Patrick, 2015; Tomarken, 1995). This is particularly important given the increasingly prominent role of psychophysiological measures within psychology (and affective science more specifically; Schwartz, Lilienfeld, Meca, & Sauvigné, 2016; Shankman & Gorka, 2015). The present study therefore seeks to contribute to this burgeoning literature by examining the reliability of a widely used psychophysiological index of emotion – electromyography of the eyeblink startle reflex (EMG startle).

The startle reflex is particularly conducive to translational research on emotion because it is present across species and its magnitude is modulated by an organism’s emotional state. More specifically, the magnitude of the startle reflex is potentiated or blunted relative to baseline when an organism is in an aversive (e.g., fear) or appetitive (e.g., excitement) emotional state, respectively (Grillon & Ameli, 2001; Vrana, Spence, & Lang, 1988). Startle is also commonly used to examine emotional processing abnormalities that may contribute to the development and maintenance of psychopathology. For example, heightened aversive responding to particular threatening stimuli/situations has been implicated in the pathogenesis of several internalizing disorders (e.g., panic disorder and interoceptive cues; posttraumatic stress disorder and trauma-related cues; social anxiety disorder and social evaluation; Craske et al., 2009). Unpredictable threatening stimuli, however, are particularly aversive for anxious individuals. Panic disorder (PD), posttraumatic stress disorder, and social anxiety disorder have all been associated with heightened startle potentiation during the anticipation of unpredictable threat (Cornwell, Johnson, Berardi & Grillon, 2006; Grillon et al., 2009; Shankman et al., 2013). Thus, aberrant emotion-modulated startle, particularly during the anticipation of unpredictable threat, may represent a transdiagnostic marker for several internalizing disorders.

The literature on the psychometric properties of emotion-modulated startle has also grown in recent years. Investigations of the retest reliability of emotion-modulated startle elicited during an affective picture-viewing task have yielded mixed results, with some investigations finding strong retest reliability (Bradley, Gianaros, & Lang, 1995; Larson, Ruffalo, Nietert, & Davidson, 2000) and others finding weak retest reliability (Kaye, Bradford, & Curtin, 2016; Manber, Allen, Burton, & Kaszniak, 2000). Only two studies to date have examined retest reliability of emotion-modulated startle during the No threat-Predictable threat-Unpredictable threat task (NPU; Schmitz & Grillon, 2012), a startle paradigm that is widely used to differentiate startle potentiation to predictable threat (i.e., fear-potentiated startle) and unpredictable threat (i.e., anxiety-potentiated startle). Both studies reported retest correlations above .69 for anxiety-potentiated startle and fear-potentiated startle (Kaye et al., 2016; Shankman et al., 2013). Kaye and colleagues (2016) also reported acceptable internal consistency (i.e., Cronbach’s alphas > .70 [Nunnally, 1978]) for anxiety-potentiated startle and fear-potentiated startle during the NPU-threat task.

Despite growing focus in the field of psychology on exploring the reliability of emotion-modulated startle, there are several major gaps in the extant literature on the psychometric properties of this psychophysiological measure. For example, it is presently unknown how many startle responses are necessary to obtain a reliable index of startle potentiation scores during emotion-modulated startle paradigms. It is also presently unknown whether the number of startle responses (NoS) necessary for reliable condition averages (which are used to calculate startle potentiation scores) and potentiation scores differs for those with internalizing psychopathology relative to those without. This is a particularly important question to address given the abovementioned association between internalizing psychopathology and aberrant emotion-modulated startle.

Condition averages and potentiation scores calculated from a sufficient NoS should demonstrate acceptable internal consistency and strong retest reliability. Determining the minimum number of startle responses (NoS) necessary for reliable condition averages and potentiation scores would be highly beneficial for the design of future experimental protocols (at least with the NPU startle paradigm), which should be as brief as possible to reduce participant burden and the potential impact of startle habituation on task effects (Blumenthal et al., 2005). An empirically determined minimum NoS could also help experimenters determine when a participant has too few usable startle responses to be included in data analyses. This is critical given that certain trials may be excluded for some participants due to artifacts (e.g., excessive participant movement just before or after the presentation of a startle probe) and non-responses (i.e., failure to exhibit a discernable startle response) and some participants may withdraw from the study prior to study completion.

Several studies have examined the reliability of event-related potentials as a function of number of trials (e.g., Foti, Kotov & Hajcak, 2013; Moran, Jendrusina & Moser, 2013; Meyer, Riesel & Proudfit, 2013). However, only one study to our knowledge has examined this question with respect to EMG startle data. Our laboratory recently investigated the NoS necessary for adequate internal consistency (i.e., degree of interrelatedness or stability; Tavakol & Dennick, 2011) of average startle magnitude during each condition of the NPU-threat task (i.e., condition averages) in a non-clinical sample. Startle magnitude exhibited excellent internal consistency (Cronbach’s alpha > .80) for all NPU conditions with as few as three responses (Nelson, Hajcak, & Shankman, 2015). The present study will replicate our previous investigation by examining the internal consistency of condition averages during NPU as a function of NoS across two additional samples, one clinical and one non-clinical. We will also extend our previous investigation by examining (a) the internal consistency of potentiation scores (i.e., fear-potentiated startle and anxiety-potentiated startle) as a function of NoS; and (b) whether the NoS necessary for adequate consistency of condition averages and potentiation scores differs for those with an anxiety and/or depressive disorder. Lastly, we will conduct exploratory analyses to assess the NoS necessary for significant retest reliability of condition averages and potentiation scores in a subset of participants.

2. Methods

2.1. Participants

Data for the present study were collected as part of two investigations on emotional and cognitive processes. Details of the two studies are provided elsewhere (see Sarapas, Weinberg, Langenecker, & Shankman, 2017; Shankman et al., 2013). In brief, Study 1 (n = 92) was a non-clinical sample of undergraduates. Study 2 (n = 205) was a clinical sample recruited from the community to be in one of four groups: (1) no history of Axis I psychopathology (i.e., healthy controls; n = 82), (2) current major depressive disorder (MDD) and no lifetime history of any anxiety disorder (i.e., MDD-only group; n = 37), (3) current PD and no lifetime history of MDD (i.e., PD-only group; n = 28), and (4) current PD and MDD (i.e., comorbid PD and MDD group; n = 58). Diagnoses were made via the Structured Clinical Interview for DSM-IV (SCID; First, Spitzer, Gibbon, & Williams, 1996).

Exclusion criteria for both studies were a history of head trauma, left-handedness, and a lack of English fluency. Participants in Study 2 were additionally required to have no lifetime history of a psychotic disorder, bipolar disorder, or dementia. Participant demographics can be found in Table 1, along with clinical characteristics, such as self-reported anxiety and depressive symptomatology.

Table 1.

Sample Demographics and Clinical Characteristics

Characteristic | Clinical Sample | Non-Clinical Sample
Age | 32.93 (12.31) | 19.02 (1.38)
Gender (% Female) | 64.40 | 76.1
Ethnicity (% Caucasian) | 46.30 | 35.9
IDAS-Dysphoria | 22.26 (10.61) | 21.74 (81.90)
IDAS-Panic | 11.93 (5.34) | 11.78 (4.00)
IUS-12 | 28.22 (10.09) | 27.74 (8.67)

Note. IDAS = Inventory of Depression and Anxiety Symptoms (Watson et al., 2007); IUS-12 = Intolerance of Uncertainty Scale (Carleton, Norton & Asmundson, 2007).

2.2. Procedure and NPU-Threat Task

The full procedure for Studies 1 and 2 has been reported elsewhere (Sarapas et al., 2017; Shankman et al., 2013). In brief, after providing informed consent, all participants completed the NPU-threat task. For Study 2, 34 participants returned to the laboratory 5–17 (M = 9.46, SD = 3.71) days after their initial visit to complete NPU a second time. Of these 34 individuals, 7 had MDD-only, 5 had PD-only, 10 had comorbid PD and MDD, and 12 were healthy controls. All procedures were approved by the local Institutional Review Board.

The NPU-threat task was designed to assess responses to predictable and unpredictable threats (Schmitz & Grillon, 2012). In brief, prior to the task, shock electrodes were placed on each participant’s left wrist and a shock work-up procedure was completed to identify the level of shock intensity each participant described as “highly annoying but not painful” (between 1 and 5 mA). Participants also completed a 2-min startle habituation task prior to the NPU-threat task to reduce early, exaggerated startle potentiation.

The NPU-threat task included three within-subjects conditions: no shock (N), predictable shock (P), and unpredictable shock (U). Text at the bottom of the computer monitor informed participants of the current threat condition, and each condition lasted for 90 s. In Study 1, a 6-s countdown was displayed five times within each condition, and in Study 2, an 8-s geometric cue (blue circle for N, red square for P, and green star for U) was presented four times within each condition. Interstimulus intervals ranged from 7 to 17 s, during which only the text describing the condition was on the screen (i.e., ISI conditions).

During N, no shocks were delivered. During P, Study 1 participants only received a shock when the countdown reached 1 and Study 2 participants only received a shock when the cue (red square) was on the screen (i.e., the shock was predicted by the countdown or cue in Studies 1 and 2, respectively). In the U condition, shocks were administered at any time (i.e., during the cue or countdown [hereafter: cue] or ISI). Study 1 participants received 20 shocks (10 each during P and U) and 48 startle probes (16 each during N, P, and U). Study 2 participants received 12 shocks (6 during P and 6 during U) and 72 startle probes (24 each during N, P, and U). Study 2’s NPU was divided into two recording blocks, separated by a rest period.

Stimuli (i.e., shocks, white noise) were administered using PSYLAB (Contact Precision Instruments, London, UK) hardware and software. Psychophysiological data were acquired using Neuroscan 4.4 (Compumedics, Charlotte, NC). Acoustic startle probes were 40-ms, 103-dB bursts of white noise presented binaurally through headphones. Electric shocks were 400 ms in duration. Consistent with published guidelines (Blumenthal et al., 2005), EMG startle was recorded from two 4-mm Ag/AgCl electrodes placed over the orbicularis oculi muscle below the right eye, and the ground electrode was at the frontal pole (AFZ). Data were collected using a bandpass filter of DC to 200 Hz at a sampling rate of 1,000 Hz.

Startle blinks were scored according to published guidelines (Blumenthal et al., 2005). Data processing included applying a 28 Hz high-pass filter, rectifying, and then smoothing using a 40 Hz low-pass filter. Blink response was defined as the peak amplitude of EMG activity within the 20–150 ms period following startle probe onset relative to baseline. Baseline was defined as the average EMG level during the 50 ms preceding startle probe onset. Each peak was identified by software but examined by hand to ensure acceptability. Blinks were scored as nonresponses if EMG activity during the 20–150 ms poststimulus time frame did not produce a blink peak that was visually differentiated from baseline activity. Blinks that were scored as nonresponses were included as zeros. Blinks were scored as missing if the baseline period was contaminated with noise or movement artifact, or if a spontaneous or voluntary blink began before minimal onset latency and thus interfered with the startle probe-elicited blink response.
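The scoring rule described above can be illustrated with a minimal sketch. It assumes the EMG epoch has already been filtered, rectified, and smoothed, and that the nonresponse decision (made by visual inspection in the present study) is supplied as a flag. The function names `score_blink` and `quantify` are hypothetical and are not taken from the scoring software used here.

```python
from statistics import mean


def score_blink(emg, probe_idx, fs=1000):
    """Peak EMG in the 20-150 ms post-probe window minus the mean of the
    50 ms pre-probe baseline.

    `emg` is assumed to be already filtered, rectified, and smoothed;
    `probe_idx` is the sample index of probe onset; `fs` is in Hz.
    """
    samples_per_ms = fs / 1000.0
    base_start = probe_idx - round(50 * samples_per_ms)
    win_start = probe_idx + round(20 * samples_per_ms)
    win_end = probe_idx + round(150 * samples_per_ms)  # inclusive
    if base_start < 0 or win_end >= len(emg):
        raise ValueError("epoch does not cover baseline and response window")
    baseline = mean(emg[base_start:probe_idx])
    peak = max(emg[win_start:win_end + 1])
    return peak - baseline


def quantify(score, is_nonresponse):
    """Return (magnitude, amplitude) for one trial.

    Magnitude keeps nonresponses as zeros; amplitude treats them as
    missing (None). The nonresponse flag is an input, not computed here.
    """
    if is_nonresponse:
        return 0.0, None
    return score, score
```

Because the sampling rate in the present study was 1,000 Hz, one sample corresponds to one millisecond, so the window boundaries map directly onto sample indices.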

2.3. Data Analysis Plan

Reliability was examined separately for Studies 1 and 2. Reliability was also examined separately for startle amplitude (nonresponses scored as missing values) and magnitude (nonresponses scored as zeros). Cronbach’s alpha was used to index internal consistency (Santos, 1999). We first examined Cronbach’s alpha as a function of the NoS entered into the averages for each condition (NCue, PCue, UCue, NISI, PISI, and UISI) with a maximum of 8 (Study 1) and 12 (Study 2) probes per condition. Condition averages were derived from raw microvolt values. For each NoS (NoS = 2, NoS = 3, etc.), startle probes were selected in the order in which they occurred (i.e., sequentially).1 Given that, as mentioned above, some startle responses were scored as missing during EMG data processing, it is important to note that the available sample size of participants for all reliability analyses decreased as the NoS increased. The median number of probes that elicited missing responses was 2 (out of 48) for Study 1 and 4 (out of 72) for Study 2, and the median number of nonresponses was 1 in each sample (see Tables 2 and 3).2 Also of note is that no case analyses were conducted, and no model outliers were removed. That is, all participants who completed the NPU-threat task in each study were included in the analyses.

Table 2.

Sample size at each NoS for the Non-Clinical Sample’s Cronbach’s Alpha Analyses of Magnitude Condition Averages

NoS NISI NCue PISI PCue UISI UCue
2 80 81 72 80 80 83
3 77 73 66 72 74 79
4 74 71 60 68 69 76
5 72 66 56 67 66 73
6 68 64 53 64 63 67
7 66 61 50 61 63 63
8 64 58 50 60 58 61

Note. NoS = Number of startle responses; N = No shock condition; P = Predictable shock condition; U = Unpredictable shock condition; ISI = Inter-stimulus interval

Table 3.

Sample size at each NoS for the Clinical Sample’s Cronbach’s Alpha Analyses of Magnitude Condition Averages

NoS NISI NCue PISI PCue UISI UCue
2 178 172 187 193 187 187
3 169 159 183 188 175 166
4 152 139 173 180 168 156
5 135 128 156 166 159 151
6 130 119 147 155 145 143
7 122 109 142 147 139 137
8 116 105 135 141 133 134
9 110 102 129 136 129 126
10 102 99 122 130 126 123
11 100 94 118 119 115 118
12 97 88 111 109 108 112

Note. NoS = Number of startle responses; N = No shock condition; P = Predictable shock condition; U = Unpredictable shock condition; ISI = Inter-stimulus interval
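As a rough illustration of how Cronbach’s alpha can be tracked as a function of NoS, the sketch below treats the first k probes of a condition as the k "items" and drops any participant with a missing value among those k probes. That listwise rule is an assumption; it is one way to reproduce the shrinking sample sizes in Tables 2 and 3, but the exact handling used in the present study is not specified. Function names are hypothetical.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for complete data.

    `rows` holds one list per participant with one value per probe:
    alpha = k / (k - 1) * (1 - sum(item variances) / variance(totals)).
    """
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least 2 probes and 2 participants")
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    total_var = variance([sum(r) for r in rows])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def alpha_by_nos(data, max_nos):
    """Map each NoS (2..max_nos) to (alpha, n participants used).

    `data` holds one ordered list of responses per participant for a single
    condition, with None marking a missing response. Probes are taken
    sequentially; participants missing any of the first NoS probes are
    excluded at that NoS (assumed listwise rule).
    """
    results = {}
    for nos in range(2, max_nos + 1):
        rows = [p[:nos] for p in data
                if len(p) >= nos and all(v is not None for v in p[:nos])]
        if len(rows) >= 2:
            results[nos] = (cronbach_alpha(rows), len(rows))
    return results
```

For magnitude analyses, nonresponses would enter `data` as zeros; for amplitude analyses, they would enter as None and reduce the available sample in the same way as artifact-contaminated trials.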

Internal consistency analyses were conducted separately for each diagnostic group for Study 2 (i.e., healthy controls, PD-only, MDD-only, and comorbid MDD/PD). Cronbach’s alpha was defined as ‘acceptable’ when equal to or greater than .70 (Nunnally, 1978). Split-half reliability analyses were conducted to examine the internal consistency of potentiation scores as a function of NoS. To do so, averages of odd-numbered trials and even-numbered trials were first separately calculated as a function of NoS (e.g., the average of startle responses one and three; the average of startle responses two and four, etc.). Spearman-Brown corrected coefficients were then calculated to assess the relation between odd and even trials (see Kappenman et al., 2014 and Kaye et al., 2016 for a similar approach). Consistent with the literature, Spearman-Brown coefficients were interpreted as acceptable if greater than .50 (Kaye et al., 2016).
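The odd/even split-half procedure can be sketched as follows. The sketch assumes that each half’s potentiation score is the threat-condition average minus the no-shock average computed within that half, and that the first NoS responses contain no missing values; both are simplifying assumptions rather than details confirmed by the text. The Spearman-Brown correction itself is the standard formula r_SB = 2r / (1 + r). Function names are hypothetical.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_brown(r):
    """Step the half-test correlation up to full-test length."""
    return 2 * r / (1 + r)


def split_half_potentiation(threat, neutral, nos):
    """Spearman-Brown corrected odd/even reliability of a potentiation score.

    `threat` and `neutral` each hold one ordered list of responses per
    participant (e.g., UCue and NCue). Only the first `nos` responses are
    used; odd-numbered trials are indices 0, 2, ... and even-numbered
    trials are indices 1, 3, ...
    """
    if nos < 2:
        raise ValueError("need at least two responses to form two halves")
    odd, even = [], []
    for t, n in zip(threat, neutral):
        t, n = t[:nos], n[:nos]
        odd.append(mean(t[0::2]) - mean(n[0::2]))
        even.append(mean(t[1::2]) - mean(n[1::2]))
    return spearman_brown(pearson_r(odd, even))
```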

For Study 2, retest reliabilities were tested as a function of NoS for: (1) average startle in each of the six NPU conditions, (2) startle potentiation to the unpredictable threat (average UCue minus average NCue and average UISI minus average NISI), and (3) startle potentiation to the predictable threat (average PCue minus average NCue). Pearson’s r was used to assess retest reliability.
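The three potentiation scores and their retest correlation can be expressed compactly. In this sketch each visit is represented as a dictionary of ordered responses per condition, and the first NoS responses are assumed to be free of missing values; the data layout and function names are illustrative assumptions only.

```python
from statistics import mean


def potentiation_scores(visit, nos):
    """Potentiation scores from the first `nos` responses per condition.

    `visit` maps condition names (NCue, NISI, PCue, UCue, UISI) to ordered
    lists of responses for one participant at one visit.
    """
    m = {c: mean(visit[c][:nos])
         for c in ("NCue", "NISI", "PCue", "UCue", "UISI")}
    return {
        "PCue-NCue": m["PCue"] - m["NCue"],   # predictable threat
        "UCue-NCue": m["UCue"] - m["NCue"],   # unpredictable threat (cue)
        "UISI-NISI": m["UISI"] - m["NISI"],   # unpredictable threat (ISI)
    }


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def retest_r(visit1, visit2, score, nos):
    """Correlate one potentiation score across two visits.

    `visit1` and `visit2` are lists of per-participant dictionaries in the
    same participant order.
    """
    x = [potentiation_scores(p, nos)[score] for p in visit1]
    y = [potentiation_scores(p, nos)[score] for p in visit2]
    return pearson_r(x, y)
```

Calling `retest_r` repeatedly for NoS = 2, 3, and so on yields the kind of curve summarized in the retest analyses.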

3. Results

3.1. Internal Consistency in the Non-Clinical Sample (Study 1)

At only two responses (NoS=2), Cronbach’s alphas for average startle magnitude ranged from .70–.83 for all conditions (see Figure 1A). For average startle amplitude with two responses, Cronbach’s alphas were comparable, ranging from .79–.86 for all conditions except PCue (.68). Cronbach’s alpha for amplitude during PCue reached an acceptable level of .75 at three responses. For magnitude and amplitude potentiation scores, Spearman-Brown Coefficients reached an acceptable level across all conditions at just two responses total (range of rs = .73–.86 and rs = .71–.86, respectively, p < .05 [see Figure 1B]).

Figure 1.

Note. Internal consistency, as indexed by Cronbach’s alpha, of startle magnitude as a function of number of responses during each condition of the NPU-threat task in the (A) non-clinical, and (C) clinical sample (across all diagnostic groups). Split-half correlations as a function of number of responses for potentiation scores in the (B) non-clinical, and (D) clinical sample (across all diagnostic groups). Error bars represent a 95% confidence interval.

3.2. Internal Consistency in the Clinical Sample (Study 2)

Across all four groups, at two responses, Cronbach’s alphas for startle magnitude and amplitude ranged from .85–.90 across all six conditions (see Figure 1C). Similarly, for magnitude and amplitude potentiation scores, split-half correlations reached an acceptable level across all conditions at just two responses (range of rs = .85–.86 for magnitude and amplitude, p < .05 [see Figure 1D]).

The number of responses necessary to reach acceptable Cronbach’s alpha levels across all conditions was comparable across diagnostic groups. In healthy controls, alphas across all conditions ranged from .86–.90 for magnitude (see Figure 2A) and .85–.90 for amplitude at NoS = 2. In the MDD-only group, alphas across all conditions ranged from .78–.92 for magnitude (see Figure 2B) and .77–.91 for amplitude at NoS = 2. For startle amplitude in the PD-only group, alphas ranged from .80–.90 across all conditions except NISI at NoS = 2. Likewise, for startle magnitude in the PD-only group, alphas ranged from .82–.90 across all conditions except NISI at NoS = 2. Alpha for magnitude and amplitude during NISI reached an acceptable level of .81 at NoS = 3 (see Figure 2C). Lastly, in the comorbid MDD/PD group, alphas across all conditions ranged from .82–.94 for magnitude (see Figure 2D) and .83–.93 for amplitude at NoS = 2.

Figure 2.

Note. Internal consistency, as indexed by Cronbach’s alpha, of startle magnitude as a function of number of responses during each condition of the NPU-threat task in the clinical sample among individuals with (A) no history of psychopathology, (B) MDD-only, (C) PD-only and (D) comorbid PD/MDD. Error bars represent a 95% confidence interval.

Given that alpha values for magnitude and amplitude were acceptable for all conditions across all diagnostic groups at NoS = 3, exploratory follow-up analyses were conducted to examine whether Cronbach’s alpha values significantly differed between those with a diagnosis of PD and/or MDD relative to healthy controls. To compare internal consistency estimates at this NoS between individuals with and without a diagnosis, Cronbach’s alpha values at NoS = 3 were calculated for individuals with any diagnosis (i.e., collapsing across individuals with PD-only, MDD-only, or comorbid PD/MDD). We then conducted a series of pairwise comparisons using a dependent-alpha calculator developed by Abd-El-Fattah & Hassan (2011) to statistically compare Cronbach’s alpha at NoS = 3 for individuals with any diagnosis, relative to healthy controls, for the key threat conditions of the NPU-threat task: PCue, UCue, and UISI. These comparisons revealed no significant differences between Cronbach’s alpha values at NoS = 3 for individuals with a diagnosis, relative to those without.

3.3. Retest Reliability (Study 2)

For all conditions except NCue and PISI, there was a significant positive retest correlation for startle magnitude across the two visits with as few as NoS = 2 (range of rs = .38–.71, ps < .05, see Figure 3A). Retest correlations for startle magnitude reached significance for NCue and PISI at NoS = 3 (rs = .28 and .31, respectively, p < .05). Similarly, for all conditions except NCue and PISI, there was a positive retest correlation for startle amplitude with as few as NoS = 2 (range of rs across conditions at three responses = .44–.78, ps < .05). Retest correlations for NCue startle amplitude reached significance at NoS = 5 (r = .39, p < .05), and PISI at NoS = 3 (r = .35, p < .05).

Figure 3.

Note. Retest reliability in the clinical sample, as indexed by Pearson’s r, of average startle magnitude during (A) each condition of the NPU-threat task, as well as (B) startle magnitude potentiation to predictable and unpredictable threats (PCue – NCue, UCue – NCue, UISI – NISI). Error bars represent a 95% confidence interval.

Startle potentiation to unpredictable threat during visit one positively predicted startle potentiation during visit two with as few as two startle responses for magnitude (rs for UCue and UISI at two responses = .61 and .49, respectively, p < .05) and amplitude (rs for UCue and UISI at two responses = .59 and .56, respectively, p < .05). Retest reliability for PCue reached significance at NoS = 6 for amplitude (r = .38, p < .05) and magnitude (r = .36, p < .05 [Figure 3B]).

4. Discussion

EMG of emotion-modulated startle is a commonly used index of emotional processing and startle potentiation to threat has been used as a measure of heightened negative emotional responding to threatening stimuli/situations in various anxiety disorders (Cornwell et al., 2006; Grillon et al., 2009). Given the potential for emotion-modulated startle to serve as a transdiagnostic marker of multiple internalizing conditions, there is a growing literature on the psychometric properties of this psychophysiological measure. This is the first study, however, to examine the reliability of EMG startle as a function of number of startle responses during each condition of the NPU-threat task, a widely used threat of shock paradigm, in two samples – one clinical and one non-clinical. In the clinical sample, we also explored retest reliability in a smaller subset of subjects as a function of number of startle responses for: (1) NPU condition averages, (2) anxiety-potentiated startle to unpredictable threat (UISI/UCue), and (3) fear-potentiated startle to predictable threat (PCue).

In the non-clinical sample, two responses were necessary for magnitude and three responses for amplitude condition averages to reach acceptable internal consistency (alpha ≥ .70) across all conditions. This pattern of results is similar to our laboratory’s previous finding that as few as two responses were necessary for magnitude to reach acceptable internal consistency across all NPU conditions in a non-clinical sample (Nelson et al., 2015). In the clinical sample, just two startle responses were necessary for condition averages (for magnitude and amplitude) to reach acceptable internal consistency across all conditions. Importantly, the internal consistency results for condition averages were similar across the MDD-only, PD-only, comorbid MDD/PD, and healthy control groups, suggesting that internalizing psychopathology did not negatively impact reliability. Internal consistency of startle potentiation to threat, a commonly used index of negative emotional responding, was comparable to that of condition averages. More specifically, split-half correlations for magnitude and amplitude startle potentiation scores reached an acceptable level across all threat conditions in the non-clinical and clinical samples at just two responses total.

Of note is that the NoS necessary for significant retest reliability of average startle and potentiation scores differed between task conditions. All condition averages exhibited significant retest reliability at just two responses except for PISI and NCue. For PISI and NCue to exhibit significant retest reliability for amplitude and magnitude, five responses were necessary. As safety conditions in a threatening task, PISI and NCue may elicit greater variability and inconsistency in startle responding within a given task administration than do clearly threatening conditions (Lissek, Pine & Grillon, 2006). Retest reliability reached significance at just two responses for amplitude and magnitude potentiation to UCue and UISI. However, retest reliability did not reach significance for PCue until NoS = 6, suggesting that startle potentiation to predictable threat may be somewhat more variable than to unpredictable threat.

It is noteworthy that reactivity to unpredictable threat may be more reliable than reactivity to predictable threat, as the literature on the relation between startle potentiation to predictable threat and anxiety psychopathology is less consistent (e.g., Shankman et al., 2013; Grillon et al., 2008) than the literature on the relation between startle potentiation to unpredictable threat and anxiety psychopathology (e.g., Gorka et al., 2017; Lieberman et al., 2016; Shankman et al., 2013). That is, mixed findings on the relation between anxiety psychopathology and reactivity to predictable threat may be in part due to the poorer reliability of startle potentiation during the anticipation of predictable threat. It is also noteworthy that a higher NoS was necessary for significant retest reliability of PCue relative to the NoS necessary for acceptable internal consistency of PCue. This suggests that researchers may need to obtain a greater number of startle responses for temporal stability of startle potentiation to predictable threat, whereas fewer responses may be necessary for internal consistency of startle condition averages during PCue. Relatedly, researchers may place a greater emphasis on the results from retest analyses when designing a study that aims to obtain a temporally stable index of startle. Temporally stable indices of startle may be particularly relevant in clinical research, which may use startle responding as a predictor of risk for psychopathology or response to treatment for psychopathology.

In interpreting retest reliability results, however, it is important to consider several factors. First, these were exploratory analyses conducted in a smaller sample (n = 34). Second, although retest correlations reached statistical significance for the majority of conditions at just two responses, the coefficients were moderate at this NoS. Retest correlations increased in magnitude as the NoS increased. This pattern of results suggests that the retest reliability of startle condition averages and potentiation scores is improved by a greater NoS.

In sum, investigators may only need six startle responses per condition in non-clinical and clinical samples to obtain reliable and stable indices of average startle amplitude or magnitude in each condition of NPU, as well as of anxiety-potentiated and fear-potentiated startle during NPU. It is worth noting that potentiation scores (rather than startle during the individual conditions) are often the metric of interest in the NPU-threat task and other emotion-modulated startle paradigms. Given this, it is encouraging for psychophysiological researchers that so few startle responses were necessary for reliable potentiation scores and for the condition averages that are used to calculate those potentiation scores. As mentioned above, compared to self-report and interview measures of psychological variables, the psychometrics of psychophysiological tasks are often ignored, but this pattern has begun to change. For example, there have been recent investigations on how best to quantify startle potentiation or change within a paradigm (Bradford, Starr, Shackman & Curtin, 2015). Moreover, Kaye et al. (2016) investigated the internal consistency of startle condition averages and potentiation scores. Results from the present study are consistent with those reported by Kaye et al. (2016), such that startle during the NPU-threat task exhibited acceptable internal consistency and temporal stability. Furthermore, NoS analyses reported here suggest that the significant retest reliability reported by Shankman et al. (2013) in this same clinical sample could have been obtained with half as many startle responses. There have also been recent investigations to determine the number of events necessary to obtain reliable ERP averages (Foti, Kotov & Hajcak, 2013; Moran, Jendrusina & Moser, 2013; Meyer, Riesel & Proudfit, 2013). ERP investigations of this nature have yielded results similar to those of the present study, such that a minimum of seven and eight responses have been suggested to obtain a reliable index of the late positive potential and error-related negativity, respectively. This exploratory study therefore adds to this growing methodological literature, and provides an empirically determined guideline to consider when developing a task to assess emotion-modulated startle (or at least with the NPU paradigm).

Given that startle probes are naturally aversive and participant startle responses tend to habituate over the course of a task (Blumenthal et al., 2005; Campbell et al., 2014), it is important that researchers design their startle tasks to be as brief as possible to decrease participant burden and increase the quality of the psychophysiological data collected. Although data from the present study suggest that a minimum of six responses per condition may be sufficient to obtain reliable and stable indices of startle during NPU, it is important to note that several responses were excluded from analyses after data collection due to artifacts or non-responses. For the non-clinical sample in the present study, a median of two responses was scored as missing and one as non-response (out of 48 responses across six conditions). For the clinical sample, a median of four responses was scored as missing and one as non-response (out of 72 responses across six conditions). Taken together, these data suggest that approximately 6–7% of startle responses may need to be excluded from data analyses due to artifacts (which typically occur at random throughout a task) or non-responses. It may therefore be necessary to increase the size of one’s task by this percentage so as to improve the likelihood that six responses are available for analyses.
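The arithmetic behind the 6–7% figure, and one way to translate it into a probe count, can be sketched as follows. Dividing the target by the expected retention rate and rounding up is an assumed planning rule, not a procedure stated in the text, and it addresses only the expected loss; it does not guarantee six usable responses for every participant. The function name `probes_needed` is hypothetical.

```python
import math


def probes_needed(min_usable, exclusion_rate):
    """Probes to schedule per condition so that, on average, `min_usable`
    remain after a fraction `exclusion_rate` of responses is lost."""
    if not 0 <= exclusion_rate < 1:
        raise ValueError("exclusion_rate must be in [0, 1)")
    return math.ceil(min_usable / (1 - exclusion_rate))


# Median missing + non-response counts reported for each sample.
rate_study1 = (2 + 1) / 48   # 0.0625, i.e., about 6%
rate_study2 = (4 + 1) / 72   # about 0.069, i.e., about 7%

plan_study1 = probes_needed(6, rate_study1)
plan_study2 = probes_needed(6, rate_study2)
```

Under either rate, the rule rounds up to seven probes per condition, one more than the six-response minimum.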

Although the overarching goal of this study was to provide an empirically determined guideline to inform the development of startle tasks, a second and related goal was to inform data pre-processing and analytic procedures for emotion-modulated startle paradigms. For example, if some participants have multiple unusable trials due to randomly occurring artifacts, researchers may choose to include those participants in analyses so long as there are still six usable trials per condition. Researchers should, however, consider their sample size when determining whether subjects with noisier EMG data should be excluded from analyses. When sample sizes are small, researchers may choose to include subjects with fewer than six usable trials per condition in order to improve the signal-to-noise ratio of the data. Ultimately, research of this nature can also inform the selection of artifact-rejection procedures that strike an appropriate balance to maximize signal-to-noise ratio. Two important caveats to the abovementioned guideline (i.e., the minimum NoS per condition = six) should be noted. First, this guideline may only generalize to studies that utilize the NPU-threat task (Schmitz & Grillon, 2012). That is, a different NoS may be (and likely will be) necessary to obtain reliable signals for other emotion-modulated startle paradigms (e.g., affective picture viewing [e.g., Lang, Bradley & Cuthbert, 1997], or fear conditioning [Duits et al., 2015]). Second, given that the present study’s clinical sample only included individuals with select internalizing disorders (i.e., MDD and/or PD), the suggested minimum NoS may not apply to individuals with other types of psychopathologies, such as externalizing or psychotic disorders.
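As one way to apply the inclusion rule described above, the following minimal sketch flags participants who retain at least six usable trials in every condition. The data layout (a dictionary of conditions, with `None` marking a trial rejected as an artifact or non-response), the condition labels, and the function names are all assumptions made for illustration; they do not describe the procedures used in the present study.

```python
MIN_USABLE = 6  # guideline suggested by the present study for the NPU-threat task


def usable_counts(trials):
    """Count usable trials per condition; None marks an unusable trial."""
    return {cond: sum(v is not None for v in vals) for cond, vals in trials.items()}


def meets_guideline(trials, min_usable=MIN_USABLE):
    """True if every condition retains at least `min_usable` usable trials."""
    return all(n >= min_usable for n in usable_counts(trials).values())


# Hypothetical participants with two illustrative conditions of eight probes each.
participants = {
    "p01": {"cond_a": [52, 48, None, 61, 55, 47, 50, 58],
            "cond_b": [40, None, None, 44, 39, 41, 43, 45]},
    "p02": {"cond_a": [30, None, None, None, 28, 33, 31, 29],
            "cond_b": [35, 36, 34, 38, 37, 33, 32, 36]},
}

included = [pid for pid, trials in participants.items() if meets_guideline(trials)]
print(included)
```

In this example `p02` is flagged because one condition retains only five usable trials. As noted above, whether such a participant is ultimately excluded is a judgment that should also take the overall sample size into account.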

There are also several limitations to the present study that should be noted. First, the two samples had slightly different NPU-threat tasks (e.g., countdowns vs. geometric shapes for cues), although the overall recommended NoS was quite comparable across samples. Second, the sample size for retest reliability analyses was too small to evaluate whether retest reliability differed by diagnosis. Third, although analyses were also conducted with startle responses added in a random order (see Footnote 1), startle responses were only randomized once for this purpose. Thus, future studies should examine whether results change as a function of repeated random sampling. Additionally, further studies should examine whether a similar NoS is necessary to obtain a reliable index of baseline startle magnitude. However, this study benefited from several strengths, including the assessment of the reliability of startle across two samples, one of which included individuals with diagnosed internalizing psychopathology. Additionally, the reliability of both startle magnitude and startle amplitude was examined, which is important given that these two methods of startle quantification are each frequently used in research.
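The repeated random sampling suggested above could be approached along the following lines. This is a hedged sketch rather than the procedure used in the present study: it assumes Cronbach's alpha as the internal-consistency estimate, draws probes at random from each participant's non-missing responses in the manner described in Footnote 1, and repeats the draw many times so that the spread of estimates at a given NoS can be inspected. The simulated data and all function names are hypothetical.

```python
import random
import statistics


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of participants, each a list of k scores."""
    k = len(rows[0])
    item_vars = [statistics.variance(col) for col in zip(*rows)]
    total_var = statistics.variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def alpha_over_draws(data, nos, n_draws, seed=0):
    """Repeat the random selection of `nos` non-missing probes per participant
    `n_draws` times and return the alpha estimate from each draw."""
    rng = random.Random(seed)
    alphas = []
    for _ in range(n_draws):
        rows = []
        for responses in data:
            usable = [x for x in responses if x is not None]
            if len(usable) >= nos:  # skip participants with too few probes
                rows.append(rng.sample(usable, nos))
        alphas.append(cronbach_alpha(rows))
    return alphas


# Simulated data: 30 participants, 12 probes in one condition, where each
# response is a stable person-level value plus trial-level noise.
sim = random.Random(1)
data = []
for _ in range(30):
    person_level = sim.gauss(50, 15)
    data.append([person_level + sim.gauss(0, 10) for _ in range(12)])

alphas = alpha_over_draws(data, nos=6, n_draws=200)
print(f"alpha at NoS = 6: median {statistics.median(alphas):.2f}, "
      f"range {min(alphas):.2f} to {max(alphas):.2f}")
```

A narrow range across draws would indicate that a recommended NoS does not depend on which particular probes happened to be selected, whereas a wide range would suggest that a single randomization is not enough to settle the question.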

5. Conclusions

Results from the present study provide information that may help researchers obtain psychometrically sound indices of emotional processing using the eyeblink startle reflex. In particular, our findings suggest that a minimum of six responses may be sufficient for obtaining a reliable and stable index of emotion-modulated startle (i.e., anxiety-potentiated and fear-potentiated startle) during the NPU-threat task in non-clinical and clinical samples. Although this guideline may apply to other emotion-modulated paradigms, future studies should test this directly.

Highlights

  • An investigation of the number of startle responses needed for a reliable signal

  • Six startle responses were needed for acceptable reliability

  • This number did not differ for those with internalizing psychopathology

  • This guideline can inform task development and artifact rejection procedures

Acknowledgments

Funding: This study was supported by grants from the National Institute of Mental Health (S.S., grant numbers R01 MH098093 and R21 MH080689).

Footnotes

1

The pattern of results was comparable when internal consistency analyses were conducted by adding startle responses to reliability estimates in a random order. For this method, at each NoS (NoS = 2, NoS = 3, etc.), startle probes were randomly selected from all possible non-missing startle probes. For example, for NoS = 3, if a participant in study two had all 12 non-missing startle probes for a condition, 3 of the 12 were randomly selected for the analyses.

2

The median is more appropriate than the mean in this context as ‘number of missings’ and ‘number of nonresponses’ were highly skewed (i.e., the vast majority of probes elicited startle responses).

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

References

  1. Abd-El-Fattah SM, Hassan HK. Dependent-Alpha Calculator: Testing the Differences between Dependent Coefficients Alpha. Journal of Applied Quantitative Methods. 2011;6:59–61.
  2. Blumenthal TD, Cuthbert BN, Filion DL, Hackley S, Lipp OV, van Boxtel A. Committee report: Guidelines for human startle eyeblink electromyographic studies. Psychophysiology. 2005;42(1):1–15. doi: 10.1111/j.1469-8986.2005.00271.x.
  3. Bonett DG. Sample Size Requirements for Testing and Estimating Coefficient Alpha. Journal of Educational and Behavioral Statistics. 2002;27(4):335–340. doi: 10.3102/10769986027004335.
  4. Bradford DE, Starr MJ, Shackman AJ, Curtin JJ. Empirically based comparisons of the reliability and validity of common quantification approaches for eyeblink startle potentiation in humans. Psychophysiology. 2015;52(12):1669–1681. doi: 10.1111/psyp.12545.
  5. Bradley MM, Gianaros P, Lang PJ. As time goes by: Stability of affective startle modulation. Psychophysiology. 1995;32:21.
  6. Campbell ML, Gorka SM, McGowan SK, Nelson BD, Sarapas C, Katz AC, … Shankman SA. Does anxiety sensitivity correlate with startle habituation? An examination in two independent samples. Cognition & Emotion. 2014;28(1):46–58. doi: 10.1080/02699931.2013.799062.
  7. Carleton RN, Norton MPJ, Asmundson GJ. Fearing the unknown: A short version of the Intolerance of Uncertainty Scale. Journal of Anxiety Disorders. 2007;21(1):105–117. doi: 10.1016/j.janxdis.2006.03.014.
  8. Cornwell BR, Johnson L, Berardi L, Grillon C. Anticipation of public speaking in virtual reality reveals a relationship between trait social anxiety and startle reactivity. Biological Psychiatry. 2006;59(7):664–666. doi: 10.1016/j.biopsych.2005.09.015.
  9. Craske MG, Rauch SL, Ursano R, Prenoveau J, Pine DS, Zinbarg RE. What is anxiety disorder? Depression and Anxiety. 2009;26(12):1066–1085. doi: 10.1002/da.20633.
  10. Cronbach LJ. Test “reliability”: Its meaning and determination. Psychometrika. 1947;12(1):1–16. doi: 10.1007/BF02289289.
  11. Cronbach LJ, Meehl PE. Construct validity in psychological tests. Psychological Bulletin. 1955;52(4):281–302. doi: 10.1037/h0040957.
  12. First MB, Spitzer RL, Gibbon M, Williams JBW. Structured Clinical Interview for DSM-IV Axis I disorders (SCID I). New York: Biometric Research Department; 1996.
  13. Foti D, Kotov R, Hajcak G. Psychometric considerations in using error-related brain activity as a biomarker in psychotic disorders. Journal of Abnormal Psychology. 2013;122(2):520–531. doi: 10.1037/a0032618.
  14. Gorka SM, Lieberman L, Shankman SA, Phan KL. Startle potentiation to uncertain threat as a psychophysiological indicator of fear-based psychopathology: An examination across multiple internalizing disorders. Journal of Abnormal Psychology. 2017;126(1):8–18. doi: 10.1037/abn0000233.
  15. Gorka SM, Liu H, Sarapas C, Shankman SA. Time course of threat responding in panic disorder and depression. International Journal of Psychophysiology. 2015;98(1):87–94. doi: 10.1016/j.ijpsycho.2015.07.005.
  16. Grillon C, Ameli R. Conditioned inhibition of fear-potentiated startle and skin conductance in humans. Psychophysiology. 2001;38(5):807–815. doi: 10.1017/S0048577201000294.
  17. Grillon C, Pine DS, Lissek S, Rabin S, Bonne O, Vythilingam M. Increased anxiety during anticipation of unpredictable aversive stimuli in posttraumatic stress disorder but not in generalized anxiety disorder. Biological Psychiatry. 2009;66(1):47–53. doi: 10.1016/j.biopsych.2008.12.028.
  18. Hajcak G, Patrick CJ. Situating psychophysiological science within the research domain criteria (RDoC) framework. International Journal of Psychophysiology. 2015;98(2):223–226. doi: 10.1016/j.ijpsycho.2015.11.001.
  19. Kaye JT, Bradford DE, Curtin JJ. Psychometric properties of startle and corrugator response in NPU, affective picture viewing, and resting state tasks. Psychophysiology. 2016. doi: 10.1111/psyp.12663.
  20. Lang PJ, Bradley MM, Cuthbert BN. Motivated attention: Affect, activation, and action. In: Lang PJ, Simons RF, Balaban MT, editors. Attention and orienting: Sensory and motivational processes. Hillsdale NJ: Lawrence Erlbaum Associates; 1997. pp. 97–135.
  21. Larson CL, Ruffalo D, Nietert JY, Davidson RJ. Temporal stability of the emotion-modulated startle response. Psychophysiology. 2000;37(1):92–101. doi: 10.1111/1469-8986.3710092.
  22. Duits P, Cath DC, Lissek S, Hox JJ, Hamm AO, Engelhard IM, … Baas JM. Updated meta-analysis of classical fear conditioning in the anxiety disorders. Depression and Anxiety. 2015;32(4):239–253. doi: 10.1002/da.22353.
  23. Manber R, Allen JJB, Burton K, Kaszniak AW. Valence-dependent modulation of psychophysiological measures: Is there consistency across repeated testing? Psychophysiology. 2000;37(5):683–692. doi: 10.1111/1469-8986.3750683.
  24. Meyer A, Riesel A, Proudfit GH. Reliability of the ERN across multiple tasks as a function of increasing errors. Psychophysiology. 2013;50(12):1220–1225. doi: 10.1111/psyp.12132.
  25. Moran TP, Jendrusina AA, Moser JS. The psychometric properties of the late positive potential during emotion processing and regulation. Brain Research. 2013;1516:66–75. doi: 10.1016/j.brainres.2013.04.018.
  26. Nelson BD, Hajcak G, Shankman SA. Event-related potentials to acoustic startle probes during the anticipation of predictable and unpredictable threat. Psychophysiology. 2015;52(7):887–894. doi: 10.1111/psyp.12418.
  27. Nunnally JC. Psychometric theory. 2nd ed. New York: McGraw-Hill; 1978.
  28. Santos J. Cronbach’s alpha: A tool for assessing the reliability of scales. Journal of Extension. 1999;37(2):34–36.
  29. Sarapas C, Weinberg A, Langenecker SA, Shankman SA. Relationships among attention networks and physiological responding to threat. Brain and Cognition. 2017;111:63–72. doi: 10.1016/j.bandc.2016.09.012.
  30. Schmitz A, Grillon C. Assessing fear and anxiety in humans using the threat of predictable and unpredictable aversive events (the NPU-threat test). Nature Protocols. 2012;7(3):527–532. doi: 10.1038/nprot.2012.001.
  31. Schwartz SJ, Lilienfeld SO, Meca A, Sauvigné KC. The role of neuroscience within psychology: A call for inclusiveness over exclusiveness. American Psychologist. 2016;71(1):52–70. doi: 10.1037/a0039678.
  32. Shankman SA, Gorka SM. Psychopathology research in the RDoC era: Unanswered questions and the importance of the psychophysiological unit of analysis. International Journal of Psychophysiology. 2015;98(2):330–337. doi: 10.1016/j.ijpsycho.2015.01.001.
  33. Shankman SA, Nelson BD, Sarapas C, Robison-Andrew E, Campbell ML, Altman SE, … Gorka SM. A psychophysiological investigation of threat and reward sensitivity in individuals with panic disorder and/or major depressive disorder. Journal of Abnormal Psychology. 2013;122(2):322–338. doi: 10.1037/a0030747.
  34. Tavakol M, Dennick R. Making sense of Cronbach’s alpha. International Journal of Medical Education. 2011;2:53–55. doi: 10.5116/ijme.4dfb.8dfd.
  35. Tomarken AJ. A psychometric perspective on psychophysiological measures. Psychological Assessment. 1995;7(3):387–395. doi: 10.1037/1040-3590.7.3.387.
  36. Vrana SR, Spence EL, Lang PJ. The startle probe response: A new measure of emotion? Journal of Abnormal Psychology. 1988;97(4):487–491. doi: 10.1037/0021-843X.97.4.487.
  37. Watson D, O’Hara MW, Simms LJ, Kotov R, Chmielewski M, McDade-Montez EA, … Stuart S. Development and validation of the Inventory of Depression and Anxiety Symptoms (IDAS). Psychological Assessment. 2007;19(3):253. doi: 10.1037/1040-3590.19.3.253.
