Reinforcement Learning and the Reward Positivity (RewP) with Aversive Outcomes

Elizabeth A Bauer; Brandon K Watanabe; Annmarie MacNamara

doi:10.1111/psyp.14460

Author manuscript; available in PMC: 2025 Apr 1.

Published in final edited form as: Psychophysiology. 2023 Nov 22;61(4):e14460. doi: 10.1111/psyp.14460

PMCID: PMC10939817 NIHMSID: NIHMS1956832 PMID: 37994210

Abstract

The reinforcement learning (RL) theory of the reward positivity (RewP), an ERP component that measures reward responsivity, suggests that the RewP should be largest when positive outcomes are unexpected and has been supported by work using appetitive outcomes (e.g., money). However, the RewP can also be elicited by the absence of aversive outcomes (e.g., shock). The limited work to date that has manipulated expectancy while using aversive outcomes has not supported the predictions of RL theory. Nonetheless, this work has been difficult to reconcile with the appetitive literature because the RewP was not observed as a reward signal in these studies, which used passive tasks that did not involve participant choice. Here, we tested the predictions of the RL theory by manipulating expectancy in an active/choice-based threat-of-shock doors task that was previously found to elicit the RewP as a reward signal. Moreover, we used principal components analysis to isolate the RewP from overlapping ERP components. Eighty participants viewed pairs of doors surrounded by a red or green border; shock delivery was expected (80%) following red-bordered doors and unexpected (20%) following green-bordered doors. The RewP was observed as a reward signal (i.e., no shock > shock) that was not potentiated for unexpected feedback. In addition, the RewP was larger overall for unexpected (versus expected) feedback. Therefore, the RewP appears to reflect the additive (not interactive) effects of reward and expectancy, challenging the RL theory of the RewP, at least when reward is defined as the absence of an aversive outcome.

1. Introduction

The reward positivity (RewP) is a frontocentral event-related potential (ERP) measuring reward responsivity (for review, see Proudfit, 2015). The RewP is typically elicited by feedback indicating monetary reward, but can also be elicited by feedback indicating the absence of an aversive event, such as shock delivery (Heydari & Holroyd, 2016; Mulligan & Hajcak, 2018). According to reinforcement learning (RL) theory, the RewP encodes reward prediction errors. Therefore, it should be largest for unexpected positive outcomes. Work using monetary reward has supported this prediction (for meta-analysis, see Sambrook & Goslin, 2015). On the other hand, the small number of studies that have used aversive outcomes to elicit the RewP (i.e., noise blasts, Soder & Potts, 2018; shock, Talmi, Atkinson, & El-Deredy, 2013) have not supported this prediction, raising the question of whether RL theory holds when positive outcomes are defined as the absence of aversive events.

The RewP has been referred to as the feedback negativity (FN; Hajcak, Moser, Holroyd, & Simons, 2007) or feedback-related negativity (FRN; Sambrook & Goslin, 2015; Talmi, Atkinson, & El-Deredy, 2013) when it has been conceptualized in terms of no reward minus reward (i.e., a relative negativity). Irrespective of its name, the RewP/FN/FRN is more positive for rewarding compared to not rewarding feedback (Heydari & Holroyd, 2016; Mulligan & Hajcak, 2018). Neurobiologically, the notion that the RewP is a marker of reward responsivity is supported by work suggesting that it reflects activity in the mesocortical dopamine system (Carlson, Foti, Mujica-Parodi, Harmon-Jones, & Hajcak, 2011; Foti, Weinberg, Dien, & Hajcak, 2011; Holroyd & Coles, 2002; Holroyd, Larsen, & Cohen, 2004). That is, dopaminergic feedback associated with the RewP is thought to drive learning by biasing performance away from non-rewarding and towards rewarding responses. Moreover, these signals may be enhanced when outcomes are unexpected, indicating that a change in expectations/predictions is needed (Holroyd, Nieuwenhuis, Yeung, & Cohen, 2003). In line with this notion, the RL theory of the RewP suggests that this component should be sensitive to the interaction of outcome × expectancy, whereby feedback indicating rewarding (minus non-rewarding) outcomes leads to more positive amplitudes when outcomes are unexpected.

Two studies to date have tested the predictions of RL theory using aversive outcomes to elicit the RewP (Soder & Potts, 2018; Talmi et al., 2013). In both studies, the RewP was potentiated for feedback indicating the delivery of unexpected aversive outcomes rather than unexpected positive outcomes. Critically, however, these results are difficult to use to confirm or disconfirm the RL theory’s notion that the RewP encodes positive prediction errors because, in contrast to the appetitive literature, these studies failed to observe a main effect of reward on the RewP. Instead, delivered outcomes (whether positive or negative) elicited more positive amplitudes in the time range of the RewP, yielding a “‘reverse’ FRN” (Talmi et al., 2013, p. 8267). Therefore, both studies suggested, more fundamentally, that the RewP might represent a salience rather than a reward signal.

One potential explanation for the results observed in these prior studies might be their use of passive tasks, in which feedback was delivered without requiring participants to make a choice. Prior work using aversive outcomes and active tasks, in which participants had to choose between stimuli before feedback was presented, has, by contrast, found that absent compared to delivered aversive outcomes elicited more positive amplitudes, in line with monetary reward tasks and a reward-based account of the RewP (Heydari & Holroyd, 2016; Mulligan & Hajcak, 2018). Active tasks might better elicit the RewP, as prior work has shown the RewP is maximized when participants must choose between stimuli (Walsh & Anderson, 2012; Yeung & Sanfey, 2004). Therefore, a more interpretable test of RL theory in the context of absent aversive outcomes may involve the use of an active task, to elicit reward-related modulation of the RewP.

Another potential explanation for why some prior work has observed larger RewPs for aversive versus non-aversive outcomes might stem from a confound between feedback valence and the rarity of stimuli used to signal this feedback. For instance, in one study in which feedback could indicate monetary reward, monetary loss, or neutral outcomes, the RewP was larger for monetary loss than neutral outcomes. However, in this experiment, the feedback used to signal monetary loss (“−75c”) was rare compared to the feedback that signaled monetary reward, because “$0” was used to indicate loss and neutral outcomes. In a separate experiment, the researchers avoided this confound between feedback valence and feedback stimulus rarity by using “win” and “lose”. Results from this second experiment showed that both monetary reward and absent monetary loss yielded more positive RewPs than neutral outcomes (Glazer & Nusslock, 2022), in line with the notion that the RewP is a reward signal. Therefore, confounds between feedback valence and the rarity of the stimuli used to signal this valence may attenuate or even reverse valence effects on the RewP. Moreover, although the joint effects of reward and expectancy could not be examined in this study (because rarity was not manipulated independently of valence), results suggested that the RewP might be sensitive to both reward and expectancy effects.

An additional factor that may be important to understanding the functional significance of the RewP is ERP quantification methods that isolate the RewP from neighboring ERP components. In particular, the RewP overlaps in time and space with the P3, which is also sensitive to expectancy (Donchin & Coles, 1988; Polich, 2007). Prior work that has tested the RL theory has attempted to isolate the RewP from the P3 by creating difference scores matched on expectancy – i.e., unexpected positive (minus unexpected negative) versus expected positive (minus expected negative) outcomes (Holroyd et al., 2011; Holroyd & Krigolson, 2007). In contrast to this difference score approach, principal components analysis (PCA) can provide a sophisticated, data-driven means of separating out overlapping ERP components and isolating the RewP. Moreover, PCA has already been successfully used to isolate the RewP associated with monetary gain (Foti et al., 2011; Holroyd et al., 2008; Sambrook & Goslin, 2016) and the absence of aversive events (Mulligan & Hajcak, 2018).

Here, we set out to determine whether the predictions of RL theory hold for absent aversive outcomes, when using an active task that manipulated outcome expectancy and a PCA to isolate the RewP from the P3. We used a modified doors task previously shown to elicit the RewP to an omitted aversive outcome (Mulligan & Hajcak, 2018). On each trial, participants selected the left or right door from a pair of doors. They were told that the door could reveal positive feedback (shock would not be delivered) or negative feedback (shock would be delivered). Reward expectancy was modulated using colored borders (red or green) surrounding the pair of doors, which indicated the likelihood of aversive compared to non-aversive outcome delivery on each trial: “expected” (80%) or “unexpected” (20%). Based on prior work, we hypothesized that feedback indicating aversive outcome omission would elicit larger RewPs compared to feedback indicating aversive outcome delivery (Heydari & Holroyd, 2016; Mulligan & Hajcak, 2018). In line with the RL theory of the RewP, we expected that outcome and expectancy would interact such that positive versus negative outcomes would lead to relatively more positive amplitudes for unexpected feedback (Holroyd & Krigolson, 2007; Holroyd, Krigolson, & Lee, 2011). We expected that the P3 would be larger for unexpected compared to expected feedback and for positive versus negative feedback (Hajcak et al., 2005, 2007).
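To make the predicted outcome × expectancy interaction concrete, the following minimal sketch computes signed prediction errors for the four feedback conditions. It rests on assumptions that are ours rather than the authors': outcomes are coded 1 for no shock and 0 for shock, and the expected value on each trial equals the cued probability of no shock (.20 after red borders, .80 after green borders). All function and variable names are hypothetical.

```python
# Illustrative only: the outcome coding (1 = no shock, 0 = shock) and the use of
# cue probabilities as expected values are assumptions, not the authors' model.

def prediction_error(outcome: float, p_no_shock: float) -> float:
    """Signed prediction error: obtained outcome minus expected outcome."""
    return outcome - p_no_shock

# Probability of no shock signaled by each border color in the task.
P_NO_SHOCK = {"red": 0.20, "green": 0.80}

# No shock is unexpected after a red border and expected after a green border.
pe_unexpected_no_shock = prediction_error(1.0, P_NO_SHOCK["red"])
pe_expected_no_shock = prediction_error(1.0, P_NO_SHOCK["green"])

# Shock is unexpected after a green border and expected after a red border.
pe_unexpected_shock = prediction_error(0.0, P_NO_SHOCK["green"])
pe_expected_shock = prediction_error(0.0, P_NO_SHOCK["red"])

# No shock minus shock, computed separately at each expectancy level.
unexpected_difference = pe_unexpected_no_shock - pe_unexpected_shock
expected_difference = pe_expected_no_shock - pe_expected_shock
```

Under these assumptions the no shock minus shock difference is roughly 1.6 for unexpected feedback and 0.4 for expected feedback, which is the interactive pattern RL theory predicts for the RewP. A purely additive account would instead predict equal differences at both expectancy levels.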

2. Method

All data are open and available on the Open Science Framework (https://osf.io/j3hyn/).

2.1. Participants

Ninety-four undergraduate students provided written informed consent and participated in the study for course credit. Data from fourteen participants were excluded from analyses due to insufficient artifact-free trials for EEG analysis (> 50% of trials rejected), resulting in a final sample of 80 participants (49 female; age M = 18.95 years, SD = 1.90). Study procedures were in compliance with the Helsinki Declaration of 1975 (as revised in 1983) and were approved by the Texas A&M institutional review board.

2.2. Experimental Procedure and Task

To control for individual differences in shock sensitivity, each participant’s shock tolerance was determined using standard procedures (Bradford, Magruder, Korhumel, & Curtin, 2014; Cheng, Jackson, & MacNamara, 2022; MacNamara & Barley, 2018). In brief, participants rated a series of gradually increasing shocks, administered to the wrist, using a scale from 0 (“Can’t feel shock”) to 100 (“Highest you can tolerate”). When participants indicated that the shock level had reached the highest level they could tolerate, no further shocks were administered, and the shock level selected by the participant was used for the modified doors task.

Next, participants completed a novel, modified version of a doors task; Figure 1 depicts example trials from the task. During the task, participants were presented with a picture of two doors and were asked to choose one door by pressing the left or right mouse button. The doors were surrounded by either a red or green border, and remained onscreen until the participant made a response; doors were surrounded by a green border on 50% of trials and a red border on 50% of trials. Following the participant’s response, a white fixation cross was presented on a black screen for 1000 ms, followed by the presentation of a red or green arrow for 2000 ms. Red arrows were always followed by a mild electric shock to the wrist, whereas green arrows were never followed by electric shock. Shock was administered at the end of arrow presentation/at 2000 ms and lasted for 200 ms, during which time a white fixation cross was presented on a black screen. After shock administration, a white fixation cross was presented on a black screen for 7000 ms before participants started the next trial. The task consisted of four blocks with 30 trials per block, for a total of 120 trials. Condition order was random within each block; 50% of trials within each block were red-bordered door trials and 50% of trials were green-bordered door trials. Prior to starting the task, participants completed six practice trials where they selected a door and were presented with a red or green arrow. No shocks were administered during the practice trials. Following the practice trials, participants were told that in the real task, the doors would be surrounded by either a red or green border. Participants were also informed that if they saw a red arrow after the doors went offscreen they would receive a mild electric shock, and that if they saw a green arrow they would not receive an electric shock.

Unknown to participants, red bordered doors were followed by a red arrow (i.e., shock) on 80% of trials (48 trials) and a green arrow (i.e., no shock) on 20% of trials (12 trials); green bordered doors were followed by a green arrow on 80% of trials (48 trials) and a red arrow on 20% of trials (12 trials). Therefore, over time, participants would come to expect different outcomes following red- versus green-bordered doors (e.g., a red arrow paired with shock following red bordered doors) but occasionally the outcome would be different than expected (e.g., a green arrow and no shock following red bordered doors). Participants were not informed of the meaning behind the red and green borders (i.e., shock likely versus unlikely) prior to starting the task.
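The trial structure described above can be sketched as a short trial-list generator. The text gives the overall counts (48/12 per border color) and the 50/50 split of border colors within each block; that the 80/20 contingency was also balanced within each block (12 likely and 3 unlikely outcomes per border color) is our assumption, as are the function name and the fixed random seed.

```python
import random

def build_block_trials(rng: random.Random) -> list[tuple[str, str]]:
    """Return one shuffled 30-trial block of (border_color, arrow_color) pairs.

    Assumption: the 80/20 contingency is balanced within every block.
    """
    trials: list[tuple[str, str]] = []
    for border, likely_arrow, unlikely_arrow in (
        ("red", "red", "green"),    # red border: shock likely
        ("green", "green", "red"),  # green border: shock unlikely
    ):
        trials += [(border, likely_arrow)] * 12    # 80% of 15 trials per border
        trials += [(border, unlikely_arrow)] * 3   # 20% of 15 trials per border
    rng.shuffle(trials)  # random condition order within the block
    return trials

rng = random.Random(0)  # fixed seed so the sketch is reproducible
all_trials = [trial for _ in range(4) for trial in build_block_trials(rng)]
```

Across the four blocks this yields 120 trials, with 48 expected and 12 unexpected outcomes for each border color, matching the counts reported in the text.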

2.3. EEG Recording and Data Reduction

Continuous EEG was recorded using an ActiCap and the ActiChamp amplifier system (Brain Products, Gilching, Germany). Thirty-two electrode sites were used based on the 10/20 system. The electrooculogram (EOG) was recorded from four facial electrodes: two electrodes were placed approximately 1 cm above and below the right eye, forming a bipolar channel to measure vertical eye movement and blinks, and two electrodes were placed approximately 1 cm beyond the outer edges of each eye, forming a bipolar channel to measure horizontal eye movements. EEG data were digitized at a 24-bit resolution with a sampling rate of 1000 Hz.

Offline, data were processed using BrainVision Analyzer 2 software (Brain Products GmbH, Gilching, Germany). The signal from each electrode was referenced offline to the average of the left and right mastoids (TP9/10) and band-pass filtered with high-pass and low-pass filters of 0.01 and 30 Hz, respectively. Data were segmented for each trial beginning 200 ms prior to arrow presentation and lasting for 1200 ms (1000 ms into arrow presentation). Baseline correction for each trial was performed using the 200 ms before arrow presentation. Eye blink and ocular artifacts were corrected using the method developed by Miller, Gratton and Yee (1988). Artifact analysis was used to identify a voltage step of more than 50.0 μV between sample points, a voltage difference of 300.0 μV within a trial, and a maximum voltage difference of less than 0.50 μV within 100 ms intervals. Trials were also inspected visually for any remaining artifacts, and data from individual channels containing artifacts were rejected on a trial-to-trial basis. Following artifact rejection, the average number of trials left per condition was as follows: expected no shock M = 42.95, SD = 4.90 (out of a total of 48 trials); expected shock M = 43.51, SD = 4.46 (out of a total of 48 trials); unexpected no shock M = 10.74, SD = 1.25 (out of a total of 12 trials); unexpected shock M = 10.74, SD = 1.27 (out of a total of 12 trials).
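The baseline correction and the three amplitude-based artifact criteria can be illustrated with a small NumPy sketch for a single-channel epoch. This is not the BrainVision Analyzer implementation: the function names are hypothetical, and evaluating the low-activity criterion over every sliding 100 ms window (rather than fixed, non-overlapping intervals) is our assumption.

```python
import numpy as np

def baseline_correct(epoch_uv: np.ndarray, sfreq: float = 1000.0,
                     baseline_s: float = 0.200) -> np.ndarray:
    """Subtract the mean of the pre-stimulus baseline (first 200 ms of the epoch)."""
    n_baseline = int(round(baseline_s * sfreq))
    return epoch_uv - epoch_uv[:n_baseline].mean()

def flag_artifacts(epoch_uv: np.ndarray, sfreq: float = 1000.0) -> dict[str, bool]:
    """Apply the three amplitude criteria to one single-channel epoch (microvolts)."""
    # Voltage step of more than 50 microvolts between adjacent sample points.
    step = np.abs(np.diff(epoch_uv)).max() > 50.0
    # Voltage difference of more than 300 microvolts within the trial.
    span = (epoch_uv.max() - epoch_uv.min()) > 300.0
    # Activity below 0.50 microvolts within any 100 ms interval
    # (sliding windows are an assumption of this sketch).
    window = int(round(0.100 * sfreq))
    windows = np.lib.stride_tricks.sliding_window_view(epoch_uv, window)
    flat = (windows.max(axis=1) - windows.min(axis=1)).min() < 0.50
    return {"step": bool(step), "range": bool(span), "flat": bool(flat)}

# Synthetic 1200 ms epoch at 1000 Hz: a 5 Hz sine wave with 10 microvolt amplitude.
times = np.arange(1200) / 1000.0
clean_epoch = 10.0 * np.sin(2 * np.pi * 5 * times)
clean_flags = flag_artifacts(baseline_correct(clean_epoch))
```

An epoch would be rejected for a given channel if any of the three flags is true; the synthetic sine wave here trips none of them.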

Temporospatial PCA was used to quantify the RewP and the P3, which overlaps in time and space with the RewP. The temporospatial PCA was conducted using the ERP PCA Toolkit version 2.64 (Dien, 2010b). Promax rotation, the covariance matrix, and Kaiser normalization were used for the temporal PCA, in line with prior work (Dien, 2010a; Dien, Beal, & Berg, 2005; Mulligan & Hajcak, 2018). Fourteen temporal factors were extracted based on the scree plot. We then subjected the spatial distributions of the temporal factors to a spatial PCA. Infomax rotation and the covariance matrix were used for the spatial PCA. Seven spatial factors were extracted based on the scree plot, resulting in 98 factor combinations.
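The two-step logic of a temporospatial PCA can be sketched in NumPy as follows. This is a deliberately simplified illustration and not the ERP PCA Toolkit procedure: it uses unrotated covariance-matrix PCA in both steps (the Promax and Infomax rotations and Kaiser normalization are omitted), runs on synthetic random data, and uses placeholder array sizes. Only the factor counts (14 temporal, 7 spatial) come from the text.

```python
import numpy as np

def pca_scores(data: np.ndarray, n_factors: int) -> np.ndarray:
    """Unrotated covariance-matrix PCA; rows are observations, columns are variables."""
    centered = data - data.mean(axis=0)
    # Rows of vt are the principal axes (eigenvectors of the covariance matrix).
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_factors].T

rng = np.random.default_rng(0)
n_obs, n_chan, n_time = 40, 32, 120   # placeholder sizes, not the study's data
erp = rng.standard_normal((n_obs, n_chan, n_time))  # synthetic stand-in for averaged ERPs

n_temporal, n_spatial = 14, 7         # factor counts reported in the text

# Step 1: temporal PCA, treating time points as variables and every
# observation-by-channel waveform as one row.
temporal = pca_scores(erp.reshape(n_obs * n_chan, n_time), n_temporal)
temporal = temporal.reshape(n_obs, n_chan, n_temporal)

# Step 2: spatial PCA on each temporal factor, treating channels as variables.
factor_scores = np.stack(
    [pca_scores(temporal[:, :, f], n_spatial) for f in range(n_temporal)],
    axis=1,
)  # shape: (n_obs, n_temporal, n_spatial)

n_combinations = factor_scores.shape[1] * factor_scores.shape[2]
```

Each observation ends up with one score per temporal-spatial factor combination, which is where the 14 × 7 = 98 combinations come from.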

Sixteen factors accounted for > 1% of the variance and were retained for further inspection (Kaiser, 1960). One factor which accounted for 13% of the variance was selected for further analysis because it was temporally and spatially analogous to the RewP as a positivity numerically maximal at FC1 at 387 ms; prior work using PCA to isolate the RewP also observed a similar component (Mulligan & Hajcak, 2018). Another factor that accounted for 2% of the variance was selected for further analysis because it was temporally and spatially analogous to the P3 as a positivity numerically maximal at Pz at 387 ms.

2.4. Statistical Analyses

The RewP and P3 were analyzed using separate 2 (outcome: no shock, shock) × 2 (expectancy: expected, unexpected) repeated measures analyses of variance (ANOVAs). All analyses were performed using SPSS statistical software version 27.0 (IBM, Armonk, NY).
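For a fully within-subjects 2 × 2 design, each effect has one degree of freedom, so its F statistic equals the squared one-sample t statistic of the corresponding per-participant contrast, and partial eta squared equals F / (F + error df). The sketch below uses that equivalence on synthetic data. It is an illustration and not the SPSS procedure the authors ran; the function name, the array layout, and the simulated effect size are all assumptions.

```python
import numpy as np

def rm_anova_2x2(cells: np.ndarray) -> dict[str, tuple[float, float]]:
    """Return {effect: (F, partial eta squared)} for a 2 x 2 within-subjects design.

    cells has shape (n_subjects, 2, 2): axis 1 is outcome (0 = no shock,
    1 = shock) and axis 2 is expectancy (0 = expected, 1 = unexpected).
    """
    n = cells.shape[0]
    ns_exp, ns_unexp = cells[:, 0, 0], cells[:, 0, 1]
    sh_exp, sh_unexp = cells[:, 1, 0], cells[:, 1, 1]
    contrasts = {
        "outcome": (ns_exp + ns_unexp) - (sh_exp + sh_unexp),
        "expectancy": (ns_unexp + sh_unexp) - (ns_exp + sh_exp),
        "interaction": (ns_exp - sh_exp) - (ns_unexp - sh_unexp),
    }
    results = {}
    for name, contrast in contrasts.items():
        # One-sample t-test of the contrast against zero; F(1, n - 1) = t squared.
        t = contrast.mean() / (contrast.std(ddof=1) / np.sqrt(n))
        f = t ** 2
        results[name] = (float(f), float(f / (f + (n - 1))))
    return results

# Synthetic amplitudes for 80 simulated participants (not the study's data).
rng = np.random.default_rng(1)
demo = rng.standard_normal((80, 2, 2))
demo[:, 0, :] += 5.0  # simulate a large no-shock advantage
demo_results = rm_anova_2x2(demo)
```

With 80 participants the error degrees of freedom are 79, consistent with the F(1, 79) tests reported in the Results.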

3. Results

Table 1 presents average RewP and P3 amplitudes, shown separately for each outcome and expectancy condition. Figure 2 depicts grand-averaged waveforms for the RewP at a pooling of Fz and Cz, shown separately for each condition. These voltages reflect scalp-elicited amplitudes (i.e., not PCA components) and are presented as a point of comparison/for full visualization of effects as they appear in the summed waveform.

Table 1.

PCA-derived means (SDs) for the RewP and P3

Measure	No Shock, Expected	No Shock, Unexpected	Shock, Expected	Shock, Unexpected
RewP (μV)	9.81 (6.13)	11.32 (6.86)	6.41 (6.20)	8.32 (6.96)
P3 (μV)	6.20 (3.09)	7.02 (3.79)	4.77 (3.06)	6.29 (3.79)

Figure 2. Scalp-elicited amplitudes corresponding to the RewP. Grand-averaged waveforms at a pooling of Fz and Cz, shown separately for each condition.

3.1. RewP

Figure 3A depicts PCA-derived headmaps for the RewP showing the voltage difference distribution for no shock minus shock feedback and unexpected minus expected feedback. Figure 3B depicts PCA-derived grand-averaged waveforms for the RewP, shown separately for each condition.

There was a main effect of outcome, F(1, 79) = 39.72, p < .001, η_p² = .34, such that the RewP was larger for no shock (M = 10.57 μV, SD = 6.14) versus shock (M = 7.36 μV, SD = 6.30) feedback. There was also a main effect of expectancy, F(1, 79) = 22.19, p < .001, η_p² = .22, such that the RewP was larger for unexpected (M = 9.82 μV, SD = 6.24) versus expected (M = 8.11 μV, SD = 5.77) feedback. The interaction between outcome × expectancy did not reach significance, p = .483.

3.2. P3

Figure 4A depicts PCA-derived headmaps for the P3 showing the voltage difference distribution for no shock minus shock feedback, shown separately for expected and unexpected feedback. Figure 4B depicts PCA-derived grand-averaged waveforms for the P3, shown separately for each condition.

There was a main effect of outcome, F(1, 79) = 18.69, p < .001, η_p² = .19, such that the P3 was larger for no shock (M = 6.61 μV, SD = 3.23) versus shock (M = 5.53 μV, SD = 3.26) feedback. There was also a main effect of expectancy, F(1, 79) = 31.70, p < .001, η_p² = .29, such that the P3 was larger for unexpected (M = 6.66 μV, SD = 3.48) versus expected (M = 5.48 μV, SD = 2.86) feedback. Additionally, there was a significant interaction between outcome × expectancy, F(1, 79) = 4.76, p = .032, η_p² = .06, which indicated that the expected no shock versus expected shock difference was larger than the unexpected no shock versus unexpected shock difference.
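The marginal means and the direction of the P3 interaction reported above can be checked against the cell means in Table 1. The sketch below does this arithmetic; the dictionary layout and helper names are hypothetical. Averaging cell means with equal weight is appropriate here because every participant contributes to all four cells, although small rounding differences from the reported values are expected.

```python
# Cell means (microvolts) transcribed from Table 1.
TABLE_1 = {
    "RewP": {
        ("no_shock", "expected"): 9.81, ("no_shock", "unexpected"): 11.32,
        ("shock", "expected"): 6.41, ("shock", "unexpected"): 8.32,
    },
    "P3": {
        ("no_shock", "expected"): 6.20, ("no_shock", "unexpected"): 7.02,
        ("shock", "expected"): 4.77, ("shock", "unexpected"): 6.29,
    },
}

def marginal_mean(cells: dict, factor_index: int, level: str) -> float:
    """Average the cell means that share one level of a factor (0 = outcome, 1 = expectancy)."""
    values = [mean for key, mean in cells.items() if key[factor_index] == level]
    return sum(values) / len(values)

def outcome_difference(cells: dict, expectancy: str) -> float:
    """No shock minus shock at a single expectancy level."""
    return cells[("no_shock", expectancy)] - cells[("shock", expectancy)]

rewp, p3 = TABLE_1["RewP"], TABLE_1["P3"]
rewp_no_shock = marginal_mean(rewp, 0, "no_shock")
p3_unexpected = marginal_mean(p3, 1, "unexpected")
p3_expected_diff = outcome_difference(p3, "expected")
p3_unexpected_diff = outcome_difference(p3, "unexpected")
```

For the P3, the no shock minus shock difference is about 1.43 μV for expected feedback and about 0.73 μV for unexpected feedback, which matches the direction of the reported interaction. For the RewP, the corresponding differences are about 3.40 μV and 3.00 μV.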

4. Discussion

Prior work has supported the RL theory of the RewP when positive outcomes have been appetitive. That is, larger RewPs have generally been observed for unexpected positive (versus negative) outcomes as compared to expected positive (versus negative) outcomes (Sambrook & Goslin, 2015). Nonetheless, the limited number of studies that have operationalized positive outcomes as the absence of an aversive event have not supported the RL theory of the RewP (Soder & Potts, 2018; Talmi et al., 2013). Moreover, these prior studies failed to show reward-related potentiation of the RewP, possibly because they used passive rather than active tasks and/or confounded feedback valence with the rarity of feedback stimuli. Because the RL theory of the RewP also assumes that amplitudes should be more positive for positive versus negative outcomes, results from these studies make it difficult to assess the expectancy predictions of the RL theory. Here, we tested the predictions of RL theory using a variation on an active task shown to elicit reward-related modulation of the RewP to absent aversive outcomes (Mulligan & Hajcak, 2018). Additionally, we used PCA to isolate the RewP from the overlapping P3.

We observed that the RewP was larger for feedback indicating aversive outcome omission versus delivery, as well as for unexpected versus expected feedback. Therefore, in contrast to the predictions of the RL theory, these results indicate that the RewP reflects the additive (not interactive) effects of reward and expectancy. In contrast to prior work that had used absent aversive outcomes (Soder & Potts, 2018; Talmi et al., 2013), we observed the RewP as a reward signal, rather than a salience signal. Therefore, our results align with a large body of prior work on the RewP (Mulligan & Hajcak, 2018; Proudfit, 2015), but suggest that, in addition to being a reward signal, the RewP may be additively potentiated by unexpected outcomes. Critically, the current study builds on prior work that had separately examined the influence of outcome and expectancy on the RewP (Glazer & Nusslock, 2022) by examining these factors simultaneously. This allowed for a direct test of the RL theory, which predicts that these factors should interact.

Since we did not observe an interaction between outcome × expectancy for the RewP, our results do not support the notion that the RewP reflects positive prediction errors and might be surprising based on the large body of work supporting the RL theory of the RewP with appetitive outcomes (Sambrook & Goslin, 2015). Nonetheless, they may be explained by a few factors. For instance, there is evidence that both aversive and appetitive prediction errors are encoded in the mesocorticolimbic neurocircuit (Brooks & Berns, 2013; Garrison, Erdeniz, & Done, 2013). Though this notion is in contrast to traditional depictions of this system as signaling reward-related valuation in particular, evidence suggests that dopaminergic neurons in different regions of the mesocorticolimbic system increase firing for the offset and onset of aversive stimuli (Budygin et al., 2012). Moreover, some dopaminergic neurons are excited by both appetitive reward and aversive outcome delivery, particularly when the outcome is unexpected (Matsumoto & Hikosaka, 2009). Therefore, based on our results and prior work (Soder & Potts, 2018; Talmi et al., 2013), it seems possible that the predictions of RL theory do not hold when prediction errors concern aversive outcomes.

Our results also confirm that the RewP can be elicited by the absence of an aversive outcome (Heydari & Holroyd, 2016; Mulligan & Hajcak, 2018). These results stand in contrast to prior work that had used passive tasks, and had failed to show reward-related modulation of the RewP for absent aversive outcomes (Soder & Potts, 2018; Talmi et al., 2013). Outcomes in passive tasks should be less meaningful for learning, because the absence of choice behavior may signal to the participant that there is no way to affect outcomes. Under these conditions, salient rather than rewarding outcomes may potentiate the RewP.

We also observed main effects for the P3 that were similar to our RewP results. That is, the P3 was larger for rewarding and unexpected feedback. In addition, we observed an interaction between outcome and expectancy for the P3, which was in the opposite direction of what RL theory would predict, as the “win – loss” difference was the largest for expected rather than unexpected feedback. This appeared to be driven by a greater expectancy-related modulation of the P3 to shock feedback (see Table 1 and Figure 4), which fits with prior work suggesting that the P3 is sensitive to the motivational salience of stimuli (Hajcak & Foti, 2020; Nieuwenhuis, 2011). Moreover, these results fit with prediction error theories, which have suggested that unexpected negative outcomes might be especially important to learning (Rescorla & Wagner, 1972; Yau & McNally, 2019). More broadly, it is likely that the P3 will affect the amplitude of the RewP when the RewP is not separated out from other components (see Figure 2). Therefore, PCA or similar methods are important for work that wishes to discern effects on the RewP as distinct from the P3.

One limitation of the present study is that we only used aversive outcomes. Having participants complete both aversive and appetitive outcome versions of the task would have allowed us to compare the effect of expectancy on the RewP elicited by these different types of outcomes. This would further inform understanding of whether the same or different neural systems are implicated in appetitive and aversive prediction errors.

Together, our results challenge the notion that the RewP measures reward prediction errors. Specifically, they suggest that the RewP is potentiated to the additive effects of reward and unexpected outcomes. Going forward, further testing of RL theory using aversive outcomes might lead to refinements of the RL theory, and ultimately, to a more complete and comprehensive understanding of reward-based learning.

Acknowledgments

This work was supported in part by NIMH R01 MH125083 (to AM). EB was supported by NIMH T32 MH106454 during preparation of the manuscript.

Footnotes

CRediT Statement

Elizabeth Bauer: Formal analysis; writing – original draft

Brandon Watanabe: Formal analysis

Annmarie MacNamara: Conceptualization; methodology; supervision; writing – review and editing

References

Bradford DE, Magruder KP, Korhumel RA, & Curtin JJ (2014). Using the Threat Probability Task to Assess Anxiety and Fear During Uncertain and Certain Threat. JoVE (Journal of Visualized Experiments), (91), e51905. doi:10.3791/51905
Brooks AM, & Berns GS (2013). Aversive stimuli and loss in the mesocorticolimbic dopamine system. Trends in Cognitive Sciences, 17(6), 281–286. doi:10.1016/j.tics.2013.04.001
Budygin EA, Park J, Bass CE, Grinevich VP, Bonin KD, & Wightman RM (2012). Aversive stimulus differentially triggers subsecond dopamine release in reward regions. Neuroscience, 201, 331–337. doi:10.1016/j.neuroscience.2011.10.056
Carlson JM, Foti D, Mujica-Parodi LR, Harmon-Jones E, & Hajcak G (2011). Ventral striatal and medial prefrontal BOLD activation is correlated with reward-related electrocortical activity: A combined ERP and fMRI study. NeuroImage, 57(4), 1608–1616. doi:10.1016/j.neuroimage.2011.05.037
Cheng Y, Jackson TB, & MacNamara A (2022). Modulation of threat extinction by working memory load: An event-related potential study. Behaviour Research and Therapy, 150, 104031. doi:10.1016/j.brat.2022.104031
Dien J (2010a). Evaluating two-step PCA of ERP data with Geomin, Infomax, Oblimin, Promax, and Varimax rotations. Psychophysiology, 47(1), 170–183. doi:10.1111/j.1469-8986.2009.00885.x
Dien J (2010b). The ERP PCA Toolkit: An open source program for advanced statistical analysis of event-related potential data. Journal of Neuroscience Methods, 187(1), 138–145. doi:10.1016/j.jneumeth.2009.12.009
Dien J, Beal DJ, & Berg P (2005). Optimizing principal components analysis of event-related potentials: Matrix type, factor loading weighting, extraction, and rotations. Clinical Neurophysiology, 116(8), 1808–1825. doi:10.1016/j.clinph.2004.11.025
Foti D, Weinberg A, Dien J, & Hajcak G (2011). Event-related potential activity in the basal ganglia differentiates rewards from nonrewards: Temporospatial principal components analysis and source localization of the feedback negativity. Human Brain Mapping, 32(12), 2207–2216. doi:10.1002/hbm.21182
Garrison J, Erdeniz B, & Done J (2013). Prediction error in reinforcement learning: A meta-analysis of neuroimaging studies. Neuroscience & Biobehavioral Reviews, 37(7), 1297–1310. doi:10.1016/j.neubiorev.2013.03.023
Glazer J, & Nusslock R (2022). Outcome valence and stimulus frequency affect neural responses to rewards and punishments. Psychophysiology, 59(3), e13981. doi:10.1111/psyp.13981
Hajcak G, & Foti D (2020). Significance?… Significance! Empirical, methodological, and theoretical connections between the late positive potential and P300 as neural responses to stimulus significance: An integrative review. Psychophysiology, 57(7), e13570. doi:10.1111/psyp.13570
Hajcak G, Moser JS, Holroyd CB, & Simons RF (2007). It’s worse than you thought: The feedback negativity and violations of reward prediction in gambling tasks. Psychophysiology, 44(6), 905–912. doi:10.1111/j.1469-8986.2007.00567.x
Heydari S, & Holroyd CB (2016). Reward positivity: Reward prediction error or salience prediction error? Psychophysiology, 53(8), 1185–1192. doi:10.1111/psyp.12673
Holroyd CB, & Coles MGH (2002). The neural basis of human error processing: Reinforcement learning, dopamine, and the error-related negativity. Psychological Review, 109(4), 679–709. doi:10.1037/0033-295X.109.4.679
Holroyd CB, & Krigolson OE (2007). Reward prediction error signals associated with a modified time estimation task. Psychophysiology, 44(6), 913–917. doi:10.1111/j.1469-8986.2007.00561.x
Holroyd CB, Krigolson OE, & Lee S (2011). Reward positivity elicited by predictive cues. NeuroReport, 22(5), 249–252. doi:10.1097/WNR.0b013e328345441d
Holroyd CB, Larsen JT, & Cohen JD (2004). Context dependence of the event-related brain potential associated with reward and punishment. Psychophysiology, 41(2), 245–253. doi:10.1111/j.1469-8986.2004.00152.x
Holroyd CB, Nieuwenhuis S, Yeung N, & Cohen JD (2003). Errors in reward prediction are reflected in the event-related brain potential. NeuroReport, 14(18), 2481–2484.
Kaiser HF (1960). The Application of Electronic Computers to Factor Analysis. Educational and Psychological Measurement, 20(1), 141–151. doi:10.1177/001316446002000116
MacNamara A, & Barley B (2018). Event-related potentials to threat of predictable and unpredictable shock. Psychophysiology, 55(10), e13206. doi:10.1111/psyp.13206
Matsumoto M, & Hikosaka O (2009). Two types of dopamine neuron distinctly convey positive and negative motivational signals. Nature, 459(7248), 837–841. doi:10.1038/nature08028
Miller GA, Gratton G, & Yee CM (1988). Generalized Implementation of an Eye Movement Correction Procedure. Psychophysiology, 25(2), 241–243. 10.1111/j.1469-8986.1988.tb00999.x
Mulligan EM, & Hajcak G (2018). The electrocortical response to rewarding and aversive feedback: The reward positivity does not reflect salience in simple gambling tasks. International Journal of Psychophysiology, 132, 262–267. 10.1016/j.ijpsycho.2017.11.015
Nieuwenhuis S (2011). Learning, the P3, and the locus coeruleus-norepinephrine system. In Neural Basis of Motivational and Cognitive Control (pp. 209–222). Oxford University Press.
Proudfit GH (2015). The reward positivity: From basic research on reward to a biomarker for depression. Psychophysiology, 52(4), 449–459. 10.1111/psyp.12370
Rescorla RA, & Wagner AR (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and non-reinforcement. In Black AH & Prokasy WF (Eds.), Classical Conditioning II: Current Research and Theory (pp. 64–69). Appleton-Century-Crofts. Retrieved from https://cir.nii.ac.jp/crid/1572543025504096640
Sambrook TD, & Goslin J (2015). A neural reward prediction error revealed by a meta-analysis of ERPs using great grand averages. Psychological Bulletin, 141(1), 213–235. 10.1037/bul0000006
Soder HE, & Potts GF (2018). Medial frontal cortex response to unexpected motivationally salient outcomes. International Journal of Psychophysiology, 132, 268–276. 10.1016/j.ijpsycho.2017.11.003
Talmi D, Atkinson R, & El-Deredy W (2013). The Feedback-Related Negativity Signals Salience Prediction Errors, Not Reward Prediction Errors. Journal of Neuroscience, 33(19), 8264–8269. 10.1523/JNEUROSCI.5695-12.2013
Walsh MM, & Anderson JR (2012). Learning from experience: Event-related potential correlates of reward processing, neural adaptation, and behavioral choice. Neuroscience & Biobehavioral Reviews, 36(8), 1870–1884. 10.1016/j.neubiorev.2012.05.008
Yau JO-Y, & McNally GP (2019). Rules for aversive learning and decision-making. Current Opinion in Behavioral Sciences, 26, 1–8. 10.1016/j.cobeha.2018.08.006
Yeung N, & Sanfey AG (2004). Independent Coding of Reward Magnitude and Valence in the Human Brain. Journal of Neuroscience, 24(28), 6258–6264. 10.1523/JNEUROSCI.4537-03.2004