Published in final edited form as: J Exp Psychol Anim Learn Cogn. 2024 Oct;50(4):267–284. doi: 10.1037/xan0000385

Contextual Modulation of Human Associative Learning following Novelty-Facilitated Extinction, Counterconditioning, and Conventional Extinction

Jérémie Jozefowiez 1, James E Witnauer 2, Yaroslav Moshchenko 3, Cameron M McCrea 3, Kristina A Stenstrom 3, Ralph R Miller 3

Abstract

The expression of an association between a conditioned stimulus (CS) and an unconditioned stimulus (US) can be attenuated by presenting the CS by itself (i.e., extinction, Ext). Though effective, Ext is susceptible to recovery effects such as renewal, spontaneous recovery, and reinstatement. Dunsmoor et al. (2015; 2019) have proposed that pairing the CS with a neutral outcome (novelty-facilitated extinction, NFE) could offer better protection against recovery effects than Ext. Though NFE has been compared to Ext, it has rarely been compared to counterconditioning (CC), a similar procedure except that the CS is paired with a US having a valence opposite to the US used in initial training. We report two aversive conditioning experiments using the rapid-trial streaming procedure with human participants that compare the efficacies and susceptibilities to ABA renewal of Ext, CC, and NFE. Associative learning was assessed through expectancy learning and evaluative conditioning. CC and NFE equally decreased anticipation of the US in the presence of the CS (i.e., expectancy learning). Depending on how the CS-US association was probed, they were either as or more effective at doing so than Ext. All three interference treatments were equally susceptible to context manipulations. Only CC clearly altered the valence of the CS (i.e., evaluative conditioning). Valence ratings after Ext, CC, and NFE, as well as a no-interference control condition, were all equally susceptible to context effects. Overall, the present study does not support the assertion that NFE is consistently more resistant to recovery effects than Ext.

Keywords: Extinction, counterconditioning, novelty-facilitated extinction, renewal, expectancy learning, evaluative conditioning


In Pavlovian conditioning, pairing an initially neutral conditioned stimulus (CS) with a biologically relevant unconditioned stimulus (US) leads to the formation of a CS-US association (associative learning), the strength of which determines the capacity of the CS to elicit conditioned responses (CR). If the US is an aversive event, the CR typically includes indications of fear and anxiety. For this reason, associative learning is thought to play a central role in anxiety disorders and phobias (e.g., Van Elzakker et al., 2014; Vervliet et al., 2013), and extensive research has been conducted toward reducing the extent to which the CS triggers the CR once the CS-US association has been established.

After conditioning consisting of CS-US pairings, presenting the CS by itself reduces the potential of the CS to elicit a CR (i.e., extinction [Ext]; Pavlov, 1927). Relatedly, counterconditioning (CC; Pavlov, 1927) is a reduction in original conditioned responding that results from the CS being paired with a stimulus having a valence opposite to that of the original US. However, these treatments often wane with procedural changes between extinction and testing (Bouton, 2017). Testing the CS in a context that is distinctly different from the context of Ext or CC increases the target CR strength relative to the CR observed in the Ext or CC context. For example, rats that received CS-US pairings in context A followed by extinction in context B will show renewal of CR strength when tested in the context of the original CS-US pairings (ABA renewal; Bouton & Bolles, 1979). Similar recovery effects are observed when the US is presented in the test context before testing with the CS (reinstatement; Rescorla & Heth, 1975) or when the retention interval between the last Ext or CC session and the test is increased (spontaneous recovery; Pavlov, 1927).

Recently, Dunsmoor and his collaborators (Dunsmoor et al., 2015; Dunsmoor et al., 2019; see also Lucas et al., 2018) have proposed a new procedure, novelty-facilitated extinction (NFE), that reduces expression of the CS-US association. NFE resembles CC in that it involves post-acquisition pairings of the CS with another stimulus, but the paired stimulus is neutral in valence rather than being opposite in valence to the training US. Dunsmoor et al. reported that after initial pairings of a CS with an aversive shock US, NFE is more effective than Ext in reducing the conditioned skin conductance response (SCR) in humans and conditioned freezing in rats. Moreover, Dunsmoor et al. (2015, 2019) found NFE to be putatively less susceptible to spontaneous recovery than Ext. Lucas et al. (2018) reached the same conclusion using reinstatement to assess recovery of both SCR and evaluative ratings of CS valence in humans. However, Dunsmoor et al.'s observations might be difficult to replicate; for example, Krypotos and Engelhard (2018) observed that NFE was less effective than Ext in a test immediately after the response-reducing treatment (see also Steinman et al., 2022) and equally subject to reinstatement-induced recovery in both affective ratings of the CSs and expectancy of the USs.

An additional limitation of the conclusions reached by Dunsmoor et al. (2015, 2019) and Lucas et al. (2018) is that they did not compare NFE with CC in terms of efficacy or susceptibility to recovery, which is surprising given Dunsmoor et al.'s (2015, 2019) rationale for proposing the use of NFE. Based on the Rescorla and Wagner (1972) model of learning, they posited that change in the CS-US association was a function of the surprisingness of the event following the CS. Although subject to argument, they presumed that the neutral outcome used in NFE was more surprising than the omission of the US in Ext; hence, NFE should be more effective than simple extinction, which consists of presentations of the CS alone. However, one might suppose that the absence of an outcome as in Ext differs more from target training than does the substitution of an affectively neutral nontarget outcome for the target outcome in NFE. Moreover, extending their theoretical reasoning for expecting NFE to be more effective than Ext, Steinman et al. (2022) suggested that NFE reduces uncertainty more than Ext because detection of a non-event (i.e., no outcome in Ext) is more difficult than detection of a nontarget outcome in NFE. Based on these two factors, Dunsmoor et al. hypothesized that the greater amount of surprise and reduced uncertainty generated by NFE also would result in lower susceptibility to recovery effects, which was consistent with the specific data they presented (see also Lipp et al., 2020). However, both of their theoretical assumptions are arguable. First, it is not clear why a neutral outcome should be more surprising than the absence of the expected outcome; one might view a cue followed by the target outcome as being more similar to the same cue followed by a different outcome than by no outcome, in that the first two have in common the presence of a subsequent event. Second, although the importance of surprise in extinction is central in most models of conditioning, including the Rescorla-Wagner (1972) model, and is supported by phenomena such as overexpectation (Kremer, 1978), no data to date have linked surprise to recovery effects. Likewise, there are no data supporting a link between recovery effects and uncertainty. Still, suppose we assume that NFE is less susceptible to recovery than Ext because it generates more surprise. If this were true, after pairing the CS with an aversive US in initial training, pairing it with an appetitive US would arguably be more surprising than pairing it with a neutral outcome. Hence, if Dunsmoor et al.'s account is correct, we would expect CC to be even more effective and more resistant to recovery effects than NFE.

Research has shown that CC is frequently more effective than Ext at reducing emotional CRs (for a review, see Jozefowiez et al., 2020). Data on the susceptibility to recovery of reductions in responding due to CC relative to Ext are sparse and, to date, contradictory. Holmes, Leung, and Westbrook (2016) concluded that CC is more susceptible to renewal than Ext based on experiments that measured freezing in rats. In contrast, Kang et al. (2018), using human participants and measures of both CS valence and US expectancy, reported that CC is less susceptible not only to renewal (see also Thomas et al., 2012) but also to spontaneous recovery and reinstatement. Using chocolate as the US, Van Gucht et al. (2013) found no difference between Ext and CC regarding their impact on either CS-elicited approach behavior or US expectancy; however, only CC was able to alter conditioned liking, with its effect generalizing between contexts. Hence, it seems important to conjointly assess the relative effectiveness of CC and NFE, and their susceptibilities to renewal.

Using SCR, Chen et al. (2022) compared NFE with not only Ext but also CC. They found no appreciable difference across the three treatments in either immediate efficacy or susceptibility to spontaneous recovery and reinstatement-induced recovery. However, Chen et al.'s study used electric shock to initially condition the CS, whereas CC consisted of pairing the CS with positive images from the International Affective Picture System (IAPS; Lang et al., 2008). The absolute valence of the electric shock was likely greater than that of the positive IAPS pictures. If the valence of a stimulus is coded relative to other stimuli in the same situation, this might have led participants to process the positive IAPS pictures much as they would neutral pictures. This possibility cannot be ruled out because Chen et al. did not ask their participants to rate the valence of the various stimuli used as USs in their study. A better comparison between CC and NFE could be achieved by using aversive and appetitive stimuli that have at least similar absolute valences and that use the same sensory modality (e.g., visual).

Importantly, the SCR is elicited by a variety of emotionally positive and negative stimuli. In a procedure that includes Phase 1 CS-aversive US pairings and Phase 2 CS-appetitive stimulus CC trials, the participant will learn to expect the appetitive stimulus as a result of CC. However, the expectation of an appetitive stimulus could trigger an SCR that would be indistinguishable from an SCR triggered by an association between the CS and the aversive US. Consequently, other methods for assessing the CS-US association are necessary to compare the efficacies of NFE and CC. Often, participants are asked to rate how likely the CS is to be followed by the US. Another possibility is to have participants rate the valence of the CS: a CS paired with a positive outcome should be rated as more positive than a CS paired with a negative outcome (i.e., evaluative conditioning; for a review, see De Houwer et al., 2001). Such a strategy was followed by Quintero et al. (2024). They paired a visual CS with an aversive sound. The CS was subsequently paired with a positive sound (CC), a neutral one (NFE), or no sound at all (Ext). Assessing associative learning through both expectancy learning (participants rated how likely the US was to appear following the CS) and evaluative conditioning (participants rated the valence of the CS), they did not observe any difference among the three treatments regarding either their efficacies immediately after interference treatment or their susceptibilities to spontaneous recovery. However, Quintero et al. (2024) did not assess the comparative susceptibilities of Ext, CC, and NFE to context effects, more specifically, renewal. This was the goal of the present experiments. We used the rapid-trial streaming procedure developed by Allan and her collaborators (a.k.a. the streamed-trial procedure; Alcala et al., 2023; Crump et al., 2007; Hannah et al., 2009; Jozefowiez, 2021; Jozefowiez & Miller, 2024; Laux et al., 2010; Maia et al., 2018; Murphy et al., 2021; Siegel et al., 2009) that we had previously used in the study comparing CC and Ext (Jozefowiez et al., 2020).

Experiment 1

In Experiment 1, participants were exposed to rapid streams of stimuli divided into two phases. The first phase established an association between the target CS (X) and an aversive US (NEG). Except for the control condition (Ctr), the second phase was intended to reduce expression of the X-NEG association through either Ext (presenting X by itself), CC (pairing X with an appetitive US), or NFE (pairing X with a neutral outcome). The positive and aversive USs, as well as the neutral outcomes, were all IAPS pictures.

Context manipulations were achieved by changing the background over which the stimuli were displayed. Phase 1 of a stream took place in Context A whereas Phase 2 took place in Context B. Testing occurred in either Context A or B depending on experimental condition to assess potential ABA renewal.

During the test, the participants were asked how likely it was that the target CS would be followed by the aversive US in each context (prediction rating). This measured expectancy learning. The participants were also asked to rate how much they ‘liked’ the CS when it was presented in a given context (valence rating). This measured evaluative conditioning. Some authors have suggested that evaluative conditioning is at least highly resistant, if not completely insensitive, to extinction (i.e., Baeyens et al., 1988; Díaz et al., 2005; Gawronski et al., 2015; Vansteenwegen et al., 2006). In a previous study comparing CC and NFE using the streaming procedure (Jozefowiez et al., 2020), we observed that different results are obtained depending on whether associative learning is assessed through expectancy learning or evaluative conditioning.

Methods

Participants

In our previous study (Jozefowiez et al., 2020), the within-subject difference between Ext and CC corresponded to a Cohen's (1988) d of 0.3, and the correlation between the performance of participants in the Ext and CC conditions varied between 0.3 and 0.7. The power analysis used to select the sample size for the present experiment assumed that the effect sizes of Ext, CC, and NFE would fall within this range. Under those assumptions, 80% statistical power for a t-test is achieved with a sample size between 55 and 122 participants, corresponding to correlations of 0.7 and 0.3, respectively. Hence, we sought to obtain data from 100 participants. This and the subsequent experiment were approved by the SUNY-Binghamton Institutional Review Board.
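As a concreteness check, the sample-size range quoted above can be reproduced with a few lines of code. The sketch below is illustrative only; it assumes the statsmodels library and a two-sided paired t-test at α = .05, and it is not the power-analysis software the authors actually used.

```python
# Illustrative reconstruction of the sample-size calculation described above
# (assumes statsmodels; not the power-analysis tool actually used by the authors).
from statsmodels.stats.power import TTestPower

d_between = 0.3                      # within-subject Ext vs. CC difference (Jozefowiez et al., 2020)
for r in (0.3, 0.7):                 # assumed correlations between conditions
    d_z = d_between / (2 * (1 - r)) ** 0.5      # effect size for the paired difference scores
    n = TTestPower().solve_power(effect_size=d_z, alpha=0.05, power=0.80,
                                 alternative='two-sided')
    print(f"r = {r}: d_z = {d_z:.2f}, N for 80% power ~ {n:.0f}")
# Output is approximately N = 122 for r = 0.3 and N = 55 for r = 0.7,
# consistent with the 55-122 range reported above.
```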

Overall, we ran 180 participants, between 18 and 50 years old, without a predisposition for visually induced seizures or a history of anxiety disorders, recruited from the SUNY-Binghamton subject pool. We recruited more than the target sample of 100 to compensate for the anticipated loss of participants expected to fail the inclusion criterion. Eighty participants failed to meet the criterion for inclusion based on ratings during a warmup procedure (see below). The data file from one participant who met the inclusion criterion was corrupted and could not be read. Hence, the analysis below is based on a sample of 99 participants (29 males, 69 females, and one participant who did not indicate gender, with the imbalance reflecting the demographics of the subject pool). The mean age of participants was 18.99 ± 0.86 years, ranging from 18 to 21 years old.

Apparatus and Stimuli

The experiment was conducted on Windows PCs in individual cubicles at SUNY-Binghamton. The screen resolution of the computers was 1920 × 1080 pixels, and the monitors were 53.34 cm wide. The experiment used a customized program written in Python using the PsychoPy2 library (Peirce, 2007). Participants used a standard computer mouse to provide their responses.

Six stimuli (cues: X, Y, W; and outcomes: NEG, NEUT, POS) were used in each condition. As there were 8 experimental conditions, 24 neutral cues played the roles of cues X, Y, and W, and 8 aversive, 8 neutral, and 8 appetitive pictures played the roles of NEG, NEUT, and POS, respectively. For each participant, 8 sets of stimuli were created by randomly selecting three images from the pool of 24 cues and randomly assigning them to the roles of Cues X, Y, and W. One stimulus from the pool of 8 aversive pictures, one from the pool of 8 appetitive pictures, and one from the pool of 8 neutral pictures were randomly selected to fill the roles of the NEG, POS, and NEUT outcomes, respectively. All random selection of images was done without replacement. Hence, each condition had a different set of cues and outcomes, with a different random assignment of the cues and outcomes to each condition for each participant. The cues were 450- x 490-pixel neutral symbols. Based on our judgment, they were distinctively different from each other. The aversive, appetitive, and neutral pictures were drawn from the IAPS (Lang et al., 2008; see Table 1). Although the three categories of pictures differed in their valence ratings, matching them regarding arousal was not possible because aversive images are ordinarily more arousing than either appetitive or neutral images, and appetitive images are more arousing than neutral images. Instead, we tried to reduce as much as possible the differences in arousal by picking the most arousing neutral images and the least arousing aversive images. All IAPS pictures were scaled down from their original 1024 × 768 resolution to match the dimensions of the cues.
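A short sketch may help make the assignment scheme concrete. The code below is purely illustrative (file names and data structures are hypothetical; it is not the experiment program): it samples cues and outcomes without replacement so that each of the 8 conditions receives a unique stimulus set.

```python
# Illustrative sketch of the per-participant stimulus assignment described above
# (hypothetical file names; not the original experiment code).
import random

cue_pool = [f"cue_{i:02d}.png" for i in range(24)]    # 24 neutral symbols
neg_pool = [f"neg_{i}.png" for i in range(8)]          # 8 aversive IAPS pictures
neut_pool = [f"neut_{i}.png" for i in range(8)]        # 8 neutral IAPS pictures
pos_pool = [f"pos_{i}.png" for i in range(8)]          # 8 appetitive IAPS pictures

# Sampling without replacement: shuffle each pool once, then deal stimuli out
# across the 8 conditions so that no image serves in more than one condition.
for pool in (cue_pool, neg_pool, neut_pool, pos_pool):
    random.shuffle(pool)

stimulus_sets = []
for condition in range(8):
    x, y, w = cue_pool[condition * 3: condition * 3 + 3]   # three cues per condition
    stimulus_sets.append({"X": x, "Y": y, "W": w,
                          "NEG": neg_pool[condition],
                          "NEUT": neut_pool[condition],
                          "POS": pos_pool[condition]})
```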

Table 1.

Identifier, valence, and arousal ratings of the IAPS pictures used as outcomes.

IAPS identifier Description Valence Arousal Set

#7950 Tissue 4.94 2.28 Neutral
#7059 Keyring 4.93 2.73 Neutral
#7010 Basket 4.94 1.76 Neutral
#7012 Rubber bands 4.98 3.00 Neutral
#7020 Fan 4.97 2.16 Neutral
#7035 Mug 4.98 2.66 Neutral
#7041 Baskets 4.99 2.60 Neutral
#7090 Book 5.19 2.61 Neutral
#3103 Injury 2.07 6.06 Aversive
#3150 Mutilation 2.26 6.55 Aversive
#9571 Cat 1.96 5.64 Aversive
#9301 Toilet 2.26 5.28 Aversive
#9185 Dead Dog 1.97 5.65 Aversive
#9183 Hurt Dog 1.69 6.58 Aversive
#9181 Dead Cows 2.26 5.39 Aversive
#9570 Dog 1.68 6.14 Aversive
#1440 Seal 8.19 4.61 Appetitive
#1441 Polar Bears 7.97 4.61 Appetitive
#1460 Kitten 7.97 3.94 Appetitive
#1710 3 puppies 8.21 4.31 Appetitive
#1750 Bunnies 8.28 4.10 Appetitive
#1920 Dolphins 7.90 4.27 Appetitive
#2045 Baby 7.87 5.47 Appetitive
#2070 Baby 8.17 4.51 Appetitive

The stimuli appeared superimposed over a context. The single context for the Warmup conditions and Contexts A and B for each of the 8 experimental conditions (17 contexts in all) consisted of a rectangular image, each framed by a border (see Figure 1 for an example). The background of the image inside the border was 1 of 17 solid colors, each yoked to a distinctly different border pattern. The borders were 1024 × 768 pixels while the internal colored area was 960 × 662 pixels. A white fixation cross (10 × 40 pixels vertically, 46 × 9 pixels horizontally) was displayed at appropriate times at the center of the screen.

Figure 1.

Figure 1.

Top: Examples of background and stimuli shown to the participants. Bottom left: Example of a prediction rating screen. Bottom right: Example of a valence rating. The stimuli and background shown in these examples are those used during the warm-up.

A unique set of stimuli (X, Y, W, O1, O2) and a unique context were used during the warmup conditions that preceded the experimental conditions. All stimuli in this set were 450- x 490-pixel, affectively neutral symbols. Those stimuli were used throughout the whole warm-up sequence and were the same for all the participants.

Procedure

All participants were initially required to complete an informed consent form and asked to turn off their cell phones. Upon giving consent, they were seated in individual cubicles. During the experiment, the participants were exposed to various conditions in which they saw streams of trials. After each stream, they were asked to make predictions and provide valence ratings concerning a target cue and the various outcomes to which they had been exposed.

For each condition, a set of three randomly selected cues and three randomly selected outcomes was assigned to a randomly selected pair of contexts (see Tables 2 and 3). Random selections were without replacement across conditions and were made anew for each participant. Each condition, presented in random order, was composed of a stimulus stream followed by the prediction and the valence rating questions.

Table 2.

Composition of the trial streams during warmup in Experiment 1.

Condition Phase 1 (cycled once) Phase 2 (cycled twice)

Positive warmup A: 6 X-O1 / 5 Y- / 6 W- B: 6 Y- / 5 Y-O2
Negative warmup A: 6 X- / 5 Y-O2 / 6 W- B: 6 Y- / 5 Y-O2
Null warmup-1 A: 3 X-O1 / 3 X- / 5 Y-O2 / 6 W- B: 6 Y- / 5 Y-O2
Null warmup-2 A: 3 X-O1 / 3 X-O2 / 5 Y- / 6 W- B: 6 Y- / 5 Y-O2

Note. ‘n I-J’ means that n trials of cue I and outcome J were presented during each cycle. ‘n I-’ means that n trials of cue I alone were presented during each cycle. The letters A and B indicate in which context the stream was presented. The program cycled once through Phase 1 and twice through Phase 2 before the prediction and valence ratings were requested. During each cycle, the order of presentation of the trials was determined randomly. X and O1 were the target cue and outcome, respectively. The prediction and valence ratings always occurred in context A.

Table 3:

Composition of the trial streams during experimental conditions in Experiment 1.

Condition Phase 1 (cycled once) Phase 2 (cycled twice)

Ctr A: 5 X-NEG / 5 W- B: 5 Y-POS / 5 Y-NEUT / 10 Y-
Ext A: 5 X-NEG / 5 W- B: 5 Y-POS / 5 Y-NEUT / 5 X- / 5 W-
CC A: 5 X-NEG / 5 W- B: 5 X-POS / 5 Y-NEUT / 5 Y- / 5 W-
NFE A: 5 X-NEG / 5 W- B: 5 Y-POS / 5 X-NEUT / 5 Y- / 5 W-

Note. ‘n I-J’ means that n trials of cue I and outcome J were presented during each cycle. ‘n I-’ means that n trials of cue I alone were presented during each cycle. The letters A and B indicate in which context the stream was presented. The program cycled once through Phase 1 and twice through Phase 2 before the prediction and valence ratings were requested. During each cycle, the order of presentation of the trials was determined randomly. X and NEG were the target cue and outcome, respectively. The prediction and valence ratings always occurred in context A for the ABA streams and context B for the ABB streams. During a block, each of the conditions in Table 3 was presented twice to a participant: testing occurred in Context A on one occasion (ABA condition) and in Context B on the other occasion (ABB condition). Hence, a block was composed of 8 streams: cues, outcomes, and contexts differed across these streams.

Each stream began with a 1000-ms presentation of the context and the fixation cross. The fixation cross remained visible throughout the two training phases of each stream. The Phase 1 trials (see Table 3) were presented in a random order and were followed immediately by the Phase 2 trials which consisted of two cycles of the trials specified in Table 3 with trial order randomized within each cycle. Every trial started with the presentation of a cue (X, Y, or W) for 1000 ms, and potentially 400 ms into the cue, an outcome (POS, NEG, or NEUT) was presented for 600 ms. Trials were separated by a 500-ms intertrial interval (ITI). X was always presented such that its left edge was one stimulus width to the right of the center of the screen; Y was always presented such that its right edge was one stimulus width to the left of center; W was always presented centered in the upper part of the screen, vertically aligned with and between the locations of X and Y. Outcome NEG was always presented in the lower left corner, diagonally opposite the location of X; Outcomes POS and NEUT were always presented centered immediately below the fixation cross (see Figure 1). The intent of Phase 1, which was identical across all conditions, was to establish an association between cue X and outcome NEG. Stimuli were displayed in Context A. Except for the Ctr condition, the aim of Phase 2 was to impair the expression of the X-NEG association established during Phase 1 either through Ext (X presented by itself; we label this event the NULL outcome), CC (X paired with POS), or NFE (X paired with NEUT). During Phase 2, stimuli were displayed in Context B.
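For readers who want to picture the trial timing, the following sketch illustrates one trial of a stream using the Experiment 1 parameters. It assumes PsychoPy with placeholder image files and hypothetical positions; it is an illustration of the timing described above, not the authors' program (which is available at the link given in the transparency statement).

```python
# Minimal sketch of one rapid-stream trial (Experiment 1 timing), assuming PsychoPy
# and placeholder image files; illustrative only, not the authors' original program.
from psychopy import visual, core

win = visual.Window(size=(1920, 1080), units="pix", fullscr=False)
cue = visual.ImageStim(win, image="cue_X.png", pos=(245, 0))        # hypothetical path/position
outcome = visual.ImageStim(win, image="neg_outcome.png", pos=(-480, -300))

def run_trial(present_outcome=True):
    # Cue alone for the first 400 ms of its 1000-ms presentation.
    cue.draw()
    win.flip()
    core.wait(0.400)
    # On cue-outcome trials, the outcome joins the cue for the remaining 600 ms.
    cue.draw()
    if present_outcome:
        outcome.draw()
    win.flip()
    core.wait(0.600)
    # 500-ms intertrial interval (the fixation cross is omitted in this sketch).
    win.flip()
    core.wait(0.500)
```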

Once exposure to a condition was complete, the participant was asked to predict how likely in a specific context (A or B) each of three given cue/outcome configurations was immediately following the presentation of X and to rate the relative valences of the various cues presented during the condition (see Figure 1). As test order might influence responding to the second-asked question, the actual order of testing (i.e., prediction vs. valence) was counterbalanced within-subjects. That is, for half of the streams, the prediction rating occurred before the valence rating, whereas the reverse was true for the remaining streams.

For the prediction ratings (see the lower left panel of Figure 1 for an example), the participant was shown a screenshot of cue X in either Context A (ABA condition) or Context B (ABB condition), as it had appeared during the stimulus stream. Above the cue was the statement: “Imagine you have been shown the following configuration on the screen:” and below the cue was the instruction: “Using the scale below each image, use the mouse to indicate how likely the top configuration is to be followed by each of the below configurations.” Below this instruction, there were four screenshots arranged in a square showing an image of Cue X along with an image of each potential outcome (NEG, POS, NEUT, and the NULL outcome), all presented in either Context A (ABA condition) or Context B (ABB condition). Below the image of each outcome was an 11-point Likert scale ranging from 0 to 100 (incremented by steps of 10) and anchored at 0 (‘Very unlikely’) and 100 (‘Very likely’). Consistently, the mouse pointer initially appeared centered below the “Imagine…” statement at the top of the screen. The location of the four screenshots showing the outcomes was randomized for each participant and each condition. A white circle indicating a specific rating in the Likert scale became black if the participant clicked on it and remained black for 150 ms. Then, that particular outcome image and its associated Likert scale disappeared. 150 ms after the participants had given the last of their four prediction ratings, the prediction rating screen disappeared.

Each valence rating pitted two cues (S1 and S2) against one another (the lower right panel of Figure 1 provides an example). Three valence ratings were presented sequentially in random order: X vs. W, X vs. Y, and W vs. Y. Cue W was always presented exactly as often as Cue X but was never paired with any outcome. Hence, the valence of W should have remained neutral and, therefore, served as a neutral option against which X could be compared by participants. The X vs. W ratings served as the critical dependent variable in data analysis, whereas the other ratings were only fillers that prevented participants from ignoring cue Y in later experienced conditions. During each valence rating, the participant saw screenshots of Cues S1 and S2 in Context A for the ABA condition and Context B for the ABB condition, as they appeared during the condition’s stream. One was displayed on the left of the screen, the other on the right, with cues randomly assigned to a location. Centered above the two cues was the question: “Which stimulus do you prefer?” Below the cue was an 11-point Likert scale, ranging from +5 to 0 to +5, and anchored at +5, 0, and +5. The anchor below 0 always read, “I do not have any preference.” For the series of three valence ratings, the horizontal positions of the two cues were counterbalanced. Under each of the left and right instances of +5, the anchor read, “I much more prefer [W, X, or Y],” where X, Y, or W in the anchors were small images of that cue. The image of the cue in its training context always appeared above the anchor corresponding to the highest preference for that stimulus (so, for instance, the image of X, in either context A or context B depending on the condition, was displayed immediately above the “I much prefer X” anchor). The participants used the mouse to provide their answers. Consistently, the mouse appeared at the center of the screen, equidistant from the leftmost +5 and the rightmost +5 anchors. When the participants clicked on the white circle indicating the various rating levels in the Likert scale, the selected value turned black for 150 ms before the valence rating screen disappeared. Once the prediction and valence ratings were over, another screen appeared, prompting participants to left-click the mouse whenever they were ready to start the next condition.

Warmup.

Before experiencing the experimental conditions, all participants received a series of warmup conditions aimed at familiarizing them with the procedure and providing examples of different relationships between the target cue X and the outcomes (see Table 2). The conditions during the warmup all used the specific context and stimulus sets designated for the warmup, which were distinctly different from those used for the experimental conditions. Notably, the warmup stimulus set contained only two outcomes (O1 and O2), each of them neutral in valence, versus three outcomes (POS, NEG, and NEUT) in the experimental stimulus sets, with only one of them being neutral. Thus, instead of providing four ratings during the prediction test, participants provided three, corresponding to the X-O1, X-O2, and X-NULL contingencies. Otherwise, the structure of the warmup condition was identical to the description above, with O1 appearing at the same location as NEG and O2 appearing at the same location as POS and NEUT. The warmup procedure consisted of two parts. First, participants received warmup training streams that were followed by feedback on the programmed relationships between X and each of the outcomes (O1, O2, and NULL). Second, participants received warmup streams of trials intended to test whether they could reliably report positive, zero, and negative contingencies between X and each of its outcomes. Between warmup streams, participants were tested on both the predictive relationship between X and the various outcomes and on the valence of X. For each valence rating, participants were reminded that there was no right or wrong rating of valence and then prompted to left-click the mouse to advance from one stream to the next.

The first warmup training stream provided an example of a positive contingency between X and O1. The trial composition for the stream presented in this condition (positive warmup) is shown in Table 2. Other than the differences noted in Table 2, all warmup training streams were identical. At the end of each warmup stream, the prediction and valence ratings (in random order) were requested in Context A. Once the participants had provided their prediction and valence ratings, an instruction screen was presented:

In the sequence you have just seen, [X] was always followed by [O1], [X] was never followed by [O2], and a stimulus was always presented following [X]. Hence, you should have judged that it is very likely that [X] would be followed by [O1]. As for which stimulus you prefer, there is no good or bad answer to those questions; just choose according to your preference!

[X], [O1], and [O2] consisted of images of the respective cue and outcomes.

The second warmup training stream provided participants with an example of a negative X-O1 contingency. The trial composition for the negative warmup stream is shown in Table 2. Once participants had provided their responses to the prediction and the valence rating questions, an instruction screen explained to them that their answer should have been 0 for the X-O1 and X-O2 relations but 100 for the X-NULL relation because X had never been paired with an outcome.

The third and fourth warmup training streams provided participants with examples of zero contingencies for X-O1 and X-NULL (third warmup stream) and X-O1 and X-O2 (fourth warmup stream). The trial compositions for these warmup streams are shown in Table 2. Once the participants had provided their responses to the prediction and valence ratings, an instruction screen explained the correct prediction ratings as in the first and second warmup streams. Participants were then prompted to left-click the mouse to start the next condition.

In the final warmup streams, participants were eliminated from the study if they failed to answer all the questions correctly. Before beginning the final warmup streams, participants were told that they would be presented with more examples of each of the four types of warmup conditions, but now there would be no feedback. Upon left-clicking the mouse, participants were exposed to a series of warmup measurement quartets consisting of the positive, negative, and two null warmups in a random order for each quartet. Following each stream, participants provided both prediction ratings and valence ratings. In the positive warmup streams, the participant’s prediction rating was defined as correct if it was between 80 and 100 for the X-O1 contingency and between 0 and 20 for the X-O2 and X-NULL contingency. Following each negative warmup stream, ratings between 0 and 20 for the X-O1 and X-O2 contingencies and between 80 and 100 for the X-NULL contingency were defined as correct. For the first zero contingency warmup stream, a correct response was defined as one between 30 and 70 for the X-O1 and X-NULL contingencies and between 0 and 20 for the X-O2 contingency. For the second zero contingency warmup stream, a correct response was defined as one between 30 and 70 for the X-O1 and X-O2 contingencies and between 0 and 20 for the X-NULL contingency. Participants were repeatedly presented with this quartet of warmup conditions until they were either able to provide a correct response for each condition composing a quartet or until they had been exposed to 10 quartets. In the latter case, they were considered to have failed warmup training and were thanked and then dismissed. As previously mentioned, 80 participants failed the warmup.
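The inclusion criterion applied to each warmup quartet amounts to a set of range checks on the prediction ratings. The sketch below (illustrative only, not the original program) encodes the correct-response bands listed above.

```python
# Illustrative check of whether one warmup quartet was answered correctly,
# using the rating bands described above (not the authors' original code).
CORRECT_BANDS = {
    "positive": {"X-O1": (80, 100), "X-O2": (0, 20),  "X-NULL": (0, 20)},
    "negative": {"X-O1": (0, 20),   "X-O2": (0, 20),  "X-NULL": (80, 100)},
    "null1":    {"X-O1": (30, 70),  "X-O2": (0, 20),  "X-NULL": (30, 70)},
    "null2":    {"X-O1": (30, 70),  "X-O2": (30, 70), "X-NULL": (0, 20)},
}

def quartet_passed(ratings):
    """ratings: dict mapping warmup type -> dict of contingency -> rating (0-100)."""
    return all(lo <= ratings[stream][key] <= hi
               for stream, bands in CORRECT_BANDS.items()
               for key, (lo, hi) in bands.items())
```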

Experimental Conditions.

The experimental conditions started immediately after the warmup criterion was reached. Before participants clicked their mouse to advance to the study, an instruction screen informed them that the relation between X and the various outcomes would now be more difficult to identify. They were exposed to three octets of conditions (i.e., each experimental condition was presented three times). The 8 conditions in each octet were composed by factorially crossing the type of interference treatment (Ctr vs. Ext vs. CC vs. NFE) with the context in which the prediction and valence ratings took place (A vs. B). The compositions of the streams in the Ctr, Ext, CC, and NFE conditions are shown in Table 3. Phase 1 consisted of trials that established an X-NEG association and a W-NULL association. The last trial of Phase 1 was followed immediately by the first trial of the first cycle of Phase 2, and the last trial of the first cycle of Phase 2 was followed immediately by the first trial of the second cycle of Phase 2. The order of trials within a cycle was randomly determined. Testing occurred at the end of each stream, in either context A or B depending on the condition. Once a participant completed an octet of conditions, the next octet commenced. Condition order was randomized within octets for each participant.

Outcome ratings.

After all the experimental condition streams, the affective valences of the outcomes were assessed as a manipulation check. Participants were asked to evaluate the valence of each outcome on a −5 to +5 scale. To obtain a rating, a stimulus was shown for 400 ms centered on the screen over a grey background. This was followed by the question “How pleasant or unpleasant is this image for you?” along with the presentation, below the question, of an 11-point Likert scale going from −5 to +5 and anchored at −5 (very unpleasant), 0 (neither pleasant nor unpleasant), and +5 (very pleasant). Participants responded by clicking on the Likert scale, which was followed by a 1000-ms ITI screen consisting of a white fixation cross over a grey background that preceded the next question. The ratings of all the outcomes were repeated five times, with the order randomized within each of the five cycles. The goal of these final outcome ratings was to make sure that the valence of the IAPS pictures remained intact despite their smaller scale and their short presentation. Because we were not interested in the arousal properties of the pictures, which in any case were roughly equated across outcomes, we did not ask the participants to rate how aroused they were by the pictures. Once these valence ratings were completed, the participants were presented with a debriefing screen informing them of the intent of the experiment.

Data Analysis

Although the Likert scale used to measure participants’ ratings is formally an ordinal scale, parametric statistics were used to analyze ratings, as is often done (see Maia et al., 2018, for a critique of this approach). For each participant in each condition (Ctr, Ext, CC, and NFE), a mean prediction rating and a mean valence rating were computed based on the three prediction and valence ratings of that condition. The X vs. W valence ratings were used to compute the valence of X. If a participant indicated a preference for X over W, the valence of X was simply the rating chosen on the Likert scale. If, on the contrary, the participant indicated a preference for W over X, the valence assigned to X was the rating given to W with the sign reversed. Hence, every X vs. W valence rating yielded a valence for X ranging from −5 to −1 (when the participant preferred W over X), equal to 0 (when the participant was indifferent between the two stimuli), or ranging from +1 to +5 (when the participant preferred X over W). The other valence ratings were only collected as decoys and were not analyzed. For each stimulus presented during the final outcome rating phase, a mean rating was computed based on the five ratings the participant gave for that outcome.
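The conversion of an X vs. W preference rating into a signed valence score for X can be summarized in a few lines. This is an illustrative sketch of the scoring rule described above, not the authors' analysis code.

```python
# Sketch of how a single X vs. W preference rating is converted into a signed
# valence score for X, as described above (illustrative, not the original analysis code).
def valence_of_x(preferred, strength):
    """preferred: 'X', 'W', or None (no preference); strength: the 1-5 rating chosen."""
    if preferred is None:
        return 0                                           # indifference between X and W
    return strength if preferred == "X" else -strength     # sign reversed when W is preferred
```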

Inferential analysis of ratings was carried out using repeated-measures analyses of variance (ANOVA). The Huynh-Feldt correction was applied when the sphericity assumption was not met. Partial eta-squared (ηp²) with a 90% confidence interval (CI), computed using resources from Nelson (2016), was used as the measure of effect size for each ANOVA. Based on Steiger’s (2004) analysis, which points out that the one-tailed nature of hypothesis tests based on the F distribution supports the use of 90% (two-sided) CIs, we used 90% CIs for ηp² following ANOVAs with alphas of 0.05 throughout the present study. Cohen’s d was used as a measure of effect size whenever two conditions were compared. Following Cumming’s (2012) recommendation, the unbiased estimate of Cohen’s d in the population (sometimes called Hedges’ g) was used and computed with the formula:

$$\hat{d} = \left(1 - \frac{3}{4\,df - 1}\right) \times \frac{M_{\mathrm{diff}}}{\sqrt{\dfrac{s_1^2 + s_2^2}{2}}}$$

where the second factor divides the mean of the difference scores across participants ($M_{\mathrm{diff}}$) by an estimate of the standard deviation based on the pooled variances of the two conditions, $\sqrt{(s_1^2 + s_2^2)/2}$, and df is the degrees of freedom on which a paired-samples t-test is based (n − 1). The first factor applies a correction for estimating d from a sample. 95% CIs for Cohen’s d in paired designs can only be approximated. Cousineau and Goulet-Pelletier (2021) reviewed eight different methods to achieve this goal and concluded that the best one is the adjusted lambda-prime method, which we implemented using the ESCI module in JAMOVI (www.jamovi.com). Error bars in graphs are always 95% CIs computed using Student’s t distribution.
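For clarity, the effect-size computation described above can be expressed as a short function. This is an illustrative sketch (assuming NumPy), not the authors' analysis scripts, and it covers only the point estimate; the adjusted lambda-prime CIs were obtained with the ESCI module in JAMOVI, as noted.

```python
# Sketch of the paired-design effect-size estimate described above (illustrative code,
# not the authors' analysis scripts): the mean difference is divided by the square root
# of the average of the two conditions' variances, then the small-sample correction is applied.
import numpy as np

def estimated_d(cond1, cond2):
    cond1, cond2 = np.asarray(cond1, float), np.asarray(cond2, float)
    diff = cond1 - cond2
    df = len(diff) - 1                                    # degrees of freedom of the paired t-test
    pooled_sd = np.sqrt((cond1.var(ddof=1) + cond2.var(ddof=1)) / 2)
    correction = 1 - 3 / (4 * df - 1)                     # small-sample bias correction (Hedges' g)
    return correction * diff.mean() / pooled_sd
```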

Transparency and openness statement

All raw data from these experiments and the Python computer code for Experiment 1 are available at https://orb.binghamton.edu/psych_fac/19/ and upon request from Jeremie Jozefowiez or Ralph Miller. All the stimuli are available with the Python program at https://orb.binghamton.edu/psych_fac/19/. We report how we determined our sample size, all data exclusions, all manipulations, and all measures in the study.

Results

Outcome Valences

Figure 2 depicts the mean ratings for the valence of the outcomes for each condition. Quite clearly, the IAPS pictures retained their emotional properties despite their reduced size and short presentation time. The POS outcomes were rated as highly positive in all conditions, whereas the NEG outcomes were rated as highly negative. The NEUT outcomes were rated slightly positive.

Figure 2.

Figure 2.

Mean valence ratings for the outcomes sampled from the IAPS as a function of condition in Experiment 1. Error bars are 95% CIs. NEG, NEUT, and POS indicate aversive, neutral, and appetitive images, respectively. Ctr, Ext, CC, and NFE indicate control, extinction, counterconditioning, and novelty-facilitated extinction, respectively.

Prediction ratings

Efficacy and Context Sensitivity of Ext, CC, and NFE.

The top panel of Figure 3 shows the mean prediction ratings as a function of the conditions. Let us first focus on the X-NEG ratings. A repeated-measures ANOVA using the testing context (ABA vs. ABB) and the type of interference treatment (Ctr vs. Ext vs. CC vs. NFE) as factors found a main effect of the context manipulation, F(1, 98) = 69.98, p < 0.001, ηp² = 0.42, 90% CI [0.29, 0.51], and of the type of interference, F(1.81, 177.38) = 118.05, p < 0.001, ηp² = 0.55, 90% CI [0.46, 0.61], but failed to detect an interaction between these two factors, F(2.90, 284.20) = 2.34, p = 0.08, ηp² = 0.02, 90% CI [0, 0.05].

Figure 3.

Figure 3.

Top: Mean X-NEG, X-NULL, X-POS, and X-NEUT prediction ratings as a function of condition in Experiment 1. Bottom: Valence of X, corrected for W, as a function of conditions in Experiment 1. Error bars are 95% CIs. ABA and ABB indicate the contexts of acquisition (A), response-degrading treatment (B), and test (A or B), respectively. Negative values indicate aversive valence ratings and positive values indicate greater perceived pleasantness.

Table 4 lists the results of comparisons between conditions. Across different response-reducing treatments (Ext, CC, and NFE), ratings were reliably higher in Context A than in Context B. All three response-reducing treatments (Ext, NFE, and CC) resulted in significantly lower ratings than the control (Ctr) treatment.

Table 4:

Results of Experiment 1.

Measure Comparison Mean difference Cohen’s d Between-measurement r

Prediction A vs. B 33.70 [25.70, 41.70] 1.45 [1.09, 1.84] −0.50
Prediction Ctr vs. Ext 28.60 [23.50, 33.70] 1.56 [1.26, 1.89] 0.02
Prediction Ctr vs. CC 32.40 [27.20, 37.50] 1.81 [1.49, 2.17] −0.06
Prediction Ctr vs. NFE 33.20 [28.00, 38.30] 1.84 [1.52, 2.20] −0.03
Prediction Ext vs. CC 3.77 [1.11, 6.44] 0.23 [0.07, 0.40] 0.66
Prediction Ext vs. NFE 4.58 [1.66, 7.50] 0.28 [0.10, 0.46] 0.60
Prediction CC vs. NFE 0.81 [−1.51, 3.12] 0.05 [−0.09, 0.20] 0.73
Valence Ctr vs. Ext −0.34 [−0.63, −0.06] −0.13 [−0.24, −0.02] 0.85
Valence Ctr vs. CC −0.78 [−1.17, −0.39] −0.30 [−0.46, −0.15] 0.71
Valence Ctr vs. NFE −0.27 [−0.67, 0.13] −0.10 [−0.26, 0.05] 0.70
Valence Ext vs. CC −0.43 [−0.77, −0.10] −0.17 [−0.30, −0.04] 0.78
Valence Ext vs. NFE 0.07 [−0.30, 0.45] 0.03 [−0.12, 0.17] 0.74
Valence CC vs. NFE 0.51 [0.14, 0.87] 0.20 [0.06, 0.35] 0.73

Note. Each mean difference reports the mean of the differences in ratings between the specified conditions. Brackets contain the upper and lower limits of 95% confidence intervals for either the mean difference or estimate of effect size (d, see the Methods section). Between-measurement r reports the correlation between participants’ scores in the comparisons. For valence comparisons, negative values indicate that the first listed condition (e.g., “Ext” in “Ext vs. CC”) produced ratings corresponding to greater unpleasantness. Except for the A vs. B comparison, all comparisons pooled ratings at test in contexts A and B.

Prediction Ratings for Other Outcomes.

Figure 3 also shows the mean prediction ratings for the POS, NEUT, and NULL outcomes as well as the NEG outcome. Besides the X-NEG ratings, only the X-NULL ratings in Ext, the X-POS ratings in CC, and the X-NEUT ratings in NFE differed meaningfully from 0. The NULL outcome in the Ext condition, the POS outcome in the CC condition, and the NEUT outcome in the NFE condition all have a special status in that, just like the NEG outcome, they had been paired with X, albeit in Phase 2. In the Ctr condition, in which no outcome other than NEG was paired with X, no similar pattern could be observed. The mean X-NULL rating was a bit higher than the other ones and was slightly affected by the test context, but this difference was small compared to what happened to the X-NULL ratings in Ext.

The test context affected the X-NULL rating in Ext, the X-POS rating in CC, and the X-NEUT rating in NFE, but the effect was opposite the one observed for the X-NEG rating. In summary, when the X-NEG rating in Ext (or CC or NFE) increased, the X-NULL rating (or X-POS rating or X-NEUT rating, respectively) decreased. Figure 4, which depicts the relationship between the X-NEG ratings and the X-NULL (and X-POS and X-NEUT) ratings in Ext (and CC and NFE, respectively) shows that this relation is not an artifact of averaging across participants.

Figure 4.

Figure 4.

Correlation between the X-NEG prediction ratings and the X-NULL (X-POS, and X-NEUT, respectively) prediction ratings in Ext (CC, and NFE, respectively) in Experiment 1.

A striking feature of Figure 4 is that some of the points fall along the line y = 100 − x, with x being the value of the X-NEG prediction rating and y being the value of the X-NULL, X-POS, or X-NEUT rating depending on whether the condition is Ext, CC, or NFE. This is because some participants seemingly allocated their prediction ratings across the various outcomes so that they summed to 100%.

Valence ratings

The bottom panel of Figure 3 shows the mean valence ratings of X (relative to W) across conditions as a function of the test context and the type of interference treatment. A repeated-measures ANOVA using the test context (Context A vs. Context B) and the type of interference (Ctr vs. Ext vs. CC vs. NFE) as factors found a main effect of the type of interference, F(2.79, 272.97) = 6.31, p < 0.001, ηp² = 0.06, 90% CI [0.02, 0.10], but did not detect an effect of the context, F(1, 98) = 3.40, p = 0.07, ηp² = 0.03, 90% CI [0.00, 0.11], nor an interaction between the two factors, F(3, 294) = 0.79, p = 0.50, ηp² = 0.008, 90% CI [0.00, 0.02]. The results of comparisons between conditions are summarized in Table 4. Importantly, valence ratings pooled across Contexts A and B were higher (i.e., less aversive) after Ext and CC than after the control treatment. Moreover, the analysis failed to detect a reliable change in valence resulting from NFE treatment. The valence of X was higher after CC than after either Ext or NFE, and there was no detectable difference between Ext and NFE. Thus, CC, but not NFE, reliably increased valence ratings at test relative to Ext.

Discussion

The goal of this experiment was to directly compare the efficacies of Ext, CC, and NFE in reducing both the valence of a CS previously paired with an aversive outcome (evaluative conditioning) and the predicted likelihood of that aversive outcome given the CS (expectancy learning) at tests in both the context of target association degradation (ABB) and the context of acquisition (ABA).

Examining the prediction ratings, the expression of the target X-NEG association was robustly altered by Ext, CC, and NFE, with both CC and NFE being a little more effective. That is, CC and NFE produced lower prediction ratings than Ext, and no appreciable difference was observed between CC and NFE. All conditions (Ctr, Ext, CC, and NFE) were susceptible to large context effects (estimated d between 1.0 and 1.2).

Examining the valence ratings, the data suggest that CC and Ext increased the valence of X relative to Ctr and that the effect of CC was stronger than that of either Ext or NFE. In contrast, there was no detectable context effect, although, at the descriptive level, the valence of the cue was slightly higher when testing occurred in Context B than in Context A, as one would expect. We will postpone further discussion of these conclusions until the data from Experiment 2 have been described. Discussing the results of the two experiments together will remove some of the ambiguities that arise when each experiment is considered by itself.

Experiment 2

The greater efficacy of CC relative to Ext in lowering the prediction ratings observed in Experiment 1 is inconsistent with a result we repeatedly observed in prior experiments that contrasted Ext and CC using the streaming procedure (Jozefowiez et al., 2020). In that series, we found that Ext was more effective than CC, except in one experiment in which CC proved as effective as Ext. We hypothesized that this was because, in that experiment, conditions allowed participants to better discriminate between the initial US paired with the target CS and the interfering US used in CC. Likewise, in Experiment 1, asking the participants to rate the X-POS, X-NEUT, and X-NULL relations in addition to the X-NEG relation should have highlighted the difference in outcomes between Phases 1 and 2, creating exactly the kind of situation hypothesized by Jozefowiez et al. to boost the efficacy of CC relative to Ext, especially as all the outcomes were simultaneously present on screen during the prediction ratings. The same reasoning would also apply to NFE, which could explain why both CC and NFE were more effective at altering the prediction ratings than Ext.

If this hypothesis were correct, the greater efficacies of CC and NFE over Ext observed for the prediction ratings in Experiment 1 should not be observed if participants were probed only about the X-NEG association. Experiment 2 was designed to test this prediction. It replicated Experiment 1, except that during the prediction ratings, the participants only rated how likely it was for X to be followed by NEG. We expected this manipulation to bridge the gap in efficacy between Ext and CC/NFE, perhaps making Ext even more effective than CC and NFE.

Methods

Participants

Two hundred thirty-four participants, recruited from the Binghamton University subject pool, completed the task. Seventy-seven failed the warmup, 2 failed an attention screen, and 10 indicated they were distracted during the experiment, leaving us with data from 145 participants (57 males, 87 females, and 1 participant who did not identify as either male or female). The mean age was 18.91 ± 0.97 years, ranging from 18 to 22 years old.

Apparatus and stimuli

Because of the COVID-19 pandemic, data collection took place online. The stimuli for Experiment 2 were identical to the ones used in Experiment 1. The Gorilla Experiment Builder (www.gorilla.sc) was used to create and host the experiment (Anwyl-Irvine et al., 2020). We replicated the look and feel of the Python program used in Experiment 1 as closely as possible within the Gorilla system. However, at the end of Experiment 2, we added screens for attention and distraction because the experiment was conducted online.

Procedure

The procedure was identical to that of Experiment 1 with the following exceptions:

  1. During the prediction ratings, participants were asked only about the X-NEG association, with the negative outcome being presented in the middle of the screen.

  2. Participants were screened during the warmup on their ability to correctly rate only the X-O1 association, instead of the three associations used in Experiment 1. We consider the potential effects of this change in elimination criteria in the General Discussion. During the warmup procedure, the prediction ratings occurred before the valence ratings for the positive warmup and null-warmup1 streams, whereas they took place after the valence ratings for the negative and null-warmup2 streams.

  3. We eliminated cue W as a reference stimulus for cue X because its inclusion resulted in neither stronger effects nor greater statistical sensitivity than was observed in experiments that lacked a reference stimulus (e.g., Jozefowiez et al., 2020). In Experiment 2, the participants directly rated how they felt about X. The valence rating used the same layout as in Experiment 1 (Figure 1), but the question now read: “How pleasant or unpleasant is this image for you?” The Likert scale used to answer was modified accordingly: it ranged from −5 to +5 and was anchored at −5 (“Very unpleasant”), 0 (“Neither pleasant nor unpleasant”), and +5 (“Very pleasant”).

  4. In an attempt to make the experiment shorter, cue presentations were 800 ms in duration (instead of 1000 ms in Experiment 1) and were coterminous with the 400-ms outcome (on cue-outcome trials, versus 600 ms in Experiment 1). Trials were separated by a 400-ms ITI (versus 500 ms in Experiment 1). Moreover, each octet of conditions was presented only twice instead of three times because analysis of the data from Experiment 1 revealed that the third presentation did not enhance sensitivity.

  5. During the final outcome ratings, an undetected programming error led to the outcome being shown along with the Likert scale.

The data were analyzed using the same method as in Experiment 1.

Transparency and openness statement

The raw data for Experiment 2 is available at https://orb.binghamton.edu/psych_fac/19/. The code used to run that experiment is available at https://app.gorilla.sc/openmaterials/512932. The stimuli are available with the program at https://app.gorilla.sc/openmaterials/512932.

Results

Outcome valences

Figure 5 shows the valence ratings of the outcomes as a function of the condition. Valence ratings of outcomes were similar to those observed in Experiment 1, indicating that the SUNY-Binghamton population rated the IAPS images similarly online to the way they did in person (Experiment 1).

Figure 5.

Figure 5.

Mean valence ratings for the outcomes as a function of condition in Experiment 2. Error bars are 95% CIs.

Prediction ratings

The top panel of Figure 6 shows the prediction rating for the X-NEG relation as a function of condition. A repeated-measures ANOVA using the test context (Context A vs. Context B) and the type of interference (Ctr vs. Ext vs. CC vs. NFE) as factors found a main effect of the test context, F(1, 144) = 50.15, p < 0.001, ηp2=0.26, 90% CI [0.16, 0.35], of the type of interference, F(2.41, 347.47) = 40.47, p < 0.001, ηp2=0.22, 90% CI [0.15, 0.27], and an interaction between the two factors, F(3, 432) = 2.80, p < 0.05, ηp2=0.02, 90% CI [0.0004, 0.04].

Figure 6.

Figure 6.

Top: Mean prediction ratings as a function of condition in Experiment 2. Bottom: Mean valence ratings of X as a function of condition in Experiment 2. Error bars are 95% CIs.

Table 5 provides a list of the relevant contrasts necessary to interpret the conclusions of the ANOVA. The interaction between the test context and the type of interference was due to Ctr being less impacted by the context switch than were Ext, CC, and NFE. Whether testing occurred in Context A or Context B, the prediction ratings were higher in Ctr than in Ext, CC, or NFE, whereas there were no significant differences between Ext, CC, and NFE. If we define the context effect as the difference between the Context A rating and the Context B rating, a context effect was observed in all conditions, but it was weaker in Ctr than in Ext, CC, and NFE. Critically, there was no difference between Ext, CC, and NFE.

Table 5:

Results of Experiment 2

Measure Context Comparison Mean difference Cohen’s d Between-measurement r

Prediction A Ctr vs. Ext 11.60 [7.54, 15.60] 0.53 [0.35, 0.72] 0.38
Prediction A Ctr vs. CC 12.30 [8.12, 16.60] 0.54 [0.36, 0.73] 0.40
Prediction A Ctr vs. NFE 13.10 [8.97, 17.20] 0.58 [0.40, 0.77] 0.42
Prediction A Ext vs. CC 0.76 [−2.27, 3.79] 0.03 [−0.09, 0.14] 0.75
Prediction A Ext vs. NFE 1.48 [−1.21, 4.17] 0.06 [−0.05, 0.16] 0.80
Prediction A CC vs. NFE 0.72 [−2.47, 3.92] 0.03 [−0.09, 0.15] 0.73
Prediction B Ctr vs. Ext 17.90 [12.90, 22.90] 0.60 [0.43, 0.78] 0.48
Prediction B Ctr vs. CC 18.10 [12.90, 23.20] 0.62 [0.44, 0.80] 0.43
Prediction B Ctr vs. NFE 19.50 [14.00, 25.10] 0.64 [0.46, 0.83] 0.39
Prediction B Ext vs. CC 0.21 [−4.06, 4.47] 0.01 [−0.13, 0.14] 0.65
Prediction B Ext vs. NFE 1.66 [−2.78, 6.09] 0.05 [−0.09, 0.19] 0.64
Prediction B CC vs. NFE 1.45 [−3.08, 5.98] 0.05 [−0.10, 0.19] 0.62
Prediction A vs. B Ctr 12.00 [7.00, 16.90] 0.51 [0.30, 0.73] 0.18
Prediction A vs. B Ext 18.20 [12.10, 24.40] 0.64 [0.42, 0.87] 0.13
Prediction A vs. B CC 17.70 [12.20, 23.20] 0.61 [0.42, 0.81] 0.34
Prediction A vs. B NFE 18.40 [12.70, 24.10] 0.62 [0.42, 0.82] 0.32
Prediction A vs. B Ctr vs. Ext 6.28 [1.01, 11.50] 0.18 [0.03, 0.34] 0.57
Prediction A vs. B Ctr vs. CC 5.72 [0.68, 10.80] 0.18 [0.02, 0.34] 0.54
Prediction A vs. B Ctr vs. NFE 6.45 [1.27, 11.60] 0.20 [0.04, 0.36] 0.54
Prediction A vs. B Ext vs. CC −0.55 [−5.62, 4.52] −0.02 [−0.16, 0.13] 0.63
Prediction A vs. B Ext vs. NFE 1.17 [−5.00, 5.35] 0.01 [−0.14, 0.15] 0.62
Prediction A vs. B CC vs. NFE 0.72 [−4.50, 5.95] 0.02 [−0.13, 0.17] 0.57
Valence Both Ctr vs. Ext −0.05 [−0.26, 0.16] −0.02 [−0.13, 0.08] 0.78
Valence Both Ctr vs. CC −0.45 [−0.75, −0.16] −0.24 [−0.39, −0.08] 0.56
Valence Both Ctr vs. NFE −0.09 [−0.30, 0.12] −0.05 [−0.16, 0.06] 0.77
Valence Both Ext vs. CC −0.41 [−0.69, −0.12] −0.22 [−0.37, −0.07] 0.60
Valence Both Ext vs. NFE −0.04 [−0.25, 0.16] −0.02 [−0.13, 0.08] 0.78
Valence Both CC vs. NFE 0.36 [0.13, 0.60] 0.19 [0.07, 0.32] 0.70

Note. Each mean difference reports the mean of the differences in ratings between the specified conditions. Brackets contain the lower and upper limits of the 95% confidence interval for the mean difference or for the effect size (Cohen’s d; see the Methods section). Between-measurement r reports the correlation between participants’ scores in the two measurements being compared. For valence comparisons, negative values indicate that the first listed condition (e.g., ‘Ext’ in ‘Ext vs. CC’) produced ratings corresponding to greater unpleasantness. All A vs. B contrasts refer to the size of the context effect. For instance, ‘A vs. B Ctr’ compares the ratings in the Ctr condition when testing occurred in Context A with the ratings in the Ctr condition when testing occurred in Context B, and ‘A vs. B Ctr vs. Ext’ compares the ‘A vs. B Ctr’ difference with the ‘A vs. B Ext’ difference.
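
As an illustration of how the entries in Table 5 can be derived from per-participant ratings, the sketch below computes a mean difference with its 95% confidence interval, a paired Cohen’s d, and the between-measurement correlation for one contrast. This is our own minimal sketch: the exact d estimator and its confidence interval in Table 5 follow the Methods section (Cousineau & Goulet-Pelletier, 2021) and may differ slightly from the common paired-samples formulation used here, and the variable names are hypothetical.

# Sketch of one Table 5 row from two arrays of per-participant ratings.
# The d estimator here (mean difference over the pooled SD of the two conditions)
# is a common paired-samples choice and is only an approximation to the estimator
# specified in the Methods section.
import numpy as np
from scipy import stats

def contrast_row(a, b):
    """Mean difference with 95% CI, paired Cohen's d, and between-measurement r."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    diff = a - b
    n = diff.size
    m = diff.mean()
    se = diff.std(ddof=1) / np.sqrt(n)
    half = stats.t.ppf(0.975, n - 1) * se              # half-width of the 95% CI
    d = m / np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    r, _ = stats.pearsonr(a, b)                        # between-measurement correlation
    return {"mean_diff": m, "ci95": (m - half, m + half), "d": d, "r": r}

# Hypothetical usage, e.g., Ctr vs. Ext prediction ratings in Context A:
# print(contrast_row(ratings_ctr_A, ratings_ext_A))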

Valence ratings

The bottom panel of Figure 6 shows the mean valence ratings of X for each experimental condition. A repeated-measures ANOVA using the test context (Context A vs. Context B) and the type of interference (Ctr vs. Ext vs. CC vs. NFE) as factors found a main effect of the test context, F(1, 144) = 21.23, p < 0.001, ηp² = 0.13, 90% CI [0.05, 0.21], and of the type of interference, F(2.54, 366.02) = 5.79, p < 0.01, ηp² = 0.04, 90% CI [0.009, 0.07], but no interaction between the two factors, F(3, 432) = 0.50, p = 0.69, ηp² = 0.003, 90% CI [0.00, 0.01].

Table 5 shows the relevant contrasts necessary to interpret the results of the ANOVA. The valence of X was higher when testing occurred in Context B than in Context A. The valence of X in Ctr was lower than in CC, but there was no detectable difference between Ctr and either Ext or NFE. The valence of X in CC was higher than in either Ext or NFE, whereas there was no detectable difference between Ext and NFE.

Discussion

Prediction ratings

Regarding the prediction ratings, Experiment 2 found that (a) Ext, CC, and NFE lowered the X-NEG prediction ratings relative to Ctr; (b) Ext, CC, and NFE did not differ from one another in that regard; (c) independent of condition (Ctr, Ext, CC, NFE), the prediction ratings were lower when testing occurred in Context B than in Context A; (d) Ext, CC, and NFE were more sensitive than Ctr to this context effect; and (e) Ext, CC, and NFE did not differ from one another in their sensitivity to the context.

A difference between Experiments 1 and 2 concerns the context effect. In Experiment 2, Ext, CC, and NFE were more sensitive to the context switch than Ctr, which corresponds to our working definition of renewal. We did not observe this in Experiment 1, although at the descriptive level the context effect in Experiment 1 was also smaller in Ctr than in Ext, CC, or NFE. Moreover, Experiment 2 was a replication of an experiment that we previously conducted online with 99 participants recruited on Prolific. We do not report that experiment in the present article because, due to a few programming errors, doing so would have distracted from the focus of this report; its data are available at https://orb.binghamton.edu/psych_fac/17, and the Gorilla program can be accessed at https://app.gorilla.sc/openmaterials/512932. The results of that experiment were identical to those of Experiment 2, including the smaller context effect in Ctr. The most parsimonious explanation for this pattern of results across the three experiments is that our failure to detect the smaller context effect in Ctr relative to Ext, CC, and NFE in Experiment 1 was a Type II error, though, of course, this kind of argument is always debatable.

Valence ratings

Concerning the valence ratings, a strong context effect was observed, with X having a higher valence (i.e., being less unpleasant) when testing occurred in Context B than in Context A (see the bottom panel of Figure 6), as was expected because X was never paired with the NEG outcome in B (a nonsignificant tendency in the same direction, .05 < p < .10, was seen in Experiment 1). However, there was no evidence that this context effect varied as a function of the type of interference; notably, the data do not refute the hypothesis that the context effect was of the same magnitude in Ctr as in Ext, CC, and NFE. We detected no statistically significant effect of context on the cue valence ratings in Experiment 1, but again, this may reflect inadequate statistical power: at the descriptive level, the mean valence of X was higher (less aversive or more appetitive) when testing occurred in B than in A.

In both experiments, CC consistently changed the valence of X relative to Ctr. In contrast, we found no evidence that NFE altered the valence of X relative to Ctr. The conclusions regarding Ext are more ambiguous. In Experiment 1, although not as effective as CC, Ext still altered the valence of X relative to Ctr and did not differ statistically from NFE. In Experiment 2, there was no statistical evidence that Ext altered the valence of X. One way to explain this pattern is to assume that the potential of Ext to alter the valence of X was intermediate between that of CC (which was very effective) and that of NFE (which was ineffective), while also being less consistent than CC in this regard.

While the results of Experiment 2 agree with those of Experiment 1 at a qualitative level, one might be tempted to compare them at a quantitative level. However, such a comparison would provide little information for two reasons. First, the valence ratings are quite noisy, which makes between-experiment comparisons, which are necessarily between-subjects, largely uninformative. Second, any quantitative difference between the two studies would be hard to interpret because they did not probe the valence of X in the same way, and one study was conducted onsite whereas the other was conducted online. In our experience, and as one should expect, online testing leads to smaller effect sizes, both because of an increase in the variability of the data and a decrease in the difference between the condition means; the resulting loss of statistical power can usually be compensated for by recruiting a larger number of participants in an online study.
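
As a rough illustration of this sample-size compensation, the following sketch (not an analysis we performed; the effect sizes are hypothetical) uses a standard paired-design power calculation to show that halving an effect size roughly quadruples the number of participants needed for the same power.

# Hypothetical power calculation for a paired (within-subject) contrast.
from statsmodels.stats.power import TTestPower

analysis = TTestPower()            # one-sample / paired t test on difference scores
for d in (0.5, 0.25):              # hypothetical onsite vs. online effect sizes
    n = analysis.solve_power(effect_size=d, alpha=0.05, power=0.80)
    print(f"d = {d:.2f}: about {n:.0f} participants for 80% power")
# Roughly 34 participants at d = 0.50 versus roughly 128 at d = 0.25.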

General Discussion

In two experiments, we examined the efficacies of Ext, CC, and NFE and their susceptibilities to contextual manipulations. Associative learning was assessed through expectancy learning (how likely is the US to follow the CS?) and evaluative conditioning (how does the participant feel about the CS?).

Expectancy learning.

CC and NFE produced similar changes in outcome expectancy ratings. Depending on the way the participants’ knowledge of the cue-outcome contingency was probed, Ext was either less effective than CC and NFE (Experiment 1) or equally effective (Experiment 2). Expectancy learning following Ext, CC, and NFE was equally sensitive to the context manipulation in the two experiments. In Experiment 2, all three interference treatments were more sensitive to that manipulation than was the Ctr condition, although this was true only at the descriptive level in Experiment 1. As the results of Experiment 2 were replicated in another study, we think we have grounds to argue that the smaller context effect in the Ctr condition observed in Experiment 1 is real despite not reaching the statistical significance threshold.

Evaluative conditioning.

CC increased the valence of X from the negative value acquired in Phase 1 and was more effective at doing so than either Ext or NFE. Moreover, the valence ratings in the NFE, CC, and Ext conditions were all affected by the context change to roughly the same degree as in the Ctr condition.

In retrospect, the greater efficacy of CC relative to NFE and Ext in reducing evaluative conditioning but not expectancy learning is not surprising. CC pairs the target cue with an outcome of the opposite valence, whereas neither NFE nor Ext does this. Outcome valence may be unimportant in driving changes in outcome expectancy because such changes depend on whether the target outcome occurs after the cue rather than on the affective value of that outcome. Aust et al. (2019) and Lipp and Purkis (2005) proposed that the greater resistance of evaluative conditioning than of outcome expectancy to response-attenuating treatments like extinction results from differences in the test questions used to measure the two: expectancy probes draw mainly on recent memories, whereas evaluative probes presumably draw on all relevant memories with approximately equal weight. Their account, however, does not explain the differences we observed among NFE, CC, and Ext.

The primary goal of the present experiments was to assess Dunsmoor et al.’s assertion that NFE is more resistant to recovery than Ext, while including CC in that assessment. The present data do not support the assumed generality of previous reports concerning NFE (Dunsmoor et al., 2015, 2019; see also Lucas et al., 2018). In our procedure, when an effect of NFE was observed at all, it proved as susceptible as Ext and CC to recovery effects in expectancy learning. The ineffectiveness of NFE in reducing evaluative conditioning prevents us from making a strong claim about emotional recovery after NFE. Among the many possible reasons for this discrepancy is that we examined ABA renewal, whereas previous researchers looked at spontaneous recovery (Dunsmoor et al.) and reinstatement (Lucas et al.). However, it should be noted that Quintero et al. (2024) failed to replicate Dunsmoor et al.’s results regarding the lesser susceptibility of NFE to spontaneous recovery. It is difficult to pinpoint a reason for these discrepancies. The uncomfortable truth might be that, like many other Pavlovian phenomena, including major ones such as blocking (Maes et al., 2016), conclusions regarding NFE are highly task and parameter dependent.

A final feature of the data worth mentioning is the dissociation between expectancy learning and evaluative conditioning, in the sense that conclusions based on one measure cannot be predicted from the other. We observed a similar dissociation in our previous study comparing Ext and CC (Jozefowiez et al., 2020). Konorski (1967) and Wagner and Brandon (1989) have suggested that subjects learn several CS-US associations in Pavlovian conditioning, notably an association between the CS and the sensory features of the US (sensory CS-US association) and one between the CS and the emotional features of the US (emotional CS-US association). An intuitive way to explain the dissociation between expectancy learning and evaluative conditioning is to assume that the former probes the sensory CS-US association whereas the latter probes the emotional CS-US association. That the valence of the US has limited impact on expectancy learning but is critical for evaluative conditioning supports this view. The alternative would be to assume that there is only a single CS-US association whose behavioral consequences depend on how it is probed: different response rules would lead to different behavioral patterns depending on whether subjects are asked what they know about the CS-US contingency or how they feel about the CS. Without clear first principles for deriving such response rules, they would necessarily remain post hoc and arbitrary and, to account for the data, quite cumbersome. By contrast, the distinction between sensory and emotional associations is well established (see Bouton, 2018, for a review) and accounts for the irrational character of many of our emotional responses. In our opinion, this strongly favors the hypothesis that expectancy learning and evaluative conditioning probe different associations over the view that they probe the same association but induce different rules of expression for that association.

Contrasts with the existing literature

Our data also contradict the results of Holmes et al. (2016) and Kang et al. (2018). Holmes et al., using rats, found more recovery (renewal) after CC than after Ext, and Kang et al., using humans, observed less recovery (reinstatement and spontaneous recovery) after CC than after Ext. In contrast, we found strong evidence that, in our procedure, CC was as susceptible as Ext to context effects on both of our measures of associative learning. We have no explanation for these discrepancies, but they indicate that further work on the susceptibility of CC to context effects is needed.

As previously mentioned, evaluative conditioning has often been considered resistant to extinction (e.g., Baeyens et al., 1988; Díaz et al., 2005; Gawronski et al., 2015; Vansteenwegen et al., 2006). Our data are partially compatible with this conclusion in that the effect of Ext on the valence of X was decidedly less reliable than that of CC. In a previous series of experiments examining the difference between Ext and CC (Jozefowiez et al., 2020), we found that CC was more effective than Ext at altering the valence of the target cue, whereas the relative effectiveness of CC and Ext in interfering with expectancy learning varied as a function of several experimental parameters. Notably, when the Phase 1 outcome was also presented during a Phase 2 treatment consisting of either the target cue presented by itself (Ext) or the target cue paired with a new outcome (CC), CC was more effective than Ext; the reverse was true when the Phase 1 outcome was absent during Phase 2. The results of the present study are consistent with those observations. In influencing evaluative conditioning, CC was more effective than both Ext and NFE. In influencing expectancy learning, the relative effectiveness of CC, NFE, and Ext varied as a function of the method used to probe knowledge of the cue-outcome contingency.

Potential limitations

Unconventional US.

Most studies of aversive conditioning in humans use electric shock, loud noise, or an air puff as the US. One might wonder whether our conclusions would generalize to such paradigms, given that the IAPS images we used are not as aversive as those stimuli, especially electric shock. We note that Dunsmoor and his collaborators claimed broad generality for their findings, which nevertheless did not generalize to our preparations; this indicates that there are boundary conditions on their conclusions that need to be better understood before appropriate theoretical and practical implications can be drawn from their results. Moreover, the IAPS is a well-accepted tool for the induction of emotional reactions and the investigation of emotional processes (Bradley & Lang, 2007), its ability to engage brain areas involved in emotional processing is well documented (e.g., Sabatinelli, Bradley, Lang, Costa, & Versace, 2007; Sabatinelli, Bradley, Fitzsimmons, & Lang, 2005), and IAPS images are used as aversive USs in the fear-potentiated startle paradigm, a well-established procedure in the study of human aversive learning (e.g., Bradley et al., 2006; Vrana et al., 1988). Finally, in subsequent, as yet unpublished studies, we have been able to reproduce the small but reliable evaluative conditioning effects induced by the IAPS pictures, which further speaks to their potency as emotionally charged USs.

The choice of IAPS pictures as USs, with expectancy learning and evaluative conditioning as CRs, in Experiment 1 proved almost providential when the COVID-19 pandemic made it necessary to move online. This would not have been possible had we used a more traditional aversive US such as electric shock, or skin conductance responses (SCR) as a measure of learning, because both require the participants’ presence in the laboratory. COVID-19 has receded but, because of the replication crisis, the move toward online experimentation in psychology has not, in part because online recruitment permits larger samples and hence greater statistical power. Hence, we think one additional value of this report is the demonstration of a procedure allowing for the online study of associative learning with an emotional US. How well the conclusions drawn from such methods generalize to studies using more traditional aversive USs and to real-life anxiety disorders is an open question for further research.

High drop-out rate.

Another issue the reader might be concerned with is the large number of participants we had to discard because they either failed to meet the learning criterion during the warm-up or admitted to not being attentive during the study. In Experiment 1, the very strict learning criterion during the warm-up, which required the participants to correctly rate the X-O1, X-Null, and X-O2 associations, explains why so many participants failed to meet it. The drop-out rate in Experiment 2 seems more related to a motivational issue, which can hardly be avoided in long studies of this sort. We have good reason to believe that this attrition did not alter the conclusions. As already mentioned, Experiment 2, which used Binghamton University students working for course credit, was a replication of a study run with Prolific participants, who have a reputation for being ‘good’ participants in the sense that they genuinely try to engage with their studies. Indeed, whereas the elimination rate was 38.03% for the Binghamton University students, it was only 15.30% for the Prolific participants. Despite this difference, the results of the two experiments were identical.

Did we observe renewal?

We observed reliable context effects in Experiments 1 and 2 for both the prediction and the valence ratings, but do they qualify as renewal? Doubts about this might be raised because the Ctr condition was also modulated by the context, whereas, at least in animal experiments, only the interference condition is typically modulated by the context.

Let us assume that the observed context modulation is the sum of two components: on the one hand, a general component equally affecting all conditions, which can be explained either by appealing to Tulving’s encoding specificity principle or, in Rescorla-Wagner fashion, by a Context A-NEG association summing with the X-NEG association when testing occurred in Context A; on the other hand, a component specifically affecting the interference conditions and potentially resulting from inhibition of the X-NEG association by Context B. Only the latter component would constitute genuine renewal as the term is ordinarily used.

In this framework, renewal is observed in an interference condition if the context effect is larger in that condition than in the Ctr condition. This is certainly the case for Ext, CC, and NFE in Experiment 2 as far as the prediction ratings are concerned. In Experiment 1, the context effect in the Ctr condition was smaller than the context effect in Ext, CC, and NFE at the descriptive level, but the conditions did not differ at the inferential level. Based on the data from Experiment 2 and its unreported replication, we have argued that this lack of statistical significance was a Type II error and that renewal was present in Experiment 1 as well.
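
A minimal sketch of this renewal criterion, assuming per-participant rating arrays (the variable names are ours and hypothetical), is shown below: each participant’s context effect is computed as the Context A rating minus the Context B rating, and the interference condition is then compared with Ctr on that difference score.

# Sketch of the renewal criterion described above. Not the authors' analysis code.
import numpy as np
from scipy import stats

def context_effect(ratings_A, ratings_B):
    """Per-participant context effect: rating in Context A minus rating in Context B."""
    return np.asarray(ratings_A, float) - np.asarray(ratings_B, float)

def renewal_test(interf_A, interf_B, ctr_A, ctr_B):
    """Is the context effect larger after an interference treatment than in Ctr?"""
    extra = context_effect(interf_A, interf_B) - context_effect(ctr_A, ctr_B)
    t, p = stats.ttest_1samp(extra, 0.0)     # H0: no extra context effect beyond Ctr
    return extra.mean(), t, p

# Hypothetical usage: renewal_test(ext_A, ext_B, ctr_A, ctr_B) for renewal after Ext.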

A further argument in favor of renewal being present in Experiment 1 takes advantage of the participants’ having rated not only the X-NEG association but also the associations between X and all the other outcomes shown in a stream (NULL, POS, NEUT). Let us assume that the observed context effects were caused purely by associations between the contexts and the outcomes summing with the association between X and the outcome at test, rather than by a modulation of the association between X and the outcomes, which would constitute genuine renewal. In this case, Context A would be strongly associated with NEG while Context B would be strongly associated with NULL, POS, and NEUT. Hence, we should have observed (a) a decrease in the X-NEG ratings in Context B, as the A-NEG association could not sum with the X-NEG association in Context B; (b) an increase in the rating of the X-NULL (X-POS, X-NEUT, respectively) association in Ext (CC, NFE, respectively), as it sums with the B-NULL (B-POS, B-NEUT, respectively) association; and (c) a smaller increase in the ratings of the X-POS and X-NEUT associations in Ext, of the X-NULL and X-NEUT associations in CC, and of the X-NULL and X-POS associations in NFE, due to the associations existing between Context B and these outcomes. As shown in the top panel of Figure 3, this last prediction is refuted by the data: only the ratings for the outcomes paired with X were affected by the context switch; the ratings for the other outcomes remained close to 0 in both contexts.

Importantly, none of these arguments applies to the valence ratings. Because we failed to detect a statistically significant difference between Ctr and the three interference conditions in the valence ratings, we cannot refute the hypothesis that the context effects on the valence ratings do not constitute genuine renewal but only a general impact of the context on stimulus valence. As the valence ratings were noisier than the prediction ratings, this possibly reflects a power issue. At the descriptive level, the effect size for the context effect on the valence ratings in the Ctr condition (expressed as Cohen’s d; Experiment 1: 0.05, 95% CI [−0.11, 0.22]; Experiment 2: 0.11, 95% CI [−0.03, 0.26]) was smaller than in either Ext or CC (Experiment 1: Ext = 0.17, 95% CI [0.01, 0.34]; CC = 0.14, 95% CI [−0.04, 0.32]; Experiment 2: Ext = 0.22, 95% CI [0.08, 0.37]; CC = 0.23, 95% CI [0.08, 0.38]) but at the same level as in NFE (Experiment 1: 0.02, 95% CI [−0.20, 0.17]; Experiment 2: 0.17, 95% CI [0.03, 0.32]).

Concluding statement

Finding ways to reduce renewal following associative interference has been a goal of considerable research for many years. A solution to this problem has proven elusive, possibly because contextualization of ambiguous learning is highly functional in many situations and usually leads to adaptive behavior. Based on our results, which confirm an earlier report by Quintero et al. (2024), NFE is as susceptible to contextualization as Ext and CC. This suggests that the lesser susceptibility of NFE to recovery effects reported by Dunsmoor et al. (2015) is in all probability highly task and parameter dependent. Thus, NFE is no magic bullet against renewal.

Acknowledgments

This research was supported in part by NIH Award MH033881 and Agence Nationale pour la Recherche (ANR-21-CE28–0013). All raw data from these experiments and the Python computer code for Experiment 1 are available at https://orb.binghamton.edu/psych_fac/19/ and upon request from Jeremie Jozefowiez or Ralph Miller. Experiment 2 was programmed with Gorilla Builder: the code used to run that experiment is available at https://app.gorilla.sc/openmaterials/512932. The authors report no conflicts of interest.

Jérémie Jozefowiez, Univ. Lille, CNRS, UMR 9193 – SCALab – Sciences Cognitives et Sciences Affectives, F-59000 Lille, France; Ralph R. Miller, Yaroslav Moshchenko, Cameron M. McCrea, Kristina A. Stenstrom, Department of Psychology, SUNY-Binghamton (Binghamton, New York); James E. Witnauer, Department of Psychology, SUNY – Brockport (Brockport, New York). We thank Julianna Aquilone, Kevin Artus, Nathaniel Darko, Dennis Elengickal, Allison Escaldi, Allison Hope, Jovin Huang, Audrey Huff, Dave Jiang, Sarah Landman, Jenna Polis, and Samuel Woltag for commenting on an earlier version of the manuscript.

References

  1. Alcalá JA, Miller RR, Kirkden RD, & Urcelay GP (2023). Contiguity and overshadowing interactions in the rapid-streaming procedure. Learning & Behavior, 51, 482–501. [DOI] [PMC free article] [PubMed] [Google Scholar]
  2. Anwyl-Irvine AL, Massonnié J, Flitton A, Kirkham N, & Evershed JK (2020). Gorilla in our midst: An online behavioral experiment builder. Behavior Research Methods, 52, 388–407. [DOI] [PMC free article] [PubMed] [Google Scholar]
  3. Aust F, Haaf JM, & Stahl C. (2019). A memory-based judgment account of expectancy-liking dissociations in evaluative conditioning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 45(3), 417–439. 10.1037/xlm0000600 [DOI] [PubMed] [Google Scholar]
  4. Baeyens F, Crombez G, Van den Bergh O, & Eelen P. (1988). Once in contact always in contact: Evaluative conditioning is resistant to extinction. Advances in Behaviour Research and Therapy, 10, 179–199. [Google Scholar]
  5. Balaz MA, Capra S, Hartl P, & Miller RR (1981). Contextual potentiation of acquired behavior after devaluing direct context-US associations. Learning and Motivation, 12, 383–397. [Google Scholar]
  6. Bradley MM, Codispoti M, & Lang PJ (2006). A multi-process account of startle modulation during affective perception. Psychophysiology, 43, 486–497. [DOI] [PubMed] [Google Scholar]
  7. Bradley MM, & Lang PJ (2007). The International Affective Picture System (IAPS) in the study of emotion and attention. In Coan JA and Allen JJB (Eds), Handbook of Emotion Elicitation and Assessment (pp. 24–46). Oxford University Press. [Google Scholar]
  8. Beckers T, De Vicq P, Baeyens F. (2009). Evaluative conditioning is insensitive to blocking. Psychologica Belgica, 49, 41–57. [Google Scholar]
  9. Bouton ME (2017). Extinction: Behavioral mechanisms and their implications. In Byrne G. (Ed), Learning and memory: A comprehensive reference, 2nd edition: Vol 1: Learning theory and behavior (pp. 61–83). Cambridge, MA: Academic Press. [Google Scholar]
  10. Bouton ME (2018). Learning and behavior: a contemporary synthesis (2nd edition). Sinauer. [Google Scholar]
  11. Bouton ME, & Bolles RC. (1979). Contextual control and the extinction of conditioned fear. Learning and Motivation, 10, 445–466. [Google Scholar]
  12. Chen Y, Lin X, Ai S, Sun Y, Shi L, Meng S, Lu L, Shi J. (2022). Comparing three extinction methods to reduce fear expression and generalization. Behavioural Brain Research. 10.1016/j.bbr.2021.113714 [DOI] [PubMed] [Google Scholar]
  13. Cohen J. (1988). Statistical power analysis for the behavioral sciences. New York: Routledge. [Google Scholar]
  14. Cousineau D, & Goulet-Pelletier J-C (2021). A study of confidence intervals for Cohen’s dp in within-subject designs with new proposals. Quantitative Methods for Psychology, 17, 51–75. [Google Scholar]
  15. Crump MJC, Hannah SD, Allan LG, & Hord LK (2007). Contingency judgments on the fly. Quarterly Journal of Experimental Psychology, 60, 753–761. [DOI] [PubMed] [Google Scholar]
  16. Cumming G. (2012). Understanding the new statistics: Effect sizes, confidence intervals, and meta-analysis. New York, NY: Routledge. [Google Scholar]
  17. Davey GCL, & Singh J. (1988). The Kamin “blocking” effect and electrodermal conditioning in humans. Journal of Psychophysiology, 2(1), 17–25. [Google Scholar]
  18. De Houwer J, Thomas S, & Baeyens F. (2001). Associative learning of likes and dislikes: A review of 25 years of research in human evaluative conditioning. Psychological Bulletin, 127, 853–869. [DOI] [PubMed] [Google Scholar]
  19. Díaz E, Ruiz G, & Baeyens F. (2005). Resistance to extinction of human evaluative conditioning using a between-subjects design: Associative learning of likes and dislikes. Cognition and Emotion, 19, 245–268. [DOI] [PubMed] [Google Scholar]
  20. Dunsmoor JE, Campese VD, Ceceli AO, LeDoux JE, & Phelps EA (2015). Novelty-facilitated extinction: providing a novel outcome in place of an expected threat diminishes recovery of defensive responses. Biological Psychiatry, 78, 203–209. [DOI] [PMC free article] [PubMed] [Google Scholar]
  21. Dunsmoor JE, Kroes MCW, Li J, Daw ND, Simpson HB, & Phelps EA (2019). Role of human ventromedial prefrontal cortex in learning and recall of enhanced extinction. Journal of Neuroscience, 39, 3264–3276. [DOI] [PMC free article] [PubMed] [Google Scholar]
  22. Gawronski B, Gast A, & De Houwer J. (2015). Is evaluative conditioning really resistant to extinction? Evidence for changes in evaluative judgments without changes in evaluative representations. Cognition and Emotion, 29, 816–830. [DOI] [PubMed] [Google Scholar]
  23. Hannah SD, Crump MJC, Allan LG, & Siegel S. (2009). Cue-interaction effects in contingency judgments using the streamed-trial procedure. Canadian Journal of Experimental Psychology, 63, 103–112. [DOI] [PubMed] [Google Scholar]
  24. Hinchy J, Lovibond PF, & Ten-Horst KM (1995). Blocking in human electrodermal conditioning. Quarterly Journal of Experimental Psychology, 48B, 2–12. [PubMed] [Google Scholar]
  25. Holmes NM, Leung HT, & Westbrook RF (2016). Counterconditioned fear responses exhibit greater renewal than extinguished fear responses. Learning and Memory, 23, 141–150. [DOI] [PMC free article] [PubMed] [Google Scholar]
  26. Jozefowiez J. (2021). Individual differences in the perception of cue-outcome contingencies: A signal detection analysis. Behavioural Processes, 188, 104398. [DOI] [PubMed] [Google Scholar]
  27. Jozefowiez J, Berruti AS, Moshchenko Y, Peña T, Polack CW, & Miller RR (2020). Retroactive interference: Counterconditioning and extinction with and without biologically significant outcomes. Journal of Experimental Psychology: Animal Learning and Cognition, 46, 443–459. [DOI] [PubMed] [Google Scholar]
  28. Jozefowiez J, & Miller RR (2024). Cue duration and trial spacing effects in contingency assessment in the streaming procedure with humans. Journal of Experimental Psychology: Animal Learning & Cognition, 50, 99–117. [DOI] [PMC free article] [PubMed] [Google Scholar]
  29. Kang S, Vervliet B, Engelhard IM, van Dis EAM, & Hagenaars MA (2018). Reduced return of threat expectancy after counterconditioning versus extinction. Behaviour Research and Therapy, 108, 78–84. [DOI] [PubMed] [Google Scholar]
  30. Konorski J. (1967). Integrative activity of the brain: an interdisciplinary approach. University of Chicago Press. [Google Scholar]
  31. Kamin LJ (1969). Predictability, surprise, attention, and conditioning. In Campbell BA & Church RM (Eds.), Punishment and aversive behavior (pp. 279–396). New York: Appleton-Century-Crofts. [Google Scholar]
  32. Kimmel HD, & Bevill MJ (1996). Blocking and unconditional response diminution in human classical autonomic conditioning. Integrative Physiological and Behavioral Science, 31, 18–43. [DOI] [PubMed] [Google Scholar]
  33. Kremer EF (1978). The Rescorla-Wagner model: losses in associative strength in compound conditioned stimuli. Journal of Experimental Psychology: Animal Behavior Processes, 4, 22–36. [DOI] [PubMed] [Google Scholar]
  34. Krypotos A-M, & Engelhard IM (2018). Testing a novelty-based extinction procedure for the reduction of conditioned avoidance. Journal of Behavior Therapy and Experimental Psychiatry, 60, 22–28. 10.1016/j.jbtep.2018.02.006 [DOI] [PubMed] [Google Scholar]
  35. Lang PJ, Bradley MM, & Cuthbert BN (2008). International affective picture system (IAPS): Affective ratings of pictures and instruction manual. Technical Report A-8 University of Florida, Gainesville, FL. [Google Scholar]
  36. Laux JP, Goedert KM, & Markman AB (2010). Causal discounting in the presence of a stronger cue is due to bias. Psychonomic Bulletin & Review, 17, 213–218. [DOI] [PubMed] [Google Scholar]
  37. Lipp OV, Luck CC, & Muir AC (2019). Evaluative conditioning affects the subsequent acquisition of differential fear conditioning as indexed by electrodermal responding and stimulus evaluations. Psychophysiology, 57, e13505. [DOI] [PubMed] [Google Scholar]
  38. Lipp OV, & Purkis HM (2005). No support for dual process accounts of human affective learning in simple Pavlovian conditioning. Cognition and Emotion, 19, 269–282. [DOI] [PubMed] [Google Scholar]
  39. Lipp OV, Waters AM, Luck CC, Ryan KM, & Craske MG (2020). Novel approaches for strengthening human fear extinction: The roles of novelty, additional USs, and additional GSs. Behaviour Research and Therapy, 124, 103529. 10.1016/j.brat.2019.103529 [DOI] [PubMed] [Google Scholar]
  40. Lovibond PF, Siddle DAT, & Bond NW (1988). Insensitivity to stimulus validity in human Pavlovian conditioning. Quarterly Journal of Experimental Psychology, 40B, 377–410. [PubMed] [Google Scholar]
  41. Lucas J, Luck CC, & Lipp OV (2018). Novelty-facilitated extinction and the reinstatement of conditional human fear. Behaviour Research and Therapy, 109, 68–74. [DOI] [PubMed] [Google Scholar]
  42. Maia S, Lefèvre F, & Jozefowiez J. (2018). Psychophysics of associative learning: Quantitative properties of subjective contingency. Journal of Experimental Psychology: Animal Learning and Cognition, 44, 67–81. [DOI] [PubMed] [Google Scholar]
  43. Maes E, Boddez Y, Alfei JM, Krypotos A-M, D’Hooge R, De Houwer J, & Beckers T. (2016). The elusive nature of the blocking effect: 15 failures to replicate. Journal of Experimental Psychology: General, 145, 49–71. [DOI] [PubMed] [Google Scholar]
  44. Mitchell CJ, & Lovibond PF (2002). Backward and forward blocking in human electrodermal responding: Blocking requires an assumption of outcome additivity. Quarterly Journal of Experimental Psychology, 55B, 311–329. [DOI] [PubMed] [Google Scholar]
  45. Murphy RA, Witnauer JE, Castiello S, Tsvetkov A, Li A, Alcaides D, & Miller RR (2021). More frequent, shorter trials enhance acquisition in a training session: There is a free lunch! Journal of Experimental Psychology: General, 151(1), 41–64. 10.1037/xge0000910. [DOI] [PMC free article] [PubMed] [Google Scholar]
  46. Nelson JB (2016). A robust function to return the cumulative density of non-central F distribution in Microsoft Office Excel. Psicologia, 37, 61–83. [Google Scholar]
  47. Pavlov IP (1960). Conditioned reflexes. Dover Publications (original work published 1927). [Google Scholar]
  48. Peirce JW (2007). PsychoPy: Psychophysics software in Python. Journal of Neuroscience Methods, 162, 8–13. [DOI] [PMC free article] [PubMed] [Google Scholar]
  49. Rescorla RA, & Heth CD (1975). Reinstatement of fear to an extinguished conditioned stimulus. Journal of Experimental Psychology: Animal Behavior Processes, 1(1), 88–96. [PubMed] [Google Scholar]
  50. Rescorla RA, & Wagner AR (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In Black AH & Prokasy WF (Eds.), Classical conditioning II (pp. 64–99). New York: Appleton-Century-Crofts. [Google Scholar]
  51. Sabatinelli D, Bradley MM, Lang PJ, Costa VD, & Versace F. (2007). Pleasure rather than salience activates human nucleus accumbens and medial prefrontal cortex. Journal of Neurophysiology, 98, 1374–1379. [DOI] [PubMed] [Google Scholar]
  52. Sabatinelli D, Bradley MM, Fitzsimmons JR, & Lang PJ (2005). Parallel amygdala and inferotemporal activation reflect emotional intensity and fear relevance. NeuroImage, 24, 1265–1270. [DOI] [PubMed] [Google Scholar]
  53. Siegel S, Allan LG, Hannah SD, & Crump MJC (2009). Applying signal detection theory to contingency assessment. Comparative Cognition & Behavior Reviews, 4, 116–134. [Google Scholar]
  54. Steiger JH (2004). Beyond the F test: Effect size confidence intervals and tests of close fit in the analysis of variance and contrast analysis. Psychological Methods, 9, 164–182. [DOI] [PubMed] [Google Scholar]
  55. Steinman SA, Dunsmoor JE, et al. (2022). A preliminary test of novelty-facilitated extinction in individuals with pathological anxiety. Frontiers in Behavioral Neuroscience, 16, 873489. [DOI] [PMC free article] [PubMed] [Google Scholar]
  56. Thomas BL, Cutler M, & Novak C. (2012). A modified counterconditioning procedure prevents the renewal of conditioned fear in rats. Learning and Motivation, 43(1–2), 24–34. [Google Scholar]
  57. Tulving E. & Osler S. (1968). Effectiveness of retrieval cues in memory for words. Journal of Experimental Psychology, 77, 593–601. [DOI] [PubMed] [Google Scholar]
  58. Van Gucht D, Baeyens F, Hermans D, & Beckers T. (2013). The inertia of conditioned craving: Does context modulate the effect of counterconditioning? Appetite, 65, 51–57. [DOI] [PubMed] [Google Scholar]
  59. Van Elzakker MB, Dahlgren MK, Davis FC, Dubois S, & Shin LM (2014). From Pavlov to PTSD: The extinction of conditioned fear in rodents, humans, and anxiety disorders. Neurobiology of Learning and Memory, 113, 3–18. [DOI] [PMC free article] [PubMed] [Google Scholar]
  60. Vansteenwegen D, Francken G, Vervliet B, De Clercq A, & Eelen P. (2006). Resistance to extinction in evaluative conditioning. Journal of Experimental Psychology: Animal Behavior Processes, 32, 71–79. [DOI] [PubMed] [Google Scholar]
  61. Vrana SR, Spence EL, & Lang PJ (1988). The startle probe response: a new measure of emotion? Journal of Abnormal Psychology, 97, 487–491. [DOI] [PubMed] [Google Scholar]
  62. Vervliet B, Craske MG, & Hermans D. (2013). Fear extinction and relapse: State of the art. Annual Review of Clinical Psychology, 9, 215–248. [DOI] [PubMed] [Google Scholar]
  63. Quintero MJ, Morís J, & López FJ (2024). Evaluating the effects of counterconditioning, novelty-facilitated extinction, and standard extinction on the spontaneous recovery of threat expectancy and conditioned stimulus valence. Quarterly Journal of Experimental Psychology, 77, 14–28. [DOI] [PubMed] [Google Scholar]
  64. Wagner AR, & Brandon SE (1989). Evolution of a structured connectionist model of conditioning (AESOP). In Klein SB & Mowrer RR (Eds.), Contemporary learning theories: Pavlovian conditioning and the status of traditional learning theory (pp. 149–189). Lawrence Erlbaum Associates. [Google Scholar]
