Abstract
To determine whether the punishment of a discriminated operant behavior has effects that are specific to the punished response, rats were reinforced for performing two different instrumental responses (lever pressing and chain pulling) in the presence of a single discriminative stimulus (S). They were then either punished with mild footshock for performing one of the responses (R1) in S, or they received the same shocks in a noncontingent manner while performing R1 in S (i.e., a yoked control). In final tests of both R1 and R2 in S, the punished rats were more suppressed on R1 than on R2, but the yoked rats were not. These findings extend previous results obtained with extinction rather than punishment (Bouton, Trask, & Carranza-Jasso, 2016) and support a broader parallel between extinction and punishment of both free-operant and discriminated-operant responding. Punishment is like extinction in creating a response-specific inhibition of either free or discriminated operant behavior.
Keywords: Punishment, extinction, discriminated operant learning, instrumental learning
1. Introduction
Learning to stop making a response is a fundamental aspect of instrumental conditioning (e.g., Bouton & Broomer, 2023). Animals may learn to stop responding under a variety of conditions, including extinction—in which a response is no longer reinforced—and punishment—in which responding is still reinforced, but also earns an aversive outcome such as footshock (e.g., Bouton & Schepers, 2015; Broomer & Bouton, 2023; Jean-Richard-Dit-Bressel & McNally, 2016; Marchant et al., 2013; Panlilio et al., 2003). Recent evidence from this laboratory suggests that extinction and punishment can be considered similar forms of context-dependent inhibitory learning. For example, Broomer and Bouton (2023) recently compared extinction and punishment on three basic recovery effects—renewal, spontaneous recovery, and reacquisition—and observed response recovery when the response was tested in a different context than the one in which it had been extinguished or punished. This context-dependent recovery occurred following changes in both physical and temporal context (see Bouton et al., 2020). Although these results and others (e.g., Bernal-Gamboa et al., 2017; Bouton et al., 2011; Bouton & Schepers, 2015; Rescorla, 1997) indicate that extinction and punishment learning are both context-dependent, they do not address the mechanism by which the context might modulate the expression of instrumental responding.
The results of other experiments in this laboratory, however, suggest that instrumental extinction also involves direct inhibition of the response (see Bouton & Broomer, 2023, for a review). For example, Todd (2013) reinforced two responses (R1 and R2) in two separate contexts (A and B, respectively), and then extinguished each response in the opposite context from which it was trained (i.e., R1 in B and R2 in A). When each response was tested in each context, R1 was suppressed in B and R2 was suppressed in A, but each response renewed when tested in its original acquisition context. Thus, the rats had learned to suppress a specific response in a specific context. A similar pattern has been demonstrated in a discriminated operant design (Bouton et al., 2016). In that study, Bouton et al. trained two instrumental responses (R1 and R2) in the presence of the same discriminative stimulus, S (i.e., SR1 and SR2), and then extinguished only SR1. At test, rats selectively suppressed SR1 responding, but not SR2 responding, again suggesting that instrumental extinction involves learning to suppress a specific response (see also Colwill & Rescorla, 1986; Rescorla, 1993).
Parallel research suggests that punishment learning is similarly response-specific. Bouton and Schepers (2015) used the two-response/two-context method introduced by Todd (2013) but punished—rather than extinguished—each response. That is, R1 and R2 were first reinforced in Contexts A and B, respectively, and then punished in the opposite context. A test of each response in each context showed the same pattern of results as in Todd’s (2013) two-response/two-context design: Rats renewed responding with each response in its original context, and thus had learned to suppress a specific response in a specific context. However, response-specific suppression following punishment has yet to be demonstrated in a discriminated operant procedure like that of Bouton et al. (2016). To extend the behavioral comparison between punishment and extinction (see also Bouton & Broomer, 2023; Broomer & Bouton, 2023), we here applied Bouton et al.’s (2016) discriminated operant method to punishment. That is, in the present experiment, we first reinforced separate R1 and R2 behaviors (lever pressing and chain pulling) in the presence of a single discriminative stimulus (S) (see Table 1). Then one group was punished for performing R1 during S and another (a yoked control) received the same shocks at the same time, but noncontingently, while performing R1 in S (see Bouton & Schepers, 2015; Jean-Richard-Dit-Bressel et al., 2018). We hypothesized that, if punishment learning is indeed like extinction learning, then the punished rats would exhibit response-specific suppression, responding less on the punished SR1 than the unpunished SR2 in a final test. In contrast, we expected yoked rats to respond similarly on the two stimulus-response combinations.
Table 1.
Design of the Experiment
| Group | Response training | Punishment | Test |
|---|---|---|---|
| Punished | SR1-pellet, SR2-pellet | SR1-pellet-shock | SR1? SR2? |
| Yoked | SR1-pellet, SR2-pellet | SR1-pellet / shock | SR1? SR2? |
Note. “S” denotes a 30-s auditory discriminative stimulus. R1 and R2 were a lever and chain, counterbalanced between subjects.
2. Method
2.1. Subjects.
The subjects were 32 naïve male Wistar rats (Charles River, Raleigh, NC) run in separate cohorts of 16 (these were separate "replications" of the experiment). Rats were approximately 75–90 days old at the start of the experiment and were individually housed in a room maintained on a 12:12-hr light:dark cycle. The experiment took place during the light period of the cycle. After arrival, the rats were housed in the Psychological Science Department colony for eight days before being food-restricted to 80% of their baseline body weights. They were maintained at this weight via daily feedings for the duration of the experiment.
2.2. Apparatus.
Two sets of four conditioning chambers housed in separate rooms of the laboratory were used. These were designed to serve as two distinct contexts but were not used in that capacity here. Each chamber was housed in its own sound attenuation chamber and was of the same design (Med Associates model ENV-007-VP, St. Albans, VT), measuring 29.53 cm × 23.5 cm × 27.31 cm (l × w × h). A recessed 5.1 cm × 5.1 cm food cup was centered in the front wall approximately 2.5 cm above the level of the floor. A retractable lever (ENV-112CM) served as either R1 or R2. The lever was positioned to the left of the food cup and protruded 1.9 cm into the chamber. A chain-pull manipulandum (Med Associates model ENV-111C) served as the other response. The chain was suspended from a microswitch mounted on the top of the Skinner box. When inserted in the chamber, the chain hung 1.9 cm from the front wall, 3.1 cm to the right of the food cup, and 6.2 cm above the floor. The discriminative stimulus (S) was a 30-s, 2900-Hz, 80-dB tone delivered through a 7.6-cm speaker mounted to the ceiling of the sound attenuation chamber. Each chamber was illuminated by one 7.5-W incandescent bulb mounted to the ceiling of the sound attenuation chamber approximately 43.6 cm above the grid floor. Ventilation fans provided background noise of 65 dBA.
The reinforcer was a 45-mg grain food pellet (MLab Rodent Tablets, 5TUM; TestDiet, Richmond, IN). The punishing stimulus was a 0.5-mA, 0.5-s footshock delivered to each chamber by aversive stimulator/scrambler modules (ENV-414). The apparatus was controlled by computer equipment located in an adjacent room.
2.3. Procedure.
The procedure was based on that of Experiment 2 in Bouton et al. (2016), with punishment and yoked shock replacing extinction in the response-elimination phase.
2.3.1. Magazine training.
All rats were assigned to an operant chamber, in which the entire experiment was conducted. They then received a single 30-min magazine training session in which food pellets were freely delivered approximately every 30 seconds according to a random time (RT) 30-s schedule, resulting in approximately 60 pellets.
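The expected yield of the RT 30-s schedule follows from simple arithmetic: a 30-min (1800-s) session divided by a 30-s mean interval gives about 60 pellets. The short Python sketch below simulates this under an assumption the Method does not state, namely that the schedule is programmed as a 1-s tick with a delivery probability of 1/30 on each tick. The function name `simulate_rt_session` is hypothetical.

```python
import random


def simulate_rt_session(session_s=1800, mean_interval_s=30, seed=None):
    """Count free pellet deliveries in one session on a random-time (RT) schedule.

    Assumption: each 1-s tick delivers a pellet with probability 1/mean_interval_s,
    independent of the rat's behavior, so deliveries average one per mean_interval_s.
    """
    rng = random.Random(seed)
    p = 1.0 / mean_interval_s
    return sum(1 for _ in range(session_s) if rng.random() < p)


# Expected count: 1800 s / 30 s = 60 pellets per 30-min session.
counts = [simulate_rt_session(seed=s) for s in range(200)]
print(sum(counts) / len(counts))  # close to 60
```

Because delivery does not depend on responding, any individual session varies around 60 pellets, while the average over many simulated sessions converges on the 1800/30 figure.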
2.3.2. Response training.
On the same day as magazine training, the rats received two 30-min response training sessions, one with R1 and one with R2 (lever and chain, counterbalanced). In each, responding was reinforced from the start on a random interval (RI) 30-s schedule, such that the first response after a random interval averaging 30 s earned a food pellet. No hand shaping was required.
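An RI schedule differs from the RT schedule in that a pellet is delivered only for the first response after the interval has elapsed. The sketch below illustrates that logic in Python. The class and function names are hypothetical, and the per-second arming rule (probability 1/30 per tick) is an assumption rather than the laboratory's documented implementation.

```python
import random


class RandomInterval:
    """Minimal sketch of a random-interval (RI) schedule.

    Assumption (not stated in the Method): each 1-s tick arms an unarmed
    schedule with probability 1/mean_s, so arming takes mean_s seconds on
    average. The first response after arming earns a pellet and disarms it.
    """

    def __init__(self, mean_s=30, seed=None):
        self.p = 1.0 / mean_s
        self.armed = False
        self.rng = random.Random(seed)

    def tick(self):
        """Advance the schedule by one second."""
        if not self.armed and self.rng.random() < self.p:
            self.armed = True

    def respond(self):
        """Register one response; return True if it is reinforced."""
        if self.armed:
            self.armed = False
            return True
        return False


def pellets_earned(responses_per_s, session_s=1800, seed=0):
    """Count pellets for a hypothetical rat that responds at a fixed rate."""
    ri = RandomInterval(seed=seed)
    pellets = 0
    for _ in range(session_s):
        ri.tick()
        for _ in range(responses_per_s):
            if ri.respond():
                pellets += 1
    return pellets


# Responding faster than once per second does not raise the reinforcement rate.
print(pellets_earned(1), pellets_earned(5))  # the two counts are identical
```

Because the schedule arms at most once per interval, a hypothetical rat responding five times per second earns the same pellets as one responding once per second in this simulation. That cap on reinforcement rate is the defining property of interval schedules.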
2.3.3. Acquisition.
On each of the following 12 days, rats received two discriminated operant training sessions, one for each response (i.e., SR1 and SR2). Only the manipulandum for the response being trained was available in each session. Lever pressing and chain pulling were counterbalanced as R1 and R2. In every session, responding was reinforced on an RI 30-s schedule only during 16 30-s tone presentations. Tone presentations occurred simultaneously for all rats. Discriminative responding was encouraged by increasing the intertrial interval (ITI) over the first three days. The ITI was 30 s on Day 1, averaged 60 s on Day 2, and averaged 90 s for the remainder of the acquisition phase. Shorter ITIs at the start maintained responding and helped ensure that rats continued to respond in the tone. The order of sessions was alternated in a double enclosed pattern (e.g., ABBABAABABBA) such that rats received SR1 training first on half of the days and SR2 training first on the other half. Each session lasted approximately 32 minutes.
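The session timing and the day-by-day order can be checked with a few lines of Python. The sketch assumes that each trial contributes one 30-s S period plus one ITI of the stated mean duration, and that "A" in the example sequence codes days on which the SR1 session came first (the Method does not define the letters). Under that assumption, the 90-s ITI gives the stated 32-min sessions, the shorter Day 1 and Day 2 ITIs imply shorter sessions, and the example sequence puts SR1 first on exactly 6 of the 12 days.

```python
from itertools import groupby

TRIALS = 16
S_DURATION_S = 30


def session_minutes(mean_iti_s, trials=TRIALS, s_duration_s=S_DURATION_S):
    """Approximate session length: each trial adds one S period plus one mean ITI."""
    return trials * (s_duration_s + mean_iti_s) / 60


for day, iti in [("Day 1", 30), ("Day 2", 60), ("Day 3+", 90)]:
    print(day, session_minutes(iti))  # 16.0, 24.0, then 32.0 minutes

# Daily order; A = SR1 session first, B = SR2 session first (assumed coding).
ORDER = "ABBABAABABBA"
print(ORDER.count("A"), ORDER.count("B"))  # 6 6

# Longest run of consecutive days with the same session first.
print(max(len(list(run)) for _, run in groupby(ORDER)))  # 2
```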
2.3.4. Punishment.
On each of the next two days, rats received two sessions of either punishment or yoked shock while performing R1 in S. As in the acquisition phase, each session consisted of 16 30-s trials with an average ITI of 90 s and lasted approximately 32 minutes. Responding during S continued to earn pellets on an RI 30-s schedule. For punished rats, R1 responding during S also produced footshock on a VI 90-s schedule. The shock schedule featured random selection without replacement from a list of five intervals: 60 s, 75 s, 90 s, 105 s, and 120 s. The timer accumulated time only across the separate 30-s stimuli, so a shock was delivered approximately every third trial. Each punished rat was yoked to a control rat in Group Yoked, such that when the punished rat earned a shock, a shock of the same magnitude and duration was simultaneously delivered to its yoked counterpart. Thus, Groups Punished and Yoked received the same number and distribution of shocks but differed in whether the shock was contingent on the response.
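The punishment contingency and the yoking can be sketched as a small simulation. Only the interval list and the rule that the timer accumulates across S presentations come from the Method. The following are assumptions: time advances in 1-s steps of S time, the timer restarts after each shock, the first R1 response after the interval elapses is shocked, the interval list is reshuffled whenever it is exhausted, and the hypothetical punished rat responds in every second of S. The names `ShockVI` and `punishment_session` are illustrative.

```python
import random

INTERVALS_S = [60, 75, 90, 105, 120]  # from the Method; mean = 90 s


class ShockVI:
    """Sketch of a VI 90-s punishment schedule whose timer runs only during S.

    Assumptions: the timer advances in 1-s steps of S time, restarts after
    each shock, and the first response after the interval elapses is shocked.
    """

    def __init__(self, seed=None):
        self.rng = random.Random(seed)
        self.pool = []
        self.elapsed = 0
        self.interval = self._next_interval()

    def _next_interval(self):
        # Random selection without replacement; reshuffle when the list runs out.
        if not self.pool:
            self.pool = INTERVALS_S[:]
            self.rng.shuffle(self.pool)
        return self.pool.pop()

    def tick_in_s(self, responded):
        """Advance one second of S time; return True if a shock is delivered."""
        self.elapsed += 1
        if responded and self.elapsed >= self.interval:
            self.elapsed = 0
            self.interval = self._next_interval()
            return True
        return False


def punishment_session(trials=16, s_len_s=30, seed=0, responds=lambda s_time: True):
    """Return S-time stamps (s) of shocks for a punished rat and its yoked partner."""
    vi = ShockVI(seed)
    punished, yoked = [], []
    s_time = 0
    for _ in range(trials):  # ITIs are skipped: the timer does not run there
        for _ in range(s_len_s):
            s_time += 1
            if vi.tick_in_s(responds(s_time)):
                punished.append(s_time)  # shock contingent on the punished rat's R1
                yoked.append(s_time)     # same shock, same moment, for the yoked rat
    return punished, yoked


punished, yoked = punishment_session()
print(len(punished), punished[-1], punished == yoked)  # 5 450 True
```

Under these assumptions a rat that keeps responding receives 5 shocks in a 16-trial session (the five intervals sum to 450 s of the 480 s of S time), roughly one shock every three trials, and its yoked partner receives identical shock times regardless of its own behavior.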
2.3.5. Test.
On the final day of the experiment, rats received two 10-min test sessions in extinction, one with R1 and the other with R2 (order counterbalanced). Each test consisted of four presentations of the tone with an ITI of 90 s. Each session began with a 180-s delay before the first trial. No pellets or shocks were delivered during the test sessions.
3. Results
One rat failed to increase R1 responding during the 30-s S relative to responding during the 30-s pre-S period (z = −2.18 for the final training session), one rat similarly failed to acquire SR2 (z = −2.22 for the final training session), and one rat in the Punished group failed to reduce responding by the end of the punishment phase (z = 2.14 for the final punishment session). These three rats were excluded from all analyses.
3.1. Acquisition.
The acquisition of discriminated SR1 and SR2 responding is depicted in Figure 1. A 2 (Group) by 2 (Replication) by 2 (Response) by 2 (Period: S, pre-S) by 12 (Session) ANOVA revealed main effects of Period, F(1, 25) = 353.02, MSE = 40.79, p < .001, Session, F(11, 275) = 21.49, MSE = 10.12, p < .001, and Replication, F(1, 25) = 17.44, MSE = 148.87, p < .001, as well as significant interactions between Period and Replication, F(1, 25) = 6.55, MSE = 40.79, p = .017, Session and Replication, F(11, 275) = 4.18, MSE = 10.12, p < .001, Period and Session, F(11, 275) = 165.95, MSE = 3.06, p < .001, and Period, Session, and Replication, F(11, 275) = 2.57, MSE = 3.06, p = .004. Effects and interactions involving Replication were due to slower acquisition of both responses in the second replication.
Figure 1. Acquisition of Discriminated Responding.

Note. “S” denotes average responding during the 30-s discriminative stimulus, “pre-S” denotes responding during the 30-s period prior to stimulus onset. Error bars depict the standard error of the mean (SEM).
3.2. Punishment.
The results of the punishment phase, now grouped into 4-trial blocks, are depicted in Figure 2. As the figure suggests, both Punished and Yoked rats reduced responding over the course of the punishment phase, but Punished rats did so to a significantly greater extent. A 2 (Group) by 2 (Replication) by 2 (Period) by 16 (Block) ANOVA revealed significant main effects of Period, F(1, 25) = 77.92, MSE = 143.22, p < .001, Block, F(15, 375) = 15.91, MSE = 17.15, p < .001, and Group, F(1, 25) = 5.71, MSE = 296.23, p = .025. There were also significant interactions between Group and Period, F(1, 25) = 5.88, MSE = 143.22, p = .023, Group and Block, F(15, 375) = 1.81, MSE = 210.05, p = .031, and Period and Block, F(15, 375) = 9.16, MSE = 9.83, p < .001.
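The degrees of freedom reported in the acquisition and punishment analyses follow from the design. With 29 rats remaining after the three exclusions and four between-subject cells (Group × Replication), the error term has 29 − 4 = 25 df, and a within-subject effect with k levels has k − 1 numerator df and (k − 1) × 25 error df. The Python sketch below assumes a fully crossed mixed ANOVA with Response, Period, Session, and Block as within-subject factors, which is what the reported F ratios imply. `mixed_anova_df` is a hypothetical helper.

```python
from math import prod


def mixed_anova_df(n_subjects, between_levels, within_levels):
    """Return (effect df, error df) for a within-subject effect or interaction.

    Assumes a fully crossed mixed ANOVA: every subject belongs to one
    between-subject cell and contributes to every within-subject cell.
    """
    error_between = n_subjects - prod(between_levels)  # subjects-within-cells df
    effect = prod(k - 1 for k in within_levels)
    return effect, effect * error_between


N = 32 - 3       # rats run minus rats excluded
CELLS = [2, 2]   # Group x Replication

print(mixed_anova_df(N, CELLS, [2]))      # Period: (1, 25)
print(mixed_anova_df(N, CELLS, [12]))     # Session: (11, 275)
print(mixed_anova_df(N, CELLS, [2, 12]))  # Period x Session: (11, 275)

# Punishment phase: 2 days x 2 sessions x 16 trials, grouped into 4-trial blocks.
n_blocks = 2 * 2 * 16 // 4
print(n_blocks, mixed_anova_df(N, CELLS, [n_blocks]))  # 16 (15, 375)
```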
Figure 2. Punishment of SR1.

Note. Only SR1 was available during this phase. Error bars depict SEM.
Response rates during S in the final session of the punishment phase were averaged across blocks and analyzed with a 2 (Group) by 2 (Replication) ANOVA, which revealed a significant main effect of Group, F(1, 25) = 12.86, MSE = 31.82, p = .001, and no other significant effects or interactions (larger F = .43). Response rates from the final session were also converted to elevation scores by subtracting pre-S responding (averaged across trials) from S responding (averaged across trials). The elevation score provides a direct indication of the extent to which S promoted responding above baseline. The mean elevation score was 7.36 (SEM = 1.43) for Group Yoked and 1.28 (SEM = .74) for Group Punished. The group difference was significant, t(27) = 3.53, p = .001.
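The elevation score and the group comparison can be written out directly. The sketch below uses only the Python standard library. The per-trial rates and group scores are hypothetical numbers chosen for illustration, not data from this experiment. It assumes a Student's (pooled-variance) independent-samples t test, which is consistent with the reported df, t(27), for 29 rats, but the Method does not name the test.

```python
from math import sqrt
from statistics import mean, stdev


def elevation_score(s_rates, pre_s_rates):
    """Mean responding in S minus mean responding in the pre-S period."""
    return mean(s_rates) - mean(pre_s_rates)


def pooled_t(a, b):
    """Student's independent-samples t with pooled variance; returns (t, df)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / df
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df


# Hypothetical per-trial rates (responses/min) for one rat; not data from the study.
print(elevation_score([10, 8, 12, 10], [2, 3, 1, 2]))  # 8

# Hypothetical elevation scores for two small groups.
t, df = pooled_t([4, 6, 8], [1, 2, 3])
print(round(t, 2), df)  # 3.1 4
```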
3.3. Test.
Test data were analyzed as response rates in S and as elevation scores, again calculated by subtracting responding in the 30-s pre-S period from responding in the 30-s S period. These data are depicted in Figures 3A and 3B. Response rates in S were analyzed with a 2 (Group) by 2 (Replication) by 2 (Response) ANOVA, which revealed a significant main effect of Response, F(1, 25) = 4.96, MSE = 32.74, p = .035, and no other effects or interactions (largest F = 2.17). Planned comparisons indicated that Punished rats responded significantly less on R1 than R2 (p = .020), whereas Yoked rats responded similarly on each response (p = .578). A similar 2 (Group) by 2 (Replication) by 2 (Response) ANOVA on the elevation scores revealed no significant main effects or interactions. However, planned comparisons indicated that Punished rats again responded significantly less on R1 than R2 (p = .046), whereas Yoked rats responded similarly on each response (p = .806). Response-contingent punishment in S thus produced a response-specific suppression of responding.
Figure 3. Test Results.

Note. A) Mean response rates during S. Inset columns indicate mean of pre-S responding. B) Mean elevation scores (S – pre-S) at test. Error bars indicate 1 SEM.
4. Discussion
Response-specific suppression has been demonstrated in extinction with both free and discriminated operant procedures (Bouton et al., 2016; Todd, 2013). Punishment learning is thought to depend on similar fundamental behavioral mechanisms (Bouton & Schepers, 2015; see Broomer & Bouton, 2023), but to our knowledge response-specific suppression in punishment has only been demonstrated in free operant designs (e.g., Bouton & Schepers, 2015; see also Bolles, Holz, Dunn, & Hill, 1980). The present experiment extended the extinction-punishment parallel by testing response-specific suppression in punishment of a discriminated operant response. As predicted, punished rats responded significantly less during SR1 than SR2, whereas any suppression in the yoked rats did not significantly differ between the two. These results suggest that punishment learning was response-specific, consistent with the results of Bouton and Schepers (2015). More generally, they are consistent with the evidence that both punishment and extinction involve response-specific as well as context-specific inhibitory learning (e.g., Bouton et al., 2016; Broomer & Bouton, 2023; Todd, 2013; see Bouton & Broomer, 2023, for review).
Although yoked rats responded similarly on SR1 and SR2 at test, their declining response rates through the punishment phase suggested a suppressive effect of noncontingent shock on SR1 responding. One straightforward explanation is the possible contribution of Pavlovian fear conditioning to S. The importance of including noncontingent or yoked controls in punishment studies has been noted frequently (e.g., Bouton & Schepers, 2015; Jean-Richard-Dit-Bressel et al., 2018). In a free operant procedure, a yoked group controls for both nonassociative effects of shock and possible Pavlovian associations learned between the background context and shock (either could in principle suppress responding in a punishment procedure). Note that a context-shock association is subject to considerable latent inhibition (and perhaps appetitive counterconditioning) via context exposure during initial instrumental training. In a discriminated operant procedure like the present one, however, shocks are delivered only during a brief and highly salient discriminative stimulus. Although that stimulus also receives preexposure and appetitive conditioning during the initial training phase, there is less overall exposure to S than to the physical context stimuli, so Pavlovian fear might develop more readily, and suppress responding more generally, in a discriminated operant design than in a free operant design. Importantly, however, the similarity of yoked group performance on SR1 and SR2 during testing here suggests that the specific suppression of R1 in the Punished group was driven by the contingent relationship between R1 and shock rather than between S and shock.
Inhibition of responding is a fundamental aspect of response learning. The present experiment contributes to a body of evidence indicating that the inhibition of behavior can be specific to the response (e.g., Bouton & Schepers, 2015; Bouton et al., 2016; Todd, 2013). That is, animals learn to stop making a specific response in a specific context, regardless of whether that context is a physical environment or a discriminative stimulus. This response-specific inhibitory learning may be a fundamental mechanism by which context modulates behavior when an animal has learned to stop performing an instrumental response.
Acknowledgments
Supported by Grant R01 DA033123 from the United States National Institutes of Health. Send correspondence to Mark E. Bouton, Department of Psychological Science, University of Vermont, Burlington, VT 05405.
References
- Bolles RC, Holz R, Dunn T, & Hill W (1980). Comparisons of stimulus learning and response learning in a punishment situation. Learning and Motivation, 11, 78–96.
- Bouton ME, & Broomer MC (2023). Learning to stop responding. Behavioural Processes, 104830.
- Bouton ME, Maren S, & McNally GP (2020). Behavioral and neurobiological mechanisms of Pavlovian and instrumental extinction learning. Physiological Reviews, 101, 611–681.
- Bouton ME, & Schepers ST (2015). Renewal after the punishment of free operant behavior. Journal of Experimental Psychology: Animal Learning and Cognition, 41(1), 81–90.
- Bouton ME, Todd TP, Vurbic D, & Winterbauer NE (2011). Renewal after the extinction of free operant behavior. Learning & Behavior, 39(1), 57–67.
- Bouton ME, Trask S, & Carranza-Jasso R (2016). Learning to inhibit the response during instrumental (operant) extinction. Journal of Experimental Psychology: Animal Learning and Cognition, 42(3), 246–258.
- Broomer MC, & Bouton ME (2023). A comparison of renewal, spontaneous recovery, and reacquisition after punishment and extinction. Learning & Behavior.
- Colwill RM, & Rescorla RA (1986). Associative structures in instrumental learning. In Bower G (Ed.), The psychology of learning and motivation (Vol. 20, pp. 55–104). Orlando, FL: Academic Press.
- Jean-Richard-Dit-Bressel P, & McNally GP (2016). Lateral, not medial, prefrontal cortex contributes to punishment and aversive instrumental learning. Learning & Memory, 23, 607–617.
- Jean-Richard-Dit-Bressel P, Killcross S, & McNally GP (2018). Behavioral and neurobiological mechanisms of punishment: implications for psychiatric disorders. Neuropsychopharmacology, 43, 1639–1650.
- Marchant NJ, Khuc TN, Pickens CL, Bonci A, & Shaham Y (2013). Context-induced relapse to alcohol seeking after punishment in a rat model. Biological Psychiatry, 73, 256–262.
- Panlilio LV, Thorndike EB, & Schindler CW (2003). Reinstatement of punishment-suppressed opioid self-administration in rats: an alternative model of relapse to drug abuse. Psychopharmacology, 168, 229–235.
- Rescorla RA (1993). Inhibitory associations between S and R in extinction. Animal Learning & Behavior, 21, 327–336.
- Rescorla RA (1997). Spontaneous recovery of instrumental discriminative responding. Animal Learning & Behavior, 25(4), 485–497.
- Todd TP (2013). Mechanisms of renewal after the extinction of instrumental behavior. Journal of Experimental Psychology: Animal Behavior Processes, 39(3), 193–207.
