Abstract
Three experiments examined the return of a habitual instrumental response to the status of goal-directed action. In all experiments, rats received extensive training in which lever pressing was reinforced with food pellets on a random-interval schedule of reinforcement. In Experiment 1, the extensively-trained response was not affected by conditioning a taste aversion to the reinforcer, and was therefore considered a habit. However, if the response had earned a new and unexpected food pellet during the final training session, the response was affected by taste aversion conditioning to the (first) reinforcer, and had thus been converted to a goal-directed action. In Experiment 3, 30 min of prefeeding with an irrelevant food pellet immediately before the test also converted a habit back to action, as judged by the taste-aversion devaluation method. That result was consistent with difficulty in finding evidence of habit with the sensory-specific satiety method after extensive instrumental training (Experiment 2). The results suggest that an instrumental behavior’s status as a habit is not permanent, and that a habit can be returned to action status by associating it with a surprising reinforcer (Experiment 1) or by giving the animal an unexpected prefeeding immediately prior to the action/habit test (Experiment 3).
Keywords: Goal-directed action, habit, reinforcer devaluation, taste aversion learning, sensory-specific satiety
Instrumental behaviors are thought to take two forms: Goal-directed actions, which are emitted if they produce an outcome that the organism currently wants or values, and habits, which are behaviors that automatically occur in a particular situation without regard to the outcome’s current value (e.g., Dickinson, 1985, 1994). Actions and habits are typically distinguished with reinforcer devaluation methods. In one such method, a taste aversion is separately conditioned to the reinforcer after instrumental training by pairing it with a toxin (such as lithium chloride, LiCl), and then instrumental responding is tested in extinction. If the response is a goal-directed action, reinforcer devaluation suppresses it; the animal behaves as if it remembers that the action led to the outcome, and that it no longer values the outcome. In contrast, if the behavior is a habit, devaluation has no effect on the response (e.g., Thrailkill & Bouton, 2015). In a second reinforcer devaluation technique, the animal is allowed to consume the reinforcing outcome (or a different, irrelevant outcome) to the point of satiety just before the extinction test of the instrumental response. Once again, if the response is a goal-directed action, specific satiety to its reinforcer suppresses it (e.g., Dickinson, Campos, Vargas, & Balleine, 1996; Colwill & Rescorla, 1985b), but if it is a habit, sensory-specific satiety to the reinforcer should have no effect (e.g., Urcelay & Jonkman, 2019).
Actions eventually become habits after extended practice (e.g., Adams, 1982; Dickinson, Balleine, Watt, Gonzalez, & Boakes, 1995; Holland, 2004; Thrailkill & Bouton, 2015), particularly if they are reinforced on interval schedules of reinforcement (Dickinson, Nicholas, & Adams, 1983). Recent research in this laboratory has suggested that habit may develop during training as the reinforcer becomes more and more predictable (Thrailkill, Trask, Vidal, Alcalá, & Bouton, 2018). For example, in one experiment, Thrailkill et al. studied lever pressing that was reinforced in the presence of a discriminative stimulus (S), but not in its absence. In one group, lever pressing was reinforced in S on an interval schedule in such a way that every S presentation contained a reinforcer. In another group, lever pressing was reinforced at the same overall rate but under conditions in which only 50% of the S presentations contained a reinforcer. After extensive training, reinforcer devaluation tests with the taste aversion method revealed that lever pressing in the 100% group had become a habit, whereas lever pressing in the 50% group had remained an action. The authors suggested that in the 100% group, the reinforcer had become highly predictable, whereas in the 50% group it had not. As a consequence, the 100% group paid less attention to its behavior, and the behavior was emitted automatically-- a habit developed. The idea is an application of the Pearce-Hall model’s (Pearce & Hall, 1980) Pavlovian attention rule to instrumental conditioning. According to that rule, organisms pay less and less attention to a conditioned stimulus (CS) during conditioning as the reinforcer becomes predictable. But they continue to pay attention to CSs if the reinforcer is kept unpredictable. 
For example, Kaye and Pearce (1984) found that rats gradually paid less attention to a CS that was always paired with a reinforcer, but kept attending to one that was reinforced only 50% of the time (see also Wilson, Boumphrey, & Pearce, 1992). In Thrailkill et al.’s experiment, the uncertainty about presentation of a reinforcer in S in the 50% group might similarly maintain surprise and the animal’s attention to its behavior. The results were thus compatible with the Pearce-Hall attention rule, but they were not anticipated by other possible mechanisms of habit learning—such as the simple law of effect (e.g., de Wit & Dickinson, 2009; Wood & Runger, 2016) or the experience of a low correlation between behavior rate and reward rate (Dickinson, 1985; Dickinson & Perez, 2018).
Although the Thrailkill et al. results begin to provide insight into conditions that might create habits, what are the conditions that might break them? The present experiments began as a test of the complementary prediction that a habit might return to action status if the reinforcer is made surprising again. According to the Pearce-Hall rule, a surprising reinforcer substituted for a predicted one would increase attention to the predictor (or here, the instrumental behavior) again (e.g., Kaye & Pearce, 1984; Hall & Pearce, 1982; Wilson et al., 1992). Rats were first trained to lever press with a method that prior work in this laboratory suggested would make the response habitual. In Experiments 1 and 2, lever pressing was reinforced with either grain or sucrose pellets (Outcome 1); an experimental group was then given a final instrumental training session in which the other reinforcer (Outcome 2) was substituted for the first. The question was whether the surprising second outcome would return the habit back to an action, and thus make it sensitive to the reinforcer devaluation treatment. In Experiment 1, we confirmed that it did when habit development was assessed with the taste aversion devaluation method. In Experiment 2, we tested the idea again with the sensory-specific satiety procedure. Surprisingly, whether the habitual response was associated with a surprising outcome or not, the response at test took the form of a goal-directed action. In Experiment 3, we therefore tested the possibility that the pellets consumed during prefeeding in the specific satiety procedure might themselves convert a habit back to action. When assessed with taste aversion devaluation, a control group had a habit, whereas identically-trained rats that were prefed on an irrelevant pellet just before the test had an action. The results thus suggest that unexpected pellets, whether they are contingent on the response or not, can return a habit to the status of goal-directed action.
Experiment 1
The design of the first experiment is summarized in Table 1. During the instrumental training phase, rats were reinforced for lever pressing on a Random Interval (RI) schedule of reinforcement until they had received 360 occasions in which the response had been paired with a reinforcer. This type and amount of instrumental training is sufficient to create a habit (e.g., Dickinson et al., 1995; Thrailkill & Bouton, 2015; Schoenberg, Sola, Seyller, Kelberman, & Toufexis, 2019). Half the rats were reinforced with grain pellets, and half were reinforced with sucrose pellets (O1). In a final training session, rats in the experimental condition had sucrose substituted for grain or grain substituted for the sucrose outcome (O2). Control rats continued to receive the original outcome (O1). After a reinforcer devaluation phase in which half of each group had O1 paired with LiCl and the other half had it unpaired, lever pressing was tested in extinction. We expected habit in the control groups—there should be no difference in instrumental responding between the paired and unpaired groups. But if the surprising O2 had returned the habit back to action, there should be a reinforcer devaluation effect in the experimental groups—the paired group should show suppressed responding relative to the unpaired group.
Table 1.
Designs of the experiments
| Group | Acquisition | Aversion Conditioning | Prefeeding | Test |
|---|---|---|---|---|
| Experiment 1 | | | | |
| Control Paired | 13 R-O1 | O1 – LiCl | -- | R? |
| Control Unpaired | 13 R-O1 | O1 / LiCl | -- | R? |
| O2 Paired | 12 R-O1, 1 R-O2 | O1 – LiCl | -- | R? |
| O2 Unpaired | 12 R-O1, 1 R-O2 | O1 / LiCl | -- | R? |
| Experiment 2 | | | | |
| Control | 19 R-O1 | -- | O1, O1, O1…; O2, O2, O2… | R?; R? |
| O3 | 18 R-O1, 1 R-O3 | -- | O1, O1, O1…; O2, O2, O2… | R?; R? |
| Experiment 3 | | | | |
| Control Paired | 20 R-O1 | O1 – LiCl | -- | R? |
| Control Unpaired | 20 R-O1 | O1 / LiCl | -- | R? |
| PF Paired | 20 R-O1 | O1 – LiCl | O2, O2, O2… | R? |
| PF Unpaired | 20 R-O1 | O1 / LiCl | O2, O2, O2… | R? |
Note: O1, O2, and O3 represent different food pellets; R indicates the instrumental response; numbers (1, 12, 13, 18, 19, 20) indicate the number of sessions in acquisition; -- represents no treatment.
Method
Subjects
Thirty-two female Wistar rats (Charles River, St. Constant, Quebec, Canada), aged 75–90 days at the start of the experiment, were individually housed in a room maintained on a 16:8 hr light:dark cycle. Experimental sessions were conducted during the light portion of the cycle at the same time each day. Rats were food deprived and maintained at 80% of their free-feeding weights for the duration of the experiment. They had ad libitum access to water in their home cages. Weights were maintained by supplemental feeding approximately 30 min after each experimental session.
Apparatus
The apparatus consisted of two unique sets of four conditioning chambers (Model ENV008-VP; Med Associates, Fairfax, VT) housed in separate rooms of the laboratory. Each chamber was in its own sound attenuation chamber. All boxes measured 30.5 × 24.1 × 23.5 cm. The side walls and ceiling were made of clear acrylic plastic, and the front and rear walls were made of brushed aluminum. A recessed food cup measured 5.1 × 5.1 cm and was centered on the front wall approximately 2.5 cm above the grid floor. Two retractable levers (Model ENV-112CM, Med Associates) were located on the front wall on either side of the food cup. The levers were 4.8 cm long and 6.3 cm above the grid floor. Levers protruded 1.9 cm from the front wall when extended. The right lever was never used (and remained retracted throughout all experimental sessions). The chambers were illuminated by a 7.5-W incandescent bulb mounted to the ceiling of the sound attenuation chamber, 34.9 cm from the grid floor; ventilation fans provided background noise of 65 dBA. The two sets of boxes had unique features that allowed them to serve as different contexts but were not used for that purpose here. In one set of boxes, the grids of the floor were spaced 1.6 cm apart (center to center). The ceiling and a side wall had black horizontal stripes, 3.8 cm wide and 3.8 cm apart. In the other set of boxes, the floor consisted of alternating stainless steel grids with different diameters (0.5 and 1.3 cm, spaced 1.6 cm apart). The ceiling and sidewall were covered with dark dots (2 cm in diameter). There were no other distinctive features between the two sets of chambers. Two types of 45-mg food pellets were used as reinforcers: Grain (5-TUM) and sucrose (5-TUT; TestDiet MLab Rodent Tablets, Richmond, IN). The apparatus was controlled by computer equipment in an adjacent room.
Procedure
Food restriction began one week prior to the beginning of training. Animals were handled each day and maintained at their target body weight with supplemental feeding.
Magazine training.
On the first day, rats received two sessions of magazine training in the operant chambers in which the rats were trained to eat from the food magazine. Each session was conducted with a different food pellet (grain or sucrose) in a counterbalanced order. During these sessions, levers were retracted and 30 pellets were delivered according to a random time (RT) 30 s schedule.
Instrumental training.
Instrumental training began the day following magazine training. One session was conducted each day for a total of 12 days. Each session consisted of the insertion of the left lever and then its retraction after 30 reinforced responses with O1. Half the rats (n = 16) received grain pellets for O1 and the other half (n = 16) received sucrose pellets for O1. On the first day of response training, lever pressing was reinforced with O1 on a random interval (RI) 2 s schedule. On the second day, lever pressing was reinforced on an RI 15 s schedule. On the third day, the RI schedule increased to RI 30 s for the remaining sessions. The rats received a total of 12 response training sessions, or 360 total reinforced responses.
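The random-interval contingency described above can be sketched as a simple simulation. This is purely illustrative and not part of the original procedure: each second the schedule "arms" a reinforcer with probability 1/t, and the first response after arming collects it; the per-second response probability is a hypothetical stand-in for the rat's press rate.

```python
import random

def simulate_ri(mean_interval_s=30, session_s=1800, p_response_per_s=0.5, seed=0):
    """Illustrative simulation of a random-interval (RI) schedule.

    Each second, a reinforcer is armed with probability 1/mean_interval_s;
    the first response after arming is reinforced. `p_response_per_s` is a
    hypothetical parameter standing in for the animal's response rate.
    """
    rng = random.Random(seed)
    armed = False
    reinforcers = 0
    for _ in range(session_s):
        if not armed and rng.random() < 1.0 / mean_interval_s:
            armed = True  # reinforcer set up at a random moment
        if armed and rng.random() < p_response_per_s:
            reinforcers += 1  # first response after setup collects it
            armed = False
    return reinforcers
```

Under these assumptions, an RI 30-s schedule caps earnings at roughly two reinforcers per minute regardless of how fast the animal responds, which is why the sessions above could simply terminate after 30 reinforced responses.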
Experimental manipulation.
On Day 14, rats were matched on RI 30 s response rates and then assigned to the Experimental (Group O2) or the Control (Group Control) group in a manner that counterbalanced operant chambers. Group Control received one more session of response training on a RI 30-s reinforcement schedule with O1. Group O2 received the same response training session with O2 (i.e., grain pellets if O1 was sucrose and sucrose pellets if O1 was grain).
Aversion conditioning.
Aversion conditioning with the O1 pellet then proceeded over the next 12 days. The procedure involved pairing the pellets with lithium chloride (LiCl) for the paired groups. On the first day of each two-day cycle, Paired rats (half of both the Experimental and Control groups) received 50 noncontingent O1 pellets in the operant chamber on an RT 30-s schedule. They were then removed from the chamber and given an immediate intraperitoneal (i.p.) injection of 20 ml/kg LiCl (0.15 M) and put in the transport box prior to being returned to the home cage. Unpaired rats (the other half of the Experimental and Control groups) received the same exposure to the chamber and LiCl injection without receiving any pellets. On day 2 of each cycle, rats did not receive a LiCl injection. On this day, Unpaired rats received 50 pellets according to an RT 30-s schedule, while Paired rats received exposure to the chamber for the same amount of time as in the preceding pellet session. There were 6 two-day conditioning cycles arranged so that the Paired group received six O1-LiCl pairings. In order to maintain equivalent pellet exposure during aversion conditioning, the Unpaired group was only allowed to consume the average number of pellets eaten by the Paired group on each pellet trial.
Test.
All rats were tested for lever pressing during a 10-min session in which the lever was present, but presses had no scheduled consequences.
Consumption test.
On the next day, animals were returned to conditioning boxes where they received ten O1 reinforcers delivered to the food cup on a RI 30-s schedule, and the number consumed was counted.
Reacquisition test.
On the final day, animals were given two 15-min reacquisition sessions that proceeded as a typical instrumental training session, with response-contingent reinforcers delivered on a RI 30-s schedule. One session used O1 and the other used O2 (order counterbalanced).
Data analysis.
The results were evaluated with analysis of variance (ANOVA). During the Test phase, our a priori focus was on planned comparisons between the Paired and Unpaired groups in each condition, because such comparisons are how goal-directed actions and habits are identified. To minimize the Type I error rate, we report only those comparisons, which are orthogonal, using the error term from the overall ANOVA. Because of positive skew in the test data, we converted the test scores to logarithms (base 10) before statistical analysis. Both raw lever-press rates and the logged rates are reported.
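The log transform described above can be sketched as follows. The data are hypothetical, and the `offset` argument is an assumption on our part: adding a constant before logging is a common way to keep zero rates defined, but the paper does not state how (or whether) zero rates were handled.

```python
import math

def log10_rates(presses, minutes, offset=1.0):
    """Convert press counts to rates and take log10 to reduce positive skew.

    `offset` is added before logging so that zero rates are defined
    (an assumption; the original analysis may have handled zeros differently).
    """
    return [math.log10(p / minutes + offset) for p in presses]
```

For example, counts of 0, 9, and 99 presses in one minute become logged rates of 0.0, 1.0, and 2.0, compressing the long right tail that raw rates typically show.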
Results
Instrumental Training.
Instrumental conditioning proceeded without incident (Figure 1, Panel a). Response rates in Session 13 (at right) did not differ as a function of the pellet type received. These observations were supported by a Group (O2, Control) by Devaluation (Paired, Unpaired) by Session (1–13) ANOVA, which found a significant effect of session, F(12, 336) = 55.85, MSE = 40.69, p < .001, and no other significant effects or interactions, largest F(1, 28) = 1.42, MSE = 983.40. A Group by Devaluation ANOVA on response rates in the final session prior to reinforcer devaluation revealed no significant differences, Fs < 1.
Figure 1:

Results of Experiment 1. a.) Instrumental training phase (means and standard errors of the mean, SEMs). b.) Performance in the extinction test of the instrumental response (means and SEMs). O2 = surprising outcome in the final training session; Control = no surprising outcome; Paired and Unpaired = O1 pairings (or unpairing) with LiCl.
Test.
Aversion conditioning was uneventful; on the final trial of the devaluation phase, Group O2 Paired consumed 0 pellets and Group Control Paired consumed a mean of 0.2 pellets, while the Unpaired groups ate all that were offered. As shown in Figure 1 (Panel b), a reinforcer devaluation effect was clearly present in Group O2 (left), but not in Group Control (right) [log means (standard error of the mean) were 0.91 (0.07) and 1.20 (0.05) for Groups O2 Paired and O2 Unpaired, and 0.91 (0.08) and 1.09 (0.04) for Groups Control Paired and Control Unpaired]. The planned comparisons confirmed a significant devaluation effect (difference between the Paired and Unpaired groups) in the O2 groups, F(1, 28) = 9.96, MSE = 0.06, p = .004, η2 = .26, suggesting that the response was a goal-directed action. For Group Control, the difference between the Paired and Unpaired groups did not approach significance, F(1, 28) = 1.87.
The effectiveness of the aversion conditioning treatment was confirmed in the consumption and reacquisition tests; Groups O2 Paired and Control Paired ate a mean of 0.5 and 0.6 of the 10 pellets that were offered, while rats in both Unpaired groups consumed all of them. The averted pellet was also unable to support reacquisition of the instrumental response; Unpaired rats increased instrumental responding in the presence of the O1 reinforcer during the reacquisition test, whereas Paired rats in either condition did not.
Discussion
Reinforcer devaluation did not affect the instrumental response in the control groups, confirming that the training procedure had generated a behavior that was indeed independent of the reinforcer’s value—i.e., was a habit. However, a new reinforcer introduced during the final session of instrumental training made the response sensitive to the effect of reinforcement devaluation in the experimental (O2) groups. The results are thus consistent with the view that an unexpected reinforcer can return a habit to the status of a goal-directed action.
It is interesting that the devaluation effect in the O2 groups took the form of higher responding in the unpaired group relative to the other groups. Rescorla (1996, 1997) has shown that when a new appetitive outcome (O2) is substituted for another appetitive outcome (O1) in instrumental learning, O2 creates a temporary inhibition of the response-O1 association, which then spontaneously recovers over time and increases the rate of instrumental responding. It is conceivable that the same effect occurred here over the 12-day reinforcer devaluation phase, causing an elevated baseline in the O2 Unpaired group.
Experiment 2
The second experiment sought to extend the results of Experiment 1 by using the sensory-specific satiety devaluation method. Here, instead of receiving aversion conditioning with the reinforcer, the animals were fed the reinforcer to satiety immediately before lever pressing was tested in extinction. As noted earlier, if the response is a goal-directed action, specific satiety to the O1 outcome should depress the response relative to a condition in which the rat is prefed an irrelevant pellet that was not associated with the instrumental response.
The design of the experiment is sketched in Table 1. It was conceptually the same as Experiment 1, but used the specific satiety devaluation method. In addition to the final test shown in the table, the rats were given an early test after initial instrumental training (not shown) to confirm the method’s usefulness in detecting action. Then, after continuing the training so as to create a habit, rats in an experimental condition received a surprising reinforcer contingent on the instrumental response during a final acquisition session (as in Experiment 1). If habits reconvert to action when associated with a surprising reinforcer, we expected to see action in the experimental group and a habit in the controls, as in Experiment 1, but here when tested with sensory-specific satiety.
Method
Subjects and apparatus
The subjects were 32 female Wistar rats purchased from the same vendor as those in Experiment 1 and maintained under the same conditions. The apparatus was also the same. Animals were food deprived to 95% of their free-feeding weights over the course of five days prior to instrumental training and were maintained at that weight throughout the experiment with supplemental feeding approximately 1 hr after each day’s training.
Reinforcement consisted of the delivery of a 45-mg food pellet into the food cup. Three types of pellets were used: the grain (5-TUM) and sucrose (5-TUT) pellets used in Experiment 1, and a new sweet-fatty pellet (5-C2R; TestDiet MLab Rodent Tablets, Richmond, IN). Grain and sucrose pellets were used as either operant (O1) or control (O2) reinforcers in the specific satiety tests. Sweet-fatty pellets were used as a surprising outcome (O3) in a single session prior to the test.
Procedure
Pellet preexposure.
Animals were separately exposed to each pellet type across three 40-minute sessions during which they were permitted to consume 30 pellets. These exposure sessions occurred in the home cage, with pellets presented in ceramic ramekins (9 cm × 4.5 cm) placed in the corner of the home cage. This procedure familiarized the rats with presentations of pellets in the home cage.
Magazine training.
On the following day, all animals received three sessions of magazine training in the conditioning chamber. In each session, a single type of pellet (grain, sucrose, or sweet-fatty; counterbalanced order) was delivered to the food cup on an RT 30-s schedule with both levers retracted. Each session terminated upon delivery of 30 pellets.
Response training.
All animals were then assigned to either grain or sucrose as the lever-press outcome (O1). Thus, lever presses were associated with sucrose pellets for half the animals and grain pellets for the other half. On the day following magazine training, animals were placed in the conditioning boxes for the first of four daily lever-training sessions. The left lever was inserted and presses were reinforced according to a RI 2-s schedule the first day, a RI 15-s schedule the second, and a RI 30-s schedule the third and fourth. Each session terminated and levers retracted once each animal had received 30 pellets.
To control for exposure to O1 and O2, animals were allowed to consume an equivalent number of the non-trained pellet in the home cage each day. Thus, rats for whom lever-pressing was reinforced by 30 sucrose pellets received 30 grain pellets in the home cage and vice-versa. O2 pellets were presented in the ramekins placed in the corner of the home cage, as they had been in initial pellet exposures. O2 exposures occurred either one hour prior to, or one hour following, the instrumental training session on a two-day alternating schedule (AABB).
Action test.
Following four days of training and 120 response-outcome pairings, a sensory-specific satiety procedure was used to assess the sensitivity of the response to outcome devaluation. On each of two days, animals were allowed to feed freely on 20 g of either O1 (devalued condition) or O2 (valued condition) for one hour prior to the test. Pellets were again presented in ceramic ramekins in the home cage. Immediately following the prefeeding, animals were transported to the conditioning chambers, where they underwent a three-minute extinction test in which lever presses were recorded, but no reinforcers were delivered. (The brief test was designed to minimize carryover of extinction from the first to second tests.) The order in which pellet types (O1 vs. O2) were presented was counterbalanced.
Extended response training.
Following the action test, animals received eight more daily training sessions that proceeded as previously described. Animals then completed six additional days of training on a RI 60-s schedule. (A leaner schedule was used to improve the probability of observing habit; Garr, Bushra, & Delamater, 2017.) All extended training sessions terminated upon delivery of 30 pellets. At the end of the phase, all animals had received a total of 540 response-outcome pairings. Daily O2 presentations in the home cage proceeded as previously described.
Experimental manipulation.
On the final day of training, lever presses for half the animals produced the unexpected sweet-fatty O3 pellet on the RI 60-s schedule. Lever-presses for the other animals continued to deliver the regular O1. All animals also received the usual O2 exposure in the home cage on this day.
Final test.
On the two days following the experimental manipulation, lever pressing was tested with the sensory-specific satiety procedure. As before, rats were prefed with O1 and O2 (order counterbalanced) and lever pressing was assessed in extinction for 3 min.
Results
Action training and action test.
The animals acquired the lever-press response without incident (Figure 2, Panel a). During the devaluation test that followed the initial training (Figure 2, Panel b), the animals exhibited reduced response rates in the devalued condition compared to the valued condition, F(1, 31) = 8.78, MSE = 3.19, p = .006. The significant difference in response rates between valued and devalued conditions indicates that the behavior was a goal-directed action. The animals ate a mean of 9.5 and 9.6 g during prefeeding of O1 (devalued condition) and O2 (valued condition), respectively.
Figure 2:

Results of Experiment 2. a.) Instrumental training phase (means and SEMs). Vertical dashed line indicates the occasion of the Action Test. b.) Instrumental responding during the first extinction test (Action Test). (Means and standard error bars that have been corrected for within-subject comparison, Cousineau & O’Brien, 2014). c.) Instrumental responding during the final Test (Means and error bars corrected for within-subject comparison). O3 = surprising outcome in the final instrumental training session; Control = no surprising outcome; Devalued and Valued = Prefeeding with O1 or O2, respectively, immediately prior to testing.
Extended training and experimental manipulation.
All animals completed extended training without incident (Figure 2, Panel a, right). A Group (O3, Control) by Session (1–18) ANOVA on all the acquisition sessions found an effect of session, F(17, 510) = 67.95, MSE = 32.40, p < .001, and no other significant effects or interactions, Fs < 1. On the last day of training prior to the surprising-outcome manipulation, there was no difference in response rates between the experimental and control groups, F(1, 30) < 1. When lever presses were reinforced in the next session by O3 for half the subjects, there was a trend toward a reduction in response rates that fell well short of statistical reliability, F(1, 30) = 2.82, MSE = 137.10, p = .104.
Final test.
Following extended training, responding in both the O3 and Control groups was suppressed by the sensory-specific satiety manipulation (Figure 2, Panel c) [log means (standard error of the mean) were 0.63 (0.09) and 0.83 (0.04) for Group O3 in the Devalued and Valued conditions, and 0.45 (0.09) and 0.75 (0.06) for the Control groups in those conditions]. Planned comparisons found significant effects of devaluation in both the O3 and Control groups, F(1, 30) = 6.08, MSE = .08, p = .020, η2 = .17, and F(1, 30) = 12.17, p = .002, η2 = .29, suggesting goal-directed action in both conditions. The animals ate a mean of 8.7 g and 9.2 g during prefeeding with O1 (devalued condition) and O2 (valued condition), respectively.
Discussion
The sensory-specific satiety procedure was successful at detecting goal-directed action. Indeed, whenever rats were prefed on the pellet that had reinforced lever pressing (O1), lever pressing was suppressed relative to a condition in which the rats were prefed the same amount of an equally familiar control pellet that had not been associated with the instrumental response (O2). The response was thus sensitive to the current value of its goal, and was a goal-directed action. However, even after a total of 540 response-outcome pairings, sensory-specific satiety continued to suppress responding in the control group. Although the results suggest that sensory-specific satiety provides a good way to detect goal-directed action, the experiment produced no evidence of habit. The results are therefore ambiguous with respect to whether the surprising reinforcer delivered at the end of training converted a habit back to action, as it did in Experiment 1.
Experiment 2 was actually one of three unpublished experiments we conducted in an attempt to adapt the specific satiety procedure to detecting habit after extended instrumental training. Unfortunately, in our hands, prefeeding on the reinforcing food pellet (compared with an equally familiar control pellet, as here) always suppressed lever-pressing, even after instrumental training protocols that had produced habits assessed with the taste aversion procedure (e.g., Experiment 1; Thrailkill & Bouton, 2015). Moreover, the results of a related series of experiments in our laboratory (Trask, Shipman, Green, & Bouton, submitted), which will be described further in the General Discussion, suggested that presentation of surprising reinforcers that are not contingent on the behavior at the end of instrumental training can turn a habit (assessed with the taste aversion procedure) into action again. These considerations led us to suppose that the unexpected pellets presented in the specific satiety procedure might themselves return a habit to a goal-directed action. That hypothesis was therefore tested in the third experiment.
Experiment 3
The idea that a habit can be returned to action in the moment of the test is not inconsistent with what we know of everyday habits, which can readily return to action status when the contemporary conditions encourage it. For example, one of us has caught himself driving to work instead of the grocery store on weekend mornings, only to recover in the middle of the mistake and successfully find his way to the grocery store (thus exercising goal direction). We thus asked whether free reinforcers presented immediately before the test could also return a habit to action.
The design of the experiment is summarized in Table 1. Rats again received instrumental training with an extended training procedure that produces habit. In the next phase, two groups then received the Paired taste aversion conditioning procedure used in Experiment 1 to devalue the O1 reinforcer, while two groups received the Unpaired treatment. Then, on the test day, rats in the Prefed (PF) Paired and Unpaired groups received an opportunity to feed for 30 min on an irrelevant pellet (O2) immediately before the instrumental response was tested in extinction. The question was whether consuming the irrelevant pellets would convert the habit back to action, making the Paired group with the taste aversion to O1 suppress its responding relative to the Unpaired group.
Method
Subjects and apparatus
Subjects were 32 naïve female Wistar rats from the usual supplier, 75 to 90 days old at the start of the experiment. They were housed in the same way as the animals in Experiments 1 and 2. Training sessions occurred during the light phase of the 16:8 hr light:dark cycle. Animals were food deprived to 95% of their free-feeding weights, as in Experiment 2. The apparatus was the same as that used in Experiments 1 and 2. Reinforcement consisted of the grain and sucrose pellets previously described.
Procedure
Pellet preexposure.
Following five days of food deprivation in the home cage, all animals received two days of exposure to both grain and sucrose pellets in a counterbalanced order. On the first day, two exposures of approximately 60 pellets each were given in ceramic ramekins placed in a corner of the home cage, one exposure with grain pellets and one with sucrose pellets. The procedure was repeated in the reverse order on the second day.
Magazine training.
On the next day, each rat was placed in the conditioning chamber, with the levers retracted, for a single magazine training session. At this time, the rats received the pellet that was to be used as the instrumental outcome (grain for half the animals, sucrose for the other half). Pellets were delivered to the food cup on a random-time (RT) 30-s schedule. Each session terminated after 30 min.
Instrumental training.
On the day following magazine training, animals began instrumental training with the reinforcer used in magazine training. There were two 30-min sessions each day in which lever pressing was reinforced on the RI 30-s schedule. Training took place over ten days for a total of twenty training sessions, allowing approximately 1200 response-outcome pairings. The twice-daily training procedure was used to be consistent with methods used in other recent experiments on habit-action conversion in this laboratory (Trask et al., submitted; see also Shipman, Trask, Bouton, & Green, 2018).
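The random-interval contingency used in training can be illustrated with a small simulation. This is a sketch under assumed parameters (a constant-probability setup check once per second and a hypothetical steady press rate), not the authors' actual session-control code:

```python
import random

def simulate_ri_session(session_s=1800, mean_interval_s=30,
                        press_rate_per_s=0.5, seed=0):
    """Simulate lever pressing on a random-interval (RI) schedule.

    Each second, a reinforcer is 'set up' with probability
    1/mean_interval_s; once set up, the next lever press is
    reinforced. Returns (presses, reinforcers earned).
    """
    rng = random.Random(seed)
    available = False
    presses = reinforcers = 0
    for _ in range(session_s):
        # With p = 1/30 per second, setups occur every ~30 s on average.
        if not available and rng.random() < 1.0 / mean_interval_s:
            available = True
        # Model the rat's pressing as a constant-probability process.
        if rng.random() < press_rate_per_s:
            presses += 1
            if available:
                reinforcers += 1
                available = False
    return presses, reinforcers

presses, reinforcers = simulate_ri_session()
# A 30-min RI 30-s session yields on the order of 60 reinforcers,
# so two sessions per day for ten days approaches the ~1200
# response-outcome pairings described above.
```

Note that on an RI schedule the reinforcer becomes available at unpredictable times, so faster pressing raises the reinforcement rate only weakly; this weak response-rate/reward-rate correlation is one reason interval schedules are favored for producing habit.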
Reinforcer devaluation.
Following instrumental training, animals were assigned to either Paired or Unpaired groups and then received taste aversion conditioning with the pellet type used during instrumental training following the procedure used in Experiment 1. There were 6 taste aversion conditioning trials.
Test.
Immediately prior to a 10-min instrumental extinction test, half of the Paired and half of the Unpaired animals (the PF condition) were allowed to feed freely on 20 g of the pellet type that had not been used in the instrumental and devaluation phases (i.e., O2). Pellets were presented in ceramic ramekins placed, as before, in a corner of the home cage. PF rats were allowed to eat for 30 min; the extinction test, in which performance of the trained lever-press response was assessed, began immediately afterward.
Consumption and reacquisition tests.
On the next day, animals were returned to the conditioning chambers, where they received ten O1 reinforcers delivered to the food cup on an RI 30-s schedule. On the final day, animals were given one 30-min reacquisition session that proceeded as a typical instrumental training session: the levers were extended and response-contingent reinforcers were delivered on an RI 30-s schedule.
Results
Instrumental training.
All animals acquired the lever-press response without incident (Figure 3, Panel a). Response rates at the end of acquisition did not differ based on group assignment or devaluation condition. A Group (PF, Control) by Devaluation (Paired, Unpaired) by Session (20) ANOVA found a significant effect of session, F(19, 532) = 76.98, MSE = 54.79, p < .001, and no other effects or interactions, largest F(1, 28) = 1.24. A Group by Devaluation ANOVA on response rates in the final session prior to reinforcer devaluation revealed no significant effects or interactions, largest F(1, 28) = 1.58, MSE = 460.43.
Figure 3:
Results of Experiment 3. (a) Instrumental training phase (means and SEMs). (b) Performance in the extinction test of the instrumental response (means and SEMs). PF = prefed with O2 prior to the extinction test; Control = not prefed with O2; Paired and Unpaired = O1 pairings (or unpairings) with LiCl.
Test.
Aversion conditioning proceeded normally; on the final trial of the devaluation phase, Group PF Paired consumed a mean of 1.1 pellets and Group Control Paired consumed a mean of 0.2 pellets, whereas the Unpaired groups ate all of them. The test results are summarized in Figure 3, Panel b [log means (standard error of the mean) were 0.40 (0.17) and 0.89 (0.14) for Groups PF Paired and PF Unpaired, and 0.83 (0.14) and 0.92 (0.12) for Groups Control Paired and Control Unpaired]. There was no evidence of a devaluation effect in the control groups, suggesting habit. However, there was a devaluation effect in the prefed groups. Planned comparisons confirmed a significant effect of devaluation in the prefed groups, F(1, 28) = 5.70, MSE = .17, p = .024, η² = .17, but no evidence of one in the control groups that were not prefed, F(1, 28) < 1. Thus, a habit was evidently converted to an action by the prefeeding manipulation. The prefed groups ate a mean of 8.2 g (Group Paired) and 7.7 g (Group Unpaired) during prefeeding.
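The planned comparisons reported above can be reconstructed from the cell means and the pooled error term. The sketch below assumes n = 8 per group (32 rats in four groups) and uses the reported log-rate means and MSE; it illustrates the contrast arithmetic only, and is not the authors' analysis script:

```python
def planned_contrast_F(mean1, mean2, n_per_group, mse):
    """F for a two-group planned comparison using the pooled
    error term (MSE) from the omnibus ANOVA.

    F = n * (m1 - m2)**2 / (2 * MSE), evaluated on df = (1, df_error).
    """
    return n_per_group * (mean1 - mean2) ** 2 / (2 * mse)

# Reported cell means (log response rates) and pooled MSE; n = 8
# per group is assumed here.
F_pf = planned_contrast_F(0.89, 0.40, 8, 0.17)       # PF Paired vs. PF Unpaired
F_control = planned_contrast_F(0.92, 0.83, 8, 0.17)  # Control Paired vs. Unpaired
# F_pf comes out near the reported F(1, 28) = 5.70, and the control
# contrast is well under 1, matching the null result in the text.
```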
The effectiveness of the aversion conditioning treatment in both Paired groups was confirmed in subsequent consumption and reacquisition tests; Groups PF Paired and Control Paired each ate a mean of 0.1 of the 10 pellets offered, whereas rats in both Unpaired groups consumed all of them. The averted pellet was unable to support reacquisition of the instrumental response; Unpaired rats increased instrumental responding in the presence of the reinforcer, whereas Paired rats in both the PF and Control conditions did not.
Discussion
Rats that received extensive instrumental training in which lever pressing was associated with O1 continued to lever press during testing after O1 had been averted through taste aversion conditioning. Thus, the control groups demonstrated habit. However, when similar groups were first allowed to prefeed on O2, lever pressing was sensitive to the averted value of the O1 reinforcer—and the response was a goal-directed action. The results thus suggest that prefeeding on an irrelevant outcome just before a test can return a habit to the status of goal-directed action.
One surprise, of course, is that the prefeeding procedure used here is similar to one that might be used to devalue reinforcers through sensory-specific satiety (e.g., Experiment 2). Note that the present experiment devalued an irrelevant pellet through prefeeding; prefeeding itself did not devalue the instrumental reinforcer (the prior pairings of O1 with LiCl did). Note also that we did not use a procedure here that controlled the amount of exposure to O1 and O2 during instrumental training (cf. Experiment 2). But the current results suggest that although sensory-specific satiety may be a good way to determine whether a response is a goal-directed action, it may not be as good a method for detecting habits. Although specific satiety has sometimes been used this way (e.g., Urcelay & Jonkman, 2019), many studies have instead used the method to determine whether a neural and/or pharmacological manipulation makes the response less goal-directed (e.g., Bradfield, Hart, & Balleine, 2018; Corbit, Kendig, & Moul, 2019; Furlong, Supit, Corbit, Killcross, & Balleine, 2017). The current results suggest that one reason why we did not find evidence of habit with a sensory-specific satiety procedure in Experiment 2 (and in our other unpublished experiments) may be that unexpectedly consuming pellets during prefeeding converts the habit to a goal-directed action. It evidently did so in the present experiment.
General Discussion
Habits often emerge when a goal-directed action is reinforced repeatedly (e.g., Adams, 1982; Dickinson et al., 1995; Holland, 2004; Thrailkill & Bouton, 2015), perhaps because the reinforcer becomes predictable (Thrailkill et al., 2018). The current experiments identified two treatments that appear to return a habit to the status of a goal-directed action. Experiment 1 tested the hypothesis, based on the results of Thrailkill et al. (2018), that reinforcing a habitual response with a surprising outcome might return it to action status by boosting attention to the response (cf. Pearce & Hall, 1980). When a response had received 360 pairings with O1 on an interval schedule of reinforcement, taste aversion conditioning of O1 failed to suppress the response when it was tested in extinction, suggesting that the behavior was a habit. However, when the response earned a surprising outcome (O2) in a final training session, responding was suppressed after aversion conditioning to O1, suggesting that O2 had converted the response to a goal-directed action. Experiment 3 asked whether unexpected pellets just before the test could also return a habit to an action. After approximately 1,200 R-O1 pairings, the response was again a habit, as confirmed by the lack of impact of O1 aversion conditioning. But if animals were given a 30-min opportunity to consume O2 pellets immediately before the test, the response was sensitive to reinforcer devaluation again; the habit had once again returned to action. That result was consistent with our lack of success in using the sensory-specific satiety method to detect habit: In Experiment 2, the response was a goal-directed action even after extensive training, as if prefeeding had also converted the behavior back to action. Overall, the results indicate that an instrumental behavior's status as a habit is not permanent. Either surprising reinforcers at the time of the test (Experiment 3), or surprising response-contingent reinforcers at the end of instrumental training (Experiment 1), can return an extensively-trained habit to a goal-directed action.
The idea that surprising pellets can return a habit to action is consistent with other results obtained in this laboratory. Shipman et al. (2018) produced little evidence that a response was a habit even after an extensive amount of instrumental training. In two experiments, they extensively trained one response (R1) over a series of sessions in Context A (approximately 1,440 response-reinforcer pairings earned on a VI 30-s schedule). At the end of that training, they reinforced a new response (R2) in Context B during sessions that were intermixed with the last R1 training sessions in Context A. Unexpectedly, both R1 and R2 were goal-directed actions: Both were suppressed by devaluation of the reinforcer via taste aversion conditioning. In experiments that pursued the finding further, Trask et al. (submitted) found that the intermixed sessions of reinforcement of R2 had converted R1 from a habit to an action. As before, R1 was an action when R2 was reinforced in intermixed sessions in Context B, but it was a habit if the rats received only intermixed exposures (without pellets) to Context B. Other results indicated that the R2 training sessions had to be intermixed with R1 training sessions; if R2 training occurred after R1 training was finished, R1 remained a habit. Finally, reinforcers that were either contingent or not contingent on R2 returned R1 to action status, and R2 could also be paired with a different type of pellet. Like the present results, the results of Trask et al. and Shipman et al. suggest that a behavior's habit status is not permanent. They also suggest that unexpected exposure to food pellets (in this case intermixed with the final sessions of habit training) can return a habit to a goal-directed action. It may be worth noting that the Trask et al. and Shipman et al. studies involved male rats as subjects, suggesting generality of the current findings with females. And it is also worth noting that the results of Trask et al. and Shipman et al. are reminiscent of the finding that intermixed reinforcement of two responses throughout instrumental training may prevent an extensively-trained action from becoming a habit (Colwill & Rescorla, 1985a, 1988; Holland, 2004; Kosaki & Dickinson, 2010). The present results have at least a family resemblance to all of these findings.
Why should unexpected reinforcers at the end of habit training (Experiment 1) or just before the test (Experiment 3) convert a habit back to an action? One possibility could build on the idea that habits develop because the animal pays less attention to a behavior when its reinforcer becomes predictable (Thrailkill et al., 2018). Perhaps unexpected reinforcers redirect attention to a response that has been “tuned out” through this process. It may seem surprising to think that presentation of a reinforcer can nonspecifically increase attention to behaviors that have never been associated with it (e.g., Experiment 3; Trask et al., Experiment 3), as the Pearce-Hall theory would require (Kaye & Pearce, 1984; Hall & Pearce, 1982; Wilson et al., 1992). But a general boost in attention could be functional in helping the animal detect the cause(s) of a new reinforcer. A second possibility is that an animal might pay less attention to its representation of the reinforcer as the reinforcer becomes predictable. This idea is consistent with many theories of associative learning, which often assume that representations of outcomes are processed less when they are predicted by other events (e.g., Rescorla & Wagner, 1972; Wagner, 1978, 1981; Wagner & Brandon, 1989). It is also consistent with the fact that extensively-trained behaviors are less sensitive to the effects of reinforcer devaluation: A less-processed reinforcer representation would be less able to influence instrumental responding. The present results (see also Trask et al., submitted, and Shipman et al., 2018) could suggest that presentation of an unexpected food reinforcer might boost attention to the representation of foods (especially, perhaps, in a food-deprived rat). 
However, either the response-focused or the reinforcer-representation-focused account just sketched would need to accommodate the fact that different reinforcers (Experiment 3; Trask et al., submitted), and even reinforcers presented in other contexts (Experiment 3; Trask et al., submitted), can influence the controlling process.
A third possibility may be especially relevant for the results of Experiment 3, where 30 min of prefeeding on an irrelevant food pellet returned a habit to action status. Recall that the animals ate an average of about 8 g of food at that time—a meal. Prefeeding could thus have moved the animal from an interoceptive state of hunger to one of satiety. Given evidence suggesting that hunger and satiety states can play the role of "contexts" in controlling Pavlovian (e.g., Davidson, 1993) and instrumental (Schepers & Bouton, 2017) behavior, it is possible that Experiment 3 involved a change of context at the time of testing. Previous work suggests that an exteroceptive context switch weakens habits more than goal-directed actions (Thrailkill & Bouton, 2015), and the most recent evidence suggests that a context switch may be sufficient to convert a habit back to goal-directed action (Steinfeld & Bouton, submitted). Thus, the conversion of habit to action in Experiment 3 can be viewed as an effect of context change. It is also conceivable, though much more conjectural, that delivery of a surprising consequence of the response at the end of Experiment 1's habit acquisition phase could have signaled a change of context that similarly returned the habit to action.
At a methodological level, the present results suggest that the taste aversion method of reinforcer devaluation may provide a better technique for detecting habit than sensory-specific satiety. In our hands, either method provides a reliable way to detect goal-directed action. But as suggested by the results of Experiment 3, prefeeding food pellets just prior to the test, as in a sensory-specific satiety procedure, may paradoxically return a habit to action status. Interestingly, at least one recent successful use of the sensory-specific satiety method for detecting habit employed several preexposures to the prefeeding protocol prior to habit testing (Urcelay & Jonkman, 2019, Experiment 1). Perhaps novelty of the procedure is important in allowing it to convert a habit into action, although it is worth noting that Experiment 2 involved two exposures to prefeeding (in the initial test) before the unsuccessful habit test was conducted at the end of the experiment.
At a theoretical level, the findings that habits can revert to action status may be consistent with recent difficulties in confirming habits in experiments with human participants. De Wit et al. (2018) reported several experiments in which humans failed to show evidence of habitual responding after extended instrumental training. Although the results could represent a failure to induce or acquire habit, it is also possible that a habit was successfully acquired but that aspects of the procedures converted the behavior to action status before testing. For example, two of de Wit et al.’s experiments (Experiments 3A and 3B) reported failures to find evidence of habit after reinforcer devaluation with a sensory-specific satiety procedure, which the current Experiment 3 suggests may convert habit to action (but see Tricomi, Balleine, & O’Doherty, 2009). It is worth noting that the implication that habits might be observed under only a narrow set of conditions might also suggest caution in accepting the possibility that habit learning plays a central role in behavior disorders such as addiction (see also Hogarth, 2018).
In summary, perhaps the most important implication of the present findings is that an instrumental response’s status as a habit is not permanent. The results are consistent with a more dynamic view in which instrumental behaviors are seen as moving flexibly between habit and goal-direction. As noted earlier, switching a behavior out of habit status, as when we recover in the middle of a mistake and resume goal orientation, is not an unusual human experience. In addition to studying the conditions that create habits, we need more information on how we break out of habits, too.
Acknowledgments
This research was supported by National Institutes of Health Grant R01 DA 033123 to MEB.
References
- Adams CD (1982). Variations in the sensitivity of instrumental responding to reinforcer devaluation. Quarterly Journal of Experimental Psychology, 34B, 77–98.
- Bradfield LA, Hart G, & Balleine BW (2018). Inferring action-dependent outcome representations depends on anterior but not posterior medial orbitofrontal cortex. Neurobiology of Learning and Memory, 155, 463–473.
- Colwill RM, & Rescorla RA (1985a). Instrumental responding remains sensitive to reinforcer devaluation after extensive training. Journal of Experimental Psychology: Animal Behavior Processes, 11, 520–536.
- Colwill RM, & Rescorla RA (1985b). Postconditioning devaluation of a reinforcer affects instrumental responding. Journal of Experimental Psychology: Animal Behavior Processes, 11, 120–132.
- Colwill RM, & Rescorla RA (1988). Associations between the discriminative stimulus and the reinforcer in instrumental learning. Journal of Experimental Psychology: Animal Behavior Processes, 14, 155–164.
- Corbit L, Kendig M, & Moul C (2019). The role of serotonin 1B in the representation of food outcomes. Scientific Reports, 9, 2497.
- Cousineau D, & O’Brien F (2014). Error bars in within-subject designs: A comment on Baguley (2012). Behavior Research Methods, 46, 1149–1151.
- Davidson TL (1993). The nature and function of interoceptive signals to feed: Toward integration of physiological and learning perspectives. Psychological Review, 100, 640–657.
- de Wit S, & Dickinson A (2009). Associative theories of goal-directed behavior: A case for animal-human translational models. Psychological Research, 73, 463–476.
- de Wit S, Kindt M, Knot SL, Verhoeven AAC, Robbins TW, Gasull-Camos J, Evans M, Mirza H, & Gillan CM (2018). Shifting the balance between goals and habits: Five failures in experimental habit induction. Journal of Experimental Psychology: General, 147, 1043–1065.
- Dickinson A (1985). Actions and habits: The development of behavioral autonomy. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences, 308, 67–78.
- Dickinson A (1994). Instrumental conditioning. In Mackintosh NJ (Ed.), Animal learning and cognition: Handbook of perception and cognition series (2nd ed., pp. 45–79). San Diego, CA: Academic Press.
- Dickinson A, Balleine B, Watt A, Gonzalez F, & Boakes RA (1995). Motivational control after extended instrumental training. Animal Learning and Behavior, 23, 197–206.
- Dickinson A, Campos J, Varga ZI, & Balleine B (1996). Bidirectional instrumental conditioning. Quarterly Journal of Experimental Psychology B, 49, 289–306.
- Dickinson A, Nicholas DJ, & Adams CD (1983). The effect of instrumental training contingency on susceptibility to reinforcer devaluation. Quarterly Journal of Experimental Psychology, 35B, 35–51.
- Dickinson A, & Perez OD (2018). Actions and habits: Psychological issues in dual-system theory. In Morris RW, Bornstein AM, & Shenhav A (Eds.), Goal-directed decision making: Computations and neural circuits (pp. 1–37). Elsevier.
- Furlong TM, Supit ASA, Corbit LH, Killcross S, & Balleine BW (2017). Pulling habits out of rats: Adenosine 2A receptor antagonism in dorsal striatum rescues methamphetamine-induced deficits in goal-directed action. Addiction Biology, 22, 172–183.
- Garr E, Bushra B, & Delamater AD (2017). Habit formation does not depend on the correlation between response rates and reward rates. Society for Neuroscience.
- Hall G, & Pearce JM (1982). Restoring the associability of a pre-exposed CS with a surprising event. The Quarterly Journal of Experimental Psychology B, 34, 127–140.
- Hogarth L (2018). A critical review of habit theory of drug dependence. In Verplanken B (Ed.), The psychology of habit: Theory, mechanisms, change, and contexts. Cham: Springer.
- Holland PC (2004). Relations between Pavlovian-instrumental transfer and reinforcer devaluation. Journal of Experimental Psychology: Animal Behavior Processes, 30, 104–117.
- Kaye H, & Pearce JM (1984). The strength of the orienting response during Pavlovian conditioning. Journal of Experimental Psychology: Animal Behavior Processes, 10, 90–109.
- Kosaki Y, & Dickinson A (2010). Choice and contingency in the development of behavioral autonomy during instrumental conditioning. Journal of Experimental Psychology: Animal Behavior Processes, 36, 334–342.
- Pearce JM, & Hall G (1980). A model for Pavlovian learning: Variations in the effectiveness of conditioned but not of unconditioned stimuli. Psychological Review, 87, 532–552.
- Rescorla RA (1996). Spontaneous recovery after training with multiple outcomes. Animal Learning & Behavior, 24, 11–18.
- Rescorla RA (1997). Spontaneous recovery of instrumental discriminative responding. Animal Learning & Behavior, 25, 485–497.
- Rescorla RA, & Wagner AR (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In Black AH & Prokasy WF (Eds.), Classical conditioning II: Current research and theory (pp. 64–99). New York: Appleton-Century-Crofts.
- Schepers ST, & Bouton ME (2017). Hunger as a context: Food-seeking that is inhibited during hunger can renew in the context of satiety. Psychological Science, 28, 1640–1648.
- Schoenberg HL, Sola EX, Seyller E, Kelberman M, & Toufexis D (2018). Female rats express habitual behavior earlier in operant training than males. Behavioral Neuroscience, 133, 110–120.
- Shipman ML, Trask S, Bouton ME, & Green JT (2018). Inactivation of prelimbic and infralimbic cortex respectively affect expression of minimally-trained and extensively-trained goal-directed actions. Neurobiology of Learning and Memory, 155, 164–172.
- Steinfeld MR, & Bouton ME (submitted). Context and renewal of habits and goal-directed actions after extinction. Manuscript submitted for publication.
- Thrailkill EA, & Bouton ME (2015). Contextual control of instrumental actions and habits. Journal of Experimental Psychology: Animal Learning and Cognition, 41, 69–80.
- Thrailkill EA, Trask S, Vidal P, Alcalá JA, & Bouton ME (2018). Stimulus control of actions and habits: A role for reinforcer predictability and attention in the development of habitual behavior. Journal of Experimental Psychology: Animal Learning and Cognition, 44, 370–384.
- Tricomi E, Balleine BW, & O’Doherty JP (2009). A specific role for posterior dorsolateral striatum in human habit learning. The European Journal of Neuroscience, 29, 2225–2232.
- Trask S, Shipman ML, Green JT, & Bouton ME (submitted). Factors that restore goal-direction to a habitual behavior. Manuscript submitted for publication.
- Urcelay GP, & Jonkman S (2019). Delayed rewards facilitate habit formation. Journal of Experimental Psychology: Animal Learning and Cognition, in press.
- Wagner AR (1978). Expectancies and the priming of STM. In Hulse SH, Fowler H, & Honig WK (Eds.), Cognitive processes in animal behavior (pp. 177–209). Hillsdale, NJ: Lawrence Erlbaum.
- Wagner AR (1981). SOP: A model of automatic memory processing in animal behavior. In Spear NE & Miller RR (Eds.), Information processing in animals: Memory mechanisms (pp. 5–47). Hillsdale, NJ: Lawrence Erlbaum Associates.
- Wagner AR, & Brandon SE (1989). Evolution of a structured connectionist model of Pavlovian conditioning (AESOP). In Klein SB & Mowrer RR (Eds.), Contemporary learning theories: Pavlovian conditioning and the status of traditional learning theory (pp. 149–190). Hillsdale, NJ: Erlbaum.
- Wilson PN, Boumphrey P, & Pearce JM (1992). Restoration of the orienting response to a light by a change in its predictive accuracy. The Quarterly Journal of Experimental Psychology, 44B, 17–36.
- Wood W, & Rünger D (2016). Psychology of habit. Annual Review of Psychology, 67, 289–314.
