Author manuscript; available in PMC: 2022 Feb 1.
Published in final edited form as: Behav Neurosci. 2020 Oct 29;135(1):79–87. doi: 10.1037/bne0000422

Renewal of goal direction with a context change after habit learning

Michael R Steinfeld 1, Mark E Bouton 1
PMCID: PMC8049955  NIHMSID: NIHMS1661788  PMID: 33119327

Abstract

An instrumental action can be goal-directed after a moderate amount of practice and then convert to habit after more extensive practice. Recent evidence suggests, however, that habits can return to action status after different environmental manipulations. The present experiments therefore asked whether habit learning interferes with goal direction in a context-dependent manner like other types of retroactive interference (e.g., extinction, punishment, counterconditioning). In Experiment 1, rats were given a moderate amount of instrumental training to form an action in one context (Context A) and then more extended training of the same response to form a habit in another context (Context B). We then performed reinforcer devaluation with taste aversion conditioning in both contexts, and tested the response in both contexts. The response remained habitual in Context B, but was goal-directed in Context A, indicating renewal of goal direction after habit learning. Experiment 2 expanded on Experiment 1 by testing the response in a third context (Context C). It found that the habitual response also renewed as action in this context. Together, the results establish a parallel between habit and extinction learning: Conversion to habit does not destroy action knowledge, but interferes with it in a context-specific way. They are also consistent with other results suggesting that habit is specific to the context in which it is learned, whereas goal-direction can transfer between contexts.

Keywords: Context, Renewal, Goal-Directed Actions, Habits, Instrumental Learning, Overtraining


Instrumental responses can be goal-directed actions (henceforth “actions”) following a moderate amount of practice or training. However, they can eventually convert to habits after more extensive training (e.g., Dickinson, 1985). Actions are thought to be supported by associations between the response and the outcome (R-O associations), and the behavior is primarily performed if the outcome is valued. Therefore, changing the value of the outcome changes the amount of responding performed by an animal. In contrast, habits are not dependent on the value of the outcome. Instead, they are said to be governed by associations between the response and stimuli that were present during response training (S-R associations) (Adams & Dickinson, 1981). Rather than being motivated by the outcome, S-R responses are thought to be elicited more automatically by a specific antecedent stimulus or context (Steinfeld & Bouton, 2020; Thrailkill & Bouton, 2015). Adams (1982) distinguished between actions and habits by separately conditioning a taste aversion to the reinforcer after instrumental learning. He found that, when the instrumental response was then tested in extinction, a response that had been given a moderate amount of training was suppressed after its outcome had been devalued; it was therefore goal-directed. In contrast, a response that had been extensively trained was not affected by outcome devaluation; it was thus habitual. The distinction between actions and habits has recently gained a great deal of attention among neuroscientists (Balleine, 2019; Balleine & O’Doherty, 2010; Furlong, Supit, Corbit, Killcross, & Balleine, 2015; Gremel & Costa, 2013; Robbins, Vaghi, & Banca, 2019; Yin, Knowlton, & Balleine, 2004, 2005, 2006) and computational neuroscientists (e.g., Daw, Niv, & Dayan, 2005; Solway & Botvinick, 2012; Verschure, Pennartz, & Pezzulo, 2014).

Emerging behavioral evidence suggests that habit learning after extensive practice does not destroy the original action knowledge. For example, Bouton, Broomer, Rey, and Thrailkill (2020) found that an extensively-trained habitual response can become goal-directed if it is paired with a surprising food pellet reinforcer at the end of instrumental training or if a surprising pre-feeding with irrelevant pellets occurs immediately before the test. Trask, Shipman, Green, and Bouton (2020) found that an extensively-trained habit was goal-directed at test if training of a second response began before training of the first response was complete. This effect occurred even though the new response was trained in a different context, with a different reinforcer, or when the reinforcer was not contingent on an available response. The authors argued that a surprising reinforcer converted a habit back to an action. The duration of that reconversion is not known. But both sets of results suggest that a behavior’s conversion to habit with extensive practice does not erase or destroy its initial status as a goal-directed action.

If habit does not destroy action knowledge, then it may be fair to view the conversion of action into habit as a retroactive interference process with properties like those of extinction (e.g., Bouton, 2017), punishment (Marchant, Khuc, Pickens, Bonci, & Shaham, 2013), or counterconditioning (Peck & Bouton, 1990). Retroactive interference treatments are those in which new learning interferes with learning that was acquired in an earlier phase (e.g., Bouton, 1993, 2019; Miller, Kasprow, & Schachtman, 1986; Spear, 1981). If habit learning is like other forms of interference, then it is likely to be impermanent and specific to the context in which it is learned. For example, renewal has been extensively studied in the interference paradigms, and particularly in extinction (e.g., Bouton, 1993, 2019). In an ABA renewal paradigm, a response is trained in Context A, extinguished in Context B, and renews in Context A. Importantly, renewal also occurs in ABC and AAB designs. In ABC renewal, a response is trained in Context A, extinguished in Context B, and renews in a neutral Context C. In AAB renewal, a response is trained and extinguished in Context A, and renews in Context B. ABA, AAB, and ABC renewal have all been observed in instrumental learning (Bouton, Todd, Vurbic, & Winterbauer, 2011; Bouton, Winterbauer, & Todd, 2012; Todd, 2013; Todd, Winterbauer, & Bouton, 2012). Together, the results suggest that during extinction learning, animals learn to inhibit the performance of a response in a specific context, and removal from the extinction-learning context causes removal of the inhibition, allowing the response to renew (Bouton, 2019; Bouton & Todd, 2014; Todd, 2013; Todd, Vurbic, & Bouton, 2014a, 2014b). Importantly, other forms of retroactive interference in instrumental learning, such as punishment, also leave the original performance available for renewal (e.g., Bouton, 2019; Bouton & Schepers, 2015; Marchant et al., 2013).

Steinfeld and Bouton (2020) recently investigated the renewal of actions and habits after extinction. They tested both actions and habits (identified with the reinforcer devaluation method) in ABA and ABC renewal paradigms. A response was first trained as either an action or a habit in Context A, and then reinforcer devaluation was conducted in all contexts. Then the response was extinguished in Context B, and tested in either Contexts A and B (ABA renewal) or Contexts C and B (ABC renewal). In the ABA paradigm, actions and habits renewed as actions and habits, respectively, suggesting that the R-O and S-R structures controlling behavior were at least partly intact in the original context after extinction. However, in tests of ABC renewal, responding manifested as an action in Context C regardless of whether it had been an action or a habit in Context A. Thus, a habit trained in Context A renewed as an action in Context C. That result was compatible with another consistent finding: Whenever a habit was trained in Context A, it appeared to have the properties of an action at the start of extinction in Context B. One way of viewing the results is that action knowledge learned in Context A renewed in Context B or C after either habit learning or extinction learning.

The present experiments were designed to pursue this idea further. In Experiment 1, a lever press response received a moderate amount of training in Context A to the point where it should be an action, and was then converted into habit with more extended training in a second context (Context B). Rats were then given reinforcer devaluation in both contexts and lever pressing was tested in both contexts. If conversion from action to a habit is similar to extinction, the response should renew as an action in Context A after conversion to habit in Context B. In Experiment 2, lever pressing again received a moderate amount of training in Context A, and an extensive amount of training in Context B, but the rats also received equivalent exposure to a third context (Context C). The rats were then given reinforcer devaluation in all three contexts, and the response was tested in Contexts B and C. If habit learning is a context-specific retroactive interference effect, then the response should also renew as an action again in a third context (C). Notice that, unlike the earlier experiments of Steinfeld and Bouton (2020), the focus here was on interference with action knowledge by habit learning, and not extinction.

Experiment 1

Experiment 1 tested the possibility of ABA renewal of an action after the response was converted to a habit. The experimental design is shown in Table 1. Rats first lever pressed in Context A for three sessions (action training), followed by 12 further sessions of lever pressing in Context B (habit training). Previous work in this laboratory suggests that a behavior is an action after the amount of training administered in Context A, but a habit after the amount of training given in Context B (e.g., Steinfeld & Bouton, 2020; Thrailkill & Bouton, 2015). We then performed reinforcer devaluation in both contexts, with one group receiving pairings of the reinforcer with lithium chloride (LiCl) and the other group receiving the reinforcer and LiCl unpaired. The response was then tested in extinction in both contexts (order counterbalanced). No effect of devaluation was expected in Context B, as the behavior should be a habit there. However, when returned to Context A, the action training context, the behavior might return to R-O control and renew as an action, which would manifest as a devaluation effect in the group that had the reinforcer paired with LiCl.

Table 1.

Experimental Designs

Experiment | Group | Action Acquisition | Habit Acquisition | Devaluation | Test
1 | Paired | R+ (A) 3 Sessions | R+ (B) 12 Sessions | Pellets → LiCl (A & B) | R- (A & B)
1 | Unpaired | R+ (A) 3 Sessions | R+ (B) 12 Sessions | Pellets / LiCl (A & B) | R- (A & B)
2 | Paired | R+ (A) 3 Sessions | R+ (B) 12 Sessions | Pellets → LiCl (A, B, & C) | R- (C & B)
2 | Unpaired | R+ (A) 3 Sessions | R+ (B) 12 Sessions | Pellets / LiCl (A, B, & C) | R- (C & B)

Note. A, B, and C denote contexts; R = response; + = Reinforced on a RI 30-s schedule; - = Not reinforced. Exposures to the contexts were equated throughout the experiments (not shown).

Method

Subjects and Apparatus

The subjects were 16 naïve female Wistar rats purchased from Charles River Laboratories (St. Constance, Quebec). They were between 75 and 90 days old at the start of the experiment and were individually housed in a room maintained on a 16:8-h light:dark cycle. Experimentation took place during the light period of the cycle. The rats were food-deprived to 80% of their initial body weights throughout the experiment. The research protocols used in the present experiments were approved by the University of Vermont Institutional Animal Care and Use Committee.

Apparatus

The apparatus consisted of two unique sets of four conditioning chambers (Model ENV-007-VP; Med Associates, St. Albans, VT). Each chamber was housed in its own sound-attenuating chamber. All chambers measured 31.75 × 24.13 × 29.21 cm (length × width × height). The side walls consisted of clear acrylic panels, and the front and rear walls were made of brushed aluminum. A recessed food cup was centered on the front wall approximately 2.5 cm above the floor. A retractable lever (Model ENV-112CM, Med Associates) was positioned to the left of the food cup. The lever was 4.8 cm wide and 6.3 cm above the grid floor. It protruded 2.0 cm from the front wall when extended. The chambers were illuminated by 7.5-W incandescent bulbs mounted to the ceiling of the sound attenuation chamber. Ventilation fans provided background noise of 65 dBA. The reinforcers used were 45-mg grain food pellets (MLab Rodent Tablets, 5TUM; TestDiet, Richmond, IN).

The two contexts differed in several ways. One set of operant chambers had a grid floor that consisted of alternating stainless steel grids with different diameters (0.5 and 1.3 cm, spaced 1.6 cm apart). The set was scented using Hannaford brand distilled white vinegar (Scarborough, ME). The other set also had a grid floor, but all of the bars were the same size (1.3 cm, spaced 1.6 cm apart center-to-center), and it was scented using Vicks VapoRub (Cincinnati, OH). It also had a 1.91-cm wide dark vertical stripe on the back wall of the chamber.

Procedure

Magazine Training.

On the first day of training, the rats learned to eat pellets from the food cup in each of their assigned contexts (order counterbalanced). This required two magazine training sessions (one in each context) that consisted of 30 noncontingent pellet presentations delivered on a random time (RT) 30-s schedule of reinforcement. These sessions lasted approximately 15 to 20 min. Rats were returned to their home cages following completion of each session. The intersession interval was approximately 20 min.

Instrumental Acquisition.

Rats were then given one daily session in each context (two sessions a day) for 15 days. For the first three daily sessions in Context A, the lever was inserted into the chamber after a two-min delay. After this, responses on it were reinforced according to a random interval (RI) 30-s schedule. The session concluded when the rat had earned 30 reinforcers, at which time the lever was retracted. Once all of the rats reached 30 reinforcers, they were returned to their home cages. Daily sessions in Context B were the same duration as those in A, but the lever was never presented. During Sessions 4–15, the habit acquisition phase, lever pressing was reinforced in Context B, and exposure sessions were run in Context A, again matched for time. Context B training sessions consisted of 30 pellet presentations on an RI 30-s schedule. Sessions lasted for approximately 15 min and the intersession interval was approximately 20 min.
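As a rough illustration of how an RI 30-s schedule operates, the sketch below arms a reinforcer after a random interval averaging 30 s and delivers it for the first lever press that follows. The function name, the exponential interval distribution, and the response times are assumptions made for illustration; the control software used in these experiments is not described here.

```python
import random


def simulate_ri_schedule(response_times, mean_interval=30.0,
                         max_reinforcers=30, rng=None):
    """Return the times of reinforced responses under a random-interval schedule.

    Assumption: intervals are drawn from an exponential distribution with the
    given mean, and each new interval is timed from the last reinforced response.
    """
    rng = rng or random.Random()
    reinforced = []
    # Time at which the next reinforcer becomes available ("armed").
    armed_at = rng.expovariate(1.0 / mean_interval)
    for t in sorted(response_times):
        if t >= armed_at:
            # First response after the interval elapses earns the pellet.
            reinforced.append(t)
            if len(reinforced) == max_reinforcers:
                break  # session ends after the 30th reinforcer
            armed_at = t + rng.expovariate(1.0 / mean_interval)
    return reinforced


# Hypothetical rat pressing once per second for 50 minutes.
earned = simulate_ri_schedule(range(3000), rng=random.Random(1))
print(len(earned), "reinforcers earned")
```

Because responses made before the interval elapses earn nothing, the reinforcement rate is bounded by elapsed time rather than by response rate.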

The order of the context exposures was counterbalanced across group and day. Furthermore, the sessions in each context were double alternated. For example, on Day 1 half the rats were exposed to Context A and then Context B, while the other half were exposed to Context B and then Context A. The orders were then reversed on Day 2. The rats were then given context exposures in the opposite pattern for Days 3 and 4, after which the other pattern would repeat. This method of counterbalancing was used for all stages (except reinforcer devaluation) in both experiments.
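The double-alternation scheme described above can be sketched as a repeating four-day pattern. The function name and the `start_with_a` flag (marking which half of the rats an animal belongs to) are hypothetical labels introduced for this illustration.

```python
def session_order(day, start_with_a=True):
    """Return the within-day context order ("AB" or "BA") for a given day.

    Days are numbered from 1. Half the rats follow AB, BA, BA, AB, ...;
    the other half follow the mirror image.
    """
    pattern = ("AB", "BA", "BA", "AB")  # double alternation over four days
    order = pattern[(day - 1) % 4]
    return order if start_with_a else order[::-1]


for day in range(1, 9):
    print(day, session_order(day), session_order(day, start_with_a=False))
```

Within every four-day block each context is experienced first on exactly two days, so time of day is not confounded with context.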

Reinforcer Devaluation.

Reinforcer devaluation began on the day following the final instrumental (habit) training session. During this phase of the experiment, rats were given only one session per day in either Context A or Context B, with the lever unavailable throughout the phase. During this stage, the rats were divided into a Paired group, which was given LiCl injections and pellets during the same sessions, and an Unpaired group, which received LiCl injections and pellets during different sessions. The sessions alternated between injection sessions (odd-numbered days) and noninjection sessions (even-numbered days). On injection sessions, the Paired rats were given noncontingent pellets on a RT 30-s schedule, while the Unpaired group was merely placed in the context for the same period of time. At the end of the session, each rat was given a 20 ml/kg LiCl (0.15 M) injection. On noninjection sessions, the Unpaired rats were given pellets on a RT 30-s schedule, while the Paired rats were given context exposures matched for time. No injections were given after these sessions. At the beginning of the devaluation phase, 50 pellets were delivered to each rat during the appropriate session. Following each injection session, the average number of pellets eaten by the Paired rats was calculated, and this became the new number of pellets delivered to the rats on the next session. This method ensured that all rats received the same number of injections, and roughly the same number of pellets, but only the Paired rats received the injections and pellets on the same day, resulting in only the Paired rats developing a taste aversion to the pellets.

Reinforcer devaluation lasted for a total of twelve days, over which the animals had three conditioning trials in each context. The first four-session cycle followed either an ABBA or BAAB pattern in which there was one injection session and one noninjection session in each context. The full pattern employed during devaluation was A (injection) - B (no injection) - B (injection) - A (no injection) (or the analogous BAAB pattern). The second cycle was run using the opposite pattern from the first (BAAB or ABBA), while the third cycle returned to the original pattern for half of the rats. There were always 48 hours between successive injections.
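To make the structure of the devaluation phase concrete, the sketch below lays out the 12-day sequence for a rat whose cycles run ABBA, BAAB, ABBA, with injections on odd-numbered days. The function name and tuple layout are illustrative assumptions, and the sketch covers only the rats whose third cycle returned to the original pattern.

```python
def devaluation_schedule(first_pattern="ABBA", n_cycles=3):
    """Return a list of (day, context, injection) tuples.

    Assumption: successive four-day cycles alternate between the two patterns,
    and injections follow the session on odd-numbered days.
    """
    opposite = {"ABBA": "BAAB", "BAAB": "ABBA"}
    schedule = []
    pattern = first_pattern
    day = 1
    for _ in range(n_cycles):
        for context in pattern:
            schedule.append((day, context, day % 2 == 1))
            day += 1
        pattern = opposite[pattern]
    return schedule


for day, context, injection in devaluation_schedule():
    print(day, context, "injection" if injection else "no injection")
```

Under these assumptions each context hosts three injection sessions and three noninjection sessions, and successive injections are two days apart.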

Test.

The rats were then given a 10-min test session with the lever present in each context. The lever was inserted following a 2-min delay. The test was conducted in extinction to ensure that the rats could not directly associate their behavior with the reinforcer after it had been devalued. Half the rats from each group were tested in Context A first and Context B second, while the other half was tested in Context B first and Context A second. The intersession interval was approximately 15 min.

Consumption Test.

The rats were given a pellet consumption test on the next day to further assess the aversion to the pellets in the Paired group. Ten pellets were delivered to each rat on an RT 30-s schedule without the lever available in both contexts. The number of pellets consumed by each rat was counted. The order of sessions was counterbalanced as per the main test session, but in the opposite order for each rat.

Reacquisition Test.

The rats were given a reacquisition test on the day following the pellet consumption test to assess the pellet’s ability to support the response. These sessions were similar to the acquisition sessions. The lever was inserted after a two-min delay, and the rats could press to earn pellets on a RI 30-s schedule. However, instead of ending after 30 reinforcers, this test session ended after 30 minutes. Test sessions were conducted in each context, with the order counterbalanced as per the main test session.

Data analysis.

Response rates (responses per min) were analyzed with analysis of variance (ANOVA) and independent samples t-tests. Rejection criterion was set at p < .05 for all statistical comparisons. During the Test phase, our a priori focus was on comparing the Paired and Unpaired groups in each context, because such comparisons are how goal-directed actions and habits are identified. Effect sizes are reported when appropriate. Individual observations were considered to be outliers if their z-score was greater than 2 (see Field, 2005), and were removed from the final data analyses. Data transformations were performed if data were not normally distributed.
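The outlier criterion can be illustrated with a short sketch. It assumes that z-scores are computed within a set of observations using the sample standard deviation and that the criterion applies to the absolute z-score; neither detail is stated above, and the response rates shown are hypothetical.

```python
from statistics import mean, stdev


def flag_outliers(values, criterion=2.0):
    """Return one boolean per observation: True if |z| exceeds the criterion."""
    m = mean(values)
    sd = stdev(values)  # sample SD (n - 1 denominator); an assumption
    if sd == 0:
        return [False] * len(values)  # no spread, so nothing can be an outlier
    return [abs((v - m) / sd) > criterion for v in values]


# Hypothetical response rates (responses per min) for eight rats.
rates = [1, 2, 3, 4, 5, 6, 7, 50]
print(flag_outliers(rates))
```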

Results

Acquisition.

One rat from the Unpaired group was removed from the analyses because it was an outlier showing unusually high responding during the test in Context B (z = 2.13). The acquisition data are depicted in Figure 1. All of the rats acquired the lever press response. The overall rate of lever pressing increased across sessions, as confirmed by a Devaluation (Paired, Unpaired) by Session (15) repeated measures ANOVA. The analysis found a main effect of session, F(14, 182) = 44.80, MSE = 1,642.52, p < .001, with no effect of devaluation or a devaluation by session interaction, Fs < 1. Separate ANOVAs indicated that there was a main effect of session in both Context A on Days 1–3, F(2, 26) = 41.10, MSE = 481.10, p < .001, and Context B on Days 4–15, F(11, 143) = 27.16, MSE = 985.89, p < .001.

Figure 1:

Lever press acquisition in Experiment 1. Mean response rates in both groups are shown during acquisition in Contexts A (action training) and B (conversion to habit). Error bars represent standard error of the mean (SEM).

Devaluation.

Pellet consumption during the reinforcer devaluation phase is shown in Figure 2. Pellet consumption decreased over sessions, and there was no difference in the pattern observed in Context A or Context B. A Context (2) by Session (3) repeated measures ANOVA on consumption in the Paired groups found a main effect of session, F(2, 14) = 108.92, MSE = 3.37, p < .001, but no effect of context, or context by session interaction, largest F = 1.71. On the final session of devaluation, the Paired rats ate an average of 0 pellets. The Unpaired rats ate all of the pellets offered throughout the phase.

Figure 2:

Results of the reinforcer devaluation phase of Experiment 1. Data are the mean proportion of pellets consumed by paired rats during their first, second, and third devaluation sessions in Contexts A and B. (Unpaired rats consumed all pellets on each trial.) Error bars represent standard error of the mean (SEM).

Test.

The mean response rates of the groups during the tests are summarized in Figure 3. A Shapiro-Wilk test indicated that the Paired response rates in Context A, and the Unpaired rates in Context B were not normally distributed, W(8) = .77, p = .013, W(7) = .73, p = .003, respectively. A log (X + 1) transformation was performed to correct for this (1 was added to the raw score prior to the log transformation), and those data are summarized in the figure. A Context (A, B) by Devaluation ANOVA found a main effect of context, F(1, 13) = 21.83, MSE = .37, p < .001, ηp2 = .62, and a main effect of devaluation, F(1, 13) = 6.19, MSE = .35, p = .027, ηp2 = .23. The context by devaluation interaction was also significant, F(1, 13) = 5.00, MSE = .08, p = .043, ηp2 = .28. Planned within-context independent samples t-tests isolated the devaluation effects. In Context B, the difference between Paired and Unpaired groups did not approach significance, t(13) = −1.29, p = .220, suggesting habit was being expressed there. In Context A, the Paired group responded significantly less than the Unpaired group, t(13) = −2.2, p = .012, d = 1.51, indicating a devaluation effect and thus the presence of goal-directed action.
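The transformation and the planned comparison described above can be sketched as follows. The base of the logarithm is not stated, so base 10 is an assumption here, as are the function names and the response rates, which are hypothetical and not the experiment's data.

```python
import math
from statistics import mean, variance


def log_transform(rates):
    """Apply a log (X + 1) transformation; base 10 is an assumption."""
    return [math.log10(x + 1) for x in rates]


def independent_t(group1, group2):
    """Pooled-variance independent samples t statistic and its df."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * variance(group1)
                  + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(group1) - mean(group2)) / se, n1 + n2 - 2


# Hypothetical test-session response rates (responses per min).
paired = log_transform([0.2, 0.5, 1.1, 0.0, 0.8])
unpaired = log_transform([2.4, 3.9, 1.7, 5.2, 2.8])
t_value, df = independent_t(paired, unpaired)
print(f"t({df}) = {t_value:.2f}")
```

Adding 1 before taking the log keeps rats with zero responses in the analysis, since the log of 0 is undefined.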

Figure 3:

Results of testing in Experiment 1 (response rates were subjected to a log (X +1) transformation). Bars display mean response rates of both groups in both Contexts A and B. Response rates reflect the overall rate during the 10-minute session. Error bars represent standard error of the mean (SEM).

Consumption and Reacquisition Tests.

During the Consumption test (not shown), the Paired rats ate an average of 0.4 pellets in Context A and 0.4 in Context B, confirming the presence of a taste aversion in both contexts. The Unpaired rats ate all 10 pellets offered. During the Reacquisition test (not shown), delivery of the pellet supported the response in both contexts, but only in the Unpaired rats. There was very little responding by the Paired rats during this test. The pattern confirmed that the pellet was only reinforcing for the Unpaired rats.

Discussion

The results of Experiment 1 suggested a renewal of goal-directed responding in Context A after the response had converted to habit in Context B. That is, based on the results of a reinforcer devaluation test, after limited training in Context A and extended training in Context B, the response renewed as an action in Context A and remained a habit in Context B. These results continue to support the idea that habits are context specific (Steinfeld & Bouton, 2020; Thrailkill & Bouton, 2015), as a return to the action acquisition context appeared to reduce S-R and renew R-O. Perhaps more important, they suggest that the conversion of an action into a habit did not destroy the original action learning, which is still expressed in performance in the action-training context. Habit learning thus interferes with action performance in Context B in a way that is reminiscent of extinction.

It is notable that aversion conditioning with the pellet reinforcer occurred at similar rates in Contexts A and B during the devaluation phase (Figure 2). This result occurred despite the fact that the two contexts, though equally exposed and equally familiar to the animals, had been differentially associated with the pellet during the habit and acquisition phases (Context B had received more pairings with the reinforcer than Context A had during acquisition). The difference could have conceivably caused less aversive conditioning to Context B (or less conditioning to the reinforcer when presented there), and thus less suppression of behavior there. The fact that contextual control after reinforcer devaluation was unique to the lever pressing response, and not pellet consumption (see also Experiment 2), is most consistent with the idea that lever pressing was differentially depressed by context during testing because it was an action in Context A and a habit in Context B.

It is unclear from this experiment whether return to the action acquisition context was necessary to renew action responding, or whether mere removal from the habit context would be sufficient to renew R-O responding. Experiment 2 was therefore designed to assess this question.

Experiment 2

The purpose of Experiment 2 was to determine if the renewal of action after habit learning can also be accomplished by simple removal from the habit training context and placement in a neutral third context (ABC renewal). If habit learning is like other forms of retroactive interference, we should expect ABC as well as ABA renewal. And if the context plays part of the role of the stimulus in eliciting habitual behavior (Steinfeld & Bouton, 2020; Thrailkill & Bouton, 2015), then removing an animal from the habit-training context should weaken habitual control, possibly resulting in the renewal of action.

The experimental design is summarized in Table 1. In this experiment, rats again received three sessions of training in Context A followed by 12 sessions of habit training in Context B. Reinforcer devaluation was then conducted in both Context A and Context B, as well as an associatively neutral Context C. The response was then tested in extinction in Context C and Context B. Of interest was what happens when the test occurs in the neutral context (Context C). Switching from the habit learning context should weaken habit (Thrailkill & Bouton, 2015), perhaps allowing conversion from S-R to R-O control, causing the behavior to renew as an action there.

Method

Subjects

Experiment 2 was run in two replications of 12 female Wistar rats each for a total of 24 rats. The origin, age, and housing conditions of the rats were identical to Experiment 1.

Apparatus

Replication 1.

The apparatus used in the first replication was the same as that of Experiment 1 except that a third set of four operant chambers housed in a third room in the laboratory was added. The new chambers were of the same dimensions as the other chambers, with a floor consisting of 1.3-cm diameter grids that were “staggered” such that the height of adjacent bars differed by 1.6 cm. There were no markings on the walls. Mr. Clean citrus scented cleaning solution (Cincinnati, OH) was used to give the third context a unique scent.

Replication 2.

Three new sets of boxes were used for the second replication. Two sets of chambers measured 31.75 × 24.13 × 29.21 cm (length × width × height). The side walls consisted of clear acrylic plastic, and the front and rear walls were made of brushed aluminum. A recessed food cup was centered on the front wall approximately 2.5 cm above the floor. A retractable lever (model ENV-112CM, Med Associates) was positioned to the left of the food cup. The lever was 4.8 cm wide and 6.3 cm above the grid floor. It protruded 2.0 cm from the front wall when extended. The chambers were illuminated by 7.5-W incandescent bulbs mounted to the ceiling of the sound-attenuation chamber. Ventilation fans provided background noise of 65 dBA. In one set of operant chambers, the floor consisted of 0.5 cm diameter stainless steel floor grids spaced 1.6 cm apart (center-to-center) and mounted parallel to the front wall. The ceiling and side wall had black horizontal stripes, 3.8 cm wide and 3.8 cm apart. This set was scented using Hannaford distilled white wine vinegar. In the second set of chambers, the floor consisted of alternating stainless-steel grids with different diameters (0.5 and 1.3 cm, spaced 1.6 cm apart). The ceiling and side wall were covered with dark dots (2.0 cm in diameter). This set was scented using Vicks VapoRub.

The third set of chambers was identical in size to the chambers used in Experiment 1, meaning that it was smaller than the other two chambers used in this experiment, and it did not have any distinct markings on the back panel. The floor consisted of 1.3-cm diameter grids that were staggered such that the height of adjacent bars differed by 1.6 cm. It was scented using Mr. Clean citrus scented cleaning solution (Cincinnati, OH).

Procedure

The procedure was identical in the two replications except where noted.

Acquisition.

Rats were given three sessions of action training in Context A and then 12 sessions of habit training in Context B following the procedure used in Experiment 1. Rats were given exposure to all three contexts every day of training following a method analogous to the one used in the first experiment. Order of context exposure followed a pseudo-random pattern that ensured that exposure to the contexts never occurred in the same order on two consecutive days (e.g., ACB, BAC, CBA, BCA, ABC, CAB). Assignment of action learning context, habit learning context, and neutral context was counterbalanced across the three sets of operant chambers. The contexts were also counterbalanced with respect to the Paired and Unpaired groups.
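One way to generate context orders that never repeat on consecutive days is sketched below. The sampling procedure and function name are assumptions made for illustration; the actual sequence used in the experiment is given only by the example above.

```python
import random
from itertools import permutations


def context_orders(n_days, contexts="ABC", rng=None):
    """Return one context order per day, never repeating the previous day's order."""
    rng = rng or random.Random()
    options = ["".join(p) for p in permutations(contexts)]
    orders = []
    for _ in range(n_days):
        # Exclude yesterday's order from today's candidates.
        candidates = [o for o in options if not orders or o != orders[-1]]
        orders.append(rng.choice(candidates))
    return orders


print(context_orders(15, rng=random.Random(0)))
```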

Devaluation.

Reinforcer devaluation was then conducted in all three contexts. The order of reinforcer devaluation in each context was counterbalanced so that one sixth of the rats were given reinforcer devaluation in Context A, then B, then C, while another sixth were given reinforcer devaluation in Context B, then A, then C, etc. An example of the pattern used was A (injection) – A (no injection) – B (injection) – B (no injection) – C (injection) – C (no injection). Cycles were performed until the Paired rats ate an average of one pellet or less. This required two cycles in Replication 1, but three cycles in Replication 2 (for unknown reasons, devaluation occurred less quickly). Rats were only given one session per day during this phase.

Test.

Rats were tested in Context B and Context C using the procedure of Experiment 1. Test order was counterbalanced as in Experiment 1.

Consumption and Reacquisition.

Consumption and reacquisition tests followed the procedures used in Experiment 1. Consumption tests were conducted in all three contexts, while the reacquisition test for each rat was conducted in Contexts B and C. The consumption test occurred the day after the extinction test, and the reacquisition test occurred the day following the consumption test. Consumption testing occurred in the opposite order as the extinction test. Reacquisition testing lasted for 15 minutes. Running order of the reacquisition test was the same as the extinction test day.

Results

Acquisition.

The acquisition data are depicted in Figure 4. Both Paired and Unpaired rats increased lever pressing across sessions, which was confirmed by a Replication (2) by Devaluation (Paired, Unpaired) by Session (15) ANOVA. The ANOVA found a main effect of session, F(14, 280) = 27.38, MSE = 2,814.56, p < .001, with no main effects of devaluation group or replication, and no significant interactions, largest F(14, 280) = 1.13. Separate ANOVAs indicated that there was a main effect of session in both Context A (Days 1–3), F(2, 44) = 88.73, MSE = 1,197.10, p < .001, and Context B (Days 4–15), F(11, 242) = 13.24, MSE = 1,285.64, p < .001.

Figure 4:

Lever press acquisition in Experiment 2. Shown are the mean response rates of both groups during acquisition in Contexts A (action training) and B (conversion to habit). Error bars represent standard error of the mean (SEM).

Devaluation.

The reinforcer devaluation data for the Paired rats in Replication 2 are shown in Figure 5. (Due to an experimental error, the context assignment of the consumption scores in Replication 1 was not recorded.) A Context (3) by Session (3) repeated measures ANOVA found a significant effect of session, F(2, 20) = 126.50, MSE = 3.64, p < .001. There was no effect of context or context by session interaction, Fs < 1. On the final session of devaluation, the Paired rats ate an average of 1.1 pellets. The Unpaired rats ate all of the pellets given to them throughout devaluation.

Figure 5:

Results of the reinforcer devaluation phase from Replication 2 of Experiment 2. Data are the mean proportion of pellets consumed by Paired rats during their first, second, and third devaluation sessions in Contexts A, B, and C. (Unpaired rats consumed all pellets on each trial.) Error bars represent standard error of the mean (SEM).

Test.

The test data are displayed in Figure 6. To be consistent with Experiment 1, we performed a log (X + 1) transformation on all of the test data. A Context by Devaluation by Replication repeated measures ANOVA found a significant effect of context, F(1, 20) = 16.84, MSE = .78, p = .001, ηp² = .46. There were also significant effects of devaluation, F(1, 20) = 9.66, MSE = .54, p = .006, ηp² = .33, and replication, F(1, 20) = 19.24, MSE = 1.08, p < .001, ηp² = .49 (there was generally less responding in Replication 2). There was a significant context by devaluation interaction, F(1, 20) = 4.68, MSE = .22, p = .043, ηp² = .19. There was a marginally significant devaluation by replication interaction, F(1, 20) = 4.17, MSE = .23, p = .055, ηp² = .17. No other factors, including interactions, approached significance, largest F = 1.88. Planned within-context comparisons assessed the devaluation effect in the two test contexts. Independent samples t-tests confirmed that there was no devaluation effect in Context B, t(22) < 1, but a significant devaluation effect in Context C, t(22) = −2.61, p = .016, d = 1.04. The results are consistent with the view that lever pressing was a habit in the habit context (Context B), but renewed as an action in the neutral context (Context C).
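As a reading aid, the partial eta-squared values reported above can be recovered from the F ratios and their degrees of freedom using the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error). The sketch below applies that identity to the reported statistics. The function name `partial_eta_squared` and the dictionary layout are illustrative choices, and the code does not reproduce the authors' analysis, which would require the raw data.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F ratios as reported in the text; every effect was tested on (1, 20) df.
reported_f = {
    "context": 16.84,
    "devaluation": 9.66,
    "replication": 19.24,
    "context x devaluation": 4.68,
    "devaluation x replication": 4.17,
}

recomputed = {
    effect: round(partial_eta_squared(f, 1, 20), 2)
    for effect, f in reported_f.items()
}
for effect, value in recomputed.items():
    print(f"{effect}: {value:.2f}")
```

The recomputed values (.46, .33, .49, .19, and .17) match the effect sizes given in the text.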

Figure 6:

Results of testing in Experiment 2 (response rates were subjected to a log (X + 1) transformation). Bars display mean response rates of both groups in both Contexts B and C. Response rates reflect the overall rate during the 10-minute session. Error bars represent standard error of the mean (SEM).

To further test for the presence of habit in the present experiments, we collapsed the test data from Context B across experiments to maximize the power to detect any possible difference between the Paired and Unpaired groups after the training procedure. An independent samples t-test once again found no difference in responding in Context B between Paired (Mean = 1.09) and Unpaired (Mean = 1.18) rats, t(37) = −1.24, p = .223.

Consumption and Reacquisition Tests.

During the Consumption test (not shown), the Paired rats ate an average of 0.8 pellets in Context A, 1.6 in Context B, and 1.3 in Context C, confirming the presence of taste aversion in all three contexts. The Unpaired rats ate all 10 pellets. During the Reacquisition test (not shown), delivery of the pellet supported the response in both test contexts only in the Unpaired rats. There was very little responding by the Paired rats during this test. The pattern confirmed that the pellet was only reinforcing for the Unpaired rats.

Discussion

Responding tested in a third context (C) showed renewed action properties after it had been trained as an action in Context A and converted to habit in Context B. The results suggest that simple removal of the response from the habit-training context was sufficient to renew action control. Given this pattern of results, it seems that any change in context may be sufficient to reduce the S-R control and renew R-O responding. It is worth noting that, as in Experiment 1, the contexts did not differ in their control of pellet consumption (and aversion conditioning) during the reinforcer devaluation phase. Overall, the results are consistent with the conclusions that (1) habit learning did not destroy prior action learning, (2) habit learning instead interfered with action in a context-specific way, and (3) habit learning was context-specific while action learning tended to transfer across contexts.

General Discussion

The present experiments examined the renewal of a response’s goal-directed action properties after it had been converted to habit with more extensive training. In Experiment 1, an instrumental response was given a moderate amount of action training in Context A, a more extensive amount of training to convert it to habit in Context B, and was then tested for action or habit status in the two contexts after a reinforcer devaluation treatment. Based on the effect of reinforcer devaluation, the response appeared habitual in Context B but renewed as an action in Context A. In Experiment 2, a response was similarly trained as an action in Context A and converted to habit in Context B before being tested in Contexts B and C. Based on the effects of reinforcer devaluation, the response again manifested as a habit in Context B and as an action in the neutral Context C. Together, the results suggest that the conversion of the behavior into habit did not destroy its original action properties, which were renewed by a change of context. The effects of habit learning on the expression of goal direction thus appear similar to the effects of extinction: Habit learning interferes with Phase 1 (action) performance as long as testing occurs in the habit-learning context.

One reason why habit may be more context-specific than action is that habits are learned after initial action learning. There is evidence suggesting that in Pavlovian conditioning, and possibly in instrumental learning, second-learned information is more context-specific than first-learned information; for example, in either case, extinction is more context-specific than first-learned conditioning (e.g., Bouton, 2019; Trask, Thrailkill, & Bouton, 2017). Nelson (2002) provided strong evidence for greater context-sensitivity of second-learned information in Pavlovian conditioning. After training a CS (Y) as a conditioned inhibitor in Context A by compounding it with an excitor X whenever X was nonreinforced (i.e., X+, XY- training), he gave Y excitatory training in Context A (Y+). Finally, he tested Y in both Context A and Context B. He found that Y had become excitatory in Context A, but this excitation was attenuated in Context B. Importantly, this context switch had more effect on a CS that had first been trained as an inhibitor than on a CS in an excitatory-training-only group or a no-pretraining control. Nelson demonstrated that this effect also occurred when the second-learned association was inhibitory rather than excitatory. Perhaps second-learned habit is similarly more context-specific than first-learned action, as if the imbalance or asymmetry is a general feature of retroactive interference learning.

Gremel and Costa (2013) have also reported contextual cueing of goal-directed and habitual behavior in individual animals using a paradigm that differed from the one used here. The authors trained mice to perform a lever-press response in Context A and Context B during repeatedly alternating sessions. While the response and the reinforcer were the same in both contexts, the response was reinforced on an RI schedule in Context A and a random ratio (RR) schedule in Context B, a treatment that might promote habit in Context A and action in Context B (Dickinson, 1985; Dickinson, Nicholas, & Adams, 1983). Using the sensory-specific satiety method of devaluing the reinforcer, the authors confirmed that the response was a habit in Context A and an action in Context B. Contexts thus served as a cue for habitual and goal-directed behavior in both the present experiments and those of Gremel and Costa (2013). The present experiments were different (among other ways) in that they converted action into habit in a single second phase by merely conducting continued training with an RI schedule.

Neurobiological research suggests that goal-directed actions are mediated by activity in the dorsomedial striatum (DMS) and prelimbic cortex (PL), for example, and that habits are contrastingly represented in the dorsolateral striatum (DLS) and infralimbic cortex (IL) (Balleine, 2019; Corbit, 2018; Coutureau & Killcross, 2003; Killcross & Coutureau, 2003; Yin et al., 2004). Interestingly, there is a suggestion that these circuits coexist, and that habit does not erase action capability when it is learned. For example, Coutureau and Killcross (2003) found that pharmacologically suppressing IL returned a habitual behavior to action status—as if action had not been erased. Related findings have been reported by Yin et al. (2006) with respect to inactivating the DLS. The neural findings thus agree with the present behavioral findings in suggesting that turning an action into a habit by extending the instrumental training does not necessarily erase action knowledge.

The results are also in general accord with other recent behavioral results from this laboratory, described in the Introduction, that suggest that goal-direction survives the conversion of a behavior into a habit with extended practice. Steinfeld and Bouton (2020) gave a response either moderate or extensive training in Context A to form an action or a habit, respectively, and then extinguished the response in Context B. They then tested both responses in the acquisition context (A), extinction context (B), and a neutral context (C). Responses trained as actions and habits renewed after extinction as actions and habits, respectively, when they were tested in the acquisition context (A). However, after extinction both actions and habits renewed as actions when tested in the neutral context (C). Moreover, during extinction in Context B, the response manifested as an action regardless of the amount of training it had received in Context A, suggesting that a context-switch after training also allowed the response to renew as an action. The current findings are also consistent with the results of Bouton et al. (2020) and Trask et al. (2020), who found that several manipulations, typically involving the presentation of unexpected reinforcers, caused a habitual behavior to return to the status of a goal-directed action. The results of Bouton et al. (2020), Steinfeld and Bouton (2020), Trask et al. (2020), along with the present results, suggest that a habit can be disrupted and readily returned to action.

The results of the present experiments are in partial agreement with dual-process theories of habit formation, which suggest that action and habit associations can be held and retained at the same time (Balleine, 2019; de Wit & Dickinson, 2009; Dickinson et al., 1995; Dickinson & Balleine, 1993; Gremel & Costa, 2013). For example, according to de Wit and Dickinson (2009), S-R and R-O associations both start developing at the beginning of instrumental training. The strength of an instrumental response is a function of the summed strengths of the R-O and S-R processes. The amount that each process contributes to the overall strength of the response is not equal. Early in training, the R-O system contributes more to response strength, but with extended training the contribution of R-O weakens as the S-R process strengthens. de Wit and Dickinson (2009) suggest that R-O and S-R associations reside in separate associative and habit memory systems, implying that they might potentially coexist after habit learning. However, the quantitative model proposed by Perez and Dickinson (2020) predicts that knowledge of goal direction will be erased and replaced by habit learning as instrumental training continues and the organism experiences less variation in (and less correlation between) the rate of responding and the rate of reinforcement. This view contrasts with our behavioral results suggesting that habits can readily convert back to goal-directed actions (see also Bouton et al., 2020; Steinfeld & Bouton, 2020; Trask et al., 2020). Habit can thus interfere with, but not destroy, action knowledge. The present results further suggest that the retrieval of the two corresponding types of associations can depend on the current context.

Overall, the results add to the view that habits are specific to the context where they are learned, whereas actions transfer across contexts (Steinfeld & Bouton, 2020; Thrailkill & Bouton, 2015). They also suggest that the S-R associations that form during habit training do not eliminate the R-O associations formed earlier in training, which renew upon removal from the habit learning context. Thus, the same response can manifest as either action or habit depending on the context. Action-to-habit conversion is not a permanent, one-way process. The present results, along with other recent findings (Bouton et al., 2020; Steinfeld & Bouton, 2020; Trask et al., 2020), suggest that habitual behaviors may be able to switch between the status of habit and goal-directed action.

Acknowledgments

This research was supported by NIH Grant RO1 DA 033123.

References

  1. Adams CD (1982). Variations in the sensitivity of instrumental responding to reinforcer devaluation. The Quarterly Journal of Experimental Psychology B: Comparative and Physiological Psychology, 34, 77–98.
  2. Adams CD, & Dickinson A (1981). Instrumental responding following reinforcer devaluation. The Quarterly Journal of Experimental Psychology Section B, 33, 109–121.
  3. Balleine BW (2019). Hierarchical action control: Adaptive collaboration between actions and habits. Frontiers in Psychology, 10, 2735.
  4. Balleine BW (2019). The meaning of behavior: Discriminating reflex and volition in the brain. Neuron, 104, 47–62.
  5. Balleine BW, & O’Doherty JP (2010). Human and rodent homologies in action control: corticostriatal determinants of goal-directed and habitual action. Neuropsychopharmacology, 35, 48–69.
  6. Bouton ME (1993). Context, time, and memory retrieval in the interference paradigms of Pavlovian learning. Psychological Bulletin, 114, 80–99.
  7. Bouton ME (2019). Extinction of instrumental (operant) learning: interference, varieties of context, and mechanisms of contextual control. Psychopharmacology, 236, 7–19.
  8. Bouton ME, Broomer MC, Rey CN, & Thrailkill EA (2020). Unexpected food outcomes can return a habit to a goal-directed action. Neurobiology of Learning and Memory, in press.
  9. Bouton ME, & Schepers ST (2015). Renewal after the punishment of free operant behavior. Journal of Experimental Psychology: Animal Learning and Cognition, 41, 81–90.
  10. Bouton ME, & Todd TP (2014). A fundamental role for context in instrumental learning and extinction. Behavioural Processes, 104, 13–19.
  11. Bouton ME, Todd TP, Vurbic D, & Winterbauer NE (2011). Renewal after the extinction of free operant behavior. Learning & Behavior, 29, 57–67.
  12. Bouton ME, Winterbauer NE, & Todd TP (2012). Relapse processes after the extinction of instrumental learning: renewal, resurgence, and reacquisition. Behavioural Processes, 90, 130–141.
  13. Colwill RM, & Rescorla RA (1985). Instrumental responding remains sensitive to reinforcer devaluation after extensive training. Journal of Experimental Psychology: Animal Behavior Processes, 11, 520–536.
  14. Corbit LH (2018). Understanding the balance between goal-directed and habitual behavioral control. Current Opinion in Behavioral Sciences, 20, 161–168.
  15. Coutureau E, & Killcross S (2003). Inactivation of the infralimbic prefrontal cortex reinstates goal-directed responding in overtrained rats. Behavioural Brain Research, 146, 167–174.
  16. Daw ND, Niv Y, & Dayan P (2005). Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control. Nature Neuroscience, 8, 1704–1711.
  17. de Wit S, & Dickinson A (2009). Associative theories of goal-directed behaviour: a case for animal–human translational models. Psychological Research, 73, 463–476.
  18. Dickinson A (1985). Actions and habits: the development of behavioural autonomy. Philosophical Transactions of the Royal Society of London. B, Biological Sciences, 308, 67–78.
  19. Dickinson A, & Balleine B (1993). Actions and responses: The dual psychology of behaviour. In Eilan N, McCarthy RA, & Brewer B (Eds.), Spatial representation: Problems in philosophy and psychology (pp. 277–293). Blackwell Publishing.
  20. Dickinson A, Nicholas DJ, & Adams CD (1983). The effect of instrumental training contingency on susceptibility to reinforcer devaluation. The Quarterly Journal of Experimental Psychology B: Comparative and Physiological Psychology, 35, 35–51.
  21. Dickinson A, Balleine B, Watt A, Gonzalez F, & Boakes RA (1995). Motivational control after extended instrumental training. Animal Learning & Behavior, 23, 197–206.
  22. Field A (2005). Discovering statistics using SPSS. Thousand Oaks, CA: Sage Publications.
  23. Furlong TM, Supit AS, Corbit LH, Killcross S, & Balleine BW (2017). Pulling habits out of rats: Adenosine 2A receptor antagonism in dorsomedial striatum rescues meth-amphetamine-induced deficits in goal-directed action. Addiction Biology, 22, 172–183.
  24. Gremel CM, Chancey JH, Atwood BK, Luo G, Neve R, Ramakrishnan C, … & Costa RM (2016). Endocannabinoid modulation of orbitostriatal circuits gates habit formation. Neuron, 90, 1312–1324.
  25. Gremel CM, & Costa RM (2013). Orbitofrontal and striatal circuits dynamically encode the shift between goal-directed and habitual actions. Nature Communications, 4, 2264–2276.
  26. Killcross S, & Coutureau E (2003). Coordination of actions and habits in the medial prefrontal cortex of rats. Cerebral Cortex, 13, 400–408.
  27. Marchant NJ, Khuc TN, Pickens CL, Bonci A, & Shaham Y (2013). Context-induced relapse to alcohol seeking after punishment in a rat model. Biological Psychiatry, 73, 256–262.
  28. Miller RR, Kasprow WJ, & Schachtman TR (1986). Retrieval variability: sources and consequences. The American Journal of Psychology, 99, 145–218.
  29. Nelson JB (2002). Context specificity of excitation and inhibition in ambiguous stimuli. Learning and Motivation, 33, 284–310.
  30. Peck CA, & Bouton ME (1990). Context and performance in aversive-to-appetitive and appetitive-to-aversive transfer. Learning and Motivation, 21, 1–31.
  31. Perez OD, & Dickinson A (2020). A theory of actions and habits: The interaction of rate correlation and contiguity systems in free-operant behavior. Psychological Review, in press.
  32. Robbins TW, Vaghi MM, & Banca P (2019). Obsessive-compulsive disorder: puzzles and prospects. Neuron, 102, 27–47.
  33. Shipman ML, Trask S, Bouton ME, & Green JT (2018). Inactivation of prelimbic and infralimbic cortex respectively affects minimally-trained and extensively-trained goal-directed actions. Neurobiology of Learning and Memory, 155, 164–172.
  34. Solway A, & Botvinick MM (2012). Goal-directed decision making as probabilistic inference: A computational framework and potential neural correlates. Psychological Review, 119, 120–154.
  35. Spear NE (1981). Extending the domain of memory retrieval. In Spear NE & Miller RR (Eds.), Information processing in animals: memory mechanisms (pp. 341–378). Hillsdale, NJ: Erlbaum.
  36. Steinfeld MR, & Bouton ME (2020). Context and renewal of habits and goal-directed actions after extinction. Journal of Experimental Psychology: Animal Learning and Cognition, in press.
  37. Thrailkill E, & Bouton ME (2015). Contextual control of instrumental actions and habit. Journal of Experimental Psychology: Animal Learning and Cognition, 41, 69–80.
  38. Thrailkill EA, Trask S, Vidal P, Alcalá JA, & Bouton ME (2018). Stimulus control of actions and habits: A role for reinforcer predictability and attention in the development of habitual behavior. Journal of Experimental Psychology: Animal Learning and Cognition, 44, 370–384.
  39. Todd TP (2013). Mechanisms of renewal after the extinction of instrumental behavior. Journal of Experimental Psychology: Animal Behavior Processes, 39, 193–207.
  40. Todd TP, Vurbic D, & Bouton ME (2014a). Behavioral and neurobiological mechanisms of extinction in Pavlovian and instrumental learning. Neurobiology of Learning and Memory, 108, 52–64.
  41. Todd TP, Vurbic D, & Bouton ME (2014b). Mechanisms of renewal after the extinction of discriminated operant behavior. Journal of Experimental Psychology: Animal Learning and Cognition, 40, 355–368.
  42. Todd TP, Winterbauer NE, & Bouton ME (2012). Effects of the amount of acquisition and contextual generalization on the renewal of instrumental behavior after extinction. Learning & Behavior, 40, 145–157.
  43. Trask S, Shipman ML, Green JT, & Bouton ME (2020). Some factors that restore goal-direction to a habitual behavior. Neurobiology of Learning and Memory, in press.
  44. Trask S, Thrailkill EA, & Bouton ME (2017). Occasion setting, inhibition, and the contextual control of extinction in Pavlovian and instrumental (operant) learning. Behavioural Processes, 137, 64–72.
  45. Verschure PF, Pennartz CM, & Pezzulo G (2014). The why, what, where, when and how of goal-directed choice: neuronal and computational principles. Philosophical Transactions of the Royal Society B: Biological Sciences, 369, 188–199.
  46. Yin HH, Knowlton BJ, & Balleine BW (2004). Lesions of dorsolateral striatum preserve outcome expectancy but disrupt habit formation in instrumental learning. European Journal of Neuroscience, 19, 181–189.
  47. Yin HH, Knowlton BJ, & Balleine BW (2005). The role of the dorsomedial striatum in instrumental conditioning. European Journal of Neuroscience, 22, 513–523.
  48. Yin HH, Knowlton BJ, & Balleine BW (2006). Inactivation of dorsolateral striatum enhances sensitivity to changes in the action-outcome contingency in instrumental conditioning. Behavioural Brain Research, 166, 189–196.
