Abstract
Although it has been shown that the basolateral amygdala (BLA) and the mediodorsal thalamus (MD) are critical for goal-directed instrumental performance, much remains unknown about the respective contributions of these structures to action selection. The current study assessed the effects of post-training BLA and MD lesions on several tests of instrumental action selection. We found that MD damage disrupted the influence of pavlovian cues over action selection but left intact rats' ability to select actions based on either the expected value or the discriminative stimulus properties of the outcome. In contrast, BLA lesions impaired performance on all three tests of action selection. Because both lesion types disrupted the influence of cues that signal reward over instrumental performance, we then investigated the involvement of these structures in pavlovian contingency learning using a task in which the predictive status of one of two cues is degraded by delivering its outcome noncontingently during the intertrial interval. As expected, the sham group selectively suppressed their conditioned approach performance to the cue that no longer signaled its outcome but continued to respond to the control stimulus. In contrast, both lesioned groups were impaired on this task. Interestingly, whereas the MD group displayed a nonspecific reduction in responding to both cues, the BLA group continued to show high levels of responding to both cues as if their performance was completely insensitive to this contingency manipulation. These findings demonstrate that the BLA and MD make important yet distinct contributions to instrumental action selection.
Keywords: incentive, reinstatement, goal-directed, predictive learning, expectancy, decision, reward
Introduction
Instrumental action selection is guided by several distinct processes. Under typical training conditions, performance is highly flexible and shares the characteristic features of human goal-directed action (Dickinson and Balleine, 1993); specifically, the rate of instrumental performance depends on the contingency between an action and its outcome and the reward value of that outcome. For instance, rats that have recently been satiated on one food outcome will tend to withhold an action that was rewarded with that outcome but continue to perform an action that produced a different outcome (Balleine and Dickinson, 1998). Environmental cues that signal reward also play a role in action selection by facilitating the retrieval of actions with which they share a common outcome (Kruse et al., 1983). This effect, known as pavlovian-instrumental transfer, is known to depend on different processes from those mediating the influence of outcome devaluation on performance (Rescorla, 1994; Holland, 2004) and is sensitive to changes in the predictive status of the eliciting stimulus (Delamater, 1995). The reward itself also plays an important part in action selection. When tested in extinction, for instance, the presentation of a free outcome will tend to selectively reinstate the performance of an action that has been trained with that outcome over an action trained with a different outcome (Delamater et al., 2003; Ostlund and Balleine, 2007a). However, unlike outcome devaluation, which influences performance through response–outcome learning, the outcome delivery tends to bias responding in favor of whichever action it signaled during training based on an outcome-response association (Colwill, 1994; Balleine and Ostlund, 2007).
There has been considerable progress in delineating the neural substrates of instrumental action selection in recent years. Pretraining lesions of the basolateral amygdala (BLA) render instrumental performance insensitive to manipulations of outcome value and response–outcome contingency (Balleine et al., 2003; Corbit and Balleine, 2005) and similar impairments have been observed with pretraining lesions of the mediodorsal thalamus (MD) (Corbit et al., 2003), the prelimbic cortex (Balleine and Dickinson, 1998; Corbit and Balleine, 2003; Killcross and Coutureau, 2003; Ostlund and Balleine, 2005), and the dorsomedial striatum (Yin et al., 2005), suggesting that goal-directed performance depends on a widely distributed neural system. Of course, permanent pretraining lesions prevent the target structure from contributing to either acquisition- or performance-related processes and therefore reveal little about its pattern of involvement over the course of training. Although post-training lesion studies have advanced our understanding of the respective contributions of the dorsomedial striatum and prelimbic cortex to goal-directed action selection (Ostlund and Balleine, 2005; Yin et al., 2005), it is not known whether the MD and BLA continue to play a role in instrumental performance after initial training.
We investigated the effects of post-training lesions of the MD or BLA on the selection of actions based on (1) expected reward value, (2) cues that signal reward, and (3) noncontingent reward delivery. We then examined whether these structures contribute to stimulus–outcome contingency learning. The findings indicate that they play important yet distinct roles in the selection and initiation of instrumental actions.
Materials and Methods
Subjects and apparatus.
Twenty-four adult female Long–Evans rats (Harlan, Indianapolis, IN) served as subjects. Rats were housed in pairs in transparent plastic tubs located in a temperature- and humidity-controlled vivarium. Behavioral training and testing were conducted during the light phase of the 12 h light/dark cycle. The subjects were food deprived throughout behavioral training and testing by restricting their daily food allotment to ∼10–12 g of home chow, sufficient to maintain them at ∼85% of their free-feeding weight. The behavioral procedures were performed in 24 identical Med Associates (East Fairfield, VT) operant chambers enclosed in sound- and light-attenuating shells. Each chamber was equipped with two retractable levers that could be extended to the left and right of a recessed food magazine. Attached to each food magazine was a pellet dispenser, used to deliver 45 mg grain-based food pellets (Bio-Serv, Frenchtown, NJ), and an infusion pump fitted with a syringe, used to deliver 0.1 ml drops of 20% sucrose solution. An infrared photobeam crossed the magazine opening, allowing for the detection of head entries. Illumination was provided by a house light (3 W, 24 V) located on the wall opposite the magazine. Tone (2 kHz; 80 dB) and white noise (80 dB) generators were used to produce the auditory cues. A set of three microcomputers running the Med-PC program (Med Associates) controlled all experimental events and recorded responses.
Pavlovian conditioning.
All rats received eight daily sessions of pavlovian conditioning. Each of the two auditory cues was consistently paired with a different outcome. For half of the subjects, the tone was paired with pellets and the noise was paired with sucrose, whereas the remaining subjects received the opposite stimulus–outcome relationships. Each stimulus presentation lasted 2 min during which the corresponding outcome was delivered on a random time (RT) 30 s schedule. Each session consisted of four presentations of each stimulus, presented in random order, with individual trials separated by a variable intertrial interval (ITI; mean, 5 min).
Instrumental training.
All rats then received 11 d of instrumental conditioning. The two responses (left and right lever press) were trained with different outcomes in separate daily sessions. The order of training sessions was alternated over days. For half of the subjects in each group, pressing the left lever delivered pellets and pressing the right lever delivered sucrose, whereas the remaining subjects received the opposite response–outcome relationships, counterbalanced with pavlovian stimulus–outcome relationships. Each session was terminated after 15 outcomes were earned or 30 min had elapsed, whichever came first. For the first two days of instrumental training, lever pressing was continuously reinforced, such that each action resulted in an outcome delivery (probability, 1.0). The reinforcement schedule was then gradually shifted over days through a series of increasing random ratio (RR) schedules: an RR-5 schedule (probability, 0.2) was used on days 3–5, an RR-10 schedule (probability, 0.1) was used on days 6–8, and an RR-20 schedule (probability, 0.05) was used on days 9–11.
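The schedule progression described above can be illustrated with a short simulation. This is only a sketch under stated assumptions: the names `TRAINING_PHASES`, `reward_probability`, and `presses_to_earn` are hypothetical, each lever press is treated as an independent draw with the stated reward probability, and the 30 min session limit is ignored. It is not the Med-PC program used in the study.

```python
import random

# (first_day, last_day, reward probability per press), as stated in the text.
TRAINING_PHASES = [
    (1, 2, 1.0),    # continuous reinforcement
    (3, 5, 0.2),    # RR-5
    (6, 8, 0.1),    # RR-10
    (9, 11, 0.05),  # RR-20
]


def reward_probability(day):
    """Return the per-press reward probability for training day 1-11."""
    for first, last, p in TRAINING_PHASES:
        if first <= day <= last:
            return p
    raise ValueError(f"no schedule defined for day {day}")


def presses_to_earn(n_outcomes, p, rng):
    """Count simulated presses needed to earn n_outcomes rewards.

    Assumption: every press is reinforced independently with probability p.
    """
    presses = 0
    earned = 0
    while earned < n_outcomes:
        presses += 1
        if rng.random() < p:
            earned += 1
    return presses


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the illustration is repeatable
    for day in (1, 4, 7, 10):
        p = reward_probability(day)
        # 15 outcomes is the per-session cap given in the text.
        print(day, p, presses_to_earn(15, p, rng))
```

Under this assumption the expected number of presses per outcome is 1/p, so the average work requirement rises from 1 press to 5, 10, and then 20 presses per outcome across the four phases.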
Surgical procedures.
After the conclusion of instrumental training, all rats were provided with unrestricted access to lab chow for two days before and for seven days after surgery. At the time of surgery, rats were anesthetized with pentobarbital (Nembutal, 50 mg/kg) and administered atropine (0.1 mg) before being placed in a stereotaxic frame (Stoelting, Wood Dale, IL). An incision was made into the scalp to expose the skull surface and the incisor bar was adjusted to place bregma and lambda in the same horizontal plane. Small burr holes were drilled into the skull above the target sites. Bilateral excitotoxic lesions were made by manually infusing NMDA (20 μg/μl in PBS) into either the BLA or MD at a rate of 0.1 μl/min using a 1 μl Hamilton syringe. For BLA lesions, 0.25 μl of NMDA was infused into each of four sites [anteroposterior (AP) −2.3, −3.0; mediolateral (ML) ±5.2; dorsoventral (DV) −7.6; AP and ML coordinates relative to bregma and DV coordinates relative to dura]. For MD lesions, 0.4 μl of NMDA was infused into each of two sites (AP −2.2; ML ± 0.7; DV −6.2; all coordinates relative to bregma). The needle was left in place for an additional 4 min in each infusion site to allow for diffusion of the drug. Sham lesions were made using the same procedures except that the needle was not lowered and no drug was infused. A recovery period of 10 d was provided between surgery and behavioral testing. Rats were handled daily and returned to the food deprivation schedule during the last 3 d of this period.
Outcome devaluation.
Subjects were given two sessions of outcome devaluation testing (one for each outcome). For the first test, all rats were initially given unrestricted access to one of the two training outcomes for 1 h. One-half of the subjects in each condition were satiated on food pellets (40 g in a glass bowl placed in their home cage) and the other half were satiated on sucrose (40 ml in drinking bottle attached to their home cage). Immediately after this period, the rats were placed in the behavioral chambers for a 5-min choice extinction test in which both levers were inserted but no outcomes were delivered. Forty-eight hours later, each rat was given a second test otherwise identical to the first except that they were satiated on the other outcome.
Cue-guided response selection.
All rats were given a single pavlovian-instrumental transfer test session 48 h after the last outcome devaluation test. Both levers were inserted into the chamber for the duration of the session. Lever presses were recorded but did not result in the delivery of reward (i.e., the levers were inactive). After an initial 10 min period of extinction, the effect of pavlovian cue presentations on instrumental performance was then assessed over a series of eight transfer trials, four with each stimulus, occurring in the following order: tone-noise-noise-tone-noise-tone-tone-noise. Stimulus presentations lasted 2 min and were separated by a 3 min fixed ITI.
Outcome-guided response selection.
Forty-eight hours later, all rats were given a single test of outcome-guided response selection modeled after the pavlovian-instrumental transfer test described above. Both levers were available for the duration of the session but were again inactive. As with transfer, the rats were given 10 min of initial extinction. The effect of free outcome deliveries on instrumental performance was then assessed over a series of eight trials, four trials with each outcome, occurring in the following order: sucrose-grain-grain-sucrose-grain-sucrose-sucrose-grain. Each trial was initiated by two deliveries of the appropriate outcome. Trials were separated by a 5-min fixed ITI.
Pavlovian contingency degradation.
All rats were then given eight days of outcome-selective pavlovian contingency degradation training using a procedure adapted from Delamater (1995). Each rat was trained with the same stimulus–outcome relationships that they had been presented with during initial pavlovian training. During contingency degradation training, however, stimulus presentations lasted only 20 s and were immediately followed by the delivery of the appropriate outcome with a probability of 0.5. Each session consisted of eight trials with each stimulus, presented in a pseudo-random order with a variable ITI (mean, 4 min). During the ITI, one of the two outcomes was delivered noncontingently with a probability of 0.5 in each 20 s period, such that it was equiprobable in the presence and the absence of its corresponding stimulus. Half of the subjects in each group received noncontingent food pellets and half received noncontingent sucrose (counterbalanced with the stimulus–outcome relationships).
Histology.
After behavioral testing, the rats received a lethal overdose of sodium pentobarbital and were perfused transcardially with 0.9% saline followed by 10% buffered formalin solution. Brains were then extracted and fixed in a 25% sucrose-formalin solution for 2–3 d. The brains were then frozen and sliced into 50 μm coronal sections around the target structure. These sections were mounted on glass slides and stained with thionin. A light microscope was used to verify the placement and extent of each lesion through comparison with a rat brain atlas (Paxinos and Watson, 1998) and sections from sham-lesioned rats.
Results
Histology
Figure 1 presents the results of the histological analysis. NMDA infusions were effective in producing substantial neuronal damage in each of the target structures. In general, BLA lesions extended throughout the rostrocaudal extent of this structure and were primarily restricted to the basal, accessory basal and lateral nuclei of the amygdala, although limited damage to surrounding structures was observed in some rats. MD lesions encompassed the medial, central and lateral nuclei of the MD thalamus, but also extended into surrounding thalamic nuclei (e.g., centeromedial, centerolateral, and intermediodorsal nuclei) in some subjects although, again, this was not systematically observed either within or between subjects. Two subjects died during surgery. The final group sizes for testing were as follows: group sham, n = 8; group MD, n = 7; group BLA, n = 7.
Initial training
Pavlovian and instrumental conditioning proceeded without incident. The mean rate of responding during the last session of instrumental training was 26.1 (±2.6) for the sham group, 21.6 (±2.6) for the MD group and 23.3 (±2.4) for the BLA group. An ANOVA performed on these data found that the rate of instrumental performance did not significantly differ across groups (F < 1). These data represent the presurgery baseline response rate. No rewarded sessions were given after surgery to prevent potential reacquisition in rats given post-training lesions. We did find that group BLA displayed a significantly lower rate of responding than the other two groups across the tests of response selection (see below). Because these tests were conducted in extinction, it is possible that this effect reflects a failure to persist in the absence of reinforcement. Pretraining BLA lesions, however, tend to have little if any effect on the baseline rate of responding in either rewarded sessions or in transfer tests conducted in extinction (Balleine et al., 2003; Corbit and Balleine, 2005). Therefore, it seems more likely that the response decrement observed here occurred because the BLA was functional during initial training, allowing it to be incorporated into the neural circuitry underlying the expression of instrumental performance. Furthermore, it should also be carefully noted that the tests of action selection used in the current study were designed to target choice performance; i.e., we were interested in how performance was distributed across the two actions rather than on the overall rate of responding.
Instrumental outcome devaluation test
The first test we conducted after surgery targeted the selection of actions based on anticipated reward value. Although the two actions were trained with outcomes of comparable value, the rats were satiated on one of the two outcomes immediately before testing to temporarily reduce the value of that outcome. Figure 2 presents the test results in successive 1 min bins for the action that had earned the devalued outcome (devalued) and the action that had earned the nondevalued outcome (nondevalued). Whereas both the sham group and the MD group displayed a selective suppression of responding for the devalued outcome, the BLA group displayed generally low rates of responding on both levers. A three-way mixed ANOVA with lesion, value (devalued and nondevalued) and minute (1–5) as factors resulted in significant main effects of lesion (F(2,19) = 6.54; p < 0.01), value (F(1,19) = 32.29; p < 0.001), and minute (F(4,76) = 6.12; p < 0.001). There was also a significant lesion by value interaction (F(2,19) = 6.87; p < 0.01), confirming that sensitivity to outcome devaluation differed across groups. No other interaction reached significance (F values < 1). Simple effects analysis revealed a significant effect of value for group sham (F(1,7) = 22.14; p < 0.01) and group MD (F(1,6) = 11.26; p < 0.05), but not for group BLA (F(1,6) = 1.66; p > 0.05). Although these results suggest the BLA group was impaired in using outcome value to choose between instrumental actions, this interpretation is complicated somewhat by the fact that this group displayed a significantly lower rate of responding than the other two groups. It is therefore possible that our ability to detect any further decrement in their performance was hindered by a floor effect.
However, previous studies have found that unoperated rats responding at a similarly low rate (for example, those trained on concurrent interval schedules) are capable of decreasing their performance in reaction to outcome devaluation (Colwill and Rescorla, 1985). It is also important to note that these rats failed to display any evidence of outcome devaluation at the beginning of the test session, when their response rate was greatest and when there was sufficient room to observe an effect.
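For readers who want to check how the degrees of freedom in the devaluation ANOVA follow from the design, the sketch below computes them for a mixed design with one between-subjects factor (lesion) and one or more within-subjects factors. It is an illustration only: the function name `mixed_anova_df` is hypothetical, and it assumes conventional pooled error terms with no sphericity correction, which the text does not specify.

```python
from math import prod


def mixed_anova_df(n_subjects, n_groups, within_levels=(), includes_group=False):
    """Return (df_effect, df_error) for one effect in a mixed ANOVA.

    n_subjects     -- total number of subjects in the analysis
    n_groups       -- number of levels of the between-subjects factor
    within_levels  -- number of levels of each within-subjects factor
                      that takes part in the effect (empty for a purely
                      between-subjects effect)
    includes_group -- True if the between-subjects factor is part of the effect
    """
    df_within = prod(k - 1 for k in within_levels)  # equals 1 when empty
    df_group = (n_groups - 1) if includes_group else 1
    df_error = (n_subjects - n_groups) * df_within
    return df_group * df_within, df_error


if __name__ == "__main__":
    # Devaluation test: 22 rats in 3 groups, value (2 levels), minute (5 levels).
    print("lesion", mixed_anova_df(22, 3, includes_group=True))
    print("value", mixed_anova_df(22, 3, within_levels=(2,)))
    print("minute", mixed_anova_df(22, 3, within_levels=(5,)))
    print("lesion x value", mixed_anova_df(22, 3, (2,), includes_group=True))
    # Simple effect of value within the sham group alone (n = 8).
    print("value, sham only", mixed_anova_df(8, 1, within_levels=(2,)))
```

With 22 subjects in three groups, this reproduces the values reported for the devaluation test: (2, 19) for lesion, (1, 19) for value, (4, 76) for minute, and (2, 19) for the lesion by value interaction.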
Cue-guided response selection
We then conducted an outcome-selective pavlovian-instrumental transfer test, which assessed the rats' ability to use cues that signal reward to choose between instrumental actions based on a shared outcome representation. For instance, a cue paired with grain pellets should selectively facilitate responding on an action that earned pellets over an action that earned sucrose solution. Notice that, to bias action selection one way or the other, the eliciting cue must retrieve an outcome representation incorporating sensory features that are unique to that outcome. Interestingly, although the retrieval of nonspecific outcome features might be expected to have some excitatory effect on actions trained with a different outcome, this is rarely observed in tests of selective transfer (Colwill and Rescorla, 1988; Ostlund and Balleine, 2007b), suggesting that this effect is primarily mediated by the distinctive sensory features of the anticipated outcome. More general excitatory effects of reward-related cues on the rate of instrumental performance have been observed, but only under conditions in which the specific sensory properties of the predicted outcome were unlikely to form an important part of the representational structure controlling action selection (cf. Corbit and Balleine, 2005; for discussion, see Corbit et al., 2007).
The results of the test are presented in Figure 3 as the mean number of lever presses performed per minute (collapsed across actions) during the stimulus that predicted the same outcome as that action (Same), during the stimulus that predicted the outcome earned by the other action (Different), and during the 2 min that preceded each stimulus presentation (baseline). As expected, the sham group displayed clear evidence of outcome-specific pavlovian-instrumental transfer, selecting actions based on the outcome signaled by the eliciting stimulus. This effect was not observed in the performance of BLA- or MD-lesioned rats. The results of a two-way mixed ANOVA using lesion and stimulus (baseline, Same, and Different) as factors found a significant effect of lesion (F(2,19) = 14.83; p < 0.001) and stimulus (F(2,38) = 10.673; p < 0.001), and also detected a significant lesion by stimulus interaction (F(4,38) = 3.27; p < 0.05). A significant effect of stimulus was observed for group sham (F(2,14) = 13.82; p < 0.001), and Bonferroni post hoc analysis found that these rats responded significantly more during stimulus Same than during either stimulus Different (p < 0.01) or during the baseline period (p < 0.05), confirming that the pavlovian cues biased action selection in an outcome-specific manner. In contrast, the stimulus effect did not reach significance for group BLA (F(2,12) = 2.12; p > 0.20) or group MD (F(2,12) = 2.94; p = 0.09), suggesting that their instrumental performance was relatively insensitive to reward-related cues.
Outcome-guided response selection
Our next test assessed the impact of noncontingent reward delivery on instrumental response selection. As with selective transfer, rats are known to use the unique sensory features of freely delivered outcomes to guide their instrumental performance; e.g., the delivery of a grain pellet tends to selectively facilitate the performance of an action trained with pellets, relative to another action trained with sucrose solution (Ostlund and Balleine, 2007a). However, this effect differs from transfer in that it does not involve stimulus–outcome learning, but instead depends on a highly specific stimulus–response connection in which the training outcome serves as the eliciting stimulus.
Figure 4 presents the results of the test, plotted as the mean number of lever presses performed per minute (collapsed across actions) during the 2 min that followed the delivery of the outcome that had been earned by that action (Same), during the 2 min that followed the delivery of the outcome that had been earned by the other action (Different), and during the 2 min that preceded each outcome delivery (baseline). As expected, the performance of the sham group was influenced by the discriminative stimulus properties of the outcome, such that their tendency to perform an action was greater after the delivery of the Same outcome than after the delivery of the Different outcome. Although the MD group displayed a similar pattern of responding, the BLA group failed to show this effect. A two-way mixed ANOVA using lesion and outcome (baseline, Same, and Different) as factors found a main effect of lesion (F(2,19) = 6.48; p < 0.01) and outcome (F(2,38) = 18.7; p < 0.001), and also found a significant lesion by outcome interaction (F(4,38) = 3.34; p < 0.05). Simple effects analysis detected a significant effect of outcome for group sham (F(2,14) = 10.28; p < 0.01) and group MD (F(2,12) = 9.29; p < 0.01), but not for group BLA (F(2,12) = 2.62; p > 0.10). Bonferroni post hoc analysis revealed that both the sham group (p < 0.05) and the MD group (p < 0.05) responded more after the delivery of outcome Same than after the delivery of outcome Different, although we did not observe a significant increment in responding after the delivery of outcome Same relative to the baseline period for either group sham or group MD (p > 0.10).
This latter finding is not surprising given that, in the cue-based selection test, the eliciting stimulus was present throughout the entire 2 min observation period, whereas it was delivered only at the very beginning of the observation period in the outcome-based test; i.e., it is possible that the weaker stimulus support for responding in the outcome-based test was not sufficient to elevate responding over baseline. Furthermore, it is worth emphasizing that the difference in responding observed across periods Same and Different provides an unambiguous demonstration that the free outcome deliveries were effective in guiding action selection in both sham- and MD-lesioned rats.
Pavlovian contingency degradation
Pavlovian contingency degradation training was then conducted to further explore the involvement of the MD and BLA in stimulus–outcome encoding. Specifically, we assessed whether lesioned rats would appropriately suppress their conditioned approach performance to a cue that no longer served as a reliable signal of its outcome. The results are presented in Figure 5. The left panels display the amount of time spent in the food magazine during the stimulus that was paired with the noncontingently delivered outcome (degraded), during the stimulus that was paired with the other outcome (nondegraded), and during the 20 s that immediately preceded each stimulus presentation (baseline). For convenience, the conditional component of these data is presented in the right panels using a difference score (stimulus − baseline). The sham group displayed normal contingency learning, selectively withholding their approach performance during the degraded stimulus while continuing to approach during the nondegraded stimulus. However, MD and BLA lesions produced dissociable effects on pavlovian contingency learning: whereas MD-lesioned rats exhibited a general reduction in responding to both stimuli, the approach performance of BLA-lesioned rats appeared to be completely insensitive to the noncontingent outcome deliveries (i.e., both stimuli continued to evoke conditioned approach behavior). A three-way mixed ANOVA using lesion, stimulus (degraded, nondegraded, and baseline) and block (1–4) as factors found a significant main effect of stimulus (F(2,38) = 7.22; p < 0.01) and significant stimulus by block (F(6,114) = 4.42; p < 0.001) and lesion by stimulus by block (F(12,114) = 1.95; p < 0.05) interactions. No other main effect or interaction reached significance (largest F value, group by block interaction; F(6,57) = 1.32; p > 0.25). To explore the source of the three-way interaction, a separate stimulus by block ANOVA was conducted for each group.
For the sham group, the ANOVA detected a marginal effect of stimulus (F(2,14) = 3.71; p = 0.05), but found no effect of block (F(3,21) = 2.28; p > 0.10). More importantly, however, the test found a significant stimulus by block interaction (F(6,42) = 2.90; p < 0.05). This interaction continued to be significant with the baseline data excluded from the analysis (i.e., degraded vs nondegraded) (F(3,21) = 3.58; p < 0.05), indicating that shams acquired differential levels of responding to the two stimuli over the course of contingency training. For the MD group, the ANOVA found no effect of stimulus (F(2,12) = 1.18; p > 0.10) or block (F < 1), but did detect a significant interaction between these factors (F(6,36) = 3.84; p < 0.001). With reference to Figure 5, it seems likely that this interaction reflected the convergence of performance during both the degraded and nondegraded stimuli with the baseline performance over trials. As support for this interpretation, the stimulus by block interaction did not reach significance (F < 1) when baseline data were excluded from the analysis, indicating that the MD group responded similarly to the degraded and nondegraded stimuli over the course of contingency training. The significant interaction with the baseline data included therefore confirms that the MD group exhibited a nonspecific decrease in responding to both cues relative to baseline. For the BLA group, the test resulted in a significant main effect of stimulus (F(2,12) = 3.92; p < 0.05), but found no effect of block (F(3,18) = 1.16; p > 0.10) or stimulus by block interaction (F < 1), indicating that their performance was unaffected by this manipulation of stimulus–outcome contingency.
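The difference score plotted in the right panels of Figure 5 (stimulus − baseline) is a simple subtraction applied block by block. The sketch below shows one way to compute it. The function name `difference_scores` and all numeric values are hypothetical and serve only to illustrate the calculation; they are not data from the study.

```python
def difference_scores(stimulus_times, baseline_times):
    """Return stimulus-minus-baseline magazine time for each block.

    Both arguments are sequences of equal length, one value per block,
    giving the time spent in the food magazine (in seconds).
    """
    if len(stimulus_times) != len(baseline_times):
        raise ValueError("stimulus and baseline must cover the same blocks")
    return [s - b for s, b in zip(stimulus_times, baseline_times)]


# Hypothetical magazine times (s) across four two-day blocks.
baseline = [2.0, 2.0, 3.0, 3.0]
degraded = [6.0, 5.0, 4.0, 3.0]
nondegraded = [6.0, 6.0, 7.0, 7.0]

print("degraded:", difference_scores(degraded, baseline))
print("nondegraded:", difference_scores(nondegraded, baseline))
```

A score near zero indicates that a cue evokes no more approach than the pre-stimulus period. In these made-up values, the degraded cue falls toward zero across blocks while the nondegraded cue stays elevated, which is the qualitative pattern described for the sham group.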
Discussion
Our aim was to investigate the involvement of the MD and BLA in the control of action selection in instrumental conditioning. We found that rats with post-training MD lesions showed normal sensitivity to outcome devaluation, providing evidence that their memory for response–outcome relationships was preserved. Additional testing revealed that group MD was impaired in cue-guided, but not outcome-guided, response selection, indicating a rather specific deficit in using stimulus–outcome associations to choose between instrumental actions. In contrast, we found that post-training BLA lesions disrupted performance on all three tests of instrumental control, suggesting that this structure plays a more fundamental role in encoding and maintaining reward representations.
The finding that post-training MD lesions failed to affect outcome devaluation performance is noteworthy given an earlier report that rats with pretraining MD lesions exhibit insensitivity to manipulations of reward value and response–outcome contingency (Corbit et al., 2003). These previous findings fit nicely with the view that response–outcome learning is mediated by a corticostriatal circuit involving the prelimbic region of the prefrontal cortex and the dorsomedial striatum (Balleine, 2005). Pretraining lesions of these structures also leave instrumental performance insensitive to outcome devaluation and response–outcome contingency degradation (Balleine and Dickinson, 1998; Corbit and Balleine, 2003; Killcross and Coutureau, 2003; Ostlund and Balleine, 2005; Yin et al., 2005). Other findings indicate that the dorsomedial striatum and prelimbic cortex make distinct contributions to instrumental learning. Yin et al. (2005) found that goal-directed instrumental performance can be disrupted by permanently lesioning the dorsomedial striatum after training or by temporarily inactivating it just before the test session, suggesting that it is critical for the permanent storage and/or expression of response–outcome associations. In contrast, post-training prelimbic lesions have been shown to leave intact outcome devaluation performance (Ostlund and Balleine, 2005), indicating that its contribution is limited to acquisition. Our findings suggest that the involvement of the MD in instrumental learning mirrors that of the prelimbic cortex, which may not be surprising given that it is an important relay of information from the basal ganglia to the prefrontal cortex (Ongur and Price, 2000). 
It is therefore possible that whereas the successful encoding of response–outcome associations depends on an interaction between the dorsomedial striatum and the prelimbic cortex relayed via the MD, the retrieval and implementation of these associations ultimately becomes independent of this thalamocortical feedback circuit. However, additional research will be needed to investigate this hypothesis.
There have been previous reports that pretraining BLA lesions impair instrumental outcome devaluation performance (Balleine et al., 2003; Corbit and Balleine, 2005). Our study demonstrates that BLA lesions made after training have a similar effect, indicating that its contribution to goal-directed performance is long lasting. This finding is consistent with other previous findings implicating the BLA in post-training reward evaluation (Wang et al., 2005; Wellman et al., 2005). The BLA also appears to be important for stimulus–outcome encoding. Pretraining lesions of this structure disrupt outcome-selective transfer performance (Blundell et al., 2001; Corbit and Balleine, 2005) and render pavlovian conditioned approach performance insensitive to outcome devaluation (Hatfield et al., 1996). However, Pickens et al. (2003) found that BLA lesions made after initial pavlovian training left intact the effect of outcome devaluation on conditioned approach, raising the possibility that it may be necessary for the acquisition, but not the expression, of pavlovian outcome expectancies. In contrast to this interpretation, we found that post-training BLA lesions abolished outcome-selective transfer. One explanation for this apparent discrepancy is that the BLA is only required for the expression of stimulus–outcome learning when these associations are needed to guide instrumental action selection. Alternatively, it may be that selective transfer is a more sensitive assay of outcome encoding than the single-outcome devaluation task used by Pickens et al. (2003) because the former, but not the latter, relies on rats' ability to generate an outcome expectation that is detailed enough to be discriminated from other available outcomes.
According to this account, post-training BLA lesions should disrupt the impact of outcome devaluation on conditioned approach performance when an outcome-selective experimental design is used (Colwill and Motzkin, 1994; Blundell et al., 2003), a prediction that remains to be assessed.
It is also revealing that BLA damage tends to spare performance on tests of action selection that do not rely on the use of detailed reward representations, including discrimination performance based on conventional cues (Burns et al., 1999) and the general motivational form of transfer (Corbit and Balleine, 2005). Based on such findings, it has been suggested that the BLA is responsible for encoding sensory-specific reward representations (Balleine and Killcross, 2006). Providing additional support for this view, pretraining BLA lesions have been shown to impair rats' ability to use the identity of freely delivered rewards as the basis for their choice between instrumental actions (Balleine et al., 2003). We found that post-training BLA lesions were also effective in disrupting performance on an outcome-guided response selection task, suggesting that the BLA is involved in the long-term storage and/or implementation of these reward representations.
We also found that the MD is critical for cue-based action selection. Because MD lesions spared the rats' capacity to choose between actions based on either the value or the discriminative stimulus properties of a reward, this effect does not appear to be the result of a general impairment in response selection or reward representation. Instead, the data support the view that the MD plays an important role in the expression of stimulus–outcome associations. In contrast to its role in response–outcome encoding, it seems unlikely that this function of the MD depends on its interaction with the prelimbic cortex, as lesions of the prelimbic cortex have no effect on outcome-selective transfer (Corbit and Balleine, 2003). However, the MD shares connections with the entire prefrontal cortex, including the orbitofrontal region, an area known to be involved in the expression of pavlovian outcome expectancies (Holland and Gallagher, 2004; Schoenbaum and Roesch, 2005). Interestingly, as with the MD, post-training orbitofrontal lesions impair cue-guided action selection, but preserve the sensitivity of instrumental performance to outcome devaluation (Ostlund and Balleine, 2007b).
To further explore the role of the MD and BLA in pavlovian learning, we assessed how lesions of these areas would affect rats' ability to learn about a change in the predictive relationship between a pavlovian cue and its outcome. In normal rats, the magnitude of conditioned approach performance tends to reflect the contingency that exists between the cue and its outcome and not merely the number of times these events have been paired (Pearce and Bouton, 2001). In the current study, contingency learning was tested by selectively degrading one of two previously trained stimulus–outcome associations. Although both cues continued to be paired with their respective outcomes, one outcome was also delivered in an unpaired manner. Sham-lesioned rats showed clear and selective sensitivity to this contingency degradation treatment, suppressing their conditioned approach to the cue that no longer served as a valid predictor of its outcome while continuing to respond to the control cue. Rats with MD lesions, however, displayed a general reduction in responding to both cues, regardless of their current predictive status. Interestingly, we found previously that orbitofrontal lesions produce a similar nonspecific sensitivity to stimulus–outcome contingency degradation (Ostlund and Balleine, 2007b). Such an impairment could result from a failure to generate outcome-specific pavlovian expectancies. According to this account, rats with lesions of either the MD or orbitofrontal cortex are capable of stimulus–outcome contingency learning but are forced to rely on generic outcome expectations, perhaps composed of the general affective properties common to both outcomes (Balleine and Killcross, 2006). It is reasonable to assume that this inability to form discriminable outcome expectancies would produce a nonspecific contingency degradation effect; the decrement in associative strength produced by the noncontingent outcome deliveries would be assigned equally to both cues.
Alternatively, it is possible that MD (and orbitofrontal) lesions abolish the capacity for stimulus–outcome contingency learning, leaving approach performance to fall under the control of the response–outcome learning system that supports goal-directed instrumental performance. Because no explicit instrumental contingency was arranged during this phase of the experiment, their approach performance should have slowly decreased to a low rate distributed nondifferentially across both of the cues and the intertrial interval, as was observed.
In contrast, BLA-lesioned rats displayed a complete insensitivity to contingency degradation training. This pattern of results cannot be explained by either the generic outcome encoding account or the instrumental contingency learning account, because both predict some decrement in responding. Instead, performance in the BLA group suggests that rats with these lesions no longer use an error-based learning rule and default to a contiguity-based learning rule instead (Hebb, 1949). Because the temporal contiguity and frequency of stimulus–outcome pairings remained stable and did not differ across cues during contingency degradation training, any animal relying on a simple contiguity rule should have maintained high levels of conditioned approach performance to both the predictive and the nonpredictive cues, as was observed in BLA-lesioned rats.
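The contrast between the two learning rules can be illustrated with a small simulation. The sketch below is not the analysis used in this study; it is a minimal illustration under stated assumptions. The error-based rule is written as a Rescorla-Wagner-style summed-error update in which the context (X) competes with the cues, and the contiguity rule as a pairing-only update. The learning rate, the number of cycles, the four intertrial periods per cycle, and the delivery of a free O1 on two of those periods during degradation are all hypothetical choices.

```python
# Illustrative sketch only: the parameters, trial structure, and both learning
# rules are assumptions, not the procedure or analysis used in the study.
CUES = ("A", "B", "X")        # two pavlovian cues plus the context X
OUTCOMES = ("O1", "O2")       # cue A is paired with O1, cue B with O2
ALPHA = 0.1                   # assumed learning rate


def new_weights():
    """Associative strength of every cue for every outcome, starting at 0."""
    return {c: {o: 0.0 for o in OUTCOMES} for c in CUES}


def error_rule(v, present, delivered):
    """Error-based update: all cues present share one summed prediction error."""
    for o in OUTCOMES:
        target = 1.0 if o in delivered else 0.0
        error = target - sum(v[c][o] for c in present)
        for c in present:
            v[c][o] += ALPHA * error


def contiguity_rule(v, present, delivered):
    """Contiguity-only update: a cue gains strength whenever it is paired with an outcome."""
    for o in delivered:
        for c in present:
            v[c][o] += ALPHA * (1.0 - v[c][o])


def run(rule, cycles=200):
    """Initial training followed by contingency degradation of the A-O1 relation."""
    v = new_weights()
    for degraded in (False, True):
        for _ in range(cycles):
            rule(v, ("A", "X"), ("O1",))      # cue A is still paired with O1
            rule(v, ("B", "X"), ("O2",))      # cue B is still paired with O2
            for i in range(4):                # intertrial periods, context alone
                free = ("O1",) if degraded and i % 2 == 0 else ()
                rule(v, ("X",), free)
    return {"A": v["A"]["O1"], "B": v["B"]["O2"]}


rw = run(error_rule)
hebb = run(contiguity_rule)
print("error-based:", rw)
print("contiguity: ", hebb)
```

Under these assumptions, the error-based rule reduces the strength of the degraded cue (A) while leaving the control cue (B) near asymptote, which resembles the selective effect seen in the sham group. The contiguity-only rule keeps both cues near asymptote because each is still paired with its outcome on every trial, which resembles the pattern seen in the BLA group.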
Footnotes
This work was supported by National Institute of Mental Health Grant 56446.
References
- Balleine BW. Neural bases of food-seeking: affect, arousal and reward in corticostriatolimbic circuits. Physiol Behav. 2005;86:717–730. doi: 10.1016/j.physbeh.2005.08.061.
- Balleine BW, Dickinson A. Goal-directed instrumental action: contingency and incentive learning and their cortical substrates. Neuropharmacology. 1998;37:407–419. doi: 10.1016/s0028-3908(98)00033-1.
- Balleine BW, Killcross S. Parallel incentive processing: an integrated view of amygdala function. Trends Neurosci. 2006;29:272–279. doi: 10.1016/j.tins.2006.03.002.
- Balleine BW, Ostlund SB. Still at the choice-point: action selection and initiation in instrumental conditioning. Ann NY Acad Sci. 2007;1104:147–171. doi: 10.1196/annals.1390.006.
- Balleine BW, Killcross AS, Dickinson A. The effect of lesions of the basolateral amygdala on instrumental conditioning. J Neurosci. 2003;23:666–675. doi: 10.1523/JNEUROSCI.23-02-00666.2003.
- Blundell P, Hall G, Killcross S. Lesions of the basolateral amygdala disrupt selective aspects of reinforcer representation in rats. J Neurosci. 2001;21:9018–9. doi: 10.1523/JNEUROSCI.21-22-09018.2001.
- Blundell P, Hall G, Killcross S. Preserved sensitivity to outcome value after lesions of the basolateral amygdala. J Neurosci. 2003;23:7702–7709. doi: 10.1523/JNEUROSCI.23-20-07702.2003.
- Burns LH, Everitt BJ, Robbins TW. Effects of excitotoxic lesions of the basolateral amygdala on conditional discrimination learning with primary and conditioned reinforcement. Behav Brain Res. 1999;100:123–133. doi: 10.1016/s0166-4328(98)00119-3.
- Colwill RM. Associative representations of instrumental contingencies. In: Medin DL, editor. The psychology of learning and motivation. New York: Academic; 1994. pp. 1–72.
- Colwill RM, Motzkin DK. Encoding of the unconditioned stimulus in pavlovian conditioning. Anim Learn Behav. 1994;22:284–394.
- Colwill RM, Rescorla RA. Postconditioning devaluation of a reinforcer affects instrumental responding. J Exp Psychol Anim Behav Process. 1985;11:120–132.
- Colwill RM, Rescorla RA. Associations between the discriminative stimulus and the reinforcer in instrumental learning. J Exp Psychol Anim Behav Process. 1988;14:155–164.
- Corbit LH, Balleine BW. The role of prelimbic cortex in instrumental conditioning. Behav Brain Res. 2003;146:145–157. doi: 10.1016/j.bbr.2003.09.023.
- Corbit LH, Balleine BW. Double dissociation of basolateral and central amygdala lesions on the general and outcome-specific forms of pavlovian-instrumental transfer. J Neurosci. 2005;25:962–970. doi: 10.1523/JNEUROSCI.4507-04.2005.
- Corbit LH, Muir JL, Balleine BW. Lesions of mediodorsal thalamus and anterior thalamic nuclei produce dissociable effects on instrumental conditioning in rats. Eur J Neurosci. 2003;18:1286–1294. doi: 10.1046/j.1460-9568.2003.02833.x.
- Corbit LH, Janak PH, Balleine BW. General and outcome-specific forms of pavlovian-instrumental transfer: the effect of shifts in motivational state and inactivation of the ventral tegmental area. Eur J Neurosci. 2007;26:3141–3149. doi: 10.1111/j.1460-9568.2007.05934.x.
- Delamater AR. Outcome-selective effects of intertrial reinforcement in pavlovian appetitive conditioning. Anim Learn Behav. 1995;23:31–39.
- Delamater AR, LoLordo VM, Sosa W. Outcome-specific conditioned inhibition in pavlovian backward conditioning. Learn Behav. 2003;31:393–402. doi: 10.3758/bf03196000.
- Dickinson A, Balleine BW. Actions and responses: the dual psychology of behaviour. In: Eilan N, McCarthy R, Brewer MW, editors. Spatial representation. Oxford: Basil Blackwell; 1993. pp. 277–293.
- Hatfield T, Han JS, Conley M, Gallagher M, Holland P. Neurotoxic lesions of basolateral, but not central, amygdala interfere with pavlovian second-order conditioning and reinforcer devaluation effects. J Neurosci. 1996;16:5256–5265. doi: 10.1523/JNEUROSCI.16-16-05256.1996.
- Hebb DO. The organization of behavior. New York: Wiley; 1949.
- Holland PC. Relations between pavlovian-instrumental transfer and reinforcer devaluation. J Exp Psychol Anim Behav Process. 2004;30:104–117. doi: 10.1037/0097-7403.30.2.104.
- Holland PC, Gallagher M. Amygdala-frontal interactions and reward expectancy. Curr Opin Neurobiol. 2004;14:148–155. doi: 10.1016/j.conb.2004.03.007.
- Killcross S, Coutureau E. Coordination of actions and habits in the medial prefrontal cortex of rats. Cereb Cortex. 2003;13:400–408. doi: 10.1093/cercor/13.4.400.
- Kruse JM, Overmier JB, Konz WA, Rokke E. Pavlovian conditioned stimulus effects on instrumental choice behavior are reinforcer specific. Learn Motiv. 1983;14:165–181.
- Ongur D, Price JL. The organization of networks within the orbital and medial prefrontal cortex of rats, monkeys and humans. Cereb Cortex. 2000;10:206–219. doi: 10.1093/cercor/10.3.206.
- Ostlund SB, Balleine BW. Lesions of medial prefrontal cortex disrupt the acquisition but not the expression of goal-directed learning. J Neurosci. 2005;25:7763–7770. doi: 10.1523/JNEUROSCI.1921-05.2005.
- Ostlund SB, Balleine BW. Selective reinstatement of instrumental performance depends on the discriminative stimulus properties of the mediating outcome. Learn Behav. 2007a;35:43–52. doi: 10.3758/bf03196073.
- Ostlund SB, Balleine BW. Orbitofrontal cortex mediates outcome encoding in pavlovian but not instrumental conditioning. J Neurosci. 2007b;27:4819–4825. doi: 10.1523/JNEUROSCI.5443-06.2007.
- Paxinos G, Watson C. The rat brain in stereotaxic coordinates. Ed 4. New York: Academic; 1998.
- Pearce JM, Bouton ME. Theories of associative learning in animals. Annu Rev Psychol. 2001;52:111–139. doi: 10.1146/annurev.psych.52.1.111.
- Pickens CL, Saddoris MP, Setlow B, Gallagher M, Holland PC, Schoenbaum G. Different roles for orbitofrontal cortex and basolateral amygdala in a reinforcer devaluation task. J Neurosci. 2003;23:11078–11084. doi: 10.1523/JNEUROSCI.23-35-11078.2003.
- Rescorla RA. Transfer of instrumental control mediated by a devalued outcome. Anim Learn Behav. 1994;22:27–33.
- Schoenbaum G, Roesch M. Orbitofrontal cortex, associative learning, and expectancies. Neuron. 2005;47:633–636. doi: 10.1016/j.neuron.2005.07.018.
- Wang SH, Ostlund SB, Nader K, Balleine BW. Consolidation and reconsolidation of incentive learning in the amygdala. J Neurosci. 2005;25:830–835. doi: 10.1523/JNEUROSCI.4716-04.2005.
- Wellman LL, Gale K, Malkova L. GABAA-mediated inhibition of basolateral amygdala blocks reward devaluation in macaques. J Neurosci. 2005;25:4577–4586. doi: 10.1523/JNEUROSCI.2257-04.2005.
- Yin HH, Ostlund SB, Knowlton BJ, Balleine BW. The role of the dorsomedial striatum in instrumental conditioning. Eur J Neurosci. 2005;22:513–523. doi: 10.1111/j.1460-9568.2005.04218.x.