Flexible control of Pavlovian-instrumental transfer based on expected reward value

Andrew T Marshall; Briac Halbout; Christy N Munson; Collin Hutson; Sean B Ostlund

doi:10.1037/xan0000348

. Author manuscript; available in PMC: 2024 Jan 1.

Published in final edited form as: J Exp Psychol Anim Learn Cogn. 2023 Jan;49(1):14–30. doi: 10.1037/xan0000348

Flexible control of Pavlovian-instrumental transfer based on expected reward value

Andrew T Marshall ¹, Briac Halbout ², Christy N Munson ², Collin Hutson ², Sean B Ostlund ²

PMCID: PMC10561628 NIHMSID: NIHMS1899398 PMID: 36795420

Abstract

The Pavlovian-instrumental transfer (PIT) paradigm is widely used to assay the motivational influence of reward-predictive cues, reflected by their ability to invigorate instrumental behavior. Leading theories assume that a cue’s motivational properties are tied to predicted reward value. We outline an alternative view which recognizes that reward-predictive cues may suppress rather than motivate instrumental behavior under certain conditions, an effect termed positive conditioned suppression. We posit that cues signaling imminent reward delivery tend to inhibit instrumental behavior, which is exploratory by nature, in order to facilitate efficient retrieval of the expected reward. According to this view, the motivation to engage in instrumental behavior during a cue should be inversely related to the value of the predicted reward, since there is more to lose by failing to secure a high-value reward than a low-value reward. We tested this hypothesis in rats using a PIT protocol known to induce positive conditioned suppression. In Experiment 1, cues signaling different reward magnitudes elicited distinct response patterns. Whereas the 1-pellet cue increased instrumental behavior, cues signaling 3 or 9 pellets suppressed instrumental behavior and elicited high levels of food-port activity. Experiment 2 found that reward-predictive cues suppressed instrumental behavior and increased food-port activity in a flexible manner that was disrupted by post-training reward devaluation. Further analyses suggest that these findings were not driven by overt competition between the instrumental and food-port responses. We discuss how the PIT task may provide a useful tool for studying cognitive control over cue-motivated behavior in rodents.

Reward-paired cues acquire Pavlovian incentive motivational properties which allow them to invigorate instrumental reward-seeking behavior (Dickinson et al. 2000; Estes 1948; Lovibond 1981), a phenomenon referred to as Pavlovian-instrumental transfer (PIT). This influence seems to serve an adaptive function by promoting risky and effortful foraging activity in environments that signal potential reward availability. The PIT paradigm is widely used to study the mechanisms of cue-motivated behavior (Cartoni et al. 2016; Corbit and Balleine 2016) and how they contribute to pathological reward seeking in addiction and related disorders (Corbit and Janak 2007; 2016; Garbusow et al. 2016; Genauck et al. 2020; LeBlanc et al. 2013; LeBlanc et al. 2014; LeBlanc et al. 2012; Marshall and Ostlund 2018; Ostlund et al. 2014; Saddoris et al. 2011; Sebold et al. 2021; Shiflett 2012; Shiflett et al. 2013; Vogel et al. 2018; Wyvell and Berridge 2001). However, despite decades of research, much remains unclear about how fundamental variables such as expected reward value influence expression of the PIT effect.

Leading computational models of incentive learning (Dayan and Balleine 2002; McClure et al. 2003; Zhang et al. 2009) assume that motivational value is assigned to cues based on the total amount of delay-discounted reward that they predict (i.e., their state value). The motivational influence of cues should therefore directly depend on basic Pavlovian conditioning parameters such as reward probability, cue-reward interval, and reward magnitude. This account makes some intuitive predictions. For instance, a cue that reliably signals the immediate delivery of a large reward should acquire strong motivational properties, whereas a cue signaling that an upcoming reward will be small, delayed, or unlikely to occur at all, should acquire weak motivational properties.

However, in contrast to these predictions, evidence suggests that the motivational impact of reward-paired cues is instead inversely related to their ability to predict reward. For instance, cues signaling a high probability of imminent reward do not invigorate and may even suppress instrumental performance, an effect known as positive conditioned suppression (Azrin and Hake 1969; Crombag et al. 2008; Lovibond 1981; Meltzer and Hamm 1978; Miczek and Grossman 1971; Vandyne 1971). Instead, such cues elicit high levels of Pavlovian conditioned approach behavior directed at the food-port (Marshall et al. 2020). In contrast, cues that signal a low probability of reward become potent motivators of instrumental behavior, while eliciting more modest levels of food-port activity (Marshall et al. 2020). The motivational influence of cues also depends on their temporal relationship with reward delivery (Delamater and Holland 2008; Delamater and Oakeshott 2007; Lovibond 1981; Matell and Della Valle 2018). For instance, we have shown that cues signaling a fixed 30-sec interval between cue onset and reward delivery produce a gradual suppression of instrumental behavior and a coincident increase in food-port activity as the expected reward delivery time draws near (Marshall and Ostlund 2018).

Such findings suggest that reward-paired cues can acquire distinct motivational and predictive properties that evoke different kinds of behavior, with the former promoting the pursuit of new rewards through instrumental behavior and the latter eliciting a pause in instrumental behavior and anticipatory reward-retrieval activity (Ostlund and Marshall 2021). Organizing behavior in this manner, based on reward expectancy, is important for efficient foraging and is a central pillar of behavior systems theory (Timberlake et al. 1982). While it is adaptive to seek out new rewards through instrumental behavior (or other general search activities) when rewards are scarce, such behavior is unnecessary and may even interfere with the retrieval of a reward that is expected soon (i.e., focal search), increasing the chance that it will be pilfered or otherwise lost.

However, this simplistic description of foraging sidesteps the complexity involved in ambiguous situations, when cues may elicit conflicting tendencies to both seek out new rewards and collect an expected reward. The factors involved in resolving such conflict are not well understood, though presumably the value of the expected reward plays an important role. Early studies on the interaction between Pavlovian and instrumental learning processes may also provide some insight. For instance, Konorski and colleagues demonstrated through a series of studies that Pavlovian cues predicting imminent reward do not simply elicit conditioned consummatory responses – in their case, salivation and orienting toward the food cup – they also acquire the ability to acutely interrupt ongoing instrumental behavior (Ellison and Konorski 1965; Konorski 1967). Such cues were also able to prevent other discriminative cues from motivating instrumental performance (Soltysik et al. 1976) and were themselves extremely resistant to acquiring motivational properties if later used as discriminative stimuli for instrumental performance (Konorski and Wyrwicka 1950). These findings suggest that cues signaling imminent, response-independent reward actively inhibit the expression of cue-motivated behavior.

Following this logic, we have hypothesized that Pavlovian cues acquire the potential to motivate instrumental behavior, but that this motivational response is subject to cognitive control and is therefore suppressed in situations where such behavior would be disadvantageous (Marshall et al. 2020; Marshall and Ostlund 2018; Ostlund and Marshall 2021). Cognitive control broadly refers to the ability to regulate thoughts, emotions, and actions based on an internally represented behavioral goal (Braver 2012). A core function of cognitive control is to override prepotent response tendencies – particularly those which are impulsive or habitual in nature – in situations where an alternative course of action is more advantageous (Botvinick and Braver 2015; Miller and Cohen 2001). We can apply this framework to PIT by assuming that when a cue signals that a valuable reward is imminent, the impulse to engage in instrumental reward-seeking behavior will be suppressed in order to allow for efficient reward retrieval. Thus, the value of the reward predicted by a cue should play an important albeit indirect role in modulating instrumental performance, indicating the degree to which such behavior should be suppressed. A cue that signals imminent delivery of a high-value reward should therefore be less effective in motivating instrumental behavior and more effective in eliciting food-port activity than a cue that signals a less valuable reward. These predictions are readily distinguished from those made by theories of incentive learning that assume a positive correlation between motivational and predictive value (Dayan and Balleine 2002; McClure et al. 2003; Zhang et al. 2009).

Interesting, the limited data that exist on this subject suggests that expected reward value rarely affects the expression of PIT performance. For instance, several studies have shown that cues retain their ability to motivate instrumental behavior despite predicting a reward that has been devalued through conditioned taste aversion learning (Colwill and Rescorla 1990; Holland 2004; Rescorla 1994). However, such studies have generally used PIT protocols designed to maximize motivational effects and use cues that signal long, sporadic intervals before reward delivery. Therefore, while these findings suggest that cues normally motivate instrumental behavior independently of expected reward value, they do not address whether expected reward value modulates the suppressive influence of cues that signal imminent reward (i.e., positive conditioned suppression).

We investigated this issue using an alternative version of the PIT task in which reward-predictive cues tend to suppress instrumental behavior and increase food-port activity in apparent anticipation of reward delivery (Marshall and Ostlund 2018). The role of expected reward value on these responses was assessed using a parametric manipulation of reward magnitude in Experiment 1 and post-training reward devaluation in Experiment 2.

Experiment 1

Experiment 1 investigated the influence of expected reward magnitude on the expression of instrumental reward-seeking and food-port entry behavior during PIT testing. Briefly, hungry rats were first trained to lever press for food pellets before undergoing Pavlovian conditioning, in which two distinct 30-sec auditory cues signaled food pellet delivery at cue offset. Reward magnitude was varied across cues and groups. While all groups had at least one cue that signaled three food pellets, the alternate cue signaled either one (Group 1/3), three (Group 3/3), or nine food pellets (Group 3/9). PIT testing was then performed by intermittently presenting these reward-predictive cues in a noncontingent manner while rats were free to press the lever and enter the food-port, in the absence of reward delivery.

Methods

Animals and Apparatus

Twenty-six experimentally naïve adult male Long Evans rats (Envigo) were used in this experiment. They arrived at the facility (University of California, Irvine; Irvine, CA, USA) at approximately 10 weeks of age. They were pair-housed in a colony room set to a standard 12:12 hr light:dark schedule. Rats were tested during the light phase. Water was always provided ad libitum in the home cages. Rats were fed between 10–14 g of standard lab chow per day during the experiment to maintain them at ~85% of their estimated free-feeding bodyweight. Husbandry and experimental procedures were approved by the UC Irvine Institutional Animal Care and Use Committee (IACUC) and conducted in accordance with the National Research Council Guide for the Care and Use of Laboratory Animals.

The experiment was conducted in 14 operant chambers (Med-Associates; St. Albans, VT), each housed within sound-attenuating, ventilated boxes. Each chamber was equipped with a stainless-steel grid floor; two stainless steel walls (front and back); and a transparent polycarbonate side-wall, ceiling, and door. Pellet dispensers, mounted on the outside of the operant chamber, were equipped to deliver 45-mg food pellets (Bio-Serv) into a recessed food-port centered on the lower section of the front wall. Head entries into the food-port were transduced by an infrared photobeam. A retractable lever was located to the left of the food-port, on the front wall. The chamber was also equipped with a house light centered at the top of the back wall. Auditory stimuli were presented to animals via a speaker located on the back wall. Experimental events were controlled and recorded with 10-ms resolution by the software program MED-PC IV (Tatham and Zurn 1989).

Procedure

Magazine training

All sessions of all phases began with the onset of the houselight. In each of two 30-minute sessions of magazine training, grain-based food pellets were delivered on a random-time (RT) 60-s schedule.

Instrumental training

During initial instrumental (lever-press) training, rats were continuously reinforced with a grain-based food pellet delivery for pressing the left lever (fixed-ratio, FR-1), earning a maximum of 30 pellets per session. These FR-1 sessions lasted no more than 30 min. Seven rats required an extra session of FR-1 training, which lasted until these rats earned 30 pellets. During subsequent training sessions, lever pressing was reinforced according to a random-interval (RI) schedule, such that the lever remained available but was inactive for an average of t seconds after each reward delivery, where individual t values were selected randomly from an exponential distribution. The RI schedule was changed over training days with 1 day of RI-5 (t = 5 sec), 1 day of RI-15 (t = 15 sec), 2 days of RI-30 (t = 30 sec), and 10 days of RI-45 (t = 45 sec) training. Each RI session lasted 30 minutes.

Pavlovian training

Pavlovian training involved exposure to two 30-s conditioned stimuli (CS; 3-kHz tone and 10-Hz clicker; 80 dB) which were paired with reward (grain-based food pellets). Rats were assigned to one of three groups with different CS-reward magnitude arrangements. For Group 1/3 (n = 9), one CS terminated with 1 pellet and the other with 3 pellets. For Group 3/3 (n = 8), both CSs terminated with 3 pellets. For Group 3/9 (n = 9), one CS terminated with 3 pellets and the other with 9 pellets. Stimulus identity was counterbalanced with group and reward magnitude conditions.

In each 20-min session, a 60-s interval preceded onset of the first CS. There was a random 120-s inter-trial-interval (ITI) between CS presentations, and a 60-s interval following the final CS presentation prior to the end of the session. Pavlovian training lasted for 25 daily sessions, each involving 4 pseudorandomly-alternating presentations of each CS (8 total trials per session).

Pavlovian-to-instrumental transfer (PIT)

Following Pavlovian training, rats received two daily sessions of instrumental retraining (identical to earlier sessions with the RI-45 schedule) followed, on the next day, by one 30-min session of instrumental extinction, in which the lever was continuously available but was inactive. On the following day, rats received a PIT test session (30.25 minutes in duration), during which the lever was once again continuously available but inactive. During the test, rats received 4 noncontingent presentations of each 30-s CS in pseudorandom order (ABBABAAB). The ITI was 150 s, and a 5-min interval preceded onset of the first CS. No food pellets were delivered at test.

Data Analysis

All summary measures were obtained from the raw data using MATLAB (The MathWorks; Natick, MA, USA), and analyzed with mixed-effects regression models (Pinheiro and Bates 2000), a powerful analytical framework that is both well established and highly recommended for behavioral research (Boisgontier and Cheval 2016). Mixed-effects models are comparable to repeated-measures regression analyses, and allow for parameter estimation per manipulation condition (fixed effects) and the individual (random effects) (Bolker et al. 2009; Hoffman and Rovine 2007; Pinheiro and Bates 2000; Schielzeth et al. 2013). Mixed-effects regression models (1) effectively handle missing data and (2) permit the inclusion of categorical and continuous predictors in the same analysis, thus allowing detection of group-level changes across ordered data samples (i.e., continuous time points) while also accounting for corresponding individual differences. All relevant fixed-effects factors were included in each model. Categorical predictors were effects-coded (i.e., codes sum to 0), and continuous predictors were mean-centered. For analyses of Pavlovian training and PIT, the fixed-effects structure included main effects of group and reward magnitude, and the random-effects structure included a by-rat random intercept. PIT analyses also included main effects of time within the CS (i.e., CS 10-s period), with CS Reward Magnitude × CS 10-s Period interactions also included for analysis of individual groups. (Here, analysis of Group 1/3’s food-port-entry behavior employed a linear model with no random effects due to random-effects convergence issues given outlier removal, described below.) Both group (categorical) and reward magnitude (continuous; i.e., 1, 3, 9) were included in these analyses to differentiate overall between-group differences in behavior versus sample-wide changes in behavior as a function of differences in reward magnitude. Instrumental training analyses incorporated generalized linear mixed-effects models (family: gamma, link: log) with predictors of group and time since the previous reward delivery. The random-effects structure included a random by-rat slope of time since reward delivery and a random intercept of rat, restricted to be uncorrelated. The alpha level for all tests was .05. Sample sizes were not predetermined based on statistical analysis but are similar to those reported in previous publications (Halbout et al. 2019; Marshall et al. 2020; Marshall and Ostlund 2018). Main effects and interactions are reported as the results of ANOVA F-tests (i.e., whether the coefficients for each fixed effect were significantly different from 0).

Our primary dependent measures were the rates of lever pressing and food-port entry behavior (recorded as the number of discrete food-port beam breaks). We quantified cue-induced changes in behavior by subtracting the mean response rate (response per minute) during pre-CS periods (30 sec each) from the mean response rate during CS periods, calculated separately for consecutive 10-sec periods within individual 30-sec CSs to characterize the time course of responding. Pre-CS (baseline) data were averaged across all CS trials (within-subject).

The final five sessions of Pavlovian training were used to assess conditioned food-port entry behavior during CS trials relative to pre-CS baseline periods. Analyses of instrumental training included the final three sessions of training. Data points were considered outliers if their values were at least three scaled median absolute deviations from the median (Leys et al. 2013), in which the median value of the absolute deviations from the median was scaled by ~1.48 (Rousseeuw and Croux 1993) and then multiplied by 3; for PIT analyses, outliers were based on the rats’ mean difference scores within each condition. For the current experiment, 24 individual data points were removed from the instrumental training analysis (i.e., 24 of 1,170 data points [26 rats × 45 bins]) and one rats’ data (Group 1/3, 3-pellet CS) were removed from the PIT analyses of food-port-entry behavior. Lastly, 38 of 528 data points (i.e., paired observations of lever-pressing and food-port-entry behavior) were excluded from trial-by-trial analyses of concurrent local changes in both CS-induced lever-pressing and food-port-entry behavior. There were no outliers in the other analyses. Outlier removal was specific to each analysis, such that a rats’ exclusion from one analysis did not prohibit inclusion from other analyses.

Results

Instrumental and Pavlovian Training

Rats were first trained to lever press for food reward. While the rats had not yet been assigned to groups for Pavlovian training, Figure 1A shows the groups’ mean lever-press rates on an RI-45 s schedule of reinforcement as a function of time since the previous reward delivery. All groups [mean (SEM)] lever pressed at comparable rates [Group 1/3: 15.8 (1.8); Group 3/3: 14.8 (1.4); Group 3/9: 15.7 (1.5)]. Per a generalized linear mixed-effects model (distribution = gamma, link = log) on response rates during the 15–45 s time window, there was neither a main effect of group, F(1, 786) = 0.21, p = .812, nor a Group × Time interaction, F(2, 786) = 0.77, p = .461.

Following instrumental training, rats were trained to associate two 30-s CSs with food reward, delivered upon termination of the cues. Figure 1B shows conditioned food-port entry behavior (relative to the pre-CS baseline) for the final 5 sessions of training. Analysis of these data revealed a main effect of group, F(2, 40) = 5.69, p = .007, and a main effect of CS reward magnitude, F(1, 40) = 16.27, p < .001. The latter effect appeared to be driven by Groups 1/3 and 3/9, which showed higher levels of food-port activity during whichever CS signaled the larger of the two possible reward magnitudes.

Pavlovian-instrumental transfer (PIT)

Following two sessions of instrumental retraining and one session of instrumental extinction, rats were given a PIT test, in which the 30-s CSs were presented while the rats were able to freely lever press without reinforcement. Figures 2A and 2B show rats’ CS-induced change in lever-press rate and food-port entry rate, respectively, relative to pre-CS baseline periods. Analyses revealed main effects of group, F(2, 127) = 3.59, p = .030, and CS reward magnitude on lever pressing, F(1, 127) = 5.74, p = .018, in which CS-induced lever pressing decreased with increases in predicted reward magnitude. For food-port entry rate, there was a main effect of group, F(2, 124) = 4.71, p = .011, and a significant increase in CS-induced food-port entries with predicted reward magnitude, F(1, 124) = 4.45, p = .037.

Figure 2. — CS-induced changes in (A) lever-press rates and (B) food-port entries (relative to the pre-CS baseline) on the Pavlovian-instrumental transfer (PIT) test. These data are plotted separately according to Group and Reward Magnitude and show changes in responding over time (10-sec bins) during 30-sec CS presentations (averaged across trials). CS = conditioned stimulus. G1v3 = Group 1/3. G3v3 = Group 3/3. G3v9 = Group 3/9. Error bars reflect ± 1 between-subjects standard error of the mean.

Given the between-groups manipulation of reward magnitude and that one group experienced only one reward magnitude (i.e., Group 3/3), a second set of within-group analyses were conducted to better elucidate how differential reward magnitude influenced CS-induced changes in behavior. For Group 1/3, the 3-pellet CS reduced press rates more than the 1-pellet CS, F(1, 50) = 8.37, p = .006, and also led to higher levels of food-port entry, F(1, 47) = 10.08, p = .003. The food-port entry analysis also detected a significant CS Reward Magnitude × CS 10-s Period interaction, F(1, 47) = 4.16, p = .047, which indicated that the increase in food-port entries over time was steeper for the 3-pellet CS than the 1-pellet CS. Similarly, for Group 3/9, lever pressing was suppressed more by the 9-pellet CS than the 3-pellet CS, F(1, 50) = 5.36, p = .025, but there was no effect of reward magnitude on food-port entry, F(1, 50) = 1.79, p = .187.

These findings are consistent with the cognitive control hypothesis but could be the product of response competition between food-port and lever-press behaviors. Specifically, when given a cue that signals a large, desirable reward, rats may simply lose the opportunity to lever press because they are preoccupied with checking the food-port. However, we found little support this alternative account. For instance, if a rat’s tendency to check the food-port during a specific cue period interferes with their ability to also press the lever during that period, then these responses should be negatively correlated across cue periods for individual rats (e.g., large increases in food-port activity should co-occur unchanged or decreased press rates). To investigate this possibility, we assessed the within-subject correlation between press- and entry-rate difference scores across all 10-sec CS time bins (3 bins per trial x 4 trials, as in Figure 2), separately for each CS type, for each rat. Figure 3 shows that, with the exception of one CS type for each of 2 rats, r’s ≤ −0.59, p’s ≤ .045, lever pressing and food-port entries were not significantly correlated, p’s ≥ .050 (median p-value = .467). Moreover, of the 2 rats that did show significant correlations between press and entry rate, neither correlation passed Bonferroni correction for multiple comparisons (i.e., .05 divided by 44 separate correlations = .0011). Also notable was the finding that rats were in many cases able to increase their rate of lever pressing (i.e., displaying positive difference scores) during cue periods with extreme increases food-port entry behavior (> 30 entries/min), suggesting that food-port activity did not impose a major limit on lever-press performance.

Figure 3. — Scatterplots are presented for each rat. Data are separated by trial (unlike in Figure 2) and 10-s CS bin (as in Figure 2). The abscissa shows CS-induced changes in food-port-entry rate for individual time bins; the ordinate shows CS-induced changes in lever-press rate for corresponding time bins. For each CS type for each rat, simple regression lines were fit, with dashed lines indicating that the fit was not significant and bolded solid lines indicating significance at p < .05. CS = conditioned stimulus.

We also assessed whether food-port and lever-press responses competed more acutely over short time frames. For this analysis, we constructed peri-event histograms showing the probability of food-port entry occurrence during 0.1-sec periods surrounding individual lever-press responses (Figure 4A–C) and the corresponding probability of lever-press occurrence surrounding individual food-port entry responses (Figure 4D–F); a +/− 8-sec peri-event window was used for the panels in Figure 4. Visual inspection of these data indicates that there were only very brief (~ 1-sec) dips in the probability of food-port behavior when rats lever pressed (Figures 4A–C) and in the probability of lever-pressing when rats entered the food-port (Figures 4D–F). Importantly, we also observed pronounced increases in the probability of food-port checking shortly after lever-press performance (Figures 4A–C) and in the probability of lever-press prior to food-port entry behavior (Figures 4D–F). These findings are consistent with previous reports (Halbout et al. 2019; Marshall et al. 2020; Marshall and Ostlund 2018) and indicate that these responses were typically performed together as a lever-press→port-entry action sequence.

Figure 4. — **(A-C)** Probability of food-port entries as a function of time from each lever press. **(D-F)** Probability of lever presses as a function of time from each food-port entry. Note the y-axes are different for Panels A-C and Panels D-F.

Discussion

The results of Experiment 1 indicate that the motivational influence of reward-predictive cues on instrumental behavior varies inversely with expected reward magnitude, with the 1-pellet cue increasing reward seeking and the 3- and 9-pellet cues decreasing this behavior, particularly near the expected time of reward delivery. In contrast, attempts to retrieve reward at the food-port were more apparent when larger rewards were expected. While we did not include a control for nonassociative effects (e.g., pseudoconditioning) in this study, we have included such controls in several previous studies using similar designs and typically find that unpaired cues have little or no effect on instrumental performance (LeBlanc et al. 2013; LeBlanc et al. 2014; Ostlund et al. 2014). Moreover, the parametric effects of reward magnitude strongly imply that CS-induced changes in lever-press rate were attributable to CS-specific associative learning.

It is also notable that cue-elicited food-port entries did not monotonically vary with the magnitude of the predicted reward during either Pavlovian training or PIT testing, particularly in Group 3/9. While this may reflect random variability, our informal observations suggest that cues signaling larger rewards tend to elicit fewer but more persistent bouts of waiting at the food-port than the other cues. We therefore recorded and analyzed food-port dwell times in Experiment 2 to more fully characterize the influence of predicted reward value on this behavior and further probe competition between food-port and lever-press responses.

Experiment 2

In Experiment 1, reward value (magnitude) was manipulated during Pavlovian conditioning. This complicates data interpretation because it conflates the influence that reward value has on conditioned responding with the potential influence that expected reward value may have on the regulation of PIT performance. Consider the finding that rats decrease their rate of lever pressing when presented with a cue that predicts the delivery of 3 or 9 pellets. This may reflect a previously established tendency for such cues to elicit a competing food-port response, which may be performed in an automatic and inflexible manner. Thus, the results of Experiment 1 do not provide definitive support for the hypothesis that reward-predictive cues prompt a flexible shift from lever pressing to checking the food-port based on expected reward value, even though they are generally compatible with this account.

The post-training reward devaluation procedure avoids this issue and provides a more direct test of the degree to which a response is performed flexibly, based on an internal representation of the expected reward (Colwill and Rescorla 1990; Holland 2004; Holland and Straub 1979; Rescorla 1994). We applied this approach to probe the influence of expected reward value on PIT performance using the same basic task used in Experiment 1, but with a few key changes. In Experiment 2, rats were once again trained to press a lever for a grain-based food pellet, after which they were given differential Pavlovian conditioning with two distinct CS-reward contingencies. One CS signaled the delivery of 3 banana pellets and the other CS signaled the delivery of 3 chocolate pellets. Based on the results of Group 3/3 in Experiment 1, we anticipated that this training would establish cues that suppress lever pressing and elicit high levels of food-port activity during the PIT test. However, in the current experiment, we established a conditioned taste aversion to one of the two Pavlovian rewards after training but prior to testing. Thus, while the two CSs signaled equally valuable rewards during training, only one of these CSs signaled a valuable reward at test, with the other signaling a recently devalued reward. According to the cognitive control account, this treatment should selectively disrupt the behavioral effects of the CS associated with the devalued reward since this cue no longer signals a need to suppress lever pressing and check the food-port. However, if the behavioral effects of these CSs are established during training and expressed in a habitual or automatic manner at test, then reward devaluation should have no impact on task performance.