eLife. 2019 May 20;8:e43551. doi: 10.7554/eLife.43551

Mesolimbic dopamine projections mediate cue-motivated reward seeking but not reward retrieval in rats

Briac Halbout 1,2, Andrew T Marshall 1,2, Ali Azimi 1,2, Mimi Liljeholm 3, Stephen V Mahler 2,4, Kate M Wassum 5,6, Sean B Ostlund 1,2
Editors: Naoshige Uchida 7, Joshua I Gold 8
PMCID: PMC6548499  PMID: 31107241

Abstract

Efficient foraging requires an ability to coordinate discrete reward-seeking and reward-retrieval behaviors. We used pathway-specific chemogenetic inhibition to investigate how rats’ mesolimbic and mesocortical dopamine circuits contribute to the expression and modulation of reward seeking and retrieval. Inhibiting ventral tegmental area dopamine neurons disrupted the tendency for reward-paired cues to motivate reward seeking, but spared their ability to increase attempts to retrieve reward. Similar effects were produced by inhibiting dopamine inputs to nucleus accumbens, but not medial prefrontal cortex. Inhibiting dopamine neurons spared the suppressive effect of reward devaluation on reward seeking, an assay of goal-directed behavior. Attempts to retrieve reward persisted after devaluation, indicating they were habitually performed as part of a fixed action sequence. Our findings show that complete bouts of reward seeking and retrieval are behaviorally and neurally dissociable from bouts of reward seeking without retrieval. This dichotomy may prove useful for uncovering mechanisms of maladaptive behavior.

Research organism: Rat

Introduction

Foraging and other reward-motivated behaviors tend to unfold as a sequence of actions, beginning with a reward-seeking phase and ending with an attempt to retrieve and consume any rewards produced by this activity. Coordinating the discrete reward-seeking and reward-retrieval behaviors that make up these action sequences is important for efficient foraging. When rewards are sparse or otherwise difficult to obtain, attempts to retrieve them are often unnecessary and should therefore be withheld to conserve energy and minimize opportunity costs (Stephens and Krebs, 1986; Niv et al., 2007). Consistent with this, studies on self-paced instrumental behavior show that the ability to efficiently pattern reward-seeking and -retrieval responses based on task demands (e.g., reinforcement schedule) can strongly impact the rate at which rewards are obtained (Ostlund et al., 2012; Wassum et al., 2012; Matamales et al., 2017). However, such behaviors must remain sensitive to changes in internal and external states. For instance, environmental cues that signal reward availability increase attempts to seek out (Estes, 1948; Corbit and Balleine, 2016) and retrieve reward (Marshall and Ostlund, 2018). While the ability to develop and modify action sequences is normally adaptive, this process may become dysregulated in certain conditions, such as obsessive-compulsive disorder (Joel and Avisar, 2001; Korff and Harvey, 2006; Frederick and Cocuzzo, 2017) and drug addiction (Tiffany, 1990; Graybiel, 2008; Volkow et al., 2013), leading to maladaptive behaviors. Despite this, the behavioral and neural mechanisms responsible for regulating reward seeking and retrieval are not well understood.

Previous studies strongly implicate dopamine in learning new action sequences (Graybiel, 1998; Jin and Costa, 2015). While other findings suggest that dopamine is not as important for the expression of well-established action sequences (Levesque et al., 2007; Wassum et al., 2012), it remains possible that dopamine contributes to action sequence performance when changes in task conditions prompt a reorganization of reward seeking and retrieval. For instance, previous studies indicate that the tendency for reward-paired cues to motivate reward-seeking behavior critically depends on dopamine signaling (Dickinson et al., 2000; Ostlund and Maidment, 2012; Wassum et al., 2011), particularly in the nucleus accumbens (NAc) (Wyvell and Berridge, 2000; Lex and Hauber, 2008; Wassum et al., 2013; Ostlund et al., 2014; Aitken et al., 2016). Interestingly, we recently found that such cues do not simply provoke reward-seeking behavior (e.g., lever pressing); they also increase the likelihood that such behavior will be followed by an attempt to retrieve reward (e.g., food-cup approach) (Marshall and Ostlund, 2018). Although this finding suggests that reward-paired cues preferentially motivate complete bouts of reward seeking and retrieval, it has yet to be established if this modulation of action sequence performance depends on dopamine.

Dopamine may also contribute to regulating attempts to seek out and retrieve a reward when the value of that reward changes. Self-paced, instrumental reward-seeking actions are normally performed in a goal-directed manner, such that they are sensitive to changes in reward value (Balleine and Dickinson, 1998). However, they can develop into inflexible stimulus-response habits with extended training (Dickinson, 1985; Dickinson et al., 1995). In contrast, it is not well understood how changes in reward value modulate attempts to retrieve rewards produced through instrumental reward-seeking behavior. For example, it has been suggested that rats’ tendency to approach the food cup after lever pressing may represent a discrete goal-directed action – one that is selected independently of the initial decision to press the lever (Rescorla, 1964). Alternatively, rats may concatenate the press-approach sequence to form an action chunk, which can then be selected and deployed as a single unit of behavior (Lashley, 1951; Graybiel, 1998; Jin and Costa, 2015). Action chunks are thought to represent a special form of habit, or behavioral chain, in which each element of the chain automatically elicits the next response. This allows for efficient action sequencing but comes with a decrease in behavioral flexibility. Once an action chunk has been initiated, it should be automatically completed without further consideration of reward value (Dezfouli et al., 2014; Smith and Graybiel, 2016).

In the current study, we applied a chemogenetic approach to investigate the role of the mesocorticolimbic dopamine system in action sequence performance in rats. We used a combination of well-established behavioral assays and novel microstructural analyses to selectively probe the influence of reward-paired cues and expected reward value on the regulation of reward-seeking and -retrieval responses. We found that inhibiting dopamine neurons in the ventral tegmental area (VTA) or their inputs to the NAc, but not the medial prefrontal cortex (mPFC), reversibly disrupted cue-motivated reward seeking, but spared the tendency for reward-paired cues to trigger complete bouts of seeking and retrieval. These dopamine manipulations had no impact on rats’ tendency to adjust their reward-seeking behavior in response to reward devaluation. Importantly, attempts to retrieve reward were not suppressed by reward devaluation, suggesting that this behavior was the product of action chunking.

Results

Effects of response-contingent feedback about reward delivery on reward retrieval

We first characterized the relationship between reward-seeking and -retrieval responses when rewards are sparse (Figure 1A). Rats were trained to lever press on a RI-60s schedule, such that this action was often nonreinforced and only occasionally earned food pellet delivery into a recessed food cup. Not surprisingly, we found that the probability of food-cup approach was elevated for several seconds after performance of the lever-press action (Figure 1B and C). This timeframe for press-contingent food-cup approach behavior is consistent with previous reports (Nicola, 2010; Marshall and Ostlund, 2018), and was relatively consistent across the current experiments (see Figure 3—figure supplement 1). We therefore used a cutoff value of 2.5 s to identify reward-retrieval attempts. To control for reward-retrieval opportunities, which were contingent on lever pressing, our analysis focuses on a normalized measure – the proportion of lever presses that were followed by food-cup approach.
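The normalized retrieval measure described above can be illustrated with a minimal Python sketch. This is not the authors' analysis code: the function name, the event timestamps, and the rule that an approach counts only if it occurs strictly after the press and no later than 2.5 s afterwards are assumptions made for illustration.

```python
from bisect import bisect_right


def proportion_followed_by_approach(press_times, approach_times, window=2.5):
    """Fraction of lever presses followed by a food-cup approach within `window` seconds.

    Assumption (not from the paper): an approach counts if it occurs strictly
    after the press and no more than `window` seconds later.
    """
    if not press_times:
        return float("nan")
    approaches = sorted(approach_times)
    followed = 0
    for t in press_times:
        # Index of the first approach occurring strictly after this press.
        i = bisect_right(approaches, t)
        if i < len(approaches) and approaches[i] - t <= window:
            followed += 1
    return followed / len(press_times)


# Hypothetical timestamps in seconds (illustrative only, not study data).
presses = [1.0, 10.0, 20.0, 30.0]
approaches = [2.0, 14.0, 21.5]
print(proportion_followed_by_approach(presses, approaches))  # 0.5
```

Dividing by the number of presses, rather than counting approaches directly, is what controls for the number of retrieval opportunities, since each opportunity is contingent on a press.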

Figure 1. Microstructural organization of instrumental behavior.

(A) Hungry rats were trained to perform a self-paced ‘reward seeking’ task, in which pressing a lever was intermittently reinforced with food pellets (RI-60s schedule). Press-contingent food-cup approaches were taken as a measure of attempted ‘reward retrieval’. (B) Probability of food-cup approaches as a function of time surrounding reinforced (purple) and nonreinforced (gray) lever presses. (C) Representative pattern of food-cup approach behavior for an individual rat surrounding reinforced and nonreinforced lever presses. Individual reinforced trials are separately presented across the y-axis aligned at the point at which the lever became activated (i.e., primed for reinforcement). (D, E) Effects of manipulating instrumental reinforcement contingency on the organization of reward-seeking and -retrieval responses. Total lever presses (D) or presses followed by an approach (E) during tests in which lever pressing was intermittently reinforced (RI-60s) either with food pellets and associated cues (Food and Cues) or with pellet dispenser cues but no actual food delivery (Cues Only). Rats were also tested without any reinforcement (No Food or Cues). (F) The proportion of lever presses that were followed by food-cup approach was higher for reinforced presses than for nonreinforced presses, regardless of whether pressing was reinforced with Food and Cues, or Cues Only. Rats also continued to sporadically check the food cup after nonreinforced lever presses, albeit at a much lower level than after reinforced presses.

Figure 1—source data 1. This spreadsheet contains the behavioral responses for individual rats in Figure 1.
DOI: 10.7554/eLife.43551.003

We found that rats were much more likely to approach the food cup after reinforced presses than after nonreinforced presses (t(8) = 19.33, p<0.001), suggesting they could detect when pellets were delivered based on sound and tactile cues produced by the dispenser. This was confirmed in subsequent tests, during which lever pressing produced either 1) pellet dispenser cues and actual pellet delivery (Food and Cues), 2) pellet dispenser cues only (Cues Only), or 3) no pellet dispenser cues or pellet delivery (No Food or Cues). Here too, we found that food-cup approaches were more likely after reinforced than nonreinforced lever presses, regardless of whether pellet dispenser cues were presented alone or together with actual food delivery (Figure 1F; ts(8) ≥ 13.74, ps <0.001; the overall frequency of lever pressing (Figure 1D) and the frequency of complete bouts of presses that were followed by an approach (Figure 1E) are presented for comparison). Although pellet dispenser cues were clearly an effective trigger for rats to shift from the lever to the food cup, they also made these shifts spontaneously, indicating that they had developed the tendency to perform the complete press-approach action sequence. These unprompted approaches occurred after a relatively small subpopulation of nonreinforced lever presses, which is consistent with our previous data (Marshall and Ostlund, 2018).

Inhibiting dopamine neurons during Pavlovian-to-instrumental transfer preferentially disrupts cue-motivated reward seeking, but not reward retrieval

Our previous findings suggest that reward-predictive cues both invigorate reward-seeking behavior (i.e., the PIT effect) and increase the likelihood that such actions will be followed by an attempt to retrieve reward from the food cup (Marshall and Ostlund, 2018). Experiment 2 investigated the contributions of the mesocorticolimbic dopamine system to these distinct behavioral effects of reward-paired cues.

Rats with dopamine neuron-specific expression of the inhibitory DREADD hM4Di or mCherry in the VTA (Figure 2) were trained on a PIT task (Figure 3A) consisting of a Pavlovian conditioning phase, in which two different auditory cues were paired (CS+) or unpaired (CS-) with food pellets, and a separate instrumental training phase, in which rats were trained to lever press for pellets. During PIT testing, we noncontingently presented the CS+ and CS- while rats were free to lever press and check the food cup without response-contingent food or cue delivery.

Figure 2. DREADD expression in Th:Cre+ rats.

(A) Th:Cre+ rats received bilateral injections of AAV-hSyn-DIO-hM4Di-mCherry or AAV-hSyn-DIO-mCherry in the VTA. (B) Representative expression of the mCherry-tagged inhibitory DREADD hM4Di (red) in VTA Th positive neurons (green) of Th:Cre+ rats, as well as in neuronal terminals (C) projecting to the nucleus accumbens (NAc) and medial prefrontal cortex (mPFC). Scale bar is 500 µm.

Figure 3. Chemogenetic inhibition of dopamine neurons on Pavlovian to instrumental transfer (PIT) performance.

(A) Experimental design: Following viral vector injections and recovery, rats received Pavlovian training, during which they learned to associate an auditory cue (CS+) with food pellet delivery. During instrumental conditioning, rats performed the same lever-press task used in Experiment 1. Lever pressing was extinguished (Ext) before rats were given a PIT test, which included separate noncontingent presentations of the CS+ and an unpaired control cue (CS-). (B) Chemogenetic inhibition of VTA dopamine neurons disrupted cue-motivated reward seeking. Total lever presses during PIT trials for rats expressing the inhibitory DREADD hM4Di or mCherry following vehicle (left) or CNO (5 mg/kg, right) treatment prior to test. Presses during pre-CS (gray) and CS periods (red) are plotted separately. (C) PIT expression is specifically impaired in hM4Di expressing Th:Cre+ rats. PIT scores (total presses: CS+ - pre-CS+) show that the CS+ increased lever pressing after vehicle treatment for both groups, but that CNO suppressed this effect in the hM4Di group but not the mCherry group. **p<0.01. (D) The CS+ increased the proportion of lever presses that were followed by a food-cup approach during PIT testing. Inhibiting VTA dopamine neurons did not disrupt expression of this effect. Instead, rats in both groups showed a modest increase in their likelihood of checking the food cup after lever pressing when treated with CNO. (E) Representative organization of the effects of the CS+ and CS- on attempts to seek out and retrieve reward during PIT. Data show lever presses and food-cup approaches (press-contingent or noncontingent) for two control rats (Th:Cre+ rats expressing mCherry and receiving vehicle).

Figure 3—source data 1. This spreadsheet contains the behavioral responses for individual rats in Figure 3.
DOI: 10.7554/eLife.43551.009

Figure 3—figure supplement 1. Probability of food-cup approaches as a function of time surrounding individual lever-press responses during PIT testing, plotted separately for CS+ (blue), CS- (red), and pre-CS (yellow) periods.

This analysis was restricted to vehicle tests completed by rats in Experiments 2 and 3A. Shewhart process control chart analyses were used to determine the times when food-cup approach behavior was elevated with respect to constant background rates. This value approximated 2.5 s following each lever press, as shown by the shaded box.

Figure 3—figure supplement 2. Frequency of lever presses that were followed by a food-cup approach during PIT testing by rats expressing the inhibitory DREADD hM4Di or mCherry following vehicle or CNO (5 mg/kg) treatment in Experiment 2.

Data from pre-CS (gray) and CS (blue) periods are plotted separately. Error bars represent ±1 standard error of the estimated marginal means from the corresponding fitted generalized linear mixed-effects model. We found that the CS+ but not the CS- strongly increased the frequency of these press-approach sequences (CS Period * CS Type interaction, t(240) = 7.87, p<0.001), which did not vary as a function of Group or Drug treatment (interactions involving these factors, ps ≥ 0.18). However, while CNO administration did not disrupt the ability of the CS+ to elicit these press-approach sequences, it did result in a general, hM4Di-independent reduction in the frequency of these sequences (Drug effect, t(240) = −4.52, p<0.001; Drug * Group interaction, t(240) = −1.19, p=0.234).

Figure 3—figure supplement 3. Noncontingent (press-independent) food-cup approaches during PIT testing by rats expressing the inhibitory DREADD hM4Di or mCherry following vehicle or CNO (5 mg/kg) treatment in Experiment 2.

Data from pre-CS (gray) and CS (green) periods are plotted separately. Error bars represent ±1 standard error of the estimated marginal means from the corresponding fitted generalized linear mixed-effects model. This behavior was selectively increased by the CS+ relative to the CS- (CS Period * CS Type interaction, t(240) = 11.72, p<0.001). CNO produced a nonspecific (hM4Di-independent) suppression in this approach response (Drug effect, t(240) = −1.99, p=0.047), which was more pronounced for the mCherry group (Group * Drug * CS Type * CS Period interaction, t(240) = 2.73, p=0.0067).

We found that rats selectively increased their lever press performance during CS+ presentations, relative to the CS- and pre-CS response rates (Figure 3B; CS Period * CS Type interaction, p<0.001; see Supplementary file 1A for full generalized linear mixed-effects model output). This effect was significantly attenuated by CNO in a group-specific manner (Group * Drug * CS Period * CS Type interaction, p=0.002). Analysis of data from CS+ trials (only) found that CNO selectively suppressed cue-induced lever pressing in hM4Di relative to mCherry rats (Drug * Group * CS Period interaction, p=0.013). Further analysis found that the mCherry group displayed a pronounced increase in lever pressing during CS+ trials (CS Period * CS Type interaction, p<0.001), and this effect was not altered by CNO (Drug * CS Period * CS Type interaction, p=0.780). In contrast, CNO pretreatment significantly disrupted expression of CS+ induced lever pressing in the hM4Di group (Drug * CS Period * CS Type interaction, p<0.001). hM4Di rats showed a CS+ specific elevation in lever pressing when pretreated with vehicle (CS Period * CS Type interaction, p<0.001) but not CNO (CS Period * CS Type interaction, p=0.684). While these findings indicate that CNO selectively disrupted the response-invigorating influence of the CS+ by inhibiting VTA dopamine neurons in hM4Di rats, there was also some indication that CNO may have produced a nonspecific, group-independent, suppression of PIT performance (Drug x CS Period x CS Type, p=0.007). We therefore conducted a more focused analysis of CS+ induced changes in lever-press performance (PIT score: CS+ - pre-CS+; Figure 3C), which confirmed that CNO significantly suppressed this behavioral effect in the hM4Di group (t(17) = −3.83, p<0.001), but not in the mCherry group (t(13) = −1.21, p=0.249). This is in line with recent findings that similar CNO treatment does not significantly alter PIT performance in DREADD-free rats (Collins et al., 2019).
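The PIT score used in this focused analysis (total presses during the CS+ minus total presses during the pre-CS+ period) is a simple difference, sketched below in Python. The function name and the per-trial counts are hypothetical and only illustrate the arithmetic; they are not taken from the study's data or code.

```python
def pit_score(cs_plus_presses, pre_cs_plus_presses):
    """PIT score: total presses during CS+ periods minus total presses during pre-CS+ periods.

    Both arguments are per-trial press counts for one rat in one test session.
    """
    return sum(cs_plus_presses) - sum(pre_cs_plus_presses)


# Hypothetical per-trial press counts (illustrative only, not study data).
cs_plus = [12, 9, 15, 10]
pre_cs_plus = [4, 6, 5, 3]
print(pit_score(cs_plus, pre_cs_plus))  # 28
```

A positive score indicates that the CS+ raised lever pressing above the rat's own baseline, so comparing scores between the CNO and vehicle tests isolates the drug's effect on the cue-induced elevation.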

We also investigated if VTA dopamine neuron inhibition impacts the tendency for the CS+ to increase attempts to retrieve reward after performing the reward-seeking response (Figure 3D and E; see Figure 3—figure supplement 1 for illustration of the probability of food-cup approach surrounding lever presses during nonreinforced PIT trials). We found that the CS+ (p<0.001) but not the CS- (p=0.501) increased the proportion of lever presses that were followed by a food-cup approach, even though no rewards were actually delivered at test (see Supplementary file 1B for full generalized linear mixed-effects model output; see Figure 3—figure supplements 2 and 3 for analysis of total press-contingent and noncontingent approaches, respectively). Importantly, CNO did not alter this response to the CS+ in a group-specific manner (Group * Drug * CS+ Period, p=0.835), indicating that VTA dopamine neuron function is not required for this behavior. However, CNO did induce some nonspecific, group-independent alterations in the proportion of presses that were followed by a food-cup approach, lowering the overall likelihood of this behavior (Drug effect, p=0.019), but enhancing the effect of the CS+ (Drug * CS+ Period, p<0.037).

Pathway-specific inhibition of dopamine projections to NAc, but not mPFC, disrupts cue-motivated reward seeking but not retrieval

As previously reported (Mahler et al., 2019), hM4Di expression in VTA dopamine neurons resulted in transport of DREADDs to axonal terminals in the NAc and mPFC (Figure 2). We took advantage of this to investigate the roles of these two pathways in PIT performance, again distinguishing between the influence of reward-paired cues on reward seeking and reward retrieval. Guide cannulae were aimed at the NAc or mPFC in rats expressing hM4Di in VTA dopamine neurons (Experiment 3A; Figure 4A and Figure 4—figure supplement 1). These rats underwent training and testing for PIT (Figure 4B), as described above, but were pretreated with intra-NAc or mPFC injections of CNO (1 mM) or vehicle to achieve local inhibition of neurotransmitter release (Mahler et al., 2014; Stachniak et al., 2014; Lichtenberg et al., 2017), an approach previously shown to be effective in inhibiting dopamine release (Mahler et al., 2019). Figure 4C shows that, in hM4Di-expressing rats, the CS+ specific increase in lever pressing (CS Period * CS Type interaction, p<0.001) was disrupted by CNO in a manner that depended on microinjection site (Drug * CS Period * CS Type * Site interaction, p=0.003; Supplementary file 1C for full generalized linear mixed-effects model output). After intracranial vehicle injections, rats showed a CS+ specific elevation in pressing (CS Period * CS Type interaction, p<0.001), which did not differ significantly across vehicle injection sites (CS Period * CS Type * Site interaction, p=0.151). Unlike with systemic CNO, the CS+ remained effective in increasing lever pressing after CNO microinjection into the mPFC (CS Type * CS Period interaction, p<0.001) and NAc (CS Type * CS Period interaction, p<0.001). However, this effect was significantly attenuated when CNO was injected into the NAc versus the mPFC (CS Period * CS Type * Site interaction, p=0.012; analysis of CNO data only). A more focused analysis of CS+ elicited lever pressing (Figure 4D; PIT score) confirmed that CNO disrupted this effect in the NAc group (t(6) = −2.49, p=0.047), but not in the mPFC group (t(8) = 0.34, p=0.746).

Figure 4. Pathway specific chemogenetic inhibition of dopamine on PIT performance.

(A) Th:Cre+ rats initially received VTA AAV-hSyn-DIO-hM4Di-mCherry injections and were implanted with guide cannulas aimed at the medial prefrontal cortex (mPFC) or nucleus accumbens (NAc) for microinjection of CNO (1 mM) or vehicle to inhibit dopamine terminals at test. (B) Following surgery, rats underwent training and testing for PIT, as described above. We analyzed the microstructural organization of behavior (Lever presses: seeking, and presses followed by a food-cup approach: retrieval) at test. (C) Pathway specific inhibition of dopamine terminals in the NAc but not the mPFC disrupted cue-motivated reward seeking. Total lever presses during PIT trials for rats expressing the inhibitory DREADD hM4Di and receiving CNO or vehicle microinfusions in either the mPFC or NAc prior to test. Presses during pre-CS (gray) and CS periods (red) are plotted separately. (D) PIT expression was specifically impaired following NAc CNO treatment. PIT scores (total presses: CS+ - pre-CS+) show that the CS+ increased lever pressing following vehicle treatment in both groups, but that CNO suppressed this effect when injected into the NAc but not the mPFC. *p<0.05. (E) The CS+ increased the proportion of lever presses that were followed by a food-cup approach during PIT testing. This effect did not significantly vary as a function of drug treatment or group. (F) Scatter plots show the relationship between individual differences in the effect of the CS+ on lever presses that were not followed by food-cup approach in the vehicle condition (PIT Score for presses without approach) and the suppressive effect of CNO on CS+ evoked lever pressing (PIT Score for CNO test - PIT Score for vehicle test). Data points are from individual rats receiving intra-mPFC (left panel) or intra-NAc (right panel) microinjections.

Figure 4—source data 1. This spreadsheet contains the behavioral responses for individual rats in Figure 4.
DOI: 10.7554/eLife.43551.016

Figure 4—figure supplement 1. Cannulae placements for Experiment 3A hM4Di expressing rats.

Individual placements in nucleus accumbens (NAc) and medial prefrontal cortex (mPFC).

Figure 4—figure supplement 2. Frequency of lever presses during PIT testing by rats expressing mCherry following microinjection of vehicle (A) or CNO (B) into the mPFC or NAc in Experiment 3B.

Data from pre-CS (gray) and CS (red) periods are plotted separately. Error bars represent ±1 standard error of the estimated marginal means from the corresponding fitted generalized linear mixed-effects model. The CS+ induced a cue-specific increase in lever pressing (CS Type * CS Period interaction, t(176) = 4.51, p<0.001). Rats in the NAc group exhibited a marginally weaker CS+ specific increase in lever pressing than rats in the mPFC group (Site * CS Type * CS Period interaction, t(176) = −1.89, p=0.060). CNO appeared to slightly attenuate CS+ elicited lever pressing, particularly when injected into the mPFC, though this effect was not significant (Drug * CS Period * CS Type interaction, t(176) = −0.0022, p=0.998; Site * Drug * CS Period * CS Type interaction, t(176) = 1.83, p=0.068). (C) A focused analysis of CS+ elicited lever pressing (PIT Score; ±1 between-subjects SEM) confirmed that CNO injections did not significantly disrupt this effect in either the mPFC (t(5) = 1.63, p=0.165) or NAc (t(5) = 0.33, p=0.753) group. These results indicate that the tendency for intra-NAc CNO injections to disrupt CS+ elicited lever pressing in hM4Di expressing rats in Experiment 3A (see Figure 4, main text) was due to dopamine terminal inhibition and not a nonspecific CNO effect.

Figure 4—figure supplement 3. Frequency of presses that were followed by a food-cup approach during PIT testing by rats expressing the inhibitory DREADD hM4Di following microinjection of CNO or vehicle into the mPFC or NAc in Experiment 3A.

Data from pre-CS (gray) and CS (blue) periods are plotted separately. Error bars represent ±1 standard error of the estimated marginal means from the corresponding fitted generalized linear mixed-effects model. The cue-induced increase in the frequency of these press-approach sequences was specific to the CS+ (CS Type * CS Period interaction, t(240) = 7.34, p<0.001), which did not vary as a function of drug and/or injection site (ps > 0.689), nor were there any main effects of these treatment variables (Drug effect, t(240) = 1.05, p=0.294; Site effect, t(240) = −0.26, p=0.798; Drug * Site interaction, t(240) = 1.28, p=0.203).

Figure 4—figure supplement 4. Noncontingent (press-independent) food-cup approaches during PIT testing in rats expressing the inhibitory DREADD hM4Di following microinjection of CNO or vehicle into the mPFC or NAc in Experiment 3A.

Data from pre-CS (gray) and CS (green) periods are plotted separately. Error bars represent ±1 standard error of the estimated marginal means from the corresponding fitted generalized linear mixed-effects model. Noncontingent approach behavior was selectively elicited by the CS+ but not the CS- (CS Period * CS Type interaction, t(240) = 12.98, p<0.001). CNO administration led to a very modest but reliable enhancement in the tendency for the CS+ to increase this behavior over baseline levels (Drug * CS Type * CS Period interaction, t(240) = 2.98, p=0.0032), which did not depend on injection site (Drug * Site * CS Type * CS Period interaction, t(240) = −1.45, p=0.147). CNO did not have any general (cue-independent) effects on this response (Drug effect, t(240) = −0.41, p=0.680; Site effect, t(240) = 0.75, p=0.456; Drug * Site interaction, t(240) = −0.96, p=0.338).

Figure 4—figure supplement 5. Scatter plots show the relationship between individual differences in the effect of the CS+ on lever presses that were not followed by food-cup approach in the vehicle condition (PIT score for presses without approach) and the suppressive effect of CNO on CS+ evoked lever pressing (PIT Score for CNO test - PIT Score for vehicle test).

Data points are from individual rats expressing mCherry (left panel) or hM4Di (right panel).

The disruptive effect of intra-NAc CNO administration on PIT performance did not systematically vary as a function of injection site (data not presented), which is not surprising given previous findings that this effect is modulated by dopamine signaling in both the core and shell of the NAc (Lex and Hauber, 2008; Peciña and Berridge, 2013). Given such findings, it is possible that complete inhibition of ventral striatal dopamine transmission would abolish expression of the PIT effect, as was found with systemic CNO treatment in Experiment 2. It is also possible that VTA dopamine projections to areas not targeted in the current study (e.g., amygdala) make an important, parallel contribution to this behavior.

We also conducted a separate experiment (Experiment 3B) with rats expressing the mCherry reporter (only) in VTA dopamine neurons to determine if the behavioral effects of CNO microinfusion were hM4Di-dependent. While there was evidence that CNO may have produced some nonspecific response suppression when injected into the mPFC but not the NAc (Drug * Site * CS Period * CS Type, p=0.068), this drug treatment did not significantly disrupt expression of CS+ elicited lever pressing for either injection site (ps ≥ 0.165; Figure 4—figure supplement 2).

As in the previous experiment, we found that the CS+ (p<0.001) increased the proportion of lever presses that were followed by an attempt to retrieve reward from the food cup (Figure 4E; Supplementary file 1D for full generalized linear mixed-effects model output; see Figure 4—figure supplements 3 and 4 for analysis of total press-contingent and noncontingent approaches, respectively). CNO seemed to generally reduce the likelihood that lever pressing would be followed by food-cup approach, though this effect did not reach statistical significance (Drug effect, p=0.057). If anything, intra-NAc injections of CNO tended to enhance the effect of the CS+ on this approach response, though this effect also failed to reach significance (Drug * Site * CS+ Period, p=0.093).

The above findings indicate that VTA dopamine circuitry supports the motivational influence of the CS+ on reward seeking but does not mediate that cue's ability to promote reward retrieval. We wondered if this might account for variability in the partial, response-suppressive effect of NAc dopamine terminal inhibition. Specifically, we hypothesized that rats inclined to respond to the CS+ by engaging in discrete bouts of lever pressing, without attempting to retrieve reward, would be particularly sensitive to inhibition of NAc dopamine inputs. Consistent with this, we found that for the NAc group, individual differences in the effect of the CS+ on lever presses without subsequent food cup approach (during the vehicle test) were correlated with the degree to which CNO suppressed CS+ evoked lever pressing (PIT Score for all presses), relative to vehicle (CNO – Vehicle; r = −0.81, p=0.027; Figure 4F). No such relationship was found for the mPFC group (r = −0.19, p=0.618), which did not show sensitivity to dopamine terminal inhibition. Similar analysis of data from Experiment 2 also found no correlation between these measures (Figure 4—figure supplement 5), which may not be surprising given that systemic inhibition of VTA dopamine neurons led to a more robust and consistent suppression of CS+ evoked lever pressing (Figure 3B).
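The individual-differences analysis above relates two per-rat quantities: the vehicle-test PIT score for presses without a subsequent approach, and the change in overall PIT score under CNO (CNO minus vehicle). The following Python sketch shows how such a Pearson correlation could be computed. The function name and all values are hypothetical and are not the study's data; the source does not describe the software used for this step.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when either variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-rat values (illustrative only, not study data).
# x: vehicle-test PIT score for presses NOT followed by a food-cup approach.
# y: PIT score (all presses) on the CNO test minus PIT score on the vehicle test.
vehicle_pit_no_approach = [2.0, 5.0, 8.0, 11.0, 3.0, 7.0, 9.0]
cno_minus_vehicle = [-1.0, -4.0, -9.0, -10.0, 0.0, -5.0, -8.0]
r = pearson_r(vehicle_pit_no_approach, cno_minus_vehicle)
print(f"r = {r:.2f}")  # negative r: larger vehicle effect, stronger CNO suppression
```

Under this sign convention, a negative r means that rats whose CS+ responding consisted mostly of presses without approach showed the largest CNO-induced drop in PIT score, which is the direction of the relationship reported for the NAc group.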

Altogether, these findings demonstrate that the mesolimbic dopamine system selectively mediates cue-motivated reward seeking, and suggest that dopamine inputs to the NAc are particularly important for individuals that tend to respond to such cues with discrete bouts of reward seeking without subsequent reward retrieval.

Inhibiting dopamine neurons spares the sensitivity of reward-seeking actions to reward devaluation

It is unclear from the above findings if rats' tendency to approach the food cup after lever pressing reflects a discrete goal-directed action or if this response tends to be performed habitually, as part of a fixed press-approach action chunk. We conducted a reward devaluation experiment to probe this issue and investigate the role of VTA dopamine neurons in goal-directed action selection. Rats expressing mCherry or hM4Di in VTA dopamine neurons were trained on two distinct instrumental action-outcome contingencies, after which they underwent reward devaluation testing after pretreatment with CNO (5 mg/kg) or vehicle (Figure 5A). Rats performed significantly fewer presses on the devalued lever than on the valued lever (Figure 5B; Lever effect, p<0.001; Supplementary file 1E for full generalized linear mixed-effects model output). CNO treatment did not significantly alter the effect of reward devaluation on lever pressing in either hM4Di or mCherry rats (Drug * Lever, p=0.146; Group * Drug * Lever interaction, p=0.591), indicating that VTA dopamine neuron function is not required for this aspect of goal-directed action selection. Inhibiting VTA dopamine neurons also failed to disrupt sensitivity to devaluation during reinforced testing (see Figure 5—figure supplement 1).

Figure 5. Chemogenetic inhibition of dopamine neurons on reward devaluation performance.

(A) Th:Cre+ rats received VTA injections of AAV-hSyn-DIO-hM4Di-mCherry or AAV-hSyn-DIO-mCherry. Following recovery, rats were trained on two distinct lever-press actions for two different rewards (Instrumental Learning). Rats then underwent reward-specific devaluation testing following treatment with CNO (5 mg/kg) or vehicle. (B) Chemogenetic VTA dopamine inhibition did not alter the impact of reward devaluation on reward seeking. Total lever presses on the valued (red bars) and devalued (gray) levers in hM4Di or mCherry expressing Th:Cre+ rats, following CNO (5 mg/kg) or vehicle treatments. (C) Proportion of valued (blue) and devalued (gray) lever-press actions that were followed by a food-cup approach. Rats were more likely to attempt to retrieve reward after performing the devalued lever-press action. This effect was not altered by VTA dopamine neuron inhibition. (D) Lever presses performed without a subsequent food-cup approach response (red) were more sensitive to reward devaluation than presses that were followed by an approach (blue).

Figure 5—source data 1. This spreadsheet contains the behavioral responses for individual rats in Figure 5.
DOI: 10.7554/eLife.43551.019


Figure 5—figure supplement 1. Data from the reinforced phase of reward devaluation testing for rats expressing the inhibitory DREADD hM4Di or mCherry following vehicle or CNO treatment in Experiment 4.


(A) Significantly fewer lever presses were performed on the devalued lever compared to the valued lever, t(148) = −5.55, p<0.001, which did not interact with Group or Drug, ps ≥ .095. (B) Similarly, the frequency of presses that were followed by a food-cup approach was lower on the devalued versus the valued lever, t(148) = −5.46, p<0.001, which also did not depend on Group or Drug, ps ≥ 0.128. In A-B, error bars represent ±1 standard error of the estimated marginal means from the corresponding fitted generalized linear mixed-effects model. (C) The proportion of lever presses that were followed by a food-cup approach was significantly higher for the devalued versus the valued lever, t(131) = 4.11, p<0.001, which also did not depend on Group or Drug conditions, ps ≥ .679.

VTA dopamine neuron inhibition did not significantly alter the overall likelihood of press-contingent approach behavior or its sensitivity to reward devaluation (Figure 5C; ps ≥ .109; see Supplementary file 1F for full generalized linear mixed-effects model output). Interestingly, we found that the proportion of presses that were followed by a food-cup approach was actually greater for the devalued lever than for the valued lever (Lever effect, p=0.040). This effect was driven by the fact that lever presses that were not followed by approach were more strongly suppressed by reward devaluation than presses that were directly followed by an approach (Press Type * Lever interaction, p<0.001; see Figure 5D and Supplementary file 1G for full generalized linear mixed-effects model output).

It was not possible to analyze the impact of reward devaluation on noncontingent approach responses performed at test because this behavior was associated with both the valued and devalued reward. However, other findings from our lab (data not shown) from studies involving a single reward type indicate that noncontingent approaches are readily suppressed by reward devaluation, in contrast to response-contingent approaches. This is in line with previous reports that food-cup approach behavior is generally sensitive to reward devaluation (Balleine, 1992; Thrailkill and Bouton, 2017), particularly if it is elicited by Pavlovian reward-predictive cues (Holland and Straub, 1979; Lichtenberg et al., 2017).

Discussion

We investigated the role of mesocorticolimbic dopamine circuitry in regulating reward-seeking (lever pressing) and reward-retrieval responses (press-contingent food-cup approach). Consistent with a recent study (Marshall and Ostlund, 2018), we found that noncontingent CS+ presentations increased reward seeking, generally, but also increased the likelihood that rats would attempt to retrieve reward after performing such actions. These behaviors were differentially mediated by the mesolimbic dopamine system. Specifically, chemogenetic inhibition of VTA dopamine neurons or their inputs to NAc, but not mPFC, disrupted the excitatory influence of the CS+ on reward seeking, but spared that cue’s ability to increase attempts to retrieve reward. These behaviors were also differentially sensitive to reward devaluation, which suppressed reward seeking but actually increased the likelihood that rats would attempt to retrieve reward. VTA dopamine neuron inhibition did not impact the influence of reward devaluation on either component of behavior.

We found that attempts to retrieve reward by transitioning from the lever to the food cup were executed in a habitual manner, without consideration of reward value, consistent with action chunking (Dezfouli et al., 2014; Smith and Graybiel, 2016). However, task performance was not limited to these press-approach action chunks. When rats pressed the lever but were not reinforced (with food or cues), they would occasionally check the food cup but often omitted this response. This sporadic pattern of reward retrieval is adaptive given that strict press-approach action sequencing is unnecessary under such conditions, when rewards are sparse and uncertain. Instead, rats seemed to vacillate between two different strategies when initiating the lever-press response, performing it as part of a complete action chunk (press-approach) or as a discrete action (press only). These distinct patterns of reward seeking appeared to be differentially sensitive to reward devaluation. While rats were generally less likely to lever press for the devalued reward than for the valued reward, press-approach action chunks tended to be less sensitive to reward devaluation than presses that were not followed by approach. Because of this differential sensitivity to reward devaluation, the proportion of all lever presses followed by an attempt to retrieve reward was actually greater for the devalued action than for the valued action. Such findings support the connection between action chunking and habitual behavior (Graybiel, 2008; Dezfouli et al., 2014; Smith and Graybiel, 2016), and suggest that moment-to-moment control over self-paced, reward-seeking behavior may shift back and forth between habit and goal-directed systems.

PIT testing revealed that the CS+ generally increased lever pressing, but disproportionately increased the performance of press-approach action chunks, at least relative to their otherwise low frequency of occurring in the absence of the CS+. This finding further bolsters the connection between action chunking and habitual control given previous reports that habitual reward-seeking actions are particularly sensitive to the motivational effects of reward-paired cues (Holland, 2004; Wiltgen et al., 2012). However, while press-approach action chunks were elevated during the CS+, they still accounted for only a minority (between 30% and 50%) of lever presses that were performed during these trials. Most lever presses evoked by the CS+ were not followed by a food-cup approach, and it was this component of the PIT effect that was selectively disrupted by chemogenetic inhibition of VTA dopamine neurons or their inputs to NAc. The ability of the CS+ to promote press-approach chunks was, in contrast, completely spared by these manipulations. Consistent with this, we found that the response-suppressive effect of NAc dopamine terminal inhibition varied across rats based on the way they normally responded to the CS+. Rats that responded to that cue with a large increase in discrete lever presses (i.e., without subsequent food-cup approach) showed the greatest suppression. We suggest that this may reflect differences across rats in their sensitivity to the dopamine-dependent motivational effects of reward-paired cues.

Previous studies have found that dopamine receptor antagonists either selectively suppress lever pressing without affecting concomitant food-cup approach (Nelson and Killcross, 2013), or suppress both types of behavior to a similar extent (Wassum et al., 2011; Ostlund et al., 2012). Even this latter finding is consistent with dopamine contributing more to reward seeking than reward retrieval, since a reduction in reward seeking creates fewer opportunities to retrieve reward. Interpreting these findings is problematic, however, because such studies typically have not applied microstructural analyses, like those used here, to distinguish between press-contingent and noncontingent food-cup approaches. One exception is a study by Nicola (2010) showing that blocking dopamine receptors in the NAc attenuates cue-triggered lever pressing without impacting the latency of subsequent food-cup approach behavior. Building on such findings, the current study used the PIT paradigm to show that the mesolimbic dopamine system specifically mediates the motivational influence of reward-paired cues on reward seeking but not their dissociable ability to increase the likelihood that such actions will be followed by an attempt to retrieve reward.

Our previous studies monitoring mesolimbic dopamine release during PIT performance are also interesting to consider together with the current findings. For instance, we found that CS+ evoked phasic dopamine release in the NAc correlates with that cue’s effect on lever pressing (Wassum et al., 2013; Ostlund et al., 2014) but not food-cup approaches (Aitken et al., 2016). We also found that individual CS+ evoked lever presses are temporally correlated with transient bouts of phasic dopamine release (Ostlund et al., 2014). The current findings suggest that this relationship between NAc dopamine release and cue-motivated reward seeking may be stronger for discrete presses that are performed without a subsequent food-cup approach than for complete press-approach chunks. This question remains to be investigated, and would help resolve whether the mesolimbic dopamine system is involved in modulating reward seeking, generally, or whether its activity becomes uncoupled from the execution of action chunks, which may become differentially associated with nigrostriatal dopamine system activity (Jin and Costa, 2010).

While dopamine is known to play a crucial role in forming new action chunks (Graybiel, 1998; Jin and Costa, 2015), its role in the expression of previously learned action chunks is less clear. Our findings indicate that VTA dopamine circuitry does not play a necessary role in the execution of press-approach action chunks, regardless of whether they are self-initiated or are prompted by a reward-paired cue. This is generally compatible with previous findings. For instance, dopamine receptor blockade suppresses action sequence performance early but not late in training (Levesque et al., 2007; Wassum et al., 2012). Moreover, the phasic NAc dopamine release that normally precedes action sequence performance tends to become attenuated as rats acquire efficient task performance, presumably through action chunking (Cacciapaglia et al., 2012; Wassum et al., 2012; Klanker et al., 2015; Collins et al., 2016). That said, the mesolimbic dopamine system continues to contribute to action sequence tasks that require considerable effort, such as the execution of a long series of lever presses (Fischbach-Weiss et al., 2018).

Inhibiting VTA dopamine neurons did not impact rats’ sensitivity to reward devaluation, which is consistent with other findings in the literature (Dickinson et al., 2000; Lex and Hauber, 2010a; Lex and Hauber, 2010b; Wassum et al., 2011). Such findings are interesting given that regions innervated by this dopamine system, including the NAc and mPFC, are known to make important contributions to goal-directed decision making (Bradfield and Balleine, 2017; Sharpe et al., 2019). Of course, dopamine likely contributes to goal-directed decision making in more demanding tasks that require greater cognitive resources (Floresco, 2013; Cools, 2015; Westbrook and Braver, 2016).

It is also notable that inhibiting mPFC dopamine terminals had no detectable effects on expression of PIT, since food-paired cues are known to elicit dopamine release (Bassareo and Di Chiara, 1997; Feenstra et al., 1999) and neural activity (Homayoun and Moghaddam, 2009) in the mPFC. It is possible that the dissociable effects of NAc versus mPFC dopamine terminal inhibition reported here may relate to inherent differences between the mesolimbic and mesocortical dopamine systems, which include regional differences in release kinetics and in the density of dopamine terminals or receptors (Lammel et al., 2008; Weele et al., 2019; Mahler et al., 2019). However, previous lesion studies suggest that the mPFC may not be an essential component of the circuitry that mediates PIT performance (Cardinal et al., 2003; Corbit and Balleine, 2003), which is more in line with the current results.

Our findings may also have implications for understanding the role of dopamine in pathologies of behavioral control such as obsessive-compulsive disorder (OCD). In the signal attenuation model of OCD (Joel and Avisar, 2001), rats learn that response-contingent cues no longer signal that an instrumental reward-seeking action will produce reward. In this case, the logical organization of reward-seeking and -retrieval actions disintegrates, such that rats exhibit persistent reward seeking, typically without attempting to collect reward from the food cup. It was previously reported that blocking D1-dopamine receptors disrupts expression of these incomplete bouts of compulsive-like reward seeking, without affecting the production of complete bouts of reward seeking and retrieval, which continue to be performed on some test trials (Joel and Doljansky, 2003). Considered in this light, our findings suggest that the mesolimbic dopamine system may mediate the tendency for reward-paired cues to promote this potentially compulsive component of cue-motivated reward seeking. This link deserves further research, and may facilitate research to advance understanding and treatment of compulsive disorders like OCD and addiction (Joel et al., 2008; Robinson et al., 2014).

Materials and methods

Animals

In total, 89 male and female Long-Evans Tyrosine hydroxylase (Th):Cre+ rats (hemizygous Cre+) (Witten et al., 2011; Mahler et al., 2019) and wildtype (WT) littermates were used for this study. Subjects were at least 3 months of age at the start of the experiment and were single- or paired-housed in standard Plexiglas cages on a 12 hr/12 hr light/dark cycle. Animals were maintained at ~85% of their free-feeding weight during behavioral procedures. All experimental procedures that involved rats were approved by the UC Irvine Institutional Animal Care and Use Committee and were in accordance with the National Research Council Guide for the Care and Use of Laboratory Animals.

Apparatus

Behavioral procedures took place in sound- and light-attenuated Med Associates chambers (St Albans, VT, USA; ENV-007). Individual chambers were equipped with two retractable levers (Med Associates; ENV-112CM) positioned to the left and right of a recessed food cup. Grain-based dustless precision pellets (45 mg, BioServ, Frenchtown, NJ, USA) were delivered into the cup using a pellet dispenser (Med Associates; ENV-203M-45). Sucrose solution (20% wt/vol) was delivered into the cup with a syringe pump (Med Associates; PHM-100). A photobeam detector (Med Associates; ENV-254-CB) positioned across the magazine entrance was used to record food-cup approaches. Chambers were illuminated by a houselight during all sessions.

Surgery

Th:Cre+ rats were anesthetized using isoflurane and placed in a stereotaxic frame for microinjections of Cre-dependent (DIO) serotype 2 adeno-associated virus (AAV) vectors to induce dopamine neuron-specific expression of the inhibitory designer receptor exclusively activated by designer drug (DREADD) hM4Di fused to mCherry (AAV-hSyn-DIO-hM4Di-mCherry), or mCherry alone (AAV-hSyn-DIO-mCherry) (University of North Carolina Chapel Hill Vector Core, Chapel Hill, NC, USA/Addgene, Cambridge, MA, USA; Experiment 2 was replicated with both sources) (Armbruster et al., 2007; Mahler et al., 2019). The AAV was injected bilaterally into the VTA (−5.5 mm AP, ±0.8 mm ML, −8.15 mm DV; 1 µL/side). Experiment 3 rats were bilaterally implanted with guide cannulae (22 gauge, Plastics One) 1 mm dorsal to NAc (+1.3 AP, ±1.8 ML, −6.2 DV) or mPFC (+3.00 AP, ±0.5 ML, −3.0 DV) for subsequent clozapine-n-oxide (CNO) microinjections. Animals were randomly assigned to virus (hM4Di or mCherry) and cannula location (NAc or mPFC) groups. Animals were allowed at least 5 days of recovery before undergoing food restriction and behavioral training. Testing occurred at least 25 days after surgery to allow adequate time for viral expression of hM4Di throughout dopamine neurons, including in terminals within the NAc and mPFC.

Experiment 1: Effects of response-contingent feedback about reward delivery on reward retrieval

Instrumental learning

WT rats (n = 9) underwent 2 d of magazine training. In each session, 40 pellets were delivered into the food cup on a random 90 s intertrial interval (ITI). Rats then received 9 d of instrumental lever-press training. In each session, rats had continuous access to the right lever, which could be pressed to deliver food pellets into the food cup. The schedule of reinforcement was adjusted over days from continuous reinforcement (CRF) to increasing random intervals (RI), such that reinforcement only became available once a randomly determined interval had elapsed since the last reinforcer delivery. Rats received one day each of CRF, RI-15s, and RI-30s training, before undergoing 6 days of training with RI-60s. Each session was terminated after 30 min or after 20 reward deliveries.
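The random-interval contingency described above can be illustrated with a minimal Python sketch. This is not the authors' control code: the function name, the exponential distribution of intervals, and the evenly spaced press times are all assumptions made for illustration, since the text specifies only the mean interval and the 20-reward session limit.

```python
import random

def simulate_ri_session(press_times, mean_interval_s, rng, max_rewards=20):
    """Return the times of reinforced presses under a random-interval schedule.

    Assumption: intervals are drawn from an exponential distribution with the
    stated mean; the source does not specify the distribution.
    """
    rewarded = []
    # Time at which the next reinforcer becomes available.
    available_at = rng.expovariate(1.0 / mean_interval_s)
    for t in sorted(press_times):
        if len(rewarded) >= max_rewards:
            break  # session ends after the maximum number of rewards
        if t >= available_at:
            rewarded.append(t)
            # The timer restarts from the most recent reinforcer delivery.
            available_at = t + rng.expovariate(1.0 / mean_interval_s)
    return rewarded

rng = random.Random(0)
# Hypothetical press record: one press every 5 s for 30 min.
presses = [float(s) for s in range(0, 1800, 5)]
rewards = simulate_ri_session(presses, 60.0, rng)
```

Under such a schedule, pressing faster yields little extra reward, because reinforcement depends mainly on elapsed time rather than on the number of presses.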

Varying response-contingent feedback

Following training, rats were given a series of tests to assess the influence of response-contingent feedback about reward delivery on instrumental reward-seeking (lever presses) and reward-retrieval responses (press-contingent food-cup approach). Rats were given three tests (30 min each, pseudorandom order over days) during which lever pressing caused: 1) activation of the pellet dispenser to deliver a pellet into the food cup (RI-60s schedule; Food and Cues Test), 2) activation of the pellet dispenser to deliver a pellet into an external cup not accessible to the rats, producing associated sound and tactile cues but no reward (also RI-60s schedule; Cues Only Test), or 3) no dispenser activation (i.e., extinction; No Food or Cues Test).

Experiments 2 and 3: Role of mesocorticolimbic dopamine in cue-motivated reward seeking and retrieval

Pavlovian conditioning

Th:Cre+ rats (n = 60) underwent 2 d of magazine training, as in Experiment 1 (40 pellets on 90 s random ITI). Rats then received eight daily Pavlovian conditioning sessions. Each session consisted of a series of 6 presentations of a two-min audio cue (CS+; either a pulsating 2 kHz pure tone (0.1 s on and 0.1 s off) or white noise; 80 dB), with trials separated by a 5 min variable ITI (range 4–6 min between CS onsets). During each CS+ trial, pellets were delivered on a 30 s random time schedule, resulting in an average of 4 pellets per trial. Rats were separately habituated to an unpaired auditory stimulus (CS-; alternative audio stimulus; 2 min duration). CS- exposure procedures differed slightly across experiments. For Experiment 2, which assessed the effects of system-wide dopamine neuron inhibition, rats received a final Pavlovian conditioning session consisting of four trials with the CS+ (reinforced, as described above) followed by four trials with the CS- (nonreinforced), separated by a 5 min variable ITI. In Experiment 3, which assessed the effects of local inhibition of dopamine terminals in NAc or mPFC, rats were given 2 days of CS- only exposure (eight nonreinforced trials per session, 5 min variable ITI) following initial CS+ training. Conditioning was measured by comparing the rate of food-cup approach between the CS onset and the first pellet delivery (to exclude unconditioned behavior) to the rate of approach during the pre-CS period.
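The conditioning measure described above can be sketched as follows. This is an illustrative Python example rather than the authors' analysis code: the function names, the timestamps, and the 120 s pre-CS window used for this particular measure are assumptions.

```python
def rate_per_min(event_times, start, end):
    """Events per minute within the half-open window [start, end), in seconds."""
    if end <= start:
        raise ValueError("window must have positive duration")
    n = sum(1 for t in event_times if start <= t < end)
    return n / ((end - start) / 60.0)

def conditioning_measure(approaches, cs_onset, first_pellet, pre_cs_s=120.0):
    """Return (CS approach rate, pre-CS approach rate) for one trial.

    The CS window stops at the first pellet so that approaches made to
    collect food are excluded. The pre-CS duration is an assumed value.
    """
    cs_rate = rate_per_min(approaches, cs_onset, first_pellet)
    pre_rate = rate_per_min(approaches, cs_onset - pre_cs_s, cs_onset)
    return cs_rate, pre_rate

# Hypothetical food-cup approach timestamps (s) around a CS starting at 300 s.
approaches = [250.0, 305.0, 310.0, 318.0, 325.0]
cs_rate, pre_rate = conditioning_measure(approaches, cs_onset=300.0, first_pellet=330.0)
```

With these made-up timestamps, four approaches fall in the 30 s before the first pellet and one falls in the pre-CS window, so the CS rate exceeds the baseline rate, as expected after successful conditioning.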

Instrumental training

Following Pavlovian conditioning, rats were given 9 d of instrumental training, as in Experiment 1, with one day each of CRF, RI-15s, RI-30s, and 6 days of RI-60s. Sessions ended after 30 min or 20 rewards were earned.

Pavlovian-to-instrumental transfer (PIT) test

After the last instrumental training session, rats were given a session of Pavlovian (CS+) training, identical to initial training. They were then given a 30 min extinction session, during which lever presses were recorded but had no consequence (i.e., no food or cues). On the next day, rats were given a PIT test, during which the lever was continuously available but produced no rewards. Following 8 min of extinction, the CS+ and CS- were each presented four times (2 min per trial) in pseudorandom order and separated by a 3 min fixed ITI. Before each new round of testing, rats were given two sessions of instrumental retraining (RI-60s), one session of CS+ retraining, and one 30 min extinction session, as described above. Test procedures differed slightly between Experiments 2 and 3.

Experiment 2

Th:Cre+ rats expressing hM4Di (n = 18) or mCherry only (n = 14) in VTA dopamine neurons were used to assess the effects of system-wide inhibition of the mesocorticolimbic dopamine system on PIT performance. These groups were run together and received CNO (5 mg/kg, i.p.) or vehicle (5% DMSO in saline) injections 30 min prior to testing. They underwent a second test following retraining (described above), prior to which the alternative drug pretreatment was administered.

Experiment 3

In Experiment 3A, Th:Cre+ rats expressing hM4Di in VTA dopamine neurons were used to assess the impact of locally inhibiting dopaminergic terminals in the NAc (n = 7) or mPFC (n = 9) on PIT performance. Because microinjection procedures produced additional variability in task performance, rats in this experiment underwent a total of 4 tests. Rats received either CNO microinfusions (1 mM, 0.5 µL/side or 0.3 µL/side, for NAc and mPFC respectively) or vehicle (DMSO 5% in aCSF) 5 min before the start of each test and were given two rounds of testing each with CNO and vehicle (test order counterbalanced across other experimental conditions). To determine if the effects of CNO microinjections depended on hM4Di expression, a separate control study (Experiment 3B) was run using Th:Cre+ rats expressing mCherry only in VTA dopamine neurons. Experiments 3A and 3B were run and analyzed separately.

Experiment 4: Role of mesocorticolimbic dopamine in goal-directed action selection

Instrumental Training

Th:Cre+ rats expressing hM4Di (n = 11) or mCherry only (n = 9) in VTA dopamine neurons began with 2 d of magazine training, during which they received 20 grain-pellets and 20 liquid sucrose rewards (0.1 mL of 20% sucrose solution, wt/vol) in random order according to a common 30 s random ITI. This was followed by 11 d of instrumental training with two distinct action–outcome contingencies (e.g., left-lever press → grain; right-lever press → sucrose). The reinforcement schedule was gradually shifted over days from 2 d of CRF to increasingly effortful random ratio (RR) schedules, with 3 d of RR-5, 3 d of RR-10, and 3 d of RR-20 reinforcement. The left and right lever-press responses were trained in separate sessions, at least 2 hr apart, on each day. Action-outcome contingencies were counterbalanced across subjects. Sessions were terminated after 30 min elapsed or 20 pellets were earned.
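For readers unfamiliar with random ratio schedules, the sketch below shows the usual definition, in which each press is reinforced with probability 1/n under RR-n. It is an illustrative Python example, and the function name and press counts are hypothetical rather than taken from the study.

```python
import random

def count_rewards(n_presses, ratio, rng):
    """Count reinforced presses when each press pays off with probability 1/ratio."""
    p = 1.0 / ratio
    return sum(1 for _ in range(n_presses) if rng.random() < p)

rng = random.Random(0)
crf_rewards = count_rewards(50, 1, rng)      # a ratio of 1 is continuous reinforcement
rr20_rewards = count_rewards(400, 20, rng)   # about 400 / 20 = 20 expected on average
```

Unlike an interval schedule, the number of rewards here scales with the number of presses, which is why higher ratios are described as more effortful.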

Devaluation Testing

To selectively devalue one of the food rewards prior to testing, rats were satiated on grain pellets or sucrose solution by providing them with 90 min of unrestricted access to that food in the home cage. After 60 min of feeding, rats received CNO (5 mg/kg, i.p.) or vehicle injections. After an additional 30 min of feeding, rats were placed in the chamber for a test in which they had continuous access to both levers. The test began with a 5 min nonreinforced phase (no food or cues), which was immediately followed by a 15 min reinforced phase, during which each action was reinforced with its respective reward (CRF for the first five rewards, then RR-20 for the remainder of the session). Rats were given a total of 4 devaluation tests, two after CNO and two after vehicle, alternating the identity of the devalued reward across the two tests in each drug condition (test order counterbalanced across training and drug conditions).

Histology

Rats were deeply anesthetized with a lethal dose of pentobarbital and perfused with 1x PBS followed by 4% paraformaldehyde. Brains were postfixed in 4% paraformaldehyde, cryoprotected in 20% sucrose and sliced at 40 μm on a cryostat. To visualize hM4Di expression, we performed immunohistochemistry for Th and mCherry tag. Tissue was first incubated in 3% normal donkey serum PBS plus Triton X-100 (PBST; 2 hr) and then in primary antibodies in PBST at 4°C for 48 hr using rabbit anti-DsRed (mCherry tag; 1:500; Clontech; 632496), and mouse anti-Th (1:1,000, Immunostar; 22941) antibodies. Sections were incubated for 4 hr at room temperature in fluorescent conjugated secondary antibodies (Alexa Fluor 488 goat anti-mouse (Th; 1:500; Invitrogen; A10667) and Alexa Fluor 594 goat anti-rabbit (DsRed; 1:500; Invitrogen; A11037)).

Drugs

CNO was obtained from NIMH (Experiments 2 and 4) or Sigma-Aldrich (St. Louis, MO, USA; Experiment 3), and dissolved in 5% DMSO in saline, or aCSF for microinjection.

Behavioral measures

Reward-seeking actions were quantified as the total number (frequency) of lever presses performed per unit time. Based on microstructural analyses described below, lever presses that were followed by a food-cup approach (≤2.5 s) were distinguished from presses that were not followed by an approach. The proportion of presses that were followed by an approach response served as our primary measure of press-contingent reward retrieval. We also analyzed bouts of noncontingent food-cup approach (occurring >2.5 s after the most recent press or approach), which served as a measure of spontaneous or cue-evoked reward retrieval.
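The microstructural classification described above can be sketched in Python as follows. This is a simplified illustration and not the authors' code: the function names and timestamps are hypothetical, and it assumes that a press counts as followed by an approach whenever any approach occurs within 2.5 s after it, even if another press intervenes.

```python
from bisect import bisect_left, bisect_right

WINDOW_S = 2.5  # press-to-approach criterion stated in the text

def press_followed(press_t, approaches_sorted, window=WINDOW_S):
    """True if the first approach after the press occurs within the window."""
    i = bisect_right(approaches_sorted, press_t)
    return i < len(approaches_sorted) and approaches_sorted[i] - press_t <= window

def summarize(presses, approaches, window=WINDOW_S):
    """Return (presses with approach, presses without, proportion, noncontingent approaches)."""
    presses, approaches = sorted(presses), sorted(approaches)
    followed = sum(press_followed(p, approaches, window) for p in presses)
    proportion = followed / len(presses) if presses else float("nan")
    events = sorted(presses + approaches)
    noncontingent = 0
    for a in approaches:
        j = bisect_left(events, a)          # events[:j] occurred before this approach
        prev = events[j - 1] if j > 0 else None
        # Assumption: an approach with no prior event counts as noncontingent.
        if prev is None or a - prev > window:
            noncontingent += 1
    return followed, len(presses) - followed, proportion, noncontingent

# Hypothetical timestamps in seconds.
presses = [10.0, 20.0, 30.0, 31.0]
approaches = [11.5, 25.0, 32.0, 33.0, 50.0]
followed, not_followed, proportion, noncontingent = summarize(presses, approaches)
```

In this made-up record, three of four presses are followed by an approach within 2.5 s, and two approaches (at 25 s and 50 s) occur more than 2.5 s after the preceding event and so count as noncontingent.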

Statistical analysis

Data were analyzed using general(ized) linear mixed-effects models, which allow for simultaneous parameter estimation as a function of condition (fixed effects) and the individual rat (random effects) (Pinheiro and Bates, 2000; Bolker et al., 2009; Boisgontier and Cheval, 2016). Analyses on count data (e.g., response frequency) incorporated a Poisson response distribution and a log link function (Coxe et al., 2009). Fixed-effects structures included an overall intercept and the full factorial of all primary manipulations (Experiment 2: Group, Drug, CS Type, CS Period; Experiment 3: Site, Drug, CS Type, CS Period; Experiment 4: Group, Drug, Lever), and the random-effects structures included by-subjects uncorrelated intercepts adjusted for the within-subjects manipulations (i.e., Experiments 2 and 3: Drug, CS Type, and CS Period; Experiment 4: Drug, Lever). ‘CS Type’ refers to the distinction between the CS+ and CS-, while ‘CS Period’ refers to the distinction between the 120 s CS duration and the 120 s period preceding its onset. Proportion data were square-root transformed prior to analysis to correct positive skew, but are plotted in non-transformed space for ease of interpretation. These data were collapsed across pre-CS+ and pre-CS- periods, such that the factor ‘CS Period’ had three levels (CS+, CS-, and Pre-CS). The fixed- and random-effects structures of this analysis were identical to the frequency analysis above with the exception that CS Type was not included in the analysis, and the random-effects structure only included by-subjects intercepts.
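To make the log link and the square-root transform concrete, the following Python sketch shows how a single fixed-effect coefficient acts on predicted counts and how proportions are transformed. It does not fit a mixed-effects model (the authors did this in MATLAB), and the coefficient values and function names are hypothetical.

```python
import math

def predicted_count(intercept, b_cs, in_cs):
    """Expected count under a log link: exp(intercept + b_cs * in_cs)."""
    return math.exp(intercept + b_cs * in_cs)

# Hypothetical coefficients for illustration only (not values from the paper).
intercept, b_cs = math.log(4.0), math.log(1.5)
pre = predicted_count(intercept, b_cs, 0)   # baseline (pre-CS) period
cs = predicted_count(intercept, b_cs, 1)    # cue period
rate_ratio = cs / pre                       # equals exp(b_cs)

def sqrt_transform(proportions):
    """Square-root transform applied to proportion data before analysis."""
    return [math.sqrt(p) for p in proportions]

transformed = sqrt_transform([0.04, 0.25, 0.49])
```

Because of the log link, a coefficient b corresponds to a multiplicative change of exp(b) in the expected count, so effects on response frequency are proportional rather than additive.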

All statistical analyses were conducted using the Statistics and Machine Learning Toolbox in MATLAB (The MathWorks; Natick, MA, USA). The alpha level for all tests was .05. As all predictors were categorical in the mixed-effects analysis, effect size was represented by the unstandardized regression coefficient (Baguley, 2009), reported as b in model output tables. Mixed-effects models provide t-values to reflect the statistical significance of the coefficient relative to the population mean (i.e., simple effects). These simple effects are indicative of main effects and interactions when a factor has only two levels. For factors with at least three levels, F-tests were conducted to reveal the overall significance of the effect or interaction(s) involving this factor. The source of significant interactions was determined by secondary mixed-effects models identical to those described above but split by the relevant factor of interest. For analyses in which a significant main effect had more than two levels, post-hoc tests of main effects employed MATLAB’s coefTest function, and interactions were reported in-text as the results of ANOVA F-tests (i.e., whether the coefficients for each fixed effect were significantly different from 0).

When analyzing data from PIT experiments, the ability of the CS+ to selectively increase performance of a response (relative to the CS-) over baseline (pre-CS) levels was indicated by a significant CS Type * CS Period interaction. We were particularly interested in treatment-induced alterations in the expression of this effect, as indicated by significant 3-way and 4-way interactions involving this CS Type * CS Period term, in combination with Drug and/or Group factors. We were also interested in potential main effects of Drug and/or Group factors, reflecting broad, cue-independent behavioral effects. While statistical output tables include a summary of all fixed effects included in the model, only these theoretically interesting findings are discussed in the main text. Lower level interactions involving only CS Type or CS Period, but not their combination, are provided in the output tables but are not discussed in the main text given that they may be the product of incidental or spurious behavioral differences across cue conditions.

PIT Scores (CS+ – pre-CS+) were calculated for more focused analysis of CS+ elicited lever pressing. One-sample t-tests were used to assess the effect of CNO for each group. Because inhibiting VTA dopamine neurons or their NAc terminals predominantly disrupted the ability of the CS+ to elicit lever presses that were not followed by an approach response, we also assessed if differences across rats in their tendency to exhibit such behavior in the Vehicle Test (PIT score; presses without approach) correlated with differences in their sensitivity to the response-suppressive effect of CNO on CS+ elicited lever pressing (CNO – Vehicle; PIT score, all presses).
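The PIT score and the follow-up tests can be sketched in Python as below. This is a minimal illustration, not the authors' MATLAB code. It assumes the one-sample t-test is applied to each rat's CNO – Vehicle difference in PIT score, which the text does not state explicitly. All function names and data values are hypothetical.

```python
import math
from statistics import mean, stdev


def pit_scores(cs_plus, pre_cs_plus):
    """PIT score per rat: CS+ responding minus pre-CS+ responding."""
    return [cs - pre for cs, pre in zip(cs_plus, pre_cs_plus)]


def one_sample_t(values, mu=0.0):
    """Return (t statistic, degrees of freedom) for a test of mean(values) == mu."""
    n = len(values)
    standard_error = stdev(values) / math.sqrt(n)
    return (mean(values) - mu) / standard_error, n - 1


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical lever presses per rat (all presses) in each test.
vehicle_pit = pit_scores(cs_plus=[10, 12, 8, 14], pre_cs_plus=[4, 4, 4, 4])
cno_pit = pit_scores(cs_plus=[7, 8, 7, 9], pre_cs_plus=[4, 4, 4, 4])

# Assumed definition of the CNO effect: CNO minus Vehicle, per rat.
cno_effect = [c - v for c, v in zip(cno_pit, vehicle_pit)]
t_stat, df = one_sample_t(cno_effect)

# Hypothetical Vehicle-test PIT scores for presses without approach,
# correlated with the per-rat CNO effect on all presses.
vehicle_pit_no_approach = [3, 5, 2, 6]
r = pearson_r(vehicle_pit_no_approach, cno_effect)
print(t_stat, df, r)
```

A negative t statistic here would indicate that CNO reduced CS+ elicited pressing relative to vehicle. A negative correlation would indicate that rats showing more presses without approach in the Vehicle Test were more sensitive to CNO.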

Acknowledgements

The authors acknowledge the assistance of Christy N Munson in the acquisition of behavioral data.

Funding Statement

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Contributor Information

Briac Halbout, Email: halboutb@uci.edu.

Sean B Ostlund, Email: sostlund@uci.edu.

Naoshige Uchida, Harvard University, United States.

Joshua I Gold, University of Pennsylvania, United States.

Funding Information

This paper was supported by the following grants:

  • National Institute on Drug Abuse to Stephen V Mahler.

  • National Institute of Mental Health 106972 to Kate M Wassum, Sean B Ostlund.

  • National Institute of Diabetes and Digestive and Kidney Diseases 098709 to Sean B Ostlund.

  • National Institute on Drug Abuse 029035 to Sean B Ostlund.

  • National Institute on Aging 045380 to Sean B Ostlund.

Additional information

Competing interests

Reviewing editor, eLife.

No competing interests declared.

Author contributions

Conceptualization, Data curation, Formal analysis, Investigation, Visualization, Methodology, Writing—original draft, Writing—review and editing.

Conceptualization, Data curation, Formal analysis, Investigation, Visualization, Methodology, Writing—original draft, Writing—review and editing.

Data curation, Investigation.

Conceptualization, Methodology, Writing—review and editing.

Conceptualization, Validation, Investigation, Methodology, Writing—review and editing.

Conceptualization, Supervision, Funding acquisition, Writing—review and editing.

Conceptualization, Data curation, Formal analysis, Supervision, Funding acquisition, Investigation, Methodology, Writing—original draft, Project administration, Writing—review and editing.

Ethics

Animal experimentation: All experimental procedures that involved rats were approved by the UC Irvine Institutional Animal Care and Use Committee (protocol AUP-17-68) and were in accordance with the National Research Council Guide for the Care and Use of Laboratory Animals.

Additional files

Supplementary file 1. Generalized linear mixed-effects model outputs.
elife-43551-supp1.docx (36.1KB, docx)
DOI: 10.7554/eLife.43551.020
Transparent reporting form
DOI: 10.7554/eLife.43551.021

Data availability

All data generated and analyzed during this study are included in supporting files. Source data files have been provided for Figures 1, 3, 4 and 5, as well as their respective figure supplements.

References

  1. Aitken TJ, Greenfield VY, Wassum KM. Nucleus accumbens core dopamine signaling tracks the need-based motivational value of food-paired cues. Journal of Neurochemistry. 2016;136:1026–1036. doi: 10.1111/jnc.13494.
  2. Armbruster BN, Li X, Pausch MH, Herlitze S, Roth BL. Evolving the lock to fit the key to create a family of G protein-coupled receptors potently activated by an inert ligand. Proceedings of the National Academy of Sciences. 2007;104:5163–5168. doi: 10.1073/pnas.0700293104.
  3. Baguley T. Standardized or simple effect size: what should be reported? British Journal of Psychology. 2009;100:603–617. doi: 10.1348/000712608X377117.
  4. Balleine B. Instrumental performance following a shift in primary motivation depends on incentive learning. Journal of Experimental Psychology: Animal Behavior Processes. 1992;18:236–250. doi: 10.1037/0097-7403.18.3.236.
  5. Balleine BW, Dickinson A. Goal-directed instrumental action: contingency and incentive learning and their cortical substrates. Neuropharmacology. 1998;37:407–419. doi: 10.1016/S0028-3908(98)00033-1.
  6. Bassareo V, Di Chiara G. Differential influence of associative and nonassociative learning mechanisms on the responsiveness of prefrontal and accumbal dopamine transmission to food stimuli in rats fed ad libitum. The Journal of Neuroscience. 1997;17:851–861. doi: 10.1523/JNEUROSCI.17-02-00851.1997.
  7. Boisgontier MP, Cheval B. The anova to mixed model transition. Neuroscience & Biobehavioral Reviews. 2016;68:1004–1005. doi: 10.1016/j.neubiorev.2016.05.034.
  8. Bolker BM, Brooks ME, Clark CJ, Geange SW, Poulsen JR, Stevens MH, White JS. Generalized linear mixed models: a practical guide for ecology and evolution. Trends in Ecology & Evolution. 2009;24:127–135. doi: 10.1016/j.tree.2008.10.008.
  9. Bradfield L, Balleine B. The learning and motivational processes controlling Goal-Directed action and their neural bases. Decision Neuroscience. 2017:71–80.
  10. Cacciapaglia F, Saddoris MP, Wightman RM, Carelli RM. Differential dopamine release dynamics in the nucleus accumbens core and shell track distinct aspects of goal-directed behavior for sucrose. Neuropharmacology. 2012;62:2050–2056. doi: 10.1016/j.neuropharm.2011.12.027.
  11. Cardinal RN, Parkinson JA, Marbini HD, Toner AJ, Bussey TJ, Robbins TW, Everitt BJ. Role of the anterior cingulate cortex in the control over behavior by pavlovian conditioned stimuli in rats. Behavioral Neuroscience. 2003;117:566–587. doi: 10.1037/0735-7044.117.3.566.
  12. Collins AL, Greenfield VY, Bye JK, Linker KE, Wang AS, Wassum KM. Dynamic mesolimbic dopamine signaling during action sequence learning and expectation violation. Scientific Reports. 2016;6:20231. doi: 10.1038/srep20231.
  13. Collins AL, Aitken TJ, Huang IW, Shieh C, Greenfield VY, Monbouquette HG, Ostlund SB, Wassum KM. Nucleus accumbens cholinergic interneurons oppose Cue-Motivated behavior. Biological Psychiatry. 2019. doi: 10.1016/j.biopsych.2019.02.014.
  14. Cools R. The cost of dopamine for dynamic cognitive control. Current Opinion in Behavioral Sciences. 2015;4:152–159. doi: 10.1016/j.cobeha.2015.05.007.
  15. Corbit LH, Balleine BW. The role of prelimbic cortex in instrumental conditioning. Behavioural Brain Research. 2003;146:145–157. doi: 10.1016/j.bbr.2003.09.023.
  16. Corbit LH, Balleine BW. Learning and motivational processes contributing to Pavlovian-Instrumental transfer and their neural bases: dopamine and beyond. Current Topics in Behavioral Neurosciences. 2016;27:259–289. doi: 10.1007/7854_2015_388.
  17. Coxe S, West SG, Aiken LS. The analysis of count data: a gentle introduction to poisson regression and its alternatives. Journal of Personality Assessment. 2009;91:121–136. doi: 10.1080/00223890802634175.
  18. Dezfouli A, Lingawi NW, Balleine BW. Habits as action sequences: hierarchical action control and changes in outcome value. Philosophical Transactions of the Royal Society B: Biological Sciences. 2014;369:20130482. doi: 10.1098/rstb.2013.0482.
  19. Dickinson A. Actions and habits: the development of behavioural autonomy. Philosophical Transactions of the Royal Society B: Biological Sciences. 1985;308:67–78. doi: 10.1098/rstb.1985.0010.
  20. Dickinson A, Balleine B, Watt A, Gonzales F, Boakes R. Overtraining and the motivational control of instrumental action. Animal Learning & Behavior. 1995;22:197–206.
  21. Dickinson A, Smith J, Mirenowicz J. Dissociation of pavlovian and instrumental incentive learning under dopamine antagonists. Behavioral Neuroscience. 2000;114:468–483. doi: 10.1037/0735-7044.114.3.468.
  22. Estes WK. Discriminative conditioning; effects of a pavlovian conditioned stimulus upon a subsequently established operant response. Journal of Experimental Psychology. 1948;38:173–177. doi: 10.1037/h0057525.
  23. Feenstra MG, Teske G, Botterblom MH, De Bruin JP. Dopamine and noradrenaline release in the prefrontal cortex of rats during classical aversive and appetitive conditioning to a contextual stimulus: interference by novelty effects. Neuroscience Letters. 1999;272:179–182. doi: 10.1016/S0304-3940(99)00601-1.
  24. Fischbach-Weiss S, Reese RM, Janak PH. Inhibiting mesolimbic dopamine neurons reduces the initiation and maintenance of instrumental responding. Neuroscience. 2018;372:306–315. doi: 10.1016/j.neuroscience.2017.12.003.
  25. Floresco SB. Prefrontal dopamine and behavioral flexibility: shifting from an "inverted-U" toward a family of functions. Frontiers in Neuroscience. 2013;7:62. doi: 10.3389/fnins.2013.00062.
  26. Frederick MJ, Cocuzzo SE. Contrafreeloading in rats is adaptive and flexible: support for an animal model of compulsive checking. Evolutionary Psychology. 2017;15:147470491773593. doi: 10.1177/1474704917735937.
  27. Graybiel AM. The basal ganglia and chunking of action repertoires. Neurobiology of Learning and Memory. 1998;70:119–136. doi: 10.1006/nlme.1998.3843.
  28. Graybiel AM. Habits, rituals, and the evaluative brain. Annual Review of Neuroscience. 2008;31:359–387. doi: 10.1146/annurev.neuro.29.051605.112851.
  29. Holland PC. Relations between Pavlovian-instrumental transfer and reinforcer devaluation. Journal of Experimental Psychology: Animal Behavior Processes. 2004;30:104–117. doi: 10.1037/0097-7403.30.2.104.
  30. Holland PC, Straub JJ. Differential effects of two ways of devaluing the unconditioned stimulus after pavlovian appetitive conditioning. Journal of Experimental Psychology: Animal Behavior Processes. 1979;5:65–78. doi: 10.1037/0097-7403.5.1.65.
  31. Homayoun H, Moghaddam B. Differential representation of Pavlovian-instrumental transfer by prefrontal cortex subregions and striatum. European Journal of Neuroscience. 2009;29:1461–1476. doi: 10.1111/j.1460-9568.2009.06679.x.
  32. Jin X, Costa RM. Start/stop signals emerge in nigrostriatal circuits during sequence learning. Nature. 2010;466:457–462. doi: 10.1038/nature09263.
  33. Jin X, Costa RM. Shaping action sequences in basal ganglia circuits. Current Opinion in Neurobiology. 2015;33:188–196. doi: 10.1016/j.conb.2015.06.011.
  34. Joel D, Stein DJ, Schreiber R. Animal Models of Obsessive–Compulsive Disorder: From Bench to Bedside via Endophenotypes and Biomarkers. In: McArthur RA, Borsini F, editors. Animal and Translational Models for CNS Drug Discovery. San Diego: Academic Press; 2008. pp. 133–164.
  35. Joel D, Avisar A. Excessive lever pressing following post-training signal attenuation in rats: a possible animal model of obsessive compulsive disorder? Behavioural Brain Research. 2001;123:77–87. doi: 10.1016/S0166-4328(01)00201-7.
  36. Joel D, Doljansky J. Selective alleviation of compulsive lever-pressing in rats by D1, but not D2, blockade: possible implications for the involvement of D1 receptors in obsessive-compulsive disorder. Neuropsychopharmacology. 2003;28:77–85. doi: 10.1038/sj.npp.1300010.
  37. Klanker M, Sandberg T, Joosten R, Willuhn I, Feenstra M, Denys D. Phasic dopamine release induced by positive feedback predicts individual differences in reversal learning. Neurobiology of Learning and Memory. 2015;125:135–145. doi: 10.1016/j.nlm.2015.08.011.
  38. Korff S, Harvey BH. Animal models of obsessive-compulsive disorder: rationale to understanding psychobiology and pharmacology. Psychiatric Clinics of North America. 2006;29:371–390. doi: 10.1016/j.psc.2006.02.007.
  39. Lammel S, Hetzel A, Häckel O, Jones I, Liss B, Roeper J. Unique properties of mesoprefrontal neurons within a dual mesocorticolimbic dopamine system. Neuron. 2008;57:760–773. doi: 10.1016/j.neuron.2008.01.022.
  40. Lashley KS. The Problem of Serial Order in Behavior. Bobbs-Merrill; 1951.
  41. Levesque M, Bedard MA, Courtemanche R, Tremblay PL, Scherzer P, Blanchet PJ. Raclopride-induced motor consolidation impairment in primates: role of the dopamine type-2 receptor in movement chunking into integrated sequences. Experimental Brain Research. 2007;182:499–508. doi: 10.1007/s00221-007-1010-4.
  42. Lex A, Hauber W. Dopamine D1 and D2 receptors in the nucleus accumbens core and shell mediate Pavlovian-instrumental transfer. Learning & Memory. 2008;15:483–491. doi: 10.1101/lm.978708.
  43. Lex B, Hauber W. The role of dopamine in the prelimbic cortex and the dorsomedial striatum in instrumental conditioning. Cerebral Cortex. 2010a;20:873–883. doi: 10.1093/cercor/bhp151.
  44. Lex B, Hauber W. The role of nucleus accumbens dopamine in outcome encoding in instrumental and pavlovian conditioning. Neurobiology of Learning and Memory. 2010b;93:283–290. doi: 10.1016/j.nlm.2009.11.002.
  45. Lichtenberg NT, Pennington ZT, Holley SM, Greenfield VY, Cepeda C, Levine MS, Wassum KM. Basolateral amygdala to orbitofrontal cortex projections enable Cue-Triggered reward expectations. The Journal of Neuroscience. 2017;37:8374–8384. doi: 10.1523/JNEUROSCI.0486-17.2017.
  46. Mahler SV, Vazey EM, Beckley JT, Keistler CR, McGlinchey EM, Kaufling J, Wilson SP, Deisseroth K, Woodward JJ, Aston-Jones G. Designer receptors show role for ventral pallidum input to ventral tegmental area in cocaine seeking. Nature Neuroscience. 2014;17:577–585. doi: 10.1038/nn.3664.
  47. Mahler SV, Brodnik ZD, Cox BM, Buchta WC, Bentzley BS, Quintanilla J, Cope ZA, Lin EC, Riedy MD, Scofield MD, Messinger J, Ruiz CM, Riegel AC, España RA, Aston-Jones G. Chemogenetic manipulations of ventral tegmental area dopamine neurons reveal multifaceted roles in cocaine abuse. The Journal of Neuroscience. 2019;39:503–518. doi: 10.1523/JNEUROSCI.0537-18.2018.
  48. Marshall AT, Ostlund SB. Repeated cocaine exposure dysregulates cognitive control over cue-evoked reward-seeking behavior during Pavlovian-to-instrumental transfer. Learning & Memory. 2018;25:399–409. doi: 10.1101/lm.047621.118.
  49. Matamales M, Skrbis Z, Bailey MR, Balsam PD, Balleine BW, Götz J, Bertran-Gonzalez J. A corticostriatal deficit promotes temporal distortion of automatic action in ageing. eLife. 2017;6:e29908. doi: 10.7554/eLife.29908.
  50. Nelson AJ, Killcross S. Accelerated habit formation following amphetamine exposure is reversed by D1, but enhanced by D2, receptor antagonists. Frontiers in Neuroscience. 2013;7:76. doi: 10.3389/fnins.2013.00076.
  51. Nicola SM. The flexible approach hypothesis: unification of effort and cue-responding hypotheses for the role of nucleus accumbens dopamine in the activation of reward-seeking behavior. Journal of Neuroscience. 2010;30:16585–16600. doi: 10.1523/JNEUROSCI.3958-10.2010.
  52. Niv Y, Daw ND, Joel D, Dayan P. Tonic dopamine: opportunity costs and the control of response vigor. Psychopharmacology. 2007;191:507–520. doi: 10.1007/s00213-006-0502-4.
  53. Ostlund SB, Kosheleff AR, Maidment NT. Relative response cost determines the sensitivity of instrumental reward seeking to dopamine receptor blockade. Neuropsychopharmacology. 2012;37:2653–2660. doi: 10.1038/npp.2012.129.
  54. Ostlund SB, LeBlanc KH, Kosheleff AR, Wassum KM, Maidment NT. Phasic mesolimbic dopamine signaling encodes the facilitation of incentive motivation produced by repeated cocaine exposure. Neuropsychopharmacology. 2014;39:2441–2449. doi: 10.1038/npp.2014.96.
  55. Ostlund SB, Maidment NT. Dopamine receptor blockade attenuates the general incentive motivational effects of noncontingently delivered rewards and reward-paired cues without affecting their ability to bias action selection. Neuropsychopharmacology. 2012;37:508–519. doi: 10.1038/npp.2011.217.
  56. Peciña S, Berridge KC. Dopamine or opioid stimulation of nucleus accumbens similarly amplify cue-triggered 'wanting' for reward: entire core and medial shell mapped as substrates for PIT enhancement. European Journal of Neuroscience. 2013;37:1529–1540. doi: 10.1111/ejn.12174.
  57. Pinheiro J, Bates D. Mixed-Effects Models in S and S-Plus. New York: Springer; 2000.
  58. Rescorla RA. Relation of Bar-Presses to magazine approaches. Psychological Reports. 1964;14:943–948. doi: 10.2466/pr0.1964.14.3.943.
  59. Robinson TE, Yager LM, Cogan ES, Saunders BT. On the motivational properties of reward cues: individual differences. Neuropharmacology. 2014;76:450–459. doi: 10.1016/j.neuropharm.2013.05.040.
  60. Sharpe MJ, Stalnaker T, Schuck NW, Killcross S, Schoenbaum G, Niv Y. An integrated model of action selection: distinct modes of cortical control of striatal decision making. Annual Review of Psychology. 2019;70:53–76. doi: 10.1146/annurev-psych-010418-102824.
  61. Smith KS, Graybiel AM. Habit formation. Dialogues in Clinical Neuroscience. 2016;18:33. doi: 10.31887/DCNS.2016.18.1/ksmith.
  62. Stachniak TJ, Ghosh A, Sternson SM. Chemogenetic synaptic silencing of neural circuits localizes a hypothalamus→midbrain pathway for feeding behavior. Neuron. 2014;82:797–808. doi: 10.1016/j.neuron.2014.04.008.
  63. Stephens DW, Krebs JR. Foraging Theory. Princeton University Press; 1986.
  64. Thrailkill EA, Bouton ME. Effects of outcome devaluation on instrumental behaviors in a discriminated heterogeneous chain. Journal of Experimental Psychology: Animal Learning and Cognition. 2017;43:88–95. doi: 10.1037/xan0000119.
  65. Tiffany ST. A cognitive model of drug urges and drug-use behavior: role of automatic and nonautomatic processes. Psychological Review. 1990;97:147–168. doi: 10.1037/0033-295X.97.2.147.
  66. Volkow ND, Wang GJ, Tomasi D, Baler RD. Unbalanced neuronal circuits in addiction. Current Opinion in Neurobiology. 2013;23:639–648. doi: 10.1016/j.conb.2013.01.002.
  67. Wassum KM, Ostlund SB, Balleine BW, Maidment NT. Differential dependence of pavlovian incentive motivation and instrumental incentive learning processes on dopamine signaling. Learning & Memory. 2011;18:475–483. doi: 10.1101/lm.2229311.
  68. Wassum KM, Ostlund SB, Maidment NT. Phasic mesolimbic dopamine signaling precedes and predicts performance of a self-initiated action sequence task. Biological Psychiatry. 2012;71:846–854. doi: 10.1016/j.biopsych.2011.12.019.
  69. Wassum KM, Ostlund SB, Loewinger GC, Maidment NT. Phasic mesolimbic dopamine release tracks reward seeking during expression of Pavlovian-to-instrumental transfer. Biological Psychiatry. 2013;73:747–755. doi: 10.1016/j.biopsych.2012.12.005.
  70. Weele CMV, Siciliano CA, Tye KM. Dopamine tunes prefrontal outputs to orchestrate aversive processing. Brain Research. 2019;1713:16–31. doi: 10.1016/j.brainres.2018.11.044.
  71. Westbrook A, Braver TS. Dopamine does double duty in motivating cognitive effort. Neuron. 2016;89:695–710. doi: 10.1016/j.neuron.2015.12.029.
  72. Wiltgen BJ, Sinclair C, Lane C, Barrows F, Molina M, Chabanon-Hicks C. The effect of ratio and interval training on Pavlovian-instrumental transfer in mice. PLOS ONE. 2012;7:e48227. doi: 10.1371/journal.pone.0048227.
  73. Witten IB, Steinberg EE, Lee SY, Davidson TJ, Zalocusky KA, Brodsky M, Yizhar O, Cho SL, Gong S, Ramakrishnan C, Stuber GD, Tye KM, Janak PH, Deisseroth K. Recombinase-driver rat lines: tools, techniques, and optogenetic application to dopamine-mediated reinforcement. Neuron. 2011;72:721–733. doi: 10.1016/j.neuron.2011.10.028.
  74. Wyvell CL, Berridge KC. Intra-accumbens amphetamine increases the conditioned incentive salience of sucrose reward: enhancement of reward "wanting" without enhanced "liking" or response reinforcement. The Journal of Neuroscience. 2000;20:8122–8130. doi: 10.1523/JNEUROSCI.20-21-08122.2000.

Decision letter

Editor: Naoshige Uchida
Reviewed by: Stan Floresco

In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.

[Editors’ note: this article was originally rejected after discussions between the reviewers, but the authors were invited to resubmit after an appeal against the decision.]

Thank you for submitting your work entitled "Ventral tegmental dopamine inputs to the nucleus accumbens mediates cue-triggered motivation but not reward expectancy" for consideration by eLife. Your article has been reviewed by three peer reviewers, and the evaluation has been overseen by a Reviewing Editor and a Senior Editor. The following individual involved in review of your submission has agreed to reveal their identity: Stan Floresco (Reviewer #3).

Our decision has been reached after consultation between the reviewers. Based on these discussions and the individual reviews below, we regret to inform you that your work will not be considered further for publication in eLife, at least in its current form.

Halbout and colleagues performed a series of experiments to determine the role of VTA dopamine neurons projecting to the nucleus accumbens (NAc) in reward-seeking behaviors in a Pavlovian-instrumental transfer (PIT) task. Specifically, the authors sought to separate whether dopamine regulates reward expectancy or motivation (response vigor). The authors show that chemogenetic inhibition of dopamine neurons impaired CS-induced lever pressing but not entry into the reward port (reward retrieval) induced directly by the sensory stimuli associated with food delivery. Based on the idea that the reward retrieval is caused by reward expectancy whereas CS-induced lever pressing is due to motivation, the authors conclude that VTA dopamine neurons regulate motivation (response vigor) but not reward expectancy.

All the reviewers thought that this study addresses an important question: identifying the exact role that this population of dopamine neurons plays in reward-seeking behavior. They agreed that this study uses very careful behavioral observations and a nice combination of behavioral paradigms, and obtained important data. However, the overall assessment of the study varied somewhat among the reviewers, as you can see below. After discussion, we thought that the definitions of some terms such as 'reward expectancy' are not very clear. 'Reward expectancy' is a rather broad term and has been used differently in different fields. It is not obvious whether 'motivation' and 'reward expectancy' are completely separate in the first place. Also, reward expectancy may, in some literature, imply a narrower definition corresponding to that required for goal-directed behaviors. Overall, the definition of reward expectancy based on reward retrieval behavior is not very convincing (please see the comments by reviewers 1 and 2) and some of the results (devaluation experiments) appear to contradict the idea of 'reward expectancy'. Furthermore, as reviewer 1 pointed out, there appear to be alternative explanations of the data. In total, we thought that these terminologies and the interpretation of the results require significant revisions for clarity and consistency. Please also note that eLife is a general biology journal, and writing has to be understandable to readers outside a subfield. Additionally, the reviewers were concerned by the relatively small effects. Specifically, in some experiments (Experiment 2 [Figure 3B] and Experiment 3 [Figure 4B and Figure 4—figure supplement 2]), it appears that CNO alone had a strong effect on behavior and the results using a summary statistic ('CNO suppression') are not very convincing.

Given the above concerns, we cannot proceed with the current manuscript. Due to the potential importance of the data, however, we would like to suggest that the authors submit a revision plan if they think that they can address the major concerns raised by the reviewers. The revision plan should respond to the reviewers' main criticisms and describe specifically how the paper would be changed to address them, including whether new experiments would be performed. We would like to use the revision plan to make a final decision as to whether a revision is a viable approach for this manuscript.

Reviewer #1:

Halbout et al. use fine-grained analysis of behavior in a Pavlovian-instrumental transfer (PIT) task to assay whether VTA dopamine neurons facilitate the general PIT effect via increasing reward expectancy (as opposed to enhancing cue-triggered motivation). This is an important question, as answering it would help to further define dopamine function. The authors' strategy is to use the probability of reward port entry after an operant response as an index of reward expectancy. This measure is not reduced by chemogenetic inhibition of VTA dopamine neurons even though this manipulation effectively reduces the main PIT effect of enhanced operant responding during presentation of Pavlovian cues. The authors conclude that dopamine does not enhance PIT via promoting reward expectancy. Instead, dopamine enhances PIT via its contribution to cue-triggered motivation, and an additional non-dopamine-dependent component of PIT is the result of enhanced reward expectancy due to CS+ presentation.

Although for the most part the results are convincing from the technical perspective, the interpretation completely depends on the authors' claim that port entries occurring shortly after an operant response are an index of reward expectancy. The authors support this claim by showing, in an instrumental task identical to the one used in the PIT experiments, that cues presented after the operant response that signal reward delivery (pellet drop) increase the probability of a port entry. However, this unsurprising result is entirely consistent with an alternative interpretation that the authors have apparently not considered. That is that port entry is simply the next step after the lever response in a chunked skilled action sequence. Such sequences are most likely a form of stimulus-response habit in which the stimulus is the previous action. In addition, because pellet drop occurs after the lever press and predicts reward, it could also participate in S-R habit learning. In this scenario, a lever press followed by pellet drop would simply activate the response (port entry) more strongly than either stimulus alone. Supporting this interpretation, the authors show that port entry after lever pressing is strongly resistant to outcome devaluation – a hallmark of S-R habits. It is therefore perplexing that the authors use this behavior as an index of reward expectancy – a component of goal-directed and cognitive forms of behavior, which should be sensitive to devaluation precisely because the value of the expected reward is reduced.

The authors go on to show that during the PIT test, not only does the CS+ increase the probability of the operant response relative to control periods (e.g., CS-), but it also increases the probability that any individual operant response will be followed by a port entry. They further show that chemogenetic inhibition of VTA dopamine neurons or terminals in the accumbens during the PIT test reduces the CS-induced increase in operant responding while leaving intact the probability of port entry after an operant response. The authors conclude that although a CS-triggered increase in reward expectancy drives a component of the PIT effect (those operant responses followed by port entry), an additional component is not driven by expectancy (operant responses not followed by port entry), and dopamine contributes to PIT via the latter mechanism, likely via enhancing cue-triggered motivation.

To accept this interpretation, one must accept the assertion that operant-entry sequences are the result of reward expectancy to a greater extent than operant responses occurring without a subsequent entry. As described above, this claim is suspect. It is certainly interesting that inhibition of dopamine neurons has a greater effect on single operants than operant-entry sequences, but there are alternative explanations. For instance, the authors' devaluation experiment suggests that operant actions that are not followed by port entry are sensitive to devaluation and are therefore likely controlled by action-outcome goal-directed associations. In contrast, the post-devaluation increase in frequency of operant-entry sequences suggests that operant-entry behavior is at least partially controlled by stimulus-response habit associations. In this scenario, dopamine facilitates the influence of a Pavlovian CS+ on goal-directed behavior but not habit behavior. Under the assumption that reward expectancy is more important for goal-directed than habit behaviors, this is roughly the opposite of the authors' conclusion. What the authors may have found is something perhaps even more intriguing: that CSs in the PIT test can influence habit-based behaviors at least as strongly as goal-directed behaviors. Of course, one might expect the neural mechanism of such effects to be different (as an interaction between Pavlovian and action value representations is possible only with goal-directed behaviors), and it is therefore informative that dopamine contributes only to facilitation of goal-directed behaviors.

In summary, the authors have found something intriguing in the resistance of operant response-port entry sequences to inhibition of dopamine neurons and dopamine terminals in the accumbens. However, their interpretation is not adequately supported by their data, some of which in fact contradict their conclusions.

Reviewer #2:

This article by Halbout and colleagues describes a series of experiments aimed at clarifying the contribution of the dopaminergic innervation of the ventral striatum to the interaction between Pavlovian and instrumental responses in an appetitive context. The study is well designed and this question should be relevant for neuroscientists interested in the neural basis of decision-making and motivation.

I have, however, a series of issues with the manuscript:

The terminology is very confusing, and the theoretical frame is globally difficult to grasp. There are many concepts that are used to capture very simple behaviors, even in the Results section (reward seeking, exploration, reward expectancy, motivation, vigor etc.). Could the authors make a set of simple predictions based on a simple theoretical frame and describe how they decide to operationalize the questions? There are places in the text when it is the case, but I'm afraid that there is a drift in the issues at stake between the Abstract and the Discussion.

The description of the methods is too superficial. It might be sufficient for specialists in the field (conditioning in rats) and/or people who know the work of this team, but not for a more general audience. Typically, including the contingencies and the timing of the reward schedules in the different tasks would help.

I am having some trouble reconciling the conclusions of the authors and the data, as shown by the figure:

Experiment 1: Again, the text is difficult to read, but it seems to me that there are two things going on here: the operant behavior of these animals in the task is very 'habitual' (as opposed to goal-directed), since the rate of lever pressing decreases very little when reward is omitted (Figure 1D). In line with that, presumably, animals almost only try to collect the reward when they get direct evidence (a cue) that it has been delivered. So reward does not drive action and actions do not predict reward, and animals are left with Pavlovian processes to anticipate the reward. Is there more to it than that? If yes, it would require precise quantitative predictions and measures to demonstrate it. I would avoid the word 'exploration', unless the authors can demonstrate that animals try to get the reward by exploring another potential way to get the reward (e.g. another lever). Here, action seems to be driven much more by compulsion/habits than exploration.

Experiments 2 and 3:

Looking at Figure 3B (Experiment 2) and Figure 4B and Figure 4—figure supplement 2 (Experiment 3), CNO alone has a strong effect on behavior and the difference between mCherry and hM4Di animals is really small. Not even sure it is significant, on the scale of the experiment. For example, in Experiment 2, Figure 3D shows a greater effect of CNO on response to CS- than CS+. Again, when looking at all the bars, I don't understand the conclusions.

In Experiment 2, to test the prediction regarding the role of DA in PIT, they should test the interaction between 3 factors (CNO vs. Vehicle; mCherry vs. hM4Di and pre vs. post CS+). By the way, why use the pre/post CS+ when there is a CS- condition, which seems a priori a better reference for assessing the specificity of the Pavlovian effects of the CS+, as opposed to a general effect of having a stimulus? Looking at the data, CNO abolishes the increase in responding evoked by the CS+, and the major difference between mCherry and hM4Di is in the pre CS+ period… If the authors really want to take into account trial-to-trial variability, they could use a relative measure (pre vs. post, on a trial-by-trial basis) but compare the effect of CS+ with the effect of CS- to assess PIT, and the influence of the various manipulations.

It is essentially the same thing for Experiment 3: the effect of CNO (here in the NAc) vs. vehicle is much bigger than the difference between mCherry and hM4Di. Ok, injecting CNO in the MFC has no effect, but this is potentially because the concentration of DA receptor (targeted by the CNO…) is smaller there than in the ACC. Separate issue, but as I mentioned above, describing what is essentially compulsive/habitual lever pressing as 'exploratory seeking' does not help.

Given these, I am not sure that any of the conclusions regarding the role of DA really stands. Again, the experimental design as a whole is excellent, I believe, and the results are interesting in terms of clarifying the neural and behavioral processes at stake in PIT, but ample revisions are needed.

Reviewer #3:

This is a very interesting and cleverly-designed study that parses out how mesocorticolimbic dopamine (DA) transmission may be involved in the ability of Pavlovian cues to invigorate reward seeking, using a well-established Pavlovian-to-Instrumental transfer test. Using DREADD approaches to silence dopamine cell bodies they show that suppressing DA activity blocks or attenuates the PIT effect. Moreover, this appeared to be mediated primarily by DA activity in the accumbens, but not prefrontal cortex. What was particularly important about these findings is that the authors conducted a sophisticated microstructural analysis of behavior, focusing on how often rats may approach the food cup (as a measure of reward expectancy). This analysis showed that, even though reducing DA activity suppressed the increase in lever pressing induced by Pavlovian, reward-associated cues, it did not affect approaches to the food cup, suggesting that DA's role in this context is to enhance motivation, but not reward expectancy. They also showed that reducing DA activity did not influence how reinforcer devaluation altered responding.

This is a well-designed study that has important implications for teasing out what DA does (and does not do) in modulating reward seeking. The analyses are sound, and the Discussion is fair and scholarly. I have no major issues with the paper, but a few relatively minor points the authors should attend to.

1) Subsection “Pavlovian conditioning”, was a 10 Hz tone used? Or 10 kHz (I'm not sure rats can hear 10 Hz).

2) "After a session of Pavlovian conditioning, rats were given a 30-min extinction session" – based on the preceding section, it's unclear what happened. Did rats receive a Pavlovian session after the instrumental training and before the first test and then every subsequent one? This should be clarified.

3) Subsection “Inhibiting dopamine neurons during Pavlovian-to-instrumental transfer preferentially disrupts cue-motivated reward seeking, but not reward retrieval”, last paragraph – the reader is referred to Figure 3E and F, but there is no F panel. Do they mean D and E?

4) In looking at the mCherry group in Figure 3, it appears that systemic CNO may have attenuated the PIT effect (although clearly not as much as it did in the hM4Di group). I'm not sure if the analysis looked at this comparison, but it would be important to point out either way if CNO on its own has some effect on PIT.

5) The authors use an interesting correlational analysis to try to explain the variation in the accumbens CNO data. However, looking at the histology, there is also considerable variation in the placements of infusions, some in core, some in shell. This warrants some mention, and perhaps another analysis seeing if the effects were stronger in animals with placements located in one NAc subregion vs. another.

[Editors’ note: what now follows is the decision letter after the authors submitted for further consideration.]

Thank you for resubmitting your work entitled "Mesolimbic dopamine projections mediate cue-motivated reward seeking but not reward retrieval" for further consideration at eLife. Your revised article has been favorably evaluated by Joshua Gold as the Senior Editor, a Reviewing Editor, and three reviewers.

All of the reviewers agreed that the manuscript has been greatly improved. However, reviewer 1 still has remaining issues, as described below. Before proceeding further, we would like to see your response to these comments.

Reviewer #1:

Halbout et al. show that inhibition of dopamine neurons (or terminals in the nucleus accumbens, but not prefrontal cortex) reduces the general Pavlovian-instrumental transfer (PIT) effect primarily by reducing operant responses that occur without a following reward port entry. In rats trained to perform an instrumental task, reward devaluation reduced operant performance in an extinction test, but left operant response-port entry sequences relatively unaffected, indicating that performance of operant-entry sequences is most likely a stimulus-response habit. These results make the subtle but important point that even though cues that motivate reward seeking increase the number of action sequences performed via a mesolimbic dopamine-dependent mechanism, the performance of the sequence once it is initiated is independent of mesolimbic dopamine. This point is in line with previous work from the authors and others, and its support with microstructural behavioral analysis provides a careful and useful addition to the field.

This version of the paper includes additional control experiments, and the authors' new interpretations are much better supported by their data than previously. However, the authors should address a few additional points.

1) The authors present evidence that the chunked operant-entry action sequence is resistant to devaluation. However, it is possible that port entry behavior is simply resistant to devaluation whether or not the entry follows an operant response. The authors show that animals make "noncontingent food cup approaches" (i.e., port entries), but do not analyze the rate of this behavior in any of their tasks. The argument that operant responses are resistant to devaluation because they are part of a chunked, habit action sequence would be strengthened if entries that are NOT part of such sequences were not resistant to devaluation.

2) It would also be interesting to know whether noncontingent entries in the PIT test are dependent on dopamine/mesolimbic dopamine.

3) Systemic CNO had a greater effect on PIT than intra-accumbens CNO. The authors should discuss the possible reasons for this.

4) Figure 4F and the similar supplementary figure are hard to understand. First, the X axis should be more descriptive (i.e., refer to CS-induced increase in lever presses without approach). Second, the Y axis reads "CNO suppression", but apparently greater suppression is represented by more negative numbers. This is counterintuitive. I think the authors are trying to say that the impact of CNO positively correlates with the impact of the CS on lever presses without approach, but they write that there is a negative correlation. The figure legend adds to the confusion by saying that the Y axis is "PIT Score for vehicle test – PIT Score for CNO test". If that's the case, then the lower PIT scores in CNO should result in positive values, not the predominantly negative values shown in the figure.

Reviewer #2:

The authors seriously improved the manuscript and they addressed my previous remarks. This is a very interesting piece of work and I have no major concern anymore.

Reviewer #3:

In my opinion, the authors have done a fine job at addressing concerns raised during the previous rounds of review. I have no further comments. I reiterate that I believe this is an interesting and clever study.

eLife. 2019 May 20;8:e43551. doi: 10.7554/eLife.43551.024

Author response


[Editors’ note: the author responses to the first round of peer review follow.]

Reviewer #1:

[…] Although for the most part the results are convincing from the technical perspective, the interpretation completely depends on the authors' claim that port entries occurring shortly after an operant response are an index of reward expectancy. The authors support this claim by showing, in an instrumental task identical to the one used in the PIT experiments, that cues presented after the operant response that signal reward delivery (pellet drop) increase the probability of a port entry. However, this unsurprising result is entirely consistent with an alternative interpretation that the authors have apparently not considered. That is that port entry is simply the next step after the lever response in a chunked skilled action sequence. Such sequences are most likely a form of stimulus-response habit in which the stimulus is the previous action. In addition, because pellet drop occurs after the lever press and predicts reward, it could also participate in S-R habit learning. In this scenario, a lever press followed by pellet drop would simply activate the response (port entry) more strongly than either stimulus alone. Supporting this interpretation, the authors show that port entry after lever pressing is strongly resistant to outcome devaluation – a hallmark of S-R habits. It is therefore perplexing that the authors use this behavior as an index of reward expectancy – a component of goal-directed and cognitive forms of behavior, which should be sensitive to devaluation precisely because the value of the expected reward is reduced.

The authors go on to show that during the PIT test, not only does the CS+ increase the probability of the operant response relative to control periods (e.g., CS-), but it also increases the probability that any individual operant response will be followed by a port entry. They further show that chemogenetic inhibition of VTA dopamine neurons or terminals in the accumbens during the PIT test reduces the CS-induced increase in operant responding while leaving intact the probability of port entry after an operant response. The authors conclude that although a CS-triggered increase in reward expectancy drives a component of the PIT effect (those operant responses followed by port entry), an additional component is not driven by expectancy (operant responses not followed by port entry), and dopamine contributes to PIT via the latter mechanism, likely via enhancing cue-triggered motivation.

To accept this interpretation, one must accept the assertion that operant-entry sequences are the result of reward expectancy to a greater extent than operant responses occurring without a subsequent entry. As described above, this claim is suspect. It is certainly interesting that inhibition of dopamine neurons has a greater effect on single operants than operant-entry sequences, but there are alternative explanations. For instance, the authors' devaluation experiment suggests that operant actions that are not followed by port entry are sensitive to devaluation and are therefore likely controlled by action-outcome goal-directed associations. In contrast, the post-devaluation increase in frequency of operant-entry sequences suggests that operant-entry behavior is at least partially controlled by stimulus-response habit associations. In this scenario, dopamine facilitates the influence of a Pavlovian CS+ on goal-directed behavior but not habit behavior. Under the assumption that reward expectancy is more important for goal-directed than habit behaviors, this is roughly the opposite of the authors' conclusion. What the authors may have found is something perhaps even more intriguing: that CSs in the PIT test can influence habit-based behaviors at least as strongly as goal-directed behaviors. Of course, one might expect the neural mechanism of such effects to be different (as an interaction between Pavlovian and action value representations is possible only with goal-directed behaviors), and it is therefore informative that dopamine contributes only to facilitation of goal-directed behaviors.

In summary, the authors have found something intriguing in the resistance of operant response-port entry sequences to inhibition of dopamine neurons and dopamine terminals in the accumbens. However, their interpretation is not adequately supported by their data, some of which in fact contradict their conclusions.

We agree that our original focus on reward expectancy was confusing. This was partly due to our unconventional use of that term, which we did not intend to be synonymous with goal-directed control given evidence that reward expectancies can guide action selection in a manner that is independent of reward value during PIT performance (Colwill and Rescorla, 1990; Rescorla, 1994; Holland, 2004). But we now believe this framework for data interpretation was unclear and unnecessary. We have therefore omitted the term reward expectancy altogether, and have substantially revised the manuscript to focus on the broader topic of how dopamine contributes to the control of instrumental reward-seeking and reward-retrieval actions, while explicitly contrasting goal-directed (devaluation-sensitive) and habitual (devaluation-insensitive) accounts of behavior. From this perspective, we agree with the reviewer’s conclusion that press-contingent food-cup approaches were performed habitually, suggesting the use of a press-approach behavioral chain or action chunk. This interpretation is also compatible with our finding that reward-paired cues strongly increase production of press-approach chunks, given previous studies showing that habitual behavior is particularly sensitive to the motivational influence of such cues (Holland, 2004; Wiltgen et al., 2012).

As the reviewer notes, presses that were not followed by food-cup approach were particularly sensitive to reward devaluation. We have elaborated on our discussion of this issue (subsection “Inhibiting dopamine neurons spares the sensitivity of reward-seeking actions to reward devaluation”, last paragraph; Discussion, second paragraph, documents with colored text) and added a new analysis to help develop this finding (Figure 5D; Supplementary file 1G). However, we do not believe this is a defining feature of such actions. For instance, the variable-interval training protocol used in our PIT experiments is known to support habit formation. Thus, even discrete lever presses (without approach) performed during PIT tests were likely to have been insensitive to devaluation, had this been assessed. We suggest that such discrete lever presses are not exclusively aligned with either habitual or goal-directed control. Indeed, others have argued that repetitive lever pressing without subsequent food-cup approach may model OCD-like compulsive checking behavior, which we discuss in the last paragraph of the Discussion section.

Reviewer #2:

[…] I have a series of issues with the manuscript:

The terminology is very confusing, and the theoretical frame is globally difficult to grasp. There are many concepts that are used to capture very simple behaviors, even in the Results section (reward seeking, exploration, reward expectancy, motivation, vigor etc.). Could the authors make a set of simple predictions based on a simple theoretical frame and describe how they decide to operationalize the questions? There are places in the text when it is the case, but I'm afraid that there is a drift in the issues at stake between the Abstract and the Discussion.

We acknowledge that our original framework was confusing. As noted in our comments to reviewer #1, we have thoroughly revised the manuscript to correct this problem. Also, as noted above, we have adopted unambiguous terms for behavior in the Results section and figures, and have operationalized more complex or ambiguous terms.

The description of the methods is too superficial. It might be sufficient for specialists in the field (conditioning in rats) and/or people who know the work of this team, but not for a more general audience. Typically, including the contingencies and the timing of the reward schedules in the different tasks would help.

We apologize for any omissions and have added to this part of the Materials and methods section to make sure that these and other important details are specified.

I am having some trouble reconciling the conclusions of the authors and the data, as shown by the figure:

Experiment 1: Again, the text is difficult to read, but it seems to me that there are two things going on here: the operant behavior of these animals in the task is very 'habitual' (as opposed to goal-directed), since the rate of lever pressing decreases very little when reward is omitted (Figure 1D). In line with that, presumably, animals almost only try to collect the reward when they get direct evidence (a cue) that it has been delivered. So reward does not drive action and actions do not predict reward, and animals are left with Pavlovian processes to anticipate the reward. Is there more to it than that? If yes, it would require precise quantitative predictions and measures to demonstrate it. I would avoid the word 'exploration', unless the authors can demonstrate that animals try to get the reward by exploring another potential way to get the reward (e.g. another lever). Here, action seems to be driven much more by compulsion/habits than exploration.

We generally agree with and have adopted this habit-based interpretation, as noted in our comments to reviewer #1. We have also avoided using exploration, exploratory reward seeking, and other ambiguous terms, as noted above.

Experiments 2 and 3:

Looking at Figure 3B (Experiment 2) and Figure 4B and Figure 4—figure supplement 2 (Experiment 3), CNO alone has a strong effect on behavior and the difference between mCherry and hM4Di animals is really small. Not even sure it is significant, on the scale of the experiment. For example, in Experiment 2, Figure 3D shows a greater effect of CNO on response to CS- than CS+. Again, when looking at all the bars, I don't understand the conclusions.

Reviewers #2 and #3 both noted that systemic CNO may have had a nonspecific (hM4Di-independent) effect on lever pressing and that this should be acknowledged in the text. Consistent with this observation, our analysis of Total Lever Presses in Experiment 2 (Figure 3B) found a significant Drug x CS Period x CS Type interaction (p = .007), indicating that CNO partially suppressed lever pressing in a manner that was not necessarily limited to the hM4Di group. While this and related findings were indicated in the statistical output table for this analysis (Supplementary file 1A), we had not given them appropriate attention in the main text. We have edited the manuscript to acknowledge this potential nonspecific CNO effect (subsection “Inhibiting dopamine neurons during Pavlovian-to-instrumental transfer preferentially disrupts cue-motivated reward seeking, but not reward retrieval”, third paragraph). However, we also emphasize that this partial, nonspecific suppression of lever pressing does not account for the hM4Di-dependent effects of CNO that support our main conclusions. As noted in the Results section (see the aforementioned paragraph), we found a significant Group x Drug x CS Period x CS Type interaction (p = .002), indicating that the effects of CNO on PIT performance were more disruptive for the hM4Di group than for the mCherry group. Further analysis of data from the hM4Di group (excluding mCherry data) revealed a significant Drug x CS Period x CS Type interaction (p < .001). This group showed a CS+ specific increase in pressing during the vehicle test (CS Period x CS Type interaction, p < .001), but showed no such effect when tested on CNO (CS Period x CS Type interaction, p = .684). In contrast, analysis of data from the mCherry group (excluding hM4Di data) found a CS+ specific increase in lever pressing (CS Period x CS Type interaction, p < .001) that was not significantly altered by CNO (Drug x CS Period x CS Type interaction, p = .780).
This null effect in the mCherry group is in line with a recently published study led by Dr. Kate Wassum (coauthor) showing that systemic CNO treatment does not significantly disrupt expression of the PIT effect in DREADD-free rats (Collins et al., 2019), which is cited in the current manuscript. It is also important to note that the same systemic CNO treatment did not significantly affect lever-press performance in Experiment 4 (Figure 5B). Based on these results and others noted below, we believe our original conclusions are strongly supported by the data, though we agree that they should be discussed in the context of potential nonspecific CNO effects.

We have also added an acknowledgement of the potential for CNO microinfusions to produce a partial nonspecific suppression in lever pressing in the mCherry rats used in Experiment 3B (subsection “Surgery”; Figure 4—figure supplement 2). Importantly, this effect did not reach significance (Drug * Site * CS Period * CS Type interaction, p =.068) and was driven by intra-mPFC (not NAc) CNO injections. Moreover, further analysis revealed that CNO did not significantly disrupt the CS+ induced increase in lever pressing (PIT score) in either group (p’s >.165). These findings indicate that the disruption of PIT detected in Experiment 3A (Figure 4C and 4D) following intra-NAc CNO injections in rats expressing hM4Di in VTA dopamine neurons resulted from local dopamine terminal inhibition and not a nonspecific action of that drug.

As detailed below, we believe that reviewer #2’s separate concern about effect sizes stems partly from variability in lever press rates across groups, drug conditions, and CS periods, which tended to obscure the effect of chemogenetic dopamine neuron inhibition on the increase in lever pressing that is specifically attributable to noncontingent CS+ presentation. In the next comment section, we develop the rationale behind our analysis of CS+ elicited behavior.

In Experiment 2, to test the prediction regarding the role of DA in PIT, they should test the interaction between 3 factors (CNO vs. Vehicle; mCherry vs. hM4Di and pre vs. post CS+). By the way, why use the pre/post CS+ when there is a CS- condition, which seems a priori a better reference for assessing the specificity of the Pavlovian effects of the CS+, as opposed to a general effect of having a stimulus? Looking at the data, CNO abolishes the increase in responding evoked by the CS+, and the major difference between mCherry and hM4Di is in the pre CS+ period… If the authors really want to take into account trial-to-trial variability, they could use a relative measure (pre vs. post, on a trial-by-trial basis) but compare the effect of CS+ with the effect of CS- to assess PIT, and the influence of the various manipulations.

As requested by the reviewer, we have included a description of the significant Drug (CNO vs. Vehicle) x Group (mCherry vs. hM4Di) x CS period (pre vs. CS) interaction for CS+ trials (i.e., omitting CS- trials) (t(120) = -2.53, p =.013) in the Results section (subsection “Inhibiting dopamine neurons during Pavlovian-to-instrumental transfer preferentially disrupts cue-motivated reward seeking, but not reward retrieval”, third paragraph), which further supports the conclusion that inhibiting VTA dopamine neurons suppressed the ability of the CS+ to stimulate lever pressing.

We agree with the reviewer that the CS- is an essential control for evaluating the behavioral effects of the CS+ that specifically depend on that cue’s relationship with reward. However, it is not sufficient to simply contrast levels of responding during CS+ and CS- presentations without controlling for local baseline rates of responding during pre-CS periods. This is particularly important in PIT studies because lever press rates fluctuate sporadically due to the lack of reinforcement at test. For this reason, virtually all PIT studies control for pre-CS response levels (Holmes et al., 2010; Cartoni et al., 2016). Accordingly, our primary analysis of PIT data always incorporates both CS Type (CS+ vs. CS-) and CS Period (pre-CS vs. CS) as factors, in addition to Drug (CNO vs. Veh) and Group (e.g., AAV group for Experiment 2 or injection site for Experiments 3A and 3B). When interpreting the results of these complex analyses, our main focus is on interactions involving both CS Type and CS Period, which reflect the ability of the CS+, but not the CS-, to selectively increase responding relative to its cue-specific baseline level. Main effects of Drug and Group are also of interest, as they represent general (cue-independent) behavioral effects. However, lower-level interactions involving only CS Period or CS Type are harder to interpret because they are likely driven, to some degree, by incidental or nonspecific differences in pre-CS response rates, effects which are either uninteresting (random) or are better captured by the main effects of Drug or Group. Therefore, while the supplementary tables outline the full output of our main analyses (Supplementary file 1A-G), our discussion of results within the main text focuses on the results of most interest (i.e., main effects of Drug, Group, the Drug * Group interaction, or any interactions involving both CS Period and CS Type).
We have modified the “Statistical Analysis” subsection of the Materials and methods to make this aspect of our data interpretation explicit to readers (third paragraph).
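
The logic of focusing on the CS Period x CS Type interaction can be sketched with a small Python example. This is an illustration only, not the analysis code used in the study: the function names and the press counts are hypothetical, and the actual analyses are those whose output is reported in Supplementary file 1A-G. The sketch shows how a simple CS+ versus CS- comparison can understate a cue-specific effect when pre-CS baselines differ.

```python
def baseline_corrected_change(pre_cs_presses, cs_presses):
    """Change in lever presses from the pre-CS period to the CS period."""
    return cs_presses - pre_cs_presses


def cs_specific_effect(pre_plus, cs_plus, pre_minus, cs_minus):
    """Difference of differences: (CS+ - pre-CS+) - (CS- - pre-CS-).

    This mirrors, for a single animal, the contrast captured by a
    CS Period x CS Type interaction.
    """
    return (baseline_corrected_change(pre_plus, cs_plus)
            - baseline_corrected_change(pre_minus, cs_minus))


# Hypothetical press counts for one rat (illustrative only, not study data).
raw_difference = 12 - 10  # CS+ minus CS- presses, ignoring baselines
contrast = cs_specific_effect(pre_plus=4, cs_plus=12, pre_minus=9, cs_minus=10)
print(raw_difference, contrast)
```

With these made-up counts, the raw CS+ versus CS- difference is only 2 presses because the pre-CS- baseline happened to be high, whereas the baseline-corrected contrast is 7.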

Incidental and/or nonspecific differences in pre-CS response rates can also make it difficult for readers to identify the source of the complex interactions expressed in raw data (total presses) in Figures 3B and 4B (each has 16 bars, including pre-CS periods). Therefore, we now include an analysis of PIT scores isolating CS+ induced changes in pressing (CS+ – pre-CS+; see Figures 3C, 4D, and Figure 4—figure supplement 2). This is a widely used measure of the PIT effect (Pecina et al., 2006; El-Amamy and Holland, 2007; Bertran-Gonzalez et al., 2013; Laurent et al., 2016; Laurent et al., 2017; Alarcon et al., 2018; Panayi and Killcross, 2018) and facilitates data interpretation for general readers. This secondary analysis is also useful for confirming the source of potential drug effects on CS+ elicited behavior. We have used this analysis in place of the admittedly confusing and less compelling CNO suppression analysis that had been used in the original manuscript.
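
As a minimal sketch of how such a PIT score could be computed and compared across drug conditions within subjects, consider the following Python example. The data values, variable names, and the `pit_score` helper are hypothetical and chosen only for illustration; they are not data or code from the study.

```python
from statistics import mean


def pit_score(pre_cs_plus_presses, cs_plus_presses):
    """PIT score: presses during the CS+ minus presses during the pre-CS+ period."""
    return cs_plus_presses - pre_cs_plus_presses


# Hypothetical (pre-CS+, CS+) press counts for three rats, one tuple per rat.
# These numbers are invented for illustration and are not study data.
vehicle_test = [(5, 14), (3, 9), (6, 10)]
cno_test = [(4, 6), (3, 5), (5, 6)]

vehicle_scores = [pit_score(pre, cs) for pre, cs in vehicle_test]
cno_scores = [pit_score(pre, cs) for pre, cs in cno_test]

# Within-subject drug effect: negative values mean the CS+ induced increase
# in pressing was smaller on CNO than on vehicle for that rat.
drug_effect = [cno - veh for veh, cno in zip(vehicle_scores, cno_scores)]

print(vehicle_scores, cno_scores, drug_effect, mean(drug_effect))
```

Because each rat serves as its own control, the per-rat difference removes incidental between-subject variation in off-drug PIT performance before the drug effect is summarized.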

It is essentially the same thing for Experiment 3: the effect of CNO (here in the NAc) vs. vehicle is much bigger than the difference between mCherry and hM4Di. Ok, injecting CNO in the MFC has no effect, but this is potentially because the concentration of DA receptor (targeted by the CNO…) is smaller there than in the ACC.

This comment also raises another important point about our experimental design and statistical analysis, which depends heavily on within-subject comparisons (e.g., CS type, CS period, Drug). Not only is it crucial to consider baseline (pre-CS) response rates, as discussed above, it is essential to take into account unintended, presumably incidental, between-subject differences in PIT performance when evaluating the effect of CNO. For instance, in the Vehicle test of Experiment 2, rats in the hM4Di group showed a slightly stronger CS+ induced increase in lever pressing, relative to the mCherry group. These between-subject differences in off-drug PIT performance should be controlled for when assessing within-subject effects of CNO (Figure 3B). Similarly, for Experiment 3B (Figure 4—figure supplement 2), rats in the mCherry NAc group happened to show a relatively modest increase in lever pressing to the CS+ in their Vehicle test. Importantly, CNO administration did not significantly suppress this PIT effect and, in fact, was associated with a nonsignificant increase in CS+ elicited pressing. As noted above, this result indicates that the tendency for intra-NAc CNO administration to disrupt PIT performance in hM4Di expressing rats in Experiment 3A (see Figures 4C and 4D) was not due to a nonspecific drug effect.

One additional wrinkle to Experiment 3 is that Experiment 3B was conducted separately as a control study to analyze potential nonspecific CNO effects. While we used the same basic procedure as in Experiment 3A, direct comparisons between experiments are tempting but difficult to interpret. We believe that the critical question to ask when evaluating such data is whether CNO produced an effect in DREADD-free rats that accounts for the main findings, as is common in related studies (Augur et al., 2016; Laurent et al., 2016; Marchant et al., 2016; Campese et al., 2017; Lichtenberg et al., 2017; Alcaraz et al., 2018; Hsu et al., 2018; Collins et al., in press). Our findings demonstrate that this was not the case.

Separate issue, but as I mentioned above, describing what is essentially compulsive/habitual lever pressing as 'exploratory seeking' does not help.

As noted above, we have omitted this terminology.

Given these, I am not sure that any of the conclusions regarding the role of DA really stands. Again, the experimental design as a whole is excellent, I believe, and the results are interesting in terms of clarifying the neural and behavioral processes at stake in PIT, but ample revisions are needed.

We thank the reviewer for these constructive comments and hope that our extensive revisions have adequately addressed their concerns.

Reviewer #3:

This is a very interesting and cleverly-designed study that parses out how mesocorticolimbic dopamine (DA) transmission may be involved in the ability of Pavlovian cues to invigorate reward seeking, using a well-established Pavlovian-to-Instrumental transfer test. Using DREADD approaches to silence dopamine cell bodies they show that suppressing DA activity blocks or attenuates the PIT effect. Moreover, this appeared to be mediated primarily by DA activity in the accumbens, but not prefrontal cortex. What was particularly important about these findings is that the authors conducted a sophisticated microstructural analysis of behavior, focusing on how often rats may approach the food cup (as a measure of reward expectancy). This analysis showed that, even though reducing DA activity suppressed the increase in lever pressing induced by Pavlovian, reward-associated cues, it did not affect approaches to the food cup, suggesting that DA's role in this context is to enhance motivation, but not reward expectancy. They also showed that reducing DA activity did not influence how reinforcer devaluation altered responding.

This is a well-designed study that has important implications for teasing out what DA does (and does not do) in modulating reward seeking. The analyses are sound, and the Discussion is fair and scholarly. I have no major issues with the paper, but a few relatively minor points the authors should attend to.

1) Subsection “Pavlovian conditioning”, was a 10 Hz tone used? Or 10 kHz (I'm not sure rats can hear 10 Hz).

We thank the reviewer for spotting this mistake. We used a 2 kHz pure tone that pulsated at 10 Hz (0.1 s on/0.1 s off). We have corrected the Materials and methods section accordingly (subsection “Pavlovian conditioning”).

2) "After a session of Pavlovian conditioning, rats were given a 30-min extinction session" – based on the preceding section, it's unclear what happened. Did rats receive a Pavlovian session after the instrumental training and before the first test and then every subsequent one? This should be clarified.

We have revised this passage to clarify our training/retraining procedure (subsection “Pavlovian-to-instrumental transfer (PIT) test”). After the last session of instrumental training, rats were given a session of Pavlovian (CS+) training and a 30-min extinction session on the days prior to the first PIT test. At the end of this paragraph we clarify that: “before each new round of testing, rats were given two sessions of instrumental retraining (RI-60s), one session of CS+ retraining, and one 30-min extinction session, as described above.”

3) Subsection “Inhibiting dopamine neurons during Pavlovian-to-instrumental transfer preferentially disrupts cue-motivated reward seeking, but not reward retrieval”, last paragraph – the reader is referred to Figure 3E and F, but there is no F panel. Do they mean D and E?

We have corrected our panel labels. Thank you for drawing our attention to this.

4) In looking at the mCherry group in Figure 3, it appears that systemic CNO may have attenuated the PIT effect (although clearly not as much as it did in the hM4Di group). I'm not sure if the analysis looked at this comparison, but it would be important to point out either way if CNO on its own has some effect on PIT.

As noted in our response to a similar comment by reviewer #2, we have revised the manuscript to acknowledge this potential nonspecific effect of CNO and provide evidence that this effect does not account for the attenuated CS+ elicited lever pressing produced by CNO in the hM4Di group (subsection “Inhibiting dopamine neurons during Pavlovian-to-instrumental transfer preferentially disrupts cue-motivated reward seeking, but not reward retrieval”, third paragraph).

5) The authors use an interesting correlational analysis to try to explain the variation in the accumbens CNO data. However, looking at the histology, there is also considerable variation in the placements of infusions, some in core, some in shell. This warrants some mention, and perhaps another analysis seeing if the effects were stronger in animals with placements located in one NAc subregion vs. another.

We have looked closely at this issue. Unfortunately, variability in injector location did not help resolve variability in the effect of CNO on PIT performance. This is not entirely surprising given previous findings that dopamine signaling in both core and shell may contribute to cue-motivated lever pressing (Lex and Hauber, 2008; Peciña and Berridge, 2013). We note this issue in the main text (subsection “Pathway-specific inhibition of dopamine projections to NAc, but not mPFC, disrupts cue-motivated reward seeking but not retrieval”, first paragraph).

Additional References:

Alarcon DE, Bonardi C, Delamater AR (2018). Associative mechanisms involved in specific Pavlovian-to-instrumental transfer in human learning tasks. Q J Exp Psychol (Hove) 71:1607-1625.

Alcaraz F, Fresno V, Marchand AR, Kremer EJ, Coutureau E, Wolff M (2018). Thalamocortical and corticothalamic pathways differentially contribute to goal directed behaviors in the rat. eLife 7.

Augur IF, Wyckoff AR, Aston-Jones G, Kalivas PW, Peters J (2016). Chemogenetic Activation of an Extinction Neural Circuit Reduces Cue-Induced Reinstatement of Cocaine Seeking. J Neurosci 36:10174-10180.

Bertran-Gonzalez J, Laurent V, Chieng BC, Christie MJ, Balleine BW (2013). Learning related translocation of delta-opioid receptors on ventral striatal cholinergic interneurons mediates choice between goal-directed actions. J Neurosci 33:16060-16071.

Campese VD, Soroeta JM, Vazey EM, Aston-Jones G, LeDoux JE, Sears RM (2017). Noradrenergic Regulation of Central Amygdala in Aversive Pavlovian-to-Instrumental Transfer. eNeuro 4.

Cartoni E, Balleine B, Baldassarre G (2016). Appetitive Pavlovian-instrumental Transfer: A review. Neurosci Biobehav R 71:829-848.

Colwill RM, Rescorla RA (1990). Effect of reinforcer devaluation on discriminative control of instrumental behavior. J Exp Psychol Anim Behav Process 16:40-47.

El-Amamy H, Holland PC (2007). Dissociable effects of disconnecting amygdala central nucleus from the ventral tegmental area or substantia nigra on learned orienting and incentive motivation. Eur J Neurosci 25:1557-1567.

Holmes NM, Marchand AR, Coutureau E (2010). Pavlovian to instrumental transfer: A neurobehavioural perspective. Neurosci Biobehav R 34:1277-1295.

Hsu TM, Noble EE, Liu CM, Cortella AM, Konanur VR, Suarez AN, Reiner DJ, Hahn JD, Hayes MR, Kanoski SE (2018). A hippocampus to prefrontal cortex neural pathway inhibits food motivation through glucagon-like peptide-1 signaling. Mol Psychiatry 23:1555-1565.

Laurent V, Chieng B, Balleine BW (2016). Extinction Generates Outcome-Specific Conditioned Inhibition. Curr Biol 26:3169-3175.

Laurent V, Wong FL, Balleine BW (2017). The Lateral Habenula and Its Input to the Rostromedial Tegmental Nucleus Mediates Outcome-Specific Conditioned Inhibition. J Neurosci 37:10932-10942.

Marchant NJ, Campbell EJ, Whitaker LR, Harvey BK, Kaganovsky K, Adhikary S, Hope BT, Heins RC, Prisinzano TE, Vardy E, Bonci A, Bossert JM, Shaham Y (2016). Role of Ventral Subiculum in Context-Induced Relapse to Alcohol Seeking after Punishment-Imposed Abstinence. J Neurosci 36:3281-3294.

Panayi MC, Killcross S (2018). Functional heterogeneity within the rodent lateral orbitofrontal cortex dissociates outcome devaluation and reversal learning deficits. eLife 7.

Pecina S, Schulkin J, Berridge KC (2006). Nucleus accumbens corticotropin-releasing factor increases cue-triggered motivation for sucrose reward: paradoxical positive incentive effects in stress? BMC Biol 4:8.

Rescorla RA (1994). Transfer of Instrumental Control Mediated by a Devalued Outcome. Anim Learn Behav 22:27-33.

[Editors’ note: the author responses to the re-review follow.]

Reviewer #1:

[…] This version of the paper includes additional control experiments, and the authors' new interpretations are much better supported by their data than previously. However, the authors should address a few additional points.

1) The authors present evidence that the chunked operant-entry action sequence is resistant to devaluation. However, it is possible that port entry behavior is simply resistant to devaluation whether or not the entry follows an operant response. The authors show that animals make "noncontingent food cup approaches" (i.e., port entries), but do not analyze the rate of this behavior in any of their tasks. The argument that operant responses are resistant to devaluation because they are part of a chunked, habit action sequence would be strengthened if entries that are NOT part of such sequences were not resistant to devaluation.

We agree that the question of whether response-contingent and noncontingent approach responses differ in their sensitivity to reward devaluation is an important one. However, we are limited in our ability to analyze this issue with the current data set. For our reward-specific devaluation, both rewards are retrieved from the same food cup. This makes it impossible to distinguish between noncontingent (press-independent) food cup approaches based on whether they are motivated by the devalued vs. non-devalued reward. We have indicated this reason for not including noncontingent approach analysis in the Results section (subsection “Inhibiting dopamine neurons spares the sensitivity of reward-seeking actions to reward devaluation”, last paragraph). We also note that other findings from our lab (not shown) from studies using a single action-outcome contingency task indicate that noncontingent approaches are indeed readily suppressed by reward devaluation, in contrast to press-contingent approaches. These findings are in line with previous reports that devaluation suppresses food-cup approach behavior (e.g., Balleine, 1992; Thrailkill and Bouton, 2017), including when those approaches are directly elicited by Pavlovian conditioned stimuli (Holland and Straub, 1979; Lichtenberg et al., 2017).

2) It would also be interesting to know whether noncontingent entries in the PIT test are dependent on dopamine/mesolimbic dopamine.

Our analysis of noncontingent entries during PIT testing (Figure 3—figure supplement 3, Figure 4—figure supplement 4) indicates that VTA dopamine inhibition (CNO in hM4Di expressing rats) does not disrupt this aspect of behavior.

3) Systemic CNO had a greater effect on PIT than intra-accumbens CNO. The authors should discuss the possible reasons for this.

We have added a passage (subsection “Pathway-specific inhibition of dopamine projections to NAc, but not mPFC, disrupts cue-motivated reward seeking but not retrieval”, second paragraph) discussing this issue.

4) Figure 4F and the similar supplementary figure are hard to understand. First, the X axis should be more descriptive (i.e., refer to CS-induced increase in lever presses without approach). Second, the Y axis reads "CNO suppression", but apparently greater suppression is represented by more negative numbers. This is counterintuitive. I think the authors are trying to say that the impact of CNO positively correlates with the impact of the CS on lever presses without approach, but they write that there is a negative correlation. The figure legend adds to the confusion by saying that the Y axis is "PIT Score for vehicle test – PIT Score for CNO test". If that's the case, then the lower PIT scores in CNO should result in positive values, not the predominantly negative values shown in the figure.

We agree that this figure panel and caption were generally confusing and have made several changes to clarify what is being presented. We continue to prefer a measure (CNO – Vehicle) for which negative numbers reflect the degree of suppression. However, we have corrected the caption and avoided directly using the term “negative correlation” in the main text (subsection “Pathway-specific inhibition of dopamine projections to NAc, but not mPFC, disrupts cue-motivated reward seeking but not retrieval”, fifth paragraph), which we agree may confuse readers. Importantly, the direction of the correlation statistic and the nature of the measures used are clearly presented in the text and figures.
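The sign-convention issue raised by the reviewer can be made concrete with a short sketch. It assumes the drug effect is computed as PIT score under CNO minus PIT score under vehicle, so that suppression yields negative values, and shows that flipping the subtraction order flips the sign of the correlation without changing its magnitude. All values, variable names, and the helper function are hypothetical and are not taken from the study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical per-rat PIT scores (illustrative only, not data from the study).
vehicle_pit = [5, 10, 20, 30]  # CS-induced increase in pressing, vehicle test
cno_pit = [4, 6, 10, 12]       # same rats, CNO test

# Convention assumed here: CNO - Vehicle, so suppression is negative.
cno_minus_vehicle = [c - v for c, v in zip(cno_pit, vehicle_pit)]

# Opposite convention: Vehicle - CNO, so suppression is positive.
vehicle_minus_cno = [v - c for c, v in zip(cno_pit, vehicle_pit)]

r_cno_minus_vehicle = pearson_r(vehicle_pit, cno_minus_vehicle)
r_vehicle_minus_cno = pearson_r(vehicle_pit, vehicle_minus_cno)

print(cno_minus_vehicle)
print(r_cno_minus_vehicle, r_vehicle_minus_cno)
```

Under the CNO – Vehicle convention, rats with a larger cue-induced increase show more negative difference scores, so the coefficient is negative even though the underlying relationship (stronger cue effect, stronger suppression) is the same one a reader might describe as positive.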

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    Figure 1—source data 1. This spreadsheet contains the behavioral responses for individual rats in Figure 1.
    DOI: 10.7554/eLife.43551.003
    Figure 3—source data 1. This spreadsheet contains the behavioral responses for individual rats in Figure 3.
    DOI: 10.7554/eLife.43551.009
    Figure 4—source data 1. This spreadsheet contains the behavioral responses for individual rats in Figure 4.
    DOI: 10.7554/eLife.43551.016
    Figure 5—source data 1. This spreadsheet contains the behavioral responses for individual rats in Figure 5.
    DOI: 10.7554/eLife.43551.019
    Supplementary file 1. Generalized linear mixed-effects model outputs.
    elife-43551-supp1.docx (36.1KB, docx)
    DOI: 10.7554/eLife.43551.020
    Transparent reporting form
    DOI: 10.7554/eLife.43551.021

    Data Availability Statement

    All data generated and analyzed during this study are included in supporting files. Source data files have been provided for Figures 1, 3, 4 and 5, as well as their respective figure supplements.


    Articles from eLife are provided here courtesy of eLife Sciences Publications, Ltd
