Author manuscript; available in PMC: 2017 Jun 15.
Published in final edited form as: Neuron. 2016 May 26;90(6):1312–1324. doi: 10.1016/j.neuron.2016.04.043

Endocannabinoid modulation of orbitostriatal circuits gates habit formation

Christina M Gremel 1,2, Jessica Chancey 1, Brady Atwood 1, Guoxiang Luo 1, Rachael Neve 3, Charu Ramakrishnan 4, Karl Deisseroth 4, David M Lovinger 1,*, Rui M Costa 5,*
PMCID: PMC4911264  NIHMSID: NIHMS784683  PMID: 27238866

Abstract

Everyday function demands efficient and flexible decision-making allowing for habitual and goal-directed action control. An inability to shift has been implicated in disorders with impaired decision-making including obsessive-compulsive disorder and addiction. Despite this, our understanding of specific molecular mechanisms and circuitry involved in shifting action control remains limited. Here we identify an endogenous molecular mechanism, in a specific cortical-striatal pathway, mediating the transition between goal-directed and habitual action strategies. Deletion of cannabinoid type 1 (CB1) receptors from cortical projections originating in the orbitofrontal cortex (OFC) prevents mice from shifting from goal-directed to habitual instrumental lever-pressing. Activity of OFC neurons projecting to dorsal striatum (OFC-DS) and specifically, activity of OFC-DS terminals, is necessary for goal-directed action control. Lastly, CB1 deletion from OFC-DS neurons prevents the shift from goal-directed to habitual action control. These data suggest that the emergence of habits depends on endocannabinoid-mediated attenuation of a competing circuit controlling goal-directed behaviors.


Decision-making requires a balance between flexible and efficient action selection. For normal function, we need to be able to retrieve routine actions quickly and efficiently, and habits serve this purpose. However, the transition between habitual and goal-directed control provides the capacity to perform the same action based on updated consequences. Difficulties with stopping habits and shifting to goal-directed control underlie numerous neuropsychiatric disorders that display impaired decision-making (Dias-Ferreira et al., 2009; Graybiel, 2008; Griffiths et al., 2014) including obsessive-compulsive disorder (Burguière et al., 2015; Gillan et al., 2011) and addiction (Belin et al., 2013; Goldstein and Volkow, 2011). However, our understanding of specific molecular mechanisms and circuitry involved in controlling action shifting remains limited.

In everyday decision-making, the contiguity and contingency of actions relative to outcomes can be more or less predictable. Hence it is likely that both goal-directed and habitual action strategies contribute to action control, although to varying degrees. This raises the possibility that decision-making strategies compete for downstream behavioral control. Indeed, this has been observed in supporting neural circuits, with neural coding of habitual actions in the dorsal medial striatum (DMS), a region known to support goal-directed control, and vice versa, neural coding of goal-directed actions in the dorsal lateral striatum (DLS), a region known to support habitual processes (Graybiel, 2008; Gremel and Costa, 2013; Hilario et al., 2012; Stalnaker et al., 2010; Thorn et al., 2010; Yin et al., 2006; 2005a; 2004). Thus, disorders such as OCD and addiction may induce a pathology that results in an over-reliance on habitual circuitry in situations where greater goal-directed control would be more advantageous. Such reliance on habitual action circuitry has been observed following extended and repetitive action training (Smith and Graybiel, 2013; Thorn et al., 2010), and following periods of drug self-administration (Belin and Everitt, 2008; Corbit et al., 2012; Dickinson et al., 2002), with notable exceptions (Colwill and Rescorla, 1985; Samson et al., 2004). These findings underscore the importance of understanding the mechanisms and circuits involved in shifts between these differing action control strategies.

We have found that the orbital frontal cortex (OFC) is necessary for animals to modulate action value and to shift towards goal-directed actions (Gremel and Costa, 2013). We previously developed a within-subject action shifting instrumental lever-press task that allowed us to examine the activity of the same neurons in involved circuits as the animal performed the same action in a goal-directed versus habitual manner. While previous findings hypothesized a role for OFC in stimulus-outcome learning (Ostlund and Balleine, 2007; Rolls et al., 1996; Stalnaker et al., 2015), our study (Gremel and Costa, 2013) as well as additional findings (Bradfield et al., 2015; Gourley et al., 2013; Rhodes and Murray, 2013; Stalnaker et al., 2015), have identified a role for lateral OFC (Gourley et al., 2013; Gremel and Costa, 2013) and medial OFC (Bradfield et al., 2015) in the control over goal-directed decision-making that is independent of contextual control, but is dependent on the expected action value. Together, along with medial prefrontal cortex (Killcross and Coutureau, 2003), these findings strongly support OFC involvement in goal-directed decision-making. Intriguingly, our previous findings suggested that to shift from goal-directed to habitual actions necessitates a decrease in excitatory transmission from lateral OFC projection neurons, with optogenetic activation selectively increasing the frequency of goal-directed and not habitual actions, and chemogenetic inhibition of lateral OFC projection neurons preventing goal-directed control. Shifts in action control could therefore emerge from changes in the relative weight of OFC and downstream circuits, like DMS, that support goal-directed actions, versus other competing circuits, like DLS, that support habits (Graybiel, 2008; Gremel and Costa, 2013; Yin et al., 2006; 2004; 2005b). 
This led us to investigate potential endogenous mechanisms responsible for decreasing OFC transmission, thereby reducing its impact on downstream brain areas and allowing for habitual actions to emerge.

One candidate is the endocannabinoid system. Endocannabinoid signaling at the cannabinoid type 1 (CB1) receptor produces synaptic plasticity in corticostriatal circuits that support action strategies (Gerdeman et al., 2002). CB1 receptors are Gi/o coupled G-protein receptors expressed in excitatory and inhibitory projection neurons as well as inhibitory interneurons within corticobasal ganglia circuits controlling actions (Lovinger, 2010). CB1 receptors are largely concentrated on both excitatory and inhibitory presynaptic terminals where upon activation they reduce the probability of neurotransmitter release (Kano et al., 2009). Importantly, previous work found that activation of the CB1 receptor by endocannabinoids is necessary for habitual learning (Hilário et al., 2007) and that chronic CB1 receptor activation biases towards the use of habitual action strategies (Nazzaro et al., 2012). Therefore, we hypothesized that CB1 receptor activation on OFC projection neurons could be one of the mechanisms gating the shift from goal-directed to habitual action control.

Through the use of Cre-recombinase-enabled, cell-specific and circuit-specific viral approaches in transgenic mice (Fenno et al., 2014), we identify an endogenous molecular mechanism, in a specific cortical-striatal pathway, mediating the transition between goal-directed and habitual action strategies. By selectively deleting CB1 receptors in OFC projection neurons, we identify a role for endocannabinoid modulation of excitatory cortical output in the transition from goal-directed to habitual control. We then show that activity of OFC neurons projecting to dorsal striatum (OFC-DS), and more specifically that transmission at OFC-DS terminals in striatum, is necessary to maintain goal-directed control. Finally, we show that selective deletion of CB1 receptors in OFC-DS neurons is sufficient to block the emergence of habitual behavior. Together, our findings suggest an endogenous mechanism of action shifting, with endocannabinoid-mediated inhibition of OFC-DS circuits allowing for the emergence of habitual control.

Results

To examine cell-type and circuit specific control over action selection, it is advantageous to examine goal-directed and habitual actions in the same subject concurrently. We therefore used a within-subject, self-paced instrumental lever-press task we recently developed (Gremel and Costa, 2013) where mice will readily shift between performing the same action on a similar manipulandum, for the same reward using a goal-directed versus habitual action strategy (Fig. 1). In this paradigm we pair schedules of reinforcement historically used to favor goal-directed or habitual control, random ratio (RR) and random interval (RI) respectively (Adams, 1982; Adams and Dickinson, 1981; Colwill and Rescorla, 1985; Dickinson, 1985), with a particular context. Each day mice are concurrently trained to press a similar lever in the same location for the same reward (pellets or a 20% sucrose solution) under RR and RI schedules. The other outcome is provided as a control in the home-cage (i.e., is not contingent on lever-press behavior). During training, these schedules produce largely similar lever pressing rates (ANOVA main effect of day: F8, 192 = 36.40, p < 0.001) (Fig. 1B).

Figure 1. Within subject shifting between goal-directed and habitual actions.


A, Acquisition schematic of lever-pressing for a food outcome under random interval (RI) and random ratio (RR) schedules of reinforcement. The same mouse is placed in two operant chambers distinguished by contextual cues, in successive order, where it is trained to press the same lever (e.g., left lever) for the same outcome (food pellets vs. sucrose solution) (e.g., food pellets). The bias towards goal-directed actions is generated through use of random ratio (RR) schedules of reinforcement, where the reinforcer is delivered following on average the nth lever press (2 days of n = 10 followed by 4 days of n = 20). In contrast, random interval (RI) reinforcement schedules are used to bias towards use of habitual actions, with the reinforcer delivered following the first lever press after on average an interval of t has passed (2 days of t = 30 s followed by 4 days of t = 60 s). Each day following lever press training, the other outcome (e.g. sucrose) is provided in the home cage. B, Response rate for a control cohort under RI and RR schedules across acquisition. C, Schematic of outcome devaluation procedure. On the Valued day (V) mice are prefed (1 h) a control outcome (e.g. sucrose) that they have experienced in their home cage. On the Devalued day (DV), mice are prefed the outcome associated with the lever press (e.g., food pellet). Following prefeeding, mice are placed into the RI and RR contexts, and lever presses are measured for 5 min in the absence of reinforcer delivery. D, Lever pressing in V and DV states in RI (grey) and RR (black) contexts. E, Distribution of lever presses between V and DV days in RI and RR training contexts. F, Within-subject devaluation indexes in previously trained RI and RR contexts, reflecting potential shifts in the magnitude of devaluation. Individual results and mean ± SEM are shown. * = p < 0.05.
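The two reinforcement schedules described in this caption can be illustrated with a small simulation. This is a minimal sketch, not the authors' behavioral control code: it assumes that an RR-n schedule reinforces each press with probability 1/n and that an RI-t schedule arms a reinforcer after exponentially distributed delays with mean t. The function and class names (`rr_press`, `RandomInterval`, `count_rewards_rr`, `count_rewards_ri`) and the evenly spaced press times are hypothetical.

```python
import random


def rr_press(n, rng):
    """Random ratio (assumed form): each press is reinforced with probability 1/n."""
    return rng.random() < 1.0 / n


class RandomInterval:
    """Random interval (assumed form): the first press after a random delay is reinforced."""

    def __init__(self, mean_interval_s, rng):
        self.mean_interval_s = mean_interval_s
        self.rng = rng
        self.armed_at = self._next_arming(0.0)

    def _next_arming(self, now):
        # Assumption: exponential delays with the requested mean interval.
        return now + self.rng.expovariate(1.0 / self.mean_interval_s)

    def press(self, now):
        # Reinforce only if the interval has elapsed, then start a new interval.
        if now >= self.armed_at:
            self.armed_at = self._next_arming(now)
            return True
        return False


def count_rewards_rr(n_presses, n, seed=0):
    """Count reinforcers earned over n_presses presses on an RR-n schedule."""
    rng = random.Random(seed)
    return sum(rr_press(n, rng) for _ in range(n_presses))


def count_rewards_ri(press_interval_s, session_s, mean_interval_s, seed=0):
    """Count reinforcers earned when pressing at a fixed pace on an RI schedule."""
    rng = random.Random(seed)
    schedule = RandomInterval(mean_interval_s, rng)
    n_presses = int(session_s / press_interval_s)
    return sum(schedule.press(i * press_interval_s) for i in range(1, n_presses + 1))
```

Under these assumptions, the number of reinforcers earned on the RR schedule grows with the number of presses, whereas on the RI schedule it is limited mainly by elapsed time, so pressing faster yields little extra reward. This difference in feedback between the schedules is the property the task exploits.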

To probe the extent to which actions are controlled through goal-directed processes, we conduct a sensory-specific satiety outcome devaluation procedure across two days, termed valued day and devalued day (Fig. 1C). The valued day provides a control for general effects that satiation may have on lever-press behavior, when mice can pre-feed on the home-cage outcome that was not associated with a lever press. Outcome devaluation occurs on the devalued day, when mice can prefeed on the outcome previously earned by lever pressing, thereby decreasing the motivation for that particular outcome. Each day following prefeeding, mice are placed in each training context and non-rewarded lever-pressing is measured. Goal-directed control is sensitive to changes in motivation for the immediate expected outcome, with habitual control less so. Hence the degree of goal-directedness is assessed through the sensitivity of lever-pressing to outcome devaluation (Adams, 1982; Adams and Dickinson, 1981; Colwill and Rescorla, 1985), with a greater reduction in lever-pressing in the devalued state compared to valued state indicative of greater goal-directed action control (and no reduction indicative of habitual control).

Our within-subject task produces goal-directed and habitual action control (Fig. 1D). A two-way ANOVA revealed a significant interaction between Training context and Day (F1, 22 = 10.92, p = 0.003); mice pressed less in the devalued versus valued state, only in the RR context (Bonferroni corrected p < 0.01). Mice differentially distributed their lever presses between valued and devalued states in the RR but not RI training contexts (Fig. 1E) (one-sample t-test RR context: t11 = 7.27, p < 0.001). Importantly, there is a shift in the distribution of goal-directedness as measured by a devaluation index [(Valued − Devalued lever presses)/Total lever presses] (i.e. values closer to 1 indicative of greater goal-directedness) for each training context, with individual mice showing greater goal-directedness in the RR than RI context (paired t-test: t11 = 3.41, p = 0.005) (Fig. 1F). Hence, on the same day mice can readily shift between goal-directed and habitual control over lever pressing for the same outcome.
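The devaluation index defined above is simple arithmetic on the press counts from the two test days. The sketch below restates it in code. The function name `devaluation_index` and the example counts are hypothetical, and "Total lever presses" is taken to mean the sum of presses on the valued and devalued days.

```python
def devaluation_index(valued_presses, devalued_presses):
    """(Valued - Devalued) / Total, where Total is assumed to be Valued + Devalued."""
    total = valued_presses + devalued_presses
    if total == 0:
        raise ValueError("index is undefined when no presses were made")
    return (valued_presses - devalued_presses) / total


# Hypothetical counts for one mouse in each training context.
rr_index = devaluation_index(30, 10)  # presses drop after devaluation
ri_index = devaluation_index(20, 20)  # presses are unchanged by devaluation
```

With these made-up counts the RR index is 0.5 and the RI index is 0.0, so the within-subject shift between contexts is 0.5. An index near 0 indicates insensitivity to devaluation, and an index approaching 1 indicates strong goal-directed control.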

Deletion of OFC CB1 receptors prevents habitual control

We first examined the ability to shift between goal-directed and habitual action control following deletion of CB1 receptors in the OFC. We used a viral-vector approach to selectively delete CB1 receptors from OFC neurons in adult mice. LoxP-flanked CB1 transgenic mice (CB1flox) (Marsicano, 2003) and their wild-type littermates were given stereotaxically-guided intracranial injections targeted to the lateral OFC of either AAV9-Ef1a-Cre-eGFP for widespread Cre-recombinase expression, or AAV9-CamKIICre-eGFP to specifically target OFC projection neurons (Fig. 2A; Supplementary Fig. 1). To confirm the presence or absence of CB1 receptors in terminals of OFC projection neurons after these manipulations (> 4 weeks post-surgery) we included Cre-dependent channelrhodopsin (AAV5-Ef1a-DIO-ChR2-eYFP) in some of the OFC injections, and examined the ability of the CB1 agonist WIN55,212 to reduce light-evoked excitatory transmission from OFC terminals onto medium spiny neurons (MSNs) in the dorsal striatum via whole-cell patch clamp electrophysiology.

Figure 2. Deletion of CB1 receptor from OFC neurons impairs habitual action control.


A, Schematic of viral strategy and ex vivo physiological assessment. B, Cre-dependent ChR2 eYFP detected at OFC injection site (left) and downstream DS (right) in CB1floxCamKII Cre mice. C, Representative traces showing assessment of light-evoked excitatory post-synaptic currents (oEPSCs) in DS MSNs, with the blue circle indicating the light pulse (473 nm wavelength, 5 ms) in CB1floxCamKII Cre mice and wild-type littermates. WIN55,212 is a CB1 receptor agonist. D, Relative amplitude of DS MSN oEPSCs following WIN55,212 application (10 min, 1 μM) differed between wild-type and CB1floxCre/CB1floxCamKII Cre mice. E, Experimental design schematic. F, Lever presses made following outcome devaluation procedures in valued (V) and devalued (DV) states across RI and RR training contexts for the different treatment groups. G, Normalized lever presses during outcome devaluation testing, showing the distribution of lever presses between V and DV states in the different training contexts (RI and RR) across the different treatment groups. H, Shifts in outcome devaluation index between RI and RR training contexts for Control mice, CB1floxCre mice, and CB1floxCamKII Cre mice. Individual results and mean ± SEM are shown. * = p < 0.05, # = p = 0.08. See also Supplementary Fig. 1 and 2.

Consistent with previous findings showing extensive synaptic connectivity between OFC and striatum in mice (Pan et al., 2010; Wall et al., 2013) (Fig. 2B), we detected optically-evoked excitatory post-synaptic currents (oEPSCs) in MSNs in dorsal striatum (Fig. 2C). As expected, application of WIN55,212 reduced the amplitude of oEPSCs in control mice (unpaired t-test: t14 = 3.55, p = 0.003), with a one-sample t-test against baseline showing reduced relative EPSCs in control mice (t5 = 5.42, p = 0.003) (Fig. 2C, D; Supplementary Fig. 1E), showing intact CB1 receptor function in OFC presynaptic terminals. In contrast, WIN55,212 did not reduce oEPSCs in virally injected CB1flox mice (p > 0.05), confirming deletion of CB1 receptors from OFC presynaptic terminals.

Adult control mice (wild-type littermates injected with virus), CB1flox mice with CB1 receptors deleted from OFC neurons (CB1floxCre), and CB1flox mice with CB1 receptors deleted from OFC CamKII-expressing neurons (CB1floxCamKII Cre) were trained to lever press for the same food reward under both RI and RR schedules of reinforcement (Fig. 2E). Deletion of CB1 receptors from OFC neurons had a small effect on acquisition of lever press behavior; CB1floxCre mice and CB1floxCamKII Cre mice made slightly more lever presses as the RR requirement increased (Supplementary Fig. 2). Importantly, CB1floxCre mice and CB1floxCamKII Cre mice were able to distinguish between RI and RR schedules, as evidenced by maintained sensitivity to differential feedback functions produced by each schedule (Supplementary Fig. 2G–I).

To evaluate the degree to which mice used the expected outcome value to control decision-making, we performed an outcome devaluation test. Control mice selectively reduced lever pressing following outcome devaluation in the RR context but not the RI context (Fig. 2F). A repeated-measures ANOVA showed a ‘Day x Context’ interaction in control mice and reduced responding on the devalued day only in the RR context (F1, 20 = 2.79, p = 0.05; Bonferroni corrected p < 0.05). In contrast, CB1floxCre mice and CB1floxCamKII Cre mice made fewer lever presses in the devalued compared to valued state, in both previously trained RI and RR contexts (Fig. 2F). A main effect of Day was observed in CB1floxCre mice (F1, 24 = 18.08, p = 0.0003) and CB1floxCamKII Cre mice (F1, 16 = 41.47, p < 0.0001) with reduced lever pressing in devalued versus valued states in both RI and RR contexts (Bonferroni corrected ps’ < 0.05). Further, a one-sample t-test (against chance or 0.5) of normalized lever presses between valued and devalued states in each training context showed that control mice were only sensitive to devaluation in the RR context (RI context: t10 = 0.82, p = 0.43; RR context: t10 = 2.67, p = 0.025). In contrast, CB1floxCre mice and CB1floxCamKII Cre mice were sensitive to devaluation in both RI and RR previously trained contexts (Fig. 2G) (CB1floxCre mice RI: t12 = 3.96, p = 0.002; RR context: t12 = 3.33, p = 0.006) (CB1floxCamKII Cre mice RI: t8 = 3.96, p = 0.003; RR context: t8 = 9.83, p < 0.0001).
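The one-sample test against chance used above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' analysis script: normalized lever presses are taken to be the proportion of each mouse's test presses made on the valued day, so that 0.5 means equal pressing in both states, and the data values are invented. The function names `normalized_valued_presses` and `one_sample_t` are hypothetical.

```python
import math
import statistics


def normalized_valued_presses(valued, devalued):
    """Assumed normalization: share of test presses made in the valued state."""
    return valued / (valued + devalued)


def one_sample_t(values, mu0=0.5):
    """Return the t statistic and degrees of freedom for a test of mean(values) against mu0."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    t_stat = (mean - mu0) / (sd / math.sqrt(n))
    return t_stat, n - 1


# Invented proportions for three mice in one context.
t_stat, df = one_sample_t([0.6, 0.7, 0.8])
```

A group of n mice yields n − 1 degrees of freedom, which matches the form of the reported statistics (for example, t10 for a group of 11 mice). The p value would then be read from a t distribution with those degrees of freedom.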

Finally, to evaluate whether CB1 deletion altered the within-subject shift in goal-directedness, we performed a paired t-test on the devaluation index for each group to compare the degree to which each mouse was goal-directed in the RI versus the RR context. Although all mice consumed similar amounts of food during sensory-specific satiation procedures (Supplementary Fig. 2F), control mice tended to show greater sensitivity to outcome devaluation in the RR than RI context (Fig. 2H) (t20 = 2.36, p = 0.08). In contrast, CB1floxCre mice and CB1floxCamKII Cre mice showed similar sensitivity to outcome devaluation in both the RI and RR contexts (CB1floxCre mice: t24 = 0.21, p = 0.83; CB1floxCamKII Cre mice: t16 = 1.07, p = 0.30). These findings are in accordance with additional experiments using an inducible forebrain specific Cre recombinase approach, which suggests a role for OFC CB1 receptors in shifting towards habitual action control (Supplementary Fig. 3). Together, these data strongly suggest that deletion of CB1 receptors from OFC neurons and specifically CamKII-expressing neurons in adulthood prevents a shift to habitual action control and renders mice reliant on goal-directed processes in both contexts.

Attenuation of OFC-DS projection activity results in habitual control

We previously showed that efficacy of excitatory transmission from OFC CamKII-expressing neurons to their targets maintains goal-directed control, with chemogenetic inhibition of OFC projection neurons leaving mice reliant on habitual decision-making processes (Gremel and Costa, 2013). However, OFC neurons send projections widely across the cortex, striatum, and midbrain areas (Hoover and Vertes, 2011; Pan et al., 2010; Schilman et al., 2008). Given dorsal striatal involvement in goal-directed and habitual actions, we first tested if the activity of OFC-DS neurons controls the shift from goal-directed to habitual actions. We took a combined retrograde and chemogenetic viral approach to silence activity of OFC-DS neurons during outcome devaluation (Armbruster et al., n.d.; Fenno et al., 2014; Stachniak et al., 2014; Znamenskiy and Zador, 2013). We selectively expressed the inhibitory designer Gi-coupled human muscarinic receptor 4 (hM4D) in OFC-DS neurons and performed outcome devaluation procedures in the presence of the selective exogenous ligand clozapine-N-oxide (CNO), thereby inhibiting OFC-DS neurons.

Adult wild-type mice were injected with the retrograde herpes simplex virus-1 carrying Cre-recombinase (hEF1α-eYFP-IRES-cre) (HSV Cre) or eYFP (hEF1α-EYFP-IRES) (HSV YFP) stereotaxically targeted to the DMS where we had previously observed OFC fiber tracts and recorded oEPSCs in MSNs (Fig. 3A; Supplementary Fig. 4). In the same surgery, mice were also given an injection of AAV5-Ef1a-DIO-hM4D-mCherry into lateral OFC (Fig. 3A). This combination of viruses resulted in expression of hM4D only in OFC neurons that send projections to dorsal striatum (OFC-DS hM4D; Fig. 3C). We did not observe hM4D-induced mCherry fluorescence following hEF1α-EYFP-IRES control injections into striatum (control mice; Supplementary Fig. 4). We have previously shown that CNO activation of hM4D receptors on OFC projection neurons leads to a reduction in OFC activity in vivo (Gremel and Costa, 2013). To verify that CNO activation of hM4D receptors inhibited activity of OFC-DS neurons (Stachniak et al., 2014), we performed whole-cell patch recordings of visually identified hM4D-expressing OFC-DS neurons. We observed hyperpolarization of OFC neurons with bath application of CNO (Supplementary Fig. 4D), and CNO application blocked current-evoked action potential firing whereas vehicle application did not (Fig. 3D, E).

Figure 3. Activity in OFC-DS neurons is necessary for the shift to goal-directed action control.


A, Schematic of combinatorial retrograde and AAV viral strategy. B, Schematic of experimental design with devaluation testing performed following CNO administration. C, hM4D mCherry expression in OFC. D–E, Representative traces showing the ability of injected current (-200 to +300 pA, 100 pA steps) to evoke an action potential in OFC neurons (baseline) and following vehicle (DMSO; D) and CNO (10 μM; E) application. F, Lever presses during outcome devaluation testing across valued (V) and devalued (DV) states in RI and RR training contexts. G, Normalized lever presses during outcome devaluation testing showing the distribution of lever presses between V and DV states in the previously RI and RR trained contexts. H, Devaluation index for each group of mice in the previously trained RI and RR contexts. Individual results and mean ± SEM are shown. * = p < 0.05. See also Supplementary Fig. 3.

We trained control and OFC-DS hM4D mice on our within-subject lever-press task. Following acquisition, we performed the outcome devaluation procedure in the presence of CNO (1 mg/kg, i.p.; Fig. 3B). Once again, control mice showed reduced responding in the devalued state in the RR but not RI context (Fig 3F) (Repeated measures ANOVA (Day by Context: F1,12 = 0.8, p = 0.37); pre-planned paired t-test RR context t12 = 2.44, p = 0.05). In contrast, OFC-DS hM4D mice did not reduce responding in the devalued state in either RI or RR contexts (F1, 18 = 0.98, p = 0.34; ps’ > 0.05). Furthermore, we found that while control mice differentially distributed lever presses between valued and devalued states only in the RR context (Fig. 3G) (RI t6 = 1.69, p = 0.14; RR t6 = 5.00, p = 0.002), OFC-DS hM4D mice showed similar lever press distributions in each context (RI t9 = 0.11, p = 0.91; RR t9 = 1.28, p = 0.23). We then examined whether hM4D activation and silencing of OFC-DS altered the shift in devaluation index between RI and RR contexts. We found that while control mice were overall more goal-directed in the RR than RI training context, this was not the case for OFC-DS hM4D mice (Fig. 3H). A paired t-test of devaluation indexes in RI versus RR contexts revealed a significant shift in control mice (t12 = 2.33, p = 0.03) that was not present in OFC-DS hM4D mice (t18 = 0.44, p = 0.66). The observed increase in the overall level of lever-pressing in OFC-DS hM4D mice (Fig. 3F) raises the possibility that increased locomotor activity interfered with the ability to show outcome devaluation. Correlational analyses performed did not support this hypothesis (see Supplementary Fig. 4E, F), nor did CNO induce a general increase in locomotor activity in control or OFC-DS hM4D mice (Supplementary Fig. 4G).

However, since deep layer cortical projection neurons have collateral projections (Shepherd, 2013), systemic administration of CNO could also result in attenuated collateral activity of OFC-DS neurons in other downstream terminal areas. To directly examine the contribution of OFC terminals in the DS, we infused CNO directly into the more medial DS prior to outcome devaluation procedures. We implanted cannulae targeted to the medial DS in adult wild-type mice that had been previously injected with AAV5-Ef1a-Cre-eGFP into DS (Rothermel et al., 2013) and AAV5-Ef1a-DIO-hM4D-mCherry into lateral OFC (Fig 4C) (Supplementary Fig. 4). Following recovery and subsequent lever-press training on RI and RR schedules, we microinjected CNO (300 μM) (n = 6) or saline (n = 6) into the DS prior to devaluation testing.

Figure 4. Attenuating OFC-DS transmission disrupts goal-directed control.


A, Schematic of combinatorial AAV viral strategy. B, Schematic of experimental design with devaluation testing performed following intra-cranial CNO or saline administration. C, (left panel) DIO hM4D mCherry expression in OFC, (center panel) cannula placement within DS, (right panel) DS insert from block outline in center panel showing DIO hM4D mCherry fiber expression in DS. D, Lever presses during outcome devaluation testing across valued (V) and devalued (DV) states in RI and RR training contexts. E, Normalized lever presses during outcome devaluation testing showing the distribution of lever presses between V and DV states in the previously RI and RR trained contexts. F, Devaluation index for each group of mice in the previously trained RI and RR contexts. Individual results and mean ± SEM are shown. * = p < 0.05. See also Supplementary Fig. 4.

We found that the handling procedures associated with microinjections prior to devaluation testing were sufficient to disrupt habitual control in Saline infused mice, with Saline mice showing strong goal-directed control over actions in both training contexts (ANOVA; no interaction, main effect of Day: F1, 10 = 29.30, p < 0.001) (Fig. 4D). In spite of this disruption, intra-DS infusions of CNO resulted in strong habitual control in both RI and RR training contexts (no interaction, no main effects: ps’ > 0.05). Control mice differentially distributed their lever presses between valued and devalued states in both RI and RR contexts (RI: t5 = 22.82, p < 0.001; RR: t5 = 15.18, p < 0.001), whereas CNO-treated mice did not (RI t5 = 1.46, p > 0.05; RR t5 = 1.07, p > 0.05) (Fig. 4E). Neither group showed a shift in devaluation index (ps’ > 0.05) (Fig. 4F). A planned comparison performed on devaluation indexes between groups revealed greater goal-directedness in the saline versus CNO treated mice (main effect of Treatment group: F1, 10 = 26.58, p < 0.01). Hence, saline mice were goal-directed while CNO-induced attenuation of OFC-DS transmission left mice reliant on habitual strategies to control lever pressing.

The lack of goal-directed control following CNO treatment is not explained by disrupted response inhibition, since in a separate test, systemic CNO treatment did not prevent satiety-induced reduction in responding in an open-ended fixed ratio schedule (Supplementary Fig. 4J–L). Instead, our findings are in line with reports of alterations in OFC-DS activity in repetitive action pathologies such as obsessive-compulsive disorder (OCD) (Burguière et al., 2013), and may underlie previous reports relating habitual responding and OCD (Gillan et al., 2011). Together this suggests that hM4D activation and subsequent attenuation of OFC-DS neuron transmission selectively disrupts the ability for goal-directed strategies to control lever-press responding.

Cannabinoid type 1 receptor mediated attenuation of OFC-DS activity is critical for habitual control

The previous data suggested that activity of OFC-DS neurons is critical for goal-directed behavior. We therefore reasoned that attenuation of transmission in this pathway via CB1 receptor activation could be critical for habit formation. We next investigated whether endogenous CB1 receptor activation and subsequent reduction in glutamate release from OFC-DS neurons was necessary for the shift from goal-directed to habitual action control. We used a site-specific targeted combinatorial viral approach (Fenno et al., 2014) in transgenic mice to selectively delete CB1 receptors in only the OFC neurons that project to DS. CB1flox mice and their wild-type littermates were injected with a retrograde virus carrying flippase hEF1α-eYFP-IRES-flp (HSV fp) targeted to the same DMS region of observed OFC-DS connections (Fig. 5A, Supplementary Fig. 6). In the same surgery, we also stereotaxically injected AAV8-Ef1a-FD-mCherry-p2A-Cre targeted to the lateral OFC, with Cre-recombinase dependent upon the presence of flippase (AAV fp-Cre). The aim of this site-specific viral targeting was to limit Cre-recombinase expression to OFC-DS neurons, and thus to delete CB1 in OFC-DS neurons of CB1flox mice (OFC-DS CB1flox) but not in wild-type littermates (Ctl). We included AAV5-Ef1a-DIO-ChR2-eYFP (AAV DIO ChR2) to functionally assess CB1 receptor deletion (Fig. 5A, B). We were able to reliably measure light-evoked changes in membrane potential of OFC somata (Fig. 5C), indicating successful Cre-recombinase activity. Although more challenging due to the relative sparseness of terminals, we were able to perform whole-cell patch recordings and evoke oEPSCs in a subset of MSNs from wild-type control and CB1flox mice (Fig. 5E). WIN55,212 reduced the amplitude of oEPSCs in wild-type but not CB1flox mice (Fig. 5E, F). These data suggest that our site-specific targeted combinatorial viral approach in transgenic mice was successful at deleting CB1 receptors in terminals of OFC-DS neurons.

Figure 5. Deletion of CB1 receptors in OFC-DS neurons prevents habitual control over actions.


A, Schematic of combinatorial viral strategy and ex vivo physiological assessment. B, fp dependent Cre-mCherry detected in OFC (left) and in downstream DS (right). C, Representative traces showing ChR2-mediated firing of an OFC neuron. D, Schematic of experimental design. E, Representative traces showing assessment of DS MSN oEPSCs in a subset of OFC-DS CB1flox mice (n = 2) and wild-type littermate (Ctl) (n = 1) mice in the absence (left) and presence (right) of WIN55,212. F, Relative amplitude of DS MSN oEPSCs following WIN55,212 application in Ctl and OFC-DS CB1flox mice. G, Lever presses in valued (V) and devalued (DV) states, in RI and RR training contexts. Individual results and mean ± SEM are shown. H, Normalized lever presses during outcome devaluation testing showing the distribution of lever presses across V and DV states in the different training contexts (RI and RR). I, Devaluation index plotted within-subject for RI and RR training contexts for control mice and OFC-DS CB1flox mice. Individual results and mean ± SEM are shown. * = p < 0.05. See also Supplementary Fig. 5.

Deletion of CB1 receptors from OFC-DS neurons did not alter acquisition of lever-press behavior under RI or RR reinforcement schedules, and mice were able to distinguish between schedules (Supplementary Fig. 5). During outcome devaluation procedures, control mice showed reduced lever pressing in the devalued state in the RR but not RI context (Fig. 5G) (repeated-measures ANOVA (Context x Day): F1, 12 = 5.47, p = 0.04) (Bonferroni-corrected ps < 0.05). However, OFC-DS CB1flox mice reduced lever pressing in both RI and RR contexts in the devalued state (no interaction, main effect (Day): F1, 18 = 29.72, p < 0.0001) (Bonferroni-corrected ps < 0.05). Further, OFC-DS CB1flox mice differentially distributed their lever pressing between valued and devalued states in both the RI and RR contexts (Fig. 5H) (RI: t12 = 6.14, p < 0.0001; RR: t12 = 2.45, p < 0.05). Controls only differentially distributed pressing in the RR context (RI: t7 = 1.3, p = 0.24; RR: t7 = 3.45, p = 0.011). Finally, control mice increased their devaluation index between RI and RR contexts (Fig. 5I) (t14 = 2.11, p = 0.05), indicative of a shift in the degree of goal-directedness, while OFC-DS CB1flox mice did not show such a shift (t20 = 0.49, p = 0.63). This suggests that habitual action control involves CB1 receptor-mediated inhibition of the OFC-DS circuit that supports goal-directed actions.

Discussion

The data presented in this study uncover a novel mechanism for shifting towards habitual control. Our results show that competition between DMS and DLS circuits supporting goal-directed versus habitual action, respectively, is strongly influenced by the gating of incoming transmission from OFC. By targeting molecular mechanisms in a cell-type, circuit-specific, and projection specific manner, we identify an endogenous mechanism that underlies gating shifts between goal-directed and habitual action control in the same subject.

In the same mouse, we were able to examine the loss of goal-directed action control in one context, while goal-directed control was intact in the remaining context. We used differing schedules of reinforcement to bias differential action control. Importantly, the different RI and RR schedules we used to bias different action control strategies do not produce differences in the macro-structure or micro-structure of lever-pressing, nor do they produce differences in rate across a session (i.e. interval-induced scalloping of lever-pressing) (Gremel and Costa, 2013). This within-subject approach to investigate habitual action control differs from that of an extended training approach, where in some, but not all (Colwill and Rescorla, 1985), circumstances, extensive experience with the action-outcome relationship will shift the control of a subject’s responding from initial goal-directed control to habitual control (Dickinson, 1985). Hence, in the within-subject procedure used in the current experiments, it is not the animal that is habitual or goal-directed; it is action control within a particular context that is habitual or goal-directed (Fig. 1). Although there may be subtle differences, habitual control biased through use of either extended training or RI schedules recruits the dorsolateral striatum (Barnes et al., 2005; Dias-Ferreira et al., 2009; Gremel and Costa, 2013; Hilario et al., 2012; Smith and Graybiel, 2013; Thorn et al., 2010; Yin et al., 2006; 2004; Yin and Knowlton, 2006), whereas goal-directed control biased through the use of RR schedules depends on dorsomedial striatum (Barnes et al., 2005; Dias-Ferreira et al., 2009; Gremel and Costa, 2013; Hilario et al., 2012; Smith and Graybiel, 2013; Thorn et al., 2010; Yin et al., 2006; 2005a; 2005b; 2004; Yin and Knowlton, 2006).
Using this within-subject approach, we took a chemogenetic approach to inhibit the activity of OFC-DS neurons, and found that inhibition of OFC-DS activity via activation of the Gi/o-coupled hM4D receptor specifically during probe test sessions left the subject reliant on habit circuitry where normally goal-directed circuits are favored (Fig. 3). Furthermore, specific attenuation of transmission at OFC terminals in DS prevented use of goal-directed strategies for action control (Fig. 4). This suggests that increasing Gi/o modulation at OFC terminals in DS contributes to the emergence of habitual action control. Whether this mechanism is also responsible for the shift to habitual control following extended training remains to be investigated.

Previous work had implicated an endogenous Gi/o-coupled receptor in habitual action control, the CB1 receptor (Hilário et al., 2007). Strikingly, CB1 receptor antagonism during acquisition alone prevented use of habitual action control. This suggests that CB1 receptors are important for learning-induced plasticity that is necessary for habitual control. Given the widespread expression and increasing gradient of CB1 found from medial to lateral striatum (Kano et al., 2009), it seemed likely that CB1 receptor modulation of habitual control occurred through actions in corticostriatal circuits. The majority of CB1 receptors are found on presynaptic inhibitory synapses within striatum (Mátyás et al., 2006; Uchigashima et al., 2007). The expression on these terminals likely accounts for the higher levels of expression in DLS relative to DMS. However, CB1 receptors are also found on cortical neurons projecting to striatum, where they are localized to terminals (Uchigashima et al., 2007); endocannabinoids released postsynaptically in an activity-dependent manner (Gerdeman et al., 2002; Uchigashima et al., 2007) bind these presynaptic CB1 receptors and serve as a powerful modulator of glutamate release (Gerdeman and Lovinger, 2001). We targeted these corticostriatal CB1 receptors, and found that the specific deletion of CB1 receptors in OFC neurons, and in particular OFC-DS neurons, prior to training resulted in a loss of habitual action control (Fig. 2, 5). Together, our data suggest that habitual control over an action strategy depends on Gi/o-coupled receptor mediated attenuation of a competing circuit controlling goal-directed behaviors.

The existence of parallel associative and sensorimotor circuits that control disparate action strategies suggests that perturbations to those controlling goal-directed strategies may result in an abnormal reliance on habit circuitry. The reported reduction in task-induced activation of OFC-DS circuits (Remijnse et al., 2006) and greater reliance on habitual processes (Gillan et al., 2011) observed in OCD patients raise the hypothesis that a pathology involving reduced activity of OFC-DS circuits is present in OCD. Indeed, in a genetic mouse model of OCD, an acute increase in transmission specifically at OFC-DS terminals reduced compulsive grooming behavior (Burguière et al., 2013). Additionally, chronic stimulation of OFC inputs into striatum resulted in development of a repetitive grooming pathology in wild-type mice (Ahmari et al., 2013). Further, both clinical and preclinical work have implicated dysfunction of OFC circuits in addiction disorders (Goldstein and Volkow, 2011; Schoenbaum and Shaham, 2008) that involve impaired decision-making. It is important to note that chronic administration of the main psychoactive ingredient in cannabis, Δ9-THC, also biases toward use of habitual strategies, and chronic Δ9-THC, as well as alcohol exposure, disrupts endocannabinoid modulation in DS (DePoy et al., n.d.; Nazzaro et al., 2012). Thus, loss of endocannabinoid-mediated plasticity in OFC-DS circuits could contribute to addiction (Belin et al., 2013; Goldstein and Volkow, 2011) and OCD (Burguière et al., 2015; Gillan et al., 2011). We also observed increased response (lever pressing) rates following attenuation of OFC-DS neuron activity via Gi/o-coupled hM4D receptor activation and following deletion of CB1 receptors from OFC-DS neurons; these increases were independent of any effects on outcome devaluation. 
While the increased response rates may reflect a lack of competition between circuits, they cannot contribute to action control as Gi/o-coupled hM4D receptor activation led to habitual control while CB1 deletion from OFC-DS neurons led to loss of habitual control. Further, the observed increased response rates are not due to lack of inhibitory control, as both show sensitivity to satiety. Our studies do suggest OFC-DS circuits underlie a critical component of action control that is disturbed in disorders affecting decision-making control over actions.

With an impaired habit-breaking phenotype present in numerous psychiatric disorders (Griffiths et al., 2014), and widespread use of abused drugs in the general populace, our findings suggest that therapeutic targeting of the endocannabinoid system is a viable option for restoration of goal-directed control.

Materials and Methods

All experiments were approved by the National Institute on Alcohol Abuse and Alcoholism (NIAAA) and the NIAAA Animal Care and Use Committee and done in accordance with NIH guidelines. Transgenic mice Cnr1loxP/loxP were obtained (Marsicano, 2003) and bred in-house. Male and female mice were housed in groups of 1–4, with mouse chow and water ad libitum unless stated otherwise, and kept on a 12 h light/dark cycle. All surgical, electrophysiological, and behavioral experiments were performed during the light portion of the cycle.

Deletion of CB1 receptor from OFC neurons

Adult CB1flox mice and their wild-type littermates were given stereotaxically-guided intracranial injections targeted at the OFC (from Bregma; A: 2.70 mm, M: ±1.75 mm, and V: −2.25 mm) of either AAV9-Cre-eGFP or AAV9-CamKIICre-eGFP (200 nl per side), along with AAV5-DIO-ChR2-eYFP (100 nl per side; University of Pennsylvania vector core). At least 4 weeks following viral injection, mice were trained on the within-subject instrumental task, or were used for whole-cell patch clamp physiology. Virus spread in each mouse put through behavioral procedures was examined under a fluorescence microscope (Olympus MVX10); mice with fluorescence outside the OFC inclusion area were excluded from the final data analyses (n = 9). Final group ns from a total of 3 replicates are the following: Ctl n = 12; CB1floxCre n = 13; and CB1floxCamKII Cre n = 9.

Chemogenetic inhibition of OFC-DS neurons

C57Bl/6J mice each received 2 bilateral injections, one targeting the DS and the other the OFC. In one experiment, hEF1α-eYFP-IRES-cre or hEF1α-EYFP-IRES (MIT vector core) was injected into the DS (from Bregma; A: 0.5 mm, M ± 1.25 mm, V −3.25 mm) (300 nl per side), and AAV5-DIO-hM4D-mCherry (Gene Therapy Vector Core at the University of North Carolina) into OFC (coordinates same as above) (200 nl/side). In a second experiment, mice were given an infusion of AAV5-Cre-GFP (Gene Therapy Vector Core at the University of North Carolina) (200 nl) into the DS and AAV5-DIO-hM4D-mCherry into OFC (200 nl). These mice were also implanted with bilateral cannulae (Plastics One, Inc.) targeted to the dorsal striatum (from Bregma; A: 0.5 mm, M ± 1.25 mm, V −2.75 mm). Following at least 4 weeks of recovery, mice were trained in the within-subject task. In experiments with systemic CNO administration, each day 60 min prior to devaluation testing mice were given an intraperitoneal injection of CNO (1 mg/kg; 10 ml/kg). The mice with cannulae were lightly anesthetized, dummy cannulae were removed, and injectors targeting dorsal striatum (−3.25 mm) were lowered. 15–30 min prior to the onset of the devaluation procedure, mice were given 300 nl intracranial injections of either 0.9% isotonic saline or CNO (300 μM) at a rate of 100 nl/min. Following devaluation testing, these mice also underwent an additional lever-press test. Mice were pretreated (15 min) with systemic saline or CNO (1 mg/kg, 10 ml/kg) prior to access to the previously earned outcome on a fixed ratio of 8. To confirm successful viral infection and expression of hM4D receptors, a subset of injected mice were euthanized and whole-cell patch clamp recordings were used to verify CNO-induced suppression of OFC-DS neuronal activity. 
Viral spread was observed under a fluorescence microscope (Olympus MVX10), and mice with no expression or significant expression outside of OFC inclusion boundaries (n = 5), or with misplaced cannulae or infection at the site of intracranial injection (CNO treated: n = 4; saline treated: n = 3), were excluded. The final group ns from two systemic replicates are as follows: Ctl n = 7, hM4D n = 10. The final group ns from one intracranial replicate are as follows: saline n = 6, CNO-treated n = 6.
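
The dosing figures above imply a few quantities that are easy to check by arithmetic. The sketch below works them through; the 25 g body weight is a hypothetical example value, not one reported here, and the function names are illustrative only.

```python
def solution_concentration_mg_per_ml(dose_mg_per_kg=1.0, volume_ml_per_kg=10.0):
    # 1 mg/kg delivered in 10 ml/kg implies a 0.1 mg/ml working solution.
    return dose_mg_per_kg / volume_ml_per_kg


def injection_volume_ml(body_weight_g, volume_ml_per_kg=10.0):
    # Convert grams to kilograms, then scale by the per-kg injection volume.
    return body_weight_g / 1000.0 * volume_ml_per_kg


def infusion_duration_min(volume_nl=300.0, rate_nl_per_min=100.0):
    # 300 nl at 100 nl/min takes 3 min per infusion.
    return volume_nl / rate_nl_per_min


def infused_amount_pmol(volume_nl=300.0, concentration_um=300.0):
    # nl x umol/L = 1e-9 L x 1e-6 mol/L = 1e-15 mol (fmol); /1000 gives pmol.
    return volume_nl * concentration_um / 1000.0


example_weight_g = 25.0  # hypothetical mouse weight, for illustration only
ip_volume_ml = injection_volume_ml(example_weight_g)                # 0.25 ml
ip_dose_mg = ip_volume_ml * solution_concentration_mg_per_ml()      # 0.025 mg
```

For the intracranial condition, the same arithmetic gives 90 pmol of CNO per 300 nl infusion of the 300 μM solution.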

Deletion of CB1 receptors from OFC-DS neurons

To selectively delete CB1 receptors in OFC-DS neurons in adult mice, we bilaterally injected CB1flox and wild-type littermates with hEF1α-eYFP-IRES-flpo (MIT vector core; 300 nl/side) into DS, and AAV8-Ef1a-FD-mCherry-p2A-Cre (courtesy of Dr. Deisseroth, Stanford University; 200 nl per side) and AAV5-DIO-ChR2-eYFP (University of Pennsylvania Vector Core; 100 nl/side) into OFC. At least 6 weeks following viral injection, mice were trained on the within-subject procedure. Histological examination was done to confirm viral expression, and mice with no mCherry/YFP expression were excluded from the final behavioral analyses (n = 9). A subset of mice was used for ex vivo physiological assessment via whole-cell patch clamp. Final behavioral group ns from two replicates are the following: Ctl n = 7; OFC-DS CB1flox mice n = 10.

Lever press training

All behavioral training and testing took place as previously described (Gremel and Costa, 2013). In brief, mice were placed in operant chambers in sound attenuating boxes (Med-Associates, St. Albans, VT) in which they pressed a single lever (left or right) in a self-paced manner for an outcome of either regular “chow” pellets (20 mg pellet per reinforcer, Bio-Serve formula F05684) or sucrose solution (20–30 μl of 20% solution per reinforcer). The other outcome was provided later in their home-cage and used as a control for general satiation in the outcome devaluation test. Before training commenced, mice were food restricted to 90% of their baseline weight at which they were maintained for the duration of experimental procedures.

Training was conducted as follows: each day each mouse was trained in two separate operant chambers distinguished by contextual cues [i.e. black and white vertical striped laminated paper on chamber walls (3.2 mm wide stripes) or clear plexiglass chamber walls]. Upon completion of training in one context, mice were immediately trained in the remaining context. For each mouse, the order of schedule exposure, lever position and the outcome obtained upon lever pressing were kept constant across contexts. However, the context, schedule order, lever position, and outcome earned were counterbalanced across mice. Each training session commenced with illumination of the house light and lever extension, and ended following schedule completion or after 60 min with the lever retracting and the house-light turning off. On the first day, mice were trained to approach the food magazine (no lever present) in each context on a random time (RT) schedule, with a reinforcer delivered on average every 60 sec for a total of 15 min. Next, mice were trained in each context on continuous reinforcement schedules (CRF or FR1), where every lever-press was reinforced, with the possible number of earned reinforcers increasing across training days (CRF5, 15, 30). In the absence of any predictive cue signaling reward delivery, mice acquired lever-press behavior within 3 days. After acquiring lever-press behavior, mice were trained on random interval (RI) and random ratio (RR) schedules of reinforcement (Adams, 1982; Derusso et al., 2010) with schedules differentiated by context, with the session ending in each context after 15 reinforcers were earned or after 60 min had elapsed. Mice initially pressed under RI30 (on average one reinforcer following the first press after an average of 30 sec) and RR10 (on average one reinforcer every 10 lever presses) schedules for two days, followed by four days of RI60 and RR20 training.
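
To make the difference between the two schedule types concrete, the following is a minimal simulation sketch. It assumes one common way of implementing such schedules (a fixed per-press probability for RR, and a per-second arming probability for RI), which is not necessarily how the operant software used here was programmed. All function names are hypothetical, presses are resolved to whole seconds, and the 15-reinforcer session cap is not modeled.

```python
import random


def random_ratio_rewards(n_presses, ratio, rng):
    """Count reinforcers on an RR schedule: each press pays off with p = 1/ratio."""
    p = 1.0 / ratio
    return sum(1 for _ in range(n_presses) if rng.random() < p)


def random_interval_rewards(press_times_s, mean_interval_s, session_s, rng):
    """Count reinforcers on an RI schedule, stepped in 1-s ticks.

    Each second an un-armed schedule arms with p = 1/mean_interval_s; the
    first press at or after arming collects the reinforcer and disarms it.
    """
    press_ticks = {int(t) for t in press_times_s}  # presses resolved to whole seconds
    p = 1.0 / mean_interval_s
    armed = False
    earned = 0
    for tick in range(session_s):
        if not armed and rng.random() < p:
            armed = True
        if armed and tick in press_ticks:
            earned += 1
            armed = False
    return earned


rng = random.Random(0)  # fixed seed so the sketch is repeatable
slow = [float(t) for t in range(0, 3600, 10)]  # one press every 10 s
fast = [float(t) for t in range(0, 3600, 1)]   # one press every second
rr_slow = random_ratio_rewards(len(slow), 20, rng)
rr_fast = random_ratio_rewards(len(fast), 20, rng)
ri_slow = random_interval_rewards(slow, 60, 3600, rng)
ri_fast = random_interval_rewards(fast, 60, 3600, rng)
```

Under these assumptions, reinforcers earned on RR20 scale roughly with the number of presses, whereas on RI60 they are bounded by elapsed time, so pressing ten times faster yields little additional reward.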

Outcome devaluation testing occurred across two consecutive days. In brief, on the valued day, mice had ad libitum access to the home-cage outcome for 1 h before serial brief non-reinforced test sessions in the previous RI and RR training contexts. On the devalued day, mice were given 1 h ad libitum access to the outcome previously earned by lever-press, and then underwent serial non-reinforced test sessions in each training context. Pre-feeding took place in a separate cage to which mice were previously habituated, and the amount consumed was recorded. Order of context exposure during testing was the same as training exposure, with order of devaluation day counterbalanced across mice. Tests in each context were 5 min in duration.

A subset of mice were given an additional probe lever-press test following outcome devaluation testing. Food restricted mice were pretreated (15 min) with CNO (n = 6) or saline (n = 6), and then placed in the first of the previously-trained operant contexts (counterbalanced across RI and RR within each treatment group). Mice had access to the training lever, but now the reinforcement schedule was a fixed ratio 8 (every 8th lever press produced the outcome). Mice could earn unlimited rewards within a 1 h period. Lever presses, rewards earned, and head entry behavior were analyzed.

Locomotor activity in a novel cage

Ctl and hM4D mice were injected with CNO (1 mg/kg, 10 ml/kg) and placed in a novel polycarbonate cage for 60 min. Horizontal activity was detected as infrared beam crosses (1 inch spacing, 10 beams per cage) within 10-second bins using Opto M3 activity monitors (Columbus Instruments, Columbus, OH). Once the trial was over, mice were immediately returned to their home cage.
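
The binning step can be sketched as follows. The input format (a list of beam-break timestamps in seconds from trial start) is an assumption made for illustration; the activity monitor's own export format is not described here, and the function name is hypothetical.

```python
def bin_beam_breaks(timestamps_s, bin_s=10, session_s=3600):
    """Count beam breaks in consecutive bins of bin_s seconds over one trial."""
    n_bins = session_s // bin_s
    counts = [0] * n_bins
    for t in timestamps_s:
        if 0 <= t < session_s:  # ignore events outside the 60 min trial
            counts[int(t // bin_s)] += 1
    return counts


# Hypothetical timestamps: two breaks in the first bin, one in the second,
# one in the last bin, and one after the trial has ended (dropped).
example_counts = bin_beam_breaks([0.5, 9.9, 10.0, 3599.9, 3600.0])
```

A 60 min trial at 10 s resolution yields 360 bins per mouse, which can then be summed or compared across groups.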

Slice preparation and electrophysiology

Mice were anesthetized with isoflurane and decapitated. Brains were removed and placed in ice-cold cutting solution containing in mM: sucrose 194, NaCl 30, KCl 4.5, NaHCO3 26, NaH2PO4 1.2, glucose 10. Coronal brain slices (250 μm thick) containing the OFC or DS were obtained using a vibrating blade microtome (Leica VT1200S) and recovered in aerated ACSF containing in mM: NaCl 124, KCl 4.5, MgCl2 1, NaHCO3 26, NaH2PO4 1.2, D-glucose 10, CaCl2 2 at 32°C for 30 min. Slices were then placed at room temperature until experimental use. Whole-cell patch clamp recordings were performed at 30–32°C ± 1°C (controlled by an Automatic Temperature Controller, Warner Instruments, Hamden, CT). Neurons in slices were visualized with an upright microscope using a 40x (0.8 n.a.) water immersion objective. Real-time images were displayed on a video monitor, which aided navigation and placement of recording pipettes. 2–4 MΩ patch pipettes were pulled from borosilicate glass capillaries (1.5 mm outer diameter, 0.86 mm inner diameter; World Precision Instruments, Sarasota, FL) and filled with internal solution. Two internal solutions were used. The K-based internal contained in mM: K-gluconate 126, KCl 4, HEPES 10, Mg-ATP 4, Na-GTP 0.3, Phosphocreatine 10. The Cs-based internal contained in mM: CsCl 150, HEPES 10, MgCl2 2, Na-GTP 0.3, Mg-ATP 3, BAPTA-4Cs 0.2. Recordings were made using a Multiclamp 700B amplifier (Molecular Devices, Foster City, CA). Membrane currents were filtered at 2 kHz, digitized using a Digidata 1322A at 10 kHz, displayed and saved using Clampex v9.2, and analyzed with Clampfit v9.2 (Molecular Devices). Statistical analysis was performed with GraphPad Prism 6 software (GraphPad Software, Inc. La Jolla, CA). To isolate EPSCs, picrotoxin (100 μM, Sigma) was added to the extracellular solution. Series resistance was less than 25 MΩ and cells with changes of more than 20% were excluded. 
A 10 min application of WIN55,212 (1μM) was used to examine presence or absence of CB1-mediated synaptic depression. For opto-activation experiments, oEPSCs in MSNs and oEPSPs in OFC neurons were evoked in brain slices using 470-nm blue light (5-ms exposure time) delivered via field illumination using a High-Power LED Source (LED4D067, Thor Labs). Light intensity was adjusted to produce oEPSCs of 100–400-pA magnitude (<100 mW). oEPSCs were evoked once per minute. Imaging of brain slices was performed using an Olympus MVX10 microscope (Olympus Corporation of America).
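
The Statistical analyses section describes how these recordings are quantified: 10 oEPSCs before WIN55,212 are averaged as baseline, 10 oEPSCs after drug application are averaged as the post-drug value, and the latter is expressed relative to the former. A minimal sketch of that normalization follows; the amplitudes are hypothetical, the function name is illustrative, and taking absolute values (so that inward currents recorded as negative numbers give a positive ratio) is an assumption of the sketch.

```python
from statistics import mean


def relative_oepsc_amplitude(baseline_pa, post_drug_pa):
    """Return mean post-drug amplitude as a fraction of mean baseline amplitude."""
    if not baseline_pa or not post_drug_pa:
        raise ValueError("both sweep lists must be non-empty")
    baseline = mean([abs(a) for a in baseline_pa])
    post = mean([abs(a) for a in post_drug_pa])
    return post / baseline


# Hypothetical cell: 10 baseline sweeps at -200 pA, 10 post-drug sweeps at -100 pA.
example_ratio = relative_oepsc_amplitude([-200.0] * 10, [-100.0] * 10)
```

A ratio near 1 indicates no drug-induced depression, while values below 1 indicate a reduction in oEPSC amplitude.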

Statistical analyses

The α level was set at 0.05 for all analyses, unless otherwise indicated. Initial analyses showed normal distributions for all behavioral data. For all behavioral analyses, lever presses, lever press rate, rewards earned, and head entries were analyzed using repeated measures ANOVA, with post-hoc analyses performed using Bonferroni-corrected paired t-tests where appropriate. For outcome devaluation testing analyses, two-way ANOVA (Devaluation state x Schedule) within each Group was used to evaluate differences in lever-press and consumption behavior with post-hoc analyses performed using Bonferroni-corrected paired t-tests where appropriate. To investigate the within-subject distribution of lever-presses between Valued and Devalued states, we normalized lever-presses for Valued and Devalued states to total lever-pressing (Valued + Devalued) in each context. We then conducted planned one-sample t-tests on the normalized data to examine whether the distribution of lever presses between Valued and Devalued states for each schedule differed from chance (0.5), with a value of 0.5 reflecting the same level of lever pressing in Valued and Devalued states. Additionally, we examined the magnitude of outcome devaluation by creating a devaluation index ((Lever Presses Valued state − Lever Presses Devalued state)/(Lever Presses Valued state + Lever Presses Devalued state)) for each mouse in the RR and RI contexts. We then conducted paired t-tests to examine differences in the magnitude of devaluation between RI and RR contexts for each group of mice. For slice experiments, 10 optically evoked EPSCs recorded prior to WIN administration were averaged to calculate the baseline amplitude, and 10 oEPSCs recorded 10 min following the completion of drug application were averaged to determine the post-drug amplitude. 
Post-drug amplitudes were normalized to pre-drug amplitudes and an unpaired Student’s t test of the normalized amplitudes from WT and CB1flox animals was performed.
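
The behavioral measures described above can be sketched in a few lines. This is an illustrative standard-library version, not the GraphPad Prism workflow used for the reported statistics; it returns t statistics and degrees of freedom only (no p-values), the function names are hypothetical, and it assumes each mouse pressed at least once across the two test days in a given context.

```python
from math import sqrt
from statistics import mean, stdev


def normalized_presses(valued, devalued):
    """Fraction of total presses made in the Valued and Devalued states."""
    total = valued + devalued
    return valued / total, devalued / total


def devaluation_index(valued, devalued):
    """(Valued - Devalued) / (Valued + Devalued); 0 means no devaluation effect."""
    return (valued - devalued) / (valued + devalued)


def one_sample_t(values, mu=0.5):
    """t statistic and degrees of freedom for a test against mu (chance = 0.5)."""
    n = len(values)
    return (mean(values) - mu) / (stdev(values) / sqrt(n)), n - 1


def paired_t(a, b):
    """Paired t statistic, computed as a one-sample test on the differences."""
    diffs = [x - y for x, y in zip(a, b)]
    return one_sample_t(diffs, mu=0.0)


# Hypothetical mouse: 30 presses on the valued day, 10 on the devalued day.
example_fractions = normalized_presses(30, 10)   # (0.75, 0.25)
example_index = devaluation_index(30, 10)        # 0.5
```

With per-mouse indices for each context in two lists, `paired_t(rr_indices, ri_indices)` would give the within-subject comparison between RR and RI contexts.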

Supplementary Material

Acknowledgments

We thank Matthew Pava for assistance in the electrophysiological experiments and Emily Baltz for assistance with intracranial surgeries. We thank Drs. Beat Lutz and Giovanni Marsicano for providing the Cnr1loxP/loxP mice. The DREADD virus was provided by the Gene Therapy and Vector Core at the University of North Carolina. This research was supported by the NIAAA Division of Intramural Clinical and Biological Research, ERA-NET; European Research Council (COG 617142) and HHMI (IEC 55007415) grants to R.M.C., and a Pathway to Independence Award R00 AA021780 and NARSAD Young Investigator Grant from the Brain & Behavior Research Foundation to C.M.G.

Footnotes

Author contributions. C.M.G., D.M.L., and R.M.C. designed the study, interpreted results, and wrote the manuscript. C.M.G. coordinated the experiments. C.M.G. performed surgical procedures and behavioral experiments. C.M.G., J.C., and B.A. performed in vitro electrophysiological experiments. C.M.G. and J.C. analyzed data. G.L. made mouse constructs. R.N., C.R., and K.D. supplied viruses.

Competing interests. The authors have no competing financial interests.

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

References

  1. Adams CD. Variations in the sensitivity of instrumental responding to reinforcer devaluation. The Quarterly journal of experimental psychology B, Comparative and physiological psychology. 1982;34:77–98. [Google Scholar]
  2. Adams CD, Dickinson A. Instrumental Responding Following Reinforcer Devaluation. Q J Exp Psychol-B. 1981;33:109–121. [Google Scholar]
  3. Ahmari SE, Spellman T, Douglass NL, Kheirbek MA, Simpson HB, Deisseroth K, Gordon JA, Hen R. Repeated Cortico-Striatal Stimulation Generates Persistent OCD-Like Behavior. Science. 2013;340:1234–1239. doi: 10.1126/science.1234733. [DOI] [PMC free article] [PubMed] [Google Scholar]
  4. Armbruster BN, Li X, Pausch MH, Herlitze S, Roth BL. Evolving the lock to fit the key to create a family of G protein-coupled receptors potently activated by an inert ligand. n.d doi: 10.1073/pnas.0700293104. pnas.org. [DOI] [PMC free article] [PubMed]
  5. Barnes TD, Kubota Y, Hu D, Jin DZ, Graybiel AM. Activity of striatal neurons reflects dynamic encoding and recoding of procedural memories. Nature. 2005 doi: 10.1038/nature04053. Published online: 18 June 2008; | doi:10.1038/nature06993 437, 11581038/nature06993 437 1158 1161. [DOI] [PubMed] [Google Scholar]
  6. Belin D, Belin-Rauscent A, Murray JE, Everitt BJ. Addiction: failure of control over maladaptive incentive habits. - PubMed - NCBI. Curr Opin Neurobiol. 2013;23:564–572. doi: 10.1016/j.conb.2013.01.025. [DOI] [PubMed] [Google Scholar]
  7. Belin D, Everitt BJ. Cocaine seeking habits depend upon dopamine-dependent serial connectivity linking the ventral with the dorsal striatum. Neuron. 2008;57:432–441. doi: 10.1016/j.neuron.2007.12.019. [DOI] [PubMed] [Google Scholar]
  8. Bradfield LA, Dezfouli A, van Holstein M, Chieng B, Balleine BW. Medial Orbitofrontal Cortex Mediates Outcome Retrieval in Partially Observable Task Situations. Neuron. 2015 doi: 10.1016/j.neuron.2015.10.044. [DOI] [PubMed] [Google Scholar]
  9. Burguière E, Monteiro P, Feng G, Graybiel AM. Optogenetic stimulation of lateral orbitofronto-striatal pathway suppresses compulsive behaviors. Science. 2013;340:1243–1246. doi: 10.1126/science.1232380. [DOI] [PMC free article] [PubMed] [Google Scholar]
  10. Burguière E, Monteiro P, Mallet L, Feng G, Graybiel AM. Striatal circuits, habits, and implications for obsessive–compulsive disorder. Curr Opin Neurobiol. 2015;30:59–65. doi: 10.1016/j.conb.2014.08.008. [DOI] [PMC free article] [PubMed] [Google Scholar]
  11. Colwill RM, Rescorla RA. Postconditioning devaluation of a reinforcer affects instrumental responding. J Exp Psychol Anim Behav Process. 1985;11:120–132. doi: 10.1037/0097-7403.11.1.120. [DOI] [Google Scholar]
  12. Corbit LH, Nie H, Janak PH. Habitual alcohol seeking: time course and the contribution of subregions of the dorsal striatum. Biological Psychiatry. 2012;72:389–395. doi: 10.1016/j.biopsych.2012.02.024. [DOI] [PMC free article] [PubMed] [Google Scholar]
  13. DePoy L, Daut R, Brigman JL, MacPherson K, Crowley N, Gunduz-Cinar O, Pickens CL, Cinar R, Saksida LM, Kunos G, Lovinger DM, Bussey TJ, Camp MC, Holmes A. Chronic alcohol produces neuroadaptations to prime dorsal striatal learning. n.d doi: 10.1073/pnas.1308198110. pnas.org. [DOI] [PMC free article] [PubMed]
  14. Derusso AL, Fan D, Gupta J, Shelest O, Costa RM, Yin HH. Instrumental uncertainty as a determinant of behavior under interval schedules of reinforcement. Frontiers in integrative neuroscience. 2010:4. doi: 10.3389/fnint.2010.00017. [DOI] [PMC free article] [PubMed] [Google Scholar]
  15. Dias-Ferreira E, Sousa JC, Melo I, Morgado P, Mesquita AR, Cerqueira JJ, Costa RM, Sousa N. Chronic Stress Causes Frontostriatal Reorganization and Affects Decision-Making. Science. 2009;325:621–625. doi: 10.1126/science.1171203. [DOI] [PubMed] [Google Scholar]
  16. Dickinson A. Actions and Habits: The Development of Behavioural Autonomy. Philos Trans R Soc Lond, B, Biol Sci. 1985;308:67–78. doi: 10.1098/rstb.1985.0010. [DOI] [Google Scholar]
  17. Dickinson A, Wood N, Smith JW. Alcohol seeking by rats: action or habit? The Quarterly journal of experimental psychology B, Comparative and physiological psychology. 2002;55:331–348. doi: 10.1080/0272499024400016. [DOI] [PubMed] [Google Scholar]
  18. Fenno LE, Mattis J, Ramakrishnan C, Hyun M, Lee SY, He M, Tucciarone J, Selimbeyoglu A, Berndt A, Grosenick L, Zalocusky KA, Bernstein H, Swanson H, Perry C, Diester I, Boyce FM, Bass CE, Neve R, Huang ZJ, Deisseroth K. Targeting cells with single vectors using multiple-feature Boolean logic. Nature Methods. 2014;11:763–772. doi: 10.1038/nmeth.2996. [DOI] [PMC free article] [PubMed] [Google Scholar]
  19. Gerdeman G, Lovinger DM. CB1 cannabinoid receptor inhibits synaptic release of glutamate in rat dorsolateral striatum. Journal of Neurophysiology. 2001;85:468–471. doi: 10.1152/jn.2001.85.1.468. [DOI] [PubMed] [Google Scholar]
  20. Gerdeman GL, Ronesi J, Lovinger DM. Postsynaptic endocannabinoid release is critical to long-term depression in the striatum. Nat Neurosci. 2002;5:446–451. doi: 10.1038/nn832. [DOI] [PubMed] [Google Scholar]
  21. Gillan CM, Papmeyer M, Morein-Zamir S, Sahakian BJ, Fineberg NA, Robbins TW, De Wit S. Disruption in the balance between goal-directed behavior and habit learning in obsessive-compulsive disorder. Am J Psychiatry. 2011;168:718–726. doi: 10.1176/appi.ajp.2011.10071062. [DOI] [PMC free article] [PubMed] [Google Scholar]
  22. Goldstein RZ, Volkow ND. Dysfunction of the prefrontal cortex in addiction: neuroimaging findings and clinical implications. Nature Reviews Neuroscience. 2011;12:652–669. doi: 10.1038/nrn3119. [DOI] [PMC free article] [PubMed] [Google Scholar]
  23. Gourley SL, Olevska A, Zimmermann KS, Ressler KJ, DiLeone RJ, Taylor JR. The orbitofrontal cortex regulates outcome-based decision-making via the lateral striatum. Eur J Neurosci. 2013;38:2382–2388. doi: 10.1111/ejn.12239. [DOI] [PMC free article] [PubMed] [Google Scholar]
  24. Graybiel AM. Habits, rituals, and the evaluative brain. Annu Rev Neurosci. 2008;31:359–387. doi: 10.1146/annurev.neuro.29.051605.112851. [DOI] [PubMed] [Google Scholar]
  25. Gremel CM, Costa RM. Orbitofrontal and striatal circuits dynamically encode the shift between goal-directed and habitual actions. Nat Commun. 2013;4:2264. doi: 10.1038/ncomms3264. [DOI] [PMC free article] [PubMed] [Google Scholar]
  26. Griffiths KR, Morris RW, Balleine BW. Translational studies of goal-directed action as a framework for classifying deficits across psychiatric disorders. Front Syst Neurosci. 2014:8. doi: 10.3389/fnsys.2014.00101. [DOI] [PMC free article] [PubMed] [Google Scholar]
  27. Hilario M, Holloway T, Jin X, Costa RM. Different dorsal striatum circuits mediate action discrimination and action generalization. Eur J Neurosci. 2012;35:1105–1114. doi: 10.1111/j.1460-9568.2012.08073.x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  28. Hilário MRF, Clouse E, Yin HH, Costa RM. Endocannabinoid signaling is critical for habit formation. Frontiers in integrative neuroscience. 2007;1:6. doi: 10.3389/neuro.07.006.2007. [DOI] [PMC free article] [PubMed] [Google Scholar]
  29. Hoover WB, Vertes RP. Projections of the medial orbital and ventral orbital cortex in the rat. J Comp Neurol. 2011;519:3766–3801. doi: 10.1002/cne.22733. [DOI] [PubMed] [Google Scholar]
  30. Kano M, Ohno-Shosaku T, Hashimotodani Y, Uchigashima M, Watanabe M. Endocannabinoid-Mediated Control of Synaptic Transmission. Physiol Rev. 2009;89:309–380. doi: 10.1152/physrev.00019.2008. [DOI] [PubMed] [Google Scholar]
  31. Killcross S, Coutureau E. Coordination of actions and habits in the medial prefrontal cortex of rats. Cereb Cortex. 2003;13:400–408. doi: 10.1093/cercor/13.4.400. [DOI] [PubMed] [Google Scholar]
  32. Lovinger DM. Neurotransmitter roles in synaptic modulation, plasticity and learning in the dorsal striatum. Neuropharmacology. 2010;58:951–961. doi: 10.1016/j.neuropharm.2010.01.008. [DOI] [PMC free article] [PubMed] [Google Scholar]
  33. Marsicano G. CB1 Cannabinoid Receptors and On-Demand Defense Against Excitotoxicity. Science. 2003;302:84–88. doi: 10.1126/science.1088208. [DOI] [PubMed] [Google Scholar]
  34. Mátyás F, Yanovsky Y, Mackie K, Kelsch W, Misgeld U, Freund TF. Subcellular localization of type 1 cannabinoid receptors in the rat basal ganglia. Neuroscience. 2006;137:337–361. doi: 10.1016/j.neuroscience.2005.09.005. [DOI] [PubMed] [Google Scholar]
  35. Nazzaro C, Greco B, Cerovic M, Baxter P, Rubino T, Trusel M, Parolaro D, Tkatch T, Benfenati F, Pedarzani P, Tonini R. SK channel modulation rescues striatal plasticity and control over habit in cannabinoid tolerance. Nat Neurosci. 2012;15:284–293. doi: 10.1038/nn.3022. [DOI] [PubMed] [Google Scholar]
  36. Ostlund SB, Balleine BW. The contribution of orbitofrontal cortex to action selection. Ann N Y Acad Sci. 2007;1121:174–192. doi: 10.1196/annals.1401.033. [DOI] [PubMed] [Google Scholar]
  37. Pan WX, Mao T, Dudman JT. Inputs to the Dorsal Striatum of the Mouse Reflect the Parallel Circuit Architecture of the Forebrain. Front Neuroanat. 2010;4:147. doi: 10.3389/fnana.2010.00147. [DOI] [PMC free article] [PubMed] [Google Scholar]
  38. Remijnse PL, Nielen MMA, van Balkom AJLM, Cath DC, van Oppen P, Uylings HBM, Veltman DJ. Reduced orbitofrontal-striatal activity on a reversal learning task in obsessive-compulsive disorder. Arch Gen Psychiatry. 2006;63:1225–1236. doi: 10.1001/archpsyc.63.11.1225. [DOI] [PubMed] [Google Scholar]
  39. Rhodes SEV, Murray EA. Differential effects of amygdala, orbital prefrontal cortex, and prelimbic cortex lesions on goal-directed behavior in rhesus macaques. J Neurosci. 2013;33:3380–3389. doi: 10.1523/JNEUROSCI.4374-12.2013. [DOI] [PMC free article] [PubMed] [Google Scholar]
  40. Rolls ET, Everitt BJ, Roberts A. The Orbitofrontal Cortex [and Discussion]. Philos Trans R Soc Lond B Biol Sci. 1996;351:1433–1444. doi: 10.1098/rstb.1996.0128. [DOI] [PubMed] [Google Scholar]
  41. Rothermel M, Brunert D, Zabawa C, Díaz-Quesada M, Wachowiak M. Transgene expression in target-defined neuron populations mediated by retrograde infection with adeno-associated viral vectors. J Neurosci. 2013;33:15195–15206. doi: 10.1523/JNEUROSCI.1618-13.2013. [DOI] [PMC free article] [PubMed] [Google Scholar]
  42. Samson HH, Cunningham CL, Czachowski CL, Chappell A, Legg B, Shannon E. Devaluation of ethanol reinforcement. Alcohol. 2004;32:203–212. doi: 10.1016/j.alcohol.2004.02.002. [DOI] [PubMed] [Google Scholar]
  43. Schilman EA, Uylings HBM, Galis-de Graaf Y, Joel D, Groenewegen HJ. The orbital cortex in rats topographically projects to central parts of the caudate-putamen complex. Neurosci Lett. 2008;432:40–45. doi: 10.1016/j.neulet.2007.12.024. [DOI] [PubMed] [Google Scholar]
  44. Schoenbaum G, Shaham Y. The Role of Orbitofrontal Cortex in Drug Addiction: A Review of Preclinical Studies. Biol Psychiatry. 2008;63:256–262. doi: 10.1016/j.biopsych.2007.06.003. [DOI] [PMC free article] [PubMed] [Google Scholar]
  45. Shepherd GMG. Corticostriatal connectivity and its role in disease. Nat Rev Neurosci. 2013;14:278–291. doi: 10.1038/nrn3469. [DOI] [PMC free article] [PubMed] [Google Scholar]
  46. Smith KS, Graybiel AM. A Dual Operator View of Habitual Behavior Reflecting Cortical and Striatal Dynamics. Neuron. 2013 doi: 10.1016/j.neuron.2013.05.038. [DOI] [PMC free article] [PubMed] [Google Scholar]
  47. Stachniak TJ, Ghosh A, Sternson SM. Chemogenetic Synaptic Silencing of Neural Circuits Localizes a Hypothalamus→Midbrain Pathway for Feeding Behavior. Neuron. 2014 doi: 10.1016/j.neuron.2014.04.008. [DOI] [PMC free article] [PubMed] [Google Scholar]
  48. Stalnaker TA, Calhoon GG, Ogawa M, Roesch MR, Schoenbaum G. Neural correlates of stimulus-response and response-outcome associations in dorsolateral versus dorsomedial striatum. Front Integr Neurosci. 2010;4:12. doi: 10.3389/fnint.2010.00012. [DOI] [PMC free article] [PubMed] [Google Scholar]
  49. Stalnaker TA, Cooch NK, Schoenbaum G. What the orbitofrontal cortex does not do. Nat Neurosci. 2015;18:620–627. doi: 10.1038/nn.3982. [DOI] [PMC free article] [PubMed] [Google Scholar]
  50. Thorn CA, Atallah H, Howe M, Graybiel AM. Differential Dynamics of Activity Changes in Dorsolateral and Dorsomedial Striatal Loops during Learning. Neuron. 2010;66:781–795. doi: 10.1016/j.neuron.2010.04.036. [DOI] [PMC free article] [PubMed] [Google Scholar]
  51. Uchigashima M, Narushima M, Fukaya M, Katona I, Kano M, Watanabe M. Subcellular arrangement of molecules for 2-arachidonoyl-glycerol-mediated retrograde signaling and its physiological contribution to synaptic modulation in the striatum. J Neurosci. 2007;27:3663–3676. doi: 10.1523/JNEUROSCI.0448-07.2007. [DOI] [PMC free article] [PubMed] [Google Scholar]
  52. Wall NR, De La Parra M, Callaway EM, Kreitzer AC. Differential innervation of direct- and indirect-pathway striatal projection neurons. Neuron. 2013;79:347–360. doi: 10.1016/j.neuron.2013.05.014. [DOI] [PMC free article] [PubMed] [Google Scholar]
  53. Yin HH, Knowlton BJ. The role of the basal ganglia in habit formation. Nat Rev Neurosci. 2006;7:464–476. doi: 10.1038/nrn1919. [DOI] [PubMed] [Google Scholar]
  54. Yin HH, Knowlton BJ, Balleine BW. Inactivation of dorsolateral striatum enhances sensitivity to changes in the action-outcome contingency in instrumental conditioning. Behav Brain Res. 2006;166:189–196. doi: 10.1016/j.bbr.2005.07.012. [DOI] [PubMed] [Google Scholar]
  55. Yin HH, Knowlton BJ, Balleine BW. Blockade of NMDA receptors in the dorsomedial striatum prevents action-outcome learning in instrumental conditioning. Eur J Neurosci. 2005a;22:505–512. doi: 10.1111/j.1460-9568.2005.04219.x. [DOI] [PubMed] [Google Scholar]
  56. Yin HH, Knowlton BJ, Balleine BW. Lesions of dorsolateral striatum preserve outcome expectancy but disrupt habit formation in instrumental learning. Eur J Neurosci. 2004;19:181–189. doi: 10.1111/j.1460-9568.2004.03095.x. [DOI] [PubMed] [Google Scholar]
  57. Yin HH, Ostlund SB, Knowlton BJ, Balleine BW. The role of the dorsomedial striatum in instrumental conditioning. Eur J Neurosci. 2005b;22:513–523. doi: 10.1111/j.1460-9568.2005.04218.x. [DOI] [PubMed] [Google Scholar]
  58. Znamenskiy P, Zador AM. Corticostriatal neurons in auditory cortex drive decisions during auditory discrimination. Nature. 2013;497:482–485. doi: 10.1038/nature12077. [DOI] [PMC free article] [PubMed] [Google Scholar]

