Abstract
Emerging data indicate that endocannabinoid signaling is critical to the formation of habitual behavior. Previous work demonstrated that antagonism of cannabinoid receptor type 1 (CB1R) with AM251 during operant training impairs habit formation, but it is not known if this behavioral effect is specific to disrupted signaling of the endocannabinoid ligands anandamide or 2-arachidonoyl glycerol (2-AG). Here, we used selective pharmacological compounds during operant training to determine how fatty acid amide hydrolase (FAAH) inhibition, which increases anandamide (and other N-acylethanolamines), or monoacylglycerol lipase (MAGL) inhibition, which increases 2-AG levels, affects the formation of habitual behaviors in mice using a food-reinforced contingency degradation procedure. We found, contrary to our hypothesis, that inhibition of FAAH and of MAGL disrupted the formation of habits. Next, AM251 was administered during training to verify that impaired habit formation could be assessed using contingency degradation. AM251-exposed mice responded at lower rates during training and at higher rates in the test. To understand the inconsistency with published data, we performed a proof-of-principle dose-response experiment to compare the effects on response rates of AM251 in our vehicle-solution with those of AM251 in the published vehicle-suspension. We found consistent reductions in response rate with increasing doses of AM251 in solution and an inconsistent dose-response relationship with AM251 in suspension. Together, our data suggest that further characterization of the role of CB1R signaling in the formation of habitual responding is warranted and that augmenting endocannabinoids may have clinical utility for preventing aberrant habit formation such as that hypothesized to occur in substance use disorders.
Keywords: Contingency degradation, variable interval, habit, anandamide, 2-arachidonoyl glycerol, CB1 receptors
Introduction
Research efforts in diverse disciplines across cognitive science, psychology, and neuroscience into motivated behavior have repeatedly described two types of adaptive behavioral control – goal-directed, and habitual (Balleine and Dickinson, 1998; Barker and Taylor, 2014; Dolan and Dayan, 2013; Gourley and Taylor, 2016; Sutton and Barto, 1998). Habits are reflexive actions that are automatically performed in response to antecedent environmental stimuli, and as such, are insensitive to changes in the value of the outcome. In contrast, goal-directed actions are those performed in order to achieve a valued outcome, reflecting knowledge about the contingency between the action and the outcome. Optimal behavior is thought to reflect a balance of these two processes (Voon et al., 2017). There is evidence, however, that disruptions in this balance may lead to enhanced habitual behavior in humans with different mental disorders including obsessive-compulsive disorder, substance use disorders, binge eating disorder, and behavioral addictions (Ersche et al., 2016; McKim et al., 2016; Sebold et al., 2014; Voon et al., 2015). Elucidation of the neurobiological mechanisms that underlie the transition from goal-directed to habitual behavioral control could provide novel insight into the pathophysiology of these disorders (Barker and Taylor, 2014; Malvaez and Wassum, 2018; Torregrossa and Taylor, 2016).
The formation of habits depends on the neuromodulatory functions of endocannabinoids (Gremel et al., 2016; Hilario et al., 2007). The endocannabinoid system primarily signals in neurons through the cannabinoid receptor type 1 (CB1), which is widely expressed throughout the brain (Herkenham et al., 1990). Transgenic mice with global or circuit-specific CB1 receptor knockout have impaired habit learning, i.e., preserved goal-directed behavior following a habit-forming training paradigm (Gremel et al., 2016; Hilario et al., 2007), suggesting that CB1 receptors may be involved in the formation and/or expression of habit. Pharmacological evidence suggests that CB1 receptors are important for both processes. For example, administration of a CB1 receptor inverse agonist during operant training prevents the formation of habitual behaviors in mice (Hilario et al., 2007). Chronic administration of THC after operant training increases the expression of habitual behavior (Nazzaro et al., 2012), which may be associated with the reduced CB1 receptor binding that has been observed in individuals with cannabis use disorder (Ceccarini et al., 2015; Hirvonen et al., 2012). Additionally, previous work from our group has demonstrated bi-directional effects of CB1 receptor agonists and antagonists on the expression of food habits, increasing and decreasing expression respectively (Gianessi et al., 2019). Because CB1 receptor availability is lower in individuals who are dependent on substances other than THC (Ceccarini et al., 2014; Hirvonen et al., 2013, 2018), endocannabinoid dysregulation may be the mechanism by which habitual behaviors emerge in individuals with substance use disorders.
The two primary endogenous ligands for the CB1 receptor are anandamide and 2-arachidonoyl glycerol (2-AG) (Devane et al., 1992; Mechoulam et al., 1995; Sugiura et al., 1995). Anandamide is a partial agonist at the CB1 receptor, whereas 2-AG is a full agonist at the CB1 receptor (Luk et al., 2004). Both ligands are synthesized on demand following activity in the post-synaptic neuron and act as retrograde messengers to CB1 receptors located at the pre-synaptic terminal and produce forms of short and long-term plasticity (Augustin and Lovinger, 2018). Most notably CB1 receptors are required for forms of long term depression, and it is hypothesized that CB1 receptor-dependent long term depression is necessary for habit formation (Gerdeman et al., 2003). Because both anandamide and 2-AG act as agonists at the CB1 receptor and can yield long term depression, it is likely that either or both of these ligands are critical to the formation of habits.
Understanding the contributions of anandamide and 2-AG to behavior is possible with the use of compounds that target the specific degradation pathways of these ligands. For example, URB597 inhibits the primary degradation enzyme for anandamide, fatty acid amide hydrolase (FAAH), and has been shown to selectively increase extracellular levels of anandamide compared to 2-AG (Wiskerke et al., 2012), although it also elevates levels of other substrates for FAAH including oleoylethanolamide and palmitoylethanolamide (Kathuria et al., 2003). Additionally, JZL184 inhibits monoacylglycerol lipase (MAGL), the primary enzyme that degrades 2-AG (Long et al., 2008), and has been shown to selectively increase extracellular 2-AG levels (Wiskerke et al., 2012). Previous studies have used these compounds to determine how elevations of anandamide and 2-AG impact anxiety-like behavior (Bedse et al., 2017; Bluett et al., 2017; Kathuria et al., 2003), as well as other behaviors (Blednov et al., 2007; Long et al., 2008), but none have investigated how these compounds impact the formation of habitual appetitive behaviors. Here we used these selective compounds, URB597 and JZL184, to determine how pharmacologically mediated elevations in anandamide (and other FAAH substrates) and 2-AG levels, respectively, impacted the formation of habitual behaviors in mice using a food-reinforced contingency degradation procedure. We also examined how inverse agonism of CB1 receptor signaling with AM251 would alter the formation of habits. We hypothesized that increasing 2-AG and/or anandamide (and other FAAH substrates) would facilitate the formation of habits, whereas antagonizing the CB1 receptor would impede the formation of habits.
Materials and Methods
Animals
A total of n= 88 adult (>7 week old) male C57BL/6 mice (Charles River Laboratories; Wilmington, MA) were used. The sex and strain were selected to match the previous studies that demonstrated the necessity for CB1 receptors in habit learning (Gremel et al., 2016; Hilario et al., 2007). Mice were maintained at 85–90% of free-feeding body weight for the duration of the experiment by feeding 2.0–3.0 g of standard rodent chow (2918 Teklad diet; Envigo, Huntingdon, UK) per mouse per day. Mice used in Experiments 1–4 were experimentally naïve. A subset of mice from Experiment 3 were used for the proof-of-principle Experiment 5. All procedures were approved by the Yale University Institutional Animal Care and Use Committee and were in accordance with the National Institutes of Health Guide for the Care and Use of Laboratory Animals of the Institute of Laboratory Animal Resources.
Drugs
The drugs used were the following: AM251 [Fisher Scientific (Waltham, MA)]; JZL184 [Sigma-Aldrich (St. Louis, MO) and Cayman Chemical (Ann Arbor, MI)]; URB597 [Sigma-Aldrich]. All drugs were dissolved in 5% DMSO, 15% Tween 80 in sterile physiological saline, except in the AM251 dose-response control experiment where some doses were suspended in 1% DMSO in saline, as was done in a previous report (Hilario et al., 2007). The 5% DMSO, 15% Tween 80 vehicle was selected because all of the selected drugs are soluble in this vehicle, which permits shared vehicle conditions for experiments and minimizes the numbers of animals required. All drugs were injected intraperitoneally at 10 mL/kg. Doses used were as follows: JZL184 at 2 mg/kg, URB597 at 0.5 mg/kg, and AM251 at the following doses: 0.5, 1, 3, and 6 mg/kg.
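As a worked illustration of the dosing arithmetic, the following minimal sketch converts a dose in mg/kg into the concentration needed at the 10 mL/kg injection volume, and computes the injected volume for a given body weight. The function names and the 25 g example weight are hypothetical and are not taken from the study.

```python
def concentration_mg_per_ml(dose_mg_per_kg, volume_ml_per_kg=10.0):
    """Concentration that delivers the dose at the given injection volume."""
    return dose_mg_per_kg / volume_ml_per_kg


def injection_volume_ml(body_weight_g, volume_ml_per_kg=10.0):
    """Volume to inject for one animal, given its body weight in grams."""
    return (body_weight_g / 1000.0) * volume_ml_per_kg


# Doses reported above; the body weight is an illustrative assumption.
for drug, dose in [("JZL184", 2.0), ("URB597", 0.5), ("AM251", 1.0)]:
    print(drug, concentration_mg_per_ml(dose), "mg/mL")
print("volume for a 25 g mouse:", injection_volume_ml(25.0), "mL")
```

Under these assumptions, a 2 mg/kg dose corresponds to a 0.2 mg/mL preparation.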
Operant training, testing, and behavioral analyses
Operant behavior was conducted in standard operant chambers within sound-attenuated boxes (Med Associates, St. Albans, VT) as detailed previously (Gianessi et al., 2019; Gourley et al., 2010). Chambers were equipped with three adjacent nosepoke apertures on the back wall and a magazine located in the center of the front wall. Apertures and magazine were each equipped with a light and a photobeam sensor. All entries into the apertures and magazine were recorded. Sucrose-sweetened grain pellet reinforcers (Bioserv F0071, Flemington, NJ, USA) were dispensed into the magazine and served as the primary reinforcer in all operant sessions. A fan provided ventilation and background noise throughout the behavioral sessions.
Operant Training
Mice underwent two days of magazine training where a single reinforcer was delivered once every 60 seconds. Entries into the magazine and apertures had no programmed consequences. Sessions terminated after 30 minutes. Following magazine training, the mice underwent operant training. During these sessions one aperture, either the left or right, was assigned to deliver reward (referred to as “active”) and the other two apertures had no programmed consequence (referred to as “inactive”). Operant sessions began with illumination of the active aperture, and ended with the light extinguishing.
Responses into the active aperture were reinforced on a fixed ratio 1 (FR1) schedule, where each response resulted in a single reinforcer. FR1 sessions terminated after 30 minutes or when mice earned 60 reinforcers, whichever occurred first. Mice remained on FR1 schedule of reinforcement until all mice earned at least 30 reinforcers in a single FR1 session. Mice were then assigned to drug conditions based on their performance on the FR1 schedule, such that the number of days it took to reach 30 reinforcer criterion and total reinforcers earned on the final FR1 day did not differ between drug conditions.
Mice were then trained using a variable interval (VI) schedule of reinforcement because this schedule of reinforcement is known to promote the formation of habitual behavior (Derusso et al., 2010). The duration of each interval was randomly selected from an exponential list, with an average of 30 seconds for VI30, and 60 seconds for VI60. The first active response the mice performed after the interval elapsed resulted in delivery of a reinforcer. The duration of the next interval was then randomly selected. Training sessions terminated after 30 minutes. Drugs were administered prior to assessing operant behavior on the VI schedule as described below for each experiment.
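The logic of the VI schedule can be sketched as follows. This is a simplified illustration, not the Med Associates program used in the study: intervals are drawn directly from an exponential distribution rather than from a fixed exponential list, each new interval is assumed to start at the moment a reinforcer is delivered, and all names are hypothetical.

```python
import random


def simulate_vi_session(response_times, mean_interval_s,
                        session_s=1800.0, rng=None):
    """Return the times of reinforced responses under a VI schedule.

    response_times: times (s) of active responses within the session.
    mean_interval_s: 30 for VI30, 60 for VI60.
    rng: any object with an expovariate(lambd) method (assumption: the
         interval is sampled from an exponential distribution).
    """
    rng = rng or random.Random()
    reinforced = []
    # Time at which the first reinforcer becomes available.
    armed_at = rng.expovariate(1.0 / mean_interval_s)
    for t in sorted(response_times):
        if t > session_s:
            break  # session terminates after 30 minutes
        if t >= armed_at:
            # First response after the interval elapsed is reinforced,
            # then the next interval is selected.
            reinforced.append(t)
            armed_at = t + rng.expovariate(1.0 / mean_interval_s)
    return reinforced


# Example: one response every 2 s on a VI60 schedule (illustrative data).
responses = [2.0 * i for i in range(1, 901)]
earned = simulate_vi_session(responses, 60, rng=random.Random(0))
print(len(earned), "reinforcers earned")
```

Because reinforcer availability depends on elapsed time rather than on response count, increasing the response rate in this sketch yields little additional reward, which is the property thought to favor habitual control.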
Contingency degradation test
Contingency degradation was conducted on the day after the baseline VI training session to determine if responding was habitual or goal-directed, as previously described (Barker et al., 2013; Gianessi et al., 2019, 2020). Briefly, test sessions appeared similar to training sessions but reinforcers were delivered non-contingently. Active aperture responses had no programmed consequence. Reinforcers were delivered at equal intervals, and the total number delivered matched the number of reinforcers earned the day prior. Test sessions terminated after 30 minutes. No drugs were given prior to the contingency degradation tests.
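A minimal sketch of how the non-contingent delivery times could be derived is given below. It assumes that the 30-minute session is divided evenly by the number of reinforcers earned on the baseline day and that each reinforcer is delivered at the end of its interval; the exact placement used in the study is not specified, and the function name is hypothetical.

```python
def noncontingent_delivery_times(n_reinforcers, session_s=1800.0):
    """Evenly spaced delivery times (s), independent of responding."""
    if n_reinforcers <= 0:
        return []
    spacing = session_s / n_reinforcers
    return [(i + 1) * spacing for i in range(n_reinforcers)]


# Example: a mouse that earned 30 reinforcers on the baseline VI60 day
# (illustrative value) receives one pellet every 60 s during the test.
times = noncontingent_delivery_times(30)
print(times[0], times[-1], len(times))
```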
Experiment 1:
URB597 (n=8), JZL184 (n=8), or vehicle (n=7) was administered 30 minutes prior to all VI training sessions. Mice received three VI30 training sessions and three VI60 training sessions before testing with contingency degradation under drug-free conditions to assess if pharmacological manipulations during operant training affected habit learning.
Experiment 2:
URB597 (n=11), JZL184 (n=11) or vehicle (n=6) was administered 2 hours prior to VI training, to investigate additional time course effects from Experiment 1 and to reduce the potential impact of an acute stress response to the injections. Injections of saline (i.p.) have been reported to maximally increase plasma corticosterone levels in C57BL/6 mice between 10–20 minutes after the injection, with levels returning to baseline approximately 60 minutes following the injection (Freund et al., 1988). Stress has been shown to accelerate the formation of habitual responding (Dias-Ferreira et al., 2009; Gourley et al., 2012; Schwabe and Wolf, 2009). Both URB597 and JZL184 have anxiolytic properties (Bedse et al., 2017; Bluett et al., 2017; Kathuria et al., 2003), so it is possible that URB597 and JZL184 attenuated the injection-induced stress response in Experiment 1. This experiment was run concurrently with Experiment 4 and the number of mice used for the experiment was minimized by dividing the mice in the vehicle-exposed condition in half for each timepoint. The timing of vehicle administration did not alter response rates on any part of the experiment [VI30 days: Main effect of day: χ2= 5.8, p=0.06, Main effect of injection time: χ2= 0.5, p=0.5, Day-by-injection time interaction: χ2= 2.1, p=0.4; VI60 days: Main effect of day: χ2= 5.1, p=0.08, Main effect of injection time: χ2= 0.3, p=0.6, Day-by-injection time interaction: χ2= 3.6, p=0.2; Contingency degradation test: Main effect of session: χ2= 2.1, p=0.1, Main effect of injection time: χ2= 1.8, p=0.2, Session-by-injection time interaction: χ2= 0.8, p=0.4] so these data are presented with a combined vehicle group (n=12). One mouse from the 2 hour vehicle group was excluded from the habit test analysis due to an experimenter error on the contingency degradation test. Mice received three VI30 training sessions and three VI60 training sessions before the contingency degradation test.
Experiment 3:
URB597 (n=7), JZL184 (n=6) or vehicle (n=6) was administered 2 hours prior to all VI training sessions. This timepoint was selected out of an abundance of caution, despite the appearance of no effect from the stress of the injection in Experiment 2. Mice received one VI30 training session, and one VI60 training session prior to the first contingency degradation test to attenuate the formation of habitual responding for the vehicle condition on the habit test, thus avoiding a ceiling effect and allowing for assessment of whether habits could be facilitated. Then, sessions alternated between VI60 training sessions and contingency degradation test sessions due to continued observation of goal-directed behavior. Mice underwent a total of five contingency degradation tests under drug-free conditions.
Experiment 4:
Vehicle (n=6) or 1 mg/kg AM251 (n=12) was administered 30 minutes prior to most VI training sessions to determine if antagonism of CB1R impaired habit formation in the contingency degradation procedure, with the prediction that AM251 would impair habit formation (Hilario et al., 2007). This 30-minute timepoint was chosen to match that of the previous report (Hilario et al., 2007). This experiment was run concurrently with Experiment 2 and the number of mice used for the experiment was minimized by dividing the mice in the vehicle-exposed condition in half for each timepoint. The timing of vehicle administration did not alter response rates on any part of the experiment (see above description for Experiment 2) so these data are presented with a combined vehicle group (n=12) for the VI30 and initial VI60 training days, n=11 for Habit test 1. Due to an experimenter error n=5 mice in the AM251 condition received incorrect numbers of rewards on the first contingency degradation test and are excluded from this timepoint. Mice received three VI30 training sessions and three VI60 training sessions before the first contingency degradation test. Following the first contingency degradation test mice in the 30-minute timepoint groups received one VI60 training session without drug administration, which was followed by a second contingency degradation test to disambiguate drug-induced alterations in baseline responding from measures of habitual responding.
Experiment 5:
This proof-of-principle experiment examined whether vehicle differences in AM251 formulation (solution vs. suspension) might explain the observations in Experiment 4 that diverged from those predicted.
A dose-response experiment was conducted comparing response rates on a baseline VI60 day to the response rates the following VI60 day when a dose of AM251 was administered 30 minutes prior. This 30-minute timepoint was chosen to match that of the previous report (Hilario et al., 2007). Mice received one dose of AM251 in each vehicle and the order of administration was counterbalanced. Of note, mice used in Experiment 5 were a subset of the mice used previously for Experiment 3 (n=18). AM251 was administered at 0, 0.5, 3, and 6 mg/kg in two different formulations. The solution formulation was 5% DMSO, 15% Tween 80 in saline, to match that used for URB597 and JZL184. The suspension formulation was 1% DMSO in saline, as prepared in a previous report (Hilario et al., 2007).
Statistical analysis
Data analyses were conducted in Prism 8.4.3 (GraphPad, San Diego, CA) and SPSS 21 (IBM, Armonk, NY). Data are presented as the mean ± standard error of the mean. Total responses across drug conditions and behavioral sessions were analyzed using repeated measures generalized estimating equations (GEE) with a Poisson distribution, because this distribution is the most appropriate for count data (i.e., total number of active nose poke responses, number of rewards earned). Regression coefficients were tested with Wald χ2 to determine if they were significantly different from zero. Total responses, and in Experiment 4, rewards earned, were analyzed across days of VI training to determine if there were effects of drug administration on operant responding (i.e., including a factor for day with levels for first, second, third etc. day and a factor for drug with levels for the different drugs administered). Habit tests were analyzed by comparing, within-subjects, total active responses made on the prior baseline VI60 session with those made during the contingency degradation test (i.e., including a factor for session type with levels for baseline and for test, and including a factor with levels for the different drugs administered). Post hoc tests of significant interactions consisted of computing lower order comparisons (i.e., for 3-way interactions, follow up with 2-way interactions). Post hoc analysis of the significant day-by-drug interaction for active response rate during VI60 training in Experiment 3 was conducted with repeated measures GEE with a Poisson distribution with a Sidak adjustment for multiple comparisons, as there was no significant main effect of drug. Post hoc tests for the effect of dose of AM251 in Experiment 5 were conducted as pairwise comparisons using repeated measures GEE with a Poisson distribution and a Sidak adjustment for multiple comparisons. Effect sizes for comparisons between two means were estimated by calculating Cohen’s d.
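For readers unfamiliar with the effect size measure, one common way to compute Cohen's d is sketched below. It assumes the pooled-standard-deviation form for two sets of scores; the text does not state which variant was used for the within-subject comparisons, and the response counts in the example are invented for illustration.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(a, b):
    """Cohen's d with a pooled standard deviation (assumed variant)."""
    na, nb = len(a), len(b)
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)


# Hypothetical active-response counts: baseline session vs. test session.
baseline = [120, 140, 160, 150, 130]
test = [80, 95, 110, 100, 90]
print(round(cohens_d(baseline, test), 2))
```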
Results
Experiment 1:
Mice were trained for 3 days on a VI30 schedule and for 3 days on a VI60 schedule of reinforcement before the contingency degradation test (see timeline Figure 1a). Vehicle, URB597 (0.5 mg/kg), or JZL184 (2 mg/kg) was administered 30 minutes prior to each VI training session. Response rates across days of training are presented in Figure 1b. Response rates increased across the VI30 training sessions as the mice learned to respond on the VI schedule of reinforcement (main effect of day: χ2= 52.6, p<0.001). Post hoc analyses of the drug-by-day interaction (drug-by-day interaction: χ2= 9.8, p=0.05) indicated that within each drug condition response rates increased across days of VI30 training (Vehicle condition main effect of day: χ2= 15.1, p=0.001; URB597 condition main effect of day: χ2=8.3, p=0.02; JZL184 condition main effect of day: χ2=39.4, p<0.001). However, on the VI60 schedule of reinforcement the response rate across the three sessions did not differ between drug conditions (main effect of drug: χ2= 0.2, p= 0.9; main effect of day: χ2= 5.0, p=0.08; drug-by-day interaction: χ2= 8.2, p=0.09). Additionally, inactive response rates changed over the course of VI60 training (main effect of day χ2= 6.8, p=0.03, drug-by-day interaction χ2= 10.7, p=0.03), which was driven by a decrease in inactive responses in the vehicle condition (Vehicle condition main effect of day: χ2= 16.6, p<0.001).
Figure 1: Persistent goal-directed behavior following FAAH or MAGL inhibition during training.
a. Experimental timeline
b. Response rates on active and inactive nose pokes across VI training and contingency degradation test for the mice given vehicle (n=7), URB597 (n=8), and JZL184 (n=8) 30 minutes prior to each VI session.
c. Response rates on baseline day compared to the contingency degradation test. ** p<0.01; ***p<0.001
Response rates on the contingency degradation test were significantly altered by drug condition (Figure 1c; main effect of session type: χ2= 23.9, p<0.001; main effect of drug: χ2= 2.5, p=0.3; session-by-drug interaction: χ2= 5.8, p=0.05). The response rates of the URB597-exposed group and the JZL184-exposed group were reduced on the test session compared to their baseline sessions (URB597: χ2= 13.9, p<0.001, d= 1.4; JZL184 χ2= 27.3, p<0.001, d= 1.9), indicating that responding of the URB597 and JZL184-exposed mice was goal-directed. In contrast, the response rates of the vehicle-exposed group on the test session did not differ from their baseline session (χ2= 0.03, p=0.9, d = 0.1), indicating that the responding of the vehicle-exposed group was habitual. These data suggest that increasing anandamide or other FAAH substrates and increasing 2-AG levels prevented, rather than accelerated, the formation of habits. Inactive responses were also altered by contingency degradation testing (main effect of session χ2 = 8.5, p=0.004, drug-by-session interaction χ2 = 10.6, p=0.005). Inactive responses decreased on the contingency degradation test in JZL184-exposed (χ2 = 4.3, p=0.04, d=1.0), and in URB597-exposed (χ2 = 9.7, p=0.002, d= 1.1) groups but not in the Vehicle group (χ2 = 0.7, p=0.4, d= 0.27).
Experiment 2:
The paradoxical effects of URB597 and JZL184 on habit formation may be because these compounds, which have known anxiolytic effects (Bedse et al., 2017; Bluett et al., 2017; Kathuria et al., 2003), may have reduced the stress response of mice to the injection and thus prevented the formation of habits. In the second experiment URB597 and JZL184 were administered 2 hours before the VI sessions (see timeline Figure 2a) – a period when the injection-mediated enhancements of corticosterone are known to return to baseline (Freund et al., 1988). Response rates for URB597 and JZL184 injections across the experiment are presented in Figure 2b. Response rates increased over the VI30 training days (main effect of day: χ2= 9.4, p=0.009), indicating that the mice learned the task. There was also a significant day-by-drug interaction (day-by-drug interaction: χ2= 14.6, p=0.006). Post hoc analyses detected significant effects of day for the JZL184 condition (main effect of day: χ2= 22.9, p<0.001). Response rates also increased over VI60 training days (main effect of day: χ2= 6.0, p=0.05). Inactive responses decreased over VI30 training (main effect of day χ2 = 20.3, p<0.001), and again over VI60 training (main effect of day χ2 =12.1, p=0.002).
Figure 2: Further time course characterization of persistent goal-directed behavior following FAAH or MAGL inhibition during training.
a. Experimental timeline
b. Response rates on active and inactive nose pokes across VI training and contingency degradation tests for the mice given vehicle (n=12), URB597 (n=11), and JZL184 (n=11) before each VI session.
c. Response rates on the baseline days compared to the contingency degradation tests for the mice exposed to vehicle, URB597, and JZL184. ***p<0.001; **p<0.01
Response rates were then assessed in the contingency degradation test under a drug-free state. Administration of either enzyme inhibitor during operant training significantly altered responding on the contingency degradation test (Figure 2c; main effect of session type: χ2= 46.9, p<0.001; main effect of drug: χ2= 0.5, p=0.8; session type-by-drug interaction: χ2= 25.0, p<0.001). The response rates of mice previously exposed to URB597 or JZL184 decreased in the contingency degradation test compared to those in the baseline session (URB597: χ2= 8.2, p=0.004, d= 0.6; JZL184: χ2= 63.1, p<0.001, d= 1.0), suggesting that their responding was goal-directed. Response rates of the vehicle-exposed group in the contingency degradation test did not differ from those in the baseline session (χ2= 2.0, p=0.2, d= 0.2), indicating that their responding was habitual. These results are similar to those observed in Experiment 1, suggesting that the increase in goal-directed behavior was not accounted for by differences in a stress response to the injection.
Experiment 3:
The training procedures used in Experiments 1 and 2 to engender the formation of habitual responding may have prevented us from observing a facilitation in the formation of habits by URB597 and JZL184. To address this possibility, the operant training was reduced for Experiment 3 (see timeline Figure 3a) to a single day of VI30 training and a single day of VI60 training prior to the first contingency degradation test. We then alternated between VI60 training sessions and contingency degradation test sessions to repeatedly assess habitual responding as a function of operant training. URB597 and JZL184 were administered 2 hours before each VI training session, as in Experiment 2, to reduce the impact of the stress response to the injection. Response rates across the VI training and contingency degradation tests are presented in Figure 3b. Response rates on the VI30 schedule were not different between the drug groups (main effect of drug: χ2= 0.2, p=0.9). Response rates on the VI60 schedule, however, differed across days between drug groups (day-by-drug interaction: χ2= 45.6, p<0.001). Post hoc analyses detected significant differences in response rate from JZL184-exposed mice on day 2 compared to day 6 (p=0.04), and compared to day 9 (p=0.001). No other significant differences were observed (all p≥0.1). Inactive responses also changed over VI60 training (main effect of day χ2 = 24.0, p<0.001, drug-by-day interaction χ2 =39.1, p<0.001). Inactive responses changed in all drug conditions across VI60 training days (JZL184: main effect of day χ2 = 231.7, p<0.001, URB597: χ2 =33.6, p<0.001, Vehicle: χ2 = 113.8, p<0.001). Inactive responses were altered by contingency degradation testing (main effect of test number χ2 = 27.7, p<0.001, session type-by-test number interaction χ2 =12.1, p=0.016, test number-by-drug interaction χ2 = 32.9, p<0.001, session type-by-test number-by-drug interaction χ2 = 22.4, p=0.004).
Follow up analysis of inactive responses on each contingency degradation test revealed a significant reduction in inactive responses on the first three contingency degradation tests compared to their baseline (Test 1: main effect of session type χ2 = 6.8, p=0.009, Test 2: main effect of session type χ2 = 14.2, p<0.001, Test 3: main effect of session type χ2 = 20.1, p<0.001). Additionally, on Test 3 there was a significant drug-by-session type interaction χ2 = 9.8, p=0.008. The JZL184-exposed (χ2 = 129.8, p<0.001, d=0.9) and URB597-exposed (χ2 = 6.6, p=0.01, d=0.2) groups reduced their inactive responses on Test 3, but the Vehicle-exposed group did not (χ2 = 2.4, p=0.1, d=0.9). On Test 5 there was a significant drug-by-session type interaction (χ2 = 12.1, p=0.002), which was driven by a significant decrease in inactive responses in the JZL184-exposed group only (χ2 = 13.0, p<0.001, d=0.7).
Figure 3: Continued goal-directed behavior following habit training with MAGL inhibition.
a. Experimental timeline
b. Response rates on active and inactive nose pokes across VI training and contingency degradation tests for the mice given vehicle (n= 6), URB597 (n=7), and JZL184 (n=6) 2 hours before each VI session.
c. Percent change from baseline [((Responses on contingency degradation test – Responses on prior VI session)/Responses on the prior VI session) * 100] for each contingency degradation test. ***p<0.001; #p=0.06
Data from the five contingency degradation tests are plotted as percent change from baseline (Figure 3c). A percent change that is zero (or positive) indicates responding is habitual, whereas a percent change below zero indicates responding was goal-directed. Response rates across the five contingency degradation tests were altered by drug across tests and session type (main effect of session type χ2= 82.3, p<0.001; main effect of test number χ2= 143.6, p<0.001; main effect of drug χ2= 0.9, p=0.6; session type-by-test number-by-drug interaction χ2= 33.8, p<0.001). Subsequent analyses for each test session indicated that responding was goal-directed in all experimental groups during the first three contingency degradation tests (Test 1: main effect of session type χ2= 8.9, p=0.003; main effect of drug χ2= 0.2, p=0.9; session type-by-drug interaction χ2= 0.14, p=0.9; Test 2: main effect of session type χ2= 33.7, p<0.001; main effect of drug χ2= 0.4, p=0.8; session type-by-drug interaction χ2= 0.3, p=0.9; Test 3: main effect of session type χ2= 51.6, p<0.001; main effect of drug χ2= 0.1, p=0.9; session type-by-drug interaction χ2= 6.0, p=0.05). Post hoc tests of the significant session-type-by-drug interaction for Test 3 indicated that responding was goal directed in vehicle and JZL184-exposed mice, but this effect was less in the URB597-exposed mice (Test 3: Vehicle main effect of session type χ2= 26.5, p<0.001, d=1.7; JZL184 main effect of session type χ2= 35.6, p<0.001, d=1.0; URB597 main effect of session type χ2= 3.6, p=0.06, d=0.4). 
At Test 4, there was a reduction in response rate on the test session from baseline responding for vehicle and JZL184-exposed mice, but not for URB597-exposed mice (Test 4: main effect of session type χ2= 66.1, p<0.001; main effect of drug χ2= 5.2, p=0.08; session type-by-drug interaction χ2= 35.0, p<0.001; Post hoc comparisons: Vehicle main effect of session type χ2= 19.7, p<0.001, d=0.7; JZL184 main effect of session type χ2= 86.4, p<0.001, d=2.2; URB597 main effect of session type χ2= 2.5, p=0.1, d=0.4). These data suggest that administration of URB597 may have accelerated the formation of habitual responding. Moreover, the responding of vehicle-exposed and URB597-exposed mice on Test 5 was habitual, whereas the responding of JZL184-exposed mice remained goal-directed (main effect of session type χ2= 5.6, p=0.02; main effect of drug χ2= 2.8, p=0.2; session type-by-drug interaction χ2= 20.5, p<0.001; post hoc analyses: Vehicle main effect of session type χ2= 0.9, p=0.4, d=0.3; JZL184 main effect of session type χ2= 24.9, p<0.001, d=1.5; URB597 main effect of session type χ2= 0.001, p=0.9, d=0.01). These findings in JZL184-exposed mice are consistent with those observed in Experiment 1 and 2, and indicate that administration of JZL184 during operant training attenuated the formation of habits.
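The percent change from baseline plotted in Figure 3c follows directly from the formula in the figure legend and can be computed as in this short sketch. The function name and the example response counts are hypothetical.

```python
def percent_change_from_baseline(test_responses, baseline_responses):
    """((test - baseline) / baseline) * 100, as defined for Figure 3c."""
    if baseline_responses == 0:
        raise ValueError("baseline responses must be non-zero")
    return (test_responses - baseline_responses) / baseline_responses * 100.0


# Illustrative values only: responding that halves on the test gives -50,
# consistent with goal-directed behavior; unchanged responding gives 0,
# consistent with habitual behavior.
print(percent_change_from_baseline(50, 100))
print(percent_change_from_baseline(100, 100))
```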
Experiment 4:
Previous studies have indicated that the CB1 receptor inverse agonist AM251 prevents the formation of habitual responding when administered only during VI training (Hilario et al., 2007). To determine if the paradoxical effects observed in Experiments 1–3 were due to differences in the operant paradigms used to assess habitual responding (current study: contingency degradation, Hilario and colleagues: specific satiety (Hilario et al., 2007)), we examined how inverse agonism of CB1 receptors with AM251 impacted the formation of habits as assessed in contingency degradation tests. Mice were given 1 mg/kg AM251 or vehicle 30 minutes before operant training (3 days on a VI30 schedule and 3 days on a VI60 schedule of reinforcement) and responding was assessed in a contingency degradation session (see timeline Figure 4a). Response rates for the AM251 and the vehicle groups are presented in Figure 4b. Administration of AM251 during VI training reduced response rates on VI training days (VI30: main effect of day χ2= 6.4, p=0.04, main effect of drug χ2= 9.7, p=0.002; VI60: main effect of drug χ2= 5.2, p=0.02). The main effect of day for the VI30 sessions suggests that AM251 did not disrupt the ability of mice to learn the operant contingencies. Inactive responses decreased across VI60 training days (main effect of day χ2 = 7.0, p=0.03). Administration of AM251 during VI training also reduced rewards earned on VI training days (Figure 4c; VI30: main effect of day: χ2= 5.7, p=0.06; main effect of drug: χ2= 18.1, p<0.001, day-by-drug interaction: χ2= 4.0, p=0.1; VI60: main effect of day: χ2=1.1, p=0.6, main effect of drug: χ2=12.4, p<0.001, day-by-drug interaction: χ2= 6.4, p=0.04). Post hoc tests revealed that AM251-exposed mice earned fewer rewards than Vehicle-exposed mice on each day of initial VI60 training (Day 4: main effect of drug: χ2=10.5, p=0.001, d= 1.4; Day 5: main effect of drug: χ2= 10.0, p=0.002, d=1.5; Day 6 main effect of drug: χ2= 5.3, p=0.02, d= 1.0).
Figure 4: Baseline effects of AM251 on response rates complicate the interpretation of responding during contingency degradation tests.
a. Experimental timeline
b. Response rates on active and inactive nose pokes across VI training and contingency degradation tests for the mice given vehicle (combined Vehicle n=12; 30-min Vehicle n=6 for day 8 and 9), and AM251 (n=12), except for day 7 where combined Vehicle n=11 and AM251 n=7 due to experimenter error. *p<0.05; **p<0.01
c. Rewards earned across the experiment, **p<0.01; ***p<0.001
d. Response rates on the baseline day when mice were exposed to vehicle (n=12) or AM251 (n=12) compared to the contingency degradation test. ***p<0.001
e. Response rates on the baseline day drug-free compared to the contingency degradation test for mice previously exposed to vehicle (n=6) or AM251 (n=12). ***p<0.001
Response rates were significantly altered in the first test session (Figure 4d; main effect of session type: χ2= 6.6, p=0.01; main effect of AM251: χ2= 1.6, p=0.2; session-by-AM251 interaction: χ2= 16.0, p<0.001). The interaction, however, was due to an increase in response rates on the contingency degradation test in the AM251-exposed group (Vehicle: χ2= 2.0, p=0.2, d=0.2; AM251: χ2= 14.5, p<0.001, d= 1.2). This unexpected increase in responding in mice given AM251 suggests that appetite-suppressing effects of AM251 on baseline responding and rewards earned may confound contingency degradation test results when compared to a drug-free state. Inactive responses by AM251-exposed mice were higher than those made by the Vehicle group on the first contingency degradation test session (main effect of drug χ2 = 5.5, p=0.02; session-by-drug interaction χ2 = 3.9, p=0.05; Baseline session: main effect of drug χ2 = 2.2, p=0.1, d=0.7; Contingency degradation session: main effect of drug χ2 = 5.6, p=0.02, d=0.8).
We then conducted an additional VI60 training session without administration of AM251 or vehicle and a second contingency degradation test. Notably, on this drug-free VI60 day there were no differences in response rate (main effect of drug: χ2= 1.3, p=0.3, d=0.6) or in the number of rewards earned (main effect of drug: χ2= 0.1, p=0.7, d=0.2) between mice previously exposed to AM251 and mice previously exposed to vehicle.
Mice that had been given AM251 during the initial operant training reduced their response rates in the second contingency degradation test (Figure 4e: main effect of session type: χ2= 0.3, p=0.6; main effect of drug: χ2= 0.006, p= 0.9; session-by-drug interaction: χ2= 34.8, p<0.001), and post hoc analyses revealed that, in the AM251-exposed group, response rate was lower on the test than on the drug-free baseline (AM251: χ2= 19.1, p<0.001; d=0.7). No difference in response rate was observed for the vehicle-exposed group (Vehicle: χ2= 1.9, p=0.2, d=0.4). These data indicate that the AM251-exposed mice remained goal-directed while the vehicle-exposed mice were habitual, and are consistent with the effects observed by Hilario and colleagues (Hilario et al., 2007).
Experiment 5
The reduction in responding in the VI schedules following administration of AM251 observed in Experiment 4 was unexpected, given that the dose of 1 mg/kg was substantially lower than the 3 mg/kg and 6 mg/kg doses previously used (Hilario et al., 2007). Moreover, Hilario and colleagues did not observe the profound reductions in responding that we did. We hypothesized that this discrepancy might be attributable to differences in the formulation of AM251 used between the current study (solution: 5% DMSO, 15% Tween 80) and the report by Hilario and colleagues (suspension: 1% DMSO, Hilario et al., 2007). To investigate this hypothesis, as a proof-of-principle we examined operant responding of mice on a VI60 schedule following administration of AM251 (0.5 mg/kg, 3 mg/kg, and 6 mg/kg, see timeline Figure 5a) that was either dissolved in 5% DMSO, 15% Tween 80 (Figure 5b) or suspended in 1% DMSO (Figure 5c). The mice used for this experiment were a subset of those previously used for Experiment 3, and had already been trained on the VI schedule of reinforcement prior to inclusion in this experiment.
Figure 5: Response rates on the VI60 schedule following a range of doses and vehicles of AM251.
a. Experimental timeline
b. Active response rates for the AM251 dose response in the 5% DMSO, 15% Tween 80 in saline vehicle. There were n=5 in the vehicle group, n=4 in the 0.5 mg/kg group, n=5 in the 3 mg/kg group, and n=4 in the 6 mg/kg group. ***p<0.001; **p<0.01; *p<0.05
c. Active response rates for the AM251 dose in the 1% DMSO suspension in saline vehicle. There were n=4 in the vehicle group, n=5 in the 0.5 mg/kg group, n=4 in the 3 mg/kg group, and n=5 in the 6 mg/kg group.
d. Inactive response rates for the AM251 dose response in the 5% DMSO, 15% Tween 80 in saline vehicle. There were n=5 in the vehicle group, n=4 in the 0.5 mg/kg group, n=5 in the 3 mg/kg group, and n=4 in the 6 mg/kg group.
e. Inactive response rates for the AM251 dose in the 1% DMSO suspension in saline vehicle. There were n=4 in the vehicle group, n=5 in the 0.5 mg/kg group, n=4 in the 3 mg/kg group, and n=5 in the 6 mg/kg group.
Response rates were significantly altered following administration of AM251 (main effect of day χ2= 108.8, p<0.001; main effect of vehicle χ2= 102.9, p<0.001; day-by-dose interaction χ2= 84.2, p<0.001; day-by-vehicle interaction χ2= 102.3, p<0.0001; day-by-dose-by-vehicle interaction χ2= 25.0, p<0.001). To follow up on the significant three-way interaction, we next conducted analyses for effects of dose and vehicle on each day. On the baseline day, there was a significant main effect of vehicle (main effect of vehicle χ2= 5.8, p=0.02), which reflects that there was a higher mean response rate on the baseline days prior to AM251 administered in suspension (11.1 ± 1.01 responses per minute) compared to the response rate on the baseline days prior to AM251 administered in solution (9.0 ± 0.085 responses per minute). On the day when AM251 was administered, response rates were significantly altered by dose and vehicle (main effect of dose χ2= 20.1, p<0.001; main effect of vehicle χ2= 127.2, p<0.001; dose-by-vehicle interaction χ2= 15.6, p=0.001). To follow up on the significant dose-by-vehicle interaction, we next analyzed the effects of dose, within each vehicle, for the day when AM251 was administered. Following administration of AM251 in a solution there was a significant effect of dose (Figure 5b, main effect of dose χ2= 29.8, p<0.001). Although post hoc analyses detected no differences between vehicle and any of the doses of AM251 in solution (all p>0.1, vehicle vs. 0.5 mg/kg d=0.7, vehicle vs. 3 mg/kg d=1.1, vehicle vs. 6 mg/kg d=1.1), this is likely due to a small sample size and large variability in the vehicle-exposed group. Responding following the low dose of AM251 (0.5 mg/kg) was significantly greater than that following 3 mg/kg (p= 0.05, d=1.6), and 6 mg/kg (p=0.001, d=2.3). 
No significant difference in response rate was detected between administration of 3 mg/kg and 6 mg/kg AM251 (p=0.4, d=1.0), which may indicate a floor effect for reduced responding from AM251 administration. In contrast, when AM251 was in the 1% DMSO suspension vehicle there were no differences in response rate following drug administration (Figure 5c, main effect of dose χ2= 1.4, p=0.7). These findings suggest that the discrepancy between our results in Experiment 4 (where a lower dose of AM251 reduced response rate and rewards earned during VI schedules of reinforcement) and the previous report (Hilario et al., 2007) was likely due to differences in the formulation of AM251 (e.g., suspension versus solution) rather than other differences between the experimental designs (e.g., nose pokes vs. levers).
Inactive responses were also altered by day, dose, and vehicle (main effect of day χ2 = 17.0, p<0.001, main effect of dose, χ2 = 12.9, p=0.005, main effect of vehicle χ2 = 6.2, p=0.013, day-by-dose interaction χ2 = 8.3, p=0.04, day-by-vehicle interaction χ2 = 12.6, p<0.0001). We next analyzed the effects of dose and vehicle for each day separately. On the Baseline day, there was a main effect of vehicle on inactive responses (χ2 = 10.9, p=0.001), which reflects a higher rate of inactive responses made on the baseline days prior to the administration of AM251 in solution (Solution Baseline: 0.204 ± 0.02 inactive responses per minute vs. Suspension Baseline: 0.13 ± 0.03 inactive responses per minute). On the day when AM251 was administered, inactive responses were altered by dose and vehicle (main effect of dose χ2 = 16.2, p=0.001, main effect of vehicle χ2 = 11.0, p=0.001).
Discussion
Inhibition of FAAH or MAGL prevents the formation of habitual responding
Here we investigated how pharmacologically mediated increases in levels of the endocannabinoid ligands anandamide and 2-AG impacted the formation of habits. We predicted that an increase in either one or both of these endocannabinoids would promote the formation of habits. To test this hypothesis, we administered enzyme-inhibiting drugs that selectively increased anandamide and other FAAH substrates (URB597) or 2-AG (JZL184) during operant training. Contrary to our hypothesis, we observed that URB597 and JZL184 given during the presumed formation of habits resulted in goal-directed responding at test. It is possible that the mechanism by which increasing 2-AG impeded habit formation was through functional antagonism of endocannabinoid signaling via CB1 receptor desensitization and internalization from repeated dosing of JZL184; however, this is unlikely because the 2 mg/kg dose of JZL184 used in these studies is below the threshold dose of 8 mg/kg reported to cause desensitization with repeated administration (Schlosburg et al., 2010).
We hypothesized that the effects of endocannabinoid augmentation on the formation of habitual responding would be mediated by the CB1 receptor. It is also possible, however, that these endocannabinoid manipulations were mediated via other receptors including transient receptor potential cation channel subfamily V member 1 (TRPV1), cannabinoid receptor type 2 (CB2), orphan G protein-coupled receptor 55 (GPR55), or peroxisome proliferator-activated receptors (PPAR). Previous work has demonstrated that transgenic TRPV1 knockout mice have impaired habit learning for food reinforcers (Shan et al., 2015), similar to that observed in CB1 receptor knockout mice (Hilario et al., 2007). Future work should address whether TRPV1 mediates the effect of augmenting 2-AG or anandamide and FAAH substrates during habit learning by co-administration of capsazepine or another antagonist that does not alter operant responding on its own (Gianessi et al., 2019). To our knowledge, there have been no direct studies of GPR55, CB2, or PPAR mechanisms for habitual behavior. Nonetheless, there are intriguing findings from studies that implicate endocannabinoid signaling at these receptors on other behavioral tasks. Pharmacological antagonism of GPR55 in the dorsolateral striatum impaired learning of a T-maze task (Marichal-Cancino et al., 2016) and transgenic GPR55 knockout mice show impaired performance on an accelerated rotarod task (Wu et al., 2013) that is known to engage striatal circuitry similar to habit learning (Yin and Knowlton, 2006; Yin et al., 2009). Additionally, the CB1 receptor inverse agonist AM251 also acts as a GPR55 agonist (Ryberg et al., 2007), which may contribute to the observed effects on operant responding in Experiments 4 and 5, as well as in the previous report (Hilario et al., 2007). Selective agonists of the CB2 receptor reduce cocaine self-administration, but do not alter performance on a rotarod task in mice (Xi et al., 2011). 
CB2 receptor knockout mice self-administer less nicotine (Navarrete et al., 2013), self-administer similar amounts of cocaine (Xi et al., 2011), and self-administer more ethanol than wild type mice do (Ortega-Álvaro et al., 2015), indicating that there are reinforcer-specific contributions of CB2 receptors on motivation. Pharmacological agonists of PPAR have been shown to decrease nicotine self-administration, but have no effect on operant responding for cocaine or food (Mascia et al., 2011). Future studies determining a role for GPR55, CB2, and PPAR in the formation and expression of habitual responding are warranted. From a translational standpoint, impeding the formation of habits may have clinical utility for preventative usage, such as a treatment for individuals with a family history of alcoholism or other risk factors who have not developed alcohol use disorder themselves.
From a clinical standpoint, there is a need to develop drugs that can reduce the expression of pathologically habitual behaviors, such as chronic substance abuse. Previous work from our group has demonstrated that JZL184 does not alter the expression of food habits (Gianessi et al., 2019), and that neither URB597 nor JZL184 altered the expression of alcohol habits (Gianessi et al., 2020). These catabolic enzyme inhibitors, therefore, may not be the best pharmacological mechanism for reducing compulsive substance use. Nonetheless, these compounds may have clinical utility for reducing the anxiety that occurs during alcohol withdrawal (Cippitelli et al., 2008; Serrano et al., 2018). Intriguingly, DO34, a compound which decreases 2-AG synthesis and release, has been found to reduce the expression of alcohol habits (Gianessi et al., 2020). This finding indicates that 2-AG release may be permissive for the expression of alcohol habits, but does not exacerbate the expression of habitual responding when 2-AG signaling is further augmented. Future work administering DO34 during habit training could further clarify the contributions of 2-AG release to habit formation versus expression.
Effects of CB1R antagonism on operant appetitive behavior
We administered a CB1R inverse agonist, AM251, during operant training to confirm that our contingency degradation test for habitual responding was able to detect CB1-mediated changes in habit formation. We observed an overall reduction in response rate on the VI schedule of reinforcement following 1 mg/kg AM251, despite the previous evidence that higher doses (3 and 6 mg/kg) had no effect on overall response rate (Hilario et al., 2007). Initially we observed an increase in response rate on the contingency degradation test compared to the baseline response rate when AM251 was on board. We do not interpret this increase as reflecting habitual responding, but instead as a result of the design of our contingency degradation procedure, in which the number of rewards given in the contingency degradation test was matched to the number earned on the previous day’s VI60 training session. Administration of AM251 significantly reduced the number of rewards mice earned in VI schedules (Figure 4c), indicating that the rewards earned under the influence of AM251 did not satiate the mice in a drug-free state. Thus, the increased response rate on the first contingency degradation test (day 7) may have reflected an effort to receive more rewards. Correspondingly, when we gave the AM251-exposed mice a drug-free baseline VI60 day, we were able to observe a significant reduction in response rate on the contingency degradation test. This comparison to a drug-free baseline revealed that AM251 exposure did hinder the formation of habitual responding, as previously reported (Hilario et al., 2007).
It is important to note that there are additional experimental design differences between the previous report and Experiment 4: their study had mice responding on levers rather than nose pokes, and their habit test used selective satiety devaluation rather than contingency degradation. Previous results in rats have demonstrated that AM251 can dose-dependently reduce operant responding for food reinforcement on levers when formulated in a vehicle with DMSO and Tween-80 (McLaughlin et al., 2003, 2010; Sink et al., 2008), so this difference in behavioral manipulandum is less likely to be the major contributing factor for the discrepant findings of reduced operant responding when AM251 is administered. Additionally, the observation in Experiment 5 that operant responding on nose pokes was not reduced by administration of AM251 in the 1% DMSO suspension bolsters our claim that this difference in vehicle formulation is the critical experimental design difference driving the observed reduced response rates following administration of AM251 in Experiment 4. Contingency learning and valuation are separable in the brain; for example, the nucleus accumbens is important for valuation aspects of habitual responding but not contingency (Corbit et al., 2001), and the opposite is true for the entorhinal cortex (Corbit et al., 2002). Notably, the discrepant findings from their report are from the VI sessions when AM251 is administered, which were conducted in largely the same way, lever versus nose poke notwithstanding. Further confirmation studies are needed to determine whether a lower dose of AM251 that does not alter response rates on the VI schedule could impede the formation of habits, because the observed goal-directed behavior might result from fewer pairings of the action-outcome contingency during training in the AM251 group compared to the vehicle group. 
Additionally, AM251 acts as a GPR55 agonist (Ryberg et al., 2007), which may contribute to observed effects on operant responding. Future studies to determine the role for CB1 receptor signaling in habit formation should utilize more selective pharmacology.
Due to the marked reductions in responding following administration of 1 mg/kg AM251, we conducted a proof-of-principle assay with a range of doses, testing for effects of using a different vehicle for AM251: the previous study used a 1% DMSO suspension, whereas our studies used 5% DMSO, 15% Tween 80 in saline. Notably, administration of the 5% DMSO, 15% Tween 80 vehicle induces response rate reductions on its own, which motivated our efforts to use within-subjects experimental designs where possible (Gianessi et al., 2019, 2020). We found a consistent dose-response relationship with AM251 in solution in our vehicle, and inconsistent, variable data with the suspension vehicle. It is likely that the bioavailability of AM251 formulated in suspension is lower than the intended 3 mg/kg and 6 mg/kg (Hilario et al., 2007); therefore, it is unknown to what degree CB1 receptors were antagonized during operant training. Furthermore, our previous study demonstrated that 1 mg/kg AM251 reduced the expression of habitual responding (Gianessi et al., 2019), which may suggest that the observed impaired formation of habits in CB1 receptor knockout mice could be explained through the necessity of CB1 receptors for the expression of habitual responding (Hilario et al., 2007). Further studies to determine whether CB1 receptors are also necessary for habit formation are needed. These results underscore the importance of considering effects of pharmacological compounds on motivation for appetitive reinforcers during operant tasks.
Endocannabinoid modulation of appetitive motivation
Indeed, a persistent challenge with pharmacological studies of the endocannabinoid system is that many of these compounds alter motivation for appetitive reinforcement (Fattore et al., 2010; Di Marzo et al., 2009). For example, JZL184 has been found to dose-dependently increase breakpoint on a progressive ratio schedule of reinforcement for food reinforcers (Oleson et al., 2012), and for alcohol reinforcers (Gianessi et al., 2020). On the other hand, FAAH inhibition does not affect progressive ratio responding for alcohol reinforcers in rats (Cippitelli et al., 2008) nor for food reinforcers in non-human primates (Kangas et al., 2016). Future studies investigating the specific role for endocannabinoid ligands in the formation of habitual behavior may be best designed using a task such as a stimulus-response water maze task (Goodman and Packard, 2015) that is not reinforced through food-based outcomes.
Considerations for sex differences in endocannabinoids
We focused our studies on male mice to follow up on previous results indicating that CB1 receptors are necessary for habit formation (Gremel et al., 2016; Hilario et al., 2007). There are sex-specific outcomes for forming habits that have been observed using the Four Core Genotypes model, a transgenic mouse model that allows for the separation of chromosomal complement and gonadal sex. Habits for food form more slowly in chromosomal male mice (Quinn et al., 2007) and habits for alcohol form more slowly in chromosomal female mice (Barker et al., 2010). There are notable sex differences in response to cannabinoid drugs, including several that are dependent on circulating gonadal hormones such as anti-nociception, yet others that may be dependent on chromosomal complement such as hyperphagic effects of cannabinoid drugs (Craft et al., 2013). Future studies into sex differences in the contributions of the endocannabinoid system to habit formation are warranted.
Conclusion
In conclusion, a deeper understanding of the neuromodulatory mechanisms involved in the formation of habitual control over behavior is important to understand the theorized neurobiological changes that occur when, for example, transitioning from a goal-directed social drinker to a compulsive habitual drinker with alcohol use disorder. Understanding the endocannabinoid mechanisms involved in forming habitual responding could potentially lead to novel pharmacotherapies that would prevent the formation of habits or, critically, reduce aberrant established habits.
Acknowledgements
This study was supported by public health service grants AA012870, DA041480, and DA043443. Additional support was provided by a NARSAD award, the Charles B.G. Murphy Fund, and the State of CT Department of Mental Health Services. C.A.G. is currently supported by T32AA007573-22. This publication does not express the view of the Department of Mental Health and Addiction Services or the State of Connecticut. The views and opinions expressed are those of the authors.
Abbreviations
- 2-AG: 2-arachidonoyl glycerol
- CB1: cannabinoid receptor type 1
- CB2: cannabinoid receptor type 2
- FAAH: fatty acid amide hydrolase
- FR1: fixed ratio 1 schedule of reinforcement
- GPR55: orphan G protein-coupled receptor 55
- PPAR: peroxisome proliferator-activated receptors
- TRPV1: transient receptor potential cation channel subfamily V member 1
- VI30: variable interval schedule of reinforcement with average of 30 seconds
- VI60: variable interval schedule of reinforcement with average of 60 seconds
Footnotes
Conflict of Interest Statement
The authors declare no conflict of interest.
Data Accessibility Statement
Data are available via request from the authors.
References:
- Augustin SM, and Lovinger DM (2018). Functional Relevance of Endocannabinoid-Dependent Synaptic Plasticity in the Central Nervous System.
- Balleine BW, and Dickinson A (1998). Goal-directed instrumental action: Contingency and incentive learning and their cortical substrates. Neuropharmacology 37, 407–419.
- Barker JM, and Taylor JR (2014). Habitual alcohol seeking: Modeling the transition from casual drinking to addiction. Neurosci. Biobehav. Rev 47, 281–294.
- Barker JM, Torregrossa MM, Arnold AP, and Taylor JR (2010). Dissociation of Genetic and Hormonal Influences on Sex Differences in Alcoholism-Related Behaviors. J. Neurosci 30, 9140–9144.
- Barker JM, Torregrossa MM, and Taylor JR (2013). Bidirectional modulation of infralimbic dopamine D1 and D2 receptor activity regulates flexible reward seeking. Front. Neurosci 7, 1–7.
- Bedse G, Hartley ND, Neale E, Gaulden AD, Patrick TA, Kingsley PJ, Uddin MJ, Plath N, Marnett LJ, and Patel S (2017). Functional Redundancy Between Canonical Endocannabinoid Signaling Systems in the Modulation of Anxiety. Biol. Psychiatry 82, 488–499.
- Blednov YA, Cravatt BF, Boehm SL 2nd, Walker D, and Harris RA (2007). Role of endocannabinoids in alcohol consumption and intoxication: studies of mice lacking fatty acid amide hydrolase. Neuropsychopharmacology 32, 1570–1582.
- Bluett RJ, Báldi R, Haymer A, Gaulden AD, Hartley ND, Parrish WP, Baechle J, Marcus DJ, Mardam-Bey R, Shonesy BC, et al. (2017). Endocannabinoid signalling modulates susceptibility to traumatic stress exposure. Nat. Commun 8, 14782.
- Ceccarini J, Hompes T, Verhaeghen A, Casteels C, Peuskens H, Bormans G, Claes S, and Van Laere K (2014). Changes in cerebral CB1 receptor availability after acute and chronic alcohol abuse and monitored abstinence. J. Neurosci 34, 2822–2831.
- Ceccarini J, Kuepper R, Kemels D, van Os J, Henquet C, and Van Laere K (2015). [18F]MK-9470 PET measurement of cannabinoid CB1 receptor availability in chronic cannabis users. Addict. Biol 20, 357–367.
- Cippitelli A, Cannella N, Braconi S, Duranti A, Tontini A, Bilbao A, DeFonseca FR, Piomelli D, and Ciccocioppo R (2008). Increase of brain endocannabinoid anandamide levels by FAAH inhibition and alcohol abuse behaviours in the rat. Psychopharmacology (Berl). 198, 449–460.
- Corbit LH, Muir JL, and Balleine BW (2001). The Role of the Nucleus Accumbens in Instrumental Conditioning: Evidence of a Functional Dissociation between Accumbens Core and Shell. J. Neurosci 21, 3251–3260.
- Corbit LH, Ostlund SB, and Balleine BW (2002). Sensitivity to instrumental contingency degradation is mediated by the entorhinal cortex and its efferents via the dorsal hippocampus. J. Neurosci 22, 10976–10984.
- Craft RM, Marusich JA, and Wiley JL (2013). Sex differences in cannabinoid pharmacology: A reflection of differences in the endocannabinoid system? Life Sci. 92, 476–481.
- Derusso AL, Fan D, Gupta J, Shelest O, Costa RM, and Yin HH (2010). Instrumental uncertainty as a determinant of behavior under interval schedules of reinforcement. 4, 1–8.
- Devane WA, Hanus L, Breuer A, Pertwee RG, Stevenson LA, Griffin G, Gibson D, Mandelbaum A, Etinger A, and Mechoulam R (1992). Isolation and structure of a brain constituent that binds to the cannabinoid receptor. Science 258, 1946–1949.
- Dolan RJ, and Dayan P (2013). Goals and habits in the brain. Neuron 80, 312–325.
- Ersche KD, Gillan CM, Jones PS, Williams GB, Laetitia HE, Luijten M, De Wit S, Sahakian BJ, Bullmore ET, and Robbins TW (2016). Carrots and sticks fail to change behavior in cocaine addiction. Science 352, 1468–1471.
- Fattore L, Melis M, Fadda P, Pistis M, and Fratta W (2010). The endocannabinoid system and nondrug rewarding behaviours. Exp. Neurol 224, 23–36.
- Freund RK, Martin BJ, Jungschaffer DA, Ullman EA, and Collins AC (1988). Genetic differences in plasma corticosterone levels in response to nicotine injection. Pharmacol. Biochem. Behav 30, 1059–1064.
- Gerdeman GL, Partridge JG, Lupica CR, and Lovinger DM (2003). It could be habit forming: drugs of abuse and striatal synaptic plasticity. Trends Neurosci. 26, 184–192.
- Gianessi CA, Groman SM, and Taylor JR (2019). Bi-directional modulation of food habit expression by the endocannabinoid system. Eur. J. Neurosci
- Gianessi CA, Groman SM, Thompson SL, Jiang M, Stelt M, and Taylor JR (2020). Endocannabinoid contributions to alcohol habits and motivation: Relevance to treatment. Addict. Biol 25, e12768.
- Goodman J, and Packard MG (2015). The influence of cannabinoids on learning and memory processes of the dorsal striatum. Neurobiol. Learn. Mem 125, 1–14.
- Gourley SL, and Taylor JR (2016). Going and stopping: Dichotomies in behavioral control by the prefrontal cortex. Nat. Neurosci 19, 656–664.
- Gourley SL, Lee AS, Howell JL, Pittenger C, and Taylor JR (2010). Dissociable regulation of instrumental action within mouse prefrontal cortex. Eur. J. Neurosci 32, 1726–1734.
- Gremel CM, Chancey JH, Atwood BK, Luo G, Neve R, Ramakrishnan C, Deisseroth K, Lovinger DM, and Costa RM (2016). Endocannabinoid Modulation of Orbitostriatal Circuits Gates Habit Formation. Neuron 90, 1312–1324.
- Herkenham M, Lynn AB, Little MD, Johnson MR, Melvin LS, De Costa BR, and Rice KC (1990). Cannabinoid receptor localization in brain. Proc. Natl. Acad. Sci. U. S. A 87, 1932–1936.
- Hilario MRF, Clouse E, Yin HH, and Costa RM (2007). Endocannabinoid signaling is critical for habit formation. Front. Integr. Neurosci 1, 1–12.
- Hirvonen J, Goodwin RS, Li C, Terry GE, Zoghbi SS, Morse C, Pike VW, Volkow ND, Huestis MA, and Innis RB (2012). Reversible and regionally selective downregulation of brain cannabinoid CB1 receptors in chronic daily cannabis smokers. Mol. Psychiatry 17, 642–649.
- Hirvonen J, Zanotti-Fregonara P, Umhau JC, George DT, Rallis-Frutos D, Lyoo CH, Li C-T, Hines CS, Sun H, Terry GE, et al. (2013). Reduced cannabinoid CB1 receptor binding in alcohol dependence measured with positron emission tomography. Mol. Psychiatry 18, 916–921.
- Hirvonen J, Zanotti-Fregonara P, Gorelick DA, Lyoo CH, Rallis-Frutos D, Morse C, Zoghbi SS, Pike VW, Volkow ND, Huestis MA, et al. (2018). Decreased Cannabinoid CB1 Receptors in Male Tobacco Smokers Examined With Positron Emission Tomography. Biol. Psychiatry 84, 715–721.
- Kangas BD, Leonard MZ, Shukla VG, Alapafuja SO, Nikas SP, Makriyannis A, and Bergman J (2016). Comparisons of Δ9-tetrahydrocannabinol and anandamide on a battery of cognition-related behavior in nonhuman primates. J. Pharmacol. Exp. Ther 357, 125–133.
- Kathuria S, Gaetani S, Fegley D, Valiño F, Duranti A, Tontini A, Mor M, Tarzia G, La Rana G, Calignano A, et al. (2003). Modulation of anxiety through blockade of anandamide hydrolysis. Nat. Med 9, 76–81.
- Long JZ, Li W, Booker L, Burston JJ, Kinsey SG, Schlosburg JE, Pavón FJ, Serrano AM, Selley DE, Parsons LH, et al. (2008). Selective blockade of 2-arachidonoylglycerol hydrolysis produces cannabinoid behavioral effects. Nat. Chem. Biol 5, 37–44.
- Luk T, Jin W, Zvonok A, Lu D, Lin X-Z, Chavkin C, Makriyannis A, and Mackie K (2004). Identification of a potent and highly efficacious, yet slowly desensitizing CB1 cannabinoid receptor agonist. Br. J. Pharmacol 142, 495–500.
- Malvaez M, and Wassum KM (2018). Regulation of habit formation in the dorsal striatum. Curr. Opin. Behav. Sci 20.
- Marichal-Cancino BA, Sánchez-Fuentes A, Méndez-Díaz M, Ruiz-Contreras AE, and Prospéro-García O (2016). Blockade of GPR55 in the dorsolateral striatum impairs performance of rats in a T-maze paradigm. Behav. Pharmacol 27, 393–396.
- Di Marzo V, Ligresti A, and Cristino L (2009). The endocannabinoid system as a link between homoeostatic and hedonic pathways involved in energy balance regulation. Int. J. Obes 33, S18–S24.
- Mascia P, Pistis M, Justinova Z, Panlilio LV, Luchicchi A, Lecca S, Scherma M, Fratta W, Fadda P, Barnes C, et al. (2011). Blockade of nicotine reward and reinstatement by activation of alpha-type peroxisome proliferator-activated receptors. Biol. Psychiatry 69, 633–641.
- McKim TH, Bauer DJ, and Boettiger CA (2016). Addiction history associates with the propensity to form habits. J. Cogn. Neurosci 28, 1024–1038.
- McLaughlin PJ, Winston K, Swezey L, Wisniecki A, Aberman J, Tardif DJ, Betz AJ, Ishiwari K, Makriyannis A, and Salamone JD (2003). The cannabinoid CB1 antagonists SR 141716A and AM 251 suppress food intake and food-reinforced behavior in a variety of tasks in rats. Behav. Pharmacol 14, 583–588.
- McLaughlin PJ, Winston KM, Swezey LA, Vemuri VK, Makriyannis A, and Salamone JD (2010). Detailed analysis of food-reinforced operant lever pressing distinguishes effects of a cannabinoid CB1 inverse agonist and dopamine D1 and D2 antagonists. Pharmacol. Biochem. Behav 96, 75–81.
- Mechoulam R, Ben-Shabat S, Hanus L, Ligumsky M, Kaminski NE, Schatz AR, Gopher A, Almog S, Martin BR, and Compton DR (1995). Identification of an endogenous 2-monoglyceride, present in canine gut, that binds to cannabinoid receptors. Biochem. Pharmacol 50, 83–90.
- Navarrete F, Rodríguez-Arias M, Martín-García E, Navarro D, García-Gutiérrez MS, Aguilar MA, Aracil-Fernández A, Berbel P, Miñarro J, Maldonado R, et al. (2013). Role of CB2 cannabinoid receptors in the rewarding, reinforcing, and physical effects of nicotine. Neuropsychopharmacology 38, 2515–2524.
- Nazzaro C, Greco B, Cerovic M, Baxter P, Rubino T, Trusel M, Parolaro D, Tkatch T, Benfenati F, Pedarzani P, et al. (2012). SK channel modulation rescues striatal plasticity and control over habit in cannabinoid tolerance. Nat. Neurosci 15, 284–293.
- Oleson EB, Beckert MV, Morra JT, Lansink CS, Cachope R, Abdullah RA, Loriaux AL, Schetters D, Pattij T, Roitman MF, et al. (2012). Endocannabinoids Shape Accumbal Encoding of Cue-Motivated Behavior via CB1 Receptor Activation in the Ventral Tegmentum. Neuron 73, 360–373.
- Ortega-Álvaro A, Ternianov A, Aracil-Fernández A, Navarrete F, García-Gutiérrez MS, and Manzanares J (2015). Role of cannabinoid CB2 receptor in the reinforcing actions of ethanol. Addict. Biol 20, 43–55.
- Quinn JJ, Hitchcott PK, Umeda EA, Arnold AP, and Taylor JR (2007). Sex chromosome complement regulates habit formation. Nat. Neurosci 10, 1398–1400.
- Ryberg E, Larsson N, Sjögren S, Hjorth S, Hermansson NO, Leonova J, Elebring T, Nilsson K, Drmota T, and Greasley PJ (2007). The orphan receptor GPR55 is a novel cannabinoid receptor. Br. J. Pharmacol 152, 1092–1101.
- Schlosburg JE, Blankman JL, Long JZ, Nomura DK, Pan B, Kinsey SG, Nguyen PT, Ramesh D, Booker L, Burston JJ, et al. (2010). Chronic monoacylglycerol lipase blockade causes functional antagonism of the endocannabinoid system. Nat. Neurosci 13, 1113–1119.
- Sebold M, Deserno L, Nebe S, Schad DJ, Garbusow M, Hägele C, Keller J, Jünger E, Kathmann N, Smolka M, et al. (2014). Model-based and model-free decisions in alcohol dependence. Neuropsychobiology 70, 122–131.
- Serrano A, Pavon FJ, Buczynski MW, Schlosburg J, Natividad LA, Polis IY, Stouffer DG, Zorrilla EP, Roberto M, Cravatt BF, et al. (2018). Deficient endocannabinoid signaling in the central amygdala contributes to alcohol dependence-related anxiety-like behavior and excessive alcohol intake. Neuropsychopharmacology.
- Shan Q, Christie MJ, and Balleine BW (2015). Plasticity in striatopallidal projection neurons mediates the acquisition of habitual actions. Eur. J. Neurosci 42, 2097–2104.
- Sink KS, Vemuri VK, Olszewska T, Makriyannis A, and Salamone JD (2008). Cannabinoid CB1 antagonists and dopamine antagonists produce different effects on a task involving response allocation and effort-related choice in food-seeking behavior. Psychopharmacology (Berl). 196, 565–574.
- Sugiura T, Kondo S, Sukagawa A, Nakane S, Shinoda A, Itoh K, Yamashita A, and Waku K (1995). 2-Arachidonoylglycerol: a possible endogenous cannabinoid receptor ligand in brain. Biochem. Biophys. Res. Commun 215, 89–97.
- Sutton RS, and Barto AG (1998). Reinforcement learning: an introduction (Cambridge, Mass: MIT Press).
- Torregrossa MM, and Taylor JR (2016). Neuroscience of learning and memory for addiction medicine: from habit formation to memory reconsolidation. In Progress in Brain Research, (Elsevier B.V.), pp. 91–113.
- Voon V, Derbyshire K, Rück C, Irvine MA, Worbe Y, Enander J, Schreiber LRN, Gillan C, Fineberg NA, Sahakian BJ, et al. (2015). Disorders of compulsivity: A common bias towards learning habits. Mol. Psychiatry 20, 345–352.
- Voon V, Reiter A, Sebold M, and Groman S (2017). Model-Based Control in Dimensional Psychiatry. Biol. Psychiatry 82, 391–400.
- Wiskerke J, Irimia C, Cravatt BF, De Vries TJ, Schoffelmeer ANM, Pattij T, and Parsons LH (2012). Characterization of the effects of reuptake and hydrolysis inhibition on interstitial endocannabinoid levels in the brain: An in vivo microdialysis study. ACS Chem. Neurosci 3, 407–417.
- Wu C-S, Chen H, Sun H, Zhu J, Jew CP, Wager-Miller J, Straiker A, Spencer C, Bradshaw H, Mackie K, et al. (2013). GPR55, a G-Protein Coupled Receptor for Lysophosphatidylinositol, Plays a Role in Motor Coordination. PLoS One 8, e60314.
- Xi ZX, Peng XQ, Li X, Song R, Zhang HY, Liu QR, Yang HJ, Bi GH, Li J, and Gardner EL (2011). Brain cannabinoid CB2 receptors modulate cocaine’s actions in mice. Nat. Neurosci 14, 1160–1168.
- Yin HH, and Knowlton BJ (2006). The role of the basal ganglia in habit formation. Nat. Rev. Neurosci 7, 464–476.
- Yin HH, Mulcare SP, Hilário MRFF, Clouse E, Holloway T, Davis MI, Hansson AC, Lovinger DM, and Costa RM (2009). Dynamic reorganization of striatal circuits during the acquisition and consolidation of a skill. Nat. Neurosci 12, 333–341.
Data Availability Statement
Data are available via request from the authors.