2022 Mar 14;32(5):1163–1174.e6. doi: 10.1016/j.cub.2021.12.027

Striatal dopamine signals are region specific and temporally stable across action-sequence habit formation

Wouter van Elzelingen 1,2, Pascal Warnaar 1,2, João Matos 1,2, Wieneke Bastet 1,2, Roos Jonkman 1,2, Dyonne Smulders 1,2, Jessica Goedhoop 1,2, Damiaan Denys 1,2, Tara Arbab 1,2, Ingo Willuhn 1,2,3,4
PMCID: PMC8926842  PMID: 35134325

Summary

Habits are automatic, inflexible behaviors that develop slowly with repeated performance. Striatal dopamine signaling instantiates this habit-formation process, presumably region specifically and via ventral-to-dorsal and medial-to-lateral signal shifts. Here, we quantify dopamine release in regions implicated in these presumed shifts (ventromedial striatum [VMS], dorsomedial striatum [DMS], and dorsolateral striatum [DLS]) in rats performing an action-sequence task and characterize habit development throughout a 10-week training. Surprisingly, all regions exhibited stable dopamine dynamics throughout habit development. VMS and DLS signals did not differ between habitual and non-habitual animals, but DMS dopamine release increased during action-sequence initiation and decreased during action-sequence completion in habitual rats, whereas non-habitual rats showed opposite effects. Consistently, optogenetic stimulation of DMS dopamine release accelerated habit formation. Thus, we demonstrate that dopamine signals do not shift regionally during habit formation and that dopamine in DMS, but not VMS or DLS, determines habit bias, attributing “habit functions” to a region previously associated exclusively with non-habitual behavior.

Keywords: dopamine, striatum, habit formation, behavior, basal ganglia, habits, automated behavior, goal-directed behavior, action repetition, rat

Highlights

  • Validation of a novel test that monitors habit development individually across time

  • Dopamine release during habit development is stable across relevant striatal regions

  • Only dopamine release in dorsomedial striatum correlates with habit development

  • Optogenetic stimulation of dorsomedial striatal dopamine accelerates habit formation


Van Elzelingen et al. simultaneously track striatal dopamine signaling and development of habitual behavior. In the dorsomedial striatum, previously associated with non-habitual behavior, dopamine release increases during action initiation in habitual rats and decreases during action completion, whereas non-habitual rats show opposite effects.

Introduction

Our daily lives are governed by learned routines, exemplified by many actions we perform automatically without paying attention, such as pulling out our smartphone to seek rewarding input by perpetually scrolling down the screen. Such habits are thought to gradually develop from initially goal-directed, conscious actions (e.g., using our smartphone to contact a friend) that reliably resulted in desired outcomes. Habit formation is often tied to detection of stimuli predictive of these outcomes and is facilitated by both extended high-rate action repetition (referred to as “overtraining”1) and relative uncertainty as to when precisely the desired outcome will be achieved.2,3

The striatum has long been implicated in habit formation,4, 5, 6 and its subregions are assumed to fulfill distinct functions in this process. The dorsolateral striatum (DLS) is hypothesized to actively mediate habit learning, supported by ample evidence from rodent experiments which link DLS activity with (overtraining-induced) habit formation and DLS lesion or inactivation with attenuated habitual responding.7, 8, 9, 10, 11, 12, 13, 14 The (posterior) dorsomedial striatum (DMS) is hypothesized to control goal-directed behavior and, thus, oppose habitual strategies.15,16 A third functional domain heavily investigated in the context of reward learning is the ventromedial striatum (VMS), which is involved in motivation and pre-programmed appetitive behaviors.17,18 The behavioral impact of these three regions is thought to evolve with the gradual development of a habit, reflecting an inter-regional shift of the locus of behavioral control: a dorsal shift away from VMS after early stages of instrumental learning9,17, 18, 19, 20 and a relative transfer from DMS to DLS as behavior becomes more ingrained and habitual.7,9,13,16,21, 22, 23, 24, 25, 26, 27

These presumed intra-striatal shifts of the locus of behavioral control are thought to depend on striatal-dopaminergic neurotransmission, which is in line with the anatomy of the dopamine system28 and is corroborated by studies using dopaminergic drugs as reinforcers.29, 30, 31 Consistently, ample evidence implicates striatal dopamine in habit formation,6,9,23,24,32 and performance of ingrained action sequences is reflected in action-chunking activity patterns both in striatal and dopamine neurons.33 Furthermore, genetic deletion of dopamine-neuron NMDA receptors impairs habit learning,34 as does lesion of nigrostriatal dopamine projections,35 whereas pharmacological enhancement of dopamine transmission biases behavior toward habit formation.36 In humans, dopamine-depleted Parkinson’s patients exhibit deficits in habitual control.6,37,38 However, although dopamine is indubitably involved in habit formation, its regional specificity and its involvement in the widely assumed regional shift of behavioral control during the transition from goal-directed to habitual behavior are understudied. Conclusions were advanced on the grounds of anatomical studies and sparse functional evidence: one study tracked regionally shifting striatal dopamine signaling across development of a presumed habit reinforced by cocaine,31 whereas another study reported no such shift using a natural reinforcer.39 Thus, the central questions that remain are how region-specific and regionally shifting dopamine signals regulate habit formation.

In experimental animals, habit formation is best achieved by either overtraining or employing variable-interval (VI) reinforcement schedules that lead to high response rates and reward-timing uncertainty.1, 2, 3,40, 41, 42, 43 Dickinson and collaborators developed a widely applied operational definition that became the gold standard for habit classification:9,40,41,44 an insensitivity to (short-term) changes in value of action outcomes and/or action-outcome contingency.42 However, this definition identifies habits by the failure to satisfy criteria for goal-directed action (negative definition), including the possibility that unforeseen factors compromise integration of changes in action-outcome conditioning.45,46 Besides, traditionally used habit measures can only be applied once in a given individual and are not suited to detect individual differences, both of which are critical features, since habits develop slowly and recruitment of habitual strategies varies significantly between individuals. To address these limitations, we set out to develop a paradigm that “positively” identifies habits, accounts for individual differences, and may be repeated within a subject to track gradual habit development.

We delineate the differential contributions of relevant striatal subregions in habit formation by measuring real-time fluctuations in extracellular dopamine concentrations in VMS, DMS, and DLS of rats performing in an action-sequence reinforcement task (a “seeking-taking” chain task, which combines VI and fixed-ratio (FR) schedules and overtraining) and subsequently validating the role of region-specific dopamine signaling using optogenetic stimulation. Our findings challenge the traditionally assumed shifting of regional dopamine dynamics and surprisingly indicate that DMS, instead of DLS, dopamine determines the transition to habitual behavior, thereby questioning the idea that DMS opposes habit formation.

Results

Habitual behavior is induced by excessive behavioral repetition (overtraining)9,11,24,47 or by reinforcing behavior on a VI schedule.3,48, 49, 50 Here, we overtrained animals under a chained VI60-FR1 reinforcement schedule, thereby combining the two most prominent paradigms. Rats learned to press a seeking lever (distal to reward) on a VI60 schedule (followed by a 3-s delay in which no levers were extended) and, subsequently, an FR1 taking lever (proximal to reward) that produced a food pellet (Figure 1A). The number of seeking presses under VI reinforcement increased over 10 weeks of training (week 1 versus week 10; t35 = 12.21, p < 0.0001; Figure 1B, left), whereas the number of food-magazine head entries did not (n = 36, Z = −0.440, p = 0.660; Figure 1B, middle). We assessed the probability with which rats checked the magazine for food during the 3-s delay period between retraction of the seeking lever and extension of the taking lever. This probability decreased dramatically across weeks (week 1 versus week 10: n = 36, Z = −5.185, p < 0.0001; Figure 1B, right), demonstrating that animals learned the task structure quickly.
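The event order of a single VI60-FR1 trial described above can be sketched as a small timeline calculation. This is an illustration only, not the authors' task-control code: the function names, the exponential draw for the variable interval, and the fixed 1-s taking latency are assumptions made here for the example; only the two 3-s delays and the mean interval of 60 s come from the text.

```python
import random

DELAY_S = 3.0  # 3-s delays stated in the text (before taking lever, before pellet)

def draw_vi_interval(mean_s=60.0, rng=random):
    """Draw one variable interval with the given mean.

    Assumption: an exponential distribution is used here for illustration;
    the source does not state how intervals were generated.
    """
    return rng.expovariate(1.0 / mean_s)

def trial_timeline(press_times, interval_s, taking_latency_s=1.0):
    """Return event times (s from seeking-lever extension) for one trial.

    The first seeking press at or after `interval_s` is treated as the final
    ("intermediate") press that ends the VI segment. `taking_latency_s` is a
    hypothetical stand-in for the animal's latency to press the taking lever.
    """
    ordered = sorted(press_times)
    final_press = next((t for t in ordered if t >= interval_s), None)
    if final_press is None:
        return None  # the VI segment was not completed by these presses
    taking_extended = final_press + DELAY_S
    taking_press = taking_extended + taking_latency_s
    return {
        "seeking_presses": sum(1 for t in ordered if t <= final_press),
        "final_seeking_press": final_press,
        "taking_lever_extended": taking_extended,
        "taking_press": taking_press,
        "pellet_delivered": taking_press + DELAY_S,
    }

# Hypothetical press times, not data from the study.
example = trial_timeline([5.0, 20.0, 41.0, 70.0, 90.0], interval_s=60.0)
```

In this sketch, presses made before the interval elapses count as distal seeking presses but do not advance the trial, which is what makes the timing of the outcome uncertain from the animal's perspective.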

Figure 1.

A seeking-taking chain task to monitor dopamine signaling during habit formation

(A) A VI60:FR1 trial begins with seeking-lever extension. Seeking-lever presses (“distal”) are registered until the final VI60 press (“intermediate”), which triggers seeking-lever retraction, followed by taking-lever extension 3 s later. Approximately 1 s later, animals press the taking lever (“proximal”), it is retracted, and the food-pellet reward is delivered 3 s later, marking the start of the variable inter-trial interval (ITI).

(B) Number of seeking-lever presses (left) and food-magazine head entries (middle) during the VI60 segment. The probability of a food-magazine head entry during the 3-s period following seeking-lever retraction and prior to taking-lever extension (right) decreases rapidly with training, demonstrating task acquisition. Data are mean ± SEM (black) with individual animals (gray).

(C) Rats that were trained for 10 weeks underwent outcome devaluation via sensory-specific satiety (pre-feeding), in which, on two separate days, rats had 1-h ad libitum access to either regular pellets (bottom) or alternative (grain-based) pellets (top) before they were exposed to the seeking-lever (extinction) test. Seeking was unaffected by outcome devaluation, demonstrating that a habit was induced. Subsequently, rats were exposed to both pellets (“choice test”) to demonstrate that outcome-specific devaluation was successful (non-pre-fed pellets preferred; p < 0.001).

(D) Coronal brain sections with recording sites in VMS (blue), DMS (green), and DLS (red), relative to bregma.51

(E) Representative verification of electrode-tip placement in VMS (blue circle), DMS (green circle), and DLS (red circle) and corresponding electrode tracks (arrows).

(F) Dopamine release to an unpredicted food pellet demonstrates stable electrode sensitivity. Data are median with range in quartiles and mean (+); ns, not significant.

(G) DMS dopamine following seeking-lever extension (distal) was modestly diminished after 10 weeks. No significant differences (ns) were observed in VMS or DLS.

(H) During proximal actions, no significant differences (ns) were detected in dopamine release between weeks 1 and 10. ∗p < 0.05, ∗∗∗p < 0.001.

See also Figure S1.

We tested whether behavior persists despite reward devaluation by sensory-specific satiety via a 1-h pre-feeding of regular training pellets (devalued condition) or alternative pellets (valued condition) after 10 weeks of seeking-taking training (Figure 1C, left). Subsequent presentation of the seeking lever under extinction conditions generated no significant difference in the number of seeking-lever presses between valued and devalued conditions (n = 14, Z = −0.785, p = 0.433; Figure 1C, middle), demonstrating that seeking-taking training induced a habit. A “pellet choice test” verified that satiety was specific to the pre-fed pellet type, i.e., each pellet type was devalued effectively (n = 14, valued condition: Z = −3.296, p < 0.001; devalued condition: Z = −3.297, p < 0.001; Figure 1C, right).

We characterized dopamine release longitudinally across weeks, using fast-scan cyclic voltammetry in VMS, DMS, and DLS. Electrode implantation induced minimal brain damage, and placement was verified (Figure 1E). Dopamine release in response to unpredicted delivery of a food pellet was measured as a proxy for electrode sensitivity, which remained stable across weeks in all regions (VMS: t24 = 0.7162, p = 0.4808; DMS: t10 = 1.355, p = 0.2052; DLS: t15 = 1.046, p = 0.3121; Figures 1F and S1). Besides modestly diminished DMS dopamine release upon seeking-lever extension (distal; t10 = 2.267, p = 0.0468), no longitudinal differences were found (distal VMS: t24 = 1.186, p = 0.2473; distal DLS: t18 = 0.1541, p = 0.8793; proximal VMS: t24 = 1.798, p = 0.0848; proximal DMS: t10 = 1.041, p = 0.3223; proximal DLS: t18 = 0.3994, p = 0.6943; Figures 1G and 1H). Thus, regional dopamine release is relatively stable across training, with the only task-relevant changes occurring in DMS.

To overcome limitations of traditional habit testing, we developed a test during which both seeking and taking levers were extended simultaneously for 1 min under extinction conditions (Figure 2A, left), allowing rats the choice to engage with the lever proximal to reward delivery (taking; goal-directed choice) or the lever distal to reward delivery (seeking; habitual choice). In week 1, rats exhibited a preference for the taking over the seeking lever, but by week 10, this preference had shifted to the seeking lever (n = 36; week 1: Z = −3.540, p = 0.0004; week 10: Z = −4.357, p < 0.0001; Figure 2A, right). This temporal evolution of lever preference was highly reliable between different rat cohorts (Figure S2A). Extinction effects due to repeated testing are unlikely because (1) the time spent in regular training exceeds preference testing almost 500 times and (2) the frequency of preference tests did not influence the habit readout (Figures S2A and S2B).

Figure 2.

A novel preference test to measure the development of habits

(A) Habit formation was tested by presenting both levers simultaneously for 1 min under extinction conditions (left). In week 1 of VI60:FR1 seeking-taking training, rats preferentially pressed the taking lever, but by week 10, preference had shifted to the seeking lever (right).

(B) In the habit group, the number of food-magazine head entries remained stable (top) across weeks, but rats switched from a taking- to a strong seeking-lever preference (middle) and exhibited more seeking than taking presses in week 10 (bottom).

(C) Food-magazine head entries remained stable in the no-habit group (top), but rats switched from a taking-lever preference to no preference (middle) and exhibited no significant difference between seeking and taking presses in week 10 (bottom).

(D and E) Habitual rats performed more seeking-lever presses (D) and fewer taking-lever presses (E) than non-habitual rats in week 10, but not in week 1.

(F) The seeking-taking preference index demonstrates a higher seeking-lever preference in habitual rats compared to non-habitual rats in week 10, but not in week 1.

(G) The conditioning “logic” translated from training to preference test: food-magazine head entries were executed more frequently after taking-lever presses than after seeking-lever presses, both in habitual and non-habitual rats. Histogrammed data with mean ± SEM shown separately.

(H and I) Linear regression analyses demonstrate associations of the seeking-lever index with other habit-like behavior during the preference test, i.e., (H) a positive correlation with seeking-lever stickiness and (I) a negative correlation with food-magazine head-entry probability. Data are mean ± SEM, in some panels combined with individual animals (open circles). ∗p < 0.05, ∗∗∗p < 0.001.

See also Figures S2 and S3.

To capitalize on individual differences, we classified animals (same data as in Figures 1C–1H) into two equal-sized groups based on their relative seeking-taking preference in week 10 (i.e., number of seeking minus number of taking presses, divided by total number of presses, as shown in Figure 2F), which is identical to a median split. There were no significant changes in number of head entries into the food magazine over time in either group (Figures 2B and 2C [top]). Both groups displayed a taking-lever preference in week 1 (habit: n = 18, Z = −1.810, p = 0.0370; no habit: n = 18, Z = −3.312, p = 0.0002; Figure 2C), but only the habitual animals developed a steadily increasing seeking-lever preference (habit group: lever, F1,44 = 58.17, p < 0.0001; lever x week interaction, F4,98 = 42.82, p < 0.0001; no-habit group: lever, F1,46 = 1.206, p = 0.2779; Figures 2B and 2C [middle]) and exhibited this preference in week 10 (habit: n = 18, Z = −3.724, p < 0.0001; no habit: n = 18, Z = −1.491, p = 0.1454; Figures 2B and 2C [bottom]). Habitual animals showed more seeking presses (U(nhabit = 18, nno-habit = 18) = 11.500, Z = −4.765, p < 0.0001; Figure 2D) and fewer taking presses (U(nhabit = 18, nno-habit = 18) = 1.500, Z = −5.098, p < 0.0001; Figure 2E) compared to non-habitual rats in week 10, but not week 1 (seeking: U(nhabit = 18, nno-habit = 18) = 106.500, Z = −1.758, p = 0.0790; Figure 2D; taking: U(nhabit = 18, nno-habit = 18) = 153.500, Z = −0.269, p = 1.0000; Figure 2E). Consistently, grouping the rats based on week-10 performance (Figure 2F) separated habit from no-habit animals in week 10 (U(nhabit = 18, nno-habit = 18) = 0.000, Z = −5.144, p < 0.0001), but not week 1 (U(nhabit = 18, nno-habit = 18) = 115.000, Z = −1.487, p = 0.1370).
Across weeks 1 and 10, seeking-lever presses (n = 18, Z = −3.724, p = 0.0006; Figure 2D) and seeking-preference index (n = 18, Z = −3.724, p = 0.0006; Figure 2F) increased in habitual animals, whereas taking responses decreased (n = 18, Z = −3.725, p = 0.0006; Figure 2E). In non-habitual animals, seeking responses (n = 18, Z = −3.680, p = 0.0005; Figure 2D) and seeking-preference index (n = 18, Z = −3.636, p = 0.0005; Figure 2F) also increased, albeit to a lesser extent than in habitual rats, whereas taking responses did not change (n = 18, Z = −0.104, p = 0.9175; Figure 2E).
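The seeking-taking preference index and the median-split classification described above can be sketched in a few lines. This is a minimal illustration, assuming per-animal press counts from one preference test are available; the function names and the example counts are hypothetical and are not taken from the study.

```python
from statistics import median

def preference_index(seeking: int, taking: int) -> float:
    """(seeking - taking) / (seeking + taking); ranges from -1 to +1."""
    total = seeking + taking
    if total == 0:
        raise ValueError("no presses recorded; index is undefined")
    return (seeking - taking) / total

def median_split(indices: dict) -> dict:
    """Label animals above the median index 'habit', the rest 'no-habit'.

    With an even number of animals and no ties at the cutoff this yields
    two equal-sized groups, as in the text.
    """
    cutoff = median(indices.values())
    return {rat: ("habit" if idx > cutoff else "no-habit")
            for rat, idx in indices.items()}

# Hypothetical week-10 press counts (seeking, taking), not data from the study.
counts = {"rat_a": (30, 5), "rat_b": (12, 12), "rat_c": (20, 4), "rat_d": (6, 10)}
indices = {rat: preference_index(s, t) for rat, (s, t) in counts.items()}
groups = median_split(indices)
```

An index near +1 reflects pressing almost exclusively on the seeking lever, an index near −1 reflects pressing almost exclusively on the taking lever, and 0 reflects equal sampling of both levers.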

During lever-preference tests, habitual and non-habitual rats made more head entries into the food magazine after pressing the taking lever than the seeking lever (Figure 2G) in weeks 1 (habit group: t32 = 7.542, p < 0.0001; non-habit group: t34 = 8.488, p < 0.0001) and 10 (habit group: t27 = 2.510, p = 0.0184; non-habit group: t34 = 6.202, p < 0.0001). Thus, both groups learned the task structure and translated it from training sessions (levers presented sequentially) to the test situation (levers presented simultaneously). Habit and no-habit rats did not differ in number of head entries that followed seeking- or taking-lever presses in weeks 1 (seeking: t32 = 1.506, p = 0.4257; taking: t34 = 1.430, p = 0.3238; Figure 2G, left) or 10 (seeking: t34 = 0.5440, p = 0.5900; taking: t27 = 2.393, p = 0.0956; Figure 2G, right). Notably, repeated presses on the same lever were much more prevalent than seeking-taking action sequences in both habit and no-habit animals in both weeks 1 and 10 (Figures S3A–S3C). Thus, simultaneous lever presentation during the preference test is marked by rats “getting stuck” on one of the levers (no-habit rats on the taking lever [Figure S3B] and habit rats on the seeking lever [Figure S3C]) rather than performing action sequences involving both levers (Figure S3A).

We validated the preference test by carrying out regression analyses, testing for a relationship between the seeking-taking preference index (shown in Figure 2F) and other “habit-like” behaviors displayed during the test. The preference index correlated positively with the probability to repeat seeking-lever presses (R value = 0.6797, p < 0.0001; Figure 2H) and negatively with the probability to make food-magazine head entries (R value = −0.4301, p = 0.0088; Figure 2I).
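As a rough illustration of this kind of validation, the sketch below computes a correlation coefficient between a per-animal preference index and a second behavioral measure. It assumes the reported R value corresponds to Pearson's r from a simple linear regression; the function name and the example values are hypothetical and are not data from the study.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)

# Hypothetical per-animal values, chosen to be perfectly linear for clarity.
index = [-0.5, 0.0, 0.5, 1.0]
stickiness = [0.2, 0.4, 0.6, 0.8]    # probability of repeating a seeking press
head_entry = [0.8, 0.6, 0.4, 0.2]    # probability of a magazine head entry

r_stickiness = pearson_r(index, stickiness)   # positive relationship
r_head_entry = pearson_r(index, head_entry)   # negative relationship
```

The sign of the coefficient carries the interpretation used in the text: a positive value means animals with a stronger seeking-lever preference also repeat seeking presses more, and a negative value means they check the food magazine less.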

Classifying rats as habitual or non-habitual based on week-10 preference-test performance revealed that the two groups perform differently during seeking-taking training (Figures 3A and S4): the frequency of “distal” seeking-lever presses across VI training weeks was higher in habitual rats (group, F1,34 = 18.48, p = 0.0001; group x week interaction, F5,170 = 3.154, p = 0.0095). Conversely, the frequency of distal food-magazine head entries during VI was lower in habitual rats (group, F1,32 = 10.66, p = 0.0026). In contrast, we observed no differences in head-entry frequency outside VI when levers were retracted (group, F1,32 = 0.0003168, p = 0.9859) and in “intermediate” head-entry probability during the 3-s period following seeking-lever retraction, prior to taking-lever extension (group, F1,34 = 1.463, p = 0.2348). Together, these data suggest that habit animals exhibit a greater affinity to the seeking lever (habit-like behavior) paired with less outcome checking (goal-directed behavior) and that group differences are specific to task-relevant actions and do not reflect discrepancies in general activity.

Figure 3.

Diametrically opposing DMS dopamine signals in habitual and non-habitual rats during distal and proximal reward seeking

(A) Habitual animals showed a higher frequency of distal seeking-lever presses during the VI60 trial segment compared to non-habitual rats (middle left). Conversely, habitual animals exhibited fewer food-magazine head entries during VI60 (middle right). However, this difference in head entries was not present outside VI60, when no levers were extended (left). Similarly, no group differences were observed for the head-entry probability during the 3-s period following seeking-lever retraction and prior to taking-lever extension (right). Data are mean ± SEM.

(B) Dopamine release to an unpredicted food-pellet delivery (proxy for electrode sensitivity) did not differ between groups. Data are median with range in quartiles and mean (+); ns, not significant.

(C) During distal actions (start of VI60), no significant dopamine group differences were observed in VMS or DLS. However, DMS dopamine was greater in habitual rats in both weeks 1 and 10. Data are mean + SEM; traces are aligned to seeking-lever extension.

(D) During proximal actions, no significant group differences in dopamine were observed in VMS or DLS. However, again, DMS dopamine differed between habitual and non-habitual rats, but in contrast to distal actions, proximal events significantly decreased dopamine in habitual animals in both weeks 1 and 10. Data are mean + SEM; gray-shaded areas indicate approximate final seeking-press timing.

(E) To evaluate the association of dopamine with reward seeking (distal actions: lever presses early in the VI60 segment), trial-by-trial correlations were calculated for week 1 (top) and week 10 (bottom). Gray bars display the animals’ R values distributed over given histogram brackets. Each circle represents a single animal (habitual in orange, non-habitual in brown), and significant correlations are depicted as filled circles and non-significant correlations as empty circles. VMS demonstrated the closest relationship between dopamine and seeking-lever presses. ∗p < 0.05, ∗∗p < 0.01, ∗∗∗p < 0.001.

See also Figure S4.

Dopamine release to an unpredicted food pellet (proxy for electrode sensitivity) did not differ significantly between habitual and non-habitual animals (Figure 3B) in VMS (week 1: t23 = 0.3175, p = 0.7537; week 10: t26 = 1.146, p = 0.2623), DMS (week 1: t10 = 0.7898, p = 0.4480; week 10: t11 = 0.5837, p = 0.5712), and DLS (week 1: t17 = 1.541, p = 0.1417; week 10: t18 = 0.4060, p = 0.6895). Dopamine measurements throughout the 10-week training (Figures 3C and 3D) did not reveal any differences between habitual and non-habitual rats during “distal,” “intermediate,” or “proximal” actions in VMS (distal week 1: t23 = 0.8278, p = 0.4163; distal week 10: t26 = 1.648, p = 0.2228; proximal week 1: t23 = 0.6261, p = 1.0000; proximal week 10: t26 = 0.3088, p = 0.7600) and DLS (distal week 1: t20 = 0.6727, p = 1.0000; distal week 10: t17 = 0.09775, p = 0.9233; proximal week 1: t20 = 0.6011, p = 0.5545; proximal week 10: t18 = 1.922, p = 0.1412). However, distal DMS dopamine concentration was higher in habitual animals in week 10 (t11 = 5.213, p = 0.0006), a surprising effect that was already detectable in week 1 (t10 = 2.438, p = 0.0350). Remarkably, this effect was reversed during intermediate and proximal actions: habitual animals had lower DMS dopamine in both week 1 (t10 = 5.428, p = 0.0006) and week 10 (t11 = 3.663, p = 0.0037). Within-animal trial-by-trial correlations (Figure 3E) revealed that reward seeking in itself, irrespective of training amount, was strongly associated with VMS, but not DMS or DLS dopamine; seeking-lever pressing in itself, therefore, does not bring about the changes in DMS dopamine we observe during habit development. In summary, habitual behavior was correlated with increased DMS dopamine release during distal behavioral responses, which was decreased during responses proximal to reward delivery. 
Moreover, DMS dopamine differences were already apparent early in training (before habits developed) and were not involved in (motor) performance of distal seeking.

To determine whether augmented DMS dopamine signaling is sufficient in itself to induce habitual behavior, we bilaterally optogenetically stimulated the terminals of DMS-projecting dopamine neurons during seeking presses (Figures 4A and 4B). We controlled for possible reinforcing effects by exposing rats to an open-field arena where presence of the animal in a specific quadrant triggered photo-stimulation (real-time place preference [RT-PP] assay) and observed no significant difference between ChR2 and control rats (expressing EYFP only) in time spent in the active quadrant (t15 = 1.745, p = 0.1014; Figure 4C). In comparison, other studies applying RT-PP via dopamine photo stimulation have reported animals spending about 80% of their time in the photo-stimulated position,52, 53, 54, 55, 56 whereas we observed only 45%. Similarly, when rats were given the opportunity to self-stimulate DMS dopamine terminals optogenetically by poking their noses into a device (over the course of 3 days), there was no significant difference in the number of active or inactive nose pokes between ChR2 and EYFP rats (active-port group, F1,9 = 4.476, p = 0.0635, time x group interaction: F2,18 = 2.566, p = 0.105; inactive-port group, F1,9 = 2.127, p = 0.1787, time x group interaction: F2,18 = 1.046, p = 0.372; Figure 4D), and there was no difference in active and inactive nose pokes within either of these groups (ChR2: poke-port, F1,12 = 4.514, p = 0.0551, time x poke-port interaction: F2,24 = 1.395, p = 0.267; EYFP: poke-port, F1,6 = 0.06194, p = 0.8118, time x poke-port interaction: F2,12 = 0.016, p = 0.984; Figure 4D). Furthermore, for ChR2 animals, no difference between active and inactive pokes was detected on day 3 (t9 = 1.576, p = 0.150). In comparison, overall, other studies applying optogenetic self-stimulation of dopamine have reported about 25–40 responses/min,52,55,57,58 whereas we observed only about 1 response/min on day 1 and 0.15/min on day 3. 
Taken together, stimulation of dopaminergic DMS terminals was without significant reinforcing effects.

Figure 4.

Optogenetic stimulation of DMS dopamine during seeking accelerates habit formation

(A) AAV injection into substantia-nigra pars-compacta (SNpc) of Th::Cre rats induced ChR2-EYFP or EYFP (control) expression in dopamine neurons. DMS dopamine terminals were optogenetically stimulated bilaterally.

(B) Immunostaining for tyrosine hydroxylase (TH) and EYFP indicates selective virus expression in midbrain dopamine neurons (bottom) and their striatal terminals (top). SNR, substantia nigra pars reticulata; VTA, ventral tegmental area.

(C) Percentage of time spent in the DMS-photo-stimulated (active) quadrant of an open-field arena. No significant difference (ns) was observed between ChR2 and EYFP rats.

(D) Nose pokes into the active port triggered DMS dopamine photo stimulation. We observed no significant differences (ns) within or between ChR2 and EYFP animals in the number of active and inactive nose pokes.

(E) During seeking-taking training, seeking presses triggered DMS dopamine photo stimulation. The conditioning “logic” translated from training to preference test: food-magazine head entries were executed more frequently after taking-lever presses than after seeking-lever presses, both in ChR2 and EYFP rats. Histogrammed data with mean ± SEM shown separately.

(F) No significant differences were found in frequency of food-magazine head entries outside VI60, when no levers were extended.

(G) ChR2 animals made more distal VI60 seeking-lever presses (left). No significant difference was found for food-magazine head entries during VI60 (right).

(H) Similarly, no group differences were observed for the head-entry probability during the 3-s period following seeking-lever retraction and prior to taking-lever extension (right).

(I) In week 1, both ChR2 and EYFP rats preferentially pressed the taking lever during the habit test. By week 3, this preference had shifted to the seeking lever for the ChR2 rats, but not the EYFP rats.

(J) The ChR2 group made significantly more seeking-lever responses, but groups did not differ in taking-lever presses. Data in (E) and (F) are mean ± SEM. ∗p < 0.05, ∗∗p < 0.01.

During seeking-taking training, each (distal) seeking press triggered a 1-s DMS photo stimulation. The lever-preference test showed that both groups made more food-magazine head entries after taking compared to seeking responses (ChR2: lever, F1,18 = 22.49, p = 0.0002; EYFP: lever, F1,12 = 20.10, p = 0.0007; Figure 4E). Furthermore, we detected no differences over time (between weeks 1 and 3) between the number of ChR2 and EYFP rats that performed taking-lever-to-head-entry sequences (group, F1,15 = 0.003275, p = 0.9551) or seeking-lever-to-head-entry sequences (group, F1,15 = 0.1307, p = 0.7228), indicating that task-structure representation was stable across training sessions.

During seeking-taking training, ChR2 animals increased their seeking-lever presses during VI60 compared to EYFP controls (group, F1,15 = 18.27, p = 0.0007, group x week interaction, F4,60 = 15.20, p < 0.0001; Figure 4G, left). Post hoc analysis revealed that the number of seeking-lever presses was higher in weeks 1–3 (respectively: t15 = 4.546, p = 0.0012; t15 = 4.521, p = 0.0016; t15 = 4.654, p = 0.0015), indicating that optogenetic DMS dopamine stimulation promoted habit formation. In contrast, no differences in frequency of magazine head entries were found between ChR2 and EYFP rats, either during VI (group, F1,15 = 0.3029, p = 0.5901; Figure 4G, right) or outside VI (group, F1,15 = 1.088, p = 0.3134; Figure 4F). Therefore, DMS dopamine did not simply invigorate general responding but specifically enhanced habitual responding. No group differences were found for head-entry probability during the 3-s period following seeking-lever retraction and prior to taking-lever extension (group, F1,15 = 0.4022, p = 0.5355; Figure 4H), indicating that photo stimulation did not affect task-structure comprehension. In comparison, optogenetic effects are not robust in the self-stimulation condition (Figure 4D) but are very robust when applied in conjunction with seeking-lever presses during seeking-taking training (Figure 4G, left). Thus, we conclude that optogenetic effects on seeking-lever preference cannot be explained by reinforcing properties of DMS photo stimulation alone.

DMS photo-stimulation effects also translated into behavior during the lever-preference test. ChR2 animals pressed the seeking lever more than the taking lever across 3 weeks of training (lever, F1,15 = 7.840, p = 0.0135; Figure 4I, left), whereas EYFP animals did not (lever, F1,12 = 0.6623, p = 0.4316; Figure 4I, right). Post hoc analysis revealed that, compared to taking-lever presses, both groups made fewer seeking-lever presses in week 1 (EYFP: n = 7, Z = −2.366, p = 0.036; ChR2: n = 10, Z = −2.397, p = 0.017), but only ChR2 rats made significantly more seeking- than taking-lever presses in week 3 (EYFP: n = 7, Z = −1.609, p = 0.1077; ChR2: n = 10, Z = −2.805, p = 0.010; Figure 4I). The number of taking-lever presses did not differ between groups (group, F1,15 = 0.2355, p = 0.6345; Figure 4J, right), but ChR2 rats made more seeking-lever responses than EYFP animals (group, F1,15 = 7.840, p = 0.0135; Figure 4J, left); post hoc analysis revealed that the number of seeking-lever presses was higher in week 3 (U(nChR2 = 10, nEYFP = 7) = 1.500, Z = −3.273, p = 0.001). This suggests that augmented DMS dopamine during training contributes to (1) an increased occurrence of seeking, but not taking, responses and (2) accelerated habit formation, as assessed in the lever-preference test.

Discussion

We set out to delineate the role of the striatal dopamine system in habit formation (i.e., automation of reward seeking) and assessed dopamine release in three striatal regions implicated heavily in acquisition and execution of reward seeking: VMS, DMS, and DLS. A 10-week (over)training in a seeking-taking chain task resulted in habitual seeking, as evidenced by reward devaluation via sensory-specific satiety, traditionally employed to identify habits. The sequential nature of the task enabled us to dissect dopamine signaling during distal (seeking initiation) and proximal (seeking completion and reward taking) epochs. Surprisingly, dopamine release was stable between training weeks 1 and 10, the only exception being a modestly decreased distal DMS signal in week 10. To better interrogate the developing habit, we designed a novel lever-preference test. In week 1, animals displayed a taking-lever preference (goal-directed), but with continued training, rats began either to sample both levers equally (no-habit rats) or to develop a seeking-lever preference (habit rats). Habit rats displayed increased DMS dopamine during distal reward seeking and decreased DMS dopamine during actions more proximal to reward delivery (compared to no-habit rats), whereas VMS and DLS dopamine did not differ between groups. Although this DMS “habit signature” was not related to seeking actions themselves (these were more associated with VMS dopamine), we hypothesized it to be causally related to the gradually developing seeking-lever bias. Thus, we optogenetically stimulated DMS dopamine neuron terminals during seeking (during training), which resulted in a nearly instant selective promotion of seeking behavior during training, and accelerated the shift toward seeking-lever preference (in the test). 
Together, our findings demonstrate that dopamine release in DMS, but not VMS or DLS, predicts future reward-seeking strategies, where augmented distal DMS dopamine promotes the development of habitual reward-seeking (and more pronounced proximal DMS dopamine reflects the propensity toward a non-habitual strategy).

To study neural mechanisms of a gradually developing habit, we introduced a novel (lever-preference) test that, unlike traditionally employed indicators of habits (invoked via satiety-specific devaluation, contingency degradation, or taste aversion), was designed to allow for repeated testing and analysis of individual differences. We validated that seeking-taking training conditions translated to preference-test conditions by demonstrating that taking-lever presses are followed more often by reward-magazine entries than seeking-lever presses, indicating that rats expect reward after taking-lever, but not seeking-lever, presses, and thus learned and generalized the task structure. That training-induced seeking-lever preference is a valid indicator of habit formation was underlined by the fact that it correlated positively with seeking-lever stickiness (another habit-like feature) and negatively with food-magazine entries (presumably a goal-directed action). Our test enables positive identification of habitual performance by measuring an active-choice preference, as opposed to the “traditional” negative identification via exclusion of a goal-directed strategy. Notably, the test consistently and reliably exposed the gradual nature of the change in lever preference across training weeks in several cohorts of animals (Figure S2A). 
The slow speed at which the preference shift develops is another validating feature (as reported in other seeking-taking studies59,60), as habits in humans are thought to form slowly via extensive repetition.3,42,61,62 Moreover, the test demonstrated that within the same animals, via overtraining, habit-like performance can arise out of a behavior that was initially under goal-directed control;61,62 overtraining has been shown to effectively produce habits, both in single-action and action-sequence tasks, but aspects of action sequences (e.g., initiation or termination) may remain under goal-directed or mixed control (the latter may explain that no-habit rats do not exhibit a lever preference in week 10).1,63, 64, 65 Crucially, by chaining two operant behaviors, we temporally separated initiation (distal) and termination (proximal) of seeking (followed by reward consumption: taking), which allowed clear distinction between underlying neuronal mechanisms. Finally, our approach offers additional measures (reward-magazine head entries and different action sequences) that further qualify behavioral strategy (goal-directed or habitual). Taken together, we propose that “habits” are not a unitary concept and can be detected in different ways (see STAR Methods for more discussion). Here, we introduce a valuable addition to traditionally employed habit paradigms.

Although dopamine neurons are well known to contribute to habit formation,6,9,10,23,32, 33, 34, 35, 36,66, 67, 68 little is known about the projection-specific activity of the dopamine system therein. Based on the presumed functions of VMS (motivation/Pavlovian behavior), DMS (goal-directed behavioral control), and DLS (sensorimotor/habit learning), we recorded dopamine transients in these striatal domains, hypothesizing that regional dopamine signals throughout habit development would align to these functions. Additionally, based on the generally presumed dopamine-dependent inter-regional shift in locus of behavioral control during habit development (see introduction), we expected task-relevant dopamine signals to shift from VMS to the dorsal striatum and/or from DMS to DLS. An alternative hypothesis suggests that the influence of dopamine subsides altogether with increasing training/automaticity.23,69,70 However, surprisingly, our results demonstrate that dopamine signaling (1) does not shift inter-regionally (neither across the animal population as a whole nor between individual habitual and non-habitual animals) and (2) does not decrease with increasing behavioral automaticity in any of the three regions. In fact, the stability of dopamine signals across 10 weeks of training is remarkable given dopamine’s role as a “learning/teaching” signal (the task is fully acquired within the first few weeks). Even more surprisingly, DMS, a region traditionally associated with the control of goal-directed reward seeking, exhibited the most critical features, where an increase in dopamine during reward-seeking initiation was predictive of future habit-like behavior. This habitual distal DMS dopamine surge was paralleled by a depression of proximal DMS dopamine. In contrast, DMS dopamine signals in non-habitual animals were not simply absent but opposite to those of habitual animals: they decreased distally and increased proximally.
This result also underlined that DMS dopamine signaling was not compromised in non-habitual rats. Furthermore, although “action vigor” is a potential contributor to DMS dopamine signals, the number of seeking presses in training (i.e., a proxy for vigor) increases over time in both habit and no-habit animals (Figure 3A). Moreover, DMS dopamine signals are opposite between these groups, which is the case from the beginning of training, and the signals do not correlate with seeking-press frequency (Figure 3E). Thus, it seems unlikely that vigor is itself a major contributor. Together, our data indicate that the temporal distribution (distal/proximal) of dopamine signaling in DMS, but not VMS or DLS, governs whether a behavior becomes habitual or not.

Since the DMS is traditionally not associated with habit formation directly,8,13,16,25,27,29 the question is what function dopamine released into the DMS fulfills during habit formation. From the beginning of training, DMS dopamine signals were present during seeking initiation in animals that displayed an affinity to the seeking lever, reflecting a predisposition toward habit formation, which manifested in the preference test only after weeks of training. This suggests that DMS dopamine signals may strengthen the seeking preference incrementally over many repetitions, which then translates into other situations (i.e., the lever-preference test) later. This is corroborated by our optogenetic-stimulation study: optogenetic DMS dopamine stimulation applied during training is reflected in a significant seeking-lever preference only after 2 weeks. Notably, repetition of seeking presses (overtraining) alone was not sufficient to establish a lever preference in itself, since non-habitual rats were also overtrained, suggesting that DMS dopamine is essential to habit formation.

Enhanced engagement with the seeking lever is reminiscent of “sign tracking,” but sign tracking is a Pavlovian trait that does not develop gradually and, thus, does not explain the gradual shift in lever preference across weeks. Furthermore, sign tracking is strongly linked to VMS dopamine. Consistently, VMS dopamine was correlated with seeking-press frequency, whereas DMS dopamine was less so. This suggests that DMS dopamine release does not directly induce seeking behavior (although it does exacerbate it, as evidenced by our optogenetic study), which is supported by the finding that optogenetic manipulation of DMS dopamine (outside of seeking-taking training) did not suffice to robustly reinforce behavior during intracranial self-stimulation and RT-PP.

An intriguing, yet speculative, idea is that in habit animals, the dopamine response to the earliest predictor of reward “propagated back in time” to the extension of the seeking lever (distal), whereas in no-habit animals, this dopamine response “remained” at seeking-lever retraction (a more conservative and precise predictor of reward [proximal]). This way, DMS dopamine stimulates development of the habit-like reward pursuit, since DMS dopamine increased in habit rats and decreased in no-habit rats at the time of seeking-lever presentation and coincided with seeking initiation without causing this behavior directly. In other words, DMS dopamine may be the effector that, during repeated performance, slowly strengthens the association between seeking stimulus/lever and response and thereby may reflect a proclivity to habit formation, consistent with our finding that such DMS dopamine was already enhanced prior to habit development.

Our results challenge the reigning hypotheses of regionally shifting dopamine involvement during habit formation, dopamine independence of habits, and exclusive DLS governance of habits and thereby break with the idea that the DMS simply counterbalances a DLS habit system. Notably, without our in-depth analysis of individual animal behavior across habit development, our dopamine data (averaged across the entire population) would have supported the idea of a weakened goal-directed DMS system that biases animals toward habitual performance and masked the intricate contribution of the DMS dopamine system. Instead, we were able to delineate two distinct rat populations and demonstrate that DMS-dopamine release during seeking initiation critically contributes to the formation of a habit. Intriguingly, DMS dopamine exhibited a double dissociation, where proximal events were accompanied by a signal opposite to distal events (in both habit and no-habit rats). In consideration of the fact that dopamine signaling did not vanish in, or shift between, any of the measured striatal regions, we suggest that habit formation is a decentralized, concerted effort of many collaborating striatal domains including the DMS.

STAR★Methods

Key resources table

REAGENT or RESOURCE | SOURCE | IDENTIFIER

Antibodies

Rabbit anti-GFP (primary antibodies) | Thermo Fisher Scientific | Cat#A-6455; RRID: AB_221570
Chicken anti-TH (primary antibodies) | Aves Labs | RRID: AB_10013440
Donkey anti-rabbit Alexa 488 | Jackson Immunoresearch | RRID: AB_2340619; 711-546-152
Donkey anti-chicken Alexa 647 | Jackson Immunoresearch | RRID: AB_2340380; 703-606-155

Bacterial and virus strains

AAV5.EF1α -DIO-ChR2-EYFP | Viral Vector Facility, University of Zurich | N/A
AAV5.EF1α -DIO-EYFP | Viral Vector Facility, University of Zurich | N/A

Chemicals, peptides, and recombinant proteins

Phosphate-buffered saline | Thermo Fisher Scientific | Cat#18912014
Normal donkey serum | Jackson ImmunoResearch | RRID: AB_2337258
Bovine serum albumin | Sigma-Aldrich | Cat#A9647-1kg
Sodium azide | Sigma-Aldrich | Cat#S2002-100 g
Triton X-100 | Sigma-Aldrich | Cat#X-100-500ml
4% Paraformaldehyde (37% formalin) | VWR | Cat#97064-606

Deposited data

Code | This paper | https://osf.io/j3zaq/

Experimental models: Organisms/strains

Long-Evans rats | Janvier Labs, France | RjOrl:LE
TH-Cre rats on Long-Evans background | Rat Resource & Research Center https://www.rrrc.us/ | LE-Tg(TH-Cre)3.1Deis / RRID:RRRC_00659

Software and algorithms

MATLAB version R2018b | MathWorks | https://www.mathworks.com/products/new_products/release2018b.html
IBM SPSS statistics 25 | IBM | https://www.ibm.com/nl-en/products/spss-statistics
Prism version 9.2.0 | GraphPad | https://www.graphpad.com/scientific-software/prism/
LabVIEW version 2014 | National Instruments | https://www.ni.com/nl-nl/shop/labview.html
Med-PC version IV | Med-associates | https://www.med-associates.com/med-pc-v/
QuPath version 0.2.3 | Open source | https://qupath.github.io/

Other

45mg Dustless Precision Pellets Rodent, Purified | Bio-Serv | Cat#: F0021
45mg Dustless Precision Pellets Rodent, Grain-Based | Bio-Serv | Cat#: F0165

Resource availability

Lead contact

Further information and requests for resources and reagents should be directed to and will be fulfilled by the lead contact, Ingo Willuhn (i.willuhn@nin.knaw.nl).

Materials availability

This study did not generate new unique reagents.

Experimental model and subject details

Animals

Adult male Long-Evans rats from Janvier Labs were used for voltammetry and behavior-only studies. For the optogenetic study, adult male TH-Cre rats (on Long-Evans background71) were used. Rats (320-410 g) were housed individually and kept on a reversed 12 h light/dark cycle (lights off from 08:00 to 20:00) with controlled temperature and humidity. Rats were food-restricted to 85% of their free-feeding body weight, and water was provided ad libitum. Regular laboratory chow was available in the home cage two h after the end of daily behavioral training (executed during the dark phase), supplementing food intake during training. All animal procedures were conducted in accordance with Dutch and European law and approved by the Animal Experimentation Committee of the Royal Netherlands Academy of Arts and Sciences. For dopamine recordings, 35 animals had at least one functional and histologically verified recording electrode. 16 rats were used to verify whether reward-driven behavior was habitual after 10 weeks of training by traditional satiety-specific outcome devaluation, of which two rats were excluded from analysis, because outcome devaluation failed. 17 rats were used for optogenetic stimulation of dopaminergic neuron-terminals in DMS that had histologically verified virus expression in both hemispheres and placement of optical fibers bilaterally in DMS.

Method details

Stereotactic surgery procedures

Stereotactic surgery was executed as described previously.31,72 Before surgery, all surgical equipment was sterilized and the immediate surroundings were wiped with 70% ethanol. Rats were anesthetized with 1%–3% isoflurane and placed in a stereotactic frame. Heart rate and breathing frequency were monitored to assess surgical anesthetic depth. Body temperature was monitored and maintained with a heating pad. Following subcutaneous injection with the analgesic Metacam (0.2mg meloxicam/100 g), the scalp was shaved, disinfected with 70% alcohol, and incised to expose the cranium. The site of incision was treated with lidocaine (100mg/mL). For the voltammetric experiment, holes were drilled for three anchor surgical screws, a Ag/AgCl reference electrode, and two custom-made carbon-fiber micro-electrodes73 unilaterally targeting two of three regions: VMS (1.2mm rostral, 1.5mm lateral, and −7.1mm ventral from Bregma51), DMS (−0.2mm rostral, 2.5mm lateral, and −4.5mm ventral), and DLS (1.2mm rostral, 3.6mm lateral, and −4.5mm ventral). Electrodes were secured to the skull with dental acrylic cement anchored to the surgical screws.

For the optogenetic experiment, holes were drilled for three anchor surgical screws, two optic fibers targeting the DMS bilaterally (−0.2mm rostral, +/−2.7mm lateral, and −3.7mm ventral), and four virus injections in the ventral midbrain across both hemispheres, targeting the substantia nigra pars compacta (−5.2mm rostral, +/−2.4mm lateral, and −7.8mm ventral, and −6.0mm rostral, +/−2.4mm lateral, and −7.6mm ventral). After slowly lowering the microinjector into position, virus was infused at a rate of 0.1μl/min using a microsyringe pump. After infusion, injectors were left in place for 10min and then removed slowly. Optic fibers (diameter of 230μm, custom-made) targeted at the DMS (the dopamine-neuron projection target of interest) were secured to the skull with dental acrylic cement anchored to the surgical screws.

Following surgery, rats received 0.5ml of saline s.c. for rehydration and were placed in an incubator for at least 1 h. To monitor any possibility of discomfort or pain and to make sure that the animals properly recovered, we monitored their overall appearance, weight, behavior, and the state of the incision (wound healing) daily for 3 days post-surgery. Surgery and anesthesia duration was shorter than 2 h.

Viral vectors and optogenetic stimulation

For optogenetic stimulation, AAV5.Ef1α-DIO-ChR2-EYFP (0.9μl/injection site, for a total of 3.6μl/rat, titer ∼3 × 10^12 particles/mL) was injected into the ventral midbrain for Cre-dependent expression of channelrhodopsin in dopamine neurons of TH-Cre rats. AAV5.Ef1α-DIO-EYFP (0.9μl/injection site, for a total of 3.6μl/rat, titer ∼3 × 10^12 particles/mL) was used as a Cre-dependent fluorophore control. For optogenetic stimulation, we used laser light with a wavelength of 470 nm, a pulse width of 10ms, a frequency of 20Hz, and an intensity of 20mW at the tip of the optic-fiber patch cable (constant illumination).

Operant chambers

All behavioral experiments were conducted in modified modular operant chambers (32x30x29cm, Med Associates Inc., VT, USA), equipped with a food magazine with an integrated light (connected to an automated food-pellet dispenser) flanked by two retractable operant levers, a house light, multiple tone generators, a white-noise generator, and metal grid floors. The wall opposite the food magazine was outfitted with two nose-poke response devices (port with integrated cue lights) located on adjacent panels. Each operant box was surveilled by a video camera. The boxes were housed in metal Faraday cages that were insulated with sound-absorbing polyurethane foam and ventilated by a fan.

Training on seeking-taking chain schedule of reinforcement

The seeking-taking chain schedule of reinforcement was adapted from Olmstead and colleagues74 and Balleine and colleagues.75 After rats were placed in the operant chamber, the start of each daily behavioral-training session was marked by illumination of the house light and the sound of the white noise, which both remained on for the duration of the session. During each trial, rats could earn one “purified” food pellet (Bio. Serv. Inc., USA), delivered into the tray of the food magazine, accompanied by a 2 s-illumination of the magazine light. Behavioral training commenced when rats reached 85% of free-feeding body weight. On the first day, animals were habituated to the operant chamber and 30 food pellets were delivered on a variable-interval (VI) schedule of 90 s. On the second day, rats learned to press the taking lever, the proximal link of the chain (i.e., proximal to the food reward), on a fixed ratio 1 (FR1) schedule for an immediate delivery of a pellet, with a maximum of 40 pellets. The following three days, taking-lever presses triggered pellet delivery (FR1) plus lever retraction, and an inter-trial interval (ITI) of 25 s was imposed, followed by re-extension of the lever into the operant box. In the next session, rats learned to press the seeking lever, the distal link of the chain (i.e., distal to the food reward), in order to gain access to the taking lever. Following its extension into the box (marking the start of each trial), one press on the seeking lever (FR1) triggered retraction of this lever, and after a delay of 3 s, the taking lever was extended. One press on the taking lever (FR1) triggered retraction of the taking lever and delivery of a food pellet 3 s thereafter (FR1:FR1 schedule). During the variable ITI of 25 s, both levers remained retracted. In the following sessions, rats were trained on increasing VIs (ranging from 2 s to eventually 60 s) for the seeking lever: VI2:FR1, VI7:FR1, VI15:FR1, VI30:FR1, VI60:FR1. 
Each VI started with the rat’s first response on the seeking lever, once extended. Subsequent lever presses went without programmed consequences, until the first press after the VI ended, which triggered seeking-lever retraction and taking-lever extension 3 s thereafter. Figure 2A depicts a schematic overview of a VI60:FR1 trial. Sessions ended once rats earned 40 rewards. Rats were trained daily on the VI60:FR1 for 10 weeks. Lever position (seeking/taking) was counterbalanced between rats.
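The seeking-link contingency described above can be sketched in a few lines of Python. This is a minimal illustration, not the Med-PC code used in the study: the function name, argument names, and returned keys are hypothetical, and the way each interval is drawn around its mean is not specified in the text, so the interval for a given trial is simply passed in.

```python
from typing import Optional, Sequence


def seeking_link_outcome(press_times: Sequence[float],
                         vi_duration: float,
                         delay: float = 3.0) -> Optional[dict]:
    """Resolve one seeking link of a VI:FR1 trial.

    press_times: seeking-lever press times (s) after lever extension, ascending.
    vi_duration: interval (s) for this trial (assumed to be drawn elsewhere).
    delay: time (s) between seeking-lever retraction and taking-lever extension.
    Returns None if the interval never started or was never completed.
    """
    if not press_times:
        return None  # no press, so the interval never starts
    vi_end = press_times[0] + vi_duration  # interval starts with the first press
    for index, t in enumerate(press_times):
        if t >= vi_end:
            # First press after the interval ended: the seeking lever retracts
            # and the taking lever is extended `delay` seconds later.
            return {
                "completing_press": t,
                "presses_without_consequence": index,
                "taking_lever_extension": t + delay,
            }
    return None  # interval still running at the last recorded press
```

For example, with presses at 2, 10, 30, 61.9, and 63 s and a 60 s interval, the interval ends at 62 s, so the press at 63 s completes the seeking link and the taking lever extends at 66 s. Setting `vi_duration` to 0 reduces the sketch to the FR1 case, in which the first press completes the link.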

Lever-preference test

Behavioral automaticity was probed regularly (no more than once per week) immediately preceding the daily training session, using a novel “lever-preference” test: After a variable interval of 25 s with only house-light and white noise turned on, animals were exposed to both seeking and taking levers simultaneously for one minute under extinction conditions (recording lever presses and head entries into the food magazine without programmed consequences). During this minute, a preference for the taking lever (i.e., more presses on the taking than seeking lever) was interpreted as a goal-directed behavioral strategy, whereas a seeking-lever preference was taken as evidence for habitual responding.
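The interpretation rule for the one-minute test can be written out as a small Python sketch. The function name and the returned labels are hypothetical, and treating equal press counts as "no preference" is an assumption made here for completeness rather than a criterion stated in this paragraph.

```python
def lever_preference(seeking_presses: int, taking_presses: int) -> str:
    """Classify one lever-preference test from its press counts.

    More taking- than seeking-lever presses is read as a goal-directed
    strategy; more seeking- than taking-lever presses as habitual responding.
    Equal counts are labeled 'none' (an assumption of this sketch).
    """
    if taking_presses > seeking_presses:
        return "taking"   # interpreted as goal-directed
    if seeking_presses > taking_presses:
        return "seeking"  # interpreted as habitual
    return "none"
```

Applied per animal and per test week, such a rule yields the kind of individual preference trajectory that the test is designed to track.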

We aim to establish the lever-preference test as a new way to assess the development of habitual behavior. Our intention is not to emulate another version of an outcome-devaluation procedure or to design a proxy for it. Instead, we introduce an alternative way to track progressive automation of behavior and present a number of points that validate and/or support this test as a tool that identifies a habit:

1) Across the 10 weeks of training, the animals’ initial test preference to interact with the taking lever (that is associated with immediate (FR1) food delivery) switches to the seeking lever (that never directly leads to food delivery and never leads to food quickly, but instead only leads to the extension of the taking lever).

2) We demonstrate that this preference switch occurs within individual animals.

3) The taking preference at the beginning of training is present in animals from both groups (habit and no-habit; Figures 2B and 2C).

4) This switch only occurs after a very high rate of action and session repetition (1000s of repetitions in 10 weeks of daily training sessions; Figure 1B). It is widely accepted that habit learning materializes through high repetition of responding. Our paradigm requires many repetitions of the behavior before the habit is detectable (long after the motor skill to lever press had been acquired).

5) The slow speed of this switch is comparable to the speed of habit development in human everyday life (Figures 2B and 2C), establishing face validity, in contrast to some other habit tasks that report habits developing as quickly as after 1-4 days of behavioral training.

6) Induction of sensory-specific satiety after 10 weeks of seeking-taking training demonstrates that our behavioral training schedule induces a habit as defined and tested by traditional outcome devaluation (and in the traditional way: as a singular group, as opposed to considering individual differences; Figure 1C), establishing construct validity and fulfilling the traditional requirement to classify a behavior as habitual.

7) Habitual animals exhibit a higher probability to repeat seeking-lever presses (lever stickiness) during the preference test (Figure 2H), a measure associated with habitual behavior.

8) Habitual animals commit fewer head entries into the food magazine during the preference test (Figure 2I), a measure associated with goal-oriented behavior (non-habitual).

9) The measures mentioned under points 7 and 8 correlate with seeking-lever preference (Figure 2H and 2I).

10) The preference test detects a similar preference progression across weeks independent of the frequency with which the test is applied (Figure S2).

11) Rats do not perform head entries into the food magazine during the 3-s waiting period before extension of the taking lever, indicating that the animals learn the “task logic” (i.e., the taking lever needs to be pressed in order to receive food; Figure 1B, right).

12) The “task logic” translates well from the training situation (VI60/FR1) to the preference-test situation (both levers out), as animals in both groups (habit and no-habit) perform substantially more food-magazine entries after taking-lever presses (the lever associated with food delivery; Figure 2G).
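To make the lever-stickiness measure from point 7 concrete, the following Python sketch computes a repeat probability from the ordered sequence of lever presses in one test. The exact definition used for the analysis is not given in this section, so the operationalization here (the fraction of seeking-lever presses that are immediately followed by another seeking-lever press) is an assumption, as are the function name and the "S"/"T" labels.

```python
from typing import Optional, Sequence


def seeking_stickiness(presses: Sequence[str]) -> Optional[float]:
    """Probability that a seeking-lever press is followed by another one.

    presses: ordered lever labels from one test, "S" (seeking) or "T" (taking).
    Returns None when no seeking press has a successor, because the
    probability is undefined in that case.
    """
    # Collect the press that follows each seeking-lever press.
    successors = [nxt for cur, nxt in zip(presses, presses[1:]) if cur == "S"]
    if not successors:
        return None
    return sum(nxt == "S" for nxt in successors) / len(successors)
```

In the sequence S, S, T, S, S, S, four seeking presses have a successor and three of those successors are seeking presses, giving a stickiness of 0.75 under this definition.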

We do not provide an extensive cross-validation of our preference test with the sensory-specific satiety data presented in Figure 1C for several reasons: 1) We employed behavioral training under a VI60 reinforcement schedule, which has in fact already been widely validated and widely accepted to induce habits, for both single-action and action-sequence tasks. 2) Data acquired using the reward-devaluation approach is relatively “noisy” and provides little clustering of the population data, as exemplified in Figure 1C (center). 3) We believe that what is commonly referred to as a “habit” is not a unitary concept and has a number of different aspects and that, therefore, there are different ways to approach its detection. An approach that focuses on devaluation-insensitivity criteria is a widely accepted approach, originally put forward by Dickinson and colleagues, and has led to invaluable research findings. However, it does not capture all aspects of a (human) habit. For example, using its underlying logic, a habit can be induced in a single, relatively short behavioral training session in humans, and no more than 4-5 sessions in rodents. Yet, the human behavior to be modeled by this research, is behavior that has slowly developed across weeks or months, or even years. Our paradigm requires weeks of daily training to induce a habit. 4) To our knowledge, sensory-specific satiety devaluation has never been reported as a way to assess individual differences in habit formation. In addition to this type of devaluation being very sensitive to disturbances in general, we think that this approach is actually not suitable to assess individual differences: Following the ad-lib pre-feeding, rats were exposed to the seeking lever for five minutes under extinction conditions during which lever presses were recorded. 
Each animal was subjected to both valued (pre-feeding with pellets different from training pellets) and devalued (pre-feeding with training pellets) conditions (on separate days); the sequence of conditions was counterbalanced between animals. An important observation is that during the first pre-feeding session, animals eat more than after the second. Thus, whichever condition comes first, valued or devalued, the first session elicits more lever presses relative to the same condition when it occurs second in temporal sequence (in other animals). We have observed this phenomenon reliably in several cohorts of animals, and believe it to be a general phenomenon, despite the fact that, to our knowledge, it has not been reported in the literature. This phenomenon complicates looking at individual differences, as individuals are strongly affected by the sequence of pre-feeding, which restricts the utility of pre-feeding devaluation to group comparisons, where an equal number of animals have undergone each condition first (due to counterbalancing).

Experimental procedures

Experiment 1: Week 10 validation of habit induction by seeking-taking paradigm

One group of rats (n = 14) was tested for habitual responding using a traditional satiety-specific outcome devaluation, after 10 weeks of daily training on the VI60:FR1 seeking-taking chain schedule of reinforcement. Outcome devaluation (sensory-specific satiety) was achieved by giving rats one-h ad-libitum access to either grain-based pellets (used to remove hunger as a motivating factor for lever pressing; the pellets that are used for behavioral training retain their “rewarding” value, this is the “valued condition”), or regular (purified) pellets (used to decrease both hunger and value of the behavioral-training pellets, this is the “devalued condition”) prior to behavioral testing. Food pellets were placed in custom-made metal cups in an empty, otherwise unused home cage (“feeding cage”; water freely available). Following one h of this pre-feeding, rats were exposed to the seeking lever in the operant chamber for five minutes under extinction conditions during which lever presses were recorded. This was subsequently followed by a “pellet-choice” test to determine the efficacy of the pre-feeding procedure, measured by the preference for either pellet type during two minutes of simultaneous ad-libitum access to both types in the feeding cage. Animals were excluded from analyses if they consumed any of the prefed-type pellets during this test (insufficient sensory-specific satiety). Each animal was subjected to both valued and devalued conditions (on separate days); the sequence of conditions was counterbalanced between animals.

Experiment 2: Dopamine measurement during seeking-taking chain training

In a second group of rats (n = 35), we employed fast-scan cyclic voltammetry (FSCV) using chronically implanted carbon-fiber micro-electrodes73 to record rapid changes in dopamine release in different domains of the striatum, during week 1 and week 10 of training on the seeking-taking chain schedule of reinforcement. On recording days, before the start of behavioral training, electrodes were connected to a head-mounted voltammetric amplifier that was interfaced through an electrical swivel above the test chamber (allowing animals free movement; Crist Instrument, MD, USA) with a PC-driven data-acquisition and analysis system (National Instruments, TX, USA).73

Voltammetric scans were repeated every 100ms (10Hz sampling rate). During each scan, the alternating potential at the carbon-fiber electrode tip was ramped linearly from −0.4V versus Ag/AgCl to +1.3V (anodic sweep) and back to −0.4V (cathodic sweep) at 400V/s (total scan time of 8.5ms) and held at −0.4V between scans. Waveform generation, data acquisition, and analysis were carried out on a PC-based system using two PCI multifunction data acquisition cards and software written in LabVIEW (National Instruments). Dopamine is oxidized during the anodic sweep, if present at the surface of the electrode, forming dopamine-o-quinone (peak reaction detected around +0.7V), which is reduced back to dopamine in the cathodic sweep (peak reaction detected around −0.3V). The ensuing flux of electrons is measured as current and is directly proportional to the number of molecules that undergo electrolysis. The background-subtracted, time-resolved current obtained from each scan provides a chemical signature characteristic of the analyte, allowing resolution of dopamine from other substances.76
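
The waveform timing described above can be checked with a few lines of arithmetic. The following is a minimal Python sketch, not the LabVIEW software used in the study; the function and constant names are hypothetical, and only the numerical values (−0.4V, +1.3V, 400V/s, 100ms scan interval) come from the text.

```python
# Hypothetical helper names; the parameter values are taken from the text.
V_HOLD = -0.4        # holding / starting potential in volts (vs Ag/AgCl)
V_PEAK = 1.3         # switching potential in volts
SCAN_RATE = 400.0    # ramp speed in volts per second
SCAN_INTERVAL = 0.1  # seconds between scan onsets (10 Hz)


def scan_duration(v_hold, v_peak, scan_rate):
    """Duration in seconds of one triangular scan (anodic + cathodic sweep)."""
    return 2.0 * (v_peak - v_hold) / scan_rate


def waveform_potential(t, v_hold=V_HOLD, v_peak=V_PEAK,
                       scan_rate=SCAN_RATE, interval=SCAN_INTERVAL):
    """Applied potential (V) at time t (s) for the repeating triangular scan."""
    t_in_cycle = t % interval
    half = (v_peak - v_hold) / scan_rate   # duration of one sweep direction
    if t_in_cycle < half:                  # anodic sweep: ramp up
        return v_hold + scan_rate * t_in_cycle
    if t_in_cycle < 2.0 * half:            # cathodic sweep: ramp down
        return v_peak - scan_rate * (t_in_cycle - half)
    return v_hold                          # held at v_hold between scans


duration_s = scan_duration(V_HOLD, V_PEAK, SCAN_RATE)
print(f"scan duration: {duration_s * 1000.0:.1f} ms")
print(f"fraction of each 100 ms interval spent scanning: "
      f"{duration_s / SCAN_INTERVAL:.1%}")
```

The sweep covers 1.7V in each direction, so the full scan spans 3.4V at 400V/s, which reproduces the stated 8.5ms; the electrode is therefore held at −0.4V for the remaining 91.5ms of each 100ms interval.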

Experiment 3: Optogenetic stimulation of DMS dopamine-neuron terminals

In a third group of rats (n = 17), we optogenetically stimulated DMS dopamine-neuron terminals (bilaterally) in response to seeking-lever presses for three weeks during daily seeking-taking chain training sessions starting under VI30:FR1 and progressing to VI60:FR1. A seeking-lever press triggered a 1 s laser photo-stimulation (20mW, 20Hz, 20x10ms light pulses) followed by a 1 s timeout. Starting 10 s after the first seeking-lever press, only every third seeking-lever press triggered stimulation. Rats were exposed to the lever-preference test (without optogenetic stimulation) regularly to assess whether an approach habit was developing. Laser-stimulation visibility was masked using a custom-made LED mounted on the wall of the operant box emitting blue light pulses (20Hz, 20x10ms).
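
As an illustration of the stimulation schedule described above, the sketch below lays out the pulse timing of one train and a possible press-gating rule. It is not the control software used in the study. The function names are hypothetical, and two details are assumptions that the text does not specify: presses falling within a train or its timeout are ignored, and after the 10 s mark presses are counted so that every third counted press is eligible.

```python
def pulse_train(n_pulses=20, freq_hz=20.0, pulse_ms=10.0):
    """Return (onset_ms, offset_ms) pairs for one photo-stimulation train."""
    period_ms = 1000.0 / freq_hz
    return [(i * period_ms, i * period_ms + pulse_ms) for i in range(n_pulses)]


def stimulated_presses(press_times, stim_s=1.0, timeout_s=1.0,
                       thinning_after_s=10.0, every_nth=3):
    """Return the seeking-lever press times (s) that would trigger a train.

    ASSUMPTIONS (not stated in the text): presses during a train or its
    timeout never trigger stimulation, and presses made 10 s or more after
    the first press are counted so that every third one is eligible.
    """
    if not press_times:
        return []
    first = press_times[0]
    locked_until = float("-inf")
    late_count = 0
    triggered = []
    for t in press_times:
        late = (t - first) >= thinning_after_s
        if late:
            late_count += 1
        eligible = (not late) or (late_count % every_nth == 0)
        if eligible and t >= locked_until:
            triggered.append(t)
            locked_until = t + stim_s + timeout_s
    return triggered


train = pulse_train()
light_on_ms = sum(off - on for on, off in train)
print(f"{len(train)} pulses, last pulse ends at {train[-1][1]:.0f} ms, "
      f"light on for {light_on_ms:.0f} ms per train")
print(stimulated_presses([0.0, 0.5, 2.0, 4.5, 10.0, 11.0, 12.5, 13.0]))
```

With 20 pulses of 10ms at 20Hz, pulse onsets are 50ms apart, so the train occupies roughly 1 s while the light is on for only 200ms of it.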

Experiment 4: Real-time place preference (RT-PP) and intracranial self-stimulation

Prior to Experiment 3 (after an 8-9-week post-surgery period to allow for sufficient virus expression), rats were exposed to an open-field arena in which an RT-PP assay was implemented. Following a 5min baseline period without photo-stimulation, the presence of the rat in one of the quadrants of the open-field (i.e., active quadrant) triggered photo-stimulation (20mW, 20Hz, 20x10ms light pulses) of DMS dopamine-neuron terminals during a 15min assay. The position of the active quadrant was assigned randomly. Time spent in the active quadrant was recorded. Laser-stimulation visibility was masked using a custom-made LED mounted on the wall of the open-field emitting blue light pulses (20Hz, 20x10ms).

After Experiment 3, rats were tested in the same operant box for three days in an operant self-stimulation experiment, where nose pokes into the active nose-poke device triggered a 1 s photo-stimulation (20mW, 20Hz, 20x10ms light pulses) of DMS dopamine-neuron terminals, followed by a 1 s timeout. Nose pokes into the inactive nose-poke device were recorded without programmed consequence. The location of active and inactive nose-poke devices was counterbalanced between rats. Laser-stimulation visibility was masked using a custom-made LED mounted on the wall of the operant box emitting blue light pulses (20Hz, 20x10ms).

Histological verification of recording sites, optic-fiber placement, and virus expression

After completion of experiments, rats were deeply anesthetized using a lethal dose of pentobarbital (14mg/100 g), and transcardially perfused with saline followed by 4%-paraformaldehyde (PFA). In animals with electrode implants, recording sites were marked with an electrolytic lesion before perfusion. Brains were removed and post-fixed in PFA for one day after which they were placed in 30% sucrose for cryoprotection. The brains were rapidly frozen using an isopentane bath, sliced on a cryostat (50-μm coronal sections, −20°C), and stained with Cresyl violet to aid visualization of anatomical structures, electrode-induced lesion or virus-infusion sites, and optic-fiber placement.

For the optogenetic experiment, sliced serial coronal sections (50μm) were stored in phosphate-buffered saline (PBS) with 0.02%-sodium azide at 4°C until further use. For immunohistochemical stainings, sections were incubated in blocking buffer (5% bovine serum albumin (BSA), 5% normal donkey serum (NDS), and 0.2%-triton-X in PBS) for 1 h, followed by overnight incubation in primary antibodies rabbit anti-GFP (1:1000, Invitrogen A6455, Thermo Fisher) and chicken anti-TH (1:1000, Aves TYH) in blocking buffer. All sections were washed 4x (10min each) in PBS and then incubated for 1 h in blocking buffer containing fluorescent Alexa-conjugated secondary antibodies (1:1000, donkey anti-chicken Alexa 647 and donkey anti-rabbit Alexa 488; Jackson Immunoresearch 711-546-152 and 703-606-155). Sections were washed again 4x in PBS, with the second wash containing DAPI (nuclei stain; Sigma Aldrich D9542), then mounted onto slides and coverslipped with Mowiol mounting medium. Slides were imaged using three fluorescent channels of the ZEISS Axio Scan.z1 slide-scanner at 10x magnification for qualitative expression and targeting verification. Sections were subsequently imaged in z stacks with a confocal microscope (Leica TCS SP5) to verify specificity of viral expression.

Quantification and statistical analysis

Data analysis

Dopamine-related current was isolated from the voltammetric signal by chemometric analysis using a standard training set, based on electrically stimulated dopamine release detected with chronically implanted electrodes; resulting dopamine concentration was estimated based on average post-implantation sensitivity of electrodes.73 Resulting dopamine traces were smoothed with a moving 10-point median filter prior to analysis of average concentration. Dopamine concentration was averaged over 10 s (approximate duration of the observed phasic signal) following seeking-lever extension (distal link of the chain) and over 7.5 s before and 2.5 s after reward delivery (proximal link of the chain); resulting averages were compared to the average concentration over the 2 s before each trial (baseline). Prior to each FSCV recording session, two unpredicted food-pellet deliveries (spaced apart by two minutes) confirmed electrode viability to detect dopamine, as well as electrode sensitivity over time. Data analysis was performed using MATLAB R2018b (The Mathworks, Inc., MA, USA).
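
The smoothing and windowed averaging described above can be sketched as follows. This is an illustrative Python version, not the MATLAB code used for the study (which the authors make available separately). The function names are hypothetical, the alignment and edge handling of the 10-point median window are assumptions, and expressing the comparison to baseline as a simple difference is also an assumption.

```python
from statistics import mean, median

SAMPLE_RATE_HZ = 10  # one voltammetric scan every 100 ms


def moving_median(trace, window=10):
    """Moving median filter.

    ASSUMPTION: the window covers samples i - window//2 up to
    i + window - window//2 - 1 and is truncated at the trace edges; the
    text does not state how the window was aligned or padded.
    """
    half = window // 2
    out = []
    for i in range(len(trace)):
        lo = max(0, i - half)
        hi = min(len(trace), i + window - half)
        out.append(median(trace[lo:hi]))
    return out


def window_mean(trace, event_idx, start_s, stop_s, rate=SAMPLE_RATE_HZ):
    """Mean of the trace from start_s to stop_s (s) relative to an event."""
    lo = event_idx + round(start_s * rate)
    hi = event_idx + round(stop_s * rate)
    if lo < 0 or hi > len(trace) or lo >= hi:
        raise ValueError("window falls outside the recorded trace")
    return mean(trace[lo:hi])


# Synthetic example: 2 s of flat baseline, then a step at lever extension.
trace = [0.0] * 20 + [1.0] * 100
smoothed = moving_median(trace)
lever_idx = 20  # sample index of seeking-lever extension (trial start)

baseline = window_mean(smoothed, lever_idx, -2.0, 0.0)  # 2 s before trial
distal = window_mean(smoothed, lever_idx, 0.0, 10.0)    # 10 s after extension
print(f"distal-link response relative to baseline: {distal - baseline:.3f}")
```

For the proximal link of the chain, the same helper would be called with windows of −7.5 to 0 s and 0 to 2.5 s around the sample index of reward delivery.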

Statistical analysis

Individual electrochemical signals were averaged across sessions and then across animals. For comparison of electrochemical data, individual behavioral data were binned into weeks and then averaged across animals. Corresponding electrochemical and behavioral data from week 1 were compared to week 10 using paired Student’s t tests. When separated into groups, behavioral data were analyzed using multivariate ANOVAs with group, lever, nose-poke port, and week as factors. When main effects or interactions were significant, post hoc analyses were carried out and P values were adjusted according to the Holm-Bonferroni correction method for multiple comparisons. For the optogenetic self-stimulation experiment (Figure 4D), we used two-way ANOVAs, with nose-poke port (active versus inactive) or group (ChR2 versus EYFP) as factors, and time (day) as repeated measure. Non-parametric testing was conducted when data were not normally distributed. Graphical representations were made using Prism (GraphPad Software, La Jolla, CA, USA) and MATLAB. Statistical analysis was carried out using Prism, SPSS (version 25; Chicago, IL, USA), and MATLAB. Data collection and analysis were not performed blind to the conditions of the experiments. Statistical significance was set to p < 0.05. All data are presented as mean or median ± SEM.
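
Two of the procedures named above, the paired t test and the Holm-Bonferroni adjustment, are simple enough to sketch directly. The Python below is an illustration only: the study used Prism, SPSS, and MATLAB, the function names are hypothetical, and the numbers are made up for the example rather than taken from the data.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(week1, week10):
    """t statistic and degrees of freedom for a paired Student's t test."""
    diffs = [b - a for a, b in zip(week1, week10)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


def holm_bonferroni(p_values):
    """Holm-Bonferroni adjusted p values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p value is multiplied by m, the next by m - 1, etc.
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)  # keep values monotone
        adjusted[idx] = running_max
    return adjusted


# Made-up illustration values, not data from the study.
t_value, dof = paired_t_statistic([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 5.0, 5.0])
print(f"t({dof}) = {t_value:.3f}")
print(holm_bonferroni([0.01, 0.04, 0.03]))
```

An adjusted p value below 0.05 would then be treated as significant under the threshold stated above.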

Sample size was 35 rats for dopamine recordings (Figures 1 and 3), 16 for testing habitual behavior by traditional outcome devaluation via sensory-specific satiety (Figure 1), and 17 for optogenetic stimulation of dopaminergic neuron-terminals in the DMS (Figure 4), as described under “Animals.” To demonstrate reliability of the lever-preference test, we used seven cohorts of rats with different sample sizes (ncohort1 = 12, ncohort2 = 12, ncohort3 = 13, ncohort4 = 16, ncohort5 = 6, ncohort6 = 14, and ncohort7 = 6; Figure S2).

Acknowledgments

We thank Ralph Hamelink and Nicole Yee for technical support, Matthijs Feenstra for input on the manuscript, and Lucia Economico for illustrations. This research was funded by the following organizations: H2020 European Research Council (ERC) (ERC-2014-STG 638013 to I.W.), Netherlands Organisation for Scientific Research (NWO) (VIDI 864.14.010,2015/06367/ALW to I.W.), and Netherlands Organisation for Scientific Research (NWO) (Gravitation program, BRAINSCAPES 024.004.012 to I.W.). The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Author contributions

Conceptualization, W.v.E., J.G., T.A., and I.W.; resources, I.W.; data curation, W.v.E., P.W., and I.W.; formal analysis, W.v.E. and P.W.; investigation, W.v.E., J.M., W.B., R.J., D.S., and J.G.; visualization, W.v.E. and P.W.; supervision, I.W.; funding acquisition, I.W.; methodology, W.v.E. and I.W.; writing – original draft, W.v.E., T.A., and I.W.; writing – review & editing, W.v.E., D.D., T.A., and I.W.; project administration, I.W.

Declaration of interests

The authors declare no competing interests.

Inclusion and diversity

One or more of the authors of this paper self-identifies as an underrepresented ethnic minority in science.

Published: February 7, 2022

Footnotes

Supplemental information can be found online at https://doi.org/10.1016/j.cub.2021.12.027.

Supplemental information

Document S1. Figures S1–S4
mmc1.pdf (1.2MB, pdf)
Document S2. Article plus supplemental information
mmc2.pdf (4MB, pdf)

Data and code availability

  • All data reported in this paper will be shared by the lead contact upon request.

  • The code used for this study is available at https://osf.io/j3zaq/ and, if necessary, more detailed information is available from the corresponding author upon request.

  • Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.

References

  • 1.Adams C.D. Variations in the sensitivity of instrumental responding to reinforcer devaluation. Q. J. Exp. Psychol. B. 1982;34:77–98. [Google Scholar]
  • 2.Dickinson A., Nicholas D.J., Adams C.D. The effect of the instrumental training contingency on susceptibility to reinforcer devaluation. Q. J. Exp. Psychol. B. 1983;35:35–51. [Google Scholar]
  • 3.Dickinson A. Actions and habits: the development of behavioural autonomy. Philos. Trans. R. Soc. Lond. B. 1985;308:67–78. [Google Scholar]
  • 4.Packard M.G., Hirsh R., White N.M. Differential effects of fornix and caudate nucleus lesions on two radial maze tasks: evidence for multiple memory systems. J. Neurosci. 1989;9:1465–1472. doi: 10.1523/JNEUROSCI.09-05-01465.1989. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Packard M.G., McGaugh J.L. Double dissociation of fornix and caudate nucleus lesions on acquisition of two water maze tasks: further evidence for multiple memory systems. Behav. Neurosci. 1992;106:439–446. doi: 10.1037//0735-7044.106.3.439. [DOI] [PubMed] [Google Scholar]
  • 6.Knowlton B.J., Mangels J.A., Squire L.R. A neostriatal habit learning system in humans. Science. 1996;273:1399–1402. doi: 10.1126/science.273.5280.1399. [DOI] [PubMed] [Google Scholar]
  • 7.Yin H.H., Knowlton B.J., Balleine B.W. Lesions of dorsolateral striatum preserve outcome expectancy but disrupt habit formation in instrumental learning. Eur. J. Neurosci. 2004;19:181–189. doi: 10.1111/j.1460-9568.2004.03095.x. [DOI] [PubMed] [Google Scholar]
  • 8.Yin H.H., Knowlton B.J., Balleine B.W. Inactivation of dorsolateral striatum enhances sensitivity to changes in the action-outcome contingency in instrumental conditioning. Behav. Brain Res. 2006;166:189–196. doi: 10.1016/j.bbr.2005.07.012. [DOI] [PubMed] [Google Scholar]
  • 9.Graybiel A.M. Habits, rituals, and the evaluative brain. Annu. Rev. Neurosci. 2008;31:359–387. doi: 10.1146/annurev.neuro.29.051605.112851. [DOI] [PubMed] [Google Scholar]
  • 10.Amaya K.A., Smith K.S. Neurobiology of habit formation. Curr. Opin. Behav. Sci. 2018;20:145–152. [Google Scholar]
  • 11.Jog M.S., Kubota Y., Connolly C.I., Hillegaart V., Graybiel A.M. Building neural representations of habits. Science. 1999;286:1745–1749. doi: 10.1126/science.286.5445.1745. [DOI] [PubMed] [Google Scholar]
  • 12.Barnes T.D., Kubota Y., Hu D., Jin D.Z., Graybiel A.M. Activity of striatal neurons reflects dynamic encoding and recoding of procedural memories. Nature. 2005;437:1158–1161. doi: 10.1038/nature04053. [DOI] [PubMed] [Google Scholar]
  • 13.Thorn C.A., Atallah H., Howe M., Graybiel A.M. Differential dynamics of activity changes in dorsolateral and dorsomedial striatal loops during learning. Neuron. 2010;66:781–795. doi: 10.1016/j.neuron.2010.04.036. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Smith K.S., Graybiel A.M. A dual operator view of habitual behavior reflecting cortical and striatal dynamics. Neuron. 2013;79:361–374. doi: 10.1016/j.neuron.2013.05.038. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Yin H.H., Knowlton B.J. The role of the basal ganglia in habit formation. Nat. Rev. Neurosci. 2006;7:464–476. doi: 10.1038/nrn1919. [DOI] [PubMed] [Google Scholar]
  • 16.Yin H.H., Knowlton B.J., Balleine B.W. Blockade of NMDA receptors in the dorsomedial striatum prevents action-outcome learning in instrumental conditioning. Eur. J. Neurosci. 2005;22:505–512. doi: 10.1111/j.1460-9568.2005.04219.x. [DOI] [PubMed] [Google Scholar]
  • 17.Hernandez P.J., Sadeghian K., Kelley A.E. Early consolidation of instrumental learning requires protein synthesis in the nucleus accumbens. Nat. Neurosci. 2002;5:1327–1331. doi: 10.1038/nn973. [DOI] [PubMed] [Google Scholar]
  • 18.Smith-Roe S.L., Kelley A.E. Coincident activation of NMDA and dopamine D1 receptors within the nucleus accumbens core is required for appetitive instrumental learning. J. Neurosci. 2000;20:7737–7742. doi: 10.1523/JNEUROSCI.20-20-07737.2000. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Atallah H.E., Lopez-Paniagua D., Rudy J.W., O’Reilly R.C. Separate neural substrates for skill learning and performance in the ventral and dorsal striatum. Nat. Neurosci. 2007;10:126–131. doi: 10.1038/nn1817. [DOI] [PubMed] [Google Scholar]
  • 20.Hernandez P.J., Schiltz C.A., Kelley A.E. Dynamic shifts in corticostriatal expression patterns of the immediate early genes Homer 1a and Zif268 during early and late phases of instrumental training. Learn. Mem. 2006;13:599–608. doi: 10.1101/lm.335006. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Everitt B.J., Robbins T.W. Neural systems of reinforcement for drug addiction: from actions to habits to compulsion. Nat. Neurosci. 2005;8:1481–1489. doi: 10.1038/nn1579. [DOI] [PubMed] [Google Scholar]
  • 22.Everitt B.J., Robbins T.W. From the ventral to the dorsal striatum: devolving views of their roles in drug addiction. Neurosci. Biobehav. Rev. 2013;37(9 Pt A):1946–1954. doi: 10.1016/j.neubiorev.2013.02.010. [DOI] [PubMed] [Google Scholar]
  • 23.Everitt B.J., Robbins T.W. Drug addiction: updating actions to habits to compulsions ten years on. Annu. Rev. Psychol. 2016;67:23–50. doi: 10.1146/annurev-psych-122414-033457. [DOI] [PubMed] [Google Scholar]
  • 24.Lipton D.M., Gonzales B.J., Citri A. Dorsal striatal circuits for habits, compulsions and addictions. Front. Syst. Neurosci. 2019;13:28. doi: 10.3389/fnsys.2019.00028. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Yin H.H., Ostlund S.B., Knowlton B.J., Balleine B.W. The role of the dorsomedial striatum in instrumental conditioning. Eur. J. Neurosci. 2005;22:513–523. doi: 10.1111/j.1460-9568.2005.04218.x. [DOI] [PubMed] [Google Scholar]
  • 26.Lerner T.N. Interfacing behavioral and neural circuit models for habit formation. J. Neurosci. Res. 2020;98:1031–1045. doi: 10.1002/jnr.24581. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Yin H.H., Mulcare S.P., Hilário M.R.F., Clouse E., Holloway T., Davis M.I., Hansson A.C., Lovinger D.M., Costa R.M. Dynamic reorganization of striatal circuits during the acquisition and consolidation of a skill. Nat. Neurosci. 2009;12:333–341. doi: 10.1038/nn.2261. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Haber S.N., Fudge J.L., McFarland N.R. Striatonigrostriatal pathways in primates form an ascending spiral from the shell to the dorsolateral striatum. J. Neurosci. 2000;20:2369–2382. doi: 10.1523/JNEUROSCI.20-06-02369.2000. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Murray J.E., Belin D., Everitt B.J. Double dissociation of the dorsomedial and dorsolateral striatal control over the acquisition and performance of cocaine seeking. Neuropsychopharmacology. 2012;37:2456–2466. doi: 10.1038/npp.2012.104. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Belin D., Everitt B.J. Cocaine seeking habits depend upon dopamine-dependent serial connectivity linking the ventral with the dorsal striatum. Neuron. 2008;57:432–441. doi: 10.1016/j.neuron.2007.12.019. [DOI] [PubMed] [Google Scholar]
  • 31.Willuhn I., Burgeno L.M., Everitt B.J., Phillips P.E.M. Hierarchical recruitment of phasic dopamine signaling in the striatum during the progression of cocaine use. Proc. Natl. Acad. Sci. USA. 2012;109:20703–20708. doi: 10.1073/pnas.1213460109. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Wickens J.R., Horvitz J.C., Costa R.M., Killcross S. Dopaminergic mechanisms in actions and habits. J. Neurosci. 2007;27:8181–8183. doi: 10.1523/JNEUROSCI.1671-07.2007. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Jin X., Costa R.M. Start/stop signals emerge in nigrostriatal circuits during sequence learning. Nature. 2010;466:457–462. doi: 10.1038/nature09263. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Wang L.P., Li F., Wang D., Xie K., Wang D., Shen X., Tsien J.Z. NMDA receptors in dopaminergic neurons are crucial for habit learning. Neuron. 2011;72:1055–1066. doi: 10.1016/j.neuron.2011.10.019. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Faure A., Haberland U., Condé F., El Massioui N. Lesion to the nigrostriatal dopamine system disrupts stimulus-response habit formation. J. Neurosci. 2005;25:2771–2780. doi: 10.1523/JNEUROSCI.3894-04.2005. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Nelson A., Killcross S. Amphetamine exposure enhances habit formation. J. Neurosci. 2006;26:3805–3812. doi: 10.1523/JNEUROSCI.4305-05.2006. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Bannard C., Leriche M., Bandmann O., Brown C.H., Ferracane E., Sánchez-Ferro Á., Obeso J., Redgrave P., Stafford T. Reduced habit-driven errors in Parkinson’s Disease. Sci. Rep. 2019;9:3423. doi: 10.1038/s41598-019-39294-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Witt K., Nuhsman A., Deuschl G. Dissociation of habit-learning in Parkinson’s and cerebellar disease. J. Cogn. Neurosci. 2002;14:493–499. doi: 10.1162/089892902317362001. [DOI] [PubMed] [Google Scholar]
  • 39.Seiler J.L., Cosme C.V., Sherathiya V.N., Bianco J.M., Lerner T.N. Dopamine signaling in the dorsomedial striatum promotes compulsive behavior. Curr. Biol. 2022 doi: 10.1016/j.cub.2022.01.055. Published online February 7, 2022. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Adams C.D., Dickinson A. Instrumental responding following reinforcer devaluation. Q. J. Exp. Psychol. B. 1981;33:109–121. [Google Scholar]
  • 41.Colwill R.M., Rescorla R.A. Postconditioning devaluation of a reinforcer affects instrumental responding. J. Exp. Psychol. Anim. Behav. Process. 1985;11:120–132. [Google Scholar]
  • 42.Wood W., Rünger D. Psychology of habit. Annu. Rev. Psychol. 2016;67:289–314. doi: 10.1146/annurev-psych-122414-033417. [DOI] [PubMed] [Google Scholar]
  • 43.Hilário M.R.F., Costa R.M. High on habits. Front. Neurosci. 2008;2:208–217. doi: 10.3389/neuro.01.030.2008. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44.Balleine B.W., Dickinson A. Goal-directed instrumental action: contingency and incentive learning and their cortical substrates. Neuropharmacology. 1998;37:407–419. doi: 10.1016/s0028-3908(98)00033-1. [DOI] [PubMed] [Google Scholar]
  • 45.Balleine B.W., Dezfouli A. Hierarchical action control: adaptive collaboration between actions and habits. Front. Psychol. 2019;10:2735. doi: 10.3389/fpsyg.2019.02735. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46.Buabang E.K., Boddez Y., De Houwer J., Moors A. Don’t make a habit out of it: Impaired learning conditions can make goal-directed behavior seem habitual. Motiv. Sci. 2021;7:252–263. [Google Scholar]
  • 47.Smith K.S., Graybiel A.M. Investigating habits: strategies, technologies and models. Front. Behav. Neurosci. 2014;8:39. doi: 10.3389/fnbeh.2014.00039. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48.Hilário M.R.F. Endocannabinoid signaling is critical for habit formation. Front. Integr. Neurosci. 2007;1:6. doi: 10.3389/neuro.07.006.2007. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49.Rossi M.A., Yin H.H. Methods for studying habitual behavior in mice. Curr. Protoc. Neurosci. 2012;8:29. doi: 10.1002/0471142301.ns0829s60. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 50.Robbins T.W., Costa R.M. Habits. Curr. Biol. 2017;27:R1200–R1206. doi: 10.1016/j.cub.2017.09.060. [DOI] [PubMed] [Google Scholar]
  • 51.Paxinos G., Watson C. The Rat Brain in Stereotaxic Coordinates. Academic Press; 1998. [DOI] [PubMed] [Google Scholar]
  • 52.Nieh E.H., Vander Weele C.M., Matthews G.A., Presbrey K.N., Wichmann R., Leppla C.A., Izadmehr E.M., Tye K.M. Inhibitory input from the lateral hypothalamus to the ventral tegmental area disinhibits dopamine neurons and promotes behavioral activation. Neuron. 2016;90:1286–1298. doi: 10.1016/j.neuron.2016.04.035. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 53.Thomas C.S., Mohammadkhani A., Rana M., Qiao M., Baimel C., Borgland S.L. Optogenetic stimulation of lateral hypothalamic orexin/dynorphin inputs in the ventral tegmental area potentiates mesolimbic dopamine neurotransmission and promotes reward-seeking behaviours. Neuropsychopharmacology. 2022;47:728–740. doi: 10.1038/s41386-021-01196-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 54.Zhang Z., Liu Q., Wen P., Zhang J., Rao X., Zhou Z., Zhang H., He X., Li J., Zhou Z., et al. Activation of the dopaminergic pathway from VTA to the medial olfactory tubercle generates odor-preference and reward. eLife. 2017;6:e25423. doi: 10.7554/eLife.25423. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 55.Galaj E., Han X., Shen H., Jordan C.J., He Y., Humburg B., Bi G.-H., Xi Z.-X. Dissecting the role of GABA neurons in the VTA versus SNr in opioid reward. J. Neurosci. 2020;40:8853–8869. doi: 10.1523/JNEUROSCI.0988-20.2020. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 56.Coimbra B., Domingues A.V., Soares-Cunha C., Correia R., Pinto L., Sousa N., Rodrigues A.J. Laterodorsal tegmentum-ventral tegmental area projections encode positive reinforcement signals. J. Neurosci. Res. 2021;99:3084–3100. doi: 10.1002/jnr.24931. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 57.Ilango A., Kesner A.J., Keller K.L., Stuber G.D., Bonci A., Ikemoto S. Similar roles of substantia nigra and ventral tegmental dopamine neurons in reward and aversion. J. Neurosci. 2014;34:817–822. doi: 10.1523/JNEUROSCI.1703-13.2014. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 58.Saunders B.T., Richard J.M., Margolis E.B., Janak P.H. Dopamine neurons create Pavlovian conditioned stimuli with circuit-defined motivational properties. Nat. Neurosci. 2018;21:1072–1083. doi: 10.1038/s41593-018-0191-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 59.Giuliano C., Puaud M., Cardinal R.N., Belin D., Everitt B.J. Individual differences in the engagement of habitual control over alcohol seeking predict the development of compulsive alcohol seeking and drinking. Addict. Biol. 2021;26:e13041. doi: 10.1111/adb.13041. [DOI] [PubMed] [Google Scholar]
  • 60.Zapata A., Minney V.L., Shippenberg T.S. Shift from goal-directed to habitual cocaine seeking after prolonged experience in rats. J. Neurosci. 2010;30:15457–15463. doi: 10.1523/JNEUROSCI.4072-10.2010. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 61.Wood W., Neal D.T. A new look at habits and the habit-goal interface. Psychol. Rev. 2007;114:843–863. doi: 10.1037/0033-295X.114.4.843. [DOI] [PubMed] [Google Scholar]
  • 62.Miller K.J., Shenhav A., Ludvig E.A. Habits without values. Psychol. Rev. 2019;126:292–311. doi: 10.1037/rev0000120. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 63.Dezfouli A., Balleine B.W. Actions, action sequences and habits: evidence that goal-directed and habitual action control are hierarchically organized. PLoS Comput. Biol. 2013;9:e1003364. doi: 10.1371/journal.pcbi.1003364. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 64.Dezfouli A., Lingawi N.W., Balleine B.W. Habits as action sequences: hierarchical action control and changes in outcome value. Philos. Trans. R. Soc. Lond. B Biol. Sci. 2014;369:20130482. doi: 10.1098/rstb.2013.0482. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 65.Garr E., Delamater A.R. Exploring the relationship between actions, habits, and automaticity in an action sequence task. Learn. Mem. 2019;26:128–132. doi: 10.1101/lm.048645.118. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 66.Schoenbaum G., Setlow B. Cocaine makes actions insensitive to outcomes but not extinction: implications for altered orbitofrontal-amygdalar function. Cereb. Cortex. 2005;15:1162–1169. doi: 10.1093/cercor/bhh216. [DOI] [PubMed] [Google Scholar]
  • 67.Nordquist R.E., Voorn P., de Mooij-van Malsen J.G., Joosten R.N.J.M.A., Pennartz C.M.A., Vanderschuren L.J.M.J. Augmented reinforcer value and accelerated habit formation after repeated amphetamine treatment. Eur. Neuropsychopharmacol. 2007;17:532–540. doi: 10.1016/j.euroneuro.2006.12.005. [DOI] [PubMed] [Google Scholar]
  • 68.Eyny Y.S., Horvitz J.C. Opposing roles of D1 and D2 receptors in appetitive conditioning. J. Neurosci. 2003;23:1584–1587. doi: 10.1523/JNEUROSCI.23-05-01584.2003. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 69.Ljungberg T., Apicella P., Schultz W. Responses of monkey dopamine neurons during learning of behavioral reactions. J. Neurophysiol. 1992;67:145–163. doi: 10.1152/jn.1992.67.1.145. [DOI] [PubMed] [Google Scholar]
  • 70.Choi W.Y., Balsam P.D., Horvitz J.C. Extended habit training reduces dopamine mediation of appetitive response expression. J. Neurosci. 2005;25:6729–6733. doi: 10.1523/JNEUROSCI.1498-05.2005. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 71.Witten I.B., Steinberg E.E., Lee S.Y., Davidson T.J., Zalocusky K.A., Brodsky M., Yizhar O., Cho S.L., Gong S., Ramakrishnan C., et al. Recombinase-driver rat lines: tools, techniques, and optogenetic application to dopamine-mediated reinforcement. Neuron. 2011;72:721–733. doi: 10.1016/j.neuron.2011.10.028. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 72.Willuhn I., Burgeno L.M., Groblewski P.A., Phillips P.E.M. Excessive cocaine use results from decreased phasic dopamine signaling in the striatum. Nat. Neurosci. 2014;17:704–709. doi: 10.1038/nn.3694. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 73.Clark J.J., Sandberg S.G., Wanat M.J., Gan J.O., Horne E.A., Hart A.S., Akers C.A., Parker J.G., Willuhn I., Martinez V., et al. Chronic microsensors for longitudinal, subsecond dopamine detection in behaving animals. Nat. Methods. 2010;7:126–129. doi: 10.1038/nmeth.1412. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 74.Olmstead M.C., Parkinson J.A., Miles F.J., Everitt B.J., Dickinson A. Cocaine-seeking by rats: regulation, reinforcement and activation. Psychopharmacology (Berl.) 2000;152:123–131. doi: 10.1007/s002130000498. [DOI] [PubMed] [Google Scholar]
  • 75.Balleine B.W., Garner C., Gonzalez F., Dickinson A. Motivational control of heterogeneous instrumental chains. J. Exp. Psychol. Anim. Behav. Process. 1995;21:203–217. [Google Scholar]
  • 76.Phillips P.E.M., Wightman R.M. Critical guidelines for validation of the selectivity of in-vivo chemical microsensors. Trends Anal. Chem. 2003;22:509–514. [Google Scholar]


RESOURCES