Optimal behavior and decision making result from a balance of control between two strategies, one cognitive/goal-directed and one habitual. These systems are known to rely on the anatomically distinct dorsomedial (DMS) and dorsolateral (DLS) striatum, respectively. But the transcriptional regulatory mechanisms required to learn and transition between these strategies are unknown. Here we examined the role of one chromatin-based transcriptional regulator, histone modification via histone deacetylases (HDACs), in this process.
To do this, we combined procedures that diagnose behavioral strategy in rats with pharmacological and viral-mediated HDAC manipulations, chromatin immunoprecipitation, and mRNA quantification.
The results indicate that dorsal striatal HDAC3 activity constrains habit formation. Systemic HDAC inhibition following instrumental (lever press→reward) conditioning increased histone acetylation throughout the dorsal striatum and accelerated habitual control of behavior. HDAC3 was removed from the promoters of key, CREB-regulated, learning-related genes in the dorsal striatum as habits formed with overtraining and with post-training HDAC inhibition. Decreasing HDAC3 function, either by selective pharmacological inhibition or expression of dominant-negative mutated HDAC3, in either the DLS or DMS accelerated habit formation, while HDAC3 overexpression in either region prevented habit.
These results challenge the strict dissociation between DMS and DLS function in goal-directed v. habitual behavioral control and identify dorsal striatal HDAC3 as a critical molecular directive of the transition to habit. Because this transition is disrupted in many neurodegenerative and psychiatric diseases, these data suggest a potential molecular mechanism for the negative behavioral symptoms of these conditions and a target for therapeutic intervention.
Keywords: Epigenetic, Reward, Learning, Decision making, Chromatin, HDAC3, Instrumental conditioning
Humans and animals rely on two distinct strategies for decision making (1, 2). The deliberative, goal-directed strategy requires prospective evaluation of potential actions and their anticipated consequences, via a learned association between these variables, and therefore supports behavior that can readily adapt when circumstances change (3). Repetition of successful actions promotes less cognitively taxing habits, in which behavior is more automatically triggered by antecedent events (4). The balance between these two forms of learning promotes adaptive and efficient behavior, but disruptions of this balance can lead to the symptoms that underlie myriad neurodegenerative and psychiatric diseases (5, 6).
The goal-directed and habit strategies have been demonstrated to rely on largely distinct cortico-basal ganglia circuits centered on the dorsomedial (DMS) (7–10) and dorsolateral (DLS) (11, 12) striatum, respectively. Although gene transcription has been implicated (13–18), the mechanisms that regulate transcriptional events to support the acquisition of and transition between these behavioral control strategies are unknown. Post-translational modifications of core histone proteins alter accessibility to DNA for transcriptional machinery to coordinate gene expression, and have been implicated in the regulation of neuronal plasticity and memory (19–21), and, thus, could regulate goal-directed and/or habit learning. But the function of such mechanisms in instrumental learning or in the dorsal striatum is not known. Histone deacetylases (HDACs) are particularly interesting because they are removed from gene promoters after salient behavioral events, allowing histone acetyltransferases (HATs) to promote histone acetylation, which in turn allows transcriptional processes supporting the long-lasting changes in neuronal function that ultimately give rise to learning (22–24). Here we evaluated the role of HDACs in instrumental action-reward learning using procedures that diagnose behavioral strategy combined with pharmacological or viral-mediated manipulation of HDAC function and molecular analysis.
A detailed description of the methods is provided in the Supplemental Material. Specific experiments are described below. All procedures were conducted in accordance with the NIH Guide for the Care and Use of Laboratory Animals and were approved by the UCLA Institutional Animal Care and Use Committee.
Effect of post-training HDAC inhibition on instrumental learning and behavioral strategy
We first examined the function of HDACs in goal-directed and habit learning by systemically inhibiting HDAC activity following instrumental conditioning and then probing behavioral strategy off drug. Adult, male rats were trained to lever press to earn delivery of a food-pellet reward on a random-interval-30-s reinforcement schedule. Rats were given either limited (3 days) or intermediate (4 days) training, known to preserve goal-directed control, or extended (6 days) training known to allow habits to dominate (4) (Fig. 1A). Rats were administered the nonspecific class I HDAC inhibitor sodium butyrate (25) (NaBut, 1.0 g/kg, i.p. (26–30)) or sterile-water vehicle immediately after each training session to examine the function of HDACs in consolidation of the learning underlying instrumental performance. All rats acquired the behavior; both vehicle- and NaBut-treated rats increased their lever-press rate across training (Fig. 1B–D; Training day: Limited, F3,60=92.59, P<0.0001; Intermediate, F4,80=50.69, P<0.0001; Extended, F6,96=18.64, P<0.0001), though the NaBut group plateaued at a lower rate than vehicle controls with limited (Drug x Day: F3,60=3.63, P=0.02), or intermediate (F4,80=5.49, P<0.001), but not extended (F6,96=0.42, P=0.86) training.
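For readers unfamiliar with random-interval reinforcement, the following is a minimal sketch of how an RI-30-s schedule decides which lever presses earn a pellet. It is an illustration only, not the control program used in these experiments: the function name `simulate_random_interval`, the exponential waiting-time model, and the press times are all assumptions made for the example.

```python
import random


def simulate_random_interval(press_times, mean_interval_s=30.0, seed=0):
    """Return the press times that would be rewarded on a random-interval schedule.

    Assumption (hypothetical model): a reward becomes available after an
    exponentially distributed wait with the given mean, and then stays
    available until the next lever press collects it.
    """
    rng = random.Random(seed)
    rewarded = []
    # Time at which the first reward becomes available.
    armed_at = rng.expovariate(1.0 / mean_interval_s)
    for t in sorted(press_times):
        if t >= armed_at:
            rewarded.append(t)
            # The next interval starts timing from the rewarded press.
            armed_at = t + rng.expovariate(1.0 / mean_interval_s)
    return rewarded


if __name__ == "__main__":
    # Hypothetical session: one press per second for 30 min.
    presses = [float(t) for t in range(1, 1801)]
    earned = simulate_random_interval(presses)
    print(f"{len(earned)} of {len(presses)} presses were rewarded")
```

Under this model, pressing faster earns few additional rewards, because reward availability is set by elapsed time rather than by the number of presses.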
Figure 1. Effect of post-training HDAC inhibition on instrumental learning, behavioral strategy, and acetylation at histone H4.
(A) Schematic representation of procedures. (B–D) Instrumental (lever press→reward) training performance for rats given limited (B; N=11/group), intermediate (C; N=11–12/group), or extended (D; N=9/group) instrumental training (data presented as mean + s.e.m.). CRF, continuous reinforcement: on the first training day, lever pressing was continuously reinforced with food-pellet delivery. An RI-30-s reinforcement schedule was in place thereafter. (E–G) Lever presses during the subsequent devaluation tests normalized to total presses across both tests for the Valued [Valued state presses/(Valued + Devalued state presses)] and Devalued [Devalued state presses/(Valued + Devalued state presses)] states for rats that received limited (E), intermediate (F), or extended (G) training (data presented as mean + scatter). Dashed line indicates point of equal responding between tests. (H) Schematic representation of procedures. (I) Representative immunofluorescent image and (J) quantification of acetylation of H4K8 (H4K8Ac; N=4–6/condition) 1 hr following instrumental training/drug treatment. Scale bar = 20 μm. Data normalized to vehicle control (dashed line). *P<0.05; **P<0.01.
Behavioral strategy cannot be determined from simple lever-press performance. To identify the degree of goal-directed v. habitual control, rats were given a 5-min, drug-free, outcome-specific devaluation test (Fig. 1A). Non-reinforced lever pressing was assessed following sensory-specific satiation (1-hr pre-feeding) on the food pellet earned by lever pressing (Devalued state). This was compared to pressing after satiation on an alternate food pellet that had been non-contingently provided daily outside of the training context (Valued state). Each rat was tested under both conditions and data were normalized between Valued and Devalued conditions to provide an index of the extent to which behavior was under goal-directed v. habitual control and to control for variability in press rates between subjects (for raw press rates, see Fig. S1). If subjects are using a goal-directed strategy and, therefore, considering the consequences of their actions, they should downshift pressing in the devalued state (‘sensitivity to devaluation’). Habits are marked by insensitivity to devaluation. As expected, in vehicle-treated subjects we observed a reduction in lever pressing in the Devalued state relative to the Valued state in the limited training group (Fig. 1E; Devaluation: F1,20=19.68, P=0.003) and a failure to reduce responding following devaluation in the extended training group (Fig. 1G; F1,16=0.62, P>0.999). This was not altered by post-training HDAC inhibition (Drug: Limited, F1,20=0.00, P>0.999; Extended, F1,16=1.88, P=0.19; Drug x Devaluation: Limited, F1,20=0.41, P=0.53; Extended, F1,16=0.06, P=0.80), suggesting that HDAC inhibition neither disrupted initial goal-directed learning, nor prevented habit formation with overtraining. NaBut-treated subjects did show more variable performance following overtraining, which may have resulted from off-target effects of extended NaBut treatment.
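The normalization described above can be sketched in a few lines. This is a minimal illustration of the stated formula, Valued/(Valued + Devalued) and Devalued/(Valued + Devalued); the function name `normalized_presses` and the press counts are hypothetical and are not data from this study.

```python
def normalized_presses(valued_presses, devalued_presses):
    """Return (valued_share, devalued_share) of total presses across both tests.

    Shares near 0.5/0.5 indicate equal responding in the two states
    (insensitivity to devaluation); a valued share well above 0.5
    indicates sensitivity to devaluation.
    """
    total = valued_presses + devalued_presses
    if total == 0:
        raise ValueError("no presses recorded in either test")
    return valued_presses / total, devalued_presses / total


if __name__ == "__main__":
    # Hypothetical rat: 30 presses in the Valued test, 10 in the Devalued test.
    print(normalized_presses(30, 10))  # (0.75, 0.25)
```

Because the two shares always sum to 1, this index removes between-subject differences in overall press rate, which is why a main effect of group on the normalized data is expected to be near zero.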
Post-training HDAC inhibition did, however, accelerate the rate at which habit came to dominate behavioral control. With intermediate training, NaBut-treated animals became insensitive to devaluation of the earned reward, failing to reduce responding in the Devalued v. Valued states, while lever pressing in vehicle-treated subjects remained sensitive to devaluation (Fig. 1F; Devaluation: F1,20=1.68, P=0.21; Drug: F1,20=0.74, P=0.40; Drug x Devaluation: F1,20=4.85, P=0.04). In separate groups, we found that post-training HDAC inhibition accelerated habit formation when behavioral strategy was assayed by reversal of the action-outcome contingency (Fig. S2), which does not involve a value or satiety manipulation, and occurred even when rats were trained on a random-ratio reinforcement schedule (Fig. S3) known to promote goal-directed control (1). Importantly, this effect did not manifest if NaBut was administered outside the memory-consolidation window (Fig. S4). These data reveal that inhibition of class I HDACs following instrumental conditioning facilitates the transition to habitual control of behavior, allowing habits to dominate at a point in training that behavior would normally remain primarily goal-directed.
Effect of post-training HDAC inhibition on dorsal striatal histone acetylation and neuronal activation
HDACs constrain histone acetylation levels by removing acetyl groups from histone tails, and HDAC inhibition can prevent the rapid deacetylation of histone proteins (19). HDAC inhibitor treatment should, therefore, increase histone acetylation in brain regions in which HATs are activated (21). In a separate group of subjects, we next used immunofluorescence (26, 31, 32) to determine the brain region-specific effects of peripheral NaBut treatment following instrumental training (Fig. 1H). We hypothesized that histone acetylation would be increased in the DLS, a region critical for habit formation (11) and demonstrated to be transcriptionally active during habit learning (17). We focused on acetylation of histone H4 lysine 8 (H4K8Ac), a histone acetylation mark implicated in memory formation (33, 34). One hour after the last intermediate training session and drug delivery, H4K8Ac was significantly higher in the DLS (t8=3.08, P=0.02) of NaBut-treated subjects compared to controls (Fig. 1I–J, see also Fig. S5). H4K8Ac was also elevated in the DMS (t8=2.41, P=0.04), but there were no significant differences detected in any other regions examined (see Table S1). This increase in histone acetylation was restricted to neurons in the DMS and occurred in both neuronal and non-neuronal cells in the DLS (Fig. S6). Using real-time quantitative polymerase chain reaction (qPCR), we found that both the DMS and DLS were engaged (as measured by immediate early gene expression) during instrumental conditioning and this was not significantly altered by post-training HDAC inhibition (Fig. S7).
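As an illustration of how a two-group comparison of this kind can be computed, the sketch below normalizes optical-density values to the vehicle-group mean and calculates an unpaired Student's t statistic with pooled variance. It assumes an unpaired, equal-variance test (consistent with, but not confirmed by, the reported degrees of freedom); the function names and all numbers are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def normalize_to_control(values, control_values):
    """Express each value as a fraction of the control-group mean."""
    control_mean = mean(control_values)
    return [v / control_mean for v in values]


def student_t(group_a, group_b):
    """Return (t, df) for an unpaired Student's t-test with pooled variance."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    t = (mean(group_a) - mean(group_b)) / sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return t, df


if __name__ == "__main__":
    # Hypothetical raw optical densities, five subjects per group.
    vehicle = [0.90, 1.10, 1.00, 0.95, 1.05]
    drug = [1.30, 1.20, 1.45, 1.25, 1.35]
    t, df = student_t(normalize_to_control(drug, vehicle),
                      normalize_to_control(vehicle, vehicle))
    print(f"t{df} = {t:.2f}")
```

With five subjects per group the test has 8 degrees of freedom, matching the form of the t8 values reported here.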
Effect of training and post-training HDAC inhibition on HDAC3 occupancy at learning-related gene promoters and gene expression in the dorsolateral striatum
By removing acetyl groups, HDACs typically repress gene transcription (19). That post-training HDAC inhibition accelerated the transition to habit suggests that HDACs might normally be engaged early in learning to restrain gene expression supporting habit formation. If this is true, then HDAC occupancy at the promoters of key learning-related genes might be enriched with intermediate instrumental training to prevent dominance by the habit strategy, but return to baseline levels when conditions are ripe for habit to dominate, e.g., with overtraining or HDAC inhibition. To test this, we used chromatin immunoprecipitation (ChIP) coupled with qPCR to examine HDAC occupancy at the promoter regions of Bdnf1, Nr4a1, and Nr4a2. These genes were selected because they have been shown to be regulated by histone acetylation, are involved in long-term memory formation (30, 31, 33–35), and are regulated by cAMP response element-binding protein (CREB) (36–38), a transcription factor implicated in habit-like behaviors (13, 16, 17). We focused on the DLS, given its crucial role in habit formation and findings of training-related activity and NaBut regulation of H4K8Ac in this region, and on HDAC3, the most highly expressed class I HDAC in the striatum (39).
1 hr following the last intermediate training session (Fig. 2A), HDAC3 occupancy at the Bdnf exon 1 promoter was enriched relative to homecage controls (P<0.001), but returned to control levels with extended training and was not elevated above control levels when intermediate training was followed by NaBut treatment (Fig. 2B; F3,12=18.53, P<0.001). A similar pattern was detected for HDAC3 occupancy at the promoter for Nr4a2 (Fig. 2D; F3,12=5.74, P=0.01). Although there was an overall effect of training/treatment group on HDAC3 occupancy at the Nr4a1 promoter (Fig. 2C; F3,21=3.12, P=0.05), in no case was HDAC3 enrichment at this promoter significantly different from control (P>0.05). Expression of Bdnf1 (Fig. 2E), quantified with qPCR, was significantly elevated 1 hr following intermediate training with NaBut treatment relative to homecage controls (t13=2.41, P=0.03), the intermediate trained control group (t13=3.02, P=0.01), and controls following extended training (t13=3.23, P=0.01). A similar pattern was detected in Bdnf9 expression (Fig. S8). Nr4a1 and Nr4a2 expression (Fig. 2F–G) were not significantly different from controls in any condition.
Figure 2. Effect of training and post-training HDAC inhibition on HDAC3 occupancy at learning-related gene promoters and gene expression in the dorsolateral striatum.
(A) Schematic representation of procedures. (B–D) ChIP was performed with anti-HDAC3 followed by qPCR to identify HDAC3 binding to the Bdnf1 (B), Nr4a1 (C), or Nr4a2 (D) promoters in the DLS of home cage (HC) controls or following either intermediate (INT) or extended training (EXT) in vehicle-treated rats, or NaBut treatment post-intermediate training. Data presented as fold change relative to IgG (% Input/IgG). (E–G) mRNA expression of Bdnf1 (E), Nr4a1 (F), and Nr4a2 (G) in the DLS. *P<0.05, **P<0.01, between groups; ##P<0.01 relative to HC.
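The "fold change relative to IgG (% Input/IgG)" measure can be illustrated with the widely used percent-input calculation for ChIP-qPCR. The sketch below is an assumption about the general approach, not the authors' analysis pipeline (see the Supplemental Material for the actual methods): it assumes 100% amplification efficiency, a 1% input sample by default, and hypothetical Ct values and function names.

```python
from math import log2


def percent_input(ct_ip, ct_input, input_fraction=0.01):
    """Percent of input chromatin recovered in an immunoprecipitation.

    The input Ct is first adjusted to what it would be for 100% of the
    chromatin (subtract log2 of the dilution factor), assuming the
    template doubles every cycle.
    """
    adjusted_input_ct = ct_input - log2(1.0 / input_fraction)
    return 100.0 * 2.0 ** (adjusted_input_ct - ct_ip)


def fold_over_igg(ct_target_ip, ct_igg_ip, ct_input, input_fraction=0.01):
    """Enrichment of the specific antibody pull-down relative to the IgG control."""
    target = percent_input(ct_target_ip, ct_input, input_fraction)
    background = percent_input(ct_igg_ip, ct_input, input_fraction)
    return target / background


if __name__ == "__main__":
    # Hypothetical Ct values for one promoter in one sample.
    print(fold_over_igg(ct_target_ip=28.0, ct_igg_ip=30.0, ct_input=25.0))
```

Because both pull-downs are scaled by the same input, the ratio reduces to 2 raised to the Ct difference between the IgG and specific-antibody samples, so a two-cycle advantage corresponds to roughly fourfold enrichment.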
These data suggest that HDAC3 occupancy at the Bdnf1 and Nr4a2 promoters in the DLS is normally enriched during early instrumental learning, but becomes dissociated from these gene promoters as habits dominate behavioral control. Post-training HDAC inhibition accelerates both habit formation and removal of HDAC3 from the promoters of these key learning genes, and also induces Bdnf expression. Therefore, HDAC inhibition immediately following instrumental training caused a transcriptionally-permissive, hyperacetylated histone state, disengaged HDAC3 from specific gene promoters in the DLS, and facilitated behavioral control by the habit system.
Effect of HDAC3 manipulation in dorsolateral striatum on habit formation
That HDAC3 is engaged in the DLS during early learning and then disengaged with the overtraining that promotes habit suggests that decreasing DLS HDAC3 activity might promote habit formation. We tested this hypothesis in two ways: first, by pharmacologically inhibiting HDAC3 activity specifically in the DLS immediately after each intermediate instrumental training session and, second, by expressing a dominant negative HDAC3 point mutant (34) in the DLS prior to all training to selectively disrupt HDAC3 enzymatic activity without affecting protein-protein interactions (40, 41) (Fig. 3A). In both cases, habit formation was potentiated, recapitulating the systemic HDAC inhibition result. Post-training intra-DLS infusion of the selective HDAC3 inhibitor RGFP966 (1.0 ng/0.5μl/side) (31, 42, 43) elevated H4K8Ac in the DLS (Fig. 3B–C; t8=3.15, P=0.014; see also Fig. S9), but not the adjacent DMS (Fig. 3B; Normalized H4K8Ac optical density: Vehicle, 1.011±0.10; RGFP966, 1.00±0.27; t3=0.06, P=0.954) compared to vehicle-treated controls. This treatment did not alter the acquisition of instrumental lever-press behavior (Fig. 3D; Training day: F4,68=32.61, P<0.001; Drug: F1,17=0.95, P=0.34; Drug x Day: F4,68=2.32, P=0.07), but did render this behavior insensitive to devaluation of the earned reward under conditions in which intermediate-trained controls remained sensitive (Fig. 3E; Devaluation: F1,17=10.83, P=0.004; Drug: F1,17=1.12, P=0.31; Drug x Devaluation: F1,17=7.25, P=0.02). Similarly, expressing a dominant negative point mutant of HDAC3 (AAV2/1-CMV-HDAC3Y298H-V5) in the DLS produced H4K8 hyperacetylation (Fig. 3F–G; t10=3.91, P=0.003) and insensitivity to outcome devaluation (Fig. 3I; Devaluation: F1,14=3.74, P=0.07; Virus: F1,14=0.47, P=0.506; Virus x Devaluation: F1,14=6.19, P=0.03), without altering lever-pressing acquisition (Fig. 3H; Training day: F4,56=29.21, P<0.001; Virus: F1,14<0.01, P=0.99; Virus x Day: F4,56=0.56, P=0.69).
These results demonstrate that attenuating DLS HDAC3 activity promotes habit formation.
Figure 3. Effect of HDAC3 manipulation in dorsolateral striatum on habit formation.
(A) Schematic representation of procedures. (B) Top, schematic representation of injector tips in the DLS. Numbers to the lower right of each section represent distance (mm) anterior to bregma. Coronal section drawings taken from (96). Middle, representative immunofluorescent images of H4K8Ac in DMS (left) and DLS (right) 1 hr following instrumental training/intra-DLS vehicle (top) or RGFP966 (bottom) infusion. Scale bar = 20 μm. (F, J) Top, schematic representation of HDAC3 point mutant (F) or HDAC3 (J) expression in the DLS for all subjects. Middle, representative immunofluorescent images of V5-tagged HDAC3 point mutant (F) or HDAC3 (J) expression in the DLS. Bottom, representative immunofluorescent images of H4K8Ac in rats expressing the HDAC3 point mutant (F) or HDAC3 (J) either outside (left) or inside (right) of the expression zone. (C, G, K) Quantification of H4K8Ac for rats receiving intra-DLS vehicle or RGFP966 infusion (C; N=5/group), intra-DLS empty vector (EV) or HDAC3 point mutant (HDAC3pm) (G; N=6/group), or intra-DLS empty vector or HDAC3 overexpression (K; N=5–6/group) (data presented as mean + scatter). (D, H, L) Instrumental training performance for rats given post-training intra-DLS vehicle or RGFP966 infusions (D; N=10–12/group), intra-DLS empty vector or HDAC3 point mutant (H; N=8/group), or intra-DLS empty vector or HDAC3 (L; N=11–13/group) (data presented as mean + s.e.m.). CRF, continuous reinforcement: on the first training day, lever pressing was continuously reinforced with food-pellet delivery. (E, I, M) Normalized lever presses during the subsequent devaluation tests for rats given post-training intra-DLS vehicle or RGFP966 infusions (E), intra-DLS empty vector (EV) or HDAC3 point mutant (I), or intra-DLS empty vector (EV) or HDAC3 (M). *P<0.05; **P<0.01; ***P<0.001.
The combined data suggest that in the DLS HDAC3 might normally be engaged to restrain habits when sufficient action repetition has not yet occurred. If this is true, then increasing HDAC3 activity in the DLS should slow or prevent habit formation. To test this, we overexpressed HDAC3 in the DLS (AAV2/1-CMV-HDAC3-V5), used immunofluorescence to confirm that this reduced H4K8Ac in the DLS (Fig. 3J–K; t9=2.29, P=0.048), and gave subjects extended instrumental training known to promote habit. All subjects similarly acquired the lever-press behavior (Fig. 3L; Training day: F6,126=81.44, P<0.001; Virus: F1,21=0.06, P=0.81; Virus x Day: F6,126=0.44, P=0.85). As expected, control subjects showed evidence of habits (insensitivity to devaluation), whereas subjects with HDAC3 overexpressed in the DLS were unable to form behavioral habits and continued to show sensitivity to devaluation (Fig. 3M; Devaluation: F1,21=2.24, P=0.15; Virus: F1,21<0.01, P>0.999; Virus x Devaluation: F1,21=6.45, P=0.02), even after extensive overtraining (Fig. S10). Together, these data demonstrate that in the DLS HDAC3 is a critical negative regulator of habit formation.
Effect of HDAC3 manipulation in dorsomedial striatum on habit formation
We found that systemic post-instrumental-training HDAC inhibition increased H4K8Ac in not only the DLS, but also the DMS. To further understand how altered regulation of gene transcription in these dissociable brain systems contributes to a transition in behavioral control, we next asked whether manipulating HDAC3 in the DMS would modify the progression of instrumental strategy (Fig. 4A). Both post-training intra-DMS HDAC3 inhibition (RGFP966, 1.0 ng/0.5μl/side; Fig. 4B–C; DMS: t7=2.54, P=0.04; DLS: Vehicle, 1.00±0.02 normalized H4K8Ac optical density v. RGFP966 1.09±0.07, t7=1.30, P=0.23; see also Fig. S11) and expression of the dominant negative HDAC3 point mutant in the DMS (Fig. 4F–G; t7=4.32, P=0.004) induced a hyperacetylated H4K8 state that was restricted to the DMS. Neither treatment affected acquisition of the lever-press behavior (RGFP966: Fig. 4D; Training day: F4,64=45.38, P<0.001; Drug: F1,16=0.06, P=0.81; Drug x Day: F4,64=0.12, P=0.98; HDAC3pm: Fig. 4H; Day: F4,84=32.38, P<0.001; Virus: F1,21=0.19, P=0.67; Virus x Day: F4,84=0.44, P=0.78). To our surprise, given the canonical function of the DMS in goal-directed, not habit, learning (7, 8), both post-training intra-DMS RGFP966 (Fig. 4E; Devaluation: F1,16=1.16, P=0.298; Drug: F1,16=0.00, P>0.999; Drug x Devaluation: F1,16=4.79, P=0.04) and DMS HDAC3 point mutant expression (Fig. 4I; Devaluation: F1,21=1.94, P=0.18; Virus: F1,21=17.61, P<0.001; Virus x Devaluation: F1,21=5.52, P=0.03) potentiated habit formation, as indicated by insensitivity to devaluation of the earned reward following intermediate training. Conversely, DMS HDAC3 overexpression reduced H4K8Ac (Fig. 4J–K; t8=3.27, P=0.01) and did not alter lever-pressing acquisition (Fig. 4L; Day: F6,132=61.69, P<0.001; Virus: F1,22=1.36, P=0.26; Virus x Day: F6,132=2.01, P=0.07), but prevented habit formation (Fig. 4M; Devaluation: F1,22=7.06, P=0.01; Virus: F1,22=2.68, P=0.12; Virus x Devaluation: F1,22=4.34, P=0.049), even with extensive overtraining (Fig. S12). These unexpected results demonstrate that, as in the DLS, DMS HDAC3 negatively regulates the transition to habit.
Figure 4. Effect of HDAC3 manipulation in dorsomedial striatum on habit formation.
(A) Schematic representation of procedures. (B) Top, schematic representation of injector tips in the DMS. Numbers to the lower right of each section represent distance (mm) anterior to bregma. Middle, representative immunofluorescent images of H4K8Ac in DMS (left) and DLS (right) 1 hr following instrumental training/intra-DMS vehicle (top) or RGFP966 (bottom) infusion. Scale bar = 20 μm. (F, J) Top, schematic representation of HDAC3 point mutant (F) or HDAC3 (J) expression in the DMS for all subjects. Middle, representative immunofluorescent images of HDAC3 point mutant (F) or HDAC3 (J) expression in the DMS. Bottom, representative immunofluorescent images of H4K8Ac in rats expressing the HDAC3 point mutant (F) or HDAC3 (J) either outside (left) or inside (right) of the expression zone. (C, G, K) Quantification of H4K8Ac for rats receiving intra-DMS vehicle or RGFP966 infusion (C; N=4–5/group), intra-DMS empty vector (EV) or HDAC3 point mutant (G; N=4–5/group), or intra-DMS empty vector or HDAC3 overexpression (K; N=4–6/group) (data presented as mean + scatter). (D, H, L) Instrumental training performance for rats given post-training intra-DMS vehicle or RGFP966 infusions (D; N=9/group), intra-DMS empty vector or HDAC3 point mutant (H; N=11–12/group), or intra-DMS empty vector or HDAC3 (L; N=12/group) (data presented as mean + s.e.m.). CRF, continuous reinforcement: on the first training day, lever pressing was continuously reinforced with food-pellet delivery. (E, I, M) Normalized lever presses during the subsequent devaluation tests for rats given post-training intra-DMS vehicle or RGFP966 infusions (E), intra-DMS empty vector (EV) or HDAC3 point mutant (I), or intra-DMS empty vector (EV) or HDAC3 (M). *P<0.05; **P<0.01.
Given this surprising finding, we next examined normal HDAC3 engagement in the DMS with instrumental training. Using ChIP coupled with qPCR (Fig. 5A), we found that, unlike in the DLS, HDAC3 occupancy at Bdnf1 was not significantly altered 1 hr following instrumental training or intermediate training followed by systemic NaBut treatment (Fig. 5B; F3,22=1.87, P=0.16). Similarly, in no case was HDAC3 enrichment at the Nr4a2 promoter significantly different from controls (Fig. 5D; F3,21=2.74, P=0.07). Instead, HDAC3 occupancy at the Nr4a1 promoter (Fig. 5C; F3,12=7.76, P=0.004) was increased relative to homecage controls with intermediate training (P<0.05), returned to control levels following extended training, and was not elevated following intermediate training with systemic NaBut treatment. Follow-up mRNA analyses (Fig. 5E–G) revealed that Nr4a2 expression (Fig. 5G; F3,31=3.45, P=0.03) was attenuated relative to homecage controls with intermediate training (P<0.01) and intermediate training with systemic NaBut treatment (P<0.05). Bdnf1 and Nr4a1 were not significantly altered (Fig. 5E; Bdnf1, F3,16=1.57, P=0.24; see also Fig. S8; Nr4a1, F3,25=1.87, P=0.55). These results suggest that HDAC3 activity in the DMS and DLS is differentially regulated by instrumental training as habits form.
Figure 5. Effect of training and post-training HDAC inhibition on HDAC3 occupancy at learning-related gene promoters and gene expression in the dorsomedial striatum.
(A) Schematic representation of procedures. (B–D) ChIP was performed with anti-HDAC3 followed by qPCR to identify HDAC3 binding to the Bdnf1 (B), Nr4a1 (C), or Nr4a2 (D) promoters in the DMS of home cage (HC) controls or following either intermediate training (INT) or extended training (EXT) in vehicle-treated rats, or NaBut treatment post-intermediate training. Data presented as fold change relative to IgG (% Input/IgG). (E–G) mRNA expression of Bdnf1 (E), Nr4a1 (F), and Nr4a2 (G) in the DMS. **P<0.01, between groups; ##P<0.01 relative to homecage control.
Reward seeking and decision making are controlled by a balance between two systems, one reflective, involving prospective consideration of learned action consequences, and one reflexive, allowing common behaviors to be automatically triggered by antecedent events on the basis of their past success. The data here provide converging evidence in support of dorsal striatal HDAC3 as a critical molecular regulator of the transition of behavioral control to the habit strategy. Systemic HDAC inhibition during instrumental acquisition increases histone acetylation in the dorsal striatum and accelerates habitual control of behavior. HDAC3 occupancy at specific learning-related gene promoters in the dorsal striatum is reduced, relative to early training, when habits form with overtraining and this is mimicked by HDAC inhibition. Using local pharmacological and viral manipulations, we found that HDAC3 activity constrains habit learning in the DLS, previously implicated in habit (11), and also in the DMS, which has been canonically ascribed a role in the opposing, goal-directed, strategy (7, 8).
HDAC3 in the dorsal striatum functions as a negative regulator of habit: disrupting it accelerated the rate at which behavioral control transitioned to dominance by habit, and overexpressing it prevented subjects from forming habits under conditions that would normally promote them. The temporally restricted effect of post-training HDAC inhibition suggests that HDAC3 negatively regulates the consolidation of habit memories. Habits are slow to form, being gradually acquired with repetition of successful actions in the presence of consistent stimuli, but once fully formed can be executed almost automatically, freeing attention to be focused elsewhere (1, 2). The current data suggest that in the dorsal striatum the repressive enzyme HDAC3 could be a molecular ‘brake’ (22, 23) on this type of learning, being engaged to slow habit formation and preserve behavioral control by the more cognitively taxing, but less error-prone, goal-directed system until enough successful repetition has proceeded to ensure sufficient accuracy of the habit. In support of this, early in training, when habits do not yet dominate behavioral control, HDAC3 occupancy at the Bdnf1 and Nr4a2 promoters in the DLS and at the Nr4a1 promoter in the DMS is enriched and then returns to baseline levels with the repeated training that promotes habit. That overexpression of HDAC3 in the dorsal striatum prevented habit formation further suggests that, under suitable conditions (e.g., repeated success), an instrumental learning opportunity triggers activity-dependent signaling that removes HDAC3 to create the transcriptionally permissive state that allows habits to strengthen and, eventually, come to control behavior. Dorsal striatal HDAC3, therefore, normally curtails habit.
Consolidation of long-term memories, such as habits, depends on gene transcription (44). HDAC3 may, therefore, function to curtail the gene transcription underlying habit. Transcription regulated by CREB has been shown to be essential for long-term memory (45–48), and CREB function in the dorsal striatum is crucial for habit learning (13). The enhancement of memory by HDAC inhibition has been demonstrated to occur through regulation of CREB-regulated genes (33, 36), including Bdnf, Nr4a1, and Nr4a2 (36, 49), genes themselves implicated in learning (33, 35–37, 50–52). Here we found that as habits form HDAC3 was disengaged (relative to its enriched state during early learning) from the Bdnf1 and Nr4a2 promoters in the DLS and the Nr4a1 promoter in the DMS. HDAC inhibition, which accelerated habit formation, similarly altered HDAC3 occupancy at these promoters and, in some cases, induced the expression of these genes. These data suggest that a probable mechanism for dorsal striatal HDAC3 moderation of habit formation is via regulation of CREB-regulated genes. HDAC3 represses transcription by removing acetylation and recruiting complementary repressive enzymes, such as methyltransferases or phosphatases. HDAC removal allows HATs to promote histone acetylation, which decreases affinity of histone tails for DNA and serves as a recruitment signal for transcriptional coactivators (19) that can promote active gene expression subserving synaptic plasticity and learning (22–24). Although the data show that H4K8Ac is increased in the dorsal striatum by post-training HDAC inhibition, the precise mechanisms of HDAC3 regulation of CREB-regulated genes in the dorsal striatum, e.g., whether it is via acetylation at H4K8, other histone sites, a combination of marks, and/or through co-repressor mechanisms, remain to be explored.
Interestingly, HDAC3 binding at gene promoters and gene expression did not always perfectly match. This discrepancy could be due to differential temporal dynamics between transcriptional activation and HDAC3-promoter interactions (34), but it also likely indicates that HDAC3 binding is not the sole determinant of expression of the genes examined here. Indeed, gene expression is regulated by a complex concert of factors, including repressive enzymes, like HDAC3, and activating factors, such as transcription factors and HATs. Recruitment of HATs, such as CREB-binding protein (28, 36), is required for HDAC3 regulation of gene transcription and memory. Moreover, it is possible that the non-specific class I HDAC inhibition treatment engaged repressive mechanisms (53) beyond those normally engaged and thereby regulated gene expression differently than learning alone, perhaps suggesting alternate, non-natural routes to habit. Future experiments assessing other transcriptional regulators and/or using genome-wide approaches are needed to further elucidate the molecular mechanisms supporting habit.
Surprisingly, HDAC3 was found to negatively regulate habit in both the DLS and DMS. HDAC3 inhibition restricted to the DMS potentiated habit formation, whereas DMS-specific HDAC3 overexpression, while not affecting instrumental learning per se, prevented control from transitioning to dominance by the habit system, completely recapitulating the effect found with identical DLS manipulations. The overexpression result, along with data demonstrating HDAC3 enrichment early in training and removal back to baseline with overtraining at the Nr4a1 promoter in the DMS, provides evidence that DMS HDAC3 activity functions normally to repress habit. These results were unexpected in light of the canonical view that the DMS is crucial for action-outcome learning (54, 55) and evidence that DMS lesions force behavioral control by the habit system (7–9). Potential functional differences between the anterior and posterior DMS may help explain these surprising results. The anterior DMS was targeted here based on the locus of systemic HDAC-inhibition-induced increase in histone acetylation, whereas seminal lesion results implicated the posterior DMS in goal-directed learning (8). More recent work, however, has demonstrated both the anterior and posterior DMS to be crucial for the action-outcome learning underlying goal-directed control (9). Previous data suggest that DLS circuits might store habit-related information (56–59), with memories vital for goal-directed control stored in the DMS (60–62). One possibility is that HDAC3 may regulate the formation and storage of habit memories in the DLS, while in the DMS it depotentiates the action-outcome memories underlying goal-directed control as deliberation becomes less required and actions become chunked (63, 64) into stereotyped units.
Indeed, neurons in the DLS show chunking-related activity that strengthens with training, while simultaneously DMS neurons show activity at deliberation points that wanes with extended training (65). It is also possible that HDAC3 regulates DMS indirect-pathway projections, activity in which has been shown to promote habit (66). In either case, the distinct HDAC3 occupancy patterns at learning-related gene promoters in the DLS v. DMS suggest differential transcriptional regulation by HDAC3 in these subregions. Importantly, these results challenge the strict dissociation between DMS and DLS function in goal-directed and habitual control of behavior.
These data reveal a new molecular directive of habit formation. Dorsal striatal HDAC3 functions as a molecular ‘brake’ over habit, being engaged to slow the transition to habit and removed when the conditions are ripe for habits to dominate. Epigenetic dysfunction has been implicated in the etiology of many psychiatric disease states (67–71). The balance between goal-directed and habitual control is also disrupted in these conditions. Indeed, deficits in the acquisition and execution of behavioral habits are symptoms of both Huntington’s (72) and Parkinson’s disease (73–75). An overreliance on habit has been associated with the compulsivity that manifests across a range of psychiatric diseases (6, 76), including obsessive-compulsive disorder (77, 78), autism-spectrum disorder (79), schizophrenia (80, 81), addiction (82), alcoholism (83, 84), and compulsive overeating (85). Moreover, stress, a predisposing condition to many psychiatric illnesses, can lead to abnormal HDAC activity (68, 86), which, given the current results, could underlie the ability of stress to potentiate habits (87–89) and, thereby, promote maladaptive compulsive behavior (90). The current data, therefore, suggest a potential molecular mechanism for such maladaptive behavior (90) and support notions of HDAC3 as a promising target for therapeutic intervention (23, 67, 91–95). These data also highlight the potential for unintended effects on habit learning by HDAC3 therapeutic manipulations.
Supplementary Material
Supplemental Table S1. Quantification of H4K8 acetylation 1 hr following instrumental training/systemic drug treatment. (N=4–6/condition). H4K8Ac normalized to vehicle control mean + SEM. PrL, prelimbic cortex (t7=1.01, P=0.345); IL, infralimbic cortex (t7=0.80, P=0.451); NAc, nucleus accumbens (t9=1.23, P=0.249); CeN, central amygdala (t9=0.21, P=0.836); BLA, basolateral amygdala (t10=0.08, P=0.939); VTA, ventral tegmental area (t6=0.65, P=0.541); SNc, substantia nigra pars compacta (t6=0.57, P=0.581).
Supplemental Figure S1. Effect of HDAC manipulation on behavioral strategy: raw press rates. (A–C) Lever-press rate during devaluation tests for rats given limited (A; Devaluation: F1,20=4.88, P=0.04, Drug: F1,20=18.80, P=0.0003, Drug x Devaluation: F1,20=0.03, P=0.86), intermediate (B; Devaluation: F1,20=3.16, P=0.09, Drug: F1,20=7.84, P=0.01, Drug x Devaluation: F1,20=3.84, P=0.06), or extended (C; Devaluation: F1,16=1.12, P=0.31, Drug: F1,16=1.55, P=0.23, Drug x Devaluation: F1,16=0.01, P=0.90) instrumental training. (data presented as mean + scatter). (D–F) Lever-press rate during devaluation tests for rats given post-training intra-DLS vehicle or RGFP966 infusions (D; Devaluation: F1,17=5.43, P=0.03, Drug: F1,17=0.61, P=0.44, Drug x Devaluation: F1,17=3.08, P=0.10), intra-DLS empty vector (EV) or HDAC3 point mutant (HDAC3pm; E; Devaluation: F1,14=3.03, P=0.07, Virus: F1,14=0.68, P=0.42, Virus x Devaluation: F1,14=4.57, P=0.05), or intra-DLS empty vector or HDAC3 (F; Devaluation: F1,22=2.46, P=0.13, Virus: F1,22=0.15, P=0.70, Virus x Devaluation: F1,22=0.79, P=0.38). (G–I) Lever-press rate during devaluation tests for rats given post-training intra-DMS vehicle or RGFP966 infusions (G; Devaluation: F1,16=1.42, P=0.25, Drug: F1,16=0.13, P=0.72, Drug x Devaluation: F1,16=2.06, P=0.17), intra-DMS empty vector or HDAC3 point mutant (H; Devaluation: F1,21=2.96, P=0.10, Virus: F1,21=0.08, P=0.78, Virus x Devaluation: F1,21=4.63, P=0.04), or intra-DMS empty vector or HDAC3 (I; Devaluation: F1,22=5.78, P=0.03, Virus: F1,22=1.41, P=0.25, Virus x Devaluation: F1,22=3.36, P=0.08). *P<0.05; **P<0.01.
Supplemental Figure S2. Post-training HDAC inhibition impairs sensitivity to lever press→reward omission contingency. Rats (N=7–8/group) were trained on an instrumental task, as described in the main text, whereby lever pressing earned delivery of a food-pellet reward on a RI-30-s schedule of reinforcement. All rats were given intermediate training and were administered NaBut or vehicle immediately after each RI-30-s training session. (A) All rats acquired the instrumental behavior; both vehicle- and NaBut-treated rats increased their lever-press rate across training days (Training day: F4,116=64.87, P<0.0001), with the NaBut group plateauing at a lower rate, compared to vehicle controls (Day x Drug: F4,116=7.02, P<0.0001). (B) After training, goal-directed v. habit action strategy was assessed using a 30-min omission test in which lever pressing no longer earned reward, but rather delayed non-contingent reward delivery (see Methods). Performance in these subjects was compared to controls that received yoked reward delivery with lever presses having no programed consequences (Yoked). No drug was delivered prior to or after this test. Post-training HDAC inhibition reduced sensitivity to this omission contingency compared to vehicle-treated controls (Time x Drug x Omission, F5,135=3.00, P=0.027). While vehicle-treated controls rapidly reduced their lever press behavior in response to the omission contingency (B- right; Omission group: F1,14=4.98, P=0.043; Time: F5,70=7.74, P<0.0001; Omission x Time F5,70=4.53, P=0.001), rats treated with the HDAC inhibitor following each training session failed to show sensitivity to omission (B- left; Omission: F1,13=0.61, P=0.448; Time: F5,65=15.02, P<0.0001; Omission x Time F5,65=0.60, P=0.699). Data presented as mean + s.e.m. *P<0.05; **P<0.01.
Supplemental Figure S3. Post-training HDAC inhibition accelerates habitual control of behavior when trained on random-ratio schedule. Rats (N=10–11/group) were trained on an instrumental task, as described in the main text, whereby lever pressing earned food-pellet rewards on a random-ratio (RR) 10 schedule of reinforcement. (A) All rats were given intermediate training and were administered NaBut or vehicle immediately after each RR-10 training session. All rats acquired the instrumental behavior; both vehicle- and NaBut-treated rats increased their lever-press rate across training days (Training day: F4,64=44.50, P<0.0001). There was no significant difference in training between the vehicle- and NaBut-treated groups (Drug: F1,16=1.65, P=0.217; Day x Drug: F4,64=2.35, P=0.063). (data presented as mean + s.e.m). (B) Following training, rats were given a devaluation test, as described in the main text. Planned comparisons revealed that while control subjects, as expected, reduced their lever pressing following devaluation of the earned reward (t9=2.44, P=0.037) lever pressing in NaBut-treated subjects became insensitive to devaluation (t7=0.39, P=0.710) indicative of habitual control over action. (data presented as mean + scatter). *P<0.05.
Supplemental Figure S4. HDAC inhibition outside post-training consolidation window has no effect. Rats (N=8/group) were trained on an instrumental task, as described in the main text, whereby lever pressing earned food-pellet rewards on a RI-30-s schedule of reinforcement. All rats were given intermediate training and were administered NaBut or vehicle 10 hr after each RI-30-s training session. (A) All rats acquired the instrumental behavior; both vehicle- and NaBut-treated rats increased their lever-press rate across training days (Training day: F4,52=34.50, P<0.0001). There was no significant difference in training between the vehicle- and NaBut-treated groups (Drug: F1,13=0.02, P=0.896; Day x Drug: F4,52=0.36, P=0.831). (data presented as mean + s.e.m). (B) Following training, rats were given a devaluation test as described in the main text. Both groups showed significant sensitivity to devaluation of the earned reward (Devaluation: F1,13=7.23, P=0.019; Drug: F1,13=0.87, P=0.369; Drug x Devaluation: F1,13<0.0001, P=0.994). (data presented as mean + scatter).
Supplemental Figure S5. Representative images of H4K8Ac in whole striatum following post-training HDAC inhibition. Representative immunofluorescent images of acetylation of H4K8 (H4K8Ac) in dorsolateral (DLS) and dorsomedial (DMS) striatum of a vehicle-treated rat (Top panels) or NaBut-treated rat (Bottom panels).
Supplemental Figure S6. Effect of post-training HDAC inhibition following intermediate instrumental training on acetylation at histone H4 in neuronal and non-neuronal cells. Representative immunofluorescent images of acetylation of H4K8 (H4K8Ac) and NeuN double-labeled tissue and quantification (N=4/condition) of H4K8Ac in dorsolateral (DLS) and dorsomedial (DMS) striatum neuronal (NeuN+) and non-neuronal (NeuN-) cells 1 hr following instrumental training/drug treatment. (B) In NeuN+ cells, H4K8Ac was elevated in both DLS and DMS (Drug: F1,12=7.26, P=0.02; Region: F1,12=0.64, P=0.43; Drug x Region: F1,12=0.44, P=0.44). (C) In NeuN- cells, H4K8Ac was elevated in the DLS, but not in the DMS of NaBut-treated rats relative to vehicle controls (Region: F1,12=4.99, P=0.05; Drug: F1,12=8.85, P=0.01; Drug x Region: F1,12=4.99, P=0.05). Data normalized to vehicle control (dashed line). *P<0.05, **P<0.01.
Supplemental Figure S7. Instrumental training induces immediate early gene expression in dorsal striatum and this is unaltered by HDAC inhibition. (A) Schematic representation of experimental procedure. (B–D) mRNA expression of IEGs implicated in memory processes, including instrumental conditioning (17, 18) in the DLS 1 hr following intermediate training (INT) or extended training (EXT) in vehicle-treated rats, or NaBut treatment post-intermediate training, relative to home cage control levels (dashed line). (B) c-fos expression was significantly altered by training/treatment condition (F3,30=4.09, P=0.015) and was specifically elevated following extended training (P<0.01) and intermediate training paired with NaBut (P=0.05) relative to homecage controls. (C) Egr1 was significantly elevated following instrumental training in all groups relative to homecage controls (F3,32=6.33, P=0.002). (D) FosB was not significantly induced relative to homecage controls (F3,30=1.84, P=0.056). In all cases, there were no significant differences between vehicle- and NaBut-treated groups. (data presented as mean + scatter). (E–G) mRNA expression of IEGs in DMS. (E) c-fos expression was unaltered relative to homecage controls, while (F) Egr1 (F3,32=8.47, P=0.003) and (G) FosB (F3,30=3.24, P=0.036) expression was elevated relative to homecage controls. *P<0.05; **P<0.01.
Supplemental Figure S8. Effect of training and post-training HDAC inhibition on Bdnf exon IX expression in the dorsolateral striatum. (A) Schematic representation of experimental procedure. (B–C) mRNA expression of Bdnf9 in the DLS (B; F3,26=2.68, P=0.068) and DMS (C; F3,27=0.22, P=0.884).
Supplemental Figure S9. Representative images of H4K8Ac in whole striatum following dorsolateral-specific striatal HDAC3 manipulations. (A) Representative immunofluorescent images of acetylation of H4K8 (H4K8Ac) from one rat that received unilateral intra-DLS vehicle infusion in one hemisphere and unilateral RGFP966 infusion in the contralateral hemisphere. (B–C) (Left) Representative immunofluorescent images of DLS viral spread (outlined) and (Right) corresponding H4K8Ac in serial sections (outline represents viral expression from adjacent section) from rats expressing the HDAC3 point mutant (B) or overexpressing HDAC3 (C).
Supplemental Figure S10. Overexpression of HDAC3 in the dorsolateral striatum prevents habit formation with extensive overtraining. A subset of the subjects for which HDAC3 was overexpressed in the DLS were given further extended instrumental training on the RI-30-s schedule of reinforcement. (A) All rats responded similarly on the instrumental behavior over the extended overtraining period (Training day: F12,108=45.59, P<0.0001; Virus: F1,9=1.74, P=0.220; Day x Virus: F12,108=0.728, P=0.721). (data presented as mean + s.e.m). (B) After this extended overtraining, rats were given a second devaluation test, as described in the main text. Planned comparisons revealed that whereas, as expected, control subjects (N=7) showed evidence of habits (insensitivity to outcome devaluation; t9=1.32, P=0.22), subjects with HDAC3 overexpressed (N=4) in the DLS were unable to form habits and continued to show sensitivity to devaluation (t9=2.17, P=0.058). (data presented as mean + scatter).
Supplemental Figure S11. Representative images of H4K8Ac in whole striatum following dorsomedial-specific striatal HDAC3 manipulations. (A) Representative immunofluorescent images of acetylation of H4K8 (H4K8Ac) from one rat that received unilateral intra-DMS vehicle infusion in one hemisphere and unilateral RGFP966 infusion in the contralateral hemisphere. (B–C) (Left) Representative immunofluorescent images of DMS viral spread (outlined) and (Right) corresponding H4K8Ac in serial sections (outline represents viral expression from adjacent section) from rats expressing the HDAC3 point mutant (B) or overexpressing HDAC3 (C).
Supplemental Figure S12. Overexpression of HDAC3 in the dorsomedial striatum prevents habit formation with extensive overtraining. A subset of the subjects for which HDAC3 was overexpressed in the DMS were given further extended instrumental training on the RI-30-s schedule of reinforcement. (A) All rats responded similarly on the instrumental behavior over the extended overtraining period (Training day: F12,273=13.97, P<0.0001; Virus: F1,273=0.84, P=0.360; Day x Virus: F12,273=0.49, P=0.920). (data presented as mean + s.e.m). (B) After this extended overtraining, rats were given a second devaluation test, as described in the main text. Planned comparisons revealed that whereas, as expected, control (N=12) subjects showed evidence of habits (insensitivity to outcome devaluation; t9=1.51, P=0.15), subjects with HDAC3 overexpressed (N=11) in the DMS were unable to form habits and continued to show sensitivity to devaluation (t21=4.05, P=0.0006). (data presented as mean + scatter). ***P<0.001.
This research was supported by a Hellman Foundation Fellowship, a UCLA Faculty Career Development award, and grant DA035443 from NIDA to KMW, grants DA025922 and AG051807 to MAW, and start-up funds from the UCLA Life Sciences Division to PJK.
MM and KMW designed the research and analyzed and interpreted the data with assistance from MAW and PJK. MM conducted the research with assistance from VYG and MDM. DPM and MAW prepared and contributed viral constructs and conducted ChIP experiments. MM, NAA, and PJK collected and analyzed qRT-PCR and Western blot data. MM and KMW wrote the manuscript with assistance from MAW and PJK.
The authors declare no biomedical financial interests or potential conflicts of interest.