Methods for studying habitual behavior in mice

Mark A Rossi; Henry H Yin

doi:10.1002/0471142301.ns0829s60

Curr Protoc Neurosci. Author manuscript; available in PMC: 2013 Jul 1.

Published in final edited form as: Curr Protoc Neurosci. 2012 Jul;CHAPTER:Unit8.29. doi: 10.1002/0471142301.ns0829s60


PMCID: PMC3408711 NIHMSID: NIHMS391597 PMID: 22752897

Abstract

Habit formation refers to the process by which goal-directed behavior becomes automatized and less sensitive to changes in the value of the goal. It has clear relevance for our understanding of skill learning and addiction. Recent studies have begun to reveal the neural substrates underlying this process. This article summarizes what is known about the experimental methods used, and provides a protocol for generating and assessing habit formation in mice.

Introduction

A key experimental finding has shaped modern research on instrumental conditioning. A series of studies beginning in the 1980s showed that, under some conditions, performance (e.g., the rate of lever pressing) is sensitive to manipulations of reward value and of the instrumental contingency. Such behavior has been called “goal-directed actions.” Under other conditions, by contrast, behavior can become less sensitive to these manipulations (Adams, 1982; Adams and Dickinson, 1981; Colwill and Rescorla, 1985; Colwill and Rescorla, 1986; Dickinson, 1985). Such behavior has been called “habitual.” The conditions differ primarily in the extent of training and in the nature of the feedback provided to the animal. With limited training, behavior is more goal-directed, but with extensive training it can become less sensitive to value and contingency manipulations, i.e., habitual. Likewise, with certain feedback functions, for example ratio schedules of reinforcement, instrumental behavior remains goal-directed, whereas with others, e.g., variable interval schedules, behavior can rapidly become habitual (Adams, 1982; Adams and Dickinson, 1981; Dickinson, 1994; Yin and Knowlton, 2006; Yin et al., 2008; Yin et al., 2005b; Yu et al., 2009; Yu et al., 2010).

BASIC PROTOCOL 1: GENERATING HABITUAL OPERANT RESPONDING IN MICE

This protocol is sufficient to produce habitual operant responding in adult mice (>6 weeks old, 20-30 g each). From the start of food restriction, approximately 20 days of training and testing are required to generate and test for habitual lever pressing. For the most consistent data, test mice 7 days per week at approximately the same time each day.

Materials

Mice: e.g., C57BL/6J from Jackson Laboratories or any other mouse of interest (at least 6 weeks of age; 20-30 g each)

Desktop computer with Microsoft Windows operating system

Software

Med Associates behavioral software: Med-PC and Trans-IV

Behavioral program for continuous reinforcement (CRF)

Behavioral programs for Random Interval schedules (RI30 and RI60)

Behavioral program for extinction test


Med-PC to Excel (Med Associates)

Operant chambers (commercially available, e.g., from Med Associates), each housed inside a light-resistant and sound-attenuating box. Each operant chamber is 21.6 cm long × 17.8 cm wide × 12.7 cm high and has two retractable levers, a pellet receptacle between them, and a house light (3 W, 24 V) on the opposite wall.

14 mg pellets (dustless precision pellets for mice, Bio-Serv, USA)

Scale for weighing food and mice

Clean cages without bedding for pre-feeding (one per mouse, in addition to the home cages)

Lithium chloride solution (Sigma Chemical, St. Louis, MO; 0.15 M, 40 mL/kg i.p.) and 0.9% saline (Alternate Protocol only)

1-mL syringes

Pre-experiment setup and considerations

Operant conditioning

Commercially available operant chambers are used (e.g., Med Associates Inc.). Each chamber should be connected to a desktop computer and operated using behavioral software (e.g., Med-PC), and each should be housed within a light-resistant and sound-attenuating box (Figure 1). A typical Med Associates operant chamber measures 21.6 cm long × 17.8 cm wide × 12.7 cm high. Each chamber has a pellet receptacle in the center of one wall, into which food pellets are dispensed via an external pellet dispenser. Retractable levers are located on either side of the receptacle, and a 3 W, 24 V house light is mounted on the opposite wall. Before testing each day, check all parts of each chamber to ensure that they are working correctly (e.g., that the levers and pellet dispenser operate properly and that the receptacle is not clogged). After each test day, clean each chamber by removing any feces and wiping the inside with warm water and alcohol.

Note: If male and female mice are tested in the same operant chambers, wipe the inside walls of each chamber with alcohol between mice and allow the chamber to air out for approximately 5 minutes before starting the next group of mice.


Behavioral programs

Recording lever presses

Each behavioral program can be created using the Trans-IV and Med-PC software, and the minimum duration of a lever press can be defined within the program script. Although the levers can register presses at a faster rate, a press is counted only when the lever is held down for >10 ms and then released for at least 10 ms; this prevents a single response from being counted multiple times.
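The press criterion above can be expressed as a small debouncing routine. This is a hypothetical sketch, not Med-PC code: it assumes the lever switch state is available as 0/1 samples taken every dt_ms milliseconds, and it treats releases shorter than the minimum pause as switch bounce that does not end a press. The function name is illustrative.

```python
from itertools import groupby


def count_presses(samples, dt_ms=1.0, min_press_ms=10.0, min_pause_ms=10.0):
    """Count lever presses in a sampled switch signal (1 = lever down, 0 = up).

    A press is counted when the lever is held down for more than min_press_ms
    and then released for at least min_pause_ms. Releases shorter than
    min_pause_ms are treated as switch bounce, so the closure continues.
    """
    # Collapse the samples into (state, duration in ms) runs.
    runs = [(state, sum(1 for _ in group) * dt_ms) for state, group in groupby(samples)]
    presses = 0
    closure_ms = 0.0  # total time the lever has been down in the current closure
    for state, duration in runs:
        if state:
            closure_ms += duration
        elif duration >= min_pause_ms:
            # A long enough release ends the closure; count it if it was long enough.
            if closure_ms > min_press_ms:
                presses += 1
            closure_ms = 0.0
    return presses


# One valid press, one too-short press, and one press interrupted by a 3 ms bounce.
signal = [0] * 20 + [1] * 15 + [0] * 20 + [1] * 5 + [0] * 20 + [1] * 8 + [0] * 3 + [1] * 8 + [0] * 20
print(count_presses(signal))  # 2
```

A closure that is never followed by a long enough release (for example, at the very end of a recording) is not counted, which matches the "followed by a pause" part of the definition.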

Recording head entries

An infrared photobeam positioned just inside the entrance to the pellet receptacle records each time the mouse enters the receptacle: each beam break is recorded as one receptacle (head) entry. Head entries can be used as a measure of general activity or as an indication that the mouse is retrieving pellets.
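A sketch of how beam-break records might be turned into head entries and pellet-retrieval latencies. It assumes, hypothetically, that the beam state is sampled at a fixed interval and that pellet delivery times are available in seconds. Counting only the transition from unbroken to broken is one reasonable way to implement the one-entry-per-beam-break rule; the function names are illustrative.

```python
import bisect


def entry_onsets(beam_broken, dt_s):
    """Return the times (s) at which the photobeam becomes broken.

    Each transition from unbroken to broken counts as one receptacle entry,
    so a mouse that keeps its head in the receptacle is counted only once.
    """
    onsets = []
    previous = False
    for i, broken in enumerate(beam_broken):
        if broken and not previous:
            onsets.append(i * dt_s)
        previous = bool(broken)
    return onsets


def retrieval_latencies(pellet_times, entry_times):
    """Latency (s) from each pellet delivery to the next entry; None if none follows."""
    entries = sorted(entry_times)
    latencies = []
    for p in pellet_times:
        i = bisect.bisect_left(entries, p)
        latencies.append(entries[i] - p if i < len(entries) else None)
    return latencies


entries = entry_onsets([0, 1, 1, 0, 0, 1, 0, 1], dt_s=0.5)
print(entries)                                        # [0.5, 2.5, 3.5]
print(retrieval_latencies([1.0, 3.0, 4.0], entries))  # [1.5, 0.5, None]
```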

Phase I: Preparing the mice and food restriction

1
Approximately 5 days before lever training begins, handle the mice so that they become accustomed to being manipulated by the experimenter. At this time, weigh and mark each mouse so that individuals can be identified, e.g., by tail marking, toe numbering, or ear tagging.
2
Prior to training, food-restrict the mice and maintain them at approximately 85% of free-feeding body weight. Introduce restriction gradually over the 3 days before the first day of lever training. On each day of testing, adjust the amount of home-cage chow each animal receives according to the number of pellets it earned, so that the total amount of food (by weight) maintains the mouse at 85% of its free-feeding weight. Once training has begun, feed animals once per day after the training session; water should be freely available throughout. To expedite lever-press acquisition, expose mice to the food pellets in their home cage one day before initial lever training.

Note: For lever training and random interval testing, each session can be ended either after a fixed duration or once a maximum number of pellets has been earned. A 60-minute limit or a 50-pellet maximum is recommended.
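The feeding arithmetic described above can be made explicit. This minimal sketch assumes a daily total of 2.0 g of food per mouse (within the 1.5-2.5 g range given in step 6) and 14 mg pellets; the function names are hypothetical, and the daily total should be adjusted for each animal.

```python
PELLET_G = 0.014  # one 14 mg precision pellet (see Materials)


def target_weight_g(free_feeding_g, fraction=0.85):
    """Target body weight during restriction (85% of free-feeding weight)."""
    return free_feeding_g * fraction


def restriction_ok(weight_g, free_feeding_g):
    """False if the mouse is below 85% of free-feeding weight, i.e. restriction should pause."""
    return weight_g >= target_weight_g(free_feeding_g)


def home_cage_chow_g(pellets_earned, total_food_g=2.0):
    """Chow to give after the session so that pellets + chow equal the daily total.

    total_food_g is an assumed per-mouse daily total; 1.5-2.5 g is typical.
    """
    return max(0.0, total_food_g - pellets_earned * PELLET_G)


print(round(target_weight_g(25.0), 2))  # 21.25
print(restriction_ok(20.5, 25.0))       # False
print(round(home_cage_chow_g(50), 2))   # 1.3
```

For example, a mouse that earned 50 pellets (0.7 g) during the session would receive about 1.3 g of chow under the assumed 2.0 g daily total.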

Phase II: Lever training (3-5 days)

3
After mice have been sufficiently food restricted, lever training can begin.
- Important note: Before behavioral testing begins each day, each mouse should be weighed to monitor the weight relative to the baseline, free-feeding body weight. If mice begin to lose too much weight (<85% of free-feeding weight), food restriction should be temporarily halted until the mouse's weight is above 85%.
4
Place one animal in each operant chamber. On the first day of testing, wait approximately 5 minutes after placing the mice in the chambers before starting the behavioral program, so that the mice can adjust to and explore the chamber.
5
Using Med-PC on the desktop computer that controls the operant chambers, start the continuous reinforcement (CRF) program. During CRF training (also known as fixed ratio 1 schedule), each lever press results in the delivery of one pellet into the receptacle. At the start of the session, the house light is illuminated and one lever is inserted into the chamber (Figure 2). The house light remains illuminated and the lever remains inserted and active during the entire session (60 minutes or until 50 pellets have been earned). At the completion of the session, the lever is retracted and the house light is turned off.
Note: To expedite lever-press acquisition, begin the first day of CRF training with 3 pellets in the receptacle and a small amount of crushed pellet powder on the extended lever.
6
Following each training session, return each mouse to its home cage and feed it. Adjust the amount of home-cage chow based on the number of pellets earned during behavioral testing. Typically 1.5-2.5 g of total food per mouse per day is sufficient to maintain weight, but this can be adjusted for the sex, age, weight, and metabolism of the mouse.
Note: If testing more than one group of mice in series, take care to clean the inside of each operant box with water between groups in order to reduce the behavioral effects that may result from a mouse smelling a novel mouse from the previous group.
7
Train mice on CRF for three to five daily sessions or until every mouse has acquired lever pressing (50 presses in one session).

Example of a mouse performing on an operant task.
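The CRF session rules in steps 5 and 7 (one pellet per press; the session ends at 60 minutes or 50 pellets; acquisition is 50 presses in one session) can be summarized as a small simulation. This is an illustrative sketch operating on a hypothetical list of press times in seconds, not the Med-PC program itself.

```python
def crf_session(press_times_s, max_pellets=50, max_duration_s=3600.0):
    """Simulate a CRF (FR1) session from a list of press times in seconds.

    Every press earns one pellet. The session ends after max_duration_s
    or as soon as max_pellets have been earned, whichever comes first.
    Returns (pellet_times, session_end_s).
    """
    pellets = []
    for t in sorted(press_times_s):
        if t >= max_duration_s:
            break
        pellets.append(t)
        if len(pellets) == max_pellets:
            return pellets, t
    return pellets, max_duration_s


def acquired_lever_pressing(pellet_times, criterion=50):
    """Acquisition criterion from step 7: 50 presses (pellets) in one session."""
    return len(pellet_times) >= criterion


presses = [10.0 * i for i in range(1, 61)]  # hypothetical: one press every 10 s
pellets, end = crf_session(presses)
print(len(pellets), end, acquired_lever_pressing(pellets))  # 50 500.0 True
```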

Phase III: Random interval schedule

Random interval schedule (4-8 days)

8
After consistent lever pressing has been acquired for each mouse, begin random interval training using a random interval 30-second schedule (RI30). RI30 training utilizes the same equipment and software as CRF (above) with a modified behavioral program. At the start of each session, the house light is illuminated and the lever is inserted into the chamber as in CRF training. However, on an RI30 schedule, a pellet can only be earned once every 30 seconds, and the point at which the lever is activated varies randomly from trial to trial within a 30-second window. The first press after the lever has been activated results in the delivery of one pellet, and presses before the lever has been activated have no consequences. The session ends with retraction of the lever and the house light being turned off.
9
Training on an RI30 schedule should result in nearly constant lever pressing. Conduct a minimum of three sessions at RI30. If the rate of pressing remains low (similar to that of CRF), continue to train until the pressing rate increases for all mice.
10
After animals have completed at least 2 sessions of an RI30 schedule, train mice on a random interval 60-second schedule (RI60) for a minimum of four 60-minute sessions. Training for more than seven sessions can further increase the likelihood of habitual responding.
11
RI60 schedules are similar to RI30; however, the lever is activated at a random point within a 60-second window rather than a 30-second window. This schedule imposes more variability (uncertainty) in the time at which a pellet can be earned and thus promotes a consistently high rate of pressing. Once mice are sufficiently trained on RI60, they typically press the lever many hundreds of times in a 60-minute session. The increased uncertainty introduced by RI60 training promotes habitual operant responding that is resistant to reward devaluation.
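The scheduling rule in step 8 can be sketched in Python. This is a minimal sketch under stated assumptions, not the Med-PC program used by the authors: the session is divided into consecutive windows of fixed length (30 s for RI30, 60 s for RI60), the lever becomes active at a uniformly random point within each window, and once active it stays active until a press collects the pellet. Commercial RI programs may implement the randomization differently, for example by arming the lever with a fixed probability each second. The names make_arm_times and ri_rewards are hypothetical.

```python
import bisect
import random


def make_arm_times(session_s, window_s, rng):
    """One arming time per window, drawn uniformly within that window (assumption)."""
    n_windows = int(session_s // window_s)
    return [k * window_s + rng.uniform(0, window_s) for k in range(n_windows)]


def ri_rewards(press_times, arm_times):
    """Return the press times that earn a pellet on a random interval schedule.

    A press is rewarded if the lever has become active (armed) since the last
    pellet; presses before that have no consequences. Armings do not stack,
    so at most one pellet is ever waiting to be collected.
    """
    rewards = []
    last_reward = float("-inf")
    for t in sorted(press_times):
        i = bisect.bisect_right(arm_times, t) - 1  # most recent arming at or before t
        if i >= 0 and arm_times[i] > last_reward:
            rewards.append(t)
            last_reward = t
    return rewards


# Worked example with fixed arming times for three 30 s windows (RI30).
print(ri_rewards([5, 12, 13, 40, 50, 70, 85], [10, 35, 80]))  # [12, 40, 85]

# A 60-min RI60 session with random arming times and a press every 2 s.
arms = make_arm_times(3600, 60, random.Random(1))
steady = ri_rewards(list(range(0, 3600, 2)), arms)
```

Because at most one pellet can be earned per window, pressing faster than the arming rate cannot raise the number of pellets above about 60 per hour on RI60; it only shortens the wait between arming and collection.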

Phase IV: Devaluation

Selective satiety (1 day)

12
A reward devaluation procedure is commonly used to test for habitual responding. In such a procedure, the value of the reward (here, the food pellet) is reduced. Habitual behavior is resistant to outcome devaluation, whereas non-habitual behavior is sensitive to it. Thus, a mouse that presses the lever habitually will continue to press even after the value of the reward has been sufficiently reduced, whereas a mouse that is not responding habitually will quickly reduce its rate of pressing. Reward devaluation can be achieved in several ways; a commonly used strategy is selective satiety, in which mice are given free access to the reward (pellets) immediately before testing.
Note: Selective satiety is a good choice for experimenters who plan further training or testing after devaluation, because its effect is easily reversible: it wears off once the mouse is again food restricted, so the same animals can be retrained and tested further (e.g., with omission). Taste aversion induced with lithium chloride, by contrast, is more permanent (see Alternate Protocol).
13
After a minimum of 4 days of RI training, conduct a 5-minute test in extinction (i.e., without reward delivery). Immediately before the extinction test, give mice ad libitum access, for at least 1 hour, to the same pellets they earn during behavioral testing. To do this, place each mouse in an individual empty cage (without bedding) containing 5 g of pellets, and leave the mice undisturbed with free access to the pellets for at least 1 hour.
Important note: Each mouse's level of satiety matters more than the absolute time it is given access to the food, so weigh the remaining pellets to confirm that the mouse ate more than 0.5 g before proceeding to extinction testing.
14
After each mouse is sated, place one animal in each operant box. Begin the five-minute extinction session in which the house light is illuminated and the lever is inserted, but the lever is not active (pressing the lever at any time does not result in delivery of food). No pellets are delivered during extinction testing. The session ends with retraction of the lever. If the lever pressing is habitual, the mouse will continue to press the lever many times (similar to the rate during the RI60 training), even when the reward has been devalued and in the absence of rewards. Following the extinction test, return the mice to their home cages.
Note: If the mice ate less than their daily allotment of food during devaluation (to maintain ~85% body weight), then feed them the appropriate amount. If during devaluation they ate more than the amount required to maintain body weight, do not feed the mice any additional food.
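The two checks in steps 13 and 14, confirming that a mouse ate enough during pre-feeding and measuring its press rate in the 5-minute extinction test, can be sketched as follows. The function names are hypothetical, and press times are assumed to be seconds from the start of the test.

```python
SATIETY_MIN_G = 0.5  # minimum pellet intake during pre-feeding (step 13)


def sufficiently_sated(pellets_before_g, pellets_after_g, min_eaten_g=SATIETY_MIN_G):
    """True if the mouse ate more than min_eaten_g of the pre-feeding pellets."""
    return (pellets_before_g - pellets_after_g) > min_eaten_g


def press_rate_per_min(press_times_s, duration_s=300.0):
    """Lever presses per minute within a session of duration_s (5-min test by default)."""
    n = sum(1 for t in press_times_s if 0 <= t < duration_s)
    return n / (duration_s / 60.0)


print(sufficiently_sated(5.0, 4.2))          # True  (0.8 g eaten)
print(sufficiently_sated(5.0, 4.8))          # False (0.2 g eaten)
print(press_rate_per_min(range(0, 300, 6)))  # 10.0
```

The same rate function, called with duration_s=3600.0, gives the press rate during a 60-minute RI60 session, which can be compared with the extinction rate.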

Phase V: Omission (contingency reversal, optional)

Omission (2 days)

15
After devaluation, the animals can be retrained on the RI60 schedule for 2 more sessions. On the next day, the instrumental contingency is reversed in an omission procedure, which tests the animal's sensitivity to a change in the prevailing causal relationship between lever pressing and food reward. During omission training, a pellet is delivered every 20 s in the absence of lever pressing, but each press resets the timer and thus delays food delivery. The animal must therefore refrain from pressing the lever to receive rewards; the instrumental contingency between action and outcome has been completely reversed. Omission is usually the most effective way to reduce the rate of lever pressing, but habitual performance is more resistant to omission (Yu et al., 2009). One or two 30-minute omission sessions are normally sufficient. Omission is normally conducted after devaluation because experience with the omission contingency can produce substantial learning about the instrumental contingency; if omission is conducted before devaluation, additional days of retraining on the RI schedule will be needed.
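The omission contingency in step 15 can be sketched as a function that computes when pellets would be delivered for a given sequence of presses. This assumes, for illustration, that the 20 s timer starts at the beginning of the session and restarts after every pellet and every press; the function name is hypothetical.

```python
def omission_pellets(press_times_s, session_s, interval_s=20.0):
    """Pellet delivery times (s) under the omission contingency.

    A pellet is delivered whenever interval_s passes without a lever press;
    each press resets the timer and so postpones the next pellet.
    """
    pellets = []
    next_due = interval_s
    for t in sorted(press_times_s):
        # Deliver every pellet that came due before this press.
        while next_due <= t and next_due <= session_s:
            pellets.append(next_due)
            next_due += interval_s
        next_due = t + interval_s  # the press resets the timer
    while next_due <= session_s:
        pellets.append(next_due)
        next_due += interval_s
    return pellets


print(omission_pellets([], 70))        # [20.0, 40.0, 60.0]
print(omission_pellets([15], 70))      # [35.0, 55.0]
print(omission_pellets([15, 30], 69))  # [50.0]
```

A mouse that keeps pressing at least once every 20 s therefore receives no pellets at all, which is why persistent pressing under omission is read as insensitivity to the action-outcome contingency.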

Phase VI: Data transfer

16
After each session, the Med-PC program writes the data (timestamps for each behavioral event) to the hard drive of the desktop computer that controls the operant chambers. The simplest way to analyze these data is to use the Med-PC to Excel software (Med Associates) to transfer them into a Microsoft Excel spreadsheet. The Med-PC to Excel software automatically separates the data from each mouse (based on the start time of the behavioral program and the box number). The behavioral events for each mouse (e.g., lever presses, head entries, reward deliveries) can then be plotted and analyzed as desired. The data can also be imported directly into MATLAB or other analysis programs as needed.
Note: In addition to the behavioral measures recorded directly by the Med Associates operant setup (lever presses, head entries, rewards, etc.), a video tracking system can be used to monitor the behavior of the mice and track the movement of each mouse throughout each behavioral test.
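Once event timestamps have been extracted for a mouse, a common first step is to bin them over the session. This sketch assumes the timestamps have already been exported (for example, via the Excel spreadsheet) into a list of seconds from session start; the Med-PC export layout is not described here, so no parsing is shown, and the function name is hypothetical.

```python
from collections import Counter


def per_minute_counts(event_times_s, session_s=3600.0):
    """Count events (e.g. lever presses) in consecutive 1-min bins of a session."""
    n_bins = int(session_s // 60)
    counts = Counter(int(t // 60) for t in event_times_s if 0 <= t < session_s)
    return [counts.get(i, 0) for i in range(n_bins)]


# Hypothetical press timestamps (s) for one mouse in a 3-min session.
print(per_minute_counts([5, 30, 61, 179], session_s=180))  # [2, 1, 1]
```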

ALTERNATE PROTOCOL: ALTERNATIVE DEVALUATION METHOD USING LITHIUM CHLORIDE INJECTION (3 DAYS)

The food reward can also be devalued by inducing a conditioned taste aversion. This approach takes advantage of the well-established finding that lithium chloride (LiCl) injection produces gastric malaise, an effect similar to food poisoning. Pairing consumption of the reward with LiCl injections therefore induces a taste aversion to the food. Because this devaluation is more permanent than selective satiety by pre-feeding, additional tests such as omission cannot be performed afterwards if this method is used.

Note: The food used should not be one to which the animal has been repeatedly exposed in the past. In general, the more of a given food an animal has consumed, the more difficult it is to induce a taste aversion to that food, a phenomenon known as latent inhibition. This is why taste aversions in humans typically develop to novel rather than highly familiar foods. In the protocol presented here, the animal is assumed to have been exposed to the food pellets only during training; if the animal has had more experience with the pellets, additional injections may be needed to produce a sufficient taste aversion.

Additional Materials

None; please refer to Basic Protocol 1

Alternate Protocol steps and annotations

1
One day after the final training session, place each mouse in a clean cage with no bedding and approximately 2 g of food pellets (identical to those used during acquisition) scattered on the floor. Leave the mice undisturbed for at least 30 min to consume the pellets. Immediately afterwards, remove them from the cages and inject them with either 0.9% saline (“valued” mice) or 0.15 M lithium chloride (“devalued” mice; 40 mL/kg i.p.), then return them to their home cages. Do not give them their daily ration of home-cage chow for the next 4 hours. For the unpaired control condition, “valued” mice receive an injection of 0.15 M lithium chloride (40 mL/kg i.p.) and “devalued” mice receive an equal volume of 0.9% saline 2 hours before exposure to the pellets. This gives all mice equal exposure to LiCl, but only “devalued” mice have LiCl paired with access to the food pellets, whereas “valued” mice receive unpaired injections. Perform this procedure once daily for 3 days. Measure consumption by weighing the food pellets before and after the consumption period; the difference is the amount consumed. The efficacy of devaluation is assessed simply by the amount of pellets consumed each day: pairing with LiCl is expected to rapidly reduce consumption over the 3 days.
2
On the day after the third LiCl injection, conduct a 5-minute extinction test (as described in Basic Protocol 1) without pre-feeding. By this time, pairing LiCl with the food on the three previous days should have reduced the value of the food sufficiently that mice that are not responding habitually will stop pressing the lever, whereas mice that are responding habitually will continue to press.
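The injection volume and the LiCl dose implied by the concentration and volume above can be checked with a few lines of arithmetic. This sketch assumes a molar mass of 42.39 g/mol for anhydrous LiCl; the constant and function names are illustrative.

```python
LICL_MOLAR_MASS_G_PER_MOL = 42.39  # Li (6.94) + Cl (35.45)
LICL_CONC_M = 0.15                 # mol/L
DOSE_VOLUME_ML_PER_KG = 40.0       # i.p. injection volume


def injection_volume_ml(body_weight_g):
    """Injection volume at 40 mL/kg (the same volume is used for saline and LiCl)."""
    return DOSE_VOLUME_ML_PER_KG * body_weight_g / 1000.0


def licl_dose_mg_per_kg():
    """LiCl dose implied by 0.15 M at 40 mL/kg."""
    mmol_per_kg = LICL_CONC_M * DOSE_VOLUME_ML_PER_KG  # mol/L x mL/kg = mmol/kg
    return mmol_per_kg * LICL_MOLAR_MASS_G_PER_MOL     # mmol/kg x g/mol = mg/kg


def consumed_g(pellets_before_g, pellets_after_g):
    """Pellets eaten during the consumption period."""
    return pellets_before_g - pellets_after_g


print(injection_volume_ml(25.0))        # 1.0
print(round(licl_dose_mg_per_kg(), 1))  # 254.3
```

At 40 mL/kg, mice heavier than 25 g require more than 1 mL per injection (e.g., 1.2 mL for a 30 g mouse).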

Omission (contingency reversal, optional)

3
Optional omission testing follows the same procedures as discussed in Basic Protocol 1.


COMMENTARY

Background Information

Assessing habit formation

The first class of experimental manipulations used to assess habit formation, e.g., devaluation, changes how much the animal “wants” the reward. Operationally, this simply means how much of the reward the animal will consume when freely exposed to it. During training, animals are typically food deprived so that they are motivated to press the lever for a food reward; in this deprived state, the value of the reward is high. Once they have learned to press the lever, however, the value of the reward can be changed either by inducing a taste aversion to the food or by pre-feeding the animal with the same reward. Because such manipulations reduce the value of the reward, they are known as “outcome devaluation.” When a sated animal is exposed to the food, or when a taste aversion has been induced, it will consume little or none of it. The concept of value in these manipulations is thus operationally defined and can be conveniently measured.

In the second class of manipulations, the instrumental contingency between the action and the outcome, i.e., the feedback function, can be altered independently after training (Dickinson et al., 1983; Hammond, 1980). The feedback function in instrumental conditioning is usually controlled by the experimenter (Ferster and Skinner, 1957; Skinner, 1938). Feedback functions define what the animal receives (input) as a function of its behavior (output). In instrumental conditioning, the feedback function is set by programming the relationship between lever pressing and reward delivery, the “schedule of reinforcement.” For the animal to acquire lever pressing, some feedback function must make reward delivery contingent upon lever pressing. Once acquired, however, behavior can become less sensitive to changes in the instrumental contingency.
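As a worked illustration of feedback functions (an idealization, not part of the original protocol), let B be the response rate (presses per second) and R the resulting reward rate. Consider a fixed ratio schedule requiring n presses per pellet (CRF corresponds to n = 1), and an interval schedule under the assumptions that the lever arms with a constant probability per unit time, with mean interval T and the timer restarting after each pellet, and that presses occur at random (Poisson) times:

$$R_{\text{ratio}}(B) = \frac{B}{n}, \qquad R_{\text{interval}}(B) = \frac{1}{T + 1/B}$$

The interval expression follows because, after each pellet, the lever takes on average T s to arm, and, because random presses are memoryless, a further 1/B s elapses on average before the next press collects the pellet. Reward rate on the ratio schedule rises in proportion to pressing, whereas on the interval schedule it can never exceed 1/T: with T = 60 s, pressing at 1 press/s yields one pellet per 61 s, and doubling to 2 presses/s only shortens this to one pellet per 60.5 s.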

Critical Parameters and Troubleshooting

In spite of the advances in our understanding of habit formation in the last three decades, the conditions leading to habit formation remain poorly understood (Derusso et al., 2010; Yin et al., 2004; Yin et al., 2005a; Yin et al., 2006). One critical factor in habit formation is the amount of training that each animal is given prior to devaluation testing. In general, it appears that more training leads to more habit formation. Extended training, usually a week of daily sessions on a particular schedule, can generate performance that is impervious to outcome devaluation. But the extent of training is not the only factor contributing to habit formation.

The reinforcement schedule is critical in generating habitual behavior. Studies have shown that variable interval (VI) schedules, of which the random interval (RI) schedules used here are one form, are especially effective at generating habitual behavior (Derusso et al., 2010; Yin et al., 2004). Under VI schedules, the first press after some time interval has elapsed is rewarded, and the interval varies from one reward to the next. Under fixed interval (FI) schedules, by contrast, the interval between a reward delivery and the availability of the next reward is always the same, and FI schedules do not promote habit formation (Derusso et al., 2010). In RI schedules, to control the rate of reward, the animal must press quickly and constantly, because pressing is the only way to discover whether the variable and unpredictable scheduled interval has elapsed. As a result, many presses are not temporally contiguous with the reward. In FI schedules, the predictability of the interval results in a high rate of pressing mainly in the later part of the interval, as the expected time of reward delivery approaches, so most presses are more temporally contiguous with the reward. This difference may explain why the two schedules differ in their ability to generate habitual lever pressing. Ratio schedules, in which more pressing leads to more rewards, are also known to promote goal-directed actions rather than habitual behavior (Dickinson, 1994; Yin et al., 2005a; Yin et al., 2005b). As with FI schedules, actions are more contiguous with rewards under ratio schedules, suggesting that action-outcome contiguity may play a key role in the switch from goal-directed to habitual behavior.
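The contiguity argument above can be made quantitative by measuring, for each press, how long it precedes the next reward. This is an illustrative sketch, not an analysis from the cited studies: the 2 s window that counts as "contiguous" is an arbitrary, hypothetical choice, times are in seconds, and the function names are illustrative.

```python
import bisect


def press_to_reward_delays(press_times, reward_times):
    """For each press, the time until the next reward (0 for the rewarded press).

    Presses after the last reward are ignored.
    """
    rewards = sorted(reward_times)
    delays = []
    for t in sorted(press_times):
        i = bisect.bisect_left(rewards, t)
        if i < len(rewards):
            delays.append(rewards[i] - t)
    return delays


def fraction_contiguous(press_times, reward_times, window_s=2.0):
    """Fraction of presses followed by a reward within window_s."""
    delays = press_to_reward_delays(press_times, reward_times)
    return sum(1 for d in delays if d <= window_s) / len(delays) if delays else 0.0


print(press_to_reward_delays([1, 5, 9], [5, 10]))  # [4, 0, 1]
print(fraction_contiguous([1, 5, 9], [5, 10]))     # 0.6666666666666666
```

Applied to session data, the account above predicts a lower fraction of contiguous presses on RI schedules than on FI or ratio schedules.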

Systematic analysis of habit formation in mice has only just begun in recent years (Hilario et al., 2007; Yu et al., 2009). There are tremendous gaps in our knowledge. For example, we know little about the differences between different strains in their capacity to form habits. But it is hoped that the variety of powerful genetic tools available in mice, and the convenience of the mouse model for operant experiments, will accelerate progress in this field.

Anticipated Results

Once mice have been trained on the RI60 reinforcement schedule, the rate of lever pressing should be higher than the rate of pressing during CRF training. If performance is habitual, the rate of lever pressing during unreinforced devaluation testing (selective satiety or conditioned taste aversion) should not differ from the rate of non-devalued mice (Figure 3). That is, if the operant response is habitual, mice will continue to press the lever even when the food is no longer “valued”.

Data from a devaluation test adapted from a published study (Yu et al., 2009). WT, wild-type mice; KO, A2A adenosine receptor knockout mice.

Habitual behavior is also more resistant to reversal of the instrumental contingency. Thus, during omission testing, in which the contingency between the action (lever press) and the outcome (food delivery) is completely reversed, mice that are responding habitually will tend to show a more gradual reduction in response rate than control mice that are more sensitive to the action-outcome contingency. As shown previously (Hilario et al., 2007; Yu et al., 2009), this procedure can be combined with genetic manipulations to test the contributions of specific genes to habit formation.

Time Considerations

It takes approximately 10-20 consecutive days to generate habitual lever-pressing using a random interval schedule of reinforcement and subsequently test for habit formation in mice. While this protocol has been used successfully to generate habitual responding in C57BL/6J mice, it is unknown whether other strains of mice form habits in the same way. Consequently, the number of days of training and testing at each step (CRF, RI30, RI60, Omission) may not be the same for other mouse lines. It is important, however, that the mice have been given sufficient training at each reinforcement schedule in order to avoid confounding results. For example, if mice do not show elevated rates of pressing on RI schedules compared to CRF, additional sessions of RI training should be conducted.

When devaluation is assessed by selective satiety, the amount of food each mouse consumes matters more than how long it is given free access to the food before operant testing. Mice should be given a minimum of 30 minutes of free access to the food, but because of individual differences in response to the novelty of the feeding cage, time of day, or initial level of satiety, some mice may require additional feeding time to become sufficiently sated (>0.5 g of food consumed).

Acknowledgements

This work was supported by National Institute on Alcohol Abuse and Alcoholism Grants 018018 and 016991 to HHY.

Literature cited

Adams CD. Variations in the sensitivity of instrumental responding to reinforcer devaluation. Quarterly Journal of Experimental Psychology. 1982;33b:109–122.
Adams CD, Dickinson A. Instrumental responding following reinforcer devaluation. Quarterly Journal of Experimental Psychology. 1981;33:109–122.
Colwill RM, Rescorla RA. Postconditioning devaluation of a reinforcer affects instrumental responding. Journal of Experimental Psychology: Animal Behavior Processes. 1985;11:120–132.
Colwill RM, Rescorla RA. Associative structures in instrumental learning. In: Bower G, editor. The Psychology of Learning and Motivation. Vol. 20. Academic Press; New York: 1986. pp. 55–104.
Derusso AL, Fan D, Gupta J, Shelest O, Costa RM, Yin HH. Instrumental uncertainty as a determinant of behavior under interval schedules of reinforcement. Front Integr Neurosci. 2010:4. doi: 10.3389/fnint.2010.00017.
Dickinson A. Actions and habits: the development of behavioural autonomy. Philosophical Transactions of the Royal Society. 1985;B308:67–78.
Dickinson A. Instrumental conditioning. In: Mackintosh NJ, editor. Animal Learning and Cognition. Academic; Orlando: 1994. pp. 45–79.
Dickinson A, Nicholas DJ, Adams CD. The effect of the instrumental training contingency on susceptibility to reinforcer devaluation. Quarterly Journal of Experimental Psychology: Comparative & Physiological Psychology. 1983;35:35–51.
Ferster C, Skinner BF. Schedules of Reinforcement. Appleton Century; New York: 1957.
Hammond LJ. The effect of contingency upon the appetitive conditioning of free-operant behavior. Journal of the Experimental Analysis of Behavior. 1980;34:297–304. doi: 10.1901/jeab.1980.34-297.
Hilario MRF, Clouse E, Yin HH, Costa RM. Endocannabinoid signaling is critical for habit formation. Front Integr Neurosci. 2007;1:6. doi: 10.3389/neuro.07.006.2007.
Skinner BF. The Behavior of Organisms. Appleton-Century-Crofts; New York: 1938.
Yin HH, Knowlton BJ. The role of the basal ganglia in habit formation. Nat Rev Neurosci. 2006;7:464–476. doi: 10.1038/nrn1919.
Yin HH, Knowlton BJ, Balleine BW. Lesions of dorsolateral striatum preserve outcome expectancy but disrupt habit formation in instrumental learning. Eur J Neurosci. 2004;19:181–189. doi: 10.1111/j.1460-9568.2004.03095.x.
Yin HH, Knowlton BJ, Balleine BW. Blockade of NMDA receptors in the dorsomedial striatum prevents action-outcome learning in instrumental conditioning. Eur J Neurosci. 2005a;22:505–512. doi: 10.1111/j.1460-9568.2005.04219.x.
Yin HH, Knowlton BJ, Balleine BW. Inactivation of dorsolateral striatum enhances sensitivity to changes in the action-outcome contingency in instrumental conditioning. Behav Brain Res. 2006;166:189–196. doi: 10.1016/j.bbr.2005.07.012.
Yin HH, Ostlund SB, Balleine BW. Reward-guided learning beyond dopamine in the nucleus accumbens: the integrative functions of cortico-basal ganglia networks. Eur J Neurosci. 2008;28:1437–1448. doi: 10.1111/j.1460-9568.2008.06422.x.
Yin HH, Ostlund SB, Knowlton BJ, Balleine BW. The role of the dorsomedial striatum in instrumental conditioning. Eur J Neurosci. 2005b;22:513–523. doi: 10.1111/j.1460-9568.2005.04218.x.
Yu C, Gupta J, Chen JF, Yin HH. Genetic deletion of A2A adenosine receptors in the striatum selectively impairs habit formation. J Neurosci. 2009;29:15100–15103. doi: 10.1523/JNEUROSCI.4215-09.2009.
Yu C, Gupta J, Yin HH. The role of mediodorsal thalamus in temporal differentiation of reward-guided actions. Front Integr Neurosci. 2010:4. doi: 10.3389/fnint.2010.00014.

