Author manuscript; available in PMC: 2019 Dec 20.
Published in final edited form as: Prog Neuropsychopharmacol Biol Psychiatry. 2017 Jun 27;87(Pt A):22–32. doi: 10.1016/j.pnpbp.2017.06.029

Figure 1.

Illustration of behavioral paradigms used to assess whether control of performance is goal-directed or habitual.

A. In instrumental tasks, rodents are trained to perform an action (e.g. a lever press) to obtain a reward. During the test, the instrumental contingency between the response and the outcome can be degraded by non-contingent reward delivery (upper panel). It is also possible to devalue the outcome, either by giving ad libitum access to the reward to generate sensory-specific satiety or by pairing reward consumption with a lithium chloride injection to generate conditioned taste aversion (lower panel). After contingency degradation or outcome devaluation, performance is assessed under extinction. A reduction in responding indicates that performance is goal-directed, whereas persistent responding indicates habitual behavior.

B. In some human instrumental tasks, pictures of fruit (the stimuli) signal which response (a left or right key press) earns points; the points are indicated by a subsequent picture of a fruit inside a box (the outcome). If the wrong response is made, the box is empty. Some outcomes are then devalued (indicated by a red cross on the fruit picture) and will now lead to a subtraction of points. During the test (right panel), subjects are presented with a rapid succession of stimuli and are instructed to make the correct response to stimuli signaling still-valuable outcomes (“go” trials) and to refrain from responding to stimuli associated with devalued outcomes (“no go” trials). Adapted from de Wit et al. (2012).

C. In the sequential 2-stage decision task, a choice between two options at the first step commonly (70% of trials) leads to one pair of second-step options but occasionally (30% of trials) leads to a different pair. At the second step, the selected option is rewarded or not according to variable and unpredictable probabilities (left panel). Pure model-based agents are more likely to repeat the first-step choice (i.e. ‘stay’) following rewarded trials after common transitions, but will switch to the alternative first-step option following rewarded trials after rare transitions (middle panel). In contrast, pure model-free agents will tend to repeat the same first-step choice after rewarded trials, irrespective of the transition that preceded the reward (right panel) (Daw et al., 2011).
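The stay/switch pattern described for the 2-stage task can be illustrated with a minimal Python sketch. It is not the model fitted by Daw et al. (2011): the learning rate `ALPHA`, the starting value `BASELINE`, and all function names are hypothetical, chosen only for illustration. Only the 70%/30% transition structure comes from the caption. Each function returns how much the agent's value for the option it just chose exceeds its value for the alternative after a single trial. The sketch also yields predictions for unrewarded trials, which the caption does not describe; these follow from the sketch's assumptions.

```python
# Illustrative sketch only: ALPHA and BASELINE are hypothetical values,
# not parameters reported in the source. P_COMMON is the 70% common
# transition probability stated in the caption.
P_COMMON = 0.7   # probability of the common transition
ALPHA = 0.5      # assumed learning rate
BASELINE = 0.5   # assumed initial value of every option and state


def model_free_stay_preference(reward, transition):
    """Value of the chosen first-step option minus the unchosen one.

    A pure model-free agent credits the reward directly to the first-step
    choice, so `transition` is deliberately ignored.
    """
    q_chosen = BASELINE + ALPHA * (reward - BASELINE)
    q_unchosen = BASELINE
    return q_chosen - q_unchosen


def model_based_stay_preference(reward, transition):
    """Value of the chosen first-step option minus the unchosen one.

    A pure model-based agent updates the value of the second-step state it
    reached, then recomputes first-step values from the transition model.
    Index 0 is the state commonly reached from the chosen option.
    """
    v = [BASELINE, BASELINE]
    reached = 0 if transition == "common" else 1
    v[reached] += ALPHA * (reward - v[reached])
    q_chosen = P_COMMON * v[0] + (1 - P_COMMON) * v[1]
    q_unchosen = (1 - P_COMMON) * v[0] + P_COMMON * v[1]
    return q_chosen - q_unchosen


def predicted_action(preference):
    """Map a value difference onto the qualitative 'stay'/'switch' label."""
    return "stay" if preference > 0 else "switch"


if __name__ == "__main__":
    for reward in (1, 0):
        for transition in ("common", "rare"):
            mf = predicted_action(model_free_stay_preference(reward, transition))
            mb = predicted_action(model_based_stay_preference(reward, transition))
            print(f"reward={reward} transition={transition}: "
                  f"model-free={mf}, model-based={mb}")
```

Under these assumptions, the model-free prediction depends only on reward, whereas the model-based prediction depends on the interaction between reward and transition type, matching the right and middle panels respectively.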