Philos Trans R Soc Lond B Biol Sci. 2014 Nov 5;369(1655):20130482. doi: 10.1098/rstb.2013.0482

Figure 1.

The two-stage task. (a) At stage 1, subjects choose between A1 and A2, and the outcome can be O1 or O2. They then make another choice (R1 versus R2), which can have a rewarding or neutral result ($ versus X). (b) The outcomes of A1 and A2 are commonly O1 and O2, respectively. On approximately 30% of trials, however, these relationships switch: A1 leads to O2, and A2 leads to O1 (dashed arrows). The values of O1 and O2 depend on the probability of earning a reward after the stage 2 actions, R1 or R2. In the current illustration, actions in O1 are not rewarded and so O1 has a low value, whereas O2 has a high value because an action in O2 is rewarded. Each stage 2 action results in a reward with either a high (0.7) or a low (0.2) probability, independently of the other actions. On each trial, there is a small chance (1 in 7) that these reward probabilities reset randomly to high or low, which causes frequent devaluation/revaluation of the outcomes (O1 and O2) across the session. (c) Both actions in O2 have a low value (left), and on the next trial, the reward probability of one of the actions becomes high (right). On a rare trial, the subject executes A1 and receives O2 instead of O1, and the action in O2 is then rewarded (blue arrows indicate executed actions), which causes offline revaluation of O2. Thus A2 should be taken on subsequent trials to reach O2. (d) O2 has a high value (left), and on the next trial, the action that was previously rewarded in O2 is no longer rewarded (right). On a rare trial, the subject chooses A1 and receives O2 instead of O1, after which the action in O2 is not rewarded. This causes offline devaluation of O2, and on subsequent trials A1 should be chosen so as to avoid O2. (e) The probability of selecting the same stage 1 action on the next trial as a function of whether the previous trial was rewarded and whether its transition was common or rare (mixed-effects logistic regression with all coefficients treated as random effects across subjects; ‘reward’ × ‘transition type’ interaction: coefficient estimate = 0.41, s.e. = 0.11, p < 5 × 10^−4). Based on (c,d), when a rare trial is rewarded, a different stage 1 action should be taken on the next trial, whereas when a rare trial is unrewarded, the same stage 1 action should be taken. This pattern is reversed when the previous trial is common: the same stage 1 action should be taken if it was rewarded, and a different one if it was unrewarded. This stay/switch pattern predicts an interaction between reward and transition type if stage 1 actions are guided by the values of their outcomes, which is consistent with the behavioural results (see [14]). Error bars, 1 s.e.m.
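
The sketch below simulates the generative structure of the two-stage task using only the parameters given in the caption (70%/30% common/rare transitions, stage 2 reward probabilities of 0.7 or 0.2, and a 1-in-7 per-trial chance that these probabilities reset). The agent is a deliberately simple, hypothetical model-based chooser with assumed learning-rate (ALPHA) and exploration (EPSILON) parameters; it is not the authors' model or analysis, and is included only to illustrate why valuing stage 1 actions through their outcomes produces the reward × transition-type crossover in stay probabilities shown in panel (e).

```python
import random
from collections import defaultdict

COMMON_P = 0.7              # P(O1 | A1) = P(O2 | A2): common transition probability
HIGH_P, LOW_P = 0.7, 0.2    # possible stage 2 reward probabilities (from the caption)
RESET_P = 1 / 7             # per-trial chance the reward probabilities reset (caption)
ALPHA = 0.5                 # learning rate for stage 2 value estimates (assumed)
EPSILON = 0.1               # exploration rate at both stages (assumed)


def simulate(n_trials=20_000, seed=0):
    rng = random.Random(seed)
    # True reward probability of each of the two stage 2 actions in each outcome state.
    true_p = {o: [rng.choice([HIGH_P, LOW_P]) for _ in range(2)] for o in ("O1", "O2")}
    # Agent's running estimates of the stage 2 action values.
    q2 = {o: [0.5, 0.5] for o in ("O1", "O2")}

    stays = defaultdict(lambda: [0, 0])   # (prev rewarded, prev common) -> [n stay, n]
    prev = None                           # (stage 1 action, rewarded, common transition)

    for _ in range(n_trials):
        # Stage 1: model-based choice -- pick the action whose *common* transition
        # leads to the outcome state with the higher estimated value.
        if rng.random() < EPSILON:
            a1 = rng.choice(["A1", "A2"])
        else:
            a1 = "A1" if max(q2["O1"]) >= max(q2["O2"]) else "A2"

        # Tally whether this choice repeats the previous trial's stage 1 choice,
        # split by the previous trial's reward and transition type (as in panel e).
        if prev is not None:
            key = (prev[1], prev[2])
            stays[key][0] += int(a1 == prev[0])
            stays[key][1] += 1

        # Transition: common with probability 0.7, rare otherwise.
        common = rng.random() < COMMON_P
        outcome = ("O1" if common else "O2") if a1 == "A1" else ("O2" if common else "O1")

        # Stage 2: choose the higher-valued action (with some exploration), draw the
        # reward, and update the estimate for that state-action pair.
        if rng.random() < EPSILON:
            a2 = rng.randrange(2)
        else:
            a2 = max(range(2), key=lambda a: q2[outcome][a])
        rewarded = rng.random() < true_p[outcome][a2]
        q2[outcome][a2] += ALPHA * (float(rewarded) - q2[outcome][a2])

        # Occasional reset of the reward probabilities, which devalues/revalues O1 and O2.
        if rng.random() < RESET_P:
            true_p = {o: [rng.choice([HIGH_P, LOW_P]) for _ in range(2)]
                      for o in ("O1", "O2")}

        prev = (a1, rewarded, common)

    for (rewarded, common), (n_stay, n) in sorted(stays.items()):
        print(f"previous trial {'rewarded' if rewarded else 'unrewarded':>10}, "
              f"{'common' if common else 'rare':>6} transition: "
              f"stay probability = {n_stay / n:.2f}")


if __name__ == "__main__":
    simulate()
```

Running this prints four stay probabilities, one per reward × transition cell; the crossover they exhibit corresponds qualitatively to panel (e), whereas the paper quantifies the same interaction with a mixed-effects logistic regression on subjects' stay/switch choices.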