2020 Jun 10;40(24):4761–4772. doi: 10.1523/JNEUROSCI.2897-19.2020

Figure 2.

Behavior and RL model. A, Accuracy in each session, defined as the fraction of free trials in which the subject chose the bandit with the highest mean payout, discarding the first 25% of trials in each block. Each color represents a different session, shown separately for experiential and observational trials, with the average and standard error indicated on the left and right. Accuracy in experiential and observational trials was not significantly different (p < 0.66, two-sample t test; n.s., not significant). The dashed red line indicates the chance level, estimated as the theoretical 95th percentile of the proportion correct obtained from an agent making random decisions with p = 0.5. B, Typical time course of modeled EVs throughout the task, using the RL (counterfactual) model. Bandit 1 (exp) and Bandit 2 (exp) indicate EVs for the two bandits shown in experiential blocks, whereas Bandit 1 (obs) and Bandit 2 (obs) indicate EVs for the two bandits shown in observational blocks. C, Parameter fits for each valid session for the chosen RL model. The model contained a single learning rate (α) shared across experiential and observational trials and an inverse temperature (β). Dark blue horizontal lines indicate parameter means, and cyan horizontal lines indicate SE.
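
The quantities named in the caption can be illustrated with a minimal Python sketch. It is not the authors' code. The function names, the trial count passed to `chance_threshold`, the delta-rule form of the update, and the two-option softmax are all assumptions made for illustration; the caption states only that the model had one learning rate α and an inverse temperature β, and the counterfactual component of the model is not specified here, so it is omitted.

```python
import math


def chance_threshold(n_trials, p=0.5, q=0.95):
    """Return the q-th quantile of the proportion correct for a random agent.

    Assumes the number of correct choices is Binomial(n_trials, p) and returns
    the smallest k / n_trials whose cumulative probability reaches q.
    """
    cdf = 0.0
    for k in range(n_trials + 1):
        cdf += math.comb(n_trials, k) * p**k * (1 - p) ** (n_trials - k)
        if cdf >= q:
            return k / n_trials
    return 1.0


def update_ev(ev, reward, alpha):
    """Delta-rule update (assumed form): move the EV toward the outcome by alpha."""
    return ev + alpha * (reward - ev)


def softmax_choice_prob(ev_1, ev_2, beta):
    """Probability of choosing bandit 1 under a two-option softmax (assumed form)."""
    return 1.0 / (1.0 + math.exp(-beta * (ev_1 - ev_2)))


# Hypothetical usage: 10 retained free trials, alpha = 0.5, beta = 3.0.
threshold = chance_threshold(10)
ev = update_ev(0.0, 1.0, alpha=0.5)
p_choose_1 = softmax_choice_prob(ev, 0.0, beta=3.0)
```

With more retained trials the threshold moves closer to 0.5, so the position of the dashed line depends on how many free trials remain in a session after the first 25% of each block is discarded.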