OL task. A, Block structure. The task comprised 288 trials in total, divided into four blocks of 72 trials. Each block contained either experiential or OL trials, as well as choice trials. Block order was interleaved, and bandit values were reversed after block 2. B, Reward structure. Reward was added to the subject's total only in experiential trials, and reward feedback was presented only in learning trials, in both experiential and observational blocks. C, Learning trial structure. Top row, Experiential learning trials. After a fixation cross of jittered duration between 1 and 2 s, subjects viewed a one-armed bandit whose tumbler began to spin after 0.5 s. After a 1 s spinning animation, subjects received outcome feedback, which lasted for 2 s. Bottom row, OL trials. Subjects observed a video of another player experiencing learning trials with the same structure. Critically, outcomes received by the other player were not added to the subject's total. Lower bar, Timing of trial events in seconds. D, Choice trial structure. Subjects chose between the two bandits shown in the learning trials of the current block. After the subject made a choice, the chosen bandit's tumbler spun for 1 s, and no outcome feedback was presented.
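The block layout and learning-trial timing described in this caption can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' task code. The function names, the uniform distribution of the fixation jitter, and the choice of which block type comes first are assumptions; the caption states only the durations, the trial counts, the interleaved order, and the reversal after block 2. The split of each block into learning and choice trials is not given in the caption and is therefore not modeled.

```python
import random

# Durations in seconds, taken from the caption.
FIXATION_RANGE = (1.0, 2.0)  # jittered fixation cross
PRE_SPIN = 0.5               # bandit on screen before the tumbler spins
SPIN = 1.0                   # spinning animation
OUTCOME = 2.0                # outcome feedback

N_BLOCKS = 4
TRIALS_PER_BLOCK = 72


def learning_trial_timeline(rng):
    """Return (event, onset, duration) tuples for one learning trial.

    Assumption: the fixation jitter is drawn uniformly from 1 to 2 s;
    the caption gives only the range, not the distribution.
    """
    fixation = rng.uniform(*FIXATION_RANGE)
    events = [
        ("fixation", fixation),
        ("bandit", PRE_SPIN),
        ("spin", SPIN),
        ("outcome", OUTCOME),
    ]
    timeline = []
    onset = 0.0
    for name, duration in events:
        timeline.append((name, onset, duration))
        onset += duration
    return timeline


def trial_duration(timeline):
    """Total duration: onset of the last event plus its duration."""
    _, onset, duration = timeline[-1]
    return onset + duration


def block_schedule(first="experiential"):
    """Return four interleaved blocks, with bandit values reversed after block 2.

    Assumption: 'first' (which block type opens the session) is hypothetical;
    the caption says only that block order was interleaved.
    """
    other = "observational" if first == "experiential" else "experiential"
    blocks = []
    for i in range(N_BLOCKS):
        blocks.append({
            "block": i + 1,
            "type": first if i % 2 == 0 else other,
            "n_trials": TRIALS_PER_BLOCK,
            "values_reversed": i >= 2,  # reversal applies to blocks 3 and 4
        })
    return blocks
```

Under these assumptions, a single learning trial lasts between 4.5 and 5.5 s, depending on the fixation jitter, and the four blocks sum to the 288 trials stated in panel A.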