Fig. 2.
Reward expectation during multistep actions measured by anticipatory licking. (A) The anticipatory licking movements for the 800-ms period before the reinforcer beeps (SI Text) in monkey CC are color-coded. (B) The average proportion of trials in which the amplitude of anticipatory licking exceeded the threshold (50% of maximum) is plotted against the time to the reinforcer beeps in the two monkeys. (C) Bar graphs of the normalized licking duration (100–800 ms period before the beeps, mean and SEM; 32 sessions in monkey BT and 75 sessions in monkey CC; SI Text) against trial type. The average reward probability (dashed green line) and the best-fit value function derived from a reinforcement learning algorithm (solid black line, γ = 0.65, R = 0.71, P = 0.29 in monkey BT; γ = 0.66, R = 0.74, P = 0.16 in monkey CC) are superimposed. (D) The parameter space landscape of correlation coefficients between the experimental and simulated licking duration, in which R is plotted against γ. The values of the second derivatives of R are −27 for monkey BT and −6.1 for monkey CC.
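The fit described in (C) and (D) can be illustrated with a minimal sketch. The legend does not give the model details, so this is not the authors' algorithm. It assumes a generic discounted sum of expected future rewards as the value function and scans the discount factor γ for the highest Pearson correlation with a behavioral measure. The function names, the reward schedule, and the "observed" data are all hypothetical.

```python
import numpy as np

def discounted_values(expected_rewards, gamma):
    """Value at each step: expected reward now plus discounted future value.

    Assumed generic form V_i = r_i + gamma * V_{i+1}; not taken from the paper.
    """
    values = np.zeros(len(expected_rewards))
    running = 0.0
    # Walk backward so each step can reuse the value of the step after it.
    for i in range(len(expected_rewards) - 1, -1, -1):
        running = expected_rewards[i] + gamma * running
        values[i] = running
    return values

def scan_gamma(observed, expected_rewards, gammas):
    """Pearson correlation between observed data and model values, per gamma."""
    return np.array([
        np.corrcoef(observed, discounted_values(expected_rewards, g))[0, 1]
        for g in gammas
    ])

# Hypothetical reward probabilities for three successive trial types.
expected_rewards = np.array([0.2, 0.5, 0.8])

# Synthetic "observed" data generated from the model itself (illustration only).
observed = discounted_values(expected_rewards, 0.65)

gammas = np.linspace(0.0, 1.0, 101)
correlations = scan_gamma(observed, expected_rewards, gammas)
best_gamma = gammas[np.argmax(correlations)]
```

Plotting `correlations` against `gammas` gives a one-dimensional analogue of the landscape in (D). With real data the peak correlation would fall below 1, and the sharpness of that peak indicates how tightly γ is constrained.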