a, Subjects fixated on a cue to initiate each trial. They were presented with one (forced choice) or two (free choice) pictures. After indicating their choice with a lever, they received the corresponding juice amount with the associated reward probability. b, Subjects learned 16 pictures associated with different juice amounts and reward probabilities. c, Subjects were more likely to choose pictures associated with larger and more probable rewards. Pictures are arranged as in b; color indicates the percentage of free-choice trials on which each picture was chosen when available. d, To train the direction decoder, we split the pictures into four equal-sized groups from the lowest expected value (value bin = 1) to the highest (value bin = 4). Groupings were the same for both subjects and are shown in the same arrangement as in b. e, Choice response times on free (green) and forced (gray) trials. f, The effects of maximum value and value difference on response times. To visualize the unique contribution of each predictor, response times were modeled as a linear function of one predictor, and the residuals (mean ± s.e.m.) were plotted as a function of the other predictor.
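
The value binning in d and the residual analysis in f can be illustrated with a minimal sketch. It is not the authors' analysis code, and it rests on several assumptions: expected value is taken as juice amount times reward probability, ties in expected value are broken by input order, and the linear model is an ordinary least-squares fit with an intercept. The function names `value_bins` and `residuals_after` are hypothetical.

```python
import numpy as np


def value_bins(amounts, probabilities, n_bins=4):
    """Assign each picture to an equal-sized bin by expected value (1 = lowest).

    Assumption: expected value = amount * probability; ties are broken
    by input order (stable sort), which the caption does not specify.
    """
    ev = np.asarray(amounts, dtype=float) * np.asarray(probabilities, dtype=float)
    order = np.argsort(ev, kind="stable")        # picture indices, lowest EV first
    bins = np.empty(len(ev), dtype=int)
    # Rank r (0-based) maps to bin r * n_bins // n + 1.
    bins[order] = np.arange(len(ev)) * n_bins // len(ev) + 1
    return bins


def residuals_after(response_times, predictor):
    """Residuals of response times after a least-squares linear fit on one predictor.

    Plotting these residuals against the *other* predictor shows the
    variance that the first predictor leaves unexplained.
    """
    rt = np.asarray(response_times, dtype=float)
    x = np.asarray(predictor, dtype=float)
    slope, intercept = np.polyfit(x, rt, 1)      # degree-1 fit: rt ~ slope * x + intercept
    return rt - (slope * x + intercept)
```

For panel f, one would call `residuals_after(rt, max_value)` and then average the result within each level of value difference, and vice versa.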