Author manuscript; available in PMC: 2020 Aug 7.
Published in final edited form as: Neuron. 2019 Jun 10;103(3):533–545.e5. doi: 10.1016/j.neuron.2019.05.017

Figure 1. Monkeys managed explore-exploit trade-offs.

(A) Structure of an individual trial in the bandit task. (B) Each block of n trials (up to 650) began with the presentation of 3 novel images. This set of visual choice options, s1, was repeatedly presented to the monkey for j trials. After a minimum of 10 and a maximum of 30 trials, one of the existing options was randomly replaced with a novel image. This formed a new set of options, s2, which was presented for j trials. The novel option in this set was randomly assigned its own reward probability from a symmetric distribution, {0.2, 0.5, 0.8}. This process of introducing a novel option was repeated up to 32 times within a block. (C) MRI-guided reconstruction of recording locations in the amygdala (red) and ventral striatum (blue), projected onto views from a macaque brain atlas (Saleem and Logothetis, 2007). (D) Fraction of times the monkeys chose each option type as a function of the number of trials since the introduction of a novel option. (E) Fraction of times the monkeys chose novel options, based on their assigned value. (F) Fraction of times, across all trials, the monkeys chose each option type as a function of the empirical reward value of the best alternative option. (G) Choice RTs based on which option was chosen and the number of trials elapsed since a novel option was introduced.
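
The block structure described in (B) can be illustrated with a short simulation. This is only a minimal sketch, not the authors' task code. The function name `make_block_schedule` and the option-ID scheme are hypothetical. It also assumes that the interval between novel options is drawn uniformly from 10 to 30 trials, that the 3 initial images receive reward probabilities from the same set {0.2, 0.5, 0.8}, and that a block always runs for the full 650 trials. The caption states none of these three points explicitly.

```python
import random

REWARD_PROBS = (0.2, 0.5, 0.8)  # reward probabilities named in the caption
MAX_TRIALS = 650                # upper bound on trials per block
MAX_NOVEL = 32                  # upper bound on novel-option insertions per block
MIN_GAP, MAX_GAP = 10, 30       # trials between novel-option insertions


def make_block_schedule(rng):
    """Return one (trial, option_ids, reward_probs) tuple per trial of a block."""
    next_id = 0
    options = []  # each entry is [option_id, reward_probability]
    for _ in range(3):
        # Assumption: initial options draw from the same probability set.
        options.append([next_id, rng.choice(REWARD_PROBS)])
        next_id += 1

    schedule = []
    novel_count = 0
    # Assumption: the gap is uniform over the integers 10..30 (inclusive).
    trials_until_swap = rng.randint(MIN_GAP, MAX_GAP)

    for trial in range(MAX_TRIALS):
        if trials_until_swap == 0 and novel_count < MAX_NOVEL:
            # Replace one randomly chosen existing option with a novel one.
            slot = rng.randrange(3)
            options[slot] = [next_id, rng.choice(REWARD_PROBS)]
            next_id += 1
            novel_count += 1
            trials_until_swap = rng.randint(MIN_GAP, MAX_GAP)
        ids = tuple(o[0] for o in options)
        probs = tuple(o[1] for o in options)
        schedule.append((trial, ids, probs))
        trials_until_swap -= 1
    return schedule


if __name__ == "__main__":
    block = make_block_schedule(random.Random(0))
    print(len(block), "trials; first set of option IDs:", block[0][1])
```

Each change in the tuple of option IDs between consecutive trials marks the start of a new set (s1, s2, and so on), so the schedule can be used to align simulated choices to the number of trials since a novel option was introduced, as in panels (D) and (G).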