(A) Dynamic matching behavior of a monkey during a single experimental session. The continuous blue curve shows the cumulative choices of the red and green targets. Black lines show the average ratio of incomes (red:green) within each block (here, 1:1, 1:3, 3:1, 1:1, 1:6, and 6:1). Matching predicts that the blue and black curves are parallel.
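Within a block, the slope of the cumulative red-versus-green curve equals the ratio of red to green choices. Matching predicts that this slope equals the income ratio. The following is a minimal sketch of that check from trial-by-trial data. It assumes choices are coded 'R'/'G', rewards are 0/1, and each trial carries a block label. The function names are hypothetical.

```python
def cumulative_choices(choices):
    """Running totals (n_red, n_green) after each trial; choices are 'R' or 'G'."""
    n_red = n_green = 0
    path = []
    for c in choices:
        if c == "R":
            n_red += 1
        else:
            n_green += 1
        path.append((n_red, n_green))
    return path


def block_ratios(choices, rewards, block_ids):
    """Per block: (choice ratio red:green, income ratio red:green).

    Matching predicts the two ratios are equal, i.e. the slope of the
    cumulative-choice curve equals the income ratio. A zero denominator
    yields float('inf').
    """
    stats = {}  # block -> [red choices, green choices, red income, green income]
    for c, r, b in zip(choices, rewards, block_ids):
        s = stats.setdefault(b, [0, 0, 0, 0])
        idx = 0 if c == "R" else 1
        s[idx] += 1
        s[2 + idx] += r
    return {
        b: (cr / cg if cg else float("inf"), ir / ig if ig else float("inf"))
        for b, (cr, cg, ir, ig) in stats.items()
    }
```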
(B) Block-wise matching behavior. Each data point represents a block of trials in which the baiting probabilities for each target are held constant. Reward and choice fractions are shown for the red target (those for the green target are one minus the corresponding fractions for the red target). Perfect matching corresponds to data points along the diagonal line. Deviations from matching (undermatching) are apparent: the choice probability is lower than the reward probability when the latter is larger than 0.5.
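Each point in (B) pairs a block's reward fraction with its choice fraction. The sketch below computes both from trial data, under the same hypothetical coding: 'R'/'G' choices, 0/1 rewards, and block labels. The helper `is_undermatching` is also hypothetical. It uses a symmetric definition in which the choice fraction lies closer to 0.5 than the reward fraction. For reward fractions above 0.5, this reduces to the pattern described above.

```python
from collections import defaultdict


def blockwise_fractions(choices, rewards, block_ids):
    """Return {block: (red reward fraction, red choice fraction)}.

    Green fractions are one minus these values. The reward fraction is
    red rewards / all rewards in the block (NaN if no rewards were earned).
    """
    n_trials = defaultdict(int)
    n_red_choices = defaultdict(int)
    n_rewards = defaultdict(int)
    n_red_rewards = defaultdict(int)
    for c, r, b in zip(choices, rewards, block_ids):
        n_trials[b] += 1
        n_rewards[b] += r
        if c == "R":
            n_red_choices[b] += 1
            n_red_rewards[b] += r
    return {
        b: (n_red_rewards[b] / n_rewards[b] if n_rewards[b] else float("nan"),
            n_red_choices[b] / n_trials[b])
        for b in n_trials
    }


def is_undermatching(reward_frac, choice_frac):
    """True if the choice fraction lies closer to 0.5 than the reward fraction."""
    return abs(choice_frac - 0.5) < abs(reward_frac - 0.5)
```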
(C) In a linear-nonlinear model, past rewards are integrated across previous trials with a filter time constant of approximately five to ten trials, yielding estimated values νr and νg for the two targets. Choice probability as a function of νr and νg is modeled as either a softmax rule (left panel) or a fractional rule (middle panel). The monkey's behavioral data are better fitted by the softmax (sigmoid) decision criterion (right panel).
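Below is a minimal sketch of such a linear-nonlinear chooser. Several choices here are assumptions rather than details from the original fit:

- the filter is taken to be exponential and normalized, so each value estimates a per-trial income rate;
- the default `tau = 7.0` simply falls inside the stated five-to-ten-trial range;
- the softmax temperature `sigma` is a free parameter;
- all names are hypothetical.

For two targets, the softmax reduces to a logistic function of the value difference.

```python
import math


def filtered_values(choices, rewards, tau=7.0):
    """Exponentially filtered past income for each target.

    Returns the (v_r, v_g) pair available *before* each trial; tau is in trials.
    """
    decay = math.exp(-1.0 / tau)
    v_r = v_g = 0.0
    history = []
    for c, r in zip(choices, rewards):
        history.append((v_r, v_g))
        # Only the chosen target can deliver income on this trial.
        v_r = decay * v_r + (1 - decay) * (r if c == "R" else 0)
        v_g = decay * v_g + (1 - decay) * (r if c == "G" else 0)
    return history


def p_red_softmax(v_r, v_g, sigma=0.1):
    """Softmax (sigmoid) rule: depends on the value difference."""
    return 1.0 / (1.0 + math.exp(-(v_r - v_g) / sigma))


def p_red_fractional(v_r, v_g):
    """Fractional rule: choice probability equals relative value."""
    total = v_r + v_g
    return 0.5 if total == 0 else v_r / total
```

The two rules differ in what they are sensitive to. The fractional rule depends only on the ratio of values, whereas the softmax depends on their difference scaled by `sigma`.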
(D) When a recurrent neural circuit model endowed with reward-dependent plasticity (Figure 3A) is applied to the foraging task, the average synaptic strength is a linear function of the return from each choice (the reward probability per choice of a target). Red and green data points show the synaptic strengths cA (for the red target) and cB (for the green target), respectively.
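Why can the mean synaptic strength end up linear in the return? The sketch below illustrates one mechanism, assuming a simplified rule. This is not necessarily the exact rule or parameter set of Soltani and Wang (2006). The assumed rule works as follows:

- after a rewarded choice, synapses onto the chosen population are potentiated, c ← c + q(1 − c);
- after an unrewarded choice, they are depressed, c ← c − qc;
- the rate q is the same for both, and the unchosen population is left unchanged.

With equal rates, the update is c ← (1 − q)c + q × reward. This is an exponential average of reward per choice, so its mean equals the return. The names and `q = 0.05` are hypothetical.

```python
import random


def update_strength(c, rewarded, q_plus=0.05, q_minus=0.05):
    """One update of the mean strength of synapses onto the chosen population."""
    if rewarded:
        return c + q_plus * (1.0 - c)   # potentiation toward 1
    return c - q_minus * c              # depression toward 0


def mean_strength(return_per_choice, n_choices=20000, seed=0):
    """Average strength after the transient, for a given reward probability per choice."""
    rng = random.Random(seed)
    c, total, count = 0.5, 0.0, 0
    for i in range(n_choices):
        c = update_strength(c, rng.random() < return_per_choice)
        if i >= n_choices // 2:         # discard the first half as transient
            total += c
            count += 1
    return total / count
```

With unequal rates (q_plus ≠ q_minus), the fixed point becomes q₊R / (q₊R + q₋(1 − R)). This is no longer linear in the return R.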
(E) Graded activity of neurons in the two selective neural populations. The activity of decision neurons shows a graded pattern when single-trial firing rates are sorted and averaged according to the choice and the difference between synaptic strengths. Activity is aligned to the onset of the two targets and is shown separately for trials in which the choice is the preferred (red curves) or nonpreferred (blue curves) target of the neurons. In addition, trials are subdivided into four groups according to the difference between the values encoded by the synaptic strengths onto the two competing neural populations (cA − cB = −0.14 to −0.05 [dashed], −0.05 to 0 [thin], 0 to 0.05 [normal], 0.05 to 0.14 [thick]).
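The sorting-and-averaging procedure can be sketched as follows. The assumptions are:

- each trial is given as (whether the neurons' preferred target was chosen, cA − cB, a firing-rate trace already aligned to target onset);
- all traces have equal length;
- which side of each group edge is inclusive is decided arbitrarily here;
- the names are hypothetical.

```python
from collections import defaultdict

# Group edges for cA - cB from the caption; lo <= delta < hi is an assumption.
GROUPS = [
    ("dashed", -0.14, -0.05),
    ("thin", -0.05, 0.0),
    ("normal", 0.0, 0.05),
    ("thick", 0.05, 0.14),
]


def group_of(delta):
    for name, lo, hi in GROUPS:
        if lo <= delta < hi:
            return name
    return None  # outside the plotted range


def graded_averages(trials):
    """trials: iterable of (chose_preferred, cA - cB, rate_trace).

    Returns {(chose_preferred, group): mean trace}, averaged time bin by time bin.
    """
    sums, counts = {}, defaultdict(int)
    for chose_pref, delta, trace in trials:
        g = group_of(delta)
        if g is None:
            continue
        key = (chose_pref, g)
        if key not in sums:
            sums[key] = [0.0] * len(trace)
        sums[key] = [s + x for s, x in zip(sums[key], trace)]
        counts[key] += 1
    return {k: [s / counts[k] for s in v] for k, v in sums.items()}
```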
(F) For one session of the model simulation of the foraging experiment, the cumulative number of choices of target A is plotted against that of target B (blue). The black straight lines show the baiting probability ratio in each block. The same baiting probability ratios as in the monkey's experiment (A) are used.
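The sketch below simulates a baited two-target schedule with the block ratios listed in (A). On each trial, an unbaited target becomes baited with its baiting probability. A bait persists until that target is chosen, at which point the reward is collected.

The following are hypothetical, and other task details are omitted:

- the total baiting probability per trial (`total_bait = 0.3`);
- the block length of 100 trials;
- the names.

The chooser is a stand-in leaky-integrator softmax agent. It is not the recurrent circuit model of Soltani and Wang (2006).

```python
import math
import random

# Red:green baiting ratios per block, as listed for (A).
BLOCK_RATIOS = [(1, 1), (1, 3), (3, 1), (1, 1), (1, 6), (6, 1)]


def run_session(choose, total_bait=0.3, trials_per_block=100, seed=1):
    """Simulate a baited foraging session; returns [(choice, reward, block), ...]."""
    rng = random.Random(seed)
    baited = {"R": False, "G": False}
    history = []
    for b, (a, g) in enumerate(BLOCK_RATIOS):
        p = {"R": total_bait * a / (a + g), "G": total_bait * g / (a + g)}
        for _ in range(trials_per_block):
            for t in baited:  # a bait, once set, stays until collected
                baited[t] = baited[t] or rng.random() < p[t]
            c = choose(history, rng)
            reward = int(baited[c])
            baited[c] = False
            history.append((c, reward, b))
    return history


def make_leaky_softmax_agent(tau=7.0, sigma=0.1):
    """Stand-in chooser: exponentially filtered income plus a softmax rule."""
    decay = math.exp(-1.0 / tau)
    v = {"R": 0.0, "G": 0.0}

    def choose(history, rng):
        if history:  # fold in the outcome of the previous trial
            c, r, _ = history[-1]
            for t in v:
                v[t] = decay * v[t] + (1 - decay) * (r if t == c else 0)
        p_red = 1.0 / (1.0 + math.exp(-(v["R"] - v["G"]) / sigma))
        return "R" if rng.random() < p_red else "G"

    return choose
```

Cumulative red and green counts from the returned history give a curve like the blue trace in (F).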
(G) Each point shows the block-wise choice fraction as a function of the block-wise reward fraction for a block of trials in which the baiting probabilities are held constant. The model reproduces both the matching behavior and the undermatching phenomenon.
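One simple way to summarize undermatching across blocks is the ordinary least-squares slope of choice fraction against reward fraction. This summary is assumed here and is not taken from the original studies. Perfect matching gives slope 1 and intercept 0. A slope below 1, with the fitted line crossing the diagonal near 0.5, indicates undermatching. The function name is hypothetical.

```python
def matching_slope(points):
    """OLS fit of choice fraction on reward fraction.

    points: [(reward_fraction, choice_fraction), ...], one per block.
    Returns (slope, intercept).
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    slope = sxy / sxx
    return slope, my - slope * mx
```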
(A) is reproduced with permission from Sugrue et al. (2004), (B) and (C) from Corrado et al. (2005), and (D)–(G) from Soltani and Wang (2006).