Significance
We examine how the brain guides active sensing in awake, behaving primates using a paradigm in which information sampling is dissociated from reinforcement variables, such as cumulative future reward or reward prediction errors. We show that target selective cells in lateral intraparietal cortex encode decision variables based on expected gains in instrumental information—the extent to which a visual cue, when discriminated, is expected to reduce the uncertainty of a subsequent action.
Keywords: information sampling, saccades, attention, decisions, reward
Abstract
In natural behavior, animals have access to multiple sources of information, but only a few of these sources are relevant for learning and actions. Beyond choosing an appropriate action, making good decisions entails the ability to choose the relevant information, but fundamental questions remain about the brain’s information sampling policies. Recent studies described the neural correlates of seeking information about a reward, but it remains unknown whether, and how, neurons encode choices of instrumental information, in contexts in which the information guides subsequent actions. Here we show that parietal cortical neurons involved in oculomotor decisions encode, before an information sampling saccade, the reduction in uncertainty that the saccade is expected to bring for a subsequent action. These responses were distinct from the neurons’ visual and saccadic modulations and from signals of expected reward or reward prediction errors. Therefore, even in an instrumental context when information and reward gains are closely correlated, individual cells encode decision variables that are based on informational factors and can guide the active sampling of action-relevant cues.
In natural behavior, animals have access to multiple sources of information, but few of these sources are relevant for learning or action. Making good decisions therefore entails not only selecting the ultimate action but, more fundamentally, deciding which source of information to sample. Decisions about information sampling are central for tasks as diverse as making a medical diagnosis (which is the best test to prescribe?), making categorization decisions (which is the most informative feature?) (1, 2), and guiding skilled actions (what should I keep my eyes on while driving?). Despite the ubiquity and significance of active sampling mechanisms, few studies have been devoted to understanding these mechanisms and their importance for decision theories. Evidence accumulation has been extensively examined in decision research (3) but has been portrayed as a passive process, in the sense that decision makers rely on predetermined (experimenter-selected) sources of information but cannot themselves determine which source to consult to guide a future action.
Recent studies begin to shed light on this question by showing that animals (including pigeons, monkeys, and humans) prefer to observe cues that are predictive rather than nonpredictive about a future reward, and that the value of informative cues is encoded in the orbital frontal cortex and midbrain dopamine (DA) cells (4, 5). These investigations, however, have been limited to noninstrumental contexts in which animals seek to obtain information about a reward merely in order to know, but cannot act based on the information. Very little is known about the much more common scenario in which animals sample instrumental information to make decisions and guide future actions (6).
Understanding instrumental sampling poses two important challenges beyond those arising in noninstrumental contexts. First, because instrumental sampling is, by definition, coordinated with actions, understanding it requires developing sequential paradigms in which an individual first decides which information to sample, and then decides which action to take based on that information. Sequential tasks of this kind have been used in human observers (e.g., refs. 2 and 7–10) but are largely eschewed in work with experimental animals where decision-making is studied with single-step tasks (11) (see ref. 12 for a notable exception).
Second, in an instrumental context, a more reliable cue, by definition, supports a better choice of actions, strongly correlating gains in reward with gains in information. Behavioral and computational studies suggest that a full explanation of instrumental sampling behaviors requires considering not only rewards but also bona fide informational factors (2, 7–9, 13). However, we lack behavioral paradigms that can be used with animals and can clearly dissociate neuronal signals of instrumental information from those of expected reward gains, or furnish an understanding of neural signals related to these factors.
Here we address this question by examining the activity of target selective cells in the monkey lateral intraparietal area (LIP), a cortical area implicated in the top-down control of attention and gaze (3, 14). LIP neurons are particularly appropriate for this investigation, because they integrate multiple factors relevant for deciding where to allocate attention and gaze, and are sensitive to the expected reward values of alternative gaze locations (11, 15–22).
To determine whether LIP target-selective cells are also sensitive to expected gains in information, we used a two-step decision task in which the monkeys made a first rapid eye movement (saccade) to sample information from a visual cue, and a second saccade to select an action based on that information. We focus on the responses before the first information sampling saccade, and show that they encode the gains in information that the saccade was expected to bring, in a manner that is independent of the cumulative future rewards and reward prediction errors (RPEs) associated with the informative cues. Thus, parietal cortical neurons involved in active sensing decisions encode bona fide information-based decision variables that can guide the judicious sampling of action-relevant cues.
Results
Two monkeys (Macaca mulatta) were trained on a task in which they made two saccades on each trial—a first saccade to obtain information from a visual cue, and a second saccade to report a decision based on that information (Fig. 1A). In each trial, after achieving central fixation, the monkeys had a 500-ms delay period during which they viewed a visual display containing two cues (round apertures containing small dots) and two targets (white squares; Fig. 1A, first and second panels). At the end of this period, the fixation point disappeared, and the monkeys had to make a saccade to one of the cues. When the monkey’s gaze arrived on the cue, the dots inside the chosen cue began to move with 100% coherence toward one of the targets, indicating which target was most likely to deliver a reward (Fig. 1A, fourth panel, black arrows). After motion onset, the monkeys were free to indicate their final decision by making a second saccade to a target, and the trial ended with a reward or lack of reward (see below).
Fig. 1.
Free-choice task and behavior. (A) Each trial began when the monkeys achieved fixation of a central spot (small black circle) placing the RF of an LIP cell (dashed circle) on an eccentric screen location. (A representative RF in the right hemifield is shown for illustrative purposes, but it was not visible to the monkey during the experiment.) Two targets were then presented outside the RF (white squares) followed by two cues, of which, one was inside the RF and the other was at the diametrically opposite location (round apertures containing small dots). The monkeys viewed the display for a 500-ms delay period, after which the fixation point disappeared, and the monkeys made a first saccade to one of the cues (freely chosen; third panel, red arrow). At the end of this saccade, the chosen cue delivered its information in the form of 100% coherent dot motion directed toward one of the targets (fourth panel, black arrows). After motion onset, the monkeys were free to indicate their final decision by making a second saccade to a target (fourth panel, red arrows), and the trial ended with a probabilistic reward [p(R)]. (B) The fraction of choices of the optimal cue at the first saccade step, as a function of the difference in validity of the available cues, for M1 and M2. The x axis shows the difference in validity for the three possible pairs of unequal validity cues, and each point shows the mean and SEM across all recording sessions. (C) The rates of reward as a function of the validity of the chosen cue, in the same format as B.
Our interest was in the neuronal activity related to the decision for the first saccade, which was made during the initial period of central fixation, and expressed the monkeys’ choice of which cue to sample (Fig. 1A, delay, second panel). Note that, during this delay period, the cues did not yet deliver motion information, as the dots inside both apertures were stationary. However, the monkeys were informed about cue validity by means of a colored border surrounding each cue. Validity was defined as the probability that the motion delivered by a cue would correctly specify the rewarded final action, and could take values of 100%, 80%, or 55% (with the color-validity mapping held constant for each monkey but randomized across monkeys; Methods). Therefore, validity is a measure of how informative each cue was likely to be for the choice of the subsequent action, and is mathematically equivalent to two widely used measures of expected information gains (EIG)—Shannon information and the maximum a posteriori (MAP) estimate (Eqs. S1 and S2)—and we use “validity” and “EIG” interchangeably in the paper. The task design, therefore, separated neural activity related to the choice of an informative item (based on peripheral validity information before the first saccade) from activity associated with the sensory discrimination (based on foveal motion information after the saccade).
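The relation between validity and the two EIG measures can be illustrated with a minimal sketch. Eqs. S1 and S2 are not reproduced in this section, so the code below uses the textbook forms under two stated assumptions: the two targets are equally likely a priori, and a cue of validity v leaves a posterior of (v, 1 − v) over the targets. The function names are hypothetical.

```python
import math

def binary_entropy(p: float) -> float:
    """Entropy, in bits, of a two-outcome distribution (p, 1 - p)."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def shannon_eig(validity: float) -> float:
    """Expected reduction in uncertainty (bits) about the rewarded target.

    Assumes a uniform prior over two targets (1 bit of prior uncertainty)
    and a posterior of (validity, 1 - validity) after viewing the cue.
    """
    return binary_entropy(0.5) - binary_entropy(validity)

def map_accuracy(validity: float) -> float:
    """Probability that the most probable target, given the cue, is rewarded."""
    return max(validity, 1.0 - validity)

for v in (0.55, 0.80, 1.00):
    print(f"validity {v:.2f}: Shannon EIG {shannon_eig(v):.3f} bits, "
          f"MAP accuracy {map_accuracy(v):.2f}")
```

Under these assumptions, the three validity levels used in the task correspond to roughly 0.007, 0.278, and 1 bit, so both measures increase monotonically with validity over the 50 to 100% range, which is the sense in which the three quantities order the cues identically.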
When presented with free-choice trials containing two unequal-validity cues, both monkeys chose to inspect the higher-validity cue with >90% probability, suggesting that they used a validity-based sampling policy (each monkey, P < 0.0001 relative to 0.5; Fig. 1B). After selecting a cue, the monkeys tended to obey its instruction and choose the target that was congruent with the motion direction. Consequently, the reward rates approached the maximum expected rates in the task and, as expected in an instrumental context, scaled with the validity of the chosen cue (Fig. 1C, two-way ANOVA, P < 10⁻⁹ for the main effect of validity, P < 10⁻⁸ for monkey, and P = 0.004 for monkey × validity interaction).
LIP Neurons Encode EIG.
To examine the encoding of cue validity at the level of individual cells, we identified individual target-selective LIP neurons (Methods) and customized the display so that, during the initial delay period, one of the cues fell inside the receptive field (RF) of the recorded cell while the other stimuli were outside the RF, allowing us to capture activity related to the first saccade decision. Note that, after the delay period, the RF of the cell moved away from the visual display by virtue of the first saccade, and we could no longer observe meaningful responses to the second saccade or motion discrimination.
During the delay period before the first saccade, LIP neurons (n = 50) had robust visual responses to the onset of the cues, followed by sustained saccade-related responses that were higher if the saccade was directed inside versus opposite the RF (Fig. 2A; two traces in each panel). However, the directional response—the difference in firing for the two directions—was not constant but scaled as a function of the relative validity of the available cues (Fig. 2A, compare across the three panels), suggesting that the neurons multiplex visual and saccade-related responses with a sensitivity to validity/EIG.
Fig. 2.
Quantitative analysis of EIG effects on two-cue trials. (A) Population responses on two-cue trials in which the alternative cues had (Left) large, (Center) medium, and (Right) low differences in validity, sorted according to saccade direction (toward or opposite the RF). Throughout the paper, we use gray, green, and blue to represent, respectively, 100%, 80%, and 55% valid cues (although, in practice, the colors differed by monkey). Traces show mean and SEM of activity across all of the cells (n = 50). Firing rates were z-scored for each cell using the mean and SD across all correct trials (Methods), and can be compared across all trial groups. (B) Time-resolved regression coefficients (sliding window of 50-ms width, 1-ms step) estimating the effects of the validity of the RF cue (Val InRF), the validity of the opposite RF cue (Val OppRF), and saccade direction (Sac Dir), velocity, latency, and accuracy across the trials shown in A. (C) Cumulative distribution of coefficients for individual cells in 200-ms bins spanning the early and late parts of the delay period.
To quantitatively characterize the EIG response, we regressed each neuron’s firing rates as a function of the validity of the RF cue, the validity of the opposite RF cue, saccade direction, saccade latency, saccade velocity, and saccade accuracy (Eq. S3). The validity of the RF cue had a prominent effect in the early part of the delay period (Fig. 2 B and C, 100 ms to 300 ms after cue onset), with an average coefficient of 1.51 ± 0.2 [P < 0.0001, corresponding to 12.08 spikes per second (sp/s) in terms of raw firing rates]. During this interval, the validity of the RF cue had a much stronger effect than the validity of the opposite cue (orange; −0.36 ± 0.11, P = 10⁻⁴) or saccade direction (green; 0.28 ± 0.17, P = 10⁻⁶) and a much higher prevalence among individual cells (with 72% showing significant encoding of the validity of the RF cue, compared with only 16% for the opposite RF cue, 10% for saccade direction, and 0%, 0%, and 2%, respectively, for saccade velocity, latency, and accuracy; Fig. 2C, Left). During the later delay period (300 ms to 500 ms), the average effect of saccade direction increased and became comparable to that of cue validity (direction, 1.08 ± 0.24 vs. validity, 0.85 ± 0.13, P = 0.68), consistent with the fact that LIP neurons reflect oculomotor planning late in a decision epoch (3). However, the validity response remained highly prevalent, with 84% of cells showing significant sensitivity to the validity of the RF cue, compared with 48% for saccade direction.
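As an illustration of this kind of analysis, the following sketch fits a six-regressor linear model to z-scored firing rates from one hypothetical cell in a single analysis window. Eq. S3 is not shown in this section, so the regressor coding is an assumption (validity rescaled so that 50% maps to 0 and 100% maps to 1), and all data are synthetic; the sketch is not the authors' analysis code.

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials = 600

def rescale(validity):
    """Map validity from the 50-100% range onto 0-1 (an assumed coding)."""
    return (validity - 0.5) / 0.5

# Synthetic trial table; every value here is made up for illustration.
levels = np.array([0.55, 0.80, 1.00])
val_in_rf = rescale(rng.choice(levels, n_trials))
val_opp_rf = rescale(rng.choice(levels, n_trials))
sac_dir = rng.integers(0, 2, n_trials).astype(float)  # 1 = saccade toward the RF
latency, velocity, accuracy = rng.standard_normal((3, n_trials))

# Simulated firing rate (sp/s) in one window; saccade metrics have no true effect.
rate = (20 + 12 * val_in_rf - 3 * val_opp_rf + 2 * sac_dir
        + rng.normal(0, 5, n_trials))
z_rate = (rate - rate.mean()) / rate.std()  # z-score across trials

names = ["intercept", "val_in_rf", "val_opp_rf", "sac_dir",
         "latency", "velocity", "accuracy"]
X = np.column_stack([np.ones(n_trials), val_in_rf, val_opp_rf, sac_dir,
                     latency, velocity, accuracy])

# Ordinary least squares: one coefficient per regressor, in z-score units.
beta, *_ = np.linalg.lstsq(X, z_rate, rcond=None)
coef = dict(zip(names, beta))
for name in names:
    print(f"{name:>10s}: {coef[name]:+.3f}")
```

Repeating the fit in a window that slides along the delay period would give time-resolved coefficients of the kind plotted in Fig. 2B.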
Further analysis verified that EIG modulations were robust in a subset of the two-cue trials selected so as to differ only in the validity of the RF cue but be matched for saccade direction and the validity of the opposite cue (Fig. S1). Finally, model comparisons using Akaike and Bayesian Information Criteria (23) showed that, among the set of all possible models based on the regressors of interest, models that contained a term for the validity of the RF cue provided a superior fit (Fig. S2). Together, these findings suggest that LIP neurons robustly encoded the EIG of an RF cue in combination with, but independently of, their previously described responses to visual onsets, saccade planning, and competitive normalization by non-RF cues.
Fig. S1.
Validity effects in subsets of trials equated for saccade direction and the validity of the opposite cue (refers to Fig. 2). To further verify the independence of EIG responses from saccade planning, we selected subsets of two-cue trials that differed only in the validity of the RF cue (55% vs. 80%) but were identical in terms of the saccade direction (away from the RF) and the validity of the opposite-RF cue (100%). (A) Average population responses for the two trial types (mean and SEM across 50 cells) show that the cells responded more for 80% relative to 55% RF cues. (B) Regression analysis (identical to Eq. S3 except for the absence of terms for saccade direction and the validity of the opposite cue, which did not vary in these trials) confirmed that the cells did not modulate as a function of saccade latency, velocity, or accuracy, but had a significant modulation with the validity of the RF cue (P < 10⁻¹⁰ relative to 0 during the interval 125 ms to 250 ms after cue onset). (C) The results were consistent in individual cells. The time-resolved regression coefficients for each parameter are shown for individual cells. (D) Saccade latencies, velocities, and accuracies did not differ as a function of cue validity. Trials were pooled across all recording sessions after z-scoring within individual sessions. The EIG of the RF cue (blue vs. green) had negligible effects on saccade metrics. Paired comparisons using the Mann–Whitney U test yielded P = 0.38 for saccade latency, P = 0.15 for velocity, and P = 0.86 for accuracy. Note that, by z-scoring and pooling, we minimized intersession noise and maximized trial numbers (n = 1,676 trials), resulting in high statistical power and very low likelihood of a type II error.
Fig. S2.
Model comparison (refers to Fig. 2B). For each cell, we computed 63 regression models (corresponding to all possible combinations of the six regressors used in Fig. 2B) in the (A–D) early delay period (100 ms to 300 ms) and (E–H) late delay period (300 ms to 500 ms) and evaluated each model using the sample-corrected Akaike Information Criterion (AICc) and the Bayesian Information Criterion (BIC). AICc and BIC measure the goodness of fit of a model using different philosophies. AICc assumes that reality is of much higher dimension than any model, and is thus biased toward overfitting the data (allowing too many free parameters). BIC, in contrast, assumes that one of the available models is a “true” best fit, and is relatively biased toward underfitting the data (rejecting an additional parameter that has explanatory power). Because it is not a priori clear which bias is more appropriate, we chose a standard strategy of computing both metrics and testing for disagreements between them. The two tests agreed in assigning the best performance (lowest scores) to models that included a term for the validity of the RF cue. (A and E) Mean and SEM of AICc and BIC values for subgroups of models that included each one of the six regressors (along with all possible combinations of the remaining regressors) arranged in order of model quality (lower values are better). AICc and BIC values were normalized for each cell by subtracting the value of the cell’s best model, and the points show the mean and SEM of the normalized values (“Excess”) across cells and models in a given class. (B and F) The fraction of models (pooled across cells) for which the coefficient of each regressor was significant at P < 0.05, 0.01, and 0.001. (C and D) Comparison matrices of the AICc and BIC values in A, showing the likelihood that the differences between any pair of values arose by chance. Red indicates values that are significantly different at P < 0.05.
(G and H) Same as C and D, for the data in E, analyzing 300 ms to 500 ms after cue onset. In the early delay period, models that included a term for the validity of the RF cue outperformed all other models, including those that included terms for saccade direction and the validity of the opposite cue, whether evaluated with the AICc or the BIC (A−D). In the late delay period, models that included the validity of the RF cue were only slightly worse than those including saccade direction (E−H). In both epochs, over 90% of the coefficients for the validity of the RF cue were significant at P < 0.05 (B and F). For all of the analyses, equivalent results were obtained when we reanalyzed the data using median rather than mean AICc and BIC values.
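The model comparison described in this legend can be sketched as follows. The code enumerates all 63 non-empty subsets of six regressors, fits each by ordinary least squares, and scores it with the standard least-squares forms of AICc and BIC. The parameter count (coefficients plus one noise variance), the regressor names, and the synthetic data are assumptions for illustration and are not taken from the authors' methods.

```python
import itertools
import numpy as np

def aicc_bic(rss, n, k):
    """AICc and BIC for a least-squares fit with Gaussian errors."""
    fit_term = n * np.log(rss / n)  # -2 log-likelihood up to a constant
    aic = fit_term + 2 * k
    aicc = aic + 2 * k * (k + 1) / (n - k - 1)  # small-sample correction
    bic = fit_term + k * np.log(n)
    return aicc, bic

def compare_models(regressors, y):
    """Score every non-empty subset of regressors: {subset: (AICc, BIC)}."""
    n = len(y)
    scores = {}
    for size in range(1, len(regressors) + 1):
        for subset in itertools.combinations(regressors, size):
            X = np.column_stack([np.ones(n)] + [regressors[name] for name in subset])
            beta, *_ = np.linalg.lstsq(X, y, rcond=None)
            rss = float(np.sum((y - X @ beta) ** 2))
            k = X.shape[1] + 1  # coefficients plus the noise variance (assumed)
            scores[subset] = aicc_bic(rss, n, k)
    return scores

# Synthetic data for one hypothetical cell: only two regressors matter.
rng = np.random.default_rng(1)
n_trials = 300
names = ["val_in_rf", "val_opp_rf", "sac_dir", "velocity", "latency", "accuracy"]
regressors = {name: rng.standard_normal(n_trials) for name in names}
y = (1.5 * regressors["val_in_rf"] - 0.4 * regressors["val_opp_rf"]
     + rng.standard_normal(n_trials))

scores = compare_models(regressors, y)
best_aicc = min(scores, key=lambda s: scores[s][0])
best_bic = min(scores, key=lambda s: scores[s][1])

# "Excess" = score minus the best model's score, then averaged over all
# models that contain a given regressor (lower is better).
excess_aicc = {s: v[0] - scores[best_aicc][0] for s, v in scores.items()}
mean_excess = {name: float(np.mean([e for s, e in excess_aicc.items() if name in s]))
               for name in names}
print(len(scores), best_aicc, best_bic)
print(mean_excess)
```

In this toy example both criteria select a model containing the RF-cue validity term, and the class of models containing that term has the lowest mean excess, which mirrors the logic of panels A and E without reproducing the reported values.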
EIG Is Encoded Independently of Expected Reward.
Behavioral evidence suggests that informational factors influence gaze allocation during sensorimotor tasks, supporting the idea that the validity responses we find encode EIG (24). However, an alternative possibility is that these responses reflect reward expectation, because validity in an instrumental context is, by definition, correlated with the reward probability of subsequent actions (Fig. 1C). To distinguish between these possibilities, we compared the responses of a sample of LIP cells (n = 69, including the 50 cells described in Fig. 2) to two types of RF stimuli that were equated for reward expectation and differed in whether or not they brought decision information.
We tested the neurons in two modified versions of the original task, in which the stimulus appearing in the RF was, respectively, informative or uninformative for the final action. The informative condition was identical to the original two-cue task, with the exception that a single informative cue appeared inside the RF, and monkeys were forced to sample the information conveyed by this cue (Fig. 3A; forced choice, informative). The “uninformative” condition was equivalent in all respects, except that the monkeys received action and reward information from a precue, and the RF cue delivered no additional information (Fig. 3B). Each trial in this condition began with the presentation of a precue that was located opposite the RF, had a colored border indicating validity, and contained coherent motion toward one of the targets, thus signaling the trial’s final target and reward probability (Fig. 3B, first panel). After the precue was extinguished, a second uninformative stimulus appeared in the RF, and, after a 500-ms delay, the monkeys made a first saccade to this stimulus, viewed random (0% coherence) motion inside it, and made a second saccade to a target.
Fig. 3.
The informative/uninformative stimulus test. (A) Trial stages in the informative and uninformative task. The informative task was identical to the cue choice task except that a single cue appeared in the RF, forcing the monkeys to complete the trial based on this cue. (B) In the uninformative condition, a precue containing moving dots appeared opposite the RF simultaneous with target onset, and conveyed both the reward probability of the trial (by virtue of its colored border) and the instruction about the final action (through the dot motion; first panel). The precue then disappeared and was replaced by an uninformative stimulus inside the RF (second panel). After an additional 500-ms delay period, the monkeys were required to make a saccade to the RF stimulus (third panel) before making their final saccade to a target (fourth panel). Note that, although the uninformative stimulus delivered no information (but only random, 0% coherence motion), a saccade to this stimulus was still valuable because it was necessary to obtain the reward. As depicted by the cartoon at the bottom of B, we used two types of uninformative stimuli, which were distinguished by the color of their borders and were each paired with a constant informative precue to establish long-term reward associations. Throughout the paper, we depict the uninformative stimulus that followed a 55% valid precue in red and the stimulus that followed an 80% valid precue in cyan (although in practice the colors were randomized for each monkey). (C) (Left) A standard model-free RL simulation (SI Methods) predicts that cue value scales as a function of validity, and that this scaling is identical for the uninformative stimuli. Note that the model predictions refer to the relative modulations across the different cues, but the absolute Q values are arbitrary. (Right) LIP responses to the informative and uninformative items (mean and SEM of the traces shown in Fig. 4A, averaged across 125 ms to 250 ms after cue onset). (**P < 0.001; two-way ANOVA with pairwise post hoc comparisons; n.s., P = 0.6). Although the model predicts identical value scaling for informative and uninformative items, LIP neurons show much stronger scaling for the informative cues.
In the uninformative condition, therefore, the first saccade to the RF cue was equated to that in the informative task in its direction, timing, and reward expectation, and differed only in that it was not expected to bring decision information. The equivalence in reward expectation was ensured by the fact that, in the uninformative condition, the saccade to the RF item was necessary to harvest the reward, and reward probability was governed by the validity of the initial precue. To ensure that reward associations were consistent over longer time scales, we consistently paired each type of uninformative item with a specific informative precue (e.g., a red border uninformative item always followed a 55% valid precue and hence had a ∼55% long-term reward expectancy, whereas a cyan border uninformative item always followed an 80% valid precue, and thus had an ∼80% long-term expectancy; Fig. 3B). Note that, although monkey 1 (M1) performed the task with all three pairs of yoked informative–uninformative cues, monkey 2 (M2) was limited to pairs with a 55% and 80% precue (Methods). We therefore focus our analysis on these two trial types, which were tested in both monkeys, and separately analyze the three-cue data from M1.
To confirm that the informative and uninformative items had equivalent reward value, we simulated a reinforcement learning (RL) model that used the same state transitions and reward contingencies as the behavioral task, and learnt the values of each state by trial and error using a temporal difference algorithm (SI Methods). As shown in Fig. 3C (Left), the model predicts that the value of the first saccade will depend on cue validity and that this dependence will be identical for informative and uninformative items (Fig. 3C, two-way ANOVA, P < 10⁻¹⁰ for cue validity, P > 0.8 for task type and interaction). This prediction follows from the fundamental structure of model-free RL algorithms, in which the final reward of an action sequence confers value to all previous steps, subject to temporal discounting but impervious to informational factors (25, 26), and confirms that our task design appropriately equated the reward values of informative and uninformative items.
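The logic of this prediction can be illustrated with a minimal temporal difference sketch. This is a generic TD(0) state-value learner, not the model described in SI Methods: the state chains, the learning rate, the discount factor, and the assumption that the simulated animal always follows the cue (so that reward probability equals validity) are all illustrative choices.

```python
import random

def td_values(p_reward, n_states_before_rf, episodes=20000,
              alpha=0.002, gamma=0.9, seed=0):
    """TD(0) state values for a fixed chain ending in a probabilistic reward.

    Chain: [optional precue states] -> RF stimulus -> target -> outcome.
    The reward (1 or 0) is delivered on the transition out of the target state.
    """
    rng = random.Random(seed)
    n_states = n_states_before_rf + 2  # ..., RF stimulus, target
    V = [0.0] * n_states
    for _ in range(episodes):
        reward = 1.0 if rng.random() < p_reward else 0.0
        for s in range(n_states):
            is_last = s == n_states - 1
            r = reward if is_last else 0.0
            v_next = 0.0 if is_last else V[s + 1]
            V[s] += alpha * (r + gamma * v_next - V[s])  # TD(0) update
    return V

# Informative task: the RF cue is the first state (no precue).
# Uninformative task: one precue state precedes the RF stimulus.
values = {}
for task, n_before in (("informative", 0), ("uninformative", 1)):
    for i, validity in enumerate((0.55, 0.80)):
        V = td_values(validity, n_before, seed=10 * n_before + i)
        values[(task, validity)] = V[n_before]  # value of the RF-stimulus state

for key, v in values.items():
    print(key, round(v, 3))
```

Because the learner sees only state transitions and rewards, the value of the RF-stimulus state converges toward the discounted reward probability in both tasks (up to sampling noise), regardless of whether that stimulus carries any information about the final action.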
Contrary to model predictions, LIP neurons discriminated only the validity of informative cues but not the reward associations of uninformative items (Fig. 3C, “Data”). A two-way ANOVA revealed significant effects of validity (P < 0.001) and validity × task interaction (P < 0.001) and, in post hoc comparisons, a significant effect of validity for informative cues but not for uninformative items (P < 0.001 vs. P = 0.6; average responses during the period of peak modulation, 125 ms to 250 ms after cue onset).
A time-resolved regression analysis (Fig. 4 B and C and Eq. S4) yielded average regression coefficients of 3.4 ± 0.57 z-score (SD) units for informative cues vs. 0.81 ± 0.39 z-score units for uninformative stimuli (P < 0.0003), a result that was robust for each monkey (M1, 3.4 ± 0.65 vs. 0.75 ± 0.44, P = 0.0013; M2, 3.57 ± 0.9 vs. 1.2 ± 0.6, P = 0.048). This difference was consistent at the level of individual cells, where more than twice as many neurons showed significant effects of validity as showed encoding of reward expectation (32/69 vs. 15/69 cells; z = 3.05, P < 0.002, test of proportions; Fig. 4B, colormaps, and Fig. 4C). We further verified that the sensitivity to cue type in the informative condition was independent of saccade metrics (Fig. S3), was replicated when reanalyzed in terms of the Shannon EIG (SEIG) (Fig. S3), and was replicated in M1 using the three yoked pairs of informative and uninformative cues (Fig. S4).
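The reported test of proportions can be checked with a short calculation. The sketch below assumes a pooled two-proportion z test with a one-sided P value; the text does not state which variant was used, but the pooled form reproduces the reported z = 3.05 for 32/69 vs. 15/69 cells. The function name is hypothetical.

```python
import math

def two_proportion_z(k1, n1, k2, n2):
    """Pooled two-proportion z statistic and one-sided P value (upper tail)."""
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_one_sided = 0.5 * math.erfc(z / math.sqrt(2))  # 1 - Phi(z)
    return z, p_one_sided

# Cells with significant validity effects: informative vs. uninformative.
z, p = two_proportion_z(32, 69, 15, 69)
print(f"z = {z:.2f}, one-sided P = {p:.4f}")
```

Under these assumptions the calculation gives z ≈ 3.05 with a one-sided P below 0.002, in line with the values quoted above.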
Fig. 4.
LIP neurons encode validity but not the cumulative future rewards of uninformative cues. (A) (Left) Average firing rates (n = 69 cells) for 55% and 80% valid cues, and (Right) their yoked uninformative stimuli. To highlight the cue-related modulation, firing rates were z-scored after subtracting the average activity for each stimulus class (we use the term “Excess” to indicate mean subtraction). Error bars show SEM across cells. (B) Average regression coefficients for the validity/reward responses in (Left) informative and (Right) uninformative trials for (Top) each monkey and (Bottom) each cell (colormaps). Note that the regression coefficients estimate the size of the neural effects across the entire validity range (50 to 100%) and are thus nearly twice as large as the difference in responses between the 80% and 55% cues, which span only half of this range. (C) Cell-by-cell comparison of validity and reward effects. Each point shows the average validity/reward coefficients of one cell (125 ms to 250 ms), color coded according to its significance along the x and y axes. In the marginal histograms, significant points are indicated by darker colors. Dotted lines show sample means. Note that all of the cells were used to compute the marginal histograms and the means indicated by the dotted lines, but one outlier that had coefficients of 25.2 for informative cues and 11.4 for uninformative cues was left out of the plot for clarity of presentation (this cell came from M1 and can be seen on row 44 of B as showing very high modulation for informative cues in the analysis epoch). Recomputing the statistics without this outlier did not change the results (both monkeys, average and SEM for informative vs. uninformative cues: 3.1 ± 0.48 vs. 0.65 ± 0.36, P = 0.00035; M1, 3.0 ± 0.54 vs. 0.57 ± 0.41, P = 0.0015; both monkeys, z test of proportions for the incidence of significant cells, z = 3.09, P = 0.001).
Fig. S3.
Neurons encode EIG and validity but not saccade metrics in the one-cue task (refers to Fig. 4). The same analysis as in Fig. S1, applied to one-cue informative trials in which a single informative cue was in the RF (same cells that are analyzed in Fig. 4). The regressions were coded in terms of (Left) SEIG (Eq. S2) and (Right) validity. (A) The average coefficient value (mean and SEM across cells) and (B) the results for individual cells. EIG/validity modulations were robust in a majority of cells, but no neuron showed effects of saccade metrics.
Fig. S4.
Neurons modulate for informative but not uninformative cues when using three levels of EIG (refers to Fig. 4). The same analysis as in Fig. 4, applied to the data from M1, who was tested with all three informative cues and corresponding uninformative cues. The average coefficients for informative and uninformative cues were, respectively, 2.75 ± 0.48 vs. 0.85 ± 0.27 z-score units/validity, P = 0.0001, with 53% vs. 23% of cells showing significant modulations, z = 3.37, and P = 0.0007 for test of proportions. (A–C) Same conventions as Fig. 4. (A) Average firing rates. (B) Average regression coefficients. (C) Cell-by-cell regression coefficients in 125–250-ms window.
Note that, because the theoretical predictions pertain to the relative rather than absolute response magnitudes, our analysis so far has focused on the mean-subtracted neuronal activity, which factors out the visual onset response and reveals the relative responses to the different levels of validity/reward expectation (Fig. 4A). Examination of the full (not mean-subtracted) responses showed that the validity modulations rode on top of a visual onset response that was strong for both informative and uninformative cues and could be factored out by linear regression (Fig. S5A and Eq. S4). Finally, analysis of the raw (not z-scored) responses showed that the effects were very robust, with average regression coefficients of 24.15 ± 0.57 sp/s for informative cues vs. 4.68 ± 1.2 sp/s for uninformative stimuli (P = 0.000008; Fig. S6 A and B).
Fig. S5.
Full responses to the uninformative and cue-change tasks (refers to Figs. 4 and 7). (A) (Top) The average population responses to the informative and uninformative cues that are compared in Fig. 4, before mean subtraction. The neurons had a visual response to the onset of both cue types, and the validity modulations (differences between the two cue types) rode on top of this response. This finding is the expected one in LIP neurons, which multiplex visual, saccade-related, and decision activity in their firing rates (14) and implies that validity is not encoded in terms of absolute firing rates but must be read out using more complex decoding mechanisms (34). (Bottom) Regression coefficients associated with the visual response (β0 in Eq. S4, gray) and the validity/reward modulations (β1 in Eq. S4, brown). (B) (Top) Same as A, but showing the responses to informative cues vs. cue-change trials that are analyzed in Fig. 7. (Bottom) Regression coefficients associated with the visual response (β0 in Eq. S8, gray) and the validity/reward modulations (β1 in Eq. S8, brown). (C) Reorienting effects cannot explain the validity encoding. Comparison of the visual onset responses shows that the overall visual responses were slightly higher for an uninformative stimulus relative to an informative cue in the RF (gray traces in Top, Right vs. Left in A), and also slightly higher for cue-change relative to informative trials (gray traces in Top, Right vs. Left in B), reflecting the reorienting of attention from the opposite-RF precue to the new stimulus in the RF (27). To determine whether this effect may have affected validity coding, we quantified the visual enhancement for each cell using a reorienting index (RI), , where FRfirst is the z-scored visual response to an RF stimulus when that stimulus was the first one that appeared on a trial (i.e., the informative cue in the one-cue task). 
FRsecond is the visual response to the RF stimulus when that was the second stimulus in the task (i.e., the uninformative stimulus in the uninformative task, or the second cue in the cue-change task). The RI was computed in the epoch 100 ms to 250 ms after cue onset to capture the peak enhancement. The scatterplot shows the cell-by-cell RI and validity modulation for the cells tested in the one-cue task (black points; n = 69; Fig. 4) and for those tested on the cue-change task (magenta, n = 24, described in Fig. 7). RIs were, on average, positive and showed a marginally significant trend for visual enhancement (average RI = 0.09, P = 0.069). Critically, there was no correlation with encoding of validity in either the uninformative or the cue-change test.
Fig. S6.
Validity encoding is robust in nonnormalized firing rates (refers to Fig. 4). The same analysis as in Fig. 4, applied to raw firing rates before z-scoring or mean subtraction. Validity effects remain robust, with average regression coefficients of 24.15 ± 0.57 sp/s for informative cues vs. 4.68 ± 1.2 sp/s for uninformative stimuli (P = 0.000008) and 42% vs. 18% of cells showing significant modulations (z = 2.96, P = 0.001 for test of proportions). Note that, in this analysis, the β0 coefficient (gray traces) tracks the baseline rate and raw visual response and hence is tonically elevated even before cue onset, whereas the validity β1 coefficient tracks the relative modulation by cue type and is 0 before cue onset. (A) Average firing rates and average regression coefficients. (B) Cell-by-cell regression coefficients in 125–250-ms window.
Although the above results suggest that LIP neurons encode EIG, it is important to rule out confounds based on spurious differences between the informative and uninformative tasks. An important concern is that, although our tasks nominally equated reward expectations, variations in the monkeys’ performance may have influenced the rewards that were de facto experienced for the different cues. The latencies of the monkeys’ saccades to uninformative items scaled inversely as a function of expected reward (P < 0.01 in each monkey), confirming that the monkeys were sensitive to expected reward even when making a saccade to an uninformative item. However, the reward rate differences between the 55% and 80% cues tended to be slightly higher for informative cues relative to uninformative items (across all sessions—M1: informative, 53.1 ± 0.7% vs. 76.3 ± 0.08%; uninformative: 54.7 ± 1.1% vs. 73.6 ± 1.0%, two-way ANOVA, P < 10−59 for validity, P = 0.53 for condition and P = 0.008 for interaction; M2 informative, 58.1 ± 2.6 vs. 77.3 ± 2.2%; uninformative, 51.6 ± 2.0% vs. 61.1 ± 1.5%, two-way ANOVA, P < 10−4 for validity, P < 10−2 for condition and P = 0.045 for interaction). Two observations show that this performance difference could not explain the neuronal results. First, as documented above (Fig. 4B), the neuronal validity effects were similar in the two monkeys despite their different performance. Second, we found no correlation between the experienced reward differential and the EIG modulations in individual cells in either monkey for informative cues (Spearman rank coefficient—M1, r = −0.1, P = 0.44; M2, r = 0.16, P = 0.68) and, most importantly, for uninformative items (Fig. 5A; Spearman rank coefficient—entire sample r = −0.04, P = 0.76; M1, r = −0.07, P = 0.61; M2, r = 0.34, P = 0.39). Finally, additional regression analyses that incorporated terms for trial by trial reward history (Eqs. 
S5 and S6) showed that the effects of prior trial rewards were minimal and limited to the baseline period before cue onset (Fig. 5B), ruling out artifacts related to the experienced reward rates or reward history.
Fig. 5.
Selective encoding of validity cannot be explained by task-related confounds. (A) Reward differentials. Each point represents one neuron. The y axis shows the reward coefficient on uninformative trials, and the x axis shows the difference in the fractional rewards obtained for the 80% versus the 55% uninformative stimuli during the recording of that neuron. The extent to which different neurons modulate for uninformative cues is not correlated with the difference in the average rewards experienced for those cues across or within monkeys. (B) Regression coefficients (mean and SEM) dissociating the effects of validity/reward and prior trial rewards (Eq. S5). During the 125- to 250-ms epoch of peak validity modulation, fewer than 10% of cells were significantly modulated by the prior trial reward (seven and four, respectively, in the informative and uninformative conditions); this result was confirmed if we restricted the history regressor to only include trials of the same cue type (Eq. S6) or added terms to capture the effects of up to three previous trials. (C) Distributions of motion viewing times, showing longer times in the informative condition for both monkeys. In the uninformative condition, M1 shows a prominent peak at 100 ms, the minimum viewing time imposed by the experiment. (D) Regression coefficients (mean and SEM) showing that the presaccadic responses encode the validity of informative cues but not the postsaccadic viewing times (Eq. S7).
An additional concern is that variations in the monkeys’ postsaccadic motion viewing times may have affected trial length and hence temporal discounting. However, viewing times were considerably longer for informative relative to uninformative cues in both monkeys (Fig. 5C; each monkey, P < 10−28), a difference that is consistent with the informativeness of the different cues but contrary to an explanation in terms of temporal discounting (according to which we should see weaker neural modulations for informative cues, due to the greater discounting associated with these cues). On a trial by trial basis, presaccadic firing rates were not sensitive to postsaccadic viewing times, further arguing against confounds related to the postsaccadic viewing or motion discrimination (Fig. 5D and Eq. S7).
Finally, we considered potential effects of task geometry, related to the fact that, in the uninformative condition, attention had to be reoriented toward the RF after having been engaged by a precue at the opposite location (Fig. 3B). As shown in Fig. S5A, reorienting was associated with a slight enhancement of the visual response in our sample of cells (compare gray traces for informative and uninformative items), consistent with previous results (27), but the magnitude of this enhancement was not correlated with the neurons’ validity/reward modulations (Fig. S5C, black dots).
Together, these findings rule out spurious explanations related to the complexities of double-step tasks, and suggest that LIP neurons encode EIG independently of reward expectations.
EIG Is Distinct from Reward Prediction Errors.
Although, in EIG Is Encoded Independently, we considered an explanation in terms of cumulative future rewards—the payoffs that an agent can expect to receive after taking an action—an additional question is whether our findings can be explained in terms of RPE, defined as a change in reward expectation relative to a prior state (28). Even though previous investigations have not tested whether LIP neurons are sensitive to RPEs, such a sensitivity could potentially explain the neurons’ lack of modulation for uninformative cues. In the informative condition, the monkeys began each trial with a prior reward expectation of ∼78% (based on the average reward rates across the different validity cues), and the appearance of an informative cue signaled an increase (for 100% validity), no change (80% validity), or decrease (55% validity) in reward probability relative to this prior expectation. However, the appearance of an uninformative item did not alter prior beliefs (Fig. 3B), potentially explaining the lack of neuronal modulation. To examine whether the cells respond to RPE, we conducted an additional cue-change test in which a first informative cue established the monkey’s initial reward expectations and a second cue modified these expectations, producing RPEs that were independent of cue validity.
The majority of trials in the cue-change task were identical to the one-cue condition of the standard task (Fig. 3A), in that the monkeys received a single cue opposite the cell’s RF and completed the trial based on the information they sampled in this cue. In the remaining, critical 25% of trials, the initial cue disappeared before the first saccade and was replaced with another informative cue inside the RF; the monkeys viewed the second cue for an additional 500 ms during central fixation, and completed the trial by harvesting information from this cue (Fig. 6A). In these cue-change trials, therefore, monkeys would form a reward expectation based on the validity of the initial cue, and change these expectations, producing an RPE, based on the second cue. Consider, for instance, the subset of trials in which the initial cue had 100%, 80%, or 55% validity, and was followed by a second cue of 80% validity (Fig. 6A). Even though this second cue had constant validity, it signaled different RPEs according to the validity of the initial cue (i.e., −20%, 0%, and +25% after initial cues of, respectively, 100%, 80%, and 55% validity). RL model simulations (SI Methods) verified that the RPEs in the cue-change task had equivalent magnitudes to those associated with the informative cues (Fig. 6B). Therefore, if LIP neurons encoded RPE, they should show significant modulation in response to the 80% cue, which should be equivalent to the response modulations across the informative cues.
Fig. 6.
Validity responses cannot be explained by RPE. (A) Trial stages for the cue-change trials. All conventions are as in Fig. 3A. The monkeys first viewed an informative cue, which had stationary dots but a known validity (border color), and appeared simultaneously with the targets, opposite the RF (“Fixation”). On the cue-change trials depicted here (which were 25% of all trials), the initial cue was replaced with a different RF cue. Thereafter, the trial proceeded identically to the one-cue informative condition, with a 500-ms delay period, followed by a saccade to the RF cue and a second saccade to a target. Note that the first cue never delivered its motion instruction; its purpose was to establish an initial reward expectation (by virtue of its validity), which could be modified by the second cue, producing an RPE. (B) (Left) RL simulations confirm that informative cues were associated with RPEs that were proportional to their validity (“Informative”). Cues of 80% validity that appeared on cue-change trials (“Cue Change”) were associated with similar RPEs by virtue of following an initial cue of 55%, 80%, or 100% validity. The bars show mean and SEM of simulated RPEs, z-scored across all conditions. Green bars for the cue-change condition are arranged in order of RPE. (Right) LIP responses modulated for informative cues but not 80% valid cues with matched RPE. The bars show the mean and SEM of the mean-subtracted, z-scored firing rates shown in Fig. 7A, averaged across the interval of peak effect (125 ms to 250 ms after cue onset) and all 24 cells tested in this task. (**P < 0.001; two-way ANOVA with post hoc comparisons; n.s., P = 0.87).
Contrary to this prediction, the neurons modulated much more strongly as a function of validity than as a function of RPE. Focusing again on the mean-subtracted firing rates to remove the common visual response (Fig. 7A), we found a significant effect of validity but not RPE (Fig. 6B, “Data,” two-way ANOVA, P < 0.0004 for validity and validity × task interaction; post hoc comparisons, P < 0.001 for validity, P = 0.87 for RPE). Regression analysis (Eq. S8) confirmed that the cells had much stronger modulations according to validity than according to RPE in each monkey [Fig. 7B, Top; average coefficients 125 ms to 250 ms after cue onset were 3.5 ± 0.57 for informative cues vs. 0.74 ± 0.3 for RPE in the entire sample (n = 24, P < 0.0001); M1, 3.67 ± 0.7 vs. 0.91 ± 0.4, P < 0.002, n = 18; M2, 2.97 ± 0.9 vs. 0.23 ± 0.37; P < 0.024, n = 6]. Significant effects of validity and RPE were found in, respectively, 15/24 vs. 5/24 cells (62.5% vs. 21%; z-test of proportions, z = 2.3, P = 0.01).
Fig. 7.
Neural responses on the reward change test. (A) Average firing rates for (Left) valid cues and (Right) 80% cues with matched RPEs (n = 24 cells). To highlight the cue-related modulation, firing rates were z-scored after subtracting the average activity for each stimulus class (the term “Excess” indicates mean subtraction). The legends in each panel show the validity of the informative cues and the equivalent RPEs for the 80% cues on cue-change trials. (B) Average regression coefficients for the validity/reward change responses for (Top) each monkey and (Bottom) individual cells (colormaps). (C) Paired comparison of the validity and RPE effects in individual cells. All conventions are as in Fig. 4.
Control analyses ruled out explanations based on spurious factors. First, reaction times for the first saccade showed a significant reduction as a function of RPE (one-way ANOVA, P = 0.0001 in each monkey), showing that the monkeys were sensitive to and motivated by the RPE associated with the second cue. Second, we controlled for the possibility that neurons only modulated for the first step in an action sequence, by using an additional subset of “change to self” trials where the first and second cues had equal validity of 100%, 80%, or 55% (Methods). On these trials, the neurons showed a significant modulation according to the validity of the RF cue (P < 0.0009), verifying that a validity response could be elicited at the second step in a sequence. Finally, examination of the full (not mean-subtracted) response showed that the neurons had a robust visual response to the RF cues, which, similar to the uninformative task, showed a slight enhancement due to reorienting that was uncorrelated with the sensitivity to validity (Fig. S5 B and C).
Together, the findings rule out spurious explanations, and suggest that LIP neurons encode EIG independently of changes in reward expectations.
Discussion
We show that, in a task in which monkeys could select which stimulus to sample before choosing a final action, LIP neurons encoded the expected gains in decision information associated with alternative cues, and this encoding was distinct from the cells’ well-known visual, saccade, and reward modulations, and from an encoding of RPE. Although replicating the essential features of instrumental sampling requires relatively complex sequential paradigms (10, 29), extensive analyses and control conditions ruled out confounds that may arise in such paradigms. We discuss the implication of the findings from the perspectives of the value-based and priority-based interpretations of LIP function.
Value-Based Decisions.
A prominent interpretation of the LIP target selection responses is that they encode the relative reward values of competing options, which can be read out by downstream mechanisms to select reward-maximizing action policies (11, 22). Our results are broadly consistent with a decision-based interpretation, as the responses we found provided a presaccadic, validity-based ranking of the alternative cues that could guide the decision of which cue to sample.
Our key finding, however, is that the ranking based on EIG could not be explained by the reward mechanisms that have been considered in previous investigations. LIP neurons are sensitive to future rewards and are thought to encode the cumulative future value of an action or state, consistent with the predictions of model-free RL mechanisms (21, 30). In our task, however, the cells distinguished between informative and uninformative items with equivalent reward expectations, and did not modulate as a function of RPE [whose encoding is well established for DA cells (31, 32) but had not been investigated in LIP], indicating that they encode the EIG of visual cues in a manner that is not captured by model-free reinforcement mechanisms.
It is important to note that, although our findings imply that expected rewards are not sufficient to explain the LIP response, they leave open the possibility that rewards contribute to constructing this response. Indeed, a small fraction of the cells we examined showed significant modulations for uninformative stimuli and in the cue-change task, and may potentially encode reward expectation and/or RPE, consistent with the integrative nature of the LIP target-selective response (14, 33, 34).
Our conclusion that LIP neurons encode higher-order forms of utility beyond simple reward expectation is consistent with two previous studies demonstrating a sensitivity to values based on social status (35) and the motivational salience of a punishment-predicting cue (36). Interestingly, social status and motivational salience identify stimuli that govern future actions, raising the question of how social and motivational factors shape active sensing policies.
Our findings raise important questions about how the brain may compute EIG. One debated question is whether EIG estimates rely on explicit measures of uncertainty (37) or higher-order effects of rewards [such as convex utility function or nonstandard effects of RPE that have yet to be characterized in individual cells (38–40)]. A second key question is whether EIG is computed dynamically based on the uncertainty of each forthcoming action or relies on long-term estimates of average validity. A dynamic “look ahead” mechanism affords high flexibility, but it may be computationally expensive and may not be used consistently in all behavioral contexts (10, 29). An important question for future research is to what extent EIG computations are flexible and responsive to rapid changes in context, or rely on stored validity representations to produce routine-based sampling policies (13, 41, 42).
What benefits might the brain derive from computing a reward-independent response to EIG, given that information sampling must ultimately support reward maximization? One possible answer to this question comes from studies of cognitive control, which suggest that actions that require attentive control are those associated with new information, whereas actions that have low uncertainty are habitual and can be performed with little attention (43–45). This implies that the brain must triage cognitive effort according to not only future rewards but also the informational demands of a situation, requiring a reward-independent sensitivity to EIG.
A second answer to this question comes from the fact that the rewards associated with sampling are very indirect and, as was the case in our task, contingent on postsampling actions. In many conditions, the postsampling decisions may be quite complex (2, 41), and their rewards may be ambiguous or fully unknown [as in curiosity-based exploration (6)]. In such complex conditions, the brain may derive a significant advantage from computing decision variables over shorter time scales, related to reducing the uncertainty of a proximate action.
Attention and Gaze.
A second common interpretation of the LIP target selection response is in terms of a “priority” map that ranks competing visual cues for saccades or attention (14, 33, 46–52). Similar to the value interpretation, the priority hypothesis describes LIP as encoding a common currency for visual selection. However, the priority hypothesis goes beyond the value framework in making the specific proposal that LIP cells provide topographically organized attentional feedback facilitating early sensory discrimination (14, 53).
The EIG modulations we found cannot be explained by a number of previously described attention/priority effects. First, these modulations cannot provide topographically organized attentional feedback, as they were not aligned in time or space with the perceptual discrimination: Although the EIG responses described a cue at a peripheral location before the saccade, the motion discrimination occurred at a foveal location after the saccade. In addition, EIG responses were uncorrelated with reorienting of spatial attention (27) or saccade motor factors.
Within the priority framework, our results are broadly consistent with the idea that LIP cells encode “relevance” or “top-down” target selection (14, 54). However, it is important to note that no studies, whether at the level of neural responses or in the behavioral/computational literatures, have attempted to give a computational definition of this form of selection, producing great difficulties in our ability to model top-down attention and gaze (41, 55). A key contribution of our results, therefore, is that they reveal a specific neural signal of task relevance based on expected gains in decision information, making these signals amenable to computational modeling in future investigations.
Methods
Data were collected from two adult male rhesus monkeys (Macaca mulatta) using standard techniques (56), approved by the Animal Care and Use Committees of Columbia University and New York State Psychiatric Institute as complying with the guidelines within the Public Health Service Guide for the Care and Use of Laboratory Animals. Eye position was recorded with an Applied Science Laboratories eye tracking system and digitized at 240 Hz. Visual stimuli were presented at 57 cm viewing distance on a Sony GDM-FW900 Trinitron monitor (viewing area of 30.8° by 48.2°), and their onset was measured by a diode that detected the onset of a refresh cycle.
Task.
For all of the task versions, cues and uninformative stimuli were round patches that measured 3.5° in diameter and contained small dots (0.2°), and the targets were small squares of 0.4° on a side. Cues of different validity correctly indicated the rewarded target on, respectively, 100%, 80%, and 55% of trials and signaled the erroneous target on the remaining, randomly interleaved trials. The three validities were signaled with equiluminant gray, blue, and green borders, with validity-color mappings held constant for each monkey and randomized across monkeys. The display was adjusted for each LIP cell so that, when the monkeys held central fixation, one of the cues fell inside the RF (typically at eccentricities of 8° to 12°) while the other cue was at the diametrically opposite location, and the two targets were at equal eccentricities around an axis orthogonal to that linking the cues (also outside the RF). To ensure that the monkeys used the motion to guide their second saccade, we imposed minimum motion viewing times before making the second saccade (100 ms for M1, and 100 ms to 300 ms for M2). Rewards, if given, arrived at a fixed interval of 200 ms after the end of the second saccade.
Each trial began when the monkeys achieved and maintained central fixation for 400 ms to 600 ms. After this interval, the two targets appeared, on their own on standard informative trials (Figs. 1A and 3A, first panels), or simultaneously with the opposite-RF precue on uninformative and cue-change trials (Figs. 3B and 6A, first panels). The target/precue period lasted 300 ms for M1 and 350 ms for M2, and was followed by the appearance of the RF cues, the 500-ms delay period, and the two-saccade sequence as described in Results. Therefore, the three task versions differed only in whether or not a precue appeared together with the targets, while the timing was identical across the different tasks. In the two-cue choice task (Fig. 1A), trials with the three possible pairs of cues with unequal EIG were presented in random order, with the location of the higher-validity cue randomized to fall inside or opposite the RF.
In the uninformative cue condition (Fig. 3B), initial testing showed that the monkeys had difficulties performing single-cue informative and uninformative trials if these were interleaved. In addition, M2 had relatively low performance for the 100% valid cue (even though he reliably selected this cue on free-choice trials; Fig. 1B). To compensate for these difficulties and minimize reward confounds associated with performance differences across the two tasks, we ran the single-cue informative and uninformative trials in short interleaved trial blocks (collecting at least 10 completed trials for each condition and randomizing the order of the blocks), and based the analysis on the 55% and 80% valid cues that were tested in both monkeys.
In the cue-change task (Fig. 6), it was necessary for the monkeys to believe that the initial cue signaled reward probability on most trials, which meant that we had to maintain a low frequency of cue-change trials, which we set at 25%. This low frequency, in turn, prevented us from exhaustively examining all of the nine possible combinations of the first and second cues, and we decided to focus on the five sequences that were most relevant for our purposes. These were sequences in which the first cue had validity of 100%, 80%, or 55% and the second, RF, cue had validity of 80%, and two more sequences in which the first and second cues had equal validities of 100% and 55%. These trial types allowed us to test whether the responses to an 80% RF cue modulated according to the reward expectations set by a prior cue (Fig. 6A) and how neurons responded to cues that had different validities but zero RPE (100, 100%; 80, 80%; and 55, 55%). These five types of cue-change trials appeared with equal probability and were randomly interleaved with no-change trials. A block continued until at least five trials were completed for each trial type.
Neural Recordings.
Single electrodes were advanced into the intraparietal sulcus (IPS) using a Kopf Microdrive (David Kopf Instruments), and the data were recorded using the Advanced Processing Module for neural signal recording from Fred Haer (FHC, Inc.). MATLAB (MathWorks) and Mathematica (Wolfram) were used for off-line data analysis.
Neurons were identified as belonging to LIP based on accepted anatomical and physiological criteria. We used structural MRI to guide electrode placement to the appropriate level of the IPS and, during recordings, restricted our recording depths to 3 mm to 8 mm below the cortical surface. At the end of the recordings, we acquired a final structural MRI with electrodes inserted at the anterior and medial margins of the region from which we had obtained cells, and found that the trajectories of these electrodes were fully in the lateral bank. Thus, we can be confident that all of the other recording locations, which were posterior and lateral to these landmarks, were also in the lateral bank. For physiological verification, we screened each isolated cell and only tested it further if it had significant, spatially tuned delay period activity during a memory-guided saccade task (one-way ANOVA, P < 0.05). These well-established criteria conclusively distinguish LIP from neighboring areas that are located in the medial bank of the IPS (medial intraparietal area), on the lateral cortical surface (7a), or at the bottom of the IPS (ventral intraparietal area), and which have visual and postsaccadic responses but much weaker delay period activity before memory-guided saccades (57, 58). Note also that, because we were biased toward recording cells with high delay period activity, we cannot speak to any correlations (or lack thereof) that may exist between this activity and validity modulations.
Data Analysis.
Analysis was based on 69 well-isolated neurons (40 in M1) that were tested in the one-cue informative/uninformative task, of which 50 were also tested with the two-cue task, and 24 were tested with the cue-change task. Incomplete trials (in which the monkey did not make a second saccade) were removed from the analysis. Complete trials were analyzed whether or not they received a reward. Saccades were analyzed using a velocity-based algorithm (59), and saccade onset was calculated from the earliest sample of continuous acceleration. All statistical analyses were preceded by tests of normality and symmetry (P < 0.05). If the data met the criteria of normality and symmetry, a paired-sample t test was used. If only the symmetry criterion was met, a Wilcoxon signed-rank test was used. If neither criterion was met, a Mann–Whitney U test was computed.
Analysis of the neural responses was always conducted on unsmoothed rates. Time-resolved regression analyses were computed using standardized coefficients, on firing rates measured in a 50-ms window stepped by 1 ms throughout the delay period. For graphical displays, the value for each time bin was plotted in the middle of the 50-ms window (e.g., 0 ms to 50 ms is plotted at 25 ms). Note that the coefficients are signed (not converted to absolute values), and thus a positive coefficient indicates a true coding of that parameter (rather than spurious effects that may masquerade as positive coding if absolute values are taken). For display purposes only, firing rates were convolved with the right half of a Gaussian kernel of 20 ms SD that was centered on the true spike time, smearing the signal only forward in time.
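The windowing scheme described above can be illustrated with a minimal Python sketch. It counts spikes in a 50-ms window stepped by 1 ms and reports each rate at the window center, as in the graphical displays. The function name `sliding_rates` and the half-open window convention `[t, t + width)` are assumptions made for illustration and are not taken from the original analysis code.

```python
def sliding_rates(spike_times_ms, t_start, t_end, width=50, step=1):
    """Unsmoothed firing rates in a sliding window.

    Returns a list of (window_center_ms, rate_in_spikes_per_s) pairs.
    Each window covers [t, t + width); this half-open convention is an
    assumption, not something stated in the text.
    """
    rates = []
    t = t_start
    while t + width <= t_end:
        n_spikes = sum(1 for s in spike_times_ms if t <= s < t + width)
        # Convert a count in `width` ms to spikes per second and
        # assign it to the middle of the window (0-50 ms -> 25 ms).
        rates.append((t + width / 2, n_spikes * 1000.0 / width))
        t += step
    return rates


# Hypothetical spike train (ms relative to cue onset).
spikes = [10, 20, 60]
rates = sliding_rates(spikes, t_start=0, t_end=100)
print(rates[0])    # first window, 0 ms to 50 ms, plotted at 25 ms
print(len(rates))  # number of 1-ms steps that fit in the interval
```

Because the window is 50 ms wide but stepped by 1 ms, neighboring bins share most of their spikes, so the resulting coefficient waveforms are smooth in time even though the rates themselves are not convolved with a kernel.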
Our main analyses are based on z-scored firing rates, which normalize for overall differences in firing levels across cells. However, we obtained equivalent results using nonnormalized rates (Fig. S6). To z-score firing rates within a cell, we computed the average firing rate in each trial obtained from that cell, measured from the time of fixation point onset to the end of the trial (∼200 ms after the end of the reward period). We then computed the mean and SD of these firing rates across all of the trials that were included in a given analysis, and transformed the trial by trial firing rate by subtracting the mean and dividing by the SD of this distribution. Individual trial z scores were averaged for a cell and then averaged across all cells to obtain the average and SEs of the population response.
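The normalization can be sketched in a few lines of Python. This is an illustrative sketch only: the data layout (a dict per cell with per-trial rates and condition labels), the function names, and the use of the sample SD (rather than the population SD, which the text does not specify) are all assumptions.

```python
from math import sqrt
from statistics import mean, stdev


def zscore(trial_rates):
    """Z-score per-trial firing rates within one cell.

    Uses the sample SD (n - 1 denominator); the text does not say which
    SD estimator was used, so this is an assumption.
    """
    m, sd = mean(trial_rates), stdev(trial_rates)
    return [(r - m) / sd for r in trial_rates]


def population_response(cells, condition):
    """Mean and SEM across cells of the per-cell average z score.

    `cells` is a hypothetical layout: a list of dicts with keys
    'rates' (per-trial rates) and 'conds' (per-trial condition labels).
    """
    per_cell = []
    for cell in cells:
        z = zscore(cell["rates"])
        z_cond = [zi for zi, c in zip(z, cell["conds"]) if c == condition]
        per_cell.append(mean(z_cond))
    sem = stdev(per_cell) / sqrt(len(per_cell))
    return mean(per_cell), sem


# Made-up example with two cells and two conditions.
cells = [
    {"rates": [10, 20, 30, 40], "conds": ["55", "55", "80", "80"]},
    {"rates": [5, 5, 15, 15], "conds": ["55", "55", "80", "80"]},
]
print(population_response(cells, "80"))
```

Z-scoring across all trials of a cell before splitting by condition keeps the between-condition differences intact while removing differences in overall firing level between cells.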
SI Methods
Equations.
Validity and EIG.
Explicit choices of information in humans have been shown to be sensitive to the MAP estimate—i.e., favor those cues that maximize the probability that one of the decision alternatives is correct (1)—and sampling policies based on MAP may also be implemented in oculomotor control (8). In mathematical notation,
[S1] $\mathrm{MAP} = \max_{i} P(i \mid \mathrm{cue})$
where i is the rewarded option and indexes the possible targets. In the current task, validity is defined in terms of the probability of signaling the correct target, and is precisely equal to MAP.
A second common measure of information gains is based on Shannon information, defined as
[S2] $\mathrm{SEIG} = H(P_{\mathrm{prior}}) - H(P_{\mathrm{posterior}})$
where entropy (H) is given by
$H(P) = -P \log_2 P - (1 - P) \log_2 (1 - P)$
At the outset of each trial, the probability that a target location is correct is P = 0.5, and the decision entropy, H(0.50), is 1 bit. After sampling motion direction information, decision entropy is reduced to H(Val), where Val is cue validity. The SEIG is the difference between the prior and posterior entropy, and equals 1.0, 0.278, and 0.007 bits for, respectively, Val = 1.0, 0.8, and 0.55. Thus, validity is monotonically related to SEIG.
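The SEIG values quoted above can be checked numerically. The sketch below assumes the standard binary Shannon entropy for a two-alternative decision, H(p) = −p log2 p − (1 − p) log2(1 − p), and takes SEIG as the prior entropy minus the posterior entropy; the function names are illustrative.

```python
import math


def binary_entropy(p):
    """Shannon entropy, in bits, of a two-alternative decision."""
    if p in (0.0, 1.0):
        return 0.0  # limit of p * log2(p) as p -> 0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def seig(validity, prior=0.5):
    """Prior entropy minus the entropy remaining after sampling the cue."""
    return binary_entropy(prior) - binary_entropy(validity)


for val in (1.0, 0.8, 0.55):
    print(f"validity {val:.2f}: SEIG = {seig(val):.3f} bits")
```

The steep drop from 0.278 bits at 80% validity to 0.007 bits at 55% shows that SEIG is monotonic but strongly nonlinear in validity, which is why the two measures agree on the ranking of the cues but not on their spacing.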
Regressions.
All regressors were standardized to a range of 0 to 1. For saccade velocity, latency, accuracy, and fixation duration, we converted the raw values into percentiles from 0 to 1. In two-choice conditions, validity has a lower bound at 0.5; therefore we stretched the full validity range (0.5 to 1.0) to fit into the range of 0 to 1 (or, equivalently, plotted regression coefficients as if the line intercepted the y axes at x = 0.5 rather than x = 0). Thus, the fitted coefficients capture the firing rate modulations across the full range of regressors that were used in the task.
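As an illustration of this standardization, the following Python sketch converts raw values to percentiles between 0 and 1 and stretches validity from the range 0.5 to 1.0 onto 0 to 1. The exact percentile convention is not given in the text; the rank-based definition with averaged ties used here is an assumption, as are the function names.

```python
def percentile_ranks(values):
    """Map raw values to percentiles in [0, 1].

    Assumed convention: rank / (n - 1), with tied values sharing
    their average rank.
    """
    n = len(values)
    if n == 1:
        return [0.0]
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank / (n - 1)
        i = j + 1
    return ranks


def rescale_validity(val, lo=0.5, hi=1.0):
    """Stretch validity from [0.5, 1.0] onto [0, 1]."""
    return (val - lo) / (hi - lo)


# Hypothetical saccade latencies (ms) and the three task validities.
print(percentile_ranks([210, 180, 195]))
print([round(rescale_validity(v), 2) for v in (0.55, 0.80, 1.00)])
```

Putting every regressor on a common 0 to 1 scale means that each fitted coefficient can be read as the change in firing rate across the full range of that regressor in the task.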
All of the regressions were fit to the time-resolved firing rates (FR), measured in a sliding window of 50 ms width with a 1-ms step. Thus, each beta coefficient is associated with a waveform across time. The β0 coefficient captured the time course of the average response, including the baseline firing, visual onset, and presaccadic responses that were common across all conditions.
To estimate the effects of validity independent of saccade parameters for Fig. 2B, we used the equation
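$$\mathrm{FR} = \beta_0 + \beta_1\,\mathrm{inRFValidity} + \beta_2\,\mathrm{oppRFValidity} + \beta_3\,\mathrm{SaccLatency} + \beta_4\,\mathrm{SaccVelocity} + \beta_5\,\mathrm{SaccAccuracy} \tag{S3}$$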
where inRFValidity is the validity of the RF cue, oppRFValidity is the validity of the opposite RF cue, SaccLatency is saccade latency, SaccVelocity is peak saccade velocity, and SaccAccuracy is saccade angular accuracy.
To compare responses to informative cues and uninformative stimuli (Fig. 4 B and C), we fit firing rates in each condition to the equation
$$\mathrm{FR} = \beta_0 + \beta_1 C \tag{S4}$$
where C codes the validity of the informative RF cue in informative trials and the reward probability of the yoked uninformative stimulus in uninformative trials. The fitted coefficients capture the visual response (β0) and the validity/reward modulations (β1).
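As an illustration, the fit at a single time bin can be sketched with ordinary least squares, assuming the model is a straight line with intercept β0 and slope β1 on the standardized regressor C. The helper name and the data are made up for the example.

```python
def fit_line(c, fr):
    """Ordinary least-squares fit of fr = beta0 + beta1 * c; returns (beta0, beta1)."""
    n = len(c)
    mean_c = sum(c) / n
    mean_fr = sum(fr) / n
    sxx = sum((x - mean_c) ** 2 for x in c)
    sxy = sum((x - mean_c) * (y - mean_fr) for x, y in zip(c, fr))
    beta1 = sxy / sxx
    beta0 = mean_fr - beta1 * mean_c
    return beta0, beta1

# Made-up trials: standardized regressor C and firing rate (spikes/s) in one bin.
c_values = [0.1, 0.1, 0.6, 0.6, 1.0, 1.0]
firing = [21.0, 19.0, 27.0, 25.0, 31.0, 29.0]
beta0, beta1 = fit_line(c_values, firing)
print(f"beta0={beta0:.2f} spikes/s, beta1={beta1:.2f} spikes/s per unit C")
```

Because C spans 0 to 1, β1 reads directly as the firing rate difference between the lowest and highest levels of the regressor.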
To measure the impact of local reward history (Fig. 5B), we fit firing rates in each condition to the equation
[S5] |
In this equation, Rewn−1 is coded as 1 if the previous trial received a reward and 0 otherwise, regardless of the validity of the cue.
To test for possible cue-specific effects and for effects across multiple trials, we used the additional equation
[S6] |
Here ValMatch was set to 1 if the validity of the cue on the prior trial matched that on the current trial, and 0 otherwise. Therefore, the equation models a scenario in which monkeys update reward values based on the reward history of individual cues. We computed the coefficients separately for each of the three validity levels. This resulted in nine reward coefficients, none of which were significantly above 0 during the delay epoch.
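The coding of the two history regressors can be sketched as below. Dropping the first trial of a session, which has no predecessor, is an assumption, and `history_regressors` is a hypothetical helper.

```python
def history_regressors(validities, rewarded):
    """Build per-trial history regressors, starting at the second trial.

    rew_prev[k]  = 1 if the trial before trial k+1 was rewarded, else 0.
    val_match[k] = 1 if the previous cue had the same validity as the current
                   cue, else 0.
    """
    rew_prev, val_match = [], []
    for n in range(1, len(validities)):
        rew_prev.append(1 if rewarded[n - 1] else 0)
        val_match.append(1 if validities[n - 1] == validities[n] else 0)
    return rew_prev, val_match

# Made-up trial sequence for illustration.
validities = [0.8, 0.8, 0.55, 1.0, 1.0]
rewarded = [True, False, True, True, False]
print(history_regressors(validities, rewarded))
```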
To determine the effects of EIG and fixation duration (Fig. 5D), we fit the time-resolved firing rates with
[S7] |
where VT is the standardized duration (percentile) of the postsaccadic fixation (viewing time, Fig. 5 C and D).
To determine the effects of change in reward (Fig. 7 B and C), we fit firing with
[S8] |
where ΔVal codes the change in reward associated with a cue. In the informative condition, ΔVal had values of +0.22, +0.02, and −0.23 for, respectively, the 100%, 80%, and 55% informative cues. In the cue-change condition, the regressor took values of −0.20, 0.00, and +0.25 for, respectively, the 100%, 80%, and 55% first cues.
RL Simulations.
RL simulations were performed to compute the values and RPEs that a reward maximizing process would attribute to each state. We modeled each setting using a Markov decision process (MDP) that includes the task states and transitions, and allowed the model to learn state values based on the final rewards using Q-learning, a standard trial and error learning algorithm. Q-learning is a well-known model-free RL technique that learns an action value function specifying the expected utility of taking a given action in a given state and following the optimal policy thereafter, and can be used to find an optimal action selection policy for any finite MDP (26).
To model the informative condition, the MDP had a cue state with two possible values corresponding to 55% and 80% validity cues, an action state with two values corresponding to the decision alternatives, and a reward state with two values corresponding to reward or no reward. The MDP for the uninformative condition was similar except that an uninformative state was interposed between the cue and action states. To model the RPE for informative cues (Fig. 6B), we used a model that was equivalent to that for the informative cues except that it had an “initial” cue state preceding the cues. This initial state had a 55%, 80%, or 100% valid cue and could transition to the same or a different validity at the next step. Note that we only modeled changes in validity and ignored mere changes in location. Thus, the probability of transitioning to the same validity was 87.5% for the 55% and 100% initial cues and 100% for the 80% cues [based on the probability of change trials (in which only one cue appeared, 75% of total) and “jump to self” trials in which the second cue signaled the same validity as the first].
At each time instant, the Q values are computed as follows: $Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha\,\delta_t$, where $\delta_t = r_{t+1} + \gamma \max_a Q(s_{t+1}, a) - Q(s_t, a_t)$ is the standard temporal difference error or RPE. For each simulation, we computed the optimal values using an exact value iteration method. In each instance (Figs. 3C and 6B), we analyzed the z-scored Q values of the cue states computed over 1,000 simulations at the point when learning converged (with convergence evaluated by comparison with the optimal values).
Note that, because we did not aim to replicate the precise response magnitudes shown by the cells, we arbitrarily set the simulation parameters in a way that would approximate these magnitudes (learning rate α = 0.25, discount factor γ = 0.9, and an ε-greedy policy of 50%) and focused on the model predictions about the relative values of the different cues and across the informative and uninformative conditions.
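A simplified sketch of such a simulation for the informative condition is given below. It is not the authors' model: it assumes the standard one-step Q-learning update (26), collapses the MDP to a cue step followed by a choice step, and uses hypothetical action labels ("follow" or "oppose" the cue). Only the learning rate, discount factor, and ε come from the text; the number of trials and simulations is arbitrary.

```python
import random
import statistics

ALPHA, GAMMA, EPSILON = 0.25, 0.9, 0.5  # parameter values reported in the text
VALIDITIES = {"cue55": 0.55, "cue80": 0.80}

def run_simulation(n_trials=2000, seed=0):
    """Return the learned value of each cue state after n_trials of Q-learning."""
    rng = random.Random(seed)
    q_cue = {cue: 0.0 for cue in VALIDITIES}  # value of the cue state
    q_act = {cue: {"follow": 0.0, "oppose": 0.0} for cue in VALIDITIES}
    for _ in range(n_trials):
        cue = rng.choice(list(VALIDITIES))
        # epsilon-greedy choice between following and opposing the cue
        if rng.random() < EPSILON:
            action = rng.choice(["follow", "oppose"])
        else:
            action = max(q_act[cue], key=q_act[cue].get)
        p_reward = VALIDITIES[cue] if action == "follow" else 1.0 - VALIDITIES[cue]
        reward = 1.0 if rng.random() < p_reward else 0.0
        # RPE at the cue step: no immediate reward, bootstrap from the choice step
        rpe_cue = GAMMA * max(q_act[cue].values()) - q_cue[cue]
        q_cue[cue] += ALPHA * rpe_cue
        # RPE at the choice step: the trial ends, so there is no bootstrap term
        rpe_act = reward - q_act[cue][action]
        q_act[cue][action] += ALPHA * rpe_act
    return q_cue

def mean_cue_values(n_sims=200):
    """Average the learned cue values over independent simulations."""
    runs = [run_simulation(seed=s) for s in range(n_sims)]
    return {cue: statistics.mean(r[cue] for r in runs) for cue in VALIDITIES}

values = mean_cue_values()
print(values)
```

With a constant learning rate the values fluctuate around their asymptote rather than settling exactly, which is why the sketch averages over simulations and compares cues by their relative values, in line with the focus described above.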
Acknowledgments
We thank Latoya Palmer and Cherise Washington for expert administrative assistance; Girma Asfaw for superior veterinary care; and Genevieve Price, Santiago Alonso Diaz, Kirsten Quiles, and Richard Meehan for help with animal training. The work was supported by the Kavli Institute for Brain Science at Columbia University, National Institutes of Health (NIH) Training Grants EY013933-10 and T32 MH015174-35, and NIH Grants 5R01MH098039 (to J.G.) and R24 EY015634 (to J.G.).
Footnotes
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1613844114/-/DCSupplemental.
References
- 1. Nelson JD. Finding useful questions: On Bayesian diagnosticity, probability, impact, and information gain. Psychol Rev. 2005;112:979–999. doi: 10.1037/0033-295X.112.4.979.
- 2. Yang SC, Lengyel M, Wolpert DM. Active sensing in the categorization of visual patterns. eLife. 2016;5:e12215. doi: 10.7554/eLife.12215.
- 3. Gold JI, Shadlen MN. The neural basis of decision making. Annu Rev Neurosci. 2007;30:535–574. doi: 10.1146/annurev.neuro.29.051605.113038.
- 4. Blanchard TC, Hayden BY, Bromberg-Martin ES. Orbitofrontal cortex uses distinct codes for different choice attributes in decisions motivated by curiosity. Neuron. 2015;85:602–614. doi: 10.1016/j.neuron.2014.12.050.
- 5. Bromberg-Martin ES, Hikosaka O. Midbrain dopamine neurons signal preference for advance information about upcoming rewards. Neuron. 2009;63:119–126. doi: 10.1016/j.neuron.2009.06.009.
- 6. Gottlieb J, Oudeyer PY, Lopes M, Baranes A. Information-seeking, curiosity, and attention: Computational and neural mechanisms. Trends Cogn Sci. 2013;17:585–593. doi: 10.1016/j.tics.2013.09.001.
- 7. Nelson JD, McKenzie CR, Cottrell GW, Sejnowski TJ. Experience matters: Information acquisition optimizes probability gain. Psychol Sci. 2010;21:960–969. doi: 10.1177/0956797610372637.
- 8. Najemnik J, Geisler WS. Eye movement statistics in humans are consistent with an optimal search strategy. J Vis. 2008;8:4.1–14. doi: 10.1167/8.3.4.
- 9. Johnson L, Sullivan B, Hayhoe M, Ballard D. Predicting human visuomotor behaviour in a driving task. Philos Trans R Soc Lond B Biol Sci. 2014;369:20130044. doi: 10.1098/rstb.2013.0044.
- 10. Eckstein MP, Schoonveld W, Zhang S, Mack SC, Akbas E. Optimal and human eye movements to clustered low value cues to increase decision rewards during search. Vision Res. 2015;113:137–154. doi: 10.1016/j.visres.2015.05.016.
- 11. Sugrue LP, Corrado GS, Newsome WT. Choosing the greater of two goods: Neural currencies for valuation and decision making. Nat Rev Neurosci. 2005;6:363–375. doi: 10.1038/nrn1666.
- 12. Nakamura K. Neural representation of information measure in the primate premotor cortex. J Neurophysiol. 2006;96:478–485. doi: 10.1152/jn.01326.2005.
- 13. Hayhoe M, Ballard D. Modeling task control of eye movements. Curr Biol. 2014;24:R622–R628. doi: 10.1016/j.cub.2014.05.020.
- 14. Bisley JW, Goldberg ME. Attention, intention, and priority in the parietal lobe. Annu Rev Neurosci. 2010;33:1–21. doi: 10.1146/annurev-neuro-060909-152823.
- 15. Platt ML, Glimcher PW. Neural correlates of decision variables in parietal cortex. Nature. 1999;400:233–238. doi: 10.1038/22268.
- 16. Lee D, Seo H, Jung MW. Neural basis of reinforcement learning and decision making. Annu Rev Neurosci. 2012;35:287–308. doi: 10.1146/annurev-neuro-062111-150512.
- 17. Louie K, Grattan LE, Glimcher PW. Reward value-based gain control: Divisive normalization in parietal cortex. J Neurosci. 2011;31:10627–10639. doi: 10.1523/JNEUROSCI.1237-11.2011.
- 18. Louie K, Glimcher PW. Separating value from choice: Delay discounting activity in the lateral intraparietal area. J Neurosci. 2010;30:5498–5507. doi: 10.1523/JNEUROSCI.5742-09.2010.
- 19. Kable JW, Glimcher PW. The neural correlates of subjective value during intertemporal choice. Nat Neurosci. 2007;10:1625–1633. doi: 10.1038/nn2007.
- 20. Dorris MC, Glimcher PW. Activity in posterior parietal cortex is correlated with the relative subjective desirability of action. Neuron. 2004;44:365–378. doi: 10.1016/j.neuron.2004.09.009.
- 21. Sugrue LP, Corrado GS, Newsome WT. Matching behavior and the representation of value in the parietal cortex. Science. 2004;304:1782–1787. doi: 10.1126/science.1094765.
- 22. Kable JW, Glimcher PW. The neurobiology of decision: Consensus and controversy. Neuron. 2009;63:733–745. doi: 10.1016/j.neuron.2009.09.003.
- 23. Burnham KP, Anderson DR. Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach. 2nd Ed. Springer; New York: 2002.
- 24. Gottlieb J, Hayhoe M, Hikosaka O, Rangel A. Attention, reward, and information seeking. J Neurosci. 2014;34:15497–15504. doi: 10.1523/JNEUROSCI.3270-14.2014.
- 25. Dayan P, Daw ND. Decision theory, reinforcement learning, and the brain. Cogn Affect Behav Neurosci. 2008;8:429–453. doi: 10.3758/CABN.8.4.429.
- 26. Sutton RS, Barto AG. Reinforcement Learning: An Introduction. MIT Press; Cambridge, MA: 1998.
- 27. Robinson DL, Bowman EM, Kertzman C. Covert orienting of attention in macaques. II. Contributions of parietal cortex. J Neurophysiol. 1995;74:698–712. doi: 10.1152/jn.1995.74.2.698.
- 28. Niv Y, Schoenbaum G. Dialogues on prediction errors. Trends Cogn Sci. 2008;12:265–272. doi: 10.1016/j.tics.2008.03.006.
- 29. Morvan C, Maloney LT. Human visual search does not maximize the post-saccadic probability of identifying targets. PLOS Comput Biol. 2012;8:e1002342. doi: 10.1371/journal.pcbi.1002342.
- 30. Seo H, Barraclough DJ, Lee D. Lateral intraparietal cortex and reinforcement learning during a mixed-strategy game. J Neurosci. 2009;29:7278–7289. doi: 10.1523/JNEUROSCI.1479-09.2009.
- 31. Waelti P, Dickinson A, Schultz W. Dopamine responses comply with basic assumptions of formal learning theory. Nature. 2001;412:43–48. doi: 10.1038/35083500.
- 32. Bayer HM, Glimcher PW. Midbrain dopamine neurons encode a quantitative reward prediction error signal. Neuron. 2005;47:129–141. doi: 10.1016/j.neuron.2005.05.020.
- 33. Ipata AE, Gee AL, Bisley JW, Goldberg ME. Neurons in the lateral intraparietal area create a priority map by the combination of disparate signals. Exp Brain Res. 2009;192:479–488. doi: 10.1007/s00221-008-1557-8.
- 34. Park IM, Meister ML, Huk AC, Pillow JW. Encoding and decoding in parietal cortex during sensorimotor decision-making. Nat Neurosci. 2014;17:1395–1403. doi: 10.1038/nn.3800.
- 35. Klein JT, Deaner RO, Platt ML. Neural correlates of social target value in macaque parietal cortex. Curr Biol. 2008;18:419–424. doi: 10.1016/j.cub.2008.02.047.
- 36. Leathers ML, Olson CR. In monkeys making value-based decisions, LIP neurons encode cue salience and not action value. Science. 2012;338:132–135. doi: 10.1126/science.1226405.
- 37. Daddaoua N, Lopes M, Gottlieb J. Intrinsically motivated oculomotor exploration guided by uncertainty reduction and conditioned reinforcement in non-human primates. Sci Rep. 2016;6:20202. doi: 10.1038/srep20202.
- 38. Beierholm UR, Dayan P. Pavlovian-instrumental interaction in ‘observing behavior’. PLOS Comput Biol. 2010;6:e1000903. doi: 10.1371/journal.pcbi.1000903.
- 39. Kreps DM, Porteus EL. Temporal resolution of uncertainty and dynamic choice theory. Econometrica. 1978;46:185–200.
- 40. Iigaya K, Story GW, Kurth-Nelson Z, Dolan RJ, Dayan P. The modulation of savouring by prediction error and its effects on choice. eLife. 2016;5:e13747. doi: 10.7554/eLife.13747.
- 41. Tatler BW, Hayhoe MM, Land MF, Ballard DH. Eye guidance in natural vision: Reinterpreting salience. J Vis. 2011;11:5–25. doi: 10.1167/11.5.5.
- 42. Desrochers TM, Jin DZ, Goodman ND, Graybiel AM. Optimal habits can develop spontaneously through sensitivity to local cost. Proc Natl Acad Sci USA. 2010;107:20512–20517. doi: 10.1073/pnas.1013470107.
- 43. Cavanagh JF, Frank MJ. Frontal theta as a mechanism for cognitive control. Trends Cogn Sci. 2014;18:414–421. doi: 10.1016/j.tics.2014.04.012.
- 44. Shenhav A, Botvinick MM, Cohen JD. The expected value of control: An integrative theory of anterior cingulate cortex function. Neuron. 2013;79:217–240. doi: 10.1016/j.neuron.2013.07.007.
- 45. Fan J. An information theory account of cognitive control. Front Hum Neurosci. 2014;8:680. doi: 10.3389/fnhum.2014.00680.
- 46. Mirpour K, Bisley JW. Dissociating activity in the lateral intraparietal area from value using a visual foraging task. Proc Natl Acad Sci USA. 2012;109:10083–10088. doi: 10.1073/pnas.1120763109.
- 47. Bisley JW, Mirpour K, Arcizet F, Ong WS. The role of the lateral intraparietal area in orienting attention and its implications for visual search. Eur J Neurosci. 2011;33:1982–1990. doi: 10.1111/j.1460-9568.2011.07700.x.
- 48. Mirpour K, Ong WS, Bisley JW. Microstimulation of posterior parietal cortex biases the selection of eye movement goals during search. J Neurophysiol. 2010;104:3021–3028. doi: 10.1152/jn.00397.2010.
- 49. Foley NC, Jangraw DC, Peck C, Gottlieb J. Novelty enhances visual salience independently of reward in the parietal lobe. J Neurosci. 2014;34:7947–7957. doi: 10.1523/JNEUROSCI.4171-13.2014.
- 50. Gottlieb J, Snyder LH. Spatial and non-spatial functions of the parietal cortex. Curr Opin Neurobiol. 2010;20:731–740. doi: 10.1016/j.conb.2010.09.015.
- 51. Gottlieb J, Kusunoki M, Goldberg ME. Simultaneous representation of saccade targets and visual onsets in monkey lateral intraparietal area. Cereb Cortex. 2005;15:1198–1206. doi: 10.1093/cercor/bhi002.
- 52. Gottlieb JP, Kusunoki M, Goldberg ME. The representation of visual salience in monkey parietal cortex. Nature. 1998;391:481–484. doi: 10.1038/35135.
- 53. Reynolds JH, Heeger DJ. The normalization model of attention. Neuron. 2009;61:168–185. doi: 10.1016/j.neuron.2009.01.002.
- 54. Gottlieb J, Balan P. Attention as a decision in information space. Trends Cogn Sci. 2010;14:240–248. doi: 10.1016/j.tics.2010.03.001.
- 55. Baluch F, Itti L. Mechanisms of top-down attention. Trends Neurosci. 2011;34:210–224. doi: 10.1016/j.tins.2011.02.003.
- 56. Oristaglio J, Schneider DM, Balan PF, Gottlieb J. Integration of visuospatial and effector information during symbolically cued limb movements in monkey lateral intraparietal area. J Neurosci. 2006;26:8310–8319. doi: 10.1523/JNEUROSCI.1779-06.2006.
- 57. Barash S, Bracewell RM, Fogassi L, Gnadt JW, Andersen RA. Saccade-related activity in the lateral intraparietal area. II. Spatial properties. J Neurophysiol. 1991;66:1109–1124. doi: 10.1152/jn.1991.66.3.1109.
- 58. Snyder LH, Batista AP, Andersen RA. Intention-related activity in the posterior parietal cortex: A review. Vision Res. 2000;40:1433–1441. doi: 10.1016/s0042-6989(00)00052-3.
- 59. Nyström M, Holmqvist K. An adaptive algorithm for fixation, saccade, and glissade detection in eyetracking data. Behav Res Methods. 2010;42:188–204. doi: 10.3758/BRM.42.1.188.