Abstract
To investigate mechanisms by which reward modulates target selection, we studied the behavioral effects of perturbing dopaminergic activity within the frontal eye field (FEF) of monkeys performing a saccadic choice task and simulated the effects using a plausible cortical network. We found that manipulation of FEF activity either by blocking D1 receptors (D1Rs) or by stimulating D2 receptors (D2Rs) increased the tendency to choose targets in the response field of the affected site. However, the D1R manipulation decreased the tendency to repeat choices on subsequent trials, whereas the D2R manipulation increased that tendency. Moreover, the amount of shift in target selection resulting from the two manipulations correlated in opposite ways with the baseline stochasticity of choice behavior. Our network simulation results suggest that D1Rs influence target selection mainly through their effects on the strength of inputs to the FEF and on recurrent connectivity, whereas D2Rs influence the excitability of FEF output neurons. Altogether, these results reveal dissociable dopaminergic mechanisms influencing target selection and suggest how reward can influence adaptive choice behavior via prefrontal dopamine.
Keywords: computational modeling, decision making, oculomotor
As the primary means of exploring the visual environment, we shift our gaze several times each second via saccadic eye movements. Where we look depends not only on the physical salience of visual stimuli, but also on their reward value (1–3). For example, while shopping, your attention may be captured differently by colored sales tags upon realizing that they indicate different discounts (e.g., yellow tags for 20% vs. red tags for 40% off). Studies of value-based behavior have established that midbrain dopamine (DA) neurons signal different aspects of reward (4) and that neurons in many cortical structures receiving dopaminergic projections represent the reward value of visual stimuli (5). The frontal eye field (FEF), the area of prefrontal cortex (PFC) most directly involved in triggering saccadic eye movements, receives inputs from structures encoding reward value (6), and these inputs might be sufficient to control value-based, saccadic choice (target selection). It is also possible that this control operates via FEF projections to caudate in the basal ganglia, where DA-mediated plasticity modulates reward-dependent target selection (7). In addition, the direct dopaminergic inputs to the FEF (8) could provide a mechanism that independently modulates target selection based on reward signals. Determining how dopamine within the FEF influences saccades is thus important for understanding the mechanism by which reward modulates the selection of visual targets for eye movements.
To address these questions, we manipulated dopaminergic activity within the FEF of monkeys performing a saccadic choice task. In the task, monkeys freely selected between two identical visual targets appearing at varying temporal asynchronies. By systematically altering the target onset asynchrony (TOA), we could measure the bias in selecting either target, the change in selection probability due to previous choice, and the sensitivity of choice to the TOA. The selection of either target was rewarded after the saccade, so the bias in target selection corresponded to the delay in reward that the animal endured for selecting its preferred target. Dopaminergic FEF activity was manipulated either by blocking D1 receptors (D1Rs) with a selective antagonist or by stimulating D2 receptors (D2Rs) with a selective agonist. We recently reported that these two manipulations produce equivalent effects on saccadic target selection, both increasing the propensity of monkeys to make saccades to stimuli within the response fields (RFs) of affected FEF neurons (9). Both receptor subtypes are known to exert complex, modulatory effects on neural activity and behavior (10–13). For example, D1Rs exhibit dose-dependent, inverted-U-shaped effects on the persistent activity of PFC neurons (14, 15) and on working memory (16). To understand the complex modulatory effects of FEF dopamine on saccadic choice behavior, we looked for differential influences of D1Rs and D2Rs on that behavior and simulated those influences using a plausible cortical network. We observed dissociable influences of D1R- and D2R-mediated FEF activity on saccadic choice that could be explained by dopaminergic modulation of synaptic plasticity and neural activity within different cortical layers.
Results
Experimental Findings.
To quantify target selection, we used a task in which monkeys were trained to choose one of two stimuli as the targets of saccadic eye movements (Fig. 1A). During each experiment, we positioned one of the two targets (Tin) within the RFs of FEF neurons at the site of a drug infusion and the other target (Tout) at a diametrically opposite location. Targets were presented at varying TOAs over a wide range of values (∼ ±200 ms) and monkeys were rewarded regardless of which target they chose. Positive (negative) values of the TOA denote that Tin (Tout) appeared first, followed by the appearance of the second target TOA milliseconds later. To select the second target, the monkey thus had to wait (an amount of time equal to the TOA) until it appeared. All TOAs were presented with equal probability and were pseudorandomly interleaved such that on any given trial the monkey could not predict the TOA.
Fig. 1.
Similar effects of manipulating D1R- and D2R-mediated FEF activity on saccadic choice behavior. (A) The saccadic choice task. In the task, two targets appeared on the display asynchronously (Δt), and the monkey was rewarded for making a saccadic eye movement to either one of them. One of the targets (Tin) appeared within the FEF RF. (B) Psychometric functions from two example experiments measuring the proportion of Tin choices across TOAs before (black) and after manipulation of D1R- (red) or D2R-mediated (blue) FEF activity. Solid curves show the logistic regression fit. The vertical dashed lines denote the TOAs yielding a 0.5 proportion of Tin choices (i.e., PES). (C–E) Distribution of PES values (C), overall choice probability [p(Tin)] (D), and stochasticity of choice (σ) (E) before and after the dopaminergic manipulations. Triangles show the median of each distribution and asterisks denote significant differences (P < 0.05) between distributions.
When targets appeared simultaneously, monkeys tended to choose one target more often than the other, exhibiting a bias. By varying the TOA, we could measure that bias in terms of the delay in onset at which the monkey began to choose the alternative target (17). We refer to that bias as the point of equal selection (PES), namely the TOA at which the monkey chooses the two targets with equal probability. The PES was determined from a logistic fit of the probability of choosing Tin as a function of the TOA (Materials and Methods). Positive values of the PES denote biases in favor of Tout, whereas negative values denote biases in favor of Tin.
Saccadic choice was measured during control trials and after the manipulation of either D1R- or D2R-mediated FEF activity via the local infusion of the selective D1R antagonist SCH23390 or the selective D2R agonist quinpirole. In total, we performed 34 experiments on two monkeys: 21 D1R antagonist infusions (14 in monkey A and 7 in monkey B) and 13 D2R agonist infusions (8 in monkey A and 5 in monkey B). The infusions in monkeys A and B were done in the left and right hemispheres, respectively. We analyzed data from blocks of trials before (control) and after the drug infusion (drug). These data consisted of an average of 166 (control, SD = 48) and 173 (drug, SD = 42) trials in each of the D1R experiments and 201 (control, SD = 43) and 205 (drug, SD = 44) trials in each of the D2R experiments. As the results were consistent between the two monkeys and there were no effects of learning (Fig. S1 and SI Text, Consistency of Experimental Results Between Two Monkeys and Learning Effects), we performed our analyses using the combined data.
Fig. 1B shows the psychometric functions obtained in two example experiments in monkey A before and after a D1R antagonist or a D2R agonist infusion into the FEF. In both cases, the PES during control was slightly positive, indicating a small bias toward Tout. Following both the D1R and the D2R manipulations, the PES was shifted leftward, indicating an increase in Tin selection. This pattern of results was consistent across all experiments in the two monkeys. Although there were no significant differences between the PESs measured during the separate D1R and D2R experiments, either before (controlD1R vs. controlD2R, z = −0.99, P = 0.3) or after the drug infusion (D1R vs. D2R, z = −0.64, P = 0.5), within each experiment the PES was significantly reduced by both drug manipulations (ΔPES = −29.3 ± 4.0 ms for D1R, z = −4.0, P < 0.0001; ΔPES = −20.5 ± 8.1 ms for D2R, z = −2.1, P < 0.04) (Fig. 1C). Furthermore, the magnitude of the shift toward Tin choices was independent of the control PES for both drug manipulations (r = 0.03, P = 0.9 for D1R; r = −0.09, P = 0.8 for D2R), indicating a fixed increment in Tin preference. We observed no effects of either drug manipulation on saccadic latency or amplitude (SI Text, Drug Effects on Saccade Metrics). In addition to measuring the effect of the drug manipulations on the choice bias (PES), we also examined their effect on the overall probability of selecting Tin, or p(Tin). Consistent with the PES effects, we found that both blocking D1Rs [Δp(Tin) = 0.071 ± 0.010, z = −4.0, P < 0.0001] and stimulating D2Rs [Δp(Tin) = 0.046 ± 0.016, z = −2.3, P < 0.02] increased p(Tin) above that of control trials (Fig. 1D). Thus, saccadic target choice was shifted in favor of Tin following both dopaminergic manipulations.
In addition to measuring bias in the selection of targets, we also measured the stochasticity of the monkeys’ choices, determined from σ of the fitted psychometric function (Materials and Methods). Larger values of σ correspond to greater stochasticity, i.e., a larger range of TOAs at which the choice is not determined solely by the TOA, whereas lower values of σ correspond to more deterministic choice behavior. We found that there were significant differences between the σ-values measured during the separate D1R and D2R manipulations both before (controlD1R vs. controlD2R, z = 2.4, P < 0.01) and after the drug infusion (D1R vs. D2R, z = 3.8, P < 0.0001) (Fig. 1E). The differences might be due to the different experiments being carried out on different days. However, the key comparison is between control and drug values within each manipulation, carried out on the same day. For this comparison, we found that neither the D1R (z = −0.43, P = 0.7) nor the D2R (z = −1.4, P = 0.2) manipulation altered σ. Thus, the stochasticity of choice behavior was unaltered by either dopaminergic manipulation.
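As an illustration of how the PES and σ can both be read off a logistic fit, the following minimal sketch fits p(Tin) as a logistic function of the TOA by Newton's method. The parameterization p(Tin) = 1/(1 + exp(−(TOA − PES)/σ)), the function names, and the synthetic counts are assumptions made for this example; the exact form used in Materials and Methods may differ.

```python
import math

def logistic(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_psychometric(toas_ms, n_trials, n_tin, n_iter=50):
    """Fit p(Tin) = logistic(b0 + b1 * TOA) by Newton's method.

    Returns (PES, sigma) with PES = -b0 / b1 and sigma = 1 / b1, so that
    p(Tin) = logistic((TOA - PES) / sigma). This parameterization is an
    assumption for illustration, not the paper's documented procedure.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, n, k in zip(toas_ms, n_trials, n_tin):
            p = logistic(b0 + b1 * x)
            w = n * p * (1.0 - p)      # binomial variance weight
            r = k - n * p              # observed minus expected Tin count
            g0 += r
            g1 += r * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if det <= 0.0:
            break
        # Newton update: solve the 2x2 system H * step = gradient
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return -b0 / b1, 1.0 / b1

# Synthetic (not experimental) data: expected Tin counts for a known curve.
toas = [-200, -150, -100, -50, 0, 50, 100, 150, 200]
true_pes, true_sigma = 20.0, 40.0
trials = [20] * len(toas)
tin_counts = [n * logistic((t - true_pes) / true_sigma)
              for t, n in zip(toas, trials)]
pes_hat, sigma_hat = fit_psychometric(toas, trials, tin_counts)
```

With this sign convention, a positive PES means that Tin must lead Tout for the two targets to be chosen equally often (a bias toward Tout), and a larger σ flattens the curve, which corresponds to more stochastic choice.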
In contrast to the nearly identical effects of blocking D1Rs and stimulating D2Rs on target preference [p(Tin) and PES] and on the stochasticity of choice (σ), we found that the two manipulations altered other aspects of choice behavior in very different ways. First, we found that the D1R and D2R manipulations exerted different effects on the tendency of monkeys to repeat choices on subsequent trials. We quantified the tendency to repeat choices with a repetition index (RI), positive values indicating a probability of repetition that exceeds the tendency due solely to the choice bias (SI Text, Repetition Index as a Measure of Repetition in Choice). Fig. 2A shows the distribution of RIs before and after the D1R and D2R manipulations. Although neither of the control RIs significantly differed from zero (controlD1R, z = −1.4, P = 0.2; controlD2R, z = −0.73, P = 0.5), both drug manipulations yielded nonzero RIs, yet in opposite ways. The D1R antagonist reduced RIs below zero (RI = −0.032 ± 0.009, z = −2.8, P < 0.005), to values significantly less than control (ΔRI = −0.028 ± 0.010, z = −2.4, P < 0.02), indicating that monkeys became less likely to repeat target choices on subsequent trials. In contrast, the D2R agonist increased RIs above zero (RI = 0.050 ± 0.012, z = −2.8, P < 0.005), to values significantly greater than control (ΔRI = 0.040 ± 0.012, z = −2.6, P < 0.01), indicating that monkeys became more likely to repeat target choices on subsequent trials. Consequently, although the RIs in the D1R and D2R experiments did not differ before the drug infusion (controlD1R vs. controlD2R, z = 0.78, P = 0.4), they did differ after the drug infusion (D1R vs. D2R, z = 4.0, P < 0.00005). Thus, the two dopaminergic manipulations exerted opposite effects on the tendency of monkeys to repeat saccadic choices.
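One way to make the idea of a repetition index concrete is sketched below. The exact definition is given in the SI Text and is not reproduced here; this example assumes a simple form in which the RI is the observed probability of repeating the previous choice minus the repetition probability expected from the choice bias alone, p(Tin)² + (1 − p(Tin))². The function name and the example sequences are hypothetical.

```python
def repetition_index(choices):
    """Repetition beyond what the choice bias alone predicts.

    choices: sequence of 1 (Tin chosen) or 0 (Tout chosen), in trial order.
    Assumed definition (for illustration only):
        RI = p(repeat) - [p(Tin)^2 + (1 - p(Tin))^2]
    """
    n = len(choices)
    if n < 2:
        raise ValueError("need at least two trials")
    p_tin = sum(choices) / n
    # Count consecutive trial pairs on which the same target was chosen.
    repeats = sum(1 for prev, cur in zip(choices, choices[1:]) if prev == cur)
    p_repeat = repeats / (n - 1)
    # Repetition expected if choices were independent draws with bias p_tin.
    expected = p_tin ** 2 + (1.0 - p_tin) ** 2
    return p_repeat - expected

ri_alternating = repetition_index([1, 0, 1, 0, 1, 0])  # strict alternation
ri_streaky = repetition_index([1, 1, 1, 0, 0, 0])      # long runs
```

Under this assumed definition, strict alternation gives a negative RI and long runs of the same choice give a positive RI, matching the sign convention described in the text.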
Fig. 2.
Dissociable effects of blocking D1Rs and stimulating D2Rs on saccadic choice behavior. (A) Distribution of the repetition index values before and after the dopaminergic manipulations. Other conventions are as in Fig. 1. (B) Correlations between changes in p(Tin) due to drug manipulations and the stochasticity of choice (σ) measured during control trials. Correlation coefficients, significance, and linear fits (solid line) are shown for the two datasets.
Second, we found that although the two dopaminergic manipulations increased Tin choices equally, the magnitude of their effects correlated with the choice stochasticity in opposite ways. Fig. 2B shows how the increases in Tin choices varied as a function of choice stochasticity measured during control trials. Following the D1R antagonist infusion, the increase in p(Tin) was positively correlated with the σ measured during control trials (r = 0.61, P = 0.003). This result indicates that during experiments in which monkeys exhibited greater stochasticity in their choices (larger σ-values, i.e., choices were less determined by the TOA), blocking D1Rs led to greater increases in Tin choices. In contrast, the increase in p(Tin) following stimulation of D2Rs was negatively correlated with the σ measured during control trials (r = −0.64, P = 0.02). Thus, during experiments in which monkeys exhibited greater stochasticity in their choices, D2R stimulation led to smaller increases in Tin choices. In addition, an analysis of covariance (ANCOVA) confirmed the contrasting relationships between stochasticity and increases in Tin selection produced by the D1R and D2R manipulations [F(1, 30) = 15.7, P = 0.0004]. Finally, we tested whether changes in choice bias and repetition were generated through the same mechanisms by computing the correlation between the two. The PES and RI were not correlated either before (controlD1R, r = 0.069, P = 0.8; controlD2R, r = 0.13, P = 0.7) or after the drug infusion (D1R, r = 0.22, P = 0.3; D2R, r = 0.44, P = 0.1). In addition, we found no significant correlation between changes in the PES and RI due to blocking D1Rs (r = 0.28, P = 0.2) or stimulating D2Rs (r = −0.36, P = 0.2).
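The correlations reported above relate the drug-induced change in p(Tin) to the control σ across experiments. A minimal sketch of the Pearson coefficient is given below; the input values are synthetic, exactly linear numbers chosen only to show the sign convention, and are not the experimental data. The ANCOVA is not reproduced here.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Synthetic illustration only: control sigma (ms) and change in p(Tin).
sigma_control = [20.0, 30.0, 40.0, 50.0, 60.0]
dp_rising = [0.02 + 0.001 * s for s in sigma_control]    # grows with sigma
dp_falling = [0.10 - 0.001 * s for s in sigma_control]   # shrinks with sigma
r_rising = pearson_r(sigma_control, dp_rising)
r_falling = pearson_r(sigma_control, dp_falling)
```

A positive coefficient corresponds to the D1R-like pattern (larger shifts when choice is more stochastic) and a negative coefficient to the D2R-like pattern.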
Modeling Results.
In an attempt to account for the observed dopaminergic effects on saccadic choice behavior, we constructed a biophysically plausible cortical network model of target selection and examined how changing model parameters altered choice behavior in the task. The model was composed of two FEF columns, consisting of pools of excitatory neurons within superficial (layers II and III) and deep (layers V and VI) cortical laminae (Fig. 3A). Each column contained two pools of excitatory pyramidal neurons, one in superficial layers and one in deep layers. In addition, a pool of inhibitory interneurons mediated mutual inhibition between the excitatory pools of the superficial layers. The excitatory pools within superficial and deep layers had RFs corresponding to the location of the two saccadic targets. Both superficial layer pools were driven by three types of input: background, visual, and value based (SI Text, Computational Model). These pools competed in a winner-take-all fashion to drive deep layer (output) pools, which in turn rendered a choice. The winner-take-all property was due to connectivity of excitatory and inhibitory pools in the superficial layers. In the deep layers, however, there was only weak recurrent excitation between neurons with similar selectivity. The excitatory pools in the deep layers sent outputs to the brainstem or the superior colliculus, driving target selection. Therefore, the activity of neural pools in the deep layers determined the network’s choice on each trial. Specifically, we assumed that the network’s choice on a given trial was the target that was represented by the deep layer pool whose activity reached 15 Hz first. Consequently, changes in the excitability or input efficacy of deep layer pools can also affect the decision. The model also implemented dopaminergic modulation of synaptic plasticity and neural activity, providing a means by which to simulate the D1R and D2R manipulations (Materials and Methods).
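The first-to-threshold decision rule can be illustrated with a heavily simplified toy, sketched below. It is not the network described above: it collapses each column into a single noise-free rate unit, replaces the inhibitory pool with direct mutual inhibition, and reads the choice from the competing units themselves rather than from separate deep layer pools. Only the 15-Hz threshold comes from the text; all other parameter values and names are assumptions chosen for illustration.

```python
def simulate_choice(toa_ms, visual_in=40.0, visual_out=44.0,
                    w_inh=1.0, tau_ms=20.0, threshold_hz=15.0,
                    dt_ms=1.0, t_max_ms=600.0):
    """Toy two-unit race: return 'Tin', 'Tout', or None (no decision).

    toa_ms > 0 means Tin appears first and Tout follows toa_ms later;
    toa_ms < 0 means Tout appears first.
    """
    on_in = max(0.0, -toa_ms)    # onset time of the Tin target
    on_out = max(0.0, toa_ms)    # onset time of the Tout target
    r_in = r_out = 0.0           # firing rates (Hz) of the two units
    t = 0.0
    while t < t_max_ms:
        i_in = visual_in if t >= on_in else 0.0
        i_out = visual_out if t >= on_out else 0.0
        # Rectified drive: visual input minus inhibition from the rival.
        drive_in = max(0.0, i_in - w_inh * r_out)
        drive_out = max(0.0, i_out - w_inh * r_in)
        # Forward-Euler step of leaky rate dynamics.
        r_in += dt_ms / tau_ms * (drive_in - r_in)
        r_out += dt_ms / tau_ms * (drive_out - r_out)
        t += dt_ms
        if r_in >= threshold_hz or r_out >= threshold_hz:
            return "Tin" if r_in > r_out else "Tout"
    return None

choice_tin_first = simulate_choice(100.0)    # Tin leads by 100 ms
choice_tout_first = simulate_choice(-100.0)  # Tout leads by 100 ms
choice_simultaneous = simulate_choice(0.0)   # default inputs favor Tout
choice_boosted = simulate_choice(0.0, visual_in=48.0)  # stronger Tin input
```

In this toy, the target that appears well before the other always wins, simultaneous onsets are decided by the assumed input asymmetry, and strengthening the input to the Tin unit flips the simultaneous-onset choice toward Tin. Because there is no noise, it does not produce graded choice probabilities.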
Fig. 3.
Network architecture of the model used to simulate saccadic choice and the effects of manipulating different network elements on choice behavior. (A) The model comprised two FEF columns (Tout and Tin), consisting of excitatory neurons within superficial and deep layers. Inhibitory (Inh) interneurons mediated mutual inhibition between the excitatory pools within the superficial layers. Superficial layer pools were driven by value-based, visual, and background inputs and projected to deep layer pools. Deep layer pools projected to brainstem oculomotor structures to render a saccadic choice. Connections and neural activity in the two columns were modulated by DA. To reproduce the effects of the drug infusion, only elements within the Tin column were altered in the simulations (orange). Inset shows example activity trajectories of the Tout and Tin pools [r(Tin) vs. r(Tout)] during trials on which the target appearing first (dark green or dark blue) or second (pale green or pale blue) was selected. The gray ellipse highlights the activity trajectories measured after the appearance of the first target and before the appearance of the second target (TOA epoch) on trials in which the first target (Tin) was chosen. Note that activity within the two pools diverges quickly during this epoch. The gray circle highlights the activity trajectory during the TOA epoch on trials in which the second target was chosen. Note that activity within the two pools remains equal during this interval. (B) Summary of choice behavior changes resulting from alterations to different network sites. The direction of alteration at each site was chosen such that it increased Tin selection (except for IE). Zero represents no change. For comparison, a summary of the experimental findings is shown at the bottom.
The model selected between the two saccade targets with a probability that depended on the TOA. The probability of choosing Tin tended to be 1 or 0 when the absolute value of TOA was large (i.e., when the Tin appears long before or after Tout), due to a lack of visual input to one of the columns during the interval between target onsets and activity within that column being suppressed by visually driven activity within the other. On the other hand, when the TOA was close to zero, choice behavior was less determined by the TOA. This occurs because the time interval in which visual input differs is near zero and thus the outcome of the competition between Tin and Tout pools depends on other inputs. Fig. 3A (Inset) depicts the simulated response trajectories of pools within the superficial layers of Tin and Tout columns during trials in which either the first- or the second-appearing target was chosen (Fig. S2). In the latter case, the responses of the two pools tended to remain equal in the interval between target onsets (indicated by the gray circle), but could diverge later solely due to random fluctuations in the inputs. Notably, we found that the model’s choice behavior could be fit as a sigmoid function of the TOA, similar to the experimental data (Fig. S3A). Moreover, by modulating the background inputs and overall visual inputs we could account for the experimentally observed variability in PES and σ-values (Fig. S3B and SI Text, Computational Model).
After establishing that the model can qualitatively replicate saccadic choice behavior during the control experiments, we next studied how independently altering different elements of the network changes that behavior (SI Text, Effects of Drug-Induced Alterations on the Model’s Choice Behavior). The model considered two classes of alterations, one static and one dynamic; the former corresponded to alterations to history-independent synaptic efficacy, whereas the latter corresponded to alterations to history-dependent processes (e.g., short-term plasticity, STP). First, we found that static alterations to all types of input, recurrent connections, and the excitability of the output pool (via alterations of the efficacy of the deep layer background input) could alter the probability of Tin choices, p(Tin) or equivalently the PES (Fig. 3B). Increasing the efficacy increased p(Tin) for all sites except the inhibitory–excitatory (IE) connections, where increased efficacy decreased p(Tin). These results are expected as increases in the efficacy of the aforementioned sites (except IE connections) increase the activity in the Tin column and therefore increase the selection of the Tin target. Note that alterations in the efficacy of excitatory–inhibitory connections and of superficial to deep layer pools resulted in qualitatively similar effects to those of alterations in IE and the excitability of output pools, respectively. Second, static alteration to all sites produced changes in p(Tin) that depended on the value of σ (stochasticity in choice) (Fig. S4), resembling our experimental observations. Namely, changes in p(Tin) were larger when σ was larger for static alterations to all sites except the deep layers where these changes were smaller for larger σ-values (see SI Text, Comparison of the Effects of Alterations to the Superficial Layers vs. the Deep Layers for an intuitive explanation). 
Third, we found that altering value-based inputs was the only static alteration that could produce changes in the RI (Fig. S4). This is because only the efficacy of value-based input carried information about the choice on the previous trial. Fourth, dynamic alteration to all sites except value-based input [via changing the rates of long-term depression (LTD) and potentiation (LTP)] produced changes in p(Tin). Moreover, as with the static alterations, changes in p(Tin) depended on the value of σ. Fifth, dynamic alteration to all sites except visual and background inputs altered the RI (Fig. S5). Specifically, reduction in STP at recurrent connections resulted in changes in the RI that were positive at excitatory–excitatory (EE) and negative at IE connections. Note that STP reduces the strength of EE connections within the recently active column and results in alternation; therefore, reduction in STP increases repetition. In addition, increases in the rates of LTD and LTP and increases in afterdepolarization (AD, an increase in membrane potential that is dependent upon a preceding action potential for its initiation) both increased the RI (Fig. S5).
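The link between STP at EE connections and alternation can be illustrated with a generic short-term depression variable, sketched below. The form dx/dt = (1 − x)/τ_rec − u·x·r is a common phenomenological depression model and is assumed here for illustration; the model's actual STP equations are given in the SI Text. All parameter values and the function name are hypothetical.

```python
def efficacy_after_trial(u, rate_hz=30.0, active_s=0.3, iti_s=1.0,
                         tau_rec_s=3.0, dt_s=0.001):
    """Synaptic efficacy x (1.0 = fully recovered) at the next trial's start.

    u: strength of depression per unit presynaptic rate (0 disables STP).
    The column is active at rate_hz for active_s seconds, then silent for
    the inter-trial interval iti_s while the efficacy recovers.
    """
    x = 1.0
    # Active phase: efficacy is depleted in proportion to firing rate.
    for _ in range(round(active_s / dt_s)):
        x += dt_s * ((1.0 - x) / tau_rec_s - u * x * rate_hz)
    # Inter-trial interval: efficacy recovers toward 1 with tau_rec_s.
    for _ in range(round(iti_s / dt_s)):
        x += dt_s * (1.0 - x) / tau_rec_s
    return x

x_full_stp = efficacy_after_trial(u=0.05)      # baseline depression
x_reduced_stp = efficacy_after_trial(u=0.025)  # STP reduced by half
x_no_stp = efficacy_after_trial(u=0.0)         # no depression at all
```

In this sketch the column that was just active starts the next trial with weaker recurrent excitation than the inactive column, which favors alternation; reducing u shrinks that disadvantage, in line with the statement that reduced STP at EE connections increases repetition.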
After exploring the effects of manipulating individual elements of the network, we used a combination of those alterations to reproduce our experimental findings (Fig. 4). Specifically, we sought to recapitulate the increase in p(Tin) observed with both the D1R and the D2R manipulations, opposite effects of the two manipulations on the RI, and the contrasting correlations between the p(Tin) increases and baseline σ. The modeling results summarized in Fig. 3B show that negative correlations between the p(Tin) increases and σ can be obtained only via alterations to activity in the deep layer pool, whereas any alteration to the superficial layers results in a positive correlation between the p(Tin) increases and σ. We reproduced the D1R effects with the network simulations by (i) increasing the efficacy of all inputs and both types of recurrent connections, (ii) decreasing the rates of LTD and LTP, and (iii) decreasing STP at all inputs and recurrent connections (EE and IE), but less strongly at EE connections (Fig. S6). The D2R effects were reproduced by (i) increasing the excitability of the output pool and (ii) increasing the AD within the output pool (Fig. S6). Both sets of network alterations resulted in increases in p(Tin) that were comparable to those observed experimentally (∼0.08) (Fig. 4A). In contrast, the two sets of alterations produced opposite changes to the RI. The first set of alterations produced D1R-like decreases in the RI (ΔRI = −0.04), whereas the second set produced D2R-like increases in the RI (ΔRI = 0.04) (Fig. 4B). The D1R-like decrease in the RI was achieved via imposing weaker decreases in STP at EE connections compared with other sites. The D2R-like increase in the RI was achieved via increases in the AD within the output pool.
Finally, similar to the experimental results, the first set of alterations yielded an increase in p(Tin) that was positively correlated with σ during control trials, whereas the second set of alterations produced an increase in p(Tin) that was negatively correlated with σ (Fig. 4C).
Fig. 4.
Replication of experimental results with two sets of alterations to network elements. (A–C) Distributions of p(Tin) (A) and RIs (B) before and after the two sets of network alterations and correlations (C) between alteration-induced changes in p(Tin) and stochasticity of choice (σ). Red and blue symbols indicate D1R-like and D2R-like effects, respectively. Other conventions are as in Fig. 2.
Discussion
We observed dissociable influences of DA-mediated FEF activity on saccadic target selection. Increases in target selection were achieved either by blocking D1Rs or by stimulating D2Rs. However, the former manipulation decreased the tendency to repeat choices on subsequent trials, whereas the latter increased that tendency. Moreover, the amount of shift in choice resulting from blocking D1Rs was positively correlated with baseline stochasticity of choice, whereas the amount of shift due to stimulating D2Rs was negatively correlated with stochasticity. A simple prediction of these results is that decreases in target selection would result from stimulating D1Rs or blocking D2Rs, yet the dissociable effects on repetition and choice stochasticity would remain. Our simulations using a plausible cortical model of target selection reproduced the experimental effects and pointed to the biophysical mechanisms underlying DA’s influence on target selection. Specifically, our model predicts that D1Rs influence target selection mainly through their effects on the strength of inputs to the FEF and on recurrent connectivity within superficial layers, whereas D2Rs influence the excitability of FEF output neurons.
It is significant that our cortical network model was able to reproduce the experimental results via a unique (in terms of loci and relative strength) set of alterations to multiple network elements (Fig. S6). Nonetheless, it is also important to consider whether the alterations needed to achieve the modeling results are consistent with the known prefrontal distributions of D1Rs and D2Rs and their effects on neural activity (11). Notably, it is known that D1Rs modulate the efficacy of inputs (18, 19) and both excitatory and inhibitory recurrent connections (20, 21), modulate STP at inputs (18) and recurrent connections (22–24), and influence both LTD and LTP within the PFC (25–28). These properties of D1Rs are consistent with the network alterations necessary to reproduce the experimental results, namely increases in the efficacy of inputs and recurrent connections, decreases in STP, and decreases in the rates of LTD and LTP. The requirement of stronger modulation of STP in inhibitory interneurons is also compatible with the observation that GABA-ergic activity within the FEF contributes to target selection (17). Moreover, the decrease in rates of LTD and LTP is consistent with a recent finding that D1R-mediated activity within the lateral prefrontal cortex contributes to learning of novel visuomotor associations (29).
In addition, D2Rs are expressed largely within deep layers of cortex (30, 31), where in the FEF, layer V pyramidal neurons provide the primary output to the superior colliculus and brainstem oculomotor nuclei (32, 33). In contrast, D1Rs are expressed throughout cortical layers (30). Furthermore, prefrontal D2Rs are known to enhance the excitability of layer V pyramidal neurons (34) and it was recently reported that AD is also enhanced by D2R agonists in a subtype of layer V pyramidal neurons (35), thereby prolonging activity of these neurons for hundreds of milliseconds. Consistent with these properties, increases in the excitability and AD of the deep layer pool were precisely the alterations to the model network required to reproduce our experimental D2R effects. Thus, not only was the model able to reproduce the dissociable effects of the D1R and D2R manipulations on target selection, but also the alterations to the model required to reproduce those effects were consistent with the known properties of D1Rs and D2Rs and their distributions across cortical layers.
Role of Dopamine in Saccadic Target Selection.
Saccadic target selection is determined in part by the reward value of potential targets (1–3). Given the clear role of the FEF in target selection, it is surprising that few studies have explored the contribution of FEF dopamine to this behavior. It has been shown that reward-dependent modulation of saccadic target selection relies on dissociable effects of D1Rs and D2Rs on neural activity in the striatum, presumably through modulation of long-term synaptic plasticity in the caudate (7, 36). Both caudate and FEF neurons have been shown to exhibit modulation of activity by the reward value of saccadic targets (37). Thus, it is important to determine the relative contributions of these two areas to value-based target selection. Similar to the caudate studies, our study also demonstrates dissociable roles of D1R- and D2R-mediated FEF activity on target selection. However, it is possible that dopaminergic modulations within these two areas could contribute to reward control of target selection under different conditions: modulation within the FEF for when the reward values are changing or are unpredictable (e.g., timing of the second target in our experiment) and modulation within the caudate for when the rewards for different targets are fixed and predictable (e.g., small and big rewards) and a bias is desirable (7, 36).
The dissociable dopaminergic effects we observed suggest potential mechanisms that may underlie different aspects of adaptive choice behavior. First, we found opposing effects of the D1R and D2R manipulations on repetition (RI), namely decreases and increases in repetition for the two manipulations, respectively. The observed decrease in repetition following the infusion of a D1R antagonist is consistent with previous results showing increases in repetitive behavior (perseveration) following the infusion of a D1R agonist into prefrontal cortex (16). One might suggest that variation in the degree of repetition may be related to the exploration–exploitation trade-off observed in adaptive choice behavior. This trade-off reflects the balance between maximizing reward based on current knowledge and testing alternative actions to acquire new knowledge (38). A recent study found that DA levels in mice influence the degree of exploitation (39). Another study in humans found evidence of a correlation between exploration and prefrontal DA (40). In the context of this trade-off, the D1R effects we observed might be viewed as an increase in exploration, whereas the D2R effects might be viewed as an increase in exploitation.
Second, although both manipulations equally increased the probability of Tin selection, those increases depended on the baseline stochasticity in choice in opposite ways. For the D1R antagonist, larger increases in Tin selection were associated with greater stochasticity in choice with respect to the TOA. In contrast, for the D2R agonist, larger increases in Tin selection were associated with smaller stochasticity. This dissociation suggests a possible mechanism for adjusting behavior according to the sensitivity to relevant information. Given that greater stochasticity reflects lower sensitivity to visual information, specifically the TOA, DA could adjust choice bias according to that sensitivity. Differential adjustments of choice bias according to sensitivity could be beneficial when multiple cues exist, but only one carries reward information. When behavior relies on the relevant cue and cue sensitivity is high, the degree of bias adjustment should be small (i.e., inversely proportional to sensitivity). On the other hand, when behavior relies on an irrelevant cue, and sensitivity to that cue is high, then the degree of bias adjustment should be large (i.e., proportional to sensitivity). The latter case may be particularly important for escaping local maxima in reward maximization (38).
Previous modeling work has helped elucidate how DA contributes to working memory via D1R-mediated differential changes to NMDA and AMPA currents (41) and differential dopaminergic modulation of NMDA currents in excitatory and inhibitory neurons (42). Here, we were able to pinpoint possible neural mechanisms through which D1Rs and D2Rs differentially alter saccadic target selection by virtue of their effects in different cortical layers. Our model suggests how dopaminergic modulation of the afferents to the FEF could alter reward-dependent choice. The predictions of this model could be tested in experiments in which reward delivery is probabilistic and therefore the animal’s choice is determined by the integration of reward history (43, 44). One might predict, for example, that after blocking D1Rs within the FEF, the form and time constant of reward integration would be altered such that the impact of previous rewards on current choices could be increased or decreased. Such an outcome could further clarify the contribution of FEF dopamine to reward-dependent choice behavior.
Materials and Methods
Experimental Procedures.
Two monkeys (Macaca mulatta) were trained on a saccadic choice task. All experimental procedures were in accordance with the National Institutes of Health Guide for the Care and Use of Laboratory Animals, the Society for Neuroscience Guidelines and Policies, and the Stanford University Animal Care and Use Committee. For detailed general and surgical procedures see SI Text, Experimental Procedures.
Dopaminergic Manipulation of FEF Activity.
We used a microinjectrode system for simultaneous microstimulation and microinfusion of drugs (45). The center of the RF of the FEF site under study was defined by the endpoint of the saccades evoked by its electrical microstimulation. We positioned one of the targets (Tin) in the RF of the FEF site and the other one in the opposite hemifield (Tout). Small volumes (0.5–1 µL) of the selective D1R antagonist SCH23390 or the D2R agonist quinpirole were delivered into the FEF at infusion rates below 100 nL/min. Volumes of this size diffuse ∼1–3 mm within the cortex (46) and thus affect neurons within only a few columns of the FEF. Both drugs were obtained from Sigma-Aldrich. The acidic pH of the D1R antagonist solution was adjusted to 5.5–6.0 before the infusion, whereas the D2R agonist required no pH adjustment. Given the limit on the number of repeated infusions possible in a single cortical region, due to the risk of damage (45), we chose the two (of four possible) dopaminergic manipulations most likely to yield interpretable results.
Data Analysis.
To quantify target selection, for each experiment we measured the psychometric function by computing the probability of selecting Tin as a function of the TOA, for trials before and after drug infusion. The psychometric function was then fitted by a sigmoid (logistic function), which yielded two parameters: a bias parameter (PES) that determined the TOA for which the two targets were selected with equal frequency and a measure of stochasticity in choice (σ, often referred to as the temperature) with respect to the TOA. For one of the D2R manipulation experiments in monkey B, the choice probability plateaued at about 0.2 and 0.8 for the minimum and maximum values of the TOA, respectively. To get a better fit for this experiment, we bounded our logistic function between 0.2 and 0.8; however, our results were unaffected by this choice of fitting.
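The fitting procedure described above can be illustrated with a minimal sketch. It assumes the logistic takes the form P(Tin) = lo + (hi − lo) / (1 + exp(−(TOA − PES)/σ)), with larger TOA favoring Tin; the sign convention, the function names, the grid-search fitting method, and the example TOA values are all assumptions made for illustration and are not taken from the study. Setting lo = 0.2 and hi = 0.8 corresponds to the bounded fit mentioned for the one monkey B experiment.

```python
import math

def psychometric(toa, pes, sigma, lo=0.0, hi=1.0):
    """P(Tin) as a logistic function of TOA, bounded between lo and hi.

    Assumed sign convention: larger TOA favors Tin. When lo + hi = 1,
    P(Tin) equals 0.5 exactly at toa == pes.
    """
    return lo + (hi - lo) / (1.0 + math.exp(-(toa - pes) / sigma))

def fit_psychometric(toas, p_tin, lo=0.0, hi=1.0, n_pes=401, n_sigma=200):
    """Least-squares grid search for (pes, sigma).

    A simple stand-in for a proper optimizer: pes is searched over the
    tested TOA range and sigma over (span / n_sigma) .. span.
    """
    t_min, t_max = min(toas), max(toas)
    span = t_max - t_min
    best_err, best_pes, best_sigma = float("inf"), None, None
    for i in range(n_pes):
        pes = t_min + span * i / (n_pes - 1)
        for j in range(1, n_sigma + 1):
            sigma = span * j / n_sigma
            err = sum((psychometric(t, pes, sigma, lo, hi) - p) ** 2
                      for t, p in zip(toas, p_tin))
            if err < best_err:
                best_err, best_pes, best_sigma = err, pes, sigma
    return best_pes, best_sigma

# Hypothetical TOA values and noiseless choice probabilities (illustration only).
toas = [-120, -80, -40, 0, 40, 80, 120]
p_tin = [psychometric(t, pes=20.0, sigma=30.0) for t in toas]
pes_hat, sigma_hat = fit_psychometric(toas, p_tin)
print(f"PES ~ {pes_hat:.1f}, sigma ~ {sigma_hat:.1f}")
```

In this parameterization a shift in PES moves the curve along the TOA axis (a change in bias), whereas a larger σ flattens it (more stochastic choice); with the unbounded form, the slope at the PES is 1/(4σ).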
Unless otherwise mentioned, we used the Wilcoxon signed-rank test for comparisons between control and drug experiments and the Wilcoxon rank-sum test for comparisons between the two drug conditions (for which the z and P values are reported). For correlation measures, we report the Pearson correlation coefficient and its significance value. Unless otherwise mentioned, data are expressed as mean ± SEM.
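As an illustration of the between-condition comparison, the sketch below computes a Wilcoxon rank-sum z value and two-sided P value using the standard normal approximation, along with a mean and SEM. The function names and the example numbers are hypothetical, and the text does not specify whether tie or continuity corrections were applied in the actual analysis, so none are applied here.

```python
import math
import statistics

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_sum_test(x, y):
    """Rank-sum z and two-sided P (normal approximation, no tie correction)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])                      # rank sum of the first sample
    mean_w = n1 * (n1 + n2 + 1) / 2.0
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (w - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2.0))   # two-sided normal tail
    return z, p

def mean_sem(data):
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    return statistics.mean(data), statistics.stdev(data) / math.sqrt(len(data))

# Hypothetical per-experiment values for two drug conditions (illustration only).
cond_a = [1.0, 2.0, 3.0, 4.0, 5.0]
cond_b = [6.0, 7.0, 8.0, 9.0, 10.0]
z, p = rank_sum_test(cond_a, cond_b)
m, sem = mean_sem(cond_a)
print(f"z = {z:.3f}, P = {p:.4f}, mean ± SEM = {m:.2f} ± {sem:.2f}")
```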
Model Implementation.
To simulate the superficial layers we implemented a mean-field reduction of a detailed spiking network model (47). The details of this implementation have been described previously (47, 48). The deep layer pools were simulated using a firing-rate model with a realistic response function. In addition, we incorporated short-term (STP) and long-term synaptic plasticity in the inputs, STP in connections between pools in the superficial layers, and the effect of afterdepolarization on neural activity in the deep layer pools (SI Text, Computational Model).
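The exact response function and parameters of the deep layer pools are given in the SI and are not reproduced here. As a rough sketch of what a firing-rate unit with a nonlinear response function looks like, the code below uses the input-output form f(I) = (aI − b) / (1 − exp(−d(aI − b))) with the parameter values reported for the mean-field reduction in ref. 47. Applying that form to the deep layer pools, the 20-ms time constant, and the function names are assumptions made for illustration only; plasticity and afterdepolarization are omitted.

```python
import math

# f-I curve parameters assumed from ref. 47 (Hz/nA, Hz, s); illustrative only.
A, B, D = 270.0, 108.0, 0.154

def response_function(current_na):
    """Firing rate (Hz) as a smooth threshold-like function of input current (nA)."""
    x = A * current_na - B
    if abs(x) < 1e-9:
        return 1.0 / D            # limit of x / (1 - exp(-D x)) as x -> 0
    if -D * x > 700.0:
        return 0.0                # avoid overflow far below threshold
    return x / (1.0 - math.exp(-D * x))

def simulate_rate(current_na, tau=0.02, dt=0.001, t_total=1.0, r0=0.0):
    """Euler integration of tau * dr/dt = -r + f(I) for a constant input.

    tau (s) is a hypothetical time constant chosen for this sketch.
    """
    r = r0
    target = response_function(current_na)
    for _ in range(int(round(t_total / dt))):
        r += dt / tau * (target - r)
    return r

steady = response_function(0.5)
final = simulate_rate(0.5)
print(f"f(0.5 nA) = {steady:.2f} Hz, simulated rate after 1 s = {final:.2f} Hz")
```

In a sketch of this kind, a change in the excitability of a pool could be expressed as a change in the gain or threshold of the response function, whereas a change in input strength would scale the current passed to it.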
Simulation of Dopaminergic Manipulations.
We assumed that reward harvest is signaled globally by the phasic activity of midbrain DA and this signal results in an elevation of prefrontal DA for a few hundred milliseconds, during which short-term and long-term plasticity are modulated by DA (see SI Text, Computational Model for more details). In addition, the drug manipulation could alter connections, synaptic plasticity, or neural excitability of the column that was infused with drug, in two different ways. First, the drug manipulation could change the synaptic efficacy of a given pathway or the neural excitability of a pool (static alterations). Second, the drug manipulation could alter STP of a given pathway, DA-dependent rates of LTD and LTP of value-based inputs, or the activity-dependent change in neural excitability between trials (dynamic alterations). These alterations could affect the network through the following pathways: background, visual, and value-based inputs to the superficial layers; the strength of connections between excitatory pools, between inhibitory and excitatory pools, and from superficial to deep layer pools; and the excitability of the deep layer pools (see SI Text, Computational Model and Effects of Drug-Induced Alterations on the Model’s Choice Behavior for more details).
Acknowledgments
We thank D. S. Aldrich for technical assistance and K. L. Clark and M. Zirnsak for helpful comments on the manuscript. This work was supported by National Institutes of Health Grant EY014924.
Footnotes
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1221236110/-/DCSupplemental.
References
1. Navalpakkam V, Koch C, Rangel A, Perona P. Optimal reward harvesting in complex perceptual environments. Proc Natl Acad Sci USA. 2010;107(11):5232–5237. doi: 10.1073/pnas.0911972107.
2. Markowitz DA, Shewcraft RA, Wong YT, Pesaran B. Competition for visual selection in the oculomotor system. J Neurosci. 2011;31(25):9298–9306. doi: 10.1523/JNEUROSCI.0908-11.2011.
3. Schütz AC, Trommershäuser J, Gegenfurtner KR. Dynamic integration of information about salience and value for saccadic eye movements. Proc Natl Acad Sci USA. 2012;109(19):7547–7552. doi: 10.1073/pnas.1115638109.
4. Schultz W. Getting formal with dopamine and reward. Neuron. 2002;36(2):241–263. doi: 10.1016/s0896-6273(02)00967-4.
5. Soltani A, Wang X-J. From biophysics to cognition: Reward-dependent adaptive choice behavior. Curr Opin Neurobiol. 2008;18(2):209–216. doi: 10.1016/j.conb.2008.07.003.
6. Huerta MF, Krubitzer LA, Kaas JH. Frontal eye field as defined by intracortical microstimulation in squirrel monkeys, owl monkeys, and macaque monkeys. II. Cortical connections. J Comp Neurol. 1987;265(3):332–361. doi: 10.1002/cne.902650304.
7. Hikosaka O. Basal ganglia mechanisms of reward-oriented eye movement. Ann N Y Acad Sci. 2007;1104:229–249. doi: 10.1196/annals.1390.012.
8. Berger B, Trottier S, Verney C, Gaspar P, Alvarez C. Regional and laminar distribution of the dopamine and serotonin innervation in the macaque cerebral cortex: A radioautographic study. J Comp Neurol. 1988;273(1):99–119. doi: 10.1002/cne.902730109.
9. Noudoost B, Moore T. Control of visual cortical signals by prefrontal dopamine. Nature. 2011;474(7351):372–375. doi: 10.1038/nature09995.
10. Missale C, Nash SR, Robinson SW, Jaber M, Caron MG. Dopamine receptors: From structure to function. Physiol Rev. 1998;78(1):189–225. doi: 10.1152/physrev.1998.78.1.189.
11. Seamans JK, Yang CR. The principal features and mechanisms of dopamine modulation in the prefrontal cortex. Prog Neurobiol. 2004;74(1):1–58. doi: 10.1016/j.pneurobio.2004.05.006.
12. Robbins TW, Arnsten AFT. The neuropsychopharmacology of fronto-executive function: Monoaminergic modulation. Annu Rev Neurosci. 2009;32:267–287. doi: 10.1146/annurev.neuro.051508.135535.
13. Monte-Silva K, et al. Dose-dependent inverted U-shaped effect of dopamine (D2-like) receptor activation on focal and nonfocal plasticity in humans. J Neurosci. 2009;29(19):6124–6131. doi: 10.1523/JNEUROSCI.0728-09.2009.
14. Williams GV, Goldman-Rakic PS. Modulation of memory fields by dopamine D1 receptors in prefrontal cortex. Nature. 1995;376(6541):572–575. doi: 10.1038/376572a0.
15. Vijayraghavan S, Wang M, Birnbaum SG, Williams GV, Arnsten AFT. Inverted-U dopamine D1 receptor actions on prefrontal neurons engaged in working memory. Nat Neurosci. 2007;10(3):376–384. doi: 10.1038/nn1846.
16. Zahrt J, Taylor JR, Mathew RG, Arnsten AF. Supranormal stimulation of D1 dopamine receptors in the rodent prefrontal cortex impairs spatial working memory performance. J Neurosci. 1997;17(21):8528–8535. doi: 10.1523/JNEUROSCI.17-21-08528.1997.
17. Schiller PH, Tehovnik EJ. Cortical inhibitory circuits in eye-movement generation. Eur J Neurosci. 2003;18(11):3127–3133. doi: 10.1111/j.1460-9568.2003.03036.x.
18. Law-Tho D, Hirsch JC, Crepel F. Dopamine modulation of synaptic transmission in rat prefrontal cortex: An in vitro electrophysiological study. Neurosci Res. 1994;21(2):151–160. doi: 10.1016/0168-0102(94)90157-0.
19. Urban NN, González-Burgos G, Henze DA, Lewis DA, Barrionuevo G. Selective reduction by dopamine of excitatory synaptic inputs to pyramidal neurons in primate prefrontal cortex. J Physiol. 2002;539(Pt 3):707–712. doi: 10.1113/jphysiol.2001.015024.
20. Gao WJ, Krimer LS, Goldman-Rakic PS. Presynaptic regulation of recurrent excitation by D1 receptors in prefrontal circuits. Proc Natl Acad Sci USA. 2001;98(1):295–300. doi: 10.1073/pnas.011524298.
21. Gao W-J, Wang Y, Goldman-Rakic PS. Dopamine modulation of perisomatic and peridendritic inhibition in prefrontal cortex. J Neurosci. 2003;23(5):1622–1630. doi: 10.1523/JNEUROSCI.23-05-01622.2003.
22. Seamans JK, Durstewitz D, Christie BR, Stevens CF, Sejnowski TJ. Dopamine D1/D5 receptor modulation of excitatory synaptic inputs to layer V prefrontal cortex neurons. Proc Natl Acad Sci USA. 2001;98(1):301–306. doi: 10.1073/pnas.011518798.
23. González-Burgos G, Krimer LS, Urban NN, Barrionuevo G, Lewis DA. Synaptic efficacy during repetitive activation of excitatory inputs in primate dorsolateral prefrontal cortex. Cereb Cortex. 2004;14(5):530–542. doi: 10.1093/cercor/bhh015.
24. Young CE, Yang CR. Dopamine D1-like receptor modulates layer- and frequency-specific short-term synaptic plasticity in rat prefrontal cortical neurons. Eur J Neurosci. 2005;21(12):3310–3320. doi: 10.1111/j.1460-9568.2005.04161.x.
25. Otani S, Blond O, Desce JM, Crépel F. Dopamine facilitates long-term depression of glutamatergic transmission in rat prefrontal cortex. Neuroscience. 1998;85(3):669–676. doi: 10.1016/s0306-4522(97)00677-5.
26. Gurden H, Takita M, Jay TM. Essential role of D1 but not D2 receptors in the NMDA receptor-dependent long-term potentiation at hippocampal-prefrontal cortex synapses in vivo. J Neurosci. 2000;20(22):RC106. doi: 10.1523/JNEUROSCI.20-22-j0003.2000.
27. Huang Y-Y, Simpson E, Kellendonk C, Kandel ER. Genetic evidence for the bidirectional modulation of synaptic plasticity in the prefrontal cortex by D1 receptors. Proc Natl Acad Sci USA. 2004;101(9):3236–3241. doi: 10.1073/pnas.0308280101.
28. Matsuda Y, Marzo A, Otani S. The presence of background dopamine signal converts long-term synaptic depression to potentiation in rat prefrontal cortex. J Neurosci. 2006;26(18):4803–4810. doi: 10.1523/JNEUROSCI.5312-05.2006.
29. Puig MV, Miller EK. The role of prefrontal dopamine D1 receptors in the neural mechanisms of associative learning. Neuron. 2012;74(5):874–886. doi: 10.1016/j.neuron.2012.04.018.
30. Lidow MS, Goldman-Rakic PS, Gallager DW, Rakic P. Distribution of dopaminergic receptors in the primate cerebral cortex: Quantitative autoradiographic analysis using [3H]raclopride, [3H]spiperone and [3H]SCH23390. Neuroscience. 1991;40(3):657–671. doi: 10.1016/0306-4522(91)90003-7.
31. Santana N, Mengod G, Artigas F. Quantitative analysis of the expression of dopamine D1 and D2 receptors in pyramidal and GABAergic neurons of the rat prefrontal cortex. Cereb Cortex. 2009;19(4):849–860. doi: 10.1093/cercor/bhn134.
32. Stanton GB, Goldberg ME, Bruce CJ. Frontal eye field efferents in the macaque monkey: I. Subcortical pathways and topography of striatal and thalamic terminal fields. J Comp Neurol. 1988;271(4):473–492. doi: 10.1002/cne.902710402.
33. Segraves MA, Goldberg ME. Functional properties of corticotectal neurons in the monkey’s frontal eye field. J Neurophysiol. 1987;58(6):1387–1419. doi: 10.1152/jn.1987.58.6.1387.
34. Wang Y, Goldman-Rakic PS. D2 receptor regulation of synaptic burst firing in prefrontal cortical pyramidal neurons. Proc Natl Acad Sci USA. 2004;101(14):5093–5098. doi: 10.1073/pnas.0400954101.
35. Gee S, et al. Synaptic activity unmasks dopamine D2 receptor modulation of a specific class of layer V pyramidal neurons in prefrontal cortex. J Neurosci. 2012;32(14):4959–4971. doi: 10.1523/JNEUROSCI.5835-11.2012.
36. Nakamura K, Hikosaka O. Role of dopamine in the primate caudate nucleus in reward modulation of saccades. J Neurosci. 2006;26(20):5360–5369. doi: 10.1523/JNEUROSCI.4853-05.2006.
37. Ding L, Hikosaka O. Comparison of reward modulation in the frontal eye field and caudate of the macaque. J Neurosci. 2006;26(25):6695–6703. doi: 10.1523/JNEUROSCI.0836-06.2006.
38. Sutton RS, Barto AG. Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press; 1998.
39. Beeler JA, Daw N, Frazier CRM, Zhuang X. Tonic dopamine modulates exploitation of reward learning. Front Behav Neurosci. 2010;4:1–14. doi: 10.3389/fnbeh.2010.00170.
40. Frank MJ, Doll BB, Oas-Terpstra J, Moreno F. Prefrontal and striatal dopaminergic genes predict individual differences in exploration and exploitation. Nat Neurosci. 2009;12(8):1062–1068. doi: 10.1038/nn.2342.
41. Durstewitz D, Seamans JK, Sejnowski TJ. Dopamine-mediated stabilization of delay-period activity in a network model of prefrontal cortex. J Neurophysiol. 2000;83(3):1733–1750. doi: 10.1152/jn.2000.83.3.1733.
42. Brunel N, Wang XJ. Effects of neuromodulation in a cortical network model of object working memory dominated by recurrent inhibition. J Comput Neurosci. 2001;11(1):63–85. doi: 10.1023/a:1011204814320.
43. Sugrue LP, Corrado GS, Newsome WT. Matching behavior and the representation of value in the parietal cortex. Science. 2004;304(5678):1782–1787. doi: 10.1126/science.1094765.
44. Soltani A, Wang X-J. A biophysically based neural model of matching law behavior: Melioration by stochastic synapses. J Neurosci. 2006;26(14):3731–3744. doi: 10.1523/JNEUROSCI.5159-05.2006.
45. Noudoost B, Moore T. A reliable microinjectrode system for use in behaving monkeys. J Neurosci Methods. 2011;194(2):218–223. doi: 10.1016/j.jneumeth.2010.10.009.
46. Hupé JM, Chouvet G, Bullier J. Spatial and temporal parameters of cortical inactivation by GABA. J Neurosci Methods. 1999;86(2):129–143. doi: 10.1016/s0165-0270(98)00162-9.
47. Wong K-F, Wang X-J. A recurrent network mechanism of time integration in perceptual decisions. J Neurosci. 2006;26(4):1314–1328. doi: 10.1523/JNEUROSCI.3733-05.2006.
48. Soltani A, Wang X-J. Synaptic computation underlying probabilistic inference. Nat Neurosci. 2010;13(1):112–119. doi: 10.1038/nn.2450.