Summary
Reward from a particular action is seldom immediate, and the influence of a delayed outcome on choice decreases with its delay. It has been postulated that, when faced with immediate and delayed rewards, decision makers choose the option with the maximum temporally discounted value. We examined the preference of monkeys for delayed reward in a novel inter-temporal choice task and the neural basis for real-time computation of temporally discounted values in the dorsolateral prefrontal cortex. During this task, the locations of the targets associated with small and large rewards and their corresponding delays were randomly varied. We found that prefrontal neurons often encoded the temporally discounted value of the reward expected from a particular option. Furthermore, activity tended to increase with the discounted value of the target presented in a neuron's preferred direction, suggesting that activity related to temporally discounted values in the prefrontal cortex might determine the animal's behavior during inter-temporal choice.
Introduction
During decision making, the outcomes expected from alternative actions are often evaluated along multiple dimensions, such as the magnitude and likelihood of expected reward. Furthermore, outcomes from choices are commonly delayed. Therefore, time has to be taken into consideration, not only because of the uncertainty intrinsic to future events, but also because humans and animals must continuously satisfy their basic physiological needs (Stephens and Krebs, 1986). Indeed, it has been extensively demonstrated that both humans and animals value immediate reward more than delayed reward (Frederick et al., 2002; Kalenscher and Pennartz, 2008). This is referred to as temporal discounting, and how the value of reward changes as a function of its delay is described by a discount function. Accordingly, the value of reward depreciated according to its delay is referred to as its temporally discounted value. If future reward is devalued at a constant rate, for example, to compensate for a constant probability that reward may be lost at any given time, then the resulting discount function is an exponential function (Samuelson, 1937). Indeed, an exponential discount function has been observed in human decision makers when exponential discounting was required to maximize overall income (Schweighofer et al., 2006). However, the majority of empirical studies in humans and animals have found that the slope of the discount function, or discount rate, decreases with delay. Namely, an immediate reward is much more attractive than can be accounted for by an exponential discount function. This behavior is reasonably well captured by a hyperbolic discount function (Mazur, 1987; Rachlin et al., 1991; Green and Myerson, 2004).
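The contrast between the two discount functions can be illustrated with a short sketch. The functional forms below follow the standard formulations (exponential, A·e−κD, after Samuelson, 1937; hyperbolic, A/(1 + κD), after Mazur, 1987); the magnitude and discount factor values are illustrative only.

```python
import math

def exponential_value(magnitude, delay, k):
    """Exponentially discounted value: A * exp(-k * D)."""
    return magnitude * math.exp(-k * delay)

def hyperbolic_value(magnitude, delay, k):
    """Hyperbolically discounted value: A / (1 + k * D)."""
    return magnitude / (1.0 + k * delay)

# Compare the two curves for a unit reward and an arbitrary discount factor.
k = 0.3
for delay in (0.0, 2.0, 10.0):
    print(delay,
          round(exponential_value(1.0, delay, k), 3),
          round(hyperbolic_value(1.0, delay, k), 3))
```

With a common discount factor, both curves have the same slope at zero delay, but the hyperbolic discount rate declines with delay, so hyperbolic values remain higher at long delays; this is the sense in which hyperbolic discounting makes an immediate reward disproportionately attractive relative to exponential discounting.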
Abnormal preference for immediate reward during inter-temporal choice has been implicated in several psychiatric disorders, such as attention deficit hyperactivity disorder and substance abuse (Wittmann and Paulus, 2008). Despite the fundamental role of time in decision making, however, the neural basis of temporal discounting remains poorly understood. For example, neural activity related to reward delay has been identified in the rodent orbitofrontal cortex (Roesch et al., 2006) and in the avian analog of the prefrontal cortex (Kalenscher et al., 2005), but such signals were seen largely after the animal had made its choice in a given trial. Therefore, how the brain encodes temporally discounted values to guide the animal's behavior during inter-temporal choice is presently unknown. Furthermore, previous animal studies on inter-temporal choice have relied on adjusting procedures (Mazur, 1987), in which the delay for the small reward is fixed and only the delay for the large reward is changed gradually to determine the delay that equalizes the animal's preference for the small and large rewards (Kalenscher et al., 2005). When the reward delays are adjusted gradually, however, the temporally discounted values of the alternative choices can be predicted from those in the previous trial, making it unnecessary for the animals to compute the temporally discounted values of the alternative outcomes on a trial-by-trial basis. In addition, under an adjusting procedure, animals tend to choose the same target in successive trials (Cardinal et al., 2002), and this makes it difficult to distinguish neural signals related to temporally discounted values from those related to the animal's previous choices and their outcomes (Barraclough et al., 2004; Seo and Lee, 2007). Therefore, in the inter-temporal choice task used in the present study, the magnitude and delay of the reward associated with a given target were varied randomly across trials.
In the present study, we focused on whether and how the dorsolateral prefrontal cortex (DLPFC) is involved in temporal discounting and inter-temporal choice. The DLPFC is implicated in the contextual control of behaviors (Miller and Cohen, 2001) and decision making (Lee et al., 2007). In addition, human DLPFC increases its activity during inter-temporal choice (McClure et al., 2004, 2007; Tanaka et al., 2004; Berns et al., 2007). Previous single-neuron recording studies have also identified neural signals related to the reward magnitude and delay in the DLPFC, when the animal was instructed to produce a particular response (Leon and Shadlen, 1999; Roesch and Olson, 2005a; Tsujimoto and Sawaguchi, 2005). The results from the present study show that many neurons in the DLPFC changed their activity similarly when the magnitude of reward from a particular target decreased and when the same reward was delayed, suggesting that DLPFC encodes the temporally discounted value of reward expected from a particular choice.
Results
Inter-temporal choice behavior in monkeys
Three rhesus monkeys (D, H, and J) were trained in a novel inter-temporal choice task, in which the animal chose between a small immediate reward and a large delayed reward by making an eye movement towards one of two visual targets. A green target delivered a small reward when it was chosen by the animal, whereas a red target delivered a large reward. Throughout the paper, therefore, the green and red targets are referred to as small-reward and large-reward targets, respectively. The delay for reward from each target was indicated by a clock, which consisted of a variable number of dots surrounding the target (Figure 1A). The position of the green and red targets and the reward delays were randomized across trials. In Experiment I (monkeys D and J), the interval between the target fixation and reward delivery was indicated by a clock consisting of yellow dots (1 s/dot) displayed around each target (Figure 1A), whereas in Experiment II (monkeys H and J), the reward delay and the number of dots in the clock were manipulated separately by using two alternative dot colors (Figure 1B; yellow, 1 s/dot; cyan, 4 s/dot).
All three animals chose the small-reward target more frequently as the delay for the small reward decreased and as the delay for the large reward increased (Figure 1C, D), indicating that their preference for delayed reward was systematically influenced by the information about the reward delays extracted from the clocks. In addition, their choice behaviors were better accounted for by a hyperbolic discount function than by an exponential discount function (see Experimental Procedures, Eq 2 and 3). For Experiments I and II (116 and 142 sessions, respectively), a hyperbolic discount function provided a better fit in 81.0% and 79.6% of the sessions, respectively. For both experiments, the mean normalized log likelihood was significantly larger for the hyperbolic discount function (paired t-test, p<10−5). For Experiment I, the median values of discount factor (κ) in the hyperbolic discount function were 0.12 and 0.35 s−1 for monkeys D and J, respectively (Figure 1C). For Experiment II, the corresponding values for monkeys H and J were 0.31 and 0.46 s−1, respectively (Figure 1D).
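The session-by-session model comparison described above can be sketched as follows. This is a hypothetical reconstruction, not the paper's actual code: the hyperbolic discount function, the logistic (softmax) choice rule with inverse temperature beta, and the grid search are assumed forms standing in for Eqs 2 and 3, and the trials and choices below are simulated.

```python
import math

def discounted_value(magnitude, delay, k):
    # Hyperbolic discount function; the exponential alternative would use
    # magnitude * math.exp(-k * delay).
    return magnitude / (1.0 + k * delay)

def neg_log_likelihood(choices, trials, k, beta):
    """choices[i] is 1 if the large-reward target was chosen on trial i;
    trials[i] = (small magnitude, small delay, large magnitude, large delay)."""
    nll = 0.0
    for c, (a_s, d_s, a_l, d_l) in zip(choices, trials):
        diff = discounted_value(a_l, d_l, k) - discounted_value(a_s, d_s, k)
        p_large = 1.0 / (1.0 + math.exp(-beta * diff))  # logistic choice rule
        p = p_large if c == 1 else 1.0 - p_large
        nll -= math.log(max(p, 1e-12))
    return nll

# Simulated choices from an animal discounting hyperbolically with k = 0.3
trials = [(1, d_s, 2, d_l) for d_s in (0, 2, 4) for d_l in (2, 6, 10)]
choices = [1 if discounted_value(2, d_l, 0.3) > discounted_value(1, d_s, 0.3)
           else 0 for (_, d_s, _, d_l) in trials]

# Crude grid search over the discount factor (a real fit would also estimate
# beta and compare hyperbolic vs. exponential normalized log likelihoods)
best_k = min((0.05 * i for i in range(1, 20)),
             key=lambda k: neg_log_likelihood(choices, trials, k, beta=5.0))
```

Fitting both discount functions this way and comparing their maximized (or normalized) log likelihoods is the kind of comparison summarized in the percentages above.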
For Experiment II, the animal's preference for delayed reward was little affected when the same delay was indicated by different combinations of yellow and cyan dots (Figure 1B), suggesting that the animals estimated the reward delays reliably regardless of the number of dots in the clock (Figure 1D, empty symbols). In addition, a maximum-likelihood analysis revealed that the animals estimated the delay associated with each cyan dot accurately (3.83 ± 0.06 s/dot and 3.75 ± 0.06 s/dot for monkeys H and J, respectively; see Experimental Procedures). Finally, the analysis of errors showed that the cost of satisfying the fixation requirement during the reward delay was small. Overall, 16.4% of the trials were aborted due to the animal's failure to maintain fixation during the task, but most of these errors occurred during the fore-period and cue period, with less than 3.5% of the errors committed during the reward delays.
Prefrontal activity related to temporally discounted values
Action potentials were recorded from a total of 349 neurons (164 and 185 neurons in Experiments I and II, respectively) in the DLPFC during the inter-temporal choice task. Among them, 80 neurons were recorded from the region immediately surrounding the principal sulcus, whereas 99 and 170 neurons were recorded from the areas dorsal and ventral to the principal sulcus, respectively (Figure 2). To test whether the activity of DLPFC neurons is reliably related to the temporally discounted value of reward expected from a particular target, we estimated the temporally discounted values of both targets on a trial-by-trial basis using the discount function obtained from the animal's behavior in each session. Many DLPFC neurons modulated their activity during the cue period according to the temporally discounted value of one or both targets (see Experimental Procedures, Eq 5). For example, the neuron shown in Figure 3A significantly decreased its activity with the temporally discounted value of the right-hand target (Figure 3A, last column), whereas the neuron shown in Figure 3C increased and decreased its activity with the temporally discounted values of the left-hand and right-hand targets, respectively (Figure 3C, last two columns). The effect of temporally discounted value was significant in 32.4% of the neurons (113/349; t-test, p<0.05, corrected for multiple comparisons). We also examined the effect of temporally discounted values by treating the two targets as separate cases (698 cases = 2 targets × 349 neurons). In this analysis, the effect of temporally discounted value was significant in 27.2% of the cases (190/698); the activity increased and decreased with the temporally discounted value in 75 and 115 cases, respectively.
A correlation between neural activity and temporally discounted value could, in principle, arise from the effect of reward magnitude or reward delay alone. If a neuron genuinely encodes the temporally discounted value of reward for a particular target, increasing the magnitude of reward and decreasing its delay should change the activity of the neuron similarly, since both of these changes increase the temporally discounted value of the same target. To test this, we first applied a regression model that included two dummy variables corresponding to the animal's choice and the position of the large-reward target, in addition to the delay of the reward associated with each target (see Experimental Procedures, Eq 6). Many DLPFC neurons modulated their activity during the cue period according to one or more variables in this regression analysis. Overall, 64 neurons (18.3%) significantly changed their activity according to the position of the target chosen by the animal. In addition, 110 neurons (31.5%) modulated their activity according to the position of the target associated with the large reward. The effect of reward delay on the activity of a given neuron was evaluated separately for the two target positions. Among the total of 698 cases, a significant effect of reward delay was found in 158 cases (22.6%). In 73 of these cases (46.2%), the activity was also significantly affected by the reward magnitude. When the effects of reward magnitude and delay from the same target were both significant, these two factors tended to influence the activity of a given neuron antagonistically (63 cases, 86.3%), as expected for signals related to temporally discounted values. For example, the two neurons shown in Figures 3A and 3C decreased their activity when the large-reward target appeared on the right side (Figure 3, first two columns).
The results from the regression analysis also showed that the neuron shown in Figure 3A increased its activity with the reward delay for the right-hand target. In contrast, the neuron shown in Figure 3C significantly increased its activity when the reward delay decreased for the left-hand target. Neurons that significantly modulated their activity according to reward magnitude, reward delays, and temporally discounted values were distributed broadly within the DLPFC (Figure 2). The proportions of such neurons were not significantly different for the regions dorsal and ventral to the principal sulcus (χ2-test, p>0.5).
The regression coefficients estimated in the same regression model are not independent. Therefore, to test more directly how the activity of a given neuron was affected by changing the magnitude and delay of reward expected from a given target, we computed a magnitude index and a delay index, which measure the effects of increasing the reward magnitude and delay for a given target, respectively, from two separate subsets of trials. To control for the effect of the animal's choice, these indices were calculated separately according to the position of the target chosen by the animal (Figure 4A; see Experimental Procedures). As a result, this analysis was based on data obtained in a relatively small number of trials for each neuron. Nevertheless, the magnitude index was significantly and negatively correlated with the delay index (r=−0.23, p<0.005; Figure 4B), especially when the analysis included only the cases in which the effect of temporally discounted value was statistically significant in the regression analysis (r=−0.45, p<0.001; Figure 4B, filled symbols). These results suggest that activity related to reward magnitude and delay in the DLPFC can be parsimoniously accounted for by temporally discounted values. Moreover, since the information about the reward magnitude and delay was signaled by two unrelated visual features, these results make it unlikely that neural activity related to reward magnitude and delay merely reflected the corresponding features of the visual display.
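A minimal sketch of how such indices might be computed from trial subsets is given below. The normalized-difference form of the index is an assumption for illustration (the exact definitions are in the Experimental Procedures), and the firing rates are hypothetical.

```python
import statistics

def contrast_index(rates_a, rates_b):
    """Normalized difference in mean firing rate between two trial subsets.
    This particular form is an assumption, not the paper's definition."""
    a, b = statistics.fmean(rates_a), statistics.fmean(rates_b)
    return (a - b) / (a + b)

# Hypothetical neuron whose rate rises with discounted value: a larger reward
# raises the rate, and a longer delay lowers it.
magnitude_index = contrast_index([14.0, 15.0, 16.0],   # large-reward trials
                                 [10.0, 11.0, 12.0])   # small-reward trials
delay_index = contrast_index([9.0, 10.0, 11.0],        # long-delay trials
                             [13.0, 14.0, 15.0])       # short-delay trials
```

For such a value-coding neuron the two indices have opposite signs, which across a population of neurons produces the negative correlation reported for Figure 4B.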
To examine the time course of signals related to temporally discounted values, the same regression analyses described above were also applied with a 200-ms sliding window. The results show that signals related to temporally discounted values emerged in the DLPFC earlier than signals related to the animal's choice. For example, the fraction of neurons showing significant effects of temporally discounted values, reward magnitude, or reward delay increased rapidly at the beginning of the cue period, whereas the fraction of neurons showing significant effects of the animal's choice increased more gradually during the cue period (Figure 5A and 5B). We also quantified the amount of variance in neural activity accounted for by a particular set of variables using the coefficient of partial determination (CPD; see Experimental Procedures). Consistent with the results on the fraction of neurons with significant effects, the average CPD for the entire population of DLPFC neurons increased more rapidly for temporally discounted values, reward magnitude, and reward delays than for the animal's choice (Figure 5C and 5D). The same analysis showed that some neurons (N=88, 25.2%) still modulated their activity according to the temporally discounted values during the first 200 ms after the cue period.
Activity related to reward value vs. sensory features
The information about reward magnitude and delay was indicated by visual stimuli during the inter-temporal choice task used in the present study. The fact that their effects on neural activity were systematically related (Figure 4B) makes it unlikely that the activity related to temporally discounted value simply reflected the information about target colors or the number of dots in the clocks. To test this possibility further, we analyzed the activity recorded during the control task in Experiment I. Target colors and clocks were identical in the inter-temporal choice and control tasks. However, during the control task, the animal was required to choose the target presented in the same color as the fixation target, and always received the same amount of juice reward without any delay (see Experimental Procedures). Therefore, temporally discounted values were constant for all targets in the control trials. For comparison with the results obtained from the inter-temporal choice task, however, we computed a fictitious discounted value for each target in the control task, as if its reward magnitude and delay were determined by its color and clock as in the inter-temporal choice task. For each neuron, we then tested whether the neural activity was differentially affected by the temporally discounted values in the inter-temporal choice trials and the fictitious discounted values in the control trials, using an interaction term between task and temporally discounted value in a regression model (see Experimental Procedures, Eq 7).
This analysis was applied separately to the two targets for each neuron (328 cases = 2 targets × 164 neurons). During the inter-temporal choice task, 79 cases (24.1%) showed significant effects of the discounted values, whereas 60 cases (18.3%) showed significant effects of the fictitious discounted values during the control task. Although this difference was not statistically significant (χ2-test, p=0.07), the magnitudes of the standardized regression coefficients related to the fictitious discounted values were significantly smaller than those related to the temporally discounted values (paired t-test, p<10−7). Therefore, some DLPFC neurons continued to process the information about temporally discounted values even when these values were no longer behaviorally relevant, but this effect was attenuated. Among the 79 cases that showed a significant effect of temporally discounted values during the inter-temporal choice task, 30 cases (38.0%) showed a significant interaction between the temporally discounted value and task. Moreover, in 29 of these 30 cases, the regression coefficients were smaller in the control trials (Figure 6A, empty circles). For example, the effect of discounted value on the activity of the neuron shown in Figure 3A was significantly reduced (p<0.0001) and virtually abolished in the control trials (Figure 3B). Similarly, the effects of the visual features signaling the reward magnitude and delay were significantly attenuated in the control trials (Figure 6, B and C).
We also found that manipulating the number of dots in the clocks separately from the reward delay in Experiment II did not strongly influence the effect of temporally discounted values or reward delays on neural activity (Figure 7). With a regression model that controlled for the animal's choice, 30.0% of the cases in Experiment II (111/370) showed significant modulation of activity related to temporally discounted values. In the majority of these cases (69.4%, 77/111), the effect of temporally discounted value was still significant even when the number of dots for each target was included as an additional independent variable in the regression model (see Experimental Procedures, Eq 8). For example, the neuron shown in Figure 3C showed a significant effect of temporally discounted value for both targets, regardless of whether the number of dots was included in the model (t-test, p<0.005). In addition, the standardized regression coefficients for the temporally discounted values calculated with and without the number of dots in the regression model were highly correlated across the population of DLPFC neurons (r=0.89, p<10−128; Figure 7A). Similarly, the standardized regression coefficients for the reward delay were relatively unaffected by including the number of dots in the regression model (r=0.82, p<10−92; Figure 7B; see Experimental Procedures, Eq 9). Similar results were obtained with a regression model that included a set of dummy variables indicating the number of dots in the clock for each target. In this model, the effect of a clock with m dots was modeled with a dummy variable Nm, which was set to 1 when the clock for a given target had m dots and 0 otherwise. Therefore, this model included 6 additional regressors, corresponding to the clocks with 2, 5, and 8 dots for each target.
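The dummy coding described above can be written out directly; the function name below is hypothetical, but the indicator scheme and the dot counts (2, 5, and 8) follow the text.

```python
def dot_dummies(n_dots, levels=(2, 5, 8)):
    """Indicator regressors N_m for one target's clock: N_m = 1 if the clock
    has m dots, and 0 otherwise."""
    return [1 if n_dots == m else 0 for m in levels]

# Concatenating the indicators for both targets yields the 6 additional
# regressors described in the text (here, a 5-dot left clock and an
# 8-dot right clock).
regressors = dot_dummies(5) + dot_dummies(8)
```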
The majority of the neurons which showed significant effects of temporally discounted values in the original model still did so in this new model (61.3%, 68/111 cases). The standardized regression coefficients for the temporally discounted values in these two models were highly correlated (r=0.84, p<10−97; Figure 7C). Similarly, the standardized regression coefficients for the reward delay were also relatively unaffected (r=0.77, p<10−71; Figure 7D).
Activity related to temporally discounted value and choice
Lastly, we investigated whether and how the prefrontal activity related to temporally discounted values could contribute to the animal's choice. If the activity of a given neuron is affected equally by the temporally discounted values of the two alternative targets, such activity would not be related to the difference in temporally discounted values between the two targets, and cannot influence the animal's behavior during the inter-temporal choice task. In contrast, we found that signals related to the temporally discounted values were spatially biased. Most of the neurons that significantly modulated their activity in relation to temporally discounted values showed such significant effects only for one of the targets (77.0%, 87/113 neurons). In addition, many of these neurons (46.0%, 52/113 neurons) also showed significant modulation according to the difference in the temporally discounted values for the two targets (see Experimental Procedures, Eq 10). This implies that DLPFC activity tends to encode the temporally discounted value of a target in a specific location.
To investigate further how the signals in the DLPFC related to temporally discounted values might contribute to the animal's choice, we examined the neurons that modulated their activity according to the animal's choice during the first 200 ms interval after the cue period. Among these 118 directionally tuned neurons, 26 (22.0%) significantly modulated their activity during the cue period according to the difference between the temporally discounted values of the two targets (Figure 8A). Most of these neurons (80.8%, 21/26 neurons) increased their activity with the temporally discounted value of the target in the preferred direction. We also examined the subset of these neurons that showed significant effects of temporally discounted values for both targets. The number of neurons that passed these conjunctive tests was relatively small (N=16). Nevertheless, the number of neurons that significantly increased and decreased their activity with the temporally discounted values of the targets in their preferred and null directions, respectively (N=9, 56%), was significantly higher than expected by chance (25%; binomial test, p<0.002).
To gain additional insights into how the signals related to temporally discounted values in the DLPFC evolved during the cue period, we performed a regression analysis with a sliding window for the neurons that showed significant directional selectivity during the first 200 ms interval after the cue period as well as significant effects of temporally discounted values for at least one target during the cue period (N=55 neurons). These neurons were chosen without correcting for multiple statistical tests, since the goal of this analysis was not to determine the fraction of neurons with significant effects of temporally discounted values. For these neurons, the standardized regression coefficients for the temporally discounted values in each time step were averaged separately for the targets in the preferred and null directions of each neuron. The results show that during the cue period, the activity of such neurons tended to increase (decrease) gradually with the temporally discounted value of the target in the neuron's preferred (null) direction (Figure 8C). Similarly, the standardized regression coefficients estimated for the entire cue period were significantly larger for the preferred direction than for the null direction (paired t-test, p<10−4). A subset of these neurons (N=25) was also tested in the control task of Experiment II. However, the standardized regression coefficients estimated for the fictitious discounted values based on the activity during the entire cue period were not significantly different for the targets in the preferred and null directions (paired t-test, p=0.123). The same analysis was also repeated after including 21 neurons that showed significant directional tuning and significant effects of fictitious discounted values in the control task; the difference between the preferred and null directions was still not significant (p=0.411; Figure 8B).
Similarly, the regression analysis with a sliding window revealed little divergence for the signals related to the fictitious values of the targets in the preferred and null directions during the control trials (Figure 8D).
Discussion
Using a novel inter-temporal choice task in which reward delays were indicated by clocks, we found that a hyperbolic discount function accounted for the choice behaviors of monkeys better than an exponential discount function. These results are consistent with previous findings in humans (Rachlin et al., 1991; Frederick et al., 2002; Green and Myerson, 2004), pigeons (Mazur, 1987, 2000), and rodents (Richards et al., 1997). Although inter-temporal choice behavior has been examined in non-human primates (Tobin et al., 1996; Rosati et al., 2007), how the subjective value of delayed reward changes with its delay has not been studied quantitatively in monkeys. For an exponential discount function, the rate of discounting is constant, whereas for a hyperbolic discount function the discount rate decreases with delay. Therefore, the results from the present study indicate that monkeys tend to devalue delayed reward more steeply when the delay is relatively short. The discount factors obtained in the present study ranged from 0.12 to 0.46 s−1, indicating that the subjective value of reward was diminished by half when the reward was delayed by 2.2 to 8.3 s. Although this indicates that the monkeys tested in the present study were quite impatient, these values are comparable to those observed in pigeons (0.3 to 2.24 s−1; Mazur, 2000) and rodents (0.07 to 0.36 s−1; Richards et al., 1997).
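The conversion from discount factor to the delay that halves the subjective value follows directly from the hyperbolic form: A/(1 + κD) = A/2 when D = 1/κ. A one-line check of the range quoted above:

```python
def hyperbolic_half_delay(k):
    """Delay D at which A / (1 + k * D) falls to A / 2, i.e. D = 1 / k."""
    return 1.0 / k

# Discount factors at the ends of the range reported above (in 1/s)
half_delays = [round(hyperbolic_half_delay(k), 1) for k in (0.46, 0.12)]
```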
We also found that many neurons in the DLPFC changed their activity according to the temporally discounted value of reward from a particular target before the animal fully committed to a particular choice with its behavioral response. The results from the regression analysis with a sliding window showed that the signals related to temporally discounted values emerged in the DLPFC prior to the signals related to the animal's choice. In contrast, previous neurophysiological studies in birds (Kalenscher et al., 2005) and rodents (Roesch et al., 2006) demonstrated the effect of reward delay emerging after the animal made its choice. We also found that activity in the DLPFC related to temporally discounted value was spatially specific and tended to be spatially congruent with saccade-related activity of individual neurons. Therefore, activity related to temporally discounted values in the DLPFC did not simply reflect the animal's arousal or overall motivational level (Roesch and Olson, 2004, 2005a). Moreover, signals in the DLPFC related to the temporally discounted values for the target in the preferred direction diverged gradually from those related to the target in the null direction. These signals, therefore, might arise from competitive interactions in a recurrent attractor network and ultimately determine the animal's choice (Wang, 2002).
We demonstrated that the neural signals in the DLPFC related to temporally discounted values did not result exclusively from activity related to reward magnitude or reward delay. In many DLPFC neurons, signals related to reward magnitude and delay were combined such that the neurons tended to change their activity similarly in response to an increase in reward magnitude and a decrease in reward delay for a particular target, as expected for temporally discounted value. Previously, several studies have found that the activity of neurons in the DLPFC is often modulated by the magnitude of reward expected from a particular movement (Leon and Shadlen, 1999; Kobayashi et al., 2002), or by the immediacy of reward (Tsujimoto and Sawaguchi, 2005). Roesch and Olson (2005a) also found that the activity of some neurons in the DLPFC was influenced similarly by an increase in reward magnitude and a decrease in reward delay. However, in all of these studies, the animal was forced to produce the movement indicated by a sensory stimulus, and therefore was not allowed to choose its action freely according to its own preference for delayed reward, as in the inter-temporal choice task used in the present study. Therefore, it was difficult to determine whether the DLPFC activity related to reward magnitude and delay in these previous studies could influence the animal's choice behavior. For example, the signals related to reward magnitude and delay found in the study by Roesch and Olson (2005a) were not systematically related to the direction of the animal's eye movement, leading them to conclude that such signals might reflect the level of the animal's motivation rather than the process of evaluating reward for action selection.
Several neuroimaging studies have examined the pattern of brain activations in human subjects during inter-temporal choice. However, the finding in the present study that neurons in the DLPFC encoded the temporally discounted values in a spatially selective manner was not predicted by the results from these neuroimaging studies. This discrepancy is likely due to the limited spatial resolution available in neuroimaging. For example, McClure et al. (2004, 2007) found that activity in the DLPFC and posterior parietal cortex increased during inter-temporal choice, regardless of whether the choice was based on money or juice reward. Although they did not find any changes in DLPFC activity related to reward delays, they found that activity in several brain areas, such as the ventral striatum, medial orbitofrontal cortex, and dorsal anterior cingulate cortex, increased when one of the alternative rewards was immediately available. Similarly, during inter-temporal choice involving hypothetical money, the strength of activation in the striatum and posterior insula, but not in the DLPFC, changed according to whether subjects chose immediate or delayed rewards (Wittmann et al., 2007). Kable and Glimcher (2007) also found that activity in the ventral striatum, medial prefrontal cortex, and posterior cingulate cortex closely mirrored the subject-specific temporally discounted values of delayed reward, but activity in the DLPFC did not show a consistent pattern. The results from the present study suggest that, by encoding the temporally discounted values, the DLPFC might play a more specific role during inter-temporal choice than suggested by these previous neuroimaging studies.
The present study focused on the role of the DLPFC in encoding the decision variables necessary for inter-temporal choice. Beyond encoding temporally discounted values during inter-temporal choice, evaluating expected outcomes in the temporal domain might be a key function of the prefrontal cortex, essential both for choosing appropriate future actions (Averbeck et al., 2006) and for selecting time-sensitive information for storage in capacity-limited working memory (Goldman-Rakic, 1995; Fuster, 2001). However, results from previous studies suggest that the prefrontal cortex is likely one node in a large network of cortical and subcortical areas involved in inter-temporal choice, including the basal ganglia, amygdala, orbitofrontal cortex, insula, and posterior cingulate cortex (Cardinal et al., 2001; McClure et al., 2004, 2007; Tanaka et al., 2004; Winstanley et al., 2004; Roesch and Olson, 2005b; Roesch et al., 2006, 2007; Hariri et al., 2006; Wittmann et al., 2007; Kable and Glimcher, 2007). In addition, neurons in many brain areas, such as the anterior cingulate cortex (Shidara and Richmond, 2002) and the supplementary and pre-supplementary motor areas (Sohn and Lee, 2007), modulate their activity according to the number of additional movements necessary to obtain reward, suggesting that they might be involved in evaluating the overall costs and benefits expected from the animal's behavior (Rushworth et al., 2007). Future studies will therefore need to investigate how signals related to reward magnitude and delay are processed in other brain areas and combined in the DLPFC, and how such signals are ultimately translated into the animal's motor outputs.
Experimental Procedures
Animal preparation
Three male rhesus monkeys were used. During the experiment, the animal was seated in a primate chair facing a computer screen. The animal's eye position was monitored with a video-based eye tracking system with a 225-Hz sampling rate (ET-49, Thomas Recording). Single-unit activity was recorded from the dorsolateral prefrontal cortex using a multi-electrode recording system (Thomas Recording, Giessen, Germany) and a multi-channel acquisition processor (Plexon Inc, Dallas, TX). All neurons were localized anterior to the frontal eye field, which was identified by the eye movements evoked by micro-stimulation (Figure 2; Bruce et al., 1985). All the procedures used in the present study were approved by the University of Rochester Committee on Animal Research and the Institutional Animal Care and Use Committee at Yale University, and conformed to the Public Health Service Policy on Humane Care and Use of Laboratory Animals and the Guide for the Care and Use of Laboratory Animals.
Behavioral task
Two rhesus monkeys were tested in each of two separate experiments (Experiment I, monkeys D and J; Experiment II, monkeys H and J). In Experiment I, animals performed an inter-temporal choice task and a control task (see below) in separate blocks of trials. In Experiment II, animals performed only the inter-temporal choice task. During the inter-temporal choice task in both experiments, the animal began each trial by fixating a small white square at the center of a computer screen during a 1-s fore-period (Figure 1A). Two peripheral targets were then presented along the horizontal meridian (Figure 1A, “cue”). The animal was required to shift its gaze towards one of the two targets when the central square was extinguished after a 1-s cue period (Figure 1A, “go”). One of the targets (TL) was a red disk and delivered a large reward (0.4 ml of apple juice) when it was chosen. The other target (TS) was a green disk and delivered a smaller reward. The amount of juice delivered for choosing the green disk was 0.26 ml, except that it was 0.2 ml for monkey J in Experiment II. The amount of the small reward was chosen on the basis of pilot behavioral experiments so that the animal did not choose one target exclusively. The positions of the two targets were randomized and counter-balanced across trials.
Experiment I
During the inter-temporal choice task of Experiment I, a target was presented either by itself, indicating that the reward would be delivered without any delay after the target was fixated, or with a clock consisting of 2, 4, 6, or 8 yellow dots, corresponding to a 2-, 4-, 6-, or 8-s delay in reward delivery. Once the animal fixated its chosen target, the yellow dots disappeared one at a time at a rate of 1 s/dot (Figure 1A, “delay”), and the animal was rewarded when the last dot was removed.
During this reward delay period, the animal was required to fixate the chosen target until reward delivery, but was allowed to re-fixate the target within 0.3 s after breaking fixation. After the animal chose TS, the inter-trial interval was increased by the difference between the delays for the two targets, so that the onset of the next trial was not influenced by the animal's choice. The delay for TS was 0 or 2 s, and the delay for TL was 0, 2, 4, 6, or 8 s. During Experiment I, all possible pairs of delays for TS and TL were used unless the delay for TS was longer than the delay for TL. Each of these 9 delay pairs was presented twice in a block of inter-temporal choice trials with the position of TL counter-balanced (18 trials/block). The control task in Experiment I was identical to the inter-temporal choice task, except for two changes. First, the central fixation target was either green or red, indicating the color of the peripheral target the animal was required to choose; incorrect trials were not rewarded. Second, the animal always received the same amount of reward without any delay after it fixated the correct target. Blocks of inter-temporal choice and control trials were tested alternately. Neurons tested in Experiment I were included in the analysis if they were tested in at least 6 blocks for each condition (216 trials), and the majority of neurons (135/164) were tested in 10 blocks for each condition (360 trials).
Experiment II
In Experiment II, only the inter-temporal choice task was tested, and the reward delay was 0 or 2 s for TS, and 0, 2, 5, or 8 s for TL. In addition, two different colors were used for the clocks. Each yellow dot added a 1-s delay between target fixation and reward delivery, whereas each cyan dot added a 4-s delay. This made it possible to distinguish the effects of reward delay and the number of dots in the clock on neural activity, because a reward delay of 5 or 8 s could be signaled by more than one combination of yellow and cyan dots (Figure 1B). Specifically, denoting the clock with NY yellow dots and NC cyan dots as (NY, NC), the delays of 0 and 2 s were always indicated by (0,0) and (2,0). The 5-s delay was indicated by (5,0) or (1,1), whereas the 8-s delay was indicated by (8,0), (4,1), or (0,2). Therefore, the delay for TL could be indicated by 7 different clocks, whereas there were only two different clocks for TS, resulting in 14 possible clock pairs. Each of these 14 clock pairs was tested twice in a block of choice trials, with the position of TL counter-balanced, so that a block consisted of 28 trials. Neurons tested in Experiment II were included in the analysis if they were tested in at least 8 blocks (224 trials), and the majority of neurons (144/185) were tested in 10 blocks (280 trials).
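The dot counts in this clock scheme can be checked by direct enumeration. The sketch below (constants and function names are ours, not from the study) lists the valid (NY, NC) combinations for each delay and confirms the 7-clock and 14-pair counts:

```python
from itertools import product

# Experiment II clock scheme: each yellow dot adds 1 s and each cyan dot
# adds 4 s to the reward delay.
YELLOW_S, CYAN_S = 1, 4

def clocks_for_delay(delay_s, max_dots=8):
    """All (n_yellow, n_cyan) combinations signaling a given delay."""
    return [(ny, nc)
            for ny, nc in product(range(max_dots + 1), repeat=2)
            if ny * YELLOW_S + nc * CYAN_S == delay_s]

# Delays used for the large-reward (TL) and small-reward (TS) targets.
tl_clocks = [c for d in (0, 2, 5, 8) for c in clocks_for_delay(d)]
ts_clocks = [c for d in (0, 2) for c in clocks_for_delay(d)]

print(len(tl_clocks))                   # 7 distinct clocks for TL
print(len(ts_clocks))                   # 2 distinct clocks for TS
print(len(tl_clocks) * len(ts_clocks))  # 14 possible clock pairs
```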
Analysis of behavior data
Denoting the temporally discounted value of a given target x as DV(Ax, Dx), where Ax and Dx indicate the magnitude and delay of the reward from target x, the probability in the model that the animal would choose TS, P(TS), was given by the Boltzmann distribution as follows,
(1) P(TS) = 1 / (1 + exp{−β [DV(AS, DS) − DV(AL, DL)]}),
where TS and TL denote the targets associated with small and large rewards, and β the inverse temperature controlling the randomness of the animal's choices. The temporally discounted value was determined using an exponential discount function,
(2) DV(Ax, Dx) = Ax exp(−κ Dx),
or a hyperbolic discount function,
(3) DV(Ax, Dx) = Ax / (1 + κ Dx),
where κ is a discount factor (s−1). In this study, we considered only these two discount functions, which have been used most frequently in the literature. The small number of different reward delays used in this study makes it difficult to distinguish among multiple discount functions that behave similarly (McClure et al., 2004). Denoting the animal's choice in trial t as ct (TS or TL), the likelihood of the animal's choices in a given session was given by,
(4) L = ∏_{t=1}^{N} P(ct),
where N denotes the number of trials. For each discount function, the parameters β and κ were chosen to maximize the log likelihood (Pawitan, 2001). Since both of these models include two parameters, the model with the larger log likelihood was chosen for each recording session.
To examine how accurately the animal estimated the reward delay when cyan dots were included in the clock, we assumed that the animal's subjective estimate of the reward delay indicated by a clock containing NY yellow dots and NC cyan dots was DS = NY + DC NC (s). If the animal accurately estimated the relative delays corresponding to the yellow and cyan dots, the value of DC would be 4 s. The value of DC was estimated using the maximum likelihood procedure.
Analysis of neural data
Regression analyses
The spike rates during the 1-s cue period were analyzed by applying a series of regression analyses. The first model (Eq 5) included the temporally discounted values of the two targets (DVL and DVR for the left-hand and right-hand targets, respectively) in addition to a dummy variable corresponding to the animal's choice (C = 0 and 1 for the left-hand and right-hand targets, respectively). The second model (Eq 6) included the reward delays for the two targets (DL and DR for the left-hand and right-hand targets, respectively) in addition to dummy variables corresponding to the animal's choice and the position of TL (TG = 0 and 1 for TL on the right and left side, respectively).
(5) S = a0 + a1 DVL + a2 DVR + a3 C
(6) S = a0 + a1 DL + a2 DR + a3 C + a4 TG
where S denotes the spike rate, and a0 ∼ a4 regression coefficients. For the control trials of Experiment I, TG indicated the position of the red target (0, right; 1, left) and the delays were determined as in the inter-temporal choice trials although they were fictitious. Temporally discounted values were obtained from the hyperbolic discount function fit to the behavioral data obtained in the same session (Eq 3). For control trials, we calculated fictitious discounted values as if the color of the target and its clock indicated reward magnitude and delay as in the inter-temporal choice trials.
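A regression of the form of Eq 5 can be fit by ordinary least squares. The sketch below uses synthetic data; the regressor distributions, coefficient values, and noise level are arbitrary choices for illustration, not the recorded data:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500

# Synthetic trial-by-trial regressors (illustrative distributions).
dv_left = rng.uniform(0.1, 0.4, n)   # DV_L: discounted value, left target
dv_right = rng.uniform(0.1, 0.4, n)  # DV_R: discounted value, right target
# Choice dummy C (1 = right target), generated softmax-style from the values.
choice = (rng.random(n) < 1 / (1 + np.exp(-10 * (dv_right - dv_left)))).astype(float)

# Fake spike rates generated from Eq 5 with assumed coefficients a0..a3.
a_true = np.array([5.0, 20.0, -15.0, 2.0])
X = np.column_stack([np.ones(n), dv_left, dv_right, choice])
spikes = X @ a_true + rng.normal(0.0, 0.3, n)

# Ordinary least-squares fit of Eq 5: S = a0 + a1*DV_L + a2*DV_R + a3*C
a_hat, *_ = np.linalg.lstsq(X, spikes, rcond=None)
print(np.round(a_hat, 1))  # estimates should lie close to a_true
```

The delay model (Eq 6) and the interaction model (Eq 7) follow the same pattern, with additional columns for DL, DR, TG, CON, and the CON interactions.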
To test whether the effect of the temporally discounted values in the inter-temporal choice task differed from the effect of fictitious discounted values in the control task, we applied the following regression model that included a set of interaction terms.
(7) S = a0 + a1 DVL + a2 DVR + a3 C + CON × (a4 + a5 DVL + a6 DVR + a7 C)
where CON denotes a dummy variable indicating the task (0, choice task; 1, control task), and a0 ∼ a7 regression coefficients. For the control trials, the values of DVL and DVR were given by the fictitious discounted values. Next, to test how the effect of delay or temporally discounted values was influenced by the number of dots in Experiment II, we tested the following regression models.
(8) S = a0 + a1 DL + a2 DR + a3 C + a4 TG + a5 NL + a6 NR
(9) S = a0 + a1 DVL + a2 DVR + a3 C + a4 NL + a5 NR
where NL (NR) denotes the number of dots in the clock for the left-hand (right-hand) target. However, the relationship between the number of dots and neural activity might be nonlinear and even non-monotonic. Therefore, to examine whether the effect of delay or temporally discounted values was influenced by activity nonlinearly related to the number of dots, we tested two further regression models, created by adding to the models described above (Eq 5 and 6) a set of dummy variables indicating whether the clock for each target included 2, 5, or 8 dots (Figure 7C and 7D).
To determine whether the activity of a given neuron was affected differentially by the temporally discounted values of rewards expected from the two alternative targets, we applied a regression model that included the difference between the temporally discounted values of the two targets,
(10) S = a0 + a1 (DVL − DVR) + a2 C
For each neuron, we also tested whether the position of the target chosen by the animal significantly modulated the activity during 200 ms after the fixation offset (t-test; p<0.05). The direction of eye movement that elicited a significantly higher (lower) activity during this time window is referred to as the neuron's preferred (null) direction.
The statistical significance of each regression coefficient was determined with a t-test (p<0.05). Some analyses (Eq 5 and 6) were also applied using a 200-ms sliding window. The standardized regression coefficient for the i-th independent variable (also referred to as the beta coefficient) is defined as ai (si/sy), in which ai denotes the raw regression coefficient, and si and sy the standard deviations of the i-th independent variable and the dependent variable, respectively (Zar, 1999). In addition, the variance accounted for by a set of independent variables was quantified by the coefficient of partial determination (CPD, also referred to as the squared semi-partial correlation coefficient; Neter et al., 1996). For the independent variable X2, CPD is defined as
CPD = [SSE(X1) − SSE(X1, X2)] / SSE(X1),
where SSE(X1) and SSE(X1, X2) refer to the sums of squared errors of regression models that include only X1 and both X1 and X2, respectively. Therefore, CPD corresponds to the fraction of variability in the dependent variable that can be accounted for by including the variable X2 in a model that already includes X1.
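The standardized coefficient and the CPD defined above can be computed as follows. This is a self-contained illustration on synthetic regressors (arbitrary coefficients and sample size), not the analysis code used in the study:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000

# Synthetic data: y depends on both regressors (coefficients are arbitrary).
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)

def sse(*regressors):
    """Sum of squared errors of an OLS fit with an intercept."""
    X = np.column_stack([np.ones(n)] + list(regressors))
    resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    return float(resid @ resid)

# CPD for X2 given X1: [SSE(X1) - SSE(X1, X2)] / SSE(X1)
cpd_x2 = (sse(x1) - sse(x1, x2)) / sse(x1)

# Standardized (beta) coefficient for X2: a2 * s(x2) / s(y)
a = np.linalg.lstsq(np.column_stack([np.ones(n), x1, x2]), y, rcond=None)[0]
beta_x2 = a[2] * x2.std() / y.std()
print(round(cpd_x2, 2), round(beta_x2, 2))
```

With these generating coefficients, most of the residual variance left by X1 is explained by X2, so the CPD comes out large; shrinking the coefficient on x2 would shrink both quantities accordingly.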
Comparison of magnitude and delay effects
If the activity of a given neuron encodes the temporally discounted value of reward expected from a particular target, increasing the magnitude and delay of reward from the same target should influence the activity in opposite directions. To test this prediction directly, without relying on a regression analysis, we quantified how the neural activity during the cue period was influenced by increasing the magnitude and delay of reward associated with a particular target during a subset of trials in Experiment II. This analysis was performed separately according to the position of the target chosen by the animal (Figure 4A), so each neuron could contribute up to two cases. This analysis was applied only to the data from Experiment II, because Experiment I did not include all the conditions necessary for this analysis.
To examine the effect of reward magnitude, activity must be compared between two sets of trials that differ only in the position of the large-reward target. Therefore, we measured the mean spike rate during the cue period of trials in which the animal chose TS with a 0-s reward delay in a particular location instead of TL with a 2-s reward delay (Figure 4A, magnitude index, top), and the mean spike rate during the cue period of trials in which the animal chose TL in the same location with a 0-s reward delay instead of TS with a 2-s delay (Figure 4A, magnitude index, bottom). Denoting these two measures as M1 and M2, the magnitude index was computed as (M2 − M1) / (M2 + M1). Similarly, to examine the effect of reward delay, we measured the mean spike rate during the cue period of trials in which the animal chose TL with no reward delay instead of TS with no delay (Figure 4A, delay index, top), and the mean spike rate in trials in which the animal chose TL with a 2-s reward delay instead of TS with no reward delay (Figure 4A, delay index, bottom). Denoting these two measures as D1 and D2, the delay index was then computed as (D2 − D1) / (D2 + D1). We then calculated the correlation coefficient between the magnitude index and the delay index. For a given neuron, this analysis could be applied as long as there was at least one trial in each of the 4 conditions described above. Although the reliability of the result increases with the number of trials used to calculate these indices, requiring more trials reduces the number of neurons that can be included in the analysis. Figure 4B shows all the cases in which there were at least 3 trials in each condition. The negative correlation between the magnitude index and the delay index remained statistically significant as long as the minimum number of trials required in each condition was less than 7.
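The two indices can be illustrated with hypothetical mean spike rates; the numerical values below are invented for the example and are not from the recorded data:

```python
def contrast_index(r2, r1):
    """Normalized difference of two mean spike rates: (r2 - r1) / (r2 + r1)."""
    return (r2 - r1) / (r2 + r1)

# Hypothetical mean spike rates (spikes/s) for one neuron and one chosen
# target location.
m1, m2 = 8.0, 12.0  # M1: small vs. M2: large reward chosen at that location
d1, d2 = 12.0, 9.0  # D1: 0-s vs. D2: 2-s delay, large reward chosen there

magnitude_index = contrast_index(m2, m1)  # positive: more spikes for larger reward
delay_index = contrast_index(d2, d1)      # negative: fewer spikes for longer delay
print(magnitude_index, delay_index)
```

A neuron encoding the temporally discounted value for that location would tend to show indices of opposite sign, as in this example, producing the negative correlation across cases described above.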
Acknowledgments
We are grateful to B. Davis and L. Carr for their help with the experiments. This research was supported by a grant from the National Institutes of Health (P01 NS048328).
References
- Averbeck BB, Sohn JW, Lee D. Activity in prefrontal cortex during dynamic selection of action sequences. Nat Neurosci. 2006;9:276–282. doi: 10.1038/nn1634.
- Barraclough DJ, Conroy ML, Lee D. Prefrontal cortex and decision making in a mixed-strategy game. Nat Neurosci. 2004;7:404–410. doi: 10.1038/nn1209.
- Berns GS, Laibson D, Loewenstein G. Intertemporal choice - toward an integrative framework. Trends Cogn Sci. 2007;11:482–488. doi: 10.1016/j.tics.2007.08.011.
- Bruce CJ, Goldberg ME, Bushnell MC, Stanton GB. Primate frontal eye fields. II. Physiological and anatomical correlates of electrically evoked eye movements. J Neurophysiol. 1985;54:714–734. doi: 10.1152/jn.1985.54.3.714.
- Cardinal RN, Daw N, Robbins TW, Everitt BJ. Local analysis of behaviour in the adjusting-delay task for assessing choice of delayed reinforcement. Neural Netw. 2002;15:617–634. doi: 10.1016/s0893-6080(02)00053-9.
- Cardinal RN, Pennicott DR, Sugathapala CL, Robbins TW, Everitt BJ. Impulsive choice induced in rats by lesions of the nucleus accumbens core. Science. 2001;292:2499–2501. doi: 10.1126/science.1060818.
- Frederick S, Loewenstein G, O'Donoghue T. Time discounting and time preference: a critical review. J Econ Lit. 2002;40:351–401.
- Fuster JM. The prefrontal cortex--an update: time is of the essence. Neuron. 2001;30:319–333. doi: 10.1016/s0896-6273(01)00285-9.
- Goldman-Rakic PS. Cellular basis of working memory. Neuron. 1995;14:477–485. doi: 10.1016/0896-6273(95)90304-6.
- Green L, Myerson J. A discounting framework for choice with delayed and probabilistic rewards. Psychol Bull. 2004;130:769–792. doi: 10.1037/0033-2909.130.5.769.
- Hariri AR, Brown SM, Williamson DE, Flory JD, de Wit H, Manuck SB. Preference for immediate over delayed rewards is associated with magnitude of ventral striatal activity. J Neurosci. 2006;26:13213–13217. doi: 10.1523/JNEUROSCI.3446-06.2006.
- Kable JW, Glimcher PW. The neural correlates of subjective value during intertemporal choice. Nat Neurosci. 2007;10:1625–1633. doi: 10.1038/nn2007.
- Kalenscher T, Pennartz CMA. Is a bird in the hand worth two in the future? The neuroeconomics of intertemporal decision-making. Prog Neurobiol. 2008;84:284–315. doi: 10.1016/j.pneurobio.2007.11.004.
- Kalenscher T, Windmann S, Diekamp B, Rose J, Güntürkün O, Colombo M. Single units in the pigeon brain integrate reward amount and time-to-reward in an impulsive choice task. Curr Biol. 2005;15:594–602. doi: 10.1016/j.cub.2005.02.052.
- Kobayashi S, Lauwereyns J, Koizumi M, Sakagami M, Hikosaka O. Influence of reward expectation on visuospatial processing in macaque lateral prefrontal cortex. J Neurophysiol. 2002;87:1488–1498. doi: 10.1152/jn.00472.2001.
- Lee D, Rushworth MFS, Walton ME, Watanabe M, Sakagami M. Functional specialization of the primate frontal cortex during decision making. J Neurosci. 2007;27:8170–8173. doi: 10.1523/JNEUROSCI.1561-07.2007.
- Leon MI, Shadlen MN. Effect of expected reward magnitude on the response of neurons in the dorsolateral prefrontal cortex of the macaque. Neuron. 1999;24:415–425. doi: 10.1016/s0896-6273(00)80854-5.
- Mazur JE. An adjusting procedure for studying delayed reinforcement. In: Commons ML, Mazur JE, Nevin JA, Rachlin H, editors. Quantitative Analyses of Behavior: The Effect of Delay and of Intervening Events on Reinforcement Value. Vol. V. Hillsdale, NJ: Lawrence Erlbaum Associates; 1987. pp. 55–73.
- Mazur JE. Tradeoffs among delay, rate, and amount of reinforcement. Behav Process. 2000;49:1–10. doi: 10.1016/s0376-6357(00)00070-x.
- McClure SM, Ericson KM, Laibson DI, Loewenstein G, Cohen JD. Time discounting for primary rewards. J Neurosci. 2007;27:5796–5804. doi: 10.1523/JNEUROSCI.4246-06.2007.
- McClure SM, Laibson DI, Loewenstein G, Cohen JD. Separate neural systems value immediate and delayed monetary rewards. Science. 2004;306:503–507. doi: 10.1126/science.1100907.
- Miller EK, Cohen JD. An integrative theory of prefrontal cortex function. Annu Rev Neurosci. 2001;24:167–202. doi: 10.1146/annurev.neuro.24.1.167.
- Neter J, Kutner MH, Nachtsheim CJ, Wasserman W. Applied Linear Statistical Models. 4th ed. Boston: McGraw-Hill; 1996.
- Pawitan Y. In All Likelihood: Statistical Modelling and Inference Using Likelihood. Oxford: Oxford University Press; 2001.
- Rachlin H, Raineri A, Cross D. Subjective probability and delay. J Exp Anal Behav. 1991;55:233–244. doi: 10.1901/jeab.1991.55-233.
- Richards JB, Mitchell SH, de Wit H, Seiden LS. Determination of discount functions in rats with an adjusting-amount procedure. J Exp Anal Behav. 1997;67:353–366. doi: 10.1901/jeab.1997.67-353.
- Roesch MR, Calu DJ, Schoenbaum G. Dopamine neurons encode the better option in rats deciding between differently delayed or sized rewards. Nat Neurosci. 2007;10:1615–1624. doi: 10.1038/nn2013.
- Roesch MR, Olson CR. Neuronal activity related to reward value and motivation in primate frontal cortex. Science. 2004;304:307–310. doi: 10.1126/science.1093223.
- Roesch MR, Olson CR. Neuronal activity dependent on anticipated and elapsed delay in macaque prefrontal cortex, frontal and supplementary eye fields, and premotor cortex. J Neurophysiol. 2005a;94:1469–1497. doi: 10.1152/jn.00064.2005.
- Roesch MR, Olson CR. Neuronal activity in primate orbitofrontal cortex reflects the value of time. J Neurophysiol. 2005b;94:2457–2471. doi: 10.1152/jn.00373.2005.
- Roesch MR, Taylor AR, Schoenbaum G. Encoding of time-discounted rewards in orbitofrontal cortex is independent of value representation. Neuron. 2006;51:509–520. doi: 10.1016/j.neuron.2006.06.027.
- Rosati AG, Stevens JR, Hare B, Hauser MD. The evolutionary origins of human patience: temporal preferences in chimpanzees, bonobos, and human adults. Curr Biol. 2007;17:1663–1668. doi: 10.1016/j.cub.2007.08.033.
- Rushworth MFS, Behrens TEJ, Rudebeck PH, Walton ME. Contrasting roles for cingulate and orbitofrontal cortex in decisions and social behaviour. Trends Cogn Sci. 2007;11:168–176. doi: 10.1016/j.tics.2007.01.004.
- Samuelson P. A note on measurement of utility. Rev Econ Stud. 1937;4:155–161.
- Schweighofer N, Shishida K, Han CE, Okamoto Y, Tanaka SC, Yamawaki S, Doya K. Humans can adopt optimal discounting strategy under real-time constraints. PLoS Comput Biol. 2006;2:e152. doi: 10.1371/journal.pcbi.0020152.
- Seo H, Lee D. Temporal filtering of reward signals in the dorsal anterior cingulate cortex during a mixed-strategy game. J Neurosci. 2007;27:8366–8377. doi: 10.1523/JNEUROSCI.2369-07.2007.
- Shidara M, Richmond BJ. Anterior cingulate: single neuronal signals related to degree of reward expectancy. Science. 2002;296:1709–1711. doi: 10.1126/science.1069504.
- Sohn JW, Lee D. Order-dependent modulation of directional signals in the supplementary and presupplementary motor areas. J Neurosci. 2007;27:13655–13666. doi: 10.1523/JNEUROSCI.2982-07.2007.
- Stephens DW, Krebs JR. Foraging Theory. Princeton: Princeton University Press; 1986.
- Tanaka SC, Doya K, Okada G, Ueda K, Okamoto Y, Yamawaki S. Prediction of immediate and future rewards differentially recruits cortico-basal ganglia loops. Nat Neurosci. 2004;7:887–893. doi: 10.1038/nn1279.
- Tobin H, Logue AW, Chelonis JJ, Ackerman KT, May JG. Self-control in the monkey Macaca fascicularis. Anim Learn Behav. 1996;24:168–174.
- Tsujimoto S, Sawaguchi T. Neuronal activity representing temporal prediction of reward in the primate prefrontal cortex. J Neurophysiol. 2005;93:3687–3692. doi: 10.1152/jn.01149.2004.
- Wang XJ. Probabilistic decision making by slow reverberation in cortical circuits. Neuron. 2002;36:955–968. doi: 10.1016/s0896-6273(02)01092-9.
- Winstanley CA, Theobald DEH, Cardinal RN, Robbins TW. Contrasting roles of basolateral amygdala and orbitofrontal cortex in impulsive choice. J Neurosci. 2004;24:4718–4722. doi: 10.1523/JNEUROSCI.5606-03.2004.
- Wittmann M, Leland DS, Paulus MP. Time and decision making: differential contribution of the posterior insular cortex and the striatum during a delay discounting task. Exp Brain Res. 2007;179:643–653. doi: 10.1007/s00221-006-0822-y.
- Wittmann M, Paulus MP. Decision making, impulsivity, and time perception. Trends Cogn Sci. 2008;12:7–12. doi: 10.1016/j.tics.2007.10.004.
- Zar JH. Biostatistical Analysis. 4th ed. Upper Saddle River, NJ: Prentice-Hall; 1999.