Proceedings of the National Academy of Sciences of the United States of America. 2010 Nov 8;107(47):20547–20552. doi: 10.1073/pnas.1012246107

Separate value comparison and learning mechanisms in macaque medial and lateral orbitofrontal cortex

M P Noonan 1,1, M E Walton 1, T E J Behrens 1, J Sallet 1, M J Buckley 1, M F S Rushworth 1
PMCID: PMC2996698  PMID: 21059901

Abstract

Uncertainty about the function of orbitofrontal cortex (OFC) in guiding decision-making may be a result of its medial (mOFC) and lateral (lOFC) divisions having distinct functions. Here we test the hypothesis that the mOFC is more concerned with reward-guided decision making, in contrast with the lOFC's role in reward-guided learning. Macaques performed three-armed bandit tasks and the effects of selective mOFC lesions were contrasted against lOFC lesions. First, we present analyses that make it possible to measure reward-credit assignment—a crucial component of reward-value learning—independently of the decisions animals make. The mOFC lesions do not lead to impairments in reward-credit assignment that are seen after lOFC lesions. Second, we examined how the reward values of choice options were compared. We present three analyses, one of which examines reward-guided decision making independently of reward-value learning. Lesions of the mOFC, but not the lOFC, disrupted reward-guided decision making. Impairments after mOFC lesions were a function of the multiple option contexts in which decisions were made. Contrary to axiomatic assumptions of decision theory, the mOFC-lesioned animals’ value comparisons were no longer independent of irrelevant alternatives.


Although it is widely agreed that the orbitofrontal cortex (OFC) is important for value-guided behavior (1, 2), the precise nature of its contribution is unclear. On the one hand, neuroimaging experiments suggest its ventromedial aspect is part of the circuit for deciding between options on the basis of their value (3, 4). Moreover, De Martino et al. (5) have reported that ventromedial frontal activity is correlated with the degree to which subjects’ decisions are rational and uninfluenced by irrelevant features of the context in which they are made. On the other hand, other accounts emphasize a role in learning and storage of option values (6).

One potential reason for lack of consensus about the OFC may be that there is functional specialization within it, as suggested by the distinct anatomical connections of its ventromedial and lateral divisions (7). The aim of our study was, therefore, to assess evidence of functional specialization within the OFC for value learning and value-guided decision making. The effect of medial OFC (mOFC) lesions was compared with that of lateral OFC (lOFC) lesions in two groups of animals.

The investigation of value-guided decision making was based on the premise that a lesion that disrupts part of a circuit for deciding and discriminating between choice options of differing value should impair decision making as a function of the proximity of options’ values. By analogy, lesions of visual feature-discrimination mechanisms disrupt discrimination as a function of the similarity of the visual features (8). In experiments reported below, macaques were trained to choose between three stimuli associated with differing probabilities of reward. The probabilities of reward, therefore, determined the options’ values and the proximity of the best (V1) and second best (V2) options’ values was manipulated.

The investigation of value learning focused on credit assignment: the process by which visual stimuli are associated with reward values during associative learning (9). Normally, monkeys learn to attribute value to a stimulus as a function of the precise history of reward received in association with the choice of that particular stimulus. We have recently shown, however, that animals with lOFC lesions instead value a stimulus as a recency-weighted function of the history of all rewards received approximately at the time of its choice, even when the rewards were actually caused by choices of alternative stimuli on preceding and subsequent trials. This finding leads to the very counterintuitive prediction that lOFC lesions should make monkeys fail to improve on “easy” decisions when the reward values of the possible choices are very disparate. Although normal animals credit such stimuli with their own distinct values, in contrast, animals with lOFC lesions should credit both stimuli with approximately their mean value. Note also that this predicted pattern of behavior is the opposite of that expected from a disruption to a value comparison mechanism.

The latter part of Results focuses on value-guided decision making but the first part examines whether value learning proceeds normally after mOFC lesions. We compared the effects of mOFC lesions with previously published effects of lOFC lesions (9). All other data and analyses are uniquely presented here.

Results

Experiment 1: Varying Reward Schedules and Credit Assignment.

Before surgery, seven macaques were trained to perform three-armed bandit tasks under four different reward schedules. Three and four went on to have lOFC and mOFC lesions, respectively (Fig. 1C and Fig. S1). Each day the animals were presented with three novel visual stimuli on a touch-screen monitor (Fig. 1A). The reward probabilities associated with each stimulus were determined by preprogrammed schedules and varied continuously throughout the session (an example of varying schedule is shown in Fig. 1B and three more in Fig. S2). The schedules were identical to ones used to identify impairments in credit assignment after lOFC lesion (9). This fact, together with the absence of group difference in preoperative performance, meant direct comparisons could be made between mOFC lesions in the current experiment (Fig. 2B) and lOFC lesions (9) (Fig. 2, Insets).

Fig. 1.

Task description and histology. (A) On each trial, three clipart stimuli were presented on a touch-screen; each was associated with a different outcome probability delivered according to the reward schedule. Gray, blue, and red circles represent different 250-ms tones. (B) Example of a varying reward schedule. (C) Medial OFC (Left) and lOFC (Center) lesion locations represented on an unoperated control, with redness indicating lesion overlap (mOFC: one to four animals; lOFC: one to three animals). The full overlap of each lesion type in all animals in each group is shown for mOFC and lOFC groups in blue and pink, respectively, on the same sections (Right).

Fig. 2.

Reward-credit assignment during value learning. (A) Influence of rewards on the current trial on valuation of stimuli chosen in recent trials. The difference in likelihood of choosing option A on trial n after previously selecting option B on trial n-1 as a function of whether or not a reward was received for this choice. Data are plotted based on the length of choice history on A [(Left) one previous choice of A; (Center) two to three previous choices of A; (Right) four to seven previous choices of A]. After lOFC lesions (Inset: redrawn from ref. 9), but not mOFC lesions (main panel), the credit for the outcome (reward or no reward) received for choosing B on the current trial (n) is partly assigned to stimuli chosen in earlier trials. (B) Influence of past rewards on valuation of current choice. The difference in likelihood of choosing option B on trial n after previously selecting option A on trials n-2 to n-5 and option B on the previous trial (n-1), as a function of whether the A choices were or were not rewarded (“?” on axis labels indicates presence or absence of reward at that point in the history). After lOFC lesions (Inset: redrawn from ref. 9), but not mOFC lesions (main panel), credit for rewards received for earlier choices of A is partly assigned to the subsequent choice of B. Preoperative control, post-mOFC lesion, and post-lOFC lesion data are shown in green, blue, and pink, respectively.

Lateral OFC lesions impair performance on such schedules, particularly in the later parts of each day's testing session after the options’ values have changed (9, 10). A similar analysis was performed on the pre- and postoperative data from the mOFC group. Values were assigned to stimuli A, B, and C on each trial on the basis of a Rescorla-Wagner learning algorithm. The proportions of trials on which monkeys took the best value option (V1), as opposed to the second best (V2) or worst (V3) value options, were calculated for the first, second, and last thirds of the different testing sessions. As similarly reported for lOFC lesions, mOFC lesions led to a decrement in the proportion of V1 choices, particularly in the later periods of testing sessions (Fig. S3) (Surgery × Session period interaction: F1,6 = 8.41, P = 0.0430). The effects of the two lesions on these measures are very similar.
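The value-tracking step described above can be sketched as follows. This is a minimal illustration only: the learning rate `alpha`, the starting value `v0`, the rule that only the chosen stimulus is updated, and the function names are all assumptions that are not specified in the text.

```python
def rescorla_wagner(choices, rewards, n_options=3, alpha=0.3, v0=0.0):
    """Return the value estimates held *before* each trial.

    choices: option index (0..n_options-1) chosen on each trial
    rewards: 0/1 outcome received on each trial
    alpha:   learning rate (assumed value, not taken from the paper)
    """
    values = [v0] * n_options
    history = []
    for choice, reward in zip(choices, rewards):
        history.append(list(values))  # snapshot before the update
        # Delta rule: move the chosen option's value toward the outcome.
        values[choice] += alpha * (reward - values[choice])
    return history


def proportion_best_by_third(choices, history):
    """Proportion of V1 choices in the first, second, and last third of a session.

    A choice counts as V1 when the chosen option has the highest estimated
    value on that trial (ties count as V1; this is an assumption).
    """
    n = len(choices)
    bounds = [0, n // 3, 2 * n // 3, n]
    proportions = []
    for start, end in zip(bounds[:-1], bounds[1:]):
        hits = sum(
            1 for t in range(start, end)
            if history[t][choices[t]] == max(history[t])
        )
        proportions.append(hits / (end - start) if end > start else float("nan"))
    return proportions
```

Applied to one session, `proportion_best_by_third(choices, rescorla_wagner(choices, rewards))` gives the three per-period proportions that a Surgery × Session period comparison would then operate on.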

The next two analyses examined whether mOFC lesions induced poor performance in the same manner as lOFC lesions, by disrupting the assignment of the correct credit for an outcome to the particular stimulus that was chosen. The first credit assignment analysis examined the degree to which reward delivery after the choice, for example of stimulus B, led to revaluation of B as opposed to revaluation of other options, for example stimulus A, chosen on previous or subsequent trials. In this example, correct credit assignment should lead to reward being assigned to B and an increased likelihood of subsequently choosing B, and a decreased likelihood of choosing A, on the next trial. We compared the proportion of trials on which one particular option (e.g., stimulus B) was chosen on the next trial as a function of whether one other particular option (e.g., stimulus A) was or was not rewarded on the current trial. The comparison was also made as a function of the number of previous consecutive trials on which stimulus A had been chosen (Fig. 2A). Note that this analysis of where credit for a reward is assigned during value learning is independent of value-guided decision making because the key comparison is between trials on which macaques made the same decisions after the same prior decision histories.
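One way to compute a contrast of this kind from a trial sequence is sketched below. It is an illustration under assumptions, not the authors' analysis code: the function name, the handling of run lengths through `min_run` and `max_run`, and the pooling over all pairs of options are hypothetical choices.

```python
def credit_spread_effect(choices, rewards, min_run=1, max_run=1):
    """P(return to A | switch to B rewarded) - P(return to A | switch to B unrewarded).

    Looks for every pattern  A x k, B, next  where A and B are any two different
    options and min_run <= k <= max_run, and asks whether the next choice goes
    back to A. Returns None if either reward condition has no trials.
    """
    counts = {0: [0, 0], 1: [0, 0]}  # reward on B -> [returns to A, total]
    for t in range(1, len(choices) - 1):
        b, a = choices[t], choices[t - 1]
        if a == b:
            continue  # not a switch trial
        # Length of the run of consecutive A choices that ended at trial t-1.
        run, i = 0, t - 1
        while i >= 0 and choices[i] == a:
            run += 1
            i -= 1
        if not (min_run <= run <= max_run):
            continue
        counts[rewards[t]][0] += int(choices[t + 1] == a)
        counts[rewards[t]][1] += 1
    if counts[0][1] == 0 or counts[1][1] == 0:
        return None
    return counts[1][0] / counts[1][1] - counts[0][0] / counts[0][1]
```

Because both conditions share the same choice history (a run of A followed by a switch to B), a nonzero difference reflects where the outcome of the B choice was credited rather than a difference in the decisions that preceded it.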

After a long history of one choice (A), a new choice (B) is less likely to be reselected by lOFC-lesioned animals if rewarded than if not, because reinforcement for the B choice was erroneously assigned to the preceding choices of A (9) (Fig. 2A, Inset, and Fig. S4). The inappropriate assignment of credit to option A increased as a function of the number of times A had been picked recently (9). No such effect was seen when pre- and postoperative performance was compared in the same way in the mOFC group (Reward × Lesion; F1,3 = 0.34, P = 0.602). An ANOVA comparing pre- and postoperative performances in mOFC and lOFC groups as a function of the number of preceding choices of A confirmed lOFC lesions caused a greater impairment (Reward × Lesion × Group; F2,10 = 3.974, P = 0.020).

As well as the credit assignment of the lOFC-lesioned animals being affected by their recent choice history, credit assignment in this group was also influenced by recent reinforcement history, such that an option (e.g., stimulus B) was more likely to be reselected if a prior choice of another option (e.g., stimulus A) had been rewarded than if not, because the reward for the preceding A was erroneously assigned to B (9) (Fig. 2B, Inset, and Fig. S5). The effect was clearest when the reward for the prior choice of A had been delivered more recently (question marks “?” in Fig. 2B indicate the point in the previous trial series when reward was either delivered or not delivered). Note that again this second analysis of value learning is independent of decision history because the key comparison is being made between trials on which macaques made the same decisions after the same prior decision histories. No evidence of the same impairment was seen when pre- and post-mOFC lesion performances were compared (Reward × Lesion; F1,3 = 0.57, P = 0.504), and an ANOVA comparing pre- and post-operative performance in the mOFC and lOFC groups as a function of the trial on which reward was or was not delivered confirmed lOFC lesions caused a significantly greater change in behavior (Reward × Lesion × Group; F1,5 = 6.726, P = 0.049).

Experiment 2: Fixed Reward Schedules and Reward-Guided Decision Making.

In Experiment 2, macaques once again made choices between three options with different reward probabilities but the probabilities were fixed throughout sessions (Fig. 3 A–C). As in the previous experiment, however, novel stimuli were used for each testing session. The reward probability of the stimuli with the best (V1) and worst (V3) values were the same in all three fixed schedules (0.6 and 0, respectively), but the value of the second-best stimulus (V2) was systematically varied between 0.375 (V2_HIGH), 0.2 (V2_MID), or 0 (V2_LOW). Thus, Experiment 2 investigated how the proximity of the best two values (V1-V2 value difference) affected decision-making (Fig. 3 D–I).

Fig. 3.

Effect of option value proximity. (A–C) Fixed reward schedules V2_HIGH (A), V2_MID (B), and V2_LOW (C). Proportion of choices of V1 in the fixed reward schedules, V2_HIGH (D and G), V2_MID (E and H), and V2_LOW (F and I). (D–F) Control pre-lesion (green) and post-mOFC (blue) lesion performance. (G–I) Postoperative lOFC (pink) and matched unoperated control (green) performance (D–F). Insets show number of trials to reach 70% V1 choices. Medial OFC lesions caused impairments in V2_HIGH (A and D) but lOFC lesions impaired performance in V2_MID (B and H) and V2_LOW (C and I). Proportion of V1 choices in the varying reward schedules as a function of V1-V2 reward differences. Pre- and postoperative data from the mOFC (J, L, and N) and lOFC group (K, M, and O) from varying schedule trials with V1-V2 value differences corresponding to the fixed-schedule value differences in D–F and G–I. (J–O) The proportion of trials on which animals failed to pick V1 so they can be compared readily with the Insets in D–I. Once again, the mOFC impairment was apparent on trials with proximate value options (compare J and H).

Postoperatively, the mOFC group was profoundly impaired when V1 and V2 values were proximate. Choice behavior was quantified as a function of the number of trials taken to reach a criterion of ≥70% best (V1) choices across 10 trials (9). Analysis of mOFC group results revealed a significant interaction between testing session (pre- and postsurgery) and condition (F2,6 = 5.63, P = 0.042) because of poorer postoperative performance with more proximate option values (t3 = −4.05, P = 0.027). Such a pattern suggests the mOFC lesions compromise the process by which options’ values are compared so that proximate values can no longer be distinguished.
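The trials-to-criterion measure can be sketched as a sliding-window count. The sketch assumes the criterion is evaluated over every consecutive block of 10 trials and that the score is the trial number on which the first qualifying block ends; the text does not state either detail, and the function name is hypothetical.

```python
def trials_to_criterion(is_best, window=10, min_best=7):
    """Trial number (1-based) at which the criterion is first met, else None.

    is_best:  sequence of 0/1 flags, 1 when the animal chose V1 on that trial
    window:   number of consecutive trials inspected (10 in the text)
    min_best: V1 choices required within the window (7 of 10 = 70%)
    """
    for end in range(window, len(is_best) + 1):
        if sum(is_best[end - window:end]) >= min_best:
            return end
    return None
```

For example, a session that opens with five non-V1 choices followed by uninterrupted V1 choices reaches criterion on trial 12, the first point at which the preceding 10 trials contain seven V1 choices.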

In contrast, lOFC lesions affected performance in the opposite manner (Fig. 3 G–I). A comparison of the lOFC and mOFC groups’ postoperative performance revealed a double dissociation in impairments (Group × Condition: F2,10 = 11.38, P = 0.003). The mOFC lesion group was worse than the lOFC group in the V2_HIGH condition (t5 = 2.87, P = 0.035), where the option values were proximate, whereas the lOFC group was worse than the mOFC group in the V2_MID and V2_LOW conditions (t5 = −4.71, P = 0.005; t5 = −2.79, P = 0.039, respectively), where the option values were disparate. Note the lOFC impairment is the one predicted if lOFC damage disrupts assignment of credit for a reward to a specific choice. With credit appropriately assigned, it becomes increasingly easy to learn that V1 is better than V2 as the value of V2 falls, as V2 infrequently yields a reward. When credit assignment is disrupted, however, this benefit is impaired, as some of the credit for V1 rewards will be misassigned to V2 choices, and some of the credit for V2 nonrewards will be misassigned to V1 choices. The estimated value of both V1 and V2 will instead approach the mean of the two options, and so animals will continue to sample V2. This finding is similar to the pattern of choices seen at the beginning of each experimental session for the lOFC-lesioned monkeys. Nonetheless, a deficit in credit assignment should not cause the lOFC animals to fail to learn completely. Stimulus value will still reflect the recency-weighted history of rewards received for all choices, and periods in which V1 is (by chance) chosen more frequently will be associated with periods in which more rewards are received. As soon as V1 starts to increase in estimated value, it will be sampled more frequently, making learning easier still. Consistent with the lOFC impairment being one of credit assignment was the finding that, when the deficit was prominent (Fig. 3I), it was only present at the beginning of the session but no longer in the second half of the session (t5 = −1.76, P > 0.05). In contrast, the mOFC deficit was apparent even in the second half of the session (t5 = 5.04, P = 0.004), as might be expected if it reflected a persistent failure of value-guided decision making.
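The argument that misassigned credit pulls both estimates toward their mean can be illustrated with a toy calculation. This is not a model from the paper: it assumes that a fixed fraction `spill` of the credit for each outcome goes to the other option and that both options are sampled equally often, and the function name is hypothetical.

```python
def blurred_values(p1, p2, spill):
    """Expected value estimates for V1 and V2 under a toy credit-spill model.

    p1, p2: true reward probabilities of the two options
    spill:  fraction of each outcome's credit assigned to the other option
            (0 = perfect credit assignment, 0.5 = complete blurring)
    """
    v1 = (1 - spill) * p1 + spill * p2
    v2 = (1 - spill) * p2 + spill * p1
    return v1, v2


# Estimated V1-V2 gap in the three fixed schedules with and without spill.
for name, p2 in [("V2_HIGH", 0.375), ("V2_MID", 0.2), ("V2_LOW", 0.0)]:
    intact = blurred_values(0.6, p2, spill=0.0)
    blurred = blurred_values(0.6, p2, spill=0.4)
    print(name, intact[0] - intact[1], blurred[0] - blurred[1])
```

In this toy model the estimated gap is the true gap scaled by (1 - 2 × spill), so in absolute terms the largest loss falls on the schedules in which V1 and V2 are most disparate, which is where intact animals gain the most from accurate credit assignment.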

Before fixed-schedule testing, animals with mOFC and lOFC lesions had received identical training on the varying schedules and performed similarly (Fig. 1). Both considerations suggest it is appropriate to compare the groups’ postoperative fixed-schedule performances. Although animals with lOFC lesions had not received preoperative testing on the fixed schedules, it is difficult to see how this could account for the double dissociation observed. We nevertheless carried out four additional tests to establish whether value-guided decision making was impaired after mOFC, but not lOFC, lesions.

First, to confirm that lOFC lesions impaired performance on easy decision-making trials when the V1-V2 difference was large, lOFC performance was compared with that of a control group of three macaques with no lesions. The control macaques had training and testing histories that were identical in every respect to that of the lOFC animals [SI Methods (9)]. Just as in the comparison with the mOFC lesion, it was found that the lOFC group was more impaired than controls when V1 and V2 option values were disparate (V2_MID: t4 = −4.82, P = 0.009; V2_LOW: t4 = −4.97, P = 0.008) but not when they were proximate (V2_HIGH: t4 = −1.65, P = 0.174).

Before moving to the next two tests of value-guided decision making, we carried out one further search for value-learning impairments in the fixed schedules in Experiment 2 to check whether we had confounded them with decision-making impairments. A learning rate, a parameter describing the extent to which feedback changed stimulus-value estimates, was fitted to each animal's performance using standard nonlinear minimization procedures on each testing day (SI Methods: Credit Assignment Analysis and Fig. S6). Although the learning rate increased with increasing V1-V2 value differences in controls and mOFC-lesioned animals (both, P < 0.05), this effect was significantly reduced in the lOFC group (Linear Interaction Group × Condition; lOFCs vs. controls: F1,4 = 27.92, P = 0.006; lOFCs vs. mOFCs: F1,5 = 17.82, P = 0.008; and no effect in lOFC group alone: P > 0.1).
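A learning-rate fit of this general kind can be sketched as follows. The text says only that standard nonlinear minimization was used, so everything specific here is an assumption made for illustration: the softmax choice rule, the fixed inverse temperature `beta`, the zero starting values, and the use of a simple grid search in place of a numerical optimizer.

```python
import math


def negative_log_likelihood(alpha, choices, rewards, beta=5.0, n_options=3):
    """Negative log-likelihood of the observed choices for a given learning rate.

    Values follow a delta-rule update; choices are assumed to follow a softmax
    over the current value estimates (assumption, not stated in the text).
    """
    values = [0.0] * n_options
    nll = 0.0
    for choice, reward in zip(choices, rewards):
        top = max(values)
        weights = [math.exp(beta * (v - top)) for v in values]
        nll -= math.log(weights[choice] / sum(weights))
        values[choice] += alpha * (reward - values[choice])
    return nll


def fit_learning_rate(choices, rewards, grid_steps=100, **kwargs):
    """Return the learning rate in [0, 1] that minimizes the negative log-likelihood."""
    grid = [i / grid_steps for i in range(grid_steps + 1)]
    return min(grid, key=lambda a: negative_log_likelihood(a, choices, rewards, **kwargs))
```

Fitting one learning rate per animal and testing day, and then comparing the fitted rates across the V2_HIGH, V2_MID, and V2_LOW conditions, would give the kind of Group × Condition comparison reported above.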

Reanalysis of the Varying Reward Schedules and Reward-Guided Decision Making.

The second analysis of value-guided decision making confirmed that mOFC lesions caused a greater impairment than lOFC lesions when option values were proximate. To conduct this analysis, we once again focused on the varying reward schedule data (Fig. 1) that was collected from both lesion groups under identical pre- and postoperative conditions (SI Methods: Value Comparison Analysis). Trials were identified across all varying schedules in which V1-V2 and V2-V3 value differences resembled those in the three fixed schedules. Despite the dynamic trial-by-trial changes in values that characterized varying schedules, the same pattern of impairment was seen again after mOFC lesion (Fig. 3 J, L, and O) (main effects of lesion: F1,3 = 19.43, P = 0.022; value difference: F2,6 = 73.84, P < 0.001; linear interaction of lesion across three levels of value difference: F1,3 = 48.59, P = 0.006) but not after lOFC lesion (Fig. 3 K, M, and N) (F1,2 = 1.96, P = 0.296). Moreover, there was once again a significant difference between impairment patterns seen after mOFC and lOFC lesions (three-way interaction of Lesion, Value Difference, and Group: F1,10 = 6.51, P = 0.015). The dynamic nature of the varying environment means that this analysis was less sensitive to the nature of the lOFC lesion impairment than the analyses presented in Fig. 3.

Reward-Guided Decision Making and Independence of Irrelevant Alternatives.

To begin investigating whether it was simply proximity of V1 and V2 values that determined the mOFC impairment or whether it was also the context in which the V1-V2 comparison was made, we carried out a regression analysis. This analysis tested whether the proportion of V1 choices was predicted not just by the V1-V2 difference but also by the (V1-V2)(V2-V3), (V1-V2)V3, and (V2-V3)V3 interaction terms. To estimate these interaction effects, the regression included main effects of the V1-V2 difference, the V2-V3 difference, and the magnitude of V3. In brief, the regression coefficients that indicate the weight of influence of each interaction term on performance all became significantly more negative after the mOFC lesion (Fig. 4A) (F1,3 = 41.40, P = 0.008). This result suggests that the decline in performance after mOFC lesion was not just a function of the V1-V2 difference but also of the value of V2 being much better than that of V3, the V3 value itself being large, and the interaction of these effects, as if such contexts distracted monkeys with mOFC lesions from identifying or attending to the best options when making a decision.
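The regressors named in this analysis can be laid out explicitly. The sketch below only builds the design matrix; the constant term, the column order, and the function names are assumptions, and the coefficients themselves would come from whatever least-squares routine is applied to these columns.

```python
REGRESSOR_NAMES = [
    "constant",
    "V1-V2",           # main effect: difference between best and second best
    "V2-V3",           # main effect: difference between second best and worst
    "V3",              # main effect: magnitude of the worst option
    "(V1-V2)(V2-V3)",  # interaction terms of interest
    "(V1-V2)V3",
    "(V2-V3)V3",
]


def design_row(v1, v2, v3):
    """Regressors for one trial, given option values ordered best to worst."""
    d12 = v1 - v2
    d23 = v2 - v3
    return [1.0, d12, d23, v3, d12 * d23, d12 * v3, d23 * v3]


def design_matrix(trial_values):
    """Build one row per trial from unordered (value_a, value_b, value_c) triples."""
    return [design_row(*sorted(values, reverse=True)) for values in trial_values]
```

Fitting the same design before and after surgery and comparing the three interaction coefficients corresponds to the pre- versus post-lesion contrast described in the text.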

Fig. 4.

(A) Weights of influence of stimulus value differences on choices of V1 before (green) and after (blue) mOFC lesion. The interactions between the stimulus value differences, for example the interaction between the V1-V2 and V2-V3 difference [(V1-V2)(V2-V3)], and (V1-V2)V3 and (V2-V3)V3, are shown on the right of the figure. Although the interaction terms are positive before surgery, indicating such combinations of value difference were conducive to good performance, they are significantly more negative after mOFC lesion. (B) Proportion of trials on which monkeys chose options as a function of the value difference with respect to one other option in the context of a high (solid line) or low (dashed line) value third option before (i) and after (ii) mOFC lesion. The same effect remained statistically significant in the mOFC lesion group, even if small in absolute terms, when the analysis focused just on the first halves of each testing session when performance tended to be strong even postoperatively (Ci and Cii), but no such effect was seen after lOFC lesions (Ciii and Civ). In summary, only after mOFC lesions are monkeys’ value comparisons dependent on irrelevant alternatives.

The final assessment of value-guided decision making also tested whether or not value-guided comparisons were influenced by the context in which they were made. Rational value-guided decisions between a given pair of options should be made in the same manner, independently of what other alternative options are available (11, 12). Logistic functions were constructed to describe each animal's choices between possible pairs of options under two conditions (Fig. 4B). The steepness of the slope reflects the degree to which decisions are deterministic functions of value, as opposed to random. Note, however, that a reduction in sigmoid slope after a lesion could reflect impaired value-guided decision making or impairment in the learning and assignment of values to options before comparison during decision making.

Our analysis, therefore, did not focus simply on the change in sigmoid slope describing value comparison between possible pairs of options in pre- and postoperative tests, but on its dependence on the value of the third option (SI Methods: Value Comparison Analysis). In the first condition, the value of the third option was high (top 33 percentile) and in the second condition the value of the third option was low (bottom 33 percentile). It is possible that a faulty learning mechanism might estimate the values of the two options being compared incorrectly, but it should do so in the same way in both conditions. It is possible that our modeling and estimation of the options’ values is inaccurate, but the inaccuracy should be the same in each condition. What is at stake then is whether the sigmoid curves describing decision-making in the two conditions are identical or different to one another. The two curves were identical preoperatively (Fig. 4Bi), showing that the distribution of monkeys’ choices between any two options does not normally depend on the value of a third alternative. After mOFC lesions, however, value comparisons between each pair of options depended significantly on what third option was available at the same time (t3 = 5.44, P = 0.012) (Fig. 4Bii). Pre- and postoperative results remained essentially identical, even after normalization of the values (SI Methods: Reward and Error Sensitivity Analyses and Fig. S7).
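The comparison of choice curves across the two third-option conditions can be sketched as below. This is an illustration under stated assumptions rather than the authors' procedure: the logistic function has a slope but no bias term, the slope is found by grid search over a non-negative range, and the high and low conditions are taken as the top and bottom thirds of trials ranked by third-option value. All function names are hypothetical.

```python
import math


def logistic_nll(slope, value_diffs, chose_first):
    """Negative log-likelihood of pairwise choices under P = 1 / (1 + exp(-slope * dv))."""
    nll = 0.0
    for dv, chose in zip(value_diffs, chose_first):
        p = 1.0 / (1.0 + math.exp(-slope * dv))
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        nll -= math.log(p if chose else 1.0 - p)
    return nll


def fit_slope(value_diffs, chose_first, max_slope=50.0, steps=500):
    """Grid-search the slope in [0, max_slope] that best describes the choices."""
    grid = [max_slope * i / steps for i in range(steps + 1)]
    return min(grid, key=lambda s: logistic_nll(s, value_diffs, chose_first))


def slopes_by_third_option(trials):
    """Fit separate slopes for trials with a low- and a high-value third option.

    trials: list of (value_diff, chose_first, third_value) tuples, where
            value_diff is the value of the first option of the pair minus
            the value of the second, and chose_first is 0 or 1.
    """
    ranked = sorted(trials, key=lambda trial: trial[2])
    k = len(ranked) // 3
    if k == 0:
        raise ValueError("need at least three trials")
    slopes = {}
    for label, subset in (("low_third", ranked[:k]), ("high_third", ranked[-k:])):
        slopes[label] = fit_slope([t[0] for t in subset], [t[1] for t in subset])
    return slopes
```

Under independence of irrelevant alternatives the two fitted slopes should match; a systematic difference between them is the signature that the text reports after mOFC lesions.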

The approach was difficult to apply to the second half of each day's testing session in the lOFC lesion animals because no learning occurred. In general, however, animals with lOFC lesions exhibited learning in the first half of each session. Application of the same analysis to the first half of each day's session revealed no lOFC impairment (Fig. 4 Ciii and Civ) (P > 0.1). In contrast, an increase in dependency of decision making on irrelevant alternatives was still apparent in the first half of each testing session after mOFC lesions (Fig. 4 Ci and Cii) (F2,6 = 16.361, P = 0.004). This finding was true even though the mOFC impairment was otherwise hardly apparent in the first half of each of the testing sessions. Finally, a direct comparison of the two lesion groups confirmed that the dependency of decision making on the value of an irrelevant alternative became significantly greater after mOFC lesion than it did after lOFC lesions (F2,10 = 16.361, P = 0.001).

Medial OFC and lOFC Are Not Specialized for Processing Rewards and Errors.

Previous attempts to characterize mOFC and lOFC differences have focused on specialization for reward and error processing, respectively (13). If lOFC is specialized for error processing, then a lesion should hamper switching to an alternative after one choice has been unrewarded. In contrast, if mOFC is critical for detecting occurrence of reward then mOFC lesions should have the complementary effect and diminish the rate at which animals stay with the same option after a reward is received. Although both groups, unsurprisingly, switched more after errors both before and after surgery (main effect of trial type: F1,5 = 198.17, P < 0.001), there was no evidence for relative insensitivity to rewards and errors after mOFC and lOFC lesions, respectively. Instead, both lesions caused a general increase in switching (Fig. S8) (main effect of surgery: F1,5 = 13.09, P = 0.015). Although there was a nonsignificant trend for mOFC lesion animals to switch more frequently than lOFC lesion animals, they did so even before surgery.
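The switch-rate measure used in this comparison can be sketched as follows. The function name and the treatment of every trial after the first as a stay-or-switch opportunity are assumptions made for illustration.

```python
def switch_rates(choices, rewards):
    """Return (P(switch | previous trial rewarded), P(switch | previous trial unrewarded)).

    choices: option chosen on each trial
    rewards: 0/1 outcome received on each trial
    A rate is None when its condition never occurs.
    """
    counts = {0: [0, 0], 1: [0, 0]}  # previous outcome -> [switches, opportunities]
    for t in range(1, len(choices)):
        previous_outcome = rewards[t - 1]
        counts[previous_outcome][0] += int(choices[t] != choices[t - 1])
        counts[previous_outcome][1] += 1

    def rate(pair):
        return pair[0] / pair[1] if pair[1] else None

    return rate(counts[1]), rate(counts[0])
```

A selective insensitivity to errors would appear as a drop in the second rate only, and a selective insensitivity to rewards as a rise in the first rate only; a general increase in switching, as reported here, raises both.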

Discussion

The mOFC and lOFC have distinct roles in reward-guided decision making and learning. There is a double dissociation between the effects of lesions to the areas (Fig. 3 D–I). Although the different anatomical connections of the mOFC and lOFC are well documented (7), evidence of their different behavioral roles has been lacking. Ventromedial activity has been prominent in neuroimaging studies (1), but little has been known of the corresponding region in other species or through other techniques. Despite reports that OFC neuron activity is correlated with the values of choices and the rewards or errors that follow (6, 14, 15), most studies have focused on the lOFC (14). Medial OFC lesions in monkeys have usually only been made in conjunction with lOFC lesions (16–18).

The lOFC is critical for credit assignment (9). In its absence, the credit for a reward is attributed not just to the stimulus choice that led to its delivery but, erroneously, also to prior and subsequent choices (Fig. 2). Thorndike's “law of effect” is no longer in operation, even though the “spread of effect” of reward (19) still operates. Although several representations of choice history and reward history exist in the brain (20), only a limited number of regions, including the lOFC, represent the conjoint history of choices and rewards (21, 22). An lOFC-lesioned animal should struggle to learn when each option's value is very different from the others because it will erroneously estimate the value of each option as close to the mean. This counterintuitive pattern of impairment was indeed found after lOFC lesions (Fig. 3 G–I and Fig. S5).

Although value learning is the first stage of the proposed “standard model” of decision making, value comparison is the second stage (6). Contrary to suggestion, the lOFC rather than a ventromedial area, such as the mOFC, was essential for value learning in the present study. Moreover, the claim that the second stage is “implemented in lateral prefrontal and parietal areas” rather than a ventromedial area, such as the mOFC, must be qualified. Although small in size, mOFC lesions in the current study impaired value-guided decision making in four different analyses. The first two analyses (Fig. 3) demonstrated that, after mOFC lesions, value comparison was particularly susceptible to errors when the options’ values were close. Additional analyses (Fig. 4 A and B) extended this finding by showing that a third option's value had a distracting influence on choices made between any two other values after mOFC lesion. After practice and under normal circumstances, monkeys’ choices, in the absence of any speeded response requirements, abide by the independence of irrelevant alternatives axiom of decision theory (11, 12); the distribution of choices made between two options is independent of the value of any third option, but this is no longer the case after mOFC lesions. There was, however, no evidence that mOFC lesions impaired credit assignment or other aspects of value learning.

We suggest that models of decision making must be revised in recognition of the finding that ventromedial prefrontal areas, such as the mOFC, are intimately involved in the process by which a comparison between the values of potential choice options is made. The precise nature of the mOFC's involvement, however, remains to be specified. Functional MRI studies have reported that ventromedial frontal activity, but not lOFC activity, reflects not only the value of the option that will be chosen but also the value of options that will go unchosen (3, 4, 23). In some cases, but not all (23), a value-difference signal suggests a value comparison process within the ventromedial frontal cortex. On the other hand, an automatic ventromedial frontal valuation signal is found even when stimuli are presented in the absence of any decision requirement (24). The present finding that animals were beguiled into making incorrect choices, partly as a function of how much better V2 was than V3 and distracted by high values of V3 (Fig. 4 A and B), suggests the mOFC is at the nexus of an attentional or representational process that identifies the options for choice and the comparison process itself. Moreover, we emphasize that it is likely that the mOFC is concerned with making decisions between goals to pursue rather than between the actions that might be made to pursue such goals. This theory is consistent with the finding that the region is more active when behavior is under the guidance of goal values and responses are not being made in a “habitual” fashion (25). Once the goal value that will guide behavior is selected, the saliency of representations of locations in the environment and of responses that bring the animal's eye or hand to that location are enhanced in the lateral parietal and frontal cortex (26).

Just as mOFC-lesioned monkeys make irrational choices that are influenced by the presence of an irrelevant alternative, so too are human subjects with lower mOFC/ventromedial prefrontal cortex (vmPFC) blood-oxygen level-dependent signals more likely to make irrational decisions that are influenced by context (5). Human patients with mOFC/vmPFC lesions also make unusual decisions, which may reflect failure to attend to relevant aspects of the decision-making context (27, 28). Such a pattern of impairment might lead to the appearance of satisficing rather than optimizing behavior (29).

That animals with mOFC lesions still sometimes choose V1 may be because the lesion does not affect the whole valuation and decision-making system of which the mOFC is a component, and it leaves intact other independent value systems that may exist in the brain (1, 2, 6, 30). The striatum and the posterior and anterior cingulate cortex also carry value signals implicated in decision making (3, 6, 14). The different functions of reward representations in areas such as the OFC and anterior cingulate cortex are currently debated (10, 14, 31), but they may mean the areas can only partially compensate for one another.

Prediction errors may guide reward learning (32). A representation of expected value must be maintained by the brain to calculate value prediction errors when outcomes do not match prior expectations. If the mOFC carried a representation of reward expectation critical for reward prediction-error learning, then mOFC lesions, like lOFC lesions, should have altered learning, but that was not the case (Fig. S6). Medial OFC value signals, even if automatically generated (24), instead appear to be critical for guiding decisions.
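
The role of an expected-value representation in prediction-error learning can be sketched with a simple delta-rule update of the kind described in the reinforcement learning literature (32). This is a minimal illustration and not the model fitted in the present study. The function name, the learning rate, and the reward sequence are hypothetical.

```python
def update_value(value, reward, learning_rate=0.5):
    """One delta-rule step: move the value estimate toward the outcome.

    Returns the updated value and the prediction error that drove the update.
    `learning_rate` is an assumed free parameter (illustrative only).
    """
    prediction_error = reward - value  # outcome minus prior expectation
    new_value = value + learning_rate * prediction_error
    return new_value, prediction_error

# Hypothetical outcome sequence for one option (1 = reward, 0 = no reward).
value = 0.0
errors = []
for reward in [1, 1, 0, 1]:
    value, delta = update_value(value, reward)
    errors.append(delta)

print(value, errors)
```

The sketch makes the dependency explicit: without a stored expectation (`value`), no prediction error can be computed, so damage to a region that carried that expectation would be expected to alter learning.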

Further evidence that other aspects of the influence of value on behavior remain intact after mOFC lesions is presented in Fig. S9. The average value of the environment influences response vigor and the speed with which responses are made (33), and this also remains the case after mOFC lesions. Such dissociations are consistent with suggestions that the brain contains multiple value representations.

Some studies have been taken to support a relative specialization for rewards and errors in the mOFC and lOFC, respectively (13). No support, however, was found for this hypothesis in the current study (Fig. S8), as was the case in a recent recording study that focused on the lOFC (34). In contrast, our hypothesis that the mOFC is concerned with value-guided decision-making can account for previous findings; it predicts mOFC signals should correlate with the values of the choice options being considered and then with the value of the choice finally taken (3, 4, 23). Moreover, an account of the lOFC that focuses on value assignment predicts lOFC activation to errors if the errors occasion reassignment of values to options. In addition, however, it predicts lOFC responses to positive outcomes if they also occasion the revaluation of an option.

Medial OFC involvement in value comparison accords with its connections with brain regions implicated in other aspects of reward-guided behavior (7). In contrast, lOFC involvement in assignment of value to stimuli is consistent with its connections with higher order sensory areas, such as the anterior temporal and perirhinal cortex, that represent those stimuli (7), and the finding that primate lOFC is a prerequisite for stimulus-reward association learning but not action-reward association learning (10).

Methods

Four male rhesus macaques (Macaca mulatta) were tested before and after bilateral aspiration lesions of mOFC. They were compared with three animals with lOFC lesions (Fig. 1 and SI Methods). All stimuli used were presented on a touch screen monitor (SI Methods) during all reward schedules (Figs. 1 and 2 and SI Methods: Schedules). Additional details of value assignment (Fig. 2) and value comparison analyses (Figs. 3 and 4) are presented in SI Methods: Credit Assignment Analyses and SI Methods: Value Comparison Analyses.

Supplementary Material

Supporting Information

Acknowledgments

We thank L. Hunt for his useful comments on the manuscript, G. Daubney for histology, and the Biomedical Services team for animal husbandry. We acknowledge funding from the Medical Research Council (to M.E.W., T.E.J.B., J.S., M.J.B., and M.F.S.R.) and the Wellcome Trust (to M.P.N. and M.E.W.).

Footnotes

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1012246107/-/DCSupplemental.

References

  • 1. Rangel A, Camerer C, Montague PR. A framework for studying the neurobiology of value-based decision making. Nat Rev Neurosci. 2008;9:545–556. doi: 10.1038/nrn2357.
  • 2. Rushworth MF, Behrens TE, Rudebeck PH, Walton ME. Contrasting roles for cingulate and orbitofrontal cortex in decisions and social behaviour. Trends Cogn Sci. 2007;11(4):168–176. doi: 10.1016/j.tics.2007.01.004.
  • 3. FitzGerald TH, Seymour B, Dolan RJ. The role of human orbitofrontal cortex in value comparison for incommensurable objects. J Neurosci. 2009;29:8388–8395. doi: 10.1523/JNEUROSCI.0717-09.2009.
  • 4. Boorman ED, Behrens TE, Woolrich MW, Rushworth MF. How green is the grass on the other side? Frontopolar cortex and the evidence in favor of alternative courses of action. Neuron. 2009;62:733–743. doi: 10.1016/j.neuron.2009.05.014.
  • 5. De Martino B, Kumaran D, Seymour B, Dolan RJ. Frames, biases, and rational decision-making in the human brain. Science. 2006;313:684–687. doi: 10.1126/science.1128356.
  • 6. Kable JW, Glimcher PW. The neurobiology of decision: Consensus and controversy. Neuron. 2009;63:733–745. doi: 10.1016/j.neuron.2009.09.003.
  • 7. Price JL. Definition of the orbital cortex in relation to specific connections with limbic and visceral structures and other cortical regions. Ann N Y Acad Sci. 2007;1121:54–71. doi: 10.1196/annals.1401.008.
  • 8. Buckley MJ, Gaffan D, Murray EA. Functional double dissociation between two inferior temporal cortical areas: perirhinal cortex versus middle temporal gyrus. J Neurophysiol. 1997;77:587–598. doi: 10.1152/jn.1997.77.2.587.
  • 9. Walton ME, Behrens TE, Buckley MJ, Rudebeck PH, Rushworth MF. Separable learning systems in the macaque brain and the role of orbitofrontal cortex in contingent learning. Neuron. 2010;65:927–939. doi: 10.1016/j.neuron.2010.02.027.
  • 10. Rudebeck PH, et al. Frontal cortex subregions play distinct roles in choices between actions and stimuli. J Neurosci. 2008;28:13775–13785. doi: 10.1523/JNEUROSCI.3541-08.2008.
  • 11. Ray P. Independence of irrelevant alternatives. Econometrica. 1973;41:987–991.
  • 12. Luce RD. Individual Choice Behavior: A Theoretical Analysis. New York: Wiley; 1959.
  • 13. Kringelbach ML, Rolls ET. The functional neuroanatomy of the human orbitofrontal cortex: Evidence from neuroimaging and neuropsychology. Prog Neurobiol. 2004;72:341–372. doi: 10.1016/j.pneurobio.2004.03.006.
  • 14. Wallis JD, Kennerley SW. Heterogeneous reward signals in prefrontal cortex. Curr Opin Neurobiol. 2010;20(2):191–198. doi: 10.1016/j.conb.2010.02.009.
  • 15. Padoa-Schioppa C, Assad JA. Neurons in the orbitofrontal cortex encode economic value. Nature. 2006;441:223–226. doi: 10.1038/nature04676.
  • 16. Rudebeck PH, Bannerman DM, Rushworth MF. The contribution of distinct subregions of the ventromedial frontal cortex to emotion, social behavior, and decision making. Cogn Affect Behav Neurosci. 2008;8:485–497. doi: 10.3758/CABN.8.4.485.
  • 17. Izquierdo A, Suda RK, Murray EA. Bilateral orbital prefrontal cortex lesions in rhesus monkeys disrupt choices guided by both reward value and reward contingency. J Neurosci. 2004;24:7540–7548. doi: 10.1523/JNEUROSCI.1921-04.2004.
  • 18. Murray EA, Wise SP. Interactions between orbital prefrontal cortex and amygdala: Advanced cognition, learned responses and instinctive behaviors. Curr Opin Neurobiol. 2010;20:212–220. doi: 10.1016/j.conb.2010.02.001.
  • 19. Thorndike EL. A proof of the Law of Effect. Science. 1933;77(1989):173–175. doi: 10.1126/science.77.1989.173-a.
  • 20. Seo H, Lee D. Cortical mechanisms for reinforcement learning in competitive games. Philos Trans R Soc Lond B Biol Sci. 2008;363:3845–3857. doi: 10.1098/rstb.2008.0158.
  • 21. Tsujimoto S, Genovesio A, Wise SP. Monkey orbitofrontal cortex encodes response choices near feedback time. J Neurosci. 2009;29:2569–2574. doi: 10.1523/JNEUROSCI.5777-08.2009.
  • 22. Sul JH, Kim H, Huh N, Lee D, Jung MW. Distinct roles of rodent orbitofrontal and medial prefrontal cortex in decision making. Neuron. 2010;66:449–460. doi: 10.1016/j.neuron.2010.03.033.
  • 23. Wunderlich K, Rangel A, O'Doherty JP. Economic choices can be made using only stimulus values. Proc Natl Acad Sci USA. 2010;107:15005–15010. doi: 10.1073/pnas.1002258107.
  • 24. Lebreton M, Jorge S, Michel V, Thirion B, Pessiglione M. An automatic valuation system in the human brain: Evidence from functional neuroimaging. Neuron. 2009;64:431–439. doi: 10.1016/j.neuron.2009.09.040.
  • 25. de Wit S, Corlett PR, Aitken MR, Dickinson A, Fletcher PC. Differential engagement of the ventromedial prefrontal cortex by goal-directed and habitual behavior toward food pictures in humans. J Neurosci. 2009;29:11330–11338. doi: 10.1523/JNEUROSCI.1639-09.2009.
  • 26. Sugrue LP, Corrado GS, Newsome WT. Choosing the greater of two goods: Neural currencies for valuation and decision making. Nat Rev Neurosci. 2005;6:363–375. doi: 10.1038/nrn1666.
  • 27. Fellows LK. Deciding how to decide: Ventromedial frontal lobe damage affects information acquisition in multi-attribute decision making. Brain. 2006;129:944–952. doi: 10.1093/brain/awl017.
  • 28. Damasio AR. Descartes’ Error: Emotion, Reason, and the Human Brain. New York: Putnam Publishing; 1994.
  • 29. Simon HA. Models of Man. New York: Wiley; 1957.
  • 30. Rangel A, Hare T. Neural computations associated with goal-directed choice. Curr Opin Neurobiol. 2010;20:262–270. doi: 10.1016/j.conb.2010.03.001.
  • 31. Kennerley SW, Dahmubed AF, Lara AH, Wallis JD. Neurons in the frontal lobe encode the value of multiple decision variables. J Cogn Neurosci. 2009;21:1162–1178. doi: 10.1162/jocn.2009.21100.
  • 32. Sutton R, Barto AG. Reinforcement Learning. Cambridge, Massachusetts: MIT Press; 1998.
  • 33. Niv Y, Daw ND, Joel D, Dayan P. Tonic dopamine: Opportunity costs and the control of response vigor. Psychopharmacology (Berl). 2007;191:507–520. doi: 10.1007/s00213-006-0502-4.
  • 34. Morrison SE, Salzman CD. The convergence of information about rewarding and aversive stimuli in single neurons. J Neurosci. 2009;29:11471–11483. doi: 10.1523/JNEUROSCI.1815-09.2009.

Articles from Proceedings of the National Academy of Sciences of the United States of America are provided here courtesy of National Academy of Sciences