Orbitofrontal neurons infer the value and identity of predicted outcomes

Thomas A Stalnaker; Nisha K Cooch; Michael A McDannald; Tzu-Lan Liu; Heather Wied; Geoffrey Schoenbaum

doi:10.1038/ncomms4926

Author manuscript; available in PMC: 2014 Dec 4.

Published in final edited form as: Nat Commun. 2014 Jun 4;5:3926. doi: 10.1038/ncomms4926


PMCID: PMC4056018 NIHMSID: NIHMS587567 PMID: 24894805

Abstract

The best way to respond flexibly to changes in the environment is to anticipate them. Such anticipation often benefits us if we can infer that a change has occurred, before we have actually experienced the effects of that change. Here we test for neural correlates of this process by recording single-unit activity in the orbitofrontal cortex in rats performing a choice task in which the available rewards changed across blocks of trials. Consistent with the proposal that orbitofrontal cortex signals inferred information, firing changes at the start of each new block as if predicting the not-yet-experienced reward. This change occurs whether the new reward is different in number of drops, requiring signaling of a new value, or in flavor, requiring signaling of a new sensory feature. These results show that orbitofrontal neurons provide a behaviorally relevant signal that reflects inferences about both value-relevant and value-neutral information about impending outcomes.

Keywords: orbitofrontal, learning, inference, single unit, rat

The best way to respond flexibly to changes in the environment is to anticipate them. In many cases, such anticipation requires us to infer that a change has occurred, before we have actually experienced the effects of that change. For example, suppose you see your boss striding into his office in the morning with a dark scowl on his face. You realize that he is in a bad mood, and consequently you can change your expectations for what he might say to you in your morning meeting. Note that because you are using inference, the sensory cue – the scowl – need not ever be directly associated with the changed value of his words. But you, having observed the hidden sign and made the inference, can anticipate what will happen next.

The orbitofrontal cortex (OFC) has been hypothesized to track such inferred states and to signal information about expected outcomes based on them^1–3. Indeed, numerous studies have found that neural activity in OFC anticipates expected outcomes^4–12. However, the question of what function this activity performs has not been fully answered. One idea is that anticipatory activity drives decision making by representing the expected or even economic value of the outcomes that are likely to ensue in a given situation^{6, 13}. However, most OFC neurons track specific features of outcomes, like flavor, or even their context, rather than value, and respond more generally to other events in the environment^{6, 11, 14–17}. Furthermore, causal studies have reported that OFC manipulations disrupt value-guided decisions when they are based on inferences and not when they are based on simple comparisons of previously learned values^{13, 18–23}.

Here we sought to test for neural correlates of this inferential process by recording single-unit activity in OFC in rats performing a choice task in which, on each trial, the rats chose between two milk rewards that varied across blocks of trials in both number of drops and flavor. We were specifically interested in how OFC neurons would change firing in anticipation of new rewards at block switches, before those rewards were actually experienced. Consistent with the proposal that OFC infers information about expected outcomes, firing in OFC changed at the start of each new block as if predicting the not-yet-experienced reward. This change predicted the speed with which choice behavior was adjusted in that block, and it occurred whether the new reward was different in number of drops, requiring signaling of a new value, or in flavor, requiring signaling of a new sensory feature. These results show that OFC neurons infer both value-relevant and value-neutral information about impending outcomes.

RESULTS

Rats (n=6) were trained in an odor-guided choice task (illustrated in Figure 1A) to respond at either a left or right well to receive a small (one drop) or large (three drops) amount of chocolate or vanilla-flavored milk. This task was similar to an odor-guided choice task that we have used previously¹⁴, except that we manipulated reward flavor in some blocks instead of reward delay. Importantly, we used amounts and concentrations that we established separately were equivalently preferred (see Figure 1B). Response-reward contingencies were stable across blocks of ~60 trials, but switched unpredictably in reward number or reward flavor at block transitions. Contingencies were arranged so that 1) rewards in the two directions always differed in both number and flavor (e.g. large chocolate versus small vanilla or large vanilla versus small chocolate), and 2) number and flavor switches alternated according to the sequence number-flavor-number-flavor across the five blocks of every session. Free-choice and forced-choice trials, instructed by an odor delivered at the beginning of the trial, were intermixed within blocks but always had the same response-reward contingencies.

A. After initiating a trial with a nosepoke, an odor was delivered for 500 ms, after which rats responded at one of the two fluid wells for 1 or 3 drops of chocolate or vanilla milk, delivered 500 ms after the well poke. Two odors indicated *forced choices,* left or right; a third odor indicated *free choice.* Reward contingencies were stable across blocks of ~60 trials, but switched in number of drops (dashed lines) or flavor (dotted lines) in four unsignaled transitions. Rewards in the two directions always differed in both number of drops and flavor (one of the four possible block sequences is shown).

B. Chocolate and vanilla milk were equally preferred in a ten minute consumption test in a separate group of rats (t₁₀=0.1,p=0.93).

C. Choice rates in the task reflected the number of drops but not the flavor. Number switches (left panel) had a similarly large effect on choice rates for chocolate->chocolate and vanilla->vanilla switches. Flavor switches (right panel) had no effect on choice rates across big vanilla->big chocolate or big chocolate->big vanilla. Line figures show average trial-by-trial choice rates; inset bar graphs show choice rates before vs. after switches (25 trials each). ANOVA on difference in choice rates across transitions, factors transition type and initial flavor; main effect of transition type (F_1,92=195.7,p<0.001), driven by significant changes across number transitions (planned contrast, F_1,92=445.9,p<0.0001), and nonsignificant changes across flavor transitions (planned contrast, F_1,92=1.3,p=0.27); no effect of initial flavor (F_1,92=0.0,p=0.93); no differences between vanilla-to-chocolate and chocolate-to-vanilla (planned contrast, F_1,92=2.3,p=0.13).

D. Reaction time (top panel) and accuracy (bottom panel) reflected the number of drops expected but not the flavor. Bar graphs show average reaction time (from end of odor to movement) or accuracy on forced-choice trials within the last 25 trials of blocks. Within-subjects ANOVAs on reaction time and accuracy: main effects of number (F_1,93=62.2,p<0.001; F_1,93=182.3,p<0.001) but not flavor (F_1,93=0.3,p=0.57; F_1,93=5.3,p=0.024^†), nor interactions (F_1,93=0.1,p=0.73; F_1,93=5.1,p=0.027^†).

Error bars show standard errors; * p<0.001, main effects of number; ^† not significant if p-value criterion is corrected for the three separate ANOVAs testing flavor effects (p<0.0167 by Bonferroni correction).

The purpose of this task was to compare neural activity after switches in the number of reward drops, in which the value of choices changed, with that after switches in reward flavor, in which there was no change in the value of the two choices. In accord with this distinction, as shown in Figure 1C, number switches resulted in a fast-developing and sustained change in choice rate on free-choice trials, so that after ~38 trials rats were choosing the side with 3 drops of milk on ~80% of free-choice trials. The rate with which they made this switch was independent of flavor, and flavor transitions (chocolate to vanilla or vanilla to chocolate) caused no enduring change in choice rate (see figure captions for all statistics).

We also examined how reaction time and accuracy on forced-choice trials were influenced by shifts in reward number and flavor. As shown in Figures 1C–D, reaction time, defined as the time from odor offset to odor port exit, was faster and forced-choice performance was more accurate when a large reward could be expected compared to when a small reward could be expected. Again these effects were independent of reward flavor. Thus rats’ performance in this task on both free- and forced-choice trials was sensitive to the number of reward drops, thereby reflecting the higher value of a large reward, but insensitive to their flavor, reflecting the similar value placed on the chocolate and vanilla flavors.

We recorded 831 single-units from OFC in six rats during these 94 behavioral sessions, which included all recording sessions in which all five blocks were completed. The six rats completed 3, 24, 25, 28, 5 and 9 sessions, in which were recorded 28, 170, 275, 246, 28 and 84 single-units, respectively (some rats had fewer than ten completed sessions because of broken electrodes or other technical problems). The locations of the recordings are shown in Figure 2A. We were interested in reward-anticipatory activity; such activity has been reported frequently in OFC. Thus we analyzed activity in a 500 ms epoch as rats waited in the fluid port, immediately before the first drop of reward was delivered. We performed an ANOVA on firing rate for each unit during this epoch with factors reward number, reward flavor, and reward location (left or right). We used forced-choice trials for this analysis, because they were equally distributed across the levels of these factors. Many neurons were selective for number or flavor during this epoch (Figures 2B–F). Counting main effects of number along with number X flavor and number X location interactions, 278 neurons (33%) were number-anticipatory and 198 (24%) were flavor-anticipatory, including 124 (15%) with effects of both number and flavor or interactions between them. In addition, as has been reported previously, many neurons fired selectively in anticipation of rewards according to their location (384, or 46%, showed some effect of reward location), and the majority of neurons with effects of number or flavor also had some effect of location (173 of the 278 neurons with number effects, or 62%, also had location effects; 123 of 198 neurons with flavor effects, or 62%, also had location effects).
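The tallies in the paragraph above can be cross-checked with a few lines of arithmetic. The sketch below uses only the counts reported in the text; the helper name `pct` and the rounding to whole percentages are assumptions made for illustration, not the authors' code.

```python
# Per-rat tallies as reported in the text (sessions and single units).
sessions_per_rat = [3, 24, 25, 28, 5, 9]
units_per_rat = [28, 170, 275, 246, 28, 84]

total_sessions = sum(sessions_per_rat)  # reported as 94
total_units = sum(units_per_rat)        # reported as 831


def pct(count, total):
    """Percentage of `total`, rounded to the nearest whole number."""
    return round(100 * count / total)


# Population fractions quoted in the text, recomputed from the raw counts.
summary = {
    "number-anticipatory": pct(278, total_units),
    "flavor-anticipatory": pct(198, total_units),
    "both number and flavor": pct(124, total_units),
    "any location effect": pct(384, total_units),
    "number units with location effect": pct(173, 278),
    "flavor units with location effect": pct(123, 198),
}

for label, value in summary.items():
    print(f"{label}: {value}%")
```

Run as written, the totals and percentages agree with the figures quoted in the text.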

A. The black boxes indicate the approximate location from which recordings were made in each rat (in the left hemisphere). The width represents the estimated span of the electrode bundle (~1 mm), and the height represents the approximate extent of recording across all sessions. Bregma + 2.8 to 3.6 mm; scale bar = 1 mm.

B. The colored sections represent the proportion of the entire population with significant effects of number of drops, flavor, or both, based on a 3-way ANOVA performed on firing rate across trials, with factors flavor, number and reward location. The epoch tested was the 500 ms immediately preceding delivery of the first drop of reward.

C. Reward-selective neurons were distributed equally between chocolate- and vanilla-preferring, and between big- and small-preferring. Shown are flavor- and number-selectivity indices calculated for each recorded neuron, color-coded as in B. The proportions of significant neurons were not significantly different between small vs. big or between chocolate vs. vanilla by chi-square test. Big-preferring, 126 neurons; small-preferring, 152 neurons (chi-square stat: 2.4,p=0.12); chocolate-preferring, 103 neurons; vanilla-preferring, 95 neurons (chi-square stat: 0.3,p=0.57). Average magnitude of number index for large-preferring vs. small-preferring (t₂₇₆=−1.2,p=0.24); average magnitude of flavor index for chocolate-preferring vs. vanilla-preferring (t₁₉₆=1.1,p=0.29).

**D, E, F.** Each plot shows, for a single-unit example, average firing rate for each of the 4 number/flavor combinations, separately for the left and right wells and aligned on initial reward delivery (at 0 seconds). Vertical dotted lines show epoch from well-entry to reward delivery. D shows an example with number/flavor interaction, E shows an example with only a number effect, and F shows an example with only a flavor effect. All examples shown here have strong effects of location.

To determine whether neurons in these anticipatory populations encoded reward value or simply the features of available rewards, we calculated a number and flavor index for each neuron. This index was simply the difference in average peak-normalized firing rate during the reward anticipatory epoch, between the large and small or the chocolate and vanilla reward trials, respectively. As shown in Figure 2C, both number- and flavor-selective populations were equally distributed between their two poles, indicating that OFC neurons were equally likely to fire selectively in anticipation of large vs. small rewards or chocolate vs. vanilla rewards (large, 126 neurons; small, 152 neurons; chocolate, 103 neurons; vanilla, 95 neurons). In addition, the average magnitude of the number index did not differ between large- and small-preferring populations, nor did the magnitude of the flavor index differ between chocolate- and vanilla-preferring populations. Thus OFC seemed to contain similar representations of each variable reward feature, independent of their value relevance as indexed by behavior.
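As a rough illustration of the index and the distribution test described above, the following sketch computes a selectivity index from per-trial firing rates and a chi-square statistic for an equal split between two preference groups. The function names, the toy firing rates, and the choice of a one-degree-of-freedom goodness-of-fit test against a 50/50 split are assumptions; the text does not specify the exact form of the test, although this form gives values close to those reported in the Figure 2C legend.

```python
import math


def selectivity_index(rates_a, rates_b, peak_rate):
    """Difference in mean peak-normalized firing rate between two trial types.

    rates_a, rates_b: firing rates (spikes/s) in the anticipatory epoch,
    e.g. large vs. small trials, or chocolate vs. vanilla trials.
    peak_rate: the unit's peak rate, used here as the normalizer (assumed).
    """
    mean_a = sum(rates_a) / len(rates_a)
    mean_b = sum(rates_b) / len(rates_b)
    return (mean_a - mean_b) / peak_rate


def chi_square_equal_split(n_a, n_b):
    """Chi-square statistic and p-value for n_a vs. n_b against a 50/50 split.

    With one degree of freedom the chi-square survival function reduces to
    erfc(sqrt(stat / 2)), so no external statistics library is needed.
    """
    expected = (n_a + n_b) / 2
    stat = ((n_a - expected) ** 2 + (n_b - expected) ** 2) / expected
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Toy example (made-up rates): a unit firing more before large rewards.
example_index = selectivity_index([10.0, 12.0], [4.0, 6.0], peak_rate=20.0)

# Counts reported in the text: big- vs. small-preferring, chocolate vs. vanilla.
number_stat, number_p = chi_square_equal_split(126, 152)
flavor_stat, flavor_p = chi_square_equal_split(103, 95)
print(round(number_stat, 1), round(number_p, 2))
print(round(flavor_stat, 1), round(flavor_p, 2))
```

A positive index here means the unit fired more in anticipation of the first trial type; the sign convention is arbitrary.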

We next asked how these neurons changed their firing rates across number and flavor transitions. Because we were particularly interested in whether they would signal information about expected outcomes prior to direct experience with them in a new block, we took advantage of one salient feature of our task, which was that a change in reward on one side could be used to predict the features of the new reward on the other side. For example, the sudden receipt of a large chocolate reward on the left meant that a small vanilla reward would now be available on the right. Thus, when a rat received a new reward on one side at the beginning of a new block, it could immediately infer the features - both size and flavor - of the new reward available on the other side. To test whether this inference was evident in the firing rate of outcome-anticipatory OFC neurons, we identified which direction the rats had first received reward after each block switch, and then examined neural activity on the first response in the other direction – i.e. the first trial in the off-direction. For simplicity, we will call this trial the “inference trial.” We eliminated all cases in which a free-choice trial occurred before the inference trial on the same side, because that free-choice trial would have had the same reward as on the inference trial. Across all block switches, an average of 2.1±0.1 rewarded trials occurred in the first direction before the critical inference trial occurred in the off-direction.
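The selection rule for the inference trial can be sketched in code. This is one possible reading of the description above, not the authors' implementation: the `Trial` record, its field names, and the decision to consider only rewarded trials and to require the inference trial to be a forced-choice trial are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Trial:
    side: str       # "left" or "right": the well the rat responded at
    kind: str       # "forced" or "free"
    rewarded: bool  # True if reward was delivered on this trial


def find_inference_trial(block: List[Trial]) -> Optional[int]:
    """Index of the inference trial in a new block, or None if excluded.

    The first rewarded trial defines the "first direction". The inference
    trial is the first rewarded response in the other direction. If that
    response was a free-choice trial, the block switch is excluded (None),
    since the new reward on that side would already have been experienced
    before any later forced-choice trial.
    """
    first_side = None
    for i, trial in enumerate(block):
        if not trial.rewarded:
            continue
        if first_side is None:
            first_side = trial.side
        elif trial.side != first_side:
            return i if trial.kind == "forced" else None
    return None


def rewarded_trials_before(block: List[Trial], index: int) -> int:
    """Number of rewarded trials preceding the inference trial."""
    return sum(1 for t in block[:index] if t.rewarded)


# Toy block start (made-up): two rewards on the left, then a forced right trial.
block = [
    Trial("left", "forced", True),
    Trial("left", "free", True),
    Trial("right", "forced", True),
    Trial("left", "forced", True),
]
idx = find_inference_trial(block)
print(idx, rewarded_trials_before(block, idx))
```

Averaging `rewarded_trials_before` over all included block switches would give the kind of quantity the text reports as 2.1±0.1.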

We examined activity in all neurons that were number- or flavor-anticipatory by an ANOVA, using only trials in the second half of blocks (so as not to pre-select the population for early-block selectivity, which we were examining here). We tested activity across block switches in which a preferred reward feature for a particular neuron occurred in the block before or after the switch, comparing the firing rate on the inference trial with that at the end of the previous block (see Methods). If the population signaled the inference, then in the latter case (preferred feature after switch) firing rate might be expected to increase on the inference trial, while in the former case (preferred feature before switch) firing rate might be expected to decrease on the inference trial. As illustrated in Figure 3 for single-unit examples and Figure 4 for the population, we found exactly this pattern of firing for both number- and flavor-selective neurons. For example, the unit in 3A was selective late in blocks for the small reward (1 drop) at the left well. When the reward on the left changed from 3 drops to 1 drop, the neuron fired phasically in anticipation of the new small reward the first time 1 drop was delivered on the left (i.e. before it was directly experienced). The only basis for making this prediction would be knowledge of the associative structure of the task in combination with the memory of having received a new 3-drop reward at the right well on the past few trials. These two pieces of information could be used to infer that it would receive a 1-drop reward the next time it responded at the left well. Importantly, such changes in firing were only observed on the inference trial; if we instead analyzed activity on the final trial of the previous block (i.e. one trial before the inference trial), there was no significant change in activity (Figure 5). 
Finally, we performed a second control analysis, shown in Figure 6, in which number-selective neurons were tested across flavor block switches, and vice versa; neither of these conditions showed an effect of block switch. A direct comparison of the two control conditions with the inference condition also showed that the observed pattern of changes in activity was specific to the inference trial (stats for this comparison are in the legend for Figure 6).

(A and B) show number-selective neurons across number block switches and (C and D) show flavor-selective neurons across flavor block switches. Plots show average firing rate for the previous block (light-colored line), the new block (dark-colored line), and the first trial of the new block in the direction that occurred second (the “inference trial”; dotted line). Plots are aligned on initial reward delivery (at 0 seconds); vertical dotted lines show epoch from well-entry to reward delivery. Activity on the opposite side to the inference, in which rewards are always opposite in flavor and number, is also shown. When the block and direction switched from the anti-preferred to preferred reward feature, as in examples A and C, anticipatory firing on the inference trial tended to be greater than in the previous block; when it switched from preferred to anti-preferred, as in examples B and D, anticipatory firing rate on the inference trial tended to be less than in the previous block.

(A, C, E) show number switches and (B, D, F) show flavor switches. **A, B.** Shown is average peak-normalized activity in the 500 ms before reward delivery, for all number-selective neurons (A) or flavor-selective neurons (B) on the inference trial, compared to the end of the previous block in the same direction. Neurons were separated by whether the new block (dark color) or previous block (light color) contained their preferred reward feature. Block switches were only included when the rat had received the new reward at least once *on the opposite side* before the first trial shown in this figure. This information would in theory allow the rat to know that a block switch had taken place, and, based on the rules about how rewards were paired in blocks, infer the new reward feature to be presented on that side. The significant interaction between the two conditions (dark- and light-colored lines) in both number- and flavor-anticipatory populations indicates that these populations signaled this inference.

** significant interaction at p<0.001 across both populations; see below for detailed statistics.

**C, D, E, F.** Each figure shows population activity across the trial in the conditions summarized in A and B. In C and D, the new block contains the preferred reward feature (dark-colored line in A and B); phasic activity increased for the new block. In E and F the previous block contained the preferred reward feature (light-colored line in A and B); phasic activity decreased in the new block, especially immediately before reward delivery. Activity in all panels is aligned to the first drop of reward delivery, shown by arrows and dashed lines.

ANOVA on anticipatory firing rate across neurons, within-subjects factor before/after, between-subjects factors number/flavor and switch type (anti-preferred→preferred or preferred→anti-preferred): Significant interaction between before/after X switch type (F_1,320=15.2,p<0.001), with no interaction of this effect with number/flavor (F_1,320=1.1,p=0.29). Before/after X switch type interaction was significant for the number-selective population (F_1,320=5.1,p<0.05), and the flavor-selective population (F_1,320=10.2,p<0.01). Anti-preferred→preferred increased significantly (F_1,320 = 5.6,p<0.05), with no interaction with number/flavor (F_1,320=0.0,p=0.92). Preferred→anti-preferred decreased significantly (F_1,320=9.8,p<0.01) with no interaction with number/flavor (F_1,320=2.4,p=0.12).

**A, B.** Shown is average peak-normalized activity in the 500 ms before reward delivery, for all number-selective neurons (A) or flavor-selective neurons (B) on the last trial before the block switch compared to the end of that block not including the last trial. This analysis includes the same populations in the same blocks as in the inference analysis, with neurons categorized as anti-preferred or preferred in the same way.

**C, D, E, F.** Each panel shows population activity across the trial in the conditions summarized in A and B. Activity in all panels is aligned to reward delivery, shown by arrows and dashed lines.

ANOVA on anticipatory firing rate across neurons, with within-subjects factor before/after, and between-subjects factors number/flavor and preferred/anti-preferred: No interaction of before/after X preferred/anti-preferred (F_1,320=0.3,p=0.57), and no other effects (Fs<2.3, ps>0.13). Neither the number-selective population (F_1,320=0.0,p=0.90) nor the flavor-selective population (F_1,320=0.4,p=0.53) showed an interaction between before/after X preferred/anti-preferred. At neither the level of preferred nor anti-preferred was there a significant effect of before vs. after (F_1,320=2.1,p=0.15; F_1,320=0.5,p=0.49).

**A, B.** Shown is average peak-normalized activity of the same populations during the same epoch shown in Figure 4, but here number-anticipatory neurons are shown across flavor switches (A) and flavor-anticipatory neurons across number switches (B). Neurons were separated by whether in the inference condition (shown in Figure 4) the new block and direction contained their preferred reward feature (dark color) or their anti-preferred reward feature (light color). Block switches were only included when the rat had received the new reward at least once *on the opposite side* before the first trial shown in this figure.

**C, D, E, F.** Each panel shows population activity across the trial in the conditions summarized in A and B. Activity in all panels is aligned to reward delivery, shown by arrows and dashed lines.

ANOVA on anticipatory firing rate across neurons, within-subjects factor before/after, and between-subjects factors number/flavor and switch type (anti-preferred → preferred or preferred → anti-preferred): No interaction of before/after X switch type (F_1,320=1.0,p=0.32), and no other effects (Fs<0.6, ps>0.45). Neither the number-selective population (F_1,320=1.4,p=0.23) nor the flavor-selective population (F_1,320=0.1,p=0.74) showed an interaction between before/after X switch type.

To compare the two control conditions (Figures 5 and 6) with the inference condition, we ran an ANOVA on the difference scores (after - before) for the inference condition, first control condition (Figure 5), and second control condition (Figure 6), as a within-subjects factor, and switch type (anti-preferred → preferred or preferred → anti-preferred) as a between-subjects factor. This ANOVA revealed an interaction between difference-score and switch-type (F_2,644 = 6.3, p=0.002). The interaction between inference vs. the two controls (pooled) X switch-type was also significant (planned comparison: F_1,322 = 12.2, p=0.0006). And, at each level of switch-type, the comparison of inference vs. the two controls was also significant (planned comparisons: anti-preferred to preferred, F_1,322 = 8.2, p=0.004; preferred to anti-preferred, F_1,322 = 4.3, p=0.039).

Interestingly, the inference signal observed in OFC neurons was not tightly locked to the rats’ behavior. For number switches, OFC neurons seemed to infer the outcome before the rats’ behavior clearly reflected this knowledge, at least according to our standard behavioral metrics. Rats took about 12 rewarded trials (6 in each direction) after a block switch to reach a 50% choice rate (i.e. choosing both sides equally). In contrast, the inference signal was detectable before the first time a new reward was delivered in the off-direction. In addition, for flavor switches, the inference signal seemed to be completely dissociated from behavior, because rats did not change their behavior in any obvious way in response to flavor changes. This suggests that the neural inference signal was not a consequence of behavior, but it also calls into question whether the signal has any behavioral relevance.

To examine this issue, we looked at choice rate on blocks in which we observed strong inference signals versus those in which we observed weak inference signals, by calculating an inference score for each number-selective neuron across each included block switch. This score was simply the difference between the normalized firing rate on the inference trial of the new block minus that in the same direction at the end of the previous block (see Methods). As shown in Figure 7A, when the number of drops changed, a large inference score was associated with a rapid switch in choice behavior. This difference was most pronounced in the first five trials of the block, but was significant until about trial ten. Accordingly, across the first ten trials of the block, we found a significant positive correlation between the neural inference score of the single-units and the rats’ choice rates (R = 0.28, p < 0.01, Pearson correlation). This correlation was specific for activity on the inference trial. As shown in Figure 7B, when we did a parallel analysis using the last trial of the previous block in place of the inference trial, there was no correlation with choice behavior in the new block (R = −0.031, p = 0.75, Pearson correlation).
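The inference score, the median split, and the correlation with early choice rate can be sketched as follows. All numbers below are made-up toy values, and the function names are hypothetical. The score is taken as a plain difference, as described in the text; any sign convention or normalization detail given in Methods is not reproduced here. Ties at the median are assigned to the lower group, which is also an assumption.

```python
import math
from statistics import mean, median


def inference_score(rate_inference_trial, rate_prev_block_end):
    """Normalized rate on the inference trial minus the normalized rate in the
    same direction at the end of the previous block."""
    return rate_inference_trial - rate_prev_block_end


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def median_split(scores):
    """Indices of scores above the median, and of scores at or below it."""
    m = median(scores)
    above = [i for i, s in enumerate(scores) if s > m]
    below = [i for i, s in enumerate(scores) if s <= m]
    return above, below


# Toy data: (inference-trial rate, end-of-previous-block rate) per unit/switch,
# and the choice rate toward the big reward over the first ten trials.
rates = [(0.6, 0.5), (0.9, 0.4), (0.3, 0.5), (0.7, 0.4)]
choice_rate_first_ten = [0.40, 0.70, 0.20, 0.55]

scores = [inference_score(after, before) for after, before in rates]
above, below = median_split(scores)
r = pearson_r(scores, choice_rate_first_ten)
print(above, below, round(r, 2))
```

The same functions could be applied to the control analysis by substituting the last trial of the previous block for the inference trial.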

A. Shown is the average trial-by-trial choice rate across number block switches when a neuron with an inference score *above* (green line) or *below* (black line) the median was recorded. Inset shows average choice rate across the first 5 trials, next 5 trials, or trials 11–60. Scatter plot shows the correlation between the single-unit inference score (difference in normalized firing rate between inference trial and end of previous block; see Methods) and choice rate over the first ten trials. Choice rate is defined as the percentage of choices towards the side with the big reward in the new block. ANOVA on choice rate with factors group and trial-set (1–5, 6–10, or 11–60) found a significant effect of group (F_1,513=17.6,p<0.0001) and a group X trial-set interaction (F_2,513=3.6,p<0.05); planned comparisons between groups were significant at trials 1–5 (F_1,513=10.7,p<0.01) and trials 6–10 (F_1,513=12.3,p<0.001), but not at trials 11–60 (F_1,513=0.1,p=0.76).

** p < 0.01.

B. For this control analysis, activity on the last forced-choice trial before the block switch was used to compute the difference score (difference in normalized firing rate between that trial and trials at the end of that block not including that trial). ANOVA on choice rate with factors group and trial-set (1–5, 6–10, or 11–60) found a small effect of group (F_1,513=4.3,p<0.05), but no group X trial-set interaction (F_2,513=0.0,p=0.96), and no effect of group at any level of trial-set (Fs<2.1, ps>0.15).

The relationship between the inference signal and the rats’ choice performance can also be shown if we divide sessions by whether the rats switched quickly to the large reward at the start of the block (choice rate in first ten trials above the median) or only more slowly (choice rate in first ten trials below the median). As illustrated in Figure 8, this analysis shows that neural inference was only observed at the start of fast-switching blocks and not at all at the start of slow-switching blocks. Again, this difference was specific for the inference trial: when we compared activity either on the last trial of the previous block, or at the end of the new block, fast-switching and slow-switching blocks did not differ in OFC reward selectivity.

**A, B.** Shown is average peak-normalized activity of all number-anticipatory neurons, as in Figure 4, here separated by whether choice rate on the first ten trials was *above* (A) or *below* (B) the median. Dotted line shows activity at the end of the block in same direction. The two groups differed in selectivity on the inference trial, but not on the trial immediately before it (the last trial in the previous block, not shown here) nor at the end of the block.

** significant interaction at p<0.01 (see below for detailed statistics).

**C, D, E, F.** Each panel shows population activity across the trial in the conditions summarized in A and B. Activity in all panels is aligned to reward delivery, shown by arrows and dashed lines.

ANOVA on difference in firing rate from the end of the previous block, factors: trial (last trial of previous block, inference trial, end of new block), switch type (anti-preferred→preferred/preferred→anti-preferred), and group (fast-switching/slow-switching), found a significant 3-way interaction (F_2,366=3.9,p<0.05). In the fast-switching group, effect of switch type was not significant on last trial (F_1,183=0.3,p=0.60), but was significant on inference trial (F_1,183=10.7,p<0.01) and at end (F_1,183=58.5,p<0.0001). For the slow-switching group, no effect of switch type on last trial (F_1,183=0.2,p=0.67) nor inference trial (F_1,183=0.5,p=0.48) but a significant effect at end (F_1,183=44.0,p<0.0001). Interaction between group X switch type was not significant on last trial (F_1,183=0.0,p=0.94) nor at end (F_1,183=0.4,p=0.52), but was on inference trial (F_1,183=7.9,p<0.01). Inference trial vs. end of the new block X switch type interaction was not significant for the fast-switching group (F_1,183=0.2,p=0.65), but was significant for the slow-switching group (F_1,183=17.4,p<0.0001); interaction between these interactions was significant (F_1,183=7.0,p<0.01). No effect of group and no group X preferred vs. anti-preferred interaction on firing rate at the end of the previous block (leftmost points in top panels; F_1,183=0.3,p=0.60; F_1,183=0.2,p=0.63).

Given the relationship between the number inference signal and behavior, we next wondered whether flavor inferences had any relationship with behavior. Although flavor changes had no enduring effect on choice rate, a small downward deflection in choice rate appeared to occur immediately after flavor switches (Figure 1C). We tested whether this change was significant by examining choice rate in the first two free-choice trials after flavor switches. Indeed, there was a small but significant decrease in the choice rate towards the large reward (or, equivalently, an increase in choice rate towards small reward) in those first two free-choice trials, as compared to the last two free-choice trials of the previous block. This difference disappeared in the next two free-choice trials of the new block (last two choice trials of previous block: 15±2.0% small choice; first two of new block: 22±2.5%; next two of new block: 17±2.3%; a within-subjects ANOVA on choice rate, with factor trial-pair found a main effect of trial pair: F_2,374=3.2,p<0.05; planned comparison between last two trials and first two trials: F_1,187=6.3,p<0.05; planned comparison between last two trials and second two trials: F_1,187=0.5,p=0.47).

To test whether this behavioral effect of flavor switches was related to the flavor inference neural signal, we divided flavor block switches by whether the first two free-choice trials after the switch had at least one choice of the small reward, or whether they had none (this split was in effect a median split, because the median number of small choices in the first two was zero). We re-examined neural activity in the same flavor anticipatory population analyzed earlier across flavor switches, this time testing each side of the split separately. As shown in Figure 9, in the small-choice group, inference signaling was significant on the inference trial, but did not change significantly between the inference trial and the end of the block. Conversely, in the zero-small-choice group, neurons did not show significant changes in activity on the inference trial, and all updating occurred between the inference trial and the end of the new block. Thus, like number inference signaling, strong flavor inference signaling appears to be associated with a behavioral change that may indicate immediate awareness of the switch. However, we did not see a significant interaction between group and signaling on the inference trial, indicating that this behavioral measure was not as closely tied to inference signaling as the speed of choice updating was. We were also unable to test for a correlation with this behavior, because almost all switches had either zero small choices or one small choice within these two free-choice trials.

**A, B.** Shown is average peak-normalized activity of all flavor-anticipatory neurons, as in Figure 4, here separated by whether rats chose small reward on one of the first two free-choice trials of the new block (A) or on neither (B). Dotted line shows activity at the end of the block in same direction. Choosing small reward may indicate immediate awareness of the block switch (see Results). In A, all significant updating occurred on the inference trial, whereas in B, all significant updating occurred between the inference trial and the end of the block.

*, ** significant interaction at p<0.05 or p<0.01, respectively (see below for detailed statistics).

**C, D, E, F.** Each panel shows population activity across the trial in the conditions summarized in A and B. Activity in all panels is aligned to reward delivery, shown by arrows and dashed lines.

ANOVA on difference in firing rate from the end of the previous block, with factors trial (last trial of previous block, inference trial, end of new block), switch type (anti-preferred→preferred or preferred→anti-preferred), and group (small-choice/no-small-choice), found a trend towards a significant effect of group (F_1,126=3.3,p=0.071). In the small-choice group, effect of switch type was not significant on last trial (F_1,126=2.0,p=0.16), but was significant on inference trial (F_1,126=4.0,p<0.05) and at end (F_1,126=19.9,p<0.0001). For the no-small-choice group, effect of switch type was not significant on last trial (F_1,126=0.3,p=0.62) nor on inference trial (F_1,126=2.1,p=0.15) but was significant at end (F_1,126=76.5,p<0.00001). No significant group X switch type interactions at any level of trial (Fs<2.2,p’s>0.13). Inference trial vs. end X switch type interaction was not significant for the small-choice group (F_1,126=0.1,p=0.71), but was significant for the no-small-choice group (F_1,126=9.1,p<0.003); interaction between these interactions was not significant (F_1,126=1.4,p=0.24). No effect of group on firing rate at the end of the previous block (leftmost points in top panels; F_1,126=0.1,p=0.71) and no group X preferred vs. anti-preferred interaction at the end of the previous block (F_1,183=0.0,p=0.97).

The question of whether inference signaling reflects partial or complete updating of the number or flavor selectivity can be answered using the split data described above. For both number and flavor block switches, when rats showed behavioral evidence of being aware of the block change (i.e. either fast choice switching after number switches or transient small-choices after flavor switches), no significant additional updating occurred between the inference trial and the end of the new block. Conversely, when there was less evidence that rats were aware of the block change, the inference trial itself showed no evidence of updating, and instead all significant updating happened between the inference trial and the end of the new block.

Finally, we asked whether any behavioral or neural factor might predict the strength of either the drop-number or flavor inference signal. We examined 1) the total number of trials in the previous block, 2) the percent of block switches that were the first switch of the session, 3) the choice rate at the end of the previous block, and 4) the total number of errors in the previous block. As detailed in Table 1, none of these factors were significantly different between block switches in which the inference signal was above the median as compared to when it was below the median. Likewise, as illustrated in the leftmost points in the top panels of Figures 8 and 9, the strength of drop-number or flavor selectivity at the end of the previous block failed to predict whether rats would switch choice rate quickly in the new block (for drop-number switches), or whether the first two free-choice trials would contain a small choice (for flavor switches).

Table 1.

Behavioral variables from the block previous to block switches in which inference signals are greater or less than the median

Behavioral variable	Inf. sig. > median	Inf. sig. < median	Comparison
Number Inferences
# trials in prev. block¹	53.2±2.0	53.4±2.1	t₁₉₂ = −0.1 p=0.95
% of cases that are the 1st switch of the session	53 of 97 (54%)	47 of 97 (49%)	p=0.22 (by chi-square test)
Choice rate towards big outcome in last 20 trials of previous block¹	89.3±1.4%	90.7±1.1%	t₁₉₂ = −0.8 p=0.42
Total errors, prev. block¹	6.6±0.8	6.5±0.7	t₁₉₂ = 0.1 p=0.88
Flavor Inferences
# trials in prev. block¹	62.7±0.7	62.7±0.6	t₁₆₇ = 0.0 p=0.99
% of cases that are the 1st flavor switch of the session	38 of 72 (53%)	45 of 97 (46%)	p=0.28 (by chi-square test)
Choice rate towards big outcome in last 20 trials of previous block¹	83.0±2.4%	79.6±2.1%	t₁₆₅ = 1.1 p=0.29
Total errors, prev. block¹	5.6±0.7	6.0±0.6	t₁₆₇ = −0.4 p=0.66


Mean ± standard error is shown.

DISCUSSION

The ability to predict likely outcomes or consequences is fundamental to normal adaptive behavior and learning. This ability obviously benefits if changes in outcomes can be inferred based on an understanding of the rules or causal structure of the environment rather than requiring direct experience. Broadly speaking, this is the basis of the distinction between so-called habitual or model-free behavior, in which responses are made based on pre-computed or cached values or “policies”, and goal-directed, outcome-guided, or model-based behavior, in which responses are based on values or policies that are arrived at on-the-fly through a process of mental simulation^24–27.

Historically the OFC has been strongly associated with emotions and associative behavior^28–30; however, in the past decade, accounts of this involvement have been increasingly dichotomized to reflect the division between signaling information derived solely from direct experience versus information that is inferred from access to relationships between events in the environment or task at hand. OFC is typically not necessary for the former but is often critical for the latter^{13, 18, 20–23}. This has been interpreted as showing a role for the OFC in signaling an inferred or derived rather than a pre-computed value^{6, 13}.

Yet a growing number of accounts implicate the OFC not only in signaling inferred value but also in signaling value-independent information about impending events. This is evident in single unit and fMRI studies, in which neural signals in OFC represent features of impending outcomes – and in fact other predictable events – even when they are value neutral^{6, 11, 14, 15, 31}. And in rats there exists clear evidence that the OFC is necessary for behavior and learning that reflects these specific sensory features^{18, 19, 32}. These data raise the possibility that the OFC may not represent only value, even inferred, but instead may be a fundamental part of the circuit that represents the associative structure of the environment ³³.

Here we provide single-unit evidence consistent with this proposal. Specifically we found that single units in rat OFC fired in anticipation of expected outcomes in a way that reflected both value-based and value-neutral features of the outcomes. Indeed the strength or directionality of the correlates did not appear to be especially tied to whether or not the features had implications for value. As such, these results are consistent with a variety of prior reports that have emphasized the signaling of value-neutral associative features in the orbitofrontal cortex^{11, 15–17}. Of course, information about value-neutral features could still be used to derive value or to drive goal-directed behavior, but regardless of its use, these results suggest that OFC neurons represent a rich description of the specific features defining the outcome, including even the location or directional response required to obtain it.

Furthermore, this information was represented immediately at the start of new trial blocks, even before the predicted outcome features had been directly re-experienced. As noted earlier, such a prediction would require knowledge of the rules or associative structure of the task. Rats may use this knowledge to mentally simulate predicted outcomes (as model-based processing is usually conceived) or, alternatively, they may recognize which of the unique task “states” (i.e. the arrangement of outcomes in each block) they are currently in. Notably, both approaches require rats to represent the “state space” or, equivalently, the associative structure of the task, in concert with more local information (recently experienced outcomes), to make predictions that in turn allow fast updating of choice behavior. This linking together of associative information to promote flexible responding is consistent with the proposal that the OFC is part of a network involved in tracking hidden environmental states and making predictions about impending events ³³. This role may be one reason the OFC is important for behavior in settings similar to the current task, which involve reversals ^{21, 30, 34–36}; however, it may also be related to OFC’s involvement in flexible behavior that goes beyond immediate or even direct experience ^37–41.

METHODS

Subjects

Male Long-Evans rats were obtained at 175–200g (approximately 60 days old on arrival) from Charles River Labs, Wilmington, MA. Rats were tested at the University of Maryland School of Medicine in accordance with School of Medicine and NIH guidelines.

Surgical procedures and histology

Surgical procedures followed guidelines for aseptic technique. Electrodes, consisting of drivable bundles of eight 25-µm diameter FeNiCr wires (Stablohm 675, California Fine Wire, Grover Beach, CA) electroplated with platinum to an impedance of ~300 kOhms, were manufactured and implanted as in prior recording experiments. Drivable electrodes were implanted in the left orbitofrontal cortex (n = 6; 3.0 mm anterior to bregma, 3.2 mm laterally, and, to begin, 4.0 mm ventral to the surface of the brain) in each rat. Four rats also had electrodes implanted in the ventral striatum; those data are not included in this report. At the end of the study, the final electrode position was marked, the rats were euthanized with an overdose of isoflurane and perfused, and the brains were removed from the skulls and processed using standard techniques.

Behavioral task

Recording was conducted in aluminum chambers, on one wall of which was a panel with an odor port and two fluid wells arranged below it (see Figure 1). The odor port was connected to an air flow dilution olfactometer to allow the rapid delivery of olfactory cues. The fluid wells were connected to fluid delivery lines containing flavored milk (Nesquik brand chocolate or vanilla) diluted 50% with water. Delivery of odors at the odor port and the fluids at the fluid wells was controlled by a custom C++ program interfaced with solenoid valves. Photobeam breaks at the port and wells were monitored and recorded by the program. A houselight was also controlled by the program.

We trained rats extensively before implanting them with electrodes. After implantation, we retrained rats to work with the recording cable. Each training session included as many trials as a rat would perform before quitting, ~150–250. This initial shaping phase gradually introduced all elements of the task (described below), and thus rats could learn the associative structure of the task over this period. Recording was begun when rats could complete five blocks of trials (at least 260 trials) with the cable. Total number of pre-recording training sessions averaged 32.5 (ranging from 24 to 43).

Each recording session consisted of a series of self-paced trials organized into five blocks. Rats could initiate a trial by poking into the odor port while the house light was illuminated. Beginning 500 ms after the odor poke, an odor would be delivered for 500 ms. If the rat withdrew from the odor port before completion of the 1000 ms pre-odor + odor period, the trial would be aborted and the houselight turned off. At the end of the odor, rats could respond by moving from the odor port to the left fluid well or the right fluid well, after which they had to wait for 500 ms before fluid delivery began; if they exited the well during this period, no fluid was delivered and the trial ended. The identity of the odor specified whether they could receive reward at the left well (forced-choice left), the right well (forced-choice right), or either well (free-choice). The identity and meaning of these odors remained the same across the entire experiment. Odors were presented in a pseudorandom sequence such that the free-choice odor was presented on 7/20 trials and the left/right odors were presented in equal numbers (±1 over 250 trials). In addition, the same odor could be presented on no more than 3 consecutive trials.
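The sequencing constraints described above can be illustrated with a short sketch. The original task was controlled by a custom C++ program whose algorithm is not described here, so everything below is an assumption: the function names, the use of 20-trial chunks, and the rejection-sampling approach are hypothetical and chosen only to satisfy the stated constraints (free-choice odor on 7/20 trials, balanced left/right odors, no more than 3 consecutive presentations of the same odor).

```python
import random


def max_run(seq):
    """Return the length of the longest run of identical consecutive items."""
    longest = current = 0
    previous = None
    for item in seq:
        current = current + 1 if item == previous else 1
        previous = item
        longest = max(longest, current)
    return longest


def odor_sequence(n_trials, seed=None, max_consecutive=3):
    """Hypothetical generator for a pseudorandom odor sequence.

    Assumption: the sequence is built from shuffled 20-trial chunks, each
    holding 7 free-choice trials and 13 forced-choice trials (alternating
    7 left/6 right and 6 left/7 right so the two sides stay balanced).
    """
    rng = random.Random(seed)
    sequence = []
    extra_left = True
    while len(sequence) < n_trials:
        n_left = 7 if extra_left else 6
        chunk = ["free"] * 7 + ["left"] * n_left + ["right"] * (13 - n_left)
        while True:
            rng.shuffle(chunk)
            # Include the tail of the existing sequence so that runs
            # spanning the chunk boundary are also rejected.
            if max_run(sequence[-max_consecutive:] + chunk) <= max_consecutive:
                break
        sequence.extend(chunk)
        extra_left = not extra_left
    return sequence[:n_trials]
```

In this sketch the left/right balance is exact only over whole pairs of chunks; a session that stops partway through a chunk may deviate slightly, which is a property of the sketch and not a claim about the original program.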

Rewards were either one bolus or three boli of chocolate or vanilla milk, with bolus size ~0.05 ml and 500 ms between boli. Response-reward contingencies were consistent within blocks of trials, such that the same reward would be delivered for every correct right response, either free- or forced-choice, and a different reward would be delivered for every correct left response, free- or forced-choice. The reward schedule was arranged so that in each block, reward features available on one side were always paired with the opposite reward features on the other side – thus when one drop of chocolate milk was available on the left, three drops of vanilla were available on the right, etc., resulting in a total of four different reward combinations. On the first block, consisting of on average 43 (standard deviation 16) trials that were used to set the rats’ expectations before the first block switch, one of these combinations was randomly chosen. The subsequent four block transitions then followed, in order: 1) a drop-number transition, in which the side with one drop changed to three drops and vice versa, but the side-flavor contingencies remained the same; 2) a flavor transition, in which the side with chocolate changed to vanilla and vice versa, but the side-number contingencies remained the same; 3) another drop-number transition; 4) another flavor transition. These block transitions were not explicitly signaled. The length of the last four blocks varied non-systematically around 65, with a standard deviation of 10.7 across the experiment.
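The block structure just described can be written out as a minimal sketch. The function and variable names are hypothetical; only the rules (opposite features on the two sides, and the fixed number, flavor, number, flavor order of transitions) come from the text.

```python
import random

OTHER_DROPS = {1: 3, 3: 1}
OTHER_FLAVOR = {"chocolate": "vanilla", "vanilla": "chocolate"}


def block_schedule(seed=None):
    """Return the five blocks of one session as left/right reward pairs.

    Each reward is a (drops, flavor) tuple. The first block is one of the
    four possible combinations chosen at random; later blocks follow the
    transition order number, flavor, number, flavor.
    """
    rng = random.Random(seed)
    left = (rng.choice([1, 3]), rng.choice(["chocolate", "vanilla"]))
    blocks = []
    for transition in [None, "number", "flavor", "number", "flavor"]:
        drops, flavor = left
        if transition == "number":
            drops = OTHER_DROPS[drops]      # flavor stays on the same side
        elif transition == "flavor":
            flavor = OTHER_FLAVOR[flavor]   # drop number stays on the same side
        left = (drops, flavor)
        # The other well always carries the opposite value of both features.
        right = (OTHER_DROPS[drops], OTHER_FLAVOR[flavor])
        blocks.append({"left": left, "right": right})
    return blocks
```

One consequence that the sketch makes explicit is that, after two number transitions and two flavor transitions, the fifth block has the same reward arrangement as the first.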

During testing, rats were limited to 10 minutes of ad lib water each day, in addition to fluid earned in the task.

Flavor preference testing

In six rats from a separate experiment (same strain and source, and same water restriction regimen) we compared consumption of the chocolate vs. vanilla milk solution in two-bottle tests. All rats were tested for ten total minutes, with the location of the bottles swapped every 30 seconds. Two rats were given five 2-minute tests while the other four rats were given one 10-minute test each.

Single-unit recording

Procedures were the same as described previously⁴². Wires were screened for activity daily; if no activity was detected, the rat was removed and the electrode assembly was advanced 40 or 80 µm. Otherwise a session was conducted, and the electrode was advanced by at least 40 µm at the end of the session. Neural activity was recorded using Plexon Multichannel Acquisition Processor systems (Dallas, TX), interfaced with odor discrimination training chambers. Signals from the electrode wires were amplified and filtered by standard procedures described in previous studies. Waveforms (>2.5:1 signal-to-noise) were extracted from active channels and recorded with event timestamps sent by the behavioral program. Waveforms were not inverted before data analysis.

Data analysis

Units were sorted using Offline Sorter software from Plexon Inc. (Dallas, TX), using a template matching algorithm. Sorted files were then processed in Neuroexplorer to extract unit timestamps and relevant event markers. These data were subsequently analyzed in Matlab (Natick, MA).

To analyze reward anticipatory activity, we examined firing rate in the 500ms epoch between fluid well entry and the first bolus of reward delivery. We performed an ANOVA (p < 0.05) on each neuron’s firing rate during this epoch, with baseline (from houselight on to odor initiation) subtracted, using factors number of reward drops (1 or 3), reward flavor (chocolate or vanilla), and well location (left or right). In the initial ANOVA, we included all correct forced-choice trials across the last four blocks (across which all factors were completely crossed and balanced). Every neuron with a number or flavor effect, regardless of whether it showed an inhibitory or excitatory response, was included.
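As an illustration of the epoch measure entered into this ANOVA, the following sketch computes a baseline-subtracted firing rate for one trial. The analysis itself was done in Matlab; the function names, the argument names, and the choice to treat the odor-port poke as the end of the baseline period are assumptions made for this example.

```python
def rate_in_window(spike_times, start, end):
    """Firing rate (spikes/s) in the half-open window [start, end), times in seconds."""
    if end <= start:
        raise ValueError("window must have positive duration")
    n_spikes = sum(1 for t in spike_times if start <= t < end)
    return n_spikes / (end - start)


def anticipatory_rate(spike_times, light_on, odor_poke, well_entry, first_bolus):
    """Reward-anticipatory rate minus the pre-trial baseline rate.

    Assumption: the baseline runs from houselight onset to the odor poke, and
    the analyzed epoch runs from fluid well entry to the first reward bolus.
    """
    baseline = rate_in_window(spike_times, light_on, odor_poke)
    epoch = rate_in_window(spike_times, well_entry, first_bolus)
    return epoch - baseline
```

For example, with 2 spikes in a 1 s baseline and 4 spikes in the 500 ms epoch, the measure is 8 − 2 = 6 spikes/s.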

To measure the degree of number- and flavor-selectivity, we calculated a number and flavor index for each neuron. To do so, we peak-normalized each neuron by dividing all firing rates by a normalization factor, which was the peak firing rate across the trial in 500 ms bins after averaging across the first ten and last ten of each of the eight number-flavor-well conditions in the last four blocks. We then calculated the number and flavor indices as:

(1)
number index = average peak-normalized firing rate on all 3-drop trials – average peak-normalized firing rate on all 1-drop trials
(2)
flavor index = average peak-normalized firing rate on all chocolate trials – average peak-normalized firing rate on all vanilla trials.
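Equations 1 and 2 can be sketched in a few lines. The trial representation (a dict with "rate", "drops" and "flavor" keys) and the function name are hypothetical; the peak firing rate is assumed to have been computed beforehand as described above.

```python
from statistics import mean


def selectivity_indices(trials, peak_rate):
    """Return (number_index, flavor_index) for one neuron.

    trials: list of dicts with keys "rate" (spikes/s in the analyzed epoch),
            "drops" (1 or 3) and "flavor" ("chocolate" or "vanilla").
    peak_rate: the neuron's normalization factor (peak firing rate).
    """
    normalized = [(t["rate"] / peak_rate, t["drops"], t["flavor"]) for t in trials]
    # Equation 1: 3-drop trials minus 1-drop trials.
    number_index = (mean(r for r, d, _ in normalized if d == 3)
                    - mean(r for r, d, _ in normalized if d == 1))
    # Equation 2: chocolate trials minus vanilla trials.
    flavor_index = (mean(r for r, _, f in normalized if f == "chocolate")
                    - mean(r for r, _, f in normalized if f == "vanilla"))
    return number_index, flavor_index
```

With this sign convention, a positive number index indicates higher firing for 3 drops and a positive flavor index indicates higher firing for chocolate.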

For neurons with significant interaction effects, the above formulas would not reflect the actual degree of their selectivity, so for those neurons, we calculated separate indices for each level of the interacting factor, and then took the index with the highest absolute value. For instance, for a neuron with a flavor X direction interaction, we calculated separate flavor indices for the left and right wells, and then assigned whichever of these was greater in magnitude as the flavor index for that neuron.
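The rule for interaction neurons can be sketched for the flavor X direction case given as an example. The names and the trial format (peak-normalized "rate", "flavor", "well") are assumptions for illustration only.

```python
from statistics import mean


def flavor_index(trials):
    """Chocolate minus vanilla mean peak-normalized rate for a set of trials."""
    chocolate = [t["rate"] for t in trials if t["flavor"] == "chocolate"]
    vanilla = [t["rate"] for t in trials if t["flavor"] == "vanilla"]
    return mean(chocolate) - mean(vanilla)


def flavor_index_by_well(trials):
    """Return the per-well flavor index with the largest magnitude, plus all per-well values."""
    per_well = {
        well: flavor_index([t for t in trials if t["well"] == well])
        for well in ("left", "right")
    }
    # Keep the sign of the chosen index; only the comparison uses magnitude.
    return max(per_well.values(), key=abs), per_well
```

Selecting by absolute value while retaining the sign means that a neuron that prefers vanilla at one well still receives a negative flavor index.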

To analyze neural activity reflecting inferences, we identified the well at which reward was first delivered after each block switch, and then identified the first forced-choice trial that was rewarded at the other well in that block, eliminating cases in which a free-choice trial was rewarded at that well before the first forced-choice trial. For simplicity, we will call this trial the inference trial. The inference trial was therefore the first trial of the block in which reward was delivered at that well. We compared reward-anticipatory activity on the inference trial with activity in forced-choice trials in the same direction at the end of the previous block. For the reported analysis, this previous block activity was the average of the last three trials, not including the last one, from the previous block, but the results remained qualitatively the same independent of exactly how many trials from the previous block we used; we eliminated the last trial so that we could examine activity on that trial as a control. For drop-number-selective neurons, we noticed that they tended to ramp up firing before the rat entered the fluid well to wait for reward; thus we changed the epoch to begin 100 ms before fluid well entry, still ending upon delivery of the first drop of reward.
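The selection of the inference trial can be sketched as follows. This is one reading of the rule stated above, under an assumed trial format (ordered dicts with "type", "well" and "rewarded" keys); the function name is hypothetical.

```python
def find_inference_trial(block_trials):
    """Return the index of the inference trial in a block, or None if excluded.

    block_trials: trials of the new block in order, each a dict with keys
        "type" ("forced" or "free"), "well" ("left" or "right"),
        and "rewarded" (bool).
    """
    rewarded = [(i, t) for i, t in enumerate(block_trials) if t["rewarded"]]
    if not rewarded:
        return None
    # Well at which reward was first delivered after the block switch.
    first_well = rewarded[0][1]["well"]
    for i, trial in rewarded[1:]:
        if trial["well"] == first_well:
            continue
        # This is the first rewarded trial at the other well. It qualifies only
        # if it was forced-choice; if a free-choice trial got there first, the
        # case is eliminated.
        return i if trial["type"] == "forced" else None
    return None
```

Unrewarded (error) trials at the other well are skipped in this sketch, because the outcome at that well has not yet been experienced on those trials.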

To define the population to be tested for inference signaling, we first redid the ANOVA, this time only including forced-choice trials after the first twenty in each block (the last half of blocks). We did this in order to eliminate statistical bias towards finding inference, which would occur at the beginning of blocks, but results were similar using the original ANOVA. We tested for inference signaling across all neurons with a significant effect of number, flavor, or interactions of these factors. Each neuron’s preferred reward feature was defined for main effect neurons as the feature it fired most for across the trials that went into the ANOVA. For instance, for a neuron with a main effect of number, it was defined as 3-drop-preferring if it fired most across trials (after the first twenty of blocks) in which the 3-drop reward was delivered. For interaction effects, we did follow-up ANOVAs at each level of the interacting factor to determine whether that neuron had a significant effect at each level, and, if so, which reward feature it preferred at that level. For instance, for a neuron with a significant flavor X direction effect, we tested for a flavor effect separately at the left well and the right well, and, for each significant effect, defined whether that neuron preferred chocolate or vanilla at that well. Then, for inference testing, we only included a neuron when the block transition in the analyzed direction consisted of a switch from that neuron’s preferred to its anti-preferred reward feature, or from its anti-preferred to its preferred reward feature. We also did two control analyses to test whether observed patterns of changes were specific to inference: 1) for each neuron, we replaced the inference trial with the final trial of the previous block, leaving all other factors the same, and 2) for each neuron, we examined the change across the opposite block change from that in which the inferential change was measured (e.g. for a flavor inference measured on the first flavor block change, we examined its change across the first number block change).

For correlations with behavior, we calculated an inference score for each neuron across each qualifying block switch, defined as follows:

(3)
inference score = (sign-factor)* (peak-normalized firing rate during the analyzed epoch of the inference trial – average peak-normalized firing rate in the analyzed epoch over the last 10 (not including the last trial) of the previous block in the same direction.)
(4)
sign-factor = 1, for anti-preferred → preferred switches
(5)
sign-factor = −1, for preferred → anti-preferred switches
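Equations 3 to 5 can be expressed compactly in code. The function name, argument names, and the string labels for switch type are hypothetical; the input rates are assumed to be peak-normalized already.

```python
from statistics import mean

# Equations 4 and 5: sign of the expected change for each switch type.
SIGN_FACTOR = {"anti_to_pref": 1, "pref_to_anti": -1}


def inference_score(inference_rate, previous_block_rates, switch, n_baseline=10):
    """Equation 3 for one neuron and one block switch.

    inference_rate: peak-normalized rate in the analyzed epoch of the inference trial.
    previous_block_rates: peak-normalized rates, in trial order, for trials in
        the same direction in the previous block.
    switch: "anti_to_pref" or "pref_to_anti".
    """
    # Drop the final trial of the previous block, then keep the last n_baseline trials.
    baseline_trials = previous_block_rates[:-1][-n_baseline:]
    return SIGN_FACTOR[switch] * (inference_rate - mean(baseline_trials))
```

Because of the sign factor, a positive score always means that firing on the inference trial moved in the direction predicted by the new reward, whichever way the switch went.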

We used ten trials from the previous block in equation 3 in order to reduce trial-to-trial variability in the score as much as possible, but, again, the exact number of trials from the previous block did not change the results qualitatively. As a control, we tested a parallel correlation in which we substituted for the inference trial the last forced-choice trial from the previous block. We also calculated selectivity at the end of the new block, in parallel to the inference score, as follows:

(6)
end-of-block selectivity = (sign-factor)* (average peak-normalized firing rate in the last 10 (not including the last trial) of the new block in the same direction as the inference trial – average peak-normalized firing rate over the last 10 (not including the last trial) of the previous block in the same direction.)

When data for correlations contained two block changes for the same neuron, those datapoints were averaged so that each datapoint in the correlation came from a unique neuron.
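The averaging step can be sketched as below. The tuple layout and function name are hypothetical, and averaging the behavioral measure along with the neural score is an assumption about how the datapoints were combined.

```python
from collections import defaultdict
from statistics import mean


def average_per_neuron(datapoints):
    """Collapse multiple block switches from the same neuron into one datapoint.

    datapoints: iterable of (neuron_id, inference_score, behavior_value).
    Returns a dict mapping neuron_id to (mean score, mean behavior value).
    """
    grouped = defaultdict(list)
    for neuron_id, score, behavior in datapoints:
        grouped[neuron_id].append((score, behavior))
    return {
        neuron_id: (mean(s for s, _ in pairs), mean(b for _, b in pairs))
        for neuron_id, pairs in grouped.items()
    }
```

Collapsing in this way keeps the datapoints entering the correlation independent at the level of neurons.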

Statistics were done using Matlab, Excel, and Statistica. Planned comparisons were used for testing specific effects of multi-way ANOVAs. For displays of neural activity, bin-size was 100 ms except when the inference trial was shown for single-units, in which case bin-size was 250 ms. Neural activity in the displays was smoothed with a boxcar algorithm, with a 9-bin boxcar used for single-unit plots outside of the analyzed epoch (3-bin boxcar for the inference trial on single-units), and either a 3-bin boxcar (for Figure 2) or no smoothing (Figure 3) being used within the analyzed epoch. For population average activity displays (Figures 4, 5 and 7), a 5-bin boxcar was used. To make plots of trial-by-trial choice rate, we first aligned all rewarded trials for all blocks, and for each trial (1st, 2nd, 3rd, etc. after the block switch) we took the proportion of choices towards the side with the 3-drop reward, excluding all blocks in which a forced-choice trial happened to occur on that trial. For Figure 1, the line was then smoothed using a 3-bin boxcar, separately for before and after the switch.
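The trial-by-trial choice rate and the boxcar smoothing can be illustrated with a minimal sketch. The function names and data layout are hypothetical, and the handling of edges in the boxcar (averaging over whichever bins fall inside the data) is an assumption, since edge treatment is not specified above.

```python
def choice_rate_by_position(blocks):
    """Proportion of choices towards the 3-drop side at each trial position.

    blocks: list of blocks, each a list aligned to the block switch whose
        entries are "big", "small", or None (None marks a forced-choice
        trial, which is excluded at that position).
    """
    n_positions = max(len(block) for block in blocks)
    rates = []
    for pos in range(n_positions):
        choices = [b[pos] for b in blocks if pos < len(b) and b[pos] is not None]
        rates.append(sum(c == "big" for c in choices) / len(choices) if choices else None)
    return rates


def boxcar_smooth(values, width):
    """Centered moving average with an odd window width (in bins)."""
    if width < 1 or width % 2 == 0:
        raise ValueError("width must be a positive odd integer")
    half = width // 2
    smoothed = []
    for i in range(len(values)):
        # Assumption: near the edges the window shrinks to the available bins.
        window = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed
```

Smoothing the pre-switch and post-switch segments separately, as described for Figure 1, would correspond to calling `boxcar_smooth` once on each segment so that the window never spans the block switch.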

Acknowledgments

This work was supported by funding from NIDA. The opinions expressed in this article are the authors’ own and do not reflect the view of the National Institutes of Health, the Department of Health and Human Services, or the United States government.

Footnotes

Author Contributions

T.A.S., N.K.C., and G.S. conceived the experiments; N.K.C. carried out the experiments, with assistance from M.A.M., T-L.L. and H.W. T.A.S. and N.K.C. analyzed the results, and the manuscript was prepared by T.A.S., N.K.C., and G.S. with input from the other authors.

Competing Financial Interests

The authors declare no competing financial interests.

References

1. Schoenbaum G, Esber G. How do you (estimate you will) like them apples? Integration as a defining trait of orbitofrontal function. Current Opinion in Neurobiology. 2010;20:205–211. doi: 10.1016/j.conb.2010.01.009.
2. Padoa-Schioppa C. Neurobiology of economic choice: a goods-based model. Annual Review of Neuroscience. 2011;34:333–359. doi: 10.1146/annurev-neuro-061010-113648.
3. Wallis JD. Orbitofrontal cortex and its contribution to decision-making. Annual Review of Neuroscience. 2007;30:31–56. doi: 10.1146/annurev.neuro.30.051606.094334.
4. Schoenbaum G, Chiba AA, Gallagher M. Orbitofrontal cortex and basolateral amygdala encode expected outcomes during learning. Nature Neuroscience. 1998;1:155–159. doi: 10.1038/407.
5. Levy DJ, Glimcher PW. Comparing apples and oranges: Using reward-specific and reward-general subjective value representation in the brain. Journal of Neuroscience. 2011;31:14693–14707. doi: 10.1523/JNEUROSCI.2218-11.2011.
6. Padoa-Schioppa C, Assad JA. Neurons in orbitofrontal cortex encode economic value. Nature. 2006;441:223–226. doi: 10.1038/nature04676.
7. O’Doherty J, Deichmann R, Critchley HD, Dolan RJ. Neural responses during anticipation of a primary taste reward. Neuron. 2002;33:815–826. doi: 10.1016/s0896-6273(02)00603-7.
8. Gottfried JA, O’Doherty J, Dolan RJ. Encoding predictive reward value in human amygdala and orbitofrontal cortex. Science. 2003;301:1104–1107. doi: 10.1126/science.1087919.
9. Thorpe SJ, Rolls ET, Maddison S. The orbitofrontal cortex: neuronal activity in the behaving monkey. Experimental Brain Research. 1983;49:93–115. doi: 10.1007/BF00235545.
10. Tremblay L, Schultz W. Relative reward preference in primate orbitofrontal cortex. Nature. 1999;398:704–708. doi: 10.1038/19525.
11. Kennerley SW, Behrens TE, Wallis JD. Double dissociation of value computations in orbitofrontal and anterior cingulate neurons. Nature Neuroscience. 2011 doi: 10.1038/nn.2961. AOP.
12. Sul JH, Kim H, Huh N, Lee D, Jung MW. Distinct roles of rodent orbitofrontal and medial prefrontal cortex in decision making. Neuron. 2010;66:449–460. doi: 10.1016/j.neuron.2010.03.033.
13. Gallagher M, McMahan RW, Schoenbaum G. Orbitofrontal cortex and representation of incentive value in associative learning. Journal of Neuroscience. 1999;19:6610–6614. doi: 10.1523/JNEUROSCI.19-15-06610.1999.
14. Roesch MR, Taylor AR, Schoenbaum G. Encoding of time-discounted rewards in orbitofrontal cortex is independent of value representation. Neuron. 2006;51:509–520. doi: 10.1016/j.neuron.2006.06.027.
15. Schoenbaum G, Eichenbaum H. Information coding in the rodent prefrontal cortex. I. Single-neuron activity in orbitofrontal cortex compared with that in pyriform cortex. Journal of Neurophysiology. 1995;74:733–750. doi: 10.1152/jn.1995.74.2.733.
16. Feierstein CE, Quirk MC, Uchida N, Sosulski DL, Mainen ZF. Representation of spatial goals in rat orbitofrontal cortex. Neuron. 2006;51:495–507. doi: 10.1016/j.neuron.2006.06.032.
17. Furuyashiki T, Holland PC, Gallagher M. Rat orbitofrontal cortex separately encodes response and outcome information during performance of goal-directed behavior. Journal of Neuroscience. 2008;28:5127–5138. doi: 10.1523/JNEUROSCI.0319-08.2008.
18. Jones JL, et al. Orbitofrontal cortex supports behavior and learning using inferred but not cached values. Science. 2012;338:953–956. doi: 10.1126/science.1227489.
19. McDannald MA, Lucantonio F, Burke KA, Niv Y, Schoenbaum G. Ventral striatum and orbitofrontal cortex are both required for model-based, but not model-free, reinforcement learning. Journal of Neuroscience. 2011;31:2700–2705. doi: 10.1523/JNEUROSCI.5499-10.2011.
20. Gremel CM, Costa RM. Orbitofrontal and striatal circuits dynamically encode the shift between goal-directed and habitual actions. Nature Communications. 2013;4:2264. doi: 10.1038/ncomms3264.
21. Izquierdo AD, Suda RK, Murray EA. Bilateral orbital prefrontal cortex lesions in rhesus monkeys disrupt choices guided by both reward value and reward contingency. Journal of Neuroscience. 2004;24:7540–7548. doi: 10.1523/JNEUROSCI.1921-04.2004.
22. Rudebeck PH, Saunders RC, Prescott AT, Chau LS, Murray EA. Prefrontal mechanisms of behavioral flexibility, emotion regulation and value updating. Nature Neuroscience. 2013;16:1140–1145. doi: 10.1038/nn.3440.
23. West EA, DesJardin JT, Gale K, Malkova L. Transient inactivation of orbitofrontal cortex blocks reinforcer devaluation in macaques. Journal of Neuroscience. 2011;31:15128–15135. doi: 10.1523/JNEUROSCI.3295-11.2011.
24. Dickinson A, Balleine BW. Motivational control of goal-directed action. Animal Learning and Behavior. 1994;22:1–18.
25. Huys QJ, et al. Bonsai trees in your head: how the Pavlovian system sculpts goal-directed choices by pruning decision trees. PLoS Computational Biology. 2012;8:e1002410. doi: 10.1371/journal.pcbi.1002410.
26. Daw ND, Niv Y, Dayan P. Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control. Nature Neuroscience. 2005;8:1704–1711. doi: 10.1038/nn1560.
27. Holland PC, Rescorla RA. The effects of two ways of devaluing the unconditioned stimulus after first and second-order appetitive conditioning. Journal of Experimental Psychology: Animal Behavior Processes. 1975;1:355–363. doi: 10.1037//0097-7403.1.4.355.
28. Rolls ET. The orbitofrontal cortex. Philosophical Transactions of the Royal Society of London B. 1996;351:1433–1443. doi: 10.1098/rstb.1996.0128.
29. Bechara A, Damasio H, Tranel D, Damasio AR. Deciding advantageously before knowing the advantageous strategy. Science. 1997;275:1293–1294. doi: 10.1126/science.275.5304.1293.
30. Jones B, Mishkin M. Limbic lesions and the problem of stimulus-reinforcement associations. Experimental Neurology. 1972;36:362–377. doi: 10.1016/0014-4886(72)90030-1.
31. McDannald MA, et al. Orbitofrontal neurons acquire responses to ‘valueless’ Pavlovian cues during unblocking. (in review)
32. Ostlund SB, Balleine BW. Orbitofrontal cortex mediates outcome encoding in Pavlovian but not instrumental learning. Journal of Neuroscience. 2007;27:4819–4825. doi: 10.1523/JNEUROSCI.5443-06.2007.
33. Wilson RC, Takahashi YK, Schoenbaum G, Niv Y. Orbitofrontal Cortex as a Cognitive Map of Task Space. Neuron. 2014;81:267–279. doi: 10.1016/j.neuron.2013.11.005.
34. Hornak J, et al. Reward-related reversal learning after surgical excisions in orbito-frontal or dorsolateral prefrontal cortex in humans. Journal of Cognitive Neuroscience. 2004;16:463–478. doi: 10.1162/089892904322926791.
35.Schoenbaum G, Nugent S, Saddoris MP, Setlow B. Orbitofrontal lesions in rats impair reversal but not acquisition of go, no-go odor discriminations. Neuroreport. 2002;13:885–890. doi: 10.1097/00001756-200205070-00030. [DOI] [PubMed] [Google Scholar]
36.Walton ME, Behrens TEJ, Buckley MJ, Rudebeck PH, Rushworth MFS. Separable learning systems in teh macaque brain and the role of the orbitofrontal cortex in contingent learning. Neuron. 2010;65:927–939. doi: 10.1016/j.neuron.2010.02.027. [DOI] [PMC free article] [PubMed] [Google Scholar]
37.Abe H, Lee D. Distributed coding of actual and hypothetical outcomes in the orbital and dorsolateral prefrontal cortex. Neuron. 2011;70:731–741. doi: 10.1016/j.neuron.2011.03.026. [DOI] [PMC free article] [PubMed] [Google Scholar]
38.Barron HC, Dolan RJ, Behrens TE. Online evaluation of novel choices by simultaneous representation of multiple memories. Nature Neuroscience. 2013;16:1492–1498. doi: 10.1038/nn.3515. [DOI] [PMC free article] [PubMed] [Google Scholar]
39.Wimmer GE, Daw ND, Shohamy D. Generalization of value in reinforcement learning in humans. European Journal of Neuroscience. 2012;35:1092–1104. doi: 10.1111/j.1460-9568.2012.08017.x. [DOI] [PMC free article] [PubMed] [Google Scholar]
40.Daw ND, Gershman SJ, Seymour B, Dayan P, Dolan RJ. Model-based influences on humans’ choices and striatal prediction errors. Neuron. 2011;69:1204–1215. doi: 10.1016/j.neuron.2011.02.027. [DOI] [PMC free article] [PubMed] [Google Scholar]
41.Takahashi YK, et al. Neural estimates of imagined outcomes in the orbitofrontal cortex drive behavior and learning. Neuron. 2013;80:507–518. doi: 10.1016/j.neuron.2013.08.008. [DOI] [PMC free article] [PubMed] [Google Scholar]
42.Stalnaker TA, Calhoon G, Ogawa M, Roesch MR, Schoenbaum G. Neural correlates of stimulus-response and response-outcome associations in dorsolateral versus dorsomedial striatum. Frontiers in Integrative Neuroscience. 2010;4 doi: 10.3389/fnint.2010.00012. [DOI] [PMC free article] [PubMed] [Google Scholar]


