The Journal of Neuroscience. 2016 Feb 10;36(6):1996–2006. doi: 10.1523/JNEUROSCI.3366-15.2016

Multifaceted Contributions by Different Regions of the Orbitofrontal and Medial Prefrontal Cortex to Probabilistic Reversal Learning

Gemma L Dalton, Nena Y Wang, Anthony G Phillips, Stan B Floresco
PMCID: PMC6602019  PMID: 26865622

Abstract

Different subregions of the prefrontal cortex (PFC) contribute to the ability to respond flexibly to changes in reward contingencies, with the medial versus orbitofrontal cortex (OFC) subregions contributing differentially to processes such as set-shifting and reversal learning. To date, the manner in which these regions may facilitate reversal learning in situations involving reward uncertainty remains relatively unexplored. We investigated the involvement of five distinct regions of the rat OFC (lateral and medial) and medial PFC (prelimbic, infralimbic, and anterior cingulate) on probabilistic reversal learning wherein “correct” versus “incorrect” responses were rewarded on 80% and 20% of trials, respectively. Contingencies were reversed repeatedly within a session. In well trained rats, inactivation of the medial or lateral OFC induced dissociable impairments in performance (indexed by fewer reversals completed) when outcomes were probabilistic, but not when they were assured. Medial OFC inactivation impaired probabilistic learning during the first discrimination, increased perseverative responding and reduced sensitivity to positive and negative feedback, suggestive of a deficit in incorporating information about previous action outcomes to guide subsequent behavior. Lateral OFC inactivation preferentially impaired performance during reversal phases. In contrast, prelimbic inactivation caused an apparent improvement in performance by increasing the number of reversals completed. This was associated with enhanced sensitivity to recently rewarded actions and reduced sensitivity to negative feedback. Infralimbic inactivation had no effect, whereas the anterior cingulate appeared to play a permissive role in this form of reversal learning. These results clarify the dissociable contributions of different regions of the frontal lobes to probabilistic learning.

SIGNIFICANCE STATEMENT The ability to adjust behavior in response to changes involving uncertain or probabilistic reward contingencies is an essential survival skill that is impaired in a variety of psychiatric disorders. It is well established that different forms of cognitive flexibility are mediated by anatomically distinct regions of the frontal lobes when reinforcement contingencies are assured; however, less is known about the contribution of these regions to probabilistic reinforcement learning. Here we show that different regions of the orbitofrontal and medial prefrontal cortex make distinct contributions to probabilistic reversal learning. These findings provide novel information about the complex interplay between frontal lobe regions in mediating these processes and accordingly provide insight into possible pathophysiology that underlies impairments in cognitive flexibility observed in mental illnesses.

Keywords: orbitofrontal cortex, prelimbic cortex, probabilistic, reversal learning

Introduction

It is well established that different regions of the prefrontal cortex (PFC) mediate distinct forms of cognitive flexibility. For example, lesions of the dorsolateral PFC (dlPFC) in primates or medial PFC in rats impair shifts between different strategies or attentional sets (Dias et al., 1996; Ragozzino et al., 1999; Birrell and Brown, 2000). In comparison, shifting between different stimulus–reward associations (reversal learning) is facilitated by the orbitofrontal cortex (OFC) (McAlonan and Brown, 2003; Ghods-Sharifi et al., 2008). These findings have provided valuable insight into the neural mechanisms underlying the ability to adapt behavior to changing circumstances, but have been limited mostly to procedures that provide explicit “correct” or “incorrect” feedback, a scenario that rarely arises in the real world. Indeed, recent data suggest that the specific frontostriatal circuitry involved in behavioral/cognitive flexibility can vary depending on whether feedback is probabilistic or assured (Dalton et al., 2014).

Damage to the OFC in humans and nonhuman primates impairs reversal learning during tasks that provide unequivocal feedback, whereas damage to the dlPFC leaves performance intact (Dias et al., 1996; Fellows and Farah, 2003). Similarly, patients with damage to the OFC display impairments on probabilistic reversal learning (PRL) where more ambiguous feedback is provided, whereas those with damage to lateral frontal regions that excluded the OFC displayed more variable effects on performance (Berlin et al., 2004; Hornak et al., 2004). Note that these latter situations require more complex evaluations of action–outcome associations and tracking of the broader context of reward history to ascertain which response option may be more profitable. Thus, additional frontal regions may be recruited when cognitive demands are increased, an idea supported by imaging studies using tasks where correct responses are rewarded only 70–80% of the time and “incorrect” responses are occasionally rewarded. These studies highlight a central role for the OFC in guiding responding when feedback is ambiguous, but also implicate other PFC regions in this type of learning, including the ventrolateral PFC, dorsal anterior cingulate (dACC) and the dlPFC (Cools et al., 2002; O'Doherty et al., 2003; Remijnse et al., 2005). Tsuchida et al. (2010) directly addressed this issue by using lesion–function mapping in patients with focal frontal lobe damage to identify regions that were critical for PRL. Patients with OFC (but not dACC) lesions were impaired. This latter observation emphasizes that even though studies with brain-damaged patients can identify general regions of the frontal lobes that may contribute to certain forms of cognitive flexibility, the often diffuse lesions incurred by these individuals make it difficult to localize specific functions to distinct cortical regions. In this regard, preclinical studies may shed additional light on this issue.

The present study conducted a systematic analysis of the contribution of five key regions of the rat frontal lobe to PRL, using an operant task developed for rats (Bari et al., 2010; Dalton et al., 2014). There has been debate whether rat medial PFC and OFC regions share functional homology to similar regions in primates (Preuss, 1995; Uylings et al., 2003). Taking into account their anatomical connectivity, projection patterns of the rat medial OFC (mOFC) and lateral OFC (lOFC) to striatum and amygdala are similar to those of areas 14 and 12/13 of the primate OFC (Ongür and Price, 2000; Schilman et al., 2008; Wise, 2008; Hoover and Vertes, 2011). Likewise, the rat anterior cingulate, prelimbic, and infralimbic regions display similar striatal connectivity to areas 24, 32, and 25 of primate anterior cingulate (Sesack et al., 1989; Ongür and Price, 2000; Hoover and Vertes, 2007; Wise, 2008). Previous studies in our laboratory have identified a key role for the nucleus accumbens shell in facilitating performance of this task and in mediating reward sensitivity (Dalton et al., 2014). Here, we assessed the effects of inactivation of some of the main OFC and medial PFC inputs to the accumbens in well trained rats to identify possible dissociable roles for these regions in this form of cognitive flexibility.

Materials and Methods

Subjects.

Male Long–Evans rats (280–350 g) were housed in single cages with ad libitum access to standard laboratory chow and water. The colony was maintained at 21°C on a 12 h light/dark cycle (lights on at 07:00 h). All experiments were performed during the light phase of the cycle. Rats were given 7–8 d to acclimatize to the colony before behavioral procedures began. Rats were handled and weighed daily during this period and throughout the course of the experiment. During behavioral training, rats had ad libitum access to water and were maintained on a restricted laboratory chow diet to maintain 85–90% of ad libitum weight in age-matched rats. All experiments were conducted in accordance with the standards of the Canadian Council on Animal Care and were approved by the Committee on Animal Care, University of British Columbia.

Apparatus.

All testing was conducted in operant chambers (30.5 × 24 × 21 cm; Med-Associates) enclosed in sound-attenuating boxes. Each box contained a fan to mask outside noises and to provide ventilation. Two retractable levers were located on either side of a central food hopper into which sugar pellet reinforcement (45 mg; BioServ) was delivered. Each chamber was illuminated by a 100 mA house light located in the top-center of the wall opposite the levers. All experimental data were recorded by an IBM personal computer connected to the chambers via an interface.

Orbital/prefrontal regions-of-interest and surgery.

Before training, rats were anesthetized with ketamine (100 mg/kg)/xylazine (7 mg/kg), and implanted with bilateral 23 gauge stainless-steel guide cannulae located above the mOFC (flat skull: anteroposterior = +4.2 mm, mediolateral = ±0.7 mm, dorsoventral = −3.2 mm from dura), the lOFC (flat skull: anteroposterior = +3.8 mm, mediolateral = ±2.6 mm, dorsoventral = −3.2 mm from dura), the prelimbic cortex (flat skull: anteroposterior = +3.4 mm, mediolateral = ±0.7 mm, dorsoventral = −2.8 mm from dura), the infralimbic cortex (flat skull: anteroposterior = +2.8 mm, mediolateral = ±0.7 mm, dorsoventral = −4.1 mm from dura) or the dACC (flat skull: anteroposterior = +2.0 mm, mediolateral = ±0.7 mm, dorsoventral = −1.2 mm from dura), using standard stereotaxic techniques. Guide cannulae were implanted vertically and held in place with stainless steel screws and dental acrylic. Thirty gauge obturators flush with the end of guide cannulae remained in place until the infusions were made. Rats were given at least 1 week to recover from surgery before behavioral training began. During this period, they were handled for at least 5 min each day and were food restricted to 85% of their free-feeding body weight.

Lever-pressing training.

On the day before their first exposure to the operant chambers, rats were given ∼25 sugar pellet rewards in their home cage. On the first day of training, the food cup contained two to three pellets and crushed pellets were placed on a lever before each rat was placed into the chamber. Rats were first trained to press one of the levers to receive reward on a fixed-ratio 1 schedule to a criterion of 60 presses in 30 min, and were required to press the other lever on the next day (counterbalanced left/right between subjects). Rats were then trained on a simplified version of the full task. These 90-trial sessions began with the levers retracted and the operant chamber in darkness. Every 40 s, a new trial was initiated by illumination of the house light and insertion of one of the two levers into the chamber. If the rat failed to respond on the lever within 10 s, the lever was retracted, the house light was extinguished, and the trial was scored as an omission. A response within 10 s of lever insertion resulted in delivery of a single pellet with 50% probability. This procedure was used to familiarize the rats with the probabilistic nature of the full task. In every pair of trials, the left or right lever was presented once, and the order within the pair of trials was randomized. Rats were trained for ∼3–4 d to a criterion of 80 or more successful trials (i.e., ≤10 omissions), after which they were trained on one of two reversal learning tasks.

PRL.

The procedures used in the present study were modified from those described by Bari et al. (2010) through the use of retractable levers (as opposed to nosepoke apertures used in the previous study). Daily sessions consisted of 200 discrete choice trials, with an intertrial interval of 15 s (50 min total). Trials began with illumination of the house light, and 3 s later, insertion of both levers into the chamber. At the start of each session, one of the two levers was randomly selected to be correct and the other incorrect. During this initial discrimination phase, a response on the correct lever delivered a single reward pellet on 80% of trials, whereas an incorrect response delivered reinforcement on only 20% of trials. Failure to press a lever within 10 s of insertion (i.e., trial omission) led to their retraction and termination of the house light until the next trial. Once the correct lever was selected on eight consecutive trials (regardless of whether a correct choice was reinforced), the contingencies were reversed so that the correct lever now became the one that provided a lower probability of reward (i.e., incorrect lever) and vice versa. This pattern was repeated over the course of a daily session. Daily training sessions continued until a group of rats achieved more than three reversals per session for 2 consecutive days. Across all experiments, rats required an average of 11 training sessions (range 10–15) to achieve this criterion. On the following day, rats received their first counterbalanced microinfusion tests.
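The trial logic described above can be sketched as a small simulation. This is a minimal illustration under stated assumptions, not the control program used in the study: the function name `run_prl_session`, the `choose` policy callback, and the random seed are hypothetical, and the handling of omissions (they leave the consecutive-correct count unchanged) is an assumption the text does not specify.

```python
import random

def run_prl_session(choose, n_trials=200, p_correct=0.8, p_incorrect=0.2,
                    criterion=8, seed=0):
    """Simulate one PRL session.

    `choose(history)` is a hypothetical policy returning "left", "right",
    or None (an omission). Returns (contingency_switches, history).
    """
    rng = random.Random(seed)
    correct = rng.choice(["left", "right"])  # correct lever picked at random
    streak = 0      # consecutive choices of the currently correct lever
    switches = 0    # number of times the contingencies were reversed
    history = []    # (choice, was_correct, rewarded) for every trial
    for _ in range(n_trials):
        choice = choose(history)
        if choice is None:
            # Omission: no reward. ASSUMPTION: the streak is left unchanged.
            history.append((None, False, False))
            continue
        was_correct = choice == correct
        rewarded = rng.random() < (p_correct if was_correct else p_incorrect)
        history.append((choice, was_correct, rewarded))
        streak = streak + 1 if was_correct else 0
        if streak == criterion:
            # Eight consecutive correct choices, rewarded or not: reverse.
            correct = "left" if correct == "right" else "right"
            switches += 1
            streak = 0
    return switches, history
```

For instance, `run_prl_session(lambda history: "left")` can produce at most one contingency switch, because a simulated subject that never changes levers remains on the incorrect lever after the first reversal.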

Reversal learning with assured outcomes.

We determined a priori that if inactivation of a particular cortical region impaired performance on the PRL task, we would also assess the effect of this manipulation on a simplified version of the task where the outcomes of correct and incorrect choices were assured, rather than probabilistic. This would determine whether impairments in probabilistic learning were attributable to more general impairments in cognitive flexibility or more selectively driven by disruptions in the ability to alter behavior in response to probabilistic feedback. This task differed from the probabilistic reversal learning task only with respect to the contingency that a correct/incorrect response always/never delivered reinforcement, respectively. Separate groups of experimentally naive rats, unfamiliar with the probabilistic reversal procedure, were trained on this task for 7 d, after which they proceeded to the microinfusion test phase of the experiment.

Drugs and microinfusion procedures.

One or 2 d before their first microinfusion test day, rats received a mock infusion procedure, during which obturators were removed from the guide cannulae and replaced with stainless steel injectors for 2 min, but no infusion was made.

A within-subjects design was used for all experiments. Inactivation of each brain region was achieved by microinfusion of a solution containing the GABAB agonist baclofen and the GABAA agonist muscimol (100 ng each per side, Sigma-Aldrich). GABA agonists or saline were infused bilaterally (0.4 μl over 88 s) via a 30 gauge injection cannula that protruded 0.8 mm beyond the guide cannula. Injection cannulae were left in place for 60 s to allow for diffusion. Rats remained in their home cages for an additional 10 min period before behavioral testing. Neurophysiological studies have shown that administration of muscimol into the brain induces a significant suppression of neural activity for at least 2 h (van Duuren et al., 2007), which would last throughout the duration of the test sessions used here (50 min).

On the first infusion test day, one-half of the rats in each group received saline infusions, and the other one-half received baclofen/muscimol. The following day all rats received a baseline training day (no infusion). If a rat achieved fewer than two reversals during this baseline session, it was given an additional day of training before the second infusion test. On the day after baseline performance was reestablished, rats received a second counterbalanced infusion of saline or baclofen/muscimol.

Histology.

After completion of behavioral testing, rats were euthanized in a carbon dioxide chamber. Brains were removed and fixed in a 4% formalin solution. The brains were frozen and sliced in 50 μm sections before being mounted and stained with cresyl violet. Placements were verified with reference to the neuroanatomical atlas of Paxinos and Watson (2005). Data from rats with placements outside the borders of the region-of-interest or with asymmetrical placements were removed from the analysis. In general, animals with inaccurate placements did not display prominent changes in performance following inactivation treatments relative to saline infusions. The locations of infusion sites are displayed in Figure 1.

Figure 1.


Histology. Left, Schematics of coronal sections showing the range of acceptable locations of infusions within the medial OFC (filled circles) and lateral OFC (open circles). Right, The range of acceptable locations of infusions within the prelimbic (filled circles), infralimbic (open circles), and anterior cingulate (filled squares) regions of the PFC. Photomicrographs of representative placements in these regions are also presented, with arrows highlighting the location of the cannulae tips.

Data analysis.

The primary dependent variable of interest was the number of reversals completed per session; these were analyzed as a function of the number of completed trials. Specifically, data were transformed using the following formula: [no. of reversals completed per session/(200 − no. of trial omissions)] × 100 (i.e., number of reversals per 100 completed trials). This transformation was used to account for any potential increases in the number of trial omissions induced by inactivation treatments, which could complicate interpretation of the raw data because a decrease in the number of reversals/session could either be attributable to impairment in cognitive processes related to reversal learning or merely reflect fewer completed trials. These data were analyzed with repeated-measures one-way ANOVAs.
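The normalization above is simple arithmetic, and a short sketch makes the effect of omissions concrete. The function name `reversals_per_100` and the example values are illustrative assumptions, not data from the study.

```python
def reversals_per_100(reversals, omissions, total_trials=200):
    """Reversals completed per 100 completed (non-omitted) trials."""
    completed = total_trials - omissions
    if completed <= 0:
        raise ValueError("no completed trials in this session")
    return 100 * reversals / completed

# Hypothetical sessions: fewer raw reversals, but the same normalized score
# once the 50 omitted trials are taken into account.
no_omissions = reversals_per_100(reversals=4, omissions=0)     # 4 per 200 trials
many_omissions = reversals_per_100(reversals=3, omissions=50)  # 3 per 150 trials
```

Both hypothetical sessions yield 2 reversals per 100 completed trials, which illustrates why raw reversal counts alone could be misleading when a treatment increases omissions.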

Ancillary analyses assessed differences in the number of errors committed to achieve criterion of eight correct consecutive choices for the first discrimination and the first reversals of a session, as we have described previously (Dalton et al., 2014). These data were analyzed typically with two-way repeated-measures ANOVAs, with treatment and phase (first discrimination, first reversal) as two within-subjects factors.
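One way to count errors to criterion for a single phase is sketched below. It assumes each phase is available as an ordered list of booleans (True for a choice of the currently correct lever) with omitted trials already excluded; the function name `errors_to_criterion` and this data layout are hypothetical.

```python
def errors_to_criterion(correct_flags, criterion=8):
    """Count incorrect choices made before `criterion` consecutive correct ones.

    Returns None if the criterion is never reached within the phase.
    """
    errors = 0
    streak = 0
    for is_correct in correct_flags:
        if is_correct:
            streak += 1
            if streak == criterion:
                return errors
        else:
            errors += 1
            streak = 0  # any incorrect choice restarts the consecutive run
    return None
```

Returning None for an unfinished phase keeps such sessions distinguishable from sessions with zero errors, so they can be excluded from a phase-wise comparison.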

Whenever inactivation of a particular region altered PRL performance significantly, we also analyzed the number of perseverative errors rats made during the reversal phases of the task. For these analyses, we compared the number of consecutive incorrect choices committed after a reversal of reinforcement contingencies (i.e., after 8 consecutive correct responses). Once a rat made a correct response after a reversal, subsequent errors were no longer counted as perseverative. For these analyses, we compared the average number of perseverative errors made by an individual rat over the minimum number of reversals completed by that rat after both treatments. This was because when rats completed a greater number of reversals, they tended to make fewer perseverative errors during the latter part of the session, which in turn could artificially reduce the average number of perseverative errors per reversal for the session. Thus, this procedure allowed a more unbiased measure of perseveration that we could compare across both treatments. For example, under control conditions, a rat may have completed five reversals, whereas after inactivation treatments, the same rat may have only completed three reversals. In this instance, we computed the average number of perseverative errors made per reversal for only the first three reversals during both control and inactivation treatments. These data were analyzed using repeated-measures one-way ANOVAs.
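The matching procedure described above can be sketched as follows. The function names and the example counts are hypothetical; the sketch assumes that, for each rat and treatment, the choices after each reversal are available in order, with omitted trials excluded.

```python
def perseverative_errors(correct_flags_after_reversal):
    """Count consecutive incorrect choices made immediately after a reversal.

    Counting stops at the first correct choice; later errors are ignored.
    """
    count = 0
    for is_correct in correct_flags_after_reversal:
        if is_correct:
            break
        count += 1
    return count

def matched_perseverative_means(saline_counts, inactivation_counts):
    """Average per-reversal counts over the first n reversals of each session,
    where n is the smaller number of reversals completed under either treatment.
    """
    n = min(len(saline_counts), len(inactivation_counts))
    if n == 0:
        raise ValueError("no reversals completed under at least one treatment")
    return sum(saline_counts[:n]) / n, sum(inactivation_counts[:n]) / n

# Hypothetical rat: five reversals after saline, three after inactivation,
# so only the first three reversals of each session are compared.
saline_mean, inactivation_mean = matched_perseverative_means(
    [4, 2, 3, 1, 0], [6, 5, 4]
)
```

With these made-up counts the saline mean is 3.0 rather than the 2.0 that averaging over all five reversals would give, which is the bias the matching is intended to avoid.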

For the PRL task, we also analyzed each animal's choices according to the outcome of the preceding trial, separately for correct and incorrect choices, to assess whether neural inactivation altered reward (“win-stay”) or negative feedback (“lose-shift”) sensitivity (Bari et al., 2010). Win-stay ratios assessed the likelihood that a subject followed a rewarded choice with another choice of the same type (correct or incorrect). These ratios were calculated from the number of trials on which a rat chose the correct/incorrect lever after being rewarded on the preceding trial, divided by the total number of rewarded correct or incorrect choices. Conversely, lose-shift ratios indexed how likely rats were to switch choices after receiving negative feedback (i.e., reward omission) for a response on the preceding trial. These values were calculated from the number of trials on which a rat switched responding to the other lever after not being rewarded for a correct or incorrect choice on the preceding trial, divided by the total number of non-rewarded correct/incorrect choices. The proportions of win-stay and lose-shift responses for both correct and incorrect choices were analyzed using three-way repeated-measures ANOVAs with treatment, trial type (win-stay and lose-shift), and choice type (correct and incorrect) as three within-subjects factors.
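The four ratios can be computed from a trial-by-trial record, as in the sketch below. Several details are assumptions rather than the authors' stated procedure: trials are taken as an ordered list of completed trials (omissions excluded), "staying" and "shifting" are defined by lever identity, and the function name `feedback_ratios` and the tuple layout are hypothetical.

```python
from collections import Counter

def feedback_ratios(trials):
    """Win-stay and lose-shift ratios, split by correct/incorrect choices.

    `trials` is an ordered list of (lever, was_correct, rewarded) tuples.
    Returns a dict keyed by (ratio_name, choice_type); a ratio is None when
    its denominator is zero.
    """
    hits = Counter()    # stays after wins, shifts after losses
    totals = Counter()  # all rewarded / non-rewarded choices with a next trial
    for (lever, was_correct, rewarded), (next_lever, _, _) in zip(trials, trials[1:]):
        ratio = "win_stay" if rewarded else "lose_shift"
        key = (ratio, "correct" if was_correct else "incorrect")
        totals[key] += 1
        stayed = next_lever == lever
        # A "hit" is staying after reward or shifting after non-reward.
        if stayed == rewarded:
            hits[key] += 1
    keys = [(r, c) for r in ("win_stay", "lose_shift")
            for c in ("correct", "incorrect")]
    return {k: (hits[k] / totals[k] if totals[k] else None) for k in keys}

# Hypothetical seven-trial record ("L"/"R" are lever labels).
example = feedback_ratios([
    ("L", True, True), ("L", True, False), ("R", False, True),
    ("R", False, False), ("R", False, False), ("L", True, True),
    ("R", False, False),
])
```

The final trial of a record contributes only as the "next" choice for the trial before it, since no following trial exists to score it against.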

Last, latencies to make a choice and the number of trial omissions (i.e., trials where no response was made within 10 s of lever insertion) were also analyzed with one-way ANOVAs.

Results

mOFC inactivation: PRL performance

Fifteen rats with cannulae implanted into the mOFC were initially trained on the PRL task. Data from three rats were eliminated because of inaccurate placements residing ventral to the mOFC. For the remaining animals (n = 12), infusions of baclofen/muscimol into the mOFC markedly impaired performance, indexed by a decrease in the number of reversals completed (F(1,11) = 23.25, p = 0.001; Fig. 2A). This impairment was not accompanied by changes in the number of trial omissions or choice latencies (both F values <1, both p values >0.35; Table 1).

Figure 2.


Inactivation of the medial (top row) or the lateral (bottom row) regions of the OFC differentially impairs PRL. A, Microinfusions of baclofen and muscimol (Bac/Mus) into the mOFC (n = 12) reduced the number of reversals completed per 100 successful trials. For this and all other figures, circles and dashed lines represent data from individual animals following both treatments. B, Errors to achieve criterion performance during the initial discrimination and first reversal phases after inactivation and control treatments. mOFC inactivation increased errors during the first discrimination of the session, and this effect persisted during the first reversal. C, mOFC inactivation increased perseverative errors throughout the task. D, mOFC inactivation caused a decrease in both win-stay and lose-shift behavior after both correct and incorrect choices. E, lOFC inactivation (n = 10) also reduced the number of reversals completed. F, In contrast to mOFC inactivation, lOFC inactivation did not affect error rates during the initial acquisition of the task but did tend to increase the number of errors made during the first reversal. G, lOFC inactivation did not alter perseverative tendencies. H, These treatments reduced both win-stay and lose-shift behaviors only after incorrect choices. Asterisk denotes p < 0.05.

Table 1.

Number of trial omissions over 200 trials and average response latencies following inactivation and vehicle treatments in different regions of the OFC and medial PFC

Region and measure                      Saline        Inactivation
Probabilistic reversals
    Orbitofrontal
        mOFC
            Trial omissions             10.9 (3.9)    8.2 (3.1)
            Response latency (s)        1.0 (0.2)     1.0 (0.1)
        lOFC
            Trial omissions             9.2 (2.9)     21.5 (5.1)*
            Response latency (s)        1.0 (0.1)     1.5 (0.2)*
    Medial prefrontal
        Prelimbic
            Trial omissions             3.3 (2.0)     1.4 (0.8)
            Response latency (s)        0.7 (0.1)     0.6 (0.1)
        Infralimbic
            Trial omissions             10.2 (2.9)    11.5 (5.7)
            Response latency (s)        1.0 (0.1)     0.9 (0.1)
        Anterior cingulate
            Trial omissions             11.0 (5.3)    22.4 (8.5)
            Response latency (s)        0.8 (0.1)     1.2 (0.2)*
Reversals w/assured outcomes
        mOFC
            Trial omissions             3.0 (1.0)     7.8 (4.3)
            Response latency (s)        0.9 (0.1)     0.9 (0.1)
        lOFC
            Trial omissions             1.0 (0.6)     9.8 (4.2)
            Response latency (s)        0.7 (0.1)     1.1 (0.2)*

Values are displayed as mean (SEM).

*p < 0.05.

To determine whether differences in performance were attributable to difficulty during reversal shifts or a more general disruption in learning based on probabilistic feedback, we compared the number of errors to achieve criterion for the initial discrimination and first reversal. One rat in this group did not achieve criterion performance on the initial discrimination following inactivation treatments, whereas the remaining 11 rats completed the initial discrimination phase and at least one reversal after both treatments. Analysis of the data from these 11 rats revealed a significant main effect of treatment (F(1,10) = 6.00, p = 0.034), but no treatment × phase interaction (F(1,10) = 0.41, p = 0.54). As displayed in Figure 2B, mOFC inactivation increased errors to criterion during the initial discrimination and first reversal, although visual inspection of these data showed that this effect was numerically larger during the initial discrimination. Furthermore, during the reversal phases of the task, mOFC inactivation increased the average number of perseverative errors per reversal (i.e., consecutive errors following a shift in reinforcement contingencies; F(1,11) = 5.51, p = 0.041; Fig. 2C). Thus, mOFC inactivation not only impaired the use of probabilistic reward feedback to identify the more profitable option at the start of a test session, but also retarded suppression of a particular response upon shifts in reinforcement contingencies.

Additional insight into the deficits induced by mOFC inactivation was obtained from analyses of changes in sensitivity to positive or negative feedback. Under control conditions, rats followed a rewarded correct choice with another correct choice (win-stay behavior) on 70 ± 2% of these occasions, whereas a rewarded incorrect choice was followed by another incorrect choice on 65 ± 4% of these types of trials. In comparison, on trials where rats were not rewarded after a response, they shifted to the alternative lever (lose-shift) on 48 ± 4% and 53 ± 3% of subsequent trials after correct and incorrect choices, respectively. Analysis of these data obtained on saline and inactivation test days revealed a significant main effect of treatment (F(1,11) = 7.56, p = 0.019), but no interaction effects with the treatment factor were observed (all F values <0.2, all p values >0.90). As displayed in Figure 2D, mOFC inactivation uniformly reduced both win-stay and lose-shift behavior, regardless of whether the preceding choice was correct or incorrect. Thus, mOFC inactivation rendered animals less sensitive to either positive or negative feedback, reducing the impact that recent action outcomes exerted on subsequent choices. Together, these data demonstrate that the mOFC plays a critical role in facilitating probabilistic learning. The marked impairment induced by mOFC inactivation in well trained subjects was apparent during the initial discrimination phase, suggesting that these effects may not reflect deficits exclusive to reversal learning, but rather a more comprehensive impairment in probabilistic reinforcement learning. These impairments were associated with increased perseverative tendencies, along with a general reduction in the ability to incorporate positive or negative feedback to guide subsequent action selection.

lOFC inactivation: probabilistic reversals

Eleven rats with cannulae implanted into the lOFC were used in this experiment. Data from one rat were eliminated because of inaccurate placements that were ventral to the lOFC, leaving a final n = 10. Inactivation of the lOFC impaired PRL performance, as indexed by a reduction in the number of reversals completed per 100 trials (F(1,9) = 8.33, p = 0.018; Fig. 2E). However, analysis of the errors made during the initial phases of the task suggested that these impairments were qualitatively different from those observed following mOFC inactivation. Analysis of the number of errors to criterion for the initial discrimination and first reversal revealed no significant main effect of treatment (F(1,9) = 1.28, p = 0.29) but did reveal a strong trend toward a significant treatment × phase interaction (F(1,9) = 4.68, p = 0.059). As is apparent from Figure 2F, performance during the initial discrimination phase was not significantly affected by inactivation of the lOFC; however, performance was noticeably impaired during the reversal phase, indicating that rats had difficulty modifying their behavior following a change in reward contingencies. Nevertheless, this impairment was not associated with enhanced perseverative tendencies (F(1,9) = 1.20, p = 0.30; Fig. 2G). This lack of effect on perseverative responding may be related to the extended training rats received, which may reduce lOFC involvement in response suppression under these circumstances (Boulougouris and Robbins, 2009; Young and Shapiro, 2009; Stalnaker et al., 2015).

Unlike mOFC inactivation, lOFC inactivation significantly increased choice latencies (F(1,9) = 7.85, p = 0.022; Table 1), in a manner similar to the effects of this manipulation on response latencies during probabilistic discounting (St Onge and Floresco, 2010). Accordingly, lOFC inactivation also increased trial omissions (F(1,9) = 7.86, p = 0.021; Table 1), presumably attributable to a slowing of response selection that led to a greater number of trials where rats did not respond within the allotted 10 s period while the levers were extended. This increase in choice latency may be related to alterations in phasic dopamine transmission, which can invigorate approach behavior toward reward-related cues (Flagel et al., 2011). Similar inactivation of the lOFC has been reported to attenuate phasic dopamine responses induced by reward-related cues during cost/benefit decision making (Jo and Mizumori, 2015). Thus, in addition to mediating accurate performance during PRL, neural activity in the lOFC may also facilitate timely approach toward reward-related stimuli via interactions with the dopamine system.

With respect to changes in win-stay/lose-shift behavior, analysis of these data revealed a significant treatment × choice interaction (F(1,9) = 5.57, p = 0.043; Fig. 2H), but no other significant main effects or interactions with the treatment factor (all F values <1.5, all p values >0.25). Simple main effects analysis further revealed that lOFC inactivation did not affect win-stay or lose-shift tendencies following a correct choice (all F values <2.2, all p values >0.17). Instead, these treatments induced a subtle but statistically significant reduction in both win-stay and lose-shift ratios after incorrect choices (F(1,9) = 5.09, p = 0.05; Fig. 2H right). Thus, following lOFC inactivation, rats were less likely to shift away from the incorrect lever after a more common non-rewarded response, and were also less likely to select the incorrect lever again on the rarer occasions when an erroneous choice was rewarded.

mOFC and lOFC inactivations: reversal learning with assured outcomes

The finding that inactivation of either the mOFC or lOFC impaired PRL in well trained animals differs from other observations that lesions of the OFC region do not impair reversal performance once rats have experienced shifts in reinforcement contingencies (Schoenbaum et al., 2002; McAlonan and Brown, 2003; Boulougouris et al., 2007; Boulougouris and Robbins, 2009). An important difference between the previous and present studies is that in the former instances, a correct/incorrect response always/never delivered reward. To explore whether impairments induced by inactivation of these OFC regions were related to the probabilistic nature of the task, separate groups of rats were well trained on a similar task in which a correct choice was always rewarded and an incorrect choice was never rewarded before receiving saline or inactivation treatments in either the mOFC or lOFC. In these experiments, rats completed more reversals compared with those trained on the probabilistic task, likely due to the relatively more straightforward reinforcement contingencies. As such, all rats in both the mOFC and lOFC groups completed the first discrimination and at least two reversals after saline and inactivation treatments, permitting us to compare the number of errors made across these three phases.

Infusions of baclofen/muscimol into the mOFC (n = 8) failed to affect performance when correct/incorrect responses were always/never reinforced (F(1,7) = 1.22, p = 0.30; Fig. 3A). Analysis of the number of errors to achieve criterion for the initial discrimination and subsequent two reversals did not reveal a significant main effect of treatment or treatment × phase interactions (all F values <2.84, all p values >0.13; Fig. 3B).
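As a rough illustration of the two performance measures used here (errors to criterion per phase and reversals completed per 100 trials), the following sketch segments a session into phases. The criterion of eight consecutive correct choices is an assumption that is not stated in this section, and the function names and input format are hypothetical.

```python
def errors_per_phase(correct_flags, criterion=8):
    """Count errors made in each completed phase of a session.

    correct_flags: list of booleans, one per completed trial, True if the
                   currently correct lever was chosen (hypothetical format).
    criterion: consecutive correct choices that end a phase and trigger a
               reversal (assumed value, not taken from this section).
    Returns a list with one error count per completed phase.
    """
    errors_by_phase = []
    errors = 0
    streak = 0
    for ok in correct_flags:
        if ok:
            streak += 1
            if streak == criterion:
                # Criterion reached: close this phase and start the next one.
                errors_by_phase.append(errors)
                errors, streak = 0, 0
        else:
            errors += 1
            streak = 0
    return errors_by_phase

def reversals_per_100(correct_flags, criterion=8):
    """Completed phases (each triggering a reversal) per 100 completed trials."""
    n_trials = len(correct_flags)
    if n_trials == 0:
        return 0.0
    return 100.0 * len(errors_per_phase(correct_flags, criterion)) / n_trials
```

Under these assumptions, the first element of the returned list corresponds to the initial discrimination and the following elements to successive reversal phases.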

Figure 3.


Inactivation of neither the mOFC (top row, n = 8) nor the lOFC (bottom row, n = 6) affects performance of a reversal learning task when feedback was assured. A, C, Number of reversals completed per 100 successful trials following saline or inactivation treatments within the mOFC or lOFC. B, D, Errors to achieve criterion performance were not affected by inactivation of either the mOFC (B) or the lOFC (D) during the initial discrimination and first two reversal phases of the reversal with assured outcomes task after inactivation and control treatments.

Inactivation of the lOFC (n = 6) had a somewhat equivocal effect on performance on this task. Three of the six animals in this experiment showed a considerable reduction in the number of reversals completed after inactivation relative to saline treatments, one rat displayed a marked increase in this measure, and two others showed minimal change in performance (Fig. 3C). Thus, even though these treatments reduced the average number of reversals completed, analyses of these data failed to yield a significant difference between treatment conditions (F(1,5) = 1.08, p = 0.35; Fig. 3C). This trend appeared to be driven by a slight increase in the number of errors committed during the initial discrimination of the session, yet analyses of the errors made during the first three discriminations also failed to reveal a significant difference between treatments (main effect of treatment: F(1,5) = 1.04, p = 0.36; treatment × phase interaction: F(2,10) = 2.35, p = 0.15).

With respect to other performance measures, the number of omissions was unaffected by inactivation of either brain region (both F values <4.28; both p values >0.09; Table 1), whereas response latency was unaffected by inactivation of the mOFC (F(1,7) = 0.43, p = 0.53) but again was significantly increased by lOFC inactivation (F(1,5) = 23.38, p = 0.005; Table 1). Thus, these data confirm that neural activity within the mOFC is not required for efficient reversal performance when animals have experienced shifts in reinforcement contingencies that are assured. Furthermore, the lOFC plays, at best, a relatively limited role in facilitating reversal shifts under these conditions after extended training, consistent with previous findings (Boulougouris and Robbins, 2009; Young and Shapiro, 2009). In comparison, both regions play more prominent, although somewhat different roles in mediating cognitive flexibility when action–outcome contingencies are probabilistic.

Medial PFC regions and PRL

Prelimbic PFC

Eighteen rats with cannulae implanted into the prelimbic area of the PFC were trained on the PRL task for this experiment. Data from four rats were eliminated because of inaccurate placements residing ventral to the prelimbic cortex, leaving a final n = 14. In these animals, inactivation of the prelimbic cortex induced a surprising increase in the number of reversals completed (F(1,13) = 22.11, p < 0.001; Fig. 4A). Rats in this cohort completed fewer reversals/100 trials (1.6 ± 0.2) after saline infusions into the prelimbic cortex when compared with control performance of rats in the OFC groups (2.6–3.4 reversals completed/100 trials). To confirm that the increase in reversals completed following prelimbic inactivations was not an artifact of the somewhat poorer performance of these rats under control conditions, we analyzed data from a subset of animals whose performance after saline infusions was more comparable to rats in the OFC groups. Despite the smaller number of animals included in this analysis (n = 6), we again observed that inactivation of the prelimbic PFC increased the number of reversals completed/100 trials (mean = 3.6 ± 0.2) relative to saline infusions (mean = 2.2 ± 0.1; F(1,5) = 49.82, p < 0.001). The improvement in reversal performance induced by prelimbic inactivation was mirrored by a significant decrease in the number of errors made to reach criterion at both the discrimination and reversal phases (main effect of treatment: F(1,13) = 15.61, p = 0.002), with no treatment × phase interaction (F(1,13) = 0.67, p = 0.43; Fig. 4B). Inactivation of the prelimbic cortex had no effects on perseverative errors (Fig. 4C), number of omissions or latency to respond (all F values <1.5, all p values >0.24; Table 1).

Figure 4.


Inactivation of the prelimbic PFC induced an apparent improvement in PRL performance. A, Inactivation of the prelimbic PFC (n = 14) increased the number of reversals completed per 100 trials relative to control treatments. B, Errors to achieve criterion were reduced following prelimbic PFC inactivation. C, Perseverative-type errors were not affected by these treatments. D, Win-stay tendencies were increased following both correct and incorrect choices while lose-shift behavior was decreased only following correct choices. Asterisks denote p < 0.05.

Additional insight into the apparent improvement in probabilistic reversal performance induced by prelimbic cortex inactivation was obtained by analysis of the win-stay/lose-shift data. This analysis yielded a significant treatment × trial type interaction (F(1,13) = 27.42, p < 0.001) and a significant treatment × choice type interaction (F(1,13) = 10.39, p = 0.007), although the three-way interaction was not significant (F(1,13) = 1.99, p = 0.18). To further clarify the effect of prelimbic inactivation on reward and negative feedback sensitivity, exploratory two-way ANOVAs were conducted on win-stay and lose-shift data obtained after correct and incorrect choices. For correct choices, the analysis again revealed a significant treatment × trial type interaction (F(1,13) = 16.28, p = 0.001; Fig. 4D, left). Partitioning of this interaction confirmed that inactivation of the prelimbic PFC increased the tendency to follow a rewarded correct choice with another correct choice (p < 0.05), while at the same time reducing the tendency to shift responding after a non-rewarded correct choice (p < 0.05). Analysis of win-stay/lose-shift ratios after incorrect choices yielded another significant treatment × trial type interaction (F(1,13) = 15.58, p = 0.002; Fig. 4D, right). On these types of trials, prelimbic inactivation again increased win-stay behavior (p < 0.05). However, these treatments did not alter the likelihood of rats shifting their responding after a non-rewarded incorrect response (p > 0.15). Thus, the enhanced reversal performance induced by prelimbic inactivation was likely driven by a greater tendency for rats to follow a rewarded correct choice with a similar choice, while at the same time making them less likely to shift away from the correct lever on trials when correct choices were not reinforced.

Infralimbic PFC

Fourteen rats with cannulae implanted into the infralimbic PFC were trained on the PRL task, and data from three rats were eliminated because of inaccurate placements residing ventral to the infralimbic cortex. For the remaining animals (n = 11), inactivation of the infralimbic cortex did not significantly affect the number of reversals per 100 completed trials (F(1,10) = 1.41, p = 0.26; Fig. 5A) or errors at either the acquisition or reversal stage of the test (all F values <1.0; Fig. 5B). Notably, there was considerable overlap in terms of the anterior/posterior placements of guide cannula in this group relative to those in the prelimbic group (Fig. 1). Yet performance of this group under control conditions was comparable to other groups in the study, suggesting that the lower number of reversals completed by rats in the prelimbic group after control treatments was more likely attributable to random variations in performance across groups than to nonspecific damage incurred by the indwelling cannula. There were also no significant main effects or interactions with the treatment factor for win-stay/lose-shift behavior (all F values <1.0; Fig. 5C). Number of omissions made and latency to respond were also unaffected (both F values <1.0; Table 1).

Figure 5.


Inactivation of the infralimbic PFC (top row, n = 11) or the anterior cingulate (bottom row, n = 10) did not significantly affect PRL performance. A, D, The number of reversals completed per 100 trials. B, E, The number of errors made to achieve criterion during either the initial acquisition or reversal stages of the task. C, F, Win-stay/lose-shift behavior. Note, however, the trend of reduced number of reversals completed induced by anterior cingulate inactivation.

dACC

Thirteen rats were initially trained in this experiment. Data from two rats were eliminated following the postmortem identification of tumors, and data from one rat were eliminated because of inaccurate, asymmetrical placement, leaving a final n = 10. As displayed in Figure 5D, infusions of baclofen/muscimol into the dACC reduced the number of reversals per 100 completed trials in the majority of animals tested, yet one rat in this experiment showed a marked increase on this measure. This variability occluded our ability to detect a significant effect of treatment (F(1,9) = 3.05, p = 0.12). Despite this trend, analysis of the error data showed that dACC inactivation had no significant main effect on the number of errors made to reach criterion at either the acquisition or reversal stage of the test (all F values <1.0; Fig. 5E). Similarly, no significant effect was found for win-stay/lose-shift behavior (all F values <1.4; Fig. 5F). Latency to respond was significantly increased following inactivation of the ACC (F(1,9) = 13.18, p = 0.005; Table 1), whereas the number of omissions made was unaffected (F(1,9) = 1.62, p = 0.24; Table 1).

Discussion

The present study provides novel insight into the contribution of different OFC and medial PFC regions in reinforcement learning when reward feedback is probabilistic. Inactivation of the mOFC or lOFC induced qualitatively different deficits, with mOFC inactivation impairing probabilistic learning, increasing perseverative responding and reducing the impact of both rewarded and non-rewarded actions on subsequent action selection. lOFC inactivation more selectively impaired reversal performance, driven in part by a disruption in adjusting behavior after non-rewarded incorrect choices. In contrast, prelimbic medial PFC inactivation seemingly improved performance, increasing sensitivity to reinforced actions and reducing sensitivity to non-rewarded correct choices.

Different contributions by OFC subregions to PRL

OFC damage impairs reversal learning with determined outcomes, while leaving initial discrimination learning relatively intact (Dias et al., 1996; Fellows and Farah, 2003; Boulougouris et al., 2007; Ghods-Sharifi et al., 2008). Most rodent studies have focused on the lOFC, whereas comparatively few have examined the contribution of the mOFC to this form of cognitive flexibility (Gourley et al., 2010). The present findings that activity in both OFC regions enables efficient PRL, in combination with the relative lack of effect on performance on a similar task where outcomes were assured, reveal that both OFC regions play fundamental and pervasive roles in facilitating flexible responding when reinforcement contingencies are probabilistic. These data complement those implicating the OFC in guiding behavior under conditions of uncertainty (Rogers et al., 1999; van Duuren et al., 2009) or when task complexity is otherwise increased (Rudebeck and Murray, 2008).

mOFC inactivation increased errors during the initial discrimination, suggestive of an impairment in distinguishing responses that yield high probability rewards from lower ones. This is in keeping with suggestions that this region integrates goal value signals (Elliott et al., 2000; Kable and Glimcher, 2009) and mediates action–outcome representations (Mainen and Kepecs, 2009) to guide value-based action selection (Gläscher et al., 2009, 2012; Sul et al., 2010; Stopper et al., 2014). Additional analyses revealed that suboptimal reward seeking reflected a generalized deficit in retrieving and incorporating information about outcomes of previous actions to guide subsequent choice, as both win-stay and lose-shift behavior were reduced. Reduced negative feedback sensitivity after a non-rewarded incorrect choice may have contributed to increased perseveration, promoting persistent erroneous responding after a shift in reinforcement contingencies. This complex myriad of effects highlights the importance of the mOFC in facilitating probabilistic learning by integrating information about the likelihood of obtaining rewards following different actions to guide ongoing reward seeking.

In contrast, lOFC inactivation did not affect initial discrimination learning, suggesting that these manipulations left basic motoric and motivational processes intact. Instead, these treatments induced more restricted impairments during reversal stages, decreasing win-stay and lose-shift behavior selectively after incorrect choices, consistent with the idea that the lOFC mediates adjustments in response selection upon violations of reward expectancies signaled by negative feedback (O'Doherty et al., 2001; Levens et al., 2014). Notably, functional imaging in monkeys performing reversal tasks has revealed outcome-associated activation of the lOFC that was related to win-stay/lose-shift behavior, suggesting that this region is involved in directing behavior that is adaptive to the context, given the recent distribution of reward to choices, to maximize future reward (Chau et al., 2015). Our results suggest that this activity may be particularly important following incorrect actions. Furthermore, the fact that lOFC inactivation did not affect perseveration suggests that impairments observed in this experiment are less likely to be attributable to updating action–outcome associations after a reversal, but rather, may reflect an impairment in maintaining appropriate patterns of choice upon changes in reinforcement contingencies.

These effects of lOFC inactivation complement findings obtained with monkeys with lesions of the OFC encompassing lateral and medial regions that displayed impaired performance on a three-choice probabilistic learning task, but only during the reversal phases (Walton et al., 2010). This collection of findings provides additional support for the recent theoretical synthesis of Stalnaker et al. (2015), who propose that the lOFC may be recruited in situations that require “a novel value to be computed on the fly using new information or predictions that have been acquired since the original learning.” Thus, the lOFC and mOFC may play distinct yet complementary roles in facilitating PRL. The mOFC facilitates use of probabilistic feedback to identify actions that may yield higher probability rewards (Noonan et al., 2012). In comparison, the lOFC may identify changes in reinforcement contingencies and signal the mOFC to update appraisals concerning actions that may be more profitable.

Tsuchida et al. (2010) tested humans with damage to both mOFC and lOFC on a PRL task and observed impairments during the initial discrimination and reversal phase, along with increased win-shift tendencies (i.e., reduced win-stay behavior). Our findings suggest that impaired initial discrimination learning and reduced win-stay behavior observed in humans with OFC damage may be attributable to disrupted mOFC function, whereas impaired reversal performance may be related to lOFC damage. These findings emphasize that a more comprehensive understanding of OFC functions will require isolating the dissociable and/or complementary contributions that the medial and lateral portions of this region make to reward seeking, cognitive flexibility, and other aspects of behavior.

Medial PFC regions and PRL

Inactivation of the infralimbic or prelimbic medial PFC did not impair PRL. Similar treatments in the dACC tended to reduce the number of reversals completed, but this was not accompanied by changes in error rates or win-stay/lose-shift behavior. The ACC has been proposed to play a role in response inhibition, error detection and performance monitoring (Miyake et al., 2000; Miller and Cohen, 2001; Chase et al., 2008), as well as integration of choice/outcome history (Williams et al., 2004; Rushworth et al., 2007). Note, however, that humans with dACC lesions display normal PRL (Tsuchida et al., 2010). Furthermore, even though dACC inactivation did not significantly impair performance, it did slow choice latencies, suggesting that it plays a permissive role in guiding response selection in these situations.

Prelimbic inactivations not only failed to impair PRL, but actually increased the number of reversals completed. In comparison, lesions or inactivation of this region typically do not affect reversal learning with assured outcomes (Ragozzino et al., 1999; Boulougouris et al., 2007; Floresco et al., 2008). In attempting to understand this surprising effect, it should be noted that the frequent shifts in reinforcement contingencies animals experienced over training would reduce the impact that individual rewarded actions had on subsequent choice. Instead, these conditions promote tracking the broader context of reward history. In this regard, neurophysiological and inactivation studies have implicated the prelimbic PFC in identifying changes in reinforcement contingencies (Durstewitz et al., 2010) and monitoring actions–outcomes to track variations in reward probability (St Onge and Floresco, 2010; St Onge et al., 2012; Orsini et al., 2015). In the present study, prelimbic inactivation increased the likelihood of rats repeating a rewarded choice with the same type of choice. Thus, rather than integrating their reward history, rats with prelimbic inactivation displayed a form of reward myopia, with response selection more heavily influenced by the most recently rewarded action. Note that increased win-stay behavior was observed regardless of whether the previous choice was the correct action or not. However, correct choices were rewarded much more frequently, which would lead to a greater number of correct versus incorrect choices.

A consequence of the contingencies used here was that 20% of correct choices were not rewarded. Under control conditions, this causes a shift in responding on ∼40% of such trials, which in turn can interrupt a streak of correct choices and delay criterion performance that triggers a reversal. Prelimbic inactivation reduced lose-shift behavior, primarily after non-rewarded correct choices, rather than incorrect ones. This suggests that the ability to adjust behavior after a string of non-rewarded actions (as would occur after most incorrect choices) is relatively spared by prelimbic inactivation. On the other hand, reductions in lose-shift behavior during probabilistic discounting have been observed following disconnection of prelimbic projections to the basolateral amygdala (St Onge et al., 2012). This combination of findings suggests that neural activity in the prelimbic PFC facilitates detection of infrequent errors in reward prediction that occur after non-rewarded actions. Together, the seemingly improved PRL performance after prelimbic inactivation may actually reflect impairments in the ability to monitor different aspects of volatile action–outcome associations, including diminished sensitivity to both the long-term reward history of actions and occasional negative feedback. Within the context of the PRL task structure, these impairments would increase the likelihood of repeating correct choices and reduce shifts in responding, which in this instance, manifested as longer streaks of correct choices and more reversals completed.
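The arithmetic behind this argument can be sketched roughly as follows. The sketch treats each choice after a correct response as an independent event, which is a simplification. Only the 80% reward rate for correct choices and the ∼40% lose-shift rate come from the text; the win-stay probability of 0.9, the reduced lose-shift value of 0.2, and the criterion of eight consecutive correct choices are assumptions used purely for illustration.

```python
def p_repeat_correct(p_reward, win_stay, lose_shift):
    """Probability of repeating the correct lever after one correct choice.

    p_reward:   probability that a correct choice is rewarded (0.8 in the task)
    win_stay:   probability of repeating a choice after reward (assumed value)
    lose_shift: probability of switching after non-reward
    """
    return p_reward * win_stay + (1.0 - p_reward) * (1.0 - lose_shift)

def p_complete_streak(p_repeat, criterion=8):
    """Probability that one correct choice extends into a full criterion streak.

    criterion is an assumed number of consecutive correct choices; after the
    first correct choice, criterion - 1 further repeats are needed.
    """
    return p_repeat ** (criterion - 1)

# Control-like condition: lose-shift of about 40% after non-rewarded correct choices.
control = p_repeat_correct(p_reward=0.8, win_stay=0.9, lose_shift=0.4)
# Hypothetical reduced lose-shift, loosely mimicking prelimbic inactivation.
reduced = p_repeat_correct(p_reward=0.8, win_stay=0.9, lose_shift=0.2)

streak_control = p_complete_streak(control)
streak_reduced = p_complete_streak(reduced)
```

Because the per-trial repeat probability is raised to a power, even a modest drop in lose-shift behavior after non-rewarded correct choices compounds across the streak, which is in line with the interpretation that reduced shifting can yield more reversals completed.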

Conclusions

The present findings highlight the complex segregation of activity within different regions of the frontal lobes in mediating cognitive flexibility in uncertain situations. Both the mOFC and lOFC cooperate to ascertain courses of action that are more likely to yield rewards and detect changes in reinforcement contingencies. In contrast, the prelimbic PFC appears to monitor action–outcome reward histories and non-rewarded actions. It is noteworthy that deficits in probabilistic learning, cognitive flexibility and altered sensitivity to reward and negative feedback are apparent in a variety of psychiatric disorders such as schizophrenia and depression (Waltz and Gold, 2007; Taylor Tavares et al., 2008; Roiser et al., 2009; Whitton et al., 2015). Further clarification of the mechanisms that mediate these functions in the normal brain may provide insight into the distinct pathophysiologies of different frontal lobe regions that underlie abnormalities in specific aspects of cognition.

Footnotes

This work was supported by grants from the Natural Sciences and Engineering Research Council of Canada to A.G.P. and S.B.F.

The authors declare no competing financial interests.

References

  1. Bari A, Theobald DE, Caprioli D, Mar AC, Aidoo-Micah A, Dalley JW, Robbins TW. Serotonin modulates sensitivity to reward and negative feedback in a probabilistic reversal learning task in rats. Neuropsychopharmacology. 2010;35:1290–1301. doi: 10.1038/npp.2009.233.
  2. Berlin HA, Rolls ET, Kischka U. Impulsivity, time perception, emotion and reinforcement sensitivity in patients with orbitofrontal cortex lesions. Brain. 2004;127:1108–1126. doi: 10.1093/brain/awh135.
  3. Birrell JM, Brown VJ. Medial frontal cortex mediates perceptual attentional set shifting in the rat. J Neurosci. 2000;20:4320–4324. doi: 10.1523/JNEUROSCI.20-11-04320.2000.
  4. Boulougouris V, Robbins TW. Pre-surgical training ameliorates orbitofrontal-mediated impairments in spatial reversal learning. Behav Brain Res. 2009;197:469–475. doi: 10.1016/j.bbr.2008.10.005.
  5. Boulougouris V, Dalley JW, Robbins TW. Effects of orbitofrontal, infralimbic and prelimbic cortical lesions on serial spatial reversal learning in the rat. Behav Brain Res. 2007;179:219–228. doi: 10.1016/j.bbr.2007.02.005.
  6. Chase HW, Clark L, Sahakian BJ, Bullmore ET, Robbins TW. Dissociable roles of prefrontal subregions in self-ordered working memory performance. Neuropsychologia. 2008;46:2650–2661. doi: 10.1016/j.neuropsychologia.2008.04.021.
  7. Chau BK, Sallet J, Papageorgiou GK, Noonan MP, Bell AH, Walton ME, Rushworth MF. Contrasting roles for orbitofrontal cortex and amygdala in credit assignment and learning in macaques. Neuron. 2015;87:1106–1118. doi: 10.1016/j.neuron.2015.08.018.
  8. Cools R, Clark L, Owen AM, Robbins TW. Defining the neural mechanisms of probabilistic reversal learning using event-related functional magnetic resonance imaging. J Neurosci. 2002;22:4563–4567. doi: 10.1523/JNEUROSCI.22-11-04563.2002.
  9. Dalton GL, Phillips AG, Floresco SB. Preferential involvement by nucleus accumbens shell in mediating probabilistic learning and reversal shifts. J Neurosci. 2014;34:4618–4626. doi: 10.1523/JNEUROSCI.5058-13.2014.
  10. Dias R, Robbins TW, Roberts AC. Primate analogue of the Wisconsin card sorting test: effects of excitotoxic lesions of the prefrontal cortex in the marmoset. Behav Neurosci. 1996;110:872–886. doi: 10.1037/0735-7044.110.5.872.
  11. Durstewitz D, Vittoz NM, Floresco SB, Seamans JK. Abrupt transitions between prefrontal neural ensemble states accompany behavioral transitions during rule learning. Neuron. 2010;66:438–448. doi: 10.1016/j.neuron.2010.03.029.
  12. Elliott R, Dolan RJ, Frith CD. Dissociable functions in the medial and lateral orbitofrontal cortex: evidence from human neuroimaging studies. Cereb Cortex. 2000;10:308–317. doi: 10.1093/cercor/10.3.308.
  13. Fellows LK, Farah MJ. Ventromedial frontal cortex mediates affective shifting in humans: evidence from a reversal learning paradigm. Brain. 2003;126:1830–1837. doi: 10.1093/brain/awg180.
  14. Flagel SB, Clark JJ, Robinson TE, Mayo L, Czuj A, Willuhn I, Akers CA, Clinton SM, Phillips PE, Akil H. A selective role for dopamine in stimulus–reward learning. Nature. 2011;469:53–57. doi: 10.1038/nature09588.
  15. Floresco SB, Block AE, Tse MT. Inactivation of the medial prefrontal cortex of the rat impairs strategy set-shifting, but not reversal learning, using a novel, automated procedure. Behav Brain Res. 2008;190:85–96. doi: 10.1016/j.bbr.2008.02.008.
  16. Ghods-Sharifi S, Haluk DM, Floresco SB. Differential effects of inactivation of the orbitofrontal cortex on strategy set-shifting and reversal learning. Neurobiol Learn Mem. 2008;89:567–573. doi: 10.1016/j.nlm.2007.10.007.
  17. Gläscher J, Hampton AN, O'Doherty JP. Determining a role for ventromedial prefrontal cortex in encoding action-based value signals during reward-related decision making. Cereb Cortex. 2009;19:483–495. doi: 10.1093/cercor/bhn098.
  18. Gläscher J, Adolphs R, Damasio H, Bechara A, Rudrauf D, Calamia M, Paul LK, Tranel D. Lesion mapping of cognitive control and value-based decision making in the prefrontal cortex. Proc Natl Acad Sci U S A. 2012;109:14681–14686. doi: 10.1073/pnas.1206608109.
  19. Gourley SL, Lee AS, Howell JL, Pittenger C, Taylor JR. Dissociable regulation of instrumental action within mouse prefrontal cortex. Eur J Neurosci. 2010;32:1726–1734. doi: 10.1111/j.1460-9568.2010.07438.x.
  20. Hoover WB, Vertes RP. Anatomical analysis of afferent projections to the medial prefrontal cortex in the rat. Brain Struct Funct. 2007;212:149–179. doi: 10.1007/s00429-007-0150-4.
  21. Hoover WB, Vertes RP. Projections of the medial orbital and ventral orbital cortex in the rat. J Comp Neurol. 2011;519:3766–3801. doi: 10.1002/cne.22733.
  22. Hornak J, O'Doherty J, Bramham J, Rolls ET, Morris RG, Bullock PR, Polkey CE. Reward-related reversal learning after surgical excisions in orbito-frontal or dorsolateral prefrontal cortex in humans. J Cogn Neurosci. 2004;16:463–478. doi: 10.1162/089892904322926791.
  23. Jo YS, Mizumori SJ. Prefrontal regulation of neuronal activity in the ventral tegmental area. Cereb Cortex. 2015 doi: 10.1093/cercor/bhv215. Advance online publication. Retrieved September 22, 2015.
  24. Kable JW, Glimcher PW. The neurobiology of decision: consensus and controversy. Neuron. 2009;63:733–745. doi: 10.1016/j.neuron.2009.09.003.
  25. Levens SM, Larsen JT, Bruss J, Tranel D, Bechara A, Mellers BA. What might have been? The role of the ventromedial prefrontal cortex and lateral orbitofrontal cortex in counterfactual emotions and choice. Neuropsychologia. 2014;54:77–86. doi: 10.1016/j.neuropsychologia.2013.10.026.
  26. Mainen ZF, Kepecs A. Neural representation of behavioral outcomes in the orbitofrontal cortex. Curr Opin Neurobiol. 2009;19:84–91. doi: 10.1016/j.conb.2009.03.010.
  27. McAlonan K, Brown VJ. Orbital prefrontal cortex mediates reversal learning and not attentional set shifting in the rat. Behav Brain Res. 2003;146:97–103. doi: 10.1016/j.bbr.2003.09.019.
  28. Miller EK, Cohen JD. An integrative theory of prefrontal cortex function. Annu Rev Neurosci. 2001;24:167–202. doi: 10.1146/annurev.neuro.24.1.167.
  29. Miyake A, Friedman NP, Emerson MJ, Witzki AH, Howerter A, Wager TD. The unity and diversity of executive functions and their contributions to complex “frontal lobe” tasks: a latent variable analysis. Cogn Psychol. 2000;41:49–100. doi: 10.1006/cogp.1999.0734.
  30. Noonan MP, Kolling N, Walton ME, Rushworth MF. Re-evaluating the role of the orbitofrontal cortex in reward and reinforcement. Eur J Neurosci. 2012;35:997–1010. doi: 10.1111/j.1460-9568.2012.08023.x.
  31. O'Doherty J, Kringelbach ML, Rolls ET, Hornak J, Andrews C. Abstract reward and punishment representations in the human orbitofrontal cortex. Nat Neurosci. 2001;4:95–102. doi: 10.1038/82959.
  32. O'Doherty J, Critchley H, Deichmann R, Dolan RJ. Dissociating valence of outcome from behavioral control in human orbital and ventral prefrontal cortices. J Neurosci. 2003;23:7931–7939. doi: 10.1523/JNEUROSCI.23-21-07931.2003.
  33. Ongür D, Price JL. The organization of networks within the orbital and medial prefrontal cortex of rats, monkeys and humans. Cereb Cortex. 2000;10:206–219. doi: 10.1093/cercor/10.3.206.
  34. Orsini CA, Moorman DE, Young JW, Setlow B, Floresco SB. Neural mechanisms regulating different forms of risk-related decision-making: insights from animal models. Neurosci Biobehav Rev. 2015;58:147–167. doi: 10.1016/j.neubiorev.2015.04.009.
  35. Paxinos G, Watson C. The rat brain in stereotaxic coordinates. Ed 5. San Diego: Elsevier Academic; 2005.
  36. Preuss TM. Do rats have prefrontal cortex? The Rose-Woolsey-Akert program reconsidered. J Cogn Neurosci. 1995;7:1–24. doi: 10.1162/jocn.1995.7.1.1.
  37. Ragozzino ME, Wilcox C, Raso M, Kesner RP. Involvement of rodent prefrontal cortex subregions in strategy switching. Behav Neurosci. 1999;113:32–41. doi: 10.1037/0735-7044.113.1.32.
  38. Remijnse PL, Nielen MM, Uylings HB, Veltman DJ. Neural correlates of a reversal learning task with an affectively neutral baseline: an event-related fMRI study. Neuroimage. 2005;26:609–618. doi: 10.1016/j.neuroimage.2005.02.009.
  39. Rogers RD, Owen AM, Middleton HC, Williams EJ, Pickard JD, Sahakian BJ, Robbins TW. Choosing between small, likely rewards and large, unlikely rewards activates inferior and orbital prefrontal cortex. J Neurosci. 1999;19:9029–9038. doi: 10.1523/JNEUROSCI.19-20-09029.1999.
  40. Roiser JP, Cannon DM, Gandhi SK, Taylor Tavares J, Erickson K, Wood S, Klaver JM, Clark L, Zarate CA, Jr, Sahakian BJ, Drevets WC. Hot and cold cognition in unmedicated depressed subjects with bipolar disorder. Bipolar Disord. 2009;11:178–189. doi: 10.1111/j.1399-5618.2009.00669.x.
  41. Rudebeck PH, Murray EA. Amygdala and orbitofrontal cortex lesions differentially influence choices during object reversal learning. J Neurosci. 2008;28:8338–8343. doi: 10.1523/JNEUROSCI.2272-08.2008.
  42. Rushworth MF, Buckley MJ, Behrens TE, Walton ME, Bannerman DM. Functional organization of the medial frontal cortex. Curr Opin Neurobiol. 2007;17:220–227. doi: 10.1016/j.conb.2007.03.001.
  43. Schilman EA, Uylings HB, Galis-de Graaf Y, Joel D, Groenewegen HJ. The orbital cortex in rats topographically projects to central parts of the caudate-putamen complex. Neurosci Lett. 2008;432:40–45. doi: 10.1016/j.neulet.2007.12.024.
  44. Schoenbaum G, Nugent SL, Saddoris MP, Setlow B. Orbitofrontal lesions in rats impair reversal but not acquisition of go, no-go odor discriminations. Neuroreport. 2002;13:885–890. doi: 10.1097/00001756-200205070-00030.
  45. Sesack SR, Deutch AY, Roth RH, Bunney BS. Topographical organization of the efferent projections of the medial prefrontal cortex in the rat: an anterograde tract-tracing study with Phaseolus vulgaris leucoagglutinin. J Comp Neurol. 1989;290:213–242. doi: 10.1002/cne.902900205.
  46. Stalnaker TA, Cooch NK, Schoenbaum G. What the orbitofrontal cortex does not do. Nat Neurosci. 2015;18:620–627. doi: 10.1038/nn.3982.
  47. St Onge JR, Floresco SB. Prefrontal cortical contribution to risk-based decision making. Cereb Cortex. 2010;20:1816–1828. doi: 10.1093/cercor/bhp250.
  48. St Onge JR, Stopper CM, Zahm DS, Floresco SB. Separate prefrontal-subcortical circuits mediate different components of risk-based decision making. J Neurosci. 2012;32:2886–2899. doi: 10.1523/JNEUROSCI.5625-11.2012.
  49. Stopper CM, Green EB, Floresco SB. Selective involvement by the medial orbitofrontal cortex in biasing risky, but not impulsive, choice. Cereb Cortex. 2014;24:154–162. doi: 10.1093/cercor/bhs297.
  50. Sul JH, Kim H, Huh N, Lee D, Jung MW. Distinct roles of rodent orbitofrontal and medial prefrontal cortex in decision making. Neuron. 2010;66:449–460. doi: 10.1016/j.neuron.2010.03.033.
  51. Taylor Tavares JV, Clark L, Furey ML, Williams GB, Sahakian BJ, Drevets WC. Neural basis of abnormal response to negative feedback in unmedicated mood disorders. Neuroimage. 2008;42:1118–1126. doi: 10.1016/j.neuroimage.2008.05.049.
  52. Tsuchida A, Doll BB, Fellows LK. Beyond reversal: a critical role for human orbitofrontal cortex in flexible learning from probabilistic feedback. J Neurosci. 2010;30:16868–16875. doi: 10.1523/JNEUROSCI.1958-10.2010. [DOI] [PMC free article] [PubMed] [Google Scholar]
  53. Uylings HB, Groenewegen HJ, Kolb B. Do rats have a prefrontal cortex? Behav Brain Res. 2003;146:3–17. doi: 10.1016/j.bbr.2003.09.028. [DOI] [PubMed] [Google Scholar]
  54. van Duuren E, van der Plasse G, van der Blom R, Joosten RN, Mulder AB, Pennartz CM, Feenstra MG. Pharmacological manipulation of neuronal ensemble activity by reverse microdialysis in freely moving rats: a comparative study of the effects of tetrodotoxin, lidocaine, and muscimol. J Pharmacol Exp Ther. 2007;323:61–69. doi: 10.1124/jpet.107.124784. [DOI] [PubMed] [Google Scholar]
  55. van Duuren E, van der Plasse G, Lankelma J, Joosten RN, Feenstra MG, Pennartz CM. Single-cell and population coding of expected reward probability in the orbitofrontal cortex of the rat. J Neurosci. 2009;29:8965–8976. doi: 10.1523/JNEUROSCI.0005-09.2009. [DOI] [PMC free article] [PubMed] [Google Scholar]
  56. Walton ME, Behrens TE, Buckley MJ, Rudebeck PH, Rushworth MF. Separable learning systems in the macaque brain and the role of orbitofrontal cortex in contingent learning. Neuron. 2010;65:927–939. doi: 10.1016/j.neuron.2010.02.027. [DOI] [PMC free article] [PubMed] [Google Scholar]
  57. Waltz JA, Gold JM. Probabilistic reversal learning impairments in schizophrenia: further evidence of orbitofrontal dysfunction. Schizophr Res. 2007;93:296–303. doi: 10.1016/j.schres.2007.03.010. [DOI] [PMC free article] [PubMed] [Google Scholar]
  58. Whitton AE, Treadway MT, Pizzagalli DA. Reward processing dysfunction in major depression, bipolar disorder and schizophrenia. Curr Opin Psychiatry. 2015;28:7–12. doi: 10.1097/YCO.0000000000000122. [DOI] [PMC free article] [PubMed] [Google Scholar]
  59. Williams ZM, Bush G, Rauch SL, Cosgrove GR, Eskandar EN. Human anterior cingulate neurons and the integration of monetary reward with motor responses. Nat Neurosci. 2004;7:1370–1375. doi: 10.1038/nn1354. [DOI] [PubMed] [Google Scholar]
  60. Wise SP. Forward frontal fields: phylogeny and fundamental function. Trends Neurosci. 2008;31:599–608. doi: 10.1016/j.tins.2008.08.008. [DOI] [PMC free article] [PubMed] [Google Scholar]
  61. Young JJ, Shapiro ML. Double dissociation and hierarchical organization of strategy switches and reversals in the rat PFC. Behav Neurosci. 2009;123:1028–1035. doi: 10.1037/a0016822. [DOI] [PMC free article] [PubMed] [Google Scholar]

Articles from The Journal of Neuroscience are provided here courtesy of Society for Neuroscience
