Abstract
Cognitive control is subjectively costly, suggesting that engagement is modulated in relationship to incentive state. Dopamine appears to play key roles. In particular, dopamine may mediate cognitive effort by two broad classes of functions: 1) modulating the functional parameters of working memory circuits subserving effortful cognition, and 2) mediating value-learning and decision-making about effortful cognitive action. Here we tie together these two lines of research, proposing how dopamine serves “double duty”, translating incentive information into cognitive motivation.
Introduction
Why is thinking effortful? Unlike physical exertion, there is no readily apparent metabolic cost (relative to “rest,” which is already metabolically expensive) (Raichle and Mintun, 2006). And yet, we avoid engaging in demanding activities even when doing so might further valuable goals. This appears particularly true when goal pursuit requires extended allocation of working memory for cognitive control. One hypothesis is that cognitive effort avoidance is intended to minimize opportunity costs incurred by the allocation of working memory (Kurzban et al., 2013). If this is true, it suggests not only that working memory is allocated opportunistically, but also that allocation policies entail sophisticated cost-benefit decision-making that is sensitive to as yet unknown cost and incentive functions. In any case, the phenomenon raises a number of questions: How do brains track effort costs? What information is being tracked? How can incentives overcome such costs? What mechanisms mediate adaptive working memory allocation?
Working memory capacity is sharply limited, especially in the domain of cognitive control, involving abstract, flexible, hierarchical rules for behavior selection. Optimizing working memory allocation is thus critical for optimizing behavior. Prevalent computational frameworks have proposed reward- or expectancy-maximization algorithms for working memory allocation (Botvinick et al., 2001; Donoso et al., 2014; O’Reilly and Frank, 2006). Yet, these frameworks largely neglect that working memory allocation itself carries affective valence. High subjective costs drive disengagement, whereas sufficient incentive drives engagement. That is, allocation of working memory is a motivated process. In this review, we argue that modulatory functions of the midbrain dopamine (DA) system translate cost-benefit information into adaptive working memory allocation.
DA has been implicated in numerous processes including, but not limited to, motivation, learning, working memory, and decision-making. Two largely independent literatures ascribe disparate functional roles to DA with relevance to motivated cognition. First, DA influences the allocation of working memory directly, by modulating the functional parameters of working memory circuits. For example, DA tone in the prefrontal cortex (PFC) influences the stability of working memory representations, with higher extrasynaptic tone promoting greater stability, to a limit (Seamans and Yang, 2004). Phasic DA efflux may also push beyond the limit and toggle the PFC into a labile state such that working memory representations can be flexibly updated (Braver et al., 1999). Additionally, DA may support the learning of more sophisticated (and hierarchical) allocation policies via synaptic depression and potentiation in corticostriatal loops (Frank et al., 2001; O’Reilly and Frank, 2006). Second, DA is critical for action selection. Specifically, DA trains value functions for action selection via phasic reward prediction error dynamics potentiating behaviors that maximize reward with respect to effort in a given context (see (Niv, 2009) for a review). DA tone in the striatum and the medial PFC also promotes preparatory and instrumental behaviors in response to conditioned stimuli, and particularly effortful behavior (Kurniawan et al., 2011; Salamone and Correa, 2012).
Here, we tie together these largely independent lines of research by proposing how the very same functional properties of DA that encode incentive information also translate incentives into cognitive motivation by regulating working memory. Specifically, we propose that DA dynamics encoding incentive state promote subjectively costly working memory operations experienced as conscious, phenomenal effort. As we detail below, our proposal makes use of the concept of a “control episode” during goal pursuit (cf. “attentional episodes” (Duncan, 2013)), involving stable maintenance of the goal state at higher levels of the control hierarchy, along with selective updating of lower-level rules for guiding behavior during completion of sub-goals, as progress is made toward the ultimate goal state. We review the ways in which DA dynamics encoding the net cost-benefit of goal engagement and persistence result in adaptive working memory allocation. As such, DA translates incentive motivation into cognitive effort.
Motivated cognition
Why cognitive effort matters
Cognitive effort is an everyday experience. The subjective costliness of cognitive effort is consequential, sometimes driving disengagement from otherwise highly valuable goals. Yet, surprisingly little is known about this phenomenon. It is neither clear what makes tasks effortful, nor why task engagement is apparently aversive in the first place (Inzlicht et al., 2014; Kurzban et al., 2013).
Beyond a quizzical influence over goal-directed behavior, there are numerous reasons to care about cognitive effort. First, expenditure is critical for career and educational success, economic decision-making, and attitude formation (Cacioppo et al., 1996; Von Stumm et al., 2011). Second, deficient effort may be a significant component of neuropsychiatric disorders for which avolition, anhedonia, and inattention feature prominently, such as ADHD (Volkow et al., 2010), depression (Hammar et al., 2011), and schizophrenia (Strauss et al., 2015). Effort avoidance may also contribute to declining cognitive performance in healthy aging (Hess and Ennis, 2011; Westbrook et al., 2013). Engagement with certain kinds of cognitive tasks appears negatively valenced, indicating a subjective cost. Subjectively inflated effort costs might undermine cognitive engagement and thereby performance.
Control-demanding tasks are valenced
Not all tasks are effortful. Tasks requiring allocation of working memory for cognitive control, however, appear to be (Botvinick et al., 2009; Dixon and Christoff, 2012; Dreisbach and Fischer, 2012; Kool et al., 2010; Massar et al., 2015; McGuire and Botvinick, 2010; Schouppe et al., 2014; Westbrook et al., 2013). Individuals allowed to select freely between tasks differing only in the frequency with which working memory must be re-allocated for cognitive control express a progressive preference for the option with lower reallocation demands (Kool et al., 2010; McGuire and Botvinick, 2010). Critically, even when offered larger rewards, decision-makers discount rewards as a function of effort costs, thus selecting smaller rewards with lower demands over larger rewards with higher demands (Massar et al., 2015; Westbrook et al., 2013).
Under what conditions might cognitively demanding tasks acquire affective valence? By one account, tasks demanding cognitive control involve response conflict (Botvinick et al., 2001) or frequent errors (Brown and Braver, 2005; Holroyd and Coles, 2002), and as such are less likely to be successful, thus engendering avoidance learning to bias behavior towards tasks with higher chances of success (Botvinick, 2007). Multiple lines of evidence suggest that conflict is aversive. First, conflict in the context of a Stroop task predicts overt avoidance (Schouppe et al., 2012). Also, trial-wise variation in subjective frustration with a stop-signal task predicts BOLD signal in the anterior cingulate cortex (ACC), otherwise implicated in conflict detection (Spunt et al., 2012). In another study (McGuire and Botvinick, 2010), participant ratings of their desire to avoid a conflict-inducing task correlated positively with individual differences in recruitment of ACC and also dorsolateral PFC, putatively involved in working memory maintenance of task sets. Moreover, the dorsolateral PFC correlation remained after controlling for performance differences (RTs and error rates), indicating that the desire to avoid the task did not simply reflect perceived failure. Finally, interesting interactions between affect and cognitive control also support the notion that conflict is aversive (Dreisbach and Goschke, 2004; Saunders and Inzlicht, 2015; Shackman et al., 2011). For example, individuals respond faster to affectively negative, and slower to affectively positive stimuli, following priming by conflicting versus non-conflicting Stroop trials (Dreisbach and Fischer, 2012).
Avoidance learning to minimize loss may partly explain aversion to working memory allocation for cognitive control. Yet, it cannot be the full story. On the one hand, individuals avoid cognitive demand, even controlling for reward likelihood (Kool et al., 2010; McGuire and Botvinick, 2010; Westbrook et al., 2013). On the other, opportunity costs may reflect more than just the likelihood of failure during the current control episode; namely, they may reflect the value of missed opportunities (Kurzban et al., 2013). Finally, an adaptive system must also be judicious, as avoidance of all goals requiring cognitive control is clearly maladaptive. Decision-making must consider both costs and benefits. Indeed, there is growing evidence that the ACC is as important for biasing engagement with effortful, control-demanding tasks as it is for biasing avoidance (Shenhav et al., 2013).
Incentives motivate cognitive control
If control is avoided because of subjective costs, increased incentives could offset costs, promoting control. Indeed, incentives yield control-mediated performance enhancements; see (Botvinick and Braver, 2015; Pessoa and Engelmann, 2010) for review. Incentives enhance performance in control-demanding tasks encompassing visuospatial attention (Krebs et al., 2012; Small, 2005), task-switching (Aarts et al., 2010), working memory (Jimura et al., 2010), and context maintenance (Chiew and Braver, 2014; Locke and Braver, 2008), among others. Furthermore, incentives predict greater activity in control-related regions, including medial and lateral PFC. For example, incentives yield increased BOLD signal in the ACC, propagating to dorsolateral PFC, corresponding well with the canonical model by which the ACC monitors for control demands and recruits lateral PFC to implement control (Kouneiher et al., 2009). This particular study showed that incentives yielded an additive increase in BOLD signal, on top of demand-driven control signals. However, more recent work has shown that incentive effects are not merely additive but interactive, with incentive-related activity increasing under high task-demand conditions, thus more directly implicating incentives in the enhancement of cognitive control (Bahlmann et al., 2015), cf. (Krebs et al., 2012). Beyond mean activity, incentives also enhance the fidelity of working memory representations. Task set representations are more distinctive, as revealed by multivariate pattern analysis of BOLD data, during incentivized working memory trials (Etzel et al., 2015). Interestingly, increased distinctiveness predicts individual differences in incentive-driven behavioral enhancement.
Incentives not only drive more control-related activity, or higher fidelity task set representations, but they also affect the selection of more costly control strategies. For example, cognitive control may be recruited proactively, in advance of imperative events, or reactively, concurrent with event onset (Braver, 2012). Proactive control has behavioral advantages, but also incurs opportunity costs that bias reliance on reactive control. Incentives appear to offset costs, increasing proactive relative to reactive control, as reflected in sustained increases in BOLD signal prior to imperative events, and attenuated phasic responses at event onsets, and this shift to proactive control predicts performance enhancements, e.g. (Jimura et al., 2010). Moreover, incentive-driven shifts to proactive control are larger among highly reward-sensitive individuals (Jimura et al., 2010).
In sum, working memory operations are treated as subjectively costly. Whether apparent costliness reflects avoidance learning of behaviors with low likelihood of success, or opportunity costs, incentives can counterbalance costs, promoting working memory operations. Cost-benefit decision-making thus underlies working memory allocation for cognitive control. We propose that during goal pursuit, individuals engage in costly “control episodes”, remaining engaged to the extent that benefits outweigh costs. Moreover, we propose that DA solves a core computational problem of control episodes: namely, value-based modulation of stability and targeted flexibility of working memory for cognitive control that reflects not only prior reward learning, but also instantaneous effects of current incentive state.
To illustrate, we consider an example control episode involving the demanding task of finding the product of two two-digit numbers, incentivized by points on an examination (without calculators; Figure 1). Control episodes may be initiated by incentive-driven (point-value cued) allocation of working memory to represent the goal state (finding the product). Throughout an episode, the actor must maintain high-level goal information (e.g. the original numbers), resisting interference from distractors, while flexibly updating targeted, lower-level representations of sub-goals in a hierarchical fashion. Sub-goals in our example include: a) multiplying the ones column digits; b) carrying the tens-digit value of that product; c) adding that value to the product of the tens-digits, etc. Maintaining each sub-goal is subjectively costly and thus the stability of goal representations should reflect the value of those goals. Similarly, updating operations, as required when sub-goals are completed, are also subjectively costly. As each stage has its own costs, and costs may accumulate in excess of perceived benefits, any stage may result in disengagement. We consider the mental multiplication example for illustrative purposes only; the general notion of a control episode should apply broadly to any hierarchically structured, temporally extended sequence of goal-directed behaviors that require working memory allocation (e.g., planning, problem-solving, and reasoning).
Figure 1.
Incentive state dynamics in a control episode (exemplified by succession through mental multiplication task operations). Here, points incentivize initial engagement. Costs (red line) mount with time-on-task and increasing maintenance and updating demands. Actors persist while the net incentive value of engagement (black line) remains positive, which occurs when costs are offset by incremental progress (e.g. at sub-goal completion) and other incentives (green line). If the net incentive value goes negative, actors are prone to disengagement.
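The cost-benefit logic of a control episode can be made concrete with a toy simulation. The sketch below is purely illustrative: the stage names follow the mental multiplication example, but the `Stage` class, the numeric costs and bonuses, and the simple rule "disengage as soon as net value drops below zero" are all hypothetical simplifications rather than a fitted or published model.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    effort_cost: float     # hypothetical cost of maintenance/updating at this stage
    progress_bonus: float  # hypothetical incentive gained when the sub-goal completes


def run_control_episode(stages, initial_incentive):
    """Step through sub-goals; stop as soon as net incentive value goes negative.

    Returns the names of completed stages and the net-value trace.
    """
    net_value = initial_incentive
    completed, trace = [], []
    for stage in stages:
        net_value -= stage.effort_cost       # costs mount with each operation
        if net_value < 0:                     # net value negative: disengage
            trace.append(net_value)
            break
        net_value += stage.progress_bonus     # progress partially offsets costs
        completed.append(stage.name)
        trace.append(net_value)
    return completed, trace


stages = [
    Stage("multiply ones digits", effort_cost=2.0, progress_bonus=1.0),
    Stage("carry tens digit", effort_cost=1.5, progress_bonus=0.5),
    Stage("add to tens product", effort_cost=2.5, progress_bonus=1.0),
]

high_done, high_trace = run_control_episode(stages, initial_incentive=5.0)
low_done, low_trace = run_control_episode(stages, initial_incentive=2.5)
print(high_done, high_trace)
print(low_done, low_trace)
```

With these made-up numbers, the high-incentive episode runs to completion, whereas the low-incentive episode is abandoned at the third sub-goal, when accumulated costs exceed the remaining net value.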
In the sections that follow, we describe how DA mediates value-based working memory management during control episodes. Figure 2 provides an overview of critical functions that will be reviewed. Tonic DA, for example, influences the stability of working memory contents by direct action in PFC (Figure 2B), while phasic DA efflux in the striatum trains policies for value-based updating of working memory contents that reflect both the reward value of the goals to which they correspond and effort (updating and maintenance) costs (Figure 2C). While cached value-functions reflect past experience, their implementation is subject to instantaneous modulation by incentive state. Accordingly, we describe how DA and its projection targets encode net incentive state, dynamically accounting for goal state re-valuation and generalized motivation. Such information is used to bias policies for working memory allocation actions (Figure 2D). Hence DA does double duty in translating incentive information into cognitive effort both by functional modulation of working memory circuits (Figure 2B and 2C) and by influencing value-learning and decision-making about effortful action (Figure 2C and 2D). We take up each of these key duties in turn.
Figure 2.
Double duty for DA in cognitive effort includes: 1) modulating the functionality of working memory circuits including maintenance stability and specific flexibility for updating working memory contents (yellow), and 2) developing and biasing value-based policies for working memory allocation (blue). Clockwise from upper left: A) Key anatomical loci of DA circuitry regulating control episodes; B) Tonic DA promotes stable and robust working memory maintenance via PFC modulation; C) Phasic DA release encoding effort-discounted reward trains allocation policies in striatum and ACC; D) Phasic DA release and ramping tone in the striatum bias action selection towards costly working memory updating in the lateral PFC, by potentiating updating generally, and updating in accordance with PFC-based action policy signals, in particular. Top-down policy signals reflect hierarchically higher-level goals and thus favor gating of contextually appropriate sub-goals into working memory. Insets are described in subsequent figures.
DA and Working Memory Management
Successful control episodes demand stable maintenance and also targeted, flexible updating of working memory, with DA appearing to play an important role in both processes. In the PFC, DA influences the stability of recurrent networks (Brunel and Wang, 2001; Seamans and Yang, 2004) and, thereby, the stability of short-term configurations that constitute control-related working memory representations (Cools and D’Esposito, 2011; Robbins and Arnsten, 2009). In the striatum, DA trains gating policies that come to determine the kinds of information that becomes represented in the PFC, and the stimulus signals that drive updating of specific PFC sub-regions (Frank et al., 2001; O’Reilly and Frank, 2006). Thereby, DA plays key roles in initiating and sustaining control episodes by functionally promoting both working memory stability and targeted flexibility.
Promoting stability of higher-order goal states
Working memory representations in the PFC (Miller and Cohen, 2001) (though see (Riggall and Postle, 2012)) are instantiated as temporarily stable, recurrent cortical pyramidal networks (Brunel and Wang, 2001). Extracellular DA maintains recurrent dynamics by increasing excitatory NMDA drive, and also pruning firing external to such networks by exciting inhibitory GABA interneurons (Berridge and Arnsten, 2013; Cools and D’Esposito, 2011; Seamans and Yang, 2004). The net effect of increasing DA (to a point) is to increase network-specific recurrent firing rates (Figure 3) and thus signal-to-noise ratio of working memory representations (Brunel and Wang, 2001). For example, DA D1 receptor agonism sharpens spatial tuning in task-relevant PFC neurons in monkeys performing a spatial working memory task (Vijayraghavan et al., 2007).
Figure 3.
Increased PFC DA tone (upper line) boosts firing in task-selective neurons in recurrent networks during working memory maintenance (e.g., delayed match-to-sample), relative to a baseline (control) dopaminergic state (lower line). In this computational simulation of neural dynamics, DA-linked increase in NMDA and GABA currents boosts persistent, recurrent firing, enhancing the stability and distractor resistance of task-relevant working memory representations. (x-axis units are arbitrary time; Brunel & Wang 2001)
Importantly, PFC DA changes dynamically, precisely when needed, to promote working memory maintenance. Salient, cognitive task-relevant events have been shown to drive mesocortical DA neuron firing that can increase extrasynaptic DA concentration in the PFC (Figure 2B), reviewed in (Bromberg-Martin et al., 2010; Phillips et al., 2008). In humans, BOLD dynamics in the ventral tegmental area (VTA) support the hypothesis that DA neurons respond to cognitive task demands, independently of reward, e.g. (Boehler et al., 2011), as well as the interaction of reward and task complexity (Krebs et al., 2012). The effect of this VTA activation may be to promote maintenance of task sets in lateral PFC regions, e.g. in those demonstrated (by reversible TMS lesion) to be critical for supporting rule-guided behavior (D'Ardenne et al., 2012). More directly, a PET study has revealed decreased D2 receptor binding potential in ventrolateral PFC, indicating increased DA tone, in humans performing a verbal working memory task, relative to a simpler sustained attention task (Aalto et al., 2005) (Figure 4A).
Figure 4.
A) Decreased binding potential of D2 receptors in ventrolateral PFC indicates increased DA tone during a verbal working memory (2-back) task relative to a less demanding sustained attention (0-back) task. (Aalto et al. 2005). B) In a high-incentive context (orange; R+), sustained activity is enhanced in right lateral PFC during a working memory (Sternberg) task, relative to a low incentive context (blue; R-). (Jimura et al. 2010).
Incentive cues also drive PFC DA release, reviewed in (Bromberg-Martin et al., 2010; Phillips et al., 2008). To the extent that incentive-related DA promotes robust maintenance, such effects help explain motivational enhancements of memory- and rule-guided behavior. It could, for example, explain why incentives predict stronger proactive, maintenance-related BOLD signal in the lateral PFC during a Sternberg-type working memory task that mediates better performance (Jimura et al., 2010) (Figure 4B). It could also explain performance enhancements following pharmacological COMT inhibition (boosting PFC DA tone, in particular) in an exploration/exploitation task, which requires the tracking of multiple value signals in working memory (Kayser et al., 2014).
Conversely, while increasing DA promotes maintenance, flexible shifting may require decreased DA. In one study, set-shifting performance was modulated in humans dosed with l-dopa, and fMRI evidence localized these effects to the PFC (Shiner et al., 2015). Specifically, when participants were dosed with l-dopa, the difference between better performance on incentivized trials and worse performance on non-incentivized trials was removed. Critically, this mirrored the attenuation of the BOLD signal deactivation in the ventromedial PFC that is typical on incentivized versus non-incentivized trials. The result was interpreted as evidence that set maintenance was under dopaminergic control, and that this control must be transiently removed to shift task sets.
Too much extrasynaptic DA, on the other hand, may destabilize working memory representations (Berridge and Arnsten, 2013; Cools and D’Esposito, 2011; Seamans and Yang, 2004). One potential mechanism of supraoptimal DA effects is increasing stimulation of relatively low-affinity DA D2 receptors (Durstewitz and Seamans, 2008). This D2 stimulation leads to decreased GABA and NMDA currents, thus counteracting D1 activation effects. Blocking D2 action, therefore, could enhance PFC representations. In a recent demonstration, DA D2 receptor blockade by amisulpride, relative to placebo, enhanced PFC representations as indexed by sharper multivariate pattern discrimination of PFC BOLD data between incentive conditions during an incentive learning task (Kahnt et al., 2015).
According to one proposal, task-based DA release yielding supra-optimal DA may provide a local task-switching mechanism in the PFC (Braver et al., 1999). Specifically, DA release may toggle PFC lability, by pushing DA tone from optimal to supraoptimal levels, increasing the likelihood of context updating during task performance. However, as noted, this kind of updating would have diffuse influence and lacks the temporal and spatial specificity required for targeted updating of, for example, a subcomponent of a task-set hierarchy (O’Reilly and Frank, 2006). Even if phasic DA in the PFC does not support selective updating, it may still serve as a general updating or disengagement signal.
We close this section by noting that while increasing incentive can drive higher PFC DA tone, a recent study has shown conflicting results. Notably, the investigators found higher PFC DA release in anticipation of less subjectively valued outcomes in monkeys (Kodama et al., 2014). The authors interpreted this unexpected result as reflecting stress-driven DA release observed in other studies, e.g. (Butts et al., 2011). Thus incentive can promote PFC DA tone, but stress may be another affective determinant. In any case, there is a growing consensus that affective stimuli influence PFC DA tone which, in turn, modulates the stability of recurrent networks and, thereby, the contents of working memory.
Promoting targeted, flexible updating of task sets
The need for both stability and flexibility of working memory during control episodes creates opposing demands that DA acting in the PFC alone cannot resolve. Indeed, DA-mediated increases in stability undermine flexibility, as reflected in higher task-switch costs (Herd et al., 2014; van Schouwenburg et al., 2010). There is evidence, however, that DA can increase cognitive flexibility via D2 signaling in the ventral striatum (VS) (Aarts et al., 2010; Samanez-Larkin et al., 2013; Shiner et al., 2015; van Holstein et al., 2011). Incentives can enhance task switching, and this effect is stronger among individuals with a variant of the DA transporter gene DAT1 predicting lower transporter density, and therefore higher synaptic and extrasynaptic DA tone, particularly in the striatum (Aarts et al., 2010). This result supports the hypothesis that striatal DA release mediates incentive enhancement of cognitive flexibility. Evidence of D2 receptor involvement comes from a study comparing the effects of the DA agonist bromocriptine and the DA D2-selective antagonist sulpiride on task-switching (van Holstein et al., 2011). Critically, those individuals with the DAT1 variant coding for higher dopamine transporter density (lower striatal DA tone) showed reduced switch costs after being dosed with bromocriptine, and this improvement was blocked by sulpiride.
Successful control episodes require not simply generalized increases in flexibility, but targeted, context-specific updating. In the mental arithmetic example, it is critical to maintain a representation of the full problem (13 × 26), while updating specific subgoals as they are completed (e.g., shifting to multiply 10 × 6, after storing the 3 × 6 result). While DA in the PFC lacks the temporal or spatial specificity to support targeted updating, DA can effect specific updating via the basal ganglia (Frank et al., 2001; O’Reilly and Frank, 2006) (Figure 2C). A well-supported model holds that phasic DA release in the dorsal striatum (DS) trains the synapses of “Go” cells (D1-expressing medium spiny neurons) through LTP, increasing the likelihood of contextual information being gated to the PFC. DA dips, on the other hand, are proposed to train the synapses of “No-Go” cells (D2-expressing medium spiny neurons) through LTD, decreasing the likelihood of context gating (Frank et al., 2001). When stimuli evoke activity in relatively more Go than No-Go cells, information is gated (by transient removal of tonic inhibition of the thalamus) for representation in the PFC. Thus, by training striatal synapses to reflect reward history, phasic DA dynamics generate cached policies governing context-specific updating of working memory.
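The Go/No-Go gating logic can be illustrated with a deliberately minimal sketch. It is not the published basal ganglia model: it abstracts away the direction of synaptic plasticity and simply tracks a net "Go" and a net "No-Go" tendency for one stimulus context, with positive prediction errors (standing in for DA bursts) pushing toward Go and negative ones (DA dips) pushing toward No-Go. The function names, starting weights, learning rate, and outcome values are all hypothetical.

```python
def train_gating(outcome_when_gated, n_trials=50, lr=0.1):
    """Learn net Go / No-Go tendencies for one context from repeated gating.

    outcome_when_gated: hypothetical payoff obtained when this context's
    information is gated into working memory (+1 helpful, -1 harmful).
    """
    go, nogo, expected = 0.5, 0.5, 0.0
    for _ in range(n_trials):
        rpe = outcome_when_gated - expected  # prediction error (burst or dip)
        expected += lr * rpe
        if rpe > 0:
            go += lr * rpe                   # bursts favor gating this context
        elif rpe < 0:
            nogo += lr * (-rpe)              # dips favor withholding gating
    return go, nogo


def should_gate(go, nogo):
    """Gate information to PFC only when Go outweighs No-Go."""
    return go > nogo


relevant = train_gating(outcome_when_gated=1.0)     # task-relevant sub-goal cue
distractor = train_gating(outcome_when_gated=-1.0)  # interfering stimulus
print(should_gate(*relevant), should_gate(*distractor))
```

After training, the task-relevant context is gated and the distractor is not, which is the sense in which reward history yields a cached, context-specific updating policy.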
The gating model has been extended to support the hierarchical structure of control episodes (Chatham and Badre, 2015). Corticostriatal loops may support DA-mediated hierarchical reinforcement learning, in which content is selected for updating at different levels of a hierarchy (Badre and Frank, 2012; Frank and Badre, 2012). Reciprocal connections allow the BG to not only direct which information gets gated into working memory, but also for higher-level PFC representations of context to direct what lower-level representations get out-gated, when they are no longer useful (Chatham et al., 2014). Thus, higher-level representations may interact in a top-down manner with bottom-up gating mechanisms to adaptively target content at a hierarchically lower level.
Successful control episodes are enabled by both: a) DA-trained cached, value-based gating policies in cortico-striatal circuits that bias adaptive updating in hierarchical environments, and b) DA-mediated stability (in the PFC) and flexibility (in the striatum) of working memory as a function of incentive information. Thus, DA appears to translate incentives into cognitive motivation by direct modulation of the cortico-striatal working memory network supporting control episodes.
Cost-Benefit Decision-Making
Control episodes are treated as subjectively costly. Behavioral evidence suggests cost-benefit decision-making, balancing the value of the desirable goal against an underlying cost function (Dixon and Christoff, 2012; Kool et al., 2010; Massar et al., 2015; Westbrook et al., 2013), and DA likely plays a key role. Indeed, DA has long been implicated not only in working memory and motivation, but also in both value-learning and decision-making; specifically, training functions mapping value to external states along with cognitive and motor actions (Li and Daw, 2011; Wickens et al., 2007), including temporally extended action sequences (Holroyd and Yeung, 2012; O’Reilly et al., 2014). Incentive salience models propose that DA may further bias action selection at the time of choice by modulating value signals, e.g., as a function of motivational state (McClure et al., 2003; Zhang et al., 2009). So, for example, DA could mediate the decision to engage in a temporally extended sequence of cognitive actions required for multiplication of two-digit numbers, as well as to execute all sub-goals in sequence, as a function of the point value on an examination. In contrast, if the task were not incentivized, or if a lower-effort strategy were available (e.g., using a calculator), the decision process may instead resolve against control episode engagement. In the next sections, we review evidence for the role of DA in training value functions and also instantaneously biasing the selection of, and persistence with, costly cognitive actions.
DA and Action Policy Learning
Reward-Prediction Errors
A rich literature implicates the firing of midbrain DA cells in encoding the momentary difference between expected and actual reward (Schultz et al., 1997). The remarkable functional similarity between these reward prediction errors (RPEs) and temporal difference values in computational reinforcement learning (RL) has led to the hypothesis that phasic DA dynamics train the system to bias behaviors that increase context-based reinforcement likelihood (Montague et al., 1996). Mechanistically, DA does so by potentiating synapses linking representations of the current state to specific behaviors (Wickens et al., 2007). Synaptic weights acquired through this process can be thought of as value functions, in the sense of stronger weights biasing actions that maximize the likelihood of reward (i.e., those actions with greatest expected value). This extends to cognitive actions – indeed it is precisely these phasic DA RPEs that are thought to train working memory gating policies described in the previous section (Frank et al., 2001) (Figure 2C).
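The correspondence between RPEs and temporal difference errors can be sketched in a few lines. The example below runs tabular TD(0) learning on a three-state chain (cue, delay, outcome); the state names, learning rate, discount factor, and reward magnitude are illustrative assumptions and are not taken from any cited study.

```python
def td_learning(n_episodes=500, alpha=0.1, gamma=1.0):
    """Tabular TD(0) on a fixed chain: cue -> delay -> outcome -> end."""
    states = ["cue", "delay", "outcome"]
    successors = states[1:] + ["end"]
    reward_on_leaving = {"cue": 0.0, "delay": 0.0, "outcome": 1.0}
    value = {s: 0.0 for s in states + ["end"]}

    first_rpes, last_rpes = None, None
    for episode in range(n_episodes):
        rpes = []
        for s, s_next in zip(states, successors):
            # Temporal difference error: the model's analogue of a phasic RPE.
            delta = reward_on_leaving[s] + gamma * value[s_next] - value[s]
            value[s] += alpha * delta
            rpes.append(delta)
        if episode == 0:
            first_rpes = rpes
        last_rpes = rpes
    return value, first_rpes, last_rpes


value, first_rpes, last_rpes = td_learning()
print(first_rpes)  # error is confined to the unexpected outcome
print(last_rpes)   # error vanishes once the outcome is fully predicted
print(value)
```

On the first episode the only prediction error occurs at the unexpected reward; with experience, value propagates back to the cue and the error at the now-predicted reward shrinks toward zero.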
Critically, the functional capacity of RPE signals extends beyond simple stimulus-response pairings, to action-outcome association learning in the PFC (Glascher et al., 2008). From an action selection standpoint, this is enormously powerful. Foremost, action-outcome associations are necessary for calculating net incentive value: the expected benefits of outcomes less the cost of actions. Moreover, action-outcome associations can not only support selecting the most highly rewarded action in a given state, they also enable “looking forward”: selecting actions based upon an internal model of the environment, its states and action-contingent state transitions. An agent acting in a “model-based” fashion may select actions that also take into account its state motivation for particular outcomes (Daw et al., 2011; Glascher et al., 2008). Indeed, sensitivity to outcome devaluation (e.g. devaluation by selective satiation) is used as the benchmark of model-based decision-making (Dolan and Dayan, 2013).
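A small worked example may help to show why action-outcome associations matter for net incentive value and for sensitivity to devaluation. In the sketch below, the action names, transition probabilities, outcome values, and effort costs are all hypothetical; the only idea taken from the text is that net incentive value equals the expected benefit of outcomes less the cost of the action, evaluated with the current value of each outcome.

```python
def net_incentive_value(action, transitions, outcome_values, action_costs):
    """Expected benefit of an action's outcomes minus the action's cost."""
    expected_benefit = sum(
        prob * outcome_values[outcome]
        for outcome, prob in transitions[action].items()
    )
    return expected_benefit - action_costs[action]


def choose_action(transitions, outcome_values, action_costs):
    """Pick the action with the highest net incentive value."""
    return max(
        transitions,
        key=lambda a: net_incentive_value(a, transitions, outcome_values, action_costs),
    )


# Hypothetical internal model: action -> {outcome: probability}
transitions = {
    "mental_multiplication": {"points": 0.9, "nothing": 0.1},
    "quick_guess": {"points": 0.1, "nothing": 0.9},
}
action_costs = {"mental_multiplication": 3.0, "quick_guess": 0.2}

valued = {"points": 10.0, "nothing": 0.0}
devalued = {"points": 2.0, "nothing": 0.0}  # e.g., the points no longer matter much

choice_valued = choose_action(transitions, valued, action_costs)
choice_devalued = choose_action(transitions, devalued, action_costs)
print(choice_valued, choice_devalued)
```

Because the outcome value is consulted at the time of choice, devaluing the outcome immediately shifts the choice away from the effortful action without any relearning, which is the signature of model-based control that devaluation tests probe.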
There is evidence that RPEs can reflect internal models of actions and subsequent states (Hiroyuki, 2014). Hence value functions may be learned for allocating working memory, if doing so implements a mental state that increases the probability of reward, given subsequent actions (Chatham and Badre, 2013; Dayan, 2012). Thus, RPEs may train value functions governing working memory allocation. Evidence of value-based working memory allocation comes from an fMRI study of humans selecting among task sets manipulated to have variable utility (Chatham and Badre, 2013). The expected value of task sets was varied systematically over trials, and an RL model of choice behavior was used to predict trial-wise subjective values of task sets. Subjective value estimates predicted BOLD dynamics in a fronto-striatal network, supporting the view that task set values are tracked according to a value-updating algorithm that is likely mediated by phasic DA RPE signals. It is worth noting here that, although PFC DA is thought to have slow clearance, which would preclude the temporal resolution and specificity required for precise DA-based training per se, there is reason to believe that co-release of glutamate from DA cells innervating the PFC could provide the mechanism for synaptic learning effects (Seamans and Yang, 2004). DA cells may thus direct learning, whether by the functional consequences of glutamate in the PFC or DA in the striatum. As discussed above, however, DA release in the PFC may have further consequences in the PFC in terms of promoting working memory stability, by modulating the dynamics of recurrently firing networks of pyramidal cells.
The functionality of RPE signals may also extend to hierarchical RL, whereby value functions describe action sequences rather than individual actions (Frank and Badre, 2012; O’Reilly et al., 2014; Ribas-Fernandes et al., 2011). Selection across sequences is critical for overcoming individually costly actions that are only justifiable given the value of desirable outcomes at sequence conclusion (Holroyd and Yeung, 2012). In the mental arithmetic example, updating working memory with a ones-digit multiplication sub-goal is costly, but may be justifiable with regard to the progress it yields towards the ultimate, valuable goal of solving the two-digit multiplication problem. Importantly, knowledge of task hierarchy enables agents to bias such costly actions. Pseudo-RPEs (based on perceived progress rather than external reward) may train value functions regarding action sequences (Ribas-Fernandes et al., 2011). Thus RPEs may train progress-based value functions for sequences of effortful working memory updating and maintenance.
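The point that an individually costly step can be justified at the level of the sequence can be illustrated with a small sketch. It assumes that each step carries a fixed effort cost and that reward arrives only when the sequence is completed; the cost and reward numbers are hypothetical and are not estimates from any cited study.

```python
def sequence_value(step_costs, goal_reward, gamma=0.95):
    """Discounted net value of a sequence: a cost at each step, reward at the end."""
    value = 0.0
    for t, cost in enumerate(step_costs):
        value -= (gamma ** t) * cost
    # The goal reward is received after the final step has been taken.
    value += (gamma ** len(step_costs)) * goal_reward
    return value

# Hypothetical mental arithmetic episode: three costly sub-goals, one valued solution.
step_costs = [1.0, 1.0, 1.0]
first_step_alone = -step_costs[0]  # evaluated in isolation, the first sub-goal is net-negative
whole_sequence = sequence_value(step_costs, goal_reward=10.0, gamma=1.0)
```

An agent evaluating only the first sub-goal would decline it, whereas an agent evaluating the whole sequence would engage, which is the contrast the hierarchical account relies on.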
As we have just reviewed, DA-mediated RL appears to train value functions with numerous properties supporting successful control episodes. Namely, RPEs can train value functions based on action-outcome associations, supporting model-based prospection, and reflecting action sequences. Such value functions may thus promote action in hierarchically structured environments where individually costly actions, like working memory allocation, are justified inasmuch as they yield progress towards a goal that is more valuable than the sequence is costly. As we elaborate next, value functions within the ACC, in particular, appear critical for biasing engagement and persistence with costly control episodes.
DA cell firing trains action-outcome associations in the ACC
The ACC and dopaminergic innervation of the ACC are critical for selecting effortful behavior (Kurniawan et al., 2011). In particular, RPE signals may train action-outcome associations in the ACC for prediction (Alexander and Brown, 2011; Donoso et al., 2014; Holroyd and Coles, 2002) and effort-based decision-making (Kennerley et al., 2011; Shenhav et al., 2013; Skvortsova et al., 2014). Action-outcome associations are necessary for cost-benefit computations. Unit recording studies in monkeys engaged in multi-attribute decision-making have uncovered ACC neurons multiplexing information about benefits and costs (including effort) in a unified value-coding scheme (Kennerley et al., 2009). This contrasts with the orbitofrontal cortex (OFC), also implicated in economic decision-making, which contains neurons encoding the value of multi-attribute outcomes, but not the cost of action to obtain such outcomes (Kennerley et al., 2011; Padoa-Schioppa, 2011). Consistent data showing multiplexed cost-benefit encoding in the ACC also comes from rodent studies (Cowen et al., 2012; Hillman and Bilkey, 2012).
There is also considerable evidence supporting cost-benefit encoding in the human ACC during effort anticipation and decision-making. In tasks utilizing advance reward and demand (effort) cues, the ACC is sensitive to the anticipation of both dimensions, in both forced- and free-choice trials (Croxson et al., 2009; Kroemer et al., 2014; Kurniawan et al., 2013; 2010; Massar et al., 2015; Prévost et al., 2010; Vassena et al., 2014). Moreover, the ACC has been repeatedly linked to the conscious experience of cognitive effort. In a striking demonstration, electrical stimulation of the human ACC reliably evoked the conscious experience of a forthcoming challenge and also a “will to persevere” through that challenge (Parvizi et al., 2013).
Tonic PFC DA strengthens cortical action policy signals
Multiplexed value information is used by the ACC to set action policies which can then be implemented via the basal ganglia, in competition with habitual biases against effortful engagement. In the domain of cognitive effort, the ACC has been proposed to subserve a specific computational function in selecting the identity of, and the intensity with which control signals are represented, as a function of the expected value of the associated outcome (Shenhav et al., 2013). In this context, DA in the ACC strengthens dynamics supporting representation and integration of action-outcome associations (as evidenced, e.g., by increasing power in gamma band oscillations (Steullet et al., 2014)), and may thereby increase the influence of ACC-based policy signals. Conversely, blocking DA diminishes the capacity of the ACC to bias the choice of greater effort for larger rewards (Schweimer and Hauber, 2006; Schweimer et al., 2005). Thus, in the mental arithmetic example, incentive-driven DA release in the ACC would promote cortical action policies related to the strategy of directly computing the solution to the two-digit multiplication problem, rather than following a prepotent bias to utilize a lower-effort strategy (i.e., guessing) or otherwise disengage.
DA and the ACC track progress to regulate persistence
Following initiation of a control episode, an actor must decide whether to persist. Opportunity costs rise with time-on-task, and so may the drive to disengage. As we propose, perceived progress implies increasing expected value, and thus may offset accruing opportunity costs. There is growing evidence that DA and the ACC regulate progress-based persistence with control episodes (Holroyd and Yeung, 2012; O’Reilly et al., 2014). In fact, in rats engaged in an effort-based decision-making task, ACC neurons multiplexing maze path, reward, and effort information were most selective after decisions were made, and their dynamics were identical across forced- and free-choice trials, suggesting greater involvement in biasing persistence than in initial selection (Cowen et al., 2012).
Control episodes are intrinsically costly, perhaps reflecting opportunity costs incurred by working memory allocation (Inzlicht et al., 2014; Kurzban et al., 2013). Adaptive persistence in effortful sequences of behavior, therefore, requires ongoing computation of accruing costs and benefits (Meyniel et al., 2013). A useful metric is the rate of progress: if progress is sufficiently fast, engagement is maintained, whereas slow or blocked progress yields frustration and disengagement (O’Reilly et al., 2014).
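One way to picture a rate-of-progress criterion is as a simple threshold rule, sketched below. This is an illustrative assumption rather than the model proposed in the cited work: progress is taken to be a number between 0 and 1 sampled at regular intervals, the recent rate is estimated over a short hypothetical `window`, and persistence is maintained only while that rate exceeds an assumed opportunity-cost rate.

```python
def should_persist(progress_history, cost_rate, window=3):
    """Persist while the recent rate of progress exceeds the opportunity-cost rate."""
    recent = progress_history[-window:]
    if len(recent) < 2:
        return True  # too little evidence yet to justify disengaging
    # Average progress gained per time step over the recent window.
    rate = (recent[-1] - recent[0]) / (len(recent) - 1)
    return rate > cost_rate

steady = should_persist([0.0, 0.2, 0.4, 0.6], cost_rate=0.1)     # progress keeps pace
stalled = should_persist([0.0, 0.2, 0.21, 0.22], cost_rate=0.1)  # progress has slowed
```

Under these assumptions the steady episode is continued and the stalled one is abandoned, even though both began identically.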
The ACC, by virtue of its capacity for hierarchical RL, and reciprocal interactions with the DA midbrain (Holroyd and Yeung, 2012; Ribas-Fernandes et al., 2011), is well-positioned to track progress and regulate engagement. By this account, the ACC (perhaps in concert with the OFC (O’Reilly et al., 2014)) uses representations of hierarchical task structure to track progress towards sub- and super-ordinate goals and conveys progress via the dopaminergic midbrain. Faster progress generates DA release, promoting value learning and engagement, while slower progress generates DA dips. Indeed, ACC unit recordings in both monkeys and rats show ramping dynamics that reflect increasing progress through action sequences (Ma et al., 2014). Importantly, this dynamic reflects internal models of task structure: rat ACC neurons track progression through a sequence of lever presses, regardless of physical lever features or of particular sequences required on a given trial (Ma et al., 2014).
The midbrain, for its part, shows RPE-like firing in response to perceived (progress-like) success in monkeys performing a visual working memory task, independent of actual success, implicating model-based criteria (Matsumoto and Takada, 2013). Also, VS BOLD signal in humans performing a working memory task increases transiently on correct versus incorrect trials, in the absence of performance feedback (Satterthwaite et al., 2012). Together, these results suggest not only that pseudo-RPEs report perceived goal progress, but also that one function of these pseudo-RPE signals is to modulate activity in the VS – a region proposed to serve as a key motivational hub (Mogenson et al., 1980).
DA and the ACC track costs to constrain persistence
Persistence is justifiable only inasmuch as progress outpaces accruing costs. A normative account appeals to the opportunity costs of working memory allocation (Kurzban et al., 2013). Evidence of DA encoding opportunity costs comes from a high-resolution fMRI study finding signed RPE-like increases in activity in the VTA/SN corresponding with the value of unchosen options, which therefore constituted missed opportunities (D'Ardenne et al., 2013). The ACC, by virtue of its connectivity with lateral PFC working memory circuits, e.g. (Kouneiher et al., 2009), and sustained activity through control-demanding tasks, e.g. (Dosenbach et al., 2006), is well-positioned to track such opportunity costs.
Regardless of the nature of control costs, however, the ACC, which has long been implicated in avoidance learning (Shackman et al., 2011), has been proposed to mediate avoidance of control demands by attenuating DA-based value-learning signals (Botvinick, 2007). The most direct evidence comes from a recent pharmaco-genetic imaging study (Cavanagh et al., 2014). The paradigm was structured such that reward and punishment cues were accompanied by either high or low decision conflict (control demands), designed to test the prediction that conflict would attenuate reward and boost punishment learning. As expected, an EEG signature of ACC activity – mid-frontal theta power – increased reliably on conflict versus non-conflict trials. Critically, conflict strengthened individual difference correlations between mid-frontal theta power and the perceived punishment value of a given stimulus, while it attenuated individual difference correlations with the perceived reward value of a given stimulus. This result supports the hypothesis that the ACC both recruits control resources and also signals the cost of recruitment, thereby attenuating reward and amplifying punishment learning. Evidence implicating DA in particular included that dosing with cabergoline – a D2 selective agonist that acts on presynaptic D2 autoreceptors to inhibit burst firing to reward and exaggerate burst firing to punishments in the DS – had the effect of reducing reward responsiveness and boosting punishment responsiveness during learning.
Direct midbrain recordings support the hypothesis that DA neurons encode effort-discounted reward value. For example, a subset (11%) of midbrain VTA neurons in monkeys performing an effortful, incentivized reaching task fired in proportion to reward magnitude discounted by effort demands (Pasquereau and Turner, 2013). Similarly, population firing rates of substantia nigra (SN) neurons in monkeys performing an effort-based decision-making task increased during higher reward trials, and decreased with increasing effort requirements (Varazzani et al., 2015) (Figure 5). Interestingly, a stronger relationship between net expected value and SN firing rates also predicted a stronger relationship between net expected value and choice behavior. This correlation suggests that midbrain dopaminergic activity has the capacity to directly influence decision-making beyond mediating value learning, a point to which we will return later.
Figure 5.
A) Raster plots of monkey substantia nigra cell activity to incentive cues during effort-based decision-making; firing intensifies with higher incentive values (liquid reward; rightward columns) and lower effort demands (handgrip squeeze; upper rows). B) Location of substantia nigra recordings. (Varazzani et al. 2015).
The ACC is thus a strong candidate for regulating persistence with control episodes, by virtue of its capacity to track not only incremental progress towards a goal, but also opportunity costs, and thereby signal control costs. We further propose that ACC regulates persistence by conveying the momentary balance of accruing progress less costs via phasic DA release from the midbrain. As we discuss in the next section, these DA projections have important effects on not just value learning, but also on action selection, including instantaneous incentive motivation effects in the striatum.
DA and Action Selection Biasing
We propose that incentive-linked DA release promotes ACC-based action policies on engagement and persistence with control episodes over opposing action biases in the striatum (Figure 2D). The VS, and particularly the nucleus accumbens (NAcc), are regarded as a core limbic-motor interface (Mogenson et al., 1980), featuring dense reciprocal connections with both the dopaminergic midbrain and cortical regions including the ACC (Haber and Knutson, 2009). The DS, as described above, caches value functions controlling the gating of both motor behavior and working memory allocation (O’Reilly and Frank, 2006). A reconceptualization of these regions, and their dopaminergic inputs, describes the VS as a “critic” evaluating states and driving DA RPE-based training of action value functions, while the DS serves as the “actor” that learns value functions for gating cognitive and motor action (Joel et al., 2002; van der Meer and Redish, 2011). Here, we highlight the role of DA in the VS in biasing action policies from cortical regions like the ACC, and DA in the DS in promoting gating of effortful cognitive actions as a function of incentive state and goal proximity.
DA RPEs train the VS to encode net incentive value
In the VS, phasic DA RPEs train cortico-striatal synapses to reflect the net incentive value of a given state, i.e., expected reward less expected effort. Hence, fast-scan cyclic voltammetry in rats performing an effort-based decision-making task reveals NAcc DA release encoding both reward magnitude and lever-press ratio requirements of corresponding alternatives (Day et al., 2010), or the encoding of ratio requirements when demands are atypically low (Gan et al., 2009). Phasic DA RPE signals, in turn, train synapses to make VS neurons more excitable to states that signal relatively higher reward and lower effort costs.
Human fMRI studies support the hypothesis that the excitability of VS neurons encodes net incentive value with respect to effort, e.g. (Croxson et al., 2009; Kurniawan et al., 2013; Schmidt et al., 2012). Striatal BOLD signal during a physical effort study increased to high versus low reward, and was attenuated when it was preceded by high versus low demands for handgrip squeezes (Kurniawan et al., 2013). Similarly, in the cognitive domain, a transient VS response to reward receipt was diminished if it was preceded by high versus low demands for cognitive control (i.e., task-switching frequency) (Botvinick et al., 2009).
Importantly, the VS evaluates both model-based and model-free state features (Daw et al., 2011; van der Meer and Redish, 2011). The capacity for model-based evaluation makes the VS critical for selection of control episodes, which may involve multiple costly actions that are only justifiable with respect to ultimate goals. Hence, as we describe later, dopaminergic innervation of the VS is particularly important for selecting model-based behavior constituting control episodes.
Striatal DA release mediates incentive salience and state motivation
Adaptive engagement with control episodes should involve not only rigid implementation of cached action values, but should also be sensitive to the current motivational state. In the arithmetic example, it would be adaptive to modulate persistence upon realizing that incentive point values were larger/smaller than first thought.
The incentive salience hypothesis holds that action values can be modulated instantaneously (i.e., without prior learning) by incentive cued striatal DA release (McClure et al., 2003; Phillips et al., 2008; Zhang et al., 2009). Hunger, e.g., increases instrumental lever pressing for food in rats (Phillips et al., 2008). A longstanding literature implicates striatal DA in modifying value functions and thereby promoting state willingness to expend effort (Bromberg-Martin et al., 2010; Kurniawan et al., 2011; Salamone and Correa, 2012). Alternatively, as we discuss later, incentive-cued DA release in the VS, in particular, may promote flexible approach, increasing apparent willingness to expend effort (McGinty et al., 2013; Nicola, 2010). In this section, we review evidence for DA’s role in incentive state modulation of cached action values.
VS DA appears critical for physical effort-based decision-making. In a canonical paradigm, rats choose between climbing a high barrier for more reward, or a low barrier for less reward, or alternatively to select a high-ratio lever press option for more reward, or a low-ratio lever press option for less reward; see (Bromberg-Martin et al., 2010; Kurniawan et al., 2011; Salamone and Correa, 2012) for reviews. The typical result is that DA blockade in the VS (along with antagonism in the ACC, or lesions of the ACC-VS loop) shifts preferences from high-reward, high-effort options towards low-reward, low-effort options.
Human studies also implicate striatal DA signaling in incentive motivation. For example, D2/D3 receptor and dopamine transporter density in the NAcc predicts trait-level achievement motivation in individuals with ADHD (Volkow et al., 2010). In a combined fallypride-PET and d-amphetamine challenge study, human volunteers with the largest DS binding potential and the highest sensitivity to d-amphetamine during an instrumental button-pressing task were more willing to button press for reward (Treadway et al., 2012). Also, systemic DA agonism by the indirect agonist d-amphetamine ameliorates physical effort deficits among individuals with Parkinson’s disease (Chong et al., 2015). In the cognitive domain, VS BOLD signal interacted with a genetic D2 receptor density marker to predict individual differences in working memory performance (Nymberg et al., 2014).
The ability of incentive-cued DA release to energize behavior appears to critically depend on D2 receptor signaling in the NAcc core. In an instrumental lever-pressing task, transient GABAergic inactivation of the NAcc core, but not the shell, shifted preferences from high effort-high-reward to low effort-low reward alternatives (Ghods-Sharifi and Floresco, 2010). Additionally, rats treated with a viral vector yielding acute overexpression of D2 receptors in the NAcc showed enhanced instrumental lever-pressing (Trifilieff et al., 2013). We note that studies using animal models with developmentally overexpressed D2 receptors have also shown the reverse effect – i.e., decreased incentive motivation (Krabbe et al., 2015; Ward et al., 2015). However, this reverse effect may be due to comorbid, developmental under-expression of NMDA NR1 and NR2B receptors on VTA neurons, which reduces both their firing frequency and burst firing (Krabbe et al., 2015).
DA promotes effortful action by promoting cortical action policy signals in the VS and increasing the likelihood of gating in the DS
How does striatal DA bias selection of effortful action? By one proposal, action policies from the cortex, including canonical economic decision-making regions like the ACC and the OFC, are sent via axons that jointly synapse along with midbrain dopaminergic neurons in the striatum, and coincident phasic DA release boosts signal-to-noise: it enhances the contrast between strongly excited synapses corresponding to policy signals at the time of choice, relative to weakly excited synapses (Figure 2D) (Nicola et al., 2004). Thus phasic DA efflux could instantaneously amplify cortical action policies projected to the VS (Roesch et al., 2009).
A recent computational proposal (“OPponent Actor Learning” or “OpAL”; Figure 6A) unifies value learning and incentive salience aspects of DA. In the DS, where gating policies are cached in terms of the relative strengths of cortico-striatal synapses onto D1-expressing “Go” and D2-expressing “NoGo” cells, DA should increase Go cell firing, and inhibit NoGo cells, thus modulating cached policies in favor of gating actions (Collins and Frank, 2014). According to this proposal, DA not only influences the learning of cached value functions, but can instantaneously modulate those value functions at the time of choice. The most direct evidence comes from an optogenetic study in which lateralized populations of DS Go and NoGo cells in rats were stimulated independently (Tai et al., 2012). Stimulation of Go cells yielded an apparent shift in preference to a contralateral option, while stimulation of NoGo cells yielded an apparent shift to an ipsilateral option. This shift mimicked additive effects in subjective value of one option over the other (Figure 6B). Hence, DS DA may translate incentive motivation into the selection of control episodes by increasing the subjective value of working memory allocation.
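The idea that DA reweights cached Go and NoGo weights at the time of choice can be sketched in a few lines. This is a simplified illustration in the spirit of the opponent-actor account, not a reproduction of the published model: the particular parameterization (a `dopamine` term in the range -1 to 1 that scales Go gain up and NoGo gain down), the softmax choice rule, and the example weights are all assumptions made here for clarity.

```python
import math

def choice_probs(go, nogo, dopamine, beta=1.0):
    """Softmax over actions after dopamine reweights Go (benefit) and NoGo (cost) weights."""
    beta_go = beta * (1.0 + dopamine)    # higher DA amplifies the Go pathway
    beta_nogo = beta * (1.0 - dopamine)  # and dampens the NoGo pathway
    acts = [beta_go * g - beta_nogo * n for g, n in zip(go, nogo)]
    peak = max(acts)  # subtract the maximum for numerical stability
    exps = [math.exp(a - peak) for a in acts]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical cached weights: action 0 = engage (high benefit, high cost),
# action 1 = disengage (low benefit, low cost).
go, nogo = [2.0, 1.0], [2.0, 1.0]
p_low = choice_probs(go, nogo, dopamine=-0.5)[0]
p_mid = choice_probs(go, nogo, dopamine=0.0)[0]
p_high = choice_probs(go, nogo, dopamine=0.5)[0]
```

With identical cached weights, raising the dopamine term alone shifts choice toward the high-benefit, high-cost action, and lowering it shifts choice away, without any new learning.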
Figure 6.
A) Schematic of cortico-striatal neural network model modified in OpAL such that DA instantaneously promotes firing of D1-expressing, direct pathway “Go” cells (green region) and inhibits firing of D2-expressing, indirect pathway “NoGo” cells (red region) in the striatum, thereby promoting working memory gating (Collins and Frank, 2014). B) Optogenetic stimulation of striatal D1 and D2 cells in rats, during decision-making in a two-alternative reward-learning task, produces dose-dependent shifts in preference towards the contralateral option in the case of D1 cell stimulation (blue) and the ipsilateral option in the case of D2 stimulation (red), mimicking shifts in subjective value functions (as reviewed in Lee et al., 2015).
DA in the VS promotes model-based behavior
The capacity of phasic DA in the VS to bias cortical action policies related to effortful action is most critical in the early stages of successful control episodes when behavior is necessarily model-based. For example, in the two-digit mental arithmetic problem, the actor must consider the point-valued outcome when deciding whether to engage and persist in the episode, since the immediate value of initial sub-goals, e.g., computing the ones-digit product, is net-negative. Here, VS DA appears critical for promoting model-based behavior, of the kind necessary for persistence in these early stages. Indeed, higher presynaptic striatal DA, as measured by [18F]DOPA PET, predicts greater reliance on model-based decision-making in a two-stage sequential decision-making task, and also predicts decreased reliance on habitual associations as encoded in striatal BOLD signal (Deserno et al., 2015). This could also explain why humans dosed with systemic DA agonists show more model-based relative to model-free decision-making (Wunderlich et al., 2012), especially to the extent that phasic signaling can be boosted by greater extrasynaptic tone (Dreyer et al., 2010).
The emphasis on promoting model-based behavior aligns with a reconceptualization, in which VS DA supports flexible approach, or persisting in goal-directed (and therefore model-based) behavior (McGinty et al., 2013; Nicola, 2010), rather than overcoming instrumental costs per se. The flexible approach hypothesis states that during periods in which rewards are not immediately available, agents are more likely to disengage and, because they can assume different positions with respect to operanda during such pauses, NAcc DA is needed to flexibly re-approach and engage.
By this account, much of the extant literature on NAcc DA promoting instrumental effort can be reinterpreted, wherein subtle task features allow more opportunities for disengagement in conditions for which effort demands are higher, placing more demands on NAcc DA to support flexible approach (Nicola, 2010). This could explain why NAcc DA depletion does not always affect effort-based decision-making about instrumental lever pressing in rats, e.g. when instrumental task design permits few opportunities for disengagement (Walton et al., 2009). It further explains the observation that NAcc DA is only necessary for initiating instrumental lever-pressing when there are longer pauses between action opportunities (Nicola, 2010).
Regardless of whether VS DA is necessary for overcoming effort costs or for flexible approach, the selection of effortful action sequences, like those comprising control episodes, requires VS DA (Nicola, 2007). Moreover, this may be particularly true when there is greater “psychological distance” from goals (Salamone and Correa, 2012), involving deeper action-outcome chaining, whether that distance is a function of space, time, or the number of sub-goals in a two-digit multiplication.
DA tone in the striatum reflects goal progress and invigorates action
As progress is made within a control episode, and percepts narrow in on a goal state, action invigoration becomes more important. Consider the final stage of a two-digit multiplication, when cortical representations of products for summation increasingly suggest the ultimate solution. At this stage, consideration of outcome incentives becomes less important than quick and robust execution of a retrieval action to finalize the solution. Intriguingly, recently discovered DA dynamics appear well-suited to subserve this functional shift from model-based control to invigoration (Figure 7). Namely, striatal DA tone ramps up smoothly, encoding goal progress (Howe et al., 2013). This dynamic, discovered with fast-scan cyclic voltammetry in rats navigating mazes, was found to scale with reward magnitude, and encode relative, rather than absolute distance to the goal.
Figure 7.
Ramping DA tone in the VS as measured by fast-scan cyclic voltammetry, during a single trial in which a rat progresses through a t-maze towards a final goal-state. Red vertical lines indicate, from left to right, the timing of an audible click cueing trial start, a tone indicating the direction the rat should turn, and finally successful goal attainment (chocolate milk). Ramp slopes reflected relative path distance, rather than absolute path distance or time-on-trial, and scaled with reward magnitude (Howe et al., 2013).
The mechanism of DA ramping is not clear – whether it reflects local release, or ramping firing of midbrain DA cells. Ramping may actually result from the progressive accumulation, or “spill-over” from phasic DA release (Gershman, 2014), e.g., as progress is made. In particular, phasic DA release may reflect the temporal derivative of a running average rate of progress as tracked by the ACC and OFC (O’Reilly et al., 2014), or pseudo-reward in hierarchical RL (Ribas-Fernandes et al., 2011).
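The spill-over idea can be illustrated with a leaky accumulator, offered here only as a sketch under stated assumptions. It assumes that each step of progress triggers a fixed phasic release and that extracellular DA is cleared at a constant proportional rate; the `decay` parameter and the release values are hypothetical, and the resulting ramp saturates rather than matching the measured ramp shape.

```python
def dopamine_tone(phasic_release, decay=0.9):
    """Leaky accumulation of phasic release events into a slowly varying tone."""
    tone, trace = 0.0, []
    for release in phasic_release:
        # A fraction of existing tone persists; the new release is added to it.
        tone = decay * tone + release
        trace.append(tone)
    return trace

# Hypothetical episode: steady progress produces one unit of phasic release per step.
trace = dopamine_tone([1.0] * 10)
```

Under these assumptions, a train of equal-sized phasic events is enough to produce a rising tone, so ramping need not imply ramping firing of midbrain DA cells.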
An important functional consequence of rising striatal DA tone is the invigoration of behavior. Specifically, striatal DA tone is thought to encode the average rate of experienced reward and promote vigor (inverse latency to responding) adaptively, such that higher rates of reward imply a richer local resource that an actor should act more quickly to obtain (Niv et al., 2007). Hence, sufficiently fast progress towards the final goal yields ramping striatal DA tone, which can also promote action invigoration as the goal nears.
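The inverse relationship between average reward rate and response latency can be worked through with a short sketch. It assumes a simplified cost structure in which acting with latency `tau` incurs a vigor cost of `vigor_cost / tau` and forgoes `avg_reward_rate * tau` in opportunity; minimizing the sum gives a latency equal to the square root of their ratio. The parameter values are hypothetical.

```python
import math

def total_cost(tau, vigor_cost, avg_reward_rate):
    """Vigor cost of acting quickly plus opportunity cost of acting slowly."""
    return vigor_cost / tau + avg_reward_rate * tau

def optimal_latency(vigor_cost, avg_reward_rate):
    """Latency minimizing total_cost, found by setting its derivative to zero."""
    return math.sqrt(vigor_cost / avg_reward_rate)

slow = optimal_latency(vigor_cost=1.0, avg_reward_rate=1.0)  # lean environment
fast = optimal_latency(vigor_cost=1.0, avg_reward_rate=4.0)  # rich environment
```

In this sketch a fourfold increase in the average reward rate halves the chosen latency, which is the sense in which rising tone would invigorate responding as the goal nears.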
We close this section on dopaminergic mediation of value-learning and effort-based decision-making by noting conflicting evidence. First, a recent study has shown that DA-based cached values do not necessarily map onto preferred actions (Hollon et al., 2014). Specifically, fast-scan cyclic voltammetry in the rat NAcc revealed that DA tone was higher on trials in which the rat was forced to choose a dis-preferred high-effort high-reward option over a preferred low-effort low-reward option. Of course, DA may play different roles in forced- and free-choice decision-making, but this result suggests that, at least in some contexts, the rank-ordered relationship between DA and preference can be violated. Second, we note recent work aimed at developing a rodent model of cognitive effort-based decision-making, e.g. (Hosking et al., 2014). This work has provided mixed evidence so far regarding the consequences of systemic, pharmacological DA manipulation on willingness to expend cognitive (vs. physical) effort. It is open for debate whether the new rodent model represents the sort of cognitive effort-based decision-making that is of focal interest for control episodes, and whether the task sufficiently discriminates effort-based from probabilistic decision-making. Nevertheless, a rodent model holds great promise for more fine-grained investigation into the neural circuitry mediating decisions about cognitive effort.
DA Translates Incentives Into Cognitive Motivation: Summary Proposal
Here, we recapitulate the proposal we have been building whereby DA does double duty during costly control episodes. We define control episodes as temporally extended sequences in which working memory is allocated to represent the rules needed to guide goal-directed behavior. During an episode, DA does double duty in that it: a) influences working memory contents by functional modulation of working memory circuits, and b) supports value-learning and decision-making about effortful cognitive actions (Figure 2).
Generally, we propose that:
Phasic DA RPE signals encode goal benefits and effort costs for control episodes, caching net values in terms of LTP and LTD of cortico-striatal synapses.
Incentive-linked DA release instantaneously augments cached values, increasing the likelihood of gating relevant task-sets into working memory in the striatum, thereby initiating control episodes associated with high incentive value.
During control episodes, the ACC tracks both accruing opportunity costs and incremental progress – the balance of these is conveyed to midbrain DA neurons, where it is then transmitted to the striatum and PFC as phasic, effort-discounted, pseudo-RPE signals.
In the PFC, rising DA tone encoding fast goal-progress (or high incentive state) enhances the robustness of persistent activity, thereby stabilizing active maintenance in recurrent networks representing task goals.
In the VS, DA release promotes drive (or flexible approach) to select extended sequences of goal-directed behavior. This is particularly critical at early stages of a control episode. As the goal state nears, ramping DA tone invigorates (potentiates) action gating, including working memory allocation actions.
In the DS, DA tone encoding sufficiently fast goal progress in a ramping fashion increases the general likelihood of task set updating. However, hierarchically structured task sets in the PFC interact with DS to target lower-level task sets for contextually-appropriate out-gating. Thus, specific, lower-level flexibility is promoted while high-level goal maintenance is sustained during the control episode.
Conversely, to the extent that opportunity costs outpace incremental progress, the likelihood of disengagement rises. This may result from falling PFC DA tone, reducing the stability of working memory representations, or reduced likelihood of working memory gating in cortico-striatal-thalamic loops. Declining DA release undermines goal-directed flexible approach effects in the VS, further potentiating distraction.
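The chain of proposals above can be made concrete with a toy sketch. The Python below is not a model from this review or from the cited literature; it is a minimal illustration, under stated assumptions, of three of the listed ideas: an RPE computed on reward net of effort cost that caches a net value, an incentive-linked boost that raises the probability of gating a task set, and disengagement when opportunity costs outpace progress. All function names, the delta-rule update, the logistic gating rule, and the parameter values are hypothetical choices made only for illustration.

```python
import math


def update_cached_value(value, reward, effort_cost, learning_rate=0.1):
    """One delta-rule update (assumed form): the RPE is taken on reward net of effort cost."""
    rpe = (reward - effort_cost) - value
    return value + learning_rate * rpe, rpe


def learn_net_value(reward, effort_cost, n_trials=200, learning_rate=0.1):
    """Repeat the update so the cached value approaches reward minus effort cost."""
    value = 0.0
    for _ in range(n_trials):
        value, _ = update_cached_value(value, reward, effort_cost, learning_rate)
    return value


def gating_probability(cached_value, incentive_boost=0.0, temperature=1.0):
    """Logistic gating rule (assumed): an incentive boost adds to the cached value."""
    drive = (cached_value + incentive_boost) / temperature
    return 1.0 / (1.0 + math.exp(-drive))


def should_disengage(progress_rate, opportunity_cost_rate):
    """Disengage when opportunity costs accrue faster than progress (assumed threshold rule)."""
    return opportunity_cost_rate > progress_rate


if __name__ == "__main__":
    v = learn_net_value(reward=1.0, effort_cost=0.4)
    print("cached net value:", round(v, 3))
    print("gating probability without incentive:", round(gating_probability(v), 3))
    print("gating probability with incentive:", round(gating_probability(v, 1.0), 3))
    print("disengage?", should_disengage(progress_rate=0.2, opportunity_cost_rate=0.5))
```

The sketch collapses striatal, prefrontal, and ACC contributions into scalar quantities, so it illustrates only the direction of each proposed effect and makes no claim about circuit-level implementation.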
Gaps in our account remain. We have described how rising PFC DA promotes task set stability, yet we have also pointed to evidence that supraoptimal PFC DA tone yields destabilization and, indeed, that rapid PFC DA efflux could act as a global updating signal, indiscriminately destabilizing all current representations. However, we think that, for most operating regimes, PFC DA tone is unlikely to yield destabilization. As noted in a recent review (Spencer et al., 2015), intra-PFC injections of methylphenidate in rats, boosting DA tone, do not impair working memory at concentrations that are 16- to 32-fold higher than clinically relevant methylphenidate doses. This stands in contrast to the observation that systemic administration of methylphenidate can impair working memory at concentrations 4-fold higher than clinical doses.
One account of such discrepancies (between systemic and localized DA pharmacological manipulations) is that they relate to distinctions between DA modulation of the PFC versus the striatum. Specifically, systemic high-dose DA manipulations may primarily act in the striatum, where high DA tone can potentiate gating indiscriminately (Cools and D’Esposito, 2011). In a recent PET study demonstrating this effect in humans, the consequences of incentive motivation on performance of a Stroop task were investigated as a function of individual differences in baseline striatal DA synthesis capacity (using 6-[18F]fluoro-L-m-tyrosine uptake) (Aarts et al., 2014). The key finding was that while incentives enhanced performance for some participants, those with the highest baseline synthesis capacity saw a decrement in incentivized performance. This pattern is consistent with the interpretation that incentive-cued striatal DA release in those with high baseline DA synthesis capacity yielded indiscriminate updating, undermining performance. In our proposal, striatal DA tone rises when progress outpaces opportunity costs; however, indiscriminate updating is typically prevented (under non-pharmacological conditions) by the imposition of hierarchical, targeted updating policies guided by PFC working memory representations. Thus, striatal DA tone interacts with targeted updating policies to maintain engagement with the current control episode.
In focusing on DA, we have neglected other potentially relevant neurotransmitter systems. Norepinephrine, for example, has similar effects on the stability of working memory representations, and also responds like DA to incentive cues (Sara, 2009). A recent study showed that while SNc neurons appeared to encode net cost-benefit during effort-based decision-making, locus coeruleus neurons encoded effort demands during task execution, suggesting a potential dissociation (Varazzani et al., 2015). Adenosine, for its part, appears to interact with the midbrain dopaminergic system to regulate effort-based decision-making, and may account for the effects of caffeine on cognitive effort (Salamone et al., 2012). Serotonin has also been proposed to oppose DA learning effects and may subserve effort cost learning (Boureau and Dayan, 2010). Nevertheless, we think that DA in particular has a number of useful properties that position it best for mediating cognitive incentive motivation.
Our proposal is similar in scope to other recent proposals. As described above, the Expected Value of Control proposal (Shenhav et al., 2013) considers cognitive control recruitment as driven by net expected value computations in ACC. A recent RL model (Holroyd and McClure, 2015) also considers the role of the ACC in value-based regulation of cognitive control, and thus offers specific predictions about the influence of reward dynamics on effortful action, including up-regulation when rewards are below average and down-regulation when rewards are above average. There are numerous points of theoretical overlap among these proposals. For example, in all three, the ACC biases the selection of effortful cognitive control actions via interactions with the striatum. Our proposal complements the other two accounts by articulating the varied and precise roles by which DA contributes to the value-based regulation of working memory systems during control episodes. For example, in the computational model of Holroyd and McClure (2015), DA interacts with ACC signals such that when average reward, and presumably DA tone (Niv et al., 2007), are high, control signals are boosted because outcomes are more likely to fall below the average reward rate. Conversely, striatal DA blockade is posited to be computationally equivalent to low average reward, so that any reward is effectively above average and control signals dissipate. In our proposal, by contrast, striatal DA blockade also diminishes effortful cognitive control, but it does so not through a shift in perceived average reward, but through diminished working memory stability in the PFC and diminished targeted flexibility via the striatum.
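The average-reward logic attributed above to Holroyd and McClure (2015) can be illustrated with a deliberately simplified sketch. This is not their published model; the linear rule, the function name `control_adjustment`, the `gain` parameter, and the numeric values are assumptions chosen only to show the direction of the described effects: control is up-regulated when an outcome falls below the average reward, and a very low average reward (the posited equivalent of striatal DA blockade) makes any reward count as above average.

```python
def control_adjustment(outcome, average_reward, gain=1.0):
    """Assumed linear rule: positive output up-regulates control, negative down-regulates it."""
    return gain * (average_reward - outcome)


if __name__ == "__main__":
    outcome = 0.5
    # High average reward: the outcome falls below average, so control is boosted.
    print("high average reward:", control_adjustment(outcome, average_reward=0.8))
    # Low average reward (stand-in for DA blockade): the same outcome is above average.
    print("low average reward:", control_adjustment(outcome, average_reward=0.0))
```

In the account proposed in this review, the same behavioral consequence of blockade would instead arise from reduced working memory stability and reduced targeted flexibility, which this sketch does not attempt to represent.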
We acknowledge the tentative nature of our proposal. Computational modeling and experimental validation are required to ensure DA has the functional capacity to subserve adaptive engagement and persistence in the ways we hypothesize. Nevertheless, we hope this conceptual sketch unifies disparate literatures on DA’s various functional properties and prompts development of a comprehensive theory of DA in cognitive effort. We have highlighted DA’s roles in value-learning and effort-based decision-making, as well as the direct functional modulation of working memory circuits, and thereby working memory contents, by phasic and tonic DA modes. The integration of these two broad literatures indicates double duty for DA in motivating cognitive effort.
Temporally-extended, goal-directed behavior often involves subjectively effortful cognition. Westbrook and Braver review two broad, complementary roles by which DA translates incentive information into cognitive motivation: 1) modulating working memory circuit parameters and 2) training decision value functions for cognitive engagement.
References
- Aalto S. Frontal and Temporal Dopamine Release during Working Memory and Attention Tasks in Healthy Humans: a Positron Emission Tomography Study Using the High-Affinity Dopamine D2 Receptor Ligand [11C]FLB 457. Journal of Neuroscience. 2005;25:2471–2477. doi: 10.1523/JNEUROSCI.2097-04.2005.
- Aarts E, Wallace DL, Dang LC, Jagust WJ, Cools R, D’Esposito M. Dopamine and the Cognitive Downside of a Promised Bonus. Psychological Science. 2014;25:1003–1009. doi: 10.1177/0956797613517240.
- Aarts E, Roelofs A, Franke B, Rijpkema M, Guillen F, Helmich RC, Cools R. Striatal Dopamine Mediates the Interface between Motivational and Cognitive Control in Humans: Evidence from Genetic Imaging. Neuropsychopharmacology. 2010;35:1943–1951. doi: 10.1038/npp.2010.68.
- Alexander WH, Brown JW. Medial prefrontal cortex as an action-outcome predictor. Nature Neuroscience. 2011;14:1338–1344. doi: 10.1038/nn.2921.
- Badre D, Frank MJ. Mechanisms of Hierarchical Reinforcement Learning in Cortico-Striatal Circuits 2: Evidence from fMRI. Cerebral Cortex. 2012;22:527–536. doi: 10.1093/cercor/bhr117.
- Bahlmann J, Aarts E, D'Esposito M. Influence of Motivation on Control Hierarchy in the Human Frontal Cortex. Journal of Neuroscience. 2015;35:3207–3217. doi: 10.1523/JNEUROSCI.2389-14.2015.
- Berridge CW, Arnsten AFT. Psychostimulants and motivated behavior: Arousal and cognition. Neuroscience and Biobehavioral Reviews. 2013;37:1976–1984. doi: 10.1016/j.neubiorev.2012.11.005.
- Boehler CN, Hopf JM, Krebs RM, Stoppel CM, Schoenfeld MA, Heinze HJ, Noesselt T. Task-Load-Dependent Activation of Dopaminergic Midbrain Areas in the Absence of Reward. Journal of Neuroscience. 2011;31:4955–4961. doi: 10.1523/JNEUROSCI.4845-10.2011.
- Botvinick MM. Conflict monitoring and decision making: Reconciling two perspectives on anterior cingulate function. Cognitive, Affective, & Behavioral Neuroscience. 2007;7:356–366. doi: 10.3758/cabn.7.4.356.
- Botvinick MM, Braver TS. Motivation and Cognitive Control: From Behavior to Neural Mechanism. Annual Review of Psychology. 2015;66:83–113. doi: 10.1146/annurev-psych-010814-015044.
- Botvinick MM, Braver TS, Barch DM, Carter CS, Cohen JD. Conflict monitoring and cognitive control. Psychological Review. 2001;108:624–652. doi: 10.1037/0033-295x.108.3.624.
- Botvinick MM, Huffstetler S, McGuire JT. Effort discounting in human nucleus accumbens. Cognitive, Affective, & Behavioral Neuroscience. 2009;9:16–27. doi: 10.3758/CABN.9.1.16.
- Boureau Y-L, Dayan P. Opponency Revisited: Competition and Cooperation Between Dopamine and Serotonin. Neuropsychopharmacology. 2010 doi: 10.1038/npp.2010.151.
- Braver TS. The variable nature of cognitive control: A dual mechanisms framework. Trends in Cognitive Sciences. 2012;16:105–112. doi: 10.1016/j.tics.2011.12.010.
- Braver TS, Barch DM, Cohen JD. Cognition and control in schizophrenia: a computational model of dopamine and prefrontal function. Biological Psychiatry. 1999;46:312–328. doi: 10.1016/s0006-3223(99)00116-x.
- Bromberg-Martin ES, Matsumoto M, Hikosaka O. Dopamine in Motivational Control: Rewarding, Aversive, and Alerting. Neuron. 2010;68:815–834. doi: 10.1016/j.neuron.2010.11.022.
- Brown JW, Braver TS. Learned predictions of error likelihood in the anterior cingulate cortex. Science. 2005;307:1118. doi: 10.1126/science.1105783.
- Brunel N, Wang X-J. Effects of neuromodulation in a cortical network model of object working memory dominated by recurrent inhibition. Journal of Computational Neuroscience. 2001;11:63–85. doi: 10.1023/a:1011204814320.
- Butts KA, Weinberg J, Young AH, Phillips AG. Glucocorticoid receptors in the prefrontal cortex regulate stress-evoked dopamine efflux and aspects of executive function. Proceedings of the National Academy of Sciences. 2011;108:18459–18464. doi: 10.1073/pnas.1111746108.
- Cacioppo J, Petty R, Feinstein J, Jarvis W. Dispositional differences in cognitive motivation: The life and times of individuals varying in need for cognition. Psychological Bulletin. 1996;119:197.
- Cavanagh JF, Masters SE, Bath K, Frank MJ. Conflict acts as an implicit cost in reinforcement learning. Nature Communications. 2014;5 doi: 10.1038/ncomms6394.
- Chatham CH, Badre D. Working memory management and predicted utility. Frontiers in Behavioral Neuroscience. 2013;7 doi: 10.3389/fnbeh.2013.00083.
- Chatham CH, Badre D. Multiple gates on working memory. Current Opinion in Behavioral Sciences. 2015;1:23–31. doi: 10.1016/j.cobeha.2014.08.001.
- Chatham CH, Frank MJ, Badre D. Corticostriatal Output Gating during Selection from Working Memory. Neuron. 2014;81:930–942. doi: 10.1016/j.neuron.2014.01.002.
- Chiew KS, Braver TS. Dissociable influences of reward motivation and positive emotion on cognitive control. Cognitive, Affective, & Behavioral Neuroscience. 2014;14:509–529. doi: 10.3758/s13415-014-0280-0.
- Chong TTJ, Bonnelle V, Manohar S, Veromann K-R, Muhammed K, Tofaris GK, Hu M, Husain M. Dopamine enhances willingness to exert effort for reward in Parkinson's disease. Cortex. 2015;69:40–46. doi: 10.1016/j.cortex.2015.04.003.
- Collins AGE, Frank MJ. Opponent actor learning (OpAL): Modeling interactive effects of striatal dopamine on reinforcement learning and choice incentive. Psychological Review. 2014;121:337–366. doi: 10.1037/a0037015.
- Cools R, D’Esposito M. Inverted-U-Shaped Dopamine Actions on Human Working Memory and Cognitive Control. Biological Psychiatry. 2011;69:e113–e125. doi: 10.1016/j.biopsych.2011.03.028.
- Cowen SL, Davis GA, Nitz DA. Anterior cingulate neurons in the rat map anticipated effort and reward to their associated action sequences. Journal of Neurophysiology. 2012;107:2393–2407. doi: 10.1152/jn.01012.2011.
- Croxson PL, Walton ME, O'Reilly JX, Behrens TEJ, Rushworth MFS. Effort-based cost-benefit valuation and the human brain. Journal of Neuroscience. 2009;29:4531. doi: 10.1523/JNEUROSCI.4515-08.2009.
- D'Ardenne K, Eshel N, Luka J, Lenartowicz A, Nystrom L, Cohen JD. Role of prefrontal cortex and the midbrain dopamine system in working memory updating. Proceedings of the National Academy of Sciences. 2012;109:19900–19909. doi: 10.1073/pnas.1116727109.
- D'Ardenne K, Lohrenz T, Bartley KA, Montague PR. Computational heterogeneity in the human mesencephalic dopamine system. Cognitive, Affective, & Behavioral Neuroscience. 2013;13:747–756. doi: 10.3758/s13415-013-0191-5.
- Daw ND, Gershman SJ, Seymour B, Dayan P, Dolan RJ. Model-Based Influences on Humans' Choices and Striatal Prediction Errors. Neuron. 2011;69:1204–1215. doi: 10.1016/j.neuron.2011.02.027.
- Day JJ, Jones JL, Wightman RM, Carelli RM. Phasic Nucleus Accumbens Dopamine Release Encodes Effort- and Delay-Related Costs. Biological Psychiatry. 2010;68:306–309. doi: 10.1016/j.biopsych.2010.03.026.
- Dayan P. How to set the switches on this thing. Current Opinion in Neurobiology. 2012;22:1068–1074. doi: 10.1016/j.conb.2012.05.011.
- Deserno L, Huys QJM, Boehme R, Buchert R, Heinze H-J, Grace AA, Dolan RJ, Heinz A, Schlagenhauf F. Ventral striatal dopamine reflects behavioral and neural signatures of model-based control during sequential decision making. Proceedings of the National Academy of Sciences. 2015;112:1595–1600. doi: 10.1073/pnas.1417219112.
- Dixon ML, Christoff K. The Decision to Engage Cognitive Control Is Driven by Expected Reward-Value: Neural and Behavioral Evidence. PLoS One. 2012;7:e51637. doi: 10.1371/journal.pone.0051637.
- Dolan RJ, Dayan P. Goals and Habits in the Brain. Neuron. 2013;80:312–325. doi: 10.1016/j.neuron.2013.09.007.
- Donoso M, Collins AG, Koechlin E. Foundations of human reasoning in the prefrontal cortex. Science. 2014;344:1481–1486. doi: 10.1126/science.1252254.
- Dosenbach NUF, Visscher KM, Palmer ED, Miezin FM, Wenger KK, Kang HC, Burgund ED, Grimes AL, Schlaggar BL, Petersen SE. A Core System for the Implementation of Task Sets. Neuron. 2006;50:799–812. doi: 10.1016/j.neuron.2006.04.031.
- Dreisbach G, Fischer R. Conflicts as aversive signals. Brain and Cognition. 2012;78:94–98. doi: 10.1016/j.bandc.2011.12.003.
- Dreisbach G, Goschke T. How Positive Affect Modulates Cognitive Control: Reduced Perseveration at the Cost of Increased Distractibility. Journal of Experimental Psychology: Learning, Memory, and Cognition. 2004;30:343–353. doi: 10.1037/0278-7393.30.2.343.
- Dreyer JK, Herrik KF, Berg RW, Hounsgaard JD. Influence of Phasic and Tonic Dopamine Release on Receptor Activation. The Journal of Neuroscience. 2010;30:14273–14283. doi: 10.1523/JNEUROSCI.1894-10.2010.
- Duncan J. The Structure of Cognition: Attentional Episodes in Mind and Brain. Neuron. 2013;80:35–50. doi: 10.1016/j.neuron.2013.09.015.
- Durstewitz D, Seamans JK. The dual-state theory of prefrontal cortex dopamine function with relevance to catechol-o-methyltransferase genotypes and schizophrenia. Biological Psychiatry. 2008;64:739–749. doi: 10.1016/j.biopsych.2008.05.015.
- Etzel J, Cole MW, Zacks JM, Kay KN, Braver TS. Reward Motivation Enhances Task Coding in Frontoparietal Cortex. Cerebral Cortex. 2015 doi: 10.1093/cercor/bhu327.
- Frank MJ, Badre D. Mechanisms of Hierarchical Reinforcement Learning in Corticostriatal Circuits 1: Computational Analysis. Cerebral Cortex. 2012;22:509–526. doi: 10.1093/cercor/bhr114.
- Frank MJ, Loughry B, O’Reilly RC. Interactions between frontal cortex and basal ganglia in working memory: a computational model. Cognitive, Affective, & Behavioral Neuroscience. 2001;1:137–160. doi: 10.3758/cabn.1.2.137.
- Gan JO, Walton ME, Phillips PEM. Dissociable cost and benefit encoding of future rewards by mesolimbic dopamine. Nature Neuroscience. 2009;13:25–27. doi: 10.1038/nn.2460.
- Gershman SJ. Dopamine Ramps Are a Consequence of Reward Prediction Errors. Neural Computation. 2014;26:467–471. doi: 10.1162/NECO_a_00559.
- Ghods-Sharifi S, Floresco SB. Differential effects on effort discounting induced by inactivations of the nucleus accumbens core or shell. Behavioral Neuroscience. 2010;124:179–191. doi: 10.1037/a0018932.
- Glascher J, Hampton AN, O'Doherty JP. Determining a Role for Ventromedial Prefrontal Cortex in Encoding Action-Based Value Signals During Reward-Related Decision Making. Cerebral Cortex. 2008;19:483–495. doi: 10.1093/cercor/bhn098.
- Haber SN, Knutson B. The reward circuit: linking primate anatomy and human imaging. Neuropsychopharmacology. 2009;35:4–26. doi: 10.1038/npp.2009.129.
- Hammar Å, Strand M, Årdal G, Schmid M, Lund A, Elliott R. Testing the cognitive effort hypothesis of cognitive impairment in major depression. Nordic Journal of Psychiatry. 2011;65:74–80. doi: 10.3109/08039488.2010.494311.
- Herd SA, O’Reilly RC, Hazy TE, Chatham CH, Brant AM, Friedman NP. A neural network model of individual differences in task switching abilities. Neuropsychologia. 2014 doi: 10.1016/j.neuropsychologia.2014.04.014.
- Hess TM, Ennis GE. Age differences in the effort and costs associated with cognitive activity. The Journals of Gerontology Series B: Psychological Sciences and Social Sciences. 2011;67:447–455. doi: 10.1093/geronb/gbr129.
- Hillman KL, Bilkey DK. Neural encoding of competitive effort in the anterior cingulate cortex. Nature Neuroscience. 2012;15:1290–1297. doi: 10.1038/nn.3187.
- Nakahara H. Multiplexing signals in reinforcement learning with internal models and dopamine. Current Opinion in Neurobiology. 2014;25:123–129. doi: 10.1016/j.conb.2014.01.001.
- Hollon NG, Arnold MM, Gan JO, Walton ME, Phillips PEM. Dopamine-associated cached values are not sufficient as the basis for action selection. Proceedings of the National Academy of Sciences. 2014;111:18357–18362. doi: 10.1073/pnas.1419770111.
- Holroyd CB, Coles MGH. The neural basis of human error processing: Reinforcement learning, dopamine, and the error-related negativity. Psychological Review. 2002;109:679–709. doi: 10.1037/0033-295X.109.4.679.
- Holroyd CB, McClure SM. Hierarchical control over effortful behavior by rodent medial frontal cortex: A computational model. Psychological Review. 2015;122:54–83. doi: 10.1037/a0038339.
- Holroyd CB, Yeung N. Motivation of extended behaviors by anterior cingulate cortex. Trends in Cognitive Sciences. 2012;16:122–128. doi: 10.1016/j.tics.2011.12.008.
- Hosking JG, Floresco SB, Winstanley CA. Dopamine Antagonism Decreases Willingness to Expend Physical, but not Cognitive, Effort: a Comparison of Two Rodent Cost/Benefit Decision-Making Tasks. Neuropsychopharmacology. 2014;40:1005–1015. doi: 10.1038/npp.2014.285.
- Howe MW, Tierney PL, Sandberg SG, Phillips PEM, Graybiel AM. Prolonged dopamine signalling in striatum signals proximity and value of distant rewards. Nature. 2013;500:575–579. doi: 10.1038/nature12475.
- Inzlicht M, Schmeichel BJ, Macrae CN. Why self-control seems (but may not be) limited. Trends in Cognitive Sciences. 2014;18:127–133. doi: 10.1016/j.tics.2013.12.009.
- Jimura K, Locke HS, Braver TS. Prefrontal cortex mediation of cognitive enhancement in rewarding motivational contexts. Proceedings of the National Academy of Sciences. 2010;107:8871–8876. doi: 10.1073/pnas.1002007107.
- Joel D, Niv Y, Ruppin E. Actor–critic models of the basal ganglia: New anatomical and computational perspectives. Neural Networks. 2002;15:535–547. doi: 10.1016/s0893-6080(02)00047-3.
- Kahnt T, Weber SC, Haker H, Robbins TW, Tobler PN. Dopamine D2-Receptor Blockade Enhances Decoding of Prefrontal Signals in Humans. Journal of Neuroscience. 2015;35:4104–4111. doi: 10.1523/JNEUROSCI.4182-14.2015.
- Kayser AS, Mitchell JM, Weinstein D, Frank MJ. Dopamine, locus of control, and the exploration-exploitation tradeoff. Neuropsychopharmacology. 2014;40:454–462. doi: 10.1038/npp.2014.193.
- Kennerley SW, Behrens TEJ, Wallis JD. Double dissociation of value computations in orbitofrontal and anterior cingulate neurons. Nature Neuroscience. 2011;14:1581–1589. doi: 10.1038/nn.2961. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kennerley SW, Dahmubed AF, Lara AH, Wallis JD. Neurons in the frontal lobe encode the value of multiple decision variables. Journal of Cognitive Neuroscience. 2009;21:1162–1178. doi: 10.1162/jocn.2009.21100. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kodama T, Hikosaka K, Honda Y, Kojima T, Watanabe M. Higher dopamine release induced by less rather than more preferred reward during a working memory task in the primate prefrontal cortex. Behavioural Brain Research. 2014;266:104–107. doi: 10.1016/j.bbr.2014.02.009. [DOI] [PubMed] [Google Scholar]
- Kool W, McGuire JT, Rosen ZB, Botvinick MM. Decision making and the avoidance of cognitive demand. Journal of Experimental Psychology-General. 2010;139:665–682. doi: 10.1037/a0020198. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kouneiher F, Charron S, Koechlin E. Motivation and cognitive control in the human prefrontal cortex. Nature Neuroscience. 2009;12:939–945. doi: 10.1038/nn.2321. [DOI] [PubMed] [Google Scholar]
- Krabbe S, Duda J, Schiemann J, Poetschke C, Schneider G, Kandel ER, Liss B, Roeper J, Simpson EH. Increased dopamine D2 receptor activity in the striatum alters the firing pattern of dopamine neurons in the ventral tegmental area. Proceedings of the National Academy of Sciences. 2015;112:E1498–E1506. doi: 10.1073/pnas.1500450112. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Krebs RM, Boehler CN, Roberts KC, Song AW, Woldorff MG. The involvement of the dopaminergic midbrain and cortico-striatal-thalamic circuits in the integration of reward prospect and attentional task demands. Cerebral Cortex. 2012;22:607–615. doi: 10.1093/cercor/bhr134. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kroemer NB, Guevara A, Teodorescu IC, Wuttig F, Kobiella A, Smolka MN. Balancing reward and work: Anticipatory brain activation in NAcc and VTA predict effort differentially. Neuroimage. 2014;102:510–519. doi: 10.1016/j.neuroimage.2014.07.060. [DOI] [PubMed] [Google Scholar]
- Kurniawan IT, Guitart-Masip M, Dolan RJ. Dopamine and effort-based decision making. Frontiers in Neuroscience. 2011;5 doi: 10.3389/fnins.2011.00081. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kurniawan IT, Guitart-Masip M, Dayan P, Dolan RJ. Effort and Valuation in the Brain: The Effects of Anticipation and Execution. Journal of Neuroscience. 2013;33:6160–6169. doi: 10.1523/JNEUROSCI.4777-12.2013. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kurniawan IT, Seymour B, Talmi D, Yoshida W, Chater N, Dolan RJ. Choosing to Make an Effort: The Role of Striatum in Signaling Physical Effort of a Chosen Action. Journal of Neurophysiology. 2010;104:313–321. doi: 10.1152/jn.00027.2010. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kurzban R, Duckworth A, Kable JW, Myers J. An opportunity cost model of subjective effort and task performance. Behavioral and Brain Sciences. 2013;36:661–679. doi: 10.1017/S0140525X12003196. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lee AM, Tai LH, Zador A, Wilbrecht L. Between the primate and “reptilian” brain: Rodent models demonstrate the role of corticostriatal circuits in decision making. Neuroscience. 2015;296:66–74. doi: 10.1016/j.neuroscience.2014.12.042. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Li J, Daw ND. Signals in Human Striatum Are Appropriate for Policy Update Rather than Value Prediction. Journal of Neuroscience. 2011;31:5504–5511. doi: 10.1523/JNEUROSCI.6316-10.2011. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Locke HS, Braver TS. Motivational influences on cognitive control: Behavior, brain activation, and individual differences. Cognitive, Affective, & Behavioral Neuroscience. 2008;8:99–112. doi: 10.3758/cabn.8.1.99. [DOI] [PubMed] [Google Scholar]
- Ma L, Hyman JM, Phillips AG, Seamans JK. Tracking Progress toward a Goal in Corticostriatal Ensembles. Journal of Neuroscience. 2014;34:2244–2253. doi: 10.1523/JNEUROSCI.3834-13.2014. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Massar SAA, Libedinsky C, Weiyan C, Huettel SA, Chee MWL. Separate and overlapping brain areas encode subjective value during delay and effort discounting. Neuroimage. 2015;36:899–904. doi: 10.1016/j.neuroimage.2015.06.080. [DOI] [PubMed] [Google Scholar]
- Matsumoto M, Takada M. Distinct Representations of Cognitive and Motivational Signals in Midbrain Dopamine Neurons. Neuron. 2013;79:1011–1024. doi: 10.1016/j.neuron.2013.07.002. [DOI] [PubMed] [Google Scholar]
- McClure SM, Daw ND, Read Montague P. A computational substrate for incentive salience. Trends in Neurosciences. 2003;26:423–428. doi: 10.1016/s0166-2236(03)00177-2. [DOI] [PubMed] [Google Scholar]
- McGinty VB, Lardeux S, Taha SA, Kim JJ, Nicola SM. Invigoration of Reward Seeking by Cue and Proximity Encoding in the Nucleus Accumbens. Neuron. 2013;78:910–922. doi: 10.1016/j.neuron.2013.04.010. [DOI] [PMC free article] [PubMed] [Google Scholar]
- McGuire JT, Botvinick MM. Prefrontal cortex, cognitive control, and the registration of decision costs. Proceedings of the National Academy of Sciences. 2010;107:7922. doi: 10.1073/pnas.0910662107. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Meyniel F, Sergent C, Rigoux L, Daunizeau J, Pessiglione M. Neurocomputational account of how the human brain decides when to have a break. Proceedings of the National Academy of Sciences. 2013;110:2641–2646. doi: 10.1073/pnas.1211925110. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Miller EK, Cohen JD. An integrative theory of prefrontal cortex function. Annual Review of Neuroscience. 2001;24:167–202. doi: 10.1146/annurev.neuro.24.1.167. [DOI] [PubMed] [Google Scholar]
- Mogenson G, Jones D, Yim C. From motivation to action: Functional interface between the limbic system and the motor system. Progress in Neurobiology. 1980;14:69–97. doi: 10.1016/0301-0082(80)90018-0. [DOI] [PubMed] [Google Scholar]
- Montague P, Dayan P, Sejnowski T. A framework for mesencephalic dopamine systems based on predictive Hebbian learning. The Journal of Neuroscience. 1996;16:1936. doi: 10.1523/JNEUROSCI.16-05-01936.1996. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Nicola SM. The Flexible Approach Hypothesis: Unification of Effort and Cue-Responding Hypotheses for the Role of Nucleus Accumbens Dopamine in the Activation of Reward-Seeking Behavior. Journal of Neuroscience. 2010;30:16585–16600. doi: 10.1523/JNEUROSCI.3958-10.2010. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Nicola SM. The nucleus accumbens as part of a basal ganglia action selection circuit. Psychopharmacology. 2007;191:521–550. doi: 10.1007/s00213-006-0510-4. [DOI] [PubMed] [Google Scholar]
- Nicola SM, Woodward Hopf F, Hjelmstad GO. Contrast enhancement: a physiological effect of striatal dopamine? Cell and Tissue Research. 2004;318:93–106. doi: 10.1007/s00441-004-0929-z. [DOI] [PubMed] [Google Scholar]
- Niv Y. Reinforcement Learning in the Brain. Journal of Mathematical Psychology. 2009;53:139–154. [Google Scholar]
- Niv Y, Daw N, Joel D, Dayan P. Tonic dopamine: opportunity costs and the control of response vigor. Psychopharmacology. 2007;191:507–520. doi: 10.1007/s00213-006-0502-4. [DOI] [PubMed] [Google Scholar]
- Nymberg C, Banaschewski T, Bokde AL, Buchel C, Conrod P, Flor H, Frouin V, Garavan H, Gowland P, Heinz A. DRD2/ANKK1 polymorphism modulates the effect of ventral striatal activation on working memory performance. Neuropsychopharmacology. 2014 doi: 10.1038/npp.2014.83. [DOI] [PMC free article] [PubMed] [Google Scholar]
- O’Reilly RC, Frank MJ. Making working memory work: a computational model of learning in the prefrontal cortex and basal ganglia. Neural Computation. 2006;18:283–328. doi: 10.1162/089976606775093909. [DOI] [PubMed] [Google Scholar]
- O’Reilly RC, Hazy TE, Mollick J, Mackie P, Herd S. Goal-driven cognition in the brain: a computational framework. arXiv.org. q-bio.NC. 2014 [Google Scholar]
- Padoa-Schioppa C. Neurobiology of Economic Choice: A Good-Based Model. Annual Review of Neuroscience. 2011;34:333–359. doi: 10.1146/annurev-neuro-061010-113648.
- Parvizi J, Rangarajan V, Shirer WR, Desai N, Greicius MD. The Will to Persevere Induced by Electrical Stimulation of the Human Cingulate Gyrus. Neuron. 2013;80:1359–1367. doi: 10.1016/j.neuron.2013.10.057.
- Pasquereau B, Turner RS. Limited Encoding of Effort by Dopamine Neurons in a Cost-Benefit Trade-off Task. Journal of Neuroscience. 2013;33:8288–8300. doi: 10.1523/JNEUROSCI.4619-12.2013.
- Pessoa L, Engelmann JB. Embedding reward signals into perception and cognition. Frontiers in Neuroscience. 2010;4. doi: 10.3389/fnins.2010.00017.
- Phillips AG, Vacca G, Ahn S. A top-down perspective on dopamine, motivation and memory. Pharmacology Biochemistry and Behavior. 2008;90:236–249. doi: 10.1016/j.pbb.2007.10.014.
- Prévost C, Pessiglione M, Météreau E, Cléry-Melin M-L, Dreher J-C. Separate valuation subsystems for delay and effort decision costs. Journal of Neuroscience. 2010;30:14080–14090. doi: 10.1523/JNEUROSCI.2752-10.2010.
- Raichle ME, Mintun M. Brain Work and Brain Imaging. Annual Review of Neuroscience. 2006;29:449–476. doi: 10.1146/annurev.neuro.29.051605.112819.
- Ribas-Fernandes JJF, Solway A, Diuk C, McGuire JT, Barto AG, Niv Y, Botvinick MM. A Neural Signature of Hierarchical Reinforcement Learning. Neuron. 2011;71:370–379. doi: 10.1016/j.neuron.2011.05.042.
- Riggall AC, Postle BR. The Relationship between Working Memory Storage and Elevated Activity as Measured with Functional Magnetic Resonance Imaging. Journal of Neuroscience. 2012;32:12990–12998. doi: 10.1523/JNEUROSCI.1892-12.2012.
- Robbins T, Arnsten A. The Neuropsychopharmacology of Fronto-Executive Function: Monoaminergic Modulation. Annual Review of Neuroscience. 2009;32:267–287. doi: 10.1146/annurev.neuro.051508.135535.
- Roesch MR, Singh T, Brown PL, Mullins SE, Schoenbaum G. Ventral Striatal Neurons Encode the Value of the Chosen Action in Rats Deciding between Differently Delayed or Sized Rewards. Journal of Neuroscience. 2009;29:13365–13376. doi: 10.1523/JNEUROSCI.2572-09.2009.
- Salamone JD, Correa M. The Mysterious Motivational Functions of Mesolimbic Dopamine. Neuron. 2012;76:470–485. doi: 10.1016/j.neuron.2012.10.021.
- Salamone JD, Correa M, Nunes EJ, Randall PA, Pardo M. The Behavioral Pharmacology of Effort-related Choice Behavior: Dopamine, Adenosine and Beyond. Journal of the Experimental Analysis of Behavior. 2012;97:125–146. doi: 10.1901/jeab.2012.97-125.
- Samanez-Larkin GR, Buckholtz JW, Cowan RL, Woodward ND, Li R, Ansari MS, Arrington CM, Baldwin RM, Smith CE, Treadway MT, et al. A Thalamocorticostriatal Dopamine Network for Psychostimulant-Enhanced Human Cognitive Flexibility. Biological Psychiatry. 2013;74:99–105. doi: 10.1016/j.biopsych.2012.10.032.
- Sara SJ. The locus coeruleus and noradrenergic modulation of cognition. Nature Reviews Neuroscience. 2009;10:211–223. doi: 10.1038/nrn2573.
- Satterthwaite TD, Ruparel K, Loughead J, Elliott MA, Gerraty RT, Calkins ME, Hakonarson H, Gur RC, Gur RE, Wolf DH. Being right is its own reward: Load and performance related ventral striatum activation to correct responses during a working memory task in youth. Neuroimage. 2012;61:723–729. doi: 10.1016/j.neuroimage.2012.03.060.
- Saunders B, Inzlicht M. Vigour and Fatigue: How Variation in Affect Underlies Effective Self-Control. In: Braver TS, editor. Motivation and Cognitive Control. Taylor & Francis/Routledge; 2015.
- Schmidt L, Lebreton M, Cléry-Melin M-L, Daunizeau J, Pessiglione M. Neural Mechanisms Underlying Motivation of Mental Versus Physical Effort. PLoS Biology. 2012;10:e1001266. doi: 10.1371/journal.pbio.1001266.
- Schouppe N, Demanet J, Boehler CN, Ridderinkhof KR, Notebaert W. The Role of the Striatum in Effort-Based Decision-Making in the Absence of Reward. Journal of Neuroscience. 2014;34:2148–2154. doi: 10.1523/JNEUROSCI.1214-13.2014.
- Schouppe N, De Houwer J, Richard Ridderinkhof K, Notebaert W. Conflict: Run! Reduced Stroop interference with avoidance responses. The Quarterly Journal of Experimental Psychology. 2012;65:1052–1058. doi: 10.1080/17470218.2012.685080.
- Schultz W, Dayan P, Montague PR. A neural substrate of prediction and reward. Science. 1997;275:1593–1599. doi: 10.1126/science.275.5306.1593.
- Schweimer J, Hauber W. Dopamine D1 receptors in the anterior cingulate cortex regulate effort-based decision making. Learning & Memory. 2006;13:777–782. doi: 10.1101/lm.409306.
- Schweimer J, Saft S, Hauber W. Involvement of Catecholamine Neurotransmission in the Rat Anterior Cingulate in Effort-Related Decision Making. Behavioral Neuroscience. 2005;119:1687–1692. doi: 10.1037/0735-7044.119.6.1687.
- Seamans JK, Yang CR. The principal features and mechanisms of dopamine modulation in the prefrontal cortex. Progress in Neurobiology. 2004;74:1–58. doi: 10.1016/j.pneurobio.2004.05.006.
- Shackman AJ, Salomons TV, Slagter HA, Fox AS, Winter JJ, Davidson RJ. The integration of negative affect, pain and cognitive control in the cingulate cortex. Nature Reviews Neuroscience. 2011;12:154–167. doi: 10.1038/nrn2994.
- Shenhav A, Botvinick MM, Cohen JD. The Expected Value of Control: An Integrative Theory of Anterior Cingulate Cortex Function. Neuron. 2013;79:217–240. doi: 10.1016/j.neuron.2013.07.007.
- Shiner T, Symmonds M, Guitart-Masip M, Fleming SM, Friston KJ, Dolan RJ. Dopamine, Salience, and Response Set Shifting in Prefrontal Cortex. Cerebral Cortex. 2015;25:3629–3639. doi: 10.1093/cercor/bhu210.
- Skvortsova V, Palminteri S, Pessiglione M. Learning To Minimize Efforts versus Maximizing Rewards: Computational Principles and Neural Correlates. Journal of Neuroscience. 2014;34:15621–15630. doi: 10.1523/JNEUROSCI.1350-14.2014.
- Small DM. Monetary Incentives Enhance Processing in Brain Regions Mediating Top-down Control of Attention. Cerebral Cortex. 2005;15:1855–1865. doi: 10.1093/cercor/bhi063.
- Spencer RC, Devilbiss DM, Berridge CW. The Cognition-Enhancing Effects of Psychostimulants Involve Direct Action in the Prefrontal Cortex. Biological Psychiatry. 2015;77:940–950. doi: 10.1016/j.biopsych.2014.09.013.
- Spunt RP, Lieberman MD, Cohen JR, Eisenberger NI. The phenomenology of error processing: the dorsal ACC response to stop-signal errors tracks reports of negative affect. Journal of Cognitive Neuroscience. 2012;24:1753–1765. doi: 10.1162/jocn_a_00242.
- Steullet P, Cabungcal J-H, Cuenod M, Do KQ. Fast oscillatory activity in the anterior cingulate cortex: dopaminergic modulation and effect of perineuronal net loss. Frontiers in Cellular Neuroscience. 2014;8. doi: 10.3389/fncel.2014.00244.
- Strauss GP, Morra LF, Sullivan SK, Gold JM. The role of low cognitive effort and negative symptoms in neuropsychological impairment in schizophrenia. Neuropsychology. 2015;29:282–291. doi: 10.1037/neu0000113.
- Von Stumm S, Hell B, Chamorro-Premuzic T. The Hungry Mind: Intellectual Curiosity Is the Third Pillar of Academic Performance. Perspectives on Psychological Science. 2011;6:574–588. doi: 10.1177/1745691611421204.
- Tai L-H, Lee AM, Benavidez N, Bonci A, Wilbrecht L. Transient stimulation of distinct subpopulations of striatal neurons mimics changes in action value. Nature Neuroscience. 2012;15:1281–1289. doi: 10.1038/nn.3188.
- Treadway MT, Buckholtz JW, Cowan RL, Woodward ND, Li R, Ansari MS, Baldwin RM, Schwartzman AN, Kessler RM, Zald DH. Dopaminergic Mechanisms of Individual Differences in Human Effort-Based Decision-Making. Journal of Neuroscience. 2012;32:6170–6176. doi: 10.1523/JNEUROSCI.6459-11.2012.
- Trifilieff P, Feng B, Urizar E, Winiger V, Ward RD, Taylor KM, Martinez D, Moore H, Balsam PD, Simpson EH, et al. Increasing dopamine D2 receptor expression in the adult nucleus accumbens enhances motivation. Molecular Psychiatry. 2013;18:1025–1033. doi: 10.1038/mp.2013.57.
- van der Meer MA, Redish AD. Ventral striatum: a critical look at models of learning and evaluation. Current Opinion in Neurobiology. 2011;21:387–392. doi: 10.1016/j.conb.2011.02.011.
- van Holstein M, Aarts E, van der Schaaf ME, Geurts DEM, Verkes RJ, Franke B, van Schouwenburg MR, Cools R. Human cognitive flexibility depends on dopamine D2 receptor signaling. Psychopharmacology. 2011;218:567–578. doi: 10.1007/s00213-011-2340-2.
- van Schouwenburg M, Aarts E, Cools R. Dopaminergic modulation of cognitive control: distinct roles for the prefrontal cortex and the basal ganglia. Current Pharmaceutical Design. 2010;16:2026–2032. doi: 10.2174/138161210791293097.
- Varazzani C, San-Galli A, Gilardeau S, Bouret S. Noradrenaline and Dopamine Neurons in the Reward/Effort Trade-Off: A Direct Electrophysiological Comparison in Behaving Monkeys. Journal of Neuroscience. 2015;35:7866–7877. doi: 10.1523/JNEUROSCI.0454-15.2015.
- Vassena E, Silvetti M, Boehler CN, Achten E, Fias W, Verguts T. Overlapping Neural Systems Represent Cognitive Effort and Reward Anticipation. PLoS One. 2014;9:e91008. doi: 10.1371/journal.pone.0091008.
- Vijayraghavan S, Wang M, Birnbaum SG, Williams GV, Arnsten AFT. Inverted-U dopamine D1 receptor actions on prefrontal neurons engaged in working memory. Nature Neuroscience. 2007;10:376–384. doi: 10.1038/nn1846.
- Volkow ND, Wang G-J, Newcorn JH, Kollins SH, Wigal TL, Telang F, Fowler JS, Goldstein RZ, Klein N, Logan J, et al. Motivation Deficit in ADHD is Associated with Dysfunction of the Dopamine Reward Pathway. Molecular Psychiatry. 2010;16:1147–1154. doi: 10.1038/mp.2010.97.
- Walton ME, Groves J, Jennings KA, Croxson PL, Sharp T, Rushworth MFS, Bannerman DM. Comparing the role of the anterior cingulate cortex and 6-hydroxydopamine nucleus accumbens lesions on operant effort-based decision making. European Journal of Neuroscience. 2009;29:1678–1691. doi: 10.1111/j.1460-9568.2009.06726.x.
- Ward RD, Winiger V, Higa KK, Kahn JB, Kandel ER, Balsam PD, Simpson EH. The impact of motivation on cognitive performance in an animal model of the negative and cognitive symptoms of schizophrenia. Behavioral Neuroscience. 2015;129:292–299. doi: 10.1037/bne0000051.
- Westbrook A, Kester D, Braver TS. What Is the Subjective Cost of Cognitive Effort? Load, Trait, and Aging Effects Revealed by Economic Preference. PLoS One. 2013;8:e68210. doi: 10.1371/journal.pone.0068210.
- Wickens JR, Horvitz JC, Costa RM, Killcross S. Dopaminergic Mechanisms in Actions and Habits. Journal of Neuroscience. 2007;27:8181–8183. doi: 10.1523/JNEUROSCI.1671-07.2007.
- Wunderlich K, Smittenaar P, Dolan RJ. Dopamine Enhances Model-Based over Model-Free Choice Behavior. Neuron. 2012;75:418–424. doi: 10.1016/j.neuron.2012.03.042.
- Zhang J, Berridge KC, Tindell AJ, Smith KS, Aldridge JW. A Neural Computational Model of Incentive Salience. PLoS Computational Biology. 2009;5:e1000437. doi: 10.1371/journal.pcbi.1000437.