Abstract
Extensive evidence implicates the ventral striatum in multiple distinct facets of action selection. Early work established a role in modulating ongoing behavior, as engaged by the energizing and directing influences of motivationally relevant cues and the willingness to expend effort to obtain reward. More recently, reinforcement learning models have cast the ventral striatum primarily as an evaluation step during learning, serving as a “critic” that trains a separate “actor”. Recent computational and experimental work may resolve the differences between these two theories through a careful parsing of behavior and of the intrinsic heterogeneity that characterizes this complex structure.
Keywords: Nucleus accumbens, dopamine, actor-critic, reinforcement learning, motivation, decision-making
Decision making – or action selection, as it is historically referred to in the striatal literature – typically depends both on past experience (learning) and on current motivational state. Early conceptualizations of ventral striatal function emphasized its role in mediating the latter: Mogenson et al. (1) envisioned the nucleus accumbens as a pathway from motivation to action, a notion congruent with decision-related proposals such as overcoming the effort required to obtain reward (2), incentive salience (3), and mediating the impact of behaviorally relevant (4) or temporally unpredictable cues (5). These proposals share the view that the ventral striatum influences action selection at the time of decision, broadly taken to include making a choice between options as well as the initiation or interruption of behavior. In contrast, more recent suggestions that the ventral striatum is part of the “critic” component of an “actor-critic” temporal-difference reinforcement learning (TDRL) implementation cast it as enabling learning, but not as itself involved at the time of decision (6). New computational models based on online evaluation processes may resolve these differences, suggesting that the “critic” can contribute to certain decisions at the time of decision as well as playing a longer-term learning role.
Ventral striatum as a reinforcement learning critic
TDRL models have been remarkably successful in predicting decision-related neural activity based on internal model parameters inferred from behavioral fits in human, non-human primate, and rodent studies (7, 8). TDRL models assume the world is divided into distinct situations or “states”: these states can change when important events happen in the world (such as a lever being presented to the subject) or through actions taken by the decision-maker or agent (such as pressing that lever) (9–11). Certain states result in reward delivery, while other states merely reflect categorizations of the situation (12). Because state space representations are internal to the agent, they need not be restricted to categorizations of the environment, but may also include predictive, working memory, or action history components (13, 14). Overall, the agent aims to maximize its reward and minimize its punishment by learning from experience which actions to take in any given state. The essential feature of the critic in the actor-critic model is that it provides a training signal the actor uses to learn which actions to take. The critic accomplishes this by maintaining a value function across states that reflects the expected future reward from each state. When a state transition happens, the critic reports the difference between observed and expected value (the value prediction error), which can then be used to train a separate actor. This can be seen, for instance, in the classic recordings from Schultz and colleagues of dopaminergic neurons in the ventral tegmental area (VTA), which learn to signal a prediction error upon appearance of a reward-predictive stimulus, even though that stimulus is not in itself a primary reward (15). Thus, not only do internal variables derived from TDRL models appear to fit neural signals well, but the TDRL language of states, values, and actions provides a way to formulate explicit theories that deal with learning, decision-making, and reward.
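The actor-critic scheme described above can be made concrete in a few lines of code. The following is a minimal, illustrative sketch only – the chain of states, reward size, and learning rates are all assumptions for demonstration, not parameters from any study discussed here. The key point is that a single value prediction error, computed by the critic, drives updates to both the critic’s value function and the separate actor:

```python
import math
import random

# Minimal tabular actor-critic sketch (illustrative assumptions throughout).
# States 0..3 form a chain; reaching state 3 delivers reward and ends the trial.
random.seed(0)

N_STATES = 4
ALPHA, BETA, GAMMA = 0.2, 0.2, 0.9     # critic rate, actor rate, discount factor

V = [0.0] * N_STATES                           # critic: value function over states
prefs = [[0.0, 0.0] for _ in range(N_STATES)]  # actor: preferences for {stay, advance}

def softmax_choice(p):
    """Sample an action with probability proportional to exponentiated preference."""
    weights = [math.exp(x) for x in p]
    r = random.random() * sum(weights)
    for action, w in enumerate(weights):
        r -= w
        if r <= 0:
            return action
    return len(weights) - 1

for trial in range(1000):
    s = 0
    while s != N_STATES - 1:
        a = softmax_choice(prefs[s])
        s_next = s + 1 if a == 1 else s
        terminal = s_next == N_STATES - 1
        r = 1.0 if terminal else 0.0
        # Critic computes the value prediction error (TD error)...
        delta = r + (0.0 if terminal else GAMMA * V[s_next]) - V[s]
        V[s] += ALPHA * delta        # ...updates its own value function...
        prefs[s][a] += BETA * delta  # ...and the same error trains the separate actor.
        s = s_next
```

After training, the critic’s values rise along the path to reward (discounted expected future reward), and the actor has learned to prefer the advancing action – the division of labor the actor-critic theory attributes to VTA-trained ventral striatal (critic) and dorsal striatal (actor) circuits.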
An influential proposal about the biological implementation of TDRL has suggested the ventral striatum implements the critic, encoding the value (expected future reward) and (actual) reward information necessary for the calculation of prediction error signals (6, 11, 16). In support of this idea, ventral striatum is strongly coupled to the VTA, potentially allowing VTA access to value and reward signals from the ventral striatum to compute prediction errors, and vice versa. Three main experimental approaches have been brought to bear on testing for the presence of such signals in the ventral striatum: measuring cerebral blood flow with functional magnetic resonance imaging (fMRI), dopamine concentrations using fast-scan cyclic voltammetry (FSCV), and spiking and field potential activity using electrophysiological recording.
Although the generality of the prediction error interpretation of VTA neuron activity remains controversial (17, 18), voltammetry studies have found value-prediction error signals in the dopamine levels in ventral striatum during behavioral tasks (19, 20), as predicted from dopamine neuron activity in the VTA (15). fMRI studies reliably report reward prediction error signaling in the ventral striatum (21–23), thought in part, although not entirely, to arise from VTA input (24, 25). Thus, in accordance with the actor-critic model, ventral striatum appears to have access to prediction error signal inputs. Unit recording studies in rats have looked for, but not reliably found, prediction error coding in ventral striatal spiking activity (26, 27), suggesting that this signal is transformed by ventral striatal processing. The actor-critic model suggests this transformation should result in a value signal.
Different experimental settings have been used to identify potential value signals in the ventral striatum. One possible candidate is a population of anticipatory “ramping” neurons, which gradually increase their firing rate when approaching or waiting for reward delivery (28, 29). Because future rewards are discounted, this pattern is similar to that expected of a critic state value function (9). Khamassi et al. (28) explicitly attempted to fit the firing patterns of such ramping neurons with TDRL models. They found the relationship to be mixed, with many neurons ramping up to some, but not other, reward deliveries, unlike what the critic theory would predict. By assuming fragmented state spaces – essentially allowing the agent to be confused about the true state of the world – they could reproduce TDRL value functions similar to the data. This interpretation highlights an important issue in the application of TDRL models to behavior and neural data: we may not know the true state space used by the organism (12, 30).
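The discounting intuition behind the ramp interpretation can be stated compactly: if a state k steps before reward has value γ^k·R, the critic’s value function necessarily rises as reward approaches. A toy calculation (γ and R are assumed numbers, purely for illustration):

```python
# Discounted value along a linear path to reward: with k steps remaining,
# V = R * GAMMA**k, producing an anticipatory "ramp" as k counts down.
GAMMA, R = 0.9, 1.0
steps_remaining = range(10, -1, -1)           # approaching reward: 10 steps down to 0
ramp = [R * GAMMA ** k for k in steps_remaining]
# ramp rises monotonically and reaches R at the moment of reward
```

A critic over a fragmented state space, as in Khamassi et al. (28), would produce several such ramps anchored to different (possibly confused) reward states, which is why some but not all reward deliveries may be preceded by ramping.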
Value-related signals can also be related to actions taken by the agent. Direct representations of “action values” are likely part of the actor, not the critic, but actions likely result in state transitions, which would be reflected in the critic’s value representations. Internal (covert) preparatory state transitions may also be reflected in the critic’s value signal, potentially producing what appear to be pre-action signals in the critic. There is substantial fMRI evidence for state value representations in the ventral striatum (31–33), but the data from unit recording studies have been less consistent. Ito and Doya (27) applied a comprehensive analysis of ventral striatal neural correlates using TDRL models fit to behavior on a choice task in which action/outcome value could be dissociated from actions. Although they found a statistically significant population of neurons encoding the value of the upcoming outcome, this signal did not have a clear time course around the time of the decision, and the percentage of neurons involved was small (<10%). Similarly, Kim et al. (34, 35) and Kimchi and Laubach (36) found that ventral striatal activity contained little information about upcoming behavioral choice. In contrast, Roesch et al. (26) found a population of ventral striatal neurons (also <10%) coding for the value and direction of chosen actions, before the action was initiated. Similar populations have been found elsewhere in the striatum (34–36).
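The distinction between the actor’s action values and the critic’s state value can be stated compactly: an action value Q(s,a) scores each available action, while the state value V(s) is the expectation of those scores under the current policy. A toy two-action example (action names and numbers are assumed for illustration):

```python
import math

# Actor-side action values for a single decision state (illustrative numbers)
Q = {"left": 0.2, "right": 0.8}

# Softmax policy derived from the action values
z = sum(math.exp(q) for q in Q.values())
policy = {a: math.exp(q) / z for a, q in Q.items()}

# Critic-side state value: expected action value under the policy
V = sum(policy[a] * Q[a] for a in Q)
```

V lands between the two action values, weighted toward the preferred action – which is why critic-like state value signals can correlate with chosen-action value even without the critic representing actions directly.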
Finally, reward-related cues or state transitions occurring in the environment unrelated to the agent’s actions have long been known to trigger ventral striatal activity (37, 38), including responses to reward delivery (39). Consistent with the actor-critic TDRL formulation, the development of firing to reward-predictive cues in ventral striatum depends on dopaminergic input (40). In further support of the theory, there is evidence that ventral striatal firing to reward-predictive cues (26) and to rewards themselves (39) is modulated by value, although these effects have not been systematically distinguished from motivation or palatability (26, 39).
Thus, experimental recording and imaging studies such as those above face a number of challenges in relating theoretical concepts to the data. First, many of the signals of interest – including prediction errors and value signals – are often correlated, and so require specific experimental designs for disentanglement (e.g. by including appetitive and aversive reinforcers (41)). Second, the ventral striatum forms an interconnected network with a number of limbic areas, including the orbitofrontal cortex and the amygdala, in which similar value and prediction error signals appear to be present (23, 42). Third, the same behavioral task may be accomplished by different decision-making systems, with radically different information processing needs, which would change the expected action-selectivity of neural firing (43, 44). Finally, the ventral striatum is a heterogeneous structure with numerous anatomical and functional dissociations between, for instance, the core and shell subregions, as well as electrical and neuromodulatory input gradients and complex receptor expression patterns (45, 46). We discuss recent progress related to the last two issues next.
Functional heterogeneity: defining ventral striatal processing units
The well-known anatomical and functional heterogeneity of the ventral striatum (45, 46) contributes to the diversity and fragmentation of recording data; see for instance (47, 48) for recent examples that map known differences between the core and shell subregions onto neural activity. Such differences have not yet been systematically related to value signals in the ventral striatum. However, voltammetry studies of the dopamine input to the ventral striatum suggest that there will likely be important differences. For example, Aragona et al. (49) found that cocaine sensitization of the dopamine signal on drug receipt appeared in the ventral striatal shell, while the dopamine signal that developed to the predictive cue appeared in the ventral striatal core. Even within these subregions, the dopamine signal may not be unitary (50).
The ventral striatum receives a number of convergent inputs, many of which overlap at the population and single-neuron level (45, 51, 52) and are thought to define functional subunits (53). In a recent demonstration of this, van der Meer and Redish (29) found that anticipatory “ramp” cells showed theta phase precession relative to the hippocampal theta rhythm. This systematic spike timing is thought to be important for the rapid encoding of sequences in the hippocampus, suggesting that its extension to value-related signals in the ventral striatum may implement associations between places and rewards. In support of this idea, the projection from the hippocampus to the ventral striatal shell is known to be important for place-reward learning (54), and reward-related cells in the ventral striatum are more likely to exhibit coherent off-line “replay” with the hippocampus (55). A different example comes from a recent set of studies documenting the properties of gamma oscillations in the ventral striatum. It was found that “low” (40–60 Hz) and “high” (70–100 Hz) gamma oscillations not only had distinct behavioral correlates, but were also associated with distinct populations of putative fast-spiking interneurons (FSIs) (56–58). Low and high gamma oscillations displayed a distinctive “switching” pattern, suggesting the possibility that different ventral striatal network states, driven by distinct FSI populations, may form transient functional connections with different inputs, as indicated by coherence or synchrony across areas (59). Thus, simultaneous recording from multiple structures provides an opportunity not only for electrophysiological identification of likely ventral striatal targets of specific afferents, but also for an examination of the computations implemented by these projections.
Behavioral heterogeneity: same overt behavior, different underlying processing
Current reinforcement learning theories suggest that tasks can be solved by two distinct processing mechanisms (43, 60, 61). In the “model-based” system, the agent is able to search through a representation of the potential consequences of its actions and evaluate those representations online, during the decision-making process itself; in “model-free” systems such as the actor-critic, the agent makes decisions based only on the representation of the state of the world at the moment (which, as noted above, may be complex, e.g. including working memory). The notion of model-free and model-based controllers rests on a large body of evidence for dissociable learning and decision-making systems in the brain (62–64); a key insight from these studies is that what overtly appears to be very similar behavior (such as pressing a lever for food) may actually depend on different neural substrates depending on, for instance, the amount of training on the task. Which system is in control is not immediately clear from overt behavior, but may be revealed by judicious probe trials or detailed behavioral analysis.
A prominent example is sensitivity to reinforcer devaluation, which is interpreted as evidence that a “model-based” system – one with knowledge about the outcomes of actions that evaluates those outcomes dynamically – is in control of behavior (60, 64). Although there is evidence for the involvement of the ventral striatum in devaluation experiments in lever-pressing tasks, the pattern of results is complex (65, 66). However, recent studies in the rat appear to be converging on a role for the ventral striatum in mediating the effect of reward value (US) on responses to a predictive cue (CS) when the US is devalued (67, 68). Thus, in a given decision situation, the involvement of the ventral striatum in model-based decision-making may depend on the extent to which Pavlovian relationships and responses are congruent with the instrumental behavior (69, 70).
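Why devaluation dissociates the two controllers can be shown in a toy contrast; the action names, outcome structure, and utilities below are assumptions for illustration, not a model of any particular experiment. A model-free (cached) value persists after devaluation until retrained by direct experience, whereas a model-based evaluation, computed online from an action–outcome model, reflects the new outcome value immediately:

```python
# Toy contrast: cached (model-free) vs online (model-based) evaluation
# under reinforcer devaluation. All structure and numbers are illustrative.

action_outcome = {"press_lever": "food"}   # the agent's world model: action -> outcome
utility = {"food": 1.0}                    # current value of the outcome

cached_value = {"press_lever": 1.0}        # model-free: value stored during training

def model_based_value(action):
    # Model-based control: look up the expected outcome and evaluate it now.
    return utility[action_outcome[action]]

# Devalue the reinforcer (e.g., satiety or pairing food with illness).
utility["food"] = 0.0

mb = model_based_value("press_lever")      # drops immediately after devaluation
mf = cached_value["press_lever"]           # unchanged until further direct experience
```

Behavior driven by `mb` would stop pressing on a devaluation probe; behavior driven by `mf` would keep pressing – the behavioral signature used to infer which system is in control.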
Although to our knowledge neural activity in the ventral striatum has not been recorded during reinforcer devaluation, several recent studies have found ventral striatal firing during behaviors suggestive of “model-based” control. Krause et al. (48) found that ventral striatal neurons fired at the initiation of self-initiated movements towards places conditioned to be preferred by morphine injection (71). Recently, Nicola (72) found a striking contrast in which the ventral striatum was involved on “flexible approach” trials, where the rat’s starting location for reward approach differed from trial to trial and/or the rat may have been engaged in other activities (such as grooming), but not on “habitual approach” trials, where the rat executed a stereotyped response. Similarly, van der Meer and Redish (73) found that ventral striatal cells that normally fired during consummatory phases of reward showed increased activity shortly before decisions as rats engaged in vicarious-trial-and-error behaviors at a choice point. This effect subsided as the behaviors became more automated, and did not appear in dorsal striatal recordings on the same task (44).
Thus, the ventral striatum appears to be involved in aspects of both model-free and model-based or flexible behavior. The computations underlying these systems are thought to be quite different, but value signals play an important role in both: facilitating learning as the “critic” in actor-critic TDRL models, and evaluating predictions about the future in model-based systems. The idea that the ventral striatum can act as a critic over actual states (as required for TDRL learning) as well as over internally generated or hypothetical states (as may be required during flexible behaviors; (13, 14, 74)) may help reconcile its known involvement in learning in the model-free case with its involvement in performance in the model-based case (75).
Acknowledgments
Supported by NIH MH080318 (ADR) and the Department of Biology and Centre for Theoretical Neuroscience, University of Waterloo (MvdM).
References
- 1.Mogenson G, Jones D, Yim C. From motivation to action: functional interface between the limbic system and the motor system. Progress in Neurobiology. 1980;14:69–97. doi: 10.1016/0301-0082(80)90018-0.
- 2.Salamone JD, Correa M, Farrar AM, Nunes EJ, Pardo M. Dopamine, behavioral economics, and effort. Frontiers in Behavioral Neuroscience. 2009;3:13. doi: 10.3389/neuro.08.013.2009.
- 3.Berridge KC. The debate over dopamine’s role in reward: the case for incentive salience. Psychopharmacology (Berl). 2007:391–431. doi: 10.1007/s00213-006-0578-x.
- 4.Cardinal RN, Parkinson JA, Hall J, Everitt BJ. Emotion and motivation: the role of the amygdala, ventral striatum, and prefrontal cortex. Neuroscience and Biobehavioral Reviews. 2002;26:321–352. doi: 10.1016/s0149-7634(02)00007-6.
- 5.Nicola SM. The nucleus accumbens as part of a basal ganglia action selection circuit. Psychopharmacology. 2007;191:521–50. doi: 10.1007/s00213-006-0510-4.
- 6.Joel D, Niv Y, Ruppin E. Actor-critic models of the basal ganglia: new anatomical and computational perspectives. Neural Networks. 2002;15:535–47. doi: 10.1016/s0893-6080(02)00047-3.
- 7.O’Doherty JP, Hampton A, Kim H. Model-based fMRI and its application to reward learning and decision making. Annals of the New York Academy of Sciences. 2007;1104:35–53. doi: 10.1196/annals.1390.022.
- 8.Corrado G, Doya K. Understanding neural coding through the model-based analysis of decision making. The Journal of Neuroscience. 2007;27:8178–80. doi: 10.1523/JNEUROSCI.1590-07.2007.
- 9.Daw ND. Reinforcement learning models of the dopamine system and their behavioral implications. PhD thesis. Carnegie Mellon University; 2003.
- 10.Dayan P, Niv Y. Reinforcement learning: the good, the bad and the ugly. Current Opinion in Neurobiology. 2008;18:185–96. doi: 10.1016/j.conb.2008.08.003.
- 11.Maia TV. Reinforcement learning, conditioning, and the brain: successes and challenges. Cognitive, Affective & Behavioral Neuroscience. 2009;9:343–64. doi: 10.3758/CABN.9.4.343.
- 12.Redish AD, Jensen S, Johnson A, Kurth-Nelson Z. Reconciling reinforcement learning models with behavioral extinction and renewal: implications for addiction, relapse, and problem gambling. Psychological Review. 2007;114:784–805. doi: 10.1037/0033-295X.114.3.784.
- 13.O’Reilly RC, Frank MJ. Making working memory work: a computational model of learning in the prefrontal cortex and basal ganglia. Neural Computation. 2006;18:283–328. doi: 10.1162/089976606775093909.
- 14.Zilli EA, Hasselmo ME. Modeling the role of working memory and episodic memory in behavioral tasks. Hippocampus. 2008. doi: 10.1002/hipo.20382.
- 15.Schultz W, Dayan P, Montague PR. A neural substrate of prediction and reward. Science. 1997;275:1593–1599. doi: 10.1126/science.275.5306.1593.
- 16.Houk JC, Adams JL, Barto AG. A model of how the basal ganglia generate and use neural signals that predict reinforcement. In: Houk JC, Davis JL, Beiser DG, editors. Models of Information Processing in the Basal Ganglia. 1995. pp. 249–270.
- 17.Tobler PN, Kobayashi S. Electrophysiological correlates of reward processing in dopamine neurons. In: Dreher JC, Tremblay L, editors. Handbook of Reward and Decision Making. 2009. pp. 29–50.
- 18.Bromberg-Martin ES, Matsumoto M, Hikosaka O. Distinct tonic and phasic anticipatory activity in lateral habenula and dopamine neurons. Neuron. 2010;67:144–55. doi: 10.1016/j.neuron.2010.06.016.
- 19.Cheer JF, Aragona BJ, Heien MLAV, Seipel AT, Carelli RM, Wightman RM. Coordinated accumbal dopamine release and neural activity drive goal-directed behavior. Neuron. 2007;54:237–44. doi: 10.1016/j.neuron.2007.03.021.
- 20.Day JJ, Roitman MF, Wightman RM, Carelli RM. Associative learning mediates dynamic shifts in dopamine signaling in the nucleus accumbens. Nature Neuroscience. 2007;10:1020–8. doi: 10.1038/nn1923.
- 21.Pagnoni G, Zink CF, Montague PR, Berns GS. Activity in human ventral striatum locked to errors of reward prediction. Nature Neuroscience. 2002;5:97–8. doi: 10.1038/nn802.
- 22.O’Doherty J, Dayan P, Schultz J, Deichmann R, Friston K, Dolan RJ. Dissociable roles of ventral and dorsal striatum in instrumental conditioning. Science. 2004;304:452–454. doi: 10.1126/science.1094285.
- 23.Hare TA, O’Doherty J, Camerer CF, Schultz W, Rangel A. Dissociating the role of the orbitofrontal cortex and the striatum in the computation of goal values and prediction errors. The Journal of Neuroscience. 2008;28:5623–30. doi: 10.1523/JNEUROSCI.1309-08.2008.
- 24.D’Ardenne K, McClure SM, Nystrom LE, Cohen JD. BOLD responses reflecting dopaminergic signals in the human ventral tegmental area. Science. 2008;319:1264–1267. doi: 10.1126/science.1150605.
- 25.Schonberg T, O’Doherty JP, Joel D, Inzelberg R, Segev Y, Daw ND. Selective impairment of prediction error signaling in human dorsolateral but not ventral striatum in Parkinson’s disease patients: evidence from a model-based fMRI study. NeuroImage. 2010;49:772–81. doi: 10.1016/j.neuroimage.2009.08.011.
- 26.Roesch MR, Singh T, Brown PL, Mullins SE, Schoenbaum G. Ventral striatal neurons encode the value of the chosen action in rats deciding between differently delayed or sized rewards. Journal of Neuroscience. 2009;29:13365–13376. doi: 10.1523/JNEUROSCI.2572-09.2009.
- 27.Ito M, Doya K. Validation of decision-making models and analysis of decision variables in the rat basal ganglia. Journal of Neuroscience. 2009;29:9861–9874. doi: 10.1523/JNEUROSCI.6157-08.2009.
- 28.Khamassi M, Mulder AB, Tabuchi E, Douchamps V, Wiener SI. Anticipatory reward signals in ventral striatal neurons of behaving rats. The European Journal of Neuroscience. 2008;28:1849–66. doi: 10.1111/j.1460-9568.2008.06480.x.
- 29.van der Meer MAA, Redish AD. Theta phase precession in rat ventral striatum links place and reward information. Journal of Neuroscience. 2011. doi: 10.1523/JNEUROSCI.4869-10.2011.
- 30.Nakahara H, Itoh H, Kawagoe R, Takikawa Y, Hikosaka O. Dopamine neurons can represent context-dependent prediction error. Neuron. 2004;41:269–80. doi: 10.1016/s0896-6273(03)00869-9.
- 31.Knutson B, Taylor J, Kaufman M, Peterson R, Glover G. Distributed neural representation of expected value. The Journal of Neuroscience. 2005;25:4806–12. doi: 10.1523/JNEUROSCI.0642-05.2005.
- 32.Preuschoff K, Bossaerts P, Quartz SR. Neural differentiation of expected reward and risk in human subcortical structures. Neuron. 2006;51:381–90. doi: 10.1016/j.neuron.2006.06.024.
- 33.McClure SM, Ericson KM, Laibson DI, Loewenstein G, Cohen JD. Time discounting for primary rewards. The Journal of Neuroscience. 2007;27:5796–804. doi: 10.1523/JNEUROSCI.4246-06.2007.
- 34.Kim YB, Huh N, Lee H, Baeg EH, Lee D, Jung MW. Encoding of action history in the rat ventral striatum. Journal of Neurophysiology. 2007;98:3548–56. doi: 10.1152/jn.00310.2007.
- 35.Kim H, Sul JH, Huh N, Lee D, Jung MW. Role of striatum in updating values of chosen actions. The Journal of Neuroscience. 2009;29:14701–12. doi: 10.1523/JNEUROSCI.2728-09.2009.
- 36.Kimchi EY, Laubach M. Dynamic encoding of action selection by the medial striatum. J Neurosci. 2009;29:3148–3159. doi: 10.1523/JNEUROSCI.5206-08.2009.
- 37.Carelli RM, Wondolowski J. Selective encoding of cocaine versus natural rewards by nucleus accumbens neurons is not related to chronic drug exposure. Journal of Neuroscience. 2003;23:11214–11223. doi: 10.1523/JNEUROSCI.23-35-11214.2003.
- 38.Setlow B, Schoenbaum G, Gallagher M. Neural encoding in ventral striatum during olfactory discrimination learning. Neuron. 2003;38:625–636. doi: 10.1016/s0896-6273(03)00264-2.
- 39.Taha SA, Fields HL. Encoding of palatability and appetitive behaviors by distinct neuronal populations in the nucleus accumbens. The Journal of Neuroscience. 2005;25:1193–202. doi: 10.1523/JNEUROSCI.3975-04.2005.
- 40.Yun IA, Wakabayashi KT, Fields HL, Nicola SM. The ventral tegmental area is required for the behavioral and nucleus accumbens neuronal firing responses to incentive cues. Journal of Neuroscience. 2004;24:2923–2933. doi: 10.1523/JNEUROSCI.5282-03.2004.
- 41.Roesch MR, Calu DJ, Esber GR, Schoenbaum G. All that glitters... dissociating attention and outcome-expectancy from prediction errors signals. Journal of Neurophysiology. 2010;104:587–95. doi: 10.1152/jn.00173.2010.
- 42.Peters J, Büchel C. Neural representations of subjective reward value. Behavioural Brain Research. 2010;213:135–41. doi: 10.1016/j.bbr.2010.04.031.
- 43.Daw ND, Niv Y, Dayan P. Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control. Nat Neurosci. 2005;8:1704–1711. doi: 10.1038/nn1560.
- 44.van der Meer MAA, Johnson A, Schmitzer-Torbert NC, Redish AD. Triple dissociation of information processing in dorsal striatum, ventral striatum, and hippocampus on a learned spatial decision task. Neuron. 2010;67:25–32. doi: 10.1016/j.neuron.2010.06.023.
- 45.Humphries MD, Prescott TJ. The ventral basal ganglia, a selection mechanism at the crossroads of space, strategy, and reward. Prog Neurobiol. 2009. doi: 10.1016/j.pneurobio.2009.11.003.
- 46.Tremblay L, Worbe Y, Hollerman JR. The ventral striatum: a heterogeneous structure involved in reward processing, motivation and decision-making. In: Dreher JC, Tremblay L, editors. Handbook of Reward and Decision Making. 2009. pp. 51–78.
- 47.Jones JL, Day JJ, Wheeler RA, Carelli RM. The basolateral amygdala differentially regulates conditioned neural responses within the nucleus accumbens core and shell. Neuroscience. 2010;169:1186–98. doi: 10.1016/j.neuroscience.2010.05.073.
- 48.Krause M, German PW, Taha SA, Fields HL. A pause in nucleus accumbens neuron firing is required to initiate and maintain feeding. The Journal of Neuroscience. 2010;30:4746–56. doi: 10.1523/JNEUROSCI.0197-10.2010.
- 49.Aragona BJ, Day JJ, Roitman MF, Cleaveland NA, Wightman RM, Carelli RM. Regional specificity in the real-time development of phasic dopamine transmission patterns during acquisition of a cue-cocaine association in rats. The European Journal of Neuroscience. 2009;30:1889–99. doi: 10.1111/j.1460-9568.2009.07027.x.
- 50.Wightman RM, Heien MLAV, Wassum KM, Sombers LA, Aragona BJ, Khan AS, Ariansen JL, Cheer JF, Phillips PEM, Carelli RM. Dopamine release is heterogeneous within microenvironments of the rat nucleus accumbens. The European Journal of Neuroscience. 2007;26:2046–54. doi: 10.1111/j.1460-9568.2007.05772.x.
- 51.Goto Y, Grace AA. Limbic and cortical information processing in the nucleus accumbens. Trends in Neurosciences. 2008;31:552–558. doi: 10.1016/j.tins.2008.08.002.
- 52.Voorn P, Vanderschuren LJ, Groenewegen HJ, Robbins TW, Pennartz CM. Putting a spin on the dorsal-ventral divide of the striatum. Trends in Neurosciences. 2004;27:468–474. doi: 10.1016/j.tins.2004.06.006.
- 53.Pennartz CMA, Groenewegen HJ, Lopes da Silva FH. The nucleus accumbens as a complex of functionally distinct neuronal ensembles: an integration of behavioural, electrophysiological, and anatomical data. Progress in Neurobiology. 1994;42:719–761. doi: 10.1016/0301-0082(94)90025-6.
- 54.Ito R, Robbins TW, Pennartz CM, Everitt BJ. Functional interaction between the hippocampus and nucleus accumbens shell is necessary for the acquisition of appetitive spatial context conditioning. Journal of Neuroscience. 2008;28:6950–6959. doi: 10.1523/JNEUROSCI.1615-08.2008.
- 55.Lansink CS, Goltstein PM, Lankelma JV, McNaughton BL, Pennartz CM. Hippocampus leads ventral striatum in replay of place-reward information. PLoS Biology. 2009;7:e1000173. doi: 10.1371/journal.pbio.1000173.
- 56.Berke JD. Fast oscillations in cortical-striatal networks switch frequency following rewarding events and stimulant drugs. The European Journal of Neuroscience. 2009;30:848–59. doi: 10.1111/j.1460-9568.2009.06843.x.
- 57.van der Meer MAA, Redish AD. Low and high gamma oscillations in rat ventral striatum have distinct relationships to behavior, reward, and spiking activity on a learned spatial decision task. Frontiers in Integrative Neuroscience. 2009;3:9. doi: 10.3389/neuro.07.009.2009.
- 58.Kalenscher T, Lansink CS, Lankelma JV, Pennartz CMA. Reward-associated gamma oscillations in ventral striatum are regionally differentiated and modulate local firing activity. Journal of Neurophysiology. 2010;103:1658–72. doi: 10.1152/jn.00432.2009.
- 59.van der Meer MAA, Kalenscher T, Lansink CS, Pennartz CMA, Berke JD, Redish AD. Integrating early results on ventral striatal gamma oscillations in the rat. Frontiers in Neuroscience. 2010;4. doi: 10.3389/fnins.2010.00300.
- 60.Niv Y, Joel D, Dayan P. A normative perspective on motivation. Trends Cogn Sci. 2006;10:375–381. doi: 10.1016/j.tics.2006.06.010.
- 61.Redish AD, Jensen S, Johnson A. A unified framework for addiction: vulnerabilities in the decision process. Behav Brain Sci. 2008;31:415–487. doi: 10.1017/S0140525X0800472X.
- 62.O’Keefe J, Nadel L. The Hippocampus as a Cognitive Map. Oxford: Clarendon Press; 1978.
- 63.Redish AD. Beyond the Cognitive Map: From Place Cells to Episodic Memory. MIT Press; 1999.
- 64.Balleine BW. Incentive processes in instrumental conditioning. In: Mowrer RR, Klein SB, editors. Handbook of Contemporary Learning Theories. Philadelphia, PA: Lawrence Erlbaum Associates; 2001. pp. 307–366.
- 65.de Borchgrave R, Rawlins JNP, Dickinson A, Balleine BW. Effects of cytotoxic nucleus accumbens lesions on instrumental conditioning in rats. Exp Brain Res. 2002;144:50–68. doi: 10.1007/s00221-002-1031-y.
- 66.Corbit LH, Muir JL, Balleine BW. The role of the nucleus accumbens in instrumental conditioning: evidence of a functional dissociation between accumbens core and shell. J Neurosci. 2001;21:3251–3260. doi: 10.1523/JNEUROSCI.21-09-03251.2001.
- 67.Singh T. Nucleus accumbens core and shell are necessary for reinforcer devaluation effects on Pavlovian conditioned responding. Frontiers in Integrative Neuroscience. 2010;4:126. doi: 10.3389/fnint.2010.00126.
- 68.Lex B, Hauber W. The role of nucleus accumbens dopamine in outcome encoding in instrumental and Pavlovian conditioning. Neurobiology of Learning and Memory. 2010;93:283–90. doi: 10.1016/j.nlm.2009.11.002.
- 69.Dayan P, Niv Y, Seymour B, Daw ND. The misbehavior of value and the discipline of the will. Neural Networks. 2006;19:1153–60. doi: 10.1016/j.neunet.2006.03.002.
- 70.Maia TV. Two-factor theory, the actor-critic model, and conditioned avoidance. Learning & Behavior. 2010;38:50–67. doi: 10.3758/LB.38.1.50.
- 71.German PW, Fields HL. Rat nucleus accumbens neurons persistently encode locations associated with morphine reward. Journal of Neurophysiology. 2007;97:2094–2106. doi: 10.1152/jn.00304.2006.
- 72.Nicola SM. The flexible approach hypothesis: unification of effort and cue-responding hypotheses for the role of nucleus accumbens dopamine in the activation of reward-seeking behavior. Journal of Neuroscience. 2010;30:16585–16600. doi: 10.1523/JNEUROSCI.3958-10.2010.
- 73.van der Meer MAA, Redish AD. Covert expectation-of-reward in rat ventral striatum at decision points. Frontiers in Integrative Neuroscience. 2009;3:1. doi: 10.3389/neuro.07.001.2009.
- 74.Buckner RL. The role of the hippocampus in prediction and imagination. Annual Review of Psychology. 2010;61:27–48. doi: 10.1146/annurev.psych.60.110707.163508.
- 75.van der Meer MAA, Redish AD. Expectancies in decision making, reinforcement learning, and ventral striatum. Frontiers in Neuroscience. 2010. doi: 10.3389/neuro.01.006.2010.
