The striatum is essential for learning which actions lead to reward and for implementing those actions. Decades of experimental and theoretical work have led to several influential theories and hypotheses about how the striatal circuit mediates these functions. However, owing to technical limitations, testing these hypotheses rigorously has been difficult. In this Review, we briefly describe some of the classic ideas of striatal function. We then review recent studies in rodents that take advantage of optical and genetic methods to test these classic ideas by recording and manipulating identified cell types within the circuit. This new body of work has provided experimental support for some longstanding ideas about the striatal circuit and has uncovered critical aspects of the classic view that are incorrect or incomplete.
Decision-making involves the selection of a motor plan based on external information (for example, sensory inputs) and internal information (such as reward history). Here, we consider the role of the striatum in sensory-based and value-based decision-making and in the learning of reward associations that underlie these behaviours.
The striatum is the primary input nucleus of the basal ganglia and is positioned within multiple parallel cortico-subcortical loops. It receives input from the cortex and thalamus and sends outputs that ultimately relay information back to the cortex via the thalamus1–3. In addition, the striatum is a site where glutamatergic input from many brain regions converges with dense innervation from midbrain dopamine (DA) neurons4. Thus, the striatum is well positioned to have a vital role in learning and decision-making.
The striatum itself is primarily composed of GABAergic projection neurons called medium spiny neurons (MSNs), which are divided into two molecularly distinct populations with largely segregated output projection pathways through the basal ganglia5–9. These two pathways oppositely modulate the output structures of the basal ganglia, which have high baseline firing rates and tonically inhibit thalamic and brainstem nuclei10–15. In addition to MSNs, the striatum contains small populations of interneurons, including cholinergic interneurons (CINs)16 as well as multiple other subclasses of GABAergic neurons, which can be distinguished on the basis of their physiological and molecular profiles17–19.
In this Review, we discuss recent work exploring how specific cell types within the striatum and its inputs are involved in learning and decision-making. We focus on five cell types: dopaminergic input neurons, the two classes of MSNs, CINs and glutamatergic input neurons (the GABAergic interneurons have been reviewed elsewhere18,19). For each of these components of the striatal circuit, we briefly review classic ideas about their role in learning and decision-making, which were derived primarily from anatomical, electrophysiological and pharmacological experiments. We then discuss studies from the past decade that used genetic and optical tools to more precisely monitor and manipulate these distinct cell types within the striatal circuit in rodents. In some cases, this work has confirmed classic ideas about the role of these cell types in learning and decision-making, whereas in other cases it has revealed that classic ideas are incomplete, opening new questions in the field that must now be resolved.
Midbrain dopamine neurons
A teaching signal.
DA neurons that originate in the ventral tegmental area (VTA) and substantia nigra pars compacta (SNc) provide dense, topographic innervation to the striatum20–26 (Fig. 1a). The VTA projects preferentially to the nucleus accumbens (NAc), whereas the SNc projects preferentially to the dorsomedial striatum (DMS), dorsolateral striatum (DLS) and tail of the striatum (TS; see Box 1 for an introduction to striatal subregions). Seminal experiments demonstrated that these DA neurons encode reward prediction error (RPE) — the difference between experienced and expected reward27,28. This result has been confirmed in multiple species (including mice, rats and non-human primates) and observed both in the firing patterns of putative DA neurons and in DA concentration changes in the striatum29–37.
Box 1
The striatum is divided into functional subregions that are thought to mediate learning and expression of different types of association. In the rodent, these regions include the dorsolateral striatum (DLS; homologous to the primate putamen), dorsomedial striatum (DMS; homologous to the primate caudate) and ventral striatum (VS). These are frequently referred to as the sensorimotor, associative and limbic striatum, respectively3,57,187–190.
The DLS is thought to be important for the formation of stimulus–response associations that underlie skilled movements and habitual actions3,188,190–196, whereas the DMS regulates goal-directed behaviours that rely on response–outcome associations3,57,188,190,193–195,197. Animals will stop performing goal-directed actions if the associated outcome is no longer valuable. By contrast, habitual behaviours are relatively insensitive to the value of associated outcomes. Lesions or inactivation of the DMS render learned operant behaviours habitual, whereas lesions or inactivation of the DLS prevent behaviours from becoming habitual57,191. Both structures are active during learning, but with overtraining, as an action transitions from goal-directed to habitual, distinct activity patterns develop in each of them193,194.
The nucleus accumbens (NAc; the major component of the VS) is thought to be involved in outcome evaluation and motivation and in the formation of stimulus–outcome associations that are important for Pavlovian learning188,190,198,199. Furthermore, through its substantial projection to midbrain dopamine neurons200,201, the NAc regulates dopamine release throughout the striatum202.
Finally, in the primate, the tail of the striatum (TS) is increasingly recognized as a distinct subregion that is involved in processing sensory information and promoting behaviours that are reliant on sensory information177–180. Although less well studied, some evidence suggests that the TS of rodents is also specialized for sensory information88,152,158,176. The dorsal striatum can probably be further subdivided147,148, but the functional roles of these finer divisions have yet to be interrogated.
This dopaminergic RPE signal is integral to one of the critical functions ascribed to the striatum — reinforcement learning. RPE is thought to serve as a reinforcement signal that modifies glutamatergic synaptic inputs to the striatum that are active during unexpected rewards (that is, co-active with DA neurons). Thus, DA-dependent plasticity of these synapses provides a synaptic mechanism through which actions that are associated with unexpected reward are more likely to be repeated or stimuli that are associated with unexpected reward are more likely to be pursued38–42.
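The logic of this proposal can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a model drawn from the cited studies. It assumes a Rescorla–Wagner-style prediction error (delta equals experienced minus expected reward). It also assumes a hypothetical "three-factor" rule in which a glutamatergic synapse changes only when its presynaptic input is active at the same time as the DA signal. The names `rpe`, `three_factor_update` and `condition`, and the learning rate of 0.1, are illustrative choices.

```python
def rpe(reward: float, expected: float) -> float:
    """Reward prediction error: experienced minus expected reward."""
    return reward - expected


def three_factor_update(weight: float, pre_active: bool, delta: float,
                        lr: float = 0.1) -> float:
    """Hypothetical three-factor rule: a corticostriatal weight changes only
    when its presynaptic input is active together with the DA (RPE) signal."""
    eligibility = 1.0 if pre_active else 0.0
    return weight + lr * delta * eligibility


def condition(n_trials: int, reward: float = 1.0, lr: float = 0.1):
    """Repeatedly pair a cue with reward. The weight of the cue's synapse is
    treated as the cue's predicted reward, so learning shrinks the RPE."""
    w = 0.0
    deltas = []
    for _ in range(n_trials):
        d = rpe(reward, w)  # DA signal on this trial
        deltas.append(d)
        w = three_factor_update(w, pre_active=True, delta=d, lr=lr)
    return w, deltas
```

Calling `condition(50)` shows the prediction error falling on every trial as the cue's weight approaches the reward value. This follows directly from the definition of RPE: a fully predicted reward produces little error. A synapse that is silent when DA changes is left untouched, and a negative delta weakens an active synapse.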
Classic experiments manipulating DA activity supported the idea that DA serves as a teaching signal to support reinforcement learning. For example, it has been known since the 1950s that animals will perform arbitrary actions to receive electrical stimulation of the medial forebrain bundle43,44 and that this effect can be attenuated by dopaminergic antagonists45,46.
New evidence of heterogeneity in the dopamine system.
The classic idea that DA activity supports reinforcement learning has been directly tested in recent years with optogenetic activation or inhibition of DA neurons in a wide range of learning paradigms. These studies had the cell-type specificity and temporal precision to directly test the hypothesis that DA neurons provide an RPE signal to support reinforcement learning. They have unequivocally confirmed that DA neuron activation supports Pavlovian learning25,47,48, contextual learning49,50 and operant learning50–53. Conversely, transient optogenetic inhibition of these neurons mimics a negative prediction error: inhibition of DA neurons promotes extinction of a previously conditioned response54, induces conditioned place avoidance50 and reduces the likelihood that an animal will repeat a previously selected action52,55.
Given the anatomical and functional specialization within the striatum (Box 1), this RPE signal may support different forms of learning depending on the function of the striatal target area. Consistent with this, stimulation of either VTA DA projections or SNc DA projections to the striatum is sufficient to turn a neutral cue into a conditioned stimulus, but with important differences25. Activation of the projection from the VTA to the NAc induces cue approach and leads the cue to become reinforcing on its own. By contrast, activating the projection from the SNc to the dorsal striatum induces vigorous, but undirected, movement in response to the cue and does not cause the cue to become reinforcing25. This distinction is consistent with classic ideas about striatal subregions: the NAc is thought to be important for generating stimulus–outcome associations, whereas the dorsal striatum is thought to be more important for stimulus–response associations and action–outcome associations56,57 (Box 1).
Although recent optogenetic experiments have supported the classic idea that DA activity functions as an RPE to drive reinforcement learning, and in vivo recordings from identified DA neurons have revealed RPE signals36, numerous other, entirely unexpected response profiles that cannot easily be interpreted within the RPE framework have also been recorded in vivo58–61 (Fig. 1b). These findings challenge the classic idea that DA projections to the striatum uniformly transmit RPE signals. Instead, they point to anatomical specialization within the DA system and raise new questions about what functions these non-RPE signals may serve.
Notably, specialization of DA neuron activity seems to be related to where they are located or where in the striatum they project. For example, DA neuron terminals in the dorsal striatum have relatively weak reward responses but respond robustly during locomotion59 or contralateral movements55,62. Similarly, individual DA neurons in the SNc increase their activity during movement initiation, and this increase in activity correlates with the vigour of those movements63 (Fig. 1b). Consistent with these neural correlates, optogenetic stimulation of SNc cell bodies or their terminals in the DMS increases movement59,63,64 whereas their inactivation decreases movement initiation and reduces the vigour of movements that do occur63.
Another aspect of DA activity that is seemingly inconsistent with RPE is the positive response that has been observed in some putative DA neurons, particularly in the SNc, to aversive events23,60,65–67. Calcium imaging of identified DA neurons has also suggested that these responses to aversion are projection-specific, similar to the movement-related increases in activity. In particular, DLS-projecting SNc DA neurons respond with increased activity to a foot shock23, whereas TS-projecting SNc DA neurons respond with increased activity to an air puff60 (Fig. 1b). Optogenetic activation of TS-projecting DA neurons reinforces avoidance behaviour60, suggesting that even populations of DA neurons that do not encode RPE support specific forms of reinforcement learning.
Given the increased appreciation that there are DA signals that cannot easily be considered RPE-related, an important question is how RPE and non-RPE signals are organized across individual DA neurons. In the VTA of mice navigating a virtual maze in a decision-making task, DA neurons show surprisingly heterogeneous activity, with most individual neurons representing one or two specific behavioural variables, such as reward history, trial accuracy, kinematics and/or spatial position (Fig. 1b). DA neurons within the VTA with similar activity profiles are more likely to be spatially localized58. Although RPE does not offer a clear explanation for the heterogeneous and specialized selectivity for these variables observed during this task, many of the same neurons that encode specific behavioural variables also encode RPE58.
In contrast to the overlap of RPE and non-RPE responses observed within the same neurons in the VTA, DA neurons in the SNc that are activated by reward seem to be largely distinct from those activated around the time of movement initiation59,63. This observation suggests that RPE signals are less ubiquitous in the SNc than in the VTA.
Direct and indirect output pathways
Promoting and suppressing actions.
In the dorsal striatum, MSNs of the direct pathway express the D1 dopamine receptor (D1R) and inhibit the main output nuclei of the basal ganglia, namely the internal globus pallidus (GPi) and the substantia nigra pars reticulata (SNr). By contrast, indirect pathway MSNs express the D2 dopamine receptor (D2R) and indirectly increase basal ganglia output7,8,68 (Fig. 2a). The classic view of these two pathways’ functions is that they differentially regulate behaviour by oppositely modulating the firing rate of basal ganglia output nuclei2,10,12,13,15. In this view, direct pathway activation disinhibits brainstem motor structures, as well as thalamic nuclei that target the motor cortex, to promote movement. Indirect pathway activation, in turn, further increases the activity of basal ganglia output nuclei, thereby suppressing their targets and inhibiting movement2,10,12,13,69–72. This proposal is often referred to as the ‘go/no-go’ model.
In its simplest form, the go/no-go model poses a straightforward hypothesis about what information is represented by each pathway: D1R neurons would be active during actions (because they promote them) and D2R neurons would be inactive during actions (because they suppress them). A related proposal suggests that D1R MSNs encode selected behaviours whereas D2R MSNs encode unselected behaviours13,69.
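A toy firing-rate version of the go/no-go model makes its predictions explicit. The Python sketch below rests on several assumptions, none of which come from the cited work:

- a single output nucleus stands in for the GPi/SNr, with a hypothetical baseline of 60 spikes/s;
- D1R MSNs provide linear inhibitory input to that nucleus;
- D2R MSNs provide a linear net excitatory influence, collapsing the multisynaptic indirect route into one term;
- a hypothetical threshold determines when a movement is released.

```python
def output_rate(d1: float, d2: float, baseline: float = 60.0,
                w_direct: float = 1.0, w_indirect: float = 1.0) -> float:
    """Firing rate (spikes/s) of a GPi/SNr-like output nucleus.

    Direct pathway (D1R MSN) activity lowers the rate; indirect pathway
    (D2R MSN) activity raises it. Rates cannot fall below zero.
    """
    return max(0.0, baseline - w_direct * d1 + w_indirect * d2)


def movement_released(d1: float, d2: float, threshold: float = 40.0) -> bool:
    """A movement is disinhibited when the tonic inhibition from the output
    nucleus onto its thalamic/brainstem target drops below the threshold."""
    return output_rate(d1, d2) < threshold
```

In this toy model, direct pathway activity alone releases the movement and indirect pathway activity alone blocks it. Equal co-activation of the two pathways leaves the output nucleus at baseline, so no movement is released.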
The go/no-go model may extend to learning and decision-making, with the two pathways exerting opposing control over those processes. For example, in striatal regions that receive inputs from the prefrontal cortex, such as the DMS or VS, the two pathways could oppositely influence value-based decision-making. In subregions that receive inputs from sensory cortex, such as the TS, they could oppositely influence perceptual decisions. However, to date, most studies have examined the go/no-go model in the setting of spontaneous movements, in which the learning and decision-making variables that govern those movements are not experimentally controlled. For completeness, and because the study of spontaneous movement may provide insight into how these anatomical pathways control movements in the setting of decision-making, we review below both studies of spontaneous movements and those on decision-making.
Opposing effects despite similar activity patterns.
Although electrophysiological recordings from striatal MSNs have revealed neural correlates of sensory stimuli, movement and value73–76, these studies could not differentiate between D1R and D2R MSNs, making it difficult to test predictions of the go/no-go model. The development of transgenic mouse lines to target D1R and D2R MSNs77,78 has enabled the identification and manipulation of these two populations independently in order to assess hypotheses about their endogenous activity.
Whereas the classic go/no-go model would predict opposite activity patterns in the two pathways during movement, surprisingly, recordings from the two types of MSN in the dorsal striatum have instead revealed very similar activity patterns. For example, both pathways are more active during movement than immobility79–85, are similarly active during trained79,86–91 and spontaneous movements80–84, encode the velocity of the animal80–83 and are preferentially active during contralateral movements79,91,92. These data suggest that the direct and indirect pathways simultaneously coordinate movement; indeed, there is considerable communication between the two pathways7,9,17,70,93. Thus, the simple go/no-go model of the direct versus indirect pathway function is probably incomplete.
Although these data contradict the simplest interpretation of the go/no-go model, they remain compatible with an alternative interpretation: the direct pathway promotes the selected action whereas the indirect pathway suppresses alternative actions13,69,79. This interpretation gives rise to the interesting and testable prediction that direct pathway neurons are more selective for actions than indirect pathway neurons, as there are far more unselected actions than selected actions at any point in time. Testing this model requires behaviours in which neural correlates of more than two actions can be analysed. This has not been the case in most studies of spontaneous movement and, to our knowledge, has not been achieved in decision-making tasks at all. However, recent investigations of spontaneous behaviour, in which machine-learning algorithms classified movements into multiple discrete components, suggest that this version of the model is probably incomplete as well81,82. These detailed analyses reveal that both populations concurrently encode spontaneous behaviours81,82, with a similar degree of specificity across the two pathways81,83. This similarity suggests that ensembles of D2R MSNs are unlikely to suppress all unselected actions in a particular context. Thus, so far, the data do not clearly support the extension of the go/no-go model in which the indirect pathway suppresses a wider range of actions than the direct pathway, and new ideas are needed to interpret the function of these two pathways.
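One way to quantify "specificity" in such analyses is a selectivity index computed over a neuron's responses to K discrete behaviours. The Python sketch below uses lifetime sparseness as an assumed metric; the cited studies may have used different measures. The index is 0 when a neuron responds equally to every behaviour and 1 when it responds to only one. Under the extended go/no-go model, D1R MSNs would encode the selected action and so should score higher than D2R MSNs, which would be active for many unselected actions.

```python
from statistics import mean


def lifetime_sparseness(rates):
    """Selectivity of one neuron across K discrete behaviours.

    Returns 0 when the neuron responds equally to every behaviour and 1 when
    it responds to exactly one of them.
    """
    k = len(rates)
    if k < 2:
        raise ValueError("need responses to at least two behaviours")
    m = mean(rates)
    m2 = mean(r * r for r in rates)
    if m2 == 0:  # silent neuron: treat as unselective
        return 0.0
    return (1 - m * m / m2) / (1 - 1 / k)


def population_selectivity(population):
    """Mean selectivity over a list of per-neuron response vectors."""
    return mean(lifetime_sparseness(r) for r in population)
```

A neuron active for three of four behaviours scores about 0.33 on this scale, whereas one active for a single behaviour scores 1. Comparing this index across identified D1R and D2R MSNs is one way to test the prediction. The studies cited above found similar specificity in both pathways.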
Another possibility is that, although D1R and D2R MSNs are both active during the same action, the relative activation of the two pathways determines whether that action is selected or avoided94–96. A potentially related idea is that the striatum is involved in the learning or decision-making processes that underlie motor outputs but is not directly involved in generating the motor output. In this framework, D1R and D2R MSNs may have opposing representations of the decision variables underlying a motor output despite having similar activation during movement82,97. This possibility is best tested in decision-making tasks in which internal variables related to the decision (for example, the value of an action or the sensory evidence that drives a decision98) are parametrically controlled. Spontaneous behaviour is less suitable, because movements are then monitored without knowledge of the underlying decision-making process.
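A toy model can illustrate how both pathways might be active during the same movement while still carrying opposite information about a decision variable. The Python sketch below is hypothetical in every respect. It assumes that D1R and D2R MSNs share a movement-related drive, that action value is added to D1R activity and subtracted from D2R activity, and that the probability of selecting the action is a logistic function of their difference. The parameter values are arbitrary.

```python
import math


def pathway_rates(moving, action_value, motor_drive=5.0):
    """Hypothetical D1R and D2R MSN rates for one action.

    Both pathways receive the same movement-related drive; action value is
    added to D1R activity and subtracted from D2R activity.
    """
    shared = motor_drive if moving else 0.0
    d1 = max(0.0, shared + action_value)
    d2 = max(0.0, shared - action_value)
    return d1, d2


def p_select(d1, d2, gain=1.0):
    """Probability of selecting the action: a logistic function of the
    relative activation of the two pathways (D1R minus D2R)."""
    return 1.0 / (1.0 + math.exp(-gain * (d1 - d2)))
```

When action value is zero, both pathways are equally active and more active during movement than at rest. A positive value tips the D1R minus D2R difference, and thus the selection probability, one way, and a negative value tips it the other way.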
In support of these ideas, recent evidence suggests that value may oppositely modulate activity in the two pathways, despite both pathways showing similar activity during movement90,91,99. For example, in value-based decision-making tasks, many D1R MSNs increase their activity during reward presentation90 whereas D2R MSNs are more active during unrewarded outcomes90,100 (Fig. 2b). Opposite outcome-dependent responses are also observed in a Pavlovian conditioning task; here, D1R and D2R MSN responses to reward-predicting cues are positively or negatively correlated with reward value, respectively99. The fact that the activity of D1R and D2R MSNs seems to be differently modulated by value suggests that these neurons could oppositely encode the internal variables that underlie behavioural decisions rather than the actions themselves.
Specific optogenetic manipulations of D1R or D2R MSNs further support the idea that D1R MSNs and D2R MSNs oppositely modulate decision-making rather than simply influencing motor output (Fig. 2c). Animals trained in a probabilistic reversal learning task tend to repeat previously selected actions if they lead to reward but switch if they do not90,101,102. Transient stimulation of D1R MSNs or D2R MSNs just before mice execute their selected action induces a contralateral or ipsilateral bias, respectively. This is not simply a motor effect, because the bias depends on the difference in estimated value of the two available choices: it is greater when the estimated values of the two choices are more similar102 (Fig. 2c). In addition, stimulation of D1R MSNs during outcome presentation decreases switching following reward, whereas stimulation of D2R MSNs increases switching after unrewarded trials. This suggests that outcome-period activity in these pathways regulates animals’ outcome-dependent decision strategy90.
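The value dependence of the stimulation effect has a simple interpretation if stimulation adds a constant bias to a softmax (logistic) choice rule. The logistic function is steepest when the two options have similar values, so a fixed bias shifts choice most there. The Python sketch below is a hedged illustration of that reasoning, not the analysis used in the cited study. The inverse temperature `beta`, the additive `stim_bias` term and the value estimates are assumptions.

```python
import math


def p_contra(q_contra, q_ipsi, beta=3.0, stim_bias=0.0):
    """Probability of a contralateral choice under a logistic (two-option
    softmax) rule. stim_bias is a hypothetical additive term standing in for
    optogenetic stimulation: > 0 for D1R MSNs (contralateral bias), < 0 for
    D2R MSNs (ipsilateral bias)."""
    return 1.0 / (1.0 + math.exp(-(beta * (q_contra - q_ipsi) + stim_bias)))


def stim_effect(q_contra, q_ipsi, stim_bias=1.0, beta=3.0):
    """Change in P(contra) produced by stimulation for a given pair of
    estimated action values."""
    return (p_contra(q_contra, q_ipsi, beta, stim_bias)
            - p_contra(q_contra, q_ipsi, beta))
```

With these settings, the same bias raises P(contra) by roughly 0.23 when the two values are equal (`stim_effect(0.5, 0.5)`). It raises P(contra) by only about 0.05 when the contralateral option is much more valuable (`stim_effect(0.9, 0.1)`). A negative bias, standing in for D2R MSN stimulation, shifts choice ipsilaterally.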
Activation of the direct and indirect pathways also seems to oppositely modulate learning. Activation of D1R MSNs in the dorsal striatum reinforces the behaviour or spatial location paired with stimulation103 and can reinforce specific features of trained movements, such as velocity104. D2R MSN activation has the opposite effects: decreasing performance of stimulation-paired behaviour, inducing aversion for a spatial location and decreasing selection of a particular movement velocity103,104. Similarly, stimulation of D1R MSNs in the NAc increases cocaine conditioned place preference (CPP) whereas stimulating D2R MSNs decreases cocaine CPP105.
Thus, considerable support from optogenetic activation suggests that D1R and D2R neurons exert antagonistic control over learning and decision-making. However, there are also recent studies that suggest surprising functions for the indirect pathway that seem to be entirely distinct from ‘no-go’. For example, in a sensory go/no-go task, activation of D1R MSNs or D2R MSNs causes a bias towards go responses, with no change in the perception of sensory information106. The increase in go responses with D1R MSN activation is consistent with classic models of direct pathway function; however, according to the classic model, D2R MSN stimulation would be expected to reduce rather than increase go responses. Similarly, optogenetic inhibition of D1R MSNs slows action initiation, consistent with the classic model, whereas D2R MSN inhibition does not speed action initiation but instead increases the probability that the mouse disengages from the task89,107.
In addition, although direct and indirect pathway activity oppositely modulates reinforcement in the DMS103, this does not seem to be the case in the DLS or NAc. D2R MSN activation in the DLS does not decrease pressing of a stimulation-paired lever but instead increases pressing of both the paired and an unpaired lever108. In the NAc, activation of D1R or D2R MSNs promotes self-stimulation, although only D1R MSN stimulation increases time spent in a stimulated spatial location109. Moreover, in a task designed to test motivation, stimulating either D1R or D2R MSNs in the NAc increases motivation, and inhibition of D2R MSNs decreases motivation, causing animals to give up earlier than controls110 (but see refs111,112). Taken together, these studies suggest that, under some conditions, the indirect pathway can have entirely unexpected roles that are very different from the classically proposed role of antagonizing direct pathway function. When and why the indirect pathway serves these roles remains to be understood.
Cholinergic interneurons
Salience signalling.
Despite comprising only 1–2% of the total neurons within the striatum, CINs provide a major source of acetylcholine to the structure113 (Fig. 3a). As acetylcholine has been implicated in attention and learning in other brain regions114, and levels of cholinergic markers are particularly high in the striatum115–117, there has been great interest in understanding the function of striatal CINs. However, their sparse and distributed nature has made this goal particularly challenging.
Classic ideas about the role of CINs in learning and decision-making come from extracellular recordings of tonically active neurons (TANs), which are thought to be CINs on the basis of in vitro characterization or in vivo intracellular or juxtacellular recordings followed by histological identification113,118–120. TANs transiently respond to motivationally relevant stimuli with brief pauses that are often flanked by bursts of increased activity65,113,121–126. Interestingly, TANs tend to exhibit these pause–burst responses to both appetitive and aversive stimuli. This tendency is in stark contrast to DA neurons, which tend to respond positively to reward and negatively to aversive stimuli (signalling RPE). Thus, CINs are thought to represent the ‘salience’ or ‘motivational significance’ of stimuli, which potentially has a role in modulating the rate of learning rather than providing a reinforcement signal that can directly drive learning.
Modulating plasticity in medium spiny neurons.
The notion that putative CINs (TANs) represent salience or motivational significance leads to an interesting hypothesis: these neurons may modulate the ‘gain’ of learning and plasticity without being sufficient to drive these processes on their own. Although they respond at moments when learning should occur, they do not provide information about the direction of learning, because their responses are similar for appetitive and aversive stimuli.
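This hypothesis can be expressed by letting an unsigned CIN "salience" signal scale the learning rate of a three-factor plasticity rule, while the sign of plasticity comes only from the DA RPE. The Python sketch below is a hypothetical formalization, not a model proposed in the cited work. The multiplicative form `1 + cin_salience` and the base learning rate are assumptions.

```python
def update_weight(w, pre_active, da_rpe, cin_salience, base_lr=0.05):
    """Hypothetical three-factor update with CIN gain.

    The unsigned CIN salience signal (>= 0, the same for appetitive and
    aversive events) scales the learning rate; the direction of the change
    comes only from the signed DA reward prediction error.
    """
    if cin_salience < 0:
        raise ValueError("salience is unsigned")
    gain = 1.0 + cin_salience
    eligibility = 1.0 if pre_active else 0.0
    return w + base_lr * gain * da_rpe * eligibility
```

In this formulation, salience with no accompanying RPE produces no weight change, whereas the same salience value speeds learning in either direction when an RPE is present. This is the sense in which CINs would modulate, but not drive, learning.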
Recent optogenetic experiments have provided support for the idea that CINs modulate the gain on learning. For example, increasing CIN activity in the NAc hastens the extinction of a cocaine CPP; conversely, decreasing CIN activity can slow the extinction (or acquisition) of these associations127,128. Enhanced extinction learning is accompanied by a decrease in the synaptic strength of glutamatergic inputs to MSNs128 (Fig. 3b). However, this change in synaptic strength does not occur when CINs are activated outside of the learning context. Similarly, activation of CINs is not sufficient on its own (for example, in real-time CPP or intra-cranial self-stimulation tasks) to support learning128. Together, these findings suggest that CINs regulate the gain on learning when it occurs but do not drive reinforcement learning themselves.
CINs may be particularly essential in regulating the rate of learning in contexts that require flexible updating of previously learned associations129–133. Cell-type-specific lesions of CINs in the dorsal striatum or VS do not affect initial learning of action–outcome associations but do impair performance when task contingencies change130,132. CIN ablation increases perseveration with an old strategy when relevant task features change130 and impairs the ability of the animal to discriminate different action–outcome contingencies in a devaluation test132 (but see ref.131).
How do CINs modulate MSNs to support learning? CINs inhibit MSNs through various mechanisms127,134–136; thus, pauses in CIN activity may disinhibit MSNs, increasing their responsiveness to behaviourally relevant information. In addition, CINs trigger DA release from the striatal terminals of midbrain DA neurons, which may directly enhance plasticity137,138. Moreover, as mentioned above, CINs can regulate plasticity of glutamatergic input–MSN synapses128. How each of these mechanisms affects learning and decision-making, and what other mechanisms may be important, are key open areas of research in the field.
Glutamatergic inputs
A crucial component of the striatal circuit is the glutamatergic input that converges in the striatum from the cortex as well as from subcortical structures such as the thalamus, amygdala and hippocampus. Cortical and thalamic neurons project topographically to the striatum, such that different striatal subregions receive distinct combinations of cortical and thalamic inputs139–146 (Fig. 4a). In fact, unsupervised clustering of the anatomical distribution of glutamatergic inputs has been used to recover boundaries between traditional striatal subregions (for example, the DMS, DLS and NAc) and to discover new subregions of the striatum (mostly within the DMS)147,148.
Providing functional specialization to striatal subregions.
A classic idea regarding the striatal circuit is that the neural activity of each glutamatergic input is specific to its target region and determines the function of that region. To test this model, numerous important studies have begun to examine the functional specialization of glutamatergic inputs to the striatum in learning and decision-making tasks by specifically targeting neurons on the basis of their projections149–176.
Several of these studies have supported the idea that glutamatergic inputs provide functional specialization to striatal subregions. For example, glutamatergic inputs to the TS are thought to be specialized for processing sensory information and supporting sensory-guided decisions177–180. Projections from the auditory cortex to the TS are tonotopically organized158, and neurons recorded in the TS have auditory responses similar to those of the auditory cortex neurons innervating them152 (Fig. 4b). In a two-choice auditory discrimination task, specifically stimulating auditory cortical neurons projecting to the striatum biases choice towards the action associated with the preferred frequency of the stimulated neurons, whereas inhibition induces the opposite effect152. In addition, after rats learn this auditory discrimination task, synapses from corticostriatal neurons that encode the rewarded auditory stimuli are selectively potentiated158.
Further evidence of input specialization comes from examining the role of inputs from the mPFC to the NAc in learning (Table 1). These neurons are involved in learning associations between a conspecific and a spatial location165 but are not required for the acquisition of Pavlovian conditioning (although they are involved in expression of conditioned behaviour)164. Furthermore, mPFC–NAc projections are not involved in learning the association of a particular action or cue with a reward (although they are involved in switching between these tasks)171. Thus, the projection from the mPFC to the NAc seems specialized to support only some types of learning.
Table 1
Projection | Target | Behaviours | Result (a) | Refs |
mPFC | NAc | ICSS and CPP (b) | Activation of terminals reinforces the behaviour that triggers stimulation | 150 |
mPFC | NAc | ICSS and CPP (b) | Activation of terminals has no effect on the behaviour that triggers stimulation | 149,164 |
mPFC | NAc | ICSS and CPP (b) | Activation of mPFC–NAc neurons reduces time spent in the stimulated spatial location | 168 |
mPFC (PL) | NAc | Pavlovian conditioning | Activation of mPFC–NAc neurons increases, and inactivation decreases, expression of conditioned reward-seeking behaviour | 164 |
mPFC (PL) | NAc core | Social CPP | Activation of mPFC–NAc neurons increases, and inactivation decreases, learning of the association between a social target and a spatial location | 165 |
mPFC (PL) | NAc core | Task switching | Activation of PL terminals decreases, and inhibition of PL terminals increases, perseverative errors | 171 |
BLA | NAc | ICSS and CPP | Stimulation of the BLA–NAc projection increases performance of the operant behaviour that triggers stimulation | 149,150,157 |
BLA | NAc | ICSS and CPP | Stimulation of the BLA–NAc projection increases time spent in the stimulation-paired spatial location | 150 |
BLA | NAc | Pavlovian conditioning | Inhibition of BLA–NAc terminals decreases conditioned reward-seeking | 149 |
vHipp | NAc shell | ICSS and real-time CPP | Activation of vHipp–NAc terminals reinforces operant behaviour and increases time spent in the stimulated spatial location | 150 |
vHipp | NAc shell | Social memory | Inhibition of vHipp–NAc terminals impairs social discrimination | 161 |
vHipp | NAc | CPP | Optogenetically induced LTP of vHipp–NAc synapses increases time spent in the stimulated spatial location; inhibition of vHipp–NAc terminals impairs association of a social target with a spatial location | 173 |
dCA1 | NAc | CPP | Inhibition of dCA1–NAc terminals impairs retrieval of sucrose CPP | 175 |
PVT | NAc shell | CPA | Activation of PVT–NAc terminals decreases time spent in the stimulated spatial location | 160 |
ILT | NAc | Social stress | Inhibition of ILT–NAc terminals decreases social avoidance following chronic social defeat stress | 159 |
VTA (glutamatergic neurons) | NAc | ICSS | Stimulation of VTA–NAc glutamatergic terminals reinforces operant behaviour | 162 |
BLA, basolateral amygdala; CPA, conditioned place aversion; CPP, conditioned place preference; dCA1, dorsal area CA1 of the hippocampus; ICSS, intracranial self-stimulation; ILT, intralaminar thalamus; LTP, long-term potentiation; mPFC, medial prefrontal cortex; NAc, nucleus accumbens; PL, prelimbic cortex; PVT, paraventricular thalamus; vHipp, ventral hippocampus; VTA, ventral tegmental area.
(a) All experiments used optogenetics to activate or inhibit neurons or terminals.
(b) Results differ across experiments.
In addition to whether a specific input is specialized for a specific behavioural function, a related question is whether multiple inputs to the same target region have different or redundant functions. In fact, several inputs to the NAc seem to be specialized for reward learning, which is thought to be a major function of that subregion (Table 1). For example, stimulation of several of these inputs seems to be reinforcing: mice will learn to perform an action that triggers optogenetic stimulation of the projection to the NAc from the basolateral amygdala (BLA)149,150,157 or the ventral hippocampus150. Consistent with these observations, inactivation of the projection from the BLA to the NAc reduces conditioned licking in response to a reward-predicting cue149. In contrast, inhibiting this projection does not affect the acquisition of fear learning157. Whereas multiple inputs to the NAc support reward learning, several inputs from the thalamus may have opposing, aversive effects. Stimulation of the projection from the paraventricular thalamus (PVT) to the NAc is aversive, and weakening this projection through optogenetically induced long-term depression attenuates the expression of aversive symptoms of opiate withdrawal160. Moreover, chronic social defeat strengthens the projection from the intralaminar thalamus to the NAc, and optogenetic inhibition of this projection reduces the resultant social avoidance, whereas optogenetic activation reduces social interaction159.
Comparing inputs to the DMS from different regions of the prefrontal cortex also reveals evidence of functional differences across projections. In a T-maze, optogenetic manipulation of inputs from the prelimbic cortex (PL) affects decision-making only when the choice that maximizes reward is distinct from the choice that minimizes an aversive stimulus (in this case, bright light)155. By contrast, manipulation of the projection from the anterior cingulate cortex (ACC) affects multiple types of cost–benefit comparison155.
Together, these studies suggest that projections to the striatum show some functional differentiation. However, more work is needed to determine how much redundancy exists across glutamatergic inputs.
Summary and future outlook
The recent application of techniques for cell-type-specific monitoring and manipulation of distinct neuronal populations in the striatum has allowed rigorous testing of several classic ideas of striatal function. As summarized in Table 2, many of these studies support classic models, whereas others provide unexpected insights that challenge and contradict certain prevailing ideas. Thus, new models are needed to better understand striatal contributions to learning and decision-making.
Table 2 |.
Classic view | Method | Result | Supports classic view? | Refs |
DA functions as a teaching signal | Optogenetic manipulation | DA activation promotes Pavlovian conditioning and inhibition promotes extinction of Pavlovian overexpectation | Yes | 25,47,54,206 |
DA functions as a teaching signal | Optogenetic manipulation | DA manipulation bidirectionally modulates time spent in a laser-paired location | Yes | 49,50 |
DA functions as a teaching signal | Optogenetic manipulation | DA manipulation bidirectionally modulates model-based associations | Yes | 48 |
DA functions as a teaching signal | Optogenetic manipulation | DA manipulation bidirectionally modulates performance of stimulation-associated operant behaviours | Yes | 50–52,55 |
DA neurons encode RPE | Recording from DA neurons or their striatal axons (using Ca²⁺ imaging or optotagging) | Identified DA neurons encode RPE and/or reward | Yes | 36,58,59,63 |
DA neurons encode RPE | Recording from DA neurons or their striatal axons (using Ca²⁺ imaging or optotagging) | Identified DA neurons encode non-RPE information | No | 23,55,58–61,63 |
DA neurons encode RPE | Optogenetic manipulation | Manipulation of SNc DA cell bodies or terminals bidirectionally modulates movement | No | 59,63,64 |
DA neurons encode RPE | Optogenetic manipulation | Activation of DA neurons projecting to the TS reinforces avoidance behaviour | No | 60 |
D1R and D2R neurons oppositely modulate behaviour | Optogenetic manipulation | Activation of D1R MSNs increases whereas activation of D2R MSNs decreases spontaneous movement | Yes | 72,183,184 |
D1R and D2R neurons oppositely modulate behaviour | Optogenetic manipulation | In value-based decision-making, activation of D1R and D2R MSNs oppositely biases choice | Yes | 90,102 |
D1R and D2R neurons oppositely modulate behaviour | Optogenetic manipulation | Activation of D1R MSNs promotes performance of a stimulation-paired behaviour and activation of D2R MSNs decreases performance of a stimulation-paired behaviour (in the DMS and NAc) | Yes | 103–105 |
D1R and D2R neurons oppositely modulate behaviour | Optogenetic manipulation | Activation of D1R and D2R MSNs in the DLS promotes pressing of a stimulation-paired lever, but activation of D2R MSNs in the DLS also increases pressing of an unpaired lever | No | 108 |
D1R and D2R neurons oppositely modulate behaviour | Optogenetic manipulation | Activation of D1R and D2R MSNs promotes performance of a stimulation-paired behaviour, but only D1R activation increases time spent in a stimulation-paired location (in the NAc) | No | 109 |
D1R and D2R neurons oppositely modulate behaviour | Optogenetic manipulation | Activation of D1R and D2R MSNs promotes go responses in a sensory go/no-go task | No | 106 |
D1R and D2R neurons oppositely modulate behaviour | Optogenetic manipulation | Activation of D1R and D2R MSNs increases motivation | No | 110 |
D1R and D2R neurons oppositely modulate behaviour | Optogenetic manipulation | D1R MSN inhibition slows action initiation but D2R MSN inhibition decreases task engagement | No | 89,107 |
D1R and D2R MSNs oppositely encode behavioural variables | Recording from identified neurons (Ca²⁺ imaging or optotagging) | D1R and D2R neurons are concurrently active during spontaneous and trained movement | No | 79–84,87,89–91 |
D1R and D2R MSNs oppositely encode behavioural variables | Recording from identified neurons (Ca²⁺ imaging or optotagging) | D1R and D2R neurons oppositely encode value | Yes | 90,91,99 |
CIN pause–burst activity signals salience and modulates learning | Optogenetic manipulation | Modulation of CINs bidirectionally modulates the rate of cocaine CPP extinction | Yes | 127,128 |
CIN pause–burst activity signals salience and modulates learning | Cell-type-specific ablation | CIN ablation impairs flexible updating of learned associations | Yes | 130–132 |
CIN, cholinergic interneuron; CPP, conditioned place preference; D1R, D1 dopamine receptor; D2R, D2 dopamine receptor; DA, dopamine; DLS, dorsolateral striatum; DMS, dorsomedial striatum; MSN, medium spiny neuron; NAc, nucleus accumbens; RPE, reward prediction error; SNc, substantia nigra pars compacta; TS, tail of the striatum.
For example, DA neurons terminating in the striatum can show heterogeneous signals during complex decision-making58. This suggests that the model positing that these neurons provide only RPE signals to the striatum is incomplete. It is possible that heterogeneous signals in DA neurons actually represent specialized types of prediction error that support specific types of learning. DA inputs to the TS, for instance, have been suggested to signal errors in threat prediction60. However, at this point, whether heterogeneous DA signals can be viewed as specialized types of prediction error that support specific aspects of learning is not clear. Indeed, a recent study examined whether the activity of DA projections to the DMS during a contralateral choice in a value-based decision-making task correlates more with contralateral movement or with a specialized RPE with respect to contralateral movements, and concluded that the signal is more related to movement62. Thus, some DA signals may not reflect a prediction error at all.
Even if not all DA signals reflect RPE, all DA neurons presumably modulate plasticity and excitability in the striatum by releasing DA. Thus, inasmuch as DA activity correlates with movement, striatal plasticity and excitability would be modulated by movement59,63 rather than (or in addition to) reward. Such movement-generated plasticity may modulate the continuity and vigour of ongoing movements. Similarly, inasmuch as DA activity correlates with an internal state, such as behavioural accuracy during decision-making58, DA release and subsequent DA-mediated plasticity may maintain the continuity of the ongoing internal state. To interrogate the function of specialized non-RPE DA signals, new studies are needed that specifically target functional subpopulations of DA neurons during learning and decision-making paradigms that elicit these signals.
Recordings of indirect and direct pathway MSNs also provide exciting new challenges to classic models. The major challenge arises from the fact that D1R and D2R MSNs seem to be co-active during trained and spontaneous movements79–82,84,90,91. Consistent with the classic model, however, the activity of D1R and D2R neurons is oppositely modulated by value in reinforcement learning and decision-making paradigms90,91,99, potentially through differential effects of a DA signal on the synaptic plasticity at (or excitability of) D1R and D2R MSNs40,42,181,182. In this framework, inputs that are active around the time of reward will be potentiated if they target D1R MSNs or weakened if they target D2R MSNs. Thus, opposing activity patterns in D1R and D2R neurons may be most evident in the case of learning and decision-making paradigms, when DA is released at specific time points to differentially modulate the two pathways. The specific learning paradigms that are relevant to each striatal subregion might differ; therefore, opposing activity in the two pathways may be behaviour-specific. This idea would be best tested by inhibiting endogenous activity in each subpopulation during learning and decision-making tasks rather than through artificial activation. To date, most optogenetic interrogations of MSN function have relied on excitatory opsins72,90,102–105,183,184, which strongly and synchronously activate many neurons in an artificial pattern and thus provide little insight into whether the endogenous activity in the two populations is opposing. Thus, despite many foundational experiments, the classic go/no-go model remains to be fully tested.
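The opposing-plasticity framework described above can be illustrated with a toy simulation. The Python sketch below is a hypothetical illustration, not a model taken from the studies cited: it assumes a simple three-factor rule in which the weight change equals a learning rate times presynaptic activity times a DA signal, with a positive sign at D1R MSN synapses and a negative sign at D2R MSN synapses. The function names `three_factor_update` and `simulate`, the learning rate, the binary DA signal and the clipping of weights at zero are all assumptions chosen for clarity.

```python
import numpy as np

# Hypothetical toy model (not from the Review): two glutamatergic inputs,
# each synapsing onto one D1R MSN and one D2R MSN.
# Assumed sign of DA-gated plasticity: potentiation at D1R, depression at D2R.
SIGN = {"D1R": +1.0, "D2R": -1.0}


def three_factor_update(w, pre, da, cell_type, lr=0.1):
    """Return updated weights using dw = sign * lr * DA * presynaptic activity.

    Weights are clipped at zero so that an excitatory synapse cannot
    become inhibitory (an assumption of this sketch).
    """
    dw = SIGN[cell_type] * lr * da * pre
    return np.clip(w + dw, 0.0, None)


def simulate(n_trials=20, lr=0.1):
    """Run repeated trials and return final weights for each MSN type."""
    w = {"D1R": np.full(2, 1.0), "D2R": np.full(2, 1.0)}
    # Input 0 is active around reward time (phasic DA, da = 1);
    # input 1 is active at an unrewarded time point (no DA, da = 0).
    events = [(np.array([1.0, 0.0]), 1.0), (np.array([0.0, 1.0]), 0.0)]
    for _ in range(n_trials):
        for pre, da in events:
            for cell in w:
                w[cell] = three_factor_update(w[cell], pre, da, cell, lr)
    return w


if __name__ == "__main__":
    weights = simulate()
    for cell, ws in weights.items():
        print(cell, np.round(ws, 2))
```

Under these assumptions, the input active at reward time is strengthened onto the D1R MSN (its weight rises from 1.0 to about 3.0) and weakened onto the D2R MSN (its weight falls to 0), whereas the input active at an unrewarded time is unchanged in both pathways. Real corticostriatal plasticity also depends on spike timing, postsynaptic activity and DA concentration, none of which this sketch models.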
Indeed, despite extensive progress in recent years, several hypotheses from classic models of striatal function have yet to be fully tested. For example, CINs are thought to signal salient events through their pause–burst firing and are presumed to support learning. However, these activity patterns have not yet been directly replicated during learning and decision-making, and whether pauses in CIN activity are indeed crucial to their modulation of learning is unknown (although see ref. 185 for an indirect manipulation of CINs).
In addition, more work is needed to relate neural activity in glutamatergic inputs to classic ideas that their plasticity underlies learning and decision-making. For example, a basic test of the idea that corticostriatal plasticity is a neural substrate of reward-based learning is whether glutamatergic projections that are involved in learning new behavioural associations are also required for their expression186. Whether or not that is the case is not yet clear. Furthermore, whether specific glutamatergic inputs are specialized to support or modulate different elements of task execution, such as motivation or action selection, is also unclear. Addressing these questions will require systematic comparisons of multiple glutamatergic inputs at different time points, both within a trial and across learning, within a consistent behavioural framework.
In summary, studies that have recorded and manipulated identified neuronal populations have addressed many longstanding hypotheses regarding the role of striatal circuits in learning and decision-making. Some components of these classic models have been upheld by this new evidence, although important challenges to classic ideas have also emerged. Future experiments must be designed to address these challenges and the important ideas that remain untested.
