The dopamine (DA) system is critical for various forms of learning about salient environmental stimuli. Prior work has shown that deletion of the obligatory NR1 subunit of the N-methyl-d-aspartate (NMDA) receptor on neurons expressing the DA transporter (DAT) in mice results in reduced phasic release from DA-containing neurons. To further investigate the contribution of phasic DA release to reward-related learning and cognitive flexibility, the current study evaluated DAT-NR1 null mutant mice in a touchscreen-based pairwise visual discrimination and reversal learning paradigm. Results showed that these mutants were slower to attain a high level of choice accuracy on the discrimination task, but showed improved late reversal performance on sessions where correct choice was above chance. A number of possible interpretations are offered for this pattern of effects, including the opposing possibilities that discrimination memory was either stronger by the completion of training (overtraining effect) or weaker (learning deficit), both of which could potentially produce faster reversal. These data add to the extensive literature ascribing a critical role for DAergic neurotransmission in cognitive functions and the regulation of reward-related behaviors of relevance to addictions.
The dopamine (DA) system is critical for various forms of learning about salient environmental stimuli. Pacemaker-like firing of midbrain DA neurons results in tonic release of DA at low concentrations. Burst firing of DA neurons produces phasic DA release at higher concentrations and has recently been shown to result in sustained, post-burst elevations of DA as well [1,2]. Phasic DA release is a neural substrate of reward-related behaviors, but understanding of how this signal contributes to behavior is still incomplete.
Deletion of the obligatory NR1 subunit of the N-methyl-d-aspartate receptor (NMDAR) on neurons expressing the DA transporter (DAT) results in reduced phasic DA release from DA-containing neurons in response to both unconditioned and conditioned stimuli [3–6]. These mice (DAT-NR1 null mutants) have been found to have deficits in behaviors such as formation of a conditioned place preference, cue-dependent spatial navigation (water maze and T-maze), and cued fear conditioning, as well as delays in learning to perform an operant response for food reward [3,5].
Previous studies of the role of the DA system in behavioral flexibility support the notion that DA facilitates reversal learning [7–15] (Del Guidice et al. 2014). To further investigate how phasic DA release contributes to reward learning and cognitive flexibility, the current study tested mice lacking NR1 on DA neurons in a pairwise visual discrimination and reversal learning task recently shown to engage VTA → NAc DAergic signaling [16].
Materials and methods
Deletion of NR1 on DAergic neurons was achieved, as previously described [2], by crossing Grin1loxP/loxP mice with Slc6a3+/Cre, Grin1Δ/+ mice expressing Cre under control of the DAT gene (Slc6a3). This cross resulted in DAT-NR1 null mutant mice (Slc6a3+/Cre, Grin1Δ/loxP), which lack functional NMDARs on DAergic neurons, and their littermate controls (Slc6a3+/Cre, Grin1+/loxP) (Figure 1A). While the controls did lack one functional copy of the NR1 gene, previous studies have demonstrated that current through the NMDA receptor and behavioral outcomes in such mice are equivalent to those of wild-type animals [3,4,17]. Mice were bred on a C57BL/6J genetic background; we have previously demonstrated robust performance on touchscreen-based tasks in this strain [17,18]. Male and female DAT-NR1 null mutants (n=3 males and n=4 females) and littermate controls (n=4 males and n=4 females) were bred at the University of Washington and shipped to NIAAA after weaning. Mice were 240 days old at the time of testing and genotypes were matched for age and free-feeding weight at the start of testing. All experimental procedures were performed in accordance with the National Institutes of Health Guide for Care and Use of Laboratory Animals and approved by the local NIAAA Animal Care and Use Committee.
Behavioral testing
Apparatus and pre-training
Testing procedures were based on those previously reported [9,19,20] using the Bussey-Saksida Touch Screen System (model 80614, Lafayette Instruments, Lafayette, IN, USA). Prior to testing, body weight was reduced to 85% of free-feeding weight and maintained at that level throughout testing to motivate responding. Reward was a 14 mg food pellet (#F05684, BioServ, Frenchtown, NJ, USA), provided first in the home cage and then in the test chamber for 30 minutes to acclimate mice to the training environment and the reward (~10 reward pellets/mouse). Prior to discrimination, mice were trained to associate the dispensing of reward with presentation of a 2-second, 65-dB tone and illumination of the magazine light, to initiate each trial with a head entry into the food magazine upon illumination, to touch 1 of 2 touchscreen windows containing a 6.5 cm² stimulus (selected randomly from a catalogue) to receive a reward, and to avoid indiscriminate touchscreen responding (i.e., touches at a blank window).
Two novel 6.5 cm² stimuli (‘fan’ and ‘marbles’) (Figure 1A) were presented simultaneously: responses at the ‘fan’ stimulus produced a food reward on a continuous schedule of reinforcement, whereas responses at the ‘marbles’ stimulus (=‘errors’) produced no food reward and a 15-second ‘timeout’ period. Each error was followed by a correction trial (“correction”) in which the 2 stimuli were presented in the same spatial configuration. The next trial proper could not begin until a correct response was made on a correction trial. Mice were given 30 trials (excluding any correction trials) per session (1 session per day) until they attained a performance criterion of ≥85% correct responses on two consecutive sessions. Dependent measures included sessions to criterion, total trials, percent correct responses (=100*(correct choices/total choices)), total errors, total correction trials (=corrections), and latency to choice and reward.
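The percent-correct measure and the two-consecutive-session criterion described above can be sketched as follows. This is a minimal illustration rather than the software used in the study; the function names `percent_correct` and `sessions_to_criterion` are hypothetical. With 30 trials per session a score of exactly 85% cannot occur (26/30 is about 86.7%), so a strict and an inclusive comparison against the threshold give the same result.

```python
def percent_correct(correct_choices, total_choices):
    """Percent correct = 100 * (correct choices / total choices).

    Correction trials are assumed to be excluded from both counts.
    """
    if total_choices <= 0:
        raise ValueError("total_choices must be positive")
    return 100.0 * correct_choices / total_choices


def sessions_to_criterion(session_scores, threshold=85.0, run_length=2):
    """Return the 1-based number of the session that completes the first run of
    `run_length` consecutive sessions at or above `threshold`.

    Returns None if the criterion is never met.
    """
    run = 0
    for session_number, score in enumerate(session_scores, start=1):
        # Extend the run on a passing session, otherwise start over.
        run = run + 1 if score >= threshold else 0
        if run == run_length:
            return session_number
    return None


if __name__ == "__main__":
    # Hypothetical counts of correct choices out of 30 trials per session.
    scores = [percent_correct(c, 30) for c in (15, 20, 26, 24, 27, 28)]
    print(sessions_to_criterion(scores))
```

In this made-up example the single passing session (26/30) followed by a failing one (24/30) does not count, and criterion is met only after the two final sessions.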
On the session following attainment of discrimination criterion, the designation of stimuli as correct versus incorrect was reversed for each mouse (correct = marbles, incorrect = fan). Mice were trained on 30-trial daily sessions to a criterion of ≥85% correct responding (excluding correction trials) over 2 consecutive sessions. Dependent measures included sessions to criterion, total trials, percent correct responses (=100*(correct choices/total choices)), total errors, total correction trials (=corrections), and latency to choice and reward. Because early reversal (unlike discrimination) is characterized by perseveration at the previously rewarded stimulus, whereas late reversal is dominated by learning about the newly rewarded stimulus [18–20], data were also split into early and late reversal sessions (Figure 2A). This was accomplished for each mouse by averaging all sessions on which performance was <50% correct (=early) and ≥50% correct (=late). Additionally, the number of discrimination sessions was correlated with the number of trials, errors, and correction trials in late reversal across all mice.
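The early/late split can be illustrated with a short sketch, assuming each reversal session for a given mouse is stored as a (percent correct, measure) pair, where the measure is any per-session count such as errors. The function name `split_early_late` and the data layout are assumptions made for illustration only.

```python
from statistics import mean


def split_early_late(sessions, cutoff=50.0):
    """Average a per-session measure separately over early and late reversal.

    `sessions` is a list of (percent_correct, value) pairs for one mouse.
    Sessions below `cutoff` percent correct count as early reversal; sessions
    at or above it count as late reversal. A phase with no sessions yields None.
    """
    early = [value for pct, value in sessions if pct < cutoff]
    late = [value for pct, value in sessions if pct >= cutoff]
    early_mean = mean(early) if early else None
    late_mean = mean(late) if late else None
    return early_mean, late_mean


if __name__ == "__main__":
    # Hypothetical mouse: (percent correct, errors) for four reversal sessions.
    example = [(20.0, 25), (40.0, 15), (60.0, 10), (90.0, 4)]
    print(split_early_late(example))
```

For the made-up mouse above, the two sessions below 50% correct average 20 errors and the two sessions at or above 50% average 7 errors.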
Statistical analysis
Data were analyzed with t-tests corrected for multiple comparisons with the Holm-Sidak method and Pearson correlation, as appropriate, using GraphPad Prism v7.
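For readers unfamiliar with the Holm-Sidak correction, the sketch below shows the standard step-down form of the adjustment: the i-th smallest of m P values is adjusted to 1 − (1 − p)^(m − i + 1), and adjusted values are forced to be non-decreasing in rank order. This is a generic textbook version with a hypothetical function name (`holm_sidak`); it is assumed, not confirmed, to correspond to the procedure applied in GraphPad Prism.

```python
def holm_sidak(p_values):
    """Return Holm-Sidak step-down adjusted P values in the original order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest P value is corrected for m tests, the next for m - 1, etc.
        candidate = 1.0 - (1.0 - p_values[idx]) ** (m - rank)
        # Enforce monotonicity so a larger raw P never gets a smaller adjusted P.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


if __name__ == "__main__":
    # Hypothetical unadjusted P values for three related comparisons.
    print(holm_sidak([0.01, 0.04, 0.03]))
```

One consequence of the monotonicity step is that two comparisons with different test statistics can share the same adjusted P value.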
Impaired discrimination in DAT-NR1 null mutant mice
The number of sessions necessary to reach criterion during discrimination was significantly greater in DAT-NR1 null mutant mice than in non-mutant littermate controls (t12 = 2.889, P = 0.013) (Figure 1B). The impairment in the mutant mice was further demonstrated by a difference between the genotypes in the session-to-criterion survival curves (Log-rank test, χ2(1, n = 14) = 5.062, P = 0.025) (Figure 1C). There was also a significant increase in total trials in the mutant mice (t12 = 3.088, P = 0.028), but no significant differences in errors (t12 = 2.480, P = 0.057) or correction trials (t12 = 1.696, P = 0.116) (Figure 1D); these measures might have reached statistical significance with a larger sample size. The mutant mice performed (measured as percent correct responses) as well on the task as controls for the first 5 sessions, at which point performance of the mutants plateaued, while controls progressed to criterion. One mutant failed to meet criterion even after 30 sessions and was consequently not advanced to reversal testing. There were no statistical differences between genotypes in the latency to respond (correct: t12 = 1.724, P = 0.209; incorrect: t12 = 2.031, P = 0.183) or to collect the reward (t12 = 1.719, P = 0.209), though there was a trend for longer latencies in the mutant mice (Figure 1E).
Improved late reversal learning in DAT-NR1 null mutant mice
Genotypes did not differ in sessions (t11 = 1.373, P = 0.197) (Figure 2B), survival curves (Log-rank test, χ2(1, n = 13) = 1.947, P = 0.163) (Figure 2C), total trials (t11 = 1.620, P = 0.249), errors (t11 = 1.944, P = 0.216), or correction trials (t11 = 1.396, P = 0.249) to reversal criterion (Figure 2D). Latency to collect the reward was unchanged (t11 = 0.192, P = 0.851), but the null mutant mice were slower than controls to make both correct (t11 = 3.288, P = 0.014) and incorrect (t11 = 3.714, P = 0.010) responses (Figure 2E). Splitting the data into early (<50% correct) and late (≥50% correct) reversal sessions revealed that the mutant mice completed fewer trials (t11 = 3.110, P = 0.029) (Figure 2F) and made fewer errors (t11 = 2.672, P = 0.043) (Figure 2G) than controls during late reversal, but not during early reversal (trials: t11 = 0.465, P = 0.651; errors: t11 = 0.686, P = 0.507). Though the number of correction trials at late reversal was not statistically different between genotypes, the difference became significant (t10 = 2.878, P = 0.033) on removal of 1 outlying value from the control group (Grubbs test, P < 0.05) (Figure 2H). With this outlier removed, the number of discrimination sessions completed also correlated significantly with multiple measures of performance during late reversal, including trials (r = −0.761, P = 0.004), errors (r = −0.835, P = 0.001), and correction trials (r = −0.880, P < 0.001).
The current findings demonstrate that reductions in phasic DA release, as produced by selective deletion of the NMDA-NR1 subunit on DAergic neurons, produce an impairment in pairwise visual discrimination learning, but a facilitation of the ability to acquire a reversal of stimulus-reward contingencies. DAT-NR1 null mutant mice and their non-mutant control littermates performed equally during discrimination until performance reached about 70% correct responses (5 sessions). At this point, the mutants plateaued in their performance and were slow to attain the a priori criterion of 85% correct. These data show that NMDA receptor-mediated modulation of phasic DA is not necessary for early discrimination learning, but is important for the establishment of a high level of discrimination performance. The finding that null mutant mice can learn the basic task structure but are impaired at acquiring the discrimination is consistent with previous studies using this model of reduced DA release [3,17] as well as theories suggesting a critical role for DA in reinforcement learning [22,23].
Earlier studies have concluded that striatal DAergic signaling plays a facilitatory role in reversal learning [7–15]. Surprisingly, however, although DAT-NR1 null mutant mice completed nearly twice as many trials as the controls during discrimination, there were no genotype differences in behavior during early reversal, when performance was below 50% correct. This could suggest that the strength of the discrimination memory was equivalent across the genotypes, because a weaker memory for the old contingencies would presumably have produced less perseveration at the old CS+ after the reversal [24–26].
During late reversal sessions, when correct choice was greater than 50% (likely reflecting low perseveration and new learning) null mutant mice made fewer errors and completed fewer correction trials than controls, indicative of superior performance at the late reversal stage. This result suggests facilitation of learning about the newly rewarded stimulus in mice with reduced phasic DA release and is consistent with prior reports observing impaired performance in mice with increased striatal DA levels due to genetic knockdown of the dopamine transporter (DAT) [27–29] or deletion of D2 autoreceptors [30] (but see [16]). Finally, while we did observe a longer latency to make a choice during reversal in null mutant mice, reward collection latency was unchanged, suggesting motor behaviors were preserved despite reduced phasic DA release.
There are a number of potential interpretations for the apparently contradictory pattern of impaired discrimination but improved reversal. One is that the more extensive training the mutants received during discrimination produced an overtraining effect of the kind that has been associated with faster reversal [31–33]. This possibility is supported by the observation that the number of discrimination sessions correlated negatively with the number of trials, errors, and correction trials in late reversal. Conversely, improved reversal could be read as evidence that the discrimination memory was indeed weaker but that, for reasons that are unclear, this did not manifest as reduced perseveration earlier in testing. Another, non-exclusive, possibility is that the improvement could stem from attenuated detection of the expectancy violations present during reversal, such that mutants are in effect able to learn the reversed contingencies with less interference from the prior outcome expectancies. In this context, VTA DA neurons are known to be responsive (increase or decrease their firing) to aversive events, including the absence of reward [34–41], and we recently reported DA release in the VTA → NAc pathway when choices are unexpectedly rewarded during reversal in this same task [16]. DAT-NR1 null mutant mice also exhibit reduced DA release to tail pinch and impairments when learning requires avoiding (Morris water maze) or responding to (Pavlovian fear conditioning) aversive stimuli [3,5,6], suggesting the loss of phasic DA neuronal firing might blunt the detection of negative events more generally.
This interpretation remains speculative in the absence of further studies and there are a number of other noteworthy caveats to the current dataset. First, NR1 deletion on DAT neurons reduces, but does not eliminate, phasic DA release [5] and thus any residual phasic DA signaling could obscure effects on behavior in the discrimination and reversal task. Related to this, because DA release was not measured in the mutants as they were performing this task, we have no direct insight into the precise DAergic correlates of the behavioral abnormalities observed. Further, the mutant model studied here is not restricted to any particular DAergic neuronal pathway, which limits any inferences regarding the importance of specific circuits (e.g., cortical versus striatal) known to play dissociable roles in this task [20,42]. We also cannot rule out the possibility that reduced DA signaling in the retina could have caused visual impairments in the mutant mice that affected their performance on the task [43]. Finally, because our study was not sufficiently powered to detect sex differences in the results and there is little data regarding sex differences in reversal learning, we cannot discount the possibility that the effects of NR1 deletion on reversal are influenced by sex.
In sum, the current study found that reductions of phasic DA release caused by deletion of the obligatory NMDA receptor NR1 subunit on DAT-expressing cells produced significant and complex disturbances in visual discrimination and reversal. In view of previous, sometimes discrepant reports concerning the role of NMDARs in behavioral flexibility [20,26,44–55], these findings highlight the potentially distinct contributions of NMDARs in different brain regions and neuronal populations to flexible behavior and add to the literature implicating DA in a wide range of cognitive and reward-related behavioral functions.
NMDARs on DAT+ neurons were genetically deleted in mice.
Mice were tested in a pairwise discrimination and reversal learning paradigm.
DAT-NR1 null mutants were impaired at discrimination compared to controls.
Mutants performed better than controls during late reversal (performance ≥ 50%).
These data add to evidence that DA is involved in a range of reward-related behaviors.
Funding sources and acknowledgements
Research supported by the National Institute on Alcohol Abuse and Alcoholism Intramural Research Program and R01MH094536 to LSZ.
