Abstract
The basal ganglia (BG) mediate certain types of procedural learning, such as probabilistic classification learning on the ‘weather prediction task’ (WPT). Patients with Parkinson's disease (PD), who have BG dysfunction, are impaired at WPT‐learning, but it remains unclear what component of the WPT is important for learning to occur. We tested the hypothesis that learning through processing of corrective feedback is the essential component and is associated with release of striatal dopamine. We employed two WPT paradigms, either involving learning via processing of corrective feedback (FB) or in a paired associate manner (PA). To test the prediction that learning on the FB but not PA paradigm would be associated with dopamine release in the striatum, we used serial 11C‐raclopride (RAC) positron emission tomography (PET) to investigate striatal dopamine release during FB and PA WPT‐learning in healthy individuals. Two groups, FB (n = 7) and PA (n = 8), underwent RAC PET twice, once while performing the WPT and once during a control task. Based on a region‐of‐interest approach, striatal RAC‐binding potentials decreased by 13–17% in the right ventral striatum when performing the FB compared to control task, indicating release of synaptic dopamine. In contrast, right ventral striatal RAC binding non‐significantly increased by 9% during the PA task. While differences between the FB and PA versions of the WPT in effort and decision‐making are also relevant, we conclude that striatal dopamine is released during FB‐based WPT‐learning, implicating the striatum and its dopamine connections in mediating learning with FB. Hum Brain Mapp 35:5106–5115, 2014. © 2014 The Authors. Human Brain Mapping Published by Wiley Periodicals, Inc.
Keywords: basal ganglia, 11C‐raclopride positron emission tomography, non‐motor skill learning, probabilistic learning, procedural learning, weather prediction task
INTRODUCTION
Historically, declarative (explicit and conscious) and procedural (implicit and unconscious) forms of learning were considered to be based on two distinct memory systems, mediated by the medial temporal lobe (MTL) structures and the basal ganglia, respectively (Butters et al., 1985; Cohen and Squire, 1980; Knowlton et al., 1996a). These are now known to interact and act cooperatively or competitively during different stages and conditions of learning (Foerde et al., 2006; Moody et al., 2004; Poldrack et al., 2001b, 1999; Seger and Cincotta, 2006; Wang et al., 2010).
One task that has been employed to study incidental learning in man is the weather prediction task (WPT), initially employed by Knowlton et al. (1994). The WPT is a probabilistic classification task involving incremental learning over many trials, also considered to occur without any explicit knowledge. On each trial, participants are presented with a particular arrangement of 1, 2, or 3 of 4 possible tarot cards (Fig. 1), each of which shows a different pattern (e.g., squares, diamonds, circles, or triangles). Participants are required to use the cards presented on each trial to predict a binary outcome: whether the weather will be rainy or fine. Each of the four cards is independently associated with the two possible outcomes with a fixed probability, and overall the outcomes occur with equal frequency. For example, the squares, diamonds, circles and triangles, respectively, predict the outcome ‘fine’ with a fixed probability of 0.2, 0.4, 0.6, and 0.8. Typically, participants perform around 200 trials of the WPT with corrective feedback (FB) on each trial. The feedback consists of a ‘thumbs up’ or ‘thumbs down’ message following correct and incorrect responses, respectively. By learning the independent cue‐outcome associations across trials, healthy participants can improve their predictive accuracy to well above chance across training trials on the WPT.
In one early study, Knowlton et al. (1996a) observed a double dissociation between the WPT learning of amnesic patients with MTL damage and Parkinson's Disease (PD) patients with striatal dysfunction, with only the latter group showing impaired learning on this task. This was supported by subsequent studies which confirmed impaired probabilistic classification learning in both patients with PD (Jahanshahi et al., 2010; Knowlton et al., 1996a; Shohamy et al., 2004; Wilkinson et al., 2008, Experiment 2; Witt et al., 2002) and Huntington's disease (HD) (Holl et al., 2012; Knowlton et al., 1996b).
Shohamy et al. (2004) reported that while PD patients were impaired relative to controls on the ‘standard’ version of probabilistic classification learning which involves learning with corrective feedback (i.e., FB‐based), they were unimpaired on a paired associate (PA) version of the task, where learning occurred via observation and without corrective feedback. It was concluded from these findings that PD patients are impaired at probabilistic classification learning because it entails learning with corrective feedback rather than because incidental learning per se recruits the BG. However, Wilkinson et al. (2008) failed to observe a selective deficit in FB relative to PA‐based WPT learning in PD patients, and instead found impairments in both PA and FB learning in PD that were related to the severity of disease, with less severe patients being unimpaired at WPT learning. Furthermore, it has been demonstrated that relative to controls, patients with PD were impaired on the standard WPT task when tested on dopaminergic medication but not in the off state (Jahanshahi et al., 2010). Therefore, based on these results it is likely that the presence or absence of corrective feedback, disease severity and medication state are all important in determining whether PD patients are impaired at probabilistic classification learning on the WPT.
However, a more recent study from our group did demonstrate a selective deficit in HD patients for FB relative to PA‐based WPT learning (Holl et al., 2012). Furthermore, as the HD patients we tested in our study were not on dopaminergic medication and were in the early stages of the disease, these results constitute more convincing evidence that there is selective recruitment of the striatum during WPT learning with feedback relative to PA‐based learning.
No studies have investigated in vivo the role of dopamine in modulating learning on the FB and PA versions of the WPT, and the aim of the present study was to do this. 11C‐raclopride (RAC) positron emission tomography (PET) provides an in vivo measurement of striatal post‐synaptic dopamine D2 receptor availability. The RAC binding potential (BPND) (for a consensus on nomenclature see Innis et al., 2007) is inversely related to the concentration of endogenous synaptic dopamine at the time of scanning (Laruelle, 2000). In this study, we employed a two‐scan RAC PET protocol, with each participant performing an active task (WPT–FB or PA) during one scan and a control task during the other. This allowed us to compare striatal dopamine release during probabilistic classification learning on the FB and PA versions of the WPT. Based on the results of previous behavioral and imaging studies, we predicted that the FB but not the PA version would be associated with a significant decrease in striatal RAC binding relative to the control task.
MATERIALS AND METHODS
Participants
Fifteen healthy volunteers were recruited; none of the participants had any neurological disorder or history of psychiatric illness, drug or alcohol abuse, or were on any drug treatments that might influence performance. Participants were asked not to smoke or drink caffeinated drinks for at least 12 h prior to the scan, although we did not control for their average daily consumption of caffeine or nicotine. Participants completed the National Adult Reading Test (NART) to obtain estimates of premorbid IQ, and the Mini‐Mental State Examination (MMSE) (Folstein et al., 1975) and the Beck Depression Inventory (BDI) (Beck, 1978) to screen for cognitive impairment and depression, respectively. The study was approved by the Research Ethics Committee of Hammersmith, Queen Charlotte's and Chelsea and Acton Hospitals Trust. Permission to administer radioactive substances was granted by the Administration of Radioactive Substances Advisory Committee of the UK. All participants gave written informed consent to take part in this study in accordance with the Declaration of Helsinki. Participants were randomly assigned to the following groups: FB (n = 7, 3 female) aged 45–70 years (M = 56.86, SD = 8.7) or PA group (n = 8, 3 female) aged 45–69 years (M = 55.38, SD = 8.9). Information about the groups is presented in Table 1.
Table 1.
Measure | FB group (n = 7) | PA group (n = 8) | P |
---|---|---|---|
Age (years) | 56.86 ± 8.7 | 55.38 ± 8.9 | 0.75 |
Years of education | 12.00 ± 2.0 | 14.25 ± 4.3 | 0.29 |
Mini‐mental state examination (0–30) | 29.69 ± 0.5 | 29.00 ± 1.7 | 0.37 |
Premorbid IQ estimate from NART | 112.17 ± 7.8 | 114.67 ± 11.8 | 0.67 |
Beck Depression Inventory (0–63) | 1.50 ± 1.9 | 3.00 ± 4.3 | 0.45 |
Amount of 11C‐raclopride injected (MBq) | 291.48 ± 25.2 | 297.06 ± 27.7 | 0.72 |
FB = feedback, PA = paired associate, WPT = weather prediction task, NART = National Adult Reading Test.
Numbers shown are mean ± standard deviation.
Apparatus and Materials
The stimulus materials were drawn from a set of four tarot cards, each with a different geometric pattern (e.g., squares, diamonds, circles, triangles), arranged horizontally across the middle of the computer screen in black against a white background (Fig. 1).
During each condition (FB and PA), there were 400 training trials; we used more trials than usual to adapt to the duration of the scanning procedure. On each trial, participants were presented with a particular arrangement of cards comprising one, two or three of the four possible tarot cards. There were 14 possible arrangements of these cards, as the four card and no card patterns were not used. Each arrangement of cards was associated with one of two outcomes (rainy or fine) and overall these two outcomes occurred with equal frequency. The learning set was constructed such that each individual card was associated with an outcome with a fixed independent probability. For example, the fixed probability that the outcome was rainy was 0.2 if squares (card 1) were present, 0.4 if diamonds (card 2) were present, 0.6 if circles (card 3) were present and 0.8 if triangles (card 4) were present. The probability assigned to each card was counterbalanced and the probability of an outcome on a particular trial was based on the combined probability of the cards present (see Table 2 for the 14 possible card arrangements employed in the task, along with the probability of the outcome for each of the 14 patterns).
Table 2.
Arrangement | Card 1 | Card 2 | Card 3 | Card 4 | Mean total | P (arrangement) | P (fine or hot/arrangement) |
---|---|---|---|---|---|---|---|
A | 0 | 0 | 0 | 1 | 38 | 0.095 | 0.895 |
B | 0 | 0 | 1 | 0 | 18 | 0.045 | 0.778 |
C | 0 | 0 | 1 | 1 | 52 | 0.13 | 0.923 |
D | 0 | 1 | 0 | 0 | 18 | 0.045 | 0.222 |
E | 0 | 1 | 0 | 1 | 24 | 0.06 | 0.833 |
F | 0 | 1 | 1 | 0 | 12 | 0.03 | 0.5 |
G | 0 | 1 | 1 | 1 | 38 | 0.095 | 0.895 |
H | 1 | 0 | 0 | 0 | 38 | 0.095 | 0.105 |
I | 1 | 0 | 0 | 1 | 12 | 0.03 | 0.5 |
J | 1 | 0 | 1 | 0 | 24 | 0.06 | 0.167 |
K | 1 | 0 | 1 | 1 | 18 | 0.045 | 0.556 |
L | 1 | 1 | 0 | 0 | 52 | 0.13 | 0.077 |
M | 1 | 1 | 0 | 1 | 18 | 0.045 | 0.444 |
N | 1 | 1 | 1 | 0 | 38 | 0.095 | 0.105 |
Total | | | | | 400 | 1.00 | |
0 = card absent, 1 = card present.
In summary, two cards were predictive of rainy weather, one strongly (card 4), one weakly (card 3), and two cards were predictive of fine weather, one strongly (card 1), one weakly (card 2). Overall, participants experienced identical arrangement frequencies (order randomized for each participant) but the actual outcomes could differ slightly across participants. The positions of the cards on the screen were held constant within participants, but counterbalanced across participants.
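The structure of the learning set can be checked directly from Table 2. The sketch below is illustrative only: it transcribes the 14 arrangements, confirms that they sum to 400 trials, and derives the overall and card‐wise outcome probabilities. The outcome is labelled "fine" here simply because that is how the last column of Table 2 is headed; the assignment of probabilities to cards was counterbalanced, so the labels should be read as arbitrary. The names `TABLE_2`, `total_trials`, `p_fine_overall`, and `p_fine_given_card` are hypothetical.

```python
# Table 2 transcribed as: arrangement -> (cards 1-4 present, trials out of 400,
# P(outcome | arrangement) as given in the last column of the table).
TABLE_2 = {
    "A": ((0, 0, 0, 1), 38, 0.895),
    "B": ((0, 0, 1, 0), 18, 0.778),
    "C": ((0, 0, 1, 1), 52, 0.923),
    "D": ((0, 1, 0, 0), 18, 0.222),
    "E": ((0, 1, 0, 1), 24, 0.833),
    "F": ((0, 1, 1, 0), 12, 0.500),
    "G": ((0, 1, 1, 1), 38, 0.895),
    "H": ((1, 0, 0, 0), 38, 0.105),
    "I": ((1, 0, 0, 1), 12, 0.500),
    "J": ((1, 0, 1, 0), 24, 0.167),
    "K": ((1, 0, 1, 1), 18, 0.556),
    "L": ((1, 1, 0, 0), 52, 0.077),
    "M": ((1, 1, 0, 1), 18, 0.444),
    "N": ((1, 1, 1, 0), 38, 0.105),
}


def total_trials(table):
    """Number of trials summed over all arrangements."""
    return sum(n for _, n, _ in table.values())


def p_fine_overall(table):
    """Expected proportion of trials with the tabulated outcome."""
    expected = sum(n * p for _, n, p in table.values())
    return expected / total_trials(table)


def p_fine_given_card(table, card_index):
    """Outcome probability over all trials on which one card (0-based) is shown."""
    rows = [(n, p) for cards, n, p in table.values() if cards[card_index] == 1]
    shown = sum(n for n, _ in rows)
    return sum(n * p for n, p in rows) / shown


print(total_trials(TABLE_2))
print(round(p_fine_overall(TABLE_2), 3))
print([round(p_fine_given_card(TABLE_2, i), 2) for i in range(4)])
```

Run on the tabulated values, each card appears on 200 of the 400 trials, the two outcomes are equally likely overall, and the card‐wise probabilities come out at approximately 0.2, 0.4, 0.6, and 0.8 for cards 1 to 4.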
Procedure
All participants underwent RAC PET twice within a 4‐week period. During one of the PET sessions, participants completed either the FB or the PA version of the WPT, and during the other session they all performed the control task. Fifty percent of participants in each group did the control task first and the remainder did the reverse.
WPT‐FB Condition
On each trial participants were presented with an arrangement of cards (see Apparatus & materials); the cards appeared on the screen for a total of 7 s. During this time, participants were asked to predict the weather on that trial, which required them to classify the card arrangement into one of the two possible outcomes, rainy or fine; responses were made via two response buttons on a response pad. Following their response, feedback appeared on the screen depending on whether the response was correct (thumbs up) or incorrect (thumbs down) (Fig. 2). The feedback and the card arrangement both remained on the screen for the remainder of the 7‐s period, after which they disappeared. This was followed by a blank screen for 2 s before the next combination of cards was presented. If participants failed to make a response, the card arrangement remained on the screen for the full 7 s but no feedback was provided. There were 400 training trials. The task started 5 min before injection of tracer and ended 5 min before completion of RAC PET (total duration 60 min). Participants' performance on the last 50 training trials was used to assess how much they had learned the cue‐outcome probabilities.
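The stated task duration follows from the trial timing. As a quick arithmetic check (the function name `session_minutes` is a hypothetical helper, not part of the study software):

```python
STIMULUS_S = 7   # cards (and any feedback) on screen, in seconds
BLANK_S = 2      # blank screen before the next trial, in seconds
N_TRIALS = 400   # training trials per session


def session_minutes(n_trials, stimulus_s, blank_s):
    """Total task time in minutes for fixed-length trials."""
    return n_trials * (stimulus_s + blank_s) / 60


print(session_minutes(N_TRIALS, STIMULUS_S, BLANK_S))  # 400 trials x 9 s
print(session_minutes(50, STIMULUS_S, BLANK_S))        # one block of 50 trials
```

With 9 s per trial, 400 trials take 60 min and each block of 50 trials takes 7.5 min.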
WPT‐PA Condition
On each trial participants were presented with an arrangement of cards (see Apparatus & materials) and at the same time as the cards appeared on the screen, participants were shown the outcome for that arrangement, for example, rainy or fine; thus, no classification was required. Both card arrangements and outcomes remained on the screen for a fixed period of 7 s. During this time, participants were required to press a response button with their right index finger to indicate that they had seen the card arrangements/outcomes, but this response did not influence the timing of the presentation of the stimuli, which was always fixed. There was then a two‐second blank screen before the next combination of cards was shown. There were 400 trials. The training phase task started 5 min before injection of tracer and ended 5 min before completion of RAC PET (total duration 60 min). In addition, there was a separate test phase which consisted of a further 42 trials. These test trials were completed immediately after completion of the PET scan. Participants were required to predict outcomes based on the given combination of cards but they did not receive any feedback. Responses were self‐paced during the test phase. Participants' performance on the test trials was used to assess how much they had learned the cue‐outcome probabilities.
Control Condition
As for the FB and PA conditions, the control task comprised 400 trials that were completed while participants had a RAC PET scan. On each trial participants were presented with an arrangement of between one and three of four possible cards, in the same positions on the screen as the card arrangements that were used in the experimental conditions. However, here the patterns on the four cards were identical and were not related to any outcomes or followed by corrective feedback. The card arrangements remained on the screen for a fixed period of 7 s, after which they disappeared, and the next card arrangement appeared after 2 s. As for the PA condition, participants were required to press a response button with their right index finger to indicate they had seen the card arrangements. There was no test phase.
PET Scanning
PET was performed using an ECAT EXACT HR+ (CTI/Siemens 962, Knoxville, TN) tomograph with a total axial field of 15.5 cm. Sixty‐three transaxial image planes were displayed as 2.46 mm slices with a reconstructed axial resolution of 5.4 mm and a transaxial resolution of 5.6 mm (Brix et al., 1997). A 10‐min transmission scan was performed prior to injection of the tracer to correct for tissue attenuation of 511 keV gamma radiation. Dynamic emission scans were acquired in three‐dimensional mode. The mean injected doses of RAC for each group are listed in Table 1. Scanning began at the start of tracer infusion, generating 20 periods over 60 min. A laptop was used to present the WPT or control task to the participants, and the tasks commenced 5 min before the injection of RAC. RAC was supplied by Hammersmith Imanet.
Image Analysis
Parametric images of RAC BPND were generated using a basis function implementation of the simplified reference tissue model using cerebellar cortex to estimate non‐specific tracer uptake (Gunn et al., 1997). An image of integrated RAC signal from 0 to 60 min (an “ADD” or summed image) was also created for each participant. The ADD images were then spatially normalized to an in‐house RAC template in standard stereotaxic (MNI) space using statistical parametric mapping (SPM2) software (Wellcome Functional Imaging Laboratory, London). The transformation matrices were then applied to the corresponding RAC parametric image. A standard region‐of‐interest (ROI) object map that outlined putamen, heads of caudate nucleus and ventral striatum was defined on the RAC template with magnetic resonance imaging guidance (Whone et al., 2004). The ROI object map was then applied to the individual RAC parametric images to sample RAC BPND. The investigator (YFT) analyzing the scans was blinded to the task associated with each scan.
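The final sampling step, applying an ROI object map to a parametric image, amounts to averaging BPND over the voxels that carry each region label. The following is a minimal sketch of that idea only, assuming the image and the label map have already been brought into the same space and flattened to equal‐length lists. It is not the software used in the study; the function name `sample_roi_means` and the toy values are hypothetical.

```python
from collections import defaultdict


def sample_roi_means(bp_values, roi_labels):
    """Mean BPND per ROI label; voxels labelled None are treated as background."""
    if len(bp_values) != len(roi_labels):
        raise ValueError("image and ROI map must have the same number of voxels")
    sums = defaultdict(float)
    counts = defaultdict(int)
    for bp, label in zip(bp_values, roi_labels):
        if label is None:
            continue  # voxel lies outside every region of interest
        sums[label] += bp
        counts[label] += 1
    return {label: sums[label] / counts[label] for label in sums}


# Toy example with made-up voxel values (illustrative only).
toy_bp = [0.1, 1.6, 1.8, 2.6, 2.7, 0.0]
toy_map = [None, "R ventral striatum", "R ventral striatum",
           "R putamen", "R putamen", None]
print(sample_roi_means(toy_bp, toy_map))
```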
Statistical Analysis
In all analyses involving repeated measures ANOVA, if there was a violation of the sphericity assumption, Pillai's multivariate test of significance was employed (V). Thus, if the Greenhouse–Geisser epsilon was less than 1.0, Pillai's exact F is reported. A significance criterion of α = 0.05 was used, unless otherwise specified. All significance levels reported are two‐tailed.
Behavioral Data
To establish learning in both groups, overall performance, indexed by mean proportion correct across the 42 trials of the test phase for the PA group and across trials 351–400 for the FB group, was compared to chance (50%; i.e., two possible outcomes) using one‐sample t tests. These measures were also compared with each other using an independent‐samples t test. α = 0.02 was used, following Bonferroni corrections.
For the feedback condition, mean proportion correct across 8 blocks of 50 trials was analyzed with a general linear model: a repeated measures ANOVA with Block (8 levels) as a within‐subject factor. In addition, to establish the timing of the emergence and progression of learning across blocks in this condition, mean proportion correct per block was compared to chance (50%) across all blocks using one‐sample t tests. α = 0.01 was used, following Bonferroni corrections.
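The block‐wise comparison against chance can be sketched as follows. This is an illustration of the computation described above, not the authors' analysis code: the function names are hypothetical and the example scores are invented. Only the t statistic is computed, since the Python standard library has no t distribution from which to derive a P value.

```python
from math import sqrt
from statistics import mean, stdev


def block_accuracy(correct, block_size=50):
    """Proportion correct in each consecutive block of trials (1 = correct, 0 = not)."""
    blocks = [correct[i:i + block_size] for i in range(0, len(correct), block_size)]
    return [sum(block) / len(block) for block in blocks]


def one_sample_t(values, mu=0.5):
    """One-sample t statistic against a reference mean (chance = 0.5); df = n - 1."""
    n = len(values)
    return (mean(values) - mu) / (stdev(values) / sqrt(n))


def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance threshold under Bonferroni correction."""
    return alpha / n_comparisons


# Invented block-3 scores for seven participants (illustrative only).
example_scores = [0.62, 0.70, 0.74, 0.58, 0.66, 0.72, 0.50]
print(round(one_sample_t(example_scores), 2))
print(bonferroni_alpha(0.05, 8))  # eight block-wise comparisons
print(bonferroni_alpha(0.05, 3))  # three overall comparisons
```

Dividing 0.05 by eight comparisons gives 0.00625, and by three comparisons about 0.017.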
Striatal 11‐C‐Raclopride Binding
First, we established that the baseline mean striatal 11C‐raclopride binding potentials were comparable across the FB and PA groups in the control condition alone. Using a general linear model, an ANOVA was performed on mean RAC BPND during the control condition with brain Region (right and left: caudate vs. putamen vs. ventral striatum, i.e., 6 levels) as a within‐subjects variable and Group (FB vs. PA) as a between‐groups variable.
Second, to establish task related changes in mean RAC BPND across regions using a general linear model, an ANOVA was performed on mean RAC BPND with Region and Condition (active vs. control) as within subjects variables and Group (FB vs. PA) as a between groups variable.
This was followed up where appropriate by post‐hoc comparisons of RAC BPND across ROIs including independent and paired samples t tests.
RESULTS
There was no difference between the two groups in terms of age, years of education, NART estimated premorbid IQ, BDI, and MMSE scores (see Table 1).
Behavioral Data
Overall learning in PA and FB groups
Figure 3 depicts mean proportion correct during the last 50 trials of the FB condition (i.e., trials 351–400) and the PA test phase, plotted separately for the two groups. Participants in both groups scored well above chance (50%) at this point in the task (FB 71% correct; PA 82%), and the performance of both groups was significantly greater than chance [FB: t(6) = 3.52, P = 0.01; PA: t(7) = 16.34, P < 0.001], indicating significant learning of the task in both groups. Performance in the PA group was comparable to the FB group [t(7.9) = −2.75].
FB‐based learning across blocks
Figure 4 shows mean proportion correct across eight blocks of 50 trials for the FB group. An ANOVA was performed on mean proportion correct with Block (1–8) as a within subjects variable. The main effect of Block was not significant [F(7,42) = 1.11, P = 0.38]. However, there was a significant linear trend across blocks [F(1,6) = 7.65, P = 0.01], reflecting the fact that proportion correct increased across trials. Participants' proportion correct was significantly better than chance for all blocks of trials from Block 3 onwards: Block 1 (trials 1–50) [t(6) = 3.31], Block 2 (trials 51–100) [t(6) = 3.08], Block 3 (trials 101–150) [t(6) = 3.72, P = 0.01], Block 4 (trials 151–200) [t(6) = 3.77, P = 0.01], Block 5 (trials 201–250) [t(6) = 4.12, P = 0.01], Block 6 (trials 251–300) [t(6) = 3.70, P = 0.01], Block 7 (trials 301–350) [t(6) = 6.60, P = 0.001], Block 8 (trials 351–400) [t(6) = 3.52, P = 0.01]. Participants in the FB group learned the task after 100 trials, after which performance remained more or less stable until the end of training, although there was a slight decrease in performance during trials 251–300 (69%) and 301–350 (68%).
Striatal 11‐C‐raclopride binding
Mean striatal RAC BPND for the ‘active’ conditions of both groups (i.e., FB or PA) and also for each groups' respective ‘control’ condition are depicted in Figure 5.
To establish that the control conditions we employed as a baseline in the FB and PA groups were comparable with respect to mean RAC BPND, as well as in terms of behavior, an ANOVA was performed on mean RAC BPND measured in the control condition with brain Region (right and left: caudate vs. putamen vs. ventral striatum, i.e., 6 levels) as a within‐subjects variable and Group (FB vs. PA) as a between‐groups variable. The main effect of Region was significant [V = 0.98, F(5,9) = 69.70, P < 0.001], indicating either that there was a difference in levels of basal DA across regions related to performance of the control task alone and/or that dopamine D2/3 receptor levels are not homogeneous across the striatum. However, neither the main effect of Group (F < 1) nor the Region × Group interaction [F(5,9) = 1.49] was significant, indicating the control conditions were comparable across the FB and PA groups and were therefore suitable for use as a baseline measure of RAC BPND.
To determine task‐related changes in mean RAC BPND across regions, an ANOVA was performed on mean RAC BPND with Region and Condition (active vs. control) as within‐subject variables and Group (FB vs. PA) as a between‐groups variable. The main effect of Region was again significant [V = 0.98, F(5,9) = 105.54, P < 0.001]. Importantly, there was also a significant three‐way Group × Condition × Region interaction [F(5,65) = 5.18, P < 0.001], indicating that the difference between RAC BPND for active and control conditions differed significantly across the regions analyzed and between the FB and PA groups. The main effect of Group [F(1,13) = 1.15], the main effect of Condition, and the Condition × Group, Region × Group, and Condition × Region interactions did not reach significance (all other Fs < 1).
In light of the significant Group × Condition × Region interaction, both within and between subject differences in RAC BPND were explored. The percentage change in RAC BPND between active and control tasks was calculated using the formula:
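The formula itself is not reproduced in this text. A conventional definition, assumed here and consistent with the reported signs (negative values for reductions in binding), is: percentage change = 100 × (BPND active − BPND control) / BPND control. The sketch below applies that assumed definition to the right ventral striatum group means reported in Table 3, purely as an illustration. Because it uses group means rather than per‐participant values, the results need not match the percentages quoted in the text. The function name `percent_change` is hypothetical.

```python
def percent_change(bp_active, bp_control):
    """Percentage change in BPND from control to active (assumed definition).

    Negative values mean lower binding during the active task, which is
    interpreted as displacement of the tracer by released dopamine.
    """
    return 100.0 * (bp_active - bp_control) / bp_control


# Right ventral striatum group means from Table 3 (control task, active task).
fb_control, fb_active = 1.79, 1.56  # FB group, n = 6
pa_control, pa_active = 1.68, 1.76  # PA group, n = 8

print(round(percent_change(fb_active, fb_control), 1))
print(round(percent_change(pa_active, pa_control), 1))
```

On these group means the assumed formula gives a change of about −12.8% for the FB group and about +4.8% for the PA group.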
Between‐group differences
Figure 6 shows mean percentage change in striatal RAC BPND, for all six regions and plotted separately for the two groups. Mean percent change in RAC BPND was significantly different between the groups in the right ventral striatum [t(13) = −2.33, P = 0.04, Cohen's d = 1.5], with the left ventral striatum on the borderline of statistical significance [t(13) = −1.88, P = 0.08]. All other comparisons between the groups were not significant (ts < 1).
Within‐subject differences
Mean RAC BPND for active relative to control conditions was compared for each group separately and for each region (Fig. 5). No significant or marginally significant within‐group differences in striatal RAC BPND were noted in the PA group. However, right ventral striatal RAC BPND increased (5.0%) during the PA task compared to the control task although the change was not statistically significant [t(7) = 1.20, p = .27]. The left ventral striatal RAC BPND also showed a non‐significant trend to increase in the PA task by 4.6% [t(7) = 1.00, p = .35]. This comparison did not detect significant changes in any other region, left caudate [t(7) = −1.55, p = .17], right caudate and right and left putamen (all ts < 1).
In contrast to the PA task, in the FB group there was a marginally significant reduction in RAC BPND in the right and left ventral striatum when performing the active task compared to the control task [13.4% reduction in the right, t(6) = −2.01, p = .09; 6.0% reduction in the left, t(6) = −2.18, p = .07], indicating release of synaptic dopamine during the FB task. For the FB group, this comparison did not trend towards significance for any other region: left putamen [t(6) = −1.15, p = .29], right putamen and right and left caudate (all ts < 1).
One participant in the feedback group achieved a very low score during the feedback training phase with an average score of just 50% correct (i.e. chance) across all 400 trials. In several previous studies of WPT learning, participants who fail to learn the task possibly due to failing to pay sufficient attention to the task throughout, have been excluded from the analysis (e.g. Shohamy et al., 2004). We repeated all of the above analyses following the exclusion of this participant in the FB group who scored at chance. Following exclusion of this participant, mean RAC BPND for active relative to control conditions was again compared for the feedback group and for each region (see Table 3). In the FB group, there was now a significant reduction in RAC BPND in the right ventral striatum when performing the active compared to the control task (17.0% reduction, t(5) = −2.73, p = .04), indicating release of synaptic dopamine during the FB task, particularly in the right ventral striatum. All other findings resulting from this new analysis were identical to the above.
Table 3.
Region | FB (n = 6): control task | FB (n = 6): active task | PA (n = 8): control task | PA (n = 8): active task |
---|---|---|---|---|
R caudate | 2.13 ± 0.07 | 2.12 ± 0.09 | 2.20 ± 0.11 | 2.16 ± 0.05 |
L caudate | 2.06 ± 0.08 | 2.03 ± 0.09 | 2.17 ± 0.11 | 2.08 ± 0.05 |
R putamen | 2.64 ± 0.08 | 2.61 ± 0.09 | 2.66 ± 0.07 | 2.66 ± 0.07 |
L putamen | 2.56 ± 0.09 | 2.48 ± 0.08 | 2.54 ± 0.07 | 2.51 ± 0.09 |
R ventral striatum | 1.79 ± 0.04* | 1.56 ± 0.10* | 1.68 ± 0.08 | 1.76 ± 0.06 |
L ventral striatum | 1.93 ± 0.07 | 1.80 ± 0.05 | 1.88 ± 0.12 | 1.98 ± 0.06 |
R = right, L = left.
Numbers shown are mean ± standard error.
* Significant comparison (P < 0.05).
Correlations
We examined the relationship between learning on each of the 8 blocks of 50 trials of the 400 trials and the reduction in RAC BPND observed in the right ventral striatum. We observed a significant negative correlation between mean percentage change in RAC BPND in the right ventral striatum and learning during trials 101–150 [ρ = ‐.81, p = .03] (Fig. 7). Correlations for all other blocks were not significant (all ps > .05). From the figure, it is clear that during trials 101–150, learning at the group level increased from around 65% to 70%, after which participants performed consistently at around 70% with little further improvement in performance. The acquisition of knowledge and the stabilization of the learning process were therefore specifically related to an increase of dopamine in the right ventral striatum.
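The ρ reported above is a rank correlation. A minimal sketch of how such a coefficient can be computed is given below, assuming Spearman's method with average ranks for ties. The function names are hypothetical and the seven data points are invented for illustration; they are not the study data.

```python
from math import sqrt
from statistics import mean


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented values: percentage change in BPND and block-3 proportion correct.
bp_change = [-30.0, -22.0, -18.0, -12.0, -9.0, -4.0, 3.0]
block3_correct = [0.80, 0.76, 0.78, 0.70, 0.64, 0.66, 0.52]
print(round(spearman_rho(bp_change, block3_correct), 2))
```

A negative ρ here means that participants with larger reductions in BPND (more negative percentage change) tended to score higher in that block.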
DISCUSSION
While previous functional MRI studies have detected significant blood‐oxygen‐level‐dependent (BOLD) signal changes in the striatum during classification learning (Poldrack et al., 2001b), our study is the first to demonstrate in vivo release of striatal dopamine during the FB version of the WPT. There was a significant 13–17% reduction in right ventral striatal RAC BPND during the FB version of the WPT compared to the control task, and no significant RAC BPND reduction during the PA version of the WPT task, consistent with a role for ventral striatal dopamine in mediating learning with feedback.
There is a substantial body of literature supporting the role of the ventral striatum in learning, including tasks that involve probabilistic classification (Atallah et al., 2007; Cools et al., 2006; Cools et al., 2001; Rodriguez et al., 2006; Seger and Cincotta, 2006). One study (Atallah et al., 2007) has shown that the ventral, but not the dorsal, striatum is involved in learning of instrumental skills in rats. Using fMRI, several studies have found that the ventral striatal activation is associated with feedback processing (Seger and Cincotta, 2006) and sensitivity to prediction error (Rodriguez et al., 2006) during classification learning. The ventral striatum is also thought to mediate probabilistic reversal learning, another feedback‐based learning task which is impaired in PD when patients are tested on medication (Cools et al., 2001), with evidence that levodopa disrupts activation of the nucleus accumbens during reversal learning in these patients (Cools et al., 2006).
We did not detect significant dopamine release in the caudate nucleus as suggested by other studies (Poldrack et al., 2001a). Poldrack and colleagues showed that the caudate activity was low early in the FB version of the WPT, and increased as learning progressed. PET measures average dopamine release throughout the duration of the scan (in our study, 60 min). It is possible that due to its fluctuating temporal activity, we did not detect a significant increase in ‘average’ dopamine release in the caudate nucleus.
It is likely that a mesencephalic dopamine network, arising from the ventral tegmental area and projecting to ventral striatum and orbitofrontal/medial frontal cortices, underlies classification learning (Aron et al., 2004). In their comprehensive review on the role of basal ganglia in probabilistic classification learning, Shohamy and colleagues suggested that dopamine can modulate feedback‐based learning at two complementary levels: (1) at a synaptic level via stimulus‐specific phasic release; (2) at circuit level, via the relative balance of dopamine levels within sub regions of cortico‐striatal circuits (Shohamy et al., 2008). Evidence from our lab has shown that compared to matched healthy controls, PD patients are impaired on the FB version of WPT only on medication but not when tested off medication (Jahanshahi et al., 2010). This suggests a dopamine ‘overdose’ effect when tested on medication, and supports the proposal that tonic increase of dopamine with dopaminergic medication masks phasic changes in dopamine release essential for classification learning (Jahanshahi et al., 2010; Shohamy et al., 2006; Shohamy et al., 2008).
Our findings show that there is a selective increase in dopamine release in the ventral striatum during WPT learning with FB but not during PA‐based learning. These results are consistent with the demonstration of a selective deficit in FB relative to PA‐based learning that has been observed before both in PD patients (Shohamy et al., 2004) and also in HD patients (Holl et al., 2012). Increased striatal dopamine release has been observed in a demanding mental arithmetic task (Pruessner et al., 2004) and in relation to the uncertainty of the decision‐making process (Linnet et al., 2012). As the FB task potentially involved more effort and decision‐making, this may partly explain the pattern of DA release observed. However, the finding that the stabilization of learning on the task at block 3 was significantly associated with dopamine release in the right ventral striatum argues, at least in part, against these alternative explanations.
The right‐lateralized ventral striatal dopamine release that we observed resembles that previously reported by Martin‐Soelch et al. (2011), who found clear right ventral striatal dopamine release in response to unpredictable monetary reward during performance of a ‘slot machine task’. Our findings extend this observation of right ventral striatal dopamine release to learning with corrective feedback, which involved a mixture of positive and negative feedback. Our results further show that dopamine release in the right ventral striatum was tightly correlated with learning on block 3, the point at which learning stabilized in the FB group.
CONCLUSIONS
While differences between the FB and PA versions of the WPT in effort and decision‐making are also relevant, we conclude that striatal dopamine is selectively released during FB‐based but not PA‐based probabilistic classification learning on the WPT, implicating the striatum and its dopamine connections in mediating non‐motor skill learning with corrective feedback. These results have significant implications for understanding the mechanisms of the feedback‐dependent and observational learning systems. They also help to explain why patients with basal ganglia disorders such as PD and HD have selective learning impairments on the FB‐based but not the PA version of the WPT, and they bear on the potential contribution of dopaminergic medication to FB‐based WPT learning deficits in PD.
ACKNOWLEDGMENTS
YF Tai was supported by a Wellcome Trust research training fellowship. A Career Development Fellowship from Parkinson's UK supported L Wilkinson. The study was supported by a Medical Research Council (UK) core program grant.
REFERENCES
- Aron AR, Shohamy D, Clark J, Myers C, Gluck MA, Poldrack RA (2004): Human midbrain sensitivity to cognitive feedback and uncertainty during classification learning. J Neurophysiol 92:1144–1152.
- Atallah HE, Lopez‐Paniagua D, Rudy JW, O'Reilly RC (2007): Separate neural substrates for skill learning and performance in the ventral and dorsal striatum. Nat Neurosci 10:126–131.
- Beck AT (1978): The Beck Depression Inventory. Orlando, USA: The Psychological Corporation, Harcourt Brace Jovanovich.
- Brix G, Zaers J, Adam LE, Bellemann ME, Ostertag H, Trojan H, Haberkorn U, Doll J, Oberdorfer F, Lorenz WJ (1997): Performance evaluation of a whole‐body PET scanner using the NEMA protocol. National Electrical Manufacturers Association. J Nucl Med 38:1614–1623.
- Butters N, Wolfe J, Martone M, Granholm E, Cermak LS (1985): Memory disorders associated with Huntington's disease: Verbal recall, verbal recognition and procedural memory. Neuropsychologia 23:729–743.
- Cohen NJ, Squire LR (1980): Preserved learning and retention of pattern‐analyzing skill in amnesia: Dissociation of knowing how and knowing that. Science 210:207–210.
- Cools R, Altamirano L, D'Esposito M (2006): Reversal learning in Parkinson's disease depends on medication status and outcome valence. Neuropsychologia 44:1663–1673.
- Cools R, Barker RA, Sahakian BJ, Robbins TW (2001): Enhanced or impaired cognitive function in Parkinson's disease as a function of dopaminergic medication and task demands. Cerebral Cortex 11:1136–1143.
- Foerde K, Knowlton BJ, Poldrack RA (2006): Modulation of competing memory systems by distraction. Proc Natl Acad Sci USA 103:11778–11783.
- Folstein MF, Folstein SE, McHugh PR (1975): “Mini‐mental state”. A practical method for grading the cognitive state of patients for the clinician. J Psychiatr Res 12:189–198.
- Gunn RN, Lammertsma AA, Hume SP, Cunningham VJ (1997): Parametric imaging of ligand‐receptor binding in PET using a simplified reference region model. Neuroimage 6:279–287.
- Holl AK, Wilkinson L, Tabrizi SJ, Painold A, Jahanshahi M (2012): Probabilistic classification learning with corrective feedback is selectively impaired in early Huntington's disease—Evidence for the role of the striatum in learning with feedback. Neuropsychologia 50:2176–2186.
- Innis RB, Cunningham VJ, Delforge J, Fujita M, Gjedde A, Gunn RN, Holden J, Houle S, Huang SC, Ichise M, et al. (2007): Consensus nomenclature for in vivo imaging of reversibly binding radioligands. J Cereb Blood Flow Metab 27:1533–1539.
- Jahanshahi M, Wilkinson L, Gahir H, Dharminda A, Lagnado DA (2010): Medication impairs probabilistic classification learning in Parkinson's disease. Neuropsychologia 48:1096–1103.
- Knowlton BJ, Mangels JA, Squire LR (1996a): A neostriatal habit learning system in humans. Science 273:1399–1402.
- Knowlton BJ, Squire LR, Gluck MA (1994): Probabilistic classification learning in amnesia. Learn Mem 1:106–120.
- Knowlton BJ, Squire LR, Paulsen JS, Swerdlow NR, Swenson M, Butters N (1996b): Dissociations within nondeclarative memory in Huntington's disease. Neuropsychology 10:538–548.
- Laruelle M (2000): Imaging synaptic neurotransmission with in vivo binding competition techniques: A critical review. J Cereb Blood Flow Metab 20:423–451.
- Linnet J, Mouridsen K, Peterson E, Moller A, Doudet DJ, Gjedde A (2012): Striatal dopamine release codes uncertainty in pathological gambling. Psychiatry Res Neuroimag 204:55–60.
- Martin‐Soelch C, Szczepanik J, Nugent A, Barhaghi K, Rallis D, Herscovitch P, Carson RE, Drevets WC (2011): Lateralization and gender differences in the dopaminergic response to unpredictable reward in the human ventral striatum. Eur J Neurosci 33:1706–1715.
- Moody TD, Bookheimer SY, Vanek Z, Knowlton BJ (2004): An implicit learning task activates medial temporal lobe in patients with Parkinson's disease. Behav Neurosci 118:438–442.
- Poldrack RA, Clark J, Pare‐Blagoev EJ, Shohamy D, Creso Moyano J, Myers C, Gluck MA (2001a): Interactive memory systems in the human brain. Nature 414:546–550.
- Poldrack RA, Clark J, Pare‐Blagoev EJ, Shohamy D, Moyano JC, Myers C, Gluck MA (2001b): Interactive memory systems in the human brain. Nature 414:546–550.
- Poldrack RA, Prabhakaran V, Seger CA, Gabrieli JDE (1999): Striatal activation during acquisition of a cognitive skill. Neuropsychology 13:564–574.
- Pruessner JC, Champagne F, Meaney MJ, Dagher A (2004): Dopamine release in response to a psychological stress in humans and its relationship to early life maternal care: A positron emission tomography study using (11)C raclopride. J Neurosci 24:2825–2831.
- Rodriguez PF, Aron AR, Poldrack RA (2006): Ventral‐striatal/nucleus‐accumbens sensitivity to prediction errors during classification learning. Hum Brain Mapp 27:306–313.
- Seger CA, Cincotta CM (2006): Dynamics of frontal, striatal, and hippocampal systems during rule learning. Cereb Cortex 16:1546–1555.
- Shohamy D, Myers CE, Geghman KD, Sage J, Gluck MA (2006): L‐dopa impairs learning, but spares generalization, in Parkinson's disease. Neuropsychologia 44:774–784.
- Shohamy D, Myers CE, Grossman S, Sage J, Gluck MA, Poldrack RA (2004): Cortico‐striatal contributions to feedback‐based learning: Converging data from neuroimaging and neuropsychology. Brain 127:851–859.
- Shohamy D, Myers CE, Kalanithi J, Gluck MA (2008): Basal ganglia and dopamine contributions to probabilistic category learning. Neurosci Biobehav Rev 32:219–236.
- Wang WC, Lazzara MM, Ranganath C, Knight RT, Yonelinas AP (2010): The Medial Temporal Lobe Supports Conceptual Implicit Memory. Neuron 68:835–842.
- Whone AL, Bailey DL, Remy P, Pavese N, Brooks DJ (2004): A technique for standardized central analysis of 6‐(18)F‐fluoro‐L‐DOPA PET data from a multicenter study. J Nucl Med 45:1135–1145.
- Wilkinson L, Lagnado DA, Quallo M, Jahanshahi M (2008): The effect of feedback on non‐motor probabilistic classification learning in Parkinson's disease. Neuropsychologia 46:2683–2695.
- Witt K, Nuhsman A, Deuschl G (2002): Dissociation of habit‐learning in Parkinson's and cerebellar disease. J Cogn Neurosci 14:493–499.