Abstract
Prior studies have shown that dopamine (DA) functioning in frontostriatal circuits supports reinforcement learning (RL), as phasic DA activity in ventral striatum signals unexpected reward and may drive coordinated activity of striatal and orbitofrontal regions that support updating of action plans. However, the nature of DA functioning in RL is complex, in particular regarding the role of DA clearance in RL behavior. Here, in a multi-modal neuroimaging study with healthy adults, we took an individual differences approach to the examination of RL behavior and DA clearance mechanisms in frontostriatal learning networks. We predicted that better RL would be associated with decreased striatal DA transporter (DAT) availability and increased intrinsic functional connectivity among DA-rich frontostriatal regions. In support of these predictions, individual differences in RL behavior were related to DAT binding potential in ventral striatum and resting-state functional connectivity between ventral striatum and orbitofrontal cortex. Critically, DAT binding potential had an indirect effect on reinforcement learning behavior through frontostriatal connectivity, suggesting potential causal relationships across levels of neurocognitive functioning. These data suggest that individual differences in DA clearance and frontostriatal coordination may serve as markers for RL, and suggest directions for research on psychopathologies characterized by altered RL.
Keywords: dopamine, fMRI, functional connectivity, PET, reinforcement learning, striatum
Introduction
Learning to select behaviors that lead to positive outcomes is fundamental to survival, and prior research has suggested that the neurobiological mechanisms of successful reinforcement learning (RL) are shared across species, environments, and contexts (Seger 2009). In particular, striatal dopamine (DA) signaling for unexpected rewards (reward prediction errors (RPE)) is believed to play a key role in coordinating activity among striatal and orbitofrontal regions, thereby updating the value of environmental cues and action plans (Schultz 2015). Evidence for RPE encoding by DA has been found using electrophysiological measurement of DA cell firing (Cohen et al. 2012; Eshel et al. 2015; Schultz 2015), optogenetic stimulation of DA neurons (Tsai et al. 2009; Kravitz et al. 2012), and measurement of DA release in striatal terminal field regions (Flagel et al. 2011; Hart et al. 2014). Similarly, evidence linking frontostriatal circuit strength to RL has been documented in research ranging from preclinical studies (Bailey and Mair 2007; Braz et al. 2015) to human developmental (van den Bos et al. 2012) or lesion studies (Bellebaum et al. 2008). Together, this prior research provides support for the idea that DA release in frontostriatal circuits is a critical mechanism of successful learning.
A critical protein in the DA signaling pathway is the DA transporter (DAT), which facilitates rapid clearance of extracellular DA within the striatum. According to theoretical accounts of DA RPE signals, phasic release of DA in response to unexpected rewards should persist in the synapse long enough to engage post-synaptic targets but be cleared rapidly enough to maintain the requisite temporal precision for prediction error encoding. Moreover, abundant preclinical evidence indicates that, in addition to DA clearance, DAT shapes the signal-to-noise ratio of DA neurotransmission and can affect presynaptic DA levels during neuronal activity (for review, see Sulzer et al. 2016). If correct, this model would predict that modulation of DAT function would affect RPE signaling and by extension, reinforcement learning. Consistent with this framework, pharmacological manipulations that block or reverse DAT function, such as psychostimulants, have been shown to produce marked change in DA-dependent behaviors, including enhanced instrumental conditioning (Taylor and Robbins 1984; Everitt and Robbins 2005; 2016). However, these drugs often possess noradrenergic and serotonergic effects as well, and some studies using more selective DAT inhibitors (e.g., GBR12909) have failed to detect clear effects on RL behavior (Costa et al. 2014). Consequently, the role of DAT function for RL remains unclear.
In addition to its potential influence on DAergic RPE signals, individual differences in DAT function may also influence connectivity within frontostriatal networks, for example, the magnitude of positive functional connectivity between key nodes within the brain reward pathway, including regions implicated in reward prediction errors (such as the nucleus accumbens, NAc) and regions involved in updating action plans (such as the orbitofrontal cortex, OFC) (Yeo et al. 2011; Smith 2012; Smith et al. 2013). Given that one of the major effects of post-synaptic DA receptors on striatal medium spiny neurons (MSNs) is to potentiate or attenuate the strength of excitatory cortical and limbic inputs (Floresco 2015), it is plausible that DAT availability may also be associated with downstream effects on the level of large-scale network functioning. For example, lower DAT availability corresponding with increased synaptic DA may be related to enhanced coordination among frontostriatal regions. However, to our knowledge these possibilities have not yet been tested.
One strategy for investigating RL and frontostriatal DA is to capitalize on individual differences in these functional domains. People vary considerably in their RL behavior, and prior research has demonstrated that such variability corresponds with distinct profiles of DA and network-level functioning. For example, individuals characterized by DA deficiencies are shown to exhibit impaired learning (Frank et al. 2004; Wilkinson et al. 2009) and altered frontostriatal recruitment (Nakamura et al. 2001). An individual differences approach to the associations between DAT availability, frontostriatal network activity, and RL behavior, may be useful in revealing naturally occurring covariance across levels of functioning.
The present study was designed to examine how individual differences in RL manifest in frontostriatal DA systems in healthy humans, with a particular focus on DAT binding, as indexed by DAT binding potential (BPND). We took a multi-modal approach that included evaluation of individual differences using positron emission tomography (PET), functional magnetic resonance imaging (fMRI), and behavioral testing. We used the highly selective DAT ligand [11C]altropane to assess DAT availability in the NAc within ventral striatum. Motivated by research suggesting the reliability of slow-wave intrinsic connectivity (Geerligs et al. 2015), frontostriatal circuit activity was evaluated using resting-state functional connectivity. Directly relevant to the current study, prior research has shown that pharmacological manipulation of DA enhances resting-state functional connectivity between NAc and ventral frontal regions (Kelly et al. 2009), and individual differences in frontostriatal resting-state functional connectivity have been associated with DA concentrations (Horga et al. 2016). Therefore, in the present study, resting-state functional connectivity of bilateral NAc was interpreted as a circuit-level index of effective coordination among DA systems (although other neurochemicals may contribute to individual differences in the same or overlapping circuits (Felger et al. 2016)). Reinforcement learning behavior was indexed using a validated task that has been used to measure individual differences in implicit reward sensitivity in previous research (Santesso et al. 2008). We predicted that better learning task performance would be related to 1) lower DAT BPND in ventral striatum (interpreting DAT BPND as an index of DAT clearance capacity); and 2) stronger resting-state functional connectivity in a frontostriatal circuit including ventral striatum and areas of orbitofrontal cortex.
Materials and Methods
Thirty-four healthy adults (ages 19–44, mean age = 26.81, SD = 7.00; 24 females) were recruited from the Boston metropolitan area through local websites, flyers, and advertisements. All participants completed a Structured Clinical Interview for the DSM-IV-TR to confirm the absence of current or past psychiatric illness. In a behavioral testing session, participants completed a task designed to assess implicit RL; in a separate session, participants completed PET scanning. Next, a subset (n = 25, ages 19–44, mean age = 25.48, SD = 7.04; 15 females) of participants completed a session involving magnetic resonance imaging (MRI) scanning that included structural imaging and a resting-state paradigm. The average interval between sessions was 15.12 days; inter-session interval did not covary with experimental variables. In previous independent studies, these sample sizes were shown to be adequate for examining dopamine transporter (DAT) (Yeh et al. 2012) and for investigating the neural and behavioral indices of reinforcement learning used in the present study (Pizzagalli et al. 2008; Santesso et al. 2008, 2009). In light of prior evidence that DA functioning changes with age (Volkow et al. 1996), all analyses controlled for participants’ age in months. No participant was taking psychoactive medications, and all participants reported no history of neurological impairment, head injury, or MRI contraindications. This study was approved by the Partners Healthcare Institutional Review Board, and written informed consent was collected. Data were stripped of identifying information, encrypted, and saved to password-protected servers. Data from the present study are available upon request.
Behavioral Testing and Analysis
Probabilistic Reward Task for Assessment of Reinforcement Learning Behavior
Individual differences in RL were measured in an implicit learning task that takes a signal detection approach to measuring sensitivity to rewards, the Probabilistic Reward Task (PRT) (Pizzagalli et al. 2005). For each trial of this task, the participant was presented (for 500 ms) with a drawing of a face on which either a short or long mouth stimulus (or a short or long nose stimulus) was displayed (for 100 ms). The participant was instructed to respond as quickly as possible to indicate which stimulus was displayed, and correct responses resulted in either reward feedback ($0.20 and the phrase “Correct! You won $0.20”) or null feedback (blank screen). The reinforcement schedule was asymmetrical: unbeknownst to the participant, correct responses to one “rich” stimulus were rewarded 3 times more frequently than correct responses to the other “lean” stimulus. In each block, the participant received 40 reward outcomes, 30 elicited by correct responses to the rich stimulus and 10 elicited by correct responses to the lean stimulus. Accordingly, in each block 30 of the 50 rich trials (60%) but only 10 of the 50 lean trials (20%) could be followed by reward feedback. For the PET analyses, participants were pooled across 2 separate studies, which used the identical PRT paradigm but different numbers of blocks. Specifically, 67% of participants completed 3 blocks of 100 trials/block, whereas the remaining participants completed 2 blocks of 100 trials/block. To merge these datasets, performance scores were z-transformed within the subgroups that performed either the 2-block or the 3-block version of the task, and the z-scores were then pooled into a unified RL factor for subsequent analyses; in addition, task version was included in the analyses as a covariate. The primary index of reinforcement learning behavior was the z-transformed change in response bias (ΔRB) from the first to the last block of trials [ΔRB = Response Bias (final block) – Response Bias (first block)].
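Response bias (RB) was computed with the equation:
\[
\mathrm{RB} = \frac{1}{2}\,\log\left(\frac{\mathrm{Rich}_{\mathrm{correct}} \times \mathrm{Lean}_{\mathrm{incorrect}}}{\mathrm{Rich}_{\mathrm{incorrect}} \times \mathrm{Lean}_{\mathrm{correct}}}\right)
\]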
In this equation, the variables Richcorrect and Richincorrect correspond to the number of correct and incorrect responses to identify the rich stimulus, respectively, and the variables Leancorrect and Leanincorrect correspond to the number of correct and incorrect responses to identify the lean stimulus, respectively. Consistent with previous studies using this task, 0.5 was added to each of the above variables to permit calculating response bias in cases in which one of the raw variables was equal to zero (Santesso et al. 2008; Vrieze et al. 2013). Positive ΔRB over the course of the task indicates reinforcement learning proficiency (i.e., increased bias to respond accurately to “rich” compared with “lean” stimuli over time). Individual differences in this measure of reinforcement learning have been shown to correspond to symptoms of anhedonia (Pizzagalli et al. 2005), response to dopaminergic drugs (Pizzagalli et al. 2008), and neural response to reward (Santesso et al. 2008). Participants who performed poorly (<55% accuracy, or >10% outlier trials with RT < 150 ms or RT > 2500 ms, or failure to achieve an overall reinforcement schedule of approximately 3:1) (Pizzagalli et al. 2005) were excluded from analyses (all n = 34 eligible for analysis). See Figure 1 for summary of the PRT; no outliers on learning performance were detected in the present sample (i.e., ΔRB scores within 3 standard deviations of mean).
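The response-bias computation and the within-version z-scoring described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: it assumes the standard log b form of response bias used in the PRT literature, a base-10 logarithm, and the sample standard deviation for z-scoring. The function names and all input counts are hypothetical.

```python
import math
import statistics


def response_bias(rich_correct, rich_incorrect, lean_correct, lean_incorrect):
    """Response bias for one block; 0.5 is added to every count so that
    a zero cell does not make the ratio undefined."""
    rc, ri = rich_correct + 0.5, rich_incorrect + 0.5
    lc, li = lean_correct + 0.5, lean_incorrect + 0.5
    # Assumption: base-10 logarithm.
    return 0.5 * math.log10((rc * li) / (ri * lc))


def delta_rb(first_block, final_block):
    """Change in response bias: RB(final block) - RB(first block).
    Each argument is a tuple (rich_correct, rich_incorrect,
    lean_correct, lean_incorrect)."""
    return response_bias(*final_block) - response_bias(*first_block)


def zscore_within_groups(scores, groups):
    """z-score each value within its own group (e.g. the 2-block or
    3-block task version) before pooling. Uses the sample SD."""
    out = [0.0] * len(scores)
    for g in set(groups):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        vals = [scores[i] for i in idx]
        mean, sd = statistics.fmean(vals), statistics.stdev(vals)
        for i in idx:
            out[i] = (scores[i] - mean) / sd
    return out


# Hypothetical counts for one participant (not study data).
first = (40, 10, 40, 10)   # no bias toward either stimulus
final = (45, 5, 30, 20)    # more accurate on rich than on lean
print(delta_rb(first, final))
```

A positive value from `delta_rb` corresponds to a growing bias toward the rich stimulus across blocks, which is how the text interprets positive ΔRB.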
PET Acquisition and Analysis
To investigate DA clearance, we used the radiotracer [11C]altropane with DAT binding potential (BPND) as the primary outcome parameter. [11C]altropane was selected as the PET tracer for this study because it has rapid and specific striatal binding (rapid kinetics in DA-rich striatal regions) and high selectivity for DAT (e.g., 28 times more selective for DAT than serotonin transporter (Fischman et al. 2001; Madras et al. 1998)). [11C]altropane binding was assessed using an ECAT EXACT HR+ (CTI, Knoxville, TN) PET camera (3D mode, 63 contiguous 2.4 mm slices, 2.06 × 2.06 mm transaxial grid). For each participant, (approximately) 10 mCi of [11C]altropane was administered intravenously over 20–30 s. Images were acquired in 39 frames, with the duration of each frame increasing over time (8 frames of 15 s, 4 frames of 60 s, 27 frames of 120 s) for a total duration of 60 min. A filtered back-projection algorithm was used to reconstruct PET images with physical corrections applied for photon scatter and attenuation, random coincidences, system deadtime, and detector inhomogeneity. The motion-corrected frames were summed and coregistered to a common reference space (Montreal Neurological Institute, MNI) using FSL (http://fsl.fmrib.ox.ac.uk/fsl/fslwiki/) and by computing the deformation field on the basis of the participant’s structural MRI scan for individual warping. Transformations were then applied to the dynamic PET images.
To calculate regional BPND (Innis et al. 2007), we used the multilinear reference tissue model (Ichise et al. 2003) with the reference region defined as the cerebellum, excluding the vermis (Alpert and Yuan 2009; Fang et al. 2012). BPND was estimated in left and right NAc, with regions defined by an anatomical atlas (Tzourio-Mazoyer et al. 2002) in MNI space (AAL atlas publicly available as an SPM12 toolbox, http://www.gin.cnrs.fr/en/tools/aal-aal2/). The NAc ROI for PET imaging was structurally defined to ensure adequate coverage of ventral striatum, which is necessary to gain a sufficiently strong radiotracer signal in resting PET imaging studies. See Table 1 for report of DAT BPND in left and right NAc across the sample; the range of DAT BPND is consistent with previous studies, and no outliers were detected in the present sample (i.e., DAT BPND scores within 3 standard deviations of sample mean). For analyses and in all figures, DAT BPND scores were residualized for age and z-scored.
Table 1.
NAc region (DAT BPND) | Min | Max | Mean | STDEV |
---|---|---|---|---|
Left NAc | 0.98 | 2.90 | 2.08 | 0.45 |
Right NAc | 1.08 | 3.04 | 2.19 | 0.45 |
Bilateral NAc | 1.03 | 2.94 | 2.14 | 0.44 |
Note: DAT BPND was indexed by [11C]altropane. For analyses and in all figures, DAT BPND scores were residualized for age and z-scored.
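Residualizing for age and then z-scoring can be illustrated with a short sketch. It assumes a simple linear regression of BPND on age (in months) and the sample standard deviation; the function names and example values are hypothetical and are not study data.

```python
import statistics


def residualize(y, x):
    """Residuals of y after removing its linear dependence on x
    (ordinary least squares with an intercept)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def zscore(values):
    """Standardize to mean 0 and (sample) SD 1."""
    m, s = statistics.fmean(values), statistics.stdev(values)
    return [(v - m) / s for v in values]


# Hypothetical ages (months) and BPND values, for illustration only.
age_months = [230, 260, 300, 340, 410, 520]
bpnd = [2.6, 2.4, 2.3, 2.0, 1.9, 1.5]
bpnd_resid_z = zscore(residualize(bpnd, age_months))
print(bpnd_resid_z)
```

After this step the scores carry no linear age trend, so a later correlation with ΔRB is not driven by age-related change in DAT availability.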
To test the association between individual differences in reinforcement learning and DAT binding in ventral striatum, we computed correlations between z-scored ΔRB and DAT BPND (residualized for age) in left and right NAc. Putative hemispheric differences in correlation coefficients were tested with the Meng test (Meng et al. 1992), which compares correlation coefficients while taking into account the dependency between the predictor variables in each correlation.
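For readers unfamiliar with the Meng test, the sketch below implements the Z statistic for comparing two dependent correlations that share one variable (here, ΔRB correlated with left and with right NAc BPND). The function name is hypothetical, and the correlation between left and right BPND used in the example (`r12 = 0.90`) is an assumed value, since it is not reported in the text.

```python
import math
from statistics import NormalDist


def meng_z_test(r1, r2, r12, n):
    """Meng, Rosenthal and Rubin (1992) test of r1 vs r2.

    r1, r2: correlations of the shared outcome with predictor 1 and 2
    r12:    correlation between the two predictors
    n:      sample size
    Returns (Z, two-sided p).
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)      # Fisher z-transform
    rbar2 = (r1 ** 2 + r2 ** 2) / 2.0            # mean squared correlation
    f = min((1.0 - r12) / (2.0 * (1.0 - rbar2)), 1.0)  # f is capped at 1
    h = (1.0 - f * rbar2) / (1.0 - rbar2)
    z = (z1 - z2) * math.sqrt((n - 3) / (2.0 * (1.0 - r12) * h))
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p


# Illustration only: r12 is an assumed value, not a reported result.
z_stat, p_value = meng_z_test(r1=-0.49, r2=-0.34, r12=0.90, n=34)
print(z_stat, p_value)
```

The higher the correlation between the two predictors, the smaller the difference between r1 and r2 that the test can detect, which is why the dependency has to be modeled rather than ignored.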
MRI Acquisition and Analysis
Data Acquisition
A Siemens Tim Trio 3T scanner and 32-channel head coil were used to collect MRI data, including a high-resolution T1-weighted anatomical image (TR = 2200 ms, TE = 4.27 ms, flip angle = 7°, 144 slices, field of view = 230 mm, matrix = 192 × 192, voxel size 1.2 × 1.2 × 1.2 mm) and eyes-open resting functional data (TR = 3000 ms, TE = 30 ms, flip angle = 85°, 47 slices, field of view = 216 mm, matrix = 72 × 72, voxel size 3 × 3 × 3 mm, total duration = 6.2 min, total volumes = 124). Resting-state fMRI data were collected immediately following collection of anatomical data, and prior to other functional scanning.
Resting-state: General Image Preprocessing
We discarded the first 6 s of each participant’s functional data to allow for stabilization of the magnetic field. Preprocessing of functional data was performed in SPM8 using the standard spatial preprocessing steps of slice-time correction, realignment, normalization in MNI space, and smoothing with a 6-mm kernel.
Resting-state: Head Motion and Artifact Detection
Motion correction is of special importance for resting-state functional connectivity analysis (Buckner et al. 2013). We used SPM8 to assess head motion by translation and rotation in x, y, z directions. Next, Artifact Detection Tools (ART, www.nitrc.org/projects/artifact_detect/) were utilized to calculate time points of significant head motion or fluctuations in the magnetic field (>1 mm motion from previous frame, global mean intensity >3 standard deviations from mean intensity across functional scans) for each participant. Then, outlier images were modeled in each participant’s first-level general linear model (as a vector the length of the time series, with 1 for outlier time points and 0 for non-outlier time points) to remove the influence of outlier time points on estimates of functional connectivity while maintaining the temporal structure of the data. Thus, motion correction included the regressing out of not only residual head motion parameters (3 translation and 3 rotation parameters, plus 1 composite motion parameter reflecting the maximum scan-to-scan movement), but also outlier volumes (as calculated through artifact detection).
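The outlier-flagging and regressor-construction logic described above can be sketched as follows. This is a simplified illustration and not the ART implementation: it assumes a precomputed frame-to-frame motion value (in mm) and a global mean intensity per volume, and it builds one indicator column per flagged volume, which is one common way to enter such time points into a first-level model. All names and values are hypothetical.

```python
import statistics


def flag_outlier_volumes(frame_motion_mm, global_mean,
                         motion_thresh=1.0, sd_thresh=3.0):
    """Return a 0/1 list marking volumes with >motion_thresh mm motion
    from the previous frame, or global mean intensity more than
    sd_thresh standard deviations from the mean across the scan."""
    m = statistics.fmean(global_mean)
    s = statistics.stdev(global_mean)
    flags = []
    for motion, g in zip(frame_motion_mm, global_mean):
        is_outlier = motion > motion_thresh or abs(g - m) > sd_thresh * s
        flags.append(1 if is_outlier else 0)
    return flags


def outlier_regressors(flags):
    """One regressor per flagged volume: a vector the length of the
    time series with 1 at that volume and 0 elsewhere."""
    columns = []
    for i, flag in enumerate(flags):
        if flag:
            col = [0] * len(flags)
            col[i] = 1
            columns.append(col)
    return columns


# Hypothetical 20-volume run: one motion spike, one intensity spike.
motion = [0.1] * 20
motion[5] = 1.5
intensity = [100.0] * 19 + [200.0]
flags = flag_outlier_volumes(motion, intensity)
print(flags, len(outlier_regressors(flags)))
```

Modeling flagged volumes as regressors, rather than deleting them, keeps the time series at its original length, which matters for the temporal filtering applied later.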
Resting-state: Denoising
We performed voxelwise seed-based functional connectivity analyses using the CONN toolbox (https://www.nitrc.org/projects/conn/; Whitfield-Gabrieli and Nieto-Castanon 2012). We estimated physiological and other sources of noise using CompCor (Behzadi et al. 2007), a method that estimates physiological noise from white matter and cerebrospinal fluid for each participant using principal component analysis. The first 5 components were then regressed out of each participant’s functional data on the first level of analysis. In addition, a temporal band-pass filter of 0.01–0.10 Hz was applied to the time series. This range was selected to remove high frequency activity related to cardiac and respiratory activity (Cordes et al. 2001) and low frequency activity that may be related to scanner drift.
Together, the corrections performed on the time series included: detrending, outlier correction, motion regression, and CompCor correction (which were performed together in a single first-level regression model), followed by band-pass filtering. These corrections produced a residual BOLD time course at each voxel that was used for subsequent analyses.
Resting-state: First-level Functional Connectivity Analysis
For first-level resting-state analyses, we computed the Pearson’s correlation coefficient between the full time course of a bilateral seed region of interest (ROI) in left and right NAc and the time course of all other voxels, yielding a correlation map for each participant. Correlation coefficients were normalized using the Fisher’s z-transformation. Functionally defined NAc ROIs were used for fMRI analyses in order to restrict the seed ROI to voxels that have been shown to correspond with regions responsive to rewards (vs. null feedback) in previous BOLD imaging research from an independent sample (Admon and Pizzagalli 2015). (For alternate views of functionally defined and structurally defined NAc ROIs, see Supplementary Fig. 2). Although primary analyses used a bilateral seed, to explore potential laterality effects, follow-up voxelwise analyses were performed using left and right NAc seeds independently.
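The first-level computation (Pearson correlation of the seed time course with each voxel, followed by Fisher's z-transformation) reduces to a few lines. The sketch below is illustrative and is not the CONN toolbox code; the function names, the clipping guard in `fisher_z`, and the toy time courses are assumptions.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length time courses."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_z(r, eps=1e-12):
    """Fisher z-transform, atanh(r). The clip is an added guard so that
    r = +/-1 (e.g. the seed with itself) does not raise an error."""
    r = max(min(r, 1.0 - eps), -1.0 + eps)
    return math.atanh(r)


def seed_connectivity(seed_ts, voxel_ts_list):
    """Fisher-z connectivity between one seed and every voxel."""
    return [fisher_z(pearson_r(seed_ts, v)) for v in voxel_ts_list]


# Toy residual BOLD time courses (hypothetical values).
seed = [1.0, 2.0, 3.0, 4.0, 5.0]
voxels = [[2.0, 1.0, 4.0, 3.0, 5.0],   # positively coupled with the seed
          [5.0, 3.0, 4.0, 1.0, 2.0]]   # negatively coupled with the seed
print(seed_connectivity(seed, voxels))
```

The z-transform makes the sampling distribution of the coefficients approximately normal, so the resulting maps can be entered into group-level regression.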
Resting-state: Group-level Functional Connectivity Analysis
For group-level analyses, first-level normalized correlation maps were entered into a whole-brain regression analysis and group-level statistics were performed at each voxel. To identify regions in which NAc functional connectivity was associated with individual differences in reinforcement learning behavior, mean-deviated ΔRB was entered as the independent variable predicting the magnitude of correlations in activity between NAc and other regions of the brain (mean-deviated age of participant was entered as a covariate). Regions in which functional connectivity with NAc was associated with ΔRB were considered significant if they exceeded a peak amplitude of P < 0.05 (2-sided, i.e., P < 0.025 in each tail), cluster corrected within an intrinsic brain mask that restricts the search space to the SPM MNI template brain to False Discovery Rate of P < 0.05. Analyses were also repeated including sex and number of outlier images as group-level covariates; because controlling for these variables did not affect results, and these variables did not relate to ΔRB, simple analyses (with age as the only covariate) are reported. See Table 2 for report of resting-state functional connectivity in significant clusters of effect; no outliers were detected in the present sample (i.e., estimates of functional connectivity within 3 standard deviations of sample mean). For mediation analyses and in all figures, estimates of resting-state functional connectivity were z-scored.
Table 2.
Region | Peak Coord | Vol | Average FC of cluster: Mean | Average FC of cluster: STDEV | Correlation between implicit reinforcement learning and FC of cluster: r | Correlation between implicit reinforcement learning and FC of cluster: Cluster P |
---|---|---|---|---|---|---|
Bilateral NAc seed | | | | | | |
OFC | −16, 20, −10 | 1214 | 0.21 | 0.08 | 0.69 | <0.001 |
Parietal cortex | 54, −36, 56 | 1070 | 0.01 | 0.09 | −0.66 | 0.001 |
Left NAc seed | | | | | | |
OFC | 8, 60, −24 | 1611 | 0.14 | 0.10 | 0.60 | 0.003 |
Right NAc seed | | | | | | |
OFC/subcallosal | −20, 4, −16 | 2463 | 0.09 | 0.10 | 0.61 | 0.002 |
SMA | 6, −16, 52 | 941 | 0.05 | 0.02 | 0.44 | 0.042 |
MFG | 36, 40, 24 | 1292 | 0.02 | 0.13 | −0.49 | 0.020 |
Parietal cortex | 40, −42, 36 | 1704 | 0.01 | 0.09 | −0.54 | 0.010 |
Note: Coord = coordinates in MNI space, Vol = volume in 1 × 1 × 1 mm voxels, FC = resting-state functional connectivity. Peak thresholded at P < 0.05 2-sided, cluster corrected to false discovery rate (FDR) of P < 0.05.
Mediation Modeling
The mediation model included the following variables: 1) DAT BPND (residualized for age) in the NAc, 2) functional connectivity between NAc and OFC (Fisher’s z-transformed correlation coefficients from the OFC region in which ΔRB predicted increased functional connectivity between NAc and OFC, extracted using REX (https://www.nitrc.org/projects/rex/), and 3) individual differences in reinforcement learning (ΔRB). The model tested the indirect relationship between DAT BPND and individual differences in RL behavior as mediated by NAc-OFC functional connectivity. To estimate the standardized regression coefficients for the associations between DAT BPND in NAc, resting-state functional connectivity of NAc-OFC, and reinforcement learning behavior, we performed regression analyses. Mediation was tested using a bootstrapping (5000 iterations) approach (Preacher and Hayes 2008) to estimate the indirect effect. Mediation and regression analyses were performed using SPSS19.
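A bootstrapped indirect effect of the kind described here can be sketched in plain Python. This is an illustration under stated assumptions and not the SPSS procedure used in the study: the indirect effect is the product of path a (mediator regressed on predictor) and path b (outcome regressed on mediator, controlling for predictor), and the confidence interval is a simple percentile interval. Function names and the simulated data are hypothetical. Coefficients are standardized only if the inputs are z-scored beforehand.

```python
import random


def indirect_effect(x, m, y):
    """a*b, where a is the slope of m ~ x and b is the coefficient of m
    in y ~ x + m. Returns None for degenerate (collinear) samples."""
    n = len(x)
    mx, mm, my = sum(x) / n, sum(m) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    smm = sum((b - mm) ** 2 for b in m)
    sxm = sum((a - mx) * (b - mm) for a, b in zip(x, m))
    sxy = sum((a - mx) * (c - my) for a, c in zip(x, y))
    smy = sum((b - mm) * (c - my) for b, c in zip(m, y))
    denom = sxx * smm - sxm ** 2
    if sxx == 0 or denom == 0:
        return None
    a_path = sxm / sxx
    b_path = (sxx * smy - sxm * sxy) / denom
    return a_path * b_path


def bootstrap_ci(x, m, y, n_boot=5000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the indirect effect, resampling
    participants with replacement."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        est = indirect_effect([x[i] for i in idx],
                              [m[i] for i in idx],
                              [y[i] for i in idx])
        if est is not None:
            estimates.append(est)
    estimates.sort()
    lo = estimates[int((alpha / 2) * len(estimates))]
    hi = estimates[int((1 - alpha / 2) * len(estimates)) - 1]
    return lo, hi


# Simulated data for illustration (x: DAT BPND, m: NAc-OFC FC, y: dRB).
gen = random.Random(1)
x = [gen.gauss(0, 1) for _ in range(25)]
m = [-0.5 * xi + gen.gauss(0, 1) for xi in x]
y = [0.6 * mi + gen.gauss(0, 1) for mi in m]
print(indirect_effect(x, m, y), bootstrap_ci(x, m, y, n_boot=1000))
```

An interval that excludes zero is read as evidence of a significant indirect effect, which is the criterion applied to the confidence intervals reported in the Results.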
Results
Individual Differences in Reinforcement Learning Behavior are Associated with Striatal Dopamine Transporter Binding Potential
Correlation analyses tested the associations between individual differences in RL (z-scored ΔRB) and (z-scored, age-residualized) DAT BPND in left and right NAc (controlling for RL task version). Results showed that ΔRB was negatively associated with DAT BPND in bilateral NAc, r(31) = −0.43, P = 0.01, indicating that, as hypothesized, lower DAT availability was associated with better reinforcement learning (Fig. 2). Visual inspection of the scatterplot suggested potential outliers (data points exerting undue influence on the correlation); Cook’s D (Cook 1977) was therefore calculated, and the correlation was repeated omitting the data points (n = 3) with Cook’s D above the standard threshold (4/n; Bollen and Jackman 1985). Omitting these data points did not alter the overall pattern of results; although the association dropped to trend level, the magnitude of the correlation remained a medium effect size (adjusted r = −0.32, adjusted P = 0.08) (Cohen 1992). Follow-up correlations to examine putative laterality effects revealed that ΔRB was negatively associated with DAT BPND in left NAc, r(31) = −0.49, P < 0.01, and (at the level of a trend) in right NAc, r(31) = −0.34, P = 0.06.
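The influence check based on Cook's D with a 4/n cutoff can be sketched for the simplest case. This illustration assumes a one-predictor regression with an intercept; the reported analysis also controlled for task version, so the actual model had more terms. The function names and data are hypothetical.

```python
def cooks_distance(x, y):
    """Cook's D for each observation in the regression y ~ x."""
    n, p = len(x), 2                      # p = intercept + slope
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    mse = sum(e * e for e in resid) / (n - p)
    distances = []
    for xi, e in zip(x, resid):
        h = 1.0 / n + (xi - mx) ** 2 / sxx    # leverage of this point
        distances.append((e * e / (p * mse)) * h / (1.0 - h) ** 2)
    return distances


def influential_points(x, y):
    """Indices of observations whose Cook's D exceeds 4/n."""
    threshold = 4.0 / len(x)
    return [i for i, d in enumerate(cooks_distance(x, y)) if d > threshold]


# Hypothetical data with one gross outlier at the last position.
x = [float(i) for i in range(10)]
y = [float(i) for i in range(9)] + [30.0]
print(influential_points(x, y))
```

Cook's D combines the size of a point's residual with its leverage, so it flags observations that would noticeably shift the fitted line if removed.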
Individual Differences in Reinforcement Learning Behavior are Associated with Frontostriatal Resting-State Functional Connectivity
Next, a voxelwise functional connectivity analysis was performed using seed ROIs in left and right NAc to identify regions in which functional connectivity with ventral striatum was associated with individual differences in reinforcement learning (z-scored ΔRB, controlling for age). Results showed that ΔRB was positively associated with functional connectivity between bilateral NAc and regions of medial OFC (Fig. 3), and negatively associated with functional connectivity between NAc and areas of posterior parietal cortex (Supplementary Fig. 3). Of note, we had no a priori hypotheses with respect to cortical systems beyond OFC. However, in light of recent work showing that parietal systems may be especially relevant to reinforcement learning under conditions of high attentional control (Niv et al. 2015), these findings are displayed in greater detail in the Supplement and discussed below (Table 2).
Follow-up voxelwise analyses revealed positive correlations between ΔRB and functional connectivity of left NAc with distributed regions of medial and lateral OFC, and of right NAc with regions of medial OFC and mid-cingulate (Table 2).
Frontostriatal Resting-state Functional Connectivity Mediates the Relationship Between Striatal Dopamine Transporter Binding Potential and Individual Differences in Reinforcement Learning Behavior
Taken together, the above results suggest that individual differences in RL may be driven by striatal DAT expression and frontostriatal circuit coordination. These neurobiological mechanisms may act in concert to shape or reflect RL behavior: prior data indicate that DA firing together with co-activation by cortical and limbic glutamatergic afferents drives striatal responses (Floresco 2015). Lower clearance of DA by DAT may therefore increase the impact of striatal RPE signals on frontostriatal pathways, resulting in better learning. To examine this hypothesis, we tested the indirect effect of DAT BPND on individual differences in reinforcement learning performance through resting-state functional connectivity of NAc (Fig. 4). Variables entered into the mediation model included DAT BPND in bilateral NAc (age-residualized and z-scored), resting-state functional connectivity (z-scored Fisher’s z-transformed correlations) between bilateral NAc and OFC, and RL behavior (z-scored ΔRB). Bootstrapped path-analysis (Preacher and Hayes 2008) revealed that frontostriatal resting-state functional connectivity significantly mediated the relationship between bilateral DAT BPND and learning behavior (confidence interval: −0.26 to −0.01). However, in a separate mediation model, resting-state functional connectivity between striatal and parietal regions did not significantly mediate the effect of bilateral DAT BPND on learning behavior (confidence interval: −0.36 to 0.005). Follow-up analyses showed a significant indirect effect of left NAc DAT BPND through left NAc-with-OFC functional connectivity on learning behavior (bootstrapped 95% confidence interval: −0.24 to −0.02); and a trending effect of right NAc DAT BPND through right NAc-with-OFC functional connectivity on learning behavior (bootstrapped 95% confidence interval: −0.23 to 0.00).
Discussion
The present study provides evidence that individual differences in RL are associated with DAT BPND and intrinsic frontostriatal functioning. Consistent with a wide range of evidence supporting the role of striatal DA in RL across people and species (Flagel et al. 2011; Cohen et al. 2012; Eshel et al. 2015; Hart et al. 2014; Schultz 2015), we observed that individual differences in RL in healthy humans were related to DAT BPND in ventral striatum and resting-state functional connectivity of a frontostriatal circuit linking ventral striatal regions with areas of orbitofrontal cortex involved in updating action plans. Moreover, mediated effects support a model in which DA clearance capacity may shape learning performance by contributing to the intrinsic strength of frontostriatal circuits. Collectively, these data suggest that DA re-uptake and frontostriatal circuit integrity are key sources of individual differences in reinforcement learning.
In the striatum, phasic DA release in response to better-than-expected outcomes is believed to facilitate learning for reward-predictive cues via enhanced long-term potentiation or depression of cortical glutamatergic inputs to striatal medium spiny neurons (Reynolds et al. 2001; Frank 2005). This well-supported model leads to several predictions for understanding individual differences in RL. First, lower DAT availability (and hence greater sensitivity to RPE signals owing to higher levels of DA) is predicted to promote reinforcement learning. In support of this idea, we observed that individuals characterized by lower DAT BPND showed higher reinforcement learning scores in behavioral testing. This finding contrasts with some prior research suggesting that DAT blockade does not influence learning but instead affects novelty seeking (Costa et al. 2014). However, differences in the nature of the tasks used to probe RL behavior may contribute to mixed findings (modeling RL behavior using a modified multiarm bandit task vs. using an implicit reward sensitivity task, as in the present study), and the present findings are consistent with other research indicating that DAT has a role in RL (Everitt and Robbins 2005). Second, because the expected effect of larger DAergic RPE signals is to increase coupling between striatum and glutamatergic afferents, we predicted that lower DAT binding would be associated with reinforcement learning behavior via frontostriatal connectivity. Supporting this prediction, we observed not only that individuals who exhibited lower DAT BPND performed better in a reinforcement learning task, but also that this association was mediated by stronger resting-state functional connectivity between the NAc and the OFC. Taken together, these results support the idea that striatal DA signaling contributes to the coordinated activity of striatal and orbitofrontal regions to update learned behaviors (Wilson et al. 2014; Schuck et al. 2016), and that individual differences in striatal DA can be linked to individual differences in RL.
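To make the logic of the mediated (indirect) effect concrete, the following is a minimal sketch of a product-of-coefficients estimate with a percentile bootstrap confidence interval, in the general spirit of resampling approaches to indirect effects. It is not the analysis pipeline used in this study: the data are synthetic, and the variable names (`dat`, `fc`, `rl`), effect sizes, sample size, and number of resamples are illustrative assumptions only.

```python
import random

def cov(u, v):
    """Sample covariance of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (n - 1)

def indirect_effect(x, m, y):
    """Product-of-coefficients estimate a*b for the path x -> m -> y."""
    sxx, smm, sxm = cov(x, x), cov(m, m), cov(x, m)
    sxy, smy = cov(x, y), cov(m, y)
    a = sxm / sxx                                          # slope of m on x
    b = (smy * sxx - sxy * sxm) / (smm * sxx - sxm ** 2)   # slope of y on m, adjusting for x
    return a * b

def bootstrap_ci(x, m, y, n_boot=1000, alpha=0.05, seed=0):
    """Approximate percentile bootstrap interval for the indirect effect."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]   # resample cases with replacement
        estimates.append(indirect_effect([x[i] for i in idx],
                                         [m[i] for i in idx],
                                         [y[i] for i in idx]))
    estimates.sort()
    low = estimates[int(n_boot * alpha / 2)]
    high = estimates[int(n_boot * (1 - alpha / 2)) - 1]
    return low, high

# Synthetic illustration only (hypothetical effect sizes, not study data):
# lower "DAT binding" -> stronger "connectivity" -> better "learning".
gen = random.Random(1)
n = 100
dat = [gen.gauss(0, 1) for _ in range(n)]
fc = [-0.6 * d + gen.gauss(0, 0.8) for d in dat]
rl = [0.7 * f + gen.gauss(0, 0.8) for f in fc]

ab = indirect_effect(dat, fc, rl)
ci_low, ci_high = bootstrap_ci(dat, fc, rl)
print(f"indirect effect = {ab:.3f}, 95% CI [{ci_low:.3f}, {ci_high:.3f}]")
```

In this sketch, an interval that excludes zero would be read as support for an indirect effect. As with the cross-sectional data in the present study, such a result is consistent with, but does not establish, a causal ordering of the variables.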
Some limitations to the current study warrant additional comment. First, the present study was designed to target “intrinsic” individual differences in frontostriatal functioning, by investigating molecular (DAT signaling) and systems-level (functional connectivity) activity at rest. We note that prior research suggests that approximately half of the variability in resting-state functional connectivity is attributable to trait-like individual differences (Buckner et al. 2013). However, a complementary approach would be to investigate individual differences in frontostriatal functional connectivity and molecular signaling during active demands for reinforcement learning. Efforts to extend the scientific questions of the present study to dynamic task-based neuroimaging, using high-resolution neuroimaging sequences, are underway. Second, the present study evaluated DAT concentrations and frontostriatal functional connectivity in separate sessions. This approach assumes stability over time of individual differences in molecular functioning and functional connectivity, an assumption that has been supported (Zuo and Xing 2014; Chen et al. 2015) but may have exceptions (Somandepalli et al. 2015). Therefore, capturing individual differences in molecular and systems-level functioning at the same time point, for example, through simultaneous PET/fMRI (Riedl et al. 2014; Bailey et al. 2016), may provide more precise information about the correspondence between DAT and frontostriatal activity. Third, the negative relationship between learning rate and DAT binding potential was reduced to a trend when 3 participants characterized by data points with Cook’s D above the standard threshold were excluded (adjusted r = −0.32, adjusted P = 0.08). Clearly, independent replications of the current findings are needed.
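The influence-based sensitivity check described in the third limitation can be illustrated with a short sketch. The example below computes Cook's D for a simple linear regression, flags cases above a cutoff, and compares the correlation with and without the flagged cases. It is an illustration under stated assumptions rather than the analysis reported here: the data are synthetic, and the 4/n cutoff is one commonly used convention that is assumed for the example, since the exact threshold is not restated in this section.

```python
def cooks_distance(x, y):
    """Cook's D for each case in a simple linear regression of y on x."""
    n = len(x)
    p = 2  # fitted parameters: intercept and slope
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    mse = sum(e * e for e in resid) / (n - p)
    leverage = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]
    return [e * e / (p * mse) * h / (1 - h) ** 2 for e, h in zip(resid, leverage)]

def pearson_r(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    syy = sum((b - ybar) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Synthetic illustration only: a negative trend plus one aberrant final case.
x = list(range(20))
y = [1.0 - 0.05 * xi + 0.1 * (-1) ** xi for xi in x]
y[19] += 3.0

distances = cooks_distance(x, y)
cutoff = 4 / len(x)  # assumed convention for this sketch
flagged = [i for i, d in enumerate(distances) if d > cutoff]

keep = [i for i in range(len(x)) if i not in flagged]
r_full = pearson_r(x, y)
r_trimmed = pearson_r([x[i] for i in keep], [y[i] for i in keep])
print(f"flagged cases: {flagged}, r (all) = {r_full:.2f}, r (trimmed) = {r_trimmed:.2f}")
```

In this toy example, removing the influential case strengthens the association, whereas in the present study exclusion weakened it. The point of the sketch is only that estimates from modest samples can shift appreciably when a few high-influence cases are removed, which is why replication is emphasized above.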
A final caveat to the described findings is that the behavioral measure of reinforcement learning used (Probabilistic Reward Task) was relatively simple, and only involved learning actions in response to a single stimulus dimension that were associated with possible gains. The relationship between DAT and other aspects of reinforcement learning, for example, learning from penalties or with more complex multidimensional stimuli, was not examined. Consequently, it is possible that the observed associations between DAT and performance would not generalize to these other experimental designs. It is also possible that other RL tasks would be useful to clarify the present finding of a negative association between RL behavior and resting-state functional connectivity in a parietal corticostriatal circuit. In this context, it is interesting to note that prior research has shown that parietal and dorsolateral regions of an attention control network are important for selecting and updating which stimulus dimensions are relevant to reinforcement learning (Niv et al. 2015). Thus, it is possible that in the present study, poor learners experienced the task as more attentionally demanding than good learners did; this possibility should be examined using an RL task that varies cognitive load.
Conclusion
In spite of the above limitations, our results highlight—we believe for the first time in humans—the importance of DAT mechanisms putatively involved in DA clearance and frontostriatal circuit functioning as markers of individual differences in RL. While prior studies have used measures of DA cell firing, pharmacological and optogenetic manipulation, and terminal efflux to demonstrate the role of DAergic RPE signals during reinforcement, these data provide novel evidence for DA re-uptake as a critical source of individual differences in human reinforcement learning. Future research may build upon these findings to investigate linkages between frontostriatal DA functioning and dimensions of personality (e.g., impulsivity, reward dependence), psychiatric health (e.g., anhedonia), and daily functioning.
Supplementary Material
Notes
Conflict of Interest: Over the past 3 years, Dr. Pizzagalli has received consulting fees from Akili Interactive Labs, BlackThorn Therapeutics, Boehringer Ingelheim, Pfizer, and PositScience for activities unrelated to the current research. In the past 3 years, Dr. Treadway has served as a paid consultant to Boston Consulting Group, NeuroCog Trials, Avanir Pharmaceuticals, and BlackThorn Therapeutics. No funding from these entities was used to support the current work, and all views expressed are solely those of the authors. All other authors report no biomedical financial interests.
Authors’ Contributions
All authors have approved the manuscript. D.A.P. was responsible for study conceptualization and funding. F.G., L.M.M., M.B., and A.L.C. collected the data. R.H.K., M.T.T., D.W.W., P.P., and D.A.P. designed the analytic methods. R.H.K., D.W.W., and A.W. conducted formal analysis. N.M.A., G.E.F., M.D.N., and D.A.P. contributed resources for the present study. The manuscript was written by R.H.K. and M.T.T., and edited by D.W.W., P.K., M.D.N., and D.A.P.
Funding
Supported by National Institute of Mental Health grants R01 MH068376 and 1R01 MH101521 (DAP). Dr. Kaiser and Dr. Treadway were supported by 1F32 MH106262 and R00 MH102355, respectively.
Materials and Correspondence
Direct requests for materials and correspondence to Dr. Diego A. Pizzagalli, Department of Psychiatry, Harvard Medical School, and Center for Depression, Anxiety and Stress Research, McLean Hospital (dap@mclean.harvard.edu).
References
- Admon R, Pizzagalli DA. 2015. Corticostriatal pathways contribute to the natural time course of positive mood. Nat Commun. 6:10065. doi: 10.1038/ncomms10065.
- Alpert NM, Yuan F. 2009. A general method of Bayesian estimation for parametric imaging of the brain. Neuroimage. 45(4):1183–1189.
- Bailey DL, Pichler BJ, Guckel B, Barthel H, Beer AJ, Botnar R, Gillies R, Goh V, Gotthardt M, Hicks RJ, et al. 2016. Combined PET/MRI: from status quo to status go. Summary report of the fifth international workshop on PET/MR imaging; February 15–19, 2016; Tübingen, Germany. Mol Imaging Biol. 18(5):637–650.
- Bailey KR, Mair RG. 2007. Effects of frontal cortex lesions on action sequence learning in the rat. Eur J Neurosci. 25(9):2905–2915.
- Behzadi Y, Restom K, Liau J, Liu TT. 2007. A component based noise correction method (CompCor) for BOLD and perfusion based fMRI. Neuroimage. 37(1):90–101.
- Bellebaum C, Koch B, Schwarz M, Daum I. 2008. Focal basal ganglia lesions are associated with impairments in reward-based reversal learning. Brain. 131:829–841.
- Bollen KA, Jackman RW. 1985. Regression diagnostics: an expository treatment of outliers and influential cases. Soc Methods Res. 13(4):510–542.
- Braz BY, Galinanes GL, Taravini IRE, Belforte JE, Murer MG. 2015. Altered corticostriatal connectivity and exploration/exploitation imbalance emerge as intermediate phenotypes for a neonatal dopamine dysfunction. Neuropsychopharmacology. 40(11):2576–2587.
- Buckner RL, Krienen FM, Yeo BTT. 2013. Opportunities and limitations of intrinsic functional connectivity MRI. Nat Neurosci. 16(7):832–837.
- Chen B, Xu T, Zhou CL, Wang LY, Yang N, Wang Z, Dong HM, Yang Z, Zang YF, Zuo XN, et al. 2015. Individual variability and test-retest reliability revealed by ten repeated resting-state brain scans over one month. PLoS One. 10(12):21.
- Cohen J. 1992. A power primer. Psychol Bull. 112(1):155–159.
- Cohen JY, Haesler S, Vong L, Lowell BB, Uchida N. 2012. Neuron-type-specific signals for reward and punishment in the ventral tegmental area. Nature. 482(7383):85–88.
- Cook RD. 1977. Detection of influential observation in linear regression. Technometrics. 19(1):15–18.
- Cordes D, Haughton VM, Arfanakis K, Carew JD, Turski PA, Moritz CH, Quigley MA, Meyerand ME. 2001. Frequencies contributing to functional connectivity in the cerebral cortex in “resting-state” data. Am J Neuroradiol. 22(7):1326–1333.
- Costa VD, Tran VL, Turchi J, Averbeck BB. 2014. Dopamine modulates novelty seeking behavior during decision making. Behav Neurosci. 128(5):556–566.
- Eshel N, Bukwich M, Rao V, Hemmelder V, Tian J, Uchida N. 2015. Arithmetic and local circuitry underlying dopamine prediction errors. Nature. 525(7568):243–246.
- Everitt BJ, Robbins TW. 2005. Neural systems of reinforcement for drug addiction: from actions to habits to compulsion. Nat Neurosci. 8(11):1481–1489.
- Everitt BJ, Robbins TW. 2016. Drug addiction: updating actions to habits to compulsions ten years on. In: Fiske ST, editor. Annual review of psychology. vol 67. Palo Alto: Annual Reviews; p. 23–50.
- Fang YHD, El Fakhri G, Becker JA, Alpert NM. 2012. Parametric imaging with Bayesian priors: a validation study with C-11-altropane PET. Neuroimage. 61(1):131–138.
- Felger JC, Li Z, Haroon E, Woolwine BJ, Jung MY, Hu X, Miller AH. 2016. Inflammation is associated with decreased functional connectivity within corticostriatal reward circuitry in depression. Mol Psychiatry. 21(10):1358–1365.
- Fischman AJ, Bonab AA, Babich JW, Livni E, Alpert NM, Meltzer PC, Madras BK. 2001. C-11,I-127 altropane: a highly selective ligand for PET imaging of dopamine transporter sites. Synapse. 39(4):332–342.
- Flagel SB, Clark JJ, Robinson TE, Mayo L, Czuj A, Willuhn I, Akers CA, Clinton SM, Phillips PEM, Akil H. 2011. A selective role for dopamine in stimulus-reward learning. Nature. 469(7328):53–U63.
- Floresco SB. 2015. The nucleus accumbens: an interface between cognition, emotion, and action. In: Fiske ST, editor. Annual review of psychology. vol 66. Palo Alto: Annual Reviews; p. 25–52.
- Frank MJ. 2005. Dynamic dopamine modulation in the basal ganglia: a neurocomputational account of cognitive deficits in medicated and nonmedicated parkinsonism. J Cogn Neurosci. 17(1):51–72.
- Frank MJ, Seeberger LC, O’Reilly RC. 2004. By carrot or by stick: cognitive reinforcement learning in parkinsonism. Science. 306(5703):1940–1943.
- Geerligs L, Rubinov M, Henson RN, Cam CAN. 2015. State and trait components of functional connectivity: individual differences vary with mental state. J Neurosci. 35(41):13949–13961.
- Hart AS, Rutledge RB, Glimcher PW, Phillips PEM. 2014. Phasic dopamine release in the rat nucleus accumbens symmetrically encodes a reward prediction error term. J Neurosci. 34(3):698–704.
- Horga G, Cassidy CM, Xu XY, Moore H, Slifstein M, Van Snellenberg JX, Abi-Dargham A. 2016. Dopamine-related disruption of functional topography of striatal connections in unmedicated patients with schizophrenia. JAMA Psychiatry. 73(8):862–870.
- Ichise M, Liow JS, Lu JQ, Takano T, Model K, Toyama H, Suhara T, Suzuki T, Innis RB, Carson TE. 2003. Linearized reference tissue parametric imaging methods: application to C-11 DASB positron emission tomography studies of the serotonin transporter in human brain. J Cereb Blood Flow Metab. 23(9):1096–1112.
- Innis RB, Cunningham VJ, Delforge J, Fujita M, Giedde A, Gunn RN, Holden J, Houle S, Huang SC, Ichise M, et al. 2007. Consensus nomenclature for in vivo imaging of reversibly binding radioligands. J Cereb Blood Flow Metab. 27(9):1533–1539.
- Kelly C, de Zubicaray G, Di Martino A, Copland DA, Reiss PT, Klein DF, Castellanos FX, Milham MP, McMahon K. 2009. L-dopa modulates functional connectivity in striatal cognitive and motor networks: a double-blind placebo-controlled study. J Neurosci. 29(22):7364–7378.
- Kravitz AV, Tye LD, Kreitzer AC. 2012. Distinct roles for direct and indirect pathway striatal neurons in reinforcement. Nat Neurosci. 15(6):816–U823.
- Madras BK, Gracz LM, Meltzer PC, Liang AY, Elmaleh DR, Kaufman MJ, Fischman AJ. 1998. Altropane, a SPECT or PET imaging probe for dopamine neurons: II. Distribution to dopamine-rich regions of primate brain. Synapse. 29(2):105–115.
- Meng XL, Rosenthal R, Rubin DB. 1992. Comparing correlated correlation coefficients. Psychol Bull. 111(1):172–175.
- Nakamura T, Ghilardi MF, Mentis M, Dhawan V, Fukuda M, Hacking A, Moeller JR, Ghez C, Eidelberg D. 2001. Functional networks in motor sequence learning: abnormal topographies in Parkinson’s disease. Hum Brain Mapp. 12(1):42–60.
- Niv Y, Daniel R, Geana A, Gershman SJ, Leong YC, Radulescu A, Wilson RC. 2015. Reinforcement learning in multidimensional environments relies on attention mechanisms. J Neurosci. 35(21):8145–8157.
- Pizzagalli DA, Evins AE, Schetter EC, Frank MJ, Pajtas PE, Santesso DL, Culhane M. 2008. Single dose of a dopamine agonist impairs reinforcement learning in humans: behavioral evidence from a laboratory-based measure of reward responsiveness. Psychopharmacology (Berl). 196(2):221–232.
- Pizzagalli DA, Jahn AL, O’Shea JP. 2005. Toward an objective characterization of an anhedonic phenotype: a signal detection approach. Biol Psychiatry. 57(4):319–327.
- Preacher KJ, Hayes AF. 2008. Asymptotic and resampling strategies for assessing and comparing indirect effects in multiple mediator models. Behav Res Methods. 40(3):879–891.
- Reynolds JNJ, Hyland BI, Wickens JR. 2001. A cellular mechanism of reward-related learning. Nature. 413(6851):67–70.
- Riedl V, Bienkowska K, Strobel C, Tahmasian M, Grimmer T, Forster S, Friston KJ, Sorg C, Drzezga A. 2014. Local activity determines functional connectivity in the resting human brain: a simultaneous FDG-PET/fMRI study. J Neurosci. 34(18):6260–6266.
- Santesso DL, Dillon DG, Birk JL, Holmes AJ, Goetz E, Bogdan R, Pizzagalli DA. 2008. Individual differences in reinforcement learning: behavioral, electrophysiological, and neuroimaging correlates. Neuroimage. 42(2):807–816.
- Santesso DL, Evins AE, Frank MJ, Schetter EC, Bogdan R, Pizzagalli DA. 2009. Single dose of a dopamine agonist impairs reinforcement learning in humans: evidence from event-related potentials and computational modeling of striatal-cortical function. Hum Brain Mapp. 30(7):1963–1976.
- Schuck NW, Cai MB, Wilson RC, Niv Y. 2016. Human orbitofrontal cortex represents a cognitive map of state space. Neuron. 91(6):1402–1412.
- Schultz W. 2015. Neuronal reward and decision signals: from theories to data. Physiol Rev. 95(3):853–951.
- Seger CA. 2009. The involvement of corticostriatal loops in learning across tasks, species, and methodologies. Basal Ganglia IX. 58:25–39.
- Smith SM. 2012. The future of fMRI connectivity. Neuroimage. 62(2):1257–1266.
- Smith SM, Vidaurre D, Beckmann CF, Glasser MF, Jenkinson M, Miller KL, Nichols TE, Robinson EC, Salimi-Khorshidi G, Woolrich MW, et al. 2013. Functional connectomics from resting-state fMRI. Trends Cogn Sci. 17(12):666–682.
- Somandepalli K, Kelly C, Reiss PT, Zuo XN, Craddock RC, Yan CG, Petkova E, Castellanos FX, Milham MP, Di Martino A. 2015. Short-term test-retest reliability of resting state fMRI metrics in children with and without attention-deficit/hyperactivity disorder. Dev Cogn Neurosci. 15:83–93.
- Taylor JR, Robbins TW. 1984. Enhanced behavioral-control by conditioned reinforcers following microinjections of d-amphetamine into the nucleus accumbens. Psychopharmacology (Berl). 84(3):405–412.
- Tsai HC, Zhang F, Adamantidis A, Stuber GD, Bonci A, de Lecea L, Deisseroth K. 2009. Phasic firing in dopaminergic neurons is sufficient for behavioral conditioning. Science. 324(5930):1080–1084.
- Tzourio-Mazoyer N, Landeau B, Papathanassiou D, Crivello F, Etard O, Delcroix N, Mazoyer B, Joliot M. 2002. Automated anatomical labeling of activations in SPM using a macroscopic anatomical parcellation of the MNI MRI single-subject brain. Neuroimage. 15(1):273–289.
- van den Bos W, Cohen MX, Kahnt T, Crone EA. 2012. Striatum-medial prefrontal cortex connectivity predicts developmental changes in reinforcement learning. Cereb Cortex. 22(6):1247–1255.
- Volkow ND, Ding YS, Fowler JS, Wang GJ, Logan J, Gatley SJ, Hitzemann R, Smith G, Fields SD, Gur R. 1996. Dopamine transporters decrease with age. J Nuclear Med. 37(4):554–559.
- Vrieze E, Ceccarini J, Pizzagalli DA, Bormans G, Vandenbulcke M, Demyttenaere K, Van Laere K, Claes S. 2013. Measuring extrastriatal dopamine release during a reward learning task. Hum Brain Mapp. 34(3):575–586.
- Whitfield-Gabrieli S, Nieto-Castanon A. 2012. Conn: a functional connectivity toolbox for correlated and anticorrelated brain networks. Brain Connectivity. 2(3):125–141.
- Wilkinson L, Khan Z, Jahanshahi M. 2009. The role of the basal ganglia and its cortical connections in sequence learning: evidence from implicit and explicit sequence learning in Parkinson’s disease. Neuropsychologia. 47(12):2564–2573.
- Wilson RC, Takahashi YK, Schoenbaum G, Niv Y. 2014. Orbitofrontal cortex as a cognitive map of task space. Neuron. 81(2):267–279.
- Yeh CB, Chou YH, Cheng CY, Lee MS, Wang JJ, Lee CH, Shiue CY, Su TP, Huang WS. 2012. Reproducibility of brain dopamine transporter binding with Tc-99m TRODAT-1 SPECT in healthy young men. Psychiatry Research-Neuroimaging. 201(3):222–225.
- Yeo BTT, Krienen FM, Sepulcre J, Sabuncu MR, Lashkari D, Hollinshead M, Roffman JL, Smoller JW, Zoller L, Polimeni JR, et al. 2011. The organization of the human cerebral cortex estimated by intrinsic functional connectivity. J Neurophysiol. 106(3):1125–1165.
- Zuo XN, Xing XX. 2014. Test-retest reliabilities of resting-state fMRI measurements in human brain functional connectomics: a systems neuroscience perspective. Neurosci Biobehav Rev. 45:100–118.