Abstract
Interindividual differences in the effects of reward on performance are prevalent and poorly understood, with some individuals being more dependent than others on the rewarding outcomes of their actions. The origin of this variability in reward dependence is unknown. Here, we tested the relationship between reward dependence and brain structure in healthy humans. Subjects trained on a visuomotor skill-acquisition task and received performance feedback in the presence or absence of reward. Reward dependence was defined as the statistical trial-by-trial relation between reward and subsequent performance. We report a significant relationship between reward dependence and the lateral prefrontal cortex, where regional gray-matter volume predicted reward dependence but not feedback alone. Multivoxel pattern analysis confirmed the anatomical specificity of this relationship. These results identified a likely anatomical marker for the prospective influence of reward on performance, which may be of relevance in neurorehabilitative settings.
Keywords: brain structure, motivation, prefrontal cortex, reward
Introduction
Motivational processes are a fundamental driver of human performance, affecting the acquisition of new knowledge and skills (Dweck, 1986; Dickinson and Balleine, 1994). The capacity of extrinsic rewards, such as food, water, or money, to shape behavior is well documented (Berridge and Robinson, 2003; Daw et al., 2005; Berridge et al., 2009; Abe et al., 2011; Manley et al., 2014). However, individual differences in the effects of reward on performance and skill acquisition have been repeatedly reported (Cohen, 2007; Schönberg et al., 2007; Santesso et al., 2008; Frank et al., 2009), and their origin remains poorly understood. Humans differ considerably in the degree to which reward guides their behavior, with some individuals being more tuned than others to the rewarding outcomes of their actions. The propensity to consistently respond to reward cues on a trial-by-trial basis, commonly referred to as reward dependence (Cloninger, 1987; Cloninger et al., 1993), is a key feature of reward processing, dissociable from more global influences of reward on on-line within-session learning (Abe et al., 2011; Dayan et al., 2014). The local trial-by-trial effects of reward on subsequent performance may be masked when performance measures are averaged across trials or blocks, necessitating analysis that quantifies reward effects on performance at the single-trial level (Daw, 2011). To date, reward dependence has been studied primarily in the context of addiction (Cloninger, 1987; Han et al., 2007). Importantly, the neural mechanisms underlying the contribution of reward dependence to behavioral performance and skill acquisition have not yet been identified.
Reward processing is controlled by a complex hierarchy of brain systems and is influenced by many situational factors and learned associations (Berridge et al., 2009; Schönberg et al., 2014). In this respect, reward processing is often state-dependent, influenced by the state humans or other animals are operating under, for example, hunger or satiety. Thus, state-dependent computations, a product of the interaction between incoming stimuli and the internal dynamic state of neural networks (Buonomano and Maass, 2009), may play a substantial role during reward processing and valuation (Niv et al., 2006; Pompilio et al., 2006; Symmonds et al., 2010; Levy et al., 2013), and are therefore likely to contribute to the observed individual differences in the effects of reward on performance. On the other hand, the influence of more stable, pre-existing, state-independent factors, such as brain anatomy, remains elusive. A large body of recent research has shown that performance in a wide range of perceptual, motor, and cognitive functions is linked to the local structure of gray matter in functionally related brain regions (Davatzikos, 2004; Kanai and Rees, 2011; Kanai et al., 2011a,b, 2012). Here, we tested whether reward dependence, defined as the extent to which reward guides subsequent behavior on a trial-by-trial basis during the acquisition of an explicit procedural skill, could be predicted by structural gray-matter properties.
Materials and Methods
Participants.
Thirty-eight healthy right-handed volunteers (18 F/20 M; mean age, 24.3 ± 0.49 SEM) participated in the study. All participants provided written informed consent before their participation in the study and all procedures were approved by the Combined Neuroscience Institutional Review Board, National Institutes of Health. Inclusion criteria were unremarkable physical and neurological history, no MRI contraindications, no use of psychoactive medication, and ability to perform the task.
Skill acquisition task.
Participants trained on a sequential explicit visuomotor acquisition task (Reis et al., 2009; Schambra et al., 2011). The task was administered in a quiet room, via a laptop computer (Dell, Latitude E5510, screen size 15.6 inches), placed in front of the participants at comfortable viewing height and distance (∼40 cm away). The task required subjects to move a small cursor through five individually colored, horizontally displayed targets, numbered 1–5. Four of the targets resembled a gate (i.e., were composed of two parallel horizontal lines) and the fifth was a thick horizontal line. Subjects controlled the movements of the cursor by pinching a force-transducer with the distal phalanx of the thumb and the proximal interphalangeal joint of the index finger of their right, dominant hand. Specifically, pressing the transducer moved the cursor rightward, toward the targets, whereas releasing pressure moved the cursor back toward the starting position. To complete a trial successfully, subjects had to accurately (avoiding target-overshooting) move the cursor through Gate 1, return to the starting position, proceed to Gate 2, and then move back and forth between the remaining targets and the starting position until completion of the trial. Whenever a target was reached successfully (defined as maintaining the cursor inside the gate for a minimum of 0.2 s), and on successful completion of a trial, performance feedback in the form of an auditory signal was provided. A GO-signal (a bright green circle displayed beneath the task display) indicated the beginning of each trial. Subjects were instructed to complete each sequence as accurately as possible, within a fixed duration (8 s); thus, successful performance required a combination of accuracy and speed.
Reward-guided performance.
After one block of 20 trials, where baseline levels of performance were assessed, subjects continued training for five additional blocks, composed of 20 trials each. In one-half of the subjects (rewarded group, n = 19; mean age, 23.8 ± 0.5, 9 F/10 M) monetary reward (presented visually: “You Win: $0.6”) was given after each successful trial. Accumulated reward was also displayed beneath the immediate reward display. For trials that were not successfully completed, subjects received a “no reward” outcome (“You Win: $0”). The task also included a penalty option (“You Lose $1”) for trials where no attempt had been made to move the cursor, but this did not occur in any of the subjects. To control for the specificity of the reported findings to reward dependence, we trained a second group of subjects on the same skill-acquisition task, providing performance feedback alone, with no reward. This group (unrewarded group, n = 19; mean age, 24.7 ± 0.85, 9 F/10 M) performed the exact same task as the rewarded group with identical performance feedback settings, yet they did not receive reward upon completion of successful trials. The two groups of subjects did not differ in mean age (t(36) = 0.846, p = 0.403) or sex (distributions were identical).
Experimental design.
T1-weighted scans were administered to all subjects immediately before training. Training on the visuomotor skill-acquisition task, performed outside of the scanner, began after one block of trials where baseline performance levels were assessed. Training consisted of five blocks of 20 trials each. Short breaks were provided between blocks. Subjects also underwent functional resting state scans and most (36/38) also participated in three additional procedural memory tests that were administered after training ended (data will be reported elsewhere).
Imaging setup.
Imaging data were acquired with a 3.0-T GE Signa HDx scanner using an 8-channel coil. High-resolution (1 × 1 × 1 mm³) 3D magnetization-prepared rapid gradient echo (MPRAGE) T1-weighted images were acquired (repetition time = 6.264 ms; echo time = 2.672 ms; field-of-view: 256 × 256; slice thickness: 1 mm; slice spacing: 1 mm).
Behavioral data analysis
Binary logistic regression.
To quantify the degree to which subjects were dependent on reward (in the rewarded group), and compare it with the equivalent influence of simple performance feedback (in the unrewarded group), we subjected the training data to binary logistic regression. This analysis measured the degree to which reward or performance feedback at any trial n predicted successful performance at the following trial, n + 1. The regression model was of the form:

$$\ln(\mathrm{ODDS}) = \ln\left(\frac{p}{1 - p}\right) = a + bX$$

where the model predicts the natural log of ODDS, which corresponds to the odds of a trial being successful or not, p is the predicted probability that a trial was successful (1) rather than not successful (0), a is the intercept, b is the regression coefficient, and X is the predictor of the model (reward or performance feedback at trial n). The negative-log likelihood (NLL), a statistic measuring the fit of the model (in simplified terms, how well the model predicted subjects' success), was used as a measure of how well reward (or performance feedback) predicted subjects' performance in the task. The analysis was performed on fully binarized data, for both the rewarded and unrewarded groups. In both groups, the dependent variable was a binarized measure of trial success (0 = unsuccessful, 1 = successful). In the rewarded group, the independent predictor variable, reward at the preceding trial, was binarized as well: trials that were rewarded were coded as “1” and those that were not were coded as “0.” Similarly, in the unrewarded group, trials that received performance feedback indicating success were coded as “1,” and those that did not were coded as “0.” To derive the NLL, binary logistic regression was fit for each subject individually. The distributions of the derived NLL statistics, as obtained in the rewarded and unrewarded groups, were compared with a nonparametric Kolmogorov–Smirnov test. In addition, to compare the degree to which the NLL statistic reflected subjects' performance in the task, across groups, we fit a generalized linear model with fraction of correct trials serving as dependent variable, group assignment (rewarded/unrewarded) as factor, and NLL as covariate.
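The per-subject fit described above can be illustrated with a minimal Python sketch. It is not the authors' code (the software used for this step is not stated), and the function names are hypothetical. It assumes that trials are treated as one continuous sequence, so the outcome at trial n is paired with success at trial n + 1 even across block boundaries. Because the model has a single binary predictor, it is saturated, and the maximum-likelihood fitted probabilities are simply the observed success rates following each outcome; the sketch uses that closed form rather than an iterative solver.

```python
import math

def lagged_pairs(outcomes):
    """Pair the outcome at trial n (predictor) with success at trial n + 1."""
    return list(zip(outcomes[:-1], outcomes[1:]))

def fit_binary_logistic(pairs):
    """Fit ln(p / (1 - p)) = a + b*X for a single binary predictor X.

    With one 0/1 predictor the model is saturated, so the fitted
    probabilities equal the observed success rates after X = 0 and X = 1.
    Returns (a, b, nll); a or b is None when the estimate is not finite.
    """
    counts = {0: [0, 0], 1: [0, 0]}  # counts[x] = [failures, successes]
    for x, y in pairs:
        counts[x][y] += 1

    nll = 0.0
    logits = {}
    for x, (fails, wins) in counts.items():
        total = fails + wins
        if total == 0:
            logits[x] = None  # this predictor value never occurred
            continue
        p = wins / total
        # Terms with a zero count contribute nothing (0 * log 0 -> 0).
        for k, q in ((wins, p), (fails, 1.0 - p)):
            if k > 0:
                nll -= k * math.log(q)
        logits[x] = math.log(p / (1.0 - p)) if 0.0 < p < 1.0 else None

    a = logits[0]
    b = logits[1] - logits[0] if None not in (logits[0], logits[1]) else None
    return a, b, nll

# Hypothetical 10-trial outcome sequence (1 = successful/rewarded trial).
example_outcomes = [0, 0, 1, 1, 1, 0, 1, 1, 1, 1]
a, b, nll = fit_binary_logistic(lagged_pairs(example_outcomes))
```

In this formulation a smaller NLL means that the outcome at trial n predicted performance at trial n + 1 more closely, which is the sense in which the text treats lower NLL as stronger dependence.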
Skill acquisition.
The focus of the current work is on the trial-by-trial dependency between reward and performance, referred to throughout the text as reward dependence. However, we derived two additional learning metrics from subjects' training data to assess the specificity of the results to reward dependence, rather than to other aspects of learning. We first derived, for each subject, the fraction of successful trials, averaged per block, defining learning as the difference between the last (fifth) and first blocks. A second, more elaborate measure for skill acquisition in the task, as proposed and used previously, combined speed and accuracy, capturing shifts in speed–accuracy tradeoff functions. The measure, which was described in detail in previous studies (Reis et al., 2009; Schambra et al., 2011), was of the following form:

$$\text{Skill measure} = \frac{1 - \text{error rate}}{\text{error rate} \times \left(\ln(\text{movement time})\right)^{b}}$$

where error rate (proportion of unsuccessful trials) and movement time (time from trial initiation to its end) were averaged per block. In error-free blocks, error rate was set at 0.05. The exponent, b, was set at 5.424, a value which was confirmed in two independent samples of subjects who performed the same task (Reis et al., 2009; Schambra et al., 2011; Dayan et al., 2014). Learning gains were defined as the difference in the skill measure between the last (fifth) and first blocks. Focusing on shifts in speed–accuracy tradeoff functions, rather than on speed or accuracy separately, ensures that training-related behavioral changes did not reflect a simple change along the same tradeoff function (e.g., switching from slow and accurate to fast and inaccurate performance), which is less indicative of skill acquisition (Reis et al., 2009; Schambra et al., 2011; Dayan et al., 2014). Together with fraction of correct trials, these two learning metrics allowed us to dissociate the global effects of reward on learning gains from those of reward dependence, which expresses the more local trial-by-trial effects of reward on subsequent performance.
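As a rough illustration of the combined speed–accuracy measure, the Python sketch below assumes it takes the form (1 − error rate) / (error rate × ln(movement time)^b), which is our reading of the cited work (Reis et al., 2009) and should be checked against it. The function names are hypothetical. The exponent (5.424) and the substitution of 0.05 for error-free blocks come from the text; the requirement that movement time exceed 1 s is an assumption added here so that the logarithm stays positive.

```python
import math

B_EXPONENT = 5.424   # exponent b reported in the text
ERROR_FLOOR = 0.05   # error rate substituted in error-free blocks

def skill_measure(error_rate, movement_time, b=B_EXPONENT):
    """Combined speed-accuracy skill measure for one block (assumed form).

    error_rate: proportion of unsuccessful trials in the block.
    movement_time: mean trial duration in seconds (must exceed 1 s here,
    so that ln(movement_time) is positive).
    """
    if movement_time <= 1.0:
        raise ValueError("movement_time must exceed 1 s for this sketch")
    if error_rate == 0:
        error_rate = ERROR_FLOOR
    return (1.0 - error_rate) / (error_rate * math.log(movement_time) ** b)

def learning_gain(block_values):
    """Difference between the last and first training blocks."""
    return block_values[-1] - block_values[0]

# Hypothetical per-block (error rate, movement time in s) for five blocks.
blocks = [(0.60, 7.5), (0.45, 7.2), (0.35, 7.0), (0.25, 6.8), (0.20, 6.5)]
gain = learning_gain([skill_measure(err, mt) for err, mt in blocks])
```

Under this form, the measure rises when errors fall at a fixed movement time or when movement time falls at a fixed error rate, so a gain reflects a shift of the tradeoff function rather than a move along it.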
Imaging data analysis
VBM.
Voxel-based morphometry (VBM; Ashburner and Friston, 2000) analysis was performed using Statistical Parametric Mapping 8 (SPM8). T1-weighted MR images were first segmented for gray matter, white matter, and CSF. The diffeomorphic anatomical registration through exponentiated Lie algebra (DARTEL) algorithm, implemented in SPM8, was then used for intersubject alignment of gray-matter images. The aligned images were then smoothed using an 8 mm full-width at half-maximum Gaussian kernel and transformed into Montreal Neurological Institute (MNI) space. Statistical analysis, performed in SPM8 as well, was based on an ANOVA model with group assignment (rewarded, unrewarded) serving as between-groups factor and NLL (or other learning-derived metrics, see Skill Acquisition) as a covariate of interest (thus, the interaction of group and the covariate was modeled in the design matrix). Total intracranial volume was computed within SPM and the data were proportionally scaled accordingly. Initial whole-brain analysis was performed with a voxelwise threshold of p < 0.001. Cluster-level p values were set at p < 0.05, adjusting for nonstationary smoothness (Worsley et al., 1999) using the nonstationary correction toolbox for SPM (Hayasaka et al., 2004). To further visualize the results, we extracted individual subjects' gray-matter volumes from each of the clusters that showed an effect using MarsBar (http://marsbar.sourceforge.net/). This step was performed for visualization purposes only and does not constitute an independent statistical test. Results of the VBM analysis were visualized, in part, using MRIcron (http://www.mccauslandcenter.sc.edu/mricro/mricron/).
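To make the group × covariate interaction concrete, the following Python sketch builds a design matrix of the general kind such a model implies: one intercept column per group and one covariate column per group. This only illustrates the idea; the column order, the absence of mean-centering, and the names `interaction_design_matrix` and `INTERACTION_CONTRAST` are assumptions, not a description of how SPM8 constructs its design internally.

```python
def interaction_design_matrix(groups, covariate):
    """Build one row per subject:
    [rewarded, unrewarded, covariate x rewarded, covariate x unrewarded].

    groups: list of "rewarded" / "unrewarded" labels.
    covariate: list of per-subject values (e.g., NLL), same order as groups.
    """
    rows = []
    for group, value in zip(groups, covariate):
        rewarded = 1.0 if group == "rewarded" else 0.0
        unrewarded = 1.0 - rewarded
        rows.append([rewarded, unrewarded, value * rewarded, value * unrewarded])
    return rows

# Contrast asking whether the covariate slope differs between the groups.
INTERACTION_CONTRAST = [0.0, 0.0, 1.0, -1.0]

# Hypothetical subjects and NLL values, for illustration only.
design = interaction_design_matrix(
    ["rewarded", "rewarded", "unrewarded", "unrewarded"],
    [40.2, 55.1, 47.3, 51.8],
)
```

Splitting the covariate into one column per group lets each group have its own slope, so the contrast above isolates the difference in slopes, separately from any overall group difference in gray-matter volume.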
Pattern classification.
Pattern recognition analysis was performed using the Pattern Recognition of Brain Image Data (PROBID) toolbox, on MATLAB 7. The objective of this analysis was to examine the specificity of the results to the superior frontal part of the lateral prefrontal cortex, in comparison with other structures in lateral and medial prefrontal cortex. Thus, this analysis, which was performed in the rewarded and unrewarded groups separately, aimed to examine whether patterns of gray-matter volume within these regions allow for an accurate classification of subjects who were more or less reward dependent (in the rewarded group), and as a control examined whether subjects who were more or less feedback dependent in the unrewarded group could be classified as well.
We first split the subjects in the rewarded and unrewarded groups using a median split, distinguishing within each group between subjects who were more or less reward or performance feedback dependent (higher and lower NLL = less and more dependent, respectively). Preprocessed gray-matter images (see description of preprocessing steps, above) were then subjected to pattern classification using a support vector machine (SVM) classifier. The PROBID toolbox uses a linear kernel SVM, with its associated benefits in terms of reduction in the risk of overfitting. To specifically compare classification based on different regions in the prefrontal cortex, we created masks based on the automated anatomical labeling (Tzourio-Mazoyer et al., 2002) atlas via MRIcron, resliced to accurately fit subjects' T1 images using SPM. The following masks were generated, bilaterally: superior frontal gyri, middle frontal gyri, middle orbital-frontal gyri, and medial orbital-frontal gyri. We trained each classifier based on gray-matter images within each of these regions independently, each time testing whether participants with high and low NLL could be accurately classified in the rewarded and unrewarded groups. The performance of the classifier was tested with a leave-two-out cross-validation procedure (Ecker et al., 2010), where the test was run n times, n being the number of subjects, leaving two subjects out on each iteration. The accuracy of classification refers to the proportion of subjects who were correctly classified as more or less reward or feedback dependent in the rewarded and unrewarded groups, respectively. Accuracy also corresponds to the average between the classification's specificity and sensitivity.
Taking into account rates of true-positive (TP), true-negative (TN), false-positive (FP), and false-negative (FN) classifications, Sensitivity = TP/(TP + FN) and Specificity = TN/(FP + TN), where TP and TN refer to subjects who were correctly classified as more (TP) and less (TN) reward dependent (or more or less feedback dependent in the unrewarded group), whereas FP and FN refer to subjects who were incorrectly classified as belonging to each of these two groups. A permutation test consisting of 5000 iterations was used to derive significance estimates (p value) for each classification accuracy.
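The definitions above translate directly into code. The Python sketch below computes sensitivity, specificity, and accuracy from the four counts, and a permutation p value from a list of accuracies obtained with shuffled labels. The function names are hypothetical, the counts in the example are invented for illustration, and the p value convention (the fraction of permuted accuracies at least as high as the observed one) is an assumption, since the exact convention used by the toolbox is not stated in the text.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return (sensitivity, specificity, accuracy) from confusion counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (fp + tn)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return sensitivity, specificity, accuracy

def permutation_p_value(observed_accuracy, null_accuracies):
    """Fraction of permuted-label accuracies >= the observed accuracy."""
    hits = sum(1 for acc in null_accuracies if acc >= observed_accuracy)
    return hits / len(null_accuracies)

# Hypothetical counts with equal-sized subgroups (9 and 9 subjects).
sens, spec, acc = classification_metrics(tp=6, tn=7, fp=2, fn=3)
```

Accuracy equals the mean of sensitivity and specificity only when the two subgroups are the same size, as a median split produces; with unequal subgroups, the mean of the two would be the balanced accuracy rather than the raw proportion correct.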
Results
Thirty-eight healthy volunteers underwent structural brain imaging (MRI) followed by training on an explicit sequential visuomotor skill-acquisition task (Reis et al., 2009; Schambra et al., 2011; Dayan et al., 2014; Fig. 1A). Training required subjects to move a small cursor back and forth between a “home” location and five targets (4 gates and 1 thick line), numbered 1–5, by modulating pinch force applied to a force transducer (Fig. 1B). Subjects were instructed to perform the entire sequence of movements (i.e., 1-back, 2-back, 3-back, 4-back, 5) accurately within 8 s. Whenever a target was reached successfully (defined as maintaining the cursor inside the gate for a minimum of 0.2 s), and upon successful completion of a trial, performance feedback in the form of an auditory tone was provided. Following a block of 20 trials, where baseline performance levels were assessed, subjects trained for five additional blocks where they all received performance feedback. In one-half of the subjects (henceforth, the rewarded group) successful completion of each training trial was additionally reinforced with monetary reward (displayed visually: “You Win: $0.6”). To tease apart the contribution of reward from that of performance feedback, the other half of the participants (the unrewarded group) trained with performance feedback alone (Fig. 1C). Thus, the rewarded group received performance feedback and visually presented monetary reward, whereas the unrewarded group received performance feedback alone.
To quantitatively define reward dependence, we analyzed the training data using binary logistic regression (Kedem and Fokianos, 2002). This analysis measured the statistical dependency between trial outcome (reward or performance feedback alone) given at trial n and performance at trial n + 1 (Fig. 2A). We first binarized the performance and outcome phases of each trial (1 and 0 for successful and unsuccessful trials; 1 and 0 for positive or null outcomes). After fitting a logistic regression model to the training data of each subject, we derived the resulting NLL, a metric expressing the goodness-of-fit of the model. Thus, the NLL expresses how well the logistic regression model predicted subjects' performance along training (note that smaller NLL values express a better fit). The distributions of the NLL parameter obtained from the training data of the rewarded and unrewarded groups were not significantly different from one another (Kolmogorov–Smirnov, Z = 0.973; p = 0.3). Across groups, NLL was strongly predictive of the overall performance in the task (Wald χ2 = 13.57, p < 0.001), and this predictive relationship did not differ between the rewarded and unrewarded groups (Wald χ2 = 2.89, p = 0.09). Thus, both performance feedback and reward had similar overall effects on performance at the subsequent trial.
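For readers unfamiliar with the Kolmogorov–Smirnov comparison, the Python sketch below computes the two-sample statistic D (the largest gap between the two empirical cumulative distributions) and a scaled statistic Z = D × sqrt(n1 × n2 / (n1 + n2)). Whether the Z reported above follows exactly this scaling is an assumption; the function name and the sample values are hypothetical.

```python
import math

def ks_two_sample(sample_a, sample_b):
    """Return (D, Z) for the two-sample Kolmogorov-Smirnov comparison.

    D is the maximum absolute difference between the empirical CDFs;
    Z rescales D by sqrt(n_a * n_b / (n_a + n_b)).
    """
    n_a, n_b = len(sample_a), len(sample_b)
    d = 0.0
    # The empirical CDFs only change at observed values, so it is enough
    # to evaluate the gap at every distinct value in either sample.
    for v in sorted(set(sample_a) | set(sample_b)):
        cdf_a = sum(1 for x in sample_a if x <= v) / n_a
        cdf_b = sum(1 for x in sample_b if x <= v) / n_b
        d = max(d, abs(cdf_a - cdf_b))
    z = d * math.sqrt(n_a * n_b / (n_a + n_b))
    return d, z

# Hypothetical NLL-like values for two small groups.
d_stat, z_stat = ks_two_sample([1, 2, 3, 4], [3, 4, 5, 6])
```

If the reported value does follow this definition, then with 19 subjects per group Z = 0.973 would correspond to D of roughly 0.316, that is, a maximal gap of about 6 of 19 subjects between the two cumulative distributions.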
We then tested using whole-brain VBM (Ashburner and Friston, 2000) whether the degree of dependence on reward as opposed to performance feedback alone related to variation in brain structure. Regional brain volumes were compared with an ANOVA where group assignment (rewarded/unrewarded) served as a factor and reward or performance feedback dependence (i.e., the NLL of the model) as a covariate. The main effect for group was not statistically significant, indicating that the two groups of subjects did not show any overall baseline differences in gray-matter volume. Two clusters in left and right superior frontal gyri showed a significant interaction between group and the dependence covariate (Fig. 2A; Table 1). Within these clusters gray-matter volume strongly correlated with dependence in the rewarded (r = 0.723, p < 0.001 and r = 0.709, p < 0.001 for the right and left clusters, respectively), but not in the unrewarded (r = −0.093, p = 0.705 and r = −0.066, p = 0.789) group. Notably, as smaller NLL values express a better fit of the binary-logistic regression model, positive correlations between NLL and gray-matter volume in effect expressed an inverse relation between reward dependence and brain structure. Thus, the results reveal that individuals who were more dependent on reward had smaller gray-matter volume in right and left superior frontal gyri.
Table 1.
Region | MNI x | MNI y | MNI z | t | p |
---|---|---|---|---|---|
L sup frontal gyrus | −14 | 26 | 61 | 3.44 | 0.0007 |
R sup frontal gyrus | 15 | 30 | 60 | 3.86 | 0.0002 |
Significant clusters showing an interaction between group (rewarded, unrewarded) and the dependence covariate. L, Left; R, right.
To examine the specificity of the results to reward dependence, we tested whether the interaction between group assignment (rewarded/unrewarded) and other learning-derived parameters would yield significant effects. With on-line learning gains defined as the difference in the fraction of correct trials during the last block, relative to those of the first block (ΔBlock5 − Block1), both groups showed significant gains over the course of training (rewarded group: t(18) = 7.78, p < 0.001; unrewarded group: t(18) = 10.75, p < 0.001). On-line learning gains were comparable between the two groups (t(36) = 0.983, p = 0.332), and were not significantly correlated with subjects' NLL (r = 0.038, p = 0.82), suggesting that the two measures are largely independent. VBM analysis based on an ANOVA with group assignment serving as a factor and learning gains as a covariate did not reveal any significant interaction effects. We next repeated the analysis defining on-line learning based on an outcome measure combining both speed and accuracy, focusing on shifts in the task's speed–accuracy tradeoff function (Reis et al., 2009; Schambra et al., 2011; Dayan et al., 2014). On-line learning gains, again defined as the difference between the last and first blocks (ΔBlock5 − Block1), were comparable in the rewarded and unrewarded groups (t(36) = 1.273, p = 0.211), and were again not significantly correlated with subjects' NLL (r = 0.056, p = 0.74). VBM analysis again revealed no significant interaction between group and on-line learning gains. Together, these results suggest that the interaction between group assignment and reward dependence was not present for on-line learning, and thus confirm the specificity of the findings to reward dependence.
To test the topographic specificity of the current results to the superior frontal part of the lateral prefrontal cortex, we used multivariate pattern recognition analysis, with its potential for detecting more subtle morphological differences (Ecker et al., 2010). This analysis aimed to reveal whether patterns of gray-matter volume within superior frontal gyri, and in other control regions, could allow for an accurate classification of subjects who were more or less reward dependent. As a control, we additionally tested whether subjects who were more or less feedback dependent could be similarly classified. We first performed a median split in the dependence results (Fig. 3A), distinguishing within each of the two groups between subjects who were more or less reward- or performance feedback-dependent (higher and lower NLL = less and more dependent, respectively). We then trained a support vector machine classifier (Boser et al., 1992) to distinguish between higher and lower reward- or performance feedback-dependent individuals, based on gray-matter volumes from left and right superior frontal gyri. The classification accuracy derived from this region was compared with accuracy derived from other adjacent structures in the prefrontal cortex, including middle frontal, middle orbital-frontal, and medial orbital-frontal gyri (Fig. 3B). All structures were localized anatomically based on a standardized atlas (Tzourio-Mazoyer et al., 2002). Consistent with the results reported above, assignment into the subgroups of subjects who were more or less reward dependent was successfully classified from patterns of gray-matter volume in both left (77.78% accuracy, p < 0.01; Fig. 3C) and right (72.22% accuracy, p < 0.05; Fig. 3D) superior frontal gyri, but not in any of the other prefrontal regions (Fig. 3C,D; Table 2).
Similarly, classification of reward dependence using a whole-brain mask (i.e., not confined to prefrontal cortical regions) was only marginally significant (p = 0.07; Fig. 3E).
Table 2.
Region | Sensitivity (%): Reward | Sensitivity (%): Feedback | Specificity (%): Reward | Specificity (%): Feedback | Accuracy (%): Reward | Accuracy (%): Feedback | p: Reward | p: Feedback |
---|---|---|---|---|---|---|---|---|
L mid front | 55.56 | 33.33 | 66.67 | 33.33 | 61.11 | 33.33 | 0.222 | 0.955 |
R mid front | 44.44 | 55.56 | 55.56 | 33.33 | 50 | 44.44 | 0.582 | 0.755 |
L sup frontal | 77.78 | 22.22 | 77.78 | 33.33 | 77.78 | 27.78 | 0.0076 | 0.9874 |
R sup frontal | 66.67 | 55.56 | 77.78 | 44.44 | 72.22 | 50 | 0.0444 | 0.597 |
L med orb front | 55.56 | 66.67 | 22.22 | 22.22 | 38.89 | 44.44 | 0.774 | 0.779 |
R med orb front | 44.44 | 55.56 | 22.22 | 55.56 | 33.33 | 55.56 | 0.948 | 0.4112 |
L mid orb front | 66.67 | 33.33 | 66.67 | 33.33 | 66.67 | 33.33 | 0.127 | 0.888 |
R mid orb front | 55.56 | 66.67 | 33.33 | 44.44 | 44.44 | 55.56 | 0.767 | 0.406 |
Whole-brain | 77.78 | 44.44 | 55.56 | 33.33 | 66.67 | 38.89 | 0.07 | 0.888 |
Accuracy, sensitivity, and specificity obtained from a multivariate pattern classification analysis, classifying subjects who were more and less reward (in the rewarded group) or feedback (in the unrewarded group) dependent. Significance estimates (p values) derived from a permutation test consisting of 5000 iterations are also shown. Abbreviations are as in Fig. 3.
In the unrewarded group, gray-matter volume in none of the prefrontal regions successfully classified individuals who were more or less performance feedback-dependent (Fig. 3C,D; Table 2), and classification also failed when a whole-brain mask was used (Fig. 3E). To more directly compare differences in classification of subjects who were more and less reward- and feedback-dependent (Fig. 3C,D), we fitted the data with a factorial generalized linear model, testing the influence of group assignment (rewarded/unrewarded) and region (all regions as reported above) on a binary measure of classification applied to each of the classified subjects (1: classified successfully; 0: classified unsuccessfully). This analysis revealed a significant region × group interaction (Wald χ2 = 15.25, p < 0.035), indicating that the difference in classification between the groups varied across prefrontal cortical regions (Fig. 3C,D).
To gain additional insight into the topographical layout of the regions that successfully classified subjects who were more or less reward dependent, we generated discrimination maps (Mourão-Miranda et al., 2012; Qiu et al., 2014) for left (Fig. 4A) and right (Fig. 4B) superior frontal gyri. These maps display the spatial distribution of the weight vectors used in the classification of reward dependence, or in other words the weight of each voxel in discriminating between subjects who were more or less reward dependent. The maps reveal (Fig. 4A,B) that a spatially distributed pattern of voxels, extending from the most anterior to the most posterior parts of the superior frontal gyri contributed to the classification of reward dependence.
Overall, the results of the multivariate pattern classification analysis confirm the differential anatomical specificity of the superior frontal parts of the lateral prefrontal cortex for reward- but not performance feedback-dependence.
Discussion
Notwithstanding the increasing interest in the influence of reward on human behavior, one question that has remained unresolved is why some individuals respond more consistently than others to the rewarding outcomes of their actions. In the current study, we aimed to address this puzzling variability by identifying stable anatomical substrates that may account for it. The results revealed that gray-matter volume in specific parts of the lateral prefrontal cortex predicted the degree of reward dependence that human subjects display during the acquisition of a novel skill. This predictive relationship was specific to reward dependence, in other words to the quantitative trial-by-trial relationship between reward and subsequent performance. No such relationship was found between brain structure and performance feedback, or the magnitude of overall reward-related on-line learning gains. Further, the predictive relationship between brain structure and reward dependence was specific to the most dorsal parts of the lateral prefrontal cortex.
Thus, we identified structural brain features strongly linked to the degree of reward dependence displayed by subjects during skill acquisition. Consistent with our main finding, neuroimaging studies in humans have implicated the superior frontal parts of the lateral prefrontal cortex in reward-based decision-making and learning (Delgado et al., 2000; Pochon et al., 2002; Rogers et al., 2004), particularly in the context of reward anticipation (Ernst et al., 2004; Bjork and Hommer, 2007) and outcome processing (Liu et al., 2011), likely a fundamental part of reward dependence. Interestingly, task-independent electroencephalogram oscillations in the α range, localized in left superior frontal gyrus, relate to subjects' propensity to respond to reward cues (Pizzagalli et al., 2005), in line with the current results. Studies using transcranial magnetic stimulation (TMS) have additionally documented a causal role for the lateral prefrontal cortex in reward processing functions (Uher et al., 2005; Rose et al., 2011), although exact localization of stimulation effects remains challenging (Dayan et al., 2013). For example, repetitive TMS to superior frontal gyrus reduces craving for cigarettes in smokers (Rose et al., 2011), an effect possibly due to modulation of reward reactivity. More generally, lateral prefrontal cortex integrates sensory and affective information (Averbeck and Seo, 2008), transmitting reward-related content to subcortical dopaminergic systems, thereby initiating motivated behavior (Ballard et al., 2011). In humans, motivation to obtain reward involves indirect functional input from the lateral prefrontal cortex, albeit from medial rather than superior frontal gyrus, to the nucleus accumbens (Ballard et al., 2011), a structure whose activity relates to the relative efficacy of reward (Clithero et al., 2011).
As the superior frontal gyrus shows intricate patterns of anatomical and functional connectivity with large-scale neuronal systems (Li et al., 2013), our results likely reflect more distributed substrates that contribute to the susceptibility to respond to reward cues during behavioral performance.
The present findings were anatomically specific. Multivoxel pattern analysis (Norman et al., 2006) was used to identify more subtle morphological differences between subjects that may be masked by a mass-univariate approach such as VBM (Ecker et al., 2010). We found that spatially distributed patterns of voxels extending from the most anterior to the most posterior parts of the superior frontal gyri successfully discriminated subjects who were more reward dependent from those who were less reward dependent. Other control prefrontal regions, including the neighboring middle frontal, middle orbital-frontal, and medial orbital-frontal gyri, all implicated in reward processing (Liu et al., 2011), did not show such a relationship. Thus, reward dependence related only to structural features of the superior frontal part of the lateral prefrontal cortex.
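The logic of discriminating subjects from distributed voxel patterns can be sketched as follows. This is an illustration under stated assumptions, not the pipeline used in this study: a nearest-centroid rule is swapped in as a deliberately simple stand-in for whatever pattern classifier is used in practice, accuracy is estimated with leave-one-out cross-validation, and the function names, labels, and toy gray-matter values are all hypothetical.

```python
def nearest_centroid_predict(train_X, train_y, x):
    """Assign x to the class whose mean training pattern is closest (squared Euclidean)."""
    centroids = {}
    for label in set(train_y):
        rows = [row for row, y in zip(train_X, train_y) if y == label]
        # Column-wise mean across the subjects of this class.
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    return min(centroids, key=lambda label: sq_dist(centroids[label], x))


def leave_one_out_accuracy(X, y):
    """Hold out each subject once, train on the rest, and score the held-out prediction."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if nearest_centroid_predict(train_X, train_y, X[i]) == y[i]:
            correct += 1
    return correct / len(X)


# Toy data: one row per subject, one column per voxel (made-up gray-matter values).
X = [
    [1.0, 0.9, 1.1],
    [1.1, 1.0, 0.9],
    [0.9, 1.1, 1.0],
    [2.0, 2.1, 1.9],
    [2.1, 1.9, 2.0],
    [1.9, 2.0, 2.1],
]
# Hypothetical group labels, e.g. from a split on a reward dependence measure.
y = ["more", "more", "more", "less", "less", "less"]
accuracy = leave_one_out_accuracy(X, y)
```

Running the same procedure separately on voxels from each candidate region, and comparing the resulting accuracies against chance, is one way a claim of anatomical specificity could be examined.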
The data additionally demonstrate the behavioral specificity of the link between lateral prefrontal cortex and reward dependence during skill acquisition. First, the effect of performance feedback, which carried no differential incentive value and was identical in both the rewarded and unrewarded groups, was not predicted by any morphometric measure. Such specificity is consistent with previous work that highlighted differences in the behavioral properties and neural substrates of performance feedback and reward (Swinnen, 1996; Lutz et al., 2012). Administration of performance feedback guides behavioral improvements during skill acquisition that deteriorate upon its removal (Swinnen, 1996; Ronsse et al., 2011). Learning with performance feedback is associated with increased activation in visual and sensorimotor brain regions (Ronsse et al., 2011). Thus, skill acquisition with performance feedback relies on different neural substrates than those identified here, consistent with a selective role for the lateral prefrontal cortex in reward dependence.
The results also demonstrate that the trial-by-trial reward dependence displayed during behavioral performance was dissociable from on-line learning gains, expressed as the difference in average performance between the first and last blocks of practice. Although on-line learning gains in the rewarded group were significant, they were comparable to those displayed by the unrewarded group, as reported before for explicit visuomotor procedural learning (Abe et al., 2011), but not for implicit sequence learning, in which reward has been shown to induce stronger on-line gains relative to training with no reward (Wächter et al., 2009). Importantly, in the current results, learning gains and reward dependence were only weakly correlated across subjects, implying that the two measures are relatively independent. Consistently, gray-matter features of the prefrontal cortex predicted reward dependence but not on-line learning gains, suggesting dissociable mechanisms contributing to reward dependence and overall reward-related learning gains.
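The comparison between on-line learning gains and reward dependence can be illustrated with a brief sketch. It assumes that higher performance scores are better, so that the gain is the last-block mean minus the first-block mean, and that each subject already has a single reward dependence value. The function names and all numbers are hypothetical, and the Pearson coefficient is used here only as one plausible way to express a correlation across subjects.

```python
from math import sqrt
from statistics import mean


def online_gain(block_means):
    """Last-block mean performance minus first-block mean performance."""
    return block_means[-1] - block_means[0]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a sequence has zero variance")
    return sxy / sqrt(sxx * syy)


# Toy data: per-subject block means across practice (made up for illustration).
block_means_by_subject = [
    [0.40, 0.55, 0.70],
    [0.50, 0.60, 0.65],
    [0.45, 0.50, 0.80],
    [0.60, 0.62, 0.75],
]
# Hypothetical per-subject reward dependence values.
reward_dependence = [0.10, 0.42, 0.05, 0.30]

gains = [online_gain(blocks) for blocks in block_means_by_subject]
r = pearson_r(gains, reward_dependence)
```

A coefficient near zero across subjects would be in keeping with the two measures capturing relatively independent aspects of behavior, although a weak correlation in a small sample does not by itself establish independence.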
The VBM analysis showed an inverse relation between reward dependence and brain structure in which individuals with less gray-matter volume in the lateral prefrontal cortex were more reward dependent. Such findings are consistent with the proposal that smaller regional gray-matter volume may be associated with increased computational efficiency and improved behavior in other functions (Kanai and Rees, 2011). An alternative possibility is that individuals with reduced gray-matter volume in the lateral prefrontal cortex compensate for this reduction by responding more consistently to reward cues, a framework that has been proposed in other contexts, for instance in healthy aging (Cabeza and Dennis, 2012). Still, our findings also reveal that more complex distributed multivoxel patterns of gray-matter volume within lateral prefrontal cortex (Fig. 4) differentiated subjects who were more or less reward dependent. Thus, the actual relationship between brain structure and reward dependence may not be confined to a simple inverse relation and likely reflects more diverse structure–function links.
Previous work demonstrated the influence of behavioral states on reward processing and valuation (Pompilio et al., 2006; McNamara et al., 2012; Levy et al., 2013). For example, metabolic states systematically influence human risk attitudes during financial decision making (Symmonds et al., 2010). The relationship between reward processing and more stable, pre-existing, state-independent factors is less well understood. Our findings demonstrate that such factors may be influential, being predictive of the effects of reward on behavioral performance. Although experience-dependent changes in gray-matter volume in humans are routinely reported (Dayan and Cohen, 2011), our use of a novel skill-acquisition task ensured that volumetric differences between subjects were not induced by previous exposure to the task. In the current study, our focus was on the trial-by-trial influence of reward on subsequent performance, which could relate to the reward dependence trait (Cloninger, 1987; Cloninger et al., 1993). Future research could delineate how brain structure interacts with other state-independent factors, such as trait characteristics (Cloninger, 1987) or genetics (Peper et al., 2007; Frank et al., 2009), in mediating the reward dependence displayed during behavioral performance and other aspects of reward processing and learning.
In summary, the results show that variability in the structure of the lateral prefrontal cortex predicts the degree of reward dependence displayed during skill acquisition. Beyond their mechanistic implications, these results may open an avenue toward developing predictive biomarkers of individuals' responses to reward in educational (Dweck, 1986) and neurorehabilitative (Krakauer, 2006) settings.
Footnotes
This work was supported by the Intramural Research Program of the National Institute of Neurological Disorders and Stroke, National Institutes of Health. This study used the high-performance computational capabilities of the Biowulf Linux cluster at the National Institutes of Health, Bethesda, MD (http://biowulf.nih.gov). We thank George Dold, Gary Melvin, and Ksenia Zherdeva for technical help and Nitzan Censor, Javier Elkin, and Micah Allen for helpful advice and suggestions.
The authors declare no competing financial interests.
References
- Abe M, Schambra H, Wassermann EM, Luckenbaugh D, Schweighofer N, Cohen LG. Reward improves long-term retention of a motor memory through induction of offline memory gains. Curr Biol. 2011;21:557–562. doi: 10.1016/j.cub.2011.02.030.
- Ashburner J, Friston KJ. Voxel-based morphometry: the methods. Neuroimage. 2000;11:805–821. doi: 10.1006/nimg.2000.0582.
- Averbeck BB, Seo M. The statistical neuroanatomy of frontal networks in the macaque. PLoS Comput Biol. 2008;4:e1000050. doi: 10.1371/journal.pcbi.1000050.
- Ballard IC, Murty VP, Carter RM, MacInnes JJ, Huettel SA, Adcock RA. Dorsolateral prefrontal cortex drives mesolimbic dopaminergic regions to initiate motivated behavior. J Neurosci. 2011;31:10340–10346. doi: 10.1523/JNEUROSCI.0895-11.2011.
- Berridge KC, Robinson TE. Parsing reward. Trends Neurosci. 2003;26:507–513. doi: 10.1016/S0166-2236(03)00233-9.
- Berridge KC, Robinson TE, Aldridge JW. Dissecting components of reward: “liking,” “wanting,” and learning. Curr Opin Pharmacol. 2009;9:65–73. doi: 10.1016/j.coph.2008.12.014.
- Bjork JM, Hommer DW. Anticipating instrumentally obtained and passively-received rewards: a factorial fMRI investigation. Behav Brain Res. 2007;177:165–170. doi: 10.1016/j.bbr.2006.10.034.
- Boser BE, Guyon IM, Vapnik VN. A training algorithm for optimal margin classifiers. Proceedings of the fifth annual workshop on computational learning theory; Pittsburgh, PA: ACM; 1992. pp. 144–152.
- Buonomano DV, Maass W. State-dependent computations: spatiotemporal processing in cortical networks. Nat Rev Neurosci. 2009;10:113–125. doi: 10.1038/nrn2558.
- Cabeza R, Dennis NA. Frontal lobes and aging: deterioration and compensation. In: Stuss DT, Knight RT, editors. Principles of frontal lobe function. New York: Oxford UP; 2012.
- Clithero JA, Reeck C, Carter RM, Smith DV, Huettel SA. Nucleus accumbens mediates relative motivation for rewards in the absence of choice. Front Hum Neurosci. 2011;5:87. doi: 10.3389/fnhum.2011.00087.
- Cloninger CR. Neurogenetic adaptive mechanisms in alcoholism. Science. 1987;236:410–416. doi: 10.1126/science.2882604.
- Cloninger CR, Svrakic DM, Przybeck TR. A psychobiological model of temperament and character. Arch Gen Psychiatry. 1993;50:975–990. doi: 10.1001/archpsyc.1993.01820240059008.
- Cohen MX. Individual differences and the neural representations of reward expectation and reward prediction error. Soc Cogn Affect Neurosci. 2007;2:20–30. doi: 10.1093/scan/nsl021.
- Davatzikos C. Why voxel-based morphometric analysis should be used with great caution when characterizing group differences. Neuroimage. 2004;23:17–20. doi: 10.1016/j.neuroimage.2004.05.010.
- Daw ND. Trial-by-trial data analysis using computational models. In: Phelps EA, Robbins TW, Delgado M, editors. Affect, learning and decision making, attention and performance XXIII. Oxford: Oxford UP; 2011.
- Daw ND, Niv Y, Dayan P. Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control. Nat Neurosci. 2005;8:1704–1711. doi: 10.1038/nn1560.
- Dayan E, Cohen LG. Neuroplasticity subserving motor skill learning. Neuron. 2011;72:443–454. doi: 10.1016/j.neuron.2011.10.008.
- Dayan E, Censor N, Buch ER, Sandrini M, Cohen LG. Noninvasive brain stimulation: from physiology to network dynamics and back. Nat Neurosci. 2013;16:838–844. doi: 10.1038/nn.3422.
- Dayan E, Averbeck BB, Richmond BJ, Cohen LG. Stochastic reinforcement benefits skill acquisition. Learn Mem. 2014;21:140–142. doi: 10.1101/lm.032417.113.
- Delgado MR, Nystrom LE, Fissell C, Noll DC, Fiez JA. Tracking the hemodynamic responses to reward and punishment in the striatum. J Neurophysiol. 2000;84:3072–3077. doi: 10.1152/jn.2000.84.6.3072.
- Dickinson A, Balleine B. Motivational control of goal-directed action. Animal Learn Behav. 1994;22:1–18. doi: 10.3758/BF03199951.
- Dweck CS. Motivational processes affecting learning. Am Psychologist. 1986;41:1040–1048. doi: 10.1037/0003-066X.41.10.1040.
- Ecker C, Rocha-Rego V, Johnston P, Mourao-Miranda J, Marquand A, Daly EM, Brammer MJ, Murphy C, Murphy DG. Investigating the predictive value of whole-brain structural MR scans in autism: a pattern classification approach. Neuroimage. 2010;49:44–56. doi: 10.1016/j.neuroimage.2009.08.024.
- Ernst M, Nelson EE, McClure EB, Monk CS, Munson S, Eshel N, Zarahn E, Leibenluft E, Zametkin A, Towbin K, Blair J, Charney D, Pine DS. Choice selection and reward anticipation: an fMRI study. Neuropsychologia. 2004;42:1585–1597. doi: 10.1016/j.neuropsychologia.2004.05.011.
- Frank MJ, Doll BB, Oas-Terpstra J, Moreno F. Prefrontal and striatal dopaminergic genes predict individual differences in exploration and exploitation. Nat Neurosci. 2009;12:1062–1068. doi: 10.1038/nn.2342.
- Han DH, Lee YS, Yang KC, Kim EY, Lyoo IK, Renshaw PF. Dopamine genes and reward dependence in adolescents with excessive internet video game play. J Addict Med. 2007;1:133–138. doi: 10.1097/ADM.0b013e31811f465f.
- Hayasaka S, Phan KL, Liberzon I, Worsley KJ, Nichols TE. Nonstationary cluster-size inference with random field and permutation methods. Neuroimage. 2004;22:676–687. doi: 10.1016/j.neuroimage.2004.01.041.
- Kanai R, Rees G. The structural basis of inter-individual differences in human behaviour and cognition. Nat Rev Neurosci. 2011;12:231–242. doi: 10.1038/nrn3000.
- Kanai R, Dong MY, Bahrami B, Rees G. Distractibility in daily life is reflected in the structure and function of human parietal cortex. J Neurosci. 2011a;31:6620–6626. doi: 10.1523/JNEUROSCI.5864-10.2011.
- Kanai R, Feilden T, Firth C, Rees G. Political orientations are correlated with brain structure in young adults. Curr Biol. 2011b;21:677–680. doi: 10.1016/j.cub.2011.03.017.
- Kanai R, Bahrami B, Duchaine B, Janik A, Banissy MJ, Rees G. Brain structure links loneliness to social perception. Curr Biol. 2012;22:1975–1979. doi: 10.1016/j.cub.2012.08.045.
- Kedem B, Fokianos K. Regression models for time series analysis. New York: Wiley; 2002.
- Krakauer JW. Motor learning: its relevance to stroke recovery and neurorehabilitation. Curr Opin Neurol. 2006;19:84–90. doi: 10.1097/01.wco.0000200544.29915.cc.
- Levy DJ, Thavikulwat AC, Glimcher PW. State dependent valuation: the effect of deprivation on risk preferences. PLoS One. 2013;8:e53978. doi: 10.1371/journal.pone.0053978.
- Li W, Qin W, Liu H, Fan L, Wang J, Jiang T, Yu C. Subregions of the human superior frontal gyrus and their connections. Neuroimage. 2013;78:46–58. doi: 10.1016/j.neuroimage.2013.04.011.
- Liu X, Hairston J, Schrier M, Fan J. Common and distinct networks underlying reward valence and processing stages: a meta-analysis of functional neuroimaging studies. Neurosci Biobehav Rev. 2011;35:1219–1236. doi: 10.1016/j.neubiorev.2010.12.012.
- Lutz K, Pedroni A, Nadig K, Luechinger R, Jäncke L. The rewarding value of good motor performance in the context of monetary incentives. Neuropsychologia. 2012;50:1739–1747. doi: 10.1016/j.neuropsychologia.2012.03.030.
- Manley H, Dayan P, Diedrichsen J. When money is not enough: awareness, success, and variability in motor learning. PLoS One. 2014;9:e86580. doi: 10.1371/journal.pone.0086580.
- McNamara JM, Trimmer PC, Houston AI. The ecological rationality of state-dependent valuation. Psychol Rev. 2012;119:114–119. doi: 10.1037/a0025958.
- Mourão-Miranda J, Oliveira L, Ladouceur CD, Marquand A, Brammer M, Birmaher B, Axelson D, Phillips ML. Pattern recognition and functional neuroimaging help to discriminate healthy adolescents at risk for mood disorders from low risk adolescents. PLoS One. 2012;7:e29482. doi: 10.1371/journal.pone.0029482.
- Niv Y, Joel D, Dayan P. A normative perspective on motivation. Trends Cogn Sci. 2006;10:375–381. doi: 10.1016/j.tics.2006.06.010.
- Norman KA, Polyn SM, Detre GJ, Haxby JV. Beyond mind-reading: multi-voxel pattern analysis of fMRI data. Trends Cogn Sci. 2006;10:424–430. doi: 10.1016/j.tics.2006.07.005.
- Peper JS, Brouwer RM, Boomsma DI, Kahn RS, Hulshoff Pol HE. Genetic influences on human brain structure: a review of brain imaging studies in twins. Hum Brain Mapp. 2007;28:464–473. doi: 10.1002/hbm.20398.
- Pizzagalli DA, Sherwood RJ, Henriques JB, Davidson RJ. Frontal brain asymmetry and reward responsiveness: a source-localization study. Psychol Sci. 2005;16:805–813. doi: 10.1111/j.1467-9280.2005.01618.x.
- Pochon JB, Levy R, Fossati P, Lehericy S, Poline JB, Pillon B, Le Bihan D, Dubois B. The neural system that bridges reward and cognition in humans: an fMRI study. Proc Natl Acad Sci U S A. 2002;99:5669–5674. doi: 10.1073/pnas.082111099.
- Pompilio L, Kacelnik A, Behmer ST. State-dependent learned valuation drives choice in an invertebrate. Science. 2006;311:1613–1615. doi: 10.1126/science.1123924.
- Qiu L, Huang X, Zhang J, Wang Y, Kuang W, Li J, Wang X, Wang L, Yang X, Lui S, Mechelli A, Gong Q. Characterization of major depressive disorder using a multiparametric classification approach based on high resolution structural images. J Psychiatry Neurosci. 2014;39:78–86. doi: 10.1503/jpn.130034.
- Reis J, Schambra HM, Cohen LG, Buch ER, Fritsch B, Zarahn E, Celnik PA, Krakauer JW. Noninvasive cortical stimulation enhances motor skill acquisition over multiple days through an effect on consolidation. Proc Natl Acad Sci U S A. 2009;106:1590–1595. doi: 10.1073/pnas.0805413106.
- Rogers RD, Ramnani N, Mackay C, Wilson JL, Jezzard P, Carter CS, Smith SM. Distinct portions of anterior cingulate cortex and medial prefrontal cortex are activated by reward processing in separable phases of decision-making cognition. Biol Psychiatry. 2004;55:594–602. doi: 10.1016/j.biopsych.2003.11.012.
- Ronsse R, Puttemans V, Coxon JP, Goble DJ, Wagemans J, Wenderoth N, Swinnen SP. Motor learning with augmented feedback: modality-dependent behavioral and neural consequences. Cereb Cortex. 2011;21:1283–1294. doi: 10.1093/cercor/bhq209.
- Rose JE, McClernon FJ, Froeliger B, Behm FM, Preud'homme X, Krystal AD. Repetitive transcranial magnetic stimulation of the superior frontal gyrus modulates craving for cigarettes. Biol Psychiatry. 2011;70:794–799. doi: 10.1016/j.biopsych.2011.05.031.
- Santesso DL, Dillon DG, Birk JL, Holmes AJ, Goetz E, Bogdan R, Pizzagalli DA. Individual differences in reinforcement learning: behavioral, electrophysiological, and neuroimaging correlates. Neuroimage. 2008;42:807–816. doi: 10.1016/j.neuroimage.2008.05.032.
- Schambra HM, Abe M, Luckenbaugh DA, Reis J, Krakauer JW, Cohen LG. Probing for hemispheric specialization for motor skill learning: a transcranial direct current stimulation study. J Neurophysiol. 2011;106:652–661. doi: 10.1152/jn.00210.2011.
- Schönberg T, Daw ND, Joel D, O'Doherty JP. Reinforcement learning signals in the human striatum distinguish learners from nonlearners during reward-based decision making. J Neurosci. 2007;27:12860–12867. doi: 10.1523/JNEUROSCI.2496-07.2007.
- Schönberg T, Bakkour A, Hover AM, Mumford JA, Nagar L, Perez J, Poldrack RA. Changing value through cued approach: an automatic mechanism of behavior change. Nat Neurosci. 2014;17:625–630. doi: 10.1038/nn.3673.
- Swinnen SP. Information feedback for motor skill learning: a review. In: Zelaznik HN, editor. Advances in motor learning and control. Champaign, IL: Human Kinetics; 1996. pp. 37–66.
- Symmonds M, Emmanuel JJ, Drew ME, Batterham RL, Dolan RJ. Metabolic state alters economic decision making under risk in humans. PLoS One. 2010;5:e11090. doi: 10.1371/journal.pone.0011090.
- Tzourio-Mazoyer N, Landeau B, Papathanassiou D, Crivello F, Etard O, Delcroix N, Mazoyer B, Joliot M. Automated anatomical labeling of activations in SPM using a macroscopic anatomical parcellation of the MNI MRI single-subject brain. Neuroimage. 2002;15:273–289. doi: 10.1006/nimg.2001.0978.
- Uher R, Yoganathan D, Mogg A, Eranti SV, Treasure J, Campbell IC, McLoughlin DM, Schmidt U. Effect of left prefrontal repetitive transcranial magnetic stimulation on food craving. Biol Psychiatry. 2005;58:840–842. doi: 10.1016/j.biopsych.2005.05.043.
- Wächter T, Lungu OV, Liu T, Willingham DT, Ashe J. Differential effect of reward and punishment on procedural learning. J Neurosci. 2009;29:436–443. doi: 10.1523/JNEUROSCI.4132-08.2009.
- Worsley KJ, Andermann M, Koulis T, MacDonald D, Evans AC. Detecting changes in nonisotropic images. Hum Brain Mapp. 1999;8:98–101. doi: 10.1002/(SICI)1097-0193(1999)8:2/3<98::AID-HBM5>3.0.CO;2-F.