Abstract
Discrete jumps in knowledge, as exemplified by single-trial learning, are critical to survival. Despite its importance, however, one-trial learning remains understudied. We sought to better understand the brain activity adaptations that track punctuated changes in associative knowledge by studying visual-motor associative learning with functional magnetic resonance imaging. Human and primate neurophysiological studies of feedback-based learning indicate that performance feedback elicits high activity at first that diminishes rapidly with repeated success. Based on these findings, we hypothesized that a network of brain regions would track the importance of feedback, which is large early in learning and diminishes thereafter. Specifically, based on neurophysiological findings, we predicted that frontal and striatal regions would show a large activation to first trial feedback and a subsequent reduction selective to performance feedback but not stimulus cue presentation. We observed that the striatum and frontal cortex as well as several other cortical and subcortical sites exhibited this pattern. These findings match our prediction for activity in frontal and striatal regions. Furthermore, these observations support the more general hypothesis that a large network of regions participates in the associative process once the behavioral goal is definitively identified by first trial performance feedback. Activity in this network declines upon further rehearsal but only for feedback presentation. We suggest that, based on the timing of this process, these regions participate in binding together stimulus cue, motor response, and performance feedback information into an association that is used to accurately perform the task after the first trial.
Keywords: conditional motor learning, punishment, reinforcement, reward, operant, conditioning, instrumental, fMRI, association
1. Introduction
Associative learning plays a key role in survival by providing a mechanism for behavioral adaptation. Reinforcement learning mechanisms allow organisms to learn through experience how to respond to environmental contexts that promote or threaten survival. When an organism’s response is highly consequential to survival, learning may occur in one trial. Despite its importance, however, the neural correlates of one-trial learning remain understudied.
Identifying single-trial learning effects on the brain depends upon the precise experimental control of learning. Of the two basic approaches to the study of instrumental associative learning, deterministic and probabilistic, deterministic paradigms are better suited to identify single-trial learning effects. Deterministic paradigms use fixed feedback rules such that positive feedback is always given for the correct response and negative feedback is always given for the incorrect response. Probabilistic paradigms use stochastic feedback rules that can randomly vary from trial to trial according to a predetermined proportion but that on average indicate that one response is better than another. Although probabilistic paradigms contribute to our understanding of striatal, prefrontal and medial temporal lobe involvement in associative learning (Aron et al., 2004; Cools et al., 2002; Poldrack et al., 2001), probabilistic approaches cannot address one-trial learning because associative knowledge accumulates gradually over many trials. Complex deterministic category learning paradigms also yield gradual learning effects (Boettiger and D’Esposito 2005), but simple deterministic tasks are well suited to examine discrete changes in brain activation. This is because associations can be conclusively learned in a few trials and remain fixed thereafter. In spite of this, most deterministic paradigms, our own included, provide less than the ideal control of the learning process that is needed to examine one-trial learning (Bedard and Sanes 2009; Brovelli et al., 2008; Eliassen et al., 2003). In particular, successful learning can be defined by the occurrence of the first correct response to a stimulus cue. This first correct trial, when followed by only correct responses, indicates a definitive knowledge of the cue-response-outcome association. 
Deterministic paradigms often use multiple response alternatives or rules, however, which allows learning to occur on error trials preceding the first correct response through a process of elimination. These preliminary errors provide information to guide future behavior by identifying the remaining possible response alternatives. Often, the correct response can be learned before being selected, as in the Wisconsin Card-Sorting Test and similar paradigms (Barcelo et al., 2000; Barcelo et al., 2002; Monchi et al., 2001; Zanolie et al., 2008). This allows definitive associative knowledge to be obtained before the first correct trial. In that case, analyses that aggregate across preliminary errors or that examine only the first correct response provide an incomplete description of the changes associated with single-trial learning.
Instrumental associations can only be learned on the basis of feedback. As associations become learned by trial and error, feedback begins to convey less critical information about how to conduct future behavior. Feedback becomes completely predictable, especially if the associations are uncomplicated and deterministic. Electrophysiological correlates of this change in feedback predictability have been observed in both primate and human studies. These changes provide the basis for our hypothesis that brain activation in single-trial learning will track the importance, or utility, of feedback to future task performance. In human event-related potential (ERP) EEG studies, researchers have identified activity changes associated with unexpected feedback following a rule switch. Specifically, Barcelo and colleagues observe an increase in the frontally located P300A waveform in response to feedback presentation after a rule switch compared to before (Barcelo et al., 2002). This activity diminishes quickly with repeated successes following a switch. Primate single-unit studies of the reward system show that activity in dopaminergic ventral tegmental area neurons also tracks changes in feedback utility. The so-called reward response of this system initially reacts to the presentation of unpredicted positive performance feedback. As the positive feedback becomes predictable with learning, this reaction diminishes (Schultz et al., 1995; Schultz et al., 1997). The frontal targets of the reward system also exhibit activity correlated with feedback utility in a probabilistic paradigm (Aron et al., 2004). However, deterministic human learning studies to date provide only indirect evidence of a single-trial feedback-specific change in striatal activity with learning. Brovelli et al. (2008) show that the striatum is more active early in learning.
The relationship of this activity to trial-by-trial changes in knowledge remains unclear, though, because they use a paradigm in which learning is defined across several trials through a modeling procedure that defines a prediction error signal. Using a trial-based analysis strategy, Bedard and Sanes (2009) observe that Parkinson’s Disease patients differ from controls more in early learning than in later learning, indirectly suggesting a more prominent striatal role early in learning in healthy individuals. Altogether, this research suggests that performance feedback activation should decrease following single-trial learning in striatum and prefrontal cortex. Such activity changes would correlate with the declining utility of feedback in guiding future performance.
In the current study we investigated this question using a one-trial visual-motor associative learning paradigm. The paradigm required participants to learn associations between pictures and responses. The task used easily nameable color pictures and a two-alternative choice response. Trials were classified by the pattern of performance, and the analysis included only associations where no errors occurred after the first trial. We predicted that stimulus cue representations and performance feedback representations would be differentially altered by associative learning on the first trial. Specifically, in the transition from learning on the first trial to rehearsal on the second trial there would be reductions in feedback activation in frontal and striatal regions but not similar changes in stimulus activation.
2. RESULTS
2.1 Behavior
The analysis of reaction times indicated the presence of learning-related reductions. There was a significant effect of experience (F(5,95)=11.46, P<0.001), whereby reaction time differed significantly across the six categories from 1st Trials through Control trials. Post-hoc Tukey comparisons indicated significant pair-wise differences in reaction time between each learning category and control (all P<0.005, denoted by the $ in Fig. 2). In addition, there were significant reductions in reaction time between 1st Trials and Correct 4th Trials (P<0.01, denoted by the * in Fig. 2) as well as significant differences between 1st and 5-8th (P<0.001), 2nd and 5-8th (P<0.05), and 3rd and 5-8th (P<0.05), which are denoted by the # in Fig. 2. There was no difference between any consecutive pair of learning categories. In summary, reaction time progressively decreased with successive presentations of each picture (Fig. 2), and reaction time differed significantly between the learning and control tasks.
Figure 2.

Reaction time as a function of experience (1st Trials, Correct 2nd, etc). Learning is evident as RT decreases from early to later learning trials. The asterisk (*) indicates RTs significantly different from Correct 4th Trials. The pound symbol (#) indicates RTs significantly different from Correct 5-8th Trials. The dollar sign ($) indicates RTs significantly different from Control Trials.
We collected error information from each participant. We did not report standard accuracy measures because we selected only correct trials for the analysis of learning; thus, there was no error rate over time to use as a proxy for learning (cf. Eliassen et al., 2003). Errors after 1st Trials indicated a failure to learn, and series with such errors before the fifth trial were excluded from the 1st to Correct 4th Trial analyses. After 1st Trials, there was an average of 5±4 errors during the learning task (mean±SD; range 0-17) and 1±1 errors during the control task (range 0-6). We had a fairly even number of learning trials across subjects. There were 21±2 trials in each of the categories 1st Trials, Correct 2nd Trials, Correct 3rd Trials, and Correct 4th Trials (range 14-24). All but one subject had 19 or more 1st through Correct 4th Trials. The numbers are identical for each category because we excluded from analysis any series of trials with an error after the first and before the fifth trial (see Methods). We also had similar numbers of late learning and control trials across subjects. There were 91±5 Correct 5-8th Trials (range 80-96) and 95±1 Control trials (range 91-96). On the whole, the numbers of trials were consistent from subject to subject.
2.2 Brain Imaging
Learning led to changes in brain activation across a wide set of regions. We observed widespread reductions in activation for both stimulus and performance feedback presentation from 1st Trials to Correct 2nd Trials (Fig. 3A, B). In support of our prediction, we observed reductions in activation from 1st Trials to Correct 2nd Trials that were specific to feedback in frontal, parietal and striatal regions (Fig. 3C). Several regions exhibited a similar pattern indicating that activation in response to feedback presentation on 1st Trials diminished dramatically on Correct 2nd and later trials (Table 2; Fig. 3C and D-I). These feedback-selective regions included medial parietal cortex, bilateral superior and inferior parietal lobule, medial frontal cortex (supplementary motor area), bilateral middle frontal gyrus, left precentral gyrus (premotor area), and bilateral striatum. Several additional areas exhibited this same pattern, although we did not predict such changes. These regions included bilateral thalamus, bilateral mid-insula, bilateral occipital cortex and left middle temporal gyrus. Six of these regions are graphed in Fig. 3 (panels D-I) to illustrate the decrease in activity that is unique to feedback on the 1st Trial vs. Correct 2nd Trial, with the subsequent trials shown for comparison. In contrast, the activity in response to stimulus cue presentation remained elevated, increased slightly, or decreased slightly from 1st Trials to Correct 2nd Trials. Visual inspection of activity in all the subsidiary peaks listed in Table 2 indicates this same pattern (data not shown). Following our secondary deconvolution analysis, we extracted the BOLD time courses from these same six regions. These time courses are plotted in Fig. 4, which shows the more pronounced reduction between the first and second trials for the feedback event compared to the stimulus event.
Thus, the learning effects identified by the fixed-shape canonical hemodynamic response function can be clearly observed in the time-varying plots of BOLD activity from these same regions. The regions shown in Fig. 3A and B indicate a number of sites where activity decreased (blue regions) or increased (orange regions) for stimulus and feedback in the transition from 1st to Correct 2nd Trials. Regions with decreasing activity with learning included bilateral insula/inferior frontal operculum, anterior cingulate cortex and ventral striatum. Regions with increasing activity during learning included the bilateral posterior insula. These regions exhibited non-selective learning changes affecting both stimulus and feedback events.
Figure 3.

Brain regions exhibiting a significant effect of learning. (A) Regions that show a significant change in activation for stimulus cue presentation from 1st to Correct 2nd Trials. (B) Regions that show a significant change in activation for performance feedback presentation from 1st to Correct 2nd Trials. (C) Regions that show a significant interaction between event type (stimulus vs. feedback) and experience (1st Trials vs. Correct 2nd Trials). Orange indicates that the decrease in activation seen in panels A and B is larger for the feedback event than for stimulus presentation. (D-I) Graphs of activity in selected regions of interest (black circles) from Table 2. Graphs depict the selective decrease from 1st to Correct 2nd Trials for feedback presentation but also show for comparison activity in each region for Correct 3rd, 4th and 5th-8th Trials and Control Trials. Activation magnitude is displayed in arbitrary MR units (color bar). For the main effects in A and B, orange regions show an increase in activation from 1st to Correct 2nd Trials and blue regions show a decrease. For the interaction effect in C, orange indicates a greater reduction in activation from 1st Trials to Correct 2nd Trials for feedback than for stimulus.
Table 2.
Regions with unique learning effects for stimulus and feedback
| Cluster # | Cluster Size (# Voxels) | Center of Mass (Local Peak Activation) X, Y, Z | Region | Fig. 3 Panel |
|---|---|---|---|---|
| 1 | 2214 | 7, 60, 42 | | |
| 1.1 | | (5, 74, 39) | L precuneus | |
| 1.2 | | (40, 59, 51) | L SPL, BA 7 | {H} |
| 1.3 | | (43, 31, 44) | L IPL, BA 40 | |
| 1.4 | | (-34, 52, 38) | R IPL, BA 40 | |
| 1.5 | | (-25, 74, 44) | R SPL, BA 7 | |
| 2 | 1326 | -28, -13, 42 | | |
| 2.1 | | (-32, 2, 63) | R MFG, BA 6 | |
| 2.2 | | (-29, -20, 54) | R MFG, BA 6 | {I} |
| 2.3 | | (-32, -41, 38) | R MFG, BA 8 | |
| 2.4 | | (-49, -5, 44) | R MFG, BA 6 | |
| 2.5 | | (1, 1, 51) | medial SFG, BA 6 | |
| 2.6 | | (-38, -8, 5) | R insula, BA 13 | |
| 2.7 | | (-22, -8, 14) | R putamen | {G} |
| 3 | 961 | 28, 49, -20 | | |
| 3.1 | | (-5, 32, -28) | pons | |
| 3.2 | | (19, 58, -22) | L cerebellum (VI) | {D} |
| 3.3 | | (10, 70, -28) | L cerebellum (Crus 2) | |
| 3.4 | | (28, 59, -31) | L cerebellum (Crus 1) | |
| 3.5 | | (13, 44, -46) | L cerebellum (IX) | |
| 3.6 | | (44, 55, -21) | L FFG, BA 37 | |
| 3.7 | | (59, 52, 8) | L MTG, BA 22 | |
| 3.8 | | (50, 37, 2) | L MTG, BA 22 | |
| 4 | 337 | 33, -31, 36 | | |
| 4.1 | | (35, -38, 39) | L MFG, BA 8 | |
| 5 | 335 | 31, 6, 57 | | |
| 5.1 | | (35, 11, 63) | L PCG, BA 6 | {F} |
| 5.2 | | (35, -8, 53) | L MFG, BA 6 | |
| 6 | 279 | -1, 78, 3 | | |
| 6.1 | | (2, 95, 3) | L cuneus, BA 18 | |
| 6.2 | | (-11, 65, 20) | R precuneus, BA 31 | |
| 6.3 | | (13, 73, 2) | L lingual gyrus, BA 18 | |
| 6.4 | | (-2, 70, -4) | R lingual gyrus, BA 18 | |
| 7 | 272 | 26, -3, 8 | | |
| 7.1 | | (11, 17, 12) | L thalamus | |
| 7.2 | | (28, 10, 5) | L putamen | |
| 7.3 | | (34, -8, 5) | L insula, BA 13 | |
| 7.4 | | (19, -20, 11) | L caudate | |
| 8 | 235 | -21, 64, -34 | | |
| 8.1 | | (-8, 74, -28) | R cerebellum (Crus 2) | |
| 8.2 | | (-34, 58, -30) | R cerebellum (Crus 1) | |
| 8.3 | | (-14, 70, -49) | R cerebellum (VII/VIII) | |
| 9 | 141 | -17, 11, 8 | | |
| 9.1 | | (-14, 11, 12) | R thalamus | |
| 10 | 87 | 12, 24, 8 | | |
| 10.1 | | (-2, 23, 12) | R thalamus | {E} |
| 10.2 | | (10, 22, 8) | L thalamus | |
BA, Brodmann’s Area; L, Left; R, Right; SPL, superior parietal lobule; IPL, inferior parietal lobule; MFG, middle frontal gyrus; SFG, superior frontal gyrus; FFG, fusiform gyrus; MTG, middle temporal gyrus; PCG, precentral gyrus. Letters in curly braces refer to activity graph panels in Fig. 3.
Figure 4.

BOLD Time Courses from Regions of Interest. Temporal plots of brain activation corroborate the observed effects in Fig. 3, panels D-I, where we showed a greater reduction in brain activation in response to feedback between the 1st Trial and Correct 2nd Trial than for stimulus. (A) Left Cerebellum. (B) Right Thalamus. (C) Left Precentral Gyrus. (D) Right Putamen. (E) Left Superior Parietal Lobule. (F) Right Middle Frontal Gyrus. Plots are of the fit coefficients extracted from the regions of interest in Fig. 3 and Table 2 and are shown with standard error bars. Time in seconds is plotted on the x-axis, and MR signal intensity in raw units is plotted on the y-axis. The key is as follows: 1st Stm=1st Stimulus, Cor. 2nd Stm=Correct 2nd Stimulus, 1st Fbk=1st Feedback, Cor. 2nd Fbk=Correct 2nd Feedback. Stimulus events are plotted in solid lines with filled markers. Feedback events are plotted in dotted lines with open markers. Diamond markers denote 1st Trials, and square markers denote Correct 2nd Trials.
3. Discussion
This study was designed to test the hypothesis that striatal and frontal activity would respond in relation to the importance of feedback during learning. Feedback at the beginning of learning is of greater consequence to planning future behavior than after learning. After an association has been learned on the basis of feedback, feedback becomes predictable. Human and primate studies report greater brain activity in the dopamine system (Schultz et al., 1995; Schultz et al., 1997) and its afferent targets (Aron et al., 2004) early in learning when feedback is uncertain. ERP studies of the P300A also indicate that frontal cortex responds more prominently to uncertain feedback following a rule switch in the WCST (Barcelo et al., 2002). On the basis of these studies, we predicted higher activity in striatum and PFC during learning than after learning. We observed greater activation in response to performance feedback on 1st Trials than on Correct 2nd Trials in a number of regions including striatum and bilateral middle frontal gyrus. This effect was specific to feedback presentation and not stimulus cue presentation. Stimulus cue activation decreased little or not at all after 1st Trials in the regions that exhibited the feedback effect. Because activation is strongest on 1st Trials when feedback identifies the response rule, we interpret this to indicate that striatum and PFC participate in performance evaluation. This evaluative process would include associating, or binding together, information about the stimulus cue, the motor response and the outcome. This process can occur only, and must occur, following feedback presentation on the 1st trial in order for the second trial to be performed correctly. This matches the idea that the brain representation of an associative rule has to be activated, or in this case created, before the behavior it governs (Barcelo et al., 2002; Miller and Cohen 2001).
Also, other similar paradigms suggest the importance of the striatum early in learning (Bedard and Sanes 2009; Brovelli et al., 2008). Moreover, Brovelli and colleagues have shown, using both model-based and trial-by-trial analyses, that ventro- and dorso-lateral PFC and striatum respond selectively to feedback during early learning trials including preliminary errors and the first correct trial (Brovelli et al., 2008). Our new data align with our own and other prior findings that PFC and striatum participate critically in the earliest associative learning trials (Brovelli et al., 2011; Eliassen et al., 2003). This study extends those findings to show that prefrontal and striatal activity is coupled specifically to the feedback outcome event. Altogether, activity in striatum and PFC is consistent with a role in the association process that, at least in this paradigm, must be initiated at the end of the first trial.
Despite the fact that these effects occur on 1st Trials, explanations based on novelty and uncertainty do not fully capture the observed changes. A novelty response would be expected to yield the greatest activity on 1st Trials and less activity on later trials. This was observed for feedback responses, but not for stimulus cue responses, which are arguably more novel than feedback. The feedback symbols were used throughout the experiment in different combinations whereas pictures in each run were unique, especially on the first trial. Activation to stimulus cues did not diminish as much as for feedback in any region despite the greater novelty of pictures. Uncertainty also does not explain the differences. A strict correlation between brain activation and uncertainty predicts that Correct 2nd Trial feedback activity should drop to zero, and not all regions show such a pattern. A less rigid relationship between uncertainty and brain activation would predict a gradual decrease over time, which is not observed universally either in the response to stimulus or feedback. Neither novelty nor uncertainty provides a satisfying explanation of our learning effects, similar to conclusions reached regarding ERP learning effects in the WCST (Barcelo et al., 2000).
In addition to striatum and PFC, we observed that portions of parietal lobe, occipital lobe, temporal lobe, thalamus, and cerebellum responded uniquely to 1st Trial feedback (Table 2). We also observed learning effects that were shared by stimulus and feedback events (Fig. 3A,B). The learning changes common to stimulus and feedback were, for the most part, reductions in activation as we have seen previously (Eliassen et al., 2003). Two regions in particular stand out. The bilateral frontal operculum/inferior frontal gyrus/insula region (Fig. 3A,B slice 9S) has been identified in several related studies. Activity in this region decreases with learning or reduced uncertainty (Cools et al., 2002; Konishi et al., 2002; Monchi et al., 2001; O’Doherty et al., 2001; O’Doherty et al., 2003). The central operculum, the orange regions in Fig. 3A,B, shows an increase with learning for both stimulus and feedback, although more prominently and bilaterally for stimulus presentation. This region has been shown in a human classical conditioning paradigm to increase activation with learning as expectations change (Ramnani et al., 2000). Although our observed changes occurred between the first and second trials, our results remain consistent with existing learning studies. The unique effects for feedback include a substantial number of regions outside PFC and striatum. These additional regions could be participating in the process of association since they were selectively active on the first trial like PFC and striatum. Alternatively, activity in these regions, especially parietal and occipital sites, might derive from the visual properties of stimulus and feedback presentation and the visual-motor transformations required to execute responses. Changing the psychophysical properties in future experiments might help clarify this issue.
For example, using auditory stimulus and feedback presentation and requiring verbal responses would be predicted to alter activation in brain regions that are concerned with stimulus and response modality as opposed to learning. Another possibility is that association processes arising in the striatum and frontal cortex exert top-down influences on these other regions through cognitive control mechanisms. In any case, the unpredicted activity in these other regions during the critical 1st Trial feedback encourages further characterization of the contributions of these other areas to associative learning.
One aspect of this study that may affect our interpretations is that on 1st Trials we included both positive and negative feedback outcomes. We adopted this approach because 1st trial feedback, regardless of valence, identifies the correct response for the second trial. Also, preliminary analyses indicated no significant differences at our thresholds (corrected P<0.01, voxel p<0.001, cluster n≥20). Relaxing the threshold, however, showed some differences between positive and negative feedback that overlapped current observations in thalamus and striatum. The current results were substantially stronger and more widespread than feedback valence differences in our preliminary analyses. Studies have identified positive and negative feedback differences in frontal and striatal regions (Bischoff-Grethe et al., 2009; Liu et al., 2007; van Duijvenvoorde et al., 2008), so it remains a possibility that positive and negative feedback contribute differentially to 1st Trial feedback responses.
Across a network of cortical and subcortical regions we observed a selective reduction in activation in response to feedback during single-trial learning as predicted by animal and human learning studies but not yet demonstrated. These changes occurred in correlation with a discrete jump in associative knowledge at the conclusion of the first trial. This pattern implies that the bilateral middle frontal gyrus and striatum, and possibly other regions, participate in performance evaluation and that activity in this network serves to bind stimulus cue, motor response, and outcome information into an associative response rule that can be accessed during the decision-making process on future trials.
4. METHODS
4.1 Participants
Twenty-one healthy volunteers were recruited from the greater Cincinnati area and received monetary compensation for their participation. All participants gave written informed consent and were screened for MR safety according to standard, established Center for Imaging Research (CIR) protocols and University of Cincinnati Institutional Review Board guidelines. One participant was excluded from analysis due to accuracy of less than 50% in two scanner runs resulting from a miscommunication in explaining the task. Among the remaining 20 subjects (14 women and 6 men), the mean age was 26±3 years. Nineteen participants were right-handed with an average score of 16 on a 10-question modified Edinburgh handedness survey (Eliassen et al., 2001; Oldfield 1971) and one was left-handed with a score of -12.
4.2 Behavioral Paradigm
The behavioral experiment was programmed in E-Prime (www.pstnet.com). All participants wore MR-compatible VGA goggles and headphones equipped with a microphone (Resonance Technologies, Northridge, CA; www.mrivideo.com). The MRI system triggered the start of the E-Prime paradigm in order to ensure precise timing of the task with respect to image acquisition. Participants responded by pressing one of two buttons on a custom-built MR-compatible button box with their right thumb. E-Prime recorded task and behavioral data to a file, including the timing of stimulus and feedback presentations, responses, and outcomes of each trial.
The behavioral paradigm consisted of a control task and a learning task (Fig. 1A). For the learning task, participants viewed easily nameable color pictures, e.g., butterfly, fork (Rossion and Pourtois 2004), and had to learn by trial-and-error to associate each picture with button “1” or “2” on the button box. For the control task participants viewed the digits 1 or 2 and pressed the corresponding button. All trials began with the appearance of a fixation X in the center of the screen. After a variable delay of 0.5 to 3.5 sec a picture or control digit appeared on the screen, and subjects had to make a response. After 2.125 sec the screen went blank for a variable delay of 0.5 to 3.5 sec following which performance feedback was provided for a fixed duration. If a subject failed to respond before the screen went blank then an uninformative “?” was presented as feedback. Just before entering the MRI scanner, each participant was given oral and written instructions and brief training with pictures different from those used during scanning.
Figure 1.

Behavioral paradigm. (A) Time lapse schematic of associative learning and control tasks. At the beginning of a trial a fixation X appears for a variable interval. The fixation screen is followed by the presentation of a stimulus cue (picture or digit). In the associative task participants view an easily nameable color stimulus and must respond while the stimulus remains on screen by pressing one of two buttons. Presentation of the stimulus cue is followed by a blank screen for a variable delay, after which participants receive performance feedback. The actual feedback symbols that are presented depend upon the participant’s performance and which Award Schedule is in effect (Table 1). The two feedback symbols used in this schematic figure are the positive “+” and negative “0” feedback symbols from Award Schedule 3 in Table 1. Display times are below each slide. (B) During a single scanner run, 16 control trials were followed by 32 associative learning trials. Total elapsed time was approximately 6 min 18 sec.
We administered six scanner runs to each participant. During each scanner run participants performed both the control (16 trials) and learning task (32 trials; Fig. 1B). In each run participants had to learn the responses to four pictures. A different set of pictures was used for each run. A given picture was presented eight times in a run. The order of picture presentation was pseudorandom according to Latin Squares. Six scanner runs were administered in order to address two issues. First, we were interested in having as many rule acquisition trials as possible (up to 24) within a reasonable scan time (one hour) in order to maximize statistical power while still allowing for a sufficient number of rehearsal trials (up to 7) for each stimulus cue in order to allow novelty effects to dissipate. Second, we were concerned about learning effects being explained by a fixed outcome value, so we provided a different monetary amount as performance feedback in each run. Monetary awards and penalties were used as performance feedback, although they were not strictly associated with correct and incorrect performance, respectively. The monetary outcome values ranged from -$0.75 to +$0.75 in $0.25 increments. The outcome for a correct response was always $0.25 better than the outcome for an incorrect one. The six different feedback schedules that we used ranged from the best schedule, awarding $0.75 for a correct response or $0.50 for an incorrect response, to the worst, penalizing $0.50 for a correct response or $0.75 for an incorrect response (Table 1). The order of feedback schedules was randomized across subjects according to Latin Squares. We used a combination of +, -, and 0 as feedback symbols in the task. Each plus or minus sign indicated positive or negative $0.25. For example, ++ provided as feedback indicated a +$0.50 monetary award, and --- indicated a penalty of -$0.75.
Participants were informed that monetary outcomes were in effect only during the learning trials and not the control trials. The feedback symbols, however, were the same within a run for correct and incorrect responses regardless of whether it was a learning or control trial. Participants were told that they would be paid $20. They were also told that they could win up to $24 additional depending upon performance. All participants, however, were paid $50 plus parking costs.
Table 1.
Award schedules used to control for outcome value effects.
| Award Schedule | Correct response: Onscreen Symbol | Correct response: Monetary Outcome | Incorrect response: Onscreen Symbol | Incorrect response: Monetary Outcome |
|---|---|---|---|---|
| 1 | +++ | +$0.75 | ++ | +$0.50 |
| 2 | ++ | +$0.50 | + | +$0.25 |
| 3 | + | +$0.25 | 0 | $0 |
| 4 | 0 | $0 | − | −$0.25 |
| 5 | − | −$0.25 | − − | −$0.50 |
| 6 | − − | −$0.50 | − − − | −$0.75 |
Onscreen Symbols appeared on the computer screen at the end of each trial to indicate performance. Monetary Outcome was the amount awarded or penalized for a specific response. No monetary information appeared on screen during the feedback presentation interval.
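The symbol scheme and the six award schedules follow a simple rule: each sign is worth $0.25, and the correct outcome is always one step better than the incorrect one. The following is a minimal sketch of that rule, not the software used in the study. The function names and the choice to store amounts as integer cents are assumptions made for illustration.

```python
STEP_CENTS = 25  # each + or - sign is worth $0.25 (from the text)


def symbol_to_cents(symbol: str) -> int:
    """Convert a feedback symbol such as '++', '0', or '---' to cents."""
    # Tolerate spaces between signs and the Unicode minus sign used in Table 1.
    s = symbol.replace(" ", "").replace("\u2212", "-")
    if s == "0":
        return 0
    if s and set(s) == {"+"}:
        return STEP_CENTS * len(s)
    if s and set(s) == {"-"}:
        return -STEP_CENTS * len(s)
    raise ValueError(f"unrecognized feedback symbol: {symbol!r}")


def cents_to_symbol(cents: int) -> str:
    """Convert an amount in cents (a multiple of 25) to its feedback symbol."""
    if cents % STEP_CENTS != 0:
        raise ValueError("amount must be a multiple of 25 cents")
    n = cents // STEP_CENTS
    if n == 0:
        return "0"
    return "+" * n if n > 0 else "-" * (-n)


def award_schedules() -> dict:
    """Rebuild the six schedules of Table 1 as {schedule: {outcome: (symbol, cents)}}."""
    schedules = {}
    for k in range(1, 7):
        correct = STEP_CENTS * (4 - k)     # schedule 1 -> +75, schedule 6 -> -50
        incorrect = correct - STEP_CENTS   # always $0.25 worse than correct
        schedules[k] = {
            "correct": (cents_to_symbol(correct), correct),
            "incorrect": (cents_to_symbol(incorrect), incorrect),
        }
    return schedules
```

For example, `award_schedules()[3]` gives `+` (25 cents) for a correct response and `0` (0 cents) for an incorrect one, matching Award Schedule 3.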
4.3 Behavioral Data Analysis
In order to identify changes in brain activation from rule acquisition on the first trial to rule rehearsal on the second and later trials, we grouped learning trials into five categories. These categories were used for the analyses of behavioral performance and brain activation. Each picture was presented eight times in a scanner run, so perfect performance yielded one learning trial and seven rehearsal trials. “1st Trials” included all trials where a picture was seen for the first time and followed consecutively by at least three correct trials. Following 1st Trials, the next three consecutively correct trials were assigned to the categories “Correct 2nd Trials,” “Correct 3rd Trials,” and “Correct 4th Trials.” Subsequent to these categories, all learning trials that were correct five or more times were grouped into a single category called “Correct 5-8th Trials.” Control trials were grouped separately. It is important to note that although 1st Trial outcomes could be correct or incorrect by chance, the same associative information was learned either way, so trials that resulted in both positive and negative feedback were included in “1st Trials.” Also important is the fact that if an error occurred after the first trial and before the fifth, none of the first four trials was included in the 1st to Correct 4th categories. Behavioral analysis of reaction times and analysis of fMRI data used these six categories. For the fMRI analysis, stimulus and feedback events were modeled separately for each category. This gave us a total of twelve events of interest: stimulus and feedback events for 1st Trials, Correct 2nd Trials, Correct 3rd Trials, Correct 4th Trials, Correct 5-8th Trials and Control Trials. By separately modeling each of the first four correct trials we precisely equated the learning experience across subjects, because, for instance, all Correct 4th trials were preceded by a learning trial and two consecutively correct rehearsal trials.
No indirect, or latent, measures of learning need to be calculated in order to establish how much learning has taken place or at what rate in different individuals. No separate regressors were included for the behavioral responses in any condition. Behavioral responses are tightly time-locked to stimulus presentation, and stimulus events therefore contain information about the combined cognitive processes supporting decision-making and response execution.
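The grouping rule for learning trials can be sketched as a small labeling function applied to the eight presentations of one picture. This is an illustrative reading of the criteria above, not the authors' code. The function name is hypothetical. The handling of presentations five through eight is an assumption: the sketch labels them only while the run of correct responses continues unbroken, and it excludes them entirely when the first-four criterion fails.

```python
def categorize_picture_trials(correct):
    """Label the presentations of one picture within a run.

    `correct` is a list of booleans, one per presentation, in order.
    Returns a list of category labels, with None for excluded trials.
    """
    labels = [None] * len(correct)

    # Criterion from the text: the first trial counts only if the next
    # three presentations are all correct. Otherwise none of the first
    # four trials is categorized.
    if len(correct) < 4 or not all(correct[1:4]):
        return labels

    # The first-trial outcome itself may be correct or incorrect by chance.
    labels[0] = "1st"
    labels[1] = "Correct 2nd"
    labels[2] = "Correct 3rd"
    labels[3] = "Correct 4th"

    # ASSUMPTION: later presentations are labeled only while the streak
    # of correct responses continues; labeling stops at the first error.
    for i in range(4, len(correct)):
        if not correct[i]:
            break
        labels[i] = "Correct 5-8th"
    return labels
```

Under this reading, a picture answered incorrectly on the first presentation and correctly on the remaining seven yields one 1st Trial, one trial in each of the Correct 2nd to Correct 4th categories, and four Correct 5-8th Trials.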
Control trials were presented at the beginning of each run rather than interleaved within the task. In prior work on this topic (Eliassen et al., 2003) we observed during preliminary analyses that control trials at the beginning and end of the paradigm had different magnitudes of the BOLD response. We remained concerned in this study that novelty effects might occur at the beginning of each scanner run. BOLD responses associated with learning are also largest at the beginning of learning. In order to reduce the potential for novelty effects to confound learning effects, learning trials commenced in the middle of each scanner run rather than at the beginning. Although not within the scope of this study of single-trial learning, in future work we may compare Control and Correct 5-8th trials to understand the differences between association rehearsal and control trials. For this reason we attempted to match the number of trials within these two categories during the analysis phase (see Results for the average number of trials within the different categories).
4.4 Image Acquisition
Brain images were acquired on a whole-body Varian Inova 4 Tesla MR system at the University of Cincinnati College of Medicine’s Center for Imaging Research (CIR). A high-resolution T1-weighted anatomical volume was acquired using a Modified Driven Equilibrium Fourier Transform acquisition (MDEFT; Lee et al., 1995): TMD=1.1 s, TR=13 ms, TE=6 ms, FOV=25.6 × 19.2 × 19.2 cm, matrix 256 × 192 × 96 pixels, voxel resolution 1×1×2 mm, flip angle 20 degrees, zero-filled to 1×1×1 mm during reconstruction. For fMRI measurements we acquired six runs of T2*-weighted gradient-echo echoplanar images (EPI) consisting of 35 contiguous 5 mm coronal slices covering the entire brain (TR/TE=3000/29 msec, FOV=208 × 208 mm, matrix 64 × 64, slice-thickness 5 mm, flip angle 75 degrees, resolution 3.25×3.25×5 mm) utilizing the BOLD contrast mechanism (Kwong et al., 1992; Ogawa et al., 1992) and a multi-echo reference scan for correction of geometric distortion and Nyquist ghost (Schmithorst et al., 2001). Two volumes were acquired prior to a trigger at the beginning of the behavioral paradigm, and these images were discarded to account for magnetization equilibrium effects. A total of 888 volumes were acquired from each subject for analysis.
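The stated voxel resolutions follow directly from dividing each field of view by the corresponding matrix size. The short check below illustrates that arithmetic with the reported values. The function name is hypothetical.

```python
def voxel_size_mm(fov_mm, matrix):
    """Voxel edge lengths in mm: field of view divided by matrix size, per axis."""
    return tuple(f / n for f, n in zip(fov_mm, matrix))


# Anatomical MDEFT volume: FOV 25.6 x 19.2 x 19.2 cm, matrix 256 x 192 x 96.
anat = voxel_size_mm((256.0, 192.0, 192.0), (256, 192, 96))

# In-plane EPI geometry: FOV 208 x 208 mm, matrix 64 x 64 (slices are 5 mm thick).
epi = voxel_size_mm((208.0, 208.0), (64, 64))

print(anat)  # (1.0, 1.0, 2.0)
print(epi)   # (3.25, 3.25)
```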
4.5 MRI Analysis
Raw MRI data were reconstructed with 2D Hamming filtering in the XY plane using in-house software developed in IDL (www.ittvis.com). Images were subsequently processed, analyzed, and visualized with AFNI (Analysis of Functional Neuroimages, afni.nimh.nih.gov; Cox 1996; Cox and Hyde 1997). The six fMRI runs were motion corrected to the third image of the first run using a six-parameter rigid-body transformation with Fourier interpolation (Cox and Jesmanowicz 1999). The MDEFT image was normalized to Talairach space using tools in AFNI to match each subject’s image to the International Consortium for Brain Mapping’s ICBM452 template from UCLA’s Laboratory of Neuroimaging (www.loni.ucla.edu). The EPI datasets were spatially smoothed with a 6 mm full-width at half-maximum (FWHM) Gaussian filter, then normalized by adopting the MDEFT transform, and resampled to 3×3×3 mm. The application of Hamming filtering at reconstruction plus smoothing after normalization resulted in effective smoothing of 9 mm in the xy imaging plane and 6 mm in the z-slice plane. The Monte Carlo simulations used to estimate statistical significance with p-value and cluster size thresholds took this smoothing into account.
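One way to relate the in-plane and through-plane smoothing figures is the usual rule that successive Gaussian filters combine in quadrature. The sketch below applies that rule under the assumption that the Hamming filter behaves approximately like a Gaussian kernel. The text does not report the Hamming filter's equivalent width, so the value computed here is an inference from the stated 9 mm and 6 mm figures, and the function names are hypothetical.

```python
import math


def combined_fwhm(*fwhms_mm):
    """Effective FWHM of successive Gaussian smoothing steps (quadrature sum)."""
    return math.sqrt(sum(f * f for f in fwhms_mm))


def implied_fwhm(total_mm, known_mm):
    """FWHM of the unknown step, given the total and one known Gaussian step."""
    return math.sqrt(total_mm ** 2 - known_mm ** 2)


# ASSUMPTION: the Hamming filter is treated as an equivalent Gaussian kernel.
# In-plane: 9 mm effective smoothing after a 6 mm Gaussian filter.
hamming_equiv = implied_fwhm(9.0, 6.0)
print(round(hamming_equiv, 2))  # 6.71
```

Under this approximation the reconstruction filter would contribute roughly 6.7 mm in-plane, while the slice direction, which receives no Hamming filtering, keeps the 6 mm of the Gaussian filter alone.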
For each participant, the list of times for each event type was convolved with a canonical hemodynamic response function (HRF) in AFNI with user-specified parameters including a delay time (2 sec), rise time (4 sec), and fall time (4 sec) based on the observed timing of hemodynamic responses in our previous work (Eliassen et al., 2003). In addition to the behavioral events, polynomials up to third order and the motion correction parameters were fit to each run to account for signal drift and motion-related variance. The fit coefficients for the events were determined from the fMRI data using linear regression with AFNI’s 3dREMLfit, which estimates brain activation and accounts for serial autocorrelations in the fMRI time series. Fit coefficients for the behavioral events were entered as dependent measures in statistical analyses to identify brain activation effects of interest. In addition to this fixed shape HRF, we also modeled the behavioral events in the MRI data with a time-varying HRF. For this purpose we used a set of three sinusoidal basis functions for each event type, which allowed us to examine the time course of the BOLD response to each event type. Time courses from this secondary deconvolution analysis were extracted from the regions of interest in Fig. 3 and plotted in Fig. 4 in order to demonstrate that the temporal signal shows the same effects of learning suggested by the results obtained with the canonical HRF.
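To make the regressor construction concrete, the sketch below builds one event regressor and a set of cubic drift terms for a single run. It is an illustration only and does not reproduce AFNI's waveform or 3dREMLfit. The response shape is a stand-in with raised-cosine ramps that honors the stated delay (2 sec), rise (4 sec), and fall (4 sec). The drift terms are plain powers of rescaled time. All function names and the example onset are assumptions.

```python
import math

TR = 3.0  # seconds per volume, from the acquisition parameters


def ramp_hrf(t, delay=2.0, rise=4.0, fall=4.0):
    """Stand-in response: zero until `delay`, smooth rise to 1, smooth fall to 0.

    ASSUMPTION: raised-cosine ramps are used for illustration; the exact
    shape of the AFNI waveform is not reproduced here.
    """
    t -= delay
    if t <= 0.0:
        return 0.0
    if t < rise:
        return 0.5 * (1.0 - math.cos(math.pi * t / rise))
    t -= rise
    if t < fall:
        return 0.5 * (1.0 + math.cos(math.pi * t / fall))
    return 0.0


def event_regressor(onsets, n_vols, tr=TR):
    """Convolve an impulse train at `onsets` (sec) with the response shape,
    sampled once per volume."""
    return [sum(ramp_hrf(v * tr - onset) for onset in onsets)
            for v in range(n_vols)]


def drift_regressors(n_vols, order=3):
    """Polynomial drift terms of degree 0..order on time rescaled to [-1, 1]."""
    xs = [2.0 * v / (n_vols - 1) - 1.0 for v in range(n_vols)]
    return [[x ** p for x in xs] for p in range(order + 1)]


# Hypothetical example: a single event at t = 0 sec in a 5-volume series.
reg = event_regressor([0.0], 5)
drift = drift_regressors(5)
```

With these parameters the modeled response peaks 6 sec after event onset and returns to baseline by 10 sec. In a full model, one such column per event type would sit alongside the drift and motion columns, and the coefficients would be estimated by regression.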
4.6 Statistical Analyses
We conducted statistical analyses of reaction times (RTs) and fMRI data. For the RT data, we evaluated the change in response times across all levels of experience (1st, Correct 2nd, Correct 3rd trials, etc.), specifically including a contrast between 1st Trials and Correct 2nd Trials. Behavioral analyses were conducted using a mixed effects analysis approach in Statistical Analysis Software (SAS, SAS Institute, Cary, NC). For the fMRI data we used AFNI’s GroupAna Matlab routines to conduct repeated measures Analysis of Variance (ANOVA). The fit coefficients from the deconvolution procedure were used as dependent measures in this analysis. We included subject as a random factor and event type (2 levels: stimulus or feedback) as a fixed factor. Experience (six levels: 1st, Correct 2nd, Correct 3rd, etc.) was included as a repeated measure. For the fMRI data, contrasts were included to examine a number of learning related changes separately for stimulus or feedback events. Most importantly, we included an interaction term that addressed our main hypothesis. This interaction term appraised whether the change in activity between 1st Trials and Correct 2nd Trials was the same or different for stimulus and feedback events. Because the utility of the feedback in guiding future behavior changes after learning on the first trial, we hypothesized that there would be more pronounced reductions in activity for feedback than for stimulus events in the frontal, parietal and striatal regions.
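The key interaction can be written as a difference of differences over the fit coefficients: (feedback 1st minus feedback Correct 2nd) minus (stimulus 1st minus stimulus Correct 2nd). The sketch below shows the corresponding contrast weights over the twelve event types and applies them to one set of coefficients. It illustrates the contrast only, not the group-level ANOVA, and the coefficient values in the example are hypothetical.

```python
EXPERIENCE = ["1st", "Correct 2nd", "Correct 3rd", "Correct 4th",
              "Correct 5-8th", "Control"]
EVENTS = ["stimulus", "feedback"]


def interaction_weights():
    """Weights for (feedback 1st - feedback 2nd) - (stimulus 1st - stimulus 2nd)."""
    w = {(e, x): 0.0 for e in EVENTS for x in EXPERIENCE}
    w[("feedback", "1st")] = 1.0
    w[("feedback", "Correct 2nd")] = -1.0
    w[("stimulus", "1st")] = -1.0
    w[("stimulus", "Correct 2nd")] = 1.0
    return w


def apply_contrast(weights, betas):
    """Weighted sum of fit coefficients for one voxel or region."""
    return sum(weights[k] * betas[k] for k in weights)


# Hypothetical coefficients for one region (arbitrary units).
betas = {(e, x): 0.0 for e in EVENTS for x in EXPERIENCE}
betas[("feedback", "1st")] = 1.0
betas[("feedback", "Correct 2nd")] = 0.4
betas[("stimulus", "1st")] = 0.8
betas[("stimulus", "Correct 2nd")] = 0.7

effect = apply_contrast(interaction_weights(), betas)  # about 0.5
```

A positive value indicates a larger first-to-second-trial reduction for feedback events than for stimulus events, which is the direction predicted by the hypothesis.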
For all brain imaging comparisons we considered as significant a corrected p-value of 0.01, using a voxel-level p-value ≤ 0.001 and a cluster threshold of 30 voxels. These thresholds were determined according to Monte Carlo simulations using tools in AFNI (Forman et al., 1995; Friston et al., 1994; Xiong et al., 1995). In order to identify the locations of activation clusters we used the Talairach daemon (Lancaster et al., 2000) and the automated anatomical labels of the single-subject high-resolution T1 volume from the Montreal Neurological Institute (Tzourio-Mazoyer et al., 2002) as implemented in AFNI.
Supplementary Material
Highlights.
We studied the brain activity during single trial associative learning using fMRI.
We predicted frontal and striatal changes in feedback processing after the first trial.
We observed reductions in brain activity between first and second trial feedback.
Activity pattern suggests these structures bind stimulus, response, and outcome.
Acknowledgments
This work was supported by the University of Cincinnati, College of Medicine Dean’s Discovery Fund, the Center for Imaging Research, and the NIH/NIDA K01 DA20485 to JCE.
References
- 1. Aron AR, Shohamy D, Clark J, Myers C, Gluck MA, Poldrack RA. Human midbrain sensitivity to cognitive feedback and uncertainty during classification learning. J Neurophysiol. 2004;92:1144–1152. doi: 10.1152/jn.01209.2003.
- 2. Barcelo F, Perianez JA, Knight RT. Think differently: A brain orienting response to task novelty. Neuroreport. 2002;13:1887–1892. doi: 10.1097/00001756-200210280-00011.
- 3. Barcelo F, Munoz-Cespedes JM, Pozo MA, Rubia FJ. Attentional set shifting modulates the target P3b response in the Wisconsin card sorting test. Neuropsychologia. 2000;38:1342–1355. doi: 10.1016/s0028-3932(00)00046-4.
- 4. Bedard P, Sanes JN. On a basal ganglia role in learning and rehearsing visual-motor associations. Neuroimage. 2009;47:1701–1710. doi: 10.1016/j.neuroimage.2009.03.050.
- 5. Bischoff-Grethe A, Hazeltine E, Bergren L, Ivry RB, Grafton ST. The influence of feedback valence in associative learning. Neuroimage. 2009;44:243–251. doi: 10.1016/j.neuroimage.2008.08.038.
- 6. Boettiger CA, D’Esposito M. Frontal networks for learning and executing arbitrary stimulus-response associations. J Neurosci. 2005;25:2723–2732. doi: 10.1523/JNEUROSCI.3697-04.2005.
- 7. Brovelli A, Nazarian B, Meunier M, Boussaoud D. Differential roles of caudate nucleus and putamen during instrumental learning. Neuroimage. 2011;57:1580–1590. doi: 10.1016/j.neuroimage.2011.05.059.
- 8. Brovelli A, Laksiri N, Nazarian B, Meunier M, Boussaoud D. Understanding the neural computations of arbitrary visuomotor learning through fMRI and associative learning theory. Cereb Cortex. 2008;18:1485–1495. doi: 10.1093/cercor/bhm198.
- 9. Cools R, Clark L, Owen AM, Robbins TW. Defining the neural mechanisms of probabilistic reversal learning using event-related functional magnetic resonance imaging. J Neurosci. 2002;22:4563–4567. doi: 10.1523/JNEUROSCI.22-11-04563.2002.
- 10. Cox RW, Jesmanowicz A. Real-time 3D image registration for functional MRI. Magn Reson Med. 1999;42:1014–1018. doi: 10.1002/(sici)1522-2594(199912)42:6<1014::aid-mrm4>3.0.co;2-f.
- 11. Cox RW, Hyde JS. Software tools for analysis and visualization of fMRI data. NMR Biomed. 1997;10:171–178. doi: 10.1002/(sici)1099-1492(199706/08)10:4/5<171::aid-nbm453>3.0.co;2-l.
- 12. Cox RW. AFNI: Software for analysis and visualization of functional magnetic resonance neuroimages. Comput Biomed Res. 1996;29:162–173. doi: 10.1006/cbmr.1996.0014.
- 13. Eliassen JC, Souza T, Sanes JN. Experience-dependent activation patterns in human brain during visual-motor associative learning. J Neurosci. 2003;23:10540–10547. doi: 10.1523/JNEUROSCI.23-33-10540.2003.
- 14. Eliassen JC, Souza T, Sanes JN. Human brain activation accompanying explicitly directed movement sequence learning. Exp Brain Res. 2001;141:269–280. doi: 10.1007/s002210100822.
- 15. Forman SD, Cohen JD, Fitzgerald M, Eddy WF, Mintun MA, Noll DC. Improved assessment of significant activation in functional magnetic resonance imaging (fMRI): Use of a cluster-size threshold. Magn Reson Med. 1995;33:636–647. doi: 10.1002/mrm.1910330508.
- 16. Friston KJ, Worsley KJ, Frackowiak RSJ, Mazziotta JC, Evans AC. Assessing the significance of focal activations using their spatial extent. Hum Brain Mapp. 1994;1:210–220. doi: 10.1002/hbm.460010306.
- 17. Konishi S, Hayashi T, Uchida I, Kikyo H, Takahashi E, Miyashita Y. Hemispheric asymmetry in human lateral prefrontal cortex during cognitive set shifting. Proc Natl Acad Sci U S A. 2002;99:7803–7808. doi: 10.1073/pnas.122644899.
- 18. Kwong KK, Belliveau JW, Chesler DA, Goldberg IE, Weisskoff RM, Poncelet BP, Kennedy DN, Hoppel BE, Cohen MS, Turner R. Dynamic magnetic resonance imaging of human brain activity during primary sensory stimulation. Proc Natl Acad Sci U S A. 1992;89:5675–5679. doi: 10.1073/pnas.89.12.5675.
- 19. Lancaster JL, Woldorff MG, Parsons LM, Liotti M, Freitas CS, Rainey L, Kochunov PV, Nickerson D, Mikiten SA, Fox PT. Automated Talairach atlas labels for functional brain mapping. Hum Brain Mapp. 2000;10:120–131. doi: 10.1002/1097-0193(200007)10:3<120::AID-HBM30>3.0.CO;2-8.
- 20. Lee JH, Garwood M, Menon R, Adriany G, Andersen P, Truwit CL, Ugurbil K. High contrast and fast three-dimensional magnetic resonance imaging at high fields. Magn Reson Med. 1995;34:308–312. doi: 10.1002/mrm.1910340305.
- 21. Liu X, Powell DK, Wang H, Gold BT, Corbly CR, Joseph JE. Functional dissociation in frontal and striatal areas for processing of positive and negative reward information. J Neurosci. 2007;27:4587–4597. doi: 10.1523/JNEUROSCI.5227-06.2007.
- 22. Miller EK, Cohen JD. An integrative theory of prefrontal cortex function. Annu Rev Neurosci. 2001;24:167–202. doi: 10.1146/annurev.neuro.24.1.167.
- 23. Monchi O, Petrides M, Petre V, Worsley K, Dagher A. Wisconsin card sorting revisited: Distinct neural circuits participating in different stages of the task identified by event-related functional magnetic resonance imaging. J Neurosci. 2001;21:7733–7741. doi: 10.1523/JNEUROSCI.21-19-07733.2001.
- 24. O’Doherty J, Critchley H, Deichmann R, Dolan RJ. Dissociating valence of outcome from behavioral control in human orbital and ventral prefrontal cortices. J Neurosci. 2003;23:7931–7939. doi: 10.1523/JNEUROSCI.23-21-07931.2003.
- 25. O’Doherty J, Kringelbach ML, Rolls ET, Hornak J, Andrews C. Abstract reward and punishment representations in the human orbitofrontal cortex. Nat Neurosci. 2001;4:95–102. doi: 10.1038/82959.
- 26. Ogawa S, Tank DW, Menon R, Ellermann JM, Kim SG, Merkle H, Ugurbil K. Intrinsic signal changes accompanying sensory stimulation: Functional brain mapping with magnetic resonance imaging. Proc Natl Acad Sci U S A. 1992;89:5951–5955. doi: 10.1073/pnas.89.13.5951.
- 27. Oldfield RC. The assessment and analysis of handedness: The Edinburgh inventory. Neuropsychologia. 1971;9:97–113. doi: 10.1016/0028-3932(71)90067-4.
- 28. Poldrack RA, Clark J, Pare-Blagoev EJ, Shohamy D, Creso Moyano J, Myers C, Gluck MA. Interactive memory systems in the human brain. Nature. 2001;414:546–550. doi: 10.1038/35107080.
- 29. Ramnani N, Toni I, Josephs O, Ashburner J, Passingham RE. Learning- and expectation-related changes in the human brain during motor learning. J Neurophysiol. 2000;84:3026–3035. doi: 10.1152/jn.2000.84.6.3026.
- 30. Rossion B, Pourtois G. Revisiting Snodgrass and Vanderwart’s object pictorial set: The role of surface detail in basic-level object recognition. Perception. 2004;33:217–236. doi: 10.1068/p5117.
- 31. Schmithorst VJ, Dardzinski BJ, Holland SK. Simultaneous correction of ghost and geometric distortion artifacts in EPI using a multiecho reference scan. IEEE Trans Med Imaging. 2001;20:535–539. doi: 10.1109/42.929619.
- 32. Schultz W, Romo R, Ljungberg T, Mirenowicz J, Hollerman JR, Dickinson A. Reward-related signals carried by dopamine neurons. In: Houk JC, Davis JL, Beiser DG, editors. Models of Information Processing in the Basal Ganglia. The MIT Press; Cambridge, Massachusetts: 1995. pp. 233–248.
- 33. Schultz W, Dayan P, Montague PR. A neural substrate of prediction and reward. Science. 1997;275:1593–1599. doi: 10.1126/science.275.5306.1593.
- 34. Tzourio-Mazoyer N, Landeau B, Papathanassiou D, Crivello F, Etard O, Delcroix N, Mazoyer B, Joliot M. Automated anatomical labeling of activations in SPM using a macroscopic anatomical parcellation of the MNI MRI single-subject brain. Neuroimage. 2002;15:273–289. doi: 10.1006/nimg.2001.0978.
- 35. van Duijvenvoorde AC, Zanolie K, Rombouts SA, Raijmakers ME, Crone EA. Evaluating the negative or valuing the positive? Neural mechanisms supporting feedback-based learning across development. J Neurosci. 2008;28:9495–9503. doi: 10.1523/JNEUROSCI.1485-08.2008.
- 36. Xiong J, Gao J, Lancaster JL, Fox PT. Clustered pixels analysis for functional MRI activation studies of the human brain. Hum Brain Mapp. 1995;3:287–301.
- 37. Zanolie K, Van Leijenhorst L, Rombouts SA, Crone EA. Separable neural mechanisms contribute to feedback processing in a rule-learning task. Neuropsychologia. 2008;46:117–126. doi: 10.1016/j.neuropsychologia.2007.08.009.
