Abstract
Substantial experimental evidence supports the theory that the dopaminergic system codes a phasic (short duration) signal predicting the delivery of primary reinforcers, such as water when thirsty, during Pavlovian learning. This signal is described by the temporal difference (TD) model. Recently, it has been suggested that the human dopaminergic system also codes more complex cognitive goal states, including those required for human social interaction. Using functional magnetic resonance imaging (fMRI) with 18 healthy subjects, we tested the hypothesis that TD signals would be present during both a Pavlovian learning task and a social motor response learning task. With an identical model, TD signals were detected in both tasks, although in different brain regions. Specifically, signals were present in the dorsal anterior cingulate, ventral striatum, amygdala, and thalamus with Pavlovian learning, and in the dorsal anterior cingulate and bilateral frontal operculum with social motor response learning. The frontal operculum is believed to be the human homologue of the monkey mirror neuron system, and there is evidence linking the region with inference about other people's intentions and goals. The results support the contention that the human dopaminergic system predicts both primary reinforcers and more complex cognitive goal states, such as motor responses required for human social group interaction. Dysfunction of such a mechanism might be associated with abnormal affective responses and incorrect social predictions, as occur in psychiatric disorders. Hum Brain Mapp 2009. © 2008 Wiley‐Liss, Inc.
Keywords: computational biology, social interaction, reinforcement, motor activity, ventral striatum, fMRI, frontal operculum
INTRODUCTION
Formal learning theory, such as the temporal difference (TD) model [Sutton and Barto, 1998], incorporates a phasic “prediction error” signal: events that are better than predicted evoke an increased phasic signal, whereas those that are worse evoke a phasic reduction of background activity. TD theory was developed as a model of Pavlovian learning [Sutton and Barto, 1981, 1990], is a generalization of the Rescorla–Wagner model [Rescorla and Wagner, 1972], and has links to optimal action selection in engineering [Dayan and Abbott, 2001]. The TD model is not limited to Pavlovian learning; for example, it is the basis of Tesauro's backgammon computer program, which continues to play against humans successfully at a world‐class level [Montague, 2006; Tesauro, 1995].
From a physiological perspective, the TD model describes how certain classes of problems can be solved computationally, and when animals and humans solve such problems, their dopaminergic (DA) system exhibits a signal conforming to a term within the TD model: the TD prediction error signal. Considerable animal [Montague et al., 1996; Schultz, 2002; Waelti et al., 2001] and human [McClure et al., 2003a; O'Doherty et al., 2004; Paulus et al., 2004; Seymour et al., 2007; Tanaka et al., 2004; Tobler et al., 2006] experimental evidence links the TD reward learning signal with the DA system. Brain regions repeatedly reported to demonstrate the TD signal in humans include the ventral striatum/putamen (VS), ventral tegmental area (VTA), insula, amygdala, and anterior cingulate (AC). No neurons other than DA neurons are known to exhibit the TD reward learning signal [Schultz and Dickinson, 2000], although there is evidence for a TD aversive learning signal in humans which may be serotonin linked [Seymour et al., 2005, 2007]. Nevertheless, imaging studies have also reported TD reward learning signals in brain regions which do not have strong DA innervation, so it is possible that other neurones exhibit this pattern of activity as well.
Recently, it has been suggested that the human DA system also codes more complex goals, such as those required for social interaction [Doya et al., 2003; Montague, 2006; O'Reilly et al., 1999]. This is supported by experimental work [King‐Casas et al., 2005]. Notably, the only “social” communication channel (excluding endocrine and autonomic) is “motor” [Wolpert et al., 2003], and a unifying computational theory for both social interaction and motor control has been described [Wolpert et al., 2003]. A version of this theory predicts TD signals with social interaction [Doya et al., 2003].
Therefore, we performed an fMRI study to test whether an identical TD model would capture neural responses in brain regions associated with both a Pavlovian learning task and a social motor response (“Social”) learning task. The first aim was to verify that the TD model described neural activity in subjects during Pavlovian learning. The second aim was to test whether neural responses to social stimuli predicting a requirement for a motor response, in the same subjects, were described by an identical TD model. Whilst the VS and dorsal striatum are commonly reported as active in human TD neuroimaging studies employing Pavlovian and instrumental tasks, different brain regions are implicated in social interaction: temporal and medial frontal cortices [Frith and Frith, 2003] and frontal operculum [Gallese, 2003; Iacoboni et al., 2005; Montgomery et al., 2007]. Therefore, we hypothesized that TD signals would be found in different brain regions for the two tasks.
SUBJECTS AND METHODS
The study was approved by the local ethics committee and written informed consent was obtained from each subject.
Participants
Eighteen healthy subjects (11 female), age 42 ± 12 years, with no history of psychiatric illness, head injury, major physical illness, or drug misuse were recruited. None was receiving medication that could alter brain activity.
Pavlovian Learning Paradigm
Subjects were asked to abstain from drinking fluids from the night before the scan to ensure they were thirsty. Whilst in the scanner, they were presented randomly with one of two fractal pictures. Following the presentation of either picture, water was delivered according to a probabilistic pattern. The association between the pictures and water delivery changed slowly a number of times. Subjects were told: “After either of the pictures drops of water may be delivered. You should try and learn which picture predicts the water. The picture which predicts the water may change.” The goal of the task was therefore made explicit. Immediately after scanning, subjects completed linear analogue scales of perceived pleasantness of the water and were asked to recall the associations between the pictures and the water delivery for the first and last blocks (see below), and to estimate their certainty. One‐sample t tests and χ² tests were used to test the null hypotheses that water was not reported as pleasant and that associations were not learned.
The Pavlovian task lasted 10 min and consisted of 100 trials, each of 6 s duration. Within each trial, one of two pictures (conditioned stimuli; CS) was randomly presented 2 s after the start, and 0.1 ml of water (unconditioned stimulus; US) was either delivered or not delivered 4 s after the start, according to a predefined probability. Water delivery was via a polythene tube attached to an electronic syringe pump (World Precision Instruments, Stevenage, UK) positioned in the scanner control room and interfaced to the image presentation and log file generating computer. There were five blocks of 20 trials each within the 10 min period, with the following probabilities of water delivery for picture 1 and picture 2, respectively: block 1, 80% and 0%; block 2, 50% and 20%; block 3, 0% and 90%; block 4, 20% and 20%; block 5, 80% and 0%. Prestudy pilot testing had indicated that subjects could not identify where the boundaries between the blocks were, due to the probabilistic nature of the water delivery and the small number of reinforced trials in each block. Event times of picture presentation and water delivery were recorded into each subject's log file.
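The block structure just described can be sketched in a few lines of Python. This is an illustration only, not the presentation software used in the study: the function name `pavlovian_schedule`, the fixed random seed, and the assumption that picture choice and water delivery are independent random draws on every trial are all hypothetical choices made for the example.

```python
import random

# Probability of water after picture 1 and picture 2 in each of the five
# blocks, as listed in the text.
BLOCK_PROBS = [(0.8, 0.0), (0.5, 0.2), (0.0, 0.9), (0.2, 0.2), (0.8, 0.0)]
TRIALS_PER_BLOCK = 20
TRIAL_S = 6.0       # trial duration in seconds
CS_OFFSET_S = 2.0   # picture onset relative to trial start
US_OFFSET_S = 4.0   # water delivery time relative to trial start


def pavlovian_schedule(seed=0):
    """Return one (cs_time, us_time, picture, water_delivered) tuple per trial."""
    rng = random.Random(seed)  # the seed is an assumption, for repeatability
    trials = []
    for block, probs in enumerate(BLOCK_PROBS):
        for k in range(TRIALS_PER_BLOCK):
            start = (block * TRIALS_PER_BLOCK + k) * TRIAL_S
            picture = rng.choice((1, 2))
            water = rng.random() < probs[picture - 1]
            trials.append((start + CS_OFFSET_S, start + US_OFFSET_S, picture, water))
    return trials


schedule = pavlovian_schedule()
print(len(schedule), "trials; last US time:", schedule[-1][1], "s")
```

With 100 trials of 6 s each, the last possible water delivery falls at 598 s, which fits within the 10 min task duration.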
Social Motor Response Learning Paradigm
Here, “social” refers to viewing a “social animation,” defined as a dynamic cartoon representation (see Fig. 1) of an interaction between three human figures, consisting of ball throwing. This contrasts with the “nonsocial” stimuli (fractal pictures) used in the Pavlovian task. In addition, the term also refers to subjects participating (by button pressing) in the observed interaction. Subjects were told they were represented by one of the cartoon figures, which exhibited a ball throwing action when they pressed one of two buttons. The ball was then shown moving towards the figure selected by the chosen button, and caught by that figure. The ball catching action was done automatically for the subject. The Social task lasted for 10 min and consisted of 168 trials each lasting about 4 s. Subjects were told: “when you receive the ball, just pass it back.” Subjects were not told that the object of the game was to study the effects of varying social inclusion.
There was no intention for the Social task to be as similar as possible to the Pavlovian task. Instead, the implementation of the Social task was chosen to be as similar as possible to a paradigm published prominently in the literature [Eisenberger et al., 2003]. The Pavlovian task was chosen to be as similar as possible to an animal Pavlovian conditioning paradigm [Sutton and Barto, 1990]. The cartoon animation is provided for download by the authors of the previous study [Eisenberger et al., 2003]. Unlike the previous report, components of the Social task (throwing actions of each subject, ball passing, and catching actions), were extracted from the animated “gif” file and each cartoon component called separately and smoothly by the image presentation and log file program. This allowed event‐times of task components to be accurately recorded into each subject's log file, which was later used to extract the time courses of events for image analysis.
The Social task was not implemented exactly as described by Eisenberger because of a methodological issue. Eisenberger used an fMRI blocked design, with three blocks, the last being “social exclusion.” Each block only occurred once, because the authors wanted social exclusion to be “unexpected” [Eisenberger et al., 2003]. This is problematic because fMRI paradigms require repeated short blocks or events; otherwise, the signal of interest falls into the low frequency “noise” band [Friston, 2004], which is usually removed by high‐pass filtering during analysis. Consequently, we modified the original task to a stochastic event‐related design, such that the probability of social inclusion varied in a manner that could be partly predicted by the TD algorithm coding social inclusion as reinforcing.
“Exclusion” was defined as a pass of the ball between the two computer figures not representing the subject, “inclusion” as a pass of the ball to the subject. Unlike previous use of the game, the subject's group inclusion was systematically and slowly varied between 100 and 0% on four occasions in 17 blocks. The percentage levels of inclusion for each of the blocks were: 100, 75, 50, 25, 0, 25, 50, 75, 100, 75, 50, 25, 0, 25, 50, 75, and 100. The paradigm was designed such that each block only contained nine or 10 inclusion and exclusion events, which in combination with the above slowly changing probability of inclusion, meant that subjects only noticed a gradual change in inclusion and not block boundaries. This was confirmed in pilot testing.
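As a quick consistency check on this design, the inclusion schedule can be written down and counted. The sketch below is illustrative only: the helper names are hypothetical, and the way the 168 trials are divided among the 17 blocks (which blocks receive 9 events rather than 10) is an assumption, since the text does not specify it.

```python
# Percentage inclusion per block, as listed in the text.
INCLUSION_PCT = [100, 75, 50, 25, 0, 25, 50, 75, 100,
                 75, 50, 25, 0, 25, 50, 75, 100]
TOTAL_TRIALS = 168


def block_sizes(total=TOTAL_TRIALS, n_blocks=len(INCLUSION_PCT)):
    """Split the trials as evenly as possible (assumed allocation)."""
    base, extra = divmod(total, n_blocks)
    return [base + 1 if i < extra else base for i in range(n_blocks)]


def count_sweeps(levels):
    """Count monotonic runs (100 -> 0 or 0 -> 100) in the schedule."""
    signs = [1 if b > a else -1 for a, b in zip(levels, levels[1:])]
    return 1 + sum(1 for s, t in zip(signs, signs[1:]) if s != t)


sizes = block_sizes()
print(sorted(set(sizes)), sum(sizes), count_sweeps(INCLUSION_PCT))
```

An even split of 168 trials over 17 blocks gives blocks of 9 or 10 events, and the listed percentages contain four monotonic sweeps between 100% and 0%, in line with the description above.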
To allow an identical sequence of social inclusion probabilities for each subject, and as with previous studies [Eisenberger et al., 2003; Williams et al., 2000], the behavior of the other two figures was driven by a computer program. Both animated figures had identical inclusion/exclusion histories, as our study investigated group inclusion, not aspects relating to identities of members of the group. Subjects were encouraged to believe that, just as they were represented by an animated cartoon figure which acted in accordance with their button presses, so the actions of the other two cartoon animations were also in response to people similarly pressing buttons to pass the ball [Eisenberger et al., 2003]. On the basis of previous work [Williams et al., 2000], a structured set of questions was asked immediately after scanning to assess each subject's emotional response to inclusion/exclusion during the game. One‐sample t tests and χ² tests were used to test null hypotheses of no emotional response. Subjects were also asked to guess the percentage of ball throws that the other “people” received.
Temporal Difference Learning Model
The presence or absence of a CS at time t was coded in binary form in the stimulus representation vector $x_i(t)$ [Dayan and Abbott, 2001] from the timing of events in each subject's log file. The estimate of the value (V) of each state was
$$V(t) = \sum_i w_i x_i(t)$$
where $w_i$ were weights, updated on each trial as below. The TD error signal $\delta(t)$ was defined as
$$\delta(t) = r(t) + \gamma V(t+1) - V(t)$$
where r(t) was the delivered reinforcement (coded as unity for water delivery or “social inclusion;” zero for no water delivery or “social exclusion,” and all other time points) obtained from each subject's log file, and γ a discount factor which determined how much less important later reinforcers (water delivery or social inclusion) were, compared with earlier reinforcers. Learning occurred by updating the weights on each trial as
$$\Delta w_i = \alpha \sum_t x_i(t)\,\delta(t)$$
where α was the learning rate. Each trial was assumed to consist of six time‐points in both the Pavlovian and Social tasks. As associations were learned, the TD error signal moved “backwards in time” from the US to the time of the CS. When associations changed the error signal moved forwards again to the time of the US, with less signal at the time of the CS [Dayan and Abbott, 2001]. Full learning never occurred in either task due to the probabilistic nature of associations with reinforcers and hence there was always some signal present at the time of the US.
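The backward movement of the error signal can be reproduced with a short simulation. The sketch below is not the authors' code. It assumes the standard linear TD form, V(t) = Σ_i w_i x_i(t) and δ(t) = r(t) + γV(t+1) − V(t), with a one-hot "time since CS onset" stimulus representation, six time points per trial, and the CS and US placed at time points 2 and 4; the function name `td_run` and these index choices are hypothetical. With this indexing, the CS-related error appears on the transition into the CS (index 1).

```python
def td_run(trials, alpha=0.1, gamma=1.0, n_points=6, cs_t=2, us_t=4):
    """Simulate TD learning; trials is a list of (cs_index, rewarded) pairs.

    Returns one list of prediction errors (length n_points) per trial.
    """
    n_cs = 1 + max(cs for cs, _ in trials)
    n_feat = n_points - cs_t
    # One weight per CS and per time step since CS onset.
    w = [[0.0] * n_feat for _ in range(n_cs)]
    deltas = []
    for cs, rewarded in trials:
        def value(t):
            # V(t) = sum_i w_i x_i(t); x is one-hot, so V is a single weight.
            return w[cs][t - cs_t] if cs_t <= t < n_points else 0.0

        r = [1.0 if (rewarded and t == us_t) else 0.0 for t in range(n_points)]
        delta = [r[t] + gamma * value(t + 1) - value(t) for t in range(n_points)]
        # Weight update at the end of the trial: dw_i = alpha * sum_t x_i(t) delta(t).
        for t in range(cs_t, n_points):
            w[cs][t - cs_t] += alpha * delta[t]
        deltas.append(delta)
    return deltas


# 200 always-rewarded trials followed by one unexpected omission.
deltas = td_run([(0, True)] * 200 + [(0, False)])
first, learned, omitted = deltas[0], deltas[199], deltas[200]
print([round(d, 2) for d in first])
print([round(d, 2) for d in learned])
print([round(d, 2) for d in omitted])
```

On the first trial the error sits at the US; after learning it sits at the CS transition; and an omitted reinforcer then produces a negative error at the expected US time. In the tasks described here reinforcement was probabilistic, so the error at the US never fully disappears.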
Following previous work [King‐Casas et al., 2005], as the probability of social inclusion from a computer figure was learned, the TD signal moved backwards in time, from the point when a computer figure threw the ball and it was clear that the real subject was to receive it or not (US), to an earlier time when it became obvious that a computer figure had an opportunity to throw the ball (CS) to the subject. The CS time point was taken as 1.5 s before the actual ball pass. This corresponds to an estimate of the time when it can be recognized that a cartoon figure is catching a ball being passed to them, with the implication that that cartoon figure (with a recent social inclusion history) may throw the ball to the subject. The effects of other estimates were explored.
The learning rate α and the discount factor γ have to be chosen. As in previous studies, γ = 1.0 and α = 0.1 were used [O'Doherty et al., 2006]. The effects of other plausible choices were also investigated. The TD model used the same set of parameters across all subjects and group (Pavlovian or Social task) data, since the image analysis tested the null hypothesis of no difference between groups [Pessiglione et al., 2006]. The effects of varying these assumptions were investigated.
Image Acquisition
For blood oxygen level dependent (BOLD) imaging, T2* weighted echoplanar images were obtained using a GE Medical Systems Signa 1.5 T MRI scanner. A total of 30 axially orientated 5 mm thick contiguous sequential slices were obtained for each volume, 246 volumes being obtained with a TR of 2.5 s, TE 30 ms, flip 90°, FOV 240 mm, and matrix 64 × 64. The first 4 volumes were discarded to allow for transient effects. Image acquisition was asynchronous with respect to stimulus and feedback presentation events.
Image Analysis
Image data were converted to Analyze format and SPM2 [Friston, 2004] was used for analysis. For preprocessing, BOLD images were slice time corrected and then realigned to the first image in each time series. The average realigned image was used to derive parameters for spatial normalization to the SPM2 MNI template, and these parameters were then applied to each image in the time series. The resultant realigned and spatially normalized time‐series images were smoothed with an 8 mm Gaussian kernel. For statistical analysis, a random effects event‐related design was used.
First Level Pavlovian Task Analysis
Each subject's log file was used to extract the sequence and timing of the pictures and water delivery. This was used to calculate a predicted TD error signal profile for each subject. High pass filtering was used. The covariate of interest was the event times multiplied by the predicted TD signal, the result convolved with the SPM2 standard haemodynamic response function, with no time or dispersion derivatives. The covariates of no interest were: the picture and water delivery event onsets convolved with the haemodynamic response function, six motion realignment terms to allow for any residual movement artefacts not removed by preprocessing realignment, and a constant term modeling the baseline of unchanged neural activity.
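The construction of the covariate of interest can be sketched as follows. This is an approximation written for illustration, not a call into SPM2: the haemodynamic response is modelled as a difference of two gamma densities (shapes 6 and 16, undershoot ratio 1/6), which is assumed here to resemble the SPM canonical function, and the function names, the 0.1 s sampling grid, and the 32 s kernel length are hypothetical choices.

```python
import math


def gamma_pdf(t, shape):
    """Gamma density with unit scale; zero for t <= 0."""
    if t <= 0:
        return 0.0
    return t ** (shape - 1) * math.exp(-t) / math.gamma(shape)


def hrf(t):
    """Double-gamma response (assumed approximation of the canonical HRF)."""
    return gamma_pdf(t, 6.0) - gamma_pdf(t, 16.0) / 6.0


def td_regressor(event_times, td_values, n_scans, tr=2.5, dt=0.1, hrf_len=32.0):
    """Stick functions scaled by the TD error, convolved with the HRF,
    then sampled once per scan (every tr seconds)."""
    n_fine = int(round(n_scans * tr / dt))
    sticks = [0.0] * n_fine
    for onset, delta in zip(event_times, td_values):
        idx = int(round(onset / dt))
        if 0 <= idx < n_fine:
            sticks[idx] += delta
    kernel = [hrf(k * dt) for k in range(int(round(hrf_len / dt)))]
    conv = [0.0] * n_fine
    for i, s in enumerate(sticks):
        if s == 0.0:
            continue
        for k, h in enumerate(kernel):
            if i + k < n_fine:
                conv[i + k] += s * h
    step = int(round(tr / dt))
    return [conv[j * step] for j in range(n_scans)]


# A single event at t = 0 s with a positive, then a negative, prediction error.
reg = td_regressor([0.0], [1.0], n_scans=12)
neg = td_regressor([0.0], [-1.0], n_scans=12)
print([round(v, 3) for v in reg])
```

A positive prediction error yields a response peaking about 5 s after the event, and a negative prediction error yields the mirror image, which is how the model expresses a phasic reduction in activity.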
First Level Social Task Analysis
Each subject's log file was similarly used to extract the timing of the inclusion and exclusion events. This was again used to calculate a predicted TD profile for each subject. The 1st level Social task image analysis was done in an identical manner to the Pavlovian task, and an identical TD model was used for both analyses.
2nd Level Analyses of Pavlovian and Social Tasks
The covariate images of interest for each subject, one from each paradigm, formed two groups, and each group was entered into a one group t test to test the null hypothesis of no activation. A priori defined regions of interest for both tasks were: VS, VTA, opercular cortex, thalamus, amygdala, and AC. The image analysis was “nonexploratory” in the sense that TD signals in other regions were not of particular interest. The false discovery rate (FDR) method [Genovese et al., 2002] for SPM was used to control for multiple testing of voxels. Significance was defined as P < 0.05 FDR “whole brain” corrected. As is conventional, images were thresholded at P < 0.001 uncorrected to demonstrate the spatial extent of the signal.
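For readers unfamiliar with FDR control, the step-up rule on which it is based can be sketched briefly. The code below implements the basic Benjamini-Hochberg procedure under the assumption that this matches the variant applied in the analysis; the function name `fdr_threshold` is hypothetical and the p-values are invented purely to illustrate the arithmetic, not taken from the study.

```python
def fdr_threshold(p_values, q=0.05):
    """Return the largest p-value declared significant by the
    Benjamini-Hochberg step-up rule at level q, or None if none survive."""
    ranked = sorted(p_values)
    m = len(ranked)
    threshold = None
    for i, p in enumerate(ranked, start=1):
        # The i-th smallest p-value is compared with q * i / m.
        if p <= q * i / m:
            threshold = p
    return threshold


# Invented example p-values (not data from this study).
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(fdr_threshold(example_p))
```

Here only the two smallest values survive: 0.008 is below its criterion of 0.05 × 2/10 = 0.01, whereas 0.039 exceeds 0.05 × 3/10 = 0.015 and no later value meets its criterion.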
To investigate the match between the predicted and observed TD signal, peri‐stimulus histograms were calculated using the model already described. Each trial in a given paradigm was categorized according to whether the predicted TD signal was smaller at the time of the CS compared with US (CS < US) or CS > US. Using SPM, the predicted TD signal convolved with the HRF was calculated, and then averaged for each trial category, for each subject. The observed BOLD response for each categorized trial was similarly extracted using SPM and averaged for each subject. These data were then averaged across subjects.
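The trial categorization and averaging step can be illustrated with a small helper. This is a sketch under stated assumptions rather than the SPM-based procedure used in the study: the function name `average_by_category` is hypothetical, each trial's time course is taken to be a short list of samples, and trials with equal predicted signal at the CS and US are simply dropped.

```python
def average_by_category(td_cs, td_us, timecourses):
    """Average per-trial time courses separately for trials where the
    predicted TD signal is smaller at the CS than the US, and vice versa."""
    groups = {"CS<US": [], "CS>US": []}
    for cs, us, tc in zip(td_cs, td_us, timecourses):
        if cs < us:
            groups["CS<US"].append(tc)
        elif cs > us:
            groups["CS>US"].append(tc)
        # Ties are ignored (an assumption of this sketch).
    return {
        name: [sum(col) / len(col) for col in zip(*tcs)] if tcs else []
        for name, tcs in groups.items()
    }


# Toy values for three trials, two samples per trial (not study data).
averages = average_by_category(
    td_cs=[0.1, 0.9, 0.2],
    td_us=[0.8, 0.0, 0.6],
    timecourses=[[1.0, 2.0], [3.0, 4.0], [3.0, 6.0]],
)
print(averages)
```

The same function would be applied once to the predicted signal and once to the observed BOLD samples, after which the per-subject averages are averaged across subjects.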
Comparison Between Social and Pavlovian Task
Our main hypothesis was that an identical TD model would predict neural activity in both a “simple” Pavlovian reward learning task and a Social learning task. Our secondary hypothesis was that whilst TD signals would be found with both the Social task and Pavlovian task, brain activation regions would differ. The latter was tested using a 2nd level paired t test, with 10 mm diameter small volume corrections (SVCs) centred at coordinates of significant TD signal identified from either of the one group t tests. Significance was defined here as P < 0.05 FDR SVC corrected. This method focused on regions exhibiting relatively strong TD activation in either task, then tested for differences between tasks for only these regions.
TD Model Stability Analysis
The TD calculations made assumptions regarding learning rate (α) and discount factor (γ). Consequently, the image calculations for the Pavlovian and Social paradigms were repeated to compare the effects of assuming α = 0.1 and α = 0.4, and γ = 1.0 and γ = 0.4, and the null hypothesis of no difference tested with a t test. For the Social task, the effect of assuming a time of CS relative to the US of 1.5 and 2.0 s was compared. Additionally, the Social task analysis was repeated with social exclusion defined as reinforcing, social inclusion neutral. This was to determine whether both social inclusion and exclusion TD learning signals were detectable.
RESULTS
Pavlovian Learning
Table I shows the behavioral results. Subjects correctly reported the picture‐water associations for the first and last blocks better than chance. Reported certainty of association was better for the first than last block. As expected, subjects rated the water as pleasant in their thirsty state. A typical predicted Pavlovian task TD signal is illustrated in Figure 2. The image analysis results are shown in Figure 3. Table II indicates that a TD signal was detected in the dorsal anterior cingulate (dAC), bilateral VS, amygdala, and thalamus. Figure 4 shows there was a good fit between the predicted Pavlovian task TD signal and observed blood oxygen level dependent (BOLD) fMRI signal for the VS, chosen because it exhibited one of the strongest signals.
Table I.
Measure | Result | Significance |
---|---|---|
Pavlovian task | | |
Correctly reported picture‐water associations | | |
First/last blocks as percentages | 72.2/55.5 | 0.01/0.05 |
Certainty of reported picture‐water associations | | |
First/last blocks as percentages | 80.0 (18.5)/58.0 (23.1) | <0.001/0.157 |
Water pleasantness as a percentage | | |
Unpleasant (0); pleasant (100). | 74.8 (23.0) | <0.001 |
Social task | | |
Belongingness | | |
How much do you feel you belonged to the group? Not at all (0); very much (10). | 3.16 (1.63) | <0.001 |
Self esteem | | |
Do you think other participants valued you as a person? Do not value (0); do value (10). | 5.12 (2.11) | <0.001 |
Ignored and excluded | | |
Did you feel you were ignored by the other participants? 100% ignored (0); 0% ignored (10) | 4.41 (2.11) | <0.001 |
Noticed and included | | |
Did you feel you were noticed by the other participants? 100% unnoticed (0); 0% noticed (10). | 4.71 (2.33) | <0.001 |
Percentage of total throws | | |
What percentage of total ball throws did the other players receive? | 74.8 (11.4) | <0.001 |
Percentages and linear analogue scale ratings. Standard deviations in parentheses.
Figure 2.
Typical predicted temporal difference (TD) error signals. Pavlovian task (A), Social task (B). The TD model was identical for both tasks.
Figure 3.
TD signal activations for the Pavlovian (A) and Social (B) tasks. Images are thresholded at P < 0.001 uncorrected, labelled regions are significant at P < 0.05 whole brain corrected [Genovese et al., 2002]. A similar dAC activation occurred with both tasks. However, the Pavlovian task resulted in significant bilateral ventral striatal activations, the Social task significant bilateral frontal operculum activations. Abbreviations: dAC, dorsal anterior cingulate; TH, thalamus; VS/A, ventral striatum/amygdala; Op, frontal operculum. [Color figure can be viewed in the online issue, which is available at www.interscience.wiley.com.]
Table II.
Task | Location | Coordinate | z | Significanceᵃ |
---|---|---|---|---|
P | Dorsal AC | (−4,10,46) | 4.62 | 0.009 |
P | Ventral striatum | (−22,4,−8) | 4.23 | 0.013 |
P | Ventral striatum | (32,2,−12) | 4.14 | 0.013 |
P | Amygdala | (−20,−2,−14) | 2.88 | 0.018 |
P | Amygdala | (26,−2,−14) | 3.85 | 0.018 |
P | Thalamus | (2,−14,−6) | 4.55 | 0.009 |
S | Dorsal AC | (−4,6,50) | 4.92 | 0.021 |
S | Frontal operculum | (−44,12,−4) | 3.60 | 0.005 |
S | Frontal operculum | (44,18,4) | 3.90 | 0.021 |
P, Pavlovian task; S, Social task; AC, anterior cingulate.
ᵃ FDR corrected at P < 0.05 [Genovese et al., 2002].
Figure 4.
Comparison between predicted TD signal and observed BOLD. Predicted TD signal convolved with a canonical haemodynamic response function is shown in gray, observed BOLD is shown in black with standard error across subjects. Top row (A) is the comparison for the ventral striatum in the Pavlovian task. Bottom row (B) is the comparison for the dorsal anterior cingulate in the Social task. Unexpected reinforcement (social inclusion) and unexpected absence of reinforcement (social exclusion) are shown in the first and second columns respectively. The inset boxes in each diagram show the pre‐convolved pattern of predicted TD signal.
Social Motor Response Learning
Table I shows the behavioral ratings. Subjects rated themselves as incompletely belonging to the group, and as less valued, more ignored, and less noticed than the other group members. Additionally, they correctly estimated that they received significantly fewer ball passes than the other group members. A typical predicted Social task TD signal is illustrated in Figure 2. The image analysis results are shown in Figure 3. Table II indicates that a significant TD signal was detected in the dAC and bilateral frontal operculum. Figure 4 shows there was a good fit between the predicted and observed Social task TD signal for the dAC, also chosen because it exhibited one of the strongest signals.
Comparison Between TD Signals in the Social and Pavlovian Tasks
Using the locations listed in Table II, a paired t test was done to compare the TD activations in the two tasks. Details of significant differences in activation are summarized in Table III. The Social task TD signal was significantly stronger in the dAC and frontal operculum, but significantly weaker in the VS and amygdala.
Table III.
Location | Coordinate | z | Significanceᵃ |
---|---|---|---|
Dorsal anterior cingulate | (−4,10,46) | 3.53 | 0.011 |
Frontal operculum | (−44,12,−4) | 3.60 | 0.005 |
Ventral striatum | (32,2,−12) | −2.80 | 0.012 |
Amygdala | (−20,−2,−14) | −3.15 | 0.035 |
Amygdala | (26,−2,−14) | −4.35 | 0.001 |
ᵃ FDR corrected at P < 0.05
TD Model Stability Analysis
The reported results were found to be stable over a range of plausible model values (both α and γ) and no significant differences were found. No significant effect was found when altering Social task CS and US relative time. This indicates that our choice of parameters was not critical to the results. Stability over a range of TD parameters has been reported previously [e.g., O'Doherty et al., 2003]. Altering model values between tasks could not account for the observed task differences. When the Social task image analysis was repeated with social exclusion coded as reinforcing, no significant TD signal was detected. Therefore, only a neural TD signal coding social inclusion as reinforcing was detected.
DISCUSSION
Evidence for learning in both tasks is provided by subject self‐ratings and the detected TD signal. Regarding the former, for the Pavlovian task, subjects learned which picture predicted water delivery at above‐chance levels. During the Social task, subjects learned they had been excluded at times. The neural TD signal during the Pavlovian task reflected learning about the changing likelihood of water delivery, whereas the neural TD signal during the Social task reflected learning about the changing likelihood of being socially included. Consistent with our hypotheses, TD signals were present during both the Pavlovian and Social tasks, although in different brain regions. It should be noted that the TD signal is defined as the difference between the predicted learned reinforcer and the actual delivered reinforcer, and so is referred to as a “prediction error signal.” Consequently, “prediction” based on learning was a fundamental feature of both tasks, reflected by the detected signal. Importantly, if assumptions made during modeling of the TD signal were invalid (e.g., choice of α and γ), and the precise choice crucial, a neural TD signal with the modeled characteristics would not have been detected.
Whilst both tasks were associated with a dAC activation, the Social task exhibited bilateral frontal operculum TD activations, whereas the Pavlovian task exhibited bilateral VS, amygdala and thalamic activations. Other experimental work has suggested that TD signals may be present during social interaction in humans. One study investigated the neural correlates of “reputation” and “trust” using a two‐person economic exchange [King‐Casas et al., 2005] and reported a signal which the authors argued had characteristics analogous to the DA prediction error signal for primary reinforcers. A link with TD theory was suggested [King‐Casas et al., 2005]. Another study provided more direct evidence for a link between human TD signals and DA using a social economic task and pharmacological manipulation [Pessiglione et al., 2006]. These findings support the notion that the DA system has a role in social interactions [Doya et al., 2003; Montague, 2006; O'Reilly et al., 1999]. Whilst there is compelling evidence linking the TD signal to DA activity [McClure et al., 2003a; Montague et al., 1996; O'Doherty et al., 2004; Paulus et al., 2004; Schultz, 2002; Seymour et al., 2007; Tanaka et al., 2004; Tobler et al., 2006; Waelti et al., 2001], and DA system manipulations in humans alter the TD signal [Menon et al., 2007; Pessiglione et al., 2006], the BOLD TD signal may not be a direct measure of DA. Specifically, it has been suggested that phasic DA neuronal firing leads to DA release, which facilitates some form of longer duration postsynaptic activity, such as postsynaptic potentiation [Menon et al., 2007]. Such longer duration postsynaptic DA mediated responses could be the basis of the BOLD signal that correlates with the predicted TD signal [Menon et al., 2007].
As noted above, whilst an aversive TD signal may be exhibited by non‐DA neurones such as those of the serotonergic system, there is as yet no evidence for any neurones other than DA neurones driving TD reward learning signals [Schultz and Dickinson, 2000; Seymour et al., 2007]. Therefore, there is good evidence linking the TD reward learning signal with DA. However, exploration of this link was outside the scope of the current study.
The Pavlovian and Social tasks were clearly dissimilar in many ways, yet an identical TD model predicted brain activity with both tasks. The differences between the Social and Pavlovian task results should be interpreted with caution as the task details were not finely matched: e.g., total number of trials, the timing of the stimuli within a trial, number of blocks of different associations, and of course types of visual stimuli (animated cartoon figures vs static fractal pictures). A stronger dAC signal in the Social task could have been due to a better signal/noise ratio as a consequence of more trials. However, this effect seems unlikely to account for the relative absence of a VS TD signal, which is often reported in human Pavlovian and instrumental learning tasks [e.g., McClure et al., 2003a; O'Doherty et al., 2004; Pessiglione et al., 2006], and strong TD signals centred on the bilateral frontal operculum in the Social task.
The absence of ventral striatal TD activity in the Social task does not necessarily imply a lack of “motivation.” This is because subjects correctly learned they had been socially excluded at times, and rated such exclusion as unpleasant. Furthermore, detection of a neural TD signal during the Social task provides evidence for motivation, as phasic DA signals have been linked with “incentive salience” attribution [McClure et al., 2003a, b] and TD signals with motivation [Berridge, 2007]. According to Berridge and colleagues, incentive salience attribution makes a specific predictive stimulus an object of desire, motivationally tagging a predictor for a reward that an individual wants to experience. Incentive salience attribution is believed to be a conditioned motivational response of the brain, triggered by and assigned to a reward predicting stimulus. Furthermore, incentive salience attribution has been directly linked to the TD Value estimate [McClure et al., 2003b]. According to McClure et al., the TD Value estimate corresponds to a subject's internal estimate of the information learned and used to motivate behavior. The greater the estimated Value of a state, the higher the motivation for acquiring that state. Consequently, since the computation necessary for a TD signal is believed to reflect changes in motivational Value, detection of a neural TD signal implies the presence of changing “motivation.” Nevertheless, it is possible that some overall measure of motivation for the Social task differed from the Pavlovian task, and this in turn might have affected TD signal detection. For instance, subjects were deprived of liquids but not social contacts prior to the experiment. It is also possible that belief in the cartoon figures representing the actions of other subjects might have affected TD signal detection in the Social task.
The frontal operculum is believed to be the human homologue of area F5 in the monkey, which exhibits the mirror neuron system (MNS) [Petrides and Pandya, 1994]. In both animals and humans, the region is implicated in social cognition and action understanding, being activated during observation of biological motion and of goal‐directed and expressive movements [Gallese, 2003; Iacoboni et al., 2005; Montgomery et al., 2007]. However, the MNS is activated not just by biological movements, but also by animations of rigid geometric shapes that imply intentions or goals without depicting articulated movements of body parts [Iacoboni et al., 2005; Montgomery et al., 2007]. Consequently, it has been suggested that the frontal operculum represents inferences about other people's goals or intentions at a higher level than action kinematics [Gallese, 2003; Gobbini et al., 2007; Lyons et al., 2006]. A unifying computational theory (“MOSAIC”) for social interaction and motor control has been described [Wolpert et al., 2003], and it has recently been suggested that activity in the frontal operculum is consistent with the highest level of action representation in the MOSAIC model [Gobbini et al., 2007]. The frontal operculum TD signal is therefore unlikely to have been due simply to making a motor response. Indeed, a version of MOSAIC has been described which predicts TD signals in the MNS [Doya et al., 2003]. There is additional evidence that the frontal operculum is associated with emotion detection (facial emotional recognition and emotional prosody), with lesions resulting in deficits [Adolphs et al., 2002]. Our results are consistent with this work, in that a bilateral frontal operculum TD learning signal predicting a requirement for a social motor response was found, which was significantly less present in the Pavlovian task. In addition, subjects reported emotional responses to varying social exclusion.
Eisenberger's study reported a dAC region activated with the “social pain” of exclusion [Eisenberger et al., 2003]. This region is in a similar location to those we report for both the Pavlovian task (reward delivery) and the Social task (inclusion), which lie within an area often reported as active in diverse cognitive tasks [Frith and Frith, 2003; Steele and Lawrie, 2004]. It is notable that a dAC region often reported in studies of pain is adjacent to, and may partly overlap with, the “cognitive task” region [Peyron et al., 2000]. Consequently, there is no contradiction between Eisenberger's dAC results and ours.
The results are intriguing, as the presence of TD signals in the Social task provides further evidence that neural responses involving social motor response learning are consistent with an established computational model. Given direct links between DA TD signals, motivation [Berridge, 2007; McClure et al., 2003b] and perhaps even emotion [Daw et al., 2002], this suggests a physiological mechanism by which the motivational and affective responses to social interaction could be modulated in humans. Psychiatric disorders which are characterized by incorrect social predictions and abnormal affective responses, such as paranoid schizophrenia and major depression, may be associated with abnormal TD signals [Kapur, 2003]. Further work on TD signals and social interaction is indicated.
Acknowledgements
The authors thank Peter Dayan for discussions on temporal difference theory.
REFERENCES
- Adolphs R, Damasio H, Tranel D (2002): Neural systems for recognition of emotional prosody: A 3D lesion study. Emotion 2: 23–51.
- Berridge KC (2007): The debate over dopamine's role in reward: The case for incentive salience. Psychopharmacology (Berl) 191: 391–431.
- Daw ND, Kakade S, Dayan P (2002): Opponent interactions between serotonin and dopamine. Neural Netw 15: 603–616.
- Dayan P, Abbott LF (2001): Theoretical Neuroscience: Computational and Mathematical Modelling of Neural Systems. Cambridge, MA: The MIT Press.
- Doya K, Sugimoto N, Wolpert DM, Kawato M (2003): Selecting optimal behaviours based on contexts. International Symposium on Emergent Mechanisms of Communication. Awaji. pp 19–23.
- Eisenberger NI, Lieberman MD, Williams KD (2003): Does rejection hurt? An fMRI study of social exclusion. Science 302: 290–292.
- Friston K (2004): Introduction to statistical parametric mapping. In: Frackowiak RSJ, editor. Human Brain Function. London: Academic Press.
- Frith U, Frith CD (2003): Development and neurophysiology of mentalizing. In: Frith CD, Wolpert DM, editors. The Neuroscience of Social Interaction. Oxford: Oxford University Press.
- Gallese V (2003): The roots of empathy: The shared manifold hypothesis and the neural basis of intersubjectivity. Psychopathology 36: 171–180.
- Genovese CR, Lazar NA, Nichols T (2002): Thresholding of statistical maps in functional neuroimaging using the false discovery rate. Neuroimage 15: 870–878.
- Gobbini MI, Koralek AC, Bryan RE, Montgomery KJ, Haxby JV (2007): Two takes on the social brain: A comparison of theory of mind tasks. J Cogn Neurosci 19: 1803–1814.
- Iacoboni M, Molnar‐Szakacs I, Gallese V, Buccino G, Mazziotta JC, Rizzolatti G (2005): Grasping the intentions of others with one's own mirror neuron system. PLoS Biol 3: e79.
- Kapur S (2003): Psychosis as a state of aberrant salience: A framework linking biology, phenomenology, and pharmacology in schizophrenia. Am J Psychiatry 160: 13–23.
- King‐Casas B, Tomlin D, Anen C, Camerer CF, Quartz SR, Montague PR (2005): Getting to know you: Reputation and trust in a two‐person economic exchange. Science 308: 78–83.
- Lyons DE, Santos LR, Keil FC (2006): Reflections of other minds: How primate social cognition can inform the function of mirror neurons. Curr Opin Neurobiol 16: 230–234.
- McClure SM, Berns GS, Montague PR (2003a): Temporal prediction errors in a passive learning task activate human striatum. Neuron 38: 339–346.
- McClure SM, Daw ND, Montague PR (2003b): A computational substrate for incentive salience. Trends Neurosci 26: 423–428.
- Menon M, Jensen J, Vitcu I, Graff‐Guerrero A, Crawley A, Smith MA, Kapur S (2007): Temporal difference modeling of the blood‐oxygen level dependent response during aversive conditioning in humans: Effects of dopaminergic modulation. Biol Psychiatry 62: 765–772.
- Montague PR (2006): Why Choose This Book? London: Penguin Books Ltd.
- Montague PR, Dayan P, Sejnowski TJ (1996): A framework for mesencephalic dopamine systems based on predictive Hebbian learning. J Neurosci 16: 1936–1947.
- Montgomery KJ, Isenberg N, Haxby JV (2007): Communicative hand gestures and object‐directed hand movements activate the mirror neuron system. Soc Cogn Affect Neurosci 2: 114–122.
- O'Doherty J, Dayan P, Schultz J, Deichmann R, Friston K, Dolan RJ (2004): Dissociable roles of ventral and dorsal striatum in instrumental conditioning. Science 304: 452–454.
- O'Doherty JP, Dayan P, Friston K, Critchley H, Dolan RJ (2003): Temporal difference models and reward‐related learning in the human brain. Neuron 38: 329–337.
- O'Doherty JP, Buchanan TW, Seymour B, Dolan RJ (2006): Predictive neural coding of reward preference involves dissociable responses in human ventral midbrain and ventral striatum. Neuron 49: 157–166.
- O'Reilly RC, Braver TS, Cohen JD (1999): A biologically based computational model of working memory. In: Miyake A, Shah P, editors. Models of Working Memory: Mechanisms of Active Maintenance and Executive Control. Cambridge: Cambridge University Press. pp 375–441.
- Paulus MP, Feinstein JS, Tapert SF, Liu TT (2004): Trend detection via temporal difference model predicts inferior prefrontal cortex activation during acquisition of advantageous action selection. Neuroimage 21: 733–743.
- Pessiglione M, Seymour B, Flandin G, Dolan RJ, Frith CD (2006): Dopamine‐dependent prediction errors underpin reward‐seeking behaviour in humans. Nature 442: 1042–1045.
- Petrides M, Pandya DN (1994): Comparative architectonic analysis of the human and macaque frontal cortex. In: Boller F, Grafman J, editors. Handbook of Neuropsychology. Amsterdam: Elsevier Science B.V. pp 17–58.
- Peyron R, Laurent B, Garcia‐Larrea L (2000): Functional imaging of brain responses to pain. A review and meta‐analysis (2000). Neurophysiol Clin 30: 263–288.
- Rescorla RA, Wagner AR (1972): A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In: Black AH, Prokasy WF, editors. Classical Conditioning. II. Current Research and Theory. New York: Appleton Century Crofts.
- Schultz W (2002): Getting formal with dopamine and reward. Neuron 36: 241–263.
- Schultz W, Dickinson A (2000): Neuronal coding of prediction errors. Annu Rev Neurosci 23: 473–500.
- Seymour B, O'Doherty JP, Koltzenburg M, Wiech K, Frackowiak R, Friston K, Dolan R (2005): Opponent appetitive‐aversive neural processes underlie predictive learning of pain relief. Nat Neurosci 8: 1234–1240.
- Seymour B, Singer T, Dolan R (2007): The neurobiology of punishment. Nat Rev Neurosci 8: 300–311.
- Steele JD, Lawrie SM (2004): Segregation of cognitive and emotional function in the prefrontal cortex: A stereotactic meta‐analysis. Neuroimage 21: 868–875.
- Sutton RS, Barto AG (1981): Towards a modern theory of adaptive networks: Expectation and prediction. Psychol Rev 88: 135–170.
- Sutton RS, Barto AG (1990): Time‐derivative models of Pavlovian reinforcement. In: Gabriel M, Moore J, editors. Learning and Computational Neuroscience: Foundations of Adaptive Networks. Cambridge, MA: MIT Press. pp 497–537.
- Sutton RS, Barto AG (1998): Reinforcement Learning. Cambridge, MA: MIT Press.
- Tanaka SC, Doya K, Okada G, Ueda K, Okamoto Y, Yamawaki S (2004): Prediction of immediate and future rewards differentially recruits cortico‐basal ganglia loops. Nat Neurosci 7: 887–893.
- Tesauro G (1995): Temporal difference learning and TD‐Gammon. Commun ACM 38: 58–68.
- Tobler PN, O'Doherty JP, Dolan RJ, Schultz W (2006): Human neural learning depends on reward prediction errors in the blocking paradigm. J Neurophysiol 95: 301–310.
- Waelti P, Dickinson A, Schultz W (2001): Dopamine responses comply with basic assumptions of formal learning theory. Nature 412: 43–48.
- Williams KD, Cheung CK, Choi W (2000): Cyberostracism: Effects of being ignored over the Internet. J Pers Soc Psychol 79: 748–762.
- Wolpert DM, Doya K, Kawato M (2003): A unifying computational framework for motor control and social interaction. Philos Trans R Soc Lond B Biol Sci 358: 593–602.