Abstract
Learning to predict upcoming outcomes based on environmental cues is essential for adaptive behavior. In monkeys, midbrain dopaminergic neurons code two statistical properties of reward: a prediction error at the outcome and uncertainty during the delay period between cues and outcomes. Although the hippocampus is sensitive to reward processing, and hippocampal–midbrain functional interactions are well documented, it is unknown whether it also codes the statistical properties of reward information. To address this question, we recorded local field potentials from intracranial electrodes in human hippocampus while subjects learned to associate cues of slot machines with various monetary reward probabilities (P). We found that the amplitudes of negative event-related potentials covaried with uncertainty at the outcome, being maximal for P = 0.5 and minimal for P = 0 and P = 1, regardless of winning or not. These results show that the hippocampus computes an uncertainty signal that may constitute a fundamental mechanism underlying the role of this brain region in a number of functions, including attention-based learning, associative learning, probabilistic classification, and binding of stimulus elements.
Introduction
The ability to make predictions about potentially rewarding situations has been the focus of conditioning theories explaining how animals learn the predictive relationships between conditioned stimuli (CSs) and reinforcers. Most of these theories propose that learning emerges through the computation of a prediction error between predicted and actual rewards (Rescorla, 1972). Other theories propose that learning is achieved by attention to stimuli: the association between the CS and outcome is enhanced if there is uncertainty about the prediction associated with this stimulus, whereas a stimulus loses its association with a reinforcer when its consequences are accurately predicted (Pearce and Hall, 1980; Yu and Dayan, 2003). Dopamine is closely associated with reward processing (Schultz, 2007). In monkeys, midbrain dopaminergic neurons exhibit a phasic reward prediction error signal that varies monotonically with reward probability (P) at the time of the outcome and a sustained reward uncertainty signal, appearing between the cue and the outcome and following an inverted U-shaped relationship with reward probability, being highest for maximal reward uncertainty (Fiorillo et al., 2003).
Midbrain dopamine neurons broadcast reward-related signals to the ventral striatum and the orbitofrontal cortex. Although the functions of these structures during reward processing have largely been investigated, the role of the hippocampus in this domain has received little attention. However, a growing body of experimental data supports the existence of a functional loop between the ventral tegmental area (VTA) and the hippocampus (Thierry et al., 2000; Floresco et al., 2001, 2003; Lisman and Grace, 2005). In rodents, the novelty-induced activation of the VTA depends on the activation of hippocampal neurons (Legault and Wise, 2001), possibly via the nucleus accumbens–ventral pallidum–VTA pathway (Floresco et al., 2003; Lodge and Grace, 2006), and dopamine release in the hippocampus and prefrontal cortex enhances synaptic plasticity and learning in these regions (Frey et al., 1990). Moreover, several studies provided links between the hippocampus and the VTA both in schizophrenia and in rodent models of this disease (Laruelle and Innis, 1996; Lipska et al., 2003; Harrison, 2004; Lodge and Grace, 2007). In humans, functional magnetic resonance imaging (fMRI) studies showed that midbrain and hippocampus are coactivated during reward-motivated memory formation (Adcock et al., 2006), and prefrontal–hippocampal functional coupling during memory processing is strongly modulated by catechol-O-methyltransferase Val158/Met polymorphism (Bertolino et al., 2006). Reward also modulates hippocampal activity in rodents (Hölscher et al., 2003) and monkeys (Watanabe and Niki, 1985; Rolls and Xiang, 2005).
The hippocampus may also receive reward-related information from the amygdala and orbitofrontal cortex, which project to it and to the entorhinal/perirhinal cortex (Van Hoesen et al., 1975; Amaral and Cowan, 1980; Suzuki and Amaral, 1994). Together, these data suggest that reward-related information may reach the human hippocampus via several pathways.
Yet, it is still unknown whether the hippocampus codes a prediction error and/or uncertainty during learning of probabilistic cue–reward associations. To address these questions, we recorded hippocampal activity in epileptic patients implanted with depth electrodes while they learned to associate cues, i.e., images of different slot machines with distinct probabilities of monetary rewards.
Materials and Methods
Subjects.
Three male volunteers (ages 20, 40, and 27) suffering from drug-refractory partial epilepsy performed the experiment. They were stereotaxically implanted with depth electrodes as part of a presurgical evaluation. All the subjects were fully informed of the brain recordings for the present study and gave their informed consent. The procedure did not entail any additional risk for the subjects and was thus ethically acceptable according to French regulation. The target structures implanted with depth electrodes to identify the potential epileptogenic foci before possible functional surgery were defined on the basis of noninvasive video-scalp EEG recordings, structural MRI, 18 fluorodeoxyglucose (18FDG) positron emission tomography (PET), and ictal SPECT (single photon emission computed tomography) [for a complete description of the rationale of electrode implantation, see the study by Isnard et al. (2004)]. Structural MRI and 18FDG PET scans showed no hippocampal atrophy or hypometabolism in any of the three subjects. Subject 1 suffered from right temporal lobe epilepsy, and subjects 2 and 3 suffered from left temporal lobe epilepsy. The hippocampus was included in the explored sites. Subject 1 had a unilateral implantation in the right hippocampus, subject 2 had a unilateral implantation in the left hippocampus, and subject 3 had bilateral hippocampal implantations. In subject 3, intracranial EEG recordings showed permanent paroxysmal activities in the inner part of the left temporal lobe, suggesting a focal dysplasia of the left hippocampus. Consequently, the recordings from this subject's left hippocampus were discarded from our study, and only the activity from his right hippocampus was analyzed. In all subjects, EEG recordings from the epileptic temporal lobe showed that the hippocampus participated in seizure propagation but was not part of the primary epileptogenic zone.
The epileptogenic trigger zones were located in the right superior parietal lobule in subject 1, in the external part of the left temporobasal neocortex in subject 2, and in the left amygdala in subject 3. Subject 1 is awaiting surgery. Subjects 2 and 3 underwent corticectomy sparing the hippocampus and are now seizure free.
Stereotaxic implantation and electrode location.
Recording electrodes were 0.8 mm multicontact cylinders (DIXI Medical). They were implanted into the brain perpendicular to the midsagittal plane, according to Talairach and Bancaud's stereotaxic technique (Talairach and Bancaud, 1973), as already done by our group (Krolak-Salmon et al., 2004). Contacts (5–15 per electrode) were 2 mm long and spaced every 1.5 mm. Electrode locations were measured from x-ray images obtained on a stereotaxic frame and registered on the corresponding structural magnetic resonance images using a custom-designed Matlab program (MathWorks).
Behavioral task.
The experimental paradigm was implemented with the software Presentation (version 9, Neurobehavioral Systems). Subjects were presented with eight runs of five blocks with the same elementary structure. In each block, a single slot machine was presented on a computer screen during 20 consecutive trials. Each slot machine was made visually unique by displaying a particular fractal image on top of it.
In each run, five types of slot machines were presented in random order and, unbeknownst to the subjects, attached to five reward probabilities [P = 0 (P0), P0.25, P0.5, P0.75, and P1]. A total of 8 × 5 = 40 different slot machines were presented in eight runs. Rewarded and unrewarded trials were pseudorandomized (Fig. 1).
Figure 1.
Experimental paradigm. Subjects estimated the reward probabilities of five types of slot machines that varied with respect to monetary reward probabilities (P0 to P1) and that could be discriminated by specific fractal images on top of them. Trials were self-paced and were composed of four distinct phases as follows. (1) Slot machine presentation (S1): subjects pressed one of two response keys to estimate whether the slot machine frequently delivered 20€ or not, over all the past trials. (2) Delay period (1.5 s): the subject's key press set three spinners spinning; the spinners then stopped successively at 0.5 s intervals. (3) Outcome (S2) (0.5 s): the third spinner stopped spinning, revealing the trial outcome (i.e., fully informing the subject on subsequent reward or no reward delivery). Only two configurations were possible at the time the third spinner stopped: “bar, bar, seven” (no reward) or “bar, bar, bar” (rewarded trial). (4) Reward/No reward delivery (1 s): picture of a 20€ bill or rectangle with “0€” written inside.
The subjects' task was to estimate at each trial the reward probability of each slot machine at the time of its presentation, based on all the previous outcomes of the slot machine until this trial (i.e., estimate of cumulative probability since the first trial). The task was not to predict whether the slot machine would reward or not on the current trial. To perform the task, subjects had to press one of two response buttons: one button indicating that, overall, the slot machine had a high winning probability and the other button indicating that, overall, the slot machine had a low winning probability. Subjects were told that their current estimate had no influence on subsequent reward occurrence. During the task, subjects received no feedback relative to their correct/incorrect estimation of the winning probability of the slot machine. Finally, at the end of each block of 20 successive presentations of a single type of slot machine, they were asked to classify this slot machine on a scale from 0 to 4 according to their global estimate of reward delivery.
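As an illustration of how a pseudorandomized reward schedule of this kind can be generated, the sketch below fixes the exact number of rewarded trials in each 20-trial block and shuffles their order. The paper does not specify the constraint actually used, so both this constraint and the function name are assumptions:

```python
import random

def pseudorandom_outcomes(p, n_trials=20, seed=None):
    """One plausible 'pseudorandomized' schedule: fix the number of rewarded
    trials at round(p * n_trials) so the block matches its nominal reward
    probability exactly, then shuffle the order of outcomes."""
    n_win = round(p * n_trials)
    outcomes = [True] * n_win + [False] * (n_trials - n_win)
    rng = random.Random(seed)
    rng.shuffle(outcomes)
    return outcomes

# One block per nominal probability used in the task
for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    block = pseudorandom_outcomes(p, seed=0)
    print(p, sum(block), "wins out of", len(block))
```

Under this reading, a P0.25 block always contains exactly 5 rewarded trials out of 20, whereas a fully random (Bernoulli) schedule would only match 0.25 on average.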
Recordings and signal averaging.
The experiment started at least 8 d after electrode implantation. At that time, anticonvulsive drug treatment had been drastically reduced for at least 1 week to record spontaneous epileptic seizures during continuous video-scalp EEG recordings performed in specially equipped rooms. The three subjects were under the following antiepileptic therapies: subject 1, lamotrigine (300 mg/24 h) and topiramate (100 mg/24 h); subject 2, carbamazepine (1400 mg/24 h) and clobazam (10 mg/24 h); and subject 3, oxcarbazepine (1200 mg/24 h), valproate (1000 mg/24 h), and alprazolam (0.75 mg/24 h). The experiment took place 48, 96, and 12 h after occurrence of a seizure for subjects 1, 2, and 3, respectively. Continuous-depth EEGs were recorded on a 128-channel device (Brain Quick System Plus; Micromed), amplified, filtered (0.1–200 Hz bandwidth), sampled at 512 Hz, and stored together with digital markers of specific events of the task for subsequent off-line analysis. These markers included five markers at the cue [appearance of the slot machine (S1)] to differentiate each of the five reward probabilities of the slot machines (P0, P0.25, P0.5, P0.75, and P1) and eight markers at the outcome [when the third spinner stopped spinning (S2)], fully informing the subject on subsequent reward or no reward delivery, defined according to the eight possible outcomes (three slot machines with either rewarded or unrewarded trials, one with only rewarded trials, and one with only unrewarded trials). The intrahippocampal EEG was referenced to another electrode contact located outside the brain, near the skull. In subjects 1 and 2, this reference electrode was located in the most superficial contact (outside brain tissue) of the hippocampal electrode with recording contacts, and in subject 3 it was located in another electrode in the contralateral side relative to the recording electrode. EEG was low-pass filtered (30 Hz) and visually inspected. 
Trials showing epileptic spikes and artifacts were discarded. Signals were processed with the software package for electrophysiological analyses (ELAN-Pack) developed at the Inserm U821 laboratory (Lyon, France; http://u821.lyon.inserm.fr). Averaging and analysis of the EEG were performed on epochs of 3500 ms (−1500 to +2000 ms relative to markers placed at the cue and at the outcome, respectively), with a baseline correction from −1500 ms to these markers. We chose this long baseline period because, during the delay, no activity linked to the rotation of the spinners emerged in the hippocampus; the window was therefore long enough to average out electrical noise.
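The epoching and baseline correction described above can be sketched in a few lines. This is a minimal pure-Python illustration, not the ELAN-Pack implementation; the function and parameter names are ours:

```python
def baseline_corrected_average(trials, fs=512, t0=1500):
    """Average event-locked epochs after subtracting each trial's mean
    amplitude over the pre-marker baseline (-t0 ms to the marker).

    trials: list of equal-length sample lists, each spanning the epoch
            (here -1500 to +2000 ms around the marker)
    fs:     sampling rate in Hz (512 Hz in the present recordings)
    t0:     baseline duration in ms
    """
    n_base = int(t0 * fs / 1000)          # number of samples in the baseline
    corrected = []
    for tr in trials:
        base = sum(tr[:n_base]) / n_base  # mean baseline amplitude
        corrected.append([x - base for x in tr])
    # pointwise average across trials yields the ERP
    n = len(corrected)
    return [sum(col) / n for col in zip(*corrected)]
```

For example, two trials that differ only by a constant DC offset average to a flat zero trace after baseline correction, since each trial's offset is removed before averaging.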
Behavioral data analysis.
The percentages of correct estimations of the high/low probability of winning for each slot machine were analyzed as a function of trial rank (1–20) averaged over subjects and runs. The estimations were defined as correct for the slot machines with low reward probabilities (P0 and P0.25) if subjects identified them as “low winning” and were defined as correct for the slot machines with high reward probabilities (P0.75 and P1) if subjects identified them as “high winning.” The P0.5 slot machine had neither a “low” nor a “high” winning probability; because the choice was binary, responding “high” (or, symmetrically, “low”) on 50% of trials was taken as the correct estimate of winning probability for this slot machine.
For the probabilities P0, P0.25, P0.75, and P1, the trial rank when learning occurred was defined as the trial rank with at least 70% correct responses and for which the percentage of correct estimation did not decrease below this limit for the remaining trials. For the probability P0.5, the trial rank when learning occurred was defined as the trial rank with ∼50% of the responses being either “high” or “low” winning probability, with responses then oscillating around this value for the remaining trials. Moreover, results from subjects' classifications of the slot machines at each of the 20 successive presentations of a single type of slot machine within runs were compared with their estimations made at the end of each block.
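The sustained 70% learning criterion described above can be expressed compactly. A sketch, assuming the percentage of correct estimations is available per trial rank (the function name is ours):

```python
def learning_trial(pct_correct, criterion=70.0):
    """Return the 1-based trial rank at which the percentage of correct
    estimations reaches `criterion` and never drops below it on any
    remaining trial, or None if the criterion is never sustained."""
    for i in range(len(pct_correct)):
        if all(p >= criterion for p in pct_correct[i:]):
            return i + 1
    return None

# Hypothetical learning curve (% correct per trial rank):
# 72% at trial 4 does not count because performance dips back to 68%.
curve = [40, 55, 65, 72, 68, 75, 80, 85, 90, 88]
print(learning_trial(curve))  # -> 6
```

The same scan-forward logic applies to P0.5 with the criterion replaced by responses oscillating around 50%.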
Response time (RT) (time elapsed between the machine's appearance and the subject's response) was analyzed as a function of the reward probabilities of the slot machines and the trial rank.
Electrophysiological data analysis.
Trials containing epileptic spikes or artifacts were rejected. No trials were discarded from subject 1, whereas 30% and 16% of the trials were discarded from subjects 2 and 3, respectively (the percentages of rejected trials per condition are reported in supplemental Table 1, available at www.jneurosci.org as supplemental material).
For each subject, the mean peak amplitudes of the event-related potentials (ERPs) at S1 and S2 were computed over all trials for each of the five types of slot machines for rewarded and unrewarded trials separately. First, at S1, subjects 1 and 3 showed ERPs with constant amplitudes regardless of reward probability, whereas subject 2 had no ERP in the hippocampus. Because ERPs at S1 were not reproducible and were not related to the reward probabilities of the slot machines, they were not analyzed further.
Next, we examined the statistical significance of the ERPs at S2 with respect to the baseline signal (−1500 to 0 ms), with a Wilcoxon test performed on single trials for each probability on epochs of 3500 ms (−1500 to +2000 ms from the markers) with a moving time window of 20 ms, shifted in 2 ms steps. We then investigated the relationship between ERP peak amplitudes and reward probability for each subject using a multifactorial ANOVA, with reward probability and trial outcome (rewarded/unrewarded) as independent factors. Post hoc comparisons were then made using Tukey's HSD tests to further assess the significant differences between ERP peak amplitudes as a function of probability and outcome.
Finally, since the behavioral analysis showed that the learning criterion was reached at around the ninth trial for all reward probabilities, the first 10 trials of each block were discarded to rule out a possible effect of learning on the ERP peak amplitudes, and the same analysis on the ERP peak amplitudes was then performed for only the last 10 trials.
Moreover, for each subject, we determined the mean onset latencies, peak latencies, and durations of the ERPs, time locked to the time the third spinner stopped, for the five types of slot machines for rewarded and unrewarded trials.
Results
Behavior
Estimation of reward probability
A multifactorial ANOVA performed on the percentage of correct estimates of the probability of winning (low likelihood of winning for P0 and P0.25, high likelihood of winning for P0.75 and P1, and 50% of each alternative for P0.5) showed that both reward probability (P) and trial rank (R) influenced the percentage of correct estimations (FP(4,500) = 96.48, p < 0.000001; FR(19,500) = 4.44, p < 0.000001) and that the trial rank when learning occurred depended on reward probability (FR×P(76,500) = 1.87, p < 0.00004). The reward probabilities P0 and P1 reached the learning criterion after the 2nd trial (>80% correct estimations), whereas the reward probabilities P0.25 and P0.75 reached the learning criterion between the 4th and the 12th trial for P0.25 (7th trial, 91.6% correct estimations) and between the 5th and the 16th trial for P0.75 (9th trial, 70.8% correct estimations). The reward probability P0.5 reached the learning criterion after the ninth trial (estimations oscillating around 50% as “high” or “low” probability of winning) (Fig. 2A,B).
Figure 2.
Behavioral performance. A, B, Mean learning curves averaged across subjects, expressed as the mean percentage of “high winning probability” (A) and “low winning probability” (B) estimations of the five slot machines, as a function of trial rank. C, D, Response times. C, Mean RTs averaged across subjects as a function of reward probability. D, Mean response times averaged across subjects as a function of trial rank. The effect of trial rank on RT was caused by the first trial, which was slower for all subjects and all reward probabilities (mean ± SEM = 1200 ± 13.54 ms; Tukey's HSD post hoc test, p ≤ 0.0001).
The fact that subjects learned the actual reward probability of each slot machine at asymptote was confirmed by their additional classification of the slot machines at the end of each block on a scale from 0 to 4 (96% correct estimations for P0, 100% for P1, 87% for P0.25, 83% for P0.75, and 92% for P0.5).
RTs
The mean RTs ± SEM for all the reward probabilities and trials were 809.20 ± 25 ms for subject 1, 612.90 ± 14.90 ms for subject 2, and 832.60 ± 27.27 ms for subject 3. Subject 2 had a significantly shorter RT than the other two subjects (p = 0.00002). RTs were analyzed over all subjects with two multifactorial ANOVAs.
First, an RT analysis was performed with the reward probability (P) of the slot machines and the trial rank (R) as independent factors. There was a main effect of trial rank (FR(19,2279) = 4.22, p < 0.0000001) and no main effect of probability (FP(4,2079) = 1.63, p = 0.16). Although the ANOVA did not reveal any effect of reward probability on RT, there was a trend for RT to decrease with increasing reward probabilities (Fig. 2C). The effect of trial rank on RT was caused by the first trial, which was slower for all subjects and all reward probabilities (1200 ± 13.54 ms, Tukey's HSD post hoc test, p < 0.0001).
Second, RTs were analyzed with an ANOVA, with trial outcome (reward/no reward) (O) and reward probabilities of the slot machines (P) as independent factors, followed by Tukey's HSD post hoc test. RTs did not vary with trial outcome (FO(1,2373) = 0.28, p = 0.59); values were 777.09 ± 24.76 ms for rewarded trials and 743.03 ± 23.50 ms for unrewarded trials (Fig. 2D).
Electrophysiology
Electrode location
In each subject, at least three contiguous contacts were located in the hippocampus. In subjects 1 and 3, they were located in the right hippocampus and in subject 2 in the left hippocampus. The Talairach coordinates of the hippocampal electrode contacts from the deepest to the most superficial were the following: for subject 1, x = 20–34 (five contacts), y = −22, z = −12; for subject 2, x = −25 to −34 (four contacts), y = −22, z = −10; and for subject 3, x = 25–32 (three contacts), y = −31, z = −5. These coordinates correspond to the rostral and dorsal parts of the hippocampus in subjects 1 and 2 and to the medial and dorsal parts of the hippocampus in subject 3 (Figs. 3, 4).
Figure 3.
Location of intracranial electrode contacts. Coronal (top), sagittal (middle), and horizontal (bottom) MRI slices from the three subjects showing the location of the intracranial electrode contacts in the hippocampus. The contacts in the hippocampus yielding the largest potentials are shown as yellow squares.
Figure 4.
Hippocampal activity. ERPs recorded in the hippocampus at the outcome (S2) (shaded area, 0 to +500 ms) for each of the five types of slot machines. Left, Coronal slices of the three subjects (1–3), showing the locations of the contacts in the hippocampus. Right, Mean ERPs recorded at the outcome period.
Hippocampal ERP amplitudes
Regardless of winning or not, a robust negative ERP emerged in the hippocampus of the three subjects, 256.5 ± 16.5 ms after the outcome (S2) and before the actual outcome presentation (picture of a bill or no reward) (Fig. 4). This signal was observed for three of the four hippocampal contacts in subject 1, for one of the four contacts in subject 2, and for two of the three contacts in subject 3. Here we report results from the contact yielding the largest potential in each subject. Contacts adjacent to the one yielding the largest signal yielded a smaller amplitude signal, no signal, or a polarity inversion, suggesting that the origin of the observed ERP was close to this contact (supplemental Fig. 1, available at www.jneurosci.org as supplemental material).
For each subject and for each type of slot machine (i.e., reward probability), this emerging signal was significantly different from baseline during a time window varying from 56 to 431 ms around the maximal amplitude (Wilcoxon tests, p values varying from <0.0001 to <0.048).
Importantly, for each subject, the mean peak amplitude of these ERPs (−28 to −112 μV) followed an inverted U-shaped relationship with reward probability, being maximal when reward uncertainty was highest (P0.5) and minimal when reward uncertainty was lowest (P0 and P1), both for rewarded and for unrewarded trials. No difference in the peak amplitudes was observed for rewarded versus unrewarded trials (ANOVA with probability and outcome as independent factors). For subject 1, FP(3,800) = 6.44, p < 0.0003, and FO(1,800) = 0.027, p = 0.87, no interaction, FP×O(3,800) = 0.75, p = 0.52; for subject 2, FP(3,486) = 4.71, p < 0.003, and FO(1,486) = 0.09, p = 0.76, no interaction, FP×O(3,486) = 0.12, p = 0.95; for subject 3, FP(3,632) = 7.70, p < 0.00005, and FO(1,632) = 7.70, p < 0.00005, no interaction, FP×O(3,632) = 0.29, p = 0.83 (Fig. 5). We therefore performed the same multifactorial ANOVA at the group level, with subject (S), probability, and type of outcome (reward or no reward) as independent factors. The factor subject had no effect: FP(3,1918) = 17.55, p < 0.000001; FO(1,1918) = 0.089, p = 0.76; FS(2,3630) = 0.12, p = 0.88, no interaction, FP×O(3,1918) = 0.06, p = 0.98, FP×O×S(6,7195) = 0.46, p = 0.83 (Fig. 6A).
Figure 5.
Influence of cue–outcome uncertainty on hippocampal ERP amplitude for each subject. Mean peak amplitudes of ERPs (± SEM) at the outcome, as a function of reward probability, varied as an inverted U-shaped curve, both for rewarded (Rew) and for unrewarded (Unrew) trials (Tukey's HSD test; **p < 0.05, ***p < 0.001).
Figure 6.
Mean peak amplitudes of ERPs (± SEM) as a function of reward probability, averaged across subjects at the outcome for all trials (A) and for the last 10 trials (when learning criterion for all the reward probabilities of the slot machines was reached) (B) (Tukey's HSD test; **p < 0.005, ***p < 0.0001).
Finally, to rule out the possible influence of early-stage learning of the reward probability on the amplitude of these ERPs, we also performed an additional analysis on the ERPs for the last 10 trials of each run. A similar inverted U-shaped relationship was observed between reward probability and the amplitudes of hippocampal ERPs (ANOVA with probability and outcome as independent factors: FP(3,750) = 10.71, p < 0.000001; FO(1,750) = 0.5, p = 0.47; no interaction, FP×O(3,750) = 0.25, p = 0.85) (Fig. 6B).
Hippocampal ERP latencies and durations
Multifactorial ANOVA on the mean onset latencies, peak latencies, and durations of ERPs time locked to S2, with reward probability, outcome (rewarded/unrewarded), and subject as independent factors, showed no significant effect of reward probability (P) or outcome (O) on onset latencies (FP(4,17) = 0.79, p = 0.51; FO(1,17) = 0.0005, p = 0.98), peak latencies (FP(3,17) = 0.55, p = 0.65; FO(1,17) = 0.018, p = 0.89), or durations (FP(3,17) = 2.06, p = 0.14; FO(1,17) = 1.11, p = 0.30). A significant effect of subject was observed on ERP onset latencies, peak latencies, and durations. Subject 1 had significantly longer ERP onset latencies (301.77 ± 10.47 ms) than subjects 2 and 3 [225.36 ± 24.91 ms, p = 0.04, and 242.24 ± 11.74 ms, p < 0.02, respectively; Fisher's least significant difference (LSD) test], whereas subject 2 had significantly longer peak latencies for unrewarded trials (475.10 ± 54.67 ms versus 407.23 ± 11.28 ms for rewarded trials, p < 0.005; Fisher's LSD test) and significantly longer ERP durations, regardless of whether the trial was rewarded (526.57 ± 37.10 ms), than subjects 1 and 3 (300.65 ± 23.11 ms and 316.38 ± 34.51 ms, respectively, p < 0.0001; Fisher's LSD test) (supplemental Table 2, available at www.jneurosci.org as supplemental material). These slight individual differences in ERP latencies and durations do not affect the ERP amplitude effects analyzed here.
Discussion
This study provides the first direct evidence that the anterior hippocampus codes uncertainty of cue–outcome associations in humans. It shows that when subjects learned to associate cues of slot machines with various monetary reward probabilities (P), the amplitude of negative ERPs recorded in the anterior hippocampus followed an inverted U-shaped relationship with the outcome probability, regardless of winning or not.
This inverted U-shaped relationship is incompatible with prediction error, novelty, or surprise coding, which would have predicted a negative monotonic correlation between ERP amplitudes and increasing reward probability (Fiorillo et al., 2003; Dreher et al., 2006).
Also, the signal we observed at the outcome cannot reflect a negative error feedback (such as an error-related negativity), because no feedback was delivered on the current trial regarding the subject's estimation and because the task was not to predict the outcome of the current trial (but to estimate the cumulative reward probability since the first trial).
Moreover, despite the well established role of the hippocampus in learning, we believe that the signal we observed codes uncertainty and cannot be interpreted as a learning signal alone, because it also occurred when restricting our analysis to the last 10 trials of our experiment, when all subjects had learned the winning probability of each slot machine.
In a previous fMRI study using a similar paradigm (Dreher et al., 2006), no hippocampal activation linked to reward uncertainty was seen at the outcome. This study used a much longer delay period (14 s) than our current experiment (delay = 2 s, equal to the one used in the monkey electrophysiology experiment), which may explain why the short-lasting hippocampal uncertainty signal currently observed at the outcome (∼300 ms) may have been missed in the fMRI study.
Our current results extend to the domain of associative learning results obtained in human neuroimaging studies showing that the BOLD (blood oxygen level dependent) response in the anterior hippocampus increases with uncertainty of probabilistic sequential events (Strange et al., 2005), although other studies reported opposite results (Harrison et al., 2006).
Two important characteristics distinguish uncertainty coding in the hippocampus from the uncertainty signal recorded in monkeys' dopaminergic neurons (Fiorillo et al., 2003). First, the signal recorded in the hippocampus is transient. Second, it occurs at the outcome and not during the delay between the cue and the outcome and therefore is not linked to reward expectation. These two modes of uncertainty coding may play different functions during associative learning: the sustained mode of midbrain activity may be related to a sustained form of attention to reinforcers, motivation, or exploratory behavior (Fiorillo et al., 2003; Dreher et al., 2006), whereas the transient mode of hippocampal activity may code a posteriori the degree of uncertainty of cue–outcome associations and signal selective attention to the informative outcome (Pearce and Hall, 1980). Providing information about trial outcome may be a fundamental computational operation achieved by the hippocampus, because this has been shown to occur in other domains (Watanabe and Niki, 1985; Wittmann et al., 2007).
Both forms of uncertainty coding are compatible with the concept of Shannon's entropy from information theory (Shannon, 1948), which measures an ensemble's average information content or its uncertainty and which is maximal for outcomes with a 50% chance of occurrence. Thus, we believe that the hippocampal signal recorded at the outcome may help to adjust attention to the level of outcome uncertainty regardless of reward. In summary, these findings extend early views in the probabilistic domain that the hippocampus is involved in decreasing attention to unimportant events (Douglas, 1967) and further support the idea that it can produce increases in attention to relevant stimuli (Pearce and Hall, 1980). This general computation of cue–outcome uncertainty may represent the underlying mechanism responsible for the involvement of the hippocampus in associative learning, probabilistic classification (Squire and Zola, 1996), binding of stimulus elements (Gluck and Granger, 1993), and transitive inference (Dusek and Eichenbaum, 1997; Frank et al., 2003). Indeed, in all these hippocampus-dependent functions, the encoding of item relationships is based on the strength of their associations, which can be efficiently computed by their degree of uncertainty. This a posteriori uncertainty encoding of item associations by the hippocampus may participate in a feedback process to update these relationships, enabling dynamic adaptation to the current event.
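Concretely, the binary entropy of each reward probability peaks at P = 0.5 and vanishes at P = 0 and P = 1, mirroring the inverted U-shaped ERP amplitudes. A minimal illustration:

```python
import math

def binary_entropy(p):
    """Shannon entropy (in bits) of a Bernoulli outcome with win probability p.
    By convention 0 * log2(0) = 0, so fully certain outcomes carry zero entropy."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Entropy for the five slot-machine reward probabilities used in the task:
# H = 0, 0.811, 1.0, 0.811, 0 bits -- an inverted U peaking at P = 0.5.
for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"P = {p:.2f}  H = {binary_entropy(p):.3f} bits")
```

Note the symmetry H(P) = H(1 − P): entropy depends only on outcome uncertainty, not on whether the likely outcome is a reward, consistent with the ERP amplitudes being identical for rewarded and unrewarded trials.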
This hippocampal uncertainty signal might either be computed by the hippocampus itself, independently of dopaminergic neurons firing, or result from hippocampal–midbrain reciprocal connections. Indeed, the integration by the hippocampus of the tonic dopaminergic signal during the delay between the cue and the outcome might result in a phasic signal at the time of the outcome. Regardless of the precise contribution of dopaminergic neurons in the present findings, different representations of uncertainty arising from the hippocampus and VTA may be conveyed to postsynaptic dopaminergic projection sites, such as the orbitofrontal cortex and the striatum, allowing further computations required for decision making under uncertainty (Hsu et al., 2005). It is clear from previous findings that a ubiquitous coding of uncertainty exists in the human brain, particularly in the ventral striatum, insula, anterior cingulate cortex, and orbitofrontal cortex (Hsu et al., 2005; Dreher et al., 2006; Preuschoff et al., 2006, 2008; Tobler et al., 2007), and the present study reveals that the hippocampus also participates in uncertainty processing. Future studies are needed to pinpoint the specific roles of each structure in computing uncertainty in different contexts.
Together, our findings have important implications for understanding the basic neural mechanisms by which the brain extracts structural relationships from the environment when learning cue–outcome associations. They also bear on the impairment of these mechanisms in neuropsychiatric disorders involving dysfunction of the dopaminergic–hippocampal loop (e.g., schizophrenia).
Footnotes
We thank Dr. M. Guénot for surgical implantation of epileptic patients, Dr. A. Cheylus for help with programming the experimental paradigm, and Drs. E. Procyk, S. Wirth, and K. Reilly for helpful comments on an early version of this manuscript.
References
- Adcock RA, Thangavel A, Whitfield-Gabrieli S, Knutson B, Gabrieli JD. Reward-motivated learning: mesolimbic activation precedes memory formation. Neuron. 2006;50:507–517. doi: 10.1016/j.neuron.2006.03.036.
- Amaral DG, Cowan WM. Subcortical afferents to the hippocampal formation in the monkey. J Comp Neurol. 1980;189:573–591. doi: 10.1002/cne.901890402.
- Bertolino A, Rubino V, Sambataro F, Blasi G, Latorre V, Fazio L, Caforio G, Petruzzella V, Kolachana B, Hariri A, Meyer-Lindenberg A, Nardini M, Weinberger DR, Scarabino T. Prefrontal-hippocampal coupling during memory processing is modulated by COMT val158met genotype. Biol Psychiatry. 2006;60:1250–1258. doi: 10.1016/j.biopsych.2006.03.078.
- Douglas RJ. The hippocampus and behavior. Psychol Bull. 1967;67:416–422. doi: 10.1037/h0024599.
- Dreher JC, Kohn P, Berman KF. Neural coding of distinct statistical properties of reward information in humans. Cereb Cortex. 2006;16:561–573. doi: 10.1093/cercor/bhj004.
- Dusek JA, Eichenbaum H. The hippocampus and memory for orderly stimulus relations. Proc Natl Acad Sci U S A. 1997;94:7109–7114. doi: 10.1073/pnas.94.13.7109.
- Fiorillo CD, Tobler PN, Schultz W. Discrete coding of reward probability and uncertainty by dopamine neurons. Science. 2003;299:1898–1902. doi: 10.1126/science.1077349.
- Floresco SB, Todd CL, Grace AA. Glutamatergic afferents from the hippocampus to the nucleus accumbens regulate activity of ventral tegmental area dopamine neurons. J Neurosci. 2001;21:4915–4922. doi: 10.1523/JNEUROSCI.21-13-04915.2001.
- Floresco SB, West AR, Ash B, Moore H, Grace AA. Afferent modulation of dopamine neuron firing differentially regulates tonic and phasic dopamine transmission. Nat Neurosci. 2003;6:968–973. doi: 10.1038/nn1103.
- Frank MJ, Rudy JW, O'Reilly RC. Transitivity, flexibility, conjunctive representations, and the hippocampus. II. A computational analysis. Hippocampus. 2003;13:341–354. doi: 10.1002/hipo.10084.
- Frey U, Schroeder H, Matthies H. Dopaminergic antagonists prevent long-term maintenance of posttetanic LTP in the CA1 region of rat hippocampal slices. Brain Res. 1990;522:69–75. doi: 10.1016/0006-8993(90)91578-5.
- Gluck MA, Granger R. Computational models of the neural bases of learning and memory. Annu Rev Neurosci. 1993;16:667–706. doi: 10.1146/annurev.ne.16.030193.003315.
- Harrison LM, Duggins A, Friston KJ. Encoding uncertainty in the hippocampus. Neural Netw. 2006;19:535–546. doi: 10.1016/j.neunet.2005.11.002.
- Harrison PJ. The hippocampus in schizophrenia: a review of the neuropathological evidence and its pathophysiological implications. Psychopharmacology (Berl). 2004;174:151–162. doi: 10.1007/s00213-003-1761-y.
- Hölscher C, Jacob W, Mallot HA. Reward modulates neuronal activity in the hippocampus of the rat. Behav Brain Res. 2003;142:181–191. doi: 10.1016/s0166-4328(02)00422-9.
- Hsu M, Bhatt M, Adolphs R, Tranel D, Camerer CF. Neural systems responding to degrees of uncertainty in human decision-making. Science. 2005;310:1680–1683. doi: 10.1126/science.1115327.
- Isnard J, Guénot M, Sindou M, Mauguière F. Clinical manifestations of insular lobe seizures: a stereo-electroencephalographic study. Epilepsia. 2004;45:1079–1090. doi: 10.1111/j.0013-9580.2004.68903.x.
- Krolak-Salmon P, Hénaff MA, Vighetto A, Bertrand O, Mauguière F. Early amygdala reaction to fear spreading in occipital, temporal, and frontal cortex: a depth electrode ERP study in human. Neuron. 2004;42:665–676. doi: 10.1016/s0896-6273(04)00264-8.
- Laruelle M, Innis RB. Images in neuroscience. SPECT imaging of synaptic dopamine. Am J Psychiatry. 1996;153:1249. doi: 10.1176/ajp.153.10.1249.
- Legault M, Wise RA. Novelty-evoked elevations of nucleus accumbens dopamine: dependence on impulse flow from the ventral subiculum and glutamatergic neurotransmission in the ventral tegmental area. Eur J Neurosci. 2001;13:819–828. doi: 10.1046/j.0953-816x.2000.01448.x.
- Lipska BK, Lerman DN, Khaing ZZ, Weickert CS, Weinberger DR. Gene expression in dopamine and GABA systems in an animal model of schizophrenia: effects of antipsychotic drugs. Eur J Neurosci. 2003;18:391–402. doi: 10.1046/j.1460-9568.2003.02738.x.
- Lisman JE, Grace AA. The hippocampal-VTA loop: controlling the entry of information into long-term memory. Neuron. 2005;46:703–713. doi: 10.1016/j.neuron.2005.05.002.
- Lodge DJ, Grace AA. The hippocampus modulates dopamine neuron responsivity by regulating the intensity of phasic neuron activation. Neuropsychopharmacology. 2006;31:1356–1361. doi: 10.1038/sj.npp.1300963.
- Lodge DJ, Grace AA. Aberrant hippocampal activity underlies the dopamine dysregulation in an animal model of schizophrenia. J Neurosci. 2007;27:11424–11430. doi: 10.1523/JNEUROSCI.2847-07.2007.
- Pearce JM, Hall G. A model for Pavlovian learning: variations in the effectiveness of conditioned but not of unconditioned stimuli. Psychol Rev. 1980;87:532–552.
- Preuschoff K, Bossaerts P, Quartz SR. Neural differentiation of expected reward and risk in human subcortical structures. Neuron. 2006;51:381–390. doi: 10.1016/j.neuron.2006.06.024.
- Preuschoff K, Quartz SR, Bossaerts P. Human insula activation reflects risk prediction errors as well as risk. J Neurosci. 2008;28:2745–2752. doi: 10.1523/JNEUROSCI.4286-07.2008.
- Rescorla RA. "Configural" conditioning in discrete-trial bar pressing. J Comp Physiol Psychol. 1972;79:307–317. doi: 10.1037/h0032553.
- Rolls ET, Xiang JZ. Reward-spatial view representations and learning in the primate hippocampus. J Neurosci. 2005;25:6167–6174. doi: 10.1523/JNEUROSCI.1481-05.2005.
- Schultz W. Behavioral dopamine signals. Trends Neurosci. 2007;30:203–210. doi: 10.1016/j.tins.2007.03.007.
- Shannon C. A mathematical theory of communication. Bell System Technical Journal. 1948;27:379–423.
- Squire LR, Zola SM. Structure and function of declarative and nondeclarative memory systems. Proc Natl Acad Sci U S A. 1996;93:13515–13522. doi: 10.1073/pnas.93.24.13515.
- Strange BA, Duggins A, Penny W, Dolan RJ, Friston KJ. Information theory, novelty and hippocampal responses: unpredicted or unpredictable? Neural Netw. 2005;18:225–230. doi: 10.1016/j.neunet.2004.12.004.
- Suzuki WA, Amaral DG. Perirhinal and parahippocampal cortices of the macaque monkey: cortical afferents. J Comp Neurol. 1994;350:497–533. doi: 10.1002/cne.903500402.
- Talairach J, Bancaud J. Stereotaxic approach to epilepsy: methodology of anatomo-functional stereotaxic investigations. Prog Neurol Surg. 1973;5:297–354.
- Thierry AM, Gioanni Y, Dégénétais E, Glowinski J. Hippocampo-prefrontal cortex pathway: anatomical and electrophysiological characteristics. Hippocampus. 2000;10:411–419. doi: 10.1002/1098-1063(2000)10:4<411::AID-HIPO7>3.0.CO;2-A.
- Tobler PN, O'Doherty JP, Dolan RJ, Schultz W. Reward value coding distinct from risk attitude-related uncertainty coding in human reward systems. J Neurophysiol. 2007;97:1621–1632. doi: 10.1152/jn.00745.2006.
- Van Hoesen G, Pandya DN, Butters N. Some connections of the entorhinal (area 28) and perirhinal (area 35) cortices of the rhesus monkey. II. Frontal lobe afferents. Brain Res. 1975;95:25–38. doi: 10.1016/0006-8993(75)90205-x.
- Watanabe T, Niki H. Hippocampal unit activity and delayed response in the monkey. Brain Res. 1985;325:241–254. doi: 10.1016/0006-8993(85)90320-8.
- Wittmann BC, Bunzeck N, Dolan RJ, Düzel E. Anticipation of novelty recruits reward system and hippocampus while promoting recollection. Neuroimage. 2007;38:194–202. doi: 10.1016/j.neuroimage.2007.06.038.
- Yu A, Dayan P. Expected and unexpected uncertainty: Ach and NE in the neocortex. Adv Neural Inf Process Syst. 2003;15:157–164.






