Abstract
The P300 component of the human event-related brain potential has often been linked to the processing of rare, surprising events. However, the formal computational processes underlying the generation of the P300 are not well known. Here, we formulate a simple model of trial-by-trial learning of stimulus probabilities based on Information Theory. Specifically, we modeled the surprise associated with the occurrence of a visual stimulus to provide a formal quantification of the “subjective probability” associated with an event. Subjects performed a choice reaction time task, while we recorded their brain responses using electroencephalography (EEG). In each of 12 blocks, the probabilities of stimulus occurrence were changed, thereby creating sequences of trials with low, medium, and high predictability. Trial-by-trial variations in the P300 component were best explained by a model of stimulus-bound surprise. This model accounted for the data better than a categorical model that parametrically encoded the stimulus identity, or an alternative model of surprise based on the Kullback–Leibler divergence. The present data demonstrate that trial-by-trial changes in P300 can be explained by predictions made by an ideal observer keeping track of the probabilities of possible events. This provides evidence for theories proposing a direct link between the P300 component and the processing of surprising events. Furthermore, this study demonstrates how model-based analyses can be used to explain significant proportions of the trial-by-trial changes in human event-related EEG responses.
Keywords: P300, single-trial EEG, information theory, surprise, attention, independent component analysis
Introduction
Late positive components of the human event-related brain potential (ERP), in particular the P300, have traditionally been associated with the processing of unexpected events (Sutton et al., 1965) (for review, see Nieuwenhuis et al., 2005). The amplitude of the P300 appears to be determined at least partly by the probability and relevance of an event (Duncan-Johnson and Donchin, 1977). Functionally, the P300 has commonly been linked to the revision of a participant's expectation about the current task context (Donchin, 1981; Donchin and Coles, 1988; Barcelo et al., 2006), as well as the updating of task-relevant information in anticipation of subsequent events (Barcelo et al., 2008). The P300 has widely been suggested to be modulated at least in part by the surprise of a stimulus (Donchin, 1981) and some authors have used a terminology related to information theory to describe processes underlying generation of the P300 (Ruchkin and Sutton, 1978; Johnson, 1986; Barcelo et al., 2008).
However, we are not aware of any study that has quantified fluctuations in surprise on a trial-by-trial basis to study its impact on the P300. A number of recent computational models have been proposed that formally quantify the surprise conveyed by sensory stimuli. In these models, the surprise associated with an event relates to its improbability, given a prediction of the occurrence of all possible events (Strange et al., 2005). Computationally, it might be an efficient strategy to focus processing resources on such surprising events, because these provide the most information to an observer (Baldi, 2005). One apparent advantage of using a model-based approach to quantify the intuitive notion of surprising events is that competing models about the cognitive processes underlying observed neural data can be formally tested (Corrado and Doya, 2007). Using this approach, recent neuroimaging studies in humans have shown that activity in a widespread parietal-premotor network is associated with the surprise conveyed by the presentation of a visual stimulus (Strange et al., 2005).
Here, we asked whether trial-by-trial variations in the P300 can be explained by such a formal model of surprise and whether this provides a more parsimonious description of the data than alternative models. Healthy participants performed a choice reaction time (RT) task while their brain activity was measured using electroencephalography (EEG). We then quantified the surprise associated with the unique stimulus sequence given to every participant and investigated whether these quantifications could explain variations in P300 on a single-trial basis. Our findings show that trial-by-trial variabilities in the P300 component are not random noise. A substantial proportion of this variability can be explained by formal quantifications of surprise, providing a direct confirmation of previous heuristics about the computations underlying the P300 component.
Materials and Methods
Participants, experimental design, and data acquisition.
Twelve healthy participants (eight women, age range 18–29 years), all with normal or corrected-to-normal visual acuity, participated in the experiment. All were recruited via the participants' database of the Department of Psychology of University College London. Experimental procedures were approved by the local ethics committee and were in accordance with the Declaration of Helsinki. Participants received £15 compensation for their time and travel.
Before the experiment, participants learned by trial-and-error the associations between four arbitrary visual stimuli (equated for surface area and brightness) and four button responses (using the index and middle fingers of both hands) for 60 trials. During this training, all stimuli were presented an equal number of times in random order. If participants did not perform the task without errors on the last 15 trials of the training block, it was repeated. During the main experiment participants performed 12 blocks of 60 trials of a choice reaction time task without feedback (see Fig. 1a). Visual stimuli were presented for 200 ms each, with a stimulus onset asynchrony of 2 s. Participants were required to respond to each stimulus with the previously associated button as quickly as possible, but not at the expense of accuracy. The probability of the occurrence of each event was manipulated between blocks such that the relative probabilities of events were either 0.25 for each event (low predictability), [0.4, 0.4, 0.1, 0.1] (medium predictability), or [0.7, 0.1, 0.1, 0.1] (high predictability). Participants were not informed about these probabilities. They were simply instructed to respond as quickly as possible to each presented stimulus and that the four different stimuli were randomly distributed across blocks. All stimuli occurred equally often over the course of the experiment and all stimuli had an equal behavioral relevance. Participants were given a break between blocks; they were free to initiate the subsequent block at their own pace.
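The block structure described above can be illustrated with a minimal Python sketch. It assumes that events are drawn independently on each trial from the block's probabilities; the names `BLOCK_PROBABILITIES` and `sample_block` are hypothetical, and the sketch does not reproduce the counterbalancing that made all stimuli occur equally often over the whole experiment.

```python
import random

# Event probabilities for the three block types, as given in the text.
BLOCK_PROBABILITIES = {
    "low": [0.25, 0.25, 0.25, 0.25],
    "medium": [0.4, 0.4, 0.1, 0.1],
    "high": [0.7, 0.1, 0.1, 0.1],
}
TRIALS_PER_BLOCK = 60


def sample_block(predictability, rng, n_trials=TRIALS_PER_BLOCK):
    """Draw one block of stimulus indices (0-3), independently per trial."""
    probs = BLOCK_PROBABILITIES[predictability]
    return rng.choices(range(len(probs)), weights=probs, k=n_trials)


rng = random.Random(0)  # fixed seed, used only to make the sketch repeatable
block = sample_block("high", rng)
print(len(block), sorted(set(block)))
```

Because sampling is independent across trials, the empirical frequencies within a 60-trial block will only approximate the nominal probabilities.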
The experiment was realized using the Cogent 2000 toolbox (University College London, http://www.vislab.ucl.ac.uk/Cogent2000/index.html) for Matlab (The Mathworks). EEG was recorded (bandpass filter: 0.05–100 Hz, 500 Hz sampling rate) using a Synamps2 amplifier (Neuroscan) from the following electrode positions, using Ag/AgCl electrodes mounted in an elastic electrode cap: AF3, AF4, F7, F3, Fz, F4, F8, FC5, FC1, FC2, FC6, T7, C3, Cz, C4, T8, CP5, CP1, CP2, CP6, P7, P3, Pz, P4, P8, PO3, PO4, Oz, and left and right mastoids. Horizontal and vertical eye movements were recorded using electrodes placed lateral to both eyes and above and below the left eye. Electrode AFz served as reference during recording and the electrode common was placed on the participants' chin. Electrode impedances were kept at <10 kΩ.
Electrophysiological analyses.
EEG data were analyzed using EEGLAB (Delorme and Makeig, 2004), implemented in Matlab 7.1. Each participant's EEG data were bandpass filtered (0.3–30 Hz), down-sampled to 250 Hz, and re-referenced to average reference. Subsequently, epochs of −600 to 1400 ms around the presentation of the visual stimuli were extracted from each trial and linearly detrended. During the first step of artifact rejection, epochs containing unique, nonstereotyped artifacts (swallowing, head movements, etc.) were rejected. In a second step, repeatedly occurring, stereotyped artifacts were removed using independent component analysis (ICA) (Jung et al., 2000a), which has been used in a number of recent studies on P300 (Debener et al., 2005a; Eichele et al., 2005; Jongsma et al., 2006). This method assumes that the EEG data recorded at the electrode level are a linear mixture of underlying brain signals and artifactual signals such as eye blinks, muscle activity, cardiac signals, and line noise. The ICA algorithm (extended infomax ICA) (Makeig et al., 1996) finds an “unmixing” square matrix of the size of the number of channels, which is then matrix-multiplied with the raw data to reveal maximally temporally independent components. Each independent component can then be characterized by a time course and a scalp topography. All individual independent components whose signal and scalp topography resembled known artifacts were removed from the dataset (Jung et al., 2000a,b). The remaining components were back-projected to the scalp to reveal EEG data without the contributions of the artifacts. Epochs were baseline corrected using the interval from −400 to 0 ms before stimulus presentation as the baseline.
From these data, single-trial P300s were estimated at electrode Pz, where this ERP component is traditionally reported to be maximal (Duncan-Johnson and Donchin, 1977; Debener et al., 2005a; Jongsma et al., 2006). ERPs were created as trial averages for each participant and for each a priori stimulus category. To estimate single-trial amplitudes, for each participant, the time point at which the averaged P300s were modulated maximally by relative stimulus frequency was determined. Single-trial P300 estimates were then extracted over a window of ±60 ms around this time point of maximal modulation (cf. Jongsma et al., 2006; Barcelo et al., 2008). This method was chosen over simple peak detection (Bénar et al., 2007) to capture the condition effects and improve the reliability of single-trial amplitude measures, similar to previous studies (Debener et al., 2005b).
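As an illustration of the window-based estimate, the following Python sketch extracts one value per epoch around a given latency. It assumes that the estimate is the mean amplitude within the ±60 ms window (the text does not state which summary statistic was used) and uses the epoch limits and 250 Hz sampling rate given above; the function name `window_mean` and the synthetic epoch are hypothetical.

```python
def window_mean(epoch, center_ms, half_width_ms=60.0,
                epoch_start_ms=-600.0, sampling_rate_hz=250.0):
    """Mean amplitude of one epoch within center_ms +/- half_width_ms.

    epoch is a sequence of voltages at one electrode whose first sample
    is taken at epoch_start_ms relative to stimulus onset.
    """
    ms_per_sample = 1000.0 / sampling_rate_hz
    first = round((center_ms - half_width_ms - epoch_start_ms) / ms_per_sample)
    last = round((center_ms + half_width_ms - epoch_start_ms) / ms_per_sample)
    first = max(first, 0)                 # clip the window to the epoch
    last = min(last, len(epoch) - 1)
    samples = epoch[first:last + 1]
    return sum(samples) / len(samples)


# Synthetic epoch (-600 to 1400 ms at 250 Hz = 501 samples) with a 5 uV
# plateau between 400 and 660 ms (sample indices 250 to 315).
epoch = [5.0 if 250 <= i <= 315 else 0.0 for i in range(501)]
print(window_mean(epoch, center_ms=531.0))
```

In practice `center_ms` would be the participant-specific time point of maximal modulation described above.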
Ideal observers.
We modeled participants' learning of the task by assuming they acted as ideal observers who learn the probability of selecting each of the four responses after presentation of the stimuli. Following previous studies (Strange et al., 2005; Harrison et al., 2006; Bestmann et al., 2008), we assume that participants start each block assuming that all events are equally likely and update their estimate of the probability of each event type on each trial, based on the events they previously observed. The same procedure was repeated for each block, i.e., the maximum number of observations was the number of trials in a block. This amounts to assuming that each participant starts each block “anew,” without memory of the previous blocks. Although future work may focus more directly on modeling different types of information transfers between blocks, previous work has shown the suitability of this assumption (Strange et al., 2005; Harrison et al., 2006; Bestmann et al., 2008).
Formally, we can consider a discrete variable, x, that can take values from 1 to K, where in our case K = 4, i.e., each trial contained one of four possible events, corresponding to the four visual stimuli and their respective responses. This distribution is parameterized by the random vector P(x) = [p1, …, pK] (which we abbreviate using P(x) = p), whose elements sum to one, and we denote the probability of the kth event as P(x = k) = pk. This is a multinomial distribution, where pk is the probability of the kth trial type occurring. We will refer to this as the generative distribution, as it was from this that a sequence of events was sampled. A simple example is a coin toss where K = 2. The probabilities of “heads” and “tails” are then given by P(x = heads) = p1 and P(x = tails) = p2, respectively, which sum to one.
The aim of the observer, i.e., the participant, is to estimate the above distribution of event probabilities, using the information conveyed by the encountered train of stimuli. In other words, the observer tries to estimate parameters, i.e., probabilities, contained in the vector p. Given a sample of j events, denoted by Xj = {x1, …, xj}, there are a number of ways to estimate these. An issue with using the maximum likelihood estimate is that the observer's estimate of pk will be zero if event k has not been observed. For example, if only three tosses of a coin are sampled with the outcome of three heads, then the estimate of the probability of heads is equal to one. A prediction based on this small sample would be that a tail could never occur, which is contrary to intuition. This can be resolved by giving the observer prior knowledge, as done in the Bayesian paradigm. We can ensure that the observer has a greater than zero expectancy of all stimuli occurring by giving it a uniform prior, i.e., by having it assume initially that all stimuli are equally likely to occur. For the current setting, a prior distribution indicating the belief in all parameters before any observations is given by a prior Dirichlet distribution. A uniform Dirichlet prior over p is parameterized by a vector α = [α1, …, αK] and written as P(p|α) = Dir(p; α). Choosing all elements of α equal to one represents the prior belief that the multinomial parameters are uniform. In the present case, this results in a belief that all four stimuli are likely to occur 25% of the time.
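The coin example can be made concrete with a short Python sketch. It contrasts the maximum likelihood estimate with the mean of the Dirichlet posterior under a uniform prior, using the standard result that this mean is proportional to the counts plus the prior parameters. The function names are hypothetical.

```python
def maximum_likelihood(counts):
    """Relative frequencies; zero for any event that was never observed."""
    total = sum(counts)
    return [n / total for n in counts]


def dirichlet_posterior_mean(counts, alpha):
    """Mean of the Dirichlet posterior given prior parameters and counts."""
    total = sum(counts) + sum(alpha)
    return [(n + a) / total for n, a in zip(counts, alpha)]


counts = [3, 0]          # three heads, no tails
uniform_prior = [1, 1]   # all elements of alpha equal to one

print(maximum_likelihood(counts))                       # [1.0, 0.0]
print(dirichlet_posterior_mean(counts, uniform_prior))  # [0.8, 0.2]
```

With the uniform prior, the observer still assigns a nonzero probability to tails after three heads.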
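The degree of belief in the estimated probabilities p will change when an event is observed. The posterior distribution representing the belief after j trials, Xj, is given by
$$P(p \mid X_j, \alpha) = \mathrm{Dir}\left(p;\, n_1^j + \alpha_1, \ldots, n_K^j + \alpha_K\right) \equiv D_j, \tag{1}$$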
where nkj refers to the number of occurrences of outcome k up until observation j. In words, this expression states that the estimated probability over the parameters p is determined by the observations Xj and a uniform prior (parameterized by α). This is again a Dirichlet distribution, parameterized by the vector with elements equal to nkj + αk. Because the observer knows nkj and αk is fixed to be uniform, the posterior distribution can be computed easily and updated for each new observation. We abbreviate the estimated distribution following j trials as Dj.
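The posterior distribution after observing trial j − 1, i.e., Dj − 1, can be used to predict the probability of each event occurring, i.e., the multinomial distribution, at the jth trial. The expression for this is
$$\tilde{p}_k^j = P(x_j = k \mid X_{j-1}, \alpha) = \frac{n_k^{j-1} + \alpha_k}{\sum_{k'=1}^{K} \left(n_{k'}^{j-1} + \alpha_{k'}\right)} = \frac{n_k^{j-1} + 1}{\sum_{k'=1}^{K} n_{k'}^{j-1} + K}, \tag{2}$$
where the total number of observations up to the trial preceding j is
$$\sum_{k'=1}^{K} n_{k'}^{j-1}, \tag{3}$$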
which is equal to j − 1. In words, the predicted probability of observing event (trial type) k on the jth trial, given all preceding observations and a uniform prior is equal to p̃kj, where we have used the tilde to denote that it is a prediction. This quantity changes with each new observation and is the reason for including j in the superscript. This can then be updated with each new event (trial) (cf. Strange et al., 2005).
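The trial-by-trial prediction described above can be sketched in Python as follows. This is a minimal illustration that assumes a uniform prior with all prior parameters equal to `alpha`; the function name and the toy event sequence are hypothetical.

```python
def predicted_probabilities(events, n_events=4, alpha=1.0):
    """Return, for each trial j, the predictive distribution before seeing it.

    events holds the observed event index (0 .. n_events-1) on each trial
    of one block. The prediction for trial j uses only trials 1 .. j-1.
    """
    counts = [0] * n_events
    predictions = []
    for event in events:
        total = sum(counts) + alpha * n_events
        predictions.append([(n + alpha) / total for n in counts])
        counts[event] += 1  # update the counts only after predicting
    return predictions


block = [0, 0, 1, 0]  # toy sequence of event indices
for trial, probs in enumerate(predicted_probabilities(block), start=1):
    print(trial, [round(p, 3) for p in probs])
```

On the first trial the prediction is uniform (0.25 for each event); after one observation of event 0 its predicted probability rises to 2/5 and that of each other event falls to 1/5.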
Quantifying surprise.
Following Strange et al. (2005), we can quantify the surprise, I, on each trial as follows (cf. Shannon, 1948):
$$I(x_j = k) = -\log P(x_j = k \mid X_{j-1}, \alpha) = -\log \tilde{p}_k^j. \tag{4}$$
This states that the surprise of observing event type k at the jth trial is equal to the negative log of its predicted probability given all preceding trials. Accordingly, the amount of surprise conveyed by the occurrence of an event is high when an infrequent stimulus occurs in a stimulus sequence with high predictability. For example, in highly predictable blocks ([0.7, 0.1, 0.1, 0.1]), the probability of one particular event is high, whereas the other three events occur only rarely. Given repeated samples of this distribution, these low frequency events are more surprising. An event is more surprising when occurring with 0.10 probability, compared with an event with a 0.70 probability of occurring (Fig. 1c). Note that in this experiment the generative distribution did not include dependencies between consecutive events. That is, the event at one time did not depend on earlier events. This is the same as in the study by Strange et al. (2005) and different to that investigated by Harrison et al. (2006), where the current event depended on the previous. Given the assumption that participants start each block anew, we refer to this model as blockwise surprise, Ib.
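A minimal Python sketch of blockwise surprise for a single block is given below. It assumes a uniform prior with all parameters equal to one and, because the results are reported per bit, a base-2 logarithm; the base is an assumption, as the text only specifies a negative log. The function name is hypothetical.

```python
import math


def blockwise_surprise(events, n_events=4, alpha=1.0, log=math.log2):
    """Surprise -log(p) of each observed event within one block."""
    counts = [0] * n_events
    surprise = []
    for n_previous, event in enumerate(events):
        # Predicted probability of the event that actually occurred,
        # based only on the trials that preceded it in this block.
        predicted = (counts[event] + alpha) / (n_previous + alpha * n_events)
        surprise.append(-log(predicted))
        counts[event] += 1
    return surprise


# Four repetitions of a frequent event followed by one rare event.
values = blockwise_surprise([0, 0, 0, 0, 1])
print([round(v, 3) for v in values])
```

In this toy sequence surprise falls from 2 bits on the first trial as event 0 repeats, then rises to 3 bits when the previously unseen event 1 occurs.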
Alternative models.
We compared the model of the previous section with a number of alternatives. The ideal observer described above assumed that the generative model was stationary, i.e., unchanging within a block. This assumption is ideal in that it matches the true distribution used to generate trial types in the experiment. Furthermore, the model described above assumes participants start each block anew with the expectation that all events occur equally often, i.e., with a uniform (flat, uninformative) prior. Alternatively, one might expect that participants view each block merely as a continuation of the previous block, such that the experiment can be seen as one long session. We therefore also created a model based on an observer with no forgetting, here referred to as experiment-wise surprise, Ie (Fig. 1d). This is suboptimal because contingencies did change from block to block.
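The difference between the two observers can be sketched in Python with a single flag that controls whether event counts are reset at the start of each block. This is an illustrative sketch only: the function name, the toy blocks, and the use of a base-2 logarithm are assumptions.

```python
import math


def surprise_sequence(blocks, reset_each_block, n_events=4, alpha=1.0):
    """Surprise (in bits) for every trial of a list of blocks.

    reset_each_block=True  -> counts return to zero at each block start (Ib)
    reset_each_block=False -> counts accumulate over the experiment (Ie)
    """
    counts = [0] * n_events
    surprise = []
    for block in blocks:
        if reset_each_block:
            counts = [0] * n_events
        for event in block:
            total = sum(counts) + alpha * n_events
            surprise.append(-math.log2((counts[event] + alpha) / total))
            counts[event] += 1
    return surprise


blocks = [[0, 0, 0, 0], [1, 1, 1, 1]]  # toy blocks with changed contingencies
ib = surprise_sequence(blocks, reset_each_block=True)
ie = surprise_sequence(blocks, reset_each_block=False)
print([round(v, 2) for v in ib])
print([round(v, 2) for v in ie])
```

The two observers agree within the first block. At the start of the second block, the observer without forgetting carries over its counts and is therefore more surprised by the newly frequent event (3 bits rather than 2 bits).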
An alternative formulation of surprise has been suggested by Baldi et al. (Baldi, 2002; Itti and Baldi, 2006), based on the Kullback–Leibler (KL) divergence (Kullback, 1959; Cover and Thomas, 1999). The KL divergence is a scalar quantity that summarizes the difference between two probability distributions. In our case, it is used to measure the change in belief about the stimulus probabilities, P(x), after an event (i.e., visual stimulus). If this change is large then the event has a high degree of “surprise”, compared with one that has little or no effect. For the current experimental setting, the Kullback–Leibler divergence (KL surprise) at trial j is a function of the current prior and posterior distributions, Dj − 1 and Dj (cf. Baldi, 2002):
$$\mathrm{KL}(D_{j-1} \,\|\, D_j) = \int D_{j-1} \ln \frac{D_{j-1}}{D_j}\, dp. \tag{5}$$
In words, the (blockwise) KL surprise is the “distance” between the distributions before and after observing the jth trial. Intuitively this means that events can be quantified in terms of how much they change posterior beliefs.
The difference between the KL divergence measure of surprise (see Fig. 1e) and surprise as defined in Equation 4 is that the former is an average quantity, i.e., summed over all probabilities in the distribution. In contrast, the latter is a function of the predicted probability of an observed event, i.e., trial type presented to the subject. In other words, the KL divergence is a distance measure between the current prior and posterior distributions, whereas Ib is a function of the predicted probability of the observed event, i.e., just one event and not an average over all possible events. The KL surprise measure relates to those proposed by Ruchkin and Sutton (1978) and Kopp (2007) to account for variations in P300.
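For illustration, the KL surprise can be computed in closed form because both distributions are Dirichlet. The Python sketch below uses the standard expression for the KL divergence between two Dirichlet distributions and assumes the direction KL(Dj − 1 || Dj), a uniform prior with all parameters equal to one, and natural logarithms (so the result is in nats). The helper names are hypothetical, and the digamma helper is valid only for positive integer arguments, which suffices here because the parameters are counts plus one.

```python
import math


def digamma_int(n):
    """Digamma function at a positive integer: -gamma + sum_{m<n} 1/m."""
    euler_gamma = 0.5772156649015329
    return -euler_gamma + sum(1.0 / m for m in range(1, n))


def kl_dirichlet(a, b):
    """KL(Dir(a) || Dir(b)) for positive integer parameter vectors a and b."""
    a0, b0 = sum(a), sum(b)
    log_norm = (math.lgamma(a0) - sum(math.lgamma(x) for x in a)
                - math.lgamma(b0) + sum(math.lgamma(x) for x in b))
    expectation = sum((x - y) * (digamma_int(x) - digamma_int(a0))
                      for x, y in zip(a, b))
    return log_norm + expectation


def kl_surprise(events, n_events=4):
    """KL divergence (nats) between beliefs before and after each trial."""
    params = [1] * n_events  # uniform Dirichlet prior, alpha_k = 1
    values = []
    for event in events:
        updated = list(params)
        updated[event] += 1      # posterior after observing this event
        values.append(kl_dirichlet(params, updated))
        params = updated
    return values


print([round(v, 3) for v in kl_surprise([0, 0, 0, 1])])
```

Unlike the surprise of the observed event alone, this quantity reflects how much the whole belief distribution shifts, and it shrinks as counts accumulate because each new observation changes the posterior less.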
Last, we included a conventional explanation using a categorical model of events parametrically modulated by the probability of occurrence. In this model, each trial had one of the values [0.10, 0.25, 0.40, 0.70]. This regressor models variance related to stimulus probability within a block and does not take into account any learning; hence it is similar to the traditional method of averaging ERP data over a priori probabilities. Note that this model is similar to the model used by Duncan-Johnson and Donchin (1977), who used a linear regression analysis of single-trial P300 amplitudes and a priori event probabilities.
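The categorical regressor can be sketched in Python as a simple lookup of the a priori probability of the presented stimulus within its block. The function name and toy sequence are hypothetical.

```python
def categorical_regressor(events, block_probabilities):
    """A priori probability of the presented stimulus on each trial.

    events: stimulus index per trial; block_probabilities: the generative
    probabilities of the block, in the same stimulus order.
    """
    return [block_probabilities[event] for event in events]


high_predictability = [0.7, 0.1, 0.1, 0.1]
print(categorical_regressor([0, 0, 2, 0, 1], high_predictability))
# [0.7, 0.7, 0.1, 0.7, 0.1]
```

The value assigned to a stimulus is constant throughout a block, regardless of trial history, which is what distinguishes this regressor from the learning-based measures.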
Model estimation and comparison.
To test the hypothesis that surprise can predict event-related P300 responses we used a hierarchical general linear model (GLM), in which the parameters were optimized using empirical Bayes (Friston et al., 2007).
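Data from all S subjects were concatenated in a vector Y of length T × S, where T is the number of trials per subject. These data were fitted using a three-level hierarchical model of the following structure:
$$\begin{aligned} Y &= Z_1 w_1 + e_1, & e_1 &\sim N(0, \lambda_1^{-1} I), \\ w_1 &= Z_2 w_2 + e_2, & e_2 &\sim N(0, \lambda_2^{-1} I), \\ w_2 &= e_3, & e_3 &\sim N(0, \lambda_3^{-1} I). \end{aligned} \tag{6}$$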
The parameter weights {w1, w2} scale each column of the design matrices {Z1, Z2}. Hyperparameters {λ1, λ2, λ3} control the precision (inverse variance) of noise at each level, given by {e1, e2, e3}; these correspond to within-subject error, between-subject error, and shrinkage priors on the group parameters, w2. I is an identity matrix. The first-level design matrix, Z1, was block-diagonal, with dimensions TS × PS, with P regressors per subject. These regressors are the explanatory variables provided by our different models of the task sequence (see above). Additional regressors indicated the identity of trials on which participants responded erroneously, trials that were rejected during the preprocessing of the EEG data, and a constant term. By modeling incorrect responses explicitly, we accounted for the known effects of correct or incorrect responding on reaction times and P300 (Krigolson and Holroyd, 2007). The second design matrix, Z2 = 1S ⊗ IP, represented between-subject differences in the parameter weights, where 1S is a column of ones of length S. We computed the posterior densities over model parameters and hyperparameters using standard techniques (Friston et al., 2007), where a posterior density represents the degree of belief in a parameter given data, i.e., single-trial P300 estimate.
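The structure of the second-level design matrix can be illustrated with a small Python sketch. It builds Z2 as the Kronecker product of a column of ones and an identity matrix and shows that multiplying it by a vector of group-level weights repeats those weights once per subject. The sizes and weight values are toy numbers, not those of the study, and the helper functions are hypothetical; the sketch does not perform the empirical Bayes estimation itself.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]


def identity(n):
    """n x n identity matrix as a list of rows."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def matvec(matrix, vector):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


S, P = 3, 2                      # toy numbers of subjects and regressors
ones_s = [[1.0] for _ in range(S)]
Z2 = kron(ones_s, identity(P))   # (P*S) x P matrix: S stacked identities

w2 = [2.0, -1.0]                 # hypothetical group-level weights
w1_expected = matvec(Z2, w2)     # group weights repeated once per subject
print(w1_expected)
```

Under this construction, each subject's weights are modeled as the group weights plus between-subject error.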
The model evidence p(y|Mm), is the probability of the data given the mth model, which was approximated using the marginal likelihood (Penny et al., 2004; Friston et al., 2007). It is important to note that this quantity is computed by integrating out all model [hyper]-parameters and so it includes a complexity term as well as an accuracy term (expected likelihood). This evidence was used to compare competing models defined in terms of the explanatory variables in Z1.
We compared models using the ratio of the evidence for two competing models, known as the Bayes factor (Kass and Raftery, 1995). This can be formulated as a difference in approximate log model evidence for two models m and n (Fm and Fn) as follows:
$$\ln \mathrm{BF}_{mn} = \ln \frac{p(y \mid M_m)}{p(y \mid M_n)} \approx F_m - F_n. \tag{7}$$
Here, a difference of +3 corresponds approximately to 20:1 odds, i.e., exp(3) ≈ 20, in favor of model m over n (Harrison et al., 2006; Bestmann et al., 2008). In the present case, positive values reflect stronger evidence in favor of the model containing surprise Ib, whereas negative values would indicate stronger evidence for the alternative models tested.
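The conversion between a difference in log evidence and odds can be checked with a few lines of Python. The log-evidence values below are hypothetical and chosen only to give a difference of 3; the function names are likewise illustrative.

```python
import math


def log_bayes_factor(log_evidence_m, log_evidence_n):
    """Difference in (approximate) log evidence, model m minus model n."""
    return log_evidence_m - log_evidence_n


def odds(log_bf):
    """Convert a log Bayes factor into odds in favor of model m."""
    return math.exp(log_bf)


# Hypothetical log-evidence values, chosen only to give a difference of 3.
diff = log_bayes_factor(-1200.0, -1203.0)
print(round(odds(diff), 1))  # 20.1
```

A negative difference gives odds below one, i.e., evidence in favor of model n.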
Results
Behavioral results
Inspection of average reaction times on correct trials showed that participants' reaction times were affected by changes in probabilistic context. A repeated-measures ANOVA with factor “probability” (4 levels: 0.1, 0.25, 0.40, and 0.70, indicating the overall a priori probability of a stimulus within a block) showed that participants responded slower to stimuli with a lower (0.10; 573 ± 21 ms; RT ± SEM) probability of occurrence, than stimuli with a higher probability of occurrence (0.70; 427 ± 18 ms): F(3,9) = 107.904, p < 0.001. Participants responded incorrectly on 4.2% (SEM ± 0.69) of trials, making more errors in response to less frequent stimuli (F(3,9) = 15.01, p = 0.001).
Event-related potentials: trial-averaged results
Figure 2 shows the scalp topographies and grand average ERP over all trials, showing the traditional distribution of the P300. Our statistical analyses focused on the single-trial estimates of P300. Central latency of the time window used for single-trial P300 estimates was on average 531 (SEM ± 24, range 392–660) ms after stimulus onset. To verify that our averaged single-trial P300 estimates showed the same scalp topography and ordering by stimulus probability as commonly reported for average P300s, averaged single-trial estimates were entered into a repeated-measures ANOVA with factors electrode (4 levels: Fz, Cz, Pz, and Oz) and probability (4 levels: 0.10, 0.25, 0.40, and 0.70). This analysis showed that average single-trial estimates differed reliably between electrodes (F(3,9) = 19.484, p < 0.001) and probabilities (F(3,9) = 14.936, p < 0.001). The difference in average P300 for each probability was most pronounced at electrode Pz (electrode × probability interaction: F(9,3) = 7.659, p < 0.001), as is well established for the P300 (Duncan-Johnson and Donchin, 1977; Debener et al., 2005a) (Fig. 2c). This ordering of single-trial estimates is not due simply to a potential confounding relationship between trials rejected by the artifact correction and a priori stimulus probability, as there was no systematic relationship between the two (F(3,33) = 0.008, not significant).
Event-related potentials: model-based single-trial analyses
Having replicated the traditional P300 effects in choice reaction time tasks, we subsequently focused on the model-based analyses of the single-trial P300 estimates, following the procedure advocated by MacKay (1992). First, each model was fitted to the data using the procedure described above. Second, the model evidence was calculated for each model and the models were compared using the Bayes factor. This analysis showed that the blockwise surprise Ib model provided a more parsimonious account of the data when compared to a categorical model of a priori stimulus probabilities that was used by Duncan-Johnson and Donchin (1977). Moreover, the surprise Ib model was favored over two alternative models of surprise, the KL surprise and a model of surprise without forgetting Ie. The direct comparison of surprise Ib with all other candidate models is presented in Figure 3a. A log-evidence ratio >3 indicates 20:1 odds in favor of the surprise Ib model.
Having established that the surprise Ib model provided the most parsimonious explanation of the data, the group posterior density over the model parameter indicates the contribution of surprise to the data, i.e., the single-trial P300 estimate (cf. the β in a standard regression analysis). This analysis showed that variations in P300 could be explained by surprise, with more surprising events leading to an increased P300 (5.9 μV/bit) (Fig. 3b). This finding was consistent across all participants (Fig. 3c).
Discussion
We investigated whether single-trial P300 estimates in a choice reaction time task could be explained by a formal model of the surprise conveyed by events experienced by participants. Behavioral data indicated that on average participants responded slower to less frequently occurring, i.e., more surprising, events. Consistent with earlier reports on the P300, we found that averaged P300s over central-parietal electrode sites interacted with the relative probability of event occurrence. Importantly, a model of the surprise within a block of trials provided a more parsimonious explanation of single-trial P300 changes than alternative models, including a categorical model of stimulus frequency, an alternative model of surprise based on the KL divergence, and surprise without forgetting. This novel model-based approach applied to single-trial EEG data allows for a formal quantification of the psychological variable “surprise” and its relationship to the psychophysiological marker P300.
Previous studies on the P300 have introduced the term “subjective probability” to denote that it is participants' estimation of the environment that is crucial in predicting modulations in P300 (Donchin and Coles, 1988). This has led to the suggestion that P300 reflects the updating of information in anticipation of subsequent information processing (Sutton et al., 1965; Nieuwenhuis et al., 2005; Verleger et al., 2005; Barcelo et al., 2008). The P300 has previously been linked with information theoretic concepts (Ruchkin and Sutton, 1978; Johnson, 1986; Barcelo et al., 2006, 2008; Barcelo and Knight, 2007) or Bayes' theory (Kopp, 2007). Here, we draw on information theoretic concepts to investigate the trial-by-trial influence of stimulus-bound surprise on P300 variation. Characterizing the subjective estimate of task probabilities has only recently become a major focus of research in cognitive and neurosciences (Oaksford and Chater, 2007). In the present case, we used a model of how the “subjective probability” is represented and updated over time, rather than how it changes on average.
To achieve this, the present approach combines two novel methodologies that, to our knowledge, have not been combined earlier in studies of event-related potentials. First, the model-based approach provides models about the trial-by-trial variations of task states internal to the participant, such as stimulus expectancy and reward estimate (cf. Corrado and Doya, 2007). These states are not directly accessible to the experimenter using traditional analysis methods [for a similar point, see Strange et al. (2005) and Behrens et al. (2007)]. Here, we modeled each participant as an ideal observer, who updates his belief about events by combining previous knowledge with a current event. Second, although previous studies have compared predictions from computational models qualitatively with the results from averaged evoked potentials (Nieuwenhuis et al., 2002; Cohen and Ranganath, 2007), recent advances in EEG data processing, such as ICA (Eichele et al., 2005; Debener et al., 2006; Jongsma et al., 2006), now allow for trial-by-trial analyses. We here combine this model-based approach and the single-trial data analysis by formally testing the predictions of the model against the data. Moreover, this combined approach allows for comparing the evidence of different models, given the observed ERP data. The present approach differs from that used by Duncan-Johnson and Donchin (1977). These authors used regression analysis to fit single-trial P300 amplitudes to a model of a priori stimulus probability. Their approach thus focused on the overall true probabilities that were a priori known to the experimenter, but not the participant. In contrast, we here used a formal model of how participants learned these probabilities over the course of the experiment. In addition, we scrutinized our model against several alternative models.
We have modeled surprise Ib here according to measures described by information theory (Shannon, 1948; Cover and Thomas, 1999), consistent with previous studies showing that surprise is associated with activity in an extended corticothalamic network (Strange et al., 2005; Harrison et al., 2006) and changes in corticospinal excitability (Bestmann et al., 2008). Here, we assumed that events were stationary and unchanging within a block, matching the true generative distribution from which events were sampled. Therefore, all previous blocks and events were forgotten in an optimal way and trials within the current block were weighted equally. Note that this assumption is ideal in relation to the actual experimental paradigm but assumes participants were privy to different blocks of events being sampled from different distributions. We therefore included an alternative model in which our ideal observers had suboptimal (i.e., no) forgetting with respect to the actual experimental paradigm. In the present experiment, a model of an ideal observer beginning each block with flat priors was superior to a model without forgetting.
Moreover, we also compared our model to an alternative measure of surprise based on the Kullback–Leibler divergence. This latter measure can be taken as a formal description of “equivocation” that has been suggested to underlie the generation of the P300 (Ruchkin and Sutton, 1978; see also Kopp, 2007). Although the present results agree with these authors' suggestion that trial-by-trial estimates of surprise based on each participant's unique trial history are important in predicting fluctuations in P300, we show that surprise Ib, which is based only on the estimated probability of the stimulus presented on a given trial rather than on the full distribution of trials, provides a better account of changes in P300.
A remaining question is how the present modeling approach of single-trial P300 links with recent neurophysiological models of P300 generation. Nieuwenhuis et al. (2005) proposed that the P300 reflects the arrival of a phasic norepinephrine (NE) signal in cortical areas, which serves to increase signal transmission in the cortex. This proposal is based on a number of considerations, such as the similarities between the antecedent conditions for phasic increases in NE and the generation of the P300 and between the target areas of NE projections and known P300 generators, and pharmacological studies that seem broadly consistent with this proposal (for review, see Nieuwenhuis et al., 2005). In this respect, it is interesting to note that recent advances in computational neuroscience point to a role of NE in the processing of contextual uncertainty. Specifically, Dayan and Yu (2006) proposed that phasic NE signals unexpected changes in the world within the context of a task. The hope of the approach taken in the current study is to use such computationally informed models to investigate the link between phasic NE and single-trial P300 data.
In the present task, each visual stimulus was linked to a distinct motor response, and other factors that might influence P300, such as stimulus salience and task relevance (Johnson, 1986; De Bruijn et al., 2004), were kept constant. Therefore, we cannot determine whether the P300 modulation was purely attributable to the surprise conveyed by the visual stimuli or whether it was related to response selection on each trial (cf. Koechlin and Summerfield, 2007). Indeed, previous studies show that P300 modulation can be explained in terms of the probabilistic updating of the corresponding motor response (Barcelo and Knight, 2007; Barcelo et al., 2008).
We have referred to the centroparietal component we found as the P300. Other studies have made a further distinction between the so-called P3a and P3b subcomponents (Polich, 2007). The P3b is the component commonly referred to as “P300” and is typically evoked by target stimuli at around 300–600 ms, similar to the component observed in the present study. In contrast, the P3a is linked to infrequent, task-novel events and has a frontocentral maximum occurring at ∼250–400 ms (Courchesne et al., 1975; Friedman et al., 2001). In addition, the P3a component habituates fast, possibly following the pattern predicted by the KL surprise rather than the surprise Ib that predicts P3b. This may be tested directly in experiments specifically designed to elicit P3a responses (Debener et al., 2005a), using the modeling framework presented here. The present study did not focus on the difference between the novelty- and attention-related P3a and the target- and response-related P3b component. Moreover, because we focused on the amplitude of the P300, we did not examine potential information conveyed by P300 latency (Donchin, 1981; Donchin and Coles, 1988). Nevertheless, the amplitude contains sufficient structure to be explained by a formal definition of surprise. By taking into account P3b versus P3a effects and latency information, it may be possible to consider surprise in the context of other mental states contributing to goal-oriented behavior.
To conclude, model-based single-trial analyses can be used to test hypotheses about event-related EEG fluctuations. This approach provides a bridge between cognitive theories and more formal neurophysiological models of the P300 ERP. The focus on single-trial EEG data provides a more direct link to behavior and neural processing than averaged EEG activity (Debener et al., 2006). This is supported by our observation that trial-by-trial fluctuations in P300 amplitude are not random noise and can be explained by a formal model of surprise experienced in the context of a behavioral task. Our findings provide direct evidence for theories linking the P300 component and the processing of surprising events.
Footnotes
This work was supported by the Wellcome Trust (R.B.M., L.M.H., S.B.) and a Marie Curie Intra-European Fellowship within the sixth European Community Framework Programme (R.B.M.).
References
- Baldi P. A computational theory of surprise. In: Blaum M, editor. Information, coding, and mathematics. Amsterdam: Kluwer; 2002. pp. 1–26.
- Baldi P. Surprise: A shortcut for attention. In: Itti L, Rees G, Tsotsos JK, editors. Neurobiology of attention. San Diego: Elsevier Academic; 2005. pp. 24–28.
- Barcelo F, Knight RT. An information-theoretical approach to contextual processing in the human brain: Evidence from prefrontal lesions. Cereb Cortex. 2007;17:i51–i60. doi: 10.1093/cercor/bhm111.
- Barcelo F, Escera C, Corral MJ, Periáñez JA. Task switching and novelty processing activate a common neural network for cognitive control. J Cogn Neurosci. 2006;18:1734–1748. doi: 10.1162/jocn.2006.18.10.1734.
- Barcelo F, Periáñez JA, Nyhus E. An information theoretical approach to task-switching: Evidence from cognitive brain potentials in humans. Front Hum Neurosci. 2008;1:13. doi: 10.3389/neuro.09.013.2007.
- Behrens TE, Woolrich MW, Walton ME, Rushworth MF. Learning the value of information in an uncertain world. Nat Neurosci. 2007;10:1214–1221. doi: 10.1038/nn1954.
- Bénar CG, Schön D, Grimault S, Nazarian B, Burle B, Roth M, Badier JM, Marquis P, Liegeois-Chauvel C, Anton JL. Single-trial analysis of oddball event-related potentials in simultaneous EEG-fMRI. Hum Brain Mapp. 2007;28:602–613. doi: 10.1002/hbm.20289.
- Bestmann S, Harrison LM, Blankenburg F, Mars RB, Haggard P, Friston KJ, Rothwell JC. Influences of contextual uncertainty and surprise on human corticospinal excitability during preparation for action. Curr Biol. 2008;18:775–780. doi: 10.1016/j.cub.2008.04.051.
- Cover TM, Thomas JA. Elements of information theory. New York: Wiley; 1999.
- Cohen MX, Ranganath C. Reinforcement learning signals predict future decisions. J Neurosci. 2007;27:371–378. doi: 10.1523/JNEUROSCI.4421-06.2007.
- Corrado G, Doya K. Understanding neural coding through the model-based analysis of decision making. J Neurosci. 2007;27:8178–8180. doi: 10.1523/JNEUROSCI.1590-07.2007.
- Courchesne E, Hillyard SA, Galambos R. Stimulus novelty, task relevance and the visual evoked potential in man. Electroencephalogr Clin Neurophysiol. 1975;39:131–143. doi: 10.1016/0013-4694(75)90003-6.
- Dayan P, Yu AJ. Phasic norepinephrine: a neural interrupt signal for unexpected events. Network. 2006;17:335–350. doi: 10.1080/09548980601004024.
- Debener S, Makeig S, Delorme A, Engel AK. What is novel in the novelty oddball paradigm? Functional significance of the novelty P3 event-related potential as revealed by independent component analysis. Cogn Brain Res. 2005a;22:309–321. doi: 10.1016/j.cogbrainres.2004.09.006.
- Debener S, Ullsperger M, Siegel M, Fiehler K, von Cramon DY, Engel AK. Trial-by-trial coupling of concurrent electroencephalogram and functional magnetic resonance imaging identifies the dynamics of performance monitoring. J Neurosci. 2005b;25:11730–11737. doi: 10.1523/JNEUROSCI.3286-05.2005.
- Debener S, Ullsperger M, Siegel M, Engel AK. Single-trial EEG-fMRI reveals the dynamics of cognitive function. Trends Cogn Sci. 2006;10:558–563. doi: 10.1016/j.tics.2006.09.010.
- De Bruijn ERA, Mars RB, Hulstijn W. It wasn't me… or was it? How false feedback affects performance. In: Ullsperger M, Falkenstein M, editors. Errors, conflicts, and the brain: current opinions on performance monitoring. Leipzig: MPI of Cognitive Neuroscience; 2004. pp. 118–124.
- Delorme A, Makeig S. EEGLAB: an open source toolbox for analysis of single-trial EEG dynamics including independent component analysis. J Neurosci Methods. 2004;134:9–21. doi: 10.1016/j.jneumeth.2003.10.009.
- Donchin E. Surprise!...Surprise? Psychophysiology. 1981;18:493–513. doi: 10.1111/j.1469-8986.1981.tb01815.x.
- Donchin E, Coles MG. Is the P300 component a manifestation of context updating? Behav Brain Sci. 1988;11:357–374.
- Duncan-Johnson C, Donchin E. On quantifying surprise: The variation of event-related brain potentials with subjective probability. Psychophysiology. 1977;14:456–467. doi: 10.1111/j.1469-8986.1977.tb01312.x.
- Eichele T, Specht K, Moosmann M, Jongsma ML, Quian Quiroga R, Nordby H, Hugdahl K. Assessing the spatiotemporal evolution of neuronal activation with single-trial event-related potentials and functional MRI. Proc Natl Acad Sci U S A. 2005;102:17798–17803. doi: 10.1073/pnas.0505508102.
- Friedman D, Cycowicz YM, Gaeta H. The novelty P3: an event-related brain potential (ERP) sign of the brain's evaluation of novelty. Neurosci Biobehav Rev. 2001;25:355–373. doi: 10.1016/s0149-7634(01)00019-7.
- Friston K, Mattout J, Trujillo-Barreto N, Ashburner J, Penny W. Variational free energy and the Laplace approximation. Neuroimage. 2007;34:220–234. doi: 10.1016/j.neuroimage.2006.08.035.
- Harrison LM, Duggins A, Friston KJ. Encoding uncertainty in the hippocampus. Neural Netw. 2006;19:535–546. doi: 10.1016/j.neunet.2005.11.002.
- Itti L, Baldi P. Bayesian surprise attracts human attention. In: Advances in neural information processing systems. Vol 19. Cambridge: MIT Press; 2006. pp. 547–554.
- Johnson R Jr. A triarchic model of P300 amplitude. Psychophysiology. 1986;23:367–384. doi: 10.1111/j.1469-8986.1986.tb00649.x.
- Jongsma ML, Eichele T, Van Rijn CM, Coenen AM, Hugdahl K, Nordby H, Quian Quiroga R. Tracking pattern learning with single-trial event-related potentials. Clin Neurophysiol. 2006;117:1957–1973. doi: 10.1016/j.clinph.2006.05.012.
- Jung TP, Makeig S, Humphries C, Lee TW, McKeown MJ, Iragui V, Sejnowski TJ. Removing electroencephalographic artifacts by blind source separation. Psychophysiology. 2000a;37:163–178.
- Jung TP, Makeig S, Westerfield M, Townsend J, Courchesne E, Sejnowski TJ. Removal of eye activity artifacts from visual event-related potentials in normal and clinical subjects. Clin Neurophysiol. 2000b;111:1745–1758. doi: 10.1016/s1388-2457(00)00386-2.
- Kass RE, Raftery A. Bayes factors. J Am Stat Ass. 1995;90:773–795.
- Koechlin E, Summerfield C. An information theoretical approach to prefrontal executive function. Trends Cogn Sci. 2007;11:229–235. doi: 10.1016/j.tics.2007.04.005.
- Kopp B. The P300 component of the event-related brain potential and Bayes' theorem. Cogn Sci. 2007;2:113–125.
- Krigolson OE, Holroyd CB. Hierarchical error processing: different errors, different systems. Brain Res. 2007;1155:70–80. doi: 10.1016/j.brainres.2007.04.024.
- Kullback S. Information theory and statistics. New York: Wiley; 1959.
- MacKay DJ. Bayesian interpolation. Neural Comput. 1992;4:415–447.
- Makeig S, Bell AJ, Jung TP, Sejnowski TJ. Independent component analysis of electroencephalographic data. In: Touretzky D, Mozer M, Hasselmo M, editors. Advances in neural information processing systems. Vol VIII. Cambridge, MA: MIT; 1996. pp. 145–151.
- Nieuwenhuis S, Ridderinkhof KR, Talsma D, Coles MG, Holroyd CB, Kok A, Van der Molen MW. A computational account of altered error processing in older age: dopamine and the error-related negativity. Cogn Affect Behav Neurosci. 2002;2:19–36. doi: 10.3758/cabn.2.1.19.
- Nieuwenhuis S, Aston-Jones G, Cohen JD. Decision making, the P3, and the locus coeruleus-norepinephrine system. Psychol Bull. 2005;131:510–532. doi: 10.1037/0033-2909.131.4.510.
- Oaksford M, Chater N. Bayesian rationality: the probabilistic approach to human reasoning. Oxford: Oxford UP; 2007.
- Penny WD, Stephan KE, Mechelli A, Friston KJ. Comparing dynamic causal models. Neuroimage. 2004;22:1157–1172. doi: 10.1016/j.neuroimage.2004.03.026.
- Polich J. Updating P300: An integrative theory of P3a and P3b. Clin Neurophysiol. 2007;118:2128–2148. doi: 10.1016/j.clinph.2007.04.019.
- Ruchkin DS, Sutton S. Equivocation and P300 amplitude. In: Otto D, editor. Multidisciplinary perspectives in event-related brain potential research. Washington, DC: U.S. Government Printing Office; 1978. pp. 175–177.
- Shannon CE. A mathematical theory of communication. Bell Syst Tech J. 1948;27:379–423.
- Strange BA, Duggins A, Penny W, Dolan RJ, Friston KJ. Information theory, novelty and hippocampal responses: unpredicted or unpredictable? Neural Netw. 2005;18:225–230. doi: 10.1016/j.neunet.2004.12.004.
- Sutton S, Braren M, Zubin J, John ER. Evoked-potential correlates of stimulus uncertainty. Science. 1965;150:1187–1188. doi: 10.1126/science.150.3700.1187.
- Verleger R, Jaśkowski P, Wascher E. Evidence for an integrative role of P3b linking reaction to perception. J Psychophysiol. 2005;19:165–181.