Author manuscript; available in PMC: 2024 Mar 1.
Published in final edited form as: Cognition. 2022 Dec 5;232:105343. doi: 10.1016/j.cognition.2022.105343

Experience-driven recalibration of learning from surprising events

Leah Bakst a,b, Joseph T McGuire a,b
PMCID: PMC9851993  NIHMSID: NIHMS1857933  PMID: 36481590

Abstract

Different environments favor different patterns of adaptive learning. A surprising event that in one context would accelerate belief updating might, in another context, be downweighted as a meaningless outlier. Here, we investigated whether people would spontaneously regulate the influence of surprise on learning in response to event-by-event experiential feedback. Across two experiments, we examined whether participants performing a perceptual judgment task under spatial uncertainty (n=29, n=63) adapted their patterns of predictive gaze according to the informativeness or uninformativeness of surprising events in their current environment. Uninstructed predictive eye movements exhibited a form of metalearning in which surprise came to modulate event-by-event learning rates in opposite directions across contexts. Participants later appropriately readjusted their patterns of adaptive learning when the statistics of the environment underwent an unsignaled reversal. Although significant adjustments occurred in both directions, performance was consistently superior in environments in which surprising events reflected meaningful change, potentially reflecting a bias toward interpreting surprise as informative and/or difficulty ignoring salient outliers. Our results provide evidence for spontaneous, context-appropriate recalibration of the role of surprise in adaptive learning.

Keywords: Adaptive learning, decision making, eye movements, uncertainty

1. INTRODUCTION

Different environments involve different sources of uncertainty, with profound implications for how decision makers should respond to surprising events. An event that violates one’s expectations might be interpreted either as a meaningless outlier or a sign of fundamental change. For example, suppose you supervise a reliable employee who one day misses work without notice. Whether you dismiss the event as a one-off or modify your predictions about the employee’s future reliability will depend on your background knowledge and assumptions about the sources of uncertainty in this context.

Previous research has approached the problem of inference and prediction in dynamic environments by making a distinction between expected uncertainty and unexpected uncertainty (Payzan-LeNestour et al., 2013; Payzan-LeNestour & Bossaerts, 2011; Soltani & Izquierdo, 2019; A. J. Yu & Dayan, 2005). Expected uncertainty relates to noisy event-to-event fluctuations that are compatible with one’s current understanding of how events are generated. A less-than-fully-predictable event, like rolling a six-sided die and getting a six, need not drive updates to one’s understanding of the outcome-generating mechanism. Unexpected uncertainty relates to events that do not conform to the known unreliability of predictive cues (such as rolling a six-sided die and getting a seven) and signal a need to update one’s beliefs about the underlying context.

Previous empirical work has shown that people adaptively modulate the rate at which they learn from new observations by balancing context-appropriate estimates of expected uncertainty, the likelihood of true change in the outcome-generating process, and other factors (Bakst & McGuire, 2021; Behrens et al., 2007; Cheadle et al., 2014; Diederen et al., 2016; Farashahi et al., 2019; Lee et al., 2020; Massi et al., 2018; McGuire et al., 2014; Nassar et al., 2010, 2019; Ossmy et al., 2013). These different factors have been linked to dissociable patterns of activity in arousal-related brain systems, frontoparietal cortex, and ventral striatum (Jepma et al., 2016, 2018; Kao et al., 2020; McGuire et al., 2014; Nassar et al., 2012; O’Reilly et al., 2013). Reduced flexibility of adaptive learning has been associated with dimensions of psychopathology including trait anxiety (Browning et al., 2015; Kraus et al., 2021; Pulcu & Browning, 2019).

Previous research on adaptive learning has largely focused on scenarios in which events that signify fundamental change tend to be more extreme and surprising than events that reflect expected uncertainty. For example, in spatial prediction paradigms with “change point” structure, the generative statistics that govern target positions occasionally undergo discrete shifts. Targets subsequent to a change point tend to be associated with high prediction error. (Throughout this paper, we use “prediction error” to refer to the spatial distance between a target’s predicted and actual locations. This usage is distinct from the concept of “reward prediction error” in reinforcement learning and instead is more similar to what has sometimes been called “state prediction error”; Gläscher et al., 2010). In environments with change-point structure, an approximately Bayesian theoretical account (Nassar et al., 2016) holds that a larger prediction error should lead to a higher inferred probability that a change point has occurred, resulting in a higher trial-specific learning rate (that is, a large belief update expressed as a proportion of the prediction error). A related theoretical proposal holds that learning is gated by the associability of a stimulus, which scales with surprise (Li et al., 2011; Pearce & Hall, 1980).
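The intuition behind the approximately Bayesian account can be illustrated with a short sketch: given a hazard rate and a known noise width, the posterior probability that an observation arose from a change point (rather than Gaussian noise around the current mean) grows with prediction error magnitude. This is an illustrative simplification under assumed generative parameters, not the exact model of Nassar et al. (2016).

```python
import numpy as np

def change_point_probability(pe, sigma, hazard, circle=360.0):
    """Posterior probability that a change point produced prediction error
    `pe` (degrees), given a hazard rate and Gaussian noise width `sigma`.
    Illustrative simplification, not the exact Nassar et al. (2016) model."""
    p_change = hazard / circle                      # uniform outcome likelihood
    p_noise = (1.0 - hazard) * np.exp(-0.5 * (pe / sigma) ** 2) \
              / (sigma * np.sqrt(2.0 * np.pi))      # Gaussian likelihood
    return p_change / (p_change + p_noise)
```

Under these assumptions, a small prediction error yields a low change-point probability (and hence a low learning rate), while a large one drives the probability toward 1.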

However, larger-magnitude prediction errors need not always lead to higher learning rates (O’Reilly et al., 2013; L. Q. Yu et al., 2021). In some situations, good decisions require maintaining stable beliefs in the face of salient but non-predictive events, such as when an investor avoids overreacting to a transient market downturn. People can adapt to environments in which the ideal learning rate is equal or lower for surprising events relative to more moderate events (Cheadle et al., 2014; d’Acremont & Bossaerts, 2016; Lee et al., 2020; Nassar et al., 2019; O’Reilly et al., 2013; Summerfield & Tsetsos, 2015). Downregulating learning in response to non-predictive “oddball” events has been suggested to require vigilance and cognitive control (d’Acremont & Bossaerts, 2016), although other evidence suggests that actively updating beliefs requires additional cognitive resources compared with holding them stable (O’Reilly et al., 2013).

Given that surprising events can either upregulate or downregulate learning in different contexts, an outstanding question is to what degree adaptive learning undergoes spontaneous, experience-driven recalibration to track the current context’s higher-order statistics. Calibration of adaptive learning, a form of metalearning (Griffiths et al., 2019; Wang, 2021), has previously been documented in human behavioral paradigms in which experimental participants were explicitly tasked with optimizing and reporting their predictions about upcoming events and received explicit description, training, and/or event-specific cues to ensure they clearly understood the relevant sources of uncertainty (d’Acremont & Bossaerts, 2016; Nassar et al., 2019; O’Reilly et al., 2013). In this previous work, participants successfully adapted their behavior to their environment, showing different patterns of learning rate modulation in different contexts. However, it is unclear to what degree the successful adaptation was attributable to descriptive information/instructions or to direct experience with environmental statistics.

Similar questions arise for current theoretical models of adaptive learning. Recent work has shown that a neural network model could be configured either to increase or decrease learning in response to extreme events, effectively treating such events as meaningful change points in one environment and meaningless oddballs in another (Razmi & Nassar, 2022). The model achieved this behavior by associating observations with a latent representation of mental context that evolved with different dynamics in the two environments. These dynamics were built into each version of the model a priori; that is, the model was endowed with knowledge about each environment’s statistical structure, raising the question of whether such structural knowledge originates from explicit description, experiential learning, or both. Here, we investigated whether people would acquire patterns of adaptive learning solely through experience, and whether such patterns, once established, would be revised in response to a higher-order change in the experienced statistical structure.

The present work made use of an eye-tracking paradigm we developed and validated in a previous study, which used predictive gaze to measure spontaneous belief updating in the absence of overt instructions (Bakst & McGuire, 2021). The paradigm capitalized on the idea that prediction is a natural aspect of moment-to-moment cognition (Clark, 2013; Hayhoe et al., 2012; Henderson, 2017; Summerfield & Lange, 2014). While making perceptual judgments about briefly presented targets, participants’ eye movements tended to anticipate target onsets, which were temporally predictable but spatially uncertain. Despite not having been informed about the behaviors of interest or the optimal strategy, participants exhibited well-calibrated predictive inference and adapted to a manipulation of expected uncertainty in a change-point environment (Bakst & McGuire, 2021).

Here, across two experiments, we extended this eye-tracking paradigm beyond change-point environments to test the hypothesis that people would spontaneously adapt to the informativeness or uninformativeness of surprising events. In Experiment 1, we used a between-participant design to contrast environments in which surprise reflected either meaningful change or meaningless outliers. In Experiment 2, using a within-participant design, we tested people’s ability to adjust their behavioral strategy when the environment’s structure reversed. This allowed us to investigate the possibility of an asymmetry in the difficulty of the two environments, such that participants would adapt more easily to environments in which surprising outcomes warranted updating or to environments in which surprising outcomes were better ignored.

2. METHODS

2.1. Participants

All human participant procedures were approved by the Boston University Institutional Review Board. Informed consent was obtained from all participants. Participants were recruited as a convenience sample from the Boston University community (Experiment 1: N=29, 18 female and 11 male, Age: mean=20.5, range 18-24; Experiment 2: N=63, 48 female and 15 male, Age: mean=20.3, range 18-31). All participants had normal or corrected-to-normal vision. Nine additional participants were excluded (5 in Experiment 1 and 4 in Experiment 2): three due to technical problems, three due to task accuracy below 60%, and three due to more than half of their trials not meeting eccentricity thresholds as explained below. The sample size for Experiment 1 was determined based on effect sizes derived from pilot data, while the sample size for Experiment 2 was determined based on effect sizes observed in Experiment 1, both using a power of 80% and a significance threshold of 5%. For Experiment 2, the sample size, exclusion criteria, and key analyses were preregistered with AsPredicted (aspredicted.org/cd54j.pdf).

2.2. Task

In two experiments, participants performed an implicit spatial prediction task based on Bakst & McGuire (2021; Figure 1A) programmed in Python using PsychoPy (Peirce et al., 2019) while their eye movements were tracked (EyeLink 1000+ desk-mounted eye tracker, SR Research Ltd, Osgoode, Canada). Gaze position was collected monocularly at 1000 Hz. Head position was stabilized using a chinrest positioned 57 cm from the display (BENQ XL2430 with a resolution of 1920x1080), and the eye tracker was calibrated at the beginning of each run. An additional post-hoc calibration was performed following data collection (see below). The nominal task was to report whether briefly presented numerical digits were even or odd in order to earn reward. The digits were presented in varying locations, implicitly requiring participants to anticipate the location of the next digit so that they could use central vision to make the odd/even judgment.

Figure 1.

Task summary. (A) A digit appeared between flanking Xs for 180 ms before being masked by a third X. The participant then had unlimited time to indicate via a keypress whether the digit was even or odd. Accuracy feedback was displayed at the same location for 500 ms, followed by a 750-ms inter-trial interval (ITI). The total reward earned in each run was represented by the width of a bar at the bottom of the screen. Each run began with a central trial, and alternated between central and peripheral trials. (B) Task conditions. In the Change Point (CP) condition, small changes in digit location from one peripheral trial to the next tended to represent noise around a stable mean, whereas large changes in digit location tended to represent meaningful shifts in the underlying position of the Gaussian. In the Random Walk (RW) condition, small changes in digit location tended to indicate meaningful shifts in the underlying Gaussian distribution, whereas large changes tended to represent meaningless outliers. (C) Data from an example participant. Top: Eccentricity. Dashed lines show the true eccentricity of central and peripheral digits in degrees of visual angle (deg). Gaze-based predictions were defined as the gaze position at the time of digit onset, shown in triangles (light grey: peripheral trials, dark grey: central trials). Middle: CP condition, peripheral trials only. Digit location (angular position on the peripheral circle) shown in black points, gaze-based predictions shown in blue triangles. Bottom: RW condition. Same conventions as above, with dark red triangles indicating gaze-based predictions. (D) Odd/even task accuracy in Experiment 1 by condition and trial type. Individual participant performance shown in open circles, group means and SEM shown atop in black. (E) Prediction error (PE; mean absolute value across trials), conventions as in D. Left axis shows prediction error in degrees around the circle (for peripheral trials), right axis shows corresponding degrees of visual angle (for central trials).

Each digit appeared between two flanking Xs for 180 ms before being backward-masked by another X. The participant then had unlimited time to respond by pressing “1” with their left hand for “odd” or “0” with their right hand for “even.” Accuracy feedback (a filled or empty circle signifying a correct or incorrect response) was then displayed in the same location as the digit for 500 ms, followed by a 750-ms inter-trial interval. Thus, there was a fixed 1250 ms interval between the keypress and the appearance of the next digit, making targets temporally predictable.

Two types of trials, central and peripheral, occurred in alternating order. Central trials displayed the digit at the center of the screen, which was marked by a small white point throughout the task. Peripheral trials displayed the digit somewhere on the perimeter of a circle centered on the screen with a radius of 6.9° of visual angle, which was marked throughout the task with a white outline. The purpose of including central trials was to recenter gaze at a point equidistant from all possible digit locations prior to each peripheral trial.

Two conditions – change point (CP) and random walk (RW) – governed the peripheral digit locations. Average spatial predictability was similar between the two conditions but the sequential contingencies differed. In the CP condition, each digit’s location was drawn from a Gaussian distribution with a fixed width (σ = 11.25° in angular distance around the circle) and a mean that usually remained fixed from one peripheral trial to the next but was resampled from a uniform distribution spanning the entire circle at occasional unsignaled change points. The generative mean was not resampled during a two-trial refractory period after each change point, and was resampled with a probability of 0.167 thereafter, leading to an overall change point probability of approximately 0.125. In the RW condition, digit locations were usually drawn from a Gaussian distribution (σ = 11.25°) with a mean equal to the digit’s location on the previous peripheral trial. This random walk around the circle was punctuated by occasional uniformly distributed outliers, which occurred at the same frequency as change points in the CP condition. In contrast to the CP condition, these noisy outliers did not reset the mean of the Gaussian; the subsequent peripheral digit location was drawn from a Gaussian distribution centered on the location of the peripheral digit prior to the outlier.

Therefore, in both conditions, peripheral digits had a high probability of appearing at a Gaussian-distributed location near the previous generative mean and a lower probability of appearing at an arbitrary new location anywhere on the circle. The two conditions differed in the predictive significance of large versus small movements. Small trial-to-trial differences in the CP condition tended to reflect noise around a stable mean, whereas large differences indicated a meaningful shift. In contrast, small trial-to-trial differences in the RW condition reflected meaningful shifts in the mean, whereas large differences represented non-predictive outliers (Figure 1B).
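The generative statistics described above can be summarized in a short simulation sketch. This is not the authors' task code; parameter values follow the Methods (σ = 11.25°, two-trial refractory period, resample probability 0.167 thereafter).

```python
import numpy as np

def simulate_run(n_trials, condition, sigma=11.25, refractory=2,
                 hazard=0.167, rng=None):
    """Sketch of the peripheral digit generator (not the authors' task code).
    Angles are degrees around the peripheral circle; `hazard` is the
    resample probability outside the two-trial refractory period."""
    if rng is None:
        rng = np.random.default_rng(0)
    mean = rng.uniform(0.0, 360.0)      # current generative mean
    since = refractory                  # trials since the last extreme event
    locs = np.empty(n_trials)
    for t in range(n_trials):
        extreme = since >= refractory and rng.random() < hazard
        if condition == "CP":
            if extreme:                 # change point: resample the mean
                mean = rng.uniform(0.0, 360.0)
            locs[t] = (mean + rng.normal(0.0, sigma)) % 360.0
        else:                           # "RW"
            if extreme:                 # outlier: mean is left untouched
                locs[t] = rng.uniform(0.0, 360.0)
            else:                       # each sample becomes the next mean
                locs[t] = (mean + rng.normal(0.0, sigma)) % 360.0
                mean = locs[t]
        since = 0 if extreme else since + 1
    return locs
```

Note the key asymmetry: in "CP", only extreme events move the generative mean; in "RW", only non-extreme samples do.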

Experiment 1 used a between-subject design in which each participant experienced only one condition, performing four 8-minute runs of either the CP or RW condition. Experiment 2 used a within-subject design in which each participant experienced three 8-minute runs of one condition followed by three 8-minute runs of the other, with the order counterbalanced across participants. Participants were not informed that the conditions changed, or that there were, in fact, multiple conditions at all. They were only told that they would complete four 8-minute runs in Experiment 1, and six 8-minute runs in Experiment 2.

Because run duration was based on time rather than number of trials (and the duration of individual trials depended on the participant’s response latency), participants varied in the total number of trials they completed. In Experiment 1, participants completed an average of 456 peripheral trials in total across all four runs (range 382-515). They completed 462 peripheral trials on average in the CP condition (range 432-515), and 450 trials on average in the RW condition (range 382-484). In Experiment 2, participants completed an average of 687 trials in total across all six runs (range 556-780). They completed 345 trials on average in the CP condition (range 283-393) and 342 trials on average in the RW condition (range 258-393).

Monetary compensation for both studies consisted of a show-up payment of $12 in addition to a bonus payment that scaled with the mean proportion of correct odd/even responses (95% accuracy resulted in a bonus of an additional $12 and payments were rounded to the nearest $0.25). A horizontal bar centered at the bottom of the screen increased in length proportional to the total number of correct responses in each run.

Following the experiment, participants were asked to respond to an open-ended question that asked, “What do you think the study was about?” Answers including any reference to digit locations/patterns or prediction were considered evidence of explicit awareness of task structure. Two independent raters assessed the responses and any discrepancies were resolved through discussion. Eleven individuals (38%) in the first experiment and 30 individuals (47%) in the second experiment were coded as having evidence for explicit awareness. Explicit awareness was not associated with odd/even judgment accuracy or predictive gaze accuracy in either experiment (all p >= 0.544, Wilcoxon rank sum test).

2.3. Post-hoc calibration

Calibration of the eye tracker was performed at the start of each run. However, small, consistent deviations were often observed between gaze position and visual targets. We therefore performed a post-hoc calibration of gaze position (Supplemental Figure 1). The post-hoc calibration step was developed to remove structured error from gaze position estimates, although our results did not change materially if it was omitted.

Using gaze position at the time of peripheral digit appearance (which served as our measure of predictive beliefs), we calculated the residuals according to the following:

Residuals = D − √((Bx + Xadj)² + (By + Yadj)²)

where D=6.9° of visual angle, the eccentricity of peripheral digits. Measured gaze position was decomposed into its x and y components Bx and By relative to screen center, and Xadj and Yadj represent additive post-hoc calibration adjustments. We then found the adjustments that minimized the following expression using fminsearch (Matlab):

(Σ|Residuals|) / n + 0.1·(|Xadj| + |Yadj|)

where n is the number of residuals. We only included trials on which gaze position was closer to the peripheral circle than the center (eccentricity > 3.45° of visual angle) so as to avoid basing the calibration on beliefs that lingered near the center of the screen. This procedure was completed for each run separately. Runs were only recalibrated if they included more than 40 trials that met the eccentricity-based inclusion criterion. Two additional runs were excluded from recalibration after manual inspection revealed large, unexpected shifts in gaze position.
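The calibration objective can be sketched in Python as follows, using scipy's Nelder-Mead optimizer as an analogue of Matlab's fminsearch. This is an illustrative reconstruction from the equations above, not the authors' code.

```python
import numpy as np
from scipy.optimize import minimize

D = 6.9  # eccentricity of peripheral digits (degrees of visual angle)

def objective(adj, bx, by):
    """Mean |residual| plus an L1 penalty on the adjustments, a sketch of
    the post-hoc calibration criterion described above."""
    x_adj, y_adj = adj
    residuals = D - np.sqrt((bx + x_adj) ** 2 + (by + y_adj) ** 2)
    return np.mean(np.abs(residuals)) + 0.1 * (abs(x_adj) + abs(y_adj))

def calibrate(bx, by):
    # Nelder-Mead is scipy's analogue of Matlab's fminsearch
    return minimize(objective, x0=[0.0, 0.0], args=(bx, by),
                    method="Nelder-Mead").x
```

With gaze samples that sit on the peripheral circle apart from a constant offset, the fitted adjustments approximately cancel that offset; the L1 penalty keeps the adjustments small when the residuals alone are uninformative.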

For Xadj, the mean across participants from both experiments was −0.086°, with a range from −4.908° to 6.862°. Yadj had an average of 0.018° with a range from −2.757° to 3.621°. On average, the residuals per run decreased by 0.23° following post-hoc calibration.

2.4. Learning rate analyses

A participant’s predictive belief on each peripheral trial, Bt, was operationalized as the gaze position at the time of digit onset, quantified in terms of angular position on the peripheral circle. The learning rate on trial t (LRt) was estimated as the belief update from trial t to t+1, scaled by the prediction error on trial t:

LRt = (Bt+1 − Bt) / (Xt − Bt)

where B represents gaze-derived belief, and X is the angular position of the digit. The numerator and denominator were circular differences with a possible range of +/−180°. Analyses only included trials in which gaze was nearer to the peripheral circle than to the center (eccentricity > 3.45° of visual angle), to exclude beliefs inaccurately represented by the angle on the peripheral circle. Three participants in Experiment 1 and one participant in Experiment 2 were excluded entirely because fewer than half their trials met this eccentricity threshold.
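This computation can be sketched as follows; `circ_diff` and `trial_learning_rates` are hypothetical helper names, and the wrapping to ±180° implements the circular differences described above.

```python
import numpy as np

def circ_diff(a, b):
    """Signed circular difference a - b, wrapped to [-180, 180) degrees."""
    return (a - b + 180.0) % 360.0 - 180.0

def trial_learning_rates(beliefs, targets):
    """Per-trial learning rate LR_t = (B_{t+1} - B_t) / (X_t - B_t), with
    both numerator and denominator computed as circular differences.
    `beliefs` and `targets` are angular positions in degrees."""
    update = circ_diff(beliefs[1:], beliefs[:-1])   # belief update
    pe = circ_diff(targets[:-1], beliefs[:-1])      # prediction error
    return update / pe
```

A learning rate of 1 corresponds to fully shifting the next prediction to the observed target; 0 corresponds to ignoring the observation.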

Analyses assessed the relationship between per-trial learning rates and the absolute value of prediction errors. Median learning rate for each participant was calculated for different prediction error bins (bin edges were 0°, 5°, 10°, 15°, 20°, 25°, 35°, 45°, 75°, 105°, 135°, and 180°). This procedure was repeated separately for trials in “Early” and “Late” epochs, defined as the first half (four minutes) of the first run and second half (four minutes) of the last run, respectively. In Experiment 1, this analysis compared the first half of Run 1 to the second half of Run 4. In Experiment 2, it compared the first half of Run 1 to the second half of Run 3 for the pre-reversal phase, and the first half of Run 4 to the second half of Run 6 for the post-reversal phase.
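The per-participant binning could be implemented along these lines (a sketch; `binned_median_lr` is a hypothetical helper name):

```python
import numpy as np

EDGES = np.array([0, 5, 10, 15, 20, 25, 35, 45, 75, 105, 135, 180], float)

def binned_median_lr(pe_mag, lr, edges=EDGES):
    """Median learning rate per |PE| bin for one participant. Bins with no
    trials return NaN."""
    idx = np.digitize(pe_mag, edges[1:-1])  # bin index for each trial
    return np.array([np.median(lr[idx == i]) if np.any(idx == i) else np.nan
                     for i in range(len(edges) - 1)])
```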

Additionally, a linear model with a probit link was fit to learning rate as a function of prediction error magnitude across individual trials, separately for each participant, with learning rate estimates constrained to the range −0.5 to 1.5. Note that the probit response function (the Gaussian cumulative distribution function, Φ) takes values from 0 to 1, matching the theoretical range of learning rates. We allowed the input data (the empirical per-trial learning rate estimates) to retain a larger range to avoid excessively truncating the measurement error distribution for the least-squares fit, but limited the range to avoid the inclusion of extreme values >> 1 or << 0.

LR̂t = Φ(β0 + β1·|PEt|)

The fmincon function in Matlab (Mathworks, Natick, MA) was used to determine the least-squares best-fitting probit coefficients, using bounds of ±3 for the intercept term (β0) and ±0.05 for the slope term (β1). The intercept bound allowed the fitted learning rate intercept to span nearly the full range from zero to one, while the slope bound allowed functions of varying steepness without approximating a step function. This fit was repeated separately for the Early and Late epochs.
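An analogous bounded least-squares fit can be sketched in Python, using scipy in place of Matlab's fmincon; this is an illustration of the procedure, not the authors' code.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def fit_probit(pe_mag, lr, b0_bound=3.0, b1_bound=0.05):
    """Bounded least-squares fit of LR-hat = Phi(b0 + b1 * |PE|), mirroring
    the probit-link model and coefficient bounds described above."""
    lr = np.clip(lr, -0.5, 1.5)  # constrain empirical LR estimates

    def sse(params):
        b0, b1 = params
        return np.sum((lr - norm.cdf(b0 + b1 * pe_mag)) ** 2)

    res = minimize(sse, x0=[0.0, 0.0],
                   bounds=[(-b0_bound, b0_bound), (-b1_bound, b1_bound)])
    return res.x
```

A positive fitted slope indicates that learning rates increase with prediction error magnitude (as expected in the CP condition), while a near-zero or negative slope indicates flat or inverted modulation (as expected in the RW condition).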

Subsequent analyses focused solely on change point or outlier trials, on which digit locations were sampled from a uniform distribution. These analyses were restricted to digits that appeared more than two standard deviations from the previous generative mean (>22.5° in angular distance) to focus only on extreme events. Outlier learning rate estimates were removed beforehand, defined as being outside the group interquartile range (IQR) by more than 1.5 times the IQR (which generally removed trials with learning rate estimates >>1 or <<0, comprising about 2% of trials).

We calculated each participant’s overall median learning rate for extreme events separately for the entire task experience, as well as the Early and Late epochs. The median Late-epoch learning rate per participant was considered their “final LR.” The change in learning rate (ΔLR) was computed as the Late-minus-Early median learning rate for extreme events per participant. Participants with no valid learning-rate estimates for extreme events during the Early or Late epoch were excluded from this analysis (n=3 in Experiment 1, n=15 in the pre-reversal phase of Experiment 2, and n=1 in the post-reversal phase of Experiment 2). The retained participants had a median of three extreme events included in the Early epoch and five in the Late epoch.
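The IQR-based exclusion rule described above might look like this in code (a sketch):

```python
import numpy as np

def remove_lr_outliers(lrs):
    """Drop learning-rate estimates lying more than 1.5 * IQR outside the
    interquartile range, per the exclusion rule described above."""
    q1, q3 = np.percentile(lrs, [25, 75])
    iqr = q3 - q1
    keep = (lrs >= q1 - 1.5 * iqr) & (lrs <= q3 + 1.5 * iqr)
    return lrs[keep]
```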

The time course of metalearning was estimated by taking the group median and standard error of the learning rate for each successive extreme event experienced. Data were plotted until the latest point at which at least 50% of participants per group contributed data. A larger number of extreme events were available for Experiment 1 than for the pre-reversal phase of Experiment 2 because participants completed an additional run of the task. Additionally, a larger number of extreme events were available post-reversal than pre-reversal in Experiment 2 because response times tended to decrease with experience in the task, and participants therefore encountered more trials in the second half of the experiment. We also fit probability distributions to each participant’s set of extreme-event learning rates using a Gaussian kernel (bandwidth = 0.15).

The following comparisons were preregistered as primary confirmatory analyses in Experiment 2: (1) comparison of pre-reversal “final LR” between groups (to serve as a replication of Experiment 1); (2) comparison of post-reversal ΔLR between groups and against zero in each group separately (to test for recalibration after the reversal point); and (3) comparisons of both “final LR” and ΔLR between participant groups within each condition (to test for differences as a function of whether the condition was encountered first or second).

3. RESULTS

3.1. General Task Performance

We used an implicit spatial prediction paradigm (Bakst & McGuire, 2021) across two experiments to evaluate how successfully people could learn through experience to interpret surprising events either as meaningful changes or random outliers (Figure 1A). Participants were given a nominal task of reporting whether briefly presented numerical digits were even or odd. The task implicitly led participants to make anticipatory saccades to the predicted locations of upcoming digits on the screen and to update their spatial predictive beliefs continually in response to new observations.

The task had two conditions: In the Change Point (CP) condition, peripheral digit locations were drawn from a one-dimensional Gaussian distribution on the perimeter of a large circle (σ = 11.25° of angular distance; see Figure 1A) with the mean of the Gaussian distribution occasionally resampled from a uniform distribution spanning the entire circumference of the circle. Peripheral trials alternated with central trials (see Figure 1A), which served to recenter the participant’s gaze prior to each prediction. Small changes in digit location from one peripheral trial to the next therefore tended to represent noise around a stable mean, whereas large changes in digit location tended to represent meaningful shifts in the underlying position of the Gaussian (Figure 1B). To maximize accuracy, participants in the CP condition ideally should converge on an increasingly precise estimate of the current generative mean and downregulate learning from small errors to avoid chasing noise, but should transiently raise their learning rate in response to large errors reflective of change points (Nassar et al., 2010).

In the Random Walk (RW) condition, the contingencies were reversed: the Gaussian-sampled digit location on one trial became the generative mean for the next trial. This random walk was punctuated by occasional outliers drawn from a uniform distribution spanning the circumference of the circle. Small changes in digit location therefore tended to indicate meaningful shifts in the underlying Gaussian distribution, whereas large changes tended to represent meaningless outliers. To maximize performance, participants in the RW condition ideally should fully update their beliefs in response to small errors but should downregulate learning from extreme events.

Participants performed the nominal task with high accuracy for both central and peripheral trial types in both the CP and RW conditions. Mean accuracy for odd/even judgments in Experiment 1 was 82% (SD=5.6%) for peripheral trials and 98% (SD=1.4%) for central trials (Figure 1D), with no significant differences between conditions (Wilcoxon rank sum, both p >= 0.616). Participants were also successful at the uninstructed task of predicting digit locations. Because task events were temporally predictable, we used gaze position at the time of digit appearance as an indication of the participant’s prediction (Bakst & McGuire, 2021). Data from a representative participant are shown in Figure 1C. Gaze was directed near the appropriate eccentricity for both central and peripheral trials (Figure 1C, top panel).

Examining peripheral trials only, the angle of the prediction was near to the subsequent digit location for both the CP and RW conditions (Figure 1C, middle and bottom panels, respectively). The mean prediction error was 28.2° of angular distance (SD=4.7°) for peripheral trials, and 0.84° of visual angle (SD=0.53°) for central trials (Figure 1E), with no significant differences between conditions (Wilcoxon rank sum, both p >= 0.230). As expected, both odd/even task accuracy and predictive gaze accuracy decreased transiently around extreme events (Supplemental Figure 2).

3.2. Adaptive learning: Experiment 1

3.2.1. Effects of prediction error magnitude

We tested whether participants modulated their learning rate across trials as a function of prediction error magnitude. Per-trial learning rate was empirically estimated as the gaze-based prediction update (the distance between predictions on trials t and t+1) expressed as a proportion of the gaze-based prediction error (the distance between the predicted and observed target locations on trial t). We hypothesized that in the CP condition, participants would use higher learning rates for larger prediction errors, which tended to reflect meaningful change points. In the RW condition, in contrast, larger prediction errors tended to reflect noisy outliers and should be associated with lower learning rates.

To evaluate whether these patterns were present in the data, we first calculated the median observed trialwise learning rate for each participant for different bins of prediction error magnitude (bin edges were 0°, 5°, 10°, 15°, 20°, 25°, 35°, 45°, 75°, 105°, 135°, and 180°). Group mean and SEM (of within-subject medians) are shown in Figure 2A.
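The binning step can be sketched as follows; this is a hypothetical illustration using the bin edges listed above, not the authors’ code:

```python
import numpy as np

# Bin edges from the analysis above (degrees of angular distance).
BIN_EDGES = [0, 5, 10, 15, 20, 25, 35, 45, 75, 105, 135, 180]

def binned_median_lr(abs_pe, lr):
    """Within-subject median learning rate for each prediction-error bin."""
    abs_pe = np.asarray(abs_pe, float)
    lr = np.asarray(lr, float)
    idx = np.digitize(abs_pe, BIN_EDGES[1:-1])  # bin index for each trial
    return [float(np.median(lr[idx == i])) if np.any(idx == i) else float("nan")
            for i in range(len(BIN_EDGES) - 1)]
```

Group-level summaries (mean and SEM across participants, as in Figure 2A) would then be computed over these within-subject medians.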

Figure 2.

Learning rate as a function of absolute prediction error. (A) Learning rate as a function of prediction error magnitude (PE) in degrees of angular position on the peripheral circle (deg). Within-subject median learning rate was calculated for PE bins (bin edges = 0°, 5°, 10°, 15°, 20°, 25°, 35°, 45°, 75°, 105°, 135°, and 180°) for each participant. Group mean and SEM are shown for each group: RW (red), CP (blue). The left panel uses all task data, the middle panel the Early epoch (first four minutes of task experience), and the right panel the Late epoch (last four minutes). (B) Linear models with a probit link were fit to the trial-by-trial data for each participant separately. Individual fits are shown in thin dashed lines; group median slope and intercept are shown in the thicker line. Left is RW (red), right is CP (blue).

Because initial task instructions were identical in the CP and RW conditions, any systematic differences in behavior presumably emerged through metalearning over time, as participants used direct experience with the task’s statistics to recalibrate the rate at which they learned from individual events. To examine calibration of learning over time, we repeated the procedure for Early and Late task epochs (Figure 2A, middle and right panels), defined as the first and last four minutes of the task.

To facilitate a quantitative test of the effects depicted descriptively in Figure 2A, we fit probit functions to each participant’s per-trial learning rates as a function of prediction error magnitude (Figure 2B, left). Individual participant fits are shown in dashed lines and group medians are shown in thicker lines. Slopes differed significantly between the CP and RW groups over the full task (p = 0.014, Wilcoxon rank sum test), with the CP group showing a median slope of 0.009 (interquartile range [IQR] = 0.002, 0.037) and the RW group a median of −0.001 (IQR = −0.006, 0.002).
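One way to fit such a probit-shaped curve is sketched below. This is an illustration under stated assumptions, not the authors’ procedure: the learning rate is modeled as Φ(b0 + b1·|PE|) and fit by least squares, whereas the paper specifies only that linear models with a probit link were fit to trial-by-trial data.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import norm

def probit_curve(abs_pe, intercept, slope):
    """Learning rate modeled as a probit (Gaussian CDF) function of |PE|."""
    return norm.cdf(intercept + slope * abs_pe)

def fit_probit(abs_pe, lr):
    """Least-squares fit of the probit curve. A positive slope means larger
    prediction errors are met with higher learning rates (CP-appropriate);
    a negative slope means they are downweighted (RW-appropriate)."""
    lr = np.clip(np.asarray(lr, float), 0.0, 1.0)  # probit output lies in [0, 1]
    params, _ = curve_fit(probit_curve, np.asarray(abs_pe, float), lr,
                          p0=(0.0, 0.01))
    return params  # (intercept, slope)
```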

Early behavior showed minimal differences between the two groups (Figure 2B, middle), with no significant difference in slopes (Wilcoxon rank sum p = 0.348; CP median [IQR] = −0.003 [−0.010, 0.006], RW = 0.004 [−0.007, 0.043]). However, slopes in the Late epoch differed significantly between conditions (Wilcoxon rank sum p < 0.001; CP = 0.023 [0.007, 0.050]; RW = −0.008 [−0.011, 0.001]; Figure 2B, right). The change in slope between the Early and Late epochs (Late minus Early) likewise differed between conditions (Wilcoxon rank sum p = 0.011; CP median [IQR] = 0.020 [0.002, 0.059], RW = −0.006 [−0.051, 0.007]). However, the magnitude of the change in slope from Early to Late did not differ between groups (Wilcoxon rank sum p = 0.326), indicating no evidence of faster metalearning in one context than the other.

2.2.2. Effect of extreme events

We focused our next analyses on non-Gaussian extreme events, which represented change points in the CP condition and outliers in the RW condition. Analyses were restricted to digits that were sampled from the uniform distribution and appeared more than two standard deviations from the previous Gaussian generative mean (> 22.5° of angular distance) to ensure we focused only on events that were distinguishable from Gaussian samples. In addition to being of theoretical interest, extreme events allowed for more accurate measurement of empirical learning rates compared with less-extreme events. Because per-trial learning rate estimates were calculated as a ratio with prediction error in the denominator, the estimates were less susceptible to measurement error on large-prediction-error trials.
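The selection rule for extreme events can be expressed compactly; a sketch assuming angular positions in degrees and a per-trial flag for uniform-distribution samples (the variable names are ours):

```python
import numpy as np

EXTREME_THRESHOLD = 22.5  # degrees; > 2 SD of the Gaussian generative distribution

def circular_distance(a, b):
    """Absolute angular distance in degrees, accounting for wraparound."""
    return np.abs((np.asarray(a, float) - np.asarray(b, float) + 180) % 360 - 180)

def is_extreme_event(digit_angle, prev_gaussian_mean, from_uniform):
    """Flag uniform-distribution samples that land more than 22.5 degrees
    from the previous Gaussian generative mean."""
    far = circular_distance(digit_angle, prev_gaussian_mean) > EXTREME_THRESHOLD
    return np.asarray(from_uniform, bool) & far
```

The conjunction matters: a Gaussian sample that happens to fall far from the mean is not counted, and a uniform sample that lands near the mean is indistinguishable from noise and is likewise excluded.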

To evaluate performance over time, we looked at extreme-event learning rates over the course of the task, with time expressed in terms of the cumulative number of extreme events encountered (Figure 3A). The CP and RW groups exhibited different trajectories of learning rate as a function of experience. The overall difference in learning rate between the two conditions can be summarized in terms of the median extreme-event learning rate per participant (Figure 3B). The median learning rate was significantly greater in the CP condition than in the RW condition (Wilcoxon rank sum p < 0.001; CP median [IQR] = 0.99 [0.97, 1.01]; RW = 0.21 [0.04, 0.74]). Extreme-event learning rates also displayed a bimodal within-subject distribution in both groups, with a peak centered at the optimal behavior for each condition (Supplemental Figure 3).

Figure 3.

Rates of learning from extreme events differed by condition and evolved with task experience. (A) Per-trial learning rate (LR) over time. Here, time is expressed in terms of the cumulative number of extreme events experienced. Group median and SEM are shown for each condition. Traces were smoothed with a 5-trial window. Data are plotted until the latest point at which at least 50% of participants per group contributed data. (B) Overall median extreme-event learning rate for each condition. Circles show individual participants and triangles show the group mean and SEM. (C) Final learning rate was calculated as the median extreme-event learning rate per participant during the Late epoch. Conventions as in B. (D) Change in median extreme-event learning rate (ΔLR) between Early and Late epochs of the task, conventions as in B-C.

Next, we evaluated whether the two groups differed in their final learning rate (median learning rate for extreme events during the Late epoch; Figure 3C). In the CP condition, the median final learning rate was 1.00 (IQR = 0.95, 1.02), which was not significantly different from the optimal learning rate of 1 (Wilcoxon signed rank p = 0.670). In comparison, in the RW condition, the median final learning rate was 0.08 [0.004, 0.65], which was significantly greater than the optimal learning rate of 0 (Wilcoxon signed rank p = 0.002) and significantly lower than the final learning rate for the CP group (Wilcoxon rank sum p < 0.001).

Finally, we quantified the amount of metalearning as the change in learning rate for extreme events (ΔLR) over the course of the task (Figure 3D). We estimated ΔLR by calculating each participant’s median learning rate for extreme events in Early and Late epochs and taking the difference (Late minus Early). For each group individually, the difference was not significantly different from zero (RW: Wilcoxon signed rank p = 0.110; median [IQR] = −0.28 [−0.81, 0.11]; CP: p = 0.104; median [IQR] = 0.05 [−0.03, 0.75]), likely because of consistently well-calibrated behavior over time in the CP condition and high inter-individual variability in the amount of metalearning in the RW condition. The signed value of ΔLR differed significantly between the two groups (Wilcoxon rank sum p = 0.025) and its magnitude did not (Wilcoxon rank sum p = 0.857).
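The ΔLR metalearning index reduces to a difference of within-epoch medians; a minimal sketch assuming arrays of extreme-event learning rates from the Early and Late epochs:

```python
import numpy as np

def delta_lr(early_lrs, late_lrs):
    """Metalearning index: change in the median extreme-event learning rate
    from the Early epoch to the Late epoch (Late minus Early)."""
    return float(np.median(late_lrs) - np.median(early_lrs))
```

A positive ΔLR indicates movement toward CP-appropriate behavior (learning more from extreme events over time), and a negative ΔLR indicates movement toward RW-appropriate behavior.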

2.3. Adaptive learning: Experiment 2

2.3.1. Replication

To further investigate experience-driven metalearning, we conducted a preregistered second experiment in which the conditions reversed midway through each participant’s experimental session. Each participant experienced three 8-minute runs of one condition (CP or RW) followed by three runs of the other, with the order counterbalanced across participants. As before, participants were not informed about the structure of the task or the variables of interest.

Pre-reversal data from the first three runs provided an opportunity to replicate the analyses reported above for Experiment 1. In examining learning rate as a function of prediction error magnitude, we obtained similar results (see Figure 4A for per-participant fits analogous to Figure 2B and see Supplemental Figure 4 for binned mean-of-median learning rates analogous to Figure 2A). Probit slope coefficients summarizing the overall effect of prediction error magnitude on learning rate differed significantly between the two groups (Wilcoxon rank sum p < 0.001; Figure 4A, left). The CP group showed a median slope of 0.009 (IQR = 0.003, 0.029; p < 0.001, Wilcoxon signed rank test against zero) and the RW group showed a median of −0.004 (IQR = −0.008, −0.0002; p = 0.010). The two groups did not differ during the Early epoch (Wilcoxon rank sum p = 0.660; CP median [IQR] = 0.005 [−0.004, 0.022]; RW median [IQR] = 0.012 [−0.009, 0.050]; Figure 4A middle) but differences were found in the Late epoch (Wilcoxon rank sum p < 0.001; CP median [IQR] = 0.024 [0.004, 0.050]; RW median [IQR] = −0.006 [−0.014, 0.001]; Figure 4A right). The change in slope between the Early and Late epochs differed between groups (Wilcoxon rank sum p = 0.002) but the magnitude of the change did not (Wilcoxon rank sum p = 0.093).

Figure 4.

Learning rate as a function of absolute prediction error in Experiment 2. (A) Pre-reversal data: Linear models with a probit link were fit separately to each participant’s trial-by-trial learning rate estimates. Overall fits are shown at left, the Early epoch at middle, and the Late epoch at right. Conventions as in Figure 2B. (B) Equivalent results for the post-reversal phase of Experiment 2. Across both panels, dashed lines represent the participant group that experienced the CP condition in the pre-reversal phase and the RW condition in the post-reversal phase; solid lines represent the group that experienced the opposite order.

Patterns of learning from extreme events also matched those seen in Experiment 1. Both groups demonstrated marked metalearning (Figure 5A), with a significant difference between groups in overall median learning rate (Wilcoxon rank sum p < 0.001; Figure 5B). A preregistered replication analysis showed that final learning rates in the pre-reversal phase differed significantly between groups (Wilcoxon rank sum p < 0.001; CP median [IQR] = 0.99 [0.94, 1.04]; RW median [IQR] = 0.07 [−0.004, 0.41]; see Figure 5C). The final learning rate for the CP group was not significantly different from the optimal rate of 1 (Wilcoxon signed rank p = 0.254), whereas the final learning rate in the RW group was significantly greater than the optimal rate of 0 (Wilcoxon signed rank p < 0.001). Signed ΔLR significantly differed between the two conditions (Wilcoxon rank sum p < 0.001; Figure 5D) but the magnitude of ΔLR did not (Wilcoxon rank sum p = 0.085). We again observed a bimodal pattern of individual trial learning rates in both conditions similar to that seen in Experiment 1 (Supplemental Figure 3).

Figure 5.

Effects of the mid-session reversal of the task’s statistical structure. (A) Learning rate (LR) for extreme events over time, pre- and post-reversal. RW (red), CP (blue). Time is expressed in terms of the cumulative number of extreme events experienced. Group median and SEM are shown for each condition. Data are plotted until the latest point at which at least 50% of participants per group contributed data in each half of the session. Traces were smoothed with a 5-trial window. (B) Overall median learning rate was calculated for each participant both pre- and post-reversal. Circles show individual participants and triangles show the group mean and SEM. (C) Final learning rate, conventions as in B. (D) Change in learning rate (ΔLR) between Early and Late epochs, shown separately for the pre- and post-reversal phases. Conventions as in B.

2.3.2. Post-reversal behavior

When the CP and RW conditions reversed halfway through the experimental session, participants were given no explicit indication that the environment had changed. We investigated whether they would revise their understanding of the task’s statistical structure through experience alone.

In examining learning rate as a function of prediction error magnitude, slopes in the post-reversal block as a whole did not significantly differ between groups (Wilcoxon rank sum p = 0.532), with both groups showing positive median slopes (CP median [IQR] = 0.003 [0.001, 0.023]; RW = 0.003 [−0.001, 0.007]; Figure 4B, left). The two groups also did not differ during the Early epoch (Wilcoxon rank sum p = 0.132; CP = 0.003 [−0.004, 0.014]; RW = 0.010 [−0.002, 0.049]; Figure 4B, middle). Although behavior trended in the expected direction, there was still no significant difference between groups in the Late epoch (Wilcoxon rank sum p = 0.204; CP = 0.011 [−0.002, 0.046]; RW = −0.001 [−0.005, 0.040]; Figure 4B, right), a null result driven primarily by persistent suboptimal behavior in the group that experienced the RW condition after the reversal point. Post-reversal Late-epoch slopes did not significantly differ from zero in the RW condition (Wilcoxon signed rank p = 0.421), but were significantly positive in the CP condition (Wilcoxon signed rank p < 0.001).

In assessing learning from extreme events, participants who switched from the RW condition to the CP condition (RW ➔ CP) showed a substantial increase in their learning rate shortly after the reversal point, indicative of rapid metalearning (Figure 5A). In comparison, participants who experienced CP and then RW (CP ➔ RW) showed more gradual metalearning post-reversal. They appeared to maintain their previous beliefs about the environment for an extended period before gradually adapting their behavior.

To assess whether the order of the conditions affected behavior, we compared the overall median extreme-event learning rate for each condition between groups (comparing one group’s pre-reversal behavior to the other group’s post-reversal behavior; Figure 5B). Learning rates in the CP condition did not differ between groups (Wilcoxon rank sum p = 0.326), while those in the RW condition did (Wilcoxon rank sum p < 0.001; pre-reversal median [IQR] = 0.14 [0.05, 0.36], post-reversal = 0.94 [0.61, 0.98]), with pre-reversal learning rates closer to the optimal value of zero.

Analyses of final learning rates (extreme-event learning rates in the final 4 minutes of each phase; Figure 5C) provided further evidence that participants’ ability to calibrate to the RW environment was impacted by previous experience in the CP environment. In a preregistered analysis, there was a significant difference in final learning rates between participants who experienced the RW condition in the pre- versus post-reversal phases. The pre-reversal group showed better-calibrated behavior (Wilcoxon rank sum p = 0.009; pre-reversal median [IQR] = 0.07 [−0.004, 0.41]; post-reversal = 0.52 [0.06, 0.99]). In contrast, the CP condition showed no significant difference between pre- and post-reversal phases (Wilcoxon rank sum p = 0.505; pre-reversal median [IQR] = 0.99 [0.94, 1.04]; post-reversal = 0.99 [0.91, 1.02]). In another preregistered analysis, the change in learning rate (ΔLR, the difference between Early and Late epochs of each phase) did not significantly differ between the pre-reversal and post-reversal phases within either condition (Wilcoxon rank sum, both p >= 0.092; Figure 5D), though both were significantly different from zero in the post-reversal phase (Wilcoxon signed rank, both p <= 0.013). Additionally, the raw ΔLR differed between groups in the post-reversal phase (Wilcoxon rank sum p < 0.001), while the magnitude did not (Wilcoxon rank sum p = 0.762).

4. DISCUSSION

Across two experiments, we investigated the extent to which participants spontaneously used experience with the statistics of their environment to regulate the influence of surprise on learning. Learning was assessed via predictive eye movements while participants performed a perceptual judgment task under spatial uncertainty. Gaze-based predictions exhibited successful metalearning, tending to approach context-appropriate patterns of learning rate modulation over time. At the same time, learning rates for extreme events were generally better calibrated in a context in which large prediction errors were associated with meaningful change (CP) compared to one in which they were not (RW). We found asymmetric order effects: participants who experienced the RW condition second showed worse performance compared to those who saw RW first, whereas order had no effect on performance in the CP condition.

4.1. Learning initial task structure

Our findings agree with previous results from some explicit prediction tasks, which demonstrated better performance in conditions in which surprise was meaningful than in conditions in which surprising events needed to be ignored (d’Acremont & Bossaerts, 2016; Nassar et al., 2019). This lends support to the idea that it is more cognitively taxing to ignore salient outliers than to update beliefs in response to informative surprise (d’Acremont & Bossaerts, 2016). However, the amount of metalearning (measured in terms of within-session change) was generally similar between conditions. This could suggest that performance differences were driven by initial assumptions about the environment. Participants may have held an initial bias or default towards assuming that large prediction errors indicated meaningful change. A related possibility is that maintaining stable beliefs across an intervening outlier entails elevated working memory demands. Such a default could be adaptive insofar as, in real-world environments, there might be more dire consequences for erroneously ignoring events that differ from expectations than for erroneously overweighting them.

Questions remain as to whether other task contexts or framings could evoke different patterns of initial assumptions and inductive biases. For instance, O’Reilly and colleagues (2013) found additional behavioral costs associated with belief updating compared with merely reacting to isolated surprising events. Future work should examine the flexibility of default strategies and the cues relevant to assessing the relative costs of different patterns of learning. Future work should also examine how metalearning might be altered by task manipulations of arousal, motivation, or incentives (Jepma et al., 2018; Nassar et al., 2012; Urai et al., 2017).

We noted substantial differences in behavioral policies across individuals, especially early in the task. Behavior tended to converge over time, showing decreasing inter-individual variability in later epochs of the session. Why participants adopt such a variety of initial strategies is an open question. Whether this variability reflects differing interpretations of the experimental context, broader influences of recent states and/or experiences, or trait-level factors (Browning et al., 2015; Kraus et al., 2021) remains to be determined.

4.2. Learning after environmental change

In our second experiment, the conditions underwent an unsignaled reversal halfway through, and participants revised their behavioral strategy in line with the new statistical structure they encountered. The trajectory of metalearning appeared to exhibit condition-specific order effects. Participants took longer to adapt to the RW condition if they had previously experienced the CP condition than if they saw the RW condition first. In comparison, behavior in the CP condition did not appear to depend on condition order. If participants tended to hold initial biases towards a CP-like strategy, then experiencing the CP condition could have served to reinforce those biases, making them yet more difficult to overcome when the participant was suddenly thrust into the RW condition.

4.3. Theoretical implications

From a theoretical perspective, our results imply that calibration of adaptive learning can be guided by experience in the absence of descriptive information about the outcome-generating process. This conclusion is congruent with findings that statistical experience can shape learning and decision processes in a variety of domains (Constantino & Daw, 2015; McGuire & Kable, 2012; Ossmy et al., 2013; Schweighofer et al., 2006). Our findings support a class of models in which patterns of belief updating are guided by higher-order beliefs about environmental structure (Razmi & Nassar, 2022; Yu et al., 2021) and highlight the need to extend such models to incorporate experiential structure-learning or metalearning processes (Griffiths et al., 2019; Wang, 2021).

Findings from the present work will provide useful constraints for further model development. A successful theory of structure learning will need to account for the observation that behavior more readily calibrated (and recalibrated) to an environment in which surprise reflected change points than to one in which surprise reflected nonpredictive outliers. A possibility that merits further investigation is that the asymmetry in metalearning between the two conditions might be rooted in an asymmetric performance cost of miscalibration. A CP-optimized agent could be reasonably successful in the RW environment by treating an uninformative outlier as if it were two sequential change points. In contrast, an RW-optimized agent might suffer more prolonged performance costs in the CP environment. Future theoretical work might also explore parallels to other contexts in which asymmetric belief updating has been observed; for example, in updating general semantic knowledge, people are better at learning that previously non-believed statements are true than at learning that previously believed statements are false (Yang et al., 2022).

A related goal for future theoretical work is to identify computational parameters that enable some individuals to adjust their behavioral policy rapidly and successfully while others maintain suboptimal strategies for extended periods. Previous modeling work has demonstrated the varying influence of factors such as surprise and environmental volatility on participant behavior (Behrens et al., 2007; Lee et al., 2020; McGuire et al., 2014; Nassar et al., 2010, 2019) and further efforts to model metalearning could help explain the sources of the behavioral patterns we identified (d’Acremont & Bossaerts, 2016; Nassar et al., 2019).

4.4. Limitations

There are limitations on our ability to generalize our findings to other situations and participant populations. An intriguing possibility is that different task contexts might evoke different patterns of bias and flexibility in metalearning. For example, our study focused solely on rapid, momentary decisions. Whether the type of spontaneous metalearning observed here would generalize to slower, deliberative processes is currently unknown. In addition, given that our participant samples were drawn from the Boston University community, the extent to which the results generalize across populations, cultures, or clinically defined groups is an open question. Though we would expect the visual system’s propensity for predictive gaze to hold across populations, the relevant prior beliefs and the dynamics of feedback-driven cognitive flexibility could differ. Finally, while we observed clear evidence of adaptive learning in participants’ behavioral responses to extreme events, it was not possible to reliably assess behavior in response to small prediction errors. Precise assessment of the rate of learning from small errors would require a different measurement approach and potentially also a different incentive scheme, given that our task did not require predictive gaze to be perfectly accurate to maintain a high level of performance (Bakst & McGuire, 2021).

4.5. Conclusions

Two experiments demonstrated that participants calibrated patterns of learning rate modulation to the structure of their environment through experience alone. Participants displayed metalearning, adapting to the informativeness of surprising events in an initial context and readjusting after the environment’s statistics reversed. The findings motivate new questions about sources of bias and individual variation in the cognitive processes that guide learning, prediction, and decision making in complex environments.

Supplementary Material


ACKNOWLEDGEMENTS

This work was supported by the National Science Foundation [grant numbers 1755757 and 1809071], the National Eye Institute [grant number EY029134], the Office of Naval Research [grant numbers N00014-16-1-2832 and N00014-17-1-2304], and the Center for Systems Neuroscience Postdoctoral Fellowship at Boston University. We thank Matthew Nassar, Sam Ling, and David Somers for helpful discussion, and Lila Wright for assistance with data collection in Experiment 1.

Footnotes

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

CRediT authorship statement: Leah Bakst: Conceptualization, Funding Acquisition, Methodology, Investigation, Analysis, Writing. Joseph McGuire: Conceptualization, Funding Acquisition, Methodology, Writing – Review and Editing.

DATA AVAILABILITY

Task code and raw data are available via the Open Science Framework at https://osf.io/5pmhg/ and analysis code is available upon request.

REFERENCES

  1. Bakst L, & McGuire JT (2021). Eye Movements Reflect Adaptive Predictions and Predictive Precision. Journal of Experimental Psychology: General: 150(5), 915–929. 10.1037/xge0000977 [DOI] [PMC free article] [PubMed] [Google Scholar]
  2. Behrens TEJ, Woolrich MW, Walton ME, & Rushworth MFS (2007). Learning the value of information in an uncertain world. Nature Neuroscience, 10(9), 1214–1221. 10.1038/nnl954 [DOI] [PubMed] [Google Scholar]
  3. Browning M, Behrens TE, Jocham G, O’Reilly JX, & Bishop SJ (2015). Anxious individuals have difficulty learning the causal statistics of aversive environments. Nature Neuroscience, 18(4), 590–596. 10.1038/nn.3961 [DOI] [PMC free article] [PubMed] [Google Scholar]
  4. Cheadle S, Wyart V, Tsetsos K, Myers N, de Gardelle V, Herce Castañón S, & Summerfield C (2014). Adaptive Gain Control during Human Perceptual Choice. Neuron, 81(6), 1429–1441. 10.1016/j.neuron.2014.01.020 [DOI] [PMC free article] [PubMed] [Google Scholar]
  5. Clark A (2013). Whatever next? Predictive brains, situated agents, and the future of cognitive science. Behavioral and Brain Sciences, 36(3), 181–204. 10.1017/s0140525xl2000477 [DOI] [PubMed] [Google Scholar]
  6. Constantino SM, & Daw ND (2015). Learning the opportunity cost of time in a patch-foraging task. Cognitive, Affective and Behavioral Neuroscience, 15(4), 837–853. 10.3758/sl3415-015-0350-y [DOI] [PMC free article] [PubMed] [Google Scholar]
  7. d’Acremont M, & Bossaerts P (2016). Neural Mechanisms Behind Identification of Leptokurtic Noise and Adaptive Behavioral Response. Cerebral Cortex, 26(4), 1818–1830. 10.1093/cercor/bhw013 [DOI] [PMC free article] [PubMed] [Google Scholar]
  8. Diederen KMJ, Spencer T, Vestergaard MD, Fletcher PC, & Schultz W (2016). Adaptive Prediction Error Coding in the Human Midbrain and Striatum Facilitates Behavioral Adaptation and Learning Efficiency. Neuron, 90(5), 1127–1138. 10.1016/j.neuron.2016.04.019 [DOI] [PMC free article] [PubMed] [Google Scholar]
  9. Farashahi S, Donahue CH, Hayden BY, Lee D, & Soltani A (2019). Flexible combination of reward information across primates. Nature Human Behaviour, 3(11), 1215–1224. 10.1038/s41562-019-0714-3 [DOI] [PMC free article] [PubMed] [Google Scholar]
  10. Gläscher J, Daw N, Dayan P, & O’Doherty JP (2010). States versus rewards: Dissociable neural prediction error signals underlying model-based and model-free reinforcement learning. Neuron, 66(4), 585–595. 10.1016/j.neuron.2010.04.016 [DOI] [PMC free article] [PubMed] [Google Scholar]
  11. Griffiths TL, Callaway F, Chang MB, Grant E, Krueger PM, & Lieder F (2019). Doing more with less: meta-reasoning and meta-learning in humans and machines. Current Opinion in Behavioral Sciences, 29, 24–30. 10.1016/j.cobeha.2019.01.005 [DOI] [Google Scholar]
  12. Hayhoe MM, McKinney T, Chajka K, & Pelz JB (2012). Predictive eye movements in natural vision. Experimental Brain Research, 217(1), 125–136. 10.1007/s00221-011-2979-2 [DOI] [PMC free article] [PubMed] [Google Scholar]
  13. Henderson JM (2017). Gaze Control as Prediction. Trends in Cognitive Sciences, 21(1), 15–23. 10.1016/j.tics.2016.ll.003 [DOI] [PubMed] [Google Scholar]
  14. Jepma M, Brown SBRE, Murphy PR, Koelewijn SC, Vries B. de, Maagdenberg A. M. van den, & Nieuwenhuis S (2018). Noradrenergic and Cholinergic Modulation of Belief Updating. Journal of Cognitive Neuroscience, 30(12), 1803–1820. 10.1162/jocn_a_01317 [DOI] [PubMed] [Google Scholar]
  15. Jepma M, Murphy PR, Nassar MR, Rangel-Gomez M, Meeter M, & Nieuwenhuis S (2016). Catecholaminergic Regulation of Learning Rate in a Dynamic Environment. PLOS Computational Biology, 12(10), e1005171. 10.1371/journal.pcbi.1005171 [DOI] [PMC free article] [PubMed] [Google Scholar]
  16. Kao C-H, Lee S, Gold JI, & Kable JW (2020). Neural encoding of task-dependent errors during adaptive learning. ELife, 9, e58809. 10.7554/elife.58809 [DOI] [PMC free article] [PubMed] [Google Scholar]
  17. Kraus N, Niedeggen M, & Hesselmann G (2021). Trait anxiety is linked to increased usage of priors in a perceptual decision making task. Cognition, 206, 104474. 10.1016/j.cognition.2020.104474 [DOI] [PubMed] [Google Scholar]
  18. Lee S, Gold JI, & Kable JW (2020). The Human as Delta-Rule Learner. Decision, 7(1), 55–66. 10.1037/dec0000112 [DOI] [Google Scholar]
  19. Li J, Schiller D, Schoenbaum G, Phelps EA, & Daw ND (2011). Differential roles of human striatum and amygdala in associative learning. Nature Neuroscience, 14(10), 1250–1252. 10.1038/nn.2904 [DOI] [PMC free article] [PubMed] [Google Scholar]
  20. Massi B, Donahue CH, & Lee D (2018). Volatility Facilitates Value Updating in the Prefrontal Cortex. Neuron, 99(3), 598–608.e4. 10.1016/j.neuron.2018.06.033 [DOI] [PMC free article] [PubMed] [Google Scholar]
  21. McGuire JT, & Kable JW (2012). Decision makers calibrate behavioral persistence on the basis of time-interval experience. Cognition, 124(2), 216–226. 10.1016/j.cognition.2012.03.008 [DOI] [PMC free article] [PubMed] [Google Scholar]
  22. McGuire JT, Nassar MR, Gold JI, & Kable JW (2014). Functionally Dissociable Influences on Learning Rate in a Dynamic Environment. Neuron, 84(4), 870–881. 10.1016/j.neuron.2014.10.013 [DOI] [PMC free article] [PubMed] [Google Scholar]
  23. Nassar MR, Bruckner R, & Frank MJ (2019). Statistical context dictates the relationship between feedback-related EEG signals and learning. ELife, 8, e46975. 10.7554/elife.46975 [DOI] [PMC free article] [PubMed] [Google Scholar]
  24. Nassar MR, Bruckner R, Gold JI, Li S-C, Heekeren HR, & Eppinger B (2016). Age differences in learning emerge from an insufficient representation of uncertainty in older adults. Nature Communications, 7(1), 11609. 10.1038/ncomms11609 [DOI] [PMC free article] [PubMed] [Google Scholar]
  25. Nassar MR, Rumsey KM, Wilson RC, Parikh K, Heasly B, & Gold JI (2012). Rational regulation of learning dynamics by pupil-linked arousal systems. Nature Neuroscience, 15(7), 1040–1046. 10.1038/nn.3130 [DOI] [PMC free article] [PubMed] [Google Scholar]
  26. Nassar MR, Wilson RC, Heasly B, & Gold JI (2010). An Approximately Bayesian Delta-Rule Model Explains the Dynamics of Belief Updating in a Changing Environment. The Journal of Neuroscience, 30(37), 12366–12378. 10.1523/jneurosci.0822-10.2010 [DOI] [PMC free article] [PubMed] [Google Scholar]
  27. O’Reilly JX, Schüffelgen U, Cuell SF, Behrens TEJ, Mars RB, & Rushworth MFS (2013). Dissociable effects of surprise and model update in parietal and anterior cingulate cortex. Proceedings of the National Academy of Sciences of the United States of America, 110(38), E3660–9. 10.1073/pnas.1305373110 [DOI] [PMC free article] [PubMed] [Google Scholar]
  28. Ossmy O, Moran R, Pfeffer T, Tsetsos K, Usher M, & Donner TH (2013). The Timescale of Perceptual Evidence Integration Can Be Adapted to the Environment. Current Biology, 23(11), 981–986. 10.1016/j.cub.2013.04.039 [DOI] [PubMed] [Google Scholar]
  29. Payzan-LeNestour E, & Bossaerts P (2011). Risk, Unexpected Uncertainty, and Estimation Uncertainty: Bayesian Learning in Unstable Settings. PLoS Computational Biology, 7(1), e1001048. 10.1371/journal.pcbi.1001048 [DOI] [PMC free article] [PubMed] [Google Scholar]
  30. Payzan-LeNestour E, Dunne S, Bossaerts P, & O’Doherty JP (2013). The Neural Representation of Unexpected Uncertainty during Value-Based Decision Making. Neuron, 79(1), 191–201. 10.1016/j.neuron.2013.04.037 [DOI] [PMC free article] [PubMed] [Google Scholar]
  31. Pearce JM, & Hall G (1980). A model for Pavlovian learning: Variations in the effectiveness of conditioned but not of unconditioned stimuli. Psychological Review, 87(6), 532–552. 10.1037/0033-295x.87.6.532 [DOI] [PubMed] [Google Scholar]
  32. Peirce J, Gray JR, Simpson S, MacAskill M, Höchenberger R, Sogo H, Kastman E, & Lindeløv JK (2019). PsychoPy2: Experiments in behavior made easy. Behavior Research Methods, 51(1), 195–203. 10.3758/s13428-018-01193-y [DOI] [PMC free article] [PubMed] [Google Scholar]
  33. Pulcu E, & Browning M (2019). The Misestimation of Uncertainty in Affective Disorders. Trends in Cognitive Sciences, 23(10), 865–875. 10.1016/j.tics.2019.07.007 [DOI] [PubMed] [Google Scholar]
  34. Razmi N, & Nassar MR (2022). Adaptive Learning through Temporal Dynamics of State Representation. Journal of Neuroscience, 42(12), 2524–2538. 10.1523/JNEUROSCI.0387-21.2022 [DOI] [PMC free article] [PubMed] [Google Scholar]
  35. Soltani A, & Izquierdo A (2019). Adaptive learning under expected and unexpected uncertainty. Nature Reviews Neuroscience, 20(10), 635–644. 10.1038/s41583-019-0180-y [DOI] [PMC free article] [PubMed] [Google Scholar]
  36. Summerfield C, & de Lange FP (2014). Expectation in perceptual decision making: neural and computational mechanisms. Nature Reviews Neuroscience, 15(11), 745–756. 10.1038/nrn3838 [DOI] [PubMed] [Google Scholar]
  37. Summerfield C, & Tsetsos K (2015). Do humans make good decisions? Trends in Cognitive Sciences, 19(1), 27–34. 10.1016/j.tics.2014.11.005 [DOI] [PMC free article] [PubMed] [Google Scholar]
  38. Urai AE, Braun A, & Donner TH (2017). Pupil-linked arousal is driven by decision uncertainty and alters serial choice bias. Nature Communications, 8(1), 14637. 10.1038/ncomms14637 [DOI] [PMC free article] [PubMed] [Google Scholar]
  39. Wang JX (2021). Meta-learning in natural and artificial intelligence. Current Opinion in Behavioral Sciences, 38, 90–95. 10.1016/j.cobeha.2021.01.002 [DOI] [Google Scholar]
  40. Yang BW, Stone AR, & Marsh EJ (2022). Asymmetry in Belief Revision. Applied Cognitive Psychology, December 2021, 1072–1082. 10.1002/acp.3991 [DOI] [Google Scholar]
  41. Yu AJ, & Dayan P (2005). Uncertainty, Neuromodulation, and Attention. Neuron, 46(4), 681–692. 10.1016/j.neuron.2005.04.026 [DOI] [PubMed] [Google Scholar]
  42. Yu LQ, Wilson RC, & Nassar MR (2021). Adaptive learning is structure learning in time. Neuroscience & Biobehavioral Reviews, 128, 270–281. 10.1016/j.neubiorev.2021.06.024 [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

Supplementary Materials


Data Availability Statement

Task code and raw data are available via the Open Science Framework at https://osf.io/5pmhg/ and analysis code is available upon request.
