eLife. 2021 Apr 30;10:e62825. doi: 10.7554/eLife.62825

Response-based outcome predictions and confidence regulate feedback processing and learning

Romy Frömer 1,2, Matthew R Nassar 2, Rasmus Bruckner 3,4,5, Birgit Stürmer 6, Werner Sommer 1, Nick Yeung 7
Editors: Tadeusz Wladyslaw Kononowicz 8, Richard B Ivry 9
PMCID: PMC8121545  PMID: 33929323

Abstract

Influential theories emphasize the importance of predictions in learning: we learn from feedback to the extent that it is surprising, and thus conveys new information. Here, we explore the hypothesis that surprise depends not only on comparing current events to past experience, but also on online evaluation of performance via internal monitoring. Specifically, we propose that people leverage insights from response-based performance monitoring – outcome predictions and confidence – to control learning from feedback. In line with predictions from a Bayesian inference model, we find that people who are better at calibrating their confidence to the precision of their outcome predictions learn more quickly. Further in line with our proposal, EEG signatures of feedback processing are sensitive to the accuracy of, and confidence in, post-response outcome predictions. Taken together, our results suggest that online predictions and confidence serve to calibrate neural error signals to improve the efficiency of learning.

Research organism: Other

Introduction

Feedback is crucial to learning and adaptation. Across domains it is thought that feedback drives learning to the degree that it is unexpected and, hence, provides new information, for example in the form of prediction errors that express the discrepancy between actual and expected outcomes (McGuire et al., 2014; Yu and Dayan, 2005; Behrens et al., 2007; Diederen and Schultz, 2015; Diederen et al., 2016; Pearce and Hall, 1980; Faisal et al., 2008; Sutton and Barto, 1998; Wolpert et al., 2011). Yet, the same feedback can be caused by multiple sources: we may be wrong about what is the correct thing to do, or we may know what to do but accidentally still do the wrong thing (McDougle et al., 2016). When we know we did the latter, we should discount learning about the former (McDougle et al., 2019; Parvin et al., 2018). Imagine, for instance, learning to throw darts. You know the goal you want to achieve – hit the bullseye – and you might envision yourself performing the perfect throw to do so. However, you find that the throw you performed as intended missed the target entirely and did not yield the desired outcome: In this case, you should adjust what you believe to be the right angle to hit the bullseye, based on how you missed that last throw. On a different throw you might release the dart at a different angle than intended and thus anticipate the ensuing miss: In this case, you may not want to update your beliefs about the right angle of throw. How do people assign credit to either of these potential causes of feedback when learning how to perform a new task? How do they regulate how much they learn from given feedback depending on how much they know about its causes?

Performance monitoring, that is, the internal evaluation of one’s own actions, could reduce surprise about feedback and uncertainty about its causes by providing information about execution errors. For instance, in the second dart-throw example, missing the target may be unsurprising if performance monitoring detected that, for example, the dart was released differently than desired (Figure 1A). In simple categorical choices, people are often robustly aware of their response errors (Maier et al., 2011; Yeung et al., 2004; Riesel et al., 2013; Maier et al., 2012), and this awareness is reflected in neural markers of error detection (Murphy et al., 2015). Although errors are often studied in simple categorization tasks in which responses are either correct or incorrect, in many tasks errors occur on a graded scale (e.g. a dart can miss the target narrowly or by a large margin), and both error detection and feedback processing are sensitive to error magnitude (Luft et al., 2014; Ulrich and Hewig, 2014; Frömer et al., 2016a; Arbel and Donchin, 2011). People are even able to report graded errors reasonably accurately (Kononowicz et al., 2019; Akdoğan and Balcı, 2017; Kononowicz and van Wassenhove, 2019).

Figure 1. Interactions between performance monitoring and feedback processing.

(A) Illustration of dynamic updating of predicted outcomes based on response information. Pre-response, the agent aims to hit the bullseye and selects the action he believes achieves this goal. Post-response, the agent realizes that he made a mistake and predicts that he will miss the target entirely, being reasonably confident in this prediction. In line with his prediction, and thus unsurprisingly, the dart hits the floor. (B) Illustration of key concepts. Left: The feedback received is plotted against the prediction. Performance and prediction can vary in their accuracy independently. Perfect performance (zero deviation from the target, dark blue line) can occur for accurate or inaccurate predictions, and any performance, including errors, can be predicted perfectly (predicted error is identical to performance, orange line). When predictions and feedback diverge, outcomes (feedback) can be better (closer to the target, area highlighted with coarse light red shading) or worse (farther from the target, area highlighted with coarse light blue shading) than predicted. The more they diverge, the less precise the predictions are. Right: The precision of the prediction is plotted against confidence in that prediction. If confidence closely tracks the precision of the predictions, that is, if agents know when their predictions are probably right and when they’re not, confidence calibration is high (green). If confidence is independent of the precision of the predictions, then confidence calibration is low. (C) Illustration of theoretical hypotheses. Left: We expect the correspondence between predictions and Feedback to be stronger when confidence is high and weaker when confidence is low. Right: We expect that agents with better confidence calibration learn better. (D) Trial schema. Participants learned to produce a time interval by pressing a button following a tone with their left index finger. Following each response, they indicated in sequence on a visual analog scale their estimate of accuracy (anchors: ‘much too short’ = ‘viel zu kurz’ to ‘much too long’ = ‘viel zu lang’) and their confidence in that estimate (anchors: ‘not certain’ = ‘nicht sicher’ to ‘fully certain’ = ‘völlig sicher’) by moving an arrow slider. Finally, feedback was provided on a visual analog scale for 150 ms. The current error was displayed as a red square on the feedback scale relative to the target interval indicated by a tick mark at the center (Target, t), with undershoots shown to the left of the center and overshoots to the right, and scaled relative to the feedback anchors of -/+1 s (Scale, s; cf. E). Participants are told neither Target nor Scale and instead need to learn them based on the feedback. (E) Bayesian Learner with Performance Monitoring. The learner selects an intended response (i) based on the current estimate of the Target. The Intended Response and independent Response Noise produce the Executed Response (r). The Efference Copy (c) of this response varies in its precision as a function of Efference Copy Noise. It is used to generate a Prediction as the deviation from the estimate of Target, scaled by the estimate of Scale. The Efference Copy Noise is estimated and expressed as Confidence (co), approximating the precision of the Prediction. Learners vary in their Confidence Calibration (cc), that is, the fidelity with which Confidence tracks the precision of their predictions; higher Confidence Calibration (arrows: green > yellow > magenta) leads to more reliable translation from Efference Copy precision to Confidence.
Feedback is provided according to the Executed Response and depends on the Target and Scale, which are unknown to the learner. Target and Scale are inferred based on Feedback (f), Response Noise, Prediction, and Confidence. Variables that are observable to the learner are displayed in solid boxes, whereas variables that are only partially observable are displayed in dashed boxes. (F) Target and scale error (absolute deviation of the current estimates from the true values) for the Bayesian learner with Performance monitoring (green, optimal calibration), a Feedback-only Bayesian Learner (solid black), and a Bayesian Learner with Outcome Prediction (dashed black).

This ability may be afforded by reliance on internal models to predict the outcome of movements (Wolpert and Flanagan, 2001), for example, based on an efference copy of a motor command. These predictions could help discount execution errors in learning from feedback. In fact, if these predictions perfectly matched the execution error that occurred, the remaining mismatch between predicted and obtained feedback (sensory prediction error) could serve as a reliable basis for adaptation and render feedback maximally informative about the mapping from actions to outcomes (Figure 1B).

Although participants are able to evaluate their own performance reasonably well, error detection is far less certain than outlined in the ideal scenario above, and the true cause of feedback often remains uncertain to some extent. People are critically sensitive to uncertainty, and learn more from feedback when they expect it to be more informative (McGuire et al., 2014; Schiffer et al., 2017; Bland and Schaefer, 2012; Nassar et al., 2010; O'Reilly, 2013). Uncertainty about what caused a given feedback inevitably renders it less informative, similar to decreases in reliability, and this uncertainty should be taken into account when learning from it. Confidence could support such adaptive learning from feedback by providing a read-out of the subjective precision of predicted outcomes (Nassar et al., 2010; Vaghi et al., 2017; Meyniel et al., 2015; Pouget et al., 2016), possibly relying on shared neural correlates of confidence with error detection (Boldt and Yeung, 2015; van den Berg et al., 2016). Similar to its role in regulating learning of transition probabilities (Meyniel et al., 2015; Meyniel and Dehaene, 2017), information seeking/exploration in decision making (Desender et al., 2018a; Boldt et al., 2019), and hierarchical reasoning (Sarafyazd and Jazayeri, 2019), people could leverage confidence to calibrate their use of online predictions. In line with this suggestion, people learn more about advice givers when they are more confident in the choices that advice is about (Carlebach and Yeung, 2020). In the throwing example above, the more confident you are about the exact landing position of the dart, the more surprised you should be when you find that landing position to be different: The more confident you are, the more evidence you have that your internal model linking angles to landing positions is wrong, and the more information you get about how this model is wrong. Thus, you should learn more when you are more confident. However, this reasoning assumes that your predictions are in fact more precise when you are more confident, i.e., that your confidence is well calibrated (Figure 1B).

In the present study, we tested the hypothesis that performance monitoring – error detection and confidence (Yeung and Summerfield, 2012) – adaptively regulates learning from feedback. This hypothesis predicts that error detection and confidence afford better learning, with confidence mediating the relationship between outcome predictions and feedback, and that learning is compromised when confidence is mis-calibrated (Figure 1C). It further predicts that established neural correlates of feedback processing, such as the feedback-related negativity (FRN) and the P3a (Ullsperger et al., 2014a), should integrate information about post-response outcome predictions and confidence. That is to say, an error that could be predicted based on internal knowledge of how an action was executed should not yield a large surprise (P3a) or reward prediction error (FRN) signal in response to an external indicator of the error (feedback). However, any prediction error should be more surprising when predictions were made with higher confidence. We formalize our predictions using a Bayesian model of learning and test them using behavioral and EEG data in a modified time-estimation task.

Results

Rationale and approach

Our hypothesis that performance monitoring regulates adaptive learning from feedback makes two key behavioral predictions (Figure 1C): (1) The precision of outcome predictions (i.e. the correlation between predicted and actual outcomes) should increase with confidence. (2) Learners with superior calibration of confidence to the precision of their outcome predictions should learn more quickly. Our hypothesis further predicts that feedback processing will be critically modulated by an agent’s outcome prediction and confidence. We tested these predictions mechanistically using computational modeling and empirically based on behavioral and EEG data from 40 participants performing a modified time-estimation task (Figure 1D). In comparison to the darts throwing used in our example, the time-estimation task requires a simple response – a button press – such that errors map onto a single axis that defines whether the response was given too early, on time, or too late, and by how much. These errors can be mapped onto a feedback scale and, just as in the darts example where one learns the correct angle and acceleration to hit the bullseye, participants here can learn the target timing interval. In addition to requiring participants to learn and produce a precisely timed action on each trial, our task also included two key measurements that allowed us to better understand how performance monitoring affects feedback processing: (1) Participants were required to predict the feedback they would receive on each trial and indicate it on a scale visually identical to the feedback scale (Figure 1D, Prediction) and (2) Participants indicated their degree of confidence in this prediction (Figure 1D, Confidence). Only following these judgments would they receive feedback about their time-estimation performance.

A mechanism for performance monitoring-augmented learning

As a proof of concept for the hypothesized learning principles, we implemented a computational model that uses performance monitoring to optimize learning from feedback in that same task (Figure 1E). The agent’s goal is to learn the mapping between its actions and their outcomes (sensory consequences) in the time-estimation task, wherein feedback on an initially unknown scale must be used to learn accurately timed actions. Learning in this task is challenged in two ways: First, errors signaled by feedback include contributions of response noise, for example, through variability in the motor system or in the representations of time (Kononowicz and van Wassenhove, 2019; Balci et al., 2011). Second, the efference copy of the executed response (or the estimate of what was done) varies in its precision. To overcome these challenges, the agent leverages performance monitoring: It infers the contribution of response noise to a given outcome based on an outcome prediction derived from the efference copy, and the degree of confidence in its prediction based on an estimate of the current efference copy noise. The agent then weights Prediction and Intended Response as a function of Confidence and Response Noise when updating beliefs about the Target and the Scale based on Feedback.
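
To make the generative structure in Figure 1E concrete, the following is a minimal sketch of a single simulated trial, written in R. All variable names, parameter values, and the confidence read-out are illustrative assumptions, not the authors' implementation; the learner would subsequently invert this process to update its estimates of Target and Scale from Feedback, Prediction, and Confidence.

```r
# Minimal sketch of one trial of the generative process (illustrative values,
# Gaussian noise assumed; not the authors' implementation).
set.seed(1)
target     <- 1.5                     # true target interval (s), unknown to the learner
scale      <- 1.0                     # true feedback scale, unknown to the learner
sigma_resp <- 0.15                    # Response Noise (motor/timing variability)
sigma_copy <- runif(1, 0.05, 0.30)    # trial-wise Efference Copy Noise

est_target <- 1.2                     # learner's current estimate of the Target
est_scale  <- 0.8                     # learner's current estimate of the Scale

intended   <- est_target                        # Intended Response
response   <- rnorm(1, intended, sigma_resp)    # Executed Response
copy       <- rnorm(1, response, sigma_copy)    # noisy Efference Copy of the response
prediction <- (copy - est_target) / est_scale   # Prediction: deviation from estimated Target,
                                                #   scaled by the estimated Scale
confidence <- 1 / (1 + sigma_copy)              # Confidence: monotone read-out of copy precision
feedback   <- (response - target) / scale       # Feedback: depends on true Target and Scale
```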

We compare this model to one that has no insights into its trial-by-trial performance, but updates based on feedback and its fidelity due to response noise alone (Feedback), and another model that has insights into its trial-by-trial performance allowing it to generate predictions, and into the average precision of its predictions, but not the precision of its current prediction (Feedback + Prediction). We find that performance improves as the amount of insight into the agent’s performance increases (Figure 1F): The optimally calibrated Bayesian learner with performance monitoring outperforms both other models. Further, in line with our behavioral predictions, we find in this model that confidence varies with the precision of predictions (Figure 2A, Figure 2—figure supplement 1) and, when varying the fidelity of confidence as a read-out of precision (Confidence Calibration), agents with superior Confidence Calibration learn better (Figure 2B, Figure 2—figure supplement 1). We next sought to test whether participants’ behavior likewise displays these hallmarks of our hypothesis.

Figure 2. Relationships between outcome predictions and actual outcomes in the model and observed data (top vs. bottom).

(A) Model prediction for the relationship between Prediction and actual outcome (Feedback) as a function of Confidence. The relationship between predicted and actual outcomes is stronger for higher confidence. Note that systematic errors in the model’s initial estimates of target (overestimated) and scale (underestimated) give rise to systematically late responses, as well as underestimation of predicted outcomes in early trials, visible as a plume of datapoints extending above the main cloud of simulated data. (B) The model-predicted effect of Confidence Calibration on learning. Better Confidence Calibration leads to better learning. (C) Observed relationship between predicted and actual outcomes. Each data point corresponds to one trial of one participant; all trials of all participants are plotted together. Regression lines are local linear models visualizing the relationship between predicted and actual error separately for high, medium, and low confidence. At the edges of the plot, the marginal distributions of actual and predicted errors are depicted by confidence levels. (D) Change in error magnitude across trials as a function of confidence calibration. Lines represent LMM-predicted error magnitude for low, medium, and high confidence calibration, respectively. Shaded error bars represent corresponding SEMs. Note that the combination of linear and quadratic effects approximates the shape of the learning curves better than a linear effect alone, but predicts an exaggerated uptick in errors toward the end (Figure 2—figure supplement 3). Inset: Average Error Magnitude for every participant plotted as a function of Confidence Calibration level. The vast majority of participants show positive confidence calibration. The regression line represents a local linear model fit and the error bar represents the standard error of the mean.

Figure 2—figure supplement 1. Model comparison.

Shown are the relationship between predicted and actual outcomes as a function of Confidence (top) and the Target Error Magnitude as a function of Confidence Calibration (bottom) for the model that learns from feedback only (left), the model that learns from feedback and outcome predictions (center), and our Bayesian Learner with Performance Monitoring (right). While the Feedback-only model learns the target adequately, it is not able to predict the outcomes of its actions. The Feedback and Prediction model learns faster and is able to predict outcomes; however, it cannot distinguish between accurate and inaccurate predictions. Finally, the Bayesian Learner with Performance Monitoring predicts outcomes and distinguishes between accurate and inaccurate predictions. Whether these are advantageous depends on the fidelity of confidence as a read-out of the precision of the predictions, that is, confidence calibration. The well-calibrated Bayesian Learner with Performance Monitoring outperforms both alternative models.
Figure 2—figure supplement 2. Predictions and Confidence improve as learning progresses.

Plotted are actual errors as a function of predicted errors and confidence terciles. Regression lines represent local linear models. In block 1, many large actual errors are inaccurately predicted to be zero or small. These prediction errors decrease over time. Across blocks, Confidence further discriminates increasingly well between accurate and inaccurate predictions.
Figure 2—figure supplement 3. Running average log error magnitude across trials.

Running average performance averaged across participants within terciles of Confidence calibration. Shaded error bars represent standard error of the mean.

Confidence reflects precision of outcome predictions

To test the predictions of our model empirically, we examined behavior of 40 human participants performing the modified time-estimation task. To test whether the precision of outcome predictions increases with confidence, we regressed participants’ signed timing production errors (signed error magnitude; scale: undershoot [negative] to overshoot [positive]) on their signed outcome predictions (Predicted Outcome; same scale as for signed error magnitude), Confidence, Block, as well as their interactions. Our results support our first behavioral prediction (Table 1): As expected, predicted outcomes and actual outcomes were positively correlated, indicating that participants could broadly indicate the direction and magnitude of their errors. Crucially, this relationship between predicted and actual outcomes was stronger for predictions made with higher confidence (Figure 2C).

Table 1. Relations between actual performance outcome (signed error magnitude), predicted outcome, confidence in predictions and their modulations due to learning across blocks of trials.

Signed error magnitude
Predictors Estimates SE CI t p
Intercept 4.63 9.99 −14.94–24.20 0.46 6.427e-01
Predicted Outcome 523.99 29.66 465.86–582.12 17.67 7.438e-70
Block 29.47 8.12 13.56–45.37 3.63 2.832e-04
Confidence −27.07 11.05 −48.73 – −5.42 −2.45 1.428e-02
Predicted Outcome: Block −149.70 21.90 −192.62 – −106.78 −6.84 8.145e-12
Predicted Outcome: Confidence 322.56 27.31 269.03–376.09 11.81 3.477e-32
Block: Confidence −25.52 9.15 −43.46 – −7.58 −2.79 5.297e-03
Predicted Outcome: Block: Confidence 90.68 33.65 24.73–156.64 2.69 7.043e-03
Random effects Model Parameters
Residuals 54478.69 N 40
Intercept 3539.21 Observations 9996
Confidence 2813.79 log-Likelihood −68816.092
Predicted Outcome 22357.33 Deviance 137632.185

Formula: Signed error magnitude ~ Predicted Outcome * Block * Confidence + (Confidence + Predicted Outcome + Block | participant); Note: ':' indicates interactions between predictors.
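
As a concrete illustration of how the model in Table 1 could be fit, a hedged sketch using the lme4 package in R is shown below; the data frame `d` and its column names are assumptions about how the trial-level data are organized.

```r
# Hedged sketch of the mixed-effects model reported in Table 1 (lme4 syntax);
# `d` and its column names are assumed, one row per trial.
library(lme4)
m_signed_error <- lmer(
  signed_error ~ predicted_outcome * block * confidence +
    (confidence + predicted_outcome + block | participant),
  data = d
)
summary(m_signed_error)
```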

In addition to this expected pattern, we found that both outcome predictions, as well as confidence calibration, improved across blocks, suggestive of learning at the level of performance monitoring (Figure 2—figure supplement 2). Note however that participants tended to bias their predictions toward the center of the scale in early blocks, when they had little knowledge about the target interval and could thus determine neither over- vs. undershoots nor their magnitude. This strategic behavior may give rise to the apparent improvements in performance monitoring.

To test more directly our assumption that Confidence tracks the precision of predictions, we followed up on these findings with a complementary analysis of Confidence as the dependent variable and tested how it relates to the precision of predictions (absolute discrepancy between predicted and actual outcome, see sensory prediction error, SPE below), the precision of performance (error magnitude), and how those change across blocks (Table 2). Consistent with our assumption that Confidence tracks the precision of predictions, we find that it increases as the discrepancy between predicted and actual outcome decreases. Confidence was also higher for larger errors, presumably because their direction (i.e. overshoot or undershoot) is easier to judge. The relationships with both the precision of the prediction and error magnitude changed across blocks, and confidence increased across blocks as well.

Table 2. Relations of confidence with the precision of prediction and the precision of performance and changes across blocks.

Confidence
Predictors Estimates SE CI t p
(Intercept) 0.26 0.04 0.18–0.33 6.35 2.187e-10
Block 0.05 0.02 0.02–0.08 3.05 2.257e-03
Sensory Prediction Error (SPE) −0.44 0.04 −0.52 – −0.36 −10.84 2.289e-27
Error Magnitude (EM) 0.17 0.05 0.08–0.27 3.73 1.910e-04
Block: SPE −0.08 0.04 −0.15 – −0.00 −1.99 4.642e-02
Block: EM 0.15 0.05 0.05–0.25 3.07 2.167e-03
Random effects Model Parameters
 Residuals 0.12 N 40
 Intercept 0.06 Observations 9996
 SPE 0.03 log-Likelihood −3640.142
 Error Magnitude 0.06 Deviance 7280.284
 Block 0.01
 Error Magnitude: Block 0.04

Formula: Confidence ~ (SPE +Error Magnitude)*Block+(SPE +Error Magnitude *Block|participant); Note: ‘:” indicates interactions between predictors.

To test whether these effects reflect monotonic increases in confidence and its relationships with prediction error and error magnitude, as expected with learning, we fit a model with block as a categorical predictor and SPE and Error Magnitude nested within blocks (Supplementary file 1). We found that confidence increased numerically from each block to the next, with significant differences between block 1 and 2, as well as block 3 and 4. Its relationship to error magnitude was reduced in the first block compared to the remaining blocks and enhanced in the final two blocks compared to the remaining blocks. These findings are thus consistent with learning effects. While the precision of predictions was more strongly related to confidence in the final block compared to the remaining blocks, it was not less robustly related in the first block, and instead somewhat weaker in the third block. This pattern is thus not consistent with learning. Importantly, whereas error magnitude was robustly related to confidence only in the last two blocks, the precision of the prediction was robustly related to confidence throughout.

Having demonstrated that, across individuals, confidence reflects the precision of their predictions (via the correlation with SPE), we next quantified this relationship for each participant separately as an index of their confidence calibration. While quantifying the relationship, we controlled for changes in performance across blocks, and to ease interpretation, we sign-reversed the obtained correlations so that higher values correspond to better confidence calibration. We next tested our hypothesis that confidence calibration relates to learning.
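
One way this per-participant index could be computed is sketched below in R: the standardized relation between confidence and the sensory prediction error, controlling for block and sign-reversed so that higher values indicate better calibration. The data frame `d` and its column names are assumptions.

```r
# Hedged sketch of a per-participant confidence-calibration index; column names
# (`participant`, `confidence`, `spe`, `block`) are assumptions.
calibration <- sapply(split(d, d$participant), function(p) {
  fit <- lm(scale(confidence) ~ scale(spe) + factor(block), data = p)
  -coef(fit)[["scale(spe)"]]   # sign-reversed: higher = better calibration
})
```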

Superior calibration of confidence judgments relates to superior learning

To empirically test our second behavioral prediction, that people with better confidence calibration learn faster, we modeled log-transformed trial-wise error magnitude as a function of Trial (linear and quadratic effects to account for non-linearity in learning, that is, stronger improvements at the beginning), Confidence Calibration for each participant (Figure 2D inset), and their interaction (Table 3). As expected, Confidence Calibration interacted significantly with the linear Trial component, that is, with learning (Figure 2D). Thus, participants with better confidence calibration showed greater performance improvements during the experiment. Importantly, Confidence Calibration did not significantly correlate with overall performance (Figure 2D inset), supporting the assumption that confidence calibration relates to learning (performance change), rather than performance per se. Confidence calibration was also not correlated with individual differences in response variance (r = −2.07e-4, 95% CI = [−0.31, 0.31], p=0.999), and the interaction of confidence calibration and block was robust to controlling for running average response variance (Supplementary file 2).

Table 3. Confidence calibration modulation of learning effects on performance.

log Error Magnitude
Predictors Estimates SE CI t p
(Intercept) 5.17 0.06 5.05–5.30 80.74 0.000e+00
Confidence Calibration 0.58 0.58 −0.57–1.72 0.99 3.228e-01
Trial (linear) −0.59 0.07 −0.72 – −0.45 −8.82 1.197e-18
Trial (quadratic) 0.16 0.02 0.11–0.20 6.80 1.018e-11
Trial (linear): Confidence Calibration −0.86 0.32 −1.48 – −0.24 −2.72 6.467e-03
Random effects Model Parameters
 Residuals 1.18 N 40
 Intercept 0.12 Observations 9996
 Trial (linear) 0.03 log-Likelihood −15106.705
Deviance 30213.411

Formula: log Error Magnitude ~ Confidence Calibration * Trial(linear) + Trial(quadratic) + (Trial(linear) | participant); Note: ':' indicates interactions between predictors.

Thus, taken together, our model simulations and behavioral results align with the behavioral predictions of our hypothesis: Participants’ outcome predictions were better related to actual outcomes when those outcome predictions were made with higher confidence, and individuals with superior confidence calibration showed better learning.

Outcome predictions and confidence modulate feedback signals and processing

At the core of our hypothesis and model lies the change in feedback processing as a function of outcome predictions and confidence. It is typically assumed that learning relies on prediction errors, and signatures of prediction errors have been found in scalp-recorded EEG signals. Before testing directly how feedback is processed, as reflected in distinct feedback-related ERP components, we will show how these prediction errors vary over time and as a function of confidence.

We dissociate three signals that can be processed to evaluate feedback (Figure 3A): The objective magnitude of the error (Error Magnitude) reflects the degree to which performance needs to be adjusted, regardless of whether that error was predicted or not. The reward prediction error (RPE), thought to drive reinforcement learning, indexes whether the outcome of a particular response was better or worse than expected. The sensory prediction error (SPE), thought to underlie forward model-based and direct policy learning in the motor domain (Hadjiosif et al., 2020), indexes whether the outcome of a particular response was close to or far from the predicted one. To illustrate the difference between the two prediction errors, one might expect to miss a target 20 cm to the left but find that the dart misses it 20 cm to the right instead. There is no RPE, as the actual outcome is exactly as good or bad as the predicted one; however, there is a large SPE, because the actual outcome is very different from the predicted one.
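
For concreteness, the sketch below computes the three trial-wise signals in R, following the operationalization given in Figure 3A; `feedback` and `prediction` are assumed to be coded as signed deviations from the target (negative = undershoot, positive = overshoot), and the example values are illustrative.

```r
# Hedged sketch of the three error signals derived from feedback and prediction.
feedback   <- c(-0.20, 0.10, 0.30)     # illustrative actual outcomes
prediction <- c(-0.20, -0.10, 0.00)    # illustrative predicted outcomes

error_magnitude <- abs(feedback)               # objective distance from the goal
rpe <- abs(prediction) - abs(feedback)         # reward prediction error: positive when the
                                               #   outcome is closer to the goal than predicted
spe <- abs(feedback - prediction)              # sensory prediction error: distance between
                                               #   actual and predicted outcome
```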

Figure 3. Changes in objective and subjective feedback.

(A) Dissociable information provided by feedback. An example for a prediction (hatched box) and a subsequent feedback (red box) are shown overlaid on a rating/feedback scale. We derived three error signals that make dissociable predictions across combinations of predicted and actual outcomes. The solid blue line indicates Error Magnitude (distance from outcome to goal). As smaller errors reflect greater rewards, we computed Reward Prediction Error (RPE) as the signed difference between negative Error Magnitude and the negative predicted error magnitude (solid orange line, distance from prediction to goal). Sensory Prediction Error (SPE, dashed line) was quantified as the absolute discrepancy between feedback and prediction. Values of Error Magnitude (left), RPE (middle), and SPE (right) are plotted for all combinations of prediction (x-axis) and outcome (y-axis) location. (B) Predictions and confidence associate with reduced error signals. Average error magnitude (left), Reward Prediction Error (center), and Sensory Prediction Error (right) are shown for each block and confidence tercile. Average prediction errors are smaller than average error magnitudes (dashed circles), particularly for higher confidence.

Our hypothesis holds that predictions should help discount noise in the error signal, and more so for higher confidence. Prediction errors should thus be smaller than error magnitude, particularly when confidence is higher. We find that this is true in our data (Figure 3B, Supplementary files 3 and 4; note that, unlike SPE, by definition RPE cannot be larger than error magnitude, and that its magnitude, but not its sign, varies robustly with confidence).

To examine changes in these error signals with trial-to-trial changes in confidence and learning, we regressed each of these signals onto Confidence, Block, and their interaction (Supplementary file 5, Figure 3B). Consistent with our assumption that confidence tracks the precision of predictions, SPE decreased as confidence increased (b = −71.20, p<0.001), but there were no significant main effects on error magnitude or reward prediction error. However, Confidence significantly interacted with Block on all variables (Error Magnitude: b = 30.09, p<0.001, RPE: b = −64.48, p<0.001, SPE: b = 16.99, p=0.005), such that in the first block, increased Confidence was associated with smaller Error Magnitudes, less negative RPE, as well as smaller SPE. All error signals further decreased significantly across blocks (Error Magnitude: b = −37.10, p<0.001, RPE: b = 36.26, p<0.001, SPE: b = −17.54, p<0.001; block-wise comparisons significant only from block 1 to 2, Supplementary file 6). These parallel patterns might emerge because prediction errors are derived from, and thus might covary with, error magnitude. To test whether changes in prediction errors were primarily driven by improvements in error magnitude rather than predictions, we reran the previous RPE and SPE models with error magnitude as a covariate (Supplementary file 7). Controlling for error magnitude notably reduced linear block effects on RPE (b = 36.26 to b = 10.3). It further eliminated block effects on SPE (b = −17.54 to b = 3.65, p=0.274), as well as the interaction of confidence and Block (b = 0.10, p=0.984), while the hypothesized main effect of Confidence prevailed (b = −60.12, p<0.001).

In summary, we find that all error signals decrease across blocks as performance improves. Although higher confidence is associated with smaller error signals in all three variables early in learning, across all blocks we find that confidence only has a consistent relationship with smaller sensory prediction errors.

Taken together, these results are consistent with our hypothesis that outcome predictions and confidence optimize feedback processing. Accordingly, we predicted that participants’ internal evaluations would modulate feedback processing as indexed by distinct feedback-related potentials in the EEG: the feedback-related negativity (FRN), P3a and P3b. Thus, the amplitude of a canonical index of signed RPE (Holroyd and Coles, 2002), the FRN, should increase to the extent that outcomes were worse than predicted, that is, with more negative-going RPE. P3a amplitude, a neural signature of surprise (Polich, 2007), should increase with the absolute difference between participants’ outcome predictions and actual outcomes (i.e. with SPE) and be enhanced in trials in which participants indicated higher confidence in their outcome predictions. To further explore the possible role of performance monitoring in learning, we also tested the joint effects of our experimental variables on the P3b as a likely index of learning (Fischer and Ullsperger, 2013).

If participants did not take their predictions into account, ERP amplitudes should scale with the actual error magnitude reflected in the feedback (Error Magnitude). Note that both RPE and SPE are equivalent to Error Magnitude in the special case where predicted errors are zero (Figure 3A), and thus Error Magnitude can be thought of as the default RPE and SPE that would arise if an individual predicted perfect execution on each trial. Thus, if participants did not take knowledge of their own execution errors into account, their FRN and P3a should both simply reflect Error Magnitude. A key advantage of our experimental design is that RPE, SPE, and Error Magnitude vary differentially as a function of actual and predicted outcomes (Figure 3A), which allowed us to test our predictions by examining whether ERP amplitudes are modulated by prediction errors (SPE and RPE) and Confidence, while controlling for other factors including Error Magnitude.

Reward prediction error modulates the feedback-related negativity

The feedback-related negativity (FRN) is an error-sensitive ERP component with a fronto-central scalp distribution that peaks between 230 and 330 ms following feedback onset (Miltner et al., 1997; Figure 4A). It is commonly thought to index neural encoding of RPE (Holroyd and Coles, 2002): Its amplitude increases with the degree to which an outcome is worse than expected and, conversely, decreases to the extent that outcomes are better than expected (Hajcak et al., 2006; Holroyd et al., 2006; Walsh and Anderson, 2012; Holroyd et al., 2003; Sambrook and Goslin, 2015). Its amplitude thus decreases with increasing reward magnitude (Frömer et al., 2016a) and reward expectancy (Lohse et al., 2020). However, it is unknown whether reward prediction errors signaled by the FRN contrast current feedback with predictions based only on previous (external) feedback, or whether they might incorporate ongoing (internal) performance monitoring. Based on our overarching hypothesis, we predicted that FRN amplitude would scale with our estimate of RPE, which quantifies the degree to which actual feedback was ‘better’ than the feedback predicted after action execution, causing more negative RPEs to produce larger FRN amplitudes (Figure 4B). A key alternative is that the FRN indexes the magnitude of an error irrespective of the participant’s post-response outcome prediction (e.g. with a large FRN to feedback indicating a large error, even when the participant knows they committed this error) (Pfabigan et al., 2015; Sambrook and Goslin, 2014; Talmi et al., 2013). Note that the prediction errors experienced by most error-driven learning models would fall into this alternative category, as they would reflect the error magnitude minus some long-term expectation of that magnitude, but would not update these expectations after action execution. Thus, to test whether RPE explains variation in the FRN above and beyond Error Magnitude, and to control for other factors, we included Error Magnitude, SPE, Confidence, and Block in the model (Table 4).

Figure 4. Multiple prediction errors in feedback processing.

(A–C) FRN amplitude is sensitive to predicted error magnitude. (A) FRN, grand mean; the shaded area marks the time interval for peak-to-peak detection of the FRN. Negative peaks between 200 and 300 ms post feedback were quantified relative to positive peaks in the preceding 100 ms time window. (B) Expected change in FRN amplitude as a function of RPE (color) for two predictions (black curves represent schematized predictive distributions around the reported prediction for a given confidence), one too early (top: high confidence in a low reward prediction) and one too late (bottom: low confidence in a higher reward prediction). Vertical black arrows mark a sample outcome (deviation from the target; abscissa) resulting in different RPEs/expected changes in FRN amplitude for the two predictions, indicated by shades. Blue shades indicate negative RPEs/larger FRN, red shades indicate positive RPEs/smaller FRN, and gray denotes zero. Note that these are mirrored at the goal for any predictions, and that the likelihood of the actual outcome given the prediction (y-axis) does not affect RPE. In the absence of a prediction, or with a predicted error of zero, FRN amplitude should increase with the deviation from the target (abscissa). (C) LMM-estimated effects of RPE on peak-to-peak FRN amplitude visualized with the effects package; shaded error bars represent 95% confidence intervals. (D–I) P3a amplitude is sensitive to SPE and Confidence. (D) Grand mean ERP with the time window for quantification of the P3a, 330–430 ms, highlighted. (E) Hypothetical internal representation of predictions. Curves represent schematized predictive distributions around the reported prediction (zero on the abscissa). Confidence is represented by the width of the distributions. (F) Predictions for SPE (x-axis) and Confidence (y-axis) effects on surprise as estimated with Shannon information (darker shades signify larger surprise) for varying Confidence and SPE (center). The margins visualize the predicted main effects for Confidence (left) and SPE (bottom). (G) P3a LMM fixed-effect topographies for SPE and Confidence. (H–I) LMM-estimated effects on P3a amplitude visualized with the effects package; shaded areas in (H) (SPE) and (I) (Confidence) represent 95% confidence intervals.
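
A hedged sketch of the peak-to-peak FRN quantification described in panel (A) is given below in R; the data frame `erp`, its column names, and the single-channel layout are assumptions.

```r
# Hedged sketch of peak-to-peak FRN scoring for one fronto-central waveform;
# `erp` is assumed to have columns `time_ms` (relative to feedback onset) and
# `amplitude` (µV).
neg_win    <- erp$time_ms >= 200 & erp$time_ms <= 300     # search window for the negative peak
neg_peak_i <- which(neg_win)[which.min(erp$amplitude[neg_win])]
neg_peak_t <- erp$time_ms[neg_peak_i]

pos_win  <- erp$time_ms >= (neg_peak_t - 100) & erp$time_ms < neg_peak_t  # preceding 100 ms
pos_peak <- max(erp$amplitude[pos_win])

frn <- erp$amplitude[neg_peak_i] - pos_peak   # negative values; more negative = larger FRN
```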

Table 4. LMM statistics of learning effects on FRN.

Peak-to-Peak FRN amplitude
Predictors Estimates SE CI t p
Intercept −12.67 0.49 −13.62 – −11.71 −26.03 2.322e-149
Confidence −0.19 0.15 −0.49–0.11 −1.25 2.126e-01
Reward prediction error 1.43 0.41 0.62–2.24 3.47 5.302e-04
Sensory prediction error −0.67 0.42 −1.49–0.15 −1.61 1.078e-01
Error magnitude 0.51 0.55 −0.57–1.58 0.92 3.553e-01
Block −0.15 0.11 −0.36–0.06 −1.43 1.513e-01
Random effects Model Parameters
 Residuals 27.69 N 40
 Intercept 9.23 Observations 9678
 Error magnitude 2.24 log-Likelihood −29908.910
 Block 0.22 Deviance 59817.821

Formula: FRN ~ Confidence + RPE + SPE + EM + Block + (EM + Block | participant).

As predicted, FRN amplitude decreased with more positive-going RPEs (b = 1.43, p<0.001, Figure 4C), extending previous work that investigated prediction errors as a function of reward magnitude and frequency (Holroyd and Coles, 2002; Sambrook and Goslin, 2015). In contrast, error magnitude and SPE did not significantly affect FRN amplitude, suggesting, in the case of error magnitude, that when errors can be accounted for by faulty execution, they do not drive internal reward prediction error signals. We found no other reliable effects, and when interaction terms were included, they were neither significant nor supported by model selection (ΔΧ2(10) = 10.98, p=0.359; AICreduced − AICfull = −9, BICreduced − BICfull = −81). We conclude that FRN amplitude reflects the degree to which feedback is better than predicted, and critically, that the outcome predictions incorporate information about likely execution errors.

Sensory prediction error and confidence modulate P3a

The frontocentral P3a is a surprise-sensitive positive-going deflection between 250 and 500 ms following stimulus onset (Figure 4D; Polich, 2007). Its functional significance can be summarized as signaling the recruitment of attention for action in response to surprising and motivationally relevant stimuli (Polich, 2007; Nieuwenhuis et al., 2011). P3a has been shown to increase with larger prediction errors in probabilistic learning tasks (Fischer and Ullsperger, 2013), higher goal-relevance in a go/no-go task (Walentowska et al., 2016), with increasing processing demands (Frömer et al., 2016b), and with meta-memory mismatch (feedback about incorrect responses given with high confidence [Butterfield and Mangels, 2003]).

Surprise can be quantified using Shannon Information, which reflects the amount of information provided by an outcome given a probability distribution over outcomes (O'Reilly et al., 2013). As seen in Figure 4F, this measure scales with increasing confidence, as well as SPE, that is, increasing deviations between predicted and actual outcome (margins). To generate these predictions, we computed the Shannon Information for a range of outcomes given a range of predictive distributions with varying precision, assuming that confidence reflects the precision of a distribution of predicted outcomes (Figure 4E). Thus, P3a amplitude should scale with both SPE and Confidence. We tested our predictions by examining whether P3a was modulated by SPE, and Confidence, in a model that also included Error Magnitude, RPE, and Block as control variables.
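
The sketch below reproduces the logic of this simulation in R: Shannon information (negative log probability) of an outcome under a Gaussian predictive distribution whose width shrinks with confidence. The mapping from confidence to predictive SD and the grid values are illustrative assumptions.

```r
# Hedged sketch of the surprise simulation summarized in Figure 4F.
spe        <- seq(0, 1, length.out = 100)     # |actual - predicted| outcome
confidence <- seq(0.2, 1, length.out = 100)   # higher confidence -> more precise prediction
sd_pred    <- 0.5 / confidence                # assumed confidence-to-precision mapping

surprise <- outer(spe, sd_pred,
                  function(s, sd) -dnorm(s, mean = 0, sd = sd, log = TRUE))

# Surprise grows with SPE and, on average, with confidence, as in Figure 4F.
image(spe, confidence, surprise, xlab = "SPE", ylab = "Confidence")
```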

As predicted, our analyses showed that P3a amplitude significantly increased with increasing SPE, in line with the idea that less accurately predicted outcomes produce stronger violations of expectations (Figure 4G and I, Table 5), and with increasing Confidence (Figure 4G and H). Our Shannon Information simulation also predicts a small interaction between SPE and Confidence (see the slight diagonal component in Figure 4F). However, when the interaction term was included it was not significant and did not improve model fit (ΔΧ2(10) = 10.36, p=0.410; AICreduced − AICfull = −10, BICreduced − BICfull = −81), suggesting that any such effect was minimal.

Table 5. LMM statistics of learning effects on P3a.

P3a Amplitude
Predictors Estimates SE CI t p
Intercept 4.10 0.42 3.28–4.93 9.79 1.293e-22
Confidence 0.97 0.14 0.70–1.24 6.96 3.338e-12
Block −0.91 0.07 −1.05 – −0.77 −12.93 3.201e-38
Sensory prediction error 2.06 0.48 1.11–3.00 4.27 1.969e-05
Reward prediction error −0.75 0.38 −1.49 – −0.01 −1.98 4.794e-02
Error magnitude −1.95 0.44 −2.81 – −1.09 −4.43 9.512e-06
Random effects Model Parameters
 Residuals 22.98 N 40
 Intercept 6.83 Observations 9678
 SPE 3.02 log-Likelihood −28997.990
Deviance 57995.981

Formula: P3a ~ Confidence + Block + SPE + RPE + EM + (SPE | participant).

In addition, P3a amplitude decreased across blocks, perhaps reflecting decreased motivational relevance of feedback as participants improved their performance and predictions (Walentowska et al., 2016; Severo et al., 2020). We also observed a significant decrease of P3a with increasing Error Magnitude and larger P3a amplitudes for more negative reward prediction errors. However, these effects showed more posterior scalp distributions than those of SPE and confidence. As P3a temporally overlaps with the more posteriorly distributed P3b, these effects are likely a spillover of the P3b. Hence, we discuss them below. Taken together our results support our hypothesis that predictions and confidence shape feedback processing at the level of the P3a.

Prediction errors, objective errors, and confidence converge in the P3b

Our overarching hypothesis and model predict that outcome predictions and confidence should affect the degree to which feedback is used for future behavioral adaptation. The parietally distributed P3b scales with learning from feedback (Ullsperger et al., 2014a; Fischer and Ullsperger, 2013; Yeung and Sanfey, 2004; Sailer et al., 2010; Chase et al., 2011) and predicts subsequent behavioral adaptation (Fischer and Ullsperger, 2013; Chase et al., 2011). P3b amplitude has been found to increase with feedback salience (reward magnitude irrespective of valence; Yeung and Sanfey, 2004), behavioral relevance (choice vs. no choice; Yeung et al., 2005), with more negative-going RPE (Ullsperger et al., 2014a; Fischer and Ullsperger, 2014), but also with better outcomes in more complex tasks (Pfabigan et al., 2014).

Consistent with their necessity for governing behavioral adaptation, P3b was sensitive to participants’ outcome predictions (Table 6). P3b amplitude increased with increasing SPE (Figure 5B,E), indicating that participants extracted more information from the feedback stimulus when outcomes were less expected. As for P3a, this SPE effect decreased across blocks, and so did overall P3b amplitude, suggesting that participants made less use of the feedback as they improved on the task (Fischer and Ullsperger, 2013).

Table 6. LMM statistics of learning effects on P3b.

P3b Amplitude
Predictors Estimates SE CI t p
Intercept 4.12 0.29 3.55–4.70 14.12 2.937e-45
Block −0.48 0.09 −0.66 – −0.30 −5.20 2.037e-07
Confidence 0.08 0.20 −0.31–0.48 0.42 6.740e-01
Reward prediction error −1.12 0.46 −2.03 – −0.22 −2.43 1.493e-02
Sensory prediction error 1.75 0.47 0.84–2.66 3.76 1.691e-04
Error magnitude −2.35 0.46 −3.24 – −1.45 −5.14 2.743e-07
Confidence: Reward prediction error −0.51 0.55 −1.60–0.57 −0.92 3.556e-01
Block: Confidence 0.07 0.18 −0.28–0.43 0.41 6.823e-01
Block: Reward prediction error −0.52 0.44 −1.39–0.34 −1.19 2.359e-01
Block: Sensory prediction error −0.98 0.46 −1.88 – −0.07 −2.12 3.405e-02
Block: Confidence: Reward prediction error 2.22 0.72 0.81–3.64 3.08 2.057e-03
Random effects
 Residuals 23.95 N 40
 Intercept 3.17 Observations 9678
 Sensory Prediction Error 2.16 log-Likelihood −29197.980
 Reward prediction error 1.67 Deviance 58395.960
 Confidence 0.63

Formula: P3b ~ Block * (Confidence * RPE + SPE) + Error Magnitude + (SPE + RPE + Confidence | participant); Note: ':' indicates interactions.

Figure 5. Performance-relevant information converges in the P3b.

(A) Grand average ERP waveform at Pz with the time window for quantification, 416–516 ms, highlighted. (B) Effect topographies as predicted by LMMs for RPE, error magnitude, SPE, and the RPE by Confidence by Block interaction. (C–F) LMM-estimated effects on P3b amplitude visualized with the effects package in R; shaded areas represent 95% confidence intervals. (C) RPE. Note the interaction effects with Block and Confidence (D) that modulate this main effect. (D) Three-way interaction of RPE, Confidence, and Block. Asterisks denote significant RPE slopes within cells. (E) P3b amplitude as a function of SPE. (F) P3b amplitude as a function of Error Magnitude.

Figure 5—figure supplement 1. P3b to feedback modulates error-related adjustments on the subsequent trial.

Top: Model-predicted values (y-hats) illustrating the interaction of previous error magnitude and P3b on improvement on the current trial, for each block. Bottom: Distribution of errors in each block.

P3b amplitude also increased with negative-going RPE (Figure 5B,C), hence for worse-than-expected outcomes, replicating previous work (Ullsperger et al., 2014a; Fischer and Ullsperger, 2014). This RPE effect was significantly modulated by Confidence and Block, indicating that the main effect needs to be interpreted with caution and that the relationship between P3b and RPE is more nuanced than the previous literature suggested. As shown in Figure 5D, in the first block P3b amplitude was highest for large negative RPE and high Confidence, whereas in the last block it was highest for large negative RPE and low Confidence (see below for follow-up analyses).

In line with previous findings (Pfabigan et al., 2014; Ernst and Steinhauser, 2018), we further observed significant increases of P3b amplitude with decreasing Error Magnitude, thus with better outcomes (Figure 5B,F, Table 6). We found no further significant interactions, and excluding the non-significant interaction terms from the full model did not significantly diminish goodness of fit (ΔΧ2(5) = 10.443, p=0.064; AICreduced − AICfull = 0, BICreduced − BICfull = −35).

Our hypothesis states that the degree to which people rely on their predictions when learning from feedback should vary with their confidence in those predictions. In the analysis above, we observed such an interaction with confidence only for RPE (and Block). RPE is derived from the contrast between Error Magnitude and Predicted Error Magnitude, and changes in either variable or their weighting could drive the observed interaction. To better understand this interaction and test explicitly whether confidence regulates the impact of predictions, we therefore ran complementary analyses where instead of RPE we included Predicted Error Magnitude (Table 7). Confirming its relevance for the earlier interaction involving RPE, Predicted Error Magnitude indeed interacted with Confidence and Block. Consistent with confidence-based regulation of learning, a follow-up analysis showed that in the first block, P3b significantly increased with higher Confidence, and importantly decreased significantly more with increasing Predicted Error Magnitude as Confidence increased (Supplementary file 8). Main effects of Predicted Error Magnitude emerged only in the late blocks when participants were overall more confident.

Table 7. LMM statistics of confidence weighted predicted error discounting on P3b.

P3b Amplitude
Predictors Estimates SE CI t p
Intercept 4.26 0.30 3.68–4.85 14.22 7.239e-46
Confidence 0.31 0.22 −0.12–0.75 1.41 1.595e-01
Predicted error magnitude −0.83 0.46 −1.74–0.07 −1.80 7.133e-02
Block −0.32 0.11 −0.52 – −0.11 −2.98 2.860e-03
Error magnitude −1.06 0.49 −2.03 – −0.09 −2.13 3.277e-02
Sensory prediction error 1.49 0.40 0.71–2.28 3.72 1.992e-04
Confidence: Predicted error magnitude −0.98 0.69 −2.34–0.38 −1.41 1.582e-01
Confidence: Block −0.50 0.20 −0.90 – −0.11 −2.50 1.249e-02
Predicted Error magnitude: Block −1.12 0.56 −2.22 – −0.02 −2.00 4.540e-02
Confidence: Predicted error magnitude: Block 3.12 0.84 1.47–4.78 3.70 2.141e-04
Random effects Model Parameters
 Residuals 23.98 N 40
 Intercept 3.30 Observations 9678
 Error magnitude 3.43 log-Likelihood −29201.951
 Confidence 0.72 Deviance 58403.902

Formula: P3b ~ Block * (Confidence * Predicted Error Magnitude + SPE) + Error Magnitude + (Error Magnitude + Confidence | participant); Note: ':' indicates interactions.

Hence, our P3b findings indicate that early on in learning, when little is known about the task, participants learn more and discount their predictions more when they have high confidence in those predictions. In later trials however, when confidence is higher overall, participants discount their predicted errors even when confidence is relatively lower.

We next explored whether P3b amplitude is associated with trial-by-trial adjustments. To that aim, we computed the improvement on trial n as the difference between the error on trial n and the error on trial n-1. Time-estimation responses are noisy, and thus provide only a coarse trial-by-trial indicator of learning. Consistent with regression to the mean, where larger errors are more likely followed by smaller errors, improvements increased with the magnitude of the error on the previous trial (b = 0.85, p<0.001, Supplementary file 9). We find, however, that this effect varies across blocks (b = 0.13, p<0.001), and is least pronounced in the first block when most learning takes place (Block 1: b = 0.66, p<0.001; Block 2–5: b >= 0.90, p<0.001, Supplementary file 10). We thus next tested whether P3b on trial n-1 mediates the relationship between error magnitude on trial n-1 and the improvement on the current trial, leading to stronger improvements following a given error, particularly in the first block when most learning takes place and performance is less determined by previous error alone. Indeed, we found a significant three-way interaction between previous error magnitude, previous P3b amplitude and Block (b = - 0.03, p=0.031, Supplementary file 9, Figure 5—figure supplement 1) on improvement. A follow-up analysis confirmed that P3b mediated the relationship between previous error magnitude and improvement in the first block (b = 0.06, p<0.001, Supplementary file 10). This interaction was not significant within any of the remaining blocks. While intriguing and in line with previous work linking P3b to trial-by-trial adjustments in behavior, these results should be interpreted with a degree of caution given that the present task is not optimized to test for trial-to-trial adjustments in behavior.
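
For illustration, one plausible way to derive the trial-wise improvement measure is sketched below in R, coded so that positive values indicate a reduction in error relative to the previous trial; the column names and this sign convention are assumptions.

```r
# Hedged sketch of the trial-wise improvement measure (previous error minus
# current error, within participant); column names are assumptions.
d <- d[order(d$participant, d$trial), ]
lag1 <- function(x) c(NA, head(x, -1))
d$improvement <- ave(d$error_magnitude, d$participant,
                     FUN = function(x) lag1(x) - x)
```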

Taken together, our ERP findings support our main hypothesis that individuals take their internal evaluations into account when processing feedback, such that distinct ERP components reflect different aspects of internal evaluations rather than just signaling objective error.

Discussion

The present study explored the hypothesis that learning from feedback depends on internal performance evaluations as reflected in outcome predictions and confidence. Comparing different Bayesian agents with varying insights into their trial-by-trial performance, we show that performance monitoring provides an advantage in learning, as long as agents’ confidence is accurately calibrated. To test our hypothesis empirically, we collected participants’ trial-wise outcome predictions and confidence judgments in a time-estimation task prior to receiving feedback, while recording EEG. Like the simulations from the Bayesian learner with performance monitoring, our empirical results show that trial-by-trial confidence tracks the precision of outcome predictions, and individuals with better coupling between confidence and the precision of their predictions (confidence calibration) showed greater improvements in performance over the course of the experiment. Moreover, participants’ subjective predictions, as well as their confidence in those predictions, influenced feedback processing as revealed by feedback-related potentials.

Our study builds on an extensive body of work on performance monitoring, proposing that deviations from performance goals are continuously monitored, and expectations are updated as soon as novel information becomes available (Holroyd and Coles, 2002; Ullsperger et al., 2014b). Hence, performance monitoring at later stages should depend on performance monitoring at earlier stages (Holroyd and Coles, 2002). Specifically, learning from feedback should critically depend on internal performance monitoring. Our results extend previous work demonstrating a shift from feedback-based to response-based evaluations as learning progresses (Bellebaum and Colosio, 2014; Bultena et al., 2017): They show that performance monitoring and learning from feedback are not mutually exclusive modes of performance evaluation; instead, they operate in concert, with confidence in response-based outcome predictions determining the degree to which this information is relied on.

Participants’ behavior displayed hallmarks of error monitoring (Kononowicz et al., 2019; Akdoğan and Balcı, 2017; Kononowicz and van Wassenhove, 2019), such that outcome predictions tracked factual errors in both direction and magnitude. Crucially, extending those previous findings, our empirical results align with unique predictions based on our hypothesis: Confidence reflected the precision of participants’ outcome predictions, and participants with superior calibration of their confidence judgments to the accuracy of their predictions learned better than those with poorer calibration. This latter finding is notable given that overall confidence calibration was similar for participants with different performance quality (error magnitude, response variance). Therefore, the empirical confidence calibration effect on learning is unlikely to be a consequence of better overall ability as described in the ‘unskilled and unaware effect’ (Kruger and Dunning, 1999) or of the dependence of confidence calibration (or metacognitive sensitivity) on performance (Fleming and Lau, 2014). Instead, the finding supports our hypothesis that confidence supports learning via optimized feedback processing.

Our simulations and ERP data reveal two critical mechanisms through which performance monitoring may impact learning from feedback: modulation of surprise and reduction of uncertainty via credit assignment. The main impact of error monitoring is to reduce the surprise about outcomes. All else being equal, a given outcome is less surprising the better it matches the predicted outcome. Consistent with discounting of predicted deviations from the goal, we found that participants’ trial-by-trial subjective outcome predictions consistently modulated feedback-based evaluation reflected in ERPs as evidenced by prediction error effects. Participants’ response-based outcome predictions were reflected in the amplitudes of FRN reward prediction error signals (Holroyd and Coles, 2002; Walsh and Anderson, 2012; Sambrook and Goslin, 2015; Correa et al., 2018), of P3a surprise signals, as well as the P3b signals combining information about reward prediction error and surprise. In our computational models, reducing surprise by taking response-based outcome predictions into account led to more accurate updating of internal representations supporting action selection and thus superior learning.

Learning – in our computational model and in our participants – was further supported by the adaptive regulation of uncertainty-driven updating via confidence. Specifically, as deviations from the goal were predicted with higher confidence, these more precise outcome predictions enhanced the surprise elicited by a given prediction error. This mechanism implemented in our model is mirrored in participants’ increased P3a amplitudes for higher confidence, and further reflected in confidence-weighted impacts of predicted error magnitude on P3b, as well as larger P3b amplitudes for higher confidence in the first block when most learning took place. Thus, a notable finding revealed by our simulations and empirical data is that, counterintuitively, agents and participants learned more from feedback when confidence in their predictions had been high.

Although FRN amplitude was not modulated by confidence, we found that P3a increased with confidence, as predicted by uncertainty-driven changes in surprise. Our results align with previous findings of larger P3a amplitude for metacognitive mismatch (Butterfield and Mangels, 2003) and offer a computational mechanism underlying previous theorizing that feedback about errors committed with high confidence attracts more attention, and therefore leads to hypercorrection (Butterfield and Metcalfe, 2006; Butterfield and Metcalfe, 2001). We also found that confidence modulated the degree to which predicted error magnitude reduced P3b amplitude, such that in initial blocks, where most learning took place, predicted error magnitude effects were amplified for higher confidence, whereas this effect diminished in later blocks, where predicted error magnitude effects were present also for low confidence (and performance and prediction errors were attenuated when confidence was high). This shift is intriguing and may indicate a functional change in feedback use as certainty in the response-outcome mapping increases and less about this mapping is learned from feedback, but the effect was not directly predicted and therefore warrants further research and replication.

Confidence has typically been studied in two-alternative choice tasks, and only rarely in relation to continuous outcomes (Meyniel et al., 2015; Meyniel and Dehaene, 2017; Boldt et al., 2019; Lebreton et al., 2015; Nassar et al., 2012; Arbuzova, 2020). By reconceptualizing error detection as outcome prediction, our results shed new light on the well-supported claim that error monitoring and confidence are tightly intertwined (Boldt and Yeung, 2015; Yeung and Summerfield, 2012; Charles and Yeung, 2019; Desender et al., 2018b; Desender et al., 2019) and forge valuable links between research on performance monitoring (Ullsperger et al., 2014a; Holroyd and Coles, 2002; Ullsperger et al., 2014b) and on learning under uncertainty (McGuire et al., 2014; Behrens et al., 2007; O'Reilly et al., 2013; Nassar et al., 2019). In doing so, our results provide further evidence to the growing literature on the role of confidence in learning and behavioral adaptation (Meyniel and Dehaene, 2017; Desender et al., 2018a; Boldt et al., 2019; Colizoli et al., 2018).

While we captured the main effects of interest with our Bayesian model and our key behavioral results are in line with our overarching hypothesis, our behavioral findings also reveal other aspects of learning that remain to be followed up on. Unlike our Bayesian agents, participants exhibited signatures of learning not only at the level of first order performance, but also at the level of performance monitoring. The precision of their outcome predictions increased as learning progressed, as did confidence. Identifying the mechanisms that drive this metacognitive learning, that is, whether changes in confidence follow the uncertainty in the internal model or reflect refinement of the confidence calibration to the efference copy noise, is an exciting question for future work.

Anyone who has tried to learn a sport can relate to the intuition that finding out that what you did was wrong does not, by itself, tell you how to do it right. Our task also evokes this so-called distal problem, which refers to the difficulty of translating distal sensory outcomes of responses (e.g. the location of a red dot on a feedback scale) into the required changes in proximal movement parameters (here, changes in the timing of the response). Indeed, when practicing complex motor tasks, individuals prefer and learn better from feedback following successful trials compared to error trials (Chiviacowsky and Wulf, 2007; Chiviacowsky and Wulf, 2002; Chiviacowsky and Wulf, 2005). In line with the notion that in motor learning feedback about success is more informative than feedback about failure, we, like others using the time-estimation task (Pfabigan et al., 2014; Ernst and Steinhauser, 2018), observed increasing P3b amplitude after feedback about more accurate performance (i.e. for smaller error magnitude), in addition to prediction error effects.

In our study, the P3b component, previously shown to scale with the amount of information provided by a stimulus (Cockburn and Holroyd, 2018) and the amount of learning from a stimulus (Fischer and Ullsperger, 2013), was sensitive to both RPE and SPE, indicating that multiple learning mechanisms may act in parallel, supported by different aspects of feedback. Our findings resonate with recent work in rodents showing that prediction error signals encode multiple features of outcomes (Langdon et al., 2018), and are based on distributed representations of predictions (Dabney et al., 2020). This encoding of multiple features of outcomes, like the uncertainty in predictions, may help credit assignment and support learning at multiple levels. It is still unclear to what degree different learning mechanisms – error-based, model-based, reinforcement learning – contribute to motor learning (Wolpert and Flanagan, 2016). Further research is needed to identify whether the same or different learning mechanisms operate across levels, for example, via hierarchical reinforcement learning (Holroyd and Yeung, 2012; Lieder et al., 2018), and how learning interacts between levels.

Taken together, our findings provide evidence that feedback evaluation is fundamentally affected by an individual’s internal representations of their own performance at the time of feedback. These internal representations in turn influence how people learn and thus which beliefs they will have and which actions they will take, driving what internal and external states they will encounter in the future. The present study is a first step toward elucidating this recursive process of performance optimization via internal performance monitoring and monitoring of external task outcomes.

Materials and methods

Task variables

  • t denotes the target interval, which was set to t := 19 (this simulation parameter choice was necessarily somewhat arbitrary; choosing a different value does not change the model’s behavior),

  • s denotes the feedback scale, which was set to s := 90,

  • r denotes the model’s or participant’s response,

  • f denotes the feedback in the task, which was defined as

f := (r − t) · s    (1)

Computational model

The Bayesian learner with performance monitoring attempted to sequentially infer the target interval t and feedback scale s (defining how the magnitude of a given response error translates to the magnitude of the error displayed on the visual feedback scale) over multiple trials, based on its intended response i, an efference copy c of its executed response and feedback f indicating the magnitude and direction of its timing errors. On each trial the model computed its intended response based on the inferred target interval. During learning, the model faced several obstacles including (1) the initially unknown scale of the feedback, making it difficult to judge whether feedback indicates small or large timing errors, (2) response noise, which offsets executed responses from intended ones, and (3) efference copy noise, which makes the efference copy unreliable to a degree that varies from trial to trial. Formally, the Bayesian learner with performance monitoring is represented by the following variables:

  • p(t) := U(t; [0, 100]) denotes the model’s prior distribution over the target interval t, a uniform distribution (denoted by U throughout) over possible values of t within the range 0 to 100.

  • p(s) := U(s; [0.1, 100]) denotes the model’s prior distribution over the feedback scale s, a uniform distribution over possible values of s within the range 0.1 to 100.

  • p_{σ_r^2}(r | i) := N(r; i, σ_r^2) denotes the model’s response distribution (N denoting normal distributions throughout), where i denotes the model’s intended response, which corresponds to the expected target interval i := E_{p(t)}[t] = Σ_t p(t) · t, and σ_r^2 denotes the response noise, which was set to σ_r := 10 in terms of the standard deviation.

  • p_{σ_c^2}(c | r) := N(c; r, σ_c^2) denotes the model’s efference-copy distribution, with efference-copy noise (we simulated three levels: low, medium, and high) expressed as standard deviations σ_c ∈ {5, 10, 20}, where p(σ_c) is a categorical distribution assigning probability 1/3 to each level. Here we assumed that the model was aware of its trial-by-trial efference-copy noise. That is, from the perspective of the Bayesian learner with performance monitoring, efference-copy noise was not a random variable.

Intended response

During the response phase in the task, the model first computed its intended response i and then sampled its actual response r from the Gaussian response distribution. We assumed that the model’s internal response monitoring system subsequently generated the noisy efference copy c.
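As an illustration, the generative process for a single trial can be sketched as follows. This is a toy re-implementation under the assumptions stated above (Gaussian response and efference-copy noise, parameter values as defined in the task variables), not the published simulation code; c_copy corresponds to the efference copy c in the text.

# Toy sketch of one trial's generative process (assumed parameters, for illustration).
set.seed(1)

t_true  <- 19   # target interval (simulation units)
s_true  <- 90   # feedback scale
sigma_r <- 10   # response noise (SD)
sigma_c <- 10   # efference-copy noise on this trial (one of 5, 10, 20)

i      <- 50                                 # intended response = expected target under the uniform prior
r      <- rnorm(1, mean = i, sd = sigma_r)   # executed response, offset by response noise
c_copy <- rnorm(1, mean = r, sd = sigma_c)   # noisy efference copy of the executed response
f      <- (r - t_true) * s_true              # feedback as defined in Equation 1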

Learning

Based on the definition of the task and the Bayesian learner with performance monitoring, the joint distribution over the variables of interest during a trial of the task is given by

p_{i,σ_c^2,σ_r^2}(t, s, f, r, c) := p(f | r, t, s) · p_{σ_c^2}(c | r) · p_{i,σ_r^2}(r) · p(t, s)    (2)

To infer the target interval t and the feedback scale s, we can evaluate the posterior distribution conditional on the efference copy c and feedback f, given the intended response i, response noise σ_r^2, and efference-copy noise σ_c^2, according to Bayes’ rule:

p_{i,σ_c^2,σ_r^2}(t, s | c, f) ∝ ∫ p_{i,σ_c^2,σ_r^2}(t, s, r | c, f) dr ∝ ∫ p(f | r, t, s) · p_{σ_c^2}(c | r) · p_{i,σ_r^2}(r) · p(t, s) dr    (3)

Note that we assumed that the Bayesian learner with performance monitoring was aware of how feedback f was generated in the task (Equation 1); that is, conditional on the response r, target interval t, and feedback scale s, the model was able to compute the probability of the feedback exactly, according to

p(f | r, t, s) := 1 if f = (r − t) · s, and 0 otherwise    (4)

We approximated inference using a grid over the target interval t ∈ [0, 100] and feedback scale s ∈ [0.1, 100]. The model first computed the probability of the currently received feedback. Although it was aware of how feedback was generated in the task, it suffered from uncertainty over its response due to noise in the efference copy. For each s and t on the grid, the model evaluated the Gaussian distribution

p_{i,σ_c^2,σ_r^2}(f | c, t, s) := N(f; (m − t)^T s, vA)    (5)

where A was a 100 × 50 matrix containing the grid values of t and s.

v = 1 / (1/σ_c^2 + 1/σ_r^2)    (6)

denotes the expected variance in the feedback under consideration of both efference-copy noise σ_c^2 and response noise σ_r^2, and

m = v · (1/σ_c^2) · c + v · (1/σ_r^2) · i    (7)

denotes the expected feedback under the additional consideration of efference copy c and intended response i. When computing the probability of the feedback, our model thus took into account the efference copy c and the response it intended to produce i, which were weighted according to their respective reliabilities.

Second, the model multiplied the computed probability of the observed feedback with the prior over the scale and the target interval, that is

p_{i,σ_c^2,σ_r^2}(t, s | c, f) ∝ N(f; (m − t)^T s, vA) · p(s, t)    (8)

In the grid approximation, the model started with a uniform prior over the joint distribution of t and s and updated it recursively, such that the posterior joint distribution over t and s on each trial served as the prior distribution for the subsequent trial.
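To make the update step concrete, a compact sketch of this recursive grid update is given below. The grid resolution, the variable names, and the assumption that the feedback standard deviation scales with the feedback scale s are illustrative choices, not the exact published implementation.

# Illustrative grid approximation of the posterior over the target interval t and
# the feedback scale s (cf. Equations 5-8).
t_grid <- seq(0, 100, length.out = 100)
s_grid <- seq(0.1, 100, length.out = 50)
prior  <- matrix(1 / (100 * 50), nrow = 100, ncol = 50)  # uniform prior over (t, s)

update_posterior <- function(prior, f, c_copy, i, sigma_c, sigma_r) {
  v <- 1 / (1 / sigma_c^2 + 1 / sigma_r^2)           # Equation 6: expected response variance
  m <- v * (c_copy / sigma_c^2 + i / sigma_r^2)      # Equation 7: reliability-weighted response estimate
  lik <- outer(t_grid, s_grid, function(t, s)
    dnorm(f, mean = (m - t) * s, sd = sqrt(v) * s))  # feedback likelihood on the grid (cf. Equation 5)
  post <- lik * prior                                # Equation 8: likelihood times prior
  post / sum(post)                                   # normalized posterior; serves as next trial's prior
}

# Example update after one trial's feedback (illustrative values):
prior <- update_posterior(prior, f = 500, c_copy = 55, i = 50, sigma_c = 10, sigma_r = 10)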

Outcome prediction

On each trial, the model reports an outcome prediction p_o and the confidence in this prediction, which we refer to as c_o. The outcome prediction maps the discrepancy between the intended response and the efference copy onto the feedback scale, given the best guess of the current t and s.

p_o = (c − i) · Σ_s p(s) · s    (9)

Note that this outcome prediction is different from the mean of the uncertainty-weighted expectancy distribution defined in Equation 7, in that it does not take uncertainty into account, but reflects the expectation given the efference copy alone. The inverse of the efference-copy uncertainty σ_c^2, that is, its precision, is translated into the agent’s confidence report as described below.

Confidence calibration

Confidence calibration cc ∈ {0, 0.75, 1} denotes the probability that the agent assumes the correct efference-copy variance σ_c^2 for learning (cf. Equations 6 and 7). cc = 1 indicates that the subjectively assumed efference-copy precision is equal to the true precision; cc = 0, in contrast, indicates that the assumed precision of the efference copy is different from the true one. In the case of cc = 0.75, the agent usually assumes the true precision of the efference copy but sometimes fails to take it accurately into account during learning. As shown in Figure 1, we simulated the behavior of three agents that differed in their confidence calibration according to this idea.

Confidence report

In our model, the confidence report c_o ∈ {3, 2, 1} (we simulate three levels, as for the efference-copy noise) reflects how certain the agent thinks its efference copy is, where 3 refers to 'completely certain' and 1 to 'not certain'. In particular,

c_o = 3 if σ_c = 5; 2 if σ_c = 10; 1 if σ_c = 20    (10)

That is, the confidence report is directly related to the subjective precision of the efference copy, which, as shown above, depends on the agent’s level of confidence calibration.
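The link between confidence calibration, the assumed efference-copy noise, and the reported confidence can be sketched as follows. The specific sampling scheme for miscalibrated trials (drawing uniformly from the remaining noise levels) is an assumption made here for illustration only.

# Illustrative sketch: with probability cc the agent assumes the true efference-copy
# noise; otherwise it assumes one of the other levels, and reports confidence accordingly.
sigma_levels <- c(5, 10, 20)                      # possible efference-copy noise levels (SD)
conf_report  <- c("5" = 3, "10" = 2, "20" = 1)    # Equation 10: assumed noise -> confidence report

assumed_sigma_c <- function(true_sigma, cc) {
  if (runif(1) < cc) {
    true_sigma                                    # correct assumption (probability cc)
  } else {
    sample(setdiff(sigma_levels, true_sigma), 1)  # miscalibrated assumption
  }
}

sigma_hat <- assumed_sigma_c(true_sigma = 10, cc = 0.75)
co <- conf_report[as.character(sigma_hat)]        # reported confidence (3 = completely certain)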

Model with incomplete performance monitoring

We also applied a model that had no insight into the precision of its current predictions. The agent was thus unaware of its trial-by-trial efference-copy noise σ_c and therefore relied on the average value of σ_c, that is, E_{p(σ_c)}[σ_c] = 12. As this model does not differentiate between precise and imprecise predictions but treats every prediction as having average precision, it relies too much on imprecise predictions and too little on precise ones.

Model without performance monitoring

Finally, we applied a model that was aware of its response noise but completely failed to consider its efference copy and therefore lacked insight into its trial-by-trial performance. In this version, we had

v = 1 / (1/σ_r^2) = σ_r^2    (11)

and m=i.

This model accounts for the expected variance in the feedback, but cannot differentiate between trials in which the feedback is driven more by incorrect beliefs about the target versus incorrect execution. Therefore, it adjusts its beliefs too much following feedback primarily driven by execution errors and too little following feedback primarily driven by incorrect beliefs.

Participants

The experimental study included 40 participants (13 males) whose average age was 25.8 years (SD = 4.3) and whose mean handedness score (Oldfield, 1971) was 63.96 (SD = 52.09; i.e., most participants were right-handed). Participants gave informed consent to the experiment and were remunerated with course credits or 8 € per hour.

Task and procedure

Participants performed an adapted time-estimation task (Luft et al., 2014; Miltner et al., 1997) that included subjective accuracy and confidence ratings (similar to Kononowicz et al., 2019; Akdoğan and Balcı, 2017; Kononowicz and van Wassenhove, 2019). Participants were instructed that their primary goal in this task was to learn to produce an initially unknown time interval. In addition, they were asked to predict the direction and magnitude of any errors they produced and their confidence in those predictions. The time-estimation task is well established for ERP analyses (e.g. Luft et al., 2014; Miltner et al., 1997) and has the advantages that it limits the degrees of freedom of the response and precludes concurrent visual feedback that might affect performance evaluation. The task consisted of four parts on each trial, illustrated in Figure 1B. After a fixation cross lasting for a random interval of 300–900 ms, a tone (600 Hz, 200 ms duration) was presented. Participants’ task was to terminate an initially unknown target interval of 1504 ms from tone onset by pressing a response key with their left hand. We chose a supra-second duration to make the task sufficiently difficult (Luft et al., 2014). Following the response, a fixation cross was presented for 800 ms. Participants then estimated the accuracy of the interval they had just produced by moving an arrow on a visual analogue scale (too short – too long; ±125 pixels, 3.15° visual angle) using a mouse cursor with their right hand. Then, on a scale of the same size, participants rated their confidence in this estimate (not certain – fully certain). The confidence rating was followed by a blank screen for 800 ms. Finally, participants received feedback about their performance with a red square (0.25° visual angle) placed on a scale identical to the accuracy estimation scale but without any labels. The placement of the square on the scale visualized error magnitude in the interval produced, with undershoots shown to the left and overshoots to the right of the center mark, which indicated a correct estimate. Feedback was presented for only 150 ms to preclude eye movements. The interval until the start of the next trial was 1500 ms.

The experiment comprised five blocks of 50 trials each, with self-paced rests between blocks. We used Presentation software (Neurobs.) for stimulus presentation and for event and response logging. Visual stimuli were presented on a 17’’ BenQ monitor (4:3 aspect ratio; resolution: 1280 × 1024; refresh rate: 60 Hz) placed at a distance of 60 cm from the participant. A standard computer mouse and a customized response button (accuracy 2 ms, response latency 9 ms) were used for response registration.

Prior to the experiment, participants filled in demographic and personality questionnaires: the Neuroticism and Conscientiousness scales of the NEO PI-R (Costa and McCrae, 1992) and the BIS/BAS scales (Strobel et al., 2001), as well as a subset of Raven’s progressive matrices (Raven, 2000) as an index of figural-spatial intelligence. These measures were registered as potential control variables and for other purposes not addressed here. Participants were then seated in a shielded EEG cabin, where the experiment including EEG recording was conducted. Prior to the experiment proper, participants performed three practice trials.

Psychophysiological recording and processing

Using BrainVision recorder software (Brain Products, München, Germany) we recorded EEG data from 64 Ag/AgCl electrodes mounted in an electrode cap (ECI Inc), referenced against Cz at a sampling rate of 500 Hz. Electrodes below the eyes (IO1, IO2) and at the outer canthi (LO1, LO2) recorded vertical and horizontal ocular activity. We kept electrode impedance below 5 kΩ and applied a 100 Hz low pass filter, a time constant of 10 s, and a 50 Hz notch filter. At the beginning of the session we recorded 20 trials each of prototypical eye movements (up, down, left, right) for offline ocular artifact correction.

EEG data were processed using custom Matlab (The MathWorks Inc) scripts (Frömer et al., 2018) and EEGLAB toolbox functions (Delorme and Makeig, 2004). We re-referenced the data to the average reference and recovered the Cz channel. The data were band-pass filtered between 0.5 and 40 Hz. Ocular artifacts were corrected using BESA (Ille et al., 2002). We segmented the ongoing EEG from −200 to 800 ms relative to feedback onset. Segments containing artifacts were excluded from analyses, based on values exceeding ±150 µV and gradients larger than 50 µV between two adjacent sampling points. Baselines were corrected to the 200 ms interval preceding feedback onset.

The FRN was quantified in single-trial ERP waveforms as peak-to-peak amplitude at electrode FCz, specifically as the difference between the minimum voltage in a window from 200 to 300 ms post-feedback onset and the preceding positive maximum in a window from −100 to 0 ms relative to the detected negative peak. To define the time windows for single-trial analyses of P3a and P3b amplitudes, we first determined the average subject-wise peak latencies at FCz and Pz, respectively, and exported 100 ms time windows centered on the respective latencies. Accordingly, the P3a was quantified on single trials as the average voltage within an interval from 330 to 430 ms after feedback onset across all electrodes within a fronto-central region of interest (ROI: F1, Fz, F2, FC1, FCz, FC2, C1, Cz, C2). P3b amplitude was quantified in single trials as the average voltage within a 416–516 ms interval post-feedback across all electrodes within a parietally-focused region of interest (ROI: CP1, CPz, CP2, P1, Pz, P2, PO3, POz, PO4).
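To illustrate the single-trial quantification logic (the actual processing was done in Matlab), a minimal sketch in R is given below; erp_fcz and times are hypothetical vectors holding one trial’s voltages (µV) at FCz and the corresponding latencies (ms relative to feedback onset).

# Illustrative sketch of the single-trial FRN peak-to-peak measure at FCz.
frn_peak_to_peak <- function(erp_fcz, times) {
  win_neg <- which(times >= 200 & times <= 300)                              # search window for the negative peak
  idx_min <- win_neg[which.min(erp_fcz[win_neg])]                            # most negative point 200-300 ms
  win_pos <- which(times >= times[idx_min] - 100 & times < times[idx_min])   # preceding 100 ms window
  max(erp_fcz[win_pos]) - erp_fcz[idx_min]                                   # positive maximum minus negative peak
}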

Analyses

Outlier inspection of the behavioral data identified one participant with outlying responses (average RT > 10 s) and one trial each in four additional participants (RTs > 6 s; 0.4% of the data of the remaining participants). These data were excluded from further analyses. We computed two kinds of prediction errors (Figure 3A): SPE was determined as the absolute difference between the predicted and the actual interval length: |Prediction – Feedback|. RPE was computed as the difference between the absolute predicted error and the absolute actual error as revealed by feedback: |Prediction| – |Feedback|. We quantified confidence calibration as each participant’s correlation of confidence and SPE (the absolute deviation of the prediction from the actual outcome) across all trials, controlling for average error magnitude per block to account for shared changes of our confidence calibration measure with performance. To ease interpretation, we sign-reversed the correlations, such that higher values correspond to better confidence calibration.
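A minimal sketch of these derived measures is given below; dat and its columns (prediction, feedback, confidence, block, subject) are hypothetical names, and residualizing both variables on block-wise error magnitude is one simple way to implement the control described above, not necessarily the exact published procedure.

# Illustrative computation of SPE, RPE, and per-participant confidence calibration.
dat$SPE <- abs(dat$prediction - dat$feedback)       # sensory prediction error
dat$RPE <- abs(dat$prediction) - abs(dat$feedback)  # reward prediction error (> 0: better than predicted)

calibration <- sapply(split(dat, dat$subject), function(d) {
  d$block_err <- ave(abs(d$feedback), d$block)              # average error magnitude per block
  res_conf    <- resid(lm(confidence ~ block_err, data = d))
  res_spe     <- resid(lm(SPE ~ block_err, data = d))
  -cor(res_conf, res_spe)                                   # sign-reversed: higher = better calibration
})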

Statistical analyses were performed by means of linear mixed models (LMMs) using R (R Development Core Team, 2014) and the lme4 package (R Package, 2014). We chose LMMs, which are similar to linear multiple regression models, because they allow for parametric analyses of single-trial measures. Further, LMMs are robust to unequally distributed numbers of observations across participants and simultaneously estimate fixed effects and random variance between participants in both intercepts and slopes. For all dependent variables, full models including all predictors were reduced step-wise until model comparisons indicated significantly decreased fit.

We report model comparisons and fit indices: Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC), which decrease with improving model fit. Random effect structures were determined using singular value decomposition. Variables explaining zero variance were removed from the random effects structure (Bates et al., 2015; Matuschek et al., 2017).

Prior to the analyses, error magnitude, RPE, and SPE were rescaled from milliseconds to seconds, and confidence and block were scaled to a range of ±1, so that all predictors were on similar scales. Furthermore, block, error magnitude, confidence, and SPE were centered on their medians for accurate intercept computation. RPE was not centered, as zero represents a meaningful value on its scale (predicted and actual error magnitude are the same), and positive and negative values are qualitatively different (negative and positive values represent outcomes that are, respectively, worse or better than expected). Model formulas are reported in the respective tables. Fixed effects are visualized using the effects package (Fox and Weisberg, 2019).
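As an example of this model-specification approach, a hedged lme4 sketch is shown below; the data frame and predictor names are hypothetical placeholders, and the exact model formulas are those reported in the respective supplementary tables.

library(lme4)

# Illustrative LMM specification and step-wise reduction (not the exact published model).
full <- lmer(
  P3b ~ RPE * confidence + SPE + error_magnitude + block +  # hypothetical fixed effects
    (1 + RPE + SPE | subject),                              # by-participant random intercepts and slopes
  data = dat
)
reduced <- update(full, . ~ . - RPE:confidence)  # remove one fixed-effect term
anova(reduced, full)                             # model comparison (AIC, BIC, likelihood-ratio test)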

Data availability

The datasets generated and analyzed during the current study are available under https://github.com/froemero/Outcome-Predictions-and-Confidence-Regulate-Learning (copy archived at swh:1:rev:e8bfacf8fdb8126aade59581b98616b4f2fae7b3; Frömer, 2021).

Code availability

Scripts for all analyses are available under https://github.com/froemero/Outcome-Predictions-and-Confidence-Regulate-Learning.

Acknowledgements

We thank Lena Fliedner and Lara Montau for support in data acquisition and helpful discussions during the setup of the task, Rainer Kniesche for advice on programming the stimulus presentation, Markus Ullsperger, Adrian Haith, Martin Maier, and Rasha Abdel Rahman for valuable discussions, and Mehrdad Jazayeri for valuable feedback on a previous draft. RF is further grateful for the continuous scientific and personal support by her office mates at Humboldt-University, Benthe Kornrumpf and Florian Niefind, who made her life and work a lot more fun and happened to also have inspired the original title of this paper.

Funding Statement

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Contributor Information

Romy Frömer, Email: romy_fromer@brown.edu.

Tadeusz Wladyslaw Kononowicz, Cognitive Neuroimaging Unit, CEA DRF/Joliot, INSERM, Université Paris-Sud, Université Paris-Saclay, NeuroSpin center, France.

Richard B Ivry, University of California, Berkeley, United States.

Funding Information

This paper was supported by the following grant:

  • NIH Office of the Director R00 AG054732 to Matthew R Nassar.

Additional information

Competing interests

No competing interests declared.

Author contributions

Conceptualization, Data curation, Formal analysis, Investigation, Visualization, Methodology, Writing - original draft, Project administration, Writing - review and editing.

Formal analysis, Supervision, Methodology, Writing - review and editing.

Formal analysis, Writing - review and editing.

Writing - review and editing.

Resources, Methodology, Writing - review and editing.

Conceptualization, Supervision, Writing - review and editing.

Ethics

Human subjects: The study was performed following the guidelines of the ethics committee of the department of Psychology at Humboldt University. Participants gave informed consent to the experiment and were remunerated with course credits or 8€ per hour.

Additional files

Supplementary file 1. Follow-up on prediction and performance precision effects on confidence.
elife-62825-supp1.docx (16.2KB, docx)
Supplementary file 2. Control analysis for confidence calibration effect on learning.
elife-62825-supp2.docx (15KB, docx)
Supplementary file 3. Follow-up on block and confidence effects on relative error signals.
elife-62825-supp3.docx (15.8KB, docx)
Supplementary file 4. Follow-up on confidence by block interaction on RPE benefit over error magnitude.
elife-62825-supp4.docx (15.5KB, docx)
Supplementary file 5. Block and confidence effects on error signals.
elife-62825-supp5.docx (16.3KB, docx)
Supplementary file 6. Follow-up on block and confidence effects on error signals.
elife-62825-supp6.docx (17.1KB, docx)
Supplementary file 7. Block and confidence effects on error signals.
elife-62825-supp7.docx (15.2KB, docx)
Supplementary file 8. Follow-up analyses on confidence-weighted predicted error magnitude effects on P3b.
elife-62825-supp8.docx (16.9KB, docx)
Supplementary file 9. Trial-to-trial improvements by block and previous error and modulations by previous P3b.
elife-62825-supp9.docx (15.9KB, docx)
Supplementary file 10. Follow-up on trial-to-trial improvements by block and previous error and modulations by previous P3b.
elife-62825-supp10.docx (18.4KB, docx)
Transparent reporting form

Data availability

Scripts and source data for all analyses are available under https://github.com/froemero/Outcome-Predictions-and-Confidence-Regulate-Learning (copy archived at https://archive.softwareheritage.org/swh:1:rev:e8bfacf8fdb8126aade59581b98616b4f2fae7b3).

References

  1. Akdoğan B, Balcı F. Are you early or late?: temporal error monitoring. Journal of Experimental Psychology: General. 2017;146:347–361. doi: 10.1037/xge0000265. [DOI] [PubMed] [Google Scholar]
  2. Arbel Y, Donchin E. How large the sin? A study of the event related potentials elicited by errors of varying magnitude. Psychophysiology. 2011;48:1611–1620. doi: 10.1111/j.1469-8986.2011.01264.x. [DOI] [PubMed] [Google Scholar]
  3. Arbuzova P. Measuring metacognition of direct and indirect parameters of voluntary movement. bioRxiv. 2020 doi: 10.1101/2020.05.14.092189. [DOI] [PubMed]
  4. Balci F, Freestone D, Simen P, Desouza L, Cohen JD, Holmes P. Optimal temporal risk assessment. Frontiers in Integrative Neuroscience. 2011;5:56. doi: 10.3389/fnint.2011.00056. [DOI] [PMC free article] [PubMed] [Google Scholar]
  5. Bates D, Kliegl R, Vasishth S, Baayen H. Parsimonious mixed models. arXiv. 2015 https://arxiv.org/pdf/1506.04967.pdf
  6. Behrens TE, Woolrich MW, Walton ME, Rushworth MF. Learning the value of information in an uncertain world. Nature Neuroscience. 2007;10:1214–1221. doi: 10.1038/nn1954. [DOI] [PubMed] [Google Scholar]
  7. Bellebaum C, Colosio M. From feedback- to response-based performance monitoring in active and observational learning. Journal of Cognitive Neuroscience. 2014;26:2111–2127. doi: 10.1162/jocn_a_00612. [DOI] [PubMed] [Google Scholar]
  8. Bland AR, Schaefer A. Different varieties of uncertainty in human decision-making. Frontiers in Neuroscience. 2012;6:85. doi: 10.3389/fnins.2012.00085. [DOI] [PMC free article] [PubMed] [Google Scholar]
  9. Boldt A, Blundell C, De Martino B. Confidence modulates exploration and exploitation in value-based learning. Neuroscience of Consciousness. 2019;2019:niz004. doi: 10.1093/nc/niz004. [DOI] [PMC free article] [PubMed] [Google Scholar]
  10. Boldt A, Yeung N. Shared neural markers of decision confidence and error detection. Journal of Neuroscience. 2015;35:3478–3484. doi: 10.1523/JNEUROSCI.0797-14.2015. [DOI] [PMC free article] [PubMed] [Google Scholar]
  11. Bultena S, Danielmeier C, Bekkering H, Lemhöfer K. Electrophysiological correlates of error monitoring and feedback processing in second language learning. Frontiers in Human Neuroscience. 2017;11:29. doi: 10.3389/fnhum.2017.00029. [DOI] [PMC free article] [PubMed] [Google Scholar]
  12. Butterfield B, Mangels JA. Neural correlates of error detection and correction in a semantic retrieval task. Cognitive Brain Research. 2003;17:793–817. doi: 10.1016/S0926-6410(03)00203-9. [DOI] [PubMed] [Google Scholar]
  13. Butterfield B, Metcalfe J. Errors committed with high confidence are hypercorrected. Journal of Experimental Psychology: Learning, Memory, and Cognition. 2001;27:1491–1494. doi: 10.1037/0278-7393.27.6.1491. [DOI] [PubMed] [Google Scholar]
  14. Butterfield B, Metcalfe J. The correction of errors committed with high confidence. Metacognition and Learning. 2006;1:69–84. doi: 10.1007/s11409-006-6894-z. [DOI] [Google Scholar]
  15. Carlebach N, Yeung N. Flexible use of confidence to guide advice requests. PsyArXiv. 2020 doi: 10.31234/osf.io/ctyqp. [DOI] [PubMed]
  16. Charles L, Yeung N. Dynamic sources of evidence supporting confidence judgments and error detection. Journal of Experimental Psychology: Human Perception and Performance. 2019;45:39–52. doi: 10.1037/xhp0000583. [DOI] [PubMed] [Google Scholar]
  17. Chase HW, Swainson R, Durham L, Benham L, Cools R. Feedback-related negativity codes prediction error but not behavioral adjustment during probabilistic reversal learning. Journal of Cognitive Neuroscience. 2011;23:936–946. doi: 10.1162/jocn.2010.21456. [DOI] [PubMed] [Google Scholar]
  18. Chiviacowsky S, Wulf G. Self-controlled feedback: does it enhance learning because performers get feedback when they need it? Research Quarterly for Exercise and Sport. 2002;73:408–415. doi: 10.1080/02701367.2002.10609040. [DOI] [PubMed] [Google Scholar]
  19. Chiviacowsky S, Wulf G. Self-controlled feedback is effective if it is based on the learner's performance. Research Quarterly for Exercise and Sport. 2005;76:42–48. doi: 10.1080/02701367.2005.10599260. [DOI] [PubMed] [Google Scholar]
  20. Chiviacowsky S, Wulf G. Feedback after good trials enhances learning. Research Quarterly for Exercise and Sport. 2007;78:40–47. doi: 10.1080/02701367.2007.10599402. [DOI] [PubMed] [Google Scholar]
  21. Cockburn J, Holroyd CB. Feedback information and the reward positivity. International Journal of Psychophysiology. 2018;132:243–251. doi: 10.1016/j.ijpsycho.2017.11.017. [DOI] [PubMed] [Google Scholar]
  22. Colizoli O, de Gee JW, Urai AE, Donner TH. Task-evoked pupil responses reflect internal belief states. Scientific Reports. 2018;8:13702. doi: 10.1038/s41598-018-31985-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  23. Correa CMC, Noorman S, Jiang J, Palminteri S, Cohen MX, Lebreton M, van Gaal S. How the level of reward awareness changes the computational and electrophysiological signatures of reinforcement learning. The Journal of Neuroscience. 2018;38:10338–10348. doi: 10.1523/JNEUROSCI.0457-18.2018. [DOI] [PMC free article] [PubMed] [Google Scholar]
  24. Costa PT, McCrae RR. Revised NEO personality inventory (NEO-PI-R) and NEO Five-Factor inventory (NEO-FFI): Professional manual. Psychological Assessment Resources; 1992. [Google Scholar]
  25. Dabney W, Kurth-Nelson Z, Uchida N, Starkweather CK, Hassabis D, Munos R, Botvinick M. A distributional code for value in dopamine-based reinforcement learning. Nature. 2020;577:671–675. doi: 10.1038/s41586-019-1924-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  26. Delorme A, Makeig S. EEGLAB: an open source toolbox for analysis of single-trial EEG dynamics including independent component analysis. Journal of Neuroscience Methods. 2004;134:9–21. doi: 10.1016/j.jneumeth.2003.10.009. [DOI] [PubMed] [Google Scholar]
  27. Desender K, Boldt A, Yeung N. Subjective confidence predicts information seeking in decision making. Psychological Science. 2018a;29:761–778. doi: 10.1177/0956797617744771. [DOI] [PubMed] [Google Scholar]
  28. Desender K, Boldt A, Verguts T, Donner TH. Post-decisional sense of confidence shapes speed-accuracy tradeoff for subsequent choices. bioRxiv. 2018b doi: 10.1101/466730. [DOI] [PMC free article] [PubMed]
  29. Desender K, Murphy P, Boldt A, Verguts T, Yeung N. A postdecisional neural marker of confidence predicts Information-Seeking in Decision-Making. The Journal of Neuroscience. 2019;39:3309–3319. doi: 10.1523/JNEUROSCI.2620-18.2019. [DOI] [PMC free article] [PubMed] [Google Scholar]
  30. Diederen KM, Spencer T, Vestergaard MD, Fletcher PC, Schultz W. Adaptive prediction error coding in the human midbrain and striatum facilitates behavioral adaptation and learning efficiency. Neuron. 2016;90:1127–1138. doi: 10.1016/j.neuron.2016.04.019. [DOI] [PMC free article] [PubMed] [Google Scholar]
  31. Diederen KM, Schultz W. Scaling prediction errors to reward variability benefits error-driven learning in humans. Journal of Neurophysiology. 2015;114:1628–1640. doi: 10.1152/jn.00483.2015. [DOI] [PMC free article] [PubMed] [Google Scholar]
  32. Ernst B, Steinhauser M. Effects of feedback reliability on feedback-related brain activity: a feedback valuation account. Cognitive, Affective, & Behavioral Neuroscience. 2018;18:596–608. doi: 10.3758/s13415-018-0591-7. [DOI] [PubMed] [Google Scholar]
  33. Faisal AA, Selen LP, Wolpert DM. Noise in the nervous system. Nature Reviews Neuroscience. 2008;9:292–303. doi: 10.1038/nrn2258. [DOI] [PMC free article] [PubMed] [Google Scholar]
  34. Fischer AG, Ullsperger M. Real and fictive outcomes are processed differently but converge on a common adaptive mechanism. Neuron. 2013;79:1243–1255. doi: 10.1016/j.neuron.2013.07.006. [DOI] [PubMed] [Google Scholar]
  35. Fischer AG, Ullsperger M. When is the time for a change? decomposing dynamic learning rates. Neuron. 2014;84:662–664. doi: 10.1016/j.neuron.2014.10.050. [DOI] [PubMed] [Google Scholar]
  36. Fleming SM, Lau HC. How to measure metacognition. Frontiers in Human Neuroscience. 2014;8:443. doi: 10.3389/fnhum.2014.00443. [DOI] [PMC free article] [PubMed] [Google Scholar]
  37. Fox J, Weisberg S. An R Companion to Applied Regression. Thousand Oaks CA: Sage; 2019. [Google Scholar]
  38. Frömer R, Stürmer B, Sommer W. The better, the bigger: the effect of graded positive performance feedback on the reward positivity. Biological Psychology. 2016a;114:61–68. doi: 10.1016/j.biopsycho.2015.12.011. [DOI] [PubMed] [Google Scholar]
  39. Frömer R, Stürmer B, Sommer W. (Don't) Mind the effort: effects of contextual interference on ERP indicators of motor preparation. Psychophysiology. 2016b;53:1577–1586. doi: 10.1111/psyp.12703. [DOI] [PubMed] [Google Scholar]
  40. Frömer R, Maier M, Abdel Rahman R. Group-Level EEG-Processing pipeline for flexible single Trial-Based analyses including linear mixed models. Frontiers in Neuroscience. 2018;12:48. doi: 10.3389/fnins.2018.00048. [DOI] [PMC free article] [PubMed] [Google Scholar]
  41. Frömer R. Outcome Predictions and Confidence Regulate Learning. Software Heritage; 2021. swh:1:rev:e8bfacf8fdb8126aade59581b98616b4f2fae7b3. https://archive.softwareheritage.org/swh:1:rev:e8bfacf8fdb8126aade59581b98616b4f2fae7b3
  42. Hadjiosif AM, Krakauer JW, Haith AM. Did we get sensorimotor adaptation wrong? implicit adaptation as direct policy updating rather than forward-model-based learning. bioRxiv. 2020 doi: 10.1101/2020.01.22.914473. [DOI] [PMC free article] [PubMed]
  43. Hajcak G, Moser JS, Holroyd CB, Simons RF. The feedback-related negativity reflects the binary evaluation of good versus bad outcomes. Biological Psychology. 2006;71:148–154. doi: 10.1016/j.biopsycho.2005.04.001. [DOI] [PubMed] [Google Scholar]
  44. Holroyd CB, Nieuwenhuis S, Yeung N, Cohen JD. Errors in reward prediction are reflected in the event-related brain potential. NeuroReport. 2003;14:2481–2484. doi: 10.1097/00001756-200312190-00037. [DOI] [PubMed] [Google Scholar]
  45. Holroyd CB, Hajcak G, Larsen JT. The good, the bad and the neutral: electrophysiological responses to feedback stimuli. Brain Research. 2006;1105:93–101. doi: 10.1016/j.brainres.2005.12.015. [DOI] [PubMed] [Google Scholar]
  46. Holroyd CB, Coles MGH. The neural basis of human error processing: reinforcement learning, dopamine, and the error-related negativity. Psychological Review. 2002;109:679–709. doi: 10.1037/0033-295X.109.4.679. [DOI] [PubMed] [Google Scholar]
  47. Holroyd CB, Yeung N. Motivation of extended behaviors by anterior cingulate cortex. Trends in Cognitive Sciences. 2012;16:122–128. doi: 10.1016/j.tics.2011.12.008. [DOI] [PubMed] [Google Scholar]
  48. Ille N, Berg P, Scherg M. Artifact correction of the ongoing EEG using spatial filters based on artifact and brain signal topographies. Journal of Clinical Neurophysiology. 2002;19:113–124. doi: 10.1097/00004691-200203000-00002. [DOI] [PubMed] [Google Scholar]
  49. Kononowicz TW, Roger C, van Wassenhove V. Temporal metacognition as the decoding of Self-Generated brain dynamics. Cerebral Cortex. 2019;29:4366–4380. doi: 10.1093/cercor/bhy318. [DOI] [PubMed] [Google Scholar]
  50. Kononowicz TW, van Wassenhove V. Evaluation of Self-generated behavior: untangling metacognitive readout and error detection. Journal of Cognitive Neuroscience. 2019;31:1641–1657. doi: 10.1162/jocn_a_01442. [DOI] [PubMed] [Google Scholar]
  51. Kruger J, Dunning D. Unskilled and unaware of it: how difficulties in recognizing one's own incompetence lead to inflated self-assessments. Journal of Personality and Social Psychology. 1999;77:1121–1134. doi: 10.1037/0022-3514.77.6.1121. [DOI] [PubMed] [Google Scholar]
  52. Langdon AJ, Sharpe MJ, Schoenbaum G, Niv Y. Model-based predictions for dopamine. Current Opinion in Neurobiology. 2018;49:1–7. doi: 10.1016/j.conb.2017.10.006. [DOI] [PMC free article] [PubMed] [Google Scholar]
  53. Lebreton M, Abitbol R, Daunizeau J, Pessiglione M. Automatic integration of confidence in the brain valuation signal. Nature Neuroscience. 2015;18:1159–1167. doi: 10.1038/nn.4064. [DOI] [PubMed] [Google Scholar]
  54. Lieder F, Shenhav A, Musslick S, Griffiths TL. Rational metareasoning and the plasticity of cognitive control. PLOS Computational Biology. 2018;14:e1006043. doi: 10.1371/journal.pcbi.1006043. [DOI] [PMC free article] [PubMed] [Google Scholar]
  55. Lohse KR, Miller MW, Daou M, Valerius W, Jones M. Dissociating the contributions of reward-prediction errors to trial-level adaptation and long-term learning. Biological Psychology. 2020;149:107775. doi: 10.1016/j.biopsycho.2019.107775. [DOI] [PubMed] [Google Scholar]
  56. Luft CD, Takase E, Bhattacharya J. Processing graded feedback: electrophysiological correlates of learning from small and large errors. Journal of Cognitive Neuroscience. 2014;26:1180–1193. doi: 10.1162/jocn_a_00543. [DOI] [PubMed] [Google Scholar]
  57. Maier ME, Yeung N, Steinhauser M. Error-related brain activity and adjustments of selective attention following errors. NeuroImage. 2011;56:2339–2347. doi: 10.1016/j.neuroimage.2011.03.083. [DOI] [PubMed] [Google Scholar]
  58. Maier ME, di Pellegrino G, Steinhauser M. Enhanced error-related negativity on flanker errors: error expectancy or error significance? Psychophysiology. 2012;49:899–908. doi: 10.1111/j.1469-8986.2012.01373.x. [DOI] [PubMed] [Google Scholar]
  59. Matuschek H, Kliegl R, Vasishth S, Baayen H, Bates D. Balancing type I error and power in linear mixed models. Journal of Memory and Language. 2017;94:305–315. doi: 10.1016/j.jml.2017.01.001. [DOI] [Google Scholar]
  60. McDougle SD, Boggess MJ, Crossley MJ, Parvin D, Ivry RB, Taylor JA. Credit assignment in movement-dependent reinforcement learning. PNAS. 2016;113:6797–6802. doi: 10.1073/pnas.1523669113. [DOI] [PMC free article] [PubMed] [Google Scholar]
  61. McDougle SD, Butcher PA, Parvin DE, Mushtaq F, Niv Y, Ivry RB, Taylor JA. Neural signatures of prediction errors in a Decision-Making task are modulated by action execution failures. Current Biology. 2019;29:1606–1613. doi: 10.1016/j.cub.2019.04.011. [DOI] [PMC free article] [PubMed] [Google Scholar]
  62. McGuire JT, Nassar MR, Gold JI, Kable JW. Functionally dissociable influences on learning rate in a dynamic environment. Neuron. 2014;84:870–881. doi: 10.1016/j.neuron.2014.10.013. [DOI] [PMC free article] [PubMed] [Google Scholar]
  63. Meyniel F, Schlunegger D, Dehaene S. The sense of confidence during probabilistic learning: a normative account. PLOS Computational Biology. 2015;11:e1004305. doi: 10.1371/journal.pcbi.1004305. [DOI] [PMC free article] [PubMed] [Google Scholar]
  64. Meyniel F, Dehaene S. Brain networks for confidence weighting and hierarchical inference during probabilistic learning. PNAS. 2017;114:E3859–E3868. doi: 10.1073/pnas.1615773114. [DOI] [PMC free article] [PubMed] [Google Scholar]
  65. Miltner WH, Braun CH, Coles MG. Event-related brain potentials following incorrect feedback in a time-estimation task: evidence for a "generic" neural system for error detection. Journal of Cognitive Neuroscience. 1997;9:788–798. doi: 10.1162/jocn.1997.9.6.788. [DOI] [PubMed] [Google Scholar]
  66. Murphy PR, Robertson IH, Harty S, O'Connell RG. Neural evidence accumulation persists after choice to inform metacognitive judgments. eLife. 2015;4:e11946. doi: 10.7554/eLife.11946. [DOI] [PMC free article] [PubMed] [Google Scholar]
  67. Nassar MR, Wilson RC, Heasly B, Gold JI. An approximately bayesian delta-rule model explains the dynamics of belief updating in a changing environment. Journal of Neuroscience. 2010;30:12366–12378. doi: 10.1523/JNEUROSCI.0822-10.2010. [DOI] [PMC free article] [PubMed] [Google Scholar]
  68. Nassar MR, Rumsey KM, Wilson RC, Parikh K, Heasly B, Gold JI. Rational regulation of learning dynamics by pupil-linked arousal systems. Nature Neuroscience. 2012;15:1040–1046. doi: 10.1038/nn.3130. [DOI] [PMC free article] [PubMed] [Google Scholar]
  69. Nassar MR, Bruckner R, Frank MJ. Statistical context dictates the relationship between feedback-related EEG signals and learning. eLife. 2019;8:e46975. doi: 10.7554/eLife.46975. [DOI] [PMC free article] [PubMed] [Google Scholar]
  70. Nieuwenhuis S, De Geus EJ, Aston-Jones G. The anatomical and functional relationship between the P3 and autonomic components of the orienting response. Psychophysiology. 2011;48:162–175. doi: 10.1111/j.1469-8986.2010.01057.x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  71. O'Reilly JX. Making predictions in a changing world-inference, uncertainty, and learning. Frontiers in Neuroscience. 2013;7:105. doi: 10.3389/fnins.2013.00105. [DOI] [PMC free article] [PubMed] [Google Scholar]
  72. O'Reilly JX, Schüffelgen U, Cuell SF, Behrens TE, Mars RB, Rushworth MF. Dissociable effects of surprise and model update in parietal and anterior cingulate cortex. PNAS. 2013;110:E3660–E3669. doi: 10.1073/pnas.1305373110. [DOI] [PMC free article] [PubMed] [Google Scholar]
  73. Oldfield RC. The assessment and analysis of handedness: the edinburgh inventory. Neuropsychologia. 1971;9:97–113. doi: 10.1016/0028-3932(71)90067-4. [DOI] [PubMed] [Google Scholar]
  74. Parvin DE, McDougle SD, Taylor JA, Ivry RB. Credit assignment in a motor decision making task is influenced by agency and not sensory prediction errors. The Journal of Neuroscience. 2018;38:4521–4530. doi: 10.1523/JNEUROSCI.3601-17.2018. [DOI] [PMC free article] [PubMed] [Google Scholar]
  75. Pearce JM, Hall G. A model for pavlovian learning: variations in the effectiveness of conditioned but not of unconditioned stimuli. Psychological Review. 1980;87:532–552. doi: 10.1037/0033-295X.87.6.532. [DOI] [PubMed] [Google Scholar]
  76. Pfabigan DM, Zeiler M, Lamm C, Sailer U. Blocked versus randomized presentation modes differentially modulate feedback-related negativity and P3b amplitudes. Clinical Neurophysiology. 2014;125:715–726. doi: 10.1016/j.clinph.2013.09.029. [DOI] [PMC free article] [PubMed] [Google Scholar]
  77. Pfabigan DM, Seidel EM, Paul K, Grahl A, Sailer U, Lanzenberger R, Windischberger C, Lamm C. Context-sensitivity of the feedback-related negativity for zero-value feedback outcomes. Biological Psychology. 2015;104:184–192. doi: 10.1016/j.biopsycho.2014.12.007. [DOI] [PubMed] [Google Scholar]
  78. Polich J. Updating P300: an integrative theory of P3a and P3b. Clinical Neurophysiology. 2007;118:2128–2148. doi: 10.1016/j.clinph.2007.04.019. [DOI] [PMC free article] [PubMed] [Google Scholar]
  79. Pouget A, Drugowitsch J, Kepecs A. Confidence and certainty: distinct probabilistic quantities for different goals. Nature Neuroscience. 2016;19:366–374. doi: 10.1038/nn.4240. [DOI] [PMC free article] [PubMed] [Google Scholar]
  80. R Development Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2014. https://www.r-project.org [Google Scholar]
  81. lme4: Linear Mixed-Effects Models Using Eigen and S4. R package version 1.1-8; 2014. https://cran.r-project.org/web/packages/lme4/index.html
  82. Raven J. The raven's progressive matrices: change and stability over culture and time. Cognitive Psychology. 2000;41:1–48. doi: 10.1006/cogp.1999.0735. [DOI] [PubMed] [Google Scholar]
  83. Riesel A, Weinberg A, Endrass T, Meyer A, Hajcak G. The ERN is the ERN is the ERN? convergent validity of error-related brain activity across different tasks. Biological Psychology. 2013;93:377–385. doi: 10.1016/j.biopsycho.2013.04.007. [DOI] [PubMed] [Google Scholar]
  84. Sailer U, Fischmeister FP, Bauer H. Effects of learning on feedback-related brain potentials in a decision-making task. Brain Research. 2010;1342:85–93. doi: 10.1016/j.brainres.2010.04.051. [DOI] [PubMed] [Google Scholar]
  85. Sambrook TD, Goslin J. Mediofrontal event-related potentials in response to positive, negative and unsigned prediction errors. Neuropsychologia. 2014;61:1–10. doi: 10.1016/j.neuropsychologia.2014.06.004. [DOI] [PubMed] [Google Scholar]
  86. Sambrook TD, Goslin J. A neural reward prediction error revealed by a meta-analysis of ERPs using great grand averages. Psychological Bulletin. 2015;141:213–235. doi: 10.1037/bul0000006. [DOI] [PubMed] [Google Scholar]
  87. Sarafyazd M, Jazayeri M. Hierarchical reasoning by neural circuits in the frontal cortex. Science. 2019;364:eaav8911. doi: 10.1126/science.aav8911. [DOI] [PubMed] [Google Scholar]
  88. Schiffer AM, Siletti K, Waszak F, Yeung N. Adaptive behaviour and feedback processing integrate experience and instruction in reinforcement learning. NeuroImage. 2017;146:626–641. doi: 10.1016/j.neuroimage.2016.08.057. [DOI] [PMC free article] [PubMed] [Google Scholar]
  89. Severo MC, Paul K, Walentowska W, Moors A, Pourtois G. Neurophysiological evidence for evaluative feedback processing depending on goal relevance. NeuroImage. 2020;215:116857. doi: 10.1016/j.neuroimage.2020.116857. [DOI] [PubMed] [Google Scholar]
  90. Strobel A, Beauducel A, Debener S, Brocke B. Eine Deutschsprachige version des BIS/BAS-Fragebogens von carver und white. Zeitschrift Für Differentielle Und Diagnostische Psychologie. 2001;22:216–227. doi: 10.1024//0170-1789.22.3.216. [DOI] [Google Scholar]
  91. Sutton RS, Barto AG. Reinforcement Learning: An Introduction. MIT press; 1998. [Google Scholar]
  92. Talmi D, Atkinson R, El-Deredy W. The feedback-related negativity signals salience prediction errors, not reward prediction errors. Journal of Neuroscience. 2013;33:8264–8269. doi: 10.1523/JNEUROSCI.5695-12.2013. [DOI] [PMC free article] [PubMed] [Google Scholar]
  93. Ullsperger M, Fischer AG, Nigbur R, Endrass T. Neural mechanisms and temporal dynamics of performance monitoring. Trends in Cognitive Sciences. 2014a;18:259–267. doi: 10.1016/j.tics.2014.02.009. [DOI] [PubMed] [Google Scholar]
  94. Ullsperger M, Danielmeier C, Jocham G. Neurophysiology of performance monitoring and adaptive behavior. Physiological Reviews. 2014b;94:35–79. doi: 10.1152/physrev.00041.2012. [DOI] [PubMed] [Google Scholar]
  95. Ulrich N, Hewig J. A miss is as good as a mile? processing of near and full outcomes in a gambling paradigm. Psychophysiology. 2014;51:819–823. doi: 10.1111/psyp.12232. [DOI] [PubMed] [Google Scholar]
  96. Vaghi MM, Luyckx F, Sule A, Fineberg NA, Robbins TW, De Martino B. Compulsivity reveals a novel dissociation between action and confidence. Neuron. 2017;96:348–354. doi: 10.1016/j.neuron.2017.09.006. [DOI] [PMC free article] [PubMed] [Google Scholar]
  97. van den Berg R, Anandalingam K, Zylberberg A, Kiani R, Shadlen MN, Wolpert DM. A common mechanism underlies changes of mind about decisions and confidence. eLife. 2016;5:e12192. doi: 10.7554/eLife.12192. [DOI] [PMC free article] [PubMed] [Google Scholar]
  98. Walentowska W, Moors A, Paul K, Pourtois G. Goal relevance influences performance monitoring at the level of the FRN and P3 components. Psychophysiology. 2016;53:1020–1033. doi: 10.1111/psyp.12651. [DOI] [PubMed] [Google Scholar]
  99. Walsh MM, Anderson JR. Learning from experience: event-related potential correlates of reward processing, neural adaptation, and behavioral choice. Neuroscience & Biobehavioral Reviews. 2012;36:1870–1884. doi: 10.1016/j.neubiorev.2012.05.008. [DOI] [PMC free article] [PubMed] [Google Scholar]
  100. Wolpert DM, Diedrichsen J, Flanagan JR. Principles of sensorimotor learning. Nature Reviews Neuroscience. 2011;12:739–751. doi: 10.1038/nrn3112. [DOI] [PubMed] [Google Scholar]
  101. Wolpert DM, Flanagan JR. Motor prediction. Current Biology. 2001;11:R729–R732. doi: 10.1016/S0960-9822(01)00432-8. [DOI] [PubMed] [Google Scholar]
  102. Wolpert DM, Flanagan JR. Computations underlying sensorimotor learning. Current Opinion in Neurobiology. 2016;37:7–11. doi: 10.1016/j.conb.2015.12.003. [DOI] [PMC free article] [PubMed] [Google Scholar]
  103. Yeung N, Botvinick MM, Cohen JD. The neural basis of error detection: conflict monitoring and the error-related negativity. Psychological Review. 2004;111:931–959. doi: 10.1037/0033-295X.111.4.931. [DOI] [PubMed] [Google Scholar]
  104. Yeung N, Holroyd CB, Cohen JD. ERP correlates of feedback and reward processing in the presence and absence of response choice. Cerebral Cortex. 2005;15:535–544. doi: 10.1093/cercor/bhh153. [DOI] [PubMed] [Google Scholar]
  105. Yeung N, Sanfey AG. Independent coding of reward magnitude and Valence in the human brain. Journal of Neuroscience. 2004;24:6258–6264. doi: 10.1523/JNEUROSCI.4537-03.2004. [DOI] [PMC free article] [PubMed] [Google Scholar]
  106. Yeung N, Summerfield C. Metacognition in human decision-making: confidence and error monitoring. Philosophical Transactions of the Royal Society B: Biological Sciences. 2012;367:1310–1321. doi: 10.1098/rstb.2011.0416. [DOI] [PMC free article] [PubMed] [Google Scholar]
  107. Yu AJ, Dayan P. Uncertainty, neuromodulation, and attention. Neuron. 2005;46:681–692. doi: 10.1016/j.neuron.2005.04.026. [DOI] [PubMed] [Google Scholar]

Decision letter

Editor: Tadeusz Wladyslaw Kononowicz1
Reviewed by: Tadeusz Wladyslaw Kononowicz2, Simon van Gaal3

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Acceptance summary:

The authors show a novel and important finding that participants use self-knowledge to optimize learning. Participants in a time estimation task used post-response information about their temporal errors to optimize learning. This is evident in the neural prediction error signals that indexed deviations from the intended target response. This work nicely integrates reinforcement-learning, time estimation and performance monitoring.

Decision letter after peer review:

Thank you for submitting your article "I knew that! Response-based Outcome Predictions and Confidence Regulate Feedback Processing and Learning" for consideration by eLife. Your article has been reviewed by 3 peer reviewers, including Tadeusz Wladyslaw Kononowicz as the Reviewing Editor and Reviewer #1, and the evaluation has been overseen by Richard Ivry as the Senior Editor. The following individual involved in review of your submission has agreed to reveal their identity: Simon van Gaal (Reviewer #3).

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

Summary:

The authors tested 40 human volunteers in a time production task and post-production performance evaluation (with an initially unknown target duration and feedback scale) while recording EEG. The authors tested the hypothesis that confidence (both its absolute value and its calibration to performance) have an effect on learning and that it affects the processing of reward and sensory prediction errors.

The reviewers all found the results interesting and the work well conducted. At the same time, the reviewers agreed that the authors should be able to address several issues and clarify multiple aspects of the task, the performed analyses, and the data interpretation. The comments were compiled into essential revisions, below, summarizing the remarks that call for additional data analysis and those proposing changes to the manuscript.

Essential revisions:

Additional analyses:

1. The authors analyze correlations between Error Magnitude, Predicted Outcome and Confidence; however, before proceeding to the analysis of ERPs, the manuscript could be improved by including a similar analysis of confidence correlations with RPE and SPE, beyond the one relying only on Predicted Outcome (Table 1).

2. Related to point 1, panel 3A should belong to Figure 1, especially if analyses proposed in the first point are included.

3. The authors showed that Error Magnitude decreases on average. However, all ERP analyses were focused on the current trial. If these ERP signals indeed reflect some "updating of internal representations", they should have a relationship with the behavioral or neural measures observed on the next trial. It would be very interesting to see how the processing of feedback (in behavior and ERP responses) relates to performance on the next trial. Such analyses would better support the claims of "updating of internal representations", and the manuscript would considerably improve in impact and quality if they were reported.

4. The precision (variance) of temporal performance plausibly changes over the course of the experiment. Variance dynamics across the experimental session could have affected the outcome of the confidence calibration. The authors rightfully show that Confidence Calibration was not related to Average Error Magnitude. The same check should be performed for Time Production variance. Moreover, the effects within participants and over the course of the experiment should be considered and presumably included as covariates in the LMM.

5. Specific point from one of the reviewers: The authors mention again on page 24: "We also found that confidence modulated RPE effects on P3b amplitude, such that in initial blocks, where most learning took place, RPE effects were amplified for higher confidence, whereas this effect reversed in later blocks, where RPE effects were present for low, but not high confidence. This shift is intriguing and may indicate a functional change in feedback use as certainty in the response-outcome mapping increases and less about this mapping is learned from feedback, but the effect was not directly predicted and therefore warrants further research and replication." This is the one result where confidence interacts with other behavioral measures, in this case RPE, which is interesting; however, it does so in an unpredicted and counterintuitive way. I wonder whether the authors can in some way get a better understanding of what's going on here? Possibly the paper by Colizoli et al. (2018, Sci Rep.) may be relevant. The authors there show how task difficulty (related to confidence) and error processing are reflected in pupil size responses.

Another reviewer raised concerns about how the different Confidence splits were computed. Although the authors provide an intriguing interpretation in the paragraph referenced above, is it possible that the early and late effects in fact originate from different groups of subjects?

To sum up, extending the analyses with respect to the interaction of confidence and RPE in modulation of P3b component would strongly benefit the manuscript.

6. There is no explicit statement of what exact instructions were given to participants beyond the following one: "participants were required to predict the feedback they would receive on each trial". The caption of Figure 1B says "scaled relative to the feedback anchors". Therefore, it is not clear what the primary objective of the task was – accurate time production or predicting the feedback accurately? Participants could have increased time production variance to perform better on feedback prediction. If participants employed that kind of strategy, it could have impacted indices of learning from feedback.

Given the lack of clarity about what instruction was provided to participants, it is still unclear which aspect of the task the participants focused on in their learning. Error Magnitude decreases over trials; however, do RPE and SPE increase over trials as well?

Reshaping the manuscript:

1. It was evident from all reviews that in many places the explicit link between interpretative statements and the performed analyses was far from clear. Below we list a few specific examples:

– "Taken together, our findings provide evidence that feedback evaluation is an active constructive process that is fundamentally affected by an individual's internal representations of their own performance at the time of feedback." I wonder what results the authors refer to here and on what results this statement is based on.

– The authors say "In line with the notion that positive feedback is more informative than error feedback in motor learning, we, like others in the time estimation task (65,66), observed increasing P3b amplitude after more positive feedback, in addition to prediction error effect". It is not clear which outcome the authors are referring to. Is "better than expected" referred to as "positive feedback"? In this case "worse than expected" triggered higher P3b amplitude.

– On page 24 the authors conclude that "Learning was further supported by the adaptive regulation of uncertainty-driven updating via confidence." Although this sounds interesting I do not see the results supporting this conclusion (but maybe I have missed those). I also think this conclusion is rather difficult to follow. The sentence thereafter they say "Specifically, as deviations from the goal were predicted with higher confidence, these more precise outcome predictions enhanced the surprise elicited by a given prediction error. Thus, a notable finding revealed by our simulations and empirical data is that, counterintuitively, agents and participants learned more from feedback when confidence in their predictions had been high." Also here I have difficulty extracting what the authors really mean. What does it mean "surprise elicited by a prediction error"? To me these are two different measures, one signed one unsigned. Further, where is it shown that participants learn more from feedback when confidence in their prediction was high?

– Differences between blocks in the effect of confidence. This result is discussed twice: in the Results (p. 19) and the Discussion. Only in the latter do the authors acknowledge that their interpretation of the effect is rather speculative. I would also flag that in the Results, as it was neither part of the model predictions nor their design.

2. Performed transformations involving confidence should be clearly explained.

3. Model specification (the formula) should be included in the table legend to aid readability and interpretation as it makes it immediately clear what was defined as a random or fixed effect.

4. On a more conceptual level, the authors rely on the assumption that 'Feedback Prediction' is derived from the efference copy, which carries motor noise only. In light of the goal of the current manuscript, that is an appropriate strategy. However, I think it should be acknowledged that in the employed paradigm part of the behavioral variance may originate from the inherent uncertainty of temporal representations (Balci, 2011). Typically, time production variance is partitioned into 'clock' variance and 'motor' variance. I feel that this distinction should be spelled out in the manuscript, and if assumptions are made they should be stated more clearly. Moreover, recent work attempted to tease apart the origins of 'Feedback Predictions', indicating that it is unlikely that they originate solely from motor variability (Kononowicz and Van Wassenhove, 2019).

5. The main predictions of the experiment are described in the first paragraph of the Results. But they are not reflected in Figure 1, which is referenced in that paragraph. I would have expected an illustration of the effects of confidence, and instead that only appears on Figure 2. The authors have clear predictions that drive the analysis, but this is not reflected in the flow of the text.

6. Simulations (Figure 2B, D): As far as I can tell, the model does not capture the data in two ways: it fails to address the cross-over effect (which the authors acknowledge) but also does not account for the apparent tendency of the error to increase on later trials (whereas the model predicts a strict decrease in error over the course of the experiment). The second aspect is not addressed in the Discussion, I think (or I missed it). Do the authors think this is just fatigue, and therefore do not consider it a reason to modify the model? Also, panels 2A and C do not really match, in the sense that the simulation is done over a much wider range of predicted outcomes. It seems like the model parameters were not fine-tuned to the data. Perhaps this is not strictly necessary if the quantitative predictions of the effects of confidence remain unchanged with a narrower range, but it is perhaps worth discussing.

7. "… it is unknown whether reward prediction errors signaled by the FRN rely on predictions based only on previous feedback, or whether they might incorporate ongoing performance monitoring". I think that phrase should be rephrased based on the findings of Miltner et al. (1997), correctly cited in the manuscript, which showed that FRN was responsive to correct and incorrect feedback in time estimation.

8. Relevance of the dart-throwing example: In the task, participants initially had no idea about the length of the to-be-reproduced interval, and instead had to approximate it iteratively. It was not immediately clear to me how this relates to a dart-throw, where the exact target position is known. I think I understand the authors that the unknown target here is "internal" – the specific motor commands that would lead to a bulls-eye are unknown and only iteratively approximated. If that interpretation is correct, I would recommend the authors clarify it explicitly in the paper, to aid the reader to make a connection. Or perhaps I misunderstood it. Either way it would be important to clarify it.

Balci, F., Freestone, D., Simen, P., Desouza, L., Cohen, J. D., and Holmes, P. (2011). Optimal temporal risk assessment. Front Integr Neurosci, 5, 56.

Colizoli, O., De Gee, J. W., Urai, A. E., and Donner, T. H. (2018). Task-evoked pupil responses reflect internal belief states. Scientific reports, 8(1), 1-13.

Correa, C. M., Noorman, S., Jiang, J., Palminteri, S., Cohen, M. X., Lebreton, M., and van Gaal, S. (2018). How the level of reward awareness changes the computational and electrophysiological signatures of reinforcement learning. Journal of Neuroscience, 38(48), 10338-10348.

Kononowicz, T. W., and Van Wassenhove, V. (2019). Evaluation of Self-generated Behavior: Untangling Metacognitive Readout and Error Detection. Journal of cognitive neuroscience, 31(11), 1641-1657.

eLife. 2021 Apr 30;10:e62825. doi: 10.7554/eLife.62825.sa2

Author response


Essential revisions:

Additional analyses:

1. The authors analyze correlations between Error Magnitude, Predicted Outcome and Confidence; however, before proceeding to the analysis of ERPs, the manuscript could be improved by including a similar analysis of confidence correlations with RPE and SPE, beyond the one relying only on Predicted Outcome (Table 1).

Thank you for this recommendation. We agree that our claim that Confidence reflects the precision of predictions can be tested more directly. As a critical test of our assumption that confidence varies with the precision of the prediction – i.e., the SPE – we now analyze Confidence as the dependent variable and test how it relates to the precision of the prediction (sensory prediction error), the precision of performance (error magnitude), and how these relationships change across blocks. Consistent with our theoretical assumption, we find a robust relationship with SPE. We also find that Confidence increases with increasing error magnitude, and more so in later blocks. The latter finding is important because it shows that participants were in fact reporting their confidence in the accuracy of their predictions and not confidence in their performance.

The novel Results section reads as follows: “To test more directly our assumption that Confidence tracks the precision of predictions, we followed up on these findings with a complementary analysis of Confidence as the dependent variable and tested how it relates to the precision of predictions (absolute discrepancy between predicted and actual outcome, see sensory prediction error, SPE below), the precision of performance (error magnitude), and how those change across blocks (Table 2). […] This pattern is thus not consistent with learning. Importantly, whereas error magnitude was robustly related to confidence only in the last two blocks, the precision of predictions was robustly related to confidence throughout.”
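For concreteness, the following is a minimal sketch of the kind of linear mixed model this analysis describes, not the authors' actual code: the data file and the column names (confidence, spe, error_magnitude, block, subject) are assumptions for illustration, and the random-effects structure shown here is only one plausible choice.

import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical single-trial data; file and column names are assumptions.
trials = pd.read_csv("trial_data.csv")

# Confidence as dependent variable; SPE, error magnitude, and their changes across
# blocks as predictors, with by-participant random intercepts and SPE slopes.
model = smf.mixedlm(
    "confidence ~ (spe + error_magnitude) * block",
    data=trials,
    groups=trials["subject"],
    re_formula="~spe",
)
print(model.fit().summary())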

As to the relationship with RPE, we agree that this is an important relationship to look at, particularly given the somewhat surprising 3-way interaction on the P3b (point 5). We think that in the context of feedback processing and ERPs this relationship is the most relevant and informative and therefore we now introduce a novel set of analyses that specifically investigates changes in our feedback regressors (RPE, SPE and EM) over time and their interaction with confidence.

The novel section reads as follows: “At the core of our hypothesis and model lies the change in feedback processing as a function of outcome predictions and confidence. […] Accordingly, we predicted that participants’ internal evaluations would modulate feedback processing as indexed by distinct feedback-related potentials in the EEG: the feedback-related negativity (FRN), P3a and P3b.”

2. Related to point 1, panel 3A should belong to Figure 1, especially if analyses proposed in the first point are included.

We agree that foreshadowing the different dimensions along which feedback can be evaluated would help readers. We have now altered Figure 1 to include the dissociation between performance and prediction and how one being better (or worse) than the other can alter the subjective valence of the outcome. In order to maintain the continuity of Figure 1, we have introduced these concepts as part of the cartoon example, rather than in terms of our exact task. Thus, we still include panel 3A in a separate figure in the new subsection we added following your recommendation, where we unpack the different kinds of prediction errors in our task, how they change across blocks, and how they vary as a function of confidence. We hope that this provides the relevant information at the appropriate locations in the manuscript.

3. The authors showed that Error Magnitude decreases on average. However, all ERP analyses were focused on the current trial. If these ERP signals indeed reflect some "updating of internal representations", they should have a relationship with the behavioral or neural measures observed on the next trial. It would be very interesting to see how the processing of feedback (in behavior and ERP responses) relates to performance on the next trial. Such analyses would better support the claims of "updating of internal representations", and the manuscript would considerably improve in impact and quality if they were reported.

We agree that it would be great if we could link the ERPs to adjustments in behavior, and we have now added an exploratory analysis linking the P3b to trial-by-trial adjustments to feedback. To show such trial-by-trial adjustments, we quantify the degree to which the performance improvement on the current trial relates to the error on the previous trial and demonstrate that this relationship is contingent on P3b amplitude, specifically in the first block when most learning takes place. We set up a model for improvement on the current trial (previous error minus current error) as a function of error magnitude on the previous trial in interaction with P3b amplitude and their interaction with block. We found the expected 3-way interaction (see Figure S4). When participants' performance improves the most (in block 1), larger P3b amplitudes to the feedback on the previous trial lead to larger improvements on the current trial. Note, however, that this finding is mostly driven by large errors and that participants are overall likely to perform worse following smaller errors. Responses at the single trial level are subject to substantial noise – the very prerequisite for our study – likely masking local adjustments in the underlying representations. We are therefore wary of overinterpreting this result and highlight potential caveats to this analysis in a new section we now added.

The new section reads: “We next explored whether P3b amplitude is associated with trial-by-trial adjustments. […] While intriguing and in line with previous work linking P3b to trial-by-trial adjustments in behavior, these results should be interpreted with a degree of caution given that the present task is not optimized to test for trial-to-trial adjustments in behavior.”
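A minimal sketch of the trial-by-trial model described above, under assumed column names (error_magnitude, p3b_amplitude, block, subject, trial); this is illustrative only and not the authors' actual specification.

import pandas as pd
import statsmodels.formula.api as smf

trials = pd.read_csv("trial_data.csv").sort_values(["subject", "trial"])

# Lag previous-trial error and previous-trial P3b amplitude within each participant.
g = trials.groupby("subject")
trials["prev_error"] = g["error_magnitude"].shift(1)
trials["prev_p3b"] = g["p3b_amplitude"].shift(1)

# Improvement: previous absolute error minus current absolute error.
trials["improvement"] = trials["prev_error"] - trials["error_magnitude"]

model = smf.mixedlm(
    "improvement ~ prev_error * prev_p3b * block",
    data=trials.dropna(subset=["prev_error", "prev_p3b"]),
    groups="subject",
)
print(model.fit().summary())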

4. The precision (variance) of temporal performance plausibly changes over the course of the experiment. Variance dynamics across the experimental session could have affected the outcome of the confidence calibration. The authors rightfully show that Confidence Calibration was not related to Average Error Magnitude. The same check should be performed for Time Production variance. Moreover, the effects within participants and over the course of the experiment should be considered and presumably included as covariates in the LMM.

This is an excellent point. We addressed this concern in two ways as suggested by the reviewer: First, we computed the correlation between participants' response variance and their confidence calibration. Second, to capture changes over time, we added participants' running average response variance as a covariate to the model of error magnitude.

In the manuscript we now write: “Confidence calibration was also not correlated with individual differences in response variance (r = –2.07e-4, 95% CI = [–0.31, 0.31], p = .999), and the interaction of confidence calibration and block was robust to controlling for running average response variance (Supplementary File 2).”
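A sketch of the two checks described above, with assumed column names (produced_interval, confidence_calibration, subject, trial); the 10-trial window for the running variance is an illustrative choice, not necessarily the one used in the paper.

import pandas as pd
from scipy.stats import pearsonr

trials = pd.read_csv("trial_data.csv").sort_values(["subject", "trial"])

# (1) Between-participant check: does response variance correlate with calibration?
per_subject = trials.groupby("subject").agg(
    response_var=("produced_interval", "var"),
    calibration=("confidence_calibration", "first"),
)
r, p = pearsonr(per_subject["response_var"], per_subject["calibration"])

# (2) Within-participant covariate: trailing 10-trial variance of the produced interval.
trials["running_response_var"] = (
    trials.groupby("subject")["produced_interval"]
    .transform(lambda x: x.rolling(10, min_periods=2).var())
)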

5. Specific point from one of the reviewers: The authors mention again on page 24: "We also found that confidence modulated RPE effects on P3b amplitude, such that in initial blocks, where most learning took place, RPE effects were amplified for higher confidence, whereas this effect reversed in later blocks, where RPE effects were present for low, but not high confidence. This shift is intriguing and may indicate a functional change in feedback use as certainty in the response-outcome mapping increases and less about this mapping is learned from feedback, but the effect was not directly predicted and therefore warrants further research and replication." This is the one result where confidence interacts with other behavioral measures, in this case RPE, which is interesting; however, it does so in an unpredicted and counterintuitive way. I wonder whether the authors can in some way get a better understanding of what's going on here?

We agree that this three-way interaction deserves more unpacking, particularly given the relevance of interactions with confidence for our theoretical hypothesis. The new analyses in response to comments 1 and 2 made it clear that changes in RPE effects with Confidence and Block are complicated by the fact that neither RPE itself nor the degree to which it reduces error signals relative to error magnitude is systematically related to Confidence. The interaction could reflect changes in the components of RPE – Error Magnitude and Predicted Error Magnitude.

To address this point, we now report a complementary analysis that uses predicted error magnitude rather than RPE. This has the advantage that it allows us to test the specific prediction that predictions are weighted by confidence when processing feedback. This is exactly what we find. In particular, in the first block, when most learning takes place, the degree to which predicted errors are discounted (as reflected in a decrease in P3b amplitude) depends on Confidence, and higher confidence is overall associated with larger P3b amplitudes. In later blocks, main effects of predicted error magnitude emerge (and we know from prior analyses that performance is more variable when confidence is low in those late blocks, allowing for larger errors to discount), likely underlying the late confidence by RPE interaction in our original analysis.

The novel Results section reads as follows: “Our hypothesis states that the degree to which people rely on their predictions when learning from feedback should vary with their confidence in those predictions. […] In later trials however, when confidence is higher overall, participants discount their predicted errors even when confidence is relatively low.”

The corresponding section in the discussion now reads: “We also found that confidence modulated the degree to which predicted error magnitude reduced P3b amplitude, such that in initial blocks, where most learning took place, predicted error magnitude effects were amplified for higher confidence, whereas this effect diminished in later blocks, where predicted error magnitude effects were present also for low confidence (and performance and prediction errors were attenuated when confidence was high).”

Possibly the paper by Colizoli et al. (2018, Sci Rep.) may be relevant. The authors there show how task difficulty (related to confidence) and error processing are reflected in pupil size responses.

Thank you for pointing out the Colizoli reference to us. This is indeed very relevant and we now cite it in the discussion.

Another reviewer raised concerns about how the different Confidence splits were computed. Although the authors provide an intriguing interpretation in the paragraph referenced above, is it possible that the early and late effects in fact originate from different groups of subjects?

The confidence splits in the original analysis were merely performed to get a sense of the underlying pattern. We have now removed these follow-up analyses, as we follow up on the 3-way interaction as described above. The pattern in these novel analyses renders a between-group effect unlikely. When separating out each participant's mean confidence from their z-scored trial-wise variations, we find that the within-subject variability drives the effects we observe – reassuring us about our interpretation. However, confidence levels between subjects seem important as well, as results become unstable when they are not included, maybe because confidence changes with learning.
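A minimal sketch of the within/between decomposition described above, assuming columns named confidence and subject; the exact centering and scaling used in the paper may differ.

import pandas as pd

trials = pd.read_csv("trial_data.csv")

# Between-participant component: each participant's mean confidence.
g = trials.groupby("subject")["confidence"]
trials["confidence_mean"] = g.transform("mean")

# Within-participant component: trial-wise deviations, z-scored per participant.
trials["confidence_within_z"] = (
    trials["confidence"] - trials["confidence_mean"]
) / g.transform("std")

# Both terms could then enter a mixed model side by side, e.g.
# "p3b ~ predicted_error_magnitude * (confidence_mean + confidence_within_z) * block"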

To sum up, extending the analyses with respect to the interaction of confidence and RPE in modulation of P3b component would strongly benefit the manuscript.

We agree that the extension of the analyses has benefitted the manuscript and thank the reviewers for their recommendation.

6. There is no explicit statement of what exact instructions were given to participants beyond the following one: "participants were required to predict the feedback they would receive on each trial". The caption of Figure 1B says "scaled relative to the feedback anchors". Therefore, it is not clear what the primary objective of the task was – accurate time production or predicting the feedback accurately? Participants could have increased time production variance to perform better on feedback prediction. If participants employed that kind of strategy, it could have impacted indices of learning from feedback.

Given the lack of clarity about what instruction was provided to participants, it is still unclear which aspect of the task the participants focused on in their learning. Error Magnitude decreases over trials; however, do RPE and SPE increase over trials as well?

Participants were instructed to learn to produce the correct time interval. Thus, the emphasis was on correct time production. In addition, they were asked to estimate the error in their response.

We now clarify in the Method: “Participants were instructed that their primary goal in this task is to learn to produce an initially unknown time interval. In addition, they were asked to predict the direction and magnitude of any errors they produced and their confidence in those predictions.”

As now reported in the manuscript, SPE decreased over the course of the experiment just like Error magnitude (primarily from block 1 to 2). Changes in RPE are difficult to interpret given that both better than expected and worse than expected outcomes are still “incorrectly” predicted. SPE is thus clearly the superior indicator. However, we can also look at changes in absolute RPE across blocks and we find that, like Error Magnitude and SPE, it decreases across blocks and primarily from block 1 to 2. Note, however, that these changes are primarily driven by improvements in time estimation performance and diminish substantially once we control for Error Magnitude. We have now added all these analyses in the Feedback section prior to the ERP analyses.

Reshaping the manuscript:

1. It was evident from all reviews that in many places the explicit link between interpretative statements and the performed analyses was far from clear. Below we list a few specific examples:

– "Taken together, our findings provide evidence that feedback evaluation is an active constructive process that is fundamentally affected by an individual's internal representations of their own performance at the time of feedback." I wonder what results the authors refer to here and on what results this statement is based on.

We can see how some of this phrasing goes beyond the key findings of our study. We have now simplified the sentence to more distinctly reflect our contributions: “Taken together, our findings provide evidence that feedback evaluation is fundamentally affected by an individual’s internal representations of their own performance at the time of feedback.”

– The authors say "In line with the notion that positive feedback is more informative than error feedback in motor learning, we, like others in the time estimation task (65,66), observed increasing P3b amplitude after more positive feedback, in addition to prediction error effect". It is not clear which outcome the authors are referring to. Is "better than expected" referred to as "positive feedback"? In this case "worse than expected" triggered higher P3b amplitude.

Thank you, we now realize that this was ambiguous. This statement refers to objective performance and we have now changed the statement to make this clear. “In line with the notion that in motor learning feedback about success is more informative than feedback about failure, we, like others in the time estimation task 66,67, observed increasing P3b amplitude after feedback about more accurate performance (i.e. for smaller error magnitude), in addition to prediction error effects.”

– On page 24 the authors conclude that "Learning was further supported by the adaptive regulation of uncertainty-driven updating via confidence." Although this sounds interesting I do not see the results supporting this conclusion (but maybe I have missed those). I also think this conclusion is rather difficult to follow. The sentence thereafter they say "Specifically, as deviations from the goal were predicted with higher confidence, these more precise outcome predictions enhanced the surprise elicited by a given prediction error. Thus, a notable finding revealed by our simulations and empirical data is that, counterintuitively, agents and participants learned more from feedback when confidence in their predictions had been high." Also here I have difficulty extracting what the authors really mean. What does it mean "surprise elicited by a prediction error"? To me these are two different measures, one signed one unsigned. Further, where is it shown that participants learn more from feedback when confidence in their prediction was high?

It seems that there are some misunderstandings here that we have now tried to clarify. To address the unclear link between the conclusion and our findings, we have now extended this section to read: “Learning – in our computational model and in our participants – was further supported by the adaptive regulation of uncertainty-driven updating via confidence. […] Thus, a notable finding revealed by our simulations and empirical data is that, counterintuitively, agents and participants learned more from feedback when confidence in their predictions had been high.”

We believe it is important to dissociate between prediction errors and surprise. In particular, we quantify two types of prediction errors, where only RPE is signed (better or worse than predicted) and SPE (how different than predicted) is not. However, we propose that surprise is even more nuanced than the latter, because it is not only dependent on the absolute mismatch between prediction and outcome, but also on the confidence with which the prediction was made. That is what we simulate using Shannon information. To make this more apparent from the beginning and provide intuitions, we now foreshadow this concept in the introduction: “In the throwing example above, the more confident you are about the exact landing position of the dart, the more surprised you should be when you find that landing position to be different: The more confident you are, the more evidence you have that your internal model linking angles to landing positions is wrong, and the more information you get about how this model is wrong. […] However, this reasoning assumes that your predictions are in fact more precise when you are more confident, i.e., that your confidence is well calibrated (Figure 1B).”

We have further altered the relevant sentences in last paragraph in the introduction to read: “That is to say, an error that could be predicted based on internal knowledge of how an action was executed should not yield a large surprise (P3a) or reward prediction error (FRN) signal in response to an external indicator of the error (feedback). However, any prediction error should be more surprising when predictions were made with higher confidence.”

– Differences between blocks in the effect of confidence. This result is discussed twice: in the Results (p. 19) and the Discussion. Only in the latter do the authors acknowledge that their interpretation of the effect is rather speculative. I would also flag that in the Results, as it was neither part of the model predictions nor their design.

Thank you for pointing out this oversight on our end. We have now entirely removed the interpretation of the 3-way interaction from the Results section. As described in our response to point 5 above, we have added extensive additional analyses that provide better insight into the confidence effects across blocks as they relate to our hypothesis, and we now rely on these additional findings for our interpretations instead.

2. Performed transformations involving confidence should be clearly explained.

We assume that this comment refers to the computation of confidence calibration. The reviewers are right that we did not clearly explain this in the Results. Now that we have added the novel group-level analysis of the underlying relationship, we build on it to unpack more clearly how we derive the individual difference measure, before moving on to the section where we test for differences in learning varying with confidence calibration.

“Having demonstrated that, across individuals, confidence reflects the precision of their predictions (via the correlation with SPE), we next quantified this relationship for each participant separately as an index of their confidence calibration. […] We next tested our hypothesis that confidence calibration relates to learning.”
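As an illustration of how such an individual calibration index could be derived (a sketch under assumed column names; the paper may use a different quantification, e.g. a regression slope rather than a correlation):

import pandas as pd

trials = pd.read_csv("trial_data.csv")

# Per-participant calibration index: how strongly trial-wise confidence tracks the
# (im)precision of the prediction (SPE). Sign is flipped so that larger values mean
# better calibration (higher confidence when the prediction error is smaller).
confidence_calibration = (
    trials.groupby("subject")
    .apply(lambda d: -d["confidence"].corr(d["spe"]))
    .rename("confidence_calibration")
)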

3. Model specification (the formula) should be included in the table legend to aid readability and interpretation as it makes it immediately clear what was defined as a random or fixed effect.

We have now added formulas to all tables.

4. On a more conceptual level, the authors rely on the assumption that 'Feedback Prediction' is derived from the efference copy, which carries motor noise only. In light of the goal of the current manuscript, that is an appropriate strategy. However, I think it should be acknowledged that in the employed paradigm part of the behavioral variance may originate from the inherent uncertainty of temporal representations (Balci, 2011). Typically, time production variance is partitioned into 'clock' variance and 'motor' variance. I feel that this distinction should be spelled out in the manuscript, and if assumptions are made they should be stated more clearly. Moreover, recent work attempted to tease apart the origins of 'Feedback Predictions', indicating that it is unlikely that they originate solely from motor variability (Kononowicz and Van Wassenhove, 2019).

First of all, apologies for having missed these papers. Thank you for pointing them out to us! It’s true, we previously used motor noise as a blanket term to account for multiple sources of variability. This was more of a convenience than a strong assumption. We have now replaced motor noise with response noise throughout the manuscript and briefly mention the two different drivers, citing the mentioned papers. “First, errors signaled by feedback include contributions of response noise, e.g. through variability in the motor system or in the representations of time 24,41. Second, the efference copy of the executed response (or the estimate of what was done) varies in its precision.”

5. The main predictions of the experiment are described in the first paragraph of the Results. But they are not reflected in Figure 1, which is referenced in that paragraph. I would have expected an illustration of the effects of confidence, and instead that only appears on Figure 2. The authors have clear predictions that drive the analysis, but this is not reflected in the flow of the text.

Thank you for your comment. We agree that it would help to visualize our predictions in the beginning. We have now revised Figure 1 to clarify key concepts and show our main predictions. We still show the model predictions and the empirical data as tests of these predictions in Figure 2.

6. Simulations (Figure 2B, D): As far as I can tell, the model does not capture the data in two ways: it fails to address the cross-over effect (which the authors acknowledge) but also does not account for the apparent tendency of the error to increase on later trials (whereas the model predicts a strict decrease in error over the course of the experiment). The second aspect is not addressed in the Discussion, I think (or I missed it). Do the authors think this is just fatigue, and therefore do not consider it a reason to modify the model? Also, panels 2A and C do not really match, in the sense that the simulation is done over a much wider range of predicted outcomes. It seems like the model parameters were not fine-tuned to the data. Perhaps this is not strictly necessary if the quantitative predictions of the effects of confidence remain unchanged with a narrower range, but it is perhaps worth discussing.

To follow up on this, we plotted the log-transformed running average error magnitude for three bins of confidence calibration. As can be seen in Figure 2—figure supplement 3, our statistical model approximates, but does not fully capture, the shape of the learning curves, which for low confidence calibration seem to saturate within the first 100 trials rather than showing a marked increase towards the end. We now note this in the figure caption. This figure shows the running average log-transformed error magnitude (10 trials) averaged within Confidence calibration terciles across trials. Computing running averages was necessary to denoise the raw data for display. The edited figure caption reads: “Note that the combination of linear and quadratic effects approximates the shape of the learning curves better than a linear effect alone, but predicts an exaggerated uptick in errors towards the end, cf. Figure 2—figure supplement 3.”
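A short sketch of this descriptive computation (assumed column names; the log offset and the 10-trial window are illustrative choices):

import numpy as np
import pandas as pd

trials = pd.read_csv("trial_data.csv").sort_values(["subject", "trial"])

# 10-trial running average of log-transformed error magnitude per participant.
trials["log_error"] = np.log(trials["error_magnitude"] + 1)  # offset avoids log(0)
trials["running_log_error"] = (
    trials.groupby("subject")["log_error"]
    .transform(lambda x: x.rolling(10, min_periods=1).mean())
)

# Split participants into confidence-calibration terciles and average the curves.
per_subject = trials.groupby("subject")["confidence_calibration"].first()
terciles = pd.qcut(per_subject, 3, labels=["low", "medium", "high"])
trials["calibration_tercile"] = trials["subject"].map(terciles)
curves = trials.groupby(["calibration_tercile", "trial"])["running_log_error"].mean()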

7. "… it is unknown whether reward prediction errors signaled by the FRN rely on predictions based only on previous feedback, or whether they might incorporate ongoing performance monitoring". I think that phrase should be rephrased based on the findings of Miltner et al. (1997), correctly cited in the manuscript, which showed that FRN was responsive to correct and incorrect feedback in time estimation.

We think this is a misunderstanding. As the Reviewer describes, Miltner et al. demonstrated that on average, error feedback elicits a negative deflection over fronto-central sites (FRN) relative to correct feedback. They do not consider expectations/predictions at all – either based on performance history or on performance monitoring. This finding was later built on and extended (Holroyd and Coles, 2002) by showing that the processing of error and correct feedback is sensitive to contextual expectations, i.e. reflects reward prediction error, not just error feedback per se. We extend this line of work further, by asking whether beyond the contextually-defined reward prediction error, FRN amplitude is sensitive to response-based outcome predictions derived through internal monitoring. Thus, the key question in our paper is whether error detection feeds into the prediction that underlies the prediction error processing reflected in the FRN. Miltner et al. have not shown or tested that. To avoid confusion, we have now changed the corresponding sentence to read: “However, it is unknown whether reward prediction errors signaled by the FRN contrast current feedback with predictions based only on previous (external) feedback, or whether they might incorporate ongoing (internal) performance monitoring.”

8. Relevance of the dart-throwing example: In the task, participants initially had no idea about the length of the to-be-reproduced interval, and instead had to approximate it iteratively. It was not immediately clear to me how this relates to a dart-throw, where the exact target position is known. I think I understand the authors that the unknown target here is "internal" – the specific motor commands that would lead to a bulls-eye are unknown and only iteratively approximated. If that interpretation is correct, I would recommend the authors clarify it explicitly in the paper, to aid the reader to make a connection. Or perhaps I misunderstood it. Either way it would be important to clarify it.

Thank you, yes, that is exactly right. One can think of the feedback scale just like the target. In each case, the right movement that produces the desired outcome needs to be learned. It doesn’t matter if I know that the target interval is 1.5 s if I don’t have a good sense of what that means for my time production. Similarly, it doesn’t help me to know where the target is if I don’t know how to reach it. Thus, the reviewer is exactly right that what is being iteratively approximated is the correct response. We now unpack the dart-throwing example in more detail throughout the introduction and when we introduce the task. To explicitly tie the relevant concepts together we now write: “In comparison to darts throwing as used in our example, the time estimation task requires a simple response – a button press – such that errors map onto a single axis that defines whether the response was provided too early, timely, or too late and by how much. These errors can be mapped onto a feedback scale and, just as in the darts example where one learns the correct angle and acceleration to hit the bullseye, participants here can learn the target timing interval.”

Balci, F., Freestone, D., Simen, P., Desouza, L., Cohen, J. D., and Holmes, P. (2011). Optimal temporal risk assessment. Front Integr Neurosci, 5, 56.

Colizoli, O., De Gee, J. W., Urai, A. E., and Donner, T. H. (2018). Task-evoked pupil responses reflect internal belief states. Scientific reports, 8(1), 1-13.

Correa, C. M., Noorman, S., Jiang, J., Palminteri, S., Cohen, M. X., Lebreton, M., and van Gaal, S. (2018). How the level of reward awareness changes the computational and electrophysiological signatures of reinforcement learning. Journal of Neuroscience, 38(48), 10338-10348.

Kononowicz, T. W., and Van Wassenhove, V. (2019). Evaluation of Self-generated Behavior: Untangling Metacognitive Readout and Error Detection. Journal of cognitive neuroscience, 31(11), 1641-1657.

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    Supplementary file 1. Follow-up on prediction and performance precision effects on confidence.
    elife-62825-supp1.docx (16.2KB, docx)
    Supplementary file 2. Control analysis for confidence calibration effect on learning.
    elife-62825-supp2.docx (15KB, docx)
    Supplementary file 3. Follow-up on block and confidence effects on relative error signals.
    elife-62825-supp3.docx (15.8KB, docx)
    Supplementary file 4. Follow-up on confidence by block interaction on RPE benefit over error magnitude.
    elife-62825-supp4.docx (15.5KB, docx)
    Supplementary file 5. Block and confidence effects on error signals.
    elife-62825-supp5.docx (16.3KB, docx)
    Supplementary file 6. Follow-up on block and confidence effects on error signals.
    elife-62825-supp6.docx (17.1KB, docx)
    Supplementary file 7. Block and confidence effects on error signals.
    elife-62825-supp7.docx (15.2KB, docx)
    Supplementary file 8. Follow-up analyses on confidence-weighted predicted error magnitude effects on P3b.
    elife-62825-supp8.docx (16.9KB, docx)
    Supplementary file 9. Trial-to-trial improvements by block and previous error and modulations by previous P3b.
    elife-62825-supp9.docx (15.9KB, docx)
    Supplementary file 10. Follow-up on trial-to-trial improvements by block and previous error and modulations by previous P3b.
    elife-62825-supp10.docx (18.4KB, docx)
    Transparent reporting form

    Data Availability Statement

    The datasets generated and analyzed during the current study are available under https://github.com/froemero/Outcome-Predictions-and-Confidence-Regulate-Learning (copy archived at swh:1:rev:e8bfacf8fdb8126aade59581b98616b4f2fae7b3; Frömer, 2021).

    Scripts and source data for all analyses are available under https://github.com/froemero/Outcome-Predictions-and-Confidence-Regulate-Learning (copy archived at https://archive.softwareheritage.org/swh:1:rev:e8bfacf8fdb8126aade59581b98616b4f2fae7b3).

