Differentiating between Bayesian parameter learning and structure learning based on behavioural and pupil measures

Danaja Rutar; Olympia Colizoli; Luc Selen; Lukas Spieß; Johan Kwisthout; Sabine Hunnius

doi:10.1371/journal.pone.0270619

2023 Feb 16;18(2):e0270619. doi: 10.1371/journal.pone.0270619

Editor: Anthony C Constantinou

PMCID: PMC9934335 PMID: 36795714

Abstract

Within predictive processing two kinds of learning can be distinguished: parameter learning and structure learning. In Bayesian parameter learning, parameters under a specific generative model are continuously being updated in light of new evidence. However, this learning mechanism cannot explain how new parameters are added to a model. Structure learning, unlike parameter learning, makes structural changes to a generative model by altering its causal connections or adding or removing parameters. Whilst these two types of learning have recently been formally differentiated, they have not been empirically distinguished. The aim of this research was to empirically differentiate between parameter learning and structure learning on the basis of how they affect pupil dilation. Participants took part in a within-subject computer-based learning experiment with two phases. In the first phase, participants had to learn the relationship between cues and target stimuli. In the second phase, they had to learn a conditional change in this relationship. Our results show that the learning dynamics were indeed qualitatively different between the two experimental phases, but in the opposite direction to what we originally expected. Participants learned more gradually in the second phase than in the first phase. This might imply that participants built multiple models from scratch in the first phase (structure learning) before settling on one of these models. In the second phase, participants possibly just needed to update the probability distribution over the model parameters (parameter learning).

Introduction

Imagine you recently moved from one desert city in Australia to another. In your old neighbourhood all front yards were brown due to severe drought. However, in your new neighbourhood, to your surprise, all the front yards appear to be green. You wonder how this is possible since both cities have the same climate. You come up with potential reasons for this and decide that probably the new neighbourhood has just experienced a period of heavy rain. It is not until a week later that you see your neighbour changing their lawn for a new one. Now, you realise that all the front yards in this town have artificial green lawns. At that moment you add a new parameter to your model that can explain the situation which was unexplainable under the old model.

What challenge does the example above pose? We make sense of the world through models and use them for generating predictions about the incoming sensory evidence ([1, 2]). When a model fails to adequately predict the sensory evidence, it needs to be adjusted. In the case above, a new model parameter needs to be added to account for the new observation. How does this happen? What mechanism allows for adding a new parameter to an existing model? This question has been one of the central questions in cognitive science, but it remains, at least empirically, understudied.

Here, we address this long-standing question within the predictive-processing framework, a popular and influential theoretical framework in computational cognitive neuroscience ([1, 3–7]). The predictive-processing framework aims to be a unifying framework for understanding the entirety of human cognition and behaviour from visual processing ([8–10]) and action [11] to mentalizing ([12, 13]). According to the theory, the brain embodies a hierarchical generative model that “aim[s] to capture the statistical structure of some set of observed inputs by tracking (one might say, by schematically recapitulating) the causal matrix responsible for that very structure” [3]. Based on this hierarchical model, the brain generates top-down predictions, which are compared to the incoming sensory input. The difference between the predicted and the actual sensory input, that is the prediction error, is computed. From a predictive processing perspective, minimising reducible prediction error is the primary goal of computations in the brain and occurs mainly as a result of learning ([1, 4, 14–16]).

Until recently, learning in predictive processing was cast as parameter learning, where parameters under a specific generative model are updated in light of new evidence using Bayes’ rule ([4, 15, 17]). Such a formalism is well suited for explaining how learning proceeds when the generative model contains all relevant parameters for a particular learning task. In other words, parameter learning can only ensue when the structure of a generative model is established. Unless we assume that learners are equipped from the start with the complete set of parameters that can explain every situation they will ever encounter, we need to explain how novel parameters are added to a generative model or removed from it ([18–20]). To account for this, a new type of learning has been proposed within predictive processing: structure learning ([2, 5, 15, 17, 21]). This type of learning changes the structure of a generative model by changing the number of parameters in a model or by altering their functional dependencies ([2, 5, 15, 17, 21]). Depending on the type of structural change, we can further differentiate between two types of structure learning: model reduction and model expansion. One can start from an overcomplete generative model and then eliminate redundant parameters (i.e., Bayesian model reduction) ([22]), or one can start with a crude model and then add new parameters or functional dependencies (i.e., model expansion) ([5, 15]), for example using non-parametric methods ([23, 24]; but see [21] for a critique of non-parametric methods for explaining certain aspects of structure learning). Structural changes occur if the addition or the removal of a parameter yields a larger marginal likelihood compared to the marginal likelihood of the structurally unchanged generative model ([5, 15, 17]).
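The marginal-likelihood criterion can be made concrete with a toy example. The sketch below is an illustration only and is not taken from the study: it assumes Bernoulli outcomes with a uniform Beta(1, 1) prior, and the trial counts are hypothetical. It compares a "simple" model with one contingency parameter against an "expanded" model with a separate parameter for tone and no-tone trials.

```python
from math import lgamma

def log_marginal_bernoulli(k, n):
    """Log marginal likelihood of one specific binary sequence with
    k successes in n trials, under a uniform Beta(1, 1) prior
    (assumed here for illustration): log B(k + 1, n - k + 1)."""
    return lgamma(k + 1) + lgamma(n - k + 1) - lgamma(n + 2)

# Hypothetical counts (k, n): how often the original visual mapping held.
tone = (8, 40)       # mapping held on 8 of 40 tone trials
no_tone = (32, 40)   # mapping held on 32 of 40 no-tone trials

# Simple model: one shared parameter for all trials.
simple = log_marginal_bernoulli(tone[0] + no_tone[0], tone[1] + no_tone[1])

# Expanded model: one parameter per auditory-cue condition.
expanded = log_marginal_bernoulli(*tone) + log_marginal_bernoulli(*no_tone)

# The structural change is favoured when it has the larger marginal likelihood.
adopt_expansion = expanded > simple
```

With these made-up counts the expanded model has the larger log marginal likelihood, so the extra parameter would be added; with counts that do not differ between conditions, the simple model would be preferred because the marginal likelihood penalises unneeded parameters.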

Building on the formal distinction between the two learning mechanisms, the aim of this study was to investigate whether parameter learning and structure learning can be empirically distinguished. To investigate this, we created an experiment with two phases. Before the task, participants were presented with all model variables (i.e., the different predictive cues and the target stimulus) to ensure that they were familiar with the basic model structure prior to the experiment. In the first phase of the experiment, participants were expected to acquire the relationship between the cues and the target stimulus as sensory evidence was accumulating. In the second phase, the need for a structural change was induced by adding a new conditional dependency. In short, the first experimental phase was designed to elicit parameter learning and the second phase to trigger structure learning.

As a result of learning a more adequate model of the world, predictions should become better over time, and, simultaneously, uncertainty in each prediction should decrease ([3, 7]). Crucially, we expected that the two experimental phases of our task would lead to two distinct dynamics of this learning process. Gradual updating of the probability distributions of existing model parameters was expected to occur in the first phase, indicated by a gradual increase in predictive accuracy of the cue-target relationship. An abrupt change from incorrect to correct predictions, once a new model parameter has been added resulting from a conditional rule change, was expected to occur in the second phase (see section Hypotheses for a more detailed description).

Assuming we could instantiate such a learning trajectory in our task, we aimed to investigate the dynamics of a physiological correlate of information gain following the presentation of the outcome on each trial, for which there is no overt behavioural marker. Pupil dilation under constant luminance is a well-known indirect measure of the brain’s neuromodulatory arousal systems, including the noradrenergic locus coeruleus and the cholinergic basal forebrain ([25–29]). Subcortical arousal systems may be involved in transmitting internal uncertainty signals and (reward) prediction errors to circuits necessary for inference and action selection ([25, 30–37]). In line with this, several studies have shown that after a person is given feedback on the accuracy of a decision they just made, their pupil dilation scales with the amount of novel information gained as a result of the feedback ([16, 38–47]). Therefore, we reasoned that the target-locked pupil response in our current task might similarly reflect how informative the target was on each trial relative to the prior prediction made by the participant. We expected that target-locked pupil responses would gradually decrease in the first phase as participants learned the task contingencies over time and abruptly decrease in the second phase of the experiment once a new model parameter has been added resulting from a conditional rule change.

Method

Participants

Participants were recruited using Radboud University’s online recruitment system. The only restriction for participation was a minimum age of 16 years. Thirty-three healthy adults with normal or corrected-to-normal vision participated in our study. One participant was excluded for not following the instructions properly. Two participants were excluded due to equipment malfunction. The final sample consisted of 30 participants (24 women, aged 19–42 years, M = 23.3, SD = 4.5). The Ethics Committee of the Social Sciences Faculty at Radboud University approved the study, and all participants gave written informed consent. Participants received 15 euros for participating in the experiment.

Task and procedure

Task instructions

All participants were instructed to sit approximately 50 cm from the screen and place their chin in a chin rest. Participants carried out a computer-based two-alternative forced-choice (2AFC) task on the expected orientation (left vs. right) of the target stimulus (Gabor patches, Fig 1A). The experiment consisted of two phases with 200 trials each (400 trials in total). It took 1.5 hours to complete the experiment and there were three breaks (two short breaks halfway through each phase and one longer break between the two phases) during the experiment. After each break, recalibration of the eye-tracker took place (see below). At the beginning of the experiment, participants were told that they would be presented with auditory and visual cues. The instructions were to use the cues to predict the orientation of the target stimulus in each trial. Before the start of the experiment, participants were presented with an example trial. They indicated their prediction by pressing either the right or left button on a button box for left or right orientation, respectively. Participants were instructed to press the button as soon as they thought they knew which target orientation would appear. At the end of the first phase, participants were told that the cue-target contingency would change (in the second phase) but not what the change would be. In the second phase, participants were similarly instructed to continue to use the cues to predict the orientation of the target stimulus.

Fig 1 — (A) Trial structure of the behavioural task. Participants performed a 2AFC task on the expected orientation (left/right) of upcoming Gabor patches while pupil dilation was recorded. Each trial consisted of a fixation period, a cue period, a response window followed by a delay period, and finally a target period. The decision interval ranged from onset of the cue to the participant’s response. The target interval ranged from target onset into the subsequent inter-trial interval (3 s). The target served as feedback on the accuracy of participants’ predictions in the decision interval. (B) An illustration of one of the two counterbalanced cue-target mappings. The participants had to learn cue-target contingencies to accurately predict the orientation of the upcoming Gabor patch (target). Mapping 1 was defined as the visual cue-target pairs that occurred in 80% of trials in the first phase; Mapping 2 was defined as the visual cue-target pairs that occurred in 20% of trials in the first phase. Mappings were counterbalanced between participants (i.e., half of the participants received the square -> left, diamond -> right mapping in the 80% condition in phase 1). At the start of the second phase, the frequencies (80% vs. 20%) of the cue-target mappings were reversed for trials containing the auditory tone cue only. (C) Main hypotheses for the dynamics of accuracy and information gain following the target presentation over the course of the 2AFC task. The first 200 trials of the task represent the first phase in which a gradual increase in accuracy and a gradual decrease in the absolute value of information gain were expected (represented by exponential curves). Within the second phase (the last 200 trials), an abrupt increase in accuracy and an abrupt decrease in information gain were expected (represented by sigmoidal curves).

Trial structure and experimental stimuli

Stimuli were isoluminant and the environmental illumination was the same for all participants. Stimuli were presented on a computer screen with a spatial resolution of 1920 × 1080 pixels. One trial lasted approximately nine seconds. Each trial consisted of a fixation period, a cue period, a response window followed by a delay period, and a target period. For all periods, except for the target period, a vertically oriented Gabor patch was presented. The target stimulus was a Gabor patch oriented to the left or to the right, with a spatial frequency of 0.033 and opacity 0.5. Each trial started with a fixation cross on the vertical Gabor patch, which was shown on the screen for 1500 ms. Afterward, the visual cue, which was either a square or a diamond, was presented in the middle of the screen for 1000 ms. In 50% of the trials the visual cue was paired with an auditory cue (tone), which was presented for 300 ms. In the other 50% of the trials, the auditory cue was absent. During the subsequent interval, the fixation cross and the vertically oriented Gabor patch were presented, and participants were asked to indicate their prediction about the upcoming orientation of the target by pressing a button. There was no maximum response window. After a button had been pressed, a delay period started with the vertical Gabor patch on the screen for an additional 3000 ms. After the delay period, the target stimulus (the Gabor patch tilted either left or right) was shown for 3000 ms. The durations of the response period and target window were chosen in order to avoid contamination of the pupil dilation response by the response to a previous event. The delay period following the response window was sufficiently long to ensure that the pupil response to the target stimulus would not be contaminated by the motor response of the button press [39]. It is important to note that the target Gabor patch served as trial-by-trial feedback on the accuracy of participants’ cue-target predictions.

Task structure

To disambiguate structure learning from parameter learning, our experimental paradigm needed to induce, after a phase of gradual learning, an “aha” moment in which participants suddenly realized a novel contingency. This required a paradigm that goes beyond conventional reversal learning (i.e., where contingencies simply change and the parameters encoding those contingencies are updated via parameter learning).

We devised a two-phase experimental paradigm during which participants first learned a simple model of cue-target mappings. In the second phase, we introduced a structural change in the cue-target mapping by adding a conditional dependency. Whereas in the first phase there was no interaction between the predictive validity of visual and auditory cues, in the second phase the predictive validity of visual cues depended on the presence of auditory cues, as the visual cue-target mappings were reversed for trials in which the auditory cue was present.

The design contained probabilistic cue-target mappings to introduce uncertainty in the predictions, simulating uncertainty that is inherent to perception in the real world. The visual and auditory cues predicted whether the target Gabor patch was tilted to the right or to the left with either an 80% or 20% probability (Fig 1B). Note that we define mapping 1 (M1) to correspond to the 80% visual cue-target pairs with respect to the first phase and mapping 2 (M2) to correspond to the 20% visual cue-target pairs with respect to the first phase. Cue-target mappings were counterbalanced between participants such that half of the participants saw the square followed by a right-oriented Gabor patch and a diamond followed by a left-oriented Gabor patch in 80% of the trials, and, for the other half of the participants, this mapping was reversed (i.e., square -> left and diamond -> right in the 80% condition). In the remaining 20% of the trials, the participants received the reversed cue-target mapping with respect to their 80% mapping condition.
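The contingency structure described above can be written down compactly. The following sketch is a hedged illustration rather than the authors' task code: the function names, the dictionary `M1`, and the use of Python's `random` module are assumptions, and only one of the two counterbalancing groups is shown.

```python
import random

# Mapping 1 (M1) for one counterbalancing group, as described in the text:
# square -> right and diamond -> left on 80% of first-phase trials.
M1 = {"square": "right", "diamond": "left"}

def flip(orientation):
    """Return the opposite target orientation."""
    return "left" if orientation == "right" else "right"

def p_mapping1(phase, tone):
    """Probability that a trial follows M1.

    In phase 2 the frequencies are reversed for tone trials only;
    everywhere else M1 holds on 80% of trials.
    """
    return 0.2 if (phase == 2 and tone) else 0.8

def sample_target(cue, tone, phase, rng):
    """Draw a target orientation for one trial (illustrative only)."""
    m1_target = M1[cue]
    if rng.random() < p_mapping1(phase, tone):
        return m1_target
    return flip(m1_target)
```

For example, `sample_target("square", True, 2, random.Random(1))` draws a target for a phase-2 tone trial, where the square now predicts a left-tilted target on 80% of trials for this group. The auditory cue therefore carries no information in phase 1 and becomes a conditioning variable in phase 2.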

Hypotheses

We first examined learning during the two experimental phases based on the behavioural responses. Participants had to indicate by a button press which Gabor patch orientation they predicted based on the cue(s). The first phase was designed to induce parameter learning; participants were therefore expected to gradually learn the probabilistic relationship between the cues and the Gabor patch orientation (target). Post-decision sensory evidence, in this case the target stimulus, should improve future predictions and hence increase accuracy values for the high-frequency trials. We thus expected that in the first phase, predictive accuracy would show a gradual increase over time, illustrated by an exponential curve.

At the beginning of the second phase, participants were instructed that something had changed during this phase. We expected that participants would discover the new rule by integrating the now meaningful tone into their predictive models, resulting in structure learning. This novel model parameter should account for observations that could not be predicted correctly before the parameter was added. After an initial decrease (relative to the final accuracy in the first phase) in predictive accuracy in the second phase, the addition of a new model parameter should lead to an abrupt increase in predictive accuracy (i.e., an “aha” moment), illustrated by sigmoidal curves in Fig 1C. Fig 1C (top row) illustrates the expected accuracy on predictions during the 2AFC task. The learning curves for the tone and no tone trials may have differed during the second experimental phase as compared with the first due to the change in contingency. Therefore, we investigated the tone and no-tone trials separately in the main analysis.

After exploring the learning dynamics in behavioural data, we investigated how learning in the two experimental phases was reflected in the pupil data. With learning, participants are expected to become better at making cue-target predictions. As a consequence of sensory evidence accumulating over trials, the amount of novel sensory evidence needed to update current beliefs will become smaller over time. We hypothesised that pupil responses signalling information gain [16] would decrease for the high-frequency trials as a result of learning the cue-target contingencies. More specifically, we hypothesised that in the first phase, pupil responses would decrease gradually and then plateau, while in the second phase, pupil responses would abruptly decrease until they plateau again as soon as the change in the cue-target contingency is learned (see Fig 1C).

Data acquisition and analyses

Data acquisition and pre-processing

Changes in pupil dilation were recorded using an SMI RED500 eye-tracker (SensoMotoric Instruments, Teltow/Berlin, Germany) with a sampling rate of 500 Hz. We analysed the pupil dilation data of the right eye for each participant. The timing of blinks and saccade events was not saved in the output of the eye-tracker; therefore, we did not attempt to categorize separate blink and saccade events for pre-processing purposes. Pre-processing was applied to the entire pupil dilation time series of each participant and consisted of: i) interpolation around missing samples (0.15 s before and after each missing period), ii) interpolation around blinks or saccade events based on spikes in the temporal derivative of the pupil time series (0.15 s before and after each blink or saccade period), iii) band-pass filtering (third-order Butterworth, passband: 0.01–6 Hz), iv) removing responses to nuisance events using multiple linear regression (missing periods and blink or saccade events were all categorized together as a single ‘nuisance’ event type; responses were estimated by deconvolution) [48], and v) the residuals of the nuisance regression were transformed to percent signal change with respect to the temporal mean of the time series.

For each trial, intervals corresponding to the onset of the cue and the onset of the target were extracted from each participant’s pupil dilation time series (cue-locked and target-locked, respectively). The cue-locked and target-locked intervals were baseline-corrected separately for each trial. The baseline pupil was defined as the mean pupil in the time window -0.5 to 0 seconds with respect to the cue or target for the cue-locked and target-locked intervals, respectively. The cue-locked pupil response was analysed for data quality purposes, while the target-locked pupil response was the main dependent variable of interest.

The temporal window of interest was independently defined as 1 to 2 seconds after the target onset based on the pupil’s canonical impulse response function ([48–51]). For each trial, a single value for the target-locked pupil response was computed as the mean pupil dilation within this temporal window of interest.
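The per-trial scalar described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes the pre-processed pupil trace is a plain list of samples at 500 Hz, that the target onset is given as a sample index, and that baseline correction is subtractive (the text does not say whether it was subtractive or divisive). The function name and arguments are hypothetical.

```python
def target_locked_response(pupil, target_idx, fs=500,
                           baseline=(-0.5, 0.0), window=(1.0, 2.0)):
    """Mean pupil size in `window` (s after target onset) minus the mean
    in `baseline` (s relative to target onset).

    pupil      : sequence of pre-processed pupil samples
    target_idx : sample index of target onset
    fs         : sampling rate in Hz (500 Hz in the text)
    """
    def mean(values):
        return sum(values) / len(values)

    # Convert the time windows (in seconds) to sample indices.
    b0 = target_idx + int(round(baseline[0] * fs))
    b1 = target_idx + int(round(baseline[1] * fs))
    w0 = target_idx + int(round(window[0] * fs))
    w1 = target_idx + int(round(window[1] * fs))
    if b0 < 0 or w1 > len(pupil):
        raise ValueError("trial interval extends beyond the recording")

    # Subtractive baseline correction (an assumption of this sketch).
    return mean(pupil[w0:w1]) - mean(pupil[b0:b1])
```

Applied to every trial, this yields one value per trial that can then be averaged within conditions or fitted as a function of trial number.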

Trials were excluded if the reaction time (RT) was more than three standard deviations above the participant’s mean RT or lower than 200 ms (the minimal time needed for the necessary encoding and preparation of a motor response [52–55]).
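The exclusion rule above amounts to a simple per-participant mask. The sketch below is illustrative only: it assumes RTs are in seconds and uses the sample standard deviation, which the text does not specify; `keep_trial_mask` is a hypothetical name.

```python
from statistics import mean, stdev

def keep_trial_mask(rts, min_rt=0.2, n_sd=3.0):
    """Return one boolean per trial: True if the trial is retained.

    A trial is dropped when its RT is below `min_rt` seconds or more
    than `n_sd` standard deviations above the participant's mean RT.
    The sample SD is used here; that choice is an assumption.
    """
    upper = mean(rts) + n_sd * stdev(rts)
    return [min_rt <= rt <= upper for rt in rts]
```

For instance, with twenty RTs of 0.5 s plus one of 0.1 s and one of 5.0 s, the two extreme trials are dropped and the remaining twenty are kept.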

Data quality checks

Behaviour. We expected that participants would learn the cue-target contingencies in both phases of the task, which would be reflected in higher accuracy and faster responses for high frequency trials as compared with low frequency trials. The effect of the visual cue-target mapping condition was expected to interact with the auditory cue condition and task phase, reflecting the reversal of the visual cue-target contingencies in the second phase during the tone trials only. These hypotheses were tested in two 3-way repeated measures ANOVAs, separately for accuracy (as percentage of correct trials) and RT with factors: cue-target mapping (M1 vs. M2), auditory cue (tone vs. no tone), and phase (first vs. second).

Target-locked pupil response time courses. The data acquisition quality was assessed with several analyses on the time courses of the pupil response to the cue and target presentation. The visual and auditory cues indicated to the participants that they should make a button press based on their prediction. A button press in the response phase was expected to evoke a motor-driven impulse response which should be reflected in the mean cue-locked pupil response ([39, 50, 56, 57]). In addition, we expected to see larger pupil dilation on average during tone trials as compared with no tone trials in the cue-locked pupil response, as auditory cues are known to be arousing ([58, 59]). The cue-locked effect of the tone was expected to return to baseline before the target was presented on screen. Finally, we expected to see larger pupil dilation on average during erroneous predictions as compared with correct predictions in the evoked target-locked pupil response ([39, 57, 60–65]).

Target-locked pupil response scalar averages. Using the scalar target-locked pupil response averages within our time window of interest, we expected to see an interaction between the cue-target mapping, auditory cue, and task phase in the target-locked pupil response. This was tested with a 3-way repeated measures ANOVA. In the first phase, the average pupil dilation for the M2 mapping was expected to be larger as compared with the M1 mapping, because low frequency trials (M2 in the first phase) should contain more errors overall. The direction of the mapping effect was expected to reverse in the second phase for tone trials only.

Main analyses

Psychometric curve fits on accuracy data. To test our hypothesis concerning the dynamics of the target-orientation predictions, we assessed whether the range parameter, σ, of the psychometric curve fits differed between the first and the second phase. In a psychometric curve, accuracy is plotted against signal intensity [66]. We explored the resulting psychometric curves when the number of trials completed over time was taken as a proxy for signal intensity. The sigmoid function used for the psychometric curve fits is given in Eq 1.

f(x) = a_0 + (1 - a_0) \int_{-\infty}^{x} \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{(t - \mu)^{2}}{2\sigma^{2}}} \, dt

(Eq 1)

We fit three parameters, μ, σ, and a0, to the individual participant’s response accuracy across all trials for the tone condition only (i.e., the frequency conditions were not differentiated), separately for the first and second phase of the experiment. We placed linear constraints on the curve fits so that σ could not exceed three times the value of μ. We bounded both μ and σ so they could not exceed the number of trials in each phase of the experiment (range between 1 and 200 trials). The starting point, a0, was bounded between 0 and 1. The parameters were determined by minimising the negative log-likelihood cost function.
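Because the integral in Eq 1 is the cumulative Gaussian, it can be evaluated with the error function. The sketch below shows the curve, a cost function, and the stated bounds. It is not the authors' implementation: the Bernoulli form of the likelihood and the clipping constant `eps` are assumptions, and all function names are hypothetical.

```python
from math import erf, log, sqrt

def psychometric(x, mu, sigma, a0):
    """Eq 1: a0 + (1 - a0) * Phi((x - mu) / sigma), Phi = Gaussian CDF."""
    phi = 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))
    return a0 + (1.0 - a0) * phi

def negative_log_likelihood(params, trials, correct, eps=1e-9):
    """Bernoulli negative log-likelihood of correct/incorrect responses
    (the Bernoulli form is an assumption of this sketch)."""
    mu, sigma, a0 = params
    nll = 0.0
    for x, c in zip(trials, correct):
        # Clip predicted accuracy away from 0 and 1 so log() is defined.
        p = min(max(psychometric(x, mu, sigma, a0), eps), 1.0 - eps)
        nll -= log(p) if c else log(1.0 - p)
    return nll

def satisfies_constraints(mu, sigma, a0, n_trials=200):
    """Bounds and linear constraint described in the text."""
    return (1 <= mu <= n_trials and 1 <= sigma <= n_trials
            and sigma <= 3 * mu and 0 <= a0 <= 1)
```

A constrained optimiser would then search for the (μ, σ, a0) that minimises `negative_log_likelihood` while `satisfies_constraints` holds; which optimiser was used is not stated in this section.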

If our hypotheses about the difference between parameter learning and structure learning are correct, then σ should be higher for the tone trials in the first phase as compared with the tone trials in the second phase.

Curve fits on target-locked pupil data. To test our main hypothesis concerning the target-locked pupil responses across the first and the second experimental phase, we assessed whether the time course of the target-locked pupil dilation showed a difference in the range parameter, σ, of the sigmoid curve fits. The target-locked pupil dilation is taken as a proxy of information gain within the two phases of the experiment. The logic is that sensory evidence for the cue-target contingencies is accumulated as the task progresses over time, and learning should be evident in a reduction in the amount of novel sensory evidence needed to update current beliefs as a function of trial order. The sigmoid function used for the curve fits on target-locked pupil responses is given in Eq 2.

f(x) = a_0 + G \int_{-\infty}^{x} \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{(t - \mu)^{2}}{2\sigma^{2}}} \, dt

(Eq 2)

We fit four parameters, μ, σ, a0 and G, of the above sigmoid function to the target-locked pupil dilation across the high-frequency trials (80%) for the tone condition only, comparing the first and second phases of the experiment. We separated the high-frequency from the low-frequency trials in order to fit only a single direction of the (expected) change in information gain ([39, 67]). The starting point of the curve is reflected in the a0 parameter. The inflection point of the curve is reflected in the μ parameter. The gain parameter, G, allowed for negative scaling of the curves given the nature of the pupil signal as dependent variable (i.e., percent signal change). The range parameter, σ, reflects the range over which the curve changes. A larger σ parameter is associated with a larger range over which the transition (i.e., from f(x) = a0 to f(x) = a0 + G) takes place. We placed linear constraints on the curve fits so that σ could not exceed three times the value of μ. We bounded both μ and σ so they could not exceed the number of trials in each phase of the experiment (range between 1 and 200 trials). The parameters were otherwise neither constrained nor bounded. The parameters were determined by minimising the ordinary least squares cost function.
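To make the roles of G and σ in Eq 2 concrete, the sketch below evaluates the curve, the least-squares cost, and the number of trials over which most of the transition happens. It is an illustration under assumptions, not the authors' code: the 10% to 90% definition of transition width is chosen here only for demonstration, and the function names are hypothetical.

```python
from statistics import NormalDist

def pupil_curve(x, mu, sigma, a0, gain):
    """Eq 2: a0 + G * Phi((x - mu) / sigma); G may be negative."""
    return a0 + gain * NormalDist(mu, sigma).cdf(x)

def sum_of_squares(params, trials, responses):
    """Ordinary least squares cost for one set of parameters."""
    mu, sigma, a0, gain = params
    return sum((r - pupil_curve(x, mu, sigma, a0, gain)) ** 2
               for x, r in zip(trials, responses))

def transition_width(sigma, lo=0.1, hi=0.9):
    """Number of trials over which the curve covers the `lo` to `hi`
    fraction of its total change (an illustrative definition)."""
    unit = NormalDist()
    return sigma * (unit.inv_cdf(hi) - unit.inv_cdf(lo))
```

Under this definition the width is about 2.56 σ, so σ = 10 corresponds to a transition spanning roughly 26 trials and σ = 40 to roughly 103 trials. With a negative G, a larger σ therefore describes a more gradual decrease of the target-locked pupil response.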

Our main hypothesis was that learning would extend across more trials in the first phase than in the second phase, reflecting the difference between parameter learning and structure learning. Therefore, we expected that σ would be larger for the high-frequency tone trials in the first phase as compared with the high-frequency tone trials in the second phase (note that the cue-target mappings were flipped for tone trials in the second phase; see Fig 1B). We furthermore expected the sign of G to be negative in both phases, indicating a decreasing trend. A larger value of σ together with a negative gain parameter, G, thus reflects a more gradual reduction of target-locked pupil responses across trials. This analysis enables us to examine whether pupil dilation depended on the experimental phase and whether it followed the pattern hypothesised in Fig 1C. Descriptive statistics for all free parameters are presented in S2 Table in S1 File.

Software

The prediction task was administered with PsychoPy [68]. The behavioural and pupil data were processed with custom software using Python [69]. The evoked pupil responses were statistically assessed with a cluster-level permutation test as part of the MNE-Python package [70]. Repeated measures ANOVAs and Bayesian tests were carried out in JASP [71]. All data and code are publicly available (https://doi.org/10.34973/t41p-hx94).

Results

The current study aimed to compare predictive accuracy over trials during parameter learning and structure learning. Structure learning, unlike parameter learning, changes the structure of a generative model by altering causal connections between parameters or by adding and removing parameters in a model ([2, 5, 15, 17, 21]). Participants performed a 2AFC task on the expected orientation (left vs. right) of an upcoming Gabor patch (target) while pupil dilation was recorded (Fig 1A). The participants had to learn cue-target contingencies in order to accurately predict the orientation of the target. Cue-target contingencies changed at the start of the second phase in the following way: the cue-target mapping was reversed but only for trials containing the auditory tone cue. The target served as feedback on the accuracy of participants’ predictions in the decision interval.

Main effects and interactions in the cue-target prediction task

We first evaluated whether participants performed the 2AFC task as expected. Main effects and interactions between the visual cue-target mapping condition (M1 vs. M2), task phase (first vs. second), and the presence of the auditory cue (tone vs. no tone) were assessed in three independent 3-way repeated measures ANOVAs on the dependent variables of mean accuracy (Fig 2A), RT (Fig 2B), and target-locked pupil dilation (Fig 2C) (see Fig 2 for the analysis of evoked pupil responses). The ANOVA results are presented in Table 1, and relevant post hoc comparisons are presented in Fig 2. At the mean group level, a significant 3-way interaction was obtained between visual cue-target mapping condition, auditory cue condition, and task phase for accuracy and target-locked pupil response (but not for RT). Participants accurately predicted the cue-target contingencies in both phases of the experiment, illustrated by the main effect of visual cue-target mapping condition in the first phase and the mapping reversal in the second phase for tone trials only (Fig 2A). As expected, the target-locked pupil response was larger for the M2 trials as compared with the M1 trials during the first phase, and the presence of the auditory cue reversed the direction of the target-locked pupil response in the second phase for the tone trials (Fig 2C). S1 Fig in S1 File illustrates how the behaviour and target-locked pupil dilation changed as a function of time across 16 bins of trials (25 trials per bin). We confirmed that the target-locked pupil responses were “mirroring” the learning trajectory obtained in the accuracy of the behavioural responses. This correspondence between learning and pupil dilation was indicated by the presence of a negative monotonic relationship between the accuracy of predictions and the target-locked pupil response across these 16 trial bins for the tone trials (see Fig 1C for hypotheses, S1 File, and S2 Fig in S1 File). 
Finally, we explored whether the target-locked pupil response, on average, differentiated between error and correct responses for each of the frequency conditions and experimental phases. The target-locked pupil response was sensitive to both predictive accuracy and cue-target frequency, but these factors did not interact (see S3 Fig and S1 Table in S1 File).
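
The bin-wise comparison between accuracy and target-locked pupil response described above can be illustrated with a short sketch. It assumes that trial-level values are averaged in consecutive bins of 25 trials and that the monotonic relationship is quantified with a Spearman rank correlation; the function names are hypothetical and the exact procedure used for S2 Fig may differ.

```python
from statistics import mean

def bin_means(values, bin_size=25):
    """Average consecutive trials in non-overlapping bins (a trailing partial bin is dropped)."""
    n_bins = len(values) // bin_size
    return [mean(values[i * bin_size:(i + 1) * bin_size]) for i in range(n_bins)]

def ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            result[order[k]] = average_rank
        start = end + 1
    return result

def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Synthetic illustration only: accuracy rises while the pupil response falls.
accuracy = [i / 400 for i in range(400)]
pupil = [1.0 - a for a in accuracy]
rho = spearman_rho(bin_means(accuracy), bin_means(pupil))
```

With 400 synthetic trials this yields 16 bins, and a negative rho corresponds to the "mirroring" pattern: the pupil response shrinks as accuracy grows.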

Fig 2 — (A) Prediction accuracy, (B) mean RT, and (C) target-locked pupil dilation as a function of visual cue-target mapping condition (M1 vs. M2), the presence of the auditory cue (tone vs. no tone), and task phase (first vs. second). Results of the 3-way repeated measures ANOVAs are given in Table 1. Significance refers to post hoc t-tests: **p < .01, ***p < .001. Error bars, s.e.m. (N = 30). Note that the frequencies of the visual cue-target mappings change in the second phase for the tone trials.

Table 1. Results of the 3-way repeated measures ANOVAs on accuracy, RT, and target-locked pupil response.

Factors of interest were cue-target mapping condition (levels: M1 vs. M2), auditory cue (levels: tone vs. no tone), and task phase (levels: first vs. second). Accuracy data were percentage of correct trials; RT was analysed in seconds. *p < .05, **p < .01, ***p < .001.

	Accuracy			RT			Pupil response
Effect	F(1,29)	p	η²_G	F(1,29)	p	η²_G	F(1,29)	p	η²_G
Mapping	301.22	< .001***	0.68	< .01	0.964	< .01	21.17	< .001***	0.05
Auditory cue	6.69	0.015*	< .01	4.91	0.035*	0.01	3.67	0.065	0.01
Phase	0.01	0.939	< .01	0.14	0.711	< .01	0.69	0.414	< .01
Mapping * Auditory cue	226.34	< .001***	0.65	0.16	0.694	< .01	6.79	0.014*	0.01
Mapping * Phase	201.03	< .001***	0.61	0.39	0.535	< .01	20.10	< .001***	0.04
Auditory cue * Phase	0.40	0.533	< .01	3.24	0.082	< .01	0.57	0.458	< .01
Mapping * Auditory cue * Phase	205.49	< .001***	0.64	3.52	0.071	< .01	9.71	0.004**	0.02
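
Because every factor in Table 1 has two levels, each effect has a single numerator degree of freedom. In a fully within-subject design, such an effect is equivalent to a one-sample t-test on a per-participant contrast score, with F(1, N - 1) = t². The sketch below illustrates this for the 3-way interaction; the data layout and function names are hypothetical, and it is not the analysis script used to produce Table 1.

```python
from math import sqrt
from statistics import mean, stdev

def three_way_contrast(cells):
    """Interaction contrast for one participant.

    `cells` maps (mapping, cue, phase) to that participant's cell mean, with
    mapping in {"M1", "M2"}, cue in {"tone", "no tone"}, phase in {"first", "second"}.
    """
    total = 0.0
    for (mapping, cue, phase), value in cells.items():
        sign = 1
        sign *= 1 if mapping == "M1" else -1
        sign *= 1 if cue == "tone" else -1
        sign *= 1 if phase == "first" else -1
        total += sign * value
    return total

def contrast_f(scores):
    """F statistic and degrees of freedom for testing contrast scores against zero."""
    n = len(scores)
    t = mean(scores) / (stdev(scores) / sqrt(n))
    return t * t, (1, n - 1)
```

With one contrast score per participant and N = 30, `contrast_f` returns degrees of freedom (1, 29), matching the F(1,29) columns of Table 1.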


The quality of the pupil dilation measures was assessed with several checks before testing our hypotheses about the dynamics of target-locked pupil responses across the experimental phases. First, evoked pupil dilation was present in response to the (visual and auditory) cue onsets as expected in the decision phase, here reflecting both decision preparation and the upcoming motor output in the form of a button press (Fig 3A). Furthermore, the temporal window (1 to 2 s) independently chosen for the target-locked pupil analysis contained the peak of the group-level cue-locked evoked response (Fig 3A, grey box). Second, as expected, errors resulted in larger pupil dilation as compared with correct trials following the target presentation, and this accuracy effect was significant within the temporal window of interest (Fig 3B, grey box). Third, the presence of the auditory cue during the decision interval (cue-locked) was associated with larger pupil dilation as compared with the absence of the tone (Fig 3C), likely reflecting a difference in phasic arousal state during tone trials. Importantly, the (unwanted) arousal effect related to the auditory tone was no longer present by the time the target was presented to the participants (Fig 3D). We note that all further pupil analyses used the target-locked pupil dilation, averaged within the temporal window of interest (Fig 3B and 3D, grey boxes), as the dependent variable. In sum, the pupil data passed all the data quality checks.

Fig 3 — All trials within the first and second phase of the prediction task were included in the evoked pupil response analysis. (A) Mean cue-locked pupil responses in the prediction interval. Black bar indicates main effect of cue, p < 0.05 (cluster-based permutation test). (B) Evoked pupil responses for correct and error trials in the feedback interval (target-locked). Black bar indicates correct vs. error effect, p < 0.05 (cluster-based permutation test). (C) Evoked pupil responses for tone and no tone trials in the prediction interval (cue-locked) and (D) in the feedback interval (target-locked). Black bar indicates tone vs. no tone effect, p < 0.05 (cluster-based permutation test). In all panels: variability around the mean responses is illustrated as the 68% confidence interval (bootstrapped; N = 30); the grey box indicates the temporal window of interest (1–2 s) with respect to event onset for the target-locked pupil responses. Note that the temporal window of interest was independently defined based on the pupil’s canonical impulse response function.
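
As a minimal sketch of how a single-trial dependent variable can be obtained from the temporal window of interest (1 to 2 s after event onset), the function below averages an event-locked trace within that window. The sampling rate, the assumption that the trace starts at event onset, and the function name are illustrative and are not taken from the authors' pipeline; preprocessing steps such as baseline correction are not shown.

```python
from statistics import mean

def window_mean(trace, sampling_rate_hz, start_s=1.0, end_s=2.0):
    """Mean of an event-locked trace between start_s (inclusive) and end_s (exclusive).

    trace[0] is assumed to be the sample recorded at event onset.
    """
    first = int(round(start_s * sampling_rate_hz))
    last = int(round(end_s * sampling_rate_hz))
    if last > len(trace):
        raise ValueError("trace is shorter than the requested window")
    return mean(trace[first:last])

# Synthetic 3-second trace sampled at a hypothetical 10 Hz.
example_trace = [0.0] * 10 + [2.0] * 10 + [5.0] * 10
example_value = window_mean(example_trace, sampling_rate_hz=10)
```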

Psychometric curve fits on accuracy data

For the accuracy data, psychometric curves were fit to each participant’s response accuracy for the tone trials only, separately per phase. Individual curve fits are shown in S4 Fig in S1 File. Descriptive statistics for all free parameters are presented in S2 Table in S1 File. The null hypothesis stated that there would be no difference in the σ parameters between phases, and our alternative hypothesis was that there would be a difference in the σ parameters. In particular, we expected the σ parameter to be larger for the tone trials in the first phase as compared with the tone trials in the second phase (i.e., the cue-target mappings were flipped for tone trials only in the second phase). For the trials without an auditory cue, we did not have expectations about the difference in σ between the two experimental phases of the task. Therefore, we tested only tone trials for phase-dependent differences (first vs. second) of the mean σ parameter (Fig 4A). To examine whether the σ parameters differed, we used a Bayesian Wilcoxon Signed-Rank Test (Fig 4A, right column). The Bayes factor indicated evidence for the alternative hypothesis (BF₁₀ = 17300). This means that the data were approximately 17300 times more likely to occur under the alternative hypothesis (that there would be a difference in the σ parameters between phases) than under the null hypothesis. However, the difference between the σ parameters was opposite to the expected direction, since σ in the second phase was larger than in the first phase.
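
To make the role of σ concrete, the sketch below uses a cumulative-Gaussian learning curve with midpoint μ, slope parameter σ, gain G and a baseline. This functional form is an assumption made for illustration; the exact function fitted in this study is specified in the Methods and may differ. Under this form, a larger σ means that the transition from the initial to the final level is spread over more trials, that is, more gradual learning.

```python
from statistics import NormalDist

def sigmoid(trial, mu, sigma, gain=1.0, baseline=0.0):
    """Cumulative-Gaussian learning curve (assumed form, for illustration only)."""
    return baseline + gain * NormalDist(mu, sigma).cdf(trial)

def transition_width(sigma, lower=0.1, upper=0.9):
    """Number of trials needed to move from `lower` to `upper` of the total gain."""
    z = NormalDist()
    return sigma * (z.inv_cdf(upper) - z.inv_cdf(lower))

# Doubling sigma doubles the number of trials spent in the transition.
narrow = transition_width(10)
wide = transition_width(20)
```

In this parameterisation, the 10% to 90% transition spans roughly 2.56 σ trials, so σ can be read directly as a measure of how abrupt or gradual the change is.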

Fig 4 — For each participant, sigmoid curves were fit to the tone trials (i.e., trials with an auditory cue) and compared between the first and the second phase of the experiment. (A) For accuracy data, all tone trials (i.e., the high- and low-frequency conditions) were used to fit the curves. (B) For the target-locked pupil response data, curves were fit to the high-frequency (80%) tone trials only. Note that the cue-target mappings (M1 and M2) which correspond to the high-frequency trials differ per phase depending on the presence of the auditory cue. Results of the Bayesian Wilcoxon Signed-Rank Test are shown for the accuracy and pupil data (right column). Error bars, s.e.m. (N = 30).

Psychometric curve fits on target-locked pupil data

Finally, we tested our main hypothesis concerning the target-locked pupil responses across the two experimental phases. Sigmoid curves were fit to each participant’s target-locked pupil response for the tone trials in the high-frequency condition only, separately per phase. Individual curve fits are shown in S5 Fig in S1 File. Descriptive statistics for all free parameters are presented in S2 Table in S1 File. As with behavioural responses, our hypothesis was that σ would be larger for the high-frequency tone trials in the first phase as compared with the high-frequency tone trials in the second phase (i.e., the cue-target mappings were flipped for tone trials only in the second phase). For the trials without an auditory cue, we did not have expectations about the difference in σ between the phases of the task. Therefore, we tested only tone trials for phase-dependent differences (first vs. second) of the mean σ parameter (Fig 4B). To examine whether the σ parameters differed, we used a Bayesian Wilcoxon Signed-Rank Test (Fig 4B, right column). The Bayes factor indicated evidence for the alternative hypothesis (BF₁₀ = 930). This means that the data were approximately 930 times more likely to occur under the alternative hypothesis (that there is a difference between the σ parameters) than under the null hypothesis. However, as in the accuracy data, the difference between the σ parameters was opposite to the expected direction, since σ in the second phase was larger than in the first phase.
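
A Bayes factor is a ratio of how well two hypotheses predict the data, and it can be turned into a posterior probability once prior odds are chosen. The sketch below assumes equal prior odds for the alternative and null hypotheses, which is an illustrative choice and not a claim about the priors used in the reported test; the function name is hypothetical.

```python
def posterior_h1(bf10, prior_odds=1.0):
    """Posterior probability of H1 given BF10 and the prior odds p(H1) / p(H0)."""
    posterior_odds = bf10 * prior_odds
    return posterior_odds / (1.0 + posterior_odds)

# With equal prior odds, BF10 = 930 gives a posterior probability of 930 / 931.
p_pupil = posterior_h1(930)
```

Under equal prior odds, BF₁₀ = 930 corresponds to a posterior probability of about 0.9989 for the alternative hypothesis.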

We also expected that the target-locked pupil responses would decrease as a result of learning, reflected in a negative gain parameter (G) in our curve fits for both phases. In the first phase, G was negative on average, as expected (M = -4.1, SD = 11.94). However, G was positive on average in the second phase (M = 2.4, SD = 9.3), contrary to our hypothesis that the target-locked pupil responses would decrease in both phases of the experiment. Furthermore, it was apparent that the sign of G was not consistent at the individual level (see S5 Fig in S1 File).

Discussion

Our research was motivated by the observation that the two kinds of learning within predictive processing, whilst recently formally differentiated, have not been empirically distinguished. Parameter learning, on the one hand, refers to updating the probability distribution over model parameters in light of new evidence using Bayes’ rule ([4, 15]). Structure learning, on the other hand, pertains to altering the structure of the generative model by changing the number of parameters in the generative model or by altering their functional dependencies ([5, 15, 17]). Related proposals have been put forward by Kwisthout and colleagues [2] and Rutar and colleagues [21], who developed a formal proposal for structural changes that go beyond parameter addition and removal in a generative model. Similarly, Heald and colleagues [72] have recently presented a theory of sensorimotor learning, called contextual inference, that differentiates between the adaptation of behaviour based on the updating of existing and the creation of new motor memories, and adaptation due to changes in the relative weighting of these motor memories.
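
As a minimal illustration of parameter learning in the above sense, the sketch below updates a Beta distribution over a single probability (for example, the probability that a given target follows a given cue) after each binary observation. The Beta-Bernoulli model, the uniform prior and the function names are illustrative assumptions and are not the generative model proposed or fitted in this study.

```python
def update_beta(alpha, beta, outcome):
    """One Bayesian update of a Beta(alpha, beta) belief after a binary outcome."""
    return (alpha + 1, beta) if outcome else (alpha, beta + 1)

def posterior_mean(alpha, beta):
    """Expected value of the probability under a Beta(alpha, beta) belief."""
    return alpha / (alpha + beta)

def learn(outcomes, alpha=1.0, beta=1.0):
    """Posterior mean after each trial, starting from a Beta(alpha, beta) prior."""
    trajectory = []
    for outcome in outcomes:
        alpha, beta = update_beta(alpha, beta, outcome)
        trajectory.append(posterior_mean(alpha, beta))
    return trajectory

# Eight trials in which the expected target follows the cue, then two in which it does not.
beliefs = learn([1] * 8 + [0] * 2)
```

The structure of the model stays fixed throughout: only the distribution over its one parameter changes, and it does so gradually with each observation.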

Building on this, we investigated whether we could empirically distinguish between parameter learning and structure learning. We expected to be able to differentiate between the two mechanisms based on their learning trajectories as measured in accuracy of participants’ predictions. We furthermore hypothesized that the different learning trajectories would be reflected in target-locked pupil dilation given its potential for signalling information gain [16]. To investigate our question, we created a within-subject computer-based experiment with two phases. In the first phase, participants had to learn and predict the probabilistic relationship between the cues and a target stimulus, and in the second phase, a conditional change occurred in the cue-target relationship learnt previously.

To test whether participants performed the task as expected, we performed some basic quality controls on behavioural and pupil data. Behavioural data showed that participants on average correctly predicted the cue-target contingencies in both experimental phases. This was reflected in the cue-target mapping effect for tone and no tone trials in the first phase, and in a reversal of this effect in the second phase for tone trials only (when the mapping switched). As expected, a similar pattern was also observed in pupil data.

Before turning to the main research question, the quality of the pupil measures was checked and compared with effects observed in previous literature. Decision preparation and the preparation of the motor response were reflected in the evoked pupil response to the visual and auditory cue onsets replicating previous work ([39, 50, 56, 57]). We also observed the peak of the group-level cue-locked evoked response around 1–2 seconds, which is in line with previous findings ([48, 49–51]). Furthermore, erroneous responses following the presentation of explicit feedback resulted in larger pupil dilation as compared to correct responses, an effect that has also been consistently reported ([39, 57, 60–65]). Finally, we found a well-known auditory effect on pupil responses with pupil responses being bigger on trials with a tone compared to trials without a tone ([58, 59]).

Given that participants understood and correctly performed the task, we turn to the results related to our hypotheses. We predicted that in the first and the second experimental phase, different temporal dynamics in predictive accuracy would be observed. We hypothesised that in the first phase, participants would be gradually learning the probabilistic relationship between the cues and the target orientation, leading to parameter learning. As a result of parameter learning, participants should become better at predicting future sensory input, resulting in a gradual increase in predictive accuracy over time. In the second phase the rules switched for the tone trials, which should initially lead to a decrease in predictive accuracy. When the change in rules is learned, a new parameter is added to a learner’s model, resulting in structure learning. An integration of a new parameter should lead to an abrupt increase in predictive accuracy (i.e., an “aha” moment).
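
The idea that learning the rule change amounts to adding a parameter can be illustrated with a simple Bayesian model comparison. The sketch below compares a model with one shared outcome probability for all trials against a model with a separate probability for tone and no tone trials, using a uniform Beta(1, 1) prior on each probability. The models, the prior, the counts and the function names are hypothetical; this is not an analysis performed in this study.

```python
from math import lgamma

def log_marginal(successes, trials):
    """Log marginal likelihood of a binary sequence under a uniform Beta(1, 1) prior."""
    failures = trials - successes
    return lgamma(successes + 1) + lgamma(failures + 1) - lgamma(trials + 2)

def log_bf_extra_parameter(tone, no_tone):
    """Log Bayes factor: separate parameter per auditory condition vs. one shared parameter.

    `tone` and `no_tone` are (successes, trials) pairs, where a success is a trial
    on which one particular target followed the cue.
    """
    separate = log_marginal(*tone) + log_marginal(*no_tone)
    shared = log_marginal(tone[0] + no_tone[0], tone[1] + no_tone[1])
    return separate - shared

# Hypothetical counts: the mapping is flipped on tone trials.
flipped = log_bf_extra_parameter(tone=(2, 20), no_tone=(18, 20))
# Hypothetical counts: the tone makes no difference.
unchanged = log_bf_extra_parameter(tone=(10, 20), no_tone=(10, 20))
```

When the tone flips the mapping, the log Bayes factor is positive and favours the model with the additional parameter; when the tone makes no difference, it is negative and the simpler model is preferred.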

Finally, to assess the main hypothesis, that parameter learning and structure learning are empirically differentiable, we performed curve fitting first on the behavioural and then on the pupil data. Curve fitting revealed strong evidence for a difference between the phases of the task in both accuracy and pupil data, as reflected in the σ parameter. However, the difference was opposite to the expected direction: σ was smaller in the first phase than in the second phase, in both accuracy and pupil data. These results suggest that the participants were learning more gradually in the second phase compared to the first phase of the experiment, contrary to our expectations.

One possible interpretation of these results is that our experimental manipulation induced structure learning in the first phase and parameter learning in the second phase. It might have been that in the first phase participants built, from scratch, multiple internal models that they thought could capture the structure of the task. This idea is reminiscent of Pouncy and Gershman’s work [73], in which participants consider several models or competing theories at each point in time. As participants were learning our task, they alternated between these models and, upon realising the rule, settled on the correct model, resulting in a rapid increase in predictive accuracy and a decrease in the target-locked pupil responses, as the data in the first phase show. We assumed that we had prevented participants from learning models from scratch in the first phase by providing them with detailed instructions and a pictorial representation of the stimuli presented in the task before the task started. In doing so, we thought, we equipped participants with a crude model that would contain hypotheses about all the relevant variables of the task. However, in light of the current results, we believe that our instructions did not result in the construction of a simple model that participants could use as a baseline upon entering the task. Importantly, whilst the above interpretation of the results is in principle plausible, further empirical investigation is needed to confirm that participants were indeed building multiple models in the first phase and then, in the second phase, selected a model (that they had already constructed in the first phase) and started updating its parameters.

At the beginning of the second phase, participants expected the rules of the task to change. All the experimental variables (e.g., target, visual cues) in the second phase were the same as in the first phase, possibly signalling to the participants that one of the models that they had already constructed in the first phase could be suitable for explaining the change in the second phase. If this was the case, then participants in the second phase merely reused a correct, existing model, and started gradually updating parameters of that model rather than adding a new parameter (based on a new experimental variable) to a model.

We could possibly have avoided participants building models from scratch in the first phase had we run a computerised familiarisation phase with all the relevant experimental variables prior to the experiment. This would have ensured that participants had constructed crude models of the task before the experiment started, which they could then use in the first phase of the experiment. Additionally, to make sure that participants in the second phase were not just reusing one of the models they had constructed in the first phase but were instead building on an existing model, we could have introduced a new experimental variable in the second phase that was present in the familiarisation but not in the first phase. In that case, participants would have had to add a new parameter (instantiating structure learning) to the model constructed in the first phase if they were to successfully learn the new rule in the second phase.

Our results also revealed that the gain parameter, G, which indicates the direction of the σ parameter, was negative in the first phase as expected and positive in the second phase contrary to our expectations. This suggests that pupil dilation, for the high-frequency condition, was on average decreasing in the first phase and increasing in the second phase. These results are unexpected if the target-locked pupil dilation reflects novel information gain (see Fig 1C). The gain in information following the outcome of an event should decrease as a result of increasing accuracy for predicting the contingent relationships ([1, 4, 14–16]). When trials were binned across both experimental phases, the target-locked pupil responses mirrored the participants’ accuracy such that when participants had a larger difference between cue-target frequency conditions in accuracy, they also tended to have a smaller difference between cue-target frequency conditions in the target-locked pupil responses (see S2 Fig in S1 File). These results are generally in line with the assumption that the presentation of the target stimulus became less informative as the participants learned to predict the cue-target contingencies. However, a look at the individual participants’ pupil responses at the single-trial level for the high-frequency condition only (see S5 Fig in S1 File) reveals large variability in the sign of the σ parameter for both phases, potentially suggesting individual differences in the size of pupil dilation over time. This suggestion is in line with recent findings of substantial inter- and intra-individual variation in the size of pupil dilation over trials [74]. More specifically, the study shows that in a simple digit-span memory task pupil dilation was consistently increasing over trials for some participants and for others it was decreasing. There were also participants for whom the trend was changing throughout the task.
Another factor that could explain why pupil dilation was increasing in the second phase is that participants were more fatigued in this phase than in the first phase at the beginning of the experiment. As a consequence, they had to exert more effort to maintain concentration on the task and process the change in the task rules. Increased cognitive effort would result in increasing pupil dilation, as has been shown many times before ([75–77]).

All in all, our data show that there exists a qualitative difference between parameter learning and structure learning, following the theoretical proposal ([2, 5, 15, 17, 21]). However, parameter learning seems to have occurred in the second phase and structure learning in the first phase of the experiment, for the reasons described above. Future studies should therefore make sure to use experimental manipulations that have the initially intended effect. Alternatively, future studies might investigate empirically the interpretation of the current results: that participants construct multiple models at the beginning and later choose among them and update their parameters. Lastly, our study is one of the few that studied how target-locked pupil responses change over time due to learning, on a trial-to-trial basis, with some exceptions ([41, 42, 47]). Therefore, little is known about how pupil dynamics change over extended periods of time and whether individual differences exist in this process. More studies should thus examine pupil dilation in such a manner in the future.

Supporting information

S1 File

(DOCX)


Data Availability

Raw and pre-processed data and all analysis scripts can be found here: https://doi.org/10.34973/t41p-hx94.

Funding Statement

DR was supported by the Donders Centre of Cognition Grant (Understanding predictive processing in development: Modelling the generation of generative models) awarded to JK and SH. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

1. Clark A. (2015). Surfing uncertainty: Prediction, action, and the embodied mind. Oxford University Press.
2. Kwisthout J., Bekkering H., & Van Rooij I. (2017). To be precise, the details don’t matter: On predictive processing, precision, and level of detail of predictions. Brain and Cognition, 112, 84–91. doi: 10.1016/j.bandc.2016.02.008
3. Clark A. (2013). Whatever next? Predictive brains, situated agents, and the future of cognitive science. Behavioral and Brain Sciences, 36(3), 181–204. doi: 10.1017/S0140525X12000477
4. Friston K., FitzGerald T., Rigoli F., Schwartenbeck P., & Pezzulo G. (2016). Active inference and learning. Neuroscience & Biobehavioral Reviews, 68, 862–879.
5. Friston K., Lin M., Frith C. D., Pezzulo G., Hobson J. A., & Ondobaka S. (2017). Active inference, curiosity and insight. Neural Computation, 29(10), 2633–2683. doi: 10.1162/neco_a_00999
6. Friston K., & Kiebel S. (2009). Predictive coding under the free-energy principle. Philosophical Transactions of the Royal Society B: Biological Sciences, 364(1521), 1211–1221. doi: 10.1098/rstb.2008.0300
7. Hohwy J. (2013). The predictive mind. OUP Oxford.
8. Edwards G., Vetter P., McGruer F., Petro L. S., & Muckli L. (2017). Predictive feedback to V1 dynamically updates with sensory input. Scientific Reports, 7(1), 1–12.
9. Petro L. S., & Muckli L. (2016). The brain’s predictive prowess revealed in primary visual cortex. Proceedings of the National Academy of Sciences, 113(5), 1124–1125. doi: 10.1073/pnas.1523834113
10. Rao R. P. N., & Ballard D. H. (1999). Predictive coding in the visual cortex: A functional interpretation of some extra-classical receptive-field effects. Nature Neuroscience, 2(1), 79–87. doi: 10.1038/4580
11. Friston K., Daunizeau J., Kilner J., & Kiebel S. J. (2010). Action and behavior: A free-energy formulation. Biological Cybernetics, 102(3), 227–260. doi: 10.1007/s00422-010-0364-z
12. Kilner J. M., Friston K. J., & Frith C. D. (2007). Predictive coding: An account of the mirror neuron system. Cognitive Processing, 8(3), 159–166. doi: 10.1007/s10339-007-0170-2
13. Koster-Hale J., & Saxe R. (2013). Theory of mind: A neural prediction problem. Neuron, 79(5), 836–848. doi: 10.1016/j.neuron.2013.08.020
14. FitzGerald T. H., Dolan R. J., & Friston K. (2015). Dopamine, reward learning, and active inference. Frontiers in Computational Neuroscience, 136. doi: 10.3389/fncom.2015.00136
15. Smith R., Schwartenbeck P., Parr T., & Friston K. J. (2020). An Active Inference Approach to Modeling Structure Learning: Concept Learning as an Example Case. Frontiers in Computational Neuroscience, 14. doi: 10.3389/fncom.2020.00041
16. Zénon A. (2019). Eye pupil signals information gain. Proceedings of the Royal Society B: Biological Sciences, 286(1911), 20191593. doi: 10.1098/rspb.2019.1593
17. Da Costa L., Parr T., Sajid N., Veselic S., Neacsu V., & Friston K. (2020). Active inference on discrete state-spaces: A synthesis. Journal of Mathematical Psychology, 99, 102447. doi: 10.1016/j.jmp.2020.102447
18. Christie S., & Gentner D. (2010). Where hypotheses come from: Learning new relations by structural alignment. Journal of Cognition and Development, 11(3), 356–373.
19. Gentner D., & Hoyos C. (2017). Analogy and abstraction. Topics in Cognitive Science, 9(3), 672–693. doi: 10.1111/tops.12278
20. Schulz L. (2012). Chapter Ten—Finding New Facts; Thinking New Thoughts. In Xu F. & Kushnir T. (Eds.), Advances in Child Development and Behavior (Vol. 43, pp. 269–294). JAI. doi: 10.1016/B978-0-12-397919-3.00010-1
21. Rutar D., de Wolff E., van Rooij I., & Kwisthout J. (2022). Structure Learning in Predictive Processing Needs Revision. Computational Brain & Behavior, 5(2), 234–243. doi: 10.1007/s42113-022-00131-8
22. Friston K., Parr T., & Zeidman P. (2018). Bayesian model reduction. arXiv preprint arXiv:1805.07092. doi: 10.48550/arXiv.1805.07092
23. Gershman S. J., & Blei D. M. (2012). A tutorial on Bayesian nonparametric models. Journal of Mathematical Psychology, 56(1), 1–12.
24. Goldwater S. J. (2007). Nonparametric Bayesian Models of Lexical Acquisition. Brown University.
25. Aston-Jones G., & Cohen J. D. (2005). An integrative theory of locus coeruleus-norepinephrine function: Adaptive gain and optimal performance. Annual Review of Neuroscience, 28, 403–450. doi: 10.1146/annurev.neuro.28.061604.135709
26. Joshi S., & Gold J. I. (2020). Pupil Size as a Window on Neural Substrates of Cognition. Trends in Cognitive Sciences, 24(6), 466–480. doi: 10.1016/j.tics.2020.03.005
27. Larsen R. S., & Waters J. (2018). Neuromodulatory correlates of pupil dilation. Frontiers in Neural Circuits, 12, 21. doi: 10.3389/fncir.2018.00021
28. McGinley M. J., Vinck M., Reimer J., Batista-Brito R., Zagha E., Cadwell C. R., et al. (2015). Waking state: Rapid variations modulate neural and behavioral responses. Neuron, 87(6), 1143–1161. doi: 10.1016/j.neuron.2015.09.012
29. Murphy P. R., O’Connell R. G., O’Sullivan M., Robertson I. H., & Balsters J. H. (2014). Pupil diameter covaries with BOLD activity in human locus coeruleus. Human Brain Mapping, 35(8), 4140–4154. doi: 10.1002/hbm.22466
30. Bouret S., & Sara S. J. (2005). Network reset: A simplified overarching theory of locus coeruleus noradrenaline function. Trends in Neurosciences, 28(11), 574–582. doi: 10.1016/j.tins.2005.09.002
31. Doya K. (2008). Modulators of decision making. Nature Neuroscience, 11(4), 410–416. doi: 10.1038/nn2077
32. Glimcher P. W. (2011). Understanding dopamine and reinforcement learning: The dopamine reward prediction error hypothesis. Proceedings of the National Academy of Sciences, 108(Supplement 3), 15647. doi: 10.1073/pnas.1014269108
33. Lak A., Nomoto K., Keramati M., Sakagami M., & Kepecs A. (2017). Midbrain Dopamine Neurons Signal Belief in Choice Accuracy during a Perceptual Decision. Current Biology, 27(6), 821–832. doi: 10.1016/j.cub.2017.02.026
34. Montague P. R., Hyman S. E., & Cohen J. D. (2004). Computational roles for dopamine in behavioural control. Nature, 431, 760. doi: 10.1038/nature03015
35. Parikh V., Kozak R., Martinez V., & Sarter M. (2007). Prefrontal Acetylcholine Release Controls Cue Detection on Multiple Timescales. Neuron, 56(1), 141–154. doi: 10.1016/j.neuron.2007.08.025
36. Schultz W. (2005). Behavioral Theories and the Neurophysiology of Reward. Annual Review of Psychology, 57(1), 87–115. doi: 10.1146/annurev.psych.56.091103.070229
37. Yu A. J., & Dayan P. (2005). Uncertainty, Neuromodulation, and Attention. Neuron, 46(4), 681–692. doi: 10.1016/j.neuron.2005.04.026
38. Browning M., Behrens T. E., Jocham G., O’reilly J. X., & Bishop S. J. (2015). Anxious individuals have difficulty learning the causal statistics of aversive environments. Nature Neuroscience, 18(4), 590–596. doi: 10.1038/nn.3961
39. Colizoli O., de Gee J. W., Urai A. E., & Donner T. H. (2018). Task-evoked pupil responses reflect internal belief states. Scientific Reports, 8(1), 13702. doi: 10.1038/s41598-018-31985-3
40. de Gee J. W., Correa C. M. C., Weaver M., Donner T. H., & van Gaal S. (2021). Pupil Dilation and the Slow Wave ERP Reflect Surprise about Choice Outcome Resulting from Intrinsic Variability in Decision Confidence. Cerebral Cortex, 31(7), 3565–3578. doi: 10.1093/cercor/bhab032
41. Kayhan E., Heil L., Kwisthout J., van Rooij I., Hunnius S., & Bekkering H. (2019). Young children integrate current observations, priors and agent information to predict others’ actions. PloS One, 14(5), e0200976. doi: 10.1371/journal.pone.0200976
42. Koenig S., Uengoer M., & Lachnit H. (2018). Pupil dilation indicates the coding of past prediction errors: Evidence for attentional learning theory. Psychophysiology, 55(4), e13020. doi: 10.1111/psyp.13020
43. Nassar M. R., Rumsey K. M., Wilson R. C., Parikh K., Heasly B., & Gold J. I. (2012). Rational regulation of learning dynamics by pupil-linked arousal systems. Nature Neuroscience, 15, 1040. doi: 10.1038/nn.3130
44. O’Reilly J. X., Schüffelgen U., Cuell S. F., Behrens T. E. J., Mars R. B., & Rushworth M. F. S. (2013). Dissociable effects of surprise and model update in parietal and anterior cingulate cortex. Proceedings of the National Academy of Sciences, 110(38), E3660–E3669. doi: 10.1073/pnas.1305373110
45. Preuschoff K., ‘t Hart B., & Einhauser W. (2011). Pupil Dilation Signals Surprise: Evidence for Noradrenaline’s Role in Decision Making. Frontiers in Neuroscience, 5, 115. doi: 10.3389/fnins.2011.00115
46. Satterthwaite T. D., Green L., Myerson J., Parker J., Ramaratnam M., & Buckner R. L. (2007). Dissociable but inter-related systems of cognitive control and reward during decision making: Evidence from pupillometry and event-related fMRI. NeuroImage, 37(3), 1017–1031. doi: 10.1016/j.neuroimage.2007.04.066
47. Van Slooten J. C., Jahfari S., Knapen T., & Theeuwes J. (2018). How pupil responses track value-based decision-making during and after reinforcement learning. PLOS Computational Biology, 14(11), e1006632. doi: 10.1371/journal.pcbi.1006632
48. Knapen T., de Gee J. W., Brascamp J., Nuiten S., Hoppenbrouwers S., & Theeuwes J. (2016). Cognitive and Ocular Factors Jointly Determine Pupil Responses under Equiluminance. PLoS ONE, 11(5), e0155574. doi: 10.1371/journal.pone.0155574
49. Burlingham C. S., Mirbagheri S., & Heeger D. J. (2022). A unified model of the task-evoked pupil response. Science Advances, 8(16), eabi9979. doi: 10.1126/sciadv.abi9979
50. Hoeks B., & Levelt W. J. M. (1993). Pupillary dilation as a measure of attention: A quantitative system analysis. Behavior Research Methods, Instruments, & Computers, 25(1), 16–26. doi: 10.3758/BF03204445
51. Mathot S. (2018). Pupillometry: Psychology, Physiology, and Function. Journal of Cognition, 1(1), 16. doi: 10.5334/joc.18
52. Ashby F. G., & Townsend J. T. (1980). Decomposing the reaction time distribution: Pure insertion and selective influence revisited. Journal of Mathematical Psychology, 21(2), 93–123. doi: 10.1016/0022-2496(80)90001-2
53. Berger A., & Kiefer M. (2021). Comparison of Different Response Time Outlier Exclusion Methods: A Simulation Study. Frontiers in Psychology, 12. doi: 10.3389/fpsyg.2021.675558
54. Falmagne J.-C. (1987). Response times; their role in inferring elementary mental organization. Science, 237, 1060.
55. Whelan R. (2008). Effective Analysis of Reaction Time Data. The Psychological Record, 58(3), 475–482. doi: 10.1007/BF03395630
56. de Gee J. W., Knapen T., & Donner T. H. (2014). Decision-related pupil dilation reflects upcoming choice and individual bias. Proceedings of the National Academy of Sciences of the United States of America, 111(5), E618–25. doi: 10.1073/pnas.1317557111
57. Urai A. E., Braun A., & Donner T. H. (2017). Pupil-linked arousal is driven by decision uncertainty and alters serial choice bias. Nature Communications, 8, 14637. doi: 10.1038/ncomms14637
58.Liao H.-I., Yoneya M., Kidani S., Kashino M., & Furukawa S. (2016). Human pupillary dilation response to deviant auditory stimuli: Effects of stimulus properties and voluntary attention. Frontiers in Neuroscience, 10, 43. doi: 10.3389/fnins.2016.00043 [DOI] [PMC free article] [PubMed] [Google Scholar]
59.Zekveld A. A., Koelewijn T., & Kramer S. E. (2018). The Pupil Dilation Response to Auditory Stimuli: Current State of Knowledge. Trends in Hearing, 22, 2331216518777174. doi: 10.1177/2331216518777174 [DOI] [PMC free article] [PubMed] [Google Scholar]
60.Braem S., Coenen E., Bombeke K., Van Bochove M. E., & Notebaert W. (2015). Open your eyes for prediction errors. Cognitive, Affective, & Behavioral Neuroscience, 15(2), 374–380. doi: 10.3758/s13415-014-0333-4 [DOI] [PubMed] [Google Scholar]
61.Critchley H. D., Tang J., Glaser D., Butterworth B., & Dolan R. J. (2005). Anterior cingulate activity during error and autonomic response. NeuroImage, 27(4), 885–895. doi: 10.1016/j.neuroimage.2005.05.047 [DOI] [PubMed] [Google Scholar]
62.Maier M. E., Ernst B., & Steinhauser M. (2019). Error-related pupil dilation is sensitive to the evaluation of different error types. Biological Psychology, 141, 25–34. doi: 10.1016/j.biopsycho.2018.12.013 [DOI] [PubMed] [Google Scholar]
63.Murphy P. R., van Moort M. L., & Nieuwenhuis S. (2016). The Pupillary Orienting Response Predicts Adaptive Behavioral Adjustment after Errors. PLOS ONE, 11(3), e0151763. doi: 10.1371/journal.pone.0151763 [DOI] [PMC free article] [PubMed] [Google Scholar]
64.Rondeel E., Van Steenbergen H., Holland R., & van Knippenberg A. (2015). A closer look at cognitive control: Differences in resource allocation during updating, inhibition and switching as revealed by pupillometry. Frontiers in Human Neuroscience, 9. https://www.frontiersin.org/articles/10.3389/fnhum.2015.00494 [DOI] [PMC free article] [PubMed] [Google Scholar]
65.Wessel J. R., Danielmeier C., & Ullsperger M. (2011). Error Awareness Revisited: Accumulation of Multimodal Evidence from Central and Autonomic Nervous Systems. Journal of Cognitive Neuroscience, 23(10), 3021–3036. doi: 10.1162/jocn.2011.21635 [DOI] [PubMed] [Google Scholar]
66.May K. A., & Solomon J. A. (2013). Four Theorems on the Psychometric Function. PLOS ONE, 8(10), e74815. doi: 10.1371/journal.pone.0074815 [DOI] [PMC free article] [PubMed] [Google Scholar]
67.Den Ouden H. E., Kok P., & De Lange F. P. (2012). How prediction errors shape perception, attention, and motivation. Frontiers in Psychology, 3, 548. doi: 10.3389/fpsyg.2012.00548 [DOI] [PMC free article] [PubMed] [Google Scholar]
68.Psychopy [Computer software] (1.81). (2018). University of Nottingham. https://psychopy.org/index.html
69.Python [Computer software] (3.6). (2016). Python Software Foundation. https://www.python.org/downloads/release/python-360/
70.Gramfort A., Luessi M., Larson E., Engemann D. A., Strohmeier D., Brodbeck C., et al. (2013). MEG and EEG data analysis with MNE-Python. Frontiers in Neuroscience, 267. doi: 10.3389/fnins.2013.00267 [DOI] [PMC free article] [PubMed] [Google Scholar]
71.JASP Team. (2020). JASP (0.13.1).
72.Heald J. B., Lengyel M., & Wolpert D. M. (2021). Contextual inference underlies the learning of sensorimotor repertoires. Nature, 600(7889), 489–493. doi: 10.1038/s41586-021-04129-3 [DOI] [PMC free article] [PubMed] [Google Scholar]
73.Pouncy T., & Gershman S. J. (2022). Inductive biases in theory-based reinforcement learning. [DOI] [PubMed] [Google Scholar]
74.Sibley C., Foroughi C. K., Brown N. L., Phillips H., Drollinger S., Eagle M., et al. (2020). More than Means: Characterizing Individual Differences in Pupillary Dilations. Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 64(1), 57–61. 10.1177/1071181320641017 [DOI] [Google Scholar]
75.Hyönä J., Tommola J., & Alaja A.-M. (1995). Pupil dilation as a measure of processing load in simultaneous interpretation and other language tasks. The Quarterly Journal of Experimental Psychology, 48(3), 598–612. doi: 10.1080/14640749508401407 [DOI] [PubMed] [Google Scholar]
76.Porter G., Troscianko T., & Gilchrist I. D. (2007). Effort during visual search and counting: Insights from pupillometry. Quarterly Journal of Experimental Psychology, 60(2), 211–229. doi: 10.1080/17470210600673818 [DOI] [PubMed] [Google Scholar]
77.van der Wel P., & van Steenbergen H. (2018). Pupil dilation as an index of effort in cognitive control tasks: A review. Psychonomic Bulletin & Review, 25(6), 2005–2015. doi: 10.3758/s13423-018-1432-y [DOI] [PMC free article] [PubMed] [Google Scholar]

PLoS One. doi: 10.1371/journal.pone.0270619.r001

Decision Letter 0

Anthony C Constantinou

15 Sep 2022

PONE-D-22-16988

Differentiating Bayesian model updating and model revision based on their prediction error dynamics

PLOS ONE

Dear Dr. Rutar,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

The paper requires major revision before it can be reconsidered for publication. For details, please refer to the reviewers' comments, which I believe are detailed and helpful. I would like to draw your attention to the reviewers' comments about clarifying the assumptions made and discussing potential limitations of those assumptions with reference to the results presented. Please also improve the description of your experiments for clarity, as per the reviewers' comments.

If you decide to resubmit a revised version, please provide point-to-point responses to each of the comments made by the reviewers. In your response, ensure you clearly explain what revisions have been made to address each of the points raised by the reviewers. If a comment is not addressed, please justify this decision in your response to the reviewer.

Please submit your revised manuscript by Oct 30 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Anthony C Constantinou

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please change "female" or "male" to "woman" or "man" as appropriate, when used as a noun (see for instance https://apastyle.apa.org/style-grammar-guidelines/bias-free-language/gender).

3. We note that the grant information you provided in the ‘Funding Information’ and ‘Financial Disclosure’ sections does not match.

When you resubmit, please ensure that you provide the correct grant numbers for the awards you received for your study in the ‘Funding Information’ section.

4. Please include your full ethics statement in the ‘Methods’ section of your manuscript file. In your statement, please include the full name of the IRB or ethics committee who approved or waived your study, as well as whether or not you obtained informed written or verbal consent. If consent was waived for your study, please include this information in your statement as well.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: I Don't Know

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: I enjoyed reading this clever and nicely motivated study of model revision and updating. I thought the overall motivation was excellent. However, the design is so complicated I did not fully understand what was being reversed between the revision and updating phases. Furthermore, you have made some rather superficial assumptions in building your hypotheses. This means it is difficult to assess the significance or implication of your empirical findings. Finally, although it is pleasingly honest of you to acknowledge you started with pupillary responses as the primary dependent measure, I think this was fundamentally misguided for several reasons (please see below).

In short, could you think about the following points — and whether you can restructure your paper along the lines suggested.

First, you need to be slightly more formal and specific about the distinction between model revision and model updating. I appreciate that these are terms that you have put in the literature and that you will want to retain. However, there is an unfortunate conflation of the word ‘updating’ (in the sense of Bayesian belief updating and in the sense of model updating) that you need to resolve. Furthermore, you seem to have a purely narrative understanding of predictive processing and the distinction between parameter learning and structure learning. I say this because you talk about revising hypotheses in the introduction. In predictive processing, there is only one model or hypothesis, and it is the parameters of this model that are updated (through revising prior beliefs to posterior beliefs) on the basis of experience. When you talk about model updating, you are referring to the structure of the model, as opposed to its parameters. I think you should make this clear with the following:

“Predictive processing can be regarded as an umbrella term for active inference and learning. Crucially, learning comes in two flavours: it can refer to the updating of model parameters (i.e., parameter learning of the sort associated with activity or experience-dependent plasticity in the brain). Conversely, the model itself can be updated (i.e., structure learning mediated by the addition or removal of connections in the brain). In this context, model revision refers to the revision of model parameters or connection weights under a specific generative model or architecture, while model updating refers to the selection or reduction of models in terms of their structure [1-3]. There are two approaches to this kind of structure learning. One can start from an overcomplete generative model and then eliminate redundant parameters (i.e., Bayesian model reduction [4]). Conversely, one can explore model space by adding extra parameters or connections (e.g., in the spirit of nonparametric Bayes [5, 6]). In both instances, the alternative models or hypotheses are compared in terms of their marginal likelihood or log evidence, rendering structure learning an instance of Bayesian model selection [7]. In the case of Bayesian model reduction, from an overcomplete model, there are neurobiologically plausible and simple rules that can implement model updating, and that may underwrite aha moments or, indeed, a functional explanation for sleep and its associated synaptic homeostasis [8-11].”
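
The comparison of alternative models by their log evidence can be illustrated with a minimal Python sketch. It is not taken from the manuscript or from the cited work: the trial counts, the Beta(1, 1) priors, and the function names (`log_evidence`, `log_bayes_factor`) are assumptions chosen purely for illustration. A simple model with a single cue-target contingency is compared against a structured model with one contingency per context.

```python
from math import lgamma


def log_beta(a, b):
    """Natural log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_evidence(successes, failures, a=1.0, b=1.0):
    """Log marginal likelihood of a Bernoulli sequence under a Beta(a, b) prior.

    The contingency parameter is integrated out analytically, so this is the
    log evidence for a model with one free parameter.
    """
    return log_beta(a + successes, b + failures) - log_beta(a, b)


def log_bayes_factor(counts):
    """Log evidence of the structured model minus that of the simple model.

    counts maps a context label to (cue-congruent, cue-incongruent) trial counts.
    Positive values favour the structured (context-sensitive) model.
    """
    # Simple model: one contingency shared by all contexts.
    total_s = sum(s for s, _ in counts.values())
    total_f = sum(f for _, f in counts.values())
    simple = log_evidence(total_s, total_f)
    # Structured model: an independent contingency for each context.
    structured = sum(log_evidence(s, f) for s, f in counts.values())
    return structured - simple


# Hypothetical data: the contingency reverses when a tone is present.
context_dependent = {"tone_absent": (16, 4), "tone_present": (4, 16)}
# Hypothetical data: the contingency is the same in both contexts.
context_free = {"tone_absent": (16, 4), "tone_present": (16, 4)}

print(log_bayes_factor(context_dependent))  # positive: extra structure is warranted
print(log_bayes_factor(context_free))       # negative: the simpler model is preferred
```

In this toy setting the extra parameter is only rewarded when the data actually differ across contexts, which is the sense in which the marginal likelihood penalises unnecessary structure.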

The second big issue is your use of pupillary diameter as a proxy for prediction error. I think that this is an unfounded and misguided move. The link between various belief updating processes in predictive processing and pupillary responses has yet to be established. I would leverage this in the way that you frame your report. In other words, instead of starting off by assuming that pupillary dilatation reflects this or that, you can identify the best explanations for pupillary dilatation on the basis of your results. I suggest this because most of the available evidence and computational work in predictive processing suggests that pupillary dilatation does not reflect prediction errors per se, but the precision or confidence placed in prediction errors of a particular sort. I would recommend you read [12] and then say something along the following lines:

“The precise belief updates or learning that underlie pupillary dilatation in predictive processing have yet to be fully established. However, early considerations suggest that the noradrenergic basis of pupillary dilatation links it to the encoding of precision or confidence about contingencies (i.e., transition probabilities) in the generative models that underlie active inference (a.k.a., predictive processing). In other words, pupillary responses may reflect the predictability or salience of a stimulus, where salience refers here to the propensity to revise or update latent or hidden states that are being inferred. However, the evidence for phasic pupillary responses reflecting, e.g., prediction errors, precision or precision-weighted prediction errors is much less clear.

One might imagine that pupillary dilatation could play the role of electrophysiological correlates, such as the mismatch negativity, in reflecting the information gain or surprise inherent in a particular stimulus. In light of this, we characterised the time course of model revision and updating in terms of behavioural responses (i.e., predictive accuracy) and asked: what is the best predictor of accompanying pupillary responses? In this study we were primarily concerned with phasic pupillary responses and, specifically, responses evoked by surprising or informative stimuli relative to predicted stimuli.”

What I am proposing here is that you use the behavioural responses to track learning and then use the argument that only after learning can there be predictions, and that only after there are predictions is a stimulus informative or surprising. In other words, you would expect to see a monotonic relationship between model updating or revision as expressed in behavioural learning and pupillary responses. The nature of this relationship is, I think, open. For example, it could reflect the confidence or precision about a prediction. In this case, the evoked responses to correct and incorrect targets should be the same. Alternatively, pupillary responses could reflect an update to the predictions of predictability (i.e., precision). In this case, the interesting differences will emerge in terms of the difference between correct and incorrect target stimuli.

In terms of your experimental design, I think you need to be more careful in distinguishing your design from a simple reversal learning paradigm. I would recommend something along the following lines:

“To disambiguate model revision from model updating, it is necessary to evince aha moments or model updating, in the sense that a pre-existing model is not fit for purpose after a change in contingencies. This requires a paradigm that goes beyond conventional reversal learning (i.e., where contingencies simply change and the parameters encoding those contingencies are revised via parametric learning). To examine putative model updating, we used a two-phase protocol, in which a simple (revision) model of associative contingencies was sufficient to explain observable outcomes. In the second (update) phase, we changed the contingencies in a structural or qualitative fashion by adding a conditional dependency or context sensitivity. Specifically, in the simple model there was no interaction between the predictive validity of visual cues and auditory cues. However, in the update phase the predictive validity of visual cues depended upon the presence of auditory cues. This allowed us to examine model revision and updating as subjects learned a simple model and then learned a more structured model.”

I think at this stage, you have to think carefully about your hypotheses. Generally speaking, to look at model updating (i.e., Bayesian model selection or structure learning) one has to have a rather delicate paradigm that elicits aha moments. In other words, a sudden switch associated with the act of selecting one model over another, revealed by an abrupt change in inference and subsequent task performance. I do not think you have got this in your paradigm. In other words, there will be a degree of model updating in both the updating and revision phases. It may be that the simple model allows for a shorter latency of model updating, while the context-sensitive (update phase) model has a more protracted update. One could address this by assuming that each subject commits to a selected model at the point of model updating and estimating the most likely time point of this updating. The idea here would be that for the revision phase, most subjects discover or select their model early in the trials, while for the update phase, some subjects find the model more quickly while it takes other subjects much longer. This might be an interesting way of using your intersubject variability.
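
One way to read the suggestion of estimating each subject's most likely time point of model updating is as a single change-point problem on trial-wise accuracy. The sketch below is only an illustration under that assumption: it treats accuracy as Bernoulli with one rate before the switch and another after it, and picks the split with the highest likelihood. The function names and the example sequences are hypothetical and do not come from the manuscript or its data.

```python
from math import log


def bernoulli_loglik(xs):
    """Maximised Bernoulli log-likelihood of a 0/1 sequence (rate set to its MLE)."""
    n, k = len(xs), sum(xs)
    if k == 0 or k == n:
        return 0.0  # MLE rate is 0 or 1, so the sequence has likelihood 1
    p = k / n
    return k * log(p) + (n - k) * log(1 - p)


def most_likely_switch(correct):
    """Return the number of trials before the most likely switch point.

    correct is a list of 0/1 outcomes (1 = correct prediction). Every split
    into a non-empty 'before' and 'after' segment is scored, and the first
    split with the highest summed log-likelihood is returned.
    """
    best_t, best_ll = None, float("-inf")
    for t in range(1, len(correct)):
        ll = bernoulli_loglik(correct[:t]) + bernoulli_loglik(correct[t:])
        if ll > best_ll:
            best_t, best_ll = t, ll
    return best_t


# Hypothetical subjects: one commits to a model early, one much later.
early = [0] * 2 + [1] * 13
late = [0] * 9 + [1] * 6
print(most_likely_switch(early))  # -> 2
print(most_likely_switch(late))   # -> 9
```

The spread of the estimated switch points across subjects, computed separately for each phase, would then be the quantity of interest. With noisy data a maximum-likelihood split can overfit, so a Bayesian treatment with a prior over switch points may be preferable in practice.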

Notice that this suggestion rests upon using the behavioural responses as a more efficient measure of learning. Once you have tied down the dynamics of model revision and updating, you can then turn to the pupillary responses and ask what they are most likely to reflect. In this spirit, you might also add in your discussion (to your paragraph about ways forward).

"Ultimately, to establish the construct validity of pupillary responses in terms of model revision and updating, it will be necessary to have efficient estimates of various belief states and learning. These can only be inferred from observable behaviour (e.g., choice behaviour or reaction times), under the ideal Bayesian observer assumptions afforded by active inference. Early work along these lines has looked at baseline pupillary dilatation using a Markov decision process as the generative model [12]. It would be interesting to repeat this kind of exercise using paradigms that can elicit model updating and accompanying aha moments. See [8] for a numerical example of synthetic model updating."

Finally, I think you need to be clearer about the experimental design. There were too many factors and changes for the reader to make sense of. For example, I did not understand whether Mappings 1 and 2 referred to the precision (i.e., 80% versus 20%) or to the mapping per se (i.e., square means left). Crucially, it was not clear what was reversed and what was not reversed. I think the simplest thing to do would be to have a figure in which you draw the mappings for the two phases of the paradigm separately. The maps should connect the cue to the targets with the little arrows. The precision of these mappings can then be indicated with 80% or 20% beside the arrows. This should also resolve confusion about your counterbalancing. For example, when you said that Mappings 1 and 2 were counterbalanced over subjects, does this mean that certain subjects never experienced one of the two Mappings?

I hope that these suggestions help, should any revision be required.

1. Smith, R., et al., An Active Inference Approach to Modeling Structure Learning: Concept Learning as an Example Case. Front Comput Neurosci, 2020. 14: p. 41.

2. Gershman, S.J. and Y. Niv, Learning latent structure: carving nature at its joints. Curr Opin Neurobiol, 2010. 20(2): p. 251-6.

3. Tervo, D.G., J.B. Tenenbaum, and S.J. Gershman, Toward the neural implementation of structure learning. Curr Opin Neurobiol, 2016. 37: p. 99-105.

4. Friston, K., T. Parr, and P. Zeidman, Bayesian model reduction. arXiv preprint arXiv:1805.07092, 2018.

5. Goldwater, S., Nonparametric Bayesian Models of Lexical Acquisition. 2006, Brown University.

6. Gershman, S.J. and D.M. Blei, A tutorial on Bayesian nonparametric models. Journal of Mathematical Psychology, 2012. 56(1): p. 1-12.

7. Hoeting, J.A., et al., Bayesian Model Averaging: A Tutorial. Statistical Science, 1999. 14(4): p. 382-401.

8. Friston, K.J., et al., Active Inference, Curiosity and Insight. Neural Comput, 2017. 29(10): p. 2633-2683.

9. Hobson, J.A. and K.J. Friston, Consciousness, Dreams, and Inference The Cartesian Theatre Revisited. Journal of Consciousness Studies, 2014. 21(1-2): p. 6-32.

10. Tononi, G. and C. Cirelli, Sleep function and synaptic homeostasis. Sleep Med Rev., 2006. 10(1): p. 49-62.

11. Hinton, G.E., et al., The "wake-sleep" algorithm for unsupervised neural networks. Science, 1995. 268(5214): p. 1158-61.

12. Vincent, P., et al., With an eye on uncertainty: Modelling pupillary responses to environmental volatility. PLOS Computational Biology, 2019. 15(7): p. e1007126.

Reviewer #2: I enjoyed reading this paper.

This paper differentiates model updating and model revision using behavioural experiments, two concepts which have been, according to the authors, recently distinguished theoretically. It does so by proposing a behavioural experiment consisting of updating and revision phases, and assesses the participant’s predictions and prediction errors throughout these phases, showing that these two phases have different predictive processing characteristics.

The paper assumes that “existing accounts of learning in the predictive-processing framework currently lack a crucial component: a constructive learning mechanism that accounts for changing models structurally when new hypotheses need to be learnt”. As such, it has the ambition to inform the theoretical development of mechanisms that reproduce human model learning. One possibility, that the experiments suggest, is that “participants first built multiple models from scratch in the updating phase and update them in the revision phase”.

The paper is compelling and very well written, and I would recommend it for publication if the authors could say something about the following queries. My main comment (detailed below) is that I do not entirely agree with the premise of the paper. I believe that the field has competing hypotheses about how humans learn their model of the world. While I think the experiments from the paper are a valuable contribution, I believe that framing them in light of the recent literature in computational cognitive science would increase the impact of the paper. In particular, maybe it will be possible to say something about whether the experiments provide evidence for or against different computational mechanisms that have been proposed to account for human model learning within predictive processing.

A caveat: I am not qualified to assess the validity and soundness of the behavioural experiments.

Major comment:

The paper, on several occasions, claims that “existing accounts of learning in the predictive-processing framework currently lack a crucial component: a constructive learning mechanism that accounts for changing models structurally when new hypotheses need to be learnt” (l501-502). It then proceeds by noting that “Kwisthout and colleagues (2017) proposed that model revision is a learning mechanism that is distinct from Bayesian model updating and accounts for such a structural change in generative models.” (l504-505).

While there are currently no algorithms that can reproduce human model learning at scale, the field of predictive processing has proposed several mechanistic explanations for human model learning. As I see it, these are split into two main categories:

1) Model revision as model updating: Model revision is cast as Bayesian belief updating over spaces of models. This is the view that has been developed by Tenenbaum, Gershman and colleagues. Human model learning can be done (in theory) by doing Bayesian inference over big spaces of generative models, often written as probabilistic programs. How do we add a factor, or hypothesis, to an existing model? One way to do this is via the toolkit of Bayesian non-parametrics, whereby the number of, say, hidden state factors in a model is updated via Bayesian inference. Mathematically, this may require priors over spaces of models that are infinitely large, but this is not a problem either theoretically or computationally. A nice review of Bayesian non-parametrics is: A tutorial on Bayesian nonparametric models by Gershman et al (2012). A nice review of model learning as Bayesian inference on large (but finite) spaces of generative models is: Bayesian Models of Conceptual Development: Learning as Building Models of the World by Ullman et al (2020). A couple of nice papers that have implemented the latter in practice, showing human learning efficiency in some tasks, are: Human-Level Reinforcement Learning through Theory-Based Modeling, Exploration, and Planning by Tsividis et al (2021); and Inductive biases in theory-based reinforcement learning by Pouncy and Gershman (2022).

2) Model revision as free energy minimisation: this view describes human model learning as a process of (variational) free energy minimisation on spaces of generative models. This is the view that is advocated by Friston and colleagues. This view is not very dissimilar to the one above. Free energy minimisation entails Bayesian updating with maximisation of the model evidence. This is equivalent to minimising model complexity while maximising its accuracy. In short, the added imperative to Bayesian updating entails regularising the model: fitting the Bayesian posterior, while staying within models that are computationally manageable. In practice, this leads to building abstractions and hierarchical depth. A nice review of all this is: Active inference on discrete state-spaces: A synthesis by Da Costa et al (2020). Much has not been explored regarding the use of free energy minimisation to learn models; but, from the current literature, two algorithms stand out (these are discussed in the previous paper): a) Bayesian model reduction, which enables efficient model reduction thanks to free energy minimisation, see Bayesian model reduction by Friston et al (2019). This has been used to model sleep, synaptic pruning, and insight, e.g., Active Inference, Curiosity and Insight by Friston et al (2017). b) Bayesian model expansion, which is about adding hypotheses to a model (i.e., growing a model), see An Active Inference Approach to Modeling Structure Learning: Concept Learning as an Example Case by Smith et al (2020).
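
The statement that free energy minimisation trades accuracy against complexity can be written out as a standard identity. The notation is an assumption made here for illustration, since the review itself gives no symbols: q(θ) is the approximate posterior over parameters θ, y the observed data, and m the generative model.

```latex
\begin{aligned}
F[q] &= \underbrace{D_{\mathrm{KL}}\!\left[\,q(\theta)\,\|\,p(\theta \mid m)\,\right]}_{\text{complexity}}
      \;-\; \underbrace{\mathbb{E}_{q(\theta)}\!\left[\ln p(y \mid \theta, m)\right]}_{\text{accuracy}} \\
     &= D_{\mathrm{KL}}\!\left[\,q(\theta)\,\|\,p(\theta \mid y, m)\,\right] \;-\; \ln p(y \mid m)
      \;\;\geq\;\; -\ln p(y \mid m)
\end{aligned}
```

The first line shows the complexity and accuracy terms. The second shows that F is an upper bound on the negative log evidence, with equality when q(θ) is the exact posterior, so minimising F over q approximates Bayesian updating and comparing the minimised F across models m approximates Bayesian model selection.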

It would be great if the authors could say, either qualitatively or quantitatively, whether these experiments bring evidence in favour of or against either of these hypotheses, which to my understanding are the main hypotheses advanced by the field in terms of describing model learning. My hunch is that, since model updating and revision are shown to have different predictive processing characteristics, it could be a point in favour of the free energy view of things (which adds something to Bayesian updating). That said, Bayesian model updating is so flexible that maybe this framework could account for the data as well. Also, it might be possible to say something about the model revision phase in relation to mechanism 2. At the very least, the authors should mention this theoretical work on human model learning in the introduction.

Minor comments:

- L69-72: “the entirety of human cognition and behaviour from visual processing (Rao & Ballard, 1999;Edwards et al., 2017; Petro & Muckli, 2016) to mentalizing (Kilner et al., 2007; Koster-Hale & Saxe, 2013)”.

o Here it may also be worth adding “and action” or “control”, e.g., Action and behavior: a free-energy formulation by Friston et al 2010.

o L100-101 “Model revision, unlike model updating, changes the structure of a generative model by altering its causal connections or by adding and removing hypotheses (Kwisthout et al., 2017)”. Here it is worth mentioning the other terms in the literature that are synonyms to model revision:

o Structure learning: Learning latent structure: carving nature at its joints by Gershman and Niv (2010); Active inference on discrete state-spaces: A synthesis by Da Costa et al (2020); An Active Inference Approach to Modeling Structure Learning: Concept Learning as an Example Case by Smith et al (2020)

o Causal inference: Elements of Causal Inference by Peters et al (2017).

o In regards to the suggestion that “participants first built multiple models from scratch in the updating phase and update them in the revision phase”, there may be a connection with the computational account of model learning in terms of Bayesian model updating presented in Inductive biases in theory-based reinforcement learning by Pouncy and Gershman (2022), which considers a handful of models (i.e., competing hypotheses) at each point in time.

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Karl Friston

Reviewer #2: Yes: Lancelot Da Costa

**********

PLoS One. 2023 Feb 16;18(2):e0270619. doi: 10.1371/journal.pone.0270619.r002

Author response to Decision Letter 0

8 Dec 2022

IMPORTANT: our figures, as part of the replies to reviewers, will not appear in this box below. Therefore, please see the full replies to the reviewers in the attached files.

RESPONSE TO REVIEWERS

We would like to thank both reviewers for their interest, consideration, and helpful suggestions for improvement. We address each of their concerns in detail below. Note we have formatted the concerns of the reviewers in blue and our responses in black.

REVIEWER 1

I enjoyed reading this clever and nicely motivated study of model revision and updating. I thought the overall motivation was excellent. However, the design is so complicated I did not fully understand what was being reversed between the revision and updating phases. Furthermore, you have made some rather superficial assumptions in building your hypotheses. This means it is difficult to assess the significance or implication of your empirical findings. Finally, although it is pleasingly honest of you to acknowledge you started with pupillary responses as the primary dependent measure, I think this was fundamentally misguided for several reasons (please see below).

Reply: Thank you for this overall positive assessment on the experimental motivation and your constructive criticism, including helpful suggestions for improvement. We have taken care to address each of your points below.

Additionally, several grammatical and formatting changes have been made for clarity. Changes have been marked in green in the manuscript file.

In short, could you think about the following points — and whether you can restructure your paper along the lines suggested.

1) First, you need to be slightly more formal and specific about the distinction between model revision and model updating. I appreciate that these are terms that you have put in the literature — and that you will want to retain. However, there is an unfortunate conflation of the word ‘updating’— in the sense of Bayesian belief updating and model updating — that you need to resolve. Furthermore, you seem to have a purely narrative understanding of predictive processing and the distinction between parameter learning and structure learning. I say this because you talk about revising hypotheses in the introduction. In predictive processing, there is only one model or hypothesis, and it is the parameters of this model that are updated (through revising prior beliefs to posterior beliefs) on the basis of experience. When you talk about model updating, you are referring to the structure of the model, as opposed to its parameters. I think you should make this clear with the following:

Reply: Thank you for these helpful comments. We have revised the appropriate sections of the manuscript to incorporate your suggested improvements. In particular, we replaced “model updating” and “model revision” with the terms found in existing literature, i.e., “parameter learning” and “structure learning” (e.g., Friston et al., 2017; Smith et al., 2020; da Costa et al., 2020). We also made the difference between the two learning mechanisms clearer: parameter learning refers to updating the parameters of the generative model by revising prior beliefs to posterior beliefs via Bayes’ theorem, and structure learning refers to adding or removing parameters or connections, where a structural change comes about based on the comparison of the marginal likelihood of the data under alternative models (structurally changed vs unchanged).
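
To make the distinction concrete, the following is a minimal sketch, assuming a toy Beta-Bernoulli model that is not part of the manuscript. Parameter learning is the conjugate update of a single contingency parameter; structure learning is approximated here as a comparison of marginal likelihoods between a model with one shared parameter and a model with a separate parameter per tone condition. All function names and trial counts are hypothetical.

```python
from math import lgamma


def log_beta(a: float, b: float) -> float:
    """Natural log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def update_beta(a: float, b: float, hits: int, misses: int):
    """Parameter learning: conjugate Bayesian update of a Beta(a, b) belief."""
    return a + hits, b + misses


def log_evidence(hits: int, misses: int, a: float = 1.0, b: float = 1.0) -> float:
    """Log marginal likelihood of one Bernoulli sequence under a Beta(a, b) prior."""
    return log_beta(a + hits, b + misses) - log_beta(a, b)


def log_bayes_factor(tone, no_tone) -> float:
    """Log evidence ratio: separate parameters per tone condition vs one shared parameter."""
    split = log_evidence(*tone) + log_evidence(*no_tone)
    shared = log_evidence(tone[0] + no_tone[0], tone[1] + no_tone[1])
    return split - shared


# Hypothetical (hits, misses) counts for "cue mapping M1 predicted the target".
reversed_on_tone = log_bayes_factor(tone=(4, 16), no_tone=(16, 4))
same_on_tone = log_bayes_factor(tone=(16, 4), no_tone=(16, 4))

print(f"log BF when the tone reverses the mapping: {reversed_on_tone:.2f}")
print(f"log BF when the tone is irrelevant:        {same_on_tone:.2f}")
```

A positive log Bayes factor favours adding the tone-dependent parameter, while a negative value reflects the complexity penalty that the extra parameter incurs when it explains nothing new.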

The following changes have been made in the Abstract:

“Within predictive processing two kinds of learning can be distinguished: parameter learning and structure learning. In Bayesian parameter learning, parameters under a specific generative model are continuously being updated in light of new evidence. However, this learning mechanism cannot explain how new parameters are added to a model. Structure learning, unlike parameter learning, makes structural changes to a generative model by altering its causal connections or adding or removing parameters. Whilst these two types of learning have recently been formally differentiated, they have not been empirically distinguished. The aim of this research was to empirically differentiate between parameter learning and structure learning on the basis of how they affect pupil dilation. Participants took part in a within-subject computer-based learning experiment with two phases. In the first phase, participants had to learn the relationship between cues and target stimuli. In the second phase, they had to learn a conditional change in this relationship. Our results show that the learning dynamics were indeed qualitatively different between the two experimental phases, but in the opposite direction as we originally expected. Participants were learning more gradually in the second phase compared to the first phase. This might imply that participants built multiple models from scratch in the first phase (structure learning) before settling on one of these models. In the second phase, subjects might just need to update the probability distribution over the model parameters (parameter learning).”

In the Introduction, the following changes have been made (p. 3-4):

“Until recently, learning in predictive processing was cast as parameter learning, where parameters under a specific generative model are updated in light of new evidence using Bayes’ rule (Da Costa et al., 2020; Friston et al., 2016; Smith et al., 2020). Such a formalism is well suited for explaining how learning proceeds when the generative model contains all relevant parameters for a particular learning task. In other words, parameter learning can only ensue when the structure of a generative model is established. Unless we assume that learners are equipped from the start with the complete set of parameters that can explain every situation they will ever encounter, we need to explain how novel parameters are added to a generative model or removed from it (Christie & Gentner, 2010; Gentner & Hoyos, 2017; Schulz, 2012). To account for this, a new type of learning has been proposed within predictive processing: structure learning (Da Costa et al., 2020; Friston et al., 2017; Kwisthout et al., 2017; Rutar et al., 2022; Smith et al., 2020). This type of learning changes the structure of a generative model by changing the number of parameters in a model or by altering their functional dependencies (Da Costa et al., 2020; Friston et al., 2017; Kwisthout et al., 2017; Rutar et al., 2022; Smith et al., 2020). Similarly, Heald and colleagues (Heald et al., 2021) have recently presented a theory for sensorimotor learning, called contextual inference, that differentiates between the adaptation of behaviour based on updating of existing and creation of new motor memories and adaptation due to changes in the relative weighting of these motor memories.

As a result of learning a more adequate model of the world, predictions should become better over time, and, simultaneously, uncertainty in each prediction should decrease (Clark, 2013, 2015; Hohwy, 2013). Crucially, we expected that the two experimental phases of our task would lead to two distinct dynamics of this learning process. Gradual updating of the probability distributions of existing model parameters was expected to occur in the first phase, indicated by a gradual increase in predictive accuracy of the cue-target relationship. An abrupt change from incorrect to correct predictions, once a new model parameter has been added resulting from a conditional rule change, was expected to occur in the second phase (see section Hypotheses for a more detailed description).”

In the Discussion, the following changes have been made (p. 23):

“Our research was motivated by the observation that the two kinds of learning within predictive processing, whilst recently formally differentiated, have not been empirically distinguished. Parameter learning, on the one hand, refers to updating of probability distribution of model parameters in light of new evidence using Bayes’ rule (Friston et al., 2016; Smith et al., 2020). Structure learning, on the other hand, pertains to altering the structure of the generative model by changing the number of parameters in the generative model or by altering their functional dependencies (Da Costa et al., 2020; Friston et al., 2017; Smith et al., 2020). Related proposals have been put forward by Kwisthout and colleagues (Kwisthout et al., 2017) and Rutar and colleagues (Rutar et al., 2022), who developed a formal proposal for structural changes that go beyond parameter addition and removal in a generative model.”

2) The second big issue is your use of pupillary diameter as a proxy for prediction error. I think that this is an unfounded and misguided move. The link between various belief updating processes in predictive processing and pupillary responses has yet to be established. I would leverage this in the way that you frame your report. In other words, instead of starting off by assuming that pupillary dilatation reflects this or that, you can identify the best explanations for pupillary dilatation on the basis of your results. I suggest this because most of the available evidence and computational work in predictive processing suggests that pupillary dilatation does not reflect prediction errors per se, but the precision or confidence placed in prediction errors of a particular sort. I would recommend you read [12] and then say something along the following lines:

Reply: We thank the reviewer for the helpful suggestions on how to revise our hypotheses regarding pupil dilation. In line with these suggestions, we have removed the sections claiming that pupillary responses track prediction error.

At the same time, we would like to remain faithful to our original hypotheses that the post-feedback pupil responses in the current experiment may reflect “novel information gain” about the outcome of a prediction. Our hypotheses concerning the post-feedback pupil responses are based on a large body of previous literature, which we now more explicitly summarize throughout the manuscript.

In line with your suggestions, we have made the following changes:

We have added the following paragraph to the Introduction (p. 4-5):

“Assuming we could instantiate such a learning trajectory in our task, we aimed to investigate the dynamics of a physiological correlate of information gain following the presentation of the outcome on each trial, for which there is no overt behavioural marker. Pupil dilation under constant luminance is a well-known indirect measure of the brain’s neuromodulatory arousal systems, including the noradrenergic locus coeruleus and the cholinergic basal forebrain (Aston-Jones & Cohen, 2005; Joshi & Gold, 2020; Larsen & Waters, 2018; McGinley et al., 2015; Murphy et al., 2014). Subcortical arousal systems may be involved in transmitting internal uncertainty signals and (reward) prediction errors to circuits necessary for inference and action selection (Aston-Jones & Cohen, 2005; Bouret & Sara, 2005; Doya, 2008; Glimcher, 2011; Lak et al., 2017; Montague et al., 2004; Parikh et al., 2007; Schultz, 2005; Yu & Dayan, 2005). In line with this, several studies have shown that after a person is given feedback on the accuracy of a decision they just made, their pupil dilation scales with the amount of novel information gained as a result of the feedback (Browning et al., 2015; Colizoli et al., 2018; de Gee et al., 2021; Kayhan et al., 2019; Koenig et al., 2018; Nassar et al., 2012; O’Reilly et al., 2013; Preuschoff et al., 2011; Satterthwaite et al., 2007; Van Slooten et al., 2018; Zénon, 2019). Therefore, we reasoned that the target-locked pupil response in our current task might similarly reflect how informative the target was on each trial relative to the prior prediction made by the participant. We expected that target-locked pupil responses would gradually decrease in the first phase as participants learned the task contingencies over time and abruptly decrease in the second phase of the experiment once a new model parameter has been added resulting from a conditional rule change.”

The hypothesis section has been changed as well (p. 11-12):

“We first examined learning during the two experimental phases based on the behavioural responses. Participants had to indicate by a button press which Gabor patch orientation they predicted based on the cue(s). The first phase was designed to induce parameter learning, therefore, participants were expected to gradually learn the probabilistic relationship between the cues and the Gabor patch orientation (target). Post-decision sensory evidence, in this case the target-stimulus, should improve future predictions and hence increase accuracy values for the high-frequency trials. We thus expected that in the first phase, predictive accuracy would show a gradual increase over time, illustrated by an exponential curve.

At the beginning of the second phase, participants were instructed that something had changed during this phase. We expected that participants would discover the new rule by integrating the now meaningful tone into their predictive models, resulting in structure learning. This novel model parameter should account for observations that could not be predicted correctly before the parameter was added. After an initial decrease (relative to the final accuracy in the first phase) in predictive accuracy in the second phase, the addition of a new model parameter should lead to an abrupt increase in predictive accuracy (i.e., an “aha” moment), illustrated by sigmoidal curves in Figure 1C. Figure 1C (top row) illustrates the expected accuracy on predictions during the 2AFC task. The learning curves for the tone and no tone trials may have differed during the second experimental phase as compared with the first due to the change in contingency. Therefore, we investigated the tone and no-tone trials separately in the main analysis.

After exploring the learning dynamics in behavioural data, we investigated how learning in the two experimental phases was reflected in the pupil data. With learning, participants are expected to become better at making cue-target predictions. As a consequence of sensory evidence accumulating over trials, the amount of novel sensory evidence needed to update current beliefs will become smaller over time. We hypothesised that pupil responses signalling information gain (Zénon, 2019) would decrease for the high-frequency trials as a result of learning the cue-target contingencies. More specifically, we hypothesised that in the first phase, pupil responses would decrease gradually and then plateau, while in the second phase, pupil responses would abruptly decrease until they plateau again as soon as the change in the cue-target contingency is learned (see Figure 1C).”
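
The two hypothesised learning dynamics can be sketched as follows. This is an illustration only: the functional forms are generic exponential and sigmoid curves, and every parameter value (start, asymptote, rate, midpoint, slope) is a hypothetical choice rather than a value fitted in the manuscript.

```python
from math import exp


def exponential_curve(trial, start, asymptote, rate):
    """Gradual learning: accuracy approaches the asymptote at a constant rate."""
    return asymptote - (asymptote - start) * exp(-rate * trial)


def sigmoid_curve(trial, start, asymptote, midpoint, slope):
    """Abrupt learning: accuracy steps from start to asymptote around the midpoint."""
    return start + (asymptote - start) / (1.0 + exp(-slope * (trial - midpoint)))


def largest_step(curve):
    """Largest trial-to-trial increase, a crude index of abruptness."""
    return max(later - earlier for earlier, later in zip(curve, curve[1:]))


trials = range(100)
# First phase (hypothesised parameter learning): chance level rising to 80%.
gradual = [exponential_curve(t, start=0.5, asymptote=0.8, rate=0.05) for t in trials]
# Second phase (hypothesised structure learning): low accuracy, then an "aha" step.
abrupt = [sigmoid_curve(t, start=0.2, asymptote=0.8, midpoint=50, slope=0.5) for t in trials]

print(f"largest step, gradual curve: {largest_step(gradual):.4f}")
print(f"largest step, abrupt curve:  {largest_step(abrupt):.4f}")
```

Under these assumed parameters the sigmoid concentrates most of its change in a few trials around the midpoint, whereas the exponential spreads it over the whole phase.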

2.1) What I am proposing here is that you use the behavioural responses to track learning and then use the argument that only after learning can there be predictions – and that only after there are predictions is a stimulus informative or surprising. In other words, you would expect to see a monotonic relationship between model updating or revision as expressed in behavioural learning and pupillary responses.

Reply: We agree with the reviewer’s suggestions and have added an analysis in which we verified a negative correlation between the cue-target frequency difference in accuracy and the target-locked pupil responses across trial bins. This relationship was selective for the target-locked responses, as it was not obtained for the pre-target baseline pupil dilation [your reference #12].

We have added the following to the Results (p. 17):

“We confirmed that the target-locked pupil responses were “mirroring” the learning trajectory obtained in the accuracy of the behavioural responses. This correspondence between learning and pupil dilation was indicated by the presence of a negative monotonic relationship between the accuracy of predictions and the target-locked pupil response across these 16 trial bins for the tone trials (see Figure 1C for hypotheses, Supplementary Materials, and Supplementary Figure 2).”

We have included the following section in the Supplementary Materials:

“Monotonic relationship between accuracy and target-locked pupil responses

If pupil responses track how informative the target itself is relative to the predicted target orientation, we expected the difference in frequency conditions in accuracy to negatively scale with the difference in frequency conditions in information gained (i.e., target-locked pupil responses; see Figure 1C, compare the 20% vs. 80% conditions). In other words, the target-locked pupil responses were expected to “mirror” the learning trajectory obtained in the accuracy of the behavioural responses.

To test this, we computed the main effect of frequency in the tone trials as the difference between the M2 as compared with the M1 mapping conditions separately for the accuracy data and target-locked pupil responses (see also Supplementary Figure 1, tone trials). Note that we only investigated this relationship for the tone trials, because the frequencies of the M1 and M2 mappings only changed in both phases of the experiment when an auditory cue was present. Next, we performed a Spearman correlation between the frequency difference in accuracy and the frequency difference in the target-locked pupil responses across 16 trial bins (12-13 trials per bin) separately for each participant. An example participant can be seen in Supplementary Figure 2A. To help correct for skewness, the correlation coefficients were converted using a Fisher z-transformation for statistical inference (Myers & Sirois, 2006). At the group level, the resulting z-transformed correlation coefficients were tested against zero with a Bayesian one-sample t-test.

Supplementary Figure 2. Monotonic relationship between behavioural and target-locked pupil responses. (A) An example of one participant’s data is shown for the target-locked pupil responses (sub-1). The relationship between the frequency difference in accuracy and target-locked pupil responses across 16 trial bins (12-13 trials per bin). (B) The distribution of the Spearman correlation coefficients (Fisher z-transformed) at the group level for the target-locked pupil response. (C) An example of one participant’s data is shown for the pre-target baseline pupil dilation (sub-1). (D) The distribution of the Spearman correlation coefficients (Fisher z-transformed) at the group level for the pre-target baseline pupil dilation. Bayesian one-sample t-tests (against zero) were performed to evaluate the significance of the correlations.

Finally, we repeated the above analysis with the pre-target baseline pupil dilation in place of the target-locked pupil response to test whether the result was general for the pupil or specific to the target-locked pupil response (Gilzenrat et al., 2010; Joshi & Gold, 2020; Larsen & Waters, 2018; Murphy et al., 2011; Murphy, O’Connell, et al., 2014; Murphy, Vandekerckhove, et al., 2014; Vincent et al., 2019). An example of the same participant can be seen in Supplementary Figure 2C.

At the group level, we obtained a Bayes factor of 9650 that suggests there was more evidence for the alternative hypothesis than for the null hypothesis of no correlation (Supplementary Figure 2B). An average negative correlation (M = -0.35, SD = 0.321) indicated that when participants had a larger difference between frequency conditions in accuracy, they also tended to have a smaller difference between frequency conditions in the target-locked pupil responses.

Furthermore, we confirmed that the negative scaling of the frequency effect in pupil dilation and accuracy was specific for the target-locked pupil responses (compare Supplementary Figure 2B with 2D). The results indicated that there is only anecdotal evidence to suggest that the difference between frequency conditions in the pre-target baseline pupil correlated (M = 0.16, SD = 0.32) with the frequency effect in behaviour (BF10 = 5). Finally, we confirmed that the two correlations of behaviour with i) the target-locked pupil response and ii) pre-target baseline pupil dilation differed at the group level (BF10 = 2934 in favour of the alternative hypothesis).”
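
The per-participant step of this analysis (a Spearman correlation across 16 trial bins followed by a Fisher z-transformation) can be sketched as below. This is a minimal standard-library illustration, not the authors' analysis code: the helper names and the example bin values are hypothetical, and the group-level Bayesian one-sample t-test is not reproduced.

```python
from math import atanh, sqrt
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation (undefined if either input is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


def fisher_z(r):
    """Fisher z-transformation; only defined for -1 < r < 1."""
    return atanh(r)


# Hypothetical frequency-effect values for one participant across 16 trial bins.
accuracy_diff = [0.05, 0.10, 0.08, 0.20, 0.25, 0.22, 0.30, 0.35,
                 0.33, 0.40, 0.45, 0.42, 0.50, 0.55, 0.52, 0.60]
pupil_diff = [2.0, 1.9, 1.7, 1.8, 1.5, 1.6, 1.3, 1.4,
              1.1, 1.2, 0.9, 1.0, 0.7, 0.8, 0.5, 0.6]

rho = spearman(accuracy_diff, pupil_diff)
print(f"rho = {rho:.3f}, Fisher z = {fisher_z(rho):.3f}")
```

Repeating this for every participant yields one z value per participant, which is the quantity tested against zero at the group level.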

In the Discussion, we have added the following information (p. 27):

“Our results also revealed that the gain parameter, G, which indicates the direction of the � parameter, was negative in the first phase as expected and positive in the second phase contrary to our expectations. This suggests that pupil dilation, for the high-frequency condition, was on average decreasing in the first phase and increasing in the second phase. These results are unexpected if the target-locked pupil dilation reflects novel information gain (see Figure 1C). The gain in information following the outcome of an event should decrease as a result of increasing accuracy for predicting the contingent relationships (Clark, 2015; FitzGerald et al., 2015; Friston et al., 2016; Smith et al., 2020; Zénon, 2019). When trials were binned across both experimental phases, the target-locked pupil responses mirrored the participants’ accuracy such that when participants had a larger difference between cue-target frequency conditions in accuracy, they also tended to have a smaller difference between cue-target frequency conditions in the target-locked pupil responses (see Supplementary Figure 2). These results are generally in line with the assumption that the presentation of the target stimulus became less informative as the participants learned to predict the cue-target contingencies.”

2.2) The nature of this relationship is, I think, open. For example, it could reflect the confidence or precision about a prediction. In this case, the evoked responses to correct and incorrect targets should be the same. Alternatively, pupillary responses could reflect an update to the predictions of predictability (i.e., precision). In this case, the interesting differences will emerge in terms of the difference between correct and incorrect target stimuli.

Reply: In line with the reviewer’s helpful suggestions, we have added an analysis to the Supplementary Materials testing for a possible interaction between accuracy and cue-target frequency conditions, which would provide more information regarding the nature of the pupil signal.

We have added the following to the Results (p. 17):

“Finally, we explored whether the target-locked pupil response, on average, differentiated between the difference in the error and correct responses for each of the frequency conditions and experimental phases. The target-locked pupil response did show sensitivity to both the predictive accuracy and cue-target frequency, but these factors did not interact (see Supplementary Figure 3 and Supplementary Table 1).”

We have included the following section in the Supplementary Materials:

“Accuracy as a factor of interest for the target-locked pupil response

The target-locked pupil dilation might reflect the difference in the frequency of the cue-target mapping conditions, but it also may reflect the direction of updating of current beliefs following novel sensory evidence indicating whether the outcome is better or worse than expected. For instance, a correct response on a low frequency (20%) trial may reflect a wrong button press on the part of the participant but may elicit a substantial amount of information gained due to the unlikely outcome.

We aimed to test this in a 4-way interaction between the factors: accuracy (error vs. correct), cue-target mapping (M1 vs. M2), auditory cue (tone vs. no tone), and phase (first vs. second). We expected that the size of the two-way interaction term defined by the accuracy and cue-target mapping factors should change over time in accordance with the auditory-cue rules in the first and second phase.

We did not proceed with the above analysis due to too many missing cases across the 16 conditions determined by the 4-way interaction (N = 9 remaining in total). These missing cases were due to the rare occurrence of certain conditions, such as correct and low frequency trials. Therefore, we could not make any inference on the potential dynamics of this 4-way interaction in the current task design.

To increase statistical power, we collapsed across the cue-target mapping and auditory tone conditions and explored potential interactions between accuracy with the cue-target frequency and experimental phase (N = 22 remaining in total). We explored whether the target-locked pupil response would, on average, differentiate between the direction of updating current beliefs following the cue-target frequency conditions (i.e., a difference in the error and correct responses for each of the frequency conditions) and whether the interaction between accuracy and cue-target frequency would differ between the experimental phases. We performed a 3-way repeated measures ANOVA on the factors: accuracy (error vs. correct), cue-target frequency (80% vs. 20%), and experimental phase (first vs. second). The data are shown in Supplementary Figure 5.

Supplementary Figure 5. The target-locked pupil response as function of accuracy, cue-target frequency, and experimental phase. Results of the 3-way repeated measures ANOVAs are given in Supplementary Table 2. Error bars, s.e.m. (N = 22).

The results of the 3-way ANOVA are given in Supplementary Table 1. Main effects of both accuracy and frequency were obtained, but these factors did not interact. As expected, errors elicited larger target-locked pupil responses as compared with correct trials (M = 0.94, SE = 0.31; see also Figure 3B for the time course of the responses), and low-frequency trials elicited larger pupil responses as compared with the high-frequency trials (M = 0.82, SE = 0.28). We note that the absence of an interaction effect may be partly due to the cue-target contingencies reversing on the tone trials (i.e., 80% -> 20% and 20% -> 80%) in the second phase of the experiment. In other words, this “flip” in the direction of expectancy of the tone trials between the two phases of the experiment may be adding noise to the averaged signal. In line with this, we found an interaction between frequency and phase, indicating that the pupil responses in the low-frequency condition were larger compared with the high-frequency condition in the first phase of the experiment (t(21) = 3.57, p = 0.005; M = 1.39, SE = 0.39), but not in the second phase (t(21) = 0.67, p = 0.507; M = 0.26, SE = 0.39).”

Supplementary Table 1. Results of the 3-way repeated measures ANOVAs on accuracy, frequency, and phase in the target-locked pupil response. Factors of interest were accuracy (levels: error vs. correct), cue-target frequency (levels: 80% vs. 20%), and experimental phase (first vs. second). Pupil data were in percent signal change units. *p < .05, **p < .01, ***p < .001

Pupil response

| Effect | F(1,21) | p | η²G |
|---|---|---|---|
| Accuracy | 9.35 | < .006** | 0.04 |
| Frequency | 8.52 | 0.008** | 0.03 |
| Phase | 0.17 | 0.681 | < .01 |
| Accuracy * Frequency | 0.75 | 0.396 | < .01 |
| Accuracy * Phase | 1.80 | 0.194 | < .01 |
| Frequency * Phase | 4.47 | 0.047* | 0.02 |
| Accuracy * Frequency * Phase | 0.23 | 0.640 | < .01 |

3) In terms of your experimental design, I think you need to be more careful in distinguishing your design from a simple reversal learning paradigm. I would recommend something along the following lines:

Reply: We thank the reviewer for the suggestion on how to clarify the description of our experimental paradigm. According to the reviewer’s suggestions, we have elaborated on the two experimental phases in the following places, p. 8:

“To disambiguate structure learning from parameter learning, it was necessary that our experimental paradigm after a phase of gradual learning induced an “aha” moment when participants suddenly realized a novel contingency. This requires a paradigm that goes beyond conventional reversal learning (i.e., where contingencies simply change and the parameters encoding those contingencies are updated via parameter learning).

We devised a two-phase experimental paradigm during which participants first learned a simple model of cue-target mappings. In the second phase, we introduced a structural change in the cue-target mapping by adding a conditional dependency: Whereas in the first phase there was no interaction between the predictive validity of visual and auditory cues, in the second phase the predictive validity of visual cues depended on the presence of auditory cues, as the visual cue-target mappings were reversed for trials in which the auditory cue was present.”

4) I think at this stage, you have to think carefully about your hypotheses. Generally speaking, to look at model updating (i.e., Bayesian model selection or structure learning) one has to have a rather delicate paradigm that elicits aha moments. In other words, a sudden switch associated with the act of selecting one model over another – that is revealed by an abrupt change in inference and subsequent task performance. I do not think you have got this in your paradigm. In other words, there will be a degree of model updating in both the updating and revision phases. It may be that the simple model allows for a shorter latency of model updating, while the context sensitive (update phase) model has a more protracted update. One could address this by assuming that each subject commits to a selected model at the point of model updating and estimate the most likely time point of this updating. The idea here would be that for the revision phase, most subjects discover or select their model early in the trials; while for the update phase, some subjects find the model more quickly while it takes other subjects much longer. This might be an interesting way of using your intersubject variability.
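
One simple way to make this suggestion concrete is a single change-point estimate per subject. The sketch below is an assumption-laden illustration, not an analysis reported in the manuscript or specified by the reviewer: it treats each trial as correct (1) or incorrect (0), assumes accuracy is constant before and after one switch, and picks the switch trial that maximises the likelihood. The function names and the example sequence are hypothetical.

```python
from math import log


def bernoulli_log_lik(hits: int, n: int) -> float:
    """Maximised log-likelihood of n Bernoulli trials containing `hits` successes."""
    if n == 0 or hits == 0 or hits == n:
        return 0.0  # estimated p is 0 or 1, so every observed trial has probability 1
    p = hits / n
    return hits * log(p) + (n - hits) * log(1.0 - p)


def most_likely_switch(correct):
    """Index k such that a step change in accuracy between trials k-1 and k is most likely."""
    n = len(correct)
    total_hits = sum(correct)
    best_index, best_log_lik = None, float("-inf")
    hits_before = 0
    for k in range(1, n):
        hits_before += correct[k - 1]
        log_lik = (bernoulli_log_lik(hits_before, k)
                   + bernoulli_log_lik(total_hits - hits_before, n - k))
        if log_lik > best_log_lik:
            best_index, best_log_lik = k, log_lik
    return best_index


# Hypothetical subject: mostly wrong for 8 trials, then consistently correct.
example = [0, 0, 1, 0, 0, 0, 1, 0] + [1] * 12
print("estimated switch at trial index", most_likely_switch(example))
```

Applied per subject and per phase, the distribution of estimated switch indices would give the kind of intersubject latency measure described above.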

Reply: We thank the reviewer for their suggestions on how to reformulate the hypotheses.

In light of the results and the reviewer’s comments, we acknowledge that our empirical paradigm could have been designed better to elicit a stronger “aha moment”. However, we still believe that our paradigm captures some aspects of this cognitive process. Our results show that there is a quantitative difference between the two experimental phases, albeit in the opposite direction to that expected. Importantly, the quantitative differences in the learning dynamics suggest that two different cognitive processes ensued in the two experimental phases. We explain these differences by suggesting that multiple models were constructed in the first experimental phase and that, upon realising the rule, participants settled on one model, temporarily discarding the others. In the second phase, participants chose among the models they had built in the first phase, and when they realised the new rules, they chose the model that corresponded to the new rules. Whilst our paradigm and the results cannot provide definitive answers, the results are at least in accordance with the suggested interpretation, and we believe that the latter offers a good starting point for future experimental work.

Please see the paragraph on p. 25 for a full discussion:

“One possible interpretation of these results is that our experimental manipulation induced structure learning in the first phase and parameter learning in the second phase. It might have been that in the first phase participants built, from scratch, multiple internal models that they thought could capture the structure of the task. This idea is reminiscent of Pouncy and Gershman’s work (Pouncy & Gershman, 2022), where participants are considering several models or competing theories at each point in time. As participants were learning our task, they were alternating between these models and upon the realisation of the rule participants settled for the correct model, resulting in a rapid increase in predictive accuracy and a decrease in the target-locked pupil responses as the data in the first phase shows. We assumed we prevented participants from learning models from scratch in the first phase by providing them with detailed instructions and pictorial representation of the stimuli presented in the task, before the task started. By that, we thought, we equipped participants with a crude model that would contain hypotheses about all the relevant variables of the task. However, in light of the current results we believe that our instructions did not result in the construction of a simple model that participants could use as a baseline upon entering the task. Importantly, whilst the above interpretation of the results is in principle plausible, further empirical investigation needs to be conducted to confirm that participants were indeed building multiple models in the first phase and then in the second phase selected a model (they had already constructed in the first phase) and started updating the parameters of that model.”

For the reasons explained above, we believe that it is justified to frame our hypotheses in terms of the distinction between structure learning and parameter learning. Nevertheless, we agree with the reviewer that we should first tie down the dynamics of structure learning and parameter learning behaviourally and only then capture the dynamics in pupil responses. In incorporating this insight into the hypotheses section, we additionally tested the nature of the relationship we expected between the behavioural and the pupil data. See our response to your question 2.1) for the changes to the manuscript. Lastly, in the rewritten hypotheses section we refrain from claiming that pupil dilation signals this or that, but rather suggest (in the introduction) that pupil dilation may be an appropriate physiological correlate of information gain. See our response to your question 2) for the changes to the manuscript.

5) Finally, I think you need to be clearer about the experimental design. There were too many factors and changes for the reader to make sense of. For example, I did not understand whether Mappings 1 and 2 referred to the precision (i.e., 80% versus 20%) or to the mapping per se (i.e., square means left). Crucially, it was not clear what was reversed and what was not reversed. I think the simplest thing to do would be to have a figure in which you draw the mappings for the two phases of the paradigm separately. The maps should connect the cue to the targets with the little arrows. The precision of these mappings can then be indicated with 80% or 20% beside the arrows. This should also resolve confusion about your counterbalancing.

Reply: We thank the reviewer for these useful suggestions for improving the experimental design description. According to your suggestions, we have made the following changes to Figure 1, p. 9:

Figure 1. Experimental design and hypotheses (A) Trial structure of the behavioural task. Participants performed a 2AFC task on the expected orientation (left/right) of upcoming Gabor patches while pupil dilation was recorded. Each trial consisted of a fixation period, a cue period, a response window followed by a delay period, and finally a target period. The decision interval ranged from onset of the cue to the participant’s response. The target interval ranged from target onset into the subsequent inter-trial interval (3 s). The target served as feedback on the accuracy of participants’ predictions in the decision interval. (B) An illustration of one of the two counterbalanced cue-target mappings. The participants had to learn cue-target contingencies to accurately predict the orientation of the upcoming Gabor patch (target). Mapping 1 was defined as the visual cue-target pairs that occurred in 80% of trials in the first phase; Mapping 2 was defined as the visual cue-target pairs that occurred in 20% of trials in the first phase. Mappings were counterbalanced between participants (i.e., half of the participants received the square -> left, diamond -> right mapping in the 80% condition in phase 1). At the start of the second phase, the frequencies (80% vs. 20%) of the cue-target mappings were reversed for trials containing the auditory tone cue only. (C) Main hypotheses for the dynamics of accuracy and information gain following the target presentation over the course of the 2AFC task. The first 200 trials of the task represent the first phase in which a gradual increase in accuracy and a gradual decrease in the absolute value of information gain were expected (represented by exponential curves). Within the second phase (the last 200 trials), an abrupt increase in accuracy and an abrupt decrease in information gain were expected (represented by sigmoidal curves).
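The two curve shapes named in panel (C) can be written out explicitly. The forms below are one conventional parameterisation, given only as an illustration: the symbols $a_0$ (initial accuracy), $a_\infty$ (asymptotic accuracy), $\tau$ (time constant), $k$ (steepness) and $t_0$ (trial of the abrupt change) are assumptions introduced here and are not taken from the manuscript.

```latex
% Gradual (exponential) learning, expected in the first phase:
a_{\mathrm{exp}}(t) = a_\infty - (a_\infty - a_0)\, e^{-t/\tau}

% Abrupt (sigmoidal) learning, expected in the second phase:
a_{\mathrm{sig}}(t) = a_0 + \frac{a_\infty - a_0}{1 + e^{-k\,(t - t_0)}}
```

Here $t$ indexes trials within a phase. A large $k$ makes the sigmoid approach a step at $t_0$, which corresponds to the "aha" moment, whereas the exponential rises fastest at the start of the phase and flattens smoothly. The hypothesised decrease in information gain would follow the same two shapes with the sign of the change reversed.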

We have expanded on the task description in the Methods (p. 8):

“The design contained probabilistic cue-target mappings to introduce uncertainty in the predictions, simulating uncertainty that is inherent to perception in the real world. The visual and auditory cues predicted whether the target Gabor patch was tilted to the right or to the left with either an 80% or 20% probability (Fig. 1B). Note that we define mapping 1 (M1) to correspond to the 80% visual cue-target pairs with respect to the first phase and mapping 2 (M2) to correspond to the 20% visual cue-target pairs with respect to the first phase. Cue-target mappings were counterbalanced between participants such that half of the participants saw the square followed by a right-oriented Gabor patch and a diamond followed by a left-oriented grating in 80% of the trials, and, for the other half of the participants, this mapping was reversed (i.e., square –> left and diamond –> right in the 80% condition). In the remaining 20% of the trials, the participants received the reversed cue-target mapping with respect to their 80% mapping condition.”
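The mapping logic described above can be made concrete with a small trial generator. This is a minimal sketch under stated assumptions rather than the authors' stimulus code: the function name `make_trials`, the proportion of tone trials (`p_tone`), and the assumption that the tone has no effect in the first phase are illustrative choices; only the 80%/20% frequencies, the counterbalancing, the 200 trials per phase, and the reversal for tone trials in the second phase come from the text.

```python
import random


def make_trials(n_per_phase=200, p_valid=0.8, p_tone=0.5,
                square_means="right", seed=0):
    """Generate a two-phase trial list.

    square_means: counterbalancing condition; the target that follows the
        square under Mapping 1 (the 80% mapping in phase 1).
    p_tone: hypothetical proportion of trials with the auditory cue.
    """
    rng = random.Random(seed)
    flip = {"left": "right", "right": "left"}
    mapping1 = {"square": square_means, "diamond": flip[square_means]}
    trials = []
    for phase in (1, 2):
        for _ in range(n_per_phase):
            cue = rng.choice(["square", "diamond"])
            tone = rng.random() < p_tone
            # Probability that the target follows Mapping 1 on this trial.
            p_m1 = p_valid
            if phase == 2 and tone:
                # Phase 2: frequencies are reversed on tone trials only.
                p_m1 = 1.0 - p_valid
            follows_m1 = rng.random() < p_m1
            target = mapping1[cue] if follows_m1 else flip[mapping1[cue]]
            trials.append({"phase": phase, "cue": cue, "tone": tone,
                           "target": target, "follows_m1": follows_m1})
    return trials


if __name__ == "__main__":
    trials = make_trials()
    for phase in (1, 2):
        for tone in (False, True):
            subset = [t for t in trials
                      if t["phase"] == phase and t["tone"] == tone]
            share = sum(t["follows_m1"] for t in subset) / len(subset)
            print(phase, tone, round(share, 2))
```

Calling `make_trials(square_means="left")` gives the other counterbalancing condition. In either condition every participant encounters both mappings, one as the frequent and one as the infrequent case.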

For example, when you said that Mappings 1 and 2 were counterbalanced over subjects, does this mean that certain subjects never experienced one of the two Mappings?

Reply: Yes, this is partially true. At the start of the experiment, subjects were assigned to one of the two mapping conditions. Therefore, each subject always experienced either Mapping 1 or Mapping 2 as the 80% frequency condition with respect to the first phase. However, it is only partially true, because each participant also saw the “other” mapping condition as the 20% frequency condition (with respect to the first phase) due to the nature of balancing the 2 x 2 stimuli. We hope to have clarified this in the above-mentioned changes.

I hope that these suggestions help should any revision be required.

Reply: Yes, indeed, your useful and constructive suggestions have substantially helped us improve our manuscript.