PLOS Computational Biology. 2022 Jun 22;18(6):e1010182. doi: 10.1371/journal.pcbi.1010182

Tracking the contribution of inductive bias to individualised internal models

Balázs Török 1,2,3, David G Nagy 1,4, Mariann Kiss 2,3, Karolina Janacsek 5,6,‡, Dezső Németh 3,5,7,‡, Gergő Orbán 1,‡,*
Editor: Lusha Zhu
PMCID: PMC9255757  PMID: 35731822

Abstract

Internal models capture the regularities of the environment and are central to understanding how humans adapt to environmental statistics. In general, the correct internal model is unknown to observers; instead, they rely on an approximate model that is continually adapted throughout learning. However, experimenters assume an ideal observer model, which captures stimulus structure but ignores the diverging hypotheses that humans form during learning. We combine non-parametric Bayesian methods and probabilistic programming to infer rich and dynamic individualised internal models from response times. We demonstrate that the approach is capable of characterizing the discrepancy between the internal model maintained by individuals and the ideal observer model, and of tracking the evolution of the contribution of the ideal observer model to the internal model throughout training. In particular, in an implicit visuomotor sequence learning task the identified discrepancy revealed an inductive bias that was consistent across individuals but varied in strength and persistence.

Author summary

Instead of mapping stimuli directly to responses, humans and other complex organisms are thought to maintain internal models of the environment. These internal models represent the parts of the environment that are most relevant for deciding how to act in a given situation and are therefore key to explaining human behaviour. In behavioural experiments it is often assumed that the internal model in the subject’s brain matches the true model that governs the experiment. However, this assumption can be violated for a variety of reasons, such as insufficient training. Furthermore, the deviation of the internal model from the true model is not uniform across individuals, and therefore it summarizes the subjective beliefs of individuals. In this paper, we provide a method to reverse engineer the internal model of individual subjects by analysing trial-by-trial behavioural measurements such as reaction times. We then track and analyse these reverse-engineered models over the course of the experiment to see how participants trade off an early inductive bias towards Markovian dynamics against a model that reflects the evidence they accumulate about the actual statistics of the stimuli.

Introduction

Building internal models is key to acting efficiently in the environment [1–3]. Consider for example observing the surface of a swift river: understanding how ripples and intermittent smooth patches are shaped by underwater rocks, and understanding the strength required for pulling the paddle to propel a raft in the desired direction, helps to plan the route of our raft downstream. Internal models represent expectations of what is going to happen next, how objects and other people can be expected to behave (often termed intuitive physics and intuitive psychology), what the state of unobserved parts of the environment is and, consequently, what actions lead to desired outcomes.

An ideal observer maintains an internal model that perfectly reflects the properties of the environment and our observations. Assuming that humans maintain an ideal observer model has been instrumental in understanding behavior in a wide array of situations [4–8]. However, limited experience with rafting and uncertainty about riverbed geometry introduce deviations between the ideal observer model and the internal model actually maintained by individuals. Indeed, deviations from the true model of the environment were key to accurately predicting human judgements when they interacted with physical constructs [9]. Identifying potential deviations can be crucial since assuming an ideal observer model instead of the actual internal model can result in misinterpretation of the computations underlying human decisions [10]. Extensive experience with the environment contributes to closing the gap between the ideal observer model and the internal model, but individual differences can persist due to variance in prior experience, learning strategies and a range of other factors [11–14]. Consequently, accurate prediction of behavior, especially in early stages of learning, is only possible if we can retrieve the actual subjective internal models.

Potential sources of the deviation between the ideal observer model and the maintained internal model have recently been the subject of intense research [15, 16]. Studies have demonstrated that learning novel and complex statistics can lead to systematic deviations from the ground truth model [17, 18]. Mismatch between the predictions of an ideal observer model and human behaviour has been shown to be a consequence of computations relying on an internal model that deviates from the ground truth rather than of sub-optimal computations [10, 19, 20]. Insights into the reasons for such deviations come from theoretical considerations. In general, perfect knowledge of the ideal observer model can be challenged by the high complexity of the task and stimulus statistics or by the insufficiency of available information early during learning [21–24]. From a theoretical perspective, learning can be more efficient if observers not only rely on observations but recruit earlier knowledge as well. For instance, previous experience with sea kayaking can provide skills for dealing with surface features such as whirlpools or rapids, despite the fact that more regular and larger-amplitude waves are characteristic of the sea. Relying on earlier knowledge can be phrased as an inductive bias since this might help the interpretation of the current stimulus but at the expense of potentially introducing distortions [25]. In summary, characterising inductive biases is key to understanding how the actual internal model maintained by humans is related to the ideal observer model.

To identify internal models, a method is required that can perform efficient inference over a flexible class of possible internal models from behavior. Recent years have seen a number of studies where behavior was used to infer complex internal models [26–28]. These studies investigated internal models adapted to natural-like stimuli, in which case it was not feasible to identify the ideal observer. We seek to investigate a scenario where the internal model is complex but the ideal observer model is well defined. Importantly, unlike [28], we aim to develop a tool that can efficiently infer subjective internal models, such that individual differences in learning curves can identify the evolution of the internal model as the participant learns about unfamiliar stimulus statistics. For this, we need (i) a highly expressive class of internal models and (ii) behavioural measurements that are highly informative about the internal model. We proceed by choosing an experimental paradigm that satisfies (ii). In an experiment where trials are governed by temporal dynamics and therefore individual trials are not independent, the sequence of behavioural measurements has information content that far exceeds that of an independent and identically distributed (i.i.d.) experimental setting. It has been extensively documented that participants do pick up temporal regularities in experiments with stochastic dynamics [29–31]. Furthermore, individuals show high variation in their initial assumptions [29, 32]. Relying on a paradigm which features inter-trial dependencies unknown to participants, we aim to reverse-engineer the newly formed dynamical internal models of individuals. In order to satisfy (i), we propose to use infinite Hidden Markov Models (iHMMs, [33]; for a brief introduction please read S1 Appendix). To infer the structure and dynamics of the iHMM we adopt and extend the Cognitive Tomography (CT) framework [27]. The proposed Cognitive Tomography model combines iHMMs and the linear approach to threshold with ergodic rate (LATER) model [34] to relate subjective probabilities of individuals to response time measurements on a trial-by-trial basis.

In this paper we set out to infer individualised dynamical internal models from response time data using the Cognitive Tomography principle. We use an implicit sequence learning paradigm in which the stimulus sequence is characterised by challenging statistics novel to participants.

We take a data-driven approach where the structure of the internal model is discovered through modelling the subtle statistical structure present in response times. We track the evolution of the internal model over multiple days and thus obtain individual learning curves that provide unique insight into the way internal models are acquired by learning. After introducing the CT framework, we validate that the model structure inferred by CT corresponds to the internal model of individuals by testing the generalization capability of the inferred model across tasks and stimulus statistics. After validating CT we use it to gain insights into learning by assessing how the inferred model relates to a stimulus-statistics-driven component, the ideal observer model. We track the contribution of the ideal observer model to the internal model by assessing the amount of variance in response time explained by the ideal observer model relative to the internal model inferred by CT. The residual variance in the CT predictions not explained by the ideal observer is identified with the inductive bias that humans use when learning the task. We attempt to break down the variance in response times into two independent components: the ideal observer model and an inductive bias. We show that the internal model inferred through CT can be reliably broken down into the contributions of the ideal observer model and a simple dynamical model, the so-called Markov model. While the contribution of the Markov model varies across participants, it can consistently account for the dominant portion of the residual variance across all participants. Finally, by tracking the evolution of the contributions of the two models we show how the two models are traded off during learning. While learning strategies and the efficiency of learning vary considerably across individuals, a consistent trend can be identified over days, in which the initial dominance of the Markov model is gradually taken over by the ideal observer, indicating that the Markov model is a general inductive bias for learning the temporal structure of the stimulus. Taken together, our findings demonstrate that complex internal models can be inferred from response time measurements. Furthermore, our results suggest a new perspective on how humans trade off inductive biases and evidence over the course of learning, and also provide new tools to measure such inductive biases.

Results

In order to test how behavioural data from individuals can be used to infer a dynamical probabilistic latent variable internal model and to assess the contribution of inductive biases to the internal model, we used an experimental paradigm that fulfils a number of key desiderata. First, the paradigm relies on across-trial dependencies; second, as in everyday tasks, the state of the environment cannot be unambiguously determined from the observation of momentary stimuli; third, the structure of the task is new to participants; fourth, the complexity of the task is relatively high, i.e. an a priori unknown number of latent states determine the observations; fifth, behavioural measurements during task execution are continuous, which ensures that rich inferences can be made. In the alternating serial response time task (ASRT, [35]) a stimulus can appear at four locations of a computer screen and the sequence of locations (untold to participants) follows a pre-specified structure (Fig 1A). In odd trials, the stimulus follows a 4-element sequence, while in even trials the stimulus appears at random at any of the positions with equal probability, independently of all other trials (Fig 1, Methods). Such stimuli preclude unambiguously determining the state of the task solely from a single trial’s observation. There are an additional 5 random trials at the beginning of each block. Participants are tasked to give fast and accurate manual responses through key presses corresponding to the locations of the stimuli. We collected response time measurements for sequences of stimuli organized into blocks of 85 trials. A session consisted of 25 blocks, and performance was tracked for 8 days with one session on each day, during which the same stimulus statistics governed the stimuli, followed by two additional sessions on later days where the statistics of stimuli were altered (sessions were spaced a week apart when possible; on occasion 2–3 day shifts were in place due to participant availability, S1 Fig).
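To make the block structure concrete, the following sketch generates one block of ASRT stimuli. The function name asrt_block, the default pattern (a stand-in for the participant-specific permutation) and the placement of the 5 warm-up trials are illustrative assumptions, not code from the study.

```python
import numpy as np

rng = np.random.default_rng(1)

def asrt_block(pattern=(0, 1, 2, 3), n_alternating=80, n_warmup=5):
    """One block of ASRT stimuli: 5 random warm-up trials followed by
    80 trials in which the deterministic 4-element pattern alternates
    with uniformly random trials (pattern on odd trials, 1-indexed)."""
    warmup = rng.integers(4, size=n_warmup)
    trials = []
    for t in range(n_alternating):
        if t % 2 == 0:                            # pattern trial
            trials.append(pattern[(t // 2) % 4])
        else:                                     # random trial
            trials.append(rng.integers(4))
    return np.concatenate([warmup, np.array(trials)])

block = asrt_block()
print(len(block))  # 85 trials per block
```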

Fig 1. Experimental paradigm and Cognitive Tomography (CT).

Fig 1

A Top: Behavioural responses: participants respond with key presses on a keyboard where stimulus identities (shown as different coloured squares) are associated with unique keys. Middle: An example deterministic pattern sequence, which recurrently occurs in the stimulus sequence of a particular participant. Different participants are presented with permutations of this four-element sequence. Bottom: In the actual stimulus sequence presented to participants, the deterministic pattern sequence is interleaved with random items (small squares). Random items can be any of the four stimuli and occur with equal probability (size of the square is proportional to the probability of a stimulus). Grey line indicates one particular realization of the stochastic sequence. B The probabilistic generative model underlying Cognitive Tomography. The generative model describes the process by which a stimulus sequence (top grey box) results in a behavioural response. A participant is assumed to use the internal model (top blue box) to make a prediction for the upcoming stimulus. The internal model assumes dynamics over the latent states. The current latent state is determined jointly by earlier states and the current observation. Based on the current latent state, a prediction can be made on the probability of possible upcoming stimuli. The predicted probability (size of squares corresponds to the probability of prediction) is related to the behaviour through a behavioral model (bottom blue box). The behavioral model depends on the task being performed and therefore on the type of response being predicted. Here, the logarithm of the predictive probability is mapped to a mean response time and actual response times are assumed to be noisy versions of this mean. Response times (bottom grey box) shown here are 400 trials from an example participant. Cognitive Tomography uses the stimulus sequence and the sequence of behavioural responses (grey boxes) to infer the components of CT, the internal model and the behavioral model (blue boxes).

We used the response times of individuals to infer a dynamical probabilistic latent variable model underlying their behaviour (Fig 1, Methods). We invoked the concept of CT to infer the internal model from a limited amount of data. CT requires the formulation of the generative model of the data, i.e. the process that produces behavioral data from observations. CT distinguishes two components of the model (Fig 1B, blue boxes): the internal model, which summarizes an individual’s knowledge about the stimulus statistics, and the behavioral model, which describes how behavioral responses are related to the internal model during the task being performed. Inference of the internal model requires inferring how latent states evolve and how these determine the stimuli. Knowing the dynamics of latent states, we can make predictions for the upcoming stimuli by establishing the subjective probability of possible subsequent elements of the stimuli. The behavioral model establishes how subjective probabilities of the internal model are related to the behavioral outcome, which is the response time in our case. We used the LATER model to predict response times from subjective probabilities [34, 36]. The experimenter uses the observed data (Fig 1B, grey boxes), the stimulus sequence and response times, for the inference. The resulting CT model (S2 Fig) is implemented as a probabilistic program with components implemented in Stan [37].
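As a sketch of the behavioral model component, the snippet below simulates LATER-style response times in which the mean rate of rise of the decision signal grows with the log predictive probability of the stimulus, so expected stimuli are answered faster. All parameter values (mu0, beta, sigma, threshold) are illustrative assumptions, not fitted quantities from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_rt(p_stimulus, mu0=2.0, beta=0.3, sigma=0.4, threshold=1.0, n=1):
    """Sample response times from a LATER-style model: the rate of rise
    is Gaussian with a mean linear in the log subjective probability,
    and the response is emitted when the signal reaches the threshold."""
    rate = rng.normal(mu0 + beta * np.log(p_stimulus), sigma, size=n)
    rate = rate[rate > 0]            # keep trials where the signal rises
    return threshold / rate          # RT is the reciprocal of a Gaussian rate

# an expected stimulus (p = 0.9) yields faster responses than a surprising one (p = 0.1)
print(sample_rt(0.9, n=10000).mean(), sample_rt(0.1, n=10000).mean())
```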

The iHMM model provides a flexible model class to infer latent variable models [33]. Similar to the classical Hidden Markov Model, learning entails the specification of transition probabilities between latent states along with the probability distributions of observations given a particular latent state (Fig 2A). Additional flexibility of iHMM is provided by not fixing the number of latent states but inferring this from data. This is implemented as a non-parametric Bayesian model (for a brief introduction into iHMM see S1 Appendix). In an iHMM, participants filter the information gained from the observations over time to estimate the possible latent state of the system (Fig 2Ba, filled purple circles). That is, they infer what history of events could best explain the sequence of their stochastic observations. Then, they use their dynamical model to play the latent state forward (Fig 2Ba, open purple circles) and predict the next stimulus (Fig 2Bb).
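A minimal sketch of this filter-and-predict loop for a fixed HMM is given below; inferring the iHMM structure itself is beyond this snippet, and the matrix conventions and function names are our own assumptions.

```python
import numpy as np

def filter_step(T, E, prior, obs):
    """One step of forward filtering: T[i, j] = p(state j | state i),
    E[i, k] = p(stimulus k | state i), `prior` is the predictive belief
    over the current latent state before seeing `obs`."""
    posterior = prior * E[:, obs]          # condition on the observation
    posterior /= posterior.sum()
    prior_next = posterior @ T             # play the dynamics forward
    return prior_next, prior_next @ E      # next belief and stimulus prediction

def predictive_probs(T, E, stimuli):
    """Subjective probability assigned to each stimulus just before it is
    observed; these are the quantities the response model maps to RTs."""
    belief = np.full(T.shape[0], 1.0 / T.shape[0])
    p_next = belief @ E
    probs = []
    for s in stimuli:
        probs.append(p_next[s])
        belief, p_next = filter_step(T, E, belief, s)
    return np.array(probs)
```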

Fig 2. Inference and predictions using the internal model.

Fig 2

A We formulate the internal model as an iHMM, where the number of latent states (grey circles), the transitions between the states (arrows), and the distribution of possible stimuli for any given state (coloured squares) need to be inferred by the experimenter. Width of arrows is proportional to transition probability and arrows are pruned if the transition probability is below a threshold; size of dots indicates the probability of self-transition. Size of stimuli is proportional to appearance probability in the given state. The result of inference is a distribution over possible model structures; the figure represents a single sample from such a distribution. B Evolving the internal model from trial t to trial t + 1. At time t, participants use the internal model components to update their beliefs over the current latent state (Ba, size of dark purple discs represents the posterior belief of the latent state based on the current observation, blue square). Then, participants play the model forward into the future (open purple circles). Finally, they generate predictions for the upcoming stimulus (Bb, squares in grey boxes) by summing over the possible future states (open purple circles in grey boxes). Participants use previous state beliefs and the new stimulus to update latent state beliefs. In this particular example, at trial t + 1 only one of the possible states can generate the observation, hence there is only one dark purple disc. Again, they play the dynamics forward and predict the next stimulus. C Predicted response times against actual response times are shown for individual trials for an example participant (dots). After training our inference algorithm on a training dataset of 10 blocks, we predict response times of another 10 blocks on the same day. Performance is measured as the trial-by-trial coefficient of determination between measured and predicted response times (R2, coloured label).

During the eight days of exposure to the ASRT task participants undergo learning, which leads to a substantial reorganization of the internal model. Learning can be present on short (within-day) or longer time scales. In our analysis we aimed at tracking the across-day changes in the internal model of individuals. The rationale behind this choice is twofold. First, while the non-parametric Bayesian approach is relatively data-thrifty, the flexibility of the model comes at the price of a large number of parameters (a transition matrix with N ⋅ (N + 1) parameters and an emission matrix with 4N parameters, where N is the number of latent states). As a result, changes in the internal model cannot be reliably captured from a few button presses. In order to have a cross-validated measure of model performance we use non-overlapping data sets for learning the model and testing it. This also imposes a limit on how finely we can track changes in the internal model. Consequently, while theoretically there will be changes on a smaller time scale (especially on day one of the exposure), for practical reasons, to ensure stable inference, we learn the model from the response times once per session. Second, our analysis showed that there are substantial changes in the internal model even days after first exposure, which suggests slower learning processes that can be reliably captured with across-day comparisons.

To test that the proposed inference algorithm is capable of retrieving the probabilistic model underlying response time sequences, we validated our inference algorithm on synthetic data (Methods, S3 Fig). We used three different model structures for validation, which were HMMs inferred from three stimulus sequences of different lengths (one sample from the iHMM inference in [33]). As with our human experimental data, we assessed CT by computing its predictive performance on synthetic response times. Further, since synthetic participants provide access to true subjective probabilities, we also calculated performance on the ground truth subjective probabilities. We showed that the subjective probabilities can be accurately recovered from response times. As shown in S3(D) Fig, standard deviations of participants’ response times are within the range of successful model recovery.

To infer an internal model from response times, we inferred the internal model along with the parameters of the response time model on 10 blocks of trials measured in the second half of the session. Individual differences in internal models were captured by inferring internal models for every participant separately. We inferred the internal model from a single set of 10 blocks, once per session. To check the validity of our response time model, we validated its basic assumptions. The response time model assumes that variance in response times comes from the joint effect of the variance in log predictive probabilities and an inverse Gaussian noise corrupting the subjective probabilities. If the fit of the internal and response models is appropriate, the residual variance, i.e. the variance not accounted for by the variance in the subjective probabilities predicted by the CT model, is expected to be inversely normally distributed. We checked this for the CT model on a subject-by-subject basis by contrasting the expected cumulative distribution of residuals with the measured cumulative distribution. This analysis demonstrated that residuals are close to a normal distribution (S4 Fig), with a single subject apparently having a bimodal residual distribution, potentially indicating additional structure in the internal model not captured by CT. Note that throughout the analysis trials with fast response times are discarded (see Methods for details).
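A per-subject version of this check might look like the sketch below, which contrasts the empirical CDF of residuals with a fitted normal CDF; taking residuals on the reciprocal-RT scale (where LATER-style noise is Gaussian) is our assumption for illustration.

```python
import numpy as np
from scipy import stats

def residual_cdf_discrepancy(rt_measured, rt_predicted):
    """Largest gap between the empirical CDF of residuals and a fitted
    normal CDF (a KS-style statistic); small values mean the residual
    distribution is close to the assumed form."""
    resid = 1.0 / rt_measured - 1.0 / rt_predicted   # reciprocal-RT scale
    x = np.sort(resid)
    empirical = np.arange(1, len(x) + 1) / len(x)
    expected = stats.norm.cdf(x, resid.mean(), resid.std())
    return np.max(np.abs(empirical - expected))
```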

Response times could be predicted by CT efficiently even for individual trials, as shown by the analysis of the response times of a single participant (R2(550) = 0.284, p < 0.001, Fig 2C). The predicted distribution of response times closely matched the empirical distribution of response times (for an example, see S5 Fig). It is important to note that the predictive power was substantially increased by averaging over trials in the same positions of the sequence (S6 Fig). Despite the significant advantage of trial-averaged predictions, we believe that single-trial predictions provide a more rigorous and important characterization of human behaviour; therefore, we evaluate model performances on an individual-trial basis in the rest of the paper.
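For reference, the trial-by-trial performance measure used throughout can be computed as in this sketch.

```python
import numpy as np

def r_squared(rt_measured, rt_predicted):
    """Coefficient of determination between measured and predicted RTs:
    1 is a perfect fit, 0 is no better than predicting the mean RT."""
    ss_res = np.sum((rt_measured - rt_predicted) ** 2)
    ss_tot = np.sum((rt_measured - np.mean(rt_measured)) ** 2)
    return 1.0 - ss_res / ss_tot
```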

Alternative models

Whether and how much the inferred internal model reflects the structure of the environment can be tested by contrasting the inferred CT model with the ideal observer model. Since we have full control over the generating process of the sequence of stimuli, the ideal observer model is identified with a generative model that has complete knowledge of the stimulus statistics, in which the only form of uncertainty afflicting inference stems from the ambiguity in the interpretation of observations rather than from uncertainty in model structure or parameters. Assessment of the deviation between the CT and ideal observer models can reveal the richness of the strategies pursued by humans when exposed to unfamiliar artificial stimulus statistics. Fixed parameters of the ideal observer also ensured that the changing task performance of humans could be directly compared to the same baseline across the course of learning. Importantly, ‘learning’ by participants during extended exposure throughout the experiment does not necessarily mean that their internal model gets gradually closer to the ideal observer model, since even as more evidence is provided towards the true underlying model one can commit more and more to a superstitious model. As a consequence, deviation can temporarily accumulate before converging towards the true model, resulting in nonlinear learning trajectories. The ideal observer model is the one that perfectly corresponds to the task structure. Importantly, this ideal observer model is part of the set of models that the iHMM can learn. This internal model corresponds to a graphical model with eight states, representing the four pattern and four random states in alternation, such that pattern states are characterized by a single possible stimulus and random states are characterized by equal-probability stimuli (Fig 3B). Thus, the ideal observer model bears strong similarities to the CT model but differs conceptually: the ideal observer model parameters are determined by stimulus statistics, while CT structure and parameters are determined by behavioral data.
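Following the description above (and Fig 3), the ideal observer can be written down directly as an 8-state HMM; the state ordering and function name below are our own conventions.

```python
import numpy as np

def ideal_observer(pattern=(0, 1, 2, 3)):
    """Transition and emission matrices of the ASRT ideal observer.
    States are ordered P1, R1, P2, R2, P3, R3, P4, R4: each pattern
    state emits its sequence element with probability 1 and hands over
    to a random state, which emits uniformly and hands over to the next
    pattern state. `pattern` stands in for the participant-specific
    permutation of the four locations."""
    T = np.zeros((8, 8))
    E = np.zeros((8, 4))
    for i in range(4):
        p, r = 2 * i, 2 * i + 1
        T[p, r] = 1.0                      # Pattern_i -> Random_i
        T[r, 2 * ((i + 1) % 4)] = 1.0      # Random_i -> Pattern_{i+1}
        E[p, pattern[i]] = 1.0             # deterministic pattern element
        E[r, :] = 0.25                     # random trial: uniform emission
    return T, E
```

Plugged into the filtering sketch above, this model yields the ideal observer’s trial-by-trial predictive probabilities.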

Fig 3. Alternative models.

Fig 3

A Table of models and the maximum likelihood parameter sets for the stimuli in our experiment. The ideal observer model (the true generative model of the stimuli) can be formalized as an 8-state HMM with states Pattern1, Random1, Pattern2, Random2, Pattern3, Random3, Pattern4, Random4, where the pattern states produce the corresponding sequence element with probability 1 and all the random states produce any of the four observations independently with equal probability. The Markov model (where predictions are produced by conditioning only on the previous observation) fits the observations best when it predicts all observations with equal probability, since the marginal probability of any one stimulus is equal regardless of what the previous observation was, because every other trial is random. The trigram model produces a “high triplet” prediction, where the next stimulus is the successor of the stimulus two trials ago in the pattern sequence (the current observation is either a random or a pattern element, each with 50% probability, with conditional probabilities of 100% or 25%, respectively). All alternatives have equal probability of 0.125. Note that the exact probabilities in this case are not relevant since the trials are categorized into two groups (high and low) and therefore the parameters of the response time model and these probabilities are underspecified. The CT model produces a prediction for the next stimulus via filtering. A latent state of the sequence is estimated from previous observations using a Hidden Markov Model. This flexible model space includes the ideal observer model as well as the Markov model as special cases. B Structure of the ideal observer model (top panel) and that of the Markov model (bottom panel). For the description of the graphical elements see Fig 2A.

A defining characteristic of the CT model is that it can model arbitrarily rich temporal dependence between subsequent observations by using latent states. A model that can only account for direct dependencies between consecutive observations is the Markov model, which lacks the capability to represent latent states. This model learns the transition probabilities directly between observations, a simple but feasible model that can account for a wide range of everyday observations. Importantly, Markov-like dynamics is a special case of the internal models that CT can represent.

Finally, the gold standard for characterizing learning in the ASRT task is the so-called triplet model, which tests the correlation of response times with the summary statistics of the stimulus sequence. We formulated this model as a trigram model. We include this model to compare its predictive performance with the alternative models. Since the triplet model reflects the summary statistics of the stimulus, it bears resemblance to the ideal observer model, albeit without assuming the ability to perform real inference in the model.
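The two simpler baselines can be sketched as follows; the add-one smoothing in the Markov predictor is our own choice, while the 0.625/0.125 trigram probabilities follow the maximum likelihood values listed in Fig 3A.

```python
import numpy as np

def markov_predict(counts, prev):
    """First-order Markov prediction from transition counts
    (counts[i, j] = how often stimulus j followed stimulus i),
    with add-one smoothing."""
    row = counts[prev] + 1.0
    return row / row.sum()

def trigram_predict(successor, two_back):
    """Trigram ('triplet') prediction: the successor, within the pattern
    sequence, of the stimulus seen two trials ago gets the high-triplet
    probability 0.625; the three alternatives get 0.125 each (Fig 3A)."""
    p = np.full(4, 0.125)
    p[successor[two_back]] = 0.625
    return p
```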

Direct comparison of alternative models is presented in Fig 3A (see also Methods).

Comparison of ideal observer and CT performance to predict trial-by-trial behavior

We tracked the predictive performance of the ideal observer model through the eight days of exposure to a fixed stimulus statistics in the ASRT task. The ideal observer is determined by the stimulus structure; therefore, capturing across-individual differences is limited to different nuisance parameters, which are not characteristic of the internal model. Participant-averaged predictive performance of the ideal observer was not significantly above zero on the first day of training (one-sided t(24) = −0.7692, p = 0.775, CI = [−0.0214, Inf], d = 0.154). Participant-averaged predictive performance increased steadily with the length of exposure, indicating that participants gradually acquire an internal model that accommodates the statistics of stimuli (Fig 4A).

Fig 4. Contrasting the ideal observer and CT performance in predicting trial-by-trial response times.

Fig 4

A, Performance of the two models in predicting response times on the eight days of exposure to the stimulus sequences governed by the same statistics. Performance is measured as the amount of variance in response times (R2) explained by the particular model. Dots represent mean performance, boxes represent the 25th and 75th percentiles of the performances across the population of 25 participants. B, Violin plot of the distribution of model performances across the participants on the eighth day of exposure. Grey dots indicate individual participants, lines connect model performances for the same participant. All data in the figure are cross-validated by fitting the model on a set of blocks late in the session and testing on non-overlapping earlier blocks.

Comparison of the internal model captured by CT to the ideal observer model reveals a consistent gap in predictive performance (Fig 4A). CT systematically outperformed the ideal observer on all eight days of exposure (r(198) = 0.923, p < 0.001), also demonstrating above-chance predictive performance on the first day. The advantage of the CT model over the ideal observer was very consistent across participants, as demonstrated by the participant-by-participant comparison of predictive performances on the eighth day of exposure (binomial test on MSE values 0.96, n = 25, p < 0.001, Fig 4B). In summary, while the ideal observer model demonstrates clear evidence that participants do gradually learn the stimulus statistics, CT reveals structure in responses that is not accounted for by the ideal observer model.

Validation of the internal model

To verify that the better predictive performance of the model identified by CT is not merely a consequence of a more flexible model but is a signature of inferring a model that better reflects the properties of the internal model maintained by the participants, we performed two additional analyses. The core principle of Cognitive Tomography is to distinguish an internal model that captures a task-independent understanding of the statistical structure of the environment from a response model which describes task-specific behavioural responses. The validity of this principle can be assessed by manipulating the internal and behavioral models independently. First, we tested if the same internal model can be used to predict behavioral performance in a different task, that is, to predict behavioral measures different from those that the model was trained on. CT was trained on response times when participants pressed the correct key, and here we replaced the task of predicting response times with the prediction of behavior in error trials, that is, in trials when the participant pressed an incorrect button. In particular, we aimed at predicting the trials in which a participant is likely to commit errors (because the subjective probability of the correct choice is relatively low) and also the erroneous response when an error occurs (the subjective probability of the choice relative to other potential choices). Second, we tested the usage of the internal model when stimulus statistics were manipulated. After completing eight days of training, participants were exposed to novel stimulus statistics and we tested if participants recruited the learned internal model only when the stimulus statistics matched the one the internal model had been learned on.

In error prediction we separated trials based on whether the participant pressed the key corresponding to the actual stimulus or any other key. Note that the internal models of CT were inferred only on correct trials using the response time model. We investigated two relevant hypotheses. First, a participant will more likely commit an error when their subjective probability of the stimulus is low. Second, when committing an error, their response will be biased towards their expectations. For reference, we contrasted the predictive performance of CT with the ideal observer model. We compared the rank of the subjective probability of the upcoming stimulus for both correct and incorrect trials (Fig 5A). CT ranked the upcoming stimulus highest in correct trials above chance (0.461, n = 18473, p < 0.001) and significantly below chance in incorrect trials (0.175, n = 2777, p < 0.001). The ideal observer model excelled at predicting the correct responses, as it ranked the correct responses high above chance (0.635, n = 18473, p < 0.001). However, it also assigned the highest probability to the upcoming stimulus in incorrect trials (0.315, n = 2777, p < 0.001). Ranking of incorrect responses was above chance for both models (Fig 5B).
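The rank statistics reported above correspond to the simple computation sketched here, with chance level 0.25 for four alternatives; variable names are ours.

```python
import numpy as np

def fraction_ranked_first(pred_probs, targets):
    """Fraction of trials in which the model assigns the highest
    predictive probability to the target (the upcoming stimulus, or
    the key the participant actually pressed)."""
    return np.mean(np.argmax(pred_probs, axis=1) == targets)
```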

Fig 5. Validation of the inferred internal model by selectively changing the task and the stimulus statistics.

Fig 5

A-D Choice predictions by CT (red) and the ideal observer model (green). Models are trained on response times for correct key presses on Day 8 and tested on both correct and error trials of the same day. A, Proportion of trials where the model ranked the upcoming stimulus first. For correct trials both models have a preference for the stimulus. For incorrect trials, the ideal observer model falsely predicts the stimulus in more than a quarter of the trials. B, Proportion of trials where the model ranked the button pressed by the participant first. For incorrect responses, both models display a preference towards the actually pressed key over alternatives. C, ROC curves for two example participants based on the subjective probabilities of upcoming stimuli (held-out dataset). Area under the ROC curve characterizes the performance of a particular model in predicting error trials. D, Area under ROC curve. Grey dots show individuals, bars show means. E, Investigating new internal models that emerge when new stimulus sequences are presented. Participant-averaged performance of predicting response times on Days 8–10 using CT-inferred models that were trained on Day 8 (filled red symbols) and Day 9 (open red symbols) on stimulus sequences governed by Day 8 or Day 9 statistics. On Day 9 a new stimulus sequence was introduced, therefore across-day prediction of response times corresponded to across-sequence prediction. Training of the models was performed on 10 blocks of trials starting from the 11th block and prediction was performed on the last five blocks of trials (the index of the blocks used in testing is indicated in brackets). On Day 10, the stimulus sequence was switched in 5-block segments between the sequences used during Day 8 and Day 9 (purple and grey bars indicate the identity of the stimulus sequence, with colours matching the bars used on Day 8 and Day 9). Error bars show 2 s.e.m. over participants. Stars denote p < 0.05 difference.

We obtained a participant-by-participant assessment of the difference between model performances in predicting error trials by calculating ROC curves of the models based on the subjective probabilities assigned to upcoming stimuli (Fig 5C and S7 Fig). The area between two ROC curves characterizes the performance difference between models, and CT is shown to consistently outperform the ideal observer model in distinguishing correct choices from incorrect choices (paired t-test on AUC values, one-sided t(24) = 6.185, p < 0.001, CI = [0.033, Inf], d = 1.1, Fig 5D). Thus, CT can perform across-task predictions and substantially outperforms the ideal observer model as well.
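The AUC underlying this comparison can be computed without explicit thresholding via its rank (Mann-Whitney) form, as in the sketch below; variable names are ours.

```python
import numpy as np

def auc_error_prediction(p_stimulus, is_error):
    """AUC for predicting error trials from the subjective probability
    of the upcoming stimulus: the probability that a randomly chosen
    error trial has a lower subjective probability than a randomly
    chosen correct trial (0.5 is chance)."""
    p_err = p_stimulus[is_error]
    p_cor = p_stimulus[~is_error]
    wins = (p_err[:, None] < p_cor[None, :]).mean()   # correctly ordered pairs
    ties = (p_err[:, None] == p_cor[None, :]).mean()
    return wins + 0.5 * ties
```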

We also tested whether participants use a single model to represent the sequence or are capable of holding multiple models and recruiting them appropriately [38–40]. In particular, when we changed the underlying pattern sequence in the task, we expected participants to start learning a new model instead of recalibrating the model used in the first eight days. There are two major pieces of evidence at hand. First, as expected, the internal model inferred on Day 8 does significantly worse in predicting behaviour when a new sequence is presented on Day 9 (one-sided t(24) = 4.958, p < 0.001, CI = [0.0746, Inf], d = 1.06). Similarly, the internal model inferred on Day 9 predicts human behaviour significantly worse on Day 8 (one-sided t(24) = 4.9, p < 0.001, CI = [0.0616, Inf], d = 0.963). The real test of using two models comes on Day 10, when the two underlying sequences alternate every five blocks, starting with the Day 8 sequence in blocks 1–5. Specificity of response time statistics to the stimulus statistics is tested by predicting Day 10 performance using the Day 8 and Day 9 models. The Day 8 model more successfully predicts response times in blocks relying on Day 8 statistics than in blocks with Day 9 statistics (one-sided t(24) = 3.734, p < 0.001, CI = [0.0236, Inf], d = 0.594), and the opposite is true for the Day 9 model (one-sided t(24) = 3.528, p < 0.001, CI = [0.0261, Inf], d = 0.575). The oscillating pattern in the predictive performance of the Day 8 and Day 9 models on blocks governed by Day 8 and Day 9 statistics indicates that participants successfully recruit different previously learned models for different stimulus statistics (Fig 5E and S8 Fig; note, however, that there is high variance across participants in the level of oscillation, indicating varying levels of success).

In summary, these results demonstrate that the internal model inferred by CT fulfils two critical criteria: the internal model component is general across tasks but is specific to stimulus statistics.

Evolution of the internal model with increased exposure

Our initial analyses demonstrated that the internal model captured by CT can account for a large component of the variance observed in the responses of participants and also that the ideal observer model can only account for a fraction of this variance. This is expected, since learning the model underlying observations entails that participants need to learn the number of states, the dynamics, and the observation distributions, which requires substantial exposure to stimulus statistics. When data is insufficient for an observer to infer the model underlying observations, they can recruit inductive biases that reflect earlier experiences. The structure of such inductive biases can be very rich. Instead of trying to explore the space of potential forms of inductive biases, we use an Ansatz that is a parsimonious explanation of temporal dependencies: the Markov model. The Markov model only learns immediate dependencies between subsequent observations, which is not in line with the statistics of the applied stimulus sequence but reflects the regularities found in everyday stimuli. In summary, we assume that the gap between the predictive performance of CT and that of the ideal observer can be accounted for by the Markov model. Further, if the Markov model constitutes an inductive bias, then responses early in the training are governed by it and its influence only gradually wanes. We analyzed the learning curves of individuals through the eight days of training. Our initial analyses were extended with an additional model, the Markov model (Fig 6A). For reference, we also analyzed the trigram model, which can capture essential summary statistics of the stimuli. The predictive performance of the trigram model closely follows that of the ideal observer, indicating that the summary statistics captured by the trigram model is indeed responsible for a substantial part of the statistics reflected by the ideal observer (Fig 6A). The Markov model can capture a significant amount of variance in response times on the first day of exposure (M = 0.0677, ranging from 0.00049 to 0.13, one-sided t(24) = 11.68, p < 0.001, CI = [0.205, Inf], d = 2.34), and its performance is not different from that of CT (binomial test on MSE values 0.72; n = 25, p = 0.0433). Note that the Markov model is a special case of the model class represented by CT (Fig 3B), therefore indistinguishable predictive performance of the two indicates that the internal models on the first day of training are dominated by a Markov structure.

Fig 6. Evolution of the internal model with increasing training.

Fig 6

A Mean explained variance (dots, averaged over participants) in held-out response times in sessions recorded on successive days for the CT (red), Markov (blue), ideal observer (green) and trigram (yellow) models. Error bars show 2 s.e.m. of the group mean. B Color coding of response buttons used in this figure. C Color coding of the sequence shown to participants. D-F Learning in individual participants (left, middle, and right panels corresponding to different participants: 102, 110, and 119, respectively). E Learning curves of CT, ideal observer, Markov, and trigram models. Internal models shown on panels D & F (corresponding to the days indicated by red disks on panel E) are samples from the posterior of possible internal models inferred by CT. CT predictive performance is calculated by averaging over the predictive performances of 60 samples. Participant 102 finds a partially accurate model by Day 2 (D) and a model close to the true model by Day 8 (F). Participant 110 retains a Markov model throughout the eight days of exposure. Prediction of their behaviour by the Markov model gradually improves while the predictive performance of the ideal observer model is floored, indicating that no higher-order statistical structure was learned. G & H Mismatch between subjective probabilities of upcoming stimuli derived from CT and alternative models: the ideal observer model (generative probabilities, horizontal axis); and the Markov model (vertical axis). KL-divergences of the predictive probabilities are shown for individual participants (dots) on Day 2 (G) and Day 8 (H). KL-divergence is zero at a perfect match and grows with increasing mismatch.

CT offers a tool to investigate the specific structure of the model governing the behaviour of individuals at different points during learning while exposed to the same sequential statistical structure (Fig 6B–6D and 6F). We computed learning curves of individuals (Fig 6E and S9 Fig) and analysed the internal model structure at different points during training by taking posterior samples from the CT model. Early during training, when the predictive performance of the Markov model is close to that of CT, the inferred iHMM indeed tends to have a structure close to that of the Markov model (see also Fig 3B), characterized by a strong correspondence between observations and latent states (Fig 6D). Later in the experiment, however, the performance of CT deviates from that of the Markov model for most of the participants (Fig 6E and S9 Fig) and the model underlying the responses reflects a more complex structure (Fig 6F). Note that the monotonic improvement of CT performance can hide a richer learning dynamics: several participants have strong nonlinearities in their learning, as initial improvements correspond to a stronger reliance on a Markov-like structure, which is later abandoned for a more ideal-observer-like structure (Fig 6F and S9 Fig). Importantly, learning curves and internal models corresponding to different parts of the learning curve reveal qualitative differences between participants. There are participants for whom improved predictability of response times does not correspond to adopting a model structure that reflects the real stimulus statistics; the model underlying their response times still closely resembles a Markov model (participant 110, Fig 6F, see also Fig 3B). At the same time, subjects can be identified for whom the contribution of the Markov model to the internal model declines to almost zero and whose internal model seems to faithfully reflect the characteristics of the ideal observer model (subject 119, Fig 6F; note the alternating states with uniformly distributed observations and those with close-to-certain prediction of observations).

An objective measure of the match between the subjective probabilities of upcoming stimuli and the ground truth probabilities can be obtained by calculating the KL-divergence between the two, a measure commonly used to compare probability distributions. An alternative argument can also be made for using KL-divergence deduced from the LATER model (see S2 Appendix). We computed the KL-divergence between the ground truth probabilities of the task and those of the inferred CT model (Fig 6G) as well as the inferred Markov model and the CT model (Fig 6H), which quantifies the influence of the Markov model on the internal model. The analysis confirms that some participants move away from a Markov model and towards the ground truth probabilities (e.g. participants 102 and 119) while others maintain a model closer to a Markov model throughout the experiment (e.g. participant 110).
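The divergence used here is the standard KL-divergence averaged over trials, as sketched below; function and variable names are ours.

```python
import numpy as np

def mean_kl(p, q, eps=1e-12):
    """Average KL-divergence between two sets of per-trial predictive
    distributions over the four stimuli (rows of p and q); zero for a
    perfect match and growing with mismatch."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return np.mean(np.sum(p * (np.log(p) - np.log(q)), axis=1))
```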

Trade-off between ideal observer and Markov model contributions

The Markov model was shown to be present across multiple days of exposure to the stimulus sequence, and even the internal models of individuals inferred by CT indicated that a Markovian structure largely determines the behavior of individuals early in the training. We assessed the relative contributions of the Markov and ideal observer models by calculating the number of individuals for whom the Markov or the ideal observer model showed higher predictive performance. The Markov model could be identified for all of the participants (Fig 7A), albeit its strength in predicting responses varied across participants (S8 Fig). This, along with the observation that the contribution of the Markov model could decline and even diminish for several participants, raised the possibility that the Markov model constitutes the inductive bias participants were relying on.

Fig 7. The internal model captured by CT can be reliably broken down into the independent contribution of an inductive bias and the ideal observer model.

Fig 7

A Day-by-day comparison of the number of participants for whom the predictive performance of the Markov (blue) or ideal observer (green) model was higher. B Subject-by-subject comparison (dots represent individual subjects) of ideal observer model performance and normalized CT performance (the margin by which CT outperforms the Markov model) on Day 8. Dots close to the identity line (grey line) indicate cases where CT performance can be reliably accounted for by contributions from the two simpler models. Normalized CT performance closely follows the performance of the ideal observer model, and deviations tend to indicate slightly better normalized CT performance. C Performance of a linear model predicting CT model predictions on a trial-by-trial basis from the Markov and ideal observer model predictions on different days of the training. Thick mid-line indicates R2 of the trial-by-trial fit of the linear combination to CT performance averaged across participants. Boxes show the 25th and 75th percentiles of the distribution. Upper whiskers show the largest value within 1.5 times the interquartile range above the 75th percentile; similarly for the lower whisker. Dots are data points outside the whiskers. D Histogram of the advantage of normalized CT performance over the ideal observer model. Red line marks the mean of the histogram. E Higher-order statistical learning in CT (left panels) and the ideal observer model (right panels) on Day 2 (top panels) and Day 8 (bottom panels) of the experiment. Dots show individual participants. Orange dots represent participants with a higher-order learning score significantly deviating from zero. CT can capture both negative deviations (Day 2) and positive deviations (Day 8) in this test and displays significant correlations across participants on both days between the predicted and measured higher-order statistical learning, indicating that subtle and nontrivial statistics of the internal model are represented in CT.

To investigate this hypothesis, we first tested if the predictive performance of CT can be understood as a combination of the performances of the Markov and ideal observer models. For this, we capitalize on the insight that the Markov and the ideal observer models capture orthogonal aspects of the response statistics. The Markov model can only account for first-order transitions across observed stimuli. The ideal observer model is sensitive to both first-order and second-order transitions, but since the parameters of the ideal observer model are determined by the stimulus statistics, which lack first-order dependencies, the structure that this model actually captures is only sensitive to second-order transitions (Fig 3). As a consequence, the variances in response times explained by these two models are additive.

We used the additivity of Markov and ideal observer variances to assess how well the performance of CT can be predicted by combining the predictions of the ideal observer and Markov models. We contrasted the normalized CT performance, i.e. the difference between the variance explained by CT and by the Markov model, with the variance explained by the ideal observer model on a participant-by-participant basis (Fig 7B). We chose to contrast the normalized CT with the ideal observer instead of contrasting CT with the sum of the ideal observer and Markov models because this measure emphasizes the contribution of stimulus statistics to the internal model maintained by participants. We found a strong correlation between the two measures (r(23) = 0.88, p < 0.001), indicating that CT performance can be largely explained by a combination of the Markov and ideal observer models. This strong correlation was consistently present on all recording days (r(198) = 0.923, p < 0.001, S10 Fig). S9 Fig also reveals that the advantage of CT predictive performance over the Markov model only starts to grow once the ideal observer model can be identified in the responses of participants.

A closer inspection of the response time data can provide exquisite insight into how the inductive bias and evidence-based models are combined to determine responses. In particular, we wanted to assess if trial-by-trial CT predictions can be broken down into the individual contributions of the Markov and ideal observer models. We modelled the response time predicted by CT as a linear combination of the predictions obtained by the Markov and ideal observer models. Response times in all trials of a particular participant for any given day were fitted with three parameters: the weights of the two contributing models and an offset. Combined response time predictions showed a high level of correlation with the predictions obtained from CT: on any given day the across-participant average correlation was close to or above 0.8 (Fig 7C). Thus, despite the changing contributions of the Markov and ideal observer models across days (Fig 7A), the two models could consistently explain a very large portion of the statistics captured by CT.
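A minimal sketch of this three-parameter decomposition, fitting two weights and an offset by least squares and reporting the R2 of the fit (as in Fig 7C); function and variable names are ours.

```python
import numpy as np

def fit_combination(rt_ct, rt_markov, rt_ideal):
    """Fit CT-predicted response times as w1 * Markov + w2 * ideal
    observer + offset, and return the weights and the R^2 of the fit."""
    X = np.column_stack([rt_markov, rt_ideal, np.ones_like(rt_ct)])
    coef, *_ = np.linalg.lstsq(X, rt_ct, rcond=None)
    fitted = X @ coef
    ss_res = np.sum((rt_ct - fitted) ** 2)
    ss_tot = np.sum((rt_ct - rt_ct.mean()) ** 2)
    return coef, 1.0 - ss_res / ss_tot
```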

We investigated if the statistical structure captured by CT goes beyond that captured by the Markov and ideal observer models. The normalized CT showed a small but significant advantage over the ideal observer model on day eight of the experiment (one-sided t(24) = 3.646, p < 0.001, CI = [0.0126, Inf], d = 0.729, Fig 7D). Therefore, we sought to understand if the marginal advantage of the normalized CT predictive performance reflected relevant stimulus statistics that could be captured by CT but not by the Markov or ideal observer models. We analyzed response times to the third element of three-stimulus sequences which the trigram model is unable to distinguish. In one of the analysed conditions, the first and third elements were pattern elements, and we compared these to a condition where the first and third elements were random elements but the actual observations were the same. Since only the latent state differed between the two conditions, they cannot be distinguished by the trigram model. Higher-order learning, characterised by the response time difference between the two conditions, was highly correlated with the higher-order statistical learning predictions of CT both early in the training (Fig 7E, r(22) = 0.756, p < 0.001) and on the last day of training (Fig 7E, r(23) = 0.603, p = 0.0014). Interestingly, early in the training most of the participants whose higher-order statistical learning measure was significantly different from zero had a negative score (Fig 7E, orange dots), a counter-intuitive finding termed inverse learning [41, 42]. In contrast, higher-order statistical learning could not be predicted by the ideal observer (Fig 7E, r(22) = −0.265, p = 0.21 and r(23) = 0.0457, p = 0.828 on Days 2 and 8 of training, respectively). On Day 2, some participants show a significant distinction between these Pattern and Random trials in the reverse direction: responding to Random trials faster than to Pattern trials. The ideal observer model cannot capture this feature of the data whereas internal models inferred using CT can.

In summary, while the ideal observer model alone cannot account for the full statistical structure captured by CT, together with the Markov model the two models explain the majority of CT’s internal model, such that the relative contributions of the two models shift towards the ideal observer model during learning.

Discussion

In this paper we built on the idea of Cognitive Tomography [27], which aims to reverse engineer internal models from behavioural responses, and extended it to infer high-dimensional dynamical internal models from response time data alone. Key to our approach was the combination of non-parametric Bayesian methods, which allow discovering flexible latent variable models, with probabilistic programming, which allows efficient inference in probabilistic models. The proposed model has a number of appealing properties for studying how acquired knowledge about a specific domain affects momentary decisions of biological agents: (1) we used the iHMM, a dynamical probabilistic model that can naturally accommodate rich inter-trial dependencies characteristic of an array of everyday tasks; (2) the iHMM is capable of capturing arbitrarily complex statistical structure without increasing the complexity of the model more than necessary [43, 44]; (3) response times can be predicted on a trial-by-trial basis; (4) complex individualised internal models could be inferred, which allowed inference of individual learning curves. Using this tool, we could track the deviation of the learned internal model from the ideal observer model. We identified transient structures that were nurtured temporarily only to be abandoned later during training. The deviation could be consistently explained by the contribution of a simpler model, a so-called Markov model, which learns the immediate temporal dependencies between observations but ignores latent variables. Initial dominance of the internal model by the Markov model indicated that the Markov model constitutes an inductive bias that humans fall back on when experience is limited. Indeed, during learning the contribution of the Markov model decreased on an individual-by-individual basis, which coincided with a gradual decrease in the deviation between the inferred internal model and the ideal observer model.

Learning in general is an ill-defined, under-determined problem. It requires inductive biases, formulated as priors in Bayesian models, to efficiently support the acquisition of models underlying the data [45, 46]. The nature of such inductive biases is a fundamental question for cognitive science, neuroscience and even machine learning [45, 47, 48]. These inductive biases determine what we can learn effectively: they support learning when they represent priors that reflect the statistics of the environment. Indeed, Markovian dynamics can be a good approximation of the dynamics of the natural environment and can therefore constitute a useful inductive bias. Our analysis demonstrated that participants are remarkably slow to learn the ground truth statistics of the stimuli. Our results also showed that this slow learning dynamics can be accounted for by a strong inductive bias that is consistent across participants. Slow acquisition of the true task statistics might indicate the low a priori probability of the task statistics among the potential hypotheses humans entertain. The spectrum of inductive biases that humans use can be much richer than the Markov model. For instance, after extended exposure to the ASRT, one can expect the inductive biases themselves to be updated. Identifying such updates in inductive biases will be an exciting line of future research, related to the broader topics of meta-learning and transfer learning [49].

The model class that we use to infer the internal model has a strong effect on the types of statistics in the data that can be learned effectively. Our proposed model class, iHMM, is appealing because it can accommodate highly complex statistical structures, including the ideal observer model and the Markov model. The flexibility of the model comes at the price that more data is required for inferring it. This motivated the choice to infer one model per session and to track learning by comparing inferred models across days. Our choice is further supported by the revealed multi-day learning process that seemed to be characteristic of all participants. The approach has a limitation too: early in the training (and especially on the first day) faster changes in the internal model cannot be captured. Alternative model classes, such as a hierarchical version that can more effectively perform chunking, can be more effective in learning from more limited data and can be used to explore the evolution of the internal model in more detail [50]. The data hunger of model inference can be further curbed by learning a constrained version of the parameters (such as the transition matrix), but at the expense of obscuring potential individual differences. The proposed framework of cognitive tomography naturally accommodates such alternative model classes and we expect further insights into the way inductive biases are used during learning.

The presented model builds on the original CT analysis performed on faces [27] but differs in a number of fundamental ways. We sought to infer the evolution of the internal model for statistics new to participants. In contrast to the earlier formulation using a 2-dimensional latent space and a static model, here the inference of a dynamical and potentially high-dimensional model yields a much richer insight into the workings of the internal model acquired by humans. Using a structured internal model allows direct testing of the model against alternatives, thus providing opportunities to reveal the computational constraints that might limit learning and inference in individuals. A well-structured internal model can be used to make arbitrary domain-related inferences within the same model. Based on this, we can decompose the complex inference problem into separately meaningful sub-parts which can be reused in tangential inference problems to serve multiple goals. Our experimental design permitted some exploration of such across-task generalization capabilities, but suitably updated alternative designs could provide a more exhaustive test. By showing that the same variables can be used for multiple tasks, it becomes reasonable to look for signatures of these quantities in neural representations. A possible alternative formalization of this problem could use a Partially Observable Markov Decision Process (POMDP) [51], where the internal reward structure and the subjective belief about the effect of the participant's actions are jointly inferred with the internal model. However, in our experiment the action model has a simple structure and hence the problem simplifies to a probabilistic sequence learning problem. Instead, here we focus on inferring rich internal model structures and on obtaining an approximate Bayesian estimate rather than the point estimates of [51]. Still, the ability of POMDPs to model how the decisions of the agent affect the state of the sequence can become useful for investigating the potential inductive bias that actions actually influence states.

Our model produces moment-by-moment regressors for (potentially unobserved) variables that are cognitively relevant. Earlier work considered neural correlates of hidden state representations in the orbitofrontal cortex of humans [52], but there the internal model was not inferred, rather assumed to be fully known. CT provides an opportunity to design regressors for individualised and potentially changing internal models. In particular, the model differentiates between objective and subjective uncertainties, characteristics relevant for relating cognitive variables to neural responses [53–56]. The former is akin to a dice-throw: uncertainty about future outcomes that cannot be reduced with more information. The latter is uncertainty arising from ambiguity and lack of information about the true current state of the environment. We showed that the uncertainties exhibited by a trained individual's internal model show patterns similar to those of the ideal observer model, which promises that uncertainties inferred at intermediate stages of learning are meaningful.

Recently, major efforts have been devoted to learning structured models of complex data both in machine learning and in cognitive science [9, 57–59]. These problems are as diverse as learning to learn [57, 60], causal learning [61], learning flexible representational structures [58], and visual learning [62]. When applied to human data to reverse engineer the internal models harnessed by humans, past efforts fall into two major categories. 1, Complex (multidimensional) models are inferred from data and fitted to across-participant averaged data [9, 63, 64], ignoring individual differences. 2, Simple (low dimensional) models are used to predict performance in a participant-by-participant manner, thus resulting in subjective internal models [29, 65]. In particular, individualised priors have been identified in a simple dynamical probabilistic model with two latent variables [11]. In that setting binary decisions were sufficient, as an 'expert model' was assumed and assessment of the prior comprised inferring a single parameter, which defined the width of a one-dimensional hypothesis space. Findings of this study gave insights into how individuals differ in their capacity to adapt to new situations. Recently, a notable approach has been presented which aims at characterizing individual strategies in a setting where the complexity of the state space is relatively large [66]. In this study, the rules of the game (the equivalent of the statistics of stimuli in our case) and the relevant features (equivalent to the latent variables in our case) were assumed to be known by the participants. However, the task being a two-player game, there was uncertainty about the strategy of the opponent, and the limitations in the computational complexity of the inference were investigated. This aspect is orthogonal to the aspects investigated here and therefore highlights the additional appeal of analysing behavior in complex settings. The contribution of the current paper is twofold: 1, We exploit recent advances in machine learning to solve the reverse-engineering problem in a setting where complex internal models with high-dimensional latent spaces are required; 2, We contribute to the problem of identifying structured inductive biases by enabling direct access to the internal model learned by individuals and by dissecting the contributions of evidence and inductive bias.

A widely studied approach to linking response time to quantities relevant to task execution is the drift diffusion model (DDM) [67]. In its most basic form, evidence is stochastically accumulated over time such that the rate of accumulation is proportional to the information gained by extended exposure to a stimulus, until the evidence reaches a bound where a decision is made. Through a compact set of parameters DDM can explain a range of behavioural phenomena, such as decisions under variations in perceptual variables, adaptation to the volatility of the environment, attentional effects on decision making, the contribution of memory processes to decision making, and decision making under time pressure [68–72]; neuronal activity has also been shown to display strong correlations with model variables [73, 74]. Both LATER and DDM have the potential to incorporate variables relevant to making decisions under uncertainty, and the marginal distributions predicted by the two models are comparable. Our choice to use the LATER model was motivated by two major factors. First, LATER is formulated with an explicit representation of subjective predictive probability by mapping it onto a single variable of the model. This setting promises that subjective probability can be independently inferred from the available data, since the internal model influences a single parameter of the model. As a consequence, subjective probability is formally disentangled from other parameters affecting response times, and the associated uncertainty can be captured with Bayesian inference. If the effect of subjective probability were distributed among more than one parameter (starting point, slope, variance), the joint inference of subjective probability with other parameters affecting response times would result in correlated distributions. Consequently, maximum likelihood inference, or any other point estimation (the preferred methods for fitting DDM), will have large uncertainty over the true parameters due to interactions between variables. Furthermore, this uncertainty remains unnoticed, as there is usually no estimation of this uncertainty, only point estimates. Second, trials are usually sorted, based on the design of the experiment, into more and less predictable trials (with notable exceptions like [29]). This leads to a misalignment between the true subjective probabilities of a naive participant and the experimenter's assumptions. Assuming full knowledge of the task, and therefore an impeccable internal model, in more complex tasks implies that potential variance in the acquired internal models across subjects will be captured by variance in the parameters of the response time model rather than in those of the internal model. DDM is considered an algorithmic-level model [75] of choices [76], which is indeed useful for linking choice behaviour to neuronal responses [77]. The appeal of the Bayesian description offered by the normative framework used here is that it can accommodate a flexible class of internal models without the need to adopt algorithmic constraints. Similar algorithmic-level models of behaviour that are based on the flexible and complex internal models yielded by Cognitive Tomography are not available and will be the subject of future research.

In summary, we presented and validated a tool that can flexibly infer complex, dynamical, individualised internal models from simple behavioural data. We demonstrated that various levels of discrepancy existed between the ideal observer model and the internal model maintained by individuals. We used this discrepancy to identify an inductive bias with a structure that was consistent across participants. This approach promises that altered contributions of inductive biases or altered learning can be identified in affected populations at the individual level. An additional promise of the presented approach is the separation of the internal model from the behavioural model: the Cognitive Tomography framework can naturally integrate diverse behavioural data into a single model and, by using multiple modalities, ensures faster and more accurate inference of the internal model.

Methods

Ethics statement

All participants provided written informed consent before enrollment and received course credits for taking part in the experiment. The study was approved by the United Ethical Review Committee for Research in Psychology (EPKEB) in Hungary (Approval number: 30/2012) and by the research ethics committee of Eötvös Loránd University, Budapest, Hungary. The study was conducted in accordance with the Declaration of Helsinki.

Experiment

Participants

Twenty-five individuals (22 females and 3 males) aged between 18 and 22 (mean age = 20.4 years, SD = 1.0 years) took part in the experiment (we recruited 32 participants, but only 26 completed the experiment; we omitted one further participant because of a system error that resulted in partial loss of their experimental data). They were university students (mean years of education = 13.3, SD = 1.0) from Budapest, Hungary. None of the participants reported a history of developmental, psychiatric, neurological or sleep disorders, and they had normal or corrected-to-normal vision. They performed in the normal range on standard neuropsychological tests of short-term and working memory (Digit span task: M = 6.48, SD = 1.15; Counting span task: M = 3.76, SD = 0.99) [78]. Before the assessment, all participants gave signed informed consent and received course credit for participation.

Tasks

Alternating Serial Reaction Time (ASRT) Task Learning was measured by the ASRT task [35, 79]. In this task, a stimulus (a dog’s head) appeared in one of four horizontally arranged empty circles on the screen and participants were asked to press the corresponding button as quickly and accurately as they could when the stimulus occurred. The computer was equipped with a keyboard with four heightened keys (Z, C, B, M on a QWERTY keyboard), each corresponding to a circle in a horizontal arrangement. Participants were asked to respond to the stimuli using their middle- and index fingers bimanually. The stimulus remained on the screen until the participant pressed the correct button. The next stimulus appeared after a 120 ms response-to-stimulus-interval (RSI). The task was presented in blocks of 85 stimuli: unbeknownst to the participants, after the first five warm-up trials consisting of random stimuli, an 8-element alternating sequence was presented ten times (e.g., 2r4r3r1r, where each number represents one of the four circles on the screen and r represents a randomly selected circle out of the four possible ones). The sequence started at the same phase in each block.
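To make the block structure concrete, the following Python sketch generates one block of stimuli; the particular permutation, function name and 1–4 coding are illustrative assumptions rather than the experiment's actual code.

```python
import numpy as np

def generate_asrt_block(pattern=(2, 4, 3, 1), n_warmup=5, n_repeats=10, rng=None):
    """Generate one ASRT block: five random warm-up trials followed by the
    8-element alternating sequence (pattern, random, pattern, random, ...)
    repeated ten times; stimuli are coded 1-4."""
    rng = np.random.default_rng() if rng is None else rng
    trials = list(rng.integers(1, 5, size=n_warmup))   # random warm-up trials
    for _ in range(n_repeats):
        for p in pattern:
            trials.append(p)                           # pattern element
            trials.append(int(rng.integers(1, 5)))     # interleaved random element
    return np.array(trials)                            # 5 + 10 * 8 = 85 trials

block = generate_asrt_block()
assert len(block) == 85
```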

Procedure

There were ten sessions in the experiment, with one-week delay between the consecutive sessions. Participants performed the ASRT task with the same sequence in the first eight sessions, then an interfering sequence was introduced in Session 9, and both (original and interfering) sequences were tested in Session 10 (see S1 Fig). Participants were not given any information about the regularity that was embedded in the task in any of the sessions [79]. They were informed that the main aim of the study was to test how extended practice affected performance on a simple reaction time task. Therefore, we emphasized performing the task as accurately and as fast as they could. Between blocks, the participants received feedback about their average accuracy and reaction time presented on the screen, and then they had a rest period of between 10 and 20 s before starting the next block. On Days 1–9, the ASRT consisted of 25 blocks. One block took about 1–1.5 min, therefore the task took approximately 30 min. For each participant, one of the six unique permutations of the four possible ASRT sequence stimuli was selected in a pseudo-random manner [35, 79, 80]. The ASRT task was performed with the same pattern sequence in Sessions 1–8. In Session 9, the ASRT was performed with a new interfering pattern sequence. In Session 10, participants performed 20 blocks of the ASRT task switching between the pattern sequences of Sessions 1–8 and Session 9 every five blocks. In Session 10, the task took approximately 24 min. After performing the ASRT task in Session 10, we tested the amount of explicit knowledge the participants acquired about the task with a short questionnaire. This short questionnaire [79, 81] included two questions: “Have you noticed anything special regarding the task?” and “Have you noticed some regularity in the sequence of stimuli?”. The participants did not discover the true probabilistic sequence structure.

Modelling background

Models for sequential prediction

The experimental stimuli form a sequence of discrete observations in discrete time, $\{Y_t\}_{t=1}^{T}$. The task is therefore to predict the upcoming stimulus conditioned on the history of observations:

$$P(Y_{T+1} \mid Y_1, Y_2, \ldots, Y_T) \qquad (1)$$

In practical terms, learning a model for this temporal prediction task requires imposing a structure over these conditional distributions. Without structural assumptions, there is no statistical dependence among different histories, that is, there is no generalisation from history to future observations.

In the following section we introduce a computational model, the Hidden Markov Model, which provides a general language for solutions to this problem. It can express arbitrarily complex models given sufficiently large amounts of data. In order to remain as general as possible, we will consider a model space (infinite Hidden Markov Models, as in [33]) which can model all possible distributions of the form in Eq 1. Moreover, we would like to achieve this while being able to express, in this language, inductive biases that are useful for constraining the possible models in the limited-data case.

Hidden Markov model

Formally, a Hidden Markov Model comprises a sequence of hidden states $\{S_t\}_{t=1}^{\infty}$ and a sequence of observations $\{Y_t\}_{t=1}^{\infty}$. In this work we take both the latent states and the observations to be discrete, that is, $S_t, Y_t \in \mathbb{N}$. The sequence of hidden (latent) states constitutes a discrete Markov chain with transition probabilities $\pi_{ij} = P(S_{t+1} = j \mid S_t = i)$. In a Markov chain, the sequence element $S_t$ is conditionally independent of the history, conditioned on the previous state and the transition probabilities:

$$S_t \perp (S_1, S_2, \ldots, S_{t-2}) \mid S_{t-1}, \pi$$

At (discrete) time t, observation Yt is governed by the latent state St. The observations are generated independently and identically, conditioned on the (latent) state:

$$P(Y_t = y \mid S_t = s_t) = \phi_{s_t, y} \quad \text{and} \quad Y_t \perp (Y_1, Y_2, \ldots, Y_{t-1}, S_1, S_2, \ldots, S_{t-1}) \mid S_t, \phi$$

Importantly, since the latent state can incorporate arbitrary information (identical observations at different time-points can correspond to different states), assuming arbitrarily many latent states we get a completely general solution for the prediction problem in Eq 1. With an adequate prior (e.g. the Hierarchical Dirichlet Process in [82]) we can learn such structures efficiently [33]. In practical terms, the length of the observation sequence limits the number of latent states that can be inferred, since models with many latent states have diminishing posterior probability.

Cognitive tomography

We construct a model of behaviour which consists of two parts:

  1. An internal model maintained by the participant, which formalizes how latent states assumed to underlie observations evolve and how these states are linked to observations.

  2. A model relating the prediction of participants’ internal model to their responses (response time model).

Doubly Bayesian model

Due to the uncertainty of the participants about the true model and the actual state of the stimulus sequence, and to the uncertainty of the experimenter about the model maintained by participants and about the actual state of this internal model, the problem can be described as doubly Bayesian: we do Bayesian inference over an internal representation of individuals who themselves do Bayesian inference. Elements of the experimenter's model are introduced in the following sections.

Prediction of response times can be described by the following algorithm:

  1. We take posterior samples from the behavioural model which consists of parameters of the internal model and the response time model conditioned on data from ten consecutive blocks of trials (see explanation for ten below), where:
    • (a)
      all stimuli, and
    • (b)
      response times (with the response times of incorrect trials, of the first five random trials of each block, and response times smaller than 180 msec removed). According to the original formulation by [34], fast response times come from an alternative distribution. We cut off the fast response times (as in [83]) at the fixed 180 msec value; we did not fit the cut-off time as a parameter. Incorrect trials constitute 11% of trials overall, while trials below the 180 msec threshold constitute 2.2% of trials overall and 5.3% on Day 8.

    are included.

  2. For each of the posterior model samples we compute predicted response times by:
    • (a)
      filtering the belief over the latent state over the entire sequence
    • (b)
      produce subjective probabilities for each trial
    • (c)
      produce response time prediction (MAP estimate conditioned on the subjective probability and the response time parameters of the model sample)

    Then we marginalize (i.e. average) over the response time predictions of model samples.

  3. We evaluate model performance by computing the R2 explained variance measure of the predicted response times on the response times of the test dataset. In any given session we train the model on one set of blocks and predict response times on a distinct set of test blocks. While optimizing our model and algorithm, we concluded that using ten consecutive blocks for training provides the best results for the CT model. We also found that using ten blocks for the test set decreases the variance of the R2 estimator sufficiently to obtain individualised learning trajectories.

Note: since actual beliefs depend on past beliefs, one can think of the belief sequence as the path of a light-ray in a high-dimensional fog (representing the state uncertainty). During inference, we have a noisy measurement of the light-ray at different points in time and we would like to reconstruct the best explanation of the observation sequence (response times) in terms of a hidden path. As for prediction, the model produces response time predictions for the entire stimulus sequence with no further feedback of response times (i.e. estimated internal beliefs are not updated based on what response time the participant produced on given trials).

Infinite Hidden Markov model

The infinite Hidden Markov Model is a non-parametric extension of the Hidden Markov Model, assuming countably infinitely many states. There is a hierarchical prior imposed over the state transition matrix and the so-called emission distributions relating the latent (hidden) states to observations (S2 Fig).

The hierarchical prior we used is exactly the one defined in [33]. We extended their implementation of the model to a doubly Bayesian behavioural model that includes response times.

A participant is assumed to learn a probabilistic model of the sequence, which is formalized as an infinite Hidden Markov Model. At (discrete) time t, observation $Y_t$ is governed by a latent (not directly observable) state $S_t$. The states $\{S_t\}_{t=1,2,\ldots}$ constitute a Markov chain, which means the following:

$$p(S_t \mid S_1, S_2, \ldots, S_{t-1}) = p(S_t \mid S_{t-1}) \qquad (4)$$

That is, the state $S_{t-1}$ holds all information about the past regarding the possible evolution of the system. In other terms, conditioning on state $S_{t-1}$ renders $S_t$ and all previous states $S_1, S_2, \ldots, S_{t-2}$ statistically independent.

The observation Yt at time t is independent of all other observations, conditioned on the latent state St (and the model parameters). That is, once the state of the system is decided, the actual previous observations are independent of Yt.

The parameters governing the state transitions are aggregated in the parameter matrix π:

$$\pi_{i,j} = p(S_t = j \mid S_{t-1} = i) \quad \forall t$$

The observation distributions are given by the parameter matrix ϕ:

$$\phi_{i,k} = p(Y_t = k \mid S_t = i)$$

At any given time during the task, we assume the participant has estimated the parameters π and ϕ and uses these point estimates to do exact filtering over the sequence of observations. That is, in each trial they use the evidence provided by the current stimulus to update their belief over the latent state of the sequence. When doing computations with the participant's internal model, we hold the internal model fixed within shorter time-scales of the task (e.g. one session). The participant represents their belief about the current latent state of the system by a posterior distribution, updated by each incoming observation, while always conditioning on their current estimates $\hat{\pi}$ and $\hat{\phi}$ of π and ϕ, respectively. We denote this posterior distribution over latent states at time t by $\hat{s}_t$ (note that this is not a point estimate of the state but rather a vector of probabilities, where $(\hat{s}_t)_i = p(s_t = i)$).

$$\hat{s}_t \equiv p(s_t \mid y_1, y_2, \ldots, y_t) \propto p(y_t \mid s_t)\, p(s_t \mid y_1, y_2, \ldots, y_{t-1}) = \sum_{s_{t-1}} \hat{\phi}_{s_t, y_t}\, p(s_t \mid s_{t-1})\, p(s_{t-1} \mid y_1, \ldots, y_{t-1}) = \sum_{s_{t-1}} \hat{\phi}_{s_t, y_t}\, \hat{\pi}_{s_{t-1}, s_t}\, \hat{s}_{t-1}$$

For predicting the latent state based on previous states and the observation (termed filtering), stimuli of all trials (including initial random trials at the beginning of each block and stimuli in trials where participant hit the wrong key initially) are used. That is, even if incorrect response times are not used when doing inference over the participant’s internal model, the participant is assumed to update their internal beliefs based on the stimulus shown.

Prediction of the next stimulus is computed by marginalizing over the latent state posterior distribution:

$$p(y_{t+1} \mid y_1, y_2, \ldots, y_t) = \sum_{s_{t+1}} p(y_{t+1} \mid s_{t+1})\, p(s_{t+1} \mid y_1, y_2, \ldots, y_t) = \sum_{s_{t+1}} \hat{\phi}_{s_{t+1}, y_{t+1}}\, p(s_{t+1} \mid y_1, y_2, \ldots, y_t) = \sum_{s_t, s_{t+1}} \hat{\phi}_{s_{t+1}, y_{t+1}}\, \hat{\pi}_{s_t, s_{t+1}}\, \hat{s}_t$$
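As an illustration of the two computations above, the NumPy sketch below performs exact filtering and one-step-ahead prediction for fixed point estimates of the parameters. A finite number of states K is assumed (in practice the infinite model is truncated as described below); the function and variable names are ours.

```python
import numpy as np

def filter_and_predict(observations, pi_hat, phi_hat, initial_belief):
    """Run exact filtering with fixed point estimates pi_hat (K x K transition
    matrix) and phi_hat (K x 4 observation matrix), returning for each trial
    the predictive probability of every stimulus before it is observed."""
    belief = np.asarray(initial_belief, dtype=float)   # p(s_0), shape (K,)
    predictions = []
    for obs in observations:                           # obs coded 0..3
        prior = belief @ pi_hat                        # p(s_t | y_1..y_{t-1})
        predictions.append(prior @ phi_hat)            # p(y_t | y_1..y_{t-1})
        belief = prior * phi_hat[:, obs]               # unnormalised posterior
        belief /= belief.sum()                         # p(s_t | y_1..y_t)
    return np.array(predictions)
```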

Throughout the execution of the task, the internal model of the participants is continually updating. We do not directly model the computation of the participants that estimates the current π and ϕ parameters. That is, within a given train or test dataset (10 consecutive blocks) we hold π and ϕ fixed. We do allow, however, for these estimates of π and ϕ to change between sessions. For a summary of when each parameter is allowed to change see Table 1.

Table 1. Summary of when model parameters are allowed to change.

| Variable | Notation | Within train/test | Within session, between train-test | Between sessions | Between participants |
| --- | --- | --- | --- | --- | --- |
| State transition distribution | $\hat{\pi}$ | No | No | Yes | Yes |
| Observation distribution | $\hat{\phi}$ | No | No | Yes | Yes |
| State belief | $\hat{s}_t$ | Yes | Yes | Yes | Yes |
| Response time parameters | $\tau_0$, $\mu$, $\sigma$ | No | No | No | No |
| Prior of observation distribution | $H$ | No | No | No | No |
| Hierarchical prior over state transitions | $\alpha$, $\gamma$ | No | No | No | No |

We perform approximate Bayesian inference using a custom sampling method that mixes steps of a Hamiltonian Monte Carlo (HMC) sampler with a Gibbs sampler that samples a slicing parameter (see [33]). The priors used in the model are listed in Table 2.

Table 2. Parameter priors. Values of the hierarchical prior over state transitions are taken from [33].

| Variable | Prior |
| --- | --- |
| State transition distribution | $\hat{\pi}_i \sim \text{Dirichlet}(\alpha_0/K, \ldots, \alpha_0/K, \alpha_0/K \cdot \epsilon)$ |
| Observation distribution | $\hat{\phi} \sim \text{Dirichlet}(0.8, 0.8, 0.8, 0.8)$ |
| Response time parameters | $\tau_0 \sim \Gamma(1, 10)$; $\mu \sim \Gamma(1, 0.1)$; $\sigma \sim \Gamma(1, 0.01)$ |
| Hierarchical prior over state transitions | $\alpha = 1.3$; $\gamma = 3.8$ |

In order to handle the infinitely many possible states, we use a modified version of the slice sampling method described in [33]. In the original beam sampling algorithm, the authors sample the latent state sequence and make use of the slicing variable to constrain the set of used states to a finite set. They sample the latent sequence and the slicing variables in an alternating fashion. In our case we do not sample latent state sequences, instead, we have to estimate the subjective belief sequence over the latent states. In this latter case, the posterior belief is infinite dimensional and we use slicing to approximate this infinite-dimensional computation with a finite one. At each sampling step, we only look at the latent state belief distribution’s 1 − ϵ support where ϵ is sampled from Uniform(0.02, 0.2).
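A minimal sketch of how such a 1 − ϵ support could be selected is given below; this illustrates the idea and is not the authors' implementation.

```python
import numpy as np

def truncate_belief(belief, eps):
    """Keep the smallest set of states whose total mass reaches 1 - eps,
    zeroing out and renormalising the rest; this approximates the
    infinite-dimensional belief with a finite one."""
    order = np.argsort(belief)[::-1]                   # states by decreasing mass
    cumulative = np.cumsum(belief[order])
    cutoff = int(np.searchsorted(cumulative, 1.0 - eps)) + 1
    truncated = np.zeros_like(belief)
    truncated[order[:cutoff]] = belief[order[:cutoff]]
    return truncated / truncated.sum()
```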

Four independently and randomly initialised Markov chains were sampled with 1600 steps of the slice sampling (the outer Gibbs-sampling chain), with 30 NUTS steps taken between consecutive slice sampling steps. Samples from the second half of each chain were used to check whether the means and confidence intervals of the response time parameter estimates were identical across chains. For prediction, the last 60 unique samples were used from each chain, because prediction performance saturates at this number of samples.

Ideal observer model

We formalise the ideal observer in the following way: at any given point of the experiment, the ideal observer entertains an internal dynamical model comprising two parts: latent dynamics (the transition probabilities between latent states) and an observation model (the conditional distributions of observations given the latent state).

In order to produce predictions for the upcoming observation, conditioning on a fixed model, the ideal observer solves the filtering problem:

$$P(Y_t \mid Y_1, Y_2, \ldots, Y_{t-1}, \pi, \phi) = \sum_{s_t=1}^{\infty} P(Y_t \mid S_t = s_t) \cdot P(S_t = s_t \mid Y_1, Y_2, \ldots, Y_{t-1}) = \sum_{s_t=1}^{\infty} \phi_{s_t, y_t} \cdot P(S_t = s_t \mid Y_1, Y_2, \ldots, Y_{t-1}) = \sum_{s_t=1}^{\infty} \sum_{s_{t-1}=1}^{\infty} \phi_{s_t, y_t} \cdot P(S_t = s_t \mid S_{t-1} = s_{t-1}) \cdot P(S_{t-1} = s_{t-1} \mid Y_1, Y_2, \ldots, Y_{t-1}) = \sum_{s_t=1}^{\infty} \sum_{s_{t-1}=1}^{\infty} \phi_{s_t, y_t} \cdot \pi_{s_{t-1}, s_t} \cdot P(S_{t-1} = s_{t-1} \mid Y_1, Y_2, \ldots, Y_{t-1})$$

The term filtering is used because, as deduced above, the relevant quantity is $P(S_{t-1} \mid Y_1, Y_2, \ldots, Y_{t-1})$, which can be filtered through our observations: we carry this quantity forward and can calculate it for the next time-step using the model parameters and the observation $Y_t$.

Importantly, instead of sampling one possible latent trajectory, we have to marginalise over these latent sequences to obtain our prediction for the upcoming stimulus. That is, our prediction is the aggregate of the predictions of many possible latent pasts: we combine predictions of the form 'had this been the sequence of causes of my past experiences, this is what I should expect to see' over all possible hypothesised latent cause sequences.

Response time model

In order to connect the predictions of the internal model to measured behaviour, we need a generative model of response times in the form of a probability distribution conditioned on the subjective predicted probability of the upcoming stimulus. To achieve this, we employ the reaction time model of [34], which in its original formulation states that the majority of saccadic response times come from a reciprocal Normal distribution.

Further studies suggest that choice response time distributions should have a similar form [84, 85]. However, in other formulations there is no explicit dependence of the single-trial RT distribution on the subjective predicted probability, hence those models are inadequate for our purposes. The generative model for correct response times (LATER model, [34]) is:

$$r_n \sim \text{Normal}(\mu, \sigma)$$
$$RT_n = \frac{\tau_0 - \log(p_n)}{r_n}$$

where $p_n$ is the subjective probability (output of the internal model) of the actual upcoming stimulus and $\mu$, $\sigma$, $\tau_0$ are the parameters characterising an individual's response time model. These parameters jointly describe the mean and variance of the response times. Note that in our experiment these parameters absorb all idiosyncratic effects at hand, namely the individual's state, the sensitivity of their response times to subjective predicted probabilities, and the effect of the instructions on the speed-accuracy trade-off. Note that in order to avoid assigning probability to negative reaction times, we use a truncated Normal distribution.

The response time parameters are jointly inferred along with the internal representations (dynamical model, observation distribution, latent state inference).
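A small sketch of this generative model, based on the reconstruction of the equations above and with the truncation applied to the rate variable $r_n$ (one plausible reading of the truncation described in the text), is shown below.

```python
import numpy as np
from scipy import stats

def sample_response_times(p, tau0, mu, sigma, rng=None):
    """Sample correct-trial response times from the LATER-style generative
    model RT_n = (tau0 - log p_n) / r_n, with r_n drawn from a Normal
    distribution truncated to positive values."""
    rng = np.random.default_rng() if rng is None else rng
    p = np.asarray(p, dtype=float)
    a = (0.0 - mu) / sigma                 # truncation point in standard units
    r = stats.truncnorm.rvs(a, np.inf, loc=mu, scale=sigma,
                            size=p.shape[0], random_state=rng)
    return (tau0 - np.log(p)) / r
```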

Validation on synthetic datasets

In order to validate our behavioural model as well as our inference method, we examined how well we can recover subjective probabilities on a synthetic dataset. We chose to constrain our analysis to the recovery of subjective probabilities instead of the generative model structure due to the unsupervised nature of our method: the objective of inference is to learn the distribution of the data (which is in direct relationship with the predictive probability of upcoming stimuli). This is in contrast with more supervised methods where the emerging representations can be gauged by performing tasks that rely on the latent variables. We used the algorithm in [33] on synthetic ASRT data to infer a first set of three different internal models from different levels of exposure. These models represent the internal models of different synthetic participants (S3(A) Fig). As a prior predictive check, we show marginal distributions of synthetic response times that approximately match the response time distributions of humans (S11 Fig). We took these models as the ground truth for our synthetic experiment. We trained one model each on 640, 1280 and 2400 trials of ASRT stimuli. We then generated response times from the generative model with three parameter settings for each of $\tau_0$, $\mu$, and $\sigma$, resulting in a total of $3^3 = 27$ different synthetic response time sequences. The variance of the resulting response time distribution is influenced by all four factors—the subjective probabilities (which depend on the internal model) and the three response time parameters. The standard deviation of the response times is an appropriate measure since it can also be computed for data obtained from human participants. We generated the response times for 10 ASRT blocks (the same number we used for inference on human data). Standard deviations of the resulting response times (symbol colours on S3(B) Fig) arise from the interaction of all parameters; different combinations of the response time parameters resulting in the same standard deviation are marked by identical colours. Then, we used the CT inference method to generate a second set of (posterior) internal model samples. We computed the same model performance measure as for human data (response time prediction performance) and compared it to that of the original internal model of the participant (S3(B) Fig). Then, since the recovered model matched the ground truth internal model of the synthetic participants in this performance measure, we also compared how well the actual subjective probabilities of these synthetic participants could be predicted (S3(C) Fig). The results show that the prediction performance of the subjective probabilities exceeds that of the individual response times. Also, as seen in S3(D) Fig, the standard deviations of human participants' response times are within the range for which we validated our model inference method.

Alternative models

Markov model

Internal model of participants. According to this model, the participants assume that the sequence of observations constitutes a Markov chain. That is, for the sequence of observations $y_t$, we have

$$p(y_t \mid y_1, y_2, \ldots, y_{t-1}) = p(y_t \mid y_{t-1}) \quad \forall t$$

The above equation states that the next observation is independent of all previous observations given the previous observation. This is equivalent to saying that all information (besides parameters governing the sequence) about the state of the sequence is included in the previous observation.

Inference. We use the same parameter priors for the response time model as for the iHMM model, and the prior for the transition probabilities is $\pi_i \sim \text{Dirichlet}(\alpha_0/K, \alpha_0/K, \alpha_0/K, \alpha_0/K)$, where K is the number of states, in this case 4.

Four independently and randomly initialised Markov chains were sampled, with 1600 steps taken with the NUTS sampler in Stan. Samples from the second half of each chain were used to check whether the means and confidence intervals of the response time parameter estimates were identical across chains. For prediction, the last 60 samples were used from each chain.

Relation to HMM. Note that Markov models are a subset of Hidden Markov Models: we can always write a Markov model as an HMM if the number of observation values matches the number of latent state values and each observation is unique to a state.

This is particularly important since, for an HMM for which the above condition holds, there is an equivalent Markov chain that describes exactly the same sequence structure. This is why we term some of the internal models identified by our iHMM method "Markov-like": they are closely approximated by an actual Markov model.
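The construction is immediate in code; the sketch below (with illustrative names) builds the HMM parameters corresponding to a given Markov transition matrix.

```python
import numpy as np

def markov_as_hmm(transition):
    """Embed a K-state Markov chain into an HMM with K latent states: the
    latent transitions follow the Markov transition matrix and each state
    deterministically emits its own observation (identity emissions)."""
    pi_hat = np.asarray(transition, dtype=float)   # K x K transition matrix
    phi_hat = np.eye(pi_hat.shape[0])              # observation k unique to state k
    return pi_hat, phi_hat
```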

Trigram model

The model we describe here is also referred to as ‘triplet model’ in previous works using the ASRT paradigm. We use the term trigram since it is more commonly used in a sequential prediction modelling context.

Internal model of participants. The model, established in prior literature, sorts trials into High probability and Low probability triplets. This is equivalent to assuming that the participant uses a two-back (or trigram) model for prediction, predicting the most likely stimulus conditioned on the previous two observations. Due to the ground truth generative model of the task there is no practical dependence on the identity of the immediately preceding stimulus, and only the penultimate stimulus can contribute to making predictions.

Inference. The trigram model has no fitted parameters. Predictive performance is evaluated by the R2 measure between the response times and the binary variable (high vs. low probability trials) provided by the trigram model.

Model comparison

Since not all models considered are Bayesian (i.e. provide an explicit marginal log-likelihood for the response times), we chose to compare models based on explained variance of response times on a test set. Each model produces response time predictions for each trial and each individual separately. When evaluating on a given test set, in order to control for a shift in mean not related to the inherent structure of the response times, we use R2 as our performance metric. That is equivalent to assuming that the actual observed response times come from a linear model with the predicted response time as mean and an additive homoscedastic (equal variance irrespective of predicted response time) normal noise term.

R2 values were calculated separately for each individual’s trials.
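Under one reading of the criterion above — an intercept-only (mean-shift) correction with homoscedastic noise — the metric could be computed as in the sketch below; this is our interpretation, not the authors' code.

```python
import numpy as np

def r_squared(rt_observed, rt_predicted):
    """Explained variance of test-set response times after allowing an
    overall shift in mean between predicted and observed response times."""
    rt_observed = np.asarray(rt_observed, dtype=float)
    rt_predicted = np.asarray(rt_predicted, dtype=float)
    shifted = rt_predicted - rt_predicted.mean() + rt_observed.mean()
    residual = np.sum((rt_observed - shifted) ** 2)
    total = np.sum((rt_observed - rt_observed.mean()) ** 2)
    return 1.0 - residual / total
```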

Train and Test Datasets. For each day of the experiment (out of 10), out of the 25 blocks of that day we selected blocks 11–20 as the training dataset and blocks 1–10 as the test dataset. The main reason for this choice is that in the initial few blocks of each day participants may be engaged in a warm-up phenomenon which fundamentally alters their behaviour in the task. If we use the first 10 blocks as test data, the performance metric may be influenced by 10–30%, depending on how many blocks include altered behaviour. However, if we used this part as training data, the whole internal model inference would shift fundamentally, since our inference algorithm assumes a fixed model for the entirety of the 10 blocks.

During model inference (on the training dataset) and performance evaluation (on the test dataset), the response times of the first five random trials and of all incorrect trials are not considered.

Statistical methods

Normality was not checked prior to t-test comparisons. All reported correlations were computed using Pearson’s correlation. T-tests are paired sample tests whenever there is a within-subject comparison. All binomial tests are one-sided. For effect sizes we calculated Cohen’s d using the lsr R package.

Error prediction

In the error prediction task we analyzed trials in which participants did not press the button corresponding to the actual stimulus but instead pressed a wrong button. The analysis assesses two quantities: the subjective probability of the correct button relative to that of the other buttons, and the subjective probability of the erroneously pressed button relative to those of the other buttons. Just as with response time prediction, the model outputs (for each posterior model sample) a subjective probability estimate for each of the four possible stimuli on each trial. Then, we take the mean over these probability estimates over the last 60 unique samples of each chain. We decided on using 60 samples since model performance saturates at this number. In Fig 5 we compute the rank, among the four probability estimates, of the stimulus and of the choice in correct and incorrect trials. Then, based on the subjective probability estimates of the actually occurring stimulus, we plot the receiver operating characteristic (ROC) curve for predicting whether a given trial will result in an error. This is done by moving a threshold value from 0 to 1 and predicting a correct trial if the subjective probability of the upcoming stimulus is above the threshold and an erroneous trial otherwise. The trigram model yields only two points (other than the (0, 0) and (1, 1) points). This is because the trigram model predicts a probability of 0.25 for all stimuli in the first two trials of each block, and in all other trials a probability of 0.625 for the high probability trigram element and 0.125 for the other stimuli.
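The threshold sweep can be written compactly; the sketch below (illustrative names, assuming boolean error labels) computes the points of the ROC curve.

```python
import numpy as np

def error_roc(p_of_upcoming_stimulus, was_error, n_thresholds=101):
    """Sweep a threshold on the subjective probability of the upcoming
    stimulus; trials below the threshold are predicted to be errors.
    Returns false-positive and true-positive rates of the ROC curve."""
    p = np.asarray(p_of_upcoming_stimulus, dtype=float)
    was_error = np.asarray(was_error, dtype=bool)
    fpr, tpr = [], []
    for threshold in np.linspace(0.0, 1.0, n_thresholds):
        predicted_error = p < threshold
        tpr.append(np.mean(predicted_error[was_error]))    # hits
        fpr.append(np.mean(predicted_error[~was_error]))   # false alarms
    return np.array(fpr), np.array(tpr)
```

Sweeping the threshold from 0 to 1 traces the curve from the (0, 0) corner towards the (1, 1) corner.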

Kullback-Leibler divergence

We computed the KL-divergence between the ground truth probabilities of the task (1.0 for Pattern and 0.25 for Random trials) and the inferred internal model's subjective probabilities. For each trial, we computed:

$$\sum_i p_i \cdot \left( \log(p_i) - \log(\hat{p}_i) \right)$$

where i runs over the possible stimuli. Then, we took the mean of all these KL-divergences over the trials in the test sets for Days 2 and 8 for Fig 6G and 6H.

We did the same computation between the inferred Markov models' subjective probabilities and those of the internal models inferred by CT (y-axis in Fig 6G and 6H). For a proof of why the KL-divergence can be used as a measure of participants' task performance, see S2 Appendix.
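A direct implementation of this per-trial average could look as follows; the array shapes and the clipping (added only to avoid taking the logarithm of zero) are our assumptions.

```python
import numpy as np

def mean_kl(p_true, p_subjective, eps=1e-12):
    """Mean per-trial KL divergence between ground-truth next-stimulus
    probabilities and a model's subjective probabilities; both arguments
    are (n_trials, 4) arrays of probabilities."""
    p_true = np.asarray(p_true, dtype=float)
    p = np.clip(p_true, eps, 1.0)
    q = np.clip(np.asarray(p_subjective, dtype=float), eps, 1.0)
    kl_per_trial = np.sum(p_true * (np.log(p) - np.log(q)), axis=1)
    return float(np.mean(kl_per_trial))
```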

Supporting information

S1 Appendix. A brief introduction to infinite Hidden Markov Models.

(PDF)

S2 Appendix. Optimal prediction and the LATER model.

(PDF)

S1 Fig. Experimental design.

A Experimental stimuli and the abstract representation used in the paper. B Design of the experiment. The experiment consisted of ten sessions, separated by one-week delays. On Days 1–8, participants performed the ASRT task with sequence 1 throughout 25 blocks (5 epochs) in each session. On Day 9, an interfering sequence (sequence 2) was introduced. Both sequences were tested on Day 10, alternating in blocks of 5.

(TIF)

S2 Fig. Graphical representation of internal model and generative model of behaviour.

Left: Internal model, generative model of the sequence assumed by the participant. Right: generative model of behaviour.

(TIF)

S3 Fig. Synthetic data experiment.

A We first sampled three versions of synthetic internal models using the original iHMM inference method in [33]. The internal models of the synthetic participants differ in their experience (in how many ASRT trials they had seen), resulting in an "early", "middle" and "late" model. Then, we generated subjective probability values for each model on a new set of ASRT stimuli (holding the pattern sequence intact). B Results of our synthetic data experiment. Performance is measured as the amount of variance in response times explained (R2). We ran our inference method for 81 synthetic datasets with different parameter settings (symbols with different colours and shapes). We used the same number of response times as with the human participants to recover the internal models. Symbol colours correspond to the response time standard deviation. The result shows that while the response time prediction may be at a lower level, the latent predictive probabilities can still be inferred with relatively high accuracy. This shows that the inference method can recover the latent structure from a generated response time sequence. C Predictive performance (R2) of the actual internal model of the synthetic participant vs the predictive performance of the inferred internal model of the same synthetic participant. The inferred model is evaluated on the training datasets (same as in panel B). D Standard deviations of response times of individuals in the first eight experimental sessions.

(TIF)

S4 Fig. Comparison of quantiles of the (z-scored) r variable in the LATER model.

Quantiles computed from the response times and the predicted subjective probabilities are compared with quantiles of the expected normal distribution for the analysed models (red, CT; green, ideal observer; blue, Markov), also known as QQ-plots. The participant-by-participant comparison shows that the empirical distribution of the r parameter on a test set is approximately normal with a few exceptions (see participants 124 and 131, CT and ideal observer models), thus validating the model assumptions.

(TIF)

S5 Fig. Response time distributions.

A Response time samples generated from different models and the original data for participant 119. B Density plots of the point clouds in A. C Predicted response times (mean of maximum a posteriori estimates for each model, averaged over the model samples) vs actual response times. Response times outside the mean ±3 s.d. are omitted for visual clarity. In contrast with panel A, the x coordinates are best predictions rather than random samples, hence their spread is much smaller. D Histogram of model predictive performances on Day 8. E Box plots of model performance distributions; data same as in panel D.

(TIF)

S6 Fig. Predicted response time means vs. measured response time means.

Predictions are on the test set on Day 8 of the experiment grouped by three element sequences for each participant separately (each dot corresponds to one possible three-element sequence). Only those sequences were included which had at least 5 measured correct response times in order to limit the standard error over the measured response time mean. Error bars show 2 s.e.m.

(TIF)

S7 Fig. Predicting when errors will occur for each participant individually.

(TIF)

S8 Fig. Model performances of CT models trained on Day 8 and Day 9 for each individual.

(TIF)

S9 Fig. Model performances for all models and all participants individually.

(TIF)

S10 Fig. Normalized CT performance as a function of the ideal observer model performance on different days of the experiment.

Dots indicate the performance of the models for different individuals.

(TIF)

S11 Fig. RT distribution examples of synthetic participants.

Each panel shows distributions with RT model parameters sampled from their respective priors. Distributions are shown as violin plots as a function of predictive probabilities.

(TIF)

Acknowledgments

Resources for the computational analysis were generously provided by the Wigner Data Center. The authors would like to thank Máté Lengyel, Peter Dayan and Noémi Éltető for comments on an earlier version of the manuscript.

Data Availability

The code used for this paper is available at https://www.github.com/mzperix/asrt-beamsampling. The repository contains links to the experimental data as well as the data used to generate the figures.

Funding Statement

This research was supported by the National Brain Research Program (project 2017-1.2.1-NKP-2017-00002, D.N., G.O.); Hungarian Scientific Research Fund (NKFIH-OTKA K 125343, G.O.; NKFIH-OTKA K 128016, D.N.; NKFIH-OTKA PD 124148, K.J.); Janos Bolyai Research Fellowship of the Hungarian Academy of Sciences (K.J.); IDEXLYON Fellowship of the University of Lyon as part of the Programme Investissements d'Avenir (ANR-16-IDEX-0005) (D.N.). B.T. was supported by a scholarship from the Budapest University of Technology and Economics as well as by Mozaik Education Ltd. (Szeged, Hungary). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Tenenbaum JB, Kemp C, Griffiths TL, Goodman ND. How to Grow a Mind: Statistics, Structure, and Abstraction. Science. 2011;331(6022):1279–1285. doi: 10.1126/science.1192788 [DOI] [PubMed] [Google Scholar]
  • 2. Sutton RS. Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. In: Machine learning proceedings 1990. Elsevier; 1990. p. 216–224. [Google Scholar]
  • 3. Yang SCH, Lengyel M, Wolpert DM. Active sensing in the categorization of visual patterns. Elife. 2016;5:e12215. doi: 10.7554/eLife.12215 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4. Yuille A, Kersten D. Vision as Bayesian inference: analysis by synthesis? Trends in cognitive sciences. 2006;10(7):301–308. doi: 10.1016/j.tics.2006.05.002 [DOI] [PubMed] [Google Scholar]
  • 5. Wolpert DM, Ghahramani Z, Jordan MI. An internal model for sensorimotor integration. Science. 1995;269(5232):1880–1882. doi: 10.1126/science.7569931 [DOI] [PubMed] [Google Scholar]
  • 6. Berkes P, Orbán G, Lengyel M, Fiser J. Spontaneous cortical activity reveals hallmarks of an optimal internal model of the environment. Science. 2011;331(6013):83–87. doi: 10.1126/science.1195870 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7. Courville AC, Daw ND, Touretzky DS. Bayesian theories of conditioning in a changing world. Trends in cognitive sciences. 2006;10(7):294–300. doi: 10.1016/j.tics.2006.05.004 [DOI] [PubMed] [Google Scholar]
  • 8. Sobel DM, Tenenbaum JB, Gopnik A. Children’s causal inferences from indirect evidence: Backwards blocking and Bayesian reasoning in preschoolers. Cognitive science. 2004;28(3):303–333. doi: 10.1207/s15516709cog2803_1 [DOI] [Google Scholar]
  • 9. Battaglia PW, Hamrick JB, Tenenbaum JB. Simulation as an engine of physical scene understanding. Proceedings of the National Academy of Sciences. 2013;110(45):18327–32. doi: 10.1073/pnas.1306572110 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10. da Silva CF, Hare TA. Humans primarily use model-based inference in the two-stage task. Nature Human Behaviour. 2020;4:1053–1066. doi: 10.1038/s41562-020-0905-y [DOI] [PubMed] [Google Scholar]
  • 11. Glaze CM, Filipowicz ALS, Kable JW, Balasubramanian V, Gold JI. A bias-variance trade-off governs individual differences in on-line learning in an unpredictable environment. Nature Human Behaviour. 2018;2:213–224. doi: 10.1038/s41562-018-0297-4 [DOI] [Google Scholar]
  • 12. Ackerman PL. Individual differences in skill learning: An integration of psychometric and information processing perspectives. Psychological bulletin. 1987;102(1):3. doi: 10.1037/0033-2909.102.1.3 [DOI] [Google Scholar]
  • 13. Feldman J. Tuning your priors to the world. Topics in cognitive science. 2013;5(1):13–34. doi: 10.1111/tops.12003 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14. Beck JM, Ma WJ, Pitkow X, Latham PE, Pouget A. Not noisy, just wrong: the role of suboptimal inference in behavioral variability. Neuron. 2012;74(1):30–39. doi: 10.1016/j.neuron.2012.03.016 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15. Rahnev D, Denison RN. Suboptimality in Perceptual Decision Making. Behavioral and brain sciences. 2018;41:e223: 1–66. doi: 10.1017/S0140525X18000936 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16. Song M, Bnaya Z, Ma WJ. Sources of suboptimality in a minimalistic explore-exploit task. Nature Human Behaviour. 2019;3:361–368. doi: 10.1038/s41562-019-0564-z [DOI] [PubMed] [Google Scholar]
  • 17. Roach NW, McGraw PV, Whitaker DJ, Heron J. Generalization of prior information for rapid Bayesian time estimation. Proceedings of the National Academy of Sciences. 2017;114:412–417. doi: 10.1073/pnas.1610706114 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18. Gekas N, Chalk M, Seitz AR, Seriès P. Complexity and specificity of experimentally-induced expectations in motion perception. Journal of Vision. 2013;13:1–18. doi: 10.1167/13.4.8 [DOI] [PubMed] [Google Scholar]
  • 19. Acerbi L, Vijayakumar S, Wolpert DM. On the Origins of Suboptimality in Human Probabilistic Inference. PLoS Computational Biology. 2014;10:e1003661. doi: 10.1371/journal.pcbi.1003661 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20. Drugowitsch J, Wyart V, Devauchelle AD, Koechlin E. Computational Precision of Mental Inference as Critical Source of Human Choice Suboptimality. Neuron. 2016;92:1398–1411. doi: 10.1016/j.neuron.2016.11.005 [DOI] [PubMed] [Google Scholar]
  • 21. Love BC, Medin DL, Gureckis TM. SUSTAIN: a network model of category learning. Psychological Review. 2004;111:309–332. doi: 10.1037/0033-295X.111.2.309 [DOI] [PubMed] [Google Scholar]
  • 22. Gershman SJ, Niv Y. Perceptual estimation obeys Occam’s razor. Frontiers in Psychology. 2013;4:623. doi: 10.3389/fpsyg.2013.00623 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Nagy DG, Török B, Orbán G. Semantic Compression of Episodic Memories. In: Proceedings of the 40th Conference of the Cognitive Science Society; 2018. p. 2138–2143.
  • 24. Berniker M, Voss M, Kording K. Learning priors for Bayesian computations in the nervous system. PloS one. 2010;5:e12686. doi: 10.1371/journal.pone.0012686 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25. Griffiths TL, Chater N, Kemp C, Perfors A, Tenenbaum JB. Probabilistic models of cognition: exploring representations and inductive biases. Trends in Cognitive Sciences. 2010;14:357–364. doi: 10.1016/j.tics.2010.05.004 [DOI] [PubMed] [Google Scholar]
  • 26. Sanborn A, Griffiths TL. Markov chain Monte Carlo with people. In: Advances in neural information processing systems; 2008. p. 1265–1272. [Google Scholar]
  • 27. Houlsby NMT, Huszár F, Ghassemi MM, Orbán G, Wolpert DM, Lengyel M. Cognitive Tomography Reveals Complex, Task-Independent Mental Representations. Current Biology. 2013;23(21):2169–2175. doi: 10.1016/j.cub.2013.09.012 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28. Langlois TA, Jacoby N, Suchow JW, Griffiths TL. Serial reproduction reveals the geometry of visuospatial representations. Proceedings of the National Academy of Sciences. 2021;118(13). doi: 10.1073/pnas.2012938118 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29. Glaze CM, Kable JW, Gold JI. Normative evidence accumulation in unpredictable environments. eLife. 2015;4:e08825. doi: 10.7554/eLife.08825 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30. Urai AE, Braun A, Donner TH. Pupil-linked arousal is driven by decision uncertainty and alters serial choice bias. Nature Communications. 2017;8:14637. doi: 10.1038/ncomms14637 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31. Braun A, Urai AE, Donner TH. Adaptive History Biases Result from Confidence-Weighted Accumulation of past Choices. Journal of Neuroscience. 2018;38(10):2418–2429. doi: 10.1523/JNEUROSCI.2189-17.2017 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32. Mathys CD, Lomakina EI, Daunizeau J, Iglesias S, Brodersen KH, Friston KJ, et al. Uncertainty in perception and the Hierarchical Gaussian Filter. Frontiers in Human Neuroscience. 2014;8:825. doi: 10.3389/fnhum.2014.00825 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Gael JV, Saatci Y, Teh YW, Ghahramani Z. Beam Sampling for the Infinite Hidden Markov Model. Proceedings of the 25th international conference on Machine learning. 2008; p. 1088–1095.
  • 34. Carpenter R, Williams M. Neural computation of log likelihood in control of saccadic eye movements. Nature. 1995;377:59–62. doi: 10.1038/377059a0
  • 35. Howard JH, Howard DV. Age differences in implicit learning of higher order dependencies in serial patterns. Psychology and Aging. 1997;12(4):634–656. doi: 10.1037/0882-7974.12.4.634
  • 36. Noorani I, Carpenter R. The LATER model of reaction time and decision. Neuroscience and Biobehavioral Reviews. 2016;64:229–251. doi: 10.1016/j.neubiorev.2016.02.018
  • 37. Carpenter B, Gelman A, Hoffman MD, Lee D, Goodrich B, Betancourt M, et al. Stan: A probabilistic programming language. Journal of Statistical Software. 2017;76(1). doi: 10.18637/jss.v076.i01
  • 38. Collins A, Koechlin E. Reasoning, learning, and creativity: frontal lobe function and human decision-making. PLoS Biology. 2012;10(3):e1001293. doi: 10.1371/journal.pbio.1001293
  • 39. Gershman SJ, Norman KA, Niv Y. Discovering latent causes in reinforcement learning. Current Opinion in Behavioral Sciences. 2015;5:43–50. doi: 10.1016/j.cobeha.2015.07.007
  • 40. Gershman SJ, Radulescu A, Norman KA, Niv Y. Statistical computations underlying the dynamics of memory updating. PLoS Computational Biology. 2014;10(11):e1003939. doi: 10.1371/journal.pcbi.1003939
  • 41. Song S, Howard JH, Howard DV. Implicit probabilistic sequence learning is independent of explicit awareness. Learning & Memory. 2007;14:167–176. doi: 10.1101/lm.437407
  • 42. Kóbor A, Horváth K, Kardos Z, Takács Á, Janacsek K, Csépe V, et al. Tracking the implicit acquisition of nonadjacent transitional probabilities by ERPs. Memory & Cognition. 2019;47:1546–1566. doi: 10.3758/s13421-019-00949-x
  • 43. Gershman SJ, Blei DM. A tutorial on Bayesian nonparametric models. Journal of Mathematical Psychology. 2012;56(1):1–12. doi: 10.1016/j.jmp.2011.08.004
  • 44. MacKay DJC. Information Theory, Inference and Learning Algorithms. Cambridge University Press; 2003.
  • 45. Griffiths TL, Chater N, Kemp C, Perfors A, Tenenbaum JB. Probabilistic models of cognition: exploring representations and inductive biases. Trends in Cognitive Sciences. 2010;14(8):357–364. doi: 10.1016/j.tics.2010.05.004
  • 46. Mitchell TM. The Need for Biases in Learning Generalizations. Readings in Machine Learning. 1980;(CBM-TR-117):184–191.
  • 47. Richards BA, Lillicrap TP, Beaudoin P, Bengio Y, Bogacz R, Christensen A, et al. A deep learning framework for neuroscience. Nature Neuroscience. 2019;22(11):1761–1770. doi: 10.1038/s41593-019-0520-2
  • 48. Botvinick M, Ritter S, Wang JX, Kurth-Nelson Z, Blundell C, Hassabis D. Reinforcement Learning, Fast and Slow. Trends in Cognitive Sciences. 2019;23:408–422. doi: 10.1016/j.tics.2019.02.006
  • 49. Wang JX. Meta-learning in natural and artificial intelligence. Current Opinion in Behavioral Sciences. 2021;38:90–95. doi: 10.1016/j.cobeha.2021.01.002
  • 50. Elteto N, Nemeth D, Janacsek K, Dayan P. Tracking human skill learning with a hierarchical Bayesian sequence model. bioRxiv. 2022.
  • 51. Wu Z, Schrater P, Pitkow X. Inverse Rational Control: Inferring What You Think from How You Forage. arXiv. 2018;1805.09864.
  • 52. Schuck NW, Cai MB, Wilson RC, Niv Y. Human Orbitofrontal Cortex Represents a Cognitive Map of State Space. Neuron. 2016;91:1402–1412. doi: 10.1016/j.neuron.2016.08.019
  • 53. Barthelmé S, Mamassian P. Evaluation of objective uncertainty in the visual system. PLoS Computational Biology. 2009;5:e1000504. doi: 10.1371/journal.pcbi.1000504
  • 54. Bach DR, Dolan RJ. Knowing how much you don’t know: a neural organization of uncertainty estimates. Nature Reviews Neuroscience. 2012;13:572–586. doi: 10.1038/nrn3289
  • 55. Michael E, de Gardelle V, Nevado-Holgado A, Summerfield C. Unreliable evidence: 2 sources of uncertainty during perceptual choice. Cerebral Cortex. 2015;25:935–947. doi: 10.1093/cercor/bht287
  • 56. Pouget A, Drugowitsch J, Kepecs A. Confidence and certainty: distinct probabilistic quantities for different goals. Nature Neuroscience. 2016;19:366–374. doi: 10.1038/nn.4240
  • 57. Lake BM, Salakhutdinov R, Tenenbaum JB. Human-level concept learning through probabilistic program induction. Science. 2015;350(6266):1332–1338. doi: 10.1126/science.aab3050
  • 58. Kemp C, Tenenbaum JB. The discovery of structural form. Proceedings of the National Academy of Sciences of the United States of America. 2008;105(31):10687–10692. doi: 10.1073/pnas.0802631105
  • 59. Saxe AM, McClelland JL, Ganguli S. A mathematical theory of semantic development in deep neural networks. Proceedings of the National Academy of Sciences. 2019;116:11537–11546. doi: 10.1073/pnas.1820226116
  • 60. Braun DA, Mehring C, Wolpert DM. Structure learning in action. Behavioural Brain Research. 2010;206:157–165. doi: 10.1016/j.bbr.2009.08.031
  • 61. Goodman ND, Ullman TD, Tenenbaum JB. Learning a theory of causality. Psychological Review. 2011;118(1):110–119. doi: 10.1037/a0021336
  • 62. Austerweil JL, Griffiths TL. A nonparametric Bayesian framework for constructing flexible feature representations. Psychological Review. 2013;120:817–851. doi: 10.1037/a0034194
  • 63. Orbán G, Fiser J, Aslin RN, Lengyel M. Bayesian learning of visual chunks by human observers. Proceedings of the National Academy of Sciences of the United States of America. 2008;105:2745–2750. doi: 10.1073/pnas.0708424105
  • 64. Griffiths TL, Tenenbaum JB. Optimal predictions in everyday cognition. Psychological Science. 2006;17(9):767–773. doi: 10.1111/j.1467-9280.2006.01780.x
  • 65. Gold JI, Stocker AA. Visual Decision-Making in an Uncertain and Dynamic World. Annual Review of Vision Science. 2017;3:227–250. doi: 10.1146/annurev-vision-111815-114511
  • 66. van Opheusden B, Galbiati G, Kuperwajs I, Bnaya Z. Revealing the impact of expertise on human planning with a two-player board game. PsyArXiv. 2021.
  • 67. Ratcliff R, Smith PL, Brown SD, McKoon G. Diffusion Decision Model: Current Issues and History. Trends in Cognitive Sciences. 2016;20:260–281. doi: 10.1016/j.tics.2016.01.007
  • 68. Drugowitsch J, Moreno-Bote RN, Churchland AK, Shadlen MN, Pouget A. The cost of accumulating evidence in perceptual decision making. Journal of Neuroscience. 2012;32(11):3612–3628. doi: 10.1523/JNEUROSCI.4010-11.2012
  • 69. Nosofsky RM, Little DR, Donkin C, Fific M. Short-Term Memory Scanning Viewed as Exemplar-Based Categorization. Psychological Review. 2011;118(2):280–315. doi: 10.1037/a0022494
  • 70. Palmer J, Huk AC, Shadlen MN. The effect of stimulus strength on the speed and accuracy of a perceptual decision. Journal of Vision. 2005;5(5):376–404. doi: 10.1167/5.5.1
  • 71. Smith PL, Ratcliff R. An Integrated Theory of Attention and Decision Making in Visual Signal Detection. Psychological Review. 2009;116(2):283–317. doi: 10.1037/a0015156
  • 72. Ossmy O, Moran R, Pfeffer T, Tsetsos K, Usher M, Donner TH. The timescale of perceptual evidence integration can be adapted to the environment. Current Biology. 2013;23:981–986. doi: 10.1016/j.cub.2013.04.039
  • 73. Gold JI, Shadlen MN. Neural computations that underlie decisions about sensory stimuli. Trends in Cognitive Sciences. 2001;5:10–16. doi: 10.1016/S1364-6613(00)01567-9
  • 74. Hanes DP, Schall JD. Neural control of voluntary movement initiation. Science. 1996;274:427–430. doi: 10.1126/science.274.5286.427
  • 75. Marr D. Vision: A Computational Investigation into the Human Representation and Processing of Visual Information. USA: Henry Holt and Co., Inc.; 1982.
  • 76. Talluri BC, Urai AE, Tsetsos K, Usher M, Donner TH. Confirmation Bias through Selective Overweighting of Choice-Consistent Evidence. Current Biology. 2018;28:3128–3135. doi: 10.1016/j.cub.2018.07.052
  • 77. Gold JI, Shadlen MN. The Neural Basis of Decision Making. Annual Review of Neuroscience. 2007;30(1):535–574. doi: 10.1146/annurev.neuro.29.051605.113038
  • 78. Janacsek K, Nemeth D. Implicit sequence learning and working memory: Correlated or complicated? Cortex. 2013;49(8):2001–2006. doi: 10.1016/j.cortex.2013.02.012
  • 79. Nemeth D, Janacsek K, Londe Z, Ullman MT, Howard DV, Howard JH. Sleep has no critical role in implicit motor sequence learning in young and old adults. Experimental Brain Research. 2010;201(2):351–358. doi: 10.1007/s00221-009-2024-x
  • 80. Kóbor A, Janacsek K, Takács Á, Nemeth D. Statistical learning leads to persistent memory: Evidence for one-year consolidation. Scientific Reports. 2017;7(1):760. doi: 10.1038/s41598-017-00807-3
  • 81. Song S, Howard JH, Howard DV. Implicit probabilistic sequence learning is independent of explicit awareness. Learning and Memory. 2007;14(3):167–176. doi: 10.1101/lm.437407
  • 82. Teh YW, Jordan MI, Beal MJ, Blei DM. Hierarchical Dirichlet processes. Journal of the American Statistical Association. 2006;101(476):1566–1581.
  • 83. Kim TD, Kabir M, Gold JI. Coupled Decision Processes Update and Maintain Saccadic Priors in a Dynamic Environment. Journal of Neuroscience. 2017;37(13):3632–3645. doi: 10.1523/JNEUROSCI.3078-16.2017
  • 84. Brown SD, Heathcote A. The simplest complete model of choice response time: Linear ballistic accumulation. Cognitive Psychology. 2008;57:153–178. doi: 10.1016/j.cogpsych.2007.12.002
  • 85. Harris CM, Waddington J, Biscione V, Manzi S. Manual choice reaction times in the rate-domain. Frontiers in Human Neuroscience. 2014;8:418. doi: 10.3389/fnhum.2014.00418
PLoS Comput Biol. doi: 10.1371/journal.pcbi.1010182.r001

Decision Letter 0

Samuel J Gershman, Lusha Zhu

18 Jan 2022

Dear Mr Nagy,

Thank you very much for submitting your manuscript "Tracking the contribution of inductive bias to individualized internal models" for consideration at PLOS Computational Biology.

As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. In light of the reviews (below this email), we would like to invite the resubmission of a significantly-revised version that takes into account the reviewers' comments.

This is an interesting, well-posed and carefully conducted study. The reviewers found the proposed method of inferring the internal model from RTs using the iHMM to be novel and relevant. At the same time, they all asked for clearer explanations and more detail in the Methods and Results.

We cannot make any decision about publication until we have seen the revised manuscript and your response to the reviewers' comments. Your revised manuscript is also likely to be sent to reviewers for further evaluation.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Please prepare and submit your revised manuscript within 60 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email. Please note that revised manuscripts received after the 60-day due date may require evaluation and peer review similar to newly submitted manuscripts.

Thank you again for your submission. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Lusha Zhu, Ph.D.

Associate Editor

PLOS Computational Biology

Samuel Gershman

Deputy Editor

PLOS Computational Biology

***********************

Reviewer's Responses to Questions

Comments to the Authors:

Reviewer #1: Review of "Tracking the contribution of inductive bias to individualized internal models"

Reviewer: Michael Landy

This is quite a fascinating paper, and I was quite impressed with the ability to infer internal models from reaction-time data alone. I think this paper is a substantial contribution to the literature and well worth publishing. I come to this paper with some background and my own work on sequential effects, but I was unfamiliar with the background on which the work was based (beam sampling, iHMMs and how to infer them). As such, I think that this paper would benefit from a bit more of a clear tutorial and clarification, and the bulk of my comments below are about making the paper easier to read for the uninitiated.

Specifics (by line number, mostly):

Title: It's interesting that the title stresses inductive bias, whereas the bulk of the paper is about inferring internal models, and the bit about inductive bias, while quite interesting, only comes up at the very tail end of the Results section. I'd think a title that covers the rest would make more sense. The abstract also only talks about inductive bias in the final sentence, which is appropriate.

Figure 2: The legend's description of 2C doesn't match the figure at all ("dots" versus what else? There are no left vs. right panels. What coloured labels are you referring to?).

155-157: This section (and probably some of the earlier text) leads the reader to think that you'll be studying the online learning of a model, and in particular, I thought at this point that you'd be applying the inference to multiple temporal sections of a session to watch that evolution. I only learned much later that your analysis treats the internal model as if it's stable and fixed within a session and, to the extent that you look at the dynamics of learned internal models, you do so at a much slower time scale (across sessions, i.e., across days). Reading the introduction and thinking about the task, I imagined there would be learning within a session and that you'd somehow track that with your inference method. So, I suggest clueing the reader in on what's coming earlier on in the manuscript, as I was disappointed when I learned that the fit of the model using 10 successive blocks was performed only once per session, and was validated by predicting EARLIER blocks. That seemed particularly weird for session 1, when there was little chance that the internal model would be at all stable during the early blocks. I'm surprised there was no discussion at all about within-session, trial-by-trial learning.

176: "Participant-averaged performance of the ideal observer": At this point in the text, unless the reader goes off and reads the Methods carefully, it's not at all clear to the reader what the t-test is comparing and what is meant by performance. The phrase I've quoted would norrmally mean something about how well the ideal observer performs the task. But, by "performance" you mean how well the ideal observer correlates with the human RT data, i.e., the quality of its predictions. This should be rephrased and clarified at this point in the text.

Table 1: I think it would be worthwhile to clarify where the numbers (5/8, 1/8) come from for the trigram model, i.e., half the time the trigram is applied beginning at a pattern trial and its prediction could be perfect, and half the time it's aligned with a random trial and it can't predict at all. Not complicated, but worth pointing out explicitly.
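Spelled out under that reading (alternating pattern and random trials, four equiprobable elements on random trials), the arithmetic would be:

$$P(\text{pattern-consistent continuation}) = \tfrac{1}{2}\cdot 1 + \tfrac{1}{2}\cdot\tfrac{1}{4} = \tfrac{5}{8}, \qquad P(\text{each other continuation}) = \tfrac{1}{2}\cdot 0 + \tfrac{1}{2}\cdot\tfrac{1}{4} = \tfrac{1}{8}.$$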

Figure 3: Violin plot: Is CT guaranteed to have better performance here (since it's more flexible), or is the plotted performance from the blocks that weren't fit (the earlier blocks in the session)? Also, in the legend, "mode" -> "model".

190: By "better internal model", do you mean closer to the actual generative model or do you mean closer to the flawed model the participant is actually using? The word "better" is best-suited to the former, while I think you mean the latter.

203: I like this analysis of the error trials. But, if you are interested in the predictions made by observers, why not run a version of the experiment in which some trials require prediction (i.e., the task switches from simple RT to prediction, where the task is to say in advance what color is coming up, then get feedback afterward on that trial)? That would be more informative and wouldn't require you to do the analysis only on rare error trials.

228-229: It's not clear what these t-tests are testing/comparing (what the CIs are intervals of, especially since you don't say what the means are here). It's also unclear how participants notice an unsignalled change in sequence statistics fast enough for this to work, although of course it DOES work. As I said, the whole manuscript treats internal models as if they are fixed and stable and, here, as if they are instantaneously swapped in and then stable. That's never discussed overtly nor justified, even though it seems, well, counterintuitive or almost certainly false.

Figure 4, and perhaps others: Minor point, but I read and review papers on my iPad (using iAnnotate) and a whole bunch of 4E, including all the data, was not visible. I'm guessing the figure has "layers" and they fooled my iPad's PDF viewer. In general, you will likely want to flatten your figures (merge the layers) before final submission; that's safer. The legend states that stars mark significant differences, but I don't see any stars in the figure.

237: There is no Suppl. Fig. S5B

267: S6 -> S8

Figure 5A: Shouldn't the ideal observer "outperform" the trigram model, because the ideal observer can infer the current phase (i.e., whether the current trial is a random or a pattern trial), whereas the trigram model can't make that distinction? That doesn't mean it will correlate better with human observers, but again it might be interesting to separately analyze those two types of trials. Yes, later on in the manuscript you do something about this distinction.

Figure 6C: What are the datapoints in the lower left? Extrema? Never mentioned in the legend.

Legend Figure 6: "accounted for" in part B is about summing R^2 values, which you justify in the running text (as orthogonal elements). But, I thought model "performance" in these graphs was correlation (i.e., R, not R^2), so strictly they shouldn't sum. Please clarify. Also "advantage of normalized CT performance over the ideal observer model". Shouldn't that be over the ideal observer plus the Markov model?

520: Is the starting phase of the sequence fixed or randomized across blocks? Across sessions?

537: The "correct" HMM is a ring of 8 states, half putting out a uniform distribution and half with deterministic output. You never state this explicitly, although it's intrinsic in your discussion of Figure 5F (which is an awesome figure, although I'm not sure how you pull a single inferred model from the inference, when the inference presumably provides a posterior across possible models). I'd think it would be worthwhile to point out that correct model. Even armed with that model, the ideal observer would start out with a distribution across states, and would only lock in after a few trials even if it knew the model with no model uncertainty.

544: \\phi_{s,y} -> \\phi_{S_t,y}

547: problem in 1 -> problem in Eq. 1

548-550: I found these two sentences obscure and even self-contradictory. Please clarify.

558: "uncertainty about the true model AND ACTUAL STATE OF THE STIMULI": I don't understand that latter phrase. In this paper, there is no visual uncertainty.

572 and nearby: I read the Methods early on just after starting to read Results. And, at this point in the Methods I assumed the pseudocode here was applied to groups of 10 blocks, then applied to a slightly later group of 10 blocks, and so on, to understand the dynamics within a session. It was only much later in my reading that I learned that it was only applied to a single, fixed stretch of 10 blocks per session. It would be good to clarify the approach earlier.

588: Might be worth saying "countably infinite"

590: Fig. ???

608: "(e.g., one session)". First, this is the first time I learned that you assumed no learning was going on within a session (which is a bizarre assumption, although required to make this feasible). Second, the fitting is only for 10 blocks (40% of a session), so you don't really assume it's fixed for the whole session, although since you apply the estimated model to predict a different bit of the session, I guess you are assuming it's fixed for most of the session.

Equation after 610: This bit of notation can be confusing. First, you switch mid-equation from event "S_t = s_t" to shorthand "s_t", even though they mean the same thing (I'd just use the latter). Second, there is a stray proportional-to symbol (∝) at the end of the first line. Third, the notation \\hat{s}^t will, for most readers, make them assume this refers to a point estimate of the state at time t. But, you seem to mean it to be an estimate of the probability that "s_t" is the state (given the current and past stimuli). That's a weird form of notation and will confuse people.

611: "For filtering": Here I betray my outsider status: I'm not sure what "filtering" means in this community, so I was confused here. Also, it's worth reminding the reader what is meant by "even if responses times are not considered...". What you mean is that they aren't used in the performance metrics, but are used for model fitting, right?

Table 2: The column headers here are completely messed up, so I couldn't really check/parse this table. Also, what are all the "[1]"s supposed to mean? Looks like some latex formatting got into the table itself.

616-619: Here, at last, is where you finally explicitly state what you are doing about the dynamics, without previewing earlier or justifying that this is a sensible thing to do.

Table 3: This table also has similar messed-up formatting to Table 2. Is \\alpha_0 the same as \\alpha? Where is \\gamma used? What is the "\\cdot \\varepsilon" doing in the definition of \\hat{\\pi}?

631-635: I do STAN model-fitting in my lab, but you are clearly doing something fancier than I've done, so I can't really parse this paragraph (and am unfamiliar with NUTS ;^)

Equation after 642: Why are you using capital letters for \\phi and y all of a sudden?

Eq. after 659: Yes, I think this is basically the LATER formulation (I've used it in one paper, but didn't go back and check). The one weird thing about this formulation is that theoretically an outlier value of r_n could be negative, which would be a problem. For that matter, it could even be zero or very small (i.e., huge tails on the RT distribution).
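A minimal simulation makes the concern concrete, assuming the standard LATER parameterisation (rate of rise r drawn from a normal distribution, response time equal to threshold over rate; all parameter values below are illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    mu, sigma, theta = 3.0, 1.0, 1.0   # rate mean, rate s.d., decision threshold

    r = rng.normal(mu, sigma, size=100_000)   # per-trial rate of rise
    rt = theta / r                            # LATER: response time = threshold / rate

    # With unbounded normal support, some sampled rates are zero or negative,
    # yielding negative or infinite "response times" (~0.13% of trials here).
    print((r <= 0).mean())
    rt_valid = rt[r > 0]                      # one common fix: discard or truncate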

679: Running it on a simulated observer is obviously a good thing to do. But, did the analysis recover the model that the simulated observer was using? How would you know? How would you score how well it did at recovering the internal model?

686: "below the threshold...": What was that threshold and how was it determined?

733: It's only in this line that you talk about explicitly the issue of models changing under the hood in early trials of a session.

742 et seq.: I was confused by this section and by the corresponding Results text. First, in this section it wasn't clear what was meant by an error (a wrong button press, of course). It would have been nice to clarify up front that here you would ask whether such finger errors relate to the posterior probabilities of each possible stimulus, both in terms of missing the correct button because it's less likely, and in terms of which button you hit instead, because its probability is high. The main Results text didn't make this clear either. Finally, the ROC idea really is kind of nonsensical, i.e., it doesn't relate clearly as a process model of how finger errors are made.

S2A: The Table in the lower-left of this panel might as well have actual Greek letters as the row headings ;^)

S2B: Again, the labels on the axes should be explained. Is this an R or an R^2? In the legend description of A it says "symbols with different colors and shapes", but this is in panel B, not A.

S6: Are there 64 points in each plot corresponding to the different trigrams?

Reviewer #2: The authors investigated an implicit visuomotor sequence learning task and developed a computational method to reverse-engineer participants’ internal models of the serial dependence through their reaction times (RTs). One major novelty of their method was the use of the infinite hidden Markov model (iHMM) to capture the potentially infinite space of serial patterns that participants might acquire. The authors found that, in explaining participants’ variation in response times, this iHMM model (which was called the CT model in the paper) outperformed both the ideal observer model that follows the ground-truth transition rules and a few Markov models that only track the transitions between observable states. The explanatory power of the different models changed over the training process, with the earlier stage better approximated by a first-order Markov model and the later stages better by the ideal observer model and a second-order Markov model (i.e., the trigram model). The authors concluded that the failure to achieve the ground-truth internal model, as well as its individual differences, resulted from specific inductive biases, in particular, a prior belief in first-order Markov transitions.

I think the application of iHMM to modeling human participants’ internal models is novel and insightful. The work is also technically solid. The writing is overall elegant and clear.

But I also have some concerns about the major conclusions of the paper, which I shall specify below.

Major concerns:

1. What parameters of the CT model characterize individual participants’ inductive biases (prior beliefs)? In the CT model, participants were assumed to update their prior beliefs from time to time in a Bayesian way. The deviation of their behaviors from the ideal observer’s depends on their priors. Before reading through the paper, I had thought it would be the hyper-parameters that differed between different individual participants. But when I came to the Methods section, I found that the same set of hyper-parameters was used for modeling all participants’ internal models. Then what contributes to the individual differences in the learned internal models (e.g., Fig. 5F)? Did the individual differences just reflect some random variations in participants’ Bayesian inference? Or, did I miss anything?

2. The authors concluded that the (first-order) Markov model is part of participants’ inductive biases. I was wondering how this conclusion could be compatible with the best-fitting model, the CT model (i.e., the iHMM internal model). Links should be made between the internal model predicted by the CT model at early training stages and those of the Markov model. For example, Markov-like internal models might be emergent properties of the iHMM after limited learning experience. Moreover, if so, could it still be claimed that the Markov model constitutes participants’ inductive biases?

3. The authors showed that the trigram model, a model with second-order serial dependence, could not explain specific features in participants’ RTs (Fig. 6E). But how about a “quadgram” model with third-order serial dependence? Considering that it only involves 4*4*4*4 = 256 possibilities, while there were 85 trials/block * 25 blocks = 2125 trials per session and a total of 8 training sessions, it is not a crazy idea that participants might acquire third-order serial dependence. Besides, such a quadgram model seems to be more computationally tractable than the iHMM.
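To make the tractability point concrete, a count-based quadgram learner with symmetric Dirichlet smoothing can be sketched in a few lines (a hypothetical illustration, not one of the paper's models; all names are made up):

    import numpy as np

    n_elements = 4
    alpha = 1.0  # symmetric Dirichlet pseudocount for smoothing
    counts = np.full((n_elements,) * 4, alpha)  # 4^4 = 256 cells

    def update(counts, seq):
        # Accumulate third-order transition counts from a stimulus sequence.
        for a, b, c, d in zip(seq, seq[1:], seq[2:], seq[3:]):
            counts[a, b, c, d] += 1

    def predict(counts, a, b, c):
        # Posterior-mean predictive distribution over the next element.
        row = counts[a, b, c]
        return row / row.sum()

    seq = np.random.default_rng(1).integers(0, n_elements, size=2125)  # one session
    update(counts, seq)
    print(predict(counts, 0, 1, 2))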

4. Could there be any model-free plots and descriptions of the RT results?

5. Working memory capacity had been measured for each participant. Did it correlate with participants’ task performances or their internal models?

Minor issues:

1. I agree that “Cognitive Tomography” (CT) is a cool term. However, I do not think the “CT model” is an appropriate term for the specific CT model based on iHMM, because all the other models in the paper share the same CT framework (internal model + response model). Something like the “iHMM model” might be better.

2. Lines 208–214 and Figure 4A: It seems meaningless to compare the model-predicted proportion of top rank to the chance level. If one model has an overall higher accuracy to predict the incoming stimulus than the other model, its proportion of top rank would be naturally higher than the latter for both correct and incorrect responses. What matters is the discriminability of the proportion of top rank between correct and incorrect responses, such as the ROC reported in Figure 4C.

Line 212: “However, it also assigned the highest probability to the upcoming stimulus in incorrect trials (0.315, n = 2777, p = 1). ” The statistics in the parentheses do not seem to support the statement.
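A sketch of the discriminability analysis suggested in point 2 above, treating the model's probability for the upcoming stimulus as a score for separating correct from incorrect trials (the arrays are simulated placeholders, not the authors' data):

    import numpy as np
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(2)
    is_correct = rng.integers(0, 2, size=1000)    # placeholder response labels
    p_upcoming = np.clip(0.25 + 0.2 * is_correct  # placeholder model probabilities
                         + 0.1 * rng.normal(size=1000), 0, 1)

    # AUC > 0.5 means the model assigns systematically higher probability to the
    # upcoming stimulus on correct trials, independently of the model's overall
    # level of predictive accuracy.
    print(roc_auc_score(is_correct, p_upcoming))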

3. I could not quite understand what Figure 4E could tell us. It seems that every model (not necessarily the CT model) could have better predictions for participants’ behaviors when the test statistics were more similar to the training statistics.

I was even more puzzled when I came to Line 616, which reads “throughout the execution of the task, the internal model of the participants is continually updating.” If the updating modeled by the CT model was close to participants’ actual updating, shouldn’t we see similar model performance (R^2) in different test sessions, no matter whether the test statistics were similar to the training statistics or not?

4. Some important details about modeling fitting and comparison methods should be made more explicit in the main text. For example, (1) whether each session of each participant was fitted separately, (2) whether cross-validation was used, and (3) for cross-validation, which parts of data were used as the training set and which as the test set.

Some of these details seem to be described in Table 2, but Table 2 is mis-placed in format and hard to follow.

Line 565: “ten consecutive blocks of trials”. Why ten blocks? Weren’t there 25 blocks in each session?

Line 572: “For each of the 60 posterior model samples”. What does the “60” mean?

5. Line 567: “response times smaller than 180 msecs in each block removed”. What percent of trials were removed?

6. Line 7: “intuitive psychology” seems to be rarely used in the literature. The term “folk psychology” is more common.

7. Finding participants’ internal models to deviate from an ideal observer does not seem to be new or surprising. Why were there so many figures devoted to the comparison between the CT model and the ideal observer model? A comparison between all the models (such as Fig. 5) is more informative and might be better to be described earlier in the paper.

8. Could there be a graphical illustration for the ideal observer model, similar to Fig. 5F? Maybe by enhancing Table 1.

9. The legends of Fig 5D–5F are a little confusing. When I first read “Participant 102 finds a partially accurate model by Day 2 (D) and a model close to the true model by Day 8 (F)”, I had thought (D) and (F) were only about Participant 102.

10. Typos:

Line 221: a space is missing between “model” and “as”.

Line 590: The figure number in the parentheses is missing.

Table 2: The headings of the table seem to be mis-placed.

Reviewer #3: The manuscript presents a new method for inferring subjects' internal models from response times in a sequence learning task. Because the task's statistical structure is unknown to the subjects, the assumption that the subject's internal model matches the true generative model of the task (the ideal observer assumption) does not hold. Thus, the authors use a flexible class of dynamical models (iHMM) to represent subjects' internal models and combine it with a behavioral model linking subjective probabilities to response times. The iHMM-based internal models estimated from response time data are shown to predict response times better compared to the ideal observer. By considering an alternative model, which assumes no hidden structure but only Markovian dependencies, the authors show that all subjects start with a bias towards simple temporal dependencies, but some subjects learn a model closer to the ideal observer.

The conceptual introduction to the problem is very clear. Particularly, the explicit distinction between the subject's internal model and the behavioral model accompanied by the graphical model notation (Fig. S2) is quite helpful. The results are interesting and the "cognitive tomography" method constitutes a relevant contribution to the recent literature on inferring internal models from behavioral data. However, the description of the methods could be improved in terms of clarity and level of detail and some aspects of the results need clarifying statements or additional analyses:

- Clarifying what exactly constitutes a state of the dynamical system in the article's main text would help readers not well-versed in HMMs and similar models. Relatedly, while the graphical notation for the model (Fig. 2A) is very informative once understood, it is worth a little more explanation: e.g., the authors could show how the true dynamics of the task (i.e. the ideal observer's model) or an instance of the Markov model look in the graphical notation, which would make it easier for the reader to appreciate the inferred models (Fig. 5D,F).

- The validation of the inference method on synthetic datasets is a bit scarce. For a model of this complexity and a highly customized inference procedure, I would expect to see some prior predictive checks, inference diagnostics (e.g. r_hat, effective sample size), and posterior predictive checks. As a guideline for reporting Bayesian analyses, I suggest Kruschke (2021).
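For reference, the two diagnostics named here can be computed directly from raw posterior draws with ArviZ, assuming the draws are arranged as (chain, draw) (a generic example, not the authors' pipeline):

    import numpy as np
    import arviz as az

    draws = np.random.default_rng(3).normal(size=(4, 1000))  # 4 chains x 1000 draws

    print(az.rhat(draws))  # potential scale reduction; values near 1.0 indicate convergence
    print(az.ess(draws))   # effective sample size after accounting for autocorrelation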

- The evaluation presented in Fig. S3B suggests that the response time prediction performance is not particularly good. Even for synthetic datasets with response time standard deviations comparable to real data (the darker dots), the R^2 values are mostly between 0.25 and 0.75. Is this just the result of the inherent variability in the response times due to the LATER model (which the better performance in predicting subjective probabilities might suggest) or is this due to a failure of the inference method? Could one compare the response time prediction performance against an upper bound on the response time prediction performance computed from the ground truth synthetic internal model? How well are the parameters of the response time model (tau, mu, sigma) recovered by the inference method?

- The model comparison based on R^2 is not really convincing, because it does not take into account the significantly higher model complexity of the iHMM-based model. The authors chose R^2 because not all models are Bayesian. To my understanding, the only non-Bayesian model is the trigram model, which is not central to the argument in the paper, while the ideal observer, the iHMM-based model, and the Markov model are Bayesian. If this is correct, the authors should perform a Bayesian model comparison for these three models. If this is incorrect, please expand the description of the models in the paper to make clearer how they are fit to the data.

- In the Section "Trade-off between ideal observer and Markov model contributions", the inferred internal model is only compared quite indirectly to the ideal observer model, via their predictive accuracy for response times. Is there a more direct way to assess the distance between an iHMM and the true HMM employed in the experiment? The evaluations presented in the original iHMM paper by Gael et al. (2008) suggest that there is.
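One direct measure of the kind asked about here is the trial-averaged KL divergence between the two models' next-element predictive distributions (a sketch under that interpretation; the array names are hypothetical):

    import numpy as np

    def mean_predictive_kl(p, q, eps=1e-12):
        # Mean KL(p_t || q_t) across trials; p and q have shape
        # (n_trials, n_elements), each row being a predictive distribution.
        p = np.clip(p, eps, None)
        q = np.clip(q, eps, None)
        return np.mean(np.sum(p * np.log(p / q), axis=1))

    # Example: ASRT-like prediction (5/8, 1/8, 1/8, 1/8) vs a uniform prediction.
    p = np.array([[0.625, 0.125, 0.125, 0.125]])
    q = np.array([[0.25, 0.25, 0.25, 0.25]])
    print(mean_predictive_kl(p, q))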

- I agree with the point made in the discussion, that POMDPs are a possible alternative formalization for the problem at hand. While the authors acknowledge in the introduction that the internal model of the agent need not be identical to the true generative model of the task, this point also applies here: An agent might assume that their actions influence the state, while it is actually not the case in the true generative model. Furthermore, employing a POMDP formulation might also shed light on internal costs relevant to the task (e.g. computational costs), which are absent from HMMs without explicit modeling of actions. I think these points are worth further discussion.

Minor points:

- The argument in the introduction for moving beyond ideal observers could be strengthened further by including relevant literature making similar arguments (e.g. Feldman, 2013, Beck et al., 2012).

- "learning a novel statistics" should be "learning novel statistics" (p. 3, l. 22), same on p. 4, l. 58, p. 10 l. 200, 201, 222

- Fig. S3A refers to Gael (2011). Should this be Gael et al (2008) as in the caption and in the bibliography or is it referring to a different paper?

- "and we formulated as a trigram model" (grammar and meaning??)

- Fig. 5B is not referenced

- Latex: use $M_\\text{education}$ (\\text environment for whole words) instead of $M_{education}$

- p. 22 l. 590: missing figure reference

- Table 2: headings seem to be broken

- p. 14: spelling of normalized / normalised is inconsistent

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code —e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: None

Reviewer #3: No: While zip files containing data and code were available, the passwords for these files were only available upon request from the authors.

**********

PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Michael S Landy

Reviewer #2: No

Reviewer #3: No

Figure Files:

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

Data Requirements:

Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms etc.. For an example in PLOS Biology see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.

Reproducibility:

To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1010182.r003

Decision Letter 1

Samuel J Gershman, Lusha Zhu

4 Apr 2022

Dear Mr Nagy,

Thank you very much for submitting your manuscript "Tracking the contribution of inductive bias to individualized internal models" for consideration at PLOS Computational Biology. As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. The reviewers appreciated the attention to an important topic. Based on the reviews, we are likely to accept this manuscript for publication, providing that you modify the manuscript according to the review recommendations.

The revised manuscript has addressed most issues raised by referees. I'm returning the manuscript to you to address a few minor comments from two of the reviewers.

Please prepare and submit your revised manuscript within 30 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to all review comments, and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Thank you again for your submission to our journal. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Lusha Zhu, Ph.D.

Associate Editor

PLOS Computational Biology

Samuel Gershman

Deputy Editor

PLOS Computational Biology

***********************

A link appears below if there are any accompanying review attachments. If you believe any reviews to be missing, please contact ploscompbiol@plos.org immediately:

[LINK]


Reviewer's Responses to Questions

Comments to the Authors:

Reviewer #1: Re-review of "Tracking the contribution of inductive bias to individualized internal models"

Reviewer: Michael Landy

I was impressed with this paper the first time, but needed a bit more clarity in the presentation. This version improves that quite a bit and adequately responds to my and the other reviewers' requests (IMHO). My comments are pretty minor.

Specifics:

lines 99-100: This says that sessions were on consecutive days, but the Procedure says they were spaced a week apart.

Figure 3: The diagram on the lower right is probably supposed to be the Markov model, but it is neither mentioned nor described at all in the legend.

276-277: "successfully recruit previously learned models": I thought this referred to the fact that on Day 10 the better-performing model tracks the switching. That's clear in Fig. 5E, but is not exactly convincing across subjects glancing as Fig. S8 and no summary across subjects is provided.

294: vanes -> wanes

Fig. 6D,F: The correspondence between the model samples and participants/days is not clarified in the legend and is indicated by red circles on day numbers and a skinny red line connecting the model to that day that I missed completely until I stared at the figure for quite some time. You should make the connections more obvious AND mention it in the legend. Another option (not mutually exclusive) is to put a title in the corner of each model sample that says something like "Participant 102, Day 2".

327: The main text here only suggests that KL is used to measure the match between predictions, whereas the Appendix gives another justification for why KL is the right thing to do. You should allude to that other justification here as well.

338: identified FOR all?

340: FOR several participants (or maybe IN, but not AT)

436: characteristic OF all participants

3 after 603: A glitch here. S_t,Y_t \\memberof \\mathcal{N}. The sequence... [Yeah, I know my latex is wrong ;^)]

695: inbetween -> in between

697: last 60 -> the last 60

Table 1, Hierarchical prior over state transitions, Notation: Shouldn't that be \\alpha,\\gamma ???

711: predictions many -> prediction of many

759: seen on -> seen in

775: are subset -> are a subset

822: take THE mean ... over THE last 60

824: IN Fig. 5

827: predicting A correct trial

Supplement, p. 2, para. 2: The reference for Hierarchical Dirichlet didn't get filled in.

Supplement, Fig. S3: I have a notation here "Compare the resulting HMMs". I can't remember what sort of comparison I was thinking of... ;^(

Appendix B, para. 2: Note that THE quantity we obtained...

Reviewer #2: I have no further questions.

Reviewer #3: I appreciate the effort the authors have made to address my previous comments and I think the clarity of the manuscript is improved. Specifically, the description of the iHMM is now more detailed and accessible for uninitiated audiences. My concerns about the model validation and comparison have been resolved by the addition of the new version of Fig. S3B and the clarified description of how the models were fitted and evaluated. With most methodological concerns out of the way, I only have one remaining minor issue: In my previous review, I asked for MCMC diagnostics because of the custom inference method and high model complexity. While the authors have pointed to a plot validating an assumption of the model and clarified how much data were used to fit the model, they have not shown evidence for the convergence of the MCMC method.

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code —e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No: I can't find any mention of it in the main text nor the supplement

Reviewer #2: None

Reviewer #3: Yes

**********

PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Michael S Landy

Reviewer #2: No

Reviewer #3: No


References:

Review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript.

If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1010182.r005

Decision Letter 2

Samuel J Gershman, Lusha Zhu

8 May 2022

Dear Mr Nagy,

We are pleased to inform you that your manuscript 'Tracking the contribution of inductive bias to individualized internal models' has been provisionally accepted for publication in PLOS Computational Biology.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests.

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology. 

Best regards,

Lusha Zhu, Ph.D.

Associate Editor

PLOS Computational Biology

Samuel Gershman

Deputy Editor

PLOS Computational Biology

***********************************************************

Reviewer's Responses to Questions

Comments to the Authors:

Reviewer #1: All my comments have been adequately addressed, since they were almost all trivial! I didn't check those about the supplement...

Reviewer #3: Thanks for providing evidence for the convergence of the MCMC chains by showing the posterior CIs across multiple chains. I am still not quite sure why standard diagnostic methods for checking MCMC, like R-hat or effective sample size, were not provided.

Irrespective of this minor issue, I think this is a very interesting paper and well worth publishing.

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code —e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #3: Yes

**********

PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Michael S Landy

Reviewer #3: No

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1010182.r006

Acceptance letter

Samuel J Gershman, Lusha Zhu

10 Jun 2022

PCOMPBIOL-D-21-02227R2

Tracking the contribution of inductive bias to individualized internal models

Dear Dr Nagy,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Anita Estes

PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Appendix. A brief introduction to infinite Hidden Markov Models.

    (PDF)

    S2 Appendix. Optimal prediction and the LATER model.

    (PDF)

    S1 Fig. Experimental design.

    A Experimental stimuli and abstract representation used in the paper. B Design of the experiment. The experiment consisted of ten sessions, separated by a one-week delay. On Days 1–8, participants performed the ASRT task with sequence 1 throughout 25 blocks (5 epochs) in each session. On Day 9, an interfering sequence (sequence 2) was introduced. Both sequences were tested on Day 10, alternating in blocks of 5.

    (TIF)

    S2 Fig. Graphical representation of internal model and generative model of behaviour.

    Left: the internal model, i.e., the generative model of the sequence assumed by the participant. Right: the generative model of behaviour.

    (TIF)

    S3 Fig. Synthetic data experiment.

    A We first sampled three versions of synthetic internal models using the original iHMM inference method in [33]. The internal models of the synthetic participants differ in their experience (i.e., in how many ASRT trials they had seen), resulting in an “early”, a “middle” and a “late” model. We then generated subjective probability values for each model on a new set of ASRT stimuli (holding the pattern sequence intact). B Results of our synthetic data experiment. Performance is measured as the proportion of variance in response times explained (R²). We ran our inference method on 81 synthetic datasets with different parameter settings (symbols with different colours and shapes), using the same number of response times as with the human participants to recover the internal models. Symbol colours correspond to the response time standard deviation. The results show that while response time prediction may be at a lower level, the latent predictive probabilities can still be inferred with relatively high accuracy; that is, the inference method can recover the latent structure from a generated response time sequence (a minimal sketch of this recovery check is given below). C Predictive performance (R²) of the actual internal model of each synthetic participant vs the predictive performance of the inferred internal model of the same synthetic participant. The inferred model is evaluated on training data sets (same as in panel B). D Standard deviations of response times of individuals in the first eight experimental sessions.

    (TIF)
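To make the logic of the recovery check concrete, the following is a minimal sketch, not the repository's actual pipeline: synthetic response times are generated from known predictive probabilities through a toy surprisal-based response time mapping (standing in for the paper's LATER-based response model), a noisy copy plays the role of the iHMM-inferred probabilities, and R² is computed for both response times and probabilities. All names, parameter values and the response time mapping are illustrative assumptions.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Proportion of variance in y_true explained by y_pred."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(0)
n_trials = 2000

# Known subjective probabilities of a synthetic participant (illustrative).
p_true = rng.uniform(0.05, 0.95, n_trials)

# Toy mapping: response time grows with surprisal, plus Gaussian noise.
rt = 0.3 + 0.05 * (-np.log(p_true)) + rng.normal(0.0, 0.05, n_trials)

# Stand-in for the probabilities recovered by the iHMM inference.
p_inferred = np.clip(p_true + rng.normal(0.0, 0.05, n_trials), 0.01, 0.99)
rt_pred = 0.3 + 0.05 * (-np.log(p_inferred))

print("response time R^2:", r_squared(rt, rt_pred))       # limited by the noise floor
print("probability R^2:", r_squared(p_true, p_inferred))  # can remain high regardless
```

The two printed values illustrate the point of panel B: even when response time prediction is capped by trial-to-trial noise, the latent predictive probabilities can still be recovered accurately.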

    S4 Fig. Comparison of quantiles of the (z-scored) r variable in the LATER model.

    Quantiles computed from the response times and the predicted subjective probabilities are compared with quantiles of the expected normal distribution for the analysed models (red, CT; green, ideal observer; blue, Markov), also known as QQ-plots (a minimal plotting sketch is given below). The participant-by-participant comparison shows that the empirical distribution of the r parameter on a test set is approximately normal, with a few exceptions (see the CT and Ideal Observer models of participants 124 and 131), thus validating the model assumptions.

    (TIF)
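A QQ-plot of this kind can be produced as follows; this is a minimal sketch with placeholder residuals, assuming the z-scored r values for one participant and one model are available as a NumPy array.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(1)
r = rng.standard_normal(500)  # placeholder: substitute the z-scored r values on the test set

r_sorted = np.sort(r)
n = r_sorted.size
# Expected quantiles of a standard normal at the empirical plotting positions.
theoretical = stats.norm.ppf((np.arange(1, n + 1) - 0.5) / n)

plt.scatter(theoretical, r_sorted, s=8)
plt.plot([-3, 3], [-3, 3], color="red")  # identity line; points near it indicate normality
plt.xlabel("theoretical normal quantiles")
plt.ylabel("empirical quantiles of z-scored r")
plt.show()
```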

    S5 Fig. Response time distributions.

    A Response time samples generated from the different models and the original data for participant 119. B Density plots of the point clouds in A. C Predicted response times (mean of the maximum a posteriori estimates for each model, averaged over the model samples) vs actual response times. Response times outside mean ±3 s.d. are omitted for visual clarity. In contrast with panel A, the x coordinates are best predictions rather than random samples, hence their spread is much smaller. D Histogram of model predictive performances on Day 8. E Box plots of the model performance distributions; data are the same as in panel D.

    (TIF)

    S6 Fig. Predicted response time means vs. measured response time means.

    Predictions are on the test set of Day 8 of the experiment, grouped by three-element sequences for each participant separately (each dot corresponds to one possible three-element sequence; a grouping sketch is given below). Only sequences with at least 5 measured correct response times were included, in order to limit the standard error of the measured response time mean. Error bars show 2 s.e.m.

    (TIF)
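A minimal sketch of this per-triplet grouping, using hypothetical column names ('stimulus', 'rt', 'correct') and placeholder data rather than the repository's actual data format:

```python
import numpy as np
import pandas as pd

# Placeholder trial table; in practice this would be one participant's test-set trials.
rng = np.random.default_rng(2)
n = 400
trials = pd.DataFrame({
    "stimulus": rng.integers(1, 5, n),  # four possible positions, as in the ASRT task
    "rt": rng.normal(350.0, 40.0, n),   # response times in ms (placeholder values)
    "correct": rng.random(n) < 0.92,    # correctness flags (placeholder values)
})

# Label each trial with the three-element sequence (triplet) that ends on it.
s = trials["stimulus"].astype("string")
trials["triplet"] = s.shift(2) + s.shift(1) + s

correct = trials[trials["correct"]].dropna(subset=["triplet"])
summary = correct.groupby("triplet")["rt"].agg(["mean", "sem", "count"])
summary = summary[summary["count"] >= 5]    # keep triplets with at least 5 correct RTs
summary["errorbar"] = 2.0 * summary["sem"]  # 2 s.e.m. error bars, as in the figure
print(summary.head())
```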

    S7 Fig. Predicting when errors will occur for each participant individually.

    (TIF)

    S8 Fig. Model performances of CT models trained on Day 8 and Day 9 for each individual.

    (TIF)

    S9 Fig. Model performances for all models and all participants individually.

    (TIF)

    S10 Fig. Normalized CT performance as a function of the ideal observer model performance on different days of the experiment.

    Dots indicate the performance of the models for different individuals.

    (TIF)

    S11 Fig. RT distribution examples of synthetic participants.

    Each panel shows distributions with RT model parameters sampled from their respective priors (a generic sampling sketch is given below). Distributions are shown as violin plots as a function of predictive probabilities.

    (TIF)
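For readers unfamiliar with how such response time distributions arise, the following is a generic LATER-style sampler. The parameterisation (rate of rise drawn from a normal whose mean grows with the log predictive probability, response time equal to threshold distance over rate) is an illustrative assumption, not the paper's exact specification.

```python
import numpy as np

rng = np.random.default_rng(3)

def sample_rts(p, n=1000, theta=1.0, mu0=5.0, w=0.8, sigma=1.0):
    """Generic LATER-style sampler (illustrative parameterisation):
    rate of rise r ~ N(mu0 + w * log(p), sigma); RT = theta / r."""
    r = rng.normal(mu0 + w * np.log(p), sigma, n)
    r = r[r > 0]  # discard non-positive rates (trials with no response)
    return theta / r

# More predictable stimuli (higher p) yield faster, tighter RT distributions.
for p in (0.1, 0.3, 0.6, 0.9):
    rts = sample_rts(p)
    iqr = np.subtract(*np.percentile(rts, [75, 25]))
    print(f"p={p:.1f}  median RT={np.median(rts):.3f}  IQR={iqr:.3f}")
```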

    Attachment

    Submitted filename: R2R cogtom.pdf

    Attachment

    Submitted filename: r2r cogtom revision 2.pdf

    Data Availability Statement

    The code used for this paper is available at https://www.github.com/mzperix/asrt-beamsampling. The repository contains links to the experimental data as well as the data used to generate the figures.

