(a, b) The slope (a) and intercept (b) estimates as a function of the ground truth for simulated sessions in which the number of bouts matched that of real sessions. The ground truth can be recovered from the logistic regression (R² = 0.99 for the slope; R² = 0.91 for the intercept). (c, d) The slope (c) and intercept (d) estimates as a function of the ground truth for simulated sessions with varying numbers of bouts. Overall, the ground truth can be precisely recovered for sessions with more than 100 bouts. (e) Deviance explained by a logistic regression model fitted to simulated sessions of an inference-based agent using the correct model (‘Consecutive failures’), a wrong but correlated model (‘Negative value’) and a random model (in which both rewards and failures are arbitrarily accumulated or reset). The deviance explained by the consecutive failures represents the upper bound of the model performance. That this value is smaller than 1 indicates that, although the ground truth can be recovered, the switching decision is not deterministic and involves some stochasticity (here the variability was matched to that of the data). Nevertheless, the deviance explained by the consecutive failures is significantly greater than that explained by the correlated model and by the random model (two-sided Wilcoxon signed-rank test; three stars indicate p < 10⁻³; p = 0.00005 between Consec. fail. and Neg. value; p < 10⁻⁷ between Consec. fail. and Random). On each box, the central mark indicates the median across simulated sessions (n = 42 sessions), and the bottom and top edges of the box indicate the 25th and 75th percentiles, respectively. The whiskers extend to the most extreme data points. (f) Illustration of a logistic regression model for predicting the switching decision of an inference-based simulated agent from the two different DVs (‘Consecutive failures’ and ‘Negative value’) simultaneously.
(g) Deviance explained by the model in (f) as a function of the number of bouts in each session. (h) For all simulated sessions in (e), the variance explained by the ‘Consecutive failures’ DV was greater than the variance explained by the ‘Negative value’ DV, indicating that the model inferred the true DV.
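
The recovery analysis in (a–e) can be illustrated with a minimal sketch. It is not the authors' code: it assumes a hypothetical agent whose probability of switching is a logistic function of the number of consecutive failures, draws the failure counts uniformly at random, and fits the slope and intercept by Newton-Raphson. All function names, the `max_failures` parameter and the example parameter values are illustrative assumptions.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def simulate_session(slope, intercept, n_bouts, rng, max_failures=8):
    """Assumed agent: P(switch) = sigmoid(slope * failures + intercept).

    Failure counts are drawn uniformly from 0..max_failures (an assumption).
    """
    xs = [rng.randint(0, max_failures) for _ in range(n_bouts)]
    ys = [1 if rng.random() < sigmoid(slope * x + intercept) else 0 for x in xs]
    return xs, ys


def fit_logistic(xs, ys, n_iter=50, tol=1e-10):
    """Fit slope and intercept by Newton-Raphson on the log-likelihood."""
    a, b = 0.0, 0.0
    for _ in range(n_iter):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(a * x + b)
            w = p * (1.0 - p)
            g_a += (y - p) * x      # gradient w.r.t. slope
            g_b += (y - p)          # gradient w.r.t. intercept
            h_aa += w * x * x       # entries of the negative Hessian
            h_ab += w * x
            h_bb += w
        det = h_aa * h_bb - h_ab * h_ab
        if det < 1e-12:             # singular system: stop updating
            break
        da = (h_bb * g_a - h_ab * g_b) / det
        db = (h_aa * g_b - h_ab * g_a) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    return a, b


def log_likelihood(xs, ys, slope, intercept):
    """Bernoulli log-likelihood, with probabilities clipped away from 0 and 1."""
    eps = 1e-12
    ll = 0.0
    for x, y in zip(xs, ys):
        p = min(max(sigmoid(slope * x + intercept), eps), 1.0 - eps)
        ll += math.log(p) if y else math.log(1.0 - p)
    return ll


def deviance_explained(xs, ys, slope, intercept):
    """1 - D_model / D_null, with an intercept-only null model.

    Requires that the session contains both switch and no-switch outcomes.
    """
    p0 = sum(ys) / len(ys)
    null_ll = log_likelihood(xs, ys, 0.0, math.log(p0 / (1.0 - p0)))
    return 1.0 - log_likelihood(xs, ys, slope, intercept) / null_ll


if __name__ == "__main__":
    rng = random.Random(0)
    true_slope, true_intercept = 1.0, -4.0   # illustrative ground truth
    xs, ys = simulate_session(true_slope, true_intercept, 2000, rng)
    slope_hat, intercept_hat = fit_logistic(xs, ys)
    print(slope_hat, intercept_hat, deviance_explained(xs, ys, slope_hat, intercept_hat))
```

Repeating the simulation over a range of ground-truth slopes and intercepts, and over sessions with different numbers of bouts, would give scatter plots analogous to (a–d). Because the simulated decision is stochastic, the deviance explained stays below 1 even when the fitted model has the correct form, which is the point made in (e).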