a When faced with a choice between two previously chosen stimuli, whose values had been explicitly shown [Schosen (learned)], participants tended to select the rewarded option (Schosen+), suggesting that they had successfully learned these values. For pairs of previously unchosen options, which were never directly associated with any reward [Sunchosen (inferred)], participants tended to select the option previously paired with an unrewarded item (Sunchosen0), demonstrating an inverse decision bias. b This inverse decision bias was observed even when controlling for initial subjective valuations of the choice options in a Bayesian logistic regression predicting the probability of choosing a rewarded item as a function of pair type and the difference in liking ratings. Rearranging the model coefficients yields separate intercept terms for chosen and unchosen pairs. Each intercept denotes the tendency to choose a rewarded item when the two choice options do not differ in liking ratings. For chosen pairs, the intercept is reliably positive, whereas for unchosen pairs it is reliably negative. c The inverse inference of value extends beyond the decision phase to explicit estimation of value. When asked to estimate the auction outcome of each painting, participants correctly remembered the outcomes of the chosen paintings, yet showed the opposite pattern for the unchosen ones. d The tendency to select unchosen items previously paired with unrewarded items (Sunchosen0 over Sunchosen+) in the Final Decisions phase was correlated with an inverse estimation of value, i.e., the tendency to estimate Sunchosen0 as rewarded and Sunchosen+ as unrewarded in the Outcome Estimation phase. This relationship was assessed in a Bayesian linear regression predicting the mean probability of selecting rewarded items as a function of inverse estimation of value, for chosen and unchosen pairs separately.
In panels (a) and (c), error bars denote the standard error of the mean and points denote trial-averaged data of individual participants. In panels (b) and (d), beta coefficients and model fits are shown as the median and 95% highest density interval of the posterior distribution. In panel (c), green bars depict rewarded stimuli (S+) and orange bars depict unrewarded stimuli (S0); for unchosen stimuli, the color reflects the outcome of their chosen counterpart. Source data are provided as a Source Data file.
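The rearrangement of coefficients described for panel (b) can be illustrated with a minimal sketch. The exact parameterization is not given in this caption, so the following is an assumption: pair type is dummy-coded as u (0 for chosen pairs, 1 for unchosen pairs), Δ is the difference in liking ratings between the two options, and the symbols β0, β1, β2, α_chosen, and α_unchosen are introduced here for illustration only.

```latex
% Assumed model form (illustrative; coding of pair type is hypothetical)
\[
\operatorname{logit} \Pr(\text{choose rewarded item})
  = \beta_0 + \beta_1 u + \beta_2 \Delta ,
\qquad u \in \{0, 1\}
\]
% Pair-specific intercepts obtained by rearranging the coefficients
\[
\alpha_{\text{chosen}} = \beta_0 ,
\qquad
\alpha_{\text{unchosen}} = \beta_0 + \beta_1
\]
% With no difference in liking ratings (\Delta = 0):
\[
\Pr(\text{choose rewarded item} \mid \Delta = 0)
  = \frac{1}{1 + e^{-\alpha}}
\]
```

Under this assumed form, a positive intercept corresponds to a probability above 0.5 of choosing the rewarded item when liking ratings are equal, and a negative intercept to a probability below 0.5, which matches the pattern reported for chosen and unchosen pairs respectively.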