Reinforcement learning (RL) models

In their simplest and most widely used form, RL models combine the Rescorla-Wagner learning rule (Equation 1b) with the Softmax choice function (Equation 2). A central element of learning in these models is the prediction error (PE), the difference between the expected (EV) and the received (O) outcome of an action (e.g., choosing the right option) (Equation 1a).
Two parameters in RL models are typically estimated from the data: a learning rate (α) and a decision temperature (β). The learning rate reflects the degree to which expectations (EV, i.e., the expected value of a stimulus or action) are updated. The EV is updated for each stimulus or action separately at time t (note that there are variations of these models in which the EVs of both options are updated simultaneously based on the outcome received for one of them, i.e., when the options are perfectly anticorrelated). Although these models can be extended in several ways, one common version includes separate learning rates for positive (better-than-expected) and negative (worse-than-expected) PEs.
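To make these components concrete, the sketch below illustrates the Rescorla-Wagner update with separate learning rates for positive and negative PEs, together with the Softmax choice rule described in the next paragraph, for a hypothetical two-option task. It is a minimal illustration, not the implementation of any particular study; the parameter names and values (alpha_pos, alpha_neg, beta, and the reward probabilities) are assumptions chosen for illustration.

```python
# Minimal sketch: Rescorla-Wagner update (Equations 1a-1b) combined with a
# Softmax choice rule (Equation 2) for a two-option task. Parameter names and
# values are illustrative only.
import numpy as np

def rescorla_wagner_update(ev, outcome, alpha_pos, alpha_neg):
    """Update the expected value (EV) of the chosen option.

    ev        : current expected value of the chosen option
    outcome   : received outcome O on this trial
    alpha_pos : learning rate applied to positive (better-than-expected) PEs
    alpha_neg : learning rate applied to negative (worse-than-expected) PEs
    """
    pe = outcome - ev                           # prediction error (Equation 1a)
    alpha = alpha_pos if pe > 0 else alpha_neg  # separate learning rates
    return ev + alpha * pe                      # Rescorla-Wagner rule (Equation 1b)

def softmax_choice_probs(evs, beta):
    """Choice probabilities over options given their EVs (Equation 2).

    beta scales how sensitive choices are to differences between the options' EVs.
    """
    z = beta * np.asarray(evs, dtype=float)
    z -= z.max()                                # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

# Example: one simulated trial of a two-armed bandit
evs = [0.5, 0.5]                                # initial expected values
probs = softmax_choice_probs(evs, beta=3.0)
choice = np.random.choice(2, p=probs)
outcome = np.random.binomial(1, [0.8, 0.2][choice])   # hypothetical reward probabilities
evs[choice] = rescorla_wagner_update(evs[choice], outcome,
                                     alpha_pos=0.3, alpha_neg=0.1)
```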
The choice is then determined by the Softmax function, which assigns a higher choice probability to the option with the higher EV, in proportion to the difference between the options' EVs and with varying sensitivity (Equation 2). The decision temperature (also called the inverse temperature) indexes the degree of this sensitivity and, depending on its value, indicates more or less exploratory choice behavior. Learning rates determine how much influence PEs have on the updating: a higher learning rate gives greater weight to the most recent outcomes, whereas a lower learning rate leads to slower integration across a history of multiple outcomes.

Bayesian Updating Models

Simple reinforcement-learning models do not incorporate uncertainty directly in their computational framework. In contrast, Bayesian models assume that individuals attempt to infer the environment's hidden states from their observations (i.e., from the outcomes), and uncertainty is explicitly built in. That is, in Bayesian learning models there is not a single point estimate of EV; instead, there is a belief distribution over the world state of interest given the observations. This belief distribution starts as a prior belief distribution and is updated with each observation according to Bayes' rule, yielding the individual's posterior belief distribution. The posterior distribution is then used in the decision rule by maximizing the expected utility under the posterior (e.g., a maximum a posteriori (MAP) decision rule), while the width of the distribution corresponds to the uncertainty about the environment's state. For more information see, e.g., Ma et al. (2022a).
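As a concrete illustration of this scheme, the sketch below assumes a two-option task whose hidden states are reward probabilities and uses a conjugate beta-Bernoulli belief; this particular model choice is an assumption for illustration, not the specific model referenced in the text. The prior belief is updated with Bayes' rule after each outcome, the decision rule maximizes expected utility under the posterior (which, for a binary reward, reduces to choosing the option with the higher posterior mean; a MAP rule would use the posterior mode instead), and the posterior standard deviation expresses the remaining uncertainty about each option's state.

```python
# Minimal sketch: Bayesian belief updating with a posterior-based decision rule
# for a two-option task. The beta-Bernoulli form and all parameter values are
# illustrative assumptions.
import numpy as np
from scipy import stats

def update_beta_belief(a, b, outcome):
    """Bayes-rule update of a Beta(a, b) belief over an option's reward
    probability after a binary outcome (1 = reward, 0 = no reward)."""
    return a + outcome, b + 1 - outcome

rng = np.random.default_rng(0)
beliefs = [[1.0, 1.0], [1.0, 1.0]]   # flat prior belief distributions for both options
true_probs = [0.8, 0.2]              # hypothetical hidden reward probabilities

for t in range(50):
    # Decision rule: maximize expected utility under the posterior, i.e., choose
    # the option with the highest posterior mean reward probability.
    means = [stats.beta(a, b).mean() for a, b in beliefs]
    choice = int(np.argmax(means))
    outcome = rng.binomial(1, true_probs[choice])
    beliefs[choice] = list(update_beta_belief(*beliefs[choice], outcome))

# The width (standard deviation) of each posterior belief distribution expresses
# the remaining uncertainty about that option's hidden state.
uncertainty = [stats.beta(a, b).std() for a, b in beliefs]
print(beliefs, uncertainty)
```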