Simple Reinforcement Learning

Key estimated parameters
α: Learning rate, the rate at which past outcomes influence current choices
Standard learning model in which prediction errors, scaled by the learning rate, are used to update action-outcome (or stimulus-outcome) associations.

Q learning
δ_t = R_t − Q_t
Q_{t+1} = Q_t + α · δ_t
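
To make the update concrete, here is a minimal Python sketch of the Q-learning rule above; the function name, action encoding, and the α value are illustrative assumptions, not taken from the paper.

def q_update(Q, action, reward, alpha=0.1):
    """Q-learning update: Q_{t+1} = Q_t + alpha * delta_t."""
    delta = reward - Q[action]   # prediction error: delta_t = R_t - Q_t
    Q[action] += alpha * delta   # move the estimate toward the outcome
    return Q, delta

# Example: two actions, action 0 pays off on this trial
Q = [0.0, 0.0]
Q, delta = q_update(Q, action=0, reward=1.0)  # delta = 1.0, Q[0] -> 0.1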
Model-Based/Model-Free Reinforcement Learning

Key estimated parameters
α: Learning rate, the rate at which past outcomes influence current choices
ω: Weight parameter determining the relative influence of MB vs. MF values
Learning is updated using a balance of model-free prediction errors from past choices and model-based knowledge of the task structure, with the available actions (a) at each state (s); typically tested with “2-stage” tasks.

Model-Free (MF)
Q_MF(s_{i,t+1}, a_{i,t+1}) = Q_MF(s_{i,t}, a_{i,t}) + α_i · δ_{i,t}
Q_MF(s_{1,t}, a_{1,t}) = Q_MF(s_{1,t}, a_{1,t}) + α_1 · λ · δ_{2,t} (where λ is an eligibility trace allowing the outcome at the 2nd stage to influence the 1st-stage choice)

Model-Based (MB)
Q_MB(s_A, a_j) = P(s_B | s_A, a_j) · max_a Q_MF(s_B, a) + P(s_C | s_A, a_j) · max_a Q_MF(s_C, a)

MB-MF balance
Q_net(s_A, a_j) = ω · Q_MB(s_A, a_j) + (1 − ω) · Q_MF(s_A, a_j)
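
As a sketch of how these pieces combine at the first stage of a two-stage task, the Python below computes Q_MB from a known transition structure and mixes it with Q_MF via ω. The state/action names and the 0.7/0.3 transition probabilities are assumptions for illustration (a common two-stage design), not values from the paper.

def q_net(q_mf, trans, w):
    """Q_net(sA, aj) = w * Q_MB(sA, aj) + (1 - w) * Q_MF(sA, aj)."""
    # Model-based values: P(s2 | sA, a) times the best second-stage MF value
    q_mb = {a: sum(p * max(q_mf[s2].values())
                   for s2, p in trans[a].items())
            for a in trans}
    return {a: w * q_mb[a] + (1 - w) * q_mf["sA"][a] for a in trans}

# First-stage values under a common 0.7/0.3 transition structure
q_mf = {"sA": {"a1": 0.2, "a2": 0.5},
        "sB": {"a1": 0.9, "a2": 0.1},
        "sC": {"a1": 0.3, "a2": 0.4}}
trans = {"a1": {"sB": 0.7, "sC": 0.3},
         "a2": {"sB": 0.3, "sC": 0.7}}
print(q_net(q_mf, trans, w=0.6))  # w = 1 is purely model-based, w = 0 purely model-free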
Economic Choice and Valuation

Key estimated parameters
κ: Discount rate, a measure of attitude toward delayed rewards
α: Risk tolerance, a measure of attitude toward risky rewards
β: Ambiguity tolerance, a measure of attitude toward ambiguous rewards
λ: Loss aversion, a measure of avoidance of potential loss
B: Sensitivity to losses (B_loss) and gains (B_gain)
Discounting: how temporal factors depreciate value when reward/gratification is delayed
Risk preference: how individual attitudes about known risk and ambiguity influence the value of choice options
Loss aversion: the balance between individual gain and loss sensitivities
Hyperbolic discounting
U_option = v / (1 + κ · D)
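
A one-line Python sketch of the hyperbolic form, assuming v is the reward amount, D the delay, and κ the fitted discount rate:

def discounted_value(v, D, kappa):
    """Hyperbolic discounting: U = v / (1 + kappa * D)."""
    return v / (1.0 + kappa * D)

print(discounted_value(100.0, 30, 0.05))  # $100 delayed 30 days at kappa = 0.05 -> 40.0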

Expected utility theory with only risk
U_option = p · v^α

Expected utility theory with risk and ambiguity
U_option = (p − β · A/2) · v^α (where A is the level of ambiguity)
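
A Python sketch of this utility; the sign convention (subtracting β · A/2, so that β > 0 lowers value under ambiguity) and the ambiguity level A in [0, 1] are assumptions consistent with the form above. With β = 0 and A = 0 it reduces to the risk-only model U = p · v^α.

def utility(v, p, alpha, beta=0.0, A=0.0):
    """U = (p - beta * A / 2) * v**alpha; A is the ambiguity level."""
    return (p - beta * A / 2.0) * v ** alpha

print(utility(20.0, 0.5, alpha=0.8))                   # risk only: 50% chance of $20
print(utility(20.0, 0.5, alpha=0.8, beta=0.6, A=0.5))  # same gamble, half the odds obscured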

Prospect theory
U_option = π(p_i) · v(x_i)

Loss aversion
λ = |B_loss| / B_gain
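
The table gives only the generic prospect-theory form, so the Python sketch below fills it in with the standard Tversky-Kahneman value and weighting functions; those functional forms and parameter values are assumptions, not the paper's. Here lam plays the role of the loss-aversion ratio estimated above as λ = |B_loss| / B_gain.

def value(x, alpha=0.88, lam=2.25):
    """v(x): concave for gains, steeper by a factor of lam for losses."""
    return x ** alpha if x >= 0 else -lam * (-x) ** alpha

def weight(p, gamma=0.61):
    """pi(p): inverse-S probability weighting (Tversky & Kahneman, 1992)."""
    return p ** gamma / (p ** gamma + (1.0 - p) ** gamma) ** (1.0 / gamma)

def prospect_utility(gamble):
    """U = sum over outcomes i of pi(p_i) * v(x_i)."""
    return sum(weight(p) * value(x) for p, x in gamble)

# Mixed gamble: 50% win $10, 50% lose $10; U < 0 reflects loss aversion
print(prospect_utility([(0.5, 10.0), (0.5, -10.0)]))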