Simple Reinforcement Learning

Key estimated parameters:
α: learning rate, the rate at which past outcomes influence current choices

Q-learning:
$\delta_t = R_t - Q_t$
$Q_{t+1} = Q_t + \alpha \cdot \delta_t$
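As a minimal illustration, the Q-learning update above can be written in a few lines of Python; the variable names and the example reward sequence are illustrative rather than drawn from the source.

```python
def q_learning_update(q_value, reward, alpha):
    """Single Q-learning (Rescorla-Wagner) update for one option.

    q_value: current value estimate Q_t
    reward:  observed outcome R_t
    alpha:   learning rate (0 <= alpha <= 1)
    """
    delta = reward - q_value          # prediction error: delta_t = R_t - Q_t
    return q_value + alpha * delta    # Q_{t+1} = Q_t + alpha * delta_t


# Example: the value estimate moves toward the option's mean payoff
q = 0.0
for r in [1, 0, 1, 1, 0, 1]:
    q = q_learning_update(q, r, alpha=0.3)
print(round(q, 3))
```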
|
Model-Based/Model-Free Reinforcement Learning

Key estimated parameters:
α: learning rate, the rate at which past outcomes influence current choices
ω: weight parameter determining the relative influence of MB vs. MF values

Model-Free (MF):
$Q_{MF}(s_{i,t+1}, a_{i,t+1}) = Q_{MF}(s_{i,t}, a_{i,t}) + \alpha_i \cdot \delta_{i,t}$
$Q_{MF}(s_{1,t}, a_{1,t}) = Q_{MF}(s_{1,t}, a_{1,t}) + \alpha_1 \cdot \lambda \delta_{2,t}$ (where λ is an eligibility trace allowing the second-stage outcome to influence the first-stage choice)

Model-Based (MB):
$Q_{MB}(s_A, a_j) = P(s_B \mid s_A, a_j) \cdot \max_a Q_{MF}(s_B, a) + P(s_C \mid s_A, a_j) \cdot \max_a Q_{MF}(s_C, a)$

MB-MF balance:
$Q_{net}(s_A, a_j) = \omega \cdot Q_{MB}(s_A, a_j) + (1 - \omega) \cdot Q_{MF}(s_A, a_j)$
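A sketch of how the MB and MF first-stage values are combined, assuming a two-step task with second-stage states sB and sC; the data structures, transition probabilities, and example values are illustrative assumptions, not part of the original model specification.

```python
import numpy as np

def hybrid_values(q_mf_first, q_mf_second, transition_probs, omega):
    """Compute Q_net = omega * Q_MB + (1 - omega) * Q_MF for first-stage actions.

    q_mf_first:       array (n_actions,) of MF values for first-stage actions in s_A
    q_mf_second:      dict mapping second-stage state -> array of MF action values
    transition_probs: dict mapping (state, action) -> P(state | s_A, action)
    omega:            MB weight (0 = purely model-free, 1 = purely model-based)
    """
    n_actions = len(q_mf_first)
    q_mb = np.zeros(n_actions)
    for a in range(n_actions):
        # Q_MB(s_A, a) = sum over second-stage states s' of
        #   P(s' | s_A, a) * max_a' Q_MF(s', a')
        q_mb[a] = sum(transition_probs[(s, a)] * q_mf_second[s].max()
                      for s in q_mf_second)
    return omega * q_mb + (1 - omega) * np.asarray(q_mf_first)


# Hypothetical two-step structure: common (0.7) and rare (0.3) transitions
q_second = {"sB": np.array([0.8, 0.2]), "sC": np.array([0.1, 0.4])}
p_trans = {("sB", 0): 0.7, ("sC", 0): 0.3, ("sB", 1): 0.3, ("sC", 1): 0.7}
print(hybrid_values([0.5, 0.3], q_second, p_trans, omega=0.6))
```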
|
Economic Choice and Valuation

Key estimated parameters:
κ: discount rate, a measure of attitude towards delayed rewards
α: risk tolerance, a measure of attitude towards risky rewards
β: ambiguity tolerance, a measure of attitude towards ambiguous rewards
λ: loss aversion, a measure of avoidance of potential loss
B: sensitivity to losses and gains

Hyperbolic discounting
Expected utility theory with only risk: $U_{option} = p \cdot v^{\alpha}$
Expected utility theory with risk and ambiguity
Prospect theory: $U_{option} = \sum_i \pi(p_i) \cdot v(x_i)$
Loss aversion
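The table lists hyperbolic discounting, expected utility with risk and ambiguity, and loss aversion without reproducing their equations. The sketch below fills them in with common parameterizations from the literature (hyperbolic discounting as v / (1 + κD), an additive ambiguity adjustment in the spirit of Levy and colleagues, and the standard prospect-theory value function); these specific functional forms are assumptions rather than the source's exact specifications.

```python
def hyperbolic_discount(value, delay, kappa):
    # Common hyperbolic form (assumed): U = v / (1 + kappa * D)
    return value / (1.0 + kappa * delay)

def eu_risk(prob, value, alpha):
    # Expected utility with risk only, as in the table: U = p * v^alpha
    return prob * value ** alpha

def eu_risk_ambiguity(prob, value, alpha, beta, ambiguity):
    # One common form (assumed): the probability is shifted by the ambiguity
    # level A, scaled by the ambiguity attitude beta: U = (p + beta * A / 2) * v^alpha
    return (prob + beta * ambiguity / 2.0) * value ** alpha

def prospect_value(x, alpha, lam):
    # Standard prospect-theory value function with loss aversion (assumed):
    # v(x) = x^alpha for gains, -lambda * (-x)^alpha for losses
    return x ** alpha if x >= 0 else -lam * (-x) ** alpha

def prospect_utility(outcomes, weights, alpha, lam):
    # U = sum_i pi(p_i) * v(x_i), with decision weights pi(p_i) supplied directly
    return sum(w * prospect_value(x, alpha, lam) for x, w in zip(outcomes, weights))


# Illustrative use: a delayed $20 reward and a 50/50 gamble on +$50 / -$10
print(hyperbolic_discount(20, delay=30, kappa=0.05))                 # 8.0
print(prospect_utility([50, -10], [0.5, 0.5], alpha=0.9, lam=2.0))   # ~9.0
```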