Expected value of action 1: V1
Expected value of action 2: V2
Chosen action: Ai (either A1 or A2)
Reward received: r
Prediction error (having chosen Ai): δ = r − Vi
Learning rate: α
Updated expected values:
 Chosen action Ai: Vi → Vi + αδ
 Unchosen action Aj: Vj → Vj (unchanged)
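The update rule above can be made concrete with a short sketch. The following Python snippet is illustrative only, assuming a two-action choice task in which one action is repeatedly chosen and probabilistically rewarded; the function name update_values, the reward probability, and the learning rate value are hypothetical and not taken from the paper.

```python
import random

def update_values(V, chosen, reward, alpha):
    """Delta-rule update: only the chosen action's expected value changes."""
    delta = reward - V[chosen]             # prediction error: delta = r - Vi
    V[chosen] = V[chosen] + alpha * delta  # Vi -> Vi + alpha * delta
    return V, delta                        # unchosen action's Vj is left as-is

# Example: two actions A1 and A2 with initial expected values V1 = V2 = 0
V = [0.0, 0.0]
alpha = 0.1  # assumed learning rate for illustration

# Simulate a few trials in which A1 (index 0) is chosen and pays 1 with probability 0.8
for trial in range(5):
    chosen = 0
    reward = 1.0 if random.random() < 0.8 else 0.0
    V, delta = update_values(V, chosen, reward, alpha)
    print(f"trial {trial}: r = {reward}, delta = {delta:+.3f}, V = {V}")
```

On this scheme, positive prediction errors raise the chosen action's expected value and negative ones lower it, while the unchosen action's value is untouched, so over repeated trials Vi drifts toward the average reward actually delivered by that action.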