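| Quantity | Notation |
|---|---|
| Expected value of action $A_1$ | $V_1$ |
| Expected value of action $A_2$ | $V_2$ |
| Chosen action | $A_i$ (either $A_1$ or $A_2$) |
| Reward received | $r$ |
| Prediction error (having chosen $A_i$) | $\delta = r - V_i$ |
| Learning rate | $\alpha$ |
| Updated expected values: | |
| Chosen action $A_i$ | $V_i \rightarrow V_i + \alpha\delta$ |
| Unchosen action $A_j$ | $V_j \rightarrow V_j$ |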
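The update rule summarized in the table can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. It assumes two actions indexed 0 ($A_1$) and 1 ($A_2$), a scalar reward per trial, and initial expected values of zero. The function name `update_values` and the example trial sequence are hypothetical.

```python
def update_values(values, chosen, reward, alpha):
    """Apply one delta-rule update to the expected values.

    values: list of expected values, e.g. [V1, V2]
    chosen: index of the chosen action (0 for A1, 1 for A2)
    reward: reward r received on this trial
    alpha:  learning rate (assumed to lie in [0, 1])
    Returns (new_values, delta); the input list is not modified.
    """
    delta = reward - values[chosen]  # prediction error: delta = r - V_i
    new_values = list(values)        # copy, so the unchosen V_j stays unchanged
    new_values[chosen] = values[chosen] + alpha * delta  # V_i -> V_i + alpha * delta
    return new_values, delta


# Hypothetical example: three trials as (chosen action, reward) pairs.
values = [0.0, 0.0]  # assumed initial V1, V2
for chosen, reward in [(0, 1.0), (0, 1.0), (1, 0.0)]:
    values, delta = update_values(values, chosen, reward, alpha=0.5)
print(values)  # [0.75, 0.0]
```

With $\alpha = 0.5$, repeated rewards of 1 move $V_1$ halfway toward the reward on each trial (0 to 0.5 to 0.75), while $V_2$ changes only on trials where $A_2$ is chosen.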