2020 Nov 5;2(2):13. doi: 10.1007/s42484-020-00023-9

Fig. 1.

Transition from time step t to t + 1 (t = 0, 1, 2, …) via the agent's decision a_t, where s and λ denote the environment state and the reward (λ_0 = 0), respectively (adapted from Sutton and Barto (2018))
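
The transition in the caption can be illustrated with a minimal sketch of the agent-environment loop. Everything named here is an assumption made for illustration and does not come from the article: the toy environment `toy_env_step`, its four-state dynamics, its reward rule, and the `policy` callable are all hypothetical. The only details taken from the caption are that the agent chooses a_t after observing the state and reward at step t, that the environment then returns the state and reward for step t + 1, and that λ_0 = 0.

```python
def toy_env_step(state, action):
    """Hypothetical environment (not from the article).

    States are 0..3; the action is added to the state modulo 4.
    Reaching state 0 yields reward 1.0, anything else yields 0.0.
    """
    next_state = (state + action) % 4
    reward = 1.0 if next_state == 0 else 0.0
    return next_state, reward


def run_episode(num_steps, policy, s0=0):
    """Run the loop t -> t + 1 and record each transition.

    Each record is (t, s_t, lambda_t, a_t, s_{t+1}, lambda_{t+1}).
    """
    s, lam = s0, 0.0  # lambda_0 = 0, as stated in the caption
    trajectory = []
    for t in range(num_steps):
        a = policy(s, lam)                     # agent's decision a_t
        s_next, lam_next = toy_env_step(s, a)  # environment responds
        trajectory.append((t, s, lam, a, s_next, lam_next))
        s, lam = s_next, lam_next              # advance to t + 1
    return trajectory


if __name__ == "__main__":
    # A fixed policy that always picks action 1 (an assumption).
    for record in run_episode(4, lambda s, lam: 1):
        print(record)
```

The state and reward returned at one step become the agent's inputs at the next, which is the feedback arrow the figure depicts.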