Table 1.
Parameter | Value |
---|---|
β : Learning parameter | 0.15 |
λ : Eligibility persistence | 0.15 |
γ : Discount factor | 0.9 |
α : Eligibility decay rate | 1 − γλ |
ϵ : Exploration rate | 0.025 |
t* : Softmax time scale | 2000 trials |
m : Softmax scaling factor | 10 |
The test dataset consists of 1,000 random sequences; averages are computed over 100 simulations.
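As a worked check of the derived entry in Table 1, the sketch below collects the listed values in one place and evaluates the eligibility decay rate α = 1 − γλ. The class name `Table1Params` and its field names are hypothetical, chosen only for this illustration. The table does not say how t* and m enter the softmax, so they are stored here without being used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Table1Params:
    """Values copied from Table 1; all names here are illustrative assumptions."""
    beta: float = 0.15      # β: learning parameter
    lam: float = 0.15       # λ: eligibility persistence
    gamma: float = 0.9      # γ: discount factor
    epsilon: float = 0.025  # ϵ: exploration rate
    t_star: int = 2000      # t*: softmax time scale, in trials
    m: float = 10.0         # m: softmax scaling factor

    @property
    def alpha(self) -> float:
        # α: eligibility decay rate, defined in Table 1 as 1 − γλ
        return 1.0 - self.gamma * self.lam


params = Table1Params()
print(f"alpha = {params.alpha:.3f}")  # prints: alpha = 0.865
```

With γ = 0.9 and λ = 0.15, the product γλ is 0.135, so α evaluates to 0.865. Deriving α from γ and λ rather than storing it keeps the three values consistent if either input is changed.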