2023 Jun 21;111(12):1966–1978.e8. doi: 10.1016/j.neuron.2023.03.034

Hyperparameters

Algorithm     Hyperparameter                  Value
Q-learning    temporal discount factor, γ     0.9
Q-learning    TD(λ) decay factor, λ           0.5
Q-learning    learning rate, α                0.1
Q-learning    negative reward per step        0.01
SR            temporal discount factor, γ     0.9
SR            TD(λ) decay factor, λ           0.5
SR            learning rate, α                0.1
SARSA         temporal discount factor, γ     0.99
SARSA         TD(λ) decay factor, λ           0.5
SARSA         learning rate, α                0.1
SARSA         negative reward per step        0.001
Tile coding   tile size                       [2 × 2, 3 × 3]
MB-G          model buffer window, N          15
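
To show how the discount factor, TD(λ) decay factor, learning rate, and per-step negative reward fit together, here is a minimal Python sketch of one tabular SARSA(λ)-style update with accumulating eligibility traces. It is not the authors' implementation. The dictionary layout, the function name `td_lambda_update`, the table shapes, and the choice to subtract the per-step negative reward from the environment reward are all assumptions made for illustration; only the numeric values come from the table.

```python
import numpy as np

# Numeric values are copied from the hyperparameter table.
# The dictionary layout and key names are assumptions for illustration.
HYPERPARAMS = {
    "q_learning": {"gamma": 0.9, "lam": 0.5, "alpha": 0.1, "step_penalty": 0.01},
    "sr": {"gamma": 0.9, "lam": 0.5, "alpha": 0.1},
    "sarsa": {"gamma": 0.99, "lam": 0.5, "alpha": 0.1, "step_penalty": 0.001},
}


def td_lambda_update(Q, E, s, a, r, s_next, a_next, *,
                     gamma, lam, alpha, step_penalty=0.0):
    """Apply one SARSA(lambda)-style update in place and return the TD error.

    Q and E are (n_states, n_actions) arrays holding action values and
    eligibility traces. Subtracting step_penalty from the reward is an
    assumed reading of "negative reward per step".
    """
    # TD error, computed from the values before this update.
    delta = (r - step_penalty) + gamma * Q[s_next, a_next] - Q[s, a]
    # Accumulating trace for the visited state-action pair.
    E[s, a] += 1.0
    # Every pair is updated in proportion to its eligibility.
    Q += alpha * delta * E
    # Traces decay by gamma * lambda after each step.
    E *= gamma * lam
    return delta


if __name__ == "__main__":
    Q = np.zeros((3, 2))  # hypothetical 3-state, 2-action problem
    E = np.zeros_like(Q)
    delta = td_lambda_update(Q, E, s=0, a=1, r=1.0, s_next=1, a_next=0,
                             **HYPERPARAMS["sarsa"])
    print(delta, Q[0, 1], E[0, 1])
```

A Q-learning variant would replace `Q[s_next, a_next]` with the maximum over actions in `s_next`; how traces are handled after exploratory actions differs between formulations and is not specified by the table.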