Algorithm | Hyperparameter | Value |
---|---|---|
Q-learning | temporal discount factor γ | 0.9 |
Q-learning | TD(λ) decay factor λ | 0.5 |
Q-learning | learning rate, α | 0.1 |
Q-learning | neg. reward per step | 0.01 |
SR | temporal discount factor γ | 0.9 |
SR | TD(λ) decay factor λ | 0.5 |
SR | learning rate, α | 0.1 |
SARSA | temporal discount factor γ | 0.99 |
SARSA | TD(λ) decay factor λ | 0.5 |
SARSA | learning rate, α | 0.1 |
SARSA | neg. reward per step | 0.001 |
Tile coding | tile size | [2 × 2, 3 × 3] |
MB-G | model buffer window, N | 15 |
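
To illustrate how the Q-learning hyperparameters in the table interact, the following is a minimal sketch of a single tabular update with eligibility traces. It is not the implementation used here: the function name `q_lambda_step`, the list-of-lists tables `Q` and `E`, the use of accumulating traces without resetting them after exploratory actions, and the treatment of the per-step negative reward as a constant subtracted from the environment reward are all assumptions made for illustration.

```python
# Values copied from the Q-learning rows of the table.
GAMMA = 0.9          # temporal discount factor
LAMBDA = 0.5         # TD(lambda) decay factor
ALPHA = 0.1          # learning rate
STEP_PENALTY = 0.01  # negative reward per step (assumed to be subtracted)


def q_lambda_step(Q, E, s, a, r, s_next, done):
    """Apply one Q-learning update with accumulating eligibility traces.

    Q and E are lists of lists indexed as [state][action] and are
    modified in place. Returns the TD error of the transition.
    """
    reward = r - STEP_PENALTY
    # Bootstrap from the greedy value of the next state unless the episode ended.
    bootstrap = 0.0 if done else GAMMA * max(Q[s_next])
    delta = reward + bootstrap - Q[s][a]

    # Mark the visited state-action pair as eligible.
    E[s][a] += 1.0

    # Spread the TD error over all eligible pairs, then decay every trace.
    for i in range(len(Q)):
        for j in range(len(Q[i])):
            Q[i][j] += ALPHA * delta * E[i][j]
            E[i][j] *= GAMMA * LAMBDA
    return delta


# Hypothetical toy problem with 3 states and 2 actions.
Q = [[0.0, 0.0] for _ in range(3)]
E = [[0.0, 0.0] for _ in range(3)]
delta = q_lambda_step(Q, E, s=0, a=1, r=1.0, s_next=1, done=False)
```

With λ = 0.5 and γ = 0.9, each trace shrinks by a factor of 0.45 per step, so credit for a reward reaches only a few preceding state-action pairs before it becomes negligible.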