Table 1.

| Name | k_m | Description |
| --- | --- | --- |
| Baseline | 1 | No error term. Chooses each option with a fixed probability across all trials. A standard comparison against which to evaluate the other models. |
| Softmax | 2 | Error term: δ = r_{t+1} − Q(a_i). Estimates the average outcome of each action. Does not take different states or future outcomes into account. Predicts melioration. (A minimal update sketch follows the table.) |
| Eligibility Trace (ET) | 3 | Error term: δ = r_{t+1} − Q(a_i). Estimates the average outcome of each action but adds a decaying memory of recent action selections (eligibility traces). With an appropriate decay term, can predict maximizing behavior. |
| Q-learning Network | 3 | Error term: δ = r_{t+1} + γ max_a Q(s_{t+1}; a) − Q(s_t; a_t). Uses a linear network to approximate distinct state representations. The error term includes a discounted estimate of future reward. Depending on the setting of the discount term γ and the nature of the state cues provided, can predict maximizing behavior. (See the second sketch following the table.) |
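
The following is a minimal sketch of the Softmax and Eligibility Trace (ET) update rules summarized in Table 1, assuming a simple two-option choice task. The parameter names (alpha, beta, lam), the reward probabilities, and the trial count are illustrative assumptions, not values from the paper; setting the trace decay to zero recovers the plain Softmax model.

```python
# Illustrative sketch (not the paper's implementation) of the Softmax and ET
# update rules from Table 1. Parameters and the toy reward schedule are assumptions.
import numpy as np

rng = np.random.default_rng(0)

def softmax_probs(q, beta):
    """Softmax choice rule: P(a_i) proportional to exp(beta * Q(a_i))."""
    z = beta * (q - q.max())              # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

def run_softmax(n_trials=500, alpha=0.1, beta=3.0, lam=0.0):
    """Simulate the Softmax model (lam = 0) or the ET model (0 < lam < 1).

    Error term from Table 1:  delta = r_{t+1} - Q(a_i).
    ET adds a decaying trace e over recently selected actions, spreading
    credit for the outcome across the last few choices.
    """
    q = np.zeros(2)                       # action values Q(a_i)
    e = np.zeros(2)                       # eligibility traces
    for _ in range(n_trials):
        p = softmax_probs(q, beta)
        a = rng.choice(2, p=p)
        r = float(rng.random() < (0.7 if a == 0 else 0.3))  # toy reward schedule (assumption)
        e = lam * e                       # decay all traces
        e[a] += 1.0                       # mark the chosen action as eligible
        delta = r - q[a]                  # prediction error delta = r_{t+1} - Q(a_i)
        q += alpha * delta * e            # trace-weighted update; reduces to plain Softmax when lam = 0
    return q

print("Softmax Q:", run_softmax(lam=0.0))
print("ET Q:     ", run_softmax(lam=0.6))
```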
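The second sketch illustrates the Q-learning Network row: a linear function approximator over state cues trained with the one-step TD error δ = r_{t+1} + γ max_a Q(s_{t+1}; a) − Q(s_t; a_t). The two-state cue coding, transition rule, reward values, and parameters are hypothetical choices made only for illustration.

```python
# Illustrative sketch (assumed task, not the paper's) of a Q-learning model with a
# linear network over state cues, using the discounted TD error from Table 1.
import numpy as np

rng = np.random.default_rng(1)

def q_values(w, s):
    """Linear network: Q(s; a) = w[a] . s for each action a."""
    return w @ s

def run_q_network(n_trials=1000, alpha=0.1, beta=3.0, gamma=0.9):
    n_actions, n_cues = 2, 2
    w = np.zeros((n_actions, n_cues))      # one weight vector per action
    s = np.eye(n_cues)[0]                  # start in state 0 (one-hot cue vector)
    for _ in range(n_trials):
        q = q_values(w, s)
        p = np.exp(beta * (q - q.max()))   # softmax choice over network outputs
        p /= p.sum()
        a = rng.choice(n_actions, p=p)
        # Toy dynamics (assumption): action a moves to state a, and state 1
        # pays the larger immediate reward in this schedule.
        s_next = np.eye(n_cues)[a]
        r = 1.0 if s_next[1] else 0.5
        # TD error with a discounted estimate of future reward (Table 1, Q-learning Network)
        delta = r + gamma * q_values(w, s_next).max() - q[a]
        w[a] += alpha * delta * s          # gradient step for the chosen action's weights
        s = s_next
    return w

print("Learned weights:\n", run_q_network())
```

With distinct state cues and a nonzero γ, the network's value estimates incorporate future reward, which is what allows this class of model to predict maximizing rather than meliorating behavior.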