Table 1.

| Name | k_m | Description |
| --- | --- | --- |
| Baseline | 1 | No error term. Chooses each option with a fixed probability across all trials. A standard comparison against which to evaluate the other models. |
| Softmax | 2 | Error term: δ = r_{t+1} − Q(a_i). Estimates the average outcome of each action. Does not take different states or future outcomes into account. Predicts melioration. (A minimal update sketch follows the table.) |
| Eligibility Trace (ET) | 3 | Error term: δ = r_{t+1} − Q(a_i). Estimates the average outcome of each action but adds a decaying memory of recent action selections (eligibility traces). With an appropriate decay term, can predict maximizing behavior. |
| Q-learning Network | 3 | Error term: δ = r_{t+1} + γ max_a Q(s_{t+1}; a) − Q(s_t; a_t). Uses a linear network to approximate distinct state representations. The error term includes a discounted estimate of future reward. Depending on the setting of the discount term γ and the nature of the state cues provided, can predict maximizing behavior. (See the second sketch following the table.) |
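
The following is a minimal sketch of the Softmax and Eligibility Trace (ET) update rules summarized in Table 1, assuming a simple two-option choice task. The parameter names (alpha, beta, lam), the reward probabilities, and the trial count are illustrative assumptions, not values from the paper; setting the trace decay to zero recovers the plain Softmax model.

```python
# Illustrative sketch (not the paper's implementation) of the Softmax and ET
# update rules from Table 1. Parameters and the toy reward schedule are assumptions.
import numpy as np

rng = np.random.default_rng(0)

def softmax_probs(q, beta):
    """Softmax choice rule: P(a_i) proportional to exp(beta * Q(a_i))."""
    z = beta * (q - q.max())              # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

def run_softmax(n_trials=500, alpha=0.1, beta=3.0, lam=0.0):
    """Simulate the Softmax model (lam = 0) or the ET model (0 < lam < 1).

    Error term from Table 1:  delta = r_{t+1} - Q(a_i).
    ET adds a decaying trace e over recently selected actions, spreading
    credit for the outcome across the last few choices.
    """
    q = np.zeros(2)                       # action values Q(a_i)
    e = np.zeros(2)                       # eligibility traces
    for _ in range(n_trials):
        p = softmax_probs(q, beta)
        a = rng.choice(2, p=p)
        r = float(rng.random() < (0.7 if a == 0 else 0.3))  # toy reward schedule (assumption)
        e = lam * e                       # decay all traces
        e[a] += 1.0                       # mark the chosen action as eligible
        delta = r - q[a]                  # prediction error delta = r_{t+1} - Q(a_i)
        q += alpha * delta * e            # trace-weighted update; reduces to plain Softmax when lam = 0
    return q

print("Softmax Q:", run_softmax(lam=0.0))
print("ET Q:     ", run_softmax(lam=0.6))
```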
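The second sketch illustrates the Q-learning Network row: a linear function approximator over state cues trained with the one-step TD error δ = r_{t+1} + γ max_a Q(s_{t+1}; a) − Q(s_t; a_t). The two-state cue coding, transition rule, reward values, and parameters are hypothetical choices made only for illustration.

```python
# Illustrative sketch (assumed task, not the paper's) of a Q-learning model with a
# linear network over state cues, using the discounted TD error from Table 1.
import numpy as np

rng = np.random.default_rng(1)

def q_values(w, s):
    """Linear network: Q(s; a) = w[a] . s for each action a."""
    return w @ s

def run_q_network(n_trials=1000, alpha=0.1, beta=3.0, gamma=0.9):
    n_actions, n_cues = 2, 2
    w = np.zeros((n_actions, n_cues))      # one weight vector per action
    s = np.eye(n_cues)[0]                  # start in state 0 (one-hot cue vector)
    for _ in range(n_trials):
        q = q_values(w, s)
        p = np.exp(beta * (q - q.max()))   # softmax choice over network outputs
        p /= p.sum()
        a = rng.choice(n_actions, p=p)
        # Toy dynamics (assumption): action a moves to state a, and state 1
        # pays the larger immediate reward in this schedule.
        s_next = np.eye(n_cues)[a]
        r = 1.0 if s_next[1] else 0.5
        # TD error with a discounted estimate of future reward (Table 1, Q-learning Network)
        delta = r + gamma * q_values(w, s_next).max() - q[a]
        w[a] += alpha * delta * s          # gradient step for the chosen action's weights
        s = s_next
    return w

print("Learned weights:\n", run_q_network())
```

With distinct state cues and a nonzero γ, the network's value estimates incorporate future reward, which is what allows this class of model to predict maximizing rather than meliorating behavior.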