
Table 1.

Summary of models tested. The value k_m denotes the number of free parameters in the model (see Equation 8). Minimal code sketches of each model appear after the table.

Baseline (k_m = 1)
No error term. Chooses each option with a fixed probability across all trials. A standard comparison against which to evaluate the other models.

Softmax (k_m = 2)
Error term: δ = r_{t+1} − Q(a_i)
Estimates the average outcome from each action. Does not take into account different states or future outcomes. Predicts melioration.

Eligibility Trace (ET) (k_m = 3)
Error term: δ = r_{t+1} − Q(a_i)
Estimates the average outcome from each action but includes a decaying memory for recent action selections (eligibility traces). With an appropriate decay term, can predict maximizing behavior.

Q-learning Network (k_m = 3)
Error term: δ = r_{t+1} + γ max_a Q(s_{t+1}, a) − Q(s_t, a_t)
Uses a linear network to approximate distinct state representations. The error term includes a discounted estimate of future reward. Depending on the setting of the discount term γ and the nature of the state cues provided, can predict maximizing behavior.
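
The sketches below are illustrative only: the table does not give reference implementations, so the parameter names (α, β, λ, γ, p) and the log-likelihood interface are assumptions. Each function returns the log-likelihood of an observed choice sequence, the quantity one would maximize when fitting the k_m free parameters. First, the Baseline model, whose single parameter is taken here to be a fixed probability of choosing the first of two options:

```python
import numpy as np

def baseline_loglik(choices, p=0.5):
    """Baseline: fixed choice probability, independent of outcomes or states."""
    choices = np.asarray(choices)
    # Probability of each observed choice: p for option 0, 1 - p for option 1.
    probs = np.where(choices == 0, p, 1.0 - p)
    return np.log(probs).sum()
```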
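A minimal sketch of the Softmax model, assuming its two free parameters are a learning rate α and an inverse temperature β; action values are updated by the delta rule with error term δ = r_{t+1} − Q(a_i):

```python
import numpy as np

def softmax_loglik(choices, rewards, alpha=0.1, beta=2.0, n_actions=2):
    """Softmax: delta-rule estimate of each action's average outcome."""
    Q = np.zeros(n_actions)        # running average-outcome estimate per action
    log_lik = 0.0
    for a, r in zip(choices, rewards):
        p = np.exp(beta * Q)
        p /= p.sum()               # softmax choice probabilities
        log_lik += np.log(p[a])
        delta = r - Q[a]           # error term: δ = r_{t+1} − Q(a_i)
        Q[a] += alpha * delta      # update only the chosen action
    return log_lik
```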
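The Eligibility Trace model extends the Softmax sketch with a decaying memory over recent action selections; the third parameter is assumed here to be a trace decay rate λ, and an accumulating trace is used (one of several standard variants):

```python
import numpy as np

def et_loglik(choices, rewards, alpha=0.1, beta=2.0, lam=0.5, n_actions=2):
    """Eligibility Trace: Softmax plus a decaying trace over recent choices."""
    Q = np.zeros(n_actions)
    e = np.zeros(n_actions)        # eligibility trace per action
    log_lik = 0.0
    for a, r in zip(choices, rewards):
        p = np.exp(beta * Q)
        p /= p.sum()
        log_lik += np.log(p[a])
        e *= lam                   # decay the memory of past selections
        e[a] += 1.0                # mark the current selection
        delta = r - Q[a]           # same error term as the Softmax model
        Q += alpha * delta * e     # credit spreads over recently chosen actions
    return log_lik
```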
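A sketch of the Q-learning Network model, assuming `states` supplies a feature vector of state cues on each trial and Q(s, a) is approximated linearly with one weight vector per action; the three free parameters are taken to be α, β, and the discount γ:

```python
import numpy as np

def qnet_loglik(choices, rewards, states, alpha=0.1, beta=2.0, gamma=0.9,
                n_actions=2):
    """Q-learning with a linear network over state cues."""
    states = [np.asarray(s, dtype=float) for s in states]
    W = np.zeros((n_actions, len(states[0])))  # one weight vector per action
    log_lik = 0.0
    for t, (a, r) in enumerate(zip(choices, rewards)):
        q = W @ states[t]                      # Q(s_t, ·) for all actions
        p = np.exp(beta * q)
        p /= p.sum()
        log_lik += np.log(p[a])
        if t + 1 < len(states):
            q_next = (W @ states[t + 1]).max() # max_a Q(s_{t+1}, a)
        else:
            q_next = 0.0                       # no successor state on the last trial
        delta = r + gamma * q_next - q[a]      # discounted TD error
        W[a] += alpha * delta * states[t]      # linear-network weight update
    return log_lik
```

Note that with γ = 0 and a single constant cue this sketch reduces to the Softmax model, which is one way to see why the discount term and the available state cues jointly determine whether the model can predict maximizing.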