2019 May 9;13:153. doi: 10.3389/fnhum.2019.00153

Figure 2.

A conceptual framework for the examined assumptions. (A) The standard model-free algorithm is the SARSA temporal-difference (TD) learning model, in which values are computed for all state-action pairs (standard value updating). We examined an alternative in which only the action values in the choice stage are computed (parsimonious value updating). (B) The original model-based system assumes that the expected values of all state-action pairs are recalculated each time from the transition-probability model of the task (the forward-looking model-based system). This system carries a high computational cost but realizes fully model-based updating. As an alternative, we applied model-based updating to the credit assignment problem (the backward-looking model-based system). Using the transition-probability model of the task, this system updates only the state-action pairs related to the last state that produced the outcome; when the transition probabilities are stable, it works efficiently, with accuracy similar to that of the forward-looking model-based system. (C) In the standard TD learning algorithm, the values of unselected options are assumed to remain unchanged (without forgetting). We examined an alternative in which the values of unselected options change to a certain default value over time (with forgetting).
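
The update rules named in the caption can be illustrated with a minimal Python sketch. It is not the fitted model from the article: the function names, the parameter names (`alpha`, `alpha_f`, `mu`, `gamma`), the two-stage layout, and the 0.7/0.3 transition probabilities are all assumptions chosen for illustration. The sketch covers the standard SARSA update (A), a forward-looking model-based expectation (B), and forgetting toward a default value (C). The parsimonious and backward-looking variants are not shown.

```python
def sarsa_update(values, action, reward, alpha, next_value=0.0, gamma=1.0):
    """One SARSA TD step on values[action].

    next_value is Q(s', a') for the next chosen pair, or 0.0 at the end of a trial.
    Returns the prediction error.
    """
    delta = reward + gamma * next_value - values[action]
    values[action] += alpha * delta
    return delta


def forget_unselected(values, chosen, alpha_f, mu):
    """Move each unselected action value toward the default value mu.

    Assumption: forgetting is applied only to the other actions in the visited state.
    """
    for a in range(len(values)):
        if a != chosen:
            values[a] += alpha_f * (mu - values[a])


def forward_model_based(q_stage2, transition):
    """First-stage values as sum over s' of P(s' | a) * max over a' of Q(s', a')."""
    return [
        sum(p * max(q_stage2[s2]) for s2, p in enumerate(probs))
        for probs in transition
    ]


# Hypothetical single trial: first-stage action 0 leads to second-stage state 0,
# where action 0 is rewarded.
q1 = [0.0, 0.0]                  # first-stage action values
q2 = [[0.0, 0.0], [0.0, 0.0]]    # second-stage values, indexed [state][action]

sarsa_update(q1, 0, reward=0.0, alpha=0.5, next_value=q2[0][0])  # first-stage step
sarsa_update(q2[0], 0, reward=1.0, alpha=0.5)                    # outcome step
forget_unselected(q2[0], chosen=0, alpha_f=0.2, mu=0.5)          # panel C

# Panel B: recompute first-stage values from an assumed transition model.
transition = [[0.7, 0.3], [0.3, 0.7]]   # transition[a][s2] = P(s2 | a), illustrative
q1_mb = forward_model_based(q2, transition)
```

Setting `alpha_f` to 0 recovers the "without forgetting" case, in which unselected values stay unchanged. The forward-looking function recomputes every first-stage value on each call, which is the source of the higher computational cost the caption mentions.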