2014 Mar 12;8:76. doi: 10.3389/fnbeh.2014.00076

Figure 2.


Interactions between model-based and model-free decision-making. Action values for a hypothetical agent, a person following a diet plan, deciding whether or not to consume biscuits when presented with a cue, the biscuit tin. The agent's choice combines model-based and model-free values.
(A) A decision tree (semi-Markov state space) represented by the model-based system when the agent considers the decision from state P, a time interval d_p in advance of encountering the biscuit tin, denoted by state B. Alternative courses of action at B, to consume or to abstain, are evaluated by searching through the tree of future possibilities. The choice to consume is followed after a short delay, d_c, by a food reward, R_c, associated with consumption, denoted by state C, and then after a longer delay, d_h, by the maintenance of current body weight, denoted by the unrewarded state U. The choice to abstain is followed after delay d_c by the unrewarded state A, and then after delay d_h by a health benefit with reward R_h, in the form of weight loss. The agent is naïve to the parallel effects of model-free learning when computing these reward estimates. Model-based action values, Q_MB, are given by the sum of future rewards following each action, discounted according to a function, D(t), assumed to be exponential and identical across both controllers. The equations below indicate that the model-based system in this instance is indifferent between consuming and abstaining at both P (left-hand equation) and B (right-hand equation).
(B) Cached values stored by the model-free system, which reflect the result of prior experience with the outcomes. Neither the outcomes themselves, nor the transitions between them, are explicitly represented. Similarly, because the distant health consequences have never been experienced, they do not influence the model-free Q-values, Q_MF. As a result, the model-free system prefers consumption at state B.
(C) Model-based and model-free values are assumed to combine according to a weighted average, governed by the parameter ω. At P, where model-free values have no influence, the agent is indifferent between consuming and abstaining. In the presence of the biscuit tin at B, however, the additional influence of model-free (cached) values induces a preference for consumption.
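
The three panels can be illustrated with a minimal numerical sketch. It is not the authors' implementation, and every number in it is an assumption chosen for illustration: the discount rate, the delays, the consumption reward, and the weight ω. The caption does not state which controller ω multiplies, so the sketch assumes ω weights the model-based value and (1 - ω) the model-free value. It further assumes that the cached model-free value for consuming at B equals the discounted food reward, and that all other cached values are zero. The health reward is set so that the model-based system is exactly indifferent at B, as the caption describes.

```python
import math

# All values below are hypothetical, chosen only to illustrate the caption.
K = 0.1                         # assumed exponential discount rate
D_P, D_C, D_H = 5.0, 1.0, 10.0  # assumed delays d_p, d_c, d_h (arbitrary units)
R_C = 1.0                       # assumed food reward at state C
OMEGA = 0.5                     # assumed weight on the model-based value


def discount(t, k=K):
    """Exponential discount function D(t) = exp(-k * t)."""
    return math.exp(-k * t)


# Pick R_h so that D(d_c) * R_c == D(d_c + d_h) * R_h (indifference at B).
R_H = R_C / discount(D_H)


def q_model_based(lead_time):
    """Discounted future reward per action, evaluated lead_time before B."""
    return {
        "consume": discount(lead_time + D_C) * R_C,        # R_c arrives at C
        "abstain": discount(lead_time + D_C + D_H) * R_H,  # R_h arrives after A
    }


def q_model_free(at_tin):
    """Cached values: only the experienced food reward, and only at the cue B."""
    if not at_tin:
        return {"consume": 0.0, "abstain": 0.0}
    return {"consume": discount(D_C) * R_C, "abstain": 0.0}


def combine(q_mb, q_mf, omega=OMEGA):
    """Weighted average of the two controllers (assumed weighting direction)."""
    return {a: omega * q_mb[a] + (1.0 - omega) * q_mf[a] for a in q_mb}


q_at_p = combine(q_model_based(D_P), q_model_free(at_tin=False))
q_at_b = combine(q_model_based(0.0), q_model_free(at_tin=True))

print("At P:", q_at_p)
print("At B:", q_at_b)
```

Because exponential discounting scales both model-based values by the same factor D(d_p) when the decision is viewed from P, indifference at B carries over to P. Under these assumptions, only the cached value at B breaks the tie, in favour of consumption.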