2014 Mar 12;8:76. doi: 10.3389/fnbeh.2014.00076

Copyright © 2014 Story, Vlaev, Seymour, Darzi and Dolan.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

Interactions between model-based and model-free decision-making. Action values for a hypothetical agent, a person following a diet plan, deciding whether or not to consume biscuits when presented with a cue, the biscuit tin. The agent's choice combines model-based and model-free value.
(A) A decision tree (semi-Markov state space) represented by the model-based system when the agent considers the decision from state P, at a time interval d_p in advance of encountering the biscuit tin, denoted by state B. Alternative courses of action at B, to consume or to abstain, are evaluated by searching through the tree of future possibilities. The choice to consume is followed after a short delay, d_c, by a food reward, R_c, associated with consumption, denoted by the state C, followed after a longer delay, d_h, by the maintenance of current body weight, denoted by the unrewarded state, U. The choice to abstain is followed after delay d_c by the unrewarded state A, followed after delay d_h by a health benefit with reward R_h, in the form of weight loss. The agent is naïve to the parallel effects of model-free learning when computing these reward estimates. Model-based action values, Q_MB, are given by the sum of future rewards following each action, discounted according to a function, D(t), assumed to be exponential and identical across both controllers. The equations below indicate that the model-based system in this instance is indifferent between consuming and abstaining at both P (left-hand equation) and B (right-hand equation).
(B) Cached values stored by the model-free system, which reflect the result of prior experience with the outcomes. Neither the outcomes themselves nor the transitions between them are explicitly represented. Moreover, because the distant health consequences have never been experienced, they do not influence the model-free Q-values, Q_MF. As a result, the model-free system prefers consumption at state B.
(C) Model-based and model-free values are assumed to combine according to a weighted average, governed by the parameter ω. At P, where model-free values have no influence, the agent is indifferent between consuming and abstaining. In the presence of the biscuit tin at B, however, the additional influence of model-free (cached) values induces a preference for consumption.
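
The arithmetic described in panels (A) to (C) can be illustrated with a minimal Python sketch. It is not taken from the article: the discount rate `k`, the delays, the reward magnitudes, the cached model-free values, and the convention that ω weights the model-free term are all hypothetical choices made for illustration. The health reward is deliberately set so that the model-based system is indifferent, as the caption states.

```python
import math

def discount(t, k=0.1):
    """Exponential discount D(t) = exp(-k * t). The rate k is an assumed value."""
    return math.exp(-k * t)

def model_based_values(d_pre, d_c, d_h, r_c, r_h, k=0.1):
    """Discounted future reward for each action, evaluated d_pre before state B."""
    # Consume: reward R_c arrives at state C; the later state U is unrewarded.
    q_consume = discount(d_pre + d_c, k) * r_c
    # Abstain: state A is unrewarded; reward R_h arrives after a further delay d_h.
    q_abstain = discount(d_pre + d_c + d_h, k) * r_h
    return {"consume": q_consume, "abstain": q_abstain}

def combine(q_mb, q_mf, omega):
    """Weighted average of the two controllers.
    Assumption: omega is the weight on the model-free value."""
    return {a: (1 - omega) * q_mb[a] + omega * q_mf[a] for a in q_mb}

# Hypothetical parameters, not values from the article.
k = 0.1
d_p, d_c, d_h = 5.0, 1.0, 10.0
r_c = 1.0
r_h = r_c / discount(d_h, k)  # chosen so the model-based system is indifferent
omega = 0.5

q_mb_at_P = model_based_values(d_p, d_c, d_h, r_c, r_h, k)
q_mb_at_B = model_based_values(0.0, d_c, d_h, r_c, r_h, k)

# Cached model-free values: no influence at P; at B only the experienced
# food reward is cached, never the distant health outcome.
q_mf_at_P = {"consume": 0.0, "abstain": 0.0}
q_mf_at_B = {"consume": 0.5, "abstain": 0.0}

q_at_P = combine(q_mb_at_P, q_mf_at_P, omega)
q_at_B = combine(q_mb_at_B, q_mf_at_B, omega)

print("At P:", q_at_P)
print("At B:", q_at_B)
```

Under these assumed numbers the combined values for the two actions are equal at P, while at B the cached term raises the value of consuming above that of abstaining. Because the discount function is exponential, indifference of the model-based system at B carries over to P for any advance interval d_p.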