Nat Commun. 2021 Aug 16;12:4942. doi: 10.1038/s41467-021-25123-3

Fig. 7. Learning the default policy results in soft habits.


a–c A simple choice task (a) in which the default policy has been extensively trained under conditions in which state B is rewarding. In this case, the overtrained default policy favors choice of B by default (b), which softly biases choice away from A even after the rewarded goal has moved in the test phase (c). This effect is larger when the control cost parameter, λ, is larger, because this parameter sets the relative weight of the control cost for diverging from the default policy (see “Methods”, Eq. (6)). d The default policy has been trained extensively to find a goal located in the blue square. e, f Performance of the model with overtrained vs. uniform (i.e., untrained) default policies on this task, in which the goal has been moved but remains in the same room (e); the overtrained model performs better here (f). However, when the goal has been moved to a different room (g–i), the model with a uniform default policy (no training; g) performs better than the overtrained model, which habitually enters the room in which it was overtrained (h). Mean, standard error of the mean, and the distribution of data across 1000 simulations are plotted in panels f and i. For overtraining, the model experienced 1000 episodes of the task with step size 0.01. Source data are provided as a Source Data file.
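To make the role of λ concrete, the following is a minimal sketch of a KL-regularized ("soft habit") choice rule of the form π(a) ∝ π_default(a)·exp(Q(a)/λ), which is one standard way to implement a λ-weighted control cost for diverging from a default policy. It is an illustration under that assumption, not the paper's exact linear-RL implementation; the function name, Q-values, and default probabilities are hypothetical.

```python
import numpy as np

def soft_habit_policy(q_values, default_policy, lam):
    """Illustrative KL-regularized policy: pi(a) ∝ pi_default(a) * exp(Q(a)/lam).

    Larger lam puts more weight on the control cost for diverging from the
    default policy, so choices stay closer to the (possibly overtrained) habit.
    Hypothetical sketch, not the paper's implementation.
    """
    logits = np.log(np.asarray(default_policy)) + np.asarray(q_values) / lam
    logits -= logits.max()          # subtract max for numerical stability
    p = np.exp(logits)
    return p / p.sum()

# Two-choice test phase as in panel a: A is now rewarded, but the
# overtrained default policy still favors B.
q_new = [1.0, 0.0]                  # reward has moved to A
overtrained_default = [0.1, 0.9]    # habit favors B

for lam in (0.5, 2.0):
    print(lam, soft_habit_policy(q_new, overtrained_default, lam))
```

With the larger λ, the resulting choice probabilities are pulled more strongly toward the default option B, mirroring the caption's point that the soft-habit bias grows with the control cost parameter.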