2022 Dec 7;9(12):211800. doi: 10.1098/rsos.211800

Figure 1.

Value, decision information and free energy plots in a 5 × 5 gridworld with cardinal (Manhattan) actions $\mathcal{A} = \{\uparrow, \rightarrow, \downarrow, \leftarrow\}$. The goal $g$ = #12 is in the centre and is coloured yellow in the grid plots. The arrow lengths are proportional to the conditional probability $\pi(a \mid s)$ in the indicated direction. The relevant prior $\hat{p}(a;\pi)$, i.e. the joint state and action distribution marginalized over all transient states, is shown in the yellow goal state. (a) The policy displayed is the optimal value policy $\pi_V = \arg\max_{\pi} V^{\pi}_g(s)$ for all $s \in \mathcal{S}$. The heatmap and annotations show the negative optimal value function $V^{\pi_V}_g(s)$ for each state. (b) The policy displayed is optimal with respect to free energy, i.e. $\pi_F = \arg\min_{\pi} F^{\pi}_g(s;\beta)$ for all $s \in \mathcal{S}$. The heatmap and annotations show decision information $D^{\pi_F}(s)$ with $\beta = 100$. (c) The policy displayed is again $\pi_F$ with $\beta = 100$. The heatmap and annotations show free energy $F^{\pi_F}_g(s;\beta)$. (d) Graph showing the numbering of states in the gridworld; the goal is coloured green and the other colours indicate levels radiating from the centre.
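The quantities in panel (a) can be reproduced approximately with a short value-iteration sketch. This is an illustration, not the authors' code, and it rests on several assumptions that the caption does not state: states are numbered 0 to 24 in row-major order (so that #12 is the centre), transitions are deterministic, a move into a wall leaves the agent in place, each step costs 1, and ties between equally good actions are split uniformly. The names `step`, `value_iteration` and `greedy_policy` are hypothetical. Decision information and free energy are not computed here because their definitions are not given in the caption.

```python
# Sketch under assumptions: row-major state numbering, deterministic moves,
# a cost of 1 per step (reward -1), and an absorbing goal state with value 0.
N = 5          # grid side length
GOAL = 12      # centre state under row-major, zero-based numbering (assumed)
ACTIONS = {"up": (-1, 0), "right": (0, 1), "down": (1, 0), "left": (0, -1)}


def step(state, action):
    """Deterministic transition; a move that would leave the grid keeps the agent in place."""
    row, col = divmod(state, N)
    d_row, d_col = ACTIONS[action]
    new_row, new_col = row + d_row, col + d_col
    if 0 <= new_row < N and 0 <= new_col < N:
        return new_row * N + new_col
    return state


def value_iteration(goal=GOAL, n_sweeps=100):
    """Return optimal values V(s) <= 0, i.e. minus the number of steps to the goal."""
    values = [0.0] * (N * N)
    for _ in range(n_sweeps):
        new_values = values[:]
        for s in range(N * N):
            if s == goal:
                continue  # the goal is absorbing and keeps value 0
            new_values[s] = max(-1.0 + values[step(s, a)] for a in ACTIONS)
        if new_values == values:
            break  # no state changed, so the values have converged
        values = new_values
    return values


def greedy_policy(values, goal=GOAL):
    """Uniform distribution over value-maximising actions in each non-goal state (assumed tie rule)."""
    policy = {}
    for s in range(N * N):
        if s == goal:
            continue
        q = {a: -1.0 + values[step(s, a)] for a in ACTIONS}
        best = max(q.values())
        winners = [a for a in ACTIONS if q[a] == best]
        policy[s] = {a: (1.0 / len(winners) if a in winners else 0.0) for a in ACTIONS}
    return policy


values = value_iteration()
policy = greedy_policy(values)
```

Under these assumptions, `-values[s]` equals the Manhattan distance from state `s` to the goal, which is the kind of quantity the heatmap in panel (a) annotates, and `policy[s][a]` plays the role of $\pi(a \mid s)$ that sets the arrow lengths.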