Computational model. Top left: The Markov decision process used to model the approach-avoidance conflict task. The generative model is depicted graphically, such that arrows indicate dependencies between variables. Observations (o) depend on hidden states (s; this relationship is specified by the A matrix), and those states depend on both previous states (as specified by the B matrix, or by the D matrix for the initial states) and the sequences of actions, or policies (π), selected by the agent. The probability of selecting a particular policy in turn depends on the expected free energy (G) of each policy with respect to the preferences (C) of the decision-maker being modelled. The degree to which expected free energy influences policy selection is also modulated by an expected precision term (γ), which in turn depends on a prior policy precision parameter (β), where higher values of β promote greater decision uncertainty (i.e., less influence of the differences in expected free energy across policies). For more details on the associated mathematics, see Friston and colleagues (refs. 52 and 57) and Appendix 1, available at jpn.ca/200032-a1. In our model, the observations were cues indicating the trial type, cues indicating the position of the avatar, and the outcome stimuli. The hidden states included beliefs about trial type and avatar position, and the policies included the choice to move the avatar to any other position on the runway. Right: The A matrices show the mapping between states and observations for outcome stimuli. Here, the rows correspond to the stimuli (the first row is the “start” observation), and the columns correspond to the avatar position states (column 1 corresponds to the “start” state, and columns 2 to 10 correspond to choosing each of the 9 runway positions). Lighter colours in these A matrices indicate higher probabilities. Trial types: AV = avoid; APP = approach; CONF2, CONF4 and CONF6 = conflict + 2, 4 or 6 points, respectively.
Bottom left: The model parameters were “emotional conflict” (EC), defined as the degree to which the negative stimulus was dyspreferred relative to the degree to which the points were preferred in the C matrix, and the prior policy precision parameter β, which reflected decision uncertainty. Top middle: Example simulations of action selection under different parameter values during the CONF2 trial type. Blue dots indicate chosen actions, and darker colours indicate higher action values in the model.
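The role of β in policy selection described above can be illustrated with a minimal sketch. It assumes a simplification that is not stated in the caption: the expected precision γ is held fixed at its prior value 1/β rather than being updated during inference, and policy probabilities are taken as a softmax over negative expected free energy. The function name `policy_probabilities` and the values in `G` are hypothetical and chosen only for illustration; they are not taken from the fitted model.

```python
import math

def policy_probabilities(G, beta):
    """Softmax over negative expected free energy, scaled by gamma = 1/beta.

    Assumption: gamma is fixed at its prior expectation (1/beta); in the
    full model gamma would also be updated during inference.
    """
    gamma = 1.0 / beta
    scores = [-gamma * g for g in G]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical expected free energies for 4 policies (lower is better).
G = [3.0, 2.0, 1.0, 2.5]

low_beta = policy_probabilities(G, beta=0.5)   # precise, near-deterministic choice
high_beta = policy_probabilities(G, beta=8.0)  # uncertain, flatter distribution

print([round(p, 3) for p in low_beta])
print([round(p, 3) for p in high_beta])
```

Under these assumptions, a larger β flattens the distribution over policies, so differences in expected free energy have less influence on choice, which matches the interpretation of β as decision uncertainty.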