2020 Oct 29;46(1):E74–E87. doi: 10.1503/jpn.200032

Table 1.

Markov decision process model of the approach-avoidance conflict task

Model variable	General definition	Model-specific specification
o_t	Observable outcomes at time t	Outcome modalities: (1) observed position on the runway (10 possible observations, including a “starting” position and the 9 final positions one could choose); (2) cues indicating trial type (5 possible observations, corresponding to the 5 trial types); (3) stimuli observed at the end of each trial (7 possible observations, corresponding to a “starting” observation, the positive stimulus with 0 or 2 points, and the negative affective stimulus with 0, 2, 4 or 6 points)
s_t	Hidden states at time t	Hidden state factors: (1) beliefs about position on the runway (10 possible belief states with an identity mapping to the observations in outcome modality 1); (2) beliefs about the trial type (5 possible belief states, corresponding to the 5 trial types)
π	A distribution over action policies encoding the expectation that a particular policy is most likely to generate preferred outcomes	Allowable policies included the decision to transition from the starting state to each of the 9 possible positions on the runway
β	The prior on expected policy precision (β) is the “rate” parameter of a gamma distribution, which is a standard prior for expected precision. Expected precision, in turn, modulates the influence of expected free energy on policy selection	When β is high (reflecting low confidence about the best decision), policy selection becomes less deterministic. Higher β values therefore encode greater decision uncertainty during the task (similar to the temperature parameter in a conventional softmax response function)
A matrix P(o_t | s_t)	A matrix encoding beliefs about the relationship between hidden states and observable outcomes (i.e., the likelihood that specific outcomes will be observed given specific hidden states)	Encodes beliefs about the relationship between position on the runway and the probability of observing each outcome, conditional on beliefs about the task condition
B matrix P(s_{t+1} | s_t)	A matrix encoding beliefs about how hidden states will evolve over time (transition probabilities)	Encodes beliefs about the way participants could choose to move the avatar, as well as the belief that the task condition will not change within a trial
C matrix ln P(o_t)	A matrix encoding the degree to which some observed outcomes are preferred over others (technically modelled as prior expectations over outcomes)	Encodes stronger positive preferences for receiving higher numbers of points, and negative preferences for the aversive stimuli (both relative to an anchor value of 0 for the “safe” positive stimulus). The emotional conflict (EC) parameter in our model encoded the value of participants’ preferences against observing the aversive stimuli
D matrix P(s_1)	A matrix encoding beliefs about (a probability distribution over) initial hidden states	The simulated agent always began in an initial starting state, and believed the task condition remained stable throughout each trial
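The role of β in Table 1 can be illustrated with a minimal Python sketch. It assumes, as a simplification not stated in the table, that expected precision γ equals the mean of a gamma distribution with shape 1 and rate β (so γ = 1/β), and that policy probabilities are a softmax of −γ·G, where G is each policy's expected free energy. The G values and the names `policy_probabilities` and `entropy` are hypothetical.

```python
import numpy as np


def policy_probabilities(G, beta):
    """Return P(pi) = softmax(-gamma * G) for a vector of expected free energies G.

    Assumption (hypothetical simplification): expected precision gamma is the mean
    of a gamma distribution with shape 1 and rate beta, i.e. gamma = 1 / beta.
    """
    gamma = 1.0 / beta
    logits = -gamma * np.asarray(G, dtype=float)
    logits -= logits.max()  # shift for numerical stability; the softmax is unchanged
    p = np.exp(logits)
    return p / p.sum()


def entropy(p):
    """Shannon entropy (nats) of a probability vector: higher means less deterministic."""
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())


# Hypothetical expected free energies for the 9 runway positions (lower = preferred)
G = np.array([4.0, 3.5, 3.0, 2.5, 2.0, 2.5, 3.0, 3.5, 4.0])

for beta in (0.5, 1.0, 4.0):
    p = policy_probabilities(G, beta)
    print(f"beta={beta}: P(best)={p.max():.3f}, entropy={entropy(p):.3f}")
```

Under these assumptions, raising β lowers γ and flattens the distribution: the most preferred position (here the middle of the runway) keeps the highest probability but loses mass to the alternatives, which is the temperature-like behaviour the table attributes to β.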
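The A, B, C and D rows of Table 1 can be sketched as NumPy arrays. This is a hedged illustration, not the authors' code: it covers only the position factor, the trial-type factor and the end-of-trial stimulus modality; it uses a B[to, from, action] column convention; it assumes a uniform prior over trial type; and it introduces a hypothetical `reward_sensitivity` weight alongside the EC parameter. The linear form of the preferences and the log-softmax normalization to ln P(o) are also assumptions.

```python
import numpy as np

N_POS = 10   # index 0 = starting position, indices 1..9 = final runway positions
N_TRIAL = 5  # number of trial types

# A (position modality): identity mapping from believed position to observed position
A_pos = np.eye(N_POS)

# B (position factor): one transition matrix per policy, indexed B[to, from, action].
# Policy `a` moves the avatar from any state to runway position a + 1.
B_pos = np.zeros((N_POS, N_POS, 9))
for a in range(9):
    B_pos[a + 1, :, a] = 1.0

# B (trial-type factor): identity, so the task condition cannot change within a trial
B_trial = np.eye(N_TRIAL)

# D: the agent always starts in the starting position; the uniform
# prior over trial type is an assumption (the cue reveals the trial type)
D_pos = np.zeros(N_POS)
D_pos[0] = 1.0
D_trial = np.full(N_TRIAL, 1.0 / N_TRIAL)

# Stimulus outcomes in order: start, positive+0, positive+2,
# negative+0, negative+2, negative+4, negative+6
POINTS = np.array([0, 0, 2, 0, 2, 4, 6], dtype=float)
AVERSIVE = np.array([0, 0, 0, 1, 1, 1, 1], dtype=float)


def stimulus_preferences(ec, reward_sensitivity):
    """Hypothetical C vector for the stimulus modality, returned as ln P(o).

    Raw preferences are anchored at 0 for the 'safe' positive stimulus (0 points):
    each point adds reward_sensitivity, and the aversive stimulus subtracts ec.
    """
    c = reward_sensitivity * POINTS - ec * AVERSIVE
    m = c.max()
    return c - (m + np.log(np.exp(c - m).sum()))  # numerically stable log-softmax
```

Because the log-softmax only subtracts a constant, differences between outcomes equal the raw preferences, so raising `ec` lowers every aversive outcome by the same amount relative to the safe positive stimulus.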