2021 Jun 3;11:11783. doi: 10.1038/s41598-021-91308-x

Table 2.

Markov decision process model of the AAC task.

Model variable	General definition	Model-specific specification
o_t	Observable outcomes at time t*	Outcome modalities: (1) Observed position on the runway (10 possible observations: a “starting” position and the nine final positions on the runway that could be chosen); (2) Cues indicating trial type (five possible observations, corresponding to the five trial types); (3) Stimuli observed at the end of each trial (seven possible observations: a “starting” observation, the positive stimulus with 0 or 2 points, and the negative affective stimulus with 0, 2, 4, or 6 points)
s_t	Beliefs about hidden states at time t	Hidden state factors: (1) Beliefs about position on the runway (10 possible belief states with an identity mapping to the observations in outcome modality 1); (2) Beliefs about the trial type (corresponding to the five trial types)
π	A distribution over action policies encoding the probability of choosing each policy	Allowable policies included the decision to transition from the starting state to each of the nine possible positions on the runway
β	The prior on expected policy precision ($β$) is the “rate” parameter of a gamma distribution, which is a standard distribution to use as a prior for expected precision ($γ$). This latter term modulates the influence of expected free energy on policy selection	When $β$ is high (reflecting low confidence about the best decision), policy selection becomes less deterministic. Higher $β$ values therefore encode participants’ decision uncertainty during the task (cf. the temperature parameter in a conventional softmax response function)
A matrix $P(o_{t} \mid s_{t})$	A matrix encoding beliefs about the relationship between hidden states and observable outcomes (i.e., the likelihood that specific outcomes will be observed given specific hidden states)	Encodes beliefs about the relationship between position on the runway and the probability of observing each outcome, conditional on beliefs about the task condition
B matrix $P(s_{t+1} \mid s_{t})$	A matrix encoding beliefs about how hidden states will evolve over time (transition probabilities)	Encodes beliefs about the way participants could choose to move the avatar, as well as the belief that the task condition will not change within a trial
C matrix $\ln P(o_{t})$	A matrix encoding the degree to which some observed outcomes are preferred over others (technically modeled as prior expectations over outcomes). The values for each column in this matrix are passed through a softmax function to generate a proper probability distribution, which is then log-transformed	Encodes stronger positive preferences for receiving higher amounts of points, and negative preferences for the aversive stimuli (both relative to an anchor value of 0 for the “safe” positive stimulus). The EC parameter in our model encodes the value of participants’ preferences against observing the aversive stimuli
D matrix $P(s_{1})$	A matrix encoding beliefs about (a probability distribution over) initial hidden states	The simulated agent always begins in an initial starting state, and believes each task condition is stable across each trial

*Note that t here refers to a timepoint in each trial about which participants have beliefs. Before a participant makes a choice (i.e., when still in the “start” state), they have prior beliefs about the state at time t = 2, and these beliefs are then updated after a subsequent observed outcome. In the active inference literature these beliefs about timepoints are often instead denoted with the Greek letter tau (τ) in order to distinguish them from the times (t) at which new observations are presented (for details, see⁵⁴).
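
As an illustration of two entries in Table 2, the following Python sketch shows (i) how a column of raw preference values can be passed through a softmax function and log-transformed, as described for the C matrix, and (ii) how $β$ could modulate policy selection. It is a minimal sketch under stated assumptions, not the fitting code used in the study: the raw preference values and the expected free energy values are hypothetical, the function names are our own, and $γ$ is simply held at the mean of a gamma prior with shape 1 and rate $β$ (i.e., $γ = 1/β$) rather than being updated during inference.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def log_preferences(raw_column):
    """One C-matrix column: softmax the raw preferences, then take logs."""
    return [math.log(p) for p in softmax(raw_column)]


def policy_distribution(expected_free_energy, beta):
    """Policy probabilities as softmax(-gamma * G).

    Assumption: gamma is fixed at 1 / beta, the mean of a gamma prior
    with shape 1 and rate beta, instead of being inferred.
    """
    if beta <= 0:
        raise ValueError("beta must be positive")
    gamma = 1.0 / beta
    return softmax([-gamma * g for g in expected_free_energy])


# Hypothetical raw preferences over the seven end-of-trial observations:
# start, positive (0 or 2 points), negative (0, 2, 4, or 6 points).
raw_c = [0.0, 0.0, 2.0, -3.0, -1.0, 1.0, 3.0]
log_c = log_preferences(raw_c)

# Hypothetical expected free energy for the nine runway positions
# (lower values mark better policies).
g = [4.0, 3.5, 3.0, 2.5, 2.0, 2.5, 3.0, 3.5, 4.0]
confident = policy_distribution(g, beta=0.5)  # low beta, high precision
uncertain = policy_distribution(g, beta=4.0)  # high beta, low precision

print(max(confident), max(uncertain))
```

Under these assumptions, the larger $β$ value yields a flatter distribution over the nine policies, which corresponds to the less deterministic policy selection described in the table.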