Journal of Mathematical Psychology. 2020 Dec;99:102447. doi: 10.1016/j.jmp.2020.102447

Active inference on discrete state-spaces: A synthesis

Lancelot Da Costa a,b, Thomas Parr b, Noor Sajid b, Sebastijan Veselic b, Victorita Neacsu b, Karl Friston b
PMCID: PMC7732703  PMID: 33343039

Abstract

Active inference is a normative principle underwriting perception, action, planning, decision-making and learning in biological or artificial agents. From its inception, its associated process theory has grown to incorporate complex generative models, enabling simulation of a wide range of complex behaviours. Due to successive developments in active inference, it is often difficult to see how its underlying principle relates to process theories and practical implementation. In this paper, we try to bridge this gap by providing a complete mathematical synthesis of active inference on discrete state-space models. This technical summary provides an overview of the theory, derives neuronal dynamics from first principles and relates these dynamics to biological processes. Furthermore, this paper provides a fundamental building block needed to understand active inference for mixed generative models, which allow continuous sensations to inform discrete representations. This paper may be used as follows: as a guide to research on outstanding challenges, as a practical guide on how to implement active inference to simulate experimental behaviour, or as a pointer towards various in-silico neurophysiological responses that may be used to make empirical predictions.

Keywords: Active inference, Free energy principle, Process theory, Variational Bayesian inference, Markov decision process, Mathematical review


Highlights

  • We review active inference on discrete state-spaces, a framework thought to underwrite perception, action, planning, decision-making and learning in biological and artificial agents.

  • We derive the associated process theory and discuss its biological plausibility.

  • We discuss outstanding challenges for the theory, its implementation and empirical validation.

1. Introduction

Active inference is a normative principle underlying perception, action, planning, decision-making and learning in biological or artificial agents; it inherits from the free energy principle, a theory of self-organisation in the neurosciences (Buckley et al., 2017, Friston, 2019, Friston et al., 2006). Active inference postulates that these processes may all be seen as optimising two complementary objective functions; namely, a variational free energy, which measures the fit between an internal model and past sensory observations, and an expected free energy, which scores possible future courses of action in relation to prior preferences.

Active inference has been employed to simulate a wide range of complex behaviours in neuropsychology and machine learning, including planning and navigation (Kaplan & Friston, 2018a), reading (Friston et al., 2018b), curiosity and abstract rule learning (Friston, Lin et al., 2017), substance use disorder (Smith, Schwartenbeck et al., 2020), approach avoidance conflict (Smith, Kirlic et al., 2020), saccadic eye movements (Parr & Friston, 2018a), visual foraging (Mirza et al., 2016, Parr and Friston, 2017a), visual neglect (Parr & Friston, 2018c), hallucinations (Adams et al., 2013), niche construction (Bruineberg et al., 2018, Constant et al., 2018), social conformity (Constant et al., 2019), impulsivity (Mirza et al., 2019), image recognition (Millidge, 2019), and the mountain car problem (Çatal et al., 2019, Friston, Adams et al., 2012, Friston et al., 2009). The key idea that underwrites these simulations is that creatures use an internal forward (generative) model to predict their sensory input, which they use to infer the causes of these data. In addition to simulating behaviour, active inference can be used to answer questions about an individual’s psychological processes, by comparing the evidence for different mechanistic hypotheses in relation to behavioural data.

Active inference is very generic and allows one to view different models of behaviour in the same light. For example, a drift diffusion model can now be seen in relation to predictive coding, as both can be interpreted as minimising free energy through a process of evidence accumulation (Bogacz, 2017, Buckley et al., 2017, Friston and Kiebel, 2009). Similarly, a dynamic programming model of choice behaviour corresponds to minimising expected free energy under the prior preference of maximising reward (Da Costa et al., 2020). In being generic, active inference is not meant to replace any existing model; rather, it should be used as a tool to uncover the commitments and assumptions of more specific models.

Early formulations of active inference employed generative models expressed in continuous space and time (for an introduction see Bogacz, 2017, for a review see Buckley et al., 2017), with behaviour modelled as a continuously evolving random dynamical system. However, we know that some processes in the brain conform better to discrete, hierarchical representations than to continuous representations (e.g., visual working memory (Luck and Vogel, 1997, Zhang and Luck, 2008), state estimation via place cells (Eichenbaum et al., 1999, O’Keefe and Dostrovsky, 1971), language, etc.). Reflecting this, many of the paradigms studied in neuroscience are naturally framed as discrete state-space problems. Decision-making tasks are a prime candidate for this, as they often entail a series of discrete alternatives that an agent needs to choose among (e.g., multi-armed bandit tasks (Daw et al., 2006, Reverdy et al., 2013, Wu et al., 2018), multi-step decision tasks (Daw et al., 2011)). This explains why – in active inference – agent behaviour is often modelled using a discrete state-space formulation, the particular applications of which are summarised in Table 1. More recently, mixed generative models (Friston, Parr et al., 2017) – combining discrete and continuous states – have been used to model behaviour involving discrete and continuous representations (e.g., decision-making and movement (Parr & Friston, 2018d), speech production and recognition (Friston, Sajid et al., 2020), pharmacologically induced changes in eye-movement control (Parr & Friston, 2019) or reading; involving continuous visual sampling informing inferences about discrete semantics (Friston, Parr et al., 2017)).

Table 1.

Applications of active inference (discrete state-space).

Application Description References
Decision-making under uncertainty Initial formulation of active inference on partially observable Markov decision processes. Friston, Samothrakis et al. (2012)
Optimal control Application of KL or risk sensitive control in an engineering benchmark—the mountain car problem. Çatal et al. (2019) and Friston, Adams et al. (2012)
Evidence accumulation Illustrating the role of evidence accumulation in decision-making through an urns task. FitzGerald, Moran et al. (2015) and FitzGerald, Schwartenbeck et al. (2015)
Psychopathology Simulation of addictive choice behaviour. Schwartenbeck, FitzGerald, Mathys, Dolan, Wurst et al. (2015)
Dopamine The precision of beliefs about policies provides a plausible description of dopaminergic discharges. Friston et al. (2014) and FitzGerald, Dolan et al. (2015)
Functional magnetic resonance imaging Empirical prediction and validation of dopaminergic discharges. Schwartenbeck, FitzGerald, Mathys, Dolan and Friston (2015)
Maximal utility theory Evidence in favour of surprise minimisation as opposed to utility maximisation in human decision-making. Schwartenbeck, FitzGerald, Mathys, Dolan, Kronbichler et al. (2015)
Social cognition Examining the effect of prior preferences on interpersonal inference. Moutoussis et al. (2014)
Exploration–exploitation dilemma Casting behaviour as expected free energy minimising accounts for epistemic and pragmatic choices. Friston et al. (2015)
Habit learning and action selection Formulating learning as an inferential process and action selection as Bayesian model averaging. Friston et al. (2016) and FitzGerald et al. (2014)
Scene construction and anatomy of time Mean-field approximation for multi-factorial hidden states, enabling high dimensional representations of the environment. Friston and Buzsáki (2016) and Mirza et al. (2016)
Electrophysiological responses Synthesising various in-silico neurophysiological responses via a gradient descent on free energy. E.g., place-cell activity, mismatch negativity, phase-precession, theta sequences, theta–gamma coupling and dopaminergic discharges. Friston, FitzGerald et al. (2017)
Structure learning, curiosity and insight Simulation of artificial curiosity and abstract rule learning. Structure learning via Bayesian model reduction. Friston, Lin et al. (2017)
Hierarchical temporal representations Generalisation to hierarchical generative models with deep temporal structure and simulation of reading. Friston et al. (2018b) and Parr and Friston (2017b)
Computational neuropsychology Simulation of visual neglect, hallucinations, and prefrontal syndromes under alternative pathological priors. Benrimoh et al., 2018, Parr, Benrimoh et al., 2018, Parr and Friston, 2018c, Parr, Rees et al., 2018 and Parr, Rikhye et al. (2019)
Neuromodulation Use of precision parameters to manipulate exploration during saccadic searches; associating uncertainty with cholinergic and noradrenergic systems. Parr and Friston, 2017a, Parr and Friston, 2019, Sales et al., 2018 and Vincent et al. (2019)
Decisions to movements Mixed generative models combining discrete and continuous states to implement decisions through movement. Friston, Parr et al. (2017) and Parr and Friston (2018d)
Planning, navigation and niche construction Agent induced changes in environment (generative process); decomposition of goals into subgoals. Bruineberg et al., 2018, Constant et al., 2018 and Kaplan and Friston (2018a)
Atari games Active inference compares favourably to reinforcement learning in the game of Doom. Cullen et al. (2018)
Machine learning Scaling active inference to more complex machine learning problems. Tschantz et al. (2019)

Due to the pace of recent theoretical advances in active inference, it is often difficult to retain a comprehensive overview of its process theory and practical implementation. In this paper, we hope to provide a comprehensive (mathematical) synthesis of active inference on discrete state-space models. This technical summary provides an overview of the theory, derives the associated (neuronal) dynamics from first principles and relates these to known biological processes. Furthermore, this paper and Buckley et al. (2017) provide the building blocks necessary to understand active inference on mixed generative models. This paper can be read as a practical guide on how to implement active inference for simulating experimental behaviour, or a pointer towards various in-silico neuro- and electro-physiological responses that can be tested empirically.

This paper is structured as follows. Section 2 is a high-level overview of active inference. The following sections elucidate the formulation by deriving the entire process theory from first principles; incorporating perception, planning and decision-making. This formalises the action–perception cycle: (1) an agent is presented with a stimulus, (2) it infers its latent causes, (3) plans into the future and (4) realises its preferred course of action; and repeat. This enactive cycle allows us to explore the dynamics of synaptic plasticity, which mediate learning of the contingencies of the world at slower timescales. We conclude in Section 9 with an overview of structure learning in active inference.

2. Active inference

To survive in a changing environment, biological (and artificial) agents must maintain their sensations within a certain hospitable range (i.e., maintaining homeostasis through allostasis). In brief, active inference proposes that agents achieve this by optimising two complementary objective functions, a variational free energy and an expected free energy. In short, the former measures the fit between an internal (generative) model of its sensations and sensory observations, while the latter scores each possible course of action in terms of its ability to reach the range of “preferred” states of being.

Our first premise is that agents represent the world through an internal model. Through minimisation of variational free energy, this model becomes a good model of the environment. In other words, this probabilistic model and the probabilistic beliefs that it encodes are continuously updated to mirror the environment and its dynamics. Such a world model is considered to be generative; in that it is able to generate predictions about sensations (e.g., during planning or dreaming), given beliefs about future states of being. If an agent senses a heat source (e.g., another agent) via some temperature receptors, the sensation of warmth represents an observed outcome and the temperature of the heat source a hidden state; minimisation of variational free energy then ensures that beliefs about hidden states closely match the true temperature. Formally, the generative model is a joint probability distribution over possible hidden states and sensory consequences – that specifies how the former cause the latter – and minimisation of variational free energy enables the agent to “invert” the model; i.e., determine the most likely hidden states given sensations. The variational free energy is the negative evidence lower bound that is optimised in variational Bayes in machine learning (Bishop, 2006, Xitong, 2017). Technically – by minimising variational free energy – agents perform approximate Bayesian inference (Sengupta and Friston, 2016, Sengupta et al., 2016), which enables them to infer the causes of their sensations (e.g., perception). This is the point of contact between active inference and the Bayesian brain (Aitchison and Lengyel, 2017, Friston, 2012, Knill and Pouget, 2004). Crucially, agents may incorporate an optimism bias (McKay and Dennett, 2009, Sharot, 2011) in their model; thereby scoring certain “preferred” sensations as more likely. This lends a higher plausibility to those courses of action that realise these sensations. In other words, a preference is simply something an agent (believes it) is likely to work towards.

To maintain homeostasis, and ensure survival, agents must minimise surprise. Since the generative model scores preferred outcomes as more likely, minimising surprise corresponds to maximising model evidence. In active inference, this is assured by the aforementioned processes; indeed, the variational free energy turns out to be an upper bound on surprise, and minimising expected free energy ensures preferred outcomes are realised, thereby avoiding surprise on average.

Active inference can thus be framed as the minimisation of surprise (Friston, 2009, Friston, 2010, Friston et al., 2006, Friston and Stephan, 2007) by perception and action. In discrete state models – of the sort discussed here – this means agents select from different possible courses of action (i.e., policies) in order to realise their preferences and thus minimise the surprise that they expect to encounter in the future. This enables a Bayesian formulation of the perception–action cycle (Fuster, 1990): agents perceive the world by minimising variational free energy, ensuring their model is consistent with past observations, and act by minimising expected free energy, to make future sensations consistent with their model. This account of behaviour can be concisely framed as self-evidencing (Hohwy, 2016).

In contrast to other normative models of behaviour, active inference is a ‘first principle’ account, which is grounded in statistical physics (Friston, 2019, Parr et al., 2020). Active inference describes the dynamics of systems that persist (i.e., do not dissipate) during some timescale of interest, and that can be statistically segregated from their environment—conditions which are satisfied by biological systems. Mathematically, the first condition means that the system is at non-equilibrium steady-state (NESS). This implies the existence of a steady-state probability density to which the system self-organises and returns after perturbation (i.e., the agent’s preferences). The statistical segregation condition is the presence of a Markov blanket (c.f., Fig. 1) (Kirchhoff et al., 2018, Pearl, 1998): a set of variables through which states internal and external to the system interact (e.g., the skin is a Markov blanket for the human body). Under these assumptions it can be shown that the states internal to the system parameterise Bayesian beliefs about external states, and that their dynamics can be cast as a process of variational free energy minimisation (Friston, 2019, Parr et al., 2020). This coincides with existing approaches to approximate inference (Beal, 2003, Bishop, 2006, Blei et al., 2017, Jordan et al., 1998). Furthermore, it can be shown that the most likely courses of action taken by those systems are those which minimise expected free energy (or a variant thereof, see Appendix C)—a quantity that subsumes many existing constructs in science and engineering (see Section 7).

Fig. 1.


Markov blankets in active inference. This figure illustrates the Markov blanket assumption of active inference. A Markov blanket is a set of variables through which states internal and external to the system interact. Specifically, the system must be such that we can partition it into a Bayesian network of internal states μ, external states η, sensory states o and active states u, (μ, o and u are often referred together as particular states) with probabilistic (causal) links in the directions specified by the arrows. All interactions between internal and external states are therefore mediated by the blanket states b. The sensory states represent the sensory information that the body receives from the environment and the active states express how the body influences the environment. This blanket assumption is quite generic, in that it can be reasonably assumed for a brain as well as elementary organisms. For example, when considering a bacillus, the sensory states become the cell membrane and the active states comprise the actin filaments of the cytoskeleton. Under the Markov blanket assumption – together with the assumption that the system persists over time (i.e., possesses a non-equilibrium steady state) – a generalised synchrony appears, such that the dynamics of the internal states can be cast as performing inference over the external states (and vice versa) via a minimisation of variational free energy (Friston, 2019, Parr et al., 2020). This coincides with existing approaches to inference; i.e., variational Bayes (Beal, 2003, Bishop, 2006, Blei et al., 2017, Jordan et al., 1998). This can be viewed as the internal states mirroring external states, via sensory states (e.g., perception), and external states mirroring internal states via active states (e.g., a generalised form of self-assembly, autopoiesis or niche construction). Furthermore, under these assumptions the most likely courses of actions can be shown to minimise expected free energy. Note that external states beyond the system should not be confused with the hidden states of the agent’s generative model (which model external states). In fact, the internal states are exactly the parameters (i.e., sufficient statistics) encoding beliefs about hidden states and other latent variables, which model external states in a process of variational free energy minimisation. Hidden and external states may or may not be isomorphic. In other words, an agent uses its internal states to represent hidden states that may or may not exist in the external world.

By subscribing to the above assumptions, it is possible to describe the behaviour of viable living systems as performing active inference—the remaining challenge is to determine the computational and physiological processes that they implement to do so. This paper aims to summarise possible answers to this question, by reviewing the technical details of a process theory for active inference on discrete state-space generative models, first presented in Friston, FitzGerald et al. (2017). Note that it is important to distinguish active inference as a principle (presented above) from active inference as a process theory. The former is a consequence of fundamental assumptions about living systems, while the latter is a hypothesis concerning the computational and biological processes in the brain that might implement active inference. The ensuing process theory can then be used to predict plausible neuronal dynamics and electrophysiological responses that are elicited experimentally.

3. Discrete state-space generative models

The generative model (Bishop, 2006) expresses how the agent represents the world. This is a joint probability distribution over sensory data and the hidden (or latent) causes of these data. The sorts of discrete state-space generative models used in active inference are specifically suited to represent discrete time series and decision-making tasks. These can be expressed as variants of partially observable Markov decision processes (POMDPs; Aström, 1965): from simple Markov decision processes (Barto and Sutton, 1992, Stone, 2019, White, 2001) to generalisations in the form of deep probabilistic (hierarchical) models (Allenby et al., 2005, Box and Tiao, 1965, Friston et al., 2018b). For clarity, the process theory is derived for the simplest model that facilitates understanding of subsequent generalisations; namely, a POMDP where the agent holds beliefs about the probability of the initial state (specified as D), the transition probabilities from one state to the next (defined as matrix B) and the probability of outcomes given states (i.e., the likelihood matrix A); see Fig. 2.
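To fix notation before unpacking the theory, the following sketch instantiates a generative model of this form in Python with numpy. The two-state, two-outcome, two-action environment and all numerical values are hypothetical, chosen only to illustrate the shapes and normalisation of the likelihood A, the transitions B and the initial-state prior D.

```python
import numpy as np

# A toy POMDP generative model of the form described above (values are illustrative only).
num_states, num_outcomes, num_actions = 2, 2, 2

# Likelihood matrix A: column s holds P(outcome | state = s).
A = np.array([[0.9, 0.2],
              [0.1, 0.8]])

# Transition matrices B[u]: column s holds P(next state | state = s, action = u).
B = np.array([[[1.0, 0.0],   # action 0: stay put
               [0.0, 1.0]],
              [[0.0, 1.0],   # action 1: switch state
               [1.0, 0.0]]])

# Prior over the initial state, D.
D = np.array([0.5, 0.5])

# Every column of A and B[u], and the vector D, is a categorical distribution.
assert np.allclose(A.sum(axis=0), 1.0)
assert np.allclose(B.sum(axis=1), 1.0)
assert np.isclose(D.sum(), 1.0)
```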

Fig. 2.


Example of a discrete state-space generative model. Panel 2a specifies the form of the generative model, which is how the agent represents the world. The generative model is a joint probability distribution over (hidden) states, outcomes and other variables that cause outcomes. In this representation, states unfold in time causing an observation at each time-step. The likelihood matrix A encodes the probabilities of state–outcome pairs. The policy π specifies which action to perform at each time-step. Note that the agent’s preferences may be specified either in terms of states or outcomes. It is important to distinguish between states (resp. outcomes) that are random variables, and the possible values that they can take in S (resp. in O), which we refer to as possible states (resp. possible outcomes). Note that this type of representation comprises a finite number of timesteps, actions, policies, states, outcomes, possible states and possible outcomes. In Panel 2b, the generative model is displayed as a probabilistic graphical model (Bishop, 2006, Jordan et al., 1998, Pearl, 1988, Pearl, 1998) expressed in factor graph form (Loeliger, 2004). The variables in circles are random variables, while squares represent factors, whose specific forms are given in Panel 2a. The arrows represent causal relationships (i.e., conditional probability distributions). The variables highlighted in grey can be observed by the agent, while the remaining variables are inferred through approximate Bayesian inference (see Section 4) and called hidden or latent variables. Active inference agents perform inference by optimising the parameters of an approximate posterior distribution (see Section 4). Panel 2c specifies how this approximate posterior factorises under a particular mean-field approximation (Tanaka, 1999), although other factorisations may be used (Parr, Markovic et al., 2019, Schwöbel et al., 2018). A glossary of terms used in this figure is available in Table 2. The mathematical yoga of generative models is heavily dependent on Markov blankets. The Markov blanket of a random variable in a probabilistic graphical model comprises those variables that share a common factor. Crucially, a variable conditioned upon its Markov blanket is conditionally independent of all other variables. We will use this property extensively (and implicitly) in the text.

As mentioned above, a substantial body of work justifies describing certain neuronal representations with discrete state-space generative models (e.g., Luck and Vogel, 1997, Tee and Taylor, 2018, Zhang and Luck, 2008). Furthermore, it has long been known that – at the level of neuronal populations – computations occur periodically (i.e., in distinct and sometimes nested oscillatory bands). Similarly, there is evidence for sequential computation in a number of processes (e.g., attention Buschman and Miller, 2010, Duncan et al., 1994, Landau and Fries, 2012, visual perception Hanslmayr et al., 2013, Rolls and Tovee, 1994) and at different levels of the neuronal hierarchy (Friston, 2008, Friston et al., 2018b), in line with ideas from hierarchical predictive processing (Chao et al., 2018, Iglesias et al., 2013). This accommodates the fact that visual saccadic sampling of observations occurs at a frequency of approximately 4 Hz (Parr & Friston, 2018d). The relatively slow presentation of a discrete sequence of observations enables inferences to be performed in peristimulus time by (much) faster neuronal dynamics.

Active inference, implicitly, accounts for fast and slow neuronal dynamics. At each time-step the agent observes an outcome, from which it infers the past, present and future (hidden) states through perception. This underwrites a plan into the future, by evaluating (the expected free energy of) possible policies. The inferred (best) policies specify the most likely action, which is executed. At a slower timescale, parameters encoding the contingencies of the world (e.g., A) are inferred. This is referred to as learning. Even more slowly, the structure of the generative model is updated to better account for available observations—this is called structure learning. The following sections elucidate these aspects of the active inference process theory.

This paper will be largely concerned with deriving and interpreting the inferential dynamics that agents might implement using the generative model in Fig. 2. We leave the discussion of more complex models to Appendix A, since the derivations are analogous in those cases.

4. Variational Bayesian inference

4.1. Free energy and model evidence

Variational Bayesian inference rests upon minimisation of a quantity called (variational) free energy, which bounds the improbability (i.e., the surprise) of sensory observations, under a generative model. Simultaneously, free energy minimisation is a statistical inference technique that enables the approximation of the posterior distribution in Bayes rule. In machine learning, this is known as variational Bayes (Beal, 2003, Bishop, 2006, Blei et al., 2017, Jordan et al., 1998). Active inference agents minimise variational free energy, enabling concomitant maximisation of their model evidence and inference of the latent variables of their generative model. In the following, we consider a particular time point t ∈ {1, …, T} to be given, at which the agent has observed a sequence of outcomes o_{1:t}. The posterior about the latent causes of sensory data is given by Bayes rule:

$$
P(s_{1:T},A,\pi \mid o_{1:t}) = \frac{P(o_{1:t}\mid s_{1:T},A,\pi)\,P(s_{1:T},A,\pi)}{P(o_{1:t})}
\tag{1}
$$

Note the policy π is a random variable. This entails planning as inferring the best action sequence from observations (Attias, 2003, Botvinick and Toussaint, 2012). Computing the posterior distribution requires computing the model evidence P(o_{1:t}) = Σ_{π∈Π} Σ_{s_{1:T}∈S^T} ∫ P(o_{1:t}, s_{1:T}, A, π) dA, which is intractable for complex generative models embodied by biological and artificial systems (Friston, 2008)—a well-known problem in Bayesian statistics. An alternative to computing the exact posterior distribution is to optimise an approximate posterior distribution over latent causes Q(s_{1:T}, A, π), by minimising the Kullback–Leibler (KL) divergence (Kullback & Leibler, 1951) D_KL—a non-negative measure of discrepancy between probability distributions. We can use the definition of the KL divergence and Bayes rule to arrive at the variational free energy F, which is a functional of approximate posterior beliefs:

$$
\begin{aligned}
0 &\leq D_{KL}\big[Q(s_{1:T},A,\pi)\,\|\,P(s_{1:T},A,\pi\mid o_{1:t})\big] \\
&= \mathbb{E}_{Q(s_{1:T},A,\pi)}\big[\log Q(s_{1:T},A,\pi) - \log P(s_{1:T},A,\pi\mid o_{1:t})\big] \\
&= \mathbb{E}_{Q(s_{1:T},A,\pi)}\big[\log Q(s_{1:T},A,\pi) - \log P(o_{1:t},s_{1:T},A,\pi) + \log P(o_{1:t})\big] \\
&= \underbrace{\mathbb{E}_{Q(s_{1:T},A,\pi)}\big[\log Q(s_{1:T},A,\pi) - \log P(o_{1:t},s_{1:T},A,\pi)\big]}_{=:\,F[Q(s_{1:T},A,\pi)]} + \log P(o_{1:t}) \\
\Rightarrow\quad -\log P(o_{1:t}) &\leq F[Q(s_{1:T},A,\pi)]
\end{aligned}
\tag{2}
$$

From (2), one can see that varying Q to minimise the variational free energy enables us to approximate the true posterior, while simultaneously ensuring that surprise remains low. The former offers the intuitive interpretation of the free energy as a generalised prediction error, as minimising free energy corresponds to suppressing the discrepancy between predictions, i.e., Q, and the actual state of affairs, i.e., the posterior; indeed, for a particular class of generative models, we recover the prediction error given by predictive coding schemes (see Bogacz, 2017, Buckley et al., 2017, Friston et al., 2007). Altogether, this means that variational free energy minimising agents, simultaneously, infer the latent causes of their observations and maximise the evidence for their generative model. One should note that the free energy equals the surprise −log P(o_{1:t}) only at the global free energy minimum, when the approximate posterior Q(s_{1:T}, A, π) equals the true posterior P(s_{1:T}, A, π | o_{1:t}). Outside of the global free energy minimum, the free energy upper bounds the surprise; in that case, since the true posterior is generally intractable, the tightness of the bound is generally unknowable.
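As a minimal numerical illustration of (2), the following sketch considers a hypothetical single time step with two hidden states (so the exact posterior can be computed by enumeration, with no policies or parameters) and checks that the free energy upper bounds the surprise, with equality at the true posterior; all values are illustrative.

```python
import numpy as np

# Single-time-step toy model: prior D over two states, likelihood A, one observed outcome.
A = np.array([[0.9, 0.2],
              [0.1, 0.8]])
D = np.array([0.5, 0.5])
o = 0                                    # index of the observed outcome

joint = A[o, :] * D                      # P(o, s) for each state s
evidence = joint.sum()                   # P(o)
posterior = joint / evidence             # P(s | o), exact by enumeration
surprise = -np.log(evidence)             # -log P(o)

def free_energy(q):
    # F[Q] = E_Q[log Q(s) - log P(o, s)], as in Eq. (2)
    return np.sum(q * (np.log(q) - np.log(joint)))

print(free_energy(np.array([0.5, 0.5])) >= surprise)   # True: F upper bounds surprise
print(np.isclose(free_energy(posterior), surprise))    # True: equality at the true posterior
```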

To aid intuition, the variational free energy can be rearranged into complexity and accuracy:

$$
F[Q(s_{1:T},A,\pi)] = \underbrace{D_{KL}\big[Q(s_{1:T},A,\pi)\,\|\,P(s_{1:T},A,\pi)\big]}_{\text{Complexity}} - \underbrace{\mathbb{E}_{Q(s_{1:T},A,\pi)}\big[\log P(o_{1:t}\mid s_{1:T},A,\pi)\big]}_{\text{Accuracy}}
\tag{3}
$$

The first term of (3) can be regarded as complexity: a simple explanation for observable data, Q, which makes few assumptions over and above the prior (i.e., with a KL divergence close to zero), is a good explanation. In other words, a good explanation is an accurate account of some data that requires minimal movement in updating from prior to posterior beliefs (c.f., Occam’s principle). The second term is accuracy; namely, the probability of the data given posterior beliefs about model parameters Q. In other words, it scores how well the generative model fits the observed data. The idea that neural representations weigh complexity against accuracy underwrites the imperative to find the most accurate explanation for sensory observations that is minimally complex, which has been leveraged by things like Horace Barlow’s principle of minimum redundancy (Barlow, 2001) and subsequently supported empirically (Dan et al., 1996, Lewicki, 2002, Olshausen and Field, 2004, Olshausen and O’Connor, 2002). Fig. 3 illustrates the various implications of minimising free energy.

Fig. 3.


Markov blankets and self-evidencing. This schematic illustrates the various interpretations of minimising variational free energy. Recall that the existence of a Markov blanket implies a certain lack of influences among internal, blanket and external states. These independencies have an important consequence; internal and active states are the only states that are not influenced by external states, which means their dynamics (i.e., perception and action) are a function of, and only of, particular states (i.e., internal, sensory and active states); here, the variational (free energy) bound on surprise. This surprise has a number of interesting interpretations. Given it is the negative log probability of finding a particle or creature in a particular state, minimising surprise corresponds to maximising the value of a particle’s state. This interpretation is licensed by the fact that the states with a high probability are, by definition, attracting states. On this view, one can then spin off an interpretation in terms of reinforcement learning (Barto & Sutton, 1992), optimal control theory (Todorov & Jordan, 2002) and, in economics, expected utility theory (Bossaerts & Murawski, 2015). Indeed, any scheme predicated on the optimisation of some objective function can now be cast in terms of minimising surprise – in terms of perception and action (i.e., the dynamics of internal and active states) – by specifying these optimal values to be the agent’s preferences. The minimisation of surprise (i.e., self-information) leads to a series of influential accounts of neuronal dynamics; including the principle of maximum mutual information (Linsker, 1990, Optican and Richmond, 1987), the principles of minimum redundancy and maximum efficiency (Barlow, 1961) and the free energy principle (Friston et al., 2006). Crucially, the average or expected surprise (over time or particular states of being) corresponds to entropy. This means that action and perception look as if they are minimising entropy. This leads us to theories of self-organisation, such as synergetics in physics (Haken, 1978, Kauffman, 1993, Nicolis and Prigogine, 1977) or homeostasis in physiology (Ashby, 1947, Bernard, 1974, Conant and Ashby, 1970). Finally, the probability of any blanket states given a Markov blanket (m) is, on a statistical view, model evidence (MacKay, 1995, MacKay, 2003). This means that all the above formulations are internally consistent with things like the Bayesian brain hypothesis, evidence accumulation and predictive coding; most of which inherit from Helmholtz’s notion of unconscious inference (von Helmholtz & Southall, 1962), later unpacked in terms of perception as hypothesis testing in 20th century psychology (Gregory, 1980) and machine learning (Dayan et al., 1995).

4.2. On the family of approximate posteriors

The goal is now to minimise variational free energy with respect to Q. To obtain a tractable expression for the variational free energy, we need to assume a certain simplifying factorisation of the approximate posterior. There are many possible forms (e.g., mean-field, marginal, Bethe, see Heskes, 2006, Parr, Markovic et al., 2019, Yedidia et al., 2005), each of which trades off the quality of the inferences with the complexity of the computations involved. For the purpose of this paper we use a particular structured mean-field approximation (see Table 2 for an explanation of the different distributions and variables in play):

$$
\begin{aligned}
Q(s_{1:T},A,\pi) &= Q(A)\,Q(\pi)\prod_{\tau=1}^T Q(s_\tau\mid\pi) \\
Q(s_\tau\mid\pi) &= \mathrm{Cat}(\mathbf{s}_{\pi\tau}), \quad \mathbf{s}_{\pi\tau}\in\{x\in\mathbb{R}^m \mid x_i>0,\ \textstyle\sum_i x_i=1\} \\
Q(\pi) &= \mathrm{Cat}(\boldsymbol{\pi}), \quad \boldsymbol{\pi}\in\{x\in\mathbb{R}^{|\Pi|} \mid x_i>0,\ \textstyle\sum_i x_i=1\} \\
Q(A) &= \prod_{i=1}^m Q(A_i), \quad Q(A_i)=\mathrm{Dir}(\mathbf{a}_i), \quad \mathbf{a}_i\in(\mathbb{R}_{>0})^n
\end{aligned}
\tag{4}
$$

Table 2.

Glossary of terms and notation.

Notation Meaning Type
S Set of all possible (hidden) states. Finite set of cardinality m > 0.
s_τ (Hidden) state at time τ. In computations, if s_τ evaluates to the ith possible state, then interpret it as the ith unit vector in R^m. Random variable over S.
s_{1:t} Sequence of hidden states s_1, …, s_t. Random variable over S × ⋯ × S (t times) = S^t.
O Set of all possible outcomes. Finite set of cardinality n > 0.
o_τ Outcome at time τ. In computations, if o_τ evaluates to the jth possible outcome, then interpret it as the jth unit vector in R^n. Random variable over O.
o_{1:t} Sequence of outcomes o_1, …, o_t. Random variable over O × ⋯ × O (t times) = O^t.
T Number of timesteps in a trial of observation epochs under the generative model. Positive integer.
U Set of all possible actions. Finite set.
Π Set of all allowable policies; i.e., action sequences indexed in time. Finite subset of U × ⋯ × U (T times) = U^T.
π Policy or action sequence indexed in time. Random variable over Π, or element of Π, depending on context.
Q Approximate posterior distribution over the latent variables of the generative model s_{1:T}, A, π. Scalar-valued probability distribution over S^T × {x ∈ R^n : x_i > 0, Σ_i x_i = 1}^m × Π.
F, F_π Variational free energy and variational free energy conditioned upon a policy. Functionals of Q that evaluate to a scalar quantity.
G Expected free energy. Function defined on Π that evaluates to a scalar quantity.
Cat Categorical distribution; probability distribution over a finite set assigning strictly positive probabilities. Probability distribution over a finite set of cardinality k, parameterised by a real-valued vector of probabilities in {x ∈ R^k : x_i > 0, Σ_i x_i = 1}.
Dir Dirichlet distribution (conjugate prior of the categorical distribution). Probability distribution over the parameter space of the categorical distribution, parameterised by a vector of positive reals. Probability distribution over {x ∈ R^k : x_i > 0, Σ_i x_i = 1}, itself parameterised by an element of (R_{>0})^k.
X_i, X_{ki} ith column and (k,i)th element of matrix X. Matrix indexing convention.
·, ⊗, ⊙ Respectively inner product, Kronecker product and element-wise product; element-wise powers are written with ⊙ in the exponent. Following existing active inference literature, we adopt the convention X · Y := X^T Y for matrices. Operations on vectors and matrices.
A Likelihood matrix. The probability of the state–outcome pair (o_τ, s_τ), namely P(o_τ | s_τ, A), is given by o_τ · A s_τ. Random variable over the subset of M_{n×m}(R) with columns in {x ∈ R^n : x_i > 0, Σ_i x_i = 1}.
B_{π_{τ-1}} Matrix of transition probabilities from one state to the next, given action π_{τ-1}. The probability of possible state s_τ, given s_{τ-1} and action π_{τ-1}, is s_τ · B_{π_{τ-1}} s_{τ-1}. Matrix in M_{m×m}(R) with columns in {x ∈ R^m : x_i > 0, Σ_i x_i = 1}.
D Vector of probabilities of the initial state. The probability of the ith possible state occurring at time 1 is D_i. Vector of probabilities in {x ∈ R^m : x_i > 0, Σ_i x_i = 1}.
a, 𝐚 Parameters of prior and approximate posterior beliefs about A, respectively. Matrices in M_{n×m}(R_{>0}).
a_0, 𝐚_0 Matrices of the same size as a, 𝐚, with homogeneous columns; the elements of the ith column are denoted a_{i0}, 𝐚_{i0} and defined by a_{i0} = Σ_{j=1}^n a_{ji}, 𝐚_{i0} = Σ_{j=1}^n 𝐚_{ji}. Matrices in M_{n×m}(R_{>0}).
log, Γ, ψ Natural logarithm, gamma function and digamma function. By convention these functions are taken component-wise on vectors and matrices. Functions.
E_{P(X)}[f(X)] Expectation of a random variable f(X) under a probability distribution P(X), taken component-wise if f(X) is a matrix. E_{P(X)}[f(X)] := ∫ f(X) P(X) dX. Real-valued operator on random variables.
𝐀 𝐀 := E_{Q(A)}[A] = 𝐚 ⊙ 𝐚_0^{⊙(−1)} (i.e., each column of 𝐚 divided by its sum). Matrix in M_{n×m}(R_{>0}).
𝐥𝐨𝐠𝐀 𝐥𝐨𝐠𝐀 := E_{Q(A)}[log A] = ψ(𝐚) − ψ(𝐚_0). Note that 𝐥𝐨𝐠𝐀 ≠ log 𝐀! Matrix in M_{n×m}(R).
σ Softmax function or normalised exponential, σ(x)_k = e^{x_k} / Σ_i e^{x_i}. Function R^k → {x ∈ R^k : x_i > 0, Σ_i x_i = 1}.
H[P] Shannon entropy of a probability distribution P. Explicitly, H[P] = E_{P(x)}[−log P(x)]. Functional over probability distributions.

This choice is driven by didactic purposes and by the fact that this factorisation has been used extensively in the active inference literature (Friston, FitzGerald et al., 2017, Friston, Parr et al., 2017, Friston et al., 2018b). However, the most recent software implementation of active inference (available in spm_MDP_VB_X.m) employs a marginal approximation (Parr, 2019, Parr, Markovic et al., 2019), which retains the simplicity and biological interpretation of the neuronal dynamics afforded by the mean-field approximation, while approximating the more accurate inferences of the Bethe approximation. For these reasons, the marginal free energy currently stands as the most biologically plausible.

4.3. Computing the variational free energy

The next sections focus on producing biologically plausible neuronal dynamics that perform perception and learning based on variational free energy minimisation. To enable this, we first compute the variational free energy, using the factorisations of the generative model and approximate posterior (c.f., Fig. 2):

$$
\begin{aligned}
F[Q(s_{1:T},A,\pi)] &= \mathbb{E}_{Q(s_{1:T},A,\pi)}\big[\log Q(s_{1:T},A,\pi) - \log P(o_{1:t},s_{1:T},A,\pi)\big] \\
&= \mathbb{E}_{Q(s_{1:T},A,\pi)}\Big[\log Q(A) + \log Q(\pi) + \sum_{\tau=1}^T \log Q(s_\tau\mid\pi) - \log P(A) - \log P(\pi) \\
&\qquad\quad - \log P(s_1) - \sum_{\tau=2}^T \log P(s_\tau\mid s_{\tau-1},\pi) - \sum_{\tau=1}^t \log P(o_\tau\mid s_\tau,A)\Big] \\
&= D_{KL}\big[Q(A)\,\|\,P(A)\big] + D_{KL}\big[Q(\pi)\,\|\,P(\pi)\big] + \mathbb{E}_{Q(\pi)}\big[F_\pi[Q(s_{1:T}\mid\pi)]\big]
\end{aligned}
\tag{5}
$$

where

$$
\begin{aligned}
F_\pi[Q(s_{1:T}\mid\pi)] := &\sum_{\tau=1}^T \mathbb{E}_{Q(s_\tau\mid\pi)}\big[\log Q(s_\tau\mid\pi)\big] - \sum_{\tau=1}^t \mathbb{E}_{Q(s_\tau\mid\pi)Q(A)}\big[\log P(o_\tau\mid s_\tau,A)\big] \\
&- \mathbb{E}_{Q(s_1\mid\pi)}\big[\log P(s_1)\big] - \sum_{\tau=2}^T \mathbb{E}_{Q(s_\tau\mid\pi)Q(s_{\tau-1}\mid\pi)}\big[\log P(s_\tau\mid s_{\tau-1},\pi)\big]
\end{aligned}
\tag{6}
$$

is the variational free energy conditioned upon pursuing a particular policy. This is the same quantity that we would have obtained by omitting A and conditioning all probability distributions in the numerator of (1) on π. In the next section, we will see how perception can be framed in terms of variational free energy minimisation.

5. Perception

In active inference, perception is equated with state estimation (Friston, FitzGerald et al., 2017) (e.g., inferring the temperature from the sensation of warmth), consistent with the idea that perceptions are hypotheses (Gregory, 1980). To infer the (past, present and future) states of the environment, an agent must minimise the variational free energy with respect to Q(s_{1:T}|π) for each policy π. This provides the agent’s inference over hidden states, contingent upon pursuing a given policy. Since the only part of the free energy that depends on Q(s_{1:T}|π) is F_π, the agent must simply minimise F_π. Substituting each Q(s_τ|π) by its sufficient statistics (i.e., the vector of parameters s_{πτ}), F_π becomes a function of those parameters. This enables us to rewrite (6) conveniently in matrix form (see Appendix B for details):

$$
F_\pi(\mathbf{s}_{\pi 1},\ldots,\mathbf{s}_{\pi T}) = \sum_{\tau=1}^T \mathbf{s}_{\pi\tau}\cdot\log \mathbf{s}_{\pi\tau} - \sum_{\tau=1}^t o_\tau\cdot \mathbf{logA}\,\mathbf{s}_{\pi\tau} - \mathbf{s}_{\pi 1}\cdot\log D - \sum_{\tau=2}^T \mathbf{s}_{\pi\tau}\cdot\log(B_{\pi_{\tau-1}})\,\mathbf{s}_{\pi,\tau-1}
\tag{7}
$$

This enables us to compute the variational free energy gradients (Petersen & Pedersen, 2012):

$$
\nabla_{\mathbf{s}_{\pi\tau}} F_\pi(\mathbf{s}_{\pi 1},\ldots,\mathbf{s}_{\pi T}) = 1 + \log \mathbf{s}_{\pi\tau} -
\begin{cases}
o_\tau\cdot\mathbf{logA} + \mathbf{s}_{\pi,\tau+1}\cdot\log(B_{\pi_\tau}) + \log D & \text{if } \tau = 1\\
o_\tau\cdot\mathbf{logA} + \mathbf{s}_{\pi,\tau+1}\cdot\log(B_{\pi_\tau}) + \log(B_{\pi_{\tau-1}})\,\mathbf{s}_{\pi,\tau-1} & \text{if } 1 < \tau \leq t\\
\mathbf{s}_{\pi,\tau+1}\cdot\log(B_{\pi_\tau}) + \log(B_{\pi_{\tau-1}})\,\mathbf{s}_{\pi,\tau-1} & \text{if } \tau > t
\end{cases}
\tag{8}
$$

The neuronal dynamics are given by a gradient descent on free energy (Friston, FitzGerald et al., 2017), with state estimation expressed as a softmax function of accumulated (negative) free energy gradients, which we denote by v_{πτ} (see Section 5.1 for an interpretation). The constant term 1 is generally omitted, since the softmax function removes it anyway.

$$
\begin{aligned}
\dot{\mathbf{v}}_{\pi\tau}(\mathbf{s}_{\pi 1},\ldots,\mathbf{s}_{\pi T}) &= -\nabla_{\mathbf{s}_{\pi\tau}} F_\pi(\mathbf{s}_{\pi 1},\ldots,\mathbf{s}_{\pi T}) \\
\mathbf{s}_{\pi\tau} &= \sigma(\mathbf{v}_{\pi\tau})
\end{aligned}
\tag{9}
$$

The softmax function σ – a generalisation of the sigmoid to vector inputs – is a natural choice as the variational free energy gradient is a logarithm and the components of sπτ must sum to one. Note the continuous time gradient descent on the free energy (9); although we focus on active inference with discrete generative models, this does not preclude the belief updating from occurring in continuous time (this is particularly important when relating these dynamics to neurobiological processes, see below). Yet, any numerical implementation of active inference would implement a discretised version of (9) until convergence, for example

$$
\begin{aligned}
\mathbf{v}_{\pi\tau}^{(k)} &= \mathbf{v}_{\pi\tau}^{(k-1)} - \kappa\,\nabla_{\mathbf{s}_{\pi\tau}^{(k-1)}} F_\pi\big(\mathbf{s}_{\pi 1}^{(k-1)},\ldots,\mathbf{s}_{\pi T}^{(k-1)}\big) \quad \text{for small } \kappa > 0\\
\mathbf{s}_{\pi\tau}^{(k)} &= \sigma\big(\mathbf{v}_{\pi\tau}^{(k)}\big).
\end{aligned}
$$
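As a concrete illustration of this discretised scheme, the sketch below performs state estimation under a single fixed policy for a hypothetical two-state model. For simplicity, A is treated as known (so 𝐥𝐨𝐠𝐀 reduces to log A), and all numbers, including the step size and iteration count, are illustrative.

```python
import numpy as np

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

A = np.array([[0.9, 0.2],           # P(o | s)
              [0.1, 0.8]])
B = [np.array([[0.7, 0.3],          # P(s' | s, action 0)
               [0.3, 0.7]]),
     np.array([[0.1, 0.9],          # P(s' | s, action 1)
               [0.9, 0.1]])]
D = np.array([0.5, 0.5])

T, t = 4, 2                          # planning horizon and current time
policy = [0, 1, 0]                   # actions pi_1, ..., pi_{T-1}
obs = [0, 0]                         # outcomes observed up to time t (as indices)

logA, logB, logD = np.log(A), [np.log(b) for b in B], np.log(D)

# Initialise posterior expectations s_{pi,tau} and their log-space counterparts v.
s = np.full((T, 2), 0.5)
v = np.log(s)

kappa = 0.25                         # step size of the discretised gradient descent
for _ in range(64):
    for tau in range(T):
        # Messages from the likelihood (observed time points only), the past and the future.
        msg = np.zeros(2)
        if tau < t:
            msg += logA[obs[tau], :]
        if tau == 0:
            msg += logD
        else:
            msg += logB[policy[tau - 1]] @ s[tau - 1]
        if tau < T - 1:
            msg += logB[policy[tau]].T @ s[tau + 1]
        grad = 1.0 + np.log(s[tau]) - msg      # Eq. (8); the constant is removed by the softmax
        v[tau] = v[tau] - kappa * grad         # discretised version of Eq. (9)
        s[tau] = softmax(v[tau])

print(np.round(s, 3))                # approximate posterior over states at each time step
```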

5.1. Plausibility of neuronal dynamics

The temporal dynamics expressed in (9) unfold at a much faster timescale than the sampling of new observations (i.e., within timesteps) and correspond to fast neuronal processing in peristimulus time. This is consistent with behaviour-relevant computations at frequencies that are higher than the rate of visual sampling (e.g., working memory (Lundqvist et al., 2016), visual stimulus perception in humans (Hanslmayr et al., 2013) and macaques (Rolls & Tovee, 1994)).

Furthermore, these dynamics (9) are consistent with predictive processing (Bastos et al., 2012, Rao and Ballard, 1999) – since active inference prescribes dynamics that minimise prediction error – although they generalise it to a wide range of generative models. Note that, while also a variational free energy, this sort of prediction error (7) is not the same as that given by predictive coding schemes (which rely upon a certain kind of continuous state-space generative model, see Bogacz, 2017, Buckley et al., 2017, Friston et al., 2007).

Just as neuronal dynamics involve translation from post-synaptic potentials to firing rates, (9) involves translating from a vector of real numbers (v), to a vector whose elements are bounded between zero and one (sπτ); via the softmax function. As a result, it is natural to interpret the components of v as the average membrane potential of distinct neural populations, and sπτ as the average firing rate of those populations, which is bounded thanks to neuronal refractory periods. This is consistent with mean-field formulations of neural population dynamics, in that the average firing rate of a neuronal population follows a sigmoid function of the average membrane potential (Deco et al., 2008, Marreiros et al., 2008, Moran et al., 2013). Using the fact that a softmax function is a generalisation of the sigmoid to vector inputs – here the average membrane potentials of coupled neuronal populations – it follows that their average firing follows a softmax function of their average potential. In this context, the softmax function may be interpreted as performing lateral inhibition, which can be thought of as leading to narrower tuning curves of individual neurons and thereby sharper inferences (Von Békésy, 1967). Importantly, this tells us that state-estimation can be performed in parallel by different neuronal populations, and a simple neuronal architecture is sufficient to implement these dynamics (see Parr, Markovic et al. (2019, Figure 6)).

Lastly, interpreting the dynamics in this way has a degree of face validity, as it enables us to synthesise a wide-range of biologically plausible electrophysiological responses; including repetition suppression, mismatch negativity, violation responses, place-cell activity, phase precession, theta sequences, theta–gamma coupling, evidence accumulation, race-to-bound dynamics and transfer of dopamine responses (Friston, FitzGerald et al., 2017, Schwartenbeck, FitzGerald, Mathys, Dolan and Friston, 2015).

The neuronal dynamics for state estimation coincide with variational message passing (Dauwels, 2007, Winn and Bishop, 2005), a popular algorithm for approximate Bayesian inference. This follows, as we have seen, from free energy minimisation under a particular mean-field approximation (4). If one were to use the Bethe approximation, the corresponding dynamics coincide with belief propagation (Bishop, 2006, Loeliger, 2004, Parr, Markovic et al., 2019, Schwöbel et al., 2018, Yedidia et al., 2005), another widely used algorithm for approximate inference. This offers a formal connection between active inference and message passing interpretations of neuronal dynamics (Dauwels et al., 2007, Friston, Parr et al., 2017, George, 2005). In the next section, we examine planning, decision-making and action selection.

6. Planning, decision-making and action selection

So far, we have focused on optimising beliefs about hidden states by minimising a variational free energy functional of the approximate posterior over hidden states, under each policy.

In this section, we explain how planning and decision-making arise as a minimisation of expected free energy—a function scoring the goodness of each possible future course of action. We briefly motivate how the expected free energy arises from first-principles. This allows us to frame decision-making and action-selection in terms of expected free energy minimisation. Finally, we conclude by discussing the computational cost of planning into the future.

6.1. Planning and decision-making

At the heart of active inference is a description of agents that strive to attain a target distribution specifying the range of preferred states of being, given a sufficient amount of time. To work towards reaching these preferences, agents select policies Q(π), such that their predicted states Q(s_τ, A) at some future time point τ > t (usually, the time horizon of a policy T) reach the preferred states P(s_τ, A), which are specified by the generative model. These considerations allow us to show in Appendix C that the requisite approximate posterior over policies Q(π) is a softmax function of the negative expected free energy G:

$$
\begin{aligned}
Q(\pi) &= \sigma(-G(\pi))\\
G(\pi) &= \underbrace{D_{KL}\big[Q(s_\tau,A\mid\pi)\,\|\,P(s_\tau,A)\big]}_{\text{Risk}} - \underbrace{\mathbb{E}_{Q(s_\tau,A\mid\pi)P(o_\tau\mid s_\tau,A)}\big[\log P(o_\tau\mid s_\tau,A)\big]}_{\text{Ambiguity}}
\end{aligned}
\tag{10}
$$

By risk we mean the difference between predicted and prior beliefs about the future (e.g., the quantification of losses as in financial risk), and ambiguity is the uncertainty associated with future observations, given states. This means that the most likely (i.e., best) policies minimise expected free energy. This ensures that future courses of action are exploitative (i.e., risk minimising) and explorative (i.e., ambiguity minimising). In particular, the expected free energy balances goal-seeking and itinerant novelty-seeking behaviour, given some prior preferences or goals. Note that the ambiguity term rests on an expectation over fictive (i.e., predicted) outcomes under beliefs about future states. This means that optimising beliefs about future states during perception is crucial to accurately predict future outcomes during planning. In summary, planning and decision-making respectively correspond to evaluating the expected free energy of different policies, which scores their goodness in relation to prior preferences, and forming approximate posterior beliefs about policies.
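To make this concrete, the following sketch scores a small set of two-step policies by their expected free energy (risk plus ambiguity, as in (10)) and forms the approximate posterior over policies. The model, the preferences over final states and all numbers are hypothetical, and beliefs about A are ignored (A is treated as known).

```python
import numpy as np

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

A = np.array([[0.9, 0.2],             # P(o | s)
              [0.1, 0.8]])
B = [np.array([[0.7, 0.3],            # P(s' | s, action 0)
               [0.3, 0.7]]),
     np.array([[0.1, 0.9],            # P(s' | s, action 1)
               [0.9, 0.1]])]
C = np.array([0.95, 0.05])            # preferred (final) states P(s_tau)

s_now = np.array([0.5, 0.5])          # current posterior over states
policies = [[0, 0], [0, 1], [1, 0], [1, 1]]

# Ambiguity of each state: conditional entropy of the likelihood, H[P(o | s)].
H = -np.sum(A * np.log(A), axis=0)

G = np.zeros(len(policies))
for i, pi in enumerate(policies):
    s_pred = s_now
    for u in pi:                       # roll predicted states forward under the policy
        s_pred = B[u] @ s_pred
    risk = np.sum(s_pred * np.log(s_pred / C))     # KL[Q(s_tau | pi) || P(s_tau)]
    ambiguity = np.sum(s_pred * H)                  # E_Q[H[P(o | s)]]
    G[i] = risk + ambiguity

Q_pi = softmax(-G)                     # posterior over policies, Eq. (10)
print(np.round(G, 3), np.round(Q_pi, 3))
```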

6.2. Action selection, policy-independent state-estimation

Approximate posterior beliefs about policies allow us to obtain the most plausible action as the most likely action under all policies—this can be expressed as a Bayesian model average:

$$
u_t = \underset{u\in U}{\arg\max} \sum_{\substack{\pi\in\Pi\\ \pi_t = u}} Q(\pi).
\tag{11}
$$

In addition, we obtain a policy-independent state estimation at any time point, Q(s_τ), τ ∈ {1, …, T}, as a Bayesian model average of approximate posterior beliefs about hidden states under policies, which may be expressed in terms of the distributions’ parameters (Q(s_τ) = Cat(s_τ), Q(s_τ|π) = Cat(s_{πτ})):

$$
Q(s_\tau) = \sum_{\pi\in\Pi} Q(s_\tau\mid\pi)\,Q(\pi) \quad\Longleftrightarrow\quad \mathbf{s}_\tau = \sum_{\pi\in\Pi} \mathbf{s}_{\pi\tau}\,Q(\pi)
\tag{12}
$$

Note that these Bayesian model averages may be implemented by neuromodulatory mechanisms (FitzGerald et al., 2014).
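The sketch below illustrates both averages for a handful of hypothetical policies and posteriors: the action marginal of Eq. (11) and the policy-averaged state estimate of Eq. (12). All numbers are illustrative (e.g., they could be outputs of the planning sketch above).

```python
import numpy as np

policies = [(0, 0), (0, 1), (1, 0), (1, 1)]            # action sequences
Q_pi = np.array([0.10, 0.55, 0.25, 0.10])              # posterior over policies
s_pi_tau = np.array([[0.9, 0.1],                       # Q(s_tau | pi) for each policy
                     [0.2, 0.8],
                     [0.6, 0.4],
                     [0.3, 0.7]])

# Eq. (11): the selected action is the most likely one under all policies.
t = 0                                                  # current time index into the policies
action_marginal = {}
for pi, q in zip(policies, Q_pi):
    action_marginal[pi[t]] = action_marginal.get(pi[t], 0.0) + q
u_t = max(action_marginal, key=action_marginal.get)

# Eq. (12): Bayesian model average of state posteriors across policies.
s_tau = Q_pi @ s_pi_tau

print(u_t, np.round(s_tau, 3))
```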

6.3. Biological plausibility

Winner take-all architectures of decision-making are already commonplace in computational neuroscience (e.g., models of selective attention and recognition (Carpenter and Grossberg, 1987, Itti et al., 1998), hierarchical models of vision (Riesenhuber & Poggio, 1999)). This is nice, since the softmax function in (10) can be seen as providing a biologically plausible (Deco et al., 2008, Marreiros et al., 2008, Moran et al., 2013), smooth approximation to the maximum operation, which is known as soft winner take-all (Maass, 2000). In fact, the generative model, presented in Fig. 2, can be naturally extended such that the approximate posterior contains an (inverse) temperature parameter γ multiplying the expected free energy inside the softmax function (see Appendix A.2). This temperature parameter regulates how precisely the softmax approximates the maximum function, thus recovering winner take-all architectures for high parameter values (technically, this converts Bayesian model averaging into Bayesian model selection, where the policy corresponds to a model of what the agent is doing). This parameter, regulating precision of policy selection, has a clear biological interpretation in terms of confidence encoded in dopaminergic firing (FitzGerald, Dolan et al., 2015, Friston, FitzGerald et al., 2017, Friston et al., 2014, Schwartenbeck, FitzGerald, Mathys, Dolan and Friston, 2015). Interestingly, Daw and colleagues (Daw et al., 2006) uncovered evidence in favour of a similar model employing a softmax function and temperature parameter in human decision-making.
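The effect of such a precision parameter can be illustrated in a few lines: as γ increases, the softmax over (negative) expected free energies approaches a hard winner-take-all selection. The expected free energies below are hypothetical.

```python
import numpy as np

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

# Hypothetical expected free energies for three policies.
G = np.array([2.0, 2.5, 4.0])

for gamma in (1.0, 4.0, 32.0):
    print(gamma, softmax(-gamma * G).round(3))
# As gamma grows, the posterior over policies approaches a winner-take-all
# selection of the policy with the smallest expected free energy.
```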

6.4. Pruning of policy trees

From a computational perspective, planning (i.e., computing the expected free energy) for each possible policy can be cost-prohibitive, due to the combinatorial explosion in the number of sequences of actions when looking deep into the future. There has been work in understanding how the brain finesses this problem (Huys et al., 2012), which suggests a simple answer: during mental planning, humans stop evaluating a policy as soon as they encounter a large loss (i.e., a high value of the expected free energy that renders the policy highly implausible). In active inference this corresponds to using an Occam window; that is, we stop evaluating the expected free energy of a policy if it becomes much higher than that of the best (smallest expected free energy) policy—and set its approximate posterior probability to an arbitrarily low value accordingly. This biologically plausible pruning strategy drastically reduces the number of policies one has to evaluate exhaustively.
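A minimal sketch of this pruning rule follows; the expected free energies and the width of the Occam window are hypothetical, and pruned policies are simply assigned a vanishing posterior probability.

```python
import numpy as np

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

G = np.array([1.2, 1.5, 6.0, 1.3, 8.5])      # expected free energies of candidate policies
window = 3.0                                  # Occam window (in nats)

keep = G - G.min() <= window                  # retain only policies close to the best
G_pruned = np.where(keep, G, np.inf)          # implausible policies get ~zero probability

Q_pi = softmax(-G_pruned)
print(keep, np.round(Q_pi, 3))
```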

Although effective and biologically plausible, the Occam window for pruning policy trees cannot deal with large policy spaces that ensue with deep policy trees and long temporal horizons. This means that pruning can only partially explain how biological organisms perform deep policy searches. Further research is needed to characterise the processes in which biological agents reduce large policy spaces to tractable subspaces. One explanation – for the remarkable capacity of biological agents to evaluate deep policy trees – rests on deep (hierarchical) generative models, in which policies operate at each level. These deep models enable long-term policies, modelling slow transitions among hidden states at higher levels in the hierarchy, to contextualise faster state transitions at subordinate levels (see Appendix A). The resulting (semi Markovian) process can then be specified in terms of a hierarchy of limited horizon policies that are nested over temporal scales; c.f., motor chunking (Dehaene et al., 2015, Fonollosa et al., 2015, Haruno et al., 2003).

6.5. Discussion of the action–perception cycle

Minimising variational and expected free energy are complementary and mutually beneficial processes. Minimisation of variational free energy ensures that the generative model is a good predictor of its environment; this allows the agent to accurately plan into the future by evaluating expected free energy, which in turn enables it to realise its preferences. In other words, minimisation of variational free energy is a vehicle for effective planning and reaching preferences via the expected free energy; in turn, reaching preferences minimises the expected surprise of future states of being.

In conclusion, we have seen how agents plan into the future and make decisions about the best possible course of action. This concludes our discussion of the action–perception cycle. In the next section, we examine expected free energy in greater detail. Then, we will see how active agents can learn the contingencies of the environment and the structure of their generative model at slower timescales.

7. Properties of the expected free energy

The expected free energy is a fundamental construct of interest. In this section, we unpack its main features and highlight its importance in relation to many existing theories in neurosciences and engineering.

The expected free energy of a policy can be unpacked in a number of ways. Perhaps the most intuitive is in terms of risk and ambiguity:

$$
G(\pi) = \underbrace{D_{KL}\big[Q(s_\tau,A\mid\pi)\,\|\,P(s_\tau,A)\big]}_{\text{Risk}} + \underbrace{\mathbb{E}_{Q(s_\tau,A\mid\pi)}\big[\mathrm{H}[P(o_\tau\mid s_\tau,A)]\big]}_{\text{Ambiguity}}
\tag{13}
$$

This means that policy selection minimises risk and ambiguity. Risk, in this setting, is simply the difference between predicted and prior beliefs about final states. In other words, policies will be deemed more likely if they bring about states that conform to prior preferences. In the optimal control literature, this part of expected free energy underwrites KL control (Todorov, 2008, van den Broek et al., 2010). In economics, it leads to risk sensitive policies (Fleming & Sheu, 2002). Ambiguity reflects the uncertainty about future outcomes, given hidden states. Minimising ambiguity therefore corresponds to choosing future states that generate unambiguous and informative outcomes (e.g., switching on a light in the dark).

We can express the expected free energy of a policy as a bound on information gain and expected log (model) evidence (a.k.a., Bayesian risk):

$$
\begin{aligned}
G(\pi) &= \underbrace{\mathbb{E}_Q\big[D_{KL}[Q(s_\tau,A\mid o_\tau,\pi)\,\|\,P(s_\tau,A\mid o_\tau)]\big]}_{\text{Expected evidence bound}} - \underbrace{\mathbb{E}_Q\big[\log P(o_\tau)\big]}_{\text{Expected log evidence}} - \underbrace{\mathbb{E}_Q\big[D_{KL}[Q(s_\tau,A\mid o_\tau,\pi)\,\|\,Q(s_\tau,A\mid\pi)]\big]}_{\text{Expected information gain}}\\
&\geq - \underbrace{\mathbb{E}_Q\big[\log P(o_\tau)\big]}_{\text{Expected log evidence}} - \underbrace{\mathbb{E}_Q\big[D_{KL}[Q(s_\tau,A\mid o_\tau,\pi)\,\|\,Q(s_\tau,A\mid\pi)]\big]}_{\text{Expected information gain}}
\end{aligned}
\tag{14}
$$

The first term in (14) is the expectation of log evidence under beliefs about future outcomes, while the second ensures that this expectation is maximally informed, when outcomes are encountered. Collectively, these two terms underwrite the resolution of uncertainty about hidden states (i.e., information gain) and outcomes (i.e., expected surprise) in relation to prior beliefs.

When the agent’s preferences are expressed in terms of outcomes (c.f., Fig. 2), it is useful to express risk in terms of outcomes, as opposed to hidden states. This is most useful when the generative model is not known or during structure learning, when the state-space evolves over time. In these cases, the risk over hidden states can be replaced by risk over outcomes, by assuming that the KL divergence between the predicted and true posterior (under expected outcomes) is small:

D_KL[Q(s_τ, A|π) ‖ P(s_τ, A)] (Risk, states) = D_KL[Q(o_τ|π) ‖ P(o_τ)] (Risk, outcomes) + E_{Q(o_τ|π)}[D_KL[Q(s_τ, A|o_τ, π) ‖ P(s_τ, A|o_τ)]] (≈ 0) ≈ D_KL[Q(o_τ|π) ‖ P(o_τ)] (Risk, outcomes) (15)

This divergence constitutes an expected evidence bound that also appears if we express expected free energy in terms of intrinsic and extrinsic value:

G(π) = −E_{Q(o_τ|π)}[log P(o_τ)] (Extrinsic value) + E_{Q(o_τ|π)}[D_KL[Q(s_τ, A|o_τ, π) ‖ P(s_τ, A|o_τ)]] (Expected evidence bound) − E_{Q(o_τ|π)}[D_KL[Q(s_τ|o_τ, π) ‖ Q(s_τ|π)]] (Intrinsic value (states) or salience) − E_{Q(o_τ, s_τ|π)}[D_KL[Q(A|o_τ, s_τ, π) ‖ Q(A)]] (Intrinsic value (parameters) or novelty) (16)

Extrinsic value is just the expected value of log evidence, which can be associated with reward and utility in behavioural psychology and economics, respectively (Barto et al., 2013, Kauder, 1953, Schmidhuber, 2010). In this setting, extrinsic value is the negative of Bayesian risk (Berger, 1985), when reward is log evidence. The intrinsic value of a policy is its epistemic value or affordance (Friston et al., 2015). This is just the expected information gain afforded by a particular policy, which can be about hidden states (i.e., salience) or model parameters (i.e., novelty). It is this term that underwrites artificial curiosity (Schmidhuber, 2006).

Intrinsic value corresponds to the expected information gain about model parameters. It is also known as intrinsic motivation in neurorobotics (Barto et al., 2013, Deci and Ryan, 1985, Oudeyer and Kaplan, 2009), the value of information in economics (Howard, 1966), salience in the visual neurosciences and (rather confusingly) Bayesian surprise in the visual search literature (Itti and Baldi, 2009, Schwartenbeck et al., 2013, Sun et al., 2011). In terms of information theory, intrinsic value is mathematically equivalent to the expected mutual information between hidden states in the future and their consequences—consistent with the principles of minimum redundancy or maximum efficiency (Barlow, 1961, Barlow, 1974, Linsker, 1990). Finally, from a statistical perspective, maximising intrinsic value (i.e., salience and novelty) corresponds to optimal Bayesian design (Lindley, 1956) and machine learning derivatives, such as active learning (MacKay, 1992). On this view, active learning is driven by novelty; namely, the information gain afforded model parameters, given future states and their outcomes. Heuristically, this curiosity resolves uncertainty about “what would happen if I did that” (Schmidhuber, 2010). Fig. 4 illustrates the compass of expected free energy, in terms of its special cases; ranging from optimal Bayesian design through to Bayesian decision theory.

Fig. 4.

Expected free energy. This figure illustrates the various ways in which minimising expected free energy can be unpacked (omitting model parameters for clarity). The upper panel casts action and perception as the minimisation of variational and expected free energy, respectively. Crucially, active inference introduces beliefs over policies that enable a formal description of planning as inference (Attias, 2003, Botvinick and Toussaint, 2012, Kaplan and Friston, 2018a). In brief, posterior beliefs about hidden states of the world, under plausible policies, are optimised by minimising a variational (free energy) bound on log evidence. These beliefs are then used to evaluate the expected free energy of allowable policies, from which actions can be selected (Friston, FitzGerald et al., 2017). Crucially, expected free energy subsumes several special cases that predominate in the psychological, machine learning and economics literature. These special cases are disclosed when one removes particular sources of uncertainty from the implicit optimisation problem. For example, if we ignore prior preferences, then the expected free energy reduces to information gain (Lindley, 1956, MacKay, 2003) or intrinsic motivation (Barto et al., 2013, Deci and Ryan, 1985, Oudeyer and Kaplan, 2009). This is mathematically the same as expected Bayesian surprise and mutual information that underwrite salience in visual search (Itti and Baldi, 2009, Sun et al., 2011) and the organisation of our visual apparatus (Barlow, 1961, Barlow, 1974, Linsker, 1990, Optican and Richmond, 1987). If we now remove ambiguity but reinstate prior preferences, one can effectively treat hidden and observed (sensory) states as isomorphic. This leads to risk sensitive policies in economics (Fleming and Sheu, 2002, Kahneman and Tversky, 1988) or KL control in engineering (van den Broek et al., 2010). Here, minimising risk corresponds to aligning predicted outcomes to preferred outcomes. If we then remove ambiguity and relative risk of action (i.e., intrinsic value), we are left with extrinsic value or expected utility in economics (Von Neumann & Morgenstern, 1944) that underwrites reinforcement learning and behavioural psychology (Barto & Sutton, 1992). Bayesian formulations of maximising expected utility under uncertainty are also known as Bayesian decision theory (Berger, 1985). Finally, if we just consider a completely unambiguous world with uninformative priors, expected free energy reduces to the negative entropy of posterior beliefs about the causes of data; in accord with the maximum entropy principle (Jaynes, 1957). The expressions for variational and expected free energy correspond to those described in the main text (omitting model parameters for clarity). They are arranged to illustrate the relationship between complexity and accuracy, which become risk and ambiguity, when considering the consequences of action. This means that risk-sensitive policy selection minimises expected complexity or computational cost. The coloured dots above the terms in the equations correspond to the terms that constitute the special cases in the lower panels.

8. Learning

In active inference, learning concerns the dynamics of synaptic plasticity, which are thought to encode beliefs about the contingencies of the environment (Friston, FitzGerald et al., 2017) (e.g., beliefs about B are, in some settings, thought to be encoded in recurrent excitatory connections in the prefrontal cortex (Parr, Rikhye et al., 2019)). The idea that beliefs about matrices (e.g., A, B) may be encoded in synaptic weights is consistent with connectionist models of brain function: synaptic weights offer a convenient way to compute probabilities, in that they can be interpreted as performing matrix multiplication, as in artificial neural networks, to predict, for example, outcomes from beliefs about states, using the likelihood matrix A.

These synaptic dynamics (e.g., long-term potentiation and depression) evolve at a slower timescale than action and perception, which is consistent with the fact that such inferences require evidence accumulation over multiple state–outcome pairs. For simplicity, we will assume that the only variable that is learned is A, but what follows generalises to more complex generative models (c.f., Appendix A.1). Learning A means that approximate posterior beliefs about A follow a gradient descent on variational free energy. Seeing the variational free energy (5) as a function of 𝐚 (the sufficient statistic of Q(A)), we can write:

F(𝐚) = D_KL[Q(A) ‖ P(A)] − Σ_{τ=1}^{t} E_{Q(π)Q(s_τ|π)Q(A)}[o_τ · log(A) s_τ] + ⋯ = D_KL[Q(A) ‖ P(A)] − Σ_{τ=1}^{t} o_τ · log 𝐀 𝐬_τ + ⋯ (17)

Here, we ignore the terms in (5) that do not depend on Q(A), as these will vanish when we take the gradient. The KL-divergence between Dirichlet distributions is (Kurt, 2013, Penny, 2001):

D_KL[Q(A) ‖ P(A)] = Σ_{i=1}^{m} D_KL[Q(A_i) ‖ P(A_i)] = Σ_{i=1}^{m} [log Γ(𝐚_{0i}) − Σ_{k=1}^{n} log Γ(𝐚_{ki}) − log Γ(a_{0i}) + Σ_{k=1}^{n} log Γ(a_{ki}) + (𝐚_i − a_i) · (log 𝐀)_i] = Σ_{i=1}^{m} [log Γ(𝐚_{0i}) − Σ_{k=1}^{n} log Γ(𝐚_{ki}) − log Γ(a_{0i}) + Σ_{k=1}^{n} log Γ(a_{ki})] + (𝐚 − a) · log 𝐀 (18)

Incorporating (18) in (17), we can take the gradient of the variational free energy with respect to logA:

∇_{log 𝐀} F(𝐚) = 𝐚 − a − Σ_{τ=1}^{t} o_τ ⊗ 𝐬_τ (19)

where ⊗ is the Kronecker (i.e., outer) product. This means that the dynamics of synaptic plasticity follow a descent on (19):

ρ̇(𝐚) = −∇_{log 𝐀} F(𝐚) = −𝐚 + a + Σ_{τ=1}^{t} o_τ ⊗ 𝐬_τ (20)

In computational terms, these are the dynamics for evidence accumulation of Dirichlet parameters at time t. Since synaptic plasticity dynamics occur at a much slower pace than perceptual inference, it is computationally much cheaper in numerical simulations to do a one-step belief update at the end of each trial of observation epochs. Explicitly, setting the free energy gradient to zero at the end of the trial gives the following update for Dirichlet parameters:

𝐚 = a + Σ_{τ=1}^{T} o_τ ⊗ 𝐬_τ (21)

After which, the prior beliefs P(A) are updated to the approximate posterior beliefs Q(A) for the subsequent trial. Note that in particular, the update counts the number of times a specific mapping between states and observations has been observed. Interestingly, this is formally identical to associative or Hebbian plasticity.
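To illustrate Eq. (21), the sketch below (an illustrative NumPy fragment of our own, not the spm_MDP_VB_X.m routine; the variable names are hypothetical) accumulates Dirichlet counts over one trial. The update is simply an outer product of observed outcomes with posterior state estimates, i.e., a Hebbian co-occurrence count.

```python
import numpy as np

def update_dirichlet(a_prior, outcomes, states):
    """End-of-trial Dirichlet update of Eq. (21): a_post = a_prior + sum_tau o_tau (x) s_tau.

    a_prior  : (n_outcomes, n_states) prior Dirichlet counts over the likelihood A
    outcomes : list of one-hot outcome vectors o_tau
    states   : list of posterior state estimates s_tau (policy-averaged, Eq. (12))
    """
    a_post = a_prior.copy()
    for o, s in zip(outcomes, states):
        a_post += np.outer(o, s)   # Hebbian co-occurrence of outcomes and inferred states
    return a_post

# Example: two outcomes, two states, a short trial of three epochs.
a_prior = np.ones((2, 2))                                   # flat prior counts
outcomes = [np.array([1, 0]), np.array([1, 0]), np.array([0, 1])]
states   = [np.array([0.9, 0.1]), np.array([0.8, 0.2]), np.array([0.1, 0.9])]
print(update_dirichlet(a_prior, outcomes, states))          # counts grow where outcome and state co-occur
```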

As one can see, the learning rule concerning the accumulation of Dirichlet parameters (21) means that the agent becomes increasingly confident about its likelihood matrix as it receives new observations, since the matrix that is added to 𝐚 at each timestep is always non-negative. This is fine as long as the structure of the environment remains relatively constant. In the next section, we will see how Bayesian model reduction can reverse this process, enabling the agent to adapt quickly to a changing environment. Table 3 summarises the belief updating entailed by active inference, and Fig. 5 indicates where particular computations might be implemented in the brain.

Table 3.

Summary of belief updating.

Process Computation Equations
Perception 𝐬_πτ = σ(𝐯), 𝐯̇ = −∇_{𝐬_πτ} F_π (8)
Planning G(π) (D.2), (D.3)
Decision-making Q(π) = σ(−G(π)) (10)
Action selection u_t = argmax_{u∈U} Σ_{π∈Π} δ_{u,π_t} Q(π) (11)
Policy-independent state-estimation 𝐬_τ = Σ_{π∈Π} 𝐬_πτ Q(π) (12)
Learning (end of trial) 𝐚 = a + Σ_{τ=1}^{T} o_τ ⊗ 𝐬_τ (21)
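The planning-to-action steps in Table 3 can be illustrated with a short sketch (hypothetical Python, assuming the expected free energies G(π) have already been computed; this is not the SPM implementation): policies are scored with a softmax of the negative expected free energy (10), states are averaged over policies (12), and the action most probable under the policy posterior is selected (11).

```python
import numpy as np

def softmax(x):
    x = x - x.max()
    return np.exp(x) / np.exp(x).sum()

# Hypothetical setup: 3 policies (sequences of 2 actions), 2 possible actions, 2 states.
G = np.array([2.1, 0.7, 1.5])                   # expected free energy of each policy
policies = np.array([[0, 0], [1, 0], [1, 1]])   # policies[pi, t] = action prescribed at time t
s_pi = np.array([[0.8, 0.2],                    # posterior over states under each policy
                 [0.4, 0.6],
                 [0.3, 0.7]])

q_pi = softmax(-G)                              # decision-making, Eq. (10): Q(pi) = sigma(-G(pi))
s_bar = q_pi @ s_pi                             # policy-independent state estimate, Eq. (12)

t = 0                                           # current time step
n_actions = 2
action_scores = np.array([q_pi[policies[:, t] == u].sum() for u in range(n_actions)])
u_t = action_scores.argmax()                    # action selection, Eq. (11)

print(q_pi, s_bar, u_t)
```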

Fig. 5.

Possible functional anatomy. This figure summarises a possible (coarse-grained) functional anatomy that could implement belief updating in active inference. The arrows correspond to message passing between different neuronal populations. Here, a visual observation is sampled by the retina, aggregated in first-order sensory thalamic nuclei and processed in the occipital (visual) cortex. The green arrows correspond to message passing of sensory information. This signal is then propagated (via the ventral visual pathway) to inferior and medial temporal lobe structures such as the hippocampus; this allows the agent to go from observed outcomes to beliefs about their most likely causes in state-estimation (perception), which is performed locally. The variational free energy is computed in the striatum. The orange arrows encode message passing of beliefs. Preferences C are attributed to the dorsolateral prefrontal cortex – which is thought to encode representations over prolonged temporal scales (Parr & Friston, 2017b) – consistent with the fact that these are likely to be encoded within higher cortical areas (Friston, Lin et al., 2017). The expected free energy is computed in the medial prefrontal cortex (Friston, FitzGerald et al., 2017) during planning, which leads to inferences about the most plausible policies (decision-making) in the basal ganglia, consistent with the fact that the basal ganglia are thought to underwrite planning and decision-making (Berns and Sejnowski, 1996, Ding and Gold, 2013, Haber, 2003, Jahanshahi et al., 2015, Parr and Friston, 2018b, Thibaut, 2016). The message concerning policy selection is sent to the motor cortex via thalamocortical loops. The most plausible action, which is selected in the motor cortex, is passed on through the spinal cord to trigger a limb movement. Simultaneously, policy independent state-estimation is performed in the ventrolateral prefrontal cortex, which leads to synaptic plasticity dynamics in the prefrontal cortex, where the synaptic weights encode beliefs about A.

9. Structure learning

In the previous sections, we have addressed how an agent performs inference over different variables at different timescales in a biologically plausible fashion, which we equated to perception, planning and decision-making. In this section, we consider the problem of learning the form or structure of the generative model.

The idea here is that agents are equipped (e.g., born) with an innate generative model that entails fundamental preferences (e.g., those essential to survival), which are not updated. For instance, humans are born with prior preferences for a body temperature of around 37 °C, and for O2, CO2, glucose, etc., concentrations within certain ranges. Mathematically, this means that the parameters of these innate prior distributions – encoding the agent’s expectations as part of its generative model – have hyperpriors that are infinitely precise (e.g., a Dirac delta distribution) and thus cannot be updated in an experience-dependent fashion. The agent’s generative model then naturally evolves by minimising variational free energy to become a good model of the agent’s environment, but it is still constrained by the survival preferences hardcoded within it. This process of learning the generative model (i.e., the variables and their functional dependencies) is called structure learning.

Structure learning in active inference is an active area of research. Active inference proposes that the agent’s generative model evolves over time to maximise the evidence for its observations. However, a complete set of mechanisms that biological agents use to do so has not yet been laid out. Nevertheless, we use this section to summarise two complementary approaches – namely, Bayesian model reduction and Bayesian model expansion (Friston, Lin et al., 2017, Friston et al., 2018a, Friston and Penny, 2011, Smith et al., 2019) – which enable the agent to simplify and complexify its model, respectively.

9.1. Bayesian model reduction

To explain the causes of their sensations, agents must compare different hypotheses about how their sensory data are generated—and retain the hypothesis or model that is the most valid in relation to their observations (i.e., has the greatest model evidence). In Bayesian statistics, these processes are called Bayesian model comparison and Bayesian model selection—these correspond to scoring the evidence for various generative models in relation to available data and selecting the one with the highest evidence (Claeskens and Hjort, 2006, Stephan et al., 2009). Bayesian model reduction (BMR) is a particular instance of structure learning, which formalises post-hoc hypothesis testing to simplify the generative model. This precludes redundant explanations of sensory data—and ensures the model generalises to new data. Technically, it involves estimating the evidence for simpler (reduced) priors over the latent causes and selecting the model with the highest evidence. This process of simplifying the generative model – by removing certain states or parameters – has a clear biological interpretation in terms of synaptic decay and switching off certain synaptic connections, which is reminiscent of the synaptic mechanisms of sleep (e.g., REM sleep (Hobson and Friston, 2012, Hobson et al., 2014)), reflection and associated machine learning algorithms (e.g., the wake–sleep algorithm (Hinton et al., 1995)).

In the following, we show BMR for learning the likelihood matrix A. Note that BMR is generic and could be used on any other variable that may be optimised during learning (e.g., see Appendix A.1), just by replacing A in the following lines. To keep things concise, we denote by o=o1:t the sequence of available observations. The current model has a prior P(A) and we would like to test whether a reduced (i.e., less complex) prior P~(A) can provide a more parsimonious explanation for the observed outcomes. Using Bayes rule, we have the following identities:

P(A)P(o|A)=P(A|o)P(o) (22)
P~(A)P(o|A)=P~(A|o)P~(o) (23)

where P(o) = ∫ P(o|A) P(A) dA and P~(o) = ∫ P(o|A) P~(A) dA. Dividing (22) by (23) yields

P(A)/P~(A) = [P(A|o) P(o)] / [P~(A|o) P~(o)] (24)

We can then use (24) in order to obtain the following relations:

1 = ∫ P~(A|o) dA = (P(o)/P~(o)) ∫ (P~(A)/P(A)) P(A|o) dA = (P(o)/P~(o)) E_{P(A|o)}[P~(A)/P(A)] (25)
log P~(o) − log P(o) = log E_{P(A|o)}[P~(A)/P(A)] (26)

We can approximate the posterior term in the expectation of (26) with the corresponding approximate posterior Q(A), which simplifies the computation. This allows us to compare the evidence of the two models (reduced and full). If the reduced model has more evidence, it implies the current model is too complex—and redundant parameters can be removed by adopting the new priors.
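For Dirichlet priors over the columns of A, the expectation in (26) has a closed form, so the comparison of full and reduced models requires no sampling. The sketch below (illustrative Python; the example counts are arbitrary, and the closed-form expression is the standard result for Dirichlet distributions, assumed here rather than derived in the main text) returns the log-evidence difference log P~(o) − log P(o) for one column; a positive value indicates that the reduced prior provides the more parsimonious explanation.

```python
import numpy as np
from scipy.special import gammaln

def log_beta(alpha):
    """Log of the multivariate beta function B(alpha) = prod Gamma(alpha_k) / Gamma(sum alpha_k)."""
    return gammaln(alpha).sum() - gammaln(alpha.sum())

def bmr_delta_log_evidence(a_prior, a_post, a_reduced):
    """log P~(o) - log P(o) for one Dirichlet-distributed column of A (cf. Eq. (26)),
    using the accumulated posterior counts in place of the exact posterior.
    Positive values favour adopting the reduced (simpler) prior."""
    return (log_beta(a_prior) + log_beta(a_post + a_reduced - a_prior)
            - log_beta(a_reduced) - log_beta(a_post))

# Example: a column whose accumulated counts suggest outcome 0 almost never occurs.
a_prior   = np.array([1.0, 1.0, 1.0])     # full (flat) prior counts
a_post    = np.array([1.1, 40.0, 60.0])   # counts after evidence accumulation, Eq. (21)
a_reduced = np.array([0.1, 1.0, 1.0])     # reduced prior: effectively switch off outcome 0

print(bmr_delta_log_evidence(a_prior, a_post, a_reduced))   # positive here: reduced prior wins
```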

In conclusion, BMR allows for computationally efficient and biologically plausible hypothesis testing, to find simpler explanations for the data at hand. It has been used to emulate sleep and reflection in abstract rule learning (Friston, Lin et al., 2017), by simplifying the prior over A at the end of each trial—this has the additional benefit of preventing the agent from becoming overconfident.

9.2. Bayesian model expansion

Bayesian model expansion is complementary to Bayesian model reduction. It entails adopting a more complex generative model – for example, by adding more states – if and only if the gain in accuracy in (3) is sufficient to outweigh the increase in complexity. This model expansion allows for generalisation and concept learning in active inference (Smith et al., 2019). Note that additional states need not always lead to a more complex model. It is in principle possible to expand a model in such a way that complexity decreases, since many state estimates may remain close to their priors in place of a small number of estimates moving a great deal. This ‘shared work’ by many parameters could lead to a simpler model.

From a computational perspective, concept acquisition can be seen as a type of structure learning (Gershman and Niv, 2010, Tervo et al., 2016) – one that can be emulated through Bayesian model comparison. Recent work on concept learning in active inference (Smith et al., 2019) shows that a generative model equipped with extra hidden states can engage these ‘unused’ hidden states when an agent is presented with novel stimuli during the learning process. Initially, the corresponding likelihood mappings (i.e., the corresponding columns of A) are uninformative, but these are updated when the agent encounters new observations that cannot be accounted for by its current knowledge (e.g., observing a cat when it has only been exposed to birds). This happens naturally, during the learning process, in an unsupervised way through free energy minimisation. To allow for effective generalisation, this approach can be combined with BMR, in which any new concept can be aggregated with similar concepts, and the associated likelihood mappings can be reset for further concept acquisition, in favour of a simpler model with higher model evidence. This approach can be further extended by updating the number of extra hidden states through a process of Bayesian model comparison.

10. Discussion

Due to the various recent theoretical advances in active inference, it is easy to lose sight of its underlying principle, process theory and practical implementation. We have tried to address this by rehearsing – in a clear and concise way – the assumptions underlying active inference as a principle, the technical details of the process theory for discrete state-space generative models and the biological interpretation of the accompanying neuronal dynamics. It is useful to clarify these results: as a first step to guide research towards outstanding theoretical challenges, as a practical guide on how to implement active inference to simulate experimental behaviour, and as a pointer towards various predictions that may be tested empirically.

Active inference offers a degree of plausibility as a process theory of brain function. From a theoretical perspective its requisite neuronal dynamics correspond to known empirical phenomena and extend earlier theories like predictive coding (Bastos et al., 2012, Friston, 2010, Rao and Ballard, 1999). Furthermore, the process theory is consistent with the underlying free energy principle, which biological systems are thought to abide by—namely, the avoidance of surprising states: this can be articulated formally based on fundamental assumptions about biological systems (Friston, 2019, Parr et al., 2020). Lastly, the process theory has a degree of face validity as its predicted electrophysiological responses closely resemble empirical measurements.

However, for a full endorsement of the process theory presented in this paper, rigorous empirical validation of the synthetic electrophysiological responses is needed. To pursue this, one would have to specify the generative model that a biological agent employs for a particular task. This can be done through Bayesian model comparison of alternative generative models with respect to the empirical (choice) behaviour being measured (e.g., Mirza et al., 2018). Once the appropriate generative model is formulated, the evidence for plausible but distinct implementations of active inference would need to be compared; these arise from various possible approximations to the free energy (Parr, Markovic et al., 2019, Schwöbel et al., 2018, Yedidia et al., 2005), each of which yields different belief updates and simulated electrophysiological responses. Note that the marginal approximation to the free energy currently stands as the most biologically plausible (Parr, Markovic et al., 2019). From this, the explanatory power of active inference can be assessed in relation to empirical measurements and contrasted with other existing theories.

This means that the key challenge for active inference – and arguably data analysis in general – is finding the generative model that best explains observable data (i.e., maximises model evidence). A solution to this problem would enable one to find the generative model entailed by an agent by observing its behaviour. In turn, this would enable one to simulate its belief updating and behaviour accurately in-silico. It should be noted that these generative models can be specified manually for the purposes of reproducing simple behaviour (e.g., agents performing the simple tasks needed for the empirical validation discussed above). However, a generic solution to this problem is necessary to account for complex datasets; in particular, complex behavioural data from agents in a real environment. Moreover, a biologically plausible solution to this problem could correspond to a complete structure learning roadmap, accounting for how biological agents evolve their generative model to account for new observations. Evolution has solved this problem by selecting phenotypes with a good model of their sensory data; therefore, understanding the processes that have selected generative models that are fit for purpose for our environment might lead to important advances in structure learning and data analysis.

Discovering new generative models corresponding to complex behavioural data will demand extending the current process theory to these models, in order to provide testable predictions and reproduce the observed behaviour in-silico. Examples of generative models that are used in learning and decision-making, yet are not accommodated by the current process theory, include Markov decision trees (Jordan et al., 1998, Jordan et al., 1997) and Boltzmann machines (Ackley et al., 1985, Salakhutdinov and Hinton, 2012, Stone, 2019).

One challenge that may arise, when scaling active inference to complex models with many degrees of freedom, will be the size of the policy trees in consideration. Although effective and biologically plausible, the current pruning strategy is unlikely to reduce the search space sufficiently to enable tractable inference in such cases. As noted above, the issue of scaling active inference may yield to the first principles of the variational free energy formulation. Specifically, generative models with high evidence are minimally complex. This suggests that ‘scaling up’, in and of itself, is not the right strategy for reproducing more sophisticated or deep behaviour. A more principled approach would be to explore the right kind of factorisations necessary to explain structured behaviour. Key candidates here are deep temporal or diachronic generative models that have a separation of timescales. This form of factorisation (c.f., mean field approximation) replaces deep decision trees with shallow decision trees that are hierarchically composed.

To summarise, we argue that some important challenges for theoretical neuroscience include finding process theories of brain function that comply with active inference as a principle (Friston, 2019, Parr et al., 2020); namely, the avoidance of surprising events. The outstanding challenge is then to explore and fine-grain such process theories, via Bayesian model comparison (e.g., using dynamic causal modelling (Friston, 2012, Friston et al., 2003)), in relation to experimental data. From a structure learning and data analysis perspective, the main challenge is finding the generative model with the greatest evidence in relation to available data. This may be achieved by understanding the processes by which evolution has selected creatures with a good model of their environment. Finally, to scale active inference to behaviour with many degrees of freedom, one needs to understand how biological agents effectively search deep policy trees when planning into the future, when many possible policies may be entertained at separable timescales.

11. Conclusion

In conclusion, this paper aimed to summarise: the assumptions underlying active inference, the technical details underwriting its process theory, and how the associated neuronal dynamics relate to known biological processes. These processes underwrite action, perception, planning, decision-making, learning and structure learning, which we have illustrated under discrete state-space generative models. We have discussed some important outstanding challenges: from a broad perspective, the challenge for theoretical neuroscience is to develop increasingly fine-grained mechanistic models of brain function that comply with the core tenets of active inference (Friston, 2019, Parr et al., 2020). With regard to the process theory, key challenges relate to experimental validation, understanding how biological organisms evolve their generative model to account for new sensory observations, and how they effectively search large policy spaces when planning into the future.

Software availability

The belief updating scheme described in this article is generic and can be implemented using standard routines (e.g., spm_MDP_VB_X.m). These routines are available as Matlab code in the SPM academic software: http://www.fil.ion.ucl.ac.uk/spm/. Examples of simulations using discrete state-space generative models can be found via a graphical user interface by typing DEM.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

Funding

LD is supported by the Fonds National de la Recherche, Luxembourg (Project code: 13568875). TP is supported by the Rosetrees Trust (Award number: 173346). NS is funded by the Medical Research Council (Ref: 2088828). SV was funded by the Leverhulme Doctoral Training Programme for the Ecological Study of the Brain (DS-2017-026). KF is funded by a Wellcome Trust Principal Research Fellowship (Ref: 088130/Z/09/Z).

Footnotes

1

By beliefs we mean Bayesian beliefs, i.e., probability distributions over a variable of interest (e.g., current position). Beliefs are therefore used in the sense of Bayesian belief updating or belief propagation—as opposed to propositional or folk psychology beliefs.

2

In information theory, the surprise (a.k.a., surprisal) associated with an outcome under a generative model is given by −log p(o). This specifies the extent to which an observation is unusual and surprises the agent—but this does not mean that the agent consciously experiences surprise. In information theory this kind of surprise is known as self-information.

3

In Bayesian statistics, the model evidence (often referred to as marginal likelihood) associated with a generative model is p(o)—the probability of observed outcomes according to the model (sometimes this is written as p(o|m), explicitly conditioning upon a model). The model evidence scores the goodness of the model as an explanation of data that are sampled, by rewarding accuracy and penalising complexity, which avoids overfitting.

4

A more complete treatment may include priors over policies – usually denoted by E – and the evidence for a policy afforded by observed outcomes (usually denoted by F). These additional terms supplement the expected free energy, leading to an approximate posterior of the form σ(log E − F − G) (Friston et al., 2018b).

Appendix A. More complex generative models

In this Appendix, we briefly present cases of more complex discrete state-space generative models and explain how the belief updating can be extended to those cases.

A.1. Learning B and D

In this paper, we have only considered the case where A is learned, while beliefs about B (i.e., transition probabilities from one state to the next) and D (i.e., beliefs about the initial state) remained fixed. In general, B and D can also be learnt over time. This calls upon a new (extended) expression for the generative model with priors over B and D:

P(o_{1:T}, s_{1:T}, A, B, D, π) = P(π) P(A) P(B) P(D) P(s_1|D) × ∏_{τ=2}^{T} P(s_τ|s_{τ−1}, B, π) ∏_{τ=1}^{T} P(o_τ|s_τ, A), P(B) = ∏_{u∈U} ∏_{i=1}^{m} P((B_u)_i), P((B_u)_i) = Dir((b_u)_i), P(D) = Dir(d) (A.1)

Here, (B_u)_i and (b_u)_i denote the ith columns of the matrix B_u – encoding the transition probabilities from one state to the next under action u – and of its corresponding Dirichlet parameter b_u, respectively. Furthermore, one needs to define the corresponding approximate posteriors that will be used for learning:

Q(B) = ∏_{u∈U} ∏_{i=1}^{m} Q((B_u)_i), Q((B_u)_i) = Dir((𝐛_u)_i), Q(D) = Dir(𝐝) (A.2)

The variational free energy, after having observed o_{1:t}, is computed analogously to Eq. (5). The process of finding the belief dynamics is then akin to Section 8; we rehearse it in the following. Selecting only those terms in the variational free energy that depend on B and D yields:

F[Q(B, D)] = D_KL[Q(B) ‖ P(B)] + D_KL[Q(D) ‖ P(D)] − E_{Q(π)Q(s_1|π)Q(D)}[s_1 · log D] − Σ_{τ=2}^{t} E_{Q(π)Q(s_τ, s_{τ−1}|π)Q(B)}[s_τ · log(B_{π_{τ−1}}) s_{τ−1}] + ⋯ = D_KL[Q(B) ‖ P(B)] + D_KL[Q(D) ‖ P(D)] − 𝐬_1 · log 𝐃 − Σ_{τ=2}^{t} E_{Q(π)}[𝐬_πτ · log(𝐁_{π_{τ−1}}) 𝐬_{π,τ−1}] + ⋯ (A.3)

Using the form of the KL divergence for Dirichlet distributions (18) and taking the gradients yields

∇_{log 𝐁_u} F(𝐛_u) = 𝐛_u − b_u − Σ_{τ=2}^{t} Σ_{π∈Π} δ_{u,π_{τ−1}} Q(π) (𝐬_πτ ⊗ 𝐬_{π,τ−1}) (A.4)
∇_{log 𝐃} F(𝐝) = 𝐝 − d − 𝐬_1 (A.5)

where ⊗ denotes the Kronecker product. Finally, it is possible to specify neuronal plasticity dynamics following a descent on (A.4), (A.5), which correspond to biological dynamics. Alternatively, we have belief update rules implemented once after each trial of observation epochs in in-silico agents:

𝐛_u = b_u + Σ_{τ=2}^{t} Σ_{π∈Π} δ_{u,π_{τ−1}} Q(π) (𝐬_πτ ⊗ 𝐬_{π,τ−1}) (A.6)
𝐝 = d + 𝐬_1 (A.7)
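As with the likelihood counts in Section 8, these updates amount to accumulating co-occurrence counts. The following sketch (an illustrative Python fragment, not the SPM code, assuming that the transition from τ−1 to τ is attributed to the action the policy prescribes at τ−1) implements (A.6) and (A.7) for a single trial.

```python
import numpy as np

def update_transition_counts(b_prior, policies, q_pi, s_pi):
    """Eq. (A.6): accumulate Dirichlet counts for each action's transition matrix B_u.

    b_prior  : dict u -> (n_states, n_states) prior counts for B_u
    policies : (n_policies, T) array, policies[pi, tau] = action prescribed at time tau
    q_pi     : (n_policies,) posterior over policies, Q(pi)
    s_pi     : (n_policies, T, n_states) posterior state estimates s_{pi, tau}
    """
    b_post = {u: b.copy() for u, b in b_prior.items()}
    n_policies, T = policies.shape
    for tau in range(1, T):
        for pi in range(n_policies):
            u = int(policies[pi, tau - 1])     # action driving the tau-1 -> tau transition
            b_post[u] += q_pi[pi] * np.outer(s_pi[pi, tau], s_pi[pi, tau - 1])
    return b_post

def update_initial_state_counts(d_prior, s_1):
    """Eq. (A.7): accumulate Dirichlet counts for the initial-state prior D."""
    return d_prior + s_1
```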

A.2. Complexifying the prior over policies

In this paper, we have considered a simple form for the approximate posterior over policies; namely, σ(−G(π)). This can be extended to σ(−γG(π)), where γ is an (inverse) temperature parameter that encodes the confidence in selecting a particular policy. This extension is quite natural, in the sense that γ can be interpreted as the postsynaptic response to dopaminergic input (FitzGerald, Dolan et al., 2015, Friston et al., 2014). This correspondence is supported by empirical evidence (Schwartenbeck, FitzGerald, Mathys, Dolan and Friston, 2015) and enables one to simulate biologically plausible dopaminergic discharges (c.f., Appendix E (Friston, FitzGerald et al., 2017)). Anatomically, this parameter may be encoded within the substantia nigra, in nigrostriatal dopamine projection neurons (Schwartenbeck, FitzGerald, Mathys, Dolan and Friston, 2015), which fits well with our proposed functional anatomy (c.f., Fig. 5), since the substantia nigra is connected with the striatum. We refer the reader to (Friston, FitzGerald et al., 2017) for a discussion of the associated belief updating scheme.

A.3. Multiple state and outcome modalities

In general, one needs not just one hidden state factor and one outcome modality to represent the environment, but many. Intuitively, this happens in the human brain as we integrate sensory stimuli from our five (or more) distinct senses. Mathematically, we can express this via different streams of hidden states (usually referred to as hidden factors) that evolve independently of one another but interact to generate outcomes at each time step; e.g., see Jordan et al. (1998, Figure 9) for a graphical representation of a multi-factorial hidden Markov model. This means that A becomes a multi-dimensional tensor that integrates information about the different hidden factors to cause outcomes. The belief updating is analogous in this case, contingent upon a mean-field factorisation of the approximate posterior over the different hidden state factors (see, e.g., Friston and Buzsáki, 2016, Mirza et al., 2016). This means that the beliefs about states may be processed in a manner analogous to Fig. 5, invoking a greater number of neural populations.

A.4. Deep temporal models

A deep temporal model is a generative model with many layers that are nested hierarchically and act at different timescales. These were first introduced within active inference in Friston, Lin et al. (2017). One can picture them graphically as a POMDP (c.f., Fig. 2) at the higher level where each outcome is replaced by a POMDP at the lower level, and so forth.

There is a useful metaphor for understanding the concept underlying deep temporal models: each layer of the model corresponds to the hand of a clock. In a two-layer hierarchical model, a ticking (resp. rotation) of the faster hand corresponds to a time step (resp. trial of observation epochs) at the lower level. At the end of each trial at the lower level, the slower hand ticks once, which corresponds to a time-step at the higher level, and the process unfolds again. One can concisely summarise this by saying that a state at the higher level corresponds to a trial of observation epochs at the lower level. Of course, there is no limit to the number of layers one can stack in a hierarchical model.

To obtain the associated belief updating, one computes the free energy at the lower level by conditioning the probability distributions from Bayes rule on the variables from the higher levels. This means that one performs belief updating at the lower levels independently of the higher levels. Then, one computes the variational free energy at the higher levels by treating the lower levels as outcomes. For more details on the specificities of the scheme see Friston, Lin et al. (2017).

Appendix B. Computation of dynamics underlying perception

Here, we detail the computation of perceptual dynamics in the main text, specifically how to obtain (7), (8) from (6). In what follows, we denote the state-space by S = {σ_1, …, σ_m}.

B.1. Free energy conditioned upon a policy

Recall the variational free energy conditioned upon a policy (6):

F_π[Q(s_{1:T}|π)] = Σ_{τ=1}^{T} E_{Q(s_τ|π)}[log Q(s_τ|π)] − Σ_{τ=1}^{t} E_{Q(s_τ|π)Q(A)}[log P(o_τ|s_τ, A)] − E_{Q(s_1|π)}[log P(s_1)] − Σ_{τ=2}^{T} E_{Q(s_τ|π)Q(s_{τ−1}|π)}[log P(s_τ|s_{τ−1}, π)] (B.1)

By virtue of the mean field approximation (4), the approximate posterior over states Q(s_{1:T}|π) factorises as Q(s_{1:T}|π) = ∏_{τ=1}^{T} Q(s_τ|π), each of the factors being a categorical distribution over the state-space S with parameter 𝐬_πτ, a vector in {x ∈ (ℝ_{>0})^m : Σ_i x_i = 1}. Then, the free energy conditioned upon π is equivalently a function of each of those parameters

F_π[Q(s_{1:T}|π)] = F_π(𝐬_{π1}, …, 𝐬_{πT}).

Now we compute each of the expectations in (B.1).

  • E_{Q(s_τ|π)}[log Q(s_τ|π)]. We use the definition of conditional expectation:
    E_{Q(s_τ|π)}[log Q(s_τ|π)] = Σ_{i=1}^{m} Q(s_τ = σ_i|π) log Q(s_τ = σ_i|π)
    Since 𝐬_πτ is the parameter of Q(s_τ|π), we have by definition Q(s_τ = σ_i|π) = 𝐬_{πτ,i}, the ith component of 𝐬_πτ. Hence,
    Σ_{i=1}^{m} Q(s_τ = σ_i|π) log Q(s_τ = σ_i|π) = Σ_{i=1}^{m} 𝐬_{πτ,i} log 𝐬_{πτ,i} = 𝐬_πτ · log 𝐬_πτ
    where the last equality follows by definition of the dot product of vectors.
  • E_{Q(s_τ|π)Q(A)}[log P(o_τ|s_τ, A)]. By definition of the likelihood matrix, log P(o_τ|s_τ, A) = log(o_τ · A s_τ), where A is a matrix and o_τ (resp. s_τ) is seen as a vector of zeros with a one at the outcome (resp. state) that occurs (see Table 2). Since o_τ, s_τ contain only ones and zeros, log(o_τ · A s_τ) = o_τ · log(A) s_τ. Using this and the linearity of the expectation we obtain
    E_{Q(s_τ|π)Q(A)}[log P(o_τ|s_τ, A)] = E_{Q(s_τ|π)}E_{Q(A)}[log(o_τ · A s_τ)] = o_τ · E_{Q(s_τ|π)}E_{Q(A)}[log A] s_τ = o_τ · E_{Q(A)}[log A] E_{Q(s_τ|π)}[s_τ]
    By definition (see Table 2), we write E_{Q(A)}[log A] = log 𝐀. Also, denoting by e_i the ith unit vector in ℝ^m,
    E_{Q(s_τ|π)}[s_τ] = Σ_{i=1}^{m} Q(s_τ = σ_i|π) e_i = Σ_{i=1}^{m} 𝐬_{πτ,i} e_i = 𝐬_πτ
    Finally,
    E_{Q(s_τ|π)Q(A)}[log P(o_τ|s_τ, A)] = o_τ · log 𝐀 𝐬_πτ.
  • E_{Q(s_1|π)}[log P(s_1)]. By definition of the expectation, and of the vector D (see Table 2),
    E_{Q(s_1|π)}[log P(s_1)] = Σ_{i=1}^{m} Q(s_1 = σ_i|π) log D_i = Σ_{i=1}^{m} 𝐬_{π1,i} log D_i = 𝐬_{π1} · log D. (B.2)
  • E_{Q(s_τ|π)Q(s_{τ−1}|π)}[log P(s_τ|s_{τ−1}, π)]. By definition, log P(s_τ|s_{τ−1}, π) = log(s_τ · B_{π_{τ−1}} s_{τ−1}) (see Table 2). A calculation analogous to the one for E_{Q(s_τ|π)Q(A)}[log P(o_τ|s_τ, A)] then yields
    E_{Q(s_τ|π)Q(s_{τ−1}|π)}[log P(s_τ|s_{τ−1}, π)] = 𝐬_πτ · log(𝐁_{π_{τ−1}}) 𝐬_{π,τ−1}. (B.3)

Inserting these results into (B.1) gives us (7):

F_π(𝐬_{π1}, …, 𝐬_{πT}) = Σ_{τ=1}^{T} 𝐬_πτ · log 𝐬_πτ − Σ_{τ=1}^{t} o_τ · log 𝐀 𝐬_πτ − 𝐬_{π1} · log D − Σ_{τ=2}^{T} 𝐬_πτ · log(𝐁_{π_{τ−1}}) 𝐬_{π,τ−1} (B.4)

B.2. Free energy gradients

Now we may compute the gradients of F_π with respect to any of its arguments 𝐬_{π1}, …, 𝐬_{πT}. We do this for 𝐬_πt, t > 1; the others are analogous. When taking the gradient ∇_{𝐬_πt} F_π, the only terms that do not vanish in (B.4) are those that depend on 𝐬_πt. Using the rules of differentiation of matrices and vectors (Petersen & Pedersen, 2012) we obtain (8).

∇_{𝐬_πt} F_π(𝐬_{π1}, …, 𝐬_{πT}) = ∇_{𝐬_πt}[𝐬_πt · log 𝐬_πt − o_t · log 𝐀 𝐬_πt − 𝐬_{π,t+1} · log(𝐁_{π_t}) 𝐬_πt − 𝐬_πt · log(𝐁_{π_{t−1}}) 𝐬_{π,t−1}] = 1 + log 𝐬_πt − o_t · log 𝐀 − 𝐬_{π,t+1} · log(𝐁_{π_t}) − log(𝐁_{π_{t−1}}) 𝐬_{π,t−1} (B.5)
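To see the gradient descent of Eq. (8) at work numerically, here is a minimal sketch (illustrative Python with hypothetical A, D and a single observation; it retains only the single-time-step terms of (B.4), i.e., no transition terms): the softmax-parameterised gradient flow converges to the exact posterior for this simplified model.

```python
import numpy as np

def softmax(v):
    v = v - v.max()
    return np.exp(v) / np.exp(v).sum()

# Hypothetical one-step model: 3 hidden states, 2 outcomes, A and D assumed known.
A = np.array([[0.8, 0.3, 0.1],
              [0.2, 0.7, 0.9]])       # A[o, s] = P(o | s)
D = np.array([0.5, 0.3, 0.2])         # prior over the initial state
o = np.array([0, 1])                  # observed outcome (one-hot)

def grad_F(s):
    # Gradient (B.5) restricted to a single time step: entropy, accuracy and prior terms only.
    return 1 + np.log(s) - np.log(A).T @ o - np.log(D)

v = np.zeros(3)                        # unconstrained ('depolarisation') variable of Eq. (8)
for _ in range(200):
    s = softmax(v)
    v = v - 0.2 * grad_F(s)            # v_dot = -grad F, simple Euler step

exact = D * A[1, :] / (D * A[1, :]).sum()   # exact posterior for this one-step model
print(softmax(v), exact)               # the gradient flow converges to the exact posterior
```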

Appendix C. Expected free energy as reaching steady-state

At the heart of active inference is a description of a certain class of systems that self-organise at non-equilibrium steady-state (Friston, 2019, Parr et al., 2020). This implies the existence of a steady-state probability distribution P(sτ,A) that the agent is guaranteed to reach given a sufficient amount of time. Intuitively, this distribution corresponds to the agent’s preferences over states and model parameters. Practically, this means that the agent selects policies, such that its predicted states Q(sτ,A) at some future time point τ>t – usually, the time horizon of a policy T – match its preferences P(sτ,A), which are specified by the generative model.

The purpose of this Appendix is to motivate the definition of expected free energy from the perspective of reaching steady-state. Specifically, we will show how a family of distributions Q(π), which comprises the softmax of negative expected free energy, guarantees reaching steady-state.

Objective: we seek distributions over policies that imply steady-state solutions; i.e., when the final distribution does not depend upon initial observations. Such solutions ensure that, on average, stochastic policies lead to a steady-state or target distribution specified by the generative model. These solutions exist in virtue of conditional independencies, where the hidden states provide a Markov blanket that separates policies from outcomes. In other words, policies cause final states that cause outcomes.

In what follows, τ > t is a future time and Q ≔ Q(o_τ, s_τ, A, π) ≈ P(o_τ, s_τ, A, π|o_{1:t}) is the corresponding approximate posterior distribution, given observations o_{1:t}.

Lemma 1 (Steady-state) —

Suppose that D_KL[Q(s_τ, A) ‖ P(s_τ, A)] is finite. Then, the system reaches steady-state

Q(sτ,A)=P(sτ,A),

if and only if the surprisal over policies −log Q(π) and the Gibbs energy G(π;β) are equal when averaged under Q

E_Q[−log Q(π)] = E_Q[G(π;β)], (C.1)

where,

G(π;β) = D_KL[Q(s_τ, A|π) ‖ P(s_τ, A)] − E_{Q(o_τ, s_τ, A|π)}[β log P(o_τ|s_τ, A)] (C.2)
β ≔ E_Q[log Q(π|s_τ, A)] / E_Q[log P(o_τ|s_τ, A)] ≥ 0.

Here, β ≥ 0 characterises the steady-state with the relative precision (i.e., negative entropy) of policies and final outcomes, given final states. The generative model stipulates steady-state, in the sense that the distribution over final states (and outcomes) does not depend upon initial observations. Here, the generative and predictive distributions simply express the conditional independence between policies and final outcomes, given final states. Note that when β = 1, the Gibbs energy becomes the expected free energy.

An important consequence of Lemma 1 is that when (C.1) holds, we either have D_KL[Q(s_τ, A) ‖ P(s_τ, A)] = +∞ or D_KL[Q(s_τ, A) ‖ P(s_τ, A)] = 0 (steady-state). Intuitively, D_KL[Q(s_τ, A) ‖ P(s_τ, A)] being infinite means that Q(s_τ, A) is singular with respect to P(s_τ, A). This is the case, for example, when the steady-state density sits on the other side of an impassable gulf, or when Q(s_τ, A) and P(s_τ, A) do not overlap. Conversely, the requirement that D_KL[Q(s_τ, A) ‖ P(s_τ, A)] is finite implies that Q(s_τ, A) is absolutely continuous with respect to P(s_τ, A), that is, P(s_τ, A) > 0 whenever Q(s_τ, A) > 0.

Proof of Lemma 1

Let us unpack the Gibbs energy expected under Q:

E_Q[G(π;β)] = E_Q[D_KL[Q(s_τ, A|π) ‖ P(s_τ, A)] − E_{Q(o_τ, s_τ, A|π)}[β log P(o_τ|s_τ, A)]] = E_Q[D_KL[Q(s_τ, A|π) ‖ P(s_τ, A)]] − (E_Q[log Q(π|s_τ, A)] / E_Q[log P(o_τ|s_τ, A)]) E_Q[E_{Q(o_τ, s_τ, A|π)}[log P(o_τ|s_τ, A)]] = E_Q[log Q(s_τ, A|π) − log P(s_τ, A) − log Q(π|s_τ, A)] = E_Q[−log Q(π) − log P(s_τ, A) + log Q(s_τ, A)] = E_Q[−log Q(π)] + D_KL[Q(s_τ, A) ‖ P(s_τ, A)].

Therefore,

E_Q[G(π;β)] = E_Q[−log Q(π)] ⟺ D_KL[Q(s_τ, A) ‖ P(s_τ, A)] = 0 ⟺ Q(s_τ, A) = P(s_τ, A). □

A straightforward consequence of Lemma 1 is the following:

Corollary 2

Under the assumption that DKL[Q(sτ,A)P(sτ,A)] is finite, each distribution

Q(π) = σ(−G(π;β)), β ≥ 0, (C.3)

describes a certain kind of system that reaches some steady-state distribution. In particular, the case β=1 corresponds to the approximate posterior over policies that is used in the main text (10).

We defer the proof of Corollary 2 to the end of this appendix.

The family of distributions Q(π;β) has interesting interpretations. For example, the case β=0 corresponds to standard stochastic control, variously known as KL control or risk-sensitive control (van den Broek et al., 2010):

G(π;0) = D_KL[Q(s_τ, A|π) ‖ P(s_τ, A)] ≥ D_KL[Q(s_τ|π) ‖ P(s_τ)]

In other words, the most likely policies minimise the KL divergence between the predictive and target distribution. More generally, when β > 0, policies are more likely when they simultaneously minimise the entropy of outcomes, given states. In other words, β > 0 ensures that the system exhibits itinerant behaviour. One can see that KL control may arise in this case if the entropy of the likelihood mapping remains constant with respect to policies. In active inference, we currently assume β = 1 for simplicity; however, the implications of different values of β for behaviour are interesting and will be examined in future work.

One perspective – on the distinction between simple and general steady-states – is in terms of uncertainty about policies, where policies may be thought of as trajectories taken by the system. For example, in simple steady-states there is no uncertainty about which policy led to a final state. This, for example, corresponds to describing classical systems (that follow a unique path of least action), where it would be possible to infer which policy had been pursued, given the initial and final outcomes. Conversely, in general steady-state systems (e.g., mice, Homo sapiens), simply knowing that ‘you are here’ does not tell me ‘how you got here’, even if I knew where you were this morning. Put another way, there are lots of paths or policies open to systems that attain a general steady state.

In active inference, we are interested in a certain class of systems that self-organise to general steady-states; namely, those that move through a large number of probabilistic configurations from their initial state to their final steady-state. The treatments in Parr et al. (2020) and Friston (2019) effectively turn the steady-state lemma on its head by assuming steady-state is stipulatively true – and then characterise the ensuing self-organisation in terms of Bayes optimal policies: if a system attains a general steady-state, it will appear to behave in a Bayes optimal fashion – both in terms of optimal Bayesian design (i.e., exploration) and Bayesian decision theory (i.e., exploitation) (Friston, Da Costa et al., 2020). Crucially, the loss function defining Bayesian risk is the negative log evidence for the generative model entailed by an agent. In short, systems (i.e., agents) that attain general steady-states will look as if they are responding to epistemic affordances (Parr & Friston, 2017b).

Remark 3

It is straightforward to extend this appendix by considering systems that reach their preferences at a collection of time-steps into the future, say τ1,,τn>t. In this case, one can adapt the proof of Lemma 1 to obtain:

E_Q[Σ_{i=1}^{n} G(π, τ_i; β)] = E_Q[−n log Q(π)] + Σ_{i=1}^{n} D_KL[Q(s_{τ_i}, A) ‖ P(s_{τ_i}, A)]

where G(π,τi;β) is the Gibbs free energy, replacing τ by τi in (C.2). In this case, the canonical choice of approximate posterior over policies would be:

Q(π) = σ(−(1/n) Σ_{i=1}^{n} G(π, τ_i; β)) (C.4)

We conclude this appendix with the proof of Corollary 2.

Proof of Corollary 2

Let β ≥ 0 and Q~(o_τ, s_τ, A, π) be the unnormalised measure defined by Q~(π) ≔ exp(−G(π;β)) and Q~(o_τ, s_τ, A|π) ≔ Q(o_τ, s_τ, A|π). Trivially, −log Q~(π) = G(π;β), and we can check that the proof of Lemma 1 still holds with an unnormalised measure Q~. Therefore, systems that have Q~ as an approximate posterior reach a steady-state distribution.

To make sense of an unnormalised distribution over policies Q~ as a posterior we only need to verify that our current update rules are still valid, or can be extended to this setting. All update rules for active inference agents hold – see Table 3 – if we extend policy independent state-estimation (12) as

𝐬_τ = (1/Σ_{π∈Π} Q~(π)) Σ_{π∈Π} 𝐬_πτ Q~(π). (C.5)

Since agents that have Q~ as an approximate posterior reach steady-state, it suffices to show that replacing Q~ by Q has no effect on the agent’s dynamics. Since Q~ and Q only differ in their π-marginals, we must look at the consequence of replacing Q~(π) by Q(π). The normalisation of Q~(π) has no consequences for action selection (see (11)), and policy-independent state-estimation remains the same, as (C.5) shows. Replacing Q~(π) by Q(π) does not change any of the remaining dynamics. □

Appendix D. Computing expected free energy

In this appendix, we present the derivations underlying the analytical expression of the expected free energy that is used in spm_MDP_VB_X.m. Following Parr (2019), we can reexpress the expected free energy in the following form:

G(π) = E_{Q(s_τ|π)}[H[P(o_τ|s_τ)]] (Ambiguity) + D_KL[Q(s_τ|π) ‖ P(s_τ)] (Risk, states) − E_{P(o_τ|s_τ)Q(s_τ|π)}[D_KL[Q(A|o_τ, s_τ) ‖ Q(A)]] (Novelty) (D.1)

Here, Q(A|o_τ, s_τ) denotes the approximate posterior beliefs about A that would ensue if the state–outcome pair (o_τ, s_τ) were known to have occurred. In the following, we show that we can compute the expected free energy in the following way

G(π) ≈ 𝐇 · 𝐬_πτ (Ambiguity) + 𝐬_πτ · (log 𝐬_πτ − log 𝐂) (Risk, states) − 𝐀𝐬_πτ · 𝐖𝐬_πτ (Novelty), 𝐇 ≔ −diag[𝐀^⊤ log 𝐀], 𝐖 ≔ ½(𝐚^{⊖1} − 𝐚_0^{⊖1}) (D.2)

when the agent’s preferences C are expressed in terms of preferences over states. When preferences are expressed in terms of outcomes (as is currently implemented in spm_MDP_VB_X.m), the risk term instead becomes

(𝐀𝐬_πτ) · (log(𝐀𝐬_πτ) − log 𝐂) (Risk, outcomes) (D.3)
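Putting (D.2) and (D.3) together, the following sketch (an illustrative NumPy fragment with arbitrary numbers; it assumes the element-wise reading of W, with a0 holding the column sums of the Dirichlet counts, and uses the outcome-based risk of (D.3)) evaluates ambiguity, risk and novelty for one policy and one future time step. The derivations of each term follow in D.1–D.3.

```python
import numpy as np

def expected_free_energy_matrix_form(s_pi, a, C_outcomes):
    """Eq. (D.2)/(D.3): G(pi) ~= ambiguity + risk(outcomes) - novelty, for one future time step.

    s_pi       : predicted state distribution under the policy, s_{pi, tau}
    a          : (n_outcomes, n_states) Dirichlet counts over the likelihood
    C_outcomes : preferred distribution over outcomes, P(o_tau) = Cat(C)
    """
    A = a / a.sum(axis=0)                          # posterior expectation of the likelihood
    H = -np.diag(A.T @ np.log(A))                  # per-state entropy of P(o|s)
    ambiguity = H @ s_pi

    o_pred = A @ s_pi                              # predicted outcomes Q(o_tau | pi)
    risk = o_pred @ (np.log(o_pred) - np.log(C_outcomes))   # Eq. (D.3)

    a0 = a.sum(axis=0, keepdims=True)              # column sums, broadcast over rows
    W = 0.5 * (1.0 / a - 1.0 / a0)                 # cf. Eq. (D.16)/(D.17)
    novelty = (A @ s_pi) @ (W @ s_pi)

    return ambiguity + risk - novelty

# Hypothetical example: 2 outcomes, 2 states.
a = np.array([[8.0, 1.0],
              [2.0, 1.0]])                         # well-learned first column, vague second column
s_pi = np.array([0.3, 0.7])
C = np.array([0.9, 0.1])                           # strong preference for outcome 0
print(expected_free_energy_matrix_form(s_pi, a, C))
```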

D.1. Ambiguity

The ambiguity term of (D.1) is EQ(sτ|π)[H[P(oτ|sτ)]]. By definition, the entropy inside the expectation is:

H[P(o_τ|s_τ)] = −Σ_{o_τ∈O} P(o_τ|s_τ) log P(o_τ|s_τ) (D.4)

The first factor inside the sum corresponds to:

P(o_τ|s_τ) = ∫ P(o_τ, A|s_τ) dA = ∫ P(o_τ|s_τ, A) P(A) dA = ∫ o_τ · A s_τ P(A) dA ≈ ∫ o_τ · A s_τ Q(A) dA = o_τ · E_{Q(A)}[A] s_τ = o_τ · 𝐀 s_τ (D.5)

Here we have replaced the prior over model parameters P(A) by the approximate posterior Q(A). This is not necessary but, since learning occurs only once at the end of the trial in numerical simulations, the two can be interchanged—furthermore, this allows us to reuse previously introduced notation. In any case, this tells us that the entropy can be re-expressed as:

H[P(o_τ|s_τ)] = −Σ_{o_τ∈O} (o_τ · 𝐀 s_τ)(o_τ · log(𝐀) s_τ) = −Σ_{i=1}^{n} (𝐀_i · s_τ)(log(𝐀_i) · s_τ) = −Σ_{i=1}^{n} (𝐀_i ∘ log 𝐀_i) · s_τ = −diag[𝐀^⊤ log 𝐀] · s_τ (D.6)

Finally,

E_{Q(s_τ|π)}[H[P(o_τ|s_τ)]] (Ambiguity) = 𝐇 · 𝐬_πτ, 𝐇 ≔ −diag[𝐀^⊤ log 𝐀] (D.7)

D.2. Risk

The risk term of (D.1) is the KL divergence between predicted states following a particular policy and preferred states. This can be expressed as:

D_KL[Q(s_τ|π) ‖ P(s_τ)] (Risk, states) = 𝐬_πτ · (log 𝐬_πτ − log 𝐂) (D.8)

where the vector 𝐂 ∈ ℝ^m encodes preferences over states, P(s_τ) = Cat(𝐂). However, it is also possible to approximate this risk term over states by a risk term over outcomes (c.f., (15)), as is currently implemented in spm_MDP_VB_X.m. In this case, if 𝐂 ∈ ℝ^n denotes the preferences over outcomes, P(o_τ) = Cat(𝐂):

D_KL[Q(o_τ|π) ‖ P(o_τ)] (Risk, outcomes) = (𝐀𝐬_πτ) · (log(𝐀𝐬_πτ) − log 𝐂) (D.9)

D.3. Novelty

The novelty term of (D.1) is EP(oτ|sτ)Q(sτ|π)[DKL[Q(A|oτ,sτ)Q(A)]] where

Q(A) = ∏_{i=1}^{m} Q(A_i), Q(A_i) = Dir(𝐚_i) (D.10)
Q(A|o_τ, s_τ) = ∏_{i=1}^{m} Q(A_i|o_τ, s_τ), Q(A_i|o_τ, s_τ) = Dir(𝐚′_i) (D.11)

The KL divergence between both distributions (c.f., (18)) can be expressed as:

D_KL[Q(A|o_τ, s_τ) ‖ Q(A)] = Σ_{i=1}^{m} [log Γ(𝐚′_{0i}) − Σ_{k=1}^{n} log Γ(𝐚′_{ki}) − log Γ(𝐚_{0i}) + Σ_{k=1}^{n} log Γ(𝐚_{ki})] + (𝐚′ − 𝐚) · (ψ(𝐚′) − ψ(𝐚′_0)) (D.12)

where ψ is the digamma function. We now want to make sense of 𝐚′. Suppose that, at time τ, the agent knew the outcome j and the state k, as in Q(A|o_τ, s_τ) (c.f., Table 2 for terminology). This means that, in this case, beliefs about hidden states correspond to the true state; in other words, 𝐬_τ = s_τ. We can then use the rule of accumulation of Dirichlet parameters to deduce 𝐚′ = 𝐚 + o_τ ⊗ s_τ. In other words, 𝐚′_{jk} = 𝐚_{jk} + 1 and the remaining components are identical. Using the well-known identity:

Γ(x+1) = xΓ(x) ⟹ log Γ(x+1) = log x + log Γ(x) (D.13)

we can compute (D.12):

D_KL[Q(A|o_τ, s_τ) ‖ Q(A)] = log Γ(𝐚_{0k}+1) − log Γ(𝐚_{0k}) − log Γ(𝐚_{jk}+1) + log Γ(𝐚_{jk}) + ψ(𝐚_{jk}+1) − ψ(𝐚_{0k}+1) = log 𝐚_{0k} − log 𝐚_{jk} + ψ(𝐚_{jk}+1) − ψ(𝐚_{0k}+1) (D.14)

Using the definition of the digamma function ψ(x) = (d/dx) log Γ(x), we obtain:

D_KL[Q(A|o_τ, s_τ) ‖ Q(A)] = log 𝐚_{0k} − log 𝐚_{jk} + (d/d𝐚_{jk}) log Γ(𝐚_{jk}+1) − (d/d𝐚_{0k}) log Γ(𝐚_{0k}+1) = log 𝐚_{0k} − log 𝐚_{jk} + (d/d𝐚_{jk}) log Γ(𝐚_{jk}+1) − (d/d𝐚_{0k})[log 𝐚_{0k} + log Γ(𝐚_{0k})] = log 𝐚_{0k} − log 𝐚_{jk} + 1/𝐚_{jk} − 1/𝐚_{0k} + ψ(𝐚_{jk}) − ψ(𝐚_{0k}) (D.15)

We can use an asymptotic expansion of the digamma function to simplify the expression:

ψ(x) ≈ log x − 1/(2x) + ⋯ ⟹ D_KL[Q(A|o_τ, s_τ) ‖ Q(A)] ≈ 1/(2𝐚_{jk}) − 1/(2𝐚_{0k}) (D.16)

Finally, the analytical expression of the novelty term:

E_{P(o_τ|s_τ)Q(s_τ|π)}[D_KL[Q(A|o_τ, s_τ) ‖ Q(A)]] ≈ 𝐀𝐬_πτ · 𝐖𝐬_πτ, 𝐖 ≔ ½(𝐚^{⊖1} − 𝐚_0^{⊖1}) (D.17)
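As a quick numerical sanity check of the asymptotic approximation in (D.16), the snippet below (illustrative Python using SciPy's digamma; the counts are arbitrary) compares the exact expression (D.14) with the approximation for a few state–outcome pairs.

```python
import numpy as np
from scipy.special import digamma

def exact_novelty_kl(a, j, k):
    """Exact KL of Eq. (D.14): KL[Dir(a + e_j e_k^T) || Dir(a)], restricted to column k."""
    a0k = a[:, k].sum()
    return np.log(a0k) - np.log(a[j, k]) + digamma(a[j, k] + 1) - digamma(a0k + 1)

def approx_novelty_kl(a, j, k):
    """Asymptotic approximation of Eq. (D.16)."""
    a0k = a[:, k].sum()
    return 0.5 / a[j, k] - 0.5 / a0k

a = np.array([[8.0, 1.0],
              [2.0, 1.0]])
for (j, k) in [(0, 0), (1, 0), (0, 1)]:
    print(exact_novelty_kl(a, j, k), approx_novelty_kl(a, j, k))
# The two agree well once the counts are moderately large, and the novelty term
# shrinks as a_jk grows, i.e., well-learned mappings afford little information gain.
```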

References

  1. Ackley D.H., Hinton G.E., Sejnowski T.J. A learning algorithm for Boltzmann machines. Cognitive Science. 1985;9(1):147–169. doi: 10.1016/S0364-0213(85)80012-4. [DOI] [Google Scholar]
  2. Adams R.A., Stephan K.E., Brown H.R., Frith C.D., Friston K.J. The computational anatomy of psychosis. Frontiers in Psychiatry. 2013;4 doi: 10.3389/fpsyt.2013.00047. [DOI] [PMC free article] [PubMed] [Google Scholar]
  3. Aitchison L., Lengyel M. With or without you: Predictive coding and Bayesian inference in the brain. Current Opinion in Neurobiology. 2017;46:219–227. doi: 10.1016/j.conb.2017.08.010. [DOI] [PMC free article] [PubMed] [Google Scholar]
  4. Allenby G.M., Rossi P.E., McCulloch R.E. Hierarchical Bayes models: A practitioners guide. Journal of Bayesian Applications in Marketing. 2005 [Google Scholar]
  5. Ashby W.R. Principles of the self-organizing dynamic system. The Journal of General Psychology. 1947;37(2):125–128. doi: 10.1080/00221309.1947.9918144. [DOI] [PubMed] [Google Scholar]
  6. Aström K.J. Optimal control of Markov processes with incomplete state information. Journal of Mathematical Analysis and Applications. 1965;10 [Google Scholar]
  7. Attias H. 9th int. workshop on artificial intelligence and statistics. 2003. Planning by probabilistic inference; p. 8. [Google Scholar]
  8. Barlow H.B. The MIT Press; 1961. Possible principles underlying the transformations of sensory messages. [Google Scholar]
  9. Barlow H. Inductive inference, coding, perception, and language. Perception. 1974;3(2):123–134. doi: 10.1068/p030123. [DOI] [PubMed] [Google Scholar]
  10. Barlow H. Redundancy reduction revisited. Computation and Neural Systems. 2001:13. [PubMed] [Google Scholar]
  11. Barto A., Mirolli M., Baldassarre G. Novelty or surprise? Frontiers in Psychology. 2013;4 doi: 10.3389/fpsyg.2013.00907. [DOI] [PMC free article] [PubMed] [Google Scholar]
  12. Barto A., Sutton R. 1992. Reinforcement learning: An introduction. [Google Scholar]
  13. Bastos A.M., Usrey W.M., Adams R.A., Mangun G.R., Fries P., Friston K.J. Canonical microcircuits for predictive coding. Neuron. 2012;76(4):695–711. doi: 10.1016/j.neuron.2012.10.038. [DOI] [PMC free article] [PubMed] [Google Scholar]
  14. Beal M.J. 2003. Variational algorithms for approximate Bayesian inference; p. 281. [Google Scholar]
  15. Benrimoh D., Parr T., Vincent P., Adams R.A., Friston K. Active inference and auditory hallucinations. Computational Psychiatry. 2018;2:183–204. doi: 10.1162/cpsy&#x002d9;a&#x002d9;00022. [DOI] [PMC free article] [PubMed] [Google Scholar]
  16. Berger J.O. Statistical decision theory and Bayesian analysis. 2nd ed. Springer-Verlag; New York: 1985. (Springer series in statistics). [DOI] [Google Scholar]
  17. Bernard C. Thomas; 1974. Lectures on the phenomena of life common to animals and plants. [Google Scholar]
  18. Berns G.S., Sejnowski T.J. How the basal ganglia make decisions. In: Damasio A.R., Damasio H., Christen Y., editors. Neurobiology of decision-making. Springer Berlin Heidelberg; Berlin, Heidelberg: 1996. pp. 101–113. (Research and perspectives in neurosciences). [DOI] [Google Scholar]
  19. Bishop C.M. Pattern recognition and machine learning. Springer; New York: 2006. (Information science and statistics). [Google Scholar]
  20. Blei D.M., Kucukelbir A., McAuliffe J.D. Variational inference: A review for statisticians. Journal of the American Statistical Association. 2017;112(518):859–877. doi: 10.1080/01621459.2017.1285773. arXiv:1601.00670. [DOI] [Google Scholar]
  21. Bogacz R. A tutorial on the free-energy framework for modelling perception and learning. Journal of Mathematical Psychology. 2017;76:198–211. doi: 10.1016/j.jmp.2015.11.003. [DOI] [PMC free article] [PubMed] [Google Scholar]
  22. Bossaerts P., Murawski C. From behavioural economics to neuroeconomics to decision neuroscience: The ascent of biology in research on human decision making. Current Opinion in Behavioral Sciences. 2015;5:37–42. doi: 10.1016/j.cobeha.2015.07.001. [DOI] [Google Scholar]
  23. Botvinick M., Toussaint M. Planning as inference. Trends in Cognitive Sciences. 2012;16(10):485–488. doi: 10.1016/j.tics.2012.08.006. [DOI] [PubMed] [Google Scholar]
  24. Box G.E.P., Tiao G.C. Multiparameter problems from a Bayesian point of view. The Annals of Mathematical Statistics. 1965 [Google Scholar]
  25. Bruineberg J., Rietveld E., Parr T., van Maanen L., Friston K.J. Free-energy minimization in joint agent-environment systems: A niche construction perspective. Journal of Theoretical Biology. 2018;455:161–178. doi: 10.1016/j.jtbi.2018.07.002. [DOI] [PMC free article] [PubMed] [Google Scholar]
  26. Buckley C.L., Kim C.S., McGregor S., Seth A.K. The free energy principle for action and perception: A mathematical review. Journal of Mathematical Psychology. 2017;81:55–79. doi: 10.1016/j.jmp.2017.09.004. [DOI] [Google Scholar]
  27. Buschman T.J., Miller E.K. Shifting the spotlight of attention: Evidence for discrete computations in cognition. Frontiers in Human Neuroscience. 2010;4 doi: 10.3389/fnhum.2010.00194. [DOI] [PMC free article] [PubMed] [Google Scholar]
  28. Carpenter G.A., Grossberg S. A massively parallel architecture for a self-organizing neural pattern recognition machine. Computer Vision, Graphics, and Image Processing. 1987;37(1):54–115. doi: 10.1016/S0734-189X(87)80014-2. [DOI] [Google Scholar]
29. Çatal O., Nauta J., Verbelen T., Simoens P., Dhoedt B. 2019. Bayesian policy selection using active inference. arXiv:1904.08149 [cs].
30. Chao Z.C., Takaura K., Wang L., Fujii N., Dehaene S. Large-scale cortical networks for hierarchical prediction and prediction error in the primate brain. Neuron. 2018;100(5):1252–1266.e3. doi: 10.1016/j.neuron.2018.10.004.
31. Claeskens G., Hjort N.L. Cambridge University Press; 2006. Model selection and model averaging.
32. Conant R.C., Ashby W.R. Every good regulator of a system must be a model of that system. International Journal of Systems Science. 1970;1(2):89–97.
33. Constant A., Ramstead M., Veissière S.P.L., Campbell J.O., Friston K. A variational approach to niche construction. Journal of the Royal Society Interface. 2018;15(141). doi: 10.1098/rsif.2017.0685.
34. Constant A., Ramstead M.J.D., Veissière S.P.L., Friston K. Regimes of expectations: An active inference model of social conformity and human decision making. Frontiers in Psychology. 2019;10. doi: 10.3389/fpsyg.2019.00679.
35. Cullen M., Davey B., Friston K.J., Moran R.J. Active inference in OpenAI Gym: A paradigm for computational investigations into psychiatric illness. Biological Psychiatry: Cognitive Neuroscience and Neuroimaging. 2018;3(9):809–818. doi: 10.1016/j.bpsc.2018.06.010.
36. Da Costa L., Sajid N., Parr T., Friston K., Smith R. 2020. The relationship between dynamic programming and active inference: The discrete, finite horizon case.
37. Dan Y., Atick J.J., Reid R.C. Efficient coding of natural scenes in the lateral geniculate nucleus: Experimental test of a computational theory. Journal of Neuroscience. 1996;16(10):3351–3362. doi: 10.1523/JNEUROSCI.16-10-03351.1996.
38. Dauwels J. 2007 IEEE international symposium on information theory. IEEE; Nice: 2007. On variational message passing on factor graphs; pp. 2546–2550.
39. Dauwels J., Vialatte F., Rutkowski T., Cichocki A. NIPS. 2007. Measuring neural synchrony by message passing.
40. Daw N.D., Gershman S.J., Seymour B., Dayan P., Dolan R.J. Model-based influences on humans’ choices and striatal prediction errors. Neuron. 2011;69(6):1204–1215. doi: 10.1016/j.neuron.2011.02.027.
41. Daw N.D., O’Doherty J.P., Dayan P., Seymour B., Dolan R.J. Cortical substrates for exploratory decisions in humans. Nature. 2006;441(7095):876–879. doi: 10.1038/nature04766.
42. Dayan P., Hinton G.E., Neal R.M., Zemel R.S. The Helmholtz machine. Neural Computation. 1995;7(5):889–904. doi: 10.1162/neco.1995.7.5.889.
43. Deci E., Ryan R.M. Intrinsic motivation and self-determination in human behavior. Springer US; 1985. (Perspectives in social psychology).
44. Deco G., Jirsa V.K., Robinson P.A., Breakspear M., Friston K. The dynamic brain: From spiking neurons to neural masses and cortical fields. PLoS Computational Biology. 2008;4(8). doi: 10.1371/journal.pcbi.1000092.
45. Dehaene S., Meyniel F., Wacongne C., Wang L., Pallier C. The neural representation of sequences: From transition probabilities to algebraic patterns and linguistic trees. Neuron. 2015;88(1):2–19. doi: 10.1016/j.neuron.2015.09.019.
46. Ding L., Gold J.I. The basal ganglia’s contributions to perceptual decision-making. Neuron. 2013;79(4):640–649. doi: 10.1016/j.neuron.2013.07.042.
47. Duncan J., Ward R., Shapiro K. Direct measurement of attentional dwell time in human vision. Nature. 1994;369(6478):313–315. doi: 10.1038/369313a0.
48. Eichenbaum H., Dudchenko P., Wood E., Shapiro M., Tanila H. The hippocampus, memory, and place cells: Is it spatial memory or a memory space? Neuron. 1999;23(2):209–226. doi: 10.1016/S0896-6273(00)80773-4.
49. FitzGerald T.H.B., Dolan R.J., Friston K.J. Model averaging, optimal inference, and habit formation. Frontiers in Human Neuroscience. 2014;8. doi: 10.3389/fnhum.2014.00457.
50. FitzGerald T.H.B., Dolan R.J., Friston K. Dopamine, reward learning, and active inference. Frontiers in Computational Neuroscience. 2015;9. doi: 10.3389/fncom.2015.00136.
51. FitzGerald T.H.B., Moran R.J., Friston K.J., Dolan R.J. Precision and neuronal dynamics in the human posterior parietal cortex during evidence accumulation. NeuroImage. 2015;107:219–228. doi: 10.1016/j.neuroimage.2014.12.015.
52. FitzGerald T.H.B., Schwartenbeck P., Moutoussis M., Dolan R.J., Friston K. Active inference, evidence accumulation, and the urn task. Neural Computation. 2015;27(2):306–328. doi: 10.1162/NECO_a_00699.
53. Fleming W.H., Sheu S.J. Risk-sensitive control and an optimal investment model II. Annals of Applied Probability. 2002;12(2):730–767. doi: 10.1214/aoap/1026915623.
54. Fonollosa J., Neftci E., Rabinovich M. Learning of chunking sequences in cognition and behavior. PLoS Computational Biology. 2015;11(11). doi: 10.1371/journal.pcbi.1004592.
55. Friston K. Hierarchical models in the brain. PLoS Computational Biology. 2008;4(11). doi: 10.1371/journal.pcbi.1000211.
56. Friston K. The free-energy principle: A rough guide to the brain? Trends in Cognitive Sciences. 2009;13(7):293–301. doi: 10.1016/j.tics.2009.04.005.
57. Friston K. The free-energy principle: A unified brain theory? Nature Reviews Neuroscience. 2010;11(2):127–138. doi: 10.1038/nrn2787.
58. Friston K. The history of the future of the Bayesian brain. NeuroImage. 2012;62(2):1230–1233. doi: 10.1016/j.neuroimage.2011.10.004.
59. Friston K. BioArxiv; 2019. A free energy principle for a particular physics; p. 148.
60. Friston K., Adams R., Montague R. What is value—accumulated reward or evidence? Frontiers in Neurorobotics. 2012;6. doi: 10.3389/fnbot.2012.00011.
61. Friston K., Buzsáki G. The functional anatomy of time: What and when in the brain. Trends in Cognitive Sciences. 2016;20(7):500–511. doi: 10.1016/j.tics.2016.05.001.
62. Friston K., Da Costa L., Hafner D., Hesp C., Parr T. 2020. Sophisticated inference. arXiv:2006.04120 [cs, q-bio].
63. Friston K.J., Daunizeau J., Kiebel S.J. Reinforcement learning or active inference? PLoS ONE. 2009;4(7). doi: 10.1371/journal.pone.0006421.
64. Friston K., FitzGerald T., Rigoli F., Schwartenbeck P., O’Doherty J., Pezzulo G. Active inference and learning. Neuroscience & Biobehavioral Reviews. 2016;68:862–879. doi: 10.1016/j.neubiorev.2016.06.022.
65. Friston K., FitzGerald T., Rigoli F., Schwartenbeck P., Pezzulo G. Active inference: A process theory. Neural Computation. 2017;29(1):1–49. doi: 10.1162/NECO_a_00912.
66. Friston K., Harrison L., Penny W. Dynamic causal modelling. NeuroImage. 2003;19(4):1273–1302. doi: 10.1016/S1053-8119(03)00202-7.
67. Friston K., Kiebel S. Predictive coding under the free-energy principle. Philosophical Transactions of the Royal Society, Series B (Biological Sciences). 2009;364(1521):1211–1221. doi: 10.1098/rstb.2008.0300.
68. Friston K., Kilner J., Harrison L. A free energy principle for the brain. Journal de Physiologie (Paris). 2006;100(1–3):70–87. doi: 10.1016/j.jphysparis.2006.10.001.
69. Friston K.J., Lin M., Frith C.D., Pezzulo G., Hobson J.A., Ondobaka S. Active inference, curiosity and insight. Neural Computation. 2017;29(10):2633–2683. doi: 10.1162/neco_a_00999.
70. Friston K., Mattout J., Trujillo-Barreto N., Ashburner J., Penny W. Variational free energy and the Laplace approximation. NeuroImage. 2007;34(1):220–234. doi: 10.1016/j.neuroimage.2006.08.035.
71. Friston K.J., Parr T., de Vries B. The graphical brain: Belief propagation and active inference. Network Neuroscience. 2017;1(4):381–414. doi: 10.1162/NETN_a_00018.
72. Friston K., Parr T., Zeidman P. 2018. Bayesian model reduction. arXiv:1805.07092 [stat].
73. Friston K., Penny W. Post hoc Bayesian model selection. NeuroImage. 2011;56(4):2089–2099. doi: 10.1016/j.neuroimage.2011.03.062.
74. Friston K., Rigoli F., Ognibene D., Mathys C., Fitzgerald T., Pezzulo G. Active inference and epistemic value. Cognitive Neuroscience. 2015;6(4):187–214. doi: 10.1080/17588928.2015.1020053.
75. Friston K.J., Rosch R., Parr T., Price C., Bowman H. Deep temporal models and active inference. Neuroscience & Biobehavioral Reviews. 2018;90:486–501. doi: 10.1016/j.neubiorev.2018.04.004.
76. Friston K.J., Sajid N., Quiroga-Martinez D.R., Parr T., Price C.J., Holmes E. bioRxiv; 2020. Active listening. 2020.03.18.997122.
77. Friston K., Samothrakis S., Montague R. Active inference and agency: Optimal control without cost functions. Biological Cybernetics. 2012;106(8):523–541. doi: 10.1007/s00422-012-0512-8.
78. Friston K., Schwartenbeck P., FitzGerald T., Moutoussis M., Behrens T., Dolan R.J. The anatomy of choice: Dopamine and decision-making. Philosophical Transactions of the Royal Society, Series B (Biological Sciences). 2014;369(1655). doi: 10.1098/rstb.2013.0481.
79. Friston K.J., Stephan K.E. Free-energy and the brain. Synthese. 2007;159(3):417–458. doi: 10.1007/s11229-007-9237-y.
80. Fuster J.M. Prefrontal cortex and the bridging of temporal gaps in the perception-action cycle. Annals of the New York Academy of Sciences. 1990;608:318–329. doi: 10.1111/j.1749-6632.1990.tb48901.x. Discussion 330–336.
81. George D. 2005. Belief propagation and wiring length optimization as organizing principles for cortical microcircuits.
82. Gershman S.J., Niv Y. Learning latent structure: Carving nature at its joints. Current Opinion in Neurobiology. 2010;20(2):251–256. doi: 10.1016/j.conb.2010.02.008.
83. Gregory R.L. Perceptions as hypotheses. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences. 1980;290(1038):181–197. doi: 10.1098/rstb.1980.0090.
84. Haber S.N. The primate basal ganglia: Parallel and integrative networks. Journal of Chemical Neuroanatomy. 2003;26(4):317–330. doi: 10.1016/j.jchemneu.2003.10.003.
85. Haken H. Synergetics: An introduction. Nonequilibrium phase transitions and self-organization in physics, chemistry and biology. 2nd ed. Springer-Verlag; Berlin Heidelberg: 1978. (Springer series in synergetics).
86. Hanslmayr S., Volberg G., Wimber M., Dalal S.S., Greenlee M.W. Prestimulus oscillatory phase at 7 Hz gates cortical information flow and visual perception. Current Biology. 2013;23(22):2273–2278. doi: 10.1016/j.cub.2013.09.020.
87. Haruno M., Wolpert D., Kawato M. 2003. Hierarchical MOSAIC for movement generation.
88. von Helmholtz H., Southall J.P.C. Dover Publications; New York: 1962. Helmholtz’s treatise on physiological optics. OCLC: 523553.
89. Heskes T. Convexity arguments for efficient minimization of the Bethe and Kikuchi free energies. Journal of Artificial Intelligence Research. 2006;26:153–190. doi: 10.1613/jair.1933.
90. Hinton G., Dayan P., Frey B., Neal R. The “wake-sleep” algorithm for unsupervised neural networks. Science. 1995;268(5214):1158–1161. doi: 10.1126/science.7761831.
91. Hobson J., Friston K. Waking and dreaming consciousness: Neurobiological and functional considerations. Progress in Neurobiology. 2012;98(1):82–98. doi: 10.1016/j.pneurobio.2012.05.003.
92. Hobson J.A., Hong C.C.-H., Friston K.J. Virtual reality and consciousness inference in dreaming. Frontiers in Psychology. 2014;5. doi: 10.3389/fpsyg.2014.01133.
93. Hohwy J. The self-evidencing brain. Noûs. 2016;50(2):259–285. doi: 10.1111/nous.12062.
94. Howard R.A. Information value theory. IEEE Transactions on Systems Science and Cybernetics. 1966;2(1):22–26. doi: 10.1109/TSSC.1966.300074.
95. Huys Q.J.M., Eshel N., O’Nions E., Sheridan L., Dayan P., Roiser J.P. Bonsai trees in your head: How the Pavlovian system sculpts goal-directed choices by pruning decision trees. PLoS Computational Biology. 2012;8(3). doi: 10.1371/journal.pcbi.1002410.
96. Iglesias S., Mathys C., Brodersen K.H., Kasper L., Piccirelli M., den Ouden H.E.M., Stephan K.E. Hierarchical prediction errors in midbrain and basal forebrain during sensory learning. Neuron. 2013;80(2):519–530. doi: 10.1016/j.neuron.2013.09.009.
97. Itti L., Baldi P. Bayesian surprise attracts human attention. Vision Research. 2009;49(10):1295–1306. doi: 10.1016/j.visres.2008.09.007.
98. Itti L., Koch C., Niebur E. A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence. 1998;20(11):1254–1259. doi: 10.1109/34.730558.
99. Jahanshahi M., Obeso I., Rothwell J.C., Obeso J.A. A fronto-striato-subthalamic-pallidal network for goal-directed and habitual inhibition. Nature Reviews Neuroscience. 2015;16(12):719–732. doi: 10.1038/nrn4038.
100. Jaynes E.T. Information theory and statistical mechanics. Physical Review. 1957;106(4):620–630. doi: 10.1103/PhysRev.106.620.
101. Jordan M.I., Ghahramani Z., Jaakkola T.S., Saul L.K. An introduction to variational methods for graphical models. In: Jordan M.I., editor. Learning in graphical models. Springer Netherlands; Dordrecht: 1998. pp. 105–161.
102. Jordan M.I., Ghahramani Z., Saul L.K. 1997. Hidden Markov decision trees.
103. Kahneman D., Tversky A. Prospect theory: An analysis of decision under risk. Cambridge University Press; New York, NY, US: 1988. (Decision, probability, and utility: Selected readings).
104. Kaplan R., Friston K.J. Planning and navigation as active inference. Biological Cybernetics. 2018;112(4):323–343. doi: 10.1007/s00422-018-0753-2.
105. Kauder E. Genesis of the marginal utility theory: From Aristotle to the end of the eighteenth century. The Economic Journal. 1953;63(251):638–650. doi: 10.2307/2226451.
106. Kauffman S.A. Oxford University Press; 1993. The origins of order: Self-organization and selection in evolution.
107. Kirchhoff M., Parr T., Palacios E., Friston K., Kiverstein J. The Markov blankets of life: Autonomy, active inference and the free energy principle. Journal of the Royal Society Interface. 2018;15(138). doi: 10.1098/rsif.2017.0792.
108. Knill D.C., Pouget A. The Bayesian brain: The role of uncertainty in neural coding and computation. Trends in Neurosciences. 2004;27(12):712–719. doi: 10.1016/j.tins.2004.10.007.
109. Kullback S., Leibler R.A. On information and sufficiency. The Annals of Mathematical Statistics. 1951;22(1):79–86. doi: 10.1214/aoms/1177729694.
110. Kurt B. 2013. Kullback–Leibler divergence between two Dirichlet (and Beta) distributions.
111. Landau A.N., Fries P. Attention samples stimuli rhythmically. Current Biology. 2012;22(11):1000–1004. doi: 10.1016/j.cub.2012.03.054.
112. Lewicki M.S. Efficient coding of natural sounds. Nature Neuroscience. 2002;5(4):356–363. doi: 10.1038/nn831.
113. Lindley D.V. On a measure of the information provided by an experiment. The Annals of Mathematical Statistics. 1956;27(4):986–1005.
114. Linsker R. Perceptual neural organization: Some approaches based on network models and information theory. Annual Review of Neuroscience. 1990;13(1):257–281. doi: 10.1146/annurev.ne.13.030190.001353.
115. Loeliger H.-A. An introduction to factor graphs. IEEE Signal Processing Magazine. 2004.
116. Luck S.J., Vogel E.K. The capacity of visual working memory for features and conjunctions. Nature. 1997;390(6657):279–281. doi: 10.1038/36846.
117. Lundqvist M., Rose J., Herman P., Brincat S.L., Buschman T.J., Miller E.K. Gamma and beta bursts underlie working memory. Neuron. 2016;90(1):152–164. doi: 10.1016/j.neuron.2016.02.028.
118. Maass W. On the computational power of winner-take-all. Neural Computation. 2000;12(11):2519–2535. doi: 10.1162/089976600300014827.
119. MacKay D.J.C. Information-based objective functions for active data selection. Neural Computation. 1992;4(4):590–604. doi: 10.1162/neco.1992.4.4.590.
120. MacKay D. Electronics Letters. 1995. A free energy minimization algorithm for decoding and cryptanalysis.
121. MacKay D.J.C. Sixth printing 2007 ed. Cambridge University Press; Cambridge, UK; New York: 2003. Information theory, inference and learning algorithms.
122. Marreiros A.C., Daunizeau J., Kiebel S.J., Friston K.J. Population dynamics: Variance and the sigmoid activation function. NeuroImage. 2008;42(1):147–157. doi: 10.1016/j.neuroimage.2008.04.239.
123. McKay R.T., Dennett D.C. The evolution of misbelief. The Behavioral and Brain Sciences. 2009;32(6):493–510. doi: 10.1017/S0140525X09990975. Discussion 510–561.
124. Millidge B. PsyArXiv; 2019. Implementing predictive processing and active inference: Preliminary steps and results. Preprint.
125. Mirza M.B., Adams R.A., Mathys C.D., Friston K.J. Scene construction, visual foraging, and active inference. Frontiers in Computational Neuroscience. 2016;10. doi: 10.3389/fncom.2016.00056.
126. Mirza M.B., Adams R.A., Mathys C., Friston K.J. Human visual exploration reduces uncertainty about the sensed world. PLoS ONE. 2018;13(1). doi: 10.1371/journal.pone.0190429.
127. Mirza M.B., Adams R.A., Parr T., Friston K. Impulsivity and active inference. Journal of Cognitive Neuroscience. 2019;31(2):202–220. doi: 10.1162/jocn_a_01352.
128. Moran R., Pinotsis D.A., Friston K. Neural masses and fields in dynamic causal modeling. Frontiers in Computational Neuroscience. 2013;7. doi: 10.3389/fncom.2013.00057.
129. Moutoussis M., Trujillo-Barreto N.J., El-Deredy W., Dolan R.J., Friston K.J. A formal model of interpersonal inference. Frontiers in Human Neuroscience. 2014;8. doi: 10.3389/fnhum.2014.00160.
130. Nicolis G., Prigogine I. Wiley-Blackwell; New York: 1977. Self-organization in nonequilibrium systems: From dissipative structures to order through fluctuations.
131. O’Keefe J., Dostrovsky J. The hippocampus as a spatial map. Preliminary evidence from unit activity in the freely-moving rat. Brain Research. 1971;34(1):171–175. doi: 10.1016/0006-8993(71)90358-1.
132. Olshausen B.A., Field D.J. Sparse coding of sensory inputs. Current Opinion in Neurobiology. 2004;14(4):481–487. doi: 10.1016/j.conb.2004.07.007.
133. Olshausen B.A., O’Connor K.N. A new window on sound. Nature Neuroscience. 2002;5(4):292–294. doi: 10.1038/nn0402-292.
134. Optican L.M., Richmond B.J. Temporal encoding of two-dimensional patterns by single units in primate inferior temporal cortex. III. Information theoretic analysis. Journal of Neurophysiology. 1987;57(1):162–178. doi: 10.1152/jn.1987.57.1.162.
135. Oudeyer P.-Y., Kaplan F. What is intrinsic motivation? A typology of computational approaches. Frontiers in Neurorobotics. 2009;1. doi: 10.3389/neuro.12.006.2007.
136. Parr T. University College London; London: 2019. The computational neurology of active vision (Ph.D. thesis).
137. Parr T., Benrimoh D.A., Vincent P., Friston K.J. Precision and false perceptual inference. Frontiers in Integrative Neuroscience. 2018;12. doi: 10.3389/fnint.2018.00039.
138. Parr T., Da Costa L., Friston K. Markov blankets, information geometry and stochastic thermodynamics. Philosophical Transactions of the Royal Society of London A (Mathematical and Physical Sciences). 2020;378(2164). doi: 10.1098/rsta.2019.0159.
139. Parr T., Friston K.J. Uncertainty, epistemics and active inference. Journal of the Royal Society Interface. 2017;14(136). doi: 10.1098/rsif.2017.0376.
140. Parr T., Friston K.J. Working memory, attention, and salience in active inference. Scientific Reports. 2017;7(1):14678. doi: 10.1038/s41598-017-15249-0.
141. Parr T., Friston K.J. Active inference and the anatomy of oculomotion. Neuropsychologia. 2018;111:334–343. doi: 10.1016/j.neuropsychologia.2018.01.041.
142. Parr T., Friston K.J. The anatomy of inference: Generative models and brain structure. Frontiers in Computational Neuroscience. 2018;12. doi: 10.3389/fncom.2018.00090.
143. Parr T., Friston K.J. The computational anatomy of visual neglect. Cerebral Cortex (New York, N.Y.: 1991). 2018;28(2):777–790. doi: 10.1093/cercor/bhx316.
144. Parr T., Friston K.J. The discrete and continuous brain: From decisions to movement—and back again. Neural Computation. 2018;30(9):2319–2347. doi: 10.1162/neco_a_01102.
145. Parr T., Friston K.J. The computational pharmacology of oculomotion. Psychopharmacology. 2019. doi: 10.1007/s00213-019-05240-0.
146. Parr T., Markovic D., Kiebel S.J., Friston K.J. Neuronal message passing using mean-field, Bethe, and marginal approximations. Scientific Reports. 2019;9(1):1889. doi: 10.1038/s41598-018-38246-3.
147. Parr T., Rees G., Friston K.J. Computational neuropsychology and Bayesian inference. Frontiers in Human Neuroscience. 2018;12. doi: 10.3389/fnhum.2018.00061.
148. Parr T., Rikhye R.V., Halassa M.M., Friston K.J. Prefrontal computation as active inference. Cerebral Cortex. 2019. doi: 10.1093/cercor/bhz118.
149. Pearl J. 1988. Probabilistic reasoning in intelligent systems.
150. Pearl J. Graphical models for probabilistic and causal reasoning. In: Smets P., editor. Quantified representation of uncertainty and imprecision. Springer Netherlands; Dordrecht: 1998. pp. 367–389. (Handbook of defeasible reasoning and uncertainty management systems).
151. Penny W. 2001. KL-divergence of normal, gamma, Dirichlet and Wishart densities.
152. Petersen K.B., Pedersen M.S. 2012. The matrix cookbook.
153. Rao R.P.N., Ballard D.H. Predictive coding in the visual cortex: A functional interpretation of some extra-classical receptive-field effects. Nature Neuroscience. 1999;2(1):79–87. doi: 10.1038/4580.
154. Reverdy P., Srivastava V., Leonard N.E. 2013. Modeling human decision-making in generalized Gaussian multi-armed bandits. arXiv:1307.6134 [cs, math, stat].
155. Riesenhuber M., Poggio T. Hierarchical models of object recognition in cortex. Nature Neuroscience. 1999;2(11):1019–1025. doi: 10.1038/14819.
156. Rolls E.T., Tovee M.J. Processing speed in the cerebral cortex and the neurophysiology of visual masking. Proceedings of the Royal Society of London, Series B. 1994;257(1348):9–15. doi: 10.1098/rspb.1994.0087.
157. Salakhutdinov R., Hinton G. An efficient learning procedure for deep Boltzmann machines. Neural Computation. 2012;24(8):1967–2006. doi: 10.1162/NECO_a_00311.
158. Sales A.C., Friston K.J., Jones M.W., Pickering A.E., Moran R.J. bioRxiv; 2018. Locus coeruleus tracking of prediction errors optimises cognitive flexibility: An active inference model.
159. Schmidhuber J. Developmental robotics, optimal artificial curiosity, creativity, music, and the fine arts. Connection Science. 2006;18(2):173–187. doi: 10.1080/09540090600768658.
160. Schmidhuber J. Formal theory of creativity, fun, and intrinsic motivation (1990–2010). IEEE Transactions on Autonomous Mental Development. 2010;2(3):230–247. doi: 10.1109/TAMD.2010.2056368.
161. Schwartenbeck P., FitzGerald T., Dolan R.J., Friston K. Exploration, novelty, surprise, and free energy minimization. Frontiers in Psychology. 2013;4. doi: 10.3389/fpsyg.2013.00710.
162. Schwartenbeck P., FitzGerald T.H.B., Mathys C., Dolan R., Friston K. The dopaminergic midbrain encodes the expected certainty about desired outcomes. Cerebral Cortex (New York, N.Y.: 1991). 2015;25(10):3434–3445. doi: 10.1093/cercor/bhu159.
163. Schwartenbeck P., FitzGerald T.H.B., Mathys C., Dolan R., Kronbichler M., Friston K. Evidence for surprise minimization over value maximization in choice behavior. Scientific Reports. 2015;5:16575. doi: 10.1038/srep16575.
164. Schwartenbeck P., FitzGerald T.H.B., Mathys C., Dolan R., Wurst F., Kronbichler M., Friston K. Optimal inference with suboptimal models: Addiction and active Bayesian inference. Medical Hypotheses. 2015;84(2):109–117. doi: 10.1016/j.mehy.2014.12.007.
165. Schwöbel S., Kiebel S., Marković D. Active inference, belief propagation, and the Bethe approximation. Neural Computation. 2018;30(9):2530–2567. doi: 10.1162/neco_a_01108.
166. Sengupta B., Friston K. Computational biology workshop, Vol. 14. 2016. Approximate Bayesian inference as a gauge theory. arXiv:1705.06614.
167. Sengupta B., Tozzi A., Cooray G.K., Douglas P.K., Friston K.J. Towards a neuronal gauge theory. PLoS Biology. 2016;14(3). doi: 10.1371/journal.pbio.1002400.
168. Sharot T. The optimism bias. Current Biology. 2011;21(23):R941–R945. doi: 10.1016/j.cub.2011.10.030.
169. Smith R., Kirlic N., Stewart J.L., Touthang J., Kuplicki R., Khalsa S.S., Paulus M.P., Investigators T., Aupperle R. PsyArXiv; 2020. Greater decision uncertainty characterizes a transdiagnostic patient sample during approach-avoidance conflict: A computational modeling approach. Preprint.
170. Smith R., Schwartenbeck P., Parr T., Friston K.J. bioRxiv; 2019. An active inference model of concept learning.
171. Smith R., Schwartenbeck P., Stewart J.L., Kuplicki R., Ekhtiari H., Investigators T., Paulus M.P. PsyArXiv; 2020. Imprecise action selection in substance use disorder: Evidence for active learning impairments when solving the explore-exploit dilemma. Preprint.
172. Stephan K.E., Penny W.D., Daunizeau J., Moran R.J., Friston K.J. Bayesian model selection for group studies. NeuroImage. 2009;46(4):1004–1017. doi: 10.1016/j.neuroimage.2009.03.025.
173. Stone J.V. 2019. Artificial intelligence engines: A tutorial introduction to the mathematics of deep learning.
174. Sun Y., Gomez F., Schmidhuber J. 2011. Planning to be surprised: Optimal Bayesian exploration in dynamic environments. arXiv:1103.5708 [cs, stat].
175. Tanaka T. 1999. A theory of mean field approximation; p. 10.
176. Tee J., Taylor D.P. 2018. Is information in the brain represented in continuous or discrete form? arXiv:1805.01631 [cs, math, q-bio].
177. Tervo D.G.R., Tenenbaum J.B., Gershman S.J. Toward the neural implementation of structure learning. Current Opinion in Neurobiology. 2016;37:99–105. doi: 10.1016/j.conb.2016.01.014.
178. Thibaut F. Basal ganglia play a crucial role in decision making. Dialogues in Clinical Neuroscience. 2016;18(1):3. doi: 10.31887/DCNS.2016.18.1/fthibaut.
179. Todorov E. 2008 47th IEEE Conference on Decision and Control. 2008. General duality between optimal control and estimation; pp. 4286–4292.
180. Todorov E., Jordan M.I. Optimal feedback control as a theory of motor coordination. Nature Neuroscience. 2002;5(11):1226–1235. doi: 10.1038/nn963.
181. Tschantz A., Baltieri M., Seth A.K., Buckley C.L. 2019. Scaling active inference. arXiv:1911.10601 [cs, eess, math, stat].
182. van den Broek B., Wiegerinck W., Kappen B. UAI. 2010. Risk sensitive path integral control.
183. Vincent P., Parr T., Benrimoh D., Friston K.J. With an eye on uncertainty: Modelling pupillary responses to environmental volatility. PLoS Computational Biology. 2019;15(7). doi: 10.1371/journal.pcbi.1007126.
184. Von Békésy G. Princeton University Press; 1967. Sensory inhibition.
185. Von Neumann J., Morgenstern O. Princeton University Press; Princeton, NJ, US: 1944. Theory of games and economic behavior.
186. White C.C. Encyclopedia of operations research and management science. Springer US; Boston, MA: 2001. Markov decision processes; pp. 484–486.
187. Winn J., Bishop C.M. Variational message passing. Journal of Machine Learning Research. 2005:34.
188. Wu C.M., Schulz E., Speekenbrink M., Nelson J.D., Meder B. Generalization guides human exploration in vast decision spaces. Nature Human Behaviour. 2018;2(12):915–924. doi: 10.1038/s41562-018-0467-4.
189. Xitong Y. Institute for Advanced Computer Studies, University of Maryland; 2017. Understanding the variational lower bound.
190. Yedidia J., Freeman W., Weiss Y. Constructing free-energy approximations and generalized belief propagation algorithms. IEEE Transactions on Information Theory. 2005;51(7):2282–2312. doi: 10.1109/TIT.2005.850085.
191. Zhang W., Luck S.J. Discrete fixed-resolution representations in visual working memory. Nature. 2008;453(7192):233–235. doi: 10.1038/nature06860.
