Abstract
The free energy principle (FEP) states that any dynamical system can be interpreted as performing Bayesian inference upon its surrounding environment. Although, in theory, the FEP applies to a wide variety of systems, there has been almost no direct exploration or demonstration of the principle in concrete systems. In this work, we examine in depth the assumptions required to derive the FEP in the simplest possible set of systems – weakly-coupled non-equilibrium linear stochastic systems. Specifically, we explore (i) how general the requirements imposed on the statistical structure of a system are and (ii) how informative the FEP is about the behaviour of such systems. We discover that two requirements of the FEP – the Markov blanket condition (i.e. a statistical boundary precluding direct coupling between internal and external states) and stringent restrictions on its solenoidal flows (i.e. tendencies driving a system out of equilibrium) – are only valid for a very narrow space of parameters. Suitable systems require an absence of perception-action asymmetries that is highly unusual for living systems interacting with an environment. More importantly, we observe that a mathematically central step in the argument, connecting the behaviour of a system to variational inference, relies on an implicit equivalence between the dynamics of the average states of a system and the average of the dynamics of those states. This equivalence does not hold in general even for linear stochastic systems, since it requires an effective decoupling from the system's history of interactions. These observations are critical for evaluating the generality and applicability of the FEP and indicate the existence of significant problems with the theory in its current form.
These issues make the FEP, as it stands, not straightforwardly applicable to the simple linear systems studied here and suggest that more development is needed before the theory could be applied to the kind of complex systems that describe living and cognitive processes.
Keywords: Free energy principle, Markov blanket, Linear non-equilibrium systems, Bayesian inference
Highlights
• We review the free energy principle (FEP) in a family of analytically tractable non-equilibrium linear systems.
• We study how general and how informative the FEP is in this class of systems.
• We find that crucial assumptions of the FEP restrict its applicability to a very narrow space of the studied linear systems.
• Secondly, a core step relies on an equivalence between the dynamics of the average and the average of the dynamics.
• As a result, the FEP does not result in a good description of the (history-dependent) behaviour of a system.
1. Introduction
During the last decade, the ‘free energy principle’ (FEP) has become an influential framework which aims to provide a grand theory promoting a Bayesian interpretation of living systems [1], [2], [3]. The FEP states that any self-organizing system (i.e. any dynamical system, and therefore any living or cognitive entity) equipped with a Markov blanket – a statistical separation between internal and external states – can be interpreted as performing Bayesian inference upon the surrounding environment, such that its internal states come to encode probabilistic beliefs about the external environment [3], [4].
The core claim of the FEP is exceptionally ambitious. It implies that the dynamics of any pair of coupled systems, under specific conditions about the interaction of internal and external states, can be described as one system trying to statistically infer the states of the second system (cf. an agent and its environment). This claim licenses an interpretation of the agent as performing a basic kind of Bayesian inference and encoding beliefs about the surrounding environment [5], [6]. Such an equivalence could have a far-reaching influence on the study of living systems. For example, it could enable approximate calculations of the dynamics of complex systems in terms of a more tractable description of the dynamics of their sufficient statistics. Furthermore, the FEP has been defended based on singular claims about its explanatory power, suggesting that it reveals novel insights among fundamental psychological concepts such as memory, attention, value, reinforcement, and salience [7] and unifies different aspects of motor behaviour and perception, from retinal stabilization to goal-seeking [8]. In addition, it has been proposed that the FEP provides a basis for integrating several general brain theories, including the Bayesian brain hypothesis, neural Darwinism, Hebbian cell assembly theory, and optimal control and decision theory [9].
The FEP has also inspired theories such as predictive coding [10], [11], [12], [13] and active inference [14], [15], [16], which offer explanations and models of brain function and dysfunction [17] through the lens of Bayesian inference, and have become widely influential in theoretical neuroscience and beyond. For instance, predictive coding has been proposed to be a biologically plausible model of cortical function [11], [12], [18], and has been applied to explain binocular rivalry [19] or attention [20]. Active inference, on the other hand, has seen substantial use in modelling the behaviour of human or animal subjects in various paradigms [14], [15], as well as understanding rational decision-making and behavioural control [21], [22], [23], [24]. Moreover, biological theories proposing that neurons [25], synapses [26], bacteria [27] or plants [28] could be explicitly performing Bayesian (variational) inference by minimizing free energy gradients have been proposed and justified using explicit appeals to the FEP. While such theories do not entirely depend upon the validity of the mathematical core of the FEP reviewed here, they nevertheless derive a substantial amount of their intellectual and rhetorical support from it. Thus, the fundamental validity of the mathematical framework of the FEP is of great importance to this large and rapidly increasing modelling literature.
In a vast body of work that spans over the course of about 15 years, the FEP has detailed the mathematical steps required to derive its central claims. In the first phase, an intuitive and heuristic idea of an imperative to minimize variational free energy was developed [29], [7] based on the need for any recognizable ‘system’ to maintain itself in a low entropy configuration over time against dissipative forces trying to push it towards a high entropy equilibrium state. Later, this heuristic argument and intuition was more formally related to concepts from stochastic thermodynamics [2], [1], specifically by bringing in the Markov blanket condition and expressing the dynamics of the system in terms of a gradient descent on a variational free energy corresponding to a generative model of the environment of the system. Finally, the mathematical formulation and some of its arguments have been recently refined in a new series of publications [3], [4], [30], a process that is still ongoing.
Despite the extensive literature on the FEP, there are few concrete examples that apply all the required steps to a specific, well-studied system. Similarly, it has rarely been explored whether the assumptions (which are sometimes left unstated) required for deriving the principle hold under the dynamics we expect from cognitive and living systems. In light of the increasing rate of publications concerning the FEP (the number of published papers on the topic doubles every few years, Fig. 1), we believe it is imperative to ground and test the foundations of the theory in concrete models to assess the generality and validity of its claims. To this end, we explore a class of systems defined by stochastic linear differential equations under a weak-coupling assumption. Such systems are the simplest possible example that can display the dynamics required for the FEP, as well as capture non-equilibrium properties of systems engaged in perception-action cycles (as we expect from living systems). Moreover, the absence of nonlinear interactions in this class of systems allows for precise analytic calculations, which offers an interesting test-bed to examine, in detail, the connection between non-trivial dynamics and the statistical properties of coupled systems. In general, due to the special independence properties imposed by the theory, if the assumptions and steps of the FEP do not hold in such simple systems, we consider it unlikely that they hold in more complex nonlinear systems where the dynamics are expected to be more deeply intertwined. Moreover, we observe that stronger couplings result in higher-order interactions that make difficult the separation between internal and external states required by the FEP. Because of this, we expect that the introduction of non-linearities will, in general, have a similar effect.
How general is the free energy principle
We first inspect the generality of the assumptions about the statistical structure of a system required by the principle. A crucial step to derive the FEP is to establish a relation between the average flow of change of a system that interacts with an environment and the gradient of a variational free energy of a model of this environment. This step relies on specific assumptions about how perception and action mediate the interaction of the internal and external states of a system. We aim to explore how general these assumptions are and whether they can be expected in the dynamical systems models of living systems.
Perception-action interface. The FEP partitions the states of a system into external, sensory, active and internal states. Then, the theory assumes that perception-action cycles involve causal dependencies such that internal and external states are only mutually influenced through the effect of active and sensory states [2]. We will refer to this idea as a perception-action interface (see Appendix A). One example of this interface is a cell membrane around a cell [2], although the formal definition of a perception-action interface does not require the interface to be a physical boundary. Thus, a perception-action interface could describe any set of variables mediating between system and environment as, for example, a combination of retinal activity and oculomotor states mediating between neural activity in the visual cortices and the location of an object in the environment [4].
Then, the FEP requires the system to be endowed with a particular statistical structure with two special properties: a Markov blanket (i.e. conditional independence between internal and external states) and the absence of solenoidal couplings.
Markov blanket. The FEP prescribes that variables in a perception-action interface constitute a Markov blanket [2], [30]. A Markov blanket (see Appendix A) is defined as a set of states (the ‘blanket’) that separates two other sets in a statistical sense (i.e. they are conditionally independent, given the blanket). The term was initially introduced in the context of Bayesian networks or graphs [31], and it is also known as the general Markov condition [32]. In the FEP, Markov blankets are used for identifying a set of variables that separate the internal and external states of a system (see [33] for a detailed study on the specific use of the concept of Markov blanket in the FEP). Here, we note that Markov blankets can be easily identified in models defined by directed acyclic Bayesian networks (Fig. 2.A). In these systems, a sufficient condition for a Markov blanket of variable x (e.g., an internal state) is that it contains the parent nodes of x, the child nodes of x and the other parents of each child node (in this case, the minimum Markov blanket is composed of the grey nodes in Fig. 2.A). This is defined as the local Markov condition [32] (which implies the general Markov condition in the directed acyclic graphs of Bayesian networks). In [2], the FEP suggests that a Markov blanket arises naturally from the perception-action interface depicted in Fig. 2.B (although recent works restrict this to the case of an absence of particular solenoidal flows [30]). However, such a cyclic structure can generate couplings that propagate beyond causal interactions. In this case, the local Markov condition does not imply a Markov blanket, as marginalization over local blanket variables generates new couplings (dashed arrows in Fig. 2.B). This important issue (identified by [34]), which we will explore in the following sections, contradicts the intuition that perception-action states always constitute Markov blankets.
Solenoidal couplings. The second required property is that solenoidal couplings between internal, external, sensory and active states are absent. The idea of solenoidal flows (see Appendix A) arises from the separation of the flow of a dynamical system into two components. The first component is a dissipative (curl-free) flow that counters the dispersion of the density caused by random fluctuations in the system. The second component, the solenoidal flow, is defined as a conservative (divergence-free) flow capturing dissipative tendencies in the system, driving it away from equilibrium [3, p.11]. This nonequilibrium nature is a fundamental aspect of living entities. Examples of this are asymmetric organism-environment interactions [35] or the oscillatory behaviour underpinning most biorhythms and neural dynamics [36]. The FEP assumes that solenoidal couplings between internal and external states are absent, meaning that these flows will not penetrate through the perception-action interface.
Average flows and the variational free energy. The confluence of these properties entails an important result: due to this particular statistical structure, the average flow of a system can be described in terms of a variational free energy gradient. The average flow (also called the marginal flow, see Appendix A) describes the average rate of change of the system conditioned on the blanket state. In turn, the variational free energy (see Appendix A) represents an upper bound of the surprise of observed states, according to an internal model of the environment.
In the first section of this manuscript, we study how likely these conditions are for the type of dynamical interactions we find in living systems. By considering the simplest case of non-equilibrium stochastic dynamics, we show that Markov blankets and absence of solenoidal flows only emerge for very particular perception-action interfaces, forcing symmetries in agent-environment interactions that are not expected in living beings.
The core issue is that these three requirements – a perception-action partition where sensory and active states mediate between the internal and the external states, the existence of a Markov blanket, and decoupled solenoidal flows – are, in principle, independent conditions. A perception-action interface does not necessarily guarantee the required conditional independence relationships and, in fact, generally does not, since statistical correlations can propagate beyond this interface over time due to the intrinsic fluctuations and reentrant connections in the system. Conversely, the conditional independence relationship decreed by the Markov blanket does not imply that there is a lack of dynamical coupling between internal and external states [34]. In practice, we discover that the kinds of systems that can fulfil both the Markov blanket condition and the block-diagonal solenoidal coupling condition are extremely specialized and generally do not possess the kind of sparsity and asymmetry of dynamical couplings that we expect from a perception-action interface. These asymmetries are present at different levels of living systems, leading to qualitative differences between the inside and outside of a system, shaping system-environment interactions and their related flows of energy and matter [35], [37], [38], [39].
In other words, although the Markov blanket assumption is typically maintained in systems with very weak couplings, we observe that there are direct interactions between external and internal states in the system's dynamics. This subtle distinction between the conditional independence relationships of the mechanisms of a system (perception-action interface) and its statistical couplings (Markov blanket) – which is analogous to the distinction between anatomical and functional connectivity in neuroscience – has perhaps been underappreciated in the FEP literature. This ambiguity leads to the claim that the requirements of the FEP naturally describe systems with a causal boundary between the external and internal states [2], when this is not necessarily the case.
How informative is the free energy principle
Once the relation between a free energy gradient and the average flow of a system has been established, we explore how informative this relationship is about the behaviour of an organism or the evolution of a dynamical system.
Conditional synchronisation manifold. To justify the relation between a gradient of a free energy functional and the behaviour of a system, the FEP assumes the existence of a conditional synchronisation manifold (see Appendix A). This manifold is defined as a mapping that, given a blanket state, connects the most likely internal and external states. The FEP proposes that its existence allows us to characterise the relationship between (maximum a posteriori) internal and external states in terms of internal states ‘sensing’ or ‘tracking’ external states through the Markov blanket [3].
Next, the FEP links the evolution of the most likely external states with the average flow of the system, conditioned on the blanket states. Since the step described in the previous section connects the free energy gradient with the average flow, this assumption implies that the evolution of the most likely external states is driven by the gradient of a variational free energy. Moreover, given the conditional synchronisation manifold, the most likely internal states are also driven by this free energy gradient. The FEP suggests this result leads to the appearance that any system with the properties described above behaves as if internal states were performing variational inference to predict external states.
Average flows and the rate of change of the average. The claim that systems under the required properties behave as if performing inference relies on a venturesome assumption. It implicitly assumes that the rate of change of the average (expected) state of a system – i.e. the expected change in the internal state of an agent at a particular moment – can be roughly described by the steady-state average flow of the system conditioned on a blanket state – i.e. the average rate of change of that state over many trajectories (see Appendix A). This assumption is driven by the intuition that if the average flow points in a direction that minimizes variational free energy, internal states will behave as if they are trying, on average, to minimize this free energy. Equivalently, it can be read as an implicit assumption that these two quantities are approximately the same. However, as we will show, this intuition is incorrect since the average flow conditioned on blanket states disconnects the rate of change in the system from its previous trajectory, which is crucial to predict the system's behaviour. In practice, this assumption implies decoupling the actions of an agent from its history of previous states. We will show that, in the class of linear systems explored in this work, this results in the free energy gradients being uninformative about the behaviour of an agent or its specific trajectories.
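The distinction between the two quantities can be illustrated with a minimal sketch. It is an illustrative assumption rather than one of the systems analysed in this article: a zero-mean linear Gaussian system with a scalar blanket state $b$ and a scalar external state $y$, steady-state covariances $C_{yb}$ and $C_{bb}$, and a linear flow $f_y(y, b) = -J_{yy}\, y - J_{yb}\, b$. All symbols are assumed notation introduced only for this example.

```latex
\begin{align}
  % Most likely external state given the blanket (Gaussian conditional mean)
  \hat{y}(b) &= \frac{C_{yb}}{C_{bb}}\, b \\
  % Average of the dynamics: flow of y averaged over p(y | b)
  \langle f_y \rangle_{p(y \mid b)} &= -\left( J_{yy}\,\frac{C_{yb}}{C_{bb}} + J_{yb} \right) b \\
  % Dynamics of the average: rate of change of the most likely state along a trajectory
  \frac{d}{dt}\,\hat{y}(b_t) &= \frac{C_{yb}}{C_{bb}}\, \dot{b}_t
\end{align}
```

In this sketch the average flow is a fixed function of the current blanket state alone, whereas the rate of change of the most likely state (formally) depends on the realized velocity $\dot{b}_t$, and hence on the fluctuations and on the particular trajectory that led to $b_t$. Treating the two as interchangeable discards exactly that dependence.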
We should note that this claim has been relaxed in more recent work [30], proposing (instead of an equivalence between rates of change of expected states and average rates) that an interpretation in terms of Bayesian inference emerges only in expectation. We will see however that this still presents important practical and conceptual problems. To draw an analogy that portrays this issue, we could propose that the actions of a population of organisms in a particular evolutionary context maximize, on average, a fitness function – e.g. the number of genes the population passes on to the next generation. That is, however, a largely uninformative statement for describing the behaviour of an individual organism, which depends on that organism's specific history. Moreover, the behaviour of an organism that systematically inferred what to do next in terms of (evolutionary) fitness maximization would likely be entirely different from the realized behaviour of any living organism. This is not only a point of philosophical nuance but, as we will show, translates into assumptions and important equations required for deriving the FEP.
Overview of the mathematical review
In this article, we present a technical and conceptual critique of the FEP. As the theory is in continuous development, its mathematical details have been described in different forms and notations in the literature. For our study, we try to remain as close as possible to the verbal and formal descriptions described in the most recent publications [3], [4], [30]. However, in at least two instances (see Assumption 1 and Assumption 3⁎, Assumption 3⁎⁎), we identify conflicting interpretations of the theory. In these cases, we attempt to derive our argument with as much mathematical coherence as possible while also presenting results that address the different identified possibilities.
The rest of the article is organized as follows. First, we present a summary of the FEP, including a list of conditions and assumptions required to derive it. Next, we survey the steps prescribed to derive the FEP for a linear stochastic system under a weak coupling assumption. We then evaluate in which cases we expect the requirements of the FEP to hold. Finally, we evaluate the implications of our study for the theory and its applicability to the type of processes that living systems are expected to manifest.
2. Summary of the theory
Here we present a succinct description of the theory, its conditions and assumptions (Fig. 3), based on the most recent publications [3], [4], [30], although some steps apply to previous versions as well. Generally, the FEP assumes a random dynamical system described in terms of a Langevin stochastic differential equation:
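(1) $\dot{\mathbf{x}} = f(\mathbf{x}) + \boldsymbol{\omega}$
where $\mathbf{x}$ is a vector, f is an arbitrary but differentiable function and $\boldsymbol{\omega}$ is a Gaussian white noise with covariance $2\boldsymbol{\Gamma}$, which is a diagonal matrix. Throughout this article, bold symbols represent vectors and matrices.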
The FEP further assumes that the system can be decomposed into external, sensory, active and internal states, configured as a perception-action loop reflecting an interface mediating between ‘autonomous’ states (active and internal states) and ‘non-autonomous’ states (external and sensory states). This leads to
Condition 1
The flow function f decouples autonomous and non-autonomous states according to the following perception-action interface:
(2)
Under the presence of random fluctuations, some systems will converge toward a stable global attractor reflecting the steady-state dynamics of the system. In systems out of thermodynamic equilibrium, this global attractor will describe a non-equilibrium steady-state (NESS), characterized by a continuous energy flux between the system and its environment. The next condition of the FEP is that this global attractor exists and can be described by a stochastic differential equation (SDE) decomposition [40], [41] (often referred in the FEP literature as a ‘Helmholtz’ decomposition), describing the flow function as a linear function of the logarithmic steady state distribution of the system:
Condition 2
The FEP assumes that the system will reach a non-equilibrium steady state described by a probability density function, which can be described using an SDE decomposition that separates the flow into dissipative and solenoidal components
(3)
(4) where Q is an antisymmetric matrix – i.e. equal to its negative transpose, $\mathbf{Q} = -\mathbf{Q}^\top$. This condition requires Γ and Q to be constant matrices – i.e. state-independent, as is the case in linear systems.1
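As a minimal sketch of what such a decomposition looks like in a linear system, the snippet below assumes the commonly used form in which the flow is written as f(x) = (Q − Γ)∇ℑ(x), with ℑ the surprise (negative log-density) of a zero-mean Gaussian steady state, so that ∇ℑ(x) = Πx for a precision matrix Π. The specific matrices, the variable names and the two-dimensional setting are assumptions chosen for illustration only.

```python
# Sketch: dissipative/solenoidal split of a 2D linear flow (assumed notation).
# f(x) = (Q - GAMMA) @ grad, where grad = PI @ x is the gradient of the
# surprise of a zero-mean Gaussian steady state with precision matrix PI.

def matvec(m, v):
    """Matrix-vector product for nested lists."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

def dot(u, v):
    """Inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))

GAMMA = [[1.0, 0.0], [0.0, 1.0]]   # half the noise covariance (diagonal)
Q = [[0.0, 0.5], [-0.5, 0.0]]      # antisymmetric: Q equals minus its transpose
PI = [[2.0, 0.5], [0.5, 1.0]]      # symmetric, positive-definite precision

def flows(x):
    """Return the surprise gradient and the two flow components at state x."""
    grad = matvec(PI, x)
    dissipative = [-g for g in matvec(GAMMA, grad)]  # descends the surprise
    solenoidal = matvec(Q, grad)                     # circulates on isocontours
    return grad, dissipative, solenoidal

grad, dissipative, solenoidal = flows([1.0, -2.0])
total_flow = [d + s for d, s in zip(dissipative, solenoidal)]

# Divergence of the solenoidal flow Q @ PI @ x is the trace of Q @ PI.
solenoidal_divergence = sum(Q[i][j] * PI[j][i] for i in range(2) for j in range(2))
```

In this toy case the solenoidal component is orthogonal to the surprise gradient and has zero divergence, so it moves the state along contours of constant probability, while the dissipative component points down the gradient.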
The third condition is that the perception-action interface induces a Markov blanket in the NESS probability distribution. In a Markov blanket, internal and external states are independent when conditioned on the blanket states, which are composed of the sensory and active states:
Condition 3
The steady state distribution is described in terms of a Markov blanket, where internal/external states are independent when conditioned on its blanket states.
(5)
It is important to note (see [34]) that Condition 1 and Condition 3 are independent and one does not entail the other. This is an important point, as some presentations of the FEP assume that a perception-action interface directly implies a statistical barrier (Markov blanket) between system and environment, and this is not always the case.
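The independence of the two conditions can be illustrated with a deliberately simplified three-variable chain. This is an assumption made for illustration, not a full perception-action loop and not one of the systems analysed in this article. An external variable y drives a blanket variable b, which in turn drives an internal variable z, with coupling strength k, unit decay and noise covariance 2I. For a Gaussian steady state, y and z are conditionally independent given b exactly when the (y, z) entry of the precision matrix (the inverse covariance) is zero. The function names below are hypothetical.

```python
# Toy feed-forward chain (an illustrative assumption): y drives b, b drives z.
# Dynamics: dx/dt = -J x + noise, noise covariance 2*I, state order (y, b, z).

def coupling_matrix(k):
    """Coupling matrix J with no direct coupling between y and z."""
    return [[1.0, 0.0, 0.0],
            [-k, 1.0, 0.0],
            [0.0, -k, 1.0]]

def chain_covariance(k):
    """Steady-state covariance C, solving J C + C J^T = 2 I entry by entry."""
    c_yy = 1.0
    c_yb = k / 2.0
    c_bb = 1.0 + k * c_yb
    c_yz = k * c_yb / 2.0
    c_bz = k * (c_yz + c_bb) / 2.0
    c_zz = 1.0 + k * c_bz
    return [[c_yy, c_yb, c_yz],
            [c_yb, c_bb, c_bz],
            [c_yz, c_bz, c_zz]]

def lyapunov_residual(J, C):
    """Largest absolute entry of J C + C J^T - 2 I (zero for an exact solution)."""
    n = len(J)
    JC = [[sum(J[i][m] * C[m][j] for m in range(n)) for j in range(n)]
          for i in range(n)]
    return max(abs(JC[i][j] + JC[j][i] - (2.0 if i == j else 0.0))
               for i in range(n) for j in range(n))

def det3(M):
    """Determinant of a 3x3 matrix."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

def precision_yz(C):
    """(y, z) entry of the precision matrix, via the adjugate formula."""
    return (C[0][1] * C[1][2] - C[0][2] * C[1][1]) / det3(C)

k = 1.0
J = coupling_matrix(k)
C = chain_covariance(k)
print(lyapunov_residual(J, C))  # residual of the steady-state equation
print(precision_yz(C))          # non-zero: y and z are not independent given b
```

In this toy chain the numerator of the (y, z) precision entry works out to k⁴/16. The conditional dependence is therefore very small for weak coupling but vanishes only at k = 0, even though the coupling matrix contains no direct interaction between y and z.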
2.1. First move: capturing Bayesian inference with an average flow
The FEP starts by describing the average flows of the external states of a system as following a gradient minimizing a variational free energy. This connects these flows with notions from Bayesian inference.
The principle starts from a description of the ‘surprise’ of the observed blanket states, under the steady-state distribution defined by the (random) dynamics of the system, which is defined as their negative log-probability
(6) |
This implies that highly unlikely states will have a large surprise value and vice versa.
However, a system cannot access this surprise value without complete knowledge of its environment. Thus, Bayesian inference prescribes using an upper bound on this surprise, described by the variational free energy:
(7) |
(8) |
which is composed of the surprise plus a term capturing the distance from the probability of external states given the blanket to a variational model of the environment parametrized by θ.
This free energy constitutes a bound on the surprise, which is exact when the variational distribution is equal to the reference distribution (the distribution of external states given the blanket). The FEP often assumes that q is a normal distribution:
(9) |
where θ represents the most likely states of the system and the remaining parameter of q is its covariance matrix.
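The bounding property of the variational free energy can be checked numerically in a small discrete example. The sketch below is illustrative only: it assumes two possible external states y and a single observed blanket state b, with made-up joint probabilities, and uses the standard identity that the free energy equals the surprise plus the Kullback-Leibler divergence from the variational distribution q to the posterior p(y | b).

```python
import math

# Hypothetical joint probabilities p(y, b) for y in {0, 1} at the observed b.
# The numbers are illustrative assumptions, not taken from the article.
p_joint = {0: 0.3, 1: 0.1}
p_b = sum(p_joint.values())                           # marginal p(b)
surprise = -math.log(p_b)                             # surprise of the blanket state
posterior = {y: p / p_b for y, p in p_joint.items()}  # p(y | b)

def free_energy(q):
    """F = E_q[log q(y) - log p(y, b)] = surprise + KL(q || p(y | b))."""
    return sum(q[y] * (math.log(q[y]) - math.log(p_joint[y]))
               for y in q if q[y] > 0)

f_poor = free_energy({0: 0.5, 1: 0.5})   # a variational guess that is off
f_exact = free_energy(posterior)         # q equal to the true posterior

print(surprise, f_poor, f_exact)
```

With the mismatched q the free energy sits strictly above the surprise, and with q equal to the posterior the two coincide up to floating-point error.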
A simple instance of Bayesian inference can be derived from the assumption that the integral of the negative conditional log-probability (or surprise) in the free energy equation is approximately quadratic in the region near the mode of the conditional density θ. This is called the Laplace approximation [44] and allows approximating
(10) |
where Tr is the trace operator and the Hessian matrix of the marginal probability distribution is taken with respect to y and evaluated at the mode θ.
In this scenario, finding the model that minimizes the variational free energy is equivalent to approximating the distribution of external states given the blanket. From a gradient descent perspective, if we are interested in adjusting parameters θ, this minimization process results from following the negative gradient
(11) |
Under the FEP it is suggested that a system displaying a Markov blanket minimizes the free energy functional by implementing a gradient descent scheme referred to as recognition dynamics [7], [45], [46]. A literal interpretation of this claim involves a dynamics of the variational parameter with the form
(12) |
where γ is a matrix characterizing the rates of adjustment of the variational parameters. This type of dynamics (or a discrete counterpart) is usually proposed by active inference schemes (e.g. [46]). However, in the most recent articles (e.g. [30]) this claim is not taken literally, and the proponents of the FEP suggest that it is only the average flows of the system (not the actual dynamics) that point in the direction of the free energy gradient. Throughout this manuscript, we will explore both a literal interpretation of a gradient descent on the free energy and the case in which the free energy gradient is only connected with the average flows.
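A literal reading of the gradient descent scheme can be sketched for a single scalar parameter. The example assumes a quadratic free energy around a posterior mode, as would follow from a Laplace-style approximation, and integrates the descent with a simple Euler scheme. The function names, the scalar rate gamma and all numerical values are hypothetical choices for illustration.

```python
# Sketch of recognition dynamics for a scalar parameter theta, assuming
# F(theta) = 0.5 * precision * (theta - mode)**2 + const (quadratic free energy).

def free_energy_gradient(theta, mode, precision):
    """Gradient of the assumed quadratic free energy with respect to theta."""
    return precision * (theta - mode)

def descend(theta0, mode, precision, gamma=0.5, dt=0.01, steps=2000):
    """Euler integration of d(theta)/dt = -gamma * dF/d(theta)."""
    theta = theta0
    for _ in range(steps):
        theta += -gamma * free_energy_gradient(theta, mode, precision) * dt
    return theta

theta_final = descend(theta0=0.0, mode=1.5, precision=2.0)
print(theta_final)  # approaches the posterior mode
```

Under these assumptions the parameter relaxes exponentially towards the mode at a rate set by gamma and the precision, which is the behaviour the literal interpretation attributes to the variational parameters.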
The FEP asserts that any system with a Markov blanket partition that reaches a non-equilibrium steady-state (NESS) can be construed as performing an elementary sort of Bayesian inference. This implies that the behaviour of a system can be described by some variable that behaves as in Eq. (12), at least on average. Specifically, the FEP proposes that the evolution of the statistics of internal states can be described in terms of the variational free energy, given a blanket state.
Under Condition 3, given a blanket state at time t, the statistics of internal and external states can be described independently. The FEP proposes to describe changes in internal and external states through variables encoding the most likely2 internal and external states conditioned on the blanket
(13) |
(14) |
Then, the average flow of the system (or marginal flow) conditioned on the blanket can be computed from the SDE decomposition in Condition 2 as
(15) |
The first term in this expression can be related to the gradient of the surprise in Eq. (11) (see below). However, the second and third terms in Eq. (15) preclude a straightforward connection between the average flow of the system and the minimization of the variational free energy. The necessary step in deriving the FEP is removing the solenoidal couplings between blocks of the system, encoded in the matrix Q, to remove these second and third terms. Thus, in order to describe the equivalence between the dynamics of a system and free energy minimization, the FEP assumes that
Assumption 1
Solenoidal couplings between ‘blocks’ of states are precluded when a Markov blanket emerges under sparse coupling [3].
(16)
This leads to
(17) |
Where the approximation is obtained neglecting couplings of order larger than quadratic in the average of the surprise , as prescribed by the Laplace assumption, knowing that the flow of external states is independent of internal states x. The obtained expression is proportional to the gradient in Eq. (11) for , therefore pointing to a direction minimizing the free energy, where the factor represents the rate of adjustment of variational parameters (γ in Eq. (12)).
In a recent work [30], it has been proposed that in the general case Q is only block-diagonal with respect to autonomous and ‘non-autonomous’ states, allowing non-zero solenoidal components within each of these two blocks. However, it is not clear (to our knowledge) how this case can be directly connected with free energy minimization, and this should perhaps be clarified by future work. In any case, the findings in the next sections of this article bring forward similar problems considering one type of block-diagonal matrix or the other.
This step concludes the first move for deriving the FEP, by connecting the average flow of a system with the gradient minimizing a variational free energy functional.
2.2. Second move: linking the average flow with the dynamics of the most likely states
The second move for deriving the FEP involves connecting the average flow in the system with its (averaged) dynamics. This second step starts by assuming a mapping connecting the most likely internal and external states:
Assumption 2
There is a smooth and differentiable function σ that maps between the most likely internal and external states given a blanket state,
(18) and the gradient of σ is invertible (i.e. its inverse exists)
A sufficient requisite for Assumption 2 is that the mapping from blanket states to the most likely internal states is injective [4].
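For intuition, the mapping σ can be written out in the simplest linear Gaussian case. The sketch below assumes scalar external, blanket and internal states $y$, $b$ and $z$ with zero mean and steady-state covariances $C_{yb}$, $C_{zb}$ and $C_{bb}$; these symbols are assumed notation for this example only.

```latex
\begin{align}
  % Most likely external and internal states given the blanket
  \hat{y}(b) &= \frac{C_{yb}}{C_{bb}}\, b, &
  \hat{z}(b) &= \frac{C_{zb}}{C_{bb}}\, b \\
  % Mapping from most likely internal to most likely external states
  \sigma(\hat{z}) &= \frac{C_{yb}}{C_{zb}}\, \hat{z},
  \qquad \text{defined when } C_{zb} \neq 0
\end{align}
```

In this scalar sketch the map from $b$ to $\hat{z}(b)$ is injective exactly when $C_{zb} \neq 0$, and the gradient of σ is invertible when additionally $C_{yb} \neq 0$, so the mapping requires both internal and external states to be correlated with the blanket.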
Once the mapping between internal and external states is defined, the next step, as we anticipated, admits two possible interpretations. The first is an interpretation in which the dynamics of the average states of the system strictly follow a gradient descent on the free energy (in the form of Eq. (12), e.g. [46]). A second interpretation relaxes this view to propose that free energy minimization only takes place on average over counterfactual trajectories (rather than directly). The distinction between the two (see Assumption 3⁎, Assumption 3⁎⁎) has generally not been discussed in detail, but it is of great importance to evaluate the claims of the FEP.
The first interpretation proposes that the dynamics of the most likely states can be described by the gradient on the free energy captured by the average flow described by Eq. (17), which results in:
Assumption 3⁎
The evolution of the most likely external states is similar to their conditional marginal flow given the blanket state
(19)
The star ⁎ symbol in this assumption indicates that this assumption is in general not explicitly stated, and that two competing interpretations are possible. The first interpretation, in which Assumption 3⁎ holds strictly, is supported by verbal descriptions and some mathematical steps in [3] and [4] (specifically the equivalents of Eq. (20) and (21) below³ and several verbal descriptions⁴). The alternative interpretation of this assumption relaxes the equivalence between and (see Assumption 3⁎⁎ below).
Taken together, these three assumptions let us derive a connection between the evolution of the most likely states and the gradient of the variational free energy:
(20) |
This has an important implication, as it allows us to derive that behaves as the variational inference parameter θ in Eq. (12). Therefore, the dynamics of a system can be described as if performing variational inference.
This is possible because the conditional synchronisation manifold (Assumption 2) allows deriving a mapping between the evolution of the most likely states through the chain rule:
(21) |
Finally, under Assumption 3⁎, the dynamics of the most likely internal states can be described as minimizing the variational free energy about external states
(22) |
which corresponds to following the gradient descent on the variational free energy described by Eq. (12), now rewritten in terms of the most likely internal states.
Here, we shall note that, in general, Assumption 3⁎ does not hold for most dynamical systems with stochastic fluctuations, as it equates the rate of change of an average with the average of the rate of change. The proponents of the FEP in recent works offer instead a more relaxed interpretation of the gradient descent on the variational free energy. This interpretation proposes that the FEP applies just to the marginal flows, and thus a system behaves ‘as if’ performing Bayesian inference just on average. For example in [30] the authors propose that ‘the interpretation in terms of Bayesian inference emerges only in expectation — or on average […] The classical example here is the averaging of multiple responses to sensory perturbations, when characterizing evoked responses in internal states’. This results in the substitution of Assumption 3⁎ by
Assumption 3⁎⁎
If the conditional average flows follow the direction of a descending gradient of a variational free energy, the behaviour of the states of the system can be interpreted ‘as if’ they were, on average, performing a gradient descent or minimizing a free energy functional.
In general, it is not easy to tell whether Assumption 3⁎ or Assumption 3⁎⁎ is being considered in the FEP literature, as verbal descriptions sometimes refer to the dynamics of the most likely states and to average flows interchangeably. Note also that some important steps, like the chain rule in Eq. (21), can only be derived under the interpretation promoted by Assumption 3⁎. Regardless of the interpretation, under these assumptions the proponents of the FEP conclude that the dynamics of the most likely internal and external states can be described as following a negative free energy gradient, which is equivalent to stating that they evolve as if performing Bayesian inference. Moreover, the FEP proposes that this can be extended to the dynamics of actions a [4], deriving the principle of active inference [10], [12].
The two moves described here make important assumptions about the underlying dynamical systems used to derive the theory. The first move assumes a very specific statistical structure in which a gradient of the free energy functional is directly connected to average flows in the system, without justifying to what extent this structure can be expected in the classes of systems that capture the properties of biological systems. The second move makes further assumptions to justify that average flows (i.e. the average of the rate of change) in the system are informative about its behaviour and dynamics (i.e. the rate of change of the average), supporting an interpretation that describes the behaviour of a system as if performing Bayesian inference. In the next sections, we will see that the steps for deriving this interpretation are not justified for the non-equilibrium linear systems studied in this paper.
In the rest of the document, we will take Conditions 1 and 2 for granted and explore the generality of the other conditions and assumptions in linear stochastic systems with weak couplings. We will see that Assumption 2 holds under special conditions and can be expected for a given class of systems. In contrast, we show that Condition 3 and Assumption 1 only hold for very specific sensorimotor loops, and that Assumptions 3⁎ and 3⁎⁎ do not hold in general, threatening the viability of the FEP and its applicability to most living systems.
3. Mathematical review of the FEP under linear stochastic dynamics
In order to explore the assumptions enumerated above in a class of tractable non-equilibrium dynamical systems, we restrict our analysis to the class of systems captured by a linear Langevin dynamics (which can be seen as an approximation of an Ornstein-Uhlenbeck process [47]) defined by
(23) |
where J is an invertible real matrix, ρ is an n dimensional real vector, is a standard n-dimensional Gaussian white noise with a diagonal covariance matrix . The linearity of the process guarantees that the model will eventually result in a Gaussian distribution.
The solution of this system (see Appendix B) in the non-equilibrium steady state (NESS) takes the form of a multivariate Gaussian distribution [47], [48] with statistical moments:
(24) |
(25) |
where can be found numerically by solving the above continuous Lyapunov equation. If J is symmetric, the steady state of the system is a state of equilibrium with . However, the FEP focuses instead on NESS, which are more appropriate for describing living systems.
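As a numerical illustration of how such steady-state moments can be obtained, the sketch below solves a continuous Lyapunov equation for a small linear system. It assumes a generic convention that may differ in sign or scaling from Eq. (23)–(25): a stable drift matrix `A`, a noise covariance `D`, dynamics of the form dz = A z dt + noise, and a stationary covariance S satisfying A S + S Aᵀ + D = 0. The helper `stationary_cov` and the example matrices are hypothetical and chosen only for illustration.

```python
import numpy as np

def stationary_cov(A, D):
    """Solve A @ S + S @ A.T + D = 0 for S (assumed convention, not the paper's Eq. (25))."""
    n = A.shape[0]
    I = np.eye(n)
    # Row-major vectorisation turns S -> A S + S A^T into a linear operator.
    K = np.kron(A, I) + np.kron(I, A)
    S = np.linalg.solve(K, -D.reshape(-1)).reshape(n, n)
    return 0.5 * (S + S.T)  # remove round-off asymmetry

D = np.eye(3)  # homogeneous noise with unit variance (assumption)

# Symmetric drift: the steady state is an equilibrium and S = -inv(A) / 2.
A_sym = np.array([[-1.0, 0.2, 0.0],
                  [0.2, -1.0, 0.3],
                  [0.0, 0.3, -1.0]])
S_sym = stationary_cov(A_sym, D)

# Asymmetric drift: a non-equilibrium steady state with no such closed form.
A_asym = np.array([[-1.0, 0.2, 0.0],
                   [0.0, -1.0, 0.3],
                   [0.4, 0.0, -1.0]])
S_asym = stationary_cov(A_asym, D)

# Residual of the Lyapunov equation for the asymmetric case.
residual = A_asym @ S_asym + S_asym @ A_asym.T + D
```

The Kronecker-product solve is adequate for small systems; for larger ones a dedicated Lyapunov solver would be the more usual choice.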
For studying the NESS of the system, in this article we explore the case in which non-diagonal couplings in J are small, although we explore the effect of considering higher orders of this approximation. Thus, we define the coupling matrix as, , where are assumed to be small.
This leads to the following power series expansion,
(26) |
To further simplify things, we also define a homogeneously distributed noise , being ς a scalar constant. The details of the derivation of the equations above are described in Appendix B.
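The weak-coupling expansion can be checked numerically. The sketch below is an illustration under assumed conventions (drift `A = -I + eps * C`, unit homogeneous noise, covariance solving A S + S Aᵀ + I = 0), for which the first terms of the series work out to S ≈ I/2 + eps (C + Cᵀ)/4 + eps² (C² + 2 C Cᵀ + (Cᵀ)²)/8. These coefficients follow from that assumed convention and need not match those of Eq. (26); the matrix `C` is arbitrary.

```python
import numpy as np

def stationary_cov(A, D):
    """Solve A @ S + S @ A.T + D = 0 for S (assumed convention)."""
    n = A.shape[0]
    I = np.eye(n)
    K = np.kron(A, I) + np.kron(I, A)
    S = np.linalg.solve(K, -D.reshape(-1)).reshape(n, n)
    return 0.5 * (S + S.T)

eps = 0.01  # coupling strength (small)
C = np.array([[0.0, 0.5, 0.0],
              [-1.0, 0.0, 0.3],
              [0.8, 0.0, 0.0]])  # arbitrary off-diagonal couplings
A = -np.eye(3) + eps * C

S_exact = stationary_cov(A, np.eye(3))

# Truncated power series in eps, derived for the convention above.
S_first = 0.5 * np.eye(3) + eps * (C + C.T) / 4
S_second = S_first + eps**2 * (C @ C + 2 * C @ C.T + C.T @ C.T) / 8

err_first = np.abs(S_exact - S_first).max()    # expected to scale as eps**2
err_second = np.abs(S_exact - S_second).max()  # expected to scale as eps**3
```

Comparing `err_first` and `err_second` for a few values of `eps` gives a quick check of how far the weak-coupling regime extends for a given coupling matrix.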
3.1. Can we expect the requirements for deriving the FEP in living systems?
To explore the generality of the conditions required to connect the average flow of a system to the gradient of the free energy, here we explore Condition 3 and Assumption 1, which are required to derive the first move of the FEP.
Condition 3 : Markov blanket
First, the FEP requires the existence of a Markov blanket imposing a conditional independence between internal and external states given a blanket state (Condition 3, [3], [4]). We will see that not all linear systems meet this condition, although it can be considered as an approximation in the case of very weak couplings.
For systems represented by Gaussian distributions, the Markov blanket condition (Condition 3) is only met when the inverse of the covariance, the Hessian matrix , satisfies,
(27) |
This raises two questions: how common are Markov blankets, and when can we expect to find them in living systems? The FEP generally proposes that the theory holds when the Markov blanket condition is satisfied under a ‘canonical flow constraint’ [30] that is defined in our linear system by structured as in Fig. 4.A. This means that, mechanistically, external and sensory states do not depend on internal states, and that action and internal states do not depend on external states. This is representative of the kind of asymmetries we expect from a sensorimotor loop in living systems. However, even in linear systems, it is not easy to know under what conditions a system can both satisfy the canonical flow constraint and possess a Markov blanket. As [34] points out, neither one of these conditions is sufficient to guarantee the other.
In the case of weak couplings where , and homogeneous noise , the covariance of the system can be expanded as Eq. (26), and the inverse covariance (Hessian) can be computed as a Neumann series (Appendix B),
(28) |
Under the couplings determined by Condition 1 we can see that for a first order approximation , satisfying the Markov blanket condition.
For a second order weak coupling approximation we have,
(29) |
Under the canonical flow constraints, relatively few systems will display an exact Markov blanket, except for combinations of parameters that happen to cancel the terms in the equation above. One exception is systems with weak couplings under circular loops (Fig. 4.B) or systems with two layers of blanket states (e.g. the system in Fig. 4.C). Note that this is because cycles generating conditional couplings between are of order higher than 2. In general, these cases will not display a Markov blanket for stronger couplings (see Eq. (B.15), (B.16), except for perfectly symmetric couplings). Thus, we can conclude that Markov blankets will emerge only for particular combinations of parameters, as cycles in the system will in general introduce couplings preventing their existence.
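The following sketch illustrates the argument numerically for a four-variable system ordered as (external y, sensory s, active a, internal x). It rests on several assumptions that are not taken from the paper's equations: the drift is `A = -I + eps * C` with unit noise, the canonical flow constraint is read as "rows y and s have no entry in column x, rows a and x have no entry in column y", and the second-order term of the precision matrix is (CᵀC − CCᵀ)/2, which is what this convention yields. The coupling values are arbitrary.

```python
import numpy as np

def stationary_cov(A, D):
    """Solve A @ S + S @ A.T + D = 0 for S (assumed convention)."""
    n = A.shape[0]
    I = np.eye(n)
    K = np.kron(A, I) + np.kron(I, A)
    S = np.linalg.solve(K, -D.reshape(-1)).reshape(n, n)
    return 0.5 * (S + S.T)

def precision(C, eps):
    """Inverse stationary covariance (Hessian) for drift -I + eps * C, unit noise."""
    A = -np.eye(C.shape[0]) + eps * C
    return np.linalg.inv(stationary_cov(A, np.eye(C.shape[0])))

Y, S_, A_, X = 0, 1, 2, 3  # hypothetical index names: external, sensory, active, internal
eps = 1e-3

# All couplings allowed by the (assumed) canonical flow constraint; C[i, j] is j -> i.
C_generic = np.zeros((4, 4))
C_generic[Y, S_], C_generic[Y, A_] = 0.5, 0.8
C_generic[S_, Y], C_generic[S_, A_] = 1.0, -0.4
C_generic[A_, S_], C_generic[A_, X] = 0.3, 0.9
C_generic[X, S_], C_generic[X, A_] = 0.7, 0.6

# Circular loop y -> s -> x -> a -> y only.
C_loop = np.zeros((4, 4))
C_loop[S_, Y], C_loop[X, S_], C_loop[A_, X], C_loop[Y, A_] = 1.0, 0.7, 0.9, 0.8

H_generic = precision(C_generic, eps)
H_loop = precision(C_loop, eps)

# Second-order prediction of the internal-external block for the generic case.
H2 = 0.5 * (C_generic.T @ C_generic - C_generic @ C_generic.T)
predicted_xy = eps**2 * H2[X, Y]

print("generic H_xy:", H_generic[X, Y], "predicted:", predicted_xy)
print("loop    H_xy:", H_loop[X, Y])
```

In this sketch the internal-external entry of the precision matrix is of order eps² for the generic canonical structure, so the Markov blanket holds only to first order, while for the circular loop the second-order term cancels as well.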
Assumption 1 : solenoidal coupling
The FEP requires that the averaged marginal flow of external states y, given a blanket state b, depends only on the gradients of its marginal density. For this, it is a requirement that there is no solenoidal coupling between external and other states (Assumption 1). We will see in this section that most linear systems will not meet this condition.
We can rewrite Eq. (25) to express the matrix of solenoidal couplings Q as (see B.2 or [34])
(30) |
Again, the values of Q can be obtained by solving the corresponding continuous Lyapunov equation. Assuming , this matrix can be expressed as the power series
(31) |
The first order approximation under weak coupling and and the canonical flow constraints (Condition 1, Fig. 4.A) results in the solenoidal coupling matrix
(32) |
Where Assumption 1 is not met for most parameter combinations.
In the best-case scenario, we can make many terms in the matrix above disappear by making blocks symmetric when possible. This however does not completely remove non-diagonal blocks, and leaves
(33) |
Thus we observe that, even in a very weakly coupled system, the only way of setting and their antisymmetric counterparts to zero is to effectively decouple some parts of the sensorimotor loop completely, which results in a ‘symmetric’ interaction loop with the form displayed in Fig. 4.C. In this type of system, detailed balance is only broken by interactions inside blocks , as all couplings between blocks are symmetric. Therefore, the system is driven out of equilibrium only by the internal tendencies of these blocks, not by the interactions between them. This precludes for example the existence of asymmetric agent-environment interactions, which may be crucial for many living processes as a mechanism for generating qualitative differences between the inside and the outside of a system, as well as for regulating exchanges of matter and energy with the environment [35], [37], [38], [39].
For a second order approximation, we find the solenoidal coupling terms between external and autonomous states:
(34) |
(35) |
(36) |
(37) |
We can see that this matrix will in general only be block diagonal in very specific cases, exemplifying how the presence of higher-order terms complicates the uncoupling of solenoidal terms.
These results show how, in the case of linear, weakly-coupled systems, the assumptions about the statistical structure of a system required by the FEP are restricted to a very narrow space of parameters (i.e. values of C). In particular, removing solenoidal couplings between blocks of the system requires a highly symmetric coupling structure (Fig. 4.C). This prevents the application of the theory to many common structures found in biological systems.
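The role of coupling symmetry in the solenoidal matrix can be illustrated with a short sketch. It assumes, rather than reproduces, the paper's conventions: drift `A = -I + eps * C`, unit noise, dissipative part Γ = I/2, and a solenoidal matrix defined through A = (Q − Γ) S⁻¹, so that Q = A S + I/2. Under these assumptions Q ≈ eps (C − Cᵀ)/4 to first order, i.e. Q picks out the antisymmetric part of the couplings. Variable ordering and coupling values are hypothetical.

```python
import numpy as np

def stationary_cov(A, D):
    """Solve A @ S + S @ A.T + D = 0 for S (assumed convention)."""
    n = A.shape[0]
    I = np.eye(n)
    K = np.kron(A, I) + np.kron(I, A)
    S = np.linalg.solve(K, -D.reshape(-1)).reshape(n, n)
    return 0.5 * (S + S.T)

def solenoidal(C, eps):
    """Q = A S + I/2 for drift A = -I + eps * C and unit noise (assumed definition)."""
    n = C.shape[0]
    A = -np.eye(n) + eps * C
    return A @ stationary_cov(A, np.eye(n)) + 0.5 * np.eye(n)

Y, S_, A_, X = 0, 1, 2, 3  # external, sensory, active, internal (hypothetical ordering)
eps = 1e-3

# Asymmetric sensorimotor loop under the (assumed) canonical flow structure.
C_asym = np.zeros((4, 4))
C_asym[Y, S_], C_asym[Y, A_] = 0.5, 0.8
C_asym[S_, Y], C_asym[S_, A_] = 1.0, -0.4
C_asym[A_, S_], C_asym[A_, X] = 0.3, 0.9
C_asym[X, S_], C_asym[X, A_] = 0.7, 0.6

# Same loop with the external block symmetrised: y <-> s symmetric, a -> y removed.
C_sym = C_asym.copy()
C_sym[Y, S_] = C_sym[S_, Y]
C_sym[Y, A_] = 0.0

Q_asym = solenoidal(C_asym, eps)
Q_sym = solenoidal(C_sym, eps)

print("external row of Q, asymmetric loop:", Q_asym[Y, 1:])
print("external row of Q, symmetrised loop:", Q_sym[Y, 1:])
```

In this sketch the external row of Q only vanishes (to first order) once the action-to-environment coupling has been removed, which is the kind of decoupling of the sensorimotor loop discussed above.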
3.2. Can the FEP explain and describe the behaviour of living systems?
Above, we have shown that the conditions for connecting the conditional average flow with the gradient of the free energy functional only hold for a very narrow class of systems. Now we explore, in the cases where the previous requirements are met, the connection between the conditional average flow and the behaviour of the most likely states of the system. That is, are the results of the FEP able to describe, explain or predict the dynamics of the systems that conform to its assumptions? Here we will see that, while Assumption 2 holds for a broad class of linear systems, one of the major findings of our study is that Assumptions 3⁎ and 3⁎⁎ present important issues that prevent the results of the FEP from being good descriptions of the behaviour of stochastic dynamical systems.
Assumption 2 : Mapping between internal and external statistics
A further requirement of the FEP is that a mapping σ exists between internal and external states (Assumption 2). We will see that in linear systems, at least, there are many systems that meet this condition.
Given a NESS well characterized by Gaussian distributions over states , the conditional distributions given a particular blanket state result in the most likely states (Eq. (13) and (14)) being described as
(38) |
(39) |
where is a generalized inverse matrix (which is equivalent to the inverse for nonsingular matrices). If and are nonsingular (this implies that ), we can derive the linear mapping,
(40) |
which is invertible if is non-singular (for this is required)
(41) |
Yielding the evolution of the most likely states as,
(42) |
This shows that, in general, a mapping between the most likely internal and external states exists if the corresponding covariance submatrices are invertible.
The FEP often states that the existence of this mapping is a consequence of a Markov blanket [4], but we note that the existence of such a mapping is independent of Conditions 1 and 3. The existence of a Markov blanket implies a conditional covariance , but this does not affect the relation between internal and external states. Thus, in linear systems any variable can potentially mediate an invertible mapping, independently of a Markov blanket, if the corresponding covariance submatrices are nonsingular.⁵
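A minimal sketch of this point, under assumptions that are ours: a zero-mean Gaussian with an arbitrary positive-definite covariance (no Markov blanket imposed), blocks of equal dimension for external, blanket and internal states, and most likely states given by the usual Gaussian conditional means. The index sets and the names `sigma_map` and `most_likely_given_blanket` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
G = rng.standard_normal((6, 6))
Sigma = G @ G.T + np.eye(6)  # arbitrary positive-definite covariance, zero mean

y, b, x = [0, 1], [2, 3], [4, 5]  # hypothetical index sets: external, blanket, internal
S_yb = Sigma[np.ix_(y, b)]
S_xb = Sigma[np.ix_(x, b)]
S_bb = Sigma[np.ix_(b, b)]

def most_likely_given_blanket(S_zb, blanket):
    """Conditional mean (= mode) of a zero-mean Gaussian block z given the blanket."""
    return S_zb @ np.linalg.solve(S_bb, blanket)

# Linear map from most likely internal to most likely external states;
# it exists whenever S_xb is nonsingular.
sigma_map = S_yb @ np.linalg.inv(S_xb)

blanket = rng.standard_normal(2)
x_star = most_likely_given_blanket(S_xb, blanket)
y_star = most_likely_given_blanket(S_yb, blanket)

print("sigma(x*):", sigma_map @ x_star)
print("y*       :", y_star)
```

Because `Sigma` is generic, its inverse has no reason to have a vanishing internal-external block, yet the map between most likely states is still well defined.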
Assumption 3⁎ : The dynamics and Bayesian inference
The final assumption for deriving the FEP is that the evolution of the most likely external states is linked with the evolution of the marginal flow conditioned on blanket states . In this section, we see that this assumption does not hold in general.
The time derivative of yields,
(43) |
Conversely, the equation proposed by the FEP uses the marginal flow (Assumption 3⁎, see [4] or Appendix B in [3]). We represent the dynamics of a variable driven by this marginal flow as
(44) |
where captures the evolution of the most likely dynamics in a system behaving in a way that strictly minimizes the free energy. The difference between the behaviour of and allows us to evaluate how informative the FEP is about the behaviour of a system. If a system approximately follows a free energy gradient, the behaviour of these two variables will present some similarities in their evolution. This approximation is crucial for the FEP as it is this marginal flow that is connected with the gradient of the free energy (therefore equating dynamics and inference, see Assumption 3⁎). By using Eq. (39) this expression can be rewritten simply as,
(45) |
We can thus see that these two equations represent quite different quantities and that, in general, Assumption 3⁎ will not hold for linear systems.
Furthermore, we can show that this equivalence is also incorrect even in the case of weak couplings. Specifically, for the first order weak coupling approximation with and . Assuming is nonsingular (i.e. ), the weak coupling expansion of its inverse according to the Neumann series (other expansions exist in the case of generalized inverses [50]) is,
(46) |
Using the weak-coupling approximation of (Eq. (26)), Eq. (43) results in
(47) |
In contrast, the marginal flow in Eq. (44) results in
(48) |
which not only ignores the random fluctuations term, but also reverses the sign of the term . Note that when we force solenoidal uncoupling (Assumption 1, see Section 3.1), making in that case (while is non-zero).
For this weak coupling approximation, the only case in which the approximation is valid is one in which (a deterministic system) and (a system where the agent just observes the environment without affecting it). Similar expressions could be derived for higher order approximations.
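The mismatch between the two quantities can be made concrete with a noise-free comparison. The sketch below does not reproduce Eq. (43)–(48); it assumes a drift `A = -I + eps * C` with unit noise, a single external (y), two blanket (s, a) and one internal (x) variable, and compares two linear maps acting on the blanket state: (i) the conditional expectation of the rate of change of the most likely external state, and (ii) the marginal flow of the external state given the blanket. Under these assumptions the two differ, to first order, by `eps` times the blanket-to-external couplings.

```python
import numpy as np

def stationary_cov(A, D):
    """Solve A @ S + S @ A.T + D = 0 for S (assumed convention)."""
    n = A.shape[0]
    I = np.eye(n)
    K = np.kron(A, I) + np.kron(I, A)
    S = np.linalg.solve(K, -D.reshape(-1)).reshape(n, n)
    return 0.5 * (S + S.T)

y, b, x = [0], [1, 2], [3]  # hypothetical index sets: external, blanket (s, a), internal

def flow_gap(C, eps):
    """Marginal flow of y given b minus expected rate of change of y*(b), as maps on b."""
    A = -np.eye(4) + eps * C
    Sigma = stationary_cov(A, np.eye(4))
    S_bb_inv = np.linalg.inv(Sigma[np.ix_(b, b)])
    M = Sigma[np.ix_(y, b)] @ S_bb_inv  # y*(b) = M b
    N = Sigma[np.ix_(x, b)] @ S_bb_inv  # x*(b) = N b
    # (i) d/dt of y*(b) = M db/dt, averaged over y, x and noise given b.
    rate_of_mean = M @ (A[np.ix_(b, b)] + A[np.ix_(b, y)] @ M + A[np.ix_(b, x)] @ N)
    # (ii) average of dy/dt over x and y given b (the marginal flow).
    mean_of_rate = A[np.ix_(y, y)] @ M + A[np.ix_(y, b)] + A[np.ix_(y, x)] @ N
    return mean_of_rate - rate_of_mean

eps = 1e-3
C = np.zeros((4, 4))  # C[i, j] is the coupling j -> i; values are arbitrary
C[0, 1], C[0, 2] = 0.5, 0.8   # blanket -> external
C[1, 0], C[1, 2] = 1.0, -0.4
C[2, 1], C[2, 3] = 0.3, 0.9
C[3, 1], C[3, 2] = 0.7, 0.6

gap_coupled = flow_gap(C, eps)

# "Observation only": the blanket no longer influences the external state.
C_observer = C.copy()
C_observer[0, 1] = C_observer[0, 2] = 0.0
gap_observer = flow_gap(C_observer, eps)

print("gap with blanket -> external couplings:", gap_coupled)
print("gap for a pure observer:", gap_observer)
```

Even before adding the fluctuation term, the two maps only agree at this order when the external state is not influenced by the blanket, in line with the case of an agent that observes its environment without affecting it.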
As an example of the dissimilarity of these quantities, in Fig. 5.A we display and for arbitrary parameters structured as the sensorimotor loop described in Fig. 4.C with random couplings (note that some weights are set to zero and others forced to be symmetric) and . We see that cannot capture the structure in . The reason behind this is that the derivative of accumulates an error from fluctuations in the system that are not captured, displaying a random walk behaviour that is absent in the real variable . Similarly, in the most favourable case, setting a very small noise and (Fig. 5.B), the situation is similar, as even very small noise terms are sufficient for driving the two terms apart due to the integration of random fluctuations over time.
Assumption 3⁎⁎ : A way out? Problems with interpreting behaviour as Bayesian inference only ‘on average’
The results of the previous section suggest that the FEP could be, in practice, inapplicable for describing or explaining the behaviour of living systems. Some of the most recent works on the FEP [30] try to avoid the problems described above and state that the free energy principle describes just marginal flows, that is, it describes the behaviour of a system just on average over different trajectories. This could appear to circumvent some of the issues presented in the previous section, as it implies substituting Assumption 3⁎ by a more relaxed interpretation of a gradient descent on the free energy described by Assumption 3⁎⁎. However, on closer inspection, we find that the situation is not improved by this claim. Assumption 3⁎⁎ entails two problems: 1) the mapping described by Assumption 2 no longer connects internal and external flows, and 2) the conditional average flow following the gradient of the free energy does not guarantee a gradient descent in the effective behaviour. While the first problem can be solved, the second has deep implications, as in most cases the average flow does not describe the true behaviour of a system in the presence of stochastic fluctuations, therefore threatening to render the FEP inapplicable to describe most living systems. In this last section, we briefly explore the validity of the theory when applied only to the average flows of a system (and not the evolution of the most likely states).
A first, practical problem implied by this claim is that if the FEP only applies to marginal flows , then a new mapping between flows is required. As we described in Eq. (21), a mapping between the dynamics of the most likely states is derived from the chain rule as the gradient of the mapping σ. If instead one interprets the FEP as connecting the conditional average flow of external and internal states a new mapping is required:
(49) |
In general, this mapping can take complicated forms, and often a unique mapping will not exist. In linear systems, however, we can simplify the marginal flows
(50) |
(51) |
Which, if is invertible, yields
(52) |
that results in the mapping ϕ being very different from . Thus, in general,
(53) |
which contradicts [3], [4]. An interpretation of the FEP over conditional average flows should therefore replace Assumption 2 with a new mapping. This result, in combination with a block-diagonal Q (Assumption 1), allows rewriting and pointing in the direction of free energy minimization, mediated by a mapping ϕ.
However, there is an important conceptual problem that remains even if a new mapping is derived. The FEP relies on finding a variable that behaves as a gradient descent on the free energy functional as described in Eq. (12). If Assumption 3⁎ is not met, then there is no variable in the system that behaves following a gradient descent on the free energy (i.e. a variable with behaviour determined by ). In this case, this assumption could be relaxed to Assumption 3⁎⁎ but, as we observed in the results and simulations above, even when the average flow of the system is related to the free energy (i.e. ) this is not a good description of the behaviour of a system.
In this case, the claim of the FEP, if all other assumptions hold, can be described by Eq. (17), which relaxes the requirement of a strict gradient descent (described by Eq. (20)) and requires instead that the conditional average flow points in the direction of the gradient of the free energy. The problem with this relaxed gradient descent interpretation is that it does not guarantee that a system effectively performs a gradient descent, not even on average. In Appendix C we illustrate this issue in a simple bivariate linear stochastic model with variables . We observe that this particular model presents a global attractor located at with solenoidal flows in the form of a spiral flow (Fig. C.2.A). In contrast, the conditional average flow suggests a monotonic gradient ascent on , dismissing solenoidal flows in the system and transforming an attracting flow into a repelling one. This is a simple example showing how, in general, the conditional average flows do not describe the behaviour of the system. Moreover, conditional average flows can be misleading and not even a good approximation of the average behaviour of a system, indicating a gradient ascent/descent on some quantity when the system in fact does the opposite. This is also exemplified in recent work simulating linear systems, where the free energy gradient only captures attracting tendencies at highly surprising states, not capturing solenoidal flows nor behaviour near the NESS global attractor [49], [43].
To summarize, our results show that, in general, even if a relaxed version of the assumptions of the FEP holds, a system cannot be interpreted as if performing Bayesian inference over external states. The reason behind this is that the average flows and do not describe the behaviour of the system, as can easily be seen from the results in Fig. 5. Intuitively, substituting the true flow – – by an average flow fixing the blanket state – – decouples the trajectory of y from its previous state, which in most dynamical systems will result in an impoverished description, not capturing its real, history-dependent, behaviour.
4. Conclusion
The latest formulation of the free energy principle [3], [4] states that, in any dynamical system equipped with a Markov blanket, the flow of internal states can be construed as a gradient ascent on Bayesian model evidence. This assumption rests on two crucial moves. The first move connects the system's average flow with a gradient on a variational free energy. This connection relies on the existence of a Markov blanket and the emergence of a particular statistical structure precluding solenoidal couplings. The second is the interpretation that this relation between free energy and an average flow results in systems behaving as if performing variational inference over the states of its environment. In this review, we have summarized crucial steps required for this claim (Conditions 1–3 and Assumption 3⁎⁎, Assumption 3⁎) and have shown that several of these conditions cannot be met in general by linear, weakly-coupled stochastic systems.
The first step compels a discussion about the generality of the FEP. That is, if the principle requires a particular statistical structure, how general is this structure, and how broadly can we expect it to be present among the class of systems capturing the properties of living and cognitive processes? We discover that, in the class of linear systems explored, the answer to this question is that the statistical structure required by the FEP only arises in a very narrow class of systems, requiring stringent conditions such as fully symmetric agent-environment interactions that we cannot, in general, expect from living systems [35], [37], [38], [39]. The generality of the FEP has been questioned in the past due to conceptual issues [51], [33] or the existence of counterexamples challenging the idea that perception-action interfaces, Markov blankets and solenoidal decoupling follow from each other [34]. However, to our knowledge, our study is the first that shows that the assumptions of the FEP do not hold for a vast class of systems, namely, linear, weakly coupled systems, except for the limited case of fully symmetric agent-environment interaction.
This is concerning for two reasons. First, the FEP is designed for Gaussian (i.e. in most cases linear) stochastic systems [4], [34]. Thus, our results would imply that, as currently defined, the FEP cannot be fully implemented in a broad set of systems from the class it was designed for. Second, one could hope that the introduction of strong couplings or non-linearity allows some specific systems to meet the required conditions. Some recent approximations of the behaviour of chaotic oscillators point in this direction [42]. However, this claim should be regarded with some scepticism, as the introduction of stronger or higher-order couplings would result in additional terms in the expansions explored in this article (see Appendix B). For most parameter settings, systems with stronger couplings will not result in the independence relations required for Condition 3 and Assumption 1. This will lead, in general, to a more significant divergence between the evolution of the system and its average flow, making it more unlikely that Assumptions 3⁎ or 3⁎⁎ will hold. We leave it to future work to explore the accuracy of this claim and investigate whether the assumptions of the FEP can be met beyond the class of systems explored here, e.g. displaying strong or nonlinear couplings.
The second step concerns how informative the FEP is about the behaviour of an agent. The FEP justifies that any system can be described as if performing variational inference through the existence of a conditional synchronization manifold relating the direction of the free energy gradient and the evolution of the most likely states of a system. We observe that this manifold can exist in a broad class of systems. Nevertheless, assuming a strict gradient descent interpretation (Assumption 3⁎), it is problematic to connect the evolution of the most likely states to the average flow of a system (and, in consequence, to assume a system will behave as if minimizing the variational free energy). The problem lies in implicitly treating the rate of change of the average (expected) state as being described by the average flow (the expectation of the rate of change) conditioned on a blanket state. If, instead, we consider a more relaxed interpretation of the free energy gradient descent (i.e. just taking place on average, Assumption 3⁎⁎), new problems arise. First, a new mapping between the flows of internal and external states is required. Second, we observe that the average flow cannot, in general, describe the true behaviour of a system. In sum, the FEP as it stands does not do justice to the influence of the system's trajectory in determining its future behaviour. The reason behind this is that the gradient of the free energy defined by the principle is computed for the average of an ensemble of trajectories. Thus, even when the free energy gradient can be connected with an average flow (which, as we have shown, happens under very specific conditions), this is mainly uninformative about the behaviour of a system subject to stochastic interactions. This is especially relevant for emerging discussions about the compatibility of the FEP with enactive and autopoietic theories of cognition (e.g. [52], [53], [54]).
Specifically, enactive principles stress the history-dependence of living systems, and this supposes a fundamental incompatibility with the assumptions of the FEP [55]. In particular, enactive views of cognition conceive sense-making as a process emerging from the history of interactions of a system, which is invisible for a gradient of the free energy described as an average flow.
The motivation behind the FEP aims to connect ideas from variational inference with the dynamics of complex, self-organizing systems. This claim is exceptionally appealing, as it could potentially allow applying the machinery of Bayesian and information theoretical approaches to describe many systems that are intractable in practice. However, by inspecting the theory and its assumptions in the context of a broad class of analytically tractable models, we discover that many of the steps required to derive the theory do not straightforwardly follow or present significant conceptual problems that need to be resolved. This finding illustrates the difficulties in developing a theory of life and cognition over interdependent sets of mathematical assumptions, and how testing these assumptions and their relations against tractable models can help overcome these difficulties.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgements
We are grateful to Karl Friston, Lancelot Da Costa, Thomas Parr, Iñigo Arandia-Romero and Hideaki Shimazaki for their constructive feedback and comments on this manuscript. M.A. is thankful to Martin Biehl for helpful discussions about references [3], [4], [30]. We are also grateful to Manuel Baltieri for his open peer-review of a preprint of this manuscript.
M.A. was funded by the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement 892715 and supported in part by the Basque Government project IT 1228-19 and the Spanish Ministry of Science and Innovation project PID2019-104576GB-I00. C.L.B. is supported by BBSRC grant BB/P022197/1.
Communicated by J. Fontanari
Footnotes
Recent works have relaxed this assumption, considering the case of state-dependent Γ and Q matrices [42], [43]. However, these changes do not materially affect our critique (which uses systems where these matrices are independent of the state of the system) and we focus on the state-independent case here.
Choosing the arg max function can imply problems in some cases (e.g. if it is non-differentiable), and in some cases the expectation has been proposed as an alternative statistic. However, in the case of Gaussian systems these two functions are equivalent, so this does not affect the conclusions in this article.
In these works, the same symbol is used to represent and in Eq. (20) and (21) (Eq. 3.3 and 3.5 in [4], Eq. 8.22 and 8.23 in [3]), therefore making the implicit assumption that they are approximately the same quantity. Also, it should be noted that Eq. (22) can only be obtained under the combination of Assumption 2 and Assumption 3⁎, as it combines properties of the mapping σ (related to the most likely states) with the gradient of the free energy (related to the average flows).
In particular, sentences about the variable in these works, such as ‘the dynamics of the internal mode as a gradient flow on the surprisal of the external mode’ ([3], p. 97) and ‘the rate of change of the most likely internal states’ ([4], p. 7).
At the moment of writing this manuscript, we have found that a similar result has been found independently in unpublished work [49].
Appendix A. Mathematical definitions of main concepts of the free energy principle
Perception-action partition. The FEP assumes that the system can be decomposed into external, sensory, active and internal states, , configured as a perception-action loop reflecting an interface mediating between ‘autonomous’ states (active and internal states ) and ‘non-autonomous’ states (external and sensory states ). This leads to describing the evolution of the system as
(A.1) |
(A.2) |
NESS and solenoidal flows. The FEP assumes that the system will reach a non-equilibrium steady state described by the probability density function , which can be described using an SDE decomposition that separates the flow into dissipative () and solenoidal () components
(A.3) |
(A.4) |
The FEP imposes requirements on the solenoidal flow matrix Q: it assumes that Q is state-independent (i.e. Q does not change with z) and sparse, in the sense that couplings between some states (e.g. internal and external) are zero (see 1).
Markov blanket. The steady-state distribution of the system is described in terms of a Markov blanket, such that internal and external states are conditionally independent given the blanket states.
(A.5) |
Conditional synchronisation manifold. The FEP assumes that there is a smooth and differentiable function, σ, which maps between the most likely internal and external states given a blanket state,
(A.6) |
and the gradient is invertible (i.e. exists). The FEP generally refers to this mapping as a conditional synchronisation manifold, and proposes that its existence allows one to characterise the relationship between (maximum a posteriori) internal and external states in terms of internal states ‘sensing’ or ‘tracking’ external states through the Markov blanket [3].
Rate of change of the average. By virtue of the conditional synchronisation manifold, the rate of change of the average internal and external states of the system are connected by the gradients of the mapping function.
(A.7) |
Variational free energy. A system performing Bayesian inference tries to minimize the surprise of observed states (), according to an internal model. However, because a system cannot access this surprise value without complete knowledge of its environment, variational Bayesian inference prescribes minimizing instead an upper bound on this surprise, the variational free energy :
(A.8) |
which is composed of the surprise plus a term capturing the distance from the probability of external states given the blanket to an internal model of the world parametrized by θ.
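For reference, one standard way of writing such a bound is sketched below. The symbols are assumptions of this sketch rather than the notation of Eq. (A.8): b denotes blanket states, y external states, and q_θ(y) the internal model parametrized by θ.

```latex
F(b, \theta)
  = \underbrace{-\log p(b)}_{\text{surprise}}
  + \underbrace{D_{\mathrm{KL}}\!\left[\, q_\theta(y) \,\middle\|\, p(y \mid b) \,\right]}_{\geq\, 0}
  \;\geq\; -\log p(b)
```

Because the Kullback-Leibler divergence is non-negative, F bounds the surprise from above, and the bound is tight when q_θ(y) matches p(y | b).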
Average flow and the free energy gradient. The FEP proposes that the evolution of internal and external states of a system can be described, under a Gaussian approximation, as a gradient of the free energy of the system with respect to its sufficient statistics.
(A.9) |
Then, the conditional synchronisation manifold is proposed to connect the internal and external average flows (i.e. the gradients and ), suggesting that internal states behave as if trying to perform Bayesian inference over external states.
Appendix B. Solution of the linear Langevin dynamics
We start with the Ornstein-Uhlenbeck process
(B.1) |
which can be approximated by the equivalent Langevin dynamics
(B.2) |
where J is an invertible real matrix, ρ is an n-dimensional real vector, is a standard n-dimensional Wiener process, and is a standard n-dimensional Gaussian white noise with covariance matrix .
The model can be solved using standard methods for systems of differential equations
(B.3) |
(B.4) |
(B.5) |
This solution of the system [47], [48] yields statistical moments
(B.6) |
(B.7) |
These equations are hard to solve analytically. However, we can find the differential equations that result in equivalent integrals, obtaining the time evolution of statistical moments
(B.8) |
(B.9) |
At equilibrium or at the NESS, when the solution is unique, the system stabilizes to the values that make the derivatives equal to zero
(B.10) |
(B.11) |
where can be found by numerically solving the above continuous Lyapunov equation (a particular case of a Sylvester equation). If J is symmetric, the steady state of the system is a state of equilibrium with . However, the FEP focuses instead on NESS, which are more appropriate for describing living systems.
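As a minimal numerical sketch of this steady-state condition, the code below solves a continuous Lyapunov equation with NumPy. It assumes the convention dz = J z dt + σ dW with ρ = 0, so that the stationary covariance S satisfies J S + S Jᵀ + σ² I = 0; this sign convention, the helper `stationary_covariance` and both coupling matrices are hypothetical choices made for illustration, not the values used in the paper.

```python
import numpy as np

def stationary_covariance(J, D):
    """Solve J S + S J^T + D = 0 for S (assumed convention, see lead-in)."""
    n = J.shape[0]
    I = np.eye(n)
    # Column-stacking identities: vec(J S) = (I kron J) vec(S),
    # vec(S J^T) = (J kron I) vec(S).
    A = np.kron(I, J) + np.kron(J, I)
    s = np.linalg.solve(A, -D.reshape(-1, order="F"))
    return s.reshape((n, n), order="F")

sigma = 1.0                       # hypothetical noise amplitude
D = sigma**2 * np.eye(2)

# Symmetric couplings: the steady state is an equilibrium state.
J_sym = np.array([[-1.0, 0.3],
                  [0.3, -2.0]])
S_sym = stationary_covariance(J_sym, D)
S_closed_form = -0.5 * sigma**2 * np.linalg.inv(J_sym)

# Asymmetric couplings: the steady state is a NESS.
J_asym = np.array([[-1.0, 3.0],
                   [-1.0, -1.0]])
S_asym = stationary_covariance(J_asym, D)
residual = J_asym @ S_asym + S_asym @ J_asym.T + D
```

Under these assumptions, the symmetric case reproduces the closed form S = −(σ²/2) J⁻¹, whereas the asymmetric case has no such shortcut and requires solving the full linear system.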
B.1. Weak coupling approximation
In order to study the solution at the NESS, we assume weak couplings of the form , with being small. We derive
(B.12) |
(B.13) |
Recursively substituting in the right-hand side with , we obtain the time series expansion
(B.14) |
In the equilibrium case of symmetric couplings, the Hessian is trivially . For systems at a NESS, for and under a uniform noise , the inverse covariance (Hessian) can be computed as a Neumann series,
(B.15) |
The submatrix of the Hessian for couplings is
(B.16) |
B.2. Solenoidal flows
We can describe the surprise of the system and its gradient as
(B.17) |
(B.18) |
where we know from Eq. (25) that .
At the NESS, the solution can be described in two terms consistent with an SDE decomposition:
(B.19) |
(B.20) |
Rearranging the terms in the equation above to substitute we obtain the equivalence
(B.21) |
which again takes the form of a continuous Lyapunov equation. This equation can be solved numerically, or analytically using a power series expansion of , expanding the expression
(B.22) |
into the power series
(B.23) |
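The numerical route mentioned above can be sketched as follows. The sketch assumes the convention dz = J z dt + σ dW, a Gaussian steady state with covariance S, a dissipative matrix Γ = (σ²/2) I and a drift of the form J z = (Q − Γ) S⁻¹ z, which gives Q = J S + Γ. These conventions, the helper names and the coupling matrices are assumptions made for illustration and may differ in sign or scaling from Eqs. (B.19) to (B.23).

```python
import numpy as np

def stationary_covariance(J, D):
    """Solve the continuous Lyapunov equation J S + S J^T + D = 0 for S."""
    n = J.shape[0]
    I = np.eye(n)
    A = np.kron(I, J) + np.kron(J, I)
    s = np.linalg.solve(A, -D.reshape(-1, order="F"))
    return s.reshape((n, n), order="F")

def solenoidal_matrix(J, D):
    """Return Q such that J = (Q - Gamma) S^{-1}, with Gamma = D / 2."""
    S = stationary_covariance(J, D)
    Gamma = 0.5 * D
    return J @ S + Gamma

D = np.eye(2)                                   # hypothetical unit noise

J_sym = np.array([[-1.0, 0.3], [0.3, -2.0]])    # symmetric: equilibrium
J_asym = np.array([[-1.0, 3.0], [-1.0, -1.0]])  # asymmetric: NESS

Q_sym = solenoidal_matrix(J_sym, D)
Q_asym = solenoidal_matrix(J_asym, D)

# Consistency check: rebuild the drift matrix from Q, Gamma and S.
S_asym = stationary_covariance(J_asym, D)
J_rebuilt = (Q_asym - 0.5 * D) @ np.linalg.inv(S_asym)
```

In this sketch Q vanishes for symmetric couplings and is antisymmetric otherwise, which is the property that makes the solenoidal term leave the steady-state density unchanged.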
B.3. Mapping between the most likely internal and external states
If z can be divided into blanket states b, and internal/external states , the conditional distribution given blanket states is a Normal distribution with moments
(B.24) |
(B.25) |
Note that the Markov blanket imposes that , since are conditionally independent.
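The relation between the Markov blanket condition and the conditional moments can be checked numerically with standard Gaussian conditioning. The sketch below assumes a zero-mean stationary Gaussian over z = (x, b, y) with scalar components and a hypothetical precision matrix H whose (x, y) entry is zero; the labels and the numbers are illustrative and are not taken from the paper.

```python
import numpy as np

# Hypothetical precision matrix (Hessian of the surprise) for z = (x, b, y).
# The zero in the (x, y) position encodes the Markov blanket condition.
H = np.array([[2.0, 0.6, 0.0],
              [0.6, 1.5, -0.4],
              [0.0, -0.4, 1.0]])
Sigma = np.linalg.inv(H)

idx_u = [0, 2]   # internal and external states (x, y)
idx_b = [1]      # blanket state b

S_uu = Sigma[np.ix_(idx_u, idx_u)]
S_ub = Sigma[np.ix_(idx_u, idx_b)]
S_bb = Sigma[np.ix_(idx_b, idx_b)]

# Standard Gaussian conditioning: E[u | b] = gain @ b, Cov[u | b] = cond_cov.
gain = S_ub @ np.linalg.inv(S_bb)
cond_cov = S_uu - gain @ S_ub.T

# x and y are correlated marginally but not once b is given.
marginal_xy = Sigma[0, 2]
conditional_xy = cond_cov[0, 1]
```

With these numbers the marginal covariance between x and y is non-zero, while the conditional covariance given b is diagonal, as expected when the corresponding entry of the precision matrix is zero.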
Similarly, we can decompose h into an internal and external state
(B.26) |
(B.27) |
If the mapping from b to x is injective (for this is required), the first term can be rearranged into
(B.28) |
yielding
(B.29) |
which is a linear mapping of to .
(B.30) |
In the NESS limit
(B.31) |
Similarly, if the mapping from b to y is injective (for this is required), we can invert the relation
(B.32) |
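A stationary, scalar version of this mapping can be sketched as follows. It assumes a zero-mean Gaussian over z = (x, b, y), reads x as the internal and y as the external state, and uses a hypothetical precision matrix; the function `sigma_map` and all numbers are illustrative, and the time-dependent expressions in Eqs. (B.28) to (B.31) are not reproduced.

```python
import numpy as np

# Hypothetical precision matrix for z = (x, b, y), with a Markov blanket
# (zero x-y entry). Labels x = internal, y = external are assumptions.
H = np.array([[2.0, 0.6, 0.0],
              [0.6, 1.5, -0.4],
              [0.0, -0.4, 1.0]])
Sigma = np.linalg.inv(H)
X, B, Y = 0, 1, 2

# Most likely (here also average) states given b are linear in b.
mu_slope = Sigma[X, B] / Sigma[B, B]    # internal: mu(b) = mu_slope * b
eta_slope = Sigma[Y, B] / Sigma[B, B]   # external: eta(b) = eta_slope * b

def sigma_map(eta):
    """Map the most likely external state to the most likely internal state."""
    if np.isclose(eta_slope, 0.0):
        # b -> eta(b) is not injective, so the map cannot be inverted.
        raise ValueError("mapping from b to the external state is not injective")
    return (mu_slope / eta_slope) * eta

b = 0.7
mu_b = mu_slope * b
eta_b = eta_slope * b
mu_from_eta = sigma_map(eta_b)
```

The guard in `sigma_map` mirrors the injectivity requirement in the text: if the blanket carries no information about the external state, the most likely external state cannot be inverted to recover b.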
Appendix C. How informative are conditional marginal flows? A minimal example
In order to illustrate the debate about how informative average flows are, we study a simple two-dimensional linear stochastic system
(C.1) |
(C.2) |
which can be studied as the Langevin dynamics described in Appendix B. We select a variance of , , and
(C.3) |
The resulting covariance matrix is
(C.4) |
Fig. C.1 displays an example of a trajectory of this system.
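A trajectory of this kind can be generated with a simple Euler-Maruyama scheme, sketched below. The coupling matrix, noise amplitude, time step and the function `simulate` are hypothetical choices for illustration only; they are not the parameters of Eq. (C.3) and will not reproduce Fig. C.1 exactly.

```python
import numpy as np

def simulate(J, sigma, z0, dt, n_steps, seed=0):
    """Euler-Maruyama integration of dz = J z dt + sigma dW."""
    rng = np.random.default_rng(seed)
    z = np.empty((n_steps + 1, len(z0)))
    z[0] = z0
    for t in range(n_steps):
        noise = sigma * np.sqrt(dt) * rng.standard_normal(len(z0))
        z[t + 1] = z[t] + (J @ z[t]) * dt + noise
    return z

# Hypothetical asymmetric couplings for the two state variables.
J = np.array([[-1.0, 3.0],
              [-1.0, -1.0]])
traj = simulate(J, sigma=1.0, z0=np.zeros(2), dt=0.01, n_steps=20000)

# Empirical covariance after discarding an initial transient.
emp_cov = np.cov(traj[2000:].T)
```

For a long enough run, `emp_cov` should approach the solution of the corresponding continuous Lyapunov equation, up to sampling and discretisation error.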
C.1. Conditional marginal flows capture partial tendencies, not real behaviour
Flows in the system are described as
(C.5) |
(C.6) |
Following the results in Appendix B, conditional average flows can be described as
(C.7) |
(C.8) |
(C.9) |
(C.10) |
Fig. C.2.A shows the flow structure of the system, which follows a spiral due to the non-equilibrium tendencies in the system. This is an example of the solenoidal flows captured by the matrix Q.
In contrast, Fig. C.2.B captures the marginal flow structure when variable b is fixed. This separates tendencies in the system, resulting in a diverging tendency for variable y (diverging to −∞ for negative b and to +∞ for positive b). Similarly, combining cross marginal flows (fixing b for the marginal flow of y and vice versa) results in another combination of partial tendencies, as shown in Fig. C.2.C. In this case, the rotation of solenoidal flows is captured, but not the attractor that structures this rotation into a spiral behaviour.
In sum, conditional marginal flows do not capture the behaviour of the system. Even if the flows point in some direction (e.g. free energy minimization) we cannot conclude that this is equivalent to the system behaving ‘as if’ following that particular direction. An alternative illustration of this problem can be described by writing the conditional average flow of y in terms of the most likely state
(C.11) |
where the term is a positive constant (see Fig. C.3.B). Following the logic of a relaxed interpretation of the FEP, this could be interpreted as a gradient ascent on . However, as we see in Fig. C.2.A, the real behaviour of the system displays a global attractor at with solenoidal couplings, eventually minimizing .
C.2. Rates of conditional averages are different to conditional marginal flows
The second issue we illustrate in this simple system is the difference between conditional marginal flows and the dynamics of the most likely states.
From Appendix B we can derive
(C.12) |
For a fixed b, this variable is distributed as a Normal distribution , with mean and standard deviation
(C.13) |
(C.14) |
where is the standard deviation of the noise introduced in the Langevin dynamics (described as ).
For a range of values of b, Fig. C.3 captures the conditional average flows (solid dark line), and the distribution of the derivatives of the conditional average state (the mean is represented by the dashed line and error bars of three standard deviations by the light area). As we can observe, the dependency with is positive with respect to (indicating a gradient ascent on ) while the true dynamics is captured by the negative dependency between and (indicating a gradient descent on ). Note that for different parameters the sign of the slope of can change (being negative or positive) but the slope of is always negative, given the presence of a global attractor. This shows how, even in very simple examples, these quantities can have radically different behaviours and that the conditional average flow does not necessarily capture the true behaviour of a system.
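The contrast between the two quantities can be reproduced in a small numerical sketch. It assumes a zero-mean system dz = J z dt + σ dW over z = (y, b) with a hypothetical coupling matrix, writes ŷ(b) = (S_yb / S_bb) b for the most likely value of y given b, and compares the slope of the conditional average flow E[dy/dt | b] with the slope of the expected rate of change E[dŷ/dt | b], both as functions of ŷ. The matrix and the variable names are illustrative and are not the parameters behind Fig. C.3.

```python
import numpy as np

def stationary_covariance(J, D):
    """Solve the continuous Lyapunov equation J S + S J^T + D = 0 for S."""
    n = J.shape[0]
    I = np.eye(n)
    A = np.kron(I, J) + np.kron(J, I)
    s = np.linalg.solve(A, -D.reshape(-1, order="F"))
    return s.reshape((n, n), order="F")

Y, B = 0, 1
sigma = 1.0
J = np.array([[-1.0, 3.0],      # hypothetical couplings, ordered (y, b)
              [-1.0, -1.0]])
S = stationary_covariance(J, sigma**2 * np.eye(2))
JS = J @ S

# Most likely y given b: y_hat(b) = k * b.
k = S[Y, B] / S[B, B]

# Conditional average flow of y:
# E[dy/dt | b] = (J_yy * k + J_yb) * b = ((J S)_yb / S_yb) * y_hat.
slope_flow = JS[Y, B] / S[Y, B]

# Expected rate of change of the most likely state:
# E[d y_hat/dt | b] = k * E[db/dt | b] = ((J S)_bb / S_bb) * y_hat.
slope_rate = JS[B, B] / S[B, B]
```

In this example the conditional average flow has a positive slope in ŷ while the expected rate of change of ŷ has a negative slope. Under the assumed convention the second slope equals −σ² / (2 S_bb), since the diagonal of the Lyapunov equation fixes (J S)_bb = −σ²/2, so it is negative for any stable J.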
References
- 1. Friston K., Ao P. Free energy, value, and attractors. Comput Math Methods Med. 2012;2012 doi: 10.1155/2012/937860.
- 2. Friston K. Life as we know it. J R Soc Interface. 2013;10(86) doi: 10.1098/rsif.2013.0475.
- 3. Friston K. A free energy principle for a particular physics. 2019. arXiv:1906.10184 preprint.
- 4. Parr T., Da Costa L., Friston K. Markov blankets, information geometry and stochastic thermodynamics. Philos Trans R Soc Lond A. 2020;378(2164) doi: 10.1098/rsta.2019.0159.
- 5. Hohwy J. The self-evidencing brain. Noûs. 2016;50(2):259–285.
- 6. Clark A. Oxford University Press; 2015. Surfing uncertainty: prediction, action, and the embodied mind.
- 7. Friston K. The free-energy principle: a rough guide to the brain? Trends Cogn Sci. 2009;13(7):293–301. doi: 10.1016/j.tics.2009.04.005.
- 8. Friston K., Daunizeau J., Kilner J., Kiebel S.J. Action and behavior: a free-energy formulation. Biol Cybern. 2010;102(3):227–260. doi: 10.1007/s00422-010-0364-z.
- 9. Friston K. The free-energy principle: a unified brain theory? Nat Rev Neurosci. 2010;11(2):127–138. doi: 10.1038/nrn2787.
- 10. Friston K., Mattout J., Trujillo-Barreto N., Ashburner J., Penny W. Variational free energy and the Laplace approximation. NeuroImage. 2007;34(1):220–234. doi: 10.1016/j.neuroimage.2006.08.035.
- 11. Friston K. A theory of cortical responses. Philos Trans R Soc Lond B, Biol Sci. 2005;360(1456):815–836. doi: 10.1098/rstb.2005.1622.
- 12. Buckley C.L., Kim C.S., McGregor S., Seth A.K. The free energy principle for action and perception: a mathematical review. J Math Psychol. 2017;81:55–79.
- 13. Millidge B., Seth A., Buckley C.L. Predictive coding: a theoretical and experimental review. 2021. arXiv:2107.12979 preprint.
- 14. Friston K., Rigoli F., Ognibene D., Mathys C., Fitzgerald T., Pezzulo G. Active inference and epistemic value. Cogn Neurosci. 2015;6(4):187–214. doi: 10.1080/17588928.2015.1020053.
- 15. Friston K., Daunizeau J., Kiebel S.J. Reinforcement learning or active inference? PLoS ONE. 2009;4(7) doi: 10.1371/journal.pone.0006421.
- 16. Friston K., FitzGerald T., Rigoli F., Schwartenbeck P., Pezzulo G. Active inference: a process theory. Neural Comput. 2017;29(1):1–49. doi: 10.1162/NECO_a_00912.
- 17. Cullen M., Davey B., Friston K., Moran R.J. Active inference in OpenAI Gym: a paradigm for computational investigations into psychiatric illness. Biol Psychiatry. 2018;3(9):809–818. doi: 10.1016/j.bpsc.2018.06.010.
- 18. Millidge B., Tschantz A., Buckley C.L. Predictive coding approximates backprop along arbitrary computation graphs. 2020. arXiv:2006.04182 preprint.
- 19. Hohwy J., Roepstorff A., Friston K. Predictive coding explains binocular rivalry: an epistemological review. Cognition. 2008;108(3):687–701. doi: 10.1016/j.cognition.2008.05.010.
- 20. Kanai R., Komura Y., Shipp S., Friston K. Cerebral hierarchies: predictive processing, precision and the pulvinar. Philos Trans R Soc Lond B, Biol Sci. 2015;370(1668) doi: 10.1098/rstb.2014.0169.
- 21. Tschantz A., Millidge B., Seth A.K., Buckley C.L. Reinforcement learning through active inference. 2020. arXiv:2002.12636 preprint.
- 22. Da Costa L., Parr T., Sajid N., Veselic S., Neacsu V., Friston K. Active inference on discrete state-spaces: a synthesis. J Math Psychol. 2020;99 doi: 10.1016/j.jmp.2020.102447.
- 23. Millidge B. Deep active inference as variational policy gradients. J Math Psychol. 2020;96
- 24. FitzGerald T.H., Schwartenbeck P., Moutoussis M., Dolan R.J., Friston K. Active inference, evidence accumulation, and the urn task. Neural Comput. 2015;27(2):306–328. doi: 10.1162/NECO_a_00699.
- 25. Parr T., Markovic D., Kiebel S.J., Friston K. Neuronal message passing using mean-field, Bethe, and marginal approximations. Sci Rep. 2019;9(1):1–18. doi: 10.1038/s41598-018-38246-3.
- 26. Kappel D., Tetzlaff C. A synapse-centric account of the free energy principle. 2021. arXiv:2103.12649 preprint.
- 27. Tschantz A., Seth A.K., Buckley C.L. Learning action-oriented models through active inference. PLoS Comput Biol. 2020;16(4) doi: 10.1371/journal.pcbi.1007805.
- 28. Calvo P., Friston K. Predicting green: really radical (plant) predictive processing. J R Soc Interface. 2017;14(131) doi: 10.1098/rsif.2017.0096.
- 29. Friston K., Kilner J., Harrison L. A free energy principle for the brain. J Physiol (Paris) 2006;100(1–3):70–87. doi: 10.1016/j.jphysparis.2006.10.001.
- 30. Friston K., Da Costa L., Parr T. Some interesting observations on the free energy principle. Entropy. 2021;23(8):1076. doi: 10.3390/e23081076.
- 31. Pearl J. Elsevier; 1988. Probabilistic reasoning in intelligent systems: networks of plausible inference.
- 32. Richardson T.S., Spirtes P., et al. Carnegie Mellon; 1996. Automated discovery of linear feedback models.
- 33. Bruineberg J., Dolega K., Dewhurst J., Baltieri M. The emperor's new Markov blankets. Behav Brain Sci. 2021:1–63. doi: 10.1017/S0140525X21002351.
- 34. Biehl M., Pollock F.A., Kanai R. A technical critique of some parts of the free energy principle. Entropy. 2021;23(3):293. doi: 10.3390/e23030293.
- 35. Rothman J.E., Lenard J. Membrane asymmetry. Science. 1977;195(4280):743–753. doi: 10.1126/science.402030.
- 36. Buzsaki G. Oxford University Press; 2006. Rhythms of the Brain.
- 37. Fadeel B., Xue D. The ins and outs of phospholipid asymmetry in the plasma membrane: roles in health and disease. Crit Rev Biochem Mol Biol. 2009;44(5):264–277. doi: 10.1080/10409230903193307.
- 38. Barandiaran X.E., Di Paolo E., Rohde M. Defining agency: individuality, normativity, asymmetry, and spatio-temporality in action. Adapt Behav. 2009;17(5):367–386.
- 39. Ruiz-Mirazo K., Moreno A. Basic autonomy as a fundamental step in the synthesis of life. Artif Life. 2004;10(3):235–259. doi: 10.1162/1064546041255584.
- 40. Kwon C., Ao P. Nonequilibrium steady state of a stochastic system driven by a nonlinear drift force. Phys Rev E. 2011;84(6) doi: 10.1103/PhysRevE.84.061106.
- 41. Yuan R., Tang Y., Ao P. SDE decomposition and A-type stochastic interpretation in nonequilibrium processes. Front Phys. 2017;12(6):1–9.
- 42. Friston K., Heins C., Ueltzhöffer K., Da Costa L., Parr T. Stochastic chaos and Markov blankets. Entropy. 2021;23(9):1220. doi: 10.3390/e23091220.
- 43. Parr T., Da Costa L., Heins C., Ramstead M.J.D., Friston K. Memory and Markov blankets. Entropy. 2021;23(9):1105. doi: 10.3390/e23091105.
- 44. Reid N. Proceedings of the ICIAM. 2015. Approximate likelihoods.
- 45. Kim C.S. Recognition dynamics in the brain under the free energy principle. Neural Comput. 2018;30(10):2616–2659. doi: 10.1162/neco_a_01115.
- 46. Da Costa L., Parr T., Sengupta B., Friston K. Neural dynamics under active inference: plausibility and efficiency of information processing. Entropy. 2021;23(4):454. doi: 10.3390/e23040454.
- 47. Vatiwutipong P., Phewchean N. Alternative way to derive the distribution of the multivariate Ornstein–Uhlenbeck process. Adv Differ Equ. 2019;2019(1)
- 48. Godrèche C., Luck J.-M. Characterising the nonequilibrium stationary states of Ornstein–Uhlenbeck processes. J Phys A, Math Theor. 2018;52(3)
- 49. Da Costa L., Friston K., Heins C., Pavliotis G.A. Bayesian mechanics for stationary processes. 2021. arXiv:2106.13830 preprint.
- 50. Climent J.J., Thome N., Wei Y. A geometrical approach on generalized inverses by Neumann-type series. Linear Algebra Appl. 2001;332:533–540.
- 51. Raja V., Valluri D., Baggs E., Chemero A., Anderson M.L. The Markov blanket trick: on the scope of the free energy principle and active inference. Phys Life Rev. 2021;39:49–72. doi: 10.1016/j.plrev.2021.09.001.
- 52. Bruineberg J., Kiverstein J., Rietveld E. The anticipating brain is not a scientist: the free-energy principle from an ecological-enactive perspective. Synthese. 2018;195(6):2417–2444. doi: 10.1007/s11229-016-1239-1.
- 53. Constant A., Clark A., Friston K.J. Representation wars: enacting an armistice through active inference. Front Psychol. 2021;11:3798. doi: 10.3389/fpsyg.2020.598733.
- 54. Ramstead M.J., Kirchhoff M.D., Constant A., Friston K.J. Multiscale integration: beyond internalism and externalism. Synthese. 2021;198(1):41–70. doi: 10.1007/s11229-019-02115-x.
- 55. Di Paolo E., Thompson E., Beer R.D. Laying down a forking path: incompatibilities between enaction and the free energy principle. PsyArXiv. 2021