Scientific Reports. 2023 Jan 17;13:869. doi: 10.1038/s41598-023-27736-8

Thermodynamic fluctuation theorems govern human sensorimotor learning

P Hack 1, C Lindig-Leon 1, S Gottwald 1, D A Braun 1
PMCID: PMC9845310  PMID: 36650215

Abstract

The application of thermodynamic reasoning in the study of learning systems has a long tradition. Recently, new tools relating perfect thermodynamic adaptation to the adaptation process have been developed. These results, known as fluctuation theorems, have been tested experimentally in several physical scenarios and, moreover, they have been shown to be valid under broad mathematical conditions. Hence, although not experimentally challenged yet, they are presumed to apply to learning systems as well. Here we address this challenge by testing the applicability of fluctuation theorems in learning systems, more specifically, in human sensorimotor learning. In particular, we relate adaptive movement trajectories in a changing visuomotor rotation task to fully adapted steady-state behavior of individual participants. We find that human adaptive behavior in our task is generally consistent with fluctuation theorem predictions and discuss the merits and limitations of the approach.

Subject terms: Decision, Reward

Introduction

The study of learning systems with concepts borrowed from statistical mechanics and thermodynamics has a long history reaching back to Maxwell’s demon and the ensuing debate on the relation between physics and information34. Over the last 20 years, the informational view of thermodynamics has seen major developments, which have broadened its scope from equilibrium to non-equilibrium phenomena10, 22. Of particular importance are the so-called fluctuation theorems7, 20, 42, which relate equilibrium quantities to non-equilibrium trajectories and thus allow equilibrium quantities to be approximated via experimental realizations of non-equilibrium processes32, 53. Among the fluctuation theorems, two results stand out, Jarzynski’s equality4, 19, 21 and Crooks’ fluctuation theorem6, 8, as they aim to bridge the apparent chasm between reversible microscopic laws and irreversible macroscopic phenomena29.

The advances in non-equilibrium thermodynamics have recently also led to new theoretical insights into simple learning systems12, 13, 16, 31, 35, 46. Abstractly, thermodynamic quantities like energy, entropy or free energy can be thought to define order relations between states14, 25, which makes them applicable to a wide range of problems. In the economic sciences, for example, such order relations are typically used to define a decision-maker’s preferences over states30. Accordingly, a decision-maker or a learning system can be thought to maximize a utility function, analogous to a physical system that aims to minimize an energy function. Moreover, in the presence of uncertainty in stochastic choice, such decision-makers can be thought to operate under entropy constraints reflecting the decision-maker’s precision31, 34, resulting in soft-maximizing the corresponding utility function instead of perfectly maximizing it. This is formally equivalent to following a Boltzmann distribution with energy given by the negative utility. Therefore, in this picture, the physical concept of work corresponds to utility changes caused by the environment, whereas the physical concept of heat corresponds to utility gains due to internal adaptation46. Just as a thermodynamic system is driven by work, such learning systems are driven by changes in the utility landscape (e.g. changes in an error signal). By exposing learning systems to varying environmental conditions, it has been hypothesized that adaptive behavior can be studied in terms of fluctuation theorems12, 16, which are not necessarily tied to physical processes but are broadly applicable to stochastic processes satisfying certain constraints18.

Fluctuation theorems are usually deployed in statistical mechanics; particularly, the study of nonequilibrium steady states in thermodynamics. In this setting, one normally assumes a probabilistic description of an ensemble of many particles, i.e., the kinds of systems usually considered in statistical thermodynamics. However, as described in41, 42, exactly the same principles and fluctuation theorems also apply to the path of a single particle, leading to stochastic thermodynamics. This suggests that fluctuation theorems may not only be applicable to the statistics of ensembles of many learners, but also when describing the trajectory of a single participant during a learning process.

Although fluctuation theorems have been empirically observed in numerous experiments in the physical sciences1, 5, 11, 28, 37, 44, there have been no reported experimental results relating fluctuation theorems to adaptive behavior in humans or other living beings. Here, we test Jarzynski’s equality and Crooks’ fluctuation theorem experimentally in a human sensorimotor adaptation task. In this context, the fluctuation theorem establishes a linear relationship between the externally imposed utility changes driving the learning process (which are directly related to non-predicted information and energy dissipation46) and the log-probability ratio between forward and backward adaptation trajectories, when exposing participants to the sequence of environments either in the forward or reverse order. Accordingly, such learners can be quantitatively characterized by a hysteresis effect that can also be observed in simple physical systems.

Results

In a visuomotor adaptation task, human participants controlled a cursor on a screen towards a single stationary target by moving a mechanical manipulandum that was obscured from their vision under an overlaid screen (see Fig. 1A). Crucially, in each trial $n$, the position of the cursor could be rotated by an angle $\theta_n$ relative to the actual hand position, so that participants had to adapt when moving the cursor from the start position to the target. To measure participants’ adaptive state, we recorded their movement position at the time of crossing a certain distance from the start position, so that their response could be characterized by an angle $x_n$. The deviation between participants’ response $x_n$ and the required movement incurs a sensorimotor loss24 $E_n$ in trial $n$ that can be quantified as an exponential quadratic error

$$E_n(x) = 1 - e^{-(x - (\theta_n + b))^2}, \qquad (1)$$

that depends on the actual rotation angle $\theta_n$ set in trial $n$. The parameter $b$ is a participant-specific parameter allowing for bias due to posture, biomechanics, the mechanics of the manipulandum, or other influences (see Fig. 1D). The loss (1) is taken to be the energy (or negative utility) of a participant’s stochastic response $X_n = x_n$. For a bounded rational decision-maker26, 27, 31, 39 that optimizes this loss under uncertainty, the optimal pointing behavior after a suitably long adaptation time is described by a Boltzmann equilibrium distribution $p_n^{\mathrm{eq}}$ of the form

$$p_n^{\mathrm{eq}}(x_n) = \exp\big(-\beta\,(E_n(x_n) - F_n)\big), \qquad (2)$$

for all $x_n \in A_n$, where the sensorimotor error $E_n(x_n)$ plays the role of an energy, the free energy term $F_n = -\frac{1}{\beta} \log \int_{A_n} \exp(-\beta E_n(x_n))\, dx_n$ is caused by the normalization, and $A_n$ is the support of the equilibrium distribution $p_n^{\mathrm{eq}}$, which will vary for each participant, as we explain in Sect. A.3.3. See Fig. 1C for a representation of (2). Moreover, the softness parameter $\beta$, also known as inverse temperature or precision, controls the trade-off between entropy maximization and energy minimization, essentially interpolating between a purely stochastic choice ($\beta = 0$) and a purely rational choice ($\beta \to \infty$) minimizing the energy perfectly.
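As a rough numerical illustration of (1) and (2), the following sketch evaluates the loss and the normalized equilibrium density on a fixed support. The function names, the trapezoidal integration on a grid, and the use of the support [-90, 90] for every trial are assumptions made here for illustration; the per-participant fitting of Sect. A.3.3 is not reproduced.

```python
import math


def loss(x, theta, b=0.0):
    """Exponential quadratic error of Eq. (1)."""
    return 1.0 - math.exp(-(x - (theta + b)) ** 2)


def free_energy(theta, beta, b=0.0, lo=-90.0, hi=90.0, n=20001):
    """F = -(1/beta) * log of the integral of exp(-beta * E) over [lo, hi].

    The integral is approximated with the trapezoidal rule on n grid points
    (grid size and support are illustrative assumptions).
    """
    if beta <= 0:
        raise ValueError("beta must be positive")
    h = (hi - lo) / (n - 1)
    z = 0.0
    for i in range(n):
        weight = 0.5 if i in (0, n - 1) else 1.0
        z += weight * math.exp(-beta * loss(lo + i * h, theta, b))
    z *= h
    return -math.log(z) / beta


def equilibrium_density(theta, beta, b=0.0, lo=-90.0, hi=90.0):
    """Return the density of Eq. (2) as a function of x."""
    f = free_energy(theta, beta, b, lo, hi)

    def density(x):
        if not lo <= x <= hi:
            return 0.0  # outside the support
        return math.exp(-beta * (loss(x, theta, b) - f))

    return density


p = equilibrium_density(theta=0.0, beta=1.0)
print(p(0.0), p(50.0))  # density at the target and far away from it
```

Because the loss saturates at 1 away from the target, the density far from the target is flat, and the ratio between the peak and the flat part is $e^{\beta}$.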

Figure 1.


(A) Schematic representation of an experimental trial with deviation angle $\theta$. The dotted line represents the participant’s hand movement and the continuous line represents the rotated movement observed on the screen. (B) Experimental protocol. The continuous line represents the deviation angles $\theta$ imposed during one experimental cycle, where trials 1 to 25 constitute the forward process and trials 34 to 58 constitute the backward process. The dotted line represents the beginning of the next cycle. (C) Illustration of the equilibrium distributions (2) with $b = \theta_n = 0$ resulting from the exponential quadratic error (1) and, respectively, $\beta = 1, 1.5, 2$. The shaded area represents the target, which tolerates, at most, an error of 2. (D) Comparison between the equilibrium distributions that we fit using the initial 100 trials (before participants experience any perturbation) and participants’ performance in the washout plateaus between cycles (the sequence of trials with $\theta = 0$ that separate forward and backward protocol), to check whether participants equilibrate between cycles, as required by the fluctuation theorem. Red shows the normalized error histogram for the in-between plateaus, using participant 7 as an example; green shows the histogram of the fitted equilibrium distribution for the initial block of 100 trials of the same participant. The comparison for all other participants can be found in Fig. 7.

The task consisted of a sequence of target reaching trials, where the rotation angle $\theta_n$ changed from one trial $n$ to the next trial $n+1$ according to a given up-down protocol (see Fig. 1B), so that participants’ responses over trials could be represented by a trajectory $x = (x_0, x_1, \dots, x_N)$. When the environment is changing over trials, we can distinguish cumulative error changes $\Delta E_{\mathrm{ext}}(x) := \sum_{n=0}^{N-1} \big(E_{n+1}(x_n) - E_n(x_n)\big)$ that are induced externally by changes in the environmental parameter $\theta_n$, from cumulative error changes $\Delta E_{\mathrm{int}}(x) := \sum_{n=1}^{N} \big(E_n(x_n) - E_n(x_{n-1})\big)$ due to internal adaptation when subjects change their response from $x_{n-1}$ to $x_n$. Crucially, it is exactly the externally induced changes in error, $\Delta E_{\mathrm{ext}}(x)$, analogous to the physical concept of work, that drive the adaptation process: if $\Delta E_{\mathrm{ext}}(x)$ is large, the system is more surprised and has to adapt more. In the following, we thus refer to $\Delta E_{\mathrm{ext}}(x)$ as driving error or driving signal. When applying Crooks’ fluctuation theorem for general adaptive systems18 to the above setting, we obtain the linear relation

$$\Delta E_{\mathrm{ext}}(x) - \Delta F = \frac{1}{\beta} \log \frac{\rho_F(x)}{\rho_B(x^R)}, \qquad (3)$$

where $x^R = (x_N, \dots, x_1)$ is the reverse trajectory, $\Delta F$ denotes the free energy difference $F_N - F_0$, and the distributions $\rho_F(\cdot)$ and $\rho_B(\cdot)$ denote the probability of observing a certain trajectory when the learner faces a series of environments in some specific order or in the reversed order, respectively. This form of Crooks’ theorem allows for an intuitive interpretation: any difference between the probability of a trajectory and that of its reverse, which signifies hysteresis, can be directly related to an excess loss that is irretrievably generated because of imperfect adaptation. Unfortunately, Equation (3) is hard to evaluate from data, as it would require estimating probability distributions over paths. However, there is an equivalent form of Crooks’ theorem that groups all trajectories according to their associated value of $\Delta E_{\mathrm{ext}}(x)$, with corresponding distributions $\rho_F$ and $\rho_B$ over these values, such that

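$$\Delta E_{\mathrm{ext}}(x) - \Delta F = \frac{1}{\beta} \log \frac{\rho_F\big(\Delta E_{\mathrm{ext}}(x)\big)}{\rho_B\big(-\Delta E_{\mathrm{ext}}(x)\big)}. \qquad (4)$$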

The distribution ρF(·) can be interpreted as the probability that the learner experiences a certain overall surprise when being exposed sequentially to a series of environments and ρB(·) is the analogous concept when the order in which the environments are presented is reversed. In equation (4), these densities are evaluated at the actual driving errors ΔEext(x) and -ΔEext(x), respectively, for a particular adaptive trajectory x.
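To make the two cumulative error changes concrete, the sketch below computes the driving error and the internally induced error change for a single trajectory under the loss (1). The function names and the example numbers are hypothetical. By telescoping, the two sums always add up to $E_N(x_N) - E_0(x_0)$, which the last lines check.

```python
import math


def loss(x, theta, b=0.0):
    """Exponential quadratic error of Eq. (1)."""
    return 1.0 - math.exp(-(x - (theta + b)) ** 2)


def driving_error(xs, thetas, b=0.0):
    """Sum over n = 0..N-1 of E_{n+1}(x_n) - E_n(x_n).

    xs holds the responses x_0..x_N and thetas the rotations theta_0..theta_N.
    """
    return sum(
        loss(xs[n], thetas[n + 1], b) - loss(xs[n], thetas[n], b)
        for n in range(len(xs) - 1)
    )


def internal_error(xs, thetas, b=0.0):
    """Sum over n = 1..N of E_n(x_n) - E_n(x_{n-1})."""
    return sum(
        loss(xs[n], thetas[n], b) - loss(xs[n - 1], thetas[n], b)
        for n in range(1, len(xs))
    )


# Hypothetical trajectory and protocol, for illustration only.
thetas = [0.0, 1.0, 2.0, 1.0, 0.0]
xs = [0.1, 0.4, 1.5, 1.2, 0.3]

ext = driving_error(xs, thetas)
internal = internal_error(xs, thetas)
net = loss(xs[-1], thetas[-1]) - loss(xs[0], thetas[0])
print(ext, internal, ext + internal - net)  # the last value is zero up to rounding
```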

A direct consequence of (4) is Jarzynski’s equality6, which states that

$$\big\langle e^{-\beta \Delta E_{\mathrm{ext}}(X)} \big\rangle = e^{-\beta \Delta F}, \qquad (5)$$

where $\langle \cdot \rangle := \mathbb{E}[\cdot]$ denotes the expectation operator, considering $X = (X_n)_{n=0}^{N}$ a Markov chain with transition densities $\Pi_n$ that have $p_n^{\mathrm{eq}}$ as stationary distributions, that is, for each $n$, $p_n^{\mathrm{eq}}$ is the stationary distribution for $X_n$. In our experiment, $X$ represents participants’ responses that are repeated over multiple repetitions of the forward-backward protocol. In the following, we will test the relationships (4) and (5) experimentally with $\Delta F = 0$, as our human learners start and end in the same environmental state (i.e. $F_N = F_0$). Note that, in our particular setting where there is no overall change in the free energy ($\Delta F = 0$), Equation (5) suggests that the expected value $\langle e^{-\beta \Delta E_{\mathrm{ext}}(X)} \rangle$ equals $e^{-\beta \cdot 0} = 1$ irrespective of the value taken by $\beta$. This provides a quantitative prediction that we will evaluate empirically below.
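The step from (4) to (5) can be sketched in a few lines. Writing $w$ for a value of the driving error (a shorthand introduced here), (4) rearranges to $\rho_F(w)\, e^{-\beta w} = e^{-\beta \Delta F}\, \rho_B(-w)$, and integrating over $w$ gives

```latex
\begin{aligned}
\big\langle e^{-\beta \Delta E_{\mathrm{ext}}(X)} \big\rangle
  &= \int \rho_F(w)\, e^{-\beta w}\, dw \\
  &= e^{-\beta \Delta F} \int \rho_B(-w)\, dw \\
  &= e^{-\beta \Delta F},
\end{aligned}
```

where the last step uses that $\rho_B$ is a normalized density. This sketch assumes that the ratio in (4) is well defined wherever $\rho_F$ is non-zero.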

In our experiment the task is divided into 20 cycles of 66 trials each, following the protocol (9) illustrated in Fig. 1B. We refer to trials 1 to 25 of each cycle as a realization of the forward process and trials 34 to 58 as a realization of the backward process. Notice that the backward process consists of the same angles as the forward process, that is, the same utility functions, but in reversed order. Thus, we record for each participant 20 values of $\Delta E_{\mathrm{ext}}(x)$ in each of the forward and backward processes, which we use to estimate participants’ probability densities of the forward and backward processes, $\rho_F$ and $\rho_B$, respectively, using kernel density estimation. As the amount of data available to test the linear relation in (4) is limited, we will use simulation results in the following to compare against participants’ behavior.
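The estimation step described above can be sketched as follows: fit one kernel density to the forward driving errors and one to the backward driving errors, then evaluate the right-hand side of (4). The Gaussian kernel, the normal-reference bandwidth rule, and the function names are assumptions made here, since the text does not specify the kernel or bandwidth. The demo data are synthetic Gaussian samples, not experimental values.

```python
import math
import random
import statistics


def gaussian_kde(samples, bandwidth=None):
    """Return a Gaussian kernel density estimate as a function of w."""
    n = len(samples)
    if bandwidth is None:
        # Normal-reference rule of thumb (an assumption, not the paper's choice).
        bandwidth = 1.06 * statistics.stdev(samples) * n ** (-0.2)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(w):
        return norm * sum(
            math.exp(-0.5 * ((w - s) / bandwidth) ** 2) for s in samples
        )

    return density


def crooks_log_ratio(w, forward, backward, beta=1.0):
    """Right-hand side of Eq. (4): (1/beta) * log(rho_F(w) / rho_B(-w))."""
    rho_f = gaussian_kde(forward)
    rho_b = gaussian_kde(backward)
    return math.log(rho_f(w) / rho_b(-w)) / beta


# Synthetic check: Gaussian driving errors with mean beta * variance / 2 in both
# processes satisfy Eq. (4) with Delta F = 0, so the log-ratio should be close to w.
rng = random.Random(0)
forward = [rng.gauss(1.0, math.sqrt(2.0)) for _ in range(4000)]
backward = [rng.gauss(1.0, math.sqrt(2.0)) for _ in range(4000)]
for w in (0.5, 1.0, 1.5):
    print(w, crooks_log_ratio(w, forward, backward))
```

Kernel smoothing slightly widens both estimated densities, so the estimated slope tends to fall a little below one even for ideal data. With only 20 values per process, as in the experiment, the estimate is far noisier.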

When simulating an artificial decision-maker based on a stochastic optimization scheme with Markovian dynamics, for example a Metropolis-Hastings algorithm with target distribution $p_n^{\mathrm{eq}} \propto \exp(-\beta E_n)$, it is clear that we can recover the linear relationship (4), provided that sufficient samples are collected18. See, for example, the simulation with 1000 cycles in Fig. 2A, which shows good agreement between the theoretical prediction (in black) and the linear regression of the observed data (in red). As a result, (5) also holds in this scenario. The more critical question is what happens when only few samples are available. To this end, we use the stochastic optimization algorithm to simulate the protocol of our experiment, that is, 20 cycles, and indicate confidence intervals using 1000 bootstraps. It can be seen in Fig. 2B that the theoretical prediction is consistent with the 99% confidence interval in the region where $|\Delta E_{\mathrm{ext}}| \le 4$ (which is the region where our experimental data lie). Using the same bootstrapped data, we obtain several estimates of $\langle e^{-\Delta E_{\mathrm{ext}}(X)} \rangle$ (the mean of $e^{-\Delta E_{\mathrm{ext}}(X)}$ for the observed values of $\Delta E_{\mathrm{ext}}(X)$ at each bootstrap), which we use to calculate a confidence interval for it. This results in the 99% confidence interval for $\langle e^{-\Delta E_{\mathrm{ext}}(X)} \rangle$ being (0.48, 1.64), which is consistent with the theoretical prediction $\langle e^{-\Delta E_{\mathrm{ext}}(X)} \rangle = 1$ for $\Delta F = 0$ according to Equation (5). Accordingly, we expect a similar behavior for our experimental data. Note that we take, for simplicity, $b = 0$, $\beta = 1$ and, for all $n$, $A_n = [-90, 90]$ in these simulations (see Methods).
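A minimal sketch of such a simulation is given below: a learner starts in equilibrium, the rotation changes according to a protocol, the driving error is accumulated, and the response is updated with one Metropolis step per trial. The values $b = 0$ and $\beta = 1$ follow the text; the support [-10, 10], the staircase protocol, the proposal width, and the single update per trial are illustrative assumptions and do not reproduce the paper's protocol (9). The bootstrap uses a simple percentile interval, which is also an assumption.

```python
import math
import random

BETA = 1.0            # inverse temperature, as in the simulations described above
LO, HI = -10.0, 10.0  # support of the equilibrium distribution (assumption)
STEP_SD = 0.5         # width of the random-walk proposal (assumption)
# Hypothetical up-down protocol: 0, 0.5, ..., 3.0, 2.5, ..., 0 (starts and ends at 0).
PROTOCOL = [0.5 * k for k in range(7)] + [0.5 * k for k in range(5, -1, -1)]


def loss(x, theta):
    """Exponential quadratic error of Eq. (1) with b = 0."""
    return 1.0 - math.exp(-(x - theta) ** 2)


def sample_equilibrium(theta, rng):
    """Exact rejection sampling from Eq. (2): uniform proposal on the support,
    accepted with probability exp(-beta * E), which never exceeds 1."""
    while True:
        x = rng.uniform(LO, HI)
        if rng.random() < math.exp(-BETA * loss(x, theta)):
            return x


def metropolis_step(x, theta, rng):
    """One Metropolis update targeting the equilibrium distribution for theta."""
    y = x + rng.gauss(0.0, STEP_SD)
    if not LO <= y <= HI:
        return x  # proposals outside the support are rejected
    if rng.random() < math.exp(-BETA * (loss(y, theta) - loss(x, theta))):
        return y
    return x


def simulate_cycle(rng, protocol=PROTOCOL):
    """Return the driving error accumulated along one pass through the protocol."""
    x = sample_equilibrium(protocol[0], rng)
    work = 0.0
    for theta_now, theta_next in zip(protocol, protocol[1:]):
        work += loss(x, theta_next) - loss(x, theta_now)  # environment changes
        x = metropolis_step(x, theta_next, rng)           # learner adapts
    return work


def jarzynski_average(works, beta=BETA):
    """Sample mean of exp(-beta * driving error), the left-hand side of Eq. (5)."""
    return sum(math.exp(-beta * w) for w in works) / len(works)


def bootstrap_ci(works, rng, n_boot=1000, level=0.99):
    """Percentile bootstrap interval for the Jarzynski average."""
    means = sorted(
        jarzynski_average(rng.choices(works, k=len(works))) for _ in range(n_boot)
    )
    tail = (1.0 - level) / 2.0
    return means[int(tail * (n_boot - 1))], means[int((1.0 - tail) * (n_boot - 1))]


rng = random.Random(0)
works = [simulate_cycle(rng) for _ in range(20)]  # 20 cycles, as in the experiment
print(jarzynski_average(works), bootstrap_ci(works, rng))
```

Because the protocol starts and ends at the same rotation, $\Delta F = 0$ and the average should approach 1 as the number of simulated cycles grows; with only 20 cycles the bootstrap interval is correspondingly wide.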

Figure 2.


Simulation of Crooks’ fluctuation theorem. (A) Simulation with 1000 cycles. In black, the theoretical prediction; in red, the linear regression for the simulated data and, in green, the simulated points. Since the simulated data set agrees well with Crooks’ fluctuation theorem (4), Jarzynski’s equality (5) is fulfilled. (B) Simulation with 20 cycles and bootstrapping. The black line is the theoretical prediction (4), while the red line and shaded area are, respectively, the mean and the 99% confidence interval of (4) after 1000 bootstraps of the driving error values obtained in a single run (which consists of 20 cycles).

Participants’ average adaptive responses can be seen in Fig. 3 compared to the experimentally imposed true parameter values (the trial-by-trial responses can be seen in Fig. 6). The green and red lines distinguish the forward and backward trajectories, respectively, so that, from the contrast between the two curves, hysteresis becomes apparent, as common in simple physical systems22 and as reported previously in similar experiments for sensorimotor adaptation50. Participants that achieve at least 50% adaptation are shaded by a green background color and are our participants of interest. The three participants that fail to achieve this minimum adaptation level are marked by a red shade. Instead of excluding these participants entirely from the analysis, we keep them in to show the contrast to the well-adapted participants and to highlight that the results reported for the well-adapted participants do not hold trivially for any participant producing inconsistent behavior.

Figure 3.


Hysteresis effect. The filled triangles are the mean of the observed angles for every deviation in both the forward process, in green, and the backward process, in red. The black line is the forward protocol. Note that we have mirrored the triangles for the backward process to make them coincide with those in the forward process that are exposed to the same true angle. Participants that achieve at least 50% adaptation are shaded by a green background color. Hysteresis can be observed between trials 1 and 5, 9 and 17 and 21 and 25. Notice, as expected, the forward means are below the backward in the first region, above in the second and below again in the third.

Figure 6.


Observed angles in the forward and backward processes. The black line represents the protocol while the filled triangles correspond to both the forward trajectories, first and third rows in green, and the backward trajectories, second and fourth rows in red.

Figure 4 shows participants’ data compared to the theoretical prediction from (4) and the 99 % confidence interval after 1000 bootstraps as in the case of the simulations in Fig. 2B. There, we see that our data follow the trend of the theoretical prediction and lie within or close to the confidence interval bounds of the prediction in broad regions for several participants. This is not a trivial result, as can be easily seen, when randomizing the temporal order of the trajectory points or when replacing the utility function with another one that does not fit the setup. Figure 5A,B show this, for example, for an inverted Mexican hat ((10) with σ=4) that assigns low utility to the target region, and for resamples of the trajectory points in a random order, respectively. Both results are clearly incompatible with the theoretical prediction.

Figure 4.


Experimental results for Crooks’ fluctuation theorem when the sensorimotor loss behaves as an exponential quadratic error (1). The black line is the theoretical prediction of Crooks’ fluctuation theorem (4), while the curves show the mean path after 1000 bootstraps of the observed driving error values. Participants that achieve at least 50% adaptation (cf. Fig. 3) are shaded by a green background color. The shaded areas inside the graphs are the 99% confidence intervals which result from bootstrapping. Note that we fit the parameters for each participant according to Sect. A.3.3.

Figure 5.


Control results for Crooks’ fluctuation theorem in two scenarios: (A) the sensorimotor loss behaves like a Mexican hat function and (B) the sensorimotor loss behaves as an exponential quadratic error but we sample the observed angles randomly with repetition. The black line is the theoretical prediction of Crooks’ fluctuation theorem (4) while the curves stand for the mean path after 1000 bootstraps of the observed driving error values. The shaded areas inside the graphs are the 99% confidence intervals which result from bootstrapping. Note, for simplicity, we assume β=1 for all participants when using the Mexican hat to demonstrate that the result in (A) does not trivially hold for any cost function. For (B), we fit the parameters for each participant according to Sect. A.3.3.

When conducting an additional robustness analysis in Fig. 8, we found that, under the proposed utility function, participants’ behavior is compatible with Crooks’ fluctuation theorem for a broad neighbourhood of parameter settings, but breaks down when choosing implausible parameters. Regarding Jarzynski’s equality (5), the confidence intervals for the majority of participants are consistent with the theoretical prediction when using the bootstrapped values to calculate $\langle e^{-\beta \Delta E_{\mathrm{ext}}(X)} \rangle$ (cf. Table 1). In contrast, when following the same procedure for both the inverted Mexican hat and the randomized procedure, we obtain consistency for a considerably smaller number of participants. In particular, for the inverted Mexican hat, we obtain consistency for only two participants. Moreover, these participants are S8 and S9, which belong to the group that did not reach at least 50% adaptation (indicated by the red background area in the figures). For the randomized procedure, the expected number of participants that show consistency is also close to two, although the specific participants which are consistent vary with the realization of the randomized procedure. More specifically, after 1000 runs of the randomized procedure, the mean number of consistent participants we observed was 2.33.

Figure 8.


Graphical representation of the accuracy of Crooks’ fluctuation theorem for several pairs of parameters $(b, \beta)$, which we measure through $d_{b,\beta}$ as explained in Sect. A.3.5. The color intensity grows monotonically with the distance $d_{b,\beta}$ and is divided into six regions, namely, $d_{b,\beta} \le 1$, $1 < d_{b,\beta} \le 3$, $3 < d_{b,\beta} \le 6$, $6 < d_{b,\beta} \le 11$, $11 < d_{b,\beta} \le 23$ and $23 < d_{b,\beta}$. The actual values of $d_{b,\beta}$ can be found in Table 2.

Table 1.

Experimental results for Jarzynski’s equality. We include the confidence intervals for the left-hand side of (5), which we obtain after bootstrapping the observed values of $\Delta E_{\mathrm{ext}}(x)$ for the forward process 1000 times and estimating $\langle e^{-\beta \Delta E_{\mathrm{ext}}(X)} \rangle$ by its mean for each set of bootstrapped data. In our experiment we have $\Delta F = 0$ on the right-hand side of (5), resulting in a theoretical prediction of $\langle e^{-\beta \Delta E_{\mathrm{ext}}(X)} \rangle = 1.0$. Note that for most subjects the value of 1.0 lies inside the confidence interval, which does not hold when assuming unsuitable loss functions, as discussed at the end of the Results. Participants that achieve at least 50% adaptation (cf. Fig. 3) are shaded by a green background color.

Participant | Confidence interval
1 | (0.03, 48.59)
2 | (0.03, 137.58)
3 | (0.01, 3.63)
4 | (0.49, 63.48)
5 | (0.46, 1.37)
6 | (0.04, 3.75)
7 | (0.01, 0.50)
8 | (1.98, 518130.21)
9 | (0.76, 77.24)
10 | (0.26, 48758.33)

Discussion

In our experiment we have investigated the hypothesis that human sensorimotor adaptation may be subject to the thermodynamic fluctuation theorems first reported by Crooks7 and Jarzynski20. In particular, we tested whether changes in sensorimotor error induced externally by an experimental protocol are linearly related to the log-ratio of the probabilities of behavioral trajectories under a given forward and time-reversed backward protocol of a sequence of visuomotor rotations. We found that participants’ data, in all cases where participants showed an appropriate adaptive response, were consistent with this prediction or close to its confidence interval bounds, as expected from our simulations with finite sample size. Moreover, we found that the exponentiated error averaged over the path probabilities was statistically compatible with unity for these participants, in line with Jarzynski’s theorem.

Together these results not only extend the experimental evidence of Boltzmann-like relationships between the probabilities of behavior and the corresponding order-inducing functions—such as energy, utility, or sensorimotor error—from the equilibrium to the non-equilibrium domain, but also from simple physical systems to more complex learning systems when studying adaptation in changing environments, deepening, thus, the parallelism between thermodynamics in physics and decision-making systems31.

When testing for the validity of thermodynamic relations, one of the most critical issues is the choice of the energy function, that is, in our case, the error cost function. In physical systems, the energy function is usually hypothesized following from simple models involving point masses, springs, rigid bodies, etc., and generally requires knowledge of the degrees of freedom of the system under consideration. Here we have used an exponential quadratic error as a utility function, as it has been suggested previously that human pointing behavior can be best captured by loss functions that approximately follow a negative parabola for small errors and then level off for large errors24. In the absence of very large errors, many studies in the literature on sensorimotor learning have only used the quadratic loss term48, 52. Quadratic errors have also been advocated in the context of the central limit theorem and in terms of prediction errors in the context of predictive coding36, 45–47. Thus, our assumptions regarding the loss function are compatible with the literature at large. Crucially, the reported results fail when assuming nonsensical cost functions, like the Mexican hat.

Experimental tests of both Jarzynski’s equality (5) and Crooks’ fluctuation theorem (4) have been previously reported in classical physics5, 11, 28, 37, 49 and also, in the case of Jarzynski’s equality, in quantum physics1, 44. Importantly, these results have been successfully tested in several contexts: unfolding and refolding processes involving RNA5, 28, electronic transitions between electrodes manipulating a charge parameter37, rotation of a macroscopic object inside a fluid surrounded by magnets where the current of a wire attached to the macroscopic object is manipulated11, and a trapped ion1, 44. Despite differences in physical realization, protocols, and energy functions (and thus work functions), all the above experiments follow the same basic design behind the approach presented here. This supports the claim that fluctuation theorems do not necessarily rely on involved physical assumptions but are simple mathematical properties of certain stochastic processes18, although originally they were derived in the context of non-equilibrium thermodynamics6, 19.

Mathematically, Crooks’ theorem (4) holds for any Markov process (i), whose initial distribution is in equilibrium (ii), and whose transition probabilities satisfy detailed balance with respect to the corresponding equilibrium distributions (iii)18. Our experimental test of Equation (4) can thus be seen as a test for the hypothesis that human sensorimotor adaptation processes satisfy conditions (i), (ii), and (iii). Condition (i) requires adaptation to be Markovian, which is in line with most error-driven models of sensorimotor adaptation43 that assume some internal state update of the form $x_{t+1} = f(x_t, e)$ with adaptive state $x$ and error $e$. While such models have proven fruitful for simple adaptation tasks like ours, they also have clear limitations, for example when it comes to meta-learning processes that have been reported in more complex learning scenarios2, 17. Condition (ii) is supported by our data in the second and last rows of Fig. 7, where it can be seen that participants’ behavior at the beginning of each cycle is at least approximately consistent with the equilibrium behavior recorded prior to the start of the experiment. Condition (iii) requires that the adaptive process converges to the equilibrium distribution (2) dictated by the environment and that the behaviour remains statistically unchanged when staying in that environment. Moreover, it requires that the equilibrium behavior at each energy level is time-reversible, that is, once adaptation has ceased, the trial-by-trial behavior would have the same statistics when played forward or backward in a video recording. Note, however, that this does not imply time-reversibility over the entire adaptation trajectory, but is only required locally for each transition step. In our sensorimotor setting, this would mean that after a suitably long adaptation time with perfect adaptation there would ultimately be no hysteresis, and accordingly it would be impossible to tell where the learner has come from.
If we regard, for example, Metropolis-Hastings as a plausible model of adaptation, as some kind of stochastic optimization scheme, detailed balance and time reversibility would be fulfilled16, 38. What kind of model describes human adaptive behavior best, and whether such a model is compatible with detailed balance is ultimately an open question. In our experiment at least, the condition seems to be fulfilled well enough to stay within the confidence intervals associated with the predictions made by Crooks’ theorem.
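Condition (iii) can be checked directly for a small discretized Metropolis scheme. The sketch below builds the transition matrix of a nearest-neighbour Metropolis chain on a grid of responses and measures how far it is from satisfying $\pi_i P_{ij} = \pi_j P_{ji}$ with respect to the Boltzmann weights. The grid, the nearest-neighbour proposal, and the function names are assumptions made for illustration; this says nothing about whether human adaptation satisfies the condition.

```python
import math


def boltzmann(energies, beta):
    """Normalized Boltzmann weights for a finite list of energies."""
    weights = [math.exp(-beta * e) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]


def metropolis_matrix(energies, beta):
    """Transition matrix of a Metropolis chain that proposes the left or right
    neighbour with probability 1/2 each and stays put on rejection."""
    k = len(energies)
    matrix = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in (i - 1, i + 1):
            if 0 <= j < k:
                accept = min(1.0, math.exp(-beta * (energies[j] - energies[i])))
                matrix[i][j] = 0.5 * accept
        # Rejected moves and proposals beyond the grid keep the chain in place.
        matrix[i][i] = 1.0 - sum(matrix[i])
    return matrix


def max_detailed_balance_violation(matrix, pi):
    """Largest absolute difference between pi_i P_ij and pi_j P_ji."""
    k = len(pi)
    return max(
        abs(pi[i] * matrix[i][j] - pi[j] * matrix[j][i])
        for i in range(k)
        for j in range(k)
    )


# Energies from the loss (1) with theta = b = 0 on an integer grid (illustrative).
grid = list(range(-5, 6))
energies = [1.0 - math.exp(-x ** 2) for x in grid]
pi = boltzmann(energies, beta=1.0)
P = metropolis_matrix(energies, beta=1.0)
print(max_detailed_balance_violation(P, pi))  # zero up to floating-point rounding
```

Detailed balance implies that the Boltzmann weights are stationary under the chain, which is the weaker property needed for Jarzynski’s equality.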

Figure 7.


Comparison between participants’ behaviour in washout trials (between perturbation cycles) with the fitted equilibrium distribution (recorded before participants experienced any perturbation). The first and third rows compare the normalized histogram of the angles observed during the initial 100 trials (blue color) with the histogram of the fitted equilibrium distribution (2) over the same trials (green color). The second and fourth rows compare the same fitted equilibrium distribution (green color) with the normalized histogram of the angles observed in the 0 deviation plateaus (washout trials) which separate forward and backward protocol (red color). Note that the plateau in each cycle consists of 10 points, from which we only include the last 8 to avoid large aftereffects. The application of Crooks’ theorem requires that subjects fully equilibrate between protocols, that is, in our case their behavior in washout trials should return to the fitted equilibrium behavior at the start of the experiment. Compare the discussion of condition (ii) in the Discussion.

While Jarzynski’s equality (5) directly follows from Crooks’ theorem, weaker assumptions are sufficient to derive it18, 19. In particular, condition (iii) regarding detailed balance is not necessary, as it is only required that the behavioral distribution does not change anymore once the equilibrium distribution is reached. Thus, Equation (5) can be used as a test for the weaker hypothesis that human sensorimotor adaptation satisfies conditions (i), (ii) and stationarity after convergence. While Jarzynski’s equality only requires samples from the forward process, Crooks’ theorem also tests the relation between the forward and the backward processes. In particular, Crooks’ theorem decouples the information processing with respect to any particular environment from the biases introduced by the adaptation history, that is, it assumes the transition probabilities for any given environment are independent of the history. In other words, the conditional probabilities have no memory and, thus, all memory effects are explained in terms of the state of the learning system prior to making some decision. Hence, the observed difference in behaviour after having adapted to the same environment, the hysteresis, is solely explained in terms of the information processing history before encountering the environment. Such hysteresis effects are not only common in simple physical systems like magnets or elastic bands, but have also been reported for sensorimotor tasks23, 40, 50. The hysteresis effects we report in Fig. 3 are in line with a system obeying Crooks’ theorem and can be replicated using Markov Chain Monte Carlo simulations of adaptation16.

Our study is part of a number of recent studies that have tried to harness equilibrium and non-equilibrium thermodynamics to gain new theoretical insights into simple learning systems12, 13, 31, 35, 46. For example, the information that can be acquired by learning in simple forward neural networks has been shown to be bounded by thermodynamic costs given by the entropy change in the weights and the heat dissipated into the environment42. More generally, when interpreting a system’s response to a stochastic driving signal in terms of computation, the amount of non-predictive information contained in the state about past environmental fluctuations is directly related to the amount of thermodynamic dissipation46. This suggests that thermodynamic fundamentals, like the second law, can be carried over to learning systems. Consider, for example, a Bayesian learner where the utility is given by the log-likelihood model and where the data are presented either in one chunk for a single update, or consecutively in little batches with many little updates. Rather than having one big surprise, in the latter case the cumulative surprise is much smaller as prior expectations can be continuously adapted, up to a point where the cumulative surprise reaches a lower bound given by the log-likelihood of the data, which corresponds to the free energy difference before and after learning16. Fluctuation theorems have recently also been attributed a fundamental role in the context of the Free Energy Principle, with relations to information geometry and decision-theoretic concepts like risk, ambiguity, expected information gain and expected value9, 33. Due to the central role of the concept of variational free energy in inference processes15, this raises the interesting question of to what extent our results may generalise to any belief-updating process, including for example perceptual inference and perceptual hysteresis.
Finally, it has even been suggested that the dissipation of absorbed work as it is studied in a generalized Crooks theorem may underlie a general thermodynamic mechanism for self-organization and adaptation in living matter12, raising the question of whether such a general principle of adaptive dissipation could also govern biological learning processes35.

A Appendix: Methods

A.1 Theoretical methods

The derivation of (4) and (5) in the context of general Markov chains can be found in ref. 18. A similar proof of (5) under stronger assumptions was derived in ref. 6, and a different one using the same assumptions was given in ref. 19. Regarding (4), a similar proof can be found in ref. 6. Note, however, that the usual definition of work in thermodynamics is slightly different for the forward and backward process, based on the physical definition of time reversal and the associated symmetry for the work values. In our case, we define the driving signal that is analogous to the work concept in the same way for both the forward and the backward process. In this case, for Equation (4) to hold, we need to assume that $E_1 = E_0$ both in the forward and backward process18. Fortunately, this is true for our protocol, since we begin both the forward and the backward protocol with some washout trials without perturbation. It should also be pointed out that, in order for the elements involved in Jarzynski’s and Crooks’ derivations to be well-defined, the equilibrium probability density associated with each step in the Markov chain ought to be non-zero at both the starting and ending point of that step18. This will play a relevant role in the choice of the support $A_n$ for the equilibrium distributions $p_n^{eq}$ in Sect. A.3.3.

A.2 Simulation methods

In this section, we explain in detail how we simulated (4) and (5).

A.2.1 Metropolis–Hastings algorithm

We use ref. 3 as the reference for this section. However, for simplicity, we skip over several technical details and may oversimplify some notions.

The Metropolis-Hastings algorithm is a procedure that allows one to obtain samples $x$ from a distribution $p$ that is proportional to some function $f$, that is, $p(x) = \frac{1}{Z} f(x)$. There are three concepts relevant to this algorithm: $U$, $q$ and $\alpha$. They are defined as follows:

  • $U(A)$ stands for the uniform distribution over some set $A \subseteq \mathbb{R}$.

  • $q(\cdot,\cdot)$ is called the candidate generating density. The role of $q$ in the algorithm is to generate a new point $y$ given a previous point $x$, with $y$ being sampled from the distribution $q(x,\cdot)$. In our case we define the density function in $y$ with $\int_{-90}^{90} q(x,y)\,dy = 1$, as we assume that movements will be towards the target (0° direction) under a maximally induced error of 20°. Accordingly, we can expect that practically all responses will be covered by choosing a support of ±90°.

  • $\alpha(\cdot,\cdot)$ is defined as follows:
    $$\alpha(x,y) = \begin{cases} \min\left\{ \dfrac{f(y)\,q(y,x)}{f(x)\,q(x,y)},\, 1 \right\} & \text{if } f(x)\,q(x,y) > 0, \\ 1 & \text{otherwise,} \end{cases}$$
    and is included in the algorithm as a filter on the samples proposed by $q$, so that some of these samples will be accepted and some will be rejected, to make the samples appear to be sampled from $p$.

We can now introduce the Metropolis-Hastings algorithm. The algorithm is initialized at an arbitrary value $x_0$ and then repeats the following steps for $i = 1, 2, \dots, M$:

  • (i) Generate $y$ from $q(x_{i-1}, \cdot)$ and $u$ from $U(0, 1)$.

  • (ii) If $u \le \alpha(x_{i-1}, y)$, then $x_i = y$.

  • (iii) Otherwise, $x_i = x_{i-1}$.

Finally, the algorithm returns the values $(x_1, \dots, x_M)$.

Note that the density of transitions from $x$ to $y$ is therefore given by

$$p_M(x,y) = q(x,y)\,\alpha(x,y) \quad \text{if } x \neq y,$$

which satisfies detailed balance with respect to $p \propto f$ (ref. 3). Thus, $p$ is the stationary distribution of the resulting Markov process, and so the $x_i$ can be regarded as samples from $p$ once the chain has passed a transient stage after which the effect of the initialization is negligible. Note that in our implementation, described below, we only require the burn-in phase for the initial energy in order to make sure that the process starts in the corresponding stationary state. However, since we are interested in the adaptation process during a changing energy signal, we only use the first sample ($M=1$) for the remaining steps, conditioned on the sample from the previous step.
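As an illustration of steps (i) to (iii), the following is a minimal Python sketch of the algorithm, not the implementation used in this study. The function name `metropolis_hastings`, its arguments, the standard normal target and the unit-variance random-walk candidate density are all assumptions chosen for the example.

```python
import numpy as np

def metropolis_hastings(f, propose, q, x0, n_steps, rng):
    """Return n_steps samples of a chain whose stationary density is proportional to f.

    f       : unnormalized target, f(x) >= 0
    propose : draws a candidate y from q(x, .)
    q       : candidate generating density q(x, y)
    """
    x = x0
    samples = np.empty(n_steps)
    for i in range(n_steps):
        y = propose(x, rng)               # step (i): candidate y ~ q(x, .)
        u = rng.uniform(0.0, 1.0)         # step (i): u ~ U(0, 1)
        denom = f(x) * q(x, y)
        if denom > 0:
            alpha = min(f(y) * q(y, x) / denom, 1.0)
        else:
            alpha = 1.0
        if u <= alpha:                    # step (ii): accept the candidate
            x = y
        samples[i] = x                    # step (iii): otherwise keep the old point
    return samples

# Illustrative target and proposal (assumptions, not taken from the study).
rng = np.random.default_rng(0)
target = lambda x: np.exp(-0.5 * x ** 2)  # unnormalized standard normal
propose = lambda x, rng: rng.normal(x, 1.0)
q = lambda x, y: np.exp(-0.5 * (y - x) ** 2) / np.sqrt(2.0 * np.pi)

chain = metropolis_hastings(target, propose, q, x0=0.0, n_steps=20000, rng=rng)
```

With a symmetric candidate density such as this one, the $q$ terms cancel in $\alpha$; they are kept here so that the code mirrors the general definition.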

A.2.2 Implementation

Given a set of equilibrium distributions $(p_0, \dots, p_N)$, we use the Metropolis-Hastings algorithm on their proportional functions $(f_0, \dots, f_N)$ to generate two paths: the forward path, where we apply the algorithm once at step $i$ ($M=1$ in Sect. A.2.1, as explained above) with $p = p_i$, and the backward path, where we do the same but with the distributions in the reverse order. In particular, we consider

$$f_n(x) = e^{-E_n(x)} \tag{6}$$

for $n = 0, \dots, 24$ for the forward process, where, for $n = 1, \dots, 24$, we take

$$E_n(x) = -e^{-(x-\theta_n)^2} \tag{7}$$

with $\theta_n$ given by (9), and for $n = 0$ we consider

$$E_0(x) = \begin{cases} -(x+2) & \text{if } x < -2, \\ -e^{-x^2} & \text{if } -2 \le x \le 2, \\ x-2 & \text{if } 2 < x. \end{cases} \tag{8}$$

We will refer to the application of the algorithm following the sequence in (6) with $M=1$ for each $n = 1, \dots, 25$ as a cycle. Note that $E_0$ in (8) differs from $E_n$ in (7) for $n = 1, \dots, 24$. While we would like to take $E_0$ as in (7) with $\theta_0 = 0$, since one of our hypotheses is that the simulations sample the first point in each cycle from

$$p_0(x) \propto e^{e^{-x^2}},$$

the values of $p_0$ for $x \notin [-2,2]$ are practically indistinguishable once we fix a certain precision. As a result, the algorithm does not converge to $p_0$ in the long run. To avoid this difficulty, we simply modify the function outside $[-2,2]$ such that points there become distinguishable. This results in the algorithm converging to a distribution close to $p_0$. Note that this modification only applies to the generation of the initial samples; hence, we use (7) to calculate $\Delta E_{ext}(x)$.

The candidate generating density we use for the $n$th step with $n = 1, \dots, 24$ is a normal distribution with mean equal to the $(n-1)$th sample and standard deviation equal to the mean of the distances between subsequent points in the observed data, which turns out to be around 5. Using the values generated by the algorithm during a cycle, we calculate $\Delta E_{ext}(x)$ for the forward process via the utilities in (7), and, after generating several of them, we apply kernel density estimation (see Sect. A.3.4) to estimate $\rho_F$ in (4). We proceed analogously to estimate $\rho_B$ and, finally, use the obtained values of $\Delta E_{ext}(x)$ for the forward process together with the estimates of $\rho_F$ and $\rho_B$ to test (4). This test is done differently for the simulation with a large number of samples and for that with a small number of them. For the larger one, we simply use the least squares method as the estimate of (4) (cf. Fig. 2A). For the smaller one, however, we produce 1000 bootstraps from the produced values of $\Delta E_{ext}(x)$ and find a confidence interval for (4) from the curves we obtain from the pair ($\rho_F$, $\rho_B$) for each bootstrap (cf. Fig. 2B).
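The procedure for one simulated cycle can be sketched as follows. This is a rough Python illustration under stated assumptions rather than the code used for the reported simulations. In particular, the definition of the driving signal lies outside this section, so the sketch assumes the common work-like form in which $\Delta E_{ext}$ accumulates the energy change $E_n(x_{n-1}) - E_{n-1}(x_{n-1})$ at the current state each time the environment switches. The names `simulate_cycle`, `mh_step`, `energy` and `energy_burn_in`, and the burn-in length of 500 steps, are likewise hypothetical.

```python
import numpy as np

# Rotation sequence alpha from the experimental design (degrees).
ALPHA = np.array([0, 5, 10, 15, 20, 20, 20, 20, 20, 15, 10, 5, 0,
                  -5, -10, -15, -20, -20, -20, -20, -20, -15, -10, -5, 0],
                 dtype=float)

def energy(x, theta):
    """Energy (7): E_n(x) = -exp(-(x - theta_n)^2)."""
    return -np.exp(-(x - theta) ** 2)

def energy_burn_in(x):
    """Modified initial energy (8), linear outside [-2, 2]."""
    if x < -2.0:
        return -(x + 2.0)
    if x > 2.0:
        return x - 2.0
    return -np.exp(-x ** 2)

def mh_step(x, energy_fn, sd, rng):
    """One Metropolis-Hastings step for f = exp(-E) with a normal proposal."""
    y = rng.normal(x, sd)
    # The proposal is symmetric, so the q terms cancel in the acceptance ratio.
    accept = min(np.exp(energy_fn(x) - energy_fn(y)), 1.0)
    return y if rng.uniform() <= accept else x

def simulate_cycle(thetas, rng, sd=5.0, burn_in=500):
    """Return the (assumed) driving signal accumulated over one cycle."""
    x = 0.0
    for _ in range(burn_in):              # burn-in under the initial energy (8)
        x = mh_step(x, energy_burn_in, sd, rng)
    driving = 0.0
    for n in range(1, len(thetas)):
        # Assumed definition: energy change at the current state on switching.
        driving += energy(x, thetas[n]) - energy(x, thetas[n - 1])
        # A single step (M = 1) under the new energy, from the previous sample.
        x = mh_step(x, lambda z, t=thetas[n]: energy(z, t), sd, rng)
    return float(driving)

rng = np.random.default_rng(1)
forward = [simulate_cycle(ALPHA, rng) for _ in range(20)]
backward = [simulate_cycle(ALPHA[::-1], rng) for _ in range(20)]
```

The backward path simply reuses the same routine with the sequence of energies reversed, which matches the convention above of defining the driving signal in the same way for both processes.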

A.3 Experimental methods

In this section, we explain the specifics of how we experimentally tested both (4) and (5).

A.3.1 Participants

Ten participants P1, …, P10, five females and five males, participated in this study. Three of the authors were among the participants (P1, P2 and P3). All other participants provided written informed consent for participation and were remunerated with 10 Euros per hour. The participants were undergraduate and graduate students. The procedures were approved by the Ethics committee of Ulm University. All methods were performed in accordance with the relevant guidelines and regulations.

A.3.2 Setup

The experiment was run on a vBOT. Each participant performed the task using the handle of the right arm of the vBOT, which was manipulated with the dominant hand. The participants had no direct view of the handle but of a screen where its position, altered according to a protocol we describe in the following, was represented by a cursor.

A.3.3 Experimental design

Participants were asked to reach the center of a yellow rounded target on the screen with the center of their cursor. To begin each trial, the participants were asked to place the cursor inside a rounded initial position whose center was 15 cm away from the target’s center along the same vertical. Once the cursor crossed the horizontal containing the center of the target, the target became green if participants successfully situated the center of the cursor inside the target and red otherwise. Once the target changed its color, participants were asked to return the cursor to the initial position to begin the following trial. While both the target and the initial position were at the same place each trial, the cursor did not represent the movement of the handle veridically each trial. In particular, after 100 trials where the cursor position and the handle coincided, there were 1420 trials divided into 20 cycles of 66 trials where the cursor position was determined by rotating the vector going from the center of the initial position to the handle’s position. The rotation angle $\theta_n$ for each trial $n = 0, \dots, 65$ in any cycle was

$$\theta_n = \begin{cases} \alpha(n) & \text{if } n = 0, \dots, 24, \\ 0 & \text{if } n = 25, \dots, 32, \\ \alpha(57-n) & \text{if } n = 33, \dots, 57, \\ 0 & \text{if } n = 58, \dots, 65, \end{cases} \tag{9}$$

where all angles are in degrees and

α=(0,5,10,15,20,20,20,20,20,15,10,5,0,-5,-10,-15,-20,-20,-20,-20,-20,-15,-10,-5,0).
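For concreteness, the schedule in (9) can be written out as a short Python sketch. The names `ALPHA`, `rotation_angle` and `schedule` are hypothetical, and $\alpha(n)$ is read as the $n$-th entry of $\alpha$ counting from zero, which is an assumption about the indexing.

```python
# Entries of alpha, in degrees, indexed from 0 to 24.
ALPHA = [0, 5, 10, 15, 20, 20, 20, 20, 20, 15, 10, 5, 0,
         -5, -10, -15, -20, -20, -20, -20, -20, -15, -10, -5, 0]

def rotation_angle(n):
    """Rotation angle theta_n (degrees) for trial n of a 66-trial cycle, as in (9)."""
    if 0 <= n <= 24:
        return ALPHA[n]           # forward protocol
    if 25 <= n <= 32:
        return 0                  # washout trials without perturbation
    if 33 <= n <= 57:
        return ALPHA[57 - n]      # backward protocol: alpha traversed in reverse
    if 58 <= n <= 65:
        return 0                  # washout trials without perturbation
    raise ValueError("n must lie between 0 and 65")

schedule = [rotation_angle(n) for n in range(66)]
```

Under this reading, trials 33 to 57 present exactly the forward sequence in reverse order, separated from the forward block by eight unperturbed trials.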

For each $n = 0, \dots, 65$, we extract $x_n$, the angle between the vertical segment joining the center of the initial position and the center of the target and the segment joining the center of the initial position and the handle at the first recorded point which is more than 12 cm away from the center of the initial position. One can find the recorded angles $(x_0, \dots, x_{65})$ for both the forward and backward processes in Fig. 6. For participant $P_j$, with $1 \le j \le 10$, we take $p_{n,j} = p_n^{eq}$, with parameters $b_j$, $c_j$ and $\beta_j$, as the equilibrium distribution for the $n$-th trial, where $b_j$ represents the bias introduced by the machine for participant $P_j$. We determine the bias as the mean of the initial 100 trials (where the cursor veridically represents the handle). The value $c_j$ represents the maximum deviation for participant $P_j$ among the distances $|x_n - (\theta_n + b_j)|$ and $|x_{n-1} - (\theta_n + b_j)|$, which we use to fix the support of the equilibrium distribution for $P_j$ as $A_n = [\theta_n + b_j - c_j, \theta_n + b_j + c_j]$. The parameter $\beta_j$ represents the spread around the bias, which we pick once the bias and the support of the equilibrium distributions are fixed by requiring these distributions to maximize the likelihood of the observed values for the first 100 trials. We observe that the best spread parameters are between $\beta_j = 0.25$ and $\beta_j = 5$ for all participants. In order to choose the most suitable one for each participant, we consider the values between 0.25 and 5 that result from sequentially adding 0.25 to the lowest value and pick as $\beta_j$ the one that maximizes the likelihood on the observed angles in the 100 initial trials (see Fig. 7 for a comparison between the observed angles and the equilibrium distribution). We discuss, in Sect. A.3.5, how the choice of the parameters $b_j$ and $\beta_j$ affects the results. Note that the choice of $c_j$ does not directly affect how we measure the accuracy of the predictions, but is key in the maximum likelihood estimation of $\beta_j$.

Using the angles recorded during a cycle, we calculate $\Delta E_{ext}(x)$ via $p_{n,j}$ for both the forward and backward processes, and, using the 20 values per participant, we estimate $\rho_F$ and $\rho_B$ in (4) through kernel density estimation (see Sect. A.3.4). Finally, we bootstrap the obtained values of $\Delta E_{ext}(x)$ for the forward and backward process to obtain several estimates of $\rho_F$ and $\rho_B$. Each of these pairs is used to produce a curve that estimates (4). The mean of these curves for each participant is what we compare to (4) in Fig. 4. The same values of $\rho_F$ are used to test (5) (cf. Table 1).

A.3.4 Kernel density estimation

In order to determine the probability distributions $\rho_F$ and $\rho_B$ in (4), we use kernel density estimation51. Kernel density estimation consists of choosing a function $K$, the kernel, and a positive number $h > 0$, the bandwidth, and approximating the unknown density $p$ by distributions of the form

$$\frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right).$$

We take $K$ to be the standard normal density. That is, we estimate $p$ as an average of normal densities centered at each observed point $x_i$, for $i = 1, \dots, n$, with the bandwidth $h$ determining how much each $x_i$ influences other points in $\mathbb{R}$. We fix $h = 0.7$ throughout this work.
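The estimator can be sketched in a few lines of Python. This is an illustrative version assuming NumPy, not the code used for the analysis; the function name `kde_estimate` and the example values in `observed` are hypothetical.

```python
import numpy as np

def kde_estimate(x, samples, h=0.7):
    """Gaussian kernel density estimate (1 / (n h)) * sum_i K((x - x_i) / h)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    samples = np.asarray(samples, dtype=float)
    z = (x[:, None] - samples[None, :]) / h               # one row per query point
    kernel = np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)  # standard normal K
    return kernel.sum(axis=1) / (samples.size * h)

# Made-up values standing in for observed driving signals.
observed = np.array([-1.0, 0.5, 2.0])
grid = np.linspace(-10.0, 12.0, 4401)
density = kde_estimate(grid, observed)
```

A larger bandwidth spreads each observation over a wider range and gives a smoother estimate, while a smaller one follows the individual observations more closely.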

A.3.5 Robustness analysis

In this section, we measure model robustness using two approaches: (i) using the exponential quadratic error (1) and varying the parameters we fitted, i.e. $b$ and $\beta$, and (ii) fixing a pair of parameters that are close to the optimal ones for each participant and taking convex combinations of the exponential quadratic error and the Mexican hat as sensorimotor errors.

As pointed out in Sect. A.3.3, we fix the parameters in (1) and (4) via the initial 100 trials (where no perturbation is applied). To assess model robustness, we consider the effect of assuming the same model with different parameters. We consider, in particular, all pairs $(b,\beta)$ with $b \in \{-10,-3,-1,0,1,3,10\}$ and $\beta \in \{0.01,0.1,1,3,4,10,100\}$, since they cover a wide scope of the possible behaviour of (4) using the model in (1). For the robustness analysis we fit the data of all participants with the same parameter sets. In Fig. 9, we show the histogram of the driving signals $\Delta E_{ext}(x)$ for different pairs of parameters $(b,\beta)$. Then, we follow the bootstrapping procedure from Sect. A.3.3 using the different values of $b$ and calculate the mean distance between the mean of the curves we obtain from the bootstraps and the theoretical prediction (4) with the different values of $\beta$. In particular, we consider the mean horizontal distance between the prediction and the mean curve at the points between $\Delta E_{ext}(x) = -4$ and $\Delta E_{ext}(x) = 4$ (that is, the range of values of $\Delta E_{ext}(x)$ we present in Fig. 4) with steps of 0.1. We denote the obtained mean distance as $d_{b,\beta}$.

Figure 9.

Histogram of the forward driving signal values using different values of $b$. In particular, we present the histograms for $b = 1, -1, 10, -10$. We include the first two since they are close to the values of $b$ we fit from the initial 100 trials (where no deviation is applied) and the last two to illustrate the grounds on which we discard certain parameter pairs. As expected from the observed hysteresis effect (cf. Fig. 3), the histograms in (A) and (B), which correspond to $b=1$ and $b=-1$, respectively, are biased towards positive values of the driving signal. When assuming implausible parameters, like the ones in (C) and (D), which correspond to $b=10$ and $b=-10$, respectively, the bias shifts towards negative values (cf. C, D) and even shows a significant concentration of values around 0 (cf. C). Note that we observe the same respective biases in the backward driving signals.

To assess how well the parameters fit the data, we have to consider the plausibility of the data being generated by our model using the different parameter settings $(b,\beta)$. Accordingly, it is not enough to simply look at $d_{b,\beta}$ as a goodness-of-fit measure. The reason is that the underlying assumption in our model is that the data come from a Markov chain where the equilibrium distributions at each step are given by the Boltzmann distribution (2) with parameters $(b,\beta)$. In this situation, we expect participants to lag behind the utility they are adapting to most of the time, and hence, by definition, we expect the driving signal to be biased towards positive values. We can discard any parameter settings where this is not the case. Accordingly, we can disregard all pairs that have $b = 10$ or $b = -10$ (see Fig. 9). The values of $d_{b,\beta}$ for all pairs $(b,\beta)$ we considered can be found in Table 2 (see Fig. 8 for a graphical comparison). As we can see there, the best parameters have $\beta = 1$, $-3 \le b \le 3$, and mean distances which are both close to each other and significantly better than the rest. This was expected, since the hypothesis that the data observed at the plateaus follows (2) for these parameters is not completely implausible (cf. Fig. 7). The values $b \in \{-1,0,1\}$ and $\beta \in \{3,4\}$, which are the closest to the fitted parameters, also have a small mean distance (although larger than the best cases). In contrast, $d_{b,\beta}$ becomes significantly larger for the parameters that are clearly unlikely, that is, those that present a huge concentration of the probability around some point, i.e. the ones where the value of $\beta$ is large. Conversely, whenever the values of $\beta$ become small, the equilibrium distributions all become closer to a uniform distribution and, although the mean distance does worsen when compared to the best cases, its values do not increase much.

Table 2.

Mean distance between the theoretical prediction in (4) and the mean curve we obtain from bootstrapping the observed angles (see Sect. A.3.3) for several pairs of parameters (b,β). In particular, we consider the combinations having b=-10,-3,-1,0,1,3,10 and β=0.01,0.1,1,3,4,10,100.

b \ β     0.01    0.1     1       3       4       10      100
-10       2.54    2.73    4.51    8.15    10.15   22.53   202.52
-3        2.19    1.99    0.33    3.44    5.42    17.79   197.80
-1        2.42    2.26    0.48    3.08    4.81    17.57   197.55
0         2.35    2.18    0.53    3.31    5.02    17.63   197.61
1         2.08    1.91    0.47    3.61    5.58    17.89   197.90
3         1.66    1.48    0.51    4.22    6.22    18.31   198.33
10        1.62    1.79    3.60    8.98    10.98   21.60   201.60

To assess robustness with an obviously non-fitting utility, we consider an inverted Mexican hat as utility function, that is, we substitute (1) by

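$$E_n(x) = \frac{2}{\sqrt{3\sigma}\,\pi^{1/4}} \left(1 - \left(\frac{x-\theta_n}{\sigma}\right)^2\right) \exp\left(-\frac{(x-\theta_n)^2}{2\sigma^2}\right), \tag{10}$$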

where we take σ=4. In this scenario, the bootstrapped data does not reflect the trend of the theoretical prediction (cf. Fig. 5). Moreover, as illustrated in Fig. 10, the model presents an unexpected bias towards negative values of the driving signal. Hence, as discussed above, the likelihood of the data coming from such a Markov chain is small and we can disregard this model. Furthermore, when following the same robustness analysis we performed on the pairs (b,β) using the convex combinations λf+(1-λ)g as sensorimotor loss, where λ=0,0.25,0.5,0.75,1, f is the exponential quadratic error with b=0 and β=4 (which are close to the values fitted for the participants) and g is the Mexican hat with σ=4, we obtain that the mean distance decreases as λ increases, as one can see in Table 3.
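The convex combinations used in this part of the analysis can be illustrated with a short Python sketch. It assumes the standard Mexican hat (Ricker) form with a decaying exponential for $g$. Since the exponential quadratic error (1) is defined outside this section, the function `stand_in_error` below simply reuses the shape of (7) as a placeholder for $f$ and should not be read as the fitted model. All function names are hypothetical.

```python
import math

def mexican_hat(x, theta, sigma=4.0):
    """Mexican hat (Ricker) function centered at theta with width sigma."""
    t = (x - theta) / sigma
    norm = 2.0 / (math.sqrt(3.0 * sigma) * math.pi ** 0.25)
    return norm * (1.0 - t ** 2) * math.exp(-0.5 * t ** 2)

def stand_in_error(x, theta):
    """Placeholder for the exponential quadratic error; same shape as (7)."""
    return -math.exp(-(x - theta) ** 2)

def convex_combination(f, g, lam):
    """Return the loss lam * f + (1 - lam) * g for a weight lam in [0, 1]."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lambda x, theta: lam * f(x, theta) + (1.0 - lam) * g(x, theta)

# One loss per weight considered in the robustness analysis.
weights = [0.0, 0.25, 0.5, 0.75, 1.0]
losses = {lam: convex_combination(stand_in_error, mexican_hat, lam)
          for lam in weights}
```

At $\lambda = 0$ the combination reduces to the Mexican hat alone and at $\lambda = 1$ to the exponential quadratic error alone, so the intermediate weights interpolate between the two models.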

Figure 10.

Histogram of the forward driving signals using an inverted Mexican hat (10) with $\sigma = 4$ as sensorimotor loss. Because of the unexpected bias towards negative values of the driving signal that we observe, it is unlikely that the data were generated by a Markov chain following such a sensorimotor loss, and we can discard this model. Note that we observe the same bias in the backward driving signals.

Table 3.

Mean distance between the theoretical prediction in (4) and the mean curve we obtain from bootstrapping the observed angles (see Sect. A.3.3) for several sensorimotor errors that are obtained as convex combinations of the exponential quadratic error (1) and the Mexican hat (10). In particular, we consider sensorimotor errors of the form λf+(1-λ)g, where λ=0,0.25,0.5,0.75,1, f is the exponential quadratic error with b=0 and β=4 (which are close to the values fitted for the participants) and g is the Mexican hat with σ=4. As expected, the mean distance diminishes as the weight of the exponential quadratic error increases.

λ       Mean distance
0       11.68
0.25    10.78
0.5     8.75
0.75    6.04
1       5.02

Author contributions

The authors confirm contribution to the paper as follows: study conception and design: P.H., C.L.-L., D.B.; data collection: P.H., C.L.-L.; analysis and interpretation of results: P.H., D.B.; draft manuscript preparation: P.H., C.L.-L., S.G., D.B. All authors reviewed the results and approved the final version of the manuscript.

Funding

Open Access funding enabled and organized by Projekt DEAL.

Data Availability

The datasets generated and analysed during the current study are available from the corresponding author on reasonable request.

Declarations

Competing interests

The authors declare no competing interests.

Ethics declaration

The studies involving human participants were reviewed and approved by the Ethics committee of Ulm University. The participants provided their written informed consent to participate in this study.

Footnotes

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  • 1. An S, Zhang J-N, Um M, Lv D, Lu Y, Zhang J, Yin Z-Q, Quan HT, Kim K. Experimental test of the quantum Jarzynski equality with a trapped-ion system. Nat. Phys. 2015;11(2):193–199. doi: 10.1038/nphys3197.
  • 2. Braun DA, Mehring C, Wolpert DM. Structure learning in action. Behav. Brain Res. 2010;206(2):157–165. doi: 10.1016/j.bbr.2009.08.031.
  • 3. Chib S, Greenberg E. Understanding the Metropolis-Hastings algorithm. Am. Stat. 1995;49(4):327–335.
  • 4. Cohen EGD, Mauzerall D. A note on the Jarzynski equality. J. Stat. Mech: Theory Exp. 2004;2004(07):P07006.
  • 5. Collin D, Ritort F, Jarzynski C, Smith SB, Tinoco I, Bustamante C. Verification of the Crooks fluctuation theorem and recovery of RNA folding free energies. Nature. 2005;437(7056):231–234. doi: 10.1038/nature04061.
  • 6. Crooks GE. Nonequilibrium measurements of free energy differences for microscopically reversible Markovian systems. J. Stat. Phys. 1998;90(5):1481–1487.
  • 7. Crooks GE. Entropy production fluctuation theorem and the nonequilibrium work relation for free energy differences. Phys. Rev. E. 1999;60(3):2721. doi: 10.1103/PhysRevE.60.2721.
  • 8. Crooks GE. Path-ensemble averages in systems driven far from equilibrium. Phys. Rev. E. 2000;61(3):2361. doi: 10.1103/PhysRevE.61.2361.
  • 9. Da Costa L, Friston K, Heins C, Pavliotis GA. Bayesian mechanics for stationary processes. Proc. R. Soc. A. 2021;477(2256):20210518. doi: 10.1098/rspa.2021.0518.
  • 10. De Groot, S.R. & Mazur, P. Non-equilibrium Thermodynamics (Courier Corporation, 2013).
  • 11. Douarche F, Ciliberto S, Petrosyan A, Rabbiosi I. An experimental test of the Jarzynski equality in a mechanical experiment. EPL (Europhys. Lett.) 2005;70(5):593. doi: 10.1209/epl/i2005-10024-4.
  • 12. England JL. Dissipative adaptation in driven self-assembly. Nat. Nanotechnol. 2015;10(11):919–923. doi: 10.1038/nnano.2015.250.
  • 13. Goldt S, Seifert U. Stochastic thermodynamics of learning. Phys. Rev. Lett. 2017;118(1):010601. doi: 10.1103/PhysRevLett.118.010601.
  • 14. Gottwald S, Braun DA. Bounded rational decision-making from elementary computations that reduce uncertainty. Entropy. 2019;21(4):1. doi: 10.3390/e21040375.
  • 15. Gottwald S, Braun DA. The two kinds of free energy and the Bayesian revolution. PLoS Comput. Biol. 2020;16(12):1–32. doi: 10.1371/journal.pcbi.1008420.
  • 16. Grau-Moya J, Krüger M, Braun DA. Non-equilibrium relations for bounded rational decision-making in changing environments. Entropy. 2018;20(1):1. doi: 10.3390/e20010001.
  • 17. Griffiths TL, Callaway F, Chang MB, Grant E, Krueger PM, Lieder F. Doing more with less: Meta-reasoning and meta-learning in humans and machines. Curr. Opin. Behav. Sci. 2019;29:24–30. doi: 10.1016/j.cobeha.2019.01.005.
  • 18. Hack P, Gottwald S, Braun DA. Jarzynski’s equality and Crooks’ fluctuation theorem for general Markov chains. arXiv preprint arXiv:2202.05576 (2022).
  • 19. Jarzynski C. Equilibrium free-energy differences from nonequilibrium measurements: A master-equation approach. Phys. Rev. E. 1997;56(5):5018. doi: 10.1103/PhysRevE.56.5018.
  • 20. Jarzynski C. Hamiltonian derivation of a detailed fluctuation theorem. J. Stat. Phys. 2000;98(1):77–102. doi: 10.1023/A:1018670721277.
  • 21. Jarzynski C. Nonequilibrium work theorem for a system strongly coupled to a thermal environment. J. Stat. Mech: Theory Exp. 2004;2004(09):P09005. doi: 10.1088/1742-5468/2004/09/P09005.
  • 22. Jarzynski C. Equalities and inequalities: Irreversibility and the second law of thermodynamics at the nanoscale. Annu. Rev. Condens. Matter Phys. 2011;2(1):329–351. doi: 10.1146/annurev-conmatphys-062910-140506.
  • 23. Kelso JAS, Buchanan JJ, Murata T. Multifunctionality and switching in the coordination dynamics of reaching and grasping. Hum. Movement Sci. 1994;13:63–94. doi: 10.1016/0167-9457(94)90029-9.
  • 24. Körding KP, Wolpert DM. The loss function of sensorimotor learning. Proc. Natl. Acad. Sci. 2004;101(26):9839–9842. doi: 10.1073/pnas.0308394101.
  • 25. Lieb EH, Yngvason J. The physics and mathematics of the second law of thermodynamics. Phys. Rep. 1999;310(1):1–96. doi: 10.1016/S0370-1573(98)00082-9.
  • 26. Lindig-León C, Gottwald S, Braun DA. Analyzing abstraction and hierarchical decision-making in absolute identification by information-theoretic bounded rationality. Front. Neurosci. 2019;13:1. doi: 10.3389/fnins.2019.01230.
  • 27. Lindig-León C, Schmid G, Braun DA. Bounded rational response equilibria in human sensorimotor interactions. Proc. R. Soc. B. 2021;288(1962):20212094. doi: 10.1098/rspb.2021.2094.
  • 28. Liphardt J, Dumont S, Smith SB, Tinoco I, Bustamante C. Equilibrium information from nonequilibrium measurements in an experimental test of Jarzynski’s equality. Science. 2002;296(5574):1832–1835. doi: 10.1126/science.1071152.
  • 29. Loschmidt J. Ueber den Zustand des Wärmegleichgewichtes eines Systems von Körpern. Akademie der Wissenschaften, Wien. Mathematisch-Naturwissenschaftliche Klasse, Sitzungsberichte. 1876;73:128–135.
  • 30. Mas-Colell, A., Whinston, M. & Green, J. Microeconomic theory (Oxford University Press, 1995).
  • 31. Ortega PA, Braun DA. Thermodynamics as a theory of decision-making with information-processing costs. Proc. R. Soc. A: Math. Phys. Eng. Sci. 2013;469(2153):20120683. doi: 10.1098/rspa.2012.0683.
  • 32. Park S, Khalili-Araghi F, Tajkhorshid E, Schulten K. Free energy calculation from steered molecular dynamics simulations using Jarzynski’s equality. J. Chem. Phys. 2003;119(6):3559–3566. doi: 10.1063/1.1590311.
  • 33. Parr T, Da Costa L, Friston K. Markov blankets, information geometry and stochastic thermodynamics. Phil. Trans. R. Soc. A. 2020;378(2164):20190159. doi: 10.1098/rsta.2019.0159.
  • 34. Parrondo JMR, Horowitz JM, Sagawa T. Thermodynamics of information. Nat. Phys. 2015;11(2):131–139. doi: 10.1038/nphys3230.
  • 35. Perunov N, Marsland RA, England JL. Statistical physics of adaptation. Phys. Rev. X. 2016;6(2):021036.
  • 36. Rao RPN, Ballard DH. Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects. Nat. Neurosci. 1999;2(1):79–87. doi: 10.1038/4580.
  • 37. Saira O-P, Yoon Y, Tanttu T, Möttönen M, Averin DV, Pekola JP. Test of the Jarzynski and Crooks fluctuation relations in an electronic system. Phys. Rev. Lett. 2012;109(18):180601. doi: 10.1103/PhysRevLett.109.180601.
  • 38. Sanborn AN, Chater N. Bayesian brains without probabilities. Trends Cogn. Sci. 2016;20(12):883–893. doi: 10.1016/j.tics.2016.10.003.
  • 39. Schach S, Gottwald S, Braun DA. Quantifying motor task performance by bounded rational decision theory. Front. Neurosci. 2018;12:1. doi: 10.3389/fnins.2018.00932.
  • 40. Schütz C, Weigelt M, Odekerken D, Klein-Soetebier T, Schack T. Motor control strategies in a continuous task space. Mot. Control. 2011;15(3):321–341. doi: 10.1123/mcj.15.3.321.
  • 41. Seifert U. Entropy production along a stochastic trajectory and an integral fluctuation theorem. Phys. Rev. Lett. 2005;95(4):040602. doi: 10.1103/PhysRevLett.95.040602.
  • 42. Seifert U. Stochastic thermodynamics, fluctuation theorems and molecular machines. Rep. Prog. Phys. 2012;75(12):126001. doi: 10.1088/0034-4885/75/12/126001.
  • 43. Shadmehr, R. & Mussa-Ivaldi, S. Biological Learning and Control: How the Brain Builds Representations, Predicts Events, and Makes Decisions (MIT Press, 2012).
  • 44. Smith A, Lu Y, An S, Zhang X, Zhang J-N, Gong Z, Quan HT, Jarzynski C, Kim K. Verification of the quantum nonequilibrium work relation in the presence of decoherence. New J. Phys. 2018;20(1):013008. doi: 10.1088/1367-2630/aa9cd6.
  • 45. Still S. Information-theoretic approach to interactive learning. EPL (Europhys. Lett.) 2009;85(2):28005.
  • 46. Still S, Sivak DA, Bell AJ, Crooks GE. Thermodynamics of prediction. Phys. Rev. Lett. 2012;109(12):120604.
  • 47. Todorov E. General duality between optimal control and estimation. In 2008 47th IEEE Conference on Decision and Control, pp. 4286–4292. IEEE (2008).
  • 48. Todorov E, Jordan MI. Optimal feedback control as a theory of motor coordination. Nat. Neurosci. 2002;5(11):1226–1235. doi: 10.1038/nn963.
  • 49. Toyabe S, Sagawa T, Ueda M, Muneyuki E, Sano M. Experimental demonstration of information-to-energy conversion and validation of the generalized Jarzynski equality. Nat. Phys. 2010;6(12):988–992. doi: 10.1038/nphys1821.
  • 50. Turnham EJA, Braun DA, Wolpert DM. Facilitation of learning induced by both random and gradual visuomotor task variation. J. Neurophysiol. 2012;107(4):1111–1122. doi: 10.1152/jn.00635.2011.
  • 51. Weglarczyk S. Kernel density estimation and its application. In ITM Web of Conferences, 23. EDP Sciences (2018).
  • 52. Wolpert DM, Ghahramani Z, Jordan MI. An internal model for sensorimotor integration. Science. 1995;269(5232):1880–1882. doi: 10.1126/science.7569931.
  • 53. Ytreberg FM, Zuckerman DM. Efficient use of nonequilibrium measurement to estimate free energy differences for molecular systems. J. Comput. Chem. 2004;25(14):1749–1759. doi: 10.1002/jcc.20103.


Articles from Scientific Reports are provided here courtesy of Nature Publishing Group
