Abstract
Mismatch cost (MMC) is a universally applicable lower bound on the entropy production (EP) of any fixed physical process across a given time interval. In the first part of the article, we establish results concerning MMC to prove that it scales at least linearly with the total heat flow in the worst case over initial distributions. We also prove that the MMC lower bound over a given time interval never decreases if the time interval is subdivided into a sequence of subintervals, and that the bound often increases. In the second part of the article, we introduce a general framework for computing the minimal EP (ie the MMC) associated with running a computer program on any physical system that implements a modern digital computer. We apply this general framework to compare MMC of running two canonical sorting algorithms, bubble sort and bucket sort. The framework enables us to investigate how thermodynamic cost depends on features like input size and structure (eg with or without repeated entries). Finally, we extend the framework to programs that call subroutines.
Keywords: entropy production, thermodynamic cost of computation, mismatch cost, computer programs, nonequilibrium thermodynamics
Significance Statement.
While computational complexity traditionally focuses on time and memory resources, the energetic cost of computation remains poorly understood. This work introduces a general framework to quantify the fundamental energetic cost of running any computer program, opening a new approach to comparing algorithmic efficiency beyond time and memory. Leveraging stochastic thermodynamics, we derive universal lower bounds on the unavoidable energy dissipation incurred by any high-level program. Grounded in stored-program architecture, these bounds are largely independent of physical implementation. Our approach demonstrates how thermodynamic efficiency varies across algorithms, exemplified through sorting algorithms. This framework offers new insights into the physical limits of algorithmic efficiency and energy consumption in computing.
Introduction
Background
Computer programs are sequences of instructions that, when executed mechanically, produce a desired computation. In the early days of computing, the very notion of a programmable machine was far from obvious. In his seminal 1936 paper, Alan Turing rigorously formalized the concept of a universal computer—a machine capable of executing any sequence of instructions by storing them on a tape or in memory (1). Building on this foundation, the subsequent development of stored-program machines, later known as the von Neumann architecture, enabled the proliferation of modern computers. This architecture also gave rise to the formal study of program complexity, focusing on the resources—most notably time and memory—required to compute a function. These “time” and “space” costs became central measures of program efficiency.
An analogous study of the energetic cost of executing a computer program has remained largely unexplored—despite its apparent importance for real-world computing—due to several challenges. First, classical thermodynamics and equilibrium statistical mechanics are ill-suited for machines operating far from equilibrium with many interacting, dynamically evolving degrees of freedom. Second, real-world computational machines are built on a wide variety of physical substrates, making it difficult to formulate a universal framework applicable to any computer program. In this article, we address these challenges by adopting tools from stochastic thermodynamics and applying the mismatch cost (MMC) lower bound on entropy production (EP) to a logically abstract stored-program model of computation. This is the same foundational model that enabled early computer science to compare the time and space complexity of programs in a unified manner, and it likewise allows us to derive lower bounds on thermodynamic cost of computer programs.
To provide background on the first major challenge, earlier foundational work by Szilard (2), Landauer (3), and Bennett (4) established that logically irreversible computations necessarily produce a minimum amount of heat, quantified as $k_B T \Delta S$, where $k_B$ is Boltzmann’s constant, T is the temperature of the surrounding reservoir, and $\Delta S$ is the change in the system’s Shannon entropy. This reasoning led to the widely held belief that logically reversible computations, for which $\Delta S = 0$, have no fundamental energetic cost (5). Moreover, the entropy change accounts only for the reversible portion of heat exchange, that is, heat that can in principle be fully recovered by reversing the process (6). Consequently, this framework does not capture the irreversible energy dissipation arising from the far-from-equilibrium dynamics typical of real computational machines.
Prior studies on the energetic efficiency of computer programs have largely relied on Landauer’s principle (7, 8), attributing an energy cost of $k_B T \ln 2$ to each bit erasure step. However, these approaches neglect the energetic cost of logically reversible operations and fail to capture the irreversible dissipation arising from the far-from-equilibrium dynamics inherent in program execution.
It is now well established that, beyond the Landauer cost, there exists an inherently irreversible component known as EP. EP quantifies the energy irreversibly dissipated into the environment (6, 9), and it can be strictly positive even during logically reversible computations. This deeper understanding arises from major advances in nonequilibrium statistical physics, which have extended classical thermodynamics to systems operating at mesoscopic scales and far from equilibrium. In particular, the framework of stochastic thermodynamics (9–11) has proven invaluable for characterizing far-from-equilibrium behavior and quantifying energy dissipation across a wide range of processes. As such, it provides a rigorous and well-suited theoretical foundation for studying the energetic costs of computation (12).
One important contribution to total EP is the MMC (13, 14). MMC quantifies the extra EP that arises when the starting probability distribution of a system differs from the optimal distribution that minimizes EP. Thus, even processes that produce zero EP for a specific initial distribution—making them thermodynamically reversible under those conditions—can generate positive EP precisely equal to the MMC if the initial distribution changes. This concept has been further explored to show that computational tasks that are inherently modular—that is, when a system is decomposed into various subparts for computation—have an unavoidable thermodynamic cost associated with them (15), establishing fundamental bounds on EP in systems such as communication channels (16) and Boolean circuits (17).
Contributions
The contributions of this article are twofold. First, we establish theoretical results concerning MMC in the General properties of the MMC section. Specifically, we show that the MMC contribution to the total thermodynamic cost grows at least linearly with the total heat flow in the worst case over the initial distribution. This establishes that, in contrast to results like the thermodynamic speed limit theorem or thermodynamic uncertainty relations, MMC could in principle constitute a substantial fraction of the total dissipated heat on macroscopic scales. Additionally, in the Time coarse-graining section, we prove that the sum of MMCs evaluated at finer time resolutions is never smaller than the MMC computed over a coarser-grained execution that omits intermediate steps, thus characterizing MMC’s behavior under time coarse-graining. This implies that MMC remains a valid lower bound on total EP even when considering only higher-level computational steps without resolving the detailed physical dynamics. Moreover, the sum of MMC evaluated on any subset of computational steps still lower bounds the total MMC, and thus total EP, over the entire process.
Second, we evaluate the MMC associated with executing computer programs. In the Framework section, we introduce a framework based on the stored-program architecture of modern computers, where a program counter keeps track of the current instruction and a clocked, iterative process sequentially modifies memory contents. This framework allows us not only to define a computer program’s full state (including all variable values and the program counter) at any point during its execution but also to determine the MMC for each iteration of the machine. It thereby provides a minimal thermodynamic cost that is incurred in each step of a program.
We then apply this framework to a range of concrete examples, including classic sorting algorithms. Furthermore, we extend the analysis to encompass programs that invoke subroutines, thereby illustrating how MMC can be used to quantify the thermodynamic costs of modular, hierarchical program structures.
General properties of the MMC
We begin with some general considerations from stochastic thermodynamics and the role of the MMC. We consider a system with a state space $\mathcal{X}$ that undergoes a transformation. Starting from an initial state $x \in \mathcal{X}$, the system evolves and ends in a state $x' \in \mathcal{X}$. This transformation can be described by a conditional map $G(x'|x)$, which specifies the probability that the system ends in state $x'$ given that it started in state $x$.
This formulation captures a wide range of processes, including computational devices such as Boolean gates (where the system state changes as a gate computes an output), Boolean circuits executing sequences of operations, and chemical reaction networks where the composition of reactants changes over time. In general, the mapping from an initial state to a final state is stochastic rather than deterministic, meaning that $G(x'|x)$ typically takes values strictly between 0 and 1.
If the initial state is drawn from a distribution $p_0$ over $\mathcal{X}$, then the final state of the system is distributed according to $p_1$, which is given by
$$p_1(x') = \sum_{x \in \mathcal{X}} G(x'|x)\, p_0(x). \tag{1}$$
As a shorthand, we sometimes write $p_1 = Gp_0$. Let $\Delta_{\mathcal{X}}$ denote the probability simplex associated with the state space $\mathcal{X}$, ie the set of all probability distributions over system states.
Associated with any such transformation $G$, many thermodynamic cost functions take the common mathematical form (14):
$$\mathcal{C}(p_0) = S(Gp_0) - S(p_0) + \langle f \rangle_{p_0}, \tag{2}$$
where f is a given real-valued function on the state space $\mathcal{X}$, $\langle f \rangle_{p_0} = \sum_{x} p_0(x) f(x)$ is its average under $p_0$, and $S(\cdot)$ is the Shannon entropy defined as $S(p) = -\sum_{x} p(x) \ln p(x)$. Depending on the physical interpretation of f, the cost function (2) can represent a variety of thermodynamic quantities, including total EP, nonadiabatic EP (18), free energy loss (19), and entropy gain (20). In particular, when $f(x)$ represents the average heat flow during the process when the system starts in state x, then (2) corresponds to the dissipated work or the EP.
The values $f(x)$ associated with thermodynamic costs can, in principle, be either measured experimentally or computed from detailed knowledge of the underlying physical dynamics. Once $f$ is known, the thermodynamic cost (2) is fully determined for any initial distribution. In practice, however, neither the microscopic degrees of freedom relevant for measuring $f(x)$ nor a complete description of the system’s dynamics are typically accessible. As a result, determining the thermodynamic cost associated with a given computational map G is generally a challenging task.
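As a minimal illustration of Eqs. 1 and 2, the following Python sketch pushes a distribution through a column-stochastic map and evaluates a cost of the assumed form $S(Gp_0) - S(p_0) + \langle f \rangle_{p_0}$. The two-state map, the initial distribution, and the values of $f$ are hypothetical choices made only for illustration, and the function names are illustrative rather than taken from any library.

```python
import math

def push_forward(G, p):
    """Eq. 1: p1(x') = sum_x G(x'|x) p0(x); G[x_next][x] is column-stochastic."""
    return [sum(G[xn][x] * p[x] for x in range(len(p))) for xn in range(len(G))]

def shannon_entropy(p):
    """S(p) = -sum_x p(x) ln p(x), with the convention 0 ln 0 = 0."""
    return -sum(px * math.log(px) for px in p if px > 0)

def cost(G, p, f):
    """Assumed form of Eq. 2: C(p) = S(Gp) - S(p) + <f>_p."""
    mean_f = sum(px * fx for px, fx in zip(p, f))
    return shannon_entropy(push_forward(G, p)) - shannon_entropy(p) + mean_f

# Hypothetical two-state noisy map and cost weights (illustrative values only).
G = [[0.9, 0.8],
     [0.1, 0.2]]
p0 = [0.5, 0.5]
f = [1.0, 2.0]

p1 = push_forward(G, p0)   # final distribution under G
c0 = cost(G, p0, f)        # cost of running the process on p0
```

Once $f$ and $G$ are fixed, the same `cost` function can be evaluated on any point of the simplex, which is the sense in which the cost is fully determined for every initial distribution.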
Much of the development of stochastic thermodynamics has therefore focused on identifying and bounding specific contributions to thermodynamic cost that arise from physical constraints on the process. Examples include thermodynamic uncertainty relations (21–23), which quantify costs associated with precision of currents, and speed limit theorems (24, 25), which capture costs imposed by finite-time operation. Such results usually rely on simplifying assumptions about the underlying dynamics, most commonly Markovianity, weak coupling, or detailed balance.
In contrast, the thermodynamics of computation is often driven by questions of a more universal kind: what thermodynamic cost is unavoidably incurred in implementing a given map G, independent of the detailed physical realization? MMC has been developed and applied across a wide range of computational settings, including finite automata (26), Turing machines (27), communication systems (16), and logical circuits (17), and provides bounds that are largely independent of the specific physics of the underlying implementation. Below, we review the MMC framework and summarize several of its key properties.
For any fixed physical process implementing the stochastic map G, which transforms any initial distribution $p_0$ into a final distribution $Gp_0$, there exists among all possible initial distributions an optimal distribution $q_0$ that minimizes the thermodynamic cost,
$$q_0 = \operatorname*{arg\,min}_{p \in \Delta_{\mathcal{X}}} \mathcal{C}(p), \tag{3}$$
where $q_0$ has full support over the state space $\mathcal{X}$ as long as the map G is not a deterministic map (13).
When the same process runs for an initial distribution $p_0$ different from $q_0$, the process incurs an extra cost in addition to the minimal cost $\mathcal{C}(q_0)$ incurred by the optimal initial distribution. The total cost for any $p_0$ decomposes into (13–15):
$$\mathcal{C}(p_0) = \mathcal{C}(q_0) + D(p_0 \| q_0) - D(Gp_0 \| Gq_0), \tag{4}$$
where $D(\cdot \| \cdot)$ denotes the Kullback–Leibler (KL) divergence.
The optimal distribution $q_0$ is also referred to as the prior distribution associated with the cost function $\mathcal{C}$. The decomposition in Eq. 4 expresses the total cost as the sum of two terms: the minimum achievable cost $\mathcal{C}(q_0)$, and an additional term arising from the mismatch between any actual distribution $p_0$ and the prior $q_0$. This additional term, given by the drop in KL divergence under the map G, is known as the MMC:
$$\mathrm{MC}(p_0) = D(p_0 \| q_0) - D(Gp_0 \| Gq_0). \tag{5}$$
Due to the data-processing inequality for KL divergence, MMC is always nonnegative: $\mathrm{MC}(p_0) \ge 0$ for any $p_0$ (28). The formula for MMC is very general: it applies whether the dynamics is Markovian or non-Markovian, classical or quantum, and in discrete or continuous time. The prior distribution $q_0$ defined in Eq. 3 encodes features of the process through which G is realized. In this sense, it is important to emphasize that the MMC (5) is not completely independent of the underlying physical implementation.
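The drop in KL divergence that defines the MMC can be computed directly once a prior is specified. The sketch below assumes a hypothetical two-state system with a uniform prior (an illustrative assumption, not a prior derived from any cost function) and compares a noisy map with a total-erasure map; the helper names are illustrative.

```python
import math

def push_forward(G, p):
    """p1(x') = sum_x G(x'|x) p(x); G[x_next][x] is column-stochastic."""
    return [sum(G[xn][x] * p[x] for x in range(len(p))) for xn in range(len(G))]

def kl(p, q):
    """D(p||q) in nats; assumes q(x) > 0 wherever p(x) > 0."""
    return sum(px * math.log(px / qx) for px, qx in zip(p, q) if px > 0)

def mismatch_cost(G, p, q):
    """Drop in KL divergence between the actual distribution p and the prior q under G."""
    return kl(p, q) - kl(push_forward(G, p), push_forward(G, q))

G_noisy = [[0.9, 0.8], [0.1, 0.2]]   # hypothetical stochastic map
G_erase = [[1.0, 1.0], [0.0, 0.0]]   # every state is reset to state 0
q = [0.5, 0.5]                       # assumed prior (illustrative)
p = [0.9, 0.1]                       # actual initial distribution

mc_noisy = mismatch_cost(G_noisy, p, q)
mc_erase = mismatch_cost(G_erase, p, q)   # equals D(p||q): the outputs coincide
mc_prior = mismatch_cost(G_noisy, q, q)   # zero when p equals the prior
```

In the erasure case the final distributions of `p` and `q` coincide, so the whole initial divergence is lost and the MMC equals $D(p \| q)$; for the noisy map part of the divergence survives, which gives a smaller but still nonnegative value.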
That said, in many settings some natural constraints can easily be reflected in the properties of the prior distribution. For example, if the system undergoing the transformation consists of two or more physically independent subsystems, then any physically admissible prior over the joint system must factorize as a product distribution over those subsystems (11). As a consequence, for a generic initial distribution the system incurs an unavoidable EP, captured by the MMC. In this sense, the resulting bounds are universal: they do not require a detailed dynamical description of the underlying physical implementation. For instance, if two Boolean gates in a circuit, each running on its own underlying physical process that is separate from the other, operate on inputs that are correlated, it results in a thermodynamic cost that can be quantified by the MMC, independent of the microscopic physics of the gates (17). In general, computational devices are designed based on some modular and hierarchical design principles, providing strong constraints on how their subsystems are connected.
Another set of examples where MMC provides a strictly positive contribution to the thermodynamic cost is when a process is repeated over and over, such as in digital computers that run periodic processes governed by a global clock. Suppose a process characterized by the associated map G is repeatedly applied to a distribution $p_0$ over states of the system, without reinitializing the system. That is, starting from $p_0$ at time $t = 0$, the system evolves through a sequence of distributions $p_0, p_1, p_2, \ldots$, where each $p_{i+1} = Gp_i$. Although the actual state distribution changes over time, the underlying process implementing G, and hence the associated prior distribution $q_0$, remains fixed across all iterations. The total MMC then accumulates over $n$ iterations as:
$$\mathrm{MC}_{\mathrm{tot}} = \sum_{i=0}^{n-1} \mathrm{MC}(p_i) \tag{6}$$
$$\phantom{\mathrm{MC}_{\mathrm{tot}}} = \sum_{i=0}^{n-1} \left[ D(p_i \| q_0) - D(Gp_i \| Gq_0) \right]. \tag{7}$$
Note that even if the process starts at the prior $q_0$, after the first iteration the distribution becomes $Gq_0$, which in general differs from $q_0$. This deviation from the prior distribution results in a strictly positive MMC in the next iteration, and the same holds for subsequent iterations (26).
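The accumulation of MMC under repeated application of one fixed map can be sketched numerically. The example below assumes a hypothetical two-state noisy map and a uniform prior (both illustrative assumptions) and starts the process exactly at the prior, so that the first iteration is free and later iterations are not.

```python
import math

def push_forward(G, p):
    """p1(x') = sum_x G(x'|x) p(x); G[x_next][x] is column-stochastic."""
    return [sum(G[xn][x] * p[x] for x in range(len(p))) for xn in range(len(G))]

def kl(p, q):
    """D(p||q) in nats; assumes q(x) > 0 wherever p(x) > 0."""
    return sum(px * math.log(px / qx) for px, qx in zip(p, q) if px > 0)

def mismatch_costs(G, p0, q, n):
    """Per-iteration MMC for n repeated applications of G with a fixed prior q."""
    Gq = push_forward(G, q)          # the prior and its image never change
    costs, p = [], list(p0)
    for _ in range(n):
        Gp = push_forward(G, p)
        costs.append(kl(p, q) - kl(Gp, Gq))
        p = Gp                       # no reinitialization between iterations
    return costs

G = [[0.9, 0.8], [0.1, 0.2]]         # hypothetical map
q = [0.5, 0.5]                       # assumed prior (illustrative)
costs = mismatch_costs(G, q, q, n=3) # start exactly at the prior
total = sum(costs)
```

Here `costs[0]` vanishes because the process starts at the prior, while `costs[1]` and `costs[2]` are strictly positive because the distribution has drifted to $Gq_0$ and beyond while the prior stayed fixed.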
The prior distribution defined in (3) is determined by the cost function f and the stochastic map G. In the following section, we examine how specific properties of f influence the properties of the prior, and consequently the MMC. In particular, we show that the maximum and minimum values of $f(x)$ across states $x$, especially in the regime where $f(x)$ is large, provide a lower bound on the MMC’s contribution to the total cost.
Prior distribution
Consider a scenario where the $f(x)$ values are sufficiently large for all x that the right-hand side of Eq. 2 is dominated by $\langle f \rangle_{p_0}$ rather than by the entropy difference $S(Gp_0) - S(p_0)$. In such cases, if $f(x)$ is not uniform across all states, meaning $f(x)$ is higher for some states than others, the associated prior distribution $q_0$ will take on lower values for those higher-weighted states compared to the rest. The greater the nonuniformity of $f(x)$ across states, the closer the prior will be to the edge of the simplex $\Delta_{\mathcal{X}}$. Then, any typical actual distribution $p_0$ on the simplex that is not close to the edge would yield a significantly high value of the KL divergence $D(p_0 \| q_0)$. This intuitive idea is formalized in Section SIIA to prove that in the worst case over initial distributions, MMC scales at least linearly with the difference between the maximum and minimum values of $f(x)$, denoted $f_{\max}$ and $f_{\min}$,
$$\mathrm{MC}_{\max} \ge f_{\max} - f_{\min} - \ln |\mathcal{X}|. \tag{8}$$
In this equation, $\mathrm{MC}_{\max}$ is the MMC for the initial distribution that is furthest from the prior distribution in terms of KL divergence, and therefore incurs the maximum MMC. As an example, consider the case where $f(x)$ represents the average heat flow into the environment when the system starts in state x. At sufficiently large scales, particularly in the regime where computational processes operate, these heat flow values are very large, while the term set by the size of the state space, $|\mathcal{X}|$, is much smaller. As a result, the difference $f_{\max} - f_{\min}$ is typically of the same order as the heat flow values themselves. This implies that, in the worst-case scenario, the MMC contribution to the total cost, as bounded in (8), can be comparable to the total cost itself. While many EP bounds, such as the thermodynamic uncertainty relation and speed limit theorems, offer meaningful insights at the microscopic level, their relevance diminishes at mesoscopic or macroscopic scales. In these regimes, where heat generation is substantial and relatively easy to measure experimentally, such bounds capture only a small fraction of the total EP. In contrast, Eq. 8 shows that when $f(x)$ takes large values, corresponding to macroscopic processes, the MMC can contribute significantly to the overall EP.
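The linear scaling with $f_{\max} - f_{\min}$ can be checked by hand in one special case that is not taken from the article: a map that resets every state to a single fixed state. For such a map the final entropy vanishes, so a cost of the form $-S(p) + \langle f \rangle_p$ is minimized by a prior proportional to $e^{-f(x)}$, and the MMC reduces to $D(p \| q)$, which is largest for a point mass on the state with the largest $f$. The sketch below uses hypothetical values of $f$ and works in log space to avoid underflow.

```python
import math

# Hypothetical cost weights f(x), in units of k_B T (illustrative values only).
f = [5.0, 40.0, 100.0]

# Total-erasure map: S(Gp) = 0, so C(p) = -S(p) + <f>_p is minimized by q(x) ∝ exp(-f(x)).
log_Z = math.log(sum(math.exp(-fx) for fx in f))
neg_log_q = [fx + log_Z for fx in f]          # -ln q(x) for each state

# Gp = Gq for every p, so MC(p) = D(p||q); a point mass on x gives -ln q(x).
mc_worst = max(neg_log_q)

# Spread of f across states, the quantity the worst-case MMC is compared against.
gap = max(f) - min(f)
```

With these values `mc_worst` is approximately 95, matching `gap`, so in this special case the worst-case MMC tracks the spread of $f$ essentially one for one.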
Time coarse-graining
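Consider a system with state space $\mathcal{X}$, and let $X_0$, $X_1$, and $X_2$ denote the random variables representing the system state at three successive time steps $t_0$, $t_1$, and $t_2$, with corresponding distributions $p_0$, $p_1$, and $p_2$. Due to the additivity of the cost function over time (29), we have:
$$\mathcal{C}_{02}(p_0) = \mathcal{C}_{01}(p_0) + \mathcal{C}_{12}(p_1), \tag{9}$$
where $\mathcal{C}_{ij}$ denotes the cost associated with the transition from time $t_i$ to $t_j$.
Let $q_0$ be the prior distribution that minimizes $\mathcal{C}_{02}$, and let $\mathcal{C}_{02}^{\mathrm{res}} = \mathcal{C}_{02}(q_0)$ denote the corresponding residual cost. Since $q_0$ is not necessarily the prior for $\mathcal{C}_{01}$, we obtain the following decomposition of $\mathcal{C}_{01}(q_0)$:
$$\mathcal{C}_{01}(q_0) = \mathcal{C}_{01}^{\mathrm{res}} + \mathrm{MC}_{01}(q_0), \tag{10}$$
where $\mathcal{C}_{01}^{\mathrm{res}}$ is the residual cost for $\mathcal{C}_{01}$ and $\mathrm{MC}_{01}(q_0)$ is the MMC of using $q_0$ instead of the prior for $\mathcal{C}_{01}$.
Let $q_1$ denote the distribution at time $t_1$ obtained by evolving $q_0$ under the dynamics from $t_0$ to $t_1$. This intermediate distribution is not generally the prior for $\mathcal{C}_{12}$, so using the MMC decomposition of $\mathcal{C}_{12}(q_1)$, we get:
$$\mathcal{C}_{12}(q_1) = \mathcal{C}_{12}^{\mathrm{res}} + \mathrm{MC}_{12}(q_1), \tag{11}$$
where $\mathcal{C}_{12}^{\mathrm{res}}$ is the residual cost of the dynamics from time $t_1$ to $t_2$. The residual cost of the full dynamics from time $t_0$ to $t_2$ is by definition $\mathcal{C}_{02}^{\mathrm{res}} = \mathcal{C}_{02}(q_0)$. Applying Eq. 9 to $q_0$ and substituting Eqs. 10 and 11 gives
$$\mathcal{C}_{02}^{\mathrm{res}} = \mathcal{C}_{01}(q_0) + \mathcal{C}_{12}(q_1) = \mathcal{C}_{01}^{\mathrm{res}} + \mathcal{C}_{12}^{\mathrm{res}} + \mathrm{MC}_{01}(q_0) + \mathrm{MC}_{12}(q_1) \ge \mathcal{C}_{01}^{\mathrm{res}} + \mathcal{C}_{12}^{\mathrm{res}}, \tag{12}$$
where in the last step we have used the nonnegativity of the MMCs $\mathrm{MC}_{01}(q_0)$ and $\mathrm{MC}_{12}(q_1)$. For any arbitrary initial distribution $p_0$, applying the MMC decomposition to each term in Eq. 9 results in,
$$\mathcal{C}_{02}^{\mathrm{res}} + \mathrm{MC}_{02}(p_0) = \mathcal{C}_{01}^{\mathrm{res}} + \mathrm{MC}_{01}(p_0) + \mathcal{C}_{12}^{\mathrm{res}} + \mathrm{MC}_{12}(p_1). \tag{13}$$
By using $\mathcal{C}_{02}^{\mathrm{res}} \ge \mathcal{C}_{01}^{\mathrm{res}} + \mathcal{C}_{12}^{\mathrm{res}}$ from (12), we obtain the desired result:
$$\mathrm{MC}_{02}(p_0) \le \mathrm{MC}_{01}(p_0) + \mathrm{MC}_{12}(p_1), \tag{14}$$
for any $p_0$. Therefore, the MMC computed over the full interval $[t_0, t_2]$, without accounting for the intermediate step, is less than or equal to the sum of the MMCs computed separately over $[t_0, t_1]$ and $[t_1, t_2]$. This demonstrates that the MMC at a coarser temporal resolution is always bounded above by the total MMC at a finer resolution over the same time span.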
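An analogous inequality does not hold for spatial coarse-graining. Let $(X, Y)$ denote a pair of random variables describing the system at a finer spatial resolution, and let X alone represent the coarse-grained description. Writing $p_{XY}$ and $p'_{XY}$ for the initial and final joint distributions, the cost function at the fine resolution is given by:
$$\mathcal{C}(p_{XY}) = S(p'_{XY}) - S(p_{XY}) + \sum_{x, y} p_{XY}(x, y)\, f(x, y). \tag{15}$$
By defining $\tilde{f}(x) = \sum_{y} p(y|x)\, f(x, y)$ and marginalizing over the variable Y, one can derive the corresponding coarse-grained cost,
$$\mathcal{C}(p_X) = S(p'_X) - S(p_X) + \sum_{x} p_X(x)\, \tilde{f}(x). \tag{16}$$
Note that $\tilde{f}(x)$ is dependent on $p(y|x)$ and therefore on $p_{XY}$. While $\mathrm{MC}(p_{XY})$ is well-defined over the entire space $\Delta_{\mathcal{X} \times \mathcal{Y}}$, the coarse-grained MMC $\mathrm{MC}(p_X)$ is only well-defined if the function $\tilde{f}$ is uniquely determined. However, since $\tilde{f}$ varies with $p_{XY}$, $\mathrm{MC}(p_X)$ lacks a consistent definition and therefore a direct comparison between $\mathrm{MC}(p_{XY})$ and $\mathrm{MC}(p_X)$ is ill-posed.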
In the following sections of the article, we introduce the foundational stored-program architecture of modern computers and use the MMC framework to analyze the thermodynamic cost of running a computer program on such a machine. Importantly, the treatment is more general: it uses only the MMC expression in Eq. 5 together with the notion of repeated physical processes and the associated MMC expression in Eq. 7, and does not require nonuniformity of $f$. It does not rely on the results concerning linear lower bounds on MMC or the time coarse-graining result discussed in the Prior distribution and Time coarse-graining sections, respectively.
Framework
Stored program computer
A defining feature of modern computers is their programmability: a single machine is capable of executing an arbitrary well-defined set of instructions. Early computing machines, though orders of magnitude faster than human computers, were not easily programmable. To perform a new task, engineers often needed to rewire the device, flip physical switches, or redesign parts of the hardware. In effect, a different computation meant a different machine. The stored-program architecture was a conceptual leap because it introduced the idea that instructions could be stored in memory in exactly the same formal representation as data, while execution is handled by a separate processing unit. Programming no longer required hardware modification; it meant writing a new list of instructions to memory, which is programming as we know it in the modern sense, though at the machine level.
This architecture—now standard in digital computers—enabled programmability and ultimately shaped modern computing. In essence, a stored-program computer consists of a memory that holds both data and instructions, and a control/processing unit that fetches instructions by address, updates a program counter, and executes those instructions on the stored data.
The control unit includes several small storage locations, called registers, that hold data or instructions during processing. Two of these are central to program execution: the instruction address register, also called the program counter (PC), which stores the memory address of the current instruction; and the instruction register (IR), which holds the current instruction fetched from memory. The control unit also contains an arithmetic logic unit (ALU) responsible for performing basic arithmetic and logical operations (see Fig. 1).
Figure 1.
Stored program architecture: The control unit has access to multiple registers, including the instruction register and program counter. Memory stores the data and instructions, which can be accessed by address. The control unit communicates with memory using the address and data lines, aside from read and write enable lines (not shown in the figure). The PC holds the address of the current instruction, which is used to fetch the instruction into the IR. With the help of check circuits (not shown in the figure), the instruction is decoded into control signals that change register activity, ALU configuration, and memory access. With each clock cycle, the program counter increments and points to the next instruction.
Figure 1 provides a stylized sketch of this architecture with an example. In Fig. 1, the program counter initially points to memory address 100. The control unit, using address 100, loads the instruction stored at that location into the IR. Suppose the instruction is “READ R1, 110” (encoded in binary in memory), which means load the value from memory address 110 into register R1. This instruction enables register R1 for writing and issues a read request to memory address 110. For an instruction like “ADD R1, R2, R3”, which adds the values in R1 and R2 and stores the result in R3, the check circuits direct the ALU to accept inputs from R1 and R2, perform an addition, and write the result to R3.
At the end of each instruction, the program counter increments, advancing to the address of the next instruction, and the cycle repeats. The clock-driven update of the program counter is what defines the control flow, and any deviation (eg jumps or branches) must be explicitly encoded in the instruction sequence.
This process, called the fetch-execute cycle, repeats with every clock cycle, using the same physical logic circuitry, regardless of which instruction is being executed. The control unit is not reconfigured or altered between tasks; instead, it behaves as a periodic dynamical system, transitioning deterministically based on the current instruction and register states. This uniformity across different instructions—performing the same physical sequence over and over while producing different outcomes depending on the instruction—is a defining characteristic of stored-program machines.
Modeling program dynamics using random access stored program
To model the computation and execution mechanism of stored-program computers, we use a simplified abstraction of stored program architecture known as random access stored program (RASP) machines. Much like stored program architecture used in real digital computers, a RASP machine consists of a memory that stores both the program (the list of instructions) and the data the program operates on. It is important to clarify that the term “random” in RASP refers to random-access memory—ie the ability to access any register directly—and does not imply stochastic or probabilistic computation.
We provide an example in Fig. 2, where two high-level programs are translated into their corresponding low-level RASP representations, which closely resemble how an actual computer would execute them. In this representation, the use of registers to store variable values becomes explicit, and the program counter, which is responsible for tracking the control flow, is clearly visible. Each variable in the program maps to a register in the RASP, including a dedicated register for the program counter. The state of the program is defined by the values of all the RASP registers storing the data and the value of the program counter. For example, the instantaneous state of program (a) in Fig. 2 is specified by the values of registers holding the variables x, y, z, and the program counter pc.
Figure 2.
Two C programs are translated into their corresponding lower-level RASP representations. The RASP code explicitly shows how each variable is assigned to a register and how the program counter advances with each instruction. In program (a), which performs simple addition, the program counter increments sequentially since there are no loops or conditionals. Each instruction—such as loading values and performing arithmetic—corresponds to one step. In contrast, program (b) includes conditional logic, which introduces nonsequential control flow. Here, the program counter may jump to a different instruction depending on the outcome of a conditional check. A table outlining the meaning of the commands used in the RASP language is provided in Table SI.
Starting from initial input values, execution proceeds step by step by updating the contents of these registers. At each step, the program counter selects the next instruction to execute, and that instruction deterministically updates the contents of the registers and advances the counter. By repeating this clocked process, the machine follows a well-defined sequence of states corresponding to the execution of the program.
As the program execution proceeds, the joint state of all the registers evolves through a sequence of configurations—effectively tracing a trajectory through the program’s state space. We can track the discrete sequence of values all the registers go through as the program runs on a given input. The role of the program counter is crucial in the description of the state of the program. At any moment during execution, the values of all variables alone do not fully specify the program’s behavior—one must also know which instruction is being executed. The same variable values can be modified differently depending on the program counter.
This perspective allows us to model the state of a program as a node in a graph, and the execution of instructions as directed edges between those nodes; each directed edge from one node to another is a transition associated with a computational step. By simulating the associated RASP of a program for every possible value of the input variables, one can generate the adjacency matrix G associated with the computational graph of the program. For example, the directed graph illustrated in Fig. 3 corresponds to the heaviside program described in Fig. 2b.
Figure 3.
State space of the heaviside program execution defined in Fig. 2. Each node represents a unique program state, defined by the values of all registers and the program counter, generated for input values and . Leaf nodes (with no incoming edges) correspond to the program’s possible initial states. Directed edges trace the sequence of state transitions during execution. All paths eventually converge to a halting state, where the program halts.
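The construction of such a state graph by exhaustive simulation can be sketched in a few lines. The instruction set (`JGZ`, `SET`, `JMP`, `HALT`) and the five-instruction program below are hypothetical stand-ins chosen for brevity; they are not the RASP commands of Table SI and do not reproduce the program of Fig. 2b. A state is the tuple `(pc, x, y)`, and the halting state maps to itself.

```python
# Hypothetical program: y = 1 if x > 0 else 0 (not the RASP code of Fig. 2b).
PROGRAM = [
    ("JGZ", "x", 3),   # 0: if x > 0, jump to instruction 3
    ("SET", "y", 0),   # 1: y <- 0
    ("JMP", 4),        # 2: jump to the halting instruction
    ("SET", "y", 1),   # 3: y <- 1
    ("HALT",),         # 4: halting state (maps to itself)
]

def step(state):
    """One clocked iteration: the program counter selects the instruction to apply."""
    pc, x, y = state
    regs = {"x": x, "y": y}
    op = PROGRAM[pc]
    if op[0] == "HALT":
        return state
    if op[0] == "SET":
        regs[op[1]] = op[2]
        pc += 1
    elif op[0] == "JGZ":
        pc = op[2] if regs[op[1]] > 0 else pc + 1
    elif op[0] == "JMP":
        pc = op[1]
    return (pc, regs["x"], regs["y"])

def transition_graph(initial_states):
    """Map every reachable state to its unique successor (a deterministic graph)."""
    edges, frontier = {}, list(initial_states)
    while frontier:
        s = frontier.pop()
        if s not in edges:
            edges[s] = step(s)
            frontier.append(edges[s])
    return edges

def steps_to_halt(state):
    """Number of iterations before the state stops changing."""
    n = 0
    while step(state) != state:
        state, n = step(state), n + 1
    return n

# All initial states: pc starts at 0, x is the input, y is left over from a previous run.
initial = [(0, x, y) for x in (-1, 0, 1) for y in (0, 1)]
edges = transition_graph(initial)
```

Each key of `edges` is a node and each key-value pair is a directed edge, so the dictionary is a sparse encoding of a 0/1 adjacency matrix. In this toy program the conditional jump makes `steps_to_halt` depend on the input, with one path shorter than the other.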
More formally, let $\mathcal{X}$ denote the set of all valid joint states of the variables in the algorithm, including the program counter; this set corresponds to the nodes of the computational graph. The variables in a program can generally be grouped into two categories: input variables and noninput (internal) variables. Within the noninput variables, some variables such as loop counters, flags, and the program counter are special variables since they are initialized to fixed values (eg the program counter begins at 0).
Let $x_{\mathrm{in}}$, $x_{\mathrm{sp}}$, and $x_{\mathrm{nin}}$ denote the joint state of input variables, special variables, and noninput variables that are not special, respectively. Let $X_0$ be the random variable representing the initial state of the program, and let $X_i$ be the random variable representing the program state after the ith iteration of the associated RASP machine. Specifically, let $X_\tau$ be the random variable representing the program state after the program halts. We assume that the input variables are freshly sampled from $p_{\mathrm{in}}$ at the start of each run. Special variables (such as flags, loop counters, and the program counter) are initialized to fixed values $x_{\mathrm{sp}}^{0}$. All the other internal (noninput) variables, $x_{\mathrm{nin}}$, are assumed to retain the values they had at the end of the previous execution. Let $p_{\mathrm{nin}}$ be the marginal distribution over these noninput variables from the previous run, ie $p_{\mathrm{nin}}(x_{\mathrm{nin}}) = \sum_{x_{\mathrm{in}}, x_{\mathrm{sp}}} p_\tau(x)$, where $x = (x_{\mathrm{in}}, x_{\mathrm{sp}}, x_{\mathrm{nin}})$ is the joint state of all variables and $p_\tau$ is the distribution of $X_\tau$. Thus, the initial joint distribution over the full program state is:
$$p_0(x) = p_{\mathrm{in}}(x_{\mathrm{in}})\, \delta(x_{\mathrm{sp}}, x_{\mathrm{sp}}^{0})\, p_{\mathrm{nin}}(x_{\mathrm{nin}}), \tag{17}$$
where $\delta(\cdot, \cdot)$ is the Kronecker delta, ensuring that the special variables are set to their predefined initial states with probability 1.
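The product structure of the initial distribution in Eq. 17 can be sketched as follows, assuming a program with one input variable, one ordinary noninput variable, and the program counter as the only special variable. The two marginals are hypothetical numbers chosen for illustration; states whose program counter differs from its initial value have probability zero and are simply omitted from the dictionary.

```python
from itertools import product

# Hypothetical marginals (illustrative values only).
p_in = {-1: 0.25, 0: 0.25, 1: 0.5}   # distribution of the freshly sampled input x
p_nin = {0: 0.3, 1: 0.7}             # marginal of y left behind by the previous run
PC_INIT = 0                          # special variable: program counter starts at 0

def initial_distribution(p_in, p_nin, pc_init):
    """Input marginal times a point mass on pc_init times the noninput marginal."""
    return {(pc_init, x, y): p_in[x] * p_nin[y] for x, y in product(p_in, p_nin)}

p0 = initial_distribution(p_in, p_nin, PC_INIT)
```

Because the special variables are pinned to fixed values, the support of `p0` is confined to the initial states of the computational graph even though the state space contains many more nodes.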
Starting with , the dynamics induced at the ensemble level from the state transitions of the RASP machine can be modeled naturally by the adjacency matrix,
| (18) |
or as a shorthand. Therefore, the distribution over the state of the program after ith iteration is given by,
| (19) |
Thus, starting with an initial distribution , the above equation provides the sequence of probability distributions over program’s state space across each iteration.
Note that for programs without conditionals, the number of steps required to finish the program does not change with the input. In that case, the ith iteration of the map G corresponds to the state transition associated with the ith computational step of the program, and the number of iterations of G required for the distribution over program states to reach a steady state is the same as the number of computational steps needed for the program to halt. However, for programs that do involve conditionals, such as heaviside, the number of steps needed to halt may depend on the input. In the case of heaviside, this number is either 6 or 7, depending on which branch of the conditional the input selects. As a result, in general the iteration index i of G does not directly correspond to a specific computational step. Moreover, the number of iterations needed for the distribution to reach a steady state is given by the maximum execution length across all possible inputs. Nonetheless, these changes of state are driven by the same map G, which corresponds to repeating an identical process across all steps of a program in a computer.
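The dependence of the halting time on the input can be illustrated with a hypothetical two-branch program (not the heaviside code of Fig. 2): one input jumps straight to the halt state, the other takes an extra step. The number of iterations after which the whole ensemble has stopped changing is then the maximum over inputs.

```python
def steps_to_halt(state, step):
    """Number of applications of `step` until the state stops changing."""
    n = 0
    while step(state) != state:
        state = step(state)
        n += 1
    return n

def step(state):
    # Hypothetical program with one conditional jump at pc == 0.
    x, pc = state
    if pc == 0:
        return (x, 2) if x == 0 else (x, 1)
    if pc == 1:
        return (x, 2)
    return state  # pc == 2: halted, a fixed point of the map

halting = {x: steps_to_halt((x, 0), step) for x in (0, 1)}
iters_to_steady_state = max(halting.values())
```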
MMC of computer programs
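Consider an adjacency matrix, or map, G representing the state-space behavior of the RASP machine associated with a program. As described in the previous section, changes of state are driven by the repeated application of G, which corresponds to repeating an identical process across all steps of a program in a computer. Let $q$ denote the prior distribution associated with the cost function of the map G. As discussed earlier, this prior remains fixed across iterations. Consequently, writing $D(\cdot\,\|\,\cdot)$ for the Kullback-Leibler divergence, the MMC incurred when G is applied at iteration i, taking the distribution $p_i$ over program states to $p_{i+1} = G\,p_i$, is given by
$$\mathcal{MC}_i = D(p_i \,\|\, q) - D(G p_i \,\|\, G q). \tag{20}$$
The total MMC of running the entire program is given by the sum over all iterations until the program halts,
$$\mathcal{MC} = \sum_{i=0}^{\tau-1} \mathcal{MC}_i \tag{21}$$
$$\phantom{\mathcal{MC}} = \sum_{i=0}^{\tau-1} \left[ D(p_i \,\|\, q) - D(p_{i+1} \,\|\, G q) \right], \tag{22}$$
where $\tau$ is the number of iterations until the program halts.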
To determine the MMC for a given computer program using Eq. 22, we can either start with illustrative choices of the function f and derive the corresponding prior distribution using the methods outlined in Section SI, or we can make illustrative assumptions about the prior directly. In the Heaviside example that follows (Heaviside program section), we adopt the first approach: we assume a uniform f and compute the associated prior distribution from it.
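As a hedged illustration of the per-iteration and total MMC, the sketch below uses the standard mismatch-cost expression (the drop in Kullback-Leibler divergence between the actual distribution and the prior when both are passed through the same map). It takes the second route mentioned above and assumes a uniform prior directly, rather than deriving the prior from f. The six-state map is a hypothetical toy program and all names are illustrative.

```python
from math import log

def push_forward(p, G):
    """Apply the deterministic map G (dict: state -> next state) to p."""
    out = {}
    for s, prob in p.items():
        out[G[s]] = out.get(G[s], 0.0) + prob
    return out

def kl(p, q):
    """Kullback-Leibler divergence D(p || q) in nats; q must cover p's support."""
    return sum(pv * log(pv / q[s]) for s, pv in p.items() if pv > 0)

def mismatch_costs(p0, q, G, n_iter):
    """Per-iteration MMC for n_iter applications of G, with a fixed prior q."""
    costs, p = [], p0
    for _ in range(n_iter):
        Gp, Gq = push_forward(p, G), push_forward(q, G)
        costs.append(kl(p, q) - kl(Gp, Gq))
        p = Gp
    return costs

# Hypothetical toy program: state = (x, pc); pc 1 erases x, pc 2 is the halt state.
G = {(0, 0): (0, 1), (1, 0): (1, 1),
     (0, 1): (0, 2), (1, 1): (0, 2),
     (0, 2): (0, 2), (1, 2): (1, 2)}
q = {s: 1 / len(G) for s in G}          # assumed uniform prior
p0 = {(0, 0): 0.5, (1, 0): 0.5}         # actual initial distribution
costs = mismatch_costs(p0, q, G, n_iter=4)
total = sum(costs)
```

In this toy case the first step is a bijection on the occupied states and costs nothing, the erasure step costs ln 1.5, and once the ensemble sits in the halt state every further application of G costs a constant ln 3, so the cumulative sum grows linearly from then on.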
Heaviside program
Consider the heaviside program introduced in Fig. 2, whose state space is depicted in Fig. 3, generated through simulation of the corresponding low-level RASP code. The state of the program is given by the joint value of variables x, c, a, pc, where x and c are the input variables, a is the conditional flag variable, and pc is the program counter. The program's state transitions are captured by the adjacency matrix of its state-space graph, as formalized in Eq. 18. Given an initial distribution over these states, Eq. 19 describes the discrete-time evolution of the distribution under repeated application of a fixed stochastic map G.
We assume that the input variable x takes values in a finite set of integers and follows a binomial distribution parameterized by the Bernoulli parameter α. For illustrative purposes, the variable c is fixed at 5. This input distribution induces a distribution over the program's other variables, which in turn defines the initial state distribution over the program's full state space, as described in Eq. 17. By simulating the state transitions of the associated RASP machine for every possible input value, we obtain G. The prior is obtained by assuming a uniform f over all states.
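The way a binomial input distribution fixes the initial distribution over full program states can be sketched as follows. The number of trials (10), the initial flag value (0), and the tuple layout `(x, c, a, pc)` are assumptions made for illustration; only the fixed value c = 5 and the program counter starting at 0 are taken from the text.

```python
from math import comb

def binomial_pmf(n_trials, alpha):
    """Binomial probabilities for x = 0, ..., n_trials with Bernoulli parameter alpha."""
    return {x: comb(n_trials, x) * alpha**x * (1 - alpha)**(n_trials - x)
            for x in range(n_trials + 1)}

def heaviside_initial_distribution(n_trials, alpha, c=5):
    """Initial distribution over states (x, c, a, pc).

    The flag a and the program counter pc are pinned to 0 (assumed initial values),
    so all randomness comes from the input x.
    """
    return {(x, c, 0, 0): p for x, p in binomial_pmf(n_trials, alpha).items()}

p0 = heaviside_initial_distribution(n_trials=10, alpha=0.3)
```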
We use Eq. 20 to compute the MMC incurred at each iteration and Eq. 22 to compute the total MMC. Figure 4 shows both the stepwise and cumulative MMC results. As seen in Fig. 4a, the program reaches a steady state after four iterations, beyond which the per-step MMC remains constant. As a result, Fig. 4b shows that the cumulative MMC grows linearly after the steady state is reached.
Figure 4.
MMC of the heaviside program. The input variable x follows a binomial distribution parameterized by a Bernoulli parameter α, while the variable c is fixed at 5. This input distribution induces a distribution over all program variables, thereby determining the initial distribution over the program's state space. a) MMC incurred in each iteration of the heaviside program. Regardless of the initial distribution, the program reaches a steady state within four iterations, after which the per-step MMC becomes constant. b) Cumulative MMC over successive iterations of the heaviside program.
Bubble sort program
We now turn to a sorting program for integer arrays of length n. Figure S1 depicts the source code for the BubbleSort program and the corresponding RASP-like code. The state of the program consists of an array arr of length n, a Boolean flag swapped, and loop counters i and j. The full state of the program at any step is defined by the joint values of these registers along with the program counter. As instructions execute, both the register values and the program counter are updated, producing a sequence of state transitions.
Before proceeding to generate the state space of this program, we address two key considerations. First, for simple programs without arrays, the RASP model maps neatly onto a low-level stored-program architecture: each variable corresponds to a register, and the program's state is defined by the joint state of all registers and the program counter. However, array handling introduces complexity. In real systems, array elements are typically stored in memory and accessed via register-stored addresses. To avoid this complication, we adopt a simplified RASP-like model in which all program variables, including individual array elements, are treated as if directly stored in registers. This keeps our definition of the program state minimal and unified: a joint configuration of all variables and the program counter.
Second, the size of the joint state space grows rapidly with array size n. The dominant factor is the exponential number of possible input arrays. For instance, if each array element is a digit drawn from a set of k possible values, the number of possible input arrays is $k^n$, leading to a total state space of size roughly $n\,k^n$ (the factor n accounting for internal variables). If the input array is instead restricted to be a permutation of n distinct values, the number of inputs becomes $n!$, and the total state space scales as $n \cdot n!$ by the same accounting. In either case, the combinatorial explosion imposes practical limits on simulating the lower-level RASP.
This approach therefore faces a combinatorial challenge as the input array length increases. To manage it, we restrict our analysis to small input sizes, and separately study the case where the input array is a permutation with no repeated entries and the case where repeated entries are allowed (which we refer to as combinations).
To construct the stochastic map G of the bubble sort algorithm when the input arrays are restricted to permutations, we simulate the program code for each input array and record the transitions in the program's state after each instruction. We perform this simulation for two array lengths. The resulting state spaces with their state transitions are shown in Fig. 5. The leaf nodes in these graphs represent all possible input arrays, of which there are $n!$ for arrays of length n.
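The simulation procedure described above can be sketched as follows. The five-instruction encoding of bubble sort used here is a coarse, hypothetical stand-in for the RASP-like code of Figure S1 (which is not reproduced in this section), so the exact state counts will differ from those in Fig. 5; the point is only how the state graph is assembled by running every input and recording each transition.

```python
from itertools import permutations

HALT = 4  # program-counter value of the halt instruction in this toy encoding

def step(state):
    """One instruction of a coarse-grained bubble sort; state = (arr, i, j, swapped, pc)."""
    arr, i, j, swapped, pc = state
    n = len(arr)
    if pc == 0:                       # start a new pass
        return (arr, i, 0, False, 1)
    if pc == 1:                       # inner-loop test
        return (arr, i, j, swapped, 3 if j >= n - 1 - i else 2)
    if pc == 2:                       # compare-and-swap, then advance j
        a = list(arr)
        if a[j] > a[j + 1]:
            a[j], a[j + 1] = a[j + 1], a[j]
            swapped = True
        return (tuple(a), i, j + 1, swapped, 1)
    if pc == 3:                       # end of pass: run another pass or halt
        if swapped:
            return (arr, i + 1, j, swapped, 0)
        return (arr, i, j, swapped, HALT)
    return state                      # halted: fixed point of the map

def state_graph(inputs):
    """Map each reachable state to its successor by simulating every input array."""
    G = {}
    for arr in inputs:
        s = (tuple(arr), 0, 0, False, 0)
        while s not in G:
            G[s] = step(s)
            s = G[s]
    return G

G2 = state_graph(permutations((1, 2)))
G3 = state_graph(permutations((1, 2, 3)))
targets3 = set(G3.values())
leaves3 = [s for s in G3 if s not in targets3]   # states with no predecessor
halts3 = [s for s in G3 if G3[s] == s]           # fixed points (halt states)
```

In this encoding the leaf nodes are exactly the initial states, one per input permutation, and every halt state holds a sorted array.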
Figure 5.
State space of the BubbleSort program for input arrays of two different lengths, shown in panels (a) and (b). The input arrays are permutations with no repeated entries. Each node represents a unique state of the program, and directed edges correspond to state transitions caused by the execution of individual instructions. Leaf nodes represent possible initial states of the program, determined by different input permutations. As the program runs, it follows a deterministic trajectory through the state space, eventually reaching a terminal (halt) or attractor state for each initial condition.
For arrays of length n whose entries may repeat, the state space expands significantly, since it also includes input arrays with duplicates. The enlarged state space is shown in Fig. 6.
Figure 6.
State space of the BubbleSort program with input arrays allowing repeated entries. Compared to the case with distinct elements, the state space is significantly larger due to the increased number of possible input configurations, and the program contains many more fixed points (halt states).
To evaluate the MMC, we assume a uniform prior in both settings: one over the state space of arrays with nonrepeated entries (permutations) and one over arrays with repeated entries (combinations). We consider several array lengths n in each case. We also assume that the actual initial distribution over the state space is uniform. The resulting MMC per iteration of the map G is shown in Fig. 7a for the permutation case and Fig. 7b for the combination case. In both settings, the MMC per iteration approaches a steady value once the program halts, and the number of iterations required for halting increases with the input length.
Figure 7.
MMC incurred in each iterative step of the bubble sort algorithm. a) The case where the input arrays of length n are all permutations with no repeated entries; b) the case where the input arrays are combinations (with possible repetitions). In both cases, the distribution over initial states is taken to be uniform, and the prior distribution is also assumed to be uniform over the state space.
The aggregate MMC—obtained by summing over all iterations until termination—is plotted as a function of input size in Fig. 8. The total MMC is consistently larger in the combination case (allowing repeated entries) than in the permutation case (distinct entries only). Allowing repeated entries affects the state space in two competing ways. On the one hand, repeated values can reduce the number of swaps: when adjacent elements are equal, no exchange occurs, allowing the algorithm to terminate in fewer steps on average, thereby reducing the number of transitions per run. On the other hand, allowing repetition substantially increases the number of possible input arrays, enlarging the overall state space. The net effect is a trade-off: repeated entries can shorten individual execution paths, yet increase the average MMC due to the larger and more complex state space over which the dynamics unfold.
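The two competing effects described above can be checked on a small case. The following sketch counts comparisons and swaps of a plain bubble sort with early termination over all length-3 inputs drawn from three values, once for permutations and once with repeats allowed. The value set and the function names are assumptions for illustration; the article's own experiments operate on the RASP-level state space rather than on swap counts.

```python
from itertools import permutations, product

def bubble_sort_counts(arr):
    """Return (comparisons, swaps) made by bubble sort with early termination."""
    a = list(arr)
    n = len(a)
    comparisons = swaps = 0
    for i in range(n - 1):
        swapped = False
        for j in range(n - 1 - i):
            comparisons += 1
            if a[j] > a[j + 1]:          # equal neighbours are never exchanged
                a[j], a[j + 1] = a[j + 1], a[j]
                swaps += 1
                swapped = True
        if not swapped:
            break
    return comparisons, swaps

def mean_swaps(inputs):
    inputs = list(inputs)
    return sum(bubble_sort_counts(a)[1] for a in inputs) / len(inputs)

perm_inputs = list(permutations((1, 2, 3)))          # 6 inputs, no repeats
rep_inputs = list(product((1, 2, 3), repeat=3))      # 27 inputs, repeats allowed
print(len(perm_inputs), mean_swaps(perm_inputs))
print(len(rep_inputs), mean_swaps(rep_inputs))
```

For this small case the mean number of swaps drops from 1.5 to 1.0 when repeats are allowed, while the number of distinct inputs grows from 6 to 27.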
Figure 8.
Aggregate MMC, obtained by summing over all iterations until termination, plotted for bubble sort for both permutations (red) and combinations (blue) of inputs of length n. The initial distribution is taken to be uniform. The program has higher total MMC when the input arrays are allowed to have repetitions than when inputs are restricted to permutations.
In the next section, we discuss how this framework can be extended to programs that invoke other programs as subroutines. As an illustrative example, we present the Bucket Sort algorithm, which partitions an input array into buckets and calls the bubble sort program to sort each subarray individually.
Subroutine calls
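Consider two programs $A$ and $B$. The state space associated with each program consists of the joint state of all variables in the program and the state of the program counter. Let $\mathcal{X}_A$ and $\mathcal{X}_B$ denote the state spaces of programs $A$ and $B$, respectively. Based on the framework discussed earlier in the Framework section, one can define the dynamics of the distribution over each program's state space:
$$p^{A}_{i+1} = G_A\,p^{A}_{i}, \qquad p^{B}_{j+1} = G_B\,p^{B}_{j}, \tag{23}$$
where $G_A$ and $G_B$ are the stochastic maps corresponding to $A$ and $B$, and $p^{A}_{i}$ and $p^{B}_{j}$ are distributions over $\mathcal{X}_A$ and $\mathcal{X}_B$.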
In a subroutine call of a program $B$ by a program $A$, one of the instructions in program $A$ calls $B$ to update the variables in $A$. We specifically focus on the call-by-value case, where program $A$ passes a copy of the values to subroutine $B$. Any changes made within the subroutine do not affect the original variables in the calling program. The variables within a subroutine are defined locally in its own scope, and the main program passes only the values, not references, of variables to the subroutine. As a result, the subroutine does not operate directly on the variables defined in the calling program but instead works on local copies of their values.
This allows us to treat the calling program and the subroutine, and their respective state spaces, largely independently. Specifically, we can define separate prior distributions for the stochastic maps $G_A$ and $G_B$ of the calling program $A$ and the subroutine $B$, corresponding to the state spaces $\mathcal{X}_A$ and $\mathcal{X}_B$, respectively. Once the subroutine $B$ is invoked within $A$, and the input distribution to $B$ induces an initial distribution over its state space, the evolution of $B$ proceeds independently of the state distribution of $A$:
$$p_{j+1}(x_B) = \sum_{x'_B} G_B(x_B \mid x'_B)\, p_j(x'_B). \tag{24}$$
This independence arises because, during the execution of program $B$, all variables associated with program $A$ remain unchanged. On the other hand, if we focus solely on the discrete-time updates of program $A$'s variables, ignoring the internal dynamics of $B$ during its invocation, the evolution of the distribution over $A$'s state remains well-defined:
$$p_{i+1}(x_A) = \sum_{x'_A} G_A(x_A \mid x'_A)\, p_i(x'_A). \tag{25}$$
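Denoting the MMC associated with a complete run (ie until the initial distribution reaches a steady state) of the calling program $A$ and the subroutine $B$ as $\mathcal{MC}_A$ and $\mathcal{MC}_B$, we define the total MMC of the joint program, where $A$ calls $B$, as the sum of the two:
$$\mathcal{MC}_{\mathrm{total}} = \mathcal{MC}_A + r\,\mathcal{MC}_B, \tag{26}$$
assuming that program $B$ is called r times within program $A$, and that each call induces an initial distribution over $B$'s state space.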
A few important conditions must be satisfied for expression (26) to serve as a valid MMC lower bound on the EP associated with the joint execution of a calling program $A$ and a subroutine $B$. First, the initial distribution induced on the state space of $B$ must be the same for each invocation. Second, the number of times the subroutine $B$ is called must be independent of the input to the main program $A$. Finally, the timing (ie the steps at which $B$ is invoked within $A$) must also be independent of the input to $A$. Moreover, it is important to emphasize that the MMC in Eq. 26 does not account for the thermodynamic cost associated with the creation and destruction of correlations between the variables of program $A$ and subroutine $B$ each time $A$ calls $B$ with new input values. Nonetheless, Eq. 26 provides a valid lower bound on the cost function.
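A small helper makes the bookkeeping behind the additive bound explicit. It is only a sketch: the function name, the dictionary of call counts, and the numerical values are hypothetical. It checks one of the conditions listed above (the number of calls must not depend on the input) before adding the caller's MMC to r copies of the subroutine's MMC; the other two conditions are not checked here.

```python
def total_mmc_with_subroutine(mc_main, mc_sub, calls_per_input):
    """Additive MMC bound for a caller that invokes a subroutine r times.

    mc_main         : MMC of a complete run of the calling program
    mc_sub          : MMC of a complete run of the subroutine
    calls_per_input : dict, input -> number of subroutine calls made on that input
    """
    counts = set(calls_per_input.values())
    if len(counts) != 1:
        # The additive expression is only claimed when r is input-independent.
        raise ValueError("number of subroutine calls depends on the input")
    r = counts.pop()
    return mc_main + r * mc_sub

# Placeholder values, not results from the article.
bound = total_mmc_with_subroutine(1.0, 0.5, {"input_a": 2, "input_b": 2})
```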
We consider a simplified example of the Bucket Sort algorithm, where an input array is divided into two subarrays (buckets), and the bubble sort subroutine is called separately to sort each of these buckets.
Bucket-sort program
Bucket sort is an efficient sorting algorithm that divides the input data into a fixed number of buckets, sorts the elements within each bucket (typically using another sorting algorithm, or directly if the buckets are small), and then concatenates the sorted buckets to produce the final sorted output. Unlike bubble sort, which has a worst-case time complexity of $O(n^2)$, bucket sort can achieve an average-case time complexity that is linear in n under certain input distributions (eg when the input is uniformly distributed).
In this section, however, to illustrate the MMC of subroutine calls, we focus on a simplified version of bucket sort (shown in Fig. 9) that takes an input array of length n, divides it into two buckets (subarrays), calls BubbleSort to sort each subarray, and then combines the sorted results.
Figure 9.
Source code for bucket sort, written in MATLAB.
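For readers without access to the MATLAB listing, the structure described above (split into two buckets, sort each with bubble sort, concatenate) can be sketched in Python. This is not a transcription of Fig. 9: the rule used to split the array (values at or below the midpoint of the observed range go to the first bucket) is an assumption, as are the function names.

```python
def bubble_sort(arr):
    """Plain bubble sort with early termination; returns a new sorted list."""
    a = list(arr)
    n = len(a)
    for i in range(n - 1):
        swapped = False
        for j in range(n - 1 - i):
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
                swapped = True
        if not swapped:
            break
    return a

def bucket_sort_two(arr):
    """Two-bucket sort: bubble_sort is called exactly twice for any nonempty input."""
    if not arr:
        return []
    mid = (min(arr) + max(arr)) / 2          # assumed splitting rule
    low = [v for v in arr if v <= mid]
    high = [v for v in arr if v > mid]
    # Each bucket receives a copy of the values (call by value).
    return bubble_sort(low) + bubble_sort(high)

result = bucket_sort_two([3, 1, 2, 3, 0])
```

Because the two calls sit at fixed positions in the code, both the number and the timing of the subroutine calls are independent of the input.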
Note that the subroutine BubbleSort is called exactly twice, regardless of the input array. Furthermore, the timing of these calls is fixed and does not vary with the input. Moreover, the distribution over the variables within BubbleSort is the same for each call. Therefore, for the program in Fig. 9, we can apply Eq. 26 to evaluate the MMC. It is also worth noting that Eq. 26 does not account for correlations between the variables of the two subroutine calls. The MMC lower bound on the cost of the joint program defined in Fig. 9 is determined by
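$$\mathcal{MC}_{\mathrm{total}} = \mathcal{MC}_{\mathcal{X}_{\mathrm{Bucket}}} + 2\,\mathcal{MC}_{\mathcal{X}_{\mathrm{Bubble}}}, \tag{27}$$
where $\mathcal{X}_{\mathrm{Bucket}}$ and $\mathcal{X}_{\mathrm{Bubble}}$ are the state spaces of BucketSort and BubbleSort, respectively, and each term is the MMC of a complete run over the corresponding state space. Note that any initial distribution for the BucketSort program induces an initial distribution for the BubbleSort program.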
Discussion and future work
The framework developed in this article, which combines RASP machines with the MMC, is among the first attempts to address the thermodynamic cost of running computer programs in a unified manner without requiring detailed knowledge of the underlying device physics. It allows us to move toward a broader notion of algorithmic complexity that accounts not only for how long a program runs or how much memory it consumes but also for the thermodynamic cost associated with its execution.
For example, in the case of the heaviside program, we observe that the second and third iterations of the associated RASP machine incur substantially lower MMC than the first iteration (Fig. 4a). This is not immediately obvious from the associated RASP code alone (Fig. 2b). However, examining the corresponding state space in Fig. 3, the reason becomes intuitive: at the ensemble level, the first iteration results in a convergence of many distinct initial states into relatively few states. This contraction in state space at the distribution level leads to a larger MMC. A similar effect is visible in the fourth iteration. This is only an intuitive explanation, since the precise relationship between state-space convergence and MMC also depends on the prior distribution, but it suggests a broader principle: algorithmic steps that induce greater state-space convergence tend to incur higher MMC.
The BubbleSort examples illustrate an even more nuanced interaction between input structure, state space, and MMC. One might reasonably expect that BubbleSort acting on an array of size 2 would incur less MMC in each iteration than the same algorithm acting on arrays of size 3 or greater. Surprisingly, Fig. 7 shows the opposite: at each iteration before the program halts, the MMC per iteration for $n = 2$ is consistently larger than the MMC per iteration for $n \geq 3$. This cannot be explained by state-space structure alone. However, the number of iterations it takes the program to halt increases with the length of the input array, and consequently the aggregate MMC is larger for larger n. Since we assumed a uniform prior distribution, both the state space encoded in the stochastic map G and the choice of f together determine the prior, and hence the MMC. Isolating the precise cause of this behavior, and more generally understanding how algorithmic structure, input size, and prior interact to determine MMC, remains an interesting direction for future work.
One aspect we did not investigate in this article is the relationship between MMC and other standard complexity measures such as time and space complexity. For instance, as discussed earlier, programs or algorithms that involve conditionals or loops generally have halting times that depend on the input: some inputs cause them to run longer than others. Given a probability distribution over inputs, one can therefore speak of the expected running time of a program before it halts. Every probability distribution over inputs is associated with such an expected time. A natural question for future work is whether the total MMC associated with an input distribution is related in any systematic way to the expected running time induced by that same input distribution.
Because the methods developed here are, in many respects, first of their kind, they also come with shortcomings that we highlight and that future work may hope to address. The first, and more pressing, difficulty is the rapid growth of the state space as the number of program variables increases. This combinatorial explosion limits the practical applicability of the current method to relatively small programs. At present, the authors do not know a general way around this obstacle. However, one promising direction would be to replace exhaustive enumeration of the entire state space with probabilistic sampling techniques—such as Monte Carlo methods—which approximate key quantities by sampling a manageable subset of states rather than computing over all of them.
The second issue concerns the estimation of the prior distribution, which plays a central role in the calculation of MMC. As discussed in the article, one can either begin with a reasonable guess for the function f and infer a prior from it, or directly attempt to estimate the prior distribution. In either case, the reliability of the final MMC value is only as strong as one's confidence in the underlying estimation procedure.
One promising direction for identifying prior-independent minimal costs involves analyzing the structure of the periodic MMC. Consider Eq. 22, which has played a central role throughout this article in defining the total MMC for a given prior $q$:
$$\mathcal{MC}(q) = \sum_{i=0}^{\tau-1} \left[ D(p_i \,\|\, q) - D(G p_i \,\|\, G q) \right], \tag{28}$$
where $D(\cdot\,\|\,\cdot)$ is the Kullback-Leibler divergence, $p_i$ is the distribution over program states after i iterations, and $\tau$ is the number of iterations until the program halts.
As the system evolves through a sequence of state distributions $p_0, p_1, \dots$ according to the update rule $p_{i+1} = G\,p_i$, there exists a distribution $\hat{q}$ that minimizes the sum above. This optimal prior defines a special MMC, $\mathcal{MC}(\hat{q})$, which acts as a strictly positive lower bound on the MMC incurred for any other choice of prior $q$. Importantly, this strictly positive minimal MMC is completely independent of the underlying physical process and arises solely from the computational map G and its repeated application.
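One crude way to explore this minimization numerically is a random search over priors, sketched below for a hypothetical six-state toy map. This is not the authors' method and gives no guarantee of finding the true minimizer; it only illustrates that the summed MMC is a function of the prior that can be lowered by changing the prior while the map and the initial distribution stay fixed. All names and the choice of search procedure are assumptions.

```python
import random
from math import log

def push_forward(p, G):
    out = {}
    for s, prob in p.items():
        out[G[s]] = out.get(G[s], 0.0) + prob
    return out

def kl(p, q):
    return sum(pv * log(pv / q[s]) for s, pv in p.items() if pv > 0)

def total_mmc(p0, q, G, n_iter):
    """Sum of per-iteration mismatch costs for a fixed prior q."""
    total, p = 0.0, p0
    Gq = push_forward(q, G)
    for _ in range(n_iter):
        Gp = push_forward(p, G)
        total += kl(p, q) - kl(Gp, Gq)
        p = Gp
    return total

def search_prior(p0, G, n_iter, n_samples=2000, seed=0):
    """Random search over full-support priors, starting from the uniform prior."""
    rng = random.Random(seed)
    states = list(G)
    best_q = {s: 1 / len(states) for s in states}
    best = total_mmc(p0, best_q, G, n_iter)
    for _ in range(n_samples):
        w = [rng.random() + 1e-3 for _ in states]   # keep every weight positive
        z = sum(w)
        q = {s: wi / z for s, wi in zip(states, w)}
        cost = total_mmc(p0, q, G, n_iter)
        if cost < best:
            best_q, best = q, cost
    return best_q, best

# Hypothetical toy map: state = (x, pc); pc 1 erases x, pc 2 is the halt state.
G = {(0, 0): (0, 1), (1, 0): (1, 1),
     (0, 1): (0, 2), (1, 1): (0, 2),
     (0, 2): (0, 2), (1, 2): (1, 2)}
p0 = {(0, 0): 0.5, (1, 0): 0.5}
uniform_cost = total_mmc(p0, {s: 1 / len(G) for s in G}, G, n_iter=3)
best_q, best_cost = search_prior(p0, G, n_iter=3)
```

A gradient-based or convex-optimization approach would be a natural replacement for the random search on larger state spaces.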
Our current treatment of subroutine calls is restricted to scenarios where both the timing and frequency of the calls are fixed and independent of the input instance. However, in general programs, the stochasticity of the input (when the input is drawn from a distribution) can induce stochasticity in both when and how often a subroutine is invoked. That is, the call structure becomes input-dependent. Extending our current treatment to handle this more general case poses a very challenging and mathematically rich problem.
Contributor Information
Abhishek Yadav, Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501, USA; Department of Physical Sciences, IISER Kolkata, Mohanpur 741246, India.
Francesco Caravelli, Theoretical Division (T-4), Los Alamos National Laboratory, Los Alamos, NM 87545, USA.
David Wolpert, Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501, USA; Complexity Science Hub, Vienna 1030, Austria; Center for Bio-social Complex Systems, Arizona State University, Tempe, AZ 85281, USA; International Center for Theoretical Physics, Trieste 34151, Italy.
Supplementary Material
Supplementary material is available at PNAS Nexus online.
Competing Interest
The authors declare no competing interests.
Funding
The work by F.C. was conducted under the auspices of the National Nuclear Security Administration of the United States Department of Energy at Los Alamos National Laboratory (LANL) under contract no. DE-AC52-06NA25396. F.C. was also financed via DOE LDRD grant 20240245ER. A.Y. and D.H.W. were supported by US NSF grant CCF-2221345 and thank the Santa Fe Institute for support.
Author Contributions
Abhishek Yadav (Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Visualization, Writing—original draft, Writing—review & editing), Francesco Caravelli (Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Data curation, Writing—original draft), and David Wolpert (Conceptualization, Formal analysis, Investigation, Methodology, Writing—original draft)
Preprints
This manuscript was posted as a preprint: https://doi.org/10.48550/arXiv.2411.16088.
Data Availability
All data, code, and analysis scripts used in this study are publicly available at https://github.com/Kensho28/RASP. No proprietary or confidential data were used.