The Journal of Chemical Physics. 2012 Aug 15;137(7):074103. doi: 10.1063/1.4743955

A derivation of the master equation from path entropy maximization

Julian Lee 1,a), Steve Pressé 2,b)
PMCID: PMC4108628  PMID: 22920099

Abstract

The master equation and, more generally, Markov processes are routinely used as models for stochastic processes. They are often justified on the basis of randomization and coarse-graining assumptions. Here instead, we derive nth-order Markov processes and the master equation as unique solutions to an inverse problem. We find that when constraints are not enough to uniquely determine the stochastic model, an nth-order Markov process emerges as the unique maximum entropy solution to this otherwise underdetermined problem. This gives a rigorous alternative for justifying such models while providing a systematic recipe for generalizing widely accepted stochastic models usually assumed to follow from first principles.

I. INTRODUCTION

Markov chains1,2 are often the starting point for modeling condensed phase stochastic dynamics in biophysics3–8 and beyond.9 Markov chains are approximations of continuous system dynamics, primarily justified on the basis of coarse-graining approximations.10 Coarse-graining reduces classical phase space—with phase-point dynamics governed by Liouville's equation—to a discrete set of states, with stochastic hopping between states determined by stationary transition probabilities. Such coarse-graining methods have recently been used to show how Markov models can describe the continuous dynamics of biomolecules evolving in complex potential landscapes.11–13

A very different approach to stochastic dynamics is due to Filyukov and Karpov14 and later Jaynes.15 Using this approach, stochastic dynamical models can be inferred as unique solutions to an inverse problem. A model is defined by the probability for each stochastic path. Normally, the number of stochastic paths greatly outnumbers the constraints imposed from data. To find a unique solution to this underdetermined problem we ask: which model not only satisfies the limited experimental constraints but also maximizes the entropy for the path probabilities? This is exactly equivalent to finding a model for the path probabilities which satisfies the experimental constraints while obeying the logical consistency axioms due to Shore and Johnson:16 (1) when A and B are independent data, the model for P(A and B) must reduce to P(A)P(B) and the model for P(A or B) must reduce to P(A) + P(B); and (2) any prediction made from the model must be independent of the coordinate system used in the calculation.

This method of finding a stochastic model is mathematically similar to the maximum entropy principle for determining equilibrium probability distributions.17–20 In earlier work, Ge et al.25—which extended the work of Stock et al.21 and Ghosh et al.22—showed that the first order Markov chain emerges as a natural consequence of path entropy maximization. Here we generalize this work in several important ways: (1) we do not limit ourselves to first order Markov processes; (2) we consider the conditions under which the master equation emerges as a solution to the procedure of path entropy maximization; (3) we consider how different types of constraints affect the emergent model; and (4) we consider very general (nonlinear) constraints.

To the best of our knowledge, this is the first time the master equation and, more generally, nth-order Markov processes are rigorously shown to follow from maximum entropy principles. This provides an alternative justification for the master equation—the basic tool of stochastic physics and biology—which is distinct from standard chemical or mechanistic justifications provided by van Kampen,1 Zwanzig,23 Gillespie,24 and others. The master equation assumes from the outset a dynamics described by stationary transition probabilities and time-varying state occupation probabilities. Here we only assume that data of a specific type are available, along with the basic logical consistency axioms required to justify maximum entropy as an inference tool.16 Posing the master equation as the solution of an inverse problem is significant because possible generalizations of the master equation are now derivable within this formalism. These generalizations can then be justified on the firm axiomatic basis provided by Shore and Johnson.

II. MARKOV MODEL OF nth ORDER: DEFINITIONS AND NOTATIONS

In this section, we briefly introduce the mathematical notation necessary for the remainder of the paper. Consider a stochastic process in discrete time. Let the index i_t denote the state of the system at time t along the path C from time 0 to T, where C = {i_0, i_1, i_2, …, i_T}. The probability distribution of path C is

$$P(C) = p(i_0, i_1, \ldots, i_T). \tag{1}$$

An m-point joint probability is defined as follows

$$p(a_1,\ldots,a_m;t) \equiv \sum_{i_0,\ldots,i_{t-m}}\;\sum_{j_1,\ldots,j_{T-t}} p(i_0,i_1,\ldots,i_{t-m},a_1,\ldots,a_m,j_1,j_2,\ldots,j_{T-t}). \tag{2}$$

The explicit time index is required, as the result depends on which indices are summed over. Conditional—also called transition—probabilities are obtained by invoking Bayes' theorem:

$$p(i_0,\ldots,i_{t-1}\to i_t) \equiv \frac{p(i_0,\ldots,i_t)}{p(i_0,\ldots,i_{t-1})}. \tag{3}$$

We call p(i_0, …, i_{t−1} → i_t) a transition probability. When the transition probability depends only on the previous n time steps,

$$p(i_0,\ldots,i_{t-1}\to i_t) = p(i_{t-n},i_{t-n+1},\ldots,i_{t-1}\to i_t;t) \equiv \frac{p(i_{t-n},\ldots,i_t;t)}{p(i_{t-n},\ldots,i_{t-1};t-1)}, \tag{4}$$

the process is called an nth-order Markov process. When the transition probability is time-independent, the process is called a time-homogeneous Markov process. When no specification is given, a Markov process is assumed to be first order and time-homogeneous.
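The transition probabilities of Eqs. (3) and (4) can be estimated directly from sampled paths. The following Python sketch is not part of the paper; the path data and the order n are illustrative assumptions, and pooling counts over t assumes time-homogeneity purely for illustration:

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(0)

def estimate_transition(paths, n):
    """Empirical p(i_{t-n},...,i_{t-1} -> i_t) of Eq. (4), pooled over t
    (assumes time-homogeneity purely for illustration)."""
    counts = defaultdict(lambda: defaultdict(int))
    for path in paths:
        for t in range(n, len(path)):
            history = tuple(path[t - n:t])   # the previous n states
            counts[history][path[t]] += 1
    # normalize counts into conditional probabilities
    return {h: {j: c / sum(cs.values()) for j, c in cs.items()}
            for h, cs in counts.items()}

# Two-state paths generated i.i.d. (a 0th-order process); the estimated
# first-order transition probability should then be close to the
# marginal p(j), independent of the previous state.
paths = [rng.integers(0, 2, size=200).tolist() for _ in range(200)]
trans = estimate_transition(paths, n=1)
print(trans[(0,)][1], trans[(1,)][1])  # both near 0.5
```

For i.i.d. paths the estimated first-order transition probabilities collapse to the marginals, as expected for a 0th-order process; for genuinely history-dependent data the two conditionals would differ.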

III. DERIVATION OF FIRST ORDER MARKOV PROCESS WITH LINEAR CONSTRAINTS

Here we show how the first order Markov process is derived from path entropy maximization. We begin with the definition of path entropy

$$H = -\sum_{\{i_0,i_1,\ldots,i_T\}} p(i_0,i_1,\ldots,i_T)\,\log p(i_0,i_1,\ldots,i_T). \tag{5}$$

We consider N_1 and N_2 linear constraints on one- and two-point probabilities, respectively:

$$\begin{aligned}
F_0^{(\alpha)} &\equiv \sum_{t=0}^{T}\sum_{i_t} \varepsilon_{i_t}^{(\alpha)}\, p(i_t;t) - (T+1)E_0^{(\alpha)} = 0 \quad (\alpha = 1,\ldots,N_1),\\
F_1^{(\gamma)} &\equiv \sum_{t=0}^{T-1}\sum_{i_t,\,i_{t+1}} J_{i_t i_{t+1}}^{(\gamma)}\, p(i_t,i_{t+1};t+1) - T J_0^{(\gamma)} = 0 \quad (\gamma = 1,\ldots,N_2),
\end{aligned} \tag{6}$$

and a normalization condition

$$\sum_{\{i_0,i_1,\ldots,i_T\}} p(i_0,i_1,\ldots,i_T) = 1. \tag{7}$$

These constraints are imposed using Lagrange multipliers. That is, the Lagrange multiplier terms are added to the path entropy as follows:

$$\begin{aligned}
&-\sum_{\{i_0,i_1,\ldots,i_T\}} p(i_0,i_1,\ldots,i_T)\,\log p(i_0,i_1,\ldots,i_T) - \sum_{\alpha=1}^{N_1}\beta_\alpha\Big[\sum_{t=0}^{T}\sum_{i_t}\varepsilon_{i_t}^{(\alpha)}\, p(i_t;t) - (T+1)E_0^{(\alpha)}\Big]\\
&\quad + \sum_{\gamma=1}^{N_2}\nu_\gamma\Big[\sum_{t=0}^{T-1}\sum_{i_t,\,i_{t+1}} J_{i_t i_{t+1}}^{(\gamma)}\, p(i_t,i_{t+1};t+1) - T J_0^{(\gamma)}\Big] + (\rho+1)\Big[\sum_{\{i_0,i_1,\ldots,i_T\}} p(i_0,i_1,\ldots,i_T) - 1\Big]. 
\end{aligned} \tag{8}$$

Extremizing Eq. (8) with respect to p(i0, i1, …, iT), we obtain

$$-\log p(i_0,i_1,\ldots,i_T) - \sum_{\alpha}\beta_\alpha\sum_{t=0}^{T}\varepsilon_{i_t}^{(\alpha)} + \sum_{\gamma}\nu_\gamma\sum_{t=0}^{T-1} J_{i_t i_{t+1}}^{(\gamma)} + \rho = 0. \tag{9}$$

The Lagrange multipliers introduced in Eq. (8) are determined by additional equations which come from taking the variation of Eq. (8) with respect to these Lagrange multipliers. The solution to Eq. (9) is expressed in terms of the Lagrange multipliers as follows

$$p(i_0,i_1,\ldots,i_T) = \exp\Big(\rho - \sum_\alpha\beta_\alpha\sum_{t=0}^{T}\varepsilon_{i_t}^{(\alpha)} + \sum_\gamma\nu_\gamma\sum_{t=0}^{T-1}J_{i_t i_{t+1}}^{(\gamma)}\Big) = \exp(\rho)\, v(i_0)\, G(i_0,i_1)G(i_1,i_2)\cdots G(i_{T-1},i_T)\, v(i_T), \tag{10}$$

where the elements of the vector v, v(i), and the elements of the transfer matrix G, G(i, j), are defined as follows

$$v(i) = \exp\Big(-\sum_\alpha \beta_\alpha \varepsilon_i^{(\alpha)}/2\Big), \qquad G(i,j) = \exp\Big(-\sum_\alpha \beta_\alpha\varepsilon_i^{(\alpha)}/2 + \sum_\gamma \nu_\gamma J_{ij}^{(\gamma)} - \sum_\alpha\beta_\alpha\varepsilon_j^{(\alpha)}/2\Big). \tag{11}$$

The m-point joint probability distribution, Eq. (2), is obtained from Eq. (10) by summing over the indices i_0, …, i_{t−m}, i_{t+1}, …, i_T as follows:

$$\begin{aligned}
p(a_1,\ldots,a_m;t) &= \sum_{i_0,\ldots,i_{t-m},\,i_{t+1},\ldots,i_T} p(i_0,i_1,\ldots,i_{t-m},a_1,\ldots,a_m,i_{t+1},\ldots,i_T)\\
&= \exp(\rho)\,[v^\dagger G^{t-m+1}](a_1)\, G(a_1,a_2)G(a_2,a_3)\cdots G(a_{m-1},a_m)\,[G^{T-t}v](a_m)\\
&= \frac{[v^\dagger G^{t-m+1}](a_1)\, G(a_1,a_2)G(a_2,a_3)\cdots G(a_{m-1},a_m)\,[G^{T-t}v](a_m)}{v^\dagger G^T v},
\end{aligned} \tag{12}$$

where [v†G^n](a) and [G^n v](a) denote the ath components of the row vector v†G^n and the column vector G^n v, respectively. (Similarly, [G^n](a, b) denotes the (a, b) component of the matrix G^n throughout the paper.) Therefore, combining Eqs. (4) and (12), we have

$$\begin{aligned}
p(a_1,\ldots,a_m \to a_{m+1};t) &= \frac{\exp(\rho)\,[v^\dagger G^{t-m}](a_1)\,G(a_1,a_2)\cdots G(a_m,a_{m+1})\,[G^{T-t}v](a_{m+1})}{\exp(\rho)\,[v^\dagger G^{t-m}](a_1)\,G(a_1,a_2)\cdots G(a_{m-1},a_m)\,[G^{T-t+1}v](a_m)}\\
&= \frac{G(a_m,a_{m+1})\,[G^{T-t}v](a_{m+1})}{[G^{T-t+1}v](a_m)} = p(a_m\to a_{m+1};t).
\end{aligned} \tag{13}$$

This shows that the conditional probability of a transition in fact depends only on the two states immediately before and after the transition. The process is therefore a first order Markov process. Note, however, that the transition probability retains explicit time dependence.
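This calculation is easy to check numerically. The sketch below is illustrative Python, not from the paper: the Lagrange multiplier values are random stand-ins. It builds v and G from Eq. (11), evaluates joint probabilities via Eq. (12), and confirms that the conditional probability of Eq. (13) depends only on the last state before the transition:

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 3, 6
eps = rng.normal(size=N)           # one-point multiplier terms, beta*eps_i
J = rng.normal(size=(N, N))        # two-point multiplier terms, nu*J_ij
v = np.exp(-eps / 2)                                  # Eq. (11)
G = np.exp(-eps[:, None] / 2 + J - eps[None, :] / 2)  # Eq. (11)

def joint(states, t):
    """p(a_1,...,a_m;t) from Eq. (12), with a_m at time t."""
    m = len(states)
    front = v @ np.linalg.matrix_power(G, t - m + 1)
    back = np.linalg.matrix_power(G, T - t) @ v
    p = front[states[0]] * back[states[-1]]
    for a, b in zip(states[:-1], states[1:]):
        p *= G[a, b]
    return p / (v @ np.linalg.matrix_power(G, T) @ v)

def cond(states, nxt, t):
    """p(a_1,...,a_m -> a_{m+1}; t), Eq. (13), as a ratio of joints."""
    return joint(list(states) + [nxt], t) / joint(states, t - 1)

# Conditioning on the longer history (0,1,2) and on the shorter one (2,)
# gives the same transition probability: a first order Markov process.
t = 4
print(np.isclose(cond([0, 1, 2], 0, t), cond([2], 0, t)))  # True
```

The same check with different random multipliers, states, or times gives the same cancellation, since the front factors of Eq. (12) drop out of the ratio exactly as in Eq. (13).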

The first order Markov property was also derived in Ref. 25 for the special case of constraining one-point and two-point statistics, which we now define. One-point statistics corresponds to F_0^{(α)} with

$$\varepsilon_i^{(\alpha)} = \delta_{i,\alpha} \quad (\alpha = 1,\ldots,N), \tag{14}$$

where the index α of the constraint now runs over each state of the system, N being the total number of states. This constraint simply counts the number of times state α is visited over the course of the trajectory. Likewise, two-point statistics corresponds to imposing F_1^{(τ,σ)} with

$$J_{i,j}^{(\tau,\sigma)} = \delta_{i,\tau}\,\delta_{j,\sigma} \quad (\tau,\sigma = 1,\ldots,N), \tag{15}$$

where we labelled the constraint by double indices (τ, σ) instead of the single index γ for notational convenience. This again simply counts the number of transitions from state τ to σ over the course of the trajectory.

IV. DERIVATION OF THE TIME-HOMOGENEOUS MASTER EQUATION

Recall that a master equation requires time-dependent state occupation probabilities and time-independent transition probabilities. Under what conditions are such approximations valid? To answer this question, we apply the Perron-Frobenius theorem26–29 to the transfer matrix G of Sec. III—a square matrix which by construction is of size N × N and has positive elements. According to the theorem, G satisfies the following properties:

  • (1) It has a positive real eigenvalue r, called the Perron-Frobenius eigenvalue, such that any other eigenvalue λ is strictly smaller than r in absolute value, |λ| < r.

  • (2) There is a left eigenvector y = (y_1, …, y_N) for r with positive components; that is, yG = ry and y_i > 0 for all i. Similarly, there is a right eigenvector z with positive components, such that Gz = rz and z_i > 0 for all i.

  • (3) The left and right eigenvectors with eigenvalue r are non-degenerate.

  • (4) $\lim_{T\to\infty} G^T/r^T = z\,y$, where we take the eigenvectors normalized such that $y\,z = 1$.

Now reconsider Eq. (13) where

$$p(a_m\to a_{m+1};t) = \frac{G(a_m,a_{m+1})\,[G^{T-t}v](a_{m+1})}{[G^{T-t+1}v](a_m)}. \tag{16}$$

Since the vector v has only non-negative elements, both G^T v/r^T and v†G^T/r^T have well-defined non-zero limits as T → ∞,

$$\lim_{T\to\infty}\frac{G^T v}{r^T} = z\,(y\,v); \qquad \lim_{T\to\infty}\frac{v^\dagger G^T}{r^T} = (v^\dagger z)\, y. \tag{17}$$

Therefore, taking the limit Tt → ∞ of Eq. (16) and using Eq. (17), we find

$$p(a\to b) = \frac{G(a,b)\, z(b)}{r\, z(a)}. \tag{18}$$

That is, the transition probability is time-independent in this limit. However, from Eq. (12), the m-point joint probabilities are still explicitly time-dependent when T − t is large

$$p(a_1,\ldots,a_m;t) = \frac{[v^\dagger G^{t-m+1}](a_1)\, G(a_1,a_2)G(a_2,a_3)\cdots G(a_{m-1},a_m)\, z(a_m)}{r^{t}\; v^\dagger z} \tag{19}$$

and, in particular, this is true for the one-point occupation probability

$$p(a;t) = \frac{[v^\dagger G^{t}](a)\, z(a)}{r^{t}\; v^\dagger z}. \tag{20}$$

Thus, maximizing the path entropy under the linear constraints of Eq. (6) on up to two-point probabilities, imposed for an infinite duration into the future (T − t → ∞), we obtain a time-homogeneous Markov process which is described by (1) time-independent transition probabilities and (2) time-dependent one-point occupation probabilities. From Eqs. (18) and (20), we now obtain the evolution equation for the time-homogeneous Markov process

$$p(a;t+1) = \sum_b p(b;t)\, p(b\to a), \tag{21}$$

which is the celebrated master equation.
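A numerical sketch of this limit (illustrative Python; the positive transfer matrix G and the vector v are random stand-ins, not from the paper): the transition matrix of Eq. (18), built from the Perron eigenvalue and right eigenvector, is a proper stochastic matrix, and the occupation probability of Eq. (20) indeed propagates by the master equation Eq. (21):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 4
G = np.exp(rng.normal(size=(N, N)))     # positive transfer matrix
v = np.exp(rng.normal(size=N))

# Perron-Frobenius eigenvalue r and right eigenvector z (> 0) of G
vals, vecs = np.linalg.eig(G)
k = np.argmax(vals.real)
r = vals.real[k]
z = np.abs(vecs[:, k].real)

P = G * z[None, :] / (r * z[:, None])   # Eq. (18): p(a -> b)
print(np.allclose(P.sum(axis=1), 1.0))  # rows sum to 1: True

def occupation(t):
    """Occupation probability of Eq. (20), normalized numerically."""
    p = (v @ np.linalg.matrix_power(G, t)) * z
    return p / p.sum()

# One step of the master equation, Eq. (21), reproduces Eq. (20):
p_t = occupation(3)
print(np.allclose(occupation(4), p_t @ P))  # True
```

Row normalization follows directly from Gz = rz, so Eq. (18) defines a bona fide transition matrix for any positive G.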

Note the asymmetry in time: the transition probabilities as well as the joint probabilities are time dependent when the limit of t → ∞ is taken but Tt is kept finite. This is simply due to the fact that the transition probability p(ba) is defined in a time-asymmetric manner.

The last limit to consider is the stationary case, when both T − t and t are large. Then the m-point joint probability of Eq. (12) reduces to

$$p(a_1,\ldots,a_m) = \frac{y(a_1)\, G(a_1,a_2)G(a_2,a_3)\cdots G(a_{m-1},a_m)\, z(a_m)}{r^{m-1}\; y\,z}, \tag{22}$$

which is independent of time, as are the state occupation probability and any conditional probability derived from Eq. (22). This is to be expected, since we have time translation invariance in the stationary limit. Stationarity also trivially follows when the constraints themselves are stationary,30 which is a much stronger condition than those in Eq. (6).

Equation (22) was also derived in the large T limit with m = T in Ref. 31, though the stationary Markov process was assumed from the outset therein. In contrast, in the current work m is finite and can be as small as 1, even in the large T limit, and stationarity is derived rather than being an a priori assumption. Likewise, the first order Markov process was derived in Ref. 25 from path entropy maximization for the special case of pair statistics constraints, but neither the conditions for a time-homogeneous process nor stationarity were discussed.32
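The stationary limit can be checked the same way (again an illustrative Python sketch with a random positive G, not from the paper): the m = 1 case of Eq. (22), p(a) = y(a)z(a)/(y z), built from the left and right Perron vectors, is invariant under the transition matrix of Eq. (18):

```python
import numpy as np

rng = np.random.default_rng(3)
N = 4
G = np.exp(rng.normal(size=(N, N)))     # positive transfer matrix

# Right Perron vector z (G z = r z) and left Perron vector y (y G = r y)
vals, vecs = np.linalg.eig(G)
k = np.argmax(vals.real)
r = vals.real[k]
z = np.abs(vecs[:, k].real)
valsL, vecsL = np.linalg.eig(G.T)
kL = np.argmax(valsL.real)
y = np.abs(vecsL[:, kL].real)

p_stat = y * z / (y @ z)                # Eq. (22) with m = 1
P = G * z[None, :] / (r * z[:, None])   # Eq. (18)
print(np.allclose(p_stat @ P, p_stat))  # stationary under Eq. (21): True
```

Invariance follows from yG = ry: summing p_stat(a) P(a, b) over a reproduces y(b)z(b)/(y z) exactly.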

V. TIME-HOMOGENEOUS MARKOV PROCESSES WITH AN ARBITRARY INITIAL CONDITION

We have discussed how data can come in the form of state occupation probabilities (e.g., how long during the course of a single molecule fluorescence experiment did a protein dwell in a low fluorescent state) or transition probabilities. However, data may also be available in the form of conditions at different points in time (e.g., the sample is pumped into a photoexcited state at time t = 0). Are our conclusions on time-homogeneity from Sec. IV robust to initial, final, or other such conditions? In this section, we briefly show how the time-homogeneity of the transition probability depends on such conditions.

Consider an arbitrary condition imposed at time τ

$$p(a;t=\tau) = \pi(a). \tag{23}$$

We then add the term $\sum_a \lambda(a)\,[p(a;\tau) - \pi(a)]$, with Lagrange multipliers λ(a) (a = 1, …, N), to the constrained entropy, Eq. (8). As before, setting the variation with respect to p(i_0, i_1, …, i_T) to zero yields

$$\begin{aligned}
p(i_0,i_1,\ldots,i_T) &= \exp\Big(\rho + \lambda(i_\tau) - \sum_\alpha\beta_\alpha\sum_{t=0}^{T}\varepsilon_{i_t}^{(\alpha)} + \sum_\gamma\nu_\gamma\sum_{t=0}^{T-1}J_{i_t i_{t+1}}^{(\gamma)}\Big)\\
&= \exp\big(\rho+\lambda(i_\tau)\big)\, v(i_0)\,G(i_0,i_1)G(i_1,i_2)\cdots G(i_{T-1},i_T)\, v(i_T)\\
&= \frac{v(i_0)\,\pi(i_\tau)\,G(i_0,i_1)G(i_1,i_2)\cdots G(i_{T-1},i_T)\, v(i_T)}{\sum_{j_0,\ldots,j_T} v(j_0)\,\pi(j_\tau)\,G(j_0,j_1)G(j_1,j_2)\cdots G(j_{T-1},j_T)\, v(j_T)},
\end{aligned} \tag{24}$$

where in the last line we used the normalization condition Eq. (7) to eliminate ρ and the initialization constraint Eq. (23) to eliminate λ. We now have

$$\begin{aligned}
\tau \le t-m+1:\quad & p(a_1,\ldots,a_m;t) = \frac{\sum_a [v^\dagger G^\tau](a)\,\pi(a)\,[G^{t-\tau-m+1}](a,a_1)\, G(a_1,a_2)\cdots G(a_{m-1},a_m)\,[G^{T-t}v](a_m)}{\sum_b [v^\dagger G^\tau](b)\,\pi(b)\,[G^{T-\tau}v](b)},\\[4pt]
t-m+1 < \tau \le t:\quad & p(a_1,\ldots,a_m;t) = \frac{[v^\dagger G^{t-m+1}](a_1)\, G(a_1,a_2)\cdots G(a_{\tau-t+m-1},a_{\tau-t+m})\,\pi(a_{\tau-t+m})}{\sum_b [v^\dagger G^\tau](b)\,\pi(b)\,[G^{T-\tau}v](b)}\\
&\qquad\qquad\qquad\times\, G(a_{\tau-t+m},a_{\tau-t+m+1})\cdots G(a_{m-1},a_m)\,[G^{T-t}v](a_m),\\[4pt]
t < \tau:\quad & p(a_1,\ldots,a_m;t) = \frac{[v^\dagger G^{t-m+1}](a_1)\, G(a_1,a_2)\cdots G(a_{m-1},a_m)}{\sum_b [v^\dagger G^\tau](b)\,\pi(b)\,[G^{T-\tau}v](b)}\\
&\qquad\qquad\qquad\times\,\sum_a [G^{\tau-t}](a_m,a)\,\pi(a)\,[G^{T-\tau}v](a).
\end{aligned} \tag{25}$$

Using the definition of the transition probability from Eq. (4) we find

$$\begin{aligned}
\tau < t:\quad & p(a_1,\ldots,a_m\to a_{m+1};t) = \frac{G(a_m,a_{m+1})\,[G^{T-t}v](a_{m+1})}{[G^{T-t+1}v](a_m)},\\[4pt]
\tau \ge t:\quad & p(a_1,\ldots,a_m\to a_{m+1};t) = \frac{G(a_m,a_{m+1})\,\sum_a [G^{\tau-t}](a_{m+1},a)\,\pi(a)\,[G^{T-\tau}v](a)}{\sum_b [G^{\tau-t+1}](a_m,b)\,\pi(b)\,[G^{T-\tau}v](b)}.
\end{aligned} \tag{26}$$

We notice that the indices a1, …, am − 1 have dropped out from the right-hand side of Eq. (26). We can therefore write

$$p(a_1,\ldots,a_m\to a_{m+1};t) = p(a_m\to a_{m+1};t), \tag{27}$$

showing that, once more, we have a first order Markov process. Furthermore, the transition probability for t > τ has exactly the same form as Eq. (13), independent of the condition π. It is therefore time-homogeneous in the limit of large T − t. The same is not true for t ⩽ τ, where the transition probability always depends on the specified condition, and time-homogeneity requires both T − τ and τ − t to be large. As noted earlier, this time-asymmetry is a natural consequence of the fact that the definition of the transition probability itself is time-asymmetric.
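This asymmetry can be verified by brute force for a small system. The Python sketch below is illustrative, not from the paper: G, v, τ, and the two conditions π are arbitrary choices. It enumerates all paths weighted as in Eq. (24) and checks that the transition probability of Eq. (26) is independent of π for t > τ but not for t ⩽ τ:

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(4)
N, T, tau = 2, 6, 2
G = np.exp(rng.normal(size=(N, N)))   # positive transfer matrix
v = np.exp(rng.normal(size=N))

def transition(pi, t):
    """p(b -> a; t) for all (b, a), from the exact path weights Eq. (24).
    The overall normalization cancels in the ratio, so it is omitted."""
    num = np.zeros((N, N))            # joint p(b, a; t): b at t-1, a at t
    den = np.zeros(N)                 # marginal p(b; t-1)
    for path in product(range(N), repeat=T + 1):
        w = v[path[0]] * pi[path[tau]] * v[path[-1]]
        for s, u in zip(path[:-1], path[1:]):
            w *= G[s, u]
        num[path[t - 1], path[t]] += w
        den[path[t - 1]] += w
    return num / den[:, None]

pi1 = np.array([0.9, 0.1])
pi2 = np.array([0.2, 0.8])
# t > tau: the condition drops out of the transition probability.
print(np.allclose(transition(pi1, tau + 2), transition(pi2, tau + 2)))  # True
# t = tau: the transition probability still depends on the condition.
print(np.allclose(transition(pi1, tau), transition(pi2, tau)))          # False
```

The enumeration involves only 2^(T+1) = 128 paths here, so the check is exact up to floating-point roundoff.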

VI. GENERAL DERIVATION OF nth-ORDER MARKOV PROCESS FROM PATH ENTROPY MAXIMIZATION

In this section, we generalize the arguments of Sec. III in two important ways: (1) we consider constraints on the data involving up to (n + 1)-point probabilities

$$F^{(\alpha)}\big(\{p(i;t)\},\{p(i\to j;t)\},\ldots,\{p(i_0,\ldots,i_{n-1}\to i_n;t)\}\big) = 0, \tag{28}$$

and (2) we do not assume that the constraints F(α) are linear functions of their arguments (as was the case for Eq. (6)).

Provided the constraints are linear—as was the case in Eq. (6)—the arguments of Sec. III generalize to nth-order Markov processes. Indeed, the path probability would be described by products of rank-(n + 1) tensors rather than matrices, in analogy to Eq. (12). The nth-order Markov process would follow immediately, though deriving the time-homogeneity of the various transition probabilities, as in Secs. IV and V, would require the difficult task of applying an analogue of the Perron-Frobenius theorem for general tensors.

Since we want to derive the nth-order Markov process for fully general constraints, as given by Eq. (28), we take a different route. We first express the path probability p(i_0, i_1, …, i_T) in terms of conditional probabilities:

$$p(i_0,i_1,\ldots,i_T) = p(i_0;0)\, p(i_0\to i_1;1)\, p(i_0,i_1\to i_2;2)\cdots p(i_0,i_1,\ldots,i_{T-1}\to i_T;T). \tag{29}$$

Substituting this expression into Eq. (5), we get

$$\begin{aligned}
H &= -\sum_{\{i_0,i_1,\ldots,i_T\}} p(i_0,i_1,\ldots,i_T)\Big[\log p(i_0;0) + \sum_{t=0}^{T-1}\log p(i_0,\ldots,i_t\to i_{t+1};t+1)\Big]\\
&= -\sum_{i} p(i;0)\log p(i;0) - \sum_{t=0}^{T-1}\;\sum_{\{i_0,i_1,\ldots,i_{t+1}\}} p(i_0,i_1,\ldots,i_{t+1};t+1)\,\log p(i_0,\ldots,i_t\to i_{t+1};t+1),
\end{aligned} \tag{30}$$

where, in going from the first to the second line, we invoked the relation between joint and marginal probabilities, $p(i_0,\ldots,i_m;m) = \sum_{i_{m+1},\ldots,i_T} p(i_0,i_1,\ldots,i_T)$.

Now reconsider the constraints given by Eq. (28), imposed on the probabilities from p(i; t) up to p(i_0, …, i_{n−1} → i_n; t). We will maximize the entropy, Eq. (30), in two steps:

  • (1) We maximize the entropy with respect to {p(i_0, …, i_k; t)} with k > n, for given values of {p(i_0, …, i_k; t)} with k ⩽ n.

  • (2) We then vary the entropy over the remaining variables, {p(i_0, …, i_k; t)} with k ⩽ n.

By assumption, constraints on the data only matter in step 2. Furthermore, as we now show, step 1 (the unconstrained maximization) is sufficient to show that the general path probability reduces to that of an nth-order Markov process.

In order to perform step 1, we first invoke the inequality

$$-\sum_i q_i \log q_i \le -\sum_i q_i \log p_i \tag{31}$$

for arbitrary probability distributions p_i and q_i.33 It follows from Eq. (31) that

$$-\sum_j p(i_0,\ldots,i_{m-1}\to j;t)\,\log p(i_0,\ldots,i_{m-1}\to j;t) \le -\sum_j p(i_0,\ldots,i_{m-1}\to j;t)\,\log p(i_{m-n},\ldots,i_{m-1}\to j;t). \tag{32}$$

Multiplying both sides of Eq. (32) by p(i_0, …, i_{m−1}; t − 1) and summing over i_0, …, i_{m−1}, we find

$$-\sum_{i_0,\ldots,i_{m-1},\,j} p(i_0,\ldots,i_{m-1},j;t)\,\log p(i_0,\ldots,i_{m-1}\to j;t) \le -\sum_{i_0,\ldots,i_{m-1},\,j} p(i_0,\ldots,i_{m-1},j;t)\,\log p(i_{m-n},\ldots,i_{m-1}\to j;t), \tag{33}$$

where Eq. (4) was used. The above sets a bound on the last term of the path entropy, Eq. (30). Therefore, for given values of {p(i_0, …, i_k; t)} with k ⩽ n, we see that H is maximized for

$$p(i_0,\ldots,i_{m-1}\to j;t) = p(i_{m-n},\ldots,i_{m-1}\to j;t) \quad (m > n), \tag{34}$$

the system now being described by an nth-order Markov model, where the transition probability at time t depends only on the previous n steps of history.
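Step 1 rests entirely on the inequality Eq. (31) (Gibbs' inequality). A quick numerical check (illustrative Python with random distributions, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(5)
q = rng.random(6); q /= q.sum()   # arbitrary distribution q
p = rng.random(6); p /= p.sum()   # arbitrary distribution p

H_q = -np.sum(q * np.log(q))      # entropy of q
cross = -np.sum(q * np.log(p))    # cross entropy of q relative to p

# Eq. (31): the entropy never exceeds the cross entropy.
print(bool(H_q <= cross))         # True

# Their difference is the relative entropy D(q||p), non-negative,
# and zero exactly when p = q.
kl = cross - H_q
print(bool(kl >= 0))              # True
```

Equality holds only for p = q, which is what forces the maximizing conditional probabilities in Eq. (34) to coincide with their n-step truncations.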

The transition probability Eq. (34) for the nth-order Markov process can now be substituted into the path entropy formula, Eq. (30). Step 2 can then be carried out: the resulting path entropy is maximized with respect to the remaining variables p(i_0; t), p(i_0 → i_1; t), …, p(i_0, i_1, …, i_{n−1} → i_n; t) under the constraints, Eq. (28).

In summary, we have shown that nth-order Markov processes follow from path entropy maximization under the very general constraints of Eq. (28). Markov models thus emerge from the entropy maximization method, and these provide immediate and principled generalizations of the ubiquitous master equation.

VII. DISCUSSION

Markov processes and master equations—the evolution equation describing a first order time-homogeneous Markov process—are standard stochastic modeling tools invoked across disciplines. Such models are usually justified mechanistically by coarse-graining arguments or by assuming quick randomization in space of reactants and products (the “well-stirred” approximation). Yet it is challenging to ascertain a priori whether any of these conditions actually hold. Just as maximum entropy has provided an alternative to ergodic theory in justifying the equilibrium probability distribution,17 we believe that the path entropy techniques of Filyukov and Karpov,14 and later Jaynes,15 provide a compelling axiomatic basis for the Markov process and the master equation. Here the Markov process emerges as a solution to the following inverse problem: given measurable n-point constraints on a trajectory, what is the least biased model for a probability distribution of paths? By least biased, we mean one that, for instance, does not impose correlations in a model when such correlations are not otherwise warranted by the data (technically, these are the logical consistency axioms of Shore and Johnson). The unique solution to this problem is the one that maximizes the entropy subject to constraints from the data.

With this formalism, we can justify generalizations of the master equation on rigorous mathematical grounds. It is tempting to conjecture that the nth-order Markov process also leads to a time-homogeneous process so long as the constraints are imposed for a time much longer than a single time step. The proof would require an analogue of the Perron-Frobenius theorem for general tensors, an interesting subject for further investigation.

ACKNOWLEDGMENTS

We thank Ken Dill, Kingshuk Ghosh, and Hao Ge for useful discussions. S.P. acknowledges an FQRNT fellowship and Ken Dill's support by way of NSF Grant No. R01GM090205-03.

REFERENCES

  • 1. van Kampen N. G., Stochastic Processes in Chemistry and Physics (North-Holland, Amsterdam, 1981).
  • 2. Chung K. L., Lectures from Markov Processes to Brownian Motion (Springer-Verlag, New York, 1982).
  • 3. Gopich I. and Szabo A., J. Chem. Phys. 118, 454 (2003). doi:10.1063/1.1523896
  • 4. Cao J. and Silbey R. J., J. Phys. Chem. B 112, 12867 (2008). doi:10.1021/jp803347m
  • 5. Berezhkovskii A. M., Szabo A., and Weiss G. H., J. Chem. Phys. 110, 9145 (1999). doi:10.1063/1.478836
  • 6. Zhang X.-J., Qian H., and Qian M., Phys. Rep. 510, 1 (2012). doi:10.1016/j.physrep.2011.09.002
  • 7. Ge H., Qian H., and Qian M., Phys. Rep. 510, 87 (2012). doi:10.1016/j.physrep.2011.09.001
  • 8. Brown F. L. H., Acc. Chem. Res. 39, 363 (2006). doi:10.1021/ar050028l
  • 9. Feng H. D. and Wang J., Chem. Phys. Lett. 501, 562 (2011). doi:10.1016/j.cplett.2010.11.017
  • 10. de Groot S. R. and Mazur P., Non-Equilibrium Thermodynamics (Dover, New York, 1983).
  • 11. Prinz J.-H., Chodera J. D., Pande V. S., Swope W. C., Smith J. C., and Noé F., J. Chem. Phys. 134, 244108 (2011). doi:10.1063/1.3592153
  • 12. Pande V. S., Beauchamp K., and Bowman G. R., Methods 52, 99 (2010). doi:10.1016/j.ymeth.2010.06.002
  • 13. Kasson P. and Pande V. S., Pac. Symp. Biocomput. 15, 260 (2010).
  • 14. Filyukov A. A. and Karpov V. Y., J. Eng. Phys. Thermophys. 13, 326 (1967), doi:10.1007/BF00832348; Filyukov A. A. and Karpov V. Y., J. Eng. Phys. Thermophys. 13, 416 (1967), doi:10.1007/BF00828961; Filyukov A. A., J. Eng. Phys. Thermophys. 14, 429 (1968), doi:10.1007/BF00828058.
  • 15. Jaynes E. T., “Macroscopic prediction,” in Complex Systems Operational Approaches in Neurobiology, Physics, and Computers, edited by Haken H. (Springer-Verlag, Berlin, 1985).
  • 16. Shore J. E. and Johnson R. W., IEEE Trans. Inf. Theory 26, 26 (1980). doi:10.1109/TIT.1980.1056144
  • 17. Jaynes E. T., Phys. Rev. 106, 620 (1957), doi:10.1103/PhysRev.106.620; Jaynes E. T., Phys. Rev. 108, 171 (1957), doi:10.1103/PhysRev.108.171.
  • 18. Gull S. F. and Daniell G. J., Nature (London) 272, 686 (1978). doi:10.1038/272686a0
  • 19. Steinbach P. J., Chu K., Frauenfelder H., Johnson J. B., Lamb D. C., Nienhaus G. U., Sauke T. B., and Young R. D., Biophys. J. 61, 235 (1992). doi:10.1016/S0006-3495(92)81830-1
  • 20. Jaynes E. T., Probability Theory: The Logic of Science (Cambridge University Press, London, 2003).
  • 21. Stock G., Ghosh K., and Dill K. A., J. Chem. Phys. 128, 194102 (2008). doi:10.1063/1.2918345
  • 22. Ghosh K., Dill K. A., Inamdar M. M., Seitaridou E., and Phillips R., Am. J. Phys. 74, 123 (2006). doi:10.1119/1.2142789
  • 23. Zwanzig R., Nonequilibrium Statistical Mechanics (Oxford University Press, New York, 2001).
  • 24. Gillespie D. T., J. Phys. Chem. 81, 2340 (1977). doi:10.1021/j100540a008
  • 25. Ge H., Pressé S., Ghosh K., and Dill K. A., J. Chem. Phys. 136, 064108 (2012). doi:10.1063/1.3681941
  • 26. Berman A. and Plemmons R. J., Nonnegative Matrices in the Mathematical Sciences (SIAM, 1994).
  • 27. Meyn S. P. and Tweedie R. L., Markov Chains and Stochastic Stability (Springer-Verlag, London, 1993).
  • 28. Seneta E., Non-Negative Matrices and Markov Chains (Springer, 1981).
  • 29. Isaacson D. L. and Madsen I., Markov Chains: Theory and Applications (Wiley, 1976).
  • 30. The stationary process is also obtained when the constraints are imposed at each point in time:
    $$F_0^{(\alpha)}(t) = \sum_{i_t}\varepsilon_{i_t}^{(\alpha)}\,p(i_t;t) - E_0^{(\alpha)} = 0 \quad (\alpha = 1,\ldots,N_1;\ t = 0,\ldots,T),$$
    $$F_1^{(\gamma)}(t) = \sum_{i_t,\,i_{t+1}} J_{i_t i_{t+1}}^{(\gamma)}\,p(i_t,i_{t+1};t+1) - J_0^{(\gamma)} = 0 \quad (\gamma = 1,\ldots,N_2;\ t = 0,\ldots,T-1).$$
    Our result shows that the weaker constraint Eq. (6) can achieve this so long as 0 ≪ t, T − t.
  • 31. Monthus C. J., J. Stat. Mech.: Theory Exp. 2011, P03008. doi:10.1088/1742-5468/2011/03/P03008
  • 32. Adapted to our notation, it is stated underneath Eq. (11) of Ref. 25 that p(a, b) ∝ G(a, b), implying that p(a, b) is time-independent. However, since $p(a,b) = [v^\dagger G^{t-1}](a)\, G(a,b)\, [G^{T-t}v](b)/(v^\dagger G^T v)$ from Eq. (12), this is strictly correct only when T − t and t are both large.
  • 33. Using the well-known inequality log x ⩽ −1 + x for x > 0, we see that
    $$-\sum_i q_i\log q_i + \sum_i q_i\log p_i = \sum_i q_i\log\frac{p_i}{q_i} \le \sum_i q_i\Big(-1+\frac{p_i}{q_i}\Big) = -\sum_i q_i + \sum_i p_i = 0,$$
    proving the inequality, Eq. (31). This inequality was also invoked in Ref. 14 in a much narrower setting (deriving a 0th order Markov model).
