The Journal of Chemical Physics. 2012 Feb 13; 136(6): 064108. doi: 10.1063/1.3681941

Markov processes follow from the principle of maximum caliber

Hao Ge 1,2,3,a), Steve Pressé 4, Kingshuk Ghosh 5, Ken A Dill 6
PMCID: PMC3292588  PMID: 22360170

Abstract

Markov models are widely used to describe stochastic dynamics. Here, we show that Markov models follow directly from the dynamical principle of maximum caliber (Max Cal). Max Cal is a method of deriving dynamical models based on maximizing the path entropy subject to dynamical constraints. We give three different cases. First, we show that if constraints (or data) are given in the form of singlet statistics (average occupation probabilities), then maximizing the caliber predicts a time-independent process that is modeled by independent, identically distributed random variables. Second, we show that if constraints are given in the form of sequential pairwise statistics, then maximizing the caliber dictates that the kinetic process will be Markovian with a uniform initial distribution. Third, if the initial distribution is known and is not uniform, we show that the only process that maximizes the path entropy is still the Markov process. We give an example of how Max Cal can be used to discriminate between different dynamical models given data.

INTRODUCTION

Dynamical processes are commonly modeled as Markov processes.1, 2 When making a dynamical model, what is the underlying justification for asserting a Markov model? We show here that Markov modeling can be justified on the grounds of the principle of maximum caliber (Max Cal), the dynamical analog of maximum entropy.3 Max Cal provides a general foundation for stochastic dynamics in the same way that the principle of maximizing the entropy provides a foundation for equilibrium statistical physics.4, 5, 6, 7, 8

In Max Cal, one begins with knowledge of all the possible microscopic trajectories that a system could take in time. A small number of dynamical quantities are measured (far fewer than the total number of possible microscopic trajectories). These quantities can include known average macroscopic fluxes. The path entropy is then maximized subject to the dynamical constraints on these average quantities (called maximizing the caliber) to build a weighted ensemble of these microscopic trajectories consistent with the constrained averages. The ensemble is then used to infer other dynamical averages and fluctuations.9, 10, 11, 12, 13, 14, 15, 16 The first application of this basic idea is probably due to Filyukov and Karpov who, in 1967,17 assumed dynamical microscopic trajectories that could be treated as Markov chains. They then maximized the caliber to parameterize the rates of their model, which was assumed to be Markovian from the outset.

Here, we pose Filyukov and Karpov's problem in reverse and ask: given dynamical constraints, what models will maximize the caliber? The answer to this question provides a systematic recipe for building dynamical models. A common approach to dynamical modeling is to assert a particular kinetic model having a given set of rates as parameters. Then, the parameters of the model are fit to the experimental data. However, this type of approach has limitations: the models are often not unique, they are sometimes overparameterized, and they are not guaranteed to provide a principled description of the data beyond the known average quantities that are used as input. We are interested in answering the following question: Without invoking a hypothetical model in advance, how much can the data alone tell us about how to model the dynamics of the system underlying it?

THE MAXIMUM-CALIBER APPROACH

Suppose we have a stochastic process with a discrete state space Γ = {1, 2, ⋅ ⋅ ⋅, N}.

Consider its trajectories of length T and denote the probability of the trajectory {i_0, i_1, ⋯, i_T} by p_{i_0 i_1 ⋯ i_T}, where i_n is the state visited at time n. The path entropy, S(T), is

S(T) = −∑_{i_0, i_1, …, i_T} p_{i_0 i_1 ⋯ i_T} log p_{i_0 i_1 ⋯ i_T}. (1)

Each variable i_n in the sum runs from 1 to N. Equation 1 is the measure which, when maximized subject to constraints, yields the least biased probability for the set of microscopic trajectories, {p_{i_0 i_1 ⋯ i_T}}, consistent with the observed constraints.18, 19 For a trajectory of length T = 1, p_{12} is the probability of being in state 1 at time 0 and state 2 at time 1. The sum in Eq. 1 is then over all possible combinations of states at times 0 and 1.
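For a small state space, Eq. 1 can be evaluated by brute-force enumeration of all N^(T+1) trajectories. Below is a minimal Python sketch of our own (not from the paper); the unconstrained uniform case, where S(T) = (T + 1) log N, serves as a sanity check:

```python
import itertools
import math

def path_entropy(p, N, T):
    """S(T) = -sum over all (T+1)-state paths of p(path) * log p(path), Eq. 1."""
    S = 0.0
    for path in itertools.product(range(N), repeat=T + 1):
        prob = p(path)
        if prob > 0.0:
            S -= prob * math.log(prob)
    return S

# With no constraints at all, the caliber is maximized by the uniform
# distribution over all N**(T+1) paths, for which S(T) = (T+1) * log(N).
N, T = 3, 2
S = path_entropy(lambda path: 1.0 / N ** (T + 1), N, T)
assert abs(S - (T + 1) * math.log(N)) < 1e-12
```

Enumeration is exponential in T, so this is only a pedagogical check, not a practical computational scheme.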

Furthermore, if p_{i_0 i_1 ⋯ i_T} is a path probability, then it must satisfy certain logical consistency requirements, as established by Kolmogorov.20

Specifically, for any T equal to 0 or to any positive integer, the probability distribution for a microscopic trajectory must satisfy

∑_{i_{T+1}} p_{i_0 ⋯ i_T i_{T+1}} = p_{i_0 ⋯ i_T}, (2)

i.e., the distribution of the process during the time interval {0, 1, 2, ⋯, T} is the distribution of the process during the time interval {0, 1, 2, ⋯, T + 1} marginalized over the state at time T + 1.

Constraint on the mean of the singlet distribution

First, we suppose we only know the mean number of time intervals during which the system dwells in each state m. Let A_m denote this quantity for a trajectory of length T:

A_m(T) = ∑_{i_0, i_1, …, i_T} p_{i_0 i_1 ⋯ i_T} ∑_{k=0}^{T} δ_{i_k, m}, (3)

where δ_{i, j} = 1 when i = j and 0 otherwise. Thus, in Eq. 3, ∑_{k=0}^{T} δ_{i_k, m} counts the number of times state m is visited. By averaging this quantity over all microscopic trajectories, we find the mean number of times this state is visited from time 0 to T. We, therefore, have that ∑_m A_m = T + 1. When data are given in terms of quantities like A_m(T), we refer to these as “singlet statistics.”

Now, we maximize the path entropy in Eq. 1 subject to constraints on our singlet statistics, imposed using Lagrange multipliers. That is, we maximize the caliber, S(T) − ∑_m λ_m A_m(T), with respect to each p_{i_0 i_1 ⋯ i_T}, where λ_m is the Lagrange multiplier for state m. We therefore solve

∂[S(T) − ∑_m λ_m A_m(T)]/∂p_{i_0 i_1 ⋯ i_T} = −1 − log p_{i_0 i_1 ⋯ i_T} − ∑_m λ_m ∑_{k=0}^{T} δ_{i_k, m} = 0,

which yields

p_{i_0 i_1 ⋯ i_T} ∝ e^{−∑_m λ_m ∑_{k=0}^{T} δ_{i_k, m}} = ∏_{k=0}^{T} e^{−λ_{i_k}}. (4)

Note that these λ's are all implicitly dependent on the length of the microscopic trajectory in time, T.

Once normalized, the probability becomes

p_{i_0 i_1 ⋯ i_T} = ∏_{k=0}^{T} e^{−λ_{i_k}} / ∑_{i_0 i_1 ⋯ i_T} ∏_{k=0}^{T} e^{−λ_{i_k}} = ∏_{k=0}^{T} e^{−λ_{i_k}} / (∑_{i=1}^{N} e^{−λ_i})^{T+1} = ∏_{k=0}^{T} p_{i_k}(T), (5)

where p_{i_k}(T) = e^{−λ_{i_k}} / ∑_{i=1}^{N} e^{−λ_i} is the probability of being in state i_k at time k for a trajectory of length T. The trajectory probability p_{i_0 i_1 ⋯ i_T} thus breaks up into a product of independent probabilities, one for each time interval.

Therefore, we have

A_m = ∑_{i_0, i_1, …, i_T} p_{i_0 i_1 ⋯ i_T} ∑_{k=0}^{T} δ_{i_k, m} = (T + 1) p_m(T), (6)

where p_m(T) = ∑_i p_i(T) δ_{i, m} is interpreted as the probability of being found in state m over the course of time T.
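Equations 5, 6 are easy to verify numerically for a small system by enumerating all trajectories. In the sketch below (our own check, not from the paper; the λ values are arbitrary illustrative numbers), the path probability is built from Eq. 5 and the constrained average A_m of Eq. 3 is compared against (T + 1) p_m:

```python
import itertools
import math

N, T = 3, 4
lams = [0.2, -0.5, 1.0]                      # illustrative Lagrange multipliers
Z = sum(math.exp(-lam) for lam in lams)      # single-time normalizer in Eq. 5
p_state = [math.exp(-lam) / Z for lam in lams]

def p_path(path):
    """Eq. 5: the path probability factorizes into singlet probabilities."""
    prob = 1.0
    for i in path:
        prob *= p_state[i]
    return prob

# Eq. 6: the mean occupation count of state m equals (T + 1) * p_m.
for m in range(N):
    A_m = sum(p_path(path) * sum(1 for i in path if i == m)
              for path in itertools.product(range(N), repeat=T + 1))
    assert abs(A_m - (T + 1) * p_state[m]) < 1e-12
```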

As a final note on this example, we consider the result of imposing the consistency condition, Eq. 2, to constrain the distribution obtained above. From this we obtain for the time interval [0, T + 1],

∑_{i_{T+1}} p_{i_0 ⋯ i_T i_{T+1}} = ∏_{k=0}^{T} p_{i_k}(T+1) ∑_{i_{T+1}} p_{i_{T+1}}(T+1) = ∏_{k=0}^{T} p_{i_k}(T+1). (7)

Since ∑_{i_{T+1}} p_{i_0 ⋯ i_T i_{T+1}} = p_{i_0 i_1 ⋯ i_T} = ∏_{k=0}^{T} p_{i_k}(T) by the Max Cal result for the time interval [0, T], it then follows that

∏_{k=0}^{T} p_{i_k}(T+1) = ∏_{k=0}^{T} p_{i_k}(T), (8)

which gives

p_{i_k}(T+1) = p_{i_k}(T) (9)

for any i_k.

The proof of Eq. 9 from Eq. 8 follows from summing both sides of Eq. 8 over all indices except i_k. Since the state i_k is arbitrary, we recover Eq. 9.

So, under singlet constraints, maximizing the caliber leads to a model which is an independent, identically distributed (i.i.d.) process. Equation 5 is the statement of independence and Eq. 9 is the statement of identical distributions.

Constraint on pairwise statistics

Now we consider instead the situation in which the constraint is on the pairwise statistics for each step m → n over the time period [0, T], i.e.,

A_{mn} = ∑_{i_0, …, i_T} p_{i_0 ⋯ i_T} ∑_{k=0}^{T−1} δ_{i_k, m} δ_{i_{k+1}, n}. (10)

Here, ∑_{k=0}^{T−1} δ_{i_k, m} δ_{i_{k+1}, n} is just the number of occurrences of the transition m → n, and ∑_{m, n} A_{mn} = T.

Now we maximize the path entropy in Eq. 1 subject to constraints on the pairwise statistics given by Eq. 10. We use the quantities λ_{mn} as Lagrange multipliers to constrain the A_{mn}. Following a derivation similar to that of Eq. 5, we now get

p_{i_0 ⋯ i_T} = ∏_{k=0}^{T−1} p_{i_k i_{k+1}} ∝ e^{−∑_{m,n} λ_{mn} ∑_{k=0}^{T−1} δ_{i_k, m} δ_{i_{k+1}, n}}, (11)

where p_{i_k i_{k+1}} ∝ e^{−λ_{i_k i_{k+1}}}. Following reasoning similar to Eqs. 7, 8, we find

p_{i_k i_{k+1}}(T+1) = p_{i_k i_{k+1}}(T). (12)

Equations 11, 12 are the main results of this subsection. In particular, Eq. 11 shows that when transitions are used as constraints, the procedure of Max Cal yields a Markov process as the model, where the probability of a microtrajectory is given by the product of the independent transition probabilities. Furthermore, Eq. 12 says that these transition probabilities are independent of the total trajectory length.
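The Markov form of Eq. 11 can be checked directly: path probabilities built as a product of one-step transition factors are normalized and satisfy the Kolmogorov consistency condition, Eq. 2. The transition matrix in this sketch is an arbitrary illustrative choice of our own:

```python
import itertools

# Illustrative row-stochastic transition probabilities p_{ij}.
P = [[0.7, 0.2, 0.1],
     [0.3, 0.4, 0.3],
     [0.5, 0.25, 0.25]]
N, T = 3, 3
p0 = [1.0 / N] * N                  # uniform initial distribution

def p_path(path):
    """Markov path probability: initial weight times one-step factors."""
    prob = p0[path[0]]
    for a, b in zip(path, path[1:]):
        prob *= P[a][b]
    return prob

# Normalization over all (T+1)-state paths.
total = sum(p_path(path) for path in itertools.product(range(N), repeat=T + 1))
assert abs(total - 1.0) < 1e-12

# Kolmogorov consistency, Eq. 2: marginalizing over the last state
# recovers the probability of the shorter path.
for stub in itertools.product(range(N), repeat=T):
    marg = sum(p_path(stub + (j,)) for j in range(N))
    assert abs(marg - p_path(stub)) < 1e-12
```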

What if the initial distribution is not uniformly distributed?

In the two preceding subsections, we did not use any information about the initial condition of the states; hence, we implicitly assumed uniform initial conditions. In many cases, however, the initial conditions are known and not uniform. How can we include this information within Max Cal? As an example, if a system is subjected to a temperature spike at time t = 0, prior to relaxation, its initial conditions will not be uniform. By this we mean that the molecules (e.g., proteins) are initially sampled from a particular (presumably unequilibrated) conformational state. T-jump experiments are useful, for instance, in studying the folding kinetics of proteins starting from a denatured state.21

Max Cal still applies in cases where the initial condition is not uniform. However, we must now condition the path entropy on the known initial distribution, p_{i_0}(0) ≡ p_{i_0}(t = 0). This is an additional piece of information that can be regarded as a constraint on the path probability. In general, we will denote the probability of occupying state i_n at time n as p_{i_n}(n).

In other words, the following expression:

p_{i_0 i_1 ⋯ i_T} = p_{i_0}(0) p_{i_1 ⋯ i_T | i_0} (13)

must be substituted into the definition of the path entropy, Eq. 1, which is then maximized subject to any further constraint available on p_{i_0}(0). In the above, p_{i_1 ⋯ i_T | i_0} denotes the probability of the path from time 1 to T conditioned on being in state i_0 at time 0.

Having pre-specified the initial conditions and now constraining pairwise statistics, we recover

p_{i_0 ⋯ i_T} = p_{i_0}(0) ∏_{k=0}^{T−1} p_{i_k i_{k+1}}. (14)

In this case, the state occupancies, the p_i’s, are not necessarily independent of time. A particular state occupancy p_{i_k}(k) can depend on how much time has elapsed since t = 0. To see this explicitly, we note that ∑_{i_0} p_{i_0}(0) p_{i_0 i_1} is now equal to p_{i_1}(1), which might differ from the steady-state value (denoted π_{i_1}). The steady-state value would instead be obtained from ∑_{i_0} π_{i_0} p_{i_0 i_1}. Upon specifying initial conditions, we have in general

∑_{i_n} p_{i_n}(n) p_{i_n i_{n+1}} = p_{i_{n+1}}(n + 1). (15)
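Equation 15 is an ordinary matrix-vector propagation of the occupation probabilities, so a non-uniform initial condition simply relaxes toward the stationary distribution of the transition matrix. A small sketch of our own (the two-state transition matrix is illustrative):

```python
# Eq. 15: p_{i_{n+1}}(n+1) = sum_{i_n} p_{i_n}(n) * p_{i_n i_{n+1}}.
P = [[0.9, 0.1],
     [0.4, 0.6]]          # illustrative transition probabilities
p = [1.0, 0.0]            # non-uniform initial condition: all weight on state 0

def step(p, P):
    """One application of Eq. 15."""
    return [sum(p[i] * P[i][j] for i in range(len(p)))
            for j in range(len(P[0]))]

for _ in range(200):      # repeated propagation relaxes toward steady state
    p = step(p, P)

# The stationary distribution of this chain is pi = (0.8, 0.2),
# since 0.1 * pi_0 = 0.4 * pi_1 at stationarity.
assert abs(p[0] - 0.8) < 1e-9 and abs(p[1] - 0.2) < 1e-9
```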

When initial conditions are not specified, i.e., no constraints are imposed on p_{i_0}(0), then plugging Eq. 14 into the expression for the path entropy, Eq. 1, we have

S(T) = −∑_{i_0, i_1, …, i_T} p_{i_0}(0) ∏_{k=0}^{T−1} p_{i_k i_{k+1}} log [p_{i_0}(0) ∏_{k=0}^{T−1} p_{i_k i_{k+1}}].

The p_{i_0}(0) which maximizes this expression (i.e., the one obtained by taking the derivative of the above with respect to p_{i_0}(0) and setting that derivative equal to zero) is the uniform distribution, i.e., the initial distribution with equal a priori weight assigned to each state. The same result is obtained in statistical mechanics by maximizing the entropy over states without constraints. The initial distribution obtained this way is, in general, not the stationary distribution.

In theory, when the stationary distribution is known a priori, i.e., when π_{i_0} is known, this value can be used as a prior on p_{i_0}(0). That is,

S(T) = −∑_{i_0, i_1, …, i_T} p_{i_0}(0) ∏_{k=0}^{T−1} p_{i_k i_{k+1}} × log [p_{i_0}(0) ∏_{k=0}^{T−1} p_{i_k i_{k+1}} / π_{i_0}]. (16)

In this case, the value of p_{i_0}(0) which maximizes the path entropy is found, upon normalization, to be π_{i_0}.

In practical terms, for a long trajectory, where each state is visited multiple times, the steady state is reliably obtained as follows. We define the dynamical partition function

Q_d(T) = ∑_{i_0, …, i_T} e^{−∑_{m,n} λ_{mn} ∑_{k=0}^{T−1} δ_{i_k, m} δ_{i_{k+1}, n}}. (17)

Taking derivatives of this dynamical partition function gives the mean number of m → n transitions, A_{mn} = −∂log Q_d(T)/∂λ_{mn} (resembling the way that derivatives of equilibrium partition functions yield equilibrium averages and higher cumulants). When T is large enough, we have ∑_m A_{mn}(T)/T → π_n, the stationary probability of state n.

Thus, even when the initial conditions are assumed to be uniform, rather than selected from the stationary distribution, the system still tends towards the correct stationary state.
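The large-T statement ∑_m A_{mn}(T)/T → π_n can be illustrated by counting transitions along one long simulated trajectory, estimating A_{mn} by its empirical count (a sketch of our own with illustrative transition probabilities; π_A = 0.8 for this chain):

```python
import random

random.seed(0)
P = {('A', 'A'): 0.9, ('A', 'B'): 0.1,     # illustrative transition
     ('B', 'A'): 0.4, ('B', 'B'): 0.6}     # probabilities

T = 200_000
state = 'A'
counts = {pair: 0 for pair in P}           # empirical A_mn
for _ in range(T):
    nxt = 'A' if random.random() < P[(state, 'A')] else 'B'
    counts[(state, nxt)] += 1
    state = nxt

# sum_m A_mn / T approaches the stationary probability pi_n.
pi_A = (counts[('A', 'A')] + counts[('B', 'A')]) / T
assert abs(pi_A - 0.8) < 0.01
```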

A note on data analysis is warranted. We have just seen that constraints determine the functional form of the probability for observing a particular trajectory in maximum caliber – in other words, they justify the Markov chain. Max Cal thus yields the mathematical form for the likelihood function used to describe the probability of a particular trajectory. Parameterizing the model – i.e., finding numerical values for all transition probabilities – is then equivalent to maximizing the caliber.

MODEL-BUILDING IN MAX CAL IS DRIVEN BY THE DATA

Max Cal gives a systematic procedure for building kinetic models. To build a kinetic model, pairwise statistics are first used as constraints. But how do we know this is the correct model? If measured correlation functions show this constraint to be insufficient, triplet statistics are used as constraints, and so forth. Below we show a simple problem where pairwise statistics give the correct distribution, and another, slightly more complex, example where they make the wrong prediction. The example further shows how we can improve the prediction by considering higher-order statistics.

Suppose the underlying physics is just two-state, A ⇌ B, with forward rate k_A and backward rate k_B, and the populations of A and B are measured over time intervals of length δt. One could apply Max Cal by first considering a microtrajectory starting in state A, i.e., using p_A(0) = 1 and p_B(0) = 0. For example, suppose we measure p_AA = 1 − k_A δt, where p_AA is the probability that the system is still in state A at time δt, given that it was in state A at time t = 0. Since p_AB + p_AA = 1, then p_AB = k_A δt. The probability of staying a total time t ≡ Nδt in state A is p_A(t) ≈ p_AA^N = (1 − k_A δt)^N ≈ exp(−k_A t) for small enough δt. This is the expectation based only on pairwise constraints. The same reasoning applies to p_BB, p_BA, and p_B(t). We know that for such a system the steady-state distribution is p_A^ss = k_B/(k_A + k_B) and p_B^ss = k_A/(k_A + k_B). These are obtained from p_A^ss = p_BA/(p_AB + p_BA) and p_B^ss = p_AB/(p_AB + p_BA). To summarize, Max Cal infers the correct steady-state distribution after repeated measurements, i.e., accurate measurements of p_AB and p_BA, though, in the absence of initial conditions, it infers uniform initial conditions. Moreover, if the observed trajectory is sufficiently long, Max Cal predicts the correct steady-state occupation of each state, no matter how many states there are, provided all states are clearly resolved.
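The two-state bookkeeping above can be spelled out numerically; in the sketch below, the rates and δt are illustrative values of our own choosing:

```python
import math

kA, kB, dt = 2.0, 3.0, 1e-3
pAB = kA * dt                 # P(B at t + dt | A at t), to first order in dt
pBA = kB * dt

# Steady state inferred from pairwise statistics alone:
# p_A^ss = pBA / (pAB + pBA) reproduces kB / (kA + kB).
pA_ss = pBA / (pAB + pBA)
pB_ss = pAB / (pAB + pBA)
assert abs(pA_ss - kB / (kA + kB)) < 1e-12
assert abs(pB_ss - kA / (kA + kB)) < 1e-12

# Survival in A: (1 - kA*dt)**N approaches exp(-kA*t) for small dt.
t = 0.5
N = round(t / dt)
assert abs((1 - kA * dt) ** N - math.exp(-kA * t)) < 1e-3
```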

Now we consider the case where not all states are resolved. Consider the following reaction scheme as an example: A → B → C, with rate k_A for the first step and rate k_C for the second. Suppose that species A and B cannot be resolved experimentally; we label the aggregate of A and B as Ā. When not all states are resolved, transitions out of the aggregated state are not Markovian (technically, not first-order Markov). In this case, the probability of dwelling in state Ā as a function of t, p_Ā(t), is

p_Ā(t) = p_A(t) + p_B(t) = (k_C − k_A)^{−1} (k_C e^{−k_A t} − k_A e^{−k_C t}). (18)
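Equation 18 can be cross-checked against a direct forward-Euler integration of the master equation for A → B → C; the rates, final time, and step size in this sketch are illustrative choices of our own:

```python
import math

kA, kC = 1.0, 2.5
t_final, n_steps = 1.0, 200_000
h = t_final / n_steps

pA, pB = 1.0, 0.0                 # start in state A
for _ in range(n_steps):
    # dpA/dt = -kA*pA ; dpB/dt = kA*pA - kC*pB  (tuple assignment keeps
    # the update simultaneous)
    pA, pB = pA + h * (-kA * pA), pB + h * (kA * pA - kC * pB)

closed_form = (kC * math.exp(-kA * t_final)
               - kA * math.exp(-kC * t_final)) / (kC - kA)
assert abs((pA + pB) - closed_form) < 1e-4
```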

Theoretically, there are two ways of backing out a model. One is to fit, or to infer, the number of exponential contributions to the decay of Eq. 18 and thereby infer the intermediate state B.22 The other is to build a Markovian model from Max Cal through entropy maximization with finite-step constraints. In practice, Max Cal is more general (it does not rely on exponentials) and is easier to apply.

Pairwise constraints, such as above, would predict an exponential distribution for p_Ā(t) instead of Eq. 18. Clearly, this would be inconsistent with a measurement described by Eq. 18. From pairwise constraints, we could also predict quantities like the three-point probability p_{i_0 i_1 i_2} = p_ĀĀĀ within δt. However, our prediction would be incorrect.

We, therefore, turn to data beyond two-point constraints to see if we can improve our prediction. We use the set of three-point constraints, such as p_ĀĀĀ, based on the underlying model A → B → C. The constraint p_ĀĀĀ is computed as follows:

p_ĀĀĀ = p_AAA + p_AAB + p_ABB, (19)

where p_AAA = p_AA p_AA = (1 − k_A(δt/2))², p_AAB = p_AA p_AB = (1 − k_A(δt/2)) k_A(δt/2), and p_ABB = p_AB p_BB = k_A(δt/2)(1 − k_C(δt/2)). Therefore,

p_ĀĀĀ = 1 − k_A k_C (δt/2)². (20)

In a similar fashion, we find p_ĀĀC = k_A k_C (δt/2)² and p_ĀCC = 0, where p_ĀĀC = p_ABC = k_A k_C (δt/2)².

Now, expanding p_Ā(t), Eq. 18, to second order at t = δt, we have p_Ā(δt) ≈ 1 − k_A k_C δt²/2. Comparing this with p_ĀĀĀ, it is clear that both capture the correct k_A and k_C dependence to second order in δt, though the pre-factors do not agree (1/2 versus 1/4). Predictions of other higher-order trajectories would also be in better agreement given third-order constraints.

This exercise can be carried out to even higher order, and the difference in the pre-factor is eliminated as more and more steps are used as constraints. For instance, using fourth-order time constraints we find

p_ĀĀĀĀ = 1 − 3k_A k_C [(δt/3)² − (k_A + k_C)(δt/3)³/3]. (21)

This should be compared with the expansion of Eq. 18 to third order in δt, which yields

p_Ā(δt) = 1 − (k_A k_C/2)[(δt)² − (k_A + k_C)(δt)³/3]. (22)

This time the prefactor of the second-order term in Eq. 21 changes to 1/3, from 1/4 in Eq. 20, and in the limit of a large number of time steps it approaches the correct value of 1/2. Thus, we find that with higher-order constraints we capture more of the behavior of the full profile p_Ā(t) when the underlying process cannot be accurately determined by a two-point constraint. The example above outlines how to improve the choice of constraints depending on the data.
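The trend in the prefactor (1/4 with two sub-steps, 1/3 with three, approaching 1/2) can be reproduced by propagating the dynamics restricted to {A, B} with n sub-steps of size δt/n. In this sketch of our own, the rates and δt are illustrative; to leading order in δt the prefactor is (n − 1)/(2n):

```python
kA, kC, dt = 1.0, 2.0, 1e-3

def stay_prob(n):
    """Probability of remaining in the aggregate A-bar over [0, dt],
    using n sub-steps of size dt/n and starting from A."""
    a, c = kA * dt / n, kC * dt / n
    pA, pB = 1.0, 0.0
    for _ in range(n):          # A -> B with prob a; B -> C with prob c
        pA, pB = pA * (1 - a), pA * a + pB * (1 - c)
    return pA + pB

def prefactor(n):
    """Coefficient of kA*kC*dt**2 in 1 - stay_prob(n); ~ (n-1)/(2n)."""
    return (1.0 - stay_prob(n)) / (kA * kC * dt ** 2)

assert abs(prefactor(2) - 1 / 4) < 1e-3     # Eq. 20
assert abs(prefactor(3) - 1 / 3) < 1e-3     # Eq. 21
assert abs(prefactor(1000) - 1 / 2) < 1e-2  # limit of many sub-steps
```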

CONCLUSIONS AND DISCUSSION

The dynamics of bulk chemical reaction kinetics are governed by the law of mass action. For chemical reactions in small systems, like cellular biochemical environments, the dynamics are stochastic, and many models assume Markovian dynamics. On the other hand, with growing microscopic biochemical data on dynamical systems, a statistical approach to chemical kinetics founded on the inferential principles of maximum entropy is desirable. For example, it is well known that the exponential distribution for a positive random variable and the Gaussian distribution, both essential to chemical kinetics, can be justified using Jaynes' "principle of maximum entropy."7, 8 It is in this spirit that the approach based on Max Cal for statistical dynamics was proposed.3, 9, 10, 11, 12 Maximum caliber can be used both as a first principle from which to derive stochastic dynamical models and as a method of data analysis. We showed here that Markovian dynamics can be justified from Max Cal; the Markov property is a natural consequence of maximizing a dynamical entropy over trajectories for time-independent processes. Our treatment can be generalized to handle dynamical systems having finite memory (time delays), corresponding to cases of inference used in time-series analysis.23

ACKNOWLEDGMENTS

We thank Professor Hong Qian at the University of Washington and Professor Julian Lee at Soongsil University in Seoul for very helpful discussions. We acknowledge National Institutes of Health (NIH) Grant Nos. R01GM 34993 and 1R01GM090205. In addition, H. Ge acknowledges support by the NSF of China (NSFC) 10901040 and the specialized Research Fund for the Doctoral Program of Higher Education (New Teachers) 20090071120003. This article was first written while H. Ge visited the Xie group in the Department of Chemistry and Chemical Biology, Harvard University. H. Ge thanks Professor Xiaoliang Sunney Xie and his department for their support and hospitality.

References

  1. van Kampen N. G., Stochastic Processes in Chemistry and Physics (North-Holland, Amsterdam, 1981).
  2. Chung K. L., Lectures from Markov Processes to Brownian Motion (Springer-Verlag, New York, 1982).
  3. Jaynes E. T., “Macroscopic prediction,” in Complex Systems – Operational Approaches in Neurobiology, Physics, and Computers, edited by Haken H. (Springer-Verlag, Berlin, 1985).
  4. Jaynes E. T., Probability Theory: The Logic of Science (Cambridge University Press, London, 2003).
  5. Steinbach P. J., Chu K., Frauenfelder H., Johnson J. B., Lamb D. C., Nienhaus G. U., Sauke T. B., and Young R. D., Biophys. J. 61, 235 (1992). doi: 10.1016/S0006-3495(92)81830-1
  6. Gull S. F. and Daniell G. J., Nature (London) 272, 686 (1978). doi: 10.1038/272686a0
  7. Jaynes E. T., Phys. Rev. 106, 620 (1957). doi: 10.1103/PhysRev.106.620
  8. Jaynes E. T., Phys. Rev. 108, 171 (1957). doi: 10.1103/PhysRev.108.171
  9. Seitaridou E., Inamdar M. M., Phillips R., Ghosh K., and Dill K. A., J. Phys. Chem. B 111, 2288 (2007). doi: 10.1021/jp067036j
  10. Pressé S., Ghosh K., Phillips R., and Dill K. A., Phys. Rev. E 82, 031905 (2010). doi: 10.1103/PhysRevE.82.031905
  11. Pressé S., Ghosh K., and Dill K. A., J. Phys. Chem. B 115, 6202 (2011). doi: 10.1021/jp111112s
  12. Otten M. and Stock G., J. Chem. Phys. 133, 034119 (2010). doi: 10.1063/1.3455333
  13. Stock G., Ghosh K., and Dill K. A., J. Chem. Phys. 128, 194102 (2008). doi: 10.1063/1.2918345
  14. Smith E., Rep. Prog. Phys. 74, 046601 (2011). doi: 10.1088/0034-4885/74/4/046601
  15. Monthus C., J. Stat. Mech. 2011, P03008. doi: 10.1088/1742-5468/2011/03/P03008
  16. Ghosh K., J. Chem. Phys. 134, 195101 (2011). doi: 10.1063/1.3590918
  17. Filyukov A. A. and Karpov V. Ya., Inzh.-Fiz. Zh. 13, 798 (1967).
  18. Livesey A. K. and Skilling J., Acta Cryst. A41, 113 (1985).
  19. Shore J. E. and Johnson R. W., IEEE Trans. Inf. Theory IT-26, 26 (1980). doi: 10.1109/TIT.1980.1056144
  20. Kolmogorov A. N., Grundbegriffe der Wahrscheinlichkeitsrechnung (Springer, Berlin, 1933); Foundations of the Theory of Probability (Chelsea, New York, 1950) (in English).
  21. Gruebele M., Annu. Rev. Phys. Chem. 50, 485 (1999). doi: 10.1146/annurev.physchem.50.1.485
  22. Lu H. P., Xun L., and Xie X. S., Science 282, 1877 (1998). doi: 10.1126/science.282.5395.1877
  23. Brockwell P. J. and Davis R. A., Time Series: Theory and Methods, 2nd ed. (Springer-Verlag, New York, 2009).
