Abstract
We present a practical method for simplifying Markov chains on a potentially large state space when detailed balance holds. A simple and transparent technique is introduced to remove states with low equilibrium occupancy. The resulting system has fewer parameters. The resulting effective rates between the remaining nodes give dynamics identical to the original system’s except on very fast timescales. This procedure amounts to using separation of timescales to neglect small capacitance nodes in a network of resistors and capacitors. We illustrate the technique by simplifying various reaction networks, including transforming an acyclic four-node network to a three-node cyclic network. For a reaction step in which a ligand binds, the law of mass action implies a forward rate proportional to ligand concentration. The effective rates in the simplified network are found to be rational functions of ligand concentration.
Keywords: Reversible Markov Chains, Model Simplification, Ligand-binding, Low-Occupancy States, Non-linear Chains, MWC Model
1. Introduction
Markov chain models (MCM) have numerous important applications in biology, chemistry, computer science, and engineering systems (Norris, 1997). In biology, for example, Markov models have contributed a great deal towards understanding the function and structure of ion channels, enzymes, ligand-binding proteins, and population processes. A common problem with Markov chains is the large state space that these models span in many applications. State aggregation is perhaps the most straightforward way to deal with such large chains when some sets of states can be treated as indistinguishable (Stewart, 1991; Deng et al., 2009). Spectral methods have become popular for aggregating states and model simplification (Huisinga et al., 2004). However, the requirement to compute eigenvectors of the Markov transition matrix for a high-dimensional space makes spectral methods hard to implement, and results in a complex mapping between the original states and the resulting aggregated states. This motivates a simpler method for model simplification, which is the subject of this paper.
In many cases, Markov chains contain states that have relatively low probability of being occupied, but may serve as important transition gateways between high occupancy states. We show that it is possible to simplify these models so that low-occupancy states do not appear in the simplified chain, but their effect is included in the model. The elimination of the low occupancy states affects the rates between the remaining states. This reduction can yield complicated reaction rates that reflect the physics of the eliminated states. For example, it can introduce non-trivial ligand dependence into the rates of the reduced model.
Understanding this class of simplified models can be important for interpreting results of fitting data with Markov chains. Colquhoun has argued that one should never fit data to the Hill equation (defined below) because it represents a “physical impossibility” (Colquhoun, 2006), in that integer Hill coefficients greater than unity suggest simultaneous binding of multiple ligands, with no intermediate steps. We agree that there is no physical justification for a priori fits to the usual Hill equation, but we show that the Hill equation with any positive integer coefficient does represent a physical situation. It might happen that the statistical best fit (based on, for example, the AIC (Akaike, 1974) or BIC (Schwarz, 1978) criteria, which penalize overfitting) of the ligand dependent probability of occupancy, p, of some state of interest yields a Hill equation with coefficient four: $p = dL^4/(1 + dL^4)$, where L is the concentration of the ligand. Uncovering such a Hill equation does not imply that four ligands bind simultaneously. Rather, we will show that it implies the intervening states with 1 to 3 ligands bound have relatively low occupancy compared to the states with 0 and 4 ligands bound. That is, although the “true” occupancy might be given by $p = dL^4/(1 + aL + bL^2 + cL^3 + dL^4)$, it can happen that using nonzero values for a, b, and c does not improve the statistical quality of the fit. Colquhoun did not address the question of how to proceed if one uncovers a Hill equation during the fitting process, but we suspect he would agree that one ought not to introduce parameters for which there is no statistical evidence (such as a, b, c).
This raises a question. If a Hill equation with a Hill coefficient greater than one is found to provide the statistical best fit for the ligand concentration dependence of the equilibrium occupancy of some observable state, what ligand dependent rates should be used to connect the unbound and fully bound states? A maximum likelihood fit to time series data keeping all the states will be found to have neutral directions in the space of parameters because the model is over-parameterized relative to what the experimental data can constrain. One might try to resolve the issue experimentally by collecting more data, but this will likely be time-consuming, expensive, and not practicable. The low-occupancy states can add unnecessary complexity to Markov models. However, completely ignoring the low occupancy states is not appropriate either, as they can introduce non-trivial ligand dependence for the experimentally inferred rates. The ligand dependence of the rates can provide crucial insight into the structure and function of the system under consideration. In essence we are discarding processes with fast time scales. The mathematics of “multi-scale” methods goes back at least to the late 19th century (Lindstedt, 1882). The purpose of the current manuscript is simply to point out that for reversible Markov chains a nonstandard parameterization renders the elimination of the fast processes trivial. We show that in many cases, it is easy to write down the correct ligand-dependent rates, essentially by inspection.
The rest of this paper is organized as follows. In the next section, we discuss detailed balance and reversible Markov chains and show how to simplify a 3 state reversible Markov chain with a low occupancy state to a two state Markov chain. We also discuss the energy landscape of the full and simplified chains. We then show that reversible Markov chains are equivalent to resistor-capacitor networks, and that our simplification amounts to neglecting small capacitances. We show how to construct the equivalent circuit for the general problem using a known result from circuit theory. In section “Examples” we work through several cases that we think ought to be of general interest: a linear 5-state chain which runs from 0 to 4 ligands bound, a chain that depends on multiple ligands, the simplification of a 4 state model with no cycles to a 3 state model connected in a loop, and finally the well-studied Monod, Wyman, and Changeux (hereafter referred to as MWC) model (Monod et al., 1965) in the Hill equation limit. We simplify the MWC model to a two state chain and compare the distribution of first passage times to go from one high occupancy state to another in both the full model and in the simplified 2-state model. The approximation is singular: at any finite time the probability of making the transition converges for the two models, but for infinitesimal times the distributions do not converge. The details of these calculations are addressed in the Appendices. Finally, we summarize the main findings of the paper in the conclusions section.
2. Reversible Markov Chains
Finite state Markov chains obey an evolution equation of the form:
$$\frac{d\mathbf{p}}{dt} = \mathbf{p}\,Q \tag{1}$$
where p is a vector with pi(t) being the probability that state i is occupied at time t. The generator matrix Q contains the transition rates from state i to state j. The diagonal entries of Q satisfy Qii = −Σj≠i Qij. We assume that there is a unique equilibrium vector of steady state occupancies, w which satisfies:
$$\mathbf{w}\,Q = 0 \tag{2}$$
Note also that the vector containing all ones, which we denote by u, satisfies Qu = 0. Our primary goal in the present work is to show how to construct an approximate chain to the “true chain” that obeys an evolution equation:
$$\frac{d\tilde{\mathbf{p}}}{dt} = \tilde{\mathbf{p}}\,\tilde{Q} \tag{3}$$
where p̃ and Q̃ are the reduced probability vector and generator matrix. This is straightforward for reversible Markov chains in the case that some of the states have very low equilibrium probabilities (or occupancies). In particular we will show that the reduced generator Q̃ can be constructed using known methods from circuit theory.
In this work we are considering only the important special case of chains which obey “detailed balance” or microscopic reversibility, also known as “reversible” chains. If one starts from an arbitrary rate matrix and tries to impose this condition loop by loop, detailed balance seems to introduce great complexity. Detailed balance reflects time reversal symmetry, and if the system is expressed in a way that manifestly satisfies detailed balance (Yang et al., 2006; Fredkin et al., 1985; Kolmogorov, 1936; Onsager, 1931) then the equations become simpler rather than more complex. A chain is reversible if and only if wiQij = wjQji for all i, j. We define a diagonal matrix W by
$$W_{ij} = w_i\,\delta_{ij} \tag{4}$$
in terms of which the reversible condition can be written WQ = (WQ)T.
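The matrix WQ gives the directional probability flux at equilibrium from state i to state j, by which we mean the equilibrium occupancy of state i times the rate from state i to j. For example, consider the two state chain A ⇌ B with forward rate $k_f$ (A to B) and reverse rate $k_r$ (B to A), where $w_A = 1$ and $w_B = k_f/k_r$ are the unnormalized occupancies of states A and B relative to A. The normalization is $Z = w_A + w_B$, and

$$W = \frac{1}{Z}\begin{pmatrix} 1 & 0 \\ 0 & k_f/k_r \end{pmatrix}, \qquad Q = \begin{pmatrix} -k_f & k_f \\ k_r & -k_r \end{pmatrix}, \qquad \text{so that} \qquad WQ = \frac{k_f}{Z}\begin{pmatrix} -1 & 1 \\ 1 & -1 \end{pmatrix}.$$

At equilibrium, there are equal and opposite fluxes of magnitude $k_f/Z$ between the two states.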
2.1. Equilibrium Flux
Because of its fundamental importance we denote the symmetric matrix WQ by
$$J \equiv WQ \tag{5}$$
where Jij (i ≠ j) is the equilibrium flux of probability between states i and j and Jii = −Σj≠iJij.
In the remainder of this paper we use “state occupancy” and “equilibrium flux” parameters to parameterize the Markov chain as suggested in (Yang et al., 2006) instead of reaction rates. The two approaches are mathematically equivalent but the occupancy-equilibrium flux parameter approach automatically satisfies detailed balance and is more intuitive because it separates thermodynamic quantities (equilibrium occupancies, or state energies) from kinetic quantities (equilibrium reaction fluxes, or transition state energies). Thus we can write:
$$Q = W^{-1}J \tag{6}$$
so that Eq. (1) can be written:
$$\frac{d\mathbf{p}}{dt} = \mathbf{p}\,W^{-1}J \tag{7}$$
We will see that states for which the occupancy is very low can be eliminated by taking the limit wi → 0 while maintaining finite fluxes through state i.
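The occupancy and equilibrium-flux parameterization can be illustrated with a minimal sketch. The helper name `generator_from_flux` and the numerical values are hypothetical, chosen only for illustration; the sketch assumes the row-vector convention in which row i of Q holds the rates out of state i.

```python
import numpy as np

def generator_from_flux(w, J_offdiag):
    """Build Q = W^{-1} J from occupancies w and symmetric off-diagonal fluxes."""
    w = np.asarray(w, dtype=float)
    J = np.array(J_offdiag, dtype=float)
    if not np.allclose(J, J.T):
        raise ValueError("equilibrium fluxes must be symmetric")
    np.fill_diagonal(J, 0.0)
    np.fill_diagonal(J, -J.sum(axis=1))  # J_ii = -sum_{j != i} J_ij
    Q = J / w[:, None]                   # Q_ij = J_ij / w_i
    return Q, J

# Hypothetical three-state chain A - B - C (values are illustrative only).
w = np.array([0.6, 0.1, 0.3])            # normalized equilibrium occupancies
J_off = np.array([[0.0, 2.0, 0.0],
                  [2.0, 0.0, 0.5],
                  [0.0, 0.5, 0.0]])      # symmetric equilibrium fluxes
Q, J = generator_from_flux(w, J_off)

print(Q.sum(axis=1))   # rows of a generator sum to zero
print(w @ Q)           # w is stationary: w Q = 0
```

Because J is symmetric by construction, any choice of positive occupancies and non-negative fluxes yields a chain that satisfies detailed balance, which is the practical appeal of this parameterization.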
2.2. Reduction of a 3 state chain to 2 states and the energy landscape
To explain our approach, we reduce the following 3-state chain to a 2-state chain by discussing the energy landscape of this reaction.
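$$A \underset{k_{BA}}{\overset{k_{AB}}{\rightleftharpoons}} B \underset{k_{CB}}{\overset{k_{BC}}{\rightleftharpoons}} C \tag{8}$$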
where kAB is the reaction rate from state A to state B, etc. If the occupancy of B is vanishingly small, one would expect that replacing the full chain by an effective chain:
$$A \underset{k_{CA}}{\overset{k_{AC}}{\rightleftharpoons}} C \tag{9}$$
with effective rates should be legitimate on time scales long compared to the equilibration of the B state. In the low occupancy limit1, the rates out of B become fast and the B equilibration time goes to zero. As the limit is approached, the time spent in B becomes negligible compared to the mean time to go between A and C. If the effective chain is to approximate the full chain, the mean times to make the transitions from A to C and C to A should agree in the two chains.
Energetically, low occupancy states are local minima high in the energy landscape. In Figure 1 we show the energy landscape corresponding to the reaction given by equation 8. Our parameterization does not require Arrhenius temperature dependence for the reaction rates but Arrhenius kinetics allows for a more intuitive understanding of the equilibrium flux/occupancy parameterization. We write:
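$$k_{AB} = k_0\,e^{-(E^{\ddagger}_{AB} - E_A)/(k_BT)}, \quad k_{BA} = k_0\,e^{-(E^{\ddagger}_{AB} - E_B)/(k_BT)}, \quad k_{BC} = k_0\,e^{-(E^{\ddagger}_{BC} - E_B)/(k_BT)}, \quad k_{CB} = k_0\,e^{-(E^{\ddagger}_{BC} - E_C)/(k_BT)}$$
for the rates, where $k_B$ is Boltzmann’s constant and T is the absolute temperature. The energy $E_A$ is the energy of state A, etc. The energy $E^{\ddagger}_{AB}$ is the energy of the unstable transition state between A and B (which can be defined as the state with maximum energy along the most probable trajectory connecting A with B, as illustrated in Figure 1). The rate, $k_0$, is the number of attempts to cross a barrier per unit time. For simplicity, we take this to be the same for all reactions, but differences could be absorbed into the transition state energies.
Denoting the flux at equilibrium from A to B by $J_{AB}$ etc., and the occupancies for states A, B, and C by $w_A = e^{-E_A/(k_BT)}/Z$, etc., with $Z = e^{-E_A/(k_BT)} + e^{-E_B/(k_BT)} + e^{-E_C/(k_BT)}$, we have:
$$J_{AB} = w_A k_{AB} = \frac{k_0}{Z}\,e^{-E^{\ddagger}_{AB}/(k_BT)}, \qquad J_{BC} = w_B k_{BC} = \frac{k_0}{Z}\,e^{-E^{\ddagger}_{BC}/(k_BT)}$$
for the fluxes. The fluxes are confirmed to be symmetric: $J_{AB} = J_{BA}$, $J_{BC} = J_{CB}$, which shows that Arrhenius kinetics on an energy landscape obey detailed balance, as expected. The generator matrix is:
$$Q = W^{-1}J = k_0 \begin{pmatrix} e^{E_A/(k_BT)} & 0 & 0 \\ 0 & e^{E_B/(k_BT)} & 0 \\ 0 & 0 & e^{E_C/(k_BT)} \end{pmatrix} \begin{pmatrix} -e^{-E^{\ddagger}_{AB}/(k_BT)} & e^{-E^{\ddagger}_{AB}/(k_BT)} & 0 \\ e^{-E^{\ddagger}_{AB}/(k_BT)} & -\left(e^{-E^{\ddagger}_{AB}/(k_BT)} + e^{-E^{\ddagger}_{BC}/(k_BT)}\right) & e^{-E^{\ddagger}_{BC}/(k_BT)} \\ 0 & e^{-E^{\ddagger}_{BC}/(k_BT)} & -e^{-E^{\ddagger}_{BC}/(k_BT)} \end{pmatrix} \tag{10}$$
Note the separation between thermodynamic quantities ($E_A$, $E_B$, $E_C$) and kinetic ones ($k_0$ and transition state energies). The generator matrix for other topologies takes this same general form, with the inverse occupancy of state i being proportional to $e^{E_i/(k_BT)}$ and the flux between states i and j being proportional to $e^{-E^{\ddagger}_{ij}/(k_BT)}$. The exact mean first passage time to go from A to C, $\tau_{AC}$, is given by (Fredkin et al., 1985):
$$\tau_{AC} = -(1, 0)\,Q_{\mathcal{AA}}^{-1}\,u_{\mathcal{A}}$$
where $\mathcal{A}$ represents the two states A and B aggregated together, and $Q_{\mathcal{AA}}$ denotes that part of Q that connects states within $\mathcal{A}$ to each other (in this case it is the first two columns of the first two rows of Q), so that:
$$Q_{\mathcal{AA}} = \begin{pmatrix} -k_{AB} & k_{AB} \\ k_{BA} & -(k_{BA} + k_{BC}) \end{pmatrix} \tag{11}$$
and $u_{\mathcal{A}} = (1, 1)^T$ is just u confined to the subspace $\mathcal{A}$, and (1, 0) is the initial state. So,
$$\tau_{AC} = \frac{k_{AB} + k_{BA} + k_{BC}}{k_{AB}\,k_{BC}} = w_A\left(\frac{1}{J_{AB}} + \frac{1}{J_{BC}}\right) + \frac{w_B}{J_{BC}} \tag{12}$$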
Similarly, the exact mean time to go from C to A, τCA is given by:
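$$\tau_{CA} = \frac{k_{CB} + k_{BC} + k_{BA}}{k_{CB}\,k_{BA}} = w_C\left(\frac{1}{J_{AB}} + \frac{1}{J_{BC}}\right) + \frac{w_B}{J_{AB}} \tag{13}$$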
The low occupancy limit (of the B state) obtains when EB − EA ≫ kBT and EB − EC ≫ kBT. In this case, the mean times to go from A to C and C to A simplify to:
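$$\tau_{AC} \approx w_A\left(\frac{1}{J_{AB}} + \frac{1}{J_{BC}}\right) \tag{14}$$
$$\tau_{CA} \approx w_C\left(\frac{1}{J_{AB}} + \frac{1}{J_{BC}}\right) \tag{15}$$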
These are the approximate mean transition times. Thus we define effective (approximate) rates
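$$k_{AC} = \frac{1}{w_A}\,\frac{J_{AB}\,J_{BC}}{J_{AB} + J_{BC}} \tag{16}$$
$$k_{CA} = \frac{1}{w_C}\,\frac{J_{AB}\,J_{BC}}{J_{AB} + J_{BC}} \tag{17}$$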
and effective fluxes:
$$J_{AC} = w_A k_{AC} = w_C k_{CA} = \frac{J_{AB}\,J_{BC}}{J_{AB} + J_{BC}} \tag{18}$$
Note that
$$\frac{1}{J_{AC}} = \frac{1}{J_{AB}} + \frac{1}{J_{BC}} \tag{19}$$
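This is an important general result that applies whenever a low occupancy state connects to only two other states, as will become clear in section “Reversible Chains are Equivalent to RC networks”. We can interpret equation 18 in terms of an effective barrier height between states A and C, $E^{\ddagger}_{AC}$, such that:
$$J_{AC} = \frac{k_0}{Z}\,e^{-E^{\ddagger}_{AC}/(k_BT)} \tag{20}$$
by defining
$$E^{\ddagger}_{AC} = k_BT\,\ln\!\left(e^{E^{\ddagger}_{AB}/(k_BT)} + e^{E^{\ddagger}_{BC}/(k_BT)}\right) \tag{21}$$
where $E^{\ddagger}_{AB}$ and $E^{\ddagger}_{BC}$ are the transition state energies of the A–B and B–C barriers. This expression simplifies in two limits. If $E^{\ddagger}_{AB} - E^{\ddagger}_{BC} \gg k_BT$ then $E^{\ddagger}_{AC} \approx E^{\ddagger}_{AB}$ to good approximation. In words, this limit simply means that if the difference in the barrier heights $E^{\ddagger}_{AB}$ and $E^{\ddagger}_{BC}$ is large compared to $k_BT$, then the presence of the lower barrier has negligible effect.
The other simplifying limit is when $E^{\ddagger}_{AB} = E^{\ddagger}_{BC}$, in which case we find $E^{\ddagger}_{AC} = E^{\ddagger}_{AB} + k_BT \ln 2$, which at first glance might not appear particularly intuitive. However, noting that the effective flux is written $J_{AC} = (k_0/Z)\,e^{-E^{\ddagger}_{AC}/(k_BT)}$, we find that $J_{AC} = J_{AB}/2 = J_{BC}/2$, which says that the fluxes from A to B and from B to C are equal, and the effective flux from A to C is half as much. The mean number of A–B transitions the system makes while transitioning from A to C is 2 in this case because once the system is in B it has equal odds of hopping to A or C.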
The exact distribution of first passage times to go from A to C, f(t) can be shown to be:
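$$f(t) = \frac{\lambda_f\,\lambda_s}{\lambda_f - \lambda_s}\left(e^{-\lambda_s t} - e^{-\lambda_f t}\right) \tag{22}$$
For small $w_B$ the fast and slow decay rates ($\lambda_f$ and $\lambda_s$ respectively) are given by $\lambda_f \approx k_{BA} + k_{BC} = (J_{AB} + J_{BC})/w_B$ and $\lambda_s \approx k_{AC}$. Note that as $w_B \to 0$ we have $\lambda_s \to k_{AC}$ and $\lambda_f \to \infty$. At t = 0 the exact distribution function is identically zero, $f_{\mathrm{exact}}(0) = 0$, while the approximate distribution function, $k_{AC}\,e^{-k_{AC}t}$, is simply $k_{AC}$ at t = 0. At any finite time the exact and approximate distributions converge as $w_B \to 0$.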
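As a rough numerical check of the low occupancy limit, the sketch below compares the exact mean first passage time from A to C in the three-state chain with the value obtained from the series rule for fluxes. The function names and parameter values are hypothetical; the exact expression is the standard first-step result for the chain A ⇌ B ⇌ C with rates written as flux divided by occupancy.

```python
def mfpt_exact(w_A, w_B, J_AB, J_BC):
    """Exact mean first passage time A -> C for the chain A <-> B <-> C."""
    k_AB = J_AB / w_A          # rate = equilibrium flux / occupancy of source
    k_BA = J_AB / w_B
    k_BC = J_BC / w_B
    return (k_AB + k_BA + k_BC) / (k_AB * k_BC)

def mfpt_reduced(w_A, J_AB, J_BC):
    """Mean time A -> C in the two-state chain, using 1/J_AC = 1/J_AB + 1/J_BC."""
    J_AC = 1.0 / (1.0 / J_AB + 1.0 / J_BC)
    return w_A / J_AC

# Illustrative values only: shrink the occupancy of B and watch the error vanish.
for w_B in (1e-1, 1e-2, 1e-3, 1e-4):
    exact = mfpt_exact(0.5, w_B, 2.0, 3.0)
    approx = mfpt_reduced(0.5, 2.0, 3.0)
    print(f"w_B={w_B:g}  exact={exact:.6f}  reduced={approx:.6f}  "
          f"difference={exact - approx:.2e}")
```

In this sketch the difference between the two times equals w_B/J_BC, so it shrinks linearly with the occupancy of the eliminated state.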
2.3. Reversible Chains are Equivalent to RC networks
Equation 7 is formally equivalent to Kirchhoff’s law for an RC circuit in which a collection of capacitors are connected to ground on one side and to each other via a network of resistors on the other side, with $p_i$ the charge on the ith capacitor, which has capacitance $w_i$, and $J_{ij}$ the conductance between the ith and jth nodes (which is symmetric) (Figure 2). The condition $J_{ii} = -\sum_{j \neq i} J_{ij}$ is Kirchhoff’s junction rule, which states that the sum of the currents into a node must be zero. Detailed balance follows from the fact that the conductance of a passive resistor is the same in either direction, which follows from Ohm’s law. At steady state no current flows in an RC circuit. The voltage across the ith capacitor is $q_i/C_i$ where $q_i$ is the charge on the ith capacitor (which has capacitance $C_i$). As an initial charge distribution relaxes to equilibrium, current flows until the voltage across each capacitor is the same. This voltage is the total charge ($q = \sum_i q_i$) over the total capacitance ($C = \sum_i C_i$), q/C, which we take to be one volt. The analog to voltage in the Markov case is $p_i/w_i$. The energy stored in the capacitors initially is $\frac{1}{2}\sum_i q_i^2/C_i$. As time goes by this is dissipated via Joule heating in the resistors until it reaches $\frac{1}{2}\sum_i q_i \times 1$ volt $= \frac{1}{2}\,q \times 1$ volt. In the Markov case with initial probability $p_i$ for being in the ith state, the “energy” is initially $\frac{1}{2}\sum_i p_i^2/w_i$, which dissipates until it reaches 1/2. In the Markov case, as an initial probability distribution relaxes to equilibrium, probability flows until the “voltage” for each state is unity. Note that, provided all the capacitors in the RC network are actually connected, the final charge distribution is independent of the network. For the Markov case, the final probability distribution is independent of the network. One cannot use equilibrium probability distributions to infer the network and one cannot use the final charge distribution to infer the connectivity of the capacitors.
Just as information on the time dependent flow of charge is required to make inferences regarding the resistor network, information on the time dependent flow of probability is required to make inferences regarding the connectivity of reversible Markov chains.
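The relaxation picture described above can be sketched numerically. The example below integrates the occupancy and flux form of the evolution equation with a simple explicit Euler step (an illustrative choice, not the authors' method) and tracks the "energy" (1/2) Σ p_i²/w_i. All numerical values are hypothetical.

```python
import numpy as np

# Hypothetical three-node network: occupancies play the role of capacitances,
# equilibrium fluxes the role of conductances.
w = np.array([0.5, 0.3, 0.2])
J = np.array([[-1.0,  1.0,  0.0],
              [ 1.0, -3.0,  2.0],
              [ 0.0,  2.0, -2.0]])   # symmetric, rows sum to zero
Q = J / w[:, None]                    # Q = W^{-1} J

def energy(p):
    """Markov analog of the energy stored in the capacitors."""
    return 0.5 * np.sum(p ** 2 / w)

p = np.array([1.0, 0.0, 0.0])         # all probability initially in state 0
dt = 1e-3                             # small compared with the fastest rate
energies = [energy(p)]
for _ in range(5000):
    p = p + dt * (p @ Q)              # explicit Euler step of dp/dt = p Q
    energies.append(energy(p))

print("initial energy:", energies[0])
print("final energy:  ", energies[-1])
print("final 'voltages' p_i / w_i:", p / w)
```

With these values the energy falls monotonically from 1 toward 1/2 while every "voltage" p_i/w_i approaches unity, independent of which links carry the flux.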
The crux of this paper is that small capacitors (small occupancies) can be neglected. Any node that has a very small capacitance can be removed from the network. After these capacitors are removed, the low capacitance nodes can be removed by connecting the remaining nodes with resistors having the correct resistances. Equation 19 is a corollary based on the fact that the resistances of resistors in series add. I.e., the inverse of the effective flux through a linear chain is just the sum of the inverse node to node fluxes along the chain, since flux is analogous to electrical conductance (or inverse resistance). Another corollary is that for parallel paths the fluxes add, just as conductances do for parallel resistors. Although this approximation may cause large errors on very short timescales, the duration of the errors is often so short as not to be noticed. For example, any connection between two resistors can hold some tiny amount of charge and therefore has some capacitance relative to ground. Yet, formulas for adding resistors in series or parallel are taught without concern for the violations that must be present on very short time scales.
We will see later that the flux matrix, J̃ for the simplified chain (equation 3) can easily be constructed by a series of “Y − Δ” transformations that have been used in circuit theory since the late 19th century (Kennelly, 1899; Akers Jr, 1960; Knudsen and Fazekas, 2006; Van Lier and Otten, 1973). The resulting network can then be simplified so that some links are replaced by combinations of rates (or flux parameters). With the states ordered so that the high occupancy ones come first followed by the low occupancy ones the reduction can be achieved by the following sequence of transformations which in essence removes the low occupancy states one state at a time (the transformations will be explained further with examples in section 3.2).
(23) |
(24) |
(25) |
(26) |
(27) |
where $n_{low}$ is the number of low occupancy states, $n_{high}$ is the number of high occupancy states, and $\tilde{J}^{0} = J$. The diagonal entries at any point in the sequence are of course given by $\tilde{J}_{ii} = -\sum_{j \neq i}\tilde{J}_{ij}$. In the preceding, $t_k$ indexes the low occupancy states. The quantity $BF_{j t_k}$ is the “branching fraction” and denotes the fractional flux between state j and the low occupancy state $t_k$. In essence this transformation removes the low occupancy nodes one at a time and interconnects all the states that were previously linked to the removed node. If the high occupancy states are renormalized so that $\tilde{w}_i = w_i/\tilde{z}$ with $\tilde{z}$ set by the normalization condition $\sum_i \tilde{w}_i = 1$, the transformed flux matrix must also be renormalized, $\tilde{J} \to \tilde{J}/\tilde{z}$. Finally the reduced system obeys:
$$\frac{d\tilde{\mathbf{p}}}{dt} = \tilde{\mathbf{p}}\,\tilde{W}^{-1}\tilde{J} \tag{28}$$
A Mathematica program that performs this reduction on random flux matrices is in the online supplement. While the reaction steps in the original process can be considered as “elementary”, the reaction rates in the reduced process are not elementary. Consequently, the reduction procedure can give complicated ligand dependence for reaction rates between the remaining states. For example, in the full model discussed in section 3.1 (equation 29), the transition rates from E4 to S0 along the chain are ligand independent. However, the effective rate from E4 to S0 in the reduced two-state system is ligand dependent.
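The one-node-at-a-time elimination can be sketched in Python as follows. This is not the Mathematica program from the supplement; it is a minimal illustration that assumes each removal adds, to the flux between every pair of remaining states, the product of their fluxes to the removed node divided by the total flux out of that node (the star-mesh rule from circuit theory). The function name `eliminate_low_occupancy` and the example values are hypothetical.

```python
import numpy as np

def _set_diagonal(J):
    """Enforce J_ii = -sum_{j != i} J_ij in place."""
    np.fill_diagonal(J, 0.0)
    np.fill_diagonal(J, -J.sum(axis=1))

def eliminate_low_occupancy(J, w, low):
    """Remove the states listed in `low`, one node at a time.

    J   : symmetric matrix of equilibrium fluxes (diagonal is ignored)
    w   : equilibrium occupancies of all states
    low : indices (in the original numbering) of the states to remove
    """
    J = np.array(J, dtype=float)
    w = np.asarray(w, dtype=float)
    _set_diagonal(J)
    keep = list(range(len(w)))
    for t in low:
        pos = keep.index(t)
        others = [i for i in range(len(keep)) if i != pos]
        J_t = J[others, pos]                 # fluxes between t and the other states
        total = J_t.sum()                    # total flux out of t
        J = J[np.ix_(others, others)] + np.outer(J_t, J_t) / total
        _set_diagonal(J)
        keep.pop(pos)
    z = w[keep].sum()                        # renormalize the surviving occupancies
    w_red = w[keep] / z
    J_red = J / z
    Q_red = J_red / w_red[:, None]           # reduced generator
    return keep, w_red, J_red, Q_red

# Illustrative chain A - B - C with B nearly unoccupied.
keep, w_red, J_red, Q_red = eliminate_low_occupancy(
    [[0, 2, 0], [2, 0, 3], [0, 3, 0]], [0.5, 0.001, 0.499], low=[1])
print(keep)      # indices of the surviving states
print(J_red)     # effective flux A-C follows the series rule
print(Q_red)     # reduced two-state generator
```

For a node with only two neighbours this reproduces the series rule, and for a node with three neighbours it reproduces the "Y − Δ" transformation.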
3. Examples
In this section we perform 3 simplifications on models that we think might be of general interest. We begin by discussing a 5 state linear chain in which the states have 0, 1, 2…4 ligands bound but only the unliganded and quadruply liganded states have high occupancy. We then consider the MWC model for a tetrameric molecule which binds ligand. We consider the MWC model in the Hill limit, for which only the states with 0 and 4 ligands bound have high occupancy. Finally we consider a 4 state acyclic model in which three states are all connected to the same low occupancy state. The resulting 3 state model has a cycle but still automatically obeys detailed balance.
3.1. A linear chain
Here we consider the following chain:
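$$S_0 \underset{j_{01}/K_1}{\overset{j_{01}L}{\rightleftharpoons}} t_1 \underset{j_{12}/K_2}{\overset{j_{12}L/K_1}{\rightleftharpoons}} t_2 \underset{j_{23}/K_3}{\overset{j_{23}L/K_2}{\rightleftharpoons}} t_3 \underset{j_{34}/K_4}{\overset{j_{34}L/K_3}{\rightleftharpoons}} E_4 \tag{29}$$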
As shown in (Yang et al., 2006), we have written the rate from state $X_l$ to $X_m$ as the ratio of the flux between $X_l$ and $X_m$ to the occupancy of $X_l$, where l and m are the numbers of ligands bound in $X_l$ and $X_m$ respectively. The (unnormalized) flux between states $X_l$ and $X_m$ is $j_{lm}L^{\max(l,m)}$ and the (unnormalized) occupancy of state $X_l$ is written as $K_lL^l$. Only energy differences are relevant, so we can give unit unnormalized occupancy to the unliganded state, $S_0$ (i.e. $K_0 = 1$). Denoting the normalized occupancies of state $X_l$ by $w_l$, we have $w_l = K_lL^l/Z$ with $Z = 1 + K_1L + K_2L^2 + K_3L^3 + K_4L^4$. If the occupancies of the intermediate states, $t_1$, $t_2$, and $t_3$ are negligible compared to $\max(w_0, w_4)$, then the mean times to go from $S_0$ to $E_4$, $\tau_{S_0E_4}$, and back, $\tau_{E_4S_0}$, are given by:
$$\tau_{S_0E_4} = \frac{1}{j_{01}L} + \frac{1}{j_{12}L^2} + \frac{1}{j_{23}L^3} + \frac{1}{j_{34}L^4}, \qquad \tau_{E_4S_0} = K_4L^4\left(\frac{1}{j_{01}L} + \frac{1}{j_{12}L^2} + \frac{1}{j_{23}L^3} + \frac{1}{j_{34}L^4}\right)$$
Details of reaching these expressions using generator matrix theory are given in Appendix A. In Appendix B, we demonstrate the simplification of an example where the states have binding sites for multiple ligands.
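The ligand dependence of the effective rates for the linear chain can be explored with a short sketch. It assumes the series rule for fluxes along a chain, unnormalized link fluxes of the form j L^max(l,m), and unit unnormalized occupancy for the unliganded state; the function name `effective_rates` and the parameter values are hypothetical.

```python
def effective_rates(L, j, K4):
    """Effective rates between S0 and E4 after removing t1, t2, t3.

    L  : ligand concentration
    j  : flux parameters (j01, j12, j23, j34)
    K4 : unnormalized occupancy coefficient of E4 (occupancy is K4 * L**4)
    """
    # Link l -> l+1 has unnormalized flux j * L**(l+1); resistances in series add.
    resistance = sum(1.0 / (j_l * L ** (l + 1)) for l, j_l in enumerate(j))
    J_eff = 1.0 / resistance
    k_forward = J_eff / 1.0              # occupancy of S0 is 1 (K0 = 1)
    k_backward = J_eff / (K4 * L ** 4)
    return k_forward, k_backward

# Illustrative parameter values only.
for L in (1e-3, 1e-1, 1.0, 1e1, 1e3):
    kf, kr = effective_rates(L, (1.0, 1.0, 1.0, 1.0), 1.0)
    print(f"L={L:g}  k(S0->E4)={kf:.3e}  k(E4->S0)={kr:.3e}")
```

In this sketch the forward rate grows as L⁴ at low ligand concentration and as L at high concentration, and both effective rates are rational functions of L whose ratio is K4 L⁴, as detailed balance requires.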
3.2. Reduction of an Acyclic Model to a Cyclic Model: Application of the “Y − Δ” Transformation
Next, we consider the case where the chain has more than one branch. An example of such case is shown in Figure 3a. If the state labeled “t” represents a low-occupancy state then the system can be simplified to the chain shown in Figure 3b via the “Y − Δ” transformation discussed in section 2.3. We denote the (unnormalized) occupancies of the states (A, B, C) by KA, KB, KC respectively. The effective probability flux from A to B, JAB, is the product of probability flux from A to t, JAt, and the fractional probability flux flowing from t to B (see equation 23). Notice that there are no direct fluxes between the high occupancy states in the full scheme (Figure 3a) therefore, the first term on the right hand side of equation 23 is zero. Thus, the effective fluxes between high occupancy states, A, B, and C are given by:
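$$J_{MN} = J_{Mt}\,BF_{Nt} = \frac{J_{Mt}\,J_{Nt}}{J_{At} + J_{Bt} + J_{Ct}} \tag{30}$$
where M, N = A, B, and C. The branching fraction of probability flux from state N to t given by equation 24 is $BF_{Nt} = J_{Nt}/\sum_i J_{it}$, i = A, B, and C. The denominator in $BF_{Nt}$ is the total probability flux out of t.
The effective rates from M to N (with M ≠ N), $k_{MN}$, are given by $k_{MN} = J_{MN}/w_M$. For example in Figure 3b,
$$k_{AB} = \frac{J_{AB}}{w_A} = \frac{1}{w_A}\,\frac{J_{At}\,J_{Bt}}{J_{At} + J_{Bt} + J_{Ct}} \tag{31}$$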
The reader can check that detailed balance is satisfied.
In situations where the high occupancy states in the “Y” chain are linked directly in addition to the links through the low occupancy state (see Figure 3c for an example), the first term on the right hand side of equation 23 is non-zero (Akers Jr, 1960; Knudsen and Fazekas, 2006; Van Lier and Otten, 1973). For example, the effective flux between the high occupancy states after simplifying the chain in Figure 3c to a “Δ” loop is given by
$$J_{MN} = J_{MN(\mathrm{dir})} + \frac{J_{Mt}\,J_{Nt}}{J_{At} + J_{Bt} + J_{Ct}} \tag{32}$$
where JMN(dir) is the probability flux for the direct link between high occupancy states M and N.
Any complex network can be reduced by applying equation 23 to all branches involving low occupancy states. The simplification results in a fully connected loop (a triangle, square, pentagon, hexagon, ...) depending on the number of high occupancy states (3, 4, 5, 6, ...) in the original star-shaped branch (a Y, an X, or a star with five, six, ... arms). Each state in the resultant loop is directly connected to all other states.
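The "Y − Δ" step for a single low occupancy hub can be sketched as follows, together with a check of detailed balance around the resulting loop (the product of rates clockwise equals the product counter-clockwise). The function name `y_delta`, the state labels, and the numerical values are hypothetical.

```python
from itertools import combinations

def y_delta(J_to_t, J_direct=None):
    """Effective fluxes between the neighbours of a removed hub state t.

    J_to_t   : dict mapping each neighbour to its equilibrium flux with t
    J_direct : optional dict mapping (M, N) pairs to direct fluxes
    """
    J_direct = J_direct or {}
    total = sum(J_to_t.values())             # total flux out of t
    J_eff = {}
    for M, N in combinations(sorted(J_to_t), 2):
        through_t = J_to_t[M] * J_to_t[N] / total
        J_eff[(M, N)] = J_direct.get((M, N), 0.0) + through_t
    return J_eff

# Illustrative values: three high occupancy states joined only through t.
w = {"A": 0.5, "B": 0.3, "C": 0.2}
J_eff = y_delta({"A": 1.0, "B": 2.0, "C": 3.0})

def rate(M, N):
    """Effective rate M -> N: symmetric flux divided by occupancy of M."""
    flux = J_eff[(M, N)] if (M, N) in J_eff else J_eff[(N, M)]
    return flux / w[M]

clockwise = rate("A", "B") * rate("B", "C") * rate("C", "A")
counter = rate("A", "C") * rate("C", "B") * rate("B", "A")
print(J_eff)
print(clockwise, counter)   # equal products: the loop obeys detailed balance
```

Because each effective flux is symmetric in its two indices, the cycle condition holds for any choice of occupancies, which is why the reduced loop obeys detailed balance automatically.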
3.3. The Tetrameric MWC Model
The MWC model was developed by Monod, Wyman, and Changeux as a model for allostery in hemoglobin (Monod et al., 1965). For historical continuity we use the original MWC notation except that we use L for ligand concentration (instead of their F) and Λ (instead of their L) for the occupancy of the “tense” state, T0, relative to the “relaxed” state R0. The original MWC model considered only thermodynamics, not kinetics, so we must make a few assumptions regarding the dynamics. For the usual tetrameric case, we write the MWC model as follows:
As in MWC, $K_R$ and $K_T$ are the dissociation constants for ligand unbinding from the relaxed and tense monomers and $c = K_R/K_T$. The flux parameters, $j_{tt}$, $j_{rr}$, and $j_{rt}$, set the rates of tense-tense, relaxed-relaxed, and tense-relaxed transitions respectively. MWC did not discuss dynamics but only equilibrium, and so had need for equilibrium constants but not for flux parameters or, equivalently, reaction rates. MWC did not specify whether there are transitions between $R_i$ and $T_i$ for states i > 0. We assume there are such links. For simplicity, we use a single flux parameter $j_{rt}$ for all $R_i$ to $T_i$ transitions. Relaxing this assumption has little effect. Similarly we use the same flux parameters $j_{rr}$ and $j_{tt}$ for each $R_i$ to $R_{i+1}$ and each $T_i$ to $T_{i+1}$ transition, in the spirit of the original MWC model in which the monomers are unaffected by ligand binding. The Hill limit corresponds to $\Lambda \to \infty$, $c \to 0$, and $\Lambda c^4 \to 0$. The exact expected fraction of sites with ligand bound (MWC’s $\bar{Y}_F$) is given by:
$$\bar{Y}_F = \frac{\Lambda c\,x\,(1 + cx)^3 + x\,(1 + x)^3}{\Lambda\,(1 + cx)^4 + (1 + x)^4} \tag{33}$$
Which reduces to in the Hill limit (large Λ and small c)2 i.e.
(34) |
(35) |
where $x \equiv L/K_R$ is MWC’s “α”. Thus a Hill equation with coefficient greater than one can be physically meaningful. But we can also make use of these equations without taking the Hill limit. We consider parameter values that are in a physically plausible range: $\Lambda = 10^4$ and $c = 10^{-6}$ with $K_R$ = 1 nM and $K_T$ = 1 mM. In Figure 4 we plot these expressions. The (unnormalized) occupancies of the relaxed and tense states, $R_i$ and $T_i$, are given by $\binom{4}{i}x^i$ and $\Lambda\binom{4}{i}(cx)^i$ respectively (see Appendix C). For small x, $T_0$ is the only state with significant occupancy and for large enough x, $R_4$ is the only significantly occupied state. $T_0$ and $R_4$ have equal occupancy at $x = \Lambda^{1/4} = 10$. At this value of x, the (unnormalized) occupancy of $R_3$ is 4,000 while those of $R_4$ and $T_0$ are each 10,000. The normalized probability of $R_3$ reaches about 0.18 (at x ≈ 13), so if one were able to glean this from observation one could keep the state $R_3$ and reduce the system to a “Δ” loop involving $T_0$, $R_3$, and $R_4$ using the “Y − Δ” transformation. We show this towards the end of Appendix C. But first we reduce the full MWC model to a two-state model involving the $T_0$ and $R_4$ states.
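The occupancy figures quoted above can be reproduced with a short sketch. It assumes the standard MWC weights for a tetramer, namely binomial coefficients times x^i for the relaxed states and Λ times binomial coefficients times (cx)^i for the tense states; the function names are hypothetical.

```python
from math import comb

def mwc_occupancies(x, Lam, c, n=4):
    """Unnormalized occupancies of R_0..R_n and T_0..T_n."""
    relaxed = [comb(n, i) * x ** i for i in range(n + 1)]
    tense = [Lam * comb(n, i) * (c * x) ** i for i in range(n + 1)]
    return relaxed, tense

def normalized_R(i, x, Lam, c, n=4):
    """Equilibrium probability of the relaxed state with i ligands bound."""
    relaxed, tense = mwc_occupancies(x, Lam, c, n)
    return relaxed[i] / (sum(relaxed) + sum(tense))

def saturation(x, Lam, c, n=4):
    """Fraction of binding sites occupied, computed state by state."""
    relaxed, tense = mwc_occupancies(x, Lam, c, n)
    Z = sum(relaxed) + sum(tense)
    bound = sum(i * (relaxed[i] + tense[i]) for i in range(n + 1))
    return bound / (n * Z)

Lam, c = 1e4, 1e-6
relaxed, tense = mwc_occupancies(10.0, Lam, c)
print("R3, R4, T0 at x = 10:", relaxed[3], relaxed[4], tense[0])
print("P(R3) at x = 13:", normalized_R(3, 13.0, Lam, c))
print("saturation at x = 10:", saturation(10.0, Lam, c))
```

Summing the state-by-state weights in `saturation` gives the same value as the closed-form MWC saturation function, which provides a quick consistency check on the occupancies.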
To find the effective rates between $T_0$ and $R_4$ in the reduced two-state MWC model, one can perform the matrix algebra (as done in Appendix C). Here we point out that for large Λ and sufficiently small c, using the analogy with RC circuits and the definition of flux (occupancy × rate) gives the effective flux from $T_0$ to $R_4$ by inspection. Since the tense states $T_1$, $T_2$, $T_3$, $T_4$ have negligible occupancy for small enough c, we simply have a chain $T_0 \rightleftharpoons R_0 \rightleftharpoons R_1 \rightleftharpoons R_2 \rightleftharpoons R_3 \rightleftharpoons R_4$ with (unnormalized) occupancies of $\Lambda$, $1$, $4x$, $6x^2$, $4x^3$, $x^4$ respectively. Unnormalized means we drop the normalization factor Z. The normalized fluxes are the unnormalized fluxes divided by Z, and the rates work out the same as long as the fluxes and the occupancies are both unnormalized or both normalized. By inspection we find (in the small c limit):
(36) |
The mean transition time from $T_0$ to $R_4$ is the occupancy of $T_0$ ($\Lambda$) times $1/J_{T_0R_4}$ and the transition time from $R_4$ to $T_0$ is the occupancy of $R_4$ ($x^4$) times $1/J_{T_0R_4}$. That is, for small c
(37) |
In Figure 5 we plot the exact and effective (approximate) transition rate from $T_0$ to $R_4$ (the inverse of $\tau_{T_0R_4}$ given by equation 37) as a function of x, for $j_{rr} = j_{rt} = j_{tt} = 10$ (in arbitrary time units). The exact rate is given by the inverse of equation C.12. There are 3 distinct regimes: small x, in which the rate goes as $x^4$; intermediate x, in which the rate goes as x; and large x, in which the rate becomes constant and equal to $j_{rt}$. As is clear from Figure 5, the exact transition rate converges to the approximate rate as we decrease the value of c. In Figure 6, we compare the exact and approximate cumulative probabilities that the MWC model has reached the state $R_4$ given that it was in state $T_0$ at time 0, for three different values of x. The exact and approximate cumulative probabilities are given by equations C.16 and C.17 respectively. The dotted curves show the approximate solution and the solid curves show the exact solution. We notice that for short times there can be significant deviation between the two, but that the probabilities begin to converge in all three cases by the time that the probability exceeds roughly 0.001.
4. Conclusions
In this paper we developed a rigorous technique for simplifying reversible Markov chains in the case that some of the states have very low occupancy relative to the other states. Using the analogy between reversible Markov chains and RC-networks we showed that analytic formulae for the reduced models can be obtained by inspection in many cases including linear chains, the Hill equation limit of the tetrameric MWC model, and the 4 state acyclic model which reduces to 3 states connected in a loop.
The motivation behind our study was to develop a simple and transparent procedure for reducing Markov chains with low occupancy states in order to (1) avoid over-parameterization when fitting a model to a given data set and (2) acquire a better understanding of the underlying dynamics of the system from which the data is collected. Nevertheless, our simplification procedure would also enhance the computational efficiency when simulating such systems. The reduction in computational time would depend on the number of low occupancy states eliminated and the probability flux between low and high occupancy states. We illustrate the improvement in computational time by considering the following example.
$$A \rightleftharpoons t \rightleftharpoons B \tag{38}$$
where A and B are the high occupancy states with occupancy 1 and KL2 respectively, and t is the low occupancy state having occupancy εL. jAt and jBt are the flux parameters for A ⇔ t and t ⇔ B transitions respectively. After eliminating the low occupancy state t, the chain in equation 38 reduces to
$$A \underset{k_{BA}}{\overset{k_{AB}}{\rightleftharpoons}} B \tag{39}$$
$k_{AB}$ and $k_{BA}$ are the effective transition rates from A to B and vice versa. In Appendix D, we calculate the expected number of random numbers, $N_{rand}$, needed to perform a Gillespie simulation (Gillespie, 1976) of the full model for one transition from A to B and back to A. We find $N_{rand} = 3(p_A/p_B + p_B/p_A + 2)$, where $p_A$ and $p_B$ are the probabilities of transition from t to A and t to B respectively. The minimum value of $N_{rand}$ is 12, attained when $p_A = p_B$. As the ligand concentration, L, varies, $p_A/p_B$ or $p_B/p_A$ will become large, so $N_{rand}$ can increase arbitrarily. For the reduced model, only two random numbers are ever required to simulate a transition from A to B and back to A. Thus for this simple example the amount of computational work required to simulate the full model is at least 6 times (and potentially much more than) that needed for the reduced model.
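The random-number count can be checked with a small simulation. The sketch below is one reading of how the count arises, not a reproduction of Appendix D: it assumes one random number for the waiting time in A or B (which have a single exit), and two in t (a waiting time plus the choice of exit). Function names and parameter values are hypothetical.

```python
import random

def expected_nrand(p_A):
    """Expected random numbers per A -> B -> A cycle in the full model."""
    p_B = 1.0 - p_A
    return 3.0 * (p_A / p_B + p_B / p_A + 2.0)

def simulate_nrand(p_A, n_cycles, seed=0):
    """Count the random numbers a Gillespie run would consume per cycle.

    Waiting times do not affect the count, so they are tallied but not drawn;
    only the exit choice from t is actually sampled.
    """
    rng = random.Random(seed)
    used = 0
    for _ in range(n_cycles):
        # First leg must exit t towards B, second leg towards A.
        for p_success in (1.0 - p_A, p_A):
            while True:
                used += 1            # waiting time in A or B (single exit)
                used += 2            # waiting time in t plus choice of exit
                if rng.random() < p_success:
                    break            # reached the target high occupancy state
    return used / n_cycles

print(expected_nrand(0.5))           # symmetric case
print(expected_nrand(0.25), simulate_nrand(0.25, 20000))
```

Under these counting assumptions each visit to t costs three random numbers and the number of visits per leg is geometric, which recovers the closed-form expression above since the two exit probabilities sum to one.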
Any Markov chain can be reduced by applying the “Y − Δ” transformation (equation 23) and other simplification methods from circuit theory. There is a large body of work, both ongoing and older, on performing these transformations efficiently on large circuits (see, for example, Akers Jr, 1960; Knudsen and Fazekas, 2006; Van Lier and Otten, 1973). These techniques carry over directly to the important case of reversible Markov chains.
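As a minimal sketch of the transformation in flux language, the snippet below uses the standard circuit-theory star-to-triangle formula for conductances, which we assume is the content of equation 23. The function names `star_to_delta` and `series` and the numerical flux values are hypothetical and chosen only for illustration.

```python
def star_to_delta(j_a, j_b, j_c):
    """Replace a star (Y) whose legs carry fluxes j_a, j_b, j_c to a
    removed centre node by the equivalent triangle (delta) of direct fluxes."""
    total = j_a + j_b + j_c
    j_ab = j_a * j_b / total
    j_bc = j_b * j_c / total
    j_ca = j_c * j_a / total
    return j_ab, j_bc, j_ca


def series(j1, j2):
    """Two fluxes in series combine like conductances in series."""
    return j1 * j2 / (j1 + j2)


if __name__ == "__main__":
    # Hypothetical leg fluxes for a three-legged star.
    j_ab, j_bc, j_ca = star_to_delta(2.0, 3.0, 5.0)
    print("delta fluxes:", j_ab, j_bc, j_ca)
    # Flux between a and b with c left floating, seen through the delta
    # and seen through the original star; the two should agree.
    print(j_ab + series(j_bc, j_ca), series(2.0, 3.0))
```

The comparison printed at the end checks that the triangle and the star present the same effective flux between two outer nodes, which is the sense in which the reduced network is equivalent.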
Nodes with very small occupancy compared to all of the remaining nodes can certainly be removed for the purposes of longer time dynamics. It can be desirable to keep some small occupancy nodes if they correspond to initial states or have small occupancy only over a certain range of ligand concentration or another parameter. In this case other nodes can sometimes be removed even though their occupancy is not small compared to these nodes. This will still be a good approximation if the equilibration time for the latter nodes is short compared to the time-scales of interest (typically, the equilibration times of the high occupancy nodes). More generally the early time dynamics could be treated via matched asymptotics, but this would require additional parameters, while our primary goal in this paper is the elimination of parameters that are difficult to estimate from data.
We have attempted to use the ideas sketched here to help with the development of a data-driven model of the IP3 receptor/Ca2+ channel (Ullah et al.). The standard approach to fitting single molecule data with Markov chains involves first selecting a chain and then maximizing the likelihood of a data set by varying the parameters in the selected chain. However, there is an enormous number of possible chains and one is unlikely to guess the correct chain. Our approach allows one to construct models that have as many decay constants as can be distinguished by experiments, yet can also give the correct dependence on ligand concentration. Ideally the data-driven construction of reaction networks would proceed iteratively from data collection to model construction, analysis, and refinement and ultimately additional data collection. During the course of the modeling process the modeler could gain the ability to provide estimates of some of the missing parameters. Even so the approximations discussed here provide a useful arrow for the modeler’s quiver.
Highlights.
A simple technique for simplifying Markov chains on large state space.
The approach is illustrated by several analogies from physics.
The technique is presented by several examples.
Our method works for multi-ligand dependent molecules as well.
Our study will have a broad impact in the field of single-molecule dynamics.
Acknowledgments
JEP would like to thank Paul Fenimore for pointing out that we were separating kinetic parameters from thermodynamic ones. This work was supported by the National Institutes of Health under grant number 5RO1GM065830-08.
Appendix A
In this Appendix we use generator matrix theory to reduce the 5-state chain discussed in section “A linear chain” to a 2-state chain. The 5-state chain has 2 high occupancy states, S0 and E4, and 3 low occupancy states, t1, t2, and t3. The goal is to aggregate this 5-state model into a 2-state model by eliminating the 3 low occupancy states. We will derive the mean time to go from one state to another. The effective transition rates between the high occupancy states are simply the inverses of the mean times to transition between those states. To simplify this chain we first write it in terms of probability fluxes
(A.1) |
W in the generator matrix Q (equation 6) is the diagonal matrix whose entries are the unnormalized equilibrium occupancies of the five states, S0, t1, t2, t3, and E4.
(A.2) |
where K1L, K2L^2, K3L^3, and K4L^4 are the occupancies of states t1, t2, t3, and E4 respectively relative to state S0 having an occupancy of 1. J in equation 6 is the symmetric generator matrix with element Jxy corresponding to the equilibrium probability flux from state x to y. The diagonal entries of J are given by Jxx = −Σy≠x Jxy, which is an expression of conservation of probability (Bruno et al., 2005).
(A.3) |
Putting equations A.2 and A.3 in equation 6 gives
(A.4) |
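The construction of Q from W and J can be sketched in a few lines. The sketch assumes that equation 6 has the form Q = W^{-1} J, which is consistent with the diagonal entries listed in Appendix C; the helper name `build_generator`, the equilibrium constants, the ligand concentration, and the flux values are all hypothetical.

```python
def build_generator(w, flux):
    """Build Q = W^{-1} J for a linear chain (assumed form of equation 6).

    w    : unnormalized equilibrium occupancies, one per state
    flux : equilibrium probability fluxes between neighbours,
           flux[i] linking state i and state i + 1
    """
    n = len(w)
    J = [[0.0] * n for _ in range(n)]
    for i, j in enumerate(flux):
        J[i][i + 1] = J[i + 1][i] = j          # J is symmetric
    for i in range(n):
        # Diagonal entry: minus the sum of the off-diagonal fluxes in the row.
        J[i][i] = -sum(J[i][k] for k in range(n) if k != i)
    # Row i of Q is row i of J divided by the occupancy of state i.
    return [[J[i][k] / w[i] for k in range(n)] for i in range(n)]


if __name__ == "__main__":
    L = 0.1                                    # hypothetical ligand concentration
    K = [1.0, 2.0, 3.0, 4.0e3]                 # hypothetical K1..K4
    w = [1.0] + [K[i] * L ** (i + 1) for i in range(4)]
    flux = [0.5, 0.4, 0.3, 0.2]                # hypothetical flux values
    for row in build_generator(w, flux):
        print(["%.3g" % q for q in row])
```

Because J is symmetric, the resulting rates satisfy detailed balance automatically: w_i Q_ij = J_ij = w_j Q_ji for every pair of states.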
We first aggregate S0, t1, t2, and t3 states and represent the aggregated state by . The exact distribution of first passage time to go from to E4 is given by
(A.5) |
where = (1, 0, 0, 0) is the row matrix whose elements are initial probabilities of states S0, t1, t2, and t3 and uE4 is a unit column matrix with dimension equal to the number of final states to which the system is about to transition, in this case one state (E4). , and are the sub-matrices of Q
(A.6) |
(A.7) |
We can calculate the exact mean time to go from aggregated state to E4, , by integrating equation A.5 and is given by
(A.8) |
(A.9) |
(A.10) |
The last expression is the approximate mean transition time to go from to E4 and is reached by assuming that the occupancies of the states t1, t2, and t3 are negligible as compared to states S0 and E4. If the low occupancy states have small but finite occupancy then the mean transition time to go from aggregated state to E4 is given by equation A.9.
Similarly, the exact distribution of first passage time to go to S0 from , which represents the aggregate of t1, t2, t3, and E4 states is given by
(A.11) |
where = (0, 0, 0, 1) is the row matrix having the initial probabilities of states t1, t2, t3, and E4 respectively, uS0 is a 1 × 1 identity matrix, , and are the sub-matrices of Q
(A.12) |
(A.13) |
Integrating equation A.11 gives us the exact mean time to go from state to S0 as
(A.14) |
(A.15) |
(A.16) |
where the last expression follows from the assumption that states t1, t2, and t3 have negligible occupancies.
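The comparison between the exact and approximate mean transition times can be checked numerically. The sketch below assumes Q = W^{-1} J, obtains the exact mean first passage time by solving the linear system for the sub-matrix of Q restricted to the non-target states, and compares it with occupancy divided by the series-combined flux. The function names and all numerical values are hypothetical.

```python
import numpy as np


def generator(w, flux):
    """Q = W^{-1} J for a linear chain with neighbour fluxes `flux`."""
    n = len(w)
    J = np.zeros((n, n))
    for i, j in enumerate(flux):
        J[i, i + 1] = J[i + 1, i] = j
    J -= np.diag(J.sum(axis=1))               # diagonal = -(row sum of fluxes)
    return J / np.asarray(w, dtype=float)[:, None]


def mean_first_passage(Q, start, target):
    """Exact mean time to first reach `target` from `start`."""
    keep = [i for i in range(Q.shape[0]) if i != target]
    Q_aa = Q[np.ix_(keep, keep)]              # transitions among non-target states
    tau = np.linalg.solve(Q_aa, -np.ones(len(keep)))
    return tau[keep.index(start)]


if __name__ == "__main__":
    w = [1.0, 1e-4, 1e-4, 1e-4, 2.0]          # hypothetical occupancies S0, t1..t3, E4
    flux = [1.0, 2.0, 3.0, 4.0]               # hypothetical fluxes
    Q = generator(w, flux)
    j_eff = 1.0 / sum(1.0 / j for j in flux)  # series combination of fluxes
    print("S0 -> E4:", mean_first_passage(Q, 0, 4), "approx:", w[0] / j_eff)
    print("E4 -> S0:", mean_first_passage(Q, 4, 0), "approx:", w[4] / j_eff)
```

With intermediate occupancies of order 1e-4 the two estimates should differ only by a relative correction of that order, which illustrates why the low occupancy states can be dropped.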
Appendix B
In many cases the state of the system depends on multiple ligands. In this Appendix we simplify a chain that involves the binding of multiple ligands. The simplification process for such chains is similar to that presented for the single ligand case in section “A linear chain”. Figure B1a shows a chain with a total of 5 states. We wish to simplify this scheme to the one shown in Figure B1b so that the low occupancy states are aggregated into the two high occupancy states. In the first step, the system makes a transition from state A00 to t20 by binding two molecules of ligand L1. In the second step, the system makes a transition from state t20 to B22 by binding two molecules of ligand L2. We will use the electrical circuit analogy to simplify the 5-state chain into a 2-state chain (see equation 19).
The effective probability flux from state A00 to B22, JA00B22, is
(B.1) |
Note that the exponents of ligands L1 and L2 in the individual probability fluxes in equation B.1 are equal to the maximum of the number of corresponding ligand molecules bound to the two states involved in the transition.
The probability flux from one state to another is simply the ratio of the occupancy of the initial state and the mean transition time from initial to final state, i.e.
(B.2) |
Since the occupancy of A00 is 1, the approximate mean transition time from A00 to B22 is
(B.3) |
Similarly, the approximate mean transition time from state B22 to A00 is
(B.4) |
Where is the occupancy of B22 state. Equations (B.1 – B.4) can be easily generalized for an arbitrary number of states in the chain.
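The recipe of this Appendix can be sketched as follows. The sketch assumes that the four step fluxes combine in series and that each carries the ligand powers described above; the intermediate labels t10 and t21, the flux parameters `j`, and the constant `k_b` in the occupancy of B22 are hypothetical and are not taken from equations B.1 to B.4.

```python
def effective_flux(l1, l2, j=(1.0, 1.0, 1.0, 1.0)):
    """Series combination of the four step fluxes along A00 -> ... -> B22.

    Each step flux carries ligand powers equal to the larger number of
    bound ligand molecules among the two states it connects.
    """
    steps = [
        j[0] * l1,                  # A00 <-> t10  (t10 is a hypothetical label)
        j[1] * l1 ** 2,             # t10 <-> t20
        j[2] * l1 ** 2 * l2,        # t20 <-> t21  (t21 is a hypothetical label)
        j[3] * l1 ** 2 * l2 ** 2,   # t21 <-> B22
    ]
    return 1.0 / sum(1.0 / s for s in steps)


def effective_rates(l1, l2, k_b=1.0, j=(1.0, 1.0, 1.0, 1.0)):
    """Effective rates (k_AB, k_BA) of the reduced two-state chain."""
    w_a = 1.0
    w_b = k_b * l1 ** 2 * l2 ** 2   # assumed occupancy of B22; k_b is hypothetical
    j_eff = effective_flux(l1, l2, j)
    # rate = flux / occupancy = 1 / (mean transition time)
    return j_eff / w_a, j_eff / w_b


if __name__ == "__main__":
    for l1 in (0.1, 1.0, 10.0):
        print(l1, effective_rates(l1, 2.0))
```

The effective flux is a rational function of the two ligand concentrations, and the ratio of the two effective rates equals the occupancy of B22, so the reduced chain keeps the equilibrium of the full chain.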
Appendix C
In this Appendix, we simplify the tetrameric MWC model so that the final model is composed only of the T0 and R4 states. Towards the end of this Appendix, we discuss the case x = Λ^(1/4) = 10, where state R3 is not a low-occupancy state. Before writing the matrices W and J used in the generator matrix Q (equation 6), we first calculate the unnormalized occupancies of all states in the MWC model. Consider the following reaction
(C.1) |
Where X can be either R or T and equilibrium constant . The forward rate of the reaction is . At equilibrium the occupancy of Xi+1 state is
(C.2) |
Thus the occupancies of Ri and Ti are given as
(C.3) |
(C.4) |
Where the occupancies of R0 and T0 are 1 and Λ respectively, . In MWC language Λ = L and .
Thus the diagonal matrix W in equation 6, whose entries are the unnormalized equilibrium occupancies of all 10 states, becomes
(C.5) |
where w = (wR, wT). Vectors wR and wT contain the occupancies of all R and T states respectively.
To write J, we calculate the probability fluxes between various states. Since
(C.6) |
Thus
(C.7) |
(C.8) |
If the time for Ti to make the transition to Ri is τ1 then
(C.9) |
Using the notation , and for flux parameters, the effective flux matrix, J, in equation 6 becomes
(C.10) |
Where
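\[
\begin{aligned}
F_{11} &= -4 j_{rr} x - j_{rt} \Lambda \\
F_{22} &= -4 j_{rr} x - 12 j_{rr} x^{2} - 4 c\, j_{rt} x \Lambda \\
F_{33} &= -12 j_{rr} x^{2} - 12 j_{rr} x^{3} - 6 c^{2} j_{rt} x^{2} \Lambda \\
F_{44} &= -12 j_{rr} x^{3} - 4 j_{rr} x^{4} - 4 c^{3} j_{rt} x^{3} \Lambda \\
F_{55} &= -4 j_{rr} x^{4} - c^{4} j_{rt} x^{4} \Lambda \\
F_{66} &= -4 c\, j_{tt} x - j_{rt} \Lambda \\
F_{77} &= -4 c\, j_{tt} x - 12 c^{2} j_{tt} x^{2} - 4 c\, j_{rt} x \Lambda \\
F_{88} &= -12 c^{2} j_{tt} x^{2} - 12 c^{3} j_{tt} x^{3} - 6 c^{2} j_{rt} x^{2} \Lambda \\
F_{99} &= -12 c^{3} j_{tt} x^{3} - 4 c^{4} j_{tt} x^{4} - 4 c^{3} j_{rt} x^{3} \Lambda \\
F_{10\,10} &= -4 c^{4} j_{tt} x^{4} - c^{4} j_{rt} x^{4} \Lambda.
\end{aligned}
\]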
Using equations 6, C.5, and C.10 we can write
(C.11) |
Where
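\[
\begin{aligned}
Q_{11} &= -4 j_{rr} x - j_{rt} \Lambda \\
Q_{22} &= -j_{rr} - 3 j_{rr} x - c\, j_{rt} \Lambda \\
Q_{33} &= -2 j_{rr} (1 + x) - c^{2} j_{rt} \Lambda \\
Q_{44} &= -j_{rr} (3 + x) - c^{3} j_{rt} \Lambda \\
Q_{55} &= -4 j_{rr} - c^{4} j_{rt} \Lambda
\end{aligned}
\]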
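As a consistency check on the entries above, the diagonal of Q for the five R states should equal the corresponding diagonal of J divided by the occupancy of each state. The sketch below assumes the usual MWC occupancies of the R states, namely the binomial coefficient times x to the power i for R_i (the explicit form of equation C.3 is taken as an assumption here); the function name and parameter values are hypothetical.

```python
from math import comb, isclose


def check_r_block(x, c, lam, j_rr, j_rt):
    """Compare Q_ii with F_ii / w_i for the five R states (i = 0..4)."""
    w = [comb(4, i) * x ** i for i in range(5)]   # assumed occupancies of R_i
    F = [
        -4 * j_rr * x - j_rt * lam,
        -4 * j_rr * x - 12 * j_rr * x ** 2 - 4 * c * j_rt * x * lam,
        -12 * j_rr * x ** 2 - 12 * j_rr * x ** 3 - 6 * c ** 2 * j_rt * x ** 2 * lam,
        -12 * j_rr * x ** 3 - 4 * j_rr * x ** 4 - 4 * c ** 3 * j_rt * x ** 3 * lam,
        -4 * j_rr * x ** 4 - c ** 4 * j_rt * x ** 4 * lam,
    ]
    Q = [
        -4 * j_rr * x - j_rt * lam,
        -j_rr - 3 * j_rr * x - c * j_rt * lam,
        -2 * j_rr * (1 + x) - c ** 2 * j_rt * lam,
        -j_rr * (3 + x) - c ** 3 * j_rt * lam,
        -4 * j_rr - c ** 4 * j_rt * lam,
    ]
    return all(isclose(f / wi, q, rel_tol=1e-9) for f, wi, q in zip(F, w, Q))


if __name__ == "__main__":
    # Hypothetical parameter values.
    print(check_r_block(x=0.7, c=0.01, lam=1e4, j_rr=1.3, j_rt=0.2))
```

Agreement for arbitrary parameter values indicates that the listed F and Q entries are mutually consistent under Q = W^{-1} J.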
In analogy with equation (A.8), we can write the exact mean transition time from T0 to R4 as
(C.12) |
Where represents the aggregate of all states in the MWC model other than R4 is a row matrix of initial probabilities of all states in (all states in except T0 have 0 initial probability, T0 has initial probability of 1), is the sub-matrix of Q with entry ij equal to the transition rate between i and j states in , column matrix is the sub-matrix of Q with entry i equal to the transition rate from i state in , and uR4 is a unit matrix.
(C.13) |
(C.14) |
In the limit (Λ → ∞, c → 0, Λc → 0), the mean time to transition from T0 to R4, given by equation (C.12) simplifies to
(C.15) |
For the full MWC model, the exact latency (first passage time) distribution for the transition from state T0 to R4 is given by
(C.16) |
For the simplified two state MWC model, the latency distribution becomes
(C.17) |
Where the initial probability of the system being in the T0 state is ΠT0 = 1, and QT0R4 = kT0R4 is the transition rate from T0 to R4 in the simplified 2-state MWC model.
Mean transition time and latency distribution from R4 to T0 can be calculated in the same manner.
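The relation between the full and reduced latency distributions can be illustrated on the smallest possible example rather than on the 10-state MWC model. The sketch below assumes a three-state chain A ⇔ t → B with B absorbing, evaluates the first passage density as the initial distribution times the matrix exponential of the sub-generator times the exit rates, and compares it with the exponential density of the reduced model. The rate values and function names are hypothetical.

```python
import numpy as np


def latency_full(t, a, b, c):
    """First-passage density from A to B for the chain A <-> t -> B.

    a: rate A -> t, b: rate t -> A, c: rate t -> B (hypothetical values).
    Evaluates pi * exp(Q_aa t) * Q_ab with pi = (1, 0).
    """
    Q_aa = np.array([[-a, a], [b, -(b + c)]])    # transitions among {A, t}
    Q_ab = np.array([0.0, c])                    # exits from {A, t} into B
    lam, V = np.linalg.eig(Q_aa)
    # exp(Q_aa t) = V diag(exp(lam t)) V^{-1}
    exp_qt = (V * np.exp(lam * t)) @ np.linalg.inv(V)
    return float(np.real(exp_qt @ Q_ab)[0])


def latency_reduced(t, a, b, c):
    """Exponential latency of the two-state model with k_AB = a c / (b + c)."""
    k_ab = a * c / (b + c)
    return float(k_ab * np.exp(-k_ab * t))


if __name__ == "__main__":
    a, b, c = 1.0, 1000.0, 1000.0                # t is a low occupancy state
    for t in (0.0, 1e-4, 0.5, 1.0, 3.0):
        print(t, latency_full(t, a, b, c), latency_reduced(t, a, b, c))
```

The two densities differ only at times comparable to the lifetime of the low occupancy state, which is the sense in which the reduced model reproduces the original dynamics except on very fast timescales.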
Next we discuss the case where one could keep state R3 and reduce the MWC model to a “Δ” loop involving T0, R3, and R4 states. We first rewrite the MWC model in terms of probability fluxes (Figure C1a) where the double arrows represent the fluxes between states. The probability fluxes between various states are given by equations C.7 – C.9 and are excluded from Figure C1 for clarity. We use the “Y − Δ” transformation to perform the simplification in the following steps.
Step 1: eliminate states R0 and T4 from the two linear branches T0 ⇔ R0 ⇔ R1 and T3 ⇔ T4 ⇔ R4 respectively (Figure C1b). The inverse of the effective probability flux between T0 and R1 is given by the sum of the inverses of the fluxes in the T0 ⇔ R0 and R0 ⇔ R1 transitions (see equation 19). Similarly, the effective probability flux between T3 and R4 is given by the fluxes involved in the T3 ⇔ T4 and T4 ⇔ R4 transitions.
Step 2: eliminate state T1 using the “Y − Δ” transformation so that states T0, R1, and T2 form a “Δ” loop (Figure C1c). In this and the following steps, the effective probability fluxes between various states in the loop can be calculated by using equation 23.
Step 3: follow step 2 to eliminate state R2 so that R1, R3, and T2 states form a “Δ” loop (Figure C1d).
Step 4: convert the “Y” branch composed of states T0, R1, R3, and T2 to a “Δ” loop involving T0, R3, and T2 to eliminate R1 (Figure C1e).
Step 5: eliminate T2 from the “Y” chain composed of T0, T2, T3, and R3 (Figure C1f).
Step 6: eliminate T3 by converting the “Y” branch involving states T0, R3, R4, and T3 to reach the final “Δ” loop having T0, R3, and R4 states (Figure C1g).
One can use this procedure for other complex networks.
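Steps of this kind can be automated by a single node-elimination rule that covers both the series case (two neighbours) and the “Y − Δ” case (three neighbours). The sketch below applies the standard circuit-theory rule to a network stored as a neighbour map and reproduces steps 1 and 2 on a fragment of the MWC network. The helper names `network` and `eliminate` and all flux values are hypothetical, and the rule is assumed to agree with equations 19 and 23.

```python
def network(edges):
    """Build the symmetric neighbour map from (node, node, flux) triples."""
    J = {}
    for a, b, f in edges:
        J.setdefault(a, {})[b] = f
        J.setdefault(b, {})[a] = f
    return J


def eliminate(J, k):
    """Remove node k from a flux network stored as {node: {neighbour: flux}}.

    Every pair (a, b) of neighbours of k gains the extra direct flux
    J[k][a] * J[k][b] / (sum of all fluxes into k), added in parallel
    to any flux already linking a and b.
    """
    legs = J[k]
    total = sum(legs.values())
    out = {n: {m: f for m, f in nb.items() if m != k}
           for n, nb in J.items() if n != k}
    names = list(legs)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            extra = legs[a] * legs[b] / total
            out[a][b] = out[a].get(b, 0.0) + extra
            out[b][a] = out[b].get(a, 0.0) + extra
    return out


if __name__ == "__main__":
    # Fragment of the MWC network with hypothetical flux values.
    J = network([("T0", "R0", 2.0), ("R0", "R1", 2.0),
                 ("T0", "T1", 3.0), ("T1", "R1", 3.0), ("T1", "T2", 3.0)])
    J = eliminate(J, "R0")     # step 1: series reduction of T0 <-> R0 <-> R1
    J = eliminate(J, "T1")     # step 2: Y-Delta, leaving a loop T0, R1, T2
    for node, neighbours in J.items():
        print(node, neighbours)
```

Storing the network by node name avoids index bookkeeping when several eliminations are chained, as in steps 1 to 6.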
Appendix D
In this Appendix, we calculate the expected number of random numbers required to simulate one transition of the system from state A to B and back to A in the full and reduced models using Gillespie’s Algorithm (Gillespie, 1976). If the system is in state t in the full model, then the probability of transition from t to A, pA, and t to B, pB, are given by
(D.1) |
and pB = 1 − pA. Transition from t to either A or B is a Bernoulli process. The probability of making n transitions to A followed by one transition to B is pA^n pB. The expected number of transitions from t to A before reaching B is
\[ \langle N_{tA} \rangle = \sum_{n=0}^{\infty} n\, p_A^{\,n}\, p_B = \frac{p_A}{p_B} \tag{D.2} \]
The number of transitions from A to t is NAt = NtA + 1, so <NAt> = <NtA> + 1 = pA/pB + 1. The total number of transitions out of t, Nt, is Nt = NtA + 1 (the final transition is to B). Each transition out of A requires one random number (the dwell time, since t is the only destination), whereas each transition out of t requires two (the dwell time and the choice of destination). The expected number of random numbers required to simulate the transition of the system from state A to B through t is therefore
\[ \langle N_{A \to B} \rangle = \langle N_{At} \rangle + 2 \langle N_{t} \rangle = 3\left(\frac{p_A}{p_B} + 1\right) \tag{D.3} \]
Similarly the expected number of random numbers needed to simulate the transition of the system from state B to A through t is 3(pB/pA + 1). Thus the expected number of random numbers, Nrand, needed to simulate one transition from state A to t to B and from state B to t to A in the full model is Nrand = 3(pA/pB + pB/pA + 2). The number of random numbers needed to simulate one transition from state A to B and from B to A in the reduced model is 2. The ratio of the expected number of random numbers needed to simulate one transition of the system from state A to B and back to A using the full and reduced models is 3(pA/pB + pB/pA + 2)/2, which has a minimum of 6 and is unbounded above.
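The counting argument of this Appendix can be checked with a short Monte Carlo experiment. The sketch below assumes that leaving A or B costs one random number (the dwell time) and leaving t costs two (the dwell time and the destination); all rate constants are set to 1 because only the number of draws matters, and the function names are hypothetical.

```python
import random


def count_random_numbers(p_a, rng):
    """Random numbers drawn for one A -> B -> A round trip in the full model."""
    draws = 0
    state, visited_b = "A", False
    while True:
        if state in ("A", "B"):
            rng.expovariate(1.0)               # dwell time in A or B
            draws += 1
            state = "t"                        # t is the only destination
        else:
            rng.expovariate(1.0)               # dwell time in t
            goes_to_a = rng.random() < p_a     # destination of the jump
            draws += 2
            state = "A" if goes_to_a else "B"
            if state == "B":
                visited_b = True
            elif visited_b:                    # back in A after visiting B
                return draws


def expected_draws(p_a):
    """Closed-form expectation 3 (pA/pB + pB/pA + 2) for the full model."""
    p_b = 1.0 - p_a
    return 3.0 * (p_a / p_b + p_b / p_a + 2.0)


if __name__ == "__main__":
    rng = random.Random(0)
    n = 20000
    for p_a in (0.5, 0.8, 0.95):
        mean = sum(count_random_numbers(p_a, rng) for _ in range(n)) / n
        print(p_a, mean, expected_draws(p_a))
```

The simulated averages should track the closed-form expression, with the cost growing as pA moves away from 1/2, while the reduced model always needs two draws per round trip.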
Footnotes
Note that the term “low occupancy” does not imply that the occupancy of a low occupancy state is less than that of all the high occupancy states under all conditions. Rather, it means that the occupancy of low occupancy states is negligible compared to at least one of the main states under all conditions. See Figure 5 inset for an illustration.
Here we show that the Hill limit corresponds to c → 0, Λ → ∞, and Λc^4 → 0 so that . For finite x both and go to zero as c → 0, Λ → ∞. If x diverges in the Hill limit it either diverges slower, the same as, or faster than Λ^(1/4). We treat these cases one by one.
If x diverges slower than Λ^(1/4) we write where 0 < α < 1. Then and . If cx remains finite as Λ → ∞ then so that does not converge to thus we must have cx → 0. If cx → 0 then .
If x diverges as Λ^(1/4) we write . Then . If cx remains finite as Λ → ∞ then so that does not converge to unless cx → 0. If cx → 0 then cΛ^(1/4) → 0 ⇒ Λc^4 → 0.
If x diverges faster than Λ^(1/4) we write where 0 < α. Then and . If cx diverges as Λ → ∞ then . Note that c goes to zero faster than Λ^(−1/4) so that if Λc^3 diverges it must diverge slower than Λ^(1/4). Since x is diverging faster than Λ^(1/4), it follows that Λc^3/x → 0. Thus we have that .
Thus for all x if and only if Λc^4 → 0.
References
- Akaike H. A new look at the statistical model identification. IEEE Trans Automatic Control. 1974;AC-19:716–23.
- Akers S Jr. The use of wye-delta transformations in network simplification. Operations Research. 1960:311–323.
- Bruno W, Yang J, Pearson J. Using independent open-to-closed transitions to simplify aggregated Markov models of ion channel gating kinetics. Proc Natl Acad Sci USA. 2005;102:6326. doi: 10.1073/pnas.0409110102.
- Colquhoun D. Agonist-activated ion channels. British J Pharmacol. 2006;147:S17–S26. doi: 10.1038/sj.bjp.0706502.
- Deng K, Sun Y, Mehta P, Meyn S. An information-theoretic framework to aggregate a Markov chain. American Control Conference, ACC’09; IEEE; 2009. pp. 731–736.
- Fredkin D, Montal M, Rice J. Theory of Markov chains. Proc Berkeley Conf in Honor of Jerzy Neyman and Jack Kiefer. 1985;1:269–289.
- Gillespie D. A general method for numerically simulating the stochastic time evolution of coupled chemical reactions. Journal of Computational Physics. 1976;22:403–434.
- Huisinga W, Meyn S, Schütte C. Phase transitions and metastability in Markovian and molecular systems. Annal App Prob. 2004;14:419–458.
- Kennelly A. Equivalence of triangles and stars in conducting networks. Electrical World and Engineer. 1899;34:413–414.
- Knudsen H, Fazekas S. Robust algorithm for random resistor networks using hierarchical domain structure. Journal of Computational Physics. 2006;211:700–718.
- Kolmogorov A. Theory of Markov chains. Annal Mathematics. 1936;112:155–160.
- Lindstedt A. Beitrag zur integration der differentialgleichungen der storungs-theorie. Memoires de l’Academie Imperiale des sciences de St.-Petersbourg, VII serie 31. 1882.
- Monod J, Wyman J, Changeux JP. On the nature of allosteric transitions: A plausible model. J Mol Biol. 1965;12:88–118. doi: 10.1016/s0022-2836(65)80285-6.
- Norris J. Markov chains. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge: Cambridge University Press; 1997.
- Onsager L. Reciprocal relations in irreversible processes I. Phy Rev. 1931;37:405–426.
- Schwarz G. Estimating the dimension of a model. The Annals of Statistics. 1978;6:461–464.
- Stewart W. Numerical solution of Markov chains. CRC; 1991.
- Ullah G, Mak DOD, Pearson J. A data-driven model of a modal gated ion channel: The inositol 1,4,5-trisphosphate receptor in insect sf9 cells. J Gen Physiol. doi: 10.1085/jgp.201110753. in press.
- Van Lier M, Otten R. Planarization by transformation. Circuit Theory, IEEE Transactions on. 1973;20:169–171.
- Yang J, Bruno WJ, Hlavacek WS, Pearson JE. On imposing detailed balance in complex reaction mechanisms. Biophys J. 2006;91:1136–1141. doi: 10.1529/biophysj.105.071852.