Abstract
Many processes in biology and chemistry involve multi-step reactions or transitions. The kinetic data associated with these reactions are manifested by superpositions of exponential decays that are often difficult to dissect. Two major challenges have hampered the kinetic analysis of multi-step chemical reactions: (1) reliable and unbiased determination of the number of reaction steps, and (2) stable reconstruction of the distribution of kinetic rate constants. Here, we introduce two numerically stable integral transformations to solve these two challenges. The first transformation enables us to deduce the number of rate-limiting steps from kinetic measurements, even when each step has arbitrarily distributed rate constants. The second transformation allows us to reconstruct the distribution of rate constants in the multi-step reaction using the phase function approach, without fitting the data. We demonstrate the stability of the two integral transformations by both analytic proofs and numerical tests. These new methods will help provide robust and unbiased kinetic analysis for many complex chemical and biochemical reactions.
Keywords: Reaction kinetics, exponential decay, rate constant, single molecule study
1. Introduction
Multi-step reaction models are central to the mechanistic understanding of various fields of chemistry: organic,1 atmospheric,2 biophysical,3–5 etc. For instance, an enzymatic reaction may be described by several sequential steps, including substrate binding, catalytic reaction, and product release;4 an ion channel may change its conformation through multi-step allosteric transitions;5 the movement of molecular motors requires ATP binding, ATP hydrolysis, and ADP release, with one of these steps linked to the powerstroke of the motor;3 proteins6,7 and RNA molecules8–10 may fold/unfold via complicated multi-stage pathways on a rugged energy landscape. A broad spectrum of experimental methods has been devised for probing these complex chemical reactions, and these methods can be loosely classified as either ensemble measurements or single-molecule measurements. In the former case, one tracks the time course of the concentration of a certain chemical in the bulk; in the latter case, one tracks the reaction course of individual molecules one by one. The experimental data arising from these measurements often encode the kinetic information in the form of a superposition of many exponential decay modes, and the decay rate constants and the coefficients of these decay modes contain information about the underlying chemical reaction scheme.
It is often critical to dissect the multi-step reaction kinetics quantitatively from experimental data, to determine the rate constant distribution of each step and to investigate the dependence of these rate constants on external conditions. For example, the rate constants can be affected by the concentration of substrates targeting a specific enzyme, the trans-membrane potential acting on an ion channel, the force load on a molecular motor, etc. Such dependence may elucidate the mechanism of the biological systems of interest. As individual biological molecules have been brought under close scrutiny in recent single-molecule experiments, their kinetic behaviors have often been found to involve non-trivial distributions of reaction (or transition) rate constants (see, for example, refs 7–9,11–15). These experiments suggest not only that biological processes may consist of multiple steps, but also that a seemingly two-step reaction may involve heterogeneous kinetics that are better described by a distribution of kinetic rate constants than by a single rate constant, further complicating the analysis of biological processes that exhibit complex chemical kinetics.
The analysis of multi-step reaction kinetics is challenging in two respects. The first is to reliably determine the number of reaction steps from kinetic data. The number of reaction steps has previously been estimated by computing the “randomness parameter”16–19 from the kinetic data or by fitting the data to a Γ-distribution function.18,20 The “randomness parameter” approach, however, generally yields a lower bound on the total number of rate-limiting steps, and the estimate becomes exact only when all reaction steps have nearly identical rate constants.16–18 Similarly, the validity of a Γ-distribution fit also relies on identical rate constants or a Hill function response,21 which may not hold in general. The second challenge is the stable determination of the distribution of kinetic rate constants. Technically, this means solving an inverse Laplace problem, which suffers from a well-known numerical instability (i.e. hypersensitivity to input noise).22 In general practice, a discrete distribution of rate constants is typically deduced from Levenberg-Marquardt fitting or hidden Markov modeling,5,23,24 although these methods may suffer from a relatively arbitrary choice of the number of fitting parameters. Continuous distributions of rate constants have often been handled by stabilizing the numerical inversion of the Laplace transform. Tikhonov regularization25,26 and the maximum entropy method27,28 are two popular stabilization methods; they stabilize the inverse Laplace transform by penalizing the curvature26 or the information entropy27,28 of the continuous distribution profile, and may therefore distort the rate constant distribution, especially when the distribution function does not exhibit uniform ruggedness.
We have recently developed a phase function method to overcome the instability of the inverse Laplace problem, handling both discrete and continuous distributions of rate constants without prior knowledge input or artificial penalization.29 Nonetheless, the phase function method derives its viability and numerical stability from a strong constraint: the combination coefficients of the decay modes must all be non-negative real numbers, a condition satisfied by single-step reaction kinetics but generally violated by multi-step kinetics.
To address the aforementioned challenges in the kinetic analysis of multi-step reactions, we introduce two numerically stable integral transformations in this paper. The first integral transformation enables us to reliably recover the number of reaction steps from noisy kinetic data. This method applies to general multi-step reactions, in which the rate constants of individual reaction steps can take arbitrary, non-identical distributions. The second integral transformation allows us to convert a multi-step kinetic data set to a corresponding single-step kinetic data set with a distribution of rate constants. We can thus utilize the “phase function method”29 to handle multi-step reaction kinetics as well.
The paper is organized as follows. In section 2, the mathematical formulation of the data analysis is presented, with detailed descriptions of the two aforementioned integral transformations. In section 3, the numerical implementations of the theory and related discussion are presented. The paper is concluded in section 4. Although the examples presented in this paper are motivated by biophysical/biochemical processes, with a particular emphasis on interpreting single-molecule experiments, which have recently been shown to reveal rich kinetics and dynamics of biological systems, we expect the numerical methods presented here to be generally applicable to the analysis of multi-step reaction kinetics arising from many other chemical processes.
2. Theoretical Methods
2.1 Kinetic Data in Sequential Processes with m Reaction Steps
A typical m-step chemical reaction takes the following scheme:
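$$ A_1 \xrightarrow{k_1} A_2 \xrightarrow{k_2} \cdots \xrightarrow{k_{m-1}} A_m \xrightarrow{k_m} B \tag{1} $$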
According to eq 1, any molecule that starts from the A1 state has to pass through (m − 1) intermediate states A2, …, Am before it is converted into the B state. The m-step reaction thus consists of m sequential tasks, completed with rate constants k1, k2, …, km. Often, the observable properties of the A states are experimentally indistinguishable, i.e. the states A1, …, Am may give similar fluorescence/mechanical signals in a single-molecule experiment. This degeneracy makes it difficult to fully dissect the reaction kinetics by experimental means.
An experimentally accessible quantity for a molecule that starts from the A1 state is the time it takes to finish all m tasks and reach the B state. Defining this as the “reaction time” t (the total dwell time in all the A-like states), t is the sum of m random durations, t = t1 + t2 + ··· + tm, where ti is the dwell time the molecule spends in state Ai. The random variable ti is governed by the rate constant ki and thus follows the exponential decay law Prob{ti > t} = e−kit, t ≥ 0. The probability density for ti is therefore $k_i e^{-k_i t}$, t > 0, and the probability density for t = t1 + t2 + ··· + tm is a convolution of m factors:
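$$ p_{k_1k_2\cdots k_m}(t) = \left(k_1e^{-k_1t}\right) * \left(k_2e^{-k_2t}\right) * \cdots * \left(k_me^{-k_mt}\right) \tag{2} $$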
where the convolution “*” is defined as $(f*g)(t) = \int_0^t f(t-t')\,g(t')\,dt'$. This probability density function p(t) will henceforth be referred to as the “overall kinetic data” or simply “kinetic data”, from which we will try to dissect the multi-step reaction kinetics and determine the number of reaction steps m as well as the rate constants k1, k2, …, km of the individual steps.
A specific case of eq 1 is the situation where k1 = k2 = ··· = km = k:
$$ A_1 \xrightarrow{k} A_2 \xrightarrow{k} \cdots \xrightarrow{k} A_m \xrightarrow{k} B \tag{3} $$
In this case, the kinetic data p(t) is simply the well-known Γ-distribution:
$$ p_{kk\cdots k}(t) = \frac{k^m t^{m-1}}{(m-1)!}\,e^{-kt} \tag{4} $$
This functional form is widely used in homogeneous sequential kinetic models in physical chemistry. Two notable examples are photon counting processes and the reactions of processive enzymes. In the latter context, an enzyme exhibiting the reaction scheme of eq 3 is often referred to as a “Poisson enzyme”.16 Hence, we will refer to the reaction scheme of eq 3 as the “Poisson enzyme reaction” hereafter.
Despite the apparent difference between the reaction schemes of eq 1 and eq 3 (and their respective kinetic data, eq 2 and eq 4), they are intrinsically connected: eq 2 is related to eq 4 through a weighted superposition:
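$$ p_{k_1k_2\cdots k_m}(t) = \int_0^\infty \frac{k^m t^{m-1}}{(m-1)!}\,e^{-kt}\,d\mu_{k_1k_2\cdots k_m}(k) \tag{5} $$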
where μk1k2···km(k) is a non-decreasing function of k, bounded by 0 and 1. Physically, this means that the prototypical m-step reaction can always be realized by superposing many “Poisson enzyme reactions” (eq 3) with a weighted distribution of k.
We defer the rigorous derivation of eq 5 and the explicit form of μk1k2···km (k) to Supporting Information A.1. Here, we will only illustrate the validity of eq 5 for the specific case m = 2, where we have the following result:
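$$ p_{k_1k_2}(t) = \left(k_1e^{-k_1t}\right)*\left(k_2e^{-k_2t}\right) = \begin{cases} k_1k_2\,\dfrac{e^{-k_1t}-e^{-k_2t}}{k_2-k_1}, & k_1\neq k_2,\\[1ex] k^2\,t\,e^{-kt}, & k_1=k_2=k. \end{cases} \tag{6} $$
Noting that the identity $\dfrac{e^{-k_1t}-e^{-k_2t}}{k_2-k_1} = t\int_0^1 e^{-[\xi k_1+(1-\xi)k_2]t}\,d\xi$ is valid for k1 ≠ k2 and, with its left-hand side read as the limit t e−kt, also for k1 = k2 = k, we can cast pk1k2(t) in eq 6 into the unified form
$$ p_{k_1k_2}(t) = \left\langle k_1k_2\,t\,e^{-k_\xi t}\right\rangle_\xi = \left\langle \frac{k_1k_2}{k_\xi^2}\,k_\xi^2\,t\,e^{-k_\xi t}\right\rangle_\xi, \qquad k_\xi \equiv \xi k_1+(1-\xi)k_2 \tag{7} $$
Here, ξ is a random variable uniformly distributed on the closed interval [0,1], and ⟨·⟩ξ denotes the average over ξ. Eq 7 thus expresses pk1k2(t) as a superposition of m = 2 Gamma densities kξ² t e−kξt with weights k1k2/kξ², whose ξ-average equals 1.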
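The m = 2 identity can be checked numerically. The sketch below (an illustration only; the function names are hypothetical) compares the closed form of eq 6 with the ξ-average of eq 7, evaluated with a midpoint rule over ξ ∈ [0, 1], and also checks that the Gamma weights k1k2/kξ² average to 1, which is what keeps μk1k2(k) bounded by 1.

```python
import numpy as np

def conv_two_exponentials(t, k1, k2):
    """Closed form of (k1 e^{-k1 t}) * (k2 e^{-k2 t}), i.e. eq 6."""
    if np.isclose(k1, k2):
        return k1 ** 2 * t * np.exp(-k1 * t)
    return k1 * k2 * (np.exp(-k1 * t) - np.exp(-k2 * t)) / (k2 - k1)

def xi_nodes(n=20_000):
    """Midpoints of n equal sub-intervals of [0, 1]: a quadrature for xi ~ U[0, 1]."""
    return (np.arange(n) + 0.5) / n

def uniform_xi_average(t, k1, k2, n=20_000):
    """<k1 k2 t exp(-k_xi t)>_xi with k_xi = xi k1 + (1 - xi) k2 (eq 7)."""
    xi = xi_nodes(n)
    k_xi = xi * k1 + (1.0 - xi) * k2
    return np.mean(k1 * k2 * t * np.exp(-k_xi * t))

def mean_weight(k1, k2, n=20_000):
    """<k1 k2 / k_xi^2>_xi: the total weight of the Gamma components in eq 7."""
    xi = xi_nodes(n)
    k_xi = xi * k1 + (1.0 - xi) * k2
    return np.mean(k1 * k2 / k_xi ** 2)

for t in (0.1, 1.0, 3.0):
    print(t, conv_two_exponentials(t, 0.5, 2.0), uniform_xi_average(t, 0.5, 2.0))
print(mean_weight(0.5, 2.0))  # close to 1
```

The two columns printed for each t should agree to quadrature accuracy, for distinct as well as equal rate constants.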
We can generalize eq 5 to reaction schemes that are more complex than eq 1. For instance, we may allow the rate constant kj of the jth step to be drawn from a random distribution instead of taking a fixed value. In the reaction scheme
$$ A_1 \xrightarrow{k_1\sim F_1} A_2 \xrightarrow{k_2\sim F_2} \cdots \xrightarrow{k_{m-1}\sim F_{m-1}} A_m \xrightarrow{k_m\sim F_m} B $$
the random variable kj obeys the cumulative distribution Fj(k) ≡ Prob{kj < k}. In this case, the corresponding kinetic data take the form:
$$ p(t) = \int_0^\infty \frac{k^m t^{m-1}}{(m-1)!}\,e^{-kt}\,d\mu(k) \tag{8} $$
Here, μ(k) = ∫dF1 (k1)··· ∫dFm (km)μk1···km (k) is still a non-decreasing function of k, bounded by 0 and 1. Therefore, the “Poisson enzyme reactions” are elementary building blocks of arbitrarily complicated m-step sequential reactions.
From eq 8, we make two observations:
The analysis of m-step sequential reactions boils down to the reconstruction of a bounded and non-decreasing function μ(k).
The number of reaction steps m is embedded in the t → 0+ behavior of p(t): for any non-decreasing function μ(k), we have p(t) ∝ tm−1 as t → 0+.
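The second observation follows directly from eq 8. Pulling the t-dependence out of the integral gives the short worked step below; the limit assumes that the moment ∫k^m dμ(k) is finite (as it is, for instance, when μ(k) stops increasing beyond some largest rate constant) and that μ(k) carries some weight at k > 0, so that the prefactor is a nonzero constant.

$$ p(t) = \frac{t^{m-1}}{(m-1)!}\int_0^\infty k^m e^{-kt}\,d\mu(k) \;\longrightarrow\; \frac{t^{m-1}}{(m-1)!}\int_0^\infty k^m\,d\mu(k) \qquad (t\to 0^+) $$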
Motivated by these observations, we propose to analyze the sequential kinetic data with four sequential steps:
Performing a “heat capacity transform” Ĉ on the kinetic data p(t) and deducing the number of reaction steps m by exploiting the asymptotic behavior p(t) ∝ tm−1, t → 0+;
Applying an “m-to-1 transformation” Ĵm to the kinetic data, and casting it into the form ℘(t) = (Ĵm p)(t) = ∫ke−kt dμ(k);
Reconstructing the non-decreasing function μ(k) from the 1-step kinetic data ℘(t) = ∫ke−kt dμ(k) using the “phase function approach”;29
Extracting kinetic information from the reconstructed μ(k).
2.2 Deriving the Number of Reaction Steps through the “Heat Capacity Transform”
In the following, we will develop a method to deduce the number of reaction steps m from kinetic data as shown in eq 8.
The function p(t) has a universal asymptotic behavior p(t) ∝ tm−1, t → 0+, so one may naively guess that m can be deduced by computing the limiting behavior of d ln p(t)/d ln t, as t → 0+. In reality, however, the estimate of d ln p(t)/d ln t can encounter the “ln0” instability when noise is superimposed on the kinetic data p(t). Here, we will suppress such instability by applying an integral transform to the kinetic data p(t).
Before presenting the integral transform, we first quantify the noise that affects the detection of p(t). Taking single-molecule experiments as an example, the kinetic data are often presented as a histogram, where the count nτ(t) in the bin [t, t + τ) is the number of molecules that reach the final state B in the time interval [t, t + τ). The count nτ(t) scales asymptotically with the probability density p(t):
$$ \langle n_\tau(t)\rangle \approx N\tau\,p(t) \tag{9} $$
Here, N = Σ(τ) nτ(t) is the total number of molecules counted and “Σ(τ)” denotes summation over all the time bins (i.e. over the entire observation time window). The uncertainty in p(t) arises from the Poisson counting noise and thus the variance of nτ(t) is equal to its mean:
$$ \langle[\delta n_\tau(t)]^2\rangle = \langle n_\tau(t)\rangle \approx N\tau\,p(t) \tag{10} $$
From this, it is clear that we cannot naively estimate m ≈ 1 + d ln p(t)/d ln t as t → 0+: for multi-step reactions (m > 1), nτ(t) ∝ p(t) ∝ tm−1 → 0 as t → 0+, so the relative Poisson noise in nτ(t) diverges exactly where the limit has to be taken.
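A one-line estimate makes this instability explicit. Combining eqs 9 and 10 with p(t) ∝ tm−1 gives the relative error of a single histogram count:

$$ \frac{\sqrt{\langle[\delta n_\tau(t)]^2\rangle}}{\langle n_\tau(t)\rangle} \approx \frac{1}{\sqrt{N\tau\,p(t)}} \propto t^{-(m-1)/2} \;\xrightarrow{\;t\to0^+\;}\; \infty \qquad (m>1) $$

so the error in ln nτ(t), and hence in any finite-difference estimate of d ln p(t)/d ln t, grows without bound in the same regime where d ln p(t)/d ln t → m − 1 must be read off.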
To overcome this numerical instability, we exploit the following transform:
(11) |
which is the discretized version of the corresponding integral transform (Ĉp)(k) acting on p(t). The transform Ĉ significantly suppresses the noise δnτ(t), and the asymptotic behavior of (Ĉnτ)(k) is given by
$$ (\hat{C}n_\tau)(k) \propto \frac{1}{k^m}, \qquad k \to +\infty \tag{12} $$
(Supporting Information B provides an analytic derivation of eq 12.) Therefore, we can derive the number of reaction steps, m, from a numerical estimate of the following limit:
$$ m = -\lim_{k\to+\infty}\frac{d\ln(\hat{C}n_\tau)(k)}{d\ln k} \tag{13} $$
Due to the Poisson noise, 〈[δnτ(t)]2〉 = 〈nτ(t)〉 ≈ Nτp(t), the uncertainty in (Ĉnτ)(k) is given by:
(14) |
(See Supporting Information B for a proof of eq 14.) Eq 14 involves the cumulative count of molecules with reaction time smaller than t.
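One may gain more physical insight into the transform Ĉ by comparing it to Debye’s formula, which links the phonon spectrum D(ν) of a solid to its heat capacity CV:
$$ C_V = k_B\int_0^\infty D(\nu)\,\frac{(h\nu/k_BT)^2\,e^{h\nu/k_BT}}{\left(e^{h\nu/k_BT}-1\right)^2}\,d\nu $$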
where h is the Planck constant, ν is the phonon frequency, kB is the Boltzmann constant, and T is the absolute temperature. For an m-dimensional solid, the phonon spectrum is D(ν) ∝ νm−1 in the low-frequency limit ν → 0+ (analogous to p(t) ∝ tm−1 as t → 0+), and accordingly the heat capacity goes as CV ∝ Tm in the low-temperature limit T → 0+ (analogous to the asymptotic behavior (Ĉnτ)(k) ∝ 1/km as k → +∞, with k playing the role of 1/T). Just as the often rugged phonon spectrum is transformed into a smooth heat capacity curve as a function of temperature, the relatively noisy p(t) is transformed into a relatively smooth (Ĉnτ)(k), owing to the low-pass filtering property of the transformation.30 Hence, we refer to the integral transform Ĉ as the “heat capacity transform”.
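The explicit form of eq 11 is not reproduced in this text, so the sketch below assumes, purely for illustration, the Debye/Einstein weight x² eˣ/(eˣ − 1)² suggested by the analogy, with x = kt. This assumed kernel reproduces the two limits stated in the text: (Ĉnτ)(k) → N as k → 0 and (Ĉnτ)(k) ∝ 1/km as k → +∞. The helper names are hypothetical, and the fixed-step derivative below is simpler than the noise-adaptive window of eq 24.

```python
import numpy as np

def debye_kernel(x):
    """Assumed Debye/Einstein-type weight x^2 e^x / (e^x - 1)^2.

    Written as x^2 e^{-x} / (1 - e^{-x})^2 so that large x cannot overflow.
    Tends to 1 as x -> 0+ and decays like x^2 e^{-x} for large x.
    Requires x > 0 (x = 0 would give 0/0)."""
    x = np.asarray(x, dtype=float)
    return x ** 2 * np.exp(-x) / np.expm1(-x) ** 2

def heat_capacity_transform(t, counts, k_values):
    """Hypothetical discretized transform C(k) = sum_t counts(t) * kernel(k t).

    t: positive bin positions; counts: histogram counts n_tau(t)."""
    t = np.asarray(t, dtype=float)
    counts = np.asarray(counts, dtype=float)
    return np.array([np.sum(counts * debye_kernel(k * t))
                     for k in np.atleast_1d(k_values)])

def local_step_number(t, counts, k, factor=1.1):
    """m(k) = -d ln C / d ln k from a symmetric finite difference in ln k.

    Uses a fixed step ln(factor), not the noise-adaptive window of eq 24."""
    c_up, c_down = heat_capacity_transform(t, counts, [k * factor, k / factor])
    return -(np.log(c_up) - np.log(c_down)) / (2.0 * np.log(factor))

# Noise-free check: finely binned Gamma density (eq 4) with m = 3, k = 1, N = 8000.
dt = 1e-4
t_grid = (np.arange(600_000) + 0.5) * dt      # bin midpoints on (0, 60)
counts = 8000 * dt * t_grid ** 2 * np.exp(-t_grid) / 2.0

print(heat_capacity_transform(t_grid, counts, 1e-3))   # close to N = 8000
print(local_step_number(t_grid, counts, 200.0))        # close to m = 3
```

On noise-free data the local slope approaches m only slowly (here the finite-k correction is of order a few percent at k = 200); with noisy histograms the usable range of k is further capped, which is what eq 25 addresses.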
2.3 Reducing Multi-Step Kinetics to 1-Step Kinetics by the Integral Transform Ĵm
Once the number of reaction steps m is deduced, we further extract the kinetic information of a multi-step reaction by reconstructing the non-decreasing distribution function μ(k), which is related to the kinetic data p(t) through eq 8:
$$ p(t) = \int_0^\infty \frac{k^m t^{m-1}}{(m-1)!}\,e^{-kt}\,d\mu(k) $$
The distribution function μ(k) can further provide chemical information after we combine it with additional inputs about sequential reaction schemes. For the m = 1 case, one can stably reconstruct μ(k) from p(t) using the phase function method.29 To reconstruct μ(k) in the m > 1 case, we need to transform p(t) to the form of “1-step kinetics”.
Such an “m-to-1 transform” Ĵm can be realized as follows:
(15) |
This linear transform converts p(t) into ℘(t) = (Ĵm p)(t) ≡ ∫ke−kt dμ(k). The integral transform Ĵm is numerically stable, i.e. any small perturbation f(t) = δp(t) to the kinetic data p(t) remains small after being transformed to (Ĵm f)(t). Quantitatively, one can verify an inequality of the form ‖Ĵm f‖ ≤ cm‖f‖ (see the derivations in Supporting Information C), where the noise amplification ratio cm satisfies
(16) |
We note that the reconstruction of μ(k) from 1-step kinetic data also has a finite noise amplification ratio (see ref 29). Thus, the mapping from m-step kinetic data to the non-decreasing function μ(k) is numerically stable, with an overall noise amplification ratio bounded by the product of cm and the amplification ratio of the 1-step reconstruction.
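The explicit kernel of eq 15 is not reproduced in this text. As an illustration of what an m-to-1 transform has to do, the sketch below uses one kernel that we can verify has the required property (it is an assumption and not necessarily eq 15): (Ĵm p)(t) = (m − 1)∫₀^∞ u^(m−2) (t + u)^(1−m) p(t + u) du for m ≥ 2. A direct calculation shows that it maps every Gamma kernel k^m t^(m−1) e^(−kt)/(m − 1)! onto k e^(−kt), and, by exchanging the order of integration, that ∫|Ĵm f| dt ≤ ∫|f| dt, so it does not amplify noise in that norm. The test profile μ(k) is hypothetical.

```python
import math
import numpy as np

def m_to_1_transform(p, t, m, u_max=60.0, n=200_001):
    """Hypothetical m-to-1 transform (an assumption, not necessarily eq 15):

        (J_m p)(t) = (m - 1) * int_0^inf u^(m-2) (t + u)^(1-m) p(t + u) du,  m >= 2,

    evaluated with the trapezoidal rule on u in [0, u_max]. Requires t > 0."""
    if m == 1:
        return p(t)
    u = np.linspace(0.0, u_max, n)
    s = t + u
    f = u ** (m - 2) * s ** (1 - m) * p(s)
    return (m - 1) * np.sum((f[1:] + f[:-1]) * np.diff(u)) / 2.0

def gamma_density(k, m):
    """Poisson-enzyme kinetic data of eq 4, returned as a function of t."""
    return lambda t: k ** m * t ** (m - 1) * np.exp(-k * t) / math.factorial(m - 1)

# Hypothetical discrete mu: weight 0.3 at k = 1 and 0.7 at k = 5, with m = 4 (eq 8).
def p_mix(t):
    return 0.3 * gamma_density(1.0, 4)(t) + 0.7 * gamma_density(5.0, 4)(t)

for t in (0.3, 1.0, 2.5):
    expected = 0.3 * 1.0 * np.exp(-t) + 0.7 * 5.0 * np.exp(-5.0 * t)  # int k e^{-kt} dmu
    print(t, m_to_1_transform(p_mix, t, 4), expected)
```

The transformed data take the single-step form ∫ke−kt dμ(k) with the same μ(k), which is the input the phase function approach expects.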
2.4 Reconstructing μ(k) from the Kinetic Data p(t) Using the “Phase Function Approach”
We have previously developed a “phase function method” to reconstruct a non-decreasing μ(k) from 1-step reaction kinetics ℘(t) = (Ĵm p)(t) ≡ ∫ke−kt dμ(k).29 Now that we have reduced the m-step reaction kinetics to 1-step reaction kinetics by the “m-to-1 transform” Ĵm, we can use the phase function approach to deduce μ(k). The main idea is to regard the exponential decay e−kt as an oscillation eiωt with a purely imaginary frequency ω = ik. We can thus decompose the kinetic data into a superposition of oscillation modes using the Fourier transform and then convert information on the Re ω axis to the Im ω axis (the k-axis).29
The “phase function approach” reconstructs a non-decreasing μ(k) in three steps as outlined below:29
Perform a Fourier transform of the data ℘(t), $\tilde{\wp}(\omega) = \int_0^\infty \wp(t)\,e^{-i\omega t}\,dt$, for complex-valued frequency ω = Reω + i Imω satisfying Imω ≤ 0, and determine the phase function $\varphi(\omega) = \operatorname{Im}\ln\tilde{\wp}(\omega)$.
Since φ(ω) is the imaginary part of the analytic function ln ℘̃(ω), it must satisfy the two-dimensional Laplace equation ∂²φ(ω)/∂(Reω)² + ∂²φ(ω)/∂(Imω)² = 0. We can thus numerically solve this Laplace equation to deduce φ(ω) for all Reω ≠ 0, which satisfies29
(17)
We can then obtain the phase function on the imaginary axis, φ(ik) ≡ limε→0+ φ(ik − ε), which contains all the information necessary for the reconstruction of μ(k) and allows μ(k) to be derived using the following transform R̂:
(18)
The non-decreasing property of μ(k) and the boundedness of the associated phase function 0 ≤ φ(ik) ≤ π ensure the numerical stability of this algorithm against noise in the time domain data.29 The uncertainty in μ(k) = (R̂φ)(k) is estimated by:
(19) |
where ε# is the noise level (t-domain relative error) in the kinetic data ℘(t).
2.5 From μ(k) to Kinetic Information of Interest
From μ(k), we can compute a frequency spectrum of the kinetic data p(t) as:
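$$ \tilde{p}(\omega) = (\hat{\Omega}^m\mu)(\omega) \equiv \int_0^\infty \left(1+\frac{i\omega}{k}\right)^{-m} d\mu(k) \tag{20} $$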
The function p̃(ω) is referred to as the “frequency spectrum” because it coincides with the Fourier transform of p(t): $\tilde{p}(\omega) = \int_0^\infty p(t)\,e^{-i\omega t}\,dt$.
This spectral representation p̃(ω) = (Ω̂m μ)(ω) can help reveal kinetic information that is implicitly embedded in the non-decreasing function μ(k), as shown below.
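For a discrete μ(k) the equivalence between eq 20 and the Fourier transform of eq 8 is easy to check numerically. The sketch below uses a hypothetical two-atom μ(k) and assumes the e−iωt sign convention written above; it evaluates (Ω̂m μ)(ω) directly and compares it with a trapezoidal Fourier transform of the corresponding p(t) at a few real frequencies.

```python
import math
import numpy as np

def frequency_spectrum(omega, k_atoms, weights, m):
    """(Omega^m mu)(omega) = sum_i w_i (1 + i omega / k_i)^(-m) for a discrete mu
    that jumps by w_i at rate constant k_i (eq 20)."""
    om = np.atleast_1d(np.asarray(omega, dtype=float))[:, None]
    k = np.asarray(k_atoms, dtype=float)[None, :]
    w = np.asarray(weights, dtype=float)[None, :]
    return np.sum(w * (1.0 + 1j * om / k) ** (-m), axis=1)

def kinetic_data(t, k_atoms, weights, m):
    """p(t) of eq 8 for the same discrete mu: a weighted sum of Gamma densities."""
    tt = np.asarray(t, dtype=float)[:, None]
    k = np.asarray(k_atoms, dtype=float)[None, :]
    w = np.asarray(weights, dtype=float)[None, :]
    gamma = k ** m * tt ** (m - 1) * np.exp(-k * tt) / math.factorial(m - 1)
    return np.sum(w * gamma, axis=1)

def fourier_transform(t, p_values, omega):
    """Trapezoidal estimate of int_0^T p(t) e^{-i omega t} dt on the grid t."""
    f = p_values * np.exp(-1j * omega * t)
    return np.sum((f[1:] + f[:-1]) * np.diff(t)) / 2.0

k_atoms, weights, m = [0.5, 3.0], [0.4, 0.6], 3     # hypothetical mu(k)
t = np.linspace(0.0, 80.0, 80_001)
p = kinetic_data(t, k_atoms, weights, m)
for om in (0.5, 2.0):
    print(om, frequency_spectrum(om, k_atoms, weights, m)[0], fourier_transform(t, p, om))
```

At ω = 0 the spectrum equals the total weight of μ(k), i.e. the normalization of p(t).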
Consider the m-step sequential reaction scheme of eq 1 in which the rate constant kj obeys the probability distribution Fj(k) = Prob{kj < k} for j = 1, 2, …, m. The scheme involves no branched pathways. Chemically speaking, in this scenario we assume that the molecule has to cross an energy barrier with a distributed barrier height at each reaction step, but that the distributions of barrier heights in distinct reaction steps are independent. From the kinetic data in the t-domain, p(t) = [∫ke−kt dF1(k)] * ··· * [∫ke−kt dFm(k)], which is a convolution of m factors, one obtains the frequency spectrum in the ω-domain as the product of m single-step frequency spectra:
$$ \tilde{p}(\omega) = \prod_{j=1}^{m}\int_0^\infty \frac{k}{k+i\omega}\,dF_j(k) \tag{21} $$
Because the function p̃(ω) given in eq 21 satisfies 0 < |p̃(ω)| < +∞ for Reω < 0, it is possible to define a single-valued phase function ψ(ω) = Imln p̃(ω) as
(22) |
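(See Supporting Information A.2 for a proof of the above statement.) According to eq 21, the phase ψ(ω) further decomposes into the sum of m terms:
$$ \psi(\omega) = \sum_{j=1}^{m}\varphi_j(\omega), \qquad \varphi_j(\omega) = \operatorname{Im}\ln\int_0^\infty \frac{k}{k+i\omega}\,dF_j(k) $$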
Taking ψ(ω) to the limit Reω → 0−, we obtain:
$$ \psi(ik) = \sum_{j=1}^{m}\varphi_j(ik) $$
Evidently, this algebraic sum in the k-domain is mathematically more convenient than the original m-fold convolution in the t-domain, p(t) = [∫ke−kt dF1(k)] * ··· * [∫ke−kt dFm(k)]. Here, we note that the crucial bridge between the k- and t-domains rests on the condition 0 < |p̃(ω)| < +∞ for Reω ≠ 0; this bound on |p̃(ω)|, while rigorously true for sequential reactions of the form of eq 1 (with distributed rate constants), may not hold for branched reaction schemes. For a more general reaction, p̃(ω) may vanish at some Reω ≠ 0, leading to singularities with ln p̃(ω) = ln 0 = −∞ and ill-defined phase functions.
Naturally, the next question is how to determine the distributions of rate constants Fj(k) = Prob{kj < k} (j = 1, 2, …, m) from the phase function decomposition ψ(ik) = Σj φj(ik). Clearly, once we know each individual φj(ik), we can unambiguously reconstruct Fj(k) = (R̂φj)(k) (using eq 18). However, with the mere knowledge of one kinetic data set p(t), the best one can obtain is the sum of the m phase functions φj(ik), j = 1, 2, …, m, and the problem of recovering the m individual summands is underdetermined. To fully characterize the kinetics, we usually need additional inputs.
Here, we will discuss two often encountered circumstances in which additional knowledge can help us fully dissect the multi-step reaction kinetics using ψ(ik).
Situation 1: The distributions of kj for each step are identical
This is often assumed for the movement kinetics of motor proteins (such as kinesins31 and myosins14) and nucleic acid-translocating enzymes (such as polymerases15 and helicases32), where the elementary motion steps adopt approximately identical rate constants. Under this assumption, we can model the kinetic data p(t) by the convolution of m identical copies of single-step reaction kinetics:
$$ p(t) = \underbrace{\left[\int_0^\infty k e^{-kt}\,dF(k)\right] * \cdots * \left[\int_0^\infty k e^{-kt}\,dF(k)\right]}_{m\ \text{factors}} $$
Accordingly, the phase function of the overall sequential reaction is given by ψ(ik) = Σj φj(ik) = mφj(ik), with all individual phase functions identical: φj(ik) = ψ(ik)/m. Therefore, we can determine F(k) using the reconstruction formula eq 18: F(k) = (R̂φj)(k).
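The division φj = ψ/m presupposes that ψ is the single-valued (continuous) branch of Im ln p̃, not its principal value, which wraps at −π once m phases are added. The sketch below illustrates this on the real-frequency axis (a simplification of the Reω → 0− limit used in the text) for a hypothetical two-valued F(k) and m = 5; numpy's `unwrap` stands in for the analytic continuation, and it is reliable only while the frequency grid is fine enough that the phase changes by much less than π between samples.

```python
import numpy as np

def single_step_spectrum(omega, k_atoms, weights):
    """int k / (k + i omega) dF(k) for a discrete F (one factor of eq 21)."""
    om = np.asarray(omega, dtype=float)[:, None]
    k = np.asarray(k_atoms, dtype=float)[None, :]
    w = np.asarray(weights, dtype=float)[None, :]
    return np.sum(w * k / (k + 1j * om), axis=1)

m = 5
omega = np.linspace(0.0, 50.0, 5001)                  # real frequencies, fine grid
p1 = single_step_spectrum(omega, [1.0, 4.0], [0.5, 0.5])   # hypothetical F(k)
p_m = p1 ** m                                          # eq 21 with F_1 = ... = F_m

phi_single = np.angle(p1)           # Re p1 > 0, so this stays in (-pi/2, 0]
psi_wrapped = np.angle(p_m)         # principal value: wraps once m * phase < -pi
psi = np.unwrap(psi_wrapped)        # continuous, single-valued branch
phi_recovered = psi / m             # Situation 1: phi_j = psi / m
```

Dividing `psi_wrapped` by m instead would give a wrong single-step phase wherever the wrap occurs.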
Situation 2: The distributions of kj scale independently with external experimental conditions
In many cases, the distribution of rate constants of each step of a multi-step process can be independently tuned by external conditions. A simple example is that the rate constant of an enzyme-substrate binding process scales linearly with the substrate concentration. In these cases, we can determine not only a single phase function
$$ \psi(ik) = \varphi_1(ik) + \cdots + \varphi_m(ik) $$
but also the scaled phase function
$$ \psi_{s_1\cdots s_m}(ik) = \varphi_1(ik/s_1) + \cdots + \varphi_m(ik/s_m) $$
from the p(t) functions measured under the scaled experimental conditions. Here, s1, …, sm are the scaling factors for these conditions (substrate concentration, etc.). By choosing a sufficient number of different sets of (s1, …, sm) (i.e. a sufficient number of independent kinetic experiments), we can uniquely determine the functional forms of the individual phase functions φ1, …, φm and thus the rate constant distributions Fj(k) = (R̂φj)(k).
To flesh out this idea, we illustrate the case m = 2 with the enzymatic reaction:
$$ E + S \xrightarrow{k_1} ES \xrightarrow{k_2} E + P $$
Here, the enzyme (E) binds a substrate (S) to form an enzyme-substrate complex (ES) with a distributed binding rate constant k1 = κ1[S], where [S] is the substrate concentration. The enzyme-substrate complex (ES) then undergoes a catalytic step with a distributed rate constant k2 (F2(k) = Prob{k2 < k}) to release the product (P). Suppose that we conduct two experiments, the first at [S] = [S]0 and the second at [S] = s[S]0, where s > 1 is a scaling factor, and obtain the kinetic data sets p[1](t) and p[2](t), respectively. If the distribution of k1 in the first experiment is denoted by F1(k) = Prob{k1 < k}, then the distribution of k1 in the second experiment is F1(k/s). From p[1](t) and p[2](t) we can then reconstruct the two phase functions
$$ \psi^{[1]}(ik) = \varphi_1(ik) + \varphi_2(ik), \qquad \psi^{[2]}(ik) = \varphi_1(ik/s) + \varphi_2(ik), $$
respectively. Taking the difference between these two phase functions gives the function
$$ g(k) \equiv \psi^{[1]}(ik) - \psi^{[2]}(ik) = \varphi_1(ik) - \varphi_1(ik/s), $$
from which we can fully recover the phase function φ1(ik) as
$$ \varphi_1(ik) = \sum_{n=0}^{\infty} g\!\left(s^{-n}k\right) \tag{23} $$
This series terminates after a finite number of terms: φ1(ik) = 0 for k < kmin, where kmin is the slowest detectable rate constant (set by the finite duration of the experiment), so g(k) = 0 for k < kmin. Therefore, we can fully determine φ1(ik) and φ2(ik) = ψ[1](ik) − φ1(ik), from which we can directly determine the rate constant distributions governing the two steps of the enzymatic reaction using F1 = R̂φ1 and F2 = R̂φ2, where R̂ is again given by eq 18. This m = 2 example can be generalized to any reaction with m > 2 as long as the rate constants of all but one step can be scaled independently (for example, by tuning the concentrations of (m − 1) distinct substrates that bind to an enzyme in a sequential manner).
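The telescoping recovery of φ1(ik) from the difference g(k) can be sketched in a few lines. The profile `phi1` below is hypothetical: any function with 0 ≤ φ1 ≤ π that vanishes below kmin would do. The sum stops as soon as k/sⁿ drops below kmin, because every later term is zero.

```python
import math

def recover_phase(g, k, s, k_min):
    """phi_1(ik) = sum_{n >= 0} g(k / s**n) (eq 23), stopped once k / s**n < k_min,
    where g(k) = phi_1(ik) - phi_1(ik/s) vanishes below k_min (requires s > 1)."""
    total, kk = 0.0, k
    while kk >= k_min:
        total += g(kk)
        kk /= s
    return total

# Hypothetical test profile: 0 <= phi_1 < pi, and phi_1 = 0 below K_MIN.
K_MIN, S = 0.01, 1.7

def phi1(k):
    return math.pi * (1.0 - K_MIN / k) if k >= K_MIN else 0.0

def g(k):
    """What the two experiments provide: psi^[1](ik) - psi^[2](ik)."""
    return phi1(k) - phi1(k / S)

for k in (0.005, 0.05, 0.3, 7.0):
    print(k, recover_phase(g, k, S, K_MIN), phi1(k))
```

In practice g(k) is known only on a grid and with noise, so the number of terms (about ln(k/kmin)/ln s) also sets how many noisy values are accumulated.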
3. Numerical Results and Discussion
Here, we shall numerically illustrate the above theories and address several practical problems in the analysis of multi-step reaction kinetics:
Determine the number of reaction steps from noisy kinetic data;
Deduce the probability distribution of rate constants in each reaction step based on the assumption that the distribution is identical for all steps;
Determine the probability distributions of individual reaction steps in “enzymatic reactions” where the rate constant of each step can be independently scaled.
3.1 Finding the Number of Reaction Steps
To simulate kinetic data with realistic experimental noise (such as the noise level of single-molecule experiments), we used Monte Carlo simulations to construct histograms of overall reaction times for a variety of sequential reaction schemes of the form of eq 1, with fixed or distributed rate constants (m = 2, 3, 4 and 5) (Figures 1a, 1c, 1e and 1g). Here the overall reaction time of each molecule is defined as the dwell time in all the A states (“reactant” and “intermediate” states) before the molecule reaches the “product” state B. Each reaction step may take a fixed rate constant (Figures 1a, 1e and 1g) or one drawn from a non-trivial distribution (Figure 1c). The distributions of rate constants may be identical (Figures 1a and 1c) or distinct (Figures 1e and 1g) among the different reaction steps.
In more detail, for a given reaction scheme with rate-constant distributions F1, …, Fm, a random reaction time was simulated as the sum of m random integers, R = R1 + R2 + ··· + Rm, where Rj follows the distribution Prob{Rj > t} = ∫e−kt dFj(k). A total of N = 8000 such random integers R were simulated for each reaction scheme. The 8000 reaction times were then binned to construct a histogram nτ(t) (Figures 1a, 1c, 1e and 1g), with the bin size equal to the time resolution τ, which is taken as τ = 1 in all examples shown in this paper. This Monte Carlo approach was designed to reflect the realistic noise arising from finite time resolution and finite total counts in single-molecule experiments.
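A minimal sketch of such a simulation is shown below; it is not the authors' code. It assumes that each integer dwell Rj is obtained by rounding an exponential waiting time up to the next integer, Rj = ⌈Xj⌉ with Xj ~ Exp(kj), which reproduces Prob{Rj > t} = ∫e−kt dFj(k) at every integer t. The rate samplers and their parameters are hypothetical.

```python
import numpy as np

def simulate_reaction_times(rate_samplers, n_molecules, rng):
    """Integer reaction times R = R_1 + ... + R_m with time resolution tau = 1.

    rate_samplers[j](size, rng) draws `size` rate constants k_j from F_j.
    R_j = ceil(X_j) with X_j ~ Exp(k_j) (an assumption), which gives
    Prob{R_j > t} = int e^{-k t} dF_j(k) at every integer t >= 0."""
    total = np.zeros(n_molecules, dtype=np.int64)
    for draw_rates in rate_samplers:
        k = draw_rates(n_molecules, rng)
        total += np.ceil(rng.exponential(1.0 / k)).astype(np.int64)
    return total

def fixed_rate(k):
    """F_j concentrated at a single rate constant k."""
    return lambda size, rng: np.full(size, float(k))

def log_uniform_rate(k_lo, k_hi):
    """F_j with ln k uniform on [ln k_lo, ln k_hi] (uniformly spread barrier heights)."""
    return lambda size, rng: np.exp(rng.uniform(np.log(k_lo), np.log(k_hi), size))

rng = np.random.default_rng(0)
times = simulate_reaction_times([fixed_rate(0.1)] * 3, 8000, rng)
n_tau = np.bincount(times)          # n_tau(t) for bins [t, t + 1), t = 0, 1, 2, ...
times_dist = simulate_reaction_times([log_uniform_rate(0.05, 0.5)] * 5, 8000, rng)
```

With three fixed steps of k = 0.1, each Rj is geometric with mean 1/(1 − e−0.1) ≈ 10.5, so the sample mean of `times` should lie close to 31.5.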
We then tested the “heat capacity transform” approach with these simulated kinetic data (Figures 1b, 1d, 1f, and 1h). First we performed the transform Ĉ on the simulated histogram nτ (t):
As shown in Figures 1b, 1d, 1f and 1h, the transform yields (Ĉnτ)(k) ≈ N = 8000 in the small-k limit, as expected. To obtain a numerically stable estimate of the number of reaction steps from the large-k limit in eq 13, we used the following procedure to estimate the derivative and the limit: (1) We estimated the derivative with a dynamic window size εk = ln[(Ĉnτ)(k) + |δ(Ĉnτ)(k)|] − ln(Ĉnτ)(k), where |δ(Ĉnτ)(k)| is the noise level in (Ĉnτ)(k) given by eq 14. As a result,
(24) |
(2) To estimate the limit at large k properly, we note that a rate constant k > 1/τ cannot be determined by an experiment with time resolution τ, and that the estimate of ln[(Ĉnτ)(k)] is reliable only when (Ĉnτ)(k) − |δ(Ĉnτ)(k)| > 0. Bearing these two facts in mind, we set a maximum value of k for evaluating the limit in eq 13, i.e.
$$ k_{\max} = \min\{1/\tau,\; k_{\text{cut-off}}\} \tag{25} $$
where kcut-off ≡ max{k: |δ( Ĉnτ)(k)| ≤ (Ĉnτ)(k)}. Rather than literally pursuing k → +∞, we calculate the number of steps as the nearest integer to m* = limk→kmax m(k).
As shown in Figure 1, the m* values recover the correct numbers of reaction steps to within an error of ±10%. This holds not only for multi-step reactions with an identical rate constant distribution at each step (Figures 1a–1d) but also for reactions with disparate rate constants at different steps (Figures 1e–1h). The method applies to reactions with a single rate constant in each step (Figures 1a, 1b, and 1e–1h) as well as to reactions with a broad distribution of rate constants (Figures 1c and 1d). Numbers of reaction steps as high as m = 5 can be faithfully recovered (Figures 1g and 1h) when the noise level of the t-domain kinetic data is ~1% (the Poisson counting noise corresponding to N = 8000). We expect that with lower noise in the kinetic data, even higher numbers of reaction steps can be accurately deduced (see Supporting Information D). In comparison, the “randomness parameter” defined by
$$ r = \frac{\langle t^2\rangle - \langle t\rangle^2}{\langle t\rangle^2} \tag{26} $$
gives r = 0.51, 0.75, 0.34 and 0.35 for the kinetic data shown in Figures 1a, 1c, 1e and 1g, respectively. Here 〈·〉 denotes an average weighted by the reaction time histogram nτ(t). The lower bound on the number of reaction steps is then 1/r, corresponding to ~2, 1.3, 3 and 3 steps for the four cases. Indeed, when each reaction step has a single and identical rate constant k, as in Figure 1a, this lower bound accurately describes the actual number of reaction steps, as expected. For the other cases, however, 1/r deviates significantly from the number of reaction steps as well as from the number of rate-limiting steps.
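The comparison can be reproduced in miniature. The sketch below (illustrative helper names) computes eq 26 from a histogram and, for a chain with fixed rates, from the closed form r = Σ(1/kj²)/(Σ1/kj)², which by the Cauchy–Schwarz inequality is never smaller than 1/m. It shows that r = 1/m for identical steps, while a two-step chain with very different rates gives 1/r ≈ 1.2.

```python
import numpy as np

def randomness_parameter(t, counts):
    """r = (<t^2> - <t>^2) / <t>^2 with averages weighted by histogram counts (eq 26)."""
    t = np.asarray(t, dtype=float)
    w = np.asarray(counts, dtype=float)
    mean = np.sum(w * t) / np.sum(w)
    mean_sq = np.sum(w * t ** 2) / np.sum(w)
    return (mean_sq - mean ** 2) / mean ** 2

def randomness_fixed_rates(rates):
    """Exact r for a sequential chain with fixed rates: sum(1/k^2) / (sum 1/k)^2."""
    inv = 1.0 / np.asarray(rates, dtype=float)
    return np.sum(inv ** 2) / np.sum(inv) ** 2

# Poisson enzyme with m = 4 identical steps (eq 4, k = 1): r should be 1/4.
dt = 1e-3
t = (np.arange(100_000) + 0.5) * dt
print(randomness_parameter(t, t ** 3 * np.exp(-t)))   # about 0.25
print(1.0 / randomness_fixed_rates([1.0, 10.0]))        # about 1.2, although m = 2
```

The second line illustrates the failure mode discussed above: a fast step barely contributes to the variance, so 1/r counts only the steps that are rate-limiting in practice.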
The “heat capacity transform” thus provides a more general and precise approach to derive the number of reaction steps from kinetic data. However, for fixed data precision in the time domain, the “heat capacity transform” may also fail when the number of reaction steps m or the dispersion of rate constants is too large (See Supporting Information D for more quantitative discussion).
3.2 Reconstructing Identically Distributed Rate Constants for Sequential Multi-Step Reactions
The main goal of this subsection is to fully determine the reaction scheme from the kinetic data p(t), provided that k1, …, km are identically distributed: F1(k) = ··· = Fm(k). This type of kinetic scheme can approximate many processive biomolecular processes. For instance, we may wish to characterize the kinetic profile of an RNA polymerase molecule that transcribes a piece of DNA or a helicase molecule that unwinds a DNA or RNA duplex. Typical kinetic data often provide information about how long it takes the polymerase to transcribe DNA with a certain number of nucleotides, or how long it takes the helicase to unwind a duplex of a certain number of base pairs, without directly revealing the actual translocation step sizes of these molecules. If we neglect sequence dependence, the kinetics of these processive biochemical reactions can be approximated by a sequential scheme with identically distributed rate constants. Using the “heat capacity transform” Ĉ, we may deduce the number of reaction steps m required to translocate across n nucleotides. The step size of the molecule of interest can then be computed as n/m, and the probability distribution of rate constants for an elementary translocation step can be deduced.
In Figure 2, we simulated the situation for m = 3 or 5 with dispersed kinetic rate constants. The rate constants of the individual steps were identically distributed but independent of each other. The reaction times in Figure 2 were simulated as the sum of random integers R = R1 + ··· + Rm, m = 3 or 5. For the m = 3 case (Figure 2a), the rate constants were allowed to take two discrete values with equal probability, mimicking the scenario in which the molecule has two distinct conformations, each with a distinct reaction rate (as depicted by the black curve in Figure 2b). For the m = 5 case (Figure 2c), the reaction barriers for each step (i.e. ln k) were taken to be uniformly distributed on an interval (as depicted by the black curve in Figure 2d).
We used the following four-stage procedure to deduce the number of reaction steps and the underlying F(k) of each step from the simulated overall kinetic data in Figures 2a and 2c.
We deduced the number of reaction steps by applying the “heat capacity transform” Ĉ to the simulated histogram nτ(t). Evaluating the limit $m^* = \lim_{k \to k_{\max}} m(k)$, with m(k) given by eq 24, for the histogram in Figure 2a yields m* = 3.21, which rounds to m = 3, in quantitative agreement with the preset number of reaction steps. Similarly, analyzing the data in Figure 2c with Ĉ yields m* = 4.61, which rounds to m = 5, again in agreement with the preset number of reaction steps.
We applied the “m-to-1” transformation Ĵm to the raw data nτ(t). After the Ĵm transformation, the kinetic data take the form $\int k\,e^{-kt}\,d\mu(k)$, where the non-decreasing function μ(k) can be reconstructed with the “phase function approach” (data not shown).
From μ(k), we computed the frequency spectrum $\tilde{p}(\omega) = (\hat{\Omega}_m\mu)(\omega) \equiv \int (1 + i\omega/k)^{-m}\,d\mu(k)$ and deduced its phase function ψ(ik). Because the phases of sequential steps add and the m steps here are identically distributed, each step contributes the same phase, so the phase function of an individual reaction step is φ(ik) = ψ(ik)/m.
We reconstructed the rate constant distributions F(k) from φ(ik) using the R̂ transform as shown in eq 18.
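The division by m above relies on the phases of independent sequential steps adding: the frequency spectrum of the total reaction time is the product of the single-step spectra ∫(1 + iω/k)⁻¹ dF(k), so for m identical steps the continuous phase is m times the single-step phase. The sketch below checks this along real frequencies ω, assuming NumPy and a hypothetical two-valued rate distribution; it does not reproduce the evaluation at ω = ik through eqs 20 and 22. It also shows why the phase must be followed continuously (here with `np.unwrap`): the naive wrapped angle divided by m fails once the total phase passes −π.

```python
import numpy as np

def single_step_spectrum(omega, rates, weights):
    """One-step spectrum: sum_j w_j * (1 + i*omega/k_j)^(-1)."""
    omega = np.asarray(omega)[:, None]
    return (weights / (1.0 + 1j * omega / rates)).sum(axis=1)

rates = np.array([1.0, 10.0])     # hypothetical rate constants
weights = np.array([0.5, 0.5])    # equal probabilities
m = 5                             # identical, independent steps

omega = np.logspace(-3, 4, 4000)  # fine grid so phase unwrapping is reliable
single = single_step_spectrum(omega, rates, weights)
total = single ** m               # spectrum of the sum of m i.i.d. steps

# follow each phase continuously from omega -> 0, where it starts near 0
psi = np.unwrap(np.angle(total))
phi = np.unwrap(np.angle(single))
phi_recovered = psi / m           # analogue of phi = psi / m
```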
The reconstructed rate constant distributions are shown as red curves in Figures 2b and 2d, with vertical error bars given by eq 19. In both cases, the reconstruction agrees quantitatively, within error, with the preset distribution of rate constants (black curves). The horizontal error bars on the preset F(k), depicted by the grey zones, arise from finite sampling error in the simulated data: at the noise level (~1%) corresponding to the Poisson counting error for N = 8000, a single exponential decay cannot be distinguished from a superposition of exponential decay modes whose rate constants span Δln k = ±0.4.
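The ~1% noise level is consistent with the relative Poisson counting error 1/√N for N = 8000 events. The sketch below (NumPy) computes that number and, as a rough order-of-magnitude check only, the largest difference between the survival curve of a single exponential and that of a superposition whose ln k is uniformly spread over ±0.4 around the same central rate. Comparing survival curves with a matched central rate, rather than best-fit histograms, is a simplifying assumption made for illustration and is not the criterion used for the grey zones.

```python
import numpy as np

n_events = 8000
poisson_rel_error = 1.0 / np.sqrt(n_events)   # relative Poisson counting error

t = np.linspace(0.0, 10.0, 2001)              # time in units of 1/k_center
n_u = 800
# midpoints of a uniform grid for u = ln(k / k_center) on [-0.4, 0.4]
u = -0.4 + 0.8 * (np.arange(n_u) + 0.5) / n_u
# survival of the superposition: average of exp(-k t) over uniform ln k
survival_spread = np.exp(-np.outer(t, np.exp(u))).mean(axis=1)
survival_single = np.exp(-t)                   # single exponential, k = k_center
max_difference = np.max(np.abs(survival_spread - survival_single))
```

Both numbers come out on the order of one percent, which is the sense in which the two models are indistinguishable at this sample size.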
3.3 Reconstructing Michaelis-Menten Kinetics with Distributed Rate Constants
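In this subsection, we dissect a 2-step enzymatic reaction, $\mathrm{E} + \mathrm{S} \xrightarrow{k_1 = \kappa_1[\mathrm{S}]} \mathrm{ES} \xrightarrow{k_2} \mathrm{E} + \mathrm{P}$, consisting of a substrate-binding step followed by a catalysis step. The rate of the substrate-binding step is tunable by the substrate concentration [S]. We aim to reconstruct the probability distribution of rate constants in both the substrate-binding step and the catalysis step.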
In Figure 3, we simulated two enzymatic reactions, $\mathrm{E} + \mathrm{S} \xrightarrow{\kappa_1[\mathrm{S}]_0} \mathrm{ES} \xrightarrow{k_2} \mathrm{E} + \mathrm{P}$ and $\mathrm{E} + \mathrm{S} \xrightarrow{10\kappa_1[\mathrm{S}]_0} \mathrm{ES} \xrightarrow{k_2} \mathrm{E} + \mathrm{P}$.
(Here, the unidirectional reaction schemes approximate the Michaelis-Menten scheme, which involves one reversible step,
$$\mathrm{E} + \mathrm{S} \underset{\kappa_{-1}}{\overset{\kappa_1}{\rightleftharpoons}} \mathrm{ES} \xrightarrow{k_2} \mathrm{E} + \mathrm{P},$$
under the conditions κ−1 ≪ k2 and κ−1 ≪ κ1[S]. The unidirectional approximation is often valid, because κ−1 ≪ k2 holds for a wide class of enzymes and κ−1 ≪ κ1[S] can be achieved by properly tuning the substrate concentration [S].) Two independent numerical simulations, for [S] = [S]0 and [S] = 10[S]0, yielded the reaction time histograms in Figures 3a and 3b; the preset rate constant distributions for k1 = κ1[S]0 and k2 are shown in Figures 3c and 3d (black curves).
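A minimal simulation sketch of the two unidirectional schemes, assuming NumPy. The distributions (ln κ1[S]0 uniform between ln 0.5 and ln 5; k2 equal to 2 or 20 with equal probability), the seed, and the helper name `simulate_turnovers` are hypothetical. The only features taken from the text are the event count N = 8000 and the fact that raising [S] from [S]0 to 10[S]0 multiplies every binding rate constant k1 = κ1[S] by 10 while leaving k2 unchanged.

```python
import numpy as np

def simulate_turnovers(conc_factor, n_events, rng):
    """Reaction times of E + S -> ES -> E + P with distributed rate constants.
    conc_factor = [S]/[S]0 rescales the binding rate constant k1 = kappa1 [S]."""
    # hypothetical distribution of k1 at [S]0: ln k1 uniform on [ln 0.5, ln 5]
    k1_at_s0 = np.exp(rng.uniform(np.log(0.5), np.log(5.0), size=n_events))
    k1 = conc_factor * k1_at_s0
    # hypothetical two-valued distribution of the catalytic rate constant k2
    k2 = rng.choice([2.0, 20.0], size=n_events)
    # binding wait plus catalysis wait, each exponential with its own rate
    return rng.exponential(1.0 / k1) + rng.exponential(1.0 / k2)

rng = np.random.default_rng(1)
times_low = simulate_turnovers(1.0, 8000, rng)    # [S] = [S]0
times_high = simulate_turnovers(10.0, 8000, rng)  # [S] = 10 [S]0
```

Histograms of `times_low` and `times_high` would correspond to the two data sets analyzed jointly in the steps that follow.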
The numerical reconstruction of F1(k) = Prob{k1 = κ1[S]0 < k} and F2(k) = Prob{k2 < k} was achieved in the following four steps.
We applied the “2-to-1 transformation” Ĵ2 to both histograms in Figures 3a and 3b, and reconstructed two non-decreasing functions μ[1] (k) and μ[2] (k) (not shown) using the phase function method.
We computed the phase functions ψ[1] (ik) = arg[(Ω̂2μ[1])(ik)] and ψ[2] (ik) = arg[(Ω̂2μ[2])(ik)] using eqs 20 and 22.
From ψ[1](ik) = φ1(ik) + φ2(ik) and ψ[2](ik) = φ1(ik/10) + φ2(ik), where the argument ik/10 reflects the tenfold larger binding rate constant at [S] = 10[S]0, we deduced the phase function of each reaction step, φ1(ik) and φ2(ik), using eq 23.
We reconstructed the distribution of rate constants for each step as F1 (k) = (R̂φ1)(k) and F2 (k) = (R̂φ2)(k) using eq 18.
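Eq 23 is not reproduced in this section. One way to solve the pair of relations in the third step, sketched below under the assumption that φ1(ik) → 0 as k → 0, is a telescoping sum: subtracting the two relations gives ψ[1](ik) − ψ[2](ik) = φ1(ik) − φ1(ik/10), so φ1(ik) = Σ_{j≥0} [ψ[1] − ψ[2]](ik/10^j), after which φ2 = ψ[1] − φ1. The phase functions below are hypothetical smooth stand-ins written as functions of k, and this procedure may differ from the one implemented through eq 23; the names `phi1_true`, `phi2_true`, and `split_phases` are illustrative.

```python
import numpy as np

# hypothetical single-step phase functions of k (stand-ins for phi1(ik), phi2(ik))
def phi1_true(k):
    return np.pi * (1.0 - np.exp(-k / 3.0))          # vanishes as k -> 0

def phi2_true(k):
    return np.pi * (k / 20.0) ** 2 / (1.0 + (k / 20.0) ** 2)

RATIO = 10.0  # [S] = 10 [S]0 rescales the binding step by this factor

# "measured" total phases at [S]0 and at 10 [S]0
def psi1(k):
    return phi1_true(k) + phi2_true(k)

def psi2(k):
    return phi1_true(k / RATIO) + phi2_true(k)

def split_phases(psi_low, psi_high, k, ratio=RATIO, n_terms=15):
    """Telescoping solution of psi_low - psi_high = phi1(k) - phi1(k/ratio),
    assuming phi1 -> 0 as k -> 0. Returns (phi1, phi2) on the grid k."""
    phi1 = np.zeros_like(k, dtype=float)
    for j in range(n_terms):
        scaled = k / ratio ** j
        phi1 += psi_low(scaled) - psi_high(scaled)
    # truncation error equals phi1(k / ratio**n_terms), negligible here
    return phi1, psi_low(k) - phi1

k = np.logspace(-2, 3, 200)
phi1_est, phi2_est = split_phases(psi1, psi2, k)
```

With measured data, `psi1` and `psi2` would be interpolants of the phases computed from the two histograms rather than closed-form functions.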
The reconstructed cumulative distributions F1(k) = Prob{κ1[S]0 < k} and F2(k) = Prob{k2 < k} (red curves in Figures 3c and 3d) both agree well with the preset distributions (black curves), within the finite sampling error depicted by the grey zones, which arises from the finite number of simulated reaction times (N = 8000). Here, the rate constant distributions in a 2-step reaction were uniquely determined by simultaneous analysis of two experimental data sets, without fitting the raw data to subjective models of the distributions. In contrast, typical fitting methods are not adequate to determine arbitrary rate constant distributions such as those shown in Figures 3c and 3d.
4. Conclusions
It is a challenging task to dissect the kinetics of a multi-step chemical reaction, especially if the total number of reaction steps is unknown a priori or if some of the reaction steps cannot be described by a single rate constant. In this work, we introduced two numerically stable integral transforms, the “heat capacity transform” Ĉ and the “m-to-1 transform” Ĵm, to help analyze general m-step reaction kinetics using the phase function approach.
The “heat capacity transform” Ĉ maps the overall kinetic data p(t) to a function (Ĉp)(k′) whose asymptotic behavior, (Ĉp)(k′) ∝ 1/k′^m as k′ → +∞, allows us to reliably infer the number of reaction steps m from noisy kinetic data. The “heat capacity transform” method is applicable to general m-step reactions in which the rate constants follow arbitrary distributions and/or differ among distinct steps. It should be noted, however, that the performance of the “heat capacity transform” is still limited by the numerical quality of the input data; that is, it cannot resolve two reaction models whose differences lie within the experimental noise. For example, while a single-molecule experiment with N = 2000 events may be sufficient to distinguish 4-step and 5-step reaction models, an experiment with N = 200 events can barely resolve 2-step versus 3-step reactions. (See Supporting Information D for a quantitative argument for this fundamental limitation.)
The “m-to-1 transform” Ĵm maps the overall kinetic data p(t) to a 1-step kinetic data set $\wp(t) = \int k\,e^{-kt}\,d\mu(k)$, which allows the “effective” rate constant distribution μ(k) to be stably reconstructed with the “phase function approach”, a method we recently developed for stabilizing the inverse Laplace transform.29 However, even when μ(k) is fully determined, breaking a single kinetic data set p(t) down into a unique multi-step reaction scheme is usually an underdetermined problem. As we have demonstrated in this paper, the “effective” rate constant distribution μ(k), when combined with additional kinetic information, offers a powerful and straightforward strategy to fully dissect m-step reaction kinetics.
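The closed form of Ĉ (eq 24) is not reproduced in this section, so the sketch below only illustrates the underlying counting principle with a simpler quantity that has the same kind of asymptotics. For m independent sequential steps, the Laplace transform of p(t) is p̂(s) = ∏_j E[k_j/(k_j + s)], which decays as 1/s^m when s → +∞, so the local log-log slope m(s) = −d ln p̂/d ln s approaches m. Here the Laplace transform is an assumed stand-in for Ĉ, evaluated analytically for known discrete rate distributions (NumPy; the helper name `local_step_count` and all rate values are hypothetical) rather than computed from noisy data.

```python
import numpy as np

def local_step_count(s, step_rates, step_weights):
    """m(s) = -d ln p_hat / d ln s for p_hat(s) = prod_j E[k_j / (k_j + s)].

    step_rates / step_weights hold one array of rate values and one array of
    probabilities per step, so different steps may follow different distributions."""
    total = 0.0
    for rates, weights in zip(step_rates, step_weights):
        rates = np.asarray(rates, dtype=float)
        weights = np.asarray(weights, dtype=float)
        g = np.sum(weights * rates / (rates + s))          # E[k/(k+s)]
        dg = -np.sum(weights * rates / (rates + s) ** 2)   # d/ds of E[k/(k+s)]
        total += -s * dg / g                               # -d ln g / d ln s
    return total

# hypothetical 3-step reaction: every step has k = 1 or 10, equally likely
rates3 = [[1.0, 10.0]] * 3
weights3 = [[0.5, 0.5]] * 3
for s in (1.0, 1e2, 1e4, 1e6):
    print(s, local_step_count(s, rates3, weights3))
```

The slope rises toward 3 as s grows, and the same holds when each step has a different distribution, which mirrors the claim that the asymptotic power law counts the steps regardless of how the rate constants are distributed.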
Although we have presented our method in the context of sequential multi-step reactions in which all steps are irreversible, the same quantitative analysis applies to many non-sequential, partially reversible reactions as well. The classical Michaelis-Menten mechanism is a telling example of how a partly reversible reaction can be isomorphic to a sequential reaction with all steps irreversible. The reaction kinetics of $\mathrm{E} + \mathrm{S} \underset{\kappa_{-1}}{\overset{\kappa_1}{\rightleftharpoons}} \mathrm{ES} \xrightarrow{k_2} \mathrm{E} + \mathrm{P}$
are identical to those of a 2-step irreversible reaction with effective rate constants a and b, where a + b = κ1[S] + κ−1 + k2 and ab = κ1[S]k2.3,19 For the general kinetic correspondence between partly reversible and totally irreversible reaction schemes, we refer the reader to ref 33, where an algebraic approach was introduced for “reducing” complex reaction schemes (that involve reversible steps and loops) to more familiar paradigms.
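This correspondence can be checked numerically. For the Michaelis-Menten scheme with binding rate κ1[S], unbinding rate κ−1, and catalytic rate k2, the Laplace transform of the first-passage time from free enzyme to product is κ1[S]k2/[s² + (κ1[S] + κ−1 + k2)s + κ1[S]k2]. Its denominator factors as (s + a)(s + b) with a + b = κ1[S] + κ−1 + k2 and ab = κ1[S]k2, which is exactly the transform ab/[(s + a)(s + b)] of two sequential irreversible steps with rates a and b. The sketch below (NumPy; the rate values and helper names are hypothetical) computes a and b and compares the two transforms.

```python
import numpy as np

def effective_two_step_rates(k_bind, k_unbind, k_cat):
    """Rates a >= b of the irreversible 2-step scheme equivalent to
    E + S <-> ES -> E + P, with k_bind = kappa1 [S] and k_unbind = kappa_-1."""
    total = k_bind + k_unbind + k_cat                   # a + b
    disc = np.sqrt(total ** 2 - 4.0 * k_bind * k_cat)   # always real and >= 0
    return (total + disc) / 2.0, (total - disc) / 2.0   # a * b = k_bind * k_cat

def mm_laplace(s, k_bind, k_unbind, k_cat):
    # first-passage transform from free enzyme to product formation
    return k_bind * k_cat / (s ** 2 + (k_bind + k_unbind + k_cat) * s + k_bind * k_cat)

def two_step_laplace(s, a, b):
    # transform of the sum of two exponential waiting times with rates a and b
    return (a / (s + a)) * (b / (s + b))

k_bind, k_unbind, k_cat = 3.0, 1.5, 7.0   # hypothetical values
a, b = effective_two_step_rates(k_bind, k_unbind, k_cat)
s = np.logspace(-2, 3, 50)
max_gap = np.max(np.abs(mm_laplace(s, k_bind, k_unbind, k_cat) - two_step_laplace(s, a, b)))
```

Because the two transforms coincide for every s, the dwell-time distributions coincide as well, with no approximation on κ−1 required.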
In this paper, we mainly discussed the analysis of dwell time histograms that involve complex superpositions of exponential decays, and introduced a model-independent method for analyzing these reaction kinetics. Many other statistical descriptions of kinetics in complex systems also manifest themselves as multi-exponential decays, especially in systems with a wide spectrum of time scales. Notable examples include the decay of reactant concentration in bulk measurements (such as drug metabolism and toxin degradation in pharmacokinetics34), stochastic switching between molecular states (such as the random blinking of a quantum dot35), and fluctuations of certain physiological responses (such as the polarization and depolarization of a nerve impulse36). We anticipate that the method presented in this work will be helpful for quantitative analysis of the complex kinetics of these processes. Because our model-independent method aims to extract maximum information from minimal assumptions, it does not automatically incorporate additional knowledge. Where such knowledge is available (such as the connectivity between multiple nodes in a reaction scheme37,38), our method may be complemented by other kinetic analysis approaches, such as hidden Markov modeling,24,38 to provide a fuller picture of the reaction kinetics.
Supporting Information Available
Acknowledgments
This work is supported in part by the National Institutes of Health and the David and Lucile Packard Foundation. X. Z. is a Howard Hughes Medical Institute investigator.
References
1. Sykes P. A Guidebook to Mechanism in Organic Chemistry. 6th ed. Pearson Education; Harlow, England: 1986.
2. Petrucci RH, Harwood WS, Herring FG. General Chemistry - Principles and Modern Applications. 8th ed. Prentice Hall; Upper Saddle River, NJ: 2002.
3. Schnitzer MJ, Block SM. Cold Spring Harb Symp Quant Biol. 1995;60:793. doi: 10.1101/sqb.1995.060.01.085.
4. Nelson DL, Cox MM. Lehninger Principles of Biochemistry. 3rd ed. Worth Publishers; Gordonsville, VA: 2000.
5. Qin F, Li L. Biophys J. 2004;87:1657. doi: 10.1529/biophysj.103.037531.
6. Astumian RD. Appl Phys A. 2002;75.
7. Yang H, Luo G, Karnchanaphanurach P, Louie TM, Rech I, Cova S, Xun L, Xie XS. Science. 2003;302:262. doi: 10.1126/science.1086911.
8. Zhuang X, Bartley LE, Babcock HP, Russell R, Ha T, Herschlag D, Chu S. Science. 2000;288:2048. doi: 10.1126/science.288.5473.2048.
9. Tan E, Wilson TJ, Nahas MK, Clegg RM, Lilley DMJ, Ha T. Proc Natl Acad Sci USA. 2003;100:9308. doi: 10.1073/pnas.1233536100.
10. Liu S, Bokinsky GE, Walter NG, Zhuang X. Proc Natl Acad Sci USA. 2007;104:12634. doi: 10.1073/pnas.0610597104.
11. Lu HP, Xun L, Xie XS. Science. 1998;282:1877. doi: 10.1126/science.282.5395.1877.
12. Zhuang X, Kim H, Pereira MJB, Babcock HP, Walter NG, Chu S. Science. 2002;296:1473. doi: 10.1126/science.1069013.
13. Hong MK, Harbron EJ, O’Connor DB, Guo J, Barbara PF, Levin JG, Musier-Forsyth K. J Mol Biol. 2003;325:1. doi: 10.1016/s0022-2836(02)01177-4.
14. Yildiz A, Forkey JN, McKinney SA, Ha T, Goldman YE, Selvin PR. Science. 2003;300:2061. doi: 10.1126/science.1084398.
15. Shaevitz JW, Abbondanzieri EA, Landick R, Block SM. Nature. 2003;426:684. doi: 10.1038/nature02191.
16. Svoboda K, Mitra PP, Block SM. Proc Natl Acad Sci USA. 1994;91:11782. doi: 10.1073/pnas.91.25.11782.
17. Schnitzer MJ, Block SM. Nature. 1997;388:386. doi: 10.1038/41111.
18. Xie S. Single Mol. 2001;2:229.
19. English BP, Min W, van Oijen AM, Lee KT, Luo G, Sun H, Cherayil BJ, Kou SC, Xie XS. Nat Chem Biol. 2006;2:87. doi: 10.1038/nchembio759.
20. Cai L, Friedman N, Xie XS. Nature. 2006;440:358. doi: 10.1038/nature04599.
21. Friedman N, Cai L, Xie XS. Phys Rev Lett. 2006;97:168302. doi: 10.1103/PhysRevLett.97.168302.
22. McWhirter JG, Pike ER. J Phys A: Math Gen. 1978;11:1729.
23. Venkataramanan L, Sigworth FJ. Biophys J. 2002;82:1930. doi: 10.1016/S0006-3495(02)75542-2.
24. McKinney SA, Joo C, Ha T. Biophys J. 2006;91:1941. doi: 10.1529/biophysj.106.082487.
25. Tikhonov AN, Arsenin VY. Solution of Ill-posed Problems. John Wiley and Sons; New York, NY: 1977.
26. Provencher SW. Comput Phys Commun. 1982;27:213.
27. Livesey AK, Brochon JC. Biophys J. 1987;52:693. doi: 10.1016/S0006-3495(87)83264-2.
28. Steinbach PJ, Ionescu R, Matthews CR. Biophys J. 2002;82:2244. doi: 10.1016/S0006-3495(02)75570-7.
29. Zhou Y, Zhuang X. Biophys J. 2006;91:4045. doi: 10.1529/biophysj.106.090688.
30. Istratov AA, Vyvenko OF. Rev Sci Instrum. 1999;70:1233.
31. Yildiz A, Tomishige M, Vale RD, Selvin PR. Science. 2004;303:676. doi: 10.1126/science.1093753.
32. Dohoney KM, Gelles J. Nature. 2001;409:370. doi: 10.1038/35053124.
33. Shaevitz JW, Block SM, Schnitzer MJ. Biophys J. 2005;89:2277. doi: 10.1529/biophysj.105.064295.
34. Lunn DJ. Bayesian Analysis of Population Pharmacokinetic/Pharmacodynamic Models. In: Husmeier D, Dybowski R, Roberts S, editors. Probabilistic Modeling in Bioinformatics and Medical Informatics. Springer-Verlag; London: 2005.
35. Verberk R, van Oijen AM, Orrit M. Phys Rev B. 2002;66:233202.
36. Hodgkin AL, Huxley AF. J Physiol. 1952;116:449. doi: 10.1113/jphysiol.1952.sp004717.
37. Flomenbom O, Klafter J, Szabo A. Biophys J. 2005;88:3780. doi: 10.1529/biophysj.104.055905.
38. Andrec M, Levy RM, Talaga DS. J Phys Chem A. 2003;107. doi: 10.1021/jp035514+.