Author manuscript; available in PMC: 2013 Mar 2.
Published in final edited form as: Chem Phys. 2012 Mar 2;396:53–60. doi: 10.1016/j.chemphys.2011.06.006

Likelihood functions for the analysis of single-molecule binned photon sequences

Irina V Gopich 1
PMCID: PMC3375684  NIHMSID: NIHMS304526  PMID: 22711967

Abstract

We consider the analysis of a class of experiments in which the number of photons in consecutive time intervals is recorded. The sequence of photon counts or, alternatively, of FRET efficiencies can be studied using likelihood-based methods. For a kinetic model of the conformational dynamics and state-dependent Poisson photon statistics, we develop the formalism to calculate the exact likelihood that this model describes such sequences of photons or FRET efficiencies. Explicit analytic expressions for the likelihood function are provided for a two-state kinetic model. We then consider the important special case in which conformational dynamics are so slow that at most a single transition occurs in a time bin. By making a series of approximations, we eventually recover the likelihood function used in hidden Markov models. In this way, we not only gain insight into the range of validity of this procedure but also obtain an improved likelihood function.

Keywords: single molecule fluorescence spectroscopy, FRET, hidden Markov models, three-color FRET

1. Introduction

Conformational dynamics of a single molecule can be studied by analyzing fluorescence emission from dyes attached to the molecule [1–4]. Photophysical properties of the dyes depend on the distance to a quencher or, in the case of Förster resonance energy transfer (FRET), on the distance between donor and acceptor dyes. Conformational changes lead to changes in these distances, which are reflected in the fluctuations of fluorescence emission. The problem is how to extract the information about the conformational dynamics.

Figure 1 schematically illustrates a simple example of the problem considered here. Suppose that a protein can exist in only two states, folded and unfolded (see Fig. 1(a)). Figure 1(b) shows a conformational trajectory as the system jumps between the two states. However, this trajectory is not observed directly. What can be observed are photons emitted by the donor and acceptor fluorescent dyes attached to the molecule (see Fig. 1(c)). Since the average distance between the dyes is different in the two states, the corresponding donor and acceptor fluorescence intensities will also differ. Time resolution is not always sufficient to measure photon arrival times, in which case only the numbers of photons detected during an observation (bin) time are monitored. Figure 1(d) shows a time-binned photon trajectory where the numbers of acceptor (NA) and donor (ND) photons in time bins of the same duration T are determined. In this paper, we consider only this type of measurement. The sequence of photon counts can be converted into a sequence of FRET efficiencies defined in each bin as the fraction of acceptor photons, i.e., E = NA/(NA + ND).

Figure 1.


FRET and two-state protein folding. (a) Kinetic scheme that describes the folding and unfolding of a protein with attached donor and acceptor dyes. (b) The conformational trajectory, which is of interest but invisible. (c) The observed photon trajectory when the arrival time of each detected photon is recorded. (d) A photon trajectory when only the numbers of acceptor and donor photons in a time bin of duration T are recorded.

A widely used method of analyzing photon counts is to construct histograms of photon counts or, more commonly, histograms of FRET efficiencies. In such a procedure, the correlation between successive bins resulting from conformational dynamics is lost. A different approach involves the analysis of the whole sequence of photon counts using a likelihood function. The starting point is the probability distribution of observing the sequence of photon counts given a model of conformational dynamics. This distribution is considered as the likelihood that the model of conformational dynamics describes the observed ordered sequence of photons. The likelihood function is then analyzed to solve various inference problems. These include finding optimal model parameters (by using gradient methods, expectation-maximization, or Monte Carlo methods), establishing the most likely conformational state trajectory (e.g., using the Viterbi algorithm), identifying the number of conformations, and model comparison, among others. During the last decade, likelihood-based methods have been used to analyze many different biophysical experiments, including those dealing with ion channels [5, 6], molecular motors [7], protein dynamics [8], and unzipping of DNA by nanopores [9]. Even in the specialized area of single-molecule photon counting, there is now a rather extensive literature [10–21].

Finding the likelihood function that corresponds to a specific physical problem is the first and most crucial step in all likelihood-based methods. We recently presented such a study for the sequences of photons with observed arrival times and photon colors [15]. In this paper, we will consider the likelihood function to handle single-molecule experiments where only the numbers of photons in time bins are recorded.

Sequences of photon counts and FRET efficiencies have been previously analyzed in the framework of hidden Markov models (HMM) [14, 16, 19]. In this technique, a molecule can change its state only at regularly spaced discrete times [22]. Being in a particular state, the molecule “emits a signal” (e.g., a number of photons or FRET efficiency) with an “emission” probability distribution associated with that state. Lee [16] pointed out a fundamental limitation in the application of HMM to single-molecule data: transitions between conformational states in HMM are synchronized with the beginning of the time bin. Liu et al. [19] found that the parameters of the model of conformational dynamics determined using HMM are sensitive to the choice of the “emission” distribution.

In this paper, we focus on the study of the likelihood function for the sequence of photons detected in consecutive time bins. Other inference problems, such as methods of optimization, are outside the scope of the present paper. Our approach is based on the exact likelihood function corresponding to the model of photon statistics and conformational dynamics described below in Section 2. Although we are primarily interested in FRET, in the next section we will assume that all detected photons have the same color. The reason is that the mathematical formalism is more transparent in this case and can readily be generalized to FRET with two- and even three-color photons [2], as will be shown below in Section 4. The exact likelihood function is rather complicated. Therefore, we make a series of controlled approximations to derive likelihood functions that are practically useful, rather than simply postulating them. The relationship of these likelihood functions with those used in HMM will be established. In addition to the sequence of photon counts, the sequence of FRET efficiencies and corresponding likelihood functions are considered in Section 5. The last section summarizes results and gives concluding remarks.

2. The model

The model of conformational and photophysical dynamics adopted here is as follows. The molecule has M discrete conformational states. Being in a particular state, the molecule emits photons whose statistics are Poissonian (shot noise). This means that when the system is in state i, the distribution of the number of photons, N, detected in a time bin of duration T is $(n_iT)^N \exp(-n_iT)/N!$, where ni is the average number of photons that are detected per unit time (i.e., the photon count rate of state i). The mean number of photons in a bin is 〈N〉 = niT. For photons of two colors, the photon statistics depend on two parameters, namely the acceptor, nAi, and donor, nDi, photon count rates.

The interconversion among conformations is described by a set of rate equations for the population of state i at time t, pi(t). In matrix notation, these equations are written as dp/dt = Kp. The ijth off-diagonal element of the rate matrix K, Kij, is the rate constant that describes the transition from state j to i. The diagonal element, Kii, describes escape from state i and is equal to the negative of the sum of all rates that deplete state i, $K_{ii} = -\sum_{j \neq i} K_{ji}$. The vector of normalized equilibrium populations, peq, is obtained by solving Kpeq = 0. Detailed balance imposes the constraints on the matrix elements, Kijpeq(j) = Kjipeq(i).
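To make these definitions concrete, the sketch below builds the two-state rate matrix of Fig. 1(a) (K[i][j] is the rate from j to i, so every column sums to zero) and solves Kpeq = 0 together with the normalization condition. The function names, the 0-based state indices, and the choice of plain Gaussian elimination are ours, for illustration only.

```python
def two_state_rate_matrix(k1, k2):
    """Rate matrix of Fig. 1(a): k1 for 1 -> 2, k2 for 2 -> 1.

    K[i][j] is the rate from state j to state i (0-based), so every column sums to zero.
    """
    return [[-k1, k2],
            [k1, -k2]]

def equilibrium_populations(K):
    """Solve K p = 0 subject to sum(p) = 1 by Gaussian elimination with pivoting."""
    M = len(K)
    # K is singular; replace its last (redundant) row by the normalization condition.
    A = [list(row) for row in K[:-1]] + [[1.0] * M]
    b = [0.0] * (M - 1) + [1.0]
    for col in range(M):
        piv = max(range(col, M), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, M):
            f = A[r][col] / A[col][col]
            for c in range(col, M):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    p = [0.0] * M
    for r in reversed(range(M)):
        p[r] = (b[r] - sum(A[r][c] * p[c] for c in range(r + 1, M))) / A[r][r]
    return p

K = two_state_rate_matrix(k1=2.0, k2=3.0)
peq = equilibrium_populations(K)
print(peq)                                   # compare with (k2, k1) / (k1 + k2) = (0.6, 0.4)
print(K[0][1] * peq[1] - K[1][0] * peq[0])   # detailed-balance residual, close to 0
```

The same routine works for any M, as long as the kinetic scheme is connected (otherwise the equilibrium populations are not unique).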

The above model of photon statistics is based on the separation of all processes that influence fluorescence emission into processes that are fast and slow compared to the average time between detected photons, which is usually on the microsecond time scale. When all fluctuations are fast, the photon statistics are Poissonian. A detection device such as a CCD camera may distort this distribution. Here we follow Liu et al. [19] and assume that, after appropriate scaling, the resulting fluorescence intensity is still Poissonian. The Poisson distribution depends on the count rate that involves the parameters of fast processes. The count rates are determined by many factors, such as dye excitation, energy transfer, decay of the donor and acceptor excited states, spectral crosstalk, linker and dye orientation dynamics on the submicrosecond time scale, as well as the detection efficiency and Poissonian background noise [23]. Fortunately, all these complications need to be considered only if one wishes to interpret the extracted count rates in structural terms (e.g., to get the interdye distance). They do not affect the parameters of conformational dynamics.

Dynamical processes that are comparable to or slower than the mean interphoton time modulate photon count rates and alter the statistics of photon counts. We define a “conformation” in a broad sense as any state of the molecule that has different photon count rates. This may be due to a different interdye distance that changes on a microsecond time scale or slower. It may also be due to “long-lived” photophysical states of the fluorophores [20, 24], labeling permutations [3, 25], etc.

The kinetic model of conformational dynamics is not as restrictive as it may appear at first sight. If conformational space is continuous (e.g., diffusion on a free energy surface), we can discretize it and construct the rate matrix from the finite difference approximation of the appropriate differential evolution operator. To describe conformational states whose lifetime distribution is multiexponential, we can introduce several, appropriately connected, Markovian states that have identical photon count rates.

It is interesting to note that the model adopted in this paper is a special case of what is called a Markov Modulated Poisson Process in the statistics literature [26]. This model has been applied to treat photon sequences in time-resolved experiments with recorded interphoton times [13, 15, 17, 20, 21].

3. Photons of the same color

We start by considering photons of one color. First, we obtain the exact likelihood function for a sequence of photon counts assuming that conformational dynamics are described by conventional rate equations and statistics of photons in each conformational state are Poissonian (see Section 2). Then we discuss various approximations that eventually lead to the likelihood function used in the standard HMM.

The likelihood function is equal to the probability distribution of observing a sequence of photon counts in consecutive time bins (i.e., N1, N2, N3, …). This distribution is related to the probability to detect N photons during the time interval of duration T and to be in conformational state i at the end of the time interval, given the molecule was in state j at the beginning. We denote this probability by Pij (N). In general, this is a complex object because the system can visit many states during the time interval and the number of photon counts is correlated with the visited states.

Let us assume for simplicity that the observed sequence of photons consists of three time bins with recorded numbers of one-color photons N1, N2, and N3. The probability of observing such a sequence (the likelihood function) is

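$$
L = \sum_{i,j,k,l=1}^{M} P_{lk}(N_3)\,P_{kj}(N_2)\,P_{ji}(N_1)\,p_{eq}(i) = \mathbf{1}^T\,\mathbf{P}(N_3)\,\mathbf{P}(N_2)\,\mathbf{P}(N_1)\,\mathbf{p}_{eq} \tag{1}
$$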

where P(N) is the matrix with elements Pij(N) and $\mathbf{1}^T$ is the transpose of a column vector with every element equal to unity. This probability has a transparent interpretation. Reading from right to left, the first term, peq(i), is the probability that the system is in state i at the beginning of the first time interval. Since i can be any state, we sum over i. The next term, Pji(N1), is the probability that the system starts in state i at the beginning of the interval, is in state j at the end, and N1 photons have been detected. The next term, Pkj(N2), corresponds to the second bin, in which N2 photons have been detected, and so on.

The generalization to a larger number of time bins (J) is:

$$
L = \mathbf{1}^T\,\mathbf{P}(N_J) \cdots \mathbf{P}(N_3)\,\mathbf{P}(N_2)\,\mathbf{P}(N_1)\,\mathbf{p}_{eq} \tag{2}
$$

The above expression is the likelihood that the parameters of the kinetic model (i.e., the rates Kij and photon count rates ni) describe the observed sequence of photon counts. The likelihood function is presented as a product of matrices. Note that this formally exact likelihood function involves the probabilities Pij(N) that depend on the initial (j) and final (i) states in the bin interval.
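Equation (2) can be evaluated from right to left by propagating a vector through one bin at a time. The sketch below does this and rescales the vector after every bin so that long trajectories do not underflow; the rescaling is a standard numerical device we add, not part of the original text. `P_of_N` is a hypothetical user-supplied callable returning the matrix P(N) (row = final state, column = initial state).

```python
import math

def log_likelihood(counts, P_of_N, peq):
    """log of Eq. (2): L = 1^T P(N_J) ... P(N_2) P(N_1) p_eq.

    counts: photon counts N_1, N_2, ..., N_J in consecutive bins
    P_of_N: callable returning the M x M matrix P(N); P[i][j] is the probability
            of N photons and final state i given initial state j
    peq:    equilibrium populations, the rightmost vector in Eq. (2)
    """
    v = list(peq)
    log_L = 0.0
    for N in counts:                      # Eq. (2) is read from right to left
        P = P_of_N(N)
        v = [sum(P[i][j] * v[j] for j in range(len(v))) for i in range(len(P))]
        s = sum(v)                        # the product of all s equals L
        log_L += math.log(s)
        v = [x / s for x in v]            # rescale to avoid underflow
    return log_L

def poisson(N, mean):
    return math.exp(N * math.log(mean) - mean - math.lgamma(N + 1))

# Check case: two states and no transitions (K = 0), so P(N) is diagonal and
# L is a mixture over the initial state of products of Poisson probabilities.
no_transitions = lambda N: [[poisson(N, 4.0), 0.0], [0.0, poisson(N, 9.0)]]
value = log_likelihood([3, 5, 4], no_transitions, [0.5, 0.5])
print(value)
```

Working with log L rather than L itself matters in practice, because a product of thousands of probabilities quickly falls below the smallest representable floating-point number.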

In order to be able to estimate the model parameters, we need to specify the matrix P(N). This matrix can be found analytically only for a two-state system. The two states (e.g., unfolded and folded states in Fig. 1(a)) are characterized by the photon count rates n1 and n2. The transitions between the states are described by the rates k1 (1 → 2) and k2 (2 → 1). The exact expressions for the matrix elements Pij(N) can be obtained by averaging the Poissonian distribution of photon counts over the distribution of the time spent in one of the two interconverting states. This distribution can be found by generalizing the distribution in Ref. [27]. For n2 > n1, it can be shown that the diagonal (ii = 11, 22) and off-diagonal (ij = 12, 21) matrix elements are

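$$
\begin{aligned}
P_{ii} &= \frac{(n_iT)^N}{N!}\,e^{-(n_i + k_i)T} + k_i \int_0^T \frac{[n_1 t + n_2(T - t)]^N}{N!}\,e^{-(n_1 + k_1)t - (n_2 + k_2)(T - t)} \left(\frac{k_1(T - t)}{k_2 t}\right)^{i - 3/2} I_1\!\left(\sqrt{4k_1k_2\,t(T - t)}\right) dt \\
P_{ij} &= k_j \int_0^T \frac{[n_1 t + n_2(T - t)]^N}{N!}\,e^{-(n_1 + k_1)t - (n_2 + k_2)(T - t)}\, I_0\!\left(\sqrt{4k_1k_2\,t(T - t)}\right) dt
\end{aligned} \tag{3}
$$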

where I0(x) and I1(x) are the modified Bessel functions. Equations (2) and (3) provide an exact analytical expression for the likelihood function in the case of two-state conformational dynamics. This result can be readily generalized to photons of two and three colors.
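To make Eq. (3) concrete, the sketch below evaluates the exact two-state matrix P(N) by numerical quadrature. Assumptions (ours, not the paper's): a plain midpoint rule in t (the time spent in state 1), power-series Bessel functions that are adequate for moderate arguments, and 0-based indices (state 1 is index 0). A useful check is that summing Pij(N) over N and over the final state i gives 1 for either initial state j, and that the summed off-diagonal element equals the two-state transition probability k1(1 − e^{−(k1+k2)T})/(k1 + k2).

```python
import math

def bessel_i0(z):
    """Modified Bessel function I0(z) from its power series (adequate for moderate z)."""
    y = z * z / 4.0
    term = total = 1.0
    m = 0
    while term > 1e-17 * total:
        m += 1
        term *= y / (m * m)
        total += term
    return total

def bessel_i1(z):
    """Modified Bessel function I1(z) = (z/2) * sum_m (z^2/4)^m / (m! (m+1)!)."""
    y = z * z / 4.0
    term = total = 1.0
    m = 0
    while term > 1e-17 * total:
        m += 1
        term *= y / (m * (m + 1))
        total += term
    return 0.5 * z * total

def poisson_pmf(N, mean):
    return math.exp(N * math.log(mean) - mean - math.lgamma(N + 1))

def exact_two_state_P(N, T, n, k, n_grid=400):
    """2 x 2 matrix P[i][j](N) of Eq. (3); index 0 is state 1, index 1 is state 2.

    n = (n1, n2) are count rates, k = (k1, k2) the rates of 1 -> 2 and 2 -> 1.
    t is the time spent in state 1; the integral uses the midpoint rule.
    """
    (n1, n2), (k1, k2) = n, k
    h = T / n_grid
    P = [[0.0, 0.0], [0.0, 0.0]]
    for s in range(n_grid):
        t = (s + 0.5) * h
        z = math.sqrt(4.0 * k1 * k2 * t * (T - t))
        i0, i1 = bessel_i0(z), bessel_i1(z)
        weight = poisson_pmf(N, n1 * t + n2 * (T - t)) * math.exp(-k1 * t - k2 * (T - t)) * h
        ratio = k1 * (T - t) / (k2 * t)
        for i in (0, 1):
            # exponent i - 3/2 for the 1-based label, i.e. index - 1/2 here
            P[i][i] += k[i] * weight * ratio ** (i - 0.5) * i1
        P[1][0] += k1 * weight * i0    # starts in 1, ends in 2: prefactor k_j = k1
        P[0][1] += k2 * weight * i0    # starts in 2, ends in 1: prefactor k_j = k2
    for i in (0, 1):
        # no-transition term (n_i T)^N / N! * exp(-(n_i + k_i) T)
        P[i][i] += poisson_pmf(N, n[i] * T) * math.exp(-k[i] * T)
    return P

T, n, k = 1.0, (10.0, 30.0), (2.0, 3.0)
mats = [exact_two_state_P(N, T, n, k) for N in range(121)]
for j in (0, 1):
    print(j, sum(m[0][j] + m[1][j] for m in mats))   # should be close to 1
```

The matrices `mats` could be passed to a right-to-left product such as Eq. (2) to obtain the exact two-state likelihood.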

Even in the case of the two-state model, the exact likelihood function is rather complicated. To simplify it and to treat more than two states, we start with the generating function of the matrix P(N), for which a compact expression was previously obtained [23, 28]

$$
\sum_{N=0}^{\infty} \lambda^N\,\mathbf{P}(N) = e^{(\mathbf{K} - (1 - \lambda)\mathcal{N})T} \tag{4}
$$

Here 𝒩 is a diagonal matrix with the photon count rates of each conformation on the diagonal (𝒩ij = niδij, where δij is the Kronecker delta defined so that it is unity when i = j and zero otherwise). This relation means that P(N) is the coefficient of λN in the expansion of the matrix exponential, exp((K −(1−λ)𝒩)T), in powers of λ.
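Equation (4) can also be turned into something directly computable for any number of states. Differentiating the generating function in time and collecting powers of λ gives a closed set of linear equations for the matrices P(N) (our derivation, spelled out in the docstring). The sketch integrates them with a classical fourth-order Runge-Kutta step; the integrator, the step count, and the truncation at `N_max` are our choices, not the paper's.

```python
import math

def P_from_generating_function(K, n, T, N_max, steps=200):
    """Matrices P(N), N = 0..N_max, implied by Eq. (4).

    With G(t) = exp((K - (1 - lam) Ncal) t), dG/dt = (K - Ncal + lam Ncal) G, so
    the coefficient of lam^N obeys
        dP(N)/dt = (K - Ncal) P(N) + Ncal P(N - 1),   P(N)(0) = delta_{N0} I,
    where Ncal = diag(n). Integrated with classical RK4; counts above N_max are dropped.
    """
    M = len(K)
    A = [[K[i][j] - (n[i] if i == j else 0.0) for j in range(M)] for i in range(M)]

    def rhs(P):
        return [[[sum(A[i][l] * P[N][l][j] for l in range(M))
                  + (n[i] * P[N - 1][i][j] if N > 0 else 0.0)
                  for j in range(M)] for i in range(M)] for N in range(len(P))]

    def combine(P, D, c):
        # Elementwise P + c * D over the whole stack of matrices.
        return [[[P[N][i][j] + c * D[N][i][j] for j in range(M)] for i in range(M)]
                for N in range(len(P))]

    P = [[[1.0 if (N == 0 and i == j) else 0.0 for j in range(M)] for i in range(M)]
         for N in range(N_max + 1)]
    h = T / steps
    for _ in range(steps):
        d1 = rhs(P)
        d2 = rhs(combine(P, d1, h / 2))
        d3 = rhs(combine(P, d2, h / 2))
        d4 = rhs(combine(P, d3, h))
        incr = [[[(d1[N][i][j] + 2 * d2[N][i][j] + 2 * d3[N][i][j] + d4[N][i][j]) / 6
                  for j in range(M)] for i in range(M)] for N in range(N_max + 1)]
        P = combine(P, incr, h)
    return P

K = [[-2.0, 3.0], [2.0, -3.0]]          # K[i][j] = rate from j to i
P = P_from_generating_function(K, n=(5.0, 15.0), T=1.0, N_max=50)
norms = [sum(P[N][0][j] + P[N][1][j] for N in range(51)) for j in range(2)]
print(norms)                            # both should be close to 1

# With K = 0 each diagonal element must reduce to a Poisson distribution.
P0 = P_from_generating_function([[0.0, 0.0], [0.0, 0.0]], n=(5.0, 15.0), T=1.0, N_max=50)
poisson_15 = [math.exp(N * math.log(15.0) - 15.0 - math.lgamma(N + 1)) for N in range(51)]
print(max(abs(P0[N][1][1] - poisson_15[N]) for N in range(51)))
```

`N_max` must comfortably exceed the largest mean count per bin; probability that would flow above it is simply lost, which shows up as a normalization slightly below 1.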

To make progress, we consider the important special case when conformational dynamics is so slow that at most a single transition can occur during the bin time. In this limit, the transition rates are small, KijT ≪ 1, and one can expand the generating function in Eq. (4) to linear order in K [29]

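$$
\sum_{N=0}^{\infty} \lambda^N\,\mathbf{P}(N) \approx e^{-(1 - \lambda)\mathcal{N}T} + \int_0^T e^{-(1 - \lambda)\mathcal{N}(T - t)}\,\mathbf{K}\,e^{-(1 - \lambda)\mathcal{N}t}\,dt \tag{5}
$$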
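A short sketch of where Eq. (5) comes from, using the shorthand $\mathbf{G}_0(t) = e^{-(1-\lambda)\mathcal{N}t}$ (our notation, not the original's). The Duhamel (variation-of-constants) formula for a matrix exponential is exact, and replacing the full propagator inside the integral by $\mathbf{G}_0(t)$ discards only terms of second order in $\mathbf{K}T$:

$$
\begin{aligned}
e^{(\mathbf{K} - (1-\lambda)\mathcal{N})T} &= \mathbf{G}_0(T) + \int_0^T \mathbf{G}_0(T-t)\,\mathbf{K}\,e^{(\mathbf{K} - (1-\lambda)\mathcal{N})t}\,dt \\
&= \mathbf{G}_0(T) + \int_0^T \mathbf{G}_0(T-t)\,\mathbf{K}\,\mathbf{G}_0(t)\,dt + O\!\left((\mathbf{K}T)^2\right)
\end{aligned}
$$

Because $\mathcal{N}$ is diagonal, the $ij$ element of the integrand is $e^{-(1-\lambda)n_i(T-t)}\,K_{ij}\,e^{-(1-\lambda)n_j t}$: the molecule spends time $t$ in the initial state $j$ and $T-t$ in the final state $i$, and the coefficient of $\lambda^N$ is a Poisson probability with mean $n_j t + n_i(T-t)$, which is the time-weighted count rate appearing in Eq. (7).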

Expanding the right hand side in a power series in λ and equating the coefficients of λN, we find

$$
P_{ii}(N) \approx (1 + K_{ii}T)\,\frac{(n_iT)^N}{N!}\,e^{-n_iT} = \Big(1 - \sum_{j \neq i} K_{ji}T\Big)\frac{(n_iT)^N}{N!}\,e^{-n_iT} \tag{6}
$$

When $i \neq j$,

$$
P_{ij}(N) \approx K_{ij}T \int_0^T \frac{[n_j t + n_i(T - t)]^N}{N!}\,e^{-n_j t - n_i(T - t)}\,\frac{dt}{T} \tag{7}
$$

Both of these equations have a simple physical interpretation. Pii(N) corresponds to a molecule that is in the same state i at the beginning and at the end of the bin time. In the limit of slow conformational dynamics, this means that no transition has occurred during the bin time. Therefore, the probability Pii(N) is the product of the Poisson distribution of photon counts in state i (with count rate ni) and the probability of having no transitions, which we denote by Aii. This probability is $A_{ii} = 1 - \sum_{j \neq i} K_{ji}T$. The above expansion is valid only if $\sum_{j \neq i} K_{ji}T < 1$; therefore, Aii is positive.

Now consider Pij(N) with different initial and final states, $i \neq j$. In the limit of slow conformational dynamics, this means that a transition from j to i must have occurred at some moment t in the time interval $0 \le t \le T$. The probability that this happened is approximately Aij = KijT. Note that the probabilities Aij are normalized, $\sum_{i=1}^{M} A_{ij} = 1$ (which means that the molecule either stays in state j or jumps to some other state). If the transition from j to i occurred at time t, then the mean number of photons is $n_j t + n_i(T - t)$ and the distribution of photons in such bins is Poissonian with the corresponding mean. Since the transition can occur anywhere in the interval with equal probability, one must average over t, as done in Eq. (7).

Equations (6) and (7) can be combined into a single expression valid for both i = j and $i \neq j$:

$$
P_{ij}(N) \approx A_{ij}\,B_{ij}(N) \tag{8}
$$

where Aij = δij + KijT and

$$
B_{ij}(N) = \int_0^T \frac{[n_i t + n_j(T - t)]^N}{N!}\,e^{-n_i t - n_j(T - t)}\,\frac{dt}{T} \tag{9}
$$

The integral in the above equation can be evaluated analytically, and Bij(N) can be expressed in terms of the upper incomplete gamma function, $\Gamma(a, x) = \int_x^{\infty} s^{a-1} e^{-s}\,ds$:

$$
B_{ij}(N) = \frac{\Gamma(N + 1, n_iT) - \Gamma(N + 1, n_jT)}{(n_j - n_i)\,T\,N!} \tag{10}
$$

Note that Bij(N) is normalized as $\sum_{N=0}^{\infty} B_{ij}(N) = 1$. The diagonal term is just a Poisson distribution, $B_{ii}(N) = (n_iT)^N \exp(-n_iT)/N!$.
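For integer order, Γ(N + 1, x)/N! equals the Poisson cumulative distribution function evaluated at N with mean x (a standard identity for incomplete gamma functions), so Eq. (10) needs only finite sums. The sketch below uses that identity; the function names are ours. It checks the normalization and the mean and variance that are given later in Eq. (12), using the count values of Fig. 2(a).

```python
import math

def poisson_cdf(N, x):
    """sum_{m=0}^{N} x^m e^{-x} / m!, which equals Gamma(N + 1, x) / N!."""
    term = total = math.exp(-x)     # underflows for x above ~745; fine for typical bins
    for m in range(1, N + 1):
        term *= x / m
        total += term
    return total

def emission_B(N, ni, nj, T):
    """B_ij(N), Eq. (10): N photons in a bin with one j -> i transition.

    For ni == nj it reduces to the Poisson distribution B_ii(N).
    """
    if ni == nj:
        return math.exp(N * math.log(ni * T) - ni * T - math.lgamma(N + 1))
    return (poisson_cdf(N, ni * T) - poisson_cdf(N, nj * T)) / ((nj - ni) * T)

# Example with the values of Fig. 2(a): n_i T = 10, n_j T = 20.
probs = [emission_B(N, 10.0, 20.0, 1.0) for N in range(201)]
mean = sum(N * p for N, p in enumerate(probs))
var = sum(N * N * p for N, p in enumerate(probs)) - mean ** 2
print(sum(probs), mean, var)   # compare with 1, 15 and 15 + 100/12
```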

The likelihood function for three bins of photons in Eq. (1) can be written in terms of Aij and Bij (N) as

$$
L = \sum_{i,j,k,l=1}^{M} A_{lk}B_{lk}(N_3)\,A_{kj}B_{kj}(N_2)\,A_{ji}B_{ji}(N_1)\,p_{eq}(i) \tag{11}
$$

The above equations present the likelihood function that can be applied to study conformational dynamics at times long compared to the bin time. It is assumed that no more than one transition can occur during the bin time. The structure of this likelihood function is similar to that in HMM [22] if Aij are identified with the transition probabilities from j to i and Bij(N) with the “emission” probabilities. However, Bij(N) depends on both the initial and final conformational states, unlike in HMM, where it is implied that a transition can occur only between two bins (which we refer to here as the standard HMM) and, therefore, the “emission” probability depends only on the initial state. Note that our likelihood function is to be read from right to left and the matrix of transition probabilities is the transpose of that used in the statistical literature [22].
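The sketch below evaluates Eq. (11) for any number of bins with a scaled forward recursion. The only change from an ordinary HMM forward pass is that the emission factor is looked up by the pair (final state i, initial state j). The helper names (`transition_matrix`, `make_B`, `log_likelihood_single_transition`) are ours, and `make_B` uses the closed form of Eq. (10). A convenient check: because every column of A and every Bij(N) is normalized, the likelihoods of all possible two-bin count sequences must add up to 1.

```python
import math

def transition_matrix(K, T):
    """A_ij = delta_ij + K_ij T; meaningful only when sum_{j != i} K_ji T < 1."""
    M = len(K)
    return [[(1.0 if i == j else 0.0) + K[i][j] * T for j in range(M)] for i in range(M)]

def make_B(n, T):
    """Return B(i, j, N) from Eq. (10) for count rates n (0-based states)."""
    def cdf(N, x):
        term = total = math.exp(-x)
        for m in range(1, N + 1):
            term *= x / m
            total += term
        return total

    def B(i, j, N):
        if n[i] == n[j]:
            return math.exp(N * math.log(n[i] * T) - n[i] * T - math.lgamma(N + 1))
        return (cdf(N, n[i] * T) - cdf(N, n[j] * T)) / ((n[j] - n[i]) * T)
    return B

def log_likelihood_single_transition(counts, A, B, peq):
    """log L of Eq. (11), generalized to any number of bins (read right to left)."""
    M = len(peq)
    v = list(peq)
    log_L = 0.0
    for N in counts:
        # Emission depends on both the final state i and the initial state j.
        v = [sum(A[i][j] * B(i, j, N) * v[j] for j in range(M)) for i in range(M)]
        s = sum(v)
        log_L += math.log(s)
        v = [x / s for x in v]
    return log_L

K = [[-0.2, 0.3], [0.2, -0.3]]          # slow dynamics: K_ij T << 1 with T = 1
A = transition_matrix(K, T=1.0)
B = make_B(n=(5.0, 15.0), T=1.0)
print(log_likelihood_single_transition([4, 6, 14, 16, 15], A, B, peq=[0.6, 0.4]))
```

Replacing `B` by a function that ignores `i` and returns the Poisson probability of the initial state recovers the standard-HMM likelihood.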

Now we turn to the “emission” distribution Bij(N). The diagonal term, Bii(N), is just a Poisson distribution. Hence we focus on the distribution with unequal initial and final states, $i \neq j$. The analytical expression for this distribution, although available for one-color photons, cannot be extended to photons of two and three colors in general. Therefore, we proceed to various approximations, which will also be employed later in Section 4. These approximations are based on the mean μij and variance $\sigma_{ij}^2$ of the number of photons N in the bins that have a transition from j to i during the bin time. They are defined as $\mu_{ij} = \langle N \rangle_{ij}$ and $\sigma_{ij}^2 = \langle N^2 \rangle_{ij} - \langle N \rangle_{ij}^2$, where 〈…〉ij means averaging with the distribution Bij(N), Eq. (9). Evaluating the averages, we have:

$$
\mu_{ij} = \frac{(n_i + n_j)T}{2}, \qquad \sigma_{ij}^2 = \mu_{ij} + \frac{(n_i - n_j)^2 T^2}{12} \tag{12}
$$
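A compact way to see Eq. (12): for a uniformly distributed transition time $t$, the bin has a Poisson count with random mean $x = n_i t + n_j(T - t)$, which is uniform between $n_jT$ and $n_iT$. The law of total variance then gives

$$
\begin{aligned}
\mu_{ij} &= \langle x \rangle = \frac{(n_i + n_j)T}{2}, \\
\sigma_{ij}^2 &= \langle \operatorname{Var}(N \mid x) \rangle + \operatorname{Var}(x) = \langle x \rangle + \frac{(n_i - n_j)^2 T^2}{12},
\end{aligned}
$$

since a Poisson variable has variance equal to its mean and a variable uniform over an interval of length $\ell$ has variance $\ell^2/12$.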

An approximation of Bij(N) that gives the correct mean and variance can be obtained using a negative binomial distribution:

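$$
B_{ij}(N) \approx \frac{\Gamma(N + r_{ij})}{N!\,\Gamma(r_{ij})} \left(\frac{\mu_{ij}}{\sigma_{ij}^2}\right)^{r_{ij}} \left(1 - \frac{\mu_{ij}}{\sigma_{ij}^2}\right)^{N} \tag{13}
$$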

where $r_{ij} = \mu_{ij}^2/(\sigma_{ij}^2 - \mu_{ij})$. This distribution can be considered as the Poisson distribution with a random count rate distributed according to a gamma distribution. The negative binomial distribution reduces to the Poisson distribution in the limit ni → nj.

When N is sufficiently large, we can use a Gaussian distribution with the correct mean and variance, Eq. (12):

$$
B_{ij}(N) \approx (2\pi\sigma_{ij}^2)^{-1/2} \exp\left(-\frac{(N - \mu_{ij})^2}{2\sigma_{ij}^2}\right) \tag{14}
$$

Another approximation of Bij(N) is to use the midpoint rule to evaluate the integral in Eq. (9) (i.e., set t = T/2). In this way, we find

$$
B_{ij}(N) \approx \frac{((n_i + n_j)T/2)^N}{N!}\,e^{-(n_i + n_j)T/2} \tag{15}
$$

This approximation is the Poisson distribution with a count rate in between ni and nj. This gives the correct mean number of photons averaged over all bins that have a transition from j to i. However, the approximation results in the incorrect variance, which is equal to the mean in this case.

The crudest approximation is to evaluate the integral in Eq. (9) by setting t = 0:

$$
B_{ij}(N) \approx \frac{(n_jT)^N}{N!}\,e^{-n_jT} \tag{16}
$$

This is the Poisson distribution with the count rate of the initial state. Substituting Eq. (16) into Eq. (11), we get the likelihood function for a three-bin sequence of photons that corresponds to this approximation

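$$
L = \sum_{i,j,k=1}^{M} \frac{(n_kT)^{N_3}}{N_3!}\,e^{-n_kT}\,A_{kj}\,\frac{(n_jT)^{N_2}}{N_2!}\,e^{-n_jT}\,A_{ji}\,\frac{(n_iT)^{N_1}}{N_1!}\,e^{-n_iT}\,p_{eq}(i) \tag{17}
$$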

Here we used $\sum_l A_{lk} = 1$. This is the likelihood function that one would use when applying the standard HMM to binned trajectories with photons of one color.

In Figure 2, we compare various approximations for Bij(N), Eqs. (13)–(16), with the exact result in Eq. (10). It can be seen that even the simplest approximation (a Poisson distribution with the average count rate, Eq. (15), labeled PA in Fig. 2) improves on the “emission” probability associated with the initial state, Eq. (16). This is the simplest way of improving the likelihood function in Eq. (17). The Gaussian approximation in Eq. (14) or the more sophisticated negative binomial distribution in Eq. (13) is preferable when the fluorescence intensities of the initial and final states differ considerably. Note that the figure refers to the bins that have a transition during the bin time. The number of such bins is small when the conformational dynamics are slow. The majority of the bins are described by the Poisson “emission” probability, $B_{ii}(N) = (n_iT)^N \exp(-n_iT)/N!$, which is the same for all approximations.
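The comparison in Fig. 2 can be reproduced numerically. The sketch below implements Eqs. (10) and (12)–(16) in terms of xi = niT and xj = njT and reports the total variation distance of each approximation from the exact distribution for both panels of the figure. The distance measure is our choice for a one-number summary (the figure compares the distributions visually), and the continuous Gaussian of Eq. (14) is simply evaluated at integer N.

```python
import math

def pois(N, x):
    return math.exp(N * math.log(x) - x - math.lgamma(N + 1))

def exact(N, xi, xj):
    """Eq. (10) for xi != xj; Gamma(N + 1, x) / N! is a Poisson CDF."""
    cdf = lambda x: sum(pois(m, x) for m in range(N + 1))
    return (cdf(xi) - cdf(xj)) / (xj - xi)

def moments(xi, xj):                         # Eq. (12)
    mu = (xi + xj) / 2
    return mu, mu + (xi - xj) ** 2 / 12

def neg_binomial(N, xi, xj):                 # Eq. (13)
    mu, var = moments(xi, xj)
    r, p = mu * mu / (var - mu), mu / var
    return math.exp(math.lgamma(N + r) - math.lgamma(N + 1) - math.lgamma(r)
                    + r * math.log(p) + N * math.log(1 - p))

def gaussian(N, xi, xj):                     # Eq. (14)
    mu, var = moments(xi, xj)
    return math.exp(-(N - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def poisson_average(N, xi, xj):              # Eq. (15), "PA" in Fig. 2
    return pois(N, (xi + xj) / 2)

def poisson_initial(N, xi, xj):              # Eq. (16), "P" in Fig. 2 (standard HMM)
    return pois(N, xj)

def distances(xi, xj, N_max=200):
    ex = [exact(N, xi, xj) for N in range(N_max + 1)]
    return {f.__name__: 0.5 * sum(abs(f(N, xi, xj) - ex[N]) for N in range(N_max + 1))
            for f in (neg_binomial, gaussian, poisson_average, poisson_initial)}

# Panels (a) and (b) of Fig. 2: (n_i T, n_j T) = (10, 20) and (10, 50).
results = {panel: distances(*panel) for panel in [(10.0, 20.0), (10.0, 50.0)]}
for panel, tv in results.items():
    print(panel, {name: round(d, 4) for name, d in tv.items()})
```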

Figure 2.


Comparison of various approximations for the probability Bij(N) of detecting N photons in the bins that undergo a single $j \to i$ transition. The exact (EX) distribution (dots, Eq. (10)) is compared with the negative binomial (NB) distribution (open circles, Eqs. (13) and (12)), the Gaussian (G) distribution (solid line, Eqs. (12) and (14)), the Poisson (PA) distribution with the average count rate (crosses, Eq. (15)), and the Poisson distribution (P) with the photon count rate of the initial state (triangles, Eq. (16)), which is implicit in the standard Hidden Markov Models. Mean numbers of photons in a bin in states i and j are (a) niT = 10, njT = 20; (b) niT = 10, njT = 50.

4. Two and three color FRET

In this section we extend the previous ideas and approximations to FRET where photons of two or more colors are observed. We start with the likelihood function for a sequence of donor and acceptor photon counts (see Fig. 1(d)). The molecule in conformational state i now emits acceptor and donor photons distributed according to a Poisson distribution with count rates nAi and nDi. The likelihood function for a sequence containing both donor and acceptor photons is the generalization of Eq. (2). This can be written in terms of the probability Pij(NA,ND) to detect NA acceptor and ND donor photons and to be in state i at the end of the bin time, given that the molecule is in state j at the beginning. All these probabilities are considered as elements of matrix P(NA,ND). 1 The generating function for P(N) discussed in the previous section (see Eq. (4)) can be readily generalized to account for two photon colors[23, 28]:

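$$
\sum_{N_A, N_D = 0}^{\infty} \lambda_A^{N_A} \lambda_D^{N_D}\,\mathbf{P}(N_A, N_D) = e^{(\mathbf{K} - (1 - \lambda_A)\mathcal{N}_A - (1 - \lambda_D)\mathcal{N}_D)T} \tag{18}
$$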

where $[\mathcal{N}_A]_{ij} = n_{Ai}\delta_{ij}$ and $[\mathcal{N}_D]_{ij} = n_{Di}\delta_{ij}$. The likelihood function for a sequence of J bins with two-color photons is (cf. Eq. (2))

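$$
L = \mathbf{1}^T\,\mathbf{P}(N_{AJ}, N_{DJ}) \cdots \mathbf{P}(N_{A3}, N_{D3})\,\mathbf{P}(N_{A2}, N_{D2})\,\mathbf{P}(N_{A1}, N_{D1})\,\mathbf{p}_{eq} \tag{19}
$$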

These expressions provide an exact likelihood function for a time-binned sequence of donor and acceptor photons.

As before, we are interested in the case when conformational dynamics is so slow that at most one transition occurs during the bin time. In this case the generating function can be expanded to linear order in K (see Eq. (5)), resulting in the probability that generalizes Eq. (8):

$$
P_{ij}(N_A, N_D) \approx A_{ij}\,B_{ij}(N_A, N_D) \tag{20}
$$

Here the transition probability Aij is the same as before, Aij = δij + KijT. The “emission” probability is

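$$
B_{ij}(N_A, N_D) = \int_0^T \frac{[n_{Ai}t + n_{Aj}(T - t)]^{N_A}}{N_A!}\,\frac{[n_{Di}t + n_{Dj}(T - t)]^{N_D}}{N_D!}\,e^{-n_i t - n_j(T - t)}\,\frac{dt}{T} \tag{21}
$$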

where ni = nAi + nDi. This is just what one would expect from the extension of the expression in Eq. (9) for one-color photons to donor and acceptor photons. However, there is no closed-form expression for this integral in general. When i = j, Bij(NA,ND) in Eq. (21) is the product of two Poisson distributions with the acceptor and donor count rates nAi and nDi. As in the case of one-color photons, Bij(NA,ND) ($i \neq j$) depends on both the initial and final states in the bin, unlike the “emission” probability in the standard HMM.

To simplify the “emission” probability, we can make the same series of approximations for the off-diagonal Bij(NA,ND), $i \neq j$, as before. First, we find the means (μAij and μDij), variances ($\sigma_{Aij}^2$ and $\sigma_{Dij}^2$), and the correlation coefficient (ρij) of acceptor and donor photon counts in the bins that have a single $j \to i$ transition. They are defined as $\mu_{Fij} = \langle N_F \rangle_{ij}$, $\sigma_{Fij}^2 = \langle (N_F - \mu_{Fij})^2 \rangle_{ij}$, and $\rho_{ij} = \langle (N_A - \mu_{Aij})(N_D - \mu_{Dij}) \rangle_{ij}/(\sigma_{Aij}\sigma_{Dij})$, where the index F = A, D denotes acceptor or donor and 〈…〉ij means averaging using the distribution Bij(NA,ND) in Eq. (21). Evaluating the averages, we find

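$$
\mu_{Fij} = \frac{(n_{Fi} + n_{Fj})T}{2}, \qquad \sigma_{Fij}^2 = \mu_{Fij} + \frac{(n_{Fi} - n_{Fj})^2 T^2}{12}, \qquad \rho_{ij}\,\sigma_{Aij}\,\sigma_{Dij} = \frac{(n_{Ai} - n_{Aj})(n_{Di} - n_{Dj})T^2}{12} \tag{22}
$$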
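Equation (22) can be checked numerically by evaluating Eq. (21) on a grid of (NA, ND) and summing. The sketch uses a midpoint rule in t and illustrative rates (our assumption) for a bin that starts in a low-FRET state j and ends in a high-FRET state i with the same total count rate; the truncation at 35 photons per color and the grid size are also our choices.

```python
import math

def pois(N, x):
    return math.exp(N * math.log(x) - x - math.lgamma(N + 1))

def emission_two_color(NA, ND, i, j, nA, nD, T, n_grid=200):
    """B_ij(N_A, N_D) of Eq. (21) by the midpoint rule in t (i final, j initial)."""
    h = T / n_grid
    total = 0.0
    for s in range(n_grid):
        t = (s + 0.5) * h
        # The product of the two Poisson terms carries exp(-n_i t - n_j (T - t)).
        total += (pois(NA, nA[i] * t + nA[j] * (T - t))
                  * pois(ND, nD[i] * t + nD[j] * (T - t)))
    return total * h / T

def moments_eq22(i, j, nA, nD, T):
    muA, muD = (nA[i] + nA[j]) * T / 2, (nD[i] + nD[j]) * T / 2
    varA = muA + (nA[i] - nA[j]) ** 2 * T ** 2 / 12
    varD = muD + (nD[i] - nD[j]) ** 2 * T ** 2 / 12
    cov = (nA[i] - nA[j]) * (nD[i] - nD[j]) * T ** 2 / 12   # rho_ij sigma_Aij sigma_Dij
    return muA, muD, varA, varD, cov

nA, nD, T, i, j = (8.0, 2.0), (2.0, 8.0), 1.0, 0, 1
cells = [(a, d, emission_two_color(a, d, i, j, nA, nD, T))
         for a in range(36) for d in range(36)]
muA = sum(a * p for a, d, p in cells)
muD = sum(d * p for a, d, p in cells)
varA = sum((a - muA) ** 2 * p for a, d, p in cells)
varD = sum((d - muD) ** 2 * p for a, d, p in cells)
cov = sum((a - muA) * (d - muD) * p for a, d, p in cells)
print((muA, muD, varA, varD, cov))
print(moments_eq22(i, j, nA, nD, T))
```

The negative covariance reflects the anticorrelation of acceptor and donor counts when a transition moves intensity from one channel to the other.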

Previously we presented two approximations that have the correct mean and variance, namely, a discrete negative binomial and a continuous Gaussian distribution. While it appears possible to generalize the former using negative multinomial distributions, for the sake of simplicity we present only the generalization of Eq. (14), which is a bivariate Gaussian distribution:

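$$
B_{ij}(N_A, N_D) \approx \left(2\pi\sigma_{Aij}\sigma_{Dij}\sqrt{1 - \rho_{ij}^2}\right)^{-1} \exp\left(-\frac{(N_A - \mu_{Aij})^2}{2\sigma_{Aij}^2(1 - \rho_{ij}^2)} - \frac{(N_D - \mu_{Dij})^2}{2\sigma_{Dij}^2(1 - \rho_{ij}^2)} + \frac{\rho_{ij}(N_A - \mu_{Aij})(N_D - \mu_{Dij})}{\sigma_{Aij}\sigma_{Dij}(1 - \rho_{ij}^2)}\right) \tag{23}
$$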

where the parameters of the Gaussian are given by Eq. (22). As before, this approximation is applied only when $i \neq j$.
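A quick way to check the algebra of Eq. (23) is to compare it with the same density written through the inverse of the 2 × 2 covariance matrix. The sketch below does this for illustrative moments (μA = μD = 5, σA² = σD² = 8, ρσAσD = −3), which follow from Eq. (22) for the assumed rates nA = (8, 2), nD = (2, 8) and T = 1; the function names are ours.

```python
import math

def gaussian_eq23(NA, ND, muA, muD, sA, sD, rho):
    """Bivariate Gaussian 'emission' probability in the form of Eq. (23)."""
    q = 1.0 - rho * rho
    zA, zD = (NA - muA) / sA, (ND - muD) / sD
    exponent = -zA * zA / (2 * q) - zD * zD / (2 * q) + rho * zA * zD / q
    return math.exp(exponent) / (2 * math.pi * sA * sD * math.sqrt(q))

def gaussian_from_covariance(NA, ND, muA, muD, varA, varD, cov):
    """Same density via the inverse of the 2 x 2 covariance matrix."""
    det = varA * varD - cov * cov
    dA, dD = NA - muA, ND - muD
    quad = (varD * dA * dA - 2 * cov * dA * dD + varA * dD * dD) / det
    return math.exp(-0.5 * quad) / (2 * math.pi * math.sqrt(det))

muA, muD, varA, varD, cov = 5.0, 5.0, 8.0, 8.0, -3.0
sA, sD = math.sqrt(varA), math.sqrt(varD)
rho = cov / (sA * sD)
for NA, ND in [(5, 5), (8, 2), (2, 9)]:
    print(NA, ND, gaussian_eq23(NA, ND, muA, muD, sA, sD, rho),
          gaussian_from_covariance(NA, ND, muA, muD, varA, varD, cov))
```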

The approximation that generalizes Eq. (15) is the product of the Poisson distributions for acceptor and donor photons with the count rates in between those in states i and j:

$$
B_{ij}(N_A, N_D) \approx \frac{((n_{Ai} + n_{Aj})T/2)^{N_A}}{N_A!}\,\frac{((n_{Di} + n_{Dj})T/2)^{N_D}}{N_D!}\,e^{-(n_i + n_j)T/2} \tag{24}
$$

where ni = nAi + nDi is the total count rate in state i. This distribution has the correct mean but incorrect variance and correlation.

Finally, the approximation in the spirit of HMM that involves only the parameters of the state in the beginning of the bin (j) is

$$
B_{ij}(N_A, N_D) \approx \frac{(n_{Aj}T)^{N_A}}{N_A!}\,\frac{(n_{Dj}T)^{N_D}}{N_D!}\,e^{-n_jT} \tag{25}
$$

This is a generalization of Eq. (16) for one-color photons.

The distribution in the above equation is one of the “emission” probabilities used in the framework of HMM [19] in the analysis of binned sequences of acceptor and donor photon counts. It follows from our derivation that the likelihood function presented as the product of the transition and “emission” probabilities (as implied in the standard HMM) can be used only when conformational dynamics are slow on the time scale of the bin time. Even in this case, the “emission” probability can be improved by using the approximations in Eq. (23) or Eq. (24) instead of the simple Poisson distribution as in Eq. (25).

Three-color FRET

The likelihood function and its approximations for one- and two-color photons can be readily extended to three-color FRET. Consider a molecule with three attached labels, namely, one donor, D, and two acceptors, A and B. The molecule in conformational state i can emit three kinds of photons with count rates nAi, nBi, and nDi. As in the case of two-color FRET, it is not necessary to have a microscopic theory for these quantities in order to extract the rates of conformational changes from photon trajectories. Instead of presenting the straightforward generalization of the previous theory, we directly consider the case of slow conformational dynamics, in which at most one transition occurs during a bin. In this case the likelihood function for three bins can be written as

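$$
L = \mathbf{1}^T\,\mathbf{P}(N_{A3}, N_{B3}, N_{D3})\,\mathbf{P}(N_{A2}, N_{B2}, N_{D2})\,\mathbf{P}(N_{A1}, N_{B1}, N_{D1})\,\mathbf{p}_{eq} \tag{26}
$$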

where

$$
P_{ij}(N_A, N_B, N_D) \approx A_{ij}\,B_{ij}(N_A, N_B, N_D) \tag{27}
$$

with the transition probability Aij = δij + KijT and the “emission” probability

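$$
B_{ij}(N_A, N_B, N_D) = \int_0^T \frac{[n_{Ai}t + n_{Aj}(T - t)]^{N_A}}{N_A!}\,\frac{[n_{Bi}t + n_{Bj}(T - t)]^{N_B}}{N_B!}\,\frac{[n_{Di}t + n_{Dj}(T - t)]^{N_D}}{N_D!}\,e^{-n_i t - n_j(T - t)}\,\frac{dt}{T} \tag{28}
$$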

where ni = nAi + nBi + nDi. As before, this can be evaluated numerically or various approximations can be made.

The approximations for the “emission” probability for two-color FRET considered before can be readily extended to three colors. The approximation in the spirit of HMM [19], which generalizes Eq. (25), is the product of three Poisson distributions that involve only the parameters of the state at the beginning of the bin (j). The approximation that generalizes Eq. (24) is the product of three Poisson distributions with the count rates in between those in states i and j:

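$$
B_{ij}(N_A, N_B, N_D) \approx \frac{\mu_{Aij}^{N_A}}{N_A!}\,\frac{\mu_{Bij}^{N_B}}{N_B!}\,\frac{\mu_{Dij}^{N_D}}{N_D!}\,e^{-\mu_{Aij} - \mu_{Bij} - \mu_{Dij}} \tag{29}
$$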

where μFij = (nFi + nFj)T/2 and index F now stands for A, B, and D.

The generalization of the Gaussian approximation in Eq. (23) involves three variables:

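$$
B_{ij}(N_A, N_B, N_D) \approx \frac{1}{\sqrt{(2\pi)^3 \det \Sigma}} \exp\left(-\frac{1}{2} \sum_{F,F'} (N_F - \mu_{Fij})(N_{F'} - \mu_{F'ij})\,[\Sigma^{-1}]_{FF'}\right) \tag{30}
$$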

where Σ is the covariance matrix with elements $[\Sigma]_{FF'} = \mu_{Fij}\delta_{FF'} + (n_{Fi} - n_{Fj})(n_{F'i} - n_{F'j})T^2/12$ (cf. Eq. (22)). In the above equation, the summation is over all F and F′, which take values A, B, and D.
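Eq. (30) requires only a 3 × 3 determinant and inverse. The sketch below builds Σ from the formula above and evaluates the Gaussian; the color ordering (A, B, D), the dictionary layout of the rates, the function names, and the example rates are assumptions for illustration. When the B count rate is the same in both states, Σ becomes block diagonal and Eq. (30) factorizes into a two-color Gaussian in (NA, ND) times a one-dimensional Gaussian in NB, which the usage lines check.

```python
import math

COLORS = ("A", "B", "D")

def covariance_eq30(i, j, rates, T):
    """Means mu_Fij and covariance matrix Sigma (ordered A, B, D) for a j -> i bin."""
    mu = [(rates[F][i] + rates[F][j]) * T / 2 for F in COLORS]
    d = [rates[F][i] - rates[F][j] for F in COLORS]
    S = [[(mu[a] if a == b else 0.0) + d[a] * d[b] * T * T / 12 for b in range(3)]
         for a in range(3)]
    return mu, S

def det3(S):
    return (S[0][0] * (S[1][1] * S[2][2] - S[1][2] * S[2][1])
            - S[0][1] * (S[1][0] * S[2][2] - S[1][2] * S[2][0])
            + S[0][2] * (S[1][0] * S[2][1] - S[1][1] * S[2][0]))

def inv3(S):
    """Inverse of a 3 x 3 matrix as adjugate / determinant (cyclic cofactor form)."""
    det = det3(S)
    cof = [[S[(r + 1) % 3][(c + 1) % 3] * S[(r + 2) % 3][(c + 2) % 3]
            - S[(r + 1) % 3][(c + 2) % 3] * S[(r + 2) % 3][(c + 1) % 3]
            for c in range(3)] for r in range(3)]
    return [[cof[c][r] / det for c in range(3)] for r in range(3)]

def gaussian_eq30(counts, i, j, rates, T):
    """Trivariate Gaussian 'emission' probability of Eq. (30); counts = (N_A, N_B, N_D)."""
    mu, S = covariance_eq30(i, j, rates, T)
    Sinv = inv3(S)
    dev = [counts[a] - mu[a] for a in range(3)]
    quad = sum(dev[a] * dev[b] * Sinv[a][b] for a in range(3) for b in range(3))
    return math.exp(-0.5 * quad) / math.sqrt((2 * math.pi) ** 3 * det3(S))

rates = {"A": (8.0, 2.0), "B": (4.0, 4.0), "D": (2.0, 8.0)}   # (state i, state j)
mu, S = covariance_eq30(0, 1, rates, 1.0)
N = (6, 3, 4)
g3 = gaussian_eq30(N, 0, 1, rates, 1.0)
# Factorized check: bivariate Gaussian in (N_A, N_D) times a Gaussian in N_B.
det2 = S[0][0] * S[2][2] - S[0][2] ** 2
dA, dB, dD = N[0] - mu[0], N[1] - mu[1], N[2] - mu[2]
quad2 = (S[2][2] * dA * dA - 2 * S[0][2] * dA * dD + S[0][0] * dD * dD) / det2
g2 = math.exp(-0.5 * quad2) / (2 * math.pi * math.sqrt(det2))
g1 = math.exp(-dB * dB / (2 * S[1][1])) / math.sqrt(2 * math.pi * S[1][1])
print(g3, g2 * g1)
```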

5. FRET efficiencies

Instead of studying sequences of donor and acceptor photon counts, one can construct and then analyze sequences of FRET efficiencies. They are particularly advantageous for the single-molecule measurements of the molecules diffusing through a laser spot because FRET efficiencies are less influenced by the fluctuations of the laser intensity inside the spot [31]. FRET efficiencies are calculated from photon counts in a bin as Ei = NAi/(NAi + NDi). For example, the sequence of photons shown in Fig. 1(d) can be converted to the sequence of FRET efficiencies, E1 = NA1/(NA1 + ND1), E2 = NA2/(NA2 + ND2), E3 = NA3/(NA3 + ND3). The likelihood that parameters of the model are consistent with such a trajectory now depends on one variable (FRET efficiency), not two as in Eq. (19). The likelihood function for the sequence of J bins can be expressed in terms of matrix P(E), similar to Eq. 2:

$$
L = \mathbf{1}^T\,\mathbf{P}(E_J) \cdots \mathbf{P}(E_3)\,\mathbf{P}(E_2)\,\mathbf{P}(E_1)\,\mathbf{p}_{eq} \tag{31}
$$

The matrix element Pij(E) is the probability that FRET efficiency falls into an interval E ± h of size h for the bins in which the molecule is in state i at the end of the bin interval and in state j at the beginning. 2 This probability is related to Pij (NA,ND) and can be obtained from the latter by summing over NA and ND such that NA/(NA + ND) = E ± h.

Now we proceed directly to the limit of slow conformational transitions, in which case Pij (E) can be presented as the product of the transition probabilities, Aij = δij + KijT, and the “emission” probabilities, Bij(E), similarly to Eqs. 8 and 20:

$$
P_{ij}(E) \approx A_{ij}\,B_{ij}(E) \tag{32}
$$

The “emission” probability Bij(E) is the probability of E when at most a single transition occurs in a bin. This can be obtained only numerically from Bij(NA,ND), Eq. (21), by summing over NA and ND. So the strategy we adopt here to simplify Bij(E) is to find the mean and variance of Bij(E) and then to use them to construct either a Beta or a Gaussian distribution.

The mean FRET efficiency of all bins in which the molecule is in state j at the beginning and in state i at the end of the bin time is complicated unless the total count rate is independent of conformation. Since using FRET efficiencies rather than the numbers of acceptor and donor photons is a good idea only when the total count rate does not depend on conformation, we shall now assume that this is the case, i.e., nAi + nDi = n is independent of i. Then, evaluating the average $\langle E \rangle_{ij} \equiv \langle N_A/(N_A + N_D) \rangle_{ij}$ using the distribution Bij(NA,ND) in Eq. (21), we have

$$\mathcal{E}_{ij} \equiv \langle E\rangle_{ij} = \tfrac{1}{2}\left(\mathcal{E}_i + \mathcal{E}_j\right) \tag{33}$$

where ℰi = nAi/(nAi + nDi) = nAi/n is the average apparent FRET efficiency of conformation i.

Similarly, one can show that the variance $\sigma_{ij}^2 \equiv \langle E^2\rangle_{ij} - \langle E\rangle_{ij}^2$ is

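$$\sigma_{ij}^2 = \mathcal{E}_{ij}\left(1-\mathcal{E}_{ij}\right)\langle N^{-1}\rangle + \frac{1}{12}\left(\mathcal{E}_j-\mathcal{E}_i\right)^2\left(1-\langle N^{-1}\rangle\right) \tag{34}$$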

where $\langle N^{-1}\rangle$ is the mean of the reciprocal of the total number of acceptor and donor photons in a bin, N = NA + ND. Since the total count rate does not depend on the conformational state, $\langle N^{-1}\rangle$ is also state-independent.
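Equations (33) and (34) can be checked against a direct photon-by-photon simulation. The sketch below keeps the assumption of the text that the total count rate n is the same in both states. It adds three assumptions of its own:

- the single j → i transition time is uniformly distributed within the bin (the slow-dynamics limit);
- photon arrival times form a Poisson process;
- each photon is an acceptor photon with probability ℰj before the transition and ℰi after it.

Function names are hypothetical and the sample sizes are illustrative.

```python
from __future__ import annotations
import random


def simulate_transition_bins(e_i: float, e_j: float, nT: float, n_bins: int,
                             seed: int = 0) -> tuple[list[float], list[int]]:
    """FRET efficiencies of bins containing exactly one j -> i transition.

    Time is measured in units of the bin time T. Assumptions: the total
    photon rate is the same in both states (nT photons per bin on average),
    the transition time is uniform in [0, 1), and each photon is an acceptor
    photon with probability e_j before the transition and e_i after it.
    Bins with no photons are dropped because E is undefined there.
    """
    rng = random.Random(seed)
    effs, counts = [], []
    for _ in range(n_bins):
        tau = rng.random()                    # transition time within the bin
        n_a = n = 0
        t = rng.expovariate(nT)               # Poisson photon arrivals
        while t < 1.0:
            if rng.random() < (e_j if t < tau else e_i):
                n_a += 1
            n += 1
            t += rng.expovariate(nT)
        if n > 0:
            effs.append(n_a / n)
            counts.append(n)
    return effs, counts


def predicted_moments(e_i: float, e_j: float, mean_inv_n: float) -> tuple[float, float]:
    """Mean and variance of E from Eqs. (33) and (34)."""
    e_ij = 0.5 * (e_i + e_j)
    var = e_ij * (1 - e_ij) * mean_inv_n + (e_j - e_i) ** 2 / 12 * (1 - mean_inv_n)
    return e_ij, var


effs, counts = simulate_transition_bins(e_i=0.3, e_j=0.7, nT=50, n_bins=10000)
mean_e = sum(effs) / len(effs)
var_e = sum((x - mean_e) ** 2 for x in effs) / (len(effs) - 1)
mean_inv_n = sum(1 / n for n in counts) / len(counts)   # empirical <1/N>
pred_mean, pred_var = predicted_moments(0.3, 0.7, mean_inv_n)
print(f"mean {mean_e:.4f} (Eq. 33: {pred_mean:.4f}), "
      f"variance {var_e:.5f} (Eq. 34: {pred_var:.5f})")
```

With ℰi = 0.3 and ℰj = 0.7, the second term of Eq. (34) dominates. The variance is several times larger than the shot-noise term ℰij(1 − ℰij)⟨N⁻¹⟩ alone. A width that ignores transitions within the bin would therefore be far too narrow.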

Using the mean and variance in the above equations, Bij(E) can be approximated as a continuous Gaussian distribution:

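$$B_{ij}(E) \approx \left(2\pi\sigma_{ij}^2\right)^{-1/2}\exp\left(-\frac{\left(E-\mathcal{E}_{ij}\right)^2}{2\sigma_{ij}^2}\right) \tag{35}$$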

Alternatively, one can use a Beta distribution

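$$B_{ij}(E) \approx \frac{\Gamma(\alpha_{ij}+\beta_{ij})}{\Gamma(\alpha_{ij})\,\Gamma(\beta_{ij})}\,E^{\alpha_{ij}-1}(1-E)^{\beta_{ij}-1} \tag{36}$$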

with the parameters chosen such that the mean and the variance are exact:

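$$\alpha_{ij} = \frac{\mathcal{E}_{ij}^2\left(1-\mathcal{E}_{ij}\right)}{\sigma_{ij}^2} - \mathcal{E}_{ij}, \qquad \beta_{ij} = \frac{\mathcal{E}_{ij}\left(1-\mathcal{E}_{ij}\right)^2}{\sigma_{ij}^2} - 1 + \mathcal{E}_{ij} \tag{37}$$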

Note that the “emission” probabilities in Eqs. (35) and (36) apply to both i = j and i ≠ j.
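Below is a minimal sketch of the two approximate “emission” densities, Eqs. (35) to (37), in Python using only the standard library. It makes three assumptions. The functions return probability densities in E, so the probability of an interval of width h is approximately the density times h. ⟨N⁻¹⟩ is supplied by the caller, for example estimated from the measured distribution of total photon counts. All names are hypothetical.

```python
from __future__ import annotations
import math


def emission_moments(e_i: float, e_j: float, mean_inv_n: float) -> tuple[float, float]:
    """Mean (Eq. 33) and variance (Eq. 34) of E for bins starting in j and ending in i."""
    e_ij = 0.5 * (e_i + e_j)
    var = e_ij * (1 - e_ij) * mean_inv_n + (e_j - e_i) ** 2 / 12 * (1 - mean_inv_n)
    return e_ij, var


def gaussian_emission(E: float, e_i: float, e_j: float, mean_inv_n: float) -> float:
    """Gaussian density of Eq. (35)."""
    mu, var = emission_moments(e_i, e_j, mean_inv_n)
    return math.exp(-(E - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def beta_params(mu: float, var: float) -> tuple[float, float]:
    """Moment-matched Beta parameters, Eq. (37)."""
    if not 0 < var < mu * (1 - mu):
        raise ValueError("a Beta distribution needs 0 < var < mu (1 - mu)")
    alpha = mu ** 2 * (1 - mu) / var - mu
    beta = mu * (1 - mu) ** 2 / var - 1 + mu
    return alpha, beta


def beta_emission(E: float, e_i: float, e_j: float, mean_inv_n: float) -> float:
    """Beta density of Eq. (36) with the parameters of Eq. (37); needs 0 < E < 1."""
    alpha, beta = beta_params(*emission_moments(e_i, e_j, mean_inv_n))
    log_norm = math.lgamma(alpha + beta) - math.lgamma(alpha) - math.lgamma(beta)
    return math.exp(log_norm + (alpha - 1) * math.log(E) + (beta - 1) * math.log1p(-E))
```

The guard in `beta_params` matters. A Beta distribution on [0, 1] cannot have a variance of ℰij(1 − ℰij) or larger, and in that regime Eq. (37) would give non-positive parameters. Setting e_i = e_j gives the no-transition case i = j.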

Both Beta and Gaussian distributions were used to describe sequences of FRET efficiencies in the framework of HMM [14, 19]. However, the distributions used in these works differ from our approximations in two ways. First, our distributions depend on the initial and final states, which accounts for the fact that transitions between the conformational states can occur at any time during the bin, not just at the beginning. Second, the parameters of our Beta and Gaussian distributions are not free but related to the model parameters (i.e., FRET efficiencies of the states) via Eq. (34). The additional parameter 〈N−1〉 can be obtained from the experimentally measured distribution of the total number of photons.

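In addition to the Beta distribution with the parameters in Eq. (37), we consider a simpler set of parameters. We replace $\sigma_{ij}^2$ by $\mathcal{E}_{ij}(1-\mathcal{E}_{ij})/[(n_{Aj}+n_{Dj})T]$ in Eq. (37). This keeps only the first term of Eq. (34), with $\langle N^{-1}\rangle \approx 1/[(n_{Aj}+n_{Dj})T]$. Using $\mathcal{E}_{ij}$ from Eq. (33), we obtain, for $(n_{Aj}+n_{Dj})T \gg 1$,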

$$\alpha_{ij} = \left(n_{Ai}+n_{Aj}\right)T/2, \qquad \beta_{ij} = \left(n_{Di}+n_{Dj}\right)T/2 \tag{38}$$

These parameters involve the average count rates, by analogy with Eqs. (15) and (24).

Finally, we consider a third set of parameters for the Beta distribution, which depend only on the state of the molecule at the beginning of the bin:

$$\alpha_{ij} = n_{Aj}T, \qquad \beta_{ij} = n_{Dj}T \tag{39}$$

These are equal to the mean numbers of acceptor and donor photons detected during the bin time T when the molecule is in state j.
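The qualitative differences among the parameter sets B1, B2, and B3 discussed below follow directly from the first two moments. The sketch computes the mean and variance implied by each set and compares them with Eqs. (33) and (34). It assumes a state-independent total rate, so nAk = ℰk n and nDk = (1 − ℰk) n. For illustration only, it also assumes Poisson-distributed total counts when evaluating ⟨N⁻¹⟩, whereas the text takes ⟨N⁻¹⟩ from the measured photon-count distribution. Names are hypothetical.

```python
from __future__ import annotations
import math


def mean_inverse_poisson(lam: float) -> float:
    """<1/N> for Poisson(lam) total counts, conditioned on N >= 1 (lam < ~700)."""
    p0 = math.exp(-lam)          # P(N = 0)
    p, acc = p0, 0.0
    for k in range(1, int(lam + 20 * math.sqrt(lam)) + 20):
        p *= lam / k             # P(N = k) from P(N = k - 1)
        acc += p / k
    return acc / (1.0 - p0)


def beta_moments(alpha: float, beta: float) -> tuple[float, float]:
    """Mean and variance of a Beta(alpha, beta) distribution."""
    s = alpha + beta
    return alpha / s, alpha * beta / (s * s * (s + 1))


def compare_parameter_sets(e_i: float, e_j: float, nT: float, mean_inv_n: float):
    """Target moments (Eqs. 33-34) and the moments implied by B1, B2, B3."""
    mu = 0.5 * (e_i + e_j)
    var = mu * (1 - mu) * mean_inv_n + (e_j - e_i) ** 2 / 12 * (1 - mean_inv_n)
    sets = {
        "B1": (mu ** 2 * (1 - mu) / var - mu,
               mu * (1 - mu) ** 2 / var - 1 + mu),                # Eq. (37)
        "B2": ((e_i + e_j) * nT / 2, (2 - e_i - e_j) * nT / 2),   # Eq. (38)
        "B3": (e_j * nT, (1 - e_j) * nT),                         # Eq. (39)
    }
    return (mu, var), {name: beta_moments(*ab) for name, ab in sets.items()}


target, implied = compare_parameter_sets(0.3, 0.7, nT=100,
                                         mean_inv_n=mean_inverse_poisson(100))
for name, (m, v) in implied.items():
    print(f"{name}: mean {m:.3f} (target {target[0]:.3f}), "
          f"variance {v:.5f} (target {target[1]:.5f})")
```

These inputs are ℰi = 0.3, ℰj = 0.7, and nT = 100, the parameters of Figure 3(b). B1 reproduces both moments by construction. B2 reproduces the mean but gives a variance several times smaller than Eq. (34). B3 is centered at ℰj rather than at ℰij.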

In Figure 3, various approximations of the “emission” probability Bij(E) are tested against exact histograms of the FRET efficiency for the bins in which the molecule undergoes a j → i transition. The exact histograms are calculated by summing Bij(NA,ND), Eq. (21), over NA and ND under the constraint that NA/(NA + ND) is within E ± h, where h is the histogram step.

- The Gaussian distribution, Eqs. (33)–(35), and the Beta distribution, Eq. (36), with the parameters in Eq. (37) (labeled B1) are almost identical and describe the exact histogram best. These distributions depend on both the initial and final conformational states and reproduce the correct mean and variance.
- The Beta distribution with the parameters in Eq. (39), which depend only on the initial conformational state (labeled B3), deviates significantly from the exact distribution. Using this “emission” probability in the likelihood function might lead to a loss of accuracy in determining the model parameters.
- Finally, the Beta distribution with the average count rates in Eq. (38) (B2) has the correct mean but too small a variance, since it lacks the second term in Eq. (34). This might be considered the simplest (but not the best) way of correcting the parameters of the Beta distribution.

Figure 3.


Comparison of various approximations for the probability of FRET efficiency, Bij(E), in the bins that undergo a single j → i transition. The exact (EX) histograms (bars) are calculated using Eq. (21). They are compared with the Gaussian (G) distribution (solid line, Eqs. (33)–(35)) and with Beta distributions, Eq. (36), with three different sets of parameters: parameters B1 result in the correct mean and variance (dashed line, Eq. (37)), B2 are determined by the average count rates (dotted, Eq. (38)), and B3 by the count rates of the initial state (dashed-dotted, Eq. (39)). The mean total number of donor and acceptor photons in a bin is nT = 100. The FRET efficiencies of the states are (a) ℰi = 0.5, ℰj = 0.7 and (b) ℰi = 0.3, ℰj = 0.7.

6. Concluding remarks

In this paper we considered how to construct the likelihood functions required to analyze a time-binned photon trajectory. The likelihood function can be expressed in terms of the distribution of photon counts during the bin time. This distribution depends on whether a transition between conformational states has occurred during the bin. For a two-state model, we have obtained an exact analytical expression for this distribution and, therefore, for the likelihood function. This result is valid for both long and short bin times, including bin times comparable to the time between state transitions. In the general case, the likelihood function is more complex, and we have presented approximations that start from a rigorous description of photon statistics.

In the case of slow conformational dynamics, the distribution of photons during the bin time can be expressed as a product of the transition and “emission” probabilities. However, the “emission” probability depends on the molecule’s initial and final states, since a transition between conformational states can occur at any time during the bin. When there are no transitions between the states, the “emission” distribution is Poissonian. When the conformational state changes during the bin, the distribution is more complex.

For the case in which at most one transition can occur during the bin time, several approximations for the “emission” distribution have been suggested. The simplest and most evident (though not necessarily the most accurate) approximation is the Poisson distribution with a count rate intermediate between the initial and final count rates. More accurate approximations of the “emission” probability have been constructed using the exact mean and variance of the “emission” distribution. As with the standard HMM, the approximate likelihood functions are valid when the conformational dynamics are slow compared to the bin time, but the restriction is less stringent. The approximations modify the “emission” probability only for the bins with transitions between the states. Although such bins are few when the dynamics are slow, we hope that these corrections to the “emission” probability will help to reduce bias in estimating model parameters.

Similar ideas have been employed to treat two-color photon trajectories converted into sequences of FRET efficiencies. The “emission” distributions for the bins with and without transitions have been determined. Note that apparent FRET efficiencies depend only on the conformational state, whereas photon count rates depend on both the conformational state and the location in the laser spot. This opens the possibility of using the likelihood function for binned trajectories of FRET efficiencies presented above to analyze molecules diffusing through a laser spot. When a molecule diffuses through the laser spot, the only parameter that depends on its location in the spot is the mean reciprocal of the total number of acceptor and donor photons, $\langle N^{-1}\rangle$, which enters the variance, Eq. (34).

The likelihood function can be either directly optimized with respect to the model parameters or used as the starting point of more sophisticated Bayesian procedures. Since likelihood-based methods depend on using the correct likelihood function, we hope that our work provides a sound foundation to build upon. One way to check the consistency of the observed photon trajectories with the model is to generate photon sequences according to the model, construct a FRET efficiency histogram (or the distribution of photon counts in the case of photons of one color), and compare it with the observed one [15]. The FRET efficiency distribution obtained from simulated data constructed using HMM contains only peaks corresponding to the conformational states. However, real data may contain additional peaks due to exchange between the states [20, 32–34], which can be misinterpreted as an additional conformational state. The improved likelihood function presented in this paper should help to avoid such problems when analyzing photon trajectories, i.e., the detection of false states that arise from incorrect treatment of bins with transitions between states.
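The consistency check described above can be sketched as three steps. First, simulate a two-state trajectory in which transitions occur at arbitrary times. Second, bin the photons. Third, build an efficiency histogram. The sketch makes the following assumptions:

- Markovian switching with hypothetical rate constants `k_12` (2 → 1) and `k_21` (1 → 2);
- a state-independent total count rate n;
- photons emitted as a Poisson process;
- empty bins discarded.

It is an illustration, not necessarily the simulation procedure used in the cited work.

```python
from __future__ import annotations
import random


def simulate_binned_efficiencies(k_12: float, k_21: float, e_1: float, e_2: float,
                                 n: float, T: float, n_bins: int,
                                 seed: int = 0) -> list[float]:
    """Sequence of FRET efficiencies N_A/(N_A + N_D) for consecutive bins.

    Two-state Markov model (hypothetical parameter names): 1 -> 2 at rate k_21,
    2 -> 1 at rate k_12; total photon rate n in both states; each photon is an
    acceptor photon with probability e_1 or e_2 depending on the current state.
    Bins without photons are skipped.
    """
    rng = random.Random(seed)
    eff = {1: e_1, 2: e_2}
    leave = {1: k_21, 2: k_12}                # rate of leaving each state
    # Start in equilibrium: P(state 1) = k_12 / (k_12 + k_21).
    state = 1 if rng.random() < k_12 / (k_12 + k_21) else 2
    t_switch = rng.expovariate(leave[state])  # time of the next transition
    t_photon = rng.expovariate(n)             # time of the next photon
    effs = []
    for b in range(n_bins):
        end = (b + 1) * T
        n_a = n_tot = 0
        while t_photon < end:
            while t_switch < t_photon:        # apply all transitions before this photon
                state = 3 - state
                t_switch += rng.expovariate(leave[state])
            n_tot += 1
            if rng.random() < eff[state]:
                n_a += 1
            t_photon += rng.expovariate(n)
        if n_tot > 0:
            effs.append(n_a / n_tot)
    return effs


def histogram(values: list[float], n_bins: int = 20) -> list[int]:
    """Counts of values in n_bins equal-width bins on [0, 1]."""
    counts = [0] * n_bins
    for v in values:
        counts[min(int(v * n_bins), n_bins - 1)] += 1
    return counts


effs = simulate_binned_efficiencies(k_12=1.0, k_21=1.0, e_1=0.3, e_2=0.7,
                                    n=500.0, T=0.1, n_bins=5000)
feh = histogram(effs)
```

An HMM-style simulation assigns each bin a single state. Here, bins that straddle a transition yield efficiencies between ℰ1 and ℰ2. Comparing such a histogram with the measured one therefore probes exactly the bins that the improved “emission” probabilities are meant to treat.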

Research Highlights.

A sequence of photon counts can be analyzed using a likelihood function. An exact likelihood function for a two-state kinetic model is provided. Several approximations are considered for an arbitrary kinetic model. Improved likelihood functions are obtained to treat sequences of FRET efficiencies.

Acknowledgments

I thank A. Szabo for numerous extremely illuminating discussions, H.S. Chung for Figure 1(a), and A. Berezhkovskii for helpful comments on the manuscript. This work was supported by the Intramural Research Program of the National Institutes of Health, NIDDK.

Footnotes

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

1

The joint probability distribution of acceptor and donor photons in all bins of duration T, P(NA,ND), is obtained by averaging over all initial and final conformational states, $P(N_A,N_D) = \mathbf{1}^{T}\,\mathbf{P}(N_A,N_D)\,\mathbf{p}_{eq}$. This quantity is central to all approaches [4, 23, 30] that focus on analyzing FRET efficiency histograms.

2

The FRET efficiency histogram, FEH(E), which is obtained from the set of efficiencies irrespective of order, is related to P(E) as $\mathrm{FEH}(E) = \mathbf{1}^{T}\,\mathbf{P}(E)\,\mathbf{p}_{eq}$.

References
