Skip to main content
Biophysical Reports logoLink to Biophysical Reports
. 2022 Dec 2;3(1):100089. doi: 10.1016/j.bpr.2022.100089

Single-photon smFRET. I: Theory and conceptual basis

Ayush Saurabh 1,2, Mohamadreza Fazel 1,2, Matthew Safar 1,3, Ioannis Sgouralis 4, Steve Pressé 1,2,5,
PMCID: PMC9793182  PMID: 36582655

Abstract

We present a unified conceptual framework and the associated software package for single-molecule Förster resonance energy transfer (smFRET) analysis from single-photon arrivals leveraging Bayesian nonparametrics, BNP-FRET. This unified framework addresses the following key physical complexities of a single-photon smFRET experiment, including: 1) fluorophore photophysics; 2) continuous time kinetics of the labeled system with large timescale separations between photophysical phenomena such as excited photophysical state lifetimes and events such as transition between system states; 3) unavoidable detector artefacts; 4) background emissions; 5) unknown number of system states; and 6) both continuous and pulsed illumination. These physical features necessarily demand a novel framework that extends beyond existing tools. In particular, the theory naturally brings us to a hidden Markov model with a second-order structure and Bayesian nonparametrics on account of items 1, 2, and 5 on the list. In the second and third companion articles, we discuss the direct effects of these key complexities on the inference of parameters for continuous and pulsed illumination, respectively.

Why it matters

smFRET is a widely used technique for studying kinetics of molecular complexes. However, until now, smFRET data analysis methods have required specifying a priori the dimensionality of the underlying physical model (the exact number of kinetic parameters). Such approaches are inherently limiting given the typically unknown number of physical configurations a molecular complex may assume. The methods presented here eliminate this requirement and allow estimating the physical model itself along with kinetic parameters, while incorporating all sources of noise in the data.

Introduction

Förster resonance energy transfer (FRET) has served as a spectroscopic ruler to study motion at the nanometer scale (1,2,3,4), and has revealed insight into intra- and intermolecular dynamics of proteins (5,6,7,8,9,10,11), nucleic acids (12), and their interactions (13,14). In particular, single-molecule FRET (smFRET) experiments have been used to determine the pore size and opening mechanism of ion channels sensitive to mechanical stress in the membrane (15), the intermediate stages of protein folding (16,17), and the chromatin interactions modulated by the helper protein HP1 α involved in allowing genetic transcription for tightly packed chromatin (18).

A typical FRET experiment involves labeling molecules of interest with donor and acceptor dyes such that the donor may transfer energy to the acceptor via dipole-dipole interaction when separated by distances of 2–10 nm (19). This interaction weakens rapidly with increasing separation R and goes as R6 (20,21).

To induce FRET during experiments, the donor is illuminated by a continuous or pulsating light source for the desired time period or until the dyes photobleach. Upon excitation, the donor may emit a photon itself or transfer its energy nonradiatively to the acceptor which eventually relaxes to emit a photon of a different color (20,21). As such, the data collected consist of photon arrival times (for single-photon experiments) or, otherwise, brightness values in addition to photon colors collected in different detection channels.

The distance dependence in the rate of energy transfer between donor and acceptor is key in using smFRET as a molecular ruler. Furthermore, this distance dependence directly manifests itself in the form of higher fraction of photons detected in the acceptor channel when the dyes are closer together (as demonstrated in Fig. 1). This fraction is commonly referred to as the FRET efficiency,

ϵFRET=nAnA+nD=11+(R/R0)6,

where nD and nA are the number of donor and acceptor photons detected in a given time period, respectively. In addition, R0 is the characteristic separation that corresponds to a FRET efficiency of 0.5 or 50% of the emitted photons emanating from the acceptor.

Figure 1.

Figure 1

A cartoon figure illustrating smFRET data. For the experiments considered here, the kinetics along the reaction coordinate defined along the donor-acceptor distance are monitored using single-photon arrival data. In the figure above, photon arrivals are represented by green dots for photons arriving into the donor channel and red dots for photons arriving in the acceptor channel. For the case where donor and acceptor label one molecule, a molecule’s transitions between system states (coinciding with conformations) is reflected by the distance between labels measured by variations in detected photon arrival times and colors.

Now, the aim of smFRET is to capture on-the-fly changes in donor-acceptor distance. However, this is often confounded by several sources of stochasticity, which unavoidably obscure direct interpretation. These include: 1) the stochasticity inherent to photon arrival times; 2) a detector’s probabilistic response to an incoming photon (22); 3) background emissions (2); and 4) fluorescent labels’ stochastic photophysical properties (2). Taken together, these problems necessarily contribute to uncertainty in the number of distinct system states visited by a labeled system over an experiment’s course (23,24,25).

Here, we delve into greater detail into items 2 and 4. In particular, item 2 pertains to questions of crosstalk, detector efficiency, dead time, dark current, and instrument response function (IRF) introducing uncertainty in excited photophysical state lifetime assessments (22,26,27).

Item 4 refers to a collection of effects including limited quantum yield and variable brightness due to blinking of dyes caused by nonradiative pathways (28,29), photobleaching or permanent deactivation of the dyes (2,28,29), spectral overlap between the donor and acceptor dyes, which may result in direct excitation of the acceptors or leaking of photons into the incorrect channel (2,26), or a donor-acceptor pair’s relative misalignment or positioning resulting in false signals and inaccurate characterization of the separation between labeled molecules (2,30).

Although the goal has always remained to analyze the rawest form of data, the reality of these noise properties has traditionally led to the development of approximate binned photon analyses even when data are collected at the level of single photons across two detectors. Binning is either achieved by directly summing photon arrivals over a time period when using single-photon detectors (23,31) or by integrating intensity over a few pixels when using widefield detectors (32).

While binned data analyses can be used to determine the number and connectivity of system states (33)—by computing average FRET efficiencies over bin time windows and using them in turn to construct FRET efficiency histograms (23,25,31,34,35,36)—they come at the cost of averaging kinetics that may exist below a time bin not otherwise easily accessible (32,37,38). They also eliminate information afforded by, say, the excited photophysical state lifetime in the case of pulsed illumination.

While histogram analyses are suited to infer static molecular properties, kinetics over binned time traces have also been extracted by supplementing these techniques with a hidden Markov model (HMM) treatment (23,25,34,35,36,39).

Using HMMs, binned analysis techniques immediately face the difficulty of an unknown number of system states visited. Therefore, they require the number of system states as an input to deduce the putative kinetics between the candidate system states.

What is more, the binned analysis’ accuracy is determined by the bin sizes where large bins may result in averaging of the kinetics. Moreover, increasing bin size may lead to estimation of an excess number of system states. This artifact arises when a system appears to artificially spend more time in the system states below the bin size (38). To address these challenges, we must infer continuous time trajectories below the bin size through, for example, the use of Markov jump processes (32), while retaining a binned, i.e., discrete measurement model.

When single-photon data are available we may avoid the binning issues inherent to HMM analysis (32,40,41). Doing so, also allows us to directly leverage the noise properties of detectors for single-photon arrivals (e.g., IRF) well calibrated at the single-photon level. Moreover, we can now also incorporate information available through photophysical state lifetimes when using pulsed illumination otherwise eliminated in binning data. Incorporating all of this additional information, naturally, comes with added computational cost (37) whose burden a successful method should mitigate.

Often, to help reduce computational costs, further approximations on the system kinetics are invoked, such as assuming system kinetics to be much slower than FRET label excitation and relaxation rates. This approximation helps decouple photophysical and system (molecular) kinetics (16,37,42,43).

What is more, as they exist, the rigor of direct photon arrival analysis methods are further compromised to help reduce computational cost by treating detector features and background as preprocessing steps (16,37,42,43). In doing so, simultaneous and self-consistent inference of kinetics and other molecular features becomes unattainable. Finally, all methods, whether relying on the analysis of binned photons or single-photon arrival, suffer from the “model selection problem.” That is, the problem associated with identifying the number of system states warranted by the data. More precisely, the problem associated with propagating the uncertainty introduced by items 1–4 into a probability over the models (i.e., system states). Existing methods for system state identification only provide partial reprieve.

For example, while FRET histograms identify peaks to intuit the number of system states, these peaks may provide unreliable estimates for a number of reasons: 1) fast transitions between system states may result in a blurring of otherwise distinct peaks (1) or, counter-intuitively, introduce more peaks (25,38); 2) system states may differ primarily in kinetics but not FRET efficiency (40); 3) detector properties and background may introduce additional features in the histograms.

To address the model selection problem, overfitting penalization criteria (such as the Bayesian information criterion or BIC) (23,44) or variational Bayesian (24) approaches have been employed.

Often, these model selection methods assume implicit properties of the system. For example, the BIC requires the assumption of weak independence between measurements (i.e., ideally independent identically distributed measurements and thus no Markov kinetics in state space) and a unique likelihood maximum, both of which are violated in smFRET data (24). Furthermore, BIC and other such methods provide point estimates rather than full probabilities over system states ignoring uncertainty from items 1–4 propagated over models (45).

As such, we need to learn distributions over system states and kinetics warranted by the data and whose breadth is dictated by the sources of uncertainty discussed above. More specifically, to address model selection and build joint distributions over system states and their kinetics, we treat the number of system states as a random variable just as the current community treats smFRET kinetic rates as random variables (25,40,41). Our objective is therefore to obtain distributions over all unknowns (including system states and kinetics) while accounting for items 1–4. Furthermore, this must be achieved in a computationally efficient way avoiding, altogether, the draconian assumptions of existing in single-photon analysis methods. In other words, we want to do more (by learning joint distributions over the number of system states alongside everything else) and we want it to cost less.

If we insist on learning distributions over unknowns, then it is convenient to operate within a Bayesian paradigm. Also, if the model (i.e., the number of system states) is unknown, then we must further generalize to the Bayesian nonparametric (BNP) paradigm (25,41,46,47,48,49,50,51,52,53). BNPs directly address the model selection problem concurrently and self-consistently while learning the associated model’s parameters and output full distributions over the number of system states and the other parameters.

In this series of three companion articles, we present a complete description of single-photon smFRET analysis within the BNP paradigm addressing noise sources discussed above (items 1–4). In addition, we develop specialized computational schemes for both continuous and pulsed illumination for it to “cost less.”

Indeed, mitigating computational cost becomes critical, especially with the added complexity of working within the BNP paradigm. This, in itself, warrants a detailed treatment of continuous and pulsed illumination analyses in two companion articles.

To complement this theoretical framework, we also provide to the community a suite of programs called BNP-FRET written in the compiled language Julia for high performance. These freely available programs allow for comprehensive analysis of single-photon smFRET time traces on immobilized molecules obtained with a wide variety of experimental setups.

In what follows, we first present a forward model. Next, we build an inverse strategy to learn full posteriors within the BNP paradigm. Finally, multiple examples are presented by applying the method to simulated data sets across different parameter regimes. Experimental data are treated in the two subsequent companion articles (54,55).

Forward model

Conventions

To be consistent throughout our three-part article, we precisely define some terms as follows.

  • 1.

    a macromolecular complex under study is always referred to as a system,

  • 2.

    the configurations through which a system transitions are termed system states, typically labeled using σ,

  • 3.

    FRET dyes undergo quantum mechanical transitions between photophysical states, typically labeled using ψ,

  • 4.

    a system-FRET combination is always referred to as a composite,

  • 5.

    a composite undergoes transitions among its superstates, typically labeled using ϕ,

  • 6.

    all transition rates are typically labeled using λ,

  • 7.

    the symbol N is generally used to represent the total number of discretized time windows, typically labeled with n, and

  • 8.

    the symbol wn is generally used to represent the observations in the n-th time window.

smFRET data

Here, we briefly describe the data collected from typical smFRET experiments analyzed by BNP-FRET. In such experiments, donor and acceptor dyes labeling a system can be excited using either continuous illumination or pulsed illumination, where short laser pulses arrive at regular time intervals. Moreover, acceptors can also be excited by nonradiative transfer of energy from an excited donor to a nearby acceptor. Upon relaxation, both donor and acceptor can emit photons collected by single-photon detectors. These detectors record the set of photon arrival times and detection channels. We denote the arrival times by

{Tstart,T1,T2,T3,,TK,Tend},

and detection channels with

{c1,c2,c3,,cK},

for a total number of K photons. In the equations above, Tstart and Tend are experiment’s start and end times. Further, we emphasize here that the strategy used to index the detected photons above is independent of the illumination setup used.

Throughout the experiment, photon detection rates from the donor and acceptor dyes vary as the distance between them changes, due to the system kinetics. In cases where the distances form an approximately finite set, we treat the system as exploring a discrete system state space. The acquired FRET traces can then be analyzed to estimate the transition rates between these system states assuming a known model (i.e., known number of system states). We will lift this assumption of knowing the model a priori in the section “nonparametrics: predicting the number of system states.”

Cases where the system state space is continuous fall outside the scope of the current work and require extensions of (56) and (57) currently in progress.

In the following subsections, we present a physical model (forward model) describing the evolution of an immobilized system labeled with a FRET pair. We use this model to derive, step-by-step, the collected data’s likelihood given a choice of model parameters. Furthermore, given the mathematical nature of what is to follow, we will accompany major parts of our derivations with a pedagogical example of a molecule labeled with a FRET pair undergoing transitions between just two system states to demonstrate each new concept in example boxes.

Likelihood

To derive the likelihood, we begin by considering the stochastic evolution of an idealized system, transitioning through a discrete set of total Mσ system states, {σ1,,σMσ}, labeled with a FRET pair having Mψ discrete photophysical states, {ψ1,,ψMψ}, representing the fluorophores in their ground, excited, triplet, blinking, photobleached, or other quantum mechanical states. The combined system-FRET composite now undergoes transitions between Mϕ=Mσ×Mψ superstates, {ϕ1,,ϕMϕ}, corresponding to all possible ordered pairs (σj,ψk) of the system and photophysical states. To be precise, we define ϕi(σj,ψk), where i=(j1)Mψ+k.

Assuming Markovianity (memorylessness) of transitions among superstates, the probability of finding the composite in a specific superstate at a given instant evolves according to the master equation (40).

dρ(t)dt=ρ(t)G, (1)

where the row vector ρ(t) of length Mϕ has elements coinciding with probabilities for finding the system-FRET composite in a given superstate at time t. More explicitly, defining the photophysical portion of the probability vector ρ(t) corresponding to system state σi as

ρσi(t)=[ρσi,ψ1(t)ρσi,ψ2(t)ρσi,ψMψ(t)],

we can write ρ(t) as

ρ(t)=[ρσ1(t)ρσ2(t)ρσMσ(t)].

Furthermore, in the master equation above, G is the generator matrix of size Mϕ×Mϕ populated by all transition rates λϕiϕj between superstates.

Each diagonal element of the generator matrix G corresponds to self-transitions and is equal to the negative sum of the remaining transition rates within the corresponding row. That is, λϕiϕi=jiλϕiϕj. This results in zero row-sums, assuring that ρ(t) remains normalized at all times as described later in more detail (see Eq. 5). Furthermore, for simplicity, we assume no simultaneous transitions among system states and photophysical states as such events are rare (although the incorporation of these events in the model may be accommodated by expanding the superstate space). This assumption results in λ(ψi,σj)(ψl,σm)=0 for simultaneous il and lm, which allows us to simplify the notation further. That is, λ(ψi,σj)(ψi,σk)λσjσk (for any i) and λ(ψi,σj)(ψk,σj)λσj,ψiψk (for any j). This leads to the following form for the generator matrix containing blocks of exclusively photophysical and exclusively system transition rates, respectively

G=[Gσ1ψj1λσ1σjIλσ1σ2Iλσ1σMσIλσ2σ1IGσ2ψj2λσ2σjIλσ2σMσIλσMσσ1IλσMσσ2IGσMσψjMσλσMσσjI], (2)

where the matrices on the diagonal Gσiψ are the photophysical parts of the generator matrix for a system found in the σi system state. In addition, I is the identity matrix of size Mψ.

For later convenience, we also organize the system transition rates λσiσj in Eq. 2 as a matrix

Gσ=[λσ1σ2λσ1σ3λσ1σMσλσ2σ1λσ2σ3λσ2σMσλσ3σ1λσ3σ2λσ3σMσλσMσσ1λσMσσ2λσMσσ3], (3)

which we call system generator matrix.

Moreover, the explicit forms of Gσiψ in Eq. 2 depend on the photophysical transitions allowed in the model. For instance, if the FRET pair is allowed to go from its ground photophysical state (ψ1) to the excited donor (ψ2) or excited acceptor (ψ3) states only, the matrix is given as

Gσiψ=[λσi,ψ1ψ2λσi,ψ1ψ3λσi,ψ2ψ1λσi,ψ2ψ3λσi,ψ3ψ10]=[λexλdirectλdλσiFRETλa0], (4)

where the along the diagonal represents the negative row-sum of the remaining elements, λex is the excitation rate, λd and λa are the donor and acceptor relaxation rates, respectively, and λdirect is direct excitation of the acceptor by a laser, and λσiFRET is the donor to acceptor FRET transition rate when the system is in its i-th system state. We note that only FRET transitions depend on the system states (identified by dye-dye separations) and correspond to FRET efficiencies given by

ϵσiFRET=λσiFRETλσiFRET+λd,

where the ratio on the right hand side represents the fraction of FRET transitions among all competing transitions out of an excited donor, that is, the fraction of emitted acceptor photons among total emitted photons.

With the generator matrix at hand, we now look for solutions to the master equation of Eq. 1. Due to its linearity, the master equation accommodates the following analytical solution:

ρ(t)=ρ(t0)exp((tt0)G)ρ(t0)Π(tt0), (5)

illustrating how the probability vector ρ(t) arises from the propagation of the initial probability vector at time t0 by the exponential of the generator matrix (the propagator matrix Π(tt0)). The exponential maps the transition rates λϕiϕj in the generator matrix to their corresponding transition probabilities πϕiϕj populating the propagator matrix. The zero row-sums of the generator matrix guarantee that the resulting propagator matrix is stochastic (i.e., has rows of probabilities that sum to unity, jπϕiϕj=1).

Example I: State space and generator matrix.

For a molecule undergoing transitions between its two conformations, we have Mσ=2 system states given as {σ1,σ2}. The photophysical states of the FRET pair labeling this molecule are defined according to whether the donor or acceptor are excited. Denoting the ground state by G and excited state by E, we can write all photophysical states of the FRET pair as {ψ1=(G,G),ψ2=(E,G),ψ3=(G,E)}, where the first element in the ordered pair represents the donor state. Furthermore, here, we assume no simultaneous excitation of the donor and acceptor owing to its rarity.

Next, we construct the superstate space with Mφ=6 ordered pairs {φ1=(ψ1,σ1),φ2=(ψ2,σ1),φ3=(ψ3,σ1),φ4=(ψ1,σ2),φ5=(ψ2,σ2),φ6=(ψ3,σ2)}. Finally, the full generator matrix for this setup reads

G=[Gσ1ψλσ1σ2Iλσ1σ2Iλσ2σ1IGσ2ψλσ2σ1I]=[λexλdirectλσ1σ200λdλσ1FRET0λσ1σ20λa000λσ1σ2λσ2σ100λexλdirect0λσ2σ10λdλσ2FRET00λσ2σ1λa0].

Both here, and in similar example boxes that follow, we choose values for rates commonly encountered in experiments (17). We consider a laser exciting a donor at rate λex=10ms1. Next, we suppose that the molecule switches between system states σ1 and σ2 at rates λσ1σ2=2.0ms1 and λσ2σ1=1ms1.

Furthermore, assuming typical lifetimes of 3.6 and 3.5 ns for the donor and acceptor dyes (17), their relaxation rates are, respectively, λd=1/3.6 ns−1 and λa=1/3.5 ns−1. We also assume that there is no direct excitation of the acceptor and thus λdirect=0. Next, we choose FRET efficiencies of 0.2 and 0.9 for the two system states resulting in λσ1FRET=λd/4=0.06 ns−1 and λσ2FRET=9λd=2.43 ns−1.

Finally, these values lead to the following generator matrix (in ms1 units)

G=[1210.002.00.00.0277000347002700000.02.00.02850000.02850020.00.02.01.00.00.01110.000.01.00.0277000277700125000000.00.01.02850000.0285001].

After describing the generator matrix and deriving the solution to the master equation, we continue by explaining how to incorporate observations into a likelihood.

In the absence of observations, any transition among the set of superstates are unconstrained. However, when monitoring the system using suitable detectors, observations rule out specific transitions at the observation time. For example, ignoring background for now, the detection of a photon from a FRET label identifies a transition from an excited photophysical state to a lower energy photophysical state of that label. On the other hand, no photon detected during a time period indicates the absence of radiative transitions or the failure of detectors to register such transition. Consequently, even periods without photon detections are informative in the presence of a detector. In other words, observations from a single-photon smFRET experiment are continuous in that they are defined at every point in time.

In addition, since smFRET traces report radiative transitions of the FRET labels at photon arrival times, uncertainty remains about the occurrences of unmonitored transitions (e.g., between system states). Put differently, smFRET traces (observations) only partially specify superstates at any given time.

Now, to compute the likelihood for such smFRET traces, we must sum those probabilities over all trajectories across superstates (superstate trajectories) consistent with a given set of observations. Assuming the system ends in superstate ϕi at Tend, this sum over all possible trajectories can be very generally given by the element of the propagated vector ρ(Tend) corresponding to superstate ϕi. Therefore, a general likelihood may be written as

L=p(ϕi)=[ρ(Tend)]i. (6)

However, as the final superstate at time Tend is usually unknown, we must therefore marginalize (sum) over the final superstate to obtain the following likelihood

L=i=1Mϕp(ϕi)=ρ(Tend)ρnormT, (7)

where all elements of the vector ρnorm are set to 1 as a means to sum the probabilities in vector ρ(Tend). In the following sections, we describe how to obtain concrete forms for these general likelihoods.

Absence of observations

For pedagogical reasons, it is helpful to first look at the trivial case where a system-FRET composite evolves but no observations are made (due to a lack, say, of detection channels). In this case, all allowed superstate trajectories are possible between the start time of the experiment, Tstart, and end, Tend. This is because the superstate cannot be specified or otherwise restricted at any given time by observations previously explained. Consequently, the probability vector ρ(t) remains normalized throughout the experiment as no superstate trajectory is excluded. As such, the likelihood is given by summing over probabilities associated to the entire set of trajectories, that is,

L=p((T1,e1),,(TK,eK)|G)=ρ(Tstart)Π(TendTstart)ρnormT=ρ(Tend)ρnormT=1, (8)

where {e1,,eK} are the emission times of all emitted photons, not recorded due to lack of detection channels and thus not appearing on the right hand side of the expression.

In what follows, we describe how the probability vector ρ(t) does not remain normalized as it evolves to ρ(Tend) when detectors partially collapse knowledge of the occupied superstate during the experiment. This results in a likelihood smaller than one. We do so for the conceptually simpler case of continuous illumination for now.

Introducing observations

To compute the likelihood when single-photon detectors are present, we start by defining a measurement model where the observation at a given time is dictated by ongoing transitions and detector features (e.g., crosstalk, detector efficiency). As we will see in more detail later, if we describe the evolution of a system by defining its states at discrete time points and these states are not directly observed, and thus hidden, then this measurement model adopts the form of a HMM. Here, Markovianity arises when a given hidden state only depends on its immediate preceding hidden state. In such HMMs, an observation at a given time is directly derived from the concurrent hidden state.

As an example of an HMM, for binned smFRET traces, an observation is often approximated to depend only on the current hidden state. However, contrary to such a naive HMM, an observation in a single-photon setup in a given time period depends on the current superstate and the immediate previous superstate. This naturally enforces a second-order structure on the HMM where each observed random variable depends on two superstates, as we demonstrate shortly. A similar HMM structure was noted previously to model a fluorophore’s photo-switching behavior in (58).

Now, to address this observation model, we first divide the experiment’s time duration into N windows of equal size, ϵ=(TendTstart)/N. We will eventually take the continuum limit ϵ0 to recover the original system as described by the master equation. We also sum over all possible transitions between superstates within each window. These windows are marked by the times (see Fig. 2 a)

{t0,t1,t2,,tN},

where the n-th window is given by (tn1,tn) with t0=Tstart and tN=Tend. Corresponding to each time window, we have observations

w={w1,w2,w3,,wN},

where wn= if no photons are detected and wn={(Tn(1),cn(1)),(Tn(2),cn(2)),} otherwise, with the j-th photon in a window being recorded by the channel cn(j) at time Tn(j). Note here that observations in a time window, being a continuous quantity, allow for multiple photon arrivals or none at all.

Figure 2.

Figure 2

Graphical models depicting the random variables and parameters involved in the generation of photon arrival data for smFRET experiments. Circles shaded in blue represent parameters of interest we wish to deduce, namely transition rates and probabilities. The circles shaded in gray correspond to observations. The unshaded circles represent the superstates. The arrows reflect conditional dependence among these variables and colored dots represent photon arrivals. Going from (a) to (b), we convert the original HMM with a second-order structure to a naive HMM where each observation only depends on one state.

As mentioned earlier, each of these observations originate from the evolution of the superstate. Therefore, we define superstates occupied at the beginning of each window as

{a1,a2,a3,,aN1,aN,aN+1},

where an is the superstate at the beginning of the n-th time window as shown in Fig. 2 a. The framework described here can be employed to compute the likelihood. However, the second-order structure of the HMM leads to complications in these calculations. In the rest of this section, we first illustrate the mentioned complication using a simple example and then describe a solution to this issue.

Example II: Naive likelihood computation.

Here, we calculate the likelihood for our two-state system described earlier. For simplicity alone, we attempt the likelihood calculation for a time period spanning the first two time windows (N=2) in Fig. 2 a. Within this period the system-FRET composite evolves from superstate a1 to a3 giving rise to observations w1:2. The likelihood for such a setup is typically obtained using a recursive strategy by marginalizing over superstates a1:3 (summing over all possible superstate trajectories)

L=p(w1:2|G,ρstart)=a1:3p(w1:2,a1:3|G,ρstart)=a1:3p(w2|a1:3,w1G,ρstart)p(w1,a1:3|G,ρstart)=a1:3p(w2|a2:3,G)p(w1|a1:2,G)p(a1:3|G,ρstart).

Here, we have applied the chain rule of probabilities in each step. Moreover, in the last step, we have only retained the parameters that are directly connected to the random variable on the left in each term, as shown by arrows in Fig. 2 a.

Now, for our two system state example, an can be any of the six superstates φ1:6 (Mφ=6) given earlier. As such, the sum above contains MφN+1=63 terms for such a simple example. For a large number of time windows, computing this sum becomes prohibitively expensive. Therefore, it is common to use a recursive approach to find the likelihood, only requiring Mφ2(N+1) operations, as we describe in the next section. However, due to our HMM’s second-order structure, the two first terms (involving observations) in the above sum are conditioned on a mutual superstate a2, which forbids recursive calculations.

After describing the issue in computing the likelihood due to the second-order structure of our HMM, we now describe a solution to this problem. As such, to simplify the likelihood calculation, we temporarily introduce superstates bn at the end of n-th window separated from superstate an+1 at the beginning of (n+1)-th window by a short time τ as shown in Fig. 2 b during which no observations are recorded (inactive detectors). This procedure allows us to conveniently remove dependency of consecutive observations on a mutual superstate. That is, consecutive observations wn and wn+1 now do not depend on a common superstate an+1, but rather on separated (an,bn) pairs; see Fig. 2 b. The sequence of superstates now looks like (see Fig. 2 b)

{a1,b1,a2,b2,a3,b3,,aN1,bN1,aN,bN,aN+1}, (9)

which now permits a recursive strategy for likelihood calculation as described in the next section. Furthermore, we will eventually take the τ0 limit to obtain the likelihood of the original HMM with the second-order structure.

Recursion formulas

We now have the means to compute the terminal probability vector ρend=ρ(Tend) by evolving the initial vector ρstart=ρ(Tstart). This is most conveniently achieved by recursively marginalizing (summing) over all superstates in Eq. 9 backward in time, starting from the last superstate aN+1 as follows

L=p(w1:N|G,ρstart)=aN+1p(w1:N,aN+1|G,ρstart)=aN+1AN+1(aN+1)=AN+1ρnormT, (10)

where AN+1(aN+1) are elements of the vector AN+1 of length Mϕ, commonly known as a filter (59). Moving backward in time, the filter at the beginning of the n-th time window, An(an+1), is related to the filter at the end of the n-th window, Bn(bn), due to Markovianity, as follows

An+1(an+1)=p(w1:n,an+1|G,ρstart)=bnp(an+1|bn,G)Bn(bn), (11)

or in matrix notation as

An+1=BnΠ˜n,

where p(an+1|bn) are the elements of the transition probability matrix Π˜n described in the next section. Again due to Markovianity, the filter at the end of the n-th window, Bn(bn), is related to the filter at the beginning of the same time window, An(an), as

Bn(bn)=p(w1:n1,bn|G,ρstart)=anp(wn|an,bn,G)p(bn|an,G)An(an), (12)

or in matrix notation as

Bn=AnΠn(r),

where the terms p(wn|an,bn,G)p(bn|an,G) populate the transition probability matrix Πn(r) described in the next section. Here, we use the superscript (r) to denote that elements of this matrix include observation probabilities. We note here that the last filter in the recursion formula, A1, is equal to starting probability vector ρstart itself.

Reduced propagators

To derive the different terms in the recursive filter formulas, we first note that the transition probabilities p(an|bn1,G) and p(bn|an,G) do not involve observations. As such, we can use the full propagator as follows

p(bn|an,G)=(Π)anbn=(exp((ϵτ)G))anbn

and

p(an|bn1,G)=(Π˜)bn1an=(exp(τG))bn1an,

respectively. On the other hand, the term p(wn|an,bn,G) includes observations that result in modification to the propagator by ruling out a subset of transitions. For instance, observation of a photon momentarily eliminates all nonradiative transitions. The modifications now required can be structured into a matrix Dn of the same size as the propagator with elements (Dn)anbn=p(wn|an,bn,G). We term all such matrices detection matrices. The product p(wn|an,bn,G)p(bn|an,G) in Eq. 12 can now be written as

(Πn(r))anbn=(Π)anbn×(Dn)anbn,

relating the modified propagator (termed reduced propagator and distinguished by the superscript (r) hereafter) (Πn(r))anbn in the presence of observations to the full propagator (no observations). Plugging in the matrices introduced above into the recursive filter formulas (Eqs. 11 and 12), we obtain in matrix notation

An+1=BnΠ˜=Bnexp((ϵτ)G)Bn=AnΠn(r)=An(exp(ϵG)Dn), (13)

where the symbol represents element-by-element product of matrices. Here, however, the detection matrices cannot yet be computed analytically as the observations wn allow for an arbitrary number of transitions within the finite time window (tn1,tn). However, they become manageable in the limit that the time windows become vanishingly small, as we demonstrate later.

Likelihood for the HMM with second-order structure

Now, inserting the matrix expressions for filters of Eq. 13 into the recursive formula likelihood Eq. 10, we arrive at

L=aN+1ρstartΠ1(r)Π˜Π2(r)Π˜Π3(r)Π˜ΠN1(r)Π˜ΠN(r)=ρstartΠ1(r)Π˜Π2(r)Π˜Π3(r)Π˜ΠN1(r)Π˜ΠN(r)ρnormT=ρ(Tend)ρnormT,

where, in the second step, we added a row vector of ones, ρnorm at the end to sum over all elements. Here, the superscript T denotes matrix transpose. As we now see, the structure of the likelihood above amounts to propagation of the initial probability vector ρstart to the final probability vector ρ(Tend) via multiple propagators corresponding to N time windows.

Now, under the limit τ0, we have

Π˜=limτ0exp(τG)=I,Π=limτ0exp((ϵτ)G)=exp(ϵG),

where I is the identity matrix. In this limit, we recover the likelihood for the HMM with a second-order structure as

L=ρstartΠ1(r)Π2(r)Π3(r)ΠN1(r)ΠN(r)ρnormT. (14)

We note here that the final probability vector ρ(Tend) is not normalized to one upon propagation due to the presence of reduced propagators corresponding to observations. More precisely, the reduced propagators restrict the superstates evolution to only a subset of trajectories over a time window ϵ in agreement with the observation over this window. This, in turn, results in a probability vector whose elements sum to less than one. That is,

ρstartΠ1(r)Π2(r)Π3(r)ΠN1(r)ΠN(r)ρnormT=ρ(Tend)ρnorm<1. (15)

Continuum limit

Up until now, the finite size of the time window ϵ allowed for an arbitrary number of transitions per time window (tn1,tn), which hinders the computation of an exact form for the detection matrices. Here, we take the continuum limit, as the time windows become vanishingly small (that is, ϵ0 as N). Thus, no more than one transition is permitted per window. This allows us to fully specify the detection matrices Dn.

To derive the detection matrices, we first assume ideal detectors with 100% efficiency and include detector effects in the subsequent sections (see the “detection effects” section). In such cases, the absence of photon detections during a time window, while detectors are active, indicates that only nonradiative transitions took place. Thus, only nonradiative transitions have nonzero probabilities in the detection matrices. As such, for evolution from superstate an to bn, the elements of the nonradiative detection matrix, Dnon, are given by

(Dnon)anbn={1Nonradiativetransitions0Radiativetransitions. (16)

On the other hand, when the k-th photon is recorded in a time window, only elements corresponding to radiative transitions are nonzero in the detection matrix denoted by Dkrad as

(Dkrad)anbn={0Alltransitionsexceptforthek-thphotonemission1k-thphotonemission. (17)

Here, we note that the radiative detection matrices have zeros along their diagonals, since self-transitions are nonradiative.

We can now define the reduced propagators corresponding to the nonradiative and radiative detection matrices, Dnon and Dkrad, using the Taylor approximation limϵ0Πn=I+ϵG+O(ϵ2) as

Π(r)non=(I+ϵG+O(ϵ2))Dnon=exp(ϵGnon)+O(ϵ2),and (18)
Πk(r)rad=(I+ϵG+O(ϵ2))Dkrad=ϵGkrad+O(ϵ2). (19)

In the equations above, Gnon=GDnon and Gkrad=GDkrad, where the symbol represents an element-by-element product of the matrices. Furthermore, the product between the identity matrix and Dkrad above vanishes in the radiative propagator due to zeros along the diagonals of Dkrad.

Example III: Detection matrices.

For our example with two system states described earlier, the detection matrices of Eqs. 16 and 17 take simple forms. The radiative detection matrix has the same size as the generator matrix with nonzero elements wherever there is a rate associated to a radiative transition

Dd/arad=[0000001/0000000/1000000000000001/0000000/100],

where the subscripts d and a, respectively, denote photon detection in donor and acceptor channels. Similarly, the nonradiative detection matrix is obtained by setting all elements of the generator matrix related to radiative transitions to zero and the remaining to one as

Dnon=[111111011111011111111111111011111011].

Final likelihood

With the asymptotic forms of the reduced propagators in Eq. 14 now defined in the last subsection, we have all the ingredients needed to arrive at the final form of the likelihood.

To do so, we begin by considering the period right after the detection of the (k1)-th photon until the detection of the k-th photon. For this time period, the nonradiative propagators in Eq. 14 can now be easily merged into a single propagator Πknon=exp((TkTk1)Gnon), as the commutative arguments of the exponentials can be readily added. Furthermore, at the end of this interphoton period, the radiative propagator Πk(r)rad marks the arrival of the k-th photon. The product of these two propagators

ΠknonΠk(r)rad=ϵΠknonGkrad+O(ϵ2)=ϵexp((TkTk1)Gnon)Gkrad+O(ϵ2), (20)

now governs the stochastic evolution of the system-FRET composite during that interphoton period.

Inserting Eq. 20 for each interphoton period into the likelihood for the HMM with second-order structure in Eq. 14, we finally arrive at our desired likelihood

L=ϵKρstartΠ1nonG1radΠ2nonG2radΠK1nonGK1radΠKnonGKradΠendnonρnormT+O(ϵK+1). (21)

This likelihood has the same structure as shown by Gopich and Szabo in (40).

Example IV: Propagator and likelihood.

Here, we consider a simple FRET trace where two photons are detected at times 0.05 and 0.15 ms in the donor and acceptor channels, respectively. To demonstrate the ideas developed so far, we calculate the likelihood of these observations as (see Eq. 21)

L=ϵ2ρstartΠ1nonG1radΠ2nonG2radρnormT.

To do so, we first need to calculate Π1non using the nonradiative detection (Dnon) and generator (G) matrices found in the previous example boxes

Π1non=exp(0.05(GDnon))=[0.55000.06000000000000000.03000.5800000000000000],

and similarly

Π2non=exp((0.150.05)(GDnon))=[0.30000.06000000000000000.03000.3300000000000000].

Next, we proceed to calculate G1rad and G2rad. Remembering that the first photon was detected in the donor channel, we have (in ms1 units)

G1rad=GDdrad=[0000002770000000000000000000000027700000000000].

Similarly, since the second photon was detected in the acceptor channel, we can write (in ms1 units)

G2rad=GDarad=[0000000000002850000000000000000000000028500000].

We also assume that the system is initially in the superstate φ1 giving ρstart=[1,0,0,0,0,0]. Finally, putting everything together, we can find the likelihood as L=3.06ϵ2 where ϵ is a constant and does not contribute to parameter estimations, as we show later.

Effect of binning single-photon smFRET data

When considering binned FRET data, the time period of an experiment (TendTstart) is typically divided into a finite number (=N) of equally sized (=ϵ) time windows (bins), and the photon counts (intensities) in each bin are recorded in the detection channels. This is in contrast to single-photon analysis where individual photon arrival times are recorded. To arrive at the likelihood for such binned data, we start with the single-photon likelihood derived in Eq. 15 where ϵ is not infinitesimally small, that is,

L=ρstartΠ1(r)Π2(r)Π3(r)ΠN1(r)ΠN(r)ρnormT, (22)

where

(Πn(r))anan+1=(Π)anan+1×(Dn)anan+1,

or in the matrix notation

Πn(r)=ΠDn=exp(ϵG)Dn, (23)

where Dn is the detection matrix introduced in the “reduced propagators” section.

Next, we must sum over all superstate trajectories that may give rise to the recorded photon counts (observations) in each bin. However, such a sum is challenging to compute analytically and has been attempted in (38). Here, we only show likelihood computation under commonly applied approximations/assumptions when analyzing binned smFRET data, which are: 1) bin size ϵ is much smaller than typical times spent in a system state or, in other words, for a system transition rate λσiσj, we have ϵλσiσj1; and 2) excitation rate λex is much slower than dye relaxation and FRET rates, or in other words, interphoton periods are much larger than the excited state lifetimes.

The first assumption is based on realistic situations where system kinetics (at seconds timescale) are many orders of magnitude slower than the photophysical transitions (at nanoseconds timescale). This timescale separation allows us to simplify the propagator calculation in Eq. 23. To see that, we first separate the system transition rates from photophysical transition rates in the generator matrix as

G=GσI+Gψ, (24)

where denotes a tensor product, Gσ is the portion of generator matrix G containing only system transition rates previously defined in Eq. 3, and Gψ is the portion containing only photophysical transition rates, that is,

GσI=[j1λσ1σjIλσ1σ2Iλσ1σMσIλσ2σ1Ij2λσ2σjIλσ2σMσIλσMσσ1IλσMσσ2IjMσλσMσσjI], (25)

and

Gψ=[Gσ1ψ000Gσ2ψ000GσMσψ], (26)

where Gσiψ is the photophysical generator matrix corresponding to system state σi given in Eq. 4.

Now plugging Eq. 24 into the full propagator Π=exp(ϵG) and applying the famous Zassenhaus formula for matrix exponentials, we get

Π=exp(ϵ(GσI+Gψ))=exp(ϵ(GσI))exp(ϵGψ)exp(ϵ22[GσI,Gψ])exp(O(ε3)) (27)

where the square brackets represent the commutator of the constituting matrices and the last term represents the remaining exponentials involving higher-order commutators. Furthermore, the commutator [GσI,Gψ] results in a very sparse matrix given by

[GσI,Gψ]=[0λσ1σ2(Gσ2ψGσ1ψ)λσ1σMσ(GσMσψGσ1ψ)λσ2σ1(Gσ1ψGσ2ψ)0λσ2σMσ(GσMσψGσ2ψ)λσMσσ1(Gσ1ψGσMσψ)λσMσσ2(Gσ2ψGσMσψ)0], (28)

where

λσiσj(GσjψGσiψ)=λσiσj[0000(λσjFRETλσiFRET)(λσjFRETλσiFRET)000].

Now, the propagator calculation in Eq. 27 simplifies if the commutator ϵ2[GσI,Gψ]0, implying that either the bin size ϵ is very small such that ϵλσiσj1 (our first assumption) or that FRET rates/efficiencies are almost indistinguishable (ϵ(λσjFRETλσiFRET)0). Under such conditions, the system state can be assumed to stay constant during a bin, with system transitions only occurring at the ends of bin periods. Furthermore, the full propagator Π in Eq. 27 can now be approximated as

Π=exp(ϵG)exp(ϵ(GσI))exp(ϵGψ)=(ΠσI)Πψ, (29)

where the last equality follows from the block diagonal form of GσI given in Eq. 25 and Πσ=exp(ϵGσ) is the system transition probability matrix (propagator) given as

Πσ=[πσ1σ1πσ1σ2πσ1σMσπσ2σ1πσ2σ2πσ2σMσπσMσσ1πσMσσ2πσMσσMσ]. (30)

Moreover, Πψ=exp(ϵGψ) is the photophysical transition probability matrix (propagator) as

Πψ=[Πσ1ψ000Πσ2ψ000ΠσMσψ], (31)

where the elements are given as Πσiψ=exp(ϵGσiψ). Furthermore, because of the block diagonal structure of Πψ, the matrix multiplication in Eq. 29 results in

Π=[πσ1σ1Πσ1ψπσ1σ2Πσ1ψπσ1σMσΠσ1ψπσ2σ1Πσ2ψπσ2σ2Πσ2ψπσ2σMσΠσ2ψπσMσσ1ΠσMσψπσMσσ2ΠσMσψπσMσσMσΠσMσψ]. (32)

After deriving the full propagator Π for the time period ϵ (bin) under our first assumption, we now proceed to incorporate observations during this period via detection matrices Dn to compute the reduced propagator of Eq. 23. To do so, we now apply our second assumption of relatively slower excitation rate λex. This assumption implies that interphoton periods are dominated by the time spent in the ground state of the FRET pair and are distributed according to a single exponential distribution, Exponential(λex). Consequently, the total photon counts per bin follow a Poisson distribution, Poisson(ϵλex), independent of the photophysical portion of the photophysical trajectory taken from superstate an to an+1.

Now, the first and the second assumptions imply that the observation during the n-th bin only depends on the system state sn (or the associated FRET rate λsnFRET). As such we can approximate the detection matrix elements as

(Dn)anan+1=p(wn|an,an+1)p(wn|sn). (33)

Using these approximations, the reduced propagator in Eq. 23 can now be written as

Πn(r)=ΠDn[πσ1σ1p(wn|sn=σ1)Πσ1ψπσ1σMσp(wn|sn=σ1)Πσ1ψπσ2σ1p(wn|sn=σ2)Πσ2ψπσ2σMσp(wn|sn=σ2)Πσ2ψπσMσσ1p(wn|sn=σMσ)ΠσMσψπσMσσMσp(wn|sn=σMσ)ΠσMσψ]. (34)

Next, to compute the likelihood for the n-th bin, we need to sum over all possible superstate trajectories within this bin as

Ln=ρnΠn(r)ρnormTi=1Mσj=1Mσp(wn|sn=σi)πσiσj(ρσi,nΠσiψρnormT), (35)

where ρn is a normalized row vector populated by probabilities of finding the system-FRET composite in the possible superstates at the beginning of the n-th bin. Furthermore, we have written portions of ρn corresponding to system state σi as ρσi,n. To be more explicit, we have

ρn=[ρσ1,nρσ2,nρσMσ,n],

following the convention in the “likelihood” section and using n to now represent time tn.

Moreover, since each row of Πσiψ=exp(ϵGσiψ) sums to one, we have ΠσiψρnormT=ρnormT, which simplifies the bin likelihood of Eq. 35 to

Lni=1Mσj=1Mσp(wn|sn=σi)πσiσj(ρσi,nρnormT)=i=1Mσj=1Mσp(wn|sn=σi)πσiσjρσi,n, (36)

where we have defined ρσi,nρσi,nρnormT=jρσi,ψj as the probability of the system to occupy system state σi. We can also write the previous equation in the matrix form as

Lnρnσ(ΠσDnσ)ρnormT, (37)

where ρnσ is a row vector of length Mσ (number of system states) populated by ρσi,n for each system state, and Dnσ, in the same spirit as Dn, is a detection matrix of dimensions Mσ×Mσ populated by observation probability p(wn|sn=σi) in each row corresponding to system state σi. Furthermore, defining Πnσ(ΠσDnσ), we note here that Πnσ propagates probabilities during the n-th bin in a similar manner as the reduced propagators Πn(r) of Eq. 23.

Therefore, we can now multiply these new propagators for each bin to approximate the likelihood of Eq. 22 as

LρstartσΠ1σΠ2σΠ3σΠNσρnormT, (38)

where ρstartσ is a row vector, similar to ρnσ, populated by probabilities of being in a given system state at the beginning of an experiment.

To conclude, our two assumptions regarding system kinetics and excitation rate allow us to significantly reduce the dimensions of the propagators. This, in turn, leads to much lowered expense for likelihood computation. However, cheaper computation comes at the expense of requiring a large number of photon detections or excitation rates per bin to accurately determine FRET efficiencies (identify system states) since we marginalize over photophysics in each bin. Such high excitation rates lead to faster photobleaching and increased phototoxicity, and thereby much shorter experiment durations. As we will see in the “pulsed illumination” section, this problem can be mitigated by using pulsed illumination, where the likelihood takes a similar form as Eq. 38, but FRET efficiencies can be accurately estimated from the measured microtimes.

Detection effects

In the previous section, we assumed idealized detectors to illustrate basic ideas on detection matrices. However, realistic FRET experiments must typically account for detector nonidealities. For instance, many emitted photons may simply go undetected when the detection efficiency of single-photon detectors, i.e., the probability of an incident photon being successfully registered, is less than one due to inherent nonlinearities associated with the electronics (22) or the use of filters in cases of polarized fluorescent emission (60,61). In addition, donor photons may be detected in the channel reserved for acceptor photons or vice-versa due to emission spectrum overlap (62). This phenomenon, commonly known as crosstalk, crossover, or bleedthrough, can significantly affect the determination of quantities, such as transition rates and FRET efficiencies, as we demonstrate later in our results. Other effects adding noise to fluorescence signals include dark current (false signal in the absence of incident photons), dead time (the time a detector takes to relax back into its active mode after a photon detection), and timing jitter or IRF (stochastic delay in the output signal after a detector receives a photon) (22). In this section, we describe the incorporation of all such effects into our model except dark current and background emissions, which require more careful treatment and will be discussed in the “background emissions” section.

Crosstalk and detection efficiency

Noise sources such as crosstalk and detection efficiency necessarily result in photon detection being treated as a stochastic process. Both crosstalk and detection efficiency can be included into the propagators in both cases by substituting the zeros and ones, appearing in the ideal radiative and nonradiative detection matrices (Eqs. 16 and 17), with probabilities between zero and one. In such a way, the resulting propagators obtained from these detection matrices, in turn, incorporate into the likelihood the effects of crosstalk and detection efficiency into the model.

Here, in the presence of crosstalk, for clarity, we add a superscript to the radiative detection matrix of Eq. 17 for the k-th photon, Dkradct. The elements of this detection matrix for the anbn transition, when a photon intended for channel j is registered in channel i reads

(Dkradct)anbn={0NonradiativetransitionsϕjiRadiativetransitions

where ϕji is the probability for this event (upon transition from superstate an to bn). Further, detector efficiencies can also be accounted for in these probabilities to represent the combined effects of crosstalk, arising from spectral overlap, and absence of detection channel registration. When we do so, we recover iϕji1 (for cases where i and j can be both the same or different), as not all emitted photons can be accounted for by the detection channels.

This new detection matrix above results in the following modification to the radiative propagator of Eq. 19 for the k-th photon

Πk(r)radct=(I+ϵG+O(ϵ2))Dkradct=ϵGkradct+O(ϵ2).

The second equality above follows by recognizing that the identity matrix multiplied, element-wise, by Dkradct is zero. By definition, Gkradct is the remaining nonzero product.

On the other hand, for time periods when no photons are detected, the nonradiative detection matrices in Eq. 16 become

(Dn)anbn=(Dnonct)anbn={1Nonradiativetransitions1jϕijRadiativetransitions,

where the sum gives the probability of the photon intended for channel i to be registered in any channel. The nonradiative propagator of Eq. 18 for an infinitesimal period of size ϵ in the presence of crosstalk and inefficient detectors is now

Π(r)nonct=(I+ϵG+O(ϵ2))Dnonct=exp(ϵGnonct)+O(ϵ2), (39)

where Gnonct=GDnonct. With the propagators incorporating crosstalk and detection efficiency now defined, the evolution during an interphoton period between the (k1)-th photon and the k-th photon of size (TkTk1) is now governed by the product

ΠknonctΠk(r)radct=ϵΠknonctGkradct+O(ϵ2), (40)

where the nonradiative propagators in Eq. 39 have now been merged into a single propagator Πknonct=exp((TkTk1)Gnonct) following the same procedure as Eq. 20.

Finally, inserting Eq. 40 for each interphoton period into the likelihood of Eq. 14, we arrive at the final likelihood incorporating crosstalk and detection efficiency as

L=εKρstartΠ1nonctG1radctΠ2nonctG2radctΠK1nonctGK1radct×ΠKnonctGKradctΠendnonctρnormT+O(εK+1).

After incorporating crosstalk and detector efficiencies into our model, we briefly explain the calibration of the crosstalk probabilities/detection efficiencies ϕij. To calibrate these parameters, two samples, one containing only donor dyes and another containing only acceptor dyes, are individually excited with a laser under the same power to determine the number of donor photons ndiraw and number of acceptor photons nairaw detected in channel i.

From photon counts recorded for the donor-only sample, assuming ideal detectors with 100% efficiency, we can compute the crosstalk probabilities for donor photons going to channel i, ϕdi, using the photon count ratios as ϕdi=ndiraw/ndem, where ndem is the absolute number of emitted donor photons. Similarly, crosstalk probabilities for acceptor photons going to channel i, ϕai, can be estimated as ϕai=nairaw/naem, where naem is the absolute number of emitted acceptor photons. In the matrix form, these crosstalk factors for a two-detector setup can be written as

A=[ϕa1ϕd1ϕa2ϕd2]. (41)

Using this matrix, for the donor-only sample, we can now write

[nd1rawnd2raw]=A[0ndem]=[ϕd1ϕd2]ndem, (42)

and similarly for the acceptor only sample

[na1rawna2raw]=A[naem0]=[ϕa1ϕa2]naem. (43)

However, it is difficult to estimate the absolute number of emitted photons ndem and naem experimentally, and therefore the crosstalk factors in A can only be determined up to multiplicative factors of ndem and naem.

Since scaling the photon counts in an smFRET trace by an overall constant does not affect the FRET efficiency estimates (determined by photon count ratios) and escape rates (determined by changes in FRET efficiency), we only require crosstalk factors up to a constant as in the last equation.

For this reason, one possible solution toward determining the matrix elements of A up to one multiplicative constant is to first tune dye concentrations such that the ratio ndem/naem=1, which can be accomplished experimentally. This allows us to write the crosstalk factors in the matrix form up to a constant as follows

A = \begin{bmatrix} \phi_{a1} & \phi_{d1} \\ \phi_{a2} & \phi_{d2} \end{bmatrix} \propto \begin{bmatrix} n_{a1}^{\text{raw}} & n_{d1}^{\text{raw}} \\ n_{a2}^{\text{raw}} & n_{d2}^{\text{raw}} \end{bmatrix}. \qquad (44)

It is common to set the multiplicative factor in Eq. 44 by the total donor photon counts \sum_j n_{dj}^{\text{raw}} to give

A = \begin{bmatrix} \phi_{a1} & \phi_{d1} \\ \phi_{a2} & \phi_{d2} \end{bmatrix} = \begin{bmatrix} \dfrac{n_{a1}^{\text{raw}}}{\sum_j n_{dj}^{\text{raw}}} & \dfrac{n_{d1}^{\text{raw}}}{\sum_j n_{dj}^{\text{raw}}} \\[2ex] \dfrac{n_{a2}^{\text{raw}}}{\sum_j n_{dj}^{\text{raw}}} & \dfrac{n_{d2}^{\text{raw}}}{\sum_j n_{dj}^{\text{raw}}} \end{bmatrix}. \qquad (45)

We note that from the convention adopted here, we have ϕd1+ϕd2=1.

Furthermore, in situations where realistic detectors affect the raw counts, the matrix elements of A as computed above automatically incorporate the effects of detector inefficiencies, including the fact that the detected fractions \sum_j \phi_{ij} need not sum to one.

In addition, the matrix A can be further generalized to account for more than two detectors by appropriately expanding the size of the matrix dimensions to coincide with the number of detectors. Calibration of the matrix elements then follows the same procedure as above.
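As a minimal illustration of this calibration step, the sketch below (in Julia) assembles the matrix of Eq. 45 from raw channel counts. The photon counts are hypothetical (chosen to roughly reproduce the ϕij values of Example V below), and the variable names are ours rather than part of the BNP-FRET package.

# hypothetical raw counts in channels 1 and 2 from the two calibration samples,
# acquired at matched dye concentrations so that n_d^em = n_a^em
na_raw = [8400.0, 0.0]      # acceptor-only sample
nd_raw = [1800.0, 8200.0]   # donor-only sample

# crosstalk/detection-efficiency matrix, normalized by the total donor counts
# following the convention of Eq. 45
A = hcat(na_raw, nd_raw) ./ sum(nd_raw)

# with this convention, the donor column of A sums to one
@assert isapprox(sum(A[:, 2]), 1.0)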

Now, in performing single-photon FRET analysis, we will directly use the elements of A in constructing our measurement matrix. However, it is also common to compute the matrix elements of A from what is termed the route correction matrix (RCM) (63), typically used in binned photon analysis. The RCM is defined as the inverse of A to obtain corrected counts n_d^{em} and n_a^{em} up to a proportionality constant as

\text{RCM} \propto \begin{bmatrix} \phi_{d2} & -\phi_{d1} \\ -\phi_{a2} & \phi_{a1} \end{bmatrix}. \qquad (46)
Example V: Detection matrices with crosstalk and detector efficiencies.

For our example with two system states, we had earlier shown detection matrices for ideal detectors. Here, we incorporate crosstalk and detector efficiencies into these matrices. Moreover, we assume a realistic RCM (64) given as

\text{RCM} \approx \begin{bmatrix} 1.0 & -0.22 \\ 0.0 & 1.02 \end{bmatrix}.

However, following the convention of Eq. 45, we scale the matrix provided by a sum of absolute values of its first row elements, namely 1.22, leading to effective crosstalk factors ϕij given as

\phi_{a1} = 0.84, \quad \phi_{a2} = 0.0, \quad \phi_{d1} = 0.18, \quad \text{and} \quad \phi_{d2} = 0.82.

As such, these values imply approximately 18% crosstalk from the donor to the acceptor channel and an 84% detection efficiency for the acceptor channel with no acceptor-to-donor crosstalk, using the convention adopted in Eq. 45. Now, we modify the ideal radiative detection matrices by replacing their nonzero elements with the calibrated ϕij values above

D_{d/a}^{\text{rad-ct}} = \begin{bmatrix} 0 & 0 & 0 & 0 & 0 & 0 \\ \phi_{d2}/\phi_{d1} & 0 & 0 & 0 & 0 & 0 \\ \phi_{a2}/\phi_{a1} & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & \phi_{d2}/\phi_{d1} & 0 & 0 \\ 0 & 0 & 0 & \phi_{a2}/\phi_{a1} & 0 & 0 \end{bmatrix} = \begin{bmatrix} 0 & 0 & 0 & 0 & 0 & 0 \\ 0.82/0.18 & 0 & 0 & 0 & 0 & 0 \\ 0.0/0.84 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0.82/0.18 & 0 & 0 \\ 0 & 0 & 0 & 0.0/0.84 & 0 & 0 \end{bmatrix}.

Similarly, we modify the ideal nonradiative detection matrix by replacing the zero elements by 1 - \phi_{a1} - \phi_{a2} = 0.16 and 1 - \phi_{d1} - \phi_{d2} = 0 as follows

D^{\text{non-ct}} = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 1 \\ 0 & 1 & 1 & 1 & 1 & 1 \\ 0.16 & 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 0 & 1 & 1 \\ 1 & 1 & 1 & 0.16 & 1 & 1 \end{bmatrix}.
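The detection matrices of this example can also be assembled programmatically once the ϕij values are known. The following Julia sketch assumes the superstate ordering used throughout this example, namely (σ1,ψ1), (σ1,ψ2), (σ1,ψ3), (σ2,ψ1), (σ2,ψ2), (σ2,ψ3); the helper name build_rad_ct is ours.

ϕd1, ϕd2 = 0.18, 0.82          # donor photons detected in channels 1 and 2
ϕa1, ϕa2 = 0.84, 0.0           # acceptor photons detected in channels 1 and 2

donor_rad    = [(2, 1), (5, 4)]    # excited donor -> ground transitions
acceptor_rad = [(3, 1), (6, 4)]    # excited acceptor -> ground transitions

# radiative detection matrix for one channel, given that channel's ϕ values
function build_rad_ct(ϕd, ϕa)
    D = zeros(6, 6)
    for (i, j) in donor_rad;    D[i, j] = ϕd; end
    for (i, j) in acceptor_rad; D[i, j] = ϕa; end
    return D
end

Drad_donor_channel    = build_rad_ct(ϕd2, ϕa2)   # channel 2 in this example
Drad_acceptor_channel = build_rad_ct(ϕd1, ϕa1)   # channel 1 in this example

# nonradiative detection matrix: ones everywhere except radiative transitions,
# which carry the probability that the emitted photon escapes detection
Dnon = ones(6, 6)
for (i, j) in donor_rad;    Dnon[i, j] = 1 - ϕd1 - ϕd2; end
for (i, j) in acceptor_rad; Dnon[i, j] = 1 - ϕa1 - ϕa2; end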

Effects of detector dead time

Typically, a detection channel i becomes inactive (dead) after the detection of a photon for a period δi as specified by the manufacturer. Consequently, radiative transitions associated with that channel cannot be monitored during that period.

To incorporate this detector dead period into our likelihood model, we break an interphoton period between the (k-1)-th and k-th photon into two intervals: the first interval with an inactive detector and the second one when the detector is active. Assuming that the (k-1)-th photon is detected in channel i_{k-1}, the first interval is thus \delta_{i_{k-1}} long. As such, we can define the detection matrix for such an interval (written below for a photon detected in channel i_k) as

(D_{i_k}^{\text{dead}})_{a_n b_n} = \begin{cases} 1 & \text{all transitions not intended for channel } i_k \\ 0 & \text{all transitions intended for channel } i_k. \end{cases}

Next, corresponding to this detection matrix, we have the propagator

\Pi_{k,i_k}^{\text{dead}} = \exp\!\left(\delta_{i_k}\left(G \odot D_{i_k}^{\text{dead}}\right)\right) = \exp\!\left(\delta_{i_k}\, G_{i_k}^{\text{dead}}\right),

that evolves the superstate during the detector dead time. This propagator can now be used to incorporate the detector dead time into Eq. 20 to represent the evolution during the period between the (k-1)-th and k-th photons as

\Pi_{k-1,\,i_{k-1}}^{\text{dead}}\, \Pi_k^{\text{non}}\, \Pi_k^{(r)\,\text{rad}} = \epsilon\, \Pi_{k-1,\,i_{k-1}}^{\text{dead}}\, \Pi_k^{\text{non}}\, G_k^{\text{rad}} + \mathcal{O}(\epsilon^2), \qquad (47)

where ΠknonΠk(r)rad describes the evolution when the detector is active.

Finally, inserting Eq. 47 for each interphoton period into the likelihood for the HMM with a second-order structure in Eq. 14, we arrive at the following likelihood that includes detector dead time

L \propto \rho_{\text{start}}\, \Pi_1^{\text{non}} G_1^{\text{rad}}\, \Pi_{1,i_1}^{\text{dead}}\, \Pi_2^{\text{non}} G_2^{\text{rad}}\, \Pi_{2,i_2}^{\text{dead}} \cdots \Pi_K^{\text{non}} G_K^{\text{rad}}\, \Pi_{K,i_K}^{\text{dead}}\, \Pi_{\text{end}}^{\text{non}}\, \rho_{\text{norm}}^T. \qquad (48)

To provide an explicit example on the effect of the detector dead time on the likelihood, we take a detour for pedagogical reasons. In this context, we consider a very simple case of one detection channel (dead time δ) observing a fluorophore with two photophysical states, ground (ψ1) and excited (ψ2), illuminated by a laser. The data in this case contains only photon arrival times

{T1,T2,T3,,TK}.

The generator matrix containing the photophysical transition rates for this setup is

G = \begin{bmatrix} * & \lambda_{\psi_1\psi_2} \\ \lambda_{\psi_2\psi_1} & * \end{bmatrix} = \begin{bmatrix} * & \lambda_{\text{ex}} \\ \lambda_d & * \end{bmatrix},

where * along the diagonal represents the negative row-sum of the remaining elements, λex is the excitation rate, and λd is the donor relaxation rate.

Here, all transitions are possible during detector dead times as there are no observations. As such, the dead time propagators in the likelihood (Eq. 48) are simply expressed as exponentials of the full generator matrix, that is, Πkikdead=exp(δG), leaving the normalization of the propagated probability vector ρ unchanged, e.g., just as we had seen in Eq. 8.

As we will see, these dead times, similar to detector inefficiencies, simply increase our uncertainty over parameters we wish to learn, such as kinetics, by virtue of providing less information. By contrast, background emissions and crosstalk provide false information. However, the net effect is the same: all noise sources increase uncertainty.

Adding the detection IRF

Due to various sources of noise impacting the detection timing electronics (also known as jitter), the time elapsed between photon arrival and detection is itself a hidden (latent) random variable (22). Under continuous illumination, we say that this stochastic delay is sampled from a probability density, f(τ), termed the detection IRF. To incorporate the detection IRF into the likelihood of Eq. 48, we convolve the propagators with f(τ) as follows

L \propto \rho_{\text{start}} \left( \int_0^{\tau_{\text{IRF}}} d\tau_1\, \Pi_1^{\text{non}}(\Delta T_1 - \tau_1)\, G_1^{\text{rad}}\, \Pi_{1,i_1}^{\text{dead}}(\tau_1)\, f(\tau_1) \right) \times \left( \int_0^{\tau_{\text{IRF}}} d\tau_2\, \Pi_2^{\text{non}}(\Delta T_2 - \tau_2)\, G_2^{\text{rad}}\, \Pi_{2,i_2}^{\text{dead}}(\tau_2)\, f(\tau_2) \right) \times \cdots \times \left( \int_0^{\tau_{\text{IRF}}} d\tau_K\, \Pi_K^{\text{non}}(\Delta T_K - \tau_K)\, G_K^{\text{rad}}\, \Pi_{K,i_K}^{\text{dead}}(\tau_K)\, f(\tau_K) \right) \Pi_{\text{end}}^{\text{non}}\, \rho_{\text{norm}}^T, \qquad (50)

where we have used the dead time propagators \Pi_{k,i_k}^{\text{dead}} to incorporate detector inactivity during the period between photon reception and detector reactivation. Moreover, we have \Pi_k^{\text{non}}(\Delta T_k - \tau_k) = \exp\!\left((\Delta T_k - \tau_k)\, G^{\text{non}}\right), as described in Eq. 18.

To facilitate the computation of this likelihood, we use the fact that typical acquisition devices record at discrete (but very small) time intervals. For instance, a setup with a smallest acquisition time of 16 ps and a detection IRF distribution approximately 100 ps wide will have the detection IRF spread over roughly six acquisition periods. This allows each convolution integral to be discretized over these acquisition intervals and computed in parallel, so that the convolution adds essentially no wall-clock time beyond the overhead associated with parallelization.
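To make the discretization concrete, the sketch below (Julia) evaluates one such convolution term as a finite sum over the acquisition grid. Here Gnon, Grad, and Gdead stand for the nonradiative generator, the radiative generator for the detected photon, and the dead-time generator, respectively, all assumed precomputed; f is the IRF weight on each acquisition bin (normalized to sum to one, e.g., six 16-ps bins for a 100-ps-wide IRF). This is only an illustration of the quadrature, not the parallelized implementation used in the BNP-FRET software.

using LinearAlgebra

# one discretized convolution term from Eq. 50 for an interphoton period ΔT
function convolved_term(Gnon, Grad, Gdead, ΔT, Δt, f)
    term = zeros(size(Gnon))
    for (j, w) in enumerate(f)
        τ = (j - 1) * Δt                               # delay on the acquisition grid
        term += exp((ΔT - τ) * Gnon) * Grad * exp(τ * Gdead) * w
    end
    return term
end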

Illumination features

After discussing detector effects, we continue here by further considering different illumination features. For simplicity alone, our likelihood computation until now assumed continuous illumination with a uniform intensity. More precisely, the element λex of the generator matrix in Eq. 4 was assumed to be time independent. Here, we generalize our formulation and show how other illumination setups (such as pulsed illumination and alternating laser excitation, ALEX (65)) can be incorporated into the likelihood by simply assigning a time dependence to the excitation rate λex(t).

Pulsed illumination

Here, we consider an smFRET experiment where the FRET pair is illuminated using a laser for a very short period of time (a pulse), δpulse, at regular intervals of size τ; see Fig. 3 a. Now, as in the case of continuous illumination with constant intensity, the likelihood for a set of observations acquired using pulsed illumination takes a similar form to Eq. 21 involving products of matrices as follows

L \propto \rho_{\text{start}}\, Q_1 Q_2 Q_3 \cdots Q_{N-1} Q_N\, \rho_{\text{norm}}^T, \qquad (51)

where Q_n, with n = 1, \ldots, N, denotes the propagator evolving the superstate during the n-th interpulse period between the (n-1)-th and the n-th pulse.

Figure 3.


Events over a pulsed illumination experiment pulse window. Here, the beginning of the n-th interpulse window of size τ is marked by time tn. The FRET labels, originally in the state GG (donor and acceptor both in their ground states), are excited by a high-intensity burst (shown in green) to the state EG (only donor excited) for a very short time δpulse. If FRET occurs, the donor transfers its energy to the acceptor and returns to the ground state, leaving the FRET labels in the GE state (only acceptor excited). The acceptor then emits a photon that is registered by the detector at microtime μn. When using ideal detectors, the microtime is the same as the photon emission time, as shown in (a). However, when the timing hardware has jitter (shown in red), a small delay ϵn is added to the microtime, as shown in (b).

To derive the structure of Qn during the n-th interpulse period, we break it into two portions: 1) pulse with nonzero laser intensity where the evolution of the FRET pair is described by the propagator Πnpulse introduced shortly; 2) dark period with zero illumination intensity where the evolution of the FRET pair is described by the propagator Πndark introduced shortly. Furthermore, depending on whether a photon is detected or not over the n-th interpulse period the propagators Πnpulse and Πndark assume different forms.

First, when no photons are detected, we have

\Pi_n^{\text{pulse}} = \exp\!\left( \int_0^{\delta_{\text{pulse}}} d\delta\; G^{\text{non}}(\delta) \right), \quad \text{and} \qquad (52)
\Pi_n^{\text{dark}} = \exp\!\left( (\tau - \delta_{\text{pulse}})\, G^{\text{dark}} \right), \qquad (53)

where the integration over the pulse period now involves a time-dependent Gnon due to temporal variations in λex(t). The integral in Eq. 52 is sometimes termed the excitation IRF, although we will not use this convention here. For this reason, when we say IRF, we imply detection IRF alone. In addition, Gdark is the same as Gnon except for the excitation rate that is now set to zero due to lack of illumination. Finally, the propagator for an interpulse period with no photon detection can now be written as

Q_n = \Pi_n^{\text{pulse}}\, \Pi_n^{\text{dark}} = \exp\!\left( \int_0^{\delta_{\text{pulse}}} d\delta\; G^{\text{non}}(\delta) \right) \exp\!\left( (\tau - \delta_{\text{pulse}})\, G^{\text{dark}} \right). \qquad (54)

On the other hand, if a photon is detected sometime after a pulse (as in Fig. 3 a), the pulse propagator remains as in Eq. 52. However, the propagator Πndark must now be modified to include a radiative generator matrix Gnrad similar to Eq. 20

\Pi_n^{\text{dark}} = \exp\!\left( (\mu_n - \delta_{\text{pulse}})\, G^{\text{dark}} \right) G_n^{\text{rad}} \exp\!\left( (\tau - \mu_n)\, G^{\text{dark}} \right), \qquad (55)

where μn is the photon arrival time measured with respect to the n-th pulse (also termed microtime) as shown in Fig. 3 a. Here, the two exponential terms describe the evolution of the superstate before and after the photon detection during the dark period.

Moreover, we can construct the propagator for situations where a photon is detected during a pulse itself in a similar fashion. Here, the propagator Πndark remains the same as in Eq. 53 but Πnpulse must now be modified to include the radiative generator matrix Gnrad as

\Pi_n^{\text{pulse}} = \exp\!\left( \int_0^{\mu_n} d\delta\; G^{\text{non}}(\delta) \right) G_n^{\text{rad}} \exp\!\left( \int_{\mu_n}^{\delta_{\text{pulse}}} d\delta\; G^{\text{non}}(\delta) \right). \qquad (56)

The propagators derived so far in this section assumed ideal detectors. We now describe a procedure to incorporate the IRF into this formulation. This is especially significant for accurate estimation of fluorophore lifetimes, which is commonly done in pulsed illumination smFRET experiments. To incorporate the IRF, we follow the same procedure as in the “adding the detection IRF” section and introduce a convolution between the IRF function f(ϵ) and the propagators above involving photon detections. That is, when there is a photon detected during the dark period, we modify the propagator Πndark as

\Pi_n^{\text{dark}} = \int_0^{\delta_{\text{IRF}}} d\epsilon_n\, \exp\!\left( (\mu_n - \delta_{\text{pulse}} - \epsilon_n)\, G^{\text{dark}} \right) G_n^{\text{rad}} \exp\!\left( (\tau - \mu_n + \epsilon_n)\, G^{\text{dark}} \right) f(\epsilon_n), \qquad (57)

while the Πnpulse stays the same as in Eq. 52. Here, ϵn is the stochastic delay in photon detection resulting from the IRF as shown in Fig. 3 b.

Moreover, when there is a photon detected during a pulse, the propagator Πnpulse of Eq. 56 can be modified in a similar fashion to accommodate the IRF, while the propagator Πndark remains the same as in Eq. 53.

The propagators Qn presented in this section involve integrals over large generator matrices that are analytically intractable and computationally expensive when considering large pulse numbers. Therefore, we follow a strategy similar to the one used in the “effect of binning single-photon smFRET data” section for binned likelihoods to approximate these propagators.

To reduce the complexity of the calculations, we start by making realistic approximations. Given the timescale separation between the interpulse period (typically tens of nanoseconds) and the system kinetics (typically on the timescale of seconds) in a pulsed illumination experiment, it is possible to approximate the system state trajectory as being constant during an interpulse period. In essence, rather than treating the system state trajectory as a continuous time process, we discretize the trajectory such that system transitions only occur at the beginning of each interpulse period. This allows us to separate the photophysical part of the generator matrix Gψ in Eq. 4 from the portion describing the evolution of the system under study Gσ given in Eq. 3. Here, by contrast to the likelihood shown in the “pulsed illumination” section, we can now independently compute the photophysical and system portions of the likelihood, as described below.

To derive the likelihood, we begin by writing the system state propagator during an interpulse period as

Πσ=exp(τGσ). (58)

Furthermore, we must incorporate observations into these propagators by multiplying each system transition probability in Πσ, πσiσj, with the observation probability if that transition had occurred. We organize these observation probabilities using our newly defined detection matrices Dnσ similar to the “continuum limit” section and write the modified propagators as

\Pi_n^\sigma = \Pi^\sigma \odot D_n^\sigma, \qquad (59)

where ⊙ again represents the element-by-element product. Here, the elements of D_n^σ depend on the photophysical portion of the generator matrix G^ψ and their detailed derivations are shown in the third companion article (54). We note here that the propagator matrix dimensions are now M_σ × M_σ, making them computationally less expensive than in the continuous illumination case. Finally, the likelihood for the pulsed illuminated smFRET data with these new propagators reads

L = p(w \,|\, \rho_{\text{start}}, \Pi^\sigma, G^\psi) \propto \rho_{\text{start}}\, \Pi_1^\sigma \Pi_2^\sigma \cdots \Pi_N^\sigma\, \rho_{\text{norm}}^T, \qquad (60)

which, similar to the case of binned likelihood under continuous illumination (see the “effect of binning single-photon smFRET data” section), sums over all possible system state trajectories.

We will later use this likelihood to put forward an inverse model to learn transition probabilities (elements of Πσ) and photophysical transition rates appearing in Gψ.
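For concreteness, a bare-bones version of this likelihood can be written as below (Julia). The sketch assumes the system generator Gσ, the interpulse period τ, and the per-pulse detection matrices Dnσ (derived in the third companion article) are already available; for long traces one would use the logarithmic recursion of the “likelihood computation in practice” section instead of the raw product shown here.

using LinearAlgebra

# likelihood of Eq. 60: product of element-wise masked system propagators
function pulsed_likelihood(ρstart, Gσ, τ, D)
    Πσ = exp(τ * Gσ)                 # system propagator over one interpulse period, Eq. 58
    ρ = ρstart                       # 1 × Mσ row vector of system state probabilities
    for Dn in D                      # D is a vector of per-pulse detection matrices
        ρ = ρ * (Πσ .* Dn)           # element-by-element product, Eq. 59
    end
    return sum(ρ)                    # multiplication by ρ_norm (a vector of ones)
end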

Background emissions

Here, we consider background photons registered by detectors from sources other than the labeled system under study (2). The majority of background photons comprise ambient photons, photons from the illumination laser entering the detectors, and dark current (false photons registered by detectors) (22).

Due to the uniform laser intensity in the continuous illumination case considered in this section, we may model all background photons using a single distribution from which waiting times are drawn. Often, such distributions are assumed (or verified) to be exponential with fixed rates for each detection channel (66,67). Here, we model the waiting time distribution for background photons arising from all of these origins as a single exponential, as is most common. However, in the pulsed illumination case, the laser source and the two other sources of background require different treatments due to the nonuniform laser intensity. That is, the ambient photons and dark current are still modeled by an Exponential distribution, although it is often further approximated as a Uniform distribution given that the interpulse period is much shorter than the average background waiting time. The full formulation describing all background sources under pulsed illumination is provided in the third companion article (54).

We now proceed to incorporate background into the likelihood under continuous illumination. We do so, as mentioned earlier, by assuming an Exponential distribution for the background, which effectively introduces new photophysical transitions into the model. As such, these transitions may be incorporated by expanding the full generator matrix G (described in the “likelihood” section) appearing in the likelihood, thereby leaving the structure of the likelihood itself intact, cf., Eq. 21.

To be clear, in constructing the new generator matrix, we treat background in each detection channel as if originating from fictitious independent emitters with constant emission rates (exponential waiting time). Furthermore, we assume that an emitter corresponding to channel i is a two-state system with photophysical states denoted by

{ψi,1bg,ψi,2bg}.

Here, each transition to the other state coincides with a photon emission with rate λibg. As such, the corresponding background generator matrix for channel i can now be written as

G_i^{\text{bg}} = \begin{bmatrix} * & \lambda_{\psi_{i,1}^{\text{bg}}\psi_{i,2}^{\text{bg}}} \\ \lambda_{\psi_{i,2}^{\text{bg}}\psi_{i,1}^{\text{bg}}} & * \end{bmatrix} = \begin{bmatrix} * & \lambda_i^{\text{bg}} \\ \lambda_i^{\text{bg}} & * \end{bmatrix}.

Since the background emitters for each channel are independent of each other, the expanded generator matrix G for the combined setup (system-FRET composite plus background) can now be computed. This can be achieved by combining the system-FRET composite state space and the background state spaces for all of the total C detection channels using Kronecker sums (68) as

G = G^{\text{nobg}} \oplus G_1^{\text{bg}} \oplus G_2^{\text{bg}} \oplus \cdots \oplus G_C^{\text{bg}},

where the symbol ⊕ denotes the matrix Kronecker sum, and G^{nobg} represents the previously shown generator matrices without any background transition rates.

The propagators needed to compute the likelihood can now be obtained by exponentiating the expanded generator matrix above as

\exp\!\left( (T_k - T_{k-1})\, G \right) = \exp\!\left( (T_k - T_{k-1})\, G^{\text{nobg}} \right) \otimes \exp\!\left( (T_k - T_{k-1})\, G_1^{\text{bg}} \right) \otimes \exp\!\left( (T_k - T_{k-1})\, G_2^{\text{bg}} \right) \otimes \cdots \otimes \exp\!\left( (T_k - T_{k-1})\, G_C^{\text{bg}} \right),

where the symbol ⊗ denotes the matrix Kronecker product (tensor product) (68).

Furthermore, the same detection matrices defined earlier to include only nonradiative transitions or only radiative transitions, and their generalization with crosstalk and detection efficiency, can be used to obtain nonradiative and radiative propagators, as shown in the “continuum limit” section.

Consequently, as mentioned earlier, and by contrast to incorporating the effects of dead time or IRF, the addition of background sources does not entail any changes in the basic structure (arrangement of propagators) of the likelihood appearing in Eq. 21.

Example VI: Background.

To provide a concrete example for background, we again return to our FRET pair with two system states. The background-free full generator matrix for this system-FRET composite was provided in the example box in the “likelihood” section as (in units of ms−1)

Gnobg=[1210.002.00.00.0277000347002700000.02.00.02850000.02850020.00.02.01.00.00.01110.000.01.00.0277000277700125000000.00.01.02850000.0285001].

Here, we expand the above generator matrix to incorporate background photons entering two channels (i=1,2) at rates of λ1bg=1 ms−1 and λ2bg=0.5 ms−1. We do so by performing a Kronecker sum of Gnobg with the following generator matrix for the background

G^{\text{bg}} = G_1^{\text{bg}} \oplus G_2^{\text{bg}} = \begin{bmatrix} -1 & 1 \\ 1 & -1 \end{bmatrix} \oplus \begin{bmatrix} -0.5 & 0.5 \\ 0.5 & -0.5 \end{bmatrix} = \begin{bmatrix} -1.5 & 0.5 & 1 & 0 \\ 0.5 & -1.5 & 0 & 1 \\ 1 & 0 & -1.5 & 0.5 \\ 0 & 1 & 0.5 & -1.5 \end{bmatrix},

resulting in

G = G^{\text{nobg}} \oplus G^{\text{bg}}.

Here, G is a 24 × 24 matrix and we do not include its explicit form.
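The Kronecker-sum construction of this example is easy to reproduce and check numerically. The Julia sketch below (our variable names) builds Gbg from the two background generators and verifies that the corresponding propagator factorizes into a Kronecker product, as stated above.

using LinearAlgebra

# Kronecker sum A ⊕ B = A ⊗ I + I ⊗ B
kronsum(A, B) = kron(A, Matrix{Float64}(I, size(B)...)) +
                kron(Matrix{Float64}(I, size(A)...), B)

# background generators for the two channels (units of 1/ms)
G1bg = [-1.0  1.0;  1.0 -1.0]
G2bg = [-0.5  0.5;  0.5 -0.5]

Gbg = kronsum(G1bg, G2bg)            # the 4 × 4 matrix shown above

# the propagator factorizes over Kronecker products
t = 0.3
@assert exp(t * Gbg) ≈ kron(exp(t * G1bg), exp(t * G2bg))

# the full 24 × 24 generator would follow from one more Kronecker sum:
# G = kronsum(Gnobg, Gbg)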

Fluorophore characteristics: Quantum yield, blinking, photobleaching, and direct acceptor excitation

As demonstrated for background in the previous section, to incorporate new photophysical transitions, such as fluorophore blinking and photobleaching, into the likelihood we must modify the full generator matrix G. This can again be accomplished by adding extra photophysical states, relaxing nonradiatively, to the fluorophore model. These photophysical states can have long or short lifetimes depending on the specific photophysical phenomenon at hand. For example, donor photobleaching can be included by introducing a third donor photophysical state into the matrix of Eq. 4 without any escape transitions as follows

G_{\sigma_i}^\psi = \begin{bmatrix} * & \lambda_{\psi_1\psi_2} & 0 & 0 \\ \lambda_{\psi_2\psi_1} & * & \lambda_{\sigma_i,\psi_2\psi_3} & \lambda_{\psi_2\psi_4} \\ \lambda_{\psi_3\psi_1} & 0 & * & 0 \\ 0 & 0 & 0 & * \end{bmatrix} = \begin{bmatrix} * & \lambda_{\text{ex}} & 0 & 0 \\ \lambda_d & * & \lambda_{\sigma_i}^{\text{FRET}} & \lambda_{\text{bleach}} \\ \lambda_a & 0 & * & 0 \\ 0 & 0 & 0 & * \end{bmatrix},

where ψ1 is the lowest energy combined photophysical state for the FRET labels, ψ2 represents the excited donor, ψ3 represents the excited acceptor, and ψ4 represents a photobleached donor. In addition, λd and λa denote donor and acceptor relaxation rates, respectively, λbleach represents permanent loss of emission from the donor (photobleaching), and λσiFRET represents the FRET transition rate when the system is in its i-th system state.

Fluorophore blinking can be implemented similarly, except with a nonzero escape rate out of the new photophysical state, allowing the fluorophore to resume emission after some time (52,69). Here, assuming that the fluorophore cannot transition into the blinking photophysical state from the donor ground state results in the following generator matrix

G_{\sigma_i}^\psi = \begin{bmatrix} * & \lambda_{\psi_1\psi_2} & 0 & 0 \\ \lambda_{\psi_2\psi_1} & * & \lambda_{\sigma_i,\psi_2\psi_3} & \lambda_{\psi_2\psi_4} \\ \lambda_{\psi_3\psi_1} & 0 & * & 0 \\ \lambda_{\psi_4\psi_1} & 0 & 0 & * \end{bmatrix} = \begin{bmatrix} * & \lambda_{\text{ex}} & 0 & 0 \\ \lambda_d & * & \lambda_{\sigma_i}^{\text{FRET}} & \lambda_{\text{blink}} \\ \lambda_a & 0 & * & 0 \\ \lambda_{\text{unblink}} & 0 & 0 & * \end{bmatrix}.

So far, we have ignored direct excitation of acceptor dyes in the likelihood model. This effect can also be incorporated into the likelihood by assigning a nonzero value to the transition rate λψ1ψ3, that is,

G_{\sigma_i}^\psi = \begin{bmatrix} * & \lambda_{\psi_1\psi_2} & \lambda_{\psi_1\psi_3} \\ \lambda_{\psi_2\psi_1} & * & \lambda_{\sigma_i,\psi_2\psi_3} \\ \lambda_{\psi_3\psi_1} & 0 & * \end{bmatrix} = \begin{bmatrix} * & \lambda_{\text{ex}} & \lambda_{\text{direct}} \\ \lambda_d & * & \lambda_{\sigma_i}^{\text{FRET}} \\ \lambda_a & 0 & * \end{bmatrix}.

Other photophysical phenomena can also be incorporated into our likelihood by following the same procedure as above. Finally, just as when adding background, the structure of the likelihood (arrangement of the propagators) when treating photophysics (including adding the effect of direct acceptor excitation) stays the same as in Eq. 21.

Synthetic data generation

In the previous subsections, we described how to compute the likelihood, which is the sum of probabilities over all possible superstate trajectories that could give rise to the observations made by a detector, as demonstrated in the “forward model” section. Here, we demonstrate how one such superstate trajectory can be simulated to produce synthetic photon arrival data using the Gillespie algorithm (70), as described in the next section, followed by the addition of detector artefacts. We then use the generated data to test our BNP-FRET sampler.

Gillespie and detector artefacts

The Gillespie algorithm generates two sets of random variables. The first set contains the times at which the superstate changes (indexed 1 through N); these times can fall anywhere along a continuous time grid. The second set contains the superstates held during the interval preceding each such change.

We designate the sequence of superstates

{b1,b2,,bN},

where b_n \in \{\phi_1, \phi_2, \ldots, \phi_{M_\phi}\}. Here, unlike earlier in the “likelihood” section, the time index n on superstates b_n is not on a regular temporal grid.

Now, to generate the superstate sequence above, we first randomly draw the first superstate, b1, from the set of possible superstates given their corresponding probabilities. Next, we draw the second superstate b2 of the sequence using the set of transition rates out of the first state with self-transitions excluded by construction. Now, after choosing b2, we generate the holding time h1 (the time spent in b1) from the Exponential distribution with rate constant associated with transitions b1b2. Finally, we repeat the two previous steps to sequentially generate the full sequence of superstates along with the corresponding holding times.

More formally, we generate a trajectory, by first sampling the initial superstate as

b_1 \sim \text{Categorical}_{\phi_{1:M_\phi}}(\rho_{\text{start}}),

where ρstart is the initial probability vector and the Categorical distribution is the generalization of the Bernoulli distribution for more than two possible outcomes. The remaining superstates can now be sampled as

b_{n+1} \,|\, b_n, G \sim \text{Categorical}_{\phi_{1:M_\phi}}\!\left( \frac{\lambda_{b_n\phi_1}}{\lambda_{b_n}}, \frac{\lambda_{b_n\phi_2}}{\lambda_{b_n}}, \ldots, \frac{\lambda_{b_n\phi_{M_\phi}}}{\lambda_{b_n}} \right),

where \lambda_{b_n} = \sum_i \lambda_{b_n\phi_i} is the escape rate for the superstate b_n and rates for self-transitions are zero. The above equation reads as follows: “the superstate b_{n+1} is drawn (sampled) from a Categorical distribution given the superstate b_n and the generator matrix G.”

Once the n-th superstate bn is chosen, the holding time hn (the time spent in bn) is sampled as follows

h_n \,|\, b_n, G \sim \text{Exponential}(\lambda_{b_n}).

Finally, with ideal detectors, the detection channel ck is assigned deterministically to the k-th photon emitted at time Tkem, which can be computed by summing all the holding times preceding the corresponding radiative transition.

Furthermore, in the presence of detection effects, such as crosstalk, detection efficiency, and IRF, we must add to the stochastic output of the Gillespie simulation another layer of stochasticity originating from the measurement model. That is, we stochastically assign detection channel and detection times to an emitted photon, as described below.

In the presence of crosstalk and inefficient detectors, we choose the detection channel for the k-th photon emitted upon a radiative transition as

c_k \sim \text{Categorical}_{\{\varnothing,1,2\}}\!\left(p_{\varnothing k}, p_{1k}, p_{2k}\right),

where p_{\varnothing k}, p_{1k}, and p_{2k}, respectively, denote the probabilities of the photon going undetected, being detected in channel 1, and being detected in channel 2.

Moreover, in the presence of the IRF, we assign a stochastic delay ϵk, sampled from a probability distribution f(ϵ), to the absolute photon emission time Tkem. This results in the detection time, Tk=Tkem+ϵk, as registered by the timing hardware.

In addition, when photophysical effects (such as blinking and photobleaching) and background are present, we can generate a superstate trajectory following the same procedure as above using the generator matrices G incorporating these effects as described in the previous sections.

Finally, we obtain our desired smFRET trace (see Fig. 4) consisting of photon arrival times T1:K and detection channels c1:K as

{(T1,c1),(T2,c2),(T3,c3),,(TK,cK)}.
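To illustrate the procedure above, the following Julia sketch (using the Distributions.jl package) simulates a superstate trajectory and records photons for ideal detectors. It assumes a generator matrix G with negative row sums on the diagonal and a user-supplied lookup, channel, mapping each radiative transition (from-state, to-state) to a detection channel; in the running two-system-state example, this lookup would send the donor relaxations (2→1, 5→4) to the donor channel and the acceptor relaxations (3→1, 6→4) to the acceptor channel. All names here are ours.

using Distributions

function gillespie_trace(G, ρstart, channel, tmax)
    M = size(G, 1)
    b = rand(Categorical(vec(ρstart)))          # initial superstate
    t = 0.0
    times, channels = Float64[], Int[]
    while t < tmax
        λesc = -G[b, b]                          # escape rate out of superstate b
        t += rand(Exponential(1 / λesc))         # holding time in b
        p = [i == b ? 0.0 : G[b, i] / λesc for i in 1:M]
        bnew = rand(Categorical(p))              # next superstate (no self-transitions)
        if haskey(channel, (b, bnew))            # radiative transition: record a photon
            push!(times, t)
            push!(channels, channel[(b, bnew)])
        end
        b = bnew
    end
    return times, channels
end

Crosstalk, finite detection efficiency, and the IRF would then be layered on top of this output by stochastically reassigning channels and jittering the recorded times, exactly as described above.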
Figure 4.


Simulated data. Here, we show a superstate trajectory (in blue) generated using the Gillespie algorithm for a system-FRET composite with two system states σ1:2 and three photophysical states. Collectively, superstates φ1:3 correspond to photophysical states when the system resides in σ1 and superstates φ4:6 correspond to photophysical states when the system resides in σ2. The pink vertical lines mark the time points where transitions between the superstates occur. The variables b1:7 and h1:7 between each set of vertical lines represent the superstates and associated holding times, respectively. The green and red dots show the photon detections at times T1 and T2 in channels 1 and 2, respectively. The first photon is detected upon transition b3→b4 (or φ5→φ4), while the second photon is detected upon transition b6→b7 (or φ6→φ4). For this plot, we have used very fast system transition rates of λσ1σ2 = 0.001 ns−1 and λσ2σ1 = 0.002 ns−1 for demonstrative purposes only.

Inverse strategy

Now, armed with the likelihood for different experimental setups and a means by which to generate synthetic data (or having experimental data at hand), we proceed to learn the parameters of interest. Assuming precalibrated detector parameters, these include the transition rates entering the generator matrix G and the elements of ρstart. However, accurate estimation of the unknowns requires an inverse strategy capable of dealing with all existing sources of uncertainty in the problem, such as the stochasticity of photon arrivals and detector noise. This naturally leads us to adopt a Bayesian inference framework where we employ Monte Carlo methods to learn distributions over the parameters.

We begin by defining the distribution of interest over the unknown parameters we wish to learn termed the posterior. The posterior is proportional to the product of the likelihood and prior distributions using Bayes’ rule as follows

p(G, \rho_{\text{start}} \,|\, w) \propto L(w \,|\, G, \rho_{\text{start}})\, p(G, \rho_{\text{start}}), \qquad (62)

where the last term p(G,ρstart) is the joint prior distribution over G and ρstart defined over the same domains as the parameters. The prior is often selected on the basis of computational convenience. The influence of the prior distribution on the posterior diminishes as more data are incorporated through the likelihood. Furthermore, the constant of proportionality is the inverse of the absolute probability of the collected data, 1/p(w), and can be safely ignored as generation of Monte Carlo samples only involves ratios of posterior distributions or likelihoods.

In addition, the \epsilon^K factor in the likelihood first derived in Eq. 21 can be absorbed into the proportionality constant as it does not depend on any of the parameters of interest, resulting in the following expression for the posterior (in the absence of detector dead time and IRF for simplicity)

p(G, \rho_{\text{start}} \,|\, w) \propto \rho_{\text{start}}\, \Pi_1^{\text{non}} G_1^{\text{rad}}\, \Pi_2^{\text{non}} G_2^{\text{rad}} \cdots \Pi_{K-1}^{\text{non}} G_{K-1}^{\text{rad}}\, \Pi_K^{\text{non}} G_K^{\text{rad}}\, \Pi_{\text{end}}^{\text{non}}\, \rho_{\text{norm}}^T \times p(G, \rho_{\text{start}}). \qquad (63)

Next, assuming a priori that different transition rates are independent of each other and initial probabilities, we can simplify the prior as follows

p(G, \rho_{\text{start}}) = p(\rho_{\text{start}}) \prod_{i,j} p(\lambda_{\phi_i\phi_j}), \qquad (64)

where we select the Dirichlet prior distribution over initial probabilities as this prior is conveniently defined over a domain where the probability vectors, drawn from it, sum to unity. That is,

p(ρstart)=Dirichlet(ζ), (65)

where the Dirichlet distribution is a multivariate generalization of the Beta distribution and ζ is a vector of the same size as the superstate space. Parameters of a prior are typically termed hyperparameters; as such, ζ collects as many hyperparameters as there are superstates.

In addition, we select Gamma prior distributions for individual rates. That is,

p(\lambda_{\phi_i\phi_j}) = \text{Gamma}\!\left(\lambda_{\phi_i\phi_j};\, \alpha,\, \frac{\lambda_{\text{ref}}}{\alpha}\right), \qquad (66)

guaranteeing positive values. Here, α and λref (a reference rate parameter) are hyperparameters of the Gamma prior. For simplicity, these hyperparameters are usually chosen (with appropriate units) such that the prior distributions are very broad, minimizing their influence on the posterior.

Furthermore, to reduce computational cost, the number of parameters we need to learn can be reduced by reasonably assuming the system was at steady state immediately preceding the time at which the experiment began. That is, instead of sampling ρstart from the posterior, we compute ρstart by solving the time-independent master equation,

ρstartG=0.
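In practice, this steady-state vector is a left null vector of G normalized to sum to one; a minimal Julia sketch is shown below (the function name is ours).

using LinearAlgebra

# steady-state row vector ρstart satisfying ρstart G = 0 and sum(ρstart) = 1
function steady_state(G)
    v = nullspace(Matrix(transpose(G)))[:, 1]   # left null vector of G
    ρ = v ./ sum(v)                             # normalize (also fixes the overall sign)
    return permutedims(ρ)                       # return as a 1 × M row vector
end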

Therefore, the posterior in Eq. 63 now reduces to

p(G \,|\, w) \propto \rho_{\text{start}}\, \Pi_1^{\text{non}} G_1^{\text{rad}}\, \Pi_2^{\text{non}} G_2^{\text{rad}} \cdots \Pi_{K-1}^{\text{non}} G_{K-1}^{\text{rad}}\, \Pi_K^{\text{non}} G_K^{\text{rad}}\, \Pi_{\text{end}}^{\text{non}}\, \rho_{\text{norm}}^T \times p(G). \qquad (67)

In the following subsections, we first describe a parametric inverse strategy, i.e., assuming a known number of system states, for sampling parameters from the posterior distribution in Eq. 67 using Monte Carlo methods. Next, we generalize this inverse strategy to a nonparametric case where we also deduce the number of system states.

Parametric sampler: BNP-FRET with fixed number of system states

Now with the posterior, Eq. 67, at hand and assuming steady-state ρstart, here we illustrate a sampling scheme to deduce the transition rates of the generator matrix G.

As our posterior of Eq. 67 does not assume a standard form amenable to analytical calculations, we must iteratively draw numerical samples of the transition rates within G using Markov chain Monte Carlo (MCMC) techniques. Specifically, we adopt a Gibbs algorithm to, sequentially and separately, generate samples for individual transition rates at each MCMC iteration. To do so, we first write the posterior of Eq. 67 using the chain rule as follows

p(G \,|\, w) = p(\lambda_{\phi_i\phi_j} \,|\, G_{\setminus\lambda_{\phi_i\phi_j}}, w)\; p(G_{\setminus\lambda_{\phi_i\phi_j}} \,|\, w), \qquad (68)

where the backslash after G indicates exclusion of the following rate parameters and w denotes the set of observations as introduced in the “introducing observations” section. Here, the first term on the right-hand side is the conditional posterior for the individual rate \lambda_{\phi_i\phi_j}. The second term is considered a constant in the corresponding Gibbs step as it does not depend on \lambda_{\phi_i\phi_j}. Moreover, following the same logic, the priors p(G_{\setminus\lambda_{\phi_i\phi_j}}) (see Eq. 67) for the remaining rate parameters in the posterior on the left are also considered constant. Therefore, from Eqs. 67 and 68, we can write the conditional posterior for \lambda_{\phi_i\phi_j} above as

p(\lambda_{\phi_i\phi_j} \,|\, G_{\setminus\lambda_{\phi_i\phi_j}}, w) \propto \rho_{\text{start}}\, \Pi_1^{\text{non}} G_1^{\text{rad}}\, \Pi_2^{\text{non}} G_2^{\text{rad}} \cdots \Pi_K^{\text{non}} G_K^{\text{rad}}\, \Pi_{\text{end}}^{\text{non}}\, \rho_{\text{norm}}^T \times \text{Gamma}\!\left(\lambda_{\phi_i\phi_j};\, \alpha,\, \frac{\lambda_{\text{ref}}}{\alpha}\right). \qquad (69)

Just as with the posterior over all parameters, this conditional posterior shown above does not take a closed form allowing for direct sampling.

As such, we turn to the Metropolis-Hastings (MH) algorithm (71) to draw samples from this conditional posterior, where new samples are drawn from a proposal distribution q and accepted with probability

\alpha\!\left(\lambda_{\phi_i\phi_j}^{*}, \lambda_{\phi_i\phi_j}\right) = \min\left\{ 1,\; \frac{ p(\lambda_{\phi_i\phi_j}^{*} \,|\, w, G_{\setminus\lambda_{\phi_i\phi_j}})\; q(\lambda_{\phi_i\phi_j} \,|\, \lambda_{\phi_i\phi_j}^{*}) }{ p(\lambda_{\phi_i\phi_j} \,|\, w, G_{\setminus\lambda_{\phi_i\phi_j}})\; q(\lambda_{\phi_i\phi_j}^{*} \,|\, \lambda_{\phi_i\phi_j}) } \right\}, \qquad (70)

where the asterisk represents the proposed rate values from the proposal distribution q.

To construct an MCMC chain of samples, we begin by initializing the chain for each transition rate λϕiϕj by random values drawn from the corresponding prior distributions. We then iteratively sweep the whole set of transition rates in each MCMC iteration by drawing new values from the proposal distribution q.

A computationally convenient choice for the proposal is a Normal distribution, leading to a simpler acceptance probability in Eq. 70 owing to its symmetry, which gives q(\lambda_{\phi_i\phi_j}^{*} \,|\, \lambda_{\phi_i\phi_j}) = q(\lambda_{\phi_i\phi_j} \,|\, \lambda_{\phi_i\phi_j}^{*}). However, a Normal proposal distribution would allow negative transition rates, which are physically forbidden, leading to wasted rejections in the MH step and thus inefficient sampling. Therefore, it is convenient to propose new samples either drawn from a Gamma distribution or, as shown below, from a Normal distribution in logarithmic space, allowing exploration along the full real line, as follows

\log\!\left(\lambda_{\phi_i\phi_j}^{*}/\kappa\right) \,\big|\, \log\!\left(\lambda_{\phi_i\phi_j}/\kappa\right), \sigma^2 \;\sim\; \text{Normal}\!\left( \log\!\left(\lambda_{\phi_i\phi_j}/\kappa\right), \sigma^2 \right),

where κ=1 is an auxiliary parameter in the same units as λϕiϕj introduced to obtain a dimensionless quantity within the logarithm.

The variable transformation above now requires introduction of Jacobian factors in the acceptance probability as follows

\alpha\!\left(\lambda_{\phi_i\phi_j}^{*}, \lambda_{\phi_i\phi_j}\right) = \min\left\{ 1,\; \frac{ p(\lambda_{\phi_i\phi_j}^{*} \,|\, w, G_{\setminus\lambda_{\phi_i\phi_j}}) }{ p(\lambda_{\phi_i\phi_j} \,|\, w, G_{\setminus\lambda_{\phi_i\phi_j}}) } \; \frac{ \partial \log(\lambda_{\phi_i\phi_j}/\kappa)/\partial \lambda_{\phi_i\phi_j} }{ \partial \log(\lambda_{\phi_i\phi_j}^{*}/\kappa)/\partial \lambda_{\phi_i\phi_j}^{*} } \right\},

where the derivative terms represent the Jacobian and the proposal distributions are canceled by virtue of using a symmetric Normal distribution.

The acceptance probability above depends on the difference between the current and proposed values for a given transition rate: smaller differences typically lead to larger acceptance probabilities. This difference is controlled by the variance σ² of the Normal proposal distribution, which needs to be tuned for each rate individually to achieve optimal performance of the BNP-FRET sampler, corresponding, very approximately, to an acceptance rate of one-fourth for the proposals (72).

This whole algorithm can now be summarized in the following pseudocode.

  • # Initialize chain of samples

  • j = 1

  • for i = 1:Mσ × Mσ
    •  λi(j) ∼ Gamma(α, λref/α)
  • end

  • # Iteratively sample from the posterior using the Gibbs algorithm

  • for j = 2:Draws
    •  for i = 1:Mσ × Mσ
      •  # Propose new sample
      •  log(λi*) ∼ Normal(log(λi(j−1)), σ²)
      •  # Compute acceptance probability
      •  α(λi*, λi(j−1)) = min{1, [p(λi*|w, G∖λi) / p(λi(j−1)|w, G∖λi)] × [(∂log(λi)/∂λi)|λi(j−1) / (∂log(λi*)/∂λi*)]}
      •  if α(λi*, λi(j−1)) > rand()
      •   # Accept proposal
      •   λi(j) = λi*
      •  else
      •   # Reject proposal
      •   λi(j) = λi(j−1)
      •  end
    •  end

  • end
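The core of each Gibbs step, a single Metropolis-Hastings update of one rate in logarithmic space, can be sketched in a few lines of Julia (using Distributions.jl). Here loglike(λ) stands for the logarithm of the likelihood portion of Eq. 69 with all other rates held fixed, and α and λref are the Gamma prior hyperparameters under the shape-scale convention assumed above; the function name and default values are ours.

using Distributions

function mh_update_rate(λ, loglike; σ = 0.5, α = 1.0, λref = 1.0)
    logpost(x) = loglike(x) + logpdf(Gamma(α, λref / α), x)    # log conditional posterior
    λprop = exp(log(λ) + σ * randn())                          # symmetric proposal in log space
    logr = logpost(λprop) - logpost(λ) + log(λprop) - log(λ)   # Jacobian factor λ*/λ
    return log(rand()) < logr ? λprop : λ                      # accept or keep current value
end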

Nonparametrics: Predicting the number of system states

After describing our inverse strategy for a known number of system states (i.e., parametric inference), we turn to more realistic scenarios where we may not know the number of system states which, in turn, leads to an unknown number of superstates (i.e., nonparametric inference). In the following subsections, we first describe the BNP framework for continuous illumination and then proceed to illustrate our BNP strategy under pulsed illumination. Such BNP frameworks introduced herein eventually provide us with distributions over the number of system states simultaneously, and self-consistently, with other model parameters.

Bernoulli process for continuous illumination

The number of system states is often unknown and cannot a priori be set by hand. Therefore, to learn the states warranted by the data, we turn to the BNP paradigm. That is, we first define an infinite-dimensional version of the generator matrix in Eq. 2 and multiply each of its elements by a Bernoulli random variable bi (also termed loads). These loads, indexed by i, allow us to turn on/off portions of the generator matrix associated with transitions between specific system states (including self-transitions). We can write the nonparametric generator matrix as follows

G = \begin{bmatrix} b_1^2 G_{\sigma_1}^\psi - \sum_{j\neq 1} b_1 b_j \lambda_{\sigma_1\sigma_j} I & b_1 b_2 \lambda_{\sigma_1\sigma_2} I & \cdots \\ b_2 b_1 \lambda_{\sigma_2\sigma_1} I & b_2^2 G_{\sigma_2}^\psi - \sum_{j\neq 2} b_2 b_j \lambda_{\sigma_2\sigma_j} I & \cdots \\ \vdots & \vdots & \ddots \end{bmatrix} = \begin{bmatrix} * & b_1^2\lambda_{\psi_1\psi_2} & b_1^2\lambda_{\psi_1\psi_3} & b_1 b_2\lambda_{\sigma_1\sigma_2} & 0 & 0 & \cdots \\ b_1^2\lambda_{\psi_2\psi_1} & * & b_1^2\lambda_{\psi_2\psi_3}^{(1)} & 0 & b_1 b_2\lambda_{\sigma_1\sigma_2} & 0 & \cdots \\ b_1^2\lambda_{\psi_3\psi_1} & b_1^2\lambda_{\psi_3\psi_2} & * & 0 & 0 & b_1 b_2\lambda_{\sigma_1\sigma_2} & \cdots \\ b_1 b_2\lambda_{\sigma_2\sigma_1} & 0 & 0 & * & b_2^2\lambda_{\psi_1\psi_2} & b_2^2\lambda_{\psi_1\psi_3} & \cdots \\ 0 & b_1 b_2\lambda_{\sigma_2\sigma_1} & 0 & b_2^2\lambda_{\psi_2\psi_1} & * & b_2^2\lambda_{\psi_2\psi_3}^{(2)} & \cdots \\ 0 & 0 & b_1 b_2\lambda_{\sigma_2\sigma_1} & b_2^2\lambda_{\psi_3\psi_1} & b_2^2\lambda_{\psi_3\psi_2} & * & \cdots \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots \end{bmatrix},

where a load value of 1 represents an “active” system state, while “inactive” system states (not warranted by the data) get a load value of 0. Here, there are two loads associated with every transition because a pair of system states corresponds to each transition. Within this formalism, the number of active loads is the number of system states estimated by the BNP-FRET sampler. As before, ∗ represents negative row-sums.

The full set of loads, b = {b_1, b_2, …, b_∞}, now becomes part of the quantities we wish to learn. To leverage Bayesian inference methods to learn the loads, the previously defined posterior distribution (Eq. 67) now reads as follows

p(b, G \,|\, w) \propto L(w \,|\, b, G, \rho_{\text{start}})\, p(G)\, p(b), \qquad (71)

where the prior p(b) is Bernoulli while the remaining prior, p(G), can be assumed to be the same as in Eq. 66.

As in the case of the parametric BNP-FRET sampler presented in the section “parametric sampler: BNP-FRET with fixed number of system states,” we generate samples from this nonparametric posterior employing a similar Gibbs algorithm. To do so, we first initialize the MCMC chains of loads and rates by drawing random values from their priors. Next, to construct the MCMC chains, we iteratively draw samples from the posterior in two steps: 1) sequentially sample all rates using the MH algorithm; then 2) sample the loads, one by one, by direct sampling from their corresponding conditional posteriors. Here, step 1 is identical to the parametric case of the “parametric sampler: BNP-FRET with fixed number of system states” section, and we only focus on the second step in what follows.

To sample the i-th load, the corresponding conditional posterior reads (41)

p(b_i \,|\, b_{\setminus b_i}, G, w) \propto L(w \,|\, b, G, \rho_{\text{start}})\, \text{Bernoulli}\!\left(b_i;\, \frac{1}{1 + \frac{M_\sigma^{\text{max}} - 1}{\gamma}}\right), \qquad (72)

where the backslash after b indicates exclusion of the following load and Mσmax and γ are hyperparameters. Here, γ sets the a priori expected number of system states.

A note on the interpretation of Mσmax is in order. When dealing with nonparametrics, we nominally must consider an infinite set of loads and priors for these loads called Bernoulli process priors (73). Samplers for such process priors are available although inefficient (74,75). However, for computational convenience, it is possible to introduce a large albeit finite number of loads set to Mσmax. It can be shown that parameter inference is unaffected by this choice of cutoff (73,76,77) when setting the success probability to 1/(1 + (Mσmax − 1)/γ), as in the Bernoulli distribution of Eq. 72. This is because such a choice forces the mean (expected number of system states) of the full prior on loads ∏i p(bi) to be finite (=γ).

Since the conditional posterior in the equation above must be discrete and describes probabilities of the load being either active or inactive, it must itself follow a Bernoulli distribution with updated parameters

p(b_i \,|\, b_{\setminus b_i}, G, w) = \text{Bernoulli}(b_i;\, q_i),

where

q_i = \frac{ L(w \,|\, b_i = 1, b_{\setminus b_i}, G, \rho_{\text{start}}) }{ L(w \,|\, b_i = 1, b_{\setminus b_i}, G, \rho_{\text{start}}) + L(w \,|\, b_i = 0, b_{\setminus b_i}, G, \rho_{\text{start}}) }.

The Bernoulli form of this posterior allows direct sampling of the loads.
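Because the update is a single Bernoulli draw, each load can be refreshed with two likelihood evaluations. The Julia sketch below implements the expression for qi above while working with log-likelihood differences to avoid underflow; loglike(b) stands for the log-likelihood of the data given the load vector b with all rates held fixed, and the function name is ours.

function sample_load!(b, i, loglike)
    b[i] = 1; ℓ1 = loglike(b)
    b[i] = 0; ℓ0 = loglike(b)
    # qi = L1 / (L1 + L0); if desired, the Bernoulli prior weights of Eq. 72
    # can multiply the two likelihood terms before forming this ratio
    qi = 1 / (1 + exp(ℓ0 - ℓ1))
    b[i] = rand() < qi ? 1 : 0
    return b
end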

We will apply this method for synthetic and experimental data in the second companion article of this series (55).

iHMM methods for pulsed illumination

Under pulsed illumination, the Bernoulli process prior described earlier for continuous illumination can in principle be used as is to estimate the number of system states and the transition rates. However, in this section, we describe a computationally cheaper inference strategy applicable to the simplified likelihood of Eq. 60 assuming system state transitions occurring only at the beginning of each pulse. The reduction in computational expense is achieved by directly learning the elements of the propagator Πσ of Eq. 58, identical for all interpulse periods. In doing so, we learn transition probabilities for the system states instead of learning rates, although we will continue learning rates for photophysical states. This avoids expensive matrix exponentials for potentially large system state numbers required for computing the propagators under continuous illumination.

Now, to infer the transition probabilities in Πσ, the dimensions of which are unknown owing to an unknown number of system states, as well as transition rates among the photophysical states (elements of Gψ in Eq. 4), and initial probabilities, we must place suitable priors on these parameters yielding the following posterior

p(\rho_{\text{start}}, \Pi^\sigma, G^\psi \,|\, w) \propto p(w \,|\, \rho_{\text{start}}, \Pi^\sigma, G^\psi)\, p(\rho_{\text{start}})\, p(G^\psi)\, p(\Pi^\sigma), \qquad (73)

where we have immediately written the joint prior as a product prior over ρstart, Gψ, and Πσ. Next, for ρstart and Gψ we use the same priors as in Eqs. 65 and 66. However, as the number of system states is unknown, Πσ requires special treatment. To learn Πσ, it is convenient to adopt the infinite HMM (iHMM) (41,48) due to the discrete nature of system state transitions over time.

As the name suggests, the iHMM leverages infinite system state spaces (Mσ in Eq. 3) similar to the Bernoulli process prior described in the “Bernoulli process for continuous illumination” section. However, unlike the Bernoulli process, all system states remain permanently active. The primary goal of an iHMM is then to infer transition probabilities between system states, some of which, not warranted by the data, remain very small and set by the (nonparametric) prior that we turn to shortly. Thus the effective number of system states can be enumerated from the most frequently visited system states over the course of a learned trajectory.

Within this iHMM framework, we place an infinite dimensional version of the Dirichlet prior, termed the Dirichlet process prior (41,48,78), as priors over each row of the propagator Πσ. That is,

\pi_m \sim \text{DirichletProcess}(\alpha\beta), \quad m = 1, 2, \ldots, \qquad (74)

where πm is the m-th row of Πσ. Here, the hyperparameters of the Dirichlet process prior include the concentration parameter α, which determines the sparsity of πm, and the hyperparameter β, which is a probability vector also known as the base distribution. Together, αβ are related to the ζ introduced earlier for the (finite) Dirichlet distribution of Eq. 65.

Next, as the base distribution itself is unknown and all transitions out of each state should be likely to revisit the same set of states, we must place the same base distribution on all Dirichlet process priors placed on the rows of the propagator. To sample this unique base, we again choose a Dirichlet process prior (41,79,80,81), that is,

\beta \sim \text{DirichletProcess}(\xi\gamma),

where we may set ξ=1 and γ is a vector of hyperparameters of size Mσ.

Now, to deduce the unknown parameters, we need to draw samples from the posterior in Eq. 73. However, due to the nonanalytical form of the posterior we cannot jointly sample our posterior. Thus, as before, we adopt a Gibbs sampling strategy to sequentially and separately draw samples for each parameter. Here, we only illustrate our Gibbs sampling step for the transition probabilities πm. Our Gibbs steps for the remaining parameters are similar to the ones in the “parametric sampler: BNP-FRET with fixed number of system states” section. The complete procedure is described in the third companion article (54).

Similar to the Bernoulli process prior, there are two common approaches to draw samples within the iHMM framework: slice sampling using the exact Dirichlet process prior and finite truncation (41,48,82,83). Just as before for the case of continuous illumination, we truncate the Dirichlet process prior to a finite Dirichlet distribution and fix its dimensionality to a finite (albeit large) number which we set to Mσmax to improve the sampling. It can then be shown that, for large enough Mσmax, the number of system states visited becomes independent of Mσmax (41).

As before, to numerically sample the transition probabilities πm from our full posterior in Eq. 73 through MCMC, we choose our initial samples from the priors

\beta \sim \text{Dirichlet}(\xi\gamma), \qquad \pi_m \sim \text{Dirichlet}(\alpha\beta), \quad m = 1, 2, \ldots, M_\sigma^{\text{max}},

where we chose elements of γ to be 1/Mσmax to ascribe similar weights across the state space a priori.
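A minimal Julia sketch of this truncated initialization is given below (using Distributions.jl). Mmax stands for Mσmax, and the small numerical floor on β is only there to keep all Dirichlet concentration parameters strictly positive; all names are ours.

using Distributions

Mmax = 10                         # truncation level, stands in for Mσmax
ξ, α = 1.0, 1.0                   # concentration parameters
γ = fill(1 / Mmax, Mmax)          # hyperparameter vector γ

β = rand(Dirichlet(ξ .* γ))       # shared base distribution
β = max.(β, 1e-12); β ./= sum(β)  # numerical floor keeps concentrations positive

# initial transition probability matrix: each row πm drawn from Dirichlet(αβ)
Π = vcat([rand(Dirichlet(α .* β))' for m in 1:Mmax]...)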

Likelihood computation in practice

As shown in the “pulsed illumination” section, the likelihood typically takes the following generic form

L \propto \rho_{\text{start}}\, Q_1 Q_2 Q_3 \cdots Q_{K-1} Q_K\, \rho_{\text{norm}}^T, \qquad (75)

where Qi are matrices whose exact form depends on which effects we incorporate into our likelihood. Computing this last expression would typically lead to underflow as likelihood values quickly drop below floating-point precision.

For this reason, it is convenient to introduce the logarithm of this likelihood. To derive the logarithm of the likelihood of Eq. 75, we rewrite the likelihood as a product of multiple terms as follows

L \propto \left(\rho_{\text{start}}\, \rho_{\text{norm}}^T\right) \left(\rho_1 Q_1 \rho_{\text{norm}}^T\right) \left(\rho_2 Q_2 \rho_{\text{norm}}^T\right) \cdots \left(\rho_{K-1} Q_{K-1} \rho_{\text{norm}}^T\right) \left(\rho_K Q_K \rho_{\text{norm}}^T\right),

where ρi are the normalized probability vectors given by the following recursive formula

\rho_1 = \rho_{\text{start}}, \quad \text{and} \quad \rho_i = \frac{\rho_{i-1} Q_{i-1}}{\rho_{i-1} Q_{i-1} \rho_{\text{norm}}^T}.

Now, using the recursion relation above, the log-likelihood can be written as

\log(L) = \log\!\left(\rho_{\text{start}}\, \rho_{\text{norm}}^T\right) + \log\!\left(\rho_1 Q_1 \rho_{\text{norm}}^T\right) + \log\!\left(\rho_2 Q_2 \rho_{\text{norm}}^T\right) + \log\!\left(\rho_3 Q_3 \rho_{\text{norm}}^T\right) + \cdots + \log\!\left(\rho_{K-1} Q_{K-1} \rho_{\text{norm}}^T\right) + \log\!\left(\rho_K Q_K \rho_{\text{norm}}^T\right) + \text{const},

where const is a constant.

Note that ρstartρnormT=1. The pseudocode to compute the log-likelihood is as follows

ρ = ρstart

p = sum(ρ) = 1

log(L) = log(p) = 0

for i = 1:K

 ρ = ρ Qi
 p = sum(ρ)
 log(L) = log(L) + log(p)
 ρ = ρ/p

end

return log(L)
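An equivalent Julia implementation of this recursion, assuming the propagators Qk for each interphoton (or interpulse) period have already been constructed, might read as follows; the function name is ours.

# log-likelihood via the normalized forward recursion above
function log_likelihood(ρstart, Q)
    ρ = ρstart                    # 1 × M row vector summing to one
    logL = 0.0
    for Qk in Q                   # Q is the vector of period propagators Q_1, ..., Q_K
        ρ = ρ * Qk
        p = sum(ρ)                # running normalization avoids underflow
        logL += log(p)
        ρ = ρ ./ p
    end
    return logL
end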

Results

In this section, we present results using our BNP-FRET sampler described above. Specifically, here we benchmark the parametric (i.e., fixed number of system states) version of our sampler using synthetic data, while the two subsequent manuscripts (54,55) focus on the nonparametric (i.e., unknown number of system states) analysis of experimental data.

For simplicity alone, we begin by analyzing data from an idealized system with two system states using different photon budgets and excitation rates. Next, we consider more realistic examples incorporating the following one at a time: 1) crosstalk and detection efficiency; 2) background emission; 3) IRF; and then 4) end with a brief discussion on the unknown number of system states. We demonstrate when these features become relevant, as well as the overall robustness and generality of the BNP-FRET sampler.

For now, we assume continuous illumination for all parametric examples and use the following priors for the analyses. The prior used for the FRET rates are

\lambda_{\sigma_i}^{\text{FRET}} \sim \text{Gamma}(1,\, 1\ \text{ns}^{-1}),

and use the following prior over the system transition rates

\lambda_{\sigma_i\sigma_j} \sim \text{Gamma}(1,\, 10^{-6}\ \text{ns}^{-1}).

As discussed earlier in the section “parametric sampler: BNP-FRET with fixed number of system states,” it is more convenient to work within logarithmic space where we use the following proposal distributions to update the parameter values

\log(\lambda_{\text{ex}}^{*}) \,|\, \log(\lambda_{\text{ex}}), \sigma_{\text{ex}} \sim \text{Normal}\!\left(\log(\lambda_{\text{ex}}), \sigma_{\text{ex}}^2\right), \quad \log(\lambda_{\sigma_i}^{\text{FRET}*}) \,|\, \log(\lambda_{\sigma_i}^{\text{FRET}}), \sigma_{\text{FRET}} \sim \text{Normal}\!\left(\log(\lambda_{\sigma_i}^{\text{FRET}}), \sigma_{\text{FRET}}^2\right), \quad \text{and} \quad \log(\lambda_{\sigma_i\sigma_j}^{*}) \,|\, \log(\lambda_{\sigma_i\sigma_j}), \sigma_{\text{sys}} \sim \text{Normal}\!\left(\log(\lambda_{\sigma_i\sigma_j}), \sigma_{\text{sys}}^2\right),

where ∗ denotes proposed rates and where it is understood that all rates appearing in the logarithms have been divided by a unit constant so that the arguments of the logarithms remain dimensionless.

For efficient exploration of the parameter space by the BNP-FRET sampler, and upon extensive experimentation with acceptance ratios, we found it prudent to alternate between two sets of variances, {σ²ex = 10⁻⁵, σ²FRET = 0.01, σ²sys = 0.1} and {σ²ex = 10⁻⁵, σ²FRET = 0.5, σ²sys = 5.0}, to generate an MCMC chain. This ensures that we propose samples of different orders of magnitude. As an intuitive guide, the more data we have, the sharper we expect our posterior over the rates to be and, thus, the smaller both sets of variances should be in our proposal distributions.

In the examples presented in the next few subsections, for computational simplicity, we fix the escape rates for the donor and acceptor excited photophysical states as well as the background rates for each detection channel in our simulations, as they can be precalibrated from experiments.

Parametric examples

Photon budget and excitation rate

Here, we perform Bayesian analysis on synthetically generated data (as described in the “synthetic data generation” section) for the simplest case where the number of system states is an input to the BNP-FRET sampler. To generate data, we use the following generator matrix

G = \begin{bmatrix} * & 10.0 & 0 & 2.0 & 0 & 0 \\ 2.77\times 10^5 & * & 1.11\times 10^5 & 0 & 2.0 & 0 \\ 2.85\times 10^5 & 0 & * & 0 & 0 & 2.0 \\ 1.0 & 0 & 0 & * & 10.0 & 0 \\ 0 & 1.0 & 0 & 2.77\times 10^5 & * & 0.91\times 10^5 \\ 0 & 0 & 1.0 & 2.85\times 10^5 & 0 & * \end{bmatrix}\ \text{ms}^{-1},

where the elements are motivated from real experiments (84). Using this generator matrix, we generated a superstate trajectory as described in the “synthetic data generation” section. We analyzed 430,000 photons from the generated data using our BNP-FRET sampler. The resulting posterior distribution for transitions between system states and FRET efficiencies (computed as ϵσiFRET=λσiFRET/(λd+λσiFRET) for the i-th system state) is shown in Fig. 5. As we will see for all examples, the finiteness of data always leads to some error as evident from the slight offset of the peaks of the distribution from the ground truth.

Figure 5.


Learned bivariate posterior for the system state escape rates λesc and FRET efficiencies ϵFRET given synthetic data. To produce this plot, we analyzed synthetic data generated using an excitation rate of λex = 10 ms−1, escape rates λesc = 1 and 2 ms−1, and FRET efficiencies of 0.09 and 0.29 for the two system states, respectively. The ground truth is shown with red dots. The FRET efficiencies estimated by our sampler are 0.288 (+0.007/−0.006) and 0.092 (+0.003/−0.003). Furthermore, the predicted escape rates are 2.03 (+0.16/−0.17) ms−1 and 0.98 (+0.10/−0.07) ms−1. The small bias away from the ground truth is due to the finiteness of data. We have smoothed the distributions, for illustrative purposes only, using kernel density estimation (KDE) available through the Julia Plots package.

The effects of a limited photon budget become significant especially when system kinetics occur across multiple timescales with the most photon-starved state characterized by the largest escape rate. In this case, it is useful to quantify how many photons are typically required to assess any escape rate (with the fastest setting the lower photon count bound needed) to obtain below 15% error in parameter estimates.

To quantify the number of photons, ignoring background and detector effects, we define a dimensionless quantity that we call the “photon budget index” predicting the photon budget needed to accurately estimate the transition rates in the model as

s = \frac{K\, \lambda_{\text{ex}}}{\lambda_{\text{probe}}\, M_\sigma}, \qquad (76)

where K is the total number of photons in a single-photon smFRET trace (photon budget), λex is the excitation rate, λprobe represents the escape rate (timescale) that we want to probe, and Mσ is the number of system states. The parameters in the numerator control the amount of data available and the temporal resolution. On the other hand, the parameters in the denominator are the properties of the system under investigation and represent the required resolution.

From experimentation, we have found a photon budget index of approximately 10^6 to be a safe lower threshold for keeping errors below 15% (this error cutoff is a user choice) in parameter estimates. In the simple parametric example above, we have K = 4.3×10^5, λex = 10 ms−1, the fastest transition that we want to probe is λprobe = 2 ms−1, and Mσ = 2, which corresponds to a photon budget index of 1.08×10^7. In Fig. 6, we also demonstrate the reduction in errors (confidence interval size) for parameters of the same system as the photon budget is increased from 12,500 to 400,000 photons. For each of those cases, we used 9000 MCMC samples to compute statistical metrics such as quantiles.

Figure 6.


System and FRET transition rates as functions of the number of photons used for analysis. To produce these plots, we used the same kinetic parameters as in Fig. 5 and analyzed the data considering only the first 12,500 photons, then increased the photon budget by a factor of two for each subsequent analysis. Furthermore, we generated 9000 MCMC samples for each analysis to compute statistical quantities. In (a), we show two plots corresponding to the two system transition rates (λesc). The blue dots represent the median values (50% quantile), and the ends of the attached confidence intervals represent 5 and 95% quantiles. The ground truths are shown with red horizontal lines. We show similar plots for FRET transition rates (λFRET) in (b). In all of the plots, we see that, as the photon budget is increased, the confidence intervals shrink (the posterior gets narrower/sharper). With a budget of 400,000 photons, the confidence intervals represent less than 10% error in the estimates and contain the ground truths in all of the plots.

We further investigate the effect of another quantity appearing in the photon budget index, namely the excitation rate, on the parameter estimates. To do so, we generate three new synthetic data sets, each containing 670,000 photons, using the same excitation rate of 10 ms−1 and FRET efficiencies of 0.28 and 0.09 for the two system states, respectively, as before. However, the kinetics differ across these data sets so that they have system state transition rates well below, equal to, and well above the excitation rate. As such, for the first data set, we probe slower kinetics compared with the excitation rate, with system state transition rates set at 0.1 ms−1. In the next two data sets, the molecule changes system states at much faster rates of 10 and 1000 ms−1, respectively.

The results obtained for these FRET traces using our Bayesian methods are shown in Fig. 7. The bias in the posterior away from the ground truth increases as faster kinetics are probed in Fig. 7, from left to right. The results for the case with the fastest transition rates of 1000 ms−1 in Fig. 7 c show a marked deterioration of the predictions, as the information content is not sufficient to separate the two FRET efficiencies resulting in estimated values close to the average of the ground truth values (0.185). This lack of information is also reflected in the uncertainties corresponding to each escape rate as shown in Fig. 7. Moreover, the predicted transition rates are of the same order as the excitation rate itself due to lack of temporal resolution available to probe such fast kinetics.

Figure 7.


Learned bivariate posterior for the system state escape rates λesc (log-scale in (c)) and FRET efficiencies ϵFRET from synthetic data. For all synthetic smFRET traces, we use an excitation rate of 10 ms−1 and FRET efficiencies of 0.29 and 0.09 for the two system states, respectively. The three panels correspond to different timescales being probed, with transition rates: (a) 0.1 ms−1; (b) 10 ms−1; and (c) 1000 ms−1. The ground truth values are shown with red dots. The bias in the parameter estimates increases as faster kinetics are probed, demonstrating deterioration of the information content of the collected data and resulting in expectedly poor estimation assuming a fixed photon budget of 670,000. This can also be seen quantitatively by calculating the confidence intervals reported below for each case. The FRET efficiencies estimated by our sampler for the slowest case in (a) are 0.286 (+0.002/−0.002) and 0.091 (+0.001/−0.001), and the corresponding escape rates are 0.101 (+0.004/−0.005) ms−1 and 0.096 (+0.004/−0.004) ms−1. For the intermediate case in (b), FRET efficiencies estimated by our sampler are 0.200 (+0.117/−0.110) and 0.102 (+0.022/−0.014), and predicted escape rates are 8.47 (+2.42/−3.17) ms−1 and 7.67 (+1.32/−2.66) ms−1. For the fastest case in (c), FRET efficiencies estimated by our sampler are 0.189 (+0.025/−0.027) and 0.189 (+0.016/−0.029), and predicted escape rates are 5.00 (+26.9/−3.63) ms−1 and 3.49 (+27.21/−2.49) ms−1. Poorer confidence intervals for larger escape rates reflect larger uncertainty due to lack of information.

To conclude, the excitation rate used to collect smFRET data and the total number of photons available determine the amount of information needed to resolve transitions among system states. As such, the ability of Bayesian methods to naturally propagate error from finiteness of information into parameter estimates make them indispensable tools for quantitative smFRET data analysis. This is by contrast to maximum likelihood-based methods, which provide only inaccurate point estimates on account of limited data.

An example with crosstalk

Here, we demonstrate how our method handles cases where significant crosstalk is present. To show this, we use the same dynamical parameters and photon budget as in the previous subsection for generating synthetic data, but allow 5% of the donor photons to be stochastically detected in the acceptor channel. We then analyze the data with two versions of our method: one that incorporates crosstalk and one that ignores it altogether. Our results show that neglecting crosstalk necessarily leads to artefactually higher FRET efficiency estimates, as clearly seen in Fig. 8 a (a simple calculation illustrating the direction of this bias is sketched below). As expected, incorporating crosstalk into the likelihood, as shown in the “cross talk and detection efficiency” section, results in a smaller bias. In this case, both ground truths fall within the range of the posterior for the corrected model; see Fig. 8 b. Furthermore, as shown in Fig. 9, top panels, donor crosstalk again results in overestimation of FRET efficiencies. However, when we correct for crosstalk, our BNP-FRET sampler learns FRET efficiencies with ground truths falling within the 95% confidence intervals (Fig. 9, bottom panels). As expected, our simulations in Fig. 9 also show that uncertainty increases with increasing crosstalk, and parameter estimation fails for crosstalk values beyond 60%.
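The direction of this bias can be anticipated with a back-of-the-envelope calculation. The sketch below is illustrative only (it is not the likelihood of the “cross talk and detection efficiency” section) and ignores all detector effects other than crosstalk.

```python
# Sketch: with donor-to-acceptor crosstalk probability phi_d1, a photon lands
# in the acceptor channel either because FRET occurred or because a donor
# photon was misrouted, so the apparent FRET efficiency is biased upward.
def apparent_fret(eff_true, phi_d1=0.05):
    """Probability that a detected photon is recorded in the acceptor channel."""
    return eff_true + (1.0 - eff_true) * phi_d1

for eff in (0.28, 0.09):
    print(f"true {eff:.2f} -> apparent {apparent_fret(eff):.3f} at 5% crosstalk")
```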

Figure 8. Learned bivariate posterior for the system state escape rates λesc and FRET efficiencies ϵFRET for synthetic data with crosstalk. The ground truth is shown with red dots. In (a), we show that the posterior learned using the model that does not correct for crosstalk consistently deviates from the ground truth toward higher FRET efficiency estimates, on account of more donor photons being detected as acceptor photons. The FRET efficiencies estimated by our sampler for this case are 0.314 (−0.009/+0.007) and 0.135 (−0.005/+0.002), and the predicted escape rates are 1.90 (−0.11/+0.17) ms−1 and 1.05 (−0.18/+0.10) ms−1. For the corrected case shown in (b), the estimated FRET efficiencies are 0.276 (−0.010/+0.006) and 0.088 (−0.006/+0.004), and the predicted escape rates are 1.85 (−0.14/+0.15) ms−1 and 1.06 (−0.10/+0.12) ms−1. The corrected model mitigates the bias, as demonstrated by the posterior in (b).

Figure 9. System transition rates λesc and FRET efficiencies ϵFRET as functions of increasing donor crosstalk probability φd1. To produce these plots, we generated synthetic data with excitation and escape rates as in Fig. 5. In each plot, the blue dots represent the median values (50% quantile), and the ends of the attached confidence intervals represent the 5% and 95% quantiles. Furthermore, the ground truths are shown with red horizontal lines. In (a), our two plots show the system transition rates estimated by the BNP-FRET sampler when corrected and uncorrected for crosstalk. We show similar plots for FRET efficiencies in (b). In all plots, as donor crosstalk increases, the confidence intervals grow (the posterior gets wider) and the estimates become unreliable for φd1 > 0.6. In addition, as expected, if left uncorrected, the FRET efficiencies start to merge with increasing crosstalk because most photons are detected in the acceptor channel (labeled 1).

An example with background emissions

In the “background emissions” section, we showed how to include background emissions in the forward model. For the current example, we again choose the same kinetic parameters for the system and the FRET pair as in Fig. 5, but now some of the photons come from background sources with rates λbg,i = λex/10 = 1 ms−1 for the i-th channel. If left uncorrected in the model, the addition of a uniform background again leads to higher FRET efficiency estimates due to the excess photons detected in each channel, as can be seen in Fig. 10 a and Fig. 11, top panels (the expected size of this bias is sketched below). By comparison with the uncorrected method, our results migrate toward the ground truth when analyzed with the full method (see Figs. 10 b and 11, bottom panels). Furthermore, as shown in Fig. 11, when background photons account for more than approximately 40% of detected photons, the relative uncertainties in the estimated transition rates exceed 25%, indicating unreliable results.
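As for crosstalk, the size of this bias can be anticipated with a simple rate argument. The sketch below is illustrative only and assumes equal background rates in the two channels with no other detector effects.

```python
# Sketch: expected acceptor-channel fraction when each channel also collects
# uniform background photons at rate lam_bg (all rates in ms^-1). An
# uncorrected analysis would misread this fraction as the FRET efficiency.
def apparent_fret_with_bg(eff_true, lam_ex=10.0, lam_bg=1.0):
    acceptor_rate = lam_ex * eff_true + lam_bg   # FRET photons + acceptor background
    total_rate = lam_ex + 2.0 * lam_bg           # all photons across both channels
    return acceptor_rate / total_rate

for eff in (0.28, 0.09):
    print(f"true {eff:.2f} -> apparent {apparent_fret_with_bg(eff):.3f}")
```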

Figure 10. Learned bivariate posterior for the system state escape rates λesc and FRET efficiencies ϵFRET given synthetic data with background emissions. The ground truth is shown with red dots. The posterior learned using the model that does not correct for background emissions (a) consistently shows bias away from the ground truth toward higher FRET efficiency estimates on account of the extra background photons. The FRET efficiencies estimated by our sampler for this uncorrected case are 0.322 (−0.008/+0.007) and 0.161 (−0.004/+0.003), and the predicted escape rates are 1.93 (−0.14/+0.17) ms−1 and 0.95 (−0.10/+0.10) ms−1. The corrected model mitigates this bias, as demonstrated by the posterior in (b), with learned FRET efficiencies of 0.293 (−0.012/+0.014) and 0.096 (−0.004/+0.003) and predicted escape rates of 1.99 (−0.25/+0.25) ms−1 and 0.87 (−0.05/+0.08) ms−1.

Figure 11. System transition rates λesc and FRET efficiencies ϵFRET as functions of increasing donor and acceptor background fraction λbg/λex. To produce these plots, we generated synthetic data with an excitation rate of λex = 10 ms−1 and escape rates λesc = 1 and 2 ms−1 for the two system states, respectively, the same as in Fig. 5, while increasing the fraction of background photons (donor and acceptor) from 0 to 50% (λbg/λex = 1). In each plot, the blue dots represent the median values (50% quantile), and the ends of the attached confidence intervals represent the 5% and 95% quantiles. Furthermore, the ground truths are shown with red horizontal lines. In (a), we show two plots of the system transition rates estimated by the BNP-FRET sampler when corrected and uncorrected for background. We show similar plots for FRET efficiencies in (b). In all plots, as the background increases, the confidence intervals grow (the posterior gets wider) and the estimates become unreliable for λbg/λex > 0.6. In addition, as expected, if unaccounted for, the FRET efficiencies start to merge with increasing background as the fraction of photons originating from FRET events is significantly reduced.

An example with IRF

To demonstrate the effect of the IRF, as described in the “adding the detection IRF” section, we generated new synthetic data for a single fluorophore (with no FRET, for simplicity alone) with an escape rate of λd = 2.0 ns−1 (similar to that of the Cy3 dye (85)) excited by a continuous-wave laser at a high excitation rate of λex = 0.01 ns−1. For simplicity, we approximate the IRF with a truncated Gaussian distribution about 96 ps wide with its mean at 48 ps. We again analyze the data with two versions of our method, one incorporating and one neglecting the IRF (a toy version of this simulation is sketched below). The results are depicted in Fig. 12, where the posterior is narrower when the IRF is incorporated. This is especially helpful when accurate lifetime determination is important for discriminating between system states. By contrast, accurate determination of lifetimes (which span nanosecond timescales) does not impact the determination of the much slower system kinetics from one system state to the next.
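The sketch below mimics this numerical experiment at a toy level: exponential emission delays at rate λd are jittered by a truncated-Gaussian IRF, where the standard deviation of roughly 16 ps is our assumption for a distribution "about 96 ps wide". A naive moment estimate that ignores the IRF shift underestimates the inverse lifetime, while subtracting the known IRF mean recovers it; our actual analysis instead folds the IRF into the likelihood.

```python
# Toy sketch of the IRF experiment above (times in ns); not the likelihood
# of the "adding the detection IRF" section.
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(2)
lam_d, n = 2.0, 50_000                 # inverse lifetime (ns^-1), photon count

emission = rng.exponential(1.0 / lam_d, size=n)
# truncated Gaussian IRF on [0, 0.096] ns, mean 0.048 ns, sd 0.016 ns (assumed)
mu, sigma, a, b = 0.048, 0.016, 0.0, 0.096
irf = truncnorm.rvs((a - mu) / sigma, (b - mu) / sigma, loc=mu, scale=sigma,
                    size=n, random_state=rng)
detected = emission + irf              # detection time = emission delay + IRF jitter

print("naive estimate of lam_d:", 1.0 / detected.mean())
print("IRF-corrected estimate: ", 1.0 / (detected.mean() - irf.mean()))
```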

Figure 12. Effects of the IRF. Both histograms show the fluorophore’s inverse lifetime, with the ground truth shown by a red line. The bias of the peaks away from the ground truth arises from the limited amount of data used to learn the posterior shown. The corrected model (orange) reduces the histogram’s breadth compared with the uncorrected model (blue). From the small effect of correcting for the IRF we conclude that, predictably, the IRF may be less important under continuous illumination. By contrast, under pulsed illumination, to be explored in the third companion article (54), the IRF will play a more significant role.

A nonparametric example

Here, we demonstrate our method in learning the number of system states by analyzing approximately 600 ms (about 120,000 photons) of synthetic smFRET time-trace data with three system states under pulsed illumination with a 25 ns interpulse window (see Fig. 13); a simplified version of this data-generation scheme is sketched below. This example utilizes the iHMM method described in the “iHMM methods for pulsed illumination” section and discussed in greater depth in the third companion article (54). Using realistic values from the third companion article (54), we set the excitation probability per pulse to 0.005. Furthermore, the kinetics are set at escape rates of 1.2 ms−1 for the highest- and lowest-FRET system states and 2.4 ms−1 for the intermediate system state. Our BNP method simultaneously recovered the correct system state transition probabilities, and thereby the number of system states, along with other parameters, including the donor and acceptor relaxation rates and the per-pulse excitation probability. By comparison, a parametric version of the same method must assume a fixed number of system states a priori. Assuming, say, two system states results in the two higher-FRET system states being combined into a single system state with a FRET efficiency of 0.63 and a lifetime of about 0.6 ms (see Fig. 13 c).
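The sketch below generates a simplified version of such a pulsed-illumination trace: for each pulse the system state is updated using exponential dwell times, a photon is detected with the per-pulse excitation probability, and its channel is drawn from the current state's FRET efficiency. The three FRET efficiencies used here are placeholders of our own choosing, excited-state lifetimes and background are ignored, and only a 60 ms stretch is simulated to keep the loop fast (the full 600 ms trace follows by scaling up n_pulses).

```python
# Simplified sketch of the pulsed-illumination synthetic data (not the iHMM
# analysis itself). Rates in ms^-1; interpulse window 25 ns = 25e-6 ms.
import numpy as np

rng = np.random.default_rng(3)

T_pulse = 25e-6                   # interpulse window, ms
p_ex = 0.005                      # excitation probability per pulse
eff = [0.1, 0.5, 0.9]             # placeholder FRET efficiencies (assumed)
esc = [1.2, 2.4, 1.2]             # escape rates, ms^-1 (low, intermediate, high FRET)
n_pulses = int(60.0 / T_pulse)    # 60 ms demo stretch (600 ms in the text)

state = 0
next_switch = rng.exponential(1.0 / esc[state])
photons = []                      # (pulse index, channel) pairs

for k in range(n_pulses):
    t = k * T_pulse
    while t > next_switch:        # system may have switched between pulses
        state = int(rng.choice([s for s in range(3) if s != state]))
        next_switch += rng.exponential(1.0 / esc[state])
    if rng.random() < p_ex:       # photon detected in this pulse?
        photons.append((k, 1 if rng.random() < eff[state] else 0))

print(f"{len(photons)} photons over {n_pulses} pulses")
```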

Figure 13. Demonstration of nonparametric analysis on synthetic pulsed-illumination data. In (a), we show simulated data for a pulsed-illumination experiment: a trajectory over three system states, labeled sn, in blue, with the corresponding photon arrivals shown as red or green lines whose vertical lengths denote the observed lifetimes in nanoseconds. Panels (b) and (c) show the bivariate posterior for the escape rates λesc and FRET efficiencies ϵFRET. The ground truth is again shown with red dots. The posterior learned using the two-system-state parametric model (c) combines the high-FRET states into one averaged system state with a long lifetime. The nonparametric model correctly infers three system states, as shown in (b).

Discussion

In this paper, we have presented a complete framework for analyzing single-photon smFRET data that includes a photon-by-photon likelihood, detector effects, fluorophore characteristics, and different illumination methods. We demonstrated how modern Bayesian methods can be used to obtain full distributions over the parameters, and discussed the limitations posed by the photon budget and excitation rate. In addition, we have shown how to implement a nonparametric inverse strategy to learn an unknown number of system states.

Our method readily accommodates details relevant to specialized smFRET applications. For instance, we can analyze spatial and temporal dependence of the excitation by simple modification of the generator matrices entering Eq. 21 (a toy illustration of such a modification is sketched below). This is useful in experiments employing pulsed illumination (see the “illumination features” section), as well as alternating-laser excitation (ALEX). In particular, ALEX is used to directly excite the acceptor label, either as a way to gain qualitative information about the sample (65,86), to reduce photobleaching (65,86), or to study intermolecular interactions (16). Similarly, the generator matrix in Eq. 21 can easily be expanded to include any number of labels, extending our method beyond two colors. Three-color smFRET experiments have revealed simultaneous interactions between three proteins (87), monitored conformational subpopulations of molecules (88), and improved our understanding of protein folding and interactions (2,16,89,90,91).
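As a toy illustration of what such a modification looks like (and not a transcription of Eq. 21), the sketch below builds a small generator matrix over the joint photophysical states of a FRET pair for a single system state, with placeholder rates, and shows ALEX-style direct acceptor excitation entering as a single additional rate.

```python
# Toy generator matrix over photophysical states {both ground, donor excited,
# acceptor excited} for one system state; all rates are placeholders (ms^-1).
# ALEX-style direct acceptor excitation appears as one extra off-diagonal rate.
import numpy as np

lam_ex, lam_d, lam_a, lam_fret = 10.0, 2.0e6, 2.5e6, 1.0e6

def generator(lam_ex_acceptor=0.0):
    # row/column order: [ground, donor excited, acceptor excited]
    G = np.array([
        [0.0,    lam_ex, lam_ex_acceptor],   # ground -> excited states
        [lam_d,  0.0,    lam_fret       ],   # donor relaxes or transfers to acceptor
        [lam_a,  0.0,    0.0            ],   # acceptor relaxes to ground
    ])
    np.fill_diagonal(G, -G.sum(axis=1))      # rows of a generator sum to zero
    return G

print(generator())                        # donor excitation only
print(generator(lam_ex_acceptor=5.0))     # with direct acceptor excitation (ALEX)
```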

As the likelihood (Eq. 21) involves as many matrix exponentials as detected photons, the computational cost of our method scales approximately linearly with the number of photons and quadratically with the number of system states (a schematic of this photon-by-photon propagation is sketched below). For instance, it took about 5 hours to analyze the data used to generate Fig. 5 on a regular desktop computer. Additions to our model that increase the computational cost include: 1) the IRF; 2) pulsed illumination; and 3) BNPs. The computational cost associated with the IRF arises from the integral required (see the “adding the detection IRF” section). The cost of the likelihood computation in the pulsed illumination case scales linearly with the number of pulses rather than the number of photons (see Eq. 51), which greatly increases the computational cost when photon detections are infrequent. Finally, BNPs necessarily expand the dimensions of the generator matrix whose exponentiation is required (Eq. 21), resulting in longer burn-in times for our MCMC chains.
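The scaling argument above can be made concrete with a generic hidden-Markov-style propagation (a schematic only, not a literal transcription of Eq. 21): one matrix exponential over each interphoton interval, followed by a detection matrix for the observed channel.

```python
# Schematic of the photon-by-photon likelihood cost: one matrix exponential per
# interphoton interval, with matrices whose size grows with the number of
# system (and photophysical) states.
import numpy as np
from scipy.linalg import expm

def log_likelihood(arrival_times, channels, G_quiet, D):
    """G_quiet: sub-generator for photon-free evolution (rows sum to <= 0,
    the missing mass corresponding to detections); D[c]: detection matrix
    applied when a photon is recorded in channel c."""
    rho = np.full(G_quiet.shape[0], 1.0 / G_quiet.shape[0])  # flat initial state
    logL, t_prev = 0.0, 0.0
    for t, c in zip(arrival_times, channels):
        rho = rho @ expm(G_quiet * (t - t_prev)) @ D[c]
        norm = rho.sum()
        logL += np.log(norm)     # accumulate in log space to avoid underflow
        rho /= norm
        t_prev = t
    return logL

# tiny made-up two-state example
G = np.array([[-1.2, 1.0], [0.5, -0.7]])                  # leakage = detections
D = {0: np.diag([0.8, 0.2]), 1: np.diag([0.2, 0.8])}
print(log_likelihood([0.1, 0.4, 0.9], [0, 1, 0], G, D))
```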

For these reasons, we have optimized the computational cost with respect to the physical conditions of the system being studied. First, inclusion of the IRF can be parallelized, potentially reducing the time cost to a calculation over a single data acquisition period. In our third companion article (54), dealing with pulsed illumination, we improve the computational cost by assuming that fluorophore relaxation occurs within the window between consecutive pulses, thereby reducing our second-order structure herein to a first-order HMM and allowing for faster computation of the likelihood in pulses where no photon is detected. Furthermore, in (54), we also mitigate the computational cost by assuming a physically motivated timescale separation.

As it stands, our framework applies to smFRET experiments on immobilized molecules. However, it is often the case that molecules labeled with FRET pairs are allowed to diffuse freely through a confocal volume, such as in the study of binding and unbinding events (17), protein-protein interactions (17,65), and unhindered conformational dynamics of freely diffusing proteins (65,92). Photon-by-photon analysis of such data is often based on correlation methods, which suffer from bulk averaging (40,42,43). We believe our framework can be extended, in the spirit of (50,73,93), to learn both the kinetics and diffusion coefficients of single molecules.

In addition, our current framework is restricted to models with discrete system states. However, smFRET can also be used to study systems that are better modeled as continuous, such as intrinsically disordered proteins, which undergo continuous changes not always well approximated by discrete system states (17,94). Adapting the approach of (56) should allow us to generalize this framework to instead infer energy landscapes, perhaps relevant to protein folding (95,96), continuum ratchets as applied to motor protein kinetics (97), and the stress-modified potential landscapes of mechanosensitive molecules (98).

To conclude, we have presented a general framework and demonstrated the importance of incorporating various features into the likelihood while learning full distributions over all unknowns, including the system states. In the following two companion articles (54,55), we specialize our method and computational scheme to continuous and pulsed illumination. We then apply our method to the interactions of the intrinsically disordered proteins NCBD and ACTR under continuous illumination (55), and to the kinetics of the Holliday junction under pulsed illumination (54).

Code availability

The BNP-FRET software package is available on GitHub at https://github.com/LabPresse/BNP-FRET.

Acknowledgments

We thank Weiqing Xu and Dr. Zeliha Kilic for regular feedback and help, especially during the development of the nonparametric samplers, and Dr. Irina Gopich for discussions. S.P. acknowledges support from the NIH NIGMS (R01GM130745) for early efforts in nonparametrics and from the NIH NIGMS (R01GM134426) for single-photon efforts. The bulk of the computations was performed on the Agave and Sol supercomputers at ASU.

Declaration of interests

The authors declare no competing interests.

Editor: Jorg Enderlein.

References

1. Wu L., Huang C., et al. James T.D. Förster resonance energy transfer (FRET)-based small-molecule sensors and imaging agents. Chem. Soc. Rev. 2020;49:5110. doi: 10.1039/c9cs00318e.
2. Roy R., Hohng S., Ha T. A practical guide to single-molecule FRET. Nat. Methods. 2008;5:507. doi: 10.1038/nmeth.1208.
3. Demchenko A.P. Springer Science & Business Media; 2008. Introduction to Fluorescence Sensing.
4. Periasamy A., Day R. Elsevier; 2011. Molecular Imaging: FRET Microscopy and Spectroscopy.
5. Rhoades E., Gussakovsky E., Haran G. Watching proteins fold one molecule at a time. Proc. Natl. Acad. Sci. USA. 2003;100:3197–3202. doi: 10.1073/pnas.2628068100.
6. Martinac B. Single-molecule FRET studies of ion channels. Prog. Biophys. Mol. Biol. 2017;130:192–197. doi: 10.1016/j.pbiomolbio.2017.06.014.
7. Chung H.S., McHale K., et al. Eaton W.A. Single-molecule fluorescence experiments determine protein folding transition path times. Science. 2012;335:981–984. doi: 10.1126/science.1215768.
8. Pressé S., Ghosh K., et al. Dill K.A. Dynamical fluctuations in biochemical reactions and cycles. Phys. Rev. E - Stat. Nonlinear Soft Matter Phys. 2010;82:031905. doi: 10.1103/PhysRevE.82.031905.
9. Schuler B. Single-molecule FRET of protein structure and dynamics - a primer. J. Nanobiotechnol. 2013;11(S2):S2. doi: 10.1186/1477-3155-11-S1-S2.
10. Coban O., Zanetti-Dominguez L.C., et al. Ng T. Effect of phosphorylation on EGFR dimer stability probed by single-molecule dynamics and FRET/FLIM. Biophys. J. 2015;108:1013–1026. doi: 10.1016/j.bpj.2015.01.005.
11. Halder K., Dölker N., et al. Neumann H. MD simulations and FRET reveal an environment-sensitive conformational plasticity of importin-β. Biophys. J. 2015;109:277–286. doi: 10.1016/j.bpj.2015.06.014.
12. Sabir T., Schröder G.F., et al. Magennis S.W. Global structure of forked DNA in solution revealed by high-resolution single-molecule FRET. J. Am. Chem. Soc. 2011;133:1188–1191. doi: 10.1021/ja108626w.
13. Phelps C., Israels B., Jose D., Marsh M.C., von Hippel P.H., Marcus A.H. Using microsecond single-molecule FRET to determine the assembly pathways of T4 ssDNA binding protein onto model DNA replication forks. Proc. Natl. Acad. Sci. USA. 2017;114:E3612–E3621. doi: 10.1073/pnas.1619819114.
14. Baltierra-Jasso L.E., Morten M.J., et al. Magennis S.W. Crowding-induced hybridization of single DNA hairpins. J. Am. Chem. Soc. 2015;137:16020–16023. doi: 10.1021/jacs.5b11829.
15. Wang Y., Liu Y., et al. Selvin P.R. Single molecule FRET reveals pore size and opening mechanism of a mechano-sensitive ion channel. Elife. 2014;3:e01834. doi: 10.7554/eLife.01834.
16. Yoo J., Kim J.-Y., et al. Chung H.S. Fast three-color single-molecule FRET using statistical inference. Nat. Commun. 2020;11:3336. doi: 10.1038/s41467-020-17149-w.
17. Schuler B., Eaton W.A. Protein folding studied by single-molecule FRET. Curr. Opin. Struct. Biol. 2008;18:16–26. doi: 10.1016/j.sbi.2007.12.003.
18. Kilic S., Felekyan S., et al. Fierz B. Single-molecule FRET reveals multiscale chromatin dynamics modulated by HP1α. Nat. Commun. 2018;9:235. doi: 10.1038/s41467-017-02619-5.
19. Lerner E., Cordes T., et al. Weiss S. Toward dynamic structural biology: Two decades of single-molecule Förster resonance energy transfer. Science. 2018;359:eaan1133. doi: 10.1126/science.aan1133.
20. Förster T. Zwischenmolekulare Energiewanderung und Fluoreszenz. Ann. Phys. 1948;437:55–75.
21. Jones G.A., Bradshaw D.S. Resonance energy transfer: From fundamental theory to recent applications. Front. Physiol. 2019;7:100.
22. Eisaman M.D., Fan J., et al. Polyakov S.V. Invited review article: Single-photon sources and detectors. Rev. Sci. Instrum. 2011;82:071101. doi: 10.1063/1.3610677.
23. McKinney S.A., Joo C., Ha T. Analysis of single-molecule FRET trajectories using hidden Markov modeling. Biophys. J. 2006;91:1941–1951. doi: 10.1529/biophysj.106.082487.
24. Bronson J.E., Fei J., et al. Wiggins C.H. Learning rates and states from biophysical time series: A Bayesian approach to model selection and single-molecule FRET data. Biophys. J. 2009;97:3196–3205. doi: 10.1016/j.bpj.2009.09.031.
25. Sgouralis I., Madaan S., et al. Pressé S. A Bayesian nonparametric approach to single molecule Förster resonance energy transfer. J. Phys. Chem. B. 2019;123:675–688. doi: 10.1021/acs.jpcb.8b09752.
26. Becker W. Springer; 2015. Advanced Time-Correlated Single Photon Counting Applications.
27. Isbaner S., Karedla N., et al. Enderlein J. Dead-time correction of fluorescence lifetime measurements and fluorescence lifetime imaging. Opt. Express. 2016;24(9):9429–9445. doi: 10.1364/OE.24.009429.
28. Rasnik I., McKinney S.A., Ha T. Nonblinking and long-lasting single-molecule fluorescence imaging. Nat. Methods. 2006;3:891–893. doi: 10.1038/nmeth934.
29. Hübner C.G., Renn A., et al. Wild U.P. Direct observation of the triplet lifetime quenching of single dye molecules by molecular oxygen. J. Chem. Phys. 2001;115:9619–9622.
30. Dale R.E., Eisinger J., Blumberg W.E. The orientational freedom of molecular probes. The orientation factor in intramolecular energy transfer. Biophys. J. 1979;26:161–193. doi: 10.1016/S0006-3495(79)85243-1.
31. Schuler B. Single-molecule fluorescence spectroscopy of protein folding. ChemPhysChem. 2005;6:1206–1220. doi: 10.1002/cphc.200400609.
32. Kilic Z., Sgouralis I., et al. Pressé S. Extraction of rapid kinetics from smFRET measurements using integrative detectors. Cell Rep. Phys. Sci. 2021;2:100409. doi: 10.1016/j.xcrp.2021.100409.
33. Gopich I.V., Szabo A. Single-macromolecule fluorescence resonance energy transfer and free-energy profiles. J. Phys. Chem. B. 2003;107:5058–5063.
34. Andrec M., Levy R.M., Talaga D.S. Direct determination of kinetic rates from single-molecule photon arrival trajectories using hidden Markov models. J. Phys. Chem. A. 2003;107:7454–7464. doi: 10.1021/jp035514+.
35. Keller B.G., Kobitski A., et al. Noé F. Complex RNA folding kinetics revealed by single-molecule FRET and hidden Markov models. J. Am. Chem. Soc. 2014;136:4534–4543. doi: 10.1021/ja4098719.
36. Thomsen J., Sletfjerding M.B., et al. Hatzakis N.S. DeepFRET, a software for rapid and automated single-molecule FRET data classification using deep learning. Elife. 2020;9:e60404. doi: 10.7554/eLife.60404.
37. Gopich I.V., Szabo A. Decoding the pattern of photon colors in single-molecule FRET. J. Phys. Chem. B. 2009;113:10965–10973. doi: 10.1021/jp903671p.
38. Kilic Z., Sgouralis I., Pressé S. Generalizing HMMs to continuous time for fast kinetics: Hidden Markov jump processes. Biophys. J. 2021;120:409–423. doi: 10.1016/j.bpj.2020.12.022.
39. Harris P.D., Narducci A., et al. Lerner E. Multi-parameter photon-by-photon hidden Markov modeling. Nat. Commun. 2022;13(1):1000. doi: 10.1038/s41467-022-28632-x.
40. Gopich I.V., Szabo A. Theory of the statistics of kinetic transitions with application to single-molecule enzyme catalysis. J. Chem. Phys. 2006;124:154712. doi: 10.1063/1.2180770.
41. Sgouralis I., Pressé S. An introduction to infinite HMMs for single-molecule data analysis. Biophys. J. 2017;112:2021–2029. doi: 10.1016/j.bpj.2017.04.027.
42. Gopich I.V., Szabo A. Single-molecule FRET with diffusion and conformational dynamics. J. Phys. Chem. B. 2007;111:12925–12932. doi: 10.1021/jp075255e.
43. Pirchi M., Tsukanov R., et al. Nir E. Photon-by-photon hidden Markov model analysis for microsecond single-molecule FRET kinetics. J. Phys. Chem. B. 2016;120:13065–13075. doi: 10.1021/acs.jpcb.6b10726.
44. Lerner E., Ingargiola A., Weiss S. Characterizing highly dynamic conformational states: The transcription bubble in RNAP-promoter open complex as an example. J. Chem. Phys. 2018;148:123315. doi: 10.1063/1.5004606.
45. Lever J., Krzywinski M., Altman N. Model selection and overfitting. Nat. Methods. 2016;13:703–704.
46. Ferguson T.S. A Bayesian analysis of some nonparametric problems. Ann. Stat. 1973;1:209.
47. Gershman S.J., Blei D.M. A tutorial on Bayesian nonparametric models. J. Math. Psychol. 2012;56:1–12.
48. Sgouralis I., Whitmore M., et al. Pressé S. Single molecule force spectroscopy at high data acquisition: A Bayesian nonparametric analysis. J. Chem. Phys. 2018;148:123320. doi: 10.1063/1.5008842.
49. Tavakoli M., Taylor J.N., et al. Pressé S. Single molecule data analysis: An introduction. Adv. Chem. Phys. 2017;162:205–305.
50. Tavakoli M., Jazani S., et al. Pressé S. Pitching single-focus confocal data analysis one photon at a time with Bayesian nonparametrics. Phys. Rev. X. 2020;10:011021. doi: 10.1103/physrevx.10.011021.
51. Tavakoli M., Jazani S., et al. Pressé S. Direct photon-by-photon analysis of time-resolved pulsed excitation data using Bayesian nonparametrics. Cell Rep. Phys. Sci. 2020;1:100234. doi: 10.1016/j.xcrp.2020.100234.
52. Bryan J.S., 4th, Sgouralis I., Pressé S. Diffraction-limited molecular cluster quantification with Bayesian nonparametrics. Nat. Comput. Sci. 2022;2:102–111. doi: 10.1038/s43588-022-00197-1.
53. Fazel M., Jazani S., et al. Pressé S. High resolution fluorescence lifetime maps from minimal photon counts. ACS Photonics. 2022;9:1015–1025. doi: 10.1021/acsphotonics.1c01936.
54. Safar M., Saurabh A., et al. Pressé S. Single photon smFRET. III. Application to pulsed illumination. Biophys. Rep. 2022;2:100088. doi: 10.1016/j.bpr.2022.100088.
55. Saurabh A., Safar M., et al. Pressé S. Single photon smFRET. II. Application to continuous illumination. Biophys. Rep. 2022;3:100087. doi: 10.1016/j.bpr.2022.100087.
56. Bryan J.S., 4th, Basak P., et al. Pressé S. Inferring potential landscapes from noisy trajectories of particles within an optical feedback trap. iScience. 2022;25:104731. doi: 10.1016/j.isci.2022.104731.
57. Bryan J.S., 4th, Sgouralis I., Pressé S. Inferring effective forces for Langevin dynamics using Gaussian processes. J. Chem. Phys. 2020;152:124106. doi: 10.1063/1.5144523.
58. Patel L., Gustafsson N., et al. Cohen E. A hidden Markov model approach to characterizing the photo-switching behavior of fluorophores. Ann. Appl. Stat. 2019;13:1397–1429. doi: 10.1214/19-AOAS1240.
59. Rabiner L.R. A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE. 1989;77:257–286.
60. Mattheyses A.L., Hoppe A.D., Axelrod D. Polarized fluorescence resonance energy transfer microscopy. Biophys. J. 2004;87:2787–2797. doi: 10.1529/biophysj.103.036194.
61. Gordon F., Elcoroaristizabal S., Ryder A.G. Modelling Förster resonance energy transfer (FRET) using anisotropy resolved multi-dimensional emission spectroscopy (ARMES). Biochim. Biophys. Acta Gen. Subj. 2021;1865:129770. doi: 10.1016/j.bbagen.2020.129770.
62. Gordon G.W., Berry G., et al. Herman B. Quantitative fluorescence resonance energy transfer measurements using fluorescence microscopy. Biophys. J. 1998;74:2702–2713. doi: 10.1016/S0006-3495(98)77976-7.
63. Benke S., Holla A., et al. Schuler B. Combining rapid microfluidic mixing and three-color single-molecule FRET for probing the kinetics of protein conformational changes. J. Phys. Chem. B. 2021;125:6617–6628. doi: 10.1021/acs.jpcb.1c02370.
64. Zosel F., Soranno A., et al. Schuler B. Depletion interactions modulate the binding between disordered proteins in crowded environments. Proc. Natl. Acad. Sci. USA. 2020;117:13480–13489. doi: 10.1073/pnas.1921617117.
65. Kapanidis A.N., Laurence T.A., et al. Weiss S. Alternating-laser excitation of single molecules. Acc. Chem. Res. 2005;38:523–533. doi: 10.1021/ar0401348.
66. Dytso A., Poor H.V. Estimation in Poisson noise: Properties of the conditional mean estimator. IEEE Trans. Inf. Theor. 2020;66:4304–4323.
67. Alléaume R., Treussart F., et al. Roch J.-F. Photon statistics characterization of a single-photon source. New J. Phys. 2004;6:85.
68. Brewer J. Kronecker products and matrix calculus in system theory. IEEE Trans. Circ. Syst. 1978;25:772–781.
69. Rollins G.C., Shin J.Y., et al. Pressé S. Stochastic approach to the molecular counting problem in superresolution microscopy. Proc. Natl. Acad. Sci. USA. 2015;112:E110–E118. doi: 10.1073/pnas.1408071112.
70. Gillespie D.T. A general method for numerically simulating the stochastic time evolution of coupled chemical reactions. J. Comput. Phys. 1976;22:403–434.
71. Metropolis N., Rosenbluth A.W., et al. Teller E. Equation of state calculations by fast computing machines. J. Chem. Phys. 1953;21:1087–1092.
72. Gelman A., Gilks W.R., Roberts G.O. Weak convergence and optimal scaling of random walk Metropolis algorithms. Ann. Appl. Probab. 1997;7:110.
73. Jazani S., Sgouralis I., et al. Pressé S. An alternative framework for fluorescence correlation spectroscopy. Nat. Commun. 2019;10:3662. doi: 10.1038/s41467-019-11574-2.
74. Ghahramani Z., Griffiths T. Infinite latent feature models and the Indian buffet process. Adv. Neural Inf. Process. Syst. 2005;18.
75. Thibaux R., Jordan M.I. Artificial Intelligence and Statistics. PMLR; 2007. Hierarchical beta processes and the Indian buffet process; p. 564.
76. Paisley J., Lawrence C. Proceedings of the 26th Annual International Conference on Machine Learning. 2009. Nonparametric factor analysis with beta process priors; p. 777.
77. Al Labadi L., Zarepour M. On approximations of the beta process in latent feature models: Point processes approach. Sankhya. 2018;80:59–79.
78. Fazel M., Alexander V., et al. Pressé S. Fluorescence lifetime: Beating the IRF and interpulse window. bioRxiv. 2022. Preprint. doi: 10.1101/2022.09.08.507224.
79. Teh Y.W., Jordan M.I., et al. Blei D.M. Hierarchical Dirichlet processes. J. Am. Stat. Assoc. 2006;101:1566–1581.
80. Pitman J. Poisson–Dirichlet and GEM invariant distributions for split-and-merge transformations of an interval partition. Combinator. Probab. Comput. 2002;11:501–514.
81. Sethuraman J. A constructive definition of Dirichlet priors. Stat. Sin. 1994;4:639.
82. Neal R.M. Markov chain sampling methods for Dirichlet process mixture models. J. Comput. Graph Stat. 2000;9:249.
83. Gelfand A.E., Kottas A., MacEachern S.N. Bayesian nonparametric spatial modeling with Dirichlet process mixing. J. Am. Stat. Assoc. 2005;100:1021–1035.
84. Zosel F., Mercadante D., et al. Schuler B. A proline switch explains kinetic heterogeneity in a coupled folding and binding reaction. Nat. Commun. 2018;9:3332. doi: 10.1038/s41467-018-05725-0.
85. Sanborn M.E., Connolly B.K., et al. Levitus M. Fluorescence properties and photophysics of the sulfoindocyanine Cy3 linked covalently to DNA. J. Phys. Chem. B. 2007;111:11064–11074. doi: 10.1021/jp072912u.
86. Hohlbein J., Craggs T.D., Cordes T. Alternating-laser excitation: Single-molecule FRET and beyond. Chem. Soc. Rev. 2014;43:1156–1171. doi: 10.1039/c3cs60233h.
87. Sun Y., Wallrabe H., et al. Periasamy A. Three-color spectral FRET microscopy localizes three interacting proteins in living cells. Biophys. J. 2010;99:1274–1283. doi: 10.1016/j.bpj.2010.06.004.
88. Clamme J.-P., Deniz A.A. Three-color single-molecule fluorescence resonance energy transfer. ChemPhysChem. 2005;6:74–77. doi: 10.1002/cphc.200400261.
89. Hohng S., Joo C., Ha T. Single-molecule three-color FRET. Biophys. J. 2004;87:1328–1337. doi: 10.1529/biophysj.104.043935.
90. Pressé S., Peterson J., et al. Dill K. Single molecule conformational memory extraction: P5ab RNA hairpin. J. Phys. Chem. B. 2014;118:6597–6603. doi: 10.1021/jp500611f.
91. Pressé S., Lee J., Dill K.A. Extracting conformational memory from single-molecule kinetic data. J. Phys. Chem. B. 2013;117:495–502. doi: 10.1021/jp309420u.
92. Deniz A.A., Dahan M., et al. Schultz P.G. Single-pair fluorescence resonance energy transfer on freely diffusing molecules: Observation of Förster distance dependence and subpopulations. Proc. Natl. Acad. Sci. USA. 1999;96:3670–3675. doi: 10.1073/pnas.96.7.3670.
93. Jazani S., Sgouralis I., Pressé S. A method for single molecule tracking using a conventional single-focus confocal setup. J. Chem. Phys. 2019;150:114108. doi: 10.1063/1.5083869.
94. Schuler B., Hofmann H. Single-molecule spectroscopy of protein folding dynamics—expanding scope and timescales. Curr. Opin. Struct. Biol. 2013;23:36–47. doi: 10.1016/j.sbi.2012.10.008.
95. Kirmizialtin S., Huang L., Makarov D.E. Topography of the free-energy landscape probed via mechanical unfolding of proteins. J. Chem. Phys. 2005;122:234915. doi: 10.1063/1.1931659.
96. Shoemaker B.A., Wang J., Wolynes P.G. Structural correlations in protein folding funnels. Proc. Natl. Acad. Sci. USA. 1997;94:777–782. doi: 10.1073/pnas.94.3.777.
97. Kolomeisky A.B., Fisher M.E. Molecular motors: A theorist’s perspective. Annu. Rev. Phys. Chem. 2007;58:675–695. doi: 10.1146/annurev.physchem.58.032806.104532.
98. Konda S.S.M., Avdoshenko S.M., Makarov D.E. Exploring the topography of the stress-modified energy landscapes of mechanosensitive molecules. J. Chem. Phys. 2014;140:104114. doi: 10.1063/1.4867500.
