Skip to main content
Biophysical Reports logoLink to Biophysical Reports
. 2022 Dec 2;3(1):100089. doi: 10.1016/j.bpr.2022.100089

Single-photon smFRET. I: Theory and conceptual basis

Ayush Saurabh 1,2, Mohamadreza Fazel 1,2, Matthew Safar 1,3, Ioannis Sgouralis 4, Steve Pressé 1,2,5,
PMCID: PMC9793182  PMID: 36582655

Abstract

We present a unified conceptual framework and the associated software package for single-molecule Förster resonance energy transfer (smFRET) analysis from single-photon arrivals leveraging Bayesian nonparametrics, BNP-FRET. This unified framework addresses the following key physical complexities of a single-photon smFRET experiment, including: 1) fluorophore photophysics; 2) continuous time kinetics of the labeled system with large timescale separations between photophysical phenomena such as excited photophysical state lifetimes and events such as transition between system states; 3) unavoidable detector artefacts; 4) background emissions; 5) unknown number of system states; and 6) both continuous and pulsed illumination. These physical features necessarily demand a novel framework that extends beyond existing tools. In particular, the theory naturally brings us to a hidden Markov model with a second-order structure and Bayesian nonparametrics on account of items 1, 2, and 5 on the list. In the second and third companion articles, we discuss the direct effects of these key complexities on the inference of parameters for continuous and pulsed illumination, respectively.

Why it matters

smFRET is a widely used technique for studying kinetics of molecular complexes. However, until now, smFRET data analysis methods have required specifying a priori the dimensionality of the underlying physical model (the exact number of kinetic parameters). Such approaches are inherently limiting given the typically unknown number of physical configurations a molecular complex may assume. The methods presented here eliminate this requirement and allow estimating the physical model itself along with kinetic parameters, while incorporating all sources of noise in the data.

Introduction

Förster resonance energy transfer (FRET) has served as a spectroscopic ruler to study motion at the nanometer scale (1,2,3,4), and has revealed insight into intra- and intermolecular dynamics of proteins (5,6,7,8,9,10,11), nucleic acids (12), and their interactions (13,14). In particular, single-molecule FRET (smFRET) experiments have been used to determine the pore size and opening mechanism of ion channels sensitive to mechanical stress in the membrane (15), the intermediate stages of protein folding (16,17), and the chromatin interactions modulated by the helper protein HP1 α involved in allowing genetic transcription for tightly packed chromatin (18).

A typical FRET experiment involves labeling molecules of interest with donor and acceptor dyes such that the donor may transfer energy to the acceptor via dipole-dipole interaction when separated by distances of 2–10 nm (19). This interaction weakens rapidly with increasing separation R and goes as R6 (20,21).

To induce FRET during experiments, the donor is illuminated by a continuous or pulsating light source for the desired time period or until the dyes photobleach. Upon excitation, the donor may emit a photon itself or transfer its energy nonradiatively to the acceptor which eventually relaxes to emit a photon of a different color (20,21). As such, the data collected consist of photon arrival times (for single-photon experiments) or, otherwise, brightness values in addition to photon colors collected in different detection channels.

The distance dependence in the rate of energy transfer between donor and acceptor is key in using smFRET as a molecular ruler. Furthermore, this distance dependence directly manifests itself in the form of higher fraction of photons detected in the acceptor channel when the dyes are closer together (as demonstrated in Fig. 1). This fraction is commonly referred to as the FRET efficiency,

ϵFRET=nAnA+nD=11+(R/R0)6,

where nD and nA are the number of donor and acceptor photons detected in a given time period, respectively. In addition, R0 is the characteristic separation that corresponds to a FRET efficiency of 0.5 or 50% of the emitted photons emanating from the acceptor.

Figure 1.

Figure 1

A cartoon figure illustrating smFRET data. For the experiments considered here, the kinetics along the reaction coordinate defined along the donor-acceptor distance are monitored using single-photon arrival data. In the figure above, photon arrivals are represented by green dots for photons arriving into the donor channel and red dots for photons arriving in the acceptor channel. For the case where donor and acceptor label one molecule, a molecule’s transitions between system states (coinciding with conformations) is reflected by the distance between labels measured by variations in detected photon arrival times and colors.

Now, the aim of smFRET is to capture on-the-fly changes in donor-acceptor distance. However, this is often confounded by several sources of stochasticity, which unavoidably obscure direct interpretation. These include: 1) the stochasticity inherent to photon arrival times; 2) a detector’s probabilistic response to an incoming photon (22); 3) background emissions (2); and 4) fluorescent labels’ stochastic photophysical properties (2). Taken together, these problems necessarily contribute to uncertainty in the number of distinct system states visited by a labeled system over an experiment’s course (23,24,25).

Here, we delve into greater detail into items 2 and 4. In particular, item 2 pertains to questions of crosstalk, detector efficiency, dead time, dark current, and instrument response function (IRF) introducing uncertainty in excited photophysical state lifetime assessments (22,26,27).

Item 4 refers to a collection of effects including limited quantum yield and variable brightness due to blinking of dyes caused by nonradiative pathways (28,29), photobleaching or permanent deactivation of the dyes (2,28,29), spectral overlap between the donor and acceptor dyes, which may result in direct excitation of the acceptors or leaking of photons into the incorrect channel (2,26), or a donor-acceptor pair’s relative misalignment or positioning resulting in false signals and inaccurate characterization of the separation between labeled molecules (2,30).

Although the goal has always remained to analyze the rawest form of data, the reality of these noise properties has traditionally led to the development of approximate binned photon analyses even when data are collected at the level of single photons across two detectors. Binning is either achieved by directly summing photon arrivals over a time period when using single-photon detectors (23,31) or by integrating intensity over a few pixels when using widefield detectors (32).

While binned data analyses can be used to determine the number and connectivity of system states (33)—by computing average FRET efficiencies over bin time windows and using them in turn to construct FRET efficiency histograms (23,25,31,34,35,36)—they come at the cost of averaging kinetics that may exist below a time bin not otherwise easily accessible (32,37,38). They also eliminate information afforded by, say, the excited photophysical state lifetime in the case of pulsed illumination.

While histogram analyses are suited to infer static molecular properties, kinetics over binned time traces have also been extracted by supplementing these techniques with a hidden Markov model (HMM) treatment (23,25,34,35,36,39).

Using HMMs, binned analysis techniques immediately face the difficulty of an unknown number of system states visited. Therefore, they require the number of system states as an input to deduce the putative kinetics between the candidate system states.

What is more, the binned analysis’ accuracy is determined by the bin sizes where large bins may result in averaging of the kinetics. Moreover, increasing bin size may lead to estimation of an excess number of system states. This artifact arises when a system appears to artificially spend more time in the system states below the bin size (38). To address these challenges, we must infer continuous time trajectories below the bin size through, for example, the use of Markov jump processes (32), while retaining a binned, i.e., discrete measurement model.

When single-photon data are available we may avoid the binning issues inherent to HMM analysis (32,40,41). Doing so, also allows us to directly leverage the noise properties of detectors for single-photon arrivals (e.g., IRF) well calibrated at the single-photon level. Moreover, we can now also incorporate information available through photophysical state lifetimes when using pulsed illumination otherwise eliminated in binning data. Incorporating all of this additional information, naturally, comes with added computational cost (37) whose burden a successful method should mitigate.

Often, to help reduce computational costs, further approximations on the system kinetics are invoked, such as assuming system kinetics to be much slower than FRET label excitation and relaxation rates. This approximation helps decouple photophysical and system (molecular) kinetics (16,37,42,43).

What is more, as they exist, the rigor of direct photon arrival analysis methods are further compromised to help reduce computational cost by treating detector features and background as preprocessing steps (16,37,42,43). In doing so, simultaneous and self-consistent inference of kinetics and other molecular features becomes unattainable. Finally, all methods, whether relying on the analysis of binned photons or single-photon arrival, suffer from the “model selection problem.” That is, the problem associated with identifying the number of system states warranted by the data. More precisely, the problem associated with propagating the uncertainty introduced by items 1–4 into a probability over the models (i.e., system states). Existing methods for system state identification only provide partial reprieve.

For example, while FRET histograms identify peaks to intuit the number of system states, these peaks may provide unreliable estimates for a number of reasons: 1) fast transitions between system states may result in a blurring of otherwise distinct peaks (1) or, counter-intuitively, introduce more peaks (25,38); 2) system states may differ primarily in kinetics but not FRET efficiency (40); 3) detector properties and background may introduce additional features in the histograms.

To address the model selection problem, overfitting penalization criteria (such as the Bayesian information criterion or BIC) (23,44) or variational Bayesian (24) approaches have been employed.

Often, these model selection methods assume implicit properties of the system. For example, the BIC requires the assumption of weak independence between measurements (i.e., ideally independent identically distributed measurements and thus no Markov kinetics in state space) and a unique likelihood maximum, both of which are violated in smFRET data (24). Furthermore, BIC and other such methods provide point estimates rather than full probabilities over system states ignoring uncertainty from items 1–4 propagated over models (45).

As such, we need to learn distributions over system states and kinetics warranted by the data and whose breadth is dictated by the sources of uncertainty discussed above. More specifically, to address model selection and build joint distributions over system states and their kinetics, we treat the number of system states as a random variable just as the current community treats smFRET kinetic rates as random variables (25,40,41). Our objective is therefore to obtain distributions over all unknowns (including system states and kinetics) while accounting for items 1–4. Furthermore, this must be achieved in a computationally efficient way avoiding, altogether, the draconian assumptions of existing in single-photon analysis methods. In other words, we want to do more (by learning joint distributions over the number of system states alongside everything else) and we want it to cost less.

If we insist on learning distributions over unknowns, then it is convenient to operate within a Bayesian paradigm. Also, if the model (i.e., the number of system states) is unknown, then we must further generalize to the Bayesian nonparametric (BNP) paradigm (25,41,46,47,48,49,50,51,52,53). BNPs directly address the model selection problem concurrently and self-consistently while learning the associated model’s parameters and output full distributions over the number of system states and the other parameters.

In this series of three companion articles, we present a complete description of single-photon smFRET analysis within the BNP paradigm addressing noise sources discussed above (items 1–4). In addition, we develop specialized computational schemes for both continuous and pulsed illumination for it to “cost less.”

Indeed, mitigating computational cost becomes critical, especially with the added complexity of working within the BNP paradigm. This, in itself, warrants a detailed treatment of continuous and pulsed illumination analyses in two companion articles.

To complement this theoretical framework, we also provide to the community a suite of programs called BNP-FRET written in the compiled language Julia for high performance. These freely available programs allow for comprehensive analysis of single-photon smFRET time traces on immobilized molecules obtained with a wide variety of experimental setups.

In what follows, we first present a forward model. Next, we build an inverse strategy to learn full posteriors within the BNP paradigm. Finally, multiple examples are presented by applying the method to simulated data sets across different parameter regimes. Experimental data are treated in the two subsequent companion articles (54,55).

Forward model

Conventions

To be consistent throughout our three-part article, we precisely define some terms as follows.

  • 1.

    a macromolecular complex under study is always referred to as a system,

  • 2.

    the configurations through which a system transitions are termed system states, typically labeled using σ,

  • 3.

    FRET dyes undergo quantum mechanical transitions between photophysical states, typically labeled using ψ,

  • 4.

    a system-FRET combination is always referred to as a composite,

  • 5.

    a composite undergoes transitions among its superstates, typically labeled using ϕ,

  • 6.

    all transition rates are typically labeled using λ,

  • 7.

    the symbol N is generally used to represent the total number of discretized time windows, typically labeled with n, and

  • 8.

    the symbol wn is generally used to represent the observations in the n-th time window.

smFRET data

Here, we briefly describe the data collected from typical smFRET experiments analyzed by BNP-FRET. In such experiments, donor and acceptor dyes labeling a system can be excited using either continuous illumination or pulsed illumination, where short laser pulses arrive at regular time intervals. Moreover, acceptors can also be excited by nonradiative transfer of energy from an excited donor to a nearby acceptor. Upon relaxation, both donor and acceptor can emit photons collected by single-photon detectors. These detectors record the set of photon arrival times and detection channels. We denote the arrival times by

{Tstart,T1,T2,T3,,TK,Tend},

and detection channels with

{c1,c2,c3,,cK},

for a total number of K photons. In the equations above, Tstart and Tend are experiment’s start and end times. Further, we emphasize here that the strategy used to index the detected photons above is independent of the illumination setup used.

Throughout the experiment, photon detection rates from the donor and acceptor dyes vary as the distance between them changes, due to the system kinetics. In cases where the distances form an approximately finite set, we treat the system as exploring a discrete system state space. The acquired FRET traces can then be analyzed to estimate the transition rates between these system states assuming a known model (i.e., known number of system states). We will lift this assumption of knowing the model a priori in the section “nonparametrics: predicting the number of system states.”

Cases where the system state space is continuous fall outside the scope of the current work and require extensions of (56) and (57) currently in progress.

In the following subsections, we present a physical model (forward model) describing the evolution of an immobilized system labeled with a FRET pair. We use this model to derive, step-by-step, the collected data’s likelihood given a choice of model parameters. Furthermore, given the mathematical nature of what is to follow, we will accompany major parts of our derivations with a pedagogical example of a molecule labeled with a FRET pair undergoing transitions between just two system states to demonstrate each new concept in example boxes.

Likelihood

To derive the likelihood, we begin by considering the stochastic evolution of an idealized system, transitioning through a discrete set of total Mσ system states, {σ1,,σMσ}, labeled with a FRET pair having Mψ discrete photophysical states, {ψ1,,ψMψ}, representing the fluorophores in their ground, excited, triplet, blinking, photobleached, or other quantum mechanical states. The combined system-FRET composite now undergoes transitions between Mϕ=Mσ×Mψ superstates, {ϕ1,,ϕMϕ}, corresponding to all possible ordered pairs (σj,ψk) of the system and photophysical states. To be precise, we define ϕi(σj,ψk), where i=(j1)Mψ+k.

Assuming Markovianity (memorylessness) of transitions among superstates, the probability of finding the composite in a specific superstate at a given instant evolves according to the master equation (40).

dρ(t)dt=ρ(t)G, (1)

where the row vector ρ(t) of length Mϕ has elements coinciding with probabilities for finding the system-FRET composite in a given superstate at time t. More explicitly, defining the photophysical portion of the probability vector ρ(t) corresponding to system state σi as

ρσi(t)=[ρσi,ψ1(t)ρσi,ψ2(t)ρσi,ψMψ(t)],

we can write ρ(t) as

ρ(t)=[ρσ1(t)ρσ2(t)ρσMσ(t)].

Furthermore, in the master equation above, G is the generator matrix of size Mϕ×Mϕ populated by all transition rates λϕiϕj between superstates.

Each diagonal element of the generator matrix G corresponds to self-transitions and is equal to the negative sum of the remaining transition rates within the corresponding row. That is, λϕiϕi=jiλϕiϕj. This results in zero row-sums, assuring that ρ(t) remains normalized at all times as described later in more detail (see Eq. 5). Furthermore, for simplicity, we assume no simultaneous transitions among system states and photophysical states as such events are rare (although the incorporation of these events in the model may be accommodated by expanding the superstate space). This assumption results in λ(ψi,σj)(ψl,σm)=0 for simultaneous il and lm, which allows us to simplify the notation further. That is, λ(ψi,σj)(ψi,σk)λσjσk (for any i) and λ(ψi,σj)(ψk,σj)λσj,ψiψk (for any j). This leads to the following form for the generator matrix containing blocks of exclusively photophysical and exclusively system transition rates, respectively

G=[Gσ1ψj1λσ1σjIλσ1σ2Iλσ1σMσIλσ2σ1IGσ2ψj2λσ2σjIλσ2σMσIλσMσσ1IλσMσσ2IGσMσψjMσλσMσσjI], (2)

where the matrices on the diagonal Gσiψ are the photophysical parts of the generator matrix for a system found in the σi system state. In addition, I is the identity matrix of size Mψ.

For later convenience, we also organize the system transition rates λσiσj in Eq. 2 as a matrix

Gσ=[λσ1σ2λσ1σ3λσ1σMσλσ2σ1λσ2σ3λσ2σMσλσ3σ1λσ3σ2λσ3σMσλσMσσ1λσMσσ2λσMσσ3], (3)

which we call system generator matrix.

Moreover, the explicit forms of Gσiψ in Eq. 2 depend on the photophysical transitions allowed in the model. For instance, if the FRET pair is allowed to go from its ground photophysical state (ψ1) to the excited donor (ψ2) or excited acceptor (ψ3) states only, the matrix is given as

Gσiψ=[λσi,ψ1ψ2λσi,ψ1ψ3λσi,ψ2ψ1λσi,ψ2ψ3λσi,ψ3ψ10]=[λexλdirectλdλσiFRETλa0], (4)

where the along the diagonal represents the negative row-sum of the remaining elements, λex is the excitation rate, λd and λa are the donor and acceptor relaxation rates, respectively, and λdirect is direct excitation of the acceptor by a laser, and λσiFRET is the donor to acceptor FRET transition rate when the system is in its i-th system state. We note that only FRET transitions depend on the system states (identified by dye-dye separations) and correspond to FRET efficiencies given by

ϵσiFRET=λσiFRETλσiFRET+λd,

where the ratio on the right hand side represents the fraction of FRET transitions among all competing transitions out of an excited donor, that is, the fraction of emitted acceptor photons among total emitted photons.

With the generator matrix at hand, we now look for solutions to the master equation of Eq. 1. Due to its linearity, the master equation accommodates the following analytical solution:

ρ(t)=ρ(t0)exp((tt0)G)ρ(t0)Π(tt0), (5)

illustrating how the probability vector ρ(t) arises from the propagation of the initial probability vector at time t0 by the exponential of the generator matrix (the propagator matrix Π(tt0)). The exponential maps the transition rates λϕiϕj in the generator matrix to their corresponding transition probabilities πϕiϕj populating the propagator matrix. The zero row-sums of the generator matrix guarantee that the resulting propagator matrix is stochastic (i.e., has rows of probabilities that sum to unity, jπϕiϕj=1).

Example I: State space and generator matrix.

For a molecule undergoing transitions between its two conformations, we have Mσ=2 system states given as {σ1,σ2}. The photophysical states of the FRET pair labeling this molecule are defined according to whether the donor or acceptor are excited. Denoting the ground state by G and excited state by E, we can write all photophysical states of the FRET pair as {ψ1=(G,G),ψ2=(E,G),ψ3=(G,E)}, where the first element in the ordered pair represents the donor state. Furthermore, here, we assume no simultaneous excitation of the donor and acceptor owing to its rarity.

Next, we construct the superstate space with Mφ=6 ordered pairs {φ1=(ψ1,σ1),φ2=(ψ2,σ1),φ3=(ψ3,σ1),φ4=(ψ1,σ2),φ5=(ψ2,σ2),φ6=(ψ3,σ2)}. Finally, the full generator matrix for this setup reads

G=[Gσ1ψλσ1σ2Iλσ1σ2Iλσ2σ1IGσ2ψλσ2σ1I]=[λexλdirectλσ1σ200λdλσ1FRET0λσ1σ20λa000λσ1σ2λσ2σ100λexλdirect0λσ2σ10λdλσ2FRET00λσ2σ1λa0].

Both here, and in similar example boxes that follow, we choose values for rates commonly encountered in experiments (17). We consider a laser exciting a donor at rate λex=10ms1. Next, we suppose that the molecule switches between system states σ1 and σ2 at rates λσ1σ2=2.0ms1 and λσ2σ1=1ms1.

Furthermore, assuming typical lifetimes of 3.6 and 3.5 ns for the donor and acceptor dyes (17), their relaxation rates are, respectively, λd=1/3.6 ns−1 and λa=1/3.5 ns−1. We also assume that there is no direct excitation of the acceptor and thus λdirect=0. Next, we choose FRET efficiencies of 0.2 and 0.9 for the two system states resulting in λσ1FRET=λd/4=0.06 ns−1 and λσ2FRET=9λd=2.43 ns−1.

Finally, these values lead to the following generator matrix (in ms1 units)

G=[1210.002.00.00.0277000347002700000.02.00.02850000.02850020.00.02.01.00.00.01110.000.01.00.0277000277700125000000.00.01.02850000.0285001].

After describing the generator matrix and deriving the solution to the master equation, we continue by explaining how to incorporate observations into a likelihood.

In the absence of observations, any transition among the set of superstates are unconstrained. However, when monitoring the system using suitable detectors, observations rule out specific transitions at the observation time. For example, ignoring background for now, the detection of a photon from a FRET label identifies a transition from an excited photophysical state to a lower energy photophysical state of that label. On the other hand, no photon detected during a time period indicates the absence of radiative transitions or the failure of detectors to register such transition. Consequently, even periods without photon detections are informative in the presence of a detector. In other words, observations from a single-photon smFRET experiment are continuous in that they are defined at every point in time.

In addition, since smFRET traces report radiative transitions of the FRET labels at photon arrival times, uncertainty remains about the occurrences of unmonitored transitions (e.g., between system states). Put differently, smFRET traces (observations) only partially specify superstates at any given time.

Now, to compute the likelihood for such smFRET traces, we must sum those probabilities over all trajectories across superstates (superstate trajectories) consistent with a given set of observations. Assuming the system ends in superstate ϕi at Tend, this sum over all possible trajectories can be very generally given by the element of the propagated vector ρ(Tend) corresponding to superstate ϕi. Therefore, a general likelihood may be written as

L=p(ϕi)=[ρ(Tend)]i. (6)

However, as the final superstate at time Tend is usually unknown, we must therefore marginalize (sum) over the final superstate to obtain the following likelihood

L=i=1Mϕp(ϕi)=ρ(Tend)ρnormT, (7)

where all elements of the vector ρnorm are set to 1 as a means to sum the probabilities in vector ρ(Tend). In the following sections, we describe how to obtain concrete forms for these general likelihoods.

Absence of observations

For pedagogical reasons, it is helpful to first look at the trivial case where a system-FRET composite evolves but no observations are made (due to a lack, say, of detection channels). In this case, all allowed superstate trajectories are possible between the start time of the experiment, Tstart, and end, Tend. This is because the superstate cannot be specified or otherwise restricted at any given time by observations previously explained. Consequently, the probability vector ρ(t) remains normalized throughout the experiment as no superstate trajectory is excluded. As such, the likelihood is given by summing over probabilities associated to the entire set of trajectories, that is,

L=p((T1,e1),,(TK,eK)|G)=ρ(Tstart)Π(TendTstart)ρnormT=ρ(Tend)ρnormT=1, (8)

where {e1,,eK} are the emission times of all emitted photons, not recorded due to lack of detection channels and thus not appearing on the right hand side of the expression.

In what follows, we describe how the probability vector ρ(t) does not remain normalized as it evolves to ρ(Tend) when detectors partially collapse knowledge of the occupied superstate during the experiment. This results in a likelihood smaller than one. We do so for the conceptually simpler case of continuous illumination for now.

Introducing observations

To compute the likelihood when single-photon detectors are present, we start by defining a measurement model where the observation at a given time is dictated by ongoing transitions and detector features (e.g., crosstalk, detector efficiency). As we will see in more detail later, if we describe the evolution of a system by defining its states at discrete time points and these states are not directly observed, and thus hidden, then this measurement model adopts the form of a HMM. Here, Markovianity arises when a given hidden state only depends on its immediate preceding hidden state. In such HMMs, an observation at a given time is directly derived from the concurrent hidden state.

As an example of an HMM, for binned smFRET traces, an observation is often approximated to depend only on the current hidden state. However, contrary to such a naive HMM, an observation in a single-photon setup in a given time period depends on the current superstate and the immediate previous superstate. This naturally enforces a second-order structure on the HMM where each observed random variable depends on two superstates, as we demonstrate shortly. A similar HMM structure was noted previously to model a fluorophore’s photo-switching behavior in (58).

Now, to address this observation model, we first divide the experiment’s time duration into N windows of equal size, ϵ=(TendTstart)/N. We will eventually take the continuum limit ϵ0 to recover the original system as described by the master equation. We also sum over all possible transitions between superstates within each window. These windows are marked by the times (see Fig. 2 a)

{t0,t1,t2,,tN},

where the n-th window is given by (tn1,tn) with t0=Tstart and tN=Tend. Corresponding to each time window, we have observations

w={w1,w2,w3,,wN},

where wn= if no photons are detected and wn={(Tn(1),cn(1)),(Tn(2),cn(2)),} otherwise, with the j-th photon in a window being recorded by the channel cn(j) at time Tn(j). Note here that observations in a time window, being a continuous quantity, allow for multiple photon arrivals or none at all.

Figure 2.

Figure 2

Graphical models depicting the random variables and parameters involved in the generation of photon arrival data for smFRET experiments. Circles shaded in blue represent parameters of interest we wish to deduce, namely transition rates and probabilities. The circles shaded in gray correspond to observations. The unshaded circles represent the superstates. The arrows reflect conditional dependence among these variables and colored dots represent photon arrivals. Going from (a) to (b), we convert the original HMM with a second-order structure to a naive HMM where each observation only depends on one state.

As mentioned earlier, each of these observations originate from the evolution of the superstate. Therefore, we define superstates occupied at the beginning of each window as

{a1,a2,a3,,aN1,aN,aN+1},

where an is the superstate at the beginning of the n-th time window as shown in Fig. 2 a. The framework described here can be employed to compute the likelihood. However, the second-order structure of the HMM leads to complications in these calculations. In the rest of this section, we first illustrate the mentioned complication using a simple example and then describe a solution to this issue.

Example II: Naive likelihood computation.

Here, we calculate the likelihood for our two-state system described earlier. For simplicity alone, we attempt the likelihood calculation for a time period spanning the first two time windows (N=2) in Fig. 2 a. Within this period the system-FRET composite evolves from superstate a1 to a3 giving rise to observations w1:2. The likelihood for such a setup is typically obtained using a recursive strategy by marginalizing over superstates a1:3 (summing over all possible superstate trajectories)

L=p(w1:2|G,ρstart)=a1:3p(w1:2,a1:3|G,ρstart)=a1:3p(w2|a1:3,w1G,ρstart)p(w1,a1:3|G,ρstart)=a1:3p(w2|a2:3,G)p(w1|a1:2,G)p(a1:3|G,ρstart).

Here, we have applied the chain rule of probabilities in each step. Moreover, in the last step, we have only retained the parameters that are directly connected to the random variable on the left in each term, as shown by arrows in Fig. 2 a.

Now, for our two system state example, an can be any of the six superstates φ1:6 (Mφ=6) given earlier. As such, the sum above contains MφN+1=63 terms for such a simple example. For a large number of time windows, computing this sum becomes prohibitively expensive. Therefore, it is common to use a recursive approach to find the likelihood, only requiring Mφ2(N+1) operations, as we describe in the next section. However, due to our HMM’s second-order structure, the two first terms (involving observations) in the above sum are conditioned on a mutual superstate a2, which forbids recursive calculations.

After describing the issue in computing the likelihood due to the second-order structure of our HMM, we now describe a solution to this problem. As such, to simplify the likelihood calculation, we temporarily introduce superstates bn at the end of n-th window separated from superstate an+1 at the beginning of (n+1)-th window by a short time τ as shown in Fig. 2 b during which no observations are recorded (inactive detectors). This procedure allows us to conveniently remove dependency of consecutive observations on a mutual superstate. That is, consecutive observations wn and wn+1 now do not depend on a common superstate an+1, but rather on separated (an,bn) pairs; see Fig. 2 b. The sequence of superstates now looks like (see Fig. 2 b)

{a1,b1,a2,b2,a3,b3,,aN1,bN1,aN,bN,aN+1}, (9)

which now permits a recursive strategy for likelihood calculation as described in the next section. Furthermore, we will eventually take the τ0 limit to obtain the likelihood of the original HMM with the second-order structure.

Recursion formulas

We now have the means to compute the terminal probability vector ρend=ρ(Tend) by evolving the initial vector ρstart=ρ(Tstart). This is most conveniently achieved by recursively marginalizing (summing) over all superstates in Eq. 9 backward in time, starting from the last superstate aN+1 as follows

L=p(w1:N|G,ρstart)=aN+1p(w1:N,aN+1|G,ρstart)=aN+1AN+1(aN+1)=AN+1ρnormT, (10)

where AN+1(aN+1) are elements of the vector AN+1 of length Mϕ, commonly known as a filter (59). Moving backward in time, the filter at the beginning of the n-th time window, An(an+1), is related to the filter at the end of the n-th window, Bn(bn), due to Markovianity, as follows

An+1(an+1)=p(w1:n,an+1|G,ρstart)=bnp(an+1|bn,G)Bn(bn), (11)

or in matrix notation as

An+1=BnΠ˜n,

where p(an+1|bn) are the elements of the transition probability matrix Π˜n described in the next section. Again due to Markovianity, the filter at the end of the n-th window, Bn(bn), is related to the filter at the beginning of the same time window, An(an), as

Bn(bn)=p(w1:n1,bn|G,ρstart)=anp(wn|an,bn,G)p(bn|an,G)An(an), (12)

or in matrix notation as

Bn=AnΠn(r),

where the terms p(wn|an,bn,G)p(bn|an,G) populate the transition probability matrix Πn(r) described in the next section. Here, we use the superscript (r) to denote that elements of this matrix include observation probabilities. We note here that the last filter in the recursion formula, A1, is equal to starting probability vector ρstart itself.

Reduced propagators

To derive the different terms in the recursive filter formulas, we first note that the transition probabilities p(an|bn1,G) and p(bn|an,G) do not involve observations. As such, we can use the full propagator as follows

p(bn|an,G)=(Π)anbn=(exp((ϵτ)G))anbn

and

p(an|bn1,G)=(Π˜)bn1an=(exp(τG))bn1an,

respectively. On the other hand, the term p(wn|an,bn,G) includes observations that result in modification to the propagator by ruling out a subset of transitions. For instance, observation of a photon momentarily eliminates all nonradiative transitions. The modifications now required can be structured into a matrix Dn of the same size as the propagator with elements (Dn)anbn=p(wn|an,bn,G). We term all such matrices detection matrices. The product p(wn|an,bn,G)p(bn|an,G) in Eq. 12 can now be written as

(Πn(r))anbn=(Π)anbn×(Dn)anbn,

relating the modified propagator (termed reduced propagator and distinguished by the superscript (r) hereafter) (Πn(r))anbn in the presence of observations to the full propagator (no observations). Plugging in the matrices introduced above into the recursive filter formulas (Eqs. 11 and 12), we obtain in matrix notation

An+1=BnΠ˜=Bnexp((ϵτ)G)Bn=AnΠn(r)=An(exp(ϵG)Dn), (13)

where the symbol represents element-by-element product of matrices. Here, however, the detection matrices cannot yet be computed analytically as the observations wn allow for an arbitrary number of transitions within the finite time window (tn1,tn). However, they become manageable in the limit that the time windows become vanishingly small, as we demonstrate later.

Likelihood for the HMM with second-order structure

Now, inserting the matrix expressions for filters of Eq. 13 into the recursive formula likelihood Eq. 10, we arrive at

L=aN+1ρstartΠ1(r)Π˜Π2(r)Π˜Π3(r)Π˜ΠN1(r)Π˜ΠN(r)=ρstartΠ1(r)Π˜Π2(r)Π˜Π3(r)Π˜ΠN1(r)Π˜ΠN(r)ρnormT=ρ(Tend)ρnormT,

where, in the second step, we added a row vector of ones, ρnorm at the end to sum over all elements. Here, the superscript T denotes matrix transpose. As we now see, the structure of the likelihood above amounts to propagation of the initial probability vector ρstart to the final probability vector ρ(Tend) via multiple propagators corresponding to N time windows.

Now, under the limit τ0, we have

Π˜=limτ0exp(τG)=I,Π=limτ0exp((ϵτ)G)=exp(ϵG),

where I is the identity matrix. In this limit, we recover the likelihood for the HMM with a second-order structure as

L=ρstartΠ1(r)Π2(r)Π3(r)ΠN1(r)ΠN(r)ρnormT. (14)

We note here that the final probability vector ρ(Tend) is not normalized to one upon propagation due to the presence of reduced propagators corresponding to observations. More precisely, the reduced propagators restrict the superstates evolution to only a subset of trajectories over a time window ϵ in agreement with the observation over this window. This, in turn, results in a probability vector whose elements sum to less than one. That is,

ρstartΠ1(r)Π2(r)Π3(r)ΠN1(r)ΠN(r)ρnormT=ρ(Tend)ρnorm<1. (15)

Continuum limit

Up until now, the finite size of the time window ϵ allowed for an arbitrary number of transitions per time window (tn1,tn), which hinders the computation of an exact form for the detection matrices. Here, we take the continuum limit, as the time windows become vanishingly small (that is, ϵ0 as N). Thus, no more than one transition is permitted per window. This allows us to fully specify the detection matrices Dn.

To derive the detection matrices, we first assume ideal detectors with 100% efficiency and include detector effects in the subsequent sections (see the “detection effects” section). In such cases, the absence of photon detections during a time window, while detectors are active, indicates that only nonradiative transitions took place. Thus, only nonradiative transitions have nonzero probabilities in the detection matrices. As such, for evolution from superstate an to bn, the elements of the nonradiative detection matrix, Dnon, are given by

(Dnon)anbn={1Nonradiativetransitions0Radiativetransitions. (16)

On the other hand, when the k-th photon is recorded in a time window, only elements corresponding to radiative transitions are nonzero in the detection matrix denoted by Dkrad as

(Dkrad)anbn={0Alltransitionsexceptforthek-thphotonemission1k-thphotonemission. (17)

Here, we note that the radiative detection matrices have zeros along their diagonals, since self-transitions are nonradiative.

We can now define the reduced propagators corresponding to the nonradiative and radiative detection matrices, Dnon and Dkrad, using the Taylor approximation limϵ0Πn=I+ϵG+O(ϵ2) as

Π(r)non=(I+ϵG+O(ϵ2))Dnon=exp(ϵGnon)+O(ϵ2),and (18)
Πk(r)rad=(I+ϵG+O(ϵ2))Dkrad=ϵGkrad+O(ϵ2). (19)

In the equations above, Gnon=GDnon and Gkrad=GDkrad, where the symbol represents an element-by-element product of the matrices. Furthermore, the product between the identity matrix and Dkrad above vanishes in the radiative propagator due to zeros along the diagonals of Dkrad.

Example III: Detection matrices.

For our example with two system states described earlier, the detection matrices of Eqs. 16 and 17 take simple forms. The radiative detection matrix has the same size as the generator matrix with nonzero elements wherever there is a rate associated to a radiative transition

Dd/arad=[0000001/0000000/1000000000000001/0000000/100],

where the subscripts d and a, respectively, denote photon detection in donor and acceptor channels. Similarly, the nonradiative detection matrix is obtained by setting all elements of the generator matrix related to radiative transitions to zero and the remaining to one as

Dnon=[111111011111011111111111111011111011].

Final likelihood

With the asymptotic forms of the reduced propagators in Eq. 14 now defined in the last subsection, we have all the ingredients needed to arrive at the final form of the likelihood.

To do so, we begin by considering the period right after the detection of the (k1)-th photon until the detection of the k-th photon. For this time period, the nonradiative propagators in Eq. 14 can now be easily merged into a single propagator Πknon=exp((TkTk1)Gnon), as the commutative arguments of the exponentials can be readily added. Furthermore, at the end of this interphoton period, the radiative propagator Πk(r)rad marks the arrival of the k-th photon. The product of these two propagators

ΠknonΠk(r)rad=ϵΠknonGkrad+O(ϵ2)=ϵexp((TkTk1)Gnon)Gkrad+O(ϵ2), (20)

now governs the stochastic evolution of the system-FRET composite during that interphoton period.

Inserting Eq. 20 for each interphoton period into the likelihood for the HMM with second-order structure in Eq. 14, we finally arrive at our desired likelihood

L=ϵKρstartΠ1nonG1radΠ2nonG2radΠK1nonGK1radΠKnonGKradΠendnonρnormT+O(ϵK+1). (21)

This likelihood has the same structure as shown by Gopich and Szabo in (40).

Example IV: Propagator and likelihood.

Here, we consider a simple FRET trace where two photons are detected at times 0.05 and 0.15 ms in the donor and acceptor channels, respectively. To demonstrate the ideas developed so far, we calculate the likelihood of these observations as (see Eq. 21)

L=ϵ2ρstartΠ1nonG1radΠ2nonG2radρnormT.

To do so, we first need to calculate Π1non using the nonradiative detection (Dnon) and generator (G) matrices found in the previous example boxes

Π1non=exp(0.05(GDnon))=[0.55000.06000000000000000.03000.5800000000000000],

and similarly

Π2non=exp((0.150.05)(GDnon))=[0.30000.06000000000000000.03000.3300000000000000].

Next, we proceed to calculate G1rad and G2rad. Remembering that the first photon was detected in the donor channel, we have (in ms1 units)

G1rad=GDdrad=[0000002770000000000000000000000027700000000000].

Similarly, since the second photon was detected in the acceptor channel, we can write (in ms1 units)

G2rad=GDarad=[0000000000002850000000000000000000000028500000].

We also assume that the system is initially in the superstate φ1 giving ρstart=[1,0,0,0,0,0]. Finally, putting everything together, we can find the likelihood as L=3.06ϵ2 where ϵ is a constant and does not contribute to parameter estimations, as we show later.

Effect of binning single-photon smFRET data

When considering binned FRET data, the time period of an experiment (TendTstart) is typically divided into a finite number (=N) of equally sized (=ϵ) time windows (bins), and the photon counts (intensities) in each bin are recorded in the detection channels. This is in contrast to single-photon analysis where individual photon arrival times are recorded. To arrive at the likelihood for such binned data, we start with the single-photon likelihood derived in Eq. 15 where ϵ is not infinitesimally small, that is,

L=ρstartΠ1(r)Π2(r)Π3(r)ΠN1(r)ΠN(r)ρnormT, (22)

where

(Πn(r))anan+1=(Π)anan+1×(Dn)anan+1,

or in the matrix notation

Πn(r)=ΠDn=exp(ϵG)Dn, (23)

where Dn is the detection matrix introduced in the “reduced propagators” section.

Next, we must sum over all superstate trajectories that may give rise to the recorded photon counts (observations) in each bin. However, such a sum is challenging to compute analytically and has been attempted in (38). Here, we only show likelihood computation under commonly applied approximations/assumptions when analyzing binned smFRET data, which are: 1) bin size ϵ is much smaller than typical times spent in a system state or, in other words, for a system transition rate λσiσj, we have ϵλσiσj1; and 2) excitation rate λex is much slower than dye relaxation and FRET rates, or in other words, interphoton periods are much larger than the excited state lifetimes.

The first assumption is based on realistic situations where system kinetics (at seconds timescale) are many orders of magnitude slower than the photophysical transitions (at nanoseconds timescale). This timescale separation allows us to simplify the propagator calculation in Eq. 23. To see that, we first separate the system transition rates from photophysical transition rates in the generator matrix as

G=GσI+Gψ, (24)

where denotes a tensor product, Gσ is the portion of generator matrix G containing only system transition rates previously defined in Eq. 3, and Gψ is the portion containing only photophysical transition rates, that is,

GσI=[j1λσ1σjIλσ1σ2Iλσ1σMσIλσ2σ1Ij2λσ2σjIλσ2σMσIλσMσσ1IλσMσσ2IjMσλσMσσjI], (25)

and

Gψ=[Gσ1ψ000Gσ2ψ000GσMσψ], (26)

where Gσiψ is the photophysical generator matrix corresponding to system state σi given in Eq. 4.

Now plugging Eq. 24 into the full propagator Π=exp(ϵG) and applying the famous Zassenhaus formula for matrix exponentials, we get

Π=exp(ϵ(GσI+Gψ))=exp(ϵ(GσI))exp(ϵGψ)exp(ϵ22[GσI,Gψ])exp(O(ε3)) (27)

where the square brackets represent the commutator of the constituting matrices and the last term represents the remaining exponentials involving higher-order commutators. Furthermore, the commutator [GσI,Gψ] results in a very sparse matrix given by

[GσI,Gψ]=[0λσ1σ2(Gσ2ψGσ1ψ)λσ1σMσ(GσMσψGσ1ψ)λσ2σ1(Gσ1ψGσ2ψ)0λσ2σMσ(GσMσψGσ2ψ)λσMσσ1(Gσ1ψGσMσψ)λσMσσ2(Gσ2ψGσMσψ)0], (28)

where

λσiσj(GσjψGσiψ)=λσiσj[0000(λσjFRETλσiFRET)(λσjFRETλσiFRET)000].

Now, the propagator calculation in Eq. 27 simplifies if the commutator ϵ2[GσI,Gψ]0, implying that either the bin size ϵ is very small such that ϵλσiσj1 (our first assumption) or that FRET rates/efficiencies are almost indistinguishable (ϵ(λσjFRETλσiFRET)0). Under such conditions, the system state can be assumed to stay constant during a bin, with system transitions only occurring at the ends of bin periods. Furthermore, the full propagator Π in Eq. 27 can now be approximated as

Π=exp(ϵG)exp(ϵ(GσI))exp(ϵGψ)=(ΠσI)Πψ, (29)

where the last equality follows from the block diagonal form of GσI given in Eq. 25 and Πσ=exp(ϵGσ) is the system transition probability matrix (propagator) given as

Πσ=[πσ1σ1πσ1σ2πσ1σMσπσ2σ1πσ2σ2πσ2σMσπσMσσ1πσMσσ2πσMσσMσ]. (30)

Moreover, Πψ=exp(ϵGψ) is the photophysical transition probability matrix (propagator) as

Πψ=[Πσ1ψ000Πσ2ψ000ΠσMσψ], (31)

where the elements are given as Πσiψ=exp(ϵGσiψ). Furthermore, because of the block diagonal structure of Πψ, the matrix multiplication in Eq. 29 results in

Π=[πσ1σ1Πσ1ψπσ1σ2Πσ1ψπσ1σMσΠσ1ψπσ2σ1Πσ2ψπσ2σ2Πσ2ψπσ2σMσΠσ2ψπσMσσ1ΠσMσψπσMσσ2ΠσMσψπσMσσMσΠσMσψ]. (32)

After deriving the full propagator Π for the time period ϵ (bin) under our first assumption, we now proceed to incorporate observations during this period via detection matrices Dn to compute the reduced propagator of Eq. 23. To do so, we now apply our second assumption of relatively slower excitation rate λex. This assumption implies that interphoton periods are dominated by the time spent in the ground state of the FRET pair and are distributed according to a single exponential distribution, Exponential(λex). Consequently, the total photon counts per bin follow a Poisson distribution, Poisson(ϵλex), independent of the photophysical portion of the photophysical trajectory taken from superstate an to an+1.

Now, the first and the second assumptions imply that the observation during the n-th bin only depends on the system state sn (or the associated FRET rate λsnFRET). As such we can approximate the detection matrix elements as

(Dn)anan+1=p(wn|an,an+1)p(wn|sn). (33)

Using these approximations, the reduced propagator in Eq. 23 can now be written as

Πn(r)=ΠDn[πσ1σ1p(wn|sn=σ1)Πσ1ψπσ1σMσp(wn|sn=σ1)Πσ1ψπσ2σ1p(wn|sn=σ2)Πσ2ψπσ2σMσp(wn|sn=σ2)Πσ2ψπσMσσ1p(wn|sn=σMσ)ΠσMσψπσMσσMσp(wn|sn=σMσ)ΠσMσψ]. (34)

Next, to compute the likelihood for the n-th bin, we need to sum over all possible superstate trajectories within this bin as

Ln=ρnΠn(r)ρnormTi=1Mσj=1Mσp(wn|sn=σi)πσiσj(ρσi,nΠσiψρnormT), (35)

where ρn is a normalized row vector populated by probabilities of finding the system-FRET composite in the possible superstates at the beginning of the n-th bin. Furthermore, we have written portions of ρn corresponding to system state σi as ρσi,n. To be more explicit, we have

ρn=[ρσ1,nρσ2,nρσMσ,n],

following the convention in the “likelihood” section and using n to now represent time tn.

Moreover, since each row of Πσiψ=exp(ϵGσiψ) sums to one, we have ΠσiψρnormT=ρnormT, which simplifies the bin likelihood of Eq. 35 to

Lni=1Mσj=1Mσp(wn|sn=σi)πσiσj(ρσi,nρnormT)=i=1Mσj=1Mσp(wn|sn=σi)πσiσjρσi,n, (36)

where we have defined ρσi,nρσi,nρnormT=jρσi,ψj as the probability of the system to occupy system state σi. We can also write the previous equation in the matrix form as

Lnρnσ(ΠσDnσ)ρnormT, (37)

where ρnσ is a row vector of length Mσ (number of system states) populated by ρσi,n for each system state, and Dnσ, in the same spirit as Dn, is a detection matrix of dimensions Mσ×Mσ populated by observation probability p(wn|sn=σi) in each row corresponding to system state σi. Furthermore, defining Πnσ(ΠσDnσ), we note here that Πnσ propagates probabilities during the n-th bin in a similar manner as the reduced propagators Πn(r) of Eq. 23.

Therefore, we can now multiply these new propagators for each bin to approximate the likelihood of Eq. 22 as

LρstartσΠ1σΠ2σΠ3σΠNσρnormT, (38)

where ρstartσ is a row vector, similar to ρnσ, populated by probabilities of being in a given system state at the beginning of an experiment.

To conclude, our two assumptions regarding system kinetics and excitation rate allow us to significantly reduce the dimensions of the propagators. This, in turn, leads to much lowered expense for likelihood computation. However, cheaper computation comes at the expense of requiring a large number of photon detections or excitation rates per bin to accurately determine FRET efficiencies (identify system states) since we marginalize over photophysics in each bin. Such high excitation rates lead to faster photobleaching and increased phototoxicity, and thereby much shorter experiment durations. As we will see in the “pulsed illumination” section, this problem can be mitigated by using pulsed illumination, where the likelihood takes a similar form as Eq. 38, but FRET efficiencies can be accurately estimated from the measured microtimes.

Detection effects

In the previous section, we assumed idealized detectors to illustrate basic ideas on detection matrices. However, realistic FRET experiments must typically account for detector nonidealities. For instance, many emitted photons may simply go undetected when the detection efficiency of single-photon detectors, i.e., the probability of an incident photon being successfully registered, is less than one due to inherent nonlinearities associated with the electronics (22) or the use of filters in cases of polarized fluorescent emission (60,61). In addition, donor photons may be detected in the channel reserved for acceptor photons or vice-versa due to emission spectrum overlap (62). This phenomenon, commonly known as crosstalk, crossover, or bleedthrough, can significantly affect the determination of quantities, such as transition rates and FRET efficiencies, as we demonstrate later in our results. Other effects adding noise to fluorescence signals include dark current (false signal in the absence of incident photons), dead time (the time a detector takes to relax back into its active mode after a photon detection), and timing jitter or IRF (stochastic delay in the output signal after a detector receives a photon) (22). In this section, we describe the incorporation of all such effects into our model except dark current and background emissions, which require more careful treatment and will be discussed in the “background emissions” section.

Crosstalk and detection efficiency

Noise sources such as crosstalk and detection efficiency necessarily result in photon detection being treated as a stochastic process. Both crosstalk and detection efficiency can be included into the propagators in both cases by substituting the zeros and ones, appearing in the ideal radiative and nonradiative detection matrices (Eqs. 16 and 17), with probabilities between zero and one. In such a way, the resulting propagators obtained from these detection matrices, in turn, incorporate into the likelihood the effects of crosstalk and detection efficiency into the model.

Here, in the presence of crosstalk, for clarity, we add a superscript to the radiative detection matrix of Eq. 17 for the k-th photon, Dkradct. The elements of this detection matrix for the anbn transition, when a photon intended for channel j is registered in channel i reads

(Dkradct)anbn={0NonradiativetransitionsϕjiRadiativetransitions

where ϕji is the probability for this event (upon transition from superstate an to bn). Further, detector efficiencies can also be accounted for in these probabilities to represent the combined effects of crosstalk, arising from spectral overlap, and absence of detection channel registration. When we do so, we recover iϕji1 (for cases where i and j can be both the same or different), as not all emitted photons can be accounted for by the detection channels.

This new detection matrix above results in the following modification to the radiative propagator of Eq. 19 for the k-th photon

Πk(r)radct=(I+ϵG+O(ϵ2))Dkradct=ϵGkradct+O(ϵ2).

The second equality above follows by recognizing that the identity matrix multiplied, element-wise, by Dkradct is zero. By definition, Gkradct is the remaining nonzero product.

On the other hand, for time periods when no photons are detected, the nonradiative detection matrices in Eq. 16 become

(Dn)anbn=(Dnonct)anbn={1Nonradiativetransitions1jϕijRadiativetransitions,

where the sum gives the probability of the photon intended for channel i to be registered in any channel. The nonradiative propagator of Eq. 18 for an infinitesimal period of size ϵ in the presence of crosstalk and inefficient detectors is now

Π(r)nonct=(I+ϵG+O(ϵ2))Dnonct=exp(ϵGnonct)+O(ϵ2), (39)

where Gnonct=GDnonct. With the propagators incorporating crosstalk and detection efficiency now defined, the evolution during an interphoton period between the (k1)-th photon and the k-th photon of size (TkTk1) is now governed by the product

ΠknonctΠk(r)radct=ϵΠknonctGkradct+O(ϵ2), (40)

where the nonradiative propagators in Eq. 39 have now been merged into a single propagator Πknonct=exp((TkTk1)Gnonct) following the same procedure as Eq. 20.

Finally, inserting Eq. 40 for each interphoton period into the likelihood of Eq. 14, we arrive at the final likelihood incorporating crosstalk and detection efficiency as

L=εKρstartΠ1nonctG1radctΠ2nonctG2radctΠK1nonctGK1radct×ΠKnonctGKradctΠendnonctρnormT+O(εK+1).

After incorporating crosstalk and detector efficiencies into our model, we briefly explain the calibration of the crosstalk probabilities/detection efficiencies ϕij. To calibrate these parameters, two samples, one containing only donor dyes and another containing only acceptor dyes, are individually excited with a laser under the same power to determine the number of donor photons ndiraw and number of acceptor photons nairaw detected in channel i.

From photon counts recorded for the donor-only sample, assuming ideal detectors with 100% efficiency, we can compute the crosstalk probabilities for donor photons going to channel i, ϕdi, using the photon count ratios as ϕdi=ndiraw/ndem, where ndem is the absolute number of emitted donor photons. Similarly, crosstalk probabilities for acceptor photons going to channel i, ϕai, can be estimated as ϕai=nairaw/naem, where naem is the absolute number of emitted acceptor photons. In the matrix form, these crosstalk factors for a two-detector setup can be written as

A=[ϕa1ϕd1ϕa2ϕd2]. (41)

Using this matrix, for the donor-only sample, we can now write

[nd1rawnd2raw]=A[0ndem]=[ϕd1ϕd2]ndem, (42)

and similarly for the acceptor only sample

[na1rawna2raw]=A[naem0]=[ϕa1ϕa2]naem. (43)

However, it is difficult to estimate the absolute number of emitted photons ndem and naem experimentally, and therefore the crosstalk factors in A can only be determined up to multiplicative factors of ndem and naem.

Since scaling the photon counts in an smFRET trace by an overall constant does not affect the FRET efficiency estimates (determined by photon count ratios) and escape rates (determined by changes in FRET efficiency), we only require crosstalk factors up to a constant as in the last equation.

For this reason, one possible solution toward determining the matrix elements of A up to one multiplicative constant is to first tune dye concentrations such that the ratio ndem/naem=1, which can be accomplished experimentally. This allows us to write the crosstalk factors in the matrix form up to a constant as follows

A = \begin{bmatrix} \phi_{a1} & \phi_{d1} \\ \phi_{a2} & \phi_{d2} \end{bmatrix} \propto \begin{bmatrix} n_{a1}^{\text{raw}} & n_{d1}^{\text{raw}} \\ n_{a2}^{\text{raw}} & n_{d2}^{\text{raw}} \end{bmatrix}. \qquad (44)

It is common to set the multiplicative factor in Eq. 44 by the total donor photon counts \sum_j n_{dj}^{\text{raw}} to give

A = \begin{bmatrix} \phi_{a1} & \phi_{d1} \\ \phi_{a2} & \phi_{d2} \end{bmatrix} = \begin{bmatrix} \dfrac{n_{a1}^{\text{raw}}}{\sum_j n_{dj}^{\text{raw}}} & \dfrac{n_{d1}^{\text{raw}}}{\sum_j n_{dj}^{\text{raw}}} \\[2ex] \dfrac{n_{a2}^{\text{raw}}}{\sum_j n_{dj}^{\text{raw}}} & \dfrac{n_{d2}^{\text{raw}}}{\sum_j n_{dj}^{\text{raw}}} \end{bmatrix}. \qquad (45)

We note that from the convention adopted here, we have ϕd1+ϕd2=1.

Furthermore, in situations where realistic detectors affect the raw counts, the matrix elements of A as computed above automatically incorporate the effects of detector inefficiencies, including the fact that the detected fractions \sum_j \phi_{ij} need not sum to one.

In addition, the matrix A can be further generalized to account for more than two detectors by appropriately expanding the size of the matrix dimensions to coincide with the number of detectors. Calibration of the matrix elements then follows the same procedure as above.
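As a minimal illustration of this calibration step, the sketch below (in Julia) assembles the matrix of Eq. 45 from raw channel counts. The photon counts are hypothetical (chosen to roughly reproduce the ϕij values of Example V below), and the variable names are ours rather than part of the BNP-FRET package.

# hypothetical raw counts in channels 1 and 2 from the two calibration samples,
# acquired at matched dye concentrations so that n_d^em = n_a^em
na_raw = [8400.0, 0.0]      # acceptor-only sample
nd_raw = [1800.0, 8200.0]   # donor-only sample

# crosstalk/detection-efficiency matrix, normalized by the total donor counts
# following the convention of Eq. 45
A = hcat(na_raw, nd_raw) ./ sum(nd_raw)

# with this convention, the donor column of A sums to one
@assert isapprox(sum(A[:, 2]), 1.0)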

Now, in performing single-photon FRET analysis, we will directly use the elements of A in constructing our measurement matrix. However, it is also common to compute the matrix elements of A from what is termed the route correction matrix (RCM) (63), typically used in binned photon analysis. The RCM is defined as the inverse of A to obtain corrected counts n_d^{em} and n_a^{em} up to a proportionality constant as

\text{RCM} \propto \begin{bmatrix} \phi_{d2} & -\phi_{d1} \\ -\phi_{a2} & \phi_{a1} \end{bmatrix}. \qquad (46)
Example V: Detection matrices with crosstalk and detector efficiencies.

For our example with two system states, we had earlier shown detection matrices for ideal detectors. Here, we incorporate crosstalk and detector efficiencies into these matrices. Moreover, we assume a realistic RCM (64) given as

\text{RCM} \approx \begin{bmatrix} 1.0 & -0.22 \\ 0.0 & 1.02 \end{bmatrix}.

However, following the convention of Eq. 45, we scale the matrix provided by a sum of absolute values of its first row elements, namely 1.22, leading to effective crosstalk factors ϕij given as

\phi_{a1} = 0.84, \quad \phi_{a2} = 0.0, \quad \phi_{d1} = 0.18, \quad \text{and} \quad \phi_{d2} = 0.82.

As such, these values imply approximately 18% crosstalk from the donor to the acceptor channel and an 84% detection efficiency for the acceptor channel with no acceptor-to-donor crosstalk, using the convention adopted in Eq. 45. Now, we modify the ideal radiative detection matrices by replacing their nonzero elements with the calibrated ϕij values above

D_{d/a}^{\text{rad-ct}} = \begin{bmatrix} 0 & 0 & 0 & 0 & 0 & 0 \\ \phi_{d2}/\phi_{d1} & 0 & 0 & 0 & 0 & 0 \\ \phi_{a2}/\phi_{a1} & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & \phi_{d2}/\phi_{d1} & 0 & 0 \\ 0 & 0 & 0 & \phi_{a2}/\phi_{a1} & 0 & 0 \end{bmatrix} = \begin{bmatrix} 0 & 0 & 0 & 0 & 0 & 0 \\ 0.82/0.18 & 0 & 0 & 0 & 0 & 0 \\ 0.0/0.84 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0.82/0.18 & 0 & 0 \\ 0 & 0 & 0 & 0.0/0.84 & 0 & 0 \end{bmatrix}.

Similarly, we modify the ideal nonradiative detection matrix by replacing the zero elements by 1 - \phi_{a1} - \phi_{a2} = 0.16 and 1 - \phi_{d1} - \phi_{d2} = 0 as follows

D^{\text{non-ct}} = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 1 \\ 0 & 1 & 1 & 1 & 1 & 1 \\ 0.16 & 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 0 & 1 & 1 \\ 1 & 1 & 1 & 0.16 & 1 & 1 \end{bmatrix}.
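The detection matrices of this example can also be assembled programmatically once the ϕij values are known. The following Julia sketch assumes the superstate ordering used throughout this example, namely (σ1,ψ1), (σ1,ψ2), (σ1,ψ3), (σ2,ψ1), (σ2,ψ2), (σ2,ψ3); the helper name build_rad_ct is ours.

ϕd1, ϕd2 = 0.18, 0.82          # donor photons detected in channels 1 and 2
ϕa1, ϕa2 = 0.84, 0.0           # acceptor photons detected in channels 1 and 2

donor_rad    = [(2, 1), (5, 4)]    # excited donor -> ground transitions
acceptor_rad = [(3, 1), (6, 4)]    # excited acceptor -> ground transitions

# radiative detection matrix for one channel, given that channel's ϕ values
function build_rad_ct(ϕd, ϕa)
    D = zeros(6, 6)
    for (i, j) in donor_rad;    D[i, j] = ϕd; end
    for (i, j) in acceptor_rad; D[i, j] = ϕa; end
    return D
end

Drad_donor_channel    = build_rad_ct(ϕd2, ϕa2)   # channel 2 in this example
Drad_acceptor_channel = build_rad_ct(ϕd1, ϕa1)   # channel 1 in this example

# nonradiative detection matrix: ones everywhere except radiative transitions,
# which carry the probability that the emitted photon escapes detection
Dnon = ones(6, 6)
for (i, j) in donor_rad;    Dnon[i, j] = 1 - ϕd1 - ϕd2; end
for (i, j) in acceptor_rad; Dnon[i, j] = 1 - ϕa1 - ϕa2; end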

Effects of detector dead time

Typically, a detection channel i becomes inactive (dead) after the detection of a photon for a period δi as specified by the manufacturer. Consequently, radiative transitions associated with that channel cannot be monitored during that period.

To incorporate this detector dead period into our likelihood model, we break an interphoton period between the (k-1)-th and k-th photon into two intervals: the first interval with an inactive detector and the second one when the detector is active. Assuming that the (k-1)-th photon is detected in channel i_{k-1}, the first interval is thus \delta_{i_{k-1}} long. As such, we can define the detection matrix for such an interval (written below for a photon detected in channel i_k) as

(D_{i_k}^{\text{dead}})_{a_n b_n} = \begin{cases} 1 & \text{all transitions not intended for channel } i_k \\ 0 & \text{all transitions intended for channel } i_k. \end{cases}

Next, corresponding to this detection matrix, we have the propagator

\Pi_{k,i_k}^{\text{dead}} = \exp\!\left(\delta_{i_k}\left(G \odot D_{i_k}^{\text{dead}}\right)\right) = \exp\!\left(\delta_{i_k}\, G_{i_k}^{\text{dead}}\right),

that evolves the superstate during the detector dead time. This propagator can now be used to incorporate the detector dead time into Eq. 20 to represent the evolution during the period between the (k-1)-th and k-th photons as

\Pi_{k-1,\,i_{k-1}}^{\text{dead}}\, \Pi_k^{\text{non}}\, \Pi_k^{(r)\,\text{rad}} = \epsilon\, \Pi_{k-1,\,i_{k-1}}^{\text{dead}}\, \Pi_k^{\text{non}}\, G_k^{\text{rad}} + \mathcal{O}(\epsilon^2), \qquad (47)

where ΠknonΠk(r)rad describes the evolution when the detector is active.

Finally, inserting Eq. 47 for each interphoton period into the likelihood for the HMM with a second-order structure in Eq. 14, we arrive at the following likelihood that includes detector dead time

L \propto \rho_{\text{start}}\, \Pi_1^{\text{non}} G_1^{\text{rad}}\, \Pi_{1,i_1}^{\text{dead}}\, \Pi_2^{\text{non}} G_2^{\text{rad}}\, \Pi_{2,i_2}^{\text{dead}} \cdots \Pi_K^{\text{non}} G_K^{\text{rad}}\, \Pi_{K,i_K}^{\text{dead}}\, \Pi_{\text{end}}^{\text{non}}\, \rho_{\text{norm}}^T. \qquad (48)

To provide an explicit example on the effect of the detector dead time on the likelihood, we take a detour for pedagogical reasons. In this context, we consider a very simple case of one detection channel (dead time δ) observing a fluorophore with two photophysical states, ground (ψ1) and excited (ψ2), illuminated by a laser. The data in this case contains only photon arrival times

{T1,T2,T3,,TK}.

The generator matrix containing the photophysical transition rates for this setup is

G = \begin{bmatrix} * & \lambda_{\psi_1\psi_2} \\ \lambda_{\psi_2\psi_1} & * \end{bmatrix} = \begin{bmatrix} * & \lambda_{\text{ex}} \\ \lambda_d & * \end{bmatrix},

where * along the diagonal represents the negative row-sum of the remaining elements, λex is the excitation rate, and λd is the donor relaxation rate.

Here, all transitions are possible during detector dead times as there are no observations. As such, the dead time propagators in the likelihood (Eq. 48) are simply expressed as exponentials of the full generator matrix, that is, Πkikdead=exp(δG), leaving the normalization of the propagated probability vector ρ unchanged, e.g., just as we had seen in Eq. 8.

As we will see, these dead times, similar to detector inefficiencies, simply increase our uncertainty over parameters we wish to learn, such as kinetics, by virtue of providing less information. By contrast, background emissions and crosstalk provide false information. However, the net effect is the same: all noise sources increase uncertainty.

Adding the detection IRF

Due to various sources of noise impacting the detection timing electronics (also known as jitter), the time elapsed between photon arrival and detection is itself a hidden (latent) random variable (22). Under continuous illumination, we say that this stochastic delay is sampled from a probability density, f(τ), termed the detection IRF. To incorporate the detection IRF into the likelihood of Eq. 48, we convolve the propagators with f(τ) as follows

L \propto \rho_{\text{start}} \left( \int_0^{\tau_{\text{IRF}}} d\tau_1\, \Pi_1^{\text{non}}(\Delta T_1 - \tau_1)\, G_1^{\text{rad}}\, \Pi_{1,i_1}^{\text{dead}}(\tau_1)\, f(\tau_1) \right) \times \left( \int_0^{\tau_{\text{IRF}}} d\tau_2\, \Pi_2^{\text{non}}(\Delta T_2 - \tau_2)\, G_2^{\text{rad}}\, \Pi_{2,i_2}^{\text{dead}}(\tau_2)\, f(\tau_2) \right) \times \cdots \times \left( \int_0^{\tau_{\text{IRF}}} d\tau_K\, \Pi_K^{\text{non}}(\Delta T_K - \tau_K)\, G_K^{\text{rad}}\, \Pi_{K,i_K}^{\text{dead}}(\tau_K)\, f(\tau_K) \right) \Pi_{\text{end}}^{\text{non}}\, \rho_{\text{norm}}^T, \qquad (50)

where we have used the dead time propagators \Pi_{k,i_k}^{\text{dead}} to incorporate detector inactivity during the period between photon reception and detector reactivation. Moreover, we have \Pi_k^{\text{non}}(\Delta T_k - \tau_k) = \exp\!\left((\Delta T_k - \tau_k)\, G^{\text{non}}\right), as described in Eq. 18.

To facilitate the computation of this likelihood, we use the fact that typical acquisition devices record at discrete (but very small) time intervals. For instance, a setup with a smallest acquisition time of 16 ps and a detection IRF distribution approximately 100 ps wide will have the detection IRF spread over roughly six acquisition periods. This allows each convolution integral to be discretized over these acquisition intervals and computed in parallel, so that the convolution adds essentially no wall-clock time beyond the overhead associated with parallelization.
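To make the discretization concrete, the sketch below (Julia) evaluates one such convolution term as a finite sum over the acquisition grid. Here Gnon, Grad, and Gdead stand for the nonradiative generator, the radiative generator for the detected photon, and the dead-time generator, respectively, all assumed precomputed; f is the IRF weight on each acquisition bin (normalized to sum to one, e.g., six 16-ps bins for a 100-ps-wide IRF). This is only an illustration of the quadrature, not the parallelized implementation used in the BNP-FRET software.

using LinearAlgebra

# one discretized convolution term from Eq. 50 for an interphoton period ΔT
function convolved_term(Gnon, Grad, Gdead, ΔT, Δt, f)
    term = zeros(size(Gnon))
    for (j, w) in enumerate(f)
        τ = (j - 1) * Δt                               # delay on the acquisition grid
        term += exp((ΔT - τ) * Gnon) * Grad * exp(τ * Gdead) * w
    end
    return term
end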

Illumination features

After discussing detector effects, we continue here by further considering different illumination features. For simplicity alone, our likelihood computation until now assumed continuous illumination with a uniform intensity. More precisely, the element λex of the generator matrix in Eq. 4 was assumed to be time independent. Here, we generalize our formulation and show how other illumination setups (such as pulsed illumination and alternating laser excitation, ALEX (65)) can be incorporated into the likelihood by simply assigning a time dependence to the excitation rate λex(t).

Pulsed illumination

Here, we consider an smFRET experiment where the FRET pair is illuminated using a laser for a very short period of time (a pulse), δpulse, at regular intervals of size τ; see Fig. 3 a. Now, as in the case of continuous illumination with constant intensity, the likelihood for a set of observations acquired using pulsed illumination takes a similar form to Eq. 21 involving products of matrices as follows

L \propto \rho_{\text{start}}\, Q_1 Q_2 Q_3 \cdots Q_{N-1} Q_N\, \rho_{\text{norm}}^T, \qquad (51)

where Q_n, with n = 1, \ldots, N, denotes the propagator evolving the superstate during the n-th interpulse period between the (n-1)-th and the n-th pulse.

Figure 3.


Events over a pulsed illumination experiment pulse window. Here, the beginning of the n-th interpulse window of size τ is marked by time tn. The FRET labels, originally in the state GG (donor and acceptor both in their ground states), are excited by a high-intensity burst (shown in green) to the state EG (only donor excited) for a very short time δpulse. If FRET occurs, the donor transfers its energy to the acceptor and returns to the ground state, leaving the FRET labels in the GE state (only acceptor excited). The acceptor then emits a photon that is registered by the detector at microtime μn. When using ideal detectors, the microtime is the same as the photon emission time, as shown in (a). However, when the timing hardware has jitter (shown in red), a small delay ϵn is added to the microtime, as shown in (b).

To derive the structure of Qn during the n-th interpulse period, we break it into two portions: 1) pulse with nonzero laser intensity where the evolution of the FRET pair is described by the propagator Πnpulse introduced shortly; 2) dark period with zero illumination intensity where the evolution of the FRET pair is described by the propagator Πndark introduced shortly. Furthermore, depending on whether a photon is detected or not over the n-th interpulse period the propagators Πnpulse and Πndark assume different forms.

First, when no photons are detected, we have

\Pi_n^{\text{pulse}} = \exp\!\left( \int_0^{\delta_{\text{pulse}}} d\delta\; G^{\text{non}}(\delta) \right), \quad \text{and} \qquad (52)
\Pi_n^{\text{dark}} = \exp\!\left( (\tau - \delta_{\text{pulse}})\, G^{\text{dark}} \right), \qquad (53)

where the integration over the pulse period now involves a time-dependent Gnon due to temporal variations in λex(t). The integral in Eq. 52 is sometimes termed the excitation IRF, although we will not use this convention here. For this reason, when we say IRF, we imply detection IRF alone. In addition, Gdark is the same as Gnon except for the excitation rate that is now set to zero due to lack of illumination. Finally, the propagator for an interpulse period with no photon detection can now be written as

Q_n = \Pi_n^{\text{pulse}}\, \Pi_n^{\text{dark}} = \exp\!\left( \int_0^{\delta_{\text{pulse}}} d\delta\; G^{\text{non}}(\delta) \right) \exp\!\left( (\tau - \delta_{\text{pulse}})\, G^{\text{dark}} \right). \qquad (54)

On the other hand, if a photon is detected sometime after a pulse (as in Fig. 3 a), the pulse propagator remains as in Eq. 52. However, the propagator Πndark must now be modified to include a radiative generator matrix Gnrad similar to Eq. 20

\Pi_n^{\text{dark}} = \exp\!\left( (\mu_n - \delta_{\text{pulse}})\, G^{\text{dark}} \right) G_n^{\text{rad}} \exp\!\left( (\tau - \mu_n)\, G^{\text{dark}} \right), \qquad (55)

where μn is the photon arrival time measured with respect to the n-th pulse (also termed microtime) as shown in Fig. 3 a. Here, the two exponential terms describe the evolution of the superstate before and after the photon detection during the dark period.

Moreover, we can construct the propagator for situations where a photon is detected during a pulse itself in a similar fashion. Here, the propagator Πndark remains the same as in Eq. 53 but Πnpulse must now be modified to include the radiative generator matrix Gnrad as

\Pi_n^{\text{pulse}} = \exp\!\left( \int_0^{\mu_n} d\delta\; G^{\text{non}}(\delta) \right) G_n^{\text{rad}} \exp\!\left( \int_{\mu_n}^{\delta_{\text{pulse}}} d\delta\; G^{\text{non}}(\delta) \right). \qquad (56)

The propagators derived so far in this section assumed ideal detectors. We now describe a procedure to incorporate the IRF into this formulation. This is especially significant for accurate estimation of fluorophore lifetimes, which is commonly done in pulsed illumination smFRET experiments. To incorporate the IRF, we follow the same procedure as in the “adding the detection IRF” section and introduce a convolution between the IRF function f(ϵ) and the propagators above involving photon detections. That is, when there is a photon detected during the dark period, we modify the propagator Πndark as

\Pi_n^{\text{dark}} = \int_0^{\delta_{\text{IRF}}} d\epsilon_n\, \exp\!\left( (\mu_n - \delta_{\text{pulse}} - \epsilon_n)\, G^{\text{dark}} \right) G_n^{\text{rad}} \exp\!\left( (\tau - \mu_n + \epsilon_n)\, G^{\text{dark}} \right) f(\epsilon_n), \qquad (57)

while the Πnpulse stays the same as in Eq. 52. Here, ϵn is the stochastic delay in photon detection resulting from the IRF as shown in Fig. 3 b.

Moreover, when there is a photon detected during a pulse, the propagator Πnpulse of Eq. 56 can be modified in a similar fashion to accommodate the IRF, while the propagator Πndark remains the same as in Eq. 53.

The propagators Qn presented in this section involve integrals over large generator matrices that are analytically intractable and computationally expensive when considering large pulse numbers. Therefore, we follow a strategy similar to the one used in the “effect of binning single-photon smFRET data” section for binned likelihoods to approximate these propagators.

To reduce the complexity of the calculations, we start by making realistic approximations. Given the timescale separation between the interpulse period (typically tens of nanoseconds) and the system kinetics (typically on the timescale of seconds) in a pulsed illumination experiment, it is possible to approximate the system state trajectory as being constant during an interpulse period. In essence, rather than treating the system state trajectory as a continuous time process, we discretize the trajectory such that system transitions only occur at the beginning of each interpulse period. This allows us to separate the photophysical part of the generator matrix Gψ in Eq. 4 from the portion describing the evolution of the system under study Gσ given in Eq. 3. Here, by contrast to the likelihood shown in the “pulsed illumination” section, we can now independently compute the photophysical and system portions of the likelihood, as described below.

To derive the likelihood, we begin by writing the system state propagator during an interpulse period as

Πσ=exp(τGσ). (58)

Furthermore, we must incorporate observations into these propagators by multiplying each system transition probability in Πσ, πσiσj, with the observation probability if that transition had occurred. We organize these observation probabilities using our newly defined detection matrices Dnσ similar to the “continuum limit” section and write the modified propagators as

\Pi_n^\sigma = \Pi^\sigma \odot D_n^\sigma, \qquad (59)

where ⊙ again represents the element-by-element product. Here, the elements of D_n^σ depend on the photophysical portion of the generator matrix G^ψ and their detailed derivations are shown in the third companion article (54). We note here that the propagator matrix dimensions are now M_σ × M_σ, making them computationally less expensive than in the continuous illumination case. Finally, the likelihood for the pulsed illuminated smFRET data with these new propagators reads

L = p(w \,|\, \rho_{\text{start}}, \Pi^\sigma, G^\psi) \propto \rho_{\text{start}}\, \Pi_1^\sigma \Pi_2^\sigma \cdots \Pi_N^\sigma\, \rho_{\text{norm}}^T, \qquad (60)

which, similar to the case of binned likelihood under continuous illumination (see the “effect of binning single-photon smFRET data” section), sums over all possible system state trajectories.

We will later use this likelihood to put forward an inverse model to learn transition probabilities (elements of Πσ) and photophysical transition rates appearing in Gψ.
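For concreteness, a bare-bones version of this likelihood can be written as below (Julia). The sketch assumes the system generator Gσ, the interpulse period τ, and the per-pulse detection matrices Dnσ (derived in the third companion article) are already available; for long traces one would use the logarithmic recursion of the “likelihood computation in practice” section instead of the raw product shown here.

using LinearAlgebra

# likelihood of Eq. 60: product of element-wise masked system propagators
function pulsed_likelihood(ρstart, Gσ, τ, D)
    Πσ = exp(τ * Gσ)                 # system propagator over one interpulse period, Eq. 58
    ρ = ρstart                       # 1 × Mσ row vector of system state probabilities
    for Dn in D                      # D is a vector of per-pulse detection matrices
        ρ = ρ * (Πσ .* Dn)           # element-by-element product, Eq. 59
    end
    return sum(ρ)                    # multiplication by ρ_norm (a vector of ones)
end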

Background emissions

Here, we consider background photons registered by detectors from sources other than the labeled system under study (2). The majority of background photons comprise ambient photons, photons from the illumination laser entering the detectors, and dark current (false photons registered by detectors) (22).

Due to the uniform laser intensity in the continuous illumination case considered in this section, we may model all background photons using a single distribution from which waiting times are drawn. Often, such distributions are assumed (or verified) to be exponential with fixed rates for each detection channel (66,67). Here, we model the waiting time distribution for background photons arising from all of these origins as a single exponential, as is most common. However, in the pulsed illumination case, the laser source and the two other sources of background require different treatments due to the nonuniform laser intensity. That is, the ambient photons and dark current are still modeled by an Exponential distribution, although it is often further approximated as a Uniform distribution given that the interpulse period is much shorter than the average background waiting time. The full formulation describing all background sources under pulsed illumination is provided in the third companion article (54).

We now proceed to incorporate background into the likelihood under continuous illumination. We do so, as mentioned earlier, by assuming an Exponential distribution for the background, which effectively introduces new photophysical transitions into the model. As such, these transitions may be incorporated by expanding the full generator matrix G (described in the “likelihood” section) appearing in the likelihood, thereby leaving the structure of the likelihood itself intact, cf., Eq. 21.

To be clear, in constructing the new generator matrix, we treat background in each detection channel as if originating from fictitious independent emitters with constant emission rates (exponential waiting time). Furthermore, we assume that an emitter corresponding to channel i is a two-state system with photophysical states denoted by

{ψi,1bg,ψi,2bg}.

Here, each transition to the other state coincides with a photon emission with rate λibg. As such, the corresponding background generator matrix for channel i can now be written as

G_i^{\text{bg}} = \begin{bmatrix} * & \lambda_{\psi_{i,1}^{\text{bg}}\psi_{i,2}^{\text{bg}}} \\ \lambda_{\psi_{i,2}^{\text{bg}}\psi_{i,1}^{\text{bg}}} & * \end{bmatrix} = \begin{bmatrix} * & \lambda_i^{\text{bg}} \\ \lambda_i^{\text{bg}} & * \end{bmatrix}.

Since the background emitters for each channel are independent of each other, the expanded generator matrix G for the combined setup (system-FRET composite plus background) can now be computed. This can be achieved by combining the system-FRET composite state space and the background state spaces for all of the total C detection channels using Kronecker sums (68) as

G = G^{\text{nobg}} \oplus G_1^{\text{bg}} \oplus G_2^{\text{bg}} \oplus \cdots \oplus G_C^{\text{bg}},

where the symbol ⊕ denotes the matrix Kronecker sum, and G^{nobg} represents the previously shown generator matrices without any background transition rates.

The propagators needed to compute the likelihood can now be obtained by exponentiating the expanded generator matrix above as

\exp\!\left( (T_k - T_{k-1})\, G \right) = \exp\!\left( (T_k - T_{k-1})\, G^{\text{nobg}} \right) \otimes \exp\!\left( (T_k - T_{k-1})\, G_1^{\text{bg}} \right) \otimes \exp\!\left( (T_k - T_{k-1})\, G_2^{\text{bg}} \right) \otimes \cdots \otimes \exp\!\left( (T_k - T_{k-1})\, G_C^{\text{bg}} \right),

where the symbol ⊗ denotes the matrix Kronecker product (tensor product) (68).

Furthermore, the same detection matrices defined earlier to include only nonradiative transitions or only radiative transitions, and their generalization with crosstalk and detection efficiency, can be used to obtain nonradiative and radiative propagators, as shown in the “continuum limit” section.

Consequently, as mentioned earlier, and by contrast to incorporating the effects of dead time or IRF, the addition of background sources does not entail any changes in the basic structure (arrangement of propagators) of the likelihood appearing in Eq. 21.

Example VI: Background.

To provide a concrete example for background, we again return to our FRET pair with two system states. The background-free full generator matrix for this system-FRET composite was provided in the example box in the “likelihood” section as (in units of ms−1)

Gnobg=[1210.002.00.00.0277000347002700000.02.00.02850000.02850020.00.02.01.00.00.01110.000.01.00.0277000277700125000000.00.01.02850000.0285001].

Here, we expand the above generator matrix to incorporate background photons entering two channels (i=1,2) at rates of λ1bg=1 ms−1 and λ2bg=0.5 ms−1. We do so by performing a Kronecker sum of Gnobg with the following generator matrix for the background

G^{\text{bg}} = G_1^{\text{bg}} \oplus G_2^{\text{bg}} = \begin{bmatrix} -1 & 1 \\ 1 & -1 \end{bmatrix} \oplus \begin{bmatrix} -0.5 & 0.5 \\ 0.5 & -0.5 \end{bmatrix} = \begin{bmatrix} -1.5 & 0.5 & 1 & 0 \\ 0.5 & -1.5 & 0 & 1 \\ 1 & 0 & -1.5 & 0.5 \\ 0 & 1 & 0.5 & -1.5 \end{bmatrix},

resulting in

G = G^{\text{nobg}} \oplus G^{\text{bg}}.

Here, G is a 24 × 24 matrix and we do not include its explicit form.
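The Kronecker-sum construction of this example is easy to reproduce and check numerically. The Julia sketch below (our variable names) builds Gbg from the two background generators and verifies that the corresponding propagator factorizes into a Kronecker product, as stated above.

using LinearAlgebra

# Kronecker sum A ⊕ B = A ⊗ I + I ⊗ B
kronsum(A, B) = kron(A, Matrix{Float64}(I, size(B)...)) +
                kron(Matrix{Float64}(I, size(A)...), B)

# background generators for the two channels (units of 1/ms)
G1bg = [-1.0  1.0;  1.0 -1.0]
G2bg = [-0.5  0.5;  0.5 -0.5]

Gbg = kronsum(G1bg, G2bg)            # the 4 × 4 matrix shown above

# the propagator factorizes over Kronecker products
t = 0.3
@assert exp(t * Gbg) ≈ kron(exp(t * G1bg), exp(t * G2bg))

# the full 24 × 24 generator would follow from one more Kronecker sum:
# G = kronsum(Gnobg, Gbg)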

Fluorophore characteristics: Quantum yield, blinking, photobleaching, and direct acceptor excitation

As demonstrated for background in the previous section, to incorporate new photophysical transitions, such as fluorophore blinking and photobleaching, into the likelihood we must modify the full generator matrix G. This can again be accomplished by adding extra photophysical states, relaxing nonradiatively, to the fluorophore model. These photophysical states can have long or short lifetimes depending on the specific photophysical phenomenon at hand. For example, donor photobleaching can be included by introducing a third donor photophysical state into the matrix of Eq. 4 without any escape transitions as follows

G_{\sigma_i}^\psi = \begin{bmatrix} * & \lambda_{\psi_1\psi_2} & 0 & 0 \\ \lambda_{\psi_2\psi_1} & * & \lambda_{\sigma_i,\psi_2\psi_3} & \lambda_{\psi_2\psi_4} \\ \lambda_{\psi_3\psi_1} & 0 & * & 0 \\ 0 & 0 & 0 & * \end{bmatrix} = \begin{bmatrix} * & \lambda_{\text{ex}} & 0 & 0 \\ \lambda_d & * & \lambda_{\sigma_i}^{\text{FRET}} & \lambda_{\text{bleach}} \\ \lambda_a & 0 & * & 0 \\ 0 & 0 & 0 & * \end{bmatrix},

where ψ1 is the lowest energy combined photophysical state for the FRET labels, ψ2 represents the excited donor, ψ3 represents the excited acceptor, and ψ4 represents a photobleached donor. In addition, λd and λa denote donor and acceptor relaxation rates, respectively, λbleach represents permanent loss of emission from the donor (photobleaching), and λσiFRET represents the FRET transition rate when the system is in its i-th system state.

Fluorophore blinking can be implemented similarly, except with a nonzero escape rate out of the new photophysical state, allowing the fluorophore to resume emission after some time (52,69). Here, assuming that the fluorophore cannot transition into the blinking photophysical state from the donor ground state results in the following generator matrix

G_{\sigma_i}^\psi = \begin{bmatrix} * & \lambda_{\psi_1\psi_2} & 0 & 0 \\ \lambda_{\psi_2\psi_1} & * & \lambda_{\sigma_i,\psi_2\psi_3} & \lambda_{\psi_2\psi_4} \\ \lambda_{\psi_3\psi_1} & 0 & * & 0 \\ \lambda_{\psi_4\psi_1} & 0 & 0 & * \end{bmatrix} = \begin{bmatrix} * & \lambda_{\text{ex}} & 0 & 0 \\ \lambda_d & * & \lambda_{\sigma_i}^{\text{FRET}} & \lambda_{\text{blink}} \\ \lambda_a & 0 & * & 0 \\ \lambda_{\text{unblink}} & 0 & 0 & * \end{bmatrix}.

So far, we have ignored direct excitation of acceptor dyes in the likelihood model. This effect can also be incorporated into the likelihood by assigning a nonzero value to the transition rate λψ1ψ3, that is,

G_{\sigma_i}^\psi = \begin{bmatrix} * & \lambda_{\psi_1\psi_2} & \lambda_{\psi_1\psi_3} \\ \lambda_{\psi_2\psi_1} & * & \lambda_{\sigma_i,\psi_2\psi_3} \\ \lambda_{\psi_3\psi_1} & 0 & * \end{bmatrix} = \begin{bmatrix} * & \lambda_{\text{ex}} & \lambda_{\text{direct}} \\ \lambda_d & * & \lambda_{\sigma_i}^{\text{FRET}} \\ \lambda_a & 0 & * \end{bmatrix}.

Other photophysical phenomena can also be incorporated into our likelihood by following the same procedure as above. Finally, just as when adding background, the structure of the likelihood (arrangement of the propagators) when treating photophysics (including adding the effect of direct acceptor excitation) stays the same as in Eq. 21.

Synthetic data generation

In the previous subsections, we described how to compute the likelihood, which is the sum of probabilities over all possible superstate trajectories that could give rise to the observations made by a detector, as demonstrated in the “forward model” section. Here, we demonstrate how one such superstate trajectory can be simulated to produce synthetic photon arrival data using the Gillespie algorithm (70), as described in the next section, followed by the addition of detector artefacts. We then use the generated data to test our BNP-FRET sampler.

Gillespie and detector artefacts

The Gillespie algorithm generates two sets of random variables. The first set contains the times at which the superstate changes (indexed 1 through N); these times can fall anywhere along a continuous time grid. The second set contains the superstates held during the interval preceding each such change.

We designate the sequence of superstates

{b1,b2,,bN},

where b_n \in \{\phi_1, \phi_2, \ldots, \phi_{M_\phi}\}. Here, unlike earlier in the “likelihood” section, the time index n on superstates b_n is not on a regular temporal grid.

Now, to generate the superstate sequence above, we first randomly draw the first superstate, b1, from the set of possible superstates given their corresponding probabilities. Next, we draw the second superstate b2 of the sequence using the set of transition rates out of the first state with self-transitions excluded by construction. Now, after choosing b2, we generate the holding time h1 (the time spent in b1) from the Exponential distribution with rate constant associated with transitions b1b2. Finally, we repeat the two previous steps to sequentially generate the full sequence of superstates along with the corresponding holding times.

More formally, we generate a trajectory, by first sampling the initial superstate as

b_1 \sim \text{Categorical}_{\phi_{1:M_\phi}}(\rho_{\text{start}}),

where ρstart is the initial probability vector and the Categorical distribution is the generalization of the Bernoulli distribution for more than two possible outcomes. The remaining superstates can now be sampled as

b_{n+1} \,|\, b_n, G \sim \text{Categorical}_{\phi_{1:M_\phi}}\!\left( \frac{\lambda_{b_n\phi_1}}{\lambda_{b_n}}, \frac{\lambda_{b_n\phi_2}}{\lambda_{b_n}}, \ldots, \frac{\lambda_{b_n\phi_{M_\phi}}}{\lambda_{b_n}} \right),

where \lambda_{b_n} = \sum_i \lambda_{b_n\phi_i} is the escape rate for the superstate b_n and rates for self-transitions are zero. The above equation reads as follows: “the superstate b_{n+1} is drawn (sampled) from a Categorical distribution given the superstate b_n and the generator matrix G.”

Once the n-th superstate bn is chosen, the holding time hn (the time spent in bn) is sampled as follows

h_n \,|\, b_n, G \sim \text{Exponential}(\lambda_{b_n}).

Finally, with ideal detectors, the detection channel ck is assigned deterministically to the k-th photon emitted at time Tkem, which can be computed by summing all the holding times preceding the corresponding radiative transition.

Furthermore, in the presence of detection effects, such as crosstalk, detection efficiency, and IRF, we must add to the stochastic output of the Gillespie simulation another layer of stochasticity originating from the measurement model. That is, we stochastically assign detection channel and detection times to an emitted photon, as described below.

In the presence of crosstalk and inefficient detectors, we choose the detection channel for the k-th photon emitted upon a radiative transition as

c_k \sim \text{Categorical}_{\{\varnothing,1,2\}}\!\left(p_{\varnothing k}, p_{1k}, p_{2k}\right),

where p_{\varnothing k}, p_{1k}, and p_{2k}, respectively, denote the probabilities of the photon going undetected, being detected in channel 1, and being detected in channel 2.

Moreover, in the presence of the IRF, we assign a stochastic delay ϵk, sampled from a probability distribution f(ϵ), to the absolute photon emission time Tkem. This results in the detection time, Tk=Tkem+ϵk, as registered by the timing hardware.

In addition, when photophysical effects (such as blinking and photobleaching) and background are present, we can generate a superstate trajectory following the same procedure as above using the generator matrices G incorporating these effects as described in the previous sections.

Finally, we obtain our desired smFRET trace (see Fig. 4) consisting of photon arrival times T1:K and detection channels c1:K as

{(T1,c1),(T2,c2),(T3,c3),,(TK,cK)}.
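To illustrate the procedure above, the following Julia sketch (using the Distributions.jl package) simulates a superstate trajectory and records photons for ideal detectors. It assumes a generator matrix G with negative row sums on the diagonal and a user-supplied lookup, channel, mapping each radiative transition (from-state, to-state) to a detection channel; in the running two-system-state example, this lookup would send the donor relaxations (2→1, 5→4) to the donor channel and the acceptor relaxations (3→1, 6→4) to the acceptor channel. All names here are ours.

using Distributions

function gillespie_trace(G, ρstart, channel, tmax)
    M = size(G, 1)
    b = rand(Categorical(vec(ρstart)))          # initial superstate
    t = 0.0
    times, channels = Float64[], Int[]
    while t < tmax
        λesc = -G[b, b]                          # escape rate out of superstate b
        t += rand(Exponential(1 / λesc))         # holding time in b
        p = [i == b ? 0.0 : G[b, i] / λesc for i in 1:M]
        bnew = rand(Categorical(p))              # next superstate (no self-transitions)
        if haskey(channel, (b, bnew))            # radiative transition: record a photon
            push!(times, t)
            push!(channels, channel[(b, bnew)])
        end
        b = bnew
    end
    return times, channels
end

Crosstalk, finite detection efficiency, and the IRF would then be layered on top of this output by stochastically reassigning channels and jittering the recorded times, exactly as described above.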
Figure 4.


Simulated data. Here, we show a superstate trajectory (in blue) generated using the Gillespie algorithm for a system-FRET composite with two system states σ1:2 and three photophysical states. Collectively, superstates φ1:3 correspond to photophysical states when the system resides in σ1 and superstates φ4:6 correspond to photophysical states when the system resides in σ2. The pink vertical lines mark the time points where transitions between the superstates occur. The variables b1:7 and h1:7 between each set of vertical lines represent the superstates and associated holding times, respectively. The green and red dots show the photon detections at times T1 and T2 in channels 1 and 2, respectively. The first photon is detected upon transition b3→b4 (or φ5→φ4), while the second photon is detected upon transition b6→b7 (or φ6→φ4). For this plot, we have used very fast system transition rates of λσ1σ2 = 0.001 ns−1 and λσ2σ1 = 0.002 ns−1 for demonstrative purposes only.

Inverse strategy

Now, armed with the likelihood for different experimental setups and a means by which to generate synthetic data (or having experimental data at hand), we proceed to learn the parameters of interest. Assuming precalibrated detector parameters, these include the transition rates entering the generator matrix G and the elements of ρstart. However, accurate estimation of the unknowns requires an inverse strategy capable of dealing with all existing sources of uncertainty in the problem, such as the stochasticity of photon arrivals and detector noise. This naturally leads us to adopt a Bayesian inference framework where we employ Monte Carlo methods to learn distributions over the parameters.

We begin by defining the distribution of interest over the unknown parameters we wish to learn termed the posterior. The posterior is proportional to the product of the likelihood and prior distributions using Bayes’ rule as follows

p(G, \rho_{\text{start}} \,|\, w) \propto L(w \,|\, G, \rho_{\text{start}})\, p(G, \rho_{\text{start}}), \qquad (62)

where the last term p(G,ρstart) is the joint prior distribution over G and ρstart defined over the same domains as the parameters. The prior is often selected on the basis of computational convenience. The influence of the prior distribution on the posterior diminishes as more data are incorporated through the likelihood. Furthermore, the constant of proportionality is the inverse of the absolute probability of the collected data, 1/p(w), and can be safely ignored as generation of Monte Carlo samples only involves ratios of posterior distributions or likelihoods.

In addition, the \epsilon^K factor in the likelihood first derived in Eq. 21 can be absorbed into the proportionality constant as it does not depend on any of the parameters of interest, resulting in the following expression for the posterior (in the absence of detector dead time and IRF for simplicity)

p(G, \rho_{\text{start}} \,|\, w) \propto \rho_{\text{start}}\, \Pi_1^{\text{non}} G_1^{\text{rad}}\, \Pi_2^{\text{non}} G_2^{\text{rad}} \cdots \Pi_{K-1}^{\text{non}} G_{K-1}^{\text{rad}}\, \Pi_K^{\text{non}} G_K^{\text{rad}}\, \Pi_{\text{end}}^{\text{non}}\, \rho_{\text{norm}}^T \times p(G, \rho_{\text{start}}). \qquad (63)

Next, assuming a priori that different transition rates are independent of each other and initial probabilities, we can simplify the prior as follows

p(G, \rho_{\text{start}}) = p(\rho_{\text{start}}) \prod_{i,j} p(\lambda_{\phi_i\phi_j}), \qquad (64)

where we select the Dirichlet prior distribution over initial probabilities as this prior is conveniently defined over a domain where the probability vectors, drawn from it, sum to unity. That is,

p(ρstart)=Dirichlet(ζ), (65)

where the Dirichlet distribution is a multivariate generalization of the Beta distribution and ζ is a vector of the same size as the superstate space. Parameters of a prior are typically termed hyperparameters; as such, ζ collects as many hyperparameters as there are superstates.

In addition, we select Gamma prior distributions for individual rates. That is,

p(\lambda_{\phi_i\phi_j}) = \text{Gamma}\!\left(\lambda_{\phi_i\phi_j};\, \alpha,\, \frac{\lambda_{\text{ref}}}{\alpha}\right), \qquad (66)

guaranteeing positive values. Here, α and λref (a reference rate parameter) are hyperparameters of the Gamma prior. For simplicity, these hyperparameters are usually chosen (with appropriate units) such that the prior distributions are very broad, minimizing their influence on the posterior.

Furthermore, to reduce computational cost, the number of parameters we need to learn can be reduced by reasonably assuming the system was at steady state immediately preceding the time at which the experiment began. That is, instead of sampling ρstart from the posterior, we compute ρstart by solving the time-independent master equation,

ρstartG=0.
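In practice, this steady-state vector is a left null vector of G normalized to sum to one; a minimal Julia sketch is shown below (the function name is ours).

using LinearAlgebra

# steady-state row vector ρstart satisfying ρstart G = 0 and sum(ρstart) = 1
function steady_state(G)
    v = nullspace(Matrix(transpose(G)))[:, 1]   # left null vector of G
    ρ = v ./ sum(v)                             # normalize (also fixes the overall sign)
    return permutedims(ρ)                       # return as a 1 × M row vector
end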

Therefore, the posterior in Eq. 63 now reduces to

p(G \,|\, w) \propto \rho_{\text{start}}\, \Pi_1^{\text{non}} G_1^{\text{rad}}\, \Pi_2^{\text{non}} G_2^{\text{rad}} \cdots \Pi_{K-1}^{\text{non}} G_{K-1}^{\text{rad}}\, \Pi_K^{\text{non}} G_K^{\text{rad}}\, \Pi_{\text{end}}^{\text{non}}\, \rho_{\text{norm}}^T \times p(G). \qquad (67)

In the following subsections, we first describe a parametric inverse strategy, i.e., assuming a known number of system states, for sampling parameters from the posterior distribution in Eq. 67 using Monte Carlo methods. Next, we generalize this inverse strategy to a nonparametric case where we also deduce the number of system states.

Parametric sampler: BNP-FRET with fixed number of system states

Now with the posterior, Eq. 67, at hand and assuming steady-state ρstart, here we illustrate a sampling scheme to deduce the transition rates of the generator matrix G.

As our posterior of Eq. 67 does not assume a standard form amenable to analytical calculations, we must iteratively draw numerical samples of the transition rates within G using Markov chain Monte Carlo (MCMC) techniques. Specifically, we adopt a Gibbs algorithm to, sequentially and separately, generate samples for individual transition rates at each MCMC iteration. To do so, we first write the posterior of Eq. 67 using the chain rule as follows

p(G \,|\, w) = p(\lambda_{\phi_i\phi_j} \,|\, G_{\setminus\lambda_{\phi_i\phi_j}}, w)\; p(G_{\setminus\lambda_{\phi_i\phi_j}} \,|\, w), \qquad (68)

where the backslash after G indicates exclusion of the following rate parameters and w denotes the set of observations as introduced in the “introducing observations” section. Here, the first term on the right-hand side is the conditional posterior for the individual rate \lambda_{\phi_i\phi_j}. The second term is considered a constant in the corresponding Gibbs step as it does not depend on \lambda_{\phi_i\phi_j}. Moreover, following the same logic, the priors p(G_{\setminus\lambda_{\phi_i\phi_j}}) (see Eq. 67) for the remaining rate parameters in the posterior on the left are also considered constant. Therefore, from Eqs. 67 and 68, we can write the conditional posterior for \lambda_{\phi_i\phi_j} above as

p(\lambda_{\phi_i\phi_j} \,|\, G_{\setminus\lambda_{\phi_i\phi_j}}, w) \propto \rho_{\text{start}}\, \Pi_1^{\text{non}} G_1^{\text{rad}}\, \Pi_2^{\text{non}} G_2^{\text{rad}} \cdots \Pi_K^{\text{non}} G_K^{\text{rad}}\, \Pi_{\text{end}}^{\text{non}}\, \rho_{\text{norm}}^T \times \text{Gamma}\!\left(\lambda_{\phi_i\phi_j};\, \alpha,\, \frac{\lambda_{\text{ref}}}{\alpha}\right). \qquad (69)

Just as with the posterior over all parameters, this conditional posterior shown above does not take a closed form allowing for direct sampling.

As such, we turn to the Metropolis-Hastings (MH) algorithm (71) to draw samples from this conditional posterior, where new samples are drawn from a proposal distribution q and accepted with probability

\alpha\!\left(\lambda_{\phi_i\phi_j}^{*}, \lambda_{\phi_i\phi_j}\right) = \min\left\{ 1,\; \frac{ p(\lambda_{\phi_i\phi_j}^{*} \,|\, w, G_{\setminus\lambda_{\phi_i\phi_j}})\; q(\lambda_{\phi_i\phi_j} \,|\, \lambda_{\phi_i\phi_j}^{*}) }{ p(\lambda_{\phi_i\phi_j} \,|\, w, G_{\setminus\lambda_{\phi_i\phi_j}})\; q(\lambda_{\phi_i\phi_j}^{*} \,|\, \lambda_{\phi_i\phi_j}) } \right\}, \qquad (70)

where the asterisk represents the proposed rate values from the proposal distribution q.

To construct an MCMC chain of samples, we begin by initializing the chain for each transition rate λϕiϕj by random values drawn from the corresponding prior distributions. We then iteratively sweep the whole set of transition rates in each MCMC iteration by drawing new values from the proposal distribution q.

A computationally convenient choice for the proposal is a Normal distribution, leading to a simpler acceptance probability in Eq. 70 owing to its symmetry, which gives q(\lambda_{\phi_i\phi_j}^{*} \,|\, \lambda_{\phi_i\phi_j}) = q(\lambda_{\phi_i\phi_j} \,|\, \lambda_{\phi_i\phi_j}^{*}). However, a Normal proposal distribution would allow negative transition rates, which are physically forbidden, leading to wasted rejections in the MH step and thus inefficient sampling. Therefore, it is convenient to propose new samples either drawn from a Gamma distribution or, as shown below, from a Normal distribution in logarithmic space, allowing exploration along the full real line, as follows

\log\!\left(\lambda_{\phi_i\phi_j}^{*}/\kappa\right) \,\big|\, \log\!\left(\lambda_{\phi_i\phi_j}/\kappa\right), \sigma^2 \;\sim\; \text{Normal}\!\left( \log\!\left(\lambda_{\phi_i\phi_j}/\kappa\right), \sigma^2 \right),

where κ=1 is an auxiliary parameter in the same units as λϕiϕj introduced to obtain a dimensionless quantity within the logarithm.

The variable transformation above now requires introduction of Jacobian factors in the acceptance probability as follows

\alpha\!\left(\lambda_{\phi_i\phi_j}^{*}, \lambda_{\phi_i\phi_j}\right) = \min\left\{ 1,\; \frac{ p(\lambda_{\phi_i\phi_j}^{*} \,|\, w, G_{\setminus\lambda_{\phi_i\phi_j}}) }{ p(\lambda_{\phi_i\phi_j} \,|\, w, G_{\setminus\lambda_{\phi_i\phi_j}}) } \; \frac{ \partial \log(\lambda_{\phi_i\phi_j}/\kappa)/\partial \lambda_{\phi_i\phi_j} }{ \partial \log(\lambda_{\phi_i\phi_j}^{*}/\kappa)/\partial \lambda_{\phi_i\phi_j}^{*} } \right\},

where the derivative terms represent the Jacobian and the proposal distributions are canceled by virtue of using a symmetric Normal distribution.

The acceptance probability above depends on the difference between the current and proposed values for a given transition rate: smaller differences typically lead to larger acceptance probabilities. This difference is controlled by the variance σ² of the Normal proposal distribution, which needs to be tuned for each rate individually to achieve optimal performance of the BNP-FRET sampler, corresponding, very approximately, to an acceptance rate of one-fourth for the proposals (72).

This whole algorithm can now be summarized in the following pseudocode.

  • # Initialize chain of samples

  • j = 1

  • for i = 1:Mσ × Mσ
    •  λi(j) ∼ Gamma(α, λref/α)
  • end

  • # Iteratively sample from the posterior using the Gibbs algorithm

  • for j = 2:Draws
    •  for i = 1:Mσ × Mσ
      •  # Propose new sample
      •  log(λi*) ∼ Normal(log(λi(j−1)), σ²)
      •  # Compute acceptance probability
      •  α(λi*, λi(j−1)) = min{1, [p(λi*|w, G∖λi) / p(λi(j−1)|w, G∖λi)] × [(∂log(λi)/∂λi)|λi(j−1) / (∂log(λi*)/∂λi*)]}
      •  if α(λi*, λi(j−1)) > rand()
      •   # Accept proposal
      •   λi(j) = λi*
      •  else
      •   # Reject proposal
      •   λi(j) = λi(j−1)
      •  end
    •  end

  • end
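The core of each Gibbs step, a single Metropolis-Hastings update of one rate in logarithmic space, can be sketched in a few lines of Julia (using Distributions.jl). Here loglike(λ) stands for the logarithm of the likelihood portion of Eq. 69 with all other rates held fixed, and α and λref are the Gamma prior hyperparameters under the shape-scale convention assumed above; the function name and default values are ours.

using Distributions

function mh_update_rate(λ, loglike; σ = 0.5, α = 1.0, λref = 1.0)
    logpost(x) = loglike(x) + logpdf(Gamma(α, λref / α), x)    # log conditional posterior
    λprop = exp(log(λ) + σ * randn())                          # symmetric proposal in log space
    logr = logpost(λprop) - logpost(λ) + log(λprop) - log(λ)   # Jacobian factor λ*/λ
    return log(rand()) < logr ? λprop : λ                      # accept or keep current value
end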

Nonparametrics: Predicting the number of system states

After describing our inverse strategy for a known number of system states (i.e., parametric inference), we turn to more realistic scenarios where we may not know the number of system states which, in turn, leads to an unknown number of superstates (i.e., nonparametric inference). In the following subsections, we first describe the BNP framework for continuous illumination and then proceed to illustrate our BNP strategy under pulsed illumination. Such BNP frameworks introduced herein eventually provide us with distributions over the number of system states simultaneously, and self-consistently, with other model parameters.

Bernoulli process for continuous illumination

The number of system states is often unknown and cannot a priori be set by hand. Therefore, to learn the states warranted by the data, we turn to the BNP paradigm. That is, we first define an infinite-dimensional version of the generator matrix in Eq. 2 and multiply each of its elements by a Bernoulli random variable bi (also termed loads). These loads, indexed by i, allow us to turn on/off portions of the generator matrix associated with transitions between specific system states (including self-transitions). We can write the nonparametric generator matrix as follows

G = \begin{bmatrix} b_1^2 G_{\sigma_1}^\psi - \sum_{j\neq 1} b_1 b_j \lambda_{\sigma_1\sigma_j} I & b_1 b_2 \lambda_{\sigma_1\sigma_2} I & \cdots \\ b_2 b_1 \lambda_{\sigma_2\sigma_1} I & b_2^2 G_{\sigma_2}^\psi - \sum_{j\neq 2} b_2 b_j \lambda_{\sigma_2\sigma_j} I & \cdots \\ \vdots & \vdots & \ddots \end{bmatrix} = \begin{bmatrix} * & b_1^2\lambda_{\psi_1\psi_2} & b_1^2\lambda_{\psi_1\psi_3} & b_1 b_2\lambda_{\sigma_1\sigma_2} & 0 & 0 & \cdots \\ b_1^2\lambda_{\psi_2\psi_1} & * & b_1^2\lambda_{\psi_2\psi_3}^{(1)} & 0 & b_1 b_2\lambda_{\sigma_1\sigma_2} & 0 & \cdots \\ b_1^2\lambda_{\psi_3\psi_1} & b_1^2\lambda_{\psi_3\psi_2} & * & 0 & 0 & b_1 b_2\lambda_{\sigma_1\sigma_2} & \cdots \\ b_1 b_2\lambda_{\sigma_2\sigma_1} & 0 & 0 & * & b_2^2\lambda_{\psi_1\psi_2} & b_2^2\lambda_{\psi_1\psi_3} & \cdots \\ 0 & b_1 b_2\lambda_{\sigma_2\sigma_1} & 0 & b_2^2\lambda_{\psi_2\psi_1} & * & b_2^2\lambda_{\psi_2\psi_3}^{(2)} & \cdots \\ 0 & 0 & b_1 b_2\lambda_{\sigma_2\sigma_1} & b_2^2\lambda_{\psi_3\psi_1} & b_2^2\lambda_{\psi_3\psi_2} & * & \cdots \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots \end{bmatrix},

where a load value of 1 represents an “active” system state, while “inactive” system states (not warranted by the data) get a load value of 0. Here, there are two loads associated with every transition because a pair of system states corresponds to each transition. Within this formalism, the number of active loads is the number of system states estimated by the BNP-FRET sampler. As before, ∗ represents negative row-sums.

The full set of loads, b = {b_1, b_2, …, b_∞}, now becomes part of the quantities we wish to learn. To leverage Bayesian inference methods to learn the loads, the previously defined posterior distribution (Eq. 67) now reads as follows

p(b, G \,|\, w) \propto L(w \,|\, b, G, \rho_{\text{start}})\, p(G)\, p(b), \qquad (71)

where the prior p(b) is Bernoulli while the remaining prior, p(G), can be assumed to be the same as in Eq. 66.

As in the case of the parametric BNP-FRET sampler presented in the section “parametric sampler: BNP-FRET with fixed number of system states,” we generate samples from this nonparametric posterior employing a similar Gibbs algorithm. To do so, we first initialize the MCMC chains of loads and rates by drawing random values from their priors. Next, to construct the MCMC chains, we iteratively draw samples from the posterior in two steps: 1) sequentially sample all rates using the MH algorithm; then 2) sample the loads, one by one, by direct sampling from their corresponding conditional posteriors. Here, step 1 is identical to the parametric case of the “parametric sampler: BNP-FRET with fixed number of system states” section, and we only focus on the second step in what follows.

To sample the i-th load, the corresponding conditional posterior reads (41)

p(b_i \,|\, b_{\setminus b_i}, G, w) \propto L(w \,|\, b, G, \rho_{\text{start}})\, \text{Bernoulli}\!\left(b_i;\, \frac{1}{1 + \frac{M_\sigma^{\text{max}} - 1}{\gamma}}\right), \qquad (72)

where the backslash after b indicates exclusion of the following load and Mσmax and γ are hyperparameters. Here, γ sets the a priori expected number of system states.

A note on the interpretation of Mσmax is in order. When dealing with nonparametrics, we nominally must consider an infinite set of loads and priors for these loads called Bernoulli process priors (73). Samplers for such process priors are available although inefficient (74,75). However, for computational convenience, it is possible to introduce a large albeit finite number of loads set to Mσmax. It can be shown that parameter inference is unaffected by this choice of cutoff (73,76,77) when setting the success probability to 1/(1 + (Mσmax − 1)/γ), as in the Bernoulli distribution of Eq. 72. This is because such a choice forces the mean (expected number of system states) of the full prior on loads ∏i p(bi) to be finite (=γ).

Since the conditional posterior in the equation above must be discrete and describes probabilities of the load being either active or inactive, it must itself follow a Bernoulli distribution with updated parameters

p(b_i \,|\, b_{\setminus b_i}, G, w) = \text{Bernoulli}(b_i;\, q_i),

where

q_i = \frac{ L(w \,|\, b_i = 1, b_{\setminus b_i}, G, \rho_{\text{start}}) }{ L(w \,|\, b_i = 1, b_{\setminus b_i}, G, \rho_{\text{start}}) + L(w \,|\, b_i = 0, b_{\setminus b_i}, G, \rho_{\text{start}}) }.

The Bernoulli form of this posterior allows direct sampling of the loads.
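Because the update is a single Bernoulli draw, each load can be refreshed with two likelihood evaluations. The Julia sketch below implements the expression for qi above while working with log-likelihood differences to avoid underflow; loglike(b) stands for the log-likelihood of the data given the load vector b with all rates held fixed, and the function name is ours.

function sample_load!(b, i, loglike)
    b[i] = 1; ℓ1 = loglike(b)
    b[i] = 0; ℓ0 = loglike(b)
    # qi = L1 / (L1 + L0); if desired, the Bernoulli prior weights of Eq. 72
    # can multiply the two likelihood terms before forming this ratio
    qi = 1 / (1 + exp(ℓ0 - ℓ1))
    b[i] = rand() < qi ? 1 : 0
    return b
end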

We will apply this method for synthetic and experimental data in the second companion article of this series (55).

iHMM methods for pulsed illumination

Under pulsed illumination, the Bernoulli process prior described earlier for continuous illumination can in principle be used as is to estimate the number of system states and the transition rates. However, in this section, we describe a computationally cheaper inference strategy applicable to the simplified likelihood of Eq. 60 assuming system state transitions occurring only at the beginning of each pulse. The reduction in computational expense is achieved by directly learning the elements of the propagator Πσ of Eq. 58, identical for all interpulse periods. In doing so, we learn transition probabilities for the system states instead of learning rates, although we will continue learning rates for photophysical states. This avoids expensive matrix exponentials for potentially large system state numbers required for computing the propagators under continuous illumination.

Now, to infer the transition probabilities in Πσ, the dimensions of which are unknown owing to an unknown number of system states, as well as transition rates among the photophysical states (elements of Gψ in Eq. 4), and initial probabilities, we must place suitable priors on these parameters yielding the following posterior

p(\rho_{\text{start}}, \Pi^\sigma, G^\psi \,|\, w) \propto p(w \,|\, \rho_{\text{start}}, \Pi^\sigma, G^\psi)\, p(\rho_{\text{start}})\, p(G^\psi)\, p(\Pi^\sigma), \qquad (73)

where we have immediately written the joint prior as a product prior over ρstart, Gψ, and Πσ. Next, for ρstart and Gψ we use the same priors as in Eqs. 65 and 66. However, as the number of system states is unknown, Πσ requires special treatment. To learn Πσ, it is convenient to adopt the infinite HMM (iHMM) (41,48) due to the discrete nature of system state transitions over time.

As the name suggests, the iHMM leverages infinite system state spaces (Mσ in Eq. 3) similar to the Bernoulli process prior described in the “Bernoulli process for continuous illumination” section. However, unlike the Bernoulli process, all system states remain permanently active. The primary goal of an iHMM is then to infer transition probabilities between system states, some of which, not warranted by the data, remain very small and set by the (nonparametric) prior that we turn to shortly. Thus the effective number of system states can be enumerated from the most frequently visited system states over the course of a learned trajectory.

Within this iHMM framework, we place an infinite dimensional version of the Dirichlet prior, termed the Dirichlet process prior (41,48,78), as priors over each row of the propagator Πσ. That is,

\pi_m \sim \text{DirichletProcess}(\alpha\beta), \quad m = 1, 2, \ldots, \qquad (74)

where πm is the m-th row of Πσ. Here, the hyperparameters of the Dirichlet process prior include the concentration parameter α, which determines the sparsity of πm, and the hyperparameter β, which is a probability vector also known as the base distribution. Together, αβ are related to the ζ introduced earlier for the (finite) Dirichlet distribution of Eq. 65.

Next, as the base distribution itself is unknown and all transitions out of each state should be likely to revisit the same set of states, we must place the same base distribution on all Dirichlet process priors placed on the rows of the propagator. To sample this unique base, we again choose a Dirichlet process prior (41,79,80,81), that is,

\beta \sim \text{DirichletProcess}(\xi\gamma),

where we may set ξ=1 and γ is a vector of hyperparameters of size Mσ.

Now, to deduce the unknown parameters, we need to draw samples from the posterior in Eq. 73. However, due to the nonanalytical form of the posterior we cannot jointly sample our posterior. Thus, as before, we adopt a Gibbs sampling strategy to sequentially and separately draw samples for each parameter. Here, we only illustrate our Gibbs sampling step for the transition probabilities πm. Our Gibbs steps for the remaining parameters are similar to the ones in the “parametric sampler: BNP-FRET with fixed number of system states” section. The complete procedure is described in the third companion article (54).

Similar to the Bernoulli process prior, there are two common approaches to draw samples within the iHMM framework: slice sampling using the exact Dirichlet process prior and finite truncation (41,48,82,83). Just as before for the case of continuous illumination, we truncate the Dirichlet process prior to a finite Dirichlet distribution and fix its dimensionality to a finite (albeit large) number which we set to Mσmax to improve the sampling. It can then be shown that, for large enough Mσmax, the number of system states visited becomes independent of Mσmax (41).

As before, to numerically sample the transition probabilities πm from our full posterior in Eq. 73 through MCMC, we choose our initial samples from the priors

\beta \sim \text{Dirichlet}(\xi\gamma), \qquad \pi_m \sim \text{Dirichlet}(\alpha\beta), \quad m = 1, 2, \ldots, M_\sigma^{\text{max}},

where we chose elements of γ to be 1/Mσmax to ascribe similar weights across the state space a priori.
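A minimal Julia sketch of this truncated initialization is given below (using Distributions.jl). Mmax stands for Mσmax, and the small numerical floor on β is only there to keep all Dirichlet concentration parameters strictly positive; all names are ours.

using Distributions

Mmax = 10                         # truncation level, stands in for Mσmax
ξ, α = 1.0, 1.0                   # concentration parameters
γ = fill(1 / Mmax, Mmax)          # hyperparameter vector γ

β = rand(Dirichlet(ξ .* γ))       # shared base distribution
β = max.(β, 1e-12); β ./= sum(β)  # numerical floor keeps concentrations positive

# initial transition probability matrix: each row πm drawn from Dirichlet(αβ)
Π = vcat([rand(Dirichlet(α .* β))' for m in 1:Mmax]...)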

Likelihood computation in practice

As shown in the “pulsed illumination” section, the likelihood typically takes the following generic form

L \propto \rho_{\text{start}}\, Q_1 Q_2 Q_3 \cdots Q_{K-1} Q_K\, \rho_{\text{norm}}^T, \qquad (75)

where Qi are matrices whose exact form depends on which effects we incorporate into our likelihood. Computing this last expression would typically lead to underflow as likelihood values quickly drop below floating-point precision.

For this reason, it is convenient to introduce the logarithm of this likelihood. To derive the logarithm of the likelihood of Eq. 75, we rewrite the likelihood as a product of multiple terms as follows

L \propto \left(\rho_{\text{start}}\, \rho_{\text{norm}}^T\right) \left(\rho_1 Q_1 \rho_{\text{norm}}^T\right) \left(\rho_2 Q_2 \rho_{\text{norm}}^T\right) \cdots \left(\rho_{K-1} Q_{K-1} \rho_{\text{norm}}^T\right) \left(\rho_K Q_K \rho_{\text{norm}}^T\right),

where ρi are the normalized probability vectors given by the following recursive formula

\rho_1 = \rho_{\text{start}}, \quad \text{and} \quad \rho_i = \frac{\rho_{i-1} Q_{i-1}}{\rho_{i-1} Q_{i-1} \rho_{\text{norm}}^T}.

Now, using the recursion relation above, the log-likelihood can be written as

\log(L) = \log\!\left(\rho_{\text{start}}\, \rho_{\text{norm}}^T\right) + \log\!\left(\rho_1 Q_1 \rho_{\text{norm}}^T\right) + \log\!\left(\rho_2 Q_2 \rho_{\text{norm}}^T\right) + \log\!\left(\rho_3 Q_3 \rho_{\text{norm}}^T\right) + \cdots + \log\!\left(\rho_{K-1} Q_{K-1} \rho_{\text{norm}}^T\right) + \log\!\left(\rho_K Q_K \rho_{\text{norm}}^T\right) + \text{const},

where const is a constant.

Note that ρstartρnormT=1. The pseudocode to compute the log-likelihood is as follows

ρ = ρstart

p = sum(ρ) = 1

log(L) = log(p) = 0

for i = 1:K

 ρ = ρ Qi
 p = sum(ρ)
 log(L) = log(L) + log(p)
 ρ = ρ/p

end

return log(L)
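An equivalent Julia implementation of this recursion, assuming the propagators Qk for each interphoton (or interpulse) period have already been constructed, might read as follows; the function name is ours.

# log-likelihood via the normalized forward recursion above
function log_likelihood(ρstart, Q)
    ρ = ρstart                    # 1 × M row vector summing to one
    logL = 0.0
    for Qk in Q                   # Q is the vector of period propagators Q_1, ..., Q_K
        ρ = ρ * Qk
        p = sum(ρ)                # running normalization avoids underflow
        logL += log(p)
        ρ = ρ ./ p
    end
    return logL
end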

Results

In this section, we present results using our BNP-FRET sampler described above. Specifically, here we benchmark the parametric (i.e., fixed number of system states) version of our sampler using synthetic data, while the two subsequent manuscripts (54,55) focus on the nonparametric (i.e., unknown number of system states) analysis of experimental data.

For simplicity alone, we begin by analyzing data from an idealized system with two system states using different photon budgets and excitation rates. Next, we consider more realistic examples incorporating the following one at a time: 1) crosstalk and detection efficiency; 2) background emission; 3) IRF; and then 4) end with a brief discussion on the unknown number of system states. We demonstrate when these features become relevant, as well as the overall robustness and generality of the BNP-FRET sampler.

For now, we assume continuous illumination for all parametric examples and use the following priors for the analyses. The prior used for the FRET rates are

\lambda_{\sigma_i}^{\text{FRET}} \sim \text{Gamma}(1,\, 1\ \text{ns}^{-1}),

and use the following prior over the system transition rates

\lambda_{\sigma_i\sigma_j} \sim \text{Gamma}(1,\, 10^{-6}\ \text{ns}^{-1}).

As discussed earlier in the section “parametric sampler: BNP-FRET with fixed number of system states,” it is more convenient to work within logarithmic space where we use the following proposal distributions to update the parameter values

\log(\lambda_{\text{ex}}^{*}) \,|\, \log(\lambda_{\text{ex}}), \sigma_{\text{ex}} \sim \text{Normal}\!\left(\log(\lambda_{\text{ex}}), \sigma_{\text{ex}}^2\right), \quad \log(\lambda_{\sigma_i}^{\text{FRET}*}) \,|\, \log(\lambda_{\sigma_i}^{\text{FRET}}), \sigma_{\text{FRET}} \sim \text{Normal}\!\left(\log(\lambda_{\sigma_i}^{\text{FRET}}), \sigma_{\text{FRET}}^2\right), \quad \text{and} \quad \log(\lambda_{\sigma_i\sigma_j}^{*}) \,|\, \log(\lambda_{\sigma_i\sigma_j}), \sigma_{\text{sys}} \sim \text{Normal}\!\left(\log(\lambda_{\sigma_i\sigma_j}), \sigma_{\text{sys}}^2\right),

where ∗ denotes proposed rates and where it is understood that all rates appearing in the logarithms have been divided by a unit constant so that the arguments of the logarithms remain dimensionless.

For efficient exploration of the parameter space by the BNP-FRET sampler, and upon extensive experimentation with acceptance ratios, we found it prudent to alternate between two sets of variances, {σ²ex = 10⁻⁵, σ²FRET = 0.01, σ²sys = 0.1} and {σ²ex = 10⁻⁵, σ²FRET = 0.5, σ²sys = 5.0}, to generate an MCMC chain. This ensures that we propose samples of different orders of magnitude. As an intuitive guide, the more data we have, the sharper we expect our posterior over the rates to be and, thus, the smaller both sets of variances should be in our proposal distributions.

In the examples presented in the next few subsections, for computational simplicity, we fix the escape rates for the donor and acceptor excited photophysical states as well as the background rates for each detection channel in our simulations, as they can be precalibrated from experiments.

Parametric examples

Photon budget and excitation rate

Here, we perform Bayesian analysis on synthetically generated data (as described in the “synthetic data generation” section) for the simplest case where the number of system states is an input to the BNP-FRET sampler. To generate data, we use the following generator matrix

G = \begin{bmatrix} * & 10.0 & 0 & 2.0 & 0 & 0 \\ 2.77\times 10^5 & * & 1.11\times 10^5 & 0 & 2.0 & 0 \\ 2.85\times 10^5 & 0 & * & 0 & 0 & 2.0 \\ 1.0 & 0 & 0 & * & 10.0 & 0 \\ 0 & 1.0 & 0 & 2.77\times 10^5 & * & 0.91\times 10^5 \\ 0 & 0 & 1.0 & 2.85\times 10^5 & 0 & * \end{bmatrix}\ \text{ms}^{-1},

where the elements are motivated from real experiments (84). Using this generator matrix, we generated a superstate trajectory as described in the “synthetic data generation” section. We analyzed 430,000 photons from the generated data using our BNP-FRET sampler. The resulting posterior distribution for transitions between system states and FRET efficiencies (computed as ϵσiFRET=λσiFRET/(λd+λσiFRET) for the i-th system state) is shown in Fig. 5. As we will see for all examples, the finiteness of data always leads to some error as evident from the slight offset of the peaks of the distribution from the ground truth.

Figure 5.


Learned bivariate posterior for the system state escape rates λesc and FRET efficiencies ϵFRET given synthetic data. To produce this plot, we analyzed synthetic data generated using an excitation rate of λex = 10 ms−1, escape rates λesc = 1 and 2 ms−1, and FRET efficiencies of 0.09 and 0.29 for the two system states, respectively. The ground truth is shown with red dots. The FRET efficiencies estimated by our sampler are 0.288 (+0.007/−0.006) and 0.092 (+0.003/−0.003). Furthermore, the predicted escape rates are 2.03 (+0.16/−0.17) ms−1 and 0.98 (+0.10/−0.07) ms−1. The small bias away from the ground truth is due to the finiteness of data. We have smoothed the distributions, for illustrative purposes only, using kernel density estimation (KDE) available through the Julia Plots package.

The effects of a limited photon budget become significant especially when system kinetics occur across multiple timescales with the most photon-starved state characterized by the largest escape rate. In this case, it is useful to quantify how many photons are typically required to assess any escape rate (with the fastest setting the lower photon count bound needed) to obtain below 15% error in parameter estimates.

To quantify the number of photons, ignoring background and detector effects, we define a dimensionless quantity that we call the “photon budget index” predicting the photon budget needed to accurately estimate the transition rates in the model as

s = \frac{K\, \lambda_{\text{ex}}}{\lambda_{\text{probe}}\, M_\sigma}, \qquad (76)

where K is the total number of photons in a single-photon smFRET trace (photon budget), λex is the excitation rate, λprobe represents the escape rate (timescale) that we want to probe, and Mσ is the number of system states. The parameters in the numerator control the amount of data available and the temporal resolution. On the other hand, the parameters in the denominator are the properties of the system under investigation and represent the required resolution.

From experimentation, we have found a photon budget index of approximately 10^6 to be a safe lower threshold for keeping errors below 15% (this error cutoff is a user choice) in parameter estimates. In the simple parametric example above, we have K = 4.3×10^5, λex = 10 ms−1, the fastest transition that we want to probe is λprobe = 2 ms−1, and Mσ = 2, which corresponds to a photon budget index of 1.08×10^7. In Fig. 6, we also demonstrate the reduction in errors (confidence interval size) for parameters of the same system as the photon budget is increased from 12,500 to 400,000 photons. For each of those cases, we used 9000 MCMC samples to compute statistical metrics such as quantiles.

Figure 6.


System and FRET transition rates as functions of the number of photons used for analysis. To produce these plots, we used the same kinetic parameters as in Fig. 5 and analyzed the data considering only the first 12,500 photons, then increased the photon budget by a factor of two for each subsequent analysis. Furthermore, we generated 9000 MCMC samples for each analysis to compute statistical quantities. In (a), we show two plots corresponding to the two system transition rates (λesc). The blue dots represent the median values (50% quantile), and the ends of the attached confidence intervals represent 5 and 95% quantiles. The ground truths are shown with red horizontal lines. We show similar plots for FRET transition rates (λFRET) in (b). In all of the plots, we see that, as the photon budget is increased, the confidence intervals shrink (the posterior gets narrower/sharper). With a budget of 400,000 photons, the confidence intervals represent less than 10% error in the estimates and contain the ground truths in all of the plots.

We further investigate the effect of another quantity appearing in the photon budget index, namely the excitation rate, on the parameter estimates. To do so, we generate three new synthetic data sets, each containing 670,000 photons, using the same excitation rate of 10 ms−1 and FRET efficiencies of 0.28 and 0.09 for the two system states, respectively, as before. However, the kinetics differ across these data sets so that they have system state transition rates well below, equal to, and well above the excitation rate. As such, for the first data set, we probe slower kinetics compared with the excitation rate, with system state transition rates set at 0.1 ms−1. In the next two data sets, the molecule changes system states at much faster rates of 10 and 1000 ms−1, respectively.

The results obtained for these FRET traces using our Bayesian methods are shown in Fig. 7. The bias in the posterior away from the ground truth increases as faster kinetics are probed in Fig. 7, from left to right. The results for the case with the fastest transition rates of 1000 ms−1 in Fig. 7 c show a marked deterioration of the predictions, as the information content is not sufficient to separate the two FRET efficiencies resulting in estimated values close to the average of the ground truth values (0.185). This lack of information is also reflected in the uncertainties corresponding to each escape rate as shown in Fig. 7. Moreover, the predicted transition rates are of the same order as the excitation rate itself due to lack of temporal resolution available to probe such fast kinetics.

Figure 7.


Learned bivariate posterior for the system state escape rates λesc (log-scale in (c)) and FRET efficiencies ϵFRET from synthetic data. For all synthetic smFRET traces, we use an excitation rate of 10 ms−1 and FRET efficiencies of 0.29 and 0.09 for the two system states, respectively. The three panels correspond to different timescales being probed, with transition rates: (a) 0.1 ms−1; (b) 10 ms−1; and (c) 1000 ms−1. The ground truth values are shown with red dots. The bias in the parameter estimates increases as faster kinetics are probed, demonstrating deterioration of the information content of the collected data and resulting in expectedly poor estimation assuming a fixed photon budget of 670,000. This can also be seen quantitatively by calculating the confidence intervals reported below for each case. The FRET efficiencies estimated by our sampler for the slowest case in (a) are 0.286 (+0.002/−0.002) and 0.091 (+0.001/−0.001), and the corresponding escape rates are 0.101 (+0.004/−0.005) ms−1 and 0.096 (+0.004/−0.004) ms−1. For the intermediate case in (b), FRET efficiencies estimated by our sampler are 0.200 (+0.117/−0.110) and 0.102 (+0.022/−0.014), and predicted escape rates are 8.47 (+2.42/−3.17) ms−1 and 7.67 (+1.32/−2.66) ms−1. For the fastest case in (c), FRET efficiencies estimated by our sampler are 0.189 (+0.025/−0.027) and 0.189 (+0.016/−0.029), and predicted escape rates are 5.00 (+26.9/−3.63) ms−1 and 3.49 (+27.21/−2.49) ms−1. Poorer confidence intervals for larger escape rates reflect larger uncertainty due to lack of information.

To conclude, the excitation rate used to collect smFRET data and the total number of photons available determine the amount of information needed to resolve transitions among system states. As such, the ability of Bayesian methods to naturally propagate error from finiteness of information into parameter estimates make them indispensable tools for quantitative smFRET data analysis. This is by contrast to maximum likelihood-based methods, which provide only inaccurate point estimates on account of limited data.

An example with crosstalk

Here, we demonstrate how our method handles cases where significant crosstalk is present. To show this, we use the same dynamical parameters and photon budget as in the previous subsection for generating synthetic data, but allow 5% of the donor photons to be stochastically detected in the acceptor channel. We then analyze the data with two versions of our method: one that incorporates crosstalk and one that ignores it altogether. Our results show that neglecting crosstalk necessarily leads to artefactually higher FRET efficiency estimates, as clearly seen in Fig. 8 a (a simple calculation illustrating the direction of this bias is sketched below). As expected, incorporating crosstalk into the likelihood, as shown in the “cross talk and detection efficiency” section, results in a smaller bias. In this case, both ground truths fall within the range of the posterior for the corrected model; see Fig. 8 b. Furthermore, as shown in Fig. 9, top panels, donor crosstalk again results in overestimation of FRET efficiencies. However, when we correct for crosstalk, our BNP-FRET sampler learns FRET efficiencies with ground truths falling within the 95% confidence intervals (Fig. 9, bottom panels). As expected, our simulations in Fig. 9 also show that uncertainty increases with increasing crosstalk, and parameter estimation fails for crosstalk values beyond 60%.
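The direction of this bias can be anticipated with a back-of-the-envelope calculation. The sketch below is illustrative only (it is not the likelihood of the “cross talk and detection efficiency” section) and ignores all detector effects other than crosstalk.

```python
# Sketch: with donor-to-acceptor crosstalk probability phi_d1, a photon lands
# in the acceptor channel either because FRET occurred or because a donor
# photon was misrouted, so the apparent FRET efficiency is biased upward.
def apparent_fret(eff_true, phi_d1=0.05):
    """Probability that a detected photon is recorded in the acceptor channel."""
    return eff_true + (1.0 - eff_true) * phi_d1

for eff in (0.28, 0.09):
    print(f"true {eff:.2f} -> apparent {apparent_fret(eff):.3f} at 5% crosstalk")
```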

Figure 8. Learned bivariate posterior for the system state escape rates λesc and FRET efficiencies ϵFRET for synthetic data with crosstalk. The ground truth is shown with red dots. In (a), we show that the posterior learned using the model that does not correct for crosstalk consistently deviates from the ground truth toward higher FRET efficiency estimates, on account of more donor photons being detected as acceptor photons. The FRET efficiencies estimated by our sampler for this case are 0.314 (−0.009/+0.007) and 0.135 (−0.005/+0.002), and the predicted escape rates are 1.90 (−0.11/+0.17) ms−1 and 1.05 (−0.18/+0.10) ms−1. For the corrected case shown in (b), the estimated FRET efficiencies are 0.276 (−0.010/+0.006) and 0.088 (−0.006/+0.004), and the predicted escape rates are 1.85 (−0.14/+0.15) ms−1 and 1.06 (−0.10/+0.12) ms−1. The corrected model mitigates the bias, as demonstrated by the posterior in (b).

Figure 9. System transition rates λesc and FRET efficiencies ϵFRET as functions of increasing donor crosstalk probability φd1. To produce these plots, we generated synthetic data with excitation and escape rates as in Fig. 5. In each plot, the blue dots represent the median values (50% quantile), and the ends of the attached confidence intervals represent the 5% and 95% quantiles. Furthermore, the ground truths are shown with red horizontal lines. In (a), our two plots show the system transition rates estimated by the BNP-FRET sampler when corrected and uncorrected for crosstalk. We show similar plots for FRET efficiencies in (b). In all plots, as donor crosstalk increases, the confidence intervals grow (the posterior gets wider) and the estimates become unreliable for φd1 > 0.6. In addition, as expected, if left uncorrected, the FRET efficiencies start to merge with increasing crosstalk because most photons are detected in the acceptor channel (labeled 1).

An example with background emissions

In the “background emissions” section, we showed how to include background emissions in the forward model. For the current example, we again choose the same kinetic parameters for the system and the FRET pair as in Fig. 5, but now some of the photons come from background sources with rates λbg,i = λex/10 = 1 ms−1 for the i-th channel. If left uncorrected in the model, the addition of a uniform background again leads to higher FRET efficiency estimates due to the excess photons detected in each channel, as can be seen in Fig. 10 a and Fig. 11, top panels (the expected size of this bias is sketched below). By comparison with the uncorrected method, our results migrate toward the ground truth when analyzed with the full method (see Figs. 10 b and 11, bottom panels). Furthermore, as shown in Fig. 11, when background photons account for more than approximately 40% of detected photons, the relative uncertainties in the estimated transition rates exceed 25%, indicating unreliable results.
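As for crosstalk, the size of this bias can be anticipated with a simple rate argument. The sketch below is illustrative only and assumes equal background rates in the two channels with no other detector effects.

```python
# Sketch: expected acceptor-channel fraction when each channel also collects
# uniform background photons at rate lam_bg (all rates in ms^-1). An
# uncorrected analysis would misread this fraction as the FRET efficiency.
def apparent_fret_with_bg(eff_true, lam_ex=10.0, lam_bg=1.0):
    acceptor_rate = lam_ex * eff_true + lam_bg   # FRET photons + acceptor background
    total_rate = lam_ex + 2.0 * lam_bg           # all photons across both channels
    return acceptor_rate / total_rate

for eff in (0.28, 0.09):
    print(f"true {eff:.2f} -> apparent {apparent_fret_with_bg(eff):.3f}")
```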

Figure 10. Learned bivariate posterior for the system state escape rates λesc and FRET efficiencies ϵFRET given synthetic data with background emissions. The ground truth is shown with red dots. The posterior learned using the model that does not correct for background emissions (a) consistently shows bias away from the ground truth toward higher FRET efficiency estimates on account of the extra background photons. The FRET efficiencies estimated by our sampler for this uncorrected case are 0.322 (−0.008/+0.007) and 0.161 (−0.004/+0.003), and the predicted escape rates are 1.93 (−0.14/+0.17) ms−1 and 0.95 (−0.10/+0.10) ms−1. The corrected model mitigates this bias, as demonstrated by the posterior in (b), with learned FRET efficiencies of 0.293 (−0.012/+0.014) and 0.096 (−0.004/+0.003) and predicted escape rates of 1.99 (−0.25/+0.25) ms−1 and 0.87 (−0.05/+0.08) ms−1.

Figure 11. System transition rates λesc and FRET efficiencies ϵFRET as functions of increasing donor and acceptor background fraction λbg/λex. To produce these plots, we generated synthetic data with an excitation rate of λex = 10 ms−1 and escape rates λesc = 1 and 2 ms−1 for the two system states, respectively, the same as in Fig. 5, while increasing the fraction of background photons (donor and acceptor) from 0 to 50% (λbg/λex = 1). In each plot, the blue dots represent the median values (50% quantile), and the ends of the attached confidence intervals represent the 5% and 95% quantiles. Furthermore, the ground truths are shown with red horizontal lines. In (a), we show two plots of the system transition rates estimated by the BNP-FRET sampler when corrected and uncorrected for background. We show similar plots for FRET efficiencies in (b). In all plots, as the background increases, the confidence intervals grow (the posterior gets wider) and the estimates become unreliable for λbg/λex > 0.6. In addition, as expected, if unaccounted for, the FRET efficiencies start to merge with increasing background as the fraction of photons originating from FRET events is significantly reduced.

An example with IRF

To demonstrate the effect of the IRF, as described in the “adding the detection IRF” section, we generated new synthetic data for a single fluorophore (with no FRET, for simplicity alone) with an escape rate of λd = 2.0 ns−1 (similar to that of the Cy3 dye (85)) excited by a continuous-wave laser at a high excitation rate of λex = 0.01 ns−1. For simplicity, we approximate the IRF with a truncated Gaussian distribution about 96 ps wide with its mean at 48 ps. We again analyze the data with two versions of our method, one incorporating and one neglecting the IRF (a toy version of this simulation is sketched below). The results are depicted in Fig. 12, where the posterior is narrower when the IRF is incorporated. This is especially helpful when accurate lifetime determination is important for discriminating between system states. By contrast, accurate determination of lifetimes (which span nanosecond timescales) does not impact the determination of the much slower system kinetics from one system state to the next.
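The sketch below mimics this numerical experiment at a toy level: exponential emission delays at rate λd are jittered by a truncated-Gaussian IRF, where the standard deviation of roughly 16 ps is our assumption for a distribution "about 96 ps wide". A naive moment estimate that ignores the IRF shift underestimates the inverse lifetime, while subtracting the known IRF mean recovers it; our actual analysis instead folds the IRF into the likelihood.

```python
# Toy sketch of the IRF experiment above (times in ns); not the likelihood
# of the "adding the detection IRF" section.
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(2)
lam_d, n = 2.0, 50_000                 # inverse lifetime (ns^-1), photon count

emission = rng.exponential(1.0 / lam_d, size=n)
# truncated Gaussian IRF on [0, 0.096] ns, mean 0.048 ns, sd 0.016 ns (assumed)
mu, sigma, a, b = 0.048, 0.016, 0.0, 0.096
irf = truncnorm.rvs((a - mu) / sigma, (b - mu) / sigma, loc=mu, scale=sigma,
                    size=n, random_state=rng)
detected = emission + irf              # detection time = emission delay + IRF jitter

print("naive estimate of lam_d:", 1.0 / detected.mean())
print("IRF-corrected estimate: ", 1.0 / (detected.mean() - irf.mean()))
```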

Figure 12. Effects of the IRF. Both histograms show the fluorophore’s inverse lifetime, with the ground truth shown by a red line. The bias of the peaks away from the ground truth arises from the limited amount of data used to learn the posterior shown. The corrected model (orange) reduces the histogram’s breadth compared with the uncorrected model (blue). From the small effect of correcting for the IRF we conclude that, predictably, the IRF may be less important under continuous illumination. By contrast, under pulsed illumination, to be explored in the third companion article (54), the IRF will play a more significant role.

A nonparametric example

Here, we demonstrate our method in learning the number of system states by analyzing approximately 600 ms (about 120,000 photons) of synthetic smFRET time-trace data with three system states under pulsed illumination with a 25 ns interpulse window (see Fig. 13); a simplified version of this data-generation scheme is sketched below. This example utilizes the iHMM method described in the “iHMM methods for pulsed illumination” section and discussed in greater depth in the third companion article (54). Using realistic values from the third companion article (54), we set the excitation probability per pulse to 0.005. Furthermore, the kinetics are set at escape rates of 1.2 ms−1 for the highest- and lowest-FRET system states and 2.4 ms−1 for the intermediate system state. Our BNP method simultaneously recovered the correct system state transition probabilities, and thereby the number of system states, along with other parameters, including the donor and acceptor relaxation rates and the per-pulse excitation probability. By comparison, a parametric version of the same method must assume a fixed number of system states a priori. Assuming, say, two system states results in the two higher-FRET system states being combined into a single system state with a FRET efficiency of 0.63 and a lifetime of about 0.6 ms (see Fig. 13 c).
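The sketch below generates a simplified version of such a pulsed-illumination trace: for each pulse the system state is updated using exponential dwell times, a photon is detected with the per-pulse excitation probability, and its channel is drawn from the current state's FRET efficiency. The three FRET efficiencies used here are placeholders of our own choosing, excited-state lifetimes and background are ignored, and only a 60 ms stretch is simulated to keep the loop fast (the full 600 ms trace follows by scaling up n_pulses).

```python
# Simplified sketch of the pulsed-illumination synthetic data (not the iHMM
# analysis itself). Rates in ms^-1; interpulse window 25 ns = 25e-6 ms.
import numpy as np

rng = np.random.default_rng(3)

T_pulse = 25e-6                   # interpulse window, ms
p_ex = 0.005                      # excitation probability per pulse
eff = [0.1, 0.5, 0.9]             # placeholder FRET efficiencies (assumed)
esc = [1.2, 2.4, 1.2]             # escape rates, ms^-1 (low, intermediate, high FRET)
n_pulses = int(60.0 / T_pulse)    # 60 ms demo stretch (600 ms in the text)

state = 0
next_switch = rng.exponential(1.0 / esc[state])
photons = []                      # (pulse index, channel) pairs

for k in range(n_pulses):
    t = k * T_pulse
    while t > next_switch:        # system may have switched between pulses
        state = int(rng.choice([s for s in range(3) if s != state]))
        next_switch += rng.exponential(1.0 / esc[state])
    if rng.random() < p_ex:       # photon detected in this pulse?
        photons.append((k, 1 if rng.random() < eff[state] else 0))

print(f"{len(photons)} photons over {n_pulses} pulses")
```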

Figure 13. Demonstration of nonparametric analysis on synthetic pulsed-illumination data. In (a), we show simulated data for a pulsed-illumination experiment: a trajectory over three system states, labeled sn, in blue, with the corresponding photon arrivals shown as red or green lines whose vertical lengths denote the observed lifetimes in nanoseconds. Panels (b) and (c) show the bivariate posterior for the escape rates λesc and FRET efficiencies ϵFRET. The ground truth is again shown with red dots. The posterior learned using the two-system-state parametric model (c) combines the high-FRET states into one averaged system state with a long lifetime. The nonparametric model correctly infers three system states, as shown in (b).

Discussion

In this paper, we have presented a complete framework for analyzing single-photon smFRET data that includes a photon-by-photon likelihood, detector effects, fluorophore characteristics, and different illumination methods. We demonstrated how modern Bayesian methods can be used to obtain full distributions over the parameters, and discussed the limitations posed by the photon budget and excitation rate. In addition, we have shown how to implement a nonparametric inverse strategy to learn an unknown number of system states.

Our method readily accommodates details relevant to specialized smFRET applications. For instance, we can analyze spatial and temporal dependence of the excitation by simple modification of the generator matrices entering Eq. 21 (a toy illustration of such a modification is sketched below). This is useful in experiments employing pulsed illumination (see the “illumination features” section), as well as alternating-laser excitation (ALEX). In particular, ALEX is used to directly excite the acceptor label, either as a way to gain qualitative information about the sample (65,86), to reduce photobleaching (65,86), or to study intermolecular interactions (16). Similarly, the generator matrix in Eq. 21 can easily be expanded to include any number of labels, extending our method beyond two colors. Three-color smFRET experiments have revealed simultaneous interactions between three proteins (87), monitored conformational subpopulations of molecules (88), and improved our understanding of protein folding and interactions (2,16,89,90,91).
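As a toy illustration of what such a modification looks like (and not a transcription of Eq. 21), the sketch below builds a small generator matrix over the joint photophysical states of a FRET pair for a single system state, with placeholder rates, and shows ALEX-style direct acceptor excitation entering as a single additional rate.

```python
# Toy generator matrix over photophysical states {both ground, donor excited,
# acceptor excited} for one system state; all rates are placeholders (ms^-1).
# ALEX-style direct acceptor excitation appears as one extra off-diagonal rate.
import numpy as np

lam_ex, lam_d, lam_a, lam_fret = 10.0, 2.0e6, 2.5e6, 1.0e6

def generator(lam_ex_acceptor=0.0):
    # row/column order: [ground, donor excited, acceptor excited]
    G = np.array([
        [0.0,    lam_ex, lam_ex_acceptor],   # ground -> excited states
        [lam_d,  0.0,    lam_fret       ],   # donor relaxes or transfers to acceptor
        [lam_a,  0.0,    0.0            ],   # acceptor relaxes to ground
    ])
    np.fill_diagonal(G, -G.sum(axis=1))      # rows of a generator sum to zero
    return G

print(generator())                        # donor excitation only
print(generator(lam_ex_acceptor=5.0))     # with direct acceptor excitation (ALEX)
```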

As the likelihood (Eq. 21) involves as many matrix exponentials as detected photons, the computational cost of our method scales approximately linearly with the number of photons and quadratically with the number of system states (a schematic of this photon-by-photon propagation is sketched below). For instance, it took about 5 hours to analyze the data used to generate Fig. 5 on a regular desktop computer. Additions to our model that increase the computational cost include: 1) the IRF; 2) pulsed illumination; and 3) BNPs. The computational cost associated with the IRF arises from the integral required (see the “adding the detection IRF” section). The cost of the likelihood computation in the pulsed illumination case scales linearly with the number of pulses rather than the number of photons (see Eq. 51), which greatly increases the computational cost when photon detections are infrequent. Finally, BNPs necessarily expand the dimensions of the generator matrix whose exponentiation is required (Eq. 21), resulting in longer burn-in times for our MCMC chains.
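The scaling argument above can be made concrete with a generic hidden-Markov-style propagation (a schematic only, not a literal transcription of Eq. 21): one matrix exponential over each interphoton interval, followed by a detection matrix for the observed channel.

```python
# Schematic of the photon-by-photon likelihood cost: one matrix exponential per
# interphoton interval, with matrices whose size grows with the number of
# system (and photophysical) states.
import numpy as np
from scipy.linalg import expm

def log_likelihood(arrival_times, channels, G_quiet, D):
    """G_quiet: sub-generator for photon-free evolution (rows sum to <= 0,
    the missing mass corresponding to detections); D[c]: detection matrix
    applied when a photon is recorded in channel c."""
    rho = np.full(G_quiet.shape[0], 1.0 / G_quiet.shape[0])  # flat initial state
    logL, t_prev = 0.0, 0.0
    for t, c in zip(arrival_times, channels):
        rho = rho @ expm(G_quiet * (t - t_prev)) @ D[c]
        norm = rho.sum()
        logL += np.log(norm)     # accumulate in log space to avoid underflow
        rho /= norm
        t_prev = t
    return logL

# tiny made-up two-state example
G = np.array([[-1.2, 1.0], [0.5, -0.7]])                  # leakage = detections
D = {0: np.diag([0.8, 0.2]), 1: np.diag([0.2, 0.8])}
print(log_likelihood([0.1, 0.4, 0.9], [0, 1, 0], G, D))
```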

For these reasons, we have optimized the computational cost with respect to the physical conditions of the system being studied. First, inclusion of the IRF can be parallelized, potentially reducing the time cost to a calculation over a single data acquisition period. In our third companion article (54), dealing with pulsed illumination, we improve the computational cost by assuming that fluorophore relaxation occurs within the window between consecutive pulses, thereby reducing our second-order structure herein to a first-order HMM and allowing for faster computation of the likelihood in pulses where no photon is detected. Furthermore, in (54), we also mitigate the computational cost by assuming a physically motivated timescale separation.

As it stands, our framework applies to smFRET experiments on immobilized molecules. However, it is often the case that molecules labeled with FRET pairs are allowed to diffuse freely through a confocal volume, such as in the study of binding and unbinding events (17), protein-protein interactions (17,65), and unhindered conformational dynamics of freely diffusing proteins (65,92). Photon-by-photon analysis of such data is often based on correlation methods, which suffer from bulk averaging (40,42,43). We believe our framework can be extended, in the spirit of (50,73,93), to learn both the kinetics and diffusion coefficients of single molecules.

In addition, our current framework is restricted to models with discrete system states. However, smFRET can also be used to study systems that are better modeled as continuous, such as intrinsically disordered proteins, which undergo continuous changes not always well approximated by discrete system states (17,94). Adapting the approach of (56) should allow us to generalize this framework to instead infer energy landscapes, perhaps relevant to protein folding (95,96), continuum ratchets as applied to motor protein kinetics (97), and the stress-modified potential landscapes of mechanosensitive molecules (98).

To conclude, we have presented a general framework and demonstrated the importance of incorporating various features into the likelihood while learning full distributions over all unknowns, including the system states. In the following two companion articles (54,55), we specialize our method and computational scheme to continuous and pulsed illumination. We then apply our method to the interactions of the intrinsically disordered proteins NCBD and ACTR under continuous illumination (55), and to the kinetics of the Holliday junction under pulsed illumination (54).

Code availability

The BNP-FRET software package is available on GitHub at https://github.com/LabPresse/BNP-FRET.

Acknowledgments

We thank Weiqing Xu and Dr. Zeliha Kilic for regular feedback and help, especially during the development of the nonparametric samplers, and Dr. Irina Gopich for discussions. S.P. acknowledges support from the NIH NIGMS (R01GM130745) for early efforts in nonparametrics and from the NIH NIGMS (R01GM134426) for single-photon efforts. The bulk of the computations was performed on the Agave and Sol supercomputers at ASU.

Declaration of interests

The authors declare no competing interests.

Editor: Jorg Enderlein.

References

1. Wu L., Huang C., et al. James T.D. Förster resonance energy transfer (FRET)-based small-molecule sensors and imaging agents. Chem. Soc. Rev. 2020;49:5110. doi: 10.1039/c9cs00318e.
2. Roy R., Hohng S., Ha T. A practical guide to single-molecule FRET. Nat. Methods. 2008;5:507. doi: 10.1038/nmeth.1208.
3. Demchenko A.P. Springer Science & Business Media; 2008. Introduction to Fluorescence Sensing.
4. Periasamy A., Day R. Elsevier; 2011. Molecular Imaging: FRET Microscopy and Spectroscopy.
5. Rhoades E., Gussakovsky E., Haran G. Watching proteins fold one molecule at a time. Proc. Natl. Acad. Sci. USA. 2003;100:3197–3202. doi: 10.1073/pnas.2628068100.
6. Martinac B. Single-molecule FRET studies of ion channels. Prog. Biophys. Mol. Biol. 2017;130:192–197. doi: 10.1016/j.pbiomolbio.2017.06.014.
7. Chung H.S., McHale K., et al. Eaton W.A. Single-molecule fluorescence experiments determine protein folding transition path times. Science. 2012;335:981–984. doi: 10.1126/science.1215768.
8. Pressé S., Ghosh K., et al. Dill K.A. Dynamical fluctuations in biochemical reactions and cycles. Phys. Rev. E - Stat. Nonlinear Soft Matter Phys. 2010;82:031905. doi: 10.1103/PhysRevE.82.031905.
9. Schuler B. Single-molecule FRET of protein structure and dynamics - a primer. J. Nanobiotechnol. 2013;11(S2):S2. doi: 10.1186/1477-3155-11-S1-S2.
10. Coban O., Zanetti-Dominguez L.C., et al. Ng T. Effect of phosphorylation on EGFR dimer stability probed by single-molecule dynamics and FRET/FLIM. Biophys. J. 2015;108:1013–1026. doi: 10.1016/j.bpj.2015.01.005.
11. Halder K., Dölker N., et al. Neumann H. MD simulations and FRET reveal an environment-sensitive conformational plasticity of importin-β. Biophys. J. 2015;109:277–286. doi: 10.1016/j.bpj.2015.06.014.
12. Sabir T., Schröder G.F., et al. Magennis S.W. Global structure of forked DNA in solution revealed by high-resolution single-molecule FRET. J. Am. Chem. Soc. 2011;133:1188–1191. doi: 10.1021/ja108626w.
13. Phelps C., Israels B., Jose D., Marsh M.C., von Hippel P.H., Marcus A.H. Using microsecond single-molecule FRET to determine the assembly pathways of T4 ssDNA binding protein onto model DNA replication forks. Proc. Natl. Acad. Sci. USA. 2017;114:E3612–E3621. doi: 10.1073/pnas.1619819114.
14. Baltierra-Jasso L.E., Morten M.J., et al. Magennis S.W. Crowding-induced hybridization of single DNA hairpins. J. Am. Chem. Soc. 2015;137:16020–16023. doi: 10.1021/jacs.5b11829.
15. Wang Y., Liu Y., et al. Selvin P.R. Single molecule FRET reveals pore size and opening mechanism of a mechano-sensitive ion channel. Elife. 2014;3:e01834. doi: 10.7554/eLife.01834.
16. Yoo J., Kim J.-Y., et al. Chung H.S. Fast three-color single-molecule FRET using statistical inference. Nat. Commun. 2020;11:3336. doi: 10.1038/s41467-020-17149-w.
17. Schuler B., Eaton W.A. Protein folding studied by single-molecule FRET. Curr. Opin. Struct. Biol. 2008;18:16–26. doi: 10.1016/j.sbi.2007.12.003.
18. Kilic S., Felekyan S., et al. Fierz B. Single-molecule FRET reveals multiscale chromatin dynamics modulated by HP1α. Nat. Commun. 2018;9:235. doi: 10.1038/s41467-017-02619-5.
19. Lerner E., Cordes T., et al. Weiss S. Toward dynamic structural biology: Two decades of single-molecule Förster resonance energy transfer. Science. 2018;359:eaan1133. doi: 10.1126/science.aan1133.
20. Förster T. Zwischenmolekulare Energiewanderung und Fluoreszenz. Ann. Phys. 1948;437:55–75.
21. Jones G.A., Bradshaw D.S. Resonance energy transfer: From fundamental theory to recent applications. Front. Physiol. 2019;7:100.
22. Eisaman M.D., Fan J., et al. Polyakov S.V. Invited review article: Single-photon sources and detectors. Rev. Sci. Instrum. 2011;82:071101. doi: 10.1063/1.3610677.
23. McKinney S.A., Joo C., Ha T. Analysis of single-molecule FRET trajectories using hidden Markov modeling. Biophys. J. 2006;91:1941–1951. doi: 10.1529/biophysj.106.082487.
24. Bronson J.E., Fei J., et al. Wiggins C.H. Learning rates and states from biophysical time series: A Bayesian approach to model selection and single-molecule FRET data. Biophys. J. 2009;97:3196–3205. doi: 10.1016/j.bpj.2009.09.031.
25. Sgouralis I., Madaan S., et al. Pressé S. A Bayesian nonparametric approach to single molecule Förster resonance energy transfer. J. Phys. Chem. B. 2019;123:675–688. doi: 10.1021/acs.jpcb.8b09752.
26. Becker W. Springer; 2015. Advanced Time-Correlated Single Photon Counting Applications.
27. Isbaner S., Karedla N., et al. Enderlein J. Dead-time correction of fluorescence lifetime measurements and fluorescence lifetime imaging. Opt. Express. 2016;24(9):9429–9445. doi: 10.1364/OE.24.009429.
28. Rasnik I., McKinney S.A., Ha T. Nonblinking and long-lasting single-molecule fluorescence imaging. Nat. Methods. 2006;3:891–893. doi: 10.1038/nmeth934.
29. Hübner C.G., Renn A., et al. Wild U.P. Direct observation of the triplet lifetime quenching of single dye molecules by molecular oxygen. J. Chem. Phys. 2001;115:9619–9622.
30. Dale R.E., Eisinger J., Blumberg W.E. The orientational freedom of molecular probes. The orientation factor in intramolecular energy transfer. Biophys. J. 1979;26:161–193. doi: 10.1016/S0006-3495(79)85243-1.
31. Schuler B. Single-molecule fluorescence spectroscopy of protein folding. ChemPhysChem. 2005;6:1206–1220. doi: 10.1002/cphc.200400609.
32. Kilic Z., Sgouralis I., et al. Pressé S. Extraction of rapid kinetics from smFRET measurements using integrative detectors. Cell Rep. Phys. Sci. 2021;2:100409. doi: 10.1016/j.xcrp.2021.100409.
33. Gopich I.V., Szabo A. Single-macromolecule fluorescence resonance energy transfer and free-energy profiles. J. Phys. Chem. B. 2003;107:5058–5063.
34. Andrec M., Levy R.M., Talaga D.S. Direct determination of kinetic rates from single-molecule photon arrival trajectories using hidden Markov models. J. Phys. Chem. A. 2003;107:7454–7464. doi: 10.1021/jp035514+.
35. Keller B.G., Kobitski A., et al. Noé F. Complex RNA folding kinetics revealed by single-molecule FRET and hidden Markov models. J. Am. Chem. Soc. 2014;136:4534–4543. doi: 10.1021/ja4098719.
36. Thomsen J., Sletfjerding M.B., et al. Hatzakis N.S. DeepFRET, a software for rapid and automated single-molecule FRET data classification using deep learning. Elife. 2020;9:e60404. doi: 10.7554/eLife.60404.
37. Gopich I.V., Szabo A. Decoding the pattern of photon colors in single-molecule FRET. J. Phys. Chem. B. 2009;113:10965–10973. doi: 10.1021/jp903671p.
38. Kilic Z., Sgouralis I., Pressé S. Generalizing HMMs to continuous time for fast kinetics: Hidden Markov jump processes. Biophys. J. 2021;120:409–423. doi: 10.1016/j.bpj.2020.12.022.
39. Harris P.D., Narducci A., et al. Lerner E. Multi-parameter photon-by-photon hidden Markov modeling. Nat. Commun. 2022;13(1):1000. doi: 10.1038/s41467-022-28632-x.
40. Gopich I.V., Szabo A. Theory of the statistics of kinetic transitions with application to single-molecule enzyme catalysis. J. Chem. Phys. 2006;124:154712. doi: 10.1063/1.2180770.
41. Sgouralis I., Pressé S. An introduction to infinite HMMs for single-molecule data analysis. Biophys. J. 2017;112:2021–2029. doi: 10.1016/j.bpj.2017.04.027.
42. Gopich I.V., Szabo A. Single-molecule FRET with diffusion and conformational dynamics. J. Phys. Chem. B. 2007;111:12925–12932. doi: 10.1021/jp075255e.
43. Pirchi M., Tsukanov R., et al. Nir E. Photon-by-photon hidden Markov model analysis for microsecond single-molecule FRET kinetics. J. Phys. Chem. B. 2016;120:13065–13075. doi: 10.1021/acs.jpcb.6b10726.
44. Lerner E., Ingargiola A., Weiss S. Characterizing highly dynamic conformational states: The transcription bubble in RNAP-promoter open complex as an example. J. Chem. Phys. 2018;148:123315. doi: 10.1063/1.5004606.
45. Lever J., Krzywinski M., Altman N. Model selection and overfitting. Nat. Methods. 2016;13:703–704.
46. Ferguson T.S. A Bayesian analysis of some nonparametric problems. Ann. Stat. 1973;1:209.
47. Gershman S.J., Blei D.M. A tutorial on Bayesian nonparametric models. J. Math. Psychol. 2012;56:1–12.
48. Sgouralis I., Whitmore M., et al. Pressé S. Single molecule force spectroscopy at high data acquisition: A Bayesian nonparametric analysis. J. Chem. Phys. 2018;148:123320. doi: 10.1063/1.5008842.
49. Tavakoli M., Taylor J.N., et al. Pressé S. Single molecule data analysis: An introduction. Adv. Chem. Phys. 2017;162:205–305.
50. Tavakoli M., Jazani S., et al. Pressé S. Pitching single-focus confocal data analysis one photon at a time with Bayesian nonparametrics. Phys. Rev. X. 2020;10:011021. doi: 10.1103/physrevx.10.011021.
51. Tavakoli M., Jazani S., et al. Pressé S. Direct photon-by-photon analysis of time-resolved pulsed excitation data using Bayesian nonparametrics. Cell Rep. Phys. Sci. 2020;1:100234. doi: 10.1016/j.xcrp.2020.100234.
52. Bryan J.S., 4th, Sgouralis I., Pressé S. Diffraction-limited molecular cluster quantification with Bayesian nonparametrics. Nat. Comput. Sci. 2022;2:102–111. doi: 10.1038/s43588-022-00197-1.
53. Fazel M., Jazani S., et al. Pressé S. High resolution fluorescence lifetime maps from minimal photon counts. ACS Photonics. 2022;9:1015–1025. doi: 10.1021/acsphotonics.1c01936.
54. Safar M., Saurabh A., et al. Pressé S. Single photon smFRET. III. Application to pulsed illumination. Biophys. Rep. 2022;2:100088. doi: 10.1016/j.bpr.2022.100088.
55. Saurabh A., Safar M., et al. Pressé S. Single photon smFRET. II. Application to continuous illumination. Biophys. Rep. 2022;3:100087. doi: 10.1016/j.bpr.2022.100087.
56. Bryan J.S., 4th, Basak P., et al. Pressé S. Inferring potential landscapes from noisy trajectories of particles within an optical feedback trap. iScience. 2022;25:104731. doi: 10.1016/j.isci.2022.104731.
57. Bryan J.S., 4th, Sgouralis I., Pressé S. Inferring effective forces for Langevin dynamics using Gaussian processes. J. Chem. Phys. 2020;152:124106. doi: 10.1063/1.5144523.
58. Patel L., Gustafsson N., et al. Cohen E. A hidden Markov model approach to characterizing the photo-switching behavior of fluorophores. Ann. Appl. Stat. 2019;13:1397–1429. doi: 10.1214/19-AOAS1240.
59. Rabiner L.R. A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE. 1989;77:257–286.
60. Mattheyses A.L., Hoppe A.D., Axelrod D. Polarized fluorescence resonance energy transfer microscopy. Biophys. J. 2004;87:2787–2797. doi: 10.1529/biophysj.103.036194.
61. Gordon F., Elcoroaristizabal S., Ryder A.G. Modelling Förster resonance energy transfer (FRET) using anisotropy resolved multi-dimensional emission spectroscopy (ARMES). Biochim. Biophys. Acta Gen. Subj. 2021;1865:129770. doi: 10.1016/j.bbagen.2020.129770.
62. Gordon G.W., Berry G., et al. Herman B. Quantitative fluorescence resonance energy transfer measurements using fluorescence microscopy. Biophys. J. 1998;74:2702–2713. doi: 10.1016/S0006-3495(98)77976-7.
63. Benke S., Holla A., et al. Schuler B. Combining rapid microfluidic mixing and three-color single-molecule FRET for probing the kinetics of protein conformational changes. J. Phys. Chem. B. 2021;125:6617–6628. doi: 10.1021/acs.jpcb.1c02370.
64. Zosel F., Soranno A., et al. Schuler B. Depletion interactions modulate the binding between disordered proteins in crowded environments. Proc. Natl. Acad. Sci. USA. 2020;117:13480–13489. doi: 10.1073/pnas.1921617117.
65. Kapanidis A.N., Laurence T.A., et al. Weiss S. Alternating-laser excitation of single molecules. Acc. Chem. Res. 2005;38:523–533. doi: 10.1021/ar0401348.
66. Dytso A., Poor H.V. Estimation in Poisson noise: Properties of the conditional mean estimator. IEEE Trans. Inf. Theor. 2020;66:4304–4323.
67. Alléaume R., Treussart F., et al. Roch J.-F. Photon statistics characterization of a single-photon source. New J. Phys. 2004;6:85.
68. Brewer J. Kronecker products and matrix calculus in system theory. IEEE Trans. Circ. Syst. 1978;25:772–781.
69. Rollins G.C., Shin J.Y., et al. Pressé S. Stochastic approach to the molecular counting problem in superresolution microscopy. Proc. Natl. Acad. Sci. USA. 2015;112:E110–E118. doi: 10.1073/pnas.1408071112.
70. Gillespie D.T. A general method for numerically simulating the stochastic time evolution of coupled chemical reactions. J. Comput. Phys. 1976;22:403–434.
71. Metropolis N., Rosenbluth A.W., et al. Teller E. Equation of state calculations by fast computing machines. J. Chem. Phys. 1953;21:1087–1092.
72. Gelman A., Gilks W.R., Roberts G.O. Weak convergence and optimal scaling of random walk Metropolis algorithms. Ann. Appl. Probab. 1997;7:110.
73. Jazani S., Sgouralis I., et al. Pressé S. An alternative framework for fluorescence correlation spectroscopy. Nat. Commun. 2019;10:3662. doi: 10.1038/s41467-019-11574-2.
74. Ghahramani Z., Griffiths T. Infinite latent feature models and the Indian buffet process. Adv. Neural Inf. Process. Syst. 2005;18.
75. Thibaux R., Jordan M.I. Artificial Intelligence and Statistics. PMLR; 2007. Hierarchical beta processes and the Indian buffet process; p. 564.
76. Paisley J., Lawrence C. Proceedings of the 26th Annual International Conference on Machine Learning. 2009. Nonparametric factor analysis with beta process priors; p. 777.
77. Al Labadi L., Zarepour M. On approximations of the beta process in latent feature models: Point processes approach. Sankhya. 2018;80:59–79.
78. Fazel M., Alexander V., et al. Pressé S. Fluorescence lifetime: Beating the IRF and interpulse window. bioRxiv. 2022. Preprint. doi: 10.1101/2022.09.08.507224.
79. Teh Y.W., Jordan M.I., et al. Blei D.M. Hierarchical Dirichlet processes. J. Am. Stat. Assoc. 2006;101:1566–1581.
80. Pitman J. Poisson–Dirichlet and GEM invariant distributions for split-and-merge transformations of an interval partition. Combinator. Probab. Comput. 2002;11:501–514.
81. Sethuraman J. A constructive definition of Dirichlet priors. Stat. Sin. 1994;4:639.
82. Neal R.M. Markov chain sampling methods for Dirichlet process mixture models. J. Comput. Graph Stat. 2000;9:249.
83. Gelfand A.E., Kottas A., MacEachern S.N. Bayesian nonparametric spatial modeling with Dirichlet process mixing. J. Am. Stat. Assoc. 2005;100:1021–1035.
84. Zosel F., Mercadante D., et al. Schuler B. A proline switch explains kinetic heterogeneity in a coupled folding and binding reaction. Nat. Commun. 2018;9:3332. doi: 10.1038/s41467-018-05725-0.
85. Sanborn M.E., Connolly B.K., et al. Levitus M. Fluorescence properties and photophysics of the sulfoindocyanine Cy3 linked covalently to DNA. J. Phys. Chem. B. 2007;111:11064–11074. doi: 10.1021/jp072912u.
86. Hohlbein J., Craggs T.D., Cordes T. Alternating-laser excitation: Single-molecule FRET and beyond. Chem. Soc. Rev. 2014;43:1156–1171. doi: 10.1039/c3cs60233h.
87. Sun Y., Wallrabe H., et al. Periasamy A. Three-color spectral FRET microscopy localizes three interacting proteins in living cells. Biophys. J. 2010;99:1274–1283. doi: 10.1016/j.bpj.2010.06.004.
88. Clamme J.-P., Deniz A.A. Three-color single-molecule fluorescence resonance energy transfer. ChemPhysChem. 2005;6:74–77. doi: 10.1002/cphc.200400261.
89. Hohng S., Joo C., Ha T. Single-molecule three-color FRET. Biophys. J. 2004;87:1328–1337. doi: 10.1529/biophysj.104.043935.
90. Pressé S., Peterson J., et al. Dill K. Single molecule conformational memory extraction: P5ab RNA hairpin. J. Phys. Chem. B. 2014;118:6597–6603. doi: 10.1021/jp500611f.
91. Pressé S., Lee J., Dill K.A. Extracting conformational memory from single-molecule kinetic data. J. Phys. Chem. B. 2013;117:495–502. doi: 10.1021/jp309420u.
92. Deniz A.A., Dahan M., et al. Schultz P.G. Single-pair fluorescence resonance energy transfer on freely diffusing molecules: Observation of Förster distance dependence and subpopulations. Proc. Natl. Acad. Sci. USA. 1999;96:3670–3675. doi: 10.1073/pnas.96.7.3670.
93. Jazani S., Sgouralis I., Pressé S. A method for single molecule tracking using a conventional single-focus confocal setup. J. Chem. Phys. 2019;150:114108. doi: 10.1063/1.5083869.
94. Schuler B., Hofmann H. Single-molecule spectroscopy of protein folding dynamics—expanding scope and timescales. Curr. Opin. Struct. Biol. 2013;23:36–47. doi: 10.1016/j.sbi.2012.10.008.
95. Kirmizialtin S., Huang L., Makarov D.E. Topography of the free-energy landscape probed via mechanical unfolding of proteins. J. Chem. Phys. 2005;122:234915. doi: 10.1063/1.1931659.
96. Shoemaker B.A., Wang J., Wolynes P.G. Structural correlations in protein folding funnels. Proc. Natl. Acad. Sci. USA. 1997;94:777–782. doi: 10.1073/pnas.94.3.777.
97. Kolomeisky A.B., Fisher M.E. Molecular motors: A theorist’s perspective. Annu. Rev. Phys. Chem. 2007;58:675–695. doi: 10.1146/annurev.physchem.58.032806.104532.
98. Konda S.S.M., Avdoshenko S.M., Makarov D.E. Exploring the topography of the stress-modified energy landscapes of mechanosensitive molecules. J. Chem. Phys. 2014;140:104114. doi: 10.1063/1.4867500.
