Abstract
Single particle tracking (SPT) is a powerful class of methods for studying the dynamics of biomolecules inside living cells. The techniques reveal both the trajectories of individual particles, with a resolution well below the diffraction limit of light, and the parameters defining the motion model, such as diffusion coefficients and confinement lengths. Existing algorithms assume these parameters are constant throughout an experiment. However, it has been demonstrated that they often vary with time as the tracked particles move through different regions in the cell or as conditions inside the cell change in response to stimuli. In this work we apply local Maximum Likelihood (ML) estimation, combined with change detection, to SPT. Local ML uses a sliding window over the data, estimating the model parameters in each window. Once we have found the values of the parameters before and after the change, we apply offline change detection to estimate the time of the change. We then re-estimate the parameters and show that this improves the estimates of key SPT parameters. Preliminary results using simulated data with a basic diffusion model with additive Gaussian noise show that our proposed algorithm is able to track abrupt changes in the parameters as they evolve during a trajectory.
I. INTRODUCTION
Single Particle Tracking (SPT) is a class of experimental techniques and mathematical algorithms for following small (less than 100 nm) biological macromolecules moving inside living cells, including viruses, proteins, and strands of RNA [1]. By labeling the particle with a fluorescent reporter such as a quantum dot or a fluorescent protein, the motion of the labeled object can be resolved with nanometer-scale spatial resolution. While there are many different schemes, the general paradigm in SPT involves capturing a series of wide-field fluorescence images, localizing the fluorescent particle in each frame and linking across frames to form a trajectory, and then analyzing that trajectory to estimate motion model parameters. Motion at the nanometer-scale is dominated by stochasticity with a variety of motion models relevant to describing the dynamics of single particles inside living cells, including (among others) free diffusion, confined diffusion, directed motion, and anomalous diffusion [2].
This general SPT paradigm can be described as producing noisy observations of a stochastic process (for example, a trajectory obtained by localizing a fluorescent particle in each frame of an image sequence). The most common technique for estimating the motion model parameters is to fit the Mean Square Displacement (MSD) curve of the data with a specific model. While this approach is simple and popular, its results depend on choices such as the number of points to use in the fit, and it does not account for many factors, including observation noise, motion blur arising from camera integration times, and other experimental realities [3]. Despite these limitations, this approach has been enormously successful in probing biomolecular dynamics [4].
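As a point of reference for the MSD approach described above, the following is a minimal sketch of an MSD curve and a straight-line fit for a one-dimensional trajectory. The function names `mean_square_displacement` and `fit_line` are illustrative and do not come from the cited works. Under free diffusion with additive localization noise and no motion blur, the fitted slope per lag approximates 2DΔt and the intercept approximates twice the localization noise variance.

```python
def mean_square_displacement(y, max_lag):
    """Return [MSD(1), ..., MSD(max_lag)] for a 1-D trajectory y."""
    n = len(y)
    if not 1 <= max_lag < n:
        raise ValueError("max_lag must satisfy 1 <= max_lag < len(y)")
    msd = []
    for lag in range(1, max_lag + 1):
        # Average the squared displacement over all pairs separated by `lag`.
        squared = [(y[k + lag] - y[k]) ** 2 for k in range(n - lag)]
        msd.append(sum(squared) / len(squared))
    return msd


def fit_line(lags, values):
    """Ordinary least-squares fit values ~ slope * lag + intercept."""
    m = len(lags)
    mean_x = sum(lags) / m
    mean_v = sum(values) / m
    sxx = sum((x - mean_x) ** 2 for x in lags)
    sxv = sum((x - mean_x) * (v - mean_v) for x, v in zip(lags, values))
    slope = sxv / sxx
    return slope, mean_v - slope * mean_x
```

The number of lags passed to `fit_line` is exactly the kind of user choice that the text notes can change the result.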
For diffusion models, maximum likelihood (ML) estimation has been established for estimating motion model parameters. It places the estimation on firm theoretical grounding, ensures that the analysis is both consistent and asymptotically efficient, and provides a rigorous understanding of the accuracy and performance of the estimator [3], [5]–[7]. Current schemes, however, assume that the model parameters, while unknown, are time invariant. This assumption does not always hold, as the parameters, or even the model itself, may change over time depending on the specific biological application being studied (see, e.g., [8]).
To begin to address this issue, one of the authors used a jump Markov model to allow for different parameter values, and even different motion models, at different stages of a given trajectory [9]. This approach, however, does not allow for continuously varying parameters and assumes a probabilistic structure on the changes in the model that may be non-physical. Recently, the authors applied local ML estimation together with Expectation Maximization (EM) [10]. This approach handles both issues by not assuming any underlying probabilistic model and by allowing for continuous variation in parameters. However, the windowed approach leads to a tradeoff between the quality of the estimation (which improves with longer windows) and the sensitivity to, and detection of, a change (which improves with shorter windows).
In this work we develop a two-step estimation algorithm based on local likelihood estimation and change detection techniques such as the likelihood ratio test. We focus on a pure diffusion model with time-varying diffusion coefficients. The algorithm works as follows: (i) we use a sliding window approach, estimating a time-invariant parameter model in each window to trace out time-varying parameters; (ii) under the assumption that the model parameters shift between fixed values, we use the results of (i) in a change-detection scheme to estimate the time at which the model shifted. We can then go back and re-estimate the model parameter in each time block between the changes, using the largest possible window to produce the most accurate estimate.
In addition to adding change detection and re-estimation, our approach here differs from our earlier work in [10] by replacing EM with a concentrated likelihood function that can be efficiently optimized using Newton-like algorithms. This yields an algorithm that is significantly more efficient in terms of computational complexity. The tradeoff, however, is that the likelihood-based scheme does not extend to motion and observation models significantly more complicated than simple diffusion with Gaussian observation noise, while the EM scheme is applicable to fairly broad classes of models (see, e.g., [11]). The change detection is carried out using a well-known technique based on the likelihood ratio test; see, e.g., [12]–[14] and the references therein.
In our prior work, we also did not consider the initialization step. Because the optimization is numerical in nature, having a good starting point is a key element in producing accurate results. Here we discuss the selection of the initial value in a systematic way based on the spectral factorization of an equivalent model.
The remainder of this paper is organized as follows: in Sec. II we give some background on SPT; in Sec. III we pose the maximum likelihood problem for the time-invariant case using the concentrated likelihood function; in Sec. IV we develop our 2-step time-varying approach optimizing it directly with Newton-like methods; in Sec. V, we demonstrate the scheme through numerical simulations; finally, Sec. VI gives conclusions and future work.
II. SPT model
There are a variety of approaches to acquiring data in SPT and a myriad of algorithms to process that data. The standard paradigm, illustrated in Fig. 1, is to collect a series of images from a CCD camera, localize the particle in each of the images, link the localizations into a trajectory, and then analyze the trajectory for motion model parameters, typically by fitting a curve to the MSD. Localization can be done using a variety of different algorithms, ranging from simple centroid calculations through full ML estimation [15]–[17]. The most common remains a fit of the intensity in the image to a simple Gaussian profile, yielding an accuracy on the order of 10 nm, well below the diffraction limit of light.
Fig. 1.
Illustration of a generic SPT scheme. (left) Data, often in the form of CCD camera images, are processed using a localization algorithm to produce (center) trajectories, which are then analyzed to produce (right) the MSD curve, which is fit to a model to yield an estimate.
Once given a trajectory, analysis typically proceeds by selecting a parameterized model and using the trajectory to identify those parameters. In this work we focus on simple diffusion, given by
$$x_{t+1} = x_t + w_t, \qquad y_t = x_t + n_t \tag{1}$$
where xt is the particle position, yt is its measured position, wt ∼ N(0, Q) and nt ∼ N(0, R) are independent white noise processes, Q = 2DΔt is the variance of the process noise defined by the diffusion coefficient D and the sampling time Δt, and R is the variance of the measurement noise as generated by a variety of processes, including shot noise due to the physics of photon generation in fluorescence and read-out noise in the camera. Despite the complex reality of measurement noise, it is often modeled as a simple Gaussian white noise process. The Gaussian approximation works well at high signal-to-noise levels but breaks down at low intensities, necessitating a more complex description of the data [11]. In general, the measurement noise R has reasonably static statistics, as it is driven primarily by the experimental equipment. The process noise Q, however, can vary as the particles move into different regions of the cell with different local environments, interact with different species in their surroundings, or go through biochemical changes due to the natural activity in the cell. This insight sets up the conditions for the examples given in a later section.
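To make the model concrete, the sketch below simulates a one-dimensional trajectory from a random-walk state observed in additive Gaussian noise, with Q = 2DΔt as defined above. The function name `simulate_diffusion` and its arguments are illustrative choices, not part of the original work.

```python
import math
import random


def simulate_diffusion(n_steps, D, dt, R, x0=0.0, seed=None):
    """Simulate true positions xs and noisy observations ys.

    State:       x[t+1] = x[t] + w[t],  w ~ N(0, Q),  Q = 2 * D * dt
    Observation: y[t]   = x[t] + n[t],  n ~ N(0, R)
    """
    rng = random.Random(seed)
    Q = 2.0 * D * dt
    x = x0
    xs, ys = [], []
    for _ in range(n_steps):
        xs.append(x)
        ys.append(x + rng.gauss(0.0, math.sqrt(R)))  # measurement noise
        x = x + rng.gauss(0.0, math.sqrt(Q))         # diffusive step
    return xs, ys
```

Passing a fixed `seed` makes a run reproducible, which is convenient when comparing estimators on the same trajectory.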
Our goal is to develop an algorithm that can handle motion models where the diffusion coefficients change abruptly, estimating accurately both the diffusion parameters and the time points of the change. To this end, we propose a two-step estimation algorithm. The first step consists of estimating the values before and after the change, and the second step is to use change detection techniques to estimate the time of the change.
In what follows we will use both Q = Q(t), R = R(t), and Qt and Rt to denote the time-varying nature of the parameters. Note that the approach flexibly handles either R or Q, or both, being time-varying.
III. Inference problem
The parameters of the system in (1) can be estimated using ML estimation. One simple way to solve the ML problem is to optimize the likelihood function iteratively. This provides a systematic way to obtain parameter estimates and can be done, for example, with the Expectation-Maximization (EM) algorithm. This approach was taken in [10], where the parameters were estimated using the EM algorithm in a local ML setting. The approach is general: with small modifications, it can estimate all the parameters of a linear system, including the dynamics contained in A, the observation matrix C, the initial state x0, and, in particular, the variances Q and R. This preliminary result showed that time-varying parameters can be tracked in the context of SPT. However, if we are only interested in tracking time-varying diffusion constants, we can do so directly by posing the likelihood function and optimizing it locally, which is more straightforward and has a lower computational load.
A. The likelihood function for the SPT model
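The likelihood function can be written as:
$$L(\beta) = p(Y_N \mid \beta) \tag{2}$$
where p is a probability density function and β is the parameter. For the system considered in this work, it is well known, see e.g. [18], that the joint probability of the observed data YN is given by
$$p(Y_N \mid \beta) = \prod_{t=1}^{N} p(y_t \mid Y_{t-1}, \beta) \tag{3}$$
where β = [Q R]T. Notice that in the context of this work, Q and R are scalars.
The log-likelihood function can then be written as:
$$l(\beta) = -\frac{N}{2}\log 2\pi - \frac{1}{2}\sum_{t=1}^{N}\log |F_t| - \frac{1}{2}\sum_{t=1}^{N} v_t^{T} F_t^{-1} v_t \tag{4}$$
where vt is the residual of the 1-step-ahead prediction, and Ft is its covariance matrix, defined through:
$$\begin{aligned} v_t &= y_t - \hat{x}_{t|t-1}, & F_t &= P_{t|t-1} + R, \\ K_t &= P_{t|t-1} F_t^{-1}, & \hat{x}_{t+1|t} &= \hat{x}_{t|t-1} + K_t v_t, \\ P_{t+1|t} &= (1 - K_t) P_{t|t-1} + Q \end{aligned} \tag{5}$$
Notice that (5) are the typical Kalman filter updates with A = C = 1, and that Q and R are embedded in Ft. Writing (4) in terms of the unknown parameters, and dropping the constants, we find
$$l(\beta) = -\frac{1}{2}\sum_{t=1}^{N}\left(\log F_t(\beta) + \frac{v_t(\beta)^2}{F_t(\beta)}\right) \tag{6}$$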
The optimization of the likelihood in (6) can be carried out by using iterative methods such as the EM algorithm. However, since we are only interested in Q and R, we can optimize it directly with Newton-like methods.
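The evaluation of the log-likelihood through the Kalman filter can be sketched as follows for the scalar case A = C = 1. This is a minimal illustration, not the authors' implementation: the name `kalman_loglik` is hypothetical, and the filter is initialized by conditioning on the first observation (a diffuse-prior assumption), so the sum runs over the remaining samples.

```python
import math


def kalman_loglik(y, Q, R):
    """Log-likelihood of y (constants dropped) for a random walk in noise.

    Returns the log-likelihood together with the innovations v_t and
    their variances F_t. Assumes the first sample fixes the initial state.
    """
    x_pred = y[0]        # predicted state given the first observation
    P_pred = Q + R       # its variance: R from the observation, Q from one step
    loglik = 0.0
    innovations, variances = [], []
    for obs in y[1:]:
        v = obs - x_pred             # 1-step-ahead prediction residual
        F = P_pred + R               # residual variance
        K = P_pred / F               # Kalman gain
        loglik += -0.5 * (math.log(F) + v * v / F)
        x_pred = x_pred + K * v      # predicted state for the next sample
        P_pred = P_pred * (1.0 - K) + Q
        innovations.append(v)
        variances.append(F)
    return loglik, innovations, variances
```

Because Q and R enter only through the recursion, each evaluation costs one pass over the data, which is what makes direct numerical optimization inexpensive.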
B. Newton-like methods
An alternative to the EM algorithm for optimizing the log-likelihood function in (4) is to use Newton-like optimization algorithms [19], [20]. The EM algorithm is typically the preferred option due to a number of advantages. However, Newton-like algorithms offer a faster convergence rate [21, pp. 358–359] when the step size is calculated using the likelihood function. Hence, in this section, we study an alternative method for solving the ML estimation problem in (6).
Using a Newton-like optimization method, (6) is solved via the following iterative procedure:
$$\beta^{(i+1)} = \beta^{(i)} + \delta_i \rho_i \tag{7}$$
where ρi is the search direction and δi is the step size at the i-th iteration. The search direction is given by
$$\rho_i = -B_i g_i \tag{8}$$
where the d-dimensional vector gi denotes the gradient of l(β) at β(i). In the Newton method, the d×d matrix Bi is defined as the inverse of the Hessian of l(β) at β(i).
An alternative to the Newton method is a family of so-called quasi-Newton methods. In these methods, Bi is defined as an approximation to the inverse of the Hessian, which is iteratively computed, along with the iterations (7). In this work we build Bi using the BFGS formula [22].
Both the Newton and the quasi-Newton methods require the computation of the gradient gi of l(β). In view of (6), we have that
| (9) |
where
| (10) |
| (11) |
The step size δi is obtained by carrying out a line search for a local maximum along the search direction.
It is important to note that one can also choose the step size δi in the quasi-Newton (QN) method based on the auxiliary function Q(β, β(i)) used in the EM algorithm. In this case, the estimate provided by the QN method can, in the best case, be as good as the one obtained by the closed-form solution in the EM-based algorithm given in [10]. The advantage of the QN method proposed here arises when the step size δi is obtained from the log-likelihood function l(β) rather than from the auxiliary function Q(β, β(i)). Different combinations of the QN and EM methods have been explored in the literature, see e.g. [19], [23], [24]; a comparison among all of them is beyond the scope of this paper.
For our time-invariant case, we can describe the algorithm as follows:
Algorithm 1:
1) Choose an initial estimate β(0). Then, for i = 0, 1, …,
2) calculate, using the Kalman filter equations, the expressions in (5);
3) obtain a new value of the parameter, β(i+1);
4) go to step 2) and iterate until the parameter estimates converge.
The above algorithm offers fast convergence and the nice properties of ML estimates. However, we are interested in optimizing a likelihood function which is time-varying, that is, the parameter vector β has the form β = β(t).
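The iteration of Algorithm 1 can be sketched as below. This is one possible realization under stated assumptions rather than the authors' code: the gradient is approximated by central finite differences instead of analytic expressions, the parameters are optimized in log-space so that Q and R stay positive, the matrix `B` approximates the inverse Hessian of −l (so the ascent direction is `B g`), and the step size comes from a backtracking line search on the log-likelihood itself. All function names are illustrative.

```python
import math


def loglik(theta, y):
    """Log-likelihood (constants dropped) at theta = (log Q, log R)."""
    Q, R = math.exp(theta[0]), math.exp(theta[1])
    x_pred, P_pred, total = y[0], Q + R, 0.0
    for obs in y[1:]:
        v = obs - x_pred                     # innovation
        F = P_pred + R                       # innovation variance
        total -= 0.5 * (math.log(F) + v * v / F)
        K = P_pred / F
        x_pred += K * v
        P_pred = P_pred * (1.0 - K) + Q
    return total


def gradient(theta, y, eps=1e-5):
    """Central finite-difference gradient of loglik (an approximation)."""
    g = []
    for j in range(2):
        up, dn = list(theta), list(theta)
        up[j] += eps
        dn[j] -= eps
        g.append((loglik(up, y) - loglik(dn, y)) / (2.0 * eps))
    return g


def bfgs_update(B, s, d):
    """BFGS update of B given step s and gradient change d = g_old - g_new."""
    sd = s[0] * d[0] + s[1] * d[1]
    if sd <= 1e-12:                          # curvature condition failed: keep B
        return B
    r = 1.0 / sd
    A = [[(1.0 if i == j else 0.0) - r * s[i] * d[j] for j in range(2)]
         for i in range(2)]
    AB = [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
          for i in range(2)]
    return [[sum(AB[i][k] * A[j][k] for k in range(2)) + r * s[i] * s[j]
             for j in range(2)] for i in range(2)]


def maximize_loglik(y, Q0, R0, max_iter=100, tol=1e-6):
    """Quasi-Newton ascent on the log-likelihood; returns (Q_hat, R_hat)."""
    theta = [math.log(Q0), math.log(R0)]
    B = [[1.0, 0.0], [0.0, 1.0]]
    g = gradient(theta, y)
    for _ in range(max_iter):
        rho = [B[0][0] * g[0] + B[0][1] * g[1], B[1][0] * g[0] + B[1][1] * g[1]]
        biggest = max(abs(rho[0]), abs(rho[1]))
        if biggest > 5.0:                    # cap the move in log-space
            rho = [5.0 * r / biggest for r in rho]
        f0, step, new = loglik(theta, y), 1.0, None
        while step > 1e-10:                  # backtracking line search on l
            trial = [theta[0] + step * rho[0], theta[1] + step * rho[1]]
            if loglik(trial, y) > f0:
                new = trial
                break
            step *= 0.5
        if new is None:                      # no ascent step found: stop
            break
        g_new = gradient(new, y)
        s = [new[0] - theta[0], new[1] - theta[1]]
        d = [g[0] - g_new[0], g[1] - g_new[1]]
        B = bfgs_update(B, s, d)
        theta, g = new, g_new
        if max(abs(s[0]), abs(s[1])) < tol:
            break
    return math.exp(theta[0]), math.exp(theta[1])
```

Since every accepted step strictly increases the log-likelihood, the returned estimate is never worse than the starting point, although, as with any local method, the result can depend on the initial value.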
C. Calculation of the initial value
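We consider the diffusion model (1). With a simple manipulation, we obtain the equivalent model
$$\Delta y_k = w_k + (z - 1)\, n_k \tag{12}$$
where z is the shift operator, Δyk is defined as yk+1 − yk, and wk and nk are the process and measurement noise of (1), with variances Q and R, respectively. It is straightforward to see that (12) resembles a moving average (MA) model of order 1; hence, we can write its spectral factorization, see e.g. [25]:
$$\Phi_{\Delta y}(z) = Q + R\,\frac{C(z)\,C(z^{-*})}{A(z)\,A(z^{-*})} \tag{13}$$
where C(z) = (z − 1), A(z) = 1, and C(z−∗) = C(z−1).
Developing the terms in (13), we obtain
$$\Phi_{\Delta y}(z) = Q + R\,(z - 1)(z^{-1} - 1) = (Q + 2R) - R\,(z + z^{-1}) \tag{14}$$
On the other hand, since our model is an MA(1), we can find an equivalent MA(1) of the form
$$\Delta y_k = (1 + c\,z^{-1})\,\epsilon_k \tag{15}$$
where ϵk ∼ N(0, σϵ²), with c and σϵ² being unknown parameters.
Similarly, we can calculate the spectrum of the MA(1) model given in (15), obtaining
$$\Phi_{\Delta y}(z) = \sigma_\epsilon^2\,(1 + c\,z^{-1})(1 + c\,z) = \sigma_\epsilon^2\,(1 + c^2) + c\,\sigma_\epsilon^2\,(z + z^{-1}) \tag{16}$$
Given that (14) and (16) describe the same spectrum, we can equate their coefficients, Q + 2R = σϵ²(1 + c²) and −R = cσϵ², and from these obtain an expression for the initial estimates of Q and R based on the estimates of c and σϵ² as follows:
$$\hat{Q} = \hat{\sigma}_\epsilon^2\,(1 + \hat{c})^2, \qquad \hat{R} = -\hat{c}\,\hat{\sigma}_\epsilon^2 \tag{17}$$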
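Notice that now we only need to obtain estimates for c and σϵ², which so far are unknown. We can apply, for example, prediction error methods (PEM), which are well suited to the problem at hand. In PEM, we obtain the estimates for c and σϵ² as
$$\hat{c} = \arg\min_{c}\ \frac{1}{N}\sum_{k=1}^{N} \epsilon_k(c)^2, \qquad \hat{\sigma}_\epsilon^2 = \frac{1}{N}\sum_{k=1}^{N} \epsilon_k(\hat{c})^2 \tag{18}$$
where $\epsilon_k(c) = \Delta y_k - \Delta\hat{y}_{k|k-1}(c)$, and $\Delta\hat{y}_{k|k-1}$ is the 1-step-ahead predictor.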
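The mapping from MA(1) parameters to initial values of Q and R can be sketched as follows. As a simplification, and as an assumption of this sketch rather than the procedure in the text, the MA(1) parameters are obtained by matching the lag-0 and lag-1 sample autocovariances of the increments instead of running a prediction error method. The relations used are r0 = σϵ²(1 + c²) and r1 = cσϵ², followed by Q = σϵ²(1 + c)² and R = −cσϵ². All function names are illustrative.

```python
import math


def increment_autocov(y):
    """Lag-0 and lag-1 sample autocovariances of the increments of y."""
    d = [b - a for a, b in zip(y, y[1:])]   # zero-mean under pure diffusion
    n = len(d)
    r0 = sum(v * v for v in d) / n
    r1 = sum(d[k] * d[k + 1] for k in range(n - 1)) / n
    return r0, r1


def ma1_from_autocov(r0, r1):
    """Invertible MA(1) parameters (c, sigma2) matching r0 and r1."""
    if r1 == 0.0:
        return 0.0, r0
    ratio = r0 / (2.0 * r1)
    disc = ratio * ratio - 1.0
    if disc < 0.0:
        raise ValueError("no real MA(1) factorization: need |r1| <= r0 / 2")
    # c solves c^2 - (r0 / r1) c + 1 = 0; keep the root with |c| <= 1.
    c = ratio - math.copysign(math.sqrt(disc), ratio)
    return c, r1 / c


def initial_qr(c, sigma2):
    """Initial (Q, R) from MA(1) parameters of the increment process."""
    return sigma2 * (1.0 + c) ** 2, -c * sigma2
```

For example, autocovariances r0 = 2 and r1 = −0.5 correspond to Q = 1 and R = 0.5. With short or noisy records the sample r1 can be positive, which would give a negative R, so a practical routine would need to guard against that case.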
IV. Time-varying case
In this section, we extend the previous algorithm to trace time-varying parameters. The idea is to pose a time-varying likelihood which is time-invariant in a window of a nominated span. The likelihood is then optimized in this window, and the process is carried out by leaving one sample out and taking a new one in order to keep the same window size. The process finishes when we have included the last available sample.
A. Time-varying ML estimation
Our time-varying likelihood can be written as:
| (19) |
where the kernel factor acts as a weight, k indexes the points within the window, h is a previously defined window length, and f(yk|β) is given by
| (20) |
Usually t refers to the middle point of the window. As we explained in [10], the idea is to use a window of nominated span h and time point k to estimate the parameters that maximize the log-likelihood function within the window. The parameter estimate corresponds to the middle of the window. The estimation algorithm continues by shifting the time points by one unit and finishes when the last data point is included in the window.
To optimize the likelihood within the window, we apply Algorithm 1. We note that all these operations are carried out within the window h.
An important consideration is the choice of window. To avoid the Gibbs ringing phenomenon, windows with rounded edges should be used, see e.g. [26]. An example of such a window is given in (21).
| (21) |
where K(v) ≥ 0 with
The windowed log-likelihood function lt is essentially the pointwise product of the time-invariant function l(β) and the window function in the time domain. Note that the idea here is that the parameter is fixed within the window, while the imposition of the kernel delivers a time-varying parameter estimate.
The time-varying estimation algorithm can be summarized as follows:
Algorithm 2:
1) Select a window of nominated span h. A good starting point is h = hoN with ho = 0.1, where N is the total number of data points.
2) Select a kernel function with rounded edges, e.g. (21).
3) Select an initial value of the parameter, βo, to start the algorithm (see (17)).
4) Run Algorithm 1 until convergence within the window h. The result is the ML estimate within the window.
5) Drop one sample from the window and add a new sample.
6) Repeat steps 3) to 5) until the last sample is included.
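The sliding-window procedure can be sketched as follows. Several choices here are assumptions made only to keep the example short: the kernel is an Epanechnikov window (the paper's own kernel is not reproduced), the Kalman filter is restarted inside each window, and the maximization over (Q, R) is a plain grid search standing in for the Newton-like iteration of Algorithm 1. The function names and the grids are hypothetical.

```python
import math


def epanechnikov(u):
    """Rounded-edge kernel weight; zero outside |u| < 1."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def weighted_loglik(segment, weights, Q, R):
    """Kernel-weighted log-likelihood of one window (constants dropped)."""
    x_pred, P_pred, total = segment[0], Q + R, 0.0
    for obs, w in zip(segment[1:], weights[1:]):
        v = obs - x_pred
        F = P_pred + R
        total -= 0.5 * w * (math.log(F) + v * v / F)
        K = P_pred / F
        x_pred += K * v
        P_pred = P_pred * (1.0 - K) + Q
    return total


def local_ml(y, h, Q_grid, R_grid):
    """Return a list of (t, Q_hat, R_hat), one entry per window centre t.

    Each window holds 2 * (h // 2) + 1 samples centred on t.
    """
    half = h // 2
    weights = [epanechnikov((k - half) / (half + 1.0))
               for k in range(2 * half + 1)]
    estimates = []
    for t in range(half, len(y) - half):
        segment = y[t - half: t + half + 1]
        # Grid search over candidate (Q, R) pairs for this window.
        best = max((weighted_loglik(segment, weights, Q, R), Q, R)
                   for Q in Q_grid for R in R_grid)
        estimates.append((t, best[1], best[2]))
    return estimates
```

The span h controls the tradeoff noted earlier: a longer window averages more samples per estimate, while a shorter one reacts faster to a change.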
B. Change detection
Change detection is a widespread topic in the systems and control community. Much research has been done, mainly in the 70s and 80s, see e.g. [12], [13] and the references therein.
Change detection can be carried out online or offline, and also used to detect additive and non-additive changes.
The problem of change detection can be stated simply as follows: given a sequence of random variables {yk}k=1:N with a probability density p(YN|β) (depending upon a parameter β), find the unknown time tc such that β = βo for t < tc and β = β1 (≠ βo) for t ≥ tc.
The above question can be answered online or offline. Either way, we need to know the value of the parameter before the change occurs. To do offline change detection, we also need to know the value of the parameter after the change has occurred. For online detection, β1 need not be known, because it can also be estimated.
In our problem, there is no need to do online change detection, that is, we will focus on the offline case. We also notice that for our system, we want to detect a change that is nonadditive, that is, a change in the variance. A sufficient statistic to detect nonadditive changes is the likelihood ratio test, which is a function of the residuals of models before the change and after the change.
To apply the likelihood ratio test, we define
$$g_k = \left(g_{k-1} + \ln\frac{p(y_k \mid Y_{k-1}, \beta_1)}{p(y_k \mid Y_{k-1}, \beta_o)}\right)^{+} \tag{22}$$
where Yk−1 is the vector containing data up to k−1, (a)+ = max{a,0}, and βo and β1 are the parameter values before and after the change, respectively. In practice, we do not know the exact values of βo and β1; hence, we use their estimates, which are obtained using Algorithm 2.
The change detection algorithm is completed by checking the following threshold:
$$g_k \ge \lambda \tag{23}$$
where λ is a value pre-defined by the user.
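A likelihood-ratio change detector of the cumulative-sum type can be sketched as below. This is an illustration under assumptions, not the authors' implementation: the conditional densities p(yk | Yk−1, β) are taken from two scalar Kalman filters run over the whole record, one per hypothesis, the statistic starts at zero, and the function returns the first index at which the statistic reaches the threshold. The names `one_step_logpdf` and `cusum_change_time` are hypothetical.

```python
import math


def one_step_logpdf(y, Q, R):
    """log p(y_k | Y_{k-1}, beta) for k = 1, ..., N-1 from a Kalman filter."""
    x_pred, P_pred, out = y[0], Q + R, []
    for obs in y[1:]:
        v = obs - x_pred
        F = P_pred + R
        out.append(-0.5 * (math.log(2.0 * math.pi * F) + v * v / F))
        K = P_pred / F
        x_pred += K * v
        P_pred = P_pred * (1.0 - K) + Q
    return out


def cusum_change_time(y, beta0, beta1, lam):
    """Return (alarm index or None, path of the decision statistic).

    beta0 and beta1 are (Q, R) pairs before and after the change.
    """
    lp0 = one_step_logpdf(y, *beta0)
    lp1 = one_step_logpdf(y, *beta1)
    g, path, alarm = 0.0, [], None
    for k, (a, b) in enumerate(zip(lp0, lp1), start=1):
        g = max(g + (b - a), 0.0)     # accumulate log-likelihood ratio, clip at 0
        path.append(g)
        if alarm is None and g >= lam:
            alarm = k                 # first sample index crossing the threshold
    return alarm, path
```

A larger threshold `lam` lowers the chance of a false alarm at the cost of a longer detection delay.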
There are also other methods that are not based on the likelihood ratio. These methods are referred to as non-likelihood-based algorithms. Their advantage is that they are simpler than those based on the likelihood, and they are efficient from the statistical inference point of view, see e.g. [12]. However, we will not use them here since they are outside the scope of this paper.
C. Refining the estimation
In this section, we propose a combined algorithm to refine the estimation of diffusion constants in the SPT framework. The algorithm combines Algorithm 2 with change detection. We have:
Algorithm 3:
1) Run Algorithm 2 to obtain estimates of the pair (βo, β1), namely $(\hat{\beta}_o, \hat{\beta}_1)$.
2) Run the test (23) to obtain an ML estimate of the change time tc, namely $\hat{t}_c$.
3) Split the data at $\hat{t}_c$ and re-estimate on each segment to obtain refined estimates of βo and β1.
V. Examples
In this section, we demonstrate the performance of our algorithm through a simple numerical example. We replicate a typical trajectory using the model (1), with
| (24) |
where μ(t) is the unit step function. The system was simulated for 400 steps and the data stored for offline analysis. A total of 75 simulations were performed to generate statistics on the estimation performance. With the time-varying diffusion coefficient given by Dt = Qt/(2Δt), the diffusion value varies between 0.1 and 0.5.
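A simulation of this kind can be sketched as follows. The record length of 400 and the diffusion values 0.1 and 0.5 come from the text; everything else is an assumption of this sketch, including the direction of the step (0.1 before, 0.5 after), the change time of 225 samples, the sampling time `dt`, and the measurement noise variance `R`. The function name `simulate_step_change` is hypothetical.

```python
import math
import random


def simulate_step_change(n_steps=400, t_change=225, D_before=0.1, D_after=0.5,
                         dt=1.0, R=0.05, seed=None):
    """Simulate observations ys with a step change in the diffusion coefficient.

    Returns (ys, Qs), where Qs[t] = 2 * D_t * dt is the process noise
    variance in force at step t.
    """
    rng = random.Random(seed)
    x, ys, Qs = 0.0, [], []
    for t in range(n_steps):
        D = D_before if t < t_change else D_after   # unit-step profile
        Q = 2.0 * D * dt
        Qs.append(Q)
        ys.append(x + rng.gauss(0.0, math.sqrt(R)))  # noisy observation
        x += rng.gauss(0.0, math.sqrt(Q))            # diffusive step
    return ys, Qs
```

Repeating the call with different seeds gives an ensemble of trajectories from which statistics on the estimated change time and diffusion values can be collected.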
In Fig. 2, we show the performance of our time-varying estimation. The threshold of 7 is defined by the user. There is a clear trade-off between this threshold and the detection of the change between the diffusion values of 0.1 and 0.5.
Fig. 2.
(Upper plot) Estimation of time-varying Q; (Middle plot) Estimation of time-varying R; (Lower plot) Performance of the likelihood ratio test
In Fig. 3 (left), we show the histogram of the estimated time of change tc, which falls around 225 samples; (center) the histogram of the refined estimate of Dt before the change; and (right) the histogram of the refined estimate of Dt after the change. The total number of data points is N = 400. As we observe from Fig. 3 (center) and (right), our algorithm is able to estimate correctly the true values of the parameters before and after the change.
Fig. 3.
(Left plot) Estimation of the time of change tc; (Middle plot) Estimation of time-varying Dt before the change; (Right plot) Estimation of time-varying Dt after the change
VI. CONCLUSIONS
In this paper, we have developed a two-step estimation algorithm to obtain a key parameter describing trajectories in the context of SPT. The algorithm combines two well-known techniques, namely local likelihood estimation and the likelihood ratio test. The approach was demonstrated through simulations that, while preliminary, capture typical noise levels and data run lengths found in real SPT applications.
ACKNOWLEDGMENT
This work was supported in part by the NIH through grant NIGMS 5R01GM117039-02.
References
[1] Shen H, Tauzin LJ, Baiyasi R, Wang W, Moringo N, Shuang B, and Landes CF, “Single Particle Tracking: From Theory to Biophysical Applications,” Chemical Reviews, vol. 117, no. 11, pp. 7331–7376, Jun. 2017.
[2] Monnier N, Guo S-M, Mori M, He J, Lénárt P, and Bathe M, “Bayesian Approach to MSD-Based Analysis of Particle Motion in Live Cells,” Biophysical Journal, vol. 103, no. 3, pp. 616–626, Aug. 2012.
[3] Berglund AJ, “Statistics of camera-based single-particle tracking,” Physical Review E, vol. 82, no. 1, p. 011917, Jul. 2010.
[4] Manzo C and Garcia-Parajo MF, “A review of progress in single particle tracking: from methods to biophysical insights,” Reports on Progress in Physics, vol. 78, no. 12, p. 124601, Dec. 2015.
[5] Michalet X, “Mean square displacement analysis of single-particle trajectories with localization error: Brownian motion in an isotropic medium,” Physical Review E, vol. 82, no. 4, p. 041914, Oct. 2010.
[6] Michalet X and Berglund AJ, “Optimal diffusion coefficient estimation in single-particle tracking,” Physical Review E, vol. 85, no. 6, p. 061916, Jun. 2012.
[7] Boyer D, Dean DS, Mejía-Monasterio C, and Oshanin G, “Optimal least-squares estimators of the diffusion constant from a single Brownian trajectory,” The European Physical Journal Special Topics, vol. 216, no. 1, pp. 57–71, Jan. 2013.
[8] Brandenburg B and Zhuang X, “Virus trafficking – learning from single-virus tracking,” Nature Reviews Microbiology, vol. 5, no. 3, pp. 197–208, Mar. 2007.
[9] Ashley TT and Andersson SB, “A Sequential Monte Carlo Framework for the System Identification of Jump Markov State Space Models,” in American Control Conference, 2014, pp. 1144–1149.
[10] Godoy BI, Vickers N, and Andersson SB, “Estimation of time-varying single particle tracking models using local maximum likelihood estimation,” in Biophysical Journal, vol. 116, no. 3, Baltimore, Maryland, USA, 2019.
[11] Ashley TT and Andersson SB, “Method for simultaneous localization and parameter estimation in particle tracking experiments,” Physical Review E, vol. 92, no. 5, p. 052707, Nov. 2015.
[12] Basseville M and Nikiforov I, Detection of Abrupt Changes: Theory and Application. Prentice Hall, 1993.
[13] Basseville M and Benveniste A, Eds., Detection of Abrupt Changes in Signals and Dynamical Systems. Springer-Verlag, 1986.
[14] Basseville M and Benveniste A, “Sequential detection of abrupt changes in spectral characteristics of digital signals,” IEEE Trans. on Inf. Theory, vol. 29, no. 5, pp. 709–724, 1983.
[15] Cheezum MK, Walker WF, and Guilford WH, “Quantitative comparison of algorithms for tracking single fluorescent particles,” Biophysical Journal, vol. 81, no. 4, p. 2378, 2001.
[16] Andersson SB, “Localization of a fluorescent source without numerical fitting,” Optics Express, vol. 16, no. 23, pp. 18714–18724, Jan. 2008.
[17] Small AR and Parthasarathy R, “Superresolution Localization Methods,” Annual Review of Physical Chemistry, vol. 65, no. 1, pp. 107–125, Apr. 2014.
[18] Harvey A, Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge University Press, 1989.
[19] Goodwin G and Agüero J, “Approximate EM algorithms for parameter and state estimation in nonlinear stochastic models,” in 44th IEEE Conf. on Decision and Control and the European Control Conf. 2005, Sevilla, Spain, 2005.
[20] Lange K, “A gradient algorithm locally equivalent to the EM algorithm,” Journal of the Royal Statistical Society. Series B, vol. 57, no. 2, pp. 425–437, 1995.
[21] Cappé O, Moulines E, and Rydén T, Inference in Hidden Markov Models. Springer, 2005.
[22] Fletcher R, Practical Methods of Optimization. John Wiley & Sons, Inc., 1987.
[23] McLachlan G and Krishnan T, The EM Algorithm and Extensions, 2nd ed. John Wiley & Sons, Inc., 2008.
[24] Olsson RK, Petersen KB, and Lehn-Schioler T, “State-space models: From the EM algorithm to a gradient approach,” Neural Computation, vol. 19, no. 4, 2007.
[25] Söderström T, Discrete-time Stochastic Systems. Springer, 2002.
[26] Hewitt E and Hewitt R, “The Gibbs-Wilbraham phenomenon: An episode in Fourier analysis,” Archive for History of Exact Sciences, vol. 21, no. 2, pp. 129–160, 1979.



