A Bayesian framework for the detection of diffusive heterogeneity

Julie A Cass; C David Williams; Julie Theriot

doi:10.1371/journal.pone.0221841

PLoS One. 2020 May 7;15(5):e0221841.

Editor: Juan Carlos del Alamo

PMCID: PMC7205219 PMID: 32379846

Abstract

Cells are crowded and spatially heterogeneous, complicating the transport of organelles, proteins and other substrates. One aspect of this complex physical environment, the mobility of passively transported substrates, can be quantitatively characterized by the diffusion coefficient: a descriptor of how rapidly substrates diffuse in the cell, which depends on their size and the effective local viscosity. The spatial dependence of diffusivity is challenging to characterize quantitatively, because temporally and spatially finite observations offer limited information about a spatially varying stochastic process. We present a Bayesian framework that estimates diffusion coefficients from single particle trajectories and predicts our ability to distinguish differences in diffusion coefficient estimates, conditional on how much they differ and the amount of data collected. This framework is packaged into a public software repository, including a tutorial Jupyter notebook demonstrating our method for diffusivity estimation, analysis of sources of estimation uncertainty, and visualization of all results. This estimation and uncertainty analysis allows our framework to be used as a guide in the experimental design of diffusivity assays.

Introduction

Diffusion is essential for the intra-cellular transport of many organelles, proteins and substrates. In the crowded and heterogeneous physical environment of the cell, diffusivity is a local, spatially dependent property that depends on factors such as particle size, local viscosity and spatial crowding. These spatial heterogeneities must be addressed when diffusion coefficients are used as readouts of intra-cellular transport and of the physical environment. The intra-cellular diffusion coefficient is commonly estimated experimentally by one of two approaches: single particle tracking (SPT) [1–3] and fluorescence correlation spectroscopy (FCS) [4].

In single particle tracking experiments, a live cell is imaged in successive frames, and individual punctate objects are tracked to construct a trajectory of time-dependent positions (Fig 1). One of the most common approaches to extracting diffusion coefficient estimates from SPT is mean-squared displacement (MSD) analysis. In general, the MSD follows the relationship:

$$\mathrm{MSD}(\tau) = \langle |\Delta x(\tau)|^{2} \rangle = 2dD\tau^{\alpha}, \tag{1}$$

where Δx(τ) is the displacement over a time lag τ in d spatial dimensions, D is the diffusion coefficient, and the angle brackets denote an average. The exponent α, which sets how the MSD scales with time, is determined by the diffusive model. Any temporal scaling with α ≠ 1 is called anomalous diffusion, with super- and sub-diffusion having α > 1 and α < 1, respectively. Intracellular diffusion has most often been characterized as sub-diffusive, likely as a result of crowding [3].

Fig 1 — In SPT, a live cell is imaged over a series of time points. Individual punctate objects are localized at each time-step, and these positions are traced from frame to frame to produce individual time-lapse trajectories.

For objects undergoing homogeneous isotropic diffusion, the MSD of puncta is a linear function of lag time (α = 1 in Eq 1), with slope 2dD proportional to the apparent diffusion coefficient. The averaging in this calculation can be taken over a single trajectory or over multiple trajectories (i.e. the mean of every displacement over lag τ within one trajectory, or across many trajectories). If MSD analysis is performed on a per-trajectory basis, the technique can spatially resolve variations in diffusivity; however, it relies on fitting the slope of MSD(τ). This analysis can be misleading, because it carries no information about the uncertainty of the estimate beyond the error on the mean. As a result, when multiple single-trajectory MSDs are plotted together on a log-log plot, it is easy to interpret non-overlapping MSD(τ) lines as distinct diffusivities when they may simply reflect uncertainty-driven variation around a single shared value.
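To make the per-trajectory MSD analysis concrete, here is a minimal NumPy sketch (not the authors' code) that computes the time-averaged MSD of one trajectory and fits Eq 1 on log-log axes. The function names, the lag range and the simulated test walk are illustrative assumptions.

```python
import numpy as np


def time_averaged_msd(positions, max_lag):
    """Time-averaged MSD of a single trajectory.

    positions: array of shape (T, d), one row per frame.
    Returns the lags (in frames) and, for each lag tau, the mean of
    |x(t + tau) - x(t)|^2 over every pair of frames tau apart.
    """
    lags = np.arange(1, max_lag + 1)
    msd = np.array([
        np.mean(np.sum((positions[lag:] - positions[:-lag]) ** 2, axis=1))
        for lag in lags
    ])
    return lags, msd


def fit_msd(lags, msd, dt, d):
    """Fit Eq 1 on log-log axes: log MSD = log(2 d D) + alpha * log(tau)."""
    tau = lags * dt
    alpha, intercept = np.polyfit(np.log(tau), np.log(msd), 1)
    D = np.exp(intercept) / (2 * d)
    return D, alpha


# Illustrative 2D random walk with per-dimension step s.d. s (assumed values).
rng = np.random.default_rng(0)
d, dt, s = 2, 1.0, 0.3
steps = rng.normal(0.0, s, size=(10_000, d))
positions = np.vstack([np.zeros((1, d)), np.cumsum(steps, axis=0)])

lags, msd = time_averaged_msd(positions, max_lag=10)
D_msd, alpha = fit_msd(lags, msd, dt, d)
print(f"D = {D_msd:.4f}, alpha = {alpha:.3f}")  # D should be near s**2 / 2 = 0.045, alpha near 1
```

The fit returns a single D and α for the trajectory with no accompanying uncertainty, which is exactly the limitation described above.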

In FCS, a laser illuminates a region of a sample containing fluorescently tagged particles [5]. The characteristic time a fluorescent particle spends in the illuminated region (“dwell time”) can be calculated from the intensity autocorrelation function. Together with the length scale of the illuminated region, the dwell time gives an estimate of the diffusion coefficient in this region. The calculation of the diffusion coefficient from these properties depends on the chosen diffusion model; the method accommodates anomalous diffusion models and captures small-scale local diffusivities. However, only one local measurement can be made from each illuminated region, making the assessment of many local regions experimentally intensive.

Like FCS, SPT can be used to probe local diffusivities and is robust to anomalous diffusion models [6]. Unlike FCS, however, SPT does not yield just one diffusivity measurement per illuminated region: it allows as many local diffusivity estimates to be made simultaneously as there are fluorescent particles in the field of view. Depending on particle density, this advantage makes SPT efficient for spatially resolved diffusivity assays. While SPT offers many advantages, it relies on finite observations of a stochastic process, which limits the accuracy of our diffusivity estimates.

While powerful analyses from SPT have indicated the complexity of transport in live cells, the spatial variation of the diffusion coefficient remains poorly characterized. This can be attributed to the difficulty of disentangling the effects of biological heterogeneity from those of limited sampling of a stochastic process [7, 8]. To address these challenges, we developed a Bayesian framework to estimate a posterior distribution of the possible diffusion coefficients underlying single-trajectory dynamics. This framework generates look-up tables predicting the detectability of differences in diffusion coefficients, conditional on the ratio of their values and the amount of trajectory data collected.

Other packages with information theoretic frameworks for trajectory analysis have been released; for example, the Single-Molecule Analysis by Unsupervised Gibbs sampling (“SMAUG”) software package [9] also uses Bayesian estimation to characterize diffusive environments. However, our package is unique because it is intended specifically to provide lightweight trajectory analysis and prediction that can be used by those with a biological background to inform microscopy experiment design, without requiring deep statistical or computational knowledge.

Materials and methods

Trajectory simulation and localization error

We generated sample trajectories with known diffusion coefficients by simulating Brownian motion of particles in a d-dimensional space. At each time point and along each spatial dimension, a step size was drawn from a zero-mean Gaussian $N(\mu = 0, \sigma^{2})$ with variance set by the diffusion coefficient, σ² = 〈|Δx|²〉 = 2dDΔt, where d is the number of spatial dimensions, D is the homogeneous isotropic diffusion coefficient, and Δt is the time step. These draws form the displacement vector $\Delta\vec{x}$, which was added to the position $\vec{x}(t)$ to generate the next position $\vec{x}(t + \Delta t)$. We recorded the position of the particle at each frame as a time series, constructing a trajectory that mimics the data one would obtain by tracking an object in time-series images (Fig 2).
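The simulation step can be sketched in a few lines of NumPy. This is an illustrative reimplementation, not the package's own function: `simulate_brownian` is a hypothetical name, and it follows the text's stated convention that each per-dimension step has variance σ² = 2dDΔt.

```python
import numpy as np


def simulate_brownian(D, dt, n_steps, d=2, rng=None):
    """Simulate a d-dimensional Brownian trajectory starting at the origin.

    Following the convention stated in the text, each per-dimension step is
    drawn from N(0, sigma^2) with sigma^2 = 2 * d * D * dt.
    Returns an array of shape (n_steps + 1, d) of true positions.
    """
    rng = np.random.default_rng() if rng is None else rng
    sigma = np.sqrt(2 * d * D * dt)
    steps = rng.normal(0.0, sigma, size=(n_steps, d))  # displacement vectors
    start = np.zeros((1, d))
    # x(t + dt) = x(t) + dx, accumulated with a cumulative sum
    return np.vstack([start, start + np.cumsum(steps, axis=0)])


traj = simulate_brownian(D=0.1, dt=0.05, n_steps=50, rng=np.random.default_rng(1))
print(traj.shape)  # (51, 2)
```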

Fig 2 — A 2D diffusive trajectory with no localization error is drawn for T time-steps. At each time-step, a cloud of Gaussian uncertainty is drawn; the shape and shading of this cloud show how likely the position is to be measured at any of the surrounding points rather than at the true position. A sample alternative trajectory (purple) shows the path we might observe the particle take, owing to localization error in measuring its true position over time.

To mimic the static localization error inherent in microscopy-derived trajectories, we added Gaussian error to the locations of simulated particles at each time point [10]. After each successive location was chosen stochastically according to the Brownian motion model, an additional draw from a second normal distribution selected a shift in position along each spatial dimension. The variance of this Gaussian localization error can be tuned to the user’s specific microscope configuration.
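A sketch of this static localization error model, again with a hypothetical helper name (`add_localization_error`) and NumPy as an assumed dependency: each recorded coordinate receives an independent Gaussian shift with standard deviation `loc_sd`.

```python
import numpy as np


def add_localization_error(positions, loc_sd, rng=None):
    """Return positions with static Gaussian localization error added.

    Each coordinate of each frame gets an independent N(0, loc_sd^2) shift;
    loc_sd (e.g. in microns) would be tuned to the microscope being modelled.
    """
    rng = np.random.default_rng() if rng is None else rng
    return positions + rng.normal(0.0, loc_sd, size=positions.shape)


rng = np.random.default_rng(0)
true_pos = np.cumsum(rng.normal(0.0, 0.1, size=(1000, 2)), axis=0)  # toy random walk
observed = add_localization_error(true_pos, loc_sd=0.05, rng=rng)
```

Because the errors in consecutive frames are independent, each observed per-dimension step picks up an extra variance of 2·loc_sd², which is why localization error biases diffusivity estimates upward.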

The locations of the simulated particle at each time point (with and without error included) are stored in a DataFrame, and these trajectories are digested into frame-to-frame displacements. Because these step sizes were used to generate the trajectories in the first place, back-calculating them may seem redundant. However, the remainder of our toolkit is designed to analyze any trajectory, whether simulated or tracked from images, so a user can either input their own image-derived trajectories or use a simulated trajectory to estimate the unknown diffusivity.
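Digesting stored positions into frame-to-frame displacements might look like the following pandas sketch. The column names (`traj_id`, `frame`, `x`, `y`) are assumptions for illustration, not the package's DataFrame schema, and the sketch assumes no missing frames within a trajectory.

```python
import numpy as np
import pandas as pd

# Hypothetical layout: one row per (trajectory, frame) localization.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "traj_id": np.repeat([0, 1], 5),
    "frame": np.tile(np.arange(5), 2),
    "x": rng.normal(size=10).cumsum(),
    "y": rng.normal(size=10).cumsum(),
})


def frame_to_frame_steps(df):
    """Per-trajectory displacements between consecutive frames.

    Sorting by frame and differencing within each trajectory means the first
    frame of every trajectory yields NaN, which is dropped. Gaps in `frame`
    would silently produce multi-frame steps, so they are assumed absent.
    """
    df = df.sort_values(["traj_id", "frame"])
    steps = df.groupby("traj_id")[["x", "y"]].diff()
    steps["traj_id"] = df["traj_id"]
    return steps.dropna()


steps = frame_to_frame_steps(df)
print(len(steps))  # 8: four steps from each five-frame trajectory
```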

Bayesian inference of diffusivity

To estimate the diffusivity underlying a single trajectory (and our uncertainty in this estimate), we employ Bayesian inference [11]. This method centers on a “posterior probability distribution”: the probability that a random variable takes on each of a set of values, given the observed evidence and a prior distribution. In our case, the random variable is the diffusivity, and the evidence is the set of step sizes from a single trajectory. For the variance of a normal distribution with known mean, we use an inverse-gamma prior. This is a conjugate prior, meaning that the prior and posterior distributions take the same mathematical form; therefore our posterior is also an inverse-gamma distribution. The inverse-gamma probability density function over diffusion coefficients D > 0 is parameterized by the shape (a) and scale (b):

$$\mathrm{IG}(D; a, b) = \frac{b^{a}}{\Gamma(a)} \left(\frac{1}{D}\right)^{a+1} e^{-b/D}. \tag{2}$$

We use a and b in place of the conventional α and β, respectively, to avoid confusion with the MSD time-scaling exponent α in Eq 1.

The posterior distribution peaks near the true diffusion coefficient and has a width corresponding to the confidence interval of our estimate, which is largely determined by the trajectory length and magnitude of localization error.
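Because the prior is conjugate, the posterior parameters have a closed form. The sketch below is our own derivation from the model stated in the text, not code from the package: with n scalar step components (number of steps × d), the text's convention σ² = 2dDΔt, and a prior IG(D; $a_0$, $b_0$), the posterior is IG(D; $a_0$ + n/2, $b_0$ + ΣΔx²/(4dΔt)), where $a_0 = b_0 = 0$ is the uninformative limit. The posterior mode b/(a + 1) serves as the point estimate.

```python
import numpy as np


def diffusivity_posterior(steps, dt, a0=0.0, b0=0.0):
    """Inverse-gamma posterior IG(D; a, b) from zero-mean step components.

    steps: array of shape (n_steps, d) of frame-to-frame displacements.
    Assumes the text's convention sigma^2 = 2*d*D*dt for every per-dimension
    step component and a prior IG(D; a0, b0).
    """
    steps = np.asarray(steps, dtype=float)
    d = steps.shape[1]
    a = a0 + steps.size / 2.0                     # one half per scalar component
    b = b0 + np.sum(steps ** 2) / (4.0 * d * dt)  # (1/2) * sum(dx^2) / (2 d dt)
    return a, b


def posterior_mode(a, b):
    """Mode of IG(D; a, b), i.e. the most probable diffusivity."""
    return b / (a + 1.0)


rng = np.random.default_rng(0)
D_true, dt, d = 0.1, 0.05, 2
steps = rng.normal(0.0, np.sqrt(2 * d * D_true * dt), size=(5000, d))
a, b = diffusivity_posterior(steps, dt)
print(f"a = {a:.0f}, mode = {posterior_mode(a, b):.4f}")  # a = 5000; mode close to D_true
```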

Characterizing the distinguishability of diffusivity posteriors

To characterize our uncertainty about whether trajectories come from regions with different diffusivities, we need a way to quantitatively discriminate between pairs of posterior distributions. To achieve this, we use the Kullback-Leibler (KL) divergence, which provides a single-value measure of how well we can distinguish whether the step sizes from a trajectory were generated by the diffusivity described by one posterior or the other. The KL divergence of two inverse-gamma distributions p(a, b) and $q(\hat{a}, \hat{b})$ is calculated as follows [12]:

$$\mathrm{KL}(a, b, \hat{a}, \hat{b}) = (a - \hat{a})\,\Psi(a) + \hat{b}\left(\frac{a}{b}\right) - a + \log\frac{b^{\hat{a}+1}\,\Gamma(\hat{a})}{b\,\hat{b}^{\hat{a}}\,\Gamma(a)} \tag{3}$$

where Ψ(a) is the digamma function, defined as the logarithmic derivative of the gamma function Γ(a). Since this metric is not symmetric and we have no preference between distributions p and q, we use a symmetrized version of the KL divergence, $\mathrm{KL}_{\mathrm{sym}} = \frac{1}{2}\left[\mathrm{KL}(a, b, \hat{a}, \hat{b}) + \mathrm{KL}(\hat{a}, \hat{b}, a, b)\right]$.
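Eq 3 and its symmetrized form translate directly into code. This sketch assumes SciPy for the digamma and log-gamma functions; evaluating the logarithm term with `gammaln` rather than Γ itself avoids overflow for the large shape parameters that long trajectories produce. The function names are hypothetical.

```python
import numpy as np
from scipy.special import digamma, gammaln


def kl_inverse_gamma(a, b, a_hat, b_hat):
    """KL(p || q) for p = IG(a, b) and q = IG(a_hat, b_hat), as in Eq 3.

    The log term log[b^(a_hat+1) Gamma(a_hat) / (b b_hat^a_hat Gamma(a))]
    is expanded into sums of logs so it stays finite for large a and a_hat.
    """
    log_term = ((a_hat + 1) * np.log(b) + gammaln(a_hat)
                - np.log(b) - a_hat * np.log(b_hat) - gammaln(a))
    return (a - a_hat) * digamma(a) + b_hat * (a / b) - a + log_term


def symmetric_kl(a, b, a_hat, b_hat):
    """Symmetrized KL used to compare two posteriors (order-independent)."""
    return 0.5 * (kl_inverse_gamma(a, b, a_hat, b_hat)
                  + kl_inverse_gamma(a_hat, b_hat, a, b))


print(round(kl_inverse_gamma(5.0, 2.0, 7.0, 4.0), 4))  # 0.5369
```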

Code availability

A repository for our source code is publicly available at the Allen Cell Modeling GitHub page https://github.com/AllenCellModeling/diffusive_distinguishability, conveniently packaged with ReadTheDocs documentation and a tutorial Jupyter notebook demonstrating usage and reproducible figure production. This package is registered under DOI 10.5281/zenodo.2662552.

Results and discussion

Bayesian inference of diffusivity

When the position of a diffusing object is recorded as a trajectory of discrete steps in time, the sizes of those steps can be represented mathematically as stochastic draws from a distribution characterized by the diffusion coefficient. Our method for estimating the diffusion coefficient breaks individual trajectories into frame-to-frame steps and applies a Bayesian statistical framework to infer the diffusivity underlying each set of stochastically generated step sizes. From a single trajectory, this framework provides not only an estimate of the diffusivity but also a representation of our uncertainty. While our framework could be adapted to more complex dynamic models, our current implementation addresses isotropic homogeneous diffusion, for which the step sizes of a trajectory with unknown diffusivity are normally distributed with zero mean and unknown variance, $N(\mu = 0, \sigma^{2})$.

Bayesian inference is built on the use of prior and posterior distributions [11]. Our “prior” distribution is an initial guess at the solution to a problem before our observations or data inform our expectations (i.e. a priori); for instance, if I have no intuition for the solution to my estimation problem, I would use a flat prior telling my model that I think any solution is equally likely. We then use our data to narrow down our estimate (i.e. a posteriori), resulting in a “posterior” distribution. In our case, the step sizes from a single trajectory are the observations, and the posterior is a distribution over diffusivity values, peaked around the most likely value of the underlying diffusion coefficient. The longer the trajectory, the more information we can use to narrow down our answer, leading to a more tightly peaked posterior (discussed in greater detail in the section Sources of posterior estimate error).

Inverse-gamma distribution as diffusivity conjugate prior

In this section, we step through the process of applying Bayesian analysis to our particular case. First, we introduce the governing principle of this approach, Bayes’ theorem [11]; then we break this principle into pieces and see how each applies to our problem.

Bayes’ theorem states that the posterior distribution for an unknown variable θ is proportional to the product of the prior distribution p(θ) and the “likelihood function” p(x|θ), which gives the probability of making observation x given the unknown variable θ. Mathematically, this is often written:

$$p(\theta \mid x) \propto p(\theta)\, p(x \mid \theta). \tag{4}$$

How does this apply to the diffusion process we have been exploring? In our problem, we take single particle trajectories and split them into frame-to-frame step sizes, so our Bayesian “observed variable x” is the step size Δx. As discussed previously, we expect the step sizes of diffusive trajectories to be normally distributed, with zero mean and unknown variance. Translating again to the Bayesian framework, our unknown variable θ is the variance σ², and our likelihood function is the normal distribution of step sizes, i.e. $p(x \mid \theta) = p(\Delta x \mid \sigma^{2}) = N(0, \sigma^{2})$.

The prior is our initial guess at the probability distribution of values for our unknown variable, σ². To determine the prior distribution for our case, p(θ) = p(σ²), we consider how the normally distributed step sizes depend on the variance σ²:

$$p(\Delta x \mid \sigma^{2}) \propto \left(\frac{1}{\sigma^{2}}\right)^{\beta} e^{-\gamma/\sigma^{2}} \tag{5}$$

This dependence resembles a gamma distribution, except that the variable of interest appears in the denominator; for a single step, β = 1/2 and γ = Δx²/2. Functions of this class are, fittingly, called inverse-gamma distributions (IG, Eq 2). We can therefore say a priori that we expect our estimated σ² values to follow an inverse-gamma distribution, and this is the form of our prior: p(θ) = p(σ²) = IG(σ²).

We have now placed the observed and unknown Bayesian variables in the context of our problem, and identified the normal and inverse-gamma distributions as our likelihood and prior, respectively. With these pieces in hand, we can find the class of function for our posterior distribution, which is proportional to the product of our prior and likelihood (Eq 4). In our case, the product of p(σ²) and p(Δx|σ²) also has an inverse-gamma dependence on σ². Note that our posterior is therefore a function of the same class as the prior; we will return to this after a brief note.
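Written out for $n$ independent zero-mean step components $\Delta x_1, \dots, \Delta x_n$, the product in Eq 4 makes the conjugacy explicit. This is the standard normal/inverse-gamma calculation; $a_0$ and $b_0$ (our notation) denote the prior's shape and scale, and comparing with Eq 5 identifies $\beta = n/2$ and $\gamma = \tfrac{1}{2}\sum_i \Delta x_i^2$.

$$\begin{aligned}
p(\sigma^2) &\propto (\sigma^2)^{-(a_0+1)}\, e^{-b_0/\sigma^2}, \\
p(\{\Delta x_i\} \mid \sigma^2) &= \prod_{i=1}^{n} \frac{e^{-\Delta x_i^2/(2\sigma^2)}}{\sqrt{2\pi\sigma^2}} \propto (\sigma^2)^{-n/2}\, e^{-\sum_i \Delta x_i^2/(2\sigma^2)}, \\
p(\sigma^2 \mid \{\Delta x_i\}) &\propto (\sigma^2)^{-(a_0 + n/2 + 1)} \exp\!\left[-\frac{b_0 + \tfrac{1}{2}\sum_i \Delta x_i^2}{\sigma^2}\right], \\
\text{so}\quad p(\sigma^2 \mid \{\Delta x_i\}) &= \mathrm{IG}\!\left(\sigma^2;\; a_0 + \tfrac{n}{2},\; b_0 + \tfrac{1}{2}\textstyle\sum_i \Delta x_i^2\right).
\end{aligned}$$

In the uninformative limit $a_0, b_0 \to 0$, the posterior shape is set by the number of step components alone and the scale by their summed squares.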

In this section we have built up a framework for Bayesian estimation of a distribution of variances, but we promised an estimate of the diffusion coefficient. Recall that the variance of the diffusive step-size distribution is directly proportional to the diffusion coefficient (σ² = 2dDΔt). Therefore, with the inclusion of a multiplicative constant, this analysis transfers directly to a Bayesian estimation of the diffusivity D, with inverse-gamma prior and posterior distributions IG(D).
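The change of variables from σ² to D can be checked directly. Writing $k = 2d\Delta t$ so that $D = \sigma^2/k$, and letting $b_{\sigma}$ (our notation) be the scale of the inverse-gamma distribution over σ², the density transforms as follows; rescaling the variable rescales the scale parameter and leaves the shape unchanged:

$$\begin{aligned}
p_D(D) &= k\, p_{\sigma^2}(kD) = k\,\frac{b_{\sigma}^{a}}{\Gamma(a)}\,(kD)^{-(a+1)}\,e^{-b_{\sigma}/(kD)} \\
&= \frac{(b_{\sigma}/k)^{a}}{\Gamma(a)}\,D^{-(a+1)}\,e^{-(b_{\sigma}/k)/D} = \mathrm{IG}\!\left(D;\, a,\, b_{\sigma}/k\right).
\end{aligned}$$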

In general, when the prior and posterior for Bayesian analysis take the same mathematical form, the prior is referred to as a “conjugate prior.” Matching the prior and posterior function types dramatically simplifies the statistical method, which is one advantage of this prior. A second advantage is that the inverse-gamma distribution can act as a conservative initial “guess” in which every order of magnitude of diffusivity is equally likely before any data are introduced. In Bayesian inference, the choice of prior can bias our results; for instance, if we expect the diffusivity to be around 1 μm²/s, we might select a prior distribution that is sharply peaked around this value. If the diffusivity is in fact close to this value, that prior would help guide our posterior towards the correct result. However, if that intuition is incorrect and the true value lies in the tails outside our peaked prior, we will have biased our Bayesian estimator away from the true value, skewing our results. As a result, using an “uninformative” prior, such as the inverse-gamma distribution with shape and scale parameters a, b → 0, treats all diffusivity scales as equally likely and helps remove a priori bias from our diffusivity inference. The number and magnitudes of the step sizes then determine the shape (a) and scale (b) parameters of our posterior inverse-gamma distribution IG(D; a, b).
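The effect of the prior can be illustrated numerically. In this sketch (assumed parameters, hypothetical function name, and the text's σ² = 2dDΔt convention), the same 250-step 2D trajectory is analyzed once in the uninformative limit $a_0 = b_0 = 0$ and once with a strong prior whose mode, $b_0/(a_0 + 1)$, sits at a wrong guess of 1 μm²/s. The informative prior pulls the posterior mode far from the true value of 0.1 μm²/s.

```python
import numpy as np


def posterior_mode_D(steps, dt, a0, b0):
    """Mode b / (a + 1) of the inverse-gamma posterior for D given prior IG(a0, b0)."""
    d = steps.shape[1]
    a = a0 + steps.size / 2.0
    b = b0 + np.sum(steps ** 2) / (4.0 * d * dt)  # text's sigma^2 = 2 d D dt convention
    return b / (a + 1.0)


rng = np.random.default_rng(0)
D_true, dt, d = 0.1, 0.05, 2
steps = rng.normal(0.0, np.sqrt(2 * d * D_true * dt), size=(250, d))

# Uninformative (improper) limit a0, b0 -> 0.
flat = posterior_mode_D(steps, dt, a0=0.0, b0=0.0)

# Strong prior whose mode b0 / (a0 + 1) sits at a wrong guess of 1.0 um^2/s.
a0 = 1000.0
peaked = posterior_mode_D(steps, dt, a0=a0, b0=(a0 + 1.0) * 1.0)
print(f"flat prior: {flat:.3f}, peaked prior: {peaked:.3f}, true: {D_true}")
```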

Sources of posterior estimate error

The estimation of diffusivity from a single trajectory is limited by the finite trajectory length and by the accuracy with which the object is localized at each time point. Careful consideration of how each of these factors affects estimation uncertainty is therefore necessary when designing an experiment. To address this, we have constructed a framework for generating look-up tables that predict the percent error of the posterior diffusivity estimate, conditional on trajectory length and localization error.

Many methods for estimating diffusivity from a single trajectory rely on the analysis of the frame-to-frame step-size distribution extracted from that trajectory. However, in a microscopy experiment there is always an inherent limit to how accurately an object can be localized in each frame. This arises from both static and dynamic sources of localization error: static localization error is due to the limited spatial resolution of imaging experiments, while dynamic localization error arises because capturing an image is not instantaneous, so the object moves during acquisition [13]. Because dynamic localization error is most relevant for quickly moving objects, such as small substrates, we have chosen to simulate, and provide example analyses of, the effects of static localization error.

Because of this limited spatial resolution, tracking an object encodes localization inaccuracy in the resulting trajectory, which skews the step sizes used to infer the diffusion coefficient. To demonstrate the impact of localization error on SPT, we provide an example simulated trajectory with varying amounts of localization error applied (Fig 3).

Fig 3 — A 2D diffusive trajectory with no localization error is drawn for T time-steps. That same trajectory is then redrawn in increasingly light colors, for increasing levels of localization error. This error is parameterized in the form of the standard deviation of a Gaussian blur, in microns. This example allows us to visualize the impact that a range of localization errors would have on the same trajectory.

Fig 4 demonstrates the impact of underlying diffusion coefficients and localization errors on posterior estimates. We provide examples of trajectories in two regions with differing diffusion coefficients, each with and without localization error included in the trajectory simulation. We then plot the posteriors for all four of these trajectories on one set of axes. Our tool aims to quantify the effects of this localization error on the estimation of diffusivity by generating trajectories with varying known degrees of localization error and reporting their impact on the error of the posterior estimation of the known underlying diffusivity.

Diffusive trajectories are composed of successive steps, whose sizes are stochastic draws from a distribution set by the diffusivity. When only short trajectories are available, we have only a limited set of draws from this distribution—as a result, the variance of this distribution is difficult to accurately predict, and the posterior distribution of diffusivity probabilities will be less accurate and precise. While it would be ideal to simply collect longer trajectories, this is often experimentally impossible; therefore, we aim to give experimentalists an analysis framework to estimate how accurately they can predict diffusivity given their own limitations in tracking.

Because our trajectories are simulated, we know the true diffusivity and the degree of localization error, and can therefore precisely quantify the relationship between the error in our Bayesian estimate of diffusivity and the level of localization error. This provides a look-up table, shown in Fig 5, with which experimentalists can predict the accuracy of diffusivity estimation achievable with their own microscopy experiment. We quantify the error in our estimates as the magnitude of the percent error between the true diffusivity and the mode of the posterior probability distribution, calculated from the posterior’s shape and scale parameters:

$$\%_{\mathrm{error}} = \left| \frac{100\left(\frac{b}{a+1} - D_{\mathrm{true}}\right)}{D_{\mathrm{true}}} \right|. \tag{6}$$

Because diffusion is stochastic, the posterior error varies from one simulation to the next even with identical simulation parameters. To capture the mean effect of each parameter on posterior error, the results in Fig 5 report the average percent error over N = 10⁴ replicates of each simulation parameterization.
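A scaled-down version of this look-up table can be sketched as follows. `mean_percent_error` is a hypothetical helper, not the package's `get_dim_error`; it assumes the text's σ² = 2dDΔt convention, static Gaussian localization error, an uninformative prior, Eq 6 with the posterior mode b/(a + 1), and far fewer replicates than the 10⁴ used for Fig 5 to keep it fast.

```python
import numpy as np


def mean_percent_error(D_true, n_steps, loc_sd, dt=1.0, d=2, n_reps=2000, rng=None):
    """Mean |percent error| of the posterior mode over simulated replicates.

    Hypothetical helper: sigma^2 = 2*d*D*dt per step component, static Gaussian
    localization error with s.d. loc_sd per coordinate, uninformative prior,
    and Eq 6 evaluated at the posterior mode b / (a + 1).
    """
    rng = np.random.default_rng() if rng is None else rng
    sigma = np.sqrt(2 * d * D_true * dt)
    steps = rng.normal(0.0, sigma, size=(n_reps, n_steps, d))
    start = np.zeros((n_reps, 1, d))
    true_pos = np.concatenate([start, np.cumsum(steps, axis=1)], axis=1)
    observed = true_pos + rng.normal(0.0, loc_sd, size=true_pos.shape)
    obs_steps = np.diff(observed, axis=1)     # what a tracker would report
    a = n_steps * d / 2.0                     # same shape for every replicate
    b = np.sum(obs_steps ** 2, axis=(1, 2)) / (4.0 * d * dt)
    mode = b / (a + 1.0)
    return np.mean(np.abs(100.0 * (mode - D_true) / D_true))


rng = np.random.default_rng(0)
lengths = [10, 50, 200]     # trajectory lengths (steps)
loc_sds = [0.0, 0.1, 0.2]   # localization error s.d. (um)
table = np.array([[mean_percent_error(0.1, T, s, rng=rng) for T in lengths]
                  for s in loc_sds])
print(np.round(table, 1))   # rows: localization error; columns: trajectory length
```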

Fig 5 — The percent error for a given posterior is measured as the percent error between the true diffusion coefficient used to generate the trajectory, and the mode of the posterior distribution (or the diffusion coefficient which gives the maximum value of the probability density function). This heatmap reports the mean percent error magnitude for 10⁴ posteriors generated under each set of trajectory length and localization error conditions, with diffusion coefficients of (A) 0.01 μm²/s (B) 0.1 μm²/s and (C) 1.0 μm²/s. Please note the difference in heatmap scale bars.

For example, in a study of the Bacillus subtilis SMC complex [3], with a diffusion coefficient on the order of 0.1 μm²/s, localization error of 0.1 μm and trajectory lengths of approximately 50 frames, this table predicts a diffusivity estimation error of ≈ 15%. For a study of mRNP diffusion in the nucleus [14], with a diffusion coefficient also on the order of 0.1 μm²/s but localization errors ranging from 0.01 to 0.1 μm and trajectory lengths greater than 1000 frames, we can expect a diffusivity error between 5% and 10%, depending on the experiment’s localization error. For a closer look at the estimation error expected for a specific diffusivity, localization error and trajectory length, users can simulate these results with the “get_dim_error” function of our tool, demonstrated in the Jupyter notebook tutorial.

In addition, it should be noted that the number of spatial dimensions of the assay (i.e. whether trajectories are measured in two or three spatial dimensions) as well as the mean-squared displacement (related to the diffusion coefficient) can impact the relationship between localization error and Bayesian estimation error. For a more in-depth discussion and simulation of this, please see the tutorial Jupyter notebook in our project GitHub repository.

Distinguishability of trajectory diffusivities

With the above percent error analysis derived for simulated trajectories with known diffusivities, a picture arises of how our estimates of the diffusivity differ from the true values. As a result, when this technique is applied to experimentally-derived trajectories whose underlying diffusivities are unknown, we may want to ask ‘how likely is it that two trajectories resulting in different diffusivity estimates were actually derived from regions with the same diffusivity?’ The biological motivation and analog for this technical question is ‘how heterogeneous is the physical cellular environment?’

This will depend on the amount of overlap between the two diffusivity posterior distributions, which is determined by: (1) how different the underlying diffusion coefficients are (how far apart the theoretical maxima of posteriors are) and (2) how uncertain we are in our estimations (how wide the posterior distributions are). One way to measure the difference between two distributions is to use the Kullback-Leibler divergence (KL divergence). A KL divergence of zero indicates that two distributions are identical; one interpretation of this metric is that its inverse tells you the number of times you can draw samples from one distribution in place of the other before there is significant information loss.

In order to communicate the distinguishability of pairs of posteriors conditional on their trajectory parameters, we have created a heatmap look-up table of the KL divergence of posterior pairs, dependent upon the ratio of their underlying diffusion coefficients (i.e. D₂/D₁), and the trajectory length. An example of this look-up table heatmap is provided in Fig 6. The complete code used to generate this map is provided in the tutorial Jupyter notebook found in the GitHub repository for this project. By cloning the repository, users can directly edit this example code to recreate this map with a different localization error or different distribution of trajectory lengths and diffusion coefficient values. An experimentalist may generate their own heatmap for trajectories with their specified degree of localization error, and get a table to tell them how distinguishable differences in diffusion coefficients will be for different lengths of trajectories that they can collect. This framework could therefore play a valuable role in describing the feasibility of and requirements for experiments addressing the spatial heterogeneity of the intra-cellular diffusive environment.
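A miniature version of the KL look-up table in Fig 6 can be generated along these lines. It is a sketch under stated assumptions rather than the notebook's code: equal-length trajectories, an uninformative prior (so both posteriors share the shape a = Td/2), the text's σ² = 2dDΔt convention, optional static localization error, and 10³ rather than 10⁴ replicates per cell. The helper names are hypothetical.

```python
import numpy as np
from scipy.special import digamma, gammaln


def kl_ig(a, b, a_hat, b_hat):
    """KL(IG(a, b) || IG(a_hat, b_hat)): Eq 3 with its log term expanded."""
    return ((a - a_hat) * digamma(a) + b_hat * a / b - a
            + a_hat * np.log(b / b_hat) + gammaln(a_hat) - gammaln(a))


def mean_symmetric_kl(ratio, n_steps, D1=0.1, dt=1.0, d=2, loc_sd=0.0,
                      n_reps=1000, rng=None):
    """Average symmetrized KL between posteriors of paired trajectories of
    equal length with D2 = ratio * D1 (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    a = n_steps * d / 2.0  # shared posterior shape under an uninformative prior

    def posterior_scale(D):
        # Positions from cumulative Gaussian steps, plus optional static error.
        pos = np.cumsum(rng.normal(0.0, np.sqrt(2 * d * D * dt),
                                   size=(n_reps, n_steps + 1, d)), axis=1)
        pos = pos + rng.normal(0.0, loc_sd, size=pos.shape)
        steps = np.diff(pos, axis=1)
        return np.sum(steps ** 2, axis=(1, 2)) / (4.0 * d * dt)

    b1, b2 = posterior_scale(D1), posterior_scale(ratio * D1)
    return np.mean(0.5 * (kl_ig(a, b1, a, b2) + kl_ig(a, b2, a, b1)))


rng = np.random.default_rng(0)
ratios = [1.0, 1.5, 2.0, 4.0]   # D2 / D1 (rows)
lengths = [10, 50, 200]         # trajectory length in steps (columns)
kl_table = np.array([[mean_symmetric_kl(r, T, rng=rng) for T in lengths]
                     for r in ratios])
print(np.round(kl_table, 2))
```

Reading across a row shows how longer trajectories make a given diffusivity ratio easier to distinguish; passing a nonzero `loc_sd` would tailor the table to a particular microscope, as the text suggests.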

Fig 6 — Heatmap displaying the average KL divergence of diffusivity posteriors. For each entry in the heatmap, two trajectories of the same length (x-axis) are produced, with underlying diffusivities in the ratio D₂/D₁ (y-axis). A posterior is estimated for each, and their KL divergence is calculated as a measure of the distinguishability of the underlying diffusivities. As this process is stochastic, it is repeated 10⁴ times, and the average is the value reported in the heatmap.

Comparison with MSD analysis

Given a single trajectory, let us compare what we could learn about the underlying diffusivity through MSD analysis and through our Bayesian framework. In MSD analysis, the trajectory is split into displacements for every possible lag time (that is, for each τ = 1, 2, 3… frames, the mean of the squared displacements between all pairs of frames τ apart). The diffusivity can then be calculated by fitting the MSD with Eq 1, often on a log-log plot. This provides a single estimate of the average diffusivity over the course of the trajectory. In contrast, our Bayesian framework outputs a probability distribution of diffusivity values; the most probable diffusivity can be extracted as a single-valued estimate, but the distribution as a whole offers the appealing advantage of a quantitative measure of our confidence in this estimate.
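Extracting both the single-value estimate and a credible interval from a posterior takes only a few lines with SciPy, whose `invgamma` distribution takes a shape argument and a `scale` that plays the role of b in Eq 2. The simulated 50-step trajectory, the uninformative prior and the σ² = 2dDΔt convention are assumptions for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
D_true, dt, d, n_steps = 0.1, 1.0, 2, 50
steps = rng.normal(0.0, np.sqrt(2 * d * D_true * dt), size=(n_steps, d))

# Posterior IG(D; a, b) under an uninformative prior and sigma^2 = 2 d D dt.
a = steps.size / 2.0
b = np.sum(steps ** 2) / (4.0 * d * dt)
posterior = stats.invgamma(a, scale=b)   # SciPy's `scale` corresponds to b in Eq 2

mode = b / (a + 1.0)                     # single-value estimate: the posterior peak
low, high = posterior.interval(0.95)     # central 95% credible interval
print(f"D ~ {mode:.3f} um^2/s, 95% interval [{low:.3f}, {high:.3f}]")
```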

This measure of confidence offers an additional benefit over MSD analysis. Through posterior visualization and the KL divergence analysis described in the previous section, this Bayesian estimation framework provides a straightforward visual and quantitative way to diagnose how likely it is that diffusivity estimates from two trajectories actually describe regions with different physical properties. With MSD, single-trajectory diffusivity estimates are compared by plotting MSD(τ) for each trajectory on the same log-log plot and comparing their intercepts. This approach captures no information about uncertainty, and may lead to the false conclusion that each trajectory comes from a region with a unique diffusivity. In many cases, Bayesian posterior analysis will reveal significant overlap between these trajectories’ posteriors, indicating that the analyzed trajectories do not mark the region as having heterogeneous diffusivity. One interpretation of the KL divergence is that its inverse tells you the number of observations you can make using one distribution in place of the other before the information loss becomes significant. For instance, if posteriors from trajectories A and B have a KL divergence of 0.01, I could use 100 measurements from posterior A to describe posterior B before I start to significantly misinterpret posterior B; these distributions are extremely similar, and their diffusivities might be considered the same. If posteriors A and B have a KL divergence greater than one, the number of observations before significant information loss would be less than one, meaning that using even a single measurement from one distribution in place of the other would cause a mischaracterization; the trajectories used to generate these distributions have distinct diffusivities.

Application to spatially dependent diffusivity characterization

In the introduction of this paper, we discussed the importance of analysis techniques that acknowledge the heterogeneity of cellular environments. Because this tool operates on single trajectories, it offers a foundation for characterizing variations in the diffusivities experienced by trajectories recorded in different cellular regions. By mapping the diffusivity estimate from each trajectory (the most probable value of its posterior distribution) to the spatial region where the tracked substrate was localized, the user can build up a spatial map of diffusivity. Frameworks already exist for spatially mapping the physical properties of cells, such as nanorheology of injected particles [15] and SMAUG [9]. However, these techniques require, respectively, an extensive and invasive experimental design or in-depth knowledge of computational Bayesian inference. Our tool offers an approachable framework for designing studies that probe the spatial variation of the physical properties of the cell.
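The mapping step can be sketched as binning each trajectory's posterior-mode estimate onto a square grid and averaging within each cell. Several choices here are assumptions and are not described in the paper: the grid, the use of the trajectory centroid as its location, and the function names.

```python
from collections import defaultdict


def centroid(traj):
    """Mean (x, y) position of a trajectory; one hypothetical choice of where to place its estimate."""
    n = len(traj)
    return sum(p[0] for p in traj) / n, sum(p[1] for p in traj) / n


def diffusivity_map(estimates, bin_size):
    """Average per-trajectory diffusivity estimates on a square grid.

    estimates: iterable of (x, y, D), with (x, y) the trajectory's location and D its
    posterior-mode diffusivity. Returns {(ix, iy): (mean_D, n_trajectories)}.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for x, y, D in estimates:
        cell = (int(x // bin_size), int(y // bin_size))
        sums[cell] += D
        counts[cell] += 1
    return {cell: (sums[cell] / counts[cell], counts[cell]) for cell in sums}


# Hypothetical estimates (positions in um, D in um^2/s) from four trajectories
estimates = [(0.2, 0.3, 0.10), (0.7, 0.1, 0.12), (1.5, 0.2, 0.50), (1.8, 0.9, 0.46)]
grid = diffusivity_map(estimates, bin_size=1.0)
```

Cells that contain only a few trajectories carry large uncertainty. Comparing the underlying posteriors with the KL divergence indicates whether neighboring cells genuinely differ.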

Framework limitations

As we have discussed, the presence of localization error and the finite nature of trajectories will contribute to the uncertainty in any analysis of single particle trajectories. Here, we discuss several other important limitations to be considered when using this software package.

This framework is currently implemented only for the analysis of pure diffusion; however, anomalous diffusion (particularly sub-diffusion) is commonly reported in the analysis of biological trajectories. Users could adapt the package to analyze trajectories undergoing anomalous diffusion by editing our Bayesian estimation code. As described above, our conjugate prior and posterior were selected specifically to analyze a normal distribution of step sizes with zero mean. Because the step-size distribution depends on the diffusion model, the class of function used for the prior and posterior also depends on the diffusion model. To modify this framework for other diffusion models, users would therefore need to select new prior and posterior distributions. They would also need a new equation for the KL divergence between a pair of distributions belonging to that function class (i.e., a replacement for Eq 3). However, as the diffusion model becomes more complex, selecting a prior and posterior can become very challenging, limiting the scope of the framework.
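The sketch below shows what conjugacy buys. It compares the closed-form inverse-gamma posterior mode with a brute-force grid evaluation of prior × likelihood for the same zero-mean Gaussian step model. The grid approach is a generic fallback that we swap in for illustration; it is not part of the authors' package. In principle it works for any prior/likelihood pair, but it yields only a discretized posterior and no closed-form KL divergence. Function names, step values, and prior parameters are hypothetical.

```python
import math


def log_inv_gamma_pdf(s2, a, b):
    """Log density of the inverse-gamma prior IG(a, b) at variance s2 > 0."""
    return a * math.log(b) - math.lgamma(a) - (a + 1) * math.log(s2) - b / s2


def log_gaussian_likelihood(steps, s2):
    """Log likelihood of scalar step components under the zero-mean normal model N(0, s2)."""
    n = len(steps)
    return -0.5 * n * math.log(2 * math.pi * s2) - sum(x * x for x in steps) / (2 * s2)


def grid_posterior_mode(steps, a, b, grid):
    """Brute-force Bayes: argmax of log prior + log likelihood over a grid of variances."""
    log_post = [log_inv_gamma_pdf(s2, a, b) + log_gaussian_likelihood(steps, s2) for s2 in grid]
    return grid[max(range(len(grid)), key=log_post.__getitem__)]


def conjugate_posterior_mode(steps, a, b):
    """Closed form for the same model: posterior is IG(a + n/2, b + sum(x**2)/2), mode b/(a+1)."""
    a_post = a + len(steps) / 2
    b_post = b + sum(x * x for x in steps) / 2
    return b_post / (a_post + 1)


steps = [0.1, -0.2, 0.15, 0.05, -0.1]  # hypothetical per-axis displacements (um)
grid = [1e-4 * k for k in range(1, 2001)]  # candidate variances 2*D*dt (um^2)
mode_grid = grid_posterior_mode(steps, 2.0, 0.01, grid)
mode_closed = conjugate_posterior_mode(steps, 2.0, 0.01)
```

For a different step model, the likelihood function would change while the grid machinery stays the same. The closed-form line, and with it the analytic KL, would no longer be available.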

Realistic intra-cellular transport is further complicated by active transport and flow. Confinement and the physical properties of the cytoplasm (e.g., elasticity) can also shape intra-cellular dynamics. Because these factors are not considered in the current implementation of our framework, they will contribute to the error in the analysis of experimentally derived trajectories.
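As a worked illustration of how unmodeled flow biases the estimate, assume a constant drift velocity v superimposed on pure diffusion along one axis, with frame interval Δt. This drift model is our assumption, not the paper's. A zero-mean step model then folds the drift into the step variance:

```latex
\begin{aligned}
\Delta x &= v\,\Delta t + \xi, \qquad \xi \sim \mathcal{N}(0,\ 2D\,\Delta t) \\
\langle \Delta x^2 \rangle &= 2D\,\Delta t + v^2 \Delta t^2 \\
D_{\mathrm{app}} &= \frac{\langle \Delta x^2 \rangle}{2\,\Delta t} = D + \frac{v^2 \Delta t}{2}
\end{aligned}
```

The bias term grows with both Δt and v², so slower frame rates make even modest flow look like faster diffusion.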

Application to fractional Brownian motion trajectories

Many studies have shown intracellular transport to be sub-diffusive (i.e., α < 1 in Eq 1), with α = 0.75 in crowded cellular environments such as an actin lattice or the cytoplasm [3]. In particular, these trajectories have ergodic MSDs and velocities that are anti-correlated at short timescales. This behavior is captured by the sub-diffusive model of fractional Brownian motion (FBM). FBM trajectories are parameterized by the Hurst coefficient H, defined as H = α/2. Thus, FBM trajectories with H = 0.375 provide a more complicated but more realistic representation of intracellular transport than the simpler model of pure diffusion used in the results presented so far. However, applying FBM within this Bayesian estimation tool would require a much more complicated prior and posterior. While our tool is built to be robust to varying priors and posteriors, we recognize that defining these distributions for more complex models of motion can be challenging. To test how accurately the simpler diffusion model can predict the diffusivity of more realistic FBM trajectories, we applied the existing pure-diffusion Bayesian analysis to FBM-generated trajectories with H = 0.375 and calculated the error in the estimated diffusivity. The resulting posterior error heat-maps are presented in Fig 7. These FBM trajectories were produced using a publicly available simulator written by Christopher Flynn (https://pypi.org/project/fbm/). Note that for sub-diffusive behavior, there is no longer a single diffusion coefficient defined for the trajectory; instead, the diffusion coefficient must be defined for a given time lag (τ in Eq 1). For the results presented here, we analyze the error in the effective diffusion coefficient defined at τ = 1 second.
We find that the errors in the estimated diffusivity for these more biologically relevant trajectories are nearly identical to those reported for the purely diffusive trajectories. We therefore believe that, despite the complexity of experimentally derived intracellular trajectories, this analysis tool remains suitable for experimental diffusivity estimation.
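The FBM trajectories above can be reproduced in spirit with an exact Cholesky-based simulator. The paper itself used Christopher Flynn's `fbm` package, and this sketch is an independent, dependency-free stand-in, not that package's code. It assumes a 1D path scaled so that MSD(τ) = 2Dτ^(2H). Under that scaling, MSD at τ = 1 s equals 2D, so the parameter `D` is the effective diffusion coefficient at τ = 1 s used in the text. Function names are hypothetical, and the O(n³) cost limits this approach to short paths.

```python
import math
import random


def fgn_autocov(k, hurst):
    """Autocovariance at integer lag k of unit-variance fractional Gaussian noise (FBM increments)."""
    k = abs(k)
    return 0.5 * ((k + 1) ** (2 * hurst) - 2 * k ** (2 * hurst) + abs(k - 1) ** (2 * hurst))


def cholesky(mat):
    """Lower-triangular L with L @ L.T == mat, for a symmetric positive-definite matrix."""
    n = len(mat)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][m] * L[j][m] for m in range(j))
            if i == j:
                L[i][i] = math.sqrt(mat[i][i] - s)
            else:
                L[i][j] = (mat[i][j] - s) / L[j][j]
    return L


def simulate_fbm(n_steps, hurst, D, dt, seed=0):
    """1D FBM path sampled every dt with MSD(tau) = 2 * D * tau**(2 * hurst).

    Exact but O(n^3): correlated increments are L @ z for i.i.d. standard normal z.
    """
    rng = random.Random(seed)
    cov = [[fgn_autocov(i - j, hurst) for j in range(n_steps)] for i in range(n_steps)]
    L = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in range(n_steps)]
    scale = math.sqrt(2 * D) * dt ** hurst
    path = [0.0]
    for i in range(n_steps):
        path.append(path[-1] + scale * sum(L[i][m] * z[m] for m in range(i + 1)))
    return path


H = 0.375  # alpha = 2H = 0.75
rho1 = fgn_autocov(1, H)  # lag-1 increment correlation, 2**(2H - 1) - 1, negative for H < 0.5
path = simulate_fbm(64, H, D=0.1, dt=0.05)
```

The negative `rho1` (about -0.16 for H = 0.375) is the short-time anti-correlation of steps that the text describes.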

Fig 7. The percent error for a given posterior is the percent error between the true diffusion coefficient used to generate the trajectory and the mode of the posterior distribution (the diffusion coefficient that maximizes the probability density function). Trajectories for this figure are simulated using fractional Brownian motion with Hurst coefficient H = 0.375 (α = 0.75). This heatmap reports the mean percent error magnitude for 10⁴ posteriors generated under each set of trajectory length and localization error conditions, with diffusion coefficients of (A) 0.01 μm²/s, (B) 0.1 μm²/s and (C) 1.0 μm²/s. Please note the difference in heatmap scale bars.
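The caption's metric can be sketched as follows: the mean of |mode − D_true| / D_true × 100 over many simulated posteriors. For simplicity, this sketch applies the metric to pure-diffusion trajectories with Gaussian localization noise. To mirror Fig 7, the simulator would be swapped for an FBM generator. The frame interval `dt`, the prior (`a0`, `b0`), the noise model and `n_traj` are all assumptions; the figure itself uses 10⁴ posteriors per condition.

```python
import math
import random


def simulate_observed(n_steps, D, dt, loc_err, rng, dim=2):
    """Pure-diffusion trajectory observed with independent Gaussian localization error (std loc_err per axis)."""
    sigma = math.sqrt(2 * D * dt)
    true_pos = [0.0] * dim
    observed = []
    for _ in range(n_steps + 1):
        observed.append([p + rng.gauss(0.0, loc_err) for p in true_pos])
        true_pos = [p + rng.gauss(0.0, sigma) for p in true_pos]
    return observed


def posterior_mode_D(traj, dt, a0=1.0, b0=1e-6):
    """Mode of the inverse-gamma posterior on 2*D*dt, converted to D."""
    sq_sum, n = 0.0, 0
    for p, q in zip(traj[:-1], traj[1:]):
        for u, v in zip(p, q):
            sq_sum += (v - u) ** 2
            n += 1
    return (b0 + sq_sum / 2) / (a0 + n / 2 + 1) / (2 * dt)


def mean_percent_error(D, n_steps, loc_err, dt=0.05, n_traj=200, seed=0):
    """Mean |mode - D| / D * 100 over n_traj simulated trajectories."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_traj):
        mode = posterior_mode_D(simulate_observed(n_steps, D, dt, loc_err, rng), dt)
        total += abs(mode - D) / D * 100
    return total / n_traj


err_clean = mean_percent_error(0.1, 500, loc_err=0.0)
err_noisy = mean_percent_error(0.1, 500, loc_err=0.2, n_traj=50)
```

With independent localization noise of standard deviation σ, each observed step has per-axis variance 2DΔt + 2σ². A zero-mean step model therefore overestimates D unless that term is small relative to 2DΔt, which is why error grows with localization error in heatmaps of this kind.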

Conclusion

Heterogeneity of diffusive dynamics may substantially affect the transport of essential cellular substrates, but it remains largely uncharacterized. To shed light on the feasibility of distinguishing spatial from stochastic drivers of diffusive heterogeneity in trajectory data, we developed a framework for predicting our ability to detect differences in diffusivity under different experimental regimes. Our framework is intended to inform the design of experiments characterizing how diffusivity depends on sub-cellular location.

Acknowledgments

We would like to thank Steph Weber for her helpful comments on this manuscript, and Molly Maleckar, Gabriel Mitchell and Jamie Sherman for helpful conversations. We thank Jackson Brown for his CookieCutter template and guidance in repository initialization, and Thao Do for her scientific illustration. Finally, we thank Paul G. Allen, founder of the Allen Institute for Cell Science, for his vision, encouragement and support.

Data Availability

All data and software are publicly available on the GitHub repository https://github.com/AllenCellModeling/diffusive_distinguishability (DOI: 10.5281/zenodo.2662552).

Funding Statement

This work has been completed and funded by the Allen Institute for Cell Science.

References

1. Lee GM, Ishihara A, Jacobson K. Direct observation of Brownian motion of lipids in a membrane. PNAS. 1991;88:6274–6278. doi: 10.1073/pnas.88.14.6274
2. Saxton MJ, Jacobson K. Single-particle tracking: applications to membrane dynamics. Annu Rev Biophys Biomol Struct. 1997;26:373–399. doi: 10.1146/annurev.biophys.26.1.373
3. Weber SC, Spakowitz AJ, Theriot JA. Bacterial Chromosomal Loci Move Subdiffusively through a Viscoelastic Cytoplasm. Phys Rev Lett. 2010;104(23):238102. doi: 10.1103/PhysRevLett.104.238102
4. Magde D, Elson E, Webb WW. Thermodynamic Fluctuations in a Reacting System—Measurement by Fluorescence Correlation Spectroscopy. Phys Rev Lett. 1972;29:705–708. doi: 10.1103/PhysRevLett.29.705
5. Machán R, Hof M. Recent developments in fluorescence correlation spectroscopy for diffusion measurements in planar lipid membranes. Int J Mol Sci. 2010;11:427–457. doi: 10.3390/ijms11020427
6. Harwardt M, Dietz MS, Heilemann M, Wohland T. SPT and Imaging FCS Provide Complementary Information on the Dynamics of Plasma Membrane Molecules. Biophys J. 2018. doi: 10.1016/j.bpj.2018.03.013
7. Valentine MT, Kaplan PD, Thota D, Crocker JC, Gisler T, Prud’homme RK, et al. Investigating the microenvironments of inhomogeneous soft materials with multiple particle tracking. Phys Rev E. 2001;64:061506.
8. Lampo TJ, Stylianidou S, MP B, Wiggins P, Spakowitz AJ. Cytoplasmic RNA-Protein Particles Exhibit Non-Gaussian Subdiffusive Behavior. Biophys J. 2017;112(3):532–542. doi: 10.1016/j.bpj.2016.11.3208
9. Karslake JD, Donarski ED, Shelby SA, Demey DM, DiRita VJ, Veatch SL, et al. SMAUG: Analyzing single-molecule tracks with nonparametric Bayesian statistics. bioRxiv [Preprint]. 2019. doi: 10.1101/578567
10. Michalet X. Mean square displacement analysis of single-particle trajectories with localization error: Brownian motion in an isotropic medium. Phys Rev E Stat Nonlin Soft Matter Phys. 2010;82:041914. doi: 10.1103/PhysRevE.82.041914
11. Gelman A, Carlin JB, Stern HS, Rubin DB. Bayesian Data Analysis. 2nd ed. Chapman and Hall/CRC; 2004.
12. Llera A, Beckmann CF. Estimating an Inverse Gamma distribution. arXiv:1605.01019 [stat.ME]. 2016.
13. Savin T, Doyle P. Static and dynamic errors in particle tracking microrheology. Biophys J. 2005;88:623–638. doi: 10.1529/biophysj.104.042457
14. Forst NA, Hsiangmin EL, Blanpied TA. Optimization of Cell Morphology Measurement via Single-Molecule Tracking PALM. PLoS One. 2012;7(5):e36751. doi: 10.1371/journal.pone.0036751
15. Wu PH, Hale CM, Chen WC, Lee JS, Tseng Y, Wirtz D. High-throughput ballistic injection nanorheology to measure cell mechanics. Nat Protoc. 2012;7:155–170. doi: 10.1038/nprot.2011.436

PLoS One. doi: 10.1371/journal.pone.0221841.r001

Decision Letter 0

Juan Carlos del Alamo

3 Oct 2019

PONE-D-19-23195

A Bayesian framework for the detection of diffusive heterogeneity

PLOS ONE

Dear Dr Cass,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

We would appreciate receiving your revised manuscript by Nov 17 2019 11:59PM. When you are ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter.

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). This letter should be uploaded as separate file and labeled 'Response to Reviewers'.
A marked-up copy of your manuscript that highlights changes made to the original version. This file should be uploaded as separate file and labeled 'Revised Manuscript with Track Changes'.
An unmarked version of your revised paper without tracked changes. This file should be uploaded as separate file and labeled 'Manuscript'.

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

We look forward to receiving your revised manuscript.

Kind regards,

Juan Carlos del Alamo

Academic Editor

PLOS ONE

Journal Requirements:

1. When submitting your revision, we need you to address these additional requirements.

Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

http://www.journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and http://www.journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: No

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: This study describes a Bayesian inference algorithm to estimate local values of the diffusion coefficient inside live cells from single trajectories of intracellular particles. This type of algorithm can be useful to researchers interested in quantifying diffusivity of heterogeneous or time-varying environments, including but [not] limited to the cytoplasm of live cells. The manuscript describes related existing efforts in the literature, and makes a convincing point that the present algorithm, and in particular its associated freely accessible implementation, is sufficiently different from those existing efforts. In particular, while the parametric nature of the present study is a limitation with respect to existing, non-parametric, efforts, its simplicity may be advantageous to those without significant expertise in statistical mechanics.

The manuscript analyzes the error in the estimated D based on the localization error of the particle. As expected, the error decreases with the length of the recorded trajectory. However, Figure 5 shows this error seems to be unacceptably large for some combinations of values, and it is unclear whether a “typical experiment” (this is a loose term whose meaning is expanded below) would yield acceptable results. The authors argue that the purpose of Figure 5 is for each researcher to assess the error for their own experiments. This is valuable but Figure 5 is plotted in a way that makes this assessment difficult:

1) Only two values of D are covered. It would seem to make more sense to plot Figure 5 normalizing the localization error with (D*tau)^(1/2). This could capture the D-dependence of the estimation error, and only one panel might be necessary to cover all D values.

2) Second, a line plot or contour plot format would be preferable to read errors in the plot.

3) It would be informative to represent a “typical experiment” or experiments in the localization error and trajectory length coordinates of Figure 5. The authors can use experiments from the literature and / or their own data from previous studies.

As the authors point out, a limitation of the study is that it focuses on an idealized model of intracellular diffusion. The authors argue that the method could be adjusted to account for complicated phenomena, such as subdiffusion, but the intended audience of this algorithm may not find this straightforward. This issue is compounded with the fact that this reviewer finds the purely diffusive case to be particularly amenable to the analytical calculation of the posterior distribution. Other cases may be harder… It would be informative to illustrate how the algorithm would be modified in the subdiffusive or e.g., persistent-random walk case by presenting the posterior distribution for those processes.

Finally, there may be cases in which modifying the algorithm to account for non-purely diffusive, isotropic behavior is not feasible or where the actual behavior that needs to be accounted for is unknown a priori. It would be informative to know the error in estimated D in those cases. Again, subdiffusive, persistent-random or anisotropic random walk cases come to mind.

Minor comments:

1) Is the alpha in equation 2 related to the alpha in equation 1?

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files to be viewed.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2020 May 7;15(5):e0221841. doi: 10.1371/journal.pone.0221841.r002

Author response to Decision Letter 0

19 Dec 2019

This response is better viewed in the Response to Reviewers file, where our responses are interwoven with the reviewer's comments and clearly indicated. I have copied this text and placed the reviewer's original notes in brackets to give our responses appropriate context.

[This study describes a Bayesian inference algorithm to estimate local values of the diffusion coefficient inside live cells from single trajectories of intracellular particles. This type of algorithm can be useful to researchers interested in quantifying diffusivity of heterogeneous or time-varying environments, including but [not] limited to the cytoplasm of live cells. The manuscript describes related existing efforts in the literature, and makes a convincing point that the present algorithm, and in particular its associated freely accessible implementation, is sufficiently different from those existing efforts. In particular, while the parametric nature of the present study is a limitation with respect to existing, non-parametric, efforts, its simplicity may be advantageous to those without significant expertise in statistical mechanics.]

We’d like to thank the reviewer for their thoughtful consideration of our manuscript and helpful comments. Indeed, we hope that this tool may be of particular use to researchers without deeply computational or statistical backgrounds, offering a modifiable tool with more flexibility than the MSD, but less complexity than existing software that can have a higher barrier to entry.

[The manuscript analyzes the error in the estimated D based on the localization error of the particle. As expected, the error decreases with the length of the recorded trajectory. However, Figure 5 shows this error seems to be unacceptably large for some combinations of values, and it is unclear whether a “typical experiment” (this is a loose term whose meaning is expanded below) would yield acceptable results. The authors argue that the purpose of Figure 5 is for each researcher to assess the error for their own experiments. This is valuable but Figure 5 is plotted in a way that makes this assessment difficult:

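1) Only two values of D are covered. It would seem to make more sense to plot Figure 5 normalizing the localization error with (D*tau)^(1/2). This could capture the D-dependence of the estimation error, and only one panel might be necessary to cover all D values.

2) Second, a line plot or contour plot format would be preferable to read errors in the plot.]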

I agree with the reviewer that this kind of non-dimensionalization would offer a more robust representation of the model results. There are many relevant parameters whose relative values impact the estimation error, and careful choice of how to represent this data is certainly important. However, since we are aiming for this tool to be engaging for a more experimental audience, we wanted to maintain the use of more tangible parameters, which maintain their straightforward physical interpretability, and chose this format to prioritize familiarity and approachability over density of information reporting.

For any given experiment, you may have a single localization error and a limited range of trajectory lengths, so we understand why the parameterization of this figure may not be the most versatile for any one experiment. However, we hope that this parameterization of the lookup table will be applicable to a wider audience. Researchers viewing this figure will have experiments constrained by trajectories of varying lengths and localization errors, and we hope they can get a ballpark answer to the question of whether this tool will be useful for them. The diffusion coefficient value is of course relevant as well. We hope that representing the effects at three different orders of magnitude of diffusivity gives the reader a sense of what might be possible (a third order of magnitude was added in our revised manuscript).

That said, the tool is designed to have an accompanying tutorial Jupyter notebook with examples of how each figure is generated. This was a conscious choice to give those without extensive computational backgrounds a more approachable interface to the code. Readers can tweak analysis parameters themselves and see how the results change, generating adaptations of our figures tailored to their own experimental parameters and constraints. We hope that this design can help make up for the limitations in what can be presented in the manuscript figures.

[3) It would be informative to represent a “typical experiment” or experiments in the localization error and trajectory length coordinates of Figure 5. The authors can use experiments from the literature and / or their own data from previous studies.]

Thank you for this feedback - we have incorporated this suggestion.

[As the authors point out, a limitation of the study is that it focuses on an idealized model of intracellular diffusion. The authors argue that the method could be adjusted to account for complicated phenomena, such as subdiffusion, but the intended audience of this algorithm may not find this straightforward. This issue is compounded with the fact that this reviewer finds the purely diffusive case to be particularly amenable to the analytical calculation of the posterior distribution. Other cases may be harder... It would be informative to illustrate how the algorithm would be modified in the subdiffusive or e.g., persistent-random walk case by presenting the posterior distribution for those processes.]

We agree that demonstrating example adaptations of the prior and posterior for more complex models of intracellular motion would be an exciting enhancement of what this manuscript could offer. However, these advancements are beyond the scope of what we hope to explicitly provide within this manuscript. We hope that the detailed demonstration of how this tool is built, together with the acknowledgement that prior and posterior distributions may be adapted for more complex needs, is sufficient to demonstrate the value of this tool.

However, we agree with the reviewer’s important note that sub-diffusive motion is particularly important to address in greater detail. To address this concern, we have taken the reviewer’s suggestion of applying our analysis tool (with the existing prior and posterior designed for pure diffusion) to biologically relevant sub-diffusive trajectories, and we report the resulting estimation error. For this task we have used trajectories simulated using fractional Brownian motion, as previous work has shown this to be a prevalent mode of intracellular transport. The Hurst coefficient (H) used to define this process is H = alpha/2, where alpha is the parameter giving the MSD’s scaling with time lag (tau) as in Eq 1 of our manuscript. We have therefore set the Hurst coefficient in the simulated trajectories to best represent reported results for the sub-diffusive time scaling alpha = 0.75 (H = 0.375).

We feel this was an important addition to the manuscript in demonstrating the applicability of the tool and are grateful for the reviewer’s suggestion to include this.

[Minor comments:

1) Is the alpha in equation 2 related to the alpha in equation 1?]

Thanks for pointing this out! No, the two are not related. We’ve changed the inverse gamma parameters to (a, b) rather than (alpha, beta) to disambiguate.

Attachment

Submitted filename: Response to Reviewers.pdf


PLoS One. doi: 10.1371/journal.pone.0221841.r003

Decision Letter 1

Juan Carlos del Alamo

12 Mar 2020

PONE-D-19-23195R1

A Bayesian framework for the detection of diffusive heterogeneity

PLOS ONE

Dear Dr Cass,

We would appreciate receiving your revised manuscript by Apr 26 2020 11:59PM. When you are ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter.

Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). This letter should be uploaded as separate file and labeled 'Response to Reviewers'.
A marked-up copy of your manuscript that highlights changes made to the original version. This file should be uploaded as separate file and labeled 'Revised Manuscript with Track Changes'.
An unmarked version of your revised paper without tracked changes. This file should be uploaded as separate file and labeled 'Manuscript'.

We look forward to receiving your revised manuscript.

Kind regards,

Juan Carlos del Alamo

Academic Editor

PLOS ONE


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: (No Response)

Reviewer #2: (No Response)

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: N/A

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

Reviewer #1: Yes

Reviewer #2: Yes

**********

6. Review Comments to the Author

Reviewer #1: I appreciate the authors' efforts to address my concerns and am for the most part satisfied with their revisions. I have a couple of remaining comments.

First, the list of references is rather short and there are places at which the authors discuss standard statistical inference theory without providing appropriate references to the literature. Perhaps a graduate level textbook would be enough. I believe this would be important considering the targeted audience.

Second, I still believe the authors overestimate the generality / flexibility of their approach. I understand the computational framework they present could be extended to other scenarios more representative of intracellular fluctuations than a Gaussian process. However, it is not clear that these extensions would be trivial. In fact, they even recognize this point themselves (circa line 370). I appreciate the authors including a section where they benchmark their tool for fractional Brownian motion. I would suggest to temper the statements about generality of the framework. Also, please use the same color axis and color bars in figures 5 and 7 to facilitate direct comparison (as in by caxis of Matlab or clim of python).

Reviewer #2: The authors' manuscript, with the accompanying well-documented python repository, is a valuable tool for researchers without significant expertise in Bayesian statistics. It would be more helpful to build better intuition for the KL divergence criterion with more details on how to interpret Fig 6. For example, it would help to give an approximate threshold KL value below which the posteriors have a given probability of representing the same true diffusion constant (and are therefore not representative of a heterogeneous environment). The authors explain the intuition behind KL values, but a rule of thumb would be more beneficial for designing experiments.

The authors also provide an accessible way of estimating baseline errors in inference using heatmaps in Fig 5. A very important source of sensitivity to parameter inference lies in prior distribution parameters and a brief guide of choosing parameters (a,b) to not introduce bias in analysis (uninformative prior) would be recommended.

Minor typo: In pg 7/18, the likelihood function is written as p(theta | x) instead of p(x | theta).

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

PLoS One. 2020 May 7;15(5):e0221841. doi: 10.1371/journal.pone.0221841.r004

Author response to Decision Letter 1

14 Apr 2020

>>>Reviewer #1: I appreciate the authors' efforts to address my concerns and am for the most part satisfied with their revisions. I have a couple of remaining comments.

Thanks for taking the time to consider our manuscript again and for your helpful feedback.

>>>First, the list of references is rather short, and there are places at which the authors discuss standard statistical inference theory without providing appropriate references to the literature. Perhaps a graduate-level textbook would be enough. I believe this would be important given the target audience.

We appreciate the suggestion and have added references to a graduate Bayesian statistics text (Gelman et al.’s “Bayesian Data Analysis” [11]) where appropriate.

>>>Second, I still believe the authors overestimate the generality/flexibility of their approach. I understand the computational framework they present could be extended to other scenarios more representative of intracellular fluctuations than a Gaussian process. However, it is not clear that these extensions would be trivial. In fact, they even recognize this point themselves (circa line 370). I appreciate the authors including a section where they benchmark their tool on fractional Brownian motion. I would suggest tempering the statements about the generality of the framework.

We have removed the “flexible” descriptor in line 354 and added another statement acknowledging the challenge of framing more complex priors/posteriors in lines 372-374 of the “Framework limitations” section.

>>>Also, please use the same color axis and color bars in Figs 5 and 7 to facilitate direct comparison (e.g., with caxis in MATLAB or clim in Python).

Thanks for catching this; we’ve updated the rendering of these figures so that each pair of comparable plots (i.e., 5A/7A, 5B/7B, etc.) has identical color axes and color bars.
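For readers producing comparable heatmaps, the shared color axis the reviewer asked for can be obtained in matplotlib by passing the same `vmin`/`vmax` to every `imshow` call, which fixes each image's color limits (its `clim`). The sketch below is illustrative only: the two arrays are placeholder numbers, not the values plotted in Figs 5 and 7, the helper `plot_shared_scale` is hypothetical, and the repository's own plotting code may do this differently.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt


def plot_shared_scale(panel_a, panel_b, titles=("Fig 5A", "Fig 7A")):
    """Draw two heatmaps side by side on one common color scale.

    panel_a, panel_b: 2D nested lists of placeholder values (hypothetical data).
    """
    # One global min/max across both panels fixes the color axis for both.
    vmin = min(min(row) for row in panel_a + panel_b)
    vmax = max(max(row) for row in panel_a + panel_b)
    fig, axes = plt.subplots(1, 2, figsize=(8, 3.5))
    images = []
    for ax, data, title in zip(axes, (panel_a, panel_b), titles):
        im = ax.imshow(data, vmin=vmin, vmax=vmax, cmap="viridis")
        ax.set_title(title)
        images.append(im)
    # A single colorbar is valid for both panels because they share vmin/vmax.
    fig.colorbar(images[0], ax=list(axes))
    return fig, images


fig, images = plot_shared_scale([[1.0, 5.0], [2.0, 8.0]], [[0.5, 3.0], [4.0, 12.0]])
print([im.get_clim() for im in images])
```

Computing the limits from the union of both panels, rather than letting each `imshow` autoscale, is what makes equal colors mean equal values across the two figures.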

>>>Reviewer #2: The authors' manuscript, with the accompanying well-documented Python repository, is a valuable tool for researchers without significant expertise in Bayesian statistics.

We appreciate you taking the time to review our manuscript and accompanying repository.

>>>It would be helpful to build better intuition for the KL divergence criterion, with more detail on how to interpret Fig 6: for example, an approximate threshold KL value below which the posteriors have a given probability of representing the same true diffusion constant (and therefore of not reflecting a heterogeneous environment). The authors explain the intuition behind KL values, but a rule of thumb would be more useful for designing experiments.

Thanks for this suggestion; we agree that including a benchmark value would increase the usability of this reference table and have included a value and associated brief discussion in lines 332-342.
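To make the KL criterion concrete: if both diffusivity posteriors are inverse-gamma (the conjugate form the paper uses), their KL divergence has a closed form, because D ~ InvGamma(α, β) exactly when 1/D ~ Gamma(α, rate β), and KL divergence is unchanged by an invertible change of variables. The sketch below is an illustration under that assumption, not the repository's implementation, which may compute the divergence numerically or symmetrize it. KL is asymmetric, so argument order matters. The function names are hypothetical, and digamma is implemented by hand only to keep the block dependency-free.

```python
import math


def digamma(x):
    """Digamma function for x > 0 via recurrence plus asymptotic series."""
    result = 0.0
    # Shift x upward using psi(x) = psi(x + 1) - 1/x until the series is accurate.
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    f = 1.0 / (x * x)
    series = f * (1 / 12 - f * (1 / 120 - f * (1 / 252 - f * (1 / 240 - f / 132))))
    return result + math.log(x) - 0.5 / x - series


def kl_inverse_gamma(alpha_p, beta_p, alpha_q, beta_q):
    """KL(p || q) for p = InvGamma(alpha_p, beta_p), q = InvGamma(alpha_q, beta_q).

    Equals the KL divergence between Gamma(alpha, rate=beta) distributions,
    since the map D -> 1/D is invertible.
    """
    return ((alpha_p - alpha_q) * digamma(alpha_p)
            - math.lgamma(alpha_p) + math.lgamma(alpha_q)
            + alpha_q * (math.log(beta_p) - math.log(beta_q))
            + alpha_p * (beta_q - beta_p) / beta_p)


# Hypothetical posteriors for two equal-length trajectories whose scale
# parameters differ by 20%; larger values indicate more distinguishable posteriors.
print(kl_inverse_gamma(101.0, 100.0, 101.0, 120.0))
```

Because longer trajectories raise the shape parameter, the same relative difference in diffusivity yields a larger divergence as more data are collected, which is the trend a look-up table such as Fig 6 summarizes.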

>>>The authors also provide an accessible way of estimating baseline errors in inference using the heatmaps in Fig 5. A very important source of sensitivity in parameter inference lies in the prior distribution parameters, and a brief guide to choosing the parameters (a, b) so as not to introduce bias into the analysis (i.e., an uninformative prior) would be recommended.

We agree that a discussion of this prior bias and parameter choice strengthens the manuscript and the tool’s usability; we’ve included a brief discussion of this in lines 191-208.
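As an illustration of why the prior parameters (a, b) matter, consider the standard conjugate update for free 2D Brownian motion without localization error. This assumes each displacement component is Gaussian with variance 2DΔt and the prior is D ~ InvGamma(a, b). The posterior is then InvGamma(a + n/2, b + S/(4Δt)), where n is the number of displacement components and S their sum of squares, so a and b act roughly like pseudo-observations whose influence fades as the trajectory grows. This is a sketch under those assumptions with hypothetical function names, not necessarily the exact parameterization used in the paper or repository.

```python
import math
import random


def simulate_displacements(D, dt, n_steps, seed=0):
    """Simulate 2D Brownian displacements; each component ~ N(0, 2*D*dt)."""
    rng = random.Random(seed)
    sigma = math.sqrt(2.0 * D * dt)
    return [(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)) for _ in range(n_steps)]


def inverse_gamma_posterior(displacements, dt, a, b):
    """Conjugate update of an InvGamma(a, b) prior on D from 2D displacements.

    Assumes no localization error and independent Gaussian steps.
    Returns the posterior (shape, scale).
    """
    n_components = 2 * len(displacements)  # x and y component per step
    sum_sq = sum(dx * dx + dy * dy for dx, dy in displacements)
    return a + n_components / 2.0, b + sum_sq / (4.0 * dt)


def posterior_mean(shape, scale):
    """Mean of InvGamma(shape, scale); defined only for shape > 1."""
    return scale / (shape - 1.0)


# Both priors have mean b / (a - 1) = 2.0, four times the simulated D = 0.5,
# but the first is far more diffuse. Compare a short and a long trajectory.
for n_steps in (10, 1000):
    steps = simulate_displacements(D=0.5, dt=1.0, n_steps=n_steps, seed=1)
    weak = posterior_mean(*inverse_gamma_posterior(steps, 1.0, a=1.01, b=0.02))
    strong = posterior_mean(*inverse_gamma_posterior(steps, 1.0, a=11.0, b=20.0))
    print(n_steps, round(weak, 3), round(strong, 3))
```

With only 10 steps the concentrated prior pulls the estimate well above 0.5, while the diffuse prior leaves it close to the data; with 1000 steps the two estimates nearly agree.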

>>>Minor typo: In pg 7/18, the likelihood function is written as p(theta | x) instead of p(x | theta).

Thanks for catching this typo; we’ve fixed it in the updated manuscript.
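For reference, the corrected notation places the likelihood p(x | θ), the probability of the observed displacements x given the parameter θ (here the diffusivity), alongside the prior and posterior in Bayes' theorem. This is the standard form (see, e.g., Gelman et al. [11]), not a quotation of the manuscript's equation.

```latex
p(\theta \mid x) = \frac{p(x \mid \theta)\, p(\theta)}{p(x)}
  \propto p(x \mid \theta)\, p(\theta),
\qquad
p(x) = \int p(x \mid \theta)\, p(\theta)\, d\theta
```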

Attachment

Submitted filename: response_to_reviewers.pdf (28.4 KB, PDF)

PLoS One. doi: 10.1371/journal.pone.0221841.r005

Decision Letter 2

Juan Carlos del Alamo

17 Apr 2020

A Bayesian framework for the detection of diffusive heterogeneity

PONE-D-19-23195R2

Dear Dr. Cass,

We are pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it complies with all outstanding technical requirements.

Within one week, you will receive an e-mail containing information on the amendments required prior to publication. When all required modifications have been addressed, you will receive a formal acceptance letter and your manuscript will proceed to our production department and be scheduled for publication.

Shortly after the formal acceptance letter is sent, an invoice for payment will follow. To ensure an efficient production and billing process, please log into Editorial Manager at https://www.editorialmanager.com/pone/, click the "Update My Information" link at the top of the page, and update your user information. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to enable them to help maximize its impact. If they will be preparing press materials for this manuscript, you must inform our press team as soon as possible and no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

With kind regards,

Juan Carlos del Alamo

Academic Editor

PLOS ONE


PLoS One. doi: 10.1371/journal.pone.0221841.r006

Acceptance letter

Juan Carlos del Alamo

22 Apr 2020

PONE-D-19-23195R2

A Bayesian framework for the detection of diffusive heterogeneity

Dear Dr. Cass:

I am pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please notify them about your upcoming paper at this point, to enable them to help maximize its impact. If they will be preparing press materials for this manuscript, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

For any other questions or concerns, please email plosone@plos.org.

Thank you for submitting your work to PLOS ONE.

With kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Juan Carlos del Alamo

Academic Editor

PLOS ONE

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Attachment

Submitted filename: Response to Reviewers.pdf (25.2 KB, PDF)

Attachment

Submitted filename: response_to_reviewers.pdf (28.4 KB, PDF)

Data Availability Statement

All data and software are publicly available on the GitHub repository https://github.com/AllenCellModeling/diffusive_distinguishability (DOI: 10.5281/zenodo.2662552).

1. Lee GM, Ishihara A, Jacobson K. Direct observation of brownian motion of lipids in a membrane. PNAS. 1991;88:6274–6278. doi: 10.1073/pnas.88.14.6274

2. Saxton MJ, Jacobson K. SINGLE-PARTICLE TRACKING: Applications to Membrane Dynamics. Annu Rev Biophys Biomol Struct. 1997;26:373–399. doi: 10.1146/annurev.biophys.26.1.373

3. Weber SC, Spakowitz AJ, Theriot JA. Bacterial Chromosomal Loci Move Subdiffusively through a Viscoelastic Cytoplasm. Phys Rev Lett. 2010;104(23):238102. doi: 10.1103/PhysRevLett.104.238102

4. Magde D, Elson E, Webb WW. Thermodynamic Fluctuations in a Reacting System - Measurement by Fluorescence Correlation Spectroscopy. Phys Rev Lett. 1972;29:705–708. doi: 10.1103/PhysRevLett.29.705

5. Machán R, Hof M. Recent developments in fluorescence correlation spectroscopy for diffusion measurements in planar lipid membranes. Int J Mol Sci. 2010;11:427–457. doi: 10.3390/ijms11020427

6. Harwardt M, Dietz MS, Heilemann M, Wohland T. SPT and Imaging FCS Provide Complementary Information on the Dynamics of Plasma Membrane Molecules. Biophys J. 2018. doi: 10.1016/j.bpj.2018.03.013

7. Valentine MT, Kaplan PD, Thota D, Crocker JC, Gisler T, Prud’homme RK, et al. Investigating the microenvironments of inhomogeneous soft materials with multiple particle tracking. Phys Rev E. 2001;64:061506.

8. Lampo TJ, Stylianidou S, MP B, Wiggins P, Spakowitz AJ. Cytoplasmic RNA-Protein Particles Exhibit Non-Gaussian Subdiffusive Behavior. Biophys J. 2017;112(3):532–542. doi: 10.1016/j.bpj.2016.11.3208

9. Karslake JD, Donarski ED, Shelby SA, Demey DM, DiRita VJ, Veatch SL, et al. SMAUG: Analyzing single-molecule tracks with nonparametric Bayesian statistics. bioRxiv [Preprint]. 2019. doi: 10.1101/578567

10. Michalet X. Mean square displacement analysis of single-particle trajectories with localization error: Brownian motion in an isotropic medium. Phys Rev E. 2010;82:041914. doi: 10.1103/PhysRevE.82.041914

11. Gelman A, Carlin JB, Stern HS, Rubin DB. Bayesian Data Analysis. 2nd ed. Chapman and Hall/CRC; 2004.

12. Llera A, Beckmann CF. Estimating an Inverse Gamma distribution. arXiv:1605.01019 [stat.ME]. 2016.

13. Savin T, Doyle P. Static and dynamic errors in particle tracking microrheology. Biophys J. 2005;88:623–638. doi: 10.1529/biophysj.104.042457

14. Forst NA, Hsiangmin EL, Blanpied TA. Optimization of Cell Morphology Measurement via Single-Molecule Tracking PALM. PLoS One. 2012;7(5):e36751. doi: 10.1371/journal.pone.0036751

15. Wu PH, Hale CM, Chen WC, Lee JS, Tseng Y, Wirtz D. High-throughput ballistic injection nanorheology to measure cell mechanics. Nat Protoc. 2012;7:155–170. doi: 10.1038/nprot.2011.436


