This is a preprint.

It has not yet been peer reviewed by a journal.

[Preprint]. 2024 Feb 29:2024.02.27.582378. [Version 1] doi: 10.1101/2024.02.27.582378

Learning Continuous 2D Diffusion Maps from Particle Trajectories without Data Binning

Vishesh Kumar 1,2, J Shepard Bryan IV 1,2, Alex Rojewski 1,2, Carlo Manzo 3,4, Steve Pressé 1,2,5,*
PMCID: PMC10925201  PMID: 38464131

Abstract

Diffusion coefficients often vary across regions, such as cellular membranes, and quantifying their variation can provide valuable insight into local membrane properties such as composition and stiffness. Toward quantifying diffusion coefficient spatial maps and uncertainties from particle tracks, we use a Bayesian method and place Gaussian Process (GP) Priors on the maps. For the sake of computational efficiency, we leverage inducing point methods on GPs arising from the mathematical structure of the data giving rise to non-conjugate likelihood-prior pairs. We analyze both synthetic data, where ground truth is known, as well as data drawn from live-cell single-molecule imaging of membrane proteins. The resulting tool provides an unsupervised method to rigorously map diffusion coefficients continuously across membranes without data binning.

1. Introduction

Cellular membranes play critical roles in many important biological processes, such as signal transduction [1], molecular transport, and the maintenance of structural integrity of cells [2]. Owing to the complexity of cellular membrane architecture and its interactions with peripheral structures on both the interior and exterior of the cell [3], modeling membrane and, more generally, heterogeneous protein diffusive dynamics in space remains an active area of study [4, 5, 6, 7].

To understand the dynamics of embedded membrane proteins, several methods have been proposed to map diffusion coefficients, or diffusivities, of membrane proteins [8, 9, 10, 11, 12, 13, 14].

While all of these methods infer dynamical quantities, most of them involve data binning as a data pre-processing step. This necessarily reduces the amount of information left to analyze in deducing diffusion coefficient maps whether from membrane proteins or other applications. Beyond data reduction, binning methods, ultimately, impact the spatial precision of inference and our ability to rigorously propagate error arising from particle localizations into diffusion coefficient spatial maps. Approximately optimizing bin sizes, locations, and shapes in estimating diffusion coefficient magnitudes has been addressed, whether involving Voronoi tessellation [15] or assigning length scales to each localization based on noise [16], though leaving unresolved the possibility of avoiding data binning altogether.

Furthermore, inherent to binned analysis is the implication that diffusion coefficients are assumed to vary across bins, but otherwise remain constant within any one bin [13]. This sharp spatial variation, introduced by binning, masks the precise underlying gradient of the diffusion coefficient change within a bin that may already be encoded in the data. Recent approaches, based on deep learning [14], remove the need for binning but require supervised training on labeled datasets.

To address these fundamental issues, we propose a new framework circumventing these difficulties by determining a continuous diffusion coefficient map using all available data without binning in an unsupervised fashion. That is, we are focused on learning continuous 2D diffusion coefficient maps from trajectory data. In principle, the trajectory data can be of any type and the spatially varying diffusion coefficient can arise from any number of physical origins.

To achieve this, we develop a Bayesian framework that uses trajectory data to infer all candidate diffusion coefficient maps alongside their associated uncertainty. Within our Bayesian framework, we avoid data binning by leveraging the mathematics of Gaussian Process (GP) priors on the continuously defined diffusion map that we wish to learn. We further leverage approximations (inducing point methods detailed later [17, 18]) to otherwise reduce the cubic scaling [19] of naive GP implementations.

We demonstrate our method on both synthetic data, where ground truth diffusion maps are known, as well as experimental data involving membrane protein trajectories extracted from live-cell single-molecule imaging experiments on cellular membranes.

2. Methods

Our goal is to infer continuous spatial diffusion coefficient maps in 2D from particle trajectory data. In many practical examples, these particles are membrane proteins. We first develop the theory underlying our method. We then provide a numerical scheme suited to the method. Finally, we validate our method on both synthetic and experimental data.

2.1. Theory Methods

Concretely, we treat diffusion dynamics as a Brownian random walk, under the Itô approximation [20, 21], with spatially varying diffusion coefficient. Under this approximation, we first describe a forward model relating the collected data to spatial diffusion coefficient maps. Next, we design an inverse method, to learn diffusion coefficient landscapes warranted by the data. Strictly speaking, as we work within a Bayesian paradigm, we develop a posterior over all candidate spatial maps to which we can assign probabilities. We then use Monte Carlo to sample from our posterior. Though in principle our framework assigns probabilities to all putative spatial maps given the data, for convenience, the spatial maps we illustrate in all figures are those maximizing the posterior termed the maximum a posteriori (MAP) 2D maps.

2.1.1. Forward model

We index each particle, with which we associate one track (i.e., one location in each frame), using $i = 1, \ldots, I$, with $I$ the total number of particles. For each particle, location measurements occur at fixed time intervals, spaced $\Delta t$ apart, defining time frames. Not all particles are present in each frame. To accommodate this, we define a quantity $N_i$, the total number of frames in which particle $i$ appears. For instance, particle 1 ($i = 1$) may appear in a total of 2 frames, and thus $N_1 = 2$. Particles are assumed to appear in consecutive frames; if a particle disappears, for instance due to blinking of the labels on membrane proteins in fluorescence experiments, we treat it as a new particle once it reappears. As we are only learning diffusion coefficient maps, and not keeping track of particle identities, this convention is purely a matter of convenience.

Next, we define the position of a given particle, $i$, in its $n_i$th frame as $r^i_{n_i}$. Within a free diffusion model with spatially varying diffusion coefficient, we assume this position to be normally distributed around the preceding position $r^i_{n_i-1}$, with variance proportional to the diffusion coefficient at that previous position. This transition probability is expressed as

$$\mathcal{P}(r^i_{n_i} \mid r^i_{n_i-1}, D) = \mathcal{N}\left(r^i_{n_i};\, r^i_{n_i-1},\, 2\Delta t\, D(r^i_{n_i-1})\, I_2\right). \tag{1}$$

Here $D(\cdot)$ is a continuous function describing the spatially dependent diffusion coefficient and $I_2$ is the 2D identity matrix. Using the transition probability defined above, we construct the likelihood of a collection of $I$ trajectories, with $N_i$ positions for particle $i$, given a diffusion coefficient map, $D$, as follows

$$\mathcal{P}(r \mid D) = \prod_{i=1}^{I} \prod_{n_i=2}^{N_i} \mathcal{N}\left(r^i_{n_i};\, r^i_{n_i-1},\, 2\Delta t\, D(r^i_{n_i-1})\, I_2\right), \tag{2}$$

where $r = \{r^i_{n_i}\}^{i=1:I}_{n_i=1:N_i}$ collects all observed particle positions for all particles.
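As an illustration, the logarithm of the likelihood in Equation (2) can be evaluated directly from a list of tracks. The following is a minimal sketch in Python with NumPy, not the implementation used in this work; the function name `log_likelihood`, the callable `diffusion_map`, the track layout, and the toy values are all assumptions made for this example.

```python
import numpy as np

def log_likelihood(tracks, diffusion_map, dt):
    """Log of Equation (2): a sum of isotropic 2D Gaussian log-densities.

    tracks: list of (N_i, 2) arrays, one per particle (hypothetical layout).
    diffusion_map: callable mapping (n, 2) positions to n diffusion coefficients.
    dt: time between consecutive frames.
    """
    total = 0.0
    for track in tracks:
        prev, curr = track[:-1], track[1:]
        # Variance per axis is set by D at the previous position, as in Equation (1).
        var = 2.0 * dt * diffusion_map(prev)
        sq_disp = np.sum((curr - prev) ** 2, axis=1)
        # log N(r; r_prev, var * I_2) = -log(2 pi var) - |r - r_prev|^2 / (2 var)
        total += np.sum(-np.log(2.0 * np.pi * var) - sq_disp / (2.0 * var))
    return total

# Toy check: flat map D = 0.5, one track with a single unit step, dt = 1.
flat_map = lambda pos: np.full(len(pos), 0.5)
toy_tracks = [np.array([[0.0, 0.0], [1.0, 0.0]])]
toy_value = log_likelihood(toy_tracks, flat_map, dt=1.0)
```

In the toy check the variance per axis is 1, so the value is $-\log(2\pi) - 1/2$.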

2.1.2. Inverse method

Given our forward model, our next task is to develop the (posterior) probability distribution over diffusion coefficient maps. For this, we use Bayesian inference, as it provides a principled framework to systematically incorporate observed data, leading to reliable and robust estimation. In mathematical terms, this is achieved leveraging Bayes’ Theorem

$$\mathcal{P}(D(\cdot) \mid r) \propto \mathcal{P}(D(\cdot))\, \mathcal{P}(r \mid D(\cdot)), \tag{3}$$

where $\mathcal{P}(D(\cdot))$ is the prior distribution on all candidate maps, $\mathcal{P}(r \mid D(\cdot))$ is the likelihood of the data provided a specific diffusion coefficient map, given by Equation (2), and $\mathcal{P}(D(\cdot) \mid r)$ is the posterior distribution assigning probabilities to all candidate diffusion coefficient maps given the data.

Having specified the likelihood in Equation (2), we now specify a prior. Specifically, we need a distribution on $D(\cdot)$ assigning probabilities to continuous surfaces but also allowing for a convenient form when evaluated at discrete spatial positions. A common choice is the Gaussian Process (GP) prior [19, 18]. GPs are an infinite collection of co-varying random variables, any finite subsample of which follows a multivariate Gaussian distribution, allowing for a means by which to assign a probability over function space. In this case, the function space consists of surfaces. This means that we can assign probabilities to continuous diffusion coefficient maps based on a discrete set of values.

Thus far, we have been using $D(\cdot)$ to refer to the continuous diffusion coefficient function. As we begin to explicitly define the inference task, we will need to define this function on a 2D discrete grid as finely spaced as computational efficiency permits. Notationally, we use $D$ to represent an array of diffusion coefficients and use an appropriate subscripted index to describe the locations of the discretization. Selecting a finite number of training points, $\omega$, on the diffusion coefficient surface, the prior approximates to

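$$\mathcal{P}(D) = \mathrm{GP}(D;\, \mu, K) \;\rightarrow\; \mathcal{P}(D_\omega) = \mathcal{N}(D_\omega;\, \mu_\omega, K_{\omega\omega}). \tag{4}$$

Here $D_\omega$ is the array of diffusion coefficients based on an arbitrary surface at locations $\omega$, $\mu_\omega$ is the mean of the prior at those same training points, and $K_{\omega\omega}$ is the auto-covariance between the training points.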

Evaluating the posterior typically involves inverting the covariance matrix [19], which becomes computationally expensive, scaling cubically with the number of data points [18], and unstable for large datasets. To address this, we turn to an inducing point method [17]: we define a uniform grid of points, perform inference on those points, and obtain a finer resolution when needed by leveraging a covariance-based interpolation method. For our inducing point model, we establish $m$ inducing points as a uniform grid on the domain on which we infer $D_m$. That is

$$\mathcal{P}(D_m) = \mathcal{N}(D_m;\, \mu_{\mathrm{MLE}}, K_{mm}), \tag{5}$$

where $D_m$ is the array of diffusion coefficients at the inducing points, $\mu_{\mathrm{MLE}}$ is the mean of the prior, and $K_{mm}$ is the auto-covariance matrix between the inducing points. Having defined a computationally efficient form of our prior, we can proceed by defining values for the two hyperparameter quantities, mean and covariance. As a convenient starting point, we set the mean of the prior to be a flat diffusion coefficient with magnitude given by the Maximum Likelihood Estimate (MLE) over the entire data [22],

$$\mu_{\mathrm{MLE}} = \frac{1}{4\left(\left(\sum_{i=1}^{I} N_i\right) - I\right)\Delta t} \sum_{i=1}^{I} \sum_{n_i=2}^{N_i} \left\| r^i_{n_i} - r^i_{n_i-1} \right\|^2. \tag{6}$$
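Equation (6) is the mean squared step length over all tracks divided by $4\Delta t$, where the number of steps is the number of localizations minus the number of tracks. A minimal sketch follows, assuming tracks are stored as NumPy arrays of shape $(N_i, 2)$; the function name `global_mle` and the toy tracks are hypothetical.

```python
import numpy as np

def global_mle(tracks, dt):
    """Equation (6): summed squared displacements divided by 4 * (number of steps) * dt."""
    n_steps = sum(len(t) for t in tracks) - len(tracks)      # (sum_i N_i) - I
    sq_sum = sum(np.sum((t[1:] - t[:-1]) ** 2) for t in tracks)
    return sq_sum / (4.0 * n_steps * dt)

# Two toy tracks with three steps in total, each of squared length 4.
tracks = [np.array([[0.0, 0.0], [2.0, 0.0], [2.0, 2.0]]),
          np.array([[1.0, 1.0], [1.0, 3.0]])]
mu_mle = global_mle(tracks, dt=1.0)   # 12 / (4 * 3 * 1) = 1.0
```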

As for the covariance, we choose a squared exponential kernel [18], also known as the Gaussian Radial Basis Function (RBF), whose form reads

$$k(x, x') = \lambda^2 \exp\left(-\frac{1}{2\ell^2} \left\| x - x' \right\|^2\right). \tag{7}$$

Here $\lambda$ sets the variance of the multivariate Gaussian and $\ell$ defines the length scale of the covariance between positions $x$ and $x'$. The explicit values of these hyperparameters can be tuned based on the expected variation of the true diffusion surface. Generally, to keep the prior uninformative, we set the variance parameter, $\lambda$, to twice the magnitude of the MLE, and set the length scale, $\ell$, to 20% of the data’s range.
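The kernel of Equation (7) and the hyperparameter heuristics just described can be written compactly. This is a sketch only: `rbf_kernel` is a hypothetical helper name, and the numerical values of the MLE and the data range are placeholders rather than values from this work.

```python
import numpy as np

def rbf_kernel(x_a, x_b, lam, ell):
    """Equation (7) evaluated between every row of x_a (n, 2) and every row of x_b (m, 2)."""
    diff = x_a[:, None, :] - x_b[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    return lam ** 2 * np.exp(-0.5 * sq_dist / ell ** 2)

# Hyperparameter heuristics from the text, applied to placeholder numbers.
mu_mle = 0.1                 # assumed global MLE (arbitrary units)
data_range = 25.0            # assumed spatial extent of the data
lam = 2.0 * mu_mle           # twice the magnitude of the MLE
ell = 0.2 * data_range       # 20% of the data's range

points = np.array([[0.0, 0.0], [5.0, 0.0]])
K = rbf_kernel(points, points, lam, ell)
```

The two example points are exactly one length scale apart, so the off-diagonal entries equal $\lambda^2 e^{-1/2}$.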

For computational convenience, we placed the prior on a coarser, inducing point, grid though our likelihood is on the grid of available (unprocessed, i.e., unbinned) data points. Thus to compute both prior and likelihood simultaneously, we must have a way to interpolate from one grid to the other. This is achieved by rigorously interpolating using our covariance matrices [17, 18]

$$D_r = K_{rm} K_{mm}^{-1} D_m, \tag{8}$$

whose elements read

$$D_{r^i_{n_i}} = K_{r^i_{n_i} m} K_{mm}^{-1} D_m. \tag{9}$$

Here $D_{r^i_{n_i}}$ is the diffusion coefficient at $r^i_{n_i}$ and $K_{r^i_{n_i} m}$ is a row vector, from the full covariance matrix $K_{rm}$, whose elements are the covariance between the position $r^i_{n_i}$ and all inducing points.
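The interpolation of Equation (8) amounts to one linear solve followed by a matrix product. The sketch below is illustrative only: the 4 x 4 inducing grid, the map values, and the small diagonal jitter added for numerical stability are assumptions and are not part of Equation (8).

```python
import numpy as np

def rbf_kernel(x_a, x_b, lam, ell):
    """Equation (7) between every row of x_a (n, 2) and every row of x_b (m, 2)."""
    sq_dist = np.sum((x_a[:, None, :] - x_b[None, :, :]) ** 2, axis=-1)
    return lam ** 2 * np.exp(-0.5 * sq_dist / ell ** 2)

def interpolate(positions, inducing_points, d_m, lam, ell, jitter=1e-8):
    """Equation (8): D_r = K_rm K_mm^{-1} D_m."""
    k_mm = rbf_kernel(inducing_points, inducing_points, lam, ell)
    # Diagonal jitter for numerical stability (an assumption of this sketch).
    k_mm = k_mm + jitter * lam ** 2 * np.eye(len(inducing_points))
    k_rm = rbf_kernel(positions, inducing_points, lam, ell)
    # Solve the linear system rather than forming the explicit inverse.
    return k_rm @ np.linalg.solve(k_mm, d_m)

# Hypothetical 4 x 4 inducing grid on a 25 x 25 field of view.
grid = np.linspace(0.0, 25.0, 4)
gx, gy = np.meshgrid(grid, grid)
inducing_points = np.column_stack([gx.ravel(), gy.ravel()])
d_m = 0.1 + 0.002 * inducing_points[:, 0]        # arbitrary smooth map at the inducing points
positions = np.array([[3.0, 4.0], [12.5, 12.5], [20.0, 7.0]])

d_r = interpolate(positions, inducing_points, d_m, lam=0.2, ell=5.0)
# Interpolating back onto the inducing points should reproduce d_m up to the jitter.
d_back = interpolate(inducing_points, inducing_points, d_m, lam=0.2, ell=5.0)
```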

Under our inducing point interpolation scheme, we reparameterize our likelihood, Equation (2),

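$$\begin{aligned}
\mathcal{P}(r \mid D) &= \prod_{i=1}^{I} \prod_{n_i=2}^{N_i} \mathcal{N}\left(r^i_{n_i};\, r^i_{n_i-1},\, 2\Delta t\, D(r^i_{n_i-1})\, I_2\right) \\
&= \prod_{i=1}^{I} \prod_{n_i=2}^{N_i} \mathcal{N}\left(r^i_{n_i};\, r^i_{n_i-1},\, 2\Delta t\, D^i_{n_i-1}\, I_2\right) \\
&= \prod_{i=1}^{I} \prod_{n_i=2}^{N_i} \mathcal{N}\left(r^i_{n_i};\, r^i_{n_i-1},\, 2\Delta t \left(K_{r^i_{n_i-1} m} K_{mm}^{-1} D_m\right) I_2\right).
\end{aligned} \tag{10}$$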

This setup yields the following posterior distribution (up to a normalization constant) over $D_m$ given the data, $r$,

$$\mathcal{P}(D_m \mid r) \propto \mathcal{P}(D_m)\, \mathcal{P}(r \mid D_m) = \mathcal{N}(D_m;\, \mu_{\mathrm{MLE}}, K_{mm}) \times \prod_{i=1}^{I} \prod_{n_i=2}^{N_i} \mathcal{N}\left(r^i_{n_i};\, r^i_{n_i-1},\, 2\Delta t \left(K_{r^i_{n_i-1} m} K_{mm}^{-1} D_m\right) I_2\right). \tag{11}$$

By maximizing this posterior with respect to $D_m$, we obtain the most probable diffusion coefficient map explaining the observed data $r$, while sampling from the posterior provides information on uncertainty. Because direct sampling from this posterior is challenging due to the (non-conjugate) mathematical form of the prior and likelihood, we rely on MCMC sampling [18], specifically by constructing a Metropolis-within-Gibbs scheme.
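For concreteness, the unnormalized logarithm of the posterior in Equation (11) can be sketched as below. This is an illustration under stated assumptions rather than the implementation used here: the covariance matrices are passed in precomputed, the function and argument names are hypothetical, and returning negative infinity for non-positive interpolated diffusion coefficients is a choice made for this sketch.

```python
import numpy as np

def log_posterior(d_m, mu_mle, k_mm, k_rm, sq_disp, dt):
    """Unnormalised log of Equation (11).

    k_rm: covariance between each step's starting position and the inducing points.
    sq_disp: squared length of each step, in the same order as the rows of k_rm.
    """
    d_r = k_rm @ np.linalg.solve(k_mm, d_m)           # Equation (8)
    if np.any(d_r <= 0.0):
        return -np.inf                                # assumption: no valid Gaussian for D <= 0
    var = 2.0 * dt * d_r
    log_like = np.sum(-np.log(2.0 * np.pi * var) - sq_disp / (2.0 * var))
    resid = d_m - mu_mle
    log_prior = -0.5 * resid @ np.linalg.solve(k_mm, resid)   # Gaussian prior, constants dropped
    return log_like + log_prior

# Toy case: two uncorrelated inducing points that coincide with the two step origins,
# so that the interpolated values equal d_m and the result can be checked by hand.
k_mm = np.eye(2)
k_rm = np.eye(2)
sq_disp = np.array([1.0, 1.0])
value = log_posterior(np.array([0.5, 0.5]), 0.5, k_mm, k_rm, sq_disp, dt=1.0)
invalid = log_posterior(np.array([-1.0, 0.5]), 0.5, k_mm, k_rm, sq_disp, dt=1.0)
```

In the toy case the prior term vanishes and each step contributes $-\log(2\pi) - 1/2$.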

2.1.3. Algorithm

Metropolis-within-Gibbs sampling requires a proposal distribution for generating samples of $D_m$. A straightforward approach is to propose new $D_m$’s, either element-wise or as whole surfaces, based on the previous value of $D_m$ [23]. However, such a naive approach is problematic because the prior, from Equation (5), favors small deviations from the mean. This necessitates proposals amounting to small variations away from $D_m$, leading to extended convergence times with no guarantee of avoiding local optima traps.

To address this challenge, we introduce a reparametrization that allows us to propose new values for $D_m$ in a more efficient manner. We linearly transform the array of inducing points, using the inverse auto-covariance matrix, into a space that respects the covariance of the prior, denoted as $A_m = K_{mm}^{-1} D_m$. In this transformed space, we can make proposals across the entire map whose smoothness matches that of the prior; such proposals are less penalized by the prior, enabling more substantial moves.
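As a brief illustration of why proposals in the transformed space are smooth, assume only the definition of $A_m$ and the interpolation of Equation (8). Writing $e_k$ for the $k$th unit vector and $x_k$ for the location of the $k$th inducing point (notation introduced here for the illustration), a change $\delta$ to a single component of $A_m$ gives

$$D_m = K_{mm} A_m, \qquad D_r = K_{rm} K_{mm}^{-1} D_m = K_{rm} A_m,$$

$$A_m \to A_m + \delta\, e_k \quad\Longrightarrow\quad D_m \to D_m + \delta\, K_{mm} e_k, \qquad D(x) \to D(x) + \delta\, k(x, x_k).$$

Each single-component move therefore adds a kernel-shaped bump centered at $x_k$ to the whole map, with the same length scale $\ell$ as the prior, and the interpolation to the data points reduces to the single product $K_{rm} A_m$.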

To expedite convergence, we initialize the Metropolis-Hastings algorithm at a highly probable sample deterministically identified. In our specific case, a very effective initialization method involves performing MLE at the data points with interpolation to the inducing points using RBF interpolation. For this, we use the same RBF as our covariance matrix Equation (7).
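The initialization can be sketched as a kernel-weighted average of per-step estimates, following the formula given in the Initialization step of the algorithm. In this sketch the per-step MLE is taken to be the squared step length divided by $4\Delta t$, which is our reading of "MLE at the data points"; the function names and toy values are hypothetical.

```python
import numpy as np

def rbf_kernel(x_a, x_b, lam, ell):
    """Equation (7) between every row of x_a (n, 2) and every row of x_b (m, 2)."""
    sq_dist = np.sum((x_a[:, None, :] - x_b[None, :, :]) ** 2, axis=-1)
    return lam ** 2 * np.exp(-0.5 * sq_dist / ell ** 2)

def initial_map(starts, sq_disp, inducing_points, dt, lam, ell):
    """Kernel-weighted average of per-step estimates at each inducing point."""
    mu_r = sq_disp / (4.0 * dt)                       # assumed per-step MLE of D in 2D
    k_mr = rbf_kernel(inducing_points, starts, lam, ell)
    # Elementwise (Hadamard) division of the weighted sum by the sum of weights.
    return (k_mr @ mu_r) / (k_mr @ np.ones(len(starts)))

starts = np.array([[1.0, 1.0], [4.0, 2.0], [2.0, 5.0]])
sq_disp = np.full(3, 4.0 * 0.03 * 0.3)                # every step consistent with D = 0.3
inducing_points = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0], [5.0, 5.0]])
d_m0 = initial_map(starts, sq_disp, inducing_points, dt=0.03, lam=0.6, ell=2.0)
```

Because every per-step estimate in the toy data equals 0.3, the weighted average returns 0.3 at every inducing point. An inducing point much farther than $\ell$ from all data would have vanishing weights, so the grid should stay close to the data in such a sketch.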

The last challenge with a high dimensional posterior is avoiding local posterior maxima while the algorithm iteratively converges. We tackle this in two ways. The first is tempered sampling [24], which allows us to transform our posterior into an augmented space where we can control the behavior of the sampler. Mathematically we write

$$\Pi(D_m, r, T) = \mathcal{P}(D_m \mid r)^{1/T}, \tag{12}$$

where the temperature ($T$) of the sampler dictates how heavily the sampler is penalized for moving to areas of low probability. Sampling at higher temperatures results in a lower penalty and vice versa, while a temperature of 1 recovers our standard Monte Carlo scheme. For our specific case, we begin with a temperature of 10 and decay exponentially to 1.
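A short sketch of the tempering step follows. The exact decay schedule is not specified beyond "exponential", so the geometric interpolation below is an assumption, as are the function names. The proposal-ratio argument allows for an asymmetric proposal and is not tempered, since only the target of Equation (12) is raised to the power $1/T$.

```python
import math

def temperature(iteration, n_iterations, t_start=10.0, t_end=1.0):
    """Exponential decay from t_start at the first iteration to t_end at the last."""
    frac = iteration / (n_iterations - 1)
    return t_start * (t_end / t_start) ** frac

def tempered_log_accept(log_post_new, log_post_old, temp, log_proposal_ratio=0.0):
    """Log Metropolis-Hastings ratio for the tempered target of Equation (12)."""
    return (log_post_new - log_post_old) / temp + log_proposal_ratio

t_first = temperature(0, 100)
t_last = temperature(99, 100)
# A move that lowers the log posterior by 5 is penalised ten times less at T = 10.
hot = tempered_log_accept(-5.0, 0.0, temp=10.0)
cold = tempered_log_accept(-5.0, 0.0, temp=1.0)
```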

The second way we avoid local maxima is by stochastically iterating through the components of $A_m$ when making proposals. This means that instead of proposing new values of the components in order every time, we iterate through the components of $A_m$ randomly. This prevents proposals in any one area of the surface from dominating the accepted samples.

With these modifications, we outline the algorithm as follows.

Initialization:

  • We first define our hyper-parameters, $\mu_{\mathrm{MLE}}$, $\lambda$, and $\ell$, as outlined in Equation (6) and Equation (7).

  • Compute the initial sample deterministically using the MLE at each point and performing a kernel-based interpolation, using Equation (7) as the kernel, to the inducing points: $D_m^0 = (K_{mr}\mu_r) \oslash (K_{mr}\mathbf{1}_r)$. Here $\oslash$ is Hadamard division (elementwise division).

  • Transform the initial sample into inverse space: $A_m^0 = K_{mm}^{-1} D_m^0$.

Monte Carlo:

  • For many iterations, perform the following steps: Begin with the desired sampling temperature, 10 in our case, and allow it to decay to 1 over the iterations. During the decay, generate samples of $D_m$ using $D_m^{j+1} = K_{mm} A_m^{j+1}$, where $A_m^{j+1}$ is obtained through stochastic iterations over $k$, the inducing points in inverse space:
    1. Propose new values $A_k^{j+1} \sim \mathcal{N}(A_k^j, (A_k^j)^2 \epsilon_k)$, where $\epsilon_k$ is a constant for adjusting the proposal magnitude to maintain a desirable acceptance rate (about 25%). $\epsilon_k$ is automatically updated after every full sweep through $k$.
    2. Accept or reject the proposal based on the tempered Metropolis-Hastings acceptance ratio while adjusting for the current temperature of the sampler. One full loop through all values of $k$ is considered the new sample $A_m^{j+1}$.

Post-Processing:

  • After the sampler has run and converged, select the sample with the highest probability to obtain the MAP estimate.

This algorithm efficiently explores the high-dimensional posterior over the diffusion map and also estimates the most probable diffusion coefficient map (the MAP estimate) given the observed data.
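The steps above can be sketched in a few dozen lines. The code below is a simplified illustration on toy data, not the released implementation: it uses a fixed proposal scale instead of the automatic update of $\epsilon_k$, assumes a particular exponential temperature schedule, rejects maps with non-positive interpolated diffusion coefficients, and all function names and toy values are hypothetical.

```python
import numpy as np

def rbf_kernel(x_a, x_b, lam, ell):
    sq_dist = np.sum((x_a[:, None, :] - x_b[None, :, :]) ** 2, axis=-1)
    return lam ** 2 * np.exp(-0.5 * sq_dist / ell ** 2)

def log_posterior(a_m, k_mm, k_rm, mu, sq_disp, dt):
    """Unnormalised log of Equation (11), written in terms of A_m = K_mm^{-1} D_m."""
    d_r = k_rm @ a_m                                   # Equation (8) with D_m = K_mm A_m
    if np.any(d_r <= 0.0):
        return -np.inf                                 # assumption: reject non-positive D
    var = 2.0 * dt * d_r
    log_like = np.sum(-np.log(2.0 * np.pi * var) - sq_disp / (2.0 * var))
    resid = k_mm @ a_m - mu                            # D_m minus the prior mean
    return log_like - 0.5 * resid @ np.linalg.solve(k_mm, resid)

def log_q(new, old, eps):
    """Log density, up to a constant, of proposing `new` from N(old, old^2 * eps)."""
    return -np.log(abs(old)) - (new - old) ** 2 / (2.0 * old ** 2 * eps)

def run_sampler(a_init, k_mm, k_rm, mu, sq_disp, dt, n_iter=100, eps=0.01, seed=0):
    rng = np.random.default_rng(seed)
    a = a_init.copy()
    lp = log_posterior(a, k_mm, k_rm, mu, sq_disp, dt)
    best_a, best_lp = a.copy(), lp
    for j in range(n_iter):
        temp = 10.0 ** (1.0 - j / (n_iter - 1))        # exponential decay from 10 to 1
        for k in rng.permutation(len(a)):              # random order over components of A_m
            if a[k] == 0.0:
                continue                               # proposal width would be zero
            prop = a.copy()
            prop[k] = rng.normal(a[k], abs(a[k]) * np.sqrt(eps))
            if prop[k] == 0.0:
                continue                               # reverse proposal density undefined
            lp_prop = log_posterior(prop, k_mm, k_rm, mu, sq_disp, dt)
            # Tempered posterior ratio plus the correction for the asymmetric proposal.
            log_ratio = ((lp_prop - lp) / temp
                         + log_q(a[k], prop[k], eps) - log_q(prop[k], a[k], eps))
            if np.log(rng.uniform()) < log_ratio:
                a, lp = prop, lp_prop
                if lp > best_lp:
                    best_a, best_lp = a.copy(), lp
    return k_mm @ best_a, best_lp                      # highest-posterior D_m visited

# Toy data: 400 independent steps drawn from a flat true map D = 0.5.
rng = np.random.default_rng(1)
dt, d_true, n_steps = 1.0, 0.5, 400
starts = rng.uniform(0.0, 10.0, size=(n_steps, 2))
steps = rng.normal(0.0, np.sqrt(2.0 * dt * d_true), size=(n_steps, 2))
sq_disp = np.sum(steps ** 2, axis=1)
mu = sq_disp.mean() / (4.0 * dt)                       # Equation (6) for this toy data
grid = np.linspace(0.0, 10.0, 4)
inducing = np.array([[x, y] for x in grid for y in grid])
lam, ell = 2.0 * mu, 2.0
k_mm = rbf_kernel(inducing, inducing, lam, ell) + 1e-8 * np.eye(len(inducing))
k_rm = rbf_kernel(starts, inducing, lam, ell)
a_init = np.linalg.solve(k_mm, np.full(len(inducing), mu))   # flat start at the MLE
lp_init = log_posterior(a_init, k_mm, k_rm, mu, sq_disp, dt)
d_map, lp_map = run_sampler(a_init, k_mm, k_rm, mu, sq_disp, dt)
```

For simplicity the sketch starts from a flat map at the global MLE rather than from the kernel-based initialization, and it tracks the highest-posterior sample visited as a stand-in for the MAP estimate.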

2.2. Data Methods

2.2.1. Generating and Benchmarking Synthetic Data

To benchmark our inference method and assess its performance across various scenarios, we initially rely on the use of synthetic data. Synthetic data provides a controlled environment for rigorous evaluation, allowing us to precisely control the properties of the data and establish known ground truth diffusion coefficients. By comparing the algorithm’s estimated diffusion coefficient maps with these known ground truths, we validate its effectiveness and identify potential limitations. Synthetic data are generated by specifying an arbitrary diffusion coefficient map surface, and then simulating trajectories by directly sampling from the forward model, specifically Equation (2).
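Sampling from the forward model can be sketched as follows: each new position is drawn from a Gaussian centered on the previous position, with variance set by the ground-truth map at that position, as in Equation (1). The wave-shaped map, the numbers of tracks and frames, the time step, and the function names are placeholders chosen for illustration, and the sketch does not confine particles to the field of view.

```python
import numpy as np

def simulate_track(start, n_frames, diffusion_map, dt, rng):
    """Draw one track frame by frame from the transition probability of Equation (1)."""
    track = np.empty((n_frames, 2))
    track[0] = start
    for n in range(1, n_frames):
        d_here = diffusion_map(track[n - 1])           # D evaluated at the previous position
        track[n] = track[n - 1] + rng.normal(0.0, np.sqrt(2.0 * dt * d_here), size=2)
    return track

def wave_map(pos, d0=0.1, amp=0.05, period=10.0):
    """Hypothetical ground-truth map: a wave along x that stays strictly positive."""
    return d0 + amp * np.sin(2.0 * np.pi * pos[0] / period)

rng = np.random.default_rng(0)
tracks = [
    simulate_track(rng.uniform(0.0, 25.0, size=2),     # random start in a 25 x 25 square
                   rng.integers(10, 30),               # random track length, 10 to 29 frames
                   wave_map, dt=0.03, rng=rng)
    for _ in range(50)
]
n_localizations = sum(len(t) for t in tracks)
```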

In our evaluation, we introduce specific challenges to test the algorithm’s capabilities:

  1. Reducing the amount of data. To evaluate how well the algorithm performs under conditions of limited data availability, we systematically reduce the amount of data in our synthetic scenarios. This allows us to benchmark the method’s accuracy with decreasing data. More concretely, we simulate more challenging data scenarios by reducing the number of particles or randomly dropping positions from particle trajectories. This helps us replicate situations where certain regions have few measurements or the particle is lost over tracking, say, due to phenomena such as photobleaching. Representative results are shown in Figures 2a, 2b, 2c, while additional full analyses are available in Figure S1;

  2. Increasing the complexity of the diffusion coefficient maps used in generating the synthetic data. That is, to assess the algorithm’s performance in the presence of important spatial changes in diffusion coefficients, we create synthetic data scenarios with rapidly spatially varying diffusion coefficients. For instance, we generate diffusion coefficient maps as wave patterns with progressively decreasing wave periods in each dataset to test the method’s robustness. By generating synthetic data under these conditions, we gain valuable insights into the algorithm’s behavior, ensuring its robustness and adaptability across a range of practical situations. Representative results are shown in Figures 2g, 2h, 2i, while additional full analyses are available in Figure S1.

Figure 2: Learning diffusion coefficient maps from synthetic data.

Each row here represents an analysis of a unique synthetic data set. The first column of each row shows the true diffusion coefficient map used in synthetic data simulation and the second column plots the diffusion coefficient map inferred by the algorithm, with the synthetic data plotted in green below the surface. As can be seen in Figures 2b, 2e, and 2h, we progressively increase the number of localizations: $5 \times 10^3$, $10^5$, and $2 \times 10^5$, respectively. The third column plots the relative error between the Ground Truth Diffusion Map and the Inferred Diffusion Map as a function of space, computed according to Equation (13).

Once we have generated and analyzed such synthetic datasets with our algorithm, we benchmark the results with a calculation of spatial accuracy using:

$$\text{Relative Error}\,(\%) = \frac{\left|\text{Ground Truth Map} - \text{Inferred Map}\right|}{\text{Ground Truth Map}} \times 100\%. \tag{13}$$
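Evaluated on a common grid, Equation (13) is an elementwise operation. The sketch below takes the absolute difference, which is an assumption about the intended form, and uses made-up map values; the function name is hypothetical.

```python
import numpy as np

def relative_error_percent(truth, inferred):
    """Equation (13), elementwise, using the absolute difference (an assumption)."""
    return np.abs(truth - inferred) / truth * 100.0

# Made-up 2 x 2 maps evaluated on the same grid.
truth = np.array([[0.10, 0.20], [0.10, 0.40]])
inferred = np.array([[0.105, 0.19], [0.10, 0.30]])
err = relative_error_percent(truth, inferred)          # 5%, 5%, 0%, 25%
within_threshold = err < 10.0                          # the 10% accuracy standard used in the Results
```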

2.2.2. Experimental Methods

For the live-cell single-molecule imaging experiments, CHO cell lines stably expressing dendritic cell-specific intercellular adhesion molecule-3-grabbing nonintegrin (DC-SIGN) wild-type and DC-SIGN N80A mutant, established by Lipofectamine 2000 (Invitrogen) transfection, were cultured in Ham’s F-12 medium (LabClinics), supplemented with 10% heat-inactivated FBS (Invitrogen), 1% Antibiotic Antimycotic Solution (GE Healthcare Life Sciences), and 0.5 mg/mL of the aminoglycoside antibiotic G418 (Invitrogen). For the protein labeling, we used half-antibody fragments obtained following a protocol similar to the one used in [25]. DCN46 antibody (Pharmingen) was dialyzed overnight at 4 °C using Slide-A-Lyzer MINI Dialysis Units (Thermo Scientific) in PBS and reduced with DTT (Invitrogen) following the manufacturer’s instructions. Reduced antibodies were then biotinylated with Maleimide-PEG2-Biotin (Thermo Scientific). Nonreacted DTT and unbound biotin were removed by overnight dialysis. Biotinylated half-antibody fragments were conjugated with streptavidin-coated quantum dots QD655 and QD585 (Invitrogen). Before imaging, CHO cells were seeded onto 25-mm glass coverslips (Menzel-Gläser). Cells were incubated with quantum dot conjugates for 5 min at RT. Extensive washing with serum-free medium was performed to remove unbound conjugates. Imaging was performed using an Olympus fluorescence microscope equipped with a 1.4 NA, 100× objective. Samples were illuminated in epifluorescence geometry with the 488-nm line of an argon-ion laser (Spectra Physics), with power density at the sample plane of approximately 0.3 kW/cm². Emission light was split with appropriate dichroic mirror and filters, and collected on an intensified EM-CCD (Hamamatsu). Movies were recorded at a frame rate of 33 Hz for 10,000 frames. Detection and tracking were performed using u-track [26]. The detection (Gaussian Mixture-Model Fitting) and tracking parameters were optimized based on visual inspection and performance diagnostics of the resulting detection and tracking. All image and data analyses were performed in MATLAB (The MathWorks, Natick, MA). Videos were loaded into MATLAB using Bio-Formats [27].

3. Results

As mentioned previously, we split the discussion of our results into two sections. The first pertains to results on synthetic data used to validate the robustness of our algorithm. The second section consists of experimental data to highlight our framework’s applicability to biological data. Our standard for an accurate map will be a 10% relative error, i.e., as long as we stay below 10% we claim that we have inferred the diffusion map accurately.

3.1. Synthetic Data

Here we show the results on three unique synthetic datasets. For all synthetic datasets, we keep a consistent 25-micron square for the field of view and randomize both the exact number of trajectories and the length of each. Each dataset validates our inference algorithm and showcases an important feature of the method. The first, Figures 2a, 2b, 2c, shows our inference task on a flat diffusion surface, showing how inferring a continuous surface is possible within our chosen Bayesian paradigm using only about $5 \times 10^3$ localizations (from around 250 trajectories). In Figures 2d, 2e, 2f, we show that it is possible to learn a single perturbation upon the flat surface, verifying that the model is able to infer accurate diffusion maps using about $10^5$ localizations. Finally, we simulate data with around $2 \times 10^5$ localizations using a diffusion coefficient map consisting of a series of waves where the maximum period is 40% of the field of view. In Figures 2g, 2h, 2i, we see that we can recover the diffusion map within 10% relative error, with only a few areas just outside that threshold.

3.2. Experimental Data

Here we show results obtained for experimental data of live-cell single-molecule imaging. Dendritic cell-specific intercellular adhesion molecule-3-grabbing nonintegrin (DC-SIGN) stably expressed in CHO cells were fluorescently labeled and imaged on the dorsal membrane of living cells. Previous work [28, 29] has reported heterogeneous diffusion for both the wild-type and N80A mutant, characterized by changes in diffusion coefficient.

For all experimental data, the field of view is under or equal to a 25-micron square, and the datasets themselves contain at least $1.8 \times 10^5$ protein localizations. After running on synthetic data for parameter regimes close to this (and showing the method infers diffusion coefficient maps within 10% relative error, see Figure 2), we now turn to experimental data.

The results in Figure 3 show that we are able to recover diffusion coefficient maps from the various sets of membrane protein trajectories. The magnitude of the diffusion coefficient recovered by our method is in line with previous observations [28, 29], shown in Figure S3. As always, with experimental data, we do not have a ground truth to assess the accuracy of our inference.

Figure 3: Learning diffusion coefficient maps from experimental data.

Here we visualize the inferred diffusion coefficient map from six different experimental datasets, each from different cells. The green points at the bottom of each plot represent DC-SIGN wt (a-c) and N80A (d-f) $10^5$ localizations from trajectories analyzed for each set. The surfaces plotted are the inferred surfaces from the algorithm and, as expected, they diverge towards the edge where there is no data, as we intentionally analyze a region larger than the data provided to extrapolate the diffusion coefficient map slightly beyond the data.

To address the lack of ground truth for experimental data, we performed a self-consistency check by splitting all experimental datasets in half and running the algorithm separately on each half to verify the convergence of both to a consistent diffusion coefficient map. To maintain similar spatial densities, we split the data into two subsets by extracting alternate positions from each protein trajectory. Figure 4 shows the results after subsetting the data. Here we have purposefully extrapolated beyond areas of data for two reasons: the first is for visualization convenience and the second is to analyze behavior slightly beyond the data. Allowing for the fact that GP inference on diffusion maps tends to revert to the prior at the edges with limited data [30], we see that in the remaining areas with data, the inferred maps derived from each subset stay within 10% relative error of each other, the same standard used for the synthetic data. In addition, in the SI we split the data by randomly selecting trajectories and compare the diffusion maps from both data subsets, demonstrating that the diffusion map is constant in time, within error, over the observation time (Figure S2).
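The alternate-position split can be sketched with simple slicing. The function name and the toy track below are hypothetical. Note that consecutive positions within each half are then two frames apart, so each half would presumably be analyzed with a time step of $2\Delta t$.

```python
def split_alternate(track):
    """Split one track into two interleaved sub-tracks: even-indexed and odd-indexed frames."""
    return track[0::2], track[1::2]

# Toy track of five (x, y) localizations.
track = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.1), (0.2, 0.3), (0.4, 0.3)]
even_half, odd_half = split_alternate(track)
```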

Figure 4: Self-consistency check on experimental data.

We split each experimental dataset in half and ran the algorithm on each half. Each row above represents a unique dataset. The first two columns show the inferred surface for each half, plotted with the respective data half below. The third column is the relative error between the two inferred surfaces. In areas of the membrane that have protein trajectories, the relative error between the two surfaces stays below 10%. We also highlight that the red in Figures 4c, 4i, 4o, and 4r arises specifically in areas where no data is available; there, both MAP estimates are effectively samples from the very broad prior and can therefore differ substantially.

4. Discussion

We have presented a general inference algorithm that can accurately infer continuous spatial diffusion coefficient maps from particle trajectories. Working within a Bayesian framework, we developed a posterior distribution assigning probabilities to all possible diffusion coefficient surfaces. We were able to reduce the computational burden of naive GP regression by adopting an inducing point method [17] and auxiliary variable sampling techniques. More concretely, we reduced the otherwise cubic scaling of naive GP with the data [19, 17, 18], from $\mathcal{O}\left(\left(\sum_{i=1}^{I} N_i\right)^3\right)$ to $\mathcal{O}\left(m^2 \sum_{i=1}^{I} N_i\right)$ for initialization and from $\mathcal{O}\left(\left(\sum_{i=1}^{I} N_i\right)^2\right)$ to $\mathcal{O}\left(m \sum_{i=1}^{I} N_i\right)$ per Monte Carlo iteration, substantially reducing the computational burden and enhancing the efficiency of the posterior sampling.

We analyzed the accuracy and robustness of our method through the analysis of synthetic data, with a known ground truth. The results show that our model successfully captures the spatial variation of the diffusion coefficient, with relative errors within 10%. This validates the reliability of our approach. Furthermore, we have applied our method to experimental data verifying self-consistency within 10% relative error by splitting the data into two halves and analyzing each independently.

Ultimately, moving forward, improved uncertainty quantification in the diffusion coefficient maps recovered may be achieved by using tools capable of quantifying localization uncertainty [31, 32] and propagating this uncertainty into an uncertainty over the diffusion coefficient map. However ideally, at computational cost, we ambitiously envision future work simultaneously and self-consistently learning diffusion coefficient maps while tracking. In other words, it may be possible to envision a more general framework avoiding the modular structure proposed here that first requires tracks as input and then processes these tracks to produce diffusion maps. This modular structure is acceptable for well separated tracks but may start to fail for crowded regions [31, 32] with many particles criss-crossing paths.

Such a self-consistent framework avoiding modularity may also benefit the analysis of dimmer particles to which larger localization uncertainty is associated and treat other sources of heterogeneity in the biological data. For example, it may shed quantitative insight on the role of lipid composition on protein diffusion by correlating lipid localization to diffusion maps [33, 34, 35].

Supplementary Material

Supplement 1

Figure 1: Schematic representation of our method.

Our method uses single-particle localizations forming trajectories as input and outputs a continuous surface describing the diffusion coefficient as a function of space without binning or other forms of data downsampling. On the left is a single frame from a fluorescence microscopy frame stack which has been artistically labelled in orange to represent membrane protein trajectories. These trajectories are the input into our model, which identifies the spatial diffusion coefficient map of the highest probability. This is plotted on the right, with green dots identifying the localizations of particles used in deducing the diffusion coefficient map. The linking between the localizations of each particle across each frame forms tracks which, for clarity alone, are not shown.

6. Acknowledgements

We thank Maria Garcia-Parajo for providing data collected in her lab. We acknowledge the support of the NIH (NIGMS R01GM130745, NIGMS R01GM134426, NIGMS R35GM148237). C.M. acknowledges support through grant PID2021-125386NB-I00 funded by MCIN/AEI/10.13039/501100011033/ and “ERDF A way of making Europe”.

5. Code Availability

The code for this work can be found at https://github.com/LabPresse/GPDiffusionMapping

References

  • [1] Cheng Xiaolin and Smith Jeremy C. Biological Membrane Organization and Cellular Signaling. Chemical Reviews, 119(9):5849–5880, 2019.
  • [2] Casares Doralicia, Escribá Pablo V, and Rosselló Catalina Ana. Membrane Lipid Composition: Effect on Membrane and Organelle Structure, Function and Compartmentalization and Therapeutic Avenues. International Journal of Molecular Sciences, 20(9):2167, 2019.
  • [3] Vereb György, Szöllosi János, Matkó János, Nagy Peter, Farkas Tamás, Vigh László, Mátyus László, Waldmann Thomas A., and Damjanovich Sándor. Dynamic, yet structured: The cell membrane three decades after the Singer-Nicolson model. Proceedings of the National Academy of Sciences of the United States of America, 100(14):8053–8058, 2003.
  • [4] Jacobson Ken, Liu Ping, and Lagerholm B. Christoffer. The Lateral Organization and Mobility of Plasma Membrane Components. Cell, 177(4):806–819, 2019.
  • [5] Ohkubo Tatsunari, Shiina Takaaki, Kawaguchi Kayoko, Sasaki Daisuke, Inamasu Rena, Yang Yue, Li Zhuoqi, Taninaka Keizaburo, Sakaguchi Masaki, Fujimura Shoko, Sekiguchi Hiroshi, Kuramochi Masahiro, Arai Tatsuya, Tsuda Sakae, Sasaki Yuji C., and Mio Kazuhiro. Visualizing Intramolecular Dynamics of Membrane Proteins. International Journal of Molecular Sciences, 23(23):14539, 2022.
  • [6] Mahtarin Rumana, Islam Shafiqul, Islam Md. Jahirul, Ullah M Obayed, Ali Md Ackas, and Halim Mohammad A. Structure and dynamics of membrane protein in SARS-CoV-2. Journal of Biomolecular Structure and Dynamics, 40(10):4725–4738, 2022.
  • [7] Tsekouras Konstantinos, Siegel Amanda P., Day Richard N., and Pressé Steve. Inferring Diffusion Dynamics from FCS in Heterogeneous Nuclear Environments. Biophysical Journal, 109(1):7–17, 2015.
  • [8] Masson Jean-Baptiste, Dionne Patrice, Salvatico Charlotte, Renner Marianne, Specht Christian G, Triller Antoine, and Dahan Maxime. Mapping the energy and diffusion landscapes of membrane proteins at the cell surface using high-density single-molecule imaging and Bayesian inference: application to the multiscale dynamics of glycine receptors in the neuronal membrane. Biophysical Journal, 106(1):74–83, 2014.
  • [9] Ranjit Suman, Gratton Enrico, and Lanzano Luca. Mapping Diffusion in a Living Cell using the Phasor Approach. Biophysical Journal, 107(12):2775–2785, 2014.
  • [10] Ernst Dominique and Köhler Jürgen. Measuring a diffusion coefficient by single-particle tracking: statistical analysis of experimental mean squared displacement curves. Physical Chemistry Chemical Physics, 15(3):845–849, 2013.
  • [11] Toppozini Laura, Garcia-Sakai Victoria, Bewley Robert, Dalgliesh Robert, Perring Toby, and Rheinstädter Maikel C. Diffusion in membranes: Toward a two-dimensional diffusion map. EPJ Web of Conferences, 83:02019, 2015.
  • [12] Lee Yerim, Phelps Carey, Huang Tao, Mostofian Barmak, Wu Lei, Zhang Ying, Tao Kai, Chang Young Hwan, Stork Philip JS, Gray Joe W, et al. High-throughput, single-particle tracking reveals nested membrane domains that dictate KRasG12D diffusion and trafficking. eLife, 8:e46393, 2019.
  • [13] Türkcan Silvan, Alexandrou Antigoni, and Masson Jean-Baptiste. A Bayesian Inference Scheme to Extract Diffusivity and Potential Fields from Confined Single-Molecule Trajectories. Biophysical Journal, 102(10):2288–2298, 2012.
  • [14] Pineda Jesús, Midtvedt Benjamin, Bachimanchi Harshith, Noé Sergio, Midtvedt Daniel, Volpe Giovanni, and Manzo Carlo. Geometric deep learning reveals the spatiotemporal features of microscopic motion. Nature Machine Intelligence, 5(1):71–82, 2023.
  • [15] El Beheiry Mohamed, Dahan Maxime, and Masson Jean-Baptiste. InferenceMAP: mapping of single-molecule dynamics with Bayesian inference. Nature Methods, 12(7):594–595, 2015.
  • [16] Rohrdanz Mary A, Zheng Wenwei, Maggioni Mauro, and Clementi Cecilia. Determination of reaction coordinates via locally scaled diffusion map. The Journal of Chemical Physics, 134(12):03B624, 2011.
  • [17] Wilson Andrew Gordon and Nickisch Hannes. Kernel Interpolation for Scalable Structured Gaussian Processes (KISS-GP). CoRR, abs/1503.01057, 2015.
  • [18] Pressé Steve and Sgouralis Ioannis. Data Modeling for the Sciences: Applications, Basics, Computations. Cambridge University Press, 2023.
  • [19] Rasmussen Carl Edward and Williams Christopher K. I. Gaussian Processes for Machine Learning. The MIT Press, 2005.
  • [20] Zwanzig Robert. Nonequilibrium statistical mechanics. Oxford University Press, 2001.
  • [21] Sato Issei and Nakagawa Hiroshi. Approximation analysis of stochastic gradient Langevin dynamics by using Fokker-Planck equation and Ito process. In International Conference on Machine Learning, pages 982–990. PMLR, 2014.
  • [22] Young M. E., Carroad P. A., and Bell R. L. Estimation of diffusion coefficients of proteins. Biotechnology and Bioengineering, 22(5):947–955, 1980.
  • [23] Titsias Michalis K, Lawrence Neil, and Rattray Magnus. Markov chain Monte Carlo algorithms for Gaussian processes. Inference and Estimation in Probabilistic Time-Series Models, 9:298, 2008.
  • [24] Sambridge Malcolm. A Parallel Tempering algorithm for probabilistic sampling and multimodal optimization. Geophysical Journal International, 196(1):357–374, 2013.
  • [25] Low-Nam Shalini T, Lidke Keith A, Cutler Patrick J, Roovers Rob C, van Bergen en Henegouwen Paul MP, Wilson Bridget S, and Lidke Diane S. ErbB1 dimerization is promoted by domain co-confinement and stabilized by ligand binding. Nature Structural & Molecular Biology, 18(11):1244–1249, 2011.
  • [26] Jaqaman Khuloud, Loerke Dinah, Mettlen Marcel, Kuwata Hirotaka, Grinstein Sergio, Schmid Sandra L, and Danuser Gaudenz. Robust single-particle tracking in live-cell time-lapse sequences. Nature Methods, 5(8):695–702, 2008.
  • [27] Linkert Melissa, Rueden Curtis T, Allan Chris, Burel Jean-Marie, Moore Will, Patterson Andrew, Loranger Brian, Moore Josh, Neves Carlos, MacDonald Donald, et al. Metadata matters: access to image data in the real world. Journal of Cell Biology, 189(5):777–782, 2010.
  • [28] Torreno-Pina Juan A, Castro Bruno M, Manzo Carlo, Buschow Sonja I, Cambi Alessandra, and Garcia-Parajo Maria F. Enhanced receptor–clathrin interactions induced by N-glycan–mediated membrane micropatterning. Proceedings of the National Academy of Sciences, 111(30):11037–11042, 2014.
  • [29] Manzo Carlo, Torreno-Pina Juan A, Massignan Pietro, Lapeyre Gerald J Jr, Lewenstein Maciej, and Garcia-Parajo Maria F. Weak ergodicity breaking of receptor motion in living cells stemming from random diffusivity. Physical Review X, 5(1):011021, 2015.
  • [30] Bryan J. Shepard IV, Sgouralis Ioannis, and Pressé Steve. Inferring effective forces for Langevin dynamics using Gaussian processes. The Journal of Chemical Physics, 152(12):124106, 2020.
  • [31] Sgouralis Ioannis, Xu Lance W.Q, Jalihal Ameya P, Walter Nils G, and Pressé Steve. BNP-Track: A framework for superresolved tracking. bioRxiv, 2023.
  • [32] Jazani Sina, Sgouralis Ioannis, and Pressé Steve. A method for single molecule tracking using a conventional single-focus confocal setup. The Journal of Chemical Physics, 150(11):114108, 2019.
  • [33] Gaus Katharina, Gratton Enrico, Kable Eleanor P. W., Jones Allan S., Gelissen Ingrid, Kritharides Leonard, and Jessup Wendy. Visualizing lipid structure and raft domains in living cells with two-photon microscopy. Proceedings of the National Academy of Sciences, 100(26):15554–15559, 2003.
  • [34] Sjövall Peter, Lausmaa Jukka, Nygren Håkan, Carlsson Lennart, and Malmberg Per. Imaging of membrane lipids in single cells by imprint-imaging time-of-flight secondary ion mass spectrometry. Analytical Chemistry, 75(14):3429–3434, 2003.
  • [35] Ingólfsson Helgi I., Melo Manuel N., van Eerden Floris J, Arnarez Clément, Lopez Cesar A, Wassenaar Tsjerk A, Periole Xavier, de Vries Alex H., Tieleman D. Peter, and Marrink Siewert J. Lipid organization of the plasma membrane. Journal of the American Chemical Society, 136(41):14554–14559, 2014.

Articles from bioRxiv are provided here courtesy of Cold Spring Harbor Laboratory Preprints
