Abstract
Deep neural networks have led to significant advances in microscopy image generation and analysis. In single-molecule localization-based super-resolution microscopy, neural networks can predict fluorophore positions from high-density emitter data, thereby reducing acquisition time and increasing imaging throughput. However, neural network-based solutions in localization microscopy require intensive human intervention and often compromise between model performance and generalization. Researchers must manually tune simulated training data parameters to resemble their experimental data; thus, for every change in the experimental conditions, a new training set must be tuned and a new model trained. Here, we introduce AutoDS and AutoDS3D, two software packages for super-resolution reconstruction of single-molecule localization microscopy data based on Deep-STORM and DeepSTORM3D. Our methods significantly reduce human intervention in the analysis process by automatically extracting the experimental parameters from the raw imaging data. In the 2D case, AutoDS selects the optimal model for the analysis from a set of pre-trained models, completely removing user supervision from the process. In the 3D case, we improve the computational efficiency of DeepSTORM3D and integrate the lengthy workflow into a graphical user interface that enables image reconstruction with a single click. Ultimately, we demonstrate comparable or superior performance of both methods relative to Deep-STORM, DeepSTORM3D, and other state-of-the-art methods, while significantly reducing manual labor and computation time.
Subject terms: Super-resolution microscopy, Image processing
Introduction
Super-resolution microscopy enhances the resolving capability of optical imaging to interrogate length scales below the diffraction limit of light. In single-molecule localization microscopy (SMLM) methods, such as direct stochastic optical reconstruction microscopy ((d)STORM)1,2, DNA-points accumulation for imaging in nanoscale topography (DNA-PAINT)3, and (fluorescence) photoactivated localization microscopy ((f)PALM)4,5, the localization of non-overlapping fluorophore signals yields an improvement in resolution of an order of magnitude or more6. Furthermore, SMLM can be extended to 3D imaging by employing point spread function (PSF) engineering7,8, where the optical system is altered to encode the axial position of each fluorophore in the PSF shape. Since SMLM relies on accumulating isolated emitter signals to generate a super-resolved image, these methods naturally entail long imaging times. Consequently, the throughput of SMLM methods is limited, and live-cell imaging is restricted to slow biological dynamics.
The integration of neural networks (NNs) into SMLM reduces the acquisition time by enabling the prediction of super-resolved images from high-emitter-density imaging data9–14. Typically, these networks are trained on images that closely resemble the experimental data; the level of similarity between the training data and the experimental data strongly affects performance. One strategy for preparing such training data is to simulate a large and diverse dataset10–12, which depends mainly on selecting optimal simulation parameters, such as noise levels and emitter density, as well as, in the 3D case, the encoding PSF15. Alternatively, the application of exchangeable fluorophore labels16 in an SMLM experiment allows one to generate experimental data for model training, as demonstrated using Deep-STORM in combination with DNA-PAINT17 and live-cell HaloTag-PAINT18.
All deep learning-based super-resolution reconstruction methods face a common challenge: dealing with heterogeneous data, i.e., varying emitter densities, signal-to-noise ratio (SNR) conditions, and background levels within a single dataset, or even within a single frame; the performance of a single trained model applied across a heterogeneous dataset is limited. This limitation is particularly acute for image restoration packages that are sensitive to variations in noise profiles, where model retraining is required even for relatively minor changes in acquisition parameters19. This bottleneck introduces a notable challenge in balancing the generalizability of the network across different sample conditions against its overall performance, also known as the bias-variance trade-off. Interestingly, apart from increasing the training data variability for single models10, no existing learning-based localization algorithm has adequately addressed the issue of dataset heterogeneity.
In the context of 3D SMLM by PSF engineering, although the heterogeneity issue exists there as well20,21, a more limiting problem is that the whole analysis pipeline, from pre-processing to post-processing, demands extensive manual labor. The process typically includes accurate PSF characterization, image pre-processing, training data generation, network training, inference, and post-processing, involving multiple programming languages and software packages. Furthermore, many parameters in this process must be properly and manually tuned to guarantee the model's performance, which requires user expertise. Overall, the process is time-consuming and challenging for end users.
In this work, we demonstrate several contributions to the automation of deep learning-based 2D and 3D localization microscopy. In the 2D case, (1) we develop AutoDS, a fully automated algorithm for experimental parameter extraction that alleviates the need for manual tuning of SMLM training data; (2) we provide a set of pre-trained Deep-STORM models that covers a wide range of experimental conditions and completely eliminates the need for user intervention during inference; (3) we show that feeding Deep-STORM with patches of the field-of-view and analyzing them separately improves performance compared to processing entire frames. Our results show significant improvement in predicting super-resolution images from heterogeneous, high-density SMLM datasets of complex biological samples compared to using a single, frame-wide model. (4) In the 3D case, we automate the standard image processing pipeline of PSF engineering-based SMLM and develop AutoDS3D, a one-click graphical user interface (GUI) built on the DeepSTORM3D (DS3D) framework11. AutoDS3D achieves reconstruction quality comparable to DeepSTORM3D across various PSF types while operating faster and fully automatically. Overall, these contributions extend the previous Deep-STORM framework; hence, we refer to our methods as AutoDS and AutoDS3D.
Notably, our contribution extends beyond algorithm performance and includes the capability of using AutoDS/3D as a first-of-its-kind one-click deep-learning-based plug-and-play tool for non-expert users, significantly removing human decision-making from the loop, and in the case of 2D imaging, completely alleviating the need for post-experiment model training. This contributes to the democratization of state-of-the-art deep learning tools by significantly reducing the human-supervised model-training step and improving prediction quality.
Results
AutoDS
Our analysis scheme starts with the division of each input video into multiple patches (Fig. 1). The choice of patch size is important: undersized patches lead to suboptimal decisions based on local properties, e.g., zero estimated emitter density in the absence of emitters in the patch, or very high estimated density when only a single emitter is present. On the other hand, the use of oversized patches ultimately converges to global parameter-based network selection. Another important trade-off governed by the patch size is between runtime and the granularity of model selection: although smaller patches are processed faster than large patches by convolutional NNs, decreasing the patch size leads to more forward passes through the model and, eventually, longer runtimes. In this study, we selected a patch length scale in the range of [6.5, 14.9] μm, which we found to be a good compromise among the aforementioned considerations. Nevertheless, our analysis showed that the pipeline is robust to patch lengths larger than 2.5 μm (see Fig. S1).
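For illustration, a minimal sketch of the patch-division step is shown below, assuming the raw data is a NumPy array; the function name `split_into_patches` and the parameter values are illustrative and not part of the AutoDS API.

```python
import numpy as np

def split_into_patches(video, patch_px):
    """Split an SMLM video of shape (frames, H, W) into non-overlapping
    square patches, keyed by their (row, col) grid position so that the
    patch-wise predictions can later be reassembled into the full FOV."""
    _, h, w = video.shape
    patches = {}
    for i in range(0, h - patch_px + 1, patch_px):
        for j in range(0, w - patch_px + 1, patch_px):
            patches[(i // patch_px, j // patch_px)] = \
                video[:, i:i + patch_px, j:j + patch_px]
    return patches

# Example: a 512 x 512 px FOV with ~110 nm pixels split into 64 x 64 px
# patches gives a ~7 um patch length, inside the [6.5, 14.9] um range used
# in this work. The zero array is a placeholder standing in for real data.
video = np.zeros((400, 512, 512), dtype=np.float32)
patches = split_into_patches(video, patch_px=64)
print(len(patches))  # 64 patches
```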
Fig. 1. AutoDS workflow.
a Dividing the experimental single-molecule localization microscopy (SMLM) video into multiple patches; b extracting the emitter density and the signal-to-noise ratio (SNR) parameters from each patch, and comparing the estimated patch parameters to the parameters used for the pre-trained models; c choosing the best fitting model to analyze each patch (green box marks the chosen model); d analyzing the patches using the selected pre-trained Deep-STORM model and generating the localization maps; e reassembling the patches according to their original position in the field of view to obtain a super-resolution reconstruction. Scale bars are a, b 800 nm and e 5 µm.
Crucially, we have made AutoDS independent of the wavelength, numerical aperture (NA), and camera pixel size by allowing users to specify these parameters during model training and during inference. This is achieved by modifying the experimental images to match the simulated parameters, which alleviates the need to retrain new models for different optical systems (see image interpolation in the Methods section) and allows us to use pre-trained models by automatically down/up-scaling the experimental images during inference. Notably, this interpolation is optional and can be turned off when not helpful.
Subsequently, we employ an automated parameter-extraction algorithm to classify the patches based on emitter density and the mean and standard deviation of the following properties (see parameter extraction and model selection in the methods section): background, noise, PSF photon intensity, and PSF dimensions (see Figs. S2, S3 for visualization of the estimation performance of the model). Based on these classifications, the appropriate model from a set of four pre-trained Deep-STORM models is chosen for processing each patch. Finally, each patch is analyzed by the corresponding model, and the resulting localization maps are integrated into the full field-of-view (FOV). The whole pipeline requires no decision-making by the user, making it user-friendly and suitable for non-experts.
In our first experiment, we applied AutoDS to a DNA-PAINT dataset of a microtubule sample recorded for 30,000 frames. This dataset did not exhibit high heterogeneity in emitter density or SNR; hence, as expected, the reconstruction quality of Deep-STORM and AutoDS is similar (Fig. 2).
Fig. 2. Reconstruction quality of AutoDS.
Comparison between a Deep-STORM and b AutoDS reconstructions of low-density SMLM data of a cell stained for microtubules. The Deep-STORM model was trained specifically for this dataset. Scale bars = 4 μm for the large field of view and 1.5 μm for the region of interest (ROI) marked by the yellow rectangle. c Two-color AutoDS reconstruction of a rat brain tissue section labeled for alpha-tubulin (red) and TOM20 (blue) and imaged with DNA-PAINT. Example reconstructed ROIs for Deep-STORM (left), AutoDS (middle), and low-density ground truth (right) for each target separately (d mitochondria, e tubulin). f The mean peak signal-to-noise ratio (PSNR), normalized root-mean-squared error (NRMSE), and multi-scale structural similarity index measure (MS-SSIM) (error bars = standard deviation) across n = 5 different experiments per target obtained by Deep-STORM (DS) and AutoDS.
Next, we applied AutoDS to high-density DNA-PAINT data recorded from rat brain tissue labeled for two targets, TOM20 and α-tubulin17 (Fig. 2c). For each target, the imager strand concentration was tuned to obtain high emitter densities. Our automated pipeline estimated emitter densities in the range of [0.0, 1.81] emitters per µm2 (see Table S1 for estimated parameter statistics). The targets were recorded for 400 frames at a single-frame acquisition time of 150 ms, corresponding to a 1-min total imaging time. From these datasets, super-resolved images were predicted with AutoDS. Moreover, a ground truth dataset was acquired for the same samples at low emitter density with a total imaging time of 25 min. The ground truth images were reconstructed using an SMLM fitting software22.
We compared the patchwise reconstructions of AutoDS to the reconstruction of a single Deep-STORM model that processed the entire field-of-view, and to the reconstruction of DECODE, another state-of-the-art high-density reconstruction algorithm (see Fig. S6 for an example visual comparison to DECODE). Visual inspection of the images predicted by AutoDS versus the ground truth images (Fig. 2d, e) showed faithful reconstruction of the structure and density of the targets. To quantitatively assess the performance of AutoDS, we measured the similarity between the reconstructed and ground truth images using the peak signal-to-noise ratio (PSNR), the normalized root-mean-squared error (NRMSE), and the multi-scale structural similarity index (MS-SSIM)23,24 (Fig. 2f).
Moreover, we estimated the reconstructed image resolution using both the Fourier ring correlation25 (FRC) and the decorrelation analysis26 methods (means and standard deviations for all methods are reported in Table S2). The estimated resolution for Deep-STORM was consistently worse than for AutoDS and DECODE, while the latter two performed similarly overall. Importantly, visual inspection of the images did not perfectly match the estimated reconstruction resolutions obtained by FRC and decorrelation analysis. Notably, since both resolution estimation methods depend strongly on the imaging conditions, such as background noise and photon count, we find the relation between the estimates for different reconstruction algorithms more informative than their absolute values.
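For context, Fourier ring correlation estimates resolution by correlating two statistically independent reconstructions of the same structure (e.g., built from odd and even frames) over rings of constant spatial frequency. The sketch below is a simplified, isotropic implementation for illustration only, not the code used for the reported values.

```python
import numpy as np

def fourier_ring_correlation(img1, img2, n_rings=64):
    """FRC between two half-data reconstructions (same-shape 2D arrays).
    Resolution is typically read off where the curve drops below ~1/7."""
    f1 = np.fft.fftshift(np.fft.fft2(img1))
    f2 = np.fft.fftshift(np.fft.fft2(img2))
    h, w = img1.shape
    y, x = np.indices((h, w))
    r = np.hypot(y - h // 2, x - w // 2)
    r_max = min(h, w) // 2                        # stay inside the Nyquist disk
    bins = np.digitize(r, np.linspace(0, r_max, n_rings + 1))
    frc = np.zeros(n_rings)
    for k in range(1, n_rings + 1):
        ring = bins == k                          # pixels in this frequency ring
        num = np.abs(np.sum(f1[ring] * np.conj(f2[ring])))
        den = np.sqrt(np.sum(np.abs(f1[ring]) ** 2) * np.sum(np.abs(f2[ring]) ** 2))
        frc[k - 1] = num / den if den > 0 else 0.0
    return frc
```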
Qualitatively, AutoDS reconstructions show superior visual quality compared to the single-model approach, with a higher number of predicted emitters, resulting in more continuous structural features in some regions. Quantitatively, all metrics showed that AutoDS outperforms the single-model approach (see Table 1), supporting the visual observations (Fig. 2d, e). Nevertheless, the reconstruction quality is comparable across most of the field-of-view, mainly because of the low variation of the emitter density and SNR there; in other words, the largest improvement of our method appears in regions where the emitter density and SNR deviate from their mean values.
Table 1.
Quantitative comparison between Deep-STORM and AutoDS reconstructions on the mitochondria and tubulin datasets
| Reconstruction algorithm | Sample type | PSNR | NRMSE | MS-SSIM |
|---|---|---|---|---|
| Deep-STORM | Mitochondria | 12.74 ± 3.65 | 1.65 ± 0.76 | 0.991 ± 6.8·10⁻⁵ |
| AutoDS | Mitochondria | 16.44 ± 0.56 | 0.97 ± 0.05 | 0.999 ± 7.5·10⁻⁵ |
| Deep-STORM | Tubulin | 15.71 ± 0.71 | 0.90 ± 0.06 | 0.998 ± 1.7·10⁻⁴ |
| AutoDS | Tubulin | 17.59 ± 0.56 | 0.73 ± 0.02 | 0.999 ± 7.9·10⁻⁵ |
The mean PSNR, NRMSE, and MS-SSIM are reported with ± one standard deviation over n = 5 different fields of view.
We examined the distribution of the selected pre-trained models for both the TOM20 and α-tubulin experiments and found that the model selection was spread across multiple models (see Fig. S7). This finding shows the potential contribution of our pipeline to the analysis of experimental data with high variability in sample properties such as emitter density and SNR. Moreover, having a set of pre-trained models suitable for multiple experimental conditions is highly beneficial for removing user intervention from the analysis process. An application that demands such flexibility is multi-target imaging (Fig. 2c), where AutoDS can serve as a digital add-on to a microscope setup and enable straightforward multiplexing27.
A major implication of our automated framework is the speed-up of SMLM analysis, toward potentially real-time SMLM. This is achievable due to the complete elimination of any post-experiment training requirements. To quantify the speed enhancement, we analyzed a typical SMLM video containing 400 high-emitter-density frames of 512 × 512 pixels, which contained sufficient information for a complete structural reconstruction. We split this dataset into 64 patches and ran AutoDS; the model selection took less than a minute, and the inference and reassembly of the reconstructed image from the patches took another 10 min. This is an order of magnitude faster than a neural network that requires post-experiment training9 (typically ~1–2 h per model).
AutoDS3D
AutoDS3D is a web application that runs on either a local computer or a remote server. It takes optical parameters, a calibration z-stack, and the raw images of an SMLM dataset as input, and generates a localization list with a single click or through step-by-step interactions (Fig. S8). To enhance computational efficiency, AutoDS3D employs a Gaussian representation of emitters in space. Additionally, like AutoDS, it automates the image analysis workflow, eliminating the need for technically demanding manual labor.
DS3D11 discretizes the 3D object space into voxels, assigning a value of 1 to voxels occupied by an emitter and 0 elsewhere. This binary representation limits precision to the voxel size, and high resolution demands large matrices with small voxels, which significantly increases computational complexity. AutoDS3D addresses this limitation by representing emitters as 3D Gaussians (Fig. S17). The center of mass of such a Gaussian defines the emitter location, enabling the use of larger voxels while maintaining high localization precision and improving computational efficiency.
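The benefit of the Gaussian target can be seen in a few lines: with a coarse 50 nm voxel grid, the center of mass of the rendered Gaussian still recovers the continuous emitter position. The following is an illustrative sketch of the representation, not the AutoDS3D training code; the grid size and sigma are assumed values.

```python
import numpy as np

def gaussian_target(xyz_nm, grid_shape, voxel_nm, sigma_nm):
    """Render one emitter at (x, y, z) nm as a normalized 3D Gaussian on a
    coarse voxel grid. Unlike a binary (0/1) occupancy target, the Gaussian's
    center of mass preserves the continuous emitter position, so precision is
    no longer tied to the voxel size."""
    zz, yy, xx = [np.arange(n) * voxel_nm + voxel_nm / 2 for n in grid_shape]
    grid = np.exp(-((zz[:, None, None] - xyz_nm[2]) ** 2
                    + (yy[None, :, None] - xyz_nm[1]) ** 2
                    + (xx[None, None, :] - xyz_nm[0]) ** 2) / (2 * sigma_nm ** 2))
    return grid / grid.sum()

def center_of_mass(grid, voxel_nm):
    """Intensity-weighted mean voxel-center coordinate, in nm (z, y, x)."""
    coords = np.indices(grid.shape) * voxel_nm + voxel_nm / 2  # (3, Z, Y, X)
    return (coords * grid).sum(axis=(1, 2, 3))

grid = gaussian_target(xyz_nm=(517.0, 263.0, 841.0), grid_shape=(32, 32, 32),
                       voxel_nm=50.0, sigma_nm=60.0)
print(center_of_mass(grid, voxel_nm=50.0))  # ~(841, 263, 517): sub-voxel recovery
```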
For automation, we integrate: a phase-retrieval algorithm for PSF characterization15,28, image pre-processing for background removal, SNR characterization, training data generation, network training, an inference test, and final inference on all images. Intermediate results are generated during execution to provide feedback (Fig. S9). This completely alleviates the extremely time- and expertise-demanding need for user parameter selection (e.g., SNR, emitter density) by visual inspection and trial and error.
We first compared the performance of (1) AutoDS3D, (2) DS3D with 4× up-sampling for lateral voxel size control, and (3) DS3D without up-sampling. The three models were trained under controlled conditions, including the same PSF model, number of training images, and noise and photon-count levels. We then performed 1000 inference runs with each trained model on simulated images of randomly located emitters in 3D to assess localization performance.
The evaluation metrics include: duration of training data generation, training duration, inference duration, post-processing duration, Jaccard index, and lateral and axial root-mean-squared error (RMSE) between the localizations and the simulated ground truth positions. This comparison is conducted under both high and low SNR conditions, as illustrated in the radar plots of Fig. 3 (see Figs. S18, S19 for bar plots). In high SNR conditions (Fig. 3a), AutoDS3D achieves the best computational efficiency and Jaccard index among the three models while maintaining localization accuracy comparable to DS3D with up-sampling. Although DS3D without up-sampling reduces operation time, it compromises lateral RMSE due to the coarse lateral voxel size. Under low SNR conditions (Fig. 3b), AutoDS3D continues to lead in computational efficiency and Jaccard index, while localization RMSE becomes comparable across all models, likely due to the dominant influence of noise on localization precision.
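As a reference for how such metrics can be computed, the sketch below matches predicted to ground-truth positions with Hungarian assignment under a distance tolerance. The tolerance value and the matching scheme are assumptions for illustration and may differ from the evaluation code used here.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def match_and_score(pred_nm, gt_nm, tol_nm=100.0):
    """Jaccard index TP / (TP + FP + FN) and lateral/axial RMSE between
    matched predicted and ground-truth 3D positions, given as (N, 3)
    arrays of (x, y, z) in nm."""
    cost = cdist(pred_nm, gt_nm)                 # pairwise distances
    rows, cols = linear_sum_assignment(cost)     # optimal one-to-one matching
    keep = cost[rows, cols] < tol_nm             # keep matches within tolerance
    tp = int(keep.sum())
    if tp == 0:
        return 0.0, np.nan, np.nan
    fp, fn = len(pred_nm) - tp, len(gt_nm) - tp
    d = pred_nm[rows[keep]] - gt_nm[cols[keep]]
    rmse_lat = np.sqrt(np.mean(np.sum(d[:, :2] ** 2, axis=1)))  # sqrt(E[dx²+dy²])
    rmse_ax = np.sqrt(np.mean(d[:, 2] ** 2))
    return tp / (tp + fp + fn), rmse_lat, rmse_ax
```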
Fig. 3. Accuracy and efficiency comparison of AutoDS3D and DeepSTORM3D in simulations.
Computation efficiency metrics, including training data generation time, training time, inference time, and post-processing time, and accuracy metrics, including axial and lateral root-mean-squared error (RMSE) as well as 1 − Jaccard index, are analyzed over 1000 inference runs; the corresponding statistical box plots are shown around the radar plots: a high signal-to-noise ratio (SNR) and b low SNR. Example frames are shown for both cases.
Next, we validated AutoDS3D using experimental SMLM data with various PSFs. We started with a Tetrapod dataset11 (20,000 frames, 333 × 313 pixels; video 1) and processed it using both AutoDS3D and DS3D (with 4× up-sampling). As shown in Fig. 4, both methods achieve comparable reconstruction quality (see Fig. S12 for FRC analysis). DS3D requires approximately 14 h for this reconstruction, plus additional hours to days, depending on user expertise, for parameter selection and validation. In contrast, AutoDS3D completes the process in less than 2 h with a single click. Another validation of AutoDS3D, with the double-helix PSF, is shown in Fig. S11.
Fig. 4. Experimental comparison between AutoDS3D and DS3D using tetrapod point spread function.
a Reconstruction of AutoDS3D with cross sections c and d. b Reconstruction of DS3D with cross sections e and f at the same locations as c and d.
We also compared AutoDS3D to another state-of-the-art deep learning-based reconstruction algorithm, DECODE10. Lamin B1 was imaged with DNA-PAINT, and the data were processed using AutoDS3D, DS3D, and DECODE (Fig. 5). In this comparison, AutoDS3D and DS3D achieve comparable image quality, both outperforming DECODE. A comprehensive comparison based on this dataset, including runtime, PSF compatibility, and localization counts, is provided in Table S3.
Fig. 5. Experimental comparison of AutoDS3D, DS3D, and DECODE using double-helix point spread function.
Reconstruction performance of a AutoDS3D, b DS3D, and c DECODE. Cross sections of each reconstruction are shown in d–i. Scale bars in a–c: 5 µm; in d–i: 1 µm.
Finally, we evaluated AutoDS3D on two datasets featuring an astigmatic PSF29 (videos 3 and 4). In the reconstruction of nuclear pore complexes (Fig. 6a), two-layer structures with an axial separation of ~50 nm are clearly resolved. The reconstructed microtubules (Fig. 6b, with cross sections in c and d) exhibit well-defined hollow tubular structures with a diameter of ~50 nm.
Fig. 6. Experimental validation of AutoDS3D using astigmatic point spread function.
a Reconstruction of nuclear pore complexes with a zoom-in view and cross sections showing two-layer structures with an axial distance of around 50 nm. b Reconstruction of microtubules, with axial cross sections c and d showing hollow tubular structures.
Discussion
We present several key contributions to the automation of deep-learning-based SMLM algorithms. The key results include: (1) one-click 2D and 3D SMLM reconstruction; (2) alleviation of post-acquisition network training in the 2D case, and faster reconstruction in the 3D case; and (3) patch-wise model selection in the 2D case to handle the locally heterogeneous emitter density that is intrinsic to SMLM data of biological samples.
AutoDS is preloaded with four NN models for localization prediction under different experimental conditions. Users of AutoDS can use the set of pre-trained models offered in the package, while expert users may opt to train their own set of models and model-selection logic to optimize network performance for their needs. More generally, one can envision various multi-model methods that can be broadly applied to all manner of PSF-based SMLM datasets, regardless of biological structure, for better image prediction quality.
Furthermore, the most time- and resource-intensive components of NN-based analysis are the preparation of training data and model training. Training data is the most important ingredient of a well-performing model; however, high-quality ground truth data matching the experimental data is commonly not easily available. Additionally, in Deep-STORM (both 2D and 3D), the generation of training data is human-supervised, namely, the user must inspect the training data and visually verify that it resembles the experiment. By implementing pre-trained models in this workflow, we remove the requirement for model training, which eliminates the need to prepare training datasets and saves computing and human resources.
Taken together, AutoDS provides better functionality, improved output quality, and user-friendliness compared to the single-model module. This multi-model, patch-based approach can also be easily extended to other image-based NNs that are confronted with heterogeneous image data from cell biology samples, such as in denoising30,31, image restoration19, and segmentation32–34. Future implementations of AutoDS can include an automated and non-rigid patch size selection module to confine similar structural densities within a single patch for more accurate model application.
In 3D SMLM, the state-of-the-art deep learning-based algorithms remain highly demanding in terms of user expertise, which limits their adoption by the broader community. For example, DS3D depends on the external software VIPR, which requires a strong background in optical imaging. To generate representative training data, DS3D users must manually tune numerous parameters, and the network feedback from each tuning iteration typically only becomes available after more than 10 h, owing to the computational cost of its volumetric network output. DECODE improves computational efficiency by producing 2D outputs, but it still relies on the third-party PSF calibration software SMAP, which also demands substantial user expertise. In our experience, SMAP struggles to calibrate PSFs with large spatial footprints, such as Tetrapod or double-helix PSFs with large axial ranges. In this context, we believe the one-click usability of AutoDS3D makes it more accessible and potentially more appealing to the broader microscopy community.
Using a Gaussian representation of molecules, AutoDS3D resolves the trade-off between localization precision and computational efficiency, allowing the use of larger voxel sizes without compromising accuracy. AutoDS3D integrates the key steps of 3D SMLM image processing into a user-friendly GUI, making advanced localization accessible to users without computational expertise.
Although AutoDS3D enables one-click 3D super-resolution reconstruction, it still requires time-consuming model training. One future direction is to incorporate pre-trained models, similar to the 2D case, to enable instantaneous inference without the need for training. However, a key challenge in this approach will be ensuring compatibility between the PSF models used in the pre-trained models and those encountered in practical experiments on the user side. One natural solution would be to include the PSF model, such as a z-stack of calibration beads, as an additional input to the localization network.
Methods
AutoDS - Parameter extraction and model selection
Initially, the 35th intensity percentile of each frame is subtracted as background; the frame minimum is then subtracted to zero out the minimal pixel values. The density-adaptive parameter-extraction algorithm receives low-resolution single-molecule patches as input. First, we identify patches that do not contain any emission event by requiring that the 99th percentile of the patch exceed both twice the patch mean and the frame mean intensity value. This step is crucial since the patch-wise analysis scheme is more sensitive to mistakes in very low-density patches, owing to the lack of signal supervision from the entire FOV. We then determine a detection threshold for emission events (Eq. (1)).
All pixels that cross this threshold and are local maxima within a 3 × 3 pixel neighborhood are selected for our rough estimate of the number of emitters. The emitter density is estimated by dividing the number of emitters by the analyzed patch area. Next, we estimate the noise statistics by creating a binary mask containing zeros in a 5 × 5 neighborhood around each estimated emitter position and ones elsewhere. The mean and standard deviation of the noise are calculated using the generated mask, and the signal mean and standard deviation are calculated using the estimated emitter pixels.
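A condensed sketch of the per-patch estimation described above is shown below. The detection threshold of Eq. (1) is passed in as a precomputed value, and the function name and layout are illustrative rather than the packaged AutoDS code.

```python
import numpy as np
from scipy.ndimage import maximum_filter

def extract_patch_parameters(patch, threshold):
    """Rough emitter-density and noise statistics for one background-subtracted
    2D patch. `threshold` is the detection threshold of Eq. (1), computed
    upstream."""
    # Candidate emitters: pixels above threshold that are 3 x 3 local maxima.
    local_max = patch == maximum_filter(patch, size=3)
    emitters = np.argwhere(local_max & (patch > threshold))

    # Density: detections per pixel; convert px^2 -> µm^2 with the pixel size.
    density = len(emitters) / patch.size

    # Noise statistics: mask out a 5 x 5 neighborhood around each detection.
    noise_mask = np.ones_like(patch, dtype=bool)
    for r, c in emitters:
        noise_mask[max(r - 2, 0):r + 3, max(c - 2, 0):c + 3] = False
    noise_mean, noise_std = patch[noise_mask].mean(), patch[noise_mask].std()
    signal_mean = patch[tuple(emitters.T)].mean() if len(emitters) else 0.0
    return density, signal_mean, noise_mean, noise_std
```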
For the model selection, we focused on two extracted parameters: the emitter density and the SNR, defined as the mean signal amplitude divided by the noise standard deviation. While different networks could be trained for different combinations of these two parameters, we found it practically sufficient to train four AutoDS models with varying density and SNR conditions to analyze patches of different difficulty levels (see Table 2). We empirically chose the minimal SNR value to be 2, since the emitter signal is barely visible in this scenario; furthermore, we chose the maximal SNR value to be 8, since this SNR is sufficient for precise emitter localization. Moreover, we found that training Deep-STORM models on excessively high emitter densities resulted in undertrained models that perform poorly; thus, we used a maximal emitter density of 2 emitters per square micron, with some standard deviation that creates a variety of scenarios with higher local density.
Table 2.
The pre-trained model parameters
| Model 1 | Model 2 | Model 3 | Model 4 | |
|---|---|---|---|---|
| Emitter density (emitters/µm²) | 0.5 | 1.0 | 1.5 | 2.0 |
| Signal-to-noise ratio | 8 | 6 | 4 | 2 |
As the difficulty level increases, the emitter density increases and the signal-to-noise ratio decreases. Decoupling these two parameters and producing a larger number of pre-trained networks did not make a substantial difference to the results.
An in-depth analysis of prediction quality, as well as performance quantification as a function of emitter density and SNR, can be found in Figs. S3–S5.
Each input patch is analyzed by the algorithm described above and is linked to a difficulty level between 1 and 4. We then average the emitter-density-based difficulty level and the SNR-based level to determine which pre-trained model analyzes the input patch.
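A minimal sketch of this selection rule, using the grid of Table 2, is shown below; the nearest-grid-value mapping and the tie-breaking for half-integer averages are our illustrative choices, as the text does not specify them.

```python
import numpy as np

# Difficulty grid from Table 2: level k pairs density DENSITIES[k-1]
# (emitters/µm²) with SNR SNRS[k-1].
DENSITIES = [0.5, 1.0, 1.5, 2.0]
SNRS = [8.0, 6.0, 4.0, 2.0]

def select_model(density, snr):
    """Assign density- and SNR-based difficulty levels (1-4) by nearest grid
    value, then average them to pick one of the four pre-trained models."""
    density_level = 1 + int(np.argmin(np.abs(np.array(DENSITIES) - density)))
    snr_level = 1 + int(np.argmin(np.abs(np.array(SNRS) - snr)))
    return round((density_level + snr_level) / 2)  # model index in 1..4

print(select_model(density=1.9, snr=2.5))  # -> 4 (hardest model)
```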
2D Image interpolation
Different labs use different optical systems for recording SMLM experiments. Variations in the emission wavelength, camera pixel size, and numerical aperture affect the PSF size and shape, thus requiring previous deep learning approaches to be retrained for each optical setup. In this work, we compare these parameters between the training data and the experimental data to interpolate the input images, allowing the same pre-trained models to analyze images captured by any reasonable optical system. First, we determine the PSF size in the training data of our pre-trained models and compare it to the PSF size of the experimental data. The PSF size is calculated by the formula:
$$\text{PSF size [px]} = \frac{\lambda}{2\,\mathrm{NA}\cdot p} \tag{2}$$

where λ is the emission wavelength, NA is the numerical aperture, and p is the camera pixel size.
We divided the interpolation into two cases:
If the model PSF size is larger, we need to make the experimental PSFs slightly larger. We use a Gaussian filter, whose size is set by the mismatch between the two PSF sizes, to inflate the PSFs to the size expected by the neural networks.
If the model PSF size is smaller, we need to downsample the input frames (effectively reducing the PSF size in pixels). We do so by cubic interpolation, with the downsampling rate set by the ratio between the model and experimental PSF sizes.
Although neither interpolation method is physically rigorous, this interpolation enabled us to use the same pre-trained models for all 2D experiments shown in this manuscript without noticeable performance degradation, showcasing the robustness of AutoDS.
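A minimal sketch of the two branches is given below, assuming Eq. (2) for the PSF size in pixels. The quadrature-based Gaussian width heuristic and the function names are our illustrative choices, not the exact AutoDS implementation.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def psf_size_px(wavelength_nm, na, pixel_nm):
    """Approximate PSF size in pixels, following Eq. (2)."""
    return wavelength_nm / (2.0 * na * pixel_nm)

def match_psf_size(frames, exp_params, model_params):
    """Rescale experimental frames (n, H, W) so their PSF size matches the
    pre-trained model's. Each params tuple is (wavelength_nm, NA, pixel_nm)."""
    s_exp = psf_size_px(*exp_params)
    s_model = psf_size_px(*model_params)
    if s_model > s_exp:
        # Model PSF larger: inflate experimental PSFs with a Gaussian blur;
        # sigma chosen here by quadrature of the two sizes (a heuristic).
        sigma = np.sqrt(s_model ** 2 - s_exp ** 2) / 2.355  # FWHM -> sigma
        return gaussian_filter(frames, sigma=(0.0, sigma, sigma))
    # Model PSF smaller: downsample laterally by cubic interpolation.
    rate = s_model / s_exp
    return zoom(frames, zoom=(1.0, rate, rate), order=3)
```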
2D Microscopy data - sample preparation
Rat brain tissue section 2D DNA-PAINT datasets were reused from a previous study17, in keeping with the principle of data reusability and reproducibility, as well as reducing laboratory materials and animal use. Animal handling from the previous work was performed according to the regulations by the Regierungspraesidium Karlsruhe. Data from the previous work is available on the Zenodo repository35. No animal studies were performed in this work. Methods ranging from sample handling to DNA-PAINT imaging are outlined in the previous study.
For the acquisition of 2D microtubule datasets, HeLa cells were cultured in DMEM (Gibco), supplemented with 10% FCS (Gibco, USA), 1% GlutaMAX (Gibco), and 1% non-essential amino acids (Gibco) at 37 °C in 5% CO2. Cells were seeded onto #1.5 thickness glass coverslips (Marienfeld, Germany) overnight. The cells were washed with BRB80 microtubule-stabilizing buffer (80 mM PIPES, 1 mM MgCl2, 1 mM EGTA; Sigma-Aldrich, USA) with 0.5% Triton X-100 at 37 °C for 30 s. The washing solution was removed, and a second BRB80 solution with 0.5% Triton X-100 and 0.5% glutaraldehyde (Electron Microscopy Sciences, USA) at 37 °C was added to the cells and incubated for 10 min. The cells were quenched with sodium borohydride (Carl Roth, Germany) for 7 min and washed with PBS. Blocking was performed with 5% FBS (Gibco) for 30 min. Cells were incubated with a primary antibody against α-tubulin, raised in mouse (T6199, Sigma-Aldrich, USA; clone DM1A; dilution 1:300), in 0.5% FCS for 1 h and subsequently washed with PBS three times. Cells were then labeled with a single-domain mouse antibody conjugated to an F1 DNA-PAINT docking strand (Massive-SDAB-FAST 2-PLEX, Massive Photonics, Germany) for 1 h, followed by three washes with PBS. Microtubules were imaged using 0.5 nM F1 imager strands conjugated to Cy3B. Acquisition was performed at 100 ms/frame for 30,000 frames on a custom-built SMLM microscope36.
AutoDS - Ground truth and predicted images
Ground truth α-tubulin and TOM20 images (low emitter density, 0.5 nM imager strands, 10,000 frames; Fig. 2) were fitted and reconstructed in the Picasso software22. High-emitter-density DNA-PAINT datasets for α-tubulin (5 nM imager strands, 400 frames) and TOM20 (10 nM imager strands, 400 frames) were input into the AutoDS NN in the Google Colab notebook environment (Pro version) for super-resolution image prediction. The "patchwise_analysis" option was selected, which uses the set of pre-trained models in the "Models" folder. The pixel size and emission wavelength of the dataset were defined as 233 nm, and the number of patches used was 8. For the single-model comparisons, the Deep-STORM notebook was used to analyze the same high-emitter-density datasets. The model upsampling factor of 8 determines the predicted image pixel size.
2D Image analysis
Ground truth and predicted images of tubulin and TOM20 (Fig. 2) were processed in Python. The image intensities were normalized, clipped to the range between the 0th and 99th percentiles, and smoothed with a Gaussian blur of sigma = 2. All images were registered using fiducial markers, and predicted images were compared against the ground truth images.
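A sketch of this preprocessing and of two of the three metrics, using scikit-image, is shown below; MS-SSIM is not part of scikit-image and would come from a separate tool, such as the cited ImageJ plugin24. The synthetic images here only stand in to demonstrate the calls.

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from skimage.metrics import peak_signal_noise_ratio, normalized_root_mse

def preprocess(img):
    """Clip to the [0th, 99th] percentile range, normalize to [0, 1], and
    apply a Gaussian blur of sigma = 2, mirroring the steps above."""
    img = np.clip(img, np.percentile(img, 0), np.percentile(img, 99))
    img = (img - img.min()) / (img.max() - img.min() + 1e-12)
    return gaussian_filter(img, sigma=2)

rng = np.random.default_rng(0)
gt = preprocess(rng.poisson(3.0, size=(256, 256)).astype(float))  # stand-in "ground truth"
pred = preprocess(gt + rng.normal(0.0, 0.05, size=(256, 256)))    # perturbed copy

print(peak_signal_noise_ratio(gt, pred, data_range=1.0))  # PSNR
print(normalized_root_mse(gt, pred))                      # NRMSE
```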
SNR characterization of AutoDS3D
In AutoDS3D, the SNR is automatically characterized following PSF calibration and background subtraction during the preprocessing of raw images. Imaging noise is modeled as a combination of Poisson-distributed shot noise, governed by the photon count, and additive Gaussian readout noise, defined by its mean and standard deviation. These noise parameters are extracted from a region of the experimental images containing sparse molecules, ideally with non-overlapping PSFs (see Fig. S14 for an analysis of the influence of molecule density on the performance of this parameter-extraction method).
The detection region can be set by default as a corner of the field of view or specifically defined—either by drawing a square in the pop-up image display window or by entering the region’s coordinates and dimensions in the GUI. From this region, 1000 consecutive frames are cropped, forming a time series of length 1000 for each pixel. The pixel with the lowest mean intensity is considered the noise pixel, and its mean and standard deviation are used as baseline estimates for the Gaussian readout noise. To allow for estimation tolerance, this pair of parameters is empirically extended to a range of [1.0, 1.4] times their respective values.
To estimate the photon count, the maximum pixel value (MPV) is first identified in the cropped region of the experimental data. Using the calibrated PSF model, we compute the average ratio between a preset photon count and the MPV of the PSF across various axial positions in a defined z-range. This ratio is then used to convert the experimental MPV into an estimated photon count. Since this estimation tends to be optimistic, due to noise and potential PSF overlap, we empirically extend the estimated value to a range of [0.5, 1.1] times the nominal value to account for these uncertainties. This estimation method is PSF-agnostic, and its application to the Tetrapod and double-helix PSFs is analyzed in Fig. S14.
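A compact sketch of the two estimators follows, under the stated assumptions (a sparse region cropped to 1000 frames; a calibrated PSF stack rendered at a preset photon count); the function names and array layouts are illustrative.

```python
import numpy as np

def estimate_readout_noise(frames):
    """Gaussian readout-noise baseline from a sparse region: the pixel with
    the lowest temporal mean over the 1000 cropped frames provides the mean
    and std, each extended to a [1.0, 1.4]x tolerance range."""
    series = frames.reshape(frames.shape[0], -1)   # (1000, n_pixels)
    noise_px = int(np.argmin(series.mean(axis=0)))
    mu = series[:, noise_px].mean()
    sigma = series[:, noise_px].std()
    return (mu, 1.4 * mu), (sigma, 1.4 * sigma)

def estimate_photon_count(frames, psf_stack, preset_photons):
    """Photon-count estimate: convert the region's maximum pixel value (MPV)
    to photons via the average preset-photons-to-MPV ratio of the calibrated
    PSF over z, then extend to the empirical [0.5, 1.1]x range."""
    ratio = np.mean([preset_photons / plane.max() for plane in psf_stack])
    photons = ratio * frames.max()
    return 0.5 * photons, 1.1 * photons
```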
PSF calibration
For PSF calibration, AutoDS3D automatically analyzes a bead z-stack to retrieve the pupil phase for practical PSF modeling. Additionally, we provide an option for in-situ characterization, which was recently implemented via inverse modeling in TensorFlow28; we adapted and re-implemented it in PyTorch in additional code that ships with AutoDS3D, and validated this code with both simulated and experimental data (Figs. S15, S16).
While in-situ modeling can potentially yield PSF models more representative of the actual sample compared to bead-based calibration, it inherently involves solving an ill-posed inverse problem, where multiple pupil functions may produce similar 3D PSFs. In our simulation, the converged PSF model exhibits a z range that appears scaled relative to the ground truth z-range (Fig. S15). Incorporating additional prior information is essential to constrain the solution space and make the problem well-posed.
Lamin B1 sample preparation and imaging
For single-molecule imaging of lamin B1, human osteosarcoma (U-2 OS, HTB-96, ATCC) cells were cultured at 37 °C and 5% CO2 in Dulbecco’s modified Eagle’s medium (DMEM) (21063029, Gibco) supplemented with 10% (v/v) fetal bovine serum (FBS) (16000044, Gibco) and 100 mM sodium pyruvate (11360070, Gibco). The cells were seeded in a custom-made PDMS microfluidic chip14 in cell culture medium at 37 °C and 5% CO2 and allowed to adhere. Fixation was performed by incubating the sample in 4% (w/v) paraformaldehyde in phosphate buffered saline (PBS) for 20 min at room temperature. The cells were then washed once in PBS and quenched in 10 mM ammonium chloride in PBS for 10 min. Next, the cells were permeabilized by washing them three times with 0.2% (v/v) Triton X-100 in PBS, with a 5-min incubation between each wash. Subsequently, the cells were blocked with 3% (w/v) bovine serum albumin (BSA, A2058, Sigma-Aldrich) in PBS for 1 h. Next, the cells were labeled with rabbit anti-lamin B1 primary antibodies (ab16048, Abcam) at a dilution of 1:1000 in 1% (w/v) BSA in PBS for 2 h at room temperature. After washing three times with 0.1% (v/v) Triton X-100 in PBS, the cells were labeled with donkey anti-rabbit secondary antibodies conjugated to oligonucleotides (Massive Photonics) diluted in antibody incubation buffer (Massive Photonics) for 1 h at room temperature. The cells were then washed three times with 1x washing buffer (Massive Photonics), three times with 0.2% (v/v) Triton X-100 in PBS, and once in imaging buffer (500 mM NaCl in PBS, pH 8).
The single-molecule data of lamin B1 was obtained using an optical setup described previously14. Light sheet illumination at 560 nm was used with an intensity of ~800 W/cm2. 10,000 frames were acquired using an exposure time of 100 ms by flowing in the imaging buffer containing 0.08 nM complementary oligonucleotides conjugated with Cy3B (Massive Photonics).
The influence of the nominal focal plane on the reconstruction in AutoDS3D
We examine the impact of the nominal focal plane (NFP) parameter on the reconstruction by assigning different NFP values in the GUI. The analysis (Fig. S10) shows that the z-range size remains relatively constant for consistent PSF shapes, while the z-range position shifts linearly with the NFP.
Acknowledgements
This research was supported by the Israel Science Foundation (Grant no. 1081/24) to Y.S. and A.S. and by partial financial support from the National Institute of General Medical Sciences of the National Institutes of Health (Grant R35GM155365) and startup funds from the Cancer Prevention and Research Institute of Texas (Grant RR200025) to A.-K.G. M.H. acknowledges funding by the Deutsche Forschungsgemeinschaft (DFG) (Grants GRK 2566, CRC 1177). Y.S. is supported by the Zuckerman Foundation. A.S. is supported by the Fulbright Program under the Fulbright Visiting Scholar Program. We acknowledge the access and services provided by the Imaging Centre at the European Molecular Biology Laboratory (EMBL IC), generously supported by the Boehringer Ingelheim Foundation.
Author contributions
A.S. developed and validated AutoDS, contributed to data visualization, and wrote the manuscript. D.X. developed and validated AutoDS3D, contributed to data visualization, and participated in manuscript writing. K.K.N. tested AutoDS on 2D localization microscopy datasets, contributed to visualization, and assisted in manuscript preparation. Y.N. evaluated AutoDS3D on double-helix PSF data, contributed to visualization, and assisted in manuscript preparation. G.G. and N.S. contributed a sample single-molecule super-resolution dataset of nuclear lamina protein Lamin B1 for analysis, assisted in running DECODE, and assisted in manuscript preparation. A.-K.G. provided supervision and funding support and reviewed the manuscript. M.H. contributed supervision, secured funding, and participated in manuscript writing. Y.S. supervised and directed the overall project. All authors read and approved the final manuscript.
Data availability
The datasets generated and/or analysed during the current study are not publicly available due to their large size but are available from the corresponding author on reasonable request.
Code availability
The software used in this work is publicly available at: https://github.com/alonsaguy/One-click-image-reconstruction-in-single-molecule-localization-microscopy-via-deep-learning.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
These authors contributed equally: Alon Saguy, Dafei Xiao.
Supplementary information
The online version contains supplementary material available at 10.1038/s44303-025-00123-w.
References
- 1. Rust, M. J., Bates, M. & Zhuang, X. Sub-diffraction-limit imaging by stochastic optical reconstruction microscopy (STORM). Nat. Methods 3, 793–795 (2006).
- 2. Heilemann, M. et al. Subdiffraction-resolution fluorescence imaging with conventional fluorescent probes. Angew. Chem. Int. Ed. 47, 6172–6176 (2008).
- 3. Jungmann, R. et al. Single-molecule kinetics and super-resolution microscopy by fluorescence imaging of transient binding on DNA origami. Nano Lett. 10, 4756–4761 (2010).
- 4. Betzig, E. et al. Imaging intracellular fluorescent proteins at nanometer resolution. Science 313, 1642–1645 (2006).
- 5. Hess, S. T., Girirajan, T. P. K. & Mason, M. D. Ultra-high resolution imaging by fluorescence photoactivation localization microscopy. Biophys. J. 91, 4258–4272 (2006).
- 6. Sauer, M. & Heilemann, M. Single-molecule localization microscopy in eukaryotes. Chem. Rev. 117, 7478–7509 (2017).
- 7. Pavani, S. R. P. et al. Three-dimensional, single-molecule fluorescence imaging beyond the diffraction limit by using a double-helix point spread function. Proc. Natl. Acad. Sci. 106, 2995–2999 (2009).
- 8. Shechtman, Y., Sahl, S. J., Backer, A. S. & Moerner, W. E. Optimal point spread function design for 3D imaging. Phys. Rev. Lett. 113, 133902 (2014).
- 9. Nehme, E., Weiss, L. E., Michaeli, T. & Shechtman, Y. Deep-STORM: super-resolution single-molecule microscopy by deep learning. Optica 5, 458 (2018).
- 10. Speiser, A. et al. Deep learning enables fast and dense single-molecule localization with high accuracy. Nat. Methods 18, 1082–1090 (2021).
- 11. Nehme, E. et al. DeepSTORM3D: dense 3D localization microscopy and PSF design by deep learning. Nat. Methods 17, 734–740 (2020).
- 12. Saguy, A. et al. DBlink: dynamic localization microscopy in super spatiotemporal resolution via deep learning. Nat. Methods 20, 1939–1948 (2023).
- 13. Nogin, Y. et al. DeepOM: single-molecule optical genome mapping via deep learning. Bioinformatics 39, btad137 (2023).
- 14. Saliba, N., Gagliano, G. & Gustavsson, A.-K. Whole-cell multi-target single-molecule super-resolution imaging in 3D with microfluidics and a single-objective tilted light sheet. Nat. Commun. 15, 10187 (2024).
- 15. Ferdman, B. et al. VIPR: vectorial implementation of phase retrieval for fast and accurate microscopic pixel-wise pupil estimation. Opt. Express 28, 10179 (2020).
- 16. Albertazzi, L. & Heilemann, M. When weak is strong: a plea for low-affinity binders for optical microscopy. Angew. Chem. Int. Ed. 62, e202303390 (2023).
- 17. Narayanasamy, K. K., Rahm, J. V., Tourani, S. & Heilemann, M. Fast DNA-PAINT imaging using a deep neural network. Nat. Commun. 13, 5047 (2022).
- 18. Jang, S. et al. Neural network-assisted single-molecule localization microscopy with a weak-affinity protein tag. Biophys. Rep. 3, 100123 (2023).
- 19. Weigert, M. et al. Content-aware image restoration: pushing the limits of fluorescence microscopy. Nat. Methods 15, 1090–1097 (2018).
- 20. Fu, S. et al. Field-dependent deep learning enables high-throughput whole-cell 3D super-resolution imaging. Nat. Methods 20, 459–468 (2023).
- 21. Xiao, D. et al. Large-FOV 3D localization microscopy by spatially variant point spread function generation. Sci. Adv. 10, eadj3656 (2024).
- 22. Schnitzbauer, J., Strauss, M. T., Schlichthaerle, T., Schueder, F. & Jungmann, R. Super-resolution microscopy with DNA-PAINT. Nat. Protoc. 12, 1198–1228 (2017).
- 23. Wang, Z., Simoncelli, E. P. & Bovik, A. C. Multiscale structural similarity for image quality assessment. In 37th Asilomar Conference on Signals, Systems & Computers 2, 1398–1402 (2003).
- 24. Prieto, G., Chevalier, M. & Guibelalde, E. MS_SSIM Index as a Java plugin for ImageJ (2014).
- 25. Banterle, N., Bui, K. H., Lemke, E. A. & Beck, M. Fourier ring correlation as a resolution criterion for super-resolution microscopy. J. Struct. Biol. 183, 363–367 (2013).
- 26. Descloux, A., Grußmayer, K. S. & Radenovic, A. Parameter-free image resolution estimation based on decorrelation analysis. Nat. Methods 16, 918–924 (2019).
- 27. Narayanasamy, K. K. et al. Visualizing synaptic multi-protein patterns of neuronal tissue with DNA-assisted single-molecule localization microscopy. Front. Synaptic Neurosci. 13, 671288 (2021).
- 28. Liu, S. et al. Universal inverse modeling of point spread functions for SMLM localization and microscope characterization. Nat. Methods 21, 1082–1093 (2024).
- 29. Li, Y. et al. Real-time 3D single-molecule localization using experimental point spread functions. Nat. Methods 15, 367–369 (2018).
- 30. Krull, A., Buchholz, T.-O. & Jug, F. Noise2Void: learning denoising from single noisy images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2129–2137 (2019).
- 31. Goncharova, A. S., Honigmann, A., Jug, F. & Krull, A. Improving blind spot denoising for diffraction-limited microscopy data. In ECCV 2020 Workshop on BioImage Computing (2020).
- 32. Falk, M. et al. Heterochromatin drives compartmentalization of inverted and conventional nuclei. Nature 570, 395–399 (2019).
- 33. Stringer, C., Wang, T., Michaelos, M. & Pachitariu, M. Cellpose: a generalist algorithm for cellular segmentation. Nat. Methods 18, 100–106 (2021).
- 34. Zakrzewski, F. et al. Automated detection of the HER2 gene amplification status in fluorescence in situ hybridization images for the diagnostics of cancer tissues. Sci. Rep. 9, 8231 (2019).
- 35. Narayanasamy, K. K. Accelerating DNA-PAINT imaging with a deep neural network. Zenodo, https://doi.org/10.5281/zenodo.6966132.
- 36. Power, R. M., Tschanz, A., Zimmermann, T. & Ries, J. Build and operation of a custom 3D, multicolor, single-molecule localization microscope. Nat. Protoc. 19, 2467–2525 (2024).