Science Advances. 2025 Sep 12;11(37):eadr6687. doi: 10.1126/sciadv.adr6687

DeepInMiniscope: Deep learning–powered physics-informed integrated miniscope

Feng Tian 1, Ben Mattison 2,3, Weijian Yang 1,3,*
PMCID: PMC12429034  PMID: 40938981

Abstract

Mask-based integrated fluorescence microscopy is a compact imaging technique for biomedical research. It can perform snapshot 3D imaging through a thin optical mask with a scalable field of view (FOV). Integrated microscopy uses computational algorithms for object reconstruction, but efficient reconstruction algorithms for large-scale data have been lacking. Here, we developed DeepInMiniscope, a miniaturized integrated microscope featuring a custom-designed optical mask and an efficient physics-informed deep learning model that markedly reduces computational demand. Parts of the 3D object can be individually reconstructed and combined. Our deep learning algorithm can reconstruct object volumes over 4 millimeters by 6 millimeters by 0.6 millimeters. We demonstrated substantial improvement in both reconstruction quality and speed compared to traditional methods for large-scale data. Notably, we imaged neuronal activity with near-cellular resolution in awake mouse cortex, representing a substantial leap over existing integrated microscopes. DeepInMiniscope holds great promise for scalable, large-FOV, high-speed, 3D imaging applications with a compact device footprint.


DeepInMiniscope, a miniaturized, compact, and thin integrated microscope, performs high-resolution 3D imaging over a large volume.

INTRODUCTION

Fluorescence microscopy is a powerful tool in biological and biomedical research. While most fluorescence microscopes are benchtop instruments, the emergence of their miniaturized counterparts enables applications demanding a compact footprint and portability, such as endoscopic procedures or implantable devices for bioimaging. These miniaturized microscopes typically share the same core structure as the benchtop configurations, incorporating an objective lens and a tube lens (1–8). A fundamental limit of such a configuration is the tradeoff among the device footprint, field of view (FOV), and imaging resolution. Increasing the FOV while retaining high resolution tends to increase the device footprint in three dimensions (3D). Furthermore, these microscopes typically lack optical sectioning and 3D imaging capability; imaging the sample in 3D requires taking one image per focal depth by refocusing the objective lens. Mask-based integrated or lensless microscopes (9–24), an emerging imaging technique, could overcome these limitations by replacing the bulk optics with a thin optical mask. The optical mask is in close proximity to the camera, so the entire device can be very thin. The integrated microscope can resolve 3D objects in a snapshot because the optical mask modulates the light to encode 3D object information onto a 2D image. Crucially, the FOV is readily scalable by increasing the size of the optical mask, without necessitating any increase in the device thickness. Although an integrated microscope offers these advantages, recovering high-quality object information remains a general challenge, especially with large-scale data. First, recovering 3D information from a 2D image can be an ill-posed problem, which may require strong prior information. Second, unlike conventional microscopes where the point spread function (PSF; describing how an object point is mapped to the camera plane) is spatially confined in a single cluster, the PSF in integrated microscopes is spatially spread out on the camera. When imaging biological samples with dense features, the images of different object points can overlap with each other. This effectively elevates the background and thus reduces the signal contrast. For these two reasons, it is challenging to reconstruct 3D volumes of dense fluorescence samples at high quality. The issue becomes more severe with a large FOV and a high number of camera pixels and object voxels, which demand substantial computational resources.

Here, we develop DeepInMiniscope, an integrated microscope paired with a custom-designed optical mask and a physics-informed computational algorithm termed multi-local-FOV ADMM-Net, enabling highly efficient and high-quality reconstruction of both 2D and 3D objects (Fig. 1). Our optical mask features a microlens array in which each unit is a lens doublet, resulting in a sparse PSF where each lobe is spatially confined. As will become clear later, such a PSF plays an important role in reducing the computational demand of the reconstruction algorithm. One distinct feature of DeepInMiniscope compared with other mask-based integrated microscopes is its local PSF, i.e., each object point is imaged onto only a local region of the camera (fig. S1). Our multi-local-FOV algorithm is designed to reconstruct the objects in local FOVs and then seamlessly integrate them into a cohesive whole. This approach substantially reduces memory requirements, as the entire 3D object does not need to be reconstructed at once. Different parts can be reconstructed in parallel [with sufficient graphics processing unit (GPU) memory] or sequentially (with limited GPU memory). Another distinct feature of DeepInMiniscope is its highly efficient and high-quality reconstruction through a custom-designed ADMM-Net, which is a multistage physics-informed model embedded in unrolled Alternating Direction Method of Multipliers (ADMM) deep neural networks (25, 26). Such a hybrid approach combines the high accuracy of model-based iterative optimization (9, 10, 15, 27) with the fast running time and data adaptability of data-driven deep neural networks (18, 19, 22, 28–30), leading to a highly efficient training process that requires only a small amount of data and yields high-quality reconstruction. Thanks to the sparse PSF, which is composed of discrete and localized lobes, our deep neural network demands considerably fewer computational resources. It can process large-scale data at high speed with high reconstruction quality, which is otherwise intractable with either iterative optimization methods or deep neural networks.

Fig. 1. Overview of DeepInMiniscope.


(A) Schematic of DeepInMiniscope. The raw measurement from the miniaturized integrated microscope is sent to a deep neural network, which reconstructs the object in 3D. (B) Application of DeepInMiniscope in neural activity imaging in mouse brain in vivo. Left: A 3D volume in the visual cortex highlighting the reconstructed active neurons (shown as maximum projection of temporal activity). Right: The normalized extracted temporal activity of representative neurons. (C to L) Application of DeepInMiniscope in fluorescence imaging of thin fluorescent samples and the 2D reconstructions. From left to right column, the samples are lens tissue, square grid patterns (12-μm line width), long line patterns (20-μm line width), polygons and letters, C. elegans embryos, and C. elegans body. (C) Raw measurements by DeepInMiniscope. (D) Reconstruction using a list-based RL algorithm. (E) Reconstruction using a multi-local-FOV ADMM-Net. (F) Ground truth obtained by a benchtop microscope. (G) Reconstruction using a single-FOV Hadamard-Net. (H) Reconstruction using a single-FOV ADMM-Net. (I) Reconstruction using a multi-global-FOV Hadamard-Net without PSF initialization. (J) Reconstruction using a multi-global-FOV Hadamard-Net with PSF initialization. (K) Reconstruction using a multi-global-FOV Wiener-Net without PSF initialization. (L) Reconstruction using a multi-global-FOV Wiener-Net with PSF initialization. Scale bars, 300 μm (B) and 500 μm [(C) and (D)].

In addition to our deep neural network, we developed a second object reconstruction approach, list-based Richardson-Lucy (RL) deconvolution, based on the classical RL algorithm. This method leverages the sparse PSF and substantially reduces computational resources compared to traditional RL. Although it does not perform as well as the neural network, it requires no training and achieves robust performance with large-scale data.

DeepInMiniscope has an FOV of 4 mm by 6 mm over a 600-μm axial range, with a lateral resolution of ~10 μm and an axial resolution of ~60 μm. We demonstrated high-quality imaging of a variety of challenging fluorescence samples (Fig. 1), including scattering phantoms with complex and dense features, organisms such as Caenorhabditis elegans (C. elegans) and hydra that display 3D motion, as well as cortical neural activity with near-cellular resolution in an awake mouse. DeepInMiniscope is a promising tool for portable, large-FOV, high-speed, 3D imaging applications.

RESULTS

Miniaturized integrated microscope with a doublet microlens array

DeepInMiniscope (Fig. 2 and Materials and Methods) features a thin microlens array as the optical mask. Each microlens unit is a custom-designed doublet lens (Fig. 2, E and F) that minimizes wavefront aberrations, yielding a sharp PSF across an extended FOV. Compared with a singlet lens, our design achieves a higher Strehl ratio within the designed FOV while reducing the background outside it (Fig. 2G). Each doublet lens has a diameter of 300 μm and a focal length of 1.17 mm, resulting in a numerical aperture (NA) of 0.1. The optical mask, consisting of 108 doublet lenses distributed in a semirandom pattern within an area of 4 mm by 6 mm, was fabricated using two-photon polymerization (31, 32) on a fluorescence emission filter (interference filter). The interspaces between doublet units were coated with aluminum to reduce the background light that is not modulated by the lenslets. We set the distance between the object and the microlens array to 3.9 to 5.5 mm and the distance between the microlens array and the image plane to ~1.6 mm, resulting in a system magnification of 0.26 to 0.39 after accounting for the glass elements in the optical path. The effective detection NA on the object side is ~0.024 to 0.037. We defined the nominal FOV of each doublet to be ~1 mm, the field height at which the peak intensity of the PSF drops to ~0.8 of the on-axis value, in overall good agreement with the simulation results of the design (Fig. 2, G and H). This result confirmed that the PSF is spatially sparse, with each lobe individually confined. As described later, this is a vital attribute that substantially decreases the computational demand of our reconstruction algorithm, setting our approach apart from others. Using the nominal FOV of each doublet, we find that each object point is effectively imaged by ~4 microlens units (Fig. 2I). On the basis of the detection NA and effective FOV of each doublet, for a typical magnification of 0.35, we calculate the lateral resolution to be ~9.6 μm, the axial resolution to be ~63 μm, and the depth of field to be ~1 mm through geometrical optics (~800 μm through diffraction theory) (note S1 and fig. S2). The lateral FOV is determined by the size of the microlens array, which is 4 mm by 6 mm.
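
As a rough cross-check of these numbers, the short sketch below recomputes the magnification, detection NA, and resolution estimates from the quoted design parameters using textbook thin-lens and Rayleigh-type formulas. The emission wavelength and the lenslet baseline used for the axial (parallax) estimate are illustrative assumptions, not values from the paper; the exact derivation is in note S1.

```python
# Design parameters quoted in the text
d_lens = 0.300        # doublet diameter, mm
z_obj = 3.9           # object-to-microlens distance, mm (3.9 to 5.5 mm in practice)
z_img = 1.6           # microlens-to-image distance, mm
wavelength = 0.52e-3  # assumed GFP emission wavelength, mm (~520 nm)

# System magnification as a simple distance ratio (the paper additionally
# accounts for the glass elements in the optical path)
M = z_img / z_obj                         # ~0.41 here; 0.26 to 0.39 with glass

# Effective detection NA on the object side
NA = (d_lens / 2) / z_obj                 # ~0.038

# Diffraction-limited lateral resolution and depth of field (Rayleigh-type)
lateral_um = 0.61 * wavelength / NA * 1e3       # ~8 um
dof_um = 2 * wavelength / NA**2 * 1e3           # ~700 um (paper: ~800 um)

# Axial resolution from parallax between lens units, assuming a hypothetical
# ~0.6-mm baseline between neighboring lenslets viewing the same point
baseline = 0.6
axial_um = lateral_um / (baseline / z_obj)      # ~50 to 60 um

print(f"M ~ {M:.2f}, NA ~ {NA:.3f}")
print(f"lateral ~ {lateral_um:.1f} um, axial ~ {axial_um:.0f} um, DOF ~ {dof_um:.0f} um")
```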

Fig. 2. Assembly of DeepInMiniscope and the microlens array.


(A) Exploded view of DeepInMiniscope, illustrating the 3D-printed housing, the stack of optical filters with the microlens array fabricated on top, and the CMOS camera on the PCB board. (B) Side view of DeepInMiniscope. (C) Assembled device illustrating the imaging window with the microlens array. (D) Illumination intensity distribution across the FOV at the sample plane with two fiber illumination channels, simulated by ray tracing followed by convolution with a Gaussian kernel of 100 μm by 100 μm. The raw results had discontinuity artifacts due to the limited number of rays, which are suppressed by the convolution. (E) Structure of a doublet lens unit. (F) Fabricated microlens array observed under an optical microscope. (G) Normalized peak intensity of the PSF from a point source. Red/blue trace, simulated result from a doublet/singlet lens unit optimized for imaging quality within a 500-μm object height. Dot, experimental measurement of the doublet lens unit from a point source 4 μm in diameter. The effective imaging area of the lens unit is defined where the peak intensity of the PSF drops below 80% of the maximum value. (H) Experimentally measured image of a point source 4 μm in diameter. (I) Number of subimages obtained from the microlens array at each object location, assuming the effective FOV of each lens unit is 500 μm in radius. The microlens units are marked with black circles. Scale bars, 500 μm (D) and 300 μm (F). A.U., arbitrary units.

The microlens array on the emission filter is stacked with an absorptive color filter and assembled in proximity to a board-level back-illuminated complementary metal-oxide semiconductor (CMOS) camera sensor (5.5 mm by 7.4 mm, 3000 pixels by 4000 pixels) (Fig. 2, A to C). An optional multimode fiber, bifurcating into two ports, can be inserted into the assembly to deliver the excitation light from a light-emitting diode to the sample. The angle and position of the two fiber ports were optimized so that the illumination is uniform across the 4 mm by 6 mm FOV (Fig. 2D). The fluorescence signals pass through the filter stack and are imaged by the doublet microlens array onto the camera sensor. The entire assembly is ~30 mm by 30 mm by 6.5 mm in size and weighs 7.5 g (or 3 g excluding the camera).

Highly efficient reconstruction algorithms

A major challenge in integrated microscopy is to reconstruct a high-quality 3D object, while minimizing computational resources and time, from a 2D image in which the weak fluorescence signals are overshadowed by a strong background created by the overlap of the subimages from different lenslets. There are two major classes of reconstruction algorithms: model-based iterative optimization methods, such as those based on RL or ADMM (9, 10, 15, 27), and deep learning methods using a neural network (18, 19, 22, 28–30). The former typically delivers higher and more consistent reconstruction quality but at the cost of substantial computational resources and time, along with the need for an accurate characterization of the PSF. Deep learning methods, on the other hand, once trained, can have a fast processing speed, although they often lack generalizability to data types beyond their training set and are constrained by the available GPU memory.

We develop multi-local-FOV ADMM-Net, a multistage, physics-informed deep neural network based on unrolled ADMM frameworks (Fig. 3, figs. S3 to S5, and Materials and Methods) (25, 26). This network not only combines the robust, high-quality reconstruction of ADMM-based iterative optimization with the high processing speed of neural networks but, crucially, substantially reduces the required computational resources, facilitating the reconstruction of a large 3D data volume with a high voxel count from a 2D image with a high pixel count.

Fig. 3. Architecture of ADMM-Net, illustrated as single-FOV to reconstruct a single 3D volume.


The raw measurement is preprocessed by a 2D CNN, which denoises and suppresses the background of the raw image. The preprocessed raw image is then sent to the ADMM-Net, which contains multiple stages. Each stage contains a deconvolution module (stage 0) or convolution module (subsequent stages) to update the reconstructed image $\hat{X}$, a CNN denoiser to update the regularized primal $\hat{Z}$, and a mathematical layer to calculate the dual variable $\hat{U}$. $L_{CL}$ is a closed-loop loss function defined as the SSIM loss between the estimated image based on the reconstruction and the denoised, background-suppressed raw measurement. $L_{3D\text{-}view}$ is a loss function defined as the summed SSIM loss between the projected xy, yz, and xz views of the reconstructed volume and the corresponding ground truth.

The multi-local-FOV ADMM-Net starts with a preprocessing step for noise and background suppression, using either a U-Net or a Laplacian-of-Gaussian (LoG) filter (fig. S3), followed by a multihead physics-informed neural network that initializes the reconstruction. Each head of the network learns a local PSF and reconstructs the object within a localized FOV at a specific depth (fig. S6), through a Hadamard-Net in the Fourier domain that we recently developed (30). The Hadamard-Net, also termed Fourier-Net (22, 33), grounded in the deconvolution principle of transpose convolution, learns the deconvolution process in the Fourier domain without requiring PSF calibration (fig. S7), making it highly efficient for creating an initial reconstruction (30). The preliminary reconstruction results are then refined through multiple ADMM-Nets (fig. S8), which have two features that distinguish them from other networks and the traditional ADMM-Net (25, 26).

First, each ADMM-Net reconstructs a 3D local FOV, offering unique advantages for microscopy lacking a global PSF (fig. S1). This allows us to model the image formation process in each local FOV through a unique PSF. Each ADMM-Net then incorporates this image formation into the learnable network parameters through the local data patch (fig. S6). A loss function is calculated between the reconstructed local patches and their corresponding ground truth patches to update the individual ADMM-Nets. The local reconstructions are then fused into an entire 3D volume. Such a process cohesively integrates the individual FOVs. Furthermore, by reconstructing only the local FOVs and then fusing them together, rather than reconstructing the entire FOV at once, we can run the reconstruction on GPUs that do not have enough random-access memory (RAM) to reconstruct the global FOV at once (notes S2 and S3).

Second, each of our ADMM-Nets demands substantially fewer computational resources than the traditional ADMM-Net (25, 26). ADMM-Net itself has a relatively complex structure composed of multiple iterative stages, including object updates through gradient calculation, regularization and denoising [through a convolutional neural network (CNN) in our case], and dual updates to account for primal residues (Materials and Methods). Typically, the computational challenge lies in the extensive time needed in the object update stage, which involves the calculation of the matrix $(A^TA+\rho I)^{-1}$ and its multiplication with a 1D linearized vector of the object, where $A$ is the system matrix describing how each object voxel is mapped to the image plane, $A^T$ is the transpose of $A$, $I$ is the identity matrix, and $\rho$ is a learnable scalar. The computational complexity of calculating $(A^TA+\rho I)^{-1}$ is $O(N^3)$ in time, its multiplication with the 1D linearized vector of the object is $O(N^2)$, and the required RAM scales with $N^2$, where $N$ is the number of object voxels, assumed to be of the same order of magnitude as the number of camera pixels (note S2 and fig. S9). By exploiting the sharp and sparse PSF patterns as well as their spatially invariant property (within each local FOV), we can approximate $(A^TA+\rho I)^{-1}$ as an identity matrix (note S2 and figs. S9 to S13). This further allows us to transform the large matrix multiplication between the 1D linearized vector of the object and $(A^TA+\rho I)^{-1}$ into the sum of 2D convolutions between the 2D object at each depth and a 2D kernel initialized as a matrix with all elements being zero except the central one. This can then be simplified to Hadamard multiplication in the Fourier domain, with a computational complexity of $O(N\log_2 N)$ in time (note S2, fig. S9, and Materials and Methods). The required RAM scales with $N$. In addition, our algorithm does not require any experimental calibration of the PSF, which would be laborious in ADMM (15) or typical deep neural networks (18, 19, 26, 28, 29). The initialization strategy that approximates $(A^TA+\rho I)^{-1}$ as the identity matrix is crucial for facilitating the neural network training and achieving good performance (figs. S10 and S11). While our ADMM-Net, which converts the matrix multiplication into convolution with a learnable kernel, is primarily designed for a microlens array, it can also be applied to less sparse PSFs, such as the contour PSF (11) and caustic PSF (10), although with reduced performance (note S3 and figs. S10 and S11). Nonetheless, as long as the PSFs remain relatively sparse, our initialization strategy proves robust, allowing the neural network to further optimize itself based on the data.
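
To make the complexity argument concrete, the following NumPy sketch shows the object-update step once $(A^TA+\rho I)^{-1}$ is approximated by the identity: the update becomes a per-depth 2D convolution with a kernel initialized as a delta (all zeros except the central element), evaluated as an element-wise Hadamard product in the Fourier domain in $O(N\log N)$ time rather than a dense $O(N^2)$ matrix-vector product. Array sizes are illustrative, and the input stands in for $H_D(b) + \rho(\hat{Z}^k - U^k)$ at one depth.

```python
import numpy as np

H, W = 832, 1248                     # illustrative per-depth object size (voxels)
rng = np.random.default_rng(0)
x = rng.random((H, W))               # stand-in for H_D(b) + rho * (Z - U) at one depth

# Kernel associated with (A^T A + rho I)^{-1}, initialized as a delta function,
# i.e., the identity approximation described in the text
kernel = np.zeros((H, W))
kernel[H // 2, W // 2] = 1.0

# Object update as a 2D convolution, evaluated as a Hadamard product in the
# Fourier domain: O(N log N) instead of an O(N^2) dense matrix-vector product
K = np.fft.fft2(np.fft.ifftshift(kernel))
x_updated = np.real(np.fft.ifft2(K * np.fft.fft2(x)))

# With the delta kernel, the update starts out as the identity map
assert np.allclose(x_updated, x, atol=1e-9)
```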

We trained our neural network on experimental data using custom-designed fluorescent samples, with ground truth images captured by a benchtop 2× microscope (Materials and Methods). This approach, distinct from conventional methods that rely on simulated data (18, 29), eliminates the need for extensive and precise calibration of the PSF. Furthermore, our method can better capture aspects of the image formation process that may not be fully modeled in simulation. Our ADMM-Net exhibits rapid convergence in both the number of stages and the number of training datasets required. We implemented six stages to optimize GPU memory usage (fig. S14) and needed only a very small number of datasets for training (fig. S15). This efficiency stems from the ADMM-Net being physics informed rather than purely data driven. The Hadamard-Net (30), used for the initial reconstruction, learns the underlying physics of the imaging model. The subsequent ADMM stages adhere to the principles of ADMM but reduce complexity and optimize themselves using the data. In addition, we calibrated the angular rotation between the raw measurements and the ground truth data. We also resized the ground truth images for each training depth to maintain constant pixel scaling. These adjustments enable the ADMM-Net to effectively learn the parameters of the underlying physics model. Once trained, our multi-local-FOV ADMM-Net is capable of reconstructing a volume of 6 mm by 4 mm by 0.6 mm with ~1248 voxels by 832 voxels by 13 voxels at 1 Hz (24-GB GPU RAM) or 3 Hz (80-GB GPU RAM). This represents a substantial advantage over conventional ADMM, which, while capable of achieving superior reconstruction quality (figs. S10 to S13), demands an enormous amount of RAM [tens of terabytes (TB)] and extensive computational time to process such a large data volume ($N \sim 10^6$ to $10^7$), which is not practically feasible.

To benchmark the performance of our neural network on large-scale data, we developed a list-based RL algorithm as an alternative method to reconstruct the objects (fig. S16 and Materials and Methods). The RL algorithm is a classical deconvolution method, but in 3D integrated microscopy it would require too much memory to store the system matrix. Leveraging the sparse PSF, we turn the system matrix into individual lists, which contain the small number of camera pixels (<50) that each object voxel projects to. Effectively, this reduces the complexity of the system matrix from $O(N^2)$ to $O(50N)$, thereby conserving computational time and memory and enabling the algorithm to operate on a standard laptop. Furthermore, the reconstruction can have a reduced background because we only consider those voxel-pixel pairs within the FOV of each lens unit. Compared to the multi-local-FOV ADMM-Net, the list-based RL algorithm runs slower [0.58 to 12 s per iteration for $\sim 1.1 \times 10^7$ voxels (13 to 16 planes, 1 to 3 mm FOV for each lens unit); three to five iterations needed in general] and may be less capable of reconstructing objects with complex and continuous features. Nonetheless, the list-based RL algorithm is suitable for systems lacking a global PSF and has a generally robust reconstruction performance, which can serve as a baseline to evaluate the performance of the multi-local-FOV ADMM-Net.

Image resolution

We characterized the image resolution using both a single point source and a resolution target (figs. S17 and S18). The lateral resolution of our integrated imager is ~10 μm, verified by imaging a 4-μm point source and a USAF resolution target (group 5, element 6, ~8.8-μm line width), consistent with our design value. The axial resolution is ~60 μm.

Imaging of phantom with dense fluorescence features and comparison between multi-local-FOV ADMM-Net and others

We validated DeepInMiniscope and the reconstruction algorithms on fluorescence samples such as lens tissue stained with fluorescent spray and custom-fabricated masks containing letter-shaped features, mesh grids, or randomly distributed lines (Fig. 1, C to F). The lens tissue posed a substantial challenge for reconstruction due to its dense fluorescence features, which led to a strong background in the raw image. Before the actual reconstruction, we used a U-net to suppress the background (Materials and Methods). For other samples, we used LoG filters to highlight the edges, which facilitated the subsequent reconstruction. While both the multi-local-FOV ADMM-Net and the list-based RL algorithm successfully reconstructed the objects, the multi-local-FOV ADMM-Net demonstrated overall better performance in terms of peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM), calculated on background-subtracted images (table S1 and Materials and Methods).
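
For reference, quality metrics of the kind reported in table S1 can be computed with standard routines; a minimal sketch using scikit-image is shown below. The percentile-based background subtraction here is an illustrative stand-in for the exact background-subtraction step used in the paper.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def subtract_background(img, percentile=10):
    """Remove a constant background floor and clip negatives (illustrative)."""
    return np.clip(img - np.percentile(img, percentile), 0, None)

def reconstruction_scores(recon, truth):
    """PSNR and SSIM between a reconstruction and its ground truth image."""
    recon = subtract_background(recon.astype(np.float64))
    truth = subtract_background(truth.astype(np.float64))
    recon /= max(recon.max(), 1e-12)     # normalize to [0, 1]
    truth /= max(truth.max(), 1e-12)
    psnr = peak_signal_noise_ratio(truth, recon, data_range=1.0)
    ssim = structural_similarity(truth, recon, data_range=1.0)
    return psnr, ssim
```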

The multi-local-FOV ADMM-Net features local-FOV reconstruction. We compared its performance with networks that reconstruct the image as a whole using a global PSF or multiple global PSFs (Fig. 1, G to L, and Materials and Methods). First, we examined a single-FOV Hadamard-Net (Fig. 1G) (30) and a single-FOV ADMM-Net (Fig. 1H). The former learns to deconvolve the entire FOV in the Fourier domain through a single pair of Fourier filters, and the latter refines the reconstruction results through the ADMM-Net. Both networks are limited to learning a single PSF for the entire FOV and could not achieve a holistically optimized reconstruction over the entire FOV (fig. S8). Such a global PSF can produce artifacts when it attempts to explain local image features that are irrelevant to it. Second, increasing the number of channels of the global PSF, as in a multi-global-FOV Hadamard-Net [also termed SV-FourierNet (22)] (Fig. 1, I and J) or a multi-global-FOV Wiener-Net (19, 29) (Fig. 1, K and L), did not help. These two networks are designed for imaging systems with a spatially variant PSF that can be described as a combination of multiple global PSFs. They learn these global PSFs in the Fourier domain and spatial domain, respectively, reconstruct multiple versions of the entire FOV through each global PSF, and then fuse them through a neural network such as a CNN. This does not fit our system, where global PSFs do not exist. Even when the PSF of each channel was initialized by the locally calibrated PSF, without a constraint of locality, the network tried to find a compromised global PSF for each channel. When the reconstructions from different channels were merged, only the central FOV had a relatively good reconstruction quality, with the remaining regions contaminated by low-quality features and background artifacts (fig. S19). Thus, the global reconstruction strategy and the corresponding network architectures are incompatible with the condition in which a global PSF does not exist, regardless of the number of channels and how each channel is initialized. This is further illustrated by the loss landscape of the global reconstruction network for systems with and without a global PSF: the former has a more convex negative log-likelihood loss landscape, whereas the latter struggles to find the optimal result (fig. S20). Last, the multi-local-FOV ADMM-Net produced reconstructions with more balanced quality and intensity and reduced background variation across the entire FOV, particularly for samples with dense and complex features such as the lens tissue. Quantitatively, the multi-local-FOV ADMM-Net generally has a lower background intensity and background variation compared to the single-FOV Hadamard-Net, multi-global-FOV Hadamard-Net, or multi-global-FOV Wiener-Net. Furthermore, the multi-local-FOV ADMM-Net generally has a higher PSNR and SSIM compared to the other neural network approaches (table S1). This affirms the effectiveness and suitability of the multi-local-FOV ADMM-Net for our imaging system.

Imaging of fluorescent beads embedded in scattering medium over a large 3D volume

We tested the 3D imaging capability of DeepInMiniscope on phantoms containing randomly distributed fluorescent beads (5 μm in diameter) in clear media (Fig. 4, A to D). We first used a benchtop microscope with a 2× or 10× objective lens to characterize the sample (Fig. 4A). The former has a large FOV but poor 3D resolving power, whereas the latter can resolve multiple axial depths by changing the focus but has a smaller FOV. DeepInMiniscope combines the strengths of both objective lenses and can resolve 3D features over a large volume (6 mm by 4 mm by 0.6 mm, 13 planes, reconstructed by either the multi-local-FOV ADMM-Net or the list-based RL algorithm). The reconstructed beads shown in the xy, yz, and xz projections match very well with those obtained from the benchtop 10× microscope (fig. S21). As a comparison, we also reconstructed the same volume using the single-FOV Hadamard-Net, which showed less satisfactory results (fig. S21). Last, we analyzed the axial resolving power of the multi-local-FOV ADMM-Net and list-based RL algorithm through the histogram of the axial full width at half maximum (FWHM) of the reconstructed beads. The minimum value was ~50 μm (Fig. 4D), in line with the ~60-μm axial resolution described in the earlier section.

Fig. 4. 3D reconstruction of fluorescent beads distributed in a 3D volume.


(A) Fluorescent bead phantom (fluorescent beads 5 μm in diameter distributed in optically clear polymer) imaged by a benchtop microscope with a 2× objective lens (left, xy view) and a 10× objective lens (right, xy, yz, and xz views, 13 axial planes, each separated by 50 μm). The image from the 10× objective lens (right) is a zoom-in view of the region inside the orange dashed box in the image from the 2× objective lens (left). (B) 3D reconstruction results within a 600-μm depth range by the multi-local-FOV ADMM-Net, in xy, xz, and yz views. The green/orange dashed box in the xy view (left) corresponds to the FOV of the image from the 2×/10× objective lens, respectively, in (A). Right: Zoom-in view of the region inside the orange dashed box in the left image. (C) Same as (B) but with the list-based RL algorithm. (D) Histogram of the axial FWHM of individual beads reconstructed from the multi-local-FOV ADMM-Net (left) and list-based RL algorithm (right). (E to H) 3D reconstruction of fluorescent beads 12 μm in diameter distributed in a 3D scattering volume with a mean-free path (MFP) of (E) 50 μm, (F) 100 μm, (G) 250 μm, and (H) 500 μm. The optically clear polymer is mixed with fluorescent beads 12 μm in diameter and nonfluorescent beads 1.18 μm in diameter, whose concentration controls the mean-free path. All reconstructions are within a 4.2 mm by 5.8 mm by 600 μm volume. Left/middle: Reconstruction by the multi-local-FOV ADMM-Net/list-based RL algorithm, in xy, xz, and yz views. Right: Reference image captured by a benchtop microscope with a 2× objective lens and a photograph showing the scattering phantom slide on top of a resolution target. Scale bars, 500 μm [(A) to (C), left], 200 μm [(A) to (C), right], and 500 μm [(E) to (H)].

In subsequent experiments, we imaged 3D scattering phantoms in which 12-μm-diameter fluorescent beads were mixed with 1.18-μm-diameter nonfluorescent beads, which act as scatterers (Fig. 4, E to H). As the scattered light increases the imaging background, we first removed the background from the raw images with a LoG filter before processing them with the multi-local-FOV ADMM-Net or list-based RL algorithm (Materials and Methods). Both reconstruction methods could pick up the ballistic light and reconstruct the 3D volume across scattering lengths of 50 to 500 μm. The results are in good agreement with the images captured by the benchtop microscope, demonstrating the robustness of the reconstruction algorithms.

Imaging of C. elegans and hydra

Imaging biological specimens presents challenges for integrated imagers due to the typically weak fluorescence signals and the presence of background fluorescence over the regions of interest. Our reconstruction algorithm, equipped with a background removal module and the multi-local-FOV ADMM-Net, is particularly tailored to tackle this challenge. The algorithm iteratively learns the image formation and progressively enhances the reconstructions. We validated the reconstruction capability by imaging C. elegans (Fig. 1) and live hydra (Fig. 5 and fig. S22), where the embryos of the C. elegans and the epithelial cells of the hydra were labeled with green fluorescent protein (GFP). In C. elegans, our system was capable of distinguishing embryos as small as 13 μm. In hydra, it successfully captured tentacles with thicknesses down to ~20 μm. The reconstruction quality of both the multi-local-FOV ADMM-Net and the list-based RL algorithm was superior to that of the other reconstruction algorithms (Figs. 1 and 5). Last, we recorded the 3D motion of the hydra across an FOV of ~6 mm by 4 mm over a 2-mm depth range at 4 Hz, reconstructed using the list-based RL algorithm (Fig. 5, F to H, and movies S1 and S2). While traditional benchtop microscopes are commonly used to study the morphology of hydra without axial resolution, our integrated microscope offers the distinct advantage of high-resolution 3D imaging over large volumes.

Fig. 5. In vivo imaging of hydra labeled by GFP.


(A to D) Image and reconstruction of two hydras, both having endodermal cells (the inner layer of epithelial cells) labeled with GFP. The hydras were housed in a thin chamber between the microscope slide and the cover slip. (A) A raw image frame from DeepInMiniscope. (B) 2D reconstruction of (A) by the multi-local-FOV ADMM-Net. (C) 2D reconstruction of (A) by the list-based RL algorithm. (D) 2D reconstruction of (A) by the single-FOV Hadamard-Net. (E) Reference image captured by a benchtop microscope with a 2× objective lens. The white arrow indicates the thin tentacles (~20 μm). (F to H) 3D reconstruction of a hydra at three different frames over a 2-mm axial range, by the list-based RL algorithm. The color bar indicates the reconstruction depth. The hydra was housed in a well with a depth of 2 mm, which provided space for its 3D movement. Scale bars, 500 μm [(A) to (H)].

Imaging of the visual cortex of awake mouse

One key application of DeepInMiniscope is to record neuronal activity over a large FOV in awake mice. We conducted in vivo experiments to monitor the spontaneous activity of layer 2/3 in the primary visual cortex (V1) of a head-fixed awake mouse transfected with the calcium indicator GCaMP6f (34), at a recording rate of 4 Hz (Figs. 1B and 6). A notable hurdle in imaging scattering brain tissue is the pronounced background in the raw recordings, which prevented imaging at the single-cell level in previous studies (17, 21). To address this, we preprocessed the raw recording by suppressing the pixels exhibiting small temporal variation (Materials and Methods). Subsequently, the data were processed through the multi-local-FOV ADMM-Net in a frame-by-frame manner. We successfully extracted ~150 regions of interest with near-cellular resolution over a 3D volume spanning ~1.5 mm by 2 mm by 300 μm in the mouse cortex (with the lateral FOV limited by the spatial expression range of GCaMP6f and the size of the craniotomy in our current animal preparation protocol; Fig. 6E). We segmented individual clusters in the 3D volume based on the graph connectivity between voxels in the spatiotemporal correlation map of the 3D volume (Materials and Methods) (16). Most of the clusters had a lateral FWHM of 15 to 35 μm in the spatiotemporal correlation map (Fig. 6F). Given the 10- to 15-μm neuronal cell body size and the ~10-μm lateral resolution of DeepInMiniscope, each of the clusters is likely an individual neuron. The axial FWHM of most of the clusters was ~100 μm (Fig. 6G), indicating reasonably good optical sectioning capability in mouse brain tissue. From the 3D volume, we could extract different sets of neurons across multiple planes (50, 150, and 300 μm depth; Fig. 6, H to J). Using a constrained non-negative matrix factorization algorithm (CNMF-E) (35), we further deconvolved and reconstructed the activity of individual neurons (Fig. 6, K to M, and movies S3 and S4).
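
A minimal sketch of this preprocessing, as we read it from the description here and in Fig. 6 (C and D): compute each frame's difference to its local spatial mean (DLM), take the per-pixel temporal SD of the DLM video, apply a LoG filter to obtain the SD-DLM mask, and weight each DLM frame by the mask before reconstruction. The window size and filter sigma are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import uniform_filter, gaussian_laplace

def dlm(video, window=31):
    """Difference to local mean: subtract a local spatial mean from every frame."""
    local_mean = np.stack([uniform_filter(frame, size=window) for frame in video])
    return video - local_mean

def sd_dlm_mask(dlm_video, sigma=3.0):
    """Per-pixel temporal SD of the DLM video, followed by LoG filtering."""
    sd = dlm_video.std(axis=0)
    log_sd = -gaussian_laplace(sd, sigma=sigma)   # sign flipped so blobs are positive
    log_sd = np.clip(log_sd, 0, None)
    return log_sd / (log_sd.max() + 1e-12)

# video: (T, H, W) raw recording; random placeholder data here
video = np.random.rand(100, 256, 256).astype(np.float32)
dlm_video = dlm(video)
mask = sd_dlm_mask(dlm_video)
frames_for_reconstruction = dlm_video * mask   # input to the multi-local-FOV ADMM-Net
```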

Fig. 6. In vivo calcium imaging of neural activity in mouse visual cortex, transfected with GCaMP6f.


(A) Experimental setup. The mouse was head-fixed on a treadmill, with DeepInMiniscope mounted on top of the headplate. Excitation light was delivered through the dual fiber channels. (B) A single raw image frame. (C) SD-DLM mask, which is the time-series SD of the difference-to-local-mean (DLM) of the raw video, followed by LoG filtering. The SD-DLM mask highlights pixels with strong temporal dynamics. (D) A single frame of the DLM video (i.e., the raw video processed by the DLM operation) weighted by the SD-DLM mask. (E) 3D reconstruction of the image volume (1.5 mm by 2 mm by 600 μm) by the multi-local-FOV ADMM-Net, in xy, yz, and xz views, showing the spatiotemporal correlation map of the reconstructed video. The multi-local-FOV ADMM-Net processed the SD-DLM mask-weighted DLM video frames individually. The reconstructed time-series volume was then projected into a 3D volume showing the spatiotemporal correlation among adjacent voxels. This 3D volume was further processed by an iterative clustering algorithm (16) to highlight individual neurons (indicated in red). The dashed line indicates the brain surface. (F and G) Histograms of the (F) lateral FWHM and (G) axial FWHM of all the clusters found in the 3D volume of the spatiotemporal correlation map. (H to J) Three individual axial planes from (E), at depths of (H) 50 μm, (I) 150 μm, and (J) 300 μm. The individual neurons are indicated in red. (K to M) Representative normalized temporal activity traces of the extracted neurons in the (K) 50-μm, (L) 150-μm, and (M) 300-μm planes. Black, activity traces of the neurons directly extracted from the video reconstructed by the multi-local-FOV ADMM-Net. Red, activity traces of the neurons extracted through CNMF-E (35) from the reconstructed video. Scale bars, 500 μm [(B) to (E) and (H)].

As a reference, we directly extracted the neurons and their activities from the nonoverlapping regions of the subimages from the lens units, which appear as conventional wide-field images. For different neurons, we manually selected the subimages used to extract their traces (choosing subimages in which the neurons were located within the central effective FOV). While this method cannot provide 3D resolution, the extracted traces can serve as a reference for the ground truth. We compared the neuronal activity traces from the subimages with those extracted from the multi-local-FOV ADMM-Net (fig. S23C) and found a good overall match. This result endorses the high fidelity of our reconstruction algorithm. This represents, to the best of our knowledge, the first demonstration of in vivo calcium imaging at near-cellular resolution over a 3D volume in mouse brain using a mask-based integrated miniaturized microscope.

DISCUSSION

In summary, we developed DeepInMiniscope, a miniaturized integrated microscope with a custom-designed doublet microlens array. Our multi-local-FOV ADMM-Net algorithm reconstructs local FOVs and synthesizes them into a larger FOV, demonstrating the scalability of the imaging FOV. We showed that the multi-local-FOV ADMM-Net is highly computationally efficient and has superior reconstruction capabilities over previous methods. We tested DeepInMiniscope across a variety of samples, including those with dense and detailed features over expansive areas, such as lens tissue, and biological specimens in which the regions of interest exhibited weak fluorescence or were embedded in scattering tissue, such as neurons in awake mice. In addition, we introduced a list-based RL algorithm and successfully reconstructed the 3D motion of a hydra over a 2-mm depth range. We summarize the key distinctions of DeepInMiniscope from other approaches in note S4.

One key contribution of our work is the development of the multi-local-FOV ADMM-Net, designed for imaging systems that do not have a global PSF. Typical deep learning algorithms for integrated imagers adopt a single PSF kernel to reconstruct images. While certain algorithms, such as the multi-global-FOV Wiener-Net (29) or multi-global-FOV Hadamard-Net [i.e., SV-FourierNet (22)], use multiple PSF kernels to improve the reconstruction in spatially variant systems, they apply each PSF kernel globally. Such a global reconstruction approach is incompatible with a PSF that encodes a point source only onto a nearby region of the image sensor and can result in excessive noise and artifacts (fig. S19). Our method instead partitions the raw data into localized patches for individual reconstruction and subsequently merges them into a large-FOV output. This strategy of focusing on smaller, local patches substantially improves the quality and efficiency of reconstruction, making our algorithm particularly suited for our miniaturized microscope with a scalable FOV. Crucially, our algorithm allows image reconstruction with large-scale data on GPUs with a small amount of memory.

Our method shares similarity with existing strategies that divide the FOV into localized images for each lens unit and use the light field imaging technique for 3D volume reconstruction through refocusing (18). However, our approach distinguishes itself by performing deconvolution on each localized FOV, rather than merely demixing them at the image sensor plane. This crucial advancement enables our system to tackle the more complex imaging scenarios where the local images of each lens unit substantially overlap, such as imaging samples with dense features spanning a large area.

Our work also presents a substantial advancement in optimizing the computational efficiency of the ADMM-Net by leveraging the unique attributes of our PSF. Traditionally, ADMM-Net demands substantial memory resources as it is unrolled into multiple stages, each involving a large-scale matrix multiplication as well as a matrix inversion. In our case, the sparse and spatially discrete nature of the PSF from our doublet microlens array allows an approximation of $A^TA$ as an identity matrix, which alleviates a substantial computational burden. This further facilitates the conversion of a large matrix multiplication into 2D convolutions, which are then calculated through more manageable Hadamard multiplications in the Fourier domain. Such a strategy reduces the computational complexity by orders of magnitude. Furthermore, this approach eliminates the need for a specific initialization of the system matrix $A$. In conventional ADMM or deep neural networks such as the multi-global-FOV Wiener-Net (29), $A$ is typically fixed or requires careful initialization through an extensive PSF calibration across the entire object domain, which is time consuming.

To process neuronal recordings from the mouse brain, we used the multi-local-FOV ADMM-Net to reconstruct each video frame individually. This was followed by spatial segmentation and/or CNMF-E applied to the reconstructed videos at each depth to extract the activity traces of individual neurons. As an alternative pipeline, constrained NMF can be applied directly to the raw video data. In this approach, the initialization of the constrained NMF could be informed by the spatial reconstruction results (36) generated by the multi-local-FOV ADMM-Net applied to a single temporal summary frame (e.g., an SD projection) of the video. This hybrid method combines the spatial deconvolution and reconstruction strengths of the multi-local-FOV ADMM-Net with the temporal processing capability of the constrained NMF. However, further refinement is required to effectively handle the highly multiplexed, high-dynamic-range characteristics of the imaging data during the iterative update process (fig. S23D). In addition, recent studies have demonstrated the potential of deep neural networks for background removal, spatial segmentation, and activity trace extraction in neuronal recordings (37–40) acquired with conventional one-photon or two-photon microscopes. Future work could explore adapting such deep learning strategies to process data from miniaturized integrated microscopes.

Our miniaturized integrated microscope, constructed with a doublet microlens array and complemented by a highly efficient multi-local-FOV ADMM-Net reconstruction algorithm, allows high-resolution 3D microscopic imaging over a large FOV. The strategy of decomposing the entire view into local patches for individual reconstruction before stitching them together establishes a new platform for microscopy and fundamentally allows the FOV to be scaled up without limit. Future work could explore designing microlens units with different NAs and focal lengths (5) to further optimize the resolution and depth of focus; integrating the microlens design within a differentiable neural network framework to jointly optimize the imaging optics and reconstruction, such as through an end-to-end learning strategy (41–43); increasing the overall imaging acquisition speed by using fluorophores or fluorescent indicators with higher brightness and the reconstruction speed by using parallel computation across multiple GPUs; and demonstrating imaging in freely behaving animals.

MATERIALS AND METHODS

Construction of the miniaturized integrated microscope

The doublet microlens is designed in Zemax OpticStudio, optimizing its Strehl ratio and wavefront error. The distribution of the doublet microlenses in the array ensures that each object point can be imaged by four to six lens units. The centroid positions of the microlens units in the array are generated by a semirandom patch generation algorithm. The algorithm starts by placing a microlens unit at the center of the FOV. It marks the microlens unit area and its effective imaging area and then generates the centroid position of the next microlens unit from a random coordinate that falls within a lower and upper bound of distance from the existing microlens units. The process is repeated until no position can be found for the next lens unit within the FOV (fig. S24). The doublet microlens array was fabricated on a fluorescence band-pass filter (ET525/50m, Chroma), with the interspaces between the lenslets coated with aluminum to reduce the background light that is not modulated by the lenslets. The fabrication process was separated into two steps. First, photolithography was performed on the filter glass to define the interspaces between the lenslets. An 80-nm-thick aluminum layer was then deposited onto the filter through electron beam evaporation, followed by liftoff (fig. S25). Such an aluminum layer reflects >99% of the fluorescent light and excitation light (fig. S26). Second, the doublet microlens array was fabricated by a two-photon polymerization process (31, 32) (IP-S, Lumenworkx). To support the overhanging top microlens component, we incorporated dual helix structures to connect the edges of the two lens components. The bottom piece of the array was attached directly onto the glass surface free of aluminum deposition. The dual helix support structures and the top components of the lens units were then attached atop the bottom components.

The doublet microlens array was then stacked on top of an absorptive filter (Yellow 12 Kodak Wratten color filter, 14 mm by 11.3 mm by 0.1 mm, Edmund Optics). The entire stack and a board-level back-illuminated CMOS image sensor (IMX226, The Imaging Source) were then assembled in a 3D-printed housing (AlSi10Mg, Protolabs). Two fiber ports (Ø400 μm, Thorlabs) can be inserted into the assembly for sample illumination. Figure S27 illustrates the assembly process.

Architecture of ADMM-Net (single-FOV reconstruction)

The ADMM-Net is an unrolled ADMM with learnable parameters, aiming to recover the 3D object $X$ from a measured image $b$ of an optical imaging system with a forward model $b = AX + n$, where $A$ is the system matrix and $n$ is the noise. Such a reconstruction problem can be expressed as an optimization problem

$\underset{X \in \mathbb{R}^3}{\text{minimize}} \; f(X) + \lambda \|\phi X\|_1, \quad \text{with} \; f(X) = \|AX - b\|^2$ (1)

where $\lambda\|\phi X\|_1$ is a regularization term, $\phi$ is an operator that transforms $X$ into a sparse representation, and $\lambda$ is a hyperparameter. The biological samples are typically sparse themselves in the spatial domain, so the above optimization problem can be written as

$\underset{X \in \mathbb{R}^3}{\text{minimize}} \; f(X) + \lambda \|X\|_1, \quad \text{with} \; f(X) = \|AX - b\|^2$ (2)

The ADMM algorithm formulates this optimization problem as

$\underset{X \in \mathbb{R}^3}{\text{minimize}} \; f(X) + g(Z), \quad \text{subject to} \; X - Z = 0$ (3)

where $Z$ is the regularized variable of $X$, and $g(Z)$ is the regularization term of $Z$ (and thus $X$).

The augmented Lagrangian in its scaled form is

$L_\rho(X, Z, U) = f(X) + g(Z) + \frac{\rho}{2}\|X - Z + U\|_2^2 - \frac{\rho}{2}\|U\|_2^2$ (4)

where $U$ is the scaled dual variable, and $\rho$ is the penalty parameter.

The ADMM-Net (25), with stages representing the iterations of the ADMM algorithm, learns parameters and regularization terms from data for enhanced performance, flexibility, and efficiency. Unlike the conventional ADMM-Net, which requires vectorizing $X$ and $Z$, storing a large 2D matrix $A$, and initializing $A$ by an extensive calibration of the PSF (particularly in a spatially variant system), our realization eliminates these needs. It avoids calculating $A^TA$ and inverting the large matrix. It transforms the large matrix multiplication into two-dimensional frequency-domain convolution operations and reduces the overall computational complexity in time from $O(N^3)$ to $O(N\log_2 N)$ [assuming $N$ is the number of voxels in the object, which is on the same order as the number of pixels $M$ in the recorded image], and thus the required computational time and resources (note S2).

We developed two types of ADMM-Net: Type I aligns closely with the conventional optimization, and type II demands less memory (Fig. 3 and fig. S3). Both are applicable for 2D and 3D sample reconstruction.

In type I ADMM-Net, the updates of $X$, $Z$, and $U$ in each stage can be described as

Stage 0: $\hat{X}^0 = A^T b = H_D(b)$, $\hat{Z}^0 = C(\hat{X}^0)$, and $U^0 = \hat{X}^0 - \hat{Z}^0$.

Stage $k+1$ ($k \geq 0$):

$\hat{X}^{k+1} \leftarrow \arg\min_X L_\rho(X, \hat{Z}^k, U^k) = (A^TA + \rho I)^{-1}[A^T b + \rho(\hat{Z}^k - U^k)] = H_C[H_D(b) + \rho(\hat{Z}^k - U^k)]$
$\hat{Z}^{k+1} \leftarrow \arg\min_Z L_\rho(\hat{X}^{k+1}, Z, U^k) = C(\hat{X}^{k+1} + U^k)$
$U^{k+1} \leftarrow U^k + \hat{X}^{k+1} - \hat{Z}^{k+1}$ (5)

where $H_C$, $H_D$, $C$, and $\rho$ represent trainable modules or parameters for each stage.

$H_D$ represents the deconvolution Hadamard layer (Hadamard-Net) (30), which operates in the frequency domain and deconvolves $b$ into $X$, both in the format of a 2D matrix (fig. S3). It aims to recover $A^T b$. As the PSF in our system is sparse, discrete, and spatially invariant (within each local FOV), $A^TA \approx I$, so $H_D(b) = A^T b \approx A^{-1}b$. $H_D$ transforms $b$ into the 2D Fourier domain, performs an element-wise (i.e., Hadamard) multiplication between the 2D Fourier transform of $b$ and 2D learnable kernels, for the real and imaginary components respectively, and then transforms the result back to the spatial domain. The weights of the kernel are randomly initialized.
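
A minimal PyTorch sketch of such a deconvolution Hadamard layer, as we understand it from the description above and from (30): the measurement is Fourier transformed, multiplied element-wise by learnable kernels for the real and imaginary components, and transformed back to the spatial domain. The class and variable names are ours, not from the released code.

```python
import torch
import torch.nn as nn

class HadamardDeconv(nn.Module):
    """Learnable deconvolution in the 2D Fourier domain (the H_D layer)."""
    def __init__(self, height, width):
        super().__init__()
        w_half = width // 2 + 1                    # rfft2 output width
        # Randomly initialized real and imaginary frequency-domain kernels
        self.k_real = nn.Parameter(0.01 * torch.randn(height, w_half))
        self.k_imag = nn.Parameter(0.01 * torch.randn(height, w_half))

    def forward(self, b):
        # b: (batch, H, W) preprocessed raw measurement
        B = torch.fft.rfft2(b)
        K = torch.complex(self.k_real, self.k_imag)
        # Element-wise (Hadamard) product in the Fourier domain, then back to space
        return torch.fft.irfft2(B * K, s=b.shape[-2:])
```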

$H_C$ represents the convolution Hadamard layer (fig. S3). As our PSF is spatially sparse, discrete, and invariant (within each local FOV), $A^TA$ can be approximated as a 2D identity matrix, and so can $(A^TA + \rho I)^{-1}$. $(A^TA + \rho I)^{-1}[H_D(b) + \rho(\hat{Z}^k - U^k)]$ can thus be treated as a 2D convolution between a kernel associated with $(A^TA + \rho I)^{-1}$ and the 2D matrix $H_D(b) + \rho(\hat{Z}^k - U^k)$. The kernel is initialized as a 2D matrix with a single nonzero entry in the central element. This convolution can then be calculated in the frequency domain through element-wise (i.e., Hadamard) multiplication, where the frequency-domain kernel is learnable. The weights of this frequency-domain kernel are initialized as 1 for all elements.

$\rho$ is a learnable parameter initialized as a random scalar between 0 and 1. It is used to weight the regularized solution when added to $H_D(b)$.

$C$ is a 2D CNN serving as a learnable regularizer model (fig. S3). We note that $C$ can be a soft-thresholding operator if the regularization term is the $\ell_1$ norm. Rather than empirically using the $\ell_1$ norm, we allow the algorithm to learn the regularization term from the data through the CNN. This facilitates a more stable and higher-quality reconstruction.

Type II ADMM-Net shares the same architecture as type I but is more memory efficient. It replaces $H_D$ in all stages with $H_{D_0}$ from stage 0, meaning that all $H_D$ share the same parameters as $H_{D_0}$, thus reducing memory requirements. The update of $\hat{X}$ can be described as below

$\hat{X}^0 = H_{D_0}(b)$
$\hat{X}^{k+1} = H_C[\hat{X}^0 + \rho(\hat{Z}^k - U^k)] = H_C[H_{D_0}(b) + \rho(\hat{Z}^k - U^k)]$ (6)

The updates of $\hat{Z}$ and $U$ remain the same as in type I ADMM-Net.

Here, we adopt type II ADMM-Net to save computational resources. There are a total of six stages in the ADMM-Net.
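
A condensed PyTorch-style sketch of the unrolled type II loop of Eq. 6 is shown below, reusing the HadamardDeconv layer from the earlier sketch for $H_{D_0}$ and adding a convolution Hadamard layer $H_C$ whose frequency-domain kernel is initialized to all ones (the identity approximation). The CNN regularizer $C$ is a small placeholder and is shared across stages here for brevity; this is our schematic reading of the architecture, not the authors' implementation.

```python
class HadamardConv(nn.Module):
    """Convolution Hadamard layer H_C; frequency-domain kernel initialized to ones."""
    def __init__(self, height, width):
        super().__init__()
        w_half = width // 2 + 1
        self.k_real = nn.Parameter(torch.ones(height, w_half))
        self.k_imag = nn.Parameter(torch.zeros(height, w_half))

    def forward(self, x):
        K = torch.complex(self.k_real, self.k_imag)
        return torch.fft.irfft2(torch.fft.rfft2(x) * K, s=x.shape[-2:])

class ADMMNetTypeII(nn.Module):
    def __init__(self, height, width, n_stages=6):
        super().__init__()
        self.hd0 = HadamardDeconv(height, width)   # from the previous sketch
        self.hc = nn.ModuleList([HadamardConv(height, width) for _ in range(n_stages)])
        self.rho = nn.ParameterList([nn.Parameter(torch.rand(1)) for _ in range(n_stages)])
        # Placeholder CNN regularizer C (a deeper denoiser is used in practice)
        self.cnn = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(16, 1, 3, padding=1))

    def C(self, x):
        return self.cnn(x.unsqueeze(1)).squeeze(1)

    def forward(self, b):
        x0 = self.hd0(b)                         # stage 0: X^0 = H_D0(b)
        z = self.C(x0)                           # Z^0 = C(X^0)
        u = x0 - z                               # U^0 = X^0 - Z^0
        for hc, rho in zip(self.hc, self.rho):
            x = hc(x0 + rho * (z - u))           # X update (Eq. 6)
            z = self.C(x + u)                    # Z update through the CNN denoiser
            u = u + x - z                        # dual update
        return z
```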

For 3D samples, we construct $H_{D_0}^z$ for each depth $z$. Its output is cropped and rescaled on the basis of the magnification of each depth, resulting in $\hat{X}_z^0 = R_z C_z H_{D_0}^z(b)$, where $C_z$ and $R_z$ are learnable crop and resizing layers, respectively, for depth $z$. This maintains consistent magnification across depths, preventing lateral shifts. At each stage, $\hat{X}_z^{k+1}$ is updated by an individual $H_C^z$, similar to the 2D reconstruction. The individual $\hat{X}_z^{k+1}$ can be concatenated into a 3D matrix $\hat{X}^{k+1}$. The updates of $\hat{Z}^{k+1}$ and $U^{k+1}$ follow the same procedure as the 2D reconstruction but with $\hat{Z}^{k+1}$ and $U^{k+1}$ as 3D matrices and $C$ as a 3D convolutional neural network.

In 3D reconstruction, a predicted measured image $\hat{b}$ is calculated for loss function computation. $\hat{b}$ is obtained via the learned $H_{D_0}^z$ and $\hat{X}_z^0$. A forward model of the imaging system for depth $z$ is constructed by taking the complex conjugate of the learned weights in $H_{D_0}^z$ and multiplying it with the Fourier transform of $\hat{X}_z^0$ to obtain the predicted Fourier transform of $\hat{b}_z$. The predicted $\hat{b}$ can then be obtained by

$\hat{b} = \sum_z \hat{b}_z = \sum_z (H_{D_0}^z)^{-1}(\hat{X}_z^0)$ (7)
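
The closed-loop prediction of Eq. 7 could then be assembled as in the sketch below, again using the Fourier-domain kernel parameters from the HadamardDeconv sketch above: the conjugate of each depth's learned deconvolution kernel serves as that depth's forward-model filter, and the forward-projected depths are summed. This is an illustrative reading of the description, with hypothetical variable names.

```python
def predict_measurement(hd0_layers, x0_per_depth):
    """Sum of forward-projected initial reconstructions over depth (Eq. 7).

    hd0_layers   : list of per-depth HadamardDeconv layers (H_D0^z)
    x0_per_depth : list of per-depth initial reconstructions X^0_z, each (batch, H, W)
    """
    b_hat = None
    for layer, xz in zip(hd0_layers, x0_per_depth):
        K = torch.complex(layer.k_real, layer.k_imag)
        # Conjugate of the learned deconvolution kernel acts as the forward model
        Bz = torch.fft.rfft2(xz) * torch.conj(K)
        bz = torch.fft.irfft2(Bz, s=xz.shape[-2:])
        b_hat = bz if b_hat is None else b_hat + bz
    return b_hat
```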

We iteratively built and trained the ADMM-Net with an increasing number of stages. For 3D reconstruction, we pretrained $H_{D_0}^z$ independently for each depth as 2D reconstructions, using a 2D binary cross entropy (BCE) loss between $\hat{X}_z^0$ and the ground truth $X_z$. We then incorporated 3D imaging data and trained all $H_{D_0}^z$ in stage 0 using the same 2D BCE loss and an SSIM loss between $\hat{b}$ and the real measurement $b$ (closed-loop loss function). The latter loss function increased the network robustness and its axial sectioning capability (fig. S28). We then added and trained $C$ in stage 0 with a 3D BCE loss between $\hat{Z}^0$ and the 3D ground truth volume $X$, along with the SSIM loss for $\hat{b}$. We then cascaded stage 1 and trained both stages together with the 3D BCE loss for $\hat{Z}^1$ and the SSIM loss for $\hat{b}$. We then added subsequent stages and performed training with the SSIM loss for $\hat{b}$ and a new 3D view loss for $\hat{Z}^{k+1}$, calculated in the last stage. The 3D view loss is defined as the summed SSIM loss between the projected xy, yz, and xz views and the corresponding ground truth, which enhances the axial sectioning capability. This loss function was applied only after more stages were added, so the model could focus on baseline reconstruction before enhancing the axial resolution. For 2D reconstruction, we used only the 2D BCE loss function for $\hat{X}_z^k$. We found that the loss function typically converges after the third stage (fig. S14). Given the fast convergence of ADMM, we used a total of six stages in the ADMM-Net to conserve GPU memory.

Multi-ADMM-Net (multiple local FOV reconstruction)

For 2D reconstruction, the entire FOV (4 mm by 6 mm) was divided into 108 FOVs in the raw measured image, each 1.3 mm by 1.3 mm in size and centered at a single microlens unit (fig. S6). Each FOV was used to reconstruct a 1 mm by 1 mm area in the object domain. For 3D reconstruction, the raw measured image was divided into six local FOVs, each 2.6 mm by 2.6 mm. Each local FOV was used to reconstruct a subvolume of 2.02 mm by 2.02 mm by 0.6 mm out of the entire 4 mm by 6 mm by 0.6 mm volume. Each FOV/subvolume was reconstructed through a single ADMM-Net, with the loss functions calculated for each FOV/subvolume and $C$ shared among all the ADMM-Nets in the same stage. The outputs were then stitched together, with the overlapping regions averaged.
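
A schematic NumPy sketch of the fusion step: each local FOV is reconstructed independently (by its own ADMM-Net, not shown) and the outputs are accumulated into the global image or volume, with overlapping regions averaged. The patch geometry and names are illustrative.

```python
import numpy as np

def fuse_local_fovs(patches, corners, out_shape):
    """Average overlapping local reconstructions into one global reconstruction.

    patches   : list of reconstructed local FOVs, each shaped (..., h, w)
    corners   : list of (row, col) top-left positions of each patch in the output
    out_shape : shape of the stitched global reconstruction (..., H, W)
    """
    out = np.zeros(out_shape, dtype=np.float32)
    weight = np.zeros(out_shape, dtype=np.float32)
    for patch, (r, c) in zip(patches, corners):
        h, w = patch.shape[-2], patch.shape[-1]
        out[..., r:r + h, c:c + w] += patch
        weight[..., r:r + h, c:c + w] += 1.0
    return out / np.maximum(weight, 1.0)   # overlapping regions are averaged
```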

Training dataset

We created a training dataset by capturing images of 2D fluorescent phantom samples using a benchtop inverted microscope and DeepInMiniscope. The phantom samples contained features such as letters and polygons on a mask (prepared by spraying green fluorescent paint on a custom-designed photolithography mask) (fig. S4). For each image reconstruction depth, we set the distance between the miniaturized integrated microscope and the sample accordingly. We then performed two calibration steps to align the image from the benchtop microscope (ground truth) with that of DeepInMiniscope (input of the reconstruction algorithms), using two calibration masks. The first calibration mask is a 4 mm by 6 mm rectangular mask with marks at the four corners. Using these marks, we could define the effective region in DeepInMiniscope's image that represents a 4 mm by 6 mm FOV. This effective region (2048 pixels by 3072 pixels) then serves as the paired image of the ground truth. The second calibration mask contains dots separated by 600 μm, which calibrates the magnification of DeepInMiniscope. We also calculated the relative rotation angle between the images from the two microscopes and corrected this rotation. After these two calibrations, we captured 15 image pairs of different phantom features (or the same features at different rotations) from both microscopes. This completed the acquisition of training data for one imaging depth. We then displaced DeepInMiniscope to change its distance from the sample while keeping the sample and the inverted microscope in the same position and repeated the steps above to acquire training data for another imaging depth. Once all the training pairs of different imaging depths were obtained, we synthesized 3D training pairs in which the images of DeepInMiniscope were a summation of the images of individual depths (fig. S5).

For the training pairs in 2D reconstruction, we divided the entire FOV into 108 local FOVs, each centered at a lens unit. Each local FOV is 1.3 mm by 1.3 mm in the raw measurement and 1 mm by 1 mm in the output of ADMM-Net and the ground truth. The raw measurement patch is larger so that it includes all the subimages of the microlens units within the local FOV needed by the reconstruction algorithm. For each local FOV at each imaging depth, we used 14 image pairs for training and 1 image pair for validation.

For the training pairs in 3D reconstruction, we divided the entire volume (4 mm by 6 mm by 0.6 mm) into six subvolumes (limited by GPU RAM). Each local FOV is 2.6 mm by 2.6 mm in the raw measurement, and each subvolume is 2.02 mm by 2.02 mm by 0.6 mm in the output of ADMM-Net and the ground truth. For each local FOV, we used 90 synthesized volumes for training. The data at each depth were selected from 14 of the 15 measurement pairs at that depth. We also used 10 synthesized volumes for validation, which differed from the training volumes and contained the data from the one reserved measurement pair at each depth. The 2D multi-local-FOV ADMM-Net model was trained on a Tesla A10G (24 GB RAM), and the 3D multi-local-FOV ADMM-Net model was trained on a Tesla A100 (80 GB RAM).

Denoising and background suppression module

The raw image could contain noise and a diffused background resulting from fluorescence light entering the lens units at high angles (i.e., from point sources located outside the nominal FOV of the lens units) or from scattered light in scattering tissue. A denoising module before the multi-local-FOV ADMM-Net could be used to preprocess the raw images to suppress the noise/background and pick up useful features. This module is optional and could be realized by various algorithms, such as camera dead-pixel removal, LoG filtering, edge filtering, or a pretrained CNN, depending on the images. Here, for objects that are sparse and discrete, a LoG filter was used to pick up the edges of the features. For images containing dense features that do not follow an obvious regularity in any domain, a flexible preprocessing module such as a U-net could enhance the overall reconstruction results of the multi-local-FOV ADMM-Net (fig. S3). We first trained HD0 without this U-net. Using the noisy raw images as the input, HD0 learns to produce a rough reconstruction of the object despite the noise. Next, we connected the untrained U-net in front of the pretrained HD0 and trained the two modules together. This procedure avoids the scenario in which the untrained HD0 and U-net confuse each other and fail to process the image correctly. After a few epochs of training with the two modules connected, the U-net learns to extract useful features from the raw measurements even though there is no ground truth for the output of the U-net, and HD0 adopts the output of the U-net and enhances the reconstruction quality.
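For the sparse, discrete objects mentioned above, the LoG preprocessing could look like the short sketch below; the sigma value and the sign/normalization convention are illustrative choices, not the paper's exact settings.

```python
import numpy as np
from scipy.ndimage import gaussian_laplace

def log_preprocess(raw, sigma=2.0):
    """Laplacian-of-Gaussian preprocessing for sparse, discrete objects:
    emphasizes feature edges and suppresses the smooth diffused background."""
    filtered = -gaussian_laplace(raw.astype(np.float32), sigma=sigma)
    filtered[filtered < 0] = 0.0               # keep positive (bright-feature) response
    return filtered / (filtered.max() + 1e-8)  # normalize to [0, 1]
```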

List-based RL deconvolution

RL deconvolution is an iterative algorithm that reconstructs the object using the PSF. Instead of using a full system matrix to store the mapping between object voxels and camera pixels, we use two sets of lists (fig. S16). The first set of lists stores the contributive object voxels for each camera pixel, whereas the second set stores the contributive camera pixels for each object voxel (i.e., the PSF). This substantially reduces the required memory and computational time.

The two sets of lists could be obtained using geometrical optics (fig. S16). We set up a global coordinate across the object space and image space, with the lateral grid size equal to the pixel size of the camera sensor. The nominal FOV is set to 4 mm by 6 mm, the same as the size of the microlens array. We could expand the FOV by ratios Sx and Sy in the x and y directions, respectively, as object points laterally outside the microlens array area could still be imaged onto the camera sensor by some lens units at the boundary. In our setup, Sx and Sy were <20%.

The pixel size of our camera sensor was 1.85 μm by 1.85 μm, and the magnification of the imaging system was 0.26 to 0.39. This translates to a lateral voxel size in the object space of 4.7 to 7.1 μm. We could thus down-sample the global coordinate by a factor of DS in the lateral direction to obtain the coordinate of the object space while keeping the same FOV. In other words, within the global coordinate system, we picked one point out of every DS × DS points as the voxels to be reconstructed. Given the magnification and resolution, we set DS to 3.

For each object voxel, we used a ray-tracing approach to find the contributive camera pixels. We approximated the PSF of each lens unit as a single point. As we have 108 microlens units, there are at most 108 contributive pixels per object voxel. However, as the object voxel is close to the microlens array, only a portion of the lens units can effectively image a given object voxel (Fig. 2, G to I, and fig. S1). Correspondingly, pixels on the camera sensor that were laterally far away (more than d_max pixels) from the object voxel of interest were not considered. In our setup, d_max was set to ~800 to 1000 pixel units.

The pseudocode below illustrates the process of finding the two sets of lists (Algorithm 1).

Algorithm 1. Establishing the two sets of lists (voxel-pixel mapping and pixel-voxel mapping).

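Because the pseudocode figure is not reproduced here, the following is a plausible Python sketch of how the two lists could be built by tracing a chief ray through each lens unit; the variable names, the pinhole-projection model for each lens unit, and the units are assumptions.

```python
import numpy as np
from collections import defaultdict

def build_mapping_lists(voxels, lens_centers, z_lens, z_sensor, d_max, px):
    """For every object voxel, trace a chief ray through each lens unit
    (approximated as mapping the voxel to a single camera pixel) and record
    the voxel-to-pixel and pixel-to-voxel lists.
    voxels: (N, 3) object coordinates (x, y, z); lens_centers: (108, 2);
    z_lens/z_sensor: axial positions of the lens and sensor planes;
    px: camera pixel pitch (same units as the coordinates)."""
    voxel_to_pixels = defaultdict(list)   # object voxel -> contributive pixels (PSF)
    pixel_to_voxels = defaultdict(list)   # camera pixel -> contributive voxels
    for vi, (x, y, z) in enumerate(voxels):
        for (lx, ly) in lens_centers:
            m = (z_sensor - z_lens) / (z_lens - z)        # per-voxel magnification
            u, v = lx - m * (x - lx), ly - m * (y - ly)   # projected (inverted) image point
            col, row = int(round(u / px)), int(round(v / px))
            # keep only pixels laterally close to the voxel (within d_max pixels)
            if np.hypot(col - x / px, row - y / px) <= d_max:
                voxel_to_pixels[vi].append((row, col))
                pixel_to_voxels[(row, col)].append(vi)
    return voxel_to_pixels, pixel_to_voxels
```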

Using the two lists, we could conduct the list-based RL algorithm. The algorithm follows the same procedure as the standard RL algorithm, except that the full system matrix storing the mapping between object voxels and camera pixels is replaced by the two sets of lists. We typically conducted three to five iterations to achieve good convergence.

The pseudocode below illustrates the list-based RL algorithm (Algorithm 2).

Algorithm 2. List-based RL.

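Likewise, since the Algorithm 2 figure is not reproduced here, the sketch below shows a plausible list-based RL update, assuming uniform (unit) weights for each voxel-pixel mapping entry.

```python
import numpy as np

def list_based_rl(b, pixel_to_voxels, voxel_to_pixels, n_voxels, n_iter=5):
    """Richardson-Lucy deconvolution in which the forward and backward
    projections are evaluated through the two mapping lists instead of a
    full system matrix.  `b` holds measured pixel values keyed like
    pixel_to_voxels (e.g., a dict of (row, col) -> intensity)."""
    x = np.ones(n_voxels, dtype=np.float32)                 # initial estimate
    for _ in range(n_iter):
        # forward projection: predicted value at each contributive pixel
        pred = {p: sum(x[v] for v in vox) for p, vox in pixel_to_voxels.items()}
        # ratio between measurement and prediction
        ratio = {p: b[p] / (pred[p] + 1e-8) for p in pixel_to_voxels}
        # back projection: multiplicative update of each voxel
        for vi, pixels in voxel_to_pixels.items():
            corr = sum(ratio[p] for p in pixels) / max(len(pixels), 1)
            x[vi] *= corr
    return x
```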

Neural networks used to compare with multi-local-FOV ADMM-Net

We compared the performance of multi-local-FOV ADMM-Net against single-FOV Hadamard-Net (30) and single-FOV ADMM-Net, as well as multi-global-FOV Hadamard-Net [also known as SV-FourierNet (22)] and multi-global-FOV Wiener-Net (29).

The single-FOV Hadamard-Net (30) and single-FOV ADMM-Net have been described earlier. The Hadamard-Net is essentially the X̂^0 initialization in stage 0 of ADMM-Net, without any subsequent stages for ADMM updates.

Both multi-global-FOV Hadamard-Net and multi-global-FOV Wiener-Net reconstruct multiple global FOVs and then fuse them together. The multi-global-FOV Hadamard-Net shares a similar concept to (22). The image b is first transformed into the 2D Fourier domain. An element-wise multiplication (i.e., Hadamard multiplication) between the 2D Fourier transform of b and multiple sets of 2D learnable kernels in the Fourier domain is then performed, separately for the real and imaginary components (30). Each resultant image, corresponding to one set of 2D learnable kernels, is then inverse Fourier transformed back to the spatial domain. The resulting images are then fused together. We tested the multi-global-FOV Hadamard-Net both with the 2D kernels initialized randomly and with the 2D kernels initialized as the Fourier transforms of calibrated local PSFs at nine locations uniformly distributed over the entire FOV.
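A minimal PyTorch sketch of one Fourier-domain branch of such a network; the class name, kernel shape, and initialization scale are illustrative assumptions.

```python
import torch
import torch.nn as nn

class FourierHadamardBranch(nn.Module):
    """One branch of a multi-global-FOV Hadamard-Net: the measurement is
    Fourier transformed, multiplied element-wise by a learnable complex
    kernel (separate real and imaginary parts), and transformed back."""
    def __init__(self, height, width):
        super().__init__()
        self.k_real = nn.Parameter(torch.randn(height, width) * 0.01)
        self.k_imag = nn.Parameter(torch.randn(height, width) * 0.01)

    def forward(self, b):                      # b: (batch, height, width)
        B = torch.fft.fft2(b)
        K = torch.complex(self.k_real, self.k_imag)
        out = torch.fft.ifft2(B * K)           # element-wise (Hadamard) product
        return out.real                        # keep the real-valued image
```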

The multi-global-FOV Wiener-Net was adapted from (29). It shares a similar concept to the multi-global-FOV Hadamard-Net but learns the 2D kernels in the spatial domain (representing PSFs). As with the multi-global-FOV Hadamard-Net, we tested the multi-global-FOV Wiener-Net both with the 2D kernels initialized randomly and with the 2D kernels initialized as calibrated local PSFs at nine locations uniformly distributed over the entire FOV. The noise level scalar was initialized to one. In all these methods, the raw measurement could be preprocessed by a denoising module before being processed by the neural networks.

Evaluation metrics on reconstructions

We evaluated the quality of the reconstructed images by examining both the background regions and the regions of interest that contain the image features. We first normalized the intensity of the reconstructed images so that their pixel values ranged from 0 to 1. From the reference ground truth images captured by a benchtop microscope, we segmented the regions of interest and the background regions through a threshold. Using the background region mask, we calculated the mean and SD of these regions in the reconstructed images (denoted as BG MEAN and BG SD, respectively, in table S1). Typically, lower values of BG MEAN and BG SD are preferable, as they signify enhanced image contrast and more distinguishable features within the regions of interest. We then subtracted the background mean (BG MEAN) from each pixel and set the pixels with negative values to 0, in both the reconstructed images and the reference ground truth images. On the basis of these images, we calculated the PSNR and SSIM of the reconstructed images. These PSNR and SSIM metrics could more accurately represent the quality of the reconstructed image within the regions of interest while also accounting for background variation. Higher values of PSNR and SSIM indicate better reconstruction quality.
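A short sketch of these metrics using scikit-image, assuming a precomputed background mask derived from the thresholded ground truth; the normalization details are illustrative.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_reconstruction(recon, truth, bg_mask):
    """Normalize the reconstruction, measure background statistics with the
    ground-truth-derived background mask, subtract the background mean, clip
    negatives, then compute PSNR and SSIM.
    `truth` is assumed to be already normalized to [0, 1]."""
    recon = (recon - recon.min()) / (recon.max() - recon.min() + 1e-8)
    bg_mean, bg_sd = recon[bg_mask].mean(), recon[bg_mask].std()
    recon_c = np.clip(recon - bg_mean, 0, None)
    truth_c = np.clip(truth - truth[bg_mask].mean(), 0, None)
    psnr = peak_signal_noise_ratio(truth_c, recon_c, data_range=1.0)
    ssim = structural_similarity(truth_c, recon_c, data_range=1.0)
    return bg_mean, bg_sd, psnr, ssim
```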

Preparation of scattering phantom sample

We prepared the scattering phantom by mixing nonfluorescent microspheres (1.18 μm in diameter, Cospheric) and fluorescent microspheres (12 μm in diameter, Cospheric) in an optically clear ultraviolet (UV) light–curable polymer (Norland). The scattering mean free path was calculated using Mie scattering theory (44) as

l_s = 2d / (3 Φ Q_s) (8)

where d is the mean diameter of the scatterers, Φ is the volume fraction of the scatterers, and Q_s is the scattering efficiency factor calculated by a Mie scattering calculator (45). The nonfluorescent microspheres were weighed, dispersed with an ultrasonic machine to avoid clustering, and added to the optical polymer. The fluorescent beads were then added and mixed uniformly into the polymer. The polymer was then poured into a 3D-printed phantom slide and cured with UV light.
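As a worked example of Eq. 8 (the input values below are illustrative, not the phantom's actual recipe):

```python
def scattering_mean_free_path(d, phi, q_s):
    """Scattering mean free path from Eq. 8: l_s = 2d / (3 * phi * q_s),
    with d the mean scatterer diameter, phi the scatterer volume fraction,
    and q_s the scattering efficiency factor (e.g., from a Mie calculator)."""
    return 2.0 * d / (3.0 * phi * q_s)

# example: 1.18-um scatterers, 1% volume fraction, Q_s ~ 2 (assumed values)
l_s = scattering_mean_free_path(d=1.18e-6, phi=0.01, q_s=2.0)  # ~39 um
```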

Mouse surgery, experiment, and image processing

Mouse experiments were conducted with the approval of the University of California, Davis Institutional Animal Care and Use Committee. C57BL/6J mice (2 to 5 months old) were injected with AAV1-hSyn-GCaMP6f into the right primary visual cortex, and a chronic craniotomy was performed to attach a circular 3.5-mm glass coverslip window for imaging. A custom stainless steel headplate with a 7-mm central hole was attached to the skull surrounding the craniotomy with dental cement (Metabond). The coverslip was secured with cyanoacrylate (3M VetBond) and sealed with dental cement to cover the exposed bone. After 3 to 4 weeks, which allowed for virus expression, awake imaging was performed. The headplate was fixed to custom posts on top of a 3D-printed treadmill where the mouse could run freely. The headplate and the miniaturized integrated microscope DeepInMiniscope were aligned in parallel with a bubble level. The neuronal activity was imaged by DeepInMiniscope positioned directly above the cranial window, with the excitation light delivered through the two fiber ports connected to DeepInMiniscope.

The recorded raw video was preprocessed to identify pixels with high temporal variance, which likely indicate active neurons. This was achieved through a difference-to-local-mean (DLM) operation. A box filter was applied to each raw video frame to calculate the local average brightness of each pixel, and the difference between the pixel value in the original raw frame and its local average was computed. This resulted in a DLM video with a suppressed background. An SD was then calculated for each pixel over time, and a LoG filter was applied to extract highly active areas. This produced an SD-DLM mask representing local signal variation weights. The SD-DLM mask was multiplied element-wise with the DLM video to enhance temporal activities and further suppress background noise. Each frame of the weighted DLM video was processed by the multi-local-FOV ADMM-Net for 3D reconstruction. Two algorithms were used to analyze the time-series 3D reconstruction volume. The first evaluated the temporal correlation between voxels to identify individual clusters, each likely representing a neuron. For each voxel, its averaged temporal correlation against the adjacent voxels (i.e., four voxels laterally and two voxels axially) was calculated to construct a 3D correlation volume. An iterative clustering algorithm (16), which evaluated the graph connectivity between voxels in the spatiotemporal correlation map of the 3D volume, segmented individual clusters and located their centroids. Clusters with a lateral FWHM of <~35 μm likely corresponded to individual neurons. The second algorithm, CNMF-E (35), extracted the spatial footprints and deconvolved and reconstructed the activity of individual neurons for each reconstructed plane. Neurons were further selected on the basis of the clustering results to enhance the extraction fidelity.
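A compact sketch of the DLM/SD-DLM preprocessing described above; the box-filter size, LoG sigma, and normalization are illustrative choices rather than the paper's exact parameters.

```python
import numpy as np
from scipy.ndimage import uniform_filter, gaussian_laplace

def dlm_preprocess(video, box=31, log_sigma=3.0):
    """DLM preprocessing sketch: subtract each frame's local mean (box filter),
    compute the per-pixel temporal SD, LoG-filter it into an activity mask,
    and weight the DLM video by that mask.  video: (frames, height, width)."""
    local_mean = np.stack([uniform_filter(f, size=box) for f in video])
    dlm = video - local_mean                          # background-suppressed video
    sd = dlm.std(axis=0)                              # temporal SD per pixel
    mask = np.clip(-gaussian_laplace(sd, sigma=log_sigma), 0, None)
    mask /= mask.max() + 1e-8                         # SD-DLM mask in [0, 1]
    return dlm * mask                                 # weighted DLM video
```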

Acknowledgments

We thank C. Juliano and B. D. Cox for the hydra samples, F. McNally and E. Beath for the C. elegans samples, L. Tian for the mouse surgery setup and tools, the Center for Nano-MicroManufacturing at University of California, Davis for assistance in fabricating the doublet microlens array, and the Translating Engineering Advances to Medicine (TEAM) Lab and the Engineering Student Design Center (ESDC) at University of California, Davis for assistance in designing and building the mechanical housing of the miniscope.

Funding: This work was partially supported by the National Eye Institute (R21EY029472 to W.Y.), National Institute of Neurological Disorders and Stroke (R01NS133924 to W.Y.), Burroughs Wellcome Fund (CASI 1015761 to W.Y.), and National Science Foundation (CAREER 1847141 to W.Y.).

Author contributions: Conceptualization: W.Y. and F.T. Investigation: F.T., B.M., and W.Y. Visualization: F.T. and W.Y. Methodology: F.T., B.M., and W.Y. Formal analysis: F.T. and W.Y. Validation: F.T. and W.Y. Data curation: F.T. Project administration: W.Y. Funding acquisition: W.Y. Resources: F.T., B.M., and W.Y. Writing—original draft: F.T. and W.Y. Writing—review and editing: F.T., B.M., and W.Y. Supervision: W.Y.

Competing interests: The authors declare that they have no competing interests.

Data and materials availability: All data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary Materials. The code of the reconstruction algorithms for DeepInMiniscope and the demonstration dataset are available on a public online repository (Dryad) at https://doi.org/10.5061/dryad.6t1g1jx83 (and on GitHub at https://github.com/Yang-Research-Laboratory/DeepInMiniscope-Learned-Integrated-Miniscope).

Supplementary Materials

The PDF file includes:

Supplementary Notes S1 to S4

Figs. S1 to S28

Table S1

Legends for movies S1 to S4

References

sciadv.adr6687_sm.pdf (16.9MB, pdf)

Other Supplementary Material for this manuscript includes the following:

Movies S1 to S4

REFERENCES AND NOTES

1. Ghosh K. K., Burns L. D., Cocker E. D., Nimmerjahn A., Ziv Y., El Gamal A., Schnitzer M. J., Miniaturized integration of a fluorescence microscope. Nat. Methods 8, 871–878 (2011).
2. Aharoni D., Hoogland T. M., Circuit investigations with open-source miniaturized microscopes: Past, present and future. Front. Cell. Neurosci. 13, 141 (2019).
3. de Groot A., van den Boom B. J. G., van Genderen R. M., Coppens J., van Veldhuijzen J., Bos J., Hoedemaker H., Negrello M., Willuhn I., de Zeeuw C. I., Hoogland T. M., NINscope, a versatile miniscope for multi-region circuit investigations. eLife 9, e49987 (2020).
4. Qin Z., Chen C., He S., Wang Y., Tam K. F., Ip N. Y., Qu J. Y., Adaptive optics two-photon endomicroscopy enables deep-brain imaging at synaptic resolution over large volumes. Sci. Adv. 6, eabc6521 (2020).
5. Yanny K., Antipa N., Liberti W., Dehaeck S., Monakhova K., Liu F. L., Shen K., Ng R., Waller L., Miniscope3D: Optimized single-shot miniature 3D fluorescence microscopy. Light Sci. Appl. 9, 171 (2020).
6. Scherrer J. R., Lynch G. F., Zhang J. J., Fee M. S., An optical design enabling lightweight and large field-of-view head-mounted microscopes. Nat. Methods 20, 546–549 (2023).
7. Guo C., Blair G. J., Sehgal M., Sangiuliano Jimka F. N., Bellafard A., Silva A. J., Golshani P., Basso M. A., Blair H. T., Aharoni D., Miniscope-LFOV: A large-field-of-view, single-cell-resolution, miniature microscope for wired and wire-free imaging of neural dynamics in freely behaving animals. Sci. Adv. 9, eadg3918 (2023).
8. Zong W., Obenhaus H. A., Skytøen E. R., Eneqvist H., de Jong N. L., Vale R., Jorge M. R., Moser M.-B., Moser E. I., Large-scale two-photon calcium imaging in freely moving mice. Cell 185, 1240–1256.e30 (2022).
9. Adams J. K., Boominathan V., Avants B. W., Vercosa D. G., Ye F., Baraniuk R. G., Robinson J. T., Veeraraghavan A., Single-frame 3D fluorescence microscopy with ultraminiature lensless FlatScope. Sci. Adv. 3, e1701548 (2017).
10. Antipa N., Kuo G., Heckel R., Mildenhall B., Bostan E., Ng R., Waller L., DiffuserCam: Lensless single-exposure 3D imaging. Optica 5, 1–9 (2018).
11. Boominathan V., Adams J. K., Robinson J. T., Veeraraghavan A., PhlatCam: Designed phase-mask based thin lensless camera. IEEE Trans. Pattern Anal. Mach. Intell. 42, 1618–1629 (2020).
12. Cai Z., Chen J., Pedrini G., Osten W., Liu X., Peng X., Lensless light-field imaging through diffuser encoding. Light Sci. Appl. 9, 143 (2020).
13. Kuo G., Liu F. L., Grossrubatscher I., Ng R., Waller L., On-chip fluorescence microscopy with a random microlens diffuser. Opt. Express 28, 8384–8399 (2020).
14. Wu J., Zhang H., Zhang W., Jin G., Cao L., Barbastathis G., Single-shot lensless imaging with fresnel zone aperture and incoherent illumination. Light Sci. Appl. 9, 53 (2020).
15. Xue Y., Davison I. G., Boas D. A., Tian L., Single-shot 3D wide-field fluorescence imaging with a computational miniature mesoscope. Sci. Adv. 6, eabb7508 (2020).
16. Tian F., Hu J., Yang W., GEOMScope: Large field-of-view 3D lensless microscopy with low computational complexity. Laser Photonics Rev. 15, 2100072 (2021).
17. Adams J. K., Yan D., Wu J., Boominathan V., Gao S., Rodriguez A. V., Kim S., Carns J., Richards-Kortum R., Kemere C., Veeraraghavan A., Robinson J. T., In vivo lensless microscopy via a phase mask generating diffraction patterns with high-contrast contours. Nat. Biomed. Eng. 6, 617–628 (2022).
18. Xue Y., Yang Q., Hu G., Guo K., Tian L., Deep-learning-augmented computational miniature mesoscope. Optica 9, 1009–1021 (2022).
19. Wu J., Boominathan V., Veeraraghavan A., Robinson J. T., Real-time, deep-learning aided lensless microscope. Biomed. Opt. Express 14, 4037–4051 (2023).
20. Hu J., Yang W., Metalens array miniaturized microscope for large-field-of-view imaging. Opt. Commun. 555, 130231 (2024).
21. Wu J., Chen Y., Veeraraghavan A., Seidemann E., Robinson J. T., Mesoscopic calcium imaging in a head-unrestrained male non-human primate using a lensless microscope. Nat. Commun. 15, 1271 (2024).
22. Yang Q., Guo R., Hu G., Xue Y., Li Y., Tian L., Wide-field, high-resolution reconstruction in computational multi-aperture miniscope using a Fourier neural network. Optica 11, 860–871 (2024).
23. Boominathan V., Robinson J. T., Waller L., Veeraraghavan A., Recent advances in lensless imaging. Optica 9, 1–16 (2022).
24. Li S., Gao Y., Wu J., Wang M., Huang Z., Chen S., Cao L., Lensless camera: Unraveling the breakthroughs and prospects. Fundam. Res. 5, 1725–1736 (2024).
25. Yang Y., Sun J., Li H., Xu Z., Deep ADMM-Net for compressive sensing MRI. Adv. Neural Inf. Process. Syst. 29, 10–18 (2016).
26. Monakhova K., Yurtsever J., Kuo G., Antipa N., Yanny K., Waller L., Learned reconstructions for practical mask-based lensless imaging. Opt. Express 27, 28075–28090 (2019).
27. Asif M. S., Ayremlou A., Sankaranarayanan A., Veeraraghavan A., Baraniuk R. G., FlatCam: Thin, lensless cameras using coded aperture and computation. IEEE Trans. Comput. Imaging 3, 384–397 (2017).
28. Khan S. S., Sundar V., Boominathan V., Veeraraghavan A., Mitra K., FlatNet: Towards photorealistic scene reconstruction from lensless measurements. IEEE Trans. Pattern Anal. Mach. Intell. 44, 1934–1948 (2022).
29. Yanny K., Monakhova K., Shuai R. W., Waller L., Deep learning for fast spatially varying deconvolution. Optica 9, 96–99 (2022).
30. Tian F., Yang W., Learned lensless 3D camera. Opt. Express 30, 34479–34496 (2022).
31. Gissibl T., Thiele S., Herkommer A., Giessen H., Sub-micrometre accurate free-form optics by three-dimensional printing on single-mode fibres. Nat. Commun. 7, 11763 (2016).
32. Thiele S., Arzenbacher K., Gissibl T., Giessen H., Herkommer A. M., 3D-printed eagle eye: Compound microlens system for foveated imaging. Sci. Adv. 3, e1602655 (2017).
33. Deb D., Jiao Z., Sims R., Chen A. B., Broxton M., Ahrens M. B., Podgorski K., Turaga S. C., FourierNets enable the design of highly non-local optical encoders for computational imaging. Thirty-Sixth Conf. Neural Inf. Process. Syst. 1829, 25224–25236 (2022).
34. Chen T.-W., Wardill T. J., Sun Y., Pulver S. R., Renninger S. L., Baohan A., Schreiter E. R., Kerr R. A., Orger M. B., Jayaraman V., Looger L. L., Svoboda K., Kim D. S., Ultrasensitive fluorescent proteins for imaging neuronal activity. Nature 499, 295–300 (2013).
35. Zhou P., Resendez S. L., Rodriguez-Romaguera J., Jimenez J. C., Neufeld S. Q., Giovannucci A., Friedrich J., Pnevmatikakis E. A., Stuber G. D., Hen R., Kheirbek M. A., Sabatini B. L., Kass R. E., Paninski L., Efficient and accurate extraction of in vivo calcium signals from microendoscopic video data. eLife 7, e28728 (2018).
36. Nöbauer T., Skocek O., Pernía-Andrade A. J., Weilguny L., Traub F. M., Molodtsov M. I., Vaziri A., Video rate volumetric Ca2+ imaging across cortex using seeded iterative demixing (SID) microscopy. Nat. Methods 14, 811–818 (2017).
37. Bao Y., Soltanian-Zadeh S., Farsiu S., Gong Y., Segmentation of neurons from fluorescence calcium recordings beyond real time. Nat. Mach. Intell. 3, 590–600 (2021).
38. Sità L., Brondi M., de Leon Roig P. L., Curreli S., Panniello M., Vecchia D., Fellin T., A deep-learning approach for online cell identification and trace extraction in functional two-photon calcium imaging. Nat. Commun. 13, 1529 (2022).
39. Zhang Y., Zhang G., Han X., Wu J., Li Z., Li X., Xiao G., Xie H., Fang L., Dai Q., Rapid detection of neurons in widefield calcium imaging datasets after training with synthetic data. Nat. Methods 20, 747–754 (2023).
40. Zhang K., Tang S., Zhu V., Barchini M., Yang W., An end-to-end recurrent compressed sensing method to denoise, detect and demix calcium imaging data. Nat. Mach. Intell. 6, 1106–1118 (2024).
41. S. S. Khan, V. Boominathan, A. Veeraraghavan, K. Mitra, “Designing optics and algorithm for ultra-thin, high-speed lensless cameras” in 2023 IEEE International Conference on Multimedia and Expo (ICME) (IEEE, 2023), pp. 1583–1588.
42. Tseng E., Colburn S., Whitehead J., Huang L., Baek S.-H., Majumdar A., Heide F., Neural nano-optics for high-quality thin lens imaging. Nat. Commun. 12, 6493 (2021).
43. Zhang K., Hu J., Yang W., Deep compressed imaging via optimized-pattern scanning. Photonics Res. 9, B57–B70 (2021).
44. Mengual O., Meunier G., Cayré I., Puech K., Snabre P., TURBISCAN MA 2000: Multiple light scattering measurement for concentrated emulsion and suspension instability analysis. Talanta 50, 445–456 (1999).
45. S. Prahl, Mie Scattering Calculator. https://omlc.org/calc/mie_calc.html.
46. H. M. Merklinger, The INs and OUTs of FOCUS: An Alternative Way to Estimate Depth-of-Field and Sharpness in the Photographic Image (1992).
