Abstract
Snapshot recording of transient dynamics in three dimensions (3-D) is in high demand in both fundamental and applied sciences. Yet conventional high-speed cameras struggle to address this need owing to their limited electronic bandwidth and reliance on mechanical scanning. The emergence of light field tomography (LIFT) provides a new solution to these long-standing problems and enables 3-D imaging at an unprecedented frame rate. However, because it builds on sparse-view computed tomography, LIFT can accommodate only a limited number of projections, degrading the resolution of the reconstructed image. To alleviate this problem, we herein present a spectral encoding scheme that significantly increases the number of allowable projections in LIFT while maintaining its snapshot advantage. The resulting system can record 3-D dynamics at a kilohertz volumetric frame rate. Moreover, by using a multichannel compressed sensing algorithm, we improve the image quality, achieving enhanced spatial resolution and suppressed aliasing artifacts.
The demand for single-shot high-speed imaging is ever-growing [1–3]. To circumvent the bandwidth limitation of electronic image sensors, strategies like compressed sensing encode high-dimensional spatiotemporal data into a low-dimensional format for fast readout: a single camera pixel thus carries multiplexed spatial or temporal information [4]. The scene can be numerically reconstructed provided that the signals are sparse in a specific domain. Such data compression has been implemented with encoders such as a high-speed digital micromirror device (DMD) [5–8], a piezoelectric stage [9], pixel shutters [10], and temporally shifting detectors such as a streak camera [5] and a time delay integration (TDI) camera [11].
Despite significant advances, most compressed imaging cameras can capture only two-dimensional (2-D) dynamics. To break this barrier, we recently developed light field tomography (LIFT), which reformulates light field photography as a sparse-view computed tomography (CT) problem and enables three-dimensional (3-D) imaging at an unprecedented frame rate. Rather than recording a 2-D image at each view angle, LIFT acquires en face projections of the perspective images and records the data with a one-dimensional (1-D) detector array [12,13]. Such a measurement scheme significantly reduces the data load and allows a sub-10-ps temporal resolution. However, like other compressed imaging modalities, LIFT's image resolution depends largely on the data compression ratio. Additionally, sharing the same problem as sparse-view CT, the LIFT reconstruction suffers from artifacts and anisotropic resolution as a result of the limited number of projection measurements [14–16]. Although these problems can be alleviated by filling the aperture of the main lens with more perspective image channels, doing so reduces each channel's aperture and, therefore, compromises the channel's diffraction-limited resolution (Fig. S1 in Supplement 1).
To solve this problem, we present augmented LIFT through parallel spectral encoding. Our method is inspired by the fact that most imaging systems record only the spatial coordinates of an optical field. The other dimensions of light, such as polarization [17], wavelength [6,18,19], and angle [20], can be used for signal multiplexing. We previously showed that, by using a diffractive element such as a grating, we could disperse the 1-D projections in LIFT and enable spectral imaging [21]. Here we utilize this spectral dimension and encode wavelengths with optical rotations, significantly enriching the set of projection angles within a snapshot (Fig. 1). This approach maintains each perspective image channel's aperture and gives LIFT the scalability to handle signals of varying sparsity.
Fig. 1. Schematic of the optical system and image formation. (a) Dove prisms at different angles, followed by dichroic mirrors, encode rotation into three color channels. Images are merged at the intermediate image plane. The pseudo color in the schematic does not indicate the actual wavelengths used in the system. (b) A Dove prism array and a cylindrical lens array capture the light field of the object. Each lenslet occupies a different position in the objective's aperture and forms a 1-D projection of the original 2-D light field sub-image. A transmissive grating disperses the 1-D projection so that the three channels, previously encoded with different image rotations, can be separated. The slit is perpendicular to the non-power axis of the cylindrical lens and to the dispersion direction of the grating. Only three lenslets in the array are shown for simplicity. (c) Pipeline of our imaging method. We read out only a few rows of pixels and rearrange them into the sinogram of each channel. The colored dotted lines show example sampling positions for the three channels. By combining computed tomography and light field imaging, we can reconstruct the object in 3-D with far fewer pixel readouts and a higher imaging speed.
We employ three Dove prisms to rotate the image by slightly different angles (Channel 1, 0°; Channel 2, 2.3°; Channel 3, 5.3°). Each operates only within a given spectral band (Channel 1, 565–600 nm; Channel 2, 550–565 nm; Channel 3, 540–550 nm), and the three channels are merged into a single output [Fig. 1(a)]. The downstream detection system uses a Dove prism array and a cylindrical lens array to capture the light field in the form of 1-D projections [Fig. 1(b)]. Each lenslet occupies a sub-pupil area of the main lens, and the disparity enables synthetic refocusing in post-processing. Since the projection makes the information redundant along the non-power axis of the cylindrical lens, it reduces the necessary pixel readout to one row per sub-pupil image and, thus, boosts the imaging speed. Finally, to separate the channels from the merged image, a diffractive grating disperses the 1-D projection, enabling simultaneous acquisition of multiple LIFT channels.
The schematic of the system is illustrated in Fig. S2 in Supplement 1. We use a microscope objective (4×/0.13 NA, RMS4X-PF, Olympus). In the infinity space of the objective, beam splitters (BS1, 30(R):70(T), BS019; BS2, 50(R):50(T), BS013, Thorlabs) split the light into three channels, and dichroic mirrors (DM1, AT565dc; DM2, T550lpxr, Chroma) merge them after the Dove prisms (PS995M, Thorlabs). The pupil plane of the objective is then relayed by a 4f system (ACT508-300-A, ACT508-750-A, Thorlabs) to an array of custom Dove prisms and cylindrical lenses (Fig. S3 in Supplement 1). Each Dove prism in the array has a clear aperture of 2 mm, while each cylindrical lens (focal length, 20 mm) covers five Dove prisms with an extended length of 17 mm. The 1-D projections are focused onto a slit array (slit width, 50 μm) and then dispersed by a grating (300 grooves/mm, GT50-03, Thorlabs) in the Fourier plane of another relay system. The raw image is captured by a scientific complementary metal-oxide-semiconductor (sCMOS) sensor (Prime BSI or Kinetix, Teledyne). We select multiple regions of interest (ROIs) for exposure and readout, which converts the 2-D sensor into an array of fast 1-D detectors. This configuration enables a higher speed because of the lighter data load. Compared to the streak camera in the original LIFT implementation, an sCMOS camera is less costly and easier to maintain. More importantly, it provides configurable 2-D sensor areas where the encoded signals can be demultiplexed. For the illumination source, we used a halogen lamp (HL250-AY, AmScope) with a diffuser (DG10-120, Thorlabs) for static scenes and an LED (UHP-F-5-560, Prizmatix) with a collimator (LLG5-CM1, Prizmatix) for dynamic scenes. Overall, the system magnification is 0.178×, with a field of view (FOV) of approximately 4.2 mm × 4.2 mm.
Figure 1(c) shows the pipeline of our imaging process. The vertical direction spans the spectrum of the projection measurement associated with each perspective image. By sampling the corresponding wavelength for each channel, we read out 15 rows of pixels (five rows per channel) to acquire three sinograms (25 projections each) from the same snapshot. The channels differ in the global rotation applied during the spectral encoding stage and thus complement each other to mitigate the sparse-view problem in CT.
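As a concrete illustration of this rearrangement, consider the minimal Python sketch below. The channel-by-channel row ordering and the equal-width segmentation of each row into five lenslet projections are simplifying assumptions; in practice, the projection positions come from calibration.

```python
import numpy as np

def assemble_sinograms(rows, n_channels=3, rows_per_channel=5,
                       lenslets_per_row=5):
    """Regroup raw ROI rows into per-channel sinograms.

    rows: (15, W) array of the pixel rows read out in one snapshot,
    ordered channel by channel (five rows per channel). Each row is
    assumed to carry `lenslets_per_row` side-by-side 1-D projections
    formed under one cylindrical lens.
    """
    seg = rows.shape[1] // lenslets_per_row        # pixels per projection
    rows = rows[:, :seg * lenslets_per_row]        # trim any remainder
    sinos = []
    for c in range(n_channels):
        chan = rows[c * rows_per_channel:(c + 1) * rows_per_channel]
        sinos.append(chan.reshape(-1, seg))        # (25, seg) sinogram
    return sinos
```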
For channel $k$, we model the formation of the vectorized sinogram $\mathbf{b}_k$ as

$$\mathbf{b}_k = \mathbf{P}\mathbf{R}\mathbf{T}_k\mathbf{g}, \tag{1}$$

where $\mathbf{g}$ is the vectorized 2-D image, $\mathbf{T}_k$ is the geometrical transformation between channels during spectral encoding, $\mathbf{R}$ is the rotation operator applied by the Dove prism array, and $\mathbf{P}$ denotes the signal integration by the cylindrical lens. $\mathbf{R}$ is the collection of rotation operators, which can be expanded as

$$\mathbf{R} = \left[\mathbf{R}_1^{\mathsf{T}}, \mathbf{R}_2^{\mathsf{T}}, \ldots, \mathbf{R}_{25}^{\mathsf{T}}\right]^{\mathsf{T}},$$

where $\mathbf{R}_i$ denotes the rotation applied by the $i$-th lenslet. $\mathbf{A}_k = \mathbf{P}\mathbf{R}\mathbf{T}_k$ will be the forward operator of channel $k$, and the image reconstruction of a 2-D slice can be achieved by iteratively solving the following optimization problem:

$$\hat{\mathbf{g}} = \operatorname*{arg\,min}_{\mathbf{g}} \sum_{k} \left\lVert \mathbf{A}_k\mathbf{g} - \mathbf{b}_k \right\rVert_2^2 + \lambda\,\Phi(\mathbf{g}), \tag{2}$$

where $\Phi$ is a transform function sparsifying the image. We chose total variation, while other functions such as the $\ell_1$ norm and the wavelet transform are also applicable. $\lambda$ is a hyperparameter that weighs the regularization term. Equation (2) is solved using the fast iterative shrinkage-thresholding algorithm (FISTA) [22].
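To illustrate how Eq. (2) can be solved, the following Python sketch implements a multichannel FISTA loop under simplifying assumptions: the channel transform $\mathbf{T}_k$ is approximated by a pure global rotation, the adjoint of an interpolated rotation is approximated by the inverse rotation, and the total variation prior is swapped for a simpler $\ell_1$ soft-threshold. All names, angles, and step sizes are illustrative, not our production code.

```python
import numpy as np
from scipy.ndimage import rotate

def forward(img, angles_deg):
    """P*R: rotate the image once per lenslet angle, then integrate
    along the cylindrical lens's non-power axis (axis 0 here)."""
    return np.stack([rotate(img, a, reshape=False, order=1).sum(axis=0)
                     for a in angles_deg])

def adjoint(sino, angles_deg, shape):
    """Approximate transpose of `forward`: smear each 1-D projection
    along the integration axis, then rotate back by the negative angle."""
    img = np.zeros(shape)
    for a, p in zip(angles_deg, sino):
        img += rotate(np.tile(p, (shape[0], 1)), -a, reshape=False, order=1)
    return img

def fista(sinos, channel_rots_deg, lenslet_angles_deg, shape,
          lam=1e-2, lr=1e-4, n_iter=200):
    """Multichannel FISTA [22]; the TV prior of Eq. (2) is replaced by
    an l1 shrinkage step for brevity."""
    g = np.zeros(shape)
    y, t = g.copy(), 1.0
    for _ in range(n_iter):
        grad = np.zeros(shape)
        for phi, b in zip(channel_rots_deg, sinos):   # sum over channels
            ang = [phi + th for th in lenslet_angles_deg]
            grad += adjoint(forward(y, ang) - b, ang, shape)
        z = y - lr * grad
        g_next = np.sign(z) * np.maximum(np.abs(z) - lr * lam, 0.0)
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = g_next + ((t - 1.0) / t_next) * (g_next - g)
        g, t = g_next, t_next
    return g

# Example usage: channel_rots_deg = [0.0, 2.3, 5.3] (the Channel 1-3
# rotations above); lenslet_angles_deg lists the 25 Dove prism angles,
# which in practice are set by the array design and calibration.
```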
In conventional light field imaging, synthetic refocusing is performed by shifting and adding the sub-pupil images [23]. To extend our method to a 3-D scene, we shift each sub-pupil image with respect to its pupil location before passing it to the iterative reconstruction [12,21]. If we define the spatial coordinate of each Dove prism in the array as $(u, v)$, we translate the sub-pupil image in the direction and by the distance of the vector $\alpha(u, v)$, where $\alpha$ is a parameter depending on the axial location of the synthetic focal plane. Since we acquire only a 1-D projection of each sub-pupil image, we need to find the amount of translation perpendicular to the projection axis. Assuming that one Dove prism lenslet rotates the image by $\theta$ counterclockwise and the non-power direction of the cylindrical lens aligns with the sub-pupil coordinate axis $u$, we shift each 1-D sub-pupil projection by [21]

$$\Delta s = \alpha\left(u\sin\theta + v\cos\theta\right). \tag{3}$$
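To make the refocusing step concrete, here is a minimal Python sketch; the helper names are illustrative, and the sign convention follows Eq. (3) as written above.

```python
import numpy as np
from scipy.ndimage import shift as nd_shift

def refocus_shift(alpha, u, v, theta_deg):
    """Eq. (3): 1-D shift of one lenslet's projection for the synthetic
    focal plane selected by alpha; (u, v) is the Dove prism's pupil
    coordinate and theta its image rotation angle."""
    th = np.deg2rad(theta_deg)
    return alpha * (u * np.sin(th) + v * np.cos(th))

def refocus_sinogram(sino, alpha, pupil_uv, thetas_deg):
    """Shift every 1-D projection before it enters the iterative
    reconstruction, emulating light field shift-and-add refocusing."""
    return np.stack([nd_shift(p, refocus_shift(alpha, u, v, th), order=1)
                     for p, (u, v), th in zip(sino, pupil_uv, thetas_deg)])
```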
During system calibration, we measure the rotation angle of each channel by aligning images of the same object acquired at the intermediate image plane within different spectral bands. We next place a pinhole at the nominal focal plane of the objective lens. From the full-frame sensor image, we locate and store the center position of each projection. We then translate the pinhole axially by a known distance and find the shifting parameter $\alpha$ that gives the sharpest refocused image. By this means, we map $\alpha$ to the depth position in experiments. We further capture an image of a standard United States Air Force (USAF) resolution target at each depth and compute the geometrical transformation among channels using image registration. This step is crucial because the transformation is depth-dependent. After system calibration, we read out only a limited number of pixel rows at the projection positions for a higher frame rate. Pixel rearrangement and reconstruction use the same calibration data for all subsequent experimental measurements.
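The depth calibration sweep can be sketched as follows. The gradient-energy focus metric and the `reconstruct` callable are placeholder choices for illustration, not the exact criterion used in our calibration.

```python
import numpy as np

def sharpness(img):
    """Gradient-energy focus metric (one common choice among many)."""
    gy, gx = np.gradient(img)
    return float(np.sum(gx**2 + gy**2))

def calibrate_alpha(reconstruct, sino, alphas):
    """Sweep candidate shifting parameters and keep the one whose
    refocused reconstruction is sharpest. `reconstruct(sino, alpha)`
    stands in for the shift-then-FISTA pipeline sketched above."""
    scores = [sharpness(reconstruct(sino, a)) for a in alphas]
    return alphas[int(np.argmax(scores))]

# Repeating the sweep with the pinhole at known axial positions yields
# the alpha-to-depth mapping reused in all subsequent experiments.
```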
Sparse-view CT is prone to noise and structural artifacts. When we tested our system on the USAF resolution target, a single-channel measurement frequently failed, producing deteriorated line pairs and stripe-like structures [Fig. 2(a)]. With an insufficient number of projection angles, the resolving capability also depends on the object's orientation and its location in the FOV. Since each channel observes the sample with a slightly different rotation, the reconstruction varies for the same group of line pairs (Fig. 2), which can lead to misinterpretation of the sample structures. Combining all three channels, the augmented measurement provides three times as many projections as a single channel, leading to higher contrast and lower noise. The resolving capability also gains robustness for samples of different orientations (Fig. 2).
Fig. 2. Enhancement of image quality by combining three-channel measurements. (a) Comparison of multichannel versus single-channel reconstructions when imaging a USAF resolution target. By combining the information from all three channels, the reconstruction gains higher contrast, reduced artifacts, and a stable resolving capability for differently oriented objects. Yellow and blue boxes mark the areas enlarged for a close-up comparison. (b) Pixel intensity profiles along sampling lines i and ii labeled in (a). Our proposed method delivers a more reliable spatial resolution, whereas the single channel suffers from the smaller number of projections and depends on the object orientation. For the multichannel results, the contrast levels of line pair groups 2–2 to 2–5 are 0.60, 0.43, 0.41, and 0.28, respectively. The resolvable line pair 2–5 indicates a lateral resolution of 157 μm (Table S1 in Supplement 1).
To further validate the enhancement, we tested our method on synthetic images displayed on an LCD panel and used the structural similarity index measure (SSIM) to quantify the reconstruction quality. We found that images with complex structures, such as retinal vessels [Fig. 3(a)], benefit the most from the multichannel reconstruction. This is because objects with fine details and textures are less compressible and thus require more projection measurements (i.e., a lower compression ratio) for CT reconstruction.
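For reference, the SSIM metric can be computed with scikit-image; the arrays below are synthetic stand-ins for the displayed ground truth and the two reconstructions.

```python
import numpy as np
from skimage.metrics import structural_similarity as ssim

rng = np.random.default_rng(0)
ground_truth = rng.random((128, 128))                  # displayed image stand-in
recon_single = np.clip(ground_truth + 0.10 * rng.random((128, 128)), 0, 1)
recon_multi = np.clip(ground_truth + 0.03 * rng.random((128, 128)), 0, 1)

print(ssim(ground_truth, recon_single, data_range=1.0))  # noisier: lower SSIM
print(ssim(ground_truth, recon_multi, data_range=1.0))   # cleaner: higher SSIM
```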
Fig. 3. Validation of reconstruction fidelity and synthetic refocusing. (a) We display various samples on an LCD panel in front of our system and quantify the reconstruction fidelity using the structural similarity index measure (SSIM). The color-coded merged image shows the similarity of our reconstruction to the ground truth. The advantage of multichannel over single-channel measurement is more significant when imaging samples with more complex structures. (b) The USAF resolution target is placed at different defocus positions to show the synthetic refocusing ability of our imaging method. (c) We adopt the classical shift-and-add algorithm for image refocusing. By shifting each sub-image of the light field with respect to its position in the aperture, we can bring a defocused object back into sharp focus. The relation between the shift and the object depth is plotted. Red dots denote individual measurements, and the dotted line is the fit.
LIFT captures the light field of the scene, which permits 3-D reconstruction. To demonstrate that the image quality improvement from spectral encoding also applies to refocused depths, we translated the resolution target axially to various defocus positions. By shifting each projection accordingly in post-processing, the augmented LIFT can focus on different depths with the same enhancement from the multichannel reconstruction [Figs. 3(b) and 3(c)].
The parallel spectral encoding preserves LIFT's snapshot 3-D imaging ability, which allows us to capture fast dynamics at different depths simultaneously. The measurement of 5 × 5 sub-aperture images, which would occupy the entire sensor frame in a conventional light field camera, is now compressed into a format of 15 × 3200 pixels. By reducing the full frame to a few rows of pixels, we can record the light field at a kilohertz frame rate by reading out only selected ROIs on the sCMOS sensor (3200 × 3200 pixels; 16-bit mode; 83 Hz full-frame rate). We demonstrated the speed by imaging a rotating optical chopper wheel (MC1F6P10, Thorlabs), which was relayed to the sample plane to generate a mask in motion [Fig. 4(a)]. The slit moved at an approximate speed of 200 mm/s and crossed the entire FOV within 20 ms, and our system captured the transient movement without motion blur. We mounted a grid pattern on the chopper wheel and positioned a USAF resolution target at a defocused distance. We reconstructed a sharp image sequence for both objects via digital refocusing in post-processing [Fig. 4(b), Visualization 1]. We performed a similar experiment using axially displaced alphabet characters [Figs. 4(c) and 4(d)]. While each frame benefits from the sinogram augmented via spectral encoding (Visualization 2), the frame rate can be flexibly adjusted by configuring the camera ROIs to read out only the required channels. For example, when dealing with simple objects for which a high compression ratio is permissible, we can switch to a single channel for a higher imaging speed (Visualization 3).
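The readout arithmetic behind this speed-up can be checked directly; the back-of-the-envelope estimate below assumes the frame rate scales inversely with the number of rows read and ignores per-ROI overhead (see the discussion of the practical limit below).

```python
full_frame_rate_hz = 83          # 3200 x 3200 px full frame, 16-bit mode
rows_full, rows_read = 3200, 15  # full frame vs. 3 channels x 5 rows

compression = rows_full / rows_read                        # ~213x fewer pixels
ideal_rate_hz = full_frame_rate_hz * rows_full / rows_read
print(f"data reduction: {compression:.0f}x, "
      f"ideal ROI frame rate: {ideal_rate_hz / 1e3:.1f} kHz")  # ~17.7 kHz
```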
Fig. 4. High-speed 3-D imaging experiments. (a) A rotating optical chopper is relayed to the sample to generate a mask in high-speed motion. The chopper and the sample are displaced at different depths. A small grid is attached to represent the chopper wheel plane. The data were acquired at 1111 Hz. (b) Representative time-lapse frames of the grid (upper row) and the USAF resolution target (lower row). Through synthetic refocusing, we can reconstruct sharp images of dynamics at different depths in post-processing. Scale bar, 1 mm. (c) Displaced alphabet characters replace the USAF resolution target in (a). (d) Representative time-lapse frames of each character masked by the rotating optical chopper. The white dotted line delineates the shape of the chopper, and the arrow indicates the rotation direction. Scale bar, 1 mm.
Conventional light field acquisition is redundant for 3-D imaging because the sub-aperture images duplicate each other except for a disparity cue [12]. LIFT solves this problem by reducing the data dimension and, thus, provides an efficient way for high-speed volumetric imaging. In this work, we further augmented LIFT with spectral encoding to mitigate the common drawbacks of sparse-view CT reconstruction. Kilohertz 3-D microscopic imaging was demonstrated over a large volume (~4.2 mm × 4.2 mm × 4.5 mm) with reduced reconstruction errors and aliasing artifacts.
In the current system, the sCMOS sensor is the bottleneck for the imaging speed. Because the ROIs are exposed sequentially and each readout carries extra overhead time, the achievable frame rate (1111 Hz) is far below the theoretical limit (>17 kHz). This problem can be alleviated by optimizing the camera's firmware. The current method is not directly compatible with 1-D sensors such as streak cameras because spectral multiplexing requires extra sensor area for dispersion. Despite this technical challenge, stacking synchronized 1-D sensor arrays could offer a solution. Additionally, because the encoding system introduces aberrations (mainly astigmatism), we cannot directly concatenate sinograms from different channels. Instead, our algorithm iteratively merges 2-D reconstructions in the image space, relying on prior knowledge of the geometrical transformation between the channels. Lastly, an underlying assumption of our method is that the sample has a similar appearance across wavelengths, a condition that holds when the image contrast is dominated by a single chromophore. Although beyond the scope of the current work, fluorescence imaging could also be made possible by tailoring the channel wavelengths to the fluorophore's emission. We expect our augmented acquisition method to expand the application realm of LIFT by enabling higher image quality while maintaining its snapshot advantage.
Funding. National Institutes of Health (R01HL129727, R01HL159970, R35GM128761).
Disclosures. The authors declare no conflicts of interest.
Supplemental document. See Supplement 1 for supporting content.
Data availability. The data that support the plots within this paper and other findings of this study are available from the corresponding author upon reasonable request.
REFERENCES
1. Liang J and Wang LV, Optica 5, 1113 (2018).
2. Gao L and Wang LV, Phys. Rep. 616, 1 (2016).
3. Mikami H, Gao L, and Goda K, Nanophotonics 5, 497 (2016).
4. Park J, Feng X, Liang R, and Gao L, Nat. Commun. 11, 5602 (2020).
5. Gao L, Liang J, Li C, and Wang LV, Nature 516, 74 (2014).
6. Wang P, Liang J, and Wang LV, Nat. Commun. 11, 2091 (2020).
7. Liang J, Wang P, Zhu L, and Wang LV, Nat. Commun. 11, 5252 (2020).
8. Lu Y, Wong TTW, Chen F, and Wang L, Phys. Rev. Lett. 122, 193904 (2019).
9. Llull P, Liao X, Yuan X, Yang J, Kittle D, Carin L, Sapiro G, and Brady DJ, Opt. Express 21, 10526 (2013).
10. Mochizuki F, Kagawa K, Okihara S, Seo M-W, Zhang B, Takasawa T, Yasutomi K, and Kawahito S, Opt. Express 24, 4155 (2016).
11. Park J and Gao L, Optica 8, 1620 (2021).
12. Feng X and Gao L, Nat. Commun. 12, 2179 (2021).
13. Feng X, Ma Y, and Gao L, Nat. Commun. 13, 3333 (2022).
14. Davison ME, SIAM J. Appl. Math. 43, 428 (1983).
15. Willemink MJ and Noël PB, Eur. Radiol. 29, 2185 (2019).
16. Han X, Bian J, Eaker DR, Kline TL, Sidky EY, Ritman EL, and Pan X, IEEE Trans. Med. Imaging 30, 606 (2011).
17. Hai N, Kumar R, and Rosen J, Opt. Lasers Eng. 151, 106912 (2022).
18. Nakagawa K, Iwasaki A, Oishi Y, Horisaki R, Tsukamoto A, Nakamura A, Hirosawa K, Liao H, Ushida T, Goda K, Kannari F, and Sakuma I, Nat. Photonics 8, 695 (2014).
19. Zang Z, Li Z, Luo Y, Han Y, Li H, Liu X, and Fu HY, APL Photon. 7, 046102 (2022).
20. Li Z, Zgadzaj R, Wang X, Chang Y-Y, and Downer MC, Nat. Commun. 5, 3085 (2014).
21. Cui Q, Park J, Ma Y, and Gao L, Optica 8, 1552 (2021).
22. Beck A and Teboulle M, SIAM J. Imaging Sci. 2, 183 (2009).
23. Ng R, Levoy M, Brédif M, Duval G, Horowitz M, and Hanrahan P, Light Field Photography with a Hand-Held Plenoptic Camera (Stanford University, 2005).