Abstract
Regenerative therapies have recently shown potential in restoring sight lost due to degenerative diseases. Their efficacy requires precise intra-retinal delivery, which can be achieved by robotic systems accompanied by high-quality visualization of retinal layers. Intra-operative Optical Coherence Tomography (iOCT) captures cross-sectional retinal images in real-time, but with image quality that is inadequate for intra-retinal therapy delivery. This paper proposes a two-stage super-resolution methodology that enhances the image quality of low-resolution (LR) iOCT images by leveraging information from pre-operatively acquired high-resolution (HR) OCT (preOCT) images. First, we learn the degradation process from the HR to the LR domain through CycleGAN and use it to generate pseudo iOCT (LR) images from the HR preOCT ones. Then, we train a Pix2Pix model on the pairs of pseudo iOCT and preOCT images to learn the super-resolution mapping. Quantitative analysis using both full-reference and no-reference image quality metrics demonstrates that our approach clearly outperforms state-of-the-art learning-based techniques with statistical significance. Achieving iOCT image quality comparable to preOCT quality can help establish this imaging modality in vitreoretinal surgery, without requiring expensive hardware-related system updates.
Keywords: Image quality, super-resolution, iOCT
1. Introduction
Regenerative therapies (e.g. [20, 6]) have emerged as novel treatment methods for degenerative eye diseases such as Age-Related Macular Degeneration [15], which gradually leads to sight loss. Their success, however, depends on precise delivery to the intra-retinal or sub-retinal space. To this end, alongside novel robotic tools that enable the required implantation precision [5], excellent visualization capabilities are crucial for intra-operative guidance. Intra-operative Optical Coherence Tomography (iOCT) can support such vitreoretinal interventions by providing cross-sectional visualization of the retina and the targeted layers.
In the pre-operative setting, the gold standard for imaging this targeted anatomy is Optical Coherence Tomography (OCT), a non-invasive imaging modality that uses infrared light interferometry to visualise retinal layer information. Modern OCT systems use spatiotemporal signal averaging to capture OCT images of excellent quality, enabling clinicians to easily differentiate retinal tissues and layers. However, the long acquisition time of pre-operative OCT scanning makes it unsuitable for real-time visualization during an intervention. Real-time acquisition is achieved by iOCT, albeit at the expense of image quality. More specifically, iOCT images suffer from increased levels of speckle noise [24] and low signal strength [21], which limit their interventional utility. Therefore, we focus on computationally enhancing the quality of iOCT images provided by current commercial clinical systems, with the goal of augmenting the capabilities of iOCT technology in the surgical setting without requiring expensive hardware updates.
Image quality enhancement of OCT images has been addressed by various works. Wiener filters [21], segmentation-based [8], registration-based [23] and diffusion-based [2] methods, as well as methods that consider empirical speckle statistics [18], have successfully enhanced OCT quality by reducing speckle noise (denoising) while preserving image structures. However, such methods cannot be applied efficiently to iOCT images in real-time scenarios due to their high computational cost, their need for perfect image alignment, and the prolonged scanning time they require.
Learning-based techniques using Generative Adversarial Networks (GANs) [9] have been proposed for image quality enhancement or domain translation of natural images [14, 27, 13]. Similar approaches have been adopted for medical imaging modalities such as CT [26], PET [25] and OCT [1, 7, 10]. However, few works have focused on intra-operative OCT image quality enhancement. In [16] iOCT quality was improved using iOCT 3D cubes as the high-resolution domain, while in [17] super-resolution was achieved through surgical biomicroscopy guidance.
This work concerns self-supervised super-resolution of iOCT images, transferring the quality from high-resolution (HR) pre-operative OCT images to low-resolution (LR) iOCT images. As aligned LR-HR pairs are not available, a previous approach [17] estimated the HR counterpart of each LR image by fusing multiple aligned iOCT video frames and then performing paired super-resolution. Since those estimated HR images are still of inferior quality compared to preOCT, we propose here a two-stage methodology for unpaired image quality enhancement of iOCT using the available preOCT images as the HR domain. First, we train a CycleGAN [27] model with iOCT as the input domain and high-quality pre-operative OCT as the target domain, and learn the image degradation process through the backwards mapping network (HR to LR). Subsequently, the latter is leveraged to generate pseudo iOCT images, which, unlike the starting unpaired dataset, are aligned with their preOCT counterparts. Then, we apply super-resolution with pixel-level supervision through Pix2Pix [13] using the generated pseudo iOCT images. To establish the effectiveness of this approach we provide extensive quantitative analysis showing that we outperform existing, state-of-the-art learning-based iOCT super-resolution approaches.
2. Methods
In this section, we present the data used in our study, the two-stage super-resolution approach and the quantitative metrics used for evaluation.
2.1. Datasets
The data used in this work are derived from an internal database of intra-operative and pre-operative OCT scans accompanied by vitreoretinal surgery videos acquired at Moorfields Eye Hospital, London, UK (see Fig. 1). The data were acquired in accordance with the Declaration of Helsinki (1983 Revision) and its ethical principles. We use HR pre-operative OCT data (512×1024×128 voxels) of 61 subjects, acquired prior to surgery using a Cirrus 5000 system, as well as LR intra-operative OCT data (440×300 pixels) acquired during the intervention using a RESCAN 700 integrated into the Zeiss OPMI LUMERA 700. Pre-operative OCT 2D frames were extracted from the recorded 3D OCT scans.
Fig. 1. (a): Surgery video frame. Left: Surgical biomicroscope view. Right: iOCT frames. (b): From top to bottom: iOCT and preOCT with macular hole.
2.2. Two-Stage Super-resolution Approach
The task addressed in this work is super-resolution (SR) and quality enhancement of iOCT images. Specifically, this task is formulated as domain translation from the iOCT domain to the preOCT domain. In our first attempt, we used the CycleGAN architecture (Fig. 2.b) as a one-stage approach to learn the bidirectional domain translation between HR preOCT and LR iOCT images. However, although CycleGAN has shown strong performance in unpaired tasks where no pixel-level loss can be employed, it failed to generate consistent results on our unpaired iOCT and preOCT images, as shown in our quantitative analysis.
Fig. 2. Different approaches for learning the mapping (G) between X and Y domains.
We therefore propose a two-stage approach (Fig. 2.c). In the first stage, we use a CycleGAN model to learn the mappings between the iOCT and preOCT domains. We leverage the model's capability to consistently learn the backwards mapping (from preOCT to iOCT), which provides a generator that approximates the degradation and domain translation from HR to LR. We then use the trained backwards generator Gx to generate a pseudo (fake) iOCT image that is pixel-wise aligned with each real preOCT image. In the second stage, we train a model that maps pseudo iOCT images (LR) to the preOCT domain (HR), leveraging pixel-level supervision through the Pix2Pix model. Crucially, as we show in the experimental section, the second-stage generator sees only pseudo iOCT inputs during training but generalizes effectively to real iOCT images.
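The sketch below illustrates this two-stage data flow in PyTorch. It is a minimal illustration under stated assumptions, not the authors' released code: the tiny stand-in networks, the random preOCT tensors and the L1 weight of 100 (the common Pix2Pix default) are placeholders introduced for the example.

```python
import torch
import torch.nn as nn

# Stand-in networks; the paper uses 9-block ResNet generators and the standard
# Pix2Pix discriminator. These tiny modules only make the data flow runnable.
class TinyGenerator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(16, 1, 3, padding=1))

    def forward(self, x):
        return self.net(x)

class TinyDiscriminator(nn.Module):
    def __init__(self):
        super().__init__()
        # Operates on the concatenated (input, output) pair, PatchGAN-style.
        self.net = nn.Sequential(nn.Conv2d(2, 16, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
                                 nn.Conv2d(16, 1, 4, stride=2, padding=1))

    def forward(self, x):
        return self.net(x)

# Stage 1 (already trained on unpaired data): backward CycleGAN generator Gx
# approximating the preOCT (HR) -> iOCT (LR) degradation. Here it is untrained.
g_backward = TinyGenerator().eval()

# Illustrative HR pre-operative B-scans as random 300x440 tensors.
preoct_batches = [torch.rand(4, 1, 300, 440) for _ in range(2)]

# Generate pseudo iOCT images that are pixel-aligned with their preOCT sources.
pairs = []
with torch.no_grad():
    for hr in preoct_batches:
        pairs.append((g_backward(hr), hr))          # (pseudo LR, HR) pair

# Stage 2: Pix2Pix-style paired training with adversarial + L1 supervision.
g_sr, disc = TinyGenerator(), TinyDiscriminator()
opt_g = torch.optim.Adam(g_sr.parameters(), lr=1e-4)
bce, l1, lambda_l1 = nn.BCEWithLogitsLoss(), nn.L1Loss(), 100.0

for lr_img, hr_img in pairs:
    sr = g_sr(lr_img)                               # LR -> HR mapping
    pred = disc(torch.cat([lr_img, sr], dim=1))
    loss_g = bce(pred, torch.ones_like(pred)) + lambda_l1 * l1(sr, hr_img)
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()
    # (discriminator update omitted for brevity)
```

In practice, stage 2 also updates the discriminator on the same aligned pairs, following the standard Pix2Pix training scheme.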
2.3. Implementation Details
The dataset of 7808 pairs of preOCT and pseudo iOCT images was split into a training set (70%, 43 patients), a validation set (15%, 9 patients) and a test set (15%, 9 patients). Each patient's image data were used in only one set. Pseudo iOCT images were generated by running inference with the first-stage network, achieving a Fréchet Inception Distance (see next section) of 87.49 with respect to the real iOCT images.
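A patient-level split along these lines could be implemented as in the following sketch; the patient IDs and random seed are illustrative placeholders, not details from the paper.

```python
import random

def split_by_patient(patient_ids, seed=0):
    """Assign whole patients (not individual frames) to train/val/test splits.

    The 70/15/15 proportions follow the paper; IDs and seed are placeholders.
    """
    ids = sorted(set(patient_ids))
    random.Random(seed).shuffle(ids)
    n_train = round(0.70 * len(ids))
    n_val = round(0.15 * len(ids))
    return (set(ids[:n_train]),
            set(ids[n_train:n_train + n_val]),
            set(ids[n_train + n_val:]))

train_ids, val_ids, test_ids = split_by_patient(range(61))
print(len(train_ids), len(val_ids), len(test_ids))  # 43 / 9 / 9 patients
```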
We based the implementation of the building blocks (Pix2Pix, CycleGAN) of our two-stage approach on code available online. Both networks use a ResNet-based generator [14] with nine residual blocks and are trained at an input resolution of 440×300. All models are trained using the Adam optimizer with an initial learning rate of 10⁻⁴ and a batch size of 4 for a total of 200 epochs. We used an NVIDIA Quadro P6000 GPU with 24 GB of memory for our experiments.
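As a small illustration, the reported hyperparameters could be wired up as below; the placeholder module and any unstated settings (e.g. the Adam betas) are assumptions for the example, not values from the paper.

```python
import torch

# Training configuration as reported above.
config = {
    "input_size": (440, 300),       # iOCT frame resolution (width x height)
    "generator": "resnet_9blocks",  # ResNet-based generator with 9 residual blocks
    "lr": 1e-4,
    "batch_size": 4,
    "epochs": 200,
}

# Placeholder module standing in for the ResNet-based generator.
model = torch.nn.Conv2d(1, 1, kernel_size=3, padding=1)
optimizer = torch.optim.Adam(model.parameters(), lr=config["lr"])
```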
2.4. Evaluation Metrics
To evaluate the performance of the proposed approach against state-of-the-art learning-based methods, and given that ground-truth HR images do not exist, we use five no-reference Image Quality Assessment (IQA) metrics: Fréchet Inception Distance (FID) [11], Kernel Inception Distance (KID) [3], the perceptual loss lfeat [16], Global Contrast Factor (GCF) [19] and Fast Noise Estimation (FNE) [12]. FID measures the distance between the feature distributions of two image sets, with features extracted from an ImageNet-pretrained Inception-v3. KID is the squared Maximum Mean Discrepancy between Inception-v3 representations. The perceptual loss lfeat quantifies how perceptually similar two image sets are by computing the distance between their representations extracted by a deep convolutional network pretrained on ImageNet [22]. GCF combines contrast measured at different resolution levels into a global contrast score for each image, while FNE estimates the noise level of each image. We use |ΔGCF|, which quantifies the absolute difference between the GCF of an SR image and that of preOCT, and |ΔFNE|, the corresponding absolute difference in FNE with respect to preOCT.
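A sketch of the noise-based metric is shown below, assuming single-channel floating-point images. It implements Immerkær's fast noise variance estimate [12]; the GCF term of |ΔGCF| would come from the multi-resolution contrast computation of [19], which is not reproduced here.

```python
import numpy as np
from scipy.signal import convolve2d

def fast_noise_estimate(img):
    """Immerkaer's fast noise (sigma) estimate for a 2-D grayscale image."""
    h, w = img.shape
    kernel = np.array([[1, -2, 1],
                       [-2, 4, -2],
                       [1, -2, 1]], dtype=float)
    response = convolve2d(img.astype(float), kernel, mode="valid")
    return np.sqrt(np.pi / 2.0) * np.abs(response).sum() / (6.0 * (w - 2) * (h - 2))

def delta_fne(sr_img, preoct_img):
    """|ΔFNE|: absolute difference of the noise estimates of an SR image and its
    HR preOCT reference; |ΔGCF| is defined analogously with GCF in place of FNE."""
    return abs(fast_noise_estimate(sr_img) - fast_noise_estimate(preoct_img))
```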
Furthermore, we use full-reference metrics, following SR approaches for natural images that apply image degradation techniques to HR ground-truth images to create LR counterparts. In our case, Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index (SSIM) are used to evaluate each model by feeding it the pseudo iOCT images and comparing its output with the real preOCT images.
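A minimal sketch of these full-reference scores using scikit-image is given below, assuming grayscale images scaled to [0, 1]; the data_range value is an assumption made for the example.

```python
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def full_reference_scores(sr_img, preoct_img):
    """PSNR and SSIM between an SR output and its aligned preOCT reference."""
    psnr = peak_signal_noise_ratio(preoct_img, sr_img, data_range=1.0)
    ssim = structural_similarity(preoct_img, sr_img, data_range=1.0)
    return psnr, ssim
```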
3. Results
In this section, we present the results obtained from the quantitative analysis conducted to evaluate our approach.
3.1. Evaluation on real iOCT Images
We assess how much the image quality of real iOCT images improves when processed by our model, using images from the preOCT domain as the reference quality level. A total of 2352 iOCT frames, extracted from iOCT surgery videos of 9 patients not present in the training set, were used as the test set. As ground-truth HR images do not exist, we use the five no-reference IQA metrics described in Section 2.4: FID, KID, lfeat, |ΔGCF| and |ΔFNE|.
Table 1 summarises the results of our analysis. Super-resolution using our method ranks first in terms of three out of five no-reference metrics, which demonstrates that the iOCT image quality has been improved (see also Fig. 3) and is closer to preOCT images. Our method is compared to the real iOCT images, the state-of-the-art iOCT SR techniques [16, 17] and SR using CycleGAN with unpaired LR and HR datasets (UnCycGAN). Regarding perceptual metrics, our method exhibits the best FID value and values close to the best for KID and lfeat, which demonstrates that our method can generate SR images that are perceptually more similar to the HR domain. In addition, the |ΔGCF| and |ΔFNE| metrics demonstrate that the contrast and noise values of SR images produced by our method are closer to those of the HR preOCT images, which have the best quality in our dataset. We assessed the statistical significance of the reported |ΔGCF| and |ΔFNE| values using a paired t-test, and all p-values were p < 0.001. Statistical significance cannot be examined for the perceptual metrics (FID, KID, lfeat), which return a single value for the whole test set. Furthermore, our approach runs at 18.05 frames per second (FPS) with iOCT images of size 440×300 as input, which satisfies the real-time requirement of our application.
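For the per-image metrics, the paired t-test can be computed as in the sketch below; the score arrays are illustrative stand-ins for the actual per-frame |ΔGCF| or |ΔFNE| values of two methods on the same test frames.

```python
import numpy as np
from scipy.stats import ttest_rel

# Per-frame scores for our method and a baseline on the same 2352 test frames;
# random values here only stand in for the real metric values.
scores_ours = np.random.rand(2352)
scores_baseline = np.random.rand(2352)

t_stat, p_value = ttest_rel(scores_ours, scores_baseline)  # paired t-test
print(f"paired t-test: t = {t_stat:.2f}, p = {p_value:.3g}")
```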
Table 1. Quantitative analysis on iOCT images.
Arrows show whether higher/lower is better.
Fig. 3. From top to bottom: LR iOCT images, SR using [17], SR using our proposed method.
3.2. Evaluation on pseudo iOCT Images
Prior works [4, 14] used this standard evaluation technique but applied heuristic degradation processes to generate LR images from their HR counterparts. However, iOCT quality is affected by speckle noise, low signal strength and varying pathologies, which are not trivial to simulate heuristically. Therefore, we opt for learning the degradation process. As described in Section 2.2, during the first training stage we learn the mapping from preOCT to pseudo iOCT and use it to create pseudo iOCT images that are aligned with our real preOCT images for testing, allowing full-reference metrics to be computed.
Thus, quantitative analysis using both full-reference (PSNR, SSIM) and no-reference metrics was performed on 1152 pairs of pseudo iOCT and preOCT images. The results are reported in Table 2. Our method outperforms all other approaches both numerically and, as shown in Fig. 4, visually. According to six out of seven metrics, our approach generates SR images with high perceptual and structural similarity to preOCT images, as well as similar levels of contrast and noise. A paired t-test was used to assess the statistical significance of the pairwise comparisons of the reported PSNR, SSIM, |ΔGCF| and |ΔFNE| values, and all p-values were p < 0.001.
Table 2. Quantitative analysis.
Arrows show whether higher/lower is better. PSNR and SSIM are full-reference metrics; FID, KID, lfeat, |ΔGCF| and |ΔFNE| are no-reference metrics.
| Method | PSNR (↑) | SSIM (↑) | FID (↓) | KID (↓) | lfeat (↓) | \|ΔGCF\| (↓) | \|ΔFNE\| (↓) |
|---|---|---|---|---|---|---|---|
| pseudo iOCT | 23.05±2.1 | 0.65±0.1 | 121.30 | 0.114 | 336.04 | 1.43±0.5 | 2.49±0.5 |
| [16] | 16.81±1.8 | 0.58±0.1 | 123.18 | 0.127 | 362.40 | 2.48±0.5 | 3.17±0.1 |
| [17] | 24.09±2.2 | 0.64±0.1 | 75.43 | 0.058 | 277.89 | 0.41±0.3 | 4.11±0.1 |
| UnCycGAN | 28.93±1.6 | 0.82±0.0 | 58.87 | 0.041 | 237.66 | 0.34±0.2 | 2.81±0.1 |
| Ours | 31.45±0.9 | 0.82±0.0 | 16.62 | 0.007 | 76.02 | 0.27±0.1 | 2.61±0.4 |
Fig. 4. From top to bottom: LR pseudo iOCT images, SR using [17], SR using our proposed method, HR preOCT images.
4. Discussion and Conclusions
In this study, we propose a super-resolution pipeline for iOCT images acquired during vitreoretinal surgery, using pre-operatively acquired OCT images as the HR domain. Our methodology clearly outperforms previously proposed image quality enhancement methods, both numerically and visually.
First, we learn the degradation from preOCT (HR) domain to iOCT (LR) domain through a CycleGAN model trained on unpaired images of the two domains. Then, we apply the learned degradation process to generate pseudo iOCT images from preOCT ones which allows us to create pairs of LR-HR images. Finally, we train a Pix2Pix model on the LR-HR pairs to perform super-resolution.
We quantitatively evaluate our pipeline using as input both iOCT images extracted from real surgery videos and pseudo iOCT images generated through the learned degradation process. The results demonstrate the superior improvement that our method achieves compared to previously proposed techniques.
Future work will include qualitative analysis from expert clinicians and will consider temporal information for the iOCT video super-resolution.
Footnotes
We use “super-resolution” and “quality enhancement” interchangeably, as is usual in the literature.
References
- 1. Apostolopoulos S, Salas J, Ordóñez JL, Tan SS, Ciller C, Ebneter A, Zinkernagel M, Sznitman R, Wolf S, De Zanet S, Munk M. Automatically enhanced OCT scans of the retina: a proof of concept study. Scientific Reports. 2020;10(1):1–8. doi: 10.1038/s41598-020-64724-8.
- 2. Bernardes R, Maduro C, Serranho P, Araújo A, Barbeiro S, Cunha-Vaz J. Improved adaptive complex diffusion despeckling filter. Optics Express. 2010;18(23):24048–24059. doi: 10.1364/OE.18.024048.
- 3. Bińkowski M, Sutherland DJ, Arbel M, Gretton A. Demystifying MMD GANs. arXiv preprint arXiv:1801.01401; 2018.
- 4. Bulat A, Yang J, Tzimiropoulos G. To learn image super-resolution, use a GAN to learn how to do image degradation first. Proceedings of the European Conference on Computer Vision (ECCV); 2018. pp. 185–200.
- 5. Cornelissen P, Ourak M, Borghesan G, Reynaerts D, Vander Poorten E. Towards real-time estimation of a spherical eye model based on a single fiber OCT. 2019 19th International Conference on Advanced Robotics (ICAR); 2019. pp. 666–672.
- 6. da Cruz L, Fynes K, Georgiadis O, Kerby J, Luo YH, Ahmado A, Vernon A, Daniels JT, Nommiste B, Hasan SM, Gooljar SB, et al. Phase 1 clinical study of an embryonic stem cell–derived retinal pigment epithelium patch in age-related macular degeneration. Nature Biotechnology. 2018;36(4):328. doi: 10.1038/nbt.4114.
- 7. Devalla SK, Subramanian G, Pham TH, Wang X, Perera S, Tun TA, Aung T, Schmetterer L, Thiéry AH, Girard MJ. A deep learning approach to denoise optical coherence tomography images of the optic nerve head. Scientific Reports. 2019;9(1):1–13. doi: 10.1038/s41598-019-51062-7.
- 8. Fang L, Li S, Cunefare D, Farsiu S. Segmentation based sparse reconstruction of optical coherence tomography images. IEEE Transactions on Medical Imaging. 2016;36(2):407–421. doi: 10.1109/TMI.2016.2611503.
- 9. Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y. Generative adversarial nets. Advances in Neural Information Processing Systems. 2014;27.
- 10. Halupka KJ, Antony BJ, Lee MH, Lucy KA, Rai RS, Ishikawa H, Wollstein G, Schuman JS, Garnavi R. Retinal optical coherence tomography image enhancement via deep learning. Biomedical Optics Express. 2018;9(12):6205–6221. doi: 10.1364/BOE.9.006205.
- 11. Heusel M, Ramsauer H, Unterthiner T, Nessler B, Hochreiter S. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. Advances in Neural Information Processing Systems. 2017;30.
- 12. Immerkaer J. Fast noise variance estimation. Computer Vision and Image Understanding. 1996;64(2):300–302.
- 13. Isola P, Zhu JY, Zhou T, Efros AA. Image-to-image translation with conditional adversarial networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2017. pp. 1125–1134.
- 14. Johnson J, Alahi A, Fei-Fei L. Perceptual losses for real-time style transfer and super-resolution. European Conference on Computer Vision; Springer; 2016. pp. 694–711.
- 15. de Jong EK, Geerlings MJ, den Hollander AI. Age-related macular degeneration. Genetics and Genomics of Eye Disease. 2020:155–180.
- 16. Komninos C, Pissas T, Flores B, Bloch E, Vercauteren T, Ourselin S, Cruz LD, Bergeles C. Intra-operative OCT (iOCT) image quality enhancement: a super-resolution approach using high quality iOCT 3D scans. International Workshop on Ophthalmic Medical Image Analysis; Springer; 2021. pp. 21–31.
- 17. Komninos C, Pissas T, Mekki L, Flores B, Bloch E, Vercauteren T, Ourselin S, Da Cruz L, Bergeles C. Surgical biomicroscopy-guided intra-operative optical coherence tomography (iOCT) image super-resolution. International Journal of Computer Assisted Radiology and Surgery. 2022;17(5):877–883. doi: 10.1007/s11548-022-02603-5.
- 18. Li M, Idoughi R, Choudhury B, Heidrich W. Statistical model for OCT image denoising. Biomedical Optics Express. 2017;8(9):3903–3917. doi: 10.1364/BOE.8.003903.
- 19. Matkovic K, Neumann L, Neumann A, Psik T, Purgathofer W. Global contrast factor – a new approach to image contrast. Computational Aesthetics. 2005:159–168.
- 20. Nazari H, Zhang L, Zhu D, Chader GJ, Falabella P, Stefanini F, Rowland T, Clegg DO, Kashani AH, Hinton DR, Humayun MS. Stem cell based therapies for age-related macular degeneration: the promises and the challenges. Progress in Retinal and Eye Research. 2015;48:1–39. doi: 10.1016/j.preteyeres.2015.06.004.
- 21. Ozcan A, Bilenca A, Desjardins AE, Bouma BE, Tearney GJ. Speckle reduction in optical coherence tomography images using digital filtering. JOSA A. 2007;24(7):1901–1910. doi: 10.1364/josaa.24.001901.
- 22. Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, Bernstein M, et al. ImageNet large scale visual recognition challenge. International Journal of Computer Vision. 2015;115(3):211–252.
- 23. Sander B, Larsen M, Thrane L, Hougaard JL, Jørgensen TM. Enhanced optical coherence tomography imaging by multiple scan averaging. British Journal of Ophthalmology. 2005;89(2):207–212. doi: 10.1136/bjo.2004.045989.
- 24. Viehland C, Keller B, Carrasco-Zevallos OM, Nankivil D, Shen L, Mangalesh S, Viet DT, Kuo AN, Toth CA, Izatt JA. Enhanced volumetric visualization for real time 4D intraoperative ophthalmic swept-source OCT. Biomedical Optics Express. 2016;7(5):1815. doi: 10.1364/BOE.7.001815.
- 25. Xu J, Gong E, Pauly J, Zaharchuk G. 200x low-dose PET reconstruction using deep learning. arXiv preprint arXiv:1712.04119; 2017.
- 26. Yang Q, Yan P, Zhang Y, Yu H, Shi Y, Mou X, Kalra MK, Zhang Y, Sun L, Wang G. Low-dose CT image denoising using a generative adversarial network with Wasserstein distance and perceptual loss. IEEE Transactions on Medical Imaging. 2018;37(6):1348–1357. doi: 10.1109/TMI.2018.2827462.
- 27. Zhu JY, Park T, Isola P, Efros AA. Unpaired image-to-image translation using cycle-consistent adversarial networks. Proceedings of the IEEE International Conference on Computer Vision; 2017. pp. 2223–2232.