Abstract
Endoscopic video sequences give surgeons a direct view of the surgical field and of anatomical targets in the patient during robotic surgery. Unfortunately, these video images inevitably become hazy or foggy, because typical surgical operations such as ablation and cauterisation produce smoke that obscures the surgeon's view. This Letter aims at removing fog or smoke from endoscopic video sequences to enhance and maintain a direct and clear visualisation of the operating field during robotic surgery. The authors propose a new luminance blending framework that integrates contrast enhancement with visibility restoration for foggy endoscopic video processing. The proposed method was validated on clinical endoscopic videos that were collected from robotic surgery. The experimental results demonstrate that their method provides a promising means to effectively remove fog or smoke from endoscopic video images. In particular, the hybrid visual quality score of the defogged endoscopic images was improved from 0.5088 (the best competing method) to 0.6475.
Keywords: image enhancement, endoscopes, medical robotics, surgery, video signal processing, medical image processing, biomedical optical imaging, image sequences
Keywords: endoscopic video defogging, endoscopic video sequences, direct surgical field, robotic surgery, surgical vision, surgical operations, luminance blending framework, foggy endoscopic video processing, clinical endoscopic videos, endoscopic video images, cauterisation
1. Introduction
Interventional endoscopes (e.g. bronchoscopes and colonoscopes) with video cameras integrated at their distal tips are widely used in minimally invasive surgery. The endoscope provides surgeons with real-time video sequences that are shown on medical displays. Using the view of the surgical field that these images provide, surgeons can directly visualise and examine abnormal tissues and treat or resect tumours in the body.
Unfortunately, the visual quality of endoscopic video images is unavoidably degraded by surgical smoke or fog during robotic surgery. These foggy endoscopic images (Fig. 1) are generally produced by a surgical procedure called cauterisation, which is usually employed to stop vessels from bleeding, while other typical operations such as laser ablation can also introduce smoke into the surgical field. Such fog or smoke commonly interrupts surgeons, who may have to pause until the smoke has cleared, which increases surgical time. Surgical fog also degrades the visualisation of the surgical field from the endoscope and covers structural details (e.g. vessel structures) on the organ surface. This can lead to inappropriate device use and incorrectly targeted tissue, increasing surgical risks, for example in tissue or tumour resection during endoscopic surgery. Therefore, endoscopic video defogging plays an essential role in enhancing and maintaining a clear field of surgical vision, not only for safety by preventing inadvertent injury, but also for improving precision and reducing operative time.
Fig. 1.
Hazy images in robotic-assisted endoscopic surgery
a Thin smoke
b Heavy smoke
Endoscopic field defogging methods generally fall into hardware- and software-based strategies. While the former use devices to remove smoke, the latter are algorithmic, i.e. computational photography techniques. This work develops a new luminance blending strategy for surgical video defogging. It combines a contrast enhancement procedure with a fast visibility recovery method to remove fog or smoke from endoscopic video sequences. We also quantitatively and objectively evaluate the experimental results of our proposed method and others. The main contributions of this work are two-fold: (i) a new luminance blending approach with better performance than other defogging approaches and (ii) an objective image quality metric for quantitative assessment of dehazed images.
The remainder of this Letter is organised as follows. Section 2 briefly reviews work related to current dehazing methods. Our hybrid luminance blending-based dehazing method for vision augmentation is presented in Section 3, followed by the experiment settings in Section 4. Section 5 shows and discusses the validation results before concluding this work in Section 6.
2. Related work
Dehazing and defogging techniques for real-world natural images and videos are widely discussed in the computer vision and computational photography literature. Fattal [1] presented a graphical model used to calculate the atmospheric light for haze-free image recovery. The model assumes that scene shading and transmission are locally independent of each other, which is not practical in applications. Tarel and Hautiere [2] introduced a fast visibility restoration strategy based on median filtering, but it usually results in colour distortion and can fail at the median filtering step, which often introduces null pixels. Moreover, the fast visibility method still needs to be made more computationally efficient for real-time processing.
While He et al. [3] proposed dark channel-based atmospheric light and transmission estimation with soft matting, Meng et al. [4] employed a boundary constraint and contextual regularisation to modify this dark channel-based method; in particular, they improved the computational efficiency and skipped soft matting. Nishino et al. [5] estimated two statistically independent components, the scene albedo and depth, by using a Bayesian defogging model. While this Bayesian method works well, it also results in colour distortion. Ancuti and Ancuti [6] discussed a multi-scale fusion approach that combines a white-balanced image with a linearly transformed image, both derived from the hazy image. This multi-scale fusion approach generally struggles with inhomogeneous fog because transmission depth information is lost. While Sulami et al. [7] proposed a reduced formation model that describes image pixels in small patches as lines, which are used to recover the atmospheric light orientation, Galdran et al. [8] presented an improved variational framework using inter-channel contrast in optimisation. Notably, fusion-based defogging is generally recognised as a promising framework to address the disadvantages of various dehazing methods [9].
More recently, deep learning-driven methods have been increasingly developed for single image dehazing. While Ren et al. [10] employed multi-scale convolutional neural networks for single image dehazing, Li et al. [11] proposed the All-In-One Dehazing Network (AOD-Net) to directly create the clean image through a lightweight convolutional neural network instead of separately computing the transmission map and the atmospheric light. Moreover, Ren et al. [12] developed a deep video dehazing method based on semantic segmentation, which can effectively use the abundant information that exists across neighbouring frames for precise dehazing. Liu et al. [13] introduced a simple generic model-agnostic convolutional neural network trained end-to-end to recover clear images from hazy inputs.
Although these methods work well on natural images, they still struggle with fog or smoke in surgical endoscopic video images, particularly in the case of inhomogeneous or thick haze. This work aims to address the problem of hazy images or videos with inhomogeneous or thick haze, particularly foggy endoscopic videos.
3. Approaches
This section details our luminance blending framework for surgical endoscopic video defogging. Our method contains three steps: (i) contrast enhancement, (ii) visibility recovery and filtering, and (iii) luminance blending. Fig. 2 shows the flowchart of our processing, which is detailed in the following subsections.
Fig. 2.
Flowchart of our proposed defogging method for night-time images
3.1. Contrast enhancement
Foggy surgical images have low contrast and limited illumination, especially in hazy regions. The goal of contrast enhancement is to improve the contrast of the haze-free regions of the endoscopic image and to compute an enhanced luminance that is later used to raise the luminance of the final defogged surgical image.
The contrast enhancement step assumes (i) most regions on the foggy image are hazy pixels that critically affect the mean of the foggy image and (ii) the level of haze in these regions depends on the distance between the atmospheric light and the scene, as discussed in [6]. On the basis of these assumptions, we compute the enhanced image $E$ by magnifying the difference between the hazy surgical image $I$ and its average luminance value in the three channels
$$E_c(x,y) = \lambda\left(I_c(x,y) - \frac{1}{WH}\sum_{x=1}^{W}\sum_{y=1}^{H} L(x,y)\right), \quad c \in \{R, G, B\} \tag{1}$$
where $\lambda$ is the magnification factor to control the luminance of the augmented foggy regions and $W$ and $H$ are the width and height of the hazy endoscopic image. The original luminance $L(x,y)$ at each pixel is calculated by [14]
$$L(x,y) = \alpha_R I_R(x,y) + \alpha_G I_G(x,y) + \alpha_B I_B(x,y) \tag{2}$$
where $\alpha_R$, $\alpha_G$, and $\alpha_B$ are the channel weighting coefficients given in [14].
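The contrast enhancement step can be sketched in a few lines of Python. This is a minimal sketch rather than the authors' MATLAB implementation: each channel's difference from the mean image luminance is multiplied by a magnification factor. The function names, the default `lam = 2.0`, the BT.601 luma weights (0.299, 0.587, 0.114), and the clipping of the result to [0, 1] are all assumptions made for illustration. Plain nested lists are used only to keep the example free of dependencies.

```python
from typing import List, Tuple

Pixel = Tuple[float, float, float]   # (R, G, B), each in [0, 1]
Image = List[List[Pixel]]            # rows of pixels

# Assumed luma weights (ITU-R BT.601); the text only cites [14] for them.
LUMA_WEIGHTS = (0.299, 0.587, 0.114)


def luminance(pixel: Pixel) -> float:
    """Weighted sum of the three colour channels of one pixel."""
    return sum(w * c for w, c in zip(LUMA_WEIGHTS, pixel))


def mean_luminance(image: Image) -> float:
    """Average luminance over all W x H pixels of the image."""
    values = [luminance(p) for row in image for p in row]
    return sum(values) / len(values)


def enhance_contrast(image: Image, lam: float = 2.0) -> Image:
    """Magnify each channel's difference from the mean image luminance.

    `lam` is a hypothetical default for the magnification factor, and the
    clipping to [0, 1] is an added safeguard that is not part of the source.
    """
    mean_l = mean_luminance(image)
    return [
        [tuple(min(1.0, max(0.0, lam * (c - mean_l))) for c in p) for p in row]
        for row in image
    ]
```

Pixels darker than the mean luminance are clipped to zero in this sketch, so only regions brighter than the image average keep any contrast.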
3.2. Visibility recovery and filtering
A widely used physical imaging model for hazy images is given by Koschmieder's law [3]
$$I(x,y) = J(x,y)\,t(x,y) + A\,\bigl(1 - t(x,y)\bigr) \tag{3}$$
where $I$ denotes the observed (foggy) image, $J$ is the haze-free image (also called the scene radiance), and $A$ indicates the atmospheric light or the sky luminance. The transmission map $t$ describes the amount of unscattered light entering the camera and can be computed by
$$t(x,y) = e^{-k\,d(x,y)} \tag{4}$$
where $k$ and $d(x,y)$ are the atmosphere's scattering factor and the distance between the camera and the objects in the scene.
On the basis of (3), we aim to solve for the haze-free image $J$ with $A$ and $t$ unknown. However, following the fast visibility recovery method [2], we do not estimate $t$ directly since it is difficult to precisely predict the transmission map, which depends on depth information. To avoid $t$, the atmospheric veil $V$ is employed [6]
$$V(x,y) = A\,\bigl(1 - t(x,y)\bigr) \tag{5}$$
Since (5) gives $t = 1 - V/A$, (3) can be rewritten to calculate $J$
$$J(x,y) = \frac{I(x,y) - V(x,y)}{1 - V(x,y)/A} \tag{6}$$
This requires the atmospheric light $A$ and veil $V$, for which robust estimates can be obtained much more easily than the depth and transmission maps in the original formulation (3). The methods used to determine $A$ and $V$ are discussed in [2]; we omit the technical details of their estimation here.
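To illustrate the recovery step, the following Python sketch inverts the haze model for a single intensity value using the veil form J = (I - V) / (1 - V / A), where I is the observed intensity, V the atmospheric veil, and A the atmospheric light. It is a sketch under assumptions: the function names and the `eps` guard against a vanishing denominator are hypothetical, and intensities are taken to lie in [0, 1]. The estimation of the light and the veil (described in [2]) is not reproduced, so both are passed in as known inputs.

```python
def hazy_intensity(j: float, t: float, a: float) -> float:
    """Forward haze model: I = J * t + A * (1 - t)."""
    return j * t + a * (1.0 - t)


def atmospheric_veil(t: float, a: float) -> float:
    """Atmospheric veil: V = A * (1 - t)."""
    return a * (1.0 - t)


def recover_radiance(i: float, v: float, a: float, eps: float = 1e-6) -> float:
    """Invert the model without the transmission: J = (I - V) / (1 - V / A).

    `eps` is an added guard (not from the source) for pixels where the veil
    approaches the atmospheric light and the denominator vanishes.
    """
    denom = max(1.0 - v / a, eps)
    return (i - v) / denom


# Round trip on one pixel: synthesise haze, then remove it again.
j_true, t, a = 0.30, 0.60, 0.95
i_obs = hazy_intensity(j_true, t, a)   # 0.30 * 0.60 + 0.95 * 0.40
v = atmospheric_veil(t, a)             # 0.95 * 0.40
j_rec = recover_radiance(i_obs, v, a)  # recovers j_true up to rounding
```

The round trip only confirms that the veil form is algebraically equivalent to the forward model. In practice the quality of the result depends on how well the light and the veil are estimated.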
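Since the result $J$ of the fast visibility recovery usually contains image noise and artefacts, we employ joint bilateral filtering to process $J$ and obtain $\hat{J}$.
The bilateral filter is an edge-aware image processing method that denoises while preserving edge information [15, 16]. The concept of joint bilateral filtering is to apply a spatial filter (typically a Gaussian kernel) to a low-resolution image and simultaneously apply a range filter to a high-resolution image (here the low- and high-resolution images refer to the recovered image $J$ and the original image $I$, respectively) [17]
$$\hat{J}(p) = \frac{1}{K_p}\sum_{q \in \Omega} J(q)\, f\bigl(\lVert p - q \rVert\bigr)\, g\bigl(\lVert I(p) - I(q) \rVert\bigr) \tag{7}$$
$$f(u) = \exp\!\left(-\frac{u^2}{2\sigma_s^2}\right), \quad g(u) = \exp\!\left(-\frac{u^2}{2\sigma_r^2}\right) \tag{8}$$
where $f$ and $g$ are the spatial and range kernels with variances $\sigma_s^2$ and $\sigma_r^2$, $q$ ranges over the region $\Omega$ centred at the pixel $p$, and $K_p$ is computed by
$$K_p = \sum_{q \in \Omega} f\bigl(\lVert p - q \rVert\bigr)\, g\bigl(\lVert I(p) - I(q) \rVert\bigr) \tag{9}$$
which is the normalisation term that guarantees that the weights over all the pixels in the region sum to one.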
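A joint bilateral filter of this kind can be sketched for a single-channel image as follows. Each output pixel is a normalised weighted average of the recovered image over a window, where the weight is the product of a Gaussian on the spatial distance and a Gaussian on the intensity difference in the original (guide) image. The function name, the square window, and the default `radius`, `sigma_s`, and `sigma_r` are illustrative assumptions, not the settings used in this work, and the pure-Python loops favour readability over speed.

```python
import math
from typing import List

Gray = List[List[float]]  # single-channel image, values in [0, 1]


def joint_bilateral_filter(
    target: Gray,
    guide: Gray,
    radius: int = 2,
    sigma_s: float = 2.0,
    sigma_r: float = 0.1,
) -> Gray:
    """Smooth `target` with spatial weights and range weights from `guide`.

    `target` plays the role of the recovered image and `guide` that of the
    original foggy image. The defaults are illustrative, not from the source.
    """
    height, width = len(target), len(target[0])
    output = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            weighted_sum = 0.0
            norm = 0.0  # normalisation term: sum of all weights in the window
            for qy in range(max(0, y - radius), min(height, y + radius + 1)):
                for qx in range(max(0, x - radius), min(width, x + radius + 1)):
                    dist2 = (qx - x) ** 2 + (qy - y) ** 2
                    diff2 = (guide[qy][qx] - guide[y][x]) ** 2
                    weight = math.exp(-dist2 / (2.0 * sigma_s ** 2)) * math.exp(
                        -diff2 / (2.0 * sigma_r ** 2)
                    )
                    weighted_sum += weight * target[qy][qx]
                    norm += weight
            # The centre pixel always contributes weight 1, so norm >= 1.
            output[y][x] = weighted_sum / norm
    return output
```

Because the range weight is taken from the guide image, strong edges in the original frame suppress averaging across them, so noise in the recovered image is smoothed without blurring those edges.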
3.3. Luminance blending
This step estimates the illumination of the contrast-enhanced image $E$ (Section 3.1) and the filtered recovered image $\hat{J}$ (Section 3.2) and blends the two to improve the illumination of the defogged endoscopic surgical image.
We convert the images $E$ and $\hat{J}$ from the red, green and blue (RGB) colour space to the YCbCr colour space. For their Y (luminance) components, we use recursive filtering [18] to estimate the illumination of $E$ and $\hat{J}$ and obtain $L_E$ and $L_{\hat{J}}$. Using the illumination maps $L_E$ and $L_{\hat{J}}$, we seek to recognise pixels in hazy regions. To this end, a weight function is empirically introduced, and the output of the blending fusion can be formulated as
| (10) |
Note that the weight function (also called the weight matrix) depends on the level of smoke. In the heavy-smoke case, foggy pixels whose intensity lies in the range 16–128 are assigned a weight of 1. In the thin-smoke case, pixels in the interval [128, 235] are assigned a weight of 1. The blended luminance may not span the full range of pixel intensities, resulting in a low-contrast image. We therefore apply the following linear transformation to stretch its histogram to a specific intensity range
$$Y_f(x,y) = \frac{Y_b(x,y) - Y_{\min}}{Y_{\max} - Y_{\min}}\,\bigl(Y_{\text{high}} - Y_{\text{low}}\bigr) + Y_{\text{low}} \tag{11}$$
where $Y_f$ denotes the final luminance, and $Y_{\min}$ and $Y_{\max}$ are the minimum and maximum intensities of the blending output $Y_b$, respectively. The bounds $Y_{\text{low}}$ and $Y_{\text{high}}$ of the target range are set empirically in our work. Finally, we combine the Y-component $Y_f$ with the chromatic components Cb and Cr and transform them into the RGB colour space, obtaining the final defogged image.
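The weighting, blending, and stretching steps can be sketched on a luminance plane as follows. Several parts of this sketch are assumptions rather than details taken from the text: the blend is written as a per-pixel convex combination `w * a + (1 - w) * b`, pixels outside the stated intensity interval are given weight 0, and the default target range of 16 to 235 is chosen only because it matches the intervals quoted above. Which of the two illumination maps receives the weight is left to the caller. All function names are hypothetical.

```python
from typing import List

Gray = List[List[float]]  # luminance plane on an 8-bit scale (0-255)

# Intensity intervals that receive weight 1, as stated in the text.
WEIGHT_RANGES = {"heavy": (16.0, 128.0), "thin": (128.0, 235.0)}


def weight_mask(foggy_luma: Gray, smoke: str) -> Gray:
    """Binary weight matrix: 1 inside the interval for `smoke`, else 0.

    The weight of pixels outside the interval is not given in the source;
    the value 0 is an assumption.
    """
    low, high = WEIGHT_RANGES[smoke]
    return [[1.0 if low <= v <= high else 0.0 for v in row] for row in foggy_luma]


def blend(weights: Gray, luma_a: Gray, luma_b: Gray) -> Gray:
    """Assumed convex blend per pixel: w * luma_a + (1 - w) * luma_b."""
    return [
        [w * a + (1.0 - w) * b for w, a, b in zip(w_row, a_row, b_row)]
        for w_row, a_row, b_row in zip(weights, luma_a, luma_b)
    ]


def stretch(luma: Gray, out_low: float = 16.0, out_high: float = 235.0) -> Gray:
    """Linearly map the [min, max] of `luma` onto [out_low, out_high].

    The default bounds are hypothetical; the source sets them empirically.
    """
    flat = [v for row in luma for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # flat image: nothing to stretch
        return [[out_low for _ in row] for row in luma]
    scale = (out_high - out_low) / (hi - lo)
    return [[(v - lo) * scale + out_low for v in row] for row in luma]


# Tiny 2 x 2 example for the heavy-smoke case.
foggy = [[20.0, 200.0], [100.0, 130.0]]
mask = weight_mask(foggy, "heavy")
blended = blend(mask, [[50.0, 60.0], [70.0, 80.0]], [[90.0, 100.0], [110.0, 120.0]])
final_luma = stretch(blended)
```

After stretching, the darkest blended pixel maps to the lower bound and the brightest to the upper bound, which restores contrast when the blended luminance occupies only a narrow band.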
4. Validation
Foggy endoscopic videos were acquired from robotic surgery. All the experiments were executed in MATLAB R2017a on a laptop running 64-bit Windows 8.1 Professional with 32.0 GB of memory and an Intel(R) Xeon(R) CPU 8 2.8 GHz processor. We tested about 1200 frames in this Letter.
We compare the proposed method with the following approaches: (i) M1, Tarel and Hautiere [2], (ii) M2, He et al. [3], (iii) M3, Nishino et al. [5], (iv) M4, Ancuti and Ancuti [6], (v) M5, Meng et al. [4], (vi) M6, Sulami et al. [7], and (vii) M7, our method.
We use a naturalness metric, built from a statistical analysis of thousands of images, to describe how natural the surgical images appear [19]. We also employ the structural similarity index (SSIM) [20] to evaluate structural information in the images. Finally, we define a hybrid quality metric to evaluate defogged endoscopic images
$$Q = \alpha S + (1 - \alpha) N \tag{12}$$
where $S$ denotes the SSIM and $N$ indicates the naturalness. The coefficient $\alpha$ is set to 0.6, which was experimentally determined to balance the structural information and naturalness.
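The hybrid metric can be sketched as a weighted sum of the two scores, Q = alpha * SSIM + (1 - alpha) * naturalness. This form is an assumption, but with alpha = 0.6 it reproduces the hybrid values reported in Table 1 for M5 and M7 from their average SSIM and naturalness. The function name and the range check are hypothetical, and the SSIM and naturalness scores themselves are taken as given inputs rather than computed.

```python
def hybrid_quality(ssim: float, naturalness: float, alpha: float = 0.6) -> float:
    """Weighted sum of SSIM and naturalness; alpha weights the SSIM term."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * ssim + (1.0 - alpha) * naturalness


# Average scores reported for the proposed method (M7) in Table 1.
q_m7 = hybrid_quality(ssim=0.9275, naturalness=0.2274)
print(f"{q_m7:.4f}")  # prints 0.6475
```

With alpha = 0.6 the SSIM term dominates, so a method with high naturalness but poor structural similarity cannot reach a high hybrid score.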
5. Results and discussion
Fig. 3 visually compares the defogged results for endoscopic images with thin and thick fog. The visual quality of the results suggests that our method works better than the others since it removes fog while introducing less colour distortion. In addition, two surgeons manually inspected all the dehazed results and generally judged that our defogging method outperforms the other approaches, since its results look more natural and colourful and are better than the original foggy image.
Fig. 3.
Comparison of various defogging methods: (a)–(h) thin-fog image and (i)–(p) thick-fog image
a Thin-fog image
b M1 [2]
c M2 [3]
d M3 [5]
e M4 [6]
f M5 [4]
g M6 [7]
h M7 (ours)
i Thick-fog image
j M1 [2]
k M2 [3]
l M3 [5]
m M4 [6]
n M5 [4]
o M6 [7]
p M7 (ours)
Table 1 quantitatively compares the dehazed results obtained from the seven approaches. The quantitative assessment shows that our proposed approach outperforms the other methods. While the average naturalness of M1 and our proposed method M7 were comparable (0.2319 and 0.2274), the average SSIM improved from 0.7978 for the best competing method (M5) to 0.9275. Moreover, the average hybrid quality of our method was 0.6475, which was much better than that of the other approaches (the next best, M5, achieved 0.5088).
Table 1.
Quantitative objective assessment of the results obtained from the seven defogging approaches
| Approaches | M0 | M1 [2] | M2 [3] | M3 [5] | M4 [6] | M5 [4] | M6 [7] | M7 (ours) |
|---|---|---|---|---|---|---|---|---|
| SSIM | — | 0.6587 | 0.4781 | 0.6676 | 0.6488 | 0.7978 | 0.3944 | 0.9275 |
| naturalness | 0.1411 | 0.2319 | 0.0218 | 0.1097 | 0.1439 | 0.0752 | 0.0602 | 0.2274 |
| hybrid | — | 0.4890 | 0.2956 | 0.4445 | 0.4468 | 0.5088 | 0.2608 | 0.6475 |
M0 indicates the quantitative results of the original foggy images; it has no SSIM or hybrid value because SSIM is a reference-based metric
Additionally, the computational times of methods M1, M3, M4, M5, M6, and M7 were 31.3, 62.3, 1.3, 5.2, 75.6, and 1.1 s/frame, respectively. Method M2 took more than 2700 s per image since soft matting is extremely slow [3]. Our method was the fastest of the compared methods, although only marginally faster than M4.
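As a rough reading of these timings, the short Python sketch below converts the reported seconds per frame into throughput, total time for the roughly 1200 test frames, and slowdown relative to M7. The variable and function names are hypothetical, and M2 is left out because only a lower bound on its time is reported.

```python
# Per-frame processing times in seconds, as reported in the text.
SECONDS_PER_FRAME = {
    "M1": 31.3, "M3": 62.3, "M4": 1.3, "M5": 5.2, "M6": 75.6, "M7": 1.1,
}
FRAMES_TESTED = 1200  # "about 1200 frames"


def frames_per_second(seconds_per_frame: float) -> float:
    """Throughput implied by a per-frame processing time."""
    return 1.0 / seconds_per_frame


def total_minutes(seconds_per_frame: float, frames: int) -> float:
    """Time in minutes needed to process `frames` frames sequentially."""
    return seconds_per_frame * frames / 60.0


# How many times slower each method is than M7.
slowdown_vs_m7 = {
    name: t / SECONDS_PER_FRAME["M7"] for name, t in SECONDS_PER_FRAME.items()
}

m7_fps = frames_per_second(SECONDS_PER_FRAME["M7"])                 # about 0.91
m7_minutes = total_minutes(SECONDS_PER_FRAME["M7"], FRAMES_TESTED)  # about 22
```

At 1.1 s/frame the proposed method processes less than one frame per second, which is consistent with processing time being listed among its limitations.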
This work aims to enhance the visualisation of the surgical field in endoscopic surgery. We developed a new luminance blending defogging algorithm. The experimental results demonstrate that our algorithm outperforms the others in both subjective and objective evaluations. The effectiveness of our algorithm lies in fusing the advantages of enhancement-based and restoration-based dehazing methods. Our method has several potential limitations, including unclear parameter sensitivity, the effectiveness of the enhancement, the quality assessment, and heavy processing time. In addition, although our method works better than the other approaches, it still introduces colour distortion. These limitations will be further investigated in the future.
6. Conclusion
We proposed a new luminance blending defogging framework that integrates contrast enhancement, joint bilateral filtering, and visibility recovery to remove smoke in endoscopic videos from robotic surgery. We evaluated our method on endoscopic video sequences acquired from robotic prostatectomy. The experimental results demonstrate the effectiveness of our proposed method, which outperforms the other approaches. In particular, our method improved the hybrid quality of the dehazed results from 0.5088 (the best competing method) to 0.6475.
7. Acknowledgments
This work was partly supported by the National Natural Science Foundation of China (No. 61971367) and the Fundamental Research Funds for the Central Universities China (No. 20720180062).
8. References
- 1. Fattal R.: ‘Single image dehazing’, ACM Trans. Graph., 2008, 27, (3), pp. 1–10 (doi: 10.1145/1360612.1360671)
- 2. Tarel J.P., Hautiere N.: ‘Fast visibility restoration from a single color or gray level image’. IEEE Int. Conf. Computer Vision (ICCV), 2009, pp. 2201–2208
- 3. He K., Sun J., Tang X.: ‘Single image haze removal using dark channel prior’, IEEE Trans. Pattern Anal. Mach. Intell., 2011, 33, (12), pp. 2341–2353 (doi: 10.1109/TPAMI.2010.168)
- 4. Meng G., Wang Y., Duan J., et al.: ‘Efficient image dehazing with boundary constraint and contextual regularization’. IEEE Int. Conf. Computer Vision (ICCV), 2013, pp. 617–624
- 5. Nishino K., Kratz L., Lombardi S.: ‘Bayesian defogging’, Int. J. Comput. Vis., 2012, 98, (3), pp. 263–278 (doi: 10.1007/s11263-011-0508-1)
- 6. Ancuti C.O., Ancuti C.: ‘Single image dehazing by multi-scale fusion’, IEEE Trans. Image Process., 2013, 22, (8), pp. 3271–3282 (doi: 10.1109/TIP.2013.2262284)
- 7. Sulami M., Glatzer I., Fattal R., et al.: ‘Automatic recovery of the atmospheric light in hazy images’. IEEE Int. Conf. Computational Photography (ICCP), 2014, pp. 1–11
- 8. Galdran A., Vazquez Corral J., Pardo D., et al.: ‘Enhanced variational image dehazing’, SIAM J. Imaging Sci., 2015, 8, (3), pp. 1519–1546 (doi: 10.1137/15M1008889)
- 9. Xu Y., Wen J., Fei L., et al.: ‘Review of video and image defogging algorithms and related studies on image restoration and enhancement’, IEEE Access, 2016, 4, pp. 165–188
- 10. Ren W., Liu S., Zhang H., et al.: ‘Single image dehazing via multi-scale convolutional neural networks’. European Conf. Computer Vision (ECCV), 2016, pp. 154–169
- 11. Li B., Peng X., Wang Z., et al.: ‘An all-in-one network for dehazing and beyond’, arXiv preprint arXiv:1707.06543, 2017
- 12. Ren W., Zhang J., Xu X., et al.: ‘Deep video dehazing with semantic segmentation’, IEEE Trans. Image Process., 2018, 28, (4), pp. 1895–1908 (doi: 10.1109/TIP.2018.2876178)
- 13. Liu Z., Xiao B., Alrabeiah M., et al.: ‘Single image dehazing with a generic model-agnostic convolutional neural network’, IEEE Signal Process. Lett., 2019, 26, (6), pp. 833–837 (doi: 10.1109/LSP.2019.2910403)
- 14. Szeliski R.: ‘Computer vision: algorithms and applications’ (Springer, 2011)
- 15. Durand F., Dorsey J.: ‘Fast bilateral filtering for the display of high-dynamic-range images’, ACM Trans. Graph., 2002, 21, (3), pp. 257–266 (doi: 10.1145/566654.566574)
- 16. Chen J., Paris S., Durand F.: ‘Real-time edge-aware image processing with the bilateral grid’, ACM Trans. Graph., 2007, 26, (3), pp. 1–9 (doi: 10.1145/1276377.1276506)
- 17. Kopf J., Cohen M., Lischinski D., et al.: ‘Joint bilateral upsampling’, ACM Trans. Graph., 2007, 26, (3), pp. 961–5
- 18. Ramirez J.M., Paredes J.L.: ‘Recursive weighted myriad based filters and their optimizations’, IEEE Trans. Signal Process., 2016, 64, (15), pp. 4027–4039 (doi: 10.1109/TSP.2016.2557304)
- 19. Yeganeh H., Wang Z.: ‘Objective quality assessment of tone-mapped images’, IEEE Trans. Image Process., 2013, 22, (2), pp. 657–667 (doi: 10.1109/TIP.2012.2221725)
- 20. Wang Z., Bovik A.C., Sheikh H.R., et al.: ‘Image quality assessment: from error visibility to structural similarity’, IEEE Trans. Image Process., 2004, 13, (4), pp. 600–612 (doi: 10.1109/TIP.2003.819861)



