Author manuscript; available in PMC: 2021 Jul 6.
Published in final edited form as: J Biophotonics. 2020 Feb 3;13(4):e201960135. doi: 10.1002/jbio.201960135

Optical coherence tomography image de-noising using a generative adversarial network with speckle modulation

Zhao Dong 1,2, Guoyan Liu 3,4, Guangming Ni 2, Jason Jerwick 1,2, Lian Duan 1, Chao Zhou 1,2,4,*
PMCID: PMC8258757  NIHMSID: NIHMS1715056  PMID: 31970879

Abstract

Optical coherence tomography (OCT) is widely used for biomedical imaging and clinical diagnosis. However, speckle noise is a key factor affecting OCT image quality. Here, we developed a custom generative adversarial network (GAN) to de-noise OCT images. A speckle-modulating OCT (SM-OCT) system was built to generate low speckle images to be used as the ground truth. In total, 210,000 SM-OCT images were used for training and validating the neural network model, which we call SM-GAN. The performance of the SM-GAN method was further demonstrated using online benchmark retinal images, 3D OCT images acquired from human fingers, and OCT videos of a beating fruit fly heart. The de-noising performance of the SM-GAN model was compared with traditional OCT de-noising methods and other state-of-the-art deep learning based de-noising networks. We conclude that the SM-GAN model presented here can effectively reduce speckle noise in OCT images and videos while maintaining spatial and temporal resolution.

Keywords: Deep learning, De-noise, Generative Adversarial Network, Optical Coherence Tomography

Graphical Abstract

[Graphical abstract image]

1 |. INTRODUCTION

Optical coherence tomography (OCT) [1] is an emerging biomedical imaging technology [2-4] that enables micron-scale, cross-sectional, and 3D imaging of biological tissues noninvasively. OCT is widely used for clinical applications, including ophthalmology [5-7], cardiology [8], endoscopy [9], dermatology [10, 11], and dentistry [12]. Because OCT is based on interferometry, it depends on the spatial and temporal coherence of light backscattered from the sample. This coherence also generates speckle noise [13, 14], which degrades the contrast of OCT images and makes the detailed structure of tissue samples difficult to resolve. To improve OCT image quality, both software-based and hardware-based methods have been developed to remove speckle noise.

One straightforward software-based approach for reducing speckle noise is image averaging: averaging multiple cross-sectional OCT images, or "B-scans", acquired at the same location results in an OCT image with low speckle noise. This method has two main limitations. First, repeated scans lower acquisition speed and efficiency, which decreases the temporal resolution. Second, for in vivo imaging applications, involuntary movements or sample motions degrade the quality of the averaged images and can cause motion artifacts or information loss.

Different filters, including low-pass [15], median [16, 17], and mean [18] filters, have been utilized for OCT image de-noising. De-noising with spatial filters can be effective, but image resolution may be degraded by the pixel averaging involved. Beyond filter-based methods, more advanced de-noising approaches have been developed in recent years, including BM3D [19], MSBTD [20], and Tikhonov [21]. These methods show better de-noising performance; however, their post-processing is time-intensive and decreases the spatial resolution of the OCT images.

Besides algorithm-based methods, many hardware-based de-noising methods have been developed. One approach is to design a random laser [22] to achieve speckle-free imaging; another is to use an optical diffuser [23] to introduce local phase shifts that remove speckle noise from OCT images. These hardware-based methods can remove speckle noise, but they increase system complexity, and some of them also require repeated scans, which affects the temporal resolution of the system.

In recent years, deep neural networks [24, 25] have been widely used in biomedical image processing applications such as semantic segmentation [26, 27], image classification [28], and super-resolution image reconstruction [29]. The generative adversarial network (GAN) [30] is a promising deep neural network that has been applied to a variety of machine learning problems, including text-to-image transformation [31], image generation [32], and image editing [33]. Recently, Huang et al. [34], Ma et al. [35], and Chen et al. [36] employed GANs to de-noise OCT images for ophthalmic applications. With a different network structure, more general datasets, and a better training strategy, GAN-based OCT image de-noising can be improved further. In this paper, we developed a custom GAN-based deep neural network, called SM-GAN, to de-noise OCT images and videos. A speckle-modulating OCT (SM-OCT) [23] system was built to generate low speckle ground truth images, and different types of samples were imaged with this system so that the SM-GAN model generalizes to a wide range of OCT images. With a two-step training strategy, the SM-GAN method can virtually eliminate speckle noise while maintaining image resolution. Furthermore, unlike image averaging, which requires repeated scans, the SM-GAN method maintains temporal resolution. We tested the SM-GAN method on different OCT images and live-sample OCT videos and compared its de-noising performance with traditional and deep learning based de-noising methods.

2 |. METHOD

2.1 |. Training data

A speckle-modulating OCT [23] system was built to generate low speckle OCT images to be used as the ground truth. An optical diffuser was inserted in the sample arm to generate random, time-varying scrambling of the wavefront of the OCT imaging beam. Cross-sectional images were acquired 100 times at the same location of the sample while the optical diffuser was moving, so the speckle pattern changed randomly from frame to frame. After averaging the 100 B-scan frames, the speckle noise in the image was significantly reduced. The SM-OCT system has a central wavelength of 840 nm, ~100 dB sensitivity, an axial resolution of ~3 μm in tissue, and a transverse resolution of ~5 μm.
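As an illustration of this ground-truth generation step, the following is a minimal NumPy sketch (not the authors' acquisition code) of averaging repeated B-scans; the acquire_bscan helper and the frame dimensions are hypothetical placeholders.

```python
import numpy as np

def average_repeated_bscans(frames):
    """Average N repeated B-scans acquired at the same sample location.

    frames: array of shape (n_repeats, height, width) holding co-registered
    OCT B-scans. Averaging over the repeat axis suppresses the randomly
    varying speckle while preserving the static structure.
    """
    frames = np.asarray(frames, dtype=np.float64)
    return frames.mean(axis=0)

# Example: 100 repeated B-scans -> one low-speckle ground-truth frame.
# bscans = np.stack([acquire_bscan() for _ in range(100)])  # acquire_bscan is hypothetical
# ground_truth = average_repeated_bscans(bscans)
```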

Multiple scan patterns and different acquisition speeds (for example, 12.5 kHz, 20 kHz, or 50 kHz A-line rates) were used to make the training datasets more general. For each sample, we acquired images at several different locations. At each location, we first repeatedly scanned 100 times at the same position (800 A-lines × 100 repeats over a scan range of 2.8 mm) and averaged the intensity of these 100 images to obtain the ground truth image. We then acquired 3D datasets for testing (800 × 600 A-scans over a scan range of 2.8 × 2.8 mm² or 1.87 × 1.87 mm²).

Thirty different types of samples were imaged with the SM-OCT setup, including various meats (e.g., pork, pork skin, pork feet, beef, lamb, chicken, chicken skin, fish skin, and fish), fruits (e.g., apple, pear, grape, orange, strawberry, olive, golden fruit, lemon, and kiwi), and vegetables (e.g., carrot, cucumber, cabbage, green pepper, squash, mushroom, jalapeno pepper, spinach, potato, celery, zucchini, and tomato). We collected 600 SM-OCT datasets, each comprising 100 repeated B-scan frames. FIGURE 1 shows a collage of six different samples imaged at different speeds and resolutions. All 100 repeated frames of each dataset were used as either training data or validation data, and no images from the same dataset were used as both. We randomly selected 100 datasets (10,000 OCT frames in total) as validation data. Augmentation operations such as image flipping, rotation, and contrast adjustment were applied to the remaining 50,000 images to increase the training data size. After augmentation, a total of 200,000 OCT images were used for training and 10,000 OCT images were used for validation.
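The exact augmentation parameters are not given in the paper; the sketch below shows one plausible implementation of the named operations (flip, rotation, contrast adjustment) applied identically to a noisy B-scan and its ground truth so that the training pair stays aligned. The gain range and 90-degree rotations are assumptions.

```python
import numpy as np

def augment_pair(noisy, ground_truth, rng):
    """Flip, rotate, and contrast-adjust a (noisy, ground truth) B-scan pair.

    Images are assumed to be normalized to [0, 1]. Rotations are limited to
    90-degree multiples and the contrast-gain range is assumed; the paper
    does not specify these parameters.
    """
    if rng.random() < 0.5:                        # horizontal flip
        noisy, ground_truth = np.fliplr(noisy), np.fliplr(ground_truth)
    k = int(rng.integers(0, 4))                   # rotate by 0/90/180/270 degrees
    noisy, ground_truth = np.rot90(noisy, k), np.rot90(ground_truth, k)
    gain = rng.uniform(0.8, 1.2)                  # contrast adjustment (assumed range)
    noisy = np.clip(noisy * gain, 0.0, 1.0)
    ground_truth = np.clip(ground_truth * gain, 0.0, 1.0)
    return noisy, ground_truth

rng = np.random.default_rng(0)
# augmented = [augment_pair(x, y, rng) for x, y in training_pairs]  # training_pairs is hypothetical
```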

FIGURE 1. Collage of SM-OCT images of different samples: (a) chicken skin, (b) tape, (c) beef, (d) pork skin, (e) pork, (f) fish. The top image of each sample is a single SM-OCT B-scan, while the bottom is the ground truth image obtained by averaging 100 repeated SM-OCT B-scans, which significantly reduces the speckle noise.

2.2 |. Network Structure

The GAN [30] consists of two parts: a generator network and a discriminator network. As shown in FIGURE 2(a), the generator network maps a noisy input image to a low-noise prediction, while the discriminator network serves as a judge that evaluates the generator output. The goal of the generator is to fool the discriminator by producing de-noised images that the discriminator cannot distinguish from the ground truth, while the goal of the discriminator is to become better at identifying generated de-noised images. Backpropagation was applied to both networks to optimize their performance, and training terminates when the generator produces images that the discriminator cannot distinguish from the ground truth.

FIGURE 2. SM-GAN training process and network structure. (a) SM-GAN model training process. (b) Generator network structure. (c) Discriminator network structure.

2.2.1 |. Generator Network

The generator network takes noisy OCT images as input, and its goal is to produce images similar to the ground truth images in order to fool the discriminator network. As shown in FIGURE 2(a), the weights of the generator network are iteratively updated to improve output image quality based on the generator loss feedback. FIGURE 2(b) shows the structure of the generator network. Its central part is sixteen identical residual blocks, a number chosen to balance the trade-off between training time and GAN performance. Each residual block includes two convolutional layers (Conv) with 3×3 kernels, each followed by a batch normalization layer (BN), and uses LeakyReLU activations. At the end of the residual block, an elementwise sum layer adds the input and output features of the block; this summation preserves input feature information and minimizes information loss.
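A minimal Keras sketch of one such residual block and the surrounding generator trunk is shown below. The paper specifies the 3×3 kernels, batch normalization, LeakyReLU activations, the elementwise sum in each block, and the long skip connection around the 16 blocks; the filter count (64) and the input/output convolutions are assumptions in the spirit of SRGAN-style generators.

```python
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters=64):
    """Conv(3x3)-BN-LeakyReLU-Conv(3x3)-BN, then an elementwise sum with the block input."""
    skip = x
    y = layers.Conv2D(filters, 3, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.LeakyReLU()(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    return layers.Add()([skip, y])                 # preserves input feature information

def build_generator(n_blocks=16, filters=64):
    """Generator trunk: 16 identical residual blocks plus a long skip connection."""
    inp = layers.Input(shape=(None, None, 1))      # single-channel OCT B-scan
    x = layers.Conv2D(filters, 3, padding="same")(inp)
    x = layers.LeakyReLU()(x)
    head = x
    for _ in range(n_blocks):
        x = residual_block(x, filters)
    x = layers.Add()([head, x])                    # long skip from trunk input to trunk output
    out = layers.Conv2D(1, 3, padding="same")(x)   # reconstruct the de-noised B-scan
    return tf.keras.Model(inp, out)
```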

2.2.2 |. Discriminator Network

The purpose of the discriminator network is to judge the quality of the output images from the generator network. FIGURE 2(c) shows its structure. The central part of the discriminator is seven convolutional blocks with identical structure, each containing a convolutional layer with 3×3 kernels, a batch normalization layer, and a LeakyReLU layer. After these convolutional blocks, two dense layers followed by a sigmoid activation function output the classification probability. We gradually increased the number of convolutional blocks to balance total training time against discriminator performance, and seven convolutional blocks were found to give the best trade-off for our discriminator network.
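The sketch below mirrors this description in Keras: seven Conv(3×3)-BN-LeakyReLU blocks followed by two dense layers and a sigmoid output. The filter counts, strides, dense-layer width, and input size are assumptions; the paper specifies only the block composition and the classification head.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_discriminator(input_shape=(256, 256, 1)):
    """Seven Conv(3x3)-BN-LeakyReLU blocks, then two dense layers and a sigmoid."""
    inp = layers.Input(shape=input_shape)
    x = inp
    filters = 32
    for i in range(7):
        stride = 2 if i % 2 == 1 else 1            # downsample every other block (assumed)
        x = layers.Conv2D(filters, 3, strides=stride, padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.LeakyReLU()(x)
        if stride == 2:
            filters = min(filters * 2, 256)        # grow feature depth (assumed schedule)
    x = layers.Flatten()(x)
    x = layers.Dense(512)(x)                       # first dense layer (width assumed)
    x = layers.LeakyReLU()(x)
    out = layers.Dense(1, activation="sigmoid")(x) # probability that the input is ground truth
    return tf.keras.Model(inp, out)
```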

2.3 |. Loss Functions

As shown in FIGURE 2(a), we used a generator loss and a discriminator loss as our loss functions, which provide feedback to the GAN network.

The generator loss [29] is defined as a weighted sum of a content loss and an adversarial loss:

$L_{generator} = L_{content} + 10^{-3} \times L_{adversarial}$  (1)

The content loss compares feature differences between generator output images and ground truth images. The pixel-wise mean square error (MSE) loss [32, 37], defined below, can be used as the content loss for a GAN:

$L_{content}^{MSE} = \frac{1}{W \times H} \sum_{x=1}^{W} \sum_{y=1}^{H} \left[ I_{x,y}^{GT} - G(I^{Input})_{x,y} \right]^2$  (2)

where $W$ and $H$ denote the image width and height, respectively, $I^{Input}$ and $I^{GT}$ represent the input and ground truth images, and $G$ is the generator operator that acts on input images to produce the generator output. Pixel-wise MSE loss is widely used and easy to implement; however, it over-smooths some structures in the generated images, which leads to feature loss [38, 39]. One solution is to employ a VGG-19 network to extract feature maps from the ground truth images and from the generator output images separately [39]; the content loss is then defined as the Euclidean distance between these two feature maps. With this method, the loss function is called the VGG loss, defined in Eq. (3):

$L_{content}^{VGG} = \frac{1}{W \times H} \sum_{x=1}^{W} \sum_{y=1}^{H} \left\{ \phi(I^{GT})_{x,y} - \phi[G(I^{Input})]_{x,y} \right\}^2$  (3)

where $\phi$ represents the VGG-19 network operator that acts on input images to produce the VGG-19 feature maps.
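A sketch of the VGG content loss of Eq. (3) using the pretrained VGG-19 available in Keras is given below. The choice of feature layer ('block5_conv4', following the SRGAN convention) and the grayscale-to-RGB handling are assumptions; the paper does not state which VGG-19 layer was used.

```python
import tensorflow as tf

# Pretrained VGG-19 truncated at a deep feature layer; 'block5_conv4' follows the
# SRGAN convention and is an assumption here.
_vgg = tf.keras.applications.VGG19(include_top=False, weights="imagenet")
_features = tf.keras.Model(_vgg.input, _vgg.get_layer("block5_conv4").output)
_features.trainable = False

def vgg_content_loss(ground_truth, generated):
    """Eq. (3): mean squared distance between VGG-19 feature maps.

    Both inputs are assumed to be batches of single-channel images in [0, 1]
    with shape (batch, height, width, 1); they are tiled to three channels
    because VGG-19 expects RGB input.
    """
    gt_rgb = tf.image.grayscale_to_rgb(ground_truth) * 255.0
    gen_rgb = tf.image.grayscale_to_rgb(generated) * 255.0
    gt_feat = _features(tf.keras.applications.vgg19.preprocess_input(gt_rgb))
    gen_feat = _features(tf.keras.applications.vgg19.preprocess_input(gen_rgb))
    return tf.reduce_mean(tf.square(gt_feat - gen_feat))
```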

The content loss measures the difference between the generator output and the ground truth and drives the quality of the generator's output images toward that of the ground truth images. The adversarial loss [29] is complementary to the content loss; it helps the GAN generate images that preserve features of the input OCT images. The adversarial loss is defined as:

$L_{adversarial} = \sum_{n=1}^{N} -\log D[G(I^{Input})]$  (4)

where $D$ is the discriminator operator that acts on the discriminator input images to produce the discriminator output, and $N$ is the total number of training images.

The discriminator loss [29] was used to update the weights of the discriminator network during model training. As shown in FIGURE 2(a), output images from the generator network and ground truth images are fed to the discriminator network independently. If the input is a ground truth image, the discriminator output is labeled the real discriminator output; if the input is generator output, it is labeled the fake discriminator output. The discriminator loss is calculated from the real and fake discriminator outputs and is defined in Eq. (5):

$L_{Dis} = \sum_{n=1}^{N} \left\{ -\log[D(I^{GT})] - \log\left(1 - D[G(I^{Input})]\right) \right\}$  (5)
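The loss terms of Eqs. (1), (4), and (5) translate directly into TensorFlow ops, as in the sketch below; the small epsilon added inside the logarithms is a numerical-stability assumption, not part of the paper's equations.

```python
import tensorflow as tf

EPS = 1e-8  # numerical safety inside the logs (assumption, not in the paper)

def generator_loss(content_loss, disc_on_generated):
    """Eq. (1) with Eq. (4): content loss plus 1e-3-weighted adversarial loss."""
    adversarial = tf.reduce_sum(-tf.math.log(disc_on_generated + EPS))
    return content_loss + 1e-3 * adversarial

def discriminator_loss(disc_on_real, disc_on_generated):
    """Eq. (5): binary cross-entropy form of the discriminator loss."""
    return tf.reduce_sum(
        -tf.math.log(disc_on_real + EPS)
        - tf.math.log(1.0 - disc_on_generated + EPS)
    )
```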

2.4 |. Training Details and Evaluation Functions

The SM-GAN model was trained on a single NVIDIA GeForce GTX 1080 GPU in a TensorFlow-based environment. The training process includes two steps; this two-step strategy improves training efficiency and avoids undesired local optima [40]. The generator network, which has a much deeper structure and far more training parameters than the discriminator network, needs a longer training time to optimize its performance. In addition, the transfer learning strategy provides a good starting point for the generator network, leading to quicker convergence and better performance in the second training step. In the first training stage, only the generator network was trained, using the MSE content loss to quickly find initial weight values; that is, the content loss in Eq. (1) is set to the MSE loss of Eq. (2). The Adam optimizer [41] was used to train the generator network, with an initial learning rate of 10⁻⁵ that was decreased by a factor of 0.1 every 10⁵ training iterations.

The second training stage employed a transfer learning strategy to fine-tune the network. The generator weights obtained from the first training step were used as the initial weights for the second step, and both the generator and discriminator networks were trained. In this stage, the MSE content loss was replaced with the VGG content loss, which helps generate low-noise images without over-smoothing; that is, the content loss in Eq. (1) is set to the VGG loss of Eq. (3). We used the same optimizer and initial learning rate as in the first training step, with a decay rate of 10⁻⁶. With this two-step training strategy, the SM-GAN model converged quickly, and the output images preserved the features of the input images with minimal information loss.
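The optimizer settings described above can be expressed as in the following sketch; the staircase exponential-decay schedule for stage one matches the stated factor-of-0.1 reduction every 10⁵ iterations, while how the 10⁻⁶ decay rate of stage two maps onto a schedule is left open here as an assumption.

```python
import tensorflow as tf

# Stage 1: generator only, MSE content loss. Adam with an initial learning rate
# of 1e-5, reduced by a factor of 0.1 every 1e5 iterations (staircase decay).
stage1_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-5, decay_steps=100_000, decay_rate=0.1, staircase=True)
stage1_optimizer = tf.keras.optimizers.Adam(learning_rate=stage1_schedule)

# Stage 2: transfer learning. Reuse the stage-1 generator weights, switch the
# content loss to the VGG loss, and train generator and discriminator jointly
# with the same optimizer and initial learning rate.
stage2_optimizer_g = tf.keras.optimizers.Adam(learning_rate=1e-5)
stage2_optimizer_d = tf.keras.optimizers.Adam(learning_rate=1e-5)

# generator.save_weights("stage1_generator.h5")   # end of stage 1 (hypothetical path)
# generator.load_weights("stage1_generator.h5")   # start of stage 2
```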

Contrast-to-noise ratio (CNR) and peak signal-to-noise ratio (PSNR) values were calculated to quantitatively evaluate the de-noising performance of the GAN model. CNR measures the contrast between a signal region and the background region and is defined in Eq. (6):

$CNR = \frac{1}{m} \sum_{i=1}^{m} \left[ 10 \log_{10} \left( \frac{|\mu_i - \mu_b|}{\sqrt{\sigma_i^2 + \sigma_b^2}} \right) \right]$  (6)

where $m$ is the total number of selected signal regions of interest (ROIs); we used three signal ROIs in our experiments. $\mu_i$ and $\sigma_i$ are the mean and standard deviation of the $i$-th signal ROI in the sample image, and $\mu_b$ and $\sigma_b$ are the mean and standard deviation of the background ROI.

PSNR is commonly used to measure image quality, and we computed the PSNR over the whole OCT image. The definition is shown in Eq. (7):

$PSNR = 10 \log_{10} \left( \frac{\max(I)^2}{MSE} \right)$  (7)

where $\max(I)$ represents the maximum pixel value of image $I$. The mean square error (MSE) is defined as

$MSE = \frac{1}{W \times H} \sum_{x=1}^{W} \sum_{y=1}^{H} \left( I_{x,y} - K_{x,y} \right)^2$  (8)

where $W$ and $H$ are the image width and height, and $I_{x,y}$ and $K_{x,y}$ represent the de-noised output image and the reference (i.e., ground truth) image, respectively.
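Eqs. (6)-(8) can be computed directly with NumPy, as in the sketch below. The ROI layout follows the paper (three manually selected signal ROIs and one background ROI), but the ROI coordinates themselves are supplied by the user and are not from the paper.

```python
import numpy as np

def cnr(image, signal_rois, background_roi):
    """Eq. (6): mean CNR in dB over the selected signal ROIs.

    signal_rois: list of (row_slice, col_slice) tuples (three in the paper);
    background_roi: a single (row_slice, col_slice) tuple.
    """
    bg = image[background_roi]
    mu_b, sigma_b = bg.mean(), bg.std()
    values = []
    for roi in signal_rois:
        sig = image[roi]
        mu_i, sigma_i = sig.mean(), sig.std()
        values.append(10 * np.log10(np.abs(mu_i - mu_b) / np.sqrt(sigma_i**2 + sigma_b**2)))
    return float(np.mean(values))

def psnr(image, reference):
    """Eqs. (7)-(8): PSNR of a de-noised image against the ground-truth reference."""
    image = image.astype(np.float64)
    reference = reference.astype(np.float64)
    mse = np.mean((image - reference) ** 2)
    return 10 * np.log10(image.max() ** 2 / mse)

# Example ROIs (coordinates are placeholders):
# signal_rois = [(slice(100, 140), slice(200, 260))] * 3
# value = cnr(denoised, signal_rois, (slice(10, 50), slice(10, 70)))
```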

3 |. RESULTS

We tested the trained SM-GAN model on new OCT images of static samples. Traditional de-noising methods, such as BM3D [19] and MSBTD [20], were applied to the same OCT images to compare de-noising performance, as were other deep learning based methods, including SRResNet [40] and SRGAN [40]. The SM-GAN method achieved the highest CNR and PSNR values among the compared methods. In addition to the static images, live-sample OCT datasets, including online benchmark retinal OCT images [20, 42], a human finger dataset, and fruit fly heartbeat videos, were tested with the SM-GAN model and with the traditional and other deep learning based de-noising methods. The de-noising results from both static and live-sample OCT datasets demonstrate that the SM-GAN model can reduce speckle noise while maintaining spatial resolution.

3.1 |. De-noising results with SM-GAN

Grape and chicken cross-sectional OCT images were selected as the testing data for the SM-GAN method. Two commonly used traditional OCT de-noising methods, BM3D [19] and MSBTD [20], and two state-of-the-art deep learning networks, SRResNet [40] and SRGAN [40], were tested for comparison. FIGURE 3 shows the original grape and chicken images, the 100-frame averaged ground truth images, and the de-noising results of BM3D, MSBTD, SRResNet, SRGAN, and SM-GAN. A selected area of each image is magnified in the middle columns of FIGURE 3 to show more detail. The SM-GAN output images maintain the resolution while reducing the speckle noise. In contrast, speckle noise remains visible in the output images of the BM3D and MSBTD methods, and although SRResNet and SRGAN remove the speckle noise, their outputs for both grape and chicken are blurred compared with the ground truth images.

FIGURE 3. Input and ground truth OCT images of chicken and grape, and the output images of different de-noising methods. Three signal regions (green) and one background region (blue) were manually selected for CNR calculation. The middle images are magnified views of the region marked by the red box. (a) Input OCT images of chicken and grape. (b) BM3D de-noised images. (c) MSBTD de-noised images. (d) SRResNet de-noised images. (e) SRGAN de-noised images. (f) SM-GAN de-noised images. (g) 100-frame averaged ground truth images.

In FIGURE 4, CNR and PSNR values for the grape and chicken images were calculated to quantitatively compare the de-noising performance of the different methods. CNR measures the contrast between the signal and the background. The SM-GAN method effectively reduces the speckle noise in the background while maintaining the signal information: the CNR values of the grape and chicken SM-GAN de-noised images (labeled with a red star in FIGURE 3(f)) are 12.58 dB and 13.89 dB, respectively, which are higher than those of the BM3D, MSBTD, SRResNet, and SRGAN methods (FIGURE 4(a)). Based on Eqs. (7) and (8), the larger the PSNR value, the better the de-noising performance. The SM-GAN model also achieved the best PSNR values among the traditional and deep learning based de-noising methods for both the grape and chicken images, 27.5 dB and 28.6 dB, respectively (FIGURE 4(b)).

FIGURE 4. CNR and PSNR evaluation plots for the grape and chicken OCT images. (a) CNR evaluation of the input data, the de-noising outputs of BM3D, MSBTD, SRResNet, SRGAN, and SM-GAN, and the ground truth image. (b) PSNR evaluation of the input data and the de-noising outputs of BM3D, MSBTD, SRResNet, SRGAN, and SM-GAN.

3.2 |. Retinal OCT images de-noised using SM-GAN

The above results demonstrate that the SM-GAN method removes speckle noise from OCT images of static samples. A significant challenge for de-noising images of live samples is that sample motion makes it difficult to acquire averaged ground truth images for training. We tested the SM-GAN method on three live samples: a human retina, a human finger, and the beating heart of a fruit fly.

First, the online benchmark retinal OCT dataset [20, 42] was processed with the BM3D, MSBTD, SRResNet, SRGAN, and SM-GAN methods to compare their de-noising performance. FIGURE 5(a-g) shows an example retinal OCT image without de-noising, the de-noising outputs of BM3D, MSBTD, SRResNet, SRGAN, and SM-GAN, and the ground truth image. In FIGURE 5(h-n), one region of the retinal image is magnified to show more detail. Speckle noise was greatly reduced with the SM-GAN method: detailed retinal information can be clearly observed and different layers can be resolved in the SM-GAN output image. Speckle noise remains visible in the retinal structure and background of the BM3D and MSBTD de-noised images, and the SRResNet output images are blurred. These images show that the SM-GAN based de-noising method works well for human retinal images.

FIGURE 5. De-noising retinal images with traditional and deep learning based methods. Both the high-noise input and the low-noise ground truth retinal images are from the online benchmark dataset. (a) High-noise input retinal image. (b) BM3D de-noised image. (c) MSBTD de-noised image. (d) SRResNet de-noised image. (e) SRGAN de-noised image. (f) SM-GAN de-noised image. (g) Ground truth image. (h-n) Magnified regions selected by the red boxes in (a-g).

3.3 |. 3D dataset of human finger de-noised using SM-GAN

The human finger dataset was processed with the BM3D, SRResNet, SRGAN, and SM-GAN methods to compare de-noising performance. FIGURE 6(a-e) shows the 3D volumetric rendering of the human finger dataset without de-noising (Video S1) and the de-noised datasets produced by BM3D (Video S2), SRResNet (Video S3), SRGAN (Video S4), and SM-GAN (Video S5). The 3D volumetric datasets were acquired with 800 × 600 A-scans. In FIGURE 6(f-j), we show one randomly selected frame from Video S6, which compares the corresponding cross-sections of the 3D dataset. Speckle noise was reduced with the SM-GAN method, and detailed structures such as sweat ducts and epidermal junctions can be clearly observed in the SM-GAN output image. Speckle noise remains visible in the finger structure and background of the BM3D and SRResNet outputs, and in the SRGAN output some detailed structures of the finger were not reconstructed well, causing information loss. These videos show that the SM-GAN based de-noising method works well for human finger videos.

FIGURE 6. 3D volumetric and cross-sectional images of a human finger without de-noising and the de-noised outputs of BM3D, SRResNet, SRGAN, and SM-GAN. (a) 3D dataset without de-noising (Video S1). (b) 3D dataset of the BM3D output (Video S2). (c) 3D dataset of the SRResNet output (Video S3). (d) 3D dataset of the SRGAN output (Video S4). (e) 3D dataset of the SM-GAN output (Video S5). (f-j) One frame of the combined finger video (Video S6) showing the input image and the de-noising outputs of BM3D, SRResNet, SRGAN, and SM-GAN.

3.4 |. Fly heartbeat video de-noising results with SM-GAN

Our group has used OCT to image and characterize fruit fly heart function [43-45]. In fly heartbeat OCT videos, speckle noise overlaps with the heart boundaries, which makes fly heart segmentation difficult. Image averaging cannot be used for fly heartbeat videos because of the fast dynamics of the heartbeat, and filter- and wavelet-based methods degrade spatial resolution, which could affect the quantification of fine details of the heart such as heart wall thickness and chamber size. Hence, the SM-GAN model provides a fast and robust de-noising method for fruit fly heartbeat videos.

FIGURE 7(a-e) shows one frame of a fly heartbeat video (Video S7) without de-noising and de-noised with four different methods: BM3D, SRResNet, SRGAN, and SM-GAN. We selected one axial scan at the same position in FIGURE 7(a-e), passing through the fly heart wall, and plotted the intensity profile for each image in FIGURE 7(f-j); the top point in FIGURE 7(a-e) corresponds to the leftmost point in FIGURE 7(f-j). The black rectangles in FIGURE 7 mark the interior of the fly heart tube, and the right-hand peaks in FIGURE 7(f-j) correspond to the fly heart wall. As shown in FIGURE 7(f-j), noise remains in the intensity plots for BM3D, SRResNet, and SRGAN, whereas noise inside the heart tube is minimized with the SM-GAN method, making it easier to achieve accurate heart tube segmentation from these low-noise images.

FIGURE 7. One frame of the fly heartbeat video (Video S7) for the input image and the de-noised output images of the BM3D, SRResNet, SRGAN, and SM-GAN methods, with column intensity plots and FWHM measurements of the corresponding peak. (a-e) Input fly heart image and the de-noised outputs of BM3D, SRResNet, SRGAN, and SM-GAN. (f-j) Selected column intensity profiles for the input image and the four methods' outputs; the black box marks the fly heart tube and the red box marks the fly heart wall. (k-o) FWHM measurements of the intensity peak of the input image and the four de-noised outputs, indicating the fly heart wall thickness.

The full width at half maximum (FWHM) of the peak in the right part of the intensity plot represents the fly heart wall thickness. FIGURE 7(k-o) shows a zoomed-in view of the intensity peak and the corresponding FWHM values. The FWHM values for BM3D, SRResNet, and SRGAN are 11.85 μm, 10.13 μm, and 9.85 μm, respectively, all larger than the 9.43 μm FWHM of the input image intensity peak; these increased FWHM values indicate image blurring and decreased spatial resolution. The FWHM for the SM-GAN de-noised output is 9.47 μm, similar to that of the input. Based on the fly heartbeat images and intensity profiles, the SM-GAN de-noising method outputs a low-noise fly heartbeat video with preserved spatial and temporal resolution, which is crucial for accurate fly heart segmentation.
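For reference, the FWHM of an intensity peak can be measured from a 1D profile as in the sketch below; the half-maximum level relative to the profile minimum and the linear interpolation at the crossings are implementation choices, not details from the paper.

```python
import numpy as np

def fwhm(profile, pixel_size_um):
    """Full width at half maximum of a single peak in a 1D intensity profile.

    profile: intensity values along the selected A-scan segment containing one
    peak (e.g., the fly heart wall); pixel_size_um: axial pixel spacing.
    """
    profile = np.asarray(profile, dtype=np.float64)
    half = profile.min() + (profile.max() - profile.min()) / 2.0
    above = np.nonzero(profile >= half)[0]
    left, right = int(above[0]), int(above[-1])

    def crossing(i_below, i_above):
        # linearly interpolate the half-maximum crossing between two samples
        frac = (half - profile[i_below]) / (profile[i_above] - profile[i_below])
        return i_below + frac * (i_above - i_below)

    x_left = crossing(left - 1, left) if left > 0 else float(left)
    x_right = crossing(right + 1, right) if right < len(profile) - 1 else float(right)
    return (x_right - x_left) * pixel_size_um
```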

4 |. DISCUSSION

In this paper, we developed a deep learning based SM-GAN de-noising method for OCT images and videos. Our SM-GAN approach improved de-noising performance from three perspectives: training data preparation, neural network structure, and training strategy. For training data collection, following Liba et al. [23], we built a speckle-modulating OCT (SM-OCT) system to generate low speckle ground truth images. Cross-sectional images were acquired 100 times at the same location of the sample while the optical diffuser was moving, and averaging the 100 B-scan frames significantly reduced the speckle noise. With this strategy, we could generate training data entirely from ex vivo imaging. To generalize the de-noising model, we acquired images from 30 different types of samples with the SM-OCT system at different acquisition speeds and resolutions; in total, 210,000 SM-OCT images were used for training and validating the neural network model.

Our SM-GAN design is based on the ResNet [46] structure, and skip connections [27] were used to train a deep neural network while minimizing information loss. In each residual block of the SM-GAN generator, the input and output are combined through an elementwise sum to preserve the information, and the generator input is also connected directly to the output of the 16 residual blocks to further preserve information from the input images. Among other GAN-based OCT de-noising methods, Ma et al. [35] employed a U-Net [47] based generator, which is difficult to train as a deep network without information loss, while Huang et al. [34] and Chen et al. [36] used DenseNet [48] based generators, which contain fewer training parameters but require much more GPU memory and computational power than ResNet-based networks [49]. In addition, unlike Chen et al. [36], both the generator and discriminator networks of our SM-GAN employ LeakyReLU layers [50, 51], which accelerate the training process and keep the backpropagation optimization from getting stuck [51].

Furthermore, our SM-GAN approach employed a two-step training strategy to improve de-noising performance. We used transfer learning when training the SM-GAN model to improve training efficiency and avoid undesired local optima; the transfer learning strategy provides a good starting point for the generator network and leads to quicker convergence and better performance in the second training step. Different content loss functions were used in the two training stages, first to speed up training and then to fine-tune the network while avoiding image over-smoothing.

We also compared the processing times of the different methods for the human finger and fly heartbeat videos. Using the SM-GAN model, processing 4,096 frames of the fly heartbeat video (each frame 256 × 256 pixels) took about 127 seconds, compared with 6,185 seconds for BM3D, 74.6 seconds for SRResNet, and 131 seconds for SRGAN. For the human finger 3D dataset with 600 frames (each frame 800 × 600 pixels), the SM-GAN model took 97.8 seconds to de-noise the whole dataset, versus 3,000 seconds for BM3D, 67.8 seconds for SRResNet, and 101.4 seconds for SRGAN. The SM-GAN, SRGAN, and SRResNet methods are thus much faster than BM3D, and considering the image blurring introduced by SRResNet and SRGAN, the SM-GAN method maintains the spatial resolution while providing better de-noising performance.

Although the SM-GAN de-noising method has demonstrated better de-speckling performance than traditional methods and some state-of-the-art deep learning methods, there is still room for improvement. Currently, our SM-GAN model removes speckle noise and detector noise from OCT images; however, other factors affecting OCT image quality, such as motion artifacts, were not considered, since our training datasets were prepared using static samples to achieve the best speckle noise reduction. In the future, additional training datasets containing other OCT noise sources can be added to make the SM-GAN model more robust. In addition, real-time video rendering requires processing at ~30 frames per second, whereas our current SM-GAN model processes only 3 frames (each 800 × 600 pixels) per second, which does not satisfy the requirement for real-time rendering. Further improvement of the SM-GAN model structure will be pursued to shorten the training and testing time and achieve de-noising at video rate.

5 |. CONCLUSION

In this paper, we developed a generative adversarial network, called SM-GAN, to de-noise OCT images and videos. We imaged 30 types of samples using the SM-OCT system at different speeds and resolutions to generate low speckle ground truth images, and in total 210,000 OCT images were used to train and test the SM-GAN model. We compared the de-noising performance of SM-GAN with commonly used de-noising methods such as BM3D, MSBTD, SRResNet, and SRGAN on OCT images of chicken, grape, and the human retina, a 3D OCT dataset of the human finger, and an OCT video of a beating fruit fly heart. The experimental results show that our SM-GAN de-noising method can effectively reduce speckle noise while maintaining spatial and temporal resolution for both OCT images and videos.

Supplementary Material

Video S1: 3D rendering of the human finger without de-noising.

Video S2: 3D rendering of the human finger de-noised with the BM3D method.

Video S3: 3D rendering of the human finger de-noised with the SRResNet method.

Video S4: 3D rendering of the human finger de-noised with the SRGAN method.

Video S5: 3D rendering of the human finger de-noised with our SM-GAN method.

Video S6: Direct comparison of the de-noising effect on the human finger OCT dataset processed with different de-noising methods.

Video S7: Direct comparison of the de-noising effect on fruit fly heartbeat OCT images (50 frames) processed with different de-noising methods.

ACKNOWLEDGEMENTS

The authors would like to thank Jinyun Zou, Yongyang Huang, Jing Men, and Zhiwen Yang for helpful discussions. This work was supported by NSF grants IDBR (DBI-1455613) and PFI: AIR-TT (IIP-1640707), and by NIH grants R15EB019704 and R01EB025209.

Footnotes

SUPPORTING INFORMATION

Additional Supporting Information may be found online in the supporting information tab for this article.

References

[1]. Huang D, Swanson EA, Lin CP, Schuman JS, Stinson WG, Chang W, Hee MR, Flotte T, Gregory K, Puliafito CA. Science. 1991, 254, 1178.
[2]. Wojtkowski M. Appl. Opt. 2010, 49, D30–D61.
[3]. Fujimoto J, Swanson E. Investigative Ophthalmology & Visual Science. 2016, 57, OCT1–OCT13.
[4]. Klein T, Huber R. Biomed. Opt. Express 2017, 8, 828–859.
[5]. Grulkowski I, Liu JJ, Potsaid B, Jayaraman V, Lu CD, Jiang J, Cable AE, Duker JS, Fujimoto JG. Biomed. Opt. Express 2012, 3, 2733–2751.
[6]. Klein T, Wieser W, Eigenwillig CM, Biedermann BR, Huber R. Opt. Express 2011, 19, 3044–3062.
[7]. Klein T, Wieser W, Reznicek L, Neubauer A, Kampik A, Huber R. Biomed. Opt. Express 2013, 4, 1890–1908.
[8]. Reiber J, Tu S, Tuinenburg J, Koning G, Janssen J, Dijkstra J. Cardiovascular Diagnosis and Therapy. 2011, 1, 57–70.
[9]. Tsai T-H, Potsaid B, Tao YK, Jayaraman V, Jiang J, Heim PJS, Kraus MF, Zhou C, Hornegger J, Mashimo H, Cable AE, Fujimoto JG. Biomed. Opt. Express 2013, 4, 1119–1132.
[10]. Welzel J. Skin Research and Technology. 2001, 7, 1–9.
[11]. Gambichler T, Moussa G, Sand M, Sand D, Altmeyer P, Hoffmann K. Journal of Dermatological Science. 2005, 40, 85–94.
[12]. Otis LL, Everett MJ, Sathyam US, Colston BW. The Journal of the American Dental Association. 2000, 131, 511–514.
[13]. Bashkansky M, Reintjes J. Opt. Lett. 2000, 25, 545–547.
[14]. Schmitt JM, Xiang SH, Yung KM. Journal of Biomedical Optics. 1999, 4, 95–105.
[15]. Hee MR, Izatt JA, Swanson EA, Huang D, Schuman JS, Lin CP, Puliafito CA, Fujimoto JG. JAMA Ophthalmology. 1995, 113, 325–332.
[16]. Boyer K, Herzog A, Roberts C. IEEE Transactions on Medical Imaging. 2006, 25, 553–570.
[17]. Lee K, Abràmoff MD, Niemeijer M, Garvin MK, Sonka M, in SPIE Medical Imaging: Biomedical Applications in Molecular, Structural, and Functional Imaging, San Diego, CA, March 2010, pp. 33.
[18]. Ozcan A, Bilenca A, Desjardins AE, Bouma BE, Tearney GJ. J. Opt. Soc. Am. A Opt. Image Sci. Vis. 2007, 24, 1901–1910.
[19]. Chong B, Zhu Y-K. Optics Communications. 2013, 291, 461–469.
[20]. Fang L, Li S, Nie Q, Izatt JA, Toth CA, Farsiu S. Biomed. Opt. Express 2012, 3, 927–942.
[21]. Chong GT, Farsiu S, Freedman SF, Sarin N, Koreishi AF, Izatt JA, Toth CA. Archives of Ophthalmology. 2009, 127, 37–44.
[22]. Redding B, Choma MA, Cao H. Nature Photonics. 2012, 6, 355.
[23]. Liba O, Lew MD, SoRelle ED, Dutta R, Sen D, Moshfeghi DM, Chu S, de la Zerda A. Nature Communications. 2017, 8, 15845.
[24]. Krizhevsky A, Sutskever I, Hinton GE, in Proceedings of the 25th International Conference on Neural Information Processing Systems, Lake Tahoe, Nevada, December 2012, pp. 1097–1105.
[25]. LeCun Y, Bengio Y, Hinton G. Nature. 2015, 521, 436.
[26]. Long J, Shelhamer E, Darrell T, in 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, June 2015, pp. 3431–3440.
[27]. Ronneberger O, Fischer P, Brox T, in Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, Munich, Germany, October 2015, pp. 234–241.
[28]. Haralick RM, Shanmugam K, Dinstein I. IEEE Transactions on Systems, Man, and Cybernetics. 1973, SMC-3, 610–621.
[29]. Ledig C, Theis L, Huszár F, Caballero J, Cunningham A, Acosta A, Aitken A, Tejani A, Totz J, Wang Z, Shi W, in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, Hawaii, July 2017, pp. 105–114.
[30]. Goodfellow IJ, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y, in Proceedings of the 27th International Conference on Neural Information Processing Systems, Montreal, Canada, December 2014, pp. 2672–2680.
[31]. Xu T, Zhang P, Huang Q, Zhang H, Gan Z, Huang X, He X, in 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, June 2018, pp. 3431–3440.
[32]. Dong C, Loy CC, He K, Tang X. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2016, 38, 295–307.
[33]. Zhu J-Y, Krähenbühl P, Shechtman E, Efros AA, in Computer Vision – ECCV 2016, Amsterdam, Netherlands, October 2016, pp. 597–613.
[34]. Huang Y, Lu Z, Shao Z, Ran M, Zhou J, Fang L, Zhang Y. Opt. Express 2019, 27, 12289–12307.
[35]. Ma Y, Chen X, Zhu W, Cheng X, Xiang D, Shi F. Biomed. Opt. Express 2018, 9, 5129–5146.
[36]. Chen Z, Zeng Z, Shen H, Zheng X, Dai P, Ouyang P. Biomedical Signal Processing and Control. 2020, 55, 101632.
[37]. Yu X, Porikli F, in Computer Vision – ECCV 2016, Amsterdam, Netherlands, October 2016, pp. 318–333.
[38]. Mathieu M, Couprie C, Lecun Y, in International Conference on Learning Representations, San Juan, Puerto Rico, May 2016.
[39]. Johnson J, Alahi A, Fei-Fei L, in Computer Vision – ECCV 2016, Amsterdam, Netherlands, October 2016, pp. 694–711.
[40]. Ledig C, Theis L, Huszár F, Caballero J, Cunningham A, Acosta A, Aitken A, Tejani A, Totz J, Wang Z, Shi W, in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, Hawaii, July 2017, pp. 105–114.
[41]. Kingma D, Ba J, in International Conference on Learning Representations, Banff, Canada, April 2014.
[42]. Fang L, Li S, McNabb R, Nie Q, Kuo A, Toth C, Izatt J, Farsiu S. IEEE Transactions on Medical Imaging. 2013, 32.
[43]. Alex A, Li A, Tanzi RE, Zhou C. Science Advances. 2015, 1, e1500639.
[44]. Men J, Huang Y, Solanki J, Zeng X, Alex A, Jerwick J, Zhang Z, Tanzi RE, Li A, Zhou C. IEEE Journal of Selected Topics in Quantum Electronics. 2015, 22, 120–132.
[45]. Men J, Jerwick J, Wu P, Chen M, Alex A, Ma Y, Tanzi RE, Li A, Zhou C. JoVE (Journal of Visualized Experiments). 2016, e55002.
[46]. He K, Zhang X, Ren S, Sun J, in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, June 2016, pp. 770–778.
[47]. Long J, Shelhamer E, Darrell T, in 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, June 2015, pp. 3431–3440.
[48]. Huang G, Liu Z, van der Maaten L, Weinberger K, in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, July 2017, pp. 2261–2269.
[49]. Bianco S, Cadène R, Celona L, Napoletano P. IEEE Access. 2018, 6, 64270–64277.
[50]. Glorot X, Bordes A, Bengio Y, in Proceedings of the 28th International Conference on International Conference on Machine Learning, Bellevue, Washington, USA, 2011, pp. 315–323.
[51]. Xu B, Wang N, Chen T, Li M, in 32nd International Conference on Machine Learning (ICML 2015), Lille, France, July 2015, pp. 2342–2350.
