Biomedical Optics Express. 2022 Mar 8;13(4):1924–1938. doi: 10.1364/BOE.445319

Image-to-image translation of label-free molecular vibrational images for a histopathological review using the UNet+/seg-cGAN model

Yunjie He 1, Jiasong Li 1, Steven Shen 2, Kai Liu 1, Kelvin K Wong 1,3, Tiancheng He 1, Stephen T C Wong 1,2,3,*
PMCID: PMC9045908  PMID: 35519236

Abstract

Translating images generated by label-free microscopy imaging, such as Coherent Anti-Stokes Raman Scattering (CARS), into more familiar clinical presentations of histopathological images will help the adoption of real-time, spectrally resolved label-free imaging in clinical diagnosis. Generative adversarial networks (GAN) have made great progress in image generation and translation, but have been criticized for lacking precision. In particular, GAN have often misinterpreted image information and identified incorrect content categories during image translation of microscopy scans. To alleviate this problem, we developed a new Pix2pix GAN model that simultaneously learns to classify image content from a segmentation dataset during image translation training. Our model integrates UNet+ with seg-cGAN, a conditional generative adversarial network with partial regularization of segmentation. Technical innovations of the UNet+/seg-cGAN model include: (1) replacing UNet with UNet+ as the Pix2pix cGAN’s generator to enhance pattern extraction and the richness of the gradient, and (2) applying the partial regularization strategy to train part of the generator network as a segmentation sub-model on a separate segmentation dataset, thus enabling the model to identify correct content categories during image translation. The quality of histopathological-like images generated from label-free CARS images is thereby improved significantly.

1. Introduction

Coherent anti-Stokes Raman scattering (CARS) is a nonlinear four-wave mixing process that enhances the weak (spontaneous) Raman signal by several orders of magnitude [1]. CARS is a label-free, three-dimensional (3D), real-time optical imaging modality that can provide biological tissue images with sub-cellular resolution [2–4]. The CARS modality has been used in disease research and pathological diagnosis by imaging target tissue both ex vivo and in vivo [5–8]. In this manuscript, we develop a method to convert images from a high-resolution, label-free, 3D-capable optical imaging system into H&E pseudo-stained images. Only a few imaging modalities fall into this category, such as autofluorescence imaging, second-harmonic generation (SHG), and third-harmonic generation (THG). These modalities can achieve high-resolution label-free images, but they are not suitable for all tissue types [9]. CARS is sensitive to certain chemical bonds and is not limited to a particular tissue type; therefore, by tuning the wavelength of the laser source in our CARS system, we can image an array of different tissue types. This chemical-bond contrast has the potential to support pathological diagnosis in a manner similar to histopathologically stained images, such as hematoxylin-eosin (H&E) staining, for viewing tissue architecture. However, pathologists and clinical investigators are used to interpreting images in the H&E presentation.

Prior research shows the potential for stain-less staining or virtual staining [10,11]. Stain-less staining for computed histopathology is possible by translating infrared (IR) microscopy to H&E images without the need for labeling agents, and it can provide high-quality images in some cases. However, IR microscopy is constrained in its applications and outcomes by fundamental limitations, including high spectral similarity between spectra of different bacteria, strong overlap and broadening of adjacent signals, incompatibility with acquiring spectra from samples containing intrinsic water, substrate background signals producing “false” signals not associated with the sample, weak signal intensities resulting in a low signal-to-noise ratio (SNR), and excessive spectral noise [12,13]. In addition, tissue autofluorescence, defined as the natural ability of tissue to fluoresce when exposed to certain wavelengths of light, is not suitable for imaging most tissue types [14].

In this paper, we developed a deep-learning-enabled image-to-image translation model that automatically maps label-free CARS images into an H&E image presentation, sidestepping the need for additional tissue preparation and staining steps. Generative adversarial networks (GAN) were introduced by Ian J. Goodfellow et al. in 2014 [15–19]. Their novel structure attracted broad attention in areas such as data augmentation, data enhancement, and data generation. Particularly in the computer vision domain, GAN are more flexible and provide a new angle for image translation and augmentation. GAN are composed of two parts: the generator and the discriminator. The generator produces images from random noise (or from input images with prior information) to confuse the discriminator, while the discriminator aims to distinguish these pseudo but realistic-looking images from real ones. The generator and discriminator compete with each other during training and reach optimized states as the loss functions converge. Early GAN were built with fully connected layers and became more powerful once combined with convolutional layers.

Owing to GAN’s effectiveness, many variant structures have been developed. For example, in a conditional GAN the generator input can be either random noise or other meaningful data, paving the way for image translation [20]. CycleGAN can train an image-to-image translation model that is not highly dependent on paired datasets, thanks to its additional inverse-mapping structure [21]. WGAN introduces the Wasserstein distance to improve convergence during training [22,23]. LSGAN and hinge-loss-based GAN bring different objective functions to improve performance [24–28]. Further, many GAN innovations have been created for specific tasks, such as Δ-GAN using semi-supervised learning, and DiscoGAN and DualGAN applied to transfer learning [29–31]. GAN usually perform better for image translation than traditional methods. The goals of image translation vary across medical imaging applications, so it is hard to design an objective function for a specific task and even harder to generalize it. Nevertheless, the discriminator in a GAN overcomes these problems by being trained along with the generator. This optimization process allows a GAN to avoid hand-crafting a complex loss/Dice function that is often not generalizable.

The reliability of output images is always the key issue in applying GAN to medical image generation and translation, and many improvements have been made. For example, the Laplacian generative adversarial networks (LAPGAN) were designed by Denton et al. to upgrade the resolution of output images [32]. Isola et al. introduced Pix2pix, a supervised image translation model [18], whose appearance improved performance significantly. The L1 regularization loss in Pix2pix improved robustness over other conditional GAN, particularly for medical image generation, translation, and even denoising. Chen et al. applied GAN to magnetic resonance imaging (MRI) reconstruction and computed tomography (CT) generation based on MRI [27]. Medical image translation is a rapidly emerging research area for GAN [17,33–35], notably their applications to image translation between CT and MRI scans [21,36–38].

In this paper, the subject is translating CARS images into the H&E style using GAN [39]. If the CARS-H&E translation technology is successful, pathological diagnosis based on a 3D-capable CARS imaging modality has the potential to replace H&E staining, allowing label-free imaging of tissue to be presented in the familiar H&E format without the need for staining in practical use. However, there are hardly any image-to-image translation applications reported for CARS-H&E stained images. Recent works in this area translate to H&E-style images based on multimodal CARS/TPEF/SHG [40,41]. They show great potential for translating multichannel label-free microscopy into pseudo-H&E staining microscopy. Unfortunately, the resulting image contrast is often inaccurate, containing misleading information for pathological diagnosis.

In the current study, we propose a more accurate and robust image translation method and illustrate the model using thyroid cancer tissue images acquired by CARS microscopy. In contrast to prior work that transferred only the color style of H&E images, we leverage ground-truth tissue segmentation labels in H&E microscopy during model training and use them to significantly improve the accuracy of image translation into the pseudo-H&E presentation, where tissue labels are not available. Our model integrates UNet+ with seg-cGAN, a conditional generative adversarial network with partial regularization of segmentation. It replaces UNet with UNet+ as the Pix2pix cGAN’s generator to enhance pattern extraction and the richness of the gradient, and applies the partial regularization strategy to train part of the generator network as a segmentation sub-model on a separate segmentation dataset, thus enabling the model to identify correct content categories during image translation.

2. Materials and methods

Figure 1 shows the automated image-to-image translation pipeline that maps label-free CARS images to pathologic H&E images by integrating image preprocessing and translation with an improved Pix2pix conditional GAN model, called UNet+/seg-cGAN.

Fig. 1. The automated imaging pipeline developed for CARS-H&E image translation.

2.1. Data collection

The following two datasets were established for model training: (1) the CARS-H&E image translation dataset (200 scans/10 cases) and (2) the CARS segmentation dataset (200 scans/10 cases). The two datasets are paired and share identical CARS images scanned from thyroid tissues. To construct the CARS-H&E dataset, the same group of tissues was H&E stained and photographed. Furthermore, the CARS images were labeled by pathologists and saved as mask images; we obtained the CARS segmentation dataset by combining these masks with their corresponding CARS images.

Human thyroid tissue was obtained from patients undergoing thyroid/parathyroid surgery at Houston Methodist Hospital, Houston, Texas, USA and Shanghai General Hospital, Shanghai, P.R. China, following institutional review board approval. The excised tissue samples were cut into 5 mm chunks and immediately snap-frozen in liquid nitrogen for storage. Frozen tissue samples were passively thawed for 30 minutes at room temperature before CARS imaging. Each sample was first imaged by the CARS system and immediately sent for H&E staining afterward (Fig. 2).

Fig. 2. The imaging system for acquiring data.

The tissue samples were placed on a 170 µm cover slide (VWR, Radnor, PA, USA) and then inverted onto an imaging chamber to avoid possible compression. For artificial-intelligence-augmented CARS (iCARS) imaging, the pump beam was tuned to 802 nm while the Stokes beam was fixed at 1040 nm to probe the symmetric stretching frequency of CH2 bonds at 2845 cm−1 [42]. iCARS signals were generated at 663 nm. Images (1024 × 1024 pixels) were acquired and displayed in real time using the ThorImage 3.0 software (Thorlabs, Inc.); the average laser output power was about 75 mW for the pump beam and 35 mW for the Stokes beam. Bright-field images of the H&E slides were examined with an Olympus BX51 microscope as the standard control.

2.2. Data preprocessing methods

It is important to preprocess the raw data before model training; in practice, preprocessing usually improves training. The following three sub-sections describe the key preprocessing procedures we applied to optimize model performance.

2.2.1. Edge fading correction

Microscopy images generated by CARS often show intensity fading near or at the image edges. This fading margin seriously affects the statistical distribution of image pixel values and reduces model performance if left uncorrected. To resolve the fading, we deployed the BaSiC method proposed by Peng et al., a tool that applies intensity normalization based on low-rank and sparse decomposition to the target images [43]. This largely reduced the edge fading compared with the original CARS images.
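BaSiC estimates flat-field (and optionally dark-field) components through low-rank and sparse decomposition; reproducing it is beyond this sketch. The snippet below is only a naive stand-in, assuming a single-channel image with values in [0, 255] and a Gaussian-blur estimate of the smooth shading field, to illustrate what correcting edge fading amounts to in practice; it is not the BaSiC algorithm nor the tool we used.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def naive_flatfield_correction(img, sigma=64, eps=1e-6):
    """Crude stand-in for shading correction: estimate a smooth shading
    field with a large Gaussian blur, divide it out, then rescale."""
    img = img.astype(np.float32)
    shading = gaussian_filter(img, sigma=sigma)
    shading /= (shading.mean() + eps)              # normalize field to mean 1
    corrected = img / (shading + eps)
    corrected *= img.mean() / (corrected.mean() + eps)  # restore intensity range
    return np.clip(corrected, 0, 255)
```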

2.2.2. Image registration

Many GAN, such as style-transfer GAN, allow training without paired images [44–46]. However, these GAN can rarely be applied to medical image translation due to their lack of precision, and well-matched image pairs always bring a large improvement to model performance. Since each tissue is imaged twice on different scanners, the two images will be slightly misaligned or distorted even if the tissue was photographed carefully. Image registration is therefore indispensable for producing well-aligned image pairs and optimizing dataset quality.

Image registration includes two procedures: locating reference points and transforming the image [47,48]. Finding the reference points is one of the most important steps, because no matter how sophisticated the transformation method is, the output will be wrong if the reference points are located incorrectly. However, the imaging principles of CARS and H&E are vastly different: CARS images have only one channel, while H&E images are in RGB format. Consequently, automated registration methods easily make mistakes in locating reference points, ruining the entire registration.

We implemented a semi-automatic image registration Python tool that displays high-probability reference points, inspired by [49]. The tool allows manual adjustment of the reference points when they are located incorrectly. A registration sample is shown in Fig. 3(B). After the reference points were verified, an affine transformation was applied to obtain the registered H&E image (see the third image in Fig. 3(B)).
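The affine-warp step can be sketched with OpenCV as follows, assuming the manually verified reference points are available as matched (x, y) coordinate lists; the function name and arguments are illustrative and not the interface of our in-house tool.

```python
import cv2
import numpy as np

def register_he_to_cars(he_img, ref_pts_he, ref_pts_cars, out_shape):
    """Warp an H&E image onto the CARS frame using manually verified
    reference points (N x 2 arrays of (x, y) coordinates, N >= 3)."""
    src = np.asarray(ref_pts_he, dtype=np.float32)
    dst = np.asarray(ref_pts_cars, dtype=np.float32)
    # Robust least-squares affine fit over all verified point pairs
    matrix, _inliers = cv2.estimateAffine2D(src, dst, method=cv2.RANSAC)
    h, w = out_shape[:2]
    return cv2.warpAffine(he_img, matrix, (w, h), flags=cv2.INTER_LINEAR)
```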

Fig. 3. (A) Representative sample of edge-shading correction. (B) Image registration pairs CARS and H&E images generated from the same tissue sample; the reference points (yellow dots) were used for the transformation that pairs the images, and the image in the last column is the H&E image after registration. (C) Illustration of the augmentation process: the yellow rectangles are the cropped useful areas of the original images after registration, the three differently colored points are random points generated from a uniform distribution, and the white squares are the cropped images. (D) Cropped CARS and H&E images of the same thyroid tissue and the corresponding mask image, ready for model training.

2.2.3. Image augmentation

After registering the original H&E images, we can further augment the data by cropping and rotating image pairs and adding noise to the CARS images. The crop size is set to 256 × 256. Each coordinate of the crop's top-left corner is drawn from the uniform distribution

$$ \mathrm{uniform}([0,\, s - 256]), $$

where s is the side length of the registered image, and a 256 × 256 patch is then cropped at that corner (Fig. 3(C)).

The 200 raw image pairs acquired from the scanners were randomly split into training (100), validation (40), and testing (60) datasets. Scans from the same case were never placed in different datasets. After registration and augmentation, the number of image pairs was multiplied by 40 (i.e., train:validation:test = 4000:1600:2400). Pixel values of the CARS images were perturbed and normalized by the data generator function

$$ f(x) = \begin{cases} -1, & \text{if } x + z < 0 \\[2pt] 1, & \text{if } x + z > 255 \\[2pt] \dfrac{x + z}{127.5} - 1, & \text{if } 0 \le x + z \le 255 \end{cases} $$

where f is the data generator function, x denotes the pixel values of the original input images, and z is the added noise with $z \sim \mathrm{uniform}([-0.1,\, 0.1])$.
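A minimal augmentation sketch follows, assuming square registered images of side length s with CARS pixel values in [0, 255]. It draws one random crop shared by the CARS image, the H&E image, and the mask, applies a random rotation (restricted to 90° multiples here, our simplification), and normalizes the CARS patch with the data generator function f(x) above; names are illustrative, and the paper's augmentation produced 40 such draws per registered pair.

```python
import numpy as np

RNG = np.random.default_rng(0)

def augment_pair(cars, he, mask, crop=256, noise_amp=0.1):
    """One augmentation draw: shared random 256x256 crop, random 90-degree
    rotation, and the f(x) noise/normalization applied to the CARS channel."""
    s = cars.shape[0]
    y0, x0 = RNG.integers(0, s - crop, size=2)     # corner ~ uniform([0, s-256])
    k = RNG.integers(0, 4)                         # number of 90-degree turns
    def take(img):
        return np.rot90(img[y0:y0 + crop, x0:x0 + crop], k)
    cars_c, he_c, mask_c = take(cars), take(he), take(mask)
    z = RNG.uniform(-noise_amp, noise_amp, size=cars_c.shape)
    cars_c = np.clip((cars_c + z) / 127.5 - 1.0, -1.0, 1.0)   # f(x)
    return cars_c, he_c, mask_c
```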

2.2.4. CARS segmentation dataset

The CARS segmentation dataset is based on the CARS-H&E image dataset. The CARS images were manually labeled by pathologists into three content categories: thyroid follicle (orange), background (blue), and cells (grey). The segmentation dataset was augmented simultaneously with the CARS-H&E image dataset. Finally, we have 4000, 1600, and 2400 image pairs in the training, validation, and test datasets, respectively (Fig. 3(D)).
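For the categorical cross-entropy loss used later, the color-coded masks must be converted into per-class maps. The sketch below shows one way to do this; the exact RGB values for the orange, blue, and grey labels are assumptions, not the values used in our dataset.

```python
import numpy as np

# Color code assumed from Fig. 3(D): orange = thyroid follicle,
# blue = background, grey = cells (exact RGB values are a guess).
CLASS_COLORS = {0: (255, 165, 0), 1: (0, 0, 255), 2: (128, 128, 128)}

def mask_to_onehot(mask_rgb, tol=30):
    """Convert a color-coded mask (H x W x 3, uint8) into a one-hot map
    (H x W x C) usable with a categorical cross-entropy loss."""
    h, w, _ = mask_rgb.shape
    onehot = np.zeros((h, w, len(CLASS_COLORS)), dtype=np.float32)
    for c, color in CLASS_COLORS.items():
        dist = np.abs(mask_rgb.astype(np.int32) - np.array(color)).sum(axis=-1)
        onehot[..., c] = (dist < tol).astype(np.float32)
    return onehot
```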

2.3. Post-processing

Since a CARS image is cropped into 256 × 256 pieces before being fed into the model, a stitching strategy is needed to reconstruct the pseudo-H&E image from the outputs. We therefore developed a stitching strategy to organize the outputs efficiently. Because the model easily makes mistakes at the edges of an image, the input CARS image was deliberately cropped so that adjacent pieces partially overlap.

During the experiments, we found that efficiency decreases if the overlapped area is too large, which becomes especially evident when the width of the overlapped edge exceeds half the size of the input image. However, the advantage of the stitching strategy is insignificant when the overlapped edges are extremely narrow. Our cropping strategy therefore finds an optimized width that divides the boundary length without leaving remainder pixels. The following formula gives the optimized width of the overlapped edges,

$$ \min_{\substack{0 < x < C/2,\; x \in \mathbb{N}}} x \quad \text{subject to} \quad (h - x) \bmod (C - x) = 0, $$

where x is the optimized overlapped edge width, C is the cropped image size (C = 256 here), and h represents the width of the original image.

Through this formula, we decrease abrupt transitions between adjacent cropped images, thereby minimizing information loss, increasing precision, and smoothing the stitched edges. The constraint also keeps the computational cost from increasing dramatically, since only a reasonable overlap width x has to be found.
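Reading h as the full width of the image to be stitched and C as the crop size, the optimized overlap can be found by a direct search, as in this short sketch:

```python
def optimal_overlap(h, C=256):
    """Smallest overlap width x (0 < x < C/2) such that tiles of size C with
    stride C - x cover a length-h image exactly: (h - x) % (C - x) == 0."""
    for x in range(1, C // 2):
        if (h - x) % (C - x) == 0:
            return x
    return None  # no exact tiling exists; fall back to padding

# Example: a 1024-pixel-wide CARS scan with 256-pixel crops
print(optimal_overlap(1024))  # -> 64 (stride 192, five tiles per row)
```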

The stitching method deals with the overlapped edges of the outputs and determines the final image quality. Here we propose two efficient stitching rules, referred to below as the mean dice and the median dice.

2.3.1. Mean dice

The mean dice averages the pixel values from the overlapped area of outputs mapped from cropped CARS images. For the pixel p on the channel c, we have:

$$ q_c = \frac{1}{M} \sum_{m \in \{1, 2, \ldots, M\}} p_c^m $$

where M is the number of overlapped images at the current pixel, $p_c^m$ is the predicted value of the pixel p at channel c, and $q_c$ represents the average value for the pixel p at channel c.

2.3.2. Median dice

The median dice calculates the median of the pixel values from the overlapped area of the outputs, which are mapped from cropped CARS images. For the pixel p on the channel c, we have:

$$ q_c = \operatorname*{Median}_{m \in \{1, 2, \ldots, M\}} p_c^m $$

where M is the number of overlapped images at the current pixel, $p_c^m$ is the predicted value of the pixel p at channel c, and $q_c$ is the median value for the pixel p at channel c.

2.3.3. Dynamic dice

Since the quality of a CARS image at the edges is far lower than in the middle, the likelihood of outliers there increases greatly. The median dice is thus a better choice at image edges because it is less sensitive to outliers. However, the mean dice is more accurate when the variance of the pixel values is low. Therefore, the median dice is only applied when:

$$ \left| \mathrm{Mean}_{p_i} - \mathrm{Median}_{p_i} \right| > \frac{1}{n} \sum_{j \in \{1, \ldots, n\}} \left| \mathrm{Mean}_{p_j} - \mathrm{Median}_{p_j} \right| $$

where $p_i$ is the current pixel and n is the number of pixels in the target image. The dynamic dice is beneficial when the mean and median differ substantially, but this situation is rare.
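A compact sketch of the three stitching dice, assuming every pixel of the output canvas is covered by at least one tile and that tile corner coordinates are known; the dynamic rule here switches to the median wherever the mean-median gap exceeds its image-wide average, which is our reading of the criterion above.

```python
import numpy as np

def stitch(tiles, coords, out_shape, crop=256):
    """Combine overlapping 256x256 generator outputs into one image using
    the mean, median, and dynamic dice described above.
    tiles:  list of (crop, crop, 3) arrays; coords: list of (row, col) corners."""
    h, w, c = out_shape
    stack = np.full((len(tiles), h, w, c), np.nan, dtype=np.float32)
    for i, ((r0, c0), tile) in enumerate(zip(coords, tiles)):
        stack[i, r0:r0 + crop, c0:c0 + crop] = tile
    mean_img = np.nanmean(stack, axis=0)      # mean dice
    median_img = np.nanmedian(stack, axis=0)  # median dice
    # dynamic dice: use the median only where |mean - median| exceeds
    # its image-wide average, i.e. where outliers are likely
    gap = np.abs(mean_img - median_img)
    use_median = gap > gap.mean()
    return np.where(use_median, median_img, mean_img)
```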

2.4. Methods

Although the GAN model has brought huge benefits to image generation and transformation, there is still room to improve the presentation of architectural details, especially when applying the model to highly sensitive patient images. We thus propose a new model, termed UNet+/seg-cGAN, to improve the fidelity and quality of medical image translation. The new model significantly reduces information loss during CARS-H&E stained image translation and increases accuracy compared with the prevalent Pix2pix conditional GAN model.

Considering that the quality of translated images from CARS is highly sensitive to information loss, the generator’s structure has been carefully designed so that the information extracted by each convolutional layer can be efficiently utilized.

Image-to-image translation starts with a basic image reconstruction structure, the encoder-decoder, which has two desirable properties: (1) the encoder extracts information at different levels by applying sequential blocks of neural network layers, and (2) the decoder reconstructs an image of the same shape as the input [19,50]. The layers in an encoder-decoder architecture do not have to be convolutional; for example, images in the MNIST dataset have a resolution of only 28 × 28 × 1 and can simply be flattened as input to networks built from dense layers.

Unlike the basic encoder-decoder architecture, U-Net is almost always built with convolutional layers, which are powerful for image feature extraction [51]. Its special skip-connection structure allows convolutional and deconvolutional layers at the same block level to be directly connected. This design helps vary the gradients and delivers more information to the final layers, hence improving the loss convergence process [52,53].

The Pix2pix GAN model uses U-Net as its generator and achieves better performance than the encoder-decoder. However, a limitation of U-Net remains: the skip-connection combinations between convolutional and deconvolutional blocks are restricted by the requirement that a convolutional block and its corresponding deconvolutional block have the same shape. To enrich the skip-connections and further vary the gradients, UNet+ is introduced.

2.4.1. UNet+ GAN

UNet+, originally designed for image segmentation, is introduced into the Pix2pix GAN model in our research. UNet+ overcomes these limitations through its redesigned skip-connections. Figure 4(A) shows UNet+, designed by Zhou et al., set as the GAN's generator. The skip-connections of U-Net are replaced by up- and down-samplings at different depths [50,54]. The downward, upward, and horizontal arrows in Fig. 4 represent down-sampling, up-sampling, and skip-connections between convolutional blocks, respectively. The different depths in UNet+ vary the feature extraction levels and enrich the gradients during loss optimization compared with the U-Net structure.
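A minimal depth-three Keras sketch of the nested-skip idea follows, using the node naming X(i,j) of Zhou et al.; the actual generator used in this work is deeper, and its exact filter configuration is not reproduced here.

```python
import tensorflow as tf
from tensorflow.keras import layers

def conv_block(x, filters):
    """Two 3x3 convolutions, as in each UNet+/UNet++ node."""
    for _ in range(2):
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return x

def build_unetpp_generator(depth_filters=(64, 128, 256)):
    """Minimal depth-3 UNet++-style generator with nested skip connections."""
    f0, f1, f2 = depth_filters
    inputs = layers.Input((256, 256, 1))              # single-channel CARS tile
    x00 = conv_block(inputs, f0)
    x10 = conv_block(layers.MaxPooling2D()(x00), f1)
    x20 = conv_block(layers.MaxPooling2D()(x10), f2)
    # nested decoder nodes: each concatenates its same-level predecessors
    x01 = conv_block(layers.concatenate([x00, layers.UpSampling2D()(x10)]), f0)
    x11 = conv_block(layers.concatenate([x10, layers.UpSampling2D()(x20)]), f1)
    x02 = conv_block(layers.concatenate([x00, x01, layers.UpSampling2D()(x11)]), f0)
    # the top-level nodes are concatenated before the final layer (cf. Fig. 4(A))
    out = layers.Conv2D(3, 1, activation="tanh")(layers.concatenate([x01, x02]))
    return tf.keras.Model(inputs, out, name="unetpp_generator")
```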

Fig. 4. (A) The structure of the UNet+ GAN's generator. Layers X0,1, X0,2, and X0,3 are directly concatenated and then connected to the final layer, which enriches the gradients of the loss function. (B) The UNet+/seg-cGAN's generator. Compared with (A), the UNet+/seg-cGAN's generator compiles the red-triangle part as an image segmentation model, which is trained on the CARS segmentation dataset simultaneously with the entire generator network.

2.4.2. UNet+/seg-cGAN

Although different categories of content can share the same color gamut, they differ in their biological meaning. For example, white pixels may not indicate similar components across content categories: as shown in Fig. 3(D), the white region under the blue mask is background, while thyroid follicles containing white artifacts are labeled under the orange mask. Providing the generator with a priori information, such as content categories, helps it make better translation decisions. The proposed UNet+/seg-cGAN method is designed to address this issue. Figure 4(B) shows its generator's structure. Based on UNet+, the part of the network inside the red triangle was compiled with the categorical cross-entropy loss ($L_{seg}$) as the segmentation model, while the whole network was compiled with the cGAN loss. The optimization problem thus becomes:

$$ \min_G \max_D \; \mathcal{L}_{cGAN}(D, G) = \mathbb{E}_{x \sim P_{data}}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))] + \lambda L(G) $$

where G is the UNet+ generator, D is the convolutional discriminator, z represents the noise generated from the uniform distribution and added to the CARS images during image augmentation, $\mathcal{L}_{cGAN}(D,G)$ represents the overall objective, and x represents samples randomly chosen from the real H&E stained image dataset. The regularization part is multiplied by a constant λ (0.5–1), and θ represents the parameters related to segmentation:

$$ L_1(G) = \mathbb{E}_{x,y,z}\!\left[\, |y - G(x,z)| \,\right], \qquad L_2(G) = \mathbb{E}_{x,y,z}\!\left[\, |y - G(x,z)|^2 \,\right], $$
$$ L_{seg}(G_{\theta_{seg}}) = -\sum_{c=1}^{C} y_c \log G_{c,\theta_{seg}}(x,z) $$

$L_{seg}(G_{\theta_{seg}})$ represents the categorical cross-entropy loss of the segmentation model, where C is the number of content categories in the segmentation dataset. Its related parameters $\theta_{seg}$ are also involved in the GAN's optimization. We name this strategy partial regularization; it restricts the regularization effect to specific layers and thereby injects additional information into the target model (see Algorithm 1).
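The partial regularization can be wired in Keras by building the segmentation sub-model from the generator's own graph, so that its parameters $\theta_{seg}$ are literally shared with the front part of the generator. The sketch below assumes a functional Keras generator and takes the name of the boundary layer of the "red triangle" region as a parameter, since the paper does not specify it numerically.

```python
import tensorflow as tf
from tensorflow.keras import layers

def add_segmentation_head(generator, shared_layer_name, num_classes=3):
    """Build the segmentation sub-model of Fig. 4(B): it reuses the front part
    of the generator up to `shared_layer_name` (assumed to sit at full spatial
    resolution, e.g. a nested X0,j node) and adds a softmax head, so its
    parameters are shared with the generator."""
    shared = generator.get_layer(shared_layer_name).output
    seg_out = layers.Conv2D(num_classes, 1, activation="softmax",
                            name="seg_head")(shared)
    return tf.keras.Model(generator.input, seg_out, name="seg_sub_model")
```

With this wiring, any update driven by the segmentation loss also moves the shared generator weights, which is the partial-regularization effect described above.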

Overall, we made the following improvements over Pix2pix cGAN. First, the UNet generator of Pix2pix cGAN was replaced by UNet+ to enhance feature extraction during convolutional operations and to vary the loss function's gradients. Second, the front part of the generator was compiled as a segmentation sub-model. During optimization of the whole UNet+, this sub-model was trained simultaneously on a segmentation dataset in which all CARS images were labeled with masks of the different content categories. We call this strategy partial regularization.

3. Results

The models were trained on a TITAN RTX NVIDIA GPU with the Adam optimizer (learning rate = 1e-3 to 1e-5, beta1 = 0.6, and beta2 = 0.5). Training finished when the average change of the generator's mean square error loss was less than 1e-4.

Algorithm 1. The procedure for training UNet+ seg-cGAN.
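Since the Algorithm 1 listing is given as a graphic in the published version, the following is only a hedged sketch of one alternating training step as we read it: update the discriminator on real/fake pairs, then update the generator with the adversarial term, the L1 term, and the segmentation cross-entropy computed through the shared sub-model. The λ weighting of the two regularization terms and the logits assumption for the discriminator are our choices, not necessarily the authors' exact settings.

```python
import tensorflow as tf

@tf.function
def train_step(gen, seg_head, disc, g_opt, d_opt, cars, he, seg_mask, lam=0.5):
    bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)
    cce = tf.keras.losses.CategoricalCrossentropy()
    # (1) discriminator update on a real pair and a generated pair
    with tf.GradientTape() as d_tape:
        fake_he = gen(cars, training=True)
        d_real = disc(tf.concat([cars, he], axis=-1), training=True)
        d_fake = disc(tf.concat([cars, fake_he], axis=-1), training=True)
        d_loss = bce(tf.ones_like(d_real), d_real) + bce(tf.zeros_like(d_fake), d_fake)
    d_opt.apply_gradients(zip(d_tape.gradient(d_loss, disc.trainable_variables),
                              disc.trainable_variables))
    # (2) generator update: adversarial + L1 + partial regularization (L_seg)
    with tf.GradientTape() as g_tape:
        fake_he = gen(cars, training=True)
        d_fake = disc(tf.concat([cars, fake_he], axis=-1), training=True)
        seg_pred = seg_head(cars, training=True)   # shares the generator's front layers
        g_loss = (bce(tf.ones_like(d_fake), d_fake)
                  + lam * tf.reduce_mean(tf.abs(he - fake_he))
                  + lam * cce(seg_mask, seg_pred))
    seen = {id(v) for v in gen.trainable_variables}
    g_vars = gen.trainable_variables + [v for v in seg_head.trainable_variables
                                        if id(v) not in seen]
    g_opt.apply_gradients(zip(g_tape.gradient(g_loss, g_vars), g_vars))
    return d_loss, g_loss
```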

3.1. Model performance

In Fig. 5, three representative sample groups from the test dataset illustrate the fidelity and quality of the pseudo-H&E images. Owing to differences in the imaging mechanisms of CARS and H&E, many tissue details in the real H&E stained images cannot be found in the corresponding CARS image. However, the UNet+ generator makes better use of the information in CARS images and accordingly preserves more biological structures during image translation. The segmentation sub-model also helps identify content categories, strengthening model performance.

Fig. 5. Representative samples from the test results. The shading-corrected CARS images are the inputs of UNet+/seg-cGAN. The pseudo-H&E and predicted-mask columns are images generated by the model. The label H&E and label mask columns are the gold standards: the real H&E stained images and the segmentation masks of content categories.

In Sample 4 of Fig. 5, the black area at the bottom left of the CARS image was correctly classified and translated into the pseudo-H&E staining image, in spite of some details missing from the CARS image compared with its real H&E staining image. Since information cannot be invented, tissue details absent from the CARS image reasonably do not appear in its pseudo-H&E image.

To analyze how much UNet+/seg-cGAN improves the translation, we compared UNet+ with other generators such as UNet and a basic encoder-decoder without skip-connections. These generators were also tested with different filter numbers. In a standard UNet generator, the filter number of an encoder block usually doubles relative to its previous block, while the filter number of a decoder block is usually half that of the previous one. Blocks at the same level in the encoder and decoder are connected by a skip connection. We call the filter number of the first encoder block the basic filter number, which equals the filter number of the final decoder block; it is the smallest block filter number in the generator.

Since there are two main improvements, namely the generator upgrade from UNet to UNet+ and the simultaneous training of the segmentation sub-model, UNet cGAN, UNet+ cGAN, and UNet+ cGAN with the segmentation sub-model were trained with the same basic filter number to illustrate the performance difference between generators. Larger UNet and UNet+ generators with a basic filter number of 128 are included for reference.

A UNet cGAN with a segmentation sub-model was not constructed: UNet+ has special skip-connection layers that make it straightforward to build a segmentation sub-model, whereas the direct skip-connections between encoder and decoder blocks in the standard UNet do not allow training such a sub-model. Table 1 displays the results of the models on the test dataset. The first column of Table 1 lists the model names followed by the basic filter numbers of their generators. The second column contains the number of generator parameters of each model. The parameter numbers are identical between the generators in UNet+ cGAN and UNet+ seg-cGAN with the same basic filter number, because the sub-model training does not modify the generator network. In the third column, the structural similarity index (SSIM) measures how similar the pseudo-H&E image is to its corresponding real H&E image.

Table 1. Performance of conditional GAN with different generators.

Generators (basic filter number) Number of parameters SSIM MSE
U-Net (64) 16,665,219 0.5148 (±0.167) 0.0731 (±0.025)
U-Net (128) 66,622,723 0.6911 (±0.152) 0.0655 (±0.014)
UNet+ (64) 21,948,483 0.7220 (±0.145) 0.0296 (±0.019)
UNet+ (128) 87,675,011 0.7591 (±0.116) 0.0254 (±0.011)
UNet+/seg (32) 5,501,987 0.6179 (±0.125) 0.0129 (±0.013)
UNet+ CycleGAN (64) 21,948,483 0.8125 (±0.109) 0.0281 (±0.089)
UNet+/seg (64) 21,948,483 0.8917 (±0.071) 0.0079 (±0.004)
UNet+/seg (128) 87,675,011 0.9066 (±0.065) 0.0076 (±0.003)

The images are all scaled between 0 and 1 before the SSIM calculation, following the same range rule for both input arrays. The SSIM score is a real number between 0 and 1; the higher the score, the more similar the two images. The last column reports the mean square error (MSE), which measures the difference between two related images; the lower the MSE, the more similar the images.
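A short sketch of how these two metrics can be computed with scikit-image, assuming 8-bit RGB inputs scaled to [0, 1] as described; the channel_axis argument requires a recent scikit-image release (older versions use multichannel=True instead).

```python
import numpy as np
from skimage.metrics import structural_similarity, mean_squared_error

def score_pair(pseudo_he, real_he):
    """SSIM and MSE between a pseudo-H&E image and its real counterpart,
    with both arrays scaled to [0, 1] first."""
    a = pseudo_he.astype(np.float64) / 255.0
    b = real_he.astype(np.float64) / 255.0
    ssim = structural_similarity(a, b, channel_axis=-1, data_range=1.0)
    mse = mean_squared_error(a, b)
    return ssim, mse
```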

Table 1 shows that the UNet+/seg strategy is the best performer, with the same or fewer parameters, as measured by the SSIM and MSE metrics. This is because additional information from the segmentation dataset is imported by training the CARS-H&E generator and the CARS segmentation model simultaneously. The CARS segmentation model is a sub-model of the CARS-H&E generator and shares most of its parameters with the latter. The information about content categories helps the generator better understand the CARS images and ameliorates the convergence process. Since the target task is translating CARS images into H&E images, at inference time we do not need the sub-model and directly use the UNet+ network obtained from UNet+ seg-cGAN, which performs better than the UNet+ from UNet+ cGAN; the generator's performance is improved while the generator structure stays identical. In addition, the performance of the pseudo-H&E stained images is slightly improved by the proposed stitching strategies. There are some prior works related to similar research [55,56]. In this paper, we compare our model to CycleGAN, even though their inputs are based on three types of scans [41]. We re-trained a CycleGAN model on our data; it performed worse than UNet+ seg-cGAN. UNet+ seg-cGAN recognizes content categories better and therefore produces fewer incorrect color mappings. For example, in ex2 of Fig. 6, CycleGAN failed to recognize the background and produced a wrong translation, while UNet+ seg-cGAN handled it correctly.

Fig. 6. Five test samples after stitching. The first column contains the CARS images used as UNet+/seg-cGAN's inputs. Pseudo-H&E images generated by UNet+/seg-cGAN and UNet+ CycleGAN are shown in the second and third columns. Key structures are largely preserved compared with the real H&E images; however, UNet+ CycleGAN exhibits far more color-inversion problems than UNet+/seg-cGAN.

Table 2 shows the MSE and SSIM of the stitching area between pseudo images and real H&E stained images; N/A is the control strategy (without any stitching method). The groups implemented with stitching perform slightly better than the control. Figure 6 displays samples stitched with the dynamic method. Zooming into the tissue details, there is almost no trace of splicing in the pseudo-H&E images. The tissue structures in the images become more coherent, which reduces the probability of misdiagnosis.

Table 2. Stitching strategies’ comparison.

Metrics Mean Median Dynamic N/A
MSE 0.0074 (±0.006) 0.0071 (±0.007) 0.0079 (±0.004) 0.0085 (±0.009)
SSIM 0.8810 (±0.072) 0.8807 (±0.074) 0.8917 (±0.071) 0.8751 (±0.079)

4. Discussion

In the model, UNet was replaced by UNet+ and the partial regularization strategy was applied with UNet+. Partial regularization allows UNet+/seg-cGAN to train a sub-model simultaneously, which brings additional information from the CARS segmentation dataset into the GAN's generator and improves convergence during training. Accordingly, the generator for CARS-H&E image translation improves significantly without an increase in its number of parameters.

The perceptual validation tests whether the pseudo-H&E images generated by UNet+/seg-cGAN can fool humans. As with most other GAN models, the output images are always slightly fuzzy, which allows random participants without any pathological knowledge to filter out the pseudo images with an accuracy above 70%. The outputs also have slightly lower quality than the gold standards, because information that is absent in the CARS images but present in the targeted H&E images cannot be invented during image-to-image translation. The goal of our work is to generate an H&E-like presentation of label-free CARS images, which is more helpful for pathologists' interpretation and makes the label-free modality more acceptable for pathologic diagnosis.

The pathologists participating in this study stated that the results show the huge potential of the model for pathological diagnosis based only on images translated from CARS; however, some important structures are still missing after image translation. Figure 6 illustrates such examples. The pseudo-H&E images maintain most of the tissue architectural information and many cytological details. Nonetheless, in the top corner of ex1 of Fig. 6, the black artifacts may confuse pathologists during diagnosis. The pseudo images also tend to have lower quality than the actual H&E images due to the resolution of the input CARS images; for example, cellular clarity and nuclear or chromatin detail are less obvious.

The translation from CARS images will allow pathologists and clinicians to skip the tissue staining procedure during diagnosis, saving time and cost. The improvements made in this paper contribute to this ultimate goal. Coupled with endoscopy techniques, real-time acquisition and presentation of label-free CARS images in H&E format would enable in vivo pathology diagnosis.

Acknowledgments

We acknowledge the Houston Methodist research pathology core for tissue staining and thank Rebecca Danforth for proofreading the manuscript.

Funding

John S. Dunn Foundation (10.13039/100006988); Ting Tsung and Wei Fong Chao Family Foundation (10.13039/100015164).

Disclosures

The authors declare that there are no conflicts of interest related to this article.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. Rodriguez L. G., Lockett S. J., Holtom G. R., “Coherent anti-stokes Raman scattering microscopy: A biological review,” Cytometry, Part A 69A(8), 779–791 (2006). doi: 10.1002/cyto.a.20299
2. Gao L., Zhou H., Thrall M. J., Li F., Yang Y., Wang Z., Luo P., Wong K. K., Palapattu G. S., Wong S. T., “Label-free high-resolution imaging of prostate glands and cavernous nerves using coherent anti-Stokes Raman scattering microscopy,” Biomed. Opt. Express 2(4), 915–926 (2011). doi: 10.1364/BOE.2.000915
3. Evans C. L., Xu X., Kesari S., Xie X. S., Wong S. T., Young G. S., “Chemically-selective imaging of brain structures with CARS microscopy,” Opt. Express 15(19), 12076–12087 (2007). doi: 10.1364/OE.15.012076
4. Yang Y., Li F., Gao L., Wang Z., Thrall M. J., Shen S. S., Wong K. K., Wong S. T., “Differential diagnosis of breast cancer using quantitative, label-free and molecular vibrational imaging,” Biomed. Opt. Express 2(8), 2160–2174 (2011). doi: 10.1364/BOE.2.002160
5. Gao L., Wang Z., Li F., Hammoudi A. A., Thrall M. J., Cagle P. T., Wong S. T., “Differential diagnosis of lung carcinoma with coherent anti-Stokes Raman scattering imaging,” Arch. Pathol. Lab. Med. 136(12), 1502–1510 (2012). doi: 10.5858/arpa.2012-0238-SA
6. Uckermann O., Galli R., Tamosaityte S., Leipnitz E., Geiger K. D., Schackert G., Koch E., Steiner G., Kirsch M., “Label-free delineation of brain tumors by coherent anti-Stokes Raman scattering microscopy in an orthotopic mouse model and human glioblastoma,” PLoS One 9(9), e107115 (2014). doi: 10.1371/journal.pone.0107115
7. Galli R., Sablinskas V., Dasevicius D., Laurinavicius A., Jankevicius F., Koch E., Steiner G., “Non-linear optical microscopy of kidney tumours,” J. Biophotonics 7(1-2), 23–27 (2014). doi: 10.1002/jbio.201200216
8. Legesse F. B., Medyukhina A., Heuke S., Popp J., “Texture analysis and classification in coherent anti-Stokes Raman scattering (CARS) microscopy images for automated detection of skin cancer,” Comput. Med. Imag. Graph. 43, 36–43 (2015). doi: 10.1016/j.compmedimag.2015.02.010
9. Chien C.-H., Chen W.-W., Wu J.-T., Chang T.-C., “Label-free imaging of Drosophila in vivo by coherent anti-Stokes Raman scattering and two-photon excitation autofluorescence microscopy,” J. Biomed. Opt. 16(1), 016012 (2011). doi: 10.1117/1.3528642
10. Mayerich D., Walsh M. J., Kadjacsy-Balla A., Ray P. S., Hewitt S. M., Bhargava R., “Stain-less staining for computed histopathology,” Technology 03(01), 27–31 (2015). doi: 10.1142/S2339547815200010
11. Rivenson Y., Wang H., Wei Z., de Haan K., Zhang Y., Wu Y., Günaydın H., Zuckerman J. E., Chong T., Sisk A. E., “Virtual histological staining of unlabelled tissue-autofluorescence images via deep learning,” Nat. Biomed. Eng. 3(6), 466–477 (2019). doi: 10.1038/s41551-019-0362-y
12. Ojeda J. J., Dittrich M., “Fourier transform infrared spectroscopy for molecular analysis of microbial cells,” in Microbial Systems Biology (Springer, 2012), pp. 187–211.
13. Alvarez-Ordonez A., Mouwen D., Lopez M., Prieto M., “Fourier transform infrared spectroscopy as a tool to characterize molecular composition and stress response in foodborne pathogenic bacteria,” J. Microbiol. Methods 84(3), 369–378 (2011). doi: 10.1016/j.mimet.2011.01.009
14. Baletic N., Petrovic Z., Pendjer I., Malicevic H., “Autofluorescent diagnostics in laryngeal pathology,” Euro. Archives Oto-Rhino-Laryngol. Head Neck 261(5), 233–237 (2004). doi: 10.1007/s00405-003-0668-x
15. Goodfellow I. J., Pouget-Abadie J., Mirza M., Xu B., Warde-Farley D., Ozair S., Courville A., Bengio Y., “Generative adversarial networks,” arXiv preprint arXiv:1406.2661 (2014).
16. Gui J., Sun Z., Wen Y., Tao D., Ye J., “A review on generative adversarial networks: Algorithms, theory, and applications,” arXiv preprint arXiv:2001.06937 (2020).
17. Yi X., Walia E., Babyn P., “Generative adversarial network in medical imaging: A review,” Med. Image Anal. 58, 101552 (2019). doi: 10.1016/j.media.2019.101552
18. Isola P., Zhu J.-Y., Zhou T., Efros A. A., “Image-to-image translation with conditional adversarial networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2017), pp. 1125–1134.
19. Badrinarayanan V., Kendall A., Cipolla R., “SegNet: A deep convolutional encoder-decoder architecture for image segmentation,” IEEE Trans. Pattern Anal. Mach. Intell. 39(12), 2481–2495 (2017). doi: 10.1109/TPAMI.2016.2644615
20. Mirza M., Osindero S., “Conditional generative adversarial nets,” arXiv preprint arXiv:1411.1784 (2014).
21. Lu Y., Tai Y.-W., Tang C.-K., “Attribute-guided face generation using conditional CycleGAN,” in Proceedings of the European Conference on Computer Vision (ECCV) (2018), pp. 282–297.
22. Gulrajani I., Ahmed F., Arjovsky M., Dumoulin V., Courville A., “Improved training of Wasserstein GANs,” arXiv preprint arXiv:1704.00028 (2017).
23. Arjovsky M., Chintala S., Bottou L., “Wasserstein generative adversarial networks,” in International Conference on Machine Learning (PMLR, 2017), pp. 214–223.
24. Mao X., Li Q., Xie H., Lau R. Y., Wang Z., Paul Smolley S., “Least squares generative adversarial networks,” in Proceedings of the IEEE International Conference on Computer Vision (2017), pp. 2794–2802.
25. Mao X., Li Q., Xie H., Lau R. Y., Wang Z., Smolley S. P., “On the effectiveness of least squares generative adversarial networks,” IEEE Trans. Pattern Anal. Mach. Intell. 41(12), 2947–2960 (2019). doi: 10.1109/TPAMI.2018.2872043
26. Miyato T., Kataoka T., Koyama M., Yoshida Y., “Spectral normalization for generative adversarial networks,” arXiv preprint arXiv:1802.05957 (2018).
27. Lim J. H., Ye J. C., “Geometric GAN,” arXiv preprint arXiv:1705.02894 (2017).
28. Tran D., Ranganath R., Blei D. M., “Hierarchical implicit models and likelihood-free variational inference,” arXiv preprint arXiv:1702.08896 (2017).
29. Gan Z., Chen L., Wang W., Pu Y., Zhang Y., Liu H., Li C., Carin L., “Triangle generative adversarial networks,” arXiv preprint arXiv:1709.06548 (2017).
30. Kim T., Cha M., Kim H., Lee J. K., Kim J., “Learning to discover cross-domain relations with generative adversarial networks,” in International Conference on Machine Learning (PMLR, 2017), pp. 1857–1865.
31. Yi Z., Zhang H., Tan P., Gong M., “DualGAN: Unsupervised dual learning for image-to-image translation,” in Proceedings of the IEEE International Conference on Computer Vision (2017), pp. 2849–2857.
32. Denton E., Chintala S., Szlam A., Fergus R., “Deep generative image models using a Laplacian pyramid of adversarial networks,” arXiv preprint arXiv:1506.05751 (2015).
33. Chen Y., Shi F., Christodoulou A. G., Xie Y., Zhou Z., Li D., “Efficient and accurate MRI super-resolution using a generative adversarial network and 3D multi-level densely connected network,” in International Conference on Medical Image Computing and Computer-Assisted Intervention (Springer, 2018), pp. 91–99.
34. Oulbacha R., Kadoury S., “MRI to CT synthesis of the lumbar spine from a pseudo-3D cycle GAN,” in 2020 IEEE 17th International Symposium on Biomedical Imaging (ISBI) (IEEE, 2020), pp. 1784–1787.
35. Singh N. K., Raza K., “Medical image generation using generative adversarial networks: A review,” in Health Informatics: A Computational Perspective in Healthcare (2021), pp. 77–96.
36. Liu R., Lei Y., Wang T., Zhou J., Roper J., Lin L., McDonald M. W., Bradley J. D., Curran W. J., Liu T., “Synthetic dual-energy CT for MRI-only based proton therapy treatment planning using label-GAN,” Phys. Med. Biol. 66(6), 065014 (2021). doi: 10.1088/1361-6560/abe736
37. Tie X., Lam S. K., Zhang Y., Lee K. H., Au K. H., Cai J., “Pseudo-CT generation from multi-parametric MRI using a novel multi-channel multi-path conditional generative adversarial network for nasopharyngeal carcinoma patients,” Med. Phys. 47(4), 1750–1762 (2020). doi: 10.1002/mp.14062
38. Emami H., Dong M., Nejad-Davarani S., Glide-Hurst C., “SA-GAN: Structure-aware generative adversarial network for shape-preserving synthetic CT generation,” arXiv preprint arXiv:2105.07044 (2021).
39. Chan J. K., “The wonderful colors of the hematoxylin–eosin stain in diagnostic surgical pathology,” Int. J. Surg. Pathol. 22(1), 12–32 (2014). doi: 10.1177/1066896913517939
40. Bocklitz T. W., Salah F. S., Vogler N., Heuke S., Chernavskaia O., Schmidt C., Waldner M. J., Greten F. R., Bräuer R., Schmitt M., “Pseudo-HE images derived from CARS/TPEF/SHG multimodal imaging in combination with Raman-spectroscopy as a pathological screening tool,” BMC Cancer 16(1), 534 (2016). doi: 10.1186/s12885-016-2520-x
41. Pradhan P., Meyer T., Vieth M., Stallmach A., Waldner M., Schmitt M., Popp J., Bocklitz T., “Computational tissue staining of non-linear multimodal imaging using supervised and unsupervised deep learning,” Biomed. Opt. Express 12(4), 2280–2298 (2021). doi: 10.1364/BOE.415962
42. Lee J. Y., Kim S.-H., Moon D. W., Lee E. S., “Three-color multiplex CARS for fast imaging and microspectroscopy in the entire CHn stretching vibrational region,” Opt. Express 17(25), 22281–22295 (2009). doi: 10.1364/OE.17.022281
43. Peng T., Thorn K., Schroeder T., Wang L., Theis F. J., Marr C., Navab N., “A BaSiC tool for background and shading correction of optical microscopy images,” Nat. Commun. 8(1), 1–7 (2017). doi: 10.1038/s41467-016-0009-6
44. Gatys L. A., Ecker A. S., Bethge M., “Image style transfer using convolutional neural networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016), pp. 2414–2423.
45. Johnson J., Alahi A., Fei-Fei L., “Perceptual losses for real-time style transfer and super-resolution,” in European Conference on Computer Vision (Springer, 2016), pp. 694–711.
46. Jing Y., Yang Y., Feng Z., Ye J., Yu Y., Song M., “Neural style transfer: A review,” IEEE Trans. Visual. Comput. Graphics 26(11), 3365–3385 (2020). doi: 10.1109/TVCG.2019.2921336
47. Zitova B., Flusser J., “Image registration methods: a survey,” Image Vis. Computing 21(11), 977–1000 (2003). doi: 10.1016/S0262-8856(03)00137-9
48. Nag S., “Image registration techniques: a survey,” arXiv preprint arXiv:1712.07540 (2017).
49. Murphy K., van Ginneken B., Klein S., Staring M., de Hoop B. J., Viergever M. A., Pluim J. P., “Semi-automatic construction of reference standards for evaluation of image registration,” Med. Image Anal. 15(1), 71–84 (2011). doi: 10.1016/j.media.2010.07.005
50. Szegedy C., Ioffe S., Vanhoucke V., Alemi A., “Inception-v4, Inception-ResNet and the impact of residual connections on learning,” in Proceedings of the AAAI Conference on Artificial Intelligence (2017).
51. Ronneberger O., Fischer P., Brox T., “U-Net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention (Springer, 2015), pp. 234–241.
52. Wang H., Li Y., Luo Z., “An improved breast cancer nuclei segmentation method based on UNet++,” in Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence (2020), pp. 193–197.
53. Lei M., Li J., Li M., Zou L., Yu H., “An improved UNet++ model for congestive heart failure diagnosis using short-term RR intervals,” Diagnostics 11(3), 534 (2021). doi: 10.3390/diagnostics11030534
54. Zhou Z., Siddiquee M. M. R., Tajbakhsh N., Liang J., “UNet++: Redesigning skip connections to exploit multiscale features in image segmentation,” IEEE Trans. Med. Imaging 39(6), 1856–1867 (2020). doi: 10.1109/TMI.2019.2959609
55. Rivenson Y., Ozcan A., “Toward a thinking microscope: Deep learning in optical microscopy and image reconstruction,” arXiv preprint arXiv:1805.08970 (2018).
56. Chen B.-C., Sung J., Lim S.-H., “Chemical imaging with frequency modulation coherent anti-Stokes Raman scattering microscopy at the vibrational fingerprint region,” J. Phys. Chem. B 114(50), 16871–16880 (2010). doi: 10.1021/jp104553s
