AMIA Summits on Translational Science Proceedings. 2022 May 23;2022:323–330.

CycleGAN with Dynamic Criterion for Malaria Blood Cell Image Synthetization

Zhaohui Liang 1, Jimmy Xiangji Huang 1
PMCID: PMC9285136  PMID: 35854731

Abstract

We present a cycle-consistent adversarial network (Cycle GAN) with a dynamic criterion to synthesize images of blood cells parasitized by malaria plasmodia. The results show that 100% of the synthetic images are correctly classified by the pretrained classifier, compared with 99.61% of the real images and 76.6% of the images generated by the Cycle GAN without the dynamic criterion. The average Frechet Inception Distance (FID) of the images generated by the enhanced Cycle GAN is 0.0043 (Std=0.0005), which is significantly lower than the FID of the variational autoencoder (VAE) model (0.0085, Std=0.0007). We conclude that the new Cycle GAN model with dynamic criterion can generate high quality malaria infected blood cell images with good diversity. The method provides a new augmentation technique to enhance image diversity where the acquisition of well-annotated images is highly restricted, and to improve the robustness of automatic medical image processing by deep neural networks.

Introduction

Malaria is a tropical infectious disease caused by plasmodium infection. The microorganism is usually transmitted by mosquito bites and parasitizes human blood cells (particularly red blood cells). According to WHO statistics, there were 409,000 malaria-related deaths in 2019, and the cumulative death toll since 2000 has reached 7.6 million [1]. The standard technique for fast malaria screening is microscopic counting of malaria infected blood cells by medical professionals. This method is inefficient because it is time and labor consuming and depends heavily on individual expertise. To solve this problem, automatic image processing technology has been applied to malaria diagnosis since 2005 [2]. In 2016, our team developed a convolutional neural network (CNN) with 6 convolutional layers for the classification of malaria infected blood cells. The CNN was trained on a dataset of 27,578 blood cell images (class ratio 1:1) and reached an average accuracy of 97.37% [3]. Subsequent studies also report extremely high classification accuracy [4,5]. However, these results all rely on a large, well-annotated dataset for training the CNN models. In most cases, large annotated medical image datasets are difficult to acquire. If the medical images are annotated by non-medical persons, the quality of the image data is questionable because of the lack of expertise. Therefore, we seek a solution that minimizes the human expert intervention required for deep neural network (DNN) optimization.

Another drawback of DNN is that the specific patterns in medical images differ from those in general-purpose images such as the ImageNet dataset. As a result, when transfer learning is used to fine-tune a DNN pretrained on ImageNet for medical images, the pretrained feature extractors usually cannot effectively capture the medically significant patterns through the complex architecture and instead develop meaningless combinations for the final decision. In our previous work on CNN for malaria blood cell image classification, the transfer learning approach had lower accuracy (91.99%) than the randomly initialized CNN (97.37%) [3]. A recent study reveals that seemingly high-performance DNN models for COVID-19 detection from chest X-ray images are vulnerable to adversarial attacks [6].

The common strategy to improve DNN performance is to enhance the diversity of the training images by augmentation such as random rotation, flipping, and jittering. However, these conventional augmentation methods are unsuitable for most medical tasks, such as images of histological cells and tissues or X-ray photos. Image-based medical diagnosis usually requires structural completeness and correct image alignment because the diagnosis is usually based on the comparison between normal and abnormal structures. Random augmentation techniques are likely to break the structural completeness or the position alignment. As a result, the DNN models are likely to capture wrong combinations of patterns or artifacts instead of the correct ones that match human knowledge. This is a possible explanation for the vulnerability of DNN in medical image processing. In response to this challenge, we turn to generative models for image augmentation. A generative model learns the data distribution in an unsupervised manner and can then generate new data with reasonable variations from the learned distribution. In this work, we apply two approaches to generative learning: the variational autoencoder (VAE) and the generative adversarial network (GAN). The difference is that a VAE maximizes the evidence lower bound (ELBO) of the data distribution, while a GAN is optimized by reaching an equilibrium between the generator model and the discriminator model. The merit of using generative models for image augmentation is that both VAE and GAN generate synthetic images with an acceptable extent of randomness, which can effectively simulate the real context of medical practice.

In the remainder of this paper, we briefly introduce the rationale of VAE and GAN. We then present the details of our new cycle-consistent adversarial network (Cycle GAN), which uses a dynamic criterion to compute an extra loss term of the objective, the experimental settings, and the results of comparing the VAE, the new Cycle GAN, and a conventional Cycle GAN without the criterion. Finally, we draw a conclusion based on the analysis and propose a solution for malaria blood cell image synthetization.

Rationale and Methods

Deep generative learning, or generative DNN, is an unsupervised learning approach that learns the distribution of the training data with a given DNN architecture and then generates new data points belonging to the learned distribution with random variance. A generative DNN cannot learn the exact distribution of the training data, either explicitly or implicitly, but it can approximate the true parameters by different modeling techniques. There are two main methods for generative DNN: the variational autoencoder (VAE) and the generative adversarial network (GAN). In this study, we mainly use the GAN approach to synthesize the malaria infected blood cell images. The VAE model is implemented for synthetic image comparison.

VAE is a generative model introduced in 2013 [7]. Given the observed dataset $X = \{x^{(1)}, x^{(2)}, \dots, x^{(i)}\}$, a VAE is composed of two networks. The encoder is a DNN parameterized by $\phi$ that estimates the posterior distribution of the latent variable $z$ given $X$, written $q_\phi(z|X)$, where the training data points are taken as observations to estimate the parameters of the conditional distribution of the latent representation $Z$. The decoder is another DNN parameterized by $\theta$ that estimates the conditional distribution of the observed data $p_\theta(X|z)$, where the input is a sample $z$ (usually the output of the encoder). The optimization objective of a VAE can be written as:

$$\mathcal{L}(\theta,\phi;X) = \omega D_{KL}\big(q_\phi(z|X)\,\|\,p_\theta(z)\big) - \mathbb{E}_{q_\phi(z|X)}\big[\log p_\theta(X|z)\big] \tag{1}$$

where the reverse Kullback-Leibler (KL) divergence measures the distance between the posterior distribution of $z$, $q_\phi(z|X)$, parameterized by the encoder and the prior distribution of $z$, $p_\theta(z)$, parameterized by the decoder. The second term on the right side of Equation (1) is the expected negative log-likelihood, which measures the expected error of reconstructing the data points belonging to $X$ from the latent space $Z$. We aim to maximize the log-likelihood $\log p_\theta(x) \ge \mathrm{ELBO}$, where ELBO is the evidence lower bound. With $\omega = 1$, $\mathcal{L}(\theta,\phi;X)$ is the negative ELBO, so the lower bound on $\log p_\theta(x)$ is maximized when $\mathcal{L}(\theta,\phi;X)$ is minimized. Given the GPU support, we choose to compute the analytic KL divergence and do not use the reparameterization trick. In addition, the weight $\omega$ of the KL divergence term is a crucial hyperparameter for VAE performance. If $\omega$ is too small, it cannot effectively regularize the $q_\phi(z|X)$ term, so a $z$ sampled from $q_\phi(z)$ will come from a very low-density region of $q_\phi(z|X)$. Conversely, if $\omega$ is too large, the posterior distribution is pulled too close to the prior distribution, resulting in a loss of diversity. In our study, we set $\omega = 0.01$ as the KL divergence weight for VAE optimization. For image generation, the VAE encoder uses 2D convolutional layers (stride=2) to down-sample the feature maps, and the VAE decoder uses 2D transpose convolutional layers (stride=2) to up-sample the latent variables back to images (64-by-64-by-3).
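The weighted objective described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes a diagonal Gaussian posterior with a standard normal prior (so the KL term has a closed form) and a Bernoulli likelihood over pixels scaled to [0, 1]. The function names and the 16-dimensional latent size are hypothetical.

```python
import numpy as np

def gaussian_kl(mu, log_var):
    """Analytic KL( N(mu, diag(exp(log_var))) || N(0, I) ), one value per sample."""
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=1)

def reconstruction_nll(x, x_hat, eps=1e-7):
    """Bernoulli negative log-likelihood of pixels in [0, 1], summed per sample."""
    x_hat = np.clip(x_hat, eps, 1.0 - eps)
    return -np.sum(x * np.log(x_hat) + (1.0 - x) * np.log(1.0 - x_hat), axis=1)

def vae_loss(x, x_hat, mu, log_var, omega=0.01):
    """Batch mean of omega * KL + expected negative log-likelihood."""
    per_sample = omega * gaussian_kl(mu, log_var) + reconstruction_nll(x, x_hat)
    return float(np.mean(per_sample))

rng = np.random.default_rng(0)
x = rng.uniform(size=(4, 64 * 64 * 3))          # flattened 64x64x3 toy "images"
x_hat = np.clip(x + 0.05 * rng.normal(size=x.shape), 0.0, 1.0)
mu = 0.1 * rng.normal(size=(4, 16))             # assumed 16-dim latent mean
log_var = np.zeros((4, 16))                     # assumed unit posterior variance
print(vae_loss(x, x_hat, mu, log_var, omega=0.01))
```

Raising `omega` increases the penalty for a posterior that drifts from the prior, which is the trade-off between regularization and diversity discussed above.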

The new cycle-consistent adversarial network with dynamic criterion that we present in this work (Cycle GAN with dynamic criterion) is used to generate high quality synthetic malaria infected blood cell images from real images, with random diversity from the originals that is detectable by the human eye. Cycle GAN is the state-of-the-art conditional generative adversarial network (cGAN) for unpaired image-to-image translation [8]. A typical Cycle GAN uses two generators and two discriminators to learn the mapping between two distributions by optimizing a complex objective and reaching a state of adversarial equilibrium. In our new model, we add a pretrained binary classifier as the criterion. It is a residual neural network trained on the training dataset with an accuracy of 99.61%. The criterion calculates an extra critic loss term for optimizing both generators during the Cycle GAN training. The term "dynamic" means that if the GAN model generates images too far away from real malaria positive blood cells, the criterion adds a large penalty to the critic loss term to pull the generator back into the acceptable scope. On the other hand, if the generated images are within the acceptable scope for real malaria blood cells, the critic loss term is dynamically minimized to preserve the diversity of the synthetic images.

The original Cycle GAN architecture has two GAN models that learn and generate images belonging to the source domain $X$ and the target domain $Y$, respectively. $X$ represents the distribution of the normal cell images and $Y$ represents the distribution of the malaria infected cell images. There are two generator-discriminator pairs in the Cycle GAN. Generator $G$ and discriminator $D_Y$ adversarially generate and distinguish the generated and real malaria infected blood cell images, i.e., $\min_G \max_{D_Y} L_{GAN}(G, D_Y, X, Y)$. Generator $F$ and discriminator $D_X$ adversarially generate and distinguish the generated and real normal blood cell images, i.e., $\min_F \max_{D_X} L_{GAN}(F, D_X, Y, X)$. The sum of the two terms, $L_{GAN}(G, D_Y, X, Y) + L_{GAN}(F, D_X, Y, X)$, is the adversarial loss term of the Cycle GAN. Two more loss terms are added to form the original total generator loss. The cycle consistency loss $L_{cyc}(G, F)$ measures the error when the images are translated back to their original domains, i.e., $x \to G(x) \to F(G(x)) \approx x$ (forward cycle consistency) and $y \to F(y) \to G(F(y)) \approx y$ (backward cycle consistency). It helps to transfer uncommon style elements, such as the dots representing the parasitizing plasmodia in the infected blood cells and the randomly dyed organelles in normal cells, while retaining the common features, such as the shape of the blood cell, during image translation. The identity loss $L_{iden}(G, F)$ is also added to the total objective loss for the whole model. It measures whether the generators ($G$ and $F$) can reproduce a real image when given a real image from their own output domain, i.e., $x \to F(x) \approx x$ and $y \to G(y) \approx y$. This adjustment can raise the magnitude of the gradient to further stabilize the adversarial training, and it also helps to enhance the background diversity of the generated images. The total generator loss is written as:

$$L_{total} = \big[L_{GAN}(G, D_Y, X, Y) + L_{GAN}(F, D_X, Y, X)\big] + \lambda L_{cyc}(G, F) + \lambda L_{iden}(G, F) \tag{2}$$
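To make the structure of Equation 2 concrete, the following NumPy sketch evaluates the three loss groups for toy stand-ins. It is only an illustration under stated assumptions: the generators and discriminators are hypothetical lambdas rather than neural networks, the adversarial term uses binary cross entropy against the "real" label, and the cycle and identity terms use the mean absolute error.

```python
import numpy as np

def mae(a, b):
    """Mean absolute error between two image batches."""
    return float(np.mean(np.abs(a - b)))

def bce_real(p, eps=1e-7):
    """Binary cross entropy of probabilities p against the label 'real' (1)."""
    return float(-np.mean(np.log(np.clip(p, eps, 1.0))))

def generator_total_loss(x, y, G, F, D_X, D_Y, lam=10.0):
    """Adversarial + lam * cycle consistency + lam * identity, as in Equation 2."""
    fake_y, fake_x = G(x), F(y)
    adversarial = bce_real(D_Y(fake_y)) + bce_real(D_X(fake_x))
    cycle = mae(F(fake_y), x) + mae(G(fake_x), y)   # x -> G(x) -> F(G(x)), y -> F(y) -> G(F(y))
    identity = mae(F(x), x) + mae(G(y), y)          # x -> F(x), y -> G(y)
    return adversarial + lam * cycle + lam * identity

rng = np.random.default_rng(0)
x = rng.uniform(size=(2, 64, 64, 3))       # toy "normal cell" batch
y = rng.uniform(size=(2, 64, 64, 3))       # toy "infected cell" batch
G = lambda img: img + 0.1                  # hypothetical generator X -> Y
F = lambda img: img - 0.1                  # hypothetical generator Y -> X
D_X = lambda img: np.full(len(img), 0.5)   # undecided discriminators
D_Y = lambda img: np.full(len(img), 0.5)
loss = generator_total_loss(x, y, G, F, D_X, D_Y, lam=10.0)
print(round(loss, 4))
```

In this toy case the two generators invert each other, so the cycle term vanishes and the total is driven by the adversarial and identity terms. A larger `lam` makes the identity penalty dominate.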

Here $\lambda$ is the weight of the cycle consistency loss and the identity loss during optimization. In the original work, $\lambda$ is set to 10.0 for Cycle GAN optimization [8]. However, we find that $\lambda = 10.0$ is too low for our model optimization, so we empirically use $\lambda = 80.0$ in our Cycle GAN implementation. Our enhanced Cycle GAN model introduces a new criterion loss term. It consists of two terms, the cycle criterion loss and the identity criterion loss, which can be jointly written as:

$$L_{critics} = L_{c\text{-}cycle} + L_{c\text{-}identity} \tag{3}$$

Like the cycle loss and the identity loss, the cycle criterion loss $L_{c\text{-}cycle}$ quantitatively measures whether the back-translated images are still classified as their original class, and the identity criterion loss $L_{c\text{-}identity}$ quantitatively measures whether the images that the trained generators produce from real observed samples remain consistent with the same class. After adding the new criterion loss term, the total generator loss in Equation 2 is revised as:

$$L_{total} = \big[L_{GAN}(G, D_Y, X, Y) + L_{GAN}(F, D_X, Y, X)\big] + \lambda L_{cyc}(G, F) + \lambda L_{iden}(G, F) + \kappa(\Phi L_{critics}) \tag{4}$$

In Equation 4, $\lambda$ and $\Phi$ are the parameters that adjust the importance of the different loss terms during the optimization of the whole Cycle GAN architecture. The injection of the classification loss is controlled by a function $\kappa$ that determines its frequency. We set $\kappa$ to inject the loss once every five steps in this work, because injecting the classification loss too frequently shifts the adversarial equilibrium and reduces the fidelity of the synthetic images. The term $\Phi$ determines the importance of the criterion loss when it is injected into the total generator loss. Empirically, the criterion loss contributes a large proportion of the total generator loss at the beginning of the Cycle GAN optimization. When the Cycle GAN training reaches an adversarial equilibrium, the criterion loss periodically adds an extra oscillation momentum to the stable condition and pushes the generators to learn more details. The new term can be considered a regularization method that prevents saturation of the GAN optimization, because it makes the GAN training controllable to a certain degree. In summary, the total loss of the generators in our enhanced Cycle GAN consists of four parts:

  • Adversarial loss: $L_{GAN}(G, D_Y, X, Y) + L_{GAN}(F, D_X, Y, X)$

  • Cycle consistency loss: $L_{cyc}(G, F)$

  • Identity loss: $L_{iden}(G, F)$

  • Criterion loss: $L_{critics} = L_{c\text{-}cycle} + L_{c\text{-}identity}$
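The criterion loss and its periodic injection (Equations 3 and 4) can be sketched as follows. This is an illustrative reading rather than the authors' code: the classifier outputs are hard-coded probability arrays, $\kappa$ is assumed to be a simple "every fifth step" gate, and `base_loss` stands for the sum of the adversarial, cycle consistency and identity terms. All function names are hypothetical.

```python
import numpy as np

def sparse_categorical_cross_entropy(probs, labels, eps=1e-7):
    """Mean of -log p[label]; probs has shape (batch, classes), labels are ints."""
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(np.clip(picked, eps, 1.0))))

def criterion_loss(probs_cycled, probs_identity, labels):
    """L_critics = L_c-cycle + L_c-identity, scored by the frozen classifier."""
    return (sparse_categorical_cross_entropy(probs_cycled, labels)
            + sparse_categorical_cross_entropy(probs_identity, labels))

def inject(step, every=5):
    """Assumed form of kappa: let the criterion loss through once every `every` steps."""
    return 1.0 if step % every == 0 else 0.0

def total_generator_loss(base_loss, critics, step, phi=0.20, every=5):
    """Equation 4, with base_loss covering the first three loss groups."""
    return base_loss + inject(step, every) * phi * critics

labels = np.array([1, 1])                            # 1 = malaria positive
in_scope = np.array([[0.01, 0.99], [0.02, 0.98]])    # classifier agrees with the label
off_scope = np.array([[0.90, 0.10], [0.80, 0.20]])   # classifier disagrees
print(criterion_loss(in_scope, in_scope, labels))    # small penalty
print(criterion_loss(off_scope, off_scope, labels))  # large penalty
print([total_generator_loss(1.0, 2.0, step) for step in range(1, 6)])
```

The penalty grows when the classifier stops recognizing the generated images as the intended class and shrinks toward zero when they stay in scope, which matches the "dynamic" behavior described above.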

The generators follow the U-Net architecture with skip connections, reducing the input feature size from 64 by 64 to 1 by 1 and then restoring it to 64 by 64. The discriminators follow the PatchGAN architecture and output an 8-by-8-by-1 feature map to determine whether the images are real or fake. We use the binary cross entropy as the objective function for the discriminator loss and for the adversarial loss terms of the generators. The cycle consistency loss and the identity loss use the mean absolute error (MAE) as the objective. The terms of the criterion loss are measured by the sparse categorical cross entropy, the same loss with which the pretrained criterion was optimized. Some studies recommend using unbounded smooth loss functions, such as the Wasserstein loss or the least squares loss (MSE), to optimize GAN models [8,9]. Empirically, the choice of loss functions depends mainly on the components of the total loss objective. If all errors can be measured on similar scales, using unbounded loss functions is straightforward and easier for the overall GAN optimization. However, if the GAN architecture consists of many components, as in this case, using hyperparameters to adjust the importance of the different terms or to determine the frequency of loss injection into the total loss provides a more flexible option for GAN optimization, as described in Equation 4. Our new Cycle GAN architecture is illustrated in Figure 1.
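The feature-map sizes quoted above imply a fixed number of resampling layers if every layer changes the resolution by a factor of two. The short sketch below works out that arithmetic. The stride-2 assumption for the generators and discriminators is ours (the text states stride 2 only for the VAE), and the helper names are hypothetical.

```python
def halvings(size, target):
    """Number of stride-2 layers needed to shrink `size` down to `target`."""
    count = 0
    while size > target:
        if size % 2:
            raise ValueError("size is not reducible by exact halving")
        size //= 2
        count += 1
    return count

def unet_sizes(size=64):
    """Spatial sizes along an encoder-decoder path that bottoms out at 1 by 1."""
    down = [size // 2 ** k for k in range(halvings(size, 1) + 1)]
    return down + down[-2::-1]

print(halvings(64, 1))   # down-sampling layers in the assumed U-Net encoder
print(halvings(64, 8))   # down-sampling layers before the 8-by-8 PatchGAN output
print(unet_sizes(64))
```

Under this assumption the generator needs six down-sampling and six up-sampling stages, and each cell of the 8-by-8 discriminator output judges one patch of the 64-by-64 input.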

Figure 1. Architecture of Cycle GAN with Dynamic Criterion

Experiments and Results

We use an open-source dataset of 27,578 parasitemic (malaria positive) and normal (malaria negative) segmented blood cell images (ratio 1:1) hosted by the National Library of Medicine (NLM), as in our previous work [3]. The dataset, collected for the development of an Android-based automatic malaria screener [11], is accessible at ftp://lhcftp.nlm.nih.gov/Open-Access-Datasets/Malaria/NIH-NLM-ThinBloodSmearsPf/. One benefit of using the Cycle GAN architecture is that the model can be optimized with a relatively small dataset (e.g., hundreds of images). To save runtime, we randomly choose 19,578 images (9,789 from each class) for the Cycle GAN optimization and keep the remaining 8,000 images for the subsequent tests. The original images are resized to 64-by-64-by-3 to fit the model input. The models are each optimized for 140 epochs on Google Colab Pro with a Tesla P100 GPU. The average optimization runtime of a single epoch is 16 seconds for the VAE model and 55 seconds for the enhanced Cycle GAN model. A sample of the real blood cell images is illustrated in Figure 2.
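The per-class random split described above can be reproduced in outline with the standard library. The sketch assumes the 27,578-image total reported in the Introduction, split evenly between the two classes. The class names, the seed, and the use of integer indices in place of image files are hypothetical.

```python
import random

def split_per_class(indices_by_class, n_train_per_class, seed=0):
    """Randomly assign a fixed number of items per class to training, the rest to test."""
    rng = random.Random(seed)
    train, test = [], []
    for label, indices in indices_by_class.items():
        shuffled = list(indices)
        rng.shuffle(shuffled)
        train += [(i, label) for i in shuffled[:n_train_per_class]]
        test += [(i, label) for i in shuffled[n_train_per_class:]]
    return train, test

per_class = 27578 // 2   # 13,789 images in each class at a 1:1 ratio
indices_by_class = {
    "parasitized": range(per_class),   # hypothetical class names
    "uninfected": range(per_class),
}
train, test = split_per_class(indices_by_class, n_train_per_class=9789)
print(len(train), len(test))   # 19578 8000
```

Splitting within each class keeps the 1:1 ratio in both subsets, so the 8,000 held-out images contain 4,000 from each class.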

Figure 2. Original Blood Cell Images

From Figure 2, we find that it is difficult to discriminate the uninfected blood cells (malaria negative) from the parasitemic cells (malaria positive) without medical expertise, because the images from both classes have similar background colors and randomly dyed dots inside the cells. Our hypothesis is that the malaria positive cell images and the malaria negative cell images belong to two separable distributions. Therefore, we can use the VAE to learn the distribution parameters of the malaria positive images, and we can also use the new Cycle GAN with dynamic criterion to learn the mapping parameters between the two domains.

We implement the VAE models as described above and optimize them with different weights of the KL divergence. We use the Adam (adaptive moment estimation) optimizer [12] with an initial learning rate of $2 \times 10^{-4}$. The VAE models are optimized for 100 epochs with a mini-batch size of 128, given the GPU memory limitation on Google Colab.
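For readers unfamiliar with Adam, a single parameter update with the learning rate used here can be sketched as follows. The decay rates `beta1=0.9` and `beta2=0.999` and the constant `eps=1e-8` are the defaults from the Adam paper [12] and are assumptions, since this work does not report them. The scalar quadratic objective is only a toy.

```python
import math

def adam_step(param, grad, m, v, t, lr=2e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; t is the 1-based step count."""
    m = beta1 * m + (1.0 - beta1) * grad            # first-moment estimate
    v = beta2 * v + (1.0 - beta2) * grad * grad     # second-moment estimate
    m_hat = m / (1.0 - beta1 ** t)                  # bias correction
    v_hat = v / (1.0 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

# Toy objective f(w) = (w - 3)^2 with gradient 2 * (w - 3).
w, m, v = 0.0, 0.0, 0.0
for t in range(1, 1001):
    w, m, v = adam_step(w, 2.0 * (w - 3.0), m, v, t)
print(w)   # moves from 0 toward 3 in steps of roughly lr
```

Because the update is normalized by the running gradient magnitude, each early step changes the parameter by about the learning rate regardless of the raw gradient scale.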

The results show that the VAE decoder can generate plausible blood cell images with a KL weight of 0.10. The outcomes improve when the KL weight is 0.05. The generated images are shown in Figure 3. Note that all the synthetic images generated by the VAE are classified as malaria positive by the pretrained criterion.

Figure 3. Synthetic Blood Cell Images by VAE

In the next experiment, we implement the new Cycle GAN with dynamic criterion with $\lambda = 40.0$ and $\Phi = 0.20$. The models are optimized with the Adam optimizer [12] with an initial learning rate of $4 \times 10^{-4}$ for 150 epochs. The mini-batch size is set to 128, given the limitation of GPU memory on Google Colab. The synthetic blood cell images generated from the malaria positive images and from the malaria negative images are shown in Figure 4.

Figure 4. Synthetic Blood Cell Images by Enhanced Cycle GAN with Dynamic Criterion

From the results, we conclude that the trained generator of the enhanced Cycle GAN can synthesize malaria positive images from both positive and negative real images. All the generated images are classified as malaria positive by the pretrained criterion (accuracy=100%). However, Figure 4 shows that although all generated images are classified as malaria positive by the criterion, the synthetic images generated from positive images are obviously more plausible than those generated from negative images. In addition, artifacts are easily observed in all generated images, but they are effectively restricted to a certain degree by the Cycle GAN optimization. In contrast, if we remove the criterion and optimize the original Cycle GAN architecture, the generated images are less controlled, and about 23.3% of them are classified as malaria negative by the pretrained criterion (see Figure 5).

Figure 5. Synthetic Blood Cell Images by Cycle GAN without criterion

When we observe the changes of the different loss terms during the optimization of the enhanced Cycle GAN with dynamic criterion (Figure 6), we find that the whole architecture reaches and maintains an adversarial equilibrium after about 5 epochs. As a result, the learning process of the generators slows down because of the stable low gradient. A periodic injection of the criterion loss term gives extra oscillation momentum that slightly breaks the adversarial equilibrium, so the learning process can be expedited by this dynamic loss injection. The proper choice of the weight and frequency of the criterion loss is important. From the bottom graphs of Figure 6, we find that the values of the two criterion loss terms are very large at the beginning (particularly during the first 40 epochs), so they pull the whole architecture away from the approaching adversarial equilibrium. Conversely, the values of the two criterion loss terms become much smaller at the end of the training (after 80 epochs), so they improve the learning process when the gradients become saturated.

Figure 6. Change of loss values during enhanced Cycle GAN Optimization

Finally, we use the Frechet Inception Distance (FID) to quantitatively evaluate the images generated by the different generative models. FID measures the distance between the feature vectors that an Inception network trained on the ImageNet dataset extracts from the real and the generated images [13]. The FID score thus provides a quantitative evaluation of the quality of the images produced by generative models. In general, a lower FID score indicates that the generated images are closer to the real ones and of higher quality. However, since the images in the ImageNet dataset and the histological images are obviously heterogeneous, the FID scores of the generated blood cell images are likely to be very low, because the pretrained Inception network is likely to classify all generated cell images into the same class. Nevertheless, the quantitative difference between the images generated by different models still provides an objective metric for comparison. The FID scores of the synthetic images generated by the different models are shown in Table 1. They indicate that the new Cycle GAN with dynamic criterion generates better quality images than the VAE model.
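The Frechet distance underlying FID compares two Gaussians fitted to the feature vectors of real and generated images. The NumPy sketch below shows that computation on random stand-in features. In practice the features would come from a pretrained Inception network, which is omitted here, and the function name and toy dimensions are hypothetical. The trace of the matrix square root is obtained from the eigenvalues of the covariance product to avoid extra dependencies.

```python
import numpy as np

def frechet_distance(feats_real, feats_fake):
    """||mu_r - mu_f||^2 + Tr(C_r + C_f - 2 (C_r C_f)^(1/2)) for two feature sets."""
    mu_r, mu_f = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_f = np.cov(feats_fake, rowvar=False)
    # Tr((C_r C_f)^(1/2)) equals the sum of the square roots of the eigenvalues
    # of C_r C_f, which are real and non-negative for covariance matrices.
    eigvals = np.linalg.eigvals(cov_r @ cov_f)
    trace_sqrt = np.sum(np.sqrt(np.clip(eigvals.real, 0.0, None)))
    mean_term = np.sum((mu_r - mu_f) ** 2)
    return float(mean_term + np.trace(cov_r) + np.trace(cov_f) - 2.0 * trace_sqrt)

rng = np.random.default_rng(0)
real = rng.normal(size=(500, 8))             # stand-in for Inception features
fake = rng.normal(loc=0.1, size=(500, 8))    # slightly shifted "generated" features
print(frechet_distance(real, real))          # approximately 0 for identical sets
print(frechet_distance(real, fake))          # grows with the distribution shift
```

The distance is zero only when the two fitted Gaussians share the same mean and covariance, which is why lower scores are read as closer agreement with the real images.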

Table 1. Comparison of image quality and performance of enhanced Cycle GAN and VAE

| Model architecture | Classification accuracy | Training runtime (sec/epoch) | FID mean (Std) | Image quality (subjective) |
|---|---|---|---|---|
| Cycle GAN with dynamic criterion | 100% | 55 | 0.0043 (0.0005) | High with good diversity |
| Cycle GAN without criterion | 76.6% | 52 | 0.0051 (0.0003) | High with good diversity |
| Convolutional VAE | 100% | 16 | 0.0085 (0.0007) | Fair with good diversity |

Conclusion

This study demonstrates that a new Cycle GAN with dynamic criterion can generate high quality synthetic blood cell images from real segmented histological blood cell images. Compared with the convolutional variational autoencoder (VAE), another popular deep generative model, the enhanced Cycle GAN produces synthetic blood cell images with higher quality and good diversity. The criterion provides dynamic control over the GAN architecture so that it generates images belonging to the desired class, with the complex discriminative patterns associated with medical expertise. We believe this new method is a state-of-the-art solution to improve the balance of the training dataset and, in turn, the final performance of other DNN-based machine learning tasks. It therefore helps to address the common machine learning issue of inaccessible well-annotated medical images, which rely on medical expertise, and it can become a low-cost and feasible method to improve AI performance in the medical imaging domain.

Acknowledgment

This work is supported by the Natural Sciences and Engineering Research Council (NSERC) of Canada and the York Research Chair (YRC) program.

References

  1. WHO. World Malaria Report 2020. World Health Organization. Available at: https://www.who.int/teams/global-malaria-programme/reports/world-malaria-report-2020 (Accessed: 25 Aug 2021)
  2. Tokumasu F, Fairhurst RM, Ostera GR. Band 3 modifications in Plasmodium falciparum-infected AA and CC erythrocytes assayed by autocorrelation analysis using quantum dots. J Cell Sci. 2005;118(5):1091–8. doi: 10.1242/jcs.01662.
  3. Liang Z, Powell A, Ersoy I, Poostchi M, Silamut K, Palaniappan K, et al. CNN-based image analysis for malaria diagnosis. 2016 IEEE international conference on bioinformatics and biomedicine (BIBM). 2016. pp. 493–496.
  4. Yang F, Poostchi M, Yu H, Zhou Z, Silamut K, Yu J, et al. Deep learning for smartphone-based malaria parasite detection in thick blood smears. IEEE J Biomed Health Inform. 2020;24(5):1427–1438. doi: 10.1109/JBHI.2019.2939121.
  5. Yu H, Yang F, Rajaraman S, Ersoy I, Moallem G, Poostchi M, et al. Malaria Screener: a smartphone application for automated malaria screening. BMC Infect Dis. 2020;20(1):1–8. doi: 10.1186/s12879-020-05453-1.
  6. Hirano H, Koga K, Takemoto K. Vulnerability of deep neural networks for detecting COVID-19 cases from chest X-ray images to universal adversarial attacks. PLoS One. 2020;15(12):e0243963. doi: 10.1371/journal.pone.0243963.
  7. Kingma DP, Welling M. Auto-Encoding Variational Bayes. stat. 2014 May; 1050:1.
  8. Li M, Huang H, Ma L, Liu W, Zhang T, Jiang Y. Unsupervised image-to-image translation with stacked cycle-consistent adversarial networks. The European Conference on Computer Vision (ECCV). 2018. pp. 184–199.
  9. Arjovsky M, Chintala S, Bottou L. Wasserstein generative adversarial networks. International conference on machine learning. 2017. pp. 214–223.
  10. Mao X, Li Q, Xie H, Lau RY, Wang Z, Paul Smolley S. Least squares generative adversarial networks. IEEE international conference on computer vision. 2017. pp. 2794–2802.
  11. Yu H, Yang F, Rajaraman S, Ersoy I, Moallem G, Poostchi M, et al. Malaria Screener: a smartphone application for automated malaria screening. BMC Infectious Diseases. 2020;20(1):1–8. doi: 10.1186/s12879-020-05453-1.
  12. Kingma DP, Ba J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980. 2014.
  13. Heusel M, Ramsauer H, Unterthiner T, Nessler B, Hochreiter S. GANs trained by a two time-scale update rule converge to a local nash equilibrium. Advances in neural information processing systems. 2017; 30.

Articles from AMIA Summits on Translational Science Proceedings are provided here courtesy of American Medical Informatics Association
