Author manuscript; available in PMC: 2021 Jan 15.
Published in final edited form as: Med Image Comput Comput Assist Interv. 2020 Sep 29;12265:362–371. doi: 10.1007/978-3-030-59722-1_35

Multimarginal Wasserstein Barycenter for Stain Normalization and Augmentation

Saad Nadeem 1, Travis Hollmann 2, Allen Tannenbaum 3
PMCID: PMC7810499  NIHMSID: NIHMS1660602  PMID: 33458721

Abstract

Variations in hematoxylin and eosin (H&E) stained images (due to clinical lab protocols, scanners, etc.) directly impact the quality and accuracy of clinical diagnosis, and hence it is important to control for these variations for a reliable diagnosis. In this work, we present a new approach based on the multimarginal Wasserstein barycenter to normalize and augment H&E stained images given one or more references. Specifically, we provide a mathematically robust way of naturally incorporating additional images as intermediate references to drive stain normalization and augmentation simultaneously. The presented approach showed superior results, quantitatively and qualitatively, compared to state-of-the-art methods for stain normalization. We further validated our stain normalization and augmentations in the nuclei segmentation task on a publicly available dataset, achieving state-of-the-art results against competing approaches.

Keywords: Wasserstein Barycenter, Stain Normalization

1. Introduction

Histology is founded on the study of microscopic images to examine cell structures and arrangements for diagnosis, and staining is a critical part of the tissue preparation process. In particular, certain staining agents (mainly hematoxylin and eosin) make the otherwise transparent tissue samples more distinguishable. Hematoxylin dyes the nuclei a dark purple color while eosin dyes other structures a pink color. A major problem is that staining results are inconsistent and prone to variability due to many factors, including differing lab protocols, inter-patient variability, differences in raw materials, and variations in slide scanners. These inconsistencies may not only cause major problems for pathologists but also degrade the performance of computer-aided diagnosis systems.

It is well known that for downstream applications such as nuclei segmentation, stain normalization and augmentation are essential [5, 16]. In fact, augmentation might be more important than normalization, as pointed out by [10]. Color jitter or random HSV shifts were identified as an important step for the top contenders of the MoNuSeg competition [6]. Due to the coming telemedicine revolution, there will be a strong requirement for either matching to a reference distribution or incorporating stain-invariant approaches using stain augmentation or training on diverse datasets.

Tellez et al. [16] showed that for certain specific tasks, stain augmentation (adding color jitter in the HSV or HED color space), with and without normalization, may also improve performance while making the resultant models more robust; nuclei segmentation was not among the tasks explored in that paper. Further, as shown in [10], deep learning stain normalization/transfer approaches, such as StainGAN [14], are not particularly suitable for the nuclei segmentation task, given the lack of training data representing enough samples from the same tissue, institution, preparation protocols, etc. Stain-invariant models learnt via novel stain augmentation approaches can help achieve better performance. The work of Vahadane et al. [17] has been the predominant choice in the past for stain normalization in nuclei segmentation tasks.

To deal with some of the aforementioned issues, we introduce a new approach for simultaneous H&E stain normalization and augmentation based on the multimarginal Wasserstein barycenter. Specifically, the novelty of the paper lies in first introducing the traditional Wasserstein barycenter approach for stain normalization/augmentation (Figure 1), and then introducing the multimarginal version [1, 9] to overcome the limitations of the traditional approach in this context (Figure 2). Note that the traditional Wasserstein barycenter (1 source and 1 reference), although widely employed in computer vision, has to the best of our knowledge never been used for stain normalization/augmentation, and the more general multimarginal Wasserstein barycenter (1 source and multiple references) has hardly ever been used in the computer vision or medical imaging communities. For more accurate stain normalization, the multimarginal version allows one to incorporate additional distributions by utilizing one or more intermediate reference images (Figure 2). The resultant interpolations span a broad spectrum of stain variations, allowing for simultaneous stain normalization and augmentation.

Fig.1.

Traditional (N = 2) Wasserstein Barycenter. The source color distribution is interpolated towards the target color distribution in the Lab color space.

Fig.2.

Multimarginal Wasserstein Barycenter with 1, 2, and 3 reference images, i.e., N = 2, 3 or 4 in (6). The images with blue borders are input/source images, the ones with green are the (intermediate) references and the red ones are the final reference/target images. Reference images can be from the same or different domains. Ideally in the multimarginal case, one of the intermediate references should have some background.

With respect to the pipeline, we convert the given source and target images from RGB to the Lab color space, interpolate between the given color distributions using the Wasserstein barycenter, and then convert back to the RGB space. The Lab color space was chosen based on its general effectiveness (decorrelated color channels, etc.) in color transfer applications; see Reinhard and Pouli [13] and the references therein. Finally, we quantified our results on stain normalization with respect to a publicly available dataset and obtained state-of-the-art results. Similarly, we augmented and normalized stained images in a publicly available nuclei segmentation dataset and again achieved state-of-the-art results using our method.
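The first step of this pipeline, the RGB to Lab conversion, can be sketched as follows. This is a minimal illustration only: it assumes sRGB input in [0, 1] and the D65 reference white, neither of which is stated in the text, and the function name `rgb_to_lab` is hypothetical. The interpolation and the inverse conversion are not shown.

```python
import numpy as np

# sRGB (D65) -> XYZ matrix and D65 reference white. These are standard
# published values, assumed here because the text does not say which
# RGB working space or white point was used.
_RGB2XYZ = np.array([
    [0.4124564, 0.3575761, 0.1804375],
    [0.2126729, 0.7151522, 0.0721750],
    [0.0193339, 0.1191920, 0.9503041],
])
_WHITE_D65 = np.array([0.95047, 1.0, 1.08883])


def rgb_to_lab(rgb):
    """Convert sRGB values in [0, 1] (shape (..., 3)) to CIE Lab."""
    rgb = np.asarray(rgb, dtype=float)
    # Undo the sRGB gamma to obtain linear RGB.
    linear = np.where(rgb <= 0.04045, rgb / 12.92, ((rgb + 0.055) / 1.055) ** 2.4)
    # Linear RGB -> XYZ, normalized by the reference white.
    xyz = (linear @ _RGB2XYZ.T) / _WHITE_D65
    # Piecewise cube-root nonlinearity from the Lab definition.
    delta = 6.0 / 29.0
    f = np.where(xyz > delta ** 3, np.cbrt(xyz), xyz / (3 * delta ** 2) + 4.0 / 29.0)
    lightness = 116.0 * f[..., 1] - 16.0
    a = 500.0 * (f[..., 0] - f[..., 1])
    b = 200.0 * (f[..., 1] - f[..., 2])
    return np.stack([lightness, a, b], axis=-1)
```

In practice a library routine such as `skimage.color.rgb2lab` (with `skimage.color.lab2rgb` for the way back) would likely be used instead of hand-written formulas.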

2. Background on Wasserstein Distance

We first very briefly review some basic material on optimal mass transport (OMT) theory and the Wasserstein distance that we will need in the sequel. We refer the reader to [19] for a more detailed development of the subject and references.

Consider two probability measures $\mu_0, \mu_1$ on $\mathbb{R}^n$. In the original formulation of OMT due to Gaspard Monge [18, 19], one seeks a transport map

$$T : \mathbb{R}^n \to \mathbb{R}^n : x \mapsto T(x)$$

which specifies where the initial mass $\mu_0(dx)$ at $x$ should be transported in order to match the final distribution. This means that $T_{\#}\mu_0 = \mu_1$, where $\mu_1$ is the “push-forward” of $\mu_0$ under $T$:

$$\mu_1(B) = \mu_0(T^{-1}(B))$$

for every Borel set $B$ in $\mathbb{R}^n$. Moreover, given the transportation cost $c(x, y)$, the map should minimize

$$\int_{\mathbb{R}^n} c(x, T(x))\, d\mu_0(x). \tag{1}$$

In this paper, we will only consider the case $c(x, y) = \|x - y\|^2$. To ensure finite cost, we will assume that $\mu_0$ and $\mu_1$ lie in the space of probability densities with finite second moments, denoted by $P_2(\mathbb{R}^n)$.

The dependence of the transportation cost on $T$ is highly nonlinear and a minimum may not exist for general costs $c$. In order to handle this problem, Leonid Kantorovich proposed a relaxed formulation [18, 19], in which one seeks a joint distribution $\pi \in \Pi(\mu_0, \mu_1)$ on $\mathbb{R}^n \times \mathbb{R}^n$, referred to as a coupling of $\mu_0$ and $\mu_1$, i.e., the marginals along the two coordinate directions should coincide with $\mu_0$ and $\mu_1$, respectively. More precisely, in this setting, we consider

$$K := \inf_{\pi \in \Pi(\mu_0, \mu_1)} \int_{\mathbb{R}^n \times \mathbb{R}^n} \|x - y\|^2 \, d\pi(x, y). \tag{2}$$

For the case where $\mu_0, \mu_1$ are absolutely continuous with respect to the Lebesgue measure, it is a standard result that OMT (2) has a unique solution [18, 19]. This is of the form

$$\pi = (\mathrm{Id} \times T)_{\#}\mu_0,$$

where $\mathrm{Id}$ stands for the identity map, and $T$ is the unique minimizer of (1). One may also show that the unique optimal transport $T$ is the gradient of a convex function $\omega$, i.e.,

$$y = T(x) = \nabla\omega(x). \tag{3}$$
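In one dimension, the gradient of a convex function is simply a nondecreasing map, so for two point clouds with the same number of equally weighted points the optimal map sends the i-th smallest source point to the i-th smallest target point. The sketch below illustrates this on toy data (the sample values and function names are made up for illustration) by comparing the sorted pairing with an exhaustive search over all assignments.

```python
from itertools import permutations


def monotone_map_cost(source, target):
    """Quadratic cost of pairing sorted source points with sorted target points."""
    return sum((x - y) ** 2 for x, y in zip(sorted(source), sorted(target)))


def brute_force_cost(source, target):
    """Minimum quadratic cost over all one-to-one assignments (tiny inputs only)."""
    return min(
        sum((x - y) ** 2 for x, y in zip(source, perm))
        for perm in permutations(target)
    )


# Toy 1D samples, chosen arbitrarily for illustration.
source = [0.9, 0.1, 0.4, 0.7]
target = [1.5, 0.2, 2.0, 0.8]

sorted_cost = monotone_map_cost(source, target)
optimal_cost = brute_force_cost(source, target)
```

The two costs agree, which is the discrete one-dimensional counterpart of (3). For color distributions in two or three dimensions no such sorting shortcut exists, which is why a numerical solver is needed.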

Wasserstein metric:

The square root of the optimal cost formally defines a Riemannian metric on $P_2(\mathbb{R}^n)$, known as the Wasserstein metric $W_2$ [18, 19], i.e., $W_2(\mu_0, \mu_1) := \sqrt{K}$ with $K$ in (2).
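As a sanity check on these definitions, it may help to recall a standard closed-form case that is not part of the original text: two one-dimensional Gaussians. The numerical values in the last line are arbitrary and serve only as an illustration.

```latex
\begin{align*}
\mu_0 &= \mathcal{N}(m_0, \sigma_0^2), \qquad \mu_1 = \mathcal{N}(m_1, \sigma_1^2), \\
T(x) &= m_1 + \frac{\sigma_1}{\sigma_0}\,(x - m_0)
      = \frac{d}{dx}\Big[\, m_1 x + \frac{\sigma_1}{2\sigma_0}\,(x - m_0)^2 \Big], \\
W_2^2(\mu_0, \mu_1) &= (m_0 - m_1)^2 + (\sigma_0 - \sigma_1)^2, \\
m_0 = 0,\ \sigma_0 = 1,\ m_1 = 3,\ \sigma_1 = 2
  &\;\Longrightarrow\; W_2^2 = 9 + 1 = 10, \quad W_2 = \sqrt{10} \approx 3.16 .
\end{align*}
```

Here $T$ is an increasing affine map, hence the derivative of a convex function, in agreement with (3).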

Naturally, $P_2(\mathbb{R}^n)$ is a geodesic space: a geodesic between $\mu_0$ and $\mu_1$ is of the form

$$\mu_t = (T_t)_{\#}\mu_0, \quad T_t(x) = (1 - t)x + tT(x). \tag{4}$$

A geodesic path is also known as displacement interpolation [8]. It holds that

$$W_2(\mu_s, \mu_t) = (t - s)\, W_2(\mu_0, \mu_1), \quad 0 \le s < t \le 1. \tag{5}$$

$\mu_t$ also solves the Wasserstein barycenter problem in the case of two probability measures, as we describe next.
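The geodesic (4) and the identity (5) can be checked numerically in a simplified setting. The sketch below assumes one-dimensional samples of equal size and equal weight, where the optimal map is given by sorting; this is an illustrative simplification, not the multi-channel color setting used in the paper, and the function names are hypothetical.

```python
import numpy as np


def w2_1d(x, y):
    """W2 distance between two equal-size, equal-weight 1D samples."""
    return np.sqrt(np.mean((np.sort(x) - np.sort(y)) ** 2))


def displacement_interpolation(x0, x1, t):
    """Samples of mu_t = (T_t)_# mu_0 with T_t(x) = (1 - t) x + t T(x)."""
    # In 1D, T maps the i-th smallest point of x0 to the i-th smallest of x1.
    x0_sorted, x1_sorted = np.sort(x0), np.sort(x1)
    return (1.0 - t) * x0_sorted + t * x1_sorted


rng = np.random.default_rng(0)
x0 = rng.normal(0.0, 1.0, size=200)   # stand-in for a source distribution
x1 = rng.normal(3.0, 0.5, size=200)   # stand-in for a reference distribution

s, t = 0.25, 0.75
lhs = w2_1d(displacement_interpolation(x0, x1, s),
            displacement_interpolation(x0, x1, t))
rhs = (t - s) * w2_1d(x0, x1)
```

Both sides agree up to floating-point error, since the interpolated samples at times s and t differ exactly by (t - s) times the displacement.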

3. Wasserstein Barycenter

We follow the theory described in [1, 9], to which we refer the interested reader for all the relevant references. We follow the notation and set-up from Section 2. In the case of images of interest in the present work, we take $n = 2$ or $n = 3$ in $P_2(\mathbb{R}^n)$.

Then the Wasserstein barycenter of $N$ probability measures $\mu_1, \ldots, \mu_N \in P_2(\mathbb{R}^n)$ is the minimizer of the functional

$$f(\mu) = \sum_{i=1}^{N} \lambda_i W_p^p(\mu_i, \mu), \quad \lambda_i \ge 0, \quad \sum_{i=1}^{N} \lambda_i = 1. \tag{6}$$

This is a special case of the Multimarginal Optimal Transport (MOMT) problem [1, 9].

The case $N = 2$ is classical, is due to McCann [8], and has been sketched in Section 2. For our purposes of stain normalization and augmentation in histological data, we may regard $\mu_1$ as the source distribution and $\mu_2$ as the reference distribution. Then for $t \in [0, 1]$, we can consider minimizing the family of functionals

$$f_t(\mu) = (1 - t)\, W_2^2(\mu_1, \mu) + t\, W_2^2(\mu_2, \mu),$$

and hence get a continuous family of interpolations which form a geodesic in the space of probability distributions as described in Section 2. See equation (4).
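A minimal worked case, added here for illustration and not taken from the original text, is that of two point masses. It shows directly why the minimizer of $f_t$ traces out the straight-line interpolation of (4).

```latex
\begin{align*}
\mu_1 = \delta_{x_1},\ \mu_2 = \delta_{x_2}: \qquad
f_t(\mu) &= \int \Big[ (1 - t)\,\|x_1 - z\|^2 + t\,\|x_2 - z\|^2 \Big]\, d\mu(z), \\
\nabla_z \Big[ (1 - t)\,\|x_1 - z\|^2 + t\,\|x_2 - z\|^2 \Big] = 0
  &\;\Longrightarrow\; z^{*} = (1 - t)\,x_1 + t\,x_2, \\
\mu_t &= \delta_{(1 - t)\,x_1 + t\,x_2} .
\end{align*}
```

The first line uses $W_2^2(\delta_a, \mu) = \int \|a - z\|^2\, d\mu(z)$; since the integrand has a unique minimizer $z^{*}$, the functional is minimized by concentrating all mass there, which matches $T_t(x_1) = (1 - t)x_1 + tT(x_1)$ with $T(x_1) = x_2$.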

In the present work, we employ one source distribution $\mu_1$ and either 1, 2 or 3 reference distributions $\mu_i$ ($2 \le i \le 4$). For $N \ge 3$ in (6), i.e., two or more references, we use the term “multimarginal OMT” to emphasize the fact that we are considering more than two measures. For the application to images, one can always normalize to make sure that the total mass (intensity) is 1. Further, in the examples below (see Section 4), we choose $\lambda_i = 1/N$, $1 \le i \le N$. Let $\mu_{opt}$ denote the optimal solution of (6), that is, the barycenter. Notice that taking $\mu_1$ and $\mu_{opt}$ as the marginals, we also find the optimal transport map $T$ with $T_{\#}\mu_1 = \mu_{opt}$. We also set the parameter $m = N - 2$, which is the number of intermediate reference images. Thus, $m = 0$ refers to the usual Wasserstein barycenter computed with respect to a reference and a source, with no intermediate reference images.
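To make the role of the weights in (6) concrete, the following sketch computes a barycenter in a deliberately simplified setting: one-dimensional samples of equal size with $p = 2$, where the barycenter is known to be the weighted average of the sorted samples (that is, of the quantile functions). This is an assumption-laden illustration, not the solver used in the paper, and the function names are hypothetical.

```python
import numpy as np


def barycenter_1d(samples, weights=None):
    """W2 barycenter of equal-size 1D samples: weighted mean of sorted values."""
    sorted_samples = np.sort(np.asarray(samples, dtype=float), axis=1)
    n_measures = sorted_samples.shape[0]
    if weights is None:
        # Uniform weights lambda_i = 1/N, as chosen in the text.
        weights = np.full(n_measures, 1.0 / n_measures)
    return np.asarray(weights, dtype=float) @ sorted_samples


def objective(candidate, samples, weights):
    """Functional (6) with p = 2, evaluated for equal-size 1D samples."""
    c = np.sort(candidate)
    return sum(w * np.mean((np.sort(s) - c) ** 2) for w, s in zip(weights, samples))


# One "source" and two "references" (N = 3, so m = 1 intermediate reference).
rng = np.random.default_rng(1)
samples = rng.normal(size=(3, 50)) + np.array([[0.0], [2.0], [5.0]])
weights = np.full(3, 1.0 / 3.0)
barycenter = barycenter_1d(samples, weights)
```

Evaluating `objective` at `barycenter` gives a value no larger than at any of the three inputs or at a perturbed copy of the barycenter, as expected of a minimizer of (6).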

Computation of Wasserstein Barycenters:

There are a number of algorithms for the computation of the multimarginal Wasserstein barycenter; see the very recent paper [11] and the references therein. In our implementation, we used the approach developed in Cuturi and Doucet [3], which we briefly sketch.

In that work, the authors propose a (sub)gradient descent framework based on a modification of the functional (6); see in particular Section 4 of [3]. Because of the computational complexity, they first smooth the Wasserstein distance via an entropic regularizer. This allows them to employ the Sinkhorn algorithm [2], which is an iterative rescaling descent procedure that converges to the desired regularized distance. The procedure of [3] leads to a strictly convex objective function whose gradients can be computed in a fast, efficient manner. We employed the algorithm for the case in which we want the barycenter of $N$ distributions ($N = 2$, 3 or 4), where $N$ is the total number of measures in the weighted sum defining the Wasserstein barycenter (Eq. 6). As mentioned above, $m = N - 2$ is the number of intermediate reference images.
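The Sinkhorn rescaling mentioned above can be sketched in a few lines. This is a generic textbook version for the regularized transport plan between two histograms, not the barycenter procedure of [3] and not the code used for the experiments; the grid, the histograms and the value of the regularization parameter `reg` are arbitrary illustrative choices.

```python
import numpy as np


def sinkhorn(a, b, cost, reg, n_iter=2000):
    """Entropy-regularized transport plan between histograms a and b."""
    kernel = np.exp(-cost / reg)        # Gibbs kernel of the ground cost
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (kernel.T @ u)          # rescale to match the column marginal b
        u = a / (kernel @ v)            # rescale to match the row marginal a
    return u[:, None] * kernel * v[None, :]


# Two histograms on a shared 1D grid with squared-distance ground cost.
grid = np.linspace(0.0, 1.0, 50)
cost = (grid[:, None] - grid[None, :]) ** 2
a = np.exp(-((grid - 0.3) ** 2) / 0.01)
a /= a.sum()
b = np.exp(-((grid - 0.7) ** 2) / 0.02)
b /= b.sum()

plan = sinkhorn(a, b, cost, reg=0.05)
transport_cost = np.sum(plan * cost)    # cost of the regularized plan
```

Each iteration only involves matrix-vector products, which is what makes the regularized problem attractive on GPUs. A smaller `reg` approximates the unregularized distance more closely but converges more slowly and can underflow without log-domain stabilization.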

4. Experiments and Results

We implemented our algorithm using the Python Optimal Transport (POT) library, which includes GPU-accelerated versions of Sinkhorn regularization. We used an Nvidia GeForce RTX 2080 Ti for our experiments. The PyTorch framework was used for the StainGAN and CNN3 implementations. We evaluated our approach against Reinhard et al. [12], Macenko et al. [7], Khan et al. [4], Vahadane et al. [17], and StainGAN [14].

4.1. Stain Normalization Evaluation

We used the MITOS-ATYPIA’14 challenge dataset for evaluating our stain normalization. The dataset includes the same tissue sections scanned by two different scanners (Aperio, A, and Hamamatsu, H), for a total of 424 A-H frame pairs at ×20 magnification, 300 for training and 124 for testing. Images from scanner A are normalized and matched against the real corresponding images from H (ground truth). As in StainGAN [14], 10,000 random (256×256) patches from the 300 training frames were used for training (26 epochs with the regularization parameter λ = 10, learning rate 0.0002, Adam optimizer with a batch size of 4) and 500 patches from the 124 testing frames were used for evaluation. The visual and quantitative comparisons are shown in Figure 3 and Table 1, respectively. For the traditional case (one reference and one source), our results are very similar to Reinhard et al. [12], since they also do color matching in Lab space, but our results improve drastically given two reference images. The references in our case span patches with different amounts of background visible. We also tested with different reference images and we show that we get a tighter bound as long as the references contain different amounts of visible background; see the supplementary material for the box plots of SSIM for different references.
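For readers unfamiliar with the SSIM score used in this evaluation, the sketch below computes a simplified, single-window variant on grayscale images. The standard index of [20] averages the same expression over local windows, so this global version is only an approximation for illustration; the function name and the default `data_range` are assumptions.

```python
import numpy as np


def global_ssim(x, y, data_range=255.0):
    """Single-window SSIM between two grayscale images of the same shape."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Stabilizing constants with the usual K1 = 0.01 and K2 = 0.03.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    )
```

A normalized patch from scanner A would be compared against the corresponding ground-truth patch from scanner H; identical images score 1, and lower values indicate a larger structural or intensity mismatch.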

Fig.3.

Multimarginal Wasserstein Barycenter comparison with state-of-the-art methods on the MITOS-ATYPIA’14 challenge dataset. The blue-bordered image is the input image and the two red-bordered images are the references (with the one above being the intermediate reference). m is the number of intermediate references (in the traditional Wasserstein barycenter, m = 0, i.e., there is no intermediate reference).

Table 1.

Stain Normalization Comparison (Mean ± Standard Deviation) on the MITOS-ATYPIA’14 challenge dataset using the Structural Similarity Index (SSIM) [20] and the Feature Similarity Index (FSIM) [21]. The time is the total time taken for normalizing all 500 images.

| Methods | SSIM | FSIM | Time (sec) |
|---|---|---|---|
| Reinhard [12] | 0.55±0.13 | 0.63±0.07 | 6.76 |
| Macenko [7] | 0.51±0.08 | 0.62±0.09 | 59.40 |
| Khan [4] | 0.62±0.18 | 0.65±0.08 | 1994.87 |
| Vahadane [17] | 0.63±0.11 | 0.65±0.06 | 502.04 |
| StainGAN [14] | 0.68±0.23 | 0.69±0.06 | 69.12 |
| m = 0 (Ours) | 0.59±0.32 | 0.67±0.08 | 254.06 |
| m = 1 (Ours) | 0.73±0.06 | 0.75±0.11 | 384.28 |

4.2. Nuclei Segmentation

We also evaluate our stain augmentation/normalization in the nuclei segmentation setting. Specifically, we test our approach on the MoNuSeg challenge dataset [6]. Previous works [6, 10] have already shown that stain normalization improves performance for nuclei segmentation tasks. As mentioned in the Introduction, Tellez et al. [16] demonstrated that for several specific tasks, stain augmentation improves performance and makes the resultant models more robust. Importantly, deep learning stain normalization/transfer approaches, e.g., StainGAN [14], are not particularly suitable for the nuclei segmentation task given the lack of training (source-target image pair) data; see [10] for the details. The work of Vahadane et al. [17] has been one of the main methodologies for stain normalization in nuclei segmentation.

Here we explore the effects of using different combinations of stain normalization and augmentation approaches for the same underlying architecture (CNN3), geometric augmentation and post-processing approaches [6]. We used the same architecture and hyperparameters as reported in Kumar et al. [6]. After training and validation, the Aggregated Jaccard Index was computed on the same test set as in [6] for direct comparison. To drive the stain normalization and augmentation, in our approach we used 4 reference images, one from each organ present in the training dataset; images from 4 organs were used for training and testing and images from 3 additional organs were included just in the test set.
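The Aggregated Jaccard Index used for this comparison can be sketched as follows, based on its usual description from Kumar et al. [6]: each ground-truth nucleus is matched to the predicted nucleus with the highest Jaccard index, intersections and unions are pooled, and unmatched predictions are added to the union. The handling of ground-truth nuclei with no overlapping prediction, and the function name, are assumptions of this illustration rather than details confirmed by the text.

```python
import numpy as np


def aggregated_jaccard_index(gt, pred):
    """AJI between two integer label maps (0 = background, k > 0 = nucleus k)."""
    gt = np.asarray(gt)
    pred = np.asarray(pred)
    pred_ids = [j for j in np.unique(pred) if j != 0]
    used = set()
    intersection_sum, union_sum = 0, 0
    for i in np.unique(gt):
        if i == 0:
            continue
        g = gt == i
        # Only predictions overlapping this nucleus can have a nonzero Jaccard index.
        candidates = [j for j in np.unique(pred[g]) if j != 0]
        if not candidates:
            union_sum += g.sum()  # missed nucleus (assumed to count toward the union)
            continue
        best = max(
            candidates,
            key=lambda j: (g & (pred == j)).sum() / (g | (pred == j)).sum(),
        )
        p = pred == best
        intersection_sum += (g & p).sum()
        union_sum += (g | p).sum()
        used.add(best)
    # Predicted nuclei never matched to a ground-truth nucleus are penalized.
    for j in pred_ids:
        if j not in used:
            union_sum += (pred == j).sum()
    return intersection_sum / union_sum if union_sum else 0.0
```

Because false positives and missed nuclei both enlarge the pooled union, the index penalizes over- and under-segmentation as well as poor pixel overlap.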

5. Conclusions and Future Work

In this paper, we presented a new multimarginal Wasserstein barycenter method for H&E stain normalization and augmentation given one or more references. The method achieved superior performance in stain normalization and nuclei segmentation tasks because of the use of the intermediate references in the multimarginal setting. This allows one to incorporate additional distributions that can give physically more realistic interpolations (hence augmentation) as well as normalization. Since the normalization is done in the color distribution space and is not dependent on the number of pixels, the method can easily be scaled to whole slide images. In the future, we will also explore incorporating our Wasserstein barycenter formulation as a deep learning loss function.

Supplementary Material

Supplement

Table 2.

Nuclei segmentation comparisons using the Aggregated Jaccard Index on the same MoNuSeg test dataset as reported in [6] for direct comparison (images from 4 organs, Breast, Liver, Kidney and Prostate, were also included in the training data, and 3 others, Bladder, Colon and Stomach, were not included in training). For the same underlying architecture (CNN3 [6]), denoted by C, different combinations of the following are explored: Vahadane stain normalization (V), geometric augmentation (G) via rigid (rotation and flipping) and affine transformations, color jitter via random HSV/HED shifts (J), stain augmentation via direct perturbations of H&E color channels (A) as introduced by Tellez et al. [15] (αi and βi were similarly taken from two uniform distributions), and finally our stain normalization (SN) and augmentation (SA). For SN, we include only the final interpolation, and for SA we include all the interpolations.

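| Organ | Image | C+V [6] | C+SN | C+G+J+A | C+G+SN+SA |
|---|---|---|---|---|---|
| Breast | 1 | 0.4974 | 0.5211 | 0.4532 | 0.5325 |
| Breast | 2 | 0.5796 | 0.5726 | 0.4830 | 0.5815 |
| Liver | 1 | 0.5175 | 0.5462 | 0.6134 | 0.5598 |
| Liver | 2 | 0.5148 | 0.5829 | 0.5918 | 0.6013 |
| Kidney | 1 | 0.4792 | 0.4812 | 0.5815 | 0.5648 |
| Kidney | 2 | 0.6672 | 0.7187 | 0.6924 | 0.7414 |
| Prostate | 1 | 0.4914 | 0.5305 | 0.5491 | 0.6270 |
| Prostate | 2 | 0.3761 | 0.4017 | 0.3191 | 0.5296 |
| Bladder | 1 | 0.5465 | 0.5634 | 0.5510 | 0.6475 |
| Bladder | 2 | 0.4968 | 0.5016 | 0.4489 | 0.5267 |
| Colon | 1 | 0.4891 | 0.5108 | 0.4904 | 0.5318 |
| Colon | 2 | 0.5692 | 0.6179 | 0.5879 | 0.6263 |
| Stomach | 1 | 0.4538 | 0.5318 | 0.4823 | 0.6408 |
| Stomach | 2 | 0.4378 | 0.4520 | 0.3912 | 0.6551 |
| Overall | | 0.5083 | 0.5381 | 0.5168 | 0.5976 |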

Acknowledgements

This study was supported by AFOSR grants (FA9550-17-1-0435, FA9550-20-1-0029), NIH grant (R01-AG048769), MSK Cancer Center Support Grant/Core Grant (P30 CA008748), and a grant from Breast Cancer Research Foundation (grant BCRF-17-193).

References

  • 1. Agueh M, Carlier G: Barycenters in the Wasserstein space. SIAM Journal Math. Analysis 43(2), 904–924 (2011)
  • 2. Cuturi M: Sinkhorn distances: Lightspeed computation of optimal transport. In: Advances in Neural Information Processing Systems. pp. 2292–2300 (2013)
  • 3. Cuturi M, Doucet A: Fast computation of Wasserstein barycenters. In: ICML’14: Proc. 31st International Conf. Machine Learning, vol. 32, pp. 685–693 (2014)
  • 4. Khan AM, Rajpoot N, Treanor D, Magee D: A nonlinear mapping approach to stain normalization in digital histopathology images using image-specific color deconvolution. IEEE Transactions on Biomedical Engineering 61(6), 1729–1738 (2014)
  • 5. Kumar N, Verma R, Anand D, Zhou Y, Onder OF, Tsougenis E, Chen H, Heng PA, Li J, Hu Z, et al.: A multi-organ nucleus segmentation challenge. IEEE Transactions on Medical Imaging (2019)
  • 6. Kumar N, Verma R, Sharma S, Bhargava S, Vahadane A, Sethi A: A dataset and a technique for generalized nuclear segmentation for computational pathology. IEEE Transactions on Medical Imaging 36(7), 1550–1560 (2017)
  • 7. Macenko M, Niethammer M, Marron JS, Borland D, Woosley JT, Guan X, Schmitt C, Thomas NE: A method for normalizing histology slides for quantitative analysis. IEEE International Symposium on Biomedical Imaging: From Nano to Macro, pp. 1107–1110 (2009)
  • 8. McCann R: A convexity principle for interacting gases. Adv. Math. 128(1), 153–179 (1997)
  • 9. Pass B: Multi-marginal optimal transport: theory and applications. ESAIM: Mathematical Modelling and Numerical Analysis 49(6), 1771–1790 (2015)
  • 10. Pontalba JT, Gwynne-Timothy T, David E, Jakate K, Androutsos D, Khademi A: Assessing the impact of color normalization in convolutional neural network-based nuclei segmentation frameworks. Frontiers in Bioengineering and Biotechnology 7 (2019)
  • 11. Puccetti G, Rüschendorf L, Vanduffel S: On the computation of Wasserstein barycenters. Journal of Multivariate Analysis 176, 104581 (2020)
  • 12. Reinhard E, Ashikhmin M, Gooch B, Shirley P: Color transfer between images. IEEE Computer Graphics and Applications 21(5), 34–41 (2001)
  • 13. Reinhard E, Pouli T: Colour spaces for colour transfer. In: International Workshop on Computational Color Imaging. pp. 1–15. Springer (2011)
  • 14. Shaban MT, Baur C, Navab N, Albarqouni S: StainGAN: Stain style transfer for digital histological images. International Symposium on Biomedical Imaging, pp. 953–956 (2019)
  • 15. Tellez D, Balkenhol M, Otte-Höller I, van de Loo R, Vogels R, Bult P, Wauters C, Vreuls W, Mol S, Karssemeijer N, et al.: Whole-slide mitosis detection in H&E breast histology using PHH3 as a reference to train distilled stain-invariant convolutional networks. IEEE Transactions on Medical Imaging 37(9), 2126–2136 (2018)
  • 16. Tellez D, Litjens G, Bándi P, Bulten W, Bokhorst JM, Ciompi F, van der Laak J: Quantifying the effects of data augmentation and stain color normalization in convolutional neural networks for computational pathology. Medical Image Analysis 58, 101544 (2019)
  • 17. Vahadane A, Peng T, Sethi A, Albarqouni S, Wang L, Baust M, Steiger K, Schlitter AM, Esposito I, Navab N: Structure-preserving color normalization and sparse stain separation for histological images. IEEE Transactions on Medical Imaging 35(8), 1962–1971 (2016)
  • 18. Villani C: Topics in Optimal Transportation. No. 58, American Mathematical Soc. (2003)
  • 19. Villani C: Optimal Transport: Old and New, vol. 338. Springer (2008)
  • 20. Wang Z, Bovik AC, Sheikh HR, Simoncelli EP: Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing 13(4), 600–612 (2004)
  • 21. Zhang L, Zhang L, Mou X, Zhang D: FSIM: A feature similarity index for image quality assessment. IEEE Transactions on Image Processing 20(8), 2378–2386 (2011)
