Author manuscript; available in PMC: 2014 Oct 28.
Published in final edited form as: Mach Learn Med Imaging. 2010;6357:58–66. doi: 10.1007/978-3-642-15948-0_8

Appearance Normalization of Histology Slides

Marc Niethammer 1, David Borland 1, J S Marron 1, John Woosley 1, Nancy E Thomas 1
PMCID: PMC4211434  NIHMSID: NIHMS337285  PMID: 25360444

Abstract

This paper presents a method for automatic color and intensity normalization of digitized histology slides stained with two different agents. In comparison to previous approaches, prior information on the stain vectors is used in the estimation process, resulting in improved stability of the estimates. Due to the prevalence of hematoxylin and eosin staining for histology slides, the proposed method has significant practical utility. In particular, it can be used as a first step to standardize appearance across slides, and it is very effective at countering effects due to differing stain amounts and protocols and due to slide fading. The approach is validated using synthetic experiments and 13 real datasets.

1 Introduction

To highlight distinct structures in microscopy images of tissue samples, tissue staining is commonly used. Frequently two stains, such as hematoxylin and eosin (H&E), are applied for purposes such as discriminating cell nuclei and cytoplasm. Variations in staining results can be minimized by using fully standardized staining protocols. However, in practice precise control over stain color and staining intensity is typically not possible: stains may fade over time, stain colors may differ slightly, slides may have been imaged on different microscopes, or data that has already been digitized may need to be analyzed.

Standard (non-fluorescent) stains absorb light. Local stain concentrations and stain colors determine the appearance of an illuminated slide sample under the microscope. If no stain is present and the underlying tissue does not absorb a significant amount of light, the corresponding pixel will appear bright. Areas where the stains accumulate will appear darker. Absorption is wavelength dependent, and a particular stain can be characterized by its absorption coefficients, forming a vector (the stain vector) of dimension equal to the number of wavelengths in the sensor used for imaging (three for a standard RGB color camera, as in this work). Given the stain vectors, an image can be decomposed into individual stain components via color deconvolution [1], and stains can subsequently be intensity adjusted. This paper proposes a method for automatic stain vector estimation and slide appearance normalization (color and intensity), which can in turn improve quantitative, computerized analysis.

Previous approaches to extract stain vectors include (1) manual region of interest definition, (2) methods relying on non-negative matrix factorizations [2], and (3) plane fitting in the optical density domain [3]. The approach presented in this paper is most closely related to [3]. Novel contributions include: (1) a rigorous theory for the color model used, (2) the introduction of prior information for the stain vector taking into account varying amounts of stain (such as that encountered in the case of sparsely distributed nuclei on large amounts of stained background tissue), (3) an alternating optimization method and its connection to a sub-problem from trust region optimization, (4) a novel twist on Otsu thresholding [4] which also includes prior information, and (5) quantitative validation on synthetic and real datasets.

Sec. 2 introduces the stain model and formalizes the planar assumption for the stain vectors of [3]. Sec. 3 discusses the plane fitting method with prior information. The clustering approach is presented in Sec. 4. Sec. 5 presents validation results, and Sec. 6 concludes.

2 Stain Vector Model

According to the Beer-Lambert law, the transmission of light through a material can be modeled as $I = I_0 e^{-\alpha c x}$, where $I_0$ is the intensity of the incident light and $I$ the intensity of the light after passing through the medium; $\alpha$ is the absorption coefficient, $c$ the concentration of the absorbing substance, and $x$ the distance traveled through the medium. The absorbance, or optical density (OD), is $OD = \alpha c x = -\log(I / I_0)$. The proposed method assumes that $\alpha$ and $x$ are constant for a specimen and a given stain, but that a stain's concentration $c$ may change. For a multi-spectral image the relation is:

$$I = I_0 \odot e^{-\alpha c x}, \qquad OD = -\log(I \oslash I_0) = \alpha c x,$$

where the absorption coefficient is color dependent ($\alpha_i$), $\odot$ denotes the Hadamard product (the element-wise vector product), and $\oslash$ the Hadamard division. Note that low intensities correspond to large optical densities and high intensities (e.g., white areas) correspond to low optical densities. Each stain has a characteristic vector $\alpha$ of absorption coefficients. Given a specific distance $x$, the optical density vector $OD$ is linearly related to the absorption coefficient vector, where the proportionality constant is given by the stain concentration: $OD = \alpha x c$. Applying the Beer-Lambert law to the two-stain color-image case (e.g., eosin and hematoxylin) yields $I = I_0 \odot e^{-(\alpha^1 c^1 x^1 + \alpha^2 c^2 x^2)}$, where superscripts denote values for the two distinct stains. Converting to optical density results in

$$-\log(I \oslash I_0) = \alpha^1 c^1 x^1 + \alpha^2 c^2 x^2,$$

which shows that the obtainable intensity vectors $I$ for a given illumination $I_0$ lie in the plane spanned by the absorption coefficient vectors $\alpha^i$ (the stain vectors) in the optical density domain. Since $c^i \geq 0$ and $x^i \geq 0$ and the $\alpha^i$ are linearly independent, any color (which can fully be explained by the imaging model) needs to lie within the convex cone $\mathcal{C} = \{x \mid x = q_1 \alpha^1 + q_2 \alpha^2,\ q_1, q_2 \geq 0\}$. Further, normalizing all possible optical density vectors, the resulting points must all lie within $\mathcal{C}_N = \{x \mid x = y / \|y\|,\ y \in \mathcal{C}_0\}$, where $\mathcal{C}_0$ denotes $\mathcal{C} \setminus 0$ and geometrically, $\mathcal{C}_N = \mathcal{S} \cap \mathcal{C}$, the intersection between $\mathcal{S}$ (the $n$-dimensional unit sphere; $n = 3$ for an RGB camera) and $\mathcal{C}$, which is a sector of a great circle.
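The stain model above can be illustrated with a minimal NumPy sketch. The stain vectors `alpha1` and `alpha2`, the white level `i0 = 255`, and the clipping constant `eps` are hypothetical values chosen for illustration; they are not taken from the paper.

```python
import numpy as np

def rgb_to_od(rgb, i0=255.0, eps=1e-6):
    """Optical density per channel: OD = -log(I / I0)."""
    rgb = np.asarray(rgb, dtype=float)
    # Clip to avoid log(0) at fully dark pixels (eps is an illustrative choice).
    return -np.log(np.clip(rgb, eps, None) / i0)

def od_to_rgb(od, i0=255.0):
    """Inverse mapping: I = I0 * exp(-OD)."""
    return i0 * np.exp(-np.asarray(od, dtype=float))

# Hypothetical stain vectors (absorption coefficients for R, G, B).
alpha1 = np.array([0.65, 0.70, 0.29])
alpha2 = np.array([0.07, 0.99, 0.11])

# Synthesize one pixel with stain amounts q1 = c1 * x1 and q2 = c2 * x2.
q1, q2 = 0.8, 0.3
pixel = od_to_rgb(q1 * alpha1 + q2 * alpha2)
od = rgb_to_od(pixel)

# The OD vector has no component along the normal of the stain plane.
normal = np.cross(alpha1, alpha2)
normal /= np.linalg.norm(normal)
out_of_plane = float(normal @ od)  # close to zero

# Given both stain vectors, the stain amounts follow from least squares.
amounts, *_ = np.linalg.lstsq(np.column_stack([alpha1, alpha2]), od, rcond=None)
```

Because the pixel was synthesized from the two-stain model, its optical density lies in the plane spanned by the two stain vectors and the least-squares solve recovers the stain amounts used to create it.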

3 Plane fitting with a plane prior

The optical density cone of Sec. 2 is a subset of a plane $\mathcal{P}$ passing through the origin, $\mathcal{P} = \{x : n^T x = 0\}$, where $n$ is the plane's unit normal. The signed distance of any point to the plane can be computed as $d(x, \mathcal{P}) = n^T x$. The plane closest to all points (with respect to their squared distances) minimizes

$$E(n) = n^T \left( \sum_i x_i x_i^T \right) n = n^T S n, \quad \text{s.t. } \|n\| = 1.$$

Since $S$ is by construction symmetric positive-semi-definite, the minimizing $n$ is the eigenvector belonging to the smallest eigenvalue of $S$. Such an unconstrained estimation was proposed in [3]. Estimation results are reliable when a sufficient amount of both stains is present in a given slide sample. When this assumption does not hold, for example when stained nuclei are sparse in a given sample or when artifactual stains (e.g., brown from melanin) are present, prior information on the color direction is very important to ensure good performance of the estimator. The corresponding maximum-a-posteriori energy (with prior) is

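$$E(n) = \left( \frac{1}{2\sigma^2} \sum_i d^2(x_i, \mathcal{P}) \right) + \frac{1}{2(\sigma_0)^2} \|n - n_p\|^2, \quad \text{s.t. } \|n\| = 1, \qquad (1)$$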

where $\sigma$ and $\sigma_0$ are the standard deviations for the measured points (assumed to be independent) and the prior, respectively (assumed to be Gaussian), and $n_p$ is the prior for the plane normal.
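As a rough illustration of the fit without prior and of the energy in Eq. 1, the following NumPy sketch computes the smallest-eigenvalue eigenvector of the scatter matrix and evaluates the energy for a candidate normal. The function names, the stain vectors, and the noise level are hypothetical choices made for this example.

```python
import numpy as np

def fit_plane_normal(points):
    """Unit normal of the plane through the origin closest to the points."""
    x = np.asarray(points, dtype=float)    # shape (N, 3), one OD vector per row
    s = x.T @ x                            # S = sum_i x_i x_i^T
    eigvals, eigvecs = np.linalg.eigh(s)   # eigenvalues in ascending order
    return eigvecs[:, 0]                   # the sign of the normal is arbitrary

def map_energy(n, points, n_prior, sigma, sigma0):
    """Energy of Eq. 1 for a unit normal n."""
    d = np.asarray(points, dtype=float) @ n    # signed distances n^T x_i
    return d @ d / (2 * sigma**2) + np.sum((n - n_prior) ** 2) / (2 * sigma0**2)

# Synthetic OD points close to the plane spanned by two hypothetical stain vectors.
rng = np.random.default_rng(0)
a1 = np.array([0.65, 0.70, 0.29])
a2 = np.array([0.07, 0.99, 0.11])
true_normal = np.cross(a1, a2)
true_normal /= np.linalg.norm(true_normal)

q = rng.uniform(0.0, 1.0, size=(200, 2))
points = q @ np.vstack([a1, a2]) + rng.normal(scale=0.01, size=(200, 3))

n_hat = fit_plane_normal(points)
alignment = abs(n_hat @ true_normal)   # near 1 when the fit is good
```

Since the eigenvector is only defined up to sign, comparisons against a reference normal use the absolute value of the dot product.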

An approximately planar distribution of points will only be observed if large numbers of measurement points for both stains are contained in the set of data points. This is often not the case. Therefore a weighting of the data points assigned to either one of the stain directions can be very useful. Assume the partitioning into two classes is given by a clustering method (as described in Sec. 4). Then the energy (up to constants) can be decomposed as

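$$E(n) = \left[ \sum_{j=1}^{2} \left( \sum_{i \in P_j}^{n_j} \frac{d^2(x_i, \mathcal{P})}{2\sigma^2} + \frac{\alpha}{2(\sigma_0)^2} \|n - n_p\|^2 \right) w_j \right],$$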

where $w_j$ are appropriately chosen weights, $P_j$ indicate the partitions, and $n_j$ are the numbers of points in the respective partitions. A reweighting should take into account the presence or absence of a sufficient number of data points in either of the clusters. Further, the energy should simplify to the form of Eq. 1 when both clusters are of equal size. For these properties to be fulfilled the following conditions should hold:

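$$\alpha w_1 + \alpha w_2 = 1, \quad w_1 \frac{N}{2} + w_2 \frac{N}{2} = N, \quad \frac{\gamma}{n_1} = w_1, \quad \frac{\gamma}{n_2} = w_2.$$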

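These conditions are fulfilled for $\alpha = \frac{1}{2}$, $w_1 = \frac{2 n_2}{n}$, and $w_2 = \frac{2 n_1}{n}$, with $n = N = n_1 + n_2$ the total number of points. After some algebra and rescaling (by $\frac{2\sigma^2 N}{4 n_1 n_2}$) the energy becomes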

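$$E(n) = n^T \left( \frac{1}{2 n_1} \sum_{i \in P_1} x_i x_i^T + \frac{1}{2 n_2} \sum_{i \in P_2} x_i x_i^T \right) n + \frac{N}{4 n_1 n_2} \frac{\sigma^2}{(\sigma_0)^2} \|n - n_p\|^2,$$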

which is of the same form as Eq. 1: a weighted covariance matrix and a cluster-dependent weighting of the prior term. The optimization problem is closely related to the sub-problem of finding a minimum over a boundary in trust-region optimization [5]. The overall solution alternates between a solution of the optimization problem and a reclustering (see Sec. 4) of the data points, to convergence.
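The cluster-weighted energy can be sketched as follows, assuming a given labeling of the points into two clusters (labels 0 and 1). The function name `cluster_weighted_scatter` and the returned `prior_scale` (the factor $N / (4 n_1 n_2)$ that multiplies $\sigma^2 / (\sigma_0)^2$ in the prior term) are naming choices made for this illustration.

```python
import numpy as np

def cluster_weighted_scatter(points, labels):
    """Weighted scatter matrix and prior scale of the cluster-balanced energy."""
    x = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    x1, x2 = x[labels == 0], x[labels == 1]
    n1, n2 = len(x1), len(x2)
    if n1 == 0 or n2 == 0:
        raise ValueError("both clusters must be non-empty")
    # Each cluster contributes its scatter divided by twice its own size,
    # so a small cluster is not drowned out by a large one.
    scatter = x1.T @ x1 / (2.0 * n1) + x2.T @ x2 / (2.0 * n2)
    # Factor N / (4 n1 n2); the prior weight is prior_scale * sigma^2 / sigma0^2.
    prior_scale = (n1 + n2) / (4.0 * n1 * n2)
    return scatter, prior_scale

# Strongly unbalanced example: 5 points in one cluster, 1000 in the other.
rng = np.random.default_rng(1)
pts = rng.normal(size=(1005, 3))
lab = np.r_[np.zeros(5, dtype=int), np.ones(1000, dtype=int)]
scatter, prior_scale = cluster_weighted_scatter(pts, lab)
```

For clusters of equal size the scatter matrix reduces to the plain average of $x_i x_i^T$ over all points and the prior scale to $1/N$, which is Eq. 1 up to a common factor.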

Theorem 1 (Optimal plane fit)

Given the optimization problem

$$\min_n \; n^T S n + \frac{1}{\sigma^2} \|n - n_p\|^2, \quad \|n\| = 1,$$

where S is a real positive-semi-definite matrix, at optimality

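$$(\bar{S} + \lambda I)\, n = \frac{1}{\sigma^2} n_p, \quad \|n\| = 1, \quad \bar{S} = S + \frac{1}{\sigma^2} I, \quad (\bar{S} + \lambda I) \succeq 0.$$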

Proof

Follows from the proof of the trust-region optimality conditions [5].

If $\bar{S} + \lambda I$ is invertible, the Lagrangian multiplier can be obtained by solving

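$$\sigma^4 \prod_{i=1}^{3} (\lambda_i + \lambda)^2 = \sum_{i=1}^{3} (\tilde{n}_p)_i^2 \prod_{j=1, j \neq i}^{3} (\lambda_j + \lambda)^2, \quad \lambda \in (-\lambda_1, \infty),$$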

where $\lambda_1$ is the smallest eigenvalue of $\bar{S}$, and $\tilde{n}_p = Q^T n_p$, with $Q$ the matrix of eigenvectors (as columns) of $\bar{S}$ and $\lambda_i$ the corresponding eigenvalues. If the matrix is not invertible, the point distribution exhibits symmetries which make the solution non-unique.
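One way to solve the optimality conditions of Theorem 1 numerically is sketched below. Instead of expanding the polynomial equation for $\lambda$, the sketch bisects on the equivalent condition $\|n(\lambda)\| = 1$ with $n(\lambda) = \frac{1}{\sigma^2}(\bar{S} + \lambda I)^{-1} n_p$, whose norm decreases monotonically on $(-\lambda_1, \infty)$. The bisection, the function name, and the test data are assumptions of this example; the degenerate (non-invertible) case is not handled.

```python
import numpy as np

def fit_normal_with_prior(s, n_prior, sigma, iters=200):
    """Minimize n^T S n + ||n - n_prior||^2 / sigma^2 subject to ||n|| = 1."""
    n_prior = np.asarray(n_prior, dtype=float)
    w = 1.0 / sigma**2
    s_bar = np.asarray(s, dtype=float) + w * np.eye(3)
    lam_i, q = np.linalg.eigh(s_bar)     # eigenvalues ascending, eigenvectors as columns
    np_tilde = q.T @ n_prior             # prior expressed in the eigenbasis

    def norm_sq(lam):
        # ||n(lam)||^2 for n(lam) = w * (S_bar + lam I)^{-1} n_prior
        return np.sum((w * np_tilde / (lam_i + lam)) ** 2)

    lo = -lam_i[0]                                  # norm grows without bound here
    hi = -lam_i[0] + w * np.linalg.norm(n_prior)    # norm is at most 1 here
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if norm_sq(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    lam = hi
    n = q @ (w * np_tilde / (lam_i + lam))
    return n / np.linalg.norm(n), lam

# Hypothetical data: points near a stain plane and a slightly tilted prior normal.
rng = np.random.default_rng(2)
a1 = np.array([0.65, 0.70, 0.29])
a2 = np.array([0.07, 0.99, 0.11])
pts = rng.uniform(size=(300, 2)) @ np.vstack([a1, a2]) + rng.normal(scale=0.02, size=(300, 3))
s = pts.T @ pts / len(pts)
n_prior = np.cross(a1, a2) + 0.2 * a1
n_prior /= np.linalg.norm(n_prior)
sigma = 0.5
n_hat, lam = fit_normal_with_prior(s, n_prior, sigma)
```

The multiplier `lam` may be negative; it only has to keep $\bar{S} + \lambda I$ positive definite, which the lower bracket of the bisection guarantees.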

Typically, the prior plane normal $n_p$ will not be directly available, but will be specified through a set of two given stain vectors $\{s_1, s_2\}$. The associated normal vector prior is then $n_p = \frac{s_1 \times s_2}{\|s_1 \times s_2\|}$.
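Computing the normal prior from two stain vectors is a one-line cross product; the short sketch below adds a guard against parallel inputs. The function name and the example stain vectors are hypothetical.

```python
import numpy as np

def normal_prior_from_stains(s1, s2):
    """Unit normal of the plane spanned by two stain vectors."""
    n = np.cross(np.asarray(s1, dtype=float), np.asarray(s2, dtype=float))
    length = np.linalg.norm(n)
    if length == 0.0:
        raise ValueError("stain vectors must not be parallel")
    return n / length

# Hypothetical prior stain vectors.
s1 = np.array([0.65, 0.70, 0.29])
s2 = np.array([0.07, 0.99, 0.11])
n_p = normal_prior_from_stains(s1, s2)
```

Swapping the two stain vectors flips the sign of the normal, so a consistent ordering of the stains should be used throughout.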

4 Clustering of the data points

The plane fitting algorithm of Sec. 3 requires a clustering method to partition the set of given normalized stain vectors with respect to the two stains applied to a tissue specimen. K-means is arguably one of the most popular clustering methods, and simplifies to Otsu thresholding when two cluster centers are sought for one-dimensional features. A globally optimal clustering can be computed by Otsu thresholding [4] by discretizing the space of possible thresholds. The computations are efficiently performed using the feature histogram. A suitable one-dimensional feature for the clustering of the plane-fitting algorithm is the angle in the fitting plane with respect to a given reference direction (the midpoint of the projection of the stain vector priors). The Otsu threshold minimizes within-cluster variance weighted by cluster size. Prior stain vector information should be used for the thresholding as well, to avoid gross mis-clusterings when the number of data points for one stain direction clearly dominates the other.

Prior information can be incorporated into Otsu thresholding by minimizing

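$$E(I_\theta, \mu_1, \mu_2) = \left( \frac{1}{\sigma_2^2} \sum_{i: I_i \leq I_\theta} (I_i - \mu_2)^2 \right) + \left( \frac{1}{\sigma_1^2} \sum_{i: I_i > I_\theta} (I_i - \mu_1)^2 \right) + \frac{1}{(\sigma_2^\mu)^2} (\mu_2 - \bar{\mu}_2)^2 + \frac{1}{(\sigma_1^\mu)^2} (\mu_1 - \bar{\mu}_1)^2,$$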

with respect to the unknown threshold Iθ (which completely specifies the segmentation) and the unknown central elements for the two stain angles μ1 and μ2. For a given partitioning, the optimal values are

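$$\mu_i = \frac{\sigma_i^2}{\sigma_i^2 + (\sigma_i^\mu)^2 n_i} \bar{\mu}_i + \frac{(\sigma_i^\mu)^2 n_i}{\sigma_i^2 + (\sigma_i^\mu)^2 n_i} \bar{I}_i,$$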

where ni denotes the numbers of points and Īi the mean angles in the partitions. Note that μ1 and μ2 are not the foreground and the background mean respectively, but are a weighted average of the means and the priors.

Computing the angle histogram for a set of points allows for direct, efficient computation of Īi, and consequently of μi. Searching over all discretized threshold values results in the globally optimal threshold Iθ. Neglecting the prior terms or specifying uniform priors and Gaussian distributions for the image likelihoods recovers the standard Otsu thresholding technique.
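The thresholding with priors can be sketched as a search over discretized thresholds, using the closed-form cluster centers for each candidate partition. For brevity the sketch evaluates the energy directly on the samples rather than on a histogram, and it orders all parameter pairs as (cluster at or below the threshold, cluster above the threshold); both are simplifications made for this example, as are the function name and the test values.

```python
import numpy as np

def otsu_with_prior(values, mu_bar, sigma, sigma_mu, n_thresholds=256):
    """Threshold and cluster centers minimizing the prior-augmented energy."""
    v = np.asarray(values, dtype=float)
    best_energy, best_theta, best_mus = np.inf, None, None
    for theta in np.linspace(v.min(), v.max(), n_thresholds):
        parts = (v[v <= theta], v[v > theta])
        energy, mus = 0.0, []
        for part, mb, s, sm in zip(parts, mu_bar, sigma, sigma_mu):
            n = part.size
            mean = part.mean() if n else mb   # an empty cluster falls back to its prior
            # Closed-form center: weighted average of prior center and cluster mean.
            mu = (s**2 * mb + sm**2 * n * mean) / (s**2 + sm**2 * n)
            energy += np.sum((part - mu) ** 2) / s**2 + (mu - mb) ** 2 / sm**2
            mus.append(float(mu))
        if energy < best_energy:
            best_energy, best_theta, best_mus = energy, float(theta), tuple(mus)
    return best_theta, best_mus

# Hypothetical angle features: a small cluster near 0.2 and a large one near 0.8.
rng = np.random.default_rng(3)
low = rng.normal(0.2, 0.03, size=20)
high = rng.normal(0.8, 0.03, size=400)
theta, mus = otsu_with_prior(np.concatenate([low, high]),
                             mu_bar=(0.25, 0.75),
                             sigma=(0.05, 0.05),
                             sigma_mu=(0.1, 0.1))
```

With many points in a cluster the center is dominated by the cluster mean; with few points it is pulled toward the prior center, which is the behavior the weighted average is meant to provide.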

To avoid specifying separate priors for np and for the angle center priors μ̄i, they are computed based on the stain vector priors. Since the stain vectors themselves are (according to the stain vector model) extreme directions specifying the boundary of the optical density cone, they are not directly useful as priors for Otsu thresholding, as pure colors are the exception, and a distribution of mixed colors is observed. Therefore, given two stain vectors $s_1$ and $s_2$, the priors are chosen as the angles with respect to the reference direction of the projections $\Pi\{q_i\} = q_i - (q_i^T n)\, n$ onto the current estimate of the plane, where $q_1 = (1 - \alpha) s_1 + \alpha s_2$ and $q_2 = \alpha s_1 + (1 - \alpha) s_2$, with $\alpha \in [0, 0.5)$ ($\alpha = 0.15$ in the experiments done here), are directions moved slightly inward from the cone boundaries.
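A possible way to compute the angle priors is sketched below. The in-plane basis (reference direction plus the vector `perp = n × ref`) and the resulting sign convention of the angles are assumptions of this example, as are the function name and the test vectors.

```python
import numpy as np

def angle_priors(s1, s2, n, alpha=0.15):
    """Angles (radians) of the inward-moved stain directions in the fitted plane."""
    s1, s2, n = (np.asarray(v, dtype=float) for v in (s1, s2, n))
    n = n / np.linalg.norm(n)

    def project(v):
        # Projection onto the plane with unit normal n.
        return v - (v @ n) * n

    # Reference direction: midpoint of the projected stain vector priors.
    ref = project(0.5 * (s1 + s2))
    ref /= np.linalg.norm(ref)
    perp = np.cross(n, ref)   # completes an orthonormal in-plane basis

    q1 = (1 - alpha) * s1 + alpha * s2
    q2 = alpha * s1 + (1 - alpha) * s2
    return tuple(float(np.arctan2(project(q) @ perp, project(q) @ ref)) for q in (q1, q2))

# Hypothetical unit stain vectors, 15 degrees to either side of the x axis.
half = np.radians(15.0)
s1 = np.array([np.cos(-half), np.sin(-half), 0.0])
s2 = np.array([np.cos(half), np.sin(half), 0.0])
mu_bar_1, mu_bar_2 = angle_priors(s1, s2, n=np.array([0.0, 0.0, 1.0]))
```

For stain vectors of equal length the two prior angles are symmetric about the reference direction and lie strictly inside the cone spanned by the stain vectors.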

Once the plane has been fit, estimates of stain intensities are computed as the medians of the clusters. Estimates of the stain vectors are obtained by computing the robust minima (γ-percentile) and maxima ((1 − γ) percentile) within the cluster centers; γ = 1 for all experiments performed here. These estimates enable the transformation of images into any chosen color-space, and therefore normalization of appearance across multiple slides. While related to the approach of [3], the proposed method alleviates problems with uneven cluster sizes by defining statistics within the clusters, by weighting the plane fit with respect to cluster size, and by incorporating prior information. The benefits of this new approach are presented in Sec. 5. Fig. 1 shows an overview of the plane fitting algorithm with prior.
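Given estimated stain vectors, the transformation into a chosen target appearance can be sketched as a least-squares decomposition into stain amounts followed by recomposition with target stain vectors. This is an illustrative sketch, not the paper's exact procedure: the function `restain`, the per-stain `scale` factors, the clipping of negative amounts, and all numeric values are assumptions.

```python
import numpy as np

def restain(od_pixels, est_stains, target_stains, scale=(1.0, 1.0)):
    """Map OD pixels from estimated stain vectors to target stain vectors."""
    od = np.asarray(od_pixels, dtype=float)                   # shape (N, 3)
    est = np.column_stack(est_stains)                         # shape (3, 2)
    tgt = np.column_stack(target_stains)                      # shape (3, 2)
    # Stain amounts per pixel by least squares on the estimated stain vectors.
    amounts, *_ = np.linalg.lstsq(est, od.T, rcond=None)      # shape (2, N)
    # Negative amounts are not explained by the model; clip them (assumption).
    amounts = np.clip(amounts, 0.0, None)
    amounts = amounts * np.asarray(scale, dtype=float)[:, None]
    return (tgt @ amounts).T                                  # shape (N, 3)

# Hypothetical estimated and target stain vectors.
e1, e2 = np.array([0.60, 0.75, 0.28]), np.array([0.10, 0.95, 0.15])
t1, t2 = np.array([0.65, 0.70, 0.29]), np.array([0.07, 0.99, 0.11])
od_in = np.array([0.5 * e1 + 0.2 * e2])
od_out = restain(od_in, (e1, e2), (t1, t2))
```

The result is still in the optical density domain; converting back to intensities uses $I = I_0 \odot e^{-OD}$ from Sec. 2.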

Fig. 1. Algorithmic description of the optimal plane-fit algorithm.

5 Validation and Experimental Results

Figs. 2, 3 and 4 show the performance of the plane fitting algorithm with prior. Fig. 3 shows the results of a synthetic experiment to estimate the plane normal. Three methods are compared: (1) estimation without a plane normal prior (corresponding to the method in [3]), (2) estimation with a plane normal prior, but without the clustering step, and (3) the full algorithm of Fig. 1. Estimation results shown are deviations (in degrees) of the estimated normal vector with respect to the ground truth normal vector. To assess the influence of varying cluster sizes and normal priors, two stain vectors s1, s2 were chosen at a 15 degree angle. Varying numbers of stain vectors were generated for the two stains using an isotropic Gaussian distribution (standard deviation of 0.1, which is similar to that in our real data sets). Priors were generated from the stain vectors by tilting the normal vector by an angle θ with respect to the axis defined by the stain vectors (translated to the origin) and by rotating the priors within the plane by an angle Φ. Fig. 3 shows that all three methods to determine the plane normal have similar performance for clusters of equal size. Prior information improves the results greatly for uneven point distributions (by almost 10 degrees on average for the most extreme point imbalance, 5/1000, as is expected to occur, for example, in regions with sparsely distributed nuclei). In cases where the effect of the prior is most pronounced, the clustering further improves estimation results.
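The synthetic setup can be approximated with the sketch below, which draws unbalanced clusters around two stain vectors 15 degrees apart and measures the angular error of a plane fit without prior. The concrete stain vectors, the way noise is added (isotropic Gaussian around each stain vector), the use of the absolute dot product to ignore the sign of the normal, and the function names are assumptions of this example and may differ from the experiment reported here.

```python
import numpy as np

def angle_between_normals_deg(n_est, n_true):
    """Angle in degrees between two plane normals, ignoring their sign."""
    n_est = np.asarray(n_est, dtype=float)
    n_true = np.asarray(n_true, dtype=float)
    c = abs(n_est @ n_true) / (np.linalg.norm(n_est) * np.linalg.norm(n_true))
    return float(np.degrees(np.arccos(np.clip(c, 0.0, 1.0))))

def synthetic_clusters(s1, s2, n1, n2, std=0.1, rng=None):
    """Points scattered isotropically around two stain vectors, with labels."""
    rng = np.random.default_rng() if rng is None else rng
    c1 = np.asarray(s1, dtype=float) + rng.normal(scale=std, size=(n1, 3))
    c2 = np.asarray(s2, dtype=float) + rng.normal(scale=std, size=(n2, 3))
    labels = np.r_[np.zeros(n1, dtype=int), np.ones(n2, dtype=int)]
    return np.vstack([c1, c2]), labels

# Two hypothetical unit stain vectors at a 15 degree angle; true normal is the z axis.
angle = np.radians(15.0)
s1 = np.array([1.0, 0.0, 0.0])
s2 = np.array([np.cos(angle), np.sin(angle), 0.0])
true_normal = np.array([0.0, 0.0, 1.0])

points, labels = synthetic_clusters(s1, s2, n1=5, n2=1000, rng=np.random.default_rng(4))

# Plane fit without prior: eigenvector of the smallest eigenvalue of the scatter matrix.
n_no_prior = np.linalg.eigh(points.T @ points)[1][:, 0]
error_deg = angle_between_normals_deg(n_no_prior, true_normal)
```

Repeating the draw over many random seeds and cluster-size ratios, and swapping in a fit that uses the prior, would give error statistics of the kind summarized in Fig. 3.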

Fig. 3. Synthetic experiments: Angle difference (in degrees) between the estimated normal vector and the ground truth. Top row: proposed method. Middle row: error of the plane fit with prior but without clustering minus the error with clustering. Bottom row: error of the plane fit without prior minus the error of the plane fit with prior and clustering. Estimates are results of 1000 random samples for different priors and varying numbers (5/1000, …) of points in the two stain clusters. The proposed method performs best. The white line shows the zero level set of angle differences.

Fig. 2. Estimation consistency for the proposed method and the method not using prior information [3] comparing the mean deviation from the mean normal vector across a slide (with respect to the tiles) in degrees. Smaller values and a tighter distribution demonstrate the advantage of the proposed method. Results are statistically significantly different.

Fig. 4. Restaining of a real dataset using the proposed method (right) and the method not using prior information (middle). Zoom-ins (top-right).

Fig. 2 shows the performance of the method on 13 real datasets compared to direct plane fitting without a plane prior as in [3]. The histology images were subdivided into areas of 1000×1000 pixels and were independently adjusted for stain intensity and stain direction using the two methods. Fig. 2 shows the estimation consistency for the two methods by comparing the mean deviation from the mean normal vector across a slide (mean with respect to the tiles). Estimation consistency is statistically significantly better for the proposed method (with p < 1e−4 using a t-test or non-parametric permutation test). The mean deviation from the prior was around 11 degrees for the method using prior information and 20 degrees for the method not using the prior information. The tight distribution for the consistency results for the proposed method demonstrates that the prior was not chosen to dominate the results.
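The consistency measure, the mean deviation from the mean normal vector across the tiles of a slide, could be computed along the following lines. Aligning the signs of the per-tile normals to the first tile before averaging is an assumption of this sketch, as are the function name and the example normals.

```python
import numpy as np

def mean_deviation_from_mean_normal_deg(normals):
    """Mean angle (degrees) between per-tile normals and their mean direction."""
    n = np.asarray(normals, dtype=float)
    n = n / np.linalg.norm(n, axis=1, keepdims=True)
    # Plane normals are defined up to sign; align all of them with the first one.
    flip = np.where(n @ n[0] < 0, -1.0, 1.0)
    n = n * flip[:, None]
    mean = n.mean(axis=0)
    mean /= np.linalg.norm(mean)
    angles = np.degrees(np.arccos(np.clip(n @ mean, -1.0, 1.0)))
    return float(angles.mean())

# Two hypothetical tile normals tilted by 10 degrees to either side of the z axis.
t = np.radians(10.0)
tile_normals = np.array([[np.sin(t), 0.0, np.cos(t)],
                         [-np.sin(t), 0.0, np.cos(t)]])
consistency = mean_deviation_from_mean_normal_deg(tile_normals)
```

Smaller values indicate that the per-tile estimates agree more closely with each other.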

To illustrate the behavior of the estimation method graphically, Fig. 4 shows the results for a real dataset compared to direct plane fitting without a plane prior. Stain intensity scaling factors and the deviations from the mean of the estimates for the normal direction are shown. For a well-working method, the results are expected to be approximately uniform. While the difference in the stain correction between the two methods (in both cases prior information was used for the clustering to obtain the intensity scalings) is not as drastic as for the normal direction, the plane fitting method with prior improves the intensity scalings, as can be seen from the reconstruction results, which are almost perfectly uniform for the method using prior information and inconsistent otherwise.

6 Conclusions

This paper presented a method to automatically adjust the appearance of stained histology slides. It described a novel way of adding prior information for the stain vectors and how to deal with unequal stain distribution through a clustering process. The clustering is a novel adaptation of Otsu thresholding including prior information. The underlying optimization problem relates to trust-region optimization and is therefore well studied and easy to solve. Real and synthetic experiments showcase the superior performance of the method developed.

References

  • 1. Ruifrok A, Johnston D. Quantification of histochemical staining by color deconvolution. Analytical and Quantitative Cytology and Histology. 2001 Aug;23:291–299.
  • 2. Rabinovich A, Agarwal S, Laris C, Price J, Belongie S. Unsupervised color decomposition of histologically stained tissue samples. Advances in Neural Information Processing Systems. 2003.
  • 3. Macenko M, Niethammer M, Marron J, Borland D, Woosley J, Guan X, Schmitt C, Thomas N. A method for normalizing histology slides for quantitative analysis. Proceedings of the Sixth IEEE International Symposium on Biomedical Imaging (ISBI) 2009:1107–1110.
  • 4. Otsu N. A threshold selection method from gray-level histograms. Automatica. 1975;11:285–296.
  • 5. Nocedal J, Wright SJ. Numerical Optimization. Springer; 2006.
