Abstract
Automatic image analysis of histopathology specimens would help the early detection of blood cancer. The first step in automatic image analysis is segmentation. However, touching cells pose difficulties for traditional segmentation algorithms. In this paper, we propose a novel algorithm which can reliably handle touching cell segmentation. Robust estimation and color active contour models are used to delineate the outer boundary. Concave points on the boundary and inner edges are automatically detected. A concave vertex graph is constructed from these points and edges. By minimizing a cost function based on morphological characteristics, we recursively calculate the optimal path in the graph to separate the touching cells. The algorithm is computationally efficient and has been tested on two large clinical datasets which contain 207 images and 3898 images, respectively. Our algorithm provides better results than other studies reported in the recent literature.
1 Introduction
As new therapies emerge for blood cancer screening, it becomes increasingly important to distinguish among subclasses of lymphocytes in advance. Processing the specimen using a reliable, image-based analysis system could reduce cost and patient morbidity. In image-based analysis the first step is segmentation. However, traditional methods usually fail to accurately segment touching cells in digitized hematologic specimens. Touching cells are especially prominent in malignant cases. In Figure 1, we show representative morphologies for benign cases and five hematologic malignancies (hematoxylin-eosin staining): Chronic Lymphocytic Leukemia (CLL) [1], Mantle Cell Lymphoma (MCL) [2], Follicular Center Cell Lymphoma (FCC) [3], Acute Myelocytic Leukemia (AML) and Acute Lymphocytic Leukemia (ALL) [2].
The watershed algorithm is the most commonly used method for performing touching object segmentation. However, it suffers from several major drawbacks.
Oversegmentation. The algorithm is sensitive to noise and often produces many oversegmented small regions. Marker-based watershed [4] can partially remedy this issue, but it requires manual selection or accurate estimation of the markers.
Lack of shape prior. It is generally difficult to include shape priors in the watershed transform. Although some approaches [5,6] have been proposed for specific cases, the general problem remains.
In this paper, we propose a novel algorithm to separate touching cells. The algorithm starts from a deformable model which extracts the boundary contour of the touching cells. The concave vertex graph is constructed using the concave vertices on the contour and the edges detected in the region of touching cells. The segmentation is then treated as an optimal grouping of pixels, which can be solved by recursively searching for the optimal shortest path in the concave vertex graph.
2 Boundary Contour Extraction
The initial step of the algorithm is to extract the boundary contour of the touching cells. We first apply L2E robust estimation [7] to obtain a rough estimate of the outer boundaries of the cells inside the region of interest (ROI). A robust gradient vector flow (GVF) snake [8] using Luv [9, Sec. 8.4] color gradients is then applied to extract the objects from the background. Since the deformable models are initialized using the results of robust estimation, the convergence speed is increased and the method can handle topological changes. In this paper, we focus our attention on the touching cases shown in Figure 2b, where the output contour represents the outer boundary of the touching cells.
3 Concave Points and Inner Edges Detection
In Figure 3, we show the construction of the concave vertex graph. The contour found by the boundary contour extraction algorithm is shown in Figure 3a. We detect the high curvature points on the contour via [10] (Figure 3b). At each point p on the contour a set of triangles is constructed. The points which satisfy
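$$d_{\min}^2 \le \|p - p^+\|^2 \le d_{\max}^2, \qquad d_{\min}^2 \le \|p - p^-\|^2 \le d_{\max}^2, \qquad \alpha \le \alpha_{\max} \qquad (1)$$

with dmin = 7 pixels, dmax = 9 pixels and αmax = 150° are kept as candidates. Here p− and p+ are the two other vertices of a triangle, taken on the contour on either side of p, and α is the opening angle of the triangle at p. The candidates are further processed to suppress the local nonmaxima points. The final high curvature points correspond to both concave and convex points. We keep only the concave points, shown as red rectangles in Figure 3c. Concavity is determined from the sign of the cross product a ⊗ b of the two triangle sides a and b meeting at p, which has to be negative for concave points.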
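The detection step can be illustrated with a minimal Python sketch. It assumes a closed contour given as an ordered list of (x, y) points in counter-clockwise order, so that a negative cross product marks a concave point. The function names, the symmetric neighbour offset k, and the choice of the sharpest admissible triangle are assumptions made for illustration, and the nonmaxima suppression step is omitted.

```python
import math


def cross(a, b):
    """z-component of the 2-D cross product of vectors a and b."""
    return a[0] * b[1] - a[1] * b[0]


def concave_points(contour, d_min=7.0, d_max=9.0, alpha_max=150.0):
    """Return indices of concave high-curvature points on a closed contour.

    Assumes counter-clockwise point order (hypothetical convention).
    """
    n = len(contour)
    found = []
    for i, p in enumerate(contour):
        best = None  # (opening angle in degrees, cross product at p)
        for k in range(1, (n - 1) // 2 + 1):
            p_minus = contour[(i - k) % n]
            p_plus = contour[(i + k) % n]
            a = math.dist(p, p_plus)
            b = math.dist(p, p_minus)
            if a == 0 or b == 0:
                continue
            # Side-length conditions of the admissible triangle.
            if not (d_min <= a <= d_max and d_min <= b <= d_max):
                continue
            c = math.dist(p_plus, p_minus)
            cos_alpha = (a * a + b * b - c * c) / (2 * a * b)
            alpha = math.degrees(math.acos(max(-1.0, min(1.0, cos_alpha))))
            # Opening-angle condition; keep the sharpest triangle.
            if alpha <= alpha_max and (best is None or alpha < best[0]):
                side_in = (p[0] - p_minus[0], p[1] - p_minus[1])
                side_out = (p_plus[0] - p[0], p_plus[1] - p[1])
                best = (alpha, cross(side_in, side_out))
        if best is not None and best[1] < 0:
            found.append(i)
    return found


# Toy contour with a single notch at index 3; thresholds are scaled down
# from the 7 and 9 pixel values because the toy polygon is tiny.
contour = [(0, 0), (4, 0), (4, 4), (2, 1), (0, 4)]
print(concave_points(contour, d_min=1.0, d_max=5.0, alpha_max=150.0))  # [3]
```

On a real contour the points would be densely sampled boundary pixels, and the default thresholds would apply directly.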
The Canny edge detector is applied inside the cell region and straight line fitting is used to model the edges (Figure 3d). The separating curve connects a pair of concave vertices on the boundary and is enforced to pass through the inner edges.
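The line fitting step can be sketched as follows. The text does not specify the fitting method, so this example assumes a total least squares fit along the principal axis of a group of edge pixels; the function name `fit_segment` and the input format are hypothetical, and the edge detection itself is not shown. The two returned end points play the role of the end points of an inner edge.

```python
import math


def fit_segment(pixels):
    """Fit a straight segment to edge pixels by total least squares.

    pixels: list of (x, y) coordinates belonging to one detected edge.
    Returns the two end points of the fitted segment.
    """
    n = len(pixels)
    mx = sum(x for x, _ in pixels) / n
    my = sum(y for _, y in pixels) / n
    sxx = sum((x - mx) ** 2 for x, _ in pixels)
    syy = sum((y - my) ** 2 for _, y in pixels)
    sxy = sum((x - mx) * (y - my) for x, y in pixels)
    # Direction of the principal axis of the 2x2 scatter matrix.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    dx, dy = math.cos(theta), math.sin(theta)
    # Project the pixels on the axis and keep the extreme projections.
    t = [(x - mx) * dx + (y - my) * dy for x, y in pixels]
    t_lo, t_hi = min(t), max(t)
    return (mx + t_lo * dx, my + t_lo * dy), (mx + t_hi * dx, my + t_hi * dy)


start, end = fit_segment([(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)])
print(start, end)
```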
4 Touching Cells Segmentation
The outer boundary of the touching cells is defined as C, and the region enclosed by C is R(C). The concave points form the set V, e.g. v1–v5 shown in Figure 3e. The inner edges form the set E, shown as white solid lines in Figure 3e and illustrated by ei in Figure 3f.
4.1 Concave Vertex Graph
In Figure 3f we construct the concave vertex graph G. Let W be the vertex set consisting of the end points of inner edges E, e.g. wi and wj in Figure 3f. The vertices of graph G are then equal to V ∪ W.
The graph has two sets of edges, E and F. The set E contains the inner edges found by the edge detection algorithm. The set F consists of filling edges, which connect the vertices of G that are not connected by inner edges, e.g. fk in Figure 3f. The lengths of the inner edges are set to ε = $10^{-16}$, while the lengths of the filling edges in set F are given by the Euclidean distance between the two vertices of the edge.
Dijkstra's algorithm is used to find the shortest path pij between vi and vj. Because the length of the real inner edges is set to ε, the length of pij, ||pij||, is given by the total length of the filling edges fk in pij:

$$\|p_{ij}\| = \sum_{f_k \in p_{ij}} \|f_k\| \qquad (2)$$
In Figure 3f, as an example, we can see that ||p12|| > ||p13|| because p12 traverses longer filling edges than p13. The defined path lengths force the segmentation to follow inner edges, since the trivial solution of directly connecting two concave vertices using only filling edges in graph G would give a longer path.
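The graph construction and shortest path search described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: vertices are given as a dictionary of named coordinates, every pair of vertices not joined by an inner edge receives a filling edge, and the names `build_graph` and `shortest_path` as well as the toy coordinates are hypothetical.

```python
import heapq
import itertools
import math

EPS = 1e-16  # length assigned to inner edges


def build_graph(points, inner_edges):
    """Build the weighted graph over concave points and inner edge end points.

    points: dict mapping a vertex name to its (x, y) position.
    inner_edges: iterable of (name, name) pairs found by edge detection.
    """
    inner = {frozenset(e) for e in inner_edges}
    graph = {name: {} for name in points}
    for u, v in itertools.combinations(points, 2):
        if frozenset((u, v)) in inner:
            length = EPS  # inner edge
        else:
            length = math.dist(points[u], points[v])  # filling edge
        graph[u][v] = graph[v][u] = length
    return graph


def shortest_path(graph, source, target):
    """Dijkstra's algorithm; returns (path length, list of vertex names)."""
    dist = {source: 0.0}
    prev = {}
    done = set()
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == target:
            break
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]


# Two concave vertices and one inner edge lying between them.
points = {"v1": (0, 0), "v2": (10, 0), "w1": (3, 1), "w2": (7, 1)}
graph = build_graph(points, [("w1", "w2")])
length, path = shortest_path(graph, "v1", "v2")
print(path, length)  # the path follows the inner edge w1-w2
```

In this toy case the direct filling edge between v1 and v2 has length 10, while the path through the inner edge only pays for the two short filling edges at its ends, so the search prefers the cut that follows the detected edge.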
Alg. 1.
Input: The region of interest (ROI) containing touching cells.
After Dijkstra's algorithm is applied, we have all the shortest paths pij among the concave vertices, which are valid candidates for separating the touching cells. The key idea of our algorithm is to treat touching cell segmentation as a recursive search for the best path pij in G, which minimizes a cost function specifically designed to favor cell-like object cuts.
4.2 Cost Function
We are looking for a perceptually “good” segmentation of touching cells. For this purpose, we design the cost function to represent the cues that surgical pathologists use in their judgement.
- The cells should be objects which are perceptually salient, since humans tend to separate such objects in an image. A good definition of saliency is proposed in [11] based on the Gestalt laws [12]. We apply the minimum of two saliency costs

$$c_s = \min_{L,R} \frac{\|p_{ij}\|}{\mathrm{area}(C, p_{ij})} \qquad (3)$$

where ||pij|| is the length defined in (2), each path pij in G divides R(C) into two regions L and R, and the min function in (3) selects the region with the smallest cost. The area(C, pij) denotes the area enclosed by C and path pij.

- The cells are objects which are close to elliptical in shape and can be modeled by ellipse fitting using points on C and pij. The ratio between the long and short axes is recorded as tg. The segmented objects are expected to provide a ratio tg in the range [tg1, tg2], in which case dist(tg, [tg1, tg2]) = 0. Otherwise, we define dist(tg, [tg1, tg2]) = min(|tg − tg2|, |tg − tg1|).

$$c_e = \min_{L,R} \mathrm{dist}(t_g, [t_{g1}, t_{g2}]) \qquad (4)$$

where L and R have the same definition as in (3), and tg1 and tg2 represent the lower and upper bounds of the ratio of the long axis to the short axis.

- The cells are objects which have biologically reasonable areas. Following the definition above, we use ta1 and ta2 to represent the lower and upper bounds of the cell area.

$$c_a = \min_{L,R} \mathrm{dist}(t_a, [t_{a1}, t_{a2}]) \qquad (5)$$

where ta denotes the area of the segmented region.

The final cost c is the weighted sum

$$c = \lambda_1 c_s + \lambda_2 c_e + \lambda_3 c_a \qquad (6)$$

The optimal values of the coefficients are selected as λ1 = 0.5, λ2 = 0.3 and λ3 = 0.2, which are learned in an offline process using a training set and held constant throughout the experiments.
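As an illustration, the three cues and their weighted sum can be sketched in Python. This is a sketch under assumptions rather than the exact formulation: the saliency term is taken to be the ratio of path length to enclosed area, each term takes its minimum over the two regions L and R, the axis ratio of each region is assumed to come from a separate ellipse fit, and the bounds used in the example are hypothetical placeholders. Only the weights 0.5, 0.3 and 0.2 come from the text.

```python
def interval_distance(t, lower, upper):
    """Distance from t to the interval [lower, upper]; zero inside it."""
    if lower <= t <= upper:
        return 0.0
    return min(abs(t - lower), abs(t - upper))


def polygon_area(points):
    """Area of a simple polygon given as a list of (x, y) vertices."""
    s = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:] + points[:1]):
        s += x0 * y1 - x1 * y0
    return abs(s) / 2.0


def cut_cost(path_length, regions, ratio_bounds, area_bounds,
             weights=(0.5, 0.3, 0.2)):
    """Weighted cost of one candidate path.

    regions: two dicts (for L and R) with keys 'area' and 'axis_ratio'.
    ratio_bounds, area_bounds: (lower, upper) tuples, assumed known.
    """
    saliency = min(path_length / r["area"] for r in regions)
    shape = min(interval_distance(r["axis_ratio"], *ratio_bounds)
                for r in regions)
    size = min(interval_distance(r["area"], *area_bounds) for r in regions)
    l1, l2, l3 = weights
    return l1 * saliency + l2 * shape + l3 * size


# Hypothetical candidate cut: region L looks like a single cell,
# region R is still a large elongated clump.
regions = [
    {"area": 100.0, "axis_ratio": 1.2},
    {"area": 400.0, "axis_ratio": 3.0},
]
cost = cut_cost(4.0, regions, ratio_bounds=(1.0, 2.0),
                area_bounds=(80.0, 200.0))
print(cost)
```

The candidate path with the lowest such cost would be the one selected at each recursion step.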
4.3 Algorithm
Using the concave vertex graph G and the cost function c, the method is described in Algorithm 1. It is recursively applied to separate touching cells until the entire region R(C) is allocated to the segmented cells. The algorithm only separates the cytoplasm of the touching cells. Since the colors of nuclei and cytoplasm are distinct, they can be easily separated. In order to provide smooth boundaries, we apply quadratic splines to postprocess the boundaries of each segmented cell.
5 Experiments
The cell database consists of a mixed set of 86 hematopathology cases: 18 Mantle Cell Lymphoma (MCL), 20 Chronic Lymphocytic Leukemia (CLL), 9 Follicular Center Cell Lymphoma (FCC), 18 Acute Lymphocytic Leukemia (ALL), 19 Acute Myelocytic Leukemia (AML), and 19 benign cases. For each case, the number of cell images varies from 10 to 90. In total, there are 3898 cell images in our complete database. All the cases were obtained from the archives of City of Hope Hospital in California, University of Pennsylvania School of Medicine, Spectrum Health System, Grand Rapids, MI and Robert Wood Johnson Medical School, University of Medicine & Dentistry of New Jersey.
The imaging platform for the experiments consisted of an Intel-based workstation interfaced with a high-resolution Olympus DP70 camera equipped with 12-bit color depth on each color channel and 1.45 million pixel effective resolution. The system also includes a single 2/3 inch CCD digital camera, an Olympus AX70 microscope equipped with a Prior 6-way robotic stage, motorized objective turret and a magnification changer.
We compare the segmentation results with manual segmentation. Two sets of experiments are performed.
The 207 touching cases of the histopathology cell image dataset.
The complete database which contains 3898 histopathology cell images.
Figure 4 shows some segmentation results. In Table 1 we present the segmentation accuracies for the six different classes of lymphocytes in the two sets of experiments. We obtained an average accuracy of 88.9% on the touching cells dataset and 90.1% on the complete database.
Table 1.
Measure | Benign | CLL | MCL | FCC | AML | ALL |
---|---|---|---|---|---|---|
accuracy_c (%) of touching cells | 90.1 | 90.8 | 86.4 | 86.9 | 86.3 | 85.2 |
accuracy_n (%) of touching cells | 92.3 | 91.2 | 88.1 | 88.7 | 87.5 | 87.9 |
accuracy_c (%) of all cells | 92.5 | 91.7 | 87.2 | 89.1 | 88.5 | 87.6 |
accuracy_n (%) of all cells | 95.8 | 92.8 | 90.1 | 91.0 | 88.9 | 89.2 |
Only a limited number of recent studies address the issue of touching cell segmentation in histopathology images using hematoxylin staining at high resolution (60× in our case). The watershed algorithm [4] is widely accepted for touching object segmentation and has been successfully used in segmenting histopathology images [13]. We compared our method with watershed using the 207 touching cell image dataset and list the results in Table 2. The 80% column in Table 2 represents the sorted 80% highest accuracy of all the results, and is commonly used by doctors to evaluate the usability of the system. The experiments demonstrate the superior performance of the presented approach.
Table 2.
Method | Mean | Variance | Median | Min | Max | 80% |
---|---|---|---|---|---|---|
Watershed | 74.3 | 9.8 | 75.1 | 65.4 | 82.7 | 72.9 |
Concave Vertex Graph | 88.9 | 5.1 | 90.2 | 75.2 | 95.5 | 87.1 |
6 Conclusion
In this paper, a novel segmentation algorithm has been proposed to address the challenges of touching cell segmentation in hematologic specimens. The results are validated using real clinical data containing six classes of hematologic blood cell images. We compare our algorithm with watershed and experimentally show the superior performance of the proposed algorithm.
For the general pixel grouping problem on a general graph, the optimization problem is NP-hard. Only certain cost functions can be approximately solved in polynomial time using algorithms such as normalized cut [14]. In our algorithm, the cost function is designed to meet domain-specific requirements. The concave vertex graph, which utilizes the concave points of the outer contour, reduces the search space to the shortest paths in the constructed graph G. Based on a MATLAB implementation, the algorithm finishes in less than 2 seconds for a 128×128 image.
References
- 1. Rozman C, Montserrat E. Chronic lymphocytic leukemia. The New England Journal of Medicine. 1995;333(16):1052–1057. doi: 10.1056/NEJM199510193331606.
- 2. Cotran R, Kumar V, Collins T, Robbins S. Pathologic basis of disease. 5th ed. W.B. Saunders Company; Philadelphia: 1994.
- 3. Aisenberg A. Coherent view of non-Hodgkin’s lymphoma. J Clin Oncol. 1995;13:2656–2675. doi: 10.1200/JCO.1995.13.10.2656.
- 4. Moga AN, Gabbouj M. Parallel marker-based image segmentation with watershed transformation. Journal of Parallel and Distributed Computing. 1998;51(1):27–45.
- 5. Grau V, Mewes AUJ, Alcaniz M, Kikinis R, Warfield SK. Improved watershed transform for medical image segmentation using prior information. ITMI. 2004;23(4):447–458. doi: 10.1109/TMI.2004.824224.
- 6. Nguyen HT, Ji Q. Improved watershed segmentation using water diffusion and local shape priors. CVPR. 2006;1:985–992.
- 7. Scott DW. Parametric statistical modeling by minimum integrated square error. Technometrics. 2001;43:274–285.
- 8. Yang L, Meer P, Foran D. Unsupervised segmentation based on robust estimation and color active contour models. IEEE Trans on Information Technology in Biomedicine. 2005;9:475–486. doi: 10.1109/titb.2005.847515.
- 9. Wyszecki G, Stiles WS. Color Science: Concepts and Methods, Quantitative Data and Formulae. 2nd ed. Wiley; Chichester: 1982.
- 10. Chetverikov D, Szabó Z. A simple and efficient algorithm for detection of high curvature points in planar curves. The 23rd Workshop of the Austrian Pattern Recognition Group; 1999. pp. 175–184.
- 11. Stahl JS, Wang S. Convex grouping combining boundary and region information. ICCV. 2005;2:946–953. doi: 10.1109/tip.2007.904463.
- 12. Elder JH, Goldberg RM. Ecological statistics of Gestalt laws for the perceptual organization of contours. Journal of Vision. 2002;2(4):324–353. doi: 10.1167/2.4.5.
- 13. Adiga PSU, Chaudhuri BB. An efficient method based on watershed and rule-based merging for segmentation of 3D histo-pathological images. J Pattern Recognition. 2001;34(7):1449–1458.
- 14. Cai W, Chung AC. Multi-resolution vessel segmentation using normalized cuts in retinal images. In: Larsen R, Nielsen M, Sporring J, editors. MICCAI 2006. LNCS. Vol. 4191. Springer; Heidelberg: 2006. pp. 928–936.