Abstract.
Purpose: Semi-automatic image segmentation remains a valuable tool in clinical applications because it retains the expert oversight that is legally required. However, semi-automatic methods for simultaneous multi-class segmentation are difficult to implement clinically due to the complexity of the underlying algorithms. We propose an efficient one-vs-rest graph cut approach whose complexity grows only linearly as the number of classes increases.
Approach: Given an image slice, we construct multiple one-vs-rest graphs, one for each tissue class, for inference of a conditional random field (CRF). Each one-vs-rest graph cut minimizes the CRF energy derived from regional and boundary class probabilities estimated by random forests, yielding a one-vs-rest segmentation. The final segmentation is obtained by fusing these one-vs-rest segmentations through majority voting. We compare our method to a widely used multi-class graph cut method, alpha-beta swap, and a fully connected CRF (FCCRF) method in brain tumor segmentation of 20 high-grade tumor cases from the 2013 MICCAI dataset.
Results: Our method achieved a mean Dice score of 0.83 for whole tumor, compared with 0.80 for alpha-beta swap and 0.79 for FCCRF. In addition, our method ran roughly five times faster than alpha-beta swap.
Conclusions: Our method uses probability-based CRF potentials that can be estimated with any machine learning technique. Compared with traditional multi-class graph cut, the proposed one-vs-rest approach has complexity that grows only linearly as the number of classes increases; therefore, our method is applicable to both online semi-automatic and offline automatic segmentation in clinical applications.
Keywords: semi-automatic segmentation, multi-class segmentation, conditional random field, graph cuts
1. Introduction
Many clinical applications utilizing medical images require careful and thorough labeling of anatomical structures or tissue types to design an optimal treatment plan, or to evaluate or predict treatment outcomes. Human oversight is legally and ethically necessary for decision-making in the highly regulated healthcare industry. Thus, segmentation by an expert is still the gold standard. However, manual segmentation is prone to human error and introduces variation both between physicians with different training backgrounds and within the same physician from time to time. This inconsistency may imperil successful patient treatment. Automatic or semi-automatic segmentation approaches can diminish human fatigue, accelerate workflow, and provide consistency.
Most commercial solutions employ atlas-based approaches1–5 for automatic normal-organ segmentation. Atlas-based methods assume that organ shape and location are similar across the population. Thus, once a spatial correspondence is established between an atlas (or multiple atlases) and a target image, the atlas' pre-segmented labels can be transferred to the target image. However, when the anatomy of a new case is very different from the selected atlases, the image registration between the target image and the atlases might not be sufficiently accurate to obtain a good segmentation.6 To address variations of organ shape across the population, one may introduce more atlases to increase coverage, but as the number of atlases increases, finding the proper atlases becomes more computationally expensive because more registrations are required.
With the advanced chip development of graphics processing units (GPUs), neural networks have taken a quantum leap in recent years in computer vision and natural language processing applications. Convolutional neural networks (CNNs)7 have shown remarkable performance in object recognition and have been extended to medical image segmentation8,9 with equally impressive capability. A network needs to go deep to achieve robustness, i.e., it includes many layers of neural nodes and convolution filters. As a result, it requires an enormous amount of training data to fit a massive number of network parameters. For real-world images, large-scale curated and labeled image datasets such as ImageNet10 have been made publicly available for training neural networks. While it is possible to acquire annotations of such real-world images via crowdsourcing, this is not feasible for medical images. Obtaining a large-scale medical image training dataset remains a challenge. First, well-curated annotations demand scarce medical expert time. Second, the training dataset is usually institution-specific; a dataset from one institution may not apply to another. Lastly, certain disease sites and certain image modalities may lack sufficient images.
We propose a novel and efficient one-vs-rest graph-based approach for multi-class and multi-modal medical image segmentation. We do not aim to compete with state-of-the-art fully automatic methods that require tremendous training effort. Instead, we examined the performance of the proposed graph-based segmentation method with limited on-line training and obtained results comparable to the state of the art.
2. Background of Graph-Based Segmentation
One approach to image segmentation is to treat the discrete image grid as a graph and utilize widely studied graph partition algorithms. We handle multi-class image segmentation as a minimum k-cut problem in graph theory: partition the image (graph) into k subgraphs (classes) such that the cost to separate the graph is minimized. The cost commonly includes a data term to penalize disagreement between the assigned label of a pixel and its observation, plus a smoothness term to penalize discontinuity where neighboring pixels have similar features. The cost is formulated as an energy function of a labeling f with k labels:
E(f) = \sum_{p \in P} D_p(f_p) + \sum_{(p,q) \in N} V_{p,q}(f_p, f_q),   (1)
where D_p and V_{p,q} are the unary data and pairwise smoothness terms, respectively, P is the set of image pixels, and N is the set of neighboring pixel pairs. The energy function resembles that of a conditional random field (CRF),11 which models the conditional probability of a labeling (segmentation) given the observation; the conditional probability is maximized when the energy is minimized. It is well known that the k-cut problem of finding the minimum of the energy function is NP-hard for k > 2, so it is computationally intractable. Boykov et al.12 demonstrated that the minimum of the energy function in Eq. (1) can be approximated via a sequence of two-class cuts, namely, the α-β swap and α-expansion algorithms for V being a semi-metric and a metric, respectively. Both algorithms apply s-t cuts iteratively to change the labels of pixels, approaching the energy minimum until convergence. At each iteration, the swap algorithm considers each pair of labels (α, β) to determine whether pixels assigned one label of the pair can be swapped to the other label (α-β move) using a graph cut, while the expansion algorithm iterates through each label α to determine whether that label can be expanded, that is, allowing any set of pixels to change their label to α (α-move). Although these algorithms are more efficient than traditional approximation methods such as simulated annealing, they are still very time consuming due to the iterative process and thus not suitable for on-line segmentation.
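To make the cost structure of Eq. (1) concrete, the following sketch evaluates the energy of a candidate labeling on a 2D grid with a 4-connected neighborhood and a Potts smoothness term. The array shapes and the `lam` weight are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def potts_energy(labels, unary, lam=1.0):
    """Evaluate E(f) for a 2D labeling with a 4-connected neighborhood:
    a data term D_p(f_p) read from a (k, h, w) cost array, plus a Potts
    smoothness term V(f_p, f_q) = lam * [f_p != f_q]."""
    h, w = labels.shape
    # Data term: cost of the label assigned at each pixel.
    data = sum(unary[labels[i, j], i, j] for i in range(h) for j in range(w))
    # Smoothness term over horizontal and vertical neighbors (each edge once).
    smooth = lam * (np.count_nonzero(labels[:, :-1] != labels[:, 1:])
                    + np.count_nonzero(labels[:-1, :] != labels[1:, :]))
    return data + smooth
```

With zero data costs, the energy reduces to λ times the number of label discontinuities, which is what the swap and expansion moves try to trade off against the data term.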
Recently, the graph structure of the CRF model has been extended to a fully connected graph [fully connected CRF (FCCRF)]13,14 to establish pairwise potentials on all pairs of pixels in the image. The resulting graph may have billions of edges, making conventional inference algorithms impractical. To solve the minimization of the CRF energy efficiently for inference, mean-field approximation was employed; nevertheless, the edge (boundary) potentials were limited to Gaussian kernels.
Our method, while maintaining the traditional CRF graph structure, makes no assumptions on the boundary potentials and addresses the inference problem using the concept of ensemble classification.15 We perform k one-vs-rest graph cuts, each minimizing the CRF energy of a one-vs-rest permutation, to obtain k segmentations. The final segmentation is decided by label fusion based on majority voting over the k segmentations. Our method increases the complexity of a single graph cut only by a factor of k. An overview of our method is shown in Fig. 1.
Fig. 1.
Overview of the proposed segmentation workflow. k one-vs-rest graphs are constructed with edge costs assigned from probabilities estimated by trained regional and boundary classifiers. An individual s-t cut is performed for each graph, and the segmentations from the k cuts are combined through a majority vote to obtain the final segmentation.
3. Methods
3.1. Conditional Random Fields
Let G = (V, E) be the graph representing a 2D image or a 3D volumetric image with n voxels. V = {v_1, ..., v_n} are nodes representing image voxels and E = {(v_i, v_j) : v_j ∈ N_i} are edges connecting neighboring nodes, where N_i is the set of neighbors of voxel v_i in a neighborhood system (a four-connected system in this study). We treat image segmentation as a classification problem of assigning a class label to each voxel. Let L = {l_1, ..., l_k} be the set of labels for the classification, the label assignment be a random variable X, and the observed image features be Y. The most plausible segmentation given an image is obtained when the conditional probability P(X | Y) is maximal. In a CRF, P(X | Y) factorizes according to G in the following form:
P(X = x | Y) ∝ exp( -∑_{v_i ∈ V} ψ_u(x_i | Y) - ∑_{(v_i, v_j) ∈ E} ψ_p(x_i, x_j | Y) ),   (2)
where ψ_u is a unary term describing how likely it is that node v_i is assigned label x_i given the observed image features, and ψ_p is a pairwise term for an edge (v_i, v_j) describing how likely it is that nodes v_i and v_j are assigned labels x_i and x_j, respectively. We define the two terms to reflect the probabilities estimated from a regional classifier R for classifying voxels into tissue types and a boundary classifier B for classifying an edge as a non-boundary or a boundary type. Let T be the set of boundary types. We define
ψ_u(x_i | Y) = -log P_R(x_i | Y)   (3)
and
ψ_p(x_i, x_j | Y) = -log P_B(t_{ij} | Y),   (4)
where t_{ij} ∈ T is the boundary type implied by the label pair (x_i, x_j).
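Reading the potentials as negative log-probabilities (an assumption consistent with the paper's description of purely probabilistic potentials), they can be computed as below. The `EPS` floor guards against zero probabilities and is an assumed placeholder, not a value from the paper.

```python
import numpy as np

EPS = 1e-6  # assumed floor for zero class/boundary probabilities

def unary_potential(p_regional):
    """psi_u = -log P_R(label | features), clipped away from zero."""
    return -np.log(np.clip(p_regional, EPS, 1.0))

def pairwise_potential(p_boundary):
    """psi_p = -log P_B(boundary type | features), clipped away from zero."""
    return -np.log(np.clip(p_boundary, EPS, 1.0))
```

A probability of 1 yields zero cost, and the cost grows without bound as the probability approaches the floor, so confident classifier outputs dominate the CRF energy.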
3.2. One-vs-Rest Cuts
The exponent in Eq. (2) can be minimized by graph cut16 via α-β swap or α-expansion. Since the multi-cut (k-cut) problem is NP-hard, those approximation methods are expensive to compute. We propose a one-vs-rest approach that combines the results of k s-t cuts. Since multiple labels can be assigned to a voxel after the k cuts, the final class label assignment is determined in a fashion similar to the label fusion strategy commonly seen in atlas-based segmentation. Let cut_c be an s-t cut that determines the voxels to be assigned to class l_c or to the rest of the classes, L \ {l_c}. The construction of the graph for cut_c is similar to that used by Boykov and Jolly17 for bi-class segmentation. For each voxel node, t-link edges are added to connect the source to the node and the node to the sink, and n-link edges are added to connect the node to its neighbors. For a node v_i and its neighbor v_j, let the costs of the edges being cut be w_{s,i} and w_{i,t} for the t-links (s, v_i) and (v_i, t) and w_{i,j} for the n-link (v_i, v_j). They are assigned as follows for an s-t cut that minimizes the CRF energy defined in Eq. (2): w_{i,t} is the unary potential of assigning v_i to l_c, w_{s,i} is the unary potential of assigning v_i to the rest, and w_{i,j} is the pairwise potential of a boundary across (v_i, v_j).
Once the edge costs are assigned, a regular minimum s-t cut is performed. After the cut, any voxel (node) that remains connected to the source is assigned the class label l_c for this cut; otherwise, it is assigned to the rest, L \ {l_c}. If an n-link with one endpoint on each side of the cut is severed, there is a boundary between l_c and an unknown label in L \ {l_c}.
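The construction and cut described above can be sketched end to end for one class. The capacity convention (t-links and n-links as negative log-probabilities) follows the reading above and is an assumption; a textbook Edmonds-Karp max-flow stands in for the fast C/C++ min-cut the paper uses, and all voxel names and probabilities are hypothetical.

```python
import math
from collections import defaultdict, deque

def _add(cap, u, v, w):
    """Add a directed edge u -> v with capacity w (plus a zero-capacity reverse)."""
    cap[u][v] = cap[u].get(v, 0.0) + w
    cap[v].setdefault(u, 0.0)

def _min_cut_source_side(cap, s, t):
    """Edmonds-Karp max-flow; returns the source side of a minimum s-t cut."""
    flow = defaultdict(float)
    while True:
        parent, q = {s: None}, deque([s])   # BFS for an augmenting path
        while q and t not in parent:
            u = q.popleft()
            for v in cap[u]:
                if v not in parent and cap[u][v] - flow[(u, v)] > 1e-12:
                    parent[v] = u
                    q.append(v)
        if t not in parent:                 # no augmenting path left:
            return set(parent)              # reachable nodes form the source side
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(cap[u][v] - flow[(u, v)] for u, v in path)
        for u, v in path:                   # augment along the path
            flow[(u, v)] += push
            flow[(v, u)] -= push

def one_vs_rest_cut(p_class, edges, p_boundary, eps=1e-6):
    """One s-t cut of the assumed construction for class c vs. rest.
    p_class[v]         : P_R(c | v) from the regional classifier
    edges              : neighboring voxel pairs (u, v)
    p_boundary[(u, v)] : P_B(boundary | u, v) from the boundary classifier
    Returns the voxels left on the source side, i.e., assigned to class c."""
    cap = defaultdict(dict)
    for v, p in p_class.items():
        _add(cap, 's', v, -math.log(max(1.0 - p, eps)))  # cut => v joins "rest"
        _add(cap, v, 't', -math.log(max(p, eps)))        # cut => v joins class c
    for u, v in edges:
        w = -math.log(max(p_boundary[(u, v)], eps))      # cost of a boundary on (u, v)
        _add(cap, u, v, w)
        _add(cap, v, u, w)
    return _min_cut_source_side(cap, 's', 't') - {'s'}
```

With a likely boundary between two voxels, the cut cheaply separates them; with an unlikely boundary, the expensive n-link keeps them on the same side.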
3.3. Majority Votes
After the k cuts, each voxel has k class assignments. The final class assignment, i.e., the segmentation, is determined by majority vote. Let a_c(v_i) be the class assignment for voxel v_i from the c'th cut; then we define a counting function

C(v_i, l) = ∑_{c=1}^{k} δ(a_c(v_i), l)   (5)

and a function

δ(a, l) = 1 if a = l, and 0 otherwise.   (6)

C(v_i, l) is the number of times that voxel v_i is assigned to label l. We then determine the final class assignment for voxel v_i using the following rules:
- • Majority rule: If C(v_i, l_c) > C(v_i, l) for all l ≠ l_c, assign voxel v_i to class l_c. If there is no majority, the tie breaker rule is applied.
- • Tie breaker rule: If there is a subset L' ⊆ L such that C(v_i, l) = max_{l''} C(v_i, l'') for every l ∈ L', where |L'| > 1, assign voxel v_i to the tied class preferred by the regional classifier. That is, assign v_i to argmax_{l ∈ L'} P_R(l | Y).
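The two fusion rules can be sketched as follows; the array layout for the votes and regional probabilities is an illustrative assumption.

```python
import numpy as np

def fuse_labels(votes, regional_probs):
    """Fuse k one-vs-rest assignments per voxel by majority vote, with ties
    broken by the regional classifier's probabilities.
    votes          : (k, n) array, votes[c, v] = label the c'th cut gives voxel v
                     (labels assumed to lie in 0..k-1)
    regional_probs : (n, k) array of P_R(label | voxel) for the tie breaker."""
    k, n = votes.shape
    final = np.empty(n, dtype=int)
    for v in range(n):
        counts = np.bincount(votes[:, v], minlength=k)   # C(v, l) of Eq. (5)
        winners = np.flatnonzero(counts == counts.max())
        if len(winners) == 1:
            final[v] = winners[0]                        # majority rule
        else:
            # tie breaker: among tied labels, take the regional classifier's pick
            final[v] = winners[np.argmax(regional_probs[v, winners])]
    return final
```

A voxel voted (1, 1, 0) across three cuts gets label 1 outright, while a three-way tie falls back to whichever tied label the regional classifier scores highest.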
To summarize the procedure of one-vs-rest cuts, the algorithm is listed in Algorithm 1.
Algorithm 1.
Algorithm for One-vs-Rest Cuts
Input: I (image), L (labels), R (regional classifier), B (boundary classifier)
Output: S (segmentation)
 1: for c = 1 to k do
 2:     initialize class assignment a_c
 3:     Construct graph:
 4:     G_c = (V, E) with source s and sink t
 5:     for v_i in V do
 6:         add edge (s, v_i) and assign edge cost estimated from R
 7:         add edge (v_i, t) and assign edge cost estimated from R
 8:     end for
 9:     for (v_i, v_j) in E do
10:         assign edge cost estimated from B
11:     end for
12:     perform s-t cut on G_c
13:     for v_i in V do
14:         if v_i remains connected to s then
15:             a_c(v_i) = l_c
16:         else
17:             a_c(v_i) = rest
18:         end if
19:     end for
20: end for
Majority votes:
21: for v_i in V do
22:     for l in L do
23:         calculate C(v_i, l) from a_1, ..., a_k using Eq. (5)
24:     end for
25:     L' = the set of labels with C(v_i, l) = max_l C(v_i, l)
26:     if |L'| = 1 then
27:         S(v_i) = the label in L'
28:     else
29:         S(v_i) = the label in L' predicted from R
30:     end if
31: end for
32: return S
4. Experiments and Results
4.1. Dataset
We evaluated our method on 20 high-grade (HG) MRI cases from the MICCAI Brain Multi-Modal Tumor Segmentation Challenge (BRATS 2013).18 Four MRI sequences, Flair, T1, T1 contrast (T1c), and T2, as well as ground truth, are provided in the dataset. The ground truths are consensus manual segmentations from experts and contain the following five classes of tissue type: 0, normal organs; 1, necrosis; 2, edema; 3, non-enhancing tumor; and 4, enhancing tumor. In this experiment, evaluation was done for whole tumor (1 + 2 + 3 + 4), active tumor (1 + 3 + 4), and edema (2), considering applications in clinical practice.
4.2. Training
To demonstrate the performance of our method with the limited training samples available during on-line segmentation, we performed case-specific training. For each case, we used three to five image slices for training: an axial slice in the middle of the tumor, an axial slice in the superior part of the tumor, an axial slice in the inferior part of the tumor, and, when necessary, up to two additional axial slices to ensure that training samples were available for all classes. Depending on the case, tumors appear on 32 to 98 image slices. Image intensities from all four MR sequences were the only image features used to train the regional and boundary classifiers that estimate the probabilistic terms in the CRF energy. Intensity values are raw data taken directly from the image files without any pre-processing. We chose random forests19 (RF) for both the regional and boundary classifiers in this study. Voxel and boundary samples of all five labels from the training slices were used to train the regional and boundary RF classifiers. It was possible, however, that samples from a boundary class were lacking, i.e., two tissue types may not be adjacent to each other at all. In this case, a very small epsilon value was assigned to the corresponding probability.
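The case-specific training step can be sketched with scikit-learn's random forest. The sample counts, the random feature values, and the number of boundary types here are hypothetical stand-ins for the intensities and labels drawn from the annotated training slices.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical stand-in for the training data: each regional sample is a
# voxel's raw intensities from the four MR sequences (Flair, T1, T1c, T2);
# each boundary sample concatenates the intensities of the two voxels
# across an edge. Counts and boundary-type cardinality are illustrative.
rng = np.random.default_rng(0)
X_region = rng.random((500, 4))        # (n_voxels, 4 sequences)
y_region = rng.integers(0, 5, 500)     # tissue labels 0-4
X_boundary = rng.random((500, 8))      # (n_edges, 2 x 4 sequences)
y_boundary = rng.integers(0, 11, 500)  # boundary types (count assumed)

regional_rf = RandomForestClassifier(n_estimators=100, random_state=0)
regional_rf.fit(X_region, y_region)
boundary_rf = RandomForestClassifier(n_estimators=100, random_state=0)
boundary_rf.fit(X_boundary, y_boundary)

# predict_proba supplies the class probabilities that feed the CRF potentials.
probs = regional_rf.predict_proba(X_region[:1])
```

With only a few annotated slices per case, a forest of this size trains in well under a second, which is what makes the interactive, case-specific workflow practical.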
4.3. Results
We performed voxel-wise classification using the regional RF classifiers to generate baseline segmentations for evaluation of our one-vs-rest method and the classic multi-class graph cut method, α-β swap. In addition, we compared our method to the recent state-of-the-art FCCRF. Our method and α-β swap shared the same RF regional and boundary classifiers for the CRF energy, whereas, for FCCRF, the RF regional classifier was used to establish the unary potentials, and Gaussian kernels based on the image intensities and positions of paired voxels were used to establish the boundary potentials for the mean-field approximation, as proposed by Krähenbühl and Koltun.13
We used the Dice similarity coefficient (DSC) as the measure of accuracy against the ground truth from the MICCAI dataset. The average scores of the different methods are shown in Table 1 and Fig. 2. Our one-vs-rest method performed best in all three tissue categories, whole tumor, active tumor, and edema, with mean DSCs of 0.83, 0.79, and 0.71, respectively. Our method was significantly better than all three other methods in the segmentation of whole tumor. It was also significantly better than RF and FCCRF in both active tumor and edema segmentation. For those two tasks, our method was not significantly better than α-β swap, owing to the larger variation of the Dice score for α-β swap; the Dice coefficient is known to be sensitive for small structures and tends to have larger variation. Figure 3 shows segmentation results of five example cases from the MICCAI 2013 HG cases. The baseline RF segmentation is the result of voxel-wise classification and therefore has many isolated mis-segmented voxels. FCCRF removed most of the mis-segmented voxels but did not preserve the piece-wise continuity of the segmentation. The α-β swap preserved continuity well, but in the process some fine details were lost. In contrast, the segmentation from our one-vs-rest method shows advantages in both continuity and fine detail. These observations were quantified as follows: we found the connected components in 3D (26-connected neighborhood) and calculated the percentage of components with volume less than 26 voxels (isolated components arising either from over-segmentation or from fine details). For whole tumor, the ground truth percentage is 92%, compared to 98% for RF, 98% for FCCRF, 67% for α-β swap, and 97% for our method (Fig. 4). These numbers confirm the significant under-segmentation by α-β swap.
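Both evaluation measures used above can be implemented compactly; this is a sketch of the standard definitions, with the 26-voxel size threshold taken from the text.

```python
import numpy as np
from scipy import ndimage

def dice(seg, gt):
    """Dice similarity coefficient between two binary masks."""
    inter = np.logical_and(seg, gt).sum()
    return 2.0 * inter / (seg.sum() + gt.sum())

def isolated_fraction(mask, min_voxels=26):
    """Percentage of 3D connected components (26-connected neighborhood)
    smaller than min_voxels, used to quantify isolated components."""
    structure = np.ones((3, 3, 3), dtype=int)  # 26-connectivity
    labeled, n = ndimage.label(mask, structure=structure)
    if n == 0:
        return 0.0
    sizes = np.bincount(labeled.ravel())[1:]   # voxel count per component
    return 100.0 * np.count_nonzero(sizes < min_voxels) / n
```

A low isolated fraction relative to the ground truth indicates merged or missing components (under-segmentation), while an inflated fraction indicates scattered mis-classified voxels.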
Table 1.
Comparison of DSC of segmentations from our one-vs-rest method and other methods.
| Method | Whole tumor | | Active tumor | | Edema | |
| | Mean | Std. | Mean | Std. | Mean | Std. |
| RF | 0.67 | 0.14 | 0.65 | 0.19 | 0.56 | 0.15 |
| FCCRF | 0.79 | 0.10 | 0.74 | 0.15 | 0.68 | 0.14 |
| α-β swap | 0.80 | 0.18 | 0.77 | 0.21 | 0.69 | 0.22 |
| One-vs-rest | 0.83* | 0.11 | 0.79** | 0.14 | 0.71** | 0.16 |
Note: Wilcoxon signed rank test.
**Significantly better, compared to RF and FCCRF.
*Significantly better, compared to all other methods.
Fig. 2.
Dice scores. Our one-vs-rest method has the highest mean in all three tissue categories: whole tumor (0.83), active tumor (0.79), and edema (0.71).
Fig. 3.
Segmentation results from five HG cases. Each row is an example slice from a different case. (a) T2 weighted MRI image; (b) results using only the regional classifier; (c) results from FCCRF; (d) results from swap; (e) results of our proposed one-vs-rest method; and (f) the ground truth.
Fig. 4.
Comparison of the percentage of isolated components in whole tumor segmentation. Our one-vs-rest method is closest to the ground truth, while α-β swap shows significant under-segmentation.
4.4. Performance
The algorithms in this study were implemented in Python, with a runtime library implemented in C/C++ for a fast version of the min cut17 used in both α-β swap and our one-vs-rest method. On average, the segmentation time per slice was 82 s for α-β swap and 18 s for one-vs-rest on a Windows® 7 workstation equipped with an Intel® Xeon® E2-2620 2-GHz processor.
5. Conclusion
We present a novel one-vs-rest graph cut algorithm under the CRF framework for multi-class image segmentation. Our algorithm performs multiple minimum s-t cuts to minimize the CRF energy, obtaining segmentations for the one-vs-rest class permutations. The final segmentation is fused from the one-vs-rest segmentations via majority vote. Unlike the previous graph-based α-β swap approach, the complexity of our method grows only linearly with the number of classes, in addition to the complexity of the chosen s-t cut algorithm. Our CRF unary (regional) and pairwise (boundary) potentials are purely probabilistic and can be estimated from any chosen and trained classification model. We demonstrated that our algorithm is suitable for semi-automatic segmentation with lightweight, case-specific training, which can be done interactively. In our experiment with the MICCAI brain tumor dataset, our method yielded the highest Dice scores in segmentation of whole tumor, active tumor, and edema.
Future work will investigate other image features to further improve segmentation results, as well as the feasibility of using this method as a post-editing tool for deep learning-based segmentation.
Acknowledgments
This research was funded in part through the NIH/NCI Cancer Center Support Grant P30 CA008748. We would like to thank the reviewers, whose valuable comments helped improve and clarify this paper.
Biographies
Yu-chi Hu is a software specialist at Memorial Sloan Kettering Cancer Center. His research interests are focused on developing and incorporating machine learning techniques in image segmentation, image registration, and longitudinal analysis for applications in radiotherapy.
Gikas Mageras is an emeritus medical physicist at Memorial Sloan Kettering Cancer Center. His research interests are in detection and management of patient internal motion by means of image-guided radiation therapy. His current research is in the application of deep learning to deformable CT-to-cone-beam-CT registration for localizing organs-at-risk during radiation treatment of pancreatic cancer.
Michael Grossberg is an associate professor of computer science at City College of New York. His research spans data science, computer science, statistics, and mathematics. This has focused on data visualization for big and high dimensional data, using tools from statistical/machine learning, and using web technology for exploration. Recent domains have included remote sensing, climate science, medical imaging, computer vision, and finance. He is also codirector of the CCNY Master’s Program in Data Science and Engineering.
Disclosures
No conflicts of interest, financial, or otherwise, are declared by the authors.
Contributor Information
Yu-chi Hu, Email: huj@mskcc.org.
Gikas Mageras, Email: magerasg@mskcc.org.
Michael Grossberg, Email: grossberg@cs.ccny.cuny.edu.
References
1. Rohlfing T., et al., "Evaluation of atlas selection strategies for atlas-based image segmentation with application to confocal microscopy images of bee brains," NeuroImage 21, 1428–1442 (2004). 10.1016/j.neuroimage.2003.11.010
2. Heckemann R. A., et al., "Automatic anatomical brain MRI segmentation combining label propagation and decision fusion," NeuroImage 33, 115–126 (2006). 10.1016/j.neuroimage.2006.05.061
3. Klein A., et al., "Mindboggle: automated brain labeling with multiple atlases," BMC Med. Imaging 5, 7 (2005). 10.1186/1471-2342-5-7
4. Langerak T. R., et al., "Multiatlas-based segmentation with preregistration atlas selection," Med. Phys. 40(9), 091701 (2013). 10.1118/1.4816654
5. Sjöberg C., Ahnesjö A., "Multi-atlas based segmentation using probabilistic label fusion with adaptive weighting of image similarity measures," Comput. Methods Programs Biomed. 110(3), 308–319 (2013). 10.1016/j.cmpb.2012.12.006
6. Sjöberg C., Johansson S., Ahnesjö A., "How much will linked deformable registrations decrease the quality of multi-atlas segmentation fusions?" Radiat. Oncol. 9(1), 251 (2014). 10.1186/s13014-014-0251-1
7. Krizhevsky A., Sutskever I., Hinton G. E., "ImageNet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems, Pereira F., et al., Eds., Vol. 25, pp. 1097–1105, Curran Associates, Inc., Red Hook, New York (2012).
8. Ronneberger O., Fischer P., Brox T., "U-Net: convolutional networks for biomedical image segmentation," Lect. Notes Comput. Sci. 9351, 234–241 (2015). 10.1007/978-3-319-24574-4_28
9. Milletari F., Navab N., Ahmadi S., "V-Net: fully convolutional neural networks for volumetric medical image segmentation," in Fourth Int. Conf. 3D Vision, pp. 565–571 (2016). 10.1109/3DV.2016.79
10. Russakovsky O., et al., "ImageNet large scale visual recognition challenge," Int. J. Comput. Vision 115(3), 211–252 (2015). 10.1007/s11263-015-0816-y
11. Lafferty J. D., McCallum A., Pereira F. C. N., "Conditional random fields: probabilistic models for segmenting and labeling sequence data," in Proc. Eighteenth Int. Conf. Mach. Learn., pp. 282–289, Morgan Kaufmann Publishers Inc., San Francisco, CA (2001).
12. Boykov Y., Veksler O., Zabih R., "Fast approximate energy minimization via graph cuts," IEEE Trans. Pattern Anal. Mach. Intell. 23(11), 1222–1239 (2001). 10.1109/34.969114
13. Krähenbühl P., Koltun V., "Efficient inference in fully connected CRFs with Gaussian edge potentials," in Advances in Neural Information Processing Systems, Shawe-Taylor J., et al., Eds., Vol. 24, pp. 109–117, Curran Associates, Inc., Red Hook, New York (2011).
14. Dou Q., et al., "3D deeply supervised network for automatic liver segmentation from CT volumes," Lect. Notes Comput. Sci. 9901, 149–157 (2016). 10.1007/978-3-319-46723-8_18
15. Dietterich T. G., "Ensemble methods in machine learning," Lect. Notes Comput. Sci. 1857, 1–15 (2000). 10.1007/3-540-45014-9_1
16. Kolmogorov V., Zabih R., "What energy functions can be minimized via graph cuts?" IEEE Trans. Pattern Anal. Mach. Intell. 26(2), 147–159 (2004). 10.1109/TPAMI.2004.1262177
17. Boykov Y., Jolly M. P., "Interactive graph cuts for optimal boundary and region segmentation of objects in N-D images," in Proc. Int. Conf. Comput. Vision, pp. 105–112 (2001). 10.1109/ICCV.2001.937505
18. "The multimodal brain tumor image segmentation benchmark (BRATS)," 2013, https://www.smir.ch/BRATS/Start2013.
19. Breiman L., "Random forests," Mach. Learn. 45, 5–32 (2001). 10.1023/A:1010933404324