Author manuscript; available in PMC: 2022 Aug 18.
Published in final edited form as: Proc IEEE Int Symp Biomed Imaging. 2022 Apr 26;2022:10.1109/isbi52829.2022.9761629. doi: 10.1109/isbi52829.2022.9761629

SELF-SEMANTIC CONTOUR ADAPTATION FOR CROSS MODALITY BRAIN TUMOR SEGMENTATION

Xiaofeng Liu 1, Fangxu Xing 1, Georges El Fakhri 1, Jonghye Woo 1
PMCID: PMC9387767  NIHMSID: NIHMS1779009  PMID: 35990931

Abstract

Unsupervised domain adaptation (UDA) between two significantly disparate domains to learn high-level semantic alignment is a crucial yet challenging task. To this end, in this work, we propose exploiting low-level edge information to facilitate the adaptation as a precursor task, which has a small cross-domain gap, compared with semantic segmentation. The precise contour then provides spatial information to guide the semantic adaptation. More specifically, we propose a multi-task framework to learn a contouring adaptation network along with a semantic segmentation adaptation network, which takes both magnetic resonance imaging (MRI) slice and its initial edge map as input. These two networks are jointly trained with source domain labels, and the feature and edge map level adversarial learning is carried out for cross-domain alignment. In addition, self-entropy minimization is incorporated to further enhance segmentation performance. We evaluated our framework on the BraTS2018 database for cross-modality segmentation of brain tumors, showing the validity and superiority of our approach, compared with competing methods.

Keywords: Unsupervised Domain Adaptation, Medical Image Segmentation, MR Imaging Modalities

1. INTRODUCTION

Accurate tumor segmentation plays a vital role in early diagnosis and surgical planning. Over the past several years, advances in deep learning have substantially improved segmentation performance compared with prior approaches. In particular, deep learning-based semantic segmentation of brain tumors using magnetic resonance imaging (MRI) data, the goal of which is pixel-wise semantic classification, has been actively developed [1, 2, 3]. For example, each voxel in an MRI volume of the brain can be categorized as enhancing tumor (EnhT), peritumoral edema (ED), necrotic and non-enhancing tumor core (CoreT), or healthy tissue [4]. To date, many deep learning-based approaches have relied on the assumption that training and testing data are independent and identically distributed (i.i.d.) [5]. Yet, this assumption is often violated when multiple acquisition parameters or imaging modalities are involved, resulting in significant performance degradation [1, 2, 3].

Cross-modality unsupervised domain adaptation (UDA) has therefore been proposed to mitigate this performance degradation when applying a network trained on one MRI modality (e.g., T2-weighted MRI) to another (e.g., T1-weighted MRI) [6, 7]. Cross-modality UDA aims to transfer knowledge learned from a labeled source modality to an unlabeled target modality, thereby addressing the difficulty of labeling in the target modality [8, 9, 10, 11]. Adversarial training approaches [12] are a typical solution to this problem; they enforce features extracted from the source and target domains to be indistinguishable. Simply aligning the large semantic feature discrepancy, however, may result in negative transfer, thereby degrading the performance in both domains [5]. As such, accurate alignment of a large domain gap in a cross-modality setting remains a long-standing challenge [7].

To tackle the aforementioned issues, in this work, we propose to leverage edge delineation that is closely related to semantic segmentation, while being relatively “easier” to adapt [1]. Actually, semantic segmentation aims to achieve not only an edge delineation, but also a semantic categorical label assignment [1, 2]. In addition, there is considerable evidence supporting that the low-level edge features in the shallow layers of convolutional neural networks (CNN) are highly transferable, and have a smaller domain gap than semantic segmentation adaptation [13]. Accordingly, the adaptation for the edge delineation can serve as a precursor task, and provide relatively precise contours for target domain samples. Since there is a strong correlation between the semantic edge delineation and semantic segmentation in that the edges are the boundaries of the semantic classes, we are able to facilitate the subsequent segmentation with the edge map. We note that we are only interested in the semantic contours of the to-be segmented regions, instead of all texture edges.

To the best of our knowledge, this is the first attempt at facilitating cross-modality UDA for segmentation with the adaptation of self-semantic edge delineation. Specifically, we propose a novel cascaded multi-task framework comprising a semantic contour delineation network followed by an edge map conditioned semantic segmentation network. The adaptation of the semantic contour delineation is achieved by an edge map adversarial alignment along with supervised delineation in the source domain. We note that the edge label in the source domain can be directly generated from its segmentation label with the Canny edge detector [14] in a self-supervised manner, so no additional labeling effort is needed. At the testing stage, the segmentation network takes the concatenated MRI slice and its edge map as input for inference. We follow the conventional adversarial segmentation UDA to enforce the feature level alignment. In addition, our framework is orthogonal to unsupervised learning approaches; therefore, the widely used self-entropy minimization can be simply added on top of our framework. We validate our framework on a brain tumor segmentation task using multiple MRI modalities to demonstrate its effectiveness.

2. METHODOLOGY

2.1. Conventional Adversarial UDA for Segmentation

In cross-modality segmentation UDA, we have a labeled source domain with samples $\{x_s, y_s\}$ and an unlabeled target domain with samples $\{x_t\}$, where the label $y_s$ is the pixel-wise segmentation map. We aim to learn a mapping $f : x \to y$ that performs well in both the source and target domains, using the training data $\{x_s, y_s\}$ and $\{x_t\}$.

A typical solution would be to train an autoencoder-based segmentation network with $\{x_s, y_s\}$, following a supervised learning scheme, and simultaneously to train a feature-level discriminator, following an adversarial game [15]. Specifically, we use a cross-entropy loss in each pixel for source domain supervised learning, given by

$$\mathcal{L}_{CE}^{y} = -\sum_{i}^{C} y_{is} \log \tilde{y}_{is}, \tag{1}$$

where $\tilde{y}_s$ is the prediction of the segmentation network, and $C$ indicates the number of semantic categories.
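The pixel-wise cross-entropy of Eq. (1) can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the function name `pixelwise_cross_entropy`, the `(C, H, W)` array layout, the small `eps` for numerical stability, and the averaging over pixels are all assumptions made for the example.

```python
import numpy as np

def pixelwise_cross_entropy(probs, labels, eps=1e-12):
    """Mean over pixels of -sum_i y_i * log(y_tilde_i), cf. Eq. (1).

    probs  : float array (C, H, W), softmax output per pixel (assumed layout)
    labels : int array (H, W), class index per pixel
    """
    num_classes = probs.shape[0]
    one_hot = np.eye(num_classes)[labels]      # (H, W, C) one-hot labels
    one_hot = np.moveaxis(one_hot, -1, 0)      # (C, H, W) to match probs
    per_pixel = -(one_hot * np.log(probs + eps)).sum(axis=0)
    return per_pixel.mean()

# Toy example: 2 classes on a 1x2 image.
probs = np.array([[[0.9, 0.2]],
                  [[0.1, 0.8]]])
labels = np.array([[0, 1]])
loss = pixelwise_cross_entropy(probs, labels)
```

In this toy case the loss is the average of −log 0.9 and −log 0.8, since only the probability assigned to the labeled class contributes at each pixel.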

The discriminator takes the latent representation, i.e., the output features of the encoder $Enc(x_s)$ or $Enc(x_t)$, as input, and classifies whether they come from the source or target domain. The encoder tries to confuse the discriminator by making $Enc(x_s)$ and $Enc(x_t)$ indistinguishable. Then, both $Enc(x_s)$ and $Enc(x_t)$ can share the same decoder for segmentation. The to-be minimized binary cross-entropy loss of the feature-level discriminator can be formulated as:

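$$\begin{aligned}
\mathcal{L}_{Dis}^{f} = {} & \mathbb{E}_{x_s} \log \mathrm{Dis}\big(\mathrm{Enc}(x_s)\big) && (2) \\
& + \mathbb{E}_{x_t} \log\big(1 - \mathrm{Dis}(\mathrm{Enc}(x_t))\big). && (3)
\end{aligned}$$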

Here, the encoder tries to maximize $\mathcal{L}_{Dis}^{f}$, i.e., to minimize $-\mathcal{L}_{Dis}^{f}$ in implementation [16].
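The adversarial game of Eqs. (2)-(3) can be illustrated with a small NumPy sketch. The signs and parentheses below follow our reading of the extracted equations (a sum of an expected log term on source features and an expected log(1 − ·) term on target features) and should be treated as an assumption; the function names and the clipping constant `eps` are likewise hypothetical.

```python
import numpy as np

def discriminator_objective(d_src, d_tgt, eps=1e-12):
    """E[log Dis(src)] + E[log(1 - Dis(tgt))], as we read Eqs. (2)-(3).

    d_src, d_tgt : discriminator outputs in (0, 1) for source and target
                   encoder features, respectively.
    """
    d_src = np.clip(d_src, eps, 1.0 - eps)
    d_tgt = np.clip(d_tgt, eps, 1.0 - eps)
    return np.log(d_src).mean() + np.log(1.0 - d_tgt).mean()

def encoder_adversarial_term(d_src, d_tgt):
    """The encoder maximizes the objective, i.e., minimizes its negative."""
    return -discriminator_objective(d_src, d_tgt)

# A discriminator that cannot tell the domains apart vs. one that can.
undecided = discriminator_objective(np.array([0.5, 0.5]), np.array([0.5, 0.5]))
separated = discriminator_objective(np.array([0.1, 0.1]), np.array([0.9, 0.9]))
```

Under this sign convention, a discriminator that separates the two domains attains a lower value than one that outputs 0.5 everywhere, while the encoder's term moves in the opposite direction, which is what makes the two objectives adversarial.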

2.2. Semantic Contour Guided Segmentation UDA

Our proposed work builds upon the adversarial UDA approach in the following way. Instead of directly aligning the semantic segmentation features that have a large domain gap, we propose to achieve semantic contour delineation adaptation as a preparatory task. Since the low-level edge inherits a smaller domain gap, the delineation task itself can serve as a precursor module for the semantic segmentation [1, 2]. The overall framework is shown in Fig. 1.

Fig. 1.

Illustration of our proposed multi-task adaptation framework, comprising a consecutive semantic edge delineation and an edge conditioned segmentation module. The edge map and feature level discriminators are applied to edge and segmentation adaptation, respectively. We note that only the light blue masked modules are used in testing.

Specifically, we have an independent semantic contour network to predict the edge map of both source and target domain samples. For source domain samples, the ground truth semantic edge label $e_s$ can be simply generated from their segmentation label with the Canny edge detector [14]. The edge label is binary, and the output layer uses a sigmoid unit. The corresponding CE loss in each pixel for the edge delineation can be expressed as:

$$\mathcal{L}_{CE}^{e} = -e_s \log \tilde{e}_s - (1 - e_s) \log(1 - \tilde{e}_s). \tag{4}$$
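To make the self-supervised edge labeling and the binary cross-entropy of Eq. (4) concrete, the following NumPy sketch derives a binary edge label from a segmentation mask and evaluates the loss. Note that it does not use the Canny detector [14]: as a dependency-free stand-in, it marks every pixel whose label differs from a 4-connected neighbor, which yields a boundary map that is two pixels thick. The function names and the toy mask are hypothetical.

```python
import numpy as np

def boundary_from_labels(seg):
    """Binary edge map: 1 where a pixel's label differs from a 4-neighbour.

    A simple stand-in for the Canny detector used in the paper; pixels on
    both sides of a label boundary are marked.
    """
    seg = np.asarray(seg)
    edge = np.zeros(seg.shape, dtype=bool)
    diff_v = seg[1:, :] != seg[:-1, :]   # vertical neighbours differ
    diff_h = seg[:, 1:] != seg[:, :-1]   # horizontal neighbours differ
    edge[1:, :] |= diff_v
    edge[:-1, :] |= diff_v
    edge[:, 1:] |= diff_h
    edge[:, :-1] |= diff_h
    return edge.astype(np.float64)

def edge_cross_entropy(e_true, e_pred, eps=1e-12):
    """Mean over pixels of -e*log(e_pred) - (1-e)*log(1-e_pred), cf. Eq. (4)."""
    e_pred = np.clip(e_pred, eps, 1.0 - eps)
    per_pixel = -e_true * np.log(e_pred) - (1.0 - e_true) * np.log(1.0 - e_pred)
    return per_pixel.mean()

# Toy segmentation label: a 3x3 "tumor" inside a 5x5 slice.
seg = np.zeros((5, 5), dtype=int)
seg[1:4, 1:4] = 1
edge_label = boundary_from_labels(seg)
uninformative = edge_cross_entropy(edge_label, np.full((5, 5), 0.5))
```

The edge label is obtained from the segmentation label alone, so no extra annotation is involved; a sigmoid output of 0.5 everywhere gives a loss of log 2, the value for an uninformative prediction.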

To achieve the adaptation of the semantic contour delineation, we simply adopt edge map level adversarial training. The discriminator takes the predicted edge maps $\tilde{e}_s$ and $\tilde{e}_t$ as input and classifies their domains. The cross-entropy loss of the edge map discriminator can be formulated as:

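$$\mathcal{L}_{Dis}^{e} = \mathbb{E}_{\tilde{e}_s} \log \mathrm{Dis}(\tilde{e}_s) + \mathbb{E}_{\tilde{e}_t} \log\big(1 - \mathrm{Dis}(\tilde{e}_t)\big). \tag{5}$$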

The discriminator is trained by minimizing $\mathcal{L}_{Dis}^{e}$, while the semantic contour network tries to maximize it.

Next, we concatenate the semantic edge map with the original MRI slice, and the result is input to the segmentation network. We note that the semantic edge map provides location information, which is consistent with the segmentation boundaries. Accordingly, our latent space encodings can be expressed as $Enc(x_s, \tilde{e}_s)$ and $Enc(x_t, \tilde{e}_t)$. Therefore, the adversarial loss in Eq. (2) can be reformulated as:

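$$\begin{aligned}
\mathcal{L}_{Dis}^{f} = {} & \mathbb{E}_{x_s, \tilde{e}_s} \log \mathrm{Dis}\big(\mathrm{Enc}(x_s, \tilde{e}_s)\big) && (6) \\
& + \mathbb{E}_{x_t, \tilde{e}_t} \log\big(1 - \mathrm{Dis}(\mathrm{Enc}(x_t, \tilde{e}_t))\big). && (7)
\end{aligned}$$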

Learning high-level semantic alignment in the unlabeled target modality can be seen as an unsupervised learning problem. Therefore, unsupervised learning approaches can be applied to further improve the performance [17, 18]. To this end, we adopt self-entropy minimization, which can be easily incorporated into our framework. Self-entropy minimization encourages a confident output, i.e., a high maximum softmax value [19, 20]. For the segmentation task, the self-entropy minimization can be formulated as the pixel-wise averaged entropy of the last layer’s softmax prediction:

$$\mathcal{L}_{Ent} = -\frac{1}{N} \sum_{n}^{N} \sigma_n \log \sigma_n, \tag{8}$$

where $N$ indicates the number of pixels of the segmentation map, and $\sigma_n$ is the histogram distribution of the softmax output of the $n$-th pixel. By minimizing the pixel-wise averaged entropy, the outputs are encouraged to approach a one-hot distribution.
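The pixel-wise averaged entropy of Eq. (8) can be sketched as below. We assume the softmax output is stored as a `(C, H, W)` array and that the product of the distribution with its logarithm is summed over the class axis; the function name `self_entropy` and the constant `eps` are illustrative choices.

```python
import numpy as np

def self_entropy(probs, eps=1e-12):
    """Pixel-wise averaged entropy of a softmax map, cf. Eq. (8).

    probs : float array (C, H, W); each pixel's C values sum to 1.
    """
    per_pixel = -(probs * np.log(probs + eps)).sum(axis=0)  # entropy per pixel
    return per_pixel.mean()                                  # average over N pixels

# Two extreme cases with C = 4 classes on a 2x2 map.
uniform = np.full((4, 2, 2), 0.25)     # maximally uncertain prediction
one_hot = np.zeros((4, 2, 2))
one_hot[0] = 1.0                       # fully confident prediction
```

The uniform prediction attains the maximum entropy log 4, whereas the one-hot prediction gives an entropy of essentially zero, which is the direction in which the minimization pushes the target domain outputs.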

In summary, we jointly minimize the following objectives for different modules:

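Semantic contour net: $\mathcal{L}_{CE}^{e} - \alpha \mathcal{L}_{Dis}^{e}$, (9)
Edge discriminator: $\mathcal{L}_{Dis}^{e}$, (10)
Encoder: $\mathcal{L}_{CE}^{y} - \beta \mathcal{L}_{Dis}^{f} + \lambda \mathcal{L}_{Ent}$, (11)
Decoder: $\mathcal{L}_{CE}^{y} + \lambda \mathcal{L}_{Ent}$, (12)
Segmentation feature discriminator: $\mathcal{L}_{Dis}^{f}$, (13)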

where $\alpha$, $\beta$, and $\lambda$ are the weighting parameters.
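The per-module objectives of Eqs. (9)-(13) can be summarized in a few lines of Python. This is only a bookkeeping sketch: the function name, the dictionary keys, and the numeric values in the example are hypothetical, and the negative signs on the discriminator terms reflect our reading that the contour network and the encoder maximize the corresponding discriminator losses.

```python
def module_objectives(l_ce_y, l_ce_e, l_dis_f, l_dis_e, l_ent, alpha, beta, lam):
    """Return the scalar objective minimized by each module, cf. Eqs. (9)-(13)."""
    return {
        "semantic_contour_net": l_ce_e - alpha * l_dis_e,            # Eq. (9)
        "edge_discriminator": l_dis_e,                               # Eq. (10)
        "encoder": l_ce_y - beta * l_dis_f + lam * l_ent,            # Eq. (11)
        "decoder": l_ce_y + lam * l_ent,                             # Eq. (12)
        "feature_discriminator": l_dis_f,                            # Eq. (13)
    }

# Hypothetical loss values and weights, chosen only for illustration.
objectives = module_objectives(
    l_ce_y=1.0, l_ce_e=0.5, l_dis_f=0.2, l_dis_e=0.4, l_ent=0.1,
    alpha=0.5, beta=0.5, lam=2.0,
)
```

Keeping one objective per module makes explicit that each discriminator is updated only with its own loss, while the decoder receives no adversarial term.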

At the testing stage, we first generate the semantic edge map of the input MRI slice with our semantic contour network. The segmentation network then makes the inference based on the concatenated edge map and its original MRI slice. Of note, we do not need the discriminators in testing.
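The two-step inference described above can be sketched as follows. The callables `contour_net` and `seg_net` are hypothetical placeholders for the trained networks, and stacking the slice and the edge map along a leading channel axis is an assumed input layout; the toy stand-ins at the end exist only so that the example runs.

```python
import numpy as np

def infer(mri_slice, contour_net, seg_net):
    """Test-time pipeline: predict edges, concatenate, then segment.

    mri_slice   : float array (H, W)
    contour_net : callable mapping (H, W) -> (H, W) edge probabilities
    seg_net     : callable mapping (2, H, W) -> (C, H, W) class probabilities
    """
    edge_map = contour_net(mri_slice)
    seg_input = np.stack([mri_slice, edge_map], axis=0)  # 2-channel input
    class_probs = seg_net(seg_input)
    return class_probs.argmax(axis=0)                    # label per pixel

# Toy stand-ins for the trained networks (not real models).
toy_contour_net = lambda x: (x > 0).astype(np.float64)
toy_seg_net = lambda inp: np.stack([1.0 - inp[1], inp[1]], axis=0)

slice_2d = np.array([[0.0, 1.0],
                     [2.0, 0.0]])
prediction = infer(slice_2d, toy_contour_net, toy_seg_net)
```

No discriminator appears in this path, in line with the statement that the discriminators are only needed during training.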

3. EXPERIMENTS AND RESULTS

3.1. Data and Evaluation Protocol

We evaluated our framework on the publicly available multi-modality BraTS2018 database [4]. The database contains a total of 285 subjects, in which 210 subjects have high-grade gliomas, and the remaining 75 subjects have low-grade gliomas. Each subject has four modalities that are aligned, including T1-weighted (T1), T1-contrast enhanced (T1ce), T2-weighted (T2), and T2 Fluid Attenuated Inversion Recovery (FLAIR) MRI. The label has three classes for each pixel, including EnhT, ED, and CoreT [4]. In addition, the sum of EnhT, ED, and CoreT represents the whole tumor [3].

To demonstrate the effectiveness of our framework, we followed the standard cross-modality UDA setting, which selects 80% of the subjects for training and 20% for testing [6, 7]. We used T2-weighted MRI as the labeled source domain, and FLAIR, T1-weighted, and T1CE MRI as the unlabeled target domains [6, 7]. We normalized the image intensity to [−1, 1] and applied cropping and rotation for data augmentation as in [6, 7]. We note that the data were used in an unpaired manner [6], and we did not need additional edge labels.
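As an illustration of the intensity normalization to [−1, 1], a per-image min-max rescaling could look like the sketch below. The text does not state how the normalization was computed, so the min-max scheme, the function name, and the handling of constant images are assumptions.

```python
import numpy as np

def normalize_to_unit_range(image):
    """Linearly rescale intensities so that min -> -1 and max -> 1."""
    image = np.asarray(image, dtype=np.float64)
    lo, hi = image.min(), image.max()
    if hi == lo:
        return np.zeros_like(image)   # constant image: map to the midpoint
    return 2.0 * (image - lo) / (hi - lo) - 1.0

normalized = normalize_to_unit_range([0.0, 50.0, 100.0])
```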

We adopted the Dice score and the Hausdorff distance for quantification as in [6]. The Dice score measures the overlap between ground truth labels and predictions; a larger value indicates better performance. In contrast, the Hausdorff distance is defined between two sets of points in a metric space; a lower value indicates better performance.
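Both metrics can be computed on binary masks with a few lines of NumPy, as in the sketch below. The brute-force symmetric Hausdorff distance shown here uses the maximum over all foreground pixels on 2D masks; whether the reported numbers use this exact variant, a percentile variant, or 3D volumes is not specified in the text, so these choices, as well as the function names and the `spacing` parameter, are assumptions.

```python
import numpy as np

def dice_score(pred, target):
    """Dice = 2|A ∩ B| / (|A| + |B|) for binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        return 1.0   # both masks empty: treated as perfect agreement
    return 2.0 * np.logical_and(pred, target).sum() / denom

def hausdorff_distance(pred, target, spacing=1.0):
    """Symmetric Hausdorff distance between foreground pixel sets (brute force)."""
    p = np.argwhere(np.asarray(pred, dtype=bool)) * spacing
    t = np.argwhere(np.asarray(target, dtype=bool)) * spacing
    if len(p) == 0 or len(t) == 0:
        raise ValueError("both masks must contain at least one foreground pixel")
    d = np.linalg.norm(p[:, None, :] - t[None, :, :], axis=-1)  # pairwise distances
    return max(d.min(axis=1).max(), d.min(axis=0).max())

# Two 2x2 squares shifted by one pixel on a 4x4 grid.
pred = np.zeros((4, 4), dtype=int)
pred[0:2, 0:2] = 1
target = np.zeros((4, 4), dtype=int)
target[0:2, 1:3] = 1
dice = dice_score(pred, target)
hd = hausdorff_distance(pred, target)
```

In this example the masks share two of their four pixels, giving a Dice score of 0.5, and every foreground pixel of one mask lies within one pixel of the other, giving a Hausdorff distance of 1.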

3.2. Network and Training Details

In order to demonstrate the generality of our framework, we adopted two backbones for our segmentation network as in [6, 7]. In [6], the encoder and decoder based segmentation network consists of the residual blocks and dilated residual blocks, respectively. The segmentation network used in [7] is based on the more powerful Deeplab-ResNet50. We followed [23] to use a fully convolutional encoder-decoder network for our semantic contour network, and only revised the input layer to a single channel for our MRI data. Our segmentation feature discriminator is composed of 3 convolutional and 2 fully connected layers. The edge map discriminator has 4 convolutional and 2 fully connected layers.

We implemented our framework using the PyTorch toolbox. Training was performed on a server with an NVIDIA V100 GPU and took about 5 hours. In testing, the inference took only 0.1s. The learning rates were set to 1e−3 and 1e−4 for the segmentation and semantic contour networks and for the discriminators, respectively. We used a momentum of 0.5 consistently.

3.3. Qualitative Evaluations

Fig. 2 shows a visualization of the segmentation maps of the samples from different target domains. We used the segmentation network as in DSFN [6] for a fair comparison. Considering the large appearance discrepancies between different MRI modalities, the performance dropped significantly. Even with the adaptation via DSFN [6], the results were significantly different from the ground truth as visually assessed, potentially leading to severe misdiagnosis. By contrast, we can see that our proposed framework yielded superior segmentation results, compared with DSFN [6], where our framework is based on a relatively simple dual-scheme fusion network. Therefore, our self-semantic contour network is considered a simple yet efficient add-on module to guide the challenging semantic segmentation adaptation. In addition, the self-entropy minimization was able to further boost the performance in an add-on fashion.

Fig. 2.

Comparison with a UDA method, DSFN [6], and an ablation study of the self-entropy minimization for adapting T2-weighted MRI to FLAIR/T1/T1CE MRI.

3.4. Quantitative Evaluations

Tables 1 and 2 show the quantitative evaluation results with the segmentation network backbones of DSFN [6] and DSA [7], respectively. Our proposed framework yielded superior performance compared with the competing methods using the same segmentation backbone, thus demonstrating its effectiveness and generality. The ablation study of the self-entropy minimization was also consistent with the qualitative evaluations. Since we used T2-weighted MRI as our source domain, there is a relatively smaller domain gap between T2-weighted and T2-FLAIR MRI, while the domain gap is larger between T2-weighted and T1-weighted or T1CE MRI. In Table 2, accordingly, we can see that the performance in the T1-weighted or T1CE MRI domains is significantly inferior to that in the FLAIR MRI domain.

Table 1.

Comparison of Core/EnhT/Whole tumor segmentation for the cross-modality UDA. Results are averaged over T2-weighted to T1-weighted, T1CE, and FLAIR MRI. We used the same segmentation backbone as in DSFN [6].

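| Method | Dice CoreT [%] | Dice EnhT [%] | Dice Whole [%] | Hausdorff CoreT [mm] | Hausdorff EnhT [mm] | Hausdorff Whole [mm] |
|---|---|---|---|---|---|---|
| No UDA [6] | 20.4 | 39.7 | 22.6 | 55.4 | 67.3 | 46.8 |
| CycleGAN [21] | 52.6 | 41.9 | 57.8 | 35.3 | 48.2 | 21.9 |
| SIFA [22] | 54.4 | 41.1 | 59.3 | 28.6 | 35.7 | 17.1 |
| DSFN [6] | 56.8 | 42.7 | 66.1 | 27.8 | 34.5 | 15.6 |
| Ours w/o $\mathcal{L}_{Ent}$ | 57.6 | 43.9 | 67.4 | 26.6 | 33.5 | 13.9 |
| Ours | 58.0 | 44.6 | 67.9 | 26.4 | 33.2 | 13.5 |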

Table 2.

Whole tumor segmentation performance of the cross-modality UDA. We used the same segmentation backbone as in DSA [7].

| Method | Dice FLAIR [%] | Dice T1 [%] | Dice T1CE [%] | Hausdorff FLAIR [mm] | Hausdorff T1 [mm] | Hausdorff T1CE [mm] |
|---|---|---|---|---|---|---|
| No UDA [7] | 65.1 | 4.2 | 6.3 | 28.0 | 55.7 | 49.8 |
| DSA [7] | 81.8 | 57.7 | 62.0 | 8.6 | 14.2 | 13.7 |
| Ours w/o $\mathcal{L}_{Ent}$ | 82.6 | 58.8 | 63.1 | 8.1 | 12.8 | 11.5 |
| Ours | 82.9 | 59.3 | 63.5 | 7.9 | 12.5 | 11.2 |

4. DISCUSSION AND CONCLUSION

In this work, we proposed a multi-task adaptation framework that utilizes the low-level edge detection task to facilitate the challenging semantic segmentation adaptation. The semantic contour delineation serves as a precursor task for semantic segmentation, since the edge information is highly transferable, which simplifies the adaptation of the semantic delineation. With the relatively precise contour in the target domain, we were able to guide the subsequent segmentation by providing the boundaries of the semantic classes. We did not need an additional edge label; instead, we resorted to the simple Canny operator to acquire the source domain edge map in a self-supervised manner. In addition, the widely used self-entropy minimization was incorporated to boost the performance. We demonstrated the effectiveness of our framework on the T2-weighted to FLAIR/T1-weighted/T1CE MRI adaptation task and achieved superior performance compared with the competing state-of-the-art methods.

5. COMPLIANCE WITH ETHICAL STANDARDS

This research study was conducted retrospectively using human subject data made available in open access by BraTS’18.

6. ACKNOWLEDGMENTS

This work is partially supported by NIH R01DC018511, R01DE027989, R01CA165221, and P41EB022544.

7. REFERENCES

[1] Yuheng Song and Hao Yan, “Image segmentation algorithms overview,” arXiv, 2017.
[2] Rampun Andrik, et al., “Breast pectoral muscle segmentation in mammograms using a modified holistically-nested edge detection network,” Medical Image Analysis, vol. 57, pp. 1–17, 2019.
[3] Liu Xiaofeng, Xing Fangxu, Yang Chao, Fakhri Georges El, and Woo Jonghye, “Adapting off-the-shelf source segmenter for target medical image segmentation,” in MICCAI, Springer, 2021, pp. 549–559.
[4] Menze Bjoern H, et al., “The multimodal brain tumor image segmentation benchmark (BRATS),” IEEE TMI, vol. 34, no. 10, pp. 1993–2024, 2014.
[5] Liu Xiaofeng, Guo Zhenhua, Li Site, Xing Fangxu, You Jane, C-C Jay Kuo, Georges El Fakhri, and Woo Jonghye, “Adversarial unsupervised domain adaptation with conditional and label shift: Infer, align and iterate,” in ICCV, 2021.
[6] Zou Danbing, Zhu Qikui, and Yan Pingkun, “Unsupervised domain adaptation with dual-scheme fusion network for medical image segmentation,” in IJCAI, 2020.
[7] Han Xiaoting, Qi Lei, Yu Qian, Zhou Ziqi, Zheng Yefeng, Shi Yinghuan, and Gao Yang, “Deep symmetric adaptation network for cross-modality medical image segmentation,” arXiv, 2021.
[8] Liu Xiaofeng, Liu Xiongchang, Hu Bo, Ji Wenxuan, Xing Fangxu, Lu Jun, You Jane, C-C Jay Kuo, Georges El Fakhri, and Woo Jonghye, “Subtype-aware unsupervised domain adaptation for medical diagnosis,” in AAAI, 2021.
[9] Liu Xiaofeng, Xing Fangxu, Stone Maureen, Zhuo Jiachen, Reese Timothy, Jerry L Prince, Georges El Fakhri, and Woo Jonghye, “Generative self-training for cross-domain unsupervised tagged-to-cine MRI synthesis,” in MICCAI, Springer, 2021, pp. 138–148.
[10] Liu Xiaofeng, Hu Bo, Jin Linghao, Han Xu, Xing Fangxu, Ouyang Jinsong, Lu Jun, Fakhri Georges El, and Woo Jonghye, “Domain generalization under conditional and label shifts via variational Bayesian inference,” in IJCAI, 2021.
[11] Liu Xiaofeng, Li Site, Ge Yubin, Ye Pengyi, You Jane, and Lu Jun, “Recursively conditional Gaussian for ordinal unsupervised domain adaptation,” in ICCV, 2021.
[12] Liu Xiaofeng, Chao Yang, You Jane J, Jay Kuo C-C, and Vijayakumar Bhagavatula, “Mutual information regularized feature-level Frankenstein for discriminative recognition,” IEEE TPAMI, 2021.
[13] Long Mingsheng, Cao Yue, Wang Jianmin, and Jordan Michael I, “Learning transferable features with deep adaptation networks,” in ICML, 2015.
[14] Ding Lijun and Goshtasby Ardeshir, “On the Canny edge detector,” Pattern Recognition, 2001.
[15] Liu Xiaofeng, Xing Fangxu, Jerry L Prince, Aaron Carass, Stone Maureen, Fakhri Georges El, and Woo Jonghye, “Dual-cycle constrained bijective VAE-GAN for tagged-to-cine magnetic resonance image synthesis,” in ISBI, IEEE, 2021, pp. 1448–1452.
[16] Salimans Tim, Goodfellow Ian, Zaremba Wojciech, Cheung Vicki, Radford Alec, and Chen Xi, “Improved techniques for training GANs,” in NIPS, 2016.
[17] Wang Dequan, Shelhamer Evan, Liu Shaoteng, Olshausen Bruno, and Darrell Trevor, “Fully test-time adaptation by entropy minimization,” arXiv, 2020.
[18] Bateson Mathilde, Kervadec Hoel, Dolz Jose, Hervé Lombaert, and Ayed Ismail Ben, “Source-relaxed domain adaptation for image segmentation,” in MICCAI, Springer, 2020, pp. 490–499.
[19] Grandvalet Yves and Bengio Yoshua, “Semi-supervised learning by entropy minimization,” in NIPS, 2005.
[20] Liu Xiaofeng, Xing Fangxu, Fakhri Georges El, and Woo Jonghye, “Unsupervised domain adaptation for segmentation with black-box source model,” in SPIE Medical Imaging, 2022.
[21] Zhu Jun-Yan, Park Taesung, Isola Phillip, and Efros Alexei A, “Unpaired image-to-image translation using cycle-consistent adversarial networks,” in ICCV, 2017.
[22] Chen Cheng, Dou Qi, Chen Hao, Qin Jing, and Heng Pheng-Ann, “Synergistic image and feature adaptation: Towards cross-modality domain adaptation for medical image segmentation,” in AAAI, 2019.
[23] Yang Jimei, Price Brian, Cohen Scott, Lee Honglak, and Yang Ming-Hsuan, “Object contour detection with a fully convolutional encoder-decoder network,” in CVPR, 2016, pp. 193–202.
