Abstract
Deep learning (DL) models for segmenting various anatomical structures have achieved great success via a static DL model that is trained in a single source domain. Yet, the static DL model is likely to perform poorly in a continually evolving environment, requiring appropriate model updates. In an incremental learning setting, we would expect well-trained static models to be updated, without catastrophic forgetting, as continually evolving target domain data (e.g., additional lesions or structures of interest) are collected from different sites. This, however, poses challenges due to distribution shifts, additional structures not seen during the initial model training, and the absence of training data from the source domain. To address these challenges, in this work, we seek to progressively evolve an “off-the-shelf” trained segmentation model to diverse datasets with additional anatomical categories in a unified manner. Specifically, we first propose a divergence-aware dual-flow module with balanced rigidity and plasticity branches to decouple old and new tasks, which is guided by continuous batch renormalization. Then, a complementary pseudo-label training scheme with self-entropy regularized momentum MixUp decay is developed for adaptive network optimization. We evaluated our framework on a brain tumor segmentation task with continually changing target domains, i.e., new MRI scanners/modalities with incremental structures. Our framework retained the discriminability of previously learned structures well, hence enabling realistic life-long segmentation model extension along with the widespread accumulation of big medical data.
1. Introduction
Accurate segmentation of a variety of anatomical structures is a crucial prerequisite for subsequent diagnosis or treatment [28]. While recent advances in data-driven deep learning (DL) have achieved superior segmentation performance [29], the segmentation task is often constrained by the availability of costly pixel-wise labeled training datasets. In addition, even if static DL models are trained with extraordinarily large amounts of training datasets in a supervised learning manner [29], there exists a need for a segmentor to update a trained model with new data alongside incremental anatomical structures [24].
In real-world scenarios, clinical databases are often sequentially constructed from various clinical sites with varying imaging protocols [19,20,21,23]. In addition, the set of labeled anatomical structures grows incrementally with additional lesions or new structures of interest, depending on study goals or clinical needs [27,18]. Furthermore, access to previously used training data can be restricted due to data privacy protocols [18,17]. Therefore, efficient heterogeneous structure-incremental (HSI) learning is highly desired in clinical practice to develop a DL model that generalizes well to different types of input data and varying structures. Unfortunately, straightforwardly fine-tuning DL models with either new structures [30] or heterogeneous data [17], in the absence of the data used for the initial model training, can easily overwrite previously learned knowledge, i.e., catastrophic forgetting [30,17,14].
At present, satisfactory methods for the realistic HSI setting are largely unavailable. First, recent structure-incremental works cannot deal with domain shift. Early attempts [27] simply used exemplar data from the previous stage. Other works [5,33,30,18] combined a trained model prediction and a new class mask as a pseudo-label. However, predictions from the old model under a domain shift are likely to be unreliable [38]. The widely used pooled feature statistics consistency [5,30] is also not applicable to heterogeneous data, since the statistics are domain-specific [2]. In addition, a few works [13,25,34] proposed to increase the capacity of networks to avoid directly overwriting parameters that are entangled with old and new knowledge. However, these solutions are not domain adaptive. Second, from the perspective of continuous domain adaptation with a consistent class label, old exemplars have been used for prostate MRI segmentation [32]. While Li et al. [17] further proposed to recover the missing old stage data with an additional generative model, hallucinating realistic data, given only the trained model itself, is a highly challenging task [31] and may lead to sensitive information leakage [35]. Third, while Kundu et al. [16] updated the model for class-incremental unsupervised domain adaptation in natural image classification, its class prototype is not applicable to segmentation.
In this work, we propose a unified HSI segmentor evolving framework with a divergence-aware decoupled dual-flow (D3F) module, which is adaptively optimized via HSI pseudo-label distillation using a momentum MixUp decay (MMD) scheme. To explicitly avoid the overwriting of previously learned parameters, our D3F follows a “divide-and-conquer” strategy to balance the old and new tasks with a fixed rigidity branch and a compensated learnable plasticity branch, which is guided by our novel divergence-aware continuous batch renormalization (cBRN). The complementary knowledge can be flexibly integrated with the model re-parameterization [4]. The number of additional parameters is constant in training and zero in testing. Then, the flexible D3F module is trained following the knowledge distillation with novel HSI pseudo-labels. Specifically, inspired by the self-knowledge distillation [15] and self-training [38] that utilize the previous prediction for better generalization, we adaptively construct the HSI pseudo-label with an MMD scheme to smoothly adjust the contributions of the potentially noisy old model predictions on heterogeneous data and of the progressively learned new model predictions along with the training. In addition, unsupervised self-entropy minimization is added to further enhance performance.
Our main contributions can be summarized as follows:
- To our knowledge, this is the first attempt at realistic HSI segmentation with both incremental structures of interest and diverse domains.
- We propose a divergence-aware decoupled dual-flow module guided by our novel continuous batch renormalization (cBRN) for alleviating catastrophic forgetting under domain shift scenarios.
- An adaptively constructed HSI pseudo-label with self-training is developed for efficient HSI knowledge distillation.

We evaluated our framework on anatomical structure segmentation tasks from different types of MRI data collected from multiple sites. Our HSI scheme demonstrated superior performance in segmenting all structures with diverse data distributions, surpassing, by a large margin, conventional class-incremental methods that do not consider data shift.
2. Methodology
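For the segmentation model under incremental structures of interest and domain shift scenarios, we are given an off-the-shelf segmentor $f_{\theta^0}$ parameterized with $\theta^0$, which has been trained with the data $\{x_n^0, y_n^0\}_{n=1}^{N^0}$ in an initial source domain $\mathcal{D}^0 = \{\mathcal{X}^0, \mathcal{Y}^0\}$, where $x_n^0 \in \mathbb{R}^{H \times W}$ and $y_n^0 \in \mathbb{R}^{H \times W}$ are the paired image slice and its segmentation mask with the height of $H$ and width of $W$, respectively. There are $T$ consecutive evolving stages with heterogeneous target domains $\mathcal{D}^t = \{\mathcal{X}^t, \mathcal{S}^t\}$, $t = 1, \ldots, T$, each with the paired slice set $\{x_n^t\}_{n=1}^{N^t} \subset \mathcal{X}^t$ and the current stage label set $\{s_n^t\}_{n=1}^{N^t} \subset \mathcal{S}^t$, where $x_n^t, s_n^t \in \mathbb{R}^{H \times W}$. Due to heterogeneous domain shifts, $\mathcal{X}^t$ from different sites or modalities follows diverse distributions across all $T$ stages. Due to incremental anatomical structures, the overall label space across the previous $t$ stages, $\mathcal{Y}^t$, is expanded from $\mathcal{Y}^{t-1}$ with the additional annotated structures $\mathcal{S}^t$ in stage $t$, i.e., $\mathcal{Y}^t = \mathcal{Y}^{t-1} \cup \mathcal{S}^t = \mathcal{Y}^0 \cup \mathcal{S}^1 \cup \ldots \cup \mathcal{S}^t$. We aim to learn $f_{\theta^T}$ that performs well on all $\{\mathcal{X}^t\}_{t=1}^{T}$ for delineating all of the structures $\mathcal{Y}^T$ seen in $T$ stages.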
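The label-space expansion described above can be illustrated with a minimal sketch. The structure names follow the experiments in Sec. 3, while the function name `expand_label_space` is a hypothetical helper introduced only for this illustration.

```python
# Structures follow the experiments (CoreT -> EnhT -> ED); the helper is illustrative.
initial_labels = {"CoreT"}                  # Y^0, learned by the off-the-shelf segmentor
stage_structures = [{"EnhT"}, {"ED"}]       # S^1, S^2, annotated in stages 1 and 2


def expand_label_space(initial, increments):
    """Return [Y^0, Y^1, ..., Y^T] with Y^t = Y^{t-1} U S^t."""
    spaces = [set(initial)]
    for new_structures in increments:
        spaces.append(spaces[-1] | set(new_structures))
    return spaces


label_spaces = expand_label_space(initial_labels, stage_structures)
for t, space in enumerate(label_spaces):
    print(f"stage {t}: {sorted(space)}")
```

At every stage only the new structures are annotated, so the model at stage t has to predict the full accumulated set without seeing labels for the earlier structures.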
2.1. cBRN guided divergence-aware decoupled dual-flow
To alleviate the forgetting through parameter overwriting, caused by both new structures and data shift, we propose a D3F module for flexible decoupling and integration of old and new knowledge.
Specifically, we duplicate the convolution in each layer initialized with the previous model $f_{\theta^{t-1}}$ to form two branches as in [13,25,34]. The first rigidity branch $f^{r}_{\theta^t}$ is fixed at stage $t$ to keep the old knowledge we have learned. In contrast, the extended plasticity branch $f^{p}_{\theta^t}$ is expected to be adaptively updated to learn the new task in $\mathcal{D}^t$. At the end of the current training stage $t$, we can flexibly integrate the convolutions in the two branches, i.e., $\{W_t^r, b_t^r\}$ and $\{W_t^p, b_t^p\}$, to $\{W_{t+1}^r = \frac{W_t^r + W_t^p}{2},\ b_{t+1}^r = \frac{b_t^r + b_t^p}{2}\}$ with the model re-parameterization [4]. In fact, the dual-flow model can be regarded as an implicit ensemble scheme [9] to integrate multiple sub-modules with a different focus. In addition, as demonstrated in [6], the fixed modules regularize the learnable modules to act like the fixed ones. Thus, the plasticity modules can also be implicitly encouraged to keep the previous knowledge along with their HSI learning.
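The integration of the two branches rests on the linearity of convolution: averaging the outputs of two parallel convolutions is equivalent to a single convolution with averaged weights and biases. The following is a minimal NumPy sketch of this property, assuming a 1x1 convolution for brevity and a simple parameter average as the merge rule; the helper `conv1x1` and the random stand-in weights are illustrative assumptions.

```python
import numpy as np


def conv1x1(x, weight, bias):
    """1x1 convolution: x is (C_in, H, W), weight is (C_out, C_in), bias is (C_out,)."""
    return np.einsum("oc,chw->ohw", weight, x) + bias[:, None, None]


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8, 8))

# Both branches are initialized from the previous-stage convolution.
w_prev, b_prev = rng.normal(size=(3, 4)), rng.normal(size=3)
w_rigid, b_rigid = w_prev.copy(), b_prev.copy()              # frozen during stage t
w_plastic = w_prev + 0.1 * rng.normal(size=w_prev.shape)     # stand-in for a trained update
b_plastic = b_prev + 0.1 * rng.normal(size=b_prev.shape)

# Dual-flow output: average of the two branch outputs.
dual_flow = 0.5 * (conv1x1(x, w_rigid, b_rigid) + conv1x1(x, w_plastic, b_plastic))

# Re-parameterized single branch with averaged weights and biases.
w_merged = 0.5 * (w_rigid + w_plastic)
b_merged = 0.5 * (b_rigid + b_plastic)
merged = conv1x1(x, w_merged, b_merged)

print(np.allclose(dual_flow, merged))
```

The same argument holds for larger kernels, since convolution is linear in its weights; the 1x1 case only keeps the example short.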
However, under the domain shift, it can be sub-optimal to directly average the parameters, since the rigidity branch $f^{r}_{\theta^t}$ may not perform well in predicting $\mathcal{Y}^{t-1}$ on $\mathcal{X}^t$. It has been demonstrated that batch statistics adaptation plays an important role in domain generalizable model training [22]. Therefore, we propose a continuous batch renormalization (cBRN) to mitigate the feature statistics divergence between each training batch at a specific stage and the life-long global data distribution.
Of note, as a default block in modern convolutional neural networks (CNNs) [8,37], batch normalization (BN) [11] normalizes the input feature $z \in \mathbb{R}^{H_c \times W_c}$ of each CNN channel with its batch-wise statistics, e.g., mean $\mu_B$ and standard deviation $\sigma_B$, and learnable scaling and shifting factors $\{\gamma, \beta\}$ as $\tilde{z}_i = \frac{z_i - \mu_B}{\sigma_B} \cdot \gamma + \beta$, where $i$ indexes the spatial position in $\mathbb{R}^{H_c \times W_c}$. BN assumes that the training and testing mini-batches follow the same distribution [10], which does not hold in HSI. Simply enforcing the same statistics across domains, as in [5,33,30], can weaken the model expressiveness [36].
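The recent BRN [10] proposes to rectify the data shift between each batch and the dataset by using the moving averages $\mu$ and $\sigma$ along with the training:
$$\tilde{z}_i = \frac{z_i - \hat{\mu}}{\hat{\sigma}} \cdot \gamma + \beta, \quad \hat{\mu} = (1 - \eta) \cdot \mu + \eta \cdot \mu_B, \quad \hat{\sigma} = (1 - \eta) \cdot \sigma + \eta \cdot \sigma_B \tag{1}$$
where $\eta \in [0, 1]$ is applied to balance the global statistics and the current batch. In addition, $\hat{\mu}$ and $\hat{\sigma}$ are used in both training and testing. Therefore, BRN renormalizes the features to highlight the dependency on the global statistics $\{\mu, \sigma\}$ in training for a more generalizable model, while it is limited to static learning.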
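To make the role of η concrete, the sketch below contrasts plain BN with a BRN-style normalization for a single channel. It assumes that the normalization statistics are a convex blend of the global moving averages and the current batch statistics, weighted by η; the function names and the `eps` stabilizer are assumptions introduced for illustration and are not taken from [10].

```python
import numpy as np


def batch_norm(z, gamma, beta, eps=1e-5):
    """Plain BN for one channel: normalize with the batch mean and std."""
    mu_b, sigma_b = z.mean(), z.std()
    return (z - mu_b) / (sigma_b + eps) * gamma + beta


def renorm(z, mu, sigma, gamma, beta, eta=0.01, eps=1e-5):
    """BRN-style normalization with blended global/batch statistics (assumed form)."""
    mu_b, sigma_b = z.mean(), z.std()
    mu_hat = (1.0 - eta) * mu + eta * mu_b
    sigma_hat = (1.0 - eta) * sigma + eta * sigma_b
    return (z - mu_hat) / (sigma_hat + eps) * gamma + beta, mu_hat, sigma_hat


rng = np.random.default_rng(0)
z = rng.normal(loc=3.0, scale=2.0, size=(16, 32, 32))  # (batch, H_c, W_c), one channel

out_bn = batch_norm(z, gamma=1.0, beta=0.0)
# eta = 1 uses only the batch statistics and therefore reduces to plain BN.
out_eta1, _, _ = renorm(z, mu=0.0, sigma=1.0, gamma=1.0, beta=0.0, eta=1.0)
# eta = 0 ignores the batch and relies only on the global statistics.
out_eta0, mu_hat, sigma_hat = renorm(z, mu=0.0, sigma=1.0, gamma=1.0, beta=0.0, eta=0.0)

print(np.allclose(out_bn, out_eta1), mu_hat, sigma_hat)
```

A small η, such as the value of 0.01 used in the experiments, keeps the normalization close to the global statistics while still letting each batch contribute.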
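In this work, we further explore the potential of BRN in the continuously evolving HSI task to be general for all of the domains involved. Specifically, we extend BRN to cBRN across multiple consecutive stages by updating $\{\mu_c, \sigma_c\}$ along with all stages of training, which are transferred as shown in Fig. 1. The conventional BN also inherits $\{\mu, \sigma\}$ for testing, but does not use them in training [11]. At stage $t$, $\mu_c$ and $\sigma_c$ are inherited from stage $t-1$, and are updated with the current batch-wise $\{\mu_B^r, \sigma_B^r\}$ and $\{\mu_B^p, \sigma_B^p\}$ in the rigidity and plasticity branches:
$$\mu_c = (1 - \eta) \cdot \mu_c + \eta \cdot \tfrac{1}{2}\left(\mu_B^r + \mu_B^p\right), \quad \sigma_c = (1 - \eta) \cdot \sigma_c + \eta \cdot \tfrac{1}{2}\left(\sigma_B^r + \sigma_B^p\right) \tag{2}$$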
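The key difference from stage-wise normalization is that the statistics are carried over from one stage to the next rather than reset. A minimal sketch of such a life-long update is given below, assuming that the batch statistics of the rigidity and plasticity branches are averaged before the moving-average step; the function `cbrn_update`, the three synthetic stage distributions, and the number of batches per stage are all hypothetical.

```python
import numpy as np


def cbrn_update(mu_c, sigma_c, z_rigid, z_plastic, eta=0.01):
    """Update the life-long statistics from one batch of both branch outputs."""
    mu_b = 0.5 * (z_rigid.mean() + z_plastic.mean())
    sigma_b = 0.5 * (z_rigid.std() + z_plastic.std())
    mu_c = (1.0 - eta) * mu_c + eta * mu_b
    sigma_c = (1.0 - eta) * sigma_c + eta * sigma_b
    return mu_c, sigma_c


rng = np.random.default_rng(0)
mu_c, sigma_c = 0.0, 1.0              # statistics inherited from the initial model
history = [(mu_c, sigma_c)]

# Three hypothetical stages whose features follow different distributions.
for stage_mean, stage_std in [(0.5, 1.2), (2.0, 0.8), (-1.0, 1.5)]:
    for _ in range(200):              # batches within one stage
        z_r = rng.normal(stage_mean, stage_std, size=(8, 16, 16))
        z_p = rng.normal(stage_mean, stage_std, size=(8, 16, 16))
        mu_c, sigma_c = cbrn_update(mu_c, sigma_c, z_r, z_p)
    history.append((mu_c, sigma_c))   # carried over to the next stage, never reset

for t, (m, s) in enumerate(history):
    print(f"after stage {t}: mu_c={m:.3f}, sigma_c={s:.3f}")
```

Because the statistics move slowly toward each new stage, they retain a trace of earlier domains instead of being dominated by the latest one.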
Fig. 1:
Illustration of one layer in our proposed divergence-aware decoupled dual-flow module guided by cBRN for our cross-MR-modality HSI task, i.e., subject-independent (CoreT with T1) → (EnhT with T2) → (ED with FLAIR). Notably, we do not require the dual-flow or cBRN for the initial segmentor.
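For testing, the two branches in the final model $f_{\theta^T}$ can be merged for a lightweight implementation:
$$\tilde{z} = \frac{\frac{1}{2}\left\{(W^r a + b^r) + (W^p a + b^p)\right\} - \mu_c}{\sigma_c} \cdot \gamma + \beta = \hat{W} a + \hat{b}, \quad \hat{W} = \frac{W^r + W^p}{2\sigma_c}\gamma, \quad \hat{b} = \frac{b^r + b^p - 2\mu_c}{2\sigma_c}\gamma + \beta \tag{3}$$
where $a$ denotes the input of the convolution. Therefore, $f_{\theta^T}$ does not introduce additional parameters for deployment.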
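The claim that no parameters are added at deployment can be checked numerically: two averaged branches followed by a fixed affine normalization collapse into a single convolution. The sketch below assumes a 1x1 convolution, an average of the two branch outputs, and channel-wise normalization with fixed statistics; all tensors are random stand-ins and `conv1x1` is an illustrative helper.

```python
import numpy as np


def conv1x1(x, weight, bias):
    """1x1 convolution: x is (C_in, H, W), weight is (C_out, C_in), bias is (C_out,)."""
    return np.einsum("oc,chw->ohw", weight, x) + bias[:, None, None]


rng = np.random.default_rng(0)
c_in, c_out = 4, 3
x = rng.normal(size=(c_in, 8, 8))
w_r, b_r = rng.normal(size=(c_out, c_in)), rng.normal(size=c_out)   # rigidity branch
w_p, b_p = rng.normal(size=(c_out, c_in)), rng.normal(size=c_out)   # plasticity branch
mu_c = rng.normal(size=c_out)                                       # life-long mean
sigma_c = rng.uniform(0.5, 2.0, size=c_out)                         # life-long std
gamma, beta = rng.normal(size=c_out), rng.normal(size=c_out)

# Training-time path: two branches, averaged, then normalized with {mu_c, sigma_c}.
z = 0.5 * (conv1x1(x, w_r, b_r) + conv1x1(x, w_p, b_p))
two_branch = ((z - mu_c[:, None, None]) / sigma_c[:, None, None]
              * gamma[:, None, None] + beta[:, None, None])

# Deployment path: a single convolution with folded weights and bias.
scale = gamma / sigma_c
w_fold = 0.5 * (w_r + w_p) * scale[:, None]
b_fold = (0.5 * (b_r + b_p) - mu_c) * scale + beta
single_branch = conv1x1(x, w_fold, b_fold)

print(np.allclose(two_branch, single_branch), w_fold.shape == w_r.shape)
```

The folded layer has the same weight and bias shapes as one original branch, which is consistent with a deployment cost equal to that of the initial segmentor.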
2.2. HSI pseudo-label distillation with momentum MixUp decay
The training of our developed $f_{\theta^t}$ with D3F is supervised with the previous model $f_{\theta^{t-1}}$ and the current stage data $\{x_n^t, s_n^t\}_{n=1}^{N^t}$. In conventional class incremental learning, the knowledge distillation [31] is widely used to construct the combined label $\hat{y}_n^t$ by adding $s_n^t$ and the prediction $f_{\theta^{t-1}}(x_n^t)$. Then, $f_{\theta^t}$ can be optimized by the training pairs $\{x_n^t, \hat{y}_n^t\}$. However, with heterogeneous data in different stages, $f_{\theta^{t-1}}(x_n^t)$ can be highly unreliable. Simply using it as ground truth cannot guide the correct knowledge transfer.
In this work, we construct a complementary pseudo-label with a MixUp decay scheme to adaptively exploit the knowledge in the old segmentor for the progressively learned new segmentor. In the initial training epochs, $f_{\theta^{t-1}}$ could be a more reliable supervision signal, while we would expect $f_{\theta^t}$ to learn to perform better on predicting $\mathcal{Y}^{t-1}$ as training proceeds. Of note, even with the rigidity branch, the integrated network can be largely distracted by the plasticity branch in the initial epochs. Therefore, we propose to dynamically adjust their importance in constructing the pseudo-label along with the training progress. Specifically, we MixUp the predictions of $f_{\theta^{t-1}}$ and $f_{\theta^t}$ w.r.t. $\mathcal{Y}^{t-1}$, i.e., $f_{\theta^t}(\cdot)^{[:t-1]}$, and control their pixel-wise proportion for the pseudo-label $\hat{y}_n^t$ with MMD:
$$\hat{y}_{n:i}^{t} = \left\{\lambda\, f_{\theta^{t-1}}(x_{n:i}^{t}) + (1 - \lambda)\, f_{\theta^{t}}(x_{n:i}^{t})^{[:t-1]}\right\} \cup s_{n:i}^{t}, \quad \lambda = \lambda_0 \exp(-I) \tag{4}$$
where $i$ indexes each pixel, and $\lambda$ is the adaptation momentum factor with the exponential decay of iteration $I$. $\lambda_0$ is the initial weight of $f_{\theta^{t-1}}(x_{n:i}^{t})$, which is empirically set to 1 to constrain $\lambda \in (0, 1]$. Therefore, the weight of the old model prediction can be smoothly decreased along with the training, and $f_{\theta^t}(x_{n:i}^{t})^{[:t-1]}$ gradually represents the target data for the old classes in $[:t-1]$. Of note, we have the ground truth of the new structure $s_{n:i}^{t}$ under HSI scenarios [5,33,30,18]. We calculate the cross-entropy loss $\mathcal{L}_{CE}$ with the pseudo-label $\hat{y}_n^t$ as self-training [15,38].
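The pseudo-label construction can be sketched as follows. This is only an illustration under stated assumptions: the old-class predictions of the two models are mixed pixel-wise with a weight that decays exponentially with the iteration, and pixels annotated as the new structure are assigned to the new class. The function `mmd_pseudo_label`, the `decay` rate parameter, and the way the new mask overrides old-class predictions are hypothetical choices.

```python
import numpy as np


def mmd_pseudo_label(p_old, p_new, new_mask, iteration, lam0=1.0, decay=1.0):
    """Build a pixel-wise soft pseudo-label.

    p_old, p_new: (K_old, H, W) probabilities over previously learned classes
                  (including background), from the old and the current model.
    new_mask:     (H, W) binary ground truth of the newly added structure.
    """
    lam = lam0 * np.exp(-decay * iteration)          # momentum factor, decays with training
    mixed = lam * p_old + (1.0 - lam) * p_new        # MixUp of the old-class predictions
    mixed = mixed * (1.0 - new_mask)[None]           # annotated pixels go to the new class
    label = np.concatenate([mixed, new_mask[None].astype(float)], axis=0)
    return label, lam


rng = np.random.default_rng(0)


def random_probs(k, h, w):
    """Random softmax maps used as stand-ins for model predictions."""
    logits = rng.normal(size=(k, h, w))
    e = np.exp(logits - logits.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)


p_old, p_new = random_probs(2, 4, 4), random_probs(2, 4, 4)
new_mask = np.zeros((4, 4))
new_mask[1:3, 1:3] = 1

label_start, lam_start = mmd_pseudo_label(p_old, p_new, new_mask, iteration=0)
label_late, lam_late = mmd_pseudo_label(p_old, p_new, new_mask, iteration=50)
print(lam_start, lam_late)
```

At the first iteration the old-class part of the pseudo-label equals the old model prediction, and it approaches the current model prediction as the weight decays.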
In addition to the old knowledge inherited in $f_{\theta^{t-1}}$, we propose to explore unsupervised learning protocols to stabilize the initial training. We adopt the widely used self-entropy (SE) minimization [7] as a simple add-on training objective. Specifically, we have the slice-level segmentation SE, which is the averaged entropy of the pixel-wise softmax prediction $\delta_{n:i}$ as $\mathcal{L}_{SE} = -\frac{1}{HW}\sum_{i}\sum_{k}\delta_{n:i}^{k}\log\delta_{n:i}^{k}$, where $k$ indexes the classes. In training, the overall optimization loss is formulated as follows:
$$\mathcal{L} = \mathcal{L}_{CE}\left(\hat{y}_n^t, f_{\theta^t}(x_n^t)\right) + \alpha\, \mathcal{L}_{SE}\left(f_{\theta^t}(x_n^t)\right), \quad \alpha = \frac{I_{max} - I}{I_{max}}\,\alpha_0 \tag{5}$$
where $\alpha$ is used to balance our HSI distillation and SE minimization terms, and $I_{max}$ is the scheduled iteration. Of note, strictly minimizing the SE can result in a trivial solution of always predicting a one-hot distribution [7], so a linearly decreasing $\alpha$ is usually applied. $\lambda_0$ and $\alpha_0$ are reset in each stage.
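The interplay between the distillation term and the SE term can be sketched with a few lines of NumPy. The sketch assumes a pixel-averaged cross-entropy against a (possibly soft) pseudo-label, a pixel-averaged entropy of the softmax output, and a weight α that decreases linearly from its initial value to zero over the scheduled iterations; the function names and the `eps` stabilizer are assumptions.

```python
import numpy as np


def self_entropy(probs, eps=1e-8):
    """Slice-level SE: mean over pixels of the entropy of the softmax output (K, H, W)."""
    return float(-(probs * np.log(probs + eps)).sum(axis=0).mean())


def soft_cross_entropy(pseudo_label, probs, eps=1e-8):
    """Pixel-averaged cross-entropy against a (possibly soft) pseudo-label."""
    return float(-(pseudo_label * np.log(probs + eps)).sum(axis=0).mean())


def alpha_schedule(iteration, max_iteration, alpha0=10.0):
    """Linearly decrease alpha from alpha0 to 0 over the scheduled iterations."""
    return alpha0 * max(0.0, 1.0 - iteration / max_iteration)


def total_loss(pseudo_label, probs, iteration, max_iteration, alpha0=10.0):
    alpha = alpha_schedule(iteration, max_iteration, alpha0)
    return soft_cross_entropy(pseudo_label, probs) + alpha * self_entropy(probs)


uniform = np.full((3, 4, 4), 1.0 / 3.0)     # maximally uncertain prediction
one_hot = np.zeros((3, 4, 4))
one_hot[0] = 1.0                            # fully confident prediction

print(self_entropy(uniform), self_entropy(one_hot))
print(alpha_schedule(0, 100), alpha_schedule(50, 100), alpha_schedule(100, 100))
print(total_loss(one_hot, uniform, iteration=0, max_iteration=100))
```

The entropy is largest for a uniform prediction and vanishes for a one-hot prediction, which is why an unbounded SE weight would push the model toward overconfident outputs and why the weight is annealed.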
3. Experiments and Results
We carried out two evaluation settings using the BraTS2018 database [1], including cross-subset (relatively small domain shift) and cross-modality (relatively large domain shift) tasks. The BraTS2018 database is a continually evolving database [1] with a total of 285 glioblastoma or low-grade gliomas subjects, comprising three consecutive subsets, i.e., 30 subjects from BraTS2013 [26], 167 subjects from TCIA [3], and 88 subjects from CBICA [1]. Notably, these three subsets were collected from different clinical sites, vendors, or populations [1]. Each subject has T1, T1ce, T2, and FLAIR MRI volumes with voxel-wise labels for the tumor core (CoreT), the enhancing tumor (EnhT), and the edema (ED).
We incrementally learned CoreT, EnhT, and ED structures throughout three consecutive stages, each following different data distributions. We used subject-independent 7/1/2 split for training, validation, and testing. For a fair comparison, we adopted the ResNet-based 2D nnU-Net backbone with BN as in [12] for all of the methods and all stages used in this work.
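A subject-independent split assigns every slice of a subject to exactly one partition, so that no subject contributes to both training and testing. A minimal sketch of a 7/1/2 split at the subject level is shown below; the subject identifiers, the random seed, and the choice to split a single 30-subject subset are hypothetical and only serve as an example.

```python
import random


def split_subjects(subject_ids, ratios=(7, 1, 2), seed=0):
    """Split subject IDs into train/val/test partitions with the given ratios."""
    ids = sorted(subject_ids)
    random.Random(seed).shuffle(ids)
    total = sum(ratios)
    n_train = len(ids) * ratios[0] // total
    n_val = len(ids) * ratios[1] // total
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]


subjects = [f"case_{i:03d}" for i in range(30)]   # e.g., a 30-subject subset
train, val, test = split_subjects(subjects)
print(len(train), len(val), len(test))
```

Slices are then drawn only from the subjects of the corresponding partition, which prevents slices of the same volume from leaking across partitions.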
3.1. Cross-subset structure incremental evolving
In our cross-subset setting, three structures were sequentially learned across three stages: (CoreT with BraTS2013) → (EnhT with TCIA) → (ED with CBICA). Of note, we used a CoreT segmentor trained with BraTS2013 as our off-the-shelf segmentor in t = 0. Testing involved all subsets and anatomical structures. We compared our framework with three typical structure-incremental (SI-only) segmentation methods, i.e., PLOP [5], MargExcIL [18], and UCD [30], which cannot address the heterogeneous data across stages. As tabulated in Table 1, PLOP [5] with additional feature statistic constraints has lower performance than MargExcIL [18], since the feature statistic consistency does not hold in HSI scenarios. Of note, the domain-incremental methods [17,32] cannot handle the changing output space. Our proposed HSI framework outperformed SI-only methods [5,18,30] with respect to both the Dice similarity coefficient (DSC) and the Hausdorff distance (HD) by a large margin. For the anatomical structure CoreT learned in t = 0, the difference between our HSI and these SI-only methods was larger than 10% DSC (63.79% vs. at most 49.23%), which indicates that the data shift related forgetting leads to a more severe performance drop in the early stages. We set η = 0.01 and α0 = 10 according to the sensitivity study in the supplementary material.
Table 1:
Numerical comparisons and ablation studies of the cross-subset brain tumor HSI segmentation task. DSC: Dice similarity coefficient [%] (higher is better); HD: Hausdorff distance [mm] (lower is better).
| Method | Data shift consideration | DSC [%] ↑ Mean | DSC [%] ↑ CoreT | DSC [%] ↑ EnhT | DSC [%] ↑ ED | HD [mm] ↓ Mean | HD [mm] ↓ CoreT | HD [mm] ↓ EnhT | HD [mm] ↓ ED |
|---|---|---|---|---|---|---|---|---|---|
| PLOP [5] | × | 59.83±0.131 | 45.50 | 57.39 | 76.59 | 19.2±0.14 | 22.0 | 19.8 | 15.9 |
| MargExcIL [18] | × | 60.49±0.127 | 48.37 | 56.28 | 76.81 | 18.9±0.11 | 21.4 | 19.8 | 15.5 |
| UCD [30] | × | 61.84±0.129 | 49.23 | 58.81 | 77.48 | 19.0±0.15 | 21.8 | 19.4 | 15.7 |
| HSI-MMD | √ | 66.87±0.126 | 59.42 | 61.26 | 79.93 | 16.8±0.13 | 18.5 | 17.8 | 14.2 |
| HSI-D3F | √ | 67.18±0.118 | 60.18 | 63.09 | 78.26 | 16.7±0.14 | 18.0 | 17.5 | 14.5 |
| HSI-cBRN | √ | 68.07±0.121 | 61.52 | 63.45 | 79.25 | 16.3±0.14 | 17.8 | 17.3 | 13.8 |
| HSI | √ | 69.44±0.119 | 63.79 | 64.71 | 79.81 | 15.7±0.12 | 16.7 | 16.9 | 13.6 |
| Joint Static | √(upper bound) | 73.98±0.117 | 71.14 | 68.35 | 82.46 | 15.0±0.13 | 15.7 | 16.2 | 13.2 |
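For reference, the DSC reported in the table measures the overlap between a predicted and a reference mask. The sketch below shows the standard definition on a toy example and checks that the Mean column of the PLOP row is consistent with an unweighted average of the three per-structure values; the function name and the toy masks are illustrative assumptions, and the evaluation code of the study may differ.

```python
import numpy as np


def dice_coefficient(pred, target, eps=1e-8):
    """DSC = 2 |P ∩ T| / (|P| + |T|) for two binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return float(2.0 * intersection / (pred.sum() + target.sum() + eps))


# Toy masks: two 2x2 squares shifted by one column, overlapping in 2 pixels.
pred = np.zeros((4, 4))
pred[0:2, 0:2] = 1
target = np.zeros((4, 4))
target[0:2, 1:3] = 1
dsc = dice_coefficient(pred, target)

# Per-structure DSC values of the PLOP row (CoreT, EnhT, ED) from Table 1.
plop_mean = round(float(np.mean([45.50, 57.39, 76.59])), 2)
print(dsc, plop_mean)
```

The toy example yields a DSC of 0.5, since 2 of the 4 pixels in each mask overlap.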
For the ablation study, we denote HSI-D3F as our HSI without the D3F module, i.e., simply fine-tuning the model parameters. HSI-cBRN used the dual-flow to avoid direct overwriting, but the model was not guided by cBRN for more generalized prediction on heterogeneous data. As shown in Table 1, both the dual-flow and cBRN improve the performance. Notably, the dual-flow model with flexible re-parameterization was able to alleviate the overwriting, while our cBRN was developed to deal with heterogeneous data. In addition, HSI-MMD indicates our HSI without the momentum MixUp decay in pseudo-label construction, i.e., simply regarding the prediction $f_{\theta^{t-1}}(x^t)$ as ground truth for $\mathcal{Y}^{t-1}$. However, $f_{\theta^{t-1}}(x^t)$ can be quite noisy, due to the low quantification performance for early-stage structures, which can be aggravated in long-term evolving scenarios. Of note, the pseudo-label construction itself is necessary, as in [5,18,30]. We also provide a qualitative comparison with SI-only methods and ablation studies in Fig. 3.
Fig. 3:
Segmentation examples in t = 1 and t = 2 in the cross-subset brain tumor HSI segmentation task.
3.2. Cross-modality structure incremental evolving
In our cross-modality setting, three structures were sequentially learned across three stages: (CoreT with T1) → (EnhT with T2) → (ED with T2 FLAIR). Of note, we used the CoreT segmentor trained with the T1 modality as our off-the-shelf segmentor in t = 0. Testing involved all MRI modalities and all structures. Based on hyperparameter validation, we empirically set η = 0.01 and α0 = 10.
In Table 2, we provide quantitative evaluation results. Our HSI framework consistently outperformed SI-only methods [5,18,30]. The improvement is even larger than in the cross-subset task, since the input data are much more diverse in the cross-modality setting. Catastrophic forgetting can be severe when SI-only methods are used for predicting early-stage structures, e.g., CoreT. We also provide the ablation study with respect to D3F, cBRN, and MMD in Table 2. The inferior performance of HSI-D3F/cBRN/MMD demonstrates the effectiveness of these modules for mitigating domain shifts.
Table 2:
Numerical comparisons and ablation studies of the cross-modality brain tumor HSI segmentation task. DSC: Dice similarity coefficient [%] (higher is better); HD: Hausdorff distance [mm] (lower is better).
| Method | Data shift consideration | DSC [%] ↑ Mean | DSC [%] ↑ CoreT | DSC [%] ↑ EnhT | DSC [%] ↑ ED | HD [mm] ↓ Mean | HD [mm] ↓ CoreT | HD [mm] ↓ EnhT | HD [mm] ↓ ED |
|---|---|---|---|---|---|---|---|---|---|
| PLOP [5] | × | 39.58±0.231 | 13.84 | 38.93 | 65.98 | 30.7±0.26 | 48.1 | 25.4 | 18.7 |
| MargExcIL [18] | × | 42.84±0.189 | 19.56 | 41.56 | 67.40 | 29.1±0.28 | 46.7 | 22.1 | 18.6 |
| UCD [30] | × | 44.67±0.214 | 21.39 | 45.28 | 67.35 | 29.4±0.32 | 46.2 | 23.6 | 18.4 |
| HSI-MMD | √ | 59.81±0.207 | 51.63 | 53.82 | 73.97 | 19.4±0.26 | 21.6 | 20.5 | 16.2 |
| HSI-D3F | √ | 60.81±0.195 | 53.87 | 55.42 | 73.15 | 19.2±0.21 | 21.4 | 19.9 | 16.2 |
| HSI-cBRN | √ | 61.87±0.180 | 54.90 | 56.62 | 74.08 | 18.5±0.25 | 20.1 | 19.5 | 16.0 |
| HSI | √ | 64.15±0.205 | 58.11 | 59.51 | 74.83 | 17.7±0.29 | 18.9 | 18.6 | 15.8 |
| Joint Static | √(upper bound) | 70.64±0.184 | 67.48 | 65.75 | 78.68 | 16.7±0.26 | 17.2 | 17.8 | 15.1 |
4. Conclusion
This work proposed an HSI framework under a clinically meaningful scenario, in which clinical databases are sequentially constructed from different sites/imaging protocols with new labels. To alleviate catastrophic forgetting alongside continuously varying structures and data shifts, our HSI resorted to a D3F module for flexibly learning and integrating old and new knowledge. In doing so, we were able to achieve divergence awareness with our cBRN-guided model adaptation for all the data involved. Our framework was optimized with a self-entropy regularized HSI pseudo-label distillation scheme with MMD to efficiently utilize the previous model in different types of MRI data. In our experiments, the framework demonstrated superior segmentation performance in learning new anatomical structures from cross-subset/modality MRI data, with a large improvement over SI-only methods.
Fig. 2:
Illustration of the proposed HSI pseudo-label distillation with MMD
Acknowledgements
This work is supported by NIH R01DC018511, R01DE027989, and P41EB022544. The authors would like to thank Dr. Jonghyun Choi for his valuable insights and helpful discussions.
References
- 1. Bakas S., Reyes M., Jakab A., Bauer S., Rempfler M., Crimi A., Shinohara R.T., Berger C., Ha S.M., Rozycki M., et al.: Identifying the best machine learning algorithms for brain tumor segmentation, progression assessment, and overall survival prediction in the BRATS challenge. arXiv:1811.02629 (2018)
- 2. Chang W.G., You T., Seo S., Kwak S., Han B.: Domain-specific batch normalization for unsupervised domain adaptation. In: CVPR. pp. 7354–7362 (2019)
- 3. Clark K., Vendt B., Smith K., Freymann J., Kirby J., Koppel P., Moore S., Phillips S., Maffitt D., Pringle M., et al.: The cancer imaging archive (TCIA): maintaining and operating a public information repository. Journal of Digital Imaging 26(6), 1045–1057 (2013)
- 4. Ding X., Zhang X., Ma N., Han J., Ding G., Sun J.: RepVGG: Making VGG-style convnets great again. In: CVPR. pp. 13733–13742 (2021)
- 5. Douillard A., Chen Y., Dapogny A., Cord M.: PLOP: Learning without forgetting for continual semantic segmentation. In: CVPR. pp. 4040–4050 (2021)
- 6. Fu S., Li Z., Liu Z., Yang X.: Interactive knowledge distillation for image classification. Neurocomputing 449, 411–421 (2021)
- 7. Grandvalet Y., Bengio Y.: Semi-supervised learning by entropy minimization. In: NeurIPS (2005)
- 8. He K., Zhang X., Ren S., Sun J.: Deep residual learning for image recognition. In: CVPR (2016)
- 9. Huang G., Sun Y., Liu Z., Sedra D., Weinberger K.Q.: Deep networks with stochastic depth. In: European Conference on Computer Vision. pp. 646–661 (2016)
- 10. Ioffe S.: Batch renormalization: Towards reducing minibatch dependence in batch-normalized models. NeurIPS 30 (2017)
- 11. Ioffe S., Szegedy C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: ICML. pp. 448–456. PMLR (2015)
- 12. Isensee F., Jaeger P.F., Kohl S.A., Petersen J., Maier-Hein K.H.: nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nature Methods 18(2), 203–211 (2021)
- 13. Kanakis M., Bruggemann D., Saha S., Georgoulis S., Obukhov A., Gool L.V.: Reparameterizing convolutions for incremental multi-task learning without task interference. In: European Conference on Computer Vision. pp. 689–707 (2020)
- 14. Kim D., Bae J., Jo Y., Choi J.: Incremental learning with maximum entropy regularization: Rethinking forgetting and intransigence. arXiv:1902.00829 (2019)
- 15. Kim K., Ji B., Yoon D., Hwang S.: Self-knowledge distillation: A simple way for better generalization. arXiv:2006.12000 (2020)
- 16. Kundu J.N., Venkatesh R.M., Venkat N., Revanur A., Babu R.V.: Class-incremental domain adaptation. In: European Conference on Computer Vision. pp. 53–69 (2020)
- 17. Li K., Yu L., Heng P.A.: Domain-incremental cardiac image segmentation with style-oriented replay and domain-sensitive feature whitening. TMI (2022)
- 18. Liu P., Wang X., Fan M., Pan H., Yin M., Zhu X., Du D., Zhao X., Xiao L., Ding L., Zhou S.: Learning incrementally to segment multiple organs in a CT image. In: MICCAI (2022)
- 19. Liu X., Prince J.L., Xing F., Zhuo J., Timothy R., Stone M., Fakhri G.E., Woo J.: Attentive continuous generative self-training for unsupervised domain adaptive medical image translation. Medical Image Analysis (2023)
- 20. Liu X., Xing F., El Fakhri G., Woo J.: Memory consistent unsupervised off-the-shelf model adaptation for source-relaxed medical image segmentation. Medical Image Analysis 83, 102641 (2023)
- 21. Liu X., Xing F., Shusharina N., Lim R., Jay Kuo C.C., El Fakhri G., Woo J.: ACT: Semi-supervised domain-adaptive medical image segmentation with asymmetric co-training. In: MICCAI (2022)
- 22. Liu X., Xing F., Yang C., El Fakhri G., Woo J.: Adapting off-the-shelf source segmenter for target medical image segmentation. In: MICCAI. pp. 549–559 (2021)
- 23. Liu X., Xing F., You J., Lu J., Kuo C.C.J., El Fakhri G., Woo J.: Subtype-aware dynamic unsupervised domain adaptation. IEEE TNNLS (2022)
- 24. Liu X., Yoo C., Xing F., Oh H., El Fakhri G., Kang J.W., Woo J., et al.: Deep unsupervised domain adaptation: a review of recent advances and perspectives. APSIPA Transactions on Signal and Information Processing 11(1) (2022)
- 25. Liu Y., Schiele B., Sun Q.: Adaptive aggregation networks for class-incremental learning. In: CVPR. pp. 2544–2553 (2021)
- 26. Menze B.H., Jakab A., Bauer S., Kalpathy-Cramer J., Farahani K., Kirby J., Burren Y., Porz N., Slotboom J., Wiest R., et al.: The multimodal brain tumor image segmentation benchmark (BRATS). TMI 34(10), 1993–2024 (2014)
- 27. Ozdemir F., Fuernstahl P., Goksel O.: Learn the new, keep the old: Extending pretrained models with new anatomy and images. In: MICCAI. pp. 361–369 (2018)
- 28. Shusharina N., Söderberg J., Edmunds D., Löfman F., Shih H., Bortfeld T.: Automated delineation of the clinical target volume using anatomically constrained 3D expansion of the gross tumor volume. Radiotherapy and Oncology (2020)
- 29. Tajbakhsh N., Jeyaseelan L., Li Q., Chiang J.N., Wu Z., Ding X.: Embracing imperfect datasets: A review of deep learning solutions for medical image segmentation. Medical Image Analysis 63, 101693 (2020)
- 30. Yang G., Fini E., Xu D., Rota P., Ding M., Nabi M., Alameda-Pineda X., Ricci E.: Uncertainty-aware contrastive distillation for incremental semantic segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence (2022)
- 31. Yin H., Molchanov P., Alvarez J.M., Li Z., Mallya A., Hoiem D., Jha N.K., Kautz J.: Dreaming to distill: Data-free knowledge transfer via DeepInversion. In: CVPR. pp. 8715–8724 (2020)
- 32. You C., Xiang J., Su K., Zhang X., Dong S., Onofrey J., Staib L., Duncan J.S.: Incremental learning meets transfer learning: Application to multi-site prostate MRI segmentation. arXiv:2206.01369 (2022)
- 33. Yu L., Liu X., Van de Weijer J.: Self-training for class-incremental semantic segmentation. IEEE Transactions on Neural Networks and Learning Systems (2022)
- 34. Zhang C.B., Xiao J.W., Liu X., Chen Y.C., Cheng M.M.: Representation compensation networks for continual semantic segmentation. In: CVPR (2022)
- 35. Zhang H., Zhang Y., Jia K., Zhang L.: Unsupervised domain adaptation of black-box source models. arXiv:2101.02839 (2021)
- 36. Zhang J., Qi L., Shi Y., Gao Y.: Generalizable semantic segmentation via model-agnostic learning and target-specific normalization. arXiv:2003.12296 (2020)
- 37. Zhou X.Y., Yang G.Z.: Normalization in training U-Net for 2-D biomedical semantic segmentation. IEEE Robotics and Automation Letters 4(2), 1792–1799 (2019)
- 38. Zou Y., Yu Z., Liu X., Kumar B., Wang J.: Confidence regularized self-training. In: ICCV. pp. 5982–5991 (2019)



