Published in final edited form as: Med Image Comput Comput Assist Interv. 2023 Oct 1;14223:194–205. doi: 10.1007/978-3-031-43901-8_19

ACTION++: Improving Semi-supervised Medical Image Segmentation with Adaptive Anatomical Contrast

Chenyu You 1, Weicheng Dai 2, Yifei Min 4, Lawrence Staib 1,2,3, Jas Sekhon 4,5, James S Duncan 1,2,3,4

Abstract

Medical data often exhibit long-tail distributions with heavy class imbalance, which naturally makes the minority classes (i.e., boundary regions or rare objects) difficult to classify. Recent work has significantly improved semi-supervised medical image segmentation in long-tailed scenarios by equipping models with unsupervised contrastive criteria. However, it remains unclear how well these methods perform on the labeled portion of the data, where the class distribution is also highly imbalanced. In this work, we present ACTION++, an improved contrastive learning framework with adaptive anatomical contrast for semi-supervised medical segmentation. Specifically, we propose an adaptive supervised contrastive loss, where we first compute the optimal locations of class centers uniformly distributed on the embedding space (i.e., off-line), and then perform online contrastive matching training by encouraging different class features to adaptively match these distinct and uniformly distributed class centers. Moreover, we argue that blindly adopting a constant temperature τ in the contrastive loss on long-tailed medical data is not optimal, and propose to use a dynamic τ via a simple cosine schedule to yield better separation between majority and minority classes. Empirically, we evaluate ACTION++ on the ACDC and LA benchmarks and show that it achieves state-of-the-art performance in two semi-supervised settings. Theoretically, we analyze the performance of adaptive anatomical contrast and confirm its superiority in label efficiency.

Keywords: Semi-Supervised Learning, Contrastive Learning, Imbalanced Learning, Long-tailed Medical Image Segmentation

1. Introduction

With the recent development of semi-supervised learning (SSL) [3], rapid progress has been made in medical image segmentation, which typically learns rich anatomical representations from a small amount of labeled data and a vast amount of unlabeled data. Existing SSL approaches can be broadly categorized into adversarial training [32,39,16,38], deep co-training [23,43], mean-teacher schemes [27,42,14,13,15,7,41,34], multi-task learning [19,11,22,37,35], and contrastive learning [2,29,40,33,24,36].

Contrastive learning (CL) has become a remarkable approach for enhancing semi-supervised medical image segmentation without significantly increasing the number of parameters or annotation costs [2,29,36]. In real-world clinical scenarios, since the classes in medical images follow a Zipfian distribution [44], medical datasets usually show a long-tailed, even heavy-tailed class distribution, i.e., some minority (tail) classes have significantly fewer pixel-level training instances than the majority (head) classes, as illustrated in Fig. 1. Such imbalanced scenarios are usually very challenging for CL methods to address, leading to noticeable performance drops [18].

Fig. 1. Examples of two benchmarks (i.e., ACDC and LA) with imbalanced class distribution. From left to right: input image, ground-truth segmentation map, class distribution chart, and training data feature distribution for multiple classes.

To address long-tailed medical segmentation, our motivation comes from the following two perspectives on CL training schemes [2,36]: ➊ Training objective – existing approaches mainly focus on designing proper unsupervised contrastive losses for learning high-quality representations for long-tailed medical segmentation. While extensively explored on the unlabeled portion of long-tailed medical data, supervised CL has rarely been studied from either an empirical or a theoretical perspective, which is one of the focuses of this work; ➋ Temperature scheduler – the temperature parameter τ, which controls the strength of attraction and repulsion forces in the contrastive loss [5,4], has been shown to play a crucial role in learning useful representations. Prior work shows that a large τ emphasizes anatomically meaningful group-wise patterns via group-level discrimination, whereas a small τ enforces a higher degree of pixel-level (instance) discrimination [28,25]. On the other hand, as shown in [25], group-wise discrimination often reduces a model's instance discrimination capability, biasing the model toward "easy" features instead of "hard" features. It is thus unfavorable for long-tailed medical segmentation to blindly treat τ as a constant hyperparameter, and a dynamic temperature parameter for CL is worth investigating.

In this paper, we introduce ACTION++, which further optimizes anatomically group-level and pixel-level representations for better head and tail class separation, on both labeled and unlabeled medical data. Specifically, we devise two strategies to improve overall segmentation quality, focusing on the two aforementioned perspectives: (1) we propose supervised adaptive anatomical contrastive learning (SAACL) for long-tailed medical segmentation. To prevent the feature space from being biased toward the dominant head classes, we first pre-compute the optimal locations of class centers uniformly distributed on the embedding space (i.e., off-line), and then perform online contrastive matching training by encouraging different class features to adaptively match these distinct and uniformly distributed class centers; (2) we find that blindly adopting a constant temperature τ in the contrastive loss can negatively impact segmentation performance. Inspired by an average-distance-maximization perspective, we leverage a dynamic τ via a simple cosine schedule, resulting in significant improvements in the learned representations. Both of these enable the model to learn a balanced feature space that has similar separability for the majority (head) and minority (tail) classes, leading to better generalization on long-tailed medical data. We evaluate ACTION++ on the public ACDC and LA datasets [1,31]. Extensive experimental results show that ACTION++ outperforms prior methods by a significant margin and sets the new state-of-the-art in two semi-supervised settings. We also theoretically show the superiority of our method in label efficiency (Appendix A). Our code is publicly released.

2. Method

2.1. Overview

Problem Statement

Given a medical image dataset (X, Y), our goal is to train a segmentation model F that provides accurate predictions, assigning each pixel to its corresponding class among the K segmentation classes.

Setup

Figure 2 illustrates an overview of ACTION++. By default, we build this work upon the ACTION pipeline [36], the state-of-the-art CL framework for semi-supervised medical image segmentation. The backbone adopts a student-teacher framework in which the two networks share the same architecture, and the teacher's parameters are the exponential moving average (EMA) of the student's parameters. Hereinafter, we adopt their model as our backbone and briefly summarize its major components: (1) global contrastive distillation pre-training; (2) local contrastive distillation pre-training; and (3) anatomical contrast fine-tuning.
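As a minimal sketch, the EMA teacher update can be written as follows (the momentum value is our assumption, not taken from the paper):

```python
import torch

@torch.no_grad()
def ema_update(teacher, student, momentum=0.99):
    # Teacher weights track the student as an exponential moving average;
    # the momentum value here is illustrative only.
    for p_t, p_s in zip(teacher.parameters(), student.parameters()):
        p_t.mul_(momentum).add_(p_s, alpha=1.0 - momentum)
```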

Fig. 2. Overview of ACTION++: (1) global and local pre-training with the proposed anatomical-aware temperature scheduler; (2) the proposed adaptive anatomical contrast fine-tuning, which first pre-computes the optimal locations of class centers uniformly distributed on the embedding space (i.e., off-line), and then performs online contrastive matching training by encouraging different class features to adaptively match these distinct and uniformly distributed class centers with respect to anatomical features.

Global and Local Pre-training

[36] first creates two types of anatomical views as follows: (1) augmented views: $x_1$ and $x_2$ are augmented from the unlabeled input scan with two separate data augmentation operators; (2) mined views: $n$ samples (i.e., $x_3$) are randomly sampled from the unlabeled portion with additional augmentation. The pair $x_1, x_2$ is then processed by the student-teacher networks $F_s, F_t$ that share the same architecture, and similarly, $x_3$ is encoded by $F_t$. Their global latent features after the encoder $E$ (i.e., $h_1, h_2, h_3$) and local output features after the decoder $D$ (i.e., $f_1, f_2, f_3$) are encoded by two-layer nonlinear projectors, generating global and local embeddings $v^g$ and $v^l$. The embeddings $v$ from $F_s$ are separately encoded by a nonlinear predictor, producing $w$ in both global and local manners¹. Third, the relational similarities between augmented and mined views are processed by the softmax function as follows:

$u_s = \log \frac{\exp(\mathrm{sim}(w_1, v_3)/\tau_s)}{\sum_{n=1}^{N} \exp(\mathrm{sim}(w_1, v_3^n)/\tau_s)}, \qquad u_t = \log \frac{\exp(\mathrm{sim}(w_2, v_3)/\tau_t)}{\sum_{n=1}^{N} \exp(\mathrm{sim}(w_2, v_3^n)/\tau_t)},$

where $\tau_s$ and $\tau_t$ are two temperature parameters. Finally, we minimize the unsupervised instance discrimination loss (i.e., the Kullback-Leibler divergence $\mathcal{KL}$) as:

$\mathcal{L}_{\mathrm{inst}} = \mathcal{KL}(u_s \,\|\, u_t).$ (1)

We formally summarize the pre-training objective as the equal combination of the global and local $\mathcal{L}_{\mathrm{inst}}$ and the supervised segmentation loss $\mathcal{L}_{\mathrm{sup}}$ (i.e., an equal combination of Dice and cross-entropy losses).
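For concreteness, the following is a minimal PyTorch sketch of the softmax step and Eq. (1), assuming that L2-normalization followed by a dot product realizes sim(·,·); the function name, batch layout, and default temperatures are our assumptions, not the released implementation:

```python
import torch
import torch.nn.functional as F

def instance_discrimination_loss(w1, w2, v3, tau_s=0.1, tau_t=0.07):
    # w1, w2: (B, D) student predictor outputs for the two augmented views;
    # v3: (N, D) teacher embeddings of the N mined views.
    w1, w2, v3 = F.normalize(w1, dim=-1), F.normalize(w2, dim=-1), F.normalize(v3, dim=-1)
    u_s = F.log_softmax(w1 @ v3.T / tau_s, dim=-1)  # (B, N) student-side distribution
    u_t = F.log_softmax(w2 @ v3.T / tau_t, dim=-1)  # (B, N) teacher-side distribution
    # KL(u_s || u_t) as in Eq. (1)
    return F.kl_div(u_t, u_s, log_target=True, reduction="batchmean")
```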

Anatomical Contrast Fine-tuning

The underlying motivation for the fine-tuning stage is to reduce the vulnerability of the pre-trained model to long-tailed unlabeled data. To mitigate this problem, [36] proposed to fine-tune the model by anatomical contrast. First, an additional representation head $\varphi$ is used to provide dense representations with the same size as the input scans. Then, [36] pulls queries $r_q$ toward the positive keys $r_k^{c,+}$ and pushes them apart from the negative keys $r_k^-$. The AnCo loss is defined as follows:

$\mathcal{L}_{\mathrm{anco}} = \sum_{c \in \mathcal{C}} \sum_{r_q \sim \mathcal{R}_q^c} -\log \dfrac{\exp(r_q \cdot r_k^{c,+}/\tau_{\mathrm{an}})}{\exp(r_q \cdot r_k^{c,+}/\tau_{\mathrm{an}}) + \sum_{r_k^- \sim \mathcal{R}_k^c} \exp(r_q \cdot r_k^-/\tau_{\mathrm{an}})},$ (2)

where $\mathcal{C}$ denotes the set of all classes present in the current mini-batch and $\tau_{\mathrm{an}}$ is a temperature hyperparameter. For class $c$, we select a query representation set $\mathcal{R}_q^c$, a negative key representation set $\mathcal{R}_k^c$ whose labels are not in class $c$, and the positive key $r_k^{c,+}$, which is the class-$c$ mean representation. Given $\mathcal{P}$, the set of all pixel coordinates with the same size as $R$, these queries and keys are defined as: $\mathcal{R}_q^c = \bigcup_{[i,j] \in \mathcal{P}} \mathbb{1}\big(y_{[i,j]} = c\big)\, r_{[i,j]}$, $\mathcal{R}_k^c = \bigcup_{[i,j] \in \mathcal{P}} \mathbb{1}\big(y_{[i,j]} \neq c\big)\, r_{[i,j]}$, and $r_k^{c,+} = \frac{1}{|\mathcal{R}_q^c|} \sum_{r_q \in \mathcal{R}_q^c} r_q$. We formally summarize the fine-tuning objective as the equal combination of the unsupervised $\mathcal{L}_{\mathrm{anco}}$, the unsupervised cross-entropy loss $\mathcal{L}_{\mathrm{unsup}}$, and the supervised segmentation loss $\mathcal{L}_{\mathrm{sup}}$. For more details, we refer the reader to [36].
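A minimal PyTorch sketch of the AnCo loss in Eq. (2), assuming flattened, L2-normalized pixel representations; the default temperature and the omission of query/key subsampling are our simplifications, not the released implementation:

```python
import torch
import torch.nn.functional as F

def anco_loss(r, y, tau_an=0.5):
    # r: (P, D) L2-normalized pixel representations from the representation head;
    # y: (P,) (pseudo-)labels for the same pixels.
    loss, n_classes = 0.0, 0
    for c in y.unique():
        mask = y == c
        if (~mask).sum() < 1:
            continue  # no negative keys available for this class
        r_q = r[mask]                                              # queries of class c
        r_k_pos = F.normalize(r_q.mean(0, keepdim=True), dim=-1)   # class-c mean key
        r_k_neg = r[~mask]                                         # keys from other classes
        pos = torch.exp((r_q @ r_k_pos.T).squeeze(-1) / tau_an)    # (Q,)
        neg = torch.exp(r_q @ r_k_neg.T / tau_an).sum(-1)          # (Q,)
        loss = loss - torch.log(pos / (pos + neg)).mean()
        n_classes += 1
    return loss / max(n_classes, 1)
```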

2.2. Supervised Adaptive Anatomical Contrastive Learning

The general efficacy of anatomical contrast on long-tailed unlabeled data was previously demonstrated by the authors of [36]. However, taking a closer look, we observe that the well-trained $F$ still shows a downward trend in performance and often fails to classify tail classes on the labeled data, especially when the data shows a long-tailed class distribution. This indicates that the segmentation capability of $F$ on long-tailed labeled data still needs to be improved. To this end, inspired by [17], which is tailored to image classification tasks, we introduce supervised adaptive anatomical contrastive learning (SAACL), a training framework for generating well-separated and uniformly distributed latent feature representations for both the head and tail classes. It consists of three main steps, which we describe in the following.

Anatomical Center Pre-computation

We first pre-compute the anatomical class centers in the latent representation space. The optimal class centers are chosen as $K$ positions on the unit sphere $S^{d-1} = \{v \in \mathbb{R}^d : \|v\|_2 = 1\}$ of the $d$-dimensional space. To encourage good separability and uniformity, we compute the class centers $\{\psi_c\}_{c=1}^{K}$ by minimizing the following uniformity loss $\mathcal{L}_{\mathrm{unif}}$:

$\mathcal{L}_{\mathrm{unif}}\big(\{\psi_c\}_{c=1}^{K}\big) = \sum_{c=1}^{K} \log \sum_{c'=1}^{K} \exp(\psi_c \cdot \psi_{c'}/\tau).$ (3)

In our implementation, we use gradient descent to search for the optimal class centers constrained to the unit sphere $S^{d-1}$, which we denote by $\{\psi_c^\star\}_{c=1}^{K}$. Furthermore, the latent dimension $d$ is a hyperparameter, which we set such that $d \geq K$ to ensure that the solution found by gradient descent indeed maximizes the minimum distance between any two class centers [6]. It is also known that any analytical minimizer of Eq. (3) forms a perfectly regular $K$-vertex simplex inscribed in the sphere $S^{d-1}$ [6]. We emphasize that this first step of pre-computing class centers is completely off-line, as it does not require any training data.
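A sketch of this off-line center search, with assumed optimization hyperparameters (step count, learning rate, and τ):

```python
import torch
import torch.nn.functional as F

def precompute_class_centers(K, d=128, tau=0.1, steps=2000, lr=0.1, seed=0):
    # Off-line search for K well-separated centers on S^{d-1} by minimizing
    # the uniformity loss of Eq. (3). Requires d >= K so that the regular
    # simplex solution is reachable; no training data is involved.
    torch.manual_seed(seed)
    psi = torch.randn(K, d, requires_grad=True)
    opt = torch.optim.SGD([psi], lr=lr)
    for _ in range(steps):
        p = F.normalize(psi, dim=-1)                          # constrain to the unit sphere
        loss = torch.logsumexp(p @ p.T / tau, dim=-1).sum()   # Eq. (3)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return F.normalize(psi.detach(), dim=-1)                  # (K, d) fixed class centers
```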

Adaptive Allocation

As the second step, we adaptively allocate these centers among classes. This is a combinatorial optimization problem, and an exhaustive search over all choices would be computationally prohibitive. Therefore, we draw intuition from the empirical mean in the K-means algorithm and adopt an adaptive allocation scheme to iteratively search for the optimal allocation during training. Specifically, consider a batch $\mathcal{B} = \mathcal{B}_1 \cup \cdots \cup \mathcal{B}_K$, where $\mathcal{B}_c$ denotes the set of samples in the batch with class label $c$, for $c = 1, \ldots, K$. Define $\phi_c(\mathcal{B}) = \sum_{i \in \mathcal{B}_c} \phi_i / \big\|\sum_{i \in \mathcal{B}_c} \phi_i\big\|_2$ to be the (normalized) empirical mean of class $c$ in the current batch, where $\phi_i$ is the feature embedding of sample $i$. We compute the assignment $\pi$ by minimizing the distance between the pre-computed class centers and the empirical means:

$\pi^\star = \arg\min_{\pi} \sum_{c=1}^{K} \big\|\psi_{\pi(c)}^\star - \phi_c\big\|^2.$ (4)

In implementation, the empirical mean is updated with a moving average: at iteration $t$, we first compute the empirical mean $\phi_c(\mathcal{B})$ for the current batch $\mathcal{B}$ as described above, and then update $\phi_c \leftarrow (1-\eta)\,\phi_c + \eta\,\phi_c(\mathcal{B})$.
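Since both the centers and the empirical means are unit-norm, Eq. (4) is a standard assignment problem. Below is a sketch using the Hungarian solver (an assumed but natural choice; the paper does not name its solver) together with the moving-average update, with η assumed:

```python
import torch
import torch.nn.functional as F
from scipy.optimize import linear_sum_assignment

def allocate_centers(centers, class_means):
    # Eq. (4): find the permutation pi minimizing sum_c ||psi*_{pi(c)} - phi_c||^2.
    # centers, class_means: (K, d), both L2-normalized.
    cost = torch.cdist(class_means, centers).detach() ** 2  # (K, K) squared distances
    _, cols = linear_sum_assignment(cost.cpu().numpy())
    return torch.as_tensor(cols, dtype=torch.long)          # cols[c] = pi(c)

def update_class_means(class_means, batch_means, eta=0.01):
    # Moving-average update of the per-class empirical means,
    # renormalized to stay on the unit sphere.
    new = (1.0 - eta) * class_means + eta * batch_means
    return F.normalize(new, dim=-1)
```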

Adaptive Anatomical Contrast

Finally, the allocated class centers are well-separated and should maintain the semantic relations between classes. To utilize these optimal class centers, we induce the feature representations of samples from each class to cluster around the corresponding pre-computed class center. To this end, we adopt a supervised contrastive loss on the labeled portion of the data. Specifically, given a batch of pixel-feature-label tuples $\{(\omega_i, \phi_i, y_i)\}_{i=1}^{n}$, where $\omega_i$ is the $i$-th pixel in the batch, $\phi_i$ is the feature of the pixel, and $y_i$ is its label, we define the supervised adaptive anatomical contrastive loss as:

$\mathcal{L}_{\mathrm{aaco}} = -\dfrac{1}{n} \sum_{i=1}^{n} \Bigg( \sum_{\phi_i^+} \log \dfrac{\exp(\phi_i \cdot \phi_i^+/\tau_{\mathrm{sa}})}{\sum_{\phi_j} \exp(\phi_i \cdot \phi_j/\tau_{\mathrm{sa}})} + \lambda_a \log \dfrac{\exp(\phi_i \cdot \nu_i/\tau_{\mathrm{sa}})}{\sum_{\phi_j} \exp(\phi_i \cdot \phi_j/\tau_{\mathrm{sa}})} \Bigg),$ (5)

where $\nu_i = \psi_{\pi^\star(y_i)}^\star$ is the pre-computed center of class $y_i$. The first term in Eq. (5) is a supervised contrastive loss, where the summation over $\phi_i^+$ runs over positive examples uniformly sampled from pixels in the batch whose label equals $y_i$, and the summation over $\phi_j$ runs over all features in the batch excluding $\phi_i$. The second term is a contrastive loss whose positive example is the pre-computed optimal class center.
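A minimal sketch of Eq. (5), assuming the feature dimension of $\phi$ equals the center dimension $d$, and averaging over all in-batch positives rather than sampling them uniformly (a simplification on our part):

```python
import torch
import torch.nn.functional as F

def aaco_loss(phi, y, centers, assignment, tau_sa=0.1, lambda_a=0.2):
    # phi: (n, D) pixel features; y: (n,) labels; centers: (K, d) pre-computed
    # centers from Eq. (3); assignment: (K,) long tensor from Eq. (4).
    n = phi.size(0)
    phi = F.normalize(phi, dim=-1)
    eye = torch.eye(n, dtype=torch.bool, device=phi.device)
    logits = (phi @ phi.T / tau_sa).masked_fill(eye, float("-inf"))  # exclude phi_i itself
    log_prob = F.log_softmax(logits, dim=-1).masked_fill(eye, 0.0)   # avoid 0 * (-inf)
    pos_mask = (y[:, None] == y[None, :]) & ~eye                     # positives: same label
    sup = -(pos_mask * log_prob).sum(-1) / pos_mask.sum(-1).clamp(min=1)
    nu = centers[assignment[y]]                                      # nu_i = psi*_{pi(y_i)}
    center_term = (phi * nu).sum(-1) / tau_sa - torch.logsumexp(logits, dim=-1)
    return (sup - lambda_a * center_term).mean()
```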

2.3. Anatomical-aware Temperature Scheduler (ATS)

Training with a varying τ induces a more isotropic representation space, wherein the model learns both group-wise and instance-specific features [12]. We are thus inspired to use an anatomical-aware temperature scheduler in both the supervised and unsupervised contrastive losses, where the temperature parameter τ evolves within the range $[\tau^-, \tau^+]$, with $\tau^+ > \tau^-$. Specifically, for iteration $t = 1, \ldots, T$, with $T$ the total number of iterations, we set $\tau_t$ as:

$\tau_t = \tau^- + \tfrac{1}{2}\big(1 + \cos(2\pi t/T)\big)\,(\tau^+ - \tau^-).$ (6)
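Eq. (6) amounts to a few lines; the boundary values below follow the settings reported in Sec. 3. The schedule starts at $\tau^+$, reaches $\tau^-$ at $t = T/2$, and returns to $\tau^+$ at $t = T$:

```python
import math

def anatomical_aware_tau(t, T, tau_minus=0.1, tau_plus=1.0):
    # Cosine temperature schedule of Eq. (6).
    return tau_minus + 0.5 * (1.0 + math.cos(2.0 * math.pi * t / T)) * (tau_plus - tau_minus)
```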

3. Experiments

Experimental Setup

We evaluate ACTION++ on two benchmark datasets: the LA dataset [31] and the ACDC dataset [1]. The LA dataset consists of 100 gadolinium-enhanced MRI scans, with a fixed split [29] using 80 and 20 scans for training and validation, respectively. The ACDC dataset consists of 200 cardiac cine MRI scans from 100 patients with three segmentation classes, i.e., left ventricle (LV), myocardium (Myo), and right ventricle (RV), with a fixed split using 70, 10, and 20 patients' scans for training, validation, and testing, respectively. For all our experiments, we follow the identical setting in [42,19,30,29] and perform evaluations under two label settings (i.e., 5% and 10%) for both datasets.

Implementation Details

We use an SGD optimizer for all experiments with a learning rate of 1e-2, a momentum of 0.9, and a weight decay of 0.0001. Following [42,19,30,29], on both datasets all inputs are normalized to zero mean and unit variance, and the data augmentations are rotation and flip operations. Our work is built on ACTION [36]; we therefore follow its model settings except for the temperature parameters, which are of direct interest to us. For the sake of completeness, we refer the reader to [36] for more details. We set $\lambda_a = 0.2$ and $d = 128$, and for all τ we use $\tau^+ = 1.0$ and $\tau^- = 0.1$ if not stated otherwise. On ACDC, we use the U-Net model [26] as the backbone with a 2D patch size of 256 × 256 and a batch size of 8; the networks are trained for 10K iterations during pre-training and 20K iterations during fine-tuning. On LA, we use V-Net [21] as the backbone; we randomly crop 112 × 112 × 80 patches with a batch size of 2, and train for 5K iterations during pre-training and 15K iterations during fine-tuning. For testing, we adopt a sliding-window strategy with a fixed stride (18 × 18 × 4). All experiments are conducted in the same environment with fixed random seeds (hardware: a single NVIDIA GeForce RTX 3090 GPU; software: PyTorch 1.10.2+cu113 and Python 3.8.11).
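For quick reference, the stated setup can be collected into a configuration sketch (the dictionary layout and key names are ours, not the released configuration files):

```python
# Training setup as stated above; purely illustrative.
CONFIG = {
    "optimizer": {"name": "SGD", "lr": 1e-2, "momentum": 0.9, "weight_decay": 1e-4},
    "temperature": {"tau_minus": 0.1, "tau_plus": 1.0},
    "lambda_a": 0.2,
    "latent_dim_d": 128,
    "ACDC": {"backbone": "U-Net", "patch_size": (256, 256), "batch_size": 8,
             "pretrain_iters": 10_000, "finetune_iters": 20_000},
    "LA": {"backbone": "V-Net", "patch_size": (112, 112, 80), "batch_size": 2,
           "pretrain_iters": 5_000, "finetune_iters": 15_000,
           "test_stride": (18, 18, 4)},
}
```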

Main Results

We compare ACTION++ with current state-of-the-art SSL methods, including UAMT [42], SASSNet [16], DTC [19], URPC [20], MC-Net [30], SS-Net [29], and ACTION [36], as well as the supervised counterparts (UNet [26]/VNet [21]) trained with full/limited supervision, using their released code. To evaluate 3D segmentation quality, we use the Dice coefficient (DSC) and Average Surface Distance (ASD). Table 2 and Table 1 display the results on the public ACDC and LA datasets under the two label settings, respectively. Our main findings are as follows. (1) LA: As shown in Table 1, our method generally presents better performance than prior SSL methods under all settings. Fig. 4 (Appendix) also shows that our model consistently outperforms all other competitors, especially in boundary regions. (2) ACDC: As Table 2 shows, ACTION++ achieves the best segmentation performance in terms of Dice and ASD, consistently outperforming previous SSL methods across the two label settings. In Fig. 3 (Appendix), we observe that ACTION++ delineates segmentation boundaries accurately, even for very challenging regions (i.e., RV and Myo). This suggests that ACTION++ is inherently better at long-tailed learning, in addition to being a better segmentation model in general.

Table 2.

Quantitative comparison (DSC[%]/ASD[voxel]) for ACDC under two label settings (5% and 10% labeled). All experiments are conducted in the identical setting of [42,16,19,20,30,29,36] for fair comparison. The best results are indicated in bold.

| Method | 3 Labeled (5%): Average | RV | Myo | LV | 7 Labeled (10%): Average | RV | Myo | LV |
|---|---|---|---|---|---|---|---|---|
| UNet-F [26] | 91.5/0.996 | 90.5/0.606 | 88.8/0.941 | 94.4/1.44 | 91.5/0.996 | 90.5/0.606 | 88.8/0.941 | 94.4/1.44 |
| UNet-L | 51.7/13.1 | 36.9/30.1 | 54.9/4.27 | 63.4/5.11 | 79.5/2.73 | 65.9/0.892 | 82.9/2.70 | 89.6/4.60 |
| UAMT [42] | 48.3/9.14 | 37.6/18.9 | 50.1/4.27 | 57.3/4.17 | 81.8/4.04 | 79.9/2.73 | 80.1/3.32 | 85.4/6.07 |
| SASSNet [16] | 57.8/6.36 | 47.9/11.7 | 59.7/4.51 | 65.8/2.87 | 84.7/1.83 | 81.8/0.769 | 82.9/1.73 | 89.4/2.99 |
| URPC [20] | 58.9/8.14 | 50.1/12.6 | 60.8/4.10 | 65.8/7.71 | 83.1/1.68 | 77.0/0.742 | 82.2/0.505 | 90.1/3.79 |
| DTC [19] | 56.9/7.59 | 35.1/9.17 | 62.9/6.01 | 72.7/7.59 | 84.3/4.04 | 83.8/3.72 | 83.5/4.63 | 85.6/3.77 |
| MC-Net [30] | 62.8/2.59 | 52.7/5.14 | 62.6/0.807 | 73.1/1.81 | 86.5/1.89 | 85.1/0.745 | 84.0/2.12 | 90.3/2.81 |
| SS-Net [29] | 65.8/2.28 | 57.5/3.91 | 65.7/2.02 | 74.2/0.896 | 86.8/1.40 | 85.4/1.19 | 84.3/1.44 | 90.6/1.57 |
| ACTION [36] | 87.5/1.12 | 85.4/0.915 | 85.8/0.784 | 91.2/1.66 | 89.7/0.736 | 89.8/0.589 | 86.7/0.813 | 92.7/0.804 |
| ACTION++ (ours) | 88.5/0.723 | 86.9/0.662 | 86.8/0.689 | 91.9/0.818 | 90.4/0.592 | 90.5/0.448 | 87.5/0.628 | 93.1/0.700 |

Table 1.

Quantitative comparison (DSC[%]/ASD[voxel]) for LA under two label settings (5% and 10% labeled). All experiments are conducted in the identical setting of [42,16,19,20,30,29,36] for fair comparison. The best results are indicated in bold. VNet-F (fully supervised) and VNet-L (trained with limited labels) serve as the upper and lower bounds for the performance comparison.

| Method | 4 Labeled (5%): DSC[%]↑ | ASD[voxel]↓ | 8 Labeled (10%): DSC[%]↑ | ASD[voxel]↓ |
|---|---|---|---|---|
| VNet-F [21] | 91.5 | 1.51 | 91.5 | 1.51 |
| VNet-L | 52.6 | 9.87 | 82.7 | 3.26 |
| UAMT [42] | 82.3 | 3.82 | 87.8 | 2.12 |
| SASSNet [16] | 81.6 | 3.58 | 87.5 | 2.59 |
| DTC [19] | 81.3 | 2.70 | 87.5 | 2.36 |
| URPC [20] | 82.5 | 3.65 | 86.9 | 2.28 |
| MC-Net [30] | 83.6 | 2.70 | 87.6 | 1.82 |
| SS-Net [29] | 86.3 | 2.31 | 88.6 | 1.90 |
| ACTION [36] | 86.6 | 2.24 | 88.7 | 2.10 |
| ACTION++ (ours) | 87.8 | 2.09 | 89.9 | 1.74 |

Ablation Study

We first perform ablation studies on LA with a 10% label ratio to evaluate the importance of the different components. Table 3 shows the effectiveness of supervised adaptive anatomical contrastive learning (SAACL). Table 4 (Appendix) indicates that using the anatomical-aware temperature scheduler (ATS) together with SAACL yields better performance in both the pre-training and fine-tuning stages. We then theoretically show the superiority of our method in Appendix A.

Table 3.

Ablation studies of Supervised Adaptive Anatomical Contrast (SAACL).

| Method | DSC[%]↑ | ASD[voxel]↓ |
|---|---|---|
| KCL [9] | 88.4 | 2.19 |
| CB-KCL [10] | 86.9 | 2.47 |
| SAACL (ours) | 89.9 | 1.74 |
| SAACL (random assign) | 88.0 | 2.79 |
| SAACL (adaptive allocation) | 89.9 | 1.74 |

Finally, we conduct experiments to study the effects of the cosine boundaries, the cosine period, different methods of varying τ, and $\lambda_a$ in Table 5 and Table 6 (Appendix). Empirically, we find that our settings (i.e., $\tau^- = 0.1$, $\tau^+ = 1.0$, $T/\#\mathrm{iterations} = 1.0$, cosine scheduler, $\lambda_a = 0.2$) attain the best performance.

4. Conclusion

In this paper, we proposed ACTION++, an improved contrastive learning framework with adaptive anatomical contrast for semi-supervised medical segmentation. Our work is inspired by two intriguing observations: (1) besides the unlabeled data, the class imbalance issue also exists in the labeled portion of medical data; and (2) temperature schedules are effective for contrastive learning on long-tailed medical data. Extensive experiments and ablations demonstrated that our model consistently achieves superior performance compared to prior semi-supervised medical image segmentation methods under different label ratios. Our theoretical analysis also revealed the robustness of our method in label efficiency. In future work, we will validate our method on CT/MRI datasets with more foreground labels and analyze the learned representations with t-SNE.

Supplementary Material


Table 4.

Effect of the cosine boundaries $\tau^-$ and $\tau^+$ (DSC[%]); the best result is obtained with the largest difference between $\tau^-$ and $\tau^+$.

| $\tau^-$ \ $\tau^+$ | 0.2 | 0.3 | 0.4 | 0.5 | 1.0 |
|---|---|---|---|---|---|
| 0.07 | 84.1 | 85.0 | 86.9 | 87.9 | 89.7 |
| 0.1 | 84.5 | 85.9 | 87.1 | 88.3 | 89.9 |
| 0.2 | 84.2 | 84.4 | 85.8 | 87.1 | 87.6 |

Footnotes

1. For simplicity, we omit the details of local instance discrimination in the following.

References

1. Bernard O, Lalande A, Zotti C, Cervenansky F, Yang X, Heng PA, Cetin I, Lekadir K, Camara O, Ballester MAG, et al.: Deep learning techniques for automatic MRI cardiac multi-structures segmentation and diagnosis: Is the problem solved? IEEE Transactions on Medical Imaging (2018)
2. Chaitanya K, Erdil E, Karani N, Konukoglu E: Contrastive learning of global and local features for medical image segmentation with limited annotations. In: NeurIPS (2020)
3. Chapelle O, Scholkopf B, Zien A (eds.): Semi-Supervised Learning. MIT Press (2006)
4. Chen T, Kornblith S, Norouzi M, Hinton G: A simple framework for contrastive learning of visual representations. In: ICML. pp. 1597–1607. PMLR (2020)
5. Chen X, Fan H, Girshick R, He K: Improved baselines with momentum contrastive learning. arXiv preprint arXiv:2003.04297 (2020)
6. Graf F, Hofer C, Niethammer M, Kwitt R: Dissecting supervised contrastive learning. In: ICML. PMLR (2021)
7. He Y, Lin F, Tzeng NF, et al.: Interpretable minority synthesis for imbalanced classification. In: IJCAI (2021)
8. Huang W, Yi M, Zhao X: Towards the generalization of contrastive self-supervised learning. arXiv preprint arXiv:2111.00743 (2021)
9. Kang B, Li Y, Xie S, Yuan Z, Feng J: Exploring balanced feature spaces for representation learning. In: ICLR (2021)
10. Kang B, Xie S, Rohrbach M, Yan Z, Gordo A, Feng J, Kalantidis Y: Decoupling representation and classifier for long-tailed recognition. arXiv preprint arXiv:1910.09217 (2019)
11. Kervadec H, Dolz J, Granger É, Ben Ayed I: Curriculum semi-supervised segmentation. In: MICCAI. Springer (2019)
12. Kukleva A, Böhle M, Schiele B, Kuehne H, Rupprecht C: Temperature schedules for self-supervised contrastive methods on long-tail data. In: ICLR (2023)
13. Lai Z, Wang C, Cheung SC, Chuah CN: SaR: Self-adaptive refinement on pseudo labels for multiclass-imbalanced semi-supervised learning. In: CVPR. pp. 4091–4100 (2022)
14. Lai Z, Wang C, Gunawan H, Cheung SCS, Chuah CN: Smoothed adaptive weighting for imbalanced semi-supervised learning: Improve reliability against unknown distribution data. In: ICML. pp. 11828–11843 (2022)
15. Lai Z, Wang C, Oliveira LC, Dugger BN, Cheung SC, Chuah CN: Joint semi-supervised and active learning for segmentation of gigapixel pathology images with cost-effective labeling. In: ICCV. pp. 591–600 (2021)
16. Li S, Zhang C, He X: Shape-aware semi-supervised 3D semantic segmentation for medical images. In: MICCAI. pp. 552–561. Springer (2020)
17. Li T, Cao P, Yuan Y, Fan L, Yang Y, Feris RS, Indyk P, Katabi D: Targeted supervised contrastive learning for long-tailed recognition. In: CVPR (2022)
18. Li Z, Kamnitsas K, Glocker B: Analyzing overfitting under class imbalance in neural networks for image segmentation. IEEE Transactions on Medical Imaging (2020)
19. Luo X, Chen J, Song T, Wang G: Semi-supervised medical image segmentation through dual-task consistency. In: AAAI (2020)
20. Luo X, Liao W, Chen J, Song T, Chen Y, Zhang S, Chen N, Wang G, Zhang S: Efficient semi-supervised gross target volume of nasopharyngeal carcinoma segmentation via uncertainty rectified pyramid consistency. In: MICCAI. Springer (2021)
21. Milletari F, Navab N, Ahmadi SA: V-Net: Fully convolutional neural networks for volumetric medical image segmentation. In: 3DV. pp. 565–571. IEEE (2016)
22. Oliveira LC, Lai Z, Siefkes HM, Chuah CN: Generalizable semi-supervised learning strategies for multiple learning tasks using 1-D biomedical signals. In: NeurIPS Workshop on Learning from Time Series for Health (2022)
23. Qiao S, Shen W, Zhang Z, Wang B, Yuille A: Deep co-training for semi-supervised image recognition. In: ECCV (2018)
24. Quan Q, Yao Q, Li J, Zhou SK: Information-guided pixel augmentation for pixel-wise contrastive learning. arXiv preprint arXiv:2211.07118 (2022)
25. Robinson J, Sun L, Yu K, Batmanghelich K, Jegelka S, Sra S: Can contrastive learning avoid shortcut solutions? In: NeurIPS (2021)
26. Ronneberger O, Fischer P, Brox T: U-Net: Convolutional networks for biomedical image segmentation. In: MICCAI. Springer (2015)
27. Tarvainen A, Valpola H: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. In: NeurIPS. pp. 1195–1204 (2017)
28. Wang F, Liu H: Understanding the behaviour of contrastive loss. In: CVPR (2021)
29. Wu Y, Wu Z, Wu Q, Ge Z, Cai J: Exploring smoothness and class-separation for semi-supervised medical image segmentation. In: MICCAI (2022)
30. Wu Y, Xu M, Ge Z, Cai J, Zhang L: Semi-supervised left atrium segmentation with mutual consistency training. In: MICCAI (2021)
31. Xiong Z, Xia Q, Hu Z, Huang N, Bian C, Zheng Y, Vesal S, Ravikumar N, Maier A, Yang X, et al.: A global benchmark of algorithms for segmenting the left atrium from late gadolinium-enhanced cardiac magnetic resonance imaging. Medical Image Analysis (2021)
32. Xue Y, Xu T, Zhang H, Long LR, Huang X: SegAN: Adversarial network with multi-scale L1 loss for medical image segmentation. Neuroinformatics (2018)
33. You C, Dai W, Liu F, Su H, Zhang X, Staib L, Duncan JS: Mine your own anatomy: Revisiting medical image segmentation with extremely limited labels. arXiv preprint arXiv:2209.13476 (2022)
34. You C, Dai W, Min Y, Liu F, Zhang X, Clifton DA, Zhou SK, Staib LH, Duncan JS: Rethinking semi-supervised medical image segmentation: A variance-reduction perspective. arXiv preprint arXiv:2302.01735 (2023)
35. You C, Dai W, Min Y, Staib L, Duncan JS: Implicit anatomical rendering for medical image segmentation with stochastic experts. arXiv preprint arXiv:2304.03209 (2023)
36. You C, Dai W, Staib L, Duncan JS: Bootstrapping semi-supervised medical image segmentation with anatomical-aware contrastive distillation. In: IPMI (2023)
37. You C, Xiang J, Su K, Zhang X, Dong S, Onofrey J, Staib L, Duncan JS: Incremental learning meets transfer learning: Application to multi-site prostate MRI segmentation. In: International Workshop on Distributed, Collaborative, and Federated Learning (2022)
38. You C, Yang J, Chapiro J, Duncan JS: Unsupervised Wasserstein distance guided domain adaptation for 3D multi-domain liver segmentation. In: Interpretable and Annotation-Efficient Learning for Medical Image Computing. pp. 155–163. Springer (2020)
39. You C, Zhao R, Liu F, Dong S, Chinchali S, Topcu U, Staib L, Duncan J: Class-aware adversarial transformers for medical image segmentation. In: NeurIPS (2022)
40. You C, Zhao R, Staib LH, Duncan JS: Momentum contrastive voxel-wise representation learning for semi-supervised volumetric medical image segmentation. In: MICCAI (2022)
41. You C, Zhou Y, Zhao R, Staib L, Duncan JS: SimCVD: Simple contrastive voxel-wise representation distillation for semi-supervised medical image segmentation. IEEE Transactions on Medical Imaging (2022)
42. Yu L, Wang S, Li X, Fu CW, Heng PA: Uncertainty-aware self-ensembling model for semi-supervised 3D left atrium segmentation. In: MICCAI (2019)
43. Zhou Y, Wang Y, Tang P, Bai S, Shen W, Fishman E, Yuille A: Semi-supervised 3D abdominal multi-organ segmentation via deep multi-planar co-training. In: WACV. IEEE (2019)
44. Zipf GK: The Psycho-Biology of Language: An Introduction to Dynamic Philology. Routledge (2013)
