Improving Skin Lesion Segmentation with Self-Training

Aleksandra Dzieniszewska; Piotr Garbat; Ryszard Piramidowicz

doi:10.3390/cancers16061120

. 2024 Mar 11;16(6):1120. doi: 10.3390/cancers16061120

Improving Skin Lesion Segmentation with Self-Training

Aleksandra Dzieniszewska ^1,^*, Piotr Garbat ¹, Ryszard Piramidowicz ¹

Editors: Arianna Mencattini¹, Helder C R De Oliveira¹

PMCID: PMC10969337 PMID: 38539454

Abstract

Simple Summary

Finding the area of a skin lesion on dermoscopy images is important for diagnosing skin conditions. The accuracy of segmentation impacts the overall diagnosis. The quality of segmentation depends on the amount of labeled data that is hard to obtain because it requires a lot of time from experts. This study introduces a technique that enhances the segmentation process by using a combination of expert-generated and computer-generated labels. The method uses a trained model to generate labels for new data that are later used to improve the model. The findings suggest that this approach could make skin cancer detection tools more accurate and efficient, potentially making a big difference in the medical field, especially in situations where high-quality data are limited.

Abstract

Skin lesion segmentation plays a key role in the diagnosis of skin cancer; it can be a component in both traditional algorithms and end-to-end approaches. The quality of segmentation directly impacts the accuracy of classification; however, attaining optimal segmentation necessitates a substantial amount of labeled data. Semi-supervised learning allows for employing unlabeled data to enhance the results of the machine learning model. In the case of medical image segmentation, acquiring detailed annotation is time-consuming and costly and requires skilled individuals so the utilization of unlabeled data allows for a significant mitigation of manual segmentation efforts. This study proposes a novel approach to semi-supervised skin lesion segmentation using self-training with a Noisy Student. This approach allows for utilizing large amounts of available unlabeled images. It consists of four steps—first, training the teacher model on labeled data only, then generating pseudo-labels with the teacher model, training the student model on both labeled and pseudo-labeled data, and lastly, training the student* model on pseudo-labels generated with the student model. In this work, we implemented DeepLabV3 architecture as both teacher and student models. As a final result, we achieved a mIoU of 88.0% on the ISIC 2018 dataset and a mIoU of 87.54% on the PH2 dataset. The evaluation of the proposed approach shows that Noisy Student training improves the segmentation performance of neural networks in a skin lesion segmentation task while using only small amounts of labeled data.

Keywords: deep learning, semi-supervised learning, skin lesion segmentation, skin cancer, dermoscopy images

1. Introduction

Melanoma is the seventieth most common cancer worldwide and one of the most common cancers among young adults [1]. Although it is one of the deadliest kinds of skin cancer [2], it might be completely cured if detected early. Melanoma mortality rates are highly correlated with the state of cancer at the moment of diagnosis. Statistics show that a 5-year relative survival rate for people diagnosed in the localized stage reaches 99%, while a diagnosis in the distant stage results in a significant drop in the survival rate down to 30% [3]. Therefore, the monitoring and early diagnosis of skin lesions are crucial in preventing cancer diseases.

The majority of currently deployed solutions are focused on training complex systems toward end-use tasks such as predicting a diagnosis. These solutions have many advantages, including the computational efficiency and ease of optimization. However, excellent performance of complex models requires sufficient training data, which is often challenging in medical applications. At the same time, the performance of segmentation models has been shown to improve logarithmically with the amount of training data [4]. It is helpful to incorporate prior knowledge into the training in medical image analysis tasks with small datasets and heterogeneous case distributions. One example of this type of approach is the use of segmentation masks. This treatment reduces the complexity of understanding images by machines by extracting representative features from lesions, leading to improved diagnostic efficiency [5].

The research results presented indicate a positive correlation between enhanced segmentation quality and improved classification accuracy [6,7,8]. Our other work has demonstrated that using a segmentation mask for skin lesion classification enhances classification accuracy, and the quality of the used segmentation mask directly influences classification results [9].

A commonly used method to recognize melanoma assumes checking the ABCDE criteria. This approach considers asymmetry, border features, color, diameter, and skin lesion evolution to differentiate benign from malignant skin lesions. Approaches easily understandable by humans are commonly implemented in computer-aided diagnosis systems. They provide interpretability and explainability that end-to-end classifiers do not deliver. Doctors require computer analysis methods to not only give correct diagnoses but also explain terms on what such decision was made. The segmentation mask can provide information about the boundary and symmetry of a skin lesion. Moreover, automatic segmentation is an important preprocessing step in many medical use cases as it shows the area of interest for a further analysis. Thus, accurate skin lesion segmentation is crucial in an automated diagnosis. However, variations in shape and size, irregular lesion boundaries, and low contrast differences between the lesion and the skin make developing automated segmentation methods nontrivial.

Deep learning-based methods enable achieving state-of-the-art results in multiple medical image segmentation tasks. However, they require considerable amounts of annotated training samples, collecting which is time-consuming and costly as skilled individuals are required to label images. We propose using a semi-supervised learning technique to employ images without binary masks to improve neural network performance in a skin lesion segmentation task.

We explore the possibility of implementing self-training with Noisy Student [10] in medical image segmentation. Noisy Student training is a semi-supervised learning technique utilizing labeled and unlabeled data. It was first applied to semantic segmentation by Y. Zhu et al. [11]. It consists of three main steps. First, the teacher model is trained on a small set of labeled data, i.e., real labels. Second, labels are predicted for unlabeled data and the student model is trained using generated pseudo-labels and real labels. Third, new labels are predicted with the student model and the new student model (further referred to as student*) is trained. The self-training approach is possible in the case of skin lesion segmentation because there is an available quite large dataset of dermatoscopic skin lesion images without segmentation masks, compared to other medical image datasets. We decided to use the same architecture for the teacher and student model because there are much less public data than in a typical ImageNet classification task where millions of images are available.

Firstly, the best architecture for the teacher model was selected. Architectures like U-Net [12], U-Net++ [13], and DeepLabV3 [14] were tested. The best-performing model (DeepLabV3) was used as a teacher. The second step was to train the student model on labeled data. Then, the best student was set as a new teacher. It should be noted that the optimal ratio of labeled and unlabeled data in the training dataset was found and the further increase in the number of generated labels led to a performance decrease. Research has also shown that generated labels with better quality result in better performance of a student model. Student* slightly improves segmentation performance but only for a smaller real-to-generated labels ratio.

The evaluation of this approach shows that Noisy Student training improves the segmentation performance of neural networks in a skin lesion segmentation task while using only small amounts of labeled data. We reached state-of-the-art performance on ISIC 2018 and PH2 datasets. The code used for this research is available at https://github.com/Oichii/Improving-skin-lesion-segmentation-with-self-training (accessed on 6 March 2024).

1.1. Related Work

1.1.1. Image Segmentation

Before deep learning grew in popularity, skin lesion segmentation methods were based on traditional image processing techniques such as adaptive thresholding based on a grayscale image histogram [15], iterative active contour adjustments [16], and region growth based on color space quantization [17].

With the rapid development of deep learning, traditional methods were replaced by convolutional neural networks. The first proposed architecture was a fully convolutional network (FCN) [18] followed by U-Net [12]. The success of encoder–decoder-type architectures led to many modifications of U-Net, like U-Net++ [13] or ResU-Net [19]. Also, different backbones were applied to improve segmentation results.

U-Net consists of a contracting path (encoder) and a symmetric expanding path (decoder). The novelty of the architecture is the concatenation of encoder intermediate feature maps with the corresponding feature maps of the decoder, enabling the network to learn context and correct localization simultaneously. The encoder follows the typical convolutional network architecture where each layer halves the input size and doubles the number of features. Each decoder layer doubles the image size and halves the number of feature channels. It is also concatenated with the appropriate feature map from the encoder. At the final layer, each feature vector is mapped to the selected number of classes [12]. DeepLab is also an encoder–decoder architecture. It utilizes dilated convolution and Atrous Spatial Pyramid Pooling (ASPP). The encoder is a convolutional network that replaces standard convolution with dilated convolution to overcome localization invariance caused by pooling operations [14]. The dilated convolution also allows the use of pre-trained weights in the encoder. DeepLab also addresses the issue of segmenting objects of varying scales through the ASPP module, which uses convolution with multiple filter sizes and dilation rates to capture multi-scale features. This approach is inspired by pyramid pooling, which showed that resampled convolutional features extracted at a single scale could correctly classify regions of any scale [20].

A specially crafted loss function combined with general-purpose architectures was applied in skin lesion segmentation. For example, a loss function based on Jaccard distance is proposed to overcome the re-weighting need [21].

Some architectures were explicitly proposed for skin lesion segmentation. A Dermoscopic Skin Network (DSNet) uses depth-wise separable convolution to eliminate the need to learn redundant features by reducing the number of parameters [22].

Wang et al. [23] proposed a boundary-aware transformer that can effectively model global long-range dependencies and capture local features by fully utilizing boundary prior knowledge provided by a boundary-wise attention gate (BAG). Because it provides detailed spatial information, BAG’s auxiliary supervision can assist transformers in learning position embedding.

Tang et al. [24] proposed the Dual-Aggregation Transformer Network (DuAT), which combines Global-to-Local Spatial Aggregation (GLSA), which, in turn, aggregates both global and local spatial features and is useful for locating objects with various scales, and Selective Boundary Aggregation (SBA), which accumulates low-level boundary characteristics and high-level semantic information for a better object localization and preservation of borders [24].

Bagheri et al. [25] proposed an ensemble of neural networks that uses a graph-based method to combine segmentation results of Mask R-CNN and Retina-Deeplab.

Input image preprocessing is also important in skin lesion image segmentation. Some recent approaches also showed that preprocessing of an input image, such as transformation to polar coordinates using the centroid or center of the object found using another method, increases skin lesion segmentation performance [26]. The proposed preprocessing method includes a hair removal technique using a black top hat filter to create a hair mask combined with an image inpainting technique to restore a clean skin image [27].

1.1.2. Semi-Supervised Learning

Semi-supervised learning aims to train a model using both labeled and unlabeled data that is better than a supervised model trained on labeled data only [28]. The labeled portion of the data is usually smaller than the unlabeled portion, thus representing the most common real-life scenario. This is especially the case with medical imaging, where data collection and labeling need to be performed by a qualified doctor. The preparation of detailed masks for image segmentation is also time-consuming, even more so than for classification tasks. Consequently, segmentation benefits more from methods that allow for using only a small amount of labeled images. Semi-supervised learning can be used with both handcrafted features and deep learning-based classifiers.

You et al. [29] proposed an approach that uses self-training combined with an SVM classifier based on radial projection to segment retinal blood vessels. Portela, Cavalcanti, and Ren [30] used clustering to label voxel clusters combined with Gaussian mixture models to label the remaining pixels of a brain MR scan.

Bai et al. [31] developed an iterative semi-supervised framework for cardiac MR image segmentation where in each iteration, pseudo-labels for unlabeled images are generated by the network and refined by a conditional random field [32]. The model is then updated using generated pseudo-labels.

Adversarial learning can also incorporate unlabeled data in semi-supervised image segmentation. Zhang et al. [33] implemented two networks, one that segments images and a second that distinguishes between segmentation results of labeled and unlabeled images. In the adversarial training process, the segmentation network learns to produce similar results on both types of data.

Li et al. [34] proposed self-loop uncertainty, which involves optimizing a neural network with a self-supervised task to generate pseudo-labels, which are then used as ground truth for unlabeled images to enhance the training set and improve segmentation accuracy. This approach is a fast alternative to ensembling multiple models to estimate uncertainty as it reduces inference time.

This work is based on the Noisy Student training method introduced by Xie et al. in [10] as an extension to pseudo-labeling [35] and self-training [36]. To improve model performance, it uses unlabeled images with pseudo-labels generated by a model trained on limited labeled data. In other words, it uses the model’s own confident predictions to create more training data by producing labels for unlabeled data [28]. Image augmentations and model size also play an essential role in this approach. The student model is no smaller than the teacher to better capture the complexity of a larger dataset, and random image augmentations lead to a better generalization of the student model. It was successfully used in the segmentation task in [37] where it improved the score on PASCAL VOC 2012 and Cityscapes datasets. We found no previous use of Noisy Student training in skin lesion segmentation.

Our approach is different from other comparable solutions in the following aspects: We used deep learning instead of clustering and SVM, as proposed by You et al. [29] and Portela et al. [30]. Differently from Bai et al. [31], our pseudo-labels are generated once at the beginning of iteration and do not change during iteration. Compared to Zhang et al. [33], self-training models do not influence each other directly as in adversarial training. In our case, only pseudo-labels generated by a model are used in the next steps. We have a batch that contains labeled and unlabeled data, and we use a pseudo-label for unlabeled data. The solution proposed by Li et al. [34] also uses a batch that contains labeled and unlabeled data, but their solution uses a self-supervised subtask of image permutations for unlabeled data.

2. Materials and Methods

This section introduces a semi-supervised self-training framework with Noisy Student for skin lesion image segmentation. Teacher–student training, as employed in the study, is described in Figure 1. Our goal is to combine a limited set of labeled data and a large amount of unlabeled data to increase the accuracy and robustness of lesion segmentation. Such an approach allows for reducing human effort on labeling. Questions we want to answer in the study are (1) will self-training with Noisy Student enhance the segmentation of skin lesion images?; (2) what is the largest amount of unlabeled data that we can use to enhance the performance?; (3) what is the best combination of augmentations to use for input noise?

Self-training with Noisy Student for skin lesion image segmentation. (a) Train teacher model on labeled data and generate pseudo-labels. (b) Train the student model on labeled and pseudo-labeled data and generate new pseudo-labels. (c) Train the student* model on labeled and new pseudo-labeled data. Blue arrows in the Figure represent the model training process, and red arrows describe pseudo-labels flow.

The input to the algorithm is both labeled and unlabeled images. Training a teacher model using solely labeled data is the first stage. Then, the trained teacher model predicts segmentation masks (pseudo-labels) for unlabeled images. Images with corresponding generated masks and images with real labels define a new training dataset. The teacher model generates an output image with probabilities for each pixel to belong to either the background or a skin lesion. The generated mask is then thresholded with a threshold of 0.5 to create a binary mask.

Images with generated masks that were empty or only had a small number of pixels are excluded from the dataset since they can have a negative impact on the training. In other words, the predicted confidence of skin lesion pixels was low, so the image was removed from the training dataset to increase self-training efficiency. The student model is trained to minimize the loss on both labeled and unlabeled data while validation uses only images with real labels. Finally, we run the second iteration of the training in which we select the student model with the highest IoU on the validation dataset for a new teacher. It is then employed to generate new pseudo-labels for unlabeled data. New masks are used to train a new student model (student*). The same model architecture is used for both student and teacher models so it will have enough capacity to learn from a larger dataset while preserving generalization capabilities. For student model training, we use dropout with $p = 0.5$ as model noise and image augmentations that include random flips, rotation, and hue shift as input noise.

2.1. Model Architectures

In the study, we used model architectures with a notable position in the literature as we want to focus our research on designing a scalable training approach rather than on deep learning network architecture. We want to separate the influence of Noisy Student training and application-specific model adjustments. We tested four model architectures: U-Net, U-Net++, DeepLabV3, and DeepLabV3+.

U-Net was first proposed for medical image segmentation. It consists of an encoder and decoder in a U-shaped architecture. It also implements skip connections between corresponding encoder and decoder blocks, which enhance segmentation performance.

U-Net++ is an extension of the original U-Net architecture that was proposed to address some of the limitations of U-Net, particularly its limited capacity to capture complex patterns and its tendency to produce coarse segmentation results. U-Net++ takes advantage of the semantic similarity between the encoder’s and decoder’s feature maps by introducing dense skip connections. These skip connections are designed to connect each encoder layer to every layer of the corresponding decoder block. The connections also include a dense convolution block, which helps increase the network’s capacity and capture more complex patterns.

DeepLabV3 is also a commonly used solution in medical applications. It performs atrous convolution with multiple rates to capture image features at multiple scales. Model architecture is presented in Figure 2. DeepLabV3+ enhances the segmentation of object boundaries compared to DeepLabV3 by incorporating an improved decoder module.

2.2. Data

2.2.1. Labeled Dataset

To train the baseline (teacher) model, freely accessible dermatoscopy image datasets released by the International Skin Imaging Collaboration (ISIC) were used. The combined datasets released from 2016 to 2018 contain 3074 image and segmentation mask pairs. ISIC 2016 and ISIC 2017 [38] validation and training subsets and ISIC 2018 [39,40] training subsets combined were used for model training and validation in the 4:1 ratio. The test subsets of ISIC 2016 and ISIC 2017 as well as the validation subset of ISIC 2018 were used to test the model. Acquired subsets were checked to ensure no overlap between the data. In total, there are 1572 training, 523 validation, and 979 testing mask and image pairs.

For the final evaluation, we used the PH2 dataset that contains 200 dermoscopic images with segmentation masks.

2.2.2. Unlabeled Dataset

Unlabeled images were also obtained from freely accessible dermatoscopy image datasets, i.e., ISIC 2020 [41,42] and ISIC 2019 [43] datasets. Those datasets combined provide almost 60k skin lesion images without segmentation masks. The number of samples in each of the datasets is presented in Figure 3; for the training dataset, we used a maximal number of pseudo-labels in the study, i.e., for ratio $m = 8$ . Images from the labeled datasets were filtered out from the unlabeled data. Then, the model trained on labeled data was run on the images to predict segmentation masks. Masks with a pixel size of less than 100 were screened. This was performed because the masks containing only a few pixels are less accurate or contain errors. The examples of rejected images are shown in Figure 4.

Number of real and pseudo-labeled samples in train, validation, and test datasets.

Examples of images for which masks were not created or were too small and thus not included in the dataset.

2.3. Implementation Details

We implemented the method described above using the PyTorch framework [44]. Encoder models were initialized with weights pre-trained on ImageNet, and decoder weights were initialized randomly. Pre-trained weights are used due to their beneficial influence on skin lesion segmentation performance, as it was shown in [45]. Images and masks were resized to the resolution of 256 × 256 pixels, and values were scaled to the range of $[0, 1]$ . We used a batch size of $b s = 10$ by default and reduced it when we could not fit the model into the memory.

For training, a stochastic gradient descent optimizer (SGD) [46] was used with an initial learning rate of $l r = 0.002$ and cosine annealing learning rate scheduler [47]; momentum was set to $β = 0.54$ and weight decay was set to $w d = 0.01$ .

We trained each DeepLabV3 with a ResNet18 backbone for 60 epochs and DeepLabV3 with ResNet34 for 90 epochs or until IoU on the validation dataset no longer decreased.

As a loss function, dice loss presented in Equation (1) was used, where $λ = 1$ is the smoothing parameter. It provides better results in terms of both the IoU and dice coefficient compared to the weighted cross-entropy function.

L_{D i c e} = 1 - \frac{2 | X ⋂ Y | + λ}{| X | + | Y | + λ}

(1)

2.4. Evaluation Metrics

Model performance was evaluated with the dice coefficient (Dice), presented in Equation (2), and Intersection over Union (IoU), also known as the Jaccard index, presented in Equation (3), where X is the predicted mask and Y is the ground truth mask. The Jaccard index is used to quantify the overlap area between the true and predicted lesion masks, and the dice coefficient is used to assess the similarity between real and predicted masks.

D i c e = \frac{2 | X ⋂ Y |}{| X | + | Y |}

(2)

I o U = \frac{| X ⋂ Y |}{| X ⋃ Y |}

(3)

In addition, precision (Prec) and recall (Rec) were calculated in a pixel-wise manner as in Equations (4) and (5), where $T P$ is the number of true positives, $F P$ is the number of false positives, and $F N$ is the number of false negatives. Precision is used to measure the number of pixels that belong to the lesion region that are correctly classified. Recall calculates the number of pixels outside of the lesion area that are incorrectly classified.

P r e c = \frac{T P}{T P + F P}

(4)

R e c = \frac{T P}{T P + F N}

(5)

All evaluation metrics take values in the range $[0, 1]$ and the higher values correspond to better results. Following the approach from work [27], the percentage of images with IoU over 0.8 was calculated and a visual inspection of the worst and best examples was performed. It is due to the expert dermatologists’ agreement that only skin lesion segmentation with IoU over $78.6 %$ is helpful and useful for medical purposes [40]. Also, segmentation with the Jaccard index equal to $0.8$ or above is, in general, visually correct [38].

3. Results

In this section, we describe the details of our experiments. Then, we report the method’s performance on ISIC 2018 and PH2 datasets as they are the most benchmarked and freely accessible skin lesion segmentation datasets. Finally, we compare our method with the state-of-the-art skin lesion segmentation models described in the literature.

3.1. Architecture Selection for Teacher Model

We compared U-Net, U-Net++, DeepLabV3, and DeepLabV3+ architectures to determine the optimal architecture for skin lesion segmentation, and the best one was selected for further experiments. All the models were tested with a ResNet34 backbone. Results are shown in Table 1.

Table 1.

Comparison of model architectures for skin lesion segmentation.

	IoU	Dice	Precision	Recall
U-Net	0.6941	0.7396	0.8758	0.6460
U-Net++	0.8137	0.8978	0.8902	0.8202
DeepLabV3	0.8205	0.8822	0.9170	0.8080
DeepLabV3+	0.8166	0.8843	0.9140	0.7915

	mIoU	Dice	Precision	Recall
teacher	0.8659	0.9194	0.8781	0.9308
m = 1	0.8789	0.9355	0.9030	0.9303
m = 2	0.8713	0.9215	0.8690	0.9482
m = 4	0.8657	0.9245	0.8760	0.9382
m = 8	0.8714	0.9292	0.8808	0.9396

	mIoU	Dice	Precision	Recall
teacher	0.8568	0.9133	0.8408	0.9596
m = 1	0.8647	0.9173	0.8498	0.9561
m = 2	0.8601	0.9159	0.8622	0.9389
m = 4	0.8567	0.9180	0.8368	0.9579
m = 8	0.8593	0.9124	0.8448	0.9571

	mIoU	Dice	Precision	Recall
teacher	0.8649	0.9273	0.8962	0.9385
m = 1	0.8740	0.9382	0.8969	0.9503
m = 2	0.8723	0.9333	0.8816	0.9652
m = 4	0.8578	0.9224	0.8590	0.9671
m = 8	0.8734	0.9346	0.8877	0.9590

	mIoU	Dice	Precision	Recall
teacher	0.8574	0.9192	0.8502	0.9760
m = 1	0.8554	0.9181	0.8459	0.9782
m = 2	0.8575	0.9200	0.8475	0.9791
m = 4	0.8603	0.9222	0.8484	0.9811
m = 8	0.8537	0.9135	0.8380	0.9825

	Teacher	Student	m
simple augmentations	0.8475	0.8540	4
coarse dropout	0.8360	0.8501	2
elastic transform	0.8169	0.8299	1
grid distortion	0.8499	0.8532	1
optical distortion	0.8519	0.8550	2

	mIoU	Dice	Precision	Recall
teacher	0.8512	0.9226	0.8451	0.9820
m = 1	0.8586	0.9253	0.8536	0.9804
m = 2	0.8550	0.9222	0.8454	0.9843
m = 4	0.8546	0.9217	0.8473	0.9806
m = 8	0.8484	0.9227	0.8425	0.9826

	mIoU	Dice	Precision	Recall
teacher	0.8598	0.9209	0.8404	0.9710
m = 1	0.8559	0.9207	0.8470	0.9603
m = 2	0.8681	0.9220	0.8596	0.9590
m = 4	0.8657	0.9245	0.8760	0.9382
m = 8	0.8439	0.9104	0.8367	0.9477

Method	mIoU	Dice	Ref.
DuAT	0.867	0.923	[24]
BAT	0.843	0.912	[23]
DoubleU-Net	0.821	0.896	[49]
Polar	0.874 *	0.925	[26]
Ours (ResNet18)	0.880	0.937	[this work]
Ours (ResNet34)	0.869	0.919	[this work]

ISIC	The International Skin Imaging Collaboration
FCN	Fully Convolutional Network
ASPP	Atrous Spatial Pyramid Pooling
DSNet	Dermoscopic Skin Network
BAG	Boundary-wise Attention Gate
SGD	Stochastic Gradient Descent
GLSA	Global-to-Local Spatial Aggregation
SBA	Selective Boundary Aggregation
CNN	Convolutional Neural Network
IoU	Intersection over Union

PERMALINK

Improving Skin Lesion Segmentation with Self-Training

Aleksandra Dzieniszewska

Piotr Garbat

Ryszard Piramidowicz

Roles

Abstract

Simple Summary

Abstract

1. Introduction

1.1. Related Work

1.1.1. Image Segmentation

1.1.2. Semi-Supervised Learning

2. Materials and Methods

Figure 1.

2.1. Model Architectures

Figure 2.

2.2. Data

2.2.1. Labeled Dataset

2.2.2. Unlabeled Dataset

Figure 3.

Figure 4.

2.3. Implementation Details

2.4. Evaluation Metrics

3. Results

3.1. Architecture Selection for Teacher Model

Table 1.

3.2. Real- to Pseudo-Label Ratio

Table 2.

Table 3.

Table 4.

Table 5.

Figure 5.

3.3. Input Noise

Figure 6.

Table 6.

Figure 7.

Table 7.

Table 8.

3.4. Second Iteration of Student Training

Figure 8.

Figure 9.

Figure 10.

Figure 11.

Figure 12.

Table 9.

Table 10.

Table 11.

Table 12.

Table 13.

Table 14.

4. Discussion

Figure 13.

4.1. Qualitative Analysis

Figure 14.

Figure 15.

4.2. Comparison with State of the Art

Table 15.

4.3. The Robustness of the Model

5. Conclusions

Abbreviations

Author Contributions

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

Funding Statement

Footnotes

References

Associated Data

Data Availability Statement

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases