Journal of Healthcare Engineering
. 2022 Mar 28;2022:4246239. doi: 10.1155/2022/4246239

Augmentation-Consistent Clustering Network for Diabetic Retinopathy Grading with Fewer Annotations

Guanghua Zhang 1, Keran Li 2, Zhixian Chen 1, Li Sun 3,4, Jianwei Zhang 5, Xueping Pan 6
PMCID: PMC8979701  PMID: 35388319

Abstract

Diabetic retinopathy (DR) is currently one of the severe complications leading to blindness, and computer-aided diagnosis technology-assisted DR grading has become a popular research trend, especially with the development of deep learning methods. However, most deep learning-based DR grading models require a large number of annotations to provide data guidance, and it is laborious for experts to find subtle lesion areas in fundus images, making accurate annotation more expensive than in other vision tasks. In contrast, large-scale unlabeled data are easily accessible, becoming a potential solution to reduce the annotating workload in DR grading. Thus, this paper explores the internal correlations of unknown fundus images, assisted by limited labeled fundus images, to solve the semisupervised DR grading problem and proposes an augmentation-consistent clustering network (ACCN) to address the above-mentioned challenges. Specifically, the augmentation provides an efficient cue for the similarity information of unlabeled fundus images, assisting the supervision from the labeled data. By mining the consistent correlations between augmented and raw images, the ACCN can discover subtle lesion features by clustering with fewer annotations. Experiments on the Messidor and APTOS 2019 datasets show that the ACCN surpasses many state-of-the-art methods in a semisupervised manner.

1. Introduction

Diabetic retinopathy (DR) is one of the most prevalent complications caused by diabetes, which may cause intermittent or even permanent blindness [1–3]. Ophthalmologists often judge the severity of DR based on the features of the disease and the number of lesions, such as the characteristics of microaneurysms, hemorrhages, soft exudates, and hard exudates [4, 5]. As recognized by international authorities [6, 7], the severity of DR can be categorized into five levels: normal, mild, moderate, severe nonproliferative, and proliferative; these can be summarized into two main categories: normal and abnormal, or nonreferable and referable symptoms [7–9]. If the retina remains in the pathological state of DR for a long time, the blood vessels in the eye will gradually become blocked, eventually leading to decreased vision and even blindness. Therefore, it is essential to detect DR early and grade its severity in patients, because correct and timely treatment at an early stage can largely prevent the deterioration of the disease.

In clinical diagnosis, DR detection mainly relies on the careful comparison of colorful fundus images by ophthalmologists. Recently, as the number of diabetic patients has increased yearly, the number of subjects to be screened has become vast, placing a significant burden on ophthalmologists and DR experts, who spend much time observing fundus images. Therefore, it is necessary to develop computer-aided diagnosis models to reduce the workload and inspection time for ophthalmologists and experts, achieving real-time DR diagnosis for patients.

To solve automatic DR grading, early attempts [10–13] were inclined toward exploiting traditional machine learning methods on manual features, limited by specific feature extraction skills and experience. Aiming at this weakness, deep learning has become a popular solution for DR grading with many successful applications [14, 15] because it can automatically learn critical features from fundus images under the supervision of accurate annotations. However, these models often depend on a large number of labeled fundus images, whose discriminant information only occurs in subtle blood vessels. DR grading annotators must master professional medical knowledge and manually find key features to decide the actual DR severity, which is a highly time-consuming workload. Thus, high-quality labeled data are scarce, making supervised DR grading models hard to accomplish.

To save the expensive annotating work in real applications, this paper attempts to solve automatic DR grading in a semisupervised manner by integrating unlabeled data into the training stage, because clinical inspection can produce many unlabeled fundus images containing important potential information. Thus, the most crucial task of this paper is to train a robust DR-grading model from massive unlabeled data assisted by fewer annotations, as shown in Figure 1. Extracting more identity information from unlabeled fundus images becomes a top priority, and the data consistency of unlabeled data is vital for feature learning in our work [16–19]. Inspired by previous works, we make more efforts to mine consistent correlations between raw fundus images and their augmentations, which preserve the same discriminative information while undergoing image transformations such as geometric transformation, color space augmentation, random erasing, generative adversarial networks, and neural style transfer.

Figure 1.

Figure 1

Analysis diagram of our semisupervised DR-grading solution.

In this paper, we propose an augmentation-consistent clustering network (ACCN) to alleviate the laborious annotating workload in clinical application, which straightforwardly mines the consistent inner correlations among fundus image augmentations and dynamically conducts weight clustering to utilize the abundant unlabeled data while requiring fewer annotated fundus images. As the discriminant cues indicating DR grades are subtle in fundus images, the augmentations of raw images can help the ACCN spread the information from annotated data to unlabeled images. Besides, an online memory unit is introduced to dynamically update the clustering centroids, guaranteeing global consistency between labeled and unlabeled fundus images when exploring critical information.

The main contributions of this article are summarized as follows:

  1. We propose a brand-new, highly robust semisupervised framework (ACCN) to solve the DR grading problem, inspired by the consistent discriminative correlations between labeled and unlabeled fundus images with different augmentations.

  2. We design a reasonable weight-clustering algorithm that benefits from an online memory unit to dynamically update the clustering centroids with global consistency, generating high-quality pseudolabels for unlabeled images and integrating annotated fundus images to explore discriminative information for DR grading.

  3. We conducted experiments on the public data sets Messidor and APTOS 2019, and the results show that the ACCN is superior to many state-of-the-art DR grading methods.

2. Related Work

This section summarizes recent works on the diabetic retinopathy grading problem and introduces the successful computer-aided diagnosing applications of semisupervised learning.

2.1. Diabetic Retinopathy Grading

With the continuous development of deep learning, its application to retinal images has also achieved great success, and some new research has recently been proposed [20–23]. For example, Sambyal et al. [20] proposed an aggregated residual transformation-based model for automatic multistage classification of diabetic retinopathy. Bhardwaj et al. [21] developed a hierarchical severity-level grading system to detect and classify DR ailments. Bodapati et al. [22] presented a hybrid deep neural network architecture with a gated attention mechanism for automated diagnosis of diabetic retinopathy. Math et al. [23] designed a segment-based learning approach for diabetic retinopathy detection, which mutually learns classifiers and features from the data and achieves significant progress in diabetic retinopathy recognition.

However, the methods mentioned above require a large amount of labeling information. Medical labeling is well known to be expensive and time-consuming, which many institutions cannot afford. This significantly constrains the transferability of these developed DR grading systems.

2.2. Semisupervised Learning in Medical Image Classification

In recent years, medical imaging technology has been extensively developed for clinical applications [24–26]. In medical image analysis, annotation is often difficult to obtain because it is expensive and labor-intensive. Semisupervised learning has provided great help in relieving the pressure of labeling to a certain extent, and some studies have successfully applied the semisupervised framework to medical image analysis [27–31]. Wang et al. [27] incorporated virtual adversarial training on both labeled and unlabeled data into the course of training, self-training, and consistency regularization to effectively exploit useful information from unlabeled data. Calderon et al. [28] explored the impact of using unlabeled data through a recent approach known as MixMatch for mammogram images. Pang et al. [29] developed a radiomics model based on a semisupervised GAN method to perform data augmentation in breast ultrasound images. Liu et al. [30] proposed a self-supervised mean teacher for chest X-ray classification that combines self-supervised mean-teacher pretraining with semisupervised fine-tuning. Bakalo et al. [31] designed a deep learning architecture for multiclass classification and localization of abnormalities in medical imaging, illustrated through experiments on mammograms.

In this paper, we propose a novel augmentation-consistent clustering network (ACCN) for semisupervised diabetic retinopathy grading on fundus images, exploring the discriminative information learned from plentiful unlabeled data and fewer annotated fundus images.

3. Method

Aiming to explore the discriminant information from massive unlabeled fundus images, we design a novel semisupervised DR grading approach, the augmentation-consistent clustering network (ACCN), to assist the supervised model trained by fewer annotated data. The ACCN utilizes consistent learning and weight clustering on easily accessible unlabeled data with the help of fewer annotations to achieve the semisupervised diabetic retinopathy grading task. In detail, the ACCN first considers the category correlations among unlabeled fundus images, maintaining consistency with different augmentations. Then the trained model from annotated fundus images is utilized as the baseline network, and the ACCN deploys a clustering algorithm to weight their CNN features to calculate the pseudolabels for unlabeled images. Finally, we utilize the real annotations and pseudoannotations to train the network parameters. The whole workflow for the ACCN is illustrated in Figure 2, and the symbols are summarized in Table 1.

Figure 2.

Figure 2

Scheme of the augmentation-consistent clustering network. First, different augmentations for annotated and unlabeled fundus images are generated in a weak and a strong manner, respectively, and consistent feature learning is conducted to train a robust feature extractor. Then, the unlabeled feature representations are fed into a weight-clustering unit to assign pseudolabels with dynamically updating memory in model training. Finally, the pseudolabels and corresponding unlabeled retinal images are utilized to optimize the whole network for solving the DR grading task with fewer annotations.

Table 1.

The symbol summary.

Symbol Meaning
x_i^l The i-th annotated retinal image
x_j^u The j-th unlabeled retinal image
A_weak The weak augmentation
A_strong The collection of strong augmentations
x̃_i^l The weakly augmented image for x_i^l
X̃_j^u The collection of strong augmentations for x_j^u
G The feature extractor
F The classifier
X^l The labeled raw images and their augmentations
X^u The set of unlabeled raw images
𝒳^u The unlabeled raw images and their augmentations
c_k The local centroid for the k-th class
y_j^u The generated pseudolabel
M_k The global centroid

3.1. Augmentation-Consistent Learning

In semisupervised DR grading work, the most crucial task is the exploration of unlabeled retinal images. At the same time, augmentation in deep learning is a popular and easily conducted process to produce various transformations of unlabeled raw fundus images that preserve consistent identity information while remaining close to realistic scenarios [19, 32]. Thus, the ACCN first conducts reasonable augmentations of raw retinal images to generate diverse data with the same category and then employs a convolutional neural network to learn appearance feature representations for the augmented images.

In the ACCN, we adopt augmentation anchoring technology [19, 32], which utilizes the pseudolabels that come from weakly augmented samples as the "anchor" and aligns the strongly augmented samples to this anchor. Notably, the weak augmentation A_weak in our method consists of a random crop followed by a random horizontal flip, and the strong augmentation sequence A_strong = {A_strong^1, A_strong^2, …, A_strong^K} is achieved by RandAugment and a fixed augmentation strategy that contains a sequence of image transformations.

Because the labeled images contain sufficient grading information to find samples in the same category, with no need to generate many more augmented images, we only process the annotated retinal image x_i^l by weak augmentation to produce an "anchor" x̃_i^l:

\tilde{x}_i^l = A_{\text{weak}}(x_i^l), (1)

while the unlabeled fundus image x_j^u should be transformed into an image sequence by strong augmentations to produce more strongly augmented samples, forming sufficient training data in the same category. Thus, we utilize the strong augmentation series to generate their augmentations:

\tilde{X}_j^u = \{ A_{\text{strong}}^k(x_j^u) \}_{k=1}^{K}, (2)

where \tilde{X}_j^u denotes the K strongly augmented unlabeled fundus images produced from A_strong.

Through the above-mentioned augmentations, we can obtain the weakly augmented annotated image x̃_i^l and the strongly augmented unlabeled fundus images X̃_j^u, which are used to supervise the model training, analyzing the images from multiple angles and extracting more critical features.
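As a concrete illustration, the two augmentation branches above can be sketched as follows. This is a minimal NumPy sketch under our own assumptions: the crop size, the stand-in transform set, and the function names are illustrative choices, whereas the actual ACCN uses RandAugment for the strong branch.

```python
import numpy as np

rng = np.random.default_rng(0)

def weak_augment(img, crop=28):
    """Weak augmentation A_weak: a random crop followed by a random horizontal flip."""
    h, w, _ = img.shape
    top = rng.integers(0, h - crop + 1)
    left = rng.integers(0, w - crop + 1)
    patch = img[top:top + crop, left:left + crop]
    if rng.random() < 0.5:
        patch = patch[:, ::-1]  # horizontal flip
    return patch

def strong_augment(img, k=3):
    """Strong augmentation A_strong: K views, each built from a random pair of
    transforms (simple stand-ins for RandAugment operations)."""
    ops = [
        lambda x: np.clip(x * rng.uniform(0.5, 1.5), 0, 255),  # brightness jitter
        lambda x: x[:, ::-1].copy(),                           # horizontal flip
        lambda x: np.rot90(x, rng.integers(1, 4)).copy(),      # 90-degree rotation
    ]
    views = []
    for _ in range(k):
        y = img.copy()
        for op in rng.choice(len(ops), size=2, replace=False):
            y = ops[op](y)
        views.append(y)
    return views  # K strongly augmented views of the same image
```

The weak branch produces a single anchor per labeled image, while the strong branch yields K views of each unlabeled image, matching Eqs. (1) and (2).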

As for feature learning, the ACCN employs the ResNet-50 architecture [33] as the feature extractor for fundus images and their augmentations due to its excellent performance in medical imaging. In particular, the feature extractor is denoted by G for annotated and unlabeled retinal images, and the feature vector G(·) is transformed into a probability vector by a classifier F. Taking a retinal image x as an example, its prediction can be mathematically represented by

P(x) = F(G(x)). (3)

Essentially, the weakly augmented images enlarge the scale of labeled data to compose a labeled set X^l = \{x_1^l, x_2^l, \ldots, x_{N_l}^l\} \cup \{\tilde{x}_1^l, \tilde{x}_2^l, \ldots, \tilde{x}_{N_l}^l\}, training the feature extractor and classifier by a labeled cross-entropy (lce) loss:

L_{lce} = -\sum_{x_i \in X^l} y_i^l \log F(G(x_i; W_G); W_F), (4)

where WG and WF represent the network parameters of the feature extractor and the classifier, respectively.

Similarly, the strong augmentations of unlabeled images produce transformed samples with the same category as the raw images. Thus, we also introduce an augmentation-consistent (ac) loss to enforce that the classifier predicts consistent probability vectors for the correlated augmented and raw fundus images:

L_{ac} = \sum_{x_j \in X^u} \sum_{\tilde{x}_j \in \tilde{X}_j^u} \| P(x_j) - P(\tilde{x}_j) \|, (5)

where X^u = \{x_1^u, x_2^u, \ldots, x_{N_u}^u\} denotes the set of unlabeled retinal images.

Benefiting from the labeled cross-entropy loss L_{lce} and the augmentation-consistent loss L_{ac}, the feature extractor G and classifier F can learn much from the discriminative consistency between augmentations and raw images, especially from the unlabeled retinal images. Hence, the backbone network in the ACCN acquires strong inferential capability for unknown retinal images.
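For clarity, the two losses of this subsection can be sketched numerically. This is a hedged NumPy sketch: the paper does not spell out the exact distance in Eq. (5), so a squared L2 distance is assumed here, and the function names are our own.

```python
import numpy as np

def labeled_ce_loss(probs, labels):
    """L_lce (Eq. (4)): cross-entropy over labeled images and their weak augmentations.
    probs: (N, C) predicted class probabilities; labels: (N,) integer DR grades."""
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))

def augmentation_consistent_loss(p_raw, p_aug):
    """L_ac (Eq. (5)): penalize disagreement between the predictions on raw
    unlabeled images and on their strong augmentations (squared L2 assumed)."""
    return np.mean(np.sum((p_raw - p_aug) ** 2, axis=1))
```

In training, `p_raw` and `p_aug` would both come from the shared classifier F applied to ResNet-50 features, so minimizing `augmentation_consistent_loss` pulls the augmented predictions toward the raw ones.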

3.2. Weight Clustering Unit

Even though consistency information has been extracted from unlabeled images, accurate diabetic retinopathy grading cues are implied in the annotations. In recent years, pseudolabels have become an essential research topic in unlabeled image analysis [34–36]. However, a fully connected classifier F pretrained on the limited labeled data does not possess robust identification ability; it cannot effectively extract the internal associations between the unlabeled feature representations because the augmentation-consistent loss lacks annotation supervision. To address this weakness, the ACCN designs a weight clustering unit to mine the mutual relationships between unknown samples and their pseudolabels.

Specifically, we calculate the estimated centroid ck for each class according to the primary outputs from the trained classifier F:

c_k = \frac{\sum_{x_i^u \in X^u} \delta_k(F(G(x_i^u))) \, G(x_i^u)}{\sum_{x_i^u \in X^u} \delta_k(F(G(x_i^u)))}, (6)

where δ_k corresponds to the k-th element of the softmax output. Then, we calculate the distance between each unlabeled feature and each centroid to generate pseudolabels according to the nearest-neighbor principle:

y_j^u = \arg\min_k d(G(x_j^u), c_k), (7)

where d(·, ·) denotes the Euclidean distance. In this way, we induce the prediction model to focus on samples around the decision boundary and explore more discriminative information through the weight clustering unit.
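Equations (6) and (7) amount to a soft-weighted centroid computation followed by a nearest-centroid assignment, which can be sketched as follows. This is a NumPy sketch with illustrative function names; in the ACCN the features come from G and the weights from the softmax output of F.

```python
import numpy as np

def weighted_centroids(feats, probs):
    """Eq. (6): class centroids as softmax-weighted means of unlabeled features.
    feats: (N, D) features G(x); probs: (N, C) softmax weights delta_k(F(G(x)))."""
    # (C, D) = (C, N) @ (N, D), normalized by the total soft mass of each class
    return (probs.T @ feats) / probs.sum(axis=0)[:, None]

def pseudolabels(feats, centroids):
    """Eq. (7): assign each unlabeled feature to its nearest centroid (Euclidean)."""
    dists = np.linalg.norm(feats[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)
```

Because every sample contributes to every centroid in proportion to its softmax weight, samples near the decision boundary influence several centroids at once, which is the behavior the text describes.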

It should be noted that weight clustering relies on iterative epochs to update the centroids. This means that multiple clustering runs are required, producing different local centroids in each batch, which may cause considerable centroid deviation from wrongly pseudolabeled annotations. To avoid this problem in our ACCN model, we design a dynamic centroid memory \{M_k\}_{k=1}^{N_c} to store the temporary global centroids in each batch, where M_k is the k-th class center and N_c represents the number of image categories. The update strategy for the global centroid is as follows:

M_k = (1 - \eta^{t_k}) M_k + \eta^{t_k} c_k, (8)

where \eta^{t_k} = e^{-t_k} represents the updating rate of grade k and t_k denotes the number of times category k has appeared in previous batches.

Finally, we minimize the distance between the local and global centroids in each batch by a global-consistent (gc) loss:

L_{gc} = \frac{1}{N_c} \sum_{k=1}^{N_c} \| M_k - c_k \|^2. (9)

By enforcing the above-mentioned relationship, we can alleviate the problem that wrongly pseudolabeled samples cannot be correctly distinguished, which also improves the effect of diabetic retinopathy grading.
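The dynamic centroid memory of Eqs. (8) and (9) can be sketched as follows. This is a NumPy sketch under our own assumptions: the class and method names are illustrative, and the updating rate is read as e^{-t_k} so that it stays within [0, 1] and decays as a class is seen more often.

```python
import numpy as np

class CentroidMemory:
    """Dynamic memory {M_k} of global class centroids (Eqs. (8)-(9)).
    t_k counts how many times class k has appeared in previous batches."""

    def __init__(self, num_classes, dim):
        self.M = np.zeros((num_classes, dim))
        self.t = np.zeros(num_classes, dtype=int)

    def update(self, k, c_k):
        eta = np.exp(-self.t[k])                       # eta^{t_k}, assumed e^{-t_k}
        self.M[k] = (1 - eta) * self.M[k] + eta * c_k  # Eq. (8)
        self.t[k] += 1

    def global_consistent_loss(self, local_centroids):
        """Eq. (9): mean squared distance between global and local centroids."""
        diff = self.M - local_centroids
        return np.mean(np.sum(diff ** 2, axis=1))
```

On the first update for a class, eta = e^0 = 1, so the memory adopts the local centroid outright; later batches move the global centroid less and less, damping the deviation caused by wrong pseudolabels.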

Through the weight clustering unit, we can obtain reasonable pseudoannotations for the unlabeled retinal images. This allows us to conduct annotation-level supervised training on the unlabeled fundus data and their strong augmentations 𝒳^u = \{x_1^u, \ldots, x_{N_u}^u\} \cup \{\tilde{X}_1^u, \ldots, \tilde{X}_{N_u}^u\} with their corresponding pseudolabels \{y_1^u, y_2^u, \ldots, y_{N_u}^u\}, according to a pseudo-cross-entropy (pce) loss:

L_{pce} = -\sum_{x_j \in 𝒳^u} y_j^u \log F(G(x_j; W_G); W_F). (10)

3.3. Final Loss for ACCN Model

As described above, our semisupervised diabetic retinopathy grading approach, the ACCN, is composed of two crucial modules, namely, an augmentation-consistent learning module and a weight clustering unit, attached with the labeled cross-entropy loss L_{lce}, the augmentation-consistent loss L_{ac}, the global-consistent loss L_{gc}, and the pseudo-cross-entropy loss L_{pce}.

To update all trainable parameters in the ACCN, we integrate the losses into a final objective with balance parameters:

\min_{W_G, W_F} L = L_{lce} + \gamma_1 L_{ac} + \gamma_2 L_{gc} + \gamma_3 L_{pce}, (11)

where γ1, γ2, and γ3 are parameters to balance different loss functions.
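Combining the four terms is then a single weighted sum. The sketch below uses as defaults the gamma values the paper reports as best-performing in Section 4.2; the function name is our own.

```python
def accn_total_loss(l_lce, l_ac, l_gc, l_pce, gammas=(0.6, 0.3, 0.8)):
    """Eq. (11): the ACCN objective as a weighted sum of the four losses.
    gammas = (gamma_1, gamma_2, gamma_3) balance the unsupervised terms
    against the labeled cross-entropy term."""
    g1, g2, g3 = gammas
    return l_lce + g1 * l_ac + g2 * l_gc + g3 * l_pce
```

Minimizing this objective over W_G and W_F trains the feature extractor and classifier on labeled, augmented, and pseudolabeled data simultaneously.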

4. Experiments

4.1. Database Description

In this section, we evaluate the proposed augmentation-consistent clustering network by training on the publicly available Messidor dataset [37]. In detail, Messidor contains approximately 1200 digital fundus images acquired with a Topcon TRC NW6 nonmydriatic camera. The fundus images are 1440 × 960, 2240 × 1488, or 2304 × 1536 pixels in size, and ophthalmologists labeled each image. According to DR severity, Messidor classifies the fundus images into one of four grades: normal with no lesion (R0), mild (R1), severe nonproliferative (R2), and proliferative (R3). The data distribution of Messidor in each grade is described in Table 2, and the popular DR grading task of normal/abnormal classification is summarized in Table 3. The distribution shows that the common challenge is data imbalance, which may influence model training.

Table 2.

The class distribution of datasets.

Label Messidor
DR 0 546
DR 1 153
DR 2 247
DR 3 254

Table 3.

The popular classification task on DR grades.

Label Description
DR grading DR 0/DR 1/DR 2/DR 3
Normal/abnormal DR DR 0/DR 1, DR 2, DR 3

4.2. Experimental Settings

This paper conducts normal/abnormal DR grading experiments, dividing the dataset into 600 training images and 600 testing samples. In detail, the training data contain 400 labeled fundus images, including 200 positive cases and 200 negative images. The unlabeled training data contain 46 positive cases and 154 negative images. In addition, we chose the remaining 600 retinal images as testing data, which contain 300 positive and 300 negative cases. The entire experimental process is completed using the PyTorch framework on a GeForce RTX 2080 Ti GPU. Precisely, each retinal image is resized to 512 × 512 pixels before being input to the network, and the batch size is set to 8. Besides, we use ResNet-50 as the backbone, and the classifier is composed of linear layers. For parameter settings, the learning rate is set to 0.001, and the balance parameters γ1, γ2, and γ3 are set to 0.6, 0.3, and 0.8, respectively, to obtain the best DR grading results. In addition, the training process takes around 2.5 minutes per epoch, and the evaluation for testing images takes 5 milliseconds per fundus image.

To measure the experimental performance, we adopt the popular indicators to compare and evaluate our models: specificity (SPE), sensitivity (SEN), accuracy (ACC), and the area under the ROC curve (AUC).

4.3. Comparison with Other Methods

4.3.1. Performance on Messidor

To demonstrate the performance of the ACCN on DR grading, we compare it with different baseline methods on the normal/abnormal DR grading task. As compared methods, we choose the manual grading results of two experts [38] and introduce two experimental methods used in [39], which emphasizes the role of multiple filter sizes in learning fine-grained discriminant features and proposes two deep convolutional neural networks: a combined-kernels multiple-loss network and a VGG-based network. The normal/abnormal fundus image classification results on Messidor are reported in Table 4, and our ACCN framework achieves an accuracy of 89.8%, a sensitivity of 93.0%, a specificity of 86.7%, and an AUC of 96.0%, outperforming the manual experts and the supervised DR grading models of [39]. What needs to be emphasized is that our ACCN model only utilizes 400 annotated retinal images while the other training data are unlabeled, whereas the compared models require fully annotated retinal images and experts require long-term professional training. Therefore, the excellent performance of our ACCN in a semisupervised manner proves that it can save us from depending on expensive annotation in practical applications of DR grading.

Table 4.

Compared performance on Messidor.

Methods Accuracy Sensitivity Specificity AUC
Expert A [38] 87.8 92.2
Expert B [38] 76.4 86.5
Holly et al. [39] 87.1 88.2 85.7 87.0
Holly et al. [39] 85.8 91.6 80.3 86.2
Odena et al. [40] 94.7 95.4 95.1 96.7
S2MTS2 [30] 86.7 88.7 84.8 86.3
SRC-MT [41] 85.8 86.4 85.2 84.8
ACCN 89.8 93.0 86.7 96.0

Besides, we choose two existing semisupervised medical image classification methods [30, 41] to compare with our ACCN model. S2MTS2 [30] combines self-supervised mean-teacher pretraining with semisupervised fine-tuning to solve multilabel chest X-ray classification; SRC-MT [41] proposes a sample relation consistency paradigm to effectively exploit unlabeled data by modeling the relationship information among different medical image samples. To compare the ACCN with them, we run their publicly available code on the Messidor dataset with the same settings. The results are summarized in Table 4, showing that our ACCN approach is superior to these semisupervised medical image classification methods, with considerable improvements in each metric. Although our method outperforms some supervised methods, there is still a gap with advanced supervised methods, and the ACCN still has potential to reach fully supervised performance.

4.4. Visual Analysis for ACCN

This article presents two popular visualizations for the ACCN on the diabetic retinopathy grading task. First, the ROC curve is shown in Figure 3, and our approach achieves an AUC of 0.96 on the Messidor dataset. Besides, we utilize the 600 testing fundus images and illustrate the classification results in the confusion matrix (Figure 4). The confusion matrix quickly visualizes the proportion of each category misclassified into other classes. From the results, the ACCN model correctly classifies 279 abnormal and 261 normal fundus images, an accuracy of 90.0%. Summarizing the above-mentioned visualization results, we can see that our ACCN model effectively utilizes a large amount of unlabeled data with fewer annotations to solve the semisupervised DR grading task well.

Figure 3.

Figure 3

ROC curve of the proposed ACCN model for normal/abnormal DR grading on the Messidor dataset.

Figure 4.

Figure 4

Normal/abnormal DR classification on the Messidor dataset.

At the same time, we plot the loss reduction during model training in Figure 5. The overall loss shows a downward trend, and the regeneration of pseudolabels by clustering within each batch causes the fluctuations in the first half. After adding the global-consistent loss, the clustering centroids are dynamically updated more reasonably, with stable loss convergence. This demonstrates that our ACCN can rapidly train a semisupervised DR grading model and that the global-consistent loss significantly improves convergence.

Figure 5.

Figure 5

Loss curve of the ACCN for model training on the Messidor dataset.

4.5. Performance on Other DR Grading Datasets

This article also uses another publicly available DR grading dataset, APTOS 2019, in the normal/abnormal DR experiments to demonstrate the transferability of the proposed ACCN approach. APTOS 2019 [42] was released for the APTOS 2019 diabetic retinopathy classification contest organized by the Asia Pacific Tele-Ophthalmology Society. It comprises 3662 annotated retinal fundus photographs captured at multiple clinics with different imaging conditions at Aravind Eye Hospital in India. Concretely, this dataset contains five classes for training the ACCN, and the data are highly imbalanced, as summarized in Table 5. Compared to Messidor, APTOS 2019 is more challenging because it contains five DR grades, so it can prove the effectiveness of our ACCN model on normal/abnormal DR classification more sufficiently; the detailed division of the different DR grades into normal and abnormal can also be found in Table 5.

Table 5.

The class distribution of APTOS 2019.

Label APTOS Division
DR 0 1805 Normal
DR 1 370 Abnormal
DR 2 999 Abnormal
DR 3 193 Abnormal
DR 4 295 Abnormal

From Table 6, it can be found that the ACCN reaches a high accuracy of 93.4%, a sensitivity of 91.0%, a specificity of 95.7%, and an AUC of 98.4%. These results mean that the ACCN can effectively extract the internal connections among unlabeled retinal images in different datasets and can successfully solve the DR grading problem with fewer annotations when transferred to other application scenarios.

Table 6.

Experimental results on APTOS 2019.

Methods Accuracy Sensitivity Specificity AUC
ACCN 93.4 91.0 95.7 98.4

5. Further Analysis

This section further discusses the impacts of major components and parameters on the ACCN approach to the semisupervised DR grading task, including the labeled data, augmentation-consistent learning, and the weight clustering unit.

5.1. The Impact of Labeled Fundus Images

This paper attempts to solve the DR-grading task with fewer annotations; thus, there are very few high-quality samples with accurate labels for DR diagnosis. To measure the impact of labeled data, we use accuracy to test how the number of labeled retinal images influences the ACCN performance on the Messidor dataset. From the results in Figure 6, it can be observed that the DR grading accuracy rapidly increases from 68.7% to 75.2% as the number of labeled fundus images increases from 50 to 100 and then increases more mildly from 75.2% to 89.8% as the number of labeled images grows from 100 to 400. Finally, the ACCN model achieves an accuracy of 93.4% when it is fully supervised.

Figure 6.

Figure 6

DR classification performance with different numbers of labeled data.

The above-mentioned experimental results show that the proposed semisupervised model can work well using a relatively small number of labeled samples, with fewer annotating costs than existing supervised DR grading models. However, using the proposed ACCN approach still requires a certain amount of labeled samples to obtain a higher classification accuracy. A similar trend and conclusion can also be observed from sensitivity, specificity, and AUC.

5.2. The Impact of Augmentation-Consistent Learning

The first dominant component of the ACCN is the augmentation-consistent learning (ACL) module, which generates weak and strong augmentations for annotated and unlabeled training images, respectively, and conducts consistent feature learning on the raw images and their augmentations. To weigh the impact of this module, we only employ raw images in the weight clustering network to assign pseudolabels. The results are reported in Table 7 (ACL). Concretely, the ACL module improves DR grading performance by +13.5% accuracy, +14.7% sensitivity, +12.4% specificity, and +14.6% AUC. This further confirms that the proposed augmentation-consistent learning mechanism is beneficial to the semisupervised DR grading task.

Table 7.

The contributions of the major steps in ACCN (%).

Target Accuracy Sensitivity Specificity AUC
ACL +13.5 +14.7 +12.4 +14.6
WLU +8.1 +9.3 +7.0 +9.8

5.3. The Impact of Weight Clustering

We then analyze the influence of the weight clustering module (WLU). We remove the entire clustering module and directly use the softmax prediction vectors of high-confidence samples as pseudolabels for training. On the normal/abnormal DR classification task on the Messidor dataset, the accuracy drops by 8.1%, which demonstrates that the weight clustering unit, by exploring the internal relationships between unknown samples, is effective for the semisupervised DR grading task. Compared to the supervised models in the study by Holly et al. [39], our model still achieves a competitive AUC of 86.2% when the WLU is removed, which benefits from the proposed augmentation-consistent learning module and further proves the effectiveness of our semisupervised learning approach.

5.4. The Impact of Positive Cases in Unlabeled Data

The positive proportion of unlabeled data is an important factor affecting the final performance on the semisupervised diabetic retinopathy grading problem. We finally discuss its influence by changing the proportion of positive cases in the unlabeled training data. The results on the Messidor dataset are summarized in Figure 7, revealing that the accuracy decreases as the positive proportion in the unlabeled training data increases. This suggests that positive cases in the labeled training data provide more discriminative information than those in the unlabeled data. Thus, a balanced distribution of negative and positive cases in both labeled and unlabeled data is important for the semisupervised diabetic retinopathy grading task. In addition, with the number of labeled samples kept unchanged, we record experimental results with different proportions of positive samples in the unlabeled data; the result is shown in Figure 8.

Figure 7. The accuracy results for different positive proportions in the unlabeled training data.

Figure 8. The accuracy results for different ratios of positive samples in the unlabeled data.

6. Discussion and Conclusion

For the real-world application of diabetic retinopathy grading, the lack of labeled data is the main challenge limiting the application of deep learning, probably for the following reasons. First, the lesions indicating DR are often subtle in digital fundus images, so labeling retinal images requires expertise built up over long-term training, and hiring experts to annotate is expensive and time-consuming. Second, medical data, especially images of human diseases, are difficult to collect due to rigorous privacy constraints. Finally, the diseases that require the aid of computer vision are often complex and model training requires sufficient data, which makes fundus image annotation all the more demanding.

To address the above-mentioned challenges, we propose an augmentation-consistent clustering network (ACCN) for semisupervised diabetic retinopathy grading, which mines internal correlations among unknown samples assisted by fewer annotations. The proposed model compensates for the lack of labeled data in the following ways. (1) The augmentation-consistent learning module generates weak and strong augmentations for annotated and unlabeled fundus images and provides inherent consistency information through a labeled cross-entropy loss and an augmentation-consistency loss. (2) A weight clustering unit computes pseudolabels for unknown retinal images with a dynamic clustering algorithm that uses weight centroids to cluster in a globally consistent manner. (3) The DR classification model is further trained on the combination of annotated and pseudolabeled retinal images to achieve semisupervised diabetic retinopathy grading. Extensive experiments on the Messidor dataset prove that the ACCN can perform effective DR classification with limited labeled data, and further experiments on APTOS 2019 demonstrate the scalability of our ACCN to different domains.
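The training objective combining a supervised term on labeled images with a consistency term between augmentations can be illustrated with a minimal sketch. This is our own simplification, assuming cross-entropy on labeled data plus a mean-squared consistency penalty between weak- and strong-augmentation outputs; the weighting factor `lam` is hypothetical and not a value from the paper.

```python
import numpy as np

def cross_entropy(probs, labels):
    """Mean cross-entropy of predicted class probabilities vs. true labels."""
    return -np.log(probs[np.arange(len(labels)), labels]).mean()

def consistency(p_weak, p_strong):
    """Mean squared difference between weak- and strong-augmentation outputs."""
    return ((p_weak - p_strong) ** 2).mean()

def total_loss(p_labeled, y, p_weak, p_strong, lam=1.0):
    """Supervised loss on labeled images plus weighted consistency loss."""
    return cross_entropy(p_labeled, y) + lam * consistency(p_weak, p_strong)

# Perfect labeled predictions and identical augmentation outputs give zero loss
p_lab = np.array([[1.0, 0.0]])
p_aug = np.array([[0.7, 0.3]])
loss = total_loss(p_lab, np.array([0]), p_aug, p_aug)
```

In practice the two terms pull the network toward correct labeled predictions while keeping its outputs stable under augmentation, which is the mechanism the consistency module exploits.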

In the future, we will work on unsupervised learning approaches to conduct fundus image classification without any annotations. Besides, we will focus on multistage diabetic retinopathy grading to provide more accurate diagnoses for ophthalmologists.

Acknowledgments

This work was supported by the Research Funds of the Shanxi Transformation and Comprehensive Reform Demonstration Zone (Grant no. 2018KJCX04), the Fund for Shanxi “1331 Project,” and the Key Research and Development Program of Shanxi Province (No. 201903D311009). The work was also partially sponsored by the Research Foundation of Education Bureau of Shanxi Province (Grant No. HLW-20132), the Scientific Innovation Plan of Universities in Shanxi Province (Grant no. 2021L575), and the Shanxi Scholarship Council of China (Grant No. 2020-149). The work was also sponsored by the Zhejiang Medical and Health Research Project (2020PY027) and the Huzhou Science and Technology Planning Program (2019GY13).

Data Availability

The datasets used and/or analyzed during the present study are available from the corresponding author on reasonable request.

Conflicts of Interest

The authors declare that they have no financial or personal relationships with other people or organizations that could inappropriately influence this work, and no professional or other personal interest of any nature in any product, service, and/or company that could be construed as influencing the position presented in, or the review of, this manuscript.

Authors' Contributions

Guanghua Zhang, Keran Li, and Zhixian Chen contributed equally to this work.

References

  • 1.Cho N. H., Shaw J. E., Karuranga S., et al. IDF diabetes atlas: global estimates of diabetes prevalence for 2017 and projections for 2045. Diabetes Research and Clinical Practice . 2018;138:271–281. doi: 10.1016/j.diabres.2018.02.023. [DOI] [PubMed] [Google Scholar]
  • 2.Gargeya R., Leng T. Automated identification of diabetic retinopathy using deep learning. Ophthalmology . 2017;124(7):962–969. doi: 10.1016/j.ophtha.2017.02.008. [DOI] [PubMed] [Google Scholar]
  • 3.Pratt H., Coenen F., Broadbent D., Harding S. P., Zheng Y. Convolutional neural networks for diabetic retinopathy. Procedia Computer Science . 2016;90:200–205. doi: 10.1016/j.procs.2016.07.014. [DOI] [Google Scholar]
  • 4.Li X., Hu X., Yu L. Canet: cross-disease attention network for joint diabetic retinopathy and diabetic macular edema grading. IEEE Transactions on Medical Imaging . 2019;39(5):1483–1493. doi: 10.1109/TMI.2019.2951844. [DOI] [PubMed] [Google Scholar]
  • 5.Zhou Y., He X., Huang L. Collaborative learning of semi-supervised segmentation and classification for medical images. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; June 2019; Long Beach, CA, USA. pp. 2079–2088. [Google Scholar]
  • 6.Wilkinson C. P., Ferris F. L., III, Klein R. E., et al. Proposed international clinical diabetic retinopathy and diabetic macular edema disease severity scales. Ophthalmology . 2003;110(9):1677–1682. doi: 10.1016/s0161-6420(03)00475-5. [DOI] [PubMed] [Google Scholar]
  • 7.Gulshan V., Peng L., Coram M., et al. Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA . 2016;316(22):2402–2410. doi: 10.1001/jama.2016.17216. [DOI] [PubMed] [Google Scholar]
  • 8.Wang Z., Yin Y., Shi J., Fang W., Li H., Wang X. Zoom-in-net: deep mining lesions for diabetic retinopathy detection. Medical Image Computing and Computer Assisted Intervention - MICCAI 2017. Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention; September 2017; Cambridge, UK. Springer; pp. 267–275. [DOI] [Google Scholar]
  • 9.Yang Y., Shang F., Wu B. Robust collaborative learning of patch-level and image-level annotations for diabetic retinopathy grading from fundus image. IEEE Transactions on Cybernetics . 2021. doi: 10.1109/tcyb.2021.3062638. [DOI] [PubMed] [Google Scholar]
  • 10.Sopharak A., Dailey M. N., Uyyanonvara B., et al. Machine learning approach to automatic exudate detection in retinal images from diabetic patients. Journal of Modern Optics . 2010;57(2):124–135. doi: 10.1080/09500340903118517. [DOI] [Google Scholar]
  • 11.Priya R., Aruna P. Diagnosis of diabetic retinopathy using machine learning techniques. ICTACT Journal on soft computing . 2013;3(4):563–575. doi: 10.21917/ijsc.2013.0083. [DOI] [Google Scholar]
  • 12.Krause J., Gulshan V., Rahimy E., et al. Grader variability and the importance of reference standards for evaluating machine learning models for diabetic retinopathy. Ophthalmology . 2018;125(8):1264–1272. doi: 10.1016/j.ophtha.2018.01.034. [DOI] [PubMed] [Google Scholar]
  • 13.Zhang M., Meng W., Davies T., Zhang Y., Xie S. Q. A robot-driven computational model for estimating passive ankle torque with subject-specific adaptation. IEEE Transactions on Biomedical Engineering . 2015;63(4):814–821. doi: 10.1109/tbme.2015.2475161. [DOI] [PubMed] [Google Scholar]
  • 14.Greenspan H., Van Ginneken B., Summers R. M. Guest editorial deep learning in medical imaging: overview and future promise of an exciting new technique. IEEE Transactions on Medical Imaging . 2016;35(5):1153–1159. doi: 10.1109/tmi.2016.2553401. [DOI] [Google Scholar]
  • 15.Litjens G., Kooi T., Bejnordi B. E., et al. A survey on deep learning in medical image analysis. Medical Image Analysis . 2017;42:60–88. doi: 10.1016/j.media.2017.07.005. [DOI] [PubMed] [Google Scholar]
  • 16.Chen T., Kornblith S., Norouzi M., Hinton G. A simple framework for contrastive learning of visual representations. Proceedings of the International conference on machine learning; June 2020; Las Vegas, Nevada. PMLR; pp. 1597–1607. [Google Scholar]
  • 17.He K., Fan H., Wu Y., Xie S., Girshick R. Momentum contrast for unsupervised visual representation learning. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; June 2020; Seattle, WA, USA. pp. 9729–9738. [DOI] [Google Scholar]
  • 18.Zhang L., Qi G.-J. Wcp: worst-case perturbations for semi-supervised deep learning. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; June 2020; Seattle, WA, USA. pp. 3912–3921. [DOI] [Google Scholar]
  • 19.Sohn K., Berthelot D., Li C.-L., et al. Fixmatch: simplifying semi-supervised learning with consistency and confidence. Advances in Neural Information Processing Systems . 2020;33:596–608. [Google Scholar]
  • 20.Sambyal N., Saini P., Syal R., Gupta V. Aggregated residual transformation network for multistage classification in diabetic retinopathy. International Journal of Imaging Systems and Technology . 2021;31(2):741–752. [Google Scholar]
  • 21.Bhardwaj C., Jain S., Sood M. Hierarchical severity grade classification of non-proliferative diabetic retinopathy. Journal of Ambient Intelligence and Humanized Computing . 2021;12(2):2649–2670. doi: 10.1007/s12652-020-02426-9. [DOI] [Google Scholar]
  • 22.Bodapati J. D., Shaik N., Naralasetti V. Composite deep neural network with gated-attention mechanism for diabetic retinopathy severity classification. Journal of Ambient Intelligence and Humanized ComputIng . 2021;12(1):1–15. [Google Scholar]
  • 23.Math L., Fatima R. Adaptive machine learning classification for diabetic retinopathy. Multimedia Tools and Applications . 2021;80(4):5173–5186. doi: 10.1007/s11042-020-09793-7. [DOI] [Google Scholar]
  • 24.Jiang Z., Li Z., Grimm M., et al. Autonomous robotic screening of tubular structures based only on real-time ultrasound imaging feedback. IEEE Transactions on Industrial Electronics . 2021;69(7) doi: 10.1109/TIE.2021.3095787. [DOI] [Google Scholar]
  • 25.Thies M., Oelze M. L. Combined therapy planning, real-time monitoring, and low intensity focused ultrasound treatment using a diagnostic imaging array. IEEE Transactions on Medical Imaging . 2022. doi: 10.1109/TMI.2021.3140176. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Jiang Z., Grimm M., Zhou M., Hu Y., Esteban J., Navab N. Automatic force-based probe positioning for precise robotic ultrasound acquisition. IEEE Transactions on Industrial Electronics . 2020;68(11) doi: 10.1109/TIE.2020.3036215. [DOI] [Google Scholar]
  • 27.Wang X., Chen H., Xiang H. Deep virtual adversarial self-training with consistency regularization for semi-supervised medical image classification. Medical Image Analysis . 2021;70:102010. doi: 10.1016/j.media.2021.102010. [DOI] [PubMed] [Google Scholar]
  • 28.Calderón-Ramírez S., Murillo-Hernández D., Rojas-Salazar K., et al. Improving uncertainty estimations for mammogram classification using semi-supervised learning. Proceedings of the 2021 International Joint Conference on Neural Networks (IJCNN); July 2021; Shenzhen, China. pp. 1–8. [DOI] [Google Scholar]
  • 29.Pang T., Ng W. L., Chan C. Semi-supervised GAN-based radiomics model for data augmentation in breast ultrasound mass classification. Computer Methods and Programs in Biomedicine . 2021;203:106018. doi: 10.1016/j.cmpb.2021.106018. [DOI] [PubMed] [Google Scholar]
  • 30.Liu F., Tian Y., Cordeiro F. R., Belagiannis V., Reid I., Carneiro G. International Workshop on Machine Learning in Medical Imaging . Switzerland: Springer, Cham; 2021. Self-supervised mean teacher for semi-supervised chest x-ray classification; pp. 426–436. [DOI] [Google Scholar]
  • 31.Ran B., Goldberger J., Ben-Ari R. Weakly and semi supervised detection in medical imaging via deep dual branch net. Neurocomputing . 2021;421:15–25. [Google Scholar]
  • 32.Berthelot D., Carlini N., Cubuk D. Remixmatch: semi-supervised learning with distribution alignment and augmentation anchoring. Proceedings of the International Conference on Learning Representation; April 2020; Addis Ababa, Ethiopia. [Google Scholar]
  • 33.He K., Zhang X., Ren S., Sun J. Deep residual learning for image recognition. Proceedings of the IEEE conference on computer vision and pattern recognition; June 2016; Las Vegas, NV, USA. pp. 770–778. [DOI] [Google Scholar]
  • 34.Feng H., Chen M., Hu J., Shen D., Liu H., Cai D. Complementary pseudo labels for unsupervised domain adaptation on person re-identification. IEEE Transactions on Image Processing . 2021;30:2898–2907. doi: 10.1109/tip.2021.3056212. [DOI] [PubMed] [Google Scholar]
  • 35.Hu Z., Yang Z., Hu X., Ram N. Simple: similar pseudo label exploitation for semi-supervised classification. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; June 2021; Nashville, TN, USA. pp. 15099–15108. [Google Scholar]
  • 36.Cascante-Bonilla P., Tan F., Qi Y., Ordonez V. Curriculum labeling: revisiting pseudo-labeling for semi-supervised learning. Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021, Thirty-Third Conference on Innovative Applications of Artificial Intelligence, IAAI 2021, The Eleventh Symposium on Educational Advances in Artificial Intelligence, EAAI 2021, Virtual Event; February 2021; AAAI Press; pp. 6912–6920. https://ojs.aaai.org/index.php/AAAI/article/view/16852 . [Google Scholar]
  • 37.Decencière E., Zhang X., Cazuguel G., et al. Feedback on a publicly distributed image database: the Messidor database. Image Analysis & Stereology . 2014;33(3):231–234. doi: 10.5566/ias.1155. [DOI] [Google Scholar]
  • 38.Sánchez CI., Niemeijer M., Dumitrescu AV., Suttorp-Schulten MS., Abràmoff MD., van Ginneken B. Evaluation of a computer-aided diagnosis system for diabetic retinopathy screening on public data. Investigative Ophthalmology & Visual Science . 2011;52(7):4866–4871. doi: 10.1167/iovs.10-6633. [DOI] [PubMed] [Google Scholar]
  • 39.Vo H., Verma A. New deep neural nets for fine-grained diabetic retinopathy recognition on hybrid color space. Proceedings of the 2016 IEEE International Symposium on Multimedia (ISM); December 2016; San Jose, CA, USA. IEEE; pp. 209–215. [DOI] [Google Scholar]
  • 40.Odena A. Semi-supervised learning with generative adversarial networks. 2016. https://arxiv.org/abs/1606.01583 .
  • 41.Liu Q., Yu L., Luo L., Dou Q., Heng P. A. Semi-supervised medical image classification with relation-driven self-ensembling model. IEEE Transactions on Medical Imaging . 2020;39(11):3429–3440. doi: 10.1109/TMI.2020.2995518. [DOI] [PubMed] [Google Scholar]
  • 42.Dekhil O., Naglah A., Shaban M., Ghazal M., Taher F., Elbaz A. Deep learning based method for computer aided diagnosis of diabetic retinopathy. Proceedings of the 2019 IEEE International Conference on Imaging Systems and Techniques (IST); December 2019; Abu Dhabi, UAE. IEEE; pp. 1–4. [DOI] [Google Scholar]



Articles from Journal of Healthcare Engineering are provided here courtesy of Wiley
