Journal of Diabetes and Metabolic Disorders. 2024 Sep 20;23(2):2289–2314. doi: 10.1007/s40200-024-01497-1

Deep learning based binary classification of diabetic retinopathy images using transfer learning approach

Dimple Saproo¹, Aparna N Mahajan², Seema Narwal³
PMCID: PMC11599653  PMID: 39610484

Abstract

Objective

Diabetic retinopathy (DR) is a common complication of diabetes and a leading cause of blindness worldwide. Detecting diabetic retinopathy at an early stage is crucial for preventing vision loss. In this work, a deep learning-based binary classification of DR images is proposed to classify DR images as healthy or unhealthy. Twenty pre-trained networks have been fine-tuned using a transfer learning approach on a robust dataset of diabetic retinopathy images. The combined dataset has been collected from three established databases of diabetic patients, annotated by experienced ophthalmologists as healthy or unhealthy retinal images.

Method

This work improves the robustness of the models by pre-processing the DR images with a denoising algorithm, normalization, and data augmentation. Three robust datasets of diabetic retinopathy images, named DRD-EyePACS, IDRiD, and APTOS-2019, have been selected, and a combined diabetic retinopathy image dataset has been generated for the exhaustive experiments. The datasets have been divided into training, testing, and validation sets, and classification accuracy, sensitivity, specificity, precision, F1-score, and ROC-AUC are used to assess network performance. The present work selects 20 different pre-trained networks from three categories: series, DAG, and lightweight.

Results

This study uses data augmentation and normalization of the pre-processed data to mitigate overfitting. From the exhaustive experiments, the three best pre-trained networks have been selected based on the best classification accuracy in each category. It is concluded that the trained ResNet101 model, from the DAG category, identifies diabetic retinopathy accurately from retinal images in all cases, achieving 97.33% accuracy.

Conclusion

Based on the experimental results, the proposed ResNet101 model helps healthcare professionals detect retinal disease early and provides practical solutions to diabetes patients. It also gives patients and experts a second opinion for the early detection of diabetic retinopathy.

Keywords: Series, DAG, Lightweight, Pre-trained networks, Classification accuracy

Introduction

The most common eye conditions that cause blindness or reduced vision are diabetic retinopathy, cataracts, glaucoma, and age-related macular degeneration (AMD) [1]. Diabetes is common all over the world and is ranked seventh among deadly diseases according to a World Health Organization (WHO) report [2–5]. The number of cases and the incidence of diabetes have been increasing over the past few decades, with an estimated 422 million people living with the disease [3–6]. Approximately 62 million Indians aged 25–75 suffer from diabetic retinopathy, a figure expected to rise to 102 million by 2030 [15]. Diabetic retinopathy is common among people with diabetes: high blood sugar damages the retina [6], and the blood vessels leak, swell, and fail to pass blood through the retina, causing abnormalities. Diabetic retinopathy damages the retina as a complication of diabetes mellitus and is the leading cause of blindness [7–10]. Diabetic retinopathy has been classified into two stages: (a) proliferative diabetic retinopathy (PDR) and (b) non-proliferative diabetic retinopathy (NPDR). NPDR is further divided into three classes: (a) mild, (b) moderate, and (c) severe [11–16]. The advanced form of diabetic retinopathy is PDR, caused by the development of aberrant blood vessels or swelling in the retina [14]. The first stage of DR is mild NPDR, during which the patient does not notice any changes in vision or eye condition [15]. The blood vessel walls of the retina dilate and leak, leading to microaneurysms, i.e., small lumps protruding from the vessel walls [16–19]. Generally, the minute red spots or yellow circles that appear in the eye characterize mild, moderate, or severe NPDR. Moderate NPDR is distinguished from mild NPDR by more significant damage to the retina's blood vessels: the vessels develop microscopic balloons or swellings with microaneurysms, and fluid leaks from the retina or bleeds [20–24]. Severe NPDR is a serious condition characterized by even greater damage to the retinal blood vessels than mild and moderate NPDR [24–26].

The exhaustive literature survey shows that deep learning-based pre-trained networks with a transfer learning approach have been widely used for classifying diabetic retinopathy images [1–42]. Fundus retina images are widely used to detect the abnormalities of diabetic retinopathy. In this work, a robust dataset of DR images has been prepared to classify images into binary classes [27]. PDR and NPDR are considered unhealthy eye conditions, and non-diabetic retinopathy is considered a healthy eye fundus image [28]. In unhealthy eye images, the retinal vessels become irregular in shape, size, and diameter, whereas in healthy eye images they remain regular [29–33]. Three benchmark datasets of diabetic retinopathy images have been considered in this work for the experiments. Examples of healthy and diabetic (unhealthy) retinal images are shown in Fig. 1.

Fig. 1 Examples of healthy and diabetic (unhealthy) retinal fundus images

In the present work, the raw DR images have been pre-processed by (a) resizing and (b) removing Gaussian and salt-and-pepper noise with a suitable filtering method [30–33]. The quality of the DR images has been enhanced by selecting an optimal filtering algorithm [33–37]. The optimal filtering algorithm has been chosen from a pool of filtering categories in terms of smoothing homogeneous regions while preserving the edges of the DR images [38]. This work uses the structure and edge preservation index to evaluate the performance of the filtering algorithms on DR images [37–40]. The pre-trained networks have been divided into three categories: (a) series, (b) DAG (residual and inception modules), and (c) lightweight networks [35–42]. Twenty pre-trained networks have been selected from a robust pool of these three categories and fine-tuned using the transfer learning approach [40–42]. Classification accuracy, sensitivity, specificity, precision, F1-score, and ROC-AUC are used to evaluate network performance [37–42].

Table 1 presents an exhaustive literature review on the classification of diabetic retinopathy images using deep learning-based pre-trained networks and the transfer learning approach.

Table 1.

An exhaustive literature review on the classification of diabetic retinopathy images using deep learning-based pre-trained networks and the transfer learning approach

Investigator(s) | Year | Pre-processing Method | Dataset (Classes) | No. of Images | DL-Based Pre-Trained Model | Performance
Gulshan et al. [1] | 2016 | Resizing and normalization | EyePACS & MESSIDOR-2 | – | DCNN | Sen. 97.5%
Chandrakumar et al. [2] | 2016 | Resizing and contrast enhancement | EyePACS, DRIVE | – | DCNN | Acc. 94%
Zhou, L. et al. [3] | 2017 | Contrast enhancement | Three datasets | – | Self-designed architecture | AUC 0.928
Dutta, S. et al. [4] | 2018 | Contrast enhancement | EyePACS (5 class) | 50,000 | VGG16 | Acc. 78.3%
Junjun, P. et al. [5] | 2018 | Contrast enhancement | EyePACS (5 class) | 35,126 | ResNet18 | Acc. 78.4%
Kassani, S. H. et al. [6] | 2019 | Resizing and normalization | APTOS 2019 (5 class) | 3662 | Xception | Acc. 83.09%
Challa, U. K. et al. [7] | 2019 | Gaussian filters | Kaggle dataset (5 class) | 33,000 | Pre-trained networks | Acc. 86.64%
Qummar, S. et al. [8] | 2019 | Scaling and resizing | Kaggle dataset | 5608 | 5 pre-trained networks | Sp. and F1-score
Bhardwaj, C. et al. [9] | 2020 | Contrast enhancement | MESSIDOR (4 class) | 1200 | QIV model (Inception-V3) | Acc. 93.33%
Saxena, G. et al. [10] | 2020 | Resizing and augmentation | EyePACS | 88,702 | InceptionResNet, ResNet and Inception | AUC 0.927
Yusaku Katada et al. [11] | 2020 | DA | EyePACS | 35,126 (3508 selected) | Inception-v3 | Sen. 81.5% and 90.8%
Ali Usman et al. [12] | 2020 | Resizing, data augmentation | 7 online datasets | 2680 | Inception-v3, ResNet50 and AlexNet | Acc. 85.2%
Wejdan L. Alyoubi et al. [13] | 2021 | CLAHE, cropping, and DA | DDR and APTOS-2019 | 47,870 | EfficientNetB0 | Acc. 89%
Bhardwaj, C. et al. [14] | 2021 | Contrast enhancement | MESSIDOR (4 class) | 1200 | QEIRV-2 model (Inception-V3) | Acc. 93.3%
Chen, P. N. et al. [15] | 2021 | Resizing and grayscale | EyePACS | 88,702 | NASNet-Large | Acc. 81.60% & 92.5%
San-Li Yi et al. [16] | 2021 | Resizing and augmentation | APTOS 2019 (5 class) | 3662 | RA-EfficientNet | Acc. 93.55%
Z. Khan et al. [17] | 2021 | Scaling and resizing | EyePACS | 88,702 | VGG16 and VGG-NiN | Sp. 91%
Sraddha Das et al. [18] | 2021 | Adaptive histogram equalization | DIARETDB1 | – | CNN | Acc. 98.7%
AbdelMaksoud et al. [19] | 2022 | Resizing and augmentation | 4 datasets | 39,301 | E-DenseNet | Acc. 91.3%
Kobat, S. G. et al. [20] | 2022 | Resizing | NDRD and APTOS 2019 | 2355 and 3662 | DenseNet201 | Acc. 87.43% & 84.90%
Mungloo-Dilmohamud, Z. et al. [21] | 2022 | Rescaling and augmentation | APTOS 2019 (5 class) | 3662 | VGG16, ResNet50, DenseNet169 | Acc. 82%
Al-Omaisi Asia et al. [22] | 2022 | Cropping, resizing, and augmentation | XHO dataset (5 class) | 1607 | ResNet50, ResNet101 and VGG16 | Acc. 80.88%
Sambit S. Mondal et al. [23] | 2022 | CLAHE and DA | APTOS 2019 (5 class) | 3662 | ResNeXt and DenseNet | Acc. 86.08%
Yasashvini, R. et al. [24] | 2022 | Wiener filter | APTOS 2019 | 3662 | ResNet & DenseNet | Acc. 96.22%
Dayana, A. M. et al. [25] | 2022 | ADF | Local | – | AFU-NET | –
Oulhadj, M. et al. [26] | 2022 | Scaling and resizing | APTOS 2019 | 3662 | DenseNet-121, Xception, Inception-v3 & ResNet-50 | Acc. 85.28%
Jabbar, M. K. et al. [27] | 2022 | CLAHE, DA & resizing | EyePACS | 35,126 | VggNet | Acc. 96.6%
Menaouer, B. et al. [28] | 2022 | Scaling and resizing | APTOS-2019, Messidor-2 & local public DR | 5584 | VGG16 and VGG19 | Acc. 90.6%
Fayyaz, A. M. et al. [29] | 2023 | Resizing and augmentation | ODIR (4 class) | – | AlexNet and ResNet-101 with SVM | Acc. 93%
Dolly Das et al. [30] | 2023 | Resizing and augmentation | EyePACS (5 class) | 35,126 | 19 pre-trained networks | Acc. 79.11%
C. Mohanty et al. [31] | 2023 | Cropping and resizing | APTOS 2019 (5 class) | 3662 | DenseNet121 | Acc. 97.30%
Pradeep Kumar Jena et al. [32] | 2023 | CLAHE | APTOS and MESSIDOR | 3662 & 1200 | Self-designed architecture with SVM | Acc. 98.6% and 91.9%
Bhimavarapu, U. et al. [33] | 2023 | CLAHE and histogram equalization | APTOS and Kaggle | 3662 & 35,126 | 5 pre-trained networks | Acc. 98.32% & 98.71%
Islam, N. et al. [34] | 2023 | Gaussian filter | APTOS, IDRiD | 3662 & 516 | Xception | Acc. 99.04% (APTOS), 94.17% (IDRiD)
Sajid, M. et al. [35] | 2023 | DA and image enhancement | Public dataset | 32,800 | DR-NASNet | Acc. 96%
Alwakid, G. et al. [36] | 2023 | CLAHE & data augmentation | APTOS 2019 | 3662 | DenseNet-121 | Acc. 98.36%
Vijayan, M. et al. [37] | 2023 | Scaling | DDR, IDRiD, and APTOS | 13,673, 516 & 3662 | 6 pre-trained networks | Acc. 82.5%
Alwakid, G. et al. [38] | 2023 | CLAHE & DA | APTOS 2019 | 3662 | DenseNet-121 | –
Guefrachi, S. et al. [39] | 2024 | Resizing and augmentation | APTOS 2019 | 3662 | ResNet152-V2 | Acc. 100%
Sunkari, S. et al. [40] | 2024 | Contrast and brightness | APTOS and Kaggle | 3662 & 35,126 | 3 pre-trained networks | Acc. 93.51%
Macsik, P. et al. [41] | 2024 | CLAHE & DA | DDR and APTOS 2019 | 3662 | Xception & EfficientNetB4 | Acc. –
Shakibania et al. [42] | 2024 | CLAHE & DA | APTOS 2019 | 3662 | 4 pre-trained networks | Acc. 96.44%

T & V: training & validation; ODIR: Ocular Disease Intelligent Recognition; NDRD: New Diabetic Retinopathy Dataset; APTOS: Asia Pacific Tele-Ophthalmology Society; ADF: anisotropic diffusion filter; AFU-NET: attention-based fusion network; DA: data augmentation

From Table 1, it is observed that the APTOS 2019, Diabetic Retinopathy Dataset, IDRiD, and EyePACS datasets were used in most studies, with binary and multi-class classification performed by pre-trained networks tuned via transfer learning [35–42]. The highest reported accuracy is 100%, achieved by Guefrachi, S. et al. [39]. It is noted that the interpretability of pre-trained networks improves classification accuracy and the explainability of DR detection [30–42]. Only 3 of the 42 studies (7%) used pre-trained networks purely as feature extractors with separate classifiers; in those studies, SVM, KNN, and PNN classifiers were used to classify features extracted by CNN architectures [1–42]. Only 2 of the 42 studies (approximately 5%) used self-designed CNN architectures to classify diabetic retinopathy images, and only 6 of the 42 used binary classification [35–42]. Gaussian filtering, CLAHE, data augmentation, histogram equalization, scaling, and resizing have been widely used as pre-processing steps [37–42]. According to the literature, noise is removed from an image by convolving it with a Gaussian kernel while maintaining the underlying structures [7]; this is an easy way to reduce Gaussian noise without sacrificing performance [7, 28, 36–42].

In the present work, the pre-trained networks have been divided into three categories: series, DAG (directed acyclic graph), and lightweight. Selecting the right category of network for classifying medical images is crucial. This work provides a platform for classifying DR images into binary classes by selecting the optimal pre-trained network together with its category; accordingly, 20 pre-trained networks have been divided into the three categories and used for binary classification of DR images. Lightweight pre-trained networks provide effective, scalable, and readily available options for deploying deep learning models in resource-constrained situations, whereas DAG and series architectures help manage and optimize complex workflows with explicit dependencies.

In the present work, the optimal denoising algorithm (a Gaussian filter) and the image resizing have been chosen by optimizing the performance evaluation parameters while maintaining the aspect ratio of the DR images. The denoising and resizing methods have been selected from a robust pool of filtering and resizing algorithms. The structure and edge-preserving index (SEPI) [43–50] has been evaluated to measure the performance of the denoising filters, and the Gaussian filter performed outstandingly in DR image classification. The effect of the denoising algorithm has been evaluated based on the classification performance parameters used in this work, and its impact is reported.

Workflow adopted for classification of diabetic retinopathy images

The workflow adopted in this work for classifying diabetic retinopathy images using fine-tuned pre-trained networks with a transfer learning approach is presented in Fig. 2.

Fig. 2 The workflow adopted in this work for the classification of diabetic retinopathy images using fine-tuned pre-trained networks with the transfer learning approach

The exhaustive literature review shows that very few studies report binary classification of DR images, i.e., differentiating retinas into two classes: no sign of diabetic retinopathy and signs of diabetic retinopathy [1–42]. It is also concluded that the feature engineering step is eliminated by deep learning and pre-trained networks [44], and that hand-crafted feature engineering yields significantly poorer results than deep learning. The conventional feature extraction and selection process is replaced by a convolution-based network followed by ReLU, normalization, and max pooling [43–47].
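As an illustration of such a block, the following minimal PyTorch sketch (an assumed framework; the paper does not name one) chains convolution, ReLU, normalization, and max pooling; the channel sizes are hypothetical:

```python
import torch.nn as nn

# Illustrative convolutional block of the kind that replaces hand-crafted
# feature engineering: convolution -> ReLU -> normalization -> max pooling.
# The channel counts (3 in, 32 out) are placeholders, not the paper's values.
conv_block = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1),  # learn local features
    nn.ReLU(inplace=True),                       # non-linear activation
    nn.BatchNorm2d(32),                          # normalize activations
    nn.MaxPool2d(kernel_size=2),                 # down-sample spatially
)
```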

From Fig. 2, it is noted that the classification of DR images involves the following steps: (a) benchmark diabetic retinopathy dataset collection, (b) data pre-processing and preparation, (c) dataset splitting into training, validation, and testing sets, (d) data augmentation, (e) selection of a pre-trained network, (f) network training, (g) hyperparameter tuning using transfer learning, (h) evaluation parameters, (i) validation and interpretation of the trained model, and (j) development and deployment.

Original benchmark diabetic retinopathy dataset collection

The dataset for diabetic retinopathy images has been collected from three benchmark datasets consisting of images annotated by experienced ophthalmologists and marked as healthy or unhealthy (affected by diabetic disease) retinal images [51–53]. Sample images of diabetic retinopathy at different stages are shown in Fig. 3.

Fig. 3 Diabetic retinopathy stages: (a) normal retina, (b) mild diabetic retinopathy, (c) moderate diabetic retinopathy, (d) severe diabetic retinopathy, (e) proliferative diabetic retinopathy, (f) macular edema

The first dataset, the Diabetic Retinopathy Dataset (DRD-EyePACS) [51], is available on Kaggle and consists of 2750 DR images: 1000 belonging to the healthy class and 1750 to the unhealthy class. All images are 256 × 256 pixels.

The second dataset, IDRiD (Indian Diabetic Retinopathy Image Dataset) [52], consists of 516 DR images: 168 belonging to the healthy class and 348 to the unhealthy class (Table 2). All images are 4288 × 2848 pixels.

The third dataset, APTOS (Asia Pacific Tele-Ophthalmology Society)-2019 [53], contains 3662 DR images collected from many participants in rural India. The dataset was prepared by India's Aravind Eye Hospital [35–42]. The fundus images were captured over a long period under various settings and conditions. Medical professionals examined and labeled the samples according to the International Clinical Diabetic Retinopathy Disease Severity Scale (ICDRDSS). The dataset consists of 1805 healthy and 1857 unhealthy samples. The details of the diabetic retinopathy datasets are shown in Table 2.

Table 2.

The details of the diabetic retinopathy datasets

Class | DRD-EyePACS Dataset [51] | IDRiD Dataset [52] | APTOS-2019 Dataset [53] | Combined Dataset
Healthy (no diabetic retinopathy) | 1000 | 168 | 1805 | 2973
Unhealthy (diabetic retinopathy) | 1750 | 348 | 1857 | 3955
Total | 2750 | 516 | 3662 | 6928

As Table 2 indicates, four experiments were performed in this work. The DRD-EyePACS, IDRiD, and APTOS-2019 datasets were used for experiments 1, 2, and 3, respectively, while experiment 4 combined all three datasets into a single robust dataset. Because no single dataset used by medical professionals is robust enough on its own, all datasets were merged into two classes to form a collective dataset of 6928 DR images: 2973 in the healthy class and 3955 in the unhealthy class.

Data pre-processing and preparation module

In the present work, the data pre-processing and preparation module has been divided into four steps: (a) resizing of DR images, (b) image enhancement using a suitable denoising algorithm, (c) data augmentation, and (d) dataset splitting into training, validation, and testing sets.

Resizing of DR images

The exhaustive literature review concludes that resizing DR images is essential for the analysis and classification of diabetic retinopathy, since deep learning-based pre-trained networks require a standard input size. To meet this requirement, the images have been resized to 256 × 256 in this work while maintaining the aspect ratio.
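A minimal sketch of such aspect-ratio-preserving resizing, assuming OpenCV and zero-padding of the shorter side (the paper does not state its exact resizing implementation):

```python
import cv2
import numpy as np

def resize_keep_aspect(img: np.ndarray, size: int = 256) -> np.ndarray:
    """Resize a colour image to size x size, preserving the aspect ratio
    by scaling the longer side to `size` and padding the rest with black."""
    h, w = img.shape[:2]
    scale = size / max(h, w)
    resized = cv2.resize(img, (int(round(w * scale)), int(round(h * scale))))
    canvas = np.zeros((size, size, 3), dtype=resized.dtype)
    y0 = (size - resized.shape[0]) // 2
    x0 = (size - resized.shape[1]) // 2
    canvas[y0:y0 + resized.shape[0], x0:x0 + resized.shape[1]] = resized
    return canvas
```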

Image enhancement using suitable denoising algorithm

The present work has enhanced DR image quality in a pre-processing step. The exhaustive literature review shows that denoising, contrast adjustment, and sharpening are widely used to enhance the quality of DR images [1–42]. In the present work, an exhaustive pool of filtering and enhancement algorithms was drawn from the (a) linear, (b) non-linear, (c) edge-preserving, and (d) contrast-enhancement categories [7, 35–42]. The Gaussian filter outperformed the algorithms of all categories, so it was used to pre-process the DR images [7].
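In OpenCV, for example, the Gaussian filtering step can be sketched as follows (the kernel size and sigma here are assumptions; the paper does not report its filter parameters):

```python
import cv2

img = cv2.imread("fundus_image.png")          # hypothetical input file
# 5x5 Gaussian kernel; sigma=0 lets OpenCV derive it from the kernel size.
denoised = cv2.GaussianBlur(img, (5, 5), 0)
cv2.imwrite("fundus_image_denoised.png", denoised)
```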

Dataset splitting for training, validation and testing set

Dataset splitting is an essential step in deep learning model development to evaluate the performance of the networks. Initially, the dataset was divided into training and testing subsets, and then the training set was split into validation data and training data. A brief description of diabetic retinopathy datasets splitting for training, validation, and testing set is shown in Table 3.

Table 3.

A brief description of the diabetic retinopathy dataset splits for training and testing

Class | DRD-EyePACS (Train / Test) | IDRiD (Train / Test) | APTOS-2019 (Train / Test) | Combined (Train / Test)
Healthy (no diabetic retinopathy) | 800 / 200 | 134 / 50 | 1605 / 200 | 2505 / 450
Unhealthy (diabetic retinopathy) | 1550 / 200 | 298 / 50 | 1657 / 200 | 3487 / 450
Total DR images | 2350 / 400 | 416 / 100 | 3262 / 400 | 5992 / 900
Dataset total | 2750 | 516 | 3662 | 6928

Table 3 reveals that the training data contains the largest portion of the set; the network learns patterns and relationships from this data. The deep learning hyperparameters are fine-tuned, and training performance is evaluated, using the validation set, which helps prevent overfitting and measures the network's performance at each epoch. The trained model's final performance is calculated on the testing set, which acts as a neutral gauge of how the model behaves on unseen data. The dataset is typically divided into three subsets in ratios chosen according to the dataset's size and the problem's specific requirements.
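A minimal sketch of such a two-stage stratified split with scikit-learn (the 80/10/10 ratios and helper name are illustrative; the paper's actual counts are those of Table 3):

```python
from sklearn.model_selection import train_test_split

def split_dataset(paths, labels, val_frac=0.10, test_frac=0.10, seed=42):
    """Hold out a test set first, then carve a validation set out of the
    remaining training data, stratifying on the class labels each time."""
    train_val_p, test_p, train_val_y, test_y = train_test_split(
        paths, labels, test_size=test_frac, stratify=labels, random_state=seed)
    rel_val = val_frac / (1.0 - test_frac)   # fraction of the remainder
    train_p, val_p, train_y, val_y = train_test_split(
        train_val_p, train_val_y, test_size=rel_val,
        stratify=train_val_y, random_state=seed)
    return (train_p, train_y), (val_p, val_y), (test_p, test_y)
```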

Data augmentation

Balancing a dataset for diabetic retinopathy using data augmentation involves creating additional samples of the minority class (e.g., severe diabetic retinopathy) to match the number of samples in the majority class (e.g., no diabetic retinopathy) [33–42]. Data augmentation techniques generate new samples by transforming existing images while preserving their semantic content [40]. The DR image dataset has been randomly shuffled so that each sample type is represented in each subset [41]. To overcome class imbalance, data augmentation balanced the training and validation data and improved network performance [30–40]. In the present work, data augmentation has been applied to the training set of all three datasets, bringing the healthy and unhealthy classes to approximately 4000 images each per dataset. The combined dataset was then prepared by merging all classes into a robust dataset for classifying DR images. The multipliers used in the data augmentation operations for each dataset are shown in Table 4.

Table 4.

The multipliers used in the data augmentation operations for each dataset

Class | DRD-EyePACS (Train / Test) | IDRiD (Train / Test) | APTOS-2019 (Train / Test) | Combined (Train / Test)
Healthy (no diabetic retinopathy) | 800 × 5 = 4000 / 200 | 118 × 34 = 4012 / 50 | 1605 × 2.5 = 4012 / 200 | 4000 + 4012 + 4012 = 12,024 / 450
Unhealthy (diabetic retinopathy) | 1550 × 2.6 = 4030 / 200 | 298 × 13.5 = 4023 / 50 | 1657 × 2.40 = 3976 / 200 | 4030 + 4023 + 3976 = 12,029 / 450
Total | 8030 / 400 | 8035 / 100 | 7988 / 400 | 24,053 / 900

The dataset was randomly shuffled to ensure each subset contains a representative sample of the entire dataset, which reduces the risk of bias toward any particular subgroup. In classification tasks, preserving the class distribution between subsets is essential, particularly when working with imbalanced datasets. In the present work, the class subsets have been balanced using data augmentation, ensuring that each subset has an equal distribution of the diabetic retinopathy classes; a sketch of this augmentation-based balancing is given below.
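The following sketch illustrates this balancing with the rotation and flip operations of Table 5 (translation is omitted, and the helper name and target count are hypothetical):

```python
import itertools
import os
import random
from PIL import Image, ImageOps

# Rotation and flipping transforms of the kind listed in Table 5.
AUG_OPS = [
    lambda im: im.rotate(90, expand=True),  # 90-degree rotation
    lambda im: im.rotate(180),              # 180-degree rotation
    ImageOps.mirror,                        # horizontal flip
    ImageOps.flip,                          # vertical flip
]

def balance_class(image_paths, target_count, out_dir):
    """Generate augmented copies of a class until it reaches target_count
    images (e.g., roughly 4000 per class per dataset, as in Table 4)."""
    os.makedirs(out_dir, exist_ok=True)
    needed = target_count - len(image_paths)
    for i, path in enumerate(itertools.cycle(image_paths)):
        if i >= needed:
            break
        augmented = random.choice(AUG_OPS)(Image.open(path))
        augmented.save(os.path.join(out_dir, f"aug_{i}.png"))
```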

The number of augmentation transform operations used as a multiplier for different images is shown in Table 5.

Table 5.

The number of augmentation transform operations used as multipliers for different images

Dataset | Class | Rotation 90° | Rotation 180° | Flip H | Flip V | 90°H | 180°H | 90°V | 180°V | Translation | Multiplier | No. of Images | Total Images
DRD-EyePACS (Dataset 1) | Healthy | × | × | × | × | – | – | – | – | – | 5 | 800 | 4000
IDRiD (Dataset 2) | Healthy | – | – | – | – | – | – | – | – | −13 to 13 | 34 | 118 | 4012
APTOS-2019 (Dataset 3) | Healthy | × | × | × | × | × | × | – | – | 0.5 × (−1) | 2.5 | 1605 | 4012
DRD-EyePACS (Dataset 1) | Unhealthy | × | × | × | × | × | × | – | – | 0.6 × (−1) | 2.6 | 1550 | 4030
IDRiD (Dataset 2) | Unhealthy | – | – | – | – | – | – | – | – | −3 to 3 | 13.5 | 298 | 4023
APTOS-2019 (Dataset 3) | Unhealthy | × | × | × | × | × | × | – | – | 0.4 × (−1) | 2.4 | 1657 | 3976

V: vertical flip, H: horizontal flip

Splitting the diabetic retinopathy dataset into training, testing, and validation sets

The first step in developing a deep learning-based classification model is to divide the DR Dataset into training, testing, and validation sets to ensure the model performs efficiently when applied to new data. Table 6 provides a brief description of the dataset after data augmentation was applied to diabetic retinopathy images, with balanced training, validation, and testing sets, as well as the combined dataset.

Table 6.

A brief description of the dataset after data augmentation was applied to the diabetic retinopathy images, with balanced training, validation, and testing sets

[Table 6 appears as an image in the original publication.]

Table 6 shows that the diabetic retinopathy datasets are considered balanced when the training, validation, and testing sets contain the same number of samples in each class; the combined dataset has been generated accordingly. In deep learning applications, particularly classification, a balanced dataset is important to prevent bias toward the majority class and to guarantee that all classes are represented during training. The performance on the diabetic retinopathy datasets has been compared using classification accuracy, and the best combination of dataset and pre-trained network type has been selected. The performance of each pre-trained network has been calculated using the transfer learning approach, and the effect of denoising has been reported.

Selection of pre-trained network and classification module

Selecting a pre-trained network for diabetic retinopathy depends on the type of dataset, the size of the images, and the availability of pre-trained models, and is essential for classifying diabetic retinopathy images. In the present work, the pre-trained networks have been selected from the series, DAG, and lightweight categories; the number of parameters is the criterion used to divide the networks into these categories [44]. Table 7 gives a brief description of each category.

Table 7.

The pre-trained networks selected in each category

S. No. | Pre-Trained Network | Category | Input Image Size | Parameters (Million) | Network Depth
1 | AlexNet | Series | 227 × 227 × 3 | 61.0 | 8
2 | vgg16 | Series | 224 × 224 × 3 | 138 | 16
3 | vgg19 | Series | 224 × 224 × 3 | 144 | 19
4 | darknet19 | Series | 256 × 256 × 3 | 20.8 | 19
5 | darknet53 | Series | 256 × 256 × 3 | 41.6 | 53
6 | inceptionv3 | DAG | 299 × 299 × 3 | 23.9 | 48
7 | densenet201 | DAG | 224 × 224 × 3 | 20.0 | 201
8 | Resnet50 | DAG | 224 × 224 × 3 | 25.6 | 50
9 | Resnet101 | DAG | 224 × 224 × 3 | 44.6 | 101
10 | xception | DAG | 299 × 299 × 3 | 22.9 | 71
11 | inceptionresnetv2 | DAG | 299 × 299 × 3 | 55.9 | 164
12 | nasnetlarge | DAG | 331 × 331 × 3 | 88.9 | –
13 | SqueezeNet | Lightweight | 227 × 227 × 3 | 1.24 | 18
14 | mobilenetv2 | Lightweight | 224 × 224 × 3 | 3.5 | 53
15 | shufflenet | Lightweight | 224 × 224 × 3 | 1.4 | 50
16 | nasnetmobile | Lightweight | 224 × 224 × 3 | 5.3 | –
17 | efficientnetb0 | Lightweight | 224 × 224 × 3 | 5.3 | 82
18 | GoogleNet | Lightweight | 224 × 224 × 3 | 7.0 | 22
19 | googlenet-places365 | Lightweight | 224 × 224 × 3 | 7.0 | 22
20 | resnet18 | Lightweight | 224 × 224 × 3 | 11.7 | 18

Table 7 shows that 5, 7, and 8 pre-trained networks were selected in the series, DAG, and lightweight categories, respectively, for a total of 20. The depth of a pre-trained network is an important factor in classifying DR images.
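For illustration, a subset of these networks could be instantiated with ImageNet weights roughly as follows (PyTorch/torchvision is an assumed framework, and the grouping below simply mirrors Table 7):

```python
import torchvision.models as models

# A subset of the Table 7 networks, grouped by the paper's categories.
NETWORKS = {
    "series": {"alexnet": models.alexnet, "vgg16": models.vgg16,
               "vgg19": models.vgg19},
    "dag": {"inception_v3": models.inception_v3,
            "densenet201": models.densenet201,
            "resnet50": models.resnet50, "resnet101": models.resnet101},
    "lightweight": {"squeezenet": models.squeezenet1_0,
                    "mobilenet_v2": models.mobilenet_v2,
                    "shufflenet": models.shufflenet_v2_x1_0,
                    "googlenet": models.googlenet,
                    "resnet18": models.resnet18},
}

def load_pretrained(category: str, name: str):
    """Instantiate a network from a category with ImageNet weights."""
    return NETWORKS[category][name](weights="DEFAULT")
```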

Assessment parameters used in classification of DR images

The exhaustive literature survey shows that accuracy, sensitivity, specificity, log loss, precision, F-score, overlapping error, boundary-based evaluation, etc., are the performance metrics used by different researchers to evaluate diabetic retinopathy detection algorithms. The present work uses accuracy, sensitivity, specificity, precision, and F1-score to evaluate the classification of DR images. A sample confusion matrix for the classification of DR images is shown in Table 8.

Table 8.

A sample confusion matrix for the classification of DR images

Actual \ Predicted | Healthy (no diabetic retinopathy) | Unhealthy (diabetic retinopathy)
Healthy (no diabetic retinopathy) | TN | FP
Unhealthy (diabetic retinopathy) | FN | TP

True positives (TP) are DR images correctly detected as unhealthy; false negatives (FN) are DR images incorrectly classified as healthy; false positives (FP) are healthy images incorrectly classified as DR; and true negatives (TN) are healthy images correctly classified.

Accuracy is the ratio of the total number of correctly identified to the total number of images taken for the classification. The formula used for the accuracy is shown in Eq. 1.

$$\mathrm{Accuracy}=\frac{TP+TN}{TP+TN+FP+FN}\tag{1}$$

Sensitivity or recall determines the measure of the correctness of patients with DR. The formula used for Sensitivity is shown in Eq. 2.

$$\mathrm{Sensitivity}=\frac{TP}{TP+FN}\tag{2}$$

It is calculated by dividing the number of patients diagnosed with diabetic retinopathy by the total number of affected patients.

Specificity measures whether a person unaffected by DR disease is correctly identified as such. Its formula is shown in Eq. 3, and the formulas for precision and the F-score follow in Eqs. 4 and 5.

$$\mathrm{Specificity}=\frac{TN}{FP+TN}\tag{3}$$

Precision is the ratio of correctly classified DR images to all images classified as DR, so it penalizes healthy images wrongly detected as DR.

$$\mathrm{Precision}=\frac{TP}{FP+TP}\tag{4}$$

The F-Score, also known as the F1-Score, is a metric for how accurate a model is on a given dataset. It is used to evaluate binary classification systems.

$$\text{F-Score}=\frac{2\times \mathrm{Precision}\times \mathrm{Recall}}{\mathrm{Precision}+\mathrm{Recall}}\tag{5}$$

The F1 score is the harmonic mean of precision and recall.
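The five metrics follow directly from the confusion-matrix counts; a small Python sketch (using, as a check, the vgg19 counts later reported in Table 11b):

```python
def metrics_from_confusion(tp: int, tn: int, fp: int, fn: int):
    """Compute the metrics of Eqs. 1-5 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)                    # Eq. 1
    sensitivity = tp / (tp + fn)                                  # Eq. 2 (recall)
    specificity = tn / (fp + tn)                                  # Eq. 3
    precision = tp / (fp + tp)                                    # Eq. 4
    f1 = 2 * precision * sensitivity / (precision + sensitivity)  # Eq. 5
    return accuracy, sensitivity, specificity, precision, f1

# vgg19 on the pre-processed, augmented DRD-EyePACS test set (Table 11b):
# TN=192, FP=8, FN=22, TP=178 -> accuracy 0.925, as reported.
print(metrics_from_confusion(tp=178, tn=192, fp=8, fn=22))
```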

Implementation details and number of experiments

The selected hyperparameters used to implement experiments are shown in Table 9.

Table 9.

The selected hyperparameters for the implementation of experiments

Hyperparameter | Details
Learning rate | 10⁻⁴
Mini-batch size | 32
Maximum epochs | 30
Optimizer | Adam

The machine used for the implementation of experiments

Component | Details
GPU | NVIDIA GeForce RTX 4060, 8 GB, 3072 CUDA cores
Processor | 12th Gen Intel Core i7
Operating system | Windows 11 Home
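A minimal PyTorch sketch of the transfer-learning setup with these hyperparameters (the framework and the binary-head replacement are assumptions; the paper reports only the values in Table 9):

```python
import torch
import torch.nn as nn
import torchvision.models as models

device = "cuda" if torch.cuda.is_available() else "cpu"

# ResNet101 with ImageNet weights; replace the classifier head for the
# two classes (healthy vs. unhealthy).
model = models.resnet101(weights="DEFAULT")
model.fc = nn.Linear(model.fc.in_features, 2)
model = model.to(device)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # Table 9
criterion = nn.CrossEntropyLoss()

def train(model, train_loader, epochs=30):   # mini-batches of 32 (Table 9)
    model.train()
    for _ in range(epochs):
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
```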

In the present work, four experiments have been performed for each category of pre-trained network. The experiments carried out to classify DR images are listed in Table 10.

Table 10.

Experiments performed in the classification of DR images

Experiment | Details | Datasets
Experiment 1 | Original DR images without augmentation | DRD-EyePACS, IDRiD, APTOS-2019, and combined dataset
Experiment 2 | Original DR images with augmentation | DRD-EyePACS, IDRiD, APTOS-2019, and combined dataset
Experiment 3 | Pre-processed DR images without augmentation | DRD-EyePACS, IDRiD, APTOS-2019, and combined dataset
Experiment 4 | Pre-processed DR images with augmentation | DRD-EyePACS, IDRiD, APTOS-2019, and combined dataset

Performance evaluation of experiments

In the present work, 20 pre-trained networks have been used to evaluate performance on the three DR image datasets and on their combination, based on the healthy and unhealthy classes. The pre-trained networks have been divided into the series, DAG, and lightweight categories, and accuracy, sensitivity, specificity, precision, and F1-score have been calculated. Original and pre-processed DR images, with and without augmentation, have been used to measure the performance of each pre-trained network category. The performance in each case is as follows:

Performance evaluation metrics using DRD-EyePACS dataset

Several factors, such as the choice of pre-trained network, data pre-processing, fine-tuning with optimal hyperparameters, validation data, and evaluation metrics, are taken into consideration while evaluating the performance of the series, DAG, and lightweight network architecture categories on the DRD-EyePACS Dataset.

Performance of series-based network architectures using DRD-EyePACS dataset

Table 11 shows the performance of series-based networks with and without augmentation using original and pre-processed DR images of the DRD-EyePACS Dataset.

Table 11.

The performance of series-based networks with and without augmentation using original and pre-processed DR images of the DRD-EyePACS Dataset

(a) Using original DRD-EyePACS Dataset images

Network | CM without augmentation | ACC % | Sen | Sp | Pr | F1 | CM after augmentation | ACC % | Sen | Sp | Pr | F1
AlexNet | [136 64; 76 124] | 65 | 0.62 | 0.68 | 0.66 | 0.64 | [176 24; 36 164] | 85 | 0.82 | 0.88 | 0.87 | 0.85
vgg16 | [140 60; 70 130] | 67.5 | 0.65 | 0.70 | 0.68 | 0.67 | [180 20; 36 164] | 86 | 0.82 | 0.90 | 0.89 | 0.85
vgg19 | [152 48; 60 140] | 73 | 0.70 | 0.76 | 0.74 | 0.72 | [186 14; 28 172] | 89.5 | 0.86 | 0.93 | 0.92 | 0.89
darknet19 | [130 70; 66 134] | 66 | 0.67 | 0.65 | 0.66 | 0.66 | [182 18; 36 164] | 86.5 | 0.82 | 0.91 | 0.90 | 0.86
darknet53 | [144 56; 60 140] | 71 | 0.70 | 0.72 | 0.71 | 0.71 | [176 24; 28 172] | 87 | 0.86 | 0.88 | 0.88 | 0.87

(b) Using pre-processed DRD-EyePACS Dataset images

Network | CM without augmentation | ACC % | Sen | Sp | Pr | F1 | CM after augmentation | ACC % | Sen | Sp | Pr | F1
AlexNet | [140 60; 64 136] | 69 | 0.68 | 0.70 | 0.69 | 0.69 | [178 22; 30 170] | 87 | 0.85 | 0.89 | 0.89 | 0.87
vgg16 | [150 50; 66 134] | 71 | 0.67 | 0.75 | 0.73 | 0.70 | [182 18; 32 168] | 87.5 | 0.84 | 0.91 | 0.90 | 0.87
vgg19* | [152 48; 52 148] | 75 | 0.74 | 0.76 | 0.76 | 0.75 | [192 8; 22 178] | 92.5 | 0.89 | 0.96 | 0.96 | 0.92
darknet19 | [144 56; 68 132] | 69 | 0.66 | 0.72 | 0.70 | 0.68 | [176 24; 36 164] | 85 | 0.82 | 0.88 | 0.87 | 0.85
darknet53 | [156 44; 64 136] | 73 | 0.68 | 0.78 | 0.76 | 0.72 | [186 14; 28 172] | 89.5 | 0.86 | 0.93 | 0.92 | 0.89

Confusion matrices (CM) are given as [TN FP; FN TP] (cf. Table 8); * best result

Table 11 shows that the vgg19 network achieved the highest classification accuracy in the series category on the DRD-EyePACS Dataset: 92.5% after augmentation. The vgg19-based network also achieved a sensitivity of 0.89, specificity of 0.96, precision of 0.96, and F1-score of 0.92, with individual class accuracy (ICA) of 192 correctly classified healthy images and 178 correctly classified unhealthy images. The best result is marked with an asterisk in Table 11.

Performance of DAG-based network architectures using DRD-EyePACS dataset

Table 12 shows the performance of DAG-based networks with and without augmentation using original and pre-processed DR images.

Table 12.

The performance of DAG-based networks with and without augmentation using original and pre-processed DR images of the DRD-EyePACS Dataset

(a) Using original DRD-EyePACS Dataset images

Network | CM without augmentation | ACC % | Sen | Sp | Pr | F1 | CM after augmentation | ACC % | Sen | Sp | Pr | F1
inceptionv3 | [138 62; 68 132] | 67.5 | 0.66 | 0.69 | 0.68 | 0.67 | [180 20; 34 166] | 86.5 | 0.83 | 0.90 | 0.89 | 0.86
densenet201 | [152 48; 74 126] | 69.5 | 0.63 | 0.76 | 0.72 | 0.67 | [184 16; 34 166] | 87.5 | 0.83 | 0.92 | 0.91 | 0.87
Resnet50 | [152 48; 62 138] | 72.5 | 0.69 | 0.76 | 0.74 | 0.72 | [182 18; 28 172] | 88.5 | 0.86 | 0.91 | 0.91 | 0.88
Resnet101 | [154 46; 56 144] | 74.5 | 0.72 | 0.77 | 0.76 | 0.74 | [190 10; 24 176] | 91.5 | 0.88 | 0.95 | 0.95 | 0.91
xception | [152 48; 68 132] | 71 | 0.66 | 0.76 | 0.73 | 0.69 | [184 16; 36 164] | 87 | 0.82 | 0.92 | 0.91 | 0.86
inceptionresnetv2 | [150 50; 74 126] | 69 | 0.63 | 0.75 | 0.72 | 0.67 | [180 20; 40 160] | 85 | 0.80 | 0.90 | 0.89 | 0.84
nasnetlarge | [148 52; 74 126] | 68.5 | 0.63 | 0.74 | 0.71 | 0.67 | [188 12; 34 166] | 88.5 | 0.83 | 0.94 | 0.93 | 0.88

(b) Using pre-processed DRD-EyePACS Dataset images

Network | CM without augmentation | ACC % | Sen | Sp | Pr | F1 | CM after augmentation | ACC % | Sen | Sp | Pr | F1
inceptionv3 | [152 48; 62 138] | 72.5 | 0.69 | 0.76 | 0.74 | 0.72 | [184 16; 30 170] | 88.5 | 0.85 | 0.92 | 0.91 | 0.88
densenet201 | [154 46; 60 140] | 73.5 | 0.70 | 0.77 | 0.75 | 0.73 | [184 16; 28 172] | 89 | 0.86 | 0.92 | 0.91 | 0.89
Resnet50 | [158 42; 58 142] | 75 | 0.71 | 0.79 | 0.77 | 0.74 | [188 12; 22 178] | 91.5 | 0.89 | 0.94 | 0.94 | 0.91
Resnet101* | [162 38; 52 148] | 77.5 | 0.74 | 0.81 | 0.80 | 0.77 | [194 6; 20 180] | 93.5 | 0.90 | 0.97 | 0.97 | 0.93
xception | [150 50; 62 138] | 72 | 0.69 | 0.75 | 0.73 | 0.71 | [182 18; 24 176] | 89.5 | 0.88 | 0.91 | 0.91 | 0.89
inceptionresnetv2 | [148 52; 68 132] | 70 | 0.66 | 0.74 | 0.72 | 0.69 | [184 16; 28 172] | 89 | 0.86 | 0.92 | 0.91 | 0.89
nasnetlarge | [142 58; 64 136] | 69.5 | 0.68 | 0.71 | 0.70 | 0.69 | [176 24; 30 170] | 86.5 | 0.85 | 0.88 | 0.88 | 0.86

Confusion matrices (CM) are given as [TN FP; FN TP] (cf. Table 8); * best result

From Table 12, it is concluded that the Resnet101 network achieved the highest classification accuracy in the DAG category on the DRD-EyePACS Dataset: 93.5% after augmentation. The Resnet101-based network also achieved a sensitivity of 0.90, specificity of 0.97, precision of 0.97, and F1-score of 0.93, with individual class accuracy (ICA) of 194 correctly classified healthy images and 180 correctly classified unhealthy images. The best result is marked with an asterisk in Table 12.

Performance of lightweight-based network architectures using DRD-EyePACS dataset

Table 13 shows the performance of lightweight networks with and without augmentation using original and pre-processed DR images of the DRD-EyePACS Dataset.

Table 13.

The performance of lightweight networks with and without augmentation using original and pre-processed DR images of the DRD-EyePACS Dataset

(a) Using original DRD-EyePACS Dataset images

Network | CM without augmentation | ACC % | Sen | Sp | Pr | F1 | CM after augmentation | ACC % | Sen | Sp | Pr | F1
SqueezeNet | [142 58; 64 136] | 69.5 | 0.68 | 0.71 | 0.70 | 0.69 | [178 22; 38 162] | 85 | 0.81 | 0.89 | 0.88 | 0.84
mobilenetv2 | [154 46; 66 134] | 72 | 0.67 | 0.77 | 0.74 | 0.71 | [180 20; 30 170] | 87.5 | 0.85 | 0.90 | 0.89 | 0.87
shufflenet | [158 42; 56 144] | 75.5 | 0.72 | 0.79 | 0.77 | 0.75 | [188 12; 24 176] | 91 | 0.88 | 0.94 | 0.94 | 0.91
nasnetmobile | [152 48; 64 136] | 72 | 0.68 | 0.76 | 0.74 | 0.71 | [180 20; 26 174] | 88.5 | 0.87 | 0.90 | 0.90 | 0.88
efficientnetb0 | [140 60; 72 128] | 67 | 0.64 | 0.70 | 0.68 | 0.66 | [176 24; 36 164] | 85 | 0.82 | 0.88 | 0.87 | 0.85
GoogleNet | [138 62; 80 120] | 64.5 | 0.60 | 0.69 | 0.66 | 0.63 | [170 30; 36 164] | 83.5 | 0.82 | 0.85 | 0.85 | 0.83
googlenet-places365 | [150 50; 66 134] | 71 | 0.67 | 0.75 | 0.73 | 0.70 | [182 18; 34 166] | 87 | 0.83 | 0.91 | 0.90 | 0.86
resnet18 | [142 58; 70 130] | 68 | 0.65 | 0.71 | 0.69 | 0.67 | [176 24; 14 186] | 90.5 | 0.93 | 0.88 | 0.89 | 0.91

(b) Using pre-processed DRD-EyePACS Dataset images

Network | CM without augmentation | ACC % | Sen | Sp | Pr | F1 | CM after augmentation | ACC % | Sen | Sp | Pr | F1
SqueezeNet | [150 50; 66 134] | 71 | 0.67 | 0.75 | 0.73 | 0.70 | [180 20; 30 170] | 87.5 | 0.85 | 0.90 | 0.89 | 0.87
mobilenetv2 | [158 42; 62 138] | 74 | 0.69 | 0.79 | 0.77 | 0.73 | [186 14; 24 176] | 90.5 | 0.88 | 0.93 | 0.93 | 0.90
shufflenet* | [160 40; 54 146] | 76.5 | 0.73 | 0.80 | 0.78 | 0.76 | [192 8; 14 186] | 94.5 | 0.93 | 0.96 | 0.96 | 0.94
nasnetmobile | [158 42; 62 138] | 74 | 0.69 | 0.79 | 0.77 | 0.73 | [184 16; 22 178] | 90.5 | 0.89 | 0.92 | 0.92 | 0.90
efficientnetb0 | [150 50; 66 134] | 71 | 0.67 | 0.75 | 0.73 | 0.70 | [178 22; 32 168] | 86.5 | 0.84 | 0.89 | 0.88 | 0.86
GoogleNet | [142 58; 70 130] | 68 | 0.65 | 0.71 | 0.69 | 0.67 | [180 20; 36 164] | 86 | 0.82 | 0.90 | 0.89 | 0.85
googlenet-places365 | [152 48; 62 138] | 72.5 | 0.69 | 0.76 | 0.74 | 0.72 | [182 18; 24 176] | 89.5 | 0.88 | 0.91 | 0.91 | 0.89
resnet18 | [152 48; 72 128] | 70 | 0.64 | 0.76 | 0.73 | 0.68 | [176 24; 30 170] | 86.5 | 0.85 | 0.88 | 0.88 | 0.86

Confusion matrices (CM) are given as [TN FP; FN TP] (cf. Table 8); * best result

From Table 13, it is concluded that the shufflenet network achieved the highest classification accuracy in the lightweight category on the DRD-EyePACS Dataset: 94.5% after augmentation. The shufflenet-based network also achieved a sensitivity of 0.93, specificity of 0.96, precision of 0.96, and F1-score of 0.94, with individual class accuracy (ICA) of 192 correctly classified healthy images and 186 correctly classified unhealthy images. The best result is marked with an asterisk in Table 13.

Performance evaluation metrics using IDRiD Dataset

Several factors, such as the choice of pre-trained network, data pre-processing, fine-tuning with optimal hyperparameters, validation data, and evaluation metrics, are taken into consideration while evaluating the performance of the series, DAG, and lightweight network architecture categories on the IDRiD Dataset.

Performance of series-based network architectures using IDRiD dataset

The performance of series-based networks with and without augmentation using original and pre-processed DR images of the IDRiD Dataset is shown in Table 14.

Table 14.

The performance of series-based networks with and without augmentation using original and pre-processed DR images of the IDRiD Dataset

(a) Using original IDRiD Dataset images

Network | CM without augmentation | ACC % | Sen | Sp | Pr | F1 | CM after augmentation | ACC % | Sen | Sp | Pr | F1
AlexNet | [42 8; 28 22] | 64 | 0.44 | 0.84 | 0.73 | 0.55 | [46 4; 11 39] | 85 | 0.78 | 0.92 | 0.91 | 0.84
vgg16 | [43 7; 26 24] | 67 | 0.48 | 0.86 | 0.77 | 0.59 | [46 4; 10 40] | 86 | 0.80 | 0.92 | 0.91 | 0.85
vgg19 | [42 8; 20 30] | 72 | 0.60 | 0.84 | 0.79 | 0.68 | [47 3; 7 43] | 90 | 0.86 | 0.94 | 0.93 | 0.90
darknet19 | [39 11; 27 23] | 62 | 0.46 | 0.78 | 0.68 | 0.55 | [44 6; 10 40] | 84 | 0.80 | 0.88 | 0.87 | 0.83
darknet53 | [41 9; 20 30] | 71 | 0.60 | 0.82 | 0.77 | 0.67 | [46 4; 9 41] | 87 | 0.82 | 0.92 | 0.91 | 0.86

(b) Using pre-processed IDRiD Dataset images

Network | CM without augmentation | ACC % | Sen | Sp | Pr | F1 | CM after augmentation | ACC % | Sen | Sp | Pr | F1
AlexNet | [45 5; 22 28] | 68 | 0.56 | 0.90 | 0.85 | 0.67 | [47 3; 10 40] | 87 | 0.80 | 0.94 | 0.93 | 0.86
vgg16 | [46 4; 25 25] | 71 | 0.50 | 0.92 | 0.86 | 0.63 | [47 3; 9 41] | 88 | 0.82 | 0.94 | 0.93 | 0.87
vgg19* | [45 5; 22 28] | 73 | 0.56 | 0.90 | 0.85 | 0.67 | [48 2; 6 44] | 92 | 0.88 | 0.96 | 0.96 | 0.92
darknet19 | [38 12; 22 28] | 66 | 0.56 | 0.76 | 0.70 | 0.62 | [46 4; 8 42] | 88 | 0.84 | 0.92 | 0.91 | 0.88
darknet53 | [46 4; 24 26] | 72 | 0.52 | 0.92 | 0.87 | 0.65 | [45 5; 6 44] | 89 | 0.88 | 0.90 | 0.90 | 0.89

Confusion matrices (CM) are given as [TN FP; FN TP] (cf. Table 8); * best result

Table 14 shows that the vgg19 network achieved the highest classification accuracy in the series category on the IDRiD Dataset: 92% after augmentation. The vgg19-based network also achieved a sensitivity of 0.88, specificity of 0.96, precision of 0.96, and F1-score of 0.92, with individual class accuracy (ICA) of 48 correctly classified healthy images and 44 correctly classified unhealthy images. The best result is marked with an asterisk in Table 14.

Performance of DAG-based network architectures using IDRiD dataset

The performance of DAG-based networks with and without augmentation using original and pre-processed DR images of the IDRiD Dataset is shown in Table 15.

Table 15.

The performance of DAG-based networks with and without augmentation using original and pre-processed DR images of the IDRiD Dataset

(a) Using original IDRiD Dataset images

Network | CM without augmentation | ACC % | Sen | Sp | Pr | F1 | CM after augmentation | ACC % | Sen | Sp | Pr | F1
inceptionv3 | [40 10; 23 27] | 67 | 0.54 | 0.80 | 0.73 | 0.62 | [46 4; 8 42] | 88 | 0.84 | 0.92 | 0.91 | 0.88
densenet201 | [42 8; 25 25] | 69 | 0.50 | 0.84 | 0.76 | 0.60 | [45 5; 8 42] | 87 | 0.84 | 0.90 | 0.89 | 0.87
Resnet50 | [41 9; 21 29] | 70 | 0.58 | 0.82 | 0.76 | 0.66 | [47 3; 9 41] | 88 | 0.82 | 0.94 | 0.93 | 0.87
Resnet101 | [44 6; 22 28] | 72 | 0.56 | 0.88 | 0.82 | 0.67 | [47 3; 7 43] | 90 | 0.86 | 0.94 | 0.93 | 0.90
xception | [43 7; 24 26] | 69 | 0.52 | 0.86 | 0.79 | 0.63 | [42 8; 7 43] | 85 | 0.86 | 0.84 | 0.84 | 0.85
inceptionresnetv2 | [38 12; 21 29] | 67 | 0.58 | 0.76 | 0.71 | 0.64 | [45 5; 11 39] | 84 | 0.78 | 0.90 | 0.89 | 0.83
nasnetlarge | [37 13; 19 31] | 68 | 0.62 | 0.74 | 0.70 | 0.66 | [46 4; 10 40] | 86 | 0.80 | 0.92 | 0.91 | 0.85

(b) Using pre-processed IDRiD Dataset images

Network | CM without augmentation | ACC % | Sen | Sp | Pr | F1 | CM after augmentation | ACC % | Sen | Sp | Pr | F1
inceptionv3 | [41 9; 23 27] | 68 | 0.54 | 0.82 | 0.75 | 0.63 | [46 4; 7 43] | 89 | 0.86 | 0.92 | 0.91 | 0.89
densenet201 | [43 7; 22 28] | 71 | 0.56 | 0.86 | 0.80 | 0.66 | [47 3; 7 43] | 90 | 0.86 | 0.94 | 0.93 | 0.90
Resnet50 | [43 7; 21 29] | 72 | 0.58 | 0.86 | 0.81 | 0.67 | [45 5; 5 45] | 90 | 0.90 | 0.90 | 0.90 | 0.90
Resnet101* | [44 6; 21 29] | 73 | 0.58 | 0.88 | 0.83 | 0.68 | [47 3; 5 45] | 92 | 0.90 | 0.94 | 0.94 | 0.92
xception | [44 6; 24 26] | 70 | 0.52 | 0.88 | 0.81 | 0.63 | [47 3; 9 41] | 88 | 0.82 | 0.94 | 0.93 | 0.87
inceptionresnetv2 | [40 10; 21 29] | 69 | 0.58 | 0.80 | 0.74 | 0.65 | [43 7; 7 43] | 86 | 0.86 | 0.86 | 0.86 | 0.86
nasnetlarge | [39 11; 19 31] | 70 | 0.62 | 0.78 | 0.74 | 0.67 | [47 3; 9 41] | 88 | 0.82 | 0.94 | 0.93 | 0.87

Confusion matrices (CM) are given as [TN FP; FN TP] (cf. Table 8); * best result

From Table 15, it is concluded that the Resnet101 network achieved the highest classification accuracy in the DAG category on the IDRiD Dataset: 92% after augmentation. The Resnet101-based network also achieved a sensitivity of 0.90, specificity of 0.94, precision of 0.94, and F1-score of 0.92, with individual class accuracy (ICA) of 47 correctly classified healthy images and 45 correctly classified unhealthy images. The best result is marked with an asterisk in Table 15.

Performance of lightweight-based network architectures using IDRiD dataset

Table 16 shows the performance of lightweight networks with and without augmentation using original and pre-processed DR images of the IDRiD Dataset.

Table 16.

The performance of lightweight pre-trained networks with and without augmentation using original and pre-processed DR images of the IDRiD Dataset

(a) Using original IDRiD Dataset images

Network | CM without augmentation | ACC % | Sen | Sp | Pr | F1 | CM after augmentation | ACC % | Sen | Sp | Pr | F1
SqueezeNet | [41 9; 23 27] | 68 | 0.54 | 0.82 | 0.75 | 0.63 | [44 6; 11 39] | 83 | 0.78 | 0.88 | 0.87 | 0.82
mobilenetv2 | [44 6; 21 29] | 73 | 0.58 | 0.88 | 0.83 | 0.68 | [45 5; 6 44] | 89 | 0.88 | 0.90 | 0.90 | 0.89
shufflenet | [44 6; 20 30] | 74 | 0.60 | 0.88 | 0.83 | 0.70 | [46 4; 6 44] | 90 | 0.88 | 0.92 | 0.92 | 0.90
nasnetmobile | [40 10; 21 29] | 69 | 0.58 | 0.80 | 0.74 | 0.65 | [45 5; 7 43] | 88 | 0.86 | 0.90 | 0.90 | 0.88
efficientnetb0 | [38 12; 23 27] | 65 | 0.54 | 0.76 | 0.69 | 0.61 | [41 9; 9 41] | 82 | 0.82 | 0.82 | 0.82 | 0.82
GoogleNet | [39 11; 24 26] | 65 | 0.52 | 0.78 | 0.70 | 0.60 | [43 7; 10 40] | 83 | 0.80 | 0.86 | 0.85 | 0.82
googlenet-places365 | [39 11; 22 28] | 67 | 0.56 | 0.78 | 0.72 | 0.63 | [45 5; 9 41] | 86 | 0.82 | 0.90 | 0.89 | 0.85
resnet18 | [38 12; 22 28] | 66 | 0.56 | 0.76 | 0.70 | 0.62 | [44 6; 10 40] | 84 | 0.80 | 0.88 | 0.87 | 0.83

(b) Using pre-processed IDRiD Dataset images

Network | CM without augmentation | ACC % | Sen | Sp | Pr | F1 | CM after augmentation | ACC % | Sen | Sp | Pr | F1
SqueezeNet | [40 10; 20 30] | 70 | 0.60 | 0.80 | 0.75 | 0.67 | [45 5; 9 41] | 86 | 0.82 | 0.90 | 0.89 | 0.85
mobilenetv2 | [44 6; 20 30] | 74 | 0.60 | 0.88 | 0.83 | 0.70 | [45 5; 5 45] | 90 | 0.90 | 0.90 | 0.90 | 0.90
shufflenet* | [45 5; 20 30] | 75 | 0.60 | 0.90 | 0.86 | 0.71 | [46 4; 5 45] | 91 | 0.90 | 0.92 | 0.92 | 0.91
nasnetmobile | [40 10; 19 31] | 71 | 0.62 | 0.80 | 0.76 | 0.68 | [46 4; 7 43] | 89 | 0.86 | 0.92 | 0.91 | 0.89
efficientnetb0 | [39 11; 23 27] | 66 | 0.54 | 0.78 | 0.71 | 0.61 | [44 6; 10 40] | 84 | 0.80 | 0.88 | 0.87 | 0.83
GoogleNet | [40 10; 22 28] | 68 | 0.56 | 0.80 | 0.74 | 0.64 | [43 7; 8 42] | 85 | 0.84 | 0.86 | 0.86 | 0.85
googlenet-places365 | [40 10; 20 30] | 70 | 0.60 | 0.80 | 0.75 | 0.67 | [45 5; 7 43] | 88 | 0.86 | 0.90 | 0.90 | 0.88
resnet18 | [40 10; 22 28] | 68 | 0.56 | 0.80 | 0.74 | 0.64 | [44 6; 9 41] | 85 | 0.82 | 0.88 | 0.87 | 0.85

Confusion matrices (CM) are given as [TN FP; FN TP] (cf. Table 8); * best result

From Table 16, it is concluded that the shufflenet network achieved the highest classification accuracy in the lightweight category on the IDRiD Dataset: 91% after augmentation. The shufflenet-based network also achieved a sensitivity of 0.90, specificity of 0.92, precision of 0.92, and F1-score of 0.91, with individual class accuracy (ICA) of 46 correctly classified healthy images and 45 correctly classified unhealthy images. The best result is marked with an asterisk in Table 16.

Performance evaluation metrics using the APTOS-2019 Dataset

Several factors, such as the choice of pre-trained network, data pre-processing, fine-tuning with optimal hyperparameters, validation data, and evaluation metrics, are taken into consideration while evaluating the performance of the series, DAG, and lightweight network architecture categories on the APTOS-2019 Dataset.

Performance of series-based network architectures using APTOS-2019 Dataset

The performance of series-based networks with and without augmentation using original and pre-processed DR images of the APTOS-2019 Dataset is shown in Table 17.

Table 17.

The performance of series-based networks with and without augmentation using original and pre-processed DR images of the APTOS-2019 Dataset

(a) Using original APTOS-2019 Dataset images

Network | CM without augmentation | ACC % | Sen | Sp | Pr | F1 | CM after augmentation | ACC % | Sen | Sp | Pr | F1
AlexNet | [140 60; 76 124] | 66 | 0.62 | 0.70 | 0.67 | 0.65 | [172 28; 32 168] | 85 | 0.84 | 0.86 | 0.86 | 0.85
vgg16 | [141 59; 67 133] | 68.5 | 0.67 | 0.71 | 0.69 | 0.68 | [180 20; 30 170] | 87.5 | 0.85 | 0.90 | 0.89 | 0.87
vgg19 | [152 48; 62 138] | 72.5 | 0.69 | 0.76 | 0.74 | 0.72 | [188 12; 22 178] | 91.5 | 0.89 | 0.94 | 0.94 | 0.91
darknet19 | [142 58; 70 130] | 68 | 0.65 | 0.71 | 0.69 | 0.67 | [180 20; 32 168] | 87 | 0.84 | 0.90 | 0.89 | 0.87
darknet53 | [141 59; 73 127] | 67 | 0.64 | 0.71 | 0.68 | 0.66 | [177 23; 31 169] | 86.5 | 0.85 | 0.89 | 0.88 | 0.86

(b) Using pre-processed APTOS-2019 Dataset images

Network | CM without augmentation | ACC % | Sen | Sp | Pr | F1 | CM after augmentation | ACC % | Sen | Sp | Pr | F1
AlexNet | [145 55; 75 125] | 67.5 | 0.63 | 0.73 | 0.69 | 0.66 | [179 21; 33 167] | 86.5 | 0.84 | 0.90 | 0.89 | 0.86
vgg16 | [145 55; 65 135] | 70 | 0.68 | 0.73 | 0.71 | 0.69 | [184 16; 30 170] | 88.5 | 0.85 | 0.92 | 0.91 | 0.88
vgg19* | [153 47; 61 139] | 73 | 0.70 | 0.77 | 0.75 | 0.72 | [191 9; 21 179] | 92.5 | 0.90 | 0.96 | 0.95 | 0.92
darknet19 | [145 55; 67 133] | 69.5 | 0.67 | 0.73 | 0.71 | 0.69 | [182 18; 30 170] | 88 | 0.85 | 0.91 | 0.90 | 0.88
darknet53 | [143 57; 71 129] | 68 | 0.65 | 0.72 | 0.69 | 0.67 | [180 20; 30 170] | 87.5 | 0.85 | 0.90 | 0.89 | 0.87

Confusion matrices (CM) are given as [TN FP; FN TP] (cf. Table 8); * best result

From Table 17, it is concluded that the vgg19 network achieved the highest classification accuracy in the series category on the APTOS-2019 Dataset: 92.5% after augmentation. The vgg19-based network also achieved a sensitivity of 0.90, specificity of 0.96, precision of 0.95, and F1-score of 0.92, with individual class accuracy (ICA) of 191 correctly classified healthy images and 179 correctly classified unhealthy images. The best result is marked with an asterisk in Table 17.

Performance of DAG-based network architectures using APTOS-2019 dataset

Table 18 shows the performance of DAG-based networks with and without augmentation using original and pre-processed DR images of the APTOS-2019 Dataset.

Table 18.

The performance of DAG-based networks with and without augmentation using original and pre-processed DR images of the APTOS-2019 Dataset

(a) Using original APTOS-2019 Dataset images

Network | CM without augmentation | ACC % | Sen | Sp | Pr | F1 | CM after augmentation | ACC % | Sen | Sp | Pr | F1
inceptionv3 | [147 53; 61 139] | 71.5 | 0.70 | 0.74 | 0.72 | 0.71 | [181 19; 37 163] | 86 | 0.82 | 0.91 | 0.90 | 0.85
densenet201 | [145 55; 63 137] | 70.5 | 0.69 | 0.73 | 0.71 | 0.70 | [185 15; 33 167] | 88 | 0.84 | 0.93 | 0.92 | 0.87
Resnet50 | [146 54; 60 140] | 71.5 | 0.70 | 0.73 | 0.72 | 0.71 | [185 15; 25 175] | 90 | 0.88 | 0.93 | 0.92 | 0.90
Resnet101 | [150 50; 54 146] | 74 | 0.73 | 0.75 | 0.74 | 0.74 | [189 11; 23 177] | 91.5 | 0.89 | 0.95 | 0.94 | 0.91
xception | [144 56; 64 136] | 70 | 0.68 | 0.72 | 0.71 | 0.69 | [181 19; 35 165] | 86.5 | 0.83 | 0.91 | 0.90 | 0.86
inceptionresnetv2 | [140 60; 72 128] | 67 | 0.64 | 0.70 | 0.68 | 0.66 | [170 30; 36 164] | 83.5 | 0.82 | 0.85 | 0.85 | 0.83
nasnetlarge | [145 55; 63 137] | 70.5 | 0.69 | 0.73 | 0.71 | 0.70 | [182 18; 32 168] | 87.5 | 0.84 | 0.91 | 0.90 | 0.87

(b) Using pre-processed APTOS-2019 Dataset images

Network | CM without augmentation | ACC % | Sen | Sp | Pr | F1 | CM after augmentation | ACC % | Sen | Sp | Pr | F1
inceptionv3 | [146 54; 56 144] | 72.5 | 0.72 | 0.73 | 0.73 | 0.72 | [185 15; 35 165] | 87.5 | 0.83 | 0.93 | 0.92 | 0.87
densenet201 | [146 54; 60 140] | 71.5 | 0.70 | 0.73 | 0.72 | 0.71 | [187 13; 31 169] | 89 | 0.85 | 0.94 | 0.93 | 0.88
Resnet50 | [151 49; 59 141] | 73 | 0.71 | 0.76 | 0.74 | 0.72 | [187 13; 23 177] | 91 | 0.89 | 0.94 | 0.93 | 0.91
Resnet101* | [152 48; 54 146] | 74.5 | 0.73 | 0.76 | 0.75 | 0.74 | [193 7; 17 183] | 94 | 0.92 | 0.97 | 0.96 | 0.94
xception | [146 54; 62 138] | 71 | 0.69 | 0.73 | 0.72 | 0.70 | [185 15; 33 167] | 88 | 0.84 | 0.93 | 0.92 | 0.87
inceptionresnetv2 | [141 59; 69 131] | 68 | 0.66 | 0.71 | 0.69 | 0.67 | [175 25; 35 165] | 85 | 0.83 | 0.88 | 0.87 | 0.85
nasnetlarge | [147 53; 59 141] | 72 | 0.71 | 0.74 | 0.73 | 0.72 | [187 13; 29 171] | 89.5 | 0.86 | 0.94 | 0.93 | 0.89

Confusion matrices (CM) are given as [TN FP; FN TP] (cf. Table 8); * best result

From Table 18, it is concluded that the Resnet101 network achieved the highest classification accuracy in the DAG category on the APTOS-2019 Dataset: 94% after augmentation. The Resnet101-based network also achieved a sensitivity of 0.92, specificity of 0.97, precision of 0.96, and F1-score of 0.94, with individual class accuracy (ICA) of 193 correctly classified healthy images and 183 correctly classified unhealthy images. The best result is marked with an asterisk in Table 18.

Performance of lightweight-based network architectures using APTOS-2019 dataset

Table 19 shows the performance of lightweight networks with and without augmentation using original and pre-processed DR images of the APTOS-2019 Dataset.

Table 19.

The performance of lightweight networks with and without augmentation using original and pre-processed DR images of the APTOS-2019 Dataset

(a) Using original APTOS-2019 Dataset images

Network | CM without augmentation | ACC % | Sen | Sp | Pr | F1 | CM after augmentation | ACC % | Sen | Sp | Pr | F1
SqueezeNet | [143 57; 61 139] | 70.5 | 0.70 | 0.72 | 0.71 | 0.70 | [180 20; 34 166] | 86.5 | 0.83 | 0.90 | 0.89 | 0.86
mobilenetv2 | [145 55; 61 139] | 71 | 0.70 | 0.73 | 0.72 | 0.71 | [176 24; 28 172] | 87 | 0.86 | 0.88 | 0.88 | 0.87
shufflenet | [149 51; 59 141] | 72.5 | 0.71 | 0.75 | 0.73 | 0.72 | [187 13; 21 179] | 91.5 | 0.90 | 0.94 | 0.93 | 0.91
nasnetmobile | [143 57; 63 137] | 70 | 0.69 | 0.72 | 0.71 | 0.70 | [182 18; 36 164] | 86.5 | 0.82 | 0.91 | 0.90 | 0.86
efficientnetb0 | [135 65; 71 129] | 66 | 0.65 | 0.68 | 0.66 | 0.65 | [170 30; 36 164] | 83.5 | 0.82 | 0.85 | 0.85 | 0.83
GoogleNet | [130 70; 72 128] | 64.5 | 0.64 | 0.65 | 0.65 | 0.64 | [165 35; 38 162] | 81.75 | 0.81 | 0.83 | 0.82 | 0.82
googlenet-places365 | [141 59; 65 135] | 69 | 0.68 | 0.71 | 0.70 | 0.69 | [175 25; 33 167] | 85.5 | 0.84 | 0.88 | 0.87 | 0.85
resnet18 | [136 64; 72 128] | 66 | 0.64 | 0.68 | 0.67 | 0.65 | [174 26; 38 162] | 84 | 0.81 | 0.87 | 0.86 | 0.84

(b) Using pre-processed APTOS-2019 Dataset images

Network | CM without augmentation | ACC % | Sen | Sp | Pr | F1 | CM after augmentation | ACC % | Sen | Sp | Pr | F1
SqueezeNet | [148 52; 60 140] | 72 | 0.70 | 0.74 | 0.73 | 0.71 | [183 17; 33 167] | 87.5 | 0.84 | 0.92 | 0.91 | 0.87
mobilenetv2 | [149 51; 59 141] | 72.5 | 0.71 | 0.75 | 0.73 | 0.72 | [180 20; 22 178] | 89.5 | 0.89 | 0.90 | 0.90 | 0.89
shufflenet* | [153 47; 59 141] | 73.5 | 0.71 | 0.77 | 0.75 | 0.73 | [191 9; 21 179] | 92.5 | 0.90 | 0.96 | 0.95 | 0.92
nasnetmobile | [145 55; 61 139] | 71 | 0.70 | 0.73 | 0.72 | 0.71 | [185 15; 29 171] | 89 | 0.86 | 0.93 | 0.92 | 0.89
efficientnetb0 | [137 63; 69 131] | 67 | 0.66 | 0.69 | 0.68 | 0.66 | [175 25; 33 167] | 85.5 | 0.84 | 0.88 | 0.87 | 0.85
GoogleNet | [135 65; 71 129] | 66 | 0.65 | 0.68 | 0.66 | 0.65 | [172 28; 32 168] | 85 | 0.84 | 0.86 | 0.86 | 0.85
googlenet-places365 | [143 57; 63 137] | 70 | 0.69 | 0.72 | 0.71 | 0.70 | [180 20; 30 170] | 87.5 | 0.85 | 0.90 | 0.89 | 0.87
resnet18 | [139 61; 71 129] | 67 | 0.65 | 0.70 | 0.68 | 0.66 | [175 25; 33 167] | 85.5 | 0.84 | 0.88 | 0.87 | 0.85

Confusion matrices (CM) are given as [TN FP; FN TP] (cf. Table 8); * best result

From Table 19, it is concluded that the shufflenet network achieved the highest classification accuracy in the lightweight category on the APTOS-2019 Dataset: 92.5% after augmentation. The shufflenet-based network also achieved a sensitivity of 0.90, specificity of 0.96, precision of 0.95, and F1-score of 0.92, with individual class accuracy (ICA) of 191 correctly classified healthy images and 179 correctly classified unhealthy images. The best result is marked with an asterisk in Table 19.

Performance evaluation metrics using the combined dataset

Several factors, such as the choice of pre-trained network, data pre-processing, fine-tuning with optimal hyperparameters, validation data, and evaluation metrics, are taken into consideration while evaluating the performance of the series, DAG, and lightweight network architecture categories on the Combined Dataset.

Performance of series-based network architectures using combined dataset

The performance of series-based networks with and without augmentation using original and pre-processed DR images of the Combined Dataset is shown in Table 20.

Table 20.

The performance of series-based networks with and without augmentation using original and pre-processed DR images of the Combined Dataset

(a) Using original Combined Dataset images

Network | CM without augmentation | ACC % | Sen | Sp | Pr | F1 | CM after augmentation | ACC % | Sen | Sp | Pr | F1
AlexNet | [318 132; 180 270] | 65.33 | 0.60 | 0.71 | 0.67 | 0.63 | [415 35; 46 404] | 91 | 0.90 | 0.92 | 0.92 | 0.91
vgg16 | [324 126; 163 287] | 67.88 | 0.64 | 0.72 | 0.69 | 0.67 | [409 41; 49 401] | 90 | 0.89 | 0.91 | 0.91 | 0.90
vgg19 | [346 104; 142 308] | 72.66 | 0.68 | 0.77 | 0.75 | 0.71 | [428 22; 26 424] | 94.66 | 0.94 | 0.95 | 0.95 | 0.95
darknet19 | [311 139; 163 287] | 66.44 | 0.64 | 0.69 | 0.67 | 0.66 | [421 29; 43 407] | 92 | 0.90 | 0.94 | 0.93 | 0.92
darknet53 | [326 124; 153 297] | 69.22 | 0.66 | 0.72 | 0.71 | 0.68 | [410 40; 49 401] | 90.11 | 0.89 | 0.91 | 0.91 | 0.90

(b) Using pre-processed Combined Dataset images

Network | CM without augmentation | ACC % | Sen | Sp | Pr | F1 | CM after augmentation | ACC % | Sen | Sp | Pr | F1
AlexNet | [330 120; 161 289] | 68.77 | 0.64 | 0.73 | 0.71 | 0.67 | [419 31; 41 409] | 92 | 0.91 | 0.93 | 0.93 | 0.92
vgg16 | [341 109; 156 294] | 70.55 | 0.65 | 0.76 | 0.73 | 0.69 | [413 37; 43 407] | 91.11 | 0.90 | 0.92 | 0.92 | 0.91
vgg19* | [350 100; 135 315] | 73.88 | 0.70 | 0.78 | 0.76 | 0.73 | [435 15; 19 431] | 96.22 | 0.96 | 0.97 | 0.97 | 0.96
darknet19 | [327 123; 157 293] | 68.88 | 0.65 | 0.73 | 0.70 | 0.68 | [425 25; 38 412] | 93 | 0.92 | 0.94 | 0.94 | 0.93
darknet53 | [345 105; 159 291] | 70.66 | 0.65 | 0.77 | 0.73 | 0.69 | [421 29; 52 398] | 91 | 0.88 | 0.94 | 0.93 | 0.91

Confusion matrices (CM) are given as [TN FP; FN TP] (cf. Table 8); * best result

Table 20 shows that the vgg19 network in the series category achieved the highest classification accuracy on the Combined Dataset: 96.22% after augmentation of the pre-processed images. The vgg19-based network also achieved a sensitivity of 0.96, a specificity of 0.97, a precision of 0.97, and an F1-score of 0.96. The individual class accuracy (ICA) is 435 correctly classified healthy images and 431 correctly classified unhealthy images, achieved using the series-based network vgg19. The best result is marked with an asterisk.

Performance of DAG-based network architectures using combined dataset

Table 21 shows the performance of DAG-based networks with and without augmentation, using original and pre-processed DR images of the Combined Dataset.

Table 21.

The performance of DAG-based networks with and without augmentation, using original and pre-processed DR images of the Combined Dataset. Confusion matrices (CM) are written as [healthy correct, healthy misclassified; unhealthy misclassified, unhealthy correct]. The best result is marked with an asterisk (*).

(a) Using original Combined Dataset images

Network Name | Without Augmentation: CM, ACC %, Sen, Sp, Pr, F1 | After Augmentation: CM, ACC %, Sen, Sp, Pr, F1
inceptionv3 | [325 125; 152 298], 69.2, 0.66, 0.72, 0.70, 0.68 | [415 35; 59 391], 89.55, 0.87, 0.92, 0.92, 0.89
densenet201 | [339 111; 162 288], 69.6, 0.64, 0.75, 0.72, 0.68 | [414 36; 62 388], 89.11, 0.86, 0.92, 0.92, 0.89
Resnet50 | [339 111; 143 307], 71.7, 0.68, 0.75, 0.73, 0.71 | [414 36; 52 398], 90.22, 0.88, 0.92, 0.92, 0.90
Resnet101 | [348 102; 132 318], 74, 0.71, 0.77, 0.76, 0.73 | [426 24; 34 416], 93.55, 0.92, 0.95, 0.95, 0.93
xception | [339 111; 156 294], 70.3, 0.65, 0.75, 0.73, 0.69 | [417 33; 45 405], 91.33, 0.90, 0.93, 0.92, 0.91
inceptionresnetv2 | [328 122; 167 283], 67.8, 0.63, 0.73, 0.70, 0.66 | [406 44; 61 389], 88.33, 0.86, 0.90, 0.90, 0.88
nasnetlarge | [330 120; 156 294], 69.3, 0.65, 0.73, 0.71, 0.68 | [416 34; 55 395], 90.11, 0.88, 0.92, 0.92, 0.90

(b) Using pre-processed Combined Dataset images

Network Name | Without Augmentation: CM, ACC %, Sen, Sp, Pr, F1 | After Augmentation: CM, ACC %, Sen, Sp, Pr, F1
inceptionv3 | [339 111; 141 309], 72, 0.69, 0.75, 0.74, 0.71 | [415 35; 72 378], 90.88, 0.89, 0.93, 0.93, 0.88
densenet201 | [343 107; 142 308], 72.3, 0.68, 0.76, 0.74, 0.71 | [418 32; 66 384], 91, 0.89, 0.93, 0.93, 0.89
Resnet50 | [352 98; 138 312], 73.7, 0.69, 0.78, 0.76, 0.73 | [420 30; 50 400], 91.44, 0.90, 0.93, 0.93, 0.91
Resnet101 * | [358 92; 127 323], 75.6, 0.72, 0.80, 0.78, 0.75 | [440 10; 14 436], 97.33, 0.97, 0.98, 0.98, 0.97
xception | [340 110; 148 302], 71.3, 0.67, 0.76, 0.73, 0.70 | [414 36; 66 384], 92.44, 0.91, 0.94, 0.93, 0.88
inceptionresnetv2 | [329 121; 158 292], 69, 0.65, 0.73, 0.71, 0.68 | [402 48; 70 380], 89.77, 0.89, 0.91, 0.91, 0.87
nasnetlarge | [328 122; 142 308], 70.6, 0.68, 0.73, 0.72, 0.70 | [410 40; 68 382], 91.44, 0.89, 0.94, 0.94, 0.88

From Table 21, it is concluded that the Resnet101 network in the DAG category achieved the highest classification accuracy on the Combined Dataset: 97.33% after augmentation of the pre-processed images. The Resnet101-based network also achieved a sensitivity of 0.97, a specificity of 0.98, a precision of 0.98, and an F1-score of 0.97. The individual class accuracy (ICA) is 440 correctly classified healthy images and 436 correctly classified unhealthy images, achieved using the DAG-based network Resnet101. The best result is marked with an asterisk.

Performance of Lightweight-based network architectures using combined dataset

Table 22 shows the performance of Lightweight networks with and without augmentation, using original and pre-processed DR images of the Combined Dataset.

Table 22.

The performance of Lightweight networks with and without augmentation, using original and pre-processed DR images of the Combined Dataset. Confusion matrices (CM) are written as [healthy correct, healthy misclassified; unhealthy misclassified, unhealthy correct]. The best result is marked with an asterisk (*).

(a) Using original Combined Dataset images

Network Name | Without Augmentation: CM, ACC %, Sen, Sp, Pr, F1 | After Augmentation: CM, ACC %, Sen, Sp, Pr, F1
SqueezeNet | [326 124; 148 302], 69.7, 0.67, 0.72, 0.71, 0.69 | [402 48; 83 367], 85.44, 0.82, 0.89, 0.88, 0.85
mobilenetv2 | [343 107; 148 302], 71.6, 0.67, 0.76, 0.74, 0.70 | [401 49; 64 386], 87.44, 0.86, 0.89, 0.89, 0.87
shufflenet | [351 99; 135 315], 74, 0.70, 0.78, 0.76, 0.73 | [429 21; 39 411], 93.33, 0.91, 0.95, 0.95, 0.93
nasnetmobile | [335 115; 148 302], 70.7, 0.67, 0.74, 0.72, 0.70 | [407 43; 69 381], 87.55, 0.85, 0.90, 0.90, 0.87
efficientnetb0 | [313 137; 166 284], 66.3, 0.63, 0.70, 0.67, 0.65 | [387 63; 81 369], 84, 0.82, 0.86, 0.85, 0.84
GoogleNet | [307 143; 176 274], 64.5, 0.61, 0.68, 0.66, 0.63 | [378 72; 84 366], 82.66, 0.81, 0.84, 0.84, 0.82
googlenet-places365 | [330 120; 153 297], 69.6, 0.66, 0.73, 0.71, 0.69 | [402 48; 76 374], 86.22, 0.83, 0.89, 0.89, 0.86
resnet18 | [316 134; 164 286], 66.8, 0.64, 0.70, 0.68, 0.66 | [394 56; 62 388], 86.88, 0.86, 0.88, 0.87, 0.87

(b) Using pre-processed Combined Dataset images

Network Name | Without Augmentation: CM, ACC %, Sen, Sp, Pr, F1 | After Augmentation: CM, ACC %, Sen, Sp, Pr, F1
SqueezeNet | [338 112; 146 304], 71.3, 0.68, 0.75, 0.73, 0.70 | [408 42; 72 378], 87.33, 0.84, 0.91, 0.90, 0.87
mobilenetv2 | [351 99; 141 309], 73.3, 0.69, 0.78, 0.76, 0.72 | [411 39; 51 399], 90, 0.89, 0.91, 0.91, 0.90
shufflenet * | [358 92; 133 317], 75, 0.70, 0.80, 0.78, 0.74 | [439 11; 19 431], 96.66, 0.96, 0.98, 0.98, 0.97
nasnetmobile | [343 107; 142 308], 72.3, 0.68, 0.76, 0.74, 0.71 | [415 35; 58 392], 89.66, 0.87, 0.92, 0.92, 0.89
efficientnetb0 | [326 124; 158 292], 68.6, 0.65, 0.72, 0.70, 0.67 | [397 53; 75 375], 85.77, 0.83, 0.88, 0.88, 0.85
GoogleNet | [317 133; 163 287], 67.1, 0.64, 0.70, 0.68, 0.66 | [395 55; 76 374], 85.44, 0.83, 0.88, 0.87, 0.85
googlenet-places365 | [335 115; 145 305], 71.1, 0.68, 0.74, 0.73, 0.70 | [407 43; 61 389], 88.44, 0.86, 0.90, 0.90, 0.88
resnet18 | [331 119; 165 285], 68.4, 0.63, 0.74, 0.71, 0.67 | [396 54; 58 392], 87.55, 0.87, 0.88, 0.88, 0.88

From Table 22, it is concluded that the shufflenet network in the Lightweight category achieved the highest classification accuracy on the Combined Dataset: 96.66% after augmentation of the pre-processed images. The shufflenet-based network also achieved a sensitivity of 0.96, a specificity of 0.98, a precision of 0.98, and an F1-score of 0.97. The individual class accuracy (ICA) is 439 correctly classified healthy images and 431 correctly classified unhealthy images, achieved using the Lightweight network shufflenet. The best result is marked with an asterisk.

Results and discussion

When building a model with limited resources, CNN-based image analysis relies on transfer learning (TL), in which feature weights learned on large image datasets are transferred to training on smaller datasets. This drastically reduces the number of images required in the target domain. Typically, the model is initialized with pre-trained weights from ImageNet, a sizable dataset of natural images, and then trained on the smaller target dataset, either for feature extraction or for fine-tuning, depending on the size and similarity of the target domain. However, if the source and target domains differ substantially, the expected performance gains may not materialize.

Several pre-trained models have been used to classify diabetic retinopathy images; they show promising yet varying results and require comparatively little computational time. To study this systematically, 20 different pre-trained models were divided into three categories: series, DAG, and Lightweight. Three benchmark datasets were collected for this work, and a combined two-class dataset was prepared. The best results were achieved with weights transferred from ImageNet to the prepared datasets. Our research showed that the deep learning method based on the ResNet101 network effectively distinguishes between normal retinal and DR images, and the ResNet101-based pre-trained network achieved the highest classification accuracy on the combined dataset. The amount of training data, the choice of hyper-parameters, the pre-trained network category, and the specific dataset all influence the accuracy of the compared deep learning models. The present work uses Accuracy, Sensitivity, Specificity, Precision, and F1-score to quantify DR classification performance, and the best-performing network was selected per category on the basis of classification accuracy. Table 23 shows the comparative analysis of the selected networks across categories, and Fig. 4 shows the ROC-AUC curve for the three categories using the combined dataset.
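As an illustration of this transfer-learning workflow, the sketch below loads ResNet101 with ImageNet weights and replaces its classification head for the two DR classes. PyTorch is used here only as an assumed framework, and the optimizer and learning rate are placeholders rather than the study's actual hyper-parameters:

```python
import torch
import torch.nn as nn
from torchvision import models

# Load ResNet101 pre-trained on ImageNet (requires torchvision >= 0.13)
# and swap the 1000-way classifier for a 2-way healthy/unhealthy head.
model = models.resnet101(weights=models.ResNet101_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, 2)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # placeholder
criterion = nn.CrossEntropyLoss()

def train_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    """One fine-tuning step over a mini-batch of fundus images."""
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```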

Table 23.

The comparison analysis of the best-performing network from each category across the four datasets. Confusion matrices (CM) are written as [healthy correct, healthy misclassified; unhealthy misclassified, unhealthy correct]. The best overall result is marked with an asterisk (*).

Category | Network Name | Dataset | CM | ACC % | Sen | Sp | Pr | F1
Series | vgg19 | EyePACS Dataset | [192 8; 22 178] | 92.5 | 0.89 | 0.96 | 0.96 | 0.92
Series | vgg19 | IDRiD Dataset | [48 2; 6 44] | 92 | 0.88 | 0.96 | 0.96 | 0.92
Series | vgg19 | APTOS-2019 | [191 9; 21 179] | 92.5 | 0.90 | 0.96 | 0.95 | 0.92
Series | vgg19 | Combined Dataset | [435 15; 19 431] | 96.22 | 0.96 | 0.97 | 0.97 | 0.96
DAG | Resnet101 | EyePACS Dataset | [194 6; 20 180] | 93.5 | 0.90 | 0.97 | 0.97 | 0.93
DAG | Resnet101 | IDRiD Dataset | [46 4; 5 45] | 91 | 0.90 | 0.92 | 0.92 | 0.91
DAG | Resnet101 | APTOS-2019 | [193 7; 17 183] | 94 | 0.92 | 0.97 | 0.96 | 0.94
DAG | Resnet101 | Combined Dataset * | [440 10; 14 436] | 97.33 | 0.97 | 0.98 | 0.98 | 0.97
Lightweight | shufflenet | EyePACS Dataset | [192 8; 14 186] | 94.5 | 0.93 | 0.96 | 0.96 | 0.94
Lightweight | shufflenet | IDRiD Dataset | [46 4; 5 45] | 91 | 0.90 | 0.92 | 0.92 | 0.91
Lightweight | shufflenet | APTOS-2019 | [191 9; 21 179] | 92.5 | 0.90 | 0.96 | 0.95 | 0.92
Lightweight | shufflenet | Combined Dataset | [439 11; 19 431] | 96.66 | 0.96 | 0.98 | 0.98 | 0.97

Fig. 4.

The ROC-AUC curve using the combined dataset for the three categories.
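A ROC curve of this kind can be generated from each model's predicted probability for the unhealthy (DR) class. The scikit-learn sketch below uses placeholder labels and scores to show the mechanics:

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score
import matplotlib.pyplot as plt

# Placeholder test labels (1 = unhealthy/DR) and predicted probabilities;
# in practice these come from the fine-tuned network's softmax output.
y_true  = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_score = np.array([0.1, 0.3, 0.8, 0.9, 0.6, 0.2, 0.7, 0.4])

fpr, tpr, _ = roc_curve(y_true, y_score)
auc = roc_auc_score(y_true, y_score)

plt.plot(fpr, tpr, label=f"AUC = {auc:.2f}")
plt.plot([0, 1], [0, 1], linestyle="--")        # chance diagonal
plt.xlabel("False positive rate (1 - Specificity)")
plt.ylabel("True positive rate (Sensitivity)")
plt.legend()
plt.show()
```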

The final results of the pre-trained networks, with classification accuracy on the four datasets, are shown in Table 24.

Table 24.

The final classification accuracy (%) of each pre-trained network on the four datasets.

S.No | Network Name | EyePACS Dataset | IDRiD Dataset | APTOS-2019 | Combined Dataset
1 | AlexNet | 87 | 87 | 86.5 | 92
2 | vgg16 | 87.5 | 88 | 88.5 | 91.11
3 | vgg19 | 92.5 | 92 | 92.5 | 96.22
4 | darknet19 | 85 | 88 | 88 | 93
5 | darknet53 | 89.5 | 89 | 87.5 | 91
6 | inceptionv3 | 88.5 | 89 | 87.5 | 90.88
7 | densenet201 | 89 | 90 | 89 | 91
8 | Resnet50 | 91.5 | 90 | 91 | 91.44
9 | Resnet101 | 93.5 | 92 | 94 | 97.33
10 | xception | 89.5 | 88 | 88 | 92.44
11 | inceptionresnetv2 | 89 | 86 | 85 | 89.77
12 | nasnetlarge | 86.5 | 88 | 89.5 | 91.44
13 | SqueezeNet | 87.5 | 86 | 87.5 | 87.33
14 | mobilenetv2 | 90.5 | 90 | 89.5 | 90
15 | shufflenet | 94.5 | 91 | 92.5 | 96.66
16 | nasnetmobile | 90.5 | 89 | 89 | 89.66
17 | efficientnetb0 | 86.5 | 84 | 85.5 | 85.77
18 | GoogleNet | 86 | 85 | 85 | 85.44
19 | googlenet-places365 | 89.5 | 88 | 87.5 | 88.44
20 | resnet18 | 86.5 | 85 | 85.5 | 87.55

Tables 23 & 24 and Fig. 4 concluded that the combined dataset achieved the highest accuracy in all three categories: series, DAG, and Lightweight. It is observed that Vgg19, ResNet101, and shufflenet pre-trained networks achieved the highest accuracy of 96.22%, 97.33%, and 96.66% in series, DAG, and Lightweight categories. It is also noted that ResNet101 achieved the highest category in all cases. It is concluded that the ResNet101 pre-trained network in the DAG category is optimal for diabetic retinopathy disease detection in the early stage. The best result is marked in grey.

Conclusion

The exhaustive experiments concluded that the ResNet101-based pre-trained network in the DAG category achieved the highest accuracy on the combined dataset drawn from DRD-EyePACS, IDRiD, and APTOS-2019. The ResNet101 network strikes a balance between computational efficiency, depth, and accuracy. High accuracy, robustness, and efficiency were achieved using ResNet101 in the DAG category, making it a powerful method for diagnosing and classifying diabetic retinopathy. It can enhance early-stage diagnosis and treatment for patients and can support real-time clinical practice. It is also noted that implementing and training ResNet101 requires considerably more computation than the other networks, as well as high-quality labeled data. In this work, the data were balanced using the augmentation technique, and the balanced data were used for robust model training. An accuracy of 97.33% was achieved using ResNet101. The proposed method can be used in routine clinical practice.

Funding

This research received no external funding.

Data availability

The corresponding author is authorized to access the data, which will be shared upon reasonable request.

Declarations

Ethics approval and consent to participate

Not Applicable.

Consent for publication

Not applicable.

Conflicts of interest

There is no conflict of interest among the authors.

Footnotes

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

1. Gulshan V, Peng L, Coram M, et al. Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA. 2016;316(22):2402–10. 10.1001/jama.2016.17216.
2. Chandrakumar T, Kathirvel R. Classifying diabetic retinopathy using deep learning architecture. Int J Eng Res. 2016;5:19–24.
3. Zhou L, Zhao Y, Yang J, Yu Q, Xu X. Deep multiple instance learning for automatic detection of diabetic retinopathy in retinal images. IET Image Proc. 2018;12(4):563–71. 10.1049/iet-ipr.2017.0636.
4. Dutta S, Manideep BCS, Basha SM, Caytiles RD, Iyengar NCSN. Classification of diabetic retinopathy images by using deep learning models. Int J Grid Distrib Comput. 2018;11(1):89–106. 10.14257/ijgdc.2018.11.1.09.
5. Junjun P, Zhifan Y, Dong S, Hong Q. Diabetic retinopathy detection based on deep convolutional neural networks for localization of discriminative regions. Proceedings of the 8th International Conference on Virtual Reality and Visualization, ICVRV. 2018;46–52. 10.1109/ICVRV.2018.00016.
6. Kassani SH, Kassani PH, Khazaeinezhad R, Wesolowski MJ, Schneider KA, Deters R. Diabetic retinopathy classification using a modified xception architecture. 2019 IEEE 19th International Symposium on Signal Processing and Information Technology, ISSPIT. 2019. 10.1109/ISSPIT47144.2019.9001846.
7. Challa UK, Yellamraju P, Bhatt JS. A multi-class deep all-CNN for detection of diabetic retinopathy using retinal fundus images. Lecture Notes in Computer Science, vol 11941 LNCS. 2019;191–199. 10.1007/978-3-030-34869-4_21.
8. Qummar S, Khan FG, Shah S, Khan A, Shamshirband S, Rehman ZU, Khan IA, Jadoon W. A deep learning ensemble approach for diabetic retinopathy detection. IEEE Access. 2019;7:150530–9. 10.1109/ACCESS.2019.2947484.
9. Bhardwaj C, Jain S, Sood M. Diabetic retinopathy severity grading employing quadrant-based Inception-V3 convolution neural network architecture. Int J Imaging Syst Technol. 2021;31(2):592–608. 10.1002/ima.22510.
10. Saxena G, Verma DK, Paraye A, Rajan A, Rawat A. Improved and robust deep learning agent for preliminary detection of diabetic retinopathy using public datasets. Intelligence-Based Med. 2020;3–4. 10.1016/j.ibmed.2020.100022.
11. Katada Y, Ozawa N, Masayoshi K, Ofuji Y, Tsubota K, Kurihara T. Automatic screening for diabetic retinopathy in interracial fundus images using artificial intelligence. Intelligence-Based Med. 2020;3–4. 10.1016/j.ibmed.2020.100024.
12. Usman A, Muhammad A, Martinez-Enriquez AM, Muhammad A. Classification of diabetic retinopathy and retinal vein occlusion in human eye fundus images by transfer learning. In: Arai K, Kapoor S, Bhatia R, editors. Advances in Information and Communication, FICC 2020. Adv Intell Syst Comput. 2020;1130:642–53. Springer, Cham. 10.1007/978-3-030-39442-4_47.
13. Alyoubi WL, Abulkhair MF, Shalash WM. Diabetic retinopathy fundus image classification and lesions localization system using deep learning. Sensors. 2021;21(11). 10.3390/s21113704.
14. Bhardwaj C, Jain S, Sood M. Deep learning-based diabetic retinopathy severity grading system employing quadrant ensemble model. J Digit Imaging. 2021;34(2):440–57. 10.1007/s10278-021-00418-5.
15. Chen PN, Lee CC, Liang CM, Pao SI, Huang KH, Lin KF. General deep learning model for detecting diabetic retinopathy. BMC Bioinformatics. 2021;22. 10.1186/s12859-021-04005-x.
16. Yi SL, Yang XL, Wang TW, She FR, Xiong X, He JF. Diabetic retinopathy diagnosis based on RA-EfficientNet. Appl Sci (Switzerland). 2021;11(22):11035. 10.3390/app112211035.
17. Khan Z, Khan FG, Khan A, Rehman ZU, Shah S, Qummar S, Ali F, Pack S. Diabetic retinopathy detection using VGG-NIN, a deep learning architecture. IEEE Access. 2021;9:61408–16. 10.1109/ACCESS.2021.3074422.
18. Das S, Kharbanda K, M S, Raman R, D ED. Deep learning architecture based on segmented fundus image features for classification of diabetic retinopathy. Biomed Signal Process Control. 2021;68:102600. 10.1016/j.bspc.2021.102600.
19. AbdelMaksoud E, Barakat S, Elmogy M. A computer-aided diagnosis system for detecting various diabetic retinopathy grades based on a hybrid deep learning technique. Med Biol Eng Comput. 2022;60(7):2015–38. 10.1007/s11517-022-02564-6.
20. Kobat SG, Baygin N, Yusufoglu E, Baygin M, Barua PD, Dogan S, Yaman O, Celiker U, Yildirim H, Tan RS, Tuncer T, Islam N, Acharya UR. Automated diabetic retinopathy detection using horizontal and vertical patch division-based pre-trained DenseNET with digital fundus images. Diagnostics. 2022;12(8):1975. 10.3390/diagnostics12081975.
21. Mungloo-Dilmohamud Z, Khan MHM, Jhumka K, Beedassy BN, Mungloo NZ, Peña-Reyes C. Balancing data through data augmentation improves the generality of transfer learning for diabetic retinopathy classification. Appl Sci (Switzerland). 2022;12(11):5363. 10.3390/app12115363.
22. Asia AO, Zhu CZ, Althubiti SA, Al-Alimi D, Xiao YL, Ouyang PB, Al-Qaness MAA. Detection of diabetic retinopathy in retinal fundus images using CNN classification models. Electronics (Switzerland). 2022;11(17):2740. 10.3390/electronics11172740.
23. Mondal SS, Mandal N, Singh KK, Singh A, Izonin I. EDLDR: an ensemble deep learning technique for detection and classification of diabetic retinopathy. Diagnostics. 2023;13(1):124. 10.3390/diagnostics13010124.
24. Yasashvini R, Raja Sarobin VM, Panjanathan R, Graceline S, Anbarasi JL. Diabetic retinopathy classification using CNN and hybrid deep convolutional neural networks. Symmetry. 2022;14(9):1932. 10.3390/sym14091932.
25. Dayana AM, Emmanuel WRS. Deep learning enabled optimized feature selection and classification for grading diabetic retinopathy severity in the fundus image. Neural Comput Appl. 2022;34(21):18663–83. 10.1007/s00521-022-07471-3.
26. Oulhadj M, Riffi J, Chaimae K, Mahraz AM, Ahmed B, Yahyaouy A, Fouad C, Meriem A, Idriss BA, Tairi H. Diabetic retinopathy prediction based on deep learning and deformable registration. Multimedia Tools Appl. 2022;81(20):28709–27. 10.1007/s11042-022-12968-z.
27. Jabbar MK, Yan J, Xu H, Rehman ZU, Jabbar A. Transfer learning-based model for diabetic retinopathy diagnosis using retinal images. Brain Sci. 2022;12(5):535. 10.3390/brainsci12050535.
28. Menaouer B, Dermane Z, el Houda Kebir N, Matta N. Diabetic retinopathy classification using hybrid deep learning approach. SN Comput Sci. 2022;3(5). 10.1007/s42979-022-01240-8.
29. Fayyaz AM, Sharif MI, Azam S, Karim A, El-Den J. Analysis of diabetic retinopathy (DR) based on the deep learning. Information (Switzerland). 2023;14(1):30. 10.3390/info14010030.
30. Das D, Biswas SK, Bandyopadhyay S. Detection of diabetic retinopathy using convolutional neural networks for feature extraction and classification (DRFEC). Multimedia Tools Appl. 2023;82(19):29943–30001. 10.1007/s11042-022-14165-4.
31. Mohanty C, Mahapatra S, Acharya B, Kokkoras F, Gerogiannis VC, Karamitsos I, Kanavos A. Using deep learning architectures for detection and classification of diabetic retinopathy. Sensors. 2023;23(12):5726. 10.3390/s23125726.
32. Jena PK, Khuntia B, Palai C, Nayak M, Mishra TK, Mohanty SN. A novel approach for diabetic retinopathy screening using asymmetric deep learning features. Big Data Cogn Comput. 2023;7(1):25. 10.3390/bdcc7010025.
33. Bhimavarapu U, Chintalapudi N, Battineni G. Automatic detection and classification of diabetic retinopathy using the improved pooling function in the convolution neural network. Diagnostics. 2023;13(15):2606. 10.3390/diagnostics13152606.
34. Islam N, Jony MMH, Hasan E, Sutradhar S, Rahman A, Islam MM. Toward lightweight diabetic retinopathy classification: a knowledge distillation approach for resource-constrained settings. Appl Sci. 2023;13(22):12397. 10.3390/app132212397.
35. Sajid MZ, Hamid MF, Youssef A, Yasmin J, Perumal G, Qureshi I, Naqi SM, Abbas Q. DR-NASNet: automated system to detect and classify diabetic retinopathy severity using improved pretrained NASNet model. Diagnostics. 2023;13(16):2645. 10.3390/diagnostics13162645.
36. Alwakid G, Gouda W, Humayun M. Enhancement of diabetic retinopathy prognostication using deep learning, CLAHE, and ESRGAN. Diagnostics. 2023. 10.3390/diagnostics.
37. Vijayan M, Venkatakrishnan S. A regression-based approach to diabetic retinopathy diagnosis using EfficientNet. Diagnostics. 2023;13(4):774. 10.3390/diagnostics13040774.
38. Alwakid G, Gouda W, Humayun M, Jhanjhi NZ. Deep learning-enhanced diabetic retinopathy image classification. Digital Health. 2023;9. 10.1177/20552076231194942.
39. Guefrachi S, Echtioui A, Hamam H. Automated diabetic retinopathy screening using deep learning. Multimedia Tools Appl. 2024. 10.1007/s11042-024-18149-4.
40. Sunkari S, Sangam A, P VS, Manikandan S, Raman R, Rajalakshmi R, S T. A refined ResNet18 architecture with Swish activation function for diabetic retinopathy classification. Biomed Signal Process Control. 2024;88:105630. 10.1016/j.bspc.2023.105630.
41. Macsik P, Pavlovicova J, Kajan S, Goga J, Kurilova V. Image preprocessing-based ensemble deep learning classification of diabetic retinopathy. IET Image Proc. 2024;18(3):807–28. 10.1049/ipr2.12987.
42. Shakibania H, Raoufi S, Pourafkham B, Khotanlou H, Mansoorizadeh M. Dual branch deep learning network for detection and stage grading of diabetic retinopathy. Biomed Signal Process Control (pre-print). 2024.
43. Yadav N, Dass R, Virmani J. Despeckling filters applied to thyroid ultrasound images: a comparative analysis. Multimedia Tools Appl. 2022. 10.1007/s11042-022-11965-6.
44. Yadav N, Dass R, Virmani J. Deep learning-based CAD system design for thyroid tumor characterization using ultrasound images. Multimedia Tools Appl. 2023. 10.1007/s11042-023-17137-4.
45. Yadav N, Dass R, Virmani J. A systematic review of machine learning based thyroid tumor characterisation using ultrasonographic images. J Ultrasound. 2024. 10.1007/s40477-023-00850-z.
46. Dass R, Yadav N. Image quality assessment parameters for despeckling filters. Procedia Comput Sci. 2020;167:2382–92. 10.1016/j.procs.2020.03.291.
47. Yadav N, Dass R, Virmani J. Machine learning based CAD system for thyroid tumor characterization using ultrasound images. Int J Med Eng Inform. 2022. 10.1504/IJMEI.2022.10049164.
48. Yadav N, Dass R, Virmani J. Assessment of encoder-decoder based segmentation models for thyroid ultrasound images. Med Biol Eng Comput. 2023. 10.1007/s11517-023-02849-4.
49. Yadav N, Dass R, Virmani J. Texture analysis of liver ultrasound images. Emergent Converging Technologies and Biomedical Systems, Lect Notes Electr Eng. 2022;841:575–85. 10.1007/978-981-16-8774-7_48.
50. Yadav N, Dass R, Virmani J. Objective assessment of segmentation models for thyroid ultrasound images. J Ultrasound. 2022. 10.1007/s40477-022-00726-8.
51. https://www.kaggle.com/datasets/sachinkumar413/diabetic-retinopathy-dataset. Accessed February 2024.
52. Porwal P, Pachade S, Kamble R, Kokare M, Deshmukh G, Sahasrabuddhe V, Meriaudeau F. Indian diabetic retinopathy image dataset (IDRiD): a database for diabetic retinopathy screening research. Data. 2018;3(25):1–8. 10.21227/H25W98.
53. https://www.kaggle.com/datasets/sovitrath/diabetic-retinopathy-224x224-2019-data?resource=download. Accessed February 2024.
