Scientific Reports. 2025 Jul 23;15:26765. doi: 10.1038/s41598-025-12151-y

Diabetes diagnosis using a hybrid CNN LSTM MLP ensemble

Yanmin Fan 1,
PMCID: PMC12287293  PMID: 40702146

Abstract

Diabetes is a chronic condition brought on by either an inability to use insulin effectively or a lack of insulin produced by the body. If left untreated, this illness can be lethal. With early diagnosis, diabetes can be treated and a good life can be led. The conventional method of identifying diabetes using clinical and physical data is laborious, hence an automated method is required. An ensemble deep learning model is presented in this research for the diagnosis of diabetes, which includes three steps. Preprocessing is the first step, which includes cleaning, normalizing, and organizing the data so that it can be fed into deep learning models. The second step involves employing two neural networks to extract features. A convolutional neural network (CNN) is the first neural network, used for extracting the spatial characteristics of the data, while Long Short-Term Memory (LSTM) networks (more specifically, an LSTM stack) are used to comprehend the time-dependent flow of the data based on medical information from patients. The last step combines the two feature sets acquired by the CNN and LSTM models to create the input for the MLP (Multi-layer Perceptron) classifier. To diagnose the disease, the MLP model serves as a meta-learner that combines and converts the outputs of the two feature extraction components into the target variable. According to the implementation results, the suggested approach outperformed the compared approaches in terms of average accuracy and precision, achieving 98.28% and 0.99, respectively, indicating a very strong capacity to identify diabetes.

Supplementary Information

The online version contains supplementary material available at 10.1038/s41598-025-12151-y.

Keywords: Diabetes disease, Ensemble learning, Deep neural networks, Convolutional neural network, Long short-term memory

Subject terms: Engineering, Mathematics and computing

Introduction

Diabetes is a chronic metabolic disorder characterized by elevated blood glucose levels, resulting from either the body’s inability to produce enough insulin or to effectively use the insulin it produces. Increased thirst, appetite, and frequent urination can be symptoms of elevated blood glucose. If left untreated, diabetes can have several repercussions1. Diabetes ranks among the most common chronic diseases and often leads to severe complications when left untreated and undiagnosed. If not identified and treated early, this condition can cause considerable harm to the eyes, kidneys, heart, blood vessels, and nerves2. Two types of diabetes exist in humans: type 1 and type 2 diabetes3.

Type 1 diabetes mostly affects individuals below the age of 30. Patients identified with type 1 diabetes cannot be effectively brought into remission by oral medications alone; insulin therapy is required for effective treatment4. Type 2 diabetes mostly occurs in middle-aged and older individuals and is not fully treatable. Individuals with type 2 diabetes can live a normal life through proper control of their lifestyle and regular follow-ups with their doctors5. In addition to these factors, the global number of diabetes cases has been influenced by both population aging and expansion6. Diabetes is one of the key elements that escalate health-system costs, morbidity, disability, and premature death6,7. Persons with diabetes are more prone than persons without diabetes to depressive symptoms and Major Depressive Disorder (MDD). Recent studies indicate that the incidence of depression is twofold in people with type 2 diabetes compared to control groups8,9.

There is some evidence from a number of researchers that diabetes precedes depression and that a diabetes diagnosis subsequently increases the risk of depression onset. The development of diabetes-induced distress, which may result from factors such as hyperglycemia, changes in glucose transport, or the treatment difficulties involved in diabetes management, is thought to be responsible for these results. These results have important implications for clinical practice8,10. Recent longitudinal studies have challenged such thinking, demonstrating that depression itself may predispose to the development of diabetes10,11. Not all studies support the opposite relationship that diabetes predisposes to depression; some find that diabetes and depression are only weakly associated10.

A recent report published by the World Health Organization (WHO) indicated that people with comorbid depression and other chronic diseases were likelier to self-report poor health than those with depression alone, or with each of the other chronic conditions individually12. In 2015, diabetes was believed to be prevalent among 415 million people all over the world. It is estimated that one in every ten people will be affected by diabetes by 204013. Healthcare digitization therefore provides an opportunity to reduce human error, improve clinical outcomes, monitor information longitudinally, and more. AI approaches, from machine learning to deep learning, are playing a key role across all domains related to well-being, from developing new clinical systems to managing information and records related to patient treatment and the course of different diseases14.

A quick and correct diagnosis of diabetes helps manage the disease and avoid serious complications. Because traditional methods depend on human observations, they are not always fast or accurate, which is why automated and precise tools are needed. Deep learning, a branch of AI, has demonstrated great potential in medical diagnosis through its ability to identify patterns in large and complicated datasets. For example, researchers have looked into using Support Vector Machines (SVMs) and Random Forests to predict diabetes from structured clinical data9,11. Deep neural networks (DNNs) have become important tools lately, as they can automatically find features in raw medical data without the need for much manual work15–18.

However, it is still difficult for single-model approaches to make full use of the wide range of patient clinical data, which often includes both static and time-varying measurements. Although CNNs are good at finding patterns in grid-like data and LSTMs are skilled at handling time-series data, a hybrid approach that can handle both the spatial and temporal features of diverse diabetes data has not been widely explored. Moreover, using these specialized feature extractors as part of a strong ensemble approach may result in better performance and generalization, addressing the problems that single models have.

Here, this paper introduces a new ensemble deep learning model to accurately diagnose diabetes. We propose using a CNN to find spatial features in the clinical data and an LSTM stack to identify temporal trends in patient medical information. The main innovation of this research is using both neural architectures to ensure full coverage of feature extraction. The features from both the CNN and the LSTM are brought together and provided to a Multi-layer Perceptron (MLP) classifier, which uses them to make a reliable and accurate decision about the disease. The combination of CNN and LSTM in this method helps create a more accurate and thorough model for identifying diabetes. The main contributions of this research work include the following:

  • This paper proposes an ensemble system for diabetes detection using clinical data that uses deep neural networks for extracting features that can later be used in the detection of this disease.

  • A technique is proposed for describing diabetes-related features, wherein the potentials of the CNN and LSTM models are combined to enhance the feature extraction and classification operations.

  • This investigation forms a basis for accurate and reliable tools for early diabetes detection, relieving patients, as much as possible, of the burden of specialist interpretation of their information.

The remainder of the paper is organized as follows: the next section reviews related research; the third section presents the materials and methods; the fourth section assesses the findings; and the fifth section concludes.

Related works

This section provides an overview of the papers published in recent years on subjects related to diabetes. Yurttakal and Baş15 presented a deep neural network approach based on a stacked ensemble for early diabetes risk assessment using data from 520 patients. During testing, the method achieved a success rate of 99.36% and an Area Under the Curve (AUC) of 99.19%, retaining higher test percentages compared to previous prediction studies.

Qummar et al.16 used the Kaggle dataset to train five deep CNN models for the purpose of encoding rich features that enhance classification performance for various stages of Diabetic Retinopathy (DR). This model outperformed existing state-of-the-art methods on the same dataset.

Sisodia and Dilip17 tested the use of Decision Trees, SVMs, and Naive Bayes in the early identification of diabetes. The Pima Indians Diabetes Dataset was used for the trials, and recall, accuracy, precision, and F-measure were assessed. Naive Bayes outperformed all other algorithms.

Alanis et al.18 proposed a classification and diagnosis scheme for diabetes mellitus and poor glucose tolerance based on deep neural networks, which gave a 96% accuracy using time series, virtual, and actual patient data.

With a 6.5% increase in test set accuracy compared to the existing MLP works, Massaro et al.5 demonstrated a useful implementation of the LSTM strategy in systems that support decisions for homecare support and de-hospitalization procedures, suggesting it as a viable substitute for homecare services.

Alhassan et al.19 have used a very large dataset of more than 14k patient records from the King Abdullah International Research Centre for Diabetes (AIRCD) to perform the diagnosis of Type 2 Diabetes Mellitus by using temporal predictive Deep Learning models, hence giving an accuracy of more than 97% in comparison to classical Machine Learning approaches.

Rahman et al.20 developed a new classification model for diabetes with Convolutional LSTM (CLSTM), outperforming popular models such as CNN, Traditional LSTM (T-LSTM), and CNN-LSTM, obtaining an accuracy of up to 97.26% using features from the Pima Indians Diabetes Database.

Chowdary and Kumar21 presented a deep learning approach for enhancing diabetes prediction using CLSTM, which was further advanced with multivariate imputation using chained equations, giving encouraging results.

Waberi et al.22 developed a hybrid model using LSTM networks and eXtreme Gradient Boosting (XGBoost) to better predict Type 2 Diabetes Mellitus. This model captured temporal patterns in patient history and converted them into predictive insights, reaching a prediction accuracy of 0.99; hence, it is quite promising for health care.

Alex et al.23 presented a SMOTE-based deep LSTM method for diabetes prediction, which reached an accuracy of 99.64% on a diabetes dataset. It outperformed other methods, and the authors recommended its use for clinical analysis in diabetic patients.

Olisah et al.24 presented a machine learning framework for improving diabetes prediction with the PIMA Indian dataset and the Laboratory of the Medical City Hospital (LMCH) dataset, achieving classification accuracies of 97.931% and 100%, respectively, making it comparable to state-of-the-art models and the best among conventional machine learning (CML)-based models.

Ahmed et al.25 developed machine learning-based classifier models for the detection of diabetes using clinical data, improving on existing accuracies by 2.71% to 13.13% using different algorithms such as decision tree, Naive Bayes, k-NN, random forest, gradient boosting, and logistic regression.

Haq et al.26 developed a method for detecting diabetes using Decision Tree, AdaBoost, and Random Forest ensemble learning algorithms, with the Decision Tree classifier showing better classification performance and greater accuracy, making it suitable for e-healthcare environments.

Nagaraj and Deepalakshmi27 used Enhanced Support Vector Machine (SVM) and DNN models in diabetes prediction and screening, where the results showed that the Deep Neural Network model was more effective in prediction on a dataset of 768 patients.

Wee et al.28 explored machine learning and deep learning in diabetes identification, emphasizing the need for cost-effective, high-performing solutions, anthropometric and non-invasive measurements, and feature selection. A summary of the studied works is presented in Supplementary Table 1 of the Supplementary Information.

This research is an extension of what was done before by using both CNNs and LSTMs to classify difficult biomedical signals. Earlier studies, including the classification of speech emotions29, detecting Parkinson’s from speech30 and heart sound classification with CNNs and local binary/ternary features31, have helped guide the integration of these models. The study makes use of and advances the approach to handle the important problem of identifying diabetes from clinical information.

Research methodology

In this section, the characteristics of the dataset used in this research are first described, and then the steps of the proposed method for diabetes detection using the ensemble of DNNs are presented.

Data

In the current research, the KAIMRC32 dataset was used, which includes data collected from the King Abdullah International Medical Research Center in the Middle East through the main National Guard hospitals located in three populous regions. The data were gathered as part of hospital care services for the clinical diagnosis of visitors with respect to type 2 diabetes. The collected data contains clinical diagnosis records from the complete history of 14,609 patient visits, collected between the years 2010 and 2015. The KAIMRC dataset is rich and diverse, comprising approximately 41 million timestamped results for various laboratory tests (e.g., Blood Urea Nitrogen (BUN), Cholesterol (Chol), Mean Corpuscular Hemoglobin (MCH)) and vital signs (e.g., Body Mass Index (BMI), high blood pressure). In addition to these time-series clinical measurements, it also retains static patient characteristics such as gender, patient age at the time of visit, type of visit (disease, outpatient, or emergency), type of discharge (at home, referral to another hospital, patient death), and length of stay, as well as the type of services received (e.g., heart, brain and nerves, endocrine glands).

It should be noted that the data is imbalanced: 62% of patients have been diagnosed with diabetes. To address this class imbalance and ensure robust model training and evaluation, appropriate techniques such as stratified sampling were employed during the data partitioning process, as further elaborated in the continuation of this paper. The KAIMRC dataset used in this study is a pre-established dataset accessible through official request to the King Abdullah International Medical Research Center. Although we cannot say how the original data were collected, we believe they were collected and processed following proper ethical guidelines. As the dataset is public and has been used before in studies, we think that the original participants or their guardians gave informed consent.

Proposed method

With the help of deep learning’s potential to identify patterns in large and detailed sets of data, this section of the article suggests a range of DNNs for accurate and efficient diagnosis of diabetes using clinical data. This strategy involves the integration of multiple DNN structures to provide a more detailed assessment of the features that are present in clinical files. The proposed method consists of three main phases (Fig. 1):

Fig. 1.

Fig. 1

Steps of the proposed method.

  • I.

    Preprocessing: The first step in the proposed method includes a set of data transformation processes for preprocessing them in accordance with the techniques used in the proposed method, which precisely prepares the clinical data for use by DNNs. This step includes the following sub-processes:

    • Data Cleaning: This process validates and cleans the data to ensure that the dataset contains no errors.
    • Normalization: In this process, the data is rescaled to a range that will not introduce errors into the DNNs.
    • Transformation: At the end of the preprocessing step, the data is transformed to fit the input of the DNNs used in the proposed ensemble system. Therefore, this step feeds the required inputs to the CNN and LSTM models.
  • II.

    Feature Extraction: This step employs two different kinds of neural networks to make informational features from the preprocessed clinical data. The proposed hybrid model for feature extraction of data consists of two components:

    • CNN: More specifically, CNNs are employed for extracting spatial features out of the data. These features can show significant trends in the data that can be used for diagnostic purposes. Therefore, the first component incorporated in the proposed method of extracting clinical features concerning diabetes is the CNN that is primarily designed to explain the spatial dependencies of data in diagnosing the condition.
    • LSTM: LSTM models are particularly effective in handling and analyzing streaming data. The temporal patterns of a patient’s medical records are captured by an LSTM in the proposed model. This model is well suited to capturing the nature of changes in clinical data points over time, which is useful in diagnosing a condition.
  • III.

    Classification: In the final step of the proposed method, the features extracted from both models, CNN and LSTM, are combined and fed to the final MLP for classification. The MLP is employed in this step as an ensemble model and gathers the interpretations that were made by the involved neural networks. The application of this ensemble approach has two main benefits.

    • Robustness: This model combines the features of various DNN architectures and offsets the weaknesses of each individual model, therefore giving more accurate predictions.
    • Accuracy: This ensemble exploits the feature extraction capabilities of the CNN and LSTM, which can lead to a better understanding of the data and therefore a better chance of a correct diagnosis.

Overfitting was prevented by taking special care during the training of the deep learning components. To manage this, we used hyperparameter tuning, selected models that were neither too complex nor underperformed and considered early stopping rules. The continuation of this section is dedicated to detailing each of the steps of the proposed method.

Preprocessing

Preprocessing is an essential stage because it guarantees the suitability and quality of the time-series data for use in the ensemble model of DNNs. This step consists of three crucial stages:

  1. Data Cleaning: In this stage, the input data is first explored to determine whether there are missing values or outliers; if so, they are handled as described below. The objective here is to deal with data that might affect model performance:

    • Determining Missing Values: At the start of the preprocessing phase, missing values in the time-series data are identified, and imputation methods are then used to manage them. It should be noted that if there are p ≥ 5 consecutive missing data points, the information of that sequence is ignored.
    • Detection and Management of Outliers: Outliers are instances that can distort the model learning process; therefore, their identification and handling within a detection scheme is highly significant. The generalized Extreme Studentized Deviate (ESD) method was proposed for outlier identification in33, and this research follows the same approach. If an outlier is detected, the corresponding observation(s) are discarded from the dataset.
  2. Normalization: To provide a uniform range in all features, normalization is required since distinct variables in the input data may have different scales. This enables the CNN and LSTM layers in the ensemble model to process data more effectively. Thus, the SoftMax scaling method is applied in the suggested method to normalize the features. This normalizing operator is defined as follows34 and maps every feature in the range [0,1] non-linearly:
    $$x' = \frac{1}{1 + e^{-y}} \tag{1}$$

    As per the above equation, we have:

    $$y = \frac{x - \mu}{r\,\sigma} \tag{2}$$

    where $\sigma$ indicates the attribute’s standard deviation and $\mu$ its mean. Additionally, $r$ is an adjustable parameter, and in this study, it is set to $r = 1$.

  3. Data Transformation for Model Inputs: This stage involves transforming the preprocessed data into a common format so that the CNN and LSTM feature extraction components can process it. The proposed method employs a separate approach for preparing the data of each model:

    1. LSTM Input Transformation: Windowing the input data is the first step in transforming time-series data into an LSTM-processable format. In order to perform this procedure, each patient’s data points are divided into time frames that represent distinct intervals (e.g., weekly blood sugar readings, monthly blood pressure measures). A crucial hyperparameter that must be properly controlled, the window length affects the temporal range that the LSTM records.
    2. CNN Input Transformation: Since CNNs typically operate on 3D tensors, and also the input data has multiple variables (for example, blood sugar readings, blood pressure measurements, weight), to fit the data with the CNN model, instances are reshaped into a 3D tensor. The dimensions of this 3D tensor are broken down as follows:
      • Instances: The total number of samples in the dataset, or data points.
      • Time Steps: The duration of the time window (number of readings for each variable in each window, for instance).
      • Channels: The number of variables in the incoming data is indicated by this dimension.

It should be noted that time-independent clinical information is described in the above tensor matrix in the form of a separate channel, and the empty space in this channel is filled with zero values.
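The cleaning and normalization rules above can be sketched briefly. The study itself was implemented in MATLAB; this is an illustrative Python/NumPy sketch, and the example glucose values are invented for demonstration.

```python
import numpy as np

def softmax_scale(x, r=1.0):
    """SoftMax scaling (Eqs. 1-2): standardize by the mean and r * std,
    then map each value non-linearly into [0, 1] with the logistic function."""
    y = (x - x.mean()) / (r * x.std())
    return 1.0 / (1.0 + np.exp(-y))

def has_long_missing_run(seq, p=5):
    """Flag a sequence containing p >= 5 consecutive missing points,
    which the cleaning stage discards entirely."""
    run = 0
    for v in seq:
        run = run + 1 if np.isnan(v) else 0
        if run >= p:
            return True
    return False

glucose = np.array([5.1, 5.4, 6.2, 5.8, 7.0, 6.4])   # invented example values
scaled = softmax_scale(glucose)                       # every value now in (0, 1)
```

Note that with r = 1, values near the mean are mapped almost linearly around 0.5, while extreme values saturate toward 0 or 1, which is why the paper uses this operator rather than min-max scaling.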

Feature extraction

After preprocessing the data, the set of features represented through time series and patient clinical information are used as inputs for the feature extraction components. This mechanism is semantically illustrated in Fig. 2. The feature extraction components in the proposed ensemble system include a CNN component that is effective in extracting spatial relationships between data patterns and an LSTM stack that effectively describes diabetes-related features based on temporal dependency. This section details the feature extraction models.

Fig. 2.

Fig. 2

Semantic diagram of the proposed DL-based feature extraction model.

LSTM stack

LSTM provides a powerful method for recording short-term as well as long-term dependencies in patient medical records, which is very important for accurate diagnosis. The proposed feature extraction step uses a stack of LSTM layers, instead of a single layer, for the gradual extraction of features at different levels of temporal abstraction. This feature extraction component can be broken down into the following parts:

  • First LSTM Layer: The first layer in the stack receives preprocessed sequences as input. It processes each element of a sequence (for example, reading blood sugar at a specific time point) and learns short-term dependencies in the represented window. The output of this layer is converted into a hidden state that encapsulates the relevant information extracted from the current and a few previous time steps.

  • Next LSTM Layers: The hidden state from the previous layer enters the next LSTM layer in the stack. This layer not only processes the current input sequence but also considers the information obtained in the previous hidden state. This allows the network to not only learn immediate relationships in a window but also how these short-term dependencies evolve over longer periods.

  • Output of LSTM Stack: After processing the entire sequence, the final LSTM layer in the stack produces an output vector. This vector represents high-level features extracted from the time-series data. These features encompass not only the immediate relationships in a window but also how these relationships change over the defined time window, based on information collected through the hidden states of all LSTMs in the stack.

The window length and the number of LSTM layers in the stack are the most essential hyperparameters for tuning in the feature extraction model. Using a deeper stack allows the model to record more complex temporal patterns, but also increases the complexity and risk of overfitting the model. Therefore, in this research, the LSTM stack model was evaluated with different depths and window lengths, and it was finally determined that using an LSTM stack with a depth of 5 and also a window length of L = 15 could result in the best performance of the model. This iterative evaluation process, guided by validation set performance, also helped in selecting a configuration that balanced model capacity with the risk of overfitting, ensuring good generalization to unseen data.
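The stacking described above (each layer consuming the hidden-state sequence of the one below it) can be illustrated with a minimal NumPy forward pass. The weights here are random and untrained, and the 8 clinical variables per time step and 50 units per layer are assumptions for illustration (50 is one of the values in the Table 1 search space); only the shapes and data flow mirror the paper's design.

```python
import numpy as np

def lstm_layer(x_seq, n_units, rng):
    """One LSTM layer, forward pass only, with random (untrained) weights.

    x_seq: array of shape (T, n_in); returns hidden states of shape (T, n_units).
    """
    n_in = x_seq.shape[1]
    W = rng.standard_normal((4, n_units, n_in + n_units)) * 0.1  # gates i, f, g, o
    b = np.zeros((4, n_units))
    h, c = np.zeros(n_units), np.zeros(n_units)
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    hs = []
    for x in x_seq:
        z = np.concatenate([x, h])             # current input + previous hidden state
        i, f, g, o = (W[k] @ z + b[k] for k in range(4))
        c = sig(f) * c + sig(i) * np.tanh(g)   # cell state carries long-term memory
        h = sig(o) * np.tanh(c)
        hs.append(h)
    return np.stack(hs)

rng = np.random.default_rng(0)
window = rng.standard_normal((15, 8))   # window length L = 15; 8 variables (assumed)
out = window
for _ in range(5):                      # stack depth of 5, as selected in the paper
    out = lstm_layer(out, n_units=50, rng=rng)
temporal_features = out[-1]             # final hidden state = temporal feature vector
```

Because each layer returns a full hidden-state sequence, deeper layers see increasingly abstract temporal summaries; only the last hidden state of the top layer is kept as the feature vector.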

CNN

The second feature extraction component used in the proposed method is a CNN. Figure 3 illustrates the structure of this feature extraction component.

Fig. 3.

Fig. 3

Structure of the proposed CNN model for feature extraction.

According to the structure depicted in Fig. 3, the feature extraction model is fed with the tensor resulting from the preprocessing step. The proposed CNN model uses three convolution blocks to extract feature maps from the input matrices. Each convolution block consists of a convolution layer, an activation layer, and a pooling layer. After the feature maps are extracted, three consecutive fully connected layers are utilized to extract the vector structure of the features and compress them. The fully connected layer FC1 converts the feature maps into a vector format, and the extracted features are then compressed by the FC2 layer. Finally, the activation values obtained from the last fully connected layer (FC3) are passed to the network’s final classification layers so that training performance can be evaluated based on error metrics. In the proposed feature extraction model, the activation values obtained from the FC2 layer are considered the features extracted from the clinical data. Thus, the length of the feature vector extracted through the CNN model is 30.
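The three-block structure can be sketched as follows. This is an illustrative NumPy sketch with random, untrained weights: the filter counts and widths are picked from the Table 1 search space, the 15x8 input window is an assumption, and only the 30-dimensional FC2 output is fixed by the paper's description.

```python
import numpy as np

rng = np.random.default_rng(1)
relu = lambda z: np.maximum(z, 0)

def conv_block(x, n_filters, width):
    """Convolution over time + ReLU + max pooling (pool size 2).

    x: (T, C). Random weights: the sketch shows shapes and data flow
    through the three blocks, not trained values.
    """
    T, C = x.shape
    W = rng.standard_normal((n_filters, width, C)) * 0.1
    T_out = T - width + 1
    y = np.empty((T_out, n_filters))
    for t in range(T_out):
        y[t] = relu(np.tensordot(W, x[t:t + width], axes=([1, 2], [0, 1])))
    if T_out >= 2:   # pool pairs of time steps; skip when only one step remains
        y = y[: (T_out // 2) * 2].reshape(-1, 2, n_filters).max(axis=1)
    return y

x = rng.standard_normal((15, 8))           # one window: 15 time steps, 8 channels
h = conv_block(x, n_filters=16, width=3)   # -> (6, 16)
h = conv_block(h, n_filters=16, width=3)   # -> (2, 16)
h = conv_block(h, n_filters=16, width=2)   # -> (1, 16)

flat = h.ravel()                                    # FC1: feature maps -> vector
W_fc2 = rng.standard_normal((30, flat.size)) * 0.1
cnn_features = relu(W_fc2 @ flat)                   # FC2 activations: 30-dim features
```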

To achieve a model with the highest efficiency in feature extraction, a grid search strategy is employed to adjust the hyperparameters. The convolution layer dimensions, the number of filters in the convolution layer, the activation function, and the pooling mechanism are among the CNN hyperparameters examined during the search procedure. Additionally, the model’s efficacy was analyzed for various training setups, covering the mini-batch size, optimizer, and learning rate. For tuning the LSTM stack, the hyperparameters of window length, stack depth, and the number of units per layer were considered during the grid search. The hyperparameters taken into consideration for the CNN and LSTM stack designs are given in Table 1. This table also lists the search space considered for tuning the hyperparameters of the MLP meta-learner, which will be discussed in Sect. 3.2.3.

Table 1.

Considered hyperparameters for configuring the CNN, LSTM stack, and MLP models.

Model Hyperparameter Search Values
CNN Convolution width/length {2, 3, 4, 6}
Number of filters {8, 16, 24, 32}
Activation operator ReLU, PReLU, LeakyReLU
Pooling type Max, min, average
Optimizer SGDM, Adam
Learning rate {0.005, 0.001, 0.0005} for Adam and {0.1, 0.05, 0.01, 0.005} for SGDM
Mini batch size {8, 16, 32}
LSTM stack Window length (L) {10, 15, 20, 25}
Stack depth (layers) {3, 4, 5, 6}
Units per layer {50, 100, 150}
MLP Neurons (hidden layer 1) {32, 64, 128}
Neurons (hidden layer 2) {8, 16, 32}
Activation function {ReLU, Tanh, Sigmoid}
Training algorithm {Levenberg-Marquardt, SCG}

The validation error is used to conduct the configuration search and assess each configuration’s suitability. According to the thorough search conducted, the CNN configuration detailed in this section (Fig. 3) had the lowest validation loss. Additionally, the CNN model performed best on the used dataset when trained with the Adam optimizer and a mini-batch size of 16. The decision to use validation error to guide the search allowed us to find a CNN setup that worked well and was unlikely to overfit the data. Choosing the right number of filters and layer dimensions, as well as the regularization that comes with the optimization method and validation, helped as well.
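The search procedure itself reduces to exhaustively enumerating the Table 1 space and keeping the configuration with the lowest validation loss. A minimal skeleton, with `evaluate` standing in for training a candidate model and returning its validation loss (the optimizer-dependent learning rates are omitted for simplicity):

```python
from itertools import product

# Search space for the CNN branch, transcribed from Table 1.
space = {
    'conv_width': [2, 3, 4, 6],
    'n_filters': [8, 16, 24, 32],
    'activation': ['ReLU', 'PReLU', 'LeakyReLU'],
    'pooling': ['max', 'min', 'average'],
    'optimizer': ['SGDM', 'Adam'],
    'batch': [8, 16, 32],
}

def grid_search(space, evaluate):
    """Exhaustive search: the configuration with the lowest validation
    loss wins, mirroring how the paper selects its CNN setup."""
    keys = list(space)
    best_cfg, best_loss = None, float('inf')
    for combo in product(*(space[k] for k in keys)):
        cfg = dict(zip(keys, combo))
        loss = evaluate(cfg)
        if loss < best_loss:
            best_cfg, best_loss = cfg, loss
    return best_cfg, best_loss
```

In practice each `evaluate` call is a full training run, so the grid is kept small; the paper reports Adam with a mini-batch size of 16 as the winning training setup.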

Diagnosis

After features are extracted from the input samples, the sets of features extracted through the LSTM stack and the CNN are merged into a single vector, which is then utilized by the classification component of the proposed ensemble system. The proposed classification component is an MLP with 2 hidden layers. In this step, the MLP acts as an ensemble mechanism and combines the insights interpreted by the involved neural networks. The first hidden layer of the suggested MLP model contains 64 neurons, whereas the second hidden layer contains 16 neurons. The length of the feature vector acquired in the preceding phase is used to determine the number of neurons in the input layer. The number of neurons in the output layer, on the other hand, is equal to the number of target classes (2), and the firing of one of these neurons for an input sample indicates whether the sample is healthy or ill. The MLP model’s training was also checked using the validation set to prevent overfitting during this final classification phase. Training was performed for up to 1000 epochs and was stopped early if the model’s results on the validation set did not improve for 30 epochs. A grid search was performed, guided by the results on the validation set, to select the right number of neurons, activation function, and training method to avoid overfitting. The search space employed in the grid search for these parameters is listed in Table 1. Based on the results of this search, using ReLU as the activation function and the Levenberg-Marquardt algorithm35 as the training function resulted in the least validation error and good generalization.
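The meta-learner's data flow (merge the two feature vectors, pass through 64 and then 16 neurons, emit two class scores) can be sketched as below. The weights are random and untrained; the CNN feature length of 30 follows the paper, while the 50-dimensional LSTM feature vector is an assumption taken from the Table 1 units-per-layer options.

```python
import numpy as np

rng = np.random.default_rng(2)
relu = lambda z: np.maximum(z, 0)

def mlp_meta_learner(cnn_feat, lstm_feat):
    """Forward pass of the 2-hidden-layer MLP meta-learner (shapes only)."""
    x = np.concatenate([cnn_feat, lstm_feat])      # merged feature vector
    W1 = rng.standard_normal((64, x.size)) * 0.1   # hidden layer 1: 64 neurons
    W2 = rng.standard_normal((16, 64)) * 0.1       # hidden layer 2: 16 neurons
    W3 = rng.standard_normal((2, 16)) * 0.1        # output layer: 2 classes
    logits = W3 @ relu(W2 @ relu(W1 @ x))
    e = np.exp(logits - logits.max())              # stable softmax over 2 classes
    return e / e.sum()                             # healthy vs. diabetic probabilities

# 30 CNN features (per the paper); 50 LSTM features is an assumption.
probs = mlp_meta_learner(np.ones(30), np.ones(50))
```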

Research finding

In this section, after explaining the employed evaluation scenarios and metrics, the results of the experiments are presented and discussed.

Evaluation scenarios and metrics

The experiments were conducted using MATLAB 2020a. We used k-fold cross-validation to assess how well the model works and to confirm its robustness. Specifically, we applied a 10-fold cross-validation method. For every fold, the data was divided so that 80% went to training, 10% to validation and hyperparameter adjustment, and the remaining 10% was used to assess the final performance of the model. Since the data was imbalanced (with 62% of patients having diabetes), stratified sampling was used to split the data in every fold. This allowed the model to be assessed more accurately, since the proportion of diabetic and non-diabetic patients was preserved across all the sets. The performance metrics shown are the average results over the 10 folds.
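The stratified splitting described above can be sketched as a round-robin fold assignment per class, so that every fold preserves the 62% diabetic ratio. This is an illustrative Python sketch (the study ran in MATLAB), with toy labels standing in for the real KAIMRC labels.

```python
import numpy as np

def stratified_folds(labels, k=10, seed=0):
    """Assign sample indices to k folds so each fold preserves the class
    ratio. One fold then serves as the test set, one as the validation
    set, and the remaining eight as training data, giving the 80/10/10
    split used in the paper."""
    rng = np.random.default_rng(seed)
    folds = [[] for _ in range(k)]
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        rng.shuffle(idx)
        for i, j in enumerate(idx):      # deal class members round-robin
            folds[i % k].append(j)
    return [np.array(sorted(f)) for f in folds]

labels = np.array([1] * 62 + [0] * 38)   # toy labels mimicking the 62% imbalance
folds = stratified_folds(labels, k=10)
# each fold of ~10 samples holds ~6 diabetic and ~4 non-diabetic patients
```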

The proposed method is evaluated in 4 modes: the full ensemble described in the third section and three ablation modes. In the LSTM Stack mode, only the LSTM model is used to diagnose diabetes, i.e., the data is processed solely by the LSTM Stack. In the CNN mode, the CNN model alone extracts the data features and then performs the diagnosis based on them. In the MLP mode, the preprocessed data is fed directly to the MLP, so there is no separate feature extraction step. The proposed method is also compared with the approaches of Alhassan et al.19, Rahman et al.20, Chowdary & Kumar21, and Hou22. The models are evaluated in terms of accuracy, precision, recall, F-Measure, MCC, and CSI, and the obtained values are reported in Table 2.

Table 2.

Comparison of different approaches’ performance in diabetes diagnosis using KAIMRC data.

| Models | Precision | Recall | F-Measure | Accuracy (%) | MCC | CSI | AUC |
|---|---|---|---|---|---|---|---|
| Proposed | 0.9903 | 0.9820 | 0.9861 | 98.2870 | 0.9638 | 0.9723 | 0.9843 |
| LSTM Stack | 0.9712 | 0.9584 | 0.9647 | 95.6571 | 0.9084 | 0.9296 | 0.9582 |
| CNN | 0.9635 | 0.9369 | 0.9500 | 93.8873 | 0.8720 | 0.9004 | 0.9417 |
| MLP | 0.9527 | 0.9239 | 0.9381 | 92.4373 | 0.8417 | 0.8766 | 0.9251 |
| Alhassan et al.19 | 0.9762 | 0.9669 | 0.9715 | 96.4816 | 0.9256 | 0.9430 | 0.9667 |
| Rahman et al.20 | 0.9757 | 0.9617 | 0.9686 | 96.1405 | 0.9186 | 0.9374 | 0.9617 |
| Chowdary and Kumar21 | 0.9812 | 0.9709 | 0.9760 | 97.0431 | 0.9375 | 0.9521 | 0.9708 |
| Hou22 | 0.9856 | 0.9818 | 0.9837 | 97.9814 | 0.9572 | 0.9674 | 0.9817 |

Accuracy: This metric shows the correct identification rate of instances in both negative (healthy) and positive (disease) categories.

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{3}$$

where the total amount of correctly identified occurrences of the positive class is known as TP (True Positives). The amount of correctly identified instances of the negative class is known as TN (True Negatives). The number of incorrectly categorized occurrences of the negative class is known as FP (False Positives), whereas the number of incorrectly classified instances of the positive class is known as FN (False Negatives).

Precision: it is the ratio between the true positives and the total number of test instances identified as positives.

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{4}$$

Recall: it is the ratio of TP to TP + FN:

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{5}$$

F-Measure: it is the harmonic mean of precision and recall:

$$F\text{-}\mathrm{Measure} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{6}$$

MCC: the Matthews correlation coefficient measures the correlation between observed and predicted classifications and is computed from the confusion matrix using Eq. 7, where +1 indicates perfect prediction, -1 indicates total disagreement, and 0 indicates random prediction.

$$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \tag{7}$$

CSI: the critical success index is the fraction of correctly predicted positive events among all events that were either predicted or observed as positive. It takes values between 0 and 1 and is also referred to as the threat score. CSI is given by:

$$\mathrm{CSI} = \frac{TP}{TP + FP + FN} \tag{8}$$
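The six metrics above (Eqs. 3-8) can be computed directly from raw confusion-matrix counts; the counts in this sketch are hypothetical, not taken from the study.

```python
import math

def metrics(tp, tn, fp, fn):
    """Compute Eqs. 3-8 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    csi = tp / (tp + fp + fn)   # critical success index (threat score)
    return accuracy, precision, recall, f_measure, mcc, csi

# Hypothetical counts for illustration.
acc, prec, rec, f1, mcc, csi = metrics(tp=90, tn=85, fp=5, fn=10)
print(round(acc, 4), round(prec, 4), round(rec, 4))   # 0.9211 0.9474 0.9
```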

Results

The experiments were conducted according to the scenarios described in Sect. 4.1. Figure 4 shows the average accuracy of the proposed method and the comparative methods. Based on these results, the proposed method outperforms the other models. The LSTM Stack method ranks second, which indicates that LSTM models capture the temporal structure of the data stream more comprehensively and therefore extract time-based information with greater accuracy. Since the data is time-based, classification using LSTM models performs better than the CNN and MLP models, which are typically used for spatial data classification; hence LSTM surpasses both. Furthermore, by effectively combining the three models, the proposed approach improves on the LSTM Stack structure by 2.63% and on the comparative method of Hou by 0.3%, indicating a high average accuracy.

Fig. 4.

Fig. 4

Comparison of average accuracy across different methods.

The confusion matrices obtained by accumulating test results over the 10 folds of cross-validation are depicted in Fig. 5. The proposed approach correctly classified 5263 out of 5347 instances in the negative category, an impressive 98.4% rate. For samples in the positive class it achieved 98.2%, contributing to an overall diagnostic accuracy of 98.29% and representing a 0.31% increase over the best comparative method. Accordingly, the proposed approach achieves higher output reliability and outperforms the comparison approaches in identifying members of both sample categories.
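As a back-of-envelope consistency check on the figures above: the positive-class count is not reported directly, so it is inferred here from the stated 62% diabetic prevalence; the derived totals are estimates, not values from the paper.

```python
# Negative (healthy) class: 5263 of 5347 correct, as reported.
neg_correct, neg_total = 5263, 5347
neg_rate = neg_correct / neg_total
print(f"{neg_rate:.1%}")   # 98.4%, matching the reported rate

# With 38% of samples negative, the implied dataset size and positive count:
total = round(neg_total / 0.38)          # ~14,071 samples (inferred)
pos_total = total - neg_total            # ~8,724 diabetic samples (inferred)
overall = (neg_correct + 0.982 * pos_total) / total
print(f"{overall:.2%}")                  # ~98.29%, consistent with the text
```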

Fig. 5.

Fig. 5

Confusion matrices for different methods in diabetes diagnosis through 10 folds of cross-validation.

The classification score diagram is shown in Fig. 6. The figure demonstrates that the proposed approach has nearly perfect precision and that its recall and F-Measure also exceed those of the comparative approaches. With a precision of 0.99, the proposed approach exceeded the Hou method and the LSTM Stack method by roughly 0.005 and 0.02 points, respectively. The recall values indicate that the proposed approach identified more individuals with diabetes correctly; that is, there is a high likelihood that it will extract samples belonging to the positive group. Additionally, the F-Measure, the harmonic mean of these two indicators, is higher than that of the comparison approaches.

Fig. 6.

Fig. 6

Comparing the classification rates of various methods in diabetes diagnosis using precision, recall, and f-measure.

Figure 7 evaluates the diabetes diagnosis methods using the MCC, CSI, and AUC criteria. Higher values of these criteria indicate better performance, and the plot shows the improved performance of the developed method in correctly detecting the samples relevant to the disease under consideration.

Fig. 7.

Fig. 7

MCC, CSI, and AUC values obtained by different methods in diabetes diagnosis.

Figure 8 shows the receiver operating characteristic (ROC) curves, calculated per class for each method, to evaluate the proposed method's performance in detecting positive-category (diabetes) samples. The main goal is to maximize the true positive rate (TPR) while minimizing the false positive rate (FPR); thus, the larger the AUC, the better the performance, and that is what occurred. The increase in AUC over the comparative methods indicates that the proposed method is more accurate in detecting positive-category samples, i.e., in diagnosing diabetes.
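The TPR/FPR trade-off summarized by the AUC can be illustrated with synthetic scores (not the study's outputs): the AUC equals the probability that a randomly chosen positive sample scores above a randomly chosen negative one, so larger class separation drives it toward 1.

```python
import random

random.seed(1)
neg = [random.gauss(0.0, 1.0) for _ in range(500)]   # healthy scores
pos = [random.gauss(2.0, 1.0) for _ in range(500)]   # diabetic: shifted up

# Rank-based AUC: fraction of (positive, negative) pairs ranked correctly,
# counting ties as half a win.
wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
auc = wins / (len(pos) * len(neg))
print(round(auc, 3))   # well above 0.5 for these separable classes
```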

Fig. 8.

Fig. 8

Receiver operating characteristic curves obtained by different methods in diabetes diagnosis.

Table 2 shows the performance evaluation of the different techniques for diagnosing diabetes in terms of precision, recall, F-Measure, accuracy, MCC, CSI, and AUC. The proposed method had the highest precision, recall, and F-Measure, indicating stronger performance in detecting diabetes-related samples. Furthermore, its AUC exceeded that of the comparative methods, proving greater accuracy in identifying samples in the positive category (i.e., diagnosis of diabetes). In sum, the table demonstrates that the proposed method is superior to the other methods in diagnosing diabetes and may therefore be useful in improving medical practice and disease diagnosis systems.

Statistical analysis

Statistical analysis allows the effectiveness and significance of the suggested method, relative to the prior approaches, to be examined more fully. To this end, a one-way ANOVA was performed.

In this analysis, a matrix representing the classification pattern of the different methods is used; each row corresponds to a test sample, and each column to the classification behavior of one technique. Each entry is obtained by comparing the predicted label with the sample's ground-truth label: a match is scored +1, and a mismatch is scored -1.
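The construction of this ±1 agreement matrix can be sketched as follows; the predictions are synthetic, and the sample and method counts are hypothetical.

```python
import random

random.seed(7)
n_samples, n_methods = 200, 8
y_true = [random.randint(0, 1) for _ in range(n_samples)]
# Each synthetic method is correct ~95% of the time.
preds = [[y if random.random() < 0.95 else 1 - y for _ in range(n_methods)]
         for y in y_true]

# Rows: test samples; columns: methods; +1 on a match, -1 on a mismatch.
agreement = [[1 if preds[i][j] == y_true[i] else -1 for j in range(n_methods)]
             for i in range(n_samples)]
print(len(agreement), len(agreement[0]))   # 200 8
```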

Normality tests were performed on each column of the resulting matrix using quantile-quantile (Q-Q) plots and the Shapiro-Wilk test36. These tests indicated that the distribution of each model's column was normal (p > 0.05), so a one-way ANOVA, which is appropriate for comparing the means of several groups when the data is normally distributed, could be applied to the accuracy scores.

The outcomes of this test are displayed in Table 3. The ANOVA results indicate that the accuracy of at least two of the methods differs in a statistically significant way (p < 0.05). However, a one-way ANOVA cannot identify which models are responsible for this discrepancy. Therefore, multiple comparison analysis37 was employed, using Tukey's Honestly Significant Difference (HSD) as the post-hoc test to determine which specific models had statistically distinct levels of accuracy. This post-hoc test is well suited to comparing pairs of groups after a statistically significant F-test in the analysis of variance. Because Tukey's HSD controls the family-wise error rate, the likelihood of at least one Type I error (false positive) across the comparisons is kept within the chosen alpha level (0.05 in this experiment), making it preferable to other post-hoc analyses. By regulating this error rate, Tukey's HSD provides a practical way to determine which models perform noticeably better or worse; as a result, we could identify the specific algorithms responsible for the significant differences revealed by the ANOVA.

Table 3.

The ANOVA table obtained through statistical analysis on accuracy.

| Source | SS | df | MS | F | Prob > F |
|---|---|---|---|---|---|
| Columns | 156.3 | 7 | 22.3251 | 146.29 | < 0.05 |
| Error | 17175.6 | 112,544 | 0.1526 | | |
| Total | 17331.8 | 112,551 | | | |
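The one-way ANOVA on such ±1-coded columns can be illustrated with a hand-computed F statistic on synthetic groups; the per-method accuracies and group sizes below are hypothetical, not the study's figures.

```python
import random

random.seed(3)
# One group of +1/-1 accuracy scores per method, with differing accuracies.
groups = [[1.0 if random.random() < acc else -1.0 for _ in range(2000)]
          for acc in (0.98, 0.96, 0.94, 0.92)]

k = len(groups)                       # number of groups (methods)
n = sum(len(g) for g in groups)       # total observations
grand_mean = sum(sum(g) for g in groups) / n
group_means = [sum(g) / len(g) for g in groups]

# Between-group and within-group sums of squares.
ss_between = sum(len(g) * (m - grand_mean) ** 2
                 for g, m in zip(groups, group_means))
ss_within = sum(sum((x - m) ** 2 for x in g)
                for g, m in zip(groups, group_means))
f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
print(f_stat > 10.0)   # True: a large F signals significant differences
# After a significant F-test, a post-hoc Tukey HSD comparison (as in the
# paper) identifies which specific method pairs differ.
```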

The results of these assessments, shown in Fig. 9, indicate that the suggested approach outperforms the models of Alhassan et al.19, Rahman et al.20, and Chowdary & Kumar21, as well as the standalone LSTM Stack, CNN, and MLP models, in terms of average accuracy. However, its accuracy is not significantly different from that of the Hou technique22.

Fig. 9.

Fig. 9

Multiple comparison examinations of the accuracies among diabetes diagnosis models.

The results of this study show that, beyond its notable performance advantage over previous methods, the stacked ensemble learning model combining LSTM, CNN, and MLP yields a significant improvement in diagnosis performance.

Limitation and future directions

As the research findings demonstrated, the suggested hybrid architecture achieved promising results in diabetes diagnosis. However, the suggested model still has limitations, which point to avenues for future work:

  • The first limitation of this research concerns the representativeness of the data and its possible effect on the generalizability of the model. The KAIMRC dataset was gathered exclusively in medical centers in the Middle East, so it mostly reflects a particular demographic group. The pathophysiology and clinical manifestation of type 2 diabetes are known to be affected by a complicated combination of genetic predispositions, eating habits, environmental conditions, and regional healthcare systems. Therefore, although the suggested model was very accurate on this data, it will not necessarily perform as well on other ethnic and geographic groups. To determine the wider clinical applicability and strength of our method, future studies should validate and refine the model on larger, multi-center datasets covering a more heterogeneous, multi-ethnic patient cohort.

  • One of the main weaknesses of the present research is the inherent interpretability problem of the proposed hybrid deep learning architecture. Although the CNN-LSTM-MLP ensemble showed high predictive accuracy, it is a so-called black-box model, meaning the reasoning behind a particular diagnosis is not transparent. The ensemble strategy adds to this complexity, as it is hard to separate which spatial characteristics from the CNN and which temporal characteristics from the LSTM have the greatest impact on the final classification. For a diagnostic tool to be fully trusted and deployed in a clinical environment, its decision-making process must be understood. Hence, further research should be devoted to applying Explainable Artificial Intelligence (XAI) methods. Techniques such as SHAP (SHapley Additive exPlanations) or LIME (Local Interpretable Model-agnostic Explanations) could be applied to reveal which patient features and data points have the greatest impact on the diagnostic outcome.

  • In terms of computational cost, the proposed approach increases expenses by integrating several DNNs. However, this cost is justified by the high accuracy achieved, and techniques for optimizing computation, particularly in deep neural networks, could further mitigate the issue.

  • The proposed approach relies predominantly on clinical data. However, this exclusive dependence on medical data means that lifestyle and genetic characteristics, which are also relevant to diabetes diagnosis and could enhance its quality, were absent from the dataset. We would consider incorporating such features into the proposed approach in the future to increase the accuracy of the outcome.

Conclusion

This study implemented an automated way of detecting diabetes early from clinical and physical data, employing an ensemble of DNN models to address the cumbersome and time-consuming nature of current diagnostic processes. The three-stage diagnosis procedure begins with data preprocessing, including cleaning, normalization, and organizing the data for input to the deep neural network models. Feature extraction is then performed by two neural networks: a convolutional neural network for spatial characteristics and a long short-term memory stack for capturing temporal patterns in the patient medical data. Finally, the features produced by the CNN and LSTM models are merged to form the input to the multi-layer perceptron classifier, which predicts the target variable, diabetes. The results showed that the approach achieved the best average accuracy and precision, 98.28% and 0.99 respectively, outperforming the comparative methods and demonstrating a strong ability to identify diabetes. This approach not only simplifies the diagnostic process but also promises to improve the lives of people prone to diabetes through early intervention.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1 (16.5KB, docx)

Author contributions

YANMIN FAN wrote the main manuscript text. YANMIN FAN reviewed the manuscript.

Data availability

All data generated or analysed during this study are included in this published article.

Declarations

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  • 1.Kumar, D. A. & Govindasamy, R. Performance and evaluation of classification data mining techniques in diabetes. Int. J. Comput. Sci. Inform. Technol.6(2), 1312–1319 (2015). [Google Scholar]
  • 2.Sneha, N. & Gangil, T. Analysis of diabetes mellitus for early prediction using optimal features selection. J. Big Data. 6(1), 1–19 (2019). [Google Scholar]
  • 3.Allam, F., Nossai, Z., Gomma, H., Ibrahim, I. & Abdelsalam, M. A recurrent neural network approach for predicting glucose concentration in type-1 diabetic patients. In International Conference on Engineering Applications of Neural Networks, 254–259 (Springer Berlin Heidelberg, 2011).
  • 4.Li, Y., Li, H. & Yao, H. Analysis and study of diabetes follow-up data using a data-mining-based approach in new urban area of Urumqi, Xinjiang, China, 2016–2017. Comput. Math. Methods Med.2018(2018). [DOI] [PMC free article] [PubMed]
  • 5.Massaro, A., Maritati, V., Giannone, D., Convertini, D. & Galiano, A. LSTM DSS automatism and dataset optimization for diabetes prediction. Appl. Sci.9(17), 3532 (2019). [Google Scholar]
  • 6.Chireh, B., Li, M. & D’Arcy, C. Diabetes increases the risk of depression: A systematic review, meta-analysis and estimates of population attributable fractions based on prospective studies. Prev. Med. Rep.14, 100822 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Seuring, T., Archangelidi, O. & Suhrcke, M. The economic costs of type 2 diabetes: a global systematic review. Pharmacoeconomics33, 811–831 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Anderson, R. J., Freedland, K. E., Clouse, R. E. & Lustman, P. J. The prevalence of comorbid depression in adults with diabetes: a meta-analysis. Diabetes Care. 24(6), 1069–1078 (2001). [DOI] [PubMed] [Google Scholar]
  • 9.Reza, M. S., Hafsha, U., Amin, R., Yasmin, R. & Ruhi, S. Improving SVM performance for type II diabetes prediction with an improved non-linear kernel: insights from the PIMA dataset. Comput. Methods Programs Biomed. Update. 4, 100118 (2023). [Google Scholar]
  • 10.Engum, A. The role of depression and anxiety in onset of diabetes in a large population-based study. J. Psychosom. Res.62(1), 31–38 (2007). [DOI] [PubMed] [Google Scholar]
  • 11.Ooka, T. et al. Random forest approach for determining risk prediction and predictive factors of type 2 diabetes: large-scale health check-up data in Japan. BMJ Nutr. Prev. Health. 4(1), 140 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Moussavi, S. et al. Depression, chronic diseases, and decrements in health: results from the world health surveys. Lancet370(9590), 851–858 (2007). [DOI] [PubMed] [Google Scholar]
  • 13.Federation, I. D. IDF diabetes atlas. http://www.diabetesatlas.org (2015).
  • 14.Kumar, Y., Koul, A., Singla, R. & Ijaz, M. F. Artificial intelligence in disease diagnosis: a systematic literature review, synthesizing framework and future research agenda. J. Ambient Intell. Humaniz. Comput.14(7), 8459–8486 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Yurttakal, A. H. & Baş, H. Possibility prediction of diabetes mellitus at early stage via stacked ensemble deep neural network. Afyon Kocatepe Üniversitesi Fen Ve Mühendislik Bilimleri Dergisi. 21(4), 812–819 (2021). [Google Scholar]
  • 16.Qummar, S. et al. A deep learning ensemble approach for diabetic retinopathy detection. IEEE Access.7, 150530–150539 (2019).
  • 17.Sisodia, D. & Sisodia, D. S. Prediction of diabetes using classification algorithms. Procedia Comput. Sci.132, 1578–1585 (2018). [Google Scholar]
  • 18.Alanis, A. Y., Sanchez, O. D., Vaca-González, A. & Rangel-Heras, E. Intelligent classification and diagnosis of diabetes and impaired glucose tolerance using deep neural networks. Mathematics11(19), 4065 (2023). [Google Scholar]
  • 19.Alhassan, Z. et al. Type-2 diabetes mellitus diagnosis from time series clinical data using deep learning models. In Artificial Neural Networks and Machine Learning–ICANN 2018: 27th International Conference on Artificial Neural Networks, Rhodes, Greece, October 4–7, 2018, Proceedings, Part III 27, 468–478 (Springer International Publishing, 2018).
  • 20.Rahman, M., Islam, D., Mukti, R. J. & Saha, I. A deep learning approach based on convolutional LSTM for detecting diabetes. Comput. Biol. Chem.88, 107329 (2020). [DOI] [PubMed] [Google Scholar]
  • 21.Chowdary, P. B. K. & Kumar, R. U. An effective approach for detecting diabetes using deep learning techniques based on convolutional LSTM networks. Int. J. Adv. Comput. Sci. Appl., 12(4). (2021).
  • 22.Waberi, A. D., Mwangi, R. W. & Rimiru, R. M. Advancing type II diabetes predictions with a hybrid LSTM-XGBoost approach. J. Data Anal. Inform. Process.12(02), 163–188 (2024). [Google Scholar]
  • 23.Alex, S. A., Jhanjhi, N. Z., Humayun, M., Ibrahim, A. O. & Abulfaraj, A. W. Deep LSTM model for diabetes prediction with class balancing by SMOTE. Electronics11(17), 2737 (2022). [Google Scholar]
  • 24.Olisah, C. C., Smith, L. & Smith, M. Diabetes mellitus prediction and diagnosis from a data preprocessing and machine learning perspective. Comput. Methods Programs Biomed.220, 106773 (2022). [DOI] [PubMed] [Google Scholar]
  • 25.Ahmed, N. et al. Machine learning based diabetes prediction and development of smart web application. Int. J. Cogn. Comput. Eng.2, 229–241 (2021). [Google Scholar]
  • 26.Haq, A. U. et al. Intelligent machine learning approach for effective recognition of diabetes in E-healthcare using clinical data. Sensors. 20(9), 2649 (2020). [DOI] [PMC free article] [PubMed]
  • 27.Nagaraj, P. & Deepalakshmi, P. Diabetes prediction using enhanced SVM and deep neural network learning techniques: an algorithmic approach for early screening of diabetes. Int. J. Healthc. Inform. Syst. Inf. (IJHISI). 16(4), 1–20 (2021). [Google Scholar]
  • 28.Wee, B. F., Sivakumar, S., Lim, K. H., Wong, W. K. & Juwono, F. H. Diabetes detection based on machine learning and deep learning approaches. Multimedia Tools Appl.83(8), 24153–24185 (2024). [Google Scholar]
  • 29.Er, M. B. A novel approach for classification of speech emotions based on deep and acoustic features. Ieee Access.8, 221640–221653 (2020). [Google Scholar]
  • 30.Er, M. B., Isik, E. & Isik, I. Parkinson’s detection based on combined CNN and LSTM using enhanced speech signals with variational mode decomposition. Biomed. Signal Process. Control. 70, 103006 (2021). [Google Scholar]
  • 31.Er, M. B. Heart sounds classification using convolutional neural network with 1D-local binary pattern and 1D-local ternary pattern features. Appl. Acoust.180, 108152 (2021). [Google Scholar]
  • 32.King Abdullah International Medical Research Center. KAIMRC Diabetes dataset, https://kaimrc.med.sa/En (2024). Accessed 09 Feb.
  • 33.Fitrianto, A., Muhamad, W. Z. A. W., Kriswan, S. & Susetyo, B. Comparing outlier detection methods using boxplot generalized extreme studentized deviate and sequential fences. Aceh Int. J. Sci. Technol.11(1), 38–45 (2022). [Google Scholar]
  • 34.Singh, B. K., Verma, K. & Thoke, A. S. Investigations on impact of feature normalization techniques on classifier’s performance in breast tumor classification. Int. J. Comput. Appl.116(19) (2015).
  • 35.Bilski, J., Smoląg, J., Kowalczyk, B., Grzanek, K. & Izonin, I. Fast computational approach to the Levenberg-Marquardt algorithm for training feedforward neural networks. J. Artif. Intell. Soft Comput. Res.13(2), 45–61 (2023). [Google Scholar]
  • 36.Hanusz, Z., Tarasinska, J. & Zielinski, W. Shapiro–Wilk test with known mean. REVSTAT-Statistical J.14(1), 89–100 (2016). [Google Scholar]
  • 37.McHugh, M. L. Multiple comparison analysis testing in ANOVA. Biochemia Med.21(3), 203–209 (2011). [DOI] [PubMed] [Google Scholar]



Articles from Scientific Reports are provided here courtesy of Nature Publishing Group
