BMC Medical Informatics and Decision Making. 2025 Dec 31;26:36. doi: 10.1186/s12911-025-03333-9

Explainable Extra Trees Classifier model for early detection of type 2 diabetes: evidence from the PERSIAN Dena Cohort

Mustafa Ghaderzadeh 1, Zahra Rafie 2, Cirruse Salehnasab 3
PMCID: PMC12866003  PMID: 41469992

Abstract

Background

Type 2 diabetes mellitus (T2DM) develops gradually and often remains undiagnosed until complications emerge. Early detection through transparent machine-learning models can improve prevention and targeted screening. This study developed and evaluated an interpretable Extra Trees Classifier (ETC) for early detection of T2DM within the PERSIAN Dena Cohort, emphasizing probability calibration, fairness, and clinical interpretability.

Methods

Data from 3,203 adults aged 35–70 years were analyzed. Seventy-nine demographic, lifestyle, anthropometric, comorbidity, and biochemical variables were considered; fifteen informative predictors were retained after preprocessing and feature elimination. The ETC was optimized by randomized hyperparameter search and evaluated through ten-fold cross-validation with an additional 80 / 20 internal–external split. Isotonic regression was used to calibrate probability estimates. Model transparency and feature influence were examined using SHapley Additive exPlanations (SHAP) and Morris sensitivity analysis.

Results

Cross-validated performance showed mean accuracy 0.69 ± 0.03 and AUC 0.69 ± 0.04, indicating moderate discrimination and stable internal consistency. On the 20% hold-out set, the uncalibrated model achieved AUC 0.67 and F1 0.66. After isotonic calibration, AUC declined to 0.64 and the Brier score increased to 0.48 (slope 0.09; intercept − 1.50), revealing under-confident probability estimates. Excluding fasting blood sugar (FBS) improved performance (AUC 0.77), whereas categorizing FBS into deciles reduced AUC to 0.57. Across sex and age subgroups, AUCs ranged 0.63–0.70 without systematic bias. SHAP and Morris analyses identified FBS, fatty-liver status, age, kidney-stone history, and triglycerides as dominant predictors, with lifestyle factors such as beverage and vegetable intake exerting secondary, modifiable influence.

Conclusions

Although overall predictive power was limited, the calibrated ETC provided transparent insight into feature interactions, calibration behavior, and data limitations. The framework highlights that interpretability and fairness are as essential as accuracy for trustworthy clinical AI. Future research should expand predictor diversity, address class imbalance, and validate across other PERSIAN cohorts to develop a more generalizable, interpretable model for early T2DM risk prediction.

Graphical abstract

[Graphical abstract image: 12911_2025_3333_Figa_HTML.jpg]

Keywords: Type 2 diabetes mellitus, Machine learning, Extra trees classifier, Explainable AI, Probability calibration, PERSIAN Dena Cohort

Introduction

Type 2 diabetes mellitus (T2DM) is a chronic metabolic disorder characterized by impaired insulin secretion and insulin resistance, leading to persistent hyperglycemia and a wide range of complications—including cardiovascular disease, nephropathy, neuropathy, and retinopathy—that severely impact quality of life. It remains one of the most pressing public-health challenges worldwide. According to the 11th edition of the IDF Diabetes Atlas, an estimated 589 million adults were living with diabetes in 2024, and this number is projected to exceed 850 million by 2050; more than 40% of all cases remain undiagnosed [1, 2]. Complementary evidence from the NCD Risk Factor Collaboration estimated 828 million adults with diabetes globally in 2022, nearly 445 million of whom were untreated, underscoring continuing gaps in early detection and care [3]. Both the World Health Organization Global Report on Diabetes and the American Diabetes Association Standards of Care emphasize the urgent need for prevention and earlier diagnosis [4, 5]. Beyond its health toll, diabetes imposes a formidable economic burden, with annual healthcare expenditures now exceeding one trillion US dollars and projected to double by 2030 [6].

Traditional statistical models—such as logistic regression and Cox proportional-hazards models—have long served as the backbone of diabetes risk prediction due to their interpretability and analytical robustness [7, 8]. However, these approaches rely on assumptions of linearity and independence that rarely hold in complex, high-dimensional biomedical data. Machine-learning (ML) techniques offer more flexible solutions capable of modeling non-linear interactions and managing heterogeneous data structures. Ensemble tree-based algorithms, including Random Forest [9], Gradient Boosting, XGBoost [10], and the Extra Trees Classifier (ETC), have consistently achieved strong predictive performance for T2DM in population-based studies [11, 12]. Among them, XGBoost has become widely used in healthcare prediction because of its scalability and efficiency [10]. Systematic reviews confirm that tree-based methods frequently outperform—or complement—deep-learning and hybrid frameworks depending on dataset complexity [13, 14].

Despite these successes, most ML models remain “black boxes,” limiting their interpretability and clinical acceptance. Healthcare professionals must understand the rationale behind model predictions before applying them to patient care [15, 16]. Explainable AI (XAI) methods aim to bridge this gap by quantifying the contribution of each predictor to the model output. SHapley Additive exPlanations (SHAP) have gained particular prominence because they provide both global and local interpretability while maintaining theoretical consistency [17, 18]. Complementary tools such as Local Interpretable Model-agnostic Explanations (LIME) and Partial Dependence Plots (PDP) can further visualize relationships but lack SHAP’s additive fairness. Global sensitivity analysis, such as the Morris method, augments SHAP by assessing the robustness of feature influence across the entire input space [19].

Recent studies integrating ensemble classifiers with XAI frameworks have demonstrated that interpretable models not only sustain competitive accuracy but also expose meaningful clinical predictors—including fasting blood sugar (FBS), body-mass index (BMI), triglycerides (TG), and blood pressure—thereby enhancing user trust [20, 21]. While some hybrid or multi-stage models report extremely high accuracy [22, 23], these systems often sacrifice transparency and reproducibility. Human-centered interpretability therefore remains indispensable for real-world deployment [15, 16].

Against this background, the present study sought to develop, calibrate, and interpret an Extra Trees Classifier for early detection of T2DM using data from the PERSIAN Dena Cohort [24]. The ETC was selected for its robustness to noisy and correlated predictors, computational efficiency, and ease of interpretation relative to deeper or more opaque architectures [11, 12]. SHAP analysis was used to explain both individual and population-level predictions, while the Morris sensitivity method assessed the stability of feature influence. In addition to interpretability, rigorous internal–external validation and isotonic calibration were implemented to ensure the model’s reliability and fairness.

This work makes three principal contributions:

  1. Construction and validation of a robust ETC for T2DM prediction using comprehensive demographic, anthropometric, lifestyle, comorbidity, and biochemical variables.

  2. Integration of SHAP and Morris sensitivity analyses to provide complementary global and local interpretability, supported by isotonic probability calibration.

  3. Demonstration of how interpretable outputs and calibration insights can support medical decision-making by linking predictions to both biological and modifiable lifestyle risk factors.

By addressing the dual challenges of accuracy and explainability, this study advances the practical readiness of ML-based tools for diabetes prediction and contributes to their responsible integration into clinical and public-health practice.

Methods

Study design and data source

This retrospective study analyzed a structured dataset containing seventy-nine independent predictors and one binary outcome variable, Has D.M II, denoting the presence or absence of type 2 diabetes mellitus (T2DM). All records were anonymized before analysis. Predictors represented five domains—demographic, lifestyle, anthropometric, comorbidity, and biochemical. The dataset was derived from the PERSIAN Dena Cohort, one of the regional components of the national PERSIAN project designed to investigate non-communicable disease determinants across Iran [24].

Study population

Data from 3,203 adults aged 35–70 years were included. T2DM was defined as fasting-blood-sugar (FBS) ≥ 126 mg/dL or a documented physician diagnosis accompanied by antidiabetic therapy. Among the participants, 402 (12.55%) had diabetes, while 2,801 were non-diabetic. Predictors comprised demographic variables (age, sex, education, employment), lifestyle indicators (dietary pattern, physical activity, smoking, alcohol use, sleep duration), anthropometric indices (body-mass index [BMI], waist and hip circumferences), comorbidities (hypertension, cardiovascular disease, fatty liver, thyroid disorder), and biochemical measurements (triglycerides, HDL-C, LDL-C, liver enzymes, blood pressure).
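The outcome definition above can be sketched as a simple labeling rule. This is an illustrative reconstruction, assuming a pandas data frame; the column names `FBS`, `physician_dx`, and `on_antidiabetic_rx` are hypothetical placeholders, not the cohort's actual field names:

```python
import pandas as pd

def label_t2dm(df: pd.DataFrame) -> pd.Series:
    """Binary T2DM label: FBS >= 126 mg/dL, or a documented physician
    diagnosis accompanied by antidiabetic therapy. Column names here are
    illustrative placeholders, not the cohort's actual field names."""
    by_fbs = df["FBS"] >= 126
    by_diagnosis = df["physician_dx"] & df["on_antidiabetic_rx"]
    return (by_fbs | by_diagnosis).astype(int)

# Three toy participants: normoglycemic, biochemical case, treated case.
demo = pd.DataFrame({
    "FBS": [95, 130, 110],
    "physician_dx": [False, False, True],
    "on_antidiabetic_rx": [False, False, True],
})
print(label_t2dm(demo).tolist())  # [0, 1, 1]
```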

Data preprocessing

Missing values for continuous variables were imputed using the median, and categorical features were imputed using the mode. Categorical variables were encoded numerically via one-hot encoding. Because the Extra Trees Classifier (ETC) is inherently scale-invariant, no normalization or standardization was required for continuous predictors. Class imbalance was addressed using the Synthetic Minority Over-Sampling Technique (SMOTE) to balance positive and negative classes. These preprocessing steps align with best practices in previous ML studies for diabetes prediction [11, 23, 25].
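A minimal sketch of these preprocessing steps, assuming a pandas/scikit-learn stack with illustrative column names; SMOTE itself lives in the separate imbalanced-learn package and is therefore only indicated in a comment rather than executed:

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Toy frame standing in for the cohort data (columns are illustrative).
df = pd.DataFrame({
    "Age": [40.0, np.nan, 55.0, 62.0],
    "BMI": [27.1, 31.4, np.nan, 24.8],
    "Employmentstatus": ["employed", np.nan, "retired", "employed"],
})
num_cols = ["Age", "BMI"]
cat_cols = ["Employmentstatus"]

# Median imputation for continuous variables, mode for categorical.
df[num_cols] = SimpleImputer(strategy="median").fit_transform(df[num_cols])
df[cat_cols] = SimpleImputer(strategy="most_frequent").fit_transform(df[cat_cols])

# One-hot encode categorical predictors.
df = pd.get_dummies(df, columns=cat_cols)

# No scaling is applied: Extra Trees split on thresholds, so they are
# invariant to monotone rescaling of the inputs.
# Class imbalance would then be addressed with SMOTE, e.g.:
#   from imblearn.over_sampling import SMOTE
#   X_res, y_res = SMOTE(random_state=42).fit_resample(X, y)
print(df.isna().sum().sum())  # 0
```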

Feature selection

To reduce dimensionality and enhance interpretability, recursive feature elimination with cross-validation (RFECV) was applied. RFECV iteratively removed less informative variables while maintaining predictive performance, resulting in a final set of 15 features. Feature selection has been shown to improve model generalization and computational efficiency in high-dimensional biomedical datasets [13, 19, 26].
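The RFECV step could be implemented as below; this is a sketch on synthetic data with illustrative hyperparameters, not the study's exact configuration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-in for the cohort: 20 candidate features, a few informative.
X, y = make_classification(n_samples=400, n_features=20, n_informative=5,
                           random_state=42)

selector = RFECV(
    estimator=ExtraTreesClassifier(n_estimators=50, random_state=42),
    step=1,                     # drop one feature per elimination round
    cv=StratifiedKFold(3),
    scoring="roc_auc",
)
selector.fit(X, y)
print(selector.n_features_)     # number of retained features
print(selector.support_)        # boolean mask of the retained columns
```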

Model selection and rationale

The Extra Trees Classifier (ETC) was selected as the predictive model for several reasons:

  • Demonstrated high predictive accuracy in structured healthcare data [11, 12];

  • Robustness to multicollinearity and noisy variables;

  • Intrinsic interpretability via ensemble feature-importance measures; and

  • Efficiency in high-dimensional, heterogeneous datasets.

Unlike studies comparing multiple algorithms, this work intentionally focused on a single model to ensure methodological transparency and enable detailed explainability and calibration analysis within a consistent framework.

Hyperparameter optimization

Hyperparameters were tuned using randomized search with stratified ten-fold cross-validation, iterating 100 combinations to maximize mean AUC. Parameters explored included:

  • n_estimators: 100–1,000

  • max_depth: 5–50

  • min_samples_split: 2–10

  • min_samples_leaf: 1–5

  • max_features: {sqrt, log2, or 0.1–1.0 fraction}

  • bootstrap: {True, False}

Randomization ensured efficient exploration of the parameter space and mitigated overfitting [14].
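The search described above maps directly onto scikit-learn's RandomizedSearchCV. The sketch below mirrors the listed ranges on synthetic data, but shrinks the budget (5 iterations, 3 folds, at most 300 trees) so it runs quickly; the study used 100 combinations with stratified ten-fold cross-validation:

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold

X, y = make_classification(n_samples=300, n_features=15, random_state=0)

# Search space mirroring the ranges listed above (n_estimators narrowed
# here from the study's 100-1,000 to keep the sketch fast).
param_distributions = {
    "n_estimators": randint(100, 301),
    "max_depth": randint(5, 51),
    "min_samples_split": randint(2, 11),
    "min_samples_leaf": randint(1, 6),
    "max_features": ["sqrt", "log2", 0.5],
    "bootstrap": [True, False],
}

search = RandomizedSearchCV(
    ExtraTreesClassifier(random_state=0),
    param_distributions,
    n_iter=5,                                  # study: 100 combinations
    scoring="roc_auc",                         # maximize mean AUC
    cv=StratifiedKFold(3, shuffle=True, random_state=0),  # study: 10 folds
    random_state=0,
)
search.fit(X, y)
print(search.best_params_)
```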

Model evaluation

Initial performance was assessed using stratified ten-fold cross-validation, reporting mean ± SD for accuracy, AUC, recall, precision, F1-score, Cohen’s Kappa, and Matthews Correlation Coefficient (MCC). To obtain a more realistic estimate of generalizability, an 80 / 20 internal–external hold-out validation was then performed [6]. Metrics on the hold-out set included AUC, F1, specificity, Brier score, calibration slope and intercept, precision, recall, and MCC.
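A sketch of this evaluation protocol, assuming synthetic data with roughly the cohort's 12–13% positive rate; Kappa and MCC need custom scorers since they are not built-in scoring strings:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import (cohen_kappa_score, make_scorer,
                             matthews_corrcoef, roc_auc_score)
from sklearn.model_selection import (StratifiedKFold, cross_validate,
                                     train_test_split)

# Synthetic stand-in with an imbalanced outcome (~13% positive).
X, y = make_classification(n_samples=500, n_features=15, weights=[0.87],
                           random_state=1)
model = ExtraTreesClassifier(n_estimators=100, random_state=1)

scoring = {
    "accuracy": "accuracy",
    "auc": "roc_auc",
    "recall": "recall",
    "precision": "precision",
    "f1": "f1",
    "kappa": make_scorer(cohen_kappa_score),
    "mcc": make_scorer(matthews_corrcoef),
}
cv_results = cross_validate(model, X, y,
                            cv=StratifiedKFold(10, shuffle=True, random_state=1),
                            scoring=scoring)
for name in scoring:
    scores = cv_results[f"test_{name}"]
    print(f"{name}: {scores.mean():.2f} +/- {scores.std():.2f}")

# 80/20 stratified hold-out for the internal-external check.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          stratify=y, random_state=1)
holdout_auc = roc_auc_score(y_te, model.fit(X_tr, y_tr).predict_proba(X_te)[:, 1])
print(f"hold-out AUC: {holdout_auc:.2f}")
```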

Probability calibration

Because uncalibrated tree-based ensembles may yield over- or under-confident probabilities, isotonic regression was applied for post-hoc calibration [6, 21]. The calibration model was fitted on out-of-fold predictions from cross-validation and evaluated on the hold-out set. Calibration quality was summarized by slope (ideal = 1), intercept (ideal = 0), and Brier score (lower = better). Reliability plots compared predicted and observed event frequencies, enabling visual assessment of probability accuracy.
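The calibration step can be sketched with scikit-learn's CalibratedClassifierCV. The slope/intercept computation below uses a near-unpenalized logistic recalibration of the outcome on the predicted log-odds, which is one common way to obtain these quantities; it is an assumption here, not necessarily the authors' exact procedure:

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=15, weights=[0.87],
                           random_state=2)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y,
                                          random_state=2)

# Isotonic post-hoc calibration fitted on out-of-fold predictions (cv=10),
# mirroring the procedure described above.
calibrated = CalibratedClassifierCV(
    ExtraTreesClassifier(n_estimators=100, random_state=2),
    method="isotonic", cv=10)
calibrated.fit(X_tr, y_tr)

p = calibrated.predict_proba(X_te)[:, 1]
print("Brier:", round(brier_score_loss(y_te, p), 3))   # lower = better

# Calibration slope/intercept: near-unpenalized logistic recalibration of
# the outcome on the log-odds of the predictions (ideal slope 1, intercept 0).
eps = 1e-6
p_clip = np.clip(p, eps, 1 - eps)
logit = np.log(p_clip / (1 - p_clip))
recal = LogisticRegression(C=1e6).fit(logit.reshape(-1, 1), y_te)
print("slope:", round(float(recal.coef_[0, 0]), 2),
      "intercept:", round(float(recal.intercept_[0]), 2))
```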

Sensitivity and subgroup analyses

Model robustness was examined under two sensitivity scenarios:

  1. FBS exclusion: to assess dependence on a diagnostic variable.

  2. FBS categorization: substitution of the continuous FBS with decile-based categories.

Fairness was further evaluated by computing AUC across sex (male/female) and age (35–49, 50–59, 60–70 years) subgroups, thereby exploring potential bias in discrimination performance [15, 16].
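The subgroup comparison reduces to computing AUC on demographic slices of the hold-out predictions. A sketch with hypothetical labels, calibrated probabilities, and group assignments:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(3)
n = 400
# Hypothetical hold-out labels, calibrated probabilities, and demographics.
y_true = rng.integers(0, 2, n)
y_prob = np.clip(0.3 * y_true + rng.normal(0.35, 0.2, n), 0, 1)
sex = rng.choice(["male", "female"], n)
age = rng.choice(["35-49", "50-59", "60-70"], n)

def subgroup_auc(groups):
    """AUC computed separately within each demographic subgroup."""
    return {g: round(roc_auc_score(y_true[groups == g], y_prob[groups == g]), 2)
            for g in np.unique(groups)}

print(subgroup_auc(sex))
print(subgroup_auc(age))
```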

Explainable AI approach

Model interpretability was examined using three complementary approaches:

  1. SHapley Additive exPlanations (SHAP) provided both global and local interpretability, assigning each predictor an additive contribution to the model output [17, 18, 20].

  2. Morris global sensitivity analysis quantified the stability and magnitude of feature influence under controlled perturbations [19].

  3. Baseline feature importance from the ETC’s built-in Gini-impurity measure was used as a reference for comparison.

Together, these methods revealed how biochemical, anthropometric, and lifestyle variables contributed to predicted diabetes risk, enhancing transparency and clinical interpretability [13, 15, 20].
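As a self-contained illustration of approaches (2) and (3), the sketch below reads the ETC's built-in Gini importances and computes a simplified Morris-style µ* via one-at-a-time perturbations. This is a deliberate simplification: the study used SALib's full elementary-effects design, and SHAP values (approach 1) would come from shap.TreeExplainer, neither of which is executed here:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

X, y = make_classification(n_samples=400, n_features=8, n_informative=4,
                           random_state=4)
model = ExtraTreesClassifier(n_estimators=100, random_state=4).fit(X, y)

# (3) Baseline Gini-impurity importances, built into the ETC.
gini = model.feature_importances_

# (2) Simplified Morris-style mu* per feature: mean absolute change in
# predicted probability under a one-at-a-time shift of that feature.
# (SALib's elementary-effects design samples full trajectories instead.)
def morris_mu_star(model, X, delta=0.5):
    base = model.predict_proba(X)[:, 1]
    mu_star = []
    for j in range(X.shape[1]):
        Xp = X.copy()
        Xp[:, j] += delta * X[:, j].std()
        mu_star.append(np.mean(np.abs(model.predict_proba(Xp)[:, 1] - base)))
    return np.array(mu_star)

mu = morris_mu_star(model, X)
print("Gini top feature:", int(np.argmax(gini)))
print("Morris-style top feature:", int(np.argmax(mu)))
```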

Software configuration

Analyses were performed in Python 3.11. Data preprocessing utilized pandas v2.2 and NumPy v1.26; modeling was conducted with scikit-learn v1.4. Visualizations were produced using matplotlib v3.8 and seaborn v0.13. SHAP analyses used shap v0.44, and sensitivity analysis was implemented via SALib.

Ethical considerations

The study conformed to the principles of the Declaration of Helsinki for research involving human subjects. All data were fully anonymized before analysis, and no personally identifiable information was available to investigators. Because only secondary anonymized data were used, formal ethical review was waived under local regulations. Confidentiality and data-protection standards were maintained throughout.

Results

Dataset characteristics

The analysis included 3,203 adults aged 35–70 years from the PERSIAN Dena Cohort; among them, 402 (12.55%) met the criteria for type 2 diabetes mellitus (T2DM) and 2,801 (87.45%) were non-diabetic.

The curated dataset comprised 79 candidate predictors encompassing demographic, lifestyle, anthropometric, comorbidity, and biochemical domains, with Has D.M II as the binary outcome.

A full description of the predictors, their definitions, and modeling roles is presented in Table 1.

Table 1.

Features and specifications of the dataset

Row Feature Description Type Role
1 GenderID Male or Female Independent Input
2 Age Numeric Independent Input
3 LastEduID Highest Educational Degree Independent Input
4 MET_Final Metabolic condition of individuals Independent Input
5 WaistCircumference Waist circumference (numeric) Independent Input
6 HipCircumference Hip circumference (numeric) Independent Input
7 WristCircumference Wrist circumference (numeric) Independent Input
8 BMI Body mass index based on height and weight Independent Input
9 HasJob_new Recent job change Independent Input
10 Employmentstatus Employment status Independent Input
11 SleepDuration24h1 Sleep duration in 24 h Independent Input
12 SleepDurationMidDay1 Midday nap duration Independent Input
13 TV1 Hours of TV watching Independent Input
14 AtDeskWork1 Desk job hours Independent Input
15 Computer1 Computer usage hours Independent Input
16 Eating1 Amount of food consumed Independent Input
17 Cooking1 Method of food preparation Categorical Input
18 Driving1 Time spent driving Independent Input
19 Walking1 Time spent walking Independent Input
20 AerobicExercise1 Time spent on aerobic exercise Independent Input
21 DrivingHeavyVehicle1 Heavy vehicle driver Independent Input
22 LightAgricultural1 Low-maintenance agricultural work Independent Input
23 HeavyLaborAgricultJobs1 High-maintenance agricultural work Independent Input
24 DPB Diastolic blood pressure Independent Input
25 SPB Systolic blood pressure Independent Input
26 RBC Red blood cell count Independent Input
27 HGB Hemoglobin level Independent Input
28 FBS Fasting blood sugar Independent Input
29 TG Triglycerides level Independent Input
30 CHOL Cholesterol level Independent Input
31 SGOT Aspartate aminotransferase (AST) enzyme Independent Input
32 SGPT Alanine aminotransferase (ALT) enzyme Independent Input
33 ALP Alkaline phosphatase enzyme Independent Input
34 HDL.C High-density lipoprotein cholesterol (HDL-C) Independent Input
35 GGT Gamma-glutamyl transferase enzyme Independent Input
36 LDL_Calc Low-density lipoprotein cholesterol (LDL-C) Independent Input
37 WSI_total Number of imaging sessions Independent Input
38 UseDrugs Use of narcotics or opiates Independent Input
39 UseAlcohol Alcohol consumption Independent Input
40 Pizza Pizza consumption Independent Input
41 TEACOFFEE Tea and coffee consumption Independent Input
42 BAVARAGE Energy drinks consumption Independent Input
43 Pickle Pickle consumption Independent Input
44 Oil Oil consumption Independent Input
45 SimpleSugar Simple sugar consumption Independent Input
46 Vegetables Vegetable consumption Independent Input
47 Nuts Nut consumption Independent Input
48 Juice Juice consumption Independent Input
49 Fruits Fruit consumption Independent Input
50 Legumes Legume consumption Independent Input
51 Whitemeat White meat consumption Independent Input
52 Redmeat Red meat consumption Independent Input
53 Totalmeat Total meat consumption Independent Input
54 dairyproduct Dairy product consumption Independent Input
55 Wholegrain Whole grain consumption Independent Input
56 Nutrient FFQ_protein Daily protein intake Independent Input
57 Total_lipid_fat Liquid fat consumption Independent Input
58 GrilledFoodIntID Grilled food consumption Independent Input
59 FriedFoodIntID Fried food consumption Independent Input
60 PotatoFryTypeID Fried potato consumption Independent Input
61 VegFryTypeID Fried vegetable consumption Independent Input
62 UsedOilTypeID Type of oil used Independent Input
63 ReUseMold Reuse of containers Independent Input
64 UsedScratchedTeflon Use of scratched Teflon pans Independent Input
65 FoodSaltUsedID Salt used in food Independent Input
66 OnionFryTypeID Fried onion consumption Independent Input
67 CookingWareID1 Type of cookware used Independent Input
68 VegKeepDried Consumption of dried vegetables Independent Input
69 VegKeepRefrigerator Consumption of refrigerated vegetables Independent Input
70 VegKeepFreezer Consumption of frozen vegetables Independent Input
71 BreadContainerTypeID1 Bread-containing products consumption Independent Input
72 LemonContainerTypeID1 Lemon-containing products consumption Independent Input
73 CVD_History History of stroke Independent Input
74 HasDepression Presence of depression Independent Input
75 urolithiasis Presence of urolithiasis Independent Input
76 HasThyroidDisease Presence of thyroid disease Independent Input
77 HasFattyLiver Presence of fatty liver Independent Input
78 HasCardiacDisease Presence of heart disease Independent Input
79 HasHypertension Presence of hypertension Independent Input
80 Has D.M II Presence of Type 2 diabetes Dependent Target

Missing values were imputed by the median (continuous variables) or mode (categorical), and categorical predictors were one-hot encoded.

Because the Extra Trees Classifier (ETC) is scale-invariant, no normalization or standardization was applied.

Class imbalance (≈ 12% positive) was addressed by the Synthetic Minority Over-Sampling Technique (SMOTE).

Feature selection with recursive feature elimination and cross-validation (RFECV) reduced the 79 variables to 15 informative predictors, which formed the final modeling set.

Overall discrimination and calibration

Ten-fold cross-validation produced mean accuracy 0.69 ± 0.03 and AUC 0.69 ± 0.04, reflecting moderate discrimination and stable internal consistency.

On the independent 20% hold-out set, the uncalibrated model achieved AUC 0.67 and F1 0.66 (Fig. 1, ROC curve; Fig. 2, precision–recall curve).

Fig. 1.

Fig. 1

ROC curves for uncalibrated and isotonic-calibrated ETC models on the hold-out set

Fig. 2.

Fig. 2

Precision–recall curve for the calibrated ETC on the hold-out set

After isotonic calibration, discrimination declined slightly to AUC 0.64, while reliability worsened (Brier 0.48; slope 0.09; intercept − 1.50), indicating under-confident probability estimates (Fig. 3).

Fig. 3.

Fig. 3

Reliability (calibration) plot comparing predicted versus observed event probabilities

The confusion matrix (Fig. 4) revealed a strong bias toward positive predictions and negligible specificity, consistent with flattened calibrated probabilities.

Fig. 4.

Fig. 4

Confusion matrix for calibrated ETC predictions (threshold = 0.50)

Aggregate performance before and after calibration is summarized in Fig. 5, which illustrates the trade-off between discrimination and probability reliability.

Fig. 5.

Fig. 5

Comparison of AUC and Brier scores before and after calibration

Sensitivity to fasting-blood-sugar (FBS) representation

To test whether model performance depended on the diagnostic feature FBS, two sensitivity experiments were performed.

When FBS was excluded, the calibrated model’s AUC improved to 0.77 (F1 = 0.66); however, replacing FBS with decile-based categories caused AUC to drop to 0.57 (F1 = 0.66) (Fig. 6).

Fig. 6.

Fig. 6

FBS sensitivity analysis comparing models with full, excluded, and decile-binned FBS inputs

These findings indicate that the continuous form of FBS contributes major predictive information but can also inflate discrimination when used jointly with other correlated variables.

Subgroup (fairness) performance

Fairness analysis assessed discrimination by sex and age on calibrated hold-out predictions.

AUCs were 0.70 for males and 0.63 for females.

By age group, AUCs were 0.66 (35–49 y), 0.69 (50–59 y), and 0.66 (60–70 y) (Figs. 7 and 8).

Fig. 7.

Fig. 7

Subgroup performance by sex (AUC values on calibrated hold-out)

Fig. 8.

Fig. 8

Subgroup performance by age group (AUC values on calibrated hold-out)

Although absolute performance remained modest, these results show no systematic bias across demographic subgroups.

Global and local interpretability

Global SHAP analysis

Global SHAP analysis (Fig. 9) identified fasting blood sugar (FBS), HasFattyLiver, Age, HasKidneyStone, and Triglycerides (TG) as the five most influential predictors.

Fig. 9.

Fig. 9

Global SHAP summary plot showing feature importance, direction, and magnitude of influence on predicted T2DM risk

Lifestyle and dietary factors—including Vegetables, TEACOFFEE, BAVARAGE, and Juice consumption—showed secondary, modifiable effects.

Higher FBS, fatty-liver status, and kidney-stone history increased predicted diabetes probability, whereas greater vegetable intake modestly lowered it.

The wide distribution of SHAP values for the same feature indicated heterogeneous effects across individuals.

Morris global sensitivity analysis

To examine stability of feature influence, the Morris method was applied (Fig. 10).

Fig. 10.

Fig. 10

Morris sensitivity indices quantifying the magnitude and stability of feature influence for the calibrated ETC model

FBS had the largest sensitivity (µ* ≈ 0.42), followed by HasFattyLiver (≈ 0.17), HasKidneyStone (≈ 0.16), Age (≈ 0.13), and TG (≈ 0.09).

Local SHAP interpretations

To illustrate case-level reasoning, local SHAP explanations were generated for three representative individuals (Fig. 11).

Fig. 11.

Fig. 11

Local SHAP interpretations for three representative cases showing how specific features (red = positive, blue = negative influence) affect individual predictions relative to the baseline output

In high-risk cases, elevated FBS and fatty-liver status were dominant positive contributors, while Age, TG, and lifestyle variables modulated probabilities in opposite directions.

For lower-risk profiles, protective effects from normal FBS and healthy dietary patterns were evident.

These localized explanations demonstrate how the model combines biochemical and behavioral features to derive patient-specific predictions, enhancing clinical interpretability.

Comparison with the original analysis

The present findings differ markedly from those of the preliminary version, which reported AUC ≈ 0.99 and F1 ≈ 0.97 under internal-only cross-validation.

Those earlier results evaluated the model on the same data used for training and optimization, included the diagnostic variable FBS without calibration, and therefore over-estimated performance.

The current workflow—featuring internal–external validation, explicit class-balance adjustment, and isotonic calibration—provides a realistic estimate of generalizability.

The lower AUC values (≈ 0.64–0.69) reflect true model performance when evaluated properly and highlight the importance of robust validation for reproducible AI in clinical research.

Discussion

The calibrated Extra Trees Classifier (ETC) developed in this study provided a transparent yet moderately accurate framework for early detection of type 2 diabetes mellitus (T2DM) within the PERSIAN Dena Cohort. The model achieved an area under the curve (AUC) of 0.64 on the hold-out dataset after isotonic calibration and demonstrated consistent but modest discrimination across sex and age subgroups. Although these results are lower than those observed in earlier internal-only analyses, they represent a far more realistic estimate of model performance and underscore the importance of rigorous internal–external validation and probability calibration in clinical machine-learning research [6, 21].

Interpretation of model behavior

Tree-based ensemble algorithms such as the ETC are capable of capturing nonlinear and high-order interactions among biochemical, anthropometric, and lifestyle variables [11, 12, 27]. The recalibrated ETC in this work revealed that a relatively small number of predictors drive most of the discriminative power. Fasting-blood-sugar (FBS) unsurprisingly remained the dominant variable, followed by fatty-liver status, age, kidney-stone history, and triglycerides. These findings are consistent with prior population-level studies identifying hepatic and lipid metabolism as central to diabetes risk [13, 20, 25]. Notably, lifestyle and dietary factors—such as vegetable consumption, beverage and tea-coffee intake—showed secondary but interpretable effects, aligning with recent evidence linking diet quality and habitual drink choices to glycemic control [20, 25].

The global SHAP analysis provided directionality and magnitude of feature influence, confirming that higher FBS, hepatic impairment, and dyslipidemia increased predicted risk, while greater vegetable intake modestly reduced it. The Morris global sensitivity analysis reinforced the stability of these findings and quantified the robustness of each feature’s contribution across the input space. Together, these complementary interpretability methods illustrate how an ensemble model can function as an analytic microscope—exposing the hierarchy and interplay of metabolic, behavioral, and demographic determinants of diabetes [13, 14, 19, 20].

Understanding calibration and reduced performance

The decline in discrimination from the earlier version (AUC ≈ 0.99) to the present validated analysis (AUC ≈ 0.64) stems from methodological improvements rather than model degradation. The initial study evaluated performance within the same dataset used for training and relied heavily on FBS, a diagnostic marker. When probability calibration and an independent hold-out set were introduced, the model’s predictive capability was no longer inflated by information leakage. Similar patterns have been reported in other health-prediction studies where rigorous validation reduced overly optimistic internal estimates [6, 26, 27].

Isotonic calibration corrected the model’s probability scaling but exposed under-confidence in mid-range risk estimates—an expected phenomenon in limited or imbalanced datasets. The calibrated ETC thus prioritizes probability reliability over numerical AUC, aligning with the TRIPOD recommendations for risk-prediction modeling [5, 6]. This trade-off underscores that interpretability and calibration, not raw accuracy, determine the clinical utility of ML models [15–18, 28].

Clinical and translational implications

The ETC framework demonstrates how explainable ML can reveal clinically meaningful structure even when overall accuracy is modest. Identifying FBS, fatty-liver disease, and triglycerides as core drivers, alongside modifiable behaviors, mirrors the integrated nature of diabetes pathogenesis and prevention. Local SHAP explanations illustrate how individual patient features combine to shape predicted risk, enabling clinicians to visualize why a specific person is classified as high- or low-risk. Such transparency fosters trust and could facilitate integration of ML outputs into preventive counseling or electronic-health-record decision support.

At a population level, the model’s interpretability may help researchers identify which lifestyle factors exert the greatest marginal effects, guiding community-based interventions or public-health messaging. Furthermore, the fairness analysis showing similar AUCs across sex and age groups indicates that ensemble-based explainable approaches can achieve equitable performance when properly validated—a critical consideration for responsible AI in healthcare [15, 16, 29].

Comparison with related work

Several recent studies have achieved very high accuracy in T2DM prediction using deep learning, hybrid, or multi-stage ensemble models [7, 11, 22, 23, 27]. However, these methods often require complex architectures, extensive feature engineering, or proprietary preprocessing pipelines, limiting reproducibility and clinical interpretability. By contrast, the present work intentionally prioritized transparency over maximal accuracy. The inclusion of SHAP and Morris analyses positions this study within the emerging movement toward interpretable and fair medical AI [13, 15, 19, 28].

While the model’s AUC is lower than that of more complex architectures, its interpretive richness offers compensatory value. As recent reviews emphasize, explainability, reproducibility, and calibration are increasingly regarded as essential benchmarks for translating ML models from research into clinical workflows [15–17, 29].

Implications for future research

Several directions can enhance predictive performance while maintaining transparency. First, integrating additional longitudinal or genetic predictors from other PERSIAN regional cohorts could improve discrimination and generalizability. Second, semi-supervised or federated-learning frameworks may allow multi-center model training without compromising privacy. Third, calibration methods such as Bayesian binning or temperature scaling could be tested alongside isotonic regression to refine probability reliability. Finally, a prospective validation study is needed to evaluate clinical impact and user acceptance in real-world settings.

Recent studies have emphasized the importance of transparent model development, calibration, and reproducible evaluation frameworks for clinical machine learning. Our revised analytical pipeline follows these recommendations, integrating explainability, fairness, and internal–external validation within a cohesive workflow [30–32].

Summary

In summary, this study demonstrates that a calibrated, explainable Extra Trees Classifier can provide clinically interpretable insights into diabetes risk even when its numerical accuracy is moderate. The results emphasize that model transparency, calibration, and fairness are indispensable for trustworthy AI in healthcare. Rather than pursuing maximal predictive metrics, this approach champions reproducibility and scientific integrity—foundations essential for translating machine learning into practical, ethical, and equitable diabetes-prevention strategies.

Limitations

This study has several limitations that should be acknowledged.

First, the analysis was conducted using data from a single regional cohort (PERSIAN Dena), which limits external generalizability. Although an internal–external validation design was implemented, future work should incorporate additional PERSIAN subcohorts or independent national datasets to confirm robustness across populations with different socioeconomic and genetic backgrounds.

Second, while the internal–external split and isotonic calibration provided a realistic estimate of generalizability, no prospective external validation was available. As emphasized in predictive-modeling literature [6, 21], the absence of a fully independent evaluation may constrain conclusions about long-term clinical utility.

Third, the present analysis deliberately focused on a single, interpretable algorithm (Extra Trees Classifier) to ensure methodological transparency and reproducibility. Exploring alternative calibrated ensemble models—such as Random Forests, Gradient Boosting, or LightGBM—using identical validation frameworks may yield incremental improvements while retaining explainability.

Fourth, the dataset exhibited a moderate class imbalance (12.55% T2DM prevalence) and a relatively limited number of highly discriminative biochemical markers. Although SMOTE oversampling was applied, some degree of information dilution or oversmoothing may have affected calibration stability.
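The study applied SMOTE via the imbalanced-learn package; to illustrate the mechanism that can produce the oversmoothing noted above, the following is a minimal from-scratch sketch of SMOTE's core interpolation step (synthetic points are drawn on the line segments between minority samples and their nearest minority neighbours). All data and sizes are illustrative.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote_minority(X_min, n_synthetic, k=5, seed=0):
    """Generate synthetic minority samples by interpolating each sampled
    point toward one of its k nearest minority-class neighbours."""
    rng = np.random.default_rng(seed)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, idx = nn.kneighbors(X_min)               # idx[:, 0] is the point itself
    base = rng.integers(0, len(X_min), n_synthetic)
    nbr = idx[base, rng.integers(1, k + 1, n_synthetic)]
    gap = rng.random((n_synthetic, 1))          # interpolation weight in (0, 1)
    return X_min[base] + gap * (X_min[nbr] - X_min[base])

# Toy minority class: 125 samples with 15 features (~12.5% prevalence,
# roughly matching the cohort); upsample toward parity
rng = np.random.default_rng(0)
X_min = rng.normal(loc=2.0, size=(125, 15))
X_syn = smote_minority(X_min, n_synthetic=750)
```

Because every synthetic point lies strictly between two real minority points, SMOTE densifies the minority manifold rather than adding new information — the "information dilution" concern raised above.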

Fifth, fasting blood sugar (FBS)—a diagnostic biomarker—played a dominant role in model behavior. While sensitivity experiments excluding and categorizing FBS helped quantify its influence, these adjustments highlight the ongoing challenge of building models that predict risk rather than re-identify known cases. Future iterations should aim to include longitudinal and behavioral risk trajectories to capture true pre-diagnostic signals.
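The FBS sensitivity experiment is an instance of a generic feature-ablation pattern: retrain the same model with and without the dominant feature and compare hold-out AUCs. A minimal sketch on synthetic data follows; feature index 0 is an illustrative stand-in for FBS, and the numbers have no relation to the study's results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=15,
                           n_informative=6, random_state=3)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          stratify=y, random_state=3)

def auc_with_features(cols):
    """Train on the given feature columns and return hold-out AUC."""
    model = ExtraTreesClassifier(n_estimators=200, random_state=3)
    model.fit(X_tr[:, cols], y_tr)
    return roc_auc_score(y_te, model.predict_proba(X_te[:, cols])[:, 1])

all_cols = list(range(X.shape[1]))
auc_full = auc_with_features(all_cols)
auc_ablate = auc_with_features([c for c in all_cols if c != 0])  # drop FBS stand-in
```

A large drop after ablation signals that the model leans on the diagnostic marker; a stable AUC suggests the remaining predictors carry genuine pre-diagnostic signal.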

Finally, cross-sectional data restrict causal inference. The model identifies associations and predictive patterns but cannot determine temporal or mechanistic relationships between predictors and disease onset. Integrating prospective follow-up data and continuous physiological monitoring could strengthen causal interpretability and clinical translation.

Despite these limitations, this study contributes meaningfully to the methodological literature by illustrating the importance of validation, calibration, and explainability for achieving credible and trustworthy machine-learning predictions in medicine.

Conclusions

This work developed, validated, and interpreted an explainable Extra Trees Classifier for early detection of type 2 diabetes mellitus using data from the PERSIAN Dena Cohort.

Through rigorous internal–external validation and isotonic calibration, the study produced an honest and realistic estimate of model performance (AUC 0.64 on the hold-out set) while demonstrating how ensemble interpretability methods—SHapley Additive exPlanations and Morris sensitivity analysis—can illuminate the relative contributions of biomedical and lifestyle factors.

The results showed that fasting blood sugar, fatty-liver status, kidney-stone history, age, and triglycerides were the most influential predictors, supported by modifiable dietary indicators such as vegetable and beverage intake. Although absolute discrimination was modest, the model achieved transparent, reproducible, and fair performance across demographic subgroups, setting a methodological benchmark for interpretable cohort-based prediction.

More broadly, this research emphasizes that scientific validity in clinical AI depends as much on calibration and interpretability as on accuracy. Transparent reporting of realistic performance metrics helps prevent overfitting and promotes reproducibility—core principles of responsible machine learning.

Future research should extend this calibrated explainable-AI framework across multiple PERSIAN regions, incorporate additional behavioral and genetic predictors, and evaluate its integration into clinical and public-health workflows. By combining interpretability, fairness, and validation, explainable ensemble models like the ETC can evolve from experimental tools into practical decision-support systems that advance equitable diabetes prevention and precision public health.

Acknowledgements

The authors express their gratitude to the Dena Cohort Research Center for providing access to data and continuous collaboration. We also thank the PERSIAN Cohort Study and its steering committee for supporting national-level non-communicable-disease research. The authors appreciate the constructive comments of the peer reviewers, which substantially improved the clarity and quality of this work.

Abbreviations

AI: Artificial Intelligence

ALP: Alkaline Phosphatase

ALT: Alanine Aminotransferase

AST: Aspartate Aminotransferase

AUC: Area Under the Receiver Operating Characteristic Curve

BMI: Body Mass Index

CHOL: Total Cholesterol

CI: Confidence Interval

CV: Cross-Validation

DBP: Diastolic Blood Pressure

ETC: Extra Trees Classifier

F1: F1-Score (harmonic mean of precision and recall)

FBS: Fasting Blood Sugar

GGT: Gamma-Glutamyl Transferase

HbA1c: Hemoglobin A1c

HDL-C: High-Density Lipoprotein Cholesterol

IRB: Institutional Review Board

LDL-C: Low-Density Lipoprotein Cholesterol

LIME: Local Interpretable Model-Agnostic Explanations

LR: Logistic Regression

MCC: Matthews Correlation Coefficient

ML: Machine Learning

NHANES: National Health and Nutrition Examination Survey

PDP: Partial Dependence Plot

RFECV: Recursive Feature Elimination with Cross-Validation

ROC: Receiver Operating Characteristic

SHAP: SHapley Additive exPlanations

SMOTE: Synthetic Minority Over-Sampling Technique

SBP: Systolic Blood Pressure

T2DM: Type 2 Diabetes Mellitus

TG: Triglycerides

TRIPOD: Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis

WHO: World Health Organization

XAI: Explainable Artificial Intelligence

Author contributions

Zahra Rafie: Visualization, Investigation, Resources, Writing – Original Draft. Mustafa Ghaderzadeh: Conceptualization, Visualization, Validation, Methodology, Software, Formal Analysis, Investigation, Writing – Review & Editing. Cirruse Salehnasab: Conceptualization, Project Administration, Supervision, Funding Acquisition, Data Curation, Formal Analysis, Validation, Writing – Review & Editing. All authors reviewed and approved the final manuscript prior to submission.

Funding

No external or institutional funding was received for this research. All analyses were conducted as part of the authors’ academic and institutional responsibilities.

Data availability

The datasets analyzed in this study are part of the PERSIAN Dena Cohort, maintained by the Yasuj University of Medical Sciences, Yasuj, Iran. Because the dataset contains sensitive participant information, raw data are not publicly available. However, qualified researchers may request access to the anonymized dataset by contacting: Dr. Cirruse Salehnasab, Yasuj University of Medical Sciences, Yasuj, Iran. Email: cirruse.salehnasab@gmail.com. All data-access requests will be reviewed by the Dena Cohort Data-Access Committee and must comply with institutional and national ethical regulations. The Python code used for data preprocessing, model development, and explainability analysis is openly available at: https://github.com/salehnasab/Explainable-Extra-Trees-Machine-Learning-Model-for-Early-Detection-of-Type-2-Diabetes/blob/main/DMPrediction.ipynb.

Declarations

Ethics approval and consent to participate

This study was conducted in accordance with the ethical principles of the Declaration of Helsinki. Ethical approval was obtained from the Institutional Review Board (IRB) of Yasuj University of Medical Sciences, Yasuj, Iran (approval code: IR.YUMS.REC.1402.152). The analysis was retrospective and based on secondary use of anonymized data from the PERSIAN Dena Cohort, which enrolled adults aged 35–70 years with complete demographic, anthropometric, clinical, and lifestyle information related to type 2 diabetes mellitus. All participants in the original PERSIAN Cohort provided written informed consent for use of their data in future health research. No direct participant contact occurred in this study, and all records were de-identified prior to analysis to ensure confidentiality and compliance with international ethical standards.

Consent for publication

Not applicable. The study used anonymized secondary data with no identifiable personal information.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

1. Sun H, Saeedi P, Karuranga S, et al. IDF Diabetes Atlas: global, regional and country-level diabetes prevalence estimates for 2021 and projections for 2045. Diabetes Res Clin Pract. 2022;183:109119. 10.1016/j.diabres.2021.109119.
2. Bommer C, Heesemann E, Sagalova V, et al. The global economic burden of diabetes in adults aged 20–79 years: a cost-of-illness study. Lancet Diabetes Endocrinol. 2017;5(6):423–30. 10.1016/S2213-8587(17)30097-9.
3. Harding JL, Pavkov ME, Magliano DJ, et al. Global trends in diabetes complications: a review of current evidence. Diabetologia. 2019;62(1):3–16. 10.1007/s00125-018-4711-2.
4. Hastie T, Tibshirani R, Friedman J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd ed. Springer; 2009.
5. Collins GS, Reitsma JB, Altman DG, Moons KGM. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. BMJ. 2015;350:g7594. 10.1136/bmj.g7594.
6. Steyerberg EW, Harrell FE. Prediction models need appropriate internal, internal–external, and external validation. J Clin Epidemiol. 2016;69:245–7. 10.1016/j.jclinepi.2015.04.005.
7. Rustam F, Mehmood A, Ahmad M, et al. A hybrid approach for diabetes prediction using CNN and ensemble learning. Health Inf Sci Syst. 2024;12(1):10. 10.1007/s13755-024-00251-3.
8. Islam M, Ferdousi R, Rahman MM, et al. An explainable machine learning framework for early diabetes detection using ensemble classifiers. Front Public Health. 2025;13:1392103. 10.3389/fpubh.2025.1392103.
9. Breiman L. Random forests. Mach Learn. 2001;45:5–32. 10.1023/A:1010933404324.
10. Chen T, Guestrin C. XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2016. 10.1145/2939672.2939785.
11. Matboli H, El-Morsy S, Saleh M, et al. A multi-stage machine learning model for accurate diabetes classification. Diagnostics. 2025;15(3):367. 10.3390/diagnostics15030367.
12. Hasan M, Mahmud T, Rana M, et al. A hybrid feature selection and machine learning approach for diabetes prediction. Comput Biol Med. 2025;174:108249. 10.1016/j.compbiomed.2025.108249.
13. Pang Z, Liu J, Xu J, et al. Interpretable machine learning for population-based diabetes risk prediction: a Shapley value approach. Sci Rep. 2025;15:4532. 10.1038/s41598-025-41532-7.
14. Lundberg SM, Lee SI. A unified approach to interpreting model predictions. Adv Neural Inf Process Syst. 2017;30:4765–74.
15. Esteva A, Robicquet A, Ramsundar B, et al. A guide to deep learning in healthcare. Nat Med. 2019;25:24–9. 10.1038/s41591-018-0316-z.
16. Rajkomar A, Dean J, Kohane I. Machine learning in medicine. N Engl J Med. 2019;380:1347–58. 10.1056/NEJMra1814259.
17. Beam AL, Kohane IS. Big data and machine learning in health care. JAMA. 2018;319(13):1317–8. 10.1001/jama.2017.18391.
18. Obermeyer Z, Emanuel EJ. Predicting the future — big data, machine learning, and clinical medicine. N Engl J Med. 2016;375:1216–9. 10.1056/NEJMp1606181.
19. Ribeiro MT, Singh S, Guestrin C. "Why should I trust you?" Explaining the predictions of any classifier. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2016. 10.1145/2939672.2939778.
20. Lundberg SM, Erion G, Chen H, et al. From local explanations to global understanding with explainable AI for trees. Nat Mach Intell. 2020;2:56–67. 10.1038/s42256-019-0138-9.
21. Kuhn M, Johnson K. Applied Predictive Modeling. Springer; 2013. 10.1007/978-1-4614-6849-3.
22. Ribeiro A, Antunes C, Silva D, et al. Feature selection in high-dimensional healthcare datasets: a review. Brief Bioinform. 2021;22(6):bbab263. 10.1093/bib/bbab263.
23. Shickel B, Tighe PJ, Bihorac A, Rashidi P. Deep EHR: a survey of recent advances in deep learning techniques for electronic health record analysis. IEEE J Biomed Health Inform. 2018;22(5):1589–604. 10.1109/JBHI.2017.2767063.
24. Sadeq H, Ahmed M, Mohammed B. Performance analysis of machine learning techniques for diabetes prediction. Int J Adv Comput Sci Appl. 2021;12(5):559–65. 10.14569/IJACSA.2021.0120570.
25. Cichosz SL, Johansen MD, Hejlesen OK. Toward big data analytics: review of predictive models in managing diabetes. Healthc Inform Res. 2024;30(2):83–92. 10.4258/hir.2024.30.2.83.
26. Talebi Moghaddam S, Shariatpanahi S, Hosseini R, et al. Machine learning models for type 2 diabetes prediction with class imbalance treatment. BMC Med Inform Decis Mak. 2024;24:112. 10.1186/s12911-024-02311-8.
27. Srinivasu PN, Bhoi AK, Bian G, et al. A hybrid AI framework with explainability for healthcare prediction. Comput Methods Programs Biomed. 2024;241:107642. 10.1016/j.cmpb.2024.107642.
28. Doshi-Velez F, Kim B. Towards a rigorous science of interpretable machine learning. arXiv preprint. 2017. arXiv:1702.08608.
29. Topol EJ. High-performance medicine: the convergence of human and artificial intelligence. Nat Med. 2019;25:44–56. 10.1038/s41591-018-0300-7.
30. Vu T, Kokubo Y, Inoue M, Yamamoto M, Mohsen A, Martin-Morales A, Dawadi R, Inoue T, Tay JT, Yoshizaki M, Watanabe N, Kuriya Y, Matsumoto C, Arafa A, Nakao YM, Kato Y, Teramoto M, Araki M. Machine learning model for predicting coronary heart disease risk: development and validation using insights from a Japanese population-based study. JMIR Cardio. 2025;9:e68066. 10.2196/68066.
31. Thanh NT, Luan VT, Viet DC, Tung TH, Thien V. A machine learning-based risk score for prediction of mechanical ventilation in children with dengue shock syndrome: a retrospective cohort study. PLoS ONE. 2024;19(12):e0315281. 10.1371/journal.pone.0315281.
32. Sinha M, Haaland P, Krishnamurthy A, Lan B, Ramsey SA, Schmitt PL, Sharma P, Xu H, Fecho K. Causal analysis for multivariate integrated clinical and environmental exposures data. BMC Med Inform Decis Mak. 2025;25(1):27. 10.1186/s12911-025-02903-1.

