Abstract
Acute kidney injury (AKI) is a common and serious complication in patients with acute myocardial infarction (AMI) and diabetes mellitus (DM), often leading to poor outcomes. Timely and accurate risk stratification of AKI severity is essential for early intervention and improved prognosis. This study aimed to develop and externally validate machine learning (ML) models for early risk stratification of AKI severity in AMI patients with DM. In this multicenter retrospective study, data were collected from the MIMIC-III/IV, eICU-CRD, and Zhongda Hospital databases. Feature selection was performed via Boruta and LASSO algorithms. A total of 30 predictive models were built using 10 ML algorithms. Model performance was assessed via metrics of discrimination, calibration, and clinical utility. The top-performing models were interpreted using SHapley Additive exPlanations (SHAP) and deployed as an interactive web application. Among the 4,908 AMI patients with DM, 1,930 (39.3%) developed AKI. The LightGBM model achieved the highest area under the curve (AUC) in predicting stage I or higher AKI (AUC = 0.851, 95% CI: 0.822–0.879) and stage II or higher AKI (AUC = 0.874, 95% CI: 0.849–0.897) in the external Zhongda Hospital cohort. For stage III AKI, the XGBoost model performed best, with an AUC of 0.875 (95% CI: 0.846–0.901). We developed and validated ML models for early risk stratification of AKI severity in AMI patients with DM. These models demonstrated robust performance and clinical utility and were integrated into a user-friendly web tool to facilitate individualized risk stratification and early intervention, potentially improving clinical outcomes.
Supplementary Information
The online version contains supplementary material available at 10.1038/s41598-026-41356-y.
Keywords: Acute kidney injury, Acute myocardial infarction, Diabetes mellitus, Machine learning, Risk stratification
Subject terms: Biomarkers, Cardiology, Computational biology and bioinformatics, Diseases, Medical research, Nephrology, Risk factors
Introduction
Acute kidney injury (AKI) is a common and serious complication among patients with acute myocardial infarction (AMI), particularly in those with coexisting diabetes mellitus (DM)1. The presence of DM further increases susceptibility to renal injury through mechanisms such as microvascular dysfunction, chronic inflammation, and metabolic dysregulation2. AKI in this population is associated with increased in-hospital mortality, prolonged hospitalization, higher healthcare costs, and worse long-term cardiovascular and renal outcomes1. Early identification of patients at high risk for AKI is therefore of substantial clinical importance.
In routine clinical practice, AKI is often recognized only after serum creatinine (Scr) has already increased, at which point renal injury may already be established and opportunities for effective preventive intervention are limited3. Although electronic alert systems based on Scr changes have been developed to facilitate earlier detection, recent evidence suggests that such systems alone have not consistently improved patient outcomes4,5. These limitations underscore the need for tools that can identify high-risk patients earlier in the clinical course, before overt kidney injury becomes clinically apparent.
From a clinician’s perspective, the value of machine learning lies not in algorithmic complexity, but in its ability to integrate routinely collected clinical data and provide early, individualized risk estimates6,7. By leveraging information available during the early phase of hospitalization, machine learning–based models may support proactive risk stratification and guide timely preventive strategies recommended by the Kidney Disease: Improving Global Outcomes (KDIGO) care bundles8. In this context, we aimed to develop and externally validate interpretable machine learning models for early in-hospital risk stratification of AKI severity in patients with AMI and DM using multicenter data.
Methods
This multicenter retrospective study adhered to the Transparent Reporting of a multivariable model for Individual Prognosis or Diagnosis + Artificial Intelligence (TRIPOD + AI) reporting guidelines9. The overall workflow is illustrated in Fig. 1.
Fig. 1.
Schematic workflow of the study. MIMIC, Medical Information Mart for Intensive Care; AMI, acute myocardial infarction; DM, diabetes mellitus; AKI, acute kidney injury.
Data collection
The data used for model training and internal validation were obtained from the MIMIC-III and MIMIC-IV databases10,11, which contain patient information from 2001 to 2022. External validation was performed using two datasets: External Validation Set A (eICU-CRD database, 2014–2015) and External Validation Set B (Zhongda Hospital, Southeast University, 2022–2023)12. Access to the MIMIC and eICU-CRD databases was granted after completing the required application process (Record ID: 60092717). The study was approved by the Ethics Committee of Zhongda Hospital, Southeast University (No: 2023ZDSYLL265-P01). Informed consent was waived because of the anonymized nature of the data. All methods were performed in accordance with the relevant guidelines and regulations and complied with the Declaration of Helsinki where applicable.
In the MIMIC and eICU databases, AMI and DM were identified based on International Classification of Diseases, Ninth and Tenth Revision (ICD-9 and ICD-10) diagnosis codes extracted from the diagnosis modules. In the Zhongda Hospital cohort, diagnoses were identified using corresponding diagnostic codes recorded in the electronic medical record system. Only patients with a documented diagnosis of both AMI and DM during hospitalization were included in the analysis.
Data cleaning
Patients with a diagnosis of both AMI and DM were included. The exclusion criteria were as follows: (1) severe renal insufficiency (eGFR < 15 mL/min/1.73 m²) or renal replacement therapy; (2) hospital stays < 24 h; (3) > 30% missing data. The collected data included: (1) demographic data (age, sex, height, weight, ethnicity); (2) comorbidities (diabetes type, hypertension); (3) vital signs (heart rate, blood pressure, respiratory rate, temperature); (4) laboratory results (blood cell counts, coagulation function, myocardial injury markers, renal/hepatic function); and (5) additional data (treatments, SOFA scores, length of stay, in-hospital mortality). All variables were derived exclusively from data collected within the first 24 h after hospital admission.
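The first-24-hour derivation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's extraction code: the `Measurement` record, the labels `"Scr"` and `"urine_ml"`, and the half-open window [0, 24) h are hypothetical choices. It shows how `_max`/`_min` summaries such as Scr_max and Scr_min, cumulative urine output (UO), and urine output per kilogram per hour (UOKH) can be derived from time-stamped observations while ignoring anything recorded later.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    hours_since_admission: float  # time of the observation relative to admission
    name: str                     # hypothetical variable label, e.g. "Scr" or "urine_ml"
    value: float


def first_day_features(measurements, weight_kg, window_h=24.0):
    """Summarise observations recorded in [0, window_h) hours after admission.

    Non-urine variables become <name>_max / <name>_min; urine volumes are summed
    into UO (mL) and normalised into UOKH (mL/kg/h). Later values are ignored.
    """
    in_window = [m for m in measurements if 0.0 <= m.hours_since_admission < window_h]
    features = {}
    for m in in_window:
        if m.name == "urine_ml":
            continue  # urine output is accumulated below, not summarised by max/min
        key_max, key_min = f"{m.name}_max", f"{m.name}_min"
        features[key_max] = max(features.get(key_max, m.value), m.value)
        features[key_min] = min(features.get(key_min, m.value), m.value)
    uo = sum(m.value for m in in_window if m.name == "urine_ml")
    features["UO"] = uo
    features["UOKH"] = uo / weight_kg / window_h  # mL per kg per hour
    return features


obs = [
    Measurement(2, "Scr", 1.1), Measurement(20, "Scr", 1.4), Measurement(30, "Scr", 2.5),
    Measurement(6, "urine_ml", 800), Measurement(18, "urine_ml", 1000),
    Measurement(30, "urine_ml", 500),
]
print(first_day_features(obs, weight_kg=75))
# {'Scr_max': 1.4, 'Scr_min': 1.1, 'UO': 1800, 'UOKH': 1.0}
```

As a rough consistency check on the normalization (a ratio of medians, so only approximate), the MIMIC medians in Table 1 (UO about 1801 mL, weight about 82.4 kg) give roughly 0.9 mL/kg/h, in line with the reported UOKH median.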
Data preprocessing
The sample size was calculated using the formula n = (1.96/δ)² × ϕ(1 − ϕ), where ϕ is the expected outcome proportion (ϕ = 0.29) and δ is the margin of error (δ = 0.05)13. Under this formula, the minimum required sample size for the training set was 316 participants, indicating that the training population was adequate for model development. Collins' guidance on the external validation of prognostic models recommends a minimum of 100 events, with 200 or more events being ideal14. The eICU-CRD and Zhongda cohorts included 345 and 103 events, respectively, and therefore both met the recommended minimum. Missing data (< 30%) were handled using multiple imputation by chained equations (MICE). To avoid potential bias, outcome variables related to AKI diagnosis or severity were not included in the imputation models. Imputation was performed independently for each data split, including the training set, internal validation set, and both external validation cohorts, using only predictor variables available within each dataset.
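The sample-size arithmetic can be reproduced directly. The sketch below evaluates the precision-based formula with the values stated above (ϕ = 0.29, δ = 0.05, z = 1.96); the function name is illustrative. The unrounded result is about 316.4, matching the reported minimum of roughly 316 participants; rounding up to a whole participant, as is conventional, gives 317, still far below the size of the 80% MIMIC training split.

```python
import math


def precision_sample_size(phi, delta, z=1.96):
    """Minimum n to estimate an outcome proportion phi within +/- delta at the
    confidence level implied by z (1.96 for 95%): n = (z / delta)^2 * phi * (1 - phi)."""
    return (z / delta) ** 2 * phi * (1 - phi)


n = precision_sample_size(phi=0.29, delta=0.05)
print(round(n, 1))   # 316.4
print(math.ceil(n))  # 317 when rounded up to a whole participant
```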
Feature selection
Two algorithms were applied sequentially for feature selection: Boruta and LASSO. Boruta, a random forest-based wrapper method, identifies relevant predictors by comparing the importance of each variable with that of randomly permuted "shadow" copies of the variables15. LASSO, a regularization technique, shrinks regression coefficients toward zero and sets the coefficients of uninformative variables exactly to zero, thereby excluding irrelevant variables and mitigating multicollinearity16. This combined approach enhances model accuracy and generalizability17.
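The two-stage screen described above can be sketched with scikit-learn. This is a simplified illustration, not the study's pipeline: full Boruta (for example, the R package `Boruta`) iterates with statistical tests on the hit counts and progressively confirms or rejects features, whereas `shadow_screen` below simply keeps features that beat the best shadow feature in a majority of rounds. The round count, forest size, penalty strength `C`, and the synthetic data are all assumptions chosen for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler


def shadow_screen(X, y, n_rounds=10, seed=0):
    """Simplified Boruta-style step: keep features whose random-forest importance
    exceeds the best 'shadow' (independently permuted) feature in most rounds."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    hits = np.zeros(n_features, dtype=int)
    for r in range(n_rounds):
        # Each shadow column is a shuffled copy, so it carries no information about y
        shadows = np.column_stack([rng.permutation(X[:, j]) for j in range(n_features)])
        forest = RandomForestClassifier(n_estimators=100, random_state=r)
        forest.fit(np.hstack([X, shadows]), y)
        importance = forest.feature_importances_
        hits += importance[:n_features] > importance[n_features:].max()
    return np.flatnonzero(hits > n_rounds / 2)


def lasso_screen(X, y, C=0.1):
    """L1-penalised logistic regression on standardised features; a smaller C
    means a stronger penalty, so more coefficients are set exactly to zero."""
    model = LogisticRegression(penalty="l1", solver="liblinear", C=C)
    model.fit(StandardScaler().fit_transform(X), y)
    return np.flatnonzero(model.coef_[0] != 0)


# Synthetic data: only columns 0 and 1 influence the outcome
rng = np.random.default_rng(42)
X = rng.normal(size=(400, 5))
y = (2 * X[:, 0] - 2 * X[:, 1] + rng.normal(size=400) > 0).astype(int)

kept = shadow_screen(X, y)
selected = kept[lasso_screen(X[:, kept], y)]  # LASSO runs only on the shadow-screen survivors
print(selected)
```

Running LASSO only on the features that survive the first stage mirrors the order described in the text (Boruta first, then LASSO), so the second stage can only shrink the candidate set.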
Model construction and evaluation
The MIMIC data were split into an 80% training set and a 20% internal validation set. External validation was performed using the eICU-CRD and Zhongda datasets. The synthetic minority over-sampling technique (SMOTE) was used to address class imbalance. Ten algorithms, namely logistic regression (LR), decision tree (DT), random forest (RF), support vector machine (SVM), k-nearest neighbors (KNN), naive Bayes (NB), artificial neural network (ANN), extreme gradient boosting (XGBoost), adaptive boosting (AdaBoost), and light gradient boosting machine (LightGBM), were used to construct predictive models. Hyperparameters were optimized via grid search and manual fine-tuning.
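The core idea of SMOTE can be shown with a compact NumPy sketch: each synthetic minority-class sample is placed at a random point on the segment between a minority sample and one of its k nearest minority-class neighbours. This is an illustrative re-implementation, not the study's code; in practice a library implementation such as imbalanced-learn's `SMOTE(k_neighbors=5).fit_resample(X, y)` would typically be used. Consistent with the text, resampling is applied to the training split only, and features on very different scales would normally be standardized before distances are computed (an assumption not stated in the source).

```python
import numpy as np


def smote(X_minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic minority-class samples: each lies on the segment between
    a random minority sample and one of its k nearest minority-class neighbours."""
    X_minority = np.asarray(X_minority, dtype=float)
    if k >= len(X_minority):
        raise ValueError("k must be smaller than the number of minority samples")
    rng = np.random.default_rng(seed)
    # Pairwise Euclidean distances among minority samples only
    dist = np.linalg.norm(X_minority[:, None, :] - X_minority[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)                  # a sample is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]    # k nearest neighbours per sample
    base = rng.integers(0, len(X_minority), size=n_synthetic)
    partner = neighbours[base, rng.integers(0, k, size=n_synthetic)]
    gap = rng.random((n_synthetic, 1))              # interpolation factor in [0, 1)
    return X_minority[base] + gap * (X_minority[partner] - X_minority[base])


# Usage: oversample the positive (AKI) rows of the TRAINING split only
rng = np.random.default_rng(1)
X_pos = rng.normal(size=(20, 3))
synthetic = smote(X_pos, n_synthetic=50)
print(synthetic.shape)  # (50, 3)
```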
Model performance was evaluated via the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), accuracy, F1 score, calibration plots, and decision curve analysis (DCA). Calibration was assessed visually using calibration plots, without formal quantitative estimation of calibration slope or intercept.
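The discrimination and threshold-based metrics listed above can be computed as in the following sketch. The source does not state which probability cut-off was used for the dichotomous metrics (sensitivity, specificity, PPV, NPV, accuracy, F1), so the default of 0.5 here is an assumption. AUC is computed in its rank form: the probability that a randomly chosen positive case receives a higher predicted risk than a randomly chosen negative case, with ties counted as one half.

```python
def _ratio(num, den):
    return num / den if den else float("nan")


def threshold_metrics(y_true, probs, threshold=0.5):
    """Confusion-matrix metrics at a chosen probability threshold (assumed 0.5)."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, probs):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    sens, ppv = _ratio(tp, tp + fn), _ratio(tp, tp + fp)
    return {
        "sensitivity": sens,
        "specificity": _ratio(tn, tn + fp),
        "PPV": ppv,
        "NPV": _ratio(tn, tn + fn),
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
        "F1": _ratio(2 * ppv * sens, ppv + sens),
    }


def auc(y_true, probs):
    """Rank-based AUC: P(positive scores higher than negative); ties count one half."""
    pos = [p for y, p in zip(y_true, probs) if y == 1]
    neg = [p for y, p in zip(y_true, probs) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


y = [1, 1, 1, 0, 0, 0, 0, 1]
p = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]
print(auc(y, p))                # 0.9375
print(threshold_metrics(y, p))  # every metric equals 0.75 for this toy example
```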
Missing value imputation, feature selection, class imbalance handling using SMOTE and hyperparameter tuning were performed exclusively on the training dataset. The internal validation and external validation datasets were kept completely independent and were not used in any stage of model training or optimization.
Model interpretation and application
The SHAP method was used to interpret model predictions and rank feature importance18. The best-performing models were deployed as a web application via the Streamlit Python framework, enabling clinicians to input feature values and obtain predicted probabilities for each AKI grade.
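SHAP attributes each prediction to the input features using Shapley values from cooperative game theory. The brute-force sketch below computes exact Shapley values for a single patient by enumerating every coalition of features, replacing absent features with a single reference value. This is a conceptual simplification and an assumption for illustration: the `shap` package uses much faster algorithms (for example TreeSHAP for gradient-boosted trees) and averages over a background dataset rather than one reference point. The toy risk function is hypothetical. The key property shown is additivity: the attributions sum to the prediction minus the reference prediction.

```python
from itertools import combinations
from math import factorial


def exact_shapley(predict, x, baseline):
    """Exact Shapley values for a single prediction.

    Features in a coalition keep the patient's value; features outside it are
    replaced by a reference (baseline) value. Enumerates all 2^(n-1) coalitions
    per feature, so it is only practical for a handful of features.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z):
    """Hypothetical risk score with an interaction between two inputs."""
    return 0.1 + 0.2 * z[0] + 0.05 * z[0] * z[1]


x, ref = [2.0, 3.0], [0.0, 0.0]
phi = exact_shapley(toy_model, x, ref)
print([round(v, 2) for v in phi])  # [0.55, 0.15]; their sum equals toy_model(x) - toy_model(ref)
```

The interaction term contributes 0.3 in total and is split equally between the two features, which is how Shapley values share credit for interactions.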
Outcome definition
The diagnosis and staging of AKI were based on the Scr criteria of the KDIGO guidelines. Stage I: Scr increase to 1.5–1.9 times baseline or an absolute increase of ≥ 0.3 mg/dL (26.5 µmol/L). Stage II: Scr increase to 2.0–2.9 times baseline. Stage III: Scr increase to ≥ 3.0 times baseline, an increase to ≥ 4.0 mg/dL (353.6 µmol/L), or initiation of renal replacement therapy. AKI was identified by applying the KDIGO Scr criteria as implemented in the official MIMIC-derived algorithm, in which the baseline Scr was defined as the lowest value within the 48 h or 7 days preceding each Scr measurement19. To ensure temporal separation between predictors and outcomes, all predictor variables were extracted exclusively from the first 24 h after hospital admission, whereas AKI severity for model evaluation was determined from Scr changes occurring after completion of this initial 24-hour predictor window. Although some patients exhibited early Scr changes meeting KDIGO AKI staging criteria within the first 24 h, these values were treated exclusively as predictor information and were not used to define the outcome. Accordingly, AKI staging was based solely on subsequent creatinine trajectories beyond the predictor window.
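The outcome definition above can be sketched as two small functions: one applies the KDIGO creatinine thresholds to a single measurement, the other takes the worst stage among measurements made after the 24-hour predictor window. This is a minimal sketch under stated assumptions, not the MIMIC-derived algorithm itself: Scr values inside the window are allowed to serve as baselines for later values, the ≥ 4.0 mg/dL criterion is applied only when that level is reached through a rise that itself meets the stage I criteria (an assumption reflecting KDIGO's wording of an *increase* to ≥ 4.0 mg/dL), and renal replacement therapy is not modelled.

```python
def kdigo_scr_stage(scr, low_48h, low_7d):
    """KDIGO creatinine stage (0-3) for one Scr value in mg/dL.

    low_48h / low_7d: lowest Scr in the preceding 48 h / 7 days (the baselines).
    Renal replacement therapy, which also defines stage III, is not modelled here.
    """
    stage1 = scr >= 1.5 * low_7d or scr - low_48h >= 0.3
    # Assumption: >= 4.0 mg/dL counts only when reached through a stage-I-level rise
    if scr >= 3.0 * low_7d or (scr >= 4.0 and stage1):
        return 3
    if scr >= 2.0 * low_7d:
        return 2
    return 1 if stage1 else 0


def outcome_stage(series, predictor_window_h=24.0):
    """Worst stage among Scr values measured AFTER the predictor window.

    series: (hours_since_admission, scr_mg_dl) pairs. Values inside the window
    may serve as baselines for later values but never define the outcome.
    """
    series = sorted(series)
    worst = 0
    for idx, (t, scr) in enumerate(series):
        if t <= predictor_window_h:
            continue  # first-24-h values are predictor information only
        prior = series[:idx]
        past_48h = [v for s, v in prior if t - s <= 48]
        past_7d = [v for s, v in prior if t - s <= 168]
        if not past_7d:
            continue  # no baseline available for this measurement
        low_48h = min(past_48h) if past_48h else float("inf")
        worst = max(worst, kdigo_scr_stage(scr, low_48h, min(past_7d)))
    return worst


print(outcome_stage([(2, 1.0), (20, 1.1), (30, 1.6)]))  # 1: rise after the window
print(outcome_stage([(1, 1.0), (20, 3.5)]))             # 0: rise confined to the window
```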
Sensitivity analyses
To assess the robustness of the models, we conducted a series of sensitivity analyses. First, predictor variables were grouped into four categories: demographic characteristics, laboratory indicators, clinical scores, and urine output–related variables. Separate models were developed using each variable category and evaluated in the external validation cohort. The performance of these models was compared with that of the composite model incorporating all variable categories.
An additional sensitivity analysis was performed to evaluate the impact of the AKI definition on model performance. In the MIMIC internal validation cohort, AKI was alternatively defined using the combined Scr and urine-output KDIGO criteria. The best-performing model from the primary analysis was applied without retraining, and predictive performance was reassessed to examine the robustness of the main findings under a different outcome definition.
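The urine-output arm of the combined KDIGO definition can be sketched as follows. The thresholds are the KDIGO urine-output criteria (stage I: < 0.5 mL/kg/h for 6 to 12 h; stage II: < 0.5 mL/kg/h for ≥ 12 h; stage III: < 0.3 mL/kg/h for ≥ 24 h or anuria for ≥ 12 h). Their operationalization here, as mean rates over trailing 6-, 12-, and 24-hour windows of consecutive hourly volumes, is an assumption for illustration; the source does not describe how urine output was aggregated for this analysis. The combined stage is the worse of the creatinine and urine-output stages.

```python
def kdigo_uo_stage(hourly_ml, weight_kg):
    """Highest KDIGO urine-output stage from consecutive hourly volumes (mL).

    Uses mean rates over trailing 6-, 12- and 24-h windows:
    stage 1: < 0.5 mL/kg/h over 6 h; stage 2: < 0.5 mL/kg/h over 12 h;
    stage 3: < 0.3 mL/kg/h over 24 h, or anuria (0 mL) for 12 h.
    """
    def trailing(end, hours):
        return hourly_ml[end - hours + 1:end + 1]

    stage = 0
    for end in range(len(hourly_ml)):
        n_hours = end + 1
        if n_hours >= 24 and sum(trailing(end, 24)) / (weight_kg * 24) < 0.3:
            stage = 3
        if n_hours >= 12 and sum(trailing(end, 12)) == 0:
            stage = 3  # anuria for 12 h
        if n_hours >= 12 and sum(trailing(end, 12)) / (weight_kg * 12) < 0.5:
            stage = max(stage, 2)
        if n_hours >= 6 and sum(trailing(end, 6)) / (weight_kg * 6) < 0.5:
            stage = max(stage, 1)
    return stage


def combined_stage(scr_stage, uo_stage):
    """Combined KDIGO stage: the worse of the creatinine and urine-output stages."""
    return max(scr_stage, uo_stage)


print(kdigo_uo_stage([25] * 14, weight_kg=70))  # 2: about 0.36 mL/kg/h over 12 h
```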
Statistical analysis
Continuous variables were expressed as mean ± standard deviation or median (interquartile range) and compared via Student’s t-test or the Wilcoxon rank-sum test. Categorical variables were presented as frequencies (percentages) and compared using chi-square or Fisher’s exact tests. Analyses were performed using R (version 4.0.0) and Python (version 3.10). A p-value < 0.05 was considered statistically significant.
Results
Patient characteristics
A total of 4,908 patients with AMI and DM were included in this study: 3,517 from the MIMIC database, 1,036 from the eICU-CRD, and 355 from Zhongda Hospital, Southeast University (Fig. 1). In the MIMIC cohort, the incidences of stage I or higher AKI, stage II or higher AKI, and stage III AKI were 42.1%, 10.9%, and 6.1%, respectively. The corresponding incidences were 33.3%, 7.4%, and 3.7% in the eICU-CRD cohort and 29.0%, 11.8%, and 7.0% in the Zhongda Hospital cohort (Table 1). The demographic and clinical characteristics of non-AKI and AKI patients in the three cohorts are detailed in Supplementary Tables S1–S3.
Table 1.
Demographic and clinical characteristics.
| Characteristic | MIMIC (n = 3517) | eICU-CRD (n = 1036) | Zhongda Hospital (n = 355) |
|---|---|---|---|
| ≥ AKI I, n (%) | 1482 (42.1) | 345 (33.3) | 103 (29.0) |
| ≥ AKI II, n (%) | 383 (10.9) | 77 (7.4) | 42 (11.8) |
| AKI III, n (%) | 216 (6.1) | 38 (3.7) | 25 (7.0) |
| Male, n (%) | 2169 (61.7) | 629 (60.7) | 248 (69.9) |
| Age (year) | 73.0 [64.0, 81.0] | 67.0 [57.0, 75.0] | 69.0 [61.0, 75.0] |
| Height (cm) | 168.0 [160.0, 176.0] | 170.2 [162.6, 177.8] | 168.0 [160.0, 172.0] |
| Weight (kg) | 82.4 [70.1, 96.9] | 86.8 [72.5, 104.0] | 70.0 [61.0, 75.0] |
| BMI (kg/m²) | 28.9 [25.2, 33.4] | 29.9 [25.8, 35.0] | 24.6 [22.6, 27.3] |
| Race, n (%) | | | |
| White | 2341 (66.6) | 635 (61.3) | 0 (0.0) |
| Black | 319 (9.1) | 115 (11.1) | 0 (0.0) |
| Asian | 79 (2.2) | 36 (3.5) | 355 (100.0) |
| Other | 778 (22.1) | 250 (24.1) | 0 (0.0) |
| Hypertension, n (%) | 1497 (42.6) | 418 (40.3) | 266 (74.9) |
| CKD, n (%) | 1054 (30.0) | 175 (16.9) | 43 (12.1) |
| CD, n (%) | 288 (8.2) | 46 (4.4) | 60 (16.9) |
| SBP_max (mmHg) | 145.0 [132.0, 160.0] | 151.0 [134.0, 168.0] | 142.0 [130.0, 159.0] |
| SBP_min (mmHg) | 89.0 [80.0, 99.0] | 90.0 [77.0, 104.0] | 103.0 [92.0, 119.0] |
| MBP_max (mmHg) | 100.0 [90.0, 112.0] | 108.0 [95.0, 122.0] | 104.3 [96.0, 114.7] |
| MBP_min (mmHg) | 58.0 [52.0, 65.0] | 60.0 [50.0, 71.0] | 81.1 [68.0, 95.0] |
| WBC_max (K/µL) | 13.1 [9.0, 17.7] | 11.6 [8.8, 15.3] | 9.3 [7.2, 12.2] |
| WBC_min (K/µL) | 9.6 [7.4, 12.6] | 10.5 [8.0, 13.4] | 8.9 [6.9, 11.3] |
| Hemoglobin (g/dL) | 11.2 [9.7, 12.8] | 12.8 [11.1, 14.4] | 13.3 [11.3, 14.8] |
| RDW_max (%) | 14.5 [13.5, 15.9] | 14.1 [13.3, 15.3] | 13.2 [12.8, 14.0] |
| RDW_min (%) | 14.2 [13.3, 15.5] | 14.0 [13.2, 15.1] | 13.2 [12.8, 14.0] |
| eGFR (mL/min/1.73 m²) | 47.1 [32.5, 66.6] | 49.6 [32.2, 67.9] | 73.7 [43.5, 95.6] |
| Scr_max (mg/dL) | 1.3 [0.9, 1.9] | 1.2 [0.9, 1.8] | 1.0 [0.8, 1.5] |
| Scr_min (mg/dL) | 1.1 [0.8, 1.5] | 1.1 [0.8, 1.5] | 1.0 [0.8, 1.4] |
| BUN_max (mg/dL) | 27.0 [19.0, 43.0] | 23.0 [16.0, 36.0] | 21.7 [16.6, 32.6] |
| BUN_min (mg/dL) | 23.0 [15.0, 36.0] | 21.0 [15.0, 32.0] | 20.6 [20.5, 20.9] |
| Glucose_max (mg/dL) | 216.0 [171.0, 284.0] | 210.0 [156.0, 303.0] | 192.4 [148.5, 262.1] |
| Glucose_min (mg/dL) | 123.0 [98.0, 160.0] | 161.0 [123.0, 215.0] | 158.4 [122.2, 207.7] |
| Albumin (g/dL) | 3.4 [3.0, 3.8] | 3.3 [2.8, 3.7] | 3.9 [3.5, 4.2] |
| ALT (IU/L) | 25.0 [17.0, 44.0] | 30.0 [21.0, 51.0] | 23.0 [16.0, 37.0] |
| pH_max | 7.4 [7.4, 7.5] | 7.4 [7.3, 7.4] | 7.4 [7.4, 7.4] |
| pH_min | 7.3 [7.3, 7.4] | 7.3 [7.2, 7.4] | 7.4 [7.4, 7.4] |
| PaO2_max (mmHg) | 160.0 [83.0, 349.0] | 123.7 [81.0, 209.0] | 89.1 [72.0, 112.8] |
| PaO2_min (mmHg) | 74.0 [43.0, 103.0] | 79.0 [61.0, 111.0] | 75.1 [64.4, 87.4] |
| Lactate_max (mmol/L) | 2.2 [1.5, 3.2] | 2.3 [1.6, 4.4] | 2.4 [1.5, 3.8] |
| Lactate_min (mmol/L) | 1.4 [1.1, 1.8] | 1.9 [1.2, 2.8] | 1.7 [1.3, 2.5] |
| Base Excess_max (mmol/L) | 0.0 [-1.0, 2.0] | 0.9 [0.5, 1.3] | -1.2 [-3.1, 0.5] |
| Base Excess_min (mmol/L) | -2.0 [-5.0, 0.0] | -3.1 [-2.8, -1.2] | -1.3 [-3.9, 0.4] |
| Calcium (mEq/L) | 1.1 [1.1, 1.2] | 2.4 [0.6, 4.2] | 1.2 [1.1, 1.4] |
| Phosphate (mEq/L) | 3.6 [3.0, 4.2] | 3.5 [2.9, 4.3] | 1.1 [1.0, 1.3] |
| Anion Gap_max (mEq/L) | 16.0 [14.0, 19.0] | 12.0 [9.0, 15.0] | 14.7 [12.9, 17.1] |
| Anion Gap_min (mEq/L) | 13.0 [11.0, 15.0] | 10.0 [8.0, 12.6] | 14.6 [12.6, 16.9] |
| UO (mL) | 1801.0 [1169.0, 2645.0] | 1600.0 [925.0, 2400.0] | 2200.0 [1700.0, 2734.0] |
| UOKH (mL/kg/h) | 0.9 [0.6, 1.4] | 0.8 [0.4, 1.2] | 1.3 [1.0, 1.7] |
| SOFA | 4.0 [2.0, 7.0] | 3.0 [2.0, 5.0] | 4.0 [3.0, 5.0] |
| Contrast, n (%) | 1237 (35.2) | 414 (40.0) | 296 (83.4) |
| CAG, n (%) | 1203 (34.2) | 399 (38.5) | 284 (80.0) |
| CABG, n (%) | 316 (9.0) | 72 (7.0) | 12 (3.4) |
| Length of stay (day) | 8.7 [5.4, 13.7] | 5.0 [2.8, 9.1] | 7.0 [6.0, 9.0] |
| In-hospital deaths, n (%) | 460 (13.1) | 115 (11.1) | 25 (7.0) |

AKI, acute kidney injury; BMI, body mass index; CKD, chronic kidney disease; CD, cerebrovascular disease; SBP, systolic blood pressure; MBP, mean blood pressure; WBC, white blood cell; RDW, red blood cell distribution width; eGFR, estimated glomerular filtration rate; Scr, serum creatinine; BUN, blood urea nitrogen; ALT, alanine transaminase; PaO2, partial pressure of oxygen in arterial blood; UO, urine output; UOKH, urine output per kilogram per hour; SOFA, sequential organ failure assessment; CAG, coronary angiography; CABG, coronary artery bypass grafting.
Feature selection
From 90 clinical variables in the MIMIC database, the Boruta algorithm identified 19, 32, and 27 predictor variables for stage I or higher, stage II or higher, and stage III AKI, respectively. Subsequent LASSO screening reduced these to 15, 17, and 9 variables, respectively (Supplementary Figs. S1–S3). The final predictor variables are listed in Table 2.
Table 2.
Predictor variables.
| Outcome | Predictor variables |
|---|---|
| ≥ AKI I | UO, UOKH, SOFA, Weight, BMI, SBP_min, Hemoglobin, INR_max, eGFR, Scr_max, Scr_min, BUN_min, Albumin, Lactate_max, Anion gap_max |
| ≥ AKI II | UO, UOKH, SOFA, MBP_max, RDW_max, eGFR, Scr_max, Scr_min, BUN_min, Glucose_max, Anion gap_max, Anion gap_min, pH_min, PaO2_max, Base excess_min, Calcium, Lactate_max |
| AKI III | UO, SOFA, eGFR, Scr_max, Albumin, WBC_min, pH_min, ALT, Phosphate |

UO, urine output; UOKH, urine output per kilogram per hour; SOFA, sequential organ failure assessment; BMI, body mass index; SBP, systolic blood pressure; INR, international normalized ratio; eGFR, estimated glomerular filtration rate; Scr, serum creatinine; BUN, blood urea nitrogen; MBP, mean blood pressure; RDW, red blood cell distribution width; PaO2, partial pressure of oxygen in arterial blood; ALT, alanine transaminase; WBC, white blood cell.
Model development and validation
Thirty predictive models were constructed using the 10 ML algorithms and the selected variables. In the training set, all models showed high discrimination, including RF for stage I or higher AKI (AUC = 0.969, 95% CI: 0.964–0.973), KNN for stage II or higher AKI (AUC = 0.995, 95% CI: 0.994–0.996), and DT for stage III AKI (AUC = 0.997, 95% CI: 0.996–0.997). In the internal validation set, LR performed best for stage I or higher AKI (AUC = 0.730, 95% CI: 0.690–0.766) and stage II or higher AKI (AUC = 0.819, 95% CI: 0.761–0.874), whereas RF performed best for stage III AKI (AUC = 0.881, 95% CI: 0.841–0.918).
In the eICU-CRD dataset (External Validation Set A), LightGBM was the best model for stage I or higher AKI (AUC = 0.743, 95% CI: 0.710–0.773) and stage II or higher AKI (AUC = 0.750, 95% CI: 0.694–0.802), and XGBoost was the best model for stage III AKI (AUC = 0.843, 95% CI: 0.770–0.900) (Fig. 2). Further validation in the Zhongda Hospital dataset (External Validation Set B) confirmed the robustness of these models: LightGBM achieved AUCs of 0.851 (95% CI: 0.822–0.879) for stage I or higher AKI and 0.874 (95% CI: 0.849–0.897) for stage II or higher AKI, and XGBoost achieved an AUC of 0.875 (95% CI: 0.846–0.901) for stage III AKI (Fig. 3). The models also achieved high accuracy, sensitivity, specificity, PPV, NPV, and F1 scores (Supplementary Tables S4–S6).
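The source does not state how the 95% CIs for the AUCs were obtained. One common approach, sketched here purely as an assumption, is the percentile bootstrap: patients are resampled with replacement, the AUC is recomputed on each resample, and the 2.5th and 97.5th percentiles of the resampled AUCs form the interval (DeLong's method is a frequently used analytic alternative). The function names, resample count, and synthetic data are illustrative.

```python
import numpy as np


def auc(y, scores):
    """Rank-based AUC: P(score of a positive > score of a negative), ties = 1/2."""
    pos, neg = scores[y == 1], scores[y == 0]
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size


def bootstrap_auc_ci(y, scores, n_boot=1000, alpha=0.05, seed=0):
    """Point AUC plus a percentile bootstrap (1 - alpha) confidence interval."""
    y, scores = np.asarray(y), np.asarray(scores, dtype=float)
    rng = np.random.default_rng(seed)
    stats = []
    while len(stats) < n_boot:
        idx = rng.integers(0, len(y), size=len(y))  # resample patients with replacement
        if y[idx].min() == y[idx].max():
            continue  # resample contains only one class, so AUC is undefined
        stats.append(auc(y[idx], scores[idx]))
    lower, upper = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return auc(y, scores), lower, upper


rng = np.random.default_rng(7)
y = rng.integers(0, 2, size=300)
scores = y + rng.normal(scale=1.0, size=300)  # synthetic scores, higher for positives
point, lower, upper = bootstrap_auc_ci(y, scores)
print(f"AUC = {point:.3f}, 95% CI: {lower:.3f} to {upper:.3f}")
```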
Fig. 2.
Receiver operating characteristic curves for the training set, internal validation set, and external validation set A: stage I or higher AKI (A–C), stage II or higher AKI (D–F), and stage III AKI (G–I). CI, confidence interval.
Fig. 3.
Receiver operating characteristic curves for external validation set B: stage I or higher AKI (A), stage II or higher AKI (B), and stage III AKI (C).
Clinical utility and calibration of the models
DCA was performed to evaluate the clinical utility of the best models by quantifying net benefit across threshold probabilities. In the training set, the prediction models provided greater net benefit than the "treat-all" and "treat-none" strategies at threshold probabilities below 0.82 for stage I or higher AKI, 0.84 for stage II or higher AKI, and 0.92 for stage III AKI (Supplementary Fig. S4D–F). In external validation set B, the corresponding thresholds were 0.85, 0.90, and 0.88, respectively (Fig. 4D–F). On visual inspection, the calibration curves showed good agreement between predicted and observed probabilities in both the training and external validation sets (Supplementary Fig. S4G–I; Fig. 4G–I).
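Decision curve analysis rests on a simple net-benefit calculation, sketched below. For a threshold probability pt, patients with predicted risk ≥ pt are treated as high risk, and net benefit is TP/n − FP/n × pt/(1 − pt): true positives are credited, and false positives are penalized by the odds of the threshold. The "treat-all" reference uses the prevalence in place of the model's true positives, and "treat-none" has a net benefit of 0. The function names and toy data are illustrative, not taken from the study.

```python
def net_benefit(y_true, probs, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold:
    TP/n - FP/n * threshold / (1 - threshold)."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy that treats everyone; 'treat none' has net benefit 0."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


def decision_curve(y_true, probs, thresholds):
    """Rows of (threshold, model, treat-all, treat-none) net benefit."""
    return [(t, net_benefit(y_true, probs, t), net_benefit_treat_all(y_true, t), 0.0)
            for t in thresholds]


y = [1, 1, 0, 0]
p = [0.9, 0.6, 0.7, 0.1]
for row in decision_curve(y, p, [0.25, 0.5]):
    print(row)
```

A model is considered clinically useful over the range of thresholds where its curve lies above both reference strategies, which is how the threshold ranges reported above are read.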
Fig. 4.
Confusion matrix plots, calibration plots, and DCAs for models in external validation set B. (A, D, G) Stage I or higher AKI; (B, E, H) stage II or higher AKI; (C, F, I) stage III AKI. DCA, decision curve analysis.
Fig. 5.
Global model interpretation by the SHAP method. (A, B) for stage I or higher AKI. (C, D) for stage II or higher AKI. (E, F) for stage III AKI.
Model interpretation
To address the limited interpretability of machine learning models, we applied the SHAP method at two levels. For global interpretation, SHAP summary plots (Fig. 5) rank features by their contribution to each model; the highest Scr level during the first 24 h (Scr_max) was the most influential variable in all three models. The relative importance of predictors in each model, quantified by mean absolute SHAP values, is summarized in Supplementary Table S7. For local interpretation, individual predictions are visualized with waterfall plots, in which bar length indicates the magnitude of each feature's contribution and color (red/blue) indicates whether the feature increases or decreases the predicted risk (Supplementary Fig. S6). For example, for a patient with AMI and DM treated at Zhongda Hospital, the models predicted probabilities of 59.8%, 53.5%, and 1.4% for stage I or higher, stage II or higher, and stage III AKI, respectively.
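The global ranking by mean absolute SHAP value can be sketched as follows: average the magnitude of each feature's SHAP values across patients, then sort. For tree ensembles such as LightGBM and XGBoost, the per-patient SHAP matrix would typically come from the `shap` package's `TreeExplainer`; here the matrix is a hypothetical toy example, not values from this study. Taking absolute values discards direction, which is why summary plots additionally color points by feature value.

```python
def rank_by_mean_abs_shap(shap_matrix, feature_names):
    """Global importance: mean |SHAP value| per feature across patients, sorted
    from most to least influential. shap_matrix has one row per patient."""
    n_patients = len(shap_matrix)
    scores = [
        sum(abs(row[j]) for row in shap_matrix) / n_patients
        for j in range(len(feature_names))
    ]
    return sorted(zip(feature_names, scores), key=lambda item: item[1], reverse=True)


# Hypothetical SHAP values for three patients (not results from this study)
toy_shap = [
    [0.30, -0.10, 0.05],
    [-0.20, 0.15, 0.02],
    [0.25, -0.05, -0.04],
]
for name, score in rank_by_mean_abs_shap(toy_shap, ["Scr_max", "UO", "SOFA"]):
    print(f"{name}: {score:.3f}")
# Scr_max: 0.250
# UO: 0.100
# SOFA: 0.037
```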
Sensitivity analysis
Sensitivity analyses demonstrated the robustness of the proposed models. Models constructed using individual variable categories showed lower predictive performance than the composite model in the external validation cohort, indicating the advantage of integrating multidimensional clinical information (Supplementary Table S8). When AKI was alternatively defined using combined Scr and UO KDIGO criteria in the MIMIC internal validation cohort, the best-performing model retained acceptable discriminative ability without retraining (Supplementary Table S9). These findings support the stability of the model performance across different variable groupings and AKI definitions.
Application of the model
The best-performing models were deployed as a web application to facilitate clinical use (https://prediction-model-for-aki-severity-g33bpthmj9uazm7beatkth.streamlit.app/). Clinicians can input feature values to obtain predictions of AKI severity (Supplementary Fig. S7).
Discussion
In this multicenter study, we developed and externally validated interpretable machine learning models to predict the severity of acute kidney injury in patients with AMI and DM. Using routinely collected clinical variables from the first 24 h after hospital admission, the models demonstrated good discriminative performance across different AKI severity thresholds in both internal and external validation cohorts. The incorporation of SHAP-based interpretation further enabled transparent assessment of variable contributions, enhancing clinical interpretability.
Machine learning has been increasingly applied in clinical research because of its ability to model complex, nonlinear relationships among clinical variables20. Gradient boosting–based algorithms, including XGBoost and LightGBM, have demonstrated strong performance in cardiovascular risk prediction tasks21,22. Consistent with prior studies, these models demonstrated robust performance across different AKI severity levels in our analysis. Their suitability for early risk stratification lies in their ability to integrate routinely collected clinical data, and their selection was based on predictive performance and interpretability rather than methodological novelty.
We adopted the Scr-based KDIGO criteria for AKI diagnosis. This decision was informed by our observation that, when both UO and Scr criteria were applied, the incidence of AKI among AMI patients with DM in the MIMIC database reached 70%, compared with the 4.4%–38.5% reported in previous studies4,23,24. Several factors may explain this discrepancy, including differences in patient severity, exposure to nephrotoxic agents, and variability in UO monitoring practices across databases. The choice of an Scr-based KDIGO definition was further motivated by substantial heterogeneity in UO measurement frequency across centers, which may compromise the generalizability of UO-based AKI definitions in multicenter retrospective datasets. Notably, a recent systematic review and meta-analysis by Chen et al. reported that many prior AKI studies defined AKI primarily by Scr criteria, highlighting the widespread use of Scr-based definitions in AKI research5. Moreover, UO is strongly influenced by hydration status, diuretic use, and local nursing documentation practices, and its recording frequency varies considerably between institutions. In contrast, Scr measurements are instrument-based and more consistently recorded, making them better suited to a harmonized AKI definition in large multicenter retrospective analyses. Although Scr-derived variables were included as predictors, these measurements were restricted to the early phase of hospitalization and reflect baseline renal vulnerability rather than serving as diagnostic criteria for AKI, which was defined by subsequent creatinine changes.
The predictor variables selected via the Boruta and LASSO algorithms are closely related to the mechanisms of AKI. SHAP summary plots highlighted the importance of renal function-related predictors, with UO and UOKH serving as dynamic indicators of renal perfusion and function. Scr and eGFR are used to assess acute changes and baseline renal function, respectively, while blood urea nitrogen (BUN) reflects renal excretory capacity and metabolic status25,26. Although current AKI diagnosis relies on Scr and UO changes, the delayed increase in Scr may result in a 1–3 day diagnostic delay25. Therefore, incorporating additional variables into predictive models is crucial for improving early prediction accuracy. Blood pressure is a key indicator of renal hemodynamics and has been linked to AKI risk, with hypotension identified as an independent risk factor in cardiac surgery populations27. Metabolic disturbances, such as elevated lactate, anion gap, and abnormal pH, indicate tissue hypoperfusion and metabolic acidosis, exacerbating renal injury28,29. Inflammation and oxidative stress also contribute to AKI, as indicated by reduced albumin levels, elevated red cell distribution width (RDW), and higher SOFA scores. Our previous study demonstrated that the RDW-to-albumin ratio, a marker of chronic inflammation, effectively predicts AKI after AMI, consistent with other risk scores incorporating organ dysfunction measures30,31.
All input variables required by the proposed model are routinely collected within the first 24 h after hospital admission and can be automatically extracted from electronic health records, laboratory systems, and nursing documentation, allowing the model to operate without additional manual data entry or increased bedside workload (Fig. 6). In practice, the model is intended to be run shortly after completion of the first 24 h of hospitalization, once routine laboratory results and cumulative UO are available for automated extraction. The model is designed as a clinical decision-support tool rather than a decision-making system. It provides individualized AKI risk estimates to assist clinicians in identifying high-risk patients and prioritizing early in-hospital management and secondary prevention, such as closer monitoring, avoidance of additional nephrotoxic exposures, optimization of hemodynamics and fluid balance, and timely nephrology consultation. Predicted probabilities are intended to support relative risk stratification among patients rather than to define absolute decision thresholds for clinical interventions. Importantly, the web-based tool does not replace KDIGO diagnostic criteria or clinical judgment, but serves as a complementary aid for early AKI risk estimation. Because initiation of renal replacement therapy is incorporated into the definition of KDIGO stage III AKI, the model’s ability to identify patients at high risk of severe AKI may also help flag individuals at risk of dialysis-requiring AKI, which may be clinically relevant for supporting management decisions. A worked example illustrating clinical use of the web-based tool is provided in the Supplementary Materials.
Fig. 6.
Clinical workflow of the early AKI risk stratification models. AMI, acute myocardial infarction; DM, diabetes mellitus; EMR, electronic medical record; CRRT, continuous renal replacement therapy.
Our study focused on AKI in AMI patients with DM because of their markedly higher AKI incidence and worse prognosis, which are driven by complex pathophysiological interactions. First, chronic kidney disease and microvascular complications in patients with long-standing DM reduce renal reserve, increasing susceptibility to acute ischemic injury32. Second, DM exacerbates myocardial infarction severity, leading to reduced cardiac output and impaired renal perfusion33. In addition, hyperglycemia-induced oxidative stress, inflammation, mitochondrial dysfunction, and insulin resistance may directly damage renal tubular epithelial cells and worsen renal hypoxia, as reflected by elevated lactate levels34–36. These mechanisms highlight the multifactorial nature of AKI in AMI patients with DM. We further observed that the highest glucose level on the first day (Glucose_max) was retained as a predictor of stage II or higher AKI, consistent with our previous findings of a J-shaped relationship between stress hyperglycemia and AKI risk in critically ill AMI patients37.
The 24-hour window for model input was selected on methodological and practical grounds. Urine output–derived variables were calculated from first-day data because the frequency of urine output recording varied substantially across datasets, ranging from high-resolution measurements in MIMIC to predominantly 24-hour summaries in the eICU and Zhongda cohorts. A unified first-day window was therefore adopted to ensure consistency and generalizability. In addition, very early or admission-only measurements may be more weakly associated with AKI occurring later during hospitalization. The 24-hour window represents a pragmatic balance between early risk assessment and predictive robustness. We acknowledge that Scr plays a dual role in our study, serving both as a diagnostic component of AKI and as a dominant contributor to the model. Accordingly, the proposed model is best described as an early in-hospital risk stratification tool that provides individualized estimates of AKI severity risk using information accumulated during the first 24 h after admission. In a subset of patients, AKI may already be diagnosable within this initial 24-hour period based on KDIGO Scr criteria; in these cases, the model functions primarily as an early identification and severity stratification tool. In other patients, the model may help predict subsequent AKI progression later during hospitalization.
Model performance varied across datasets, with stronger discrimination in the training set and the Zhongda Hospital cohort and relatively lower performance in the internal validation set and the eICU dataset. This variability highlights the challenges of model transportability across heterogeneous clinical settings. One possible explanation is differences in temporal and institutional context among the datasets. Together, the MIMIC databases span more than two decades (2001–2022), during which treatment standards, diagnostic technologies, and clinical practices for AMI and AKI have evolved. In contrast, the eICU dataset was collected from 208 U.S. hospitals over a shorter period (2014–2015) and reflects substantial inter-institutional variability in treatment and documentation practices. Such temporal and spatial heterogeneity may influence the observed incidence and severity of AKI and, consequently, model performance. These findings indicate that the proposed model should not be assumed to perform uniformly across hospitals without adjustment. For application in a new clinical setting, local calibration or model updating using institution-specific data would likely be required. Future studies should focus on retraining the model using larger, more geographically diverse datasets and on prospective multicenter validation to further assess transportability.
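The local calibration mentioned above could take several forms; one common option is logistic recalibration, which keeps the original model fixed and re-estimates only a calibration intercept a and slope b on institution-specific data, so that the updated risk is sigmoid(a + b · logit(p)). The sketch below fits (a, b) by Newton–Raphson in pure Python. It is an illustrative assumption, not the procedure used in this study; the function names and the toy cohort are hypothetical.

```python
import math


def logit(p, eps=1e-6):
    """Log-odds, clipping p away from 0 and 1 to keep it finite."""
    p = min(max(p, eps), 1 - eps)
    return math.log(p / (1 - p))


def sigmoid(z):
    return 1 / (1 + math.exp(-z))


def fit_logistic_recalibration(pred_probs, outcomes, n_iter=50, tol=1e-10):
    """Fit outcome ~ Bernoulli(sigmoid(a + b * logit(p))) by Newton-Raphson.

    Returns (a, b): calibration intercept and slope. Assumes the local data
    contain both events and non-events (otherwise the MLE does not exist).
    """
    x = [logit(p) for p in pred_probs]
    a, b = 0.0, 1.0  # start from the original model (no recalibration)
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, outcomes):
            mu = sigmoid(a + b * xi)
            w = mu * (1 - mu)
            g0 += yi - mu          # score for the intercept
            g1 += (yi - mu) * xi   # score for the slope
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det <= 0:
            break
        # Newton step: solve (information matrix) * delta = score for 2x2 case
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) < tol and abs(db) < tol:
            break
    return a, b


def recalibrate(p, a, b):
    return sigmoid(a + b * logit(p))


# Hypothetical local cohort in which the original model over-predicts risk
preds, ys = [], []
for i in range(1, 10):          # predicted risks 0.1 ... 0.9
    events = i // 2             # observed events out of 10 patients per group
    for j in range(10):
        preds.append(i / 10)
        ys.append(1 if j < events else 0)

a, b = fit_logistic_recalibration(preds, ys)
updated = [recalibrate(p, a, b) for p in preds]
print(f"intercept={a:.3f}, slope={b:.3f}")
print(f"mean predicted: {sum(preds) / len(preds):.3f} -> "
      f"{sum(updated) / len(updated):.3f}; observed: {sum(ys) / len(ys):.3f}")
```

Because the intercept's score equation is zero at the maximum-likelihood fit, the mean recalibrated risk matches the observed event rate in the local cohort (20/90 in this toy example); a fitted slope below 1 would indicate that the original predictions are too extreme for the new setting. Full retraining, by contrast, would re-estimate all model parameters and requires considerably more local data.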
Currently, no definitive treatment exists for AKI, underscoring the importance of early identification of patients at risk of severe or progressive AKI to optimize in-hospital management and improve outcomes. Because the proposed models provide risk estimates after the first 24 h of hospitalization, their primary clinical value lies in supporting management of the AKI trajectory rather than preventing the initial kidney injury. Although early hydration is a cornerstone of AKI prevention in specific clinical contexts, such as contrast exposure or renal hypoperfusion38, the model is more appropriately positioned to guide subsequent management decisions once elevated AKI risk has been identified. Emerging evidence also suggests that sodium–glucose cotransporter 2 (SGLT2) inhibitors may confer renal protection by improving renal hemodynamics and reducing oxidative stress and inflammation39,40. For patients at high risk of severe AKI, early continuous renal replacement therapy (CRRT) may remove inflammatory cytokines and metabolic waste, alleviating renal burden and potentially improving outcomes41. Integrating ML-based AKI risk stratification with these management strategies may support individualized care aimed at limiting AKI progression.
This study has several limitations. First, although a standardized KDIGO-based Scr algorithm was applied for AKI identification, misclassification of patients with pre-admission AKI cannot be completely excluded, a limitation inherent to retrospective analyses of electronic health record data. Moreover, although predictors were restricted to the first 24 h after admission and outcomes were defined based on subsequent Scr changes, some patients may have met AKI criteria within this initial window. Therefore, the model should be interpreted not as purely predicting future de novo AKI, but as an early in-hospital risk stratification tool. Second, model performance in some validation cohorts, particularly the eICU dataset, was lower than that observed in the training set, which may be attributable to the unavailability of certain clinically relevant variables, such as C-reactive protein, triglycerides, culprit vessel in AMI, and contrast agent volume, as well as the lack of treatment-related information, including insulin use, diuretics, and fluid administration. In addition, the absence of longitudinal monitoring of key variables, such as Scr and UO, may have further reduced model accuracy. Third, the long time span covered by the MIMIC database may limit the direct applicability of the findings to contemporary clinical practice. Finally, the retrospective design may introduce residual confounding and selection bias. Prospective multicenter studies are warranted to further validate and refine the proposed models.
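For readers reimplementing the outcome definition, a KDIGO Scr-based staging rule can be sketched as below. This is a simplified illustration, not the study's exact algorithm: it takes the lowest Scr in the preceding 7 days as the reference value, applies the KDIGO thresholds (an absolute rise of ≥0.3 mg/dL within 48 h or 1.5–1.9× the reference for stage 1, 2.0–2.9× for stage 2, and ≥3.0×, Scr ≥4.0 mg/dL, or initiation of renal replacement therapy for stage 3), and ignores the urine output criteria. Requiring a concurrent acute rise before the ≥4.0 mg/dL threshold counts is an assumption made here so that a chronically elevated Scr alone is not labeled AKI; the function name is hypothetical.

```python
from datetime import datetime, timedelta


def kdigo_scr_stage(measurements, rrt_started=False):
    """Return the highest KDIGO serum-creatinine stage (0-3) in a series.

    measurements: list of (timestamp, Scr in mg/dL), in any order.
    rrt_started:  True if renal replacement therapy was initiated (stage 3).
    Simplified sketch: the reference value is the lowest Scr in the previous
    7 days; urine output criteria are not considered.
    """
    points = sorted(measurements)
    max_stage = 3 if rrt_started else 0
    for i, (t, scr) in enumerate(points):
        earlier = points[:i]
        prior_7d = [v for u, v in earlier if t - u <= timedelta(days=7)]
        prior_48h = [v for u, v in earlier if t - u <= timedelta(hours=48)]

        stage = 0
        if prior_7d:
            # Round to avoid floating-point artifacts at the thresholds
            ratio = round(scr / min(prior_7d), 6)
            if ratio >= 3.0:
                stage = 3
            elif ratio >= 2.0:
                stage = 2
            elif ratio >= 1.5:
                stage = 1
        if prior_48h and round(scr - min(prior_48h), 6) >= 0.3:
            stage = max(stage, 1)
        # Assumption: Scr >= 4.0 mg/dL counts as stage 3 only with an acute rise
        if scr >= 4.0 and stage >= 1:
            stage = 3
        max_stage = max(max_stage, stage)
    return max_stage


t0 = datetime(2024, 1, 1, 8, 0)
print(kdigo_scr_stage([(t0, 1.0), (t0 + timedelta(hours=24), 1.4)]))
print(kdigo_scr_stage([(t0, 1.0), (t0 + timedelta(days=3), 2.1)]))
```

The first series is stage 1 through the absolute 0.3 mg/dL rule and the second is stage 2 through the relative rule. A series that starts high and then falls, as can happen when AKI began before admission, is staged 0 by this rule, which illustrates the misclassification risk noted above.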
From a clinical perspective, the key message of this study is that AKI risk stratification in patients with AMI and DM can be performed early using data routinely available within the first 24 h of hospitalization. By identifying patients at elevated risk, the proposed models may support closer monitoring and timely strategies to limit AKI progression, while final clinical decisions remain at the discretion of the treating physician.
Conclusions
In this study, we developed and validated interpretable ML-based models for early in-hospital risk stratification of AKI severity in patients with AMI and comorbid DM. By leveraging routinely collected clinical data and emphasizing model interpretability, this work provides a framework for individualized AKI risk assessment. However, model performance varied across healthcare systems and documentation practices, indicating that local validation, recalibration, or retraining using institution-specific data would likely be required before clinical deployment. Accordingly, the proposed models should be regarded as candidate clinical decision-support tools that warrant further prospective multicenter evaluation.
Author contributions
LR was responsible for developing the model and drafting the manuscript. YQ performed the literature review and statistical validation. XDL, YS, SLX, GLY, and DW collected and curated the multicenter data. YHQ supervised the clinical study design and validated the statistical results and clinical interpretations. CCT led machine learning model development and optimization and oversaw technical validation and manuscript submission. All authors read and approved the final manuscript.
Funding
This study was supported by the National Natural Science Foundation of China (No. 82170433), the Natural Science Foundation of Jiangsu Province (No. BK20241682), and the Jiangsu Provincial Key Discipline (Laboratory) (No. ZDXK202207).
Data availability
The MIMIC-III/IV and eICU data are available through public access requests via PhysioNet (https://physionet.org/); access requires completion of ethics module training and certification. The data from Zhongda Hospital are available from the corresponding author upon reasonable request.
Declarations
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Liang Ruan and Yong Qiao contributed equally to this work.
Contributor Information
Chengchun Tang, Email: tangchengchun@hotmail.com.
Yuhan Qin, Email: 986742402@qq.com.
References
1. Kaltsas, E., Chalikias, G. & Tziakas, D. The Incidence and the Prognostic Impact of Acute Kidney Injury in Acute Myocardial Infarction Patients: Current Preventive Strategies. Cardiovasc. Drugs Ther. 32, 81–98 (2018).
2. Miura, T., Kuno, A. & Tanaka, M. Diabetes modulation of the myocardial infarction-acute kidney injury axis. Am. J. Physiol. Heart Circ. Physiol. 322, H394–H405 (2022).
3. Kellum, J. A. et al. Acute kidney injury. Nat. Rev. Dis. Primers 7, 52 (2021).
4. Gao, S., Liu, Q., Chen, H., Yu, M. & Li, H. Predictive value of stress hyperglycemia ratio for the occurrence of acute kidney injury in acute myocardial infarction patients with diabetes. BMC Cardiovasc. Disord. 21, 157 (2021).
5. Chen, J. J. et al. Electronic Alert Systems for Patients With Acute Kidney Injury: A Systematic Review and Meta-Analysis. JAMA Netw. Open 7, e2430401 (2024).
6. Shehab, M. et al. Machine learning in medical applications: A review of state-of-the-art methods. Comput. Biol. Med. 145, 105458 (2022).
7. Fritz, B. A. et al. Deep-learning model for predicting 30-day postoperative mortality. Br. J. Anaesth. 123, 688–695 (2019).
8. See, C. Y. et al. Improvement of composite kidney outcomes by AKI care bundles: a systematic review and meta-analysis. Crit. Care 27, 390 (2023).
9. Collins, G. S. et al. TRIPOD + AI statement: updated guidance for reporting clinical prediction models that use regression or machine learning methods. BMJ 385, e078378 (2024).
10. Johnson, A., Pollard, T. & Mark, R. MIMIC-III Clinical Database. PhysioNet (2015). 10.13026/C2XW26
11. Johnson, A. et al. MIMIC-IV. PhysioNet. 10.13026/KPB9-MT58
12. Pollard, T. J., Johnson, A. E. W., Raffa, J. & Badawi, O. The eICU Collaborative Research Database. PhysioNet (2017). 10.13026/C2WM1R
13. Riley, R. D. et al. Calculating the sample size required for developing a clinical prediction model. BMJ 368, m441 (2020).
14. Riley, R. D. et al. Evaluation of clinical prediction models (part 3): calculating the sample size required for an external validation study. BMJ 384, e074821 (2024).
15. Speiser, J. L., Miller, M. E., Tooze, J. & Ip, E. A comparison of random forest variable selection methods for classification prediction modeling. Expert Syst. Appl. 134, 93–101 (2019).
16. Vasquez, M. M., Hu, C., Roe, D. J., Halonen, M. & Guerra, S. Measurement error correction in the least absolute shrinkage and selection operator model when validation data are available. Stat. Methods Med. Res. 28, 670–680 (2019).
17. He, W., Fu, X. & Chen, S. Advancing polytrauma care: developing and validating machine learning models for early mortality prediction. J. Transl. Med. 21, 664 (2023).
18. Goodwin, N. L., Nilsson, S. R. O., Choong, J. J. & Golden, S. A. Toward the explainability, transparency, and universality of machine learning for behavioral classification in neuroscience. Curr. Opin. Neurobiol. 73, 102544 (2022).
19. Khwaja, A. KDIGO clinical practice guidelines for acute kidney injury. Nephron Clin. Pract. 120, c179–c184 (2012).
20. L, S. et al. The Use of Deep Learning and Machine Learning on Longitudinal Electronic Health Records for the Early Detection and Prevention of Diseases: Scoping Review. J. Med. Internet Res. 26 (2024).
21. Li, J. et al. Predicting Mortality in Intensive Care Unit Patients With Heart Failure Using an Interpretable Machine Learning Model: Retrospective Cohort Study. J. Med. Internet Res. 24, e38082 (2022).
22. Zhang, X. et al. Identification of DNA methylation-regulated genes as potential biomarkers for coronary heart disease via machine learning in the Framingham Heart Study. Clin. Epigenetics 14, 122 (2022).
23. G, M. et al. Acute Kidney Injury in Diabetic Patients With Acute Myocardial Infarction: Role of Acute and Chronic Glycemia. J. Am. Heart Assoc. 7 (2018).
24. Sun, Y. B. et al. Risk factors of acute kidney injury after acute myocardial infarction. Ren. Fail. 38, 1353–1358 (2016).
25. Xiao, Z. et al. Emerging early diagnostic methods for acute kidney injury. Theranostics 12, 2963–2986 (2022).
26. Dutta, A., Saha, S., Bahl, A., Mittal, A. & Basak, T. A comprehensive review of acute cardio-renal syndrome: need for novel biomarkers. Front. Pharmacol. 14, 1152055 (2023).
27. Walsh, M. et al. Relationship between intraoperative mean arterial pressure and clinical outcomes after noncardiac surgery: toward an empirical definition of hypotension. Anesthesiology 119, 507–515 (2013).
28. Zhou, X. et al. Lactate level and lactate clearance for acute kidney injury prediction among patients admitted with ST-segment elevation myocardial infarction: A retrospective cohort study. Front. Cardiovasc. Med. 9, 930202 (2022).
29. Pan, Q. et al. The association between serum anion gap and acute kidney injury after coronary artery bypass grafting in patients with acute coronary syndrome. BMC Cardiovasc. Disord. 23, 542 (2023).
30. Ruan, L. et al. Red Blood Cell Distribution Width to Albumin Ratio for Predicting Type I Cardiorenal Syndrome in Patients with Acute Myocardial Infarction: A Retrospective Cohort Study. J. Inflamm. Res.
31. Su, Y. et al. AKI-Pro score for predicting progression to severe acute kidney injury or death in patients with early acute kidney injury after cardiac surgery. J. Transl. Med. 22, 571 (2024).
32. Alicic, R. Z., Rooney, M. T. & Tuttle, K. R. Diabetic Kidney Disease: Challenges, Progress, and Possibilities. Clin. J. Am. Soc. Nephrol. 12, 2032–2045 (2017).
33. Muratsubaki, S. et al. Suppressed autophagic response underlies augmentation of renal ischemia/reperfusion injury by type 2 diabetes. Sci. Rep. 7, 5311 (2017).
34. Tang, C., Livingston, M. J., Liu, Z. & Dong, Z. Autophagy in kidney homeostasis and disease. Nat. Rev. Nephrol. 16, 489–508 (2020).
35. Gui, Y., Palanza, Z., Fu, H. & Zhou, D. Acute kidney injury in diabetes mellitus: Epidemiology, diagnostic, and therapeutic concepts. FASEB J. 37, e22884 (2023).
36. Li, H., Ren, Q., Shi, M., Ma, L. & Fu, P. Lactate metabolism and acute kidney injury. Chin. Med. J. (Engl.) 10.1097/CM9.0000000000003142 (2024).
37. Li, X. et al. Stress hyperglycemia ratio as an independent predictor of acute kidney injury in critically ill patients with acute myocardial infarction: a retrospective U.S. cohort study. Ren. Fail. 47, 2471018 (2025).
38. Cossette, F. et al. Tailored hydration for the prevention of contrast-induced acute kidney injury after coronary angiogram or PCI: A systematic review and meta-analysis. Am. Heart J. 282, 93–102 (2025).
39. Kuno, A. et al. Empagliflozin attenuates acute kidney injury after myocardial infarction in diabetic rats. Sci. Rep. 10, 7238 (2020).
40. Paolisso, P. et al. Outcomes in diabetic patients treated with SGLT2-Inhibitors with acute myocardial infarction undergoing PCI: The SGLT2-I AMI PROTECT Registry. Pharmacol. Res. 187, 106597 (2023).
41. Li, X., Liu, C., Mao, Z., Li, Q. & Zhou, F. Timing of renal replacement therapy initiation for acute kidney injury in critically ill patients: a systematic review of randomized clinical trials with meta-analysis and trial sequential analysis. Crit. Care 25, 15 (2021).