Summary
Background
Rhabdomyolysis (RM) is a complex clinical syndrome with heterogeneous progression patterns among patients of varying severity. Early and accurate prediction of acute kidney injury (AKI), disease severity, renal replacement therapy (RRT) requirements, and mortality risk is essential for timely identification of high-risk individuals, personalized treatment planning, and optimal allocation of healthcare resources. We aimed to develop and externally validate an interpretable multi-task machine learning (ML) model to predict four clinical outcomes in patients with rhabdomyolysis: AKI, disease severity, the need for RRT, and in-hospital mortality.
Methods
We conducted a retrospective study using three data sources: the eICU Collaborative Research Database (eICU-CRD), the Medical Information Mart for Intensive Care IV (MIMIC-IV), and electronic medical records from four tertiary hospitals in China. Data from eICU-CRD and MIMIC-IV were combined to form the derivation cohort for model training and internal validation, while data from the Chinese hospitals served as the external validation cohort. We analyzed 1429 patients from 2008 to 2019 in the derivation cohort and 362 patients from 2016 to 2022 in the external validation cohort. AKI was defined according to the Kidney Disease: Improving Global Outcomes (KDIGO) criteria, based on serum creatinine levels and urine output. Twenty-two clinical features available within the first 24 h of admission were selected to develop the prediction models. Ten machine learning (ML) algorithms were applied to construct multi-task prediction models. Model performance was evaluated using the area under the receiver operating characteristic curve (AUC). To improve interpretability, feature importance was assessed using the SHapley Additive exPlanation (SHAP) method.
Findings
1429 patients were included in the derivation cohort (69.4% developed AKI, 36.7% were classified as having severe disease, 12.1% required RRT, and 9.8% had in-hospital mortality). 362 patients were included in the external validation cohort (27.9% developed AKI, 25.7% had severe disease, 27.3% required RRT, and 4.1% had in-hospital mortality). Among all evaluated models, the random forest (RF) algorithm exhibited the highest overall discriminative performance across the four prediction tasks. Based on feature importance rankings, interpretable final models were developed for each task using the top five contributing features. These models demonstrated robust predictive accuracy for AKI, disease severity, RRT requirements, and in-hospital mortality, with AUCs and corresponding 95% confidence intervals (CIs) of 0.914 (0.875–0.944), 0.909 (0.869–0.940), 0.888 (0.844–0.921), and 0.823 (0.773–0.865) in the internal validation cohort, and 0.906 (0.871–0.934), 0.856 (0.815–0.890), 0.852 (0.811–0.887), and 0.832 (0.789–0.869) in the external validation cohort, respectively. To support clinical implementation, a web- and Android-based decision support system was developed and is currently undergoing pilot testing in multiple hospitals.
Interpretation
We developed and validated an interpretable multi-task ML model capable of accurately predicting key clinical outcomes in patients with RM. To improve clinical applicability, a user-friendly decision support system was implemented, incorporating interactive features to support frontline healthcare providers in real-time risk stratification and individualized management of RM.
Funding
National Key Research and Development Program of China (Nos. 2021YFC3002202 and 2023YFF1204104).
Keywords: Rhabdomyolysis, Acute kidney injury, Renal replacement therapy, Mortality, Machine learning, Prediction model, SHAP
Research in context
Evidence before this study
Rhabdomyolysis (RM) is a rapidly progressive and potentially life-threatening clinical syndrome, often resulting in serious outcomes such as acute kidney injury (AKI), need for renal replacement therapy (RRT), and increased in-hospital mortality. We systematically searched PubMed and other databases for articles published from January 2000 to June 2025 using the keywords “rhabdomyolysis”, “acute kidney injury (AKI)”, “renal replacement therapy (RRT)”, “mortality”, “prediction model”, and “machine learning”. The majority of previous studies focused on predicting a single clinical outcome (e.g., RRT or mortality), typically using conventional statistical methods and limited sample sizes. Few studies have utilized explainable machine learning (ML) approaches to develop multi-task predictive models for RM, and none have integrated such models into deployable clinical tools to support precision management in real-world settings.
Added value of this study
This is the first study to propose a multi-task prediction strategy specifically for patients with RM. Unlike previous efforts targeting isolated complications, we developed explainable ML models that simultaneously predict four critical outcomes: AKI, disease severity, RRT requirement, and in-hospital mortality. Leveraging three large-scale databases—including two international Intensive Care Unit (ICU) datasets and one multicenter Chinese cohort—the models achieved strong performance in both internal and external validations. Furthermore, we embedded the final models into a clinical decision support system, available in both web and Android mobile formats, which is currently undergoing pilot implementation in multiple hospitals.
Implications of all the available evidence
By developing interpretable, multi-task ML models, this study expands current evidence on risk stratification in RM. The integrated prediction framework enables early identification of high-risk patients, supports personalized clinical decision-making, and promotes more efficient allocation of healthcare resources. SHapley Additive exPlanations (SHAP) enhanced model transparency and improved clinician trust in predictions. The deployed application has shown good practicality and generalizability in real-world clinical environments. Future prospective studies are warranted to assess whether such tools can improve patient outcomes and to further explore their incorporation into RM clinical management guidelines. Collectively, our findings highlight the potential of explainable artificial intelligence (AI) in advancing precision medicine for RM.
Introduction
Rhabdomyolysis (RM) is a clinical syndrome caused by skeletal muscle injury that results in the release of myoglobin into the bloodstream. It is frequently associated with severe complications, including acute kidney injury (AKI), electrolyte disturbances, and the need for renal replacement therapy (RRT), with an estimated mortality rate of approximately 10%.1, 2, 3, 4, 5, 6 The clinical manifestations of RM are highly heterogeneous, with considerable variability in disease progression among affected individuals. Early identification of high-risk patients and timely intervention are essential for improving clinical outcomes and optimizing healthcare resource utilization.7,8
Currently, early clinical assessment of RM primarily relies on biomarkers such as creatine kinase (CK), along with renal function indicators including serum creatinine and urine output.9,10 However, these conventional parameters often lack sufficient sensitivity and specificity for predicting disease severity, AKI onset, RRT requirements, and mortality risk. Notably, CK is not a specific or early predictor of AKI in patients with rhabdomyolysis.11 Consequently, there is an urgent need for more sensitive and accurate risk assessment tools to improve outcome prediction and support precise clinical decision-making.12
With the widespread adoption of electronic medical records (EMR), machine learning (ML) techniques have gained increasing recognition for their potential in clinical risk prediction.13, 14, 15 In recent years, ML-based models have demonstrated strong performance in predicting acute conditions such as AKI, particularly in intensive care unit (ICU) settings.16,17 Nonetheless, despite their predictive strength, many ML models lack interpretability, creating a “black box” dilemma that limits clinical implementation.18 To address this limitation, the SHapley Additive exPlanation (SHAP) method has emerged as a powerful tool to enhance model transparency. SHAP provides individualized, feature-level explanations that enable clinicians to understand the rationale behind model predictions.19, 20, 21, 22
This study aimed to develop and validate an interpretable, multi-task ML prediction model capable of accurately identifying disease severity, AKI onset, RRT requirements, and mortality risk in patients with RM. The model was trained and validated using EMR data from the eICU Collaborative Research Database (eICU-CRD), Medical Information Mart for Intensive Care IV (MIMIC-IV), and the Beijing–Tianjin Multicenter Hospital Dataset (BTMH). SHAP was used to visualize feature contributions and improve model interpretability, thereby enhancing transparency and supporting its potential for clinical implementation.
Methods
Data source
We conducted a multicenter retrospective study using de-identified data from the MIMIC-IV (v2.0),23 eICU-CRD (v2.0),24,25 and the BTMH database. MIMIC-IV is a contemporary electronic health record (EHR) database developed at the Beth Israel Deaconess Medical Center, comprising hospital admissions between 2008 and 2019. It contains more than 65,000 ICU admissions and over 200,000 emergency department visits. The eICU-CRD is a multicenter dataset comprising de-identified data from over 200,000 ICU admissions across the United States from 2014 to 2015. This study complied with the data use agreement (DUA) of the PhysioNet platform. The BTMH database comprises de-identified EHR data from four tertiary hospitals in China and was used for external validation, including patient records from 2016 to 2022. This study was approved by the Institutional Review Board (IRB).
Study population
All adult patients (aged ≥18 years) diagnosed with RM according to the International Classification of Diseases, Ninth and Tenth Revisions (ICD-9/10),26 were included. Exclusion criteria were as follows: (1) peak CK level <1000 U/L; (2) hospital stay <24 h; (3) one or more of the following diseases: acute myocardial infarction (AMI), chronic heart failure (CHF), chronic kidney disease stage 4 or 5 (CKD 4/5), chronic respiratory failure (CRF), chronic liver failure (CLF), dermatomyositis (DM), or polymyositis (PM); and (4) unknown outcomes or outlier values. Details of the corresponding ICD codes used for inclusion and exclusion are provided in Supplementary Appendix S1.
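The criteria above can be expressed as a simple record filter. The following is a minimal sketch, not the study's extraction code: the `Patient` fields and comorbidity labels are hypothetical names chosen for illustration, and outlier handling is omitted because the text does not define it.

```python
from dataclasses import dataclass, field
from typing import List, Set

# Comorbidity labels mirror the abbreviations in the exclusion criteria (hypothetical encoding).
EXCLUDED_COMORBIDITIES = {"AMI", "CHF", "CKD4/5", "CRF", "CLF", "DM", "PM"}


@dataclass
class Patient:
    # All field names are assumptions; they are not taken from the study's database schema.
    age: int
    peak_ck: float           # peak creatine kinase, U/L
    hospital_stay_h: float   # length of hospital stay, hours
    comorbidities: Set[str] = field(default_factory=set)
    outcome_known: bool = True


def is_eligible(p: Patient) -> bool:
    """Return True if the patient passes the inclusion and exclusion criteria."""
    if p.age < 18:
        return False
    if p.peak_ck < 1000:
        return False
    if p.hospital_stay_h < 24:
        return False
    if p.comorbidities & EXCLUDED_COMORBIDITIES:
        return False
    return p.outcome_known


def select_cohort(patients: List[Patient]) -> List[Patient]:
    """Keep only eligible patients."""
    return [p for p in patients if is_eligible(p)]
```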
Data collection and processing
Data were extracted using PostgreSQL version 15.3 from MIMIC-IV (v2.0) and eICU-CRD (v2.0) for patients meeting the inclusion criteria. To ensure consistency between MIMIC-IV and eICU-CRD, we performed data harmonization prior to model development. This included aligning variable definitions (e.g., standardizing SCr units to μmol/L), unifying laboratory units (e.g., CK, ALT, AST, LDH), reconciling variable names using a standardized mapping dictionary, and restricting feature extraction to the first 24 h of admission. All preprocessing was implemented through a unified PostgreSQL-based pipeline.
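As an illustration of the harmonization steps, the sketch below renames source-specific variables through a mapping dictionary, converts SCr to μmol/L, and keeps only measurements from the first 24 h. It is written in Python for readability, whereas the study's pipeline was implemented in PostgreSQL. The mapping entries, row layout, and function names are assumptions; only the conversion factor (1 mg/dL of creatinine is about 88.4 μmol/L) is a standard value.

```python
from typing import Dict, List

# Standard conversion: 1 mg/dL creatinine is approximately 88.4 umol/L.
SCR_UMOL_PER_MG_DL = 88.4

# Hypothetical mapping dictionary: source-specific names -> unified names.
NAME_MAP: Dict[str, str] = {
    "creatinine": "SCr",
    "creat": "SCr",
    "creatine_kinase": "CK",
    "cpk": "CK",
}


def scr_to_umol_per_l(value: float, unit: str) -> float:
    """Convert a serum creatinine value to umol/L."""
    unit_norm = unit.strip().lower().replace("μ", "u").replace("µ", "u")
    if unit_norm == "mg/dl":
        return value * SCR_UMOL_PER_MG_DL
    if unit_norm == "umol/l":
        return value
    raise ValueError(f"Unsupported SCr unit: {unit!r}")


def harmonize(rows: List[dict]) -> List[dict]:
    """Rename variables, standardize SCr units, and keep first-24-h measurements.

    Each row is a dict with keys: name, value, unit, hours_from_admission
    (an assumed layout for this sketch).
    """
    out = []
    for row in rows:
        if not 0 <= row["hours_from_admission"] <= 24:
            continue  # outside the first 24 h of admission
        name = NAME_MAP.get(row["name"].lower())
        if name is None:
            continue  # variable not covered by the mapping dictionary
        value, unit = row["value"], row["unit"]
        if name == "SCr":
            value, unit = scr_to_umol_per_l(value, unit), "umol/L"
        out.append({"name": name, "value": value, "unit": unit,
                    "hours_from_admission": row["hours_from_admission"]})
    return out
```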
Clinical variables collected within the first 24 h of admission included demographics, vital signs, and laboratory measurements. These variables were used to construct the clinical prediction models. AKI was defined according to the KDIGO criteria.27 For patients without available baseline serum creatinine (SCr), the KDIGO-recommended method was used: an estimated glomerular filtration rate (eGFR) of 75 mL/min/1.73 m² was assumed, and the baseline SCr was back-calculated using the Modification of Diet in Renal Disease (MDRD) equation. Features with more than 30% missingness were excluded to reduce bias.28 A total of 22 features were selected: sex, age, height, weight, body mass index (BMI), CK, SCr, blood urea nitrogen (BUN), alanine aminotransferase (ALT), alkaline phosphatase (ALP), aspartate aminotransferase (AST), total bilirubin (TBil), lactate dehydrogenase (LDH), albumin (Alb), potassium (K), calcium (Ca), sodium (Na), chloride (Cl), prothrombin time (PT), activated partial thromboplastin time (APTT, hereafter referred to as PTT), white blood cell count (WBC), and platelet count (PLT). Clinically, these variables capture key pathophysiological aspects of RM, including muscle breakdown, renal function, electrolyte imbalance, and coagulation abnormalities. Statistically, all features were routinely available within the first 24 h of admission and had less than 30% missingness, supporting robust and feasible model development. This selection strategy is consistent with prior literature on AKI risk in RM and enhances early clinical applicability.
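The baseline SCr back-calculation can be sketched as follows. This assumes the four-variable MDRD equation, eGFR = k × SCr^(−1.154) × age^(−0.203) × 0.742 (if female) × 1.212 (if Black), with SCr in mg/dL. The text does not state which coefficient was used, so `k` is left as a parameter (175 for the IDMS-traceable form, 186 for the original form); the function names are illustrative.

```python
def mdrd_egfr(scr_mg_dl: float, age: float, female: bool,
              black: bool = False, k: float = 175.0) -> float:
    """Four-variable MDRD eGFR in mL/min/1.73 m^2 (SCr in mg/dL)."""
    egfr = k * scr_mg_dl ** -1.154 * age ** -0.203
    if female:
        egfr *= 0.742
    if black:
        egfr *= 1.212
    return egfr


def baseline_scr_from_mdrd(age: float, female: bool, black: bool = False,
                           target_egfr: float = 75.0, k: float = 175.0) -> float:
    """Invert the MDRD equation to obtain the SCr (mg/dL) that yields target_egfr."""
    factor = k * age ** -0.203
    if female:
        factor *= 0.742
    if black:
        factor *= 1.212
    # target_egfr = factor * scr^(-1.154)  =>  scr = (target_egfr / factor)^(-1/1.154)
    return (target_egfr / factor) ** (-1.0 / 1.154)
```

For a 50-year-old man, this sketch gives an assumed baseline SCr of roughly 1 mg/dL; substituting the result back into `mdrd_egfr` recovers an eGFR of 75.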
To address multicollinearity, Spearman correlation analysis was performed for each of the four prediction tasks, as illustrated in Supplementary Fig. S1.29 When two variables were highly correlated in the Spearman correlation analysis (correlation coefficient > 0.6), the feature with the weaker association with the outcome was excluded from the dataset. A detailed explanation of this process is provided in Supplementary Appendix S2. A total of 18 features were ultimately retained for model development.30
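The correlation filter can be sketched in a few lines. Two points are assumptions here, since the details are deferred to Supplementary Appendix S2: the threshold is applied to the absolute Spearman coefficient, and "association with the outcome" is measured by the absolute Spearman correlation between each feature and the binary outcome. The greedy pairwise order is also illustrative.

```python
from itertools import combinations
from typing import Dict, List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # constant input: correlation undefined, treated as 0 here
    return cov / (vx * vy) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def drop_collinear(features: Dict[str, Sequence[float]],
                   outcome: Sequence[float],
                   threshold: float = 0.6) -> List[str]:
    """For each highly correlated pair, drop the feature less associated with the outcome."""
    strength = {name: abs(spearman(col, outcome)) for name, col in features.items()}
    dropped = set()
    for a, b in combinations(features, 2):
        if a in dropped or b in dropped:
            continue
        if abs(spearman(features[a], features[b])) > threshold:
            dropped.add(a if strength[a] < strength[b] else b)
    return [name for name in features if name not in dropped]
```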
External validation cohort
External validation was performed using the independent BTMH dataset, applying the same inclusion and exclusion criteria as in the derivation cohort. A total of 362 patients were included in the final analysis.
Definition of severe patients
To improve risk identification and support clinical decision-making,31 we defined severe patients as those who developed both AKI and required RRT during hospitalization, or those who experienced in-hospital mortality.32,33
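Read literally, this composite definition is a Boolean rule over three in-hospital outcomes. A minimal sketch with hypothetical argument names:

```python
def is_severe(aki: bool, rrt: bool, died_in_hospital: bool) -> bool:
    """Severe disease: (AKI and RRT during hospitalization) or in-hospital death."""
    return (aki and rrt) or died_in_hospital
```

Under this rule, a patient with AKI who never received RRT and survived is classified as non-severe.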
Model development and comparison
The combined MIMIC-IV and eICU-CRD datasets were randomly split into training and testing sets (80:20 ratio) for each of the four prediction tasks, using stratified sampling based on the corresponding binary outcome variable. The BTMH dataset served as the external validation cohort. Eighteen features were used to build task-specific models. Missing values were imputed using median values.
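The split and imputation steps can be sketched as below. This is not the study's code: the random seed and function names are illustrative, and the sketch assumes that medians are estimated on the training set and then applied to the test set, which the text does not specify.

```python
import random
from statistics import median
from typing import List, Optional, Sequence, Tuple

Row = List[Optional[float]]  # None marks a missing value


def stratified_split(labels: Sequence[int], test_fraction: float = 0.2,
                     seed: int = 42) -> Tuple[List[int], List[int]]:
    """Split row indices so each outcome class keeps the same train:test ratio."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def fit_medians(rows: Sequence[Row]) -> List[float]:
    """Column-wise medians over observed (non-missing) values."""
    n_cols = len(rows[0])
    return [median(r[j] for r in rows if r[j] is not None) for j in range(n_cols)]


def impute(rows: Sequence[Row], medians: Sequence[float]) -> List[List[float]]:
    """Replace missing values with the supplied column medians."""
    return [[medians[j] if v is None else v for j, v in enumerate(r)] for r in rows]
```

A typical use would be to call `stratified_split` once per prediction task with that task's outcome labels, fit medians on the training rows, and reuse them for the test and external validation rows.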
Ten machine learning algorithms were evaluated, including logistic regression (LR),34 support vector machine (SVM),35 decision tree (DT),36 random forest (RF),37 k-nearest neighbors (KNN),38 extra trees (ET),39 gradient boosting machine (GBM),40 adaptive boosting (AdaBoost),41 extreme gradient boosting (XGBoost),42 and artificial neural networks (ANN).43 To mitigate overfitting, five-fold stratified cross-validation was employed during training. Hyperparameters were tuned via a combination of grid search and manual adjustment. Model performance was assessed using the area under the receiver-operating-characteristic (ROC) curve (AUC), sensitivity, specificity, accuracy, positive predictive value (PPV), negative predictive value (NPV), and F1 score.
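For reference, the listed metrics can be computed from predicted probabilities as sketched below. The AUC is written in its rank-based form (the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as one half). Function names are illustrative, and in practice a library implementation would normally be used.

```python
from typing import Dict, Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a positive > score of a negative), ties counted as 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("Both classes must be present to compute the AUC.")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def classification_metrics(y_true: Sequence[int], scores: Sequence[float],
                           threshold: float) -> Dict[str, float]:
    """Confusion-matrix metrics at a given probability threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, scores):
        pred = 1 if s >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1

    def ratio(num: float, den: float) -> float:
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }
```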
Feature selection and model interpretation
The SHAP method was employed to enhance model interpretability by quantifying feature contributions and ranking feature importance, thereby addressing the “black-box” nature of ML models. SHAP values were used to guide feature selection for each prediction task. Starting with 18 features, we sequentially reduced them based on SHAP importance rankings while monitoring model performance. The DeLong non-parametric test was used to compare AUCs, and feature elimination was stopped when a statistically significant drop in performance was detected. SHAP provides both global and local interpretability. Global SHAP summarizes the overall contribution of each feature to model predictions, while local SHAP identifies patient-level factors that contribute to individual predictions.
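The elimination loop can be sketched as follows. The callable `p_value_vs_full` is hypothetical: it stands in for retraining a model on the candidate subset and running a DeLong test of its AUC against the full-feature model, neither of which is implemented here. The sketch also assumes that each reduced model is compared with the full model and that elimination proceeds one feature at a time; the text does not specify the step size.

```python
from typing import Callable, List, Sequence


def shap_guided_reduction(ranked_features: Sequence[str],
                          p_value_vs_full: Callable[[List[str]], float],
                          alpha: float = 0.05,
                          min_features: int = 1) -> List[str]:
    """Drop the least important features until the AUC falls significantly.

    ranked_features: feature names ordered by SHAP importance, most important first.
    p_value_vs_full: hypothetical callable returning the p-value of an AUC
        comparison (e.g., DeLong test) between the subset model and the full model.
    """
    selected = list(ranked_features)
    for k in range(len(ranked_features) - 1, min_features - 1, -1):
        candidate = list(ranked_features[:k])
        if p_value_vs_full(candidate) < alpha:
            break  # significant drop: keep the previous, larger subset
        selected = candidate
    return selected
```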
Web and Android-based applications
To facilitate clinical adoption, the final models were deployed as both web-based and Android-based tools. Upon input of the selected features, the application generates predictions for all tasks with a single click.
Statistical analysis
Data were analyzed using Python (v3.8.3; Python Software Foundation) and R (v4.2.0; R Foundation for Statistical Computing). Categorical variables were summarized as counts (n) and percentages (%) and compared using the chi-square test or Fisher's exact test. Continuous variables with non-normal distributions were reported as medians and interquartile ranges (IQR) and compared using the Mann–Whitney U test or Kruskal–Wallis test. Model performance was evaluated using the AUC. The optimal cutoff point was determined by maximizing the Youden index, calculated as (sensitivity + specificity − 1). Statistical significance was defined as a p-value < 0.05.
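The Youden-index cutoff can be found by scanning the observed scores as candidate thresholds. A minimal sketch, assuming that a case is predicted positive when its score is at or above the threshold and that ties in the index are resolved in favor of the lowest threshold:

```python
from typing import Sequence, Tuple


def youden_cutoff(y_true: Sequence[int], scores: Sequence[float]) -> Tuple[float, float]:
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1."""
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = sum(1 for y in y_true if y == 0)
    if n_pos == 0 or n_neg == 0:
        raise ValueError("Both classes must be present.")
    best_t, best_j = None, float("-inf")
    for t in sorted(set(scores)):
        tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= t)
        tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < t)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:  # strict comparison keeps the lowest threshold on ties
            best_t, best_j = t, j
    return best_t, best_j
```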
Ethics
The external validation dataset (BTMH), comprising de-identified electronic health records from four tertiary hospitals in China, was approved by the Institutional Review Board of Tianjin University, China (Approval No. TJUE-2022-254). Due to the retrospective nature of the study and the use of anonymized data, the requirement for informed consent was waived by the IRB.
Two publicly available databases were used to construct and validate the prediction models: the Medical Information Mart for Intensive Care IV (MIMIC-IV, version 2.0) and the eICU Collaborative Research Database (eICU-CRD, version 2.0). The MIMIC-IV database was approved by the institutional review boards of Beth Israel Deaconess Medical Center (IRB No. 2001-P-001699/14) and the Massachusetts Institute of Technology (IRB No. 0403000206), USA. The eICU-CRD database was certified for low re-identification risk under the Health Insurance Portability and Accountability Act (HIPAA) and exempted from further IRB review (Certification No. 1031219-2). Access to both datasets was granted through credentialed access on PhysioNet following successful completion of the NIH “Protecting Human Research Participants” course.
Role of the funding source
The funder of the study had no role in study design, data collection, data analysis, data interpretation, or writing of the report.
Results
Patient characteristics
This multicenter retrospective study included 1429 patients with RM from the MIMIC-IV (v2.0) and eICU-CRD (v2.0) databases, comprising the derivation cohort used for model development. Baseline demographic and clinical characteristics of this cohort are summarized in Table 1. Among these patients, 69.4% developed AKI, 36.7% were classified as having severe disease, 12.1% required RRT, and 9.8% experienced in-hospital mortality.
Table 1.
Baseline demographic and clinical characteristics of patients in the derivation cohort.
| Variable | Non-AKI (n = 437) | AKI (n = 992) | p value |
|---|---|---|---|
| AKI | | | |
| Age, year | 47.0 [31.0–62.0] | 56.0 [42.0–70.0] | <0.001 |
| Height, cm | 172.7 [170.2–172.7] | 172.7 [167.6–177.8] | 0.442 |
| Weight, kg | 79.8 [75.4–81.6] | 79.8 [70.6–94.4] | 0.072 |
| BMI, kg/m² | 26.8 [25.8–27.7] | 26.8 [24.1–31.4] | 0.117 |
| Male, n (%) | 303 (69.3) | 686 (69.2) | 1.000 |
| CK, U/L | 8862.5 [3285.0–16744.0] | 6089.0 [2797.5–15804.5] | 0.044 |
| SCr, mg/dL | 1.0 [0.8–1.6] | 2.0 [1.1–4.0] | <0.001 |
| BUN, mg/dL | 19.0 [12.0–31.0] | 36.0 [20.0–62.0] | <0.001 |
| ALT, U/L | 128.0 [58.0–200.0] | 85.8 [63.0–214.0] | 0.564 |
| ALP, U/L | 81.0 [62.0–95.0] | 70.0 [63.0–97.0] | 0.218 |
| AST, U/L | 282.0 [127.0–430.0] | 211.0 [157.0–573.0] | 0.239 |
| TBil, mg/dL | 0.7 [0.5–1.0] | 0.7 [0.5–1.0] | 0.040 |
| LDH, U/L | 598.0 [441.0–598.0] | 1029.0 [710.8–1029.0] | <0.001 |
| Alb, g/dL | 3.4 [3.4–3.7] | 2.9 [2.7–3.4] | <0.001 |
| K, mEq/L | 4.4 [4.2–4.8] | 4.5 [4.0–5.2] | 0.955 |
| Ca, mg/dL | 9.0 [8.6–9.4] | 8.4 [7.7–9.0] | <0.001 |
| Na, mEq/L | 143.0 [141.0–146.0] | 142.0 [138.0–145.0] | <0.001 |
| Cl, mEq/L | 109.0 [106.0–112.0] | 108.0 [104.0–112.0] | 0.012 |
| PT, seconds | 13.5 [13.0–13.7] | 14.1 [13.5–15.2] | <0.001 |
| PTT, seconds | 30.3 [28.5–30.4] | 33.5 [30.3–33.5] | <0.001 |
| WBC, 10⁹/L | 10.5 [7.5–13.0] | 12.3 [8.9–16.4] | <0.001 |
| PLT, 10⁹/L | 191.0 [155.0–241.0] | 174.0 [131.0–229.0] | <0.001 |
| Variable | Non-severe (n = 905) | Severe (n = 524) | p value |
| Severity | | | |
| Age, year | 51.0 [35.0–66.0] | 57.0 [45.0–69.0] | <0.001 |
| Height, cm | 172.7 [167.6–175.3] | 172.7 [167.6–177.8] | 0.246 |
| Weight, kg | 79.8 [72.5–86.0] | 79.8 [72.5–97.6] | 0.003 |
| BMI, kg/m² | 26.8 [25.0–29.3] | 26.8 [24.2–32.3] | 0.027 |
| Male, n (%) | 628 (69.4) | 361 (68.9) | 0.859 |
| CK, U/L | 7529.0 [2855.0–15614.0] | 6105.0 [2896.8–17915.0] | 0.800 |
| SCr, mg/dL | 1.1 [0.9–1.8] | 3.1 [1.8–5.3] | <0.001 |
| BUN, mg/dL | 21.0 [13.0–38.0] | 47.0 [30.0–75.2] | <0.001 |
| ALT, U/L | 111.0 [59.0–173.0] | 85.5 [67.8–341.8] | 0.093 |
| ALP, U/L | 78.0 [63.0–93.0] | 70.0 [64.8–101.2] | 0.841 |
| AST, U/L | 255.0 [142.0–431.0] | 211.0 [162.8–870.0] | 0.005 |
| TBil, mg/dL | 0.7 [0.5–1.0] | 0.7 [0.6–1.1] | 0.010 |
| LDH, U/L | 598.0 [559.0–1029.0] | 1029.0 [1029.0–1029.0] | <0.001 |
| Alb, g/dL | 3.4 [2.9–3.5] | 2.8 [2.5–3.2] | <0.001 |
| K, mEq/L | 4.4 [4.1–4.9] | 4.6 [4.1–5.5] | <0.001 |
| Ca, mg/dL | 8.9 [8.4–9.3] | 8.0 [7.5–8.8] | <0.001 |
| Na, mEq/L | 142.0 [140.0–145.0] | 141.0 [138.0–145.0] | <0.001 |
| Cl, mEq/L | 108.0 [105.0–111.0] | 109.0 [103.0–113.0] | 0.963 |
| PT, seconds | 13.5 [10.9–13.5] | 15.1 [14.3–15.7] | <0.001 |
| PTT, seconds | 30.3 [30.1–33.5] | 33.5 [32.9–33.5] | <0.001 |
| WBC, 10⁹/L | 11.1 [7.7–14.4] | 12.7 [9.5–17.6] | <0.001 |
| PLT, 10⁹/L | 191.0 [151.0–239.0] | 168.5 [120.8–220.2] | <0.001 |
| Variable | Non-RRT (n = 1256) | RRT (n = 173) | p value |
| RRT | | | |
| Age, year | 54.0 [38.8–68.2] | 49.0 [38.0–61.0] | 0.011 |
| Height, cm | 172.7 [167.6–177.8] | 172.7 [172.7–175.3] | 0.089 |
| Weight, kg | 79.8 [71.3–90.3] | 79.8 [79.0–93.9] | 0.013 |
| BMI, kg/m² | 26.8 [24.4–30.1] | 26.8 [26.4–30.4] | 0.114 |
| Male, n (%) | 859 (68.4) | 130 (75.1) | 0.079 |
| CK, U/L | 6510.0 [2732.2–14258.0] | 10604.0 [4991.0–50985.0] | <0.001 |
| SCr, mg/dL | 1.3 [0.9–2.4] | 6.0 [3.3–7.9] | <0.001 |
| BUN, mg/dL | 26.0 [15.0–47.0] | 65.0 [38.0–101.0] | <0.001 |
| ALT, U/L | 87.5 [57.0–165.0] | 380.0 [124.0–1241.0] | <0.001 |
| ALP, U/L | 71.0 [62.0–89.0] | 115.0 [74.0–174.0] | <0.001 |
| AST, U/L | 211.0 [137.0–384.2] | 1057.0 [319.0–3311.0] | <0.001 |
| TBil, mg/dL | 0.7 [0.5–1.0] | 1.0 [0.6–2.5] | <0.001 |
| LDH, U/L | 955.5 [598.0–1029.0] | 1029.0 [1029.0–1762.0] | <0.001 |
| Alb, g/dL | 3.3 [2.8–3.4] | 2.9 [2.6–3.3] | <0.001 |
| K, mEq/L | 4.4 [4.0–4.9] | 5.5 [4.6–6.4] | <0.001 |
| Ca, mg/dL | 8.6 [8.0–9.1] | 8.9 [7.7–9.6] | 0.066 |
| Na, mEq/L | 142.0 [139.0–145.0] | 142.0 [139.0–145.0] | 0.917 |
| Cl, mEq/L | 108.0 [105.0–112.0] | 108.0 [102.0–112.0] | 0.045 |
| PT, seconds | 12.1 [11.3–13.5] | 13.8 [13.3–15.4] | 0.006 |
| PTT, seconds | 32.1 [30.3–33.5] | 33.5 [30.3–43.0] | <0.001 |
| WBC, 10⁹/L | 11.2 [8.1–15.0] | 13.2 [9.7–18.1] | <0.001 |
| PLT, 10⁹/L | 190.0 [142.0–236.0] | 153.0 [116.0–207.0] | <0.001 |
| Variable | Survival (n = 1289) | Mortality (n = 140) | p value |
| Mortality | | | |
| Age, year | 52.0 [37.0–66.0] | 65.5 [54.0–79.0] | <0.001 |
| Height, cm | 172.7 [167.6–177.8] | 172.7 [170.1–172.7] | 0.572 |
| Weight, kg | 79.8 [72.5–90.9] | 79.8 [75.9–83.1] | 0.419 |
| BMI, kg/m² | 26.8 [24.7–30.4] | 26.8 [24.8–27.1] | 0.128 |
| Male, n (%) | 891 (69.1) | 96 (71.6) | 0.623 |
| CK, U/L | 7275.0 [2874.0–16661.0] | 5148.0 [2823.2–11615.8] | 0.106 |
| SCr, mg/dL | 1.4 [0.9–3.0] | 2.5 [1.8–4.0] | <0.001 |
| BUN, mg/dL | 28.0 [16.0–50.0] | 45.0 [29.0–65.8] | <0.001 |
| ALT, U/L | 101.0 [60.0–194.0] | 93.5 [67.2–470.5] | 0.058 |
| ALP, U/L | 74.0 [63.0–93.0] | 82.0 [69.0–142.5] | <0.001 |
| AST, U/L | 233.0 [145.0–505.0] | 282.0 [168.5–1293.2] | 0.006 |
| TBil, mg/dL | 0.7 [0.5–1.0] | 0.8 [0.6–2.0] | <0.001 |
| LDH, U/L | 1029.0 [598.0–1029.0] | 1029.0 [598.0–1029.0] | <0.001 |
| Alb, g/dL | 3.3 [2.8–3.4] | 2.8 [2.4–3.4] | <0.001 |
| K, mEq/L | 4.4 [4.1–5.0] | 4.9 [4.3–6.0] | <0.001 |
| Ca, mg/dL | 8.6 [8.0–9.2] | 8.5 [7.6–9.2] | 0.129 |
| Na, mEq/L | 142.0 [139.0–145.0] | 144.0 [140.0–149.0] | <0.001 |
| Cl, mEq/L | 108.0 [105.0–112.0] | 111.0 [106.0–116.0] | <0.001 |
| PT, seconds | 12.5 [11.3–13.5] | 15.6 [15.2–17.2] | 0.007 |
| PTT, seconds | 32.4 [30.3–33.5] | 33.5 [29.8–43.0] | <0.001 |
| WBC, 10⁹/L | 11.2 [8.1–15.0] | 14.4 [11.1–18.9] | <0.001 |
| PLT, 10⁹/L | 188.0 [142.0–234.0] | 164.5 [112.5–237.2] | 0.015 |
Comparison of patient characteristics stratified by four clinical outcomes: AKI, disease severity, RRT requirement, and mortality.
AKI: acute kidney injury; RRT: renal replacement therapy.
A total of 2540 patients with RM were initially identified. After excluding those aged <18 years or with peak CK levels <1000 U/L, 2023 patients remained. Further exclusions included 48 patients with hospital stays <24 h, 217 with severe comorbidities, and 329 with unknown outcomes, resulting in a final derivation cohort of 1429 patients (828 from MIMIC-IV and 601 from eICU-CRD).
The external validation cohort comprised 474 patients with RM from the BTMH database. After applying the same exclusion criteria, 362 patients were included. Among them, 27.9% developed AKI, 25.7% were classified as having severe disease, 27.3% required RRT, and 4.1% experienced in-hospital mortality. Details of the study design are displayed in Fig. 1.
Fig. 1.
Flow chart of the study design. RM: rhabdomyolysis; CK: creatine kinase; AMI: acute myocardial infarction; CHF: congestive heart failure; CKD: chronic kidney disease; CRF: chronic renal failure; CLF: chronic liver failure; DM: dermatomyositis; PM: polymyositis; ML: machine learning; SHAP: SHapley Additive exPlanation.
Model development and performance comparison
Using data collected within the first 24 h of admission, ten ML models were trained and evaluated to predict four clinical outcomes in patients with RM: AKI, disease severity, RRT requirement, and mortality. For AKI prediction, the RF model achieved the best performance (AUC = 0.922, 95% confidence interval [CI]: 0.884–0.950), followed by GBM (AUC = 0.917, 95% CI: 0.879–0.943). For disease severity, RF again performed best (AUC = 0.920, 95% CI: 0.882–0.949), followed by GBM (AUC = 0.918, 95% CI: 0.880–0.945). For RRT prediction, RF remained top-performing (AUC = 0.901, 95% CI: 0.862–0.935), with ET slightly lower (AUC = 0.891, 95% CI: 0.855–0.927). For mortality prediction, XGBoost yielded the highest AUC (0.869, 95% CI: 0.823–0.905), followed by AdaBoost (AUC = 0.861, 95% CI: 0.820–0.902).
A comprehensive comparison of all ten models is provided in Supplementary Table S1. ROC curves for the top five models across all prediction tasks are shown in Fig. 2, and SHAP summary plots of the 18 features are presented in Supplementary Fig. S2. During the SHAP-guided feature elimination process, both the random forest (RF) and XGBoost models consistently demonstrated strong performance, as shown in Supplementary Fig. S4. Performance metrics of the best-performing models with different numbers of features across tasks are summarized in Supplementary Table S2. The optimal cutoff for each task was determined by maximizing the Youden index, based on which sensitivity, specificity, PPV, NPV, accuracy, and F1 score were calculated.
Fig. 2.
Performance of ML models for prediction of clinical outcomes. ROC curves comparing the performance of ten ML models across four prediction tasks: (A) AKI prediction, (B) Disease severity classification, (C) RRT requirement prediction, and (D) mortality prediction. ROC: receiver operating characteristic; AUC: area under the curve; ML: machine learning; AKI: acute kidney injury; RRT: renal replacement therapy; RF: random forest; GBM: gradient boosting machine; ET: extra trees; XGBoost: extreme gradient boosting; SVM: support vector machine; LR: logistic regression; AdaBoost: adaptive boosting.
Final model selection
The final models were selected based on SHAP-guided feature reduction performance, as illustrated in Fig. 3 and Supplementary Fig. S4. In each task, the full 18-feature model significantly outperformed the simplified 2-feature version (ΔAUC = 0.067–0.139; all p < 0.05), but showed no statistically significant advantage over the 5-feature model (ΔAUC = 0.008–0.046; all p > 0.1). Therefore, the 5-feature models were selected as the final versions for further analysis. The selected features were as follows: PT, LDH, SCr, PTT, and Alb for AKI prediction; PT, SCr, LDH, Ca, and K for disease severity classification; SCr, AST, K, ALP, and LDH for RRT prediction; and Age, Cl, SCr, K, and PTT for mortality prediction.
Fig. 3.
Comparison of AUCs for the final model across different numbers of features in each prediction task. ROC curves of the best-performing ML model for each clinical prediction task, evaluated with varying numbers of SHAP-selected features. (A) AKI prediction using the RF model. (B) Disease severity classification using the RF model. (C) RRT requirement prediction using the RF model. (D) Mortality prediction using the XGBoost model. AUC: area under the receiver operating characteristic curve; ROC: receiver operating characteristic; SHAP: SHapley Additive exPlanations; RF: random forest; XGBoost: extreme gradient boosting; AKI: acute kidney injury; RRT: renal replacement therapy.
External validation results
In the external validation cohort, the final models demonstrated consistently strong performance across all prediction tasks, with AUCs (95% CIs) of 0.906 (0.871–0.934) for AKI, 0.856 (0.815–0.890) for disease severity, 0.852 (0.811–0.887) for RRT requirement, and 0.832 (0.789–0.869) for in-hospital mortality. These findings support the generalizability and potential clinical utility of the models in independent populations. Detailed results are provided in Supplementary Table S4 and Supplementary Fig. S5.
Model interpretation
To enhance model interpretability, we employed the SHAP method to quantify the contribution of each input feature to the model's predictions. SHAP enables both global and local interpretability: global explanations summarize the average impact of each feature on the overall model output, while local explanations illustrate how specific feature values influence individual predictions.
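Global importance is commonly derived from local SHAP values by averaging their absolute values per feature. The sketch below assumes a matrix of precomputed SHAP values (rows are patients, columns are features); the numbers in the usage example are made up for illustration and are not study results.

```python
from typing import List, Sequence, Tuple


def rank_by_mean_abs_shap(shap_matrix: Sequence[Sequence[float]],
                          feature_names: Sequence[str]) -> List[Tuple[str, float]]:
    """Rank features by mean absolute SHAP value, most important first."""
    n = len(shap_matrix)
    importance = [
        (name, sum(abs(row[j]) for row in shap_matrix) / n)
        for j, name in enumerate(feature_names)
    ]
    return sorted(importance, key=lambda item: item[1], reverse=True)


# Illustrative values only: two patients, two features.
example = rank_by_mean_abs_shap([[0.2, -0.1], [-0.4, 0.1]], ["SCr", "K"])
```

Each row of the matrix is a local explanation for one patient, while the ranking summarizes the whole cohort.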
As shown in Fig. 4, SHAP summary plots rank features based on their mean SHAP values across all prediction tasks. SHAP dependence plots further illustrate how variations in specific feature values affect model outcomes, with the relationships between actual feature values and SHAP contributions presented in Supplementary Fig. S3. A positive SHAP value indicates that the feature value pushes the prediction toward the positive class (e.g., in mortality prediction, patients aged ≥65 years often exhibit positive SHAP values, shifting the prediction toward the “death” class).
Fig. 4.
Global interpretability of the final models using SHAP. SHAP summary plots for the final five-feature models in each clinical prediction task. Each point represents an individual patient's SHAP value for a specific feature. Color denotes the feature value (red = high, blue = low). Features are ranked by mean absolute SHAP value. (A) AKI prediction. (B) Disease severity classification. (C) RRT requirement prediction. (D) In-hospital mortality prediction. The plots illustrate the relative contribution and directional impact of each input variable on model output across the cohort. SHAP: SHapley Additive exPlanations; AKI: acute kidney injury; RRT: renal replacement therapy; SCr: serum creatinine; PT: prothrombin time; LDH: lactate dehydrogenase; Alb: albumin; PTT: activated partial thromboplastin time; Ca: calcium; K: potassium; AST: aspartate aminotransferase; ALP: alkaline phosphatase; Cl: chloride; Age: patient age.
Furthermore, local interpretability allows for a clearer understanding of how the model arrives at predictions for individual patients. Fig. 5 presents SHAP waterfall plots for four representative cases: panels A, C, E, and G illustrate the predicted probabilities of AKI, severe disease, RRT requirement, and in-hospital mortality, respectively, while panels B, D, F, and H show the corresponding counterfactual predictions, including non-AKI, non-severe disease, no RRT requirement, and survival.
Fig. 5.
SHAP-based local explanation for individual predictions. SHAP waterfall plots demonstrating the contribution of each input feature to model predictions for four representative patients. Panels A, C, E, and G show positive outcome predictions for AKI, disease severity, RRT requirement, and in-hospital mortality, respectively. Panels B, D, F, and H display the corresponding counterfactual predictions for non-AKI, non-severe disease, no RRT requirement, and survival. Red bars represent features that increase the predicted probability, while blue bars represent features that decrease it. The length of each bar indicates the magnitude of contribution to the final model output for an individual case. SHAP: SHapley Additive exPlanations; AKI: acute kidney injury; RRT: renal replacement therapy.
In Fig. 5, f(x) represents the model's final predicted probability for a given patient (i.e., the likelihood of experiencing a specific outcome such as AKI, RRT requirement, or in-hospital mortality), ranging from 0 to 1. E[f(x)] is the model's “baseline prediction”: the expected model output before any patient-specific feature values are taken into account. The final prediction is the sum of this baseline value and the SHAP contributions of all input features, up to rounding in the displayed values. For example, according to the predictive model, Fig. 5E shows that the patient has an 84% probability of requiring RRT, whereas the counterfactual scenario in Fig. 5F indicates a 16% probability of not requiring RRT. As illustrated in Fig. 5E and F, the patient's values for SCr, ALP, AST, K, and LDH pushed the prediction toward the “RRT required” class, with SCr exerting the strongest positive influence. This finding is consistent with Fig. 4C, where SCr ranks as the most important predictor in the RRT prediction task.
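The additive relationship between the baseline, the per-feature contributions, and the final prediction can be checked on a small example. The sketch below computes exact Shapley values by brute force for a hypothetical three-feature logistic model; the model, its coefficients, the background rows, and the patient values are all illustrative assumptions, not the study's random forest or its data. Practical SHAP implementations use faster algorithms, but they satisfy the same additive property.

```python
from itertools import combinations
from math import exp, factorial, isclose


def toy_risk_model(features):
    """Hypothetical logistic risk model over (SCr, K, LDH); not the paper's model."""
    scr, k, ldh = features
    score = -4.0 + 0.9 * scr + 0.4 * k + 0.001 * ldh
    return 1.0 / (1.0 + exp(-score))


def coalition_value(model, x, background, subset):
    """Mean prediction when features in `subset` take the patient's values
    and all other features take the values of each background row."""
    total = 0.0
    for row in background:
        mixed = [x[j] if j in subset else row[j] for j in range(len(x))]
        total += model(mixed)
    return total / len(background)


def exact_shapley_values(model, x, background):
    """Brute-force Shapley values: weighted marginal contributions over all coalitions."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = coalition_value(model, x, background, set(subset) | {i})
                without_i = coalition_value(model, x, background, set(subset))
                phi[i] += weight * (with_i - without_i)
    return phi


# Hypothetical background cohort and patient (SCr mg/dL, K mmol/L, LDH U/L).
background = [
    [0.8, 4.0, 200.0],
    [1.1, 4.3, 350.0],
    [2.5, 5.1, 900.0],
    [0.9, 3.8, 250.0],
]
patient = [4.2, 5.8, 1500.0]

baseline = coalition_value(toy_risk_model, patient, background, set())  # E[f(x)]
prediction = toy_risk_model(patient)                                      # f(x)
phi = exact_shapley_values(toy_risk_model, patient, background)

# Additivity: baseline plus all contributions reproduces the prediction.
assert isclose(baseline + sum(phi), prediction, abs_tol=1e-9)
print(baseline, phi, prediction)
```

Each entry of `phi` corresponds to one bar in a waterfall plot: starting from `baseline`, adding the bars in turn arrives at `prediction`.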
Facilitating clinical application
To promote clinical implementation, we developed an intelligent decision support system powered by the final models, available in both web-based and Android mobile application formats, as illustrated in Fig. 6. The web version, designed for desktop use, includes modules for diagnostic prediction, medical history, patient record management, and user administration. The Android version, optimized for mobile platforms, includes all functionalities except user administration. Each user is assigned a unique account to ensure data isolation and privacy. No information is shared across accounts, preventing unauthorized access or cross-user data leakage.
Fig. 6.
Overview of the intelligent decision support system's interface and core functions. User interfaces of the intelligent decision support system designed to support clinical implementation. (A) The web-based version is intended for desktop use. (B) The Android-based mobile application is designed for smartphones and tablets. (C) Display of diagnostic results.
Discussion
To the best of our knowledge, this is the first retrospective multicenter study to evaluate ten ML models for multi-task prediction in patients with RM, incorporating model interpretability analysis and the development of clinically deployable decision support tools.
RM is a potentially life-threatening condition triggered by a variety of causes, either independently or in conjunction with acute systemic illnesses. In our derivation cohort, the incidence of AKI was 69.4%, the requirement for RRT was 12.1%, and in-hospital mortality reached 9.8%. Compared with the 50.0% AKI incidence reported by McMahon et al.,44 our cohort exhibited a higher AKI rate.45 However, this finding is consistent with previous studies conducted in ICU populations with more severe illness. For instance, Liu et al. reported a 70% AKI incidence using MIMIC-IV data,46 while Wen et al. observed a rate of 64.0% in the MIMIC-III cohort.9 These comparisons suggest that the AKI burden in our study is reflective of the critical care setting and underscores the clinical relevance of our model for application in high-acuity patient populations.
Previous studies have primarily focused on predicting single RM-related outcomes such as AKI, RRT use, or mortality. However, due to the complex etiology and high inter-individual variability of RM, single biomarkers often fail to achieve adequate predictive accuracy. For example, CK alone is an unreliable marker of renal injury—mild elevation without renal dysfunction, termed hyperCKemia, is common.47,48 Although myoglobin levels correlate with renal damage, their short half-life limits diagnostic value beyond the acute phase.49
ML offers a powerful computational approach for capturing nonlinear relationships and high-dimensional interactions in clinical data. Compared with traditional statistical models, ML can learn patterns directly from large-scale electronic health records (EHRs), enhancing predictive accuracy and generalizability.50,51 Nevertheless, a persistent challenge in ML-based modeling is the lack of standardized feature selection strategies. Including too many variables can lead to overfitting, multicollinearity, and reduced interpretability, whereas using too few may compromise model performance. To balance accuracy and interpretability, we applied SHAP to guide feature selection, enabling the development of streamlined, transparent models suitable for clinical implementation.52 Interestingly, CK, a well-recognized biomarker in RM, did not appear among the top five SHAP-derived predictors for AKI in our model. This does not suggest that CK lacks clinical significance; rather, it may reflect its limited early predictive value for AKI in our dataset, or that its signal was partially captured by other features such as SCr or PTT. Notably, PTT has also been identified in prior studies as a strong predictor of AKI risk in RM populations. These findings underscore the utility of interpretable ML approaches in uncovering latent but clinically meaningful patterns and enhancing the alignment between model outputs and clinical reasoning.
RM is primarily characterized by skeletal muscle cell injury, which leads to the massive release of intracellular components such as myoglobin and CK into the circulation. Due to its complex etiology, highly heterogeneous clinical presentations, and rapid disease progression, a single biomarker or clinical variable is often insufficient to accurately capture disease trajectory or comprehensively predict patient prognosis.44 For example, even the widely used CK level has shown inconsistent associations with the risk of AKI or mortality in previous studies.53 In comparative analyses, our ML-based models consistently outperformed conventional single biomarkers such as CK, SCr, and LDH across all prediction tasks. These findings support the application of multi-feature ML models for RM risk stratification, particularly in clinically complex patient populations.
Large-scale, multi-task ML studies focusing specifically on RM remain rare. In this study, we utilized data from MIMIC-IV (v2.0), eICU-CRD (v2.0), and the BTMH database, comprising 1791 patients for model development and validation. While previous work has applied ML to single-outcome prediction,46,54 our multi-task model simultaneously assessed AKI, RRT, in-hospital mortality, and disease severity. This integrative design allows for holistic patient evaluation and supports individualized, timely decision-making. These outcomes were modeled independently, as they represent distinct clinical endpoints at different stages of RM progression, enabling the system to provide layered risk warnings across the patient care continuum.
Our final models demonstrated consistently strong performance across both internal and external validation cohorts, with AUCs (95% CIs) of 0.914 (0.875–0.944)/0.906 (0.871–0.934) for AKI, 0.909 (0.869–0.940)/0.856 (0.815–0.890) for disease severity, 0.888 (0.844–0.921)/0.852 (0.811–0.887) for RRT, and 0.823 (0.773–0.865)/0.832 (0.789–0.869) for mortality. To improve interpretability and support clinical adoption, we implemented SHAP to visualize feature contributions and clarify model decision mechanisms. By providing both global explanations (feature importance rankings) and local explanations (individual prediction drivers), SHAP addresses the common “black box” concern associated with ML models and enhances clinical understanding.55 This dual-level interpretability may improve clinician trust and promote broader integration of ML tools into real-world practice.56,57 These results collectively confirm the robustness, transparency, and translational potential of our approach.
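For readers who wish to reproduce this kind of summary, the sketch below computes an AUC and a 95% confidence interval on toy data. The AUC is calculated from its rank interpretation (the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case). The percentile bootstrap used for the interval is an assumption made for illustration, since this section does not state how the reported CIs were obtained; the labels, scores, and function names are likewise hypothetical.

```python
import random


def auc_score(labels, scores):
    """AUC as the fraction of positive-negative pairs ranked correctly (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def bootstrap_auc_ci(labels, scores, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC; resamples with a single class are redrawn."""
    rng = random.Random(seed)
    n = len(labels)
    estimates = []
    while len(estimates) < n_resamples:
        idx = [rng.randrange(n) for _ in range(n)]
        sample_labels = [labels[i] for i in idx]
        if len(set(sample_labels)) < 2:
            continue
        estimates.append(auc_score(sample_labels, [scores[i] for i in idx]))
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical outcome labels (1 = event) and predicted probabilities.
labels = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.92, 0.85, 0.40, 0.35, 0.77, 0.55, 0.20, 0.66, 0.10, 0.45]

auc = auc_score(labels, scores)
low, high = bootstrap_auc_ci(labels, scores)
print(f"AUC = {auc:.3f} (95% CI {low:.3f}-{high:.3f})")
```

With only ten cases the interval is very wide. Cohorts of several hundred patients, as in the validation sets here, yield much narrower intervals.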
In contrast with earlier tools that were limited in scalability or usability,58,59 we developed web- and Android-based intelligent decision support systems featuring intuitive interfaces and modular, clinically relevant functionalities. Pilot testing is currently underway at multiple partner hospitals to evaluate usability, integration performance, and real-world clinical utility. Interested readers may request access by contacting the corresponding author. However, as the development dataset predominantly consisted of critically ill ICU patients from Western populations, this may present a substantial barrier to broader clinical implementation. Future research should incorporate more heterogeneous data sources to achieve a more balanced representation of disease severity and to further improve the model's external generalizability.60
Notably, recent recommendations from the European Neuromuscular Centre (ENMC) suggest that CK thresholds for diagnosing RM should be set at 5000 U/L for exertional cases and 10,000 U/L for non-exertional RM. In our study, we adopted a lower threshold of CK ≥1000 U/L for two reasons: (1) alignment with thresholds used in several prior large-scale retrospective studies; and (2) the practical limitations of EMR-based data, in which a substantially higher threshold would have reduced the sample size, thereby compromising model generalizability and the feasibility of external validation. In future studies, higher CK cut-offs may be explored when data availability permits.61,62 The ENMC recommendations also support the use of the McMahon risk score to predict the likelihood of AKI and in-hospital mortality. However, because several key parameters required for this score were unavailable in our current dataset, we plan to broaden the scope of data collection in future studies and evaluate the comparative performance of the McMahon score and our machine learning–based prediction models. Although CK >5000 U/L is commonly regarded as a strong indicator of RM-induced AKI, approximately 40% of AKI patients in our cohort had CK levels below this threshold. This suggests that, in critically ill populations, AKI may result from multifactorial causes such as infection, hypotension, or nephrotoxic medications. These findings further underscore the importance of expanding future data collection to include non-ICU patients.
This study has several limitations. First, although the models were developed and validated using retrospective multicenter datasets, prospective studies are necessary to confirm their robustness and clinical applicability prior to broader deployment. Pilot testing is currently in progress to refine model performance and assess its utility in real-world clinical settings. Second, the application of strict inclusion and exclusion criteria—based on ICD codes and creatine kinase (CK) thresholds, and the exclusion of patients with severe comorbidities—was intended to reduce potential confounding. However, this approach may limit the generalizability of our findings, particularly to non-ICU populations, as the training cohort primarily consisted of critically ill patients. Third, while the training data included individuals from diverse ethnic backgrounds, external validation was primarily performed in an Asian population, which may limit the model's applicability to other demographic settings. Fourth, no consensus currently exists regarding the minimum sample size required for developing machine learning–based clinical prediction models. Although our models demonstrated satisfactory performance in both internal and external validations, future research should further explore how sample size may influence model stability and reliability. Fifth, while external validation yielded favorable results, incorporating additional multicenter data would further enhance model robustness and improve its generalizability. Sixth, we did not stratify patients based on the etiology of rhabdomyolysis (RM). Given the substantial heterogeneity in RM pathogenesis, evaluating model performance across etiological subgroups warrants further investigation.
Future work will focus on (1) expanding data collection to include a broader range of clinical variables and more representative populations—including non-ICU and Asian cohorts—to enhance generalizability; (2) integrating the McMahon risk score into our analysis and comparing its predictive performance against machine learning–based models; and (3) exploring additional advanced algorithms, such as LightGBM, to further optimize predictive accuracy and model interpretability.
Contributors
CLL, JShi, and FJW contributed equally to this work. CLL, JShi, FJW, and QL conceived and designed the study. DL, YL, BFY, YLZ, DWY, HJ, JSong, and LZ were responsible for data collection and curation. CLL and FJW conducted the data analysis. CLL, JShi, and FJW drafted the initial manuscript. XQG, HJF, and QL supervised the study and provided critical revisions. CLL, JShi, FJW, and QL had full access to and verified the data. All authors reviewed and approved the final version of the manuscript.
Data sharing statement
The MIMIC-IV (v2.0) and eICU-CRD (v2.0) datasets used in this study are publicly available via the PhysioNet platform (https://physionet.org/), subject to completion of the required data use agreement and associated training. The BTMH dataset, derived from four tertiary hospitals in China, is not publicly available due to institutional and patient privacy considerations, but may be made available to qualified researchers upon reasonable request to the corresponding authors.
Declaration of interests
The authors declare that they have no competing interests.
Acknowledgements
This study was supported by the National Key Research and Development Program of China (Nos. 2021YFC3002202 and 2023YFF1204104). The funders had no role in the study design, data collection, data analysis, manuscript preparation, or the decision to publish.
Footnotes
Supplementary data related to this article can be found at https://doi.org/10.1016/j.eclinm.2025.103438.
Contributor Information
Xiaoqin Guo, Email: guoxiaoqinlet@tju.edu.cn.
Haojun Fan, Email: fanhj@tju.edu.cn.
Qi Lv, Email: lvqi@tju.edu.cn.
Appendix A. Supplementary data
References
- 1. Bosch X., Poch E., Grau J.M. Rhabdomyolysis and acute kidney injury. N Engl J Med. 2009;361(1):62–72. doi: 10.1056/NEJMra0801327.
- 2. Petejova N., Martinek A. Acute kidney injury due to rhabdomyolysis and renal replacement therapy: a critical review. Crit Care. 2014;18(3):224. doi: 10.1186/cc13897.
- 3. Lim A.K.H., Azraai M., Pham J.H., Looi W.F., Bennett C. The association between illicit drug use and the duration of renal replacement therapy in patients with acute kidney injury from severe rhabdomyolysis. Front Med (Lausanne) 2020;7. doi: 10.3389/fmed.2020.588114.
- 4. Cabral B.M.I., Edding S.N., Portocarrero J.P., Lerma E.V. Rhabdomyolysis. Dis Mon. 2020;66(8). doi: 10.1016/j.disamonth.2020.101015.
- 5. Zutt R., van der Kooi A.J., Linthorst G.E., Wanders R.J.A., de Visser M. Rhabdomyolysis: review of the literature. Neuromuscul Disord. 2014;24(8):651–659. doi: 10.1016/j.nmd.2014.05.005.
- 6. McKenna M.C., Kelly M., Boran G., Lavin P. Spectrum of rhabdomyolysis in an acute hospital. Ir J Med Sci. 2019;188(4):1423–1426. doi: 10.1007/s11845-019-01968-y.
- 7. Chavez L.O., Leon M., Einav S., Varon J. Beyond muscle destruction: a systematic review of rhabdomyolysis for clinical practice. Crit Care. 2016;20(1):135. doi: 10.1186/s13054-016-1314-5.
- 8. Cervellin G., Comelli I., Lippi G. Rhabdomyolysis: historical background, clinical, diagnostic and therapeutic features. Clin Chem Lab Med. 2010;48(6):749–756. doi: 10.1515/CCLM.2010.151.
- 9. Wen T., Mao Z., Liu C., Wang X., Tian S., Zhou F. Association between admission serum phosphate and risk of acute kidney injury in critically ill patients with rhabdomyolysis: a retrospective study based on MIMIC-III. Injury. 2023;54(1):189–197. doi: 10.1016/j.injury.2022.10.024.
- 10. Vanholder R., Sever M. Risk factors: predicting prognosis in patients with rhabdomyolysis. Nat Rev Nephrol. 2013;9(11):637–638. doi: 10.1038/nrneph.2013.207.
- 11. Simpson J.P., Taylor A., Sudhan N., Menon D.K., Lavinio A. Rhabdomyolysis and acute kidney injury: creatine kinase as a prognostic marker and validation of the McMahon Score in a 10-year cohort: a retrospective observational evaluation. Eur J Anaesthesiol. 2016;33(12):906–912. doi: 10.1097/EJA.0000000000000490.
- 12. El-Abdellati E., Eyselbergs M., Sirimsi H., et al. An observational study on rhabdomyolysis in the intensive care unit. Exploring its risk factors and main complication: acute kidney injury. Ann Intensive Care. 2013;3(1):8. doi: 10.1186/2110-5820-3-8.
- 13. Rajkomar A., Dean J., Kohane I. Machine learning in medicine. N Engl J Med. 2019;380(14):1347–1358. doi: 10.1056/NEJMra1814259.
- 14. Miotto R., Wang F., Wang S., Jiang X., Dudley J.T. Deep learning for healthcare: review, opportunities and challenges. Brief Bioinform. 2018;19(6):1236–1246. doi: 10.1093/bib/bbx044.
- 15. Haug C.J., Drazen J.M. Artificial intelligence and machine learning in clinical medicine, 2023. N Engl J Med. 2023;388(13):1201–1208. doi: 10.1056/NEJMra2302038.
- 16. Tomašev N., Glorot X., Rae J.W., et al. A clinically applicable approach to continuous prediction of future acute kidney injury. Nature. 2019;572(7767):116–119. doi: 10.1038/s41586-019-1390-1.
- 17. Flechet M., Güiza F., Schetz M., et al. AKIpredictor, an online prognostic calculator for acute kidney injury in adult critically ill patients: development, validation and comparison to serum neutrophil gelatinase-associated lipocalin. Intensive Care Med. 2017;43(6):764–773. doi: 10.1007/s00134-017-4678-3.
- 18. Wiens J., Shenoy E.S. Machine learning for healthcare: on the verge of a major shift in healthcare epidemiology. Clin Infect Dis. 2018;66(1):149–153. doi: 10.1093/cid/cix731.
- 19. Lundberg S.M., Erion G., Chen H., et al. From local explanations to global understanding with explainable AI for trees. Nat Mach Intell. 2020;2(1):56–67. doi: 10.1038/s42256-019-0138-9.
- 20. Huang S., Zhou Y., Liang Y., et al. Machine-learning-derived online prediction models of outcomes for patients with cholelithiasis-induced acute cholangitis: development and validation in two retrospective cohorts. EClinicalMedicine. 2024;76. doi: 10.1016/j.eclinm.2024.102820.
- 21. Zhang Y.P., Zhang X.Y., Cheng Y.T., et al. Artificial intelligence-driven radiomics study in cancer: the role of feature engineering and modeling. Mil Med Res. 2023;10(1):22. doi: 10.1186/s40779-023-00458-8.
- 22. Qi X., Wang S., Fang C., Jia J., Lin L., Yuan T. Machine learning and SHAP value interpretation for predicting comorbidity of cardiovascular disease and cancer with dietary antioxidants. Redox Biol. 2025;79. doi: 10.1016/j.redox.2024.103470.
- 23. Pollard T.J., Johnson A.E.W., Raffa J.D., Celi L.A., Mark R.G., Badawi O. The eICU Collaborative Research Database, a freely available multi-center database for critical care research. Sci Data. 2018;5. doi: 10.1038/sdata.2018.178.
- 24. Johnson A.E., Stone D.J., Celi L.A., Pollard T.J. The MIMIC Code Repository: enabling reproducibility in critical care research. J Am Med Inform Assoc. 2018;25(1):32–39. doi: 10.1093/jamia/ocx084.
- 25. Johnson A.E.W., Bulgarelli L., Shen L., et al. MIMIC-IV, a freely accessible electronic health record dataset. Sci Data. 2023;10(1):1. doi: 10.1038/s41597-022-01899-x.
- 26. Agarwal A.R., Prichett L., Jain A., Srikumaran U. Assessment of use of ICD-9 and ICD-10 codes for social determinants of health in the US, 2011-2021. JAMA Netw Open. 2023;6(5). doi: 10.1001/jamanetworkopen.2023.12538.
- 27. Khwaja A. KDIGO clinical practice guidelines for acute kidney injury. Nephron Clin Pract. 2012;120(4):c179–c184. doi: 10.1159/000339789.
- 28. Jakobsen J.C., Gluud C., Wetterslev J., Winkel P. When and how should multiple imputation be used for handling missing data in randomised clinical trials - a practical guide with flowcharts. BMC Med Res Methodol. 2017;17(1):162. doi: 10.1186/s12874-017-0442-1.
- 29. Dormann C.F., Elith J., Bacher S., et al. Collinearity: a review of methods to deal with it and a simulation study evaluating their performance. Ecography. 2013;36(1):27–46.
- 30. Efthimiou O., Seo M., Chalkou K., Debray T., Egger M., Salanti G. Developing clinical prediction models: a step-by-step guide. BMJ. 2024;386. doi: 10.1136/bmj-2023-078276.
- 31. Vincent J.L., Moreno R., Takala J., et al. The SOFA (Sepsis-related Organ Failure Assessment) score to describe organ dysfunction/failure. On behalf of the Working Group on Sepsis-Related Problems of the European Society of Intensive Care Medicine. Intensive Care Med. 1996;22(7):707–710. doi: 10.1007/BF01709751.
- 32. Kellum J.A., Lameire N. Diagnosis, evaluation, and management of acute kidney injury: a KDIGO summary (Part 1) Crit Care. 2013;17(1):204. doi: 10.1186/cc11454.
- 33. Hoste E.A.J., Bagshaw S.M., Bellomo R., et al. Epidemiology of acute kidney injury in critically ill patients: the multinational AKI-EPI study. Intensive Care Med. 2015;41(8):1411–1423. doi: 10.1007/s00134-015-3934-7.
- 34. Dodek P.M., Wiggs B.R. Logistic regression model to predict outcome after in-hospital cardiac arrest: validation, accuracy, sensitivity and specificity. Resuscitation. 1998;36(3):201–208. doi: 10.1016/s0300-9572(98)00012-4.
- 35. Verplancke T., Van Looy S., Benoit D., et al. Support vector machine versus logistic regression modeling for prediction of hospital mortality in critically ill patients with haematological malignancies. BMC Med Inform Decis Mak. 2008;8:56. doi: 10.1186/1472-6947-8-56.
- 36. Quinlan J.R. Induction of decision trees. Mach Learn. 1986;1:81–106.
- 37. Svetnik V., Liaw A., Tong C., Culberson J.C., Sheridan R.P., Feuston B.P. Random forest: a classification and regression tool for compound classification and QSAR modeling. J Chem Inf Comput Sci. 2003;43(6):1947–1958. doi: 10.1021/ci034160g.
- 38. Cover T., Hart P. Nearest neighbor pattern classification. IEEE Trans Inf Theory. 1967;13(1):21–27.
- 39. Geurts P., Ernst D., Wehenkel L. Extremely randomized trees. Mach Learn. 2006;63:3–42.
- 40. Friedman J.H. Greedy function approximation: a gradient boosting machine. Ann Stat. 2001;29(5):1189–1232.
- 41. Schapire R.E. Empirical Inference: Festschrift in Honor of Vladimir N Vapnik. Springer; 2013. Explaining adaboost; pp. 37–52.
- 42. Chen T., Guestrin C. XGBoost: A Scalable Tree Boosting System. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2016. pp. 785–794.
- 43. Agatonovic-Kustrin S., Beresford R. Basic concepts of artificial neural network (ANN) modeling and its application in pharmaceutical research. J Pharm Biomed Anal. 2000;22(5):717–727. doi: 10.1016/s0731-7085(99)00272-1.
- 44. McMahon G.M., Zeng X., Waikar S.S. A risk prediction score for kidney failure or mortality in rhabdomyolysis. JAMA Intern Med. 2013;173(19):1821–1828. doi: 10.1001/jamainternmed.2013.9774.
- 45. Candela N., Silva S., Georges B., et al. Short- and long-term renal outcomes following severe rhabdomyolysis: a French multicenter retrospective study of 387 patients. Ann Intensive Care. 2020;10(1):27. doi: 10.1186/s13613-020-0645-1.
- 46. Liu C., Liu X., Mao Z., et al. Interpretable machine learning model for early prediction of mortality in ICU patients with rhabdomyolysis. Med Sci Sports Exerc. 2021;53(9):1826–1834. doi: 10.1249/MSS.0000000000002674.
- 47. Morin A.-G., Somme D., Corvol A. Rhabdomyolysis in older adults: outcomes and prognostic factors. BMC Geriatr. 2024;24(1):46. doi: 10.1186/s12877-023-04620-8.
- 48. Han M.H. American Association of Neuropathologists, Inc.; 2009. Adams and Victor's Principles of Neurology.
- 49. Huerta-Alardín A.L., Varon J., Marik P.E. Bench-to-bedside review: rhabdomyolysis -- an overview for clinicians. Crit Care. 2005;9(2):158–169. doi: 10.1186/cc2978.
- 50. Goecks J., Jalili V., Heiser L.M., Gray J.W. How machine learning will transform biomedicine. Cell. 2020;181(1):92–101. doi: 10.1016/j.cell.2020.03.022.
- 51. Rajkomar A., Oren E., Chen K., et al. Scalable and accurate deep learning with electronic health records. NPJ Digit Med. 2018;1:18. doi: 10.1038/s41746-018-0029-1.
- 52. Li Y., Sperrin M., Ashcroft D.M., van Staa T.P. Consistency of variety of machine learning and statistical models in predicting clinical risks of individual patients: longitudinal cohort study using cardiovascular disease as exemplar. BMJ. 2020;371. doi: 10.1136/bmj.m3919.
- 53. Luo Y., Liu C., Li D., et al. Progress in the diagnostic and predictive evaluation of crush syndrome. Diagnostics (Basel) 2023;13(19). doi: 10.3390/diagnostics13193034.
- 54. Liu C., Yuan Q., Mao Z., et al. Development and validation of a model for the early prediction of the RRT requirement in patients with rhabdomyolysis. Am J Emerg Med. 2021;46:38–44. doi: 10.1016/j.ajem.2021.03.006.
- 55. Ribeiro M.T., Singh S., Guestrin C. “Why Should I Trust You?” Explaining the Predictions of Any Classifier. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2016. pp. 1135–1144.
- 56. Lundberg S.M., Lee S.I. A unified approach to interpreting model predictions. Adv Neural Inf Process Syst. 2017;30:4768–4777.
- 57. Molnar C. Lulu.com; 2020. Interpretable Machine Learning.
- 58. Hu J., Xu J., Li M., et al. Identification and validation of an explainable prediction model of acute kidney injury with prognostic implications in critically ill children: a prospective multicenter cohort study. EClinicalMedicine. 2024;68. doi: 10.1016/j.eclinm.2023.102409.
- 59. Lee H., Hwang S.H., Park S., et al. Prediction model for type 2 diabetes mellitus and its association with mortality using machine learning in three independent cohorts from South Korea, Japan, and the UK: a model development and validation study. EClinicalMedicine. 2025;80. doi: 10.1016/j.eclinm.2025.103069.
- 60. Purushotham S., Meng C., Che Z., Liu Y. Benchmarking deep learning models on large healthcare datasets. J Biomed Inform. 2018;83:112–134. doi: 10.1016/j.jbi.2018.04.007.
- 61. Kruijt N., Laforet P., Vissing J., et al. 276th ENMC International Workshop: recommendations on optimal diagnostic pathway and management strategy for patients with acute rhabdomyolysis worldwide. 15th-17th March 2024, Hoofddorp, the Netherlands. Neuromuscul Disord. 2025;50. doi: 10.1016/j.nmd.2025.105344.
- 62. Kenney K., Landau M.E., Gonzalez R.S., Hundertmark J., O'Brien K., Campbell W.W. Serum creatine kinase after exercise: drawing the line between physiological response and exertional rhabdomyolysis. Muscle Nerve. 2012;45(3):356–362. doi: 10.1002/mus.22317.