Visual Abstract
Keywords: AKI, drug nephrotoxicity, nephrotoxicity, prediction modeling
Abstract
Key Points
The developed dynamic model for predicting progression to stage 2 or higher AKI using multicenter data had robust performance.
Clinical decision support implementation of the developed model could help prevent AKI progression.
Kinetic eGFR, nephrotoxic drug burden, and BUN were the top features and remained the same across the models and sites.
Background
AKI is an abrupt decline in kidney function that occurs in about 20% of hospital admissions and may lead to irreversible kidney damage.
Methods
We developed and externally validated deep learning models to dynamically predict progression to stage 2 or higher AKI defined by Kidney Disease Improving Global Outcomes serum creatinine criteria within the next 48 hours. We used an extensive set of predictors including demographics, admission source, comorbidities, medications, laboratory, and vitals measurements.
Results
Our retrospective study included adult noncritical care patients at the University of Pittsburgh Medical Center (UPMC; 2018–2022; n=39,755) and the University of Florida Health (UFH; 2012–2019; n=122,324). In the UFH and UPMC development cohorts, the mean age was 55 and 71 years, and 55% (n=47,350) and 54% (n=15,128) were female, respectively. Stage 2 or higher AKI occurred in 3% (n=3257) and 8% (n=2296) of UFH and UPMC patients, respectively. Area under the receiver operating characteristic curve values with 95% confidence interval (CI) ranged between 0.77 (95% CI, 0.75 to 0.78; UPMC Model, trained on UPMC patients) and 0.81 (95% CI, 0.79 to 0.82; UFH Model, trained on UFH patients) for the UFH test cohort and between 0.79 (95% CI, 0.78 to 0.80; UFH Model) and 0.83 (95% CI, 0.82 to 0.84; UPMC Model) for the UPMC test cohort. The UFH-UPMC Model achieved an area under the receiver operating characteristic curve of 0.81 (95% CI, 0.80 to 0.83) for the UFH test cohort and 0.82 (95% CI, 0.81 to 0.84) for the UPMC test cohort. Kinetic eGFR, nephrotoxic drug burden, and BUN remained the features with the highest influence across the models and institutions.
Conclusions
The model developed using multicenter data had robust performance, suggesting that implementation could help prevent AKI progression.
Introduction
AKI, an abrupt decline in kidney function, occurs in about 20% of acute medical admissions, with the mortality rate among these patients averaging 23%.1,2 Drug-associated AKI accounts for approximately 15%–25% of all AKI events.3–7 Specifically, one third of potential adverse drug events in patients with AKI are due to continuation of a contraindicated nephrotoxin.8 In instances of mild AKI, the patient may achieve full recovery. However, once moderate to severe AKI occurs, the irreversible loss of nephron function may drastically reduce the kidney's lifespan and predispose the patient to CKD and kidney failure.2 Timely intervention can potentially lead to early, sustained recovery of kidney function, which has been associated with a better prognosis than late recovery. Thus, timely identification and management of AKI, with consideration of nephrotoxic burden, should be implemented to prevent the progression of AKI, limit its downstream consequences, and facilitate kidney function recovery.2,9
The use of machine learning techniques with electronic health records (EHRs) has enabled development of computational risk models that can promptly detect patients at risk, leading to better outcomes.10–16 However, existing AKI risk models face limitations, including limited generalizability due to lack of external validation,11,12 use of less representative patient data for model development,15 and a focus on static patient characteristics to predict AKI risk, which prevents them from capturing dynamic changes in risk trajectory throughout patient encounters.13,17 Therefore, it is important to develop and externally validate models that capture dynamic changes, including changes in nephrotoxic burden.
In this work, we aimed to (1) develop and internally and externally validate, across health systems, a machine learning model for continuous risk prediction of AKI stage progression (stage 2 or higher) using EHR data from noncritical care patient populations at two hospitals, and (2) quantify the nephrotoxic burden and investigate its importance for AKI risk prediction.
Methods
Study Design and Participants
The study was approved by the University of Pittsburgh Institutional Review Board (IRB) and Pitt Privacy Office (STUDY20120008) as well as the University of Florida IRB and the University of Florida Privacy Office (IRB 201901123). The University of Pittsburgh Medical Center (UPMC) cohort included adult patients admitted to a non-intensive care unit (non-ICU) at UPMC between July 1, 2018, and December 30, 2022, sampled to enrich the cohort with encounters that progressed to AKI stage 2 or higher during hospitalization (Supplemental Methods). The University of Florida Health (UFH) cohort included all adult non-ICU admissions at UFH between January 1, 2012, and August 22, 2019. The UFH final cohort included 122,324 hospital encounters from 71,693 patients, and the UPMC final cohort included 39,755 patients (Figure 1 and Supplemental Methods). At UFH, we were authorized to include all admissions along with at least a 1-year history for each patient, providing a comprehensive dataset that captures longitudinal patient trajectories. For UPMC patients, by contrast, data use agreement and data sharing restrictions limited us to the most recent admission for each patient along with their medical history.
Figure 1.

Cohort selection and exclusion criteria. (A) UFH cohort. (B) UPMC cohort. ICU, intensive care unit; UFH, University of Florida Health; UPMC, University of Pittsburgh Medical Center.
We adhered to the guidelines given in Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis18 and Leisman et al. under the type 3 analysis category.19 We randomly divided both cohorts into development (70% of observations, n=85,888 encounters for UFH and n=27,808 encounters for UPMC), validation (10%, n=11,879 encounters for UFH and n=3944 encounters for UPMC), calibration (5%, n=5981 encounters for UFH and n=1996 encounters for UPMC), and test (15%, n=18,576 encounters for UFH and n=5958 encounters for UPMC) sets.
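The four-way random partition described above can be sketched as follows. This is a minimal illustration, not the study code: the helper name `split_encounters`, the fixed seed, and the rounding rule are assumptions, so the resulting set sizes will not exactly match the counts reported in the text.

```python
import random

def split_encounters(encounter_ids, fractions=(0.70, 0.10, 0.05, 0.15), seed=0):
    """Randomly partition encounter IDs into development, validation,
    calibration, and test sets (hypothetical helper, not the study code)."""
    ids = list(encounter_ids)
    random.Random(seed).shuffle(ids)
    names = ("development", "validation", "calibration", "test")
    splits, start = {}, 0
    for i, (name, frac) in enumerate(zip(names, fractions)):
        # The last split takes the remainder so every encounter is assigned once.
        end = len(ids) if i == len(names) - 1 else start + round(frac * len(ids))
        splits[name] = ids[start:end]
        start = end
    return splits

# Toy example with 1000 synthetic encounter IDs.
splits = split_encounters(range(1000))
```

When several encounters belong to the same patient, passing patient identifiers instead of encounter identifiers would keep all of a patient's encounters in one set; the text above does not state which unit was shuffled.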
Predictors, Outcomes, Model Development, and Validation
We derived static predictors (demographic and admission features, comorbidities, and preadmission medications and laboratory measurements) and dynamic predictors (collected repeatedly during the hospitalization, including vitals, laboratory measurements, and nephrotoxic burden) for both cohorts using EHR data. We calculated the nephrotoxic drug burden as the weighted sum of unique nephrotoxic drugs administered daily over the preceding 7 days, weighted according to a previously published modified-Delphi study that assessed the nephrotoxic potential of medications used in the nonintensive care setting.20 The outcome was development of stage 2 or higher AKI (moderate to severe AKI), based on Kidney Disease Improving Global Outcomes (KDIGO) serum creatinine (SCr) criteria, within the next 48 hours; it was predicted every 12 hours throughout the hospitalization (Supplemental Figure 1).21 A computable phenotype algorithm, a version of the eKidneyHealth algorithm modified to optimize alert counts, was run to identify CKD status and AKI stage.22 The baseline SCr was determined using preadmission measurements or the estimated SCr (Figure 2). Estimated reference creatinine was computed by back-calculation from the CKD Epidemiology Collaboration equation refit without the race multiplier, as per recommendations, with a baseline eGFR assumption of 75 ml/min per 1.73 m2.22,23 We determined the daily kinetic eGFR (KeGFR) by estimating the creatinine production rate and tracking the percentage change in creatinine levels over time, using creatinine measurements taken at least 12 hours apart.24 The Supplemental Methods provide detailed descriptions for input and outcome variables, and all features are listed in Supplemental Table 1.
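One possible reading of the nephrotoxic drug burden definition is sketched below. The weights, the drug names, and the helper `nephrotoxic_burden` are hypothetical placeholders (the study takes its ratings from the cited modified-Delphi work), and two details are assumptions: a drug is counted once per day it is given, and the 7-day window includes the day of the prediction.

```python
from datetime import date, timedelta

# Hypothetical nephrotoxicity weights for illustration only; they are not
# the ratings from the modified-Delphi study cited in the text.
EXAMPLE_WEIGHTS = {"vancomycin": 2.0, "ibuprofen": 1.0, "gentamicin": 3.0}

def nephrotoxic_burden(administrations, as_of, weights=EXAMPLE_WEIGHTS, window_days=7):
    """Sum the weights of the unique nephrotoxic drugs given on each day
    of the window ending at `as_of` (inclusive).

    administrations: iterable of (date, drug_name) pairs.
    """
    window_start = as_of - timedelta(days=window_days - 1)
    drugs_by_day = {}
    for day, drug in administrations:
        if window_start <= day <= as_of and drug in weights:
            # A set keeps each drug once per day, however many doses were given.
            drugs_by_day.setdefault(day, set()).add(drug)
    return sum(weights[d] for drugs in drugs_by_day.values() for d in drugs)

doses = [
    (date(2020, 1, 1), "vancomycin"),     # outside the window used below
    (date(2020, 1, 5), "vancomycin"),
    (date(2020, 1, 5), "vancomycin"),     # repeat dose on the same day, counted once
    (date(2020, 1, 5), "ibuprofen"),
    (date(2020, 1, 8), "gentamicin"),
    (date(2020, 1, 8), "acetaminophen"),  # not in the weight table, ignored
]
burden = nephrotoxic_burden(doses, as_of=date(2020, 1, 8))  # 2.0 + 1.0 + 3.0
```

An alternative reading would count each drug once over the whole window rather than once per day; the Supplemental Methods would settle which was used.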
Figure 2.
Flow for determining reference creatinine. *Admission creatinine is the first creatinine measured during the encounter. **The minimum of all SCr values measured from 7 days before admission up to the first creatinine on the admission day is calculated. ***Estimated SCr is obtained by back-calculation from the 2021 CKD-EPI equation refit without race. CKD-EPI, CKD Epidemiology Collaboration; SCr, serum creatinine.
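The back-calculation of estimated SCr can be sketched as follows, assuming the published 2021 CKD-EPI creatinine equation (SCr in mg/dl) and the baseline eGFR of 75 ml/min per 1.73 m2 stated in the Methods. The function names are illustrative, and the coefficients should be checked against the original equation before any reuse.

```python
def ckd_epi_2021_egfr(scr_mg_dl, age_years, female):
    """2021 CKD-EPI creatinine equation (race-free), eGFR in ml/min per 1.73 m2."""
    kappa = 0.7 if female else 0.9
    alpha = -0.241 if female else -0.302
    ratio = scr_mg_dl / kappa
    egfr = 142.0 * min(ratio, 1.0) ** alpha * max(ratio, 1.0) ** -1.200 * 0.9938 ** age_years
    return egfr * 1.012 if female else egfr

def estimated_reference_scr(age_years, female, target_egfr=75.0):
    """Invert the equation above for the SCr (mg/dl) that yields `target_egfr`."""
    kappa = 0.7 if female else 0.9
    alpha = -0.241 if female else -0.302
    # eGFR that the equation returns when SCr equals kappa (ratio of 1).
    egfr_at_kappa = 142.0 * 0.9938 ** age_years * (1.012 if female else 1.0)
    r = target_egfr / egfr_at_kappa
    # r >= 1 means SCr/kappa <= 1, so the alpha branch applies; otherwise -1.200.
    exponent = alpha if r >= 1.0 else -1.200
    return kappa * r ** (1.0 / exponent)

# Example: estimated reference SCr for a 60-year-old man.
reference_scr = estimated_reference_scr(60, female=False)
```

Feeding `reference_scr` back into `ckd_epi_2021_egfr` with the same age and sex returns the target eGFR, which is a convenient sanity check on the inversion.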
We trained separate recurrent neural network–based deep learning models on patient populations from UFH (UFH Model), from UPMC (UPMC Model), and a combined population from UFH and UPMC (UFH-UPMC Model). Feature processing, architecture, training, evaluation, calibration, and interpretation of the models are detailed in Supplemental Figure 2 and Supplemental Methods. We compared the models with commonly used predictive models, namely, logistic regression, random forest, and the extreme gradient boosting models (Supplemental Methods and Supplemental Table 2). We evaluated model performance using area under the receiver operating characteristic curve (AUROC), area under the precision-recall curve (AUPRC), sensitivity, and specificity. To describe feature influences on the predicted outputs, we used the Shapley Additive Explanations (SHAP) values.25 We also analyzed the trajectories of daily nephrotoxic drugs and performed an ablation study by removing the nephrotoxic drug burden feature and retraining the model (Supplemental Figures 3 and 4, Supplemental Methods, and Supplemental Tables 3 and 4).
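To make the discrimination metric concrete, the sketch below computes AUROC from its pairwise (Mann-Whitney) definition and attaches a percentile bootstrap interval. The bootstrap is an assumption: the main text does not say how the 95% CIs were obtained, and the resampling unit (prediction versus encounter) is likewise not specified here. The pairwise loop is quadratic and meant only for small illustrative inputs.

```python
import random

def auroc(labels, scores):
    """AUROC as the probability that a random positive outscores a random
    negative, with ties counted as one half (Mann-Whitney formulation)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_ci(labels, scores, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap interval for AUROC (an assumed CI method)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain a single class
            stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int((1 - level) / 2 * n_boot)]
    hi = stats[int((1 + level) / 2 * n_boot) - 1]
    return lo, hi

# Toy data, not study results.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
point = auroc(labels, scores)  # 3 of 4 positive-negative pairs are ordered correctly
low, high = bootstrap_ci(labels, scores, n_boot=200)
```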
Results
Patient Baseline Characteristics and Outcomes
Among 85,888 patient encounters in the UFH development cohort, the mean (SD) age was 55 (19) years, 47,350 (55%) encounters were for women, and 18,625 (22%) were for Black patients (Table 1). In the UPMC development cohort with 27,808 patient encounters, the mean (SD) age was 71 (14) years, 15,128 (54%) were for women, and 2935 (11%) were for Black patients. Notably, the demographics of UPMC encounters differed significantly from those of UFH encounters.
Table 1.
Summary of patient characteristics
| Patient Characteristics | UFH Development Cohort | UPMC Development Cohort | P Values |
|---|---|---|---|
| No. of patients | 50,179 | 27,808 | |
| No. of encounters | 85,888 | 27,808 | |
| Demographics | |||
| Age, yr, mean (SD) | 55 (19) | 71 (14) | <0.001 |
| Female sex, n (%) | 47,350 (55) | 15,128 (54) | 0.03 |
| Race, n (%) | |||
| Black | 18,625 (22) | 2935 (11) | <0.001 |
| Missing | 835 (1) | 214 (1) | <0.001 |
| Otherᵃ | 4839 (6) | 412 (1) | <0.001 |
| White | 61,589 (72) | 24,247 (87) | <0.001 |
| Ethnicity, n (%) | |||
| Hispanic | 3653 (4) | 121 (0) | <0.001 |
| Non-Hispanic | 82,235 (96) | 27,687 (100) | <0.001 |
| Comorbidities | |||
| Charlson Comorbidity Index, median (IQR) | 2 (0–6) | 2 (0–5) | <0.001 |
| Hypertension, n (%) | 15,924 (19) | 8827 (32) | <0.001 |
| Cardiovascular disease, n (%)ᵇ | 29,402 (34) | 10,236 (37) | <0.001 |
| Diabetes mellitus, n (%) | 24,260 (28) | 7626 (27) | 0.008 |
| Cancer, n (%) | 22,102 (26) | 4271 (15) | <0.001 |
| Liver disease, n (%) | 18,254 (21) | 2596 (9) | <0.001 |
| CKD, n (%) | 23,528 (27) | 6917 (25) | <0.001 |
| eGFR, ml/min per 1.73 m2, median (IQR) | 97.15 (78.47–111.81) | 77.93 (54.14–94.90) | <0.001 |
| Outcome | |||
| Hospital length of stay, d, median (IQR) | 5 (3–7) | 5 (4–8) | <0.001 |
| Among patients who developed AKI | 6 (4–9) | 6 (4–9) | 0.41 |
| Worst AKI stage, n (%) | |||
| No AKI | 73,780 (86) | 17,724 (64) | <0.001 |
| Stage 1 | 8851 (10) | 7788 (28) | <0.001 |
| Stage 2 | 2003 (2) | 1806 (6) | <0.001 |
| Stage 3 | 1166 (1) | 438 (2) | 0.008 |
| Stage 3+KRT | 88 (0.1) | 52 (0.2) | <0.001 |
IQR, interquartile range; UFH, University of Florida Health; UPMC, University of Pittsburgh Medical Center.
ᵃOther race includes American Indian/Alaskan Native, Asian, Native Hawaiian/other Pacific Islander, and multiracial.
ᵇCardiovascular disease was considered present if there was a history of congestive heart failure, coronary artery disease, or peripheral vascular disease.
Compared with the UFH cohort, patients in the UPMC cohort were more likely to have hypertension (32% versus 19%) and cardiovascular disease (37% versus 34%) but less likely to have cancer (15% versus 26%), liver disease (9% versus 21%), and CKD (25% versus 27%). Patients at UFH were more likely to have normal baseline kidney function, with a significantly higher median eGFR (97.15 versus 77.93 ml/min per 1.73 m2). The median length of hospital stay for both cohorts was 5 days. The percentage of patients developing AKI (36% versus 14%) and the percentage progressing to moderate to severe AKI (1% versus 0.3% for patients with no AKI, 11% versus 7% for patients with stage 1 AKI; Supplemental Table 5) were higher in the UPMC cohort than in the UFH cohort. Differences observed between the two cohorts may be due to differences in sample selection methodology and should not be interpreted as variability in clinical characteristics and outcomes between the patient populations of the two health centers. The use of admission creatinine (49% in UFH, 35% in UPMC) and creatinine within the past year before admission (45% in UFH, 38% in UPMC) was similar for each cohort, with higher use of estimated creatinine for the UPMC cohort (Supplemental Table 6).
Model Performance
Model performance metrics, with their 95% confidence intervals (CIs), are shown separately for UFH and UPMC in Table 2. Both the UFH Model (on UFH 0.81 [95% CI, 0.79 to 0.82] versus on UPMC 0.79 [95% CI, 0.78 to 0.80]) and the UPMC Model (on UPMC 0.83 [95% CI, 0.82 to 0.84] versus on UFH 0.77 [95% CI, 0.75 to 0.78]) had higher AUROCs on their source sites than on the cross-site test populations. For UFH test patients, AUPRCs were similar for the UFH Model (0.05 [95% CI, 0.04 to 0.06]) and the UPMC Model (0.05 [95% CI, 0.04 to 0.06]) yet slightly higher for the UFH-UPMC Model (0.06 [95% CI, 0.05 to 0.06]). On the UPMC test cohort, the UFH-UPMC Model had the highest AUPRC (0.13 [95% CI, 0.11 to 0.15]) and the UFH Model the lowest (0.10 [95% CI, 0.09 to 0.11]). AUPRCs were significantly lower on the UFH test cohort than on the UPMC test cohort for all three models. Sensitivity was highest for each model on its source site (UFH Model on UFH 0.81 [95% CI, 0.81 to 0.81] and UPMC Model on UPMC 0.82 [95% CI, 0.82 to 0.82]), while specificity was highest for the UFH-UPMC Model (on UFH 0.72 [95% CI, 0.70 to 0.75] and on UPMC 0.73 [95% CI, 0.71 to 0.76]).
Table 2.
Classification performance metrics for each model on test cohorts from separate health institutions
| Model | Sensitivity (95% CI) | Specificity (95% CI) | AUROC (95% CI) | AUPRC (95% CI) |
|---|---|---|---|---|
| UFH Model | ||||
| UFH test cohort | 0.81 (0.81 to 0.81) | 0.66 (0.63 to 0.68) | 0.81 (0.79 to 0.82) | 0.04 (0.04 to 0.05) |
| UPMC test cohort | 0.78 (0.77 to 0.78) | 0.64 (0.62 to 0.67) | 0.79 (0.78 to 0.8) | 0.1 (0.09 to 0.11) |
| UPMC Model | ||||
| UFH test cohort | 0.76 (0.76 to 0.76) | 0.63 (0.61 to 0.66) | 0.77 (0.75 to 0.78) | 0.04 (0.04 to 0.05) |
| UPMC test cohort | 0.82 (0.82 to 0.82) | 0.69 (0.66 to 0.71) | 0.83 (0.82 to 0.84) | 0.12 (0.1 to 0.13) |
| UFH-UPMC Model | ||||
| UFH test cohort | 0.77 (0.77 to 0.77) | 0.72 (0.7 to 0.75) | 0.81 (0.8 to 0.83) | 0.06 (0.05 to 0.06) |
| UPMC test cohort | 0.76 (0.75 to 0.76) | 0.73 (0.71 to 0.76) | 0.82 (0.81 to 0.84) | 0.13 (0.11 to 0.15) |
University of Florida Health Model was trained on University of Florida development dataset, University of Pittsburgh Medical Center Model was trained on University of Pittsburgh Medical Center development dataset and University of Florida Health—University of Pittsburgh Medical Center Model was trained on combination of University of Florida Health and University of Pittsburgh Medical Center development datasets. AUPRC, area under the precision-recall curve; AUROC, area under the receiver operating characteristic curve; CI, confidence interval; UFH, University of Florida Health; UPMC, University of Pittsburgh Medical Center.
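The threshold-dependent and precision-recall metrics in Table 2 can be illustrated with the minimal sketch below. The step-wise average precision is one common estimator of AUPRC and is assumed here, since the main text does not state which estimator was used; the decision threshold and the toy data are likewise illustrative. Because a non-informative classifier has an expected AUPRC close to the fraction of positive labels, AUPRC values are best read against the outcome prevalence of each test cohort.

```python
def sensitivity_specificity(labels, scores, threshold):
    """Sensitivity and specificity when scores >= threshold are called positive.
    Assumes both classes are present in `labels`."""
    tp = fn = tn = fp = 0
    for y, s in zip(labels, scores):
        predicted_positive = s >= threshold
        if y == 1:
            tp += predicted_positive
            fn += not predicted_positive
        else:
            fp += predicted_positive
            tn += not predicted_positive
    return tp / (tp + fn), tn / (tn + fp)

def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (ties in scores are
    broken by sort order in this simplified version)."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    n_pos = sum(labels)
    tp, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            tp += 1
            total += tp / rank  # precision at the rank of each positive
    return total / n_pos

# Toy data, not study results.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
sens, spec = sensitivity_specificity(labels, scores, threshold=0.35)
ap = average_precision(labels, scores)
prevalence = sum(labels) / len(labels)  # reference level for reading AUPRC
```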
Model discrimination was also evaluated with respect to demographic profiles (Supplemental Table 7). The UFH and UFH-UPMC Models did not show major performance differences in the subgroup analyses (e.g., female versus male, Black versus non-Black) in the UFH and UPMC cohorts, which may indicate robustness against bias with respect to these demographics. The UPMC Model had a significantly greater AUROC for male patients than for female patients in the UPMC test cohort (0.86 [0.84 to 0.87] versus 0.81 [0.79 to 0.82]), potentially influenced by the relatively small size of the test cohort used for model evaluation. Similarly, the UPMC Model had a significantly higher AUROC for non-Black patients than for Black patients in the UFH test cohort (0.78 [0.76 to 0.80] versus 0.72 [0.69 to 0.75]), which could stem from differences in demographic distributions (i.e., Black proportions: 22% [UFH] versus 11% [UPMC]).
Interpreting Model Outputs
We present the 20 inputs with the highest feature importance, measured by mean absolute SHAP values, for the aggregated test cohorts from UFH (Supplemental Figures 5A and 6A), from UPMC (Supplemental Figures 5B and 6B), and from both institutions combined (Figure 3). The order of the top three variables (mean KeGFR, nephrotoxic drug burden, and mean BUN) was consistent across source and target sites and models. BUN often rises faster than SCr in AKI and provides early and complementary insights into renal function because of its sensitivity to factors like protein intake and hydration status.26 KeGFR offers a dynamic measure of kidney function that incorporates the rate of change in creatinine over time and essentially tracks the patient's creatinine production rate. These distinct kinetics and complementary insights allow our model to more effectively capture significant fluctuations and trends in renal function, which are critical for predicting AKI within a short timeframe. Previous studies have demonstrated that the presence and dosage of nephrotoxic drugs are strong predictors of the development of AKI.2 This relationship is further highlighted in our SHAP analysis, where nephrotoxic drug burden was identified among the top features influencing the predictions. SHAP values do not establish or indicate any causal relation between features and outcome(s). Thus, the top variables should be regarded as plausible signals of patients' future kidney condition, as they have been shown to be biologically correlated with kidney damage.3,27,28
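The ranking by mean absolute SHAP value can be sketched as follows, assuming the per-prediction SHAP values have already been computed by an explainer and arranged as one row per prediction and one column per feature. The helper name, the feature labels, and the numbers are illustrative only and are not study output.

```python
def rank_features_by_mean_abs_shap(shap_values, feature_names, top_k=20):
    """Rank features by the mean absolute SHAP value across predictions.

    shap_values: list of rows, one per prediction, each holding one SHAP
    value per feature in the order given by `feature_names`.
    """
    n = len(shap_values)
    mean_abs = [
        sum(abs(row[j]) for row in shap_values) / n
        for j in range(len(feature_names))
    ]
    ranked = sorted(zip(feature_names, mean_abs), key=lambda t: t[1], reverse=True)
    return ranked[:top_k]

# Toy SHAP matrix: two predictions, three features (illustrative values).
names = ["mean_kegfr", "nephrotoxic_drug_burden", "mean_bun"]
toy_shap = [
    [0.4, -0.1, 0.05],
    [-0.6, 0.3, -0.05],
]
top = rank_features_by_mean_abs_shap(toy_shap, names, top_k=2)
```

Taking absolute values before averaging means a feature that pushes risk up for some patients and down for others still ranks as influential; the sign of its effect has to be read from the individual SHAP values.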
Figure 3.

Feature importance for the UFH-UPMC Model on combined test sets from UFH and UPMC. KeGFR, kinetic eGFR; SHAP, Shapley Additive Explanations.
Most of the top ten variables for the UFH Model and UPMC Model were based on laboratory measurements routinely collected during patient hospitalizations at both UFH and UPMC health systems. Specifically, mean serum chloride and mean serum sodium in the past 12 hours were consistently listed within the top ten features derived from the UFH Model and the UPMC Model for both sites, and the mean serum calcium in the past 12 hours was observed in the top features for the UFH-UPMC Model. More frequent laboratory measurement may signal a systemic risk (such as anemia or infection) that predisposes patients to AKI, as physiologically unstable patients tend to be monitored more frequently. In addition, certain medications require ongoing laboratory monitoring because they are metabolized through the kidneys and liver. Another interpretation could be that the UPMC cohort covers a time period overlapping with the early and peak phases of the coronavirus disease 2019 (COVID-19) pandemic. During this period of high clinical uncertainty, patients who were hospitalized with or suspected of having COVID-19 could have been subjected to more frequent laboratory testing and potentially been at higher risk of AKI.29 Although not necessarily direct indicators of patients' most recent kidney condition, SCr variables derived within 365 days before admission were clinically intuitive, as those values were potentially used to determine patients' baseline SCr. Similarly, minimum SCr values within the past year before admission consistently remained among the top influential features across the models and sites. Vital sign records (minimum body temperature in the past 6–12 hours and maximum mean arterial pressure in the last 6 hours) were among the top ten variables for the UFH-UPMC Model on the combined test population.
Other common variables observed as influential across all models and health systems were based on the counts of laboratory values monitoring metabolic (number of serum albumin values in the past 12 hours) and blood (number of hemoglobin values in the past 12 hours) indicators. Age and sex had a higher influence on predictions made with the UFH Model and the UPMC Model at both sites, while race was also listed among the top 20 features for the UFH Model at both centers. Except for minimum SCr, all variables with the highest influence on the predictions made with the UFH-UPMC Model were derived from records streamed during hospitalization.
Discussion
We developed models for continuously predicting stage 2 or higher AKI within the next 48 hours for patients from UFH and UPMC, two health centers operating on different integrated information and data warehousing systems. The model developed using multicenter data proved to be more robust than either locally developed model, an improvement we attribute to the use of a large and diverse dataset. Applying the SHAP method revealed feature importance patterns related to kidney damage (such as BUN) or to the care being provided (such as the number of measurements taken), which may help the model gain the trust of patients and clinicians. Both the feature importance analysis and the ablation study demonstrated that incorporating nephrotoxic drug burden into model development improves performance.
Recent reviews demonstrated that AKI is an active field of research for improved predictive, diagnostic, and prognostic digital health solutions.30–32 Although most AKI prediction models target intensive care unit populations, there are a few studies focused on general cohorts receiving noncritical care.33–35 In a recent review, only three studies were reported to predict AKI and externally validate the model on adult patients, either from the same institution at a different time period or from a distinct health system or center.36 Churpek et al. externally validated a model for continuously predicting stage 2 or higher AKI within the next 48 hours, with reported AUROCs of 0.80 and 0.84 for ward patients.33 Song et al. validated models predicting stage 2 or higher AKI within the next 48 hours on patients admitted to six sites, with AUROCs ranging between 0.68 and 0.80.34 In a study by Kim et al., AKI within the next 7 days was predicted for general admissions, resulting in an AUROC of 0.90 for stage 2 or higher AKI in their external validation experiments.35 Recent work by Cao et al. replicated the state-of-the-art AKI prediction study of Tomasev et al. on a more diverse patient population and reported an AUROC of 0.81.15,16 In Zhang et al., a discrete time survival model attained AUROC values ranging from 0.83 to 0.89 for predicting severe AKI (stage 2–3) within the next 48 hours in external validation cohorts.37 Koyner et al. trained a deep model that incorporates clinical notes and reported an AUROC of 0.88 for predicting stage 2 or higher AKI.38
We used a dynamic model in line with the evolving landscape of AKI prediction models as discussed in previous studies.39,40 Accurately predicting which patients are likely to worsen, with a reasonable lead time, may enable caregivers to identify cases that need more aggressive treatment, change the fluid administration plan, or re-evaluate and control nephrotoxic exposure.31,34 Our study departs from Churpek et al.33 by providing detailed subgroup analyses to evaluate performance discrepancies across subgroups. Our work also differs from the studies of Cao et al.16 and Song et al.,34 which considered only the first 7 days of patients' admissions and left later periods unexamined, whereas our study considered the complete hospitalization period. Our study contributes to AKI prediction research by developing a deep dynamic model for noncritical care patients while maintaining competitive performance on externally validated patient populations.
In the baseline model comparisons performed for predicting severe AKI beyond the first 48 hours of admission, our model's performance did not surpass that of static methods such as extreme gradient boosting. This outcome highlights a critical insight that, when using routinely collected clinical data, the predictive power may be inherently capped regardless of the complexity of the algorithm. To achieve significant improvements, future research should consider incorporating nontraditional data sources and modalities. For instance, integrating natural language processing techniques to analyze clinical notes or using image data from diagnostic tests can potentially uncover additional predictive signals not captured by standard clinical metrics.
Our study has several limitations. The AKI definition used in this study did not include the urine output criteria, as urine output measurements taken in general care settings remain unreliable and are not monitored as intensively as in critical care units. In line with this, a recent systematic review reported that several large cohort studies involving non-ICU patients have relied on SCr, as urine output data are typically not collected in these settings.41 We were likewise unable to include urine output as a model input, which may have led to missing potentially important early indicators for intervention, such as response to fluid resuscitation. We acknowledge that SCr is a functional biomarker rather than a direct marker of tissue damage. However, SCr remains a widely used and clinically relevant biomarker for diagnosing AKI, particularly in settings where more immediate biomarkers are either not available or not validated for routine clinical use. Similarly, focusing on stage 2 or higher AKI within a 48-hour window, that is, a doubling of SCr per KDIGO guidelines, introduces certain limitations, as it may not fully capture the underlying pathophysiologic processes or the immediate effect of acute kidney damage. However, the KDIGO criteria are widely recognized and used in both clinical practice and research, providing a standardized and consistent framework for identifying and staging AKI. Despite its limitations, employing this definition allows for comparability with existing studies and ensures that our findings are relevant and translatable within the broader context of AKI research. We used an enriched dataset for the UPMC cohort to allocate cases strategically. This approach ensured that we remained within the allowable data-sharing limits and increased the representation of the minority class. Although enrichment is commonly used to deal with class imbalance, it may change the class distribution seen during training.
Another limitation is that we relied on patients' EHR data to derive nephrotoxic drug–related features and did not consider prehospital nephrotoxic medication exposure. The developed model was based on a recurrent neural network architecture, which is limited in processing long sequences because of vanishing gradients. Therefore, the model may fail to deliver accurate results for patients with long hospitalizations, and future research leveraging more advanced machine learning techniques is necessary to handle long sequences. Some false positives might be cases of impending AKI that were prevented by timely clinical intervention. The differences in date ranges arose from the availability and readiness of the datasets at each site. The UFH dataset spans 2012 to 2019, reflecting the period for which comprehensive digital records were available and consistently maintained. The UPMC dataset covers 2018 to 2022, aligning with the timeframe when its data systems were updated and standardized to ensure high-quality, complete records suitable for our analysis. Although these differences introduce some methodologic variability, each dataset independently meets high standards for data quality and consistency within its respective time frame. In addition, the data collection period for UPMC overlaps with the COVID-19 pandemic, which may have influenced the dataset through more intense monitoring, including more frequent laboratory testing. Finally, there was a difference in the distribution of baseline creatinine categories, which may reflect the strengths of these specific institutions in maintaining comprehensive longitudinal records, particularly for patients with regular health care engagement. In less integrated or resource-limited health care settings, historical creatinine data may be sparser, potentially limiting the applicability of similar modeling strategies.
Accordingly, prospective deployment and assessment are necessary to identify the true workflow benefit of the developed models.
We developed and externally validated models for predicting stage 2 or higher AKI within the next 48 hours for non-ICU patients. Our model maintained competitive performance on external validation cohorts. Although external validation is a strength of this study, the real-time clinical benefit of the model requires further evaluation, which is an immediate objective of our future work.
Acknowledgments
The authors gratefully acknowledge the technical support of NVIDIA Artificial Intelligence Technology Center at UF for this research. The authors acknowledge the Intelligent Clinical Care Center research group for support provided for this study. The authors acknowledge the University of Florida Integrated Data Repository, the University of Florida Health Office of the Chief Data Officer for providing the UFH analytic dataset, and the UPMC R3 for providing the UPMC dataset for this project. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. The funders had no role in design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.
Footnotes
E.A. and Y.R. contributed equally to this work as first authors.
T.O.B. and A.B. contributed equally to this work as senior authors.
Disclosures
Disclosure forms, as provided by each author, are available with the online version of the article at http://links.lww.com/KN9/B305.
Author Contributions
Conceptualization: Esra Adiyeke, Nabihah Amatullah, Azra Bihorac, Sandra L. Kane-Gill, Raghavan Murugan, Tezcan Ozrazgat-Baslanti, Parisa Rashidi, Yuanfang Ren, Benjamin Shickel, Britney A. Stottlemyer, Tiffany L. Tran.
Formal analysis: Esra Adiyeke, Ziyuan Guan, Yuanfang Ren, Matthew M. Ruppert.
Writing – original draft: Esra Adiyeke, Tezcan Ozrazgat-Baslanti, Yuanfang Ren.
Writing – review & editing: Esra Adiyeke, Nabihah Amatullah, Azra Bihorac, Ziyuan Guan, Christopher M. Horvat, Sandra L. Kane-Gill, Raghavan Murugan, Tezcan Ozrazgat-Baslanti, Parisa Rashidi, Yuanfang Ren, Dan Ricketts, Matthew M. Ruppert, Benjamin Shickel, Britney A. Stottlemyer, Tiffany L. Tran.
Funding
A. Bihorac, S.L. Kane-Gill, R. Murugan, P. Rashidi, Y. Ren, and B. Shickel: National Institute of Diabetes and Digestive and Kidney Diseases (R01 DK121730). T. Ozrazgat-Baslanti: National Institute of Diabetes and Digestive and Kidney Diseases (R01 DK121730 and K01 DK120784). E. Adiyeke: National Institute of Diabetes and Digestive and Kidney Diseases (K01 DK120784). This work was supported by National Center for Advancing Translational Sciences (UL1 TR000064 and UL1 TR001427).
Declarative Statements
An initial version of this work appeared as a preprint at https://doi.org/10.48550/arXiv.2402.04209. Large dataset (e.g., omics data, health care data, clinical trial data, imaging data).
Data Availability Statements
Original data generated for the study will be made available upon reasonable request to the corresponding author. Data Type: Observational Data; Health Care Data. Reason for Restricted Access: Data used in this analysis include both date and time stamps. In order to prevent patient privacy compromises due to inclusion of identifiers in the data, our data cannot be publicly shared in a repository. Author contact is Azra Bihorac (abihorac@ufl.edu).
Supplemental Material
This article contains the following supplemental material online at http://links.lww.com/KN9/B306.
Supplemental Figure 1. AKI identification flow.
Supplemental Figure 2. Calibration plots for UFH-UPMC Model using isotonic regression for 20 bins.
Supplemental Figure 3. Daily nephrotoxic drugs given for UFH.
Supplemental Figure 4. Daily nephrotoxic drugs given for UPMC.
Supplemental Figure 5. Feature importance for UFH Model on UFH test set (A) and on UPMC test set (B).
Supplemental Figure 6. Feature importance for UPMC Model on UFH test set (A) and on UPMC test set (B).
Supplemental Table 1. Summary of input features and preprocessing descriptions.
Supplemental Table 2. Comparison of different predictive models.
Supplemental Table 3. Classification performance metrics for UFH-UPMC Model and UFH-UPMC Model-nephrotoxic drugs excluded on test cohorts.
Supplemental Table 4. Classification performance metrics for UFH-UPMC Model and UFH-UPMC Model-nephrotoxic drugs excluded on test cohorts stratified by sex, age, comorbidities, and race.
Supplemental Table 5. Transition probability for AKI stage outcome within the next 48 hours.
Supplemental Table 6. Baseline SCr identification distributions.
Supplemental Table 7. Classification performance metrics for each model on test cohorts from separate health institutions stratified by sex and race.