2025 Sep 18;53(1):179. doi: 10.1007/s00240-025-01856-4

Development and validation of an explainable machine learning model for predicting sepsis risk following flexible ureteroscopic lithotripsy

Ruichen Li 1, Biao Zhang 2, Liying Zeng 2, Jiayan Mo 1, Jinyuan Zhang 1, Sheng Bi 1,
PMCID: PMC12446151  PMID: 40965651

Abstract

Sepsis is a severe complication of flexible ureteroscopic lithotripsy (fURL), a widely used treatment for kidney stones. This retrospective study in China aimed to develop and validate a machine learning (ML) model for predicting the risk of sepsis following fURL and to enhance its interpretability with Shapley Additive Explanations (SHAP). The derivation cohort comprised 1,386 patients treated between 2019 and July 2024, divided into a training set (2019–2023) and an internal validation set (January–July 2024). External validation was performed on a cohort of 604 patients treated between 2019 and 2023 at a collaborating center. Sepsis was diagnosed according to the Sepsis-3.0 consensus guidelines. Fifteen ML algorithms were used to construct predictive models, and their performance was evaluated with metrics including the area under the receiver operating characteristic curve (AUC). SHAP was applied to assess and rank the importance of individual features. The Extra Trees (ET) model incorporating eight key features showed the best discrimination in the training set (AUC = 0.90) and retained good discrimination in both internal (AUC = 0.87) and external (AUC = 0.81) validation. This model, equipped with SHAP-driven interpretability and deployed as an accessible web application, has the potential to serve as a clinical tool for risk stratification of patients undergoing fURL.

Keywords: Kidney stones, Sepsis, Flexible ureteroscopy, Machine learning, SHAP (Shapley additive explanations)

Introduction

Kidney stone disease represents a growing public health issue worldwide, with a steadily rising annual incidence. In the United States, 19% of men and 9% of women are diagnosed with kidney stones by the age of 70 [1]. The prevalence of urolithiasis in China ranges from 1.5% to 18%, with higher rates in the southern regions [2]. Flexible ureteroscopic lithotripsy (fURL) is a mainstream treatment for kidney stones smaller than 2 cm, offering less trauma, faster recovery, and fewer complications. Recent technological advances, including fURL combined with negative-pressure suction, improved laser technologies, and single-use flexible ureteroscopes, have substantially expanded its global adoption [3]. Despite these advances, postoperative sepsis remains a major complication, with reported incidence rates ranging from 0.5% to 11.1% [4].

Sepsis is a life-threatening syndrome of organ dysfunction caused by a dysregulated host response to infection. It manifests with signs such as fever and hypotension and, in severe cases, progresses to organ failure or death. Accurate prediction of sepsis risk is essential to minimize postoperative complications and improve patient outcomes, particularly after surgical procedures such as fURL [5]. Several studies have explored the key risk factors for sepsis [6–10]. For instance, Chugh et al. found that obesity, advanced age, female sex, neurogenic bladder, prolonged operative time, and the presence of a ureteral stent were associated with an increased risk of sepsis [10].

In recent years, the rapid advancement of artificial intelligence (AI) technologies, including machine learning (ML) and deep learning (DL), has expanded their applications in medicine. These technologies are now widely used in disease diagnosis, prognosis prediction, and personalized treatment planning, improving accuracy and efficiency and creating opportunities for early intervention. The integration of AI into healthcare has the potential to transform traditional medical practice and improve patient outcomes [11]. For example, Pietropaolo et al. used a Random Forest algorithm to predict the risk of ICU admission for sepsis following ureteroscopic lithotripsy. Using clinical data from nine European centers, their model achieved an accuracy of 81.3% and an area under the receiver operating characteristic curve (AUC) of 0.89. Key predictors included stone location, stent duration, stone size, and surgical time [12]. Similarly, Chen et al. developed a deep neural network (DNN) model combined with the least absolute shrinkage and selection operator (LASSO) method to predict sepsis risk after fURL and percutaneous nephrolithotomy. Their model, which used preoperative CT imaging and clinical data from 847 patients, achieved an AUC of 0.920 and identified critical risk factors for sepsis [13].

AI has shown great promise in enhancing predictive models. However, its integration into clinical practice and the development of interpretable prediction models remain limited, largely because of a lack of transparency: the intricate mechanisms of complex algorithms often obscure their decision-making processes [14]. Addressing this challenge is crucial for clinical adoption and for ensuring that advanced tools are practical and aligned with evidence-based medicine. Tree-based ensemble algorithms help here because they can quantify feature importance, showing how individual variables contribute to the model's predictions. SHAP, inspired by the Shapley value from cooperative game theory, fairly distributes a prediction among the features, decomposing the model's output into individual feature contributions. Because these contributions are consistent and, together with a base value, add up to the model's output, SHAP summary plots and related visualizations give a thorough picture of the model's behavior, enhancing both interpretability and clinical applicability [15].

This study seeks to develop and validate an ML model that predicts sepsis after fURL from preoperative and intraoperative data, and to apply SHAP to elucidate feature importance and enhance the model's interpretability.

Methods

Study patients

This study included adult patients who underwent fURL; patients who were pregnant or had ureteral malformations or horseshoe kidneys were excluded. A total of 1,386 patients from Yiyang Central Hospital were included: 1,178 patients treated between 2019 and 2023 formed the training set, and 208 patients treated between January and July 2024 served as the internal validation set. The external validation cohort comprised 604 patients treated at Yiyang First Traditional Chinese Medicine Hospital between 2019 and 2023, selected with the same inclusion and exclusion criteria as the derivation cohort.

Data collection

Preoperative non-contrast abdominal CT was performed in all patients, and stone burden was assessed by calculating the stone surface area (SA) [16] as follows:

$SA = L \times W \times \pi \times 0.25$, where L and W are the maximal length and width of the stone (mm) measured on CT.

For patients with multiple stones, stone burden was calculated as the sum of the SAs of the individual stones. Based on previous studies of risk factors for sepsis following fURL or URL [6–10], we collected variables including baseline patient characteristics, laboratory test results, surgical duration, presence of diabetes, preoperative ureteral stent duration, and occurrence of complications.
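A minimal sketch of the stone-burden calculation, assuming the commonly used ellipse approximation SA = L × W × π × 0.25 with L and W the maximal length and width of each stone in mm. The function names are hypothetical; the sketch only illustrates summing per-stone SAs for patients with multiple stones.

```python
import math
from typing import Iterable, Tuple


def stone_surface_area(length_mm: float, width_mm: float) -> float:
    """Approximate one stone's surface area (mm^2) as an ellipse.

    Assumption: SA = L * W * pi * 0.25, where L and W are the maximal
    length and width of the stone measured on non-contrast CT.
    """
    if length_mm < 0 or width_mm < 0:
        raise ValueError("stone dimensions must be non-negative")
    return length_mm * width_mm * math.pi * 0.25


def stone_burden(stones: Iterable[Tuple[float, float]]) -> float:
    """Total stone burden: the sum of the SAs of all stones."""
    return sum(stone_surface_area(length, width) for length, width in stones)


# Hypothetical patient with two stones measuring 10 x 7 mm and 6 x 5 mm.
print(round(stone_burden([(10, 7), (6, 5)]), 2))  # 78.54
```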

Definition of sepsis

Sepsis was defined as a life-threatening condition characterized by organ dysfunction resulting from a dysregulated host response to infection. Diagnosis was guided by the quick Sequential Organ Failure Assessment (qSOFA) score, which comprises three criteria: cognitive impairment (Glasgow Coma Scale score < 15), systolic blood pressure of 100 mmHg or lower, and respiratory rate of 22 breaths per minute or higher. A qSOFA score of ≥ 2 indicates suspected sepsis and is associated with a 3- to 14-fold increase in in-hospital mortality. Compared with traditional methods, this scoring system offers a more practical and efficient way to assess infection-induced organ dysfunction. These criteria are aligned with the Sepsis-3.0 consensus guidelines [17–19].
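The qSOFA rule described above reduces to counting three bedside criteria. The following is a minimal sketch; the class and function names are hypothetical, and it assumes the score is applied only to patients with suspected or confirmed infection.

```python
from dataclasses import dataclass


@dataclass
class Vitals:
    gcs: int                  # Glasgow Coma Scale score (3-15)
    systolic_bp: float        # systolic blood pressure, mmHg
    respiratory_rate: float   # breaths per minute


def qsofa_score(v: Vitals) -> int:
    """Count how many of the three qSOFA criteria are met (0-3)."""
    score = 0
    if v.gcs < 15:                 # cognitive impairment
        score += 1
    if v.systolic_bp <= 100:       # systolic BP of 100 mmHg or lower
        score += 1
    if v.respiratory_rate >= 22:   # respiratory rate of 22/min or higher
        score += 1
    return score


def suspected_sepsis(v: Vitals) -> bool:
    """A qSOFA score of 2 or more flags suspected sepsis."""
    return qsofa_score(v) >= 2
```

Note that the thresholds are inclusive (≤ 100 mmHg, ≥ 22/min), so a patient exactly at either cut-off meets that criterion.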

Sample size calculation

The sample size was calculated for an anticipated AUC of 0.9, based on a previous study [13]. Using the R package "pmsampsize" and following the methodology of Riley et al. [20], the minimum required sample size was 1,127 participants with 57 outcome events, assuming an outcome prevalence of 0.05 and 16 candidate predictor parameters.
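pmsampsize evaluates several criteria for binary outcomes and reports the largest resulting sample size. One of them, from Riley et al.'s approach [20], requires the expected uniform shrinkage factor S to be at least 0.9: n = p / ((S − 1) ln(1 − R²_CS / S)), where p is the number of candidate parameters and R²_CS the anticipated Cox–Snell R². The sketch below assumes R²_CS is supplied directly (pmsampsize can instead derive it from an anticipated C-statistic, which this sketch does not attempt) and does not reproduce the other criteria. Function names are hypothetical.

```python
import math


def riley_shrinkage_sample_size(n_params: int, r2_cs: float,
                                shrinkage: float = 0.9) -> int:
    """Minimum n so that the expected shrinkage factor is >= `shrinkage`.

    Implements n = p / ((S - 1) * ln(1 - R2_cs / S)), rounded up.
    Both factors in the denominator are negative, so n is positive.
    """
    if not 0 < r2_cs < shrinkage:
        raise ValueError("r2_cs must lie between 0 and the shrinkage target")
    n = n_params / ((shrinkage - 1) * math.log(1 - r2_cs / shrinkage))
    return math.ceil(n)


def expected_events(n: int, prevalence: float) -> int:
    """Outcome events implied by n and the anticipated prevalence."""
    return math.ceil(n * prevalence)


print(riley_shrinkage_sample_size(10, 0.1))  # 850
print(expected_events(1127, 0.05))           # 57
```

The second call shows the arithmetic behind the reported figures: 1,127 × 0.05 = 56.35, which rounds up to 57 events.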

Data preprocessing

Before model construction, the training data were preprocessed. Data handling and ML were performed in Python 3.11.7 with the PyCaret library (version 3.3.2). Cases with missing values were excluded by listwise deletion. Categorical features with up to 25 unique categories were one-hot encoded. The synthetic minority over-sampling technique (SMOTE) was used to generate synthetic cases of the minority (sepsis) class, improving the model's ability to classify that class [21]. Finally, features were standardized by Z-score normalization so that all features were on a uniform scale and no single feature disproportionately influenced model training.
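The preprocessing steps above can be sketched in standard-library Python. This is an illustration under stated assumptions, not the PyCaret pipeline: features are assumed to be numeric already (one-hot encoding omitted), and `smote` is a from-scratch minimal version of the technique, whereas PyCaret, to our understanding, delegates oversampling to imbalanced-learn. All function names are hypothetical.

```python
import math
import random
from statistics import mean, pstdev
from typing import List, Optional, Sequence, Tuple

Row = List[float]


def listwise_delete(rows: Sequence[Sequence[Optional[float]]]) -> List[Row]:
    """Drop every case with at least one missing (None) value."""
    return [list(r) for r in rows if all(v is not None for v in r)]


def fit_zscore(rows: Sequence[Row]) -> Tuple[Row, Row]:
    """Learn per-feature mean and SD on the training data only."""
    cols = list(zip(*rows))
    means = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return means, sds


def apply_zscore(rows: Sequence[Row], means: Row, sds: Row) -> List[Row]:
    """Standardize rows with the statistics learned by fit_zscore."""
    return [[(v - m) / s for v, m, s in zip(r, means, sds)] for r in rows]


def smote(minority: Sequence[Row], n_new: int, k: int = 5,
          seed: int = 0) -> List[Row]:
    """Minimal SMOTE: each synthetic case lies on the segment between a
    random minority case and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority cases")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic: List[Row] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m, b=base: math.dist(b, m))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # random position along base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic
```

Oversampling belongs to the training data only; validation cases are never synthesized, so reported performance reflects real patients.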

Model construction

Dimension reduction, comprising algorithm-based feature selection and collinearity analysis, was used to improve classifier performance. Features were selected with the "classic" approach, which combines permutation feature importance derived from Random Forest and AdaBoost models with linear correlation analysis, using a selection threshold of 0.80; collinearity was assessed using Pearson correlation. Tenfold cross-validation was used to obtain reliable estimates of the model's generalization performance.
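Tenfold cross-validation can be sketched as follows. We assume stratified folds (to our understanding, PyCaret's default fold strategy for classification), which keep the roughly 5% sepsis rate similar in every fold; with so few positive cases, an unstratified split could leave a fold with almost none. Function names are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence, Tuple


def stratified_kfold(labels: Sequence[int], k: int = 10,
                     seed: int = 42) -> List[List[int]]:
    """Assign sample indices to k folds while preserving the class ratio.

    Indices of each class are shuffled and dealt round-robin across folds.
    """
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds: List[List[int]] = [[] for _ in range(k)]
    offset = 0
    for y in sorted(by_class):
        idxs = by_class[y]
        rng.shuffle(idxs)
        for j, idx in enumerate(idxs):
            folds[(j + offset) % k].append(idx)
        offset += len(idxs)  # continue dealing where the last class stopped
    return folds


def cv_splits(labels: Sequence[int], k: int = 10,
              seed: int = 42) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, valid_idx) pairs; each fold is held out once."""
    folds = stratified_kfold(labels, k, seed)
    for i, valid in enumerate(folds):
        train = [idx for j, f in enumerate(folds) if j != i for idx in f]
        yield train, valid
```

Any oversampling or scaling step should be fitted inside each training split only, so that the held-out fold stays untouched.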

Model evaluation

Fifteen ML algorithms were compared to identify the best-performing model, using accuracy, AUC, recall, precision, F1 score, and Cohen's kappa. Decision curve analysis (DCA) was used to evaluate the clinical utility of the models. DCA is a graphical method that plots a model's net benefit across a range of threshold probabilities, allowing decision-makers to judge whether using the model would improve decisions in practical clinical scenarios [22].
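The evaluation metrics listed above can all be derived from the confusion matrix, except AUC, which is computed from the ranking of predicted probabilities. A minimal sketch, assuming a 0.5 probability threshold for the class-based metrics (our assumption; the threshold is not stated) and hypothetical function names:

```python
from typing import Dict, Sequence, Tuple


def confusion(y_true: Sequence[int],
              y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative
    (ties count one half), i.e. the Mann-Whitney form of the AUC."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def classification_metrics(y_true: Sequence[int], scores: Sequence[float],
                           threshold: float = 0.5) -> Dict[str, float]:
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp, tn, fp, fn = confusion(y_true, y_pred)
    n = tp + tn + fp + fn
    accuracy = (tp + tn) / n
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (accuracy - p_chance) / (1 - p_chance) if p_chance < 1 else 0.0
    return {"Accuracy": accuracy, "AUC": auc(y_true, scores),
            "Recall": recall, "Precision": precision, "F1": f1,
            "Kappa": kappa}
```

A classifier that always predicts "no sepsis" reaches an accuracy close to the non-sepsis proportion (about 0.95 here) yet has zero recall and zero kappa, and its constant scores give an AUC of 0.5, which is the pattern of the Dummy row in Table 3.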

Model interpretation

To enhance the model's interpretability, SHAP values were used to provide consistent, precise attributions for each feature in the predictive model. This unified approach explains the output of a machine learning model by evaluating each feature's contribution across all possible combinations with the other features.
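To make "all possible combinations" concrete, the sketch below computes exact Shapley values for one patient by brute-force enumeration of feature coalitions. It is not the shap library's algorithm: TreeExplainer computes SHAP values for tree ensembles far more efficiently, and replacing "absent" features with a single baseline vector is a simplifying assumption of this sketch. The function name is hypothetical.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, Sequence


def exact_shapley(predict: Callable[[Sequence[float]], float],
                  x: Sequence[float],
                  baseline: Sequence[float]) -> Dict[int, float]:
    """Exact Shapley values for one instance by enumerating coalitions.

    Features outside a coalition S take their baseline value, so
    v(S) = f(x_S, baseline_rest). Each feature needs 2^(M-1) coalitions,
    which is 128 for M = 8 features.
    """
    m = len(x)

    def value(coalition: set) -> float:
        z = [x[i] if i in coalition else baseline[i] for i in range(m)]
        return predict(z)

    phi: Dict[int, float] = {}
    for i in range(m):
        others = [j for j in range(m) if j != i]
        total = 0.0
        for size in range(m):
            # Shapley weight |S|! (M - |S| - 1)! / M!
            weight = factorial(size) * factorial(m - size - 1) / factorial(m)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi
```

By the efficiency property, the values sum to f(x) − f(baseline); an interaction effect between two features is split equally between them.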

Web application deployment using the Streamlit framework

To make the findings usable for real-world clinical prediction, the final model was deployed as a web application built with the Streamlit framework in Python. After the user enters the values of the predictive features included in the final model, the application returns the predicted probability of sepsis and generates an individualized force plot for each patient.
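A minimal Streamlit sketch of such an application is shown below, not the authors' code. The feature order, the model file name `et_model.joblib`, and the assumption that column 1 of `predict_proba` is the sepsis class are all hypothetical; a PyCaret pipeline would instead expect a pandas DataFrame with the training column names. The force plot is omitted for brevity.

```python
from typing import Dict, List

# Assumed feature order; it must match the column order used in training.
FEATURES: List[str] = [
    "WBC", "Albumin", "CR", "ALT", "Double-J stent duration",
    "uWBC", "Surgical duration", "Stone burden",
]


def to_feature_row(values: Dict[str, float]) -> List[List[float]]:
    """Arrange the user's inputs as the single-row 2-D input that a
    scikit-learn-style predict_proba expects."""
    missing = [f for f in FEATURES if f not in values]
    if missing:
        raise KeyError(f"missing inputs: {missing}")
    return [[float(values[f]) for f in FEATURES]]


def predict_sepsis_probability(model, values: Dict[str, float]) -> float:
    """Probability of the positive class, assumed to be column 1 (sepsis)."""
    return float(model.predict_proba(to_feature_row(values))[0][1])


def main() -> None:
    # Imported lazily so the helpers above can be used without Streamlit.
    import joblib
    import streamlit as st

    st.title("Sepsis risk after fURL (sketch)")
    model = joblib.load("et_model.joblib")  # hypothetical saved model file
    values = {f: st.number_input(f, min_value=0.0, value=0.0) for f in FEATURES}
    if st.button("Predict"):
        p = predict_sepsis_probability(model, values)
        st.metric("Predicted probability of sepsis", f"{p:.1%}")


def _running_under_streamlit() -> bool:
    """True only when the script is executed by `streamlit run`."""
    try:
        from streamlit.runtime import exists
    except ImportError:
        return False
    return exists()


if _running_under_streamlit():
    main()
```

Keeping the prediction logic in plain functions separates it from the user interface, so it can be checked without launching the app.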

Statistical analysis

Categorical variables are presented as frequencies and percentages, and group differences were evaluated using the χ² test, or Fisher's exact test when expected frequencies were < 10. Continuous variables are expressed as medians with interquartile ranges (IQR) and were compared between groups using the Wilcoxon rank-sum test. All statistical analyses were conducted using R (v4.2.0) and Python (v3.11.7). A P value < 0.05 was considered statistically significant.
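The categorical test-selection rule above can be sketched for a 2 × 2 table in standard-library Python. This is an illustration only: the chi-square test here has no continuity correction, whereas SciPy's `chi2_contingency` applies Yates' correction to 2 × 2 tables by default, and which variant was used is not stated. The Wilcoxon rank-sum test is omitted. Function names are hypothetical.

```python
import math
from typing import List, Tuple

Table = List[List[int]]  # [[a, b], [c, d]]: rows = groups, columns = outcome


def expected_counts(t: Table) -> List[List[float]]:
    """Expected cell counts under independence of rows and columns."""
    n = sum(map(sum, t))
    rows = [sum(r) for r in t]
    cols = [t[0][j] + t[1][j] for j in range(2)]
    return [[rows[i] * cols[j] / n for j in range(2)] for i in range(2)]


def chi2_2x2(t: Table) -> float:
    """Pearson chi-square p-value, 1 df, without continuity correction."""
    e = expected_counts(t)
    stat = sum((t[i][j] - e[i][j]) ** 2 / e[i][j]
               for i in range(2) for j in range(2))
    return math.erfc(math.sqrt(stat / 2))  # upper tail of chi-square(1)


def fisher_2x2(t: Table) -> float:
    """Two-sided Fisher exact p-value: total hypergeometric probability of
    all tables with the same margins that are no more likely than t."""
    (a, b), (c, d) = t
    r1, c1, n = a + b, a + c, a + b + c + d

    def prob(x: int) -> float:
        return math.comb(c1, x) * math.comb(n - c1, r1 - x) / math.comb(n, r1)

    p_obs = prob(a)
    lo, hi = max(0, r1 + c1 - n), min(r1, c1)
    tail = sum(p for p in map(prob, range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-7))  # tolerance for float ties
    return min(1.0, tail)


def compare_proportions(t: Table,
                        min_expected: float = 10) -> Tuple[str, float]:
    """Fisher's exact test if any expected count is below min_expected,
    otherwise the chi-square test."""
    if min(min(r) for r in expected_counts(t)) < min_expected:
        return "Fisher", fisher_2x2(t)
    return "chi-square", chi2_2x2(t)


# Nitrite in the derivation cohort (Table 2): sepsis 25+/43-, non-sepsis 78+/1240-.
print(compare_proportions([[25, 43], [78, 1240]])[0])  # Fisher
```

Here the expected count of nitrite-positive sepsis cases is 68 × 103 / 1,386 ≈ 5.05, below 10, so the rule selects Fisher's exact test.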

Results

Characteristics of patients

In total, 1,990 patients who underwent fURL were included in this study. The derivation cohort consisted of data from 1,386 patients with fURL treated at Yiyang Central Hospital between 2019 and July 2024, whereas the external validation cohort comprised 604 patients from Yiyang First Traditional Chinese Medicine Hospital between 2019 and 2023. Figure 1 illustrates the overall workflow of this study, along with detailed information on participant recruitment for each analysis.

Fig. 1.

The overall flowchart of the study. fURL, flexible ureteroscopic lithotripsy

Among the 1,386 patients in the derivation cohort, 68 (4.9%) individuals developed sepsis following fURL. Table 1 provides a detailed comparison of demographic and clinical characteristics across the training, internal validation, and external validation cohorts. Notably, the external validation cohort exhibited statistically significant differences compared to the training set for several variables. These included sex, age, preoperative white blood cell count, albumin, blood urea nitrogen, creatinine, urinary white blood cell count, nitrite, surgical duration, stone burden, Double-J stent duration, and the incidence of preoperative fever.

Table 1.

Characteristics of the cohort participants in the prediction model of sepsis outcomes

Variable | Derivation cohort: training set (N = 1,178) | Derivation cohort: internal validation set (N = 208) | External validation cohort (N = 604)
Sex, female | 452 (38%) | 102 (49%) | 249 (41%)*
Sex, male | 726 (62%) | 106 (51%) | 355 (59%)*
Age, years | 55.000 (48.000–64.000) | 55.000 (49.000–64.000) | 54.000 (47.000–61.000)*
WBC, × 10^9/L | 7.000 (5.755–8.770) | 7.220 (5.838–8.560) | 6.000 (5.000–8.000)*
Albumin, g/L | 41.900 (39.200–44.100) | 42.000 (39.000–44.000) | 43.000 (40.375–46.000)*
ALT, U/L | 20.750 (14.925–31.800) | 22.000 (15.000–32.000) | 21.000 (14.775–32.550)
BUN | 5.800 (4.600–7.220) | 5.600 (4.500–7.000) | 5.400 (4.400–6.700)*
CR, µmol/L | 87.000 (70.000–112.750) | 86.000 (69.000–115.000) | 80.000 (65.000–99.000)*
uWBC, /µL | 45.000 (15.000–115.000) | 42.500 (15.000–112.000) | 15.000 (0.000–125.000)*
Nitrite, negative | 1,089 (92%) | 194 (93%) | 577 (95%)*
Nitrite, positive | 89 (8%) | 14 (7%) | 27 (5%)*
Urinary culture, negative | 1,053 (89%) | 183 (88%) | 551 (91%)
Urinary culture, positive | 125 (11%) | 25 (12%) | 53 (9%)
Diabetes, no | 1,060 (90%) | 177 (85%) | 548 (91%)
Diabetes, yes | 118 (10%) | 31 (15%) | 56 (9%)
Surgical duration, min | 60.000 (40.000–75.000) | 55.500 (38.000–75.000) | 80.000 (60.000–120.000)*
Stone burden, mm² | 54.950 (31.400–95.758) | 50.648 (28.260–93.600) | 117.750 (78.108–196.370)*
Double-J stent duration, days | 2.000 (0.000–4.000) | 2.000 (0.000–5.000) | 0.000 (0.000–10.250)*
Preoperative fever, no | 1,105 (94%) | 197 (95%) | 586 (97%)*
Preoperative fever, yes | 73 (6%) | 11 (5%) | 18 (3%)*
Suctioning UAS, no | 110 (9.3%) | 19 (9.1%) | 0 (0%)
Suctioning UAS, yes | 1,068 (91%) | 189 (91%) | 604 (100%)
Values are median (IQR) for continuous variables and n (%) for categorical variables.

*P < 0.05, external validation cohort vs. training set

Abbreviations: WBC, White Blood Cell; ALT, Alanine Aminotransferase; BUN, Blood Urea Nitrogen; CR, Creatinine; uWBC, Urinary White Blood Cells; UAS, ureteral access sheaths

Table 2 summarizes the demographic and clinical characteristics of patients with and without sepsis in the derivation and external validation cohorts. In the derivation cohort, patients who developed sepsis were significantly older than those who did not, and had higher preoperative white blood cell counts, lower albumin levels, and markedly higher urinary white blood cell counts. Positive urinary nitrite and positive urine culture were also more prevalent in the sepsis group. Surgical duration, stone burden, Double-J stent duration, and the incidence of preoperative fever likewise differed significantly between the sepsis and non-sepsis groups in the derivation cohort.

Table 2.

Characteristics of derivation and external validation cohorts according to sepsis status

Variable | Derivation cohort: non-sepsis (N = 1,318) | Derivation cohort: sepsis (N = 68) | External validation cohort: non-sepsis (N = 573) | External validation cohort: sepsis (N = 31)
Sex, female | 520 (39%) | 34 (50%) | 237 (41%) | 12 (39%)
Sex, male | 798 (61%) | 34 (50%) | 336 (59%) | 19 (61%)
Age, years | 55.000 (48.000–64.000) | 57.500 (51.000–69.000)* | 53.000 (47.000–61.000) | 55.000 (48.500–60.500)
WBC, × 10^9/L | 7.000 (5.740–8.645) | 8.355 (5.958–10.730)* | 6.000 (5.000–7.850) | 6.740 (5.010–8.000)
Albumin, g/L | 42.000 (39.200–44.100) | 40.850 (37.950–43.050)* | 43.000 (40.400–46.000) | 42.900 (40.050–45.000)
ALT, U/L | 20.750 (15.000–31.825) | 25.750 (17.000–32.250) | 21.000 (14.800–33.000) | 21.000 (14.300–29.800)
BUN | 5.800 (4.600–7.228) | 5.520 (4.495–6.605) | 5.400 (4.400–6.700) | 5.400 (4.500–6.000)
CR, µmol/L | 86.000 (69.000–113.000) | 91.500 (75.000–111.250) | 80.000 (65.000–100.000) | 81.000 (72.000–98.500)
uWBC, /µL | 42.000 (15.000–103.000) | 590.000 (199.250–1,074.000)* | 15.000 (0.000–100.000) | 500.000 (70.000–500.000)
Nitrite, negative | 1,240 (94%) | 43 (63%)* | 553 (96%) | 24 (77%)
Nitrite, positive | 78 (6%) | 25 (37%)* | 20 (4%) | 7 (23%)
Urinary culture, negative | 1,201 (91%) | 35 (51%)* | 535 (93%) | 16 (52%)
Urinary culture, positive | 117 (9%) | 33 (49%)* | 38 (7%) | 15 (48%)
Diabetes, no | 1,180 (90%) | 57 (84%) | 520 (91%) | 28 (90%)
Diabetes, yes | 138 (10%) | 11 (16%) | 53 (9%) | 3 (10%)
Surgical duration, min | 60.000 (40.000–75.000) | 60.000 (45.000–91.000)* | 80.000 (60.000–115.000) | 80.000 (68.500–140.000)
Stone burden, mm² | 54.950 (28.274–94.985) | 70.690 (46.315–126.826)* | 117.750 (78.500–196.250) | 141.300 (66.725–249.412)
Double-J stent duration, days | 2.000 (0.000–4.000) | 6.500 (0.000–30.000)* | 0.000 (0.000–9.000) | 15.000 (0.000–30.000)
Preoperative fever, no | 1,245 (94%) | 57 (84%)* | 560 (98%) | 26 (84%)
Preoperative fever, yes | 73 (6%) | 11 (16%)* | 13 (2%) | 5 (16%)
Suctioning UAS, no | 123 (9%) | 6 (9%) | 0 (0%) | 0 (0%)
Suctioning UAS, yes | 1,195 (91%) | 62 (91%) | 573 (100%) | 31 (100%)
Values are median (IQR) for continuous variables and n (%) for categorical variables.

*P < 0.05, non-sepsis vs. sepsis

Abbreviations: WBC, White Blood Cell; ALT, Alanine Aminotransferase; BUN, Blood Urea Nitrogen; CR, Creatinine; uWBC, Urinary White Blood Cells; UAS, ureteral access sheaths

Model development and performance comparison

After feature selection, eight features were retained: white blood cell count (WBC), albumin, creatinine (CR), alanine transaminase (ALT), Double-J stent duration, urinary white blood cell count (uWBC), surgical duration, and stone burden. Fifteen models, including Extra Trees, Random Forest, and XGBoost, were constructed, and Table 3 compares their performance metrics. The Extra Trees (ET) model had the highest AUC (0.90), followed closely by Logistic Regression (LR) and the Gradient Boosting Classifier (GBC).

Table 3.

Discriminatory performance of sepsis risk prediction models for patients undergoing flexible ureteroscopic lithotripsy in the training set

Algorithm | Accuracy | AUC | Recall | Precision | F1 | Kappa
ET | 0.9626 | 0.9043 | 0.4467 | 0.6933 | 0.536 | 0.5179
LR | 0.8455 | 0.8877 | 0.69 | 0.2098 | 0.3171 | 0.2606
GBC | 0.9431 | 0.8873 | 0.5467 | 0.4717 | 0.495 | 0.466
XGBoost | 0.9457 | 0.8842 | 0.53 | 0.4456 | 0.4754 | 0.4476
Ridge | 0.8676 | 0.8831 | 0.6733 | 0.2369 | 0.3467 | 0.2945
LDA | 0.8676 | 0.8831 | 0.6733 | 0.2369 | 0.3467 | 0.2945
RF | 0.9533 | 0.8828 | 0.3767 | 0.555 | 0.4422 | 0.4189
LightGBM | 0.9482 | 0.8821 | 0.4967 | 0.477 | 0.4735 | 0.447
SVM | 0.8029 | 0.8743 | 0.7967 | 0.1915 | 0.3042 | 0.2438
ADA | 0.9176 | 0.8114 | 0.5167 | 0.3132 | 0.3798 | 0.3401
QDA | 0.8693 | 0.8101 | 0.6567 | 0.2262 | 0.3328 | 0.2803
NB | 0.8337 | 0.8014 | 0.6967 | 0.2012 | 0.305 | 0.2475
KNN | 0.8701 | 0.7789 | 0.5533 | 0.2265 | 0.3152 | 0.2622
DT | 0.9211 | 0.7119 | 0.48 | 0.3238 | 0.3806 | 0.3414
Dummy | 0.9508 | 0.5 | 0 | 0 | 0 | 0

Abbreviations: ET, Extra Trees Classifier; RF, Random Forest Classifier; Dummy, Dummy Classifier; LightGBM, Light Gradient Boosting Machine; XGBoost, Extreme Gradient Boosting; GBC, Gradient Boosting Classifier; DT, Decision Tree Classifier; ADA, AdaBoost Classifier; KNN, K-Nearest Neighbors Classifier; QDA, Quadratic Discriminant Analysis; Ridge, Ridge Classifier; LDA, Linear Discriminant Analysis; LR, Logistic Regression; NB, Naive Bayes; SVM, Support Vector Machine - Linear Kernel

Identification and validation of the final model

The receiver operating characteristic (ROC) curves of the top four models were then evaluated in the internal and external validation sets (Fig. 2). In internal validation, the Extra Trees (ET) model achieved an AUC of 0.87, the Logistic Regression (LR) model 0.86, the Extreme Gradient Boosting (XGBoost) model 0.84, and the Gradient Boosting Classifier (GBC) model 0.83. In external validation, the ET model maintained good performance with an AUC of 0.81, the LR model achieved an AUC of 0.84, and the GBC model had an AUC of 0.66.

Fig. 2.

ROC curves for the top four machine learning models predicting sepsis risk in patients undergoing fURL. A: ROC curves for the internal validation set. B: ROC curves for the external validation set

To assess the clinical utility and net benefit of these leading models, decision curve analysis (DCA) was performed (Fig. 3). The ET model provided a higher net benefit than the treat-all and treat-none strategies across a clinically relevant range of threshold probabilities: approximately 3% to 34% in internal validation and 3% to 14% in external validation. By comparison, the second-best model, LR, achieved AUCs of 0.89 in the training set, 0.86 in internal validation, and 0.84 in external validation, but its net benefit was limited to threshold probabilities of 3% to 17% in internal validation and 3% to 7% in external validation. Although LR had a slightly higher external AUC, the ET model offered a net benefit over a wider range of thresholds in both validation sets and was therefore chosen for subsequent analysis.
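The net-benefit values underlying a decision curve can be sketched with the standard DCA formula, NB = TP/n − FP/n × pt/(1 − pt), where pt is the threshold probability: true positives are credited, and each false positive is penalized by the odds of the threshold. Function names are hypothetical; treat-none always has a net benefit of 0.

```python
from typing import List, Sequence, Tuple


def net_benefit(y_true: Sequence[int], probs: Sequence[float],
                threshold: float) -> float:
    """Net benefit of treating patients whose predicted risk >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


def treat_all(prevalence: float, threshold: float) -> float:
    """Net benefit of treating every patient as high risk."""
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


def decision_curve(y_true: Sequence[int], probs: Sequence[float],
                   thresholds: Sequence[float]
                   ) -> List[Tuple[float, float, float]]:
    """(threshold, model net benefit, treat-all net benefit) per threshold."""
    prevalence = sum(y_true) / len(y_true)
    return [(t, net_benefit(y_true, probs, t), treat_all(prevalence, t))
            for t in thresholds]
```

A model is clinically useful at a given threshold when its net benefit exceeds both treat-all and treat-none. Treat-all falls to zero when the threshold equals the prevalence (about 5% in this cohort), which is why the informative range of thresholds starts at a few percent.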

Fig. 3.

Decision Curve Analysis (DCA) illustrating the clinical net benefit of the top four machine learning models for predicting sepsis risk in patients undergoing fURL. A: DCA curves for the internal validation set. B: DCA curves for the external validation set

Model explanation

To identify the most influential features and provide a global explanation of their overall impact, we generated a SHAP summary plot of the eight key variables for the ET model using the training dataset. Figure 4A shows the relationship between these features and their SHAP values, with each dot representing an individual patient. A dot's horizontal position is its SHAP value: positive values indicate an increased risk of developing urosepsis and negative values a reduced risk, while the dot's color encodes the feature value (red, high; blue, low). The eight most influential features were urinary white blood cell count (uWBC), Double-J stent duration, white blood cell count (WBC), alanine transaminase (ALT), creatinine (CR), albumin, surgical duration, and stone burden, underscoring their clinical relevance for predicting sepsis in patients undergoing fURL.

Fig. 4.

Global and local model explanations for the ET model. A: SHAP summary dot plot. Each dot represents an individual patient; the x-axis shows the SHAP value (positive values are associated with increased sepsis risk), and the color indicates the feature value (red, high; blue, low). The predicted probability of sepsis increases with a feature's SHAP value. B: SHAP force plot. The base value indicates the mean predicted output. Feature names and their values are shown at the bottom of the plot, with features ordered outward from the center according to the magnitude of their impact

Beyond global explanations, local explanations show how the model arrives at a prediction for an individual patient, as illustrated by the SHAP force plot in Fig. 4B. For instance, the ET model predicted a 17% probability of sepsis following fURL for one patient. In the force plot, risk-increasing features, including surgical duration (90 min), ALT (35 U/L), and extended Double-J stent duration (6 days), appear as red segments, indicating a positive contribution to sepsis risk. In contrast, protective features such as uWBC (28/µL), WBC (7 × 10^9/L), CR (45 µmol/L), and albumin (37 g/L) appear as blue segments, exerting a mitigating influence. The combined effect of these competing factors resulted in a relatively low predicted probability of 17%, reflecting the predominance of protective features in this case.
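The force plot in Fig. 4B rests on SHAP's additivity (local accuracy) property: for a patient x with M features, the model output equals the base value plus the sum of the feature contributions φ_i(x).

```latex
f(x) = \phi_0 + \sum_{i=1}^{M} \phi_i(x), \qquad \phi_0 = \mathbb{E}\left[ f(X) \right]
```

Red segments correspond to φ_i > 0 and blue segments to φ_i < 0; with M = 8, the 17% prediction above is the base value shifted by the net sum of these eight contributions. Whether f is expressed on the probability or the log-odds scale depends on how the explainer was configured, which is not stated here.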

Convenient application for clinical utility

To enable real-time assessment of sepsis risk, the final model was deployed as a web application accessible to doctors, patients, and their families. After the values of the eight key features are entered, the application predicts the individual patient's risk of developing sepsis and generates a force plot showing how each feature contributes to the prediction. Red sections represent features pushing the prediction toward "sepsis," and blue sections represent features pushing it toward "non-sepsis." White arrows separate adjacent features, and the distance between adjacent arrows reflects the influence of the corresponding feature value. The web application is available online at https://predictor-of-sepsis.streamlit.app.

Discussion

Significance of the study

This study highlights the effectiveness of ML models in predicting sepsis risk following fURL. Among the 15 ML models, the ET model achieved the highest AUC together with a favorable net benefit. ET, an ensemble learning method, improves generalization and stability by aggregating many highly randomized decision trees. Previous studies suggest that ET models may have predictive value in medicine [23, 24]. SHAP analysis enhanced the interpretability of the model by identifying the key features influencing sepsis risk: urinary white blood cells, duration of double-J stent placement, albumin, ALT, creatinine, white blood cell count, stone burden, and surgical duration.
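The "highly randomized" part of Extra Trees lies in how each node is split. The sketch below shows that rule under our own simplifications: each candidate feature receives a single threshold drawn uniformly between its minimum and maximum in the node, and the best of these random splits by weighted Gini impurity is kept, instead of searching every possible threshold as a classic decision tree does. A full ensemble (for example scikit-learn's `ExtraTreesClassifier`) grows many such trees and averages their predicted probabilities. Function names are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of a binary label list."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def extra_trees_split(X: Sequence[Sequence[float]], y: Sequence[int],
                      n_candidate_features: int,
                      rng: random.Random) -> Optional[Tuple[int, float]]:
    """Return (feature index, threshold) chosen the Extra-Trees way,
    or None if every candidate feature is constant in this node."""
    n_features = len(X[0])
    features = rng.sample(range(n_features),
                          min(n_candidate_features, n_features))
    best: Optional[Tuple[float, int, float]] = None
    for f in features:
        col: List[float] = [row[f] for row in X]
        lo, hi = min(col), max(col)
        if lo == hi:
            continue  # constant feature in this node: cannot split
        threshold = rng.uniform(lo, hi)  # one random cut, no search
        left = [yi for xi, yi in zip(col, y) if xi < threshold]
        right = [yi for xi, yi in zip(col, y) if xi >= threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if best is None or score < best[0]:
            best = (score, f, threshold)
    return None if best is None else (best[1], best[2])
```

Drawing thresholds at random lowers the variance of the averaged ensemble at the cost of slightly less optimal individual trees.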

All predictive factors incorporated into our model are routinely assessed during the evaluation of surgical indications, ensuring its practical applicability for urologists. Identifying patients at higher risk of sepsis before surgery allows for the implementation of targeted preventive strategies, such as placing a double-J stent, shortening surgical duration, minimizing irrigation fluid usage, and reducing irrigation pressure. Additionally, this model has the potential to help reduce unnecessary antibiotic exposure in low-risk patients, thereby minimizing the risk of complications and combating antimicrobial resistance.

Strengths and limitations

Compared with existing models for predicting sepsis risk following fURL, our ET model has several advantages. First, it achieved an AUC of 0.90 in the training set and performed well in both internal (AUC = 0.87) and external (AUC = 0.81) validation. By comparison, in a retrospective study, Yang et al. used univariate and multivariate binary logistic regression to develop a nomogram for predicting urosepsis after fURL; the nomogram achieved AUCs of 0.887 and 0.864 on internal data [25], but it was not externally validated. Second, the ET model handles nonlinear relationships and complex variable interactions effectively, a distinct advantage when integrating clinical indicators and laboratory results.

Compared with logistic regression, the ET model is less interpretable because of its "black box" nature. SHAP analysis mitigates this by showing how each feature contributes to a prediction. Using SHAP values, clinicians can identify influential variables and improve risk management, addressing the need for interpretable models that integrate effectively into clinical workflows. In addition, we deployed the model as a Streamlit web application, making it accessible online and shareable with a wider group of clinicians.

This study has several limitations. First, key inflammatory markers such as procalcitonin and C-reactive protein were not included, nor were additional variables such as the specific location of the stone and whether the patient initially presented through the emergency department. Incorporating these factors could improve the model's predictive accuracy. Moreover, robust evidence supports a positive preoperative urine culture as a significant risk factor for postoperative sepsis [26]. In our study, preoperative urine culture was also significantly associated with postoperative sepsis; however, it was excluded from the final model during feature selection because its predictive contribution was lower than that of the retained features. Omitting this clinically relevant factor may have reduced the comprehensiveness and predictive power of the model. Second, because this was a retrospective analysis, prospective studies and randomized controlled trials (RCTs) are needed to validate our findings and establish specific recommendations for predicting sepsis following fURL. Third, the model was developed using data from a Chinese population and externally validated at another local hospital. As shown in Table 1, patients from the external hospital had lower white blood cell counts, larger stone burden, higher albumin levels, and longer operation times than those from our hospital. These differences may affect the model's predictive accuracy and generalizability in other populations, so further validation in diverse cohorts is required. Fourth, ML models typically perform best with large datasets; although a sample size calculation was conducted, future research incorporating multicenter data could improve the model's robustness and predictive power. Finally, the low incidence of sepsis following fURL resulted in a class imbalance that favored negative predictions. Although SMOTE was applied to address this imbalance, it remains a potential limitation.

Further research

Future efforts will focus on building larger medical databases, integrating imaging data such as preoperative CT scans, and advancing AI techniques to further improve predictive accuracy [27]. Improved algorithms could be made accessible via smartphones or the cloud, expanding their use in clinical decision-making. However, real-world deployment requires regulatory approval, and concerns remain regarding the reliability of AI-driven diagnoses and the potential for programming biases to affect the accuracy of medical evaluations [28].

Conclusion

In conclusion, this study successfully developed and validated a robust, interpretable machine learning model for predicting sepsis risk following fURL. The integration of SHAP analysis enhances its clinical trustworthiness and applicability. While patients identified by the model as high-risk warrant increased clinical attention, further prospective research is necessary to confirm that model-guided interventions can improve patient outcomes.

Acknowledgements

This work was supported by the Yiyang Central Hospital Research Funding (2024QN02) and Scientific Research Project of the Hunan Provincial Health Commission (D202304056133). The funders had no role in the study design, data collection, manuscript preparation, or decision to publish.

Author contributions

R.L. performed data analyses, established the machine learning models, and drafted the manuscript. L.Z., B.Z., J.M., J.Z., and T.Z. contributed to data collection. R.L. and S.B. were involved in the study design. All authors reviewed and approved the manuscript.

Data availability

The dataset and analysis code used in this study are available for academic research purposes and can be provided upon request by contacting the corresponding author.

Declarations

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

1. Awedew AF, Han H, Berice BN, Dodge M, Schneider RD, Abbasi-Kangevari M, Al-Aly Z, Almidani O, Alvand S, Arabloo J, Aravkin AY, Ayana TM, Bhardwaj N, Bhardwaj P, Bhaskar S, Bikbov B, Santos FLC dos, Charan J, Cruz-Martins N, Dadras O, Dai X, Digesa LE, Elhadi M, Elmonem MA, Esezobor CI, Fatehizadeh A, Gebremeskel TG, Getachew ME, Ghamari S-H, Hay SI, Ilic IM, Ilic MD, Jayarajah U, Jazayeri SB, Kim MS, Lee S-W, Lee SWH, Lim SS, Mahmoud MA, Malik AA, Mentis A-FA, Mestrovic T, Michalek IM, Mihrtie GN, Mirrakhimov EM, Mokdad AH, Moni MA, Moradi M, Murray CJL, Ortiz A, Pawar S, Perico N, Rashidi M-M, Rawassizadeh R, Remuzzi G, Schumacher AE, Singh JA, Skryabin VY, Skryabina AA, Tan K-K, Tolani MA, Tahbaz SV, Valizadeh R, Vo B, Wolde AA, Jabbari SHY, Yazdanpanah F, Yiğit A, Yiğit V, Zahir M, Zastrozhin M, Zhang Z-J, Zumla A, Misganaw A, Dirac MA (2024) The global, regional, and national burden of urolithiasis in 204 countries and territories, 2000–2021: a systematic analysis for the Global Burden of Disease Study 2021. eClinicalMedicine 78:102924. 10.1016/j.eclinm.2024.102924
2. Wang W, Fan J, Huang G, Li J, Zhu X, Tian Y, Su L (2017) Prevalence of kidney stones in mainland China: a systematic review. Sci Rep 7:41630. 10.1038/srep41630
3. He M, Dong Y, Cai W, Cai J, Xie Y, Yu M, Li C, Wen L (2024) Recent advances in the treatment of renal stones using flexible ureteroscopys. Int J Surg Lond Engl 110:4320–4328. 10.1097/JS9.0000000000001345
4. Bhojani N, Miller LE, Bhattacharyya S, Cutone B, Chew BH (2021) Risk factors for urosepsis after ureteroscopy for stone disease: a systematic review with meta-analysis. J Endourol 35:991–1000. 10.1089/end.2020.1133
5. Evans L, Rhodes A, Alhazzani W, Antonelli M, Coopersmith CM, French C, Machado FR, Mcintyre L, Ostermann M, Prescott HC, Schorr C, Simpson S, Wiersinga WJ, Alshamsi F, Angus DC, Arabi Y, Azevedo L, Beale R, Beilman G, Belley-Cote E, Burry L, Cecconi M, Centofanti J, Coz Yataco A, De Waele J, Dellinger RP, Doi K, Du B, Estenssoro E, Ferrer R, Gomersall C, Hodgson C, Møller MH, Iwashyna T, Jacob S, Kleinpell R, Klompas M, Koh Y, Kumar A, Kwizera A, Lobo S, Masur H, McGloughlin S, Mehta S, Mehta Y, Mer M, Nunnally M, Oczkowski S, Osborn T, Papathanassoglou E, Perner A, Puskarich M, Roberts J, Schweickert W, Seckel M, Sevransky J, Sprung CL, Welte T, Zimmerman J, Levy M (2021) Surviving sepsis campaign: international guidelines for management of sepsis and septic shock 2021. Intensive Care Med 47:1181–1247. 10.1007/s00134-021-06506-y
6. Uchida Y, Takazawa R, Kitayama S, Tsujii T (2018) Predictive risk factors for systemic inflammatory response syndrome following ureteroscopic laser lithotripsy. Urolithiasis 46:375–381. 10.1007/s00240-017-1000-3
7. Pietropaolo A, Hendry J, Kyriakides R, Geraghty R, Jones P, Aboumarzouk O, Somani BK (2020) Outcomes of elective ureteroscopy for ureteric stones in patients with prior urosepsis and emergency drainage: prospective study over 5 year from a tertiary endourology centre. Eur Urol Focus 6:151–156. 10.1016/j.euf.2018.09.001
8. Xu CG, Guo YL (2019) Diagnostic and prognostic values of BMPER in patients with Urosepsis following ureteroscopic lithotripsy. BioMed Res Int 2019:8078139. 10.1155/2019/8078139
9. Bai T, Yu X, Qin C, Xu T, Shen H, Wang L, Liu X (2019) Identification of factors associated with postoperative urosepsis after ureteroscopy with holmium:yttrium-aluminum-garnet laser lithotripsy. Urol Int 103:311–317
10. Chugh S, Pietropaolo A, Montanari E, Sarica K, Somani BK (2020) Predictors of urinary infections and urosepsis after ureteroscopy for stone disease: a systematic review from EAU section of urolithiasis (EULIS). Curr Urol Rep 21:16. 10.1007/s11934-020-0969-2
11. Shah M, Naik N, Somani BK, Hameed BZ (2020) Artificial intelligence (AI) in urology-current use and future directions: an iTRUE study. Turk J Urol 46:S27–S39. 10.5152/tud.2020.20117
12. Pietropaolo A, Geraghty RM, Veeratterapillay R, Rogers A, Kallidonis P, Villa L, Boeri L, Montanari E, Atis G, Emiliani E (2021) A machine learning predictive model for post-ureteroscopy urosepsis needing intensive care unit admission: a case–control YAU endourology study from nine European centres. J Clin Med 10:3888
13. Chen M, Yang J, Lu J, Zhou Z, Huang K, Zhang S, Yuan G, Zhang Q, Li Z (2022) Ureteral calculi lithotripsy for single ureteral calculi: can DNN-assisted model help preoperatively predict risk factors for sepsis? Eur Radiol 32:8540–8549. 10.1007/s00330-022-08882-5
14. Fehr J, Citro B, Malpani R, Lippert C, Madai VI (2024) A trustworthy AI reality-check: the lack of transparency of artificial intelligence products in healthcare. Front Digit Health 6:1267290. 10.3389/fdgth.2024.1267290
15. Lu S-C, Swisher CL, Chung C, Jaffray D, Sidey-Gibbons C (2023) On the importance of interpretable machine learning predictions to inform clinical decision making in oncology. Front Oncol. 10.3389/fonc.2023.1129380
16. Tailly T, Nadeau BR, Violette PD, Bao Y, Amann J, Nott L, Denstedt JD, Razvi H (2020) Stone burden measurement by 3D reconstruction on noncontrast computed tomography is not a more accurate predictor of stone-free rate after percutaneous nephrolithotomy than 2D stone burden measurements. J Endourol 34:550–557. 10.1089/end.2019.0718
17. Wagenlehner FM, Tandogdu Z, Bjerklund Johansen TE (2017) An update on classification and management of urosepsis. Curr Opin Urol 27. 10.1097/MOU.0000000000000364
18. Seymour CW, Liu VX, Iwashyna TJ, Brunkhorst FM, Rea TD, Scherag A, Rubenfeld G, Kahn JM, Shankar-Hari M, Singer M, Deutschman CS, Escobar GJ, Angus DC (2016) Assessment of clinical criteria for sepsis: for the third international consensus definitions for sepsis and septic shock (Sepsis-3). JAMA 315:762–774. 10.1001/jama.2016.0288
19. Marik PE, Taeb AM (2017) SIRS, qSOFA and new sepsis definition. J Thorac Dis 9:943–945. 10.21037/jtd.2017.03.125
20. Riley RD, Ensor J, Snell KIE, Harrell FE, Martin GP, Reitsma JB, Moons KGM, Collins G, van Smeden M (2020) Calculating the sample size required for developing a clinical prediction model. BMJ 368:m441. 10.1136/bmj.m441
21. Matharaarachchi S, Domaratzki M, Muthukumarana S (2024) Enhancing SMOTE for imbalanced data with abnormal minority instances. Mach Learn Appl 18:100597. 10.1016/j.mlwa.2024.100597
22. Piovani D, Sokou R, Tsantes AG, Vitello AS, Bonovas S (2023) Optimizing clinical decision making with decision curve analysis: insights for clinical investigators. Healthcare 11:2244. 10.3390/healthcare11162244
23. Ghiasi MM, Zendehboudi S (2021) Application of decision tree-based ensemble learning in the classification of breast cancer. Comput Biol Med 128:104089. 10.1016/j.compbiomed.2020.104089
24. Zhang B, Dong X, Hu Y, Jiang X, Li G (2023) Classification and prediction of spinal disease based on the SMOTE-RFE-XGBoost model. PeerJ Comput Sci 9:e1280. 10.7717/peerj-cs.1280
25. Yang M, Li Y, Huang F (2023) A nomogram for predicting postoperative urosepsis following retrograde intrarenal surgery in upper urinary calculi patients with negative preoperative urine culture. Sci Rep 13:2123. 10.1038/s41598-023-29352-y
26. Sun J, Xu J, OuYang J (2019) Risk factors of infectious complications following ureteroscopy: a systematic review and meta-analysis. Urol Int 104:113–124. 10.1159/000504326
27. Pinto-Coelho L (2023) How artificial intelligence is shaping medical imaging technology: a survey of innovations and applications. Bioengineering 10:1435. 10.3390/bioengineering10121435
28. Zuhair V, Babar A, Ali R, Oduoye MO, Noor Z, Chris K, Okon II, Rehman LU (2024) Exploring the impact of artificial intelligence on global health and enhancing healthcare in developing nations. J Prim Care Community Health 15:21501319241245847. 10.1177/21501319241245847

Articles from Urolithiasis are provided here courtesy of Springer
