Abstract
Introduction
The aim of this pilot study was to develop and validate a machine learning (ML)-based clinical nomogram to predict the success rate of extracorporeal shock wave lithotripsy (ESWL) for kidney stones, with the goal of optimizing patient selection and treatment outcomes.
Material and methods
A retrospective analysis of ESWL data from all nephrolithiasis patients was performed for the period January 2018 to September 2022. Age, gender, stone size, stone area, stone location within the kidney, stone density, skin-to-stone distance (SSD), stent presence, hydronephrosis presence, complications, and number of ESWL procedures were analysed. Inclusion criteria were a single kidney stone, stone size of 5 mm to 20 mm, mean stone density below 1000 HU on native (non-contrast) CT, and adult age. Data are reported as mean ± standard deviation, and groups were compared using the t-test in IBM SPSS Statistics for Macintosh, Version 25.0, with p <0.05 considered statistically significant. Machine learning models were tested on the study data using the Python programming language, with the scikit-learn library as the source of the different ML models.
Results
A total of 102 patients fulfilled the inclusion and exclusion criteria. There were 70 male and 32 female patients. The mean age was 54.1 ±13.2 years, mean stone size 9.1 ±3.1 mm, stone area 47.3 ±30.8 mm2 and SSD 8.4 ±1.8 cm. Patients were divided into two groups. The first group consisted of patients who were stone-free (residual fragments ≤3 mm) after a single ESWL procedure (group ESWL 1st, n = 42); the second group consisted of patients who needed more than one ESWL procedure to become stone-free (group ESWL nth, n = 60). Stone size (7.9 ±2.2 mm vs 10.0 ±3.4 mm, p <0.001), stone area (34.1 ±19.4 mm2 vs 56.6 ±33.9 mm2, p <0.001), and SSD (7.2 ±1.3 cm vs 9.3 ±1.7 cm, p <0.001) differed significantly between the groups. The Linear Discriminant Analysis (LDA) model achieved a mean predictive accuracy of ~70%. The final nomogram identified the highest probability of single-treatment ESWL success for SSD ≤8 cm and stone area ≤60 mm2, in all locations except the lower kidney pole.
Conclusions
Using machine learning, we have developed a nomogram predicting the single-treatment success rate of an ESWL procedure. Applied in clinical practice, this ML nomogram can be continuously improved and refined as new data become available.
Keywords: ESWL, machine learning, artificial intelligence, renal stone, nomogram
INTRODUCTION
Two interventions are currently used most often to treat uncomplicated nephrolithiasis: extracorporeal shock wave lithotripsy (ESWL) and retrograde intrarenal surgery (RIRS). ESWL uses high-energy shock waves to break stones into small fragments, which the patient then passes spontaneously through the urinary tract. It is still considered a standard treatment option for kidney stones up to 20 mm in diameter, with a good long-term stone-free rate (SFR) [1–3]. The method is generally well tolerated, does not require general anesthesia, and has minimal procedure-related complications. The indications for each procedure can vary depending on the stone properties and on patients’ attitudes towards the multiple procedures that may be needed to achieve SFR. ESWL is less invasive than RIRS, but with novel laser techniques RIRS is becoming the preferred procedure. However, RIRS has disadvantages compared with ESWL: a higher incidence of treatment complications, higher cost, and sometimes a longer hospital stay. Choosing the optimal intervention for patients with uncomplicated nephrolithiasis is therefore not straightforward [4]. A better quantitative approach is needed to refine ESWL indications. Machine learning (ML) refers to artificial intelligence (AI) algorithms that improve their performance when exposed to new data. An ML-based nomogram could therefore help select the patients most likely to benefit from ESWL and thereby improve its success rate.
The primary aim of this study was to use ML algorithms to predict single-treatment ESWL success in patients with kidney stones. The secondary aim was to identify stone parameters predicting the success of an ESWL procedure.
MATERIAL AND METHODS
A retrospective analysis of ESWL data from all nephrolithiasis patients was performed for the period January 2018 to September 2022. Age, gender, stone size, stone area, stone location within the kidney, stone density, skin-to-stone distance (SSD), stent presence, hydronephrosis presence, complications, and number of ESWL procedures were analyzed. Inclusion criteria were a single kidney stone, stone size of 5 mm to 20 mm, mean stone density below 1000 HU on native (non-contrast) CT, and adult age. Patients with contraindications to ESWL or a previous stone intervention were excluded from the study (Figure 1). In all patients, ESWL was performed under local anesthesia using the same piezoelectric extracorporeal lithotripter (Sonolith® i-sys), which includes an integrated lithotripsy table and C-arm fluoroscopy. The same standard ramping energy modulation protocol was used in all patients: low energy settings for the first 100 shocks to improve tolerability and minimize soft tissue injury, followed by a gradual increase to therapeutic levels.
Figure 1.
Enrolment diagram.
Stone targeting was conducted by the same urologist using both radiography and ultrasonography. The energy of a single shock wave ranged from 0.15 to 1.2 mJ/mm2, with 2000 to 4000 shock waves administered per procedure.
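The inclusion and exclusion criteria listed above can be expressed as a simple screening function. This is a minimal sketch, not the study's data-handling code: the record fields are hypothetical, "adult" is assumed to mean 18 years or older, and the 5 to 20 mm size bounds are assumed to be inclusive.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # Hypothetical record layout; field names are illustrative only.
    age_years: int
    stone_count: int
    stone_size_mm: float
    mean_density_hu: float          # mean density on native (non-contrast) CT
    eswl_contraindicated: bool
    prior_stone_intervention: bool


ADULT_AGE_YEARS = 18  # assumption: "adult" taken as 18 years or older


def is_eligible(c: Candidate) -> bool:
    """Apply the stated inclusion criteria, then the exclusion criteria."""
    included = (
        c.stone_count == 1
        and 5.0 <= c.stone_size_mm <= 20.0   # assumption: both bounds inclusive
        and c.mean_density_hu < 1000.0
        and c.age_years >= ADULT_AGE_YEARS
    )
    excluded = c.eswl_contraindicated or c.prior_stone_intervention
    return included and not excluded


# Usage: keep only eligible records from a (hypothetical) list of candidates.
cohort = [Candidate(54, 1, 9.0, 850.0, False, False), Candidate(60, 1, 22.0, 700.0, False, False)]
eligible = [c for c in cohort if is_eligible(c)]
```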
Extracorporeal shock wave lithotripsy success rate analysis
ESWL success was defined as stone-free status (SFR), i.e. residual stone fragments of 3 mm or less on non-contrast CT, which was performed in all patients 2 weeks after the procedure. Patients were retrospectively divided into 2 groups. The first group consisted of patients who achieved SFR after one ESWL procedure (group ESWL 1st). The second group consisted of patients who needed more than one ESWL procedure to achieve SFR (group ESWL nth). Stone and patient parameters were compared between the groups.
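As a minimal sketch of this outcome definition, the function below assigns a patient to group ESWL 1st or ESWL nth from the largest residual fragment measured on the 2-week CT after each session. The input format is an assumption, and so is returning None for a patient who never becomes stone-free, since the text does not describe such cases.

```python
from typing import Optional, Sequence

STONE_FREE_THRESHOLD_MM = 3.0  # residual fragments of 3 mm or less count as stone-free


def is_stone_free(max_residual_fragment_mm: float) -> bool:
    # A CT with no visible residual stone can be encoded as 0.0.
    return max_residual_fragment_mm <= STONE_FREE_THRESHOLD_MM


def assign_group(residual_after_each_session_mm: Sequence[float]) -> Optional[str]:
    """residual_after_each_session_mm[i] is the largest residual fragment (mm) on the
    2-week CT after session i + 1. Returns 'ESWL 1st', 'ESWL nth', or None if the
    patient was never stone-free in the recorded sessions (assumed handling)."""
    for session_index, residual in enumerate(residual_after_each_session_mm):
        if is_stone_free(residual):
            return "ESWL 1st" if session_index == 0 else "ESWL nth"
    return None


print(assign_group([2.0]))       # stone-free after the first session
print(assign_group([5.0, 2.5]))  # needed a second session
```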
Data are reported as mean ± standard deviation, and the groups were compared using the t-test. Calculations were performed in IBM SPSS Statistics for Macintosh, Version 25.0, with p <0.05 considered statistically significant.
Machine learning model testing and creation of a nomogram
The next part of the study was ML testing, aimed at predicting the probability of ESWL success. Machine learning models were tested on the study data described above using the Python programming language, with the scikit-learn library as the source of the different ML models. The models tested were Linear Discriminant Analysis (LDA), K Neighbors Classifier (KNN), Decision Tree Classifier (CART), Gaussian Naïve Bayes (NB), and Random Forest Classifier (RFC). Accuracy, precision, and recall were evaluated for each model.
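A comparison of the five named model families could look like the sketch below. It assumes scikit-learn default hyperparameters (the study's settings are not reported), treats label 1 as the positive class (single-treatment success), and runs on synthetic placeholder data generated with `make_classification`, not on the study cohort; any scores it prints say nothing about the study's results.

```python
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# The five model families named in the text, with default settings (assumption).
MODELS = {
    "LDA": LinearDiscriminantAnalysis(),
    "KNN": KNeighborsClassifier(),
    "CART": DecisionTreeClassifier(random_state=0),
    "NB": GaussianNB(),
    "RFC": RandomForestClassifier(random_state=0),
}


def score_models(X_train, y_train, X_test, y_test):
    """Fit a fresh copy of each model and report test-set accuracy, precision, recall."""
    scores = {}
    for name, template in MODELS.items():
        model = clone(template).fit(X_train, y_train)
        y_pred = model.predict(X_test)
        scores[name] = {
            "accuracy": accuracy_score(y_test, y_pred),
            "precision": precision_score(y_test, y_pred, zero_division=0),
            "recall": recall_score(y_test, y_pred, zero_division=0),
        }
    return scores


# Synthetic placeholder data only: 102 rows, 3 features.
X, y = make_classification(n_samples=102, n_features=3, n_informative=2,
                           n_redundant=0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=1 / 3, stratify=y, random_state=0)
for name, s in score_models(X_tr, y_tr, X_te, y_te).items():
    print(f"{name}: acc={s['accuracy']:.2f} prec={s['precision']:.2f} rec={s['recall']:.2f}")
```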
To address measurement imprecision and the limited sample size (n = 102), we applied a parametric-bootstrap noise-injection strategy. Measurement errors in stone size (σ = 0.5 mm) and skin-to-stone distance (σ = 0.5 cm) were each assumed to be normally distributed; for each patient, we generated 100 Monte-Carlo resamples of these 2 variables while keeping all other features and the outcome label unchanged. Patients were then randomly split into a modelling pool (68 patients, 2/3 of each study group) and an independent hold-out test set (34 patients, the remaining 1/3). The modelling pool was expanded by the bootstrap described above to 6800 observations, which were randomly divided 80%/20% into training and validation sets. The final model from this phase was applied to the untouched original values of the test set patients, so the algorithm never encountered the exact values used for evaluation. The steps of (1) patient splitting, (2) bootstrapping, (3) model training/validation, and (4) testing were repeated 100 times, and all reported metrics are means across these runs.
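The four-step protocol above can be sketched with NumPy and scikit-learn as follows. This is an illustrative reconstruction under stated assumptions, not the authors' code: the helper names `add_measurement_noise` and `repeated_holdout` are hypothetical, LDA is used with default settings, the internal validation accuracy is simply averaged and reported, and the demo runs on synthetic placeholder data in which two columns merely stand in for stone size and SSD.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def add_measurement_noise(X, noisy_cols, sigmas, n_replicas, rng):
    """Parametric bootstrap: repeat every patient row n_replicas times and add
    N(0, sigma) noise to the columns in noisy_cols only; other columns are copied."""
    X_rep = np.repeat(np.asarray(X, dtype=float), n_replicas, axis=0)
    for col, sigma in zip(noisy_cols, sigmas):
        X_rep[:, col] += rng.normal(0.0, sigma, size=X_rep.shape[0])
    return X_rep


def repeated_holdout(X, y, noisy_cols, sigmas, n_replicas=100, n_runs=100, seed=0):
    """Return (mean validation accuracy, mean hold-out accuracy) over n_runs repeats."""
    rng = np.random.default_rng(seed)
    val_accs, test_accs = [], []
    for _ in range(n_runs):
        # (1) Patient-level split, stratified by outcome: 2/3 modelling pool, 1/3 hold-out.
        X_pool, X_test, y_pool, y_test = train_test_split(
            X, y, test_size=1 / 3, stratify=y, random_state=int(rng.integers(2**31 - 1)))
        # (2) Expand only the modelling pool; outcome labels are repeated unchanged.
        X_aug = add_measurement_noise(X_pool, noisy_cols, sigmas, n_replicas, rng)
        y_aug = np.repeat(y_pool, n_replicas)
        # (3) 80/20 train/validation split of the augmented pool. Replicas of one patient
        #     can land on both sides, so this validation score is optimistic by design.
        X_tr, X_val, y_tr, y_val = train_test_split(
            X_aug, y_aug, test_size=0.2, random_state=int(rng.integers(2**31 - 1)))
        model = LinearDiscriminantAnalysis().fit(X_tr, y_tr)
        val_accs.append(accuracy_score(y_val, model.predict(X_val)))
        # (4) Evaluate on the untouched original values of the hold-out patients.
        test_accs.append(accuracy_score(y_test, model.predict(X_test)))
    return float(np.mean(val_accs)), float(np.mean(test_accs))


# Synthetic placeholder cohort (NOT study data): columns 0 and 1 stand in for stone
# size (mm) and SSD (cm) and receive noise; column 2 stands in for an un-noised feature.
X_demo, y_demo = make_classification(
    n_samples=102, n_features=3, n_informative=2, n_redundant=0, random_state=0)
val_mean, test_mean = repeated_holdout(X_demo, y_demo, noisy_cols=[0, 1], sigmas=[0.5, 0.5])
print(f"mean validation accuracy {val_mean:.2f}, mean hold-out accuracy {test_mean:.2f}")
```

Note that the sigmas must be expressed in the same units as the corresponding columns (here mm for stone size and cm for SSD).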
Bioethical standards
The study was approved by Ethics Committee of Comenius University, Martin, Slovakia (No. approval: EK UNM 47/2020).
RESULTS
A total of 102 patients fulfilled the inclusion and exclusion criteria. There were 70 male and 32 female patients. The mean age was 54.1 ±13.2 years, mean stone size 9.1 ±3.1 mm, stone area 47.3 ±30.8 mm2, and SSD 8.4 ±1.8 cm.
Extracorporeal shock wave lithotripsy success rate analysis
Patients were divided into 2 groups: single-treatment success (group ESWL 1st, n = 42) and single-treatment failure (group ESWL nth, n = 60) (Figure 1). The mean residual fragment size in group ESWL 1st was 2 mm. Mean stone size, stone area, and SSD differed significantly between the groups (Table 1), each with p <0.001. There were no statistically significant differences between the groups in age, sex, stone density, hydronephrosis presence, or complication rate. Pre-stenting was performed in 51 patients (50%), with no statistically significant difference between the groups. Two patients developed a renal hematoma as a complication of the procedure; both were treated conservatively.
Table 1.
Parameters differing significantly between the groups (all p <0.001); values are mean ± standard deviation
| Parameter | ESWL 1st (n = 42) | ESWL nth (n = 60) |
|---|---|---|
| Stone size (mm) | 7.9 ±2.2 | 10.0 ±3.4 |
| Stone area (mm2) | 34.1 ±19.4 | 56.6 ±33.9 |
| SSD (cm) | 7.2 ±1.3 | 9.3 ±1.7 |
SSD – skin-to-stone distance
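Because Table 1 reports means, standard deviations and group sizes, the group comparisons can be re-checked from these summaries alone. The sketch below uses SciPy's `ttest_ind_from_stats` and assumes Student's pooled-variance t-test (the text does not state which t-test variant was reported); since the table values are rounded, the resulting statistics are approximate.

```python
from scipy.stats import ttest_ind_from_stats

# (mean, SD) for group ESWL 1st and group ESWL nth, copied from Table 1.
TABLE_1 = {
    "Stone size (mm)": ((7.9, 2.2), (10.0, 3.4)),
    "Stone area (mm2)": ((34.1, 19.4), (56.6, 33.9)),
    "SSD (cm)": ((7.2, 1.3), (9.3, 1.7)),
}
N_FIRST, N_NTH = 42, 60

results = {}
for parameter, ((m1, s1), (m2, s2)) in TABLE_1.items():
    # Student's two-sample t-test with pooled variance (assumed variant).
    res = ttest_ind_from_stats(m1, s1, N_FIRST, m2, s2, N_NTH, equal_var=True)
    results[parameter] = (res.statistic, res.pvalue)
    print(f"{parameter}: t = {res.statistic:.2f}, p = {res.pvalue:.1e}")
```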
Machine learning model testing and creation of a nomogram
For ML purposes, the scikit-learn models were first tested on all recorded parameters. Stone location within the kidney, SSD, and stone area appeared to be the most important and were used in the final nomogram.
Using Python and machine learning models from the scikit-learn library, multiple models were trained to predict assignment to group ESWL 1st or ESWL nth. Several models suffered from overtraining (overfitting). Across the repeated splits described in the Material and methods section, the LDA model achieved the best average results (approximately 70% mean accuracy). This process is shown schematically in Figure 2, and the scores are shown in Figure 3.
Figure 2.
Visualization of single learning and verification cycle.
Figure 3.
Accuracy, precision, and recall scores for the tested ML models.
CART – Decision Tree Classifier; KNN – K Neighbors Classifier; LDA – Linear Discriminant Analysis; NB – Gaussian Naïve Bayes; RFC – Random Forest Classifier
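Overtraining can be made visible by comparing accuracy on the training data with accuracy on held-out data. In this sketch on synthetic placeholder data (20% of labels deliberately flipped via `make_classification`'s `flip_y`), an unconstrained decision tree (CART) memorizes its training set, while LDA, being a linear model with few parameters, is far less able to do so. The train/test gap is one simple indicator and is not presented as the criterion the study used.

```python
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def overfitting_gap(model, X_train, y_train, X_test, y_test):
    """Return (train accuracy, test accuracy, gap); a large positive gap signals overtraining."""
    fitted = clone(model).fit(X_train, y_train)
    train_acc = fitted.score(X_train, y_train)
    test_acc = fitted.score(X_test, y_test)
    return train_acc, test_acc, train_acc - test_acc


# Synthetic placeholder data with label noise (NOT study data).
X, y = make_classification(n_samples=102, n_features=3, n_informative=2,
                           n_redundant=0, flip_y=0.2, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=1 / 3, stratify=y, random_state=1)

candidates = {"LDA": LinearDiscriminantAnalysis(), "CART": DecisionTreeClassifier(random_state=0)}
gaps = {name: overfitting_gap(m, X_tr, y_tr, X_te, y_te) for name, m in candidates.items()}
for name, (train_acc, test_acc, gap) in gaps.items():
    print(f"{name}: train={train_acc:.2f} test={test_acc:.2f} gap={gap:+.2f}")
```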
After successful testing, a nomogram was created and used to estimate the probability of ESWL success (Figure 4). The nomogram predicts single-treatment ESWL success based on SSD, stone area, and stone location. The highest probability of single-treatment success was found for SSD ≤8 cm and stone area ≤60 mm2 in all locations except the lower kidney pole (Figure 4).
Figure 4.
Best probability of a single ESWL treatment success.
SSD – skin-to-stone distance
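The region of highest predicted probability described above can be written as a simple rule. This is only a coarse summary of Figure 4, not the nomogram itself, which outputs a continuous probability; the location labels are hypothetical because the study's coding of stone location is not reported.

```python
# Hypothetical location label; the study's coding of stone location is not reported.
LOWER_POLE = "lower pole"


def in_best_probability_region(ssd_cm: float, stone_area_mm2: float, location: str) -> bool:
    """True if the stone lies in the region the nomogram associates with the highest
    probability of single-treatment ESWL success: SSD <= 8 cm, area <= 60 mm2,
    and any location except the lower kidney pole."""
    return ssd_cm <= 8.0 and stone_area_mm2 <= 60.0 and location != LOWER_POLE


print(in_best_probability_region(7.5, 40.0, "upper pole"))
print(in_best_probability_region(7.5, 40.0, LOWER_POLE))
```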
DISCUSSION
Kidney stone disease is an increasingly important urological problem, as data show a rising incidence [3, 5]. ESWL remains a standard procedure that is still popular because of its wide availability and good results. However, indications vary according to departmental preference, and several recent studies favor RIRS [6].
Patients after RIRS have been shown to achieve better health-related quality of life (HRQOL) scores than ESWL patients [7]. These patients preferred RIRS because it was faster and less burdensome than ESWL. This led us to work on refining ESWL indications so that a single procedure is sufficient to achieve SFR.
According to our findings, stone area and SSD play a significant role in ESWL success. The role of SSD and stone burden has been confirmed by other authors [8]. However, stone area is not routinely used to analyze ESWL success, whereas our study suggests that this parameter is crucial. Recent studies also mention stone volume as a novel parameter for assessing stone burden, although its superiority remains to be proven [9]. In our study, stone area was used because of the retrospective nature of the dataset and the limited availability of volumetric data across all patients.
A recent study by Kayar et al. [10] developed a nomogram for predicting stone-free outcomes after RIRS. The model stratified patients into 4 distinct risk categories based on clinical and anatomical factors. Notably, ROC analysis identified stone volume as a stronger predictor of treatment success than stone size. These findings support the growing role of individualized predictive tools in endourology and provide valuable context for interpreting our ESWL-specific model.
Stone density – despite being traditionally regarded as a key predictor of ESWL success – did not emerge as a significant variable in our nomogram. Several factors may account for this finding. First, stones in our cohort had an average density below 1000 HU, which may have limited variability and thus reduced the strength of association. Second, as this was a retrospective study, HU measurements were not standardized across all scans, and differences in CT protocols, slice thickness, or reconstruction parameters may have introduced measurement noise and attenuated true correlations. Third, stone density may be partially collinear with other predictive variables, thereby diluting its independent contribution in the multivariate model.
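The third explanation, collinearity, could be checked with variance inflation factors (VIFs), a standard diagnostic that was not reported in this study. In this NumPy sketch, VIF_j = 1/(1 − R_j²), where R_j² comes from an ordinary least-squares regression (with intercept) of column j on all other columns. The demo data are synthetic: a "density-like" column is constructed to depend strongly on a "size-like" column, while a third column is independent. A common heuristic treats VIF values above roughly 5 to 10 as a sign of problematic collinearity.

```python
import numpy as np


def variance_inflation_factors(X):
    """Return VIF_j = 1 / (1 - R_j^2) for every column j of X, where R_j^2 is the
    coefficient of determination from an OLS regression (with intercept) of column j
    on all remaining columns."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        target = X[:, j]
        design = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        residual = target - design @ coef
        r2 = 1.0 - residual.var() / target.var()
        vifs[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return vifs


# Synthetic demo (NOT study data): column 2 is built to depend strongly on column 0.
rng = np.random.default_rng(0)
size_like = rng.normal(size=200)
independent = rng.normal(size=200)
density_like = 0.8 * size_like + 0.2 * rng.normal(size=200)
vifs = variance_inflation_factors(np.column_stack([size_like, independent, density_like]))
print(np.round(vifs, 1))
```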
For the ESWL procedures in our study, patients received only analgesic medication; no general anesthesia was used. All patients were treated with the same lithotripter, with stone targeting performed by the same urologist using both radiography and ultrasonography.
Interestingly, stent presence did not differ between the 2 groups in our study. Following the usual routine at our department, the pre-stenting rate was high. A meta-analysis has reported that pre-stenting before ESWL does not improve the stone-free rate and can induce lower urinary tract symptoms [11]. We believe that our ML nomogram could help reduce the pre-stenting rate in patients with a high predicted probability of success after the first ESWL procedure.
Machine learning as a problem-solving tool is still a novelty for many clinicians, and to date ML has seldom been used to improve ESWL success rates. In one study [12], the authors demonstrated that a machine learning algorithm trained on just 11 patients increased the shockwave hit rate. Another study developed a CT-based AI model predicting the outcome of single-session ESWL for ureteral stones [13]. However, to our knowledge, no study has focused on single-treatment success for kidney stones, which was the aim of our study.
During our ML process, overtraining of the models was a challenge. We tested several models to achieve the best possible results; this is standard practice in ML, because different models perform differently on the same data. In our study, the LDA model showed the best performance, and the final nomogram was based on this model.
However, ML is usually applied to larger datasets, and this could have affected our results. To address measurement imprecision and the limited sample size, we applied a parametric-bootstrap noise-injection strategy, as described in the Material and methods section. This protocol injects realistic measurement noise and averages over 100 independent experiments, reducing the risk of overfitting while extracting as much information as possible from a modest cohort. A conventional k-fold scheme would interleave each patient’s 100 noised replicas across all folds, so each validation fold would inevitably be contaminated by versions of the same subjects that appear in training, inflating accuracy and leaving no truly independent check. In our 100-repeat pipeline, the leakage inside the internal 80:20 split is harmless because each run is ultimately judged on a separate one-third hold-out of untouched originals; k-fold cross-validation would omit that safeguard and therefore exaggerate performance, while offering a noisier estimate drawn from a small number of original patients.
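The leakage argument above can be illustrated by counting how many validation patients also appear in training under each scheme. This sketch uses only patient ID arrays (no clinical data); the 5 folds and the random seeds are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.model_selection import KFold, train_test_split

N_PATIENTS, N_REPLICAS = 102, 100
patient_ids = np.arange(N_PATIENTS)

# Naive scheme: expand every patient into replicas first, then run k-fold over the rows.
row_ids = np.repeat(patient_ids, N_REPLICAS)
kfold = KFold(n_splits=5, shuffle=True, random_state=0)
train_rows, val_rows = next(kfold.split(row_ids.reshape(-1, 1)))
naive_overlap = np.intersect1d(row_ids[train_rows], row_ids[val_rows]).size

# Scheme described in Material and methods: split patients first, then expand only
# the modelling pool, so hold-out patients never contribute replicas to training.
pool_ids, holdout_ids = train_test_split(patient_ids, test_size=1 / 3, random_state=0)
augmented_pool_ids = np.repeat(pool_ids, N_REPLICAS)
holdout_overlap = np.intersect1d(augmented_pool_ids, holdout_ids).size

n_val_patients = np.unique(row_ids[val_rows]).size
print(f"k-fold on replicas: {naive_overlap} of {n_val_patients} validation patients also in training")
print(f"patient-level hold-out: {holdout_overlap} hold-out patients in the training pool")
```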
To assess the stability of our results, the whole process was repeated 100 times with random selection of the modelling and test sets. This approach is still not ideal and can introduce biases into the resulting models. This can be overcome only with a larger input sample, but there is no universally applicable mathematical formula for determining the minimum sample size. While methods for estimating sample size exist [14], the answer only becomes available after testing. A general rule of thumb, however, suggests 10 to 1000 samples per observed variable.
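Applied to the final nomogram, this rule of thumb gives a quick worked bound. Assuming p = 3 predictors (SSD, stone area and stone location, counting location as a single variable; one-hot encoding of location would increase p) and a multiplier k between 10 and 1000:

$$N_{\min} \approx k\,p,\qquad k \in [10,\ 1000] \;\Rightarrow\; N_{\min} \in [10 \cdot 3,\ 1000 \cdot 3] = [30,\ 3000]$$

The cohort of 102 patients therefore lies within this range, but close to its lower end.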
The strengths of our study include a comprehensive analysis of several stone and patient parameters, a cohort in which all patients had a single kidney stone, and a multi-model ML analysis of these data.
This study has several limitations that should be acknowledged. First, the relatively small sample size of 102 patients limits the generalizability of the findings. This limitation was partly mitigated by the noise-augmented dataset; although the sample is sufficient for a pilot study, a larger cohort would be necessary to further validate the model and improve its predictive accuracy. Second, the retrospective nature of the study may introduce biases related to data collection and patient selection. Additionally, because the study was conducted at a single institution, the findings may not be fully generalizable to other centers with different patient populations, equipment, or treatment protocols. Furthermore, although multiple clinical and radiological variables were analyzed, additional factors such as stone composition and patient-reported outcomes could further refine the model’s predictive performance. The study also used a single piezoelectric lithotripter, so the results may not be directly applicable to other types of ESWL devices, such as electromagnetic or electrohydraulic lithotripters. Despite these limitations, this study highlights the potential of machine learning for optimizing ESWL outcomes and provides a foundation for future research with larger, prospectively collected datasets and external validation cohorts.
CONCLUSIONS
Using machine learning, we have introduced a nomogram to predict the single-treatment success rate of an ESWL procedure. The highest probability of ESWL success was found for SSD ≤8 cm and stone area ≤60 mm2 in all locations except the lower kidney pole. Applied in clinical practice, the ML nomogram can be continuously improved and refined as new data become available.
Funding Statement
This research received no external funding.
CONFLICTS OF INTEREST
The authors declare no conflict of interest.
ETHICS APPROVAL STATEMENT
The study was approved by Ethics Committee of Comenius University, Martin, Slovakia (No. approval: EK UNM 47/2020).
This study was conducted at the Jessenius Faculty of Medicine, Comenius University, in cooperation with CERN, Switzerland.
References
- 1. Basulto-Martínez M, Olvera-Posada D, Velueta-Martínez IA, et al. Quality of life in patients with kidney stones: translation and validation of the Spanish Wisconsin Stone Quality of Life Questionnaire. Urolithiasis. 2020; 48: 419–424.
- 2. Marseille E, Larson B, Kazi DS, Kahn JG, Rosen S. Thresholds for the cost-effectiveness of interventions: alternative approaches. Bull World Health Organ. 2015; 93: 118–124.
- 3. Skolarikos A, Jung H, Neisius A, et al. EAU Guidelines on Urolithiasis. EAU Guidelines. Presented at the EAU Annual Congress, Paris, 2024.
- 4. Noble PA, Hamilton BD, Gerber G. Stone decision engine accurately predicts stone removal and treatment complications for shock wave lithotripsy and laser ureterorenoscopy patients. PLoS One. 2024; 19: e0301812.
- 5. Siener R. Nutrition and kidney stone disease. Nutrients. 2021; 13: 1917.
- 6. Bosio A, Alessandria E, Dalmasso E, et al. Flexible ureterorenoscopy versus shockwave lithotripsy for kidney stones ≤2 cm: a randomized controlled trial. Eur Urol Focus. 2022; 8: 1816–1822.
- 7. Svihra J Jr, Sopilko I, Svihrova V, Student V, Luptak J. Is health-related quality of life of patients after single-use flexible ureteroscopy superior to extracorporeal shock wave lithotripsy? A randomised prospective study. Urolithiasis. 2021; 49: 73–79.
- 8. Güler Y. Non-contrast computed tomography-based factors in predicting ESWL success: a systematic review and meta-analysis. Prog Urol. 2023; 33: 27–47.
- 9. Panthier F, Kutchukian S, Ducousso H, et al. How to estimate stone volume and its use in stone surgery: a comprehensive review. Actas Urol Esp (Engl Ed). 2024; 48: 71–78.
- 10. Kayar K, Kayar R, Tuncel KG, Tosun C, Yucebas OE. Stone-free rate after RIRS: a multivariable analysis and predictive nomogram from a single-center study. World J Urol. 2025; 43: 369.
- 11. Shen P, Jiang M, Yang J, et al. Use of ureteral stent in extracorporeal shock wave lithotripsy for upper urinary calculi: a systematic review and meta-analysis. J Urol. 2011; 186: 1328–1335.
- 12. Muller S, Abildsnes H, Østvik A, Kragset O, Gangås I, Birke H, et al. Can a dinosaur think? Implementation of artificial intelligence in extracorporeal shock wave lithotripsy. Eur Urol Open Sci. 2021; 27: 33–42.
- 13. Yang H, Wu X, Liu W, et al. CT-based AI model for predicting therapeutic outcomes in ureteral stones after single extracorporeal shock wave lithotripsy through a cohort study. Int J Surg. 2024; 110: 6601–6609.
- 14. Balki I, Amirabadi A, Levman J, et al. Sample-size determination methodologies for machine learning in medical imaging research: a systematic review. Can Assoc Radiol J. 2019; 70: 344–353.




