Table 2.
Cross-validation results for seven deep learning models on the kidney ultrasound classification task. For each model, the table reports the average accuracy (± standard deviation) across five folds and the best validation accuracy achieved in any single fold. YOLO11x-cls achieves the highest average accuracy and the highest peak validation accuracy, suggesting strong generalization across different patient subsets. The standard deviation reflects the variance in performance across folds and indicates how stable each model is when trained on different subsets of the data.
| Model | Average accuracy across folds (%) | Best validation accuracy (%) |
|---|---|---|
| InceptionV3 | ![]() | 94.21 |
| EfficientNet | ![]() | 72.08 |
| VGG16 | ![]() | 88.75 |
| ResNet34 | ![]() | 89.20 |
| ResNet50 | ![]() | 90.08 |
| YOLOv8x-cls | ![]() | 95.44 |
| YOLO11x-cls | 90 ± 5.9 | 95.9 |
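
As a rough illustration of how the two reported columns relate to per-fold results, the sketch below computes a mean, a standard deviation, and a best-fold value from five accuracies. The helper `summarize_folds` and the fold values are hypothetical and are not taken from the study. The use of the sample standard deviation (n − 1 denominator) is also an assumption, since the caption does not say which estimator was used.

```python
from statistics import mean, stdev


def summarize_folds(fold_accuracies):
    """Return (mean, sample std, best) for per-fold validation accuracies in percent."""
    if len(fold_accuracies) < 2:
        raise ValueError("need at least two folds to compute a standard deviation")
    # stdev() is the sample standard deviation (n - 1 denominator); this choice
    # is an assumption, the source does not state which estimator it used.
    return mean(fold_accuracies), stdev(fold_accuracies), max(fold_accuracies)


# Hypothetical per-fold accuracies (%), invented for illustration only;
# they are NOT results from the study.
example_folds = [80.0, 85.0, 90.0, 95.0, 100.0]

avg, sd, best = summarize_folds(example_folds)
print(f"average: {avg:.1f} ± {sd:.1f} | best fold: {best:.1f}")
```

A model with a high best-fold accuracy but a large standard deviation would look strong in the third column while being less stable across folds, which is why the two columns are read together.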





