2025 Nov 25;15:41940. doi: 10.1038/s41598-025-25755-1

Table 2.

Cross-validation results for seven deep learning models on the kidney ultrasound classification task. For each model, the table reports the average accuracy (± standard deviation) across the five folds and the best validation accuracy achieved in any single fold. YOLO11x-cls achieves the highest average accuracy and the highest single-fold validation accuracy, indicating strong generalization across different patient subsets. The standard deviation reflects the variance in performance across folds and therefore each model’s stability when trained on different subsets of the data.

Model	Average accuracy across folds (%)	Best validation accuracy (%)
InceptionV3		94.21
EfficientNet		72.08
VGG16		88.75
ResNet34		89.20
ResNet50		90.08
YOLOv8x-cls		95.44
YOLO11x-cls	90 ± 5.9	95.9
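
The two reported quantities can be illustrated with a minimal sketch of how a "mean ± standard deviation" entry and a "best fold" entry might be derived from five per-fold accuracies. The function name `summarize_folds` and the per-fold values are hypothetical and are not taken from the paper. The sketch also assumes the sample standard deviation (n − 1 denominator), since the source does not state which estimator was used.

```python
from statistics import mean, stdev


def summarize_folds(fold_accuracies):
    """Return (mean, sample standard deviation, best) of per-fold accuracies in percent."""
    if len(fold_accuracies) < 2:
        raise ValueError("need at least two folds to compute a standard deviation")
    # stdev() uses the n - 1 denominator; this choice is an assumption,
    # the paper does not say whether sample or population std was reported.
    return mean(fold_accuracies), stdev(fold_accuracies), max(fold_accuracies)


# Hypothetical per-fold validation accuracies (%), NOT values from the paper.
example_folds = [70.0, 80.0, 90.0, 85.0, 75.0]

avg, sd, best = summarize_folds(example_folds)
print(f"{avg:.1f} ± {sd:.1f} (best fold: {best:.1f})")
```

A large standard deviation relative to the gap between models suggests that differences in average accuracy should be read with caution, whereas the best-fold figure shows only the most favourable split.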