. 2024 Apr 11;13(4):512–527. doi: 10.21037/gs-23-417

Table 2. Diagnostic performance of three deep learning models (ResNet50, InceptionV3 and DenseNet121), each pretrained on RadImageNet or ImageNet, in the four classification tasks.

                            RadImageNet                                                  ImageNet
Task           Model        ACC    AUC (95% CI)         Sensitivity  Specificity  F1     ACC    AUC (95% CI)         Sensitivity  Specificity  F1
Nuclear grade  ResNet50     0.667  0.560 (0.469–0.571)  0.400        0.720        0.286  0.610  0.510 (0.486–0.619)  0.474        0.458        0.452
               InceptionV3  0.828  0.510 (0.485–0.515)  0.030        0.987        0.061  0.806  0.537 (0.465–0.563)  0.531        0.500        0.513
               DenseNet121  0.761  0.540 (0.474–0.547)  0.200        0.873        0.218  0.650  0.563 (0.450–0.571)  0.433        0.693        0.292
ER             ResNet50     0.558  0.574 (0.450–0.589)  0.524        0.623        0.610  0.642  0.520 (0.417–0.548)  0.903        0.151        0.772
               InceptionV3  0.532  0.480 (0.406–0.527)  0.651        0.302        0.647  0.577  0.579 (0.448–0.586)  0.573        0.585        0.641
               DenseNet121  0.513  0.460 (0.447–0.513)  0.621        0.302        0.628  0.526  0.540 (0.467–0.550)  0.553        0.472        0.606
PR             ResNet50     0.610  0.570 (0.496–0.587)  0.920        0.220        0.730  0.526  0.493 (0.472–0.537)  0.744        0.242        0.640
               InceptionV3  0.474  0.460 (0.433–0.491)  0.558        0.364        0.546  0.513  0.400 (0.386–0.533)  0.872        0.045        0.669
               DenseNet121  0.493  0.460 (0.453–0.521)  0.698        0.227        0.609  0.552  0.530 (0.497–0.553)  0.697        0.364        0.638
HER2           ResNet50     0.649  0.583 (0.455–0.584)  0.396        0.330        0.422  0.541  0.450 (0.416–0.566)  0.563        0.530        0.442
               InceptionV3  0.541  0.573 (0.495–0.583)  0.667        0.480        0.485  0.622  0.525 (0.489–0.568)  0.250        0.800        0.300
               DenseNet121  0.642  0.530 (0.455–0.535)  0.208        0.850        0.274  0.669  0.560 (0.468–0.566)  0.250        0.870        0.329

ACC, accuracy; AUC, area under the curve; CI, confidence interval; F1, F1 score; ER, estrogen receptor; PR, progesterone receptor; HER2, human epidermal growth factor receptor 2.
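The metrics abbreviated above follow the standard binary-classification definitions. As a minimal sketch, assuming each task is scored as a binary problem (for example, receptor-positive vs receptor-negative), the code below computes ACC, sensitivity, specificity and F1 from confusion-matrix counts, and AUC as a pairwise ranking probability. The names `ConfusionCounts`, `binary_metrics`, `accuracy_from_rates` and `auc_rank`, and all counts and scores, are hypothetical illustrations rather than the authors' pipeline; how the study derived its 95% CIs is not stated in this table and is not reproduced here.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class ConfusionCounts:
    """Confusion-matrix counts for one binary task (hypothetical example)."""
    tp: int  # positive cases predicted positive
    fn: int  # positive cases predicted negative
    tn: int  # negative cases predicted negative
    fp: int  # negative cases predicted positive


def binary_metrics(c: ConfusionCounts) -> Dict[str, float]:
    """ACC, sensitivity, specificity and F1 from confusion-matrix counts."""
    total = c.tp + c.fn + c.tn + c.fp
    acc = (c.tp + c.tn) / total
    sensitivity = c.tp / (c.tp + c.fn)  # true positive rate (recall)
    specificity = c.tn / (c.tn + c.fp)  # true negative rate
    # F1 is the harmonic mean of precision and sensitivity, which
    # simplifies to 2TP / (2TP + FP + FN).
    denom = 2 * c.tp + c.fp + c.fn
    f1 = 2 * c.tp / denom if denom else 0.0
    return {"ACC": acc, "Sensitivity": sensitivity,
            "Specificity": specificity, "F1": f1}


def accuracy_from_rates(sensitivity: float, specificity: float,
                        prevalence: float) -> float:
    """Accuracy as the prevalence-weighted mean of sensitivity and specificity."""
    return prevalence * sensitivity + (1 - prevalence) * specificity


def auc_rank(scores_pos: Sequence[float], scores_neg: Sequence[float]) -> float:
    """AUC as P(score of a random positive > score of a random negative).

    Ties count as one half; this equals the Mann-Whitney U statistic
    divided by (number of positives * number of negatives).
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


if __name__ == "__main__":
    # Hypothetical counts: 40 positive and 60 negative cases.
    counts = ConfusionCounts(tp=30, fn=10, tn=40, fp=20)
    for name, value in binary_metrics(counts).items():
        print(f"{name}: {value:.3f}")
    # Hypothetical model scores for positive and negative cases.
    print(f"AUC: {auc_rank([0.9, 0.8, 0.4], [0.7, 0.4, 0.2]):.3f}")
```

Because ACC equals prevalence × sensitivity + (1 − prevalence) × specificity, a reported accuracy should always lie between the corresponding sensitivity and specificity; `accuracy_from_rates` makes that relationship easy to check for any row.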