2024 Apr 11;13(4):512–527. doi: 10.21037/gs-23-417

Table 2. Diagnostic performance of the three pretrained deep learning models in the four classification tasks.

		RadImageNet					ImageNet
Tasks	Models	ACC	AUC (95% CI)	Sensitivity	Specificity	F1	ACC	AUC (95% CI)	Sensitivity	Specificity	F1
Nuclear grade	ResNet50	0.667	0.560 (0.469–0.571)	0.400	0.720	0.286	0.610	0.510 (0.486–0.619)	0.474	0.458	0.452
	InceptionV3	0.828	0.510 (0.485–0.515)	0.030	0.987	0.061	0.806	0.537 (0.465–0.563)	0.531	0.500	0.513
	DenseNet121	0.761	0.540 (0.474–0.547)	0.200	0.873	0.218	0.650	0.563 (0.450–0.571)	0.433	0.693	0.292
ER	ResNet50	0.558	0.574 (0.450–0.589)	0.524	0.623	0.610	0.642	0.520 (0.417–0.548)	0.903	0.151	0.772
	InceptionV3	0.532	0.480 (0.406–0.527)	0.651	0.302	0.647	0.577	0.579 (0.448–0.586)	0.573	0.585	0.641
	DenseNet121	0.513	0.460 (0.447–0.513)	0.621	0.302	0.628	0.526	0.540 (0.467–0.550)	0.553	0.472	0.606
PR	ResNet50	0.610	0.570 (0.496–0.587)	0.920	0.220	0.730	0.526	0.493 (0.472–0.537)	0.744	0.242	0.640
	InceptionV3	0.474	0.460 (0.433–0.491)	0.558	0.364	0.546	0.513	0.400 (0.386–0.533)	0.872	0.045	0.669
	DenseNet121	0.493	0.460 (0.453–0.521)	0.698	0.227	0.609	0.552	0.530 (0.497–0.553)	0.697	0.364	0.638
HER2	ResNet50	0.649	0.583 (0.455–0.584)	0.396	0.330	0.422	0.541	0.450 (0.416–0.566)	0.563	0.530	0.442
	InceptionV3	0.541	0.573 (0.495–0.583)	0.667	0.480	0.485	0.622	0.525 (0.489–0.568)	0.250	0.800	0.300
	DenseNet121	0.642	0.530 (0.455–0.535)	0.208	0.850	0.274	0.669	0.560 (0.468–0.566)	0.250	0.870	0.329

ACC, accuracy; AUC, area under the curve; CI, confidence interval; ER, estrogen receptor; PR, progesterone receptor; HER2, human epidermal growth factor receptor 2.
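
For reference, the threshold-based columns in Table 2 (ACC, sensitivity, specificity, F1) can be related to a binary confusion matrix as in the minimal sketch below. It assumes the standard textbook definitions; the evaluation code used in the study is not shown here, and the function name `binary_metrics` and the example counts are hypothetical. AUC is omitted because it requires the predicted scores rather than the counts alone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryMetrics:
    acc: float
    sensitivity: float
    specificity: float
    f1: float


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> BinaryMetrics:
    """Standard binary metrics from confusion-matrix counts (hypothetical helper)."""
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")

    def ratio(num: int, den: int) -> float:
        # Return 0.0 when a denominator is zero (e.g. no positive cases).
        return num / den if den else 0.0

    return BinaryMetrics(
        acc=ratio(tp + tn, total),            # (TP + TN) / all cases
        sensitivity=ratio(tp, tp + fn),       # TP / (TP + FN)
        specificity=ratio(tn, tn + fp),       # TN / (TN + FP)
        f1=ratio(2 * tp, 2 * tp + fp + fn),   # 2TP / (2TP + FP + FN)
    )


# Illustrative counts only; these are not taken from the study.
example = binary_metrics(tp=8, fp=2, tn=6, fn=4)
```

With these definitions, accuracy is a prevalence-weighted average of sensitivity and specificity, which is why a model can reach a high ACC on an imbalanced task while its sensitivity stays low.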