
Table 1.

Performance results of the vision transformer-based transfer learning (ViT-TL), vision transformer trained from scratch (ViT-scratch), and convolutional neural network-based transfer learning (CNN-TL) approaches.

Approach      Model           Accuracy      F1 score      Precision     Recall        AUC
ViT-TL        ViTB-16         0.854 ± 0.01  0.844 ± 0.01  0.872 ± 0.01  0.826 ± 0.01  0.870 ± 0.01
              ViTB-32         0.842 ± 0.02  0.830 ± 0.03  0.840 ± 0.03  0.822 ± 0.03  0.865 ± 0.02
              ViTL-32         0.810 ± 0.02  0.796 ± 0.02  0.814 ± 0.02  0.784 ± 0.02  0.845 ± 0.02
ViT-scratch   ViTB-16         0.624 ± 0.02  0.604 ± 0.03  0.618 ± 0.03  0.600 ± 0.04  0.666 ± 0.01
              ViTB-32         0.622 ± 0.02  0.596 ± 0.03  0.594 ± 0.03  0.584 ± 0.04  0.648 ± 0.01
              ViTL-32         0.600 ± 0.02  0.590 ± 0.03  0.584 ± 0.03  0.572 ± 0.04  0.644 ± 0.01
CNN-TL        ResNet50        0.772 ± 0.02  0.756 ± 0.02  0.804 ± 0.02  0.726 ± 0.03  0.785 ± 0.02
              EfficientNetB2  0.680 ± 0.07  0.608 ± 0.12  0.674 ± 0.07  0.614 ± 0.10  0.744 ± 0.06
              InceptionV3     0.804 ± 0.02  0.766 ± 0.02  0.848 ± 0.02  0.722 ± 0.03  0.823 ± 0.01
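To make the ViT-TL versus ViT-scratch contrast in Table 1 concrete, the sketch below shows one way such a setup could look; it is not the authors' code. It loads an ImageNet-pretrained ViT-B/16 from torchvision, swaps the classification head for the target task, and scores predictions with the same five metrics reported in the table. The binary-class assumption, library choices (torchvision, scikit-learn), and the `evaluate` helper are illustrative assumptions only.

```python
import torch
import torch.nn as nn
from torchvision.models import vit_b_16, ViT_B_16_Weights
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)

NUM_CLASSES = 2  # assumption: a binary classification task

# ViT-TL: start from ImageNet weights and replace the classification head,
# then fine-tune on the target data set.
model = vit_b_16(weights=ViT_B_16_Weights.IMAGENET1K_V1)
in_features = model.heads.head.in_features
model.heads.head = nn.Linear(in_features, NUM_CLASSES)

# ViT-scratch would instead initialize randomly: vit_b_16(weights=None)

def evaluate(model, loader, device="cpu"):
    """Collect predictions and compute the metrics used in Table 1."""
    model.eval().to(device)
    y_true, y_pred, y_score = [], [], []
    with torch.no_grad():
        for images, labels in loader:
            logits = model(images.to(device))
            probs = torch.softmax(logits, dim=1)[:, 1]  # P(positive class)
            y_true.extend(labels.tolist())
            y_pred.extend(logits.argmax(dim=1).cpu().tolist())
            y_score.extend(probs.cpu().tolist())
    return {
        "Accuracy":  accuracy_score(y_true, y_pred),
        "F1 score":  f1_score(y_true, y_pred),
        "Precision": precision_score(y_true, y_pred),
        "Recall":    recall_score(y_true, y_pred),
        "AUC":       roc_auc_score(y_true, y_score),
    }
```

The CNN-TL rows would follow the same pattern with a pretrained ResNet50, EfficientNetB2, or InceptionV3 backbone in place of the ViT; only the head-replacement step differs by architecture.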