Table 3.
Comparison of method information [MB: megabytes; M: millions; S: seconds per image (A6000 GPU time)].
| Method | Depth | Size (MB) | Parameters (M) | Training time (S) | Inference time (S) |
|---|---|---|---|---|---|
| GoogleNet [24] | 22 | 27 | 5.631 | 0.0643 | 0.0433 |
| VGG16 [25] | 16 | 528 | 134.310 | 0.0857 | 0.0500 |
| ResNet50 [26] | 50 | 96 | 23.004 | 0.0700 | 0.0450 |
| DenseNet201 [27] | 201 | 77 | 20.037 | 0.0986 | 0.0550 |
| Inceptionv3 [28] | 48 | 89 | 24.377 | 0.8038 | 0.0513 |
| Xception [29] | 71 | 85 | 37.916 | 0.0957 | 0.0517 |
| InceptionResnetV2 [30] | 164 | 209 | 54.325 | 0.1029 | 0.0533 |
| NasnetLarge [31] | 533 | 332 | 84.769 | 1.7736 | 0.4317 |
| EfficientNetB7 [32] | 438 | 256 | 63.818 | 2.7429 | 0.5083 |
| Vision transformer [17] | 225 | 327.366 | 85.817 | 0.1271 | 0.0817 |
| CONVT [33] | 208 | 327.226 | 85.780 | 0.0950 | 0.0667 |
| Beit_large [34] | 369 | 1354.662 | 304.662 | 7.7307 | 2.2033 |
| RVT | 340 | 74.965 | 19.626 | 4.3943 | 0.9200 |
The inference time includes both the validation and test datasets. Note that the parameter counts are the trainable parameters when training on our dataset, so they differ from the number of parameters in the original models.
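As a rough illustration of how the Size and Parameters columns can relate, the sketch below converts a trainable-parameter count into a storage estimate. It assumes each parameter is stored as a 32-bit float (4 bytes) and that 1 MB means 2^20 bytes; both are assumptions made here, not statements from the table. The helper name `float32_size_mb` is hypothetical. Under these assumptions the estimate lands close to the reported sizes for the Vision transformer and CONVT rows; other rows need not follow it, since a stored model can include non-trainable weights or use a different size convention.

```python
# Assumption: parameters are stored as 32-bit floats (4 bytes each).
BYTES_PER_FLOAT32 = 4
# Assumption: "MB" is interpreted as 2**20 bytes.
BYTES_PER_MB = 2 ** 20


def float32_size_mb(params_millions: float) -> float:
    """Estimate storage in MB for a parameter count given in millions."""
    return params_millions * 1_000_000 * BYTES_PER_FLOAT32 / BYTES_PER_MB


# Parameter counts (in millions) copied from two rows of Table 3.
rows = [("Vision transformer", 85.817), ("CONVT", 85.780)]
for name, params_m in rows:
    print(f"{name}: about {float32_size_mb(params_m):.1f} MB")
```

Comparing the printed estimates with the Size (MB) column gives a quick sanity check on whether a row's size is dominated by its trainable parameters.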