Table 3.
Comparison of the PMVT model with other backbone models on three datasets (FPS is measured on a desktop computer; bold text highlights the best-performing network).
| Methods | Top-1 Accuracy, Wheat (%) | Top-1 Accuracy, Coffee (%) | Top-1 Accuracy, Rice (%) | Parameters (M) | FLOPs (G) | FPS (img/s) |
|---|---|---|---|---|---|---|
| SqueezeNet-1.0 | 70.0 | 79.7 | 86.2 | 0.74 | 0.73 | 293.0 |
| SqueezeNet-1.1 | 86.1 | 83.1 | 85.1 | 0.73 | 0.26 | 311.5 |
| ShuffleNetV2-1.0 | 89.6 | 68.5 | 82.7 | 1.27 | 0.15 | 151.9 |
| MobileNetV3-Small | 92.0 | 66.3 | 89.7 | 1.54 | 0.06 | 170.2 |
| PMVT-XXS (ours) | 93.6 | 85.4 | 93.1 | 0.98 | 0.31 | 88.5 |
| ShuffleNetV2-1.5 | 92.5 | 73.0 | 86.2 | 2.50 | 0.31 | 148.4 |
| MobileFormer-26M | 91.4 | 77.5 | 90.8 | 2.22 | 0.03 | 53.1 |
| MobileFormer-52M | 92.8 | 79.2 | 83.9 | 2.46 | 0.05 | 60.7 |
| MobileFormer-96M | 92.8 | 84.2 | 87.3 | 3.33 | 0.09 | 58.8 |
| MobileNetV3-Large | 92.8 | 72.0 | 91.9 | 4.22 | 0.23 | 141.0 |
| EfficientNet-B0 | 94.1 | 84.2 | 88.5 | 4.03 | 0.41 | 109.9 |
| PMVT-XS (ours) | 94.7 | 86.5 | 97.7 | 2.01 | 0.85 | 85.3 |
| ShuffleNetV2-2.0 | 93.6 | 70.0 | 91.4 | 5.38 | 0.60 | 146.2 |
| MobileFormer-151M | 94.4 | 75.3 | 88.5 | 6.34 | 0.10 | 42.3 |
| EfficientNet-B1 | 94.4 | 79.8 | 90.8 | 6.53 | 0.61 | 75.3 |
| EfficientNet-B2 | 93.3 | 83.1 | 87.3 | 7.72 | 0.70 | 76.6 |
| DeiT-Tiny | 91.4 | 78.7 | 84.0 | 5.49 | 1.08 | 161.7 |
| PoolFormer-S12 | 91.4 | 85.4 | 85.1 | 11.39 | 1.81 | 178.3 |
| CvT-Tiny | 93.6 | 82.0 | 86.2 | 19.63 | 4.08 | 62.2 |
| TNT-Small | 92.8 | 80.9 | 88.5 | 23.40 | 4.85 | 67.3 |
| ResNet50 | 93.9 | 70.8 | 90.8 | 23.53 | 4.13 | 125.1 |
| ResNet101 | 94.1 | 63.0 | 88.5 | 42.50 | 7.86 | 66.3 |
| PMVT-S (ours) | 94.9 | 87.6 | 92.0 | 5.06 | 1.59 | 81.3 |
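
The FPS column reports throughput in images per second, but the table does not state the measurement protocol. As a minimal sketch of how such a figure is commonly obtained, assuming a warm-up phase followed by a timed loop, the snippet below times a callable on a fixed batch. The names `measure_fps` and `dummy_model`, the batch size, and the iteration counts are all hypothetical and are not taken from the paper.

```python
import time
from typing import Callable, Sequence


def measure_fps(infer: Callable[[Sequence], object],
                batch: Sequence,
                warmup: int = 5,
                iters: int = 20) -> float:
    """Return throughput in images per second for `infer` on `batch`."""
    if len(batch) == 0 or iters <= 0:
        raise ValueError("batch must be non-empty and iters must be positive")

    # Warm-up runs are excluded from timing so that one-off costs
    # (caching, lazy initialisation) do not distort the result.
    for _ in range(warmup):
        infer(batch)

    start = time.perf_counter()
    for _ in range(iters):
        infer(batch)
    elapsed = time.perf_counter() - start

    # Total images processed divided by total wall-clock time.
    return len(batch) * iters / elapsed


def dummy_model(batch: Sequence) -> list:
    # Hypothetical stand-in for a real network's forward pass.
    return [sum(image) for image in batch]


if __name__ == "__main__":
    images = [[0.0] * 1024 for _ in range(8)]
    print(f"{measure_fps(dummy_model, images):.1f} img/s")
```

With a real network, `infer` would wrap the model's forward pass. On a GPU, the device usually has to be synchronised before each clock reading, because kernels are launched asynchronously and the timer would otherwise stop too early.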