Table 3.
Performance comparison of the Residual-Transformer-Fine-Grained (ResTFG) model for different Transformer-branch hyperparameters and numbers of encoder block layers. TransFG_B16, a Transformer-Fine-Grained (TransFG) model with a batch size of 16, is listed for comparison.
| Model name | Hidden size | MLP dimension | Number of heads | Number of layers | Parameters (M) | Speed (FPS) | Accuracy (%) |
|---|---|---|---|---|---|---|---|
| TransFG_B16 | 768 | 3,072 | 12 | 12 | 85.80 | 104 | 93.5 |
| ResTFG (C13, H12, L8) (a) | 768 | 3,072 | 12 | 8 | 82.14 | 105 | 97.2 |
| ResTFG (C13, H12, L8) (b) | 384 | 1,536 | 12 | 8 | 21.50 | 112 | 97.0 |
| ResTFG (C13, H12, L8) (c) | 288 | 1,024 | 12 | 8 | 11.90 | 113 | 95.7 |
| ResTFG (C13, H12, L8) (d) | 192 | 768 | 12 | 8 | 6.12 | 116 | 95.7 |
| ResTFG (C13, H8, L8) | 384 | 1,536 | 8 | 8 | 21.50 | 114 | 96.7 |
| ResTFG (C13, H4, L8) | 384 | 1,536 | 4 | 8 | 21.50 | 114 | 96.6 |
| ResTFG (C13, H2, L8) | 384 | 1,536 | 2 | 8 | 21.50 | 114 | 96.0 |
| ResTFG (C13, H12, L6) | 384 | 1,536 | 12 | 6 | 17.96 | 134 | 96.5 |
| ResTFG (C13, H12, L4) | 384 | 1,536 | 12 | 4 | 14.41 | 165 | 96.9 |
| ResTFG (C13, H12, L3) | 384 | 1,536 | 12 | 3 | 12.63 | 183 | 97.0 |
| ResTFG (C13, H12, L2) | 384 | 1,536 | 12 | 2 | 10.86 | 216 | 97.1 |
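
The parameter counts in the table can be sanity-checked with a rough per-layer estimate. The sketch below assumes a standard ViT-style encoder block (four attention projections with biases, a two-layer MLP with biases, and two layer norms); this layout is an assumption, not something the table states, and the function name `encoder_layer_params` is hypothetical. Under that assumption the head count does not enter the estimate, which agrees with the identical 21.50 M reported for the H12, H8, H4, and H2 rows.

```python
def encoder_layer_params(hidden_size: int, mlp_dim: int) -> int:
    """Estimate trainable parameters in one Transformer encoder block.

    Assumes a standard ViT-style block (an assumption, not taken from the table):
    Q, K, V and output projections, a two-layer MLP, and two layer norms.
    """
    # Four hidden x hidden projection matrices plus their biases.
    attention = 4 * hidden_size * hidden_size + 4 * hidden_size
    # hidden -> mlp_dim and mlp_dim -> hidden weights plus both biases.
    mlp = 2 * hidden_size * mlp_dim + mlp_dim + hidden_size
    # Two layer norms, each with a scale and a shift vector.
    layer_norms = 2 * 2 * hidden_size
    return attention + mlp + layer_norms


# Configuration shared by the rows with hidden size 384 and MLP dimension 1,536.
per_layer = encoder_layer_params(384, 1536)

# Per-layer difference implied by the L8 (21.50 M) and L6 (17.96 M) rows.
table_delta = (21.50 - 17.96) / 2

print(f"estimated parameters per layer: {per_layer / 1e6:.2f} M")
print(f"per-layer difference in table:  {table_delta:.2f} M")
```

For the 384/1,536 configuration the estimate is about 1.77 M parameters per encoder layer, which matches the per-layer difference between the L8 and L6 rows. Embeddings, the classification head, and any non-Transformer part of the model are not covered by this estimate.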