TABLE IV.
Performance Comparison of the Networks, with ResNet101 Employed as the Feature Extractor in Every Case.
| Network | Fusion Method | Modality | MAE (N) | MSE (N²) | RMSE (N) |
|---|---|---|---|---|---|
| LSTMs | - | Shape | 0.0280 | 0.0014 | 0.0374 |
| LSTMs | - | Optical Flow | 0.0424 | 0.0029 | 0.0539 |
| LSTMs | Concatenation | Shape+Optical Flow | 0.0251 | 0.0009 | 0.0307 |
| LSTMs | Memory Fusion | Shape+Optical Flow | 0.0125 | 0.0004 | 0.0200 |
| GRUs | - | Shape | 0.0288 | 0.0013 | 0.0361 |
| GRUs | - | Optical Flow | 0.0471 | 0.0034 | 0.0583 |
| GRUs | Concatenation | Shape+Optical Flow | 0.0248 | 0.0009 | 0.0310 |
| Transformer | - | Shape | 0.0274 | 0.0010 | 0.0316 |
| Transformer | - | Optical Flow | 0.0360 | 0.0019 | 0.0436 |
| Transformer | Concatenation | Shape+Optical Flow | 0.0265 | 0.0010 | 0.0311 |
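
The three error columns are linked: MSE is reported in N², RMSE is its square root (in N), and RMSE can never be smaller than MAE. The following is a minimal sketch of how these metrics are conventionally computed, assuming the standard definitions apply here. The function name `regression_errors` and the sample values are hypothetical and are not taken from the paper or its dataset.

```python
import math


def regression_errors(predicted, measured):
    """Return (MAE, MSE, RMSE) for two equal-length sequences of values in newtons."""
    if len(predicted) != len(measured) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    residuals = [p - m for p, m in zip(predicted, measured)]
    n = len(residuals)
    mae = sum(abs(r) for r in residuals) / n   # mean absolute error, in N
    mse = sum(r * r for r in residuals) / n    # mean squared error, in N^2
    rmse = math.sqrt(mse)                      # root mean squared error, in N
    return mae, mse, rmse


# Hypothetical sample values for illustration only (not the paper's data).
predicted = [0.10, 0.22, 0.31, 0.38]
measured = [0.12, 0.20, 0.35, 0.40]
mae, mse, rmse = regression_errors(predicted, measured)
print(f"MAE={mae:.4f} N, MSE={mse:.4f} N^2, RMSE={rmse:.4f} N")
```

Because RMSE is the square root of MSE and is bounded below by MAE, the same two relations give a quick consistency check on any row of the table.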