Table 18:
Dataset: MuJoCo Push

Category | Method | Epochs trained | Training time (s) | Training params (M) | Training peak memory (MB) | Inference time (s) | Inference params (M) |
---|---|---|---|---|---|---|---|
U | Unimodal | 20 | 738±133 | 3.88 | 3607±1 | 3.46±0.02 | 3.88 |
U | Unimodal | 20 | 288±39 | 3.33 | 3595±2 | 0.91±0.08 | 3.33 |
U | Unimodal | 20 | 252±6 | 3.33 | 3594±1 | 0.87±0.04 | 3.33 |
U | Unimodal | 20 | 372±64 | 3.33 | 3594±1 | 0.86±0.04 | 3.33 |
M | EF | 20 | 815±34 | 3.92 | 3654±1 | 4.44±0.55 | 3.92 |
M | LF-LSTM | 20 | 856±46 | 1.90 | 3636±1 | 4.32±0.45 | 1.90 |
M | TF-LSTM [179] | 20 | 1914±31 | 23.5 | 4530±9 | 7.75±0.12 | 23.5 |
M | MulT [156] | 20 | 4792±62 | 14.6 | 6530±16 | 22.4±0.28 | 14.6 |

Dataset: Vision&Touch

Category | Method | Epochs trained | Training time (s) | Training params (M) | Training peak memory (MB) | Inference time (s) | Inference params (M) |
---|---|---|---|---|---|---|---|
U | Unimodal | 15 | 2633 | 1.00 | 5530 | 63.9 | 1.00 |
U | Unimodal | 15 | 2185 | 0.13 | 2426 | 51.6 | 0.13 |
U | Unimodal | 15 | 2514 | 0.08 | 2389 | 59.5 | 0.08 |
M | LF | 15 | 2672 | 1.20 | 5572 | 64.4 | 1.20 |
M | Sensor Fusion [91] | 50 | 11604 | 1.10 | 4467 | 62.6 | 1.10 |
M | LRTF [106] | 35 | 8366 | 1.09 | 4987 | 64.4 | 1.09 |
O | RefNet [135] | 15 | 3819 | 135 | 6067 | 65.0 | 1.20 |