Table 7:
Component | Model | Parameter | Value |
---|---|---|---|
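Image Encoder | CNN | Filter sizes | [7, 5, 5, 3, 3, 3] |
Image Encoder | CNN | Num filters | [16, 32, 64, 64, 128, 128] |
Image Encoder | CNN | Filter strides | [2, 2, 2, 2, 2, 2] |
Image Encoder | CNN | Filter padding | Same |
Force Encoder | Causal Convolution [157] | Filter sizes | [2, 2, 2, 2, 2] |
Force Encoder | Causal Convolution [157] | Num filters | [16, 32, 64, 128, 256] |
Force Encoder | Causal Convolution [157] | Filter strides | [2, 2, 2, 2, 2] |
Force Encoder | Causal Convolution [157] | Filter padding | 1 |
Proprio Encoder | Linear | Hidden sizes | [32, 64, 128, 256] |
Depth Encoder | CNN | Filter sizes | [3, 3, 4, 3, 3, 3] |
Depth Encoder | CNN | Num filters | [32, 64, 64, 64, 128, 128] |
Depth Encoder | CNN | Filter strides | [2, 2, 2, 2, 2, 2] |
Depth Encoder | CNN | Filter padding | Same |
Action Encoder | Linear | Hidden sizes | [32, 32] |
Classification Head | 2-Layer MLP | Hidden size | 128 |
Classification Head | 2-Layer MLP | Activation | LeakyReLU(0.2) |
Fusion | LRTF [106] | Output dim | 200 |
Fusion | LRTF [106] | Ranks | 40 |
Fusion | Sensor Fusion [91] | z-dim | 128 |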
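Training | | Loss | Contact: Cross Entropy; End-Effector: MSE |
Training | | Batch size | 64 |
Training | | Num epochs | Sensor Fusion: 50; LRTF: 35; Others: 15 |
Training | | Optimizer | Adam |
Training | | Learning rate | Contact: $10^{-4}$; End-Effector: $5 \times 10^{-4}$ |
Training | RefNet [135] | Loss | Cross Entropy + Contrast |
Training | RefNet [135] | Batch size | 40 |
Training | RefNet [135] | Optimizer / Learning rate | Adam / 0.0005 |
Training | RefNet [135] | Refiner | MLP(1056, 2000, 65760) |
Training | RefNet [135] | Self loss weight | 0.0001 |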
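
To make the convolutional settings in Table 7 more concrete, the following minimal sketch traces the feature-map shape through the image and depth encoders. It is an illustration, not the authors' implementation. It assumes that "Same" padding follows the common convention in which each layer's output size is ceil(input / stride), and it uses a hypothetical 128×128 input, since the table does not state the input resolution. The dictionary layout and the function name `trace_shapes` are likewise illustrative.

```python
import math

# Filter settings copied from Table 7; the dictionary layout is illustrative.
ENCODERS = {
    "image": {
        "filter_sizes": [7, 5, 5, 3, 3, 3],
        "num_filters": [16, 32, 64, 64, 128, 128],
        "strides": [2, 2, 2, 2, 2, 2],
    },
    "depth": {
        "filter_sizes": [3, 3, 4, 3, 3, 3],
        "num_filters": [32, 64, 64, 64, 128, 128],
        "strides": [2, 2, 2, 2, 2, 2],
    },
}


def trace_shapes(cfg, input_size):
    """Return (channels, height, width) after each conv layer.

    Assumes 'same' padding, so the output size depends only on the
    stride: out = ceil(in / stride). Filter sizes therefore do not
    affect the spatial size under this assumption.
    """
    size = input_size
    shapes = []
    for channels, stride in zip(cfg["num_filters"], cfg["strides"]):
        size = math.ceil(size / stride)
        shapes.append((channels, size, size))
    return shapes


# Hypothetical 128x128 input; the table does not give the resolution.
for name, cfg in ENCODERS.items():
    print(name, trace_shapes(cfg, 128))
```

Under these assumptions, six stride-2 layers reduce each spatial dimension by a factor of 64, so both encoders end with 128 channels on a small spatial grid that can be flattened before fusion.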