
Table 7:

Table of hyperparameters for prediction on the Vision&Touch dataset in the robotics domain.

| Component | Model | Parameter | Value |
|---|---|---|---|
| Image Encoder | CNN | Filter sizes | [7, 5, 5, 3, 3, 3] |
| | | Num filters | [16, 32, 64, 64, 128, 128] |
| | | Filter strides | [2, 2, 2, 2, 2, 2] |
| | | Filter padding | Same |
| Force Encoder | Causal Convolution [157] | Filter sizes | [2, 2, 2, 2, 2] |
| | | Num filters | [16, 32, 64, 128, 256] |
| | | Filter strides | [2, 2, 2, 2, 2] |
| | | Filter padding | 1 |
| Proprio Encoder | Linear | Hidden sizes | [32, 64, 128, 256] |
| Depth Encoder | CNN | Filter sizes | [3, 3, 4, 3, 3, 3] |
| | | Num filters | [32, 64, 64, 64, 128, 128] |
| | | Filter strides | [2, 2, 2, 2, 2, 2] |
| | | Filter padding | Same |
| Action Encoder | Linear | Hidden sizes | [32, 32] |
| Classification Head | 2-Layer MLP | Hidden size | 128 |
| | | Activation | LeakyReLU(0.2) |
| Fusion | LRTF [106] | Output dim | 200 |
| | | Ranks | 40 |
| | Sensor Fusion [91] | z-dim | 128 |
| Training | | Loss | Contact: Cross Entropy; End-Effector: MSE |
| | | Batch size | 64 |
| | | Num epochs | Sensor Fusion: 50; LRTF: 35; Others: 15 |
| | | Optimizer | Adam |
| | | Learning rate | Contact: 10^−4; End-Effector: 5×10^−4 |
| RefNet [135] | | Loss | Cross Entropy + Contrast |
| | | Batch size | 40 |
| | | Optimizer / Learning rate | Adam / 0.0005 |
| | | Refiner | MLP(1056, 2000, 65760) |
| | | Self loss weight | 0.0001 |
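
For concreteness, the following is a minimal sketch of how the image-encoder hyperparameters above could be instantiated. It assumes a PyTorch implementation, an RGB input, and LeakyReLU(0.2) activations between convolutional layers (the table only specifies the activation for the classification head); it is not the authors' code.

```python
# Sketch: image-encoder CNN built from the hyperparameters in Table 7 (assumptions noted above).
import torch
import torch.nn as nn

filter_sizes = [7, 5, 5, 3, 3, 3]
num_filters  = [16, 32, 64, 64, 128, 128]
strides      = [2, 2, 2, 2, 2, 2]

layers, in_ch = [], 3  # 3 input channels assumed (RGB)
for k, out_ch, s in zip(filter_sizes, num_filters, strides):
    # "same"-style padding via k // 2; padding='same' is unsupported with stride > 1 in PyTorch.
    layers.append(nn.Conv2d(in_ch, out_ch, kernel_size=k, stride=s, padding=k // 2))
    layers.append(nn.LeakyReLU(0.2))
    in_ch = out_ch
image_encoder = nn.Sequential(*layers)

x = torch.randn(1, 3, 128, 128)       # example input resolution (assumed, not stated in the table)
print(image_encoder(x).shape)          # e.g. torch.Size([1, 128, 2, 2]) for a 128x128 input
```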
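Similarly, the fusion settings (output dim 200, rank 40) can be illustrated with a minimal low-rank tensor fusion sketch. This assumes PyTorch and illustrative modality embedding sizes matching the last encoder hidden dims above; it is a sketch of the low-rank fusion idea, not the reference implementation of LRTF [106].

```python
# Sketch: low-rank tensor fusion with output dim 200 and rank 40 (assumptions noted above).
import torch
import torch.nn as nn

class LowRankTensorFusion(nn.Module):
    def __init__(self, input_dims, output_dim=200, rank=40):
        super().__init__()
        # One low-rank factor per modality; d + 1 accounts for the constant 1
        # appended to each embedding in the low-rank fusion formulation.
        self.factors = nn.ParameterList(
            [nn.Parameter(torch.randn(rank, d + 1, output_dim) * 0.1) for d in input_dims]
        )
        self.fusion_weights = nn.Parameter(torch.randn(1, rank) * 0.1)
        self.fusion_bias = nn.Parameter(torch.zeros(1, output_dim))

    def forward(self, modalities):
        batch = modalities[0].shape[0]
        fused = None
        for x, factor in zip(modalities, self.factors):
            x1 = torch.cat([x, torch.ones(batch, 1, device=x.device)], dim=1)  # append constant 1
            proj = torch.einsum('bd,rdo->bro', x1, factor)                     # (batch, rank, output_dim)
            fused = proj if fused is None else fused * proj                    # elementwise product across modalities
        out = torch.einsum('br,bro->bo', self.fusion_weights.expand(batch, -1), fused)
        return out + self.fusion_bias

# Example with four modality embeddings (dims are assumptions, for illustration only):
fusion = LowRankTensorFusion(input_dims=[128, 256, 256, 32], output_dim=200, rank=40)
z = fusion([torch.randn(8, d) for d in [128, 256, 256, 32]])
print(z.shape)  # torch.Size([8, 200])
```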