| Component | Hyperparameter | Search space | Value |
| --- | --- | --- | --- |
| Core | embedding dim. | uniform, min: 8, max: 512, step: 8 | 112 |
| | learning rate | uniform, min: 0.0001, max: 0.01 | 0.0048 |
| | patch dropout | uniform, min: 0, max: 0.5 | 0.1338 |
| | drop path | uniform, min: 0, max: 0.5 | 0.0505 |
| | pos. encoding | none, learnable, sinusoidal | learnable |
| | weight decay | uniform, min: 0, max: 1 | 0.1789 |
| | batch size | uniform, min: 1, max: 64 | 6 |
| Spatial Transformer | num. blocks | uniform, min: 1, max: 8, step: 1 | 3 |
| | patch size | uniform, min: 3, max: 16, step: 1 | 7 |
| | patch stride | uniform, min: 1, max: patch size, step: 1 | 2 |
| Temporal Transformer | num. blocks | uniform, min: 1, max: 8, step: 1 | 5 |
| | patch size | uniform, min: 1, max: 50, step: 1 | 25 |
| | patch stride | uniform, min: 1, max: patch size, step: 1 | 1 |
| Multi-head attention (MHA) layer | num. heads | uniform, min: 1, max: 16, step: 1 | 11 |
| | head dim. | uniform, min: 8, max: 512, step: 8 | 48 |
| | MHA dropout | uniform, min: 0, max: 0.5 | 0.3580 |
| Feedforward (FF) layer | FF dim. | uniform, min: 8, max: 512, step: 8 | 136 |
| | FF activation | Tanh, Sigmoid, ELU, GELU, SwiGLU | GELU |
| | FF dropout | uniform, min: 0, max: 0.5 | 0.0592 |
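
The table above fully specifies a define-by-run search space. As a minimal sketch of how it could be declared, the snippet below uses the Optuna library; the parameter names, the `suggest_hyperparameters` helper, and the choice of Optuna itself are illustrative assumptions, not the tooling reported in the source. Note that the patch-stride bounds depend on the sampled patch size, which is why the patch sizes are drawn first.

```python
# Illustrative sketch only: assumes Optuna; parameter names are hypothetical.
import optuna


def suggest_hyperparameters(trial: optuna.Trial) -> dict:
    """Sample one configuration from the search space in the table above."""
    # Patch sizes are sampled first so the stride upper bounds can depend on them.
    spatial_patch_size = trial.suggest_int("spatial_patch_size", 3, 16)
    temporal_patch_size = trial.suggest_int("temporal_patch_size", 1, 50)
    return {
        # Core
        "embedding_dim": trial.suggest_int("embedding_dim", 8, 512, step=8),
        "learning_rate": trial.suggest_float("learning_rate", 1e-4, 1e-2),
        "patch_dropout": trial.suggest_float("patch_dropout", 0.0, 0.5),
        "drop_path": trial.suggest_float("drop_path", 0.0, 0.5),
        "pos_encoding": trial.suggest_categorical(
            "pos_encoding", ["none", "learnable", "sinusoidal"]
        ),
        "weight_decay": trial.suggest_float("weight_decay", 0.0, 1.0),
        "batch_size": trial.suggest_int("batch_size", 1, 64),
        # Spatial Transformer
        "spatial_num_blocks": trial.suggest_int("spatial_num_blocks", 1, 8),
        "spatial_patch_size": spatial_patch_size,
        "spatial_patch_stride": trial.suggest_int(
            "spatial_patch_stride", 1, spatial_patch_size
        ),
        # Temporal Transformer
        "temporal_num_blocks": trial.suggest_int("temporal_num_blocks", 1, 8),
        "temporal_patch_size": temporal_patch_size,
        "temporal_patch_stride": trial.suggest_int(
            "temporal_patch_stride", 1, temporal_patch_size
        ),
        # Multi-head attention (MHA) layer
        "num_heads": trial.suggest_int("num_heads", 1, 16),
        "head_dim": trial.suggest_int("head_dim", 8, 512, step=8),
        "mha_dropout": trial.suggest_float("mha_dropout", 0.0, 0.5),
        # Feedforward (FF) layer
        "ff_dim": trial.suggest_int("ff_dim", 8, 512, step=8),
        "ff_activation": trial.suggest_categorical(
            "ff_activation", ["Tanh", "Sigmoid", "ELU", "GELU", "SwiGLU"]
        ),
        "ff_dropout": trial.suggest_float("ff_dropout", 0.0, 0.5),
    }
```

Under this sketch, the "Value" column of the table corresponds to the configuration returned for the best trial (e.g. `study.best_params` after `study.optimize(...)`).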