Sensors. 2023 Jun 17;23(12):5677. doi: 10.3390/s23125677

Table 1. Model hyperparameters.

| Hyperparameter | Value |
| --- | --- |
| Number of layers | 24 |
| Number of heads | 16 |
| Hidden size | 1024 |
| Optimizer | Adam with β1 = 0.99, β2 = 0.999, ε = 1 × 10⁻⁸ |
| Learning rate | 1.0 × 10⁻⁵ |
| Learning rate scheduler | Linear with a warm-up ratio of 0.1 |
| Batch size | 16 |
| Gradient accumulation steps | 4 |
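
The sketch below shows one way the settings in Table 1 could be expressed in code. It is not the authors' implementation: the choice of a BERT-style encoder, the Hugging Face Transformers library, and the output directory are assumptions made purely for illustration; only the numeric values come from the table.

```python
# Minimal sketch (not the authors' code) mapping Table 1 to a Hugging Face
# Transformers setup. Model family and Trainer API are assumed for illustration.
from transformers import BertConfig, BertModel, TrainingArguments

# Architecture from Table 1: 24 layers, 16 attention heads, hidden size 1024.
config = BertConfig(
    num_hidden_layers=24,
    num_attention_heads=16,
    hidden_size=1024,
)
model = BertModel(config)

# Optimization from Table 1: Adam (β1 = 0.99, β2 = 0.999, ε = 1e-8),
# learning rate 1e-5, linear schedule with 10% warm-up,
# batch size 16, gradient accumulation over 4 steps.
training_args = TrainingArguments(
    output_dir="./checkpoints",        # hypothetical path
    learning_rate=1e-5,
    adam_beta1=0.99,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    per_device_train_batch_size=16,
    gradient_accumulation_steps=4,
)
```

With a per-device batch size of 16 and 4 gradient accumulation steps, the effective batch size per optimizer update is 64 (per device).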