. 2024 Oct 1;14:22797. doi: 10.1038/s41598-024-71893-3

Table 2. Hyperparameters of our proposed ViT-GRU model and their values.

Hyperparameter	Value
Epochs	35
Batch size	64
Image size (height × width × channels)	224 × 224 × 3
Learning rate	0.0001
Weight decay	0.0001
Optimizer	AdamW, Adam, SGD
Loss function	Categorical cross-entropy
Patch size (pixels per side)	8
Number of patches	256
Projection dimension	64
Number of parallel self-attention heads	4
Number of transformer encoder layers	8
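Table 2 gives both an input size and a patch count, and the two can be checked against each other. Assuming non-overlapping square patches (stride equal to the patch size), a 224 × 224 input cut into 8 × 8 patches yields (224/8)² = 28² = 784 patches, whereas the listed 256 = 16² patches corresponds to a 128 × 128 input. The table does not state how this is reconciled (one possibility is a resize inside the model before patch extraction), so the minimal sketch below only makes the shape arithmetic explicit. The class and method names are hypothetical and not taken from the authors' code, and no extra class token is assumed.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ViTShapeConfig:
    """Values copied from Table 2; the class and field names are hypothetical."""
    image_size: int = 224      # input height = width, in pixels
    channels: int = 3          # RGB
    patch_size: int = 8        # side of each square, non-overlapping patch
    projection_dim: int = 64   # width of each patch embedding (token)
    num_heads: int = 4         # parallel self-attention heads
    num_layers: int = 8        # transformer encoder layers (do not change shapes)

    def num_patches(self, side: Optional[int] = None) -> int:
        """Number of non-overlapping patches for a square image of `side` pixels."""
        side = self.image_size if side is None else side
        if side % self.patch_size != 0:
            raise ValueError("image side must be divisible by patch_size")
        return (side // self.patch_size) ** 2

    @property
    def patch_vector_len(self) -> int:
        """Length of one flattened patch before the linear projection."""
        return self.patch_size * self.patch_size * self.channels

    @property
    def head_dim(self) -> int:
        """Per-head width when projection_dim is split evenly across heads."""
        if self.projection_dim % self.num_heads != 0:
            raise ValueError("projection_dim must be divisible by num_heads")
        return self.projection_dim // self.num_heads

    def token_shape(self, batch: int, side: Optional[int] = None) -> Tuple[int, int, int]:
        """Encoder input shape (batch, tokens, width), assuming no class token."""
        return (batch, self.num_patches(side), self.projection_dim)


cfg = ViTShapeConfig()
print(cfg.num_patches())          # 784 patches for a 224 x 224 input
print(cfg.num_patches(128))       # 256 patches, the count listed in Table 2
print(cfg.patch_vector_len)       # 192 values per flattened 8 x 8 x 3 patch
print(cfg.head_dim)               # 16 dimensions per attention head
print(cfg.token_shape(64, 128))   # (64, 256, 64) with batch size 64
```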
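Table 2 lists AdamW, Adam and SGD as optimizers alongside a weight decay of 0.0001. Adam and AdamW differ mainly in where that decay is applied: Adam with an L2 penalty adds it to the gradient, where it is then rescaled by the adaptive denominator, whereas AdamW subtracts it from the weights directly (decoupled weight decay). The following pure-Python sketch of a single update step illustrates the difference on a flat list of parameters. It is not the authors' training code; the function name is hypothetical, and β₁ = 0.9, β₂ = 0.999 and ε = 1e-8 are commonly used defaults assumed here because Table 2 does not list them.

```python
import math
from typing import List, Tuple


def adam_step(
    params: List[float],
    grads: List[float],
    m: List[float],
    v: List[float],
    t: int,
    lr: float = 1e-4,            # learning rate from Table 2
    weight_decay: float = 1e-4,  # weight decay from Table 2
    decoupled: bool = True,      # True -> AdamW, False -> Adam with L2 penalty
    beta1: float = 0.9,          # assumed default, not given in Table 2
    beta2: float = 0.999,        # assumed default, not given in Table 2
    eps: float = 1e-8,           # assumed default, not given in Table 2
) -> Tuple[List[float], List[float], List[float]]:
    """One Adam/AdamW update; t is the 1-based step count used for bias correction."""
    new_p: List[float] = []
    new_m: List[float] = []
    new_v: List[float] = []
    for p, g, mi, vi in zip(params, grads, m, v):
        if decoupled:
            # AdamW: shrink the weight directly, outside the adaptive scaling.
            p = p - lr * weight_decay * p
        else:
            # Adam + L2: fold the penalty into the gradient, so it is rescaled below.
            g = g + weight_decay * p
        mi = beta1 * mi + (1 - beta1) * g        # first-moment (mean) estimate
        vi = beta2 * vi + (1 - beta2) * g * g    # second-moment estimate
        m_hat = mi / (1 - beta1 ** t)            # bias-corrected moments
        v_hat = vi / (1 - beta2 ** t)
        p = p - lr * m_hat / (math.sqrt(v_hat) + eps)
        new_p.append(p)
        new_m.append(mi)
        new_v.append(vi)
    return new_p, new_m, new_v
```

For example, with a weight of 1.0, a zero gradient and t = 1, the AdamW branch shrinks the weight by lr × weight decay = 1e-8, while the Adam-with-L2 branch moves it by roughly lr = 1e-4, because the small decay gradient is normalized by its own running magnitude.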