Table 2. Model network structure and training hyperparameters
| Parameter | Value | Explanation |
|---|---|---|
| n_cyc | 30 | Number of previous cycles used for model input. |
| batch_size | 512 | Batch size for training and validation. |
| lr | 8e-4 | Learning rate for training. |
| num_epochs | 12,000 | Maximum number of training epochs. |
| patience | 1,600 | Early-stopping patience, in epochs without validation improvement. |
| alpha | [0.1] * 10 | Capacity loss weights during pre-training. |
| in_ch | 4 | Number of input channels for convolution layers. |
| out_ch | [8, 16, 64] | Number of output channels for convolution layers. |
| kernel | 3 | Kernel size for convolution layers. |
| stride | 2 | Stride for convolution layers. |
| padding | 0 | Padding for convolution layers. |
| embed_dim | 64 | Embedding dimension for multi-head attention layers. |
| num_heads | 2 | Number of attention heads in multi-head attention layers. |
| dropout | 0.3 | Dropout rate for multi-head attention layers. |
| dense_1 | 64 | Number of neurons in the first dense layer. |
| dense_2 | 64 | Number of neurons in the second dense layer. |
| finetune_lr | 2e-5 | Learning rate for fine-tuning. |
| train_alpha | [0.09] * 9 + [0] | Capacity loss weights during fine-tuning (training). |
| valid_alpha | [0.09] * 9 + [0] | Capacity loss weights during fine-tuning (validation). |
| finetune_epochs | 800 | Maximum number of fine-tuning epochs. |
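To make the table concrete, below is a minimal PyTorch sketch of how the convolution, attention, and dense settings could fit together. The module name `Encoder`, the ReLU activations, the mean pooling, and the overall wiring are assumptions for illustration only; the hyperparameter values themselves are taken from Table 2.

```python
# Hypothetical sketch assembling the Table 2 hyperparameters (PyTorch).
# Only the values come from the table; the wiring is an assumption.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, in_ch=4, out_ch=(8, 16, 64), kernel=3, stride=2,
                 padding=0, embed_dim=64, num_heads=2, dropout=0.3,
                 dense_1=64, dense_2=64):
        super().__init__()
        layers, prev = [], in_ch
        for ch in out_ch:                       # conv stack: 4 -> 8 -> 16 -> 64 channels
            layers += [nn.Conv1d(prev, ch, kernel, stride, padding), nn.ReLU()]
            prev = ch
        self.conv = nn.Sequential(*layers)
        # embed_dim (64) equals the last conv channel count, so the conv
        # output can feed attention directly after swapping channel/time axes
        self.attn = nn.MultiheadAttention(embed_dim, num_heads,
                                          dropout=dropout, batch_first=True)
        self.head = nn.Sequential(nn.Linear(embed_dim, dense_1), nn.ReLU(),
                                  nn.Linear(dense_1, dense_2))

    def forward(self, x):                       # x: (batch, in_ch, seq_len)
        h = self.conv(x).transpose(1, 2)        # -> (batch, seq', embed_dim)
        h, _ = self.attn(h, h, h)               # self-attention over the sequence
        return self.head(h.mean(dim=1))         # mean-pool, then dense layers

x = torch.randn(2, 4, 300)                      # e.g. 2 samples, 4 input channels
print(Encoder()(x).shape)                       # torch.Size([2, 64])
```

Under the same reading, the `alpha` / `train_alpha` lists would weight a list of per-term capacity losses, e.g. `loss = sum(a * l for a, l in zip(train_alpha, losses))`, though the exact loss structure is not specified by the table.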