train_epochs |
Number of complete passes through the entire training dataset. A higher number of epochs may improve model performance, though an excessive number could lead to overfitting. |
3 |
train_batch_size |
Number of samples the model processes together in one training step. A larger batch size can speed up training but requires more memory. |
16 |
eval_batch_size |
Number of samples the model processes together during evaluation. Like the training batch size, a larger value speeds up evaluation at the cost of more memory. |
16 |
learning_rate |
Step size that scales each weight update computed from the loss gradient. A high learning rate speeds up training but can make the loss oscillate or diverge, while a lower rate gives more stable but slower learning. |
|
weight_decay |
A regularization coefficient that penalizes large weights. This helps reduce overfitting and can improve how well the model generalizes to unseen data. |
|
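
To make the interaction between these parameters concrete, here is a minimal sketch in plain Python. It assumes a single device, no gradient accumulation, a final partial batch that is kept, and a plain SGD update with weight decay folded into the gradient; the actual trainer may use a different optimizer or batching policy. The dataset size, learning rate, and weight decay values below are hypothetical and are not defaults from the table.

```python
import math


def total_optimizer_steps(num_samples: int, train_batch_size: int, train_epochs: int) -> int:
    """Count weight updates for a full run.

    Assumes one update per batch and that the last, smaller batch is kept.
    """
    steps_per_epoch = math.ceil(num_samples / train_batch_size)
    return steps_per_epoch * train_epochs


def sgd_step(weight: float, grad: float, learning_rate: float, weight_decay: float) -> float:
    """One plain SGD update with L2-style weight decay.

    The decay term pulls the weight toward zero in proportion to its size.
    """
    return weight - learning_rate * (grad + weight_decay * weight)


# Hypothetical dataset of 1000 samples with the table's defaults (16, 3).
print(total_optimizer_steps(num_samples=1000, train_batch_size=16, train_epochs=3))  # 189

# Hypothetical learning_rate and weight_decay, chosen only for illustration.
print(sgd_step(weight=1.0, grad=0.2, learning_rate=0.01, weight_decay=0.1))
```

Doubling `train_batch_size` roughly halves the number of updates per epoch, which is one reason the learning rate is often retuned when the batch size changes.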