Sci Rep. 2024 Sep 27;14:22281. doi: 10.1038/s41598-024-73733-w

Table 2. Optimal values and explanations of the tuned hyperparameters of the machine learning algorithms on the unstratified data with core features.

| Model | Hyperparameter | Explanation | Value |
|---|---|---|---|
| RF | ntree | Number of trees to grow | 625 |
| RF | mtry | Number of variables randomly sampled as candidates at each split | 1 |
| RF | nodesize | Minimum size of terminal nodes; increasing the nodesize grows smaller trees and reduces the time required to fit the model | 4 |
| SVM | kernel | Kernel function used for model training and prediction (linear or radial) | linear |
| SVM | cost | Cost of constraint violation | 0.4 |
| XGBoost | eta | The learning rate; a smaller eta makes the boosting process more conservative, increasing the risk of underfitting, while a larger value may lead to overfitting | 0.05 |
| XGBoost | max_depth | Maximum depth of individual learners (classification trees) | 2 |
| XGBoost | subsample | Proportion of training instances sampled for each learner; a value of 0.5 means half of the training samples are randomly selected, which helps prevent overfitting | 0.5 |
| XGBoost | colsample_bytree | Proportion of columns sampled when training each individual learner | 0.3 |
| XGBoost | gamma | Minimum loss reduction required to make a further partition on a leaf node of an individual learner (classification tree) | 10 |
| XGBoost | nrounds | Maximum number of boosting iterations | 150 |
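
The hyperparameter names in Table 2 match those of the R packages randomForest (ntree, mtry, nodesize), e1071 (kernel, cost), and xgboost (eta, nrounds, etc.), so the tuned values can be plugged in directly. The following is a minimal sketch under that assumption, not the authors' code: `train` is a placeholder data frame whose first column is a binary factor outcome `y`.

```r
# Hypothetical sketch: refitting the three models with the tuned values from
# Table 2. Packages are inferred from the hyperparameter names; `train` and
# `y` are placeholder names, not identifiers from the paper.
library(randomForest)
library(e1071)
library(xgboost)

# Random forest: 625 trees, 1 candidate variable per split, terminal nodes >= 4
rf_fit <- randomForest(y ~ ., data = train,
                       ntree = 625, mtry = 1, nodesize = 4)

# SVM: linear kernel with a soft margin (cost of constraint violation = 0.4)
svm_fit <- svm(y ~ ., data = train, kernel = "linear", cost = 0.4)

# XGBoost: shallow trees (max_depth = 2), strong split regularization
# (gamma = 10), heavy row/column subsampling, up to 150 boosting rounds.
# Assumes the outcome is the first column of `train` and has two levels.
dtrain <- xgb.DMatrix(data  = as.matrix(train[, -1]),
                      label = as.numeric(train$y) - 1)  # recode to 0/1
xgb_fit <- xgb.train(params = list(objective        = "binary:logistic",
                                   eta              = 0.05,
                                   max_depth        = 2,
                                   subsample        = 0.5,
                                   colsample_bytree = 0.3,
                                   gamma            = 10),
                     data = dtrain, nrounds = 150)
```

Note how the tuned XGBoost configuration leans on several complementary brakes at once: a small learning rate, very shallow trees, aggressive row and column subsampling, and a large gamma, which together keep each boosting round weak and the ensemble resistant to overfitting.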