Int J Environ Res Public Health. 2022 Sep 28;19(19):12378. doi: 10.3390/ijerph191912378

Table 6.

The highest achievable AUC on the DDC dataset after hyperparameter tuning of the six ML models.

| Classifier | Tuned hyperparameters | AUC (with GSO) | AUC (without GSO) |
| --- | --- | --- | --- |
| GNB | Classes’ prior probabilities (=None) and the portion of the largest feature variance added to all variances for calculation stability (=0.01). | 0.637 ± 0.008 | 0.628 ± 0.009 |
| BNB | Additive (Laplace) smoothing parameter (=1.0), classes’ prior probabilities (=None), and whether to learn class priors (=True). | 0.637 ± 0.009 | 0.632 ± 0.003 |
| RF | Whether bootstrap samples are used (=True), split-quality function (=gini), number of features considered for the best split (=auto), maximum leaf nodes when growing trees (=3), minimum samples required at a leaf node (=0.4), minimum samples required to split an internal node (=2), number of trees in the forest (=100), whether out-of-bag samples are used to estimate the generalization score (=False), and the random state controlling bootstrapping and feature sampling at node splits (=100). | 0.628 ± 0.000 | 0.628 ± 0.000 |
| DT | Split-quality function (=entropy), number of features considered for the best split (=auto), minimum samples required at a leaf node (=0.5), minimum samples required to split an internal node (=0.1), the random state controlling feature sampling at node splits (=100), and node-partitioning strategy (=best). | 0.792 ± 0.025 | 0.675 ± 0.009 |
| XGB | Initial prediction score (=0.5), booster used (=gbtree), per-level column subsample ratio (=1), per-node column subsample ratio (=1), evaluation metric for validation data (=error), minimum loss reduction for a further partition on a leaf node (=1.5), L2 regularization of weights (=1.5), maximum tree depth (=5), minimum sum of instance weights (hessian) in a child (=5), number of trees (=100), parallel trees built per iteration (=1), the random state controlling sampling (=100), control of class imbalance (=1), and training subsample ratio (=1.0). | 0.830 ± 0.007 | 0.811 ± 0.008 |
| LGB | Boosting method (=gbdt), class weight (=True), per-tree column subsample ratio (=1.0), base-learner tree depth (=1), number of trees (=50), the random state controlling sampling (=100), base-learner tree leaves (=25), and training-instance subsample ratio (=0.25). | 0.796 ± 0.010 | 0.793 ± 0.012 |
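
The tuned settings above correspond closely to standard library parameters. As a minimal sketch (not the authors’ code), the configurations might be instantiated as follows, assuming the scikit-learn, XGBoost, and LightGBM APIs and taking random_state=100 from the table’s “(=100)” randomness-control entries; eval_metric is passed at construction as in recent xgboost releases.

```python
# Hypothetical instantiation of the Table 6 configurations; values come from
# the table where the mapping onto the library APIs is unambiguous.
from sklearn.naive_bayes import GaussianNB, BernoulliNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier

models = {
    "GNB": GaussianNB(priors=None, var_smoothing=0.01),
    "BNB": BernoulliNB(alpha=1.0, class_prior=None, fit_prior=True),
    # "sqrt" is the classifier equivalent of the table's max_features="auto",
    # which recent scikit-learn releases no longer accept.
    "RF": RandomForestClassifier(
        bootstrap=True, criterion="gini", max_features="sqrt",
        max_leaf_nodes=3, min_samples_leaf=0.4, min_samples_split=2,
        n_estimators=100, oob_score=False, random_state=100),
    "DT": DecisionTreeClassifier(
        criterion="entropy", max_features="sqrt", min_samples_leaf=0.5,
        min_samples_split=0.1, random_state=100, splitter="best"),
    "XGB": XGBClassifier(
        base_score=0.5, booster="gbtree", colsample_bylevel=1,
        colsample_bynode=1, eval_metric="error", gamma=1.5,
        reg_lambda=1.5, max_depth=5, min_child_weight=5,
        n_estimators=100, num_parallel_tree=1, random_state=100,
        scale_pos_weight=1, subsample=1.0),
    # The table's "class weight (=True)" has no direct LGBMClassifier
    # equivalent (class_weight takes "balanced" or a dict), so it is omitted.
    "LGB": LGBMClassifier(
        boosting_type="gbdt", colsample_bytree=1.0, max_depth=1,
        n_estimators=50, num_leaves=25, random_state=100, subsample=0.25),
}
```

Given a feature matrix X and labels y, `cross_val_score(models["XGB"], X, y, scoring="roc_auc")` would then yield fold-wise AUC estimates comparable to the table’s “without GSO” column.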
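
The “with GSO” column reports the AUC reached after hyperparameter search. Assuming GSO denotes the grid-search optimization used for tuning (an inference from this table, not stated in it), a generic sketch of producing such a tuned AUC for one classifier could look like the following; the parameter grid shown is illustrative, not the authors’ search space.

```python
# Illustrative grid-search-style tuning loop reporting mean +/- std AUC,
# in the spirit of Table 6. The grid below is a made-up example.
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.tree import DecisionTreeClassifier

def tuned_auc(X, y, random_state=100):
    grid = {
        "criterion": ["gini", "entropy"],
        "min_samples_leaf": [0.1, 0.3, 0.5],
        "min_samples_split": [0.1, 2],
    }
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=random_state)
    search = GridSearchCV(
        DecisionTreeClassifier(random_state=random_state),
        param_grid=grid,
        scoring="roc_auc",  # AUC, the metric reported in the table
        cv=cv,
    )
    search.fit(X, y)
    # Mean and std of the cross-validated AUC for the best setting,
    # i.e. figures of the "0.792 +/- 0.025" form in the table.
    i = search.best_index_
    return (search.cv_results_["mean_test_score"][i],
            search.cv_results_["std_test_score"][i],
            search.best_params_)
```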