Sci Rep. 2020 Oct 6;10:16581. doi: 10.1038/s41598-020-73644-6

Table 3.

Top-performing Extreme Gradient Boosting (XGBoost) classifiers for the HemoPI-1, HemoPI-2, and HemoPI-3 datasets, before and after optimizing the classifier hyperparameters and the number N of descriptors.

| Dataset | Hyperparameters | Feature reduction | N | Acc. (%) | Prec. (%) | MCC | CK | AUC ROC |
|---|---|---|---|---|---|---|---|---|
| HemoPI-1 (model) | Default | MC (0.75) | 26 | 95.7 | 93.7 | 0.914 | 0.914 | 0.957 |
| HemoPI-1 (validation) | | | | 92.3 | 88.5 | 0.846 | 0.846 | 0.923 |
| HemoPI-2 (model) | 'colsample_bytree': 0.8, 'eta': 0.1, 'max_depth': 14, 'min_child_weight': 1, 'subsample': 0.7, 'tree_method': 'hist', 'objective': 'binary:logistic' | RFECV | 34 | 79.1 | 75.3 | 0.577 | 0.577 | 0.787 |
| HemoPI-2 (validation) | | | | 70.3 | 67.2 | 0.398 | 0.397 | 0.697 |
| HemoPI-3 (model) | 'colsample_bytree': 0.8, 'eta': 0.2, 'max_depth': 14, 'min_child_weight': 0.2, 'subsample': 0.8, 'tree_method': 'approx', 'objective': 'binary:logistic' | MC (0.75) | 28 | 78.7 | 74.9 | 0.569 | 0.568 | 0.783 |
| HemoPI-3 (validation) | | | | 72.6 | 69.1 | 0.445 | 0.444 | 0.720 |

The optimal number N of descriptors was determined using MC (multicollinearity filtering) or RFECV (recursive feature elimination with tenfold cross-validation).
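For context, the sketch below shows how a classifier with the HemoPI-2 settings from the table could be assembled in Python with the xgboost and scikit-learn packages. This is a minimal illustration, not the authors' published pipeline: the descriptor matrix `X`, labels `y`, and train/validation split are synthetic placeholders rather than the HemoPI data, and the RFECV scorer choice is an assumption; only the listed hyperparameters and the tenfold cross-validation come from the table and footnote.

```python
# Minimal sketch, assuming synthetic placeholder data in place of the HemoPI-2
# peptide descriptors. Hyperparameters mirror the HemoPI-2 row of Table 3.
import numpy as np
import xgboost as xgb
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold, train_test_split
from sklearn.metrics import accuracy_score, matthews_corrcoef, roc_auc_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 56))        # placeholder descriptor matrix
y = rng.integers(0, 2, size=500)      # placeholder hemolytic / non-hemolytic labels

X_model, X_valid, y_model, y_valid = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Hyperparameters reported for the HemoPI-2 model ('eta' maps to learning_rate).
clf = xgb.XGBClassifier(
    colsample_bytree=0.8,
    learning_rate=0.1,
    max_depth=14,
    min_child_weight=1,
    subsample=0.7,
    tree_method="hist",
    objective="binary:logistic",
    eval_metric="logloss",
)

# RFECV: recursive feature elimination with tenfold cross-validation,
# used to pick the descriptor subset (N = 34 in the published table).
selector = RFECV(
    estimator=clf,
    step=1,
    cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=0),
    scoring="matthews_corrcoef",   # assumed scorer; the paper reports MCC among its metrics
)
selector.fit(X_model, y_model)
print("Selected descriptors:", selector.n_features_)

# Refit on the selected descriptors and evaluate on the held-out validation split.
clf.fit(X_model[:, selector.support_], y_model)
pred = clf.predict(X_valid[:, selector.support_])
proba = clf.predict_proba(X_valid[:, selector.support_])[:, 1]
print(f"Acc: {accuracy_score(y_valid, pred):.3f}  "
      f"MCC: {matthews_corrcoef(y_valid, pred):.3f}  "
      f"AUC: {roc_auc_score(y_valid, proba):.3f}")
```

On real descriptor data the same pattern applies: run the feature-reduction step (MC filtering or RFECV) on the model dataset only, then report metrics separately for the model and validation splits, as in the two rows per dataset above.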