2020 Oct 6;10:16581. doi: 10.1038/s41598-020-73644-6

Table 2.

The three top-performing binary classifiers for each of the HemoPI-1, HemoPI-2 and HemoPI-3 datasets after optimizing the classifier hyperparameters and the number N of descriptors.

| Classifiers | Hyperparameters | Feature reduction | N | Acc. (%) | Prec. (%) | MCC statistic | CK statistic | AUC ROC |
|---|---|---|---|---|---|---|---|---|
| HemoPI-1 model and validation datasets | | | | | | | | |
| 1.1 LDA | 'solver': 'svd', 'tol': 0.0001 | RFECV | 18 | 95.1 | 92.6 | 0.903 | 0.903 | 0.951 |
| | | | | 94.6 | 92.5 | 0.891 | 0.891 | 0.946 |
| 1.2 GBC | 'max_depth': 4, 'max_features': 'sqrt', 'min_samples_leaf': 10, 'n_estimators': 240 | MC (0.75) | 26 | 96.5 | 95.0 | 0.930 | 0.930 | 0.965 |
| | | | | 92.7 | 89.6 | 0.855 | 0.855 | 0.927 |
| 1.3 GBC | 'max_depth': 4, 'max_features': 'sqrt', 'min_samples_leaf': 10, 'n_estimators': 208 | None | 56 | 96.0 | 94.6 | 0.921 | 0.921 | 0.960 |
| | | | | 92.3 | 89.2 | 0.846 | 0.846 | 0.923 |
| HemoPI-2 model and validation datasets | | | | | | | | |
| 2.1 GBC | 'max_depth': 4, 'max_features': 'sqrt', 'min_samples_leaf': 2, 'n_estimators': 112 | None | 56 | 77.7 | 74.0 | 0.549 | 0.549 | 0.774 |
| | | | | 74.3 | 70.4 | 0.479 | 0.476 | 0.736 |
| 2.2 GBC | Default | RFECV | 15 | 77.8 | 74.2 | 0.552 | 0.552 | 0.775 |
| | | | | 73.2 | 69.8 | 0.459 | 0.482 | 0.728 |
| 2.3 GBC | Default | None | 56 | 76.7 | 72.9 | 0.529 | 0.528 | 0.763 |
| | | | | 72.3 | 68.9 | 0.439 | 0.437 | 0.717 |
| HemoPI-3 model and validation datasets | | | | | | | | |
| 3.1 GBC | 'max_depth': 18, 'max_features': 'log2', 'min_samples_leaf': 10, 'n_estimators': 192 | None | 56 | 80.0 | 76.4 | 0.597 | 0.597 | 0.796 |
| | | | | 71.7 | 68.3 | 0.427 | 0.425 | 0.711 |
| 3.2 GBC | 'max_depth': 12, 'max_features': 'sqrt', 'min_samples_leaf': 8, 'n_estimators': 160 | RFECV | 40 | 78.2 | 74.4 | 0.559 | 0.558 | 0.777 |
| | | | | 74.5 | 70.8 | 0.483 | 0.482 | 0.740 |
| 3.3 GBC | 'max_depth': 20, 'max_features': 'log2', 'min_samples_leaf': 8, 'n_estimators': 96 | MC (0.75) | 28 | 78.0 | 74.4 | 0.556 | 0.556 | 0.777 |
| | | | | 72.6 | 69.0 | 0.445 | 0.443 | 0.719 |

The optimal number N of descriptors was determined using one of three methods. MC: multicollinearity; RFECV: tenfold cross-validated recursive feature elimination; BE: backward elimination.
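
The MCC and CK columns in Table 2 are nearly identical for most rows. The sketch below shows one way such metrics can be computed from a binary confusion matrix, assuming MCC denotes the Matthews correlation coefficient and CK denotes Cohen's kappa. The function name `binary_metrics` and the example counts are hypothetical and are not taken from the paper. Returning 0.0 when a denominator is zero is a convention chosen here, not something the source specifies.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Accuracy, precision, MCC and Cohen's kappa for a 2x2 confusion matrix."""
    n = tp + fp + tn + fn
    if n == 0:
        raise ValueError("confusion matrix is empty")

    accuracy = (tp + tn) / n
    # Precision: fraction of predicted positives that are true positives.
    precision = tp / (tp + fp) if (tp + fp) else 0.0

    # Matthews correlation coefficient (0.0 by convention if undefined).
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_exp = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n**2
    kappa = (accuracy - p_exp) / (1 - p_exp) if p_exp != 1 else 0.0

    return {
        "accuracy": accuracy,
        "precision": precision,
        "mcc": mcc,
        "kappa": kappa,
    }


# Hypothetical counts for a balanced test set of 100 peptides.
example = binary_metrics(tp=45, fp=5, tn=45, fn=5)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

In this symmetric example (equal numbers of false positives and false negatives) the two statistics coincide, which is consistent with the close agreement between the MCC and CK columns in the table.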