Sci Rep. 2021 Mar 9;11:5529. doi: 10.1038/s41598-021-85016-9

Table 1.

Summary table of performance measures of the investigated ML algorithms developed on human expert-annotated features (HEAF).

| Report section | Method | ML classifier | HEAF feature space | Rank | Software | Optimized metric | Tested hyperparameter space | Selected number of features or hyperparameter settings on outer folds 1–5 | Accuracy# [min–max; %] | ME | AUC | BS | LL |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Human Expert-Annotated Features (HEAF) | CART | CT | p = 28 (all) | | rpart [R] | ACC | rpart.control = default; cp = 0.01; no optimization (no pruning) | 28 | 73.3 [66.7–79.2] | 0.27 | 0.63 | 0.37 | 0.87 |
| | vRF | RF | p = 28 (all) | 4 | randomForest [R] | ME | ntree = 500, mtry = 5, pvarsel = 28 | 28 | 81.5 [73.8–92.7] | 0.18 | 0.82 | 0.27 | 0.44 |
| | vRF | RF | p = 28 (all) | | randomForest [R] | ME | ntree = 500, mtry = 5, pvarsel = 9 | 9 | 71.0 [59.5–82.9] | 0.29 | 0.69 | 0.37 | 0.56 |
| | vRF | RF | p = 28 (all) | | randomForest [R] | ME | ntree = 500, mtry = 5, pvarsel = 5 | 5 | 75.2 [68.3–83.3] | 0.25 | 0.69 | 0.36 | 0.54 |
| | tRFBS | RF | p = 28 (all) | 2 | randomForest [R] | BS | ntree = [100, 200, 300, … , 900, 1000] | 28, 14, 14, 14, 14 | 83.1 [76.2–90.2] | 0.17 | 0.81 | 0.27 | 0.44 |
| | tRFME | RF | p = 28 (all) | | randomForest [R] | ME | mtry = [3, 4, 5, 6, 7] | 28, 28, 14, 5, 14 | 79.6 [68.3–90.2] | 0.20 | 0.79 | 0.29 | 0.46 |
| | tRFLL | RF | p = 28 (all) | 2 | randomForest [R] | LL | pvarsel = [3, 5, 10, 14, 20, 25, 28] | 25, 14, 14, 14, 14 | 83.1 [76.2–90.2] | 0.17 | 0.81 | 0.27 | 0.44 |
| | ELNET | ELNET | p = 28 (all) | 3 | glmnet [R] | ME | α = [0, 0.1, 0.2, … , 0.8, 0.9, 1]; λ = tenfold CV with default hot-start | α = [0.1, 0.8, 0, 1, 0.1]; λ = [0.195, 0.0688, 0.208, 0.0301, 0.1632] | 82.0 [78.6–85.4] | 0.18 | 0.79 | 0.27 | 0.43 |
| | SVM-LK | SVM | p = 28 (all) | 1 | e1071 [R] | ME | C = [0.001, 0.01, 0.1, 1, 10, 100, 1000] | C = [1, 1, 100, 10, 10] | 87.4 [82.9–90.2] | 0.13 | 0.79 | 0.22 | 0.37 |
| | XGBoost | BT | p = 28 (all) | 5 | xgboost [R] | ME | nrounds/ntree = 100; max_depth = [3, 5, 6, 8]; eta = [0.1, 0.3]; gamma = [0, 0.5, 1.0]; colsample_bytree = [0.1, 0.25, 0.5, 0.693 (ln2) ~ RF, 1] | nrounds = 100; max_depth = [5, 3, 5, 8, 3]; eta = [0.1, 0.1, 0.1, 0.3, 0.1]; gamma = [0, 0.5, 1, 0.5, 1]; colsample_bytree = [1, 1, 0.5, 1, 0.5] | 80.6 [75.0–85.7] | 0.19 | 0.70 | 0.30 | 0.48 |

Accuracy#: averaged fivefold CV accuracy; ACC: accuracy; AUC: multiclass area under the ROC curve after Hand and Till (which can only be calculated if the class probabilities are scaled to sum to 1); BS: Brier score; ME: misclassification error; LL: multiclass log loss; vRF and tRF: vanilla and tuned random forests; ELNET: elastic net penalized multinomial logistic regression; SVM: support vector machines; LK: linear kernel SVM; XGBoost: extreme gradient boosting using trees as base learners; BT: boosted trees; CART: classification and regression trees; CT: classification tree; cp: complexity parameter used for CART node splitting (no optimization, i.e. pruning, was performed for it); ln(2) ~ RF: column sampling (i.e. bootstrap) representing the settings equivalent to running RF in the xgboost library; [R]: R statistical software environment.
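
To make the ME, BS and LL columns concrete, the following is a minimal Python sketch of how these three measures are commonly computed from predicted class probabilities. It is an illustration under stated assumptions, not the authors' R code: the function names, the toy probabilities and the labels are hypothetical, and the Brier score is assumed to follow the original multiclass definition (squared error summed over all classes, then averaged over samples), which the table does not spell out.

```python
import math

def misclassification_error(probs, labels):
    """ME: fraction of samples whose most probable class differs from the true label."""
    wrong = 0
    for p, y in zip(probs, labels):
        predicted = max(range(len(p)), key=p.__getitem__)
        if predicted != y:
            wrong += 1
    return wrong / len(labels)

def brier_score(probs, labels):
    """BS: mean over samples of the squared error summed over all classes
    (assumed definition; lower is better, 0 is perfect)."""
    total = 0.0
    for p, y in zip(probs, labels):
        for k, pk in enumerate(p):
            target = 1.0 if k == y else 0.0
            total += (pk - target) ** 2
    return total / len(labels)

def log_loss(probs, labels, eps=1e-15):
    """LL: mean negative log of the probability assigned to the true class.
    Probabilities are clipped to [eps, 1 - eps] to avoid log(0)."""
    total = 0.0
    for p, y in zip(probs, labels):
        clipped = min(max(p[y], eps), 1.0 - eps)
        total -= math.log(clipped)
    return total / len(labels)

# Hypothetical toy data: 4 samples, 3 classes, each row sums to 1.
probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
    [0.5, 0.4, 0.1],
]
labels = [0, 1, 2, 1]

me = misclassification_error(probs, labels)
bs = brier_score(probs, labels)
ll = log_loss(probs, labels)
print(f"ME={me:.2f}  BS={bs:.2f}  LL={ll:.2f}")
```

ME only looks at the top-ranked class, whereas BS and LL also penalize poorly calibrated probabilities, which is why two classifiers with similar accuracy in the table can still differ in BS and LL.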