Mar Drugs. 2019 Jan 29;17(2):81. doi: 10.3390/md17020081

Table 2.

Summary of the performance (accuracy) of our best binary classifiers after removing 39 highly correlated variables (Pearson r > 0.9, multicollinearity), applying recursive feature elimination, and/or tuning hyperparameters. All models were built on a training set (300 observations) with stratified 10-fold cross-validation and evaluated on an external test set (32 observations).
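The workflow in this caption (correlation filter, recursive feature elimination, stratified 10-fold cross-validation) maps directly onto scikit-learn, whose hyperparameter names the table already uses. The following is a minimal sketch of that pipeline, not the authors' code: the synthetic data, the `drop_correlated` helper, the variable names, and the choice of logistic regression as the RFECV estimator are all illustrative assumptions; only the 0.9 Pearson cutoff, RFECV, and the 10-fold stratified CV come from the caption.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, train_test_split

# Stand-in data: 332 observations = 300 training + 32 external test, 200 variables.
X_arr, y = make_classification(n_samples=332, n_features=200, random_state=0)
X = pd.DataFrame(X_arr, columns=[f"v{i}" for i in range(X_arr.shape[1])])
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=32, stratify=y, random_state=0
)

def drop_correlated(df: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    """Drop one variable from every pair with |Pearson r| > threshold."""
    corr = df.corr().abs()
    # Upper triangle (k=1) so each pair is inspected exactly once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    return df.drop(columns=[c for c in upper.columns if (upper[c] > threshold).any()])

X_reduced = drop_correlated(X_train)

# Recursive feature elimination with stratified 10-fold cross-validation (RFECV).
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
rfecv = RFECV(LogisticRegression(max_iter=1000), step=1, cv=cv, scoring="accuracy")
rfecv.fit(X_reduced, y_train)
print("variables kept after RFECV:", rfecv.n_features_)
```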

| Model | Number of Variables | Hyperparameters | Model Accuracy | External Test Accuracy |
|---|---|---|---|---|
| **After Recursive Feature Elimination with Cross-Validation (RFECV)** | | | | |
| logistic regression | 135 | default | 0.82 | 0.66 |
| decision tree | 200 | default | 0.75 | 0.69 |
| random forest | 154 | default | 0.82 | 0.78 |
| gradient boosting | 160 | default | 0.78 | 0.72 |
| **After Tuning Hyperparameters** | | | | |
| logistic regression ¹ | 200 | l1, 1 | 0.81 ± 0.06 | 0.63 ± 0.18 |
| decision tree ² | 200 | auto, 5, 1 | 0.70 ± 0.05 | 0.56 ± 0.20 |
| random forest ³ | 200 | sqrt, 400, 5, 2 | 0.80 ± 0.07 | 0.75 ± 0.20 |
| gradient boosting ⁴ | 200 | log2, 200, 10, 4 | 0.78 ± 0.07 | 0.69 ± 0.20 |
| **After Tuning Hyperparameters and RFECV** | | | | |
| logistic regression ¹ | 18 | l1, 1 | 0.82 | 0.63 |
| decision tree ² | 156 | auto, 5, 1 | 0.77 | 0.63 |
| random forest ³ | 150 | sqrt, 400, 5, 2 | 0.82 | 0.72 |
| gradient boosting ⁴ | 162 | log2, 200, 10, 4 | 0.82 | 0.75 |
| **After Multicollinearity and RFECV** | | | | |
| logistic regression | 88 | default | 0.82 | 0.66 |
| decision tree | 2 | default | 0.70 | 0.63 |
| random forest | 127 | default | 0.79 | 0.78 |
| gradient boosting | 69 | default | 0.78 | 0.69 |
| **After Multicollinearity and Tuning Hyperparameters** | | | | |
| logistic regression ¹ | 161 | l2, 10 | 0.81 ± 0.06 | 0.78 ± 0.18 |
| decision tree ² | 161 | auto, 6, 1 | 0.74 ± 0.04 | 0.60 ± 0.20 |
| random forest ³ | 161 | log2, 700, 5, 2 | 0.80 ± 0.07 | 0.80 ± 0.20 |
| gradient boosting ⁴ | 161 | log2, 300, 15, 4 | 0.80 ± 0.07 | 0.80 ± 0.20 |
| **After Multicollinearity, Tuning Hyperparameters and RFECV** | | | | |
| logistic regression ¹ | 99 | l2, 10 | 0.83 | 0.72 |
| decision tree ² | 85 | auto, 6, 1 | 0.76 | 0.63 |
| random forest ³ | 150 | log2, 700, 5, 2 | 0.82 | 0.72 |
| gradient boosting ⁴ | 161 | log2, 300, 15, 4 | 0.80 | 0.78 |

Hyperparameters tuned for the ¹ logistic regression {penalty, cost C}, ² decision tree {max_features, max_depth, min_samples_leaf}, ³ random forest {max_features, n_estimators, max_depth, min_samples_leaf}, and ⁴ gradient boosting {max_features, n_estimators, max_depth, min_samples_leaf} classifiers.
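The hyperparameter sets in this footnote match scikit-learn estimator parameters, which suggests a cross-validated grid search. Below is a hedged sketch for the random forest (footnote ³), reusing `X_reduced` and `y_train` from the sketch above; the `GridSearchCV` wrapper and the candidate grid values are assumptions, since the table reports only the winning settings (sqrt, 400, 5, 2).

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Illustrative grid: parameter names follow footnote 3; the candidate values
# are guesses chosen to include the winning settings reported in the table.
param_grid = {
    "max_features": ["sqrt", "log2"],
    "n_estimators": [200, 400, 700],
    "max_depth": [5, 10, 15],
    "min_samples_leaf": [1, 2, 4],
}
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    scoring="accuracy",
    cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=0),
    n_jobs=-1,
)
search.fit(X_reduced, y_train)
print(search.best_params_, f"CV accuracy: {search.best_score_:.2f}")
```

One caveat when reproducing the decision-tree rows: the max_features value "auto" reported there is scikit-learn's old alias (equivalent to "sqrt" for classifiers); it was later deprecated and removed, so recent releases require "sqrt" or "log2" explicitly.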