Sci Rep. 2023 Sep 14;13:15213. doi: 10.1038/s41598-023-42542-y

Table 2.

Performance comparison of machine learning models: random forest (RF), XGBoost (XGB), decision trees (DT), logistic regression (LogR), and support vector machines (SVM) for a varying number of features selected with permutation importance.

| Metric | # of features | DT | LogR | RF | XGB | SVM |
| --- | --- | --- | --- | --- | --- | --- |
| F1 | 8 | 0.489 ± 0.138 | 0.56 ± 0.13 | 0.429 ± 0.151 | 0.546 ± 0.127 | 0.361 ± 0.114 |
| F1 | 12 | 0.502 ± 0.134 | 0.529 ± 0.123 | 0.316 ± 0.162 | 0.547 ± 0.135 | 0.409 ± 0.1 |
| F1 | 16 | 0.493 ± 0.155 | 0.514 ± 0.13 | 0.31 ± 0.168 | 0.54 ± 0.134 | 0.483 ± 0.063 |
| F1 | 20 | 0.49 ± 0.149 | 0.514 ± 0.125 | 0.214 ± 0.167 | 0.514 ± 0.145 | 0.483 ± 0.064 |
| F1 | All | 0.421 ± 0.13 | 0.374 ± 0.117 | 0.064 ± 0.107 | 0.433 ± 0.145 | 0.487 ± 0.061 |

Performance is reported as the F1 score ± standard deviation (SD) on the test set.

Parameters of the machine learning methods: RF, DT, SVM: class_weight = 'balanced'; XGB: scale_pos_weight = counts[class1]/counts[class2]; LogR: max_iter = 10,000.
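These settings map directly onto the standard scikit-learn and XGBoost estimators. Below is a minimal configuration sketch, assuming scikit-learn and xgboost; the label array `y_train` is a placeholder, and the interpretation of class1/class2 as the majority and minority classes is an assumption, not stated in the paper.

```python
# Sketch of the model configurations listed above, assuming scikit-learn and xgboost.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from xgboost import XGBClassifier

y_train = np.array([0, 0, 0, 1, 0, 1])  # placeholder binary labels

# scale_pos_weight = counts[class1] / counts[class2]; here class1 is assumed to be
# the majority (negative) class and class2 the minority (positive) class.
counts = np.bincount(y_train)
scale_pos_weight = counts[0] / counts[1]

models = {
    "DT":   DecisionTreeClassifier(class_weight="balanced"),
    "LogR": LogisticRegression(max_iter=10_000),
    "RF":   RandomForestClassifier(class_weight="balanced"),
    "XGB":  XGBClassifier(scale_pos_weight=scale_pos_weight),
    "SVM":  SVC(class_weight="balanced"),
}
```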

Significant values are in bold.
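For context on how the feature subsets in the table could be produced, the sketch below ranks features by permutation importance and retrains a model on the top-k features, evaluating F1 on held-out data. It is an illustrative reconstruction using scikit-learn's `permutation_importance` with synthetic data and logistic regression as the ranking model, not the authors' exact pipeline.

```python
# Illustrative sketch of permutation-importance feature ranking and top-k selection,
# assuming scikit-learn; the dataset and ranking model here are placeholders.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

X, y = make_classification(n_samples=500, n_features=30, weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# Rank features by the drop in F1 when each column is permuted on held-out data.
base = LogisticRegression(max_iter=10_000).fit(X_train, y_train)
result = permutation_importance(base, X_test, y_test, scoring="f1",
                                n_repeats=20, random_state=0)
ranking = np.argsort(result.importances_mean)[::-1]

# Retrain on the top-k features only, mirroring the 8/12/16/20-feature settings above.
for k in (8, 12, 16, 20):
    cols = ranking[:k]
    model = LogisticRegression(max_iter=10_000).fit(X_train[:, cols], y_train)
    print(k, round(f1_score(y_test, model.predict(X_test[:, cols])), 3))
```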