Table 2.
Performance comparison of four machine learning models with different sampling ratios.
| Ratio | Model | AUC^a, mean (SD) | F1, mean (SD) | Accuracy, mean (SD) | Recall, mean (SD) | Precision, mean (SD) |
|---|---|---|---|---|---|---|
| 1 | LGBM^b | 0.75 (0.12) | 0.7 (0.14) | 0.72 (0.08) | 0.71 (0.19) | 0.73 (0.18) |
| 1 | LR^c | 0.75 (0.1) | 0.73 (0.13) | 0.73 (0.1) | 0.76 (0.16) | 0.72 (0.15) |
| 1 | RF^d | 0.76 (0.11) | 0.72 (0.09) | 0.72 (0.1) | 0.73 (0.11) | 0.76 (0.2) |
| 1 | XGB^e | 0.75 (0.11) | 0.71 (0.13) | 0.71 (0.1) | 0.73 (0.13) | 0.7 (0.16) |
| 3 | LGBM | 0.73 (0.07) | 0.52 (0.08) | 0.72 (0.07) | 0.62 (0.16) | 0.47 (0.1) |
| 3 | LR | 0.74 (0.07) | 0.54 (0.06) | 0.7 (0.09) | 0.72 (0.11) | 0.46 (0.11) |
| 3 | RF | 0.75 (0.08) | 0.5 (0.08) | 0.69 (0.08) | 0.64 (0.14) | 0.44 (0.11) |
| 3 | XGB | 0.76 (0.08) | 0.53 (0.11) | 0.71 (0.12) | 0.65 (0.19) | 0.48 (0.12) |
| 5 | LGBM | 0.74 (0.09) | 0.42 (0.11) | 0.7 (0.1) | 0.66 (0.18) | 0.35 (0.14) |
| 5 | LR | 0.73 (0.09) | 0.43 (0.11) | 0.68 (0.13) | 0.68 (0.13) | 0.34 (0.15) |
| 5 | RF | 0.74 (0.07) | 0.43 (0.11) | 0.71 (0.14) | 0.65 (0.15) | 0.35 (0.13) |
| 5 | XGB | 0.73 (0.08) | 0.41 (0.1) | 0.62 (0.1) | 0.8 (0.12) | 0.29 (0.09) |
| 10 | LGBM | 0.75 (0.09) | 0.32 (0.09) | 0.74 (0.13) | 0.64 (0.2) | 0.23 (0.07) |
| 10 | LR | 0.73 (0.1) | 0.29 (0.08) | 0.68 (0.13) | 0.66 (0.11) | 0.19 (0.07) |
| 10 | RF | 0.75 (0.08) | 0.29 (0.07) | 0.69 (0.12) | 0.69 (0.13) | 0.19 (0.05) |
| 10 | XGB | 0.75 (0.07) | 0.32 (0.06) | 0.72 (0.12) | 0.66 (0.13) | 0.22 (0.08) |
| 20 | LGBM | 0.72 (0.07) | 0.18 (0.07) | 0.68 (0.13) | 0.67 (0.13) | 0.11 (0.05) |
| 20 | LR | 0.73 (0.06) | 0.2 (0.06) | 0.72 (0.14) | 0.65 (0.2) | 0.13 (0.06) |
| 20 | RF | 0.72 (0.06) | 0.18 (0.04) | 0.72 (0.09) | 0.63 (0.12) | 0.11 (0.03) |
| 20 | XGB | 0.74 (0.06) | 0.17 (0.03) | 0.67 (0.13) | 0.69 (0.15) | 0.1 (0.02) |

^a AUC: area under the receiver operating characteristic curve.
^b LGBM: light gradient boosting machine.
^c LR: logistic regression.
^d RF: random forest.
^e XGB: extreme gradient boosting.
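
Across the ratios in Table 2, AUC and recall stay roughly flat while precision and F1 fall steadily. A minimal sketch of why that pattern is expected is given here, assuming the sampling ratio means the number of negative cases per positive case (the table itself does not define it). The recall and false-positive rate used (0.7 and 0.3) are illustrative values chosen for this sketch, not figures taken from the study, and the function names are hypothetical.

```python
def precision_at_ratio(recall: float, fpr: float, ratio: float) -> float:
    """Precision of a classifier with fixed recall and false-positive rate
    when there are `ratio` negatives for every positive (assumed meaning)."""
    true_pos = recall          # expected true positives per positive case
    false_pos = fpr * ratio    # expected false positives per `ratio` negatives
    denom = true_pos + false_pos
    return true_pos / denom if denom > 0 else 0.0


def f1_from(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total > 0 else 0.0


# Illustrative operating point (assumption, not study data).
RECALL, FPR = 0.7, 0.3

for ratio in (1, 3, 5, 10, 20):
    p = precision_at_ratio(RECALL, FPR, ratio)
    print(f"ratio={ratio:>2}  precision={p:.2f}  F1={f1_from(p, RECALL):.2f}")
```

With the classifier's recall and false-positive rate held fixed, precision depends only on the class balance, so it shrinks as negatives become more numerous, and F1 follows. Threshold-free measures such as AUC are not affected by class balance in the same way.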