Table 4.
Characteristics of training and test sets of the machine learning models
| Labels | Datasets | Models | Major effect | Minor effect | Moderate effect | Average |
|---|---|---|---|---|---|---|
| Specificity | Training | LGBM | 0.952470 (0.890–0.984) | 0.945017 (0.902–0.963) | 0.988782 (0.935–0.995) | 0.962090 (0.897–0.990) |
| Specificity | Training | RF | 0.942155 (0.886–0.981) | 0.883671 (0.815–0.938) | 0.971873 (0.916–0.994) | 0.932566 (0.874–0.978) |
| Specificity | Training | LR | 0.902513 (0.842–0.954) | 0.829750 (0.775–0.882) | 0.942907 (0.893–0.976) | 0.891723 (0.814–0.956) |
| Specificity | Test | LGBM | 0.901235 (0.832–0.952) | 0.810320 (0.765–0.874) | 0.910022 (0.865–0.973) | 0.873859 (0.809–0.951) |
| Specificity | Test | RF | 0.902256 (0.835–0.970) | 0.799248 (0.734–0.858) | 0.924031 (0.891–0.972) | 0.875178 (0.805–0.943) |
| Specificity | Test | LR | 0.885311 (0.816–0.937) | 0.800696 (0.764–0.853) | 0.924315 (0.892–0.971) | 0.870107 (0.799–0.926) |
| Precision | Training | LGBM | 0.909631 (0.852–0.956) | 0.891492 (0.846–0.935) | 0.976000 (0.943–0.995) | 0.925708 (0.874–0.963) |
| Precision | Training | RF | 0.880060 (0.836–0.924) | 0.780654 (0.745–0.824) | 0.942249 (0.891–0.983) | 0.867654 (0.819–0.906) |
| Precision | Training | LR | 0.811558 (0.774–0.863) | 0.686844 (0.641–0.739) | 0.869870 (0.827–0.905) | 0.789424 (0.754–0.820) |
| Precision | Test | LGBM | 0.808720 (0.771–0.853) | 0.635983 (0.597–0.681) | 0.805380 (0.758–0.851) | 0.750028 (0.723–0.789) |
| Precision | Test | RF | 0.811366 (0.778–0.856) | 0.638670 (0.589–0.675) | 0.822077 (0.775–0.874) | 0.757371 (0.702–0.792) |
| Precision | Test | LR | 0.788790 (0.742–0.831) | 0.625817 (0.582–0.673) | 0.821904 (0.791–0.875) | 0.745503 (0.702–0.779) |
| Recall | Training | LGBM | 0.952677 (0.915–0.987) | 0.902059 (0.871–0.943) | 0.917868 (0.867–0.965) | 0.924201 (0.891–0.976) |
| Recall | Training | RF | 0.868343 (0.834–0.908) | 0.848889 (0.803–0.888) | 0.875706 (0.816–0.915) | 0.864313 (0.821–0.907) |
| Recall | Training | LR | 0.862100 (0.824–0.903) | 0.738075 (0.701–0.773) | 0.752381 (0.714–0.798) | 0.784185 (0.765–0.821) |
| Recall | Test | LGBM | 0.841874 (0.805–0.888) | 0.666667 (0.614–0.709) | 0.734488 (0.687–0.781) | 0.747676 (0.697–0.791) |
| Recall | Test | RF | 0.831990 (0.788–0.882) | 0.702854 (0.664–0.751) | 0.716456 (0.684–0.752) | 0.750433 (0.714–0.798) |
| Recall | Test | LR | 0.833476 (0.802–0.873) | 0.675485 (0.641–0.709) | 0.708738 (0.667–0.746) | 0.739233 (0.701–0.776) |
| F1 score | Training | LGBM | 0.930657 (0.908–0.956) | 0.896744 (0.854–0.926) | 0.896744 (0.854–0.936) | 0.924481 (0.897–0.974) |
| F1 score | Training | RF | 0.874162 (0.847–0.913) | 0.813343 (0.785–0.864) | 0.907760 (0.854–0.947) | 0.865088 (0.823–0.906) |
| F1 score | Training | LR | 0.836066 (0.795–0.896) | 0.711538 (0.687–0.768) | 0.806871 (0.774–0.845) | 0.784825 (0.754–0.831) |
| F1 score | Test | LGBM | 0.824964 (0.778–0.865) | 0.650964 (0.614–0.692) | 0.768302 (0.725–0.805) | 0.748077 (0.704–0.793) |
| F1 score | Test | RF | 0.821549 (0.778–0.863) | 0.669226 (0.627–0.701) | 0.765641 (0.723–0.809) | 0.752139 (0.717–0.792) |
| F1 score | Test | LR | 0.810518 (0.769–0.864) | 0.649703 (0.607–0.683) | 0.761137 (0.731–0.806) | 0.740453 (0.716–0.792) |
| Accuracy | Training | LGBM | 0.934230 (0.897–0.971) | 0.894230 (0.856–0.926) | 0.884230 (0.834–0.921) | 0.924230 (0.888–0.967) |
| Accuracy | Training | RF | 0.874497 (0.821–0.914) | 0.814497 (0.774–0.854) | 0.834497 (0.794–0.871) | 0.864497 (0.824–0.912) |
| Accuracy | Training | LR | 0.773508 (0.735–0.814) | 0.693508 (0.652–0.739) | 0.753508 (0.712–0.793) | 0.783508 (0.746–0.820) |
| Accuracy | Test | LGBM | 0.817573 (0.774–0.859) | 0.737573 (0.704–0.774) | 0.737573 (0.698–0.769) | 0.747573 (0.712–0.783) |
| Accuracy | Test | RF | 0.770676 (0.716–0.816) | 0.760676 (0.723–0.802) | 0.690676 (0.642–0.735) | 0.750676 (0.713–0.801) |
| Accuracy | Test | LR | 0.690093 (0.645–0.734) | 0.750093 (0.712–0.783) | 0.730093 (0.699–0.772) | 0.740093 (0.708–0.791) |
LGBM=Light gradient boosting machine, LR=Logistic regression, RF=Random forest
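
As an illustration of how per-class values like those in Table 4 can be obtained, the following minimal sketch derives specificity, precision, recall, F1 score, and accuracy for each class from a multiclass confusion matrix using a one-vs-rest split. The confusion-matrix counts, the function names, and the use of an unweighted (macro) mean for the average are assumptions made for illustration only. The table does not state how its Average column or its intervals were computed, and this sketch does not reproduce the reported values.

```python
from typing import Dict, List


def one_vs_rest_metrics(cm: List[List[int]], labels: List[str]) -> Dict[str, Dict[str, float]]:
    """Per-class metrics from a confusion matrix.

    cm[i][j] is the number of samples whose true class is labels[i]
    and whose predicted class is labels[j].
    """
    total = sum(sum(row) for row in cm)
    results: Dict[str, Dict[str, float]] = {}
    for k, label in enumerate(labels):
        tp = cm[k][k]                          # true class k, predicted k
        fn = sum(cm[k]) - tp                   # true class k, predicted other
        fp = sum(row[k] for row in cm) - tp    # other class, predicted k
        tn = total - tp - fn - fp              # everything else
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        specificity = tn / (tn + fp) if (tn + fp) else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if (precision + recall) else 0.0)
        accuracy = (tp + tn) / total if total else 0.0
        results[label] = {
            "specificity": specificity,
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "accuracy": accuracy,
        }
    return results


def macro_average(per_class: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Unweighted mean of each metric across classes (an assumed choice)."""
    names = next(iter(per_class.values())).keys()
    n = len(per_class)
    return {name: sum(m[name] for m in per_class.values()) / n for name in names}


# Hypothetical counts, not taken from the study.
labels = ["Major effect", "Minor effect", "Moderate effect"]
cm = [
    [8, 1, 1],
    [2, 6, 2],
    [0, 2, 8],
]

metrics = one_vs_rest_metrics(cm, labels)
averages = macro_average(metrics)

for label in labels:
    print(label, {k: round(v, 3) for k, v in metrics[label].items()})
print("Average", {k: round(v, 3) for k, v in averages.items()})
```

In a one-vs-rest split each class is treated in turn as the positive class and all other classes as negative, which is why specificity and accuracy can be reported per class even though the underlying task has three classes.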