Table 2.
Imputing or not | Sampling methods | Binning or not | Screening methods | Model methods | Num of Variables | Num of Samples | AUC_TR | AUC_TE | AUC_Set 2 | OF1 | OF2 |
Not | Not | Not | Not | $D | 16 | 167 | 0.693±0.025 | 0.621±0.054 | 0.737±0.062 | 1.125±0.122 | 0.849±0.111 |
Not | Not | Not | Forward and stepwise | $S | 3 | 167 | 0.551±0.033 | 0.577±0.078 | 0.557±0.051 | 0.974±0.162 | 1.039±0.131 |
Not | Not | Not | Backward | $L | 5 | 167 | 0.659±0.019 | 0.658±0.039 | 0.687±0.052 | 1.005±0.072 | 0.964±0.102 |
Not | Not | Yes | Forward and stepwise | $S | 3 | 167 | 0.551±0.033 | 0.577±0.078 | 0.577±0.051 | 0.974±0.162 | 1.039±0.131 |
Not | Not | Yes | Backward | $S | 3 | 167 | 0.551±0.033 | 0.577±0.078 | 0.577±0.051 | 0.974±0.162 | 1.039±0.131 |
Not | Undersampling | Not | Not | $XF | 16 | 98 | 0.778±0.025 | 0.827±0.080 | 0.744±0.085 | 0.949±0.102 | 1.130±0.198 |
Not | Undersampling | Not | Forward and stepwise | $L | 3 | 98 | 0.679±0.032 | 0.660±0.075 | 0.664±0.067 | 1.044±0.142 | 1.008±0.192 |
Not | Undersampling | Not | Backward | $L | 3 | 98 | 0.679±0.032 | 0.660±0.075 | 0.664±0.067 | 1.044±0.142 | 1.008±0.192 |
Not | Undersampling | Yes | Forward and stepwise | $KNN | 4 | 98 | 0.725±0.028 | 0.674±0.074 | 0.715±0.070 | 1.086±0.122 | 0.955±0.169 |
Not | Undersampling | Yes | Backward | $XF | 5 | 98 | 0.755±0.047 | 0.753±0.074 | 0.725±0.058 | 1.013±0.136 | 1.044±0.125 |
Not | Oversampling | Not | Not | $R | 16 | 263 | 0.781±0.021 | 0.770±0.040 | 0.758±0.070 | 1.017±0.060 | 1.028±0.150 |
Not | Oversampling | Not | Backward | $XF | 5 | 263 | 0.814±0.031 | 0.799±0.037 | 0.761±0.094 | 1.021±0.073 | 1.066±0.155 |
Not | Oversampling | Not | Forward and stepwise | $XF | 4 | 263 | 0.716±0.020 | 0.726±0.026 | 0.690±0.068 | 0.987±0.048 | 1.064±0.133 |
Not | Oversampling | Yes | Forward and stepwise | $XF | 4 | 263 | 0.834±0.030 | 0.821±0.031 | 0.782±0.080 | 1.018±0.068 | 1.060±0.112 |
Not | Oversampling | Yes | Backward | $XF | 7 | 263 | 0.864±0.028 | 0.856±0.022 | 0.813±0.127 | 1.010±0.040 | 1.088±0.261 |
Yes | Not | Not | Not | $D | 16 | 315 | 0.725±0.012 | 0.678±0.047 | 0.703±0.051 | 1.074±0.087 | 0.973±0.131 |
Yes | Not | Yes | Forward and stepwise | $XF | 5 | 315 | 0.812±0.024 | 0.760±0.048 | 0.757±0.056 | 1.073±0.089 | 1.008±0.091 |
Yes | Not | Not | Forward and stepwise | $XF | 4 | 315 | 0.752±0.020 | 0.701±0.070 | 0.711±0.055 | 1.084±0.131 | 0.994±0.144 |
Yes | Not | Not | Backward | $XF | 6 | 315 | 0.742±0.019 | 0.734±0.063 | 0.734±0.066 | 1.017±0.094 | 1.012±0.160 |
Yes | Not | Yes | Backward | $B | 6 | 315 | 0.729±0.019 | 0.718±0.099 | 0.714±0.100 | 1.034±0.151 | 1.034±0.266 |
Yes | Undersampling | Not | Not | $B | 16 | 199 | 0.785±0.032 | 0.811±0.087 | 0.778±0.063 | 0.980±0.126 | 1.052±0.170 |
Yes | Undersampling | Yes | Forward and stepwise | $XF | 4 | 199 | 0.701±0.027 | 0.665±0.074 | 0.722±0.050 | 1.067±0.135 | 0.927±0.130 |
Yes | Undersampling | Not | Forward and stepwise | $S | 4 | 199 | 0.685±0.022 | 0.658±0.069 | 0.702±0.053 | 1.053±0.137 | 0.946±0.146 |
Yes | Undersampling | Yes | Backward | $S | 5 | 199 | 0.699±0.015 | 0.754±0.083 | 0.733±0.052 | 0.938±0.113 | 1.034±0.143 |
Yes | Undersampling | Not | Backward | $KNN | 5 | 199 | 0.740±0.029 | 0.738±0.082 | 0.736±0.065 | 1.017±0.143 | 1.013±0.165 |
Yes | Oversampling | Not | Not | $XF | 16 | 513 | 0.916±0.030 | 0.869±0.041 | 0.862±0.123 | 1.056±0.052 | 1.039±0.243 |
Yes | Oversampling | Yes | Forward and stepwise | $B | 7 | 513 | 0.857±0.023 | 0.824±0.039 | 0.849±0.072 | 1.042±0.052 | 0.978±0.118 |
Yes | Oversampling | Not | Forward and stepwise | $XF | 8 | 513 | 0.907±0.031 | 0.861±0.039 | 0.843±0.115 | 1.054±0.049 | 1.049±0.230 |
Yes | Oversampling | Not | Backward | $XF | 9 | 513 | 0.907±0.024 | 0.871±0.030 | 0.866±0.082 | 1.041±0.036 | 1.017±0.134 |
Yes | Oversampling | Yes | Backward | $B | 9 | 513 | 0.865±0.032 | 0.823±0.050 | 0.839±0.107 | 1.054±0.070 | 1.003±0.191 |
OF1 was calculated as AUC_TR / AUC_TE (AUC of the set 1 training set divided by AUC of the set 1 testing set), and OF2 as AUC_TE / AUC_Set 2 (AUC of the set 1 testing set divided by AUC of set 2).
The bold value is the maximum AUC_Set 2 among the 30 algorithms (0.866±0.082).
AUC, area under the receiver operating characteristic curve; AUC_Set 2, AUC of set 2; AUC_TE, AUC of set 1 testing set; AUC_TR, AUC of set 1 training set; $B, Bayesian network; $D, discriminant model; $KNN, k-nearest neighbors (KNN) algorithm; $L, logistic regression model; $R, chi-squared automatic interaction detection (CHAID); $S, support vector machine (SVM); $XF, the ensemble model.
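
The two overfitting ratios defined in the footnote are simple quotients of AUC values, and a minimal sketch of the calculation may help when checking individual rows. The function name `overfitting_ratios` and its argument names are hypothetical and do not come from the source. The sketch assumes the three AUCs are supplied as plain numbers. The values reported in the table appear to be means over repeated runs, so a ratio of the mean AUCs will generally only approximate the reported OF1 and OF2.

```python
from typing import Tuple


def overfitting_ratios(auc_tr: float, auc_te: float, auc_set2: float) -> Tuple[float, float]:
    """Return (OF1, OF2) following the footnote definitions.

    OF1 = AUC_TR / AUC_TE    (set 1 training AUC over set 1 testing AUC)
    OF2 = AUC_TE / AUC_Set 2 (set 1 testing AUC over set 2 AUC)
    """
    # Guard against division by zero or meaningless negative inputs.
    if auc_te <= 0 or auc_set2 <= 0:
        raise ValueError("AUC_TE and AUC_Set 2 must be positive")
    of1 = auc_tr / auc_te
    of2 = auc_te / auc_set2
    return of1, of2


# Mean AUCs taken from the row: imputing, oversampling, no binning,
# backward screening, $XF (AUC_TR 0.907, AUC_TE 0.871, AUC_Set 2 0.866).
of1, of2 = overfitting_ratios(0.907, 0.871, 0.866)
print(f"OF1 = {of1:.3f}, OF2 = {of2:.3f}")
```

A ratio close to 1 indicates similar performance on the two data sets being compared; OF1 well above 1 suggests the model fits the training set better than the testing set.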