Table 4.
Approaches | AUC_TR: univariate P value | AUC_TR: univariate MMD/R | AUC_TR: multivariate* P value | AUC_TR: multivariate* SE | AUC_TE: univariate P value | AUC_TE: univariate MMD/R | AUC_TE: multivariate P value | AUC_TE: multivariate SE | AUC_Set 2: univariate P value | AUC_Set 2: univariate MMD/R | AUC_Set 2: multivariate P value | AUC_Set 2: multivariate SE | OF1: univariate P value | OF1: univariate MMD/R | OF1: multivariate P value | OF1: multivariate SE | OF2: univariate P value | OF2: univariate MMD/R | OF2: multivariate P value | OF2: multivariate SE |
Imputing or not | <0.0001† | 0.0793 | 0.0019 | −0.3020 | <0.0001† | 0.0605 | 0.0813 | −0.2082 | <0.0001† | 0.0759 | 0.1719 | −0.1687 | 0.0394† | 0.0231 | 0.6812 | −0.0635 | 0.1084† | 0.0203 | 0.6829 | −0.0634 |
Sampling methods | <0.0001† | 0.1695 | 0.0141 | −0.2757 | <0.0001† | 0.1620 | 0.2980 | −0.1436 | <0.0001† | 0.1349 | 0.0840 | −0.2471 | 0.7884† | 0.0145 | 0.3667 | −0.1615 | 0.5764† | 0.0440 | 0.6617 | 0.0786 |
Binning or not | 0.6441† | 0.0053 | 0.7135 | −0.0149 | 0.7837† | 0.0009 | 0.8834 | −0.0073 | 0.8258† | 0.0028 | 0.7578 | 0.0160 | 0.9188† | 0.0066 | 0.9212 | −0.0064 | 0.6672† | 0.0036 | 0.7668 | −0.0193 |
Screening methods | 0.0119† | 0.0489 | 0.0338 | 0.1102 | 0.0042† | 0.0541 | 0.0024 | 0.1950 | 0.0091‡ | 0.0513 | 0.0352 | 0.1394 | 0.2277‡ | 0.0242 | 0.1343 | −0.1242 | 0.6512† | 0.0213 | 0.4484 | 0.0631 |
Model methods | <0.0001† | 0.2015 | <0.0001 | 0.2734 | <0.0001† | 0.1654 | <0.0001 | 0.2864 | <0.0001† | 0.1739 | <0.0001 | 0.2025 | 0.0143† | 0.1166 | 0.5617 | −0.0386 | 0.0271† | 0.1274 | 0.1616 | 0.0936 |
Num of Variables | <0.0001§ | 0.6121 | <0.0001 | 0.2905 | <0.0001§ | 0.5197 | <0.0001 | 0.3360 | <0.0001§ | 0.5147 | <0.0001 | 0.3134 | 0.2593§ | 0.0653 | 0.4035 | −0.0739 | 0.2219§ | −0.0707 | 0.7425 | 0.0292 |
Num of Samples¶ | <0.0001§ | 0.6716 | <0.0001 | 0.9020 | <0.0001§ | 0.5083 | 0.0022 | 0.5584 | <0.0001§ | 0.4949 | <0.0001 | 0.3655 | 0.0722§ | 0.1039 | 0.1972 | 0.3024 | 0.5946§ | −0.0308 | 0.8326 | −0.0497 |
Oversampling led to an abnormal distribution in all five indexes.
Bold values mark the approach parameters that significantly affect the predictive indicators.
*Multiple linear regression was used for multivariate analysis.
†Kruskal-Wallis test.
‡General linear models for analysis of variance (ANOVA).
§Spearman correlation analysis.
¶The variance inflation factor (VIF) of the variable ‘Num of Samples’ in the multiple regression model is 16.4146 (greater than 10), which indicates possible multicollinearity that may make the model unstable; this variable may be severely collinear with imputing, binning, and sampling, so the multiple linear regression (MLR) model was re-established after these three variables were eliminated.
AUC, area under the receiver operating characteristic curve; AUC_Set 2, AUC of set 2; AUC_TE, AUC of set 1 testing set; AUC_TR, AUC of set 1 training set; MMD, maximum mean difference among levels; R, correlation coefficient; SE, standardized estimate.
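The collinearity check described in footnote ¶ can be illustrated with a minimal sketch. It assumes the VIF of each predictor is taken as the corresponding diagonal element of the inverse of the predictors' Pearson correlation matrix, which equals 1/(1 − R²) from regressing that predictor on the others. The function names, the column names, and the data below are hypothetical and are not taken from the study; only the threshold of 10 comes from the footnote.

```python
import math


def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length numeric columns."""
    n = len(columns[0])
    centered = []
    for col in columns:
        m = sum(col) / n
        centered.append([v - m for v in col])
    norms = [math.sqrt(sum(v * v for v in c)) for c in centered]
    k = len(columns)
    return [
        [
            sum(a * b for a, b in zip(centered[i], centered[j]))
            / (norms[i] * norms[j])
            for j in range(k)
        ]
        for i in range(k)
    ]


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(matrix)
    aug = [
        list(row) + [1.0 if i == j else 0.0 for j in range(k)]
        for i, row in enumerate(matrix)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular correlation matrix (perfect collinearity)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def vif(columns):
    """VIF per predictor: diagonal of the inverse correlation matrix."""
    inv = invert(correlation_matrix(columns))
    return [inv[j][j] for j in range(len(columns))]


# Hypothetical predictors (illustrative values only, not study data).
names = ["num_samples", "sampling", "binning"]
data = [
    [1, 2, 3, 4, 5],
    [2, 4, 6, 8, 11],   # nearly proportional to num_samples
    [1, -1, 0, -1, 1],
]

VIF_THRESHOLD = 10  # cut-off quoted in footnote ¶
vifs = vif(data)
flagged = [name for name, v in zip(names, vifs) if v > VIF_THRESHOLD]
print(dict(zip(names, vifs)), flagged)
```

Predictors that exceed the threshold are candidates for removal before refitting the regression. Which of a collinear group to drop remains a modelling decision; the table's authors chose to eliminate imputing, binning, and sampling.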