J Biomed Inform. Author manuscript; available in PMC 2014 Jun 30. Published in final edited form as: J Biomed Inform. 2013 Dec 25;48:160–170. doi: 10.1016/j.jbi.2013.12.012

Table 4.

Average AUC (area under the ROC curve) for each feature selection and classifier combination in the predictive modeling pipelines, on the three EHR data sets. Averages and standard deviations are computed over 100 classifier runs (10 × 10-fold cross-validation).

Data Set | Feature Selection | Classification | Average AUC (std)
Small | Information Gain | K-Nearest Neighbor | 0.680 (0.075)
Small | Information Gain | Naïve Bayesian | 0.687 (0.042)
Small | Information Gain | Logistic Regression | 0.690 (0.044)
Small | Information Gain | Random Forest | 0.717 (0.045)
Small | Fisher Score | K-Nearest Neighbor | 0.655 (0.102)
Small | Fisher Score | Naïve Bayesian | 0.690 (0.042)
Small | Fisher Score | Logistic Regression | 0.689 (0.046)
Small | Fisher Score | Random Forest | 0.713 (0.043)
Medium | Information Gain | K-Nearest Neighbor | 0.598 (0.013)
Medium | Information Gain | Naïve Bayesian | 0.692 (0.013)
Medium | Information Gain | Logistic Regression | 0.746 (0.012)
Medium | Information Gain | Random Forest | 0.752 (0.012)
Medium | Fisher Score | K-Nearest Neighbor | 0.616 (0.013)
Medium | Fisher Score | Naïve Bayesian | 0.688 (0.014)
Medium | Fisher Score | Logistic Regression | 0.741 (0.012)
Medium | Fisher Score | Random Forest | 0.749 (0.012)
Large | Information Gain | K-Nearest Neighbor | 0.602 (0.027)
Large | Information Gain | Naïve Bayesian | 0.634 (0.007)
Large | Information Gain | Logistic Regression | 0.706 (0.006)
Large | Information Gain | Random Forest | 0.705 (0.006)
Large | Fisher Score | K-Nearest Neighbor | 0.597 (0.023)
Large | Fisher Score | Naïve Bayesian | 0.632 (0.007)
Large | Fisher Score | Logistic Regression | 0.705 (0.006)
Large | Fisher Score | Random Forest | 0.704 (0.006)