2020 May 26;5(3):e00656-19. doi: 10.1128/mSystems.00656-19

TABLE 2.

Comparison of different feature engineering approaches on performance of ML prediction of PRa

| Genomes used | Metric | Ref-based value (95% CI)b | Ref-based value with GWAS (95% CI)b | Ref-free value (95% CI)b | Ref-based value with polymyxin exposure data (95% CI)b | P value (GWAS vs ref based) | P value (ref free vs ref based) | P value (poly exposure vs ref based) |
|---|---|---|---|---|---|---|---|---|
| CUIMC | Best algorithm | Random forest | Random forest | SVC | GBTC | | | |
| | AUROC | 0.885 (0.849, 0.92) | 0.893 (0.864, 0.922) | 0.696 (0.564, 0.828) | 0.923 (0.88, 0.965) | 0.571 | 0.241 | 0.104 |
| | bACC | 0.789 (0.751, 0.827) | 0.841 (0.82, 0.862) | 0.64 (0.536, 0.743) | 0.796 (0.714, 0.879) | 0.026* | 0.226 | 0.544 |
| | Accuracy | 0.796 (0.762, 0.83) | 0.854 (0.83, 0.878) | 0.649 (0.541, 0.758) | 0.804 (0.716, 0.892) | 0.009* | 0.91 | 0.342 |
| | F1 | 0.755 (0.701, 0.809) | 0.816 (0.793, 0.84) | 0.579 (0.454, 0.704) | 0.768 (0.685, 0.85) | 0.045* | 0.734 | 0.733 |
| | Precision | 0.799 (0.739, 0.859) | 0.881 (0.848, 0.914) | 0.67 (0.506, 0.833) | 0.866 (0.737, 0.996) | 0.006* | 0.023* | 0.085 |
| | Recall | 0.733 (0.635, 0.831) | 0.763 (0.725, 0.801) | 0.56 (0.407, 0.714) | 0.732 (0.607, 0.857) | 1 | 0.011* | 0.879 |
| Non-CUIMC | Best algorithm | GBTC | SVC | SVC | | | | |
| | AUROC | 0.933 (0.884, 0.982) | 0.933 (0.888, 0.979) | 0.803 (0.692, 0.913) | | 0.85 | 0.677 | |
| | bACC | 0.753 (0.654, 0.853) | 0.82 (0.76, 0.881) | 0.5 (0.5, 0.5) | | 0.427 | 0.089 | |
| | Accuracy | 0.873 (0.822, 0.925) | 0.917 (0.894, 0.94) | 0.82 (0.811, 0.83) | | 0.185 | 0.005* | |
| | F1 | 0.59 (0.395, 0.785) | 0.729 (0.648, 0.81) | 0 (0, 0) | | 0.345 | 0.623 | |
| | Precision | 0.711 (0.473, 0.949) | 0.832 (0.759, 0.905) | 0 (0, 0) | | 0.703 | 0.569 | |
| | Recall | 0.57 (0.362, 0.778) | 0.669 (0.547, 0.791) | 0 (0, 0) | | 0.88 | 0* | |
| All | Best algorithm | GBTC | SVC | GBTC | | | | |
| | AUROC | 0.894 (0.838, 0.95) | 0.931 (0.915, 0.947) | 0.692 (0.546, 0.838) | | 0.19 | 0.015* | |
| | bACC | 0.784 (0.73, 0.838) | 0.801 (0.776, 0.827) | 0.5 (0.5, 0.5) | | 0.473 | 0.006* | |
| | Accuracy | 0.827 (0.78, 0.874) | 0.864 (0.84, 0.888) | 0.688 (0.685, 0.691) | | 0.162 | 0.045* | |
| | F1 | 0.702 (0.623, 0.781) | 0.741 (0.702, 0.779) | 0 (0, 0) | | 0.384 | 0.003* | |
| | Precision | 0.8 (0.675, 0.926) | 0.889 (0.846, 0.932) | 0 (0, 0) | | 0.363 | 0.472 | |
| | Recall | 0.668 (0.549, 0.788) | 0.638 (0.591, 0.686) | 0 (0, 0) | | 0.344 | 0.064 | |
a *, statistical significance with a P value of <0.05. Abbreviations: ML, machine learning; PR, polymyxin resistance; AUROC, area under receiver-operator curve; bACC, balanced accuracy; CI, confidence interval; CUIMC, Columbia University Irving Medical Center; GWAS, genome-wide association study; ref, reference.

bUsing the best-performing algorithm indicated.
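
The threshold-based metrics in the table (accuracy, bACC, precision, recall, F1) can all be derived from a binary confusion matrix. The sketch below is illustrative only: the function name `classification_metrics` and the confusion counts are hypothetical and are not taken from the study, and treating precision and F1 as 0 when no isolate is predicted resistant is an assumed convention. It shows one pattern that would give a bACC of 0.5 with precision, recall, and F1 of 0 while accuracy stays high, as in the ref-free rows for the non-CUIMC and All genome sets: a classifier that labels every isolate as the majority class.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute threshold-based metrics from binary confusion-matrix counts.

    Assumed convention: a ratio with a zero denominator is reported as 0.0.
    """
    total = tp + fp + tn + fn
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # sensitivity
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / total,
        "bACC": (recall + specificity) / 2,  # mean of sensitivity and specificity
        "precision": precision,
        "recall": recall,
        "F1": f1,
    }


# Hypothetical counts: 82 susceptible and 18 resistant isolates,
# with every isolate predicted susceptible (no positive predictions).
majority_only = classification_metrics(tp=0, fp=0, tn=82, fn=18)
for name, value in majority_only.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts, accuracy equals the share of the majority class while bACC falls to chance level, which is why balanced accuracy is reported alongside accuracy when classes are imbalanced.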