Table 2. Performance of the models in discriminating clustered strains from non-clustered strains in the lineage 2 cohort
Training set: n = 3595 (2081 clustered, 1514 non-clustered strains). Test set: n = 1541 (918 clustered, 623 non-clustered strains).

| Parameter | Training set: Random Forest | Training set: Gradient Boosted Classification Tree | Test set: Random Forest | Test set: Gradient Boosted Classification Tree |
|---|---|---|---|---|
| Kappa | 0.641 | 0.613 | 0.454 | 0.442 |
| AUC (95% CI) | 0.908 (0.899, 0.917) | 0.877 (0.866, 0.888) | 0.791 (0.771, 0.811) | 0.778 (0.757, 0.799) |
| Sensitivity (95% CI) | 0.873 (0.862, 0.884) | 0.836 (0.824, 0.848) | 0.786 (0.766, 0.806) | 0.807 (0.787, 0.827) |
| Specificity (95% CI) | 0.762 (0.748, 0.776) | 0.779 (0.765, 0.793) | 0.666 (0.642, 0.690) | 0.628 (0.604, 0.652) |
| PPV (95% CI) | 0.837 (0.825, 0.849) | 0.845 (0.833, 0.857) | 0.771 (0.750, 0.792) | 0.741 (0.719, 0.763) |
| NPV (95% CI) | 0.811 (0.798, 0.824) | 0.767 (0.753, 0.781) | 0.686 (0.663, 0.709) | 0.712 (0.689, 0.735) |
| PLR (95% CI) | 4.437 (4.415, 4.459) | 3.625 (3.597, 3.653) | 2.451 (2.402, 2.500) | 2.571 (2.528, 2.614) |
| NLR (95% CI) | 0.225 (0.150, 0.300) | 0.276 (0.198, 0.354) | 0.408 (0.313, 0.503) | 0.389 (0.301, 0.477) |
| Accuracy (95% CI) | 0.827 (0.815, 0.839) | 0.813 (0.800, 0.826) | 0.737 (0.715, 0.759) | 0.730 (0.708, 0.752) |
AUC, area under the curve; PPV, positive predictive value; NPV, negative predictive value; PLR, positive likelihood ratio; NLR, negative likelihood ratio; CI, confidence interval.
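The threshold-based metrics in the table (everything except AUC) are standard functions of a 2 × 2 confusion matrix. As a minimal sketch, assuming "clustered" is the positive class, they can be computed as below. The function names `confusion_metrics` and `wald_ci` are hypothetical, the counts in the usage block are invented for illustration and are not the study's data, and the Wald interval is only one common way to obtain a 95% CI for a proportion, not necessarily the method used for the intervals reported above.

```python
import math


def confusion_metrics(tp: int, fn: int, tn: int, fp: int) -> dict[str, float]:
    """Common binary-classification metrics from confusion-matrix counts.

    Assumption for illustration: positive class = "clustered",
    negative class = "non-clustered".
    """
    n = tp + fn + tn + fp
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    accuracy = (tp + tn) / n      # overall proportion correct
    # Chance agreement for Cohen's kappa: (predicted share x actual share),
    # summed over the positive and negative classes.
    p_expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n**2
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),                       # positive predictive value
        "npv": tn / (tn + fn),                       # negative predictive value
        "plr": sensitivity / (1 - specificity),      # positive likelihood ratio
        "nlr": (1 - sensitivity) / specificity,      # negative likelihood ratio
        "accuracy": accuracy,
        "kappa": (accuracy - p_expected) / (1 - p_expected),
    }


def wald_ci(p: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation (Wald) interval for a proportion p estimated from n cases."""
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


if __name__ == "__main__":
    # Hypothetical counts, not taken from the study.
    metrics = confusion_metrics(tp=80, fn=20, tn=60, fp=40)
    for name, value in metrics.items():
        print(f"{name}: {value:.3f}")
    # Sensitivity is estimated from the tp + fn positive cases.
    lo, hi = wald_ci(metrics["sensitivity"], n=80 + 20)
    print(f"sensitivity 95% CI: ({lo:.3f}, {hi:.3f})")
```

AUC is omitted because it depends on the full ranking of predicted probabilities rather than on a single confusion matrix.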