Table 3.
Best-performing model results stratified by sensitivity target
| Sensitivity Target | Sensitivity (95% CI) | Specificity (95% CI) | Precision (95% CI) | F1 Score (95% CI) |
|---|---|---|---|---|
| 1.00 | 1.00 (1.00–1.00) | 0.03 (0.00–0.25) | 0.89 (0.89–0.91) | 0.94 (0.94–0.96) |
| 0.90 | 0.98 (0.84–1.00) | 0.07 (0.00–0.25) | 0.89 (0.89–0.91) | 0.94 (0.87–0.96) |
| 0.75 | 0.91 (0.72–1.00) | 0.35 (0.25–0.75) | 0.92 (0.89–0.97) | 0.91 (0.81–0.96) |
| 0.50 | 0.80 (0.50–0.94) | 0.66 (0.50–1.00) | 0.95 (0.91–1.00) | 0.86 (0.65–0.95) |
| 0.25 | 0.63 (0.19–0.88) | 0.93 (0.75–1.00) | 0.99 (0.94–1.00) | 0.75 (0.32–0.93) |
| 0.10 | 0.48 (0.06–0.88) | 1.00 (1.00–1.00) | 1.00 (1.00–1.00) | 0.60 (0.12–0.93) |
| 0.00 | 0.03 (0.03–0.06) | 1.00 (1.00–1.00) | 1.00 (1.00–1.00) | 0.07 (0.06–0.12) |
The best-performing model, as measured by AUROC, was an XGBoost model trained on text data only, derived using a character threshold of 500. For each sensitivity target, the table reports the sensitivity, specificity, precision and F1 score of this model at the decision threshold whose achieved sensitivity was greater than or equal to the target and nearest to it. Confidence intervals were estimated by bootstrapping (1,000 model fits).
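The threshold-selection rule described in the note can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `metrics_at` and `pick_threshold` are hypothetical, candidate thresholds are assumed to be the distinct predicted scores, and ties in sensitivity are assumed to be broken in favour of the highest threshold.

```python
import numpy as np


def metrics_at(y_true, y_score, threshold):
    """Confusion-matrix metrics when scores >= threshold are called positive."""
    y_true = np.asarray(y_true).astype(bool)
    pred = np.asarray(y_score, dtype=float) >= threshold
    tp = int(np.sum(pred & y_true))
    fp = int(np.sum(pred & ~y_true))
    fn = int(np.sum(~pred & y_true))
    tn = int(np.sum(~pred & ~y_true))
    return {
        "threshold": float(threshold),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


def pick_threshold(y_true, y_score, target):
    """Return metrics at the threshold whose sensitivity is >= target and nearest to it.

    Assumption: candidates are the distinct observed scores, so at least one
    case is always predicted positive. Sensitivity never increases as the
    threshold rises, so the last qualifying candidate in ascending order is
    the one closest to the target from above.
    """
    best = None
    for t in np.unique(y_score):  # np.unique returns sorted, ascending values
        m = metrics_at(y_true, y_score, t)
        if m["sensitivity"] >= target:
            best = m
    return best


# Toy data (made up for illustration): four positives, two negatives.
y_true = [1, 1, 1, 1, 0, 0]
y_score = [0.9, 0.8, 0.6, 0.3, 0.7, 0.2]

for target in (1.0, 0.6, 0.5, 0.0):
    print(target, pick_threshold(y_true, y_score, target))
```

Under these assumptions, the achieved sensitivity can exceed the target whenever no candidate threshold hits it exactly, which is consistent with the table reporting, for example, a sensitivity of 0.80 for a target of 0.50.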