2025 Jun 6;8:340. doi: 10.1038/s41746-025-01703-1

Table 3.

Best-performing model results stratified by sensitivity target

| Sensitivity target | Sensitivity (95% CI) | Specificity (95% CI) | Precision (95% CI) | F1 score (95% CI) |
|---|---|---|---|---|
| 1.00 | 1.00 (1.00–1.00) | 0.03 (0.00–0.25) | 0.89 (0.89–0.91) | 0.94 (0.94–0.96) |
| 0.90 | 0.98 (0.84–1.00) | 0.07 (0.00–0.25) | 0.89 (0.89–0.91) | 0.94 (0.87–0.96) |
| 0.75 | 0.91 (0.72–1.00) | 0.35 (0.25–0.75) | 0.92 (0.89–0.97) | 0.91 (0.81–0.96) |
| 0.50 | 0.80 (0.50–0.94) | 0.66 (0.50–1.00) | 0.95 (0.91–1.00) | 0.86 (0.65–0.95) |
| 0.25 | 0.63 (0.19–0.88) | 0.93 (0.75–1.00) | 0.99 (0.94–1.00) | 0.75 (0.32–0.93) |
| 0.10 | 0.48 (0.06–0.88) | 1.00 (1.00–1.00) | 1.00 (1.00–1.00) | 0.60 (0.12–0.93) |
| 0.00 | 0.03 (0.03–0.06) | 1.00 (1.00–1.00) | 1.00 (1.00–1.00) | 0.07 (0.06–0.12) |

The best-performing model, as measured by AUROC, was an XGBoost model trained on text data only, derived using a character threshold of 500. The table above reports sensitivity, specificity, precision, and F1 score for this model at different decision thresholds. For each sensitivity target, the decision threshold selected was the one whose achieved sensitivity was greater than or equal to, and nearest to, the target. Confidence intervals were estimated by bootstrapping (1,000 models fit).
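The threshold-selection rule described in the note can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code. It assumes that candidate thresholds are the observed predicted scores, that ties in sensitivity are broken by taking the higher threshold, and that a percentile interval is used. The function names (`metrics_at_threshold`, `threshold_for_target`, `bootstrap_ci`) and the toy data are hypothetical. The bootstrap here resamples a fixed set of predictions, which is a simplification: the note states that 1,000 models were fit, which suggests that a model was refit for each replicate.

```python
import numpy as np


def metrics_at_threshold(y_true, y_score, threshold):
    """Sensitivity, specificity, precision and F1 at one decision threshold."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_score) >= threshold
    tp = int(np.sum(y_pred & y_true))
    fp = int(np.sum(y_pred & ~y_true))
    fn = int(np.sum(~y_pred & y_true))
    tn = int(np.sum(~y_pred & ~y_true))
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    spec = tn / (tn + fp) if (tn + fp) else 0.0
    prec = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = 2 * prec * sens / (prec + sens) if (prec + sens) else 0.0
    return {"sensitivity": sens, "specificity": spec,
            "precision": prec, "f1": f1}


def threshold_for_target(y_true, y_score, target):
    """Threshold whose sensitivity is >= target and nearest to it.

    Assumption: candidates are the observed scores; on ties in
    sensitivity, the higher threshold is kept.
    """
    best = None  # (threshold, sensitivity)
    for t in np.unique(y_score):  # ascending order
        sens = metrics_at_threshold(y_true, y_score, t)["sensitivity"]
        if sens < target:
            continue
        if best is None or sens <= best[1]:
            best = (float(t), sens)
    if best is None:
        raise ValueError("no threshold reaches the sensitivity target")
    return best[0]


def bootstrap_ci(y_true, y_score, target, metric="sensitivity",
                 n_boot=1000, seed=0):
    """Percentile 95% CI from resampled predictions (simplified bootstrap)."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    n = len(y_true)
    values = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)
        if y_true[idx].sum() == 0:
            continue  # skip replicates without any positive case
        t = threshold_for_target(y_true[idx], y_score[idx], target)
        values.append(metrics_at_threshold(y_true[idx], y_score[idx], t)[metric])
    lo, hi = np.percentile(values, [2.5, 97.5])
    return float(lo), float(hi)


# Toy data (hypothetical): four positives, two negatives.
y_true = [0, 0, 1, 1, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.6, 0.8, 0.9]

t = threshold_for_target(y_true, y_score, target=0.75)
print(t, metrics_at_threshold(y_true, y_score, t))
print(bootstrap_ci(y_true, y_score, target=0.75))
```

Because the rule only accepts thresholds whose sensitivity meets or exceeds the target, the achieved sensitivity can sit well above the target when few distinct scores are available. This is in line with the table, where, for example, a target of 0.50 corresponds to a mean achieved sensitivity of 0.80.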