. 2025 Jun 6;8:340. doi: 10.1038/s41746-025-01703-1

Table 3.

Best-performing model results stratified by sensitivity target

Sensitivity Target	Sensitivity (95% CI)	Specificity (95% CI)	Precision (95% CI)	F1 Score (95% CI)
1.00	1.00 (1.00–1.00)	0.03 (0.00–0.25)	0.89 (0.89–0.91)	0.94 (0.94–0.96)
0.90	0.98 (0.84–1.00)	0.07 (0.00–0.25)	0.89 (0.89–0.91)	0.94 (0.87–0.96)
0.75	0.91 (0.72–1.00)	0.35 (0.25–0.75)	0.92 (0.89–0.97)	0.91 (0.81–0.96)
0.50	0.80 (0.50–0.94)	0.66 (0.50–1.00)	0.95 (0.91–1.00)	0.86 (0.65–0.95)
0.25	0.63 (0.19–0.88)	0.93 (0.75–1.00)	0.99 (0.94–1.00)	0.75 (0.32–0.93)
0.10	0.48 (0.06–0.88)	1.00 (1.00–1.00)	1.00 (1.00–1.00)	0.60 (0.12–0.93)
0.00	0.03 (0.03–0.06)	1.00 (1.00–1.00)	1.00 (1.00–1.00)	0.07 (0.06–0.12)

The best-performing model, as measured by AUROC, was an XGBoost model trained on text data only, derived from a character threshold of 500. The table above reports sensitivity, specificity, precision, and F1 score for this model at different decision thresholds. For each sensitivity target, the decision threshold used was the one whose sensitivity was greater than or equal to the target and nearest to it. Confidence intervals were estimated by bootstrapping (1000 models fit).
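The threshold-selection rule described in the note can be illustrated with a minimal sketch. This is not the authors' code: the function names (`confusion_counts`, `metrics_at`, `pick_threshold`), the use of the distinct predicted scores as candidate thresholds, the tie-breaking rule, and the toy labels and scores are all assumptions made for illustration. It does not reproduce the bootstrapped confidence intervals.

```python
from typing import Dict, Optional, Sequence, Tuple


def confusion_counts(
    y_true: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn), predicting positive when score >= threshold."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, scores):
        predicted_positive = score >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def metrics_at(
    y_true: Sequence[int], scores: Sequence[float], threshold: float
) -> Dict[str, float]:
    """Sensitivity, specificity, precision and F1 at one decision threshold."""
    tp, fp, tn, fn = confusion_counts(y_true, scores, threshold)
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    spec = tn / (tn + fp) if (tn + fp) else 0.0
    prec = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = 2 * prec * sens / (prec + sens) if (prec + sens) else 0.0
    return {"sensitivity": sens, "specificity": spec, "precision": prec, "f1": f1}


def pick_threshold(
    y_true: Sequence[int], scores: Sequence[float], target: float
) -> Optional[Tuple[float, Dict[str, float]]]:
    """Pick the threshold whose sensitivity is >= target and nearest to it.

    Candidate thresholds are the distinct scores (an assumption). When several
    thresholds give the same sensitivity, the highest one is kept (also an
    assumption). Returns None if no candidate reaches the target.
    """
    best: Optional[Tuple[float, Dict[str, float]]] = None
    for threshold in sorted(set(scores), reverse=True):
        m = metrics_at(y_true, scores, threshold)
        if m["sensitivity"] < target:
            continue
        if best is None or m["sensitivity"] < best[1]["sensitivity"]:
            best = (threshold, m)
    return best


# Toy data, invented for illustration only (not from the study).
toy_scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.10]
toy_labels = [1, 1, 0, 1, 1, 0, 1, 0]

for target in (1.00, 0.75, 0.00):
    chosen = pick_threshold(toy_labels, toy_scores, target)
    if chosen is not None:
        threshold, m = chosen
        print(target, threshold, {k: round(v, 2) for k, v in m.items()})
```

Under this rule the achieved sensitivity can exceed the target, because only thresholds actually present among the scores are considered. That is consistent with the table, where, for example, the 0.00 target row reports a sensitivity above zero.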