2024 Nov 21;7:334. doi: 10.1038/s41746-024-01275-6

Table 1.

Poorly performing subgroups identified by AFISP

Subgroup #	Phenotype	AUROC [95% Bootstrap CI]	Number of patients
AFISP 1	Anemia and nonspecific lung disease	0.81 [0.78, 0.84]	562
AFISP 2	Nonspecific lung disease and hypoxemia	0.82 [0.77, 0.86]	310
AFISP 3	Sepsis and acute respiratory failure	0.83 [0.79, 0.87]	306
AFISP 4	Sepsis and anemia	0.85 [0.82, 0.88]	606
AFISP 5	Acute respiratory failure and no bronchitis	0.89 [0.88, 0.91]	1095
AFISP 6	Nonspecific lung disease and hypoxemia	0.89 [0.88, 0.91]	1468
AFISP 7	Acute respiratory failure	0.89 [0.88, 0.91]	1137
AFISP 8	Sepsis	0.91 [0.90, 0.93]	1361
AFISP 9	Hypoxemia	0.91 [0.90, 0.93]	1587
AFISP 10	Nonspecific lung disease and not admitted from ED	0.92 [0.90, 0.94]	468
AFISP 11	Nonspecific lung disease and no infection	0.92 [0.90, 0.95]	422
AFISP 12	Transferred from nursing facility and admitted from ED	0.93 [0.91, 0.94]	2196
AFISP 13	Transferred from a nursing facility	0.93 [0.92, 0.94]	3035

Phenotypes of subgroups found by AFISP in the worst 10% subset of the evaluation dataset, and the performance of the AAM-inspired model on these subgroups. For reference, the full evaluation dataset contained 60,998 patients, and the AUROC of the AAM-inspired model on the full dataset was 0.986 [0.985, 0.987]. Confidence intervals were computed using 100 bootstrap resamples. The performance of every subgroup differed significantly (at the 0.05 significance level) from the performance threshold (0.944), with all p values less than 1 × 10⁻⁴ after correcting for multiple comparisons.
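As an illustration of how a bootstrap confidence interval for a subgroup AUROC might be obtained, the following is a minimal sketch rather than the authors' implementation. It assumes a percentile bootstrap over patients (the caption states only that 100 resamples were used, not the interval type), and the function names `average_ranks`, `auroc`, and `bootstrap_auroc_ci` are hypothetical. The data in the demo are synthetic and unrelated to the study.

```python
import numpy as np


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    values = np.asarray(values, dtype=float).ravel()
    _, inverse, counts = np.unique(values, return_inverse=True, return_counts=True)
    last_rank = np.cumsum(counts)                 # highest rank in each tie group
    mean_rank = last_rank - (counts - 1) / 2.0    # average rank of each tie group
    return mean_rank[inverse]


def auroc(y_true, scores):
    """AUROC via the rank-sum (Mann-Whitney U) identity."""
    y_true = np.asarray(y_true).astype(bool).ravel()
    ranks = average_ranks(scores)
    n_pos = int(y_true.sum())
    n_neg = int(y_true.size - n_pos)
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative label")
    u_stat = ranks[y_true].sum() - n_pos * (n_pos + 1) / 2.0
    return u_stat / (n_pos * n_neg)


def bootstrap_auroc_ci(y_true, scores, n_resamples=100, alpha=0.05, seed=0):
    """Point estimate and percentile bootstrap CI (assumed interval type)."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true).ravel()
    scores = np.asarray(scores, dtype=float).ravel()
    n = y_true.size
    estimates = []
    while len(estimates) < n_resamples:
        idx = rng.integers(0, n, size=n)          # resample patients with replacement
        y_boot = y_true[idx]
        if y_boot.min() == y_boot.max():          # skip resamples with a single class
            continue
        estimates.append(auroc(y_boot, scores[idx]))
    lower, upper = np.quantile(estimates, [alpha / 2, 1 - alpha / 2])
    return auroc(y_true, scores), float(lower), float(upper)


if __name__ == "__main__":
    # Synthetic labels and risk scores, for demonstration only.
    rng = np.random.default_rng(42)
    labels = rng.integers(0, 2, size=500)
    risk = labels * 1.5 + rng.normal(size=500)
    point, lower, upper = bootstrap_auroc_ci(labels, risk, n_resamples=100)
    print(f"AUROC {point:.2f} [{lower:.2f}, {upper:.2f}]")
```

With only 100 resamples the interval endpoints are themselves somewhat noisy. A larger `n_resamples` would stabilize them at the cost of extra computation.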