npj Digit Med. 2020 Apr 24;3:61. doi: 10.1038/s41746-020-0266-y

Table 1. Model performance.

| Metric | Internal dataset: Stanford | Internal dataset: Stanford (real prevalence) | External dataset: Intermountain | External dataset: Intermountain (real prevalence) |
|---|---|---|---|---|
| Accuracy | 0.77 [0.76–0.78] | 0.81 [0.80–0.82] | 0.78 [0.77–0.78] | 0.80 [0.79–0.81] |
| AUROC | 0.84 [0.82–0.87] | 0.84 [0.79–0.90] | 0.85 [0.81–0.88] | 0.85 [0.80–0.90] |
| Specificity | 0.82 [0.81–0.83] | 0.82 [0.82–0.83] | 0.80 [0.79–0.81] | 0.81 [0.80–0.82] |
| Sensitivity | 0.73 [0.72–0.74] | 0.75 [0.73–0.77] | 0.75 [0.74–0.76] | 0.75 [0.73–0.77] |
| PPV/precision | 0.81 [0.80–0.81] | 0.47 [0.45–0.48] | 0.77 [0.76–0.78] | 0.44 [0.43–0.46] |
| NPV | 0.75 [0.74–0.76] | 0.94 [0.94–0.95] | 0.78 [0.77–0.79] | 0.94 [0.94–0.95] |

Model performance on the internal test set (Stanford) and the external test set (Intermountain), with 95% confidence intervals, using a probability threshold of 0.55 that maximizes both sensitivity and specificity on the Stanford validation dataset. Bootstrapping is used to simulate the real-world prevalence of PE (between 14 and 22%).
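
The prevalence-adjusted columns follow from the fixed operating point: at the same sensitivity and specificity, PPV and NPV shift with disease prevalence by Bayes' rule. The sketch below is not the authors' implementation; it illustrates, under stated assumptions, one way to bootstrap-resample a test set to a target prevalence and recompute PPV/NPV at the 0.55 threshold. `y_true` (binary labels), `y_prob` (model probabilities), and the helper name `prevalence_bootstrap` are placeholders, not names from the paper.

```python
# Minimal sketch (not the authors' code) of prevalence-matched bootstrapping:
# resample positives and negatives so that positives make up `prevalence` of
# each replicate, then recompute PPV and NPV at the fixed 0.55 threshold.
import numpy as np

rng = np.random.default_rng(0)

def prevalence_bootstrap(y_true, y_prob, prevalence, threshold=0.55, n_boot=1000):
    """Return [2.5th, 50th, 97.5th] percentiles of PPV and NPV across
    bootstrap replicates drawn at the given prevalence."""
    y_true, y_prob = np.asarray(y_true), np.asarray(y_prob)
    pos = np.where(y_true == 1)[0]          # indices of PE-positive studies
    neg = np.where(y_true == 0)[0]          # indices of PE-negative studies
    n = len(y_true)
    n_pos = int(round(prevalence * n))      # e.g. 0.18 * n for ~18% prevalence
    ppvs, npvs = [], []
    for _ in range(n_boot):
        idx = np.concatenate([rng.choice(pos, n_pos, replace=True),
                              rng.choice(neg, n - n_pos, replace=True)])
        t = y_true[idx]
        p = (y_prob[idx] >= threshold).astype(int)
        tp = np.sum((p == 1) & (t == 1))
        fp = np.sum((p == 1) & (t == 0))
        tn = np.sum((p == 0) & (t == 0))
        fn = np.sum((p == 0) & (t == 1))
        ppvs.append(tp / (tp + fp))
        npvs.append(tn / (tn + fn))
    return np.percentile(ppvs, [2.5, 50, 97.5]), np.percentile(npvs, [2.5, 50, 97.5])
```

As a consistency check (assuming a prevalence near the midpoint of the 14–22% range, say 18%), Bayes' rule at the Stanford operating point gives PPV = (0.73 × 0.18) / (0.73 × 0.18 + (1 − 0.82) × 0.82) ≈ 0.47, in line with the prevalence-adjusted column of Table 1.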