2021 Jun 5;68:103402. doi: 10.1016/j.ebiom.2021.103402

Table 2.

Performance of the T₁W, T₂W, clinical-feature and ensemble models on the internal test set (n = 93), compared with expert evaluation, and on the external test set (n = 97). The p-value for each expert, calculated with the McNemar test, compares that expert's accuracy with the accuracy of the ensemble model. Abbreviations: ROC AUC, area under the ROC curve; PPV, positive predictive value; NPV, negative predictive value; 95% CI, 95% confidence interval.

Internal Test Set
Modality	F1 Score	ROC AUC	Accuracy (95% CI)	Sensitivity (95% CI)	Specificity (95% CI)	PPV	NPV	p-value
Clinical	0·58	0·71	0·62 (0·52-0·72)	0·57 (0·42-0·71)	0·67 (0·53-0·78)	0·59	0·65	-
T₁W	0·59	0·64	0·66 (0·55-0·74)	0·55 (0·40-0·69)	0·75 (0·61-0·85)	0·64	0·67	-
T₂W	0·67	0·74	0·74 (0·64-0·82)	0·57 (0·42-0·71)	0·88 (0·76-0·95)	0·80	0·71	-
Ensemble	0·75	0·82	0·76 (0·67-0·84)	0·79 (0·64-0·89)	0·66 (0·53-0·78)	0·72	0·81	-
Expert 1	0·77	-	0·76 (0·66-0·84)	0·86 (0·72-0·94)	0·68 (0·54-0·79)	0·69	0·85	1·0
Expert 2	0·74	-	0·73 (0·63-0·81)	0·83 (0·69-0·92)	0·64 (0·50-0·76)	0·66	0·82	0·66
Expert 3	0·52	-	0·60 (0·50-0·69)	0·48 (0·33-0·62)	0·70 (0·56-0·81)	0·57	0·61	0·02
Expert Committee	0·73	-	0·73 (0·63-0·81)	0·81 (0·67-0·90)	0·66 (0·52-0·78)	0·67	0·81	0·7

External Test Set
Modality	F1 Score	ROC AUC	Accuracy (95% CI)	Sensitivity (95% CI)	Specificity (95% CI)	PPV	NPV
Clinical	0·52	0·69	0·64 (0·54-0·73)	0·49 (0·34-0·64)	0·74 (0·62-0·84)	0·56	0·68
T₁W	0·51	0·66	0·66 (0·56-0·75)	0·44 (0·29-0·59)	0·81 (0·69-0·89)	0·61	0·68
T₂W	0·65	0·73	0·72 (0·62-0·80)	0·64 (0·48-0·77)	0·78 (0·65-0·87)	0·66	0·76
Ensemble	0·70	0·79	0·73 (0·64-0·81)	0·77 (0·61-0·88)	0·71 (0·58-0·81)	0·63	0·82
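
For readers who want to see how the columns in Table 2 relate to one another, the following is a minimal Python sketch of the standard definitions of sensitivity, specificity, PPV, NPV, accuracy and F1 score from confusion-matrix counts, together with a two-sided exact McNemar test on discordant pairs. The function names and the example counts are hypothetical and are not taken from the study. The caption does not state which variant of the McNemar test was used, so the exact binomial form shown here is an assumption.

```python
from math import comb


def diagnostic_metrics(tp, fp, tn, fn):
    """Standard diagnostic metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),          # true positive rate
        "specificity": tn / (tn + fp),          # true negative rate
        "ppv": tp / (tp + fp),                  # positive predictive value
        "npv": tn / (tn + fn),                  # negative predictive value
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "f1": 2 * tp / (2 * tp + fp + fn),      # harmonic mean of PPV and sensitivity
    }


def mcnemar_exact_p(b, c):
    """Two-sided exact McNemar p-value (assumed variant).

    b: cases the first rater classified correctly and the second incorrectly.
    c: cases the second rater classified correctly and the first incorrectly.
    Under the null hypothesis, each discordant case is equally likely
    to fall in b or c, so min(b, c) follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of a difference
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only (not from Table 2).
    for name, value in diagnostic_metrics(tp=30, fp=10, tn=40, fn=20).items():
        print(f"{name}: {value:.2f}")
    print(f"McNemar p-value: {mcnemar_exact_p(b=1, c=9):.3f}")
```

Note that the McNemar test uses only the cases on which the two raters disagree, so two raters with similar overall accuracy can still differ significantly if their errors fall on different cases.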