2021 Feb 16;11:3938. doi: 10.1038/s41598-021-83237-6

Table 2.

Detailed diagnostic metrics of end-to-end models and radiologists on internal and external testing datasets.

	Accuracy	Sensitivity	Specificity	Precision	F1-score	G-mean
Internal testing dataset
MLP	0.893^a	0.879^a	0.900	0.806	0.841	0.889
SVM	0.845^a,b	0.909^a	0.814	0.698	0.789	0.860
LR	0.874^a	0.848^a,b	0.886	0.778	0.812	0.867
XGBoost	0.816^a,b	0.788^a,b	0.829	0.684	0.732	0.808
Senior R	0.903^a	0.879^a	0.914	0.829	0.853	0.896
Junior R	0.767^b	0.667^b	0.814	0.629	0.647	0.737
P value	< 0.05*	< 0.05*	> 0.05	> 0.05	–	–
External testing dataset
MLP	0.884^a,b	0.879^a,b	0.887^a	0.806^a,b,c	0.841	0.883
SVM	0.758^c	0.970^b	0.645^b	0.593^d	0.736	0.791
LR	0.905^a,b	0.970^b	0.871^a	0.800^c	0.877	0.919
XGBoost	0.495^d	0.758^a	0.355^c	0.385^e	0.510	0.518
Senior R	0.926^b	0.818^a	0.984^d	0.964^b	0.885	0.897
Junior R	0.832^a,c	0.818^a	0.839^a	0.730^a,c,d	0.771	0.828
P value	< 0.05*	< 0.05*	< 0.05*	< 0.05*	–	–

*On either internal or external testing dataset, different lowercase letters in the same column indicate significant differences among different models or readers (P < 0.05).
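
The six metrics in Table 2 can all be derived from a single 2×2 confusion matrix. The sketch below is illustrative only and is not the authors' code. The function name `diagnostic_metrics` is hypothetical, and the table does not report confusion-matrix counts: the values TP = 29, FN = 4, TN = 63, FP = 7 are an assumption, back-calculated so that they reproduce the internal-testing MLP row. G-mean is taken here as the square root of sensitivity times specificity, the usual definition, which the table does not state explicitly.

```python
import math


def diagnostic_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Compute the six Table 2 metrics from binary confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # true-positive rate (recall)
    specificity = tn / (tn + fp)          # true-negative rate
    precision = tp / (tp + fp)            # positive predictive value
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    # F1 is the harmonic mean of precision and sensitivity.
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    # G-mean: geometric mean of sensitivity and specificity (assumed definition).
    g_mean = math.sqrt(sensitivity * specificity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "g_mean": g_mean,
    }


# Assumed counts (not reported in the table), chosen to match the MLP row
# on the internal testing dataset.
mlp_internal = diagnostic_metrics(tp=29, fn=4, tn=63, fp=7)
for name, value in mlp_internal.items():
    print(f"{name}: {value:.3f}")
```

With these assumed counts, the values round to 0.893, 0.879, 0.900, 0.806, 0.841, and 0.889, matching the MLP row for the internal testing dataset. Unlike accuracy, G-mean falls sharply when either sensitivity or specificity is low, which is why XGBoost on the external dataset (specificity 0.355) scores only 0.518 despite a sensitivity of 0.758.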