2021 Feb 16;11:3938. doi: 10.1038/s41598-021-83237-6

Table 2.

Detailed diagnostic metrics of end-to-end models and radiologists on internal and external testing datasets.

| Model or reader | Accuracy | Sensitivity | Specificity | Precision | F1-score | G-mean |
|---|---|---|---|---|---|---|
| Internal testing dataset | | | | | | |
| MLP | 0.893a | 0.879a | 0.900 | 0.806 | 0.841 | 0.889 |
| SVM | 0.845a,b | 0.909a | 0.814 | 0.698 | 0.789 | 0.860 |
| LR | 0.874a | 0.848a,b | 0.886 | 0.778 | 0.812 | 0.867 |
| XGBoost | 0.816a,b | 0.788a,b | 0.829 | 0.684 | 0.732 | 0.808 |
| Senior R | 0.903a | 0.879a | 0.914 | 0.829 | 0.853 | 0.896 |
| Junior R | 0.767b | 0.667b | 0.814 | 0.629 | 0.647 | 0.737 |
| P value | < 0.05* | < 0.05* | > 0.05 | > 0.05 | | |
| External testing dataset | | | | | | |
| MLP | 0.884a,b | 0.879a,b | 0.887a | 0.806a,b,c | 0.841 | 0.883 |
| SVM | 0.758c | 0.970b | 0.645b | 0.593d | 0.736 | 0.791 |
| LR | 0.905a,b | 0.970b | 0.871a | 0.800c | 0.877 | 0.919 |
| XGBoost | 0.495d | 0.758a | 0.355c | 0.385e | 0.510 | 0.518 |
| Senior R | 0.926b | 0.818a | 0.984d | 0.964b | 0.885 | 0.897 |
| Junior R | 0.832a,c | 0.818a | 0.839a | 0.730a,c,d | 0.771 | 0.828 |
| P value | < 0.05* | < 0.05* | < 0.05* | < 0.05* | | |

*Within each testing dataset (internal or external), different lowercase letters in the same column indicate significant differences among models or readers (P < 0.05).
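
The F1-score and G-mean columns can be related to the other columns with a short calculation. The sketch below assumes the standard definitions (F1 as the harmonic mean of precision and sensitivity, G-mean as the geometric mean of sensitivity and specificity); the table itself does not state the formulas, and the function names are illustrative only. Under that assumption, the rounded inputs from the MLP row of the internal testing dataset reproduce the reported values.

```python
import math


def f1_score(precision: float, sensitivity: float) -> float:
    """Harmonic mean of precision and sensitivity (assumed standard definition)."""
    return 2 * precision * sensitivity / (precision + sensitivity)


def g_mean(sensitivity: float, specificity: float) -> float:
    """Geometric mean of sensitivity and specificity (assumed standard definition)."""
    return math.sqrt(sensitivity * specificity)


# Rounded values taken from the MLP row, internal testing dataset.
mlp_internal = {"sensitivity": 0.879, "specificity": 0.900, "precision": 0.806}

f1 = f1_score(mlp_internal["precision"], mlp_internal["sensitivity"])
g = g_mean(mlp_internal["sensitivity"], mlp_internal["specificity"])

print(round(f1, 3), round(g, 3))  # 0.841 0.889, matching the table
```

Because the inputs are already rounded to three decimals, other rows may differ from the reported values in the last digit.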