Acad Radiol. Author manuscript; available in PMC: 2024 Dec 1.
Published in final edited form as: Acad Radiol. 2023 Jul 10;30(12):2973–2987. doi: 10.1016/j.acra.2023.04.023

Table 4.

F1 scores, AUC-PR, and AUC-ROC for each combination of deep learning algorithm, training task, and test set. AUC-PR and AUC-ROC could not be computed for the models built with the ensemble majority voting algorithm, because majority voting yields discrete class labels rather than continuous scores (see "Model evaluation" in the MATERIALS AND METHODS section). Yellow and magenta mark the MrOS dataset and the local dataset, respectively.

Training tasks: Task 1 = ImageNet → MrOS-mSQ; Task 2 = ImageNet → local-m2ABQ; Task 3 = ImageNet → MrOS-mSQ → local-m2ABQ.

Training task            | Task 1      | Task 1      | Task 1      | Task 2      | Task 2      | Task 3      | Task 3
Test set                 | MrOS-mSQ    | MrOS-m2ABQ  | Local-m2ABQ | MrOS-m2ABQ  | Local-m2ABQ | MrOS-m2ABQ  | Local-m2ABQ
F1 score
GoogLeNet                | 0.751       | 0.691       | 0.579       | 0.698       | 0.668       | 0.694       | 0.701
Inception-ResNet-V2      | 0.729       | 0.652       | 0.523       | 0.670       | 0.659       | 0.698       | 0.674
EfficientNet-B1          | 0.743       | 0.667       | 0.543       | 0.705       | 0.650       | 0.747       | 0.689
Ensemble averaging       | 0.773       | 0.677       | 0.566       | 0.729       | 0.684       | 0.761       | 0.702
Ensemble majority voting | 0.776       | 0.648       | 0.553       | 0.706       | 0.694       | 0.713       | 0.712
AUC-PR
GoogLeNet                | 0.817       | 0.782       | 0.606       | 0.784       | 0.698       | 0.804       | 0.736
Inception-ResNet-V2      | 0.798       | 0.795       | 0.636       | 0.809       | 0.656       | 0.801       | 0.696
EfficientNet-B1          | 0.816       | 0.796       | 0.628       | 0.785       | 0.703       | 0.808       | 0.746
Ensemble averaging       | 0.841       | 0.796       | 0.658       | 0.811       | 0.730       | 0.831       | 0.764
AUC-ROC
GoogLeNet                | 0.990       | 0.897       | 0.918       | 0.927       | 0.941       | 0.933       | 0.949
Inception-ResNet-V2      | 0.993       | 0.925       | 0.914       | 0.930       | 0.925       | 0.922       | 0.947
EfficientNet-B1          | 0.993       | 0.914       | 0.916       | 0.914       | 0.941       | 0.933       | 0.958
Ensemble averaging       | 0.992       | 0.911       | 0.930       | 0.936       | 0.948       | 0.940       | 0.955
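The caption notes that AUC-PR and AUC-ROC exist for ensemble averaging but not for majority voting. The small sketch below illustrates why. It rests on several assumptions that do not come from the study: a binary task, three models that each output a positive-class probability, a 0.5 decision threshold, and average precision as the AUC-PR estimator. The toy probabilities and all helper names (ensemble_average, majority_vote, and so on) are hypothetical. Averaging keeps a continuous score that can be ranked across thresholds, whereas majority voting returns only 0/1 labels. That leaves F1 as the only one of the three metrics that applies to the vote.

```python
# Sketch only: binary labels (1 = positive), three hypothetical models that each
# output a positive-class probability, and a 0.5 threshold are all assumptions.

def ensemble_average(prob_lists):
    """Ensemble averaging: per-case mean of the models' probabilities (continuous score)."""
    return [sum(case) / len(case) for case in zip(*prob_lists)]


def majority_vote(prob_lists, threshold=0.5):
    """Majority voting: each model casts a hard 0/1 vote; a strict majority decides."""
    n_models = len(prob_lists)
    labels = []
    for case in zip(*prob_lists):
        votes = sum(p >= threshold for p in case)
        labels.append(1 if 2 * votes > n_models else 0)
    return labels


def f1_score(y_true, y_pred):
    """F1 = 2TP / (2TP + FP + FN), computed from hard labels."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def auc_roc(y_true, scores):
    """AUC-ROC as P(random positive scores above random negative); ties count 1/2."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """AUC-PR estimated as average precision (precision averaged at each positive's rank)."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    hits, precisions = 0, []
    for rank, (_, t) in enumerate(ranked, start=1):
        if t == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions)


# Toy data (hypothetical): six cases, three models.
y_true = [1, 0, 1, 1, 0, 0]
model_probs = [
    [0.9, 0.4, 0.6, 0.3, 0.2, 0.7],  # model A
    [0.8, 0.6, 0.4, 0.8, 0.1, 0.6],  # model B
    [0.7, 0.3, 0.8, 0.6, 0.4, 0.6],  # model C
]

avg_scores = ensemble_average(model_probs)
avg_labels = [1 if s >= 0.5 else 0 for s in avg_scores]
vote_labels = majority_vote(model_probs)

print("averaging F1:", round(f1_score(y_true, avg_labels), 3))
print("averaging AUC-ROC:", round(auc_roc(y_true, avg_scores), 3))
print("averaging AUC-PR (AP):", round(average_precision(y_true, avg_scores), 3))
print("voting labels:", vote_labels)  # only 0/1 labels, no score to rank
print("voting F1:", round(f1_score(y_true, vote_labels), 3))
```

Because the vote is a hard label, a ROC or precision-recall curve built from it reduces to a single operating point rather than a curve. This is consistent with the table reporting only F1 for the majority-voting row.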