Acad Radiol. Author manuscript; available in PMC: 2024 Dec 1.

Published in final edited form as: Acad Radiol. 2023 Jul 10;30(12):2973–2987. doi: 10.1016/j.acra.2023.04.023

Table 4.

F₁ score, area under the precision–recall curve (AUC-PR), and area under the receiver operating characteristic curve (AUC-ROC) for each combination of deep learning algorithm, training task, and test set. AUC-PR and AUC-ROC could not be computed for the models built with ensemble majority voting (see the “Model evaluation” subsection of “MATERIALS AND METHODS”). In this table, yellow marks the MrOS dataset and magenta marks the local dataset.

Training task	Task 1: ImageNet → MrOS-mSQ			Task 2: ImageNet → local-m2ABQ		Task 3: ImageNet → MrOS-mSQ → local-m2ABQ
Test set	MrOS-mSQ	MrOS-m2ABQ	Local-m2ABQ	MrOS-m2ABQ	Local-m2ABQ	MrOS-m2ABQ	Local-m2ABQ
F₁ score
GoogLeNet	0.751	0.691	0.579	0.698	0.668	0.694	0.701
Inception-ResNet-V2	0.729	0.652	0.523	0.670	0.659	0.698	0.674
EfficientNet-B1	0.743	0.667	0.543	0.705	0.650	0.747	0.689
Ensemble averaging	0.773	0.677	0.566	0.729	0.684	0.761	0.702
Ensemble majority voting	0.776	0.648	0.553	0.706	0.694	0.713	0.712
AUC-PR
GoogLeNet	0.817	0.782	0.606	0.784	0.698	0.804	0.736
Inception-ResNet-V2	0.798	0.795	0.636	0.809	0.656	0.801	0.696
EfficientNet-B1	0.816	0.796	0.628	0.785	0.703	0.808	0.746
Ensemble averaging	0.841	0.796	0.658	0.811	0.730	0.831	0.764
AUC-ROC
GoogLeNet	0.990	0.897	0.918	0.927	0.941	0.933	0.949
Inception-ResNet-V2	0.993	0.925	0.914	0.930	0.925	0.922	0.947
EfficientNet-B1	0.993	0.914	0.916	0.914	0.941	0.933	0.958
Ensemble averaging	0.992	0.911	0.930	0.936	0.948	0.940	0.955
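The caption distinguishes ensemble averaging, which produces a continuous score for every image, from ensemble majority voting, which produces only hard labels. One plausible reading of why AUC-PR and AUC-ROC are missing for majority voting is that both metrics rank cases by a continuous score and sweep a threshold, while F₁ needs only hard labels. The minimal sketch below illustrates this under explicit assumptions that are not taken from the study: three hypothetical models scoring six hypothetical cases (invented data, not results from this paper), a 0.5 decision threshold, a strict-majority vote, and average precision as the AUC-PR estimator (the authors may have used a different estimator). All function names are hypothetical helpers, not the authors' code.

```python
from typing import List, Sequence

# Hypothetical helpers for illustration; not the authors' implementation.


def ensemble_average(prob_lists: Sequence[Sequence[float]]) -> List[float]:
    """Average the positive-class probabilities of all models, case by case."""
    n_models = len(prob_lists)
    return [sum(case) / n_models for case in zip(*prob_lists)]


def ensemble_majority_vote(
    prob_lists: Sequence[Sequence[float]], threshold: float = 0.5
) -> List[int]:
    """Each model casts a hard 0/1 vote; a case is positive if a strict majority votes 1."""
    n_models = len(prob_lists)
    votes = [[1 if p >= threshold else 0 for p in probs] for probs in prob_lists]
    return [1 if 2 * sum(case) > n_models else 0 for case in zip(*votes)]


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 = 2TP / (2TP + FP + FN); needs only hard labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def auc_roc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC as the probability that a random positive outscores a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative case")
    # Ties between a positive and a negative score count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise AUC-PR estimate: mean precision at the rank of each positive case.

    Tied scores are not grouped; they keep their input order.
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("AUC-PR needs at least one positive case")
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at this cut-off
    return total / n_pos


# Hypothetical ground truth (1 = positive class) and per-model probabilities.
y_true = [1, 0, 1, 0, 1, 0]
model_probs = [
    [0.9, 0.4, 0.6, 0.2, 0.3, 0.7],  # hypothetical model A
    [0.8, 0.6, 0.4, 0.1, 0.6, 0.3],  # hypothetical model B
    [0.7, 0.3, 0.7, 0.4, 0.2, 0.6],  # hypothetical model C
]

avg_scores = ensemble_average(model_probs)
avg_labels = [1 if s >= 0.5 else 0 for s in avg_scores]
vote_labels = ensemble_majority_vote(model_probs)

# Both ensembles support F1, because F1 needs only hard labels.
print("F1 (averaging):", f1_score(y_true, avg_labels))
print("F1 (majority voting):", f1_score(y_true, vote_labels))
# Only averaging leaves a continuous score to rank cases by, so only it gets AUCs.
print("AUC-ROC (averaging):", auc_roc(y_true, avg_scores))
print("AUC-PR (averaging):", average_precision(y_true, avg_scores))
```

With hard labels alone, a ROC or precision–recall "curve" collapses to a single operating point, so there is nothing meaningful to integrate. In practice, scikit-learn's `sklearn.metrics.f1_score`, `roc_auc_score`, and `average_precision_score` compute comparable quantities; which implementation and AUC-PR estimator the authors used is not stated in this table.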