Table 4.
F1 scores, AUC-PR, and AUC-ROC for each (deep learning algorithm, training task, test set) combination. The AUC-PR and the AUC-ROC of the models built using the ensemble majority voting algorithm could not be computed (see the “Model evaluation” section of the “MATERIALS AND METHODS” section). In this table, yellow and magenta are used to mark the MrOS dataset and the local dataset, respectively.
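| Training task | Task 1: ImageNet → MrOS-mSQ | Task 1: ImageNet → MrOS-mSQ | Task 1: ImageNet → MrOS-mSQ | Task 2: ImageNet → local-m2ABQ | Task 2: ImageNet → local-m2ABQ | Task 3: ImageNet → MrOS-mSQ → local-m2ABQ | Task 3: ImageNet → MrOS-mSQ → local-m2ABQ |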
|---|---|---|---|---|---|---|---|
| Test set | MrOS-mSQ | MrOS-m2ABQ | Local-m2ABQ | MrOS-m2ABQ | Local-m2ABQ | MrOS-m2ABQ | Local-m2ABQ |
| F1 score | | | | | | | |
| GoogLeNet | 0.751 | 0.691 | 0.579 | 0.698 | 0.668 | 0.694 | 0.701 |
| Inception-ResNet-V2 | 0.729 | 0.652 | 0.523 | 0.670 | 0.659 | 0.698 | 0.674 |
| EfficientNet-B1 | 0.743 | 0.667 | 0.543 | 0.705 | 0.650 | 0.747 | 0.689 |
| Ensemble averaging | 0.773 | 0.677 | 0.566 | 0.729 | 0.684 | 0.761 | 0.702 |
| Ensemble majority voting | 0.776 | 0.648 | 0.553 | 0.706 | 0.694 | 0.713 | 0.712 |
| AUC-PR | | | | | | | |
| GoogLeNet | 0.817 | 0.782 | 0.606 | 0.784 | 0.698 | 0.804 | 0.736 |
| Inception-ResNet-V2 | 0.798 | 0.795 | 0.636 | 0.809 | 0.656 | 0.801 | 0.696 |
| EfficientNet-B1 | 0.816 | 0.796 | 0.628 | 0.785 | 0.703 | 0.808 | 0.746 |
| Ensemble averaging | 0.841 | 0.796 | 0.658 | 0.811 | 0.730 | 0.831 | 0.764 |
| AUC-ROC | | | | | | | |
| GoogLeNet | 0.990 | 0.897 | 0.918 | 0.927 | 0.941 | 0.933 | 0.949 |
| Inception-ResNet-V2 | 0.993 | 0.925 | 0.914 | 0.930 | 0.925 | 0.922 | 0.947 |
| EfficientNet-B1 | 0.993 | 0.914 | 0.916 | 0.914 | 0.941 | 0.933 | 0.958 |
| Ensemble averaging | 0.992 | 0.911 | 0.930 | 0.936 | 0.948 | 0.940 | 0.955 |
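
The absence of AUC-PR and AUC-ROC rows for ensemble majority voting follows from what each ensemble outputs. Both curves are traced by sweeping a decision threshold over a continuous score, and a majority vote returns only a class label. The minimal sketch below illustrates the difference. It is not the authors' implementation: the function names, the binary two-class setup, the example probabilities, and the tie-breaking rule are all assumptions made for illustration.

```python
from collections import Counter


def ensemble_average(model_probs):
    """Average the per-class probabilities predicted by several models.

    model_probs: one probability vector per model (hypothetical input format).
    Returns a probability vector, i.e. a continuous score per class that
    can be thresholded to build a PR or ROC curve.
    """
    n_models = len(model_probs)
    n_classes = len(model_probs[0])
    return [sum(p[c] for p in model_probs) / n_models for c in range(n_classes)]


def majority_vote(model_probs):
    """Let each model vote for its most probable class; return the winner.

    Returns only a class index. No continuous score is produced, so there
    is nothing to sweep a threshold over. Ties go to the class that was
    voted for first (an assumption of this sketch).
    """
    votes = [max(range(len(p)), key=p.__getitem__) for p in model_probs]
    return Counter(votes).most_common(1)[0][0]


# Hypothetical predictions from three models for one image:
# index 0 = negative class, index 1 = positive class.
probs = [
    [0.40, 0.60],
    [0.45, 0.55],
    [0.95, 0.05],
]

averaged = ensemble_average(probs)   # continuous scores, roughly [0.6, 0.4]
voted = majority_vote(probs)         # hard label: 1 (two of three models)
```

In this made-up case the two ensembles also disagree: averaging favors class 0 because one model is very confident, while voting picks class 1. This is one way the F1 scores of the two ensemble rows can differ on the same test set.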