Table 2.
Classification metrics on the test dataset for the different deep transfer learning architectures and the proposed ensemble method. For each model, the mean (± std.) of each performance measure is reported over the best 5 trained model checkpoints.
| Model | Precision | Recall | F1-score | Accuracy | AUC |
|---|---|---|---|---|---|
| EfficientNetB0 | 0.847(± 0.03) | 0.822(± 0.11) | 0.815(± 0.05) | 0.82(± 0.02) | 0.907(± 0.02) |
| EfficientNetB1 | 0.727(± 0.06) | 0.718(± 0.09) | 0.712(± 0.03) | 0.71(± 0.02) | 0.809(± 0.02) |
| EfficientNetB2 | 0.768(± 0.03) | 0.768(± 0.12) | 0.768(± 0.05) | 0.77(± 0.03) | 0.859(± 0.03) |
| EfficientNetB3 | 0.769(± 0.03) | 0.765(± 0.07) | 0.763(± 0.03) | 0.76(± 0.03) | 0.851(± 0.01) |
| EfficientNetB4 | 0.791(± 0.02) | 0.789(± 0.05) | 0.788(± 0.01) | 0.79(± 0.01) | 0.877(± 0.01) |
| EfficientNetB5 | 0.817(± 0.03) | 0.817(± 0.11) | 0.817(± 0.05) | 0.82(± 0.03) | 0.886(± 0.01) |
| Inception_resnet_v2 | 0.773(± 0.03) | 0.774(± 0.12) | 0.773(± 0.05) | 0.77(± 0.02) | 0.856(± 0.01) |
| InceptionV3 | 0.825(± 0.03) | 0.814(± 0.07) | 0.815(± 0.03) | 0.82(± 0.02) | 0.897(± 0.02) |
| NASNetLarge | 0.772(± 0.06) | 0.770(± 0.09) | 0.768(± 0.03) | 0.77(± 0.01) | 0.836(± 0.03) |
| NASNetMobile | 0.759(± 0.03) | 0.757(± 0.12) | 0.757(± 0.05) | 0.76(± 0.04) | 0.823(± 0.02) |
| ResNet50 | 0.807(± 0.03) | 0.808(± 0.11) | 0.807(± 0.05) | 0.81(± 0.03) | 0.875(± 0.01) |
| Xception | 0.738(± 0.06) | 0.739(± 0.09) | 0.738(± 0.03) | 0.74(± 0.04) | 0.782(± 0.04) |
| DenseNet121 | 0.768(± 0.03) | 0.768(± 0.03) | 0.768(± 0.03) | 0.77(± 0.02) | 0.868(± 0.04) |
| SeResnet50 | 0.755(± 0.03) | 0.745(± 0.07) | 0.745(± 0.03) | 0.75(± 0.02) | 0.818(± 0.02) |
| ResNext50 | 0.810(± 0.03) | 0.806(± 0.12) | 0.806(± 0.05) | 0.81(± 0.02) | 0.843(± 0.02) |
| Proposed ensemble model | 0.857(± 0.02) | 0.854(± 0.05) | 0.852(± 0.01) | 0.852(± 0.01) | 0.91(± 0.01) |
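
The "mean(± std.)" entries in the table can be reproduced from per-checkpoint scores along the following lines. This is only a minimal sketch: the function name `summarize` and the five example scores are hypothetical and not taken from the paper, and the use of the sample standard deviation (n − 1 denominator) is an assumption, since the caption does not state which estimator was used.

```python
from statistics import mean, stdev


def summarize(scores, digits=3):
    """Format per-checkpoint scores as 'mean(± std)', mirroring the table cells.

    Assumption: sample standard deviation (n - 1 denominator). Swap in
    statistics.pstdev if the population form is intended.
    """
    m = mean(scores)
    s = stdev(scores)
    return f"{m:.{digits}f}(± {s:.2f})"


# Hypothetical F1-scores from the best five checkpoints of a single model
# (illustrative values only, not results from the paper).
checkpoint_f1 = [0.80, 0.82, 0.84, 0.86, 0.88]

summary = summarize(checkpoint_f1)
print(summary)
```

With only five checkpoints per model, the choice between sample and population standard deviation changes the reported spread noticeably, so it is worth stating explicitly when comparing against the values above.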