Table 6. The ablation study on three different backbones: DenseNet121, ViT, and our network (DenseNet121 + ViT).
| T(IoU) | Model | Atelectasis | Cardiomegaly | Effusion | Infiltration | Mass | Nodule | Pneumonia | Pneumothorax |
|---|---|---|---|---|---|---|---|---|---|
| 0.1 | DenseNet121 | 0.596 | 0.993 | 0.678 | 0.707 | 0.674* | 0.113 | 0.786 | 0.418* |
| | ViT | 0.365 | 0.937 | 0.392 | 0.586 | 0.306 | 0.000 | 0.429 | 0.202 |
| | DenseNet121+ViT | 0.673* | 1.000* | 0.713* | 0.808* | 0.633 | 0.127* | 0.833* | 0.330 |
| 0.2 | DenseNet121 | 0.321 | 0.923 | 0.566 | 0.636 | 0.469 | 0.000 | 0.691 | 0.259* |
| | ViT | 0.179 | 0.887 | 0.147 | 0.374 | 0.102 | 0.000 | 0.286 | 0.149 |
| | DenseNet121+ViT | 0.372* | 1.000* | 0.615* | 0.737* | 0.510* | 0.000 | 0.738* | 0.245 |
| 0.3 | DenseNet121 | 0.160* | 0.606 | 0.364 | 0.414 | 0.326 | 0.000 | 0.571 | 0.145 |
| | ViT | 0.100 | 0.740 | 0.063 | 0.242 | 0.061 | 0.000 | 0.071 | 0.096 |
| | DenseNet121+ViT | 0.160* | 0.958* | 0.420* | 0.616* | 0.347* | 0.000 | 0.643* | 0.160* |
| 0.4 | DenseNet121 | 0.064* | 0.232 | 0.217 | 0.303 | 0.225* | 0.000 | 0.500 | 0.103 |
| | ViT | 0.058 | 0.486 | 0.014 | 0.172 | 0.020 | 0.000 | 0.071 | 0.032 |
| | DenseNet121+ViT | 0.064* | 0.704* | 0.259* | 0.535* | 0.204 | 0.000 | 0.524* | 0.106* |
| 0.5 | DenseNet121 | 0.013 | 0.070 | 0.105 | 0.222 | 0.123 | 0.000 | 0.309 | 0.053 |
| | ViT | 0.032* | 0.225 | 0.007 | 0.091 | 0.020 | 0.000 | 0.024 | 0.021 |
| | DenseNet121+ViT | 0.006 | 0.409* | 0.112* | 0.424* | 0.163* | 0.000 | 0.381* | 0.085* |
| 0.6 | DenseNet121 | 0.000 | 0.021 | 0.028 | 0.111 | 0.032 | 0.000 | 0.095 | 0.043* |
| | ViT | 0.006* | 0.070 | 0.007 | 0.040 | 0.000 | 0.000 | 0.000 | 0.000 |
| | DenseNet121+ViT | 0.006* | 0.190* | 0.042* | 0.232* | 0.041* | 0.000 | 0.262* | 0.043* |
| 0.7 | DenseNet121 | 0.000 | 0.007 | 0.007 | 0.061 | 0.020* | 0.000 | 0.024 | 0.011 |
| | ViT | 0.006* | 0.014 | 0.006 | 0.010 | 0.000 | 0.000 | 0.000 | 0.000 |
| | DenseNet121+ViT | 0.006* | 0.049* | 0.028* | 0.162* | 0.020* | 0.000 | 0.119* | 0.021* |
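*, the best result among the three models for each pathology at a given threshold (ties are marked for each tied model). ViT, vision transformer; DenseNet121+ViT, our proposed method, which combines two network branches, DenseNet121 and ViT; T(IoU), IoU threshold; IoU, intersection over union.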
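
As an illustration of how a score at a given T(IoU) might be computed, the following minimal sketch assumes that each table entry is the fraction of predicted bounding boxes whose IoU with the ground-truth box reaches the threshold. This reading is an assumption, as are the helper names `iou` and `accuracy_at_threshold` and the `(x_min, y_min, x_max, y_max)` box format; it is not necessarily the evaluation code used in our experiments.

```python
from typing import Sequence, Tuple

# Hypothetical box format (assumption): (x_min, y_min, x_max, y_max)
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Width and height of the overlap region, clamped at zero when disjoint
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def accuracy_at_threshold(
    preds: Sequence[Box], truths: Sequence[Box], threshold: float
) -> float:
    """Fraction of (prediction, ground truth) pairs with IoU >= threshold.

    Assumes preds[i] is paired with truths[i] (one box per image).
    """
    if not preds:
        return 0.0
    hits = sum(iou(p, g) >= threshold for p, g in zip(preds, truths))
    return hits / len(preds)


if __name__ == "__main__":
    # Toy boxes for illustration only, not data from the experiments
    preds = [(0.0, 0.0, 2.0, 2.0), (0.0, 0.0, 2.0, 2.0)]
    truths = [(0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0)]
    for t in (0.1, 0.3, 0.5, 0.7):
        print(t, accuracy_at_threshold(preds, truths, t))
```

Under this assumption, raising the threshold can only keep or lower the score for a fixed set of predictions, which matches the trend down each column of Table 6.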