Table 1:
Results of zero-shot image classification on four datasets. For each method we additionally evaluate a prompt-ensemble version (subscript ENS). We report the mean and standard deviation (STD) of accuracy (ACC) over five runs to account for the randomness of the prompt generation process. The best score on each dataset is in bold.
| Method | CheXpert-5x200 | MIMIC-5x200 | COVID | RSNA |
|---|---|---|---|---|
| CLIP | 0.2016(0.01) | 0.1918(0.01) | 0.5069(0.03) | 0.4989(0.01) |
| CLIP<sub>ENS</sub> | 0.2036(0.01) | 0.2254(0.01) | 0.5090(<0.01) | 0.5055(0.01) |
| ConVIRT | 0.4188(0.01) | 0.4018(0.01) | 0.5184(0.01) | 0.4731(0.05) |
| ConVIRT<sub>ENS</sub> | 0.4224(0.02) | 0.4010(0.02) | 0.6647(0.05) | 0.4647(0.08) |
| GLoRIA | 0.4328(0.01) | 0.3306(0.01) | 0.7090(0.04) | 0.5808(0.08) |
| GLoRIA<sub>ENS</sub> | 0.4210(0.03) | 0.3382(0.01) | 0.5702(0.06) | 0.4752(0.06) |
| MedCLIP-ResNet | 0.5476(0.01) | 0.5022(0.02) | **0.8472(<0.01)** | 0.7418(<0.01) |
| MedCLIP-ResNet<sub>ENS</sub> | 0.5712(<0.01) | **0.5430(<0.01)** | 0.8369(<0.01) | 0.7584(<0.01) |
| MedCLIP-ViT | **0.5942(<0.01)** | 0.5006(<0.01) | 0.8013(<0.01) | 0.7447(0.01) |
| MedCLIP-ViT<sub>ENS</sub> | **0.5942(<0.01)** | 0.5024(<0.01) | 0.7943(<0.01) | **0.7682(<0.01)** |
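
The ENS variants score each image against a class embedding obtained by averaging the normalized text embeddings of several prompt templates per class, rather than a single prompt. Below is a minimal sketch of such a prompt-ensemble zero-shot classifier; the model interface (`encode_image`/`encode_text`), tokenizer, class names, and templates are illustrative assumptions, not the exact ones used in the paper.

```python
import torch

# Illustrative class names and prompt templates (assumptions, not the paper's).
CLASSES = ["atelectasis", "cardiomegaly", "consolidation", "edema", "pleural effusion"]
TEMPLATES = [
    "a chest x-ray showing {}",
    "findings consistent with {}",
    "radiograph with evidence of {}",
]

@torch.no_grad()
def class_embeddings(model, tokenizer, classes, templates):
    """Prompt ensemble: average the normalized text embedding over all templates."""
    weights = []
    for name in classes:
        prompts = [t.format(name) for t in templates]
        tokens = tokenizer(prompts)                 # (n_templates, seq_len), assumed API
        emb = model.encode_text(tokens)             # (n_templates, dim), assumed API
        emb = emb / emb.norm(dim=-1, keepdim=True)  # normalize each prompt embedding
        mean = emb.mean(dim=0)                      # ensemble = mean over templates
        weights.append(mean / mean.norm())          # renormalize the averaged vector
    return torch.stack(weights)                     # (n_classes, dim)

@torch.no_grad()
def zero_shot_predict(model, images, class_weights):
    """Classify by cosine similarity between image and class embeddings."""
    img = model.encode_image(images)                # (batch, dim), assumed API
    img = img / img.norm(dim=-1, keepdim=True)
    logits = img @ class_weights.T                  # (batch, n_classes)
    return logits.argmax(dim=-1)                    # predicted class index per image
```

Since the templates are drawn at random in the prompt generation process, accuracy varies across runs, which is why the table reports the mean and STD over five runs.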