Author manuscript; available in PMC: 2024 Aug 14.
Published in final edited form as: Proc Conf Empir Methods Nat Lang Process. 2022 Dec;2022:3876–3887. doi: 10.18653/v1/2022.emnlp-main.256

Table 1:

Results of zero-shot image classification on four datasets. For each method we also evaluate a prompt-ensemble variant (subscript ENS). We report the mean and standard deviation (STD) of accuracy (ACC) over five runs to account for the randomness of the prompt generation process. The best score on each dataset is in bold.

| Method (ACC (STD)) | CheXpert-5x200 | MIMIC-5x200 | COVID | RSNA |
|---|---|---|---|---|
| CLIP | 0.2016 (0.01) | 0.1918 (0.01) | 0.5069 (0.03) | 0.4989 (0.01) |
| CLIP_ENS | 0.2036 (0.01) | 0.2254 (0.01) | 0.5090 (<0.01) | 0.5055 (0.01) |
| ConVIRT | 0.4188 (0.01) | 0.4018 (0.01) | 0.5184 (0.01) | 0.4731 (0.05) |
| ConVIRT_ENS | 0.4224 (0.02) | 0.4010 (0.02) | 0.6647 (0.05) | 0.4647 (0.08) |
| GLoRIA | 0.4328 (0.01) | 0.3306 (0.01) | 0.7090 (0.04) | 0.5808 (0.08) |
| GLoRIA_ENS | 0.4210 (0.03) | 0.3382 (0.01) | 0.5702 (0.06) | 0.4752 (0.06) |
| MedCLIP-ResNet | 0.5476 (0.01) | 0.5022 (0.02) | **0.8472 (<0.01)** | 0.7418 (<0.01) |
| MedCLIP-ResNet_ENS | 0.5712 (<0.01) | **0.5430 (<0.01)** | 0.8369 (<0.01) | 0.7584 (<0.01) |
| MedCLIP-ViT | **0.5942 (<0.01)** | 0.5006 (<0.01) | 0.8013 (<0.01) | 0.7447 (0.01) |
| MedCLIP-ViT_ENS | **0.5942 (<0.01)** | 0.5024 (<0.01) | 0.7943 (<0.01) | **0.7682 (<0.01)** |
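For readers unfamiliar with the evaluation protocol, the zero-shot setup and the prompt-ensemble variants in this table can be sketched as follows. This is a minimal illustration, not the paper's implementation: the encoders are stood in for by precomputed embedding arrays, and the function names (`l2_normalize`, `zero_shot_classify`) are hypothetical. Each class is described by one or more text prompts; prompt ensembling averages the (normalized) prompt embeddings of a class into a single class embedding before scoring the image by cosine similarity.

```python
import numpy as np


def l2_normalize(x, axis=-1):
    """Scale vectors to unit L2 norm along the given axis."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)


def zero_shot_classify(image_emb, class_prompt_embs):
    """Zero-shot classification by cosine similarity.

    image_emb: (d,) image embedding from the vision encoder.
    class_prompt_embs: list with one (n_prompts_c, d) array per class,
        holding the text embeddings of that class's prompts.
    Returns the index of the highest-scoring class.
    """
    img = l2_normalize(image_emb)
    # Prompt ensembling: average each class's normalized prompt
    # embeddings, then re-normalize to get one vector per class.
    class_embs = np.stack(
        [l2_normalize(l2_normalize(p).mean(axis=0)) for p in class_prompt_embs]
    )
    scores = class_embs @ img  # cosine similarities (unit vectors)
    return int(np.argmax(scores))
```

With a single prompt per class this reduces to plain zero-shot classification; the table's five runs then correspond to repeating the evaluation with freshly sampled prompts and reporting mean and STD of accuracy.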