Figure 2:

Demonstration of challenges in medical image-text contrastive learning. (1) Pre-training data includes only paired images and texts, so many more image-only and text-only datasets are left unused. (2) False negatives arise. For an anchor image, previous methods treat paired texts (i.e., reports from the same patient's study) as positives and unpaired texts (i.e., reports from other patients' studies) as negatives. However, these negative texts can describe the same symptoms as the anchor image's paired texts.
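To make challenge (2) concrete, the sketch below contrasts instance-level targets, where image i matches only report i, with a check for negatives whose reports carry identical findings. The multi-hot finding labels, the three finding columns, and the helper names `instance_targets` and `false_negative_mask` are hypothetical and for illustration only; this is not the method of any particular paper.

```python
import numpy as np

# Hypothetical multi-hot finding labels for four studies (rows) over three
# findings (columns), e.g. [edema, effusion, cardiomegaly]. Illustrative only.
labels = np.array([
    [1, 0, 0],  # study 0
    [1, 0, 0],  # study 1: same finding as study 0, different patient
    [0, 1, 0],  # study 2
    [0, 0, 1],  # study 3
])


def instance_targets(n):
    """Instance-level contrastive targets: image i matches only report i."""
    return np.eye(n, dtype=int)


def false_negative_mask(labels):
    """Mark pairs (i, j) treated as negatives by instance-level targets
    even though both reports carry identical finding labels."""
    # same[i, j] is True when studies i and j have the same label vector.
    same = (labels[:, None, :] == labels[None, :, :]).all(axis=-1)
    targets = instance_targets(len(labels))
    # Keep only off-diagonal pairs, i.e. those labeled as negatives.
    return same & (targets == 0)


mask = false_negative_mask(labels)
print(np.argwhere(mask).tolist())  # [[0, 1], [1, 0]]
```

Under identity targets, the report from study 1 is pushed away from the image of study 0 even though both describe the same finding; these are exactly the false negatives the caption refers to.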