Phys Med Biol. 2024 May 3;69(10):10TR01. doi: 10.1088/1361-6560/ad387d

Table 12.

Overview of LMs for multimodal learning. The asterisks (*) indicate terms that are either not present in the original paper or do not apply in this context.

| References | ROI | Modality | Dataset | Model name | Vision model | Language model |
| --- | --- | --- | --- | --- | --- | --- |
| Huemann et al (2023a) | Chest | x-ray | CANDID-PTX | ConTEXTual Net | U-Net | T5-Large |
| Huang et al (2021) | Chest | x-ray | CheXpert, RSNA pneumonia detection challenge, SIIM-ACR pneumothorax segmentation, NIH ChestX-ray14 | GLoRIA | CNN | Transformer |
| Li et al (2023) | * | CT, x-ray | MosMedData+, ESO-CT, QaTa-COV19 | LViT | U-shaped CNN | U-shaped ViT (BERT-Embed) |
| Khare et al (2021) | * | * | VQA-Med 2019, VQA-RAD, ROCO | MMBERT | ResNet152 | BERT |
| Huemann et al (2022) | Lymphoma | PET, CT | Institutional | | ViT, EfficientNet B7 | RoBERTa-Large, Bio ClinicalBERT, and BERT |
| Chen et al (2022) | * | x-ray, MRI, CT | ROCO, MedICaT | M3AE | ViT | BERT |
| Delbrouck et al (2022) | * | * | MIMIC-CXR, Indiana University x-ray collection, PadChest, CheXpert, VQA-Med 2021 | ViLMedic | CNN | BioBERT |
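Each model in Table 12 pairs a vision encoder (CNN, ViT, U-Net) with a language model (BERT variants, T5, RoBERTa). One common way to join such a pair, used for example by contrastive vision-language pretraining approaches in the GLoRIA family, is to project both encoders' outputs into a shared embedding space and train so that matched image–report pairs score higher than mismatched ones. The sketch below illustrates that global contrastive objective with random features standing in for encoder outputs; all dimensions, the temperature value, and variable names are illustrative assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for encoder outputs. In the models tabulated above these
# would come from a vision backbone (CNN/ViT) and a language model (BERT/T5);
# the batch and feature sizes here are arbitrary for illustration.
batch, vision_dim, text_dim, shared_dim = 4, 512, 768, 128
image_feats = rng.normal(size=(batch, vision_dim))  # e.g. pooled ViT features
text_feats = rng.normal(size=(batch, text_dim))     # e.g. BERT [CLS] features

# Learned linear projections into a shared embedding space (random here)
W_img = rng.normal(size=(vision_dim, shared_dim)) / np.sqrt(vision_dim)
W_txt = rng.normal(size=(text_dim, shared_dim)) / np.sqrt(text_dim)

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

img_emb = l2_normalize(image_feats @ W_img)
txt_emb = l2_normalize(text_feats @ W_txt)

# Pairwise cosine similarities scaled by a temperature (0.07 is a common
# default in contrastive pretraining); matched pairs lie on the diagonal.
logits = img_emb @ txt_emb.T / 0.07

def cross_entropy(logits, targets):
    # Row-wise softmax cross-entropy; the diagonal entry is the positive pair
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

# Symmetric loss: image-to-text retrieval plus text-to-image retrieval
targets = np.arange(batch)
loss = 0.5 * (cross_entropy(logits, targets) + cross_entropy(logits.T, targets))
print(f"contrastive loss: {loss:.4f}")
```

Minimizing this loss pulls each image embedding toward its paired report embedding and pushes it away from the other reports in the batch; the same shared space then supports downstream tasks such as the segmentation and VQA settings listed in the table.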