2023 Dec 2;6:226. doi: 10.1038/s41746-023-00952-2

Fig. 2. Structure of the presented Med-MLLM framework.

The framework consists of three main components: (a) image-only pre-training, which uses patient-level contrastive learning (PCL); (b) text-only pre-training, which uses three training objectives: masked language modelling (MLM), a sentence reconstruction (SR) loss, and a findings-impression alignment (FIA) loss; and (c) image-text pre-training, which uses a knowledge base and one pre-training objective, soft image-text alignment (SITA).