Author manuscript; available in PMC: 2021 Oct 1.
Published in final edited form as: Med Image Comput Comput Assist Interv. 2020 Sep 29;12262:529–539. doi: 10.1007/978-3-030-59713-9_51

Fig. 1:

The architecture of our joint model, along with an example chest radiograph xI and its associated radiology report xR. At training time, the model predicts the edema severity level from images and text through their respective encoders and classifiers, and compares the predictions with the labels. The joint embedding loss JE associates image embeddings I with text embeddings R in the joint embedding space. At inference time, the image stream and the text stream are decoupled and only the image stream is used. Given a new chest radiograph (image), the image encoder and classifier compute its edema severity level.
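The training and inference flow described in this caption can be illustrated with a minimal sketch. It is not the authors' implementation: the linear encoders, the hand-picked weights, the four severity classes, and the squared-distance form of the joint embedding term are all assumptions chosen to keep the example small, and every name (`Stream`, `training_loss`, `predict_severity`) is hypothetical.

```python
import math


def linear(weights, x):
    """Apply a weight matrix (a list of rows) to a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def cross_entropy(logits, label):
    """Softmax cross-entropy for a single example, computed stably."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def squared_distance(a, b):
    """Assumed stand-in for the joint embedding loss between paired embeddings."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


class Stream:
    """One modality: an encoder into the joint space plus a severity classifier."""

    def __init__(self, encoder_weights, classifier_weights):
        self.encoder_weights = encoder_weights
        self.classifier_weights = classifier_weights

    def embed(self, x):
        return linear(self.encoder_weights, x)

    def classify(self, embedding):
        return linear(self.classifier_weights, embedding)


def training_loss(image_stream, text_stream, x_image, x_report, label):
    """Training time: both streams predict severity, and embeddings are tied."""
    image_embedding = image_stream.embed(x_image)
    text_embedding = text_stream.embed(x_report)
    image_term = cross_entropy(image_stream.classify(image_embedding), label)
    text_term = cross_entropy(text_stream.classify(text_embedding), label)
    joint_term = squared_distance(image_embedding, text_embedding)
    return image_term + text_term + joint_term


def predict_severity(image_stream, x_image):
    """Inference time: only the image stream is used."""
    logits = image_stream.classify(image_stream.embed(x_image))
    return max(range(len(logits)), key=lambda k: logits[k])


# Toy, hand-picked values (hypothetical): 2-d joint space, 4 severity classes.
classifier = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
image_stream = Stream([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], classifier)
text_stream = Stream([[1.0, 0.0], [0.0, 1.0]], classifier)
x_image = [2.0, 0.5, 0.0]   # stand-in for image features
x_report = [2.0, 0.5]       # stand-in for report features

loss = training_loss(image_stream, text_stream, x_image, x_report, label=0)
severity = predict_severity(image_stream, x_image)
```

In this sketch `predict_severity` never touches `text_stream`, which mirrors the decoupling at inference time: the report is needed only while training, when the joint term pulls each image embedding toward the embedding of its paired report.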