2023 Jul 26;13:12098. doi: 10.1038/s41598-023-39278-0

Figure 5.

Illustration of the Medfusion model. (A) General overview of the architecture. x and x̃ are the input and output images. (B) Details of the autoencoder, with sampling of the latent space via the reparameterization trick at the end of the encoder and a direct connection (dashed lines) into the decoder (only active for training the autoencoder). (C) Detailed view of the denoising UNet with a linear layer for time and label embedding. (D) Detailed view of the submodules inside the autoencoder and UNet. If not specified otherwise, a convolution kernel size of 3 × 3, GroupNorm with 8 groups, and Swish activation were used.
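
The building blocks named in the caption can be illustrated with a minimal, dependency-free Python sketch. This is not the authors' implementation: the function names, the use of plain lists instead of tensors, the assumption that the encoder outputs a mean and a log-variance, and the omission of learnable scale and shift in the normalization are all illustrative assumptions.

```python
import math
import random


def swish(x: float) -> float:
    """Swish (SiLU) activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def reparameterize(mu, log_var, rng):
    """Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, 1).

    Assumes (hypothetically) that the encoder emits a mean and a
    log-variance per latent element, so sigma = exp(0.5 * log_var).
    """
    return [
        m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
        for m, lv in zip(mu, log_var)
    ]


def group_norm(channels, num_groups=8, eps=1e-5):
    """Group normalization without learnable affine parameters.

    `channels` is a list of channels, each a flat list of floats.
    Channels are split into `num_groups` consecutive groups, and every
    group is normalized to zero mean and (approximately) unit variance.
    """
    if len(channels) % num_groups != 0:
        raise ValueError("number of channels must be divisible by num_groups")
    size = len(channels) // num_groups
    out = []
    for g in range(num_groups):
        group = channels[g * size:(g + 1) * size]
        values = [v for ch in group for v in ch]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        scale = 1.0 / math.sqrt(var + eps)
        out.extend([[(v - mean) * scale for v in ch] for ch in group])
    return out


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sample is reproducible
    z = reparameterize([0.0, 1.0], [0.0, -2.0], rng)
    feats = [[float(i), float(i + 1)] for i in range(16)]  # 16 toy channels
    activated = [[swish(v) for v in ch] for ch in group_norm(feats)]
    print(len(z), len(activated))
```

Writing the sample as a deterministic function of the mean, the log-variance, and an independent noise draw is what allows gradients to flow through the sampling step when the autoencoder is trained. In a deep learning framework the same three operations would normally be taken from the library rather than written by hand.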