2023 May 9;13:7544. doi: 10.1038/s41598-023-34303-8

Figure 7.

Fusion model architectures. (a–c) Late fusion models that differ in the decision model used to aggregate the unimodal predictions: weighted average, TabNet and XGBoost, respectively. (d) Early fusion model that creates a shared representation from the concatenation of CTPA features from the SANet and EHR tabular embeddings. The CTPA features are extracted from the last FC layer of SANet. The tabular embeddings are extracted from TabNet's Embedding Generator module (similar to word embeddings in NLP). (e) Intermediate fusion model that concatenates latent-space features extracted from the SANet and TabNet single-modality encoders. The SANet and TabNet features are extracted from the last FC layer of each model. (f) This model is identical to (e), except that the Swin UNETR model replaces the SANet as the CTPA encoder. For (d–f), the concatenated vector is passed through bilinear attention or a dimensionality reduction method such as PCA, and the reduced vector is then used as input to TabNet for classification. (g) The multimodal inputs are encoded by independent TabNet and Swin UNETR Transformer streams, and their outputs are concatenated and fused by an MBT Transformer encoder.
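
The two simplest operations named in the caption, the weighted-average decision rule of late fusion and the feature concatenation that feeds the early and intermediate fusion models, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the weight `w_image` and the toy feature values are hypothetical, and the bilinear attention, PCA and TabNet stages are omitted.

```python
from typing import List, Sequence


def late_fusion_weighted_average(
    p_image: float, p_ehr: float, w_image: float = 0.5
) -> float:
    """Combine two unimodal probabilities with a convex weight.

    w_image is a hypothetical weight on the imaging prediction;
    the EHR prediction receives the remaining 1 - w_image.
    """
    if not 0.0 <= w_image <= 1.0:
        raise ValueError("w_image must lie in [0, 1]")
    return w_image * p_image + (1.0 - w_image) * p_ehr


def concat_features(
    image_features: Sequence[float], ehr_features: Sequence[float]
) -> List[float]:
    """Concatenate per-patient imaging and EHR feature vectors.

    In the figure, a vector like this would go on to a reduction
    step and a classifier; both are left out of this sketch.
    """
    return list(image_features) + list(ehr_features)


if __name__ == "__main__":
    # Toy values, chosen only for illustration.
    fused_probability = late_fusion_weighted_average(0.8, 0.4, w_image=0.75)
    fused_vector = concat_features([0.1, 0.9], [1.0, 0.0, 0.5])
    print(fused_probability, fused_vector)
```

The weighted average needs only the two unimodal outputs, whereas concatenation exposes the individual features of both modalities to the downstream model, which is the distinction the caption draws between panels (a–c) and (d–f).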