
Table 1.

An overview of the image and non-image modalities, number of subjects, tasks, multimodal fusion methods, and performance comparisons of the reviewed studies (performance values are those reported by the reviewed studies).

Study | Modalities | Subjects | Tasks | Fusion strategy | Fusion details | Performance comparison (uni-/multimodal) | Performance comparison (different fusion methods)
1 Holste et al [16] MRI images, clinical features 17,046 samples of 5,248 patients Classification of breast cancer Operation Element-wise multiplication, element-wise summation, or concatenation of learned unimodal features or direct features (these operators are sketched after the table). [AUC] Images: 0.860, clinical features: 0.806, all: 0.903 (p-value < 0.05) [AUC] Learned feature concatenation: 0.903, sum: 0.902, multiplication: 0.896; probability fusion: 0.888 (p-value > 0.05)
2 Lu et al [18] H&E images, clinical features 1) 32,537 samples of 29,107 patients from CPTAC [22] and TCGA [23]. (Public) 2) 19,162 samples of 19,162 patients from an in-house dataset. 3) External testing set: 682 patients. Classification of primary and metastatic tumors, and origin sites. Operation Concatenation of clinical features and the learned pathology image feature. [Top-1 accuracy] Image: about 0.740, image + sex: about 0.808, image + sex + site: about 0.762 (metastatic tumors)
3 El-Sappagh et al [24] MRI and PET images, neuropsychology data, cognitive scores, assessment data 1,536 patients from ADNI [37]. (Public) Classification of AD and prodromal status. Regression of 4 cognitive scores. Operation Concatenation of learned static features and learned time-series unimodal features from five stacked CNN-BiLSTM networks. [Accuracy] Five modalities: 92.62, four modalities: 90.45, three modalities: 89.40, two modalities: 89.09. (Regression performance was consistent with the classification results)
4 Yan et al [27] Pathology images, clinical features 3,764 samples of 153 patients. (Public) Classification of breast cancer. Operation Concatenation of dimension-expanded clinical features and multi-scale image features. [Accuracy] Image + clinical features: 87.9, clinical features: 78.5, images: 83.6
5 Mobadersany et al [21] H&E images, genomic data 1,061 samples of 769 patients from the TCGA-GBM and TCGA-LGG [23]. (Public) Survival prediction of glioma tumors. Operation Concatenation of genomic biomarkers and learned pathology image features. [C-index] Image: 0.745, gene: 0.746, images + gene: 0.774 (p-value < 0.05)
6 Yap et al [42] Macroscopic images, dermatoscopic images, clinical features 2,917 samples from ISIC [43]. (Public) Classification of skin lesions. Operation Concatenation of clinical features and learned image features. [AUC] Dermatoscopic + macroscopic + clinical: 0.888, dermatoscopic + macroscopic: 0.888, macroscopic: 0.854, dermatoscopic: 0.871
7 Silva et al [44] Pathology images, mRNA, miRNA, DNA methylation, copy number variation (CNV), clinical features 11,081 patients of 33 cancer types from TCGA [23]. (Public) Pan-cancer survival prediction. Operation Attention Attention-weighted element-wise summation of unimodal features. [C-index] Clinical: 0.742, mRNA: 0.763, miRNA: 0.717, DNA: 0.761, CNV: 0.640, pathology: 0.562, clinical + mRNA + DNAm: 0.779, all six modalities: 0.768
8 Kawahara et al [45] Clinical images, dermoscopic images, clinical features 1,011 samples. (Public) Classification of skin lesions. Operation Concatenation of learned unimodal features. [Accuracy] Clinical images + clinical features: 65.3, dermoscopic images + clinical features: 72.9, all modalities: 73.7
9 Yoo et al [38] MRI images, clinical features 140 patients Classification of brain lesion conversion. Operation Concatenation of learned image features and the replicated and rescaled clinical features. [AUC] Images: 71.8, images + clinical: 74.6
10 Yao et al [28] Pathology images, genomic data 1) 106 patients from TCGA-LUSC. 2) 126 patients from TCGA-GBM [23]. (Public) Survival prediction of lung cancer and brain cancer. Operation Subspace Maximally correlated representations supervised by a CCA-based loss (see the subspace-loss sketch after the table). [C-index] Pathology images: 0.5540, molecular: 0.5989, images + molecular: 0.6287 (LUSC). Similar results on the other datasets. [C-index] Proposed: 0.6287, SCCA [46]: 0.5518, DeepCorr + DeepSurv [17]: 0.5760 (LUSC). Similar results on the other datasets.
11 Cheerla et al [47] Pathology images, genomic data, clinical features 11,160 patients from TCGA [23] (nearly 43% of patients have missing modalities). (Public) Survival prediction of 20 types of cancer. Operation Subspace The average of learned unimodal features, with a margin-based hinge loss used to regularize the similarity of the learned unimodal features. [C-index] Clinical + miRNA + mRNA + pathology: 0.78, clinical + miRNA: 0.78, clinical + mRNA: 0.60, clinical + miRNA + mRNA: 0.78, clinical + miRNA + pathology: 0.78
12 Li et al [48] Pathology images, genomic data 826 cases from TCGA-BRCA [23]. (Public) Survival prediction of breast cancer. Operation Subspace Concatenation of the learned unimodal features, regularized by a similarity loss. [C-index] Images + gene: 0.7571, gene: 0.6912, image: 0.6781 (p-value < 0.05)
13 Zhou et al [39] CT images, laboratory indicators, clinical features 733 patients Classification of COVID-19 severity. Operation Subspace Concatenation of the learned unimodal features, regularized by a similarity loss. [Accuracy] Clinical features: 90.45, CT + clinical features: 96.36 [Accuracy] Proposed: 96.36, proposed w/o similarity loss: 93.18
14 Ghosal et al [65] Images from two fMRI paradigms, genomic data (single nucleotide polymorphisms (SNPs)) 1) 210 patients from the LIBD institute. 2) External testing set: 97 patients from the BARI institute. Classification of neuropsychiatric disorders. Operation Subspace Mean vector of learned unimodal features, supervised by a reconstruction loss. [AUC] Proposed: 0.68, encoder + dropout: 0.62, encoder only: 0.59 (LIBD). The external test set showed the same trend of results.
15 Cui et al [35] H&E and MRI images, genomic data (DNA), demographic features 962 patients (170 with complete modalities) from TCGA-GBMLGG [23] and BraTS [66]. (Public) Survival prediction of glioma tumors. Operation Subspace Mean vector of learned unimodal features with modality dropout, supervised by a reconstruction loss. [C-index] Pathology: 0.7319, radiology: 0.7062, DNA: 0.7174, demographics: 0.7050, all: 0.7857 [C-index] Proposed: 0.8053, pathomic fusion [67]: 0.7697, deep orthogonal [34]: 0.7624
16 Schulz et al [20] CT, MRI and H&E images, genomic data 1) 230 patients from TCGA-KIRC [23]. (Public) 2) External testing set: 18 patients. Survival prediction of clear-cell renal cell carcinoma. Operation Attention Concatenation of learned unimodal features with an attention layer. [C-index] Radiology: 0.7074, pathology: 0.7424, rad + path: 0.7791 (p-value < 0.05). The external test set showed similar results.
17 Cui et al [68] CT images, clinical features 924 samples of 397 patients Lymph node metastasis prediction of cell carcinoma. Operation Attention The concatenation of learned unimodal features with category-wise contextual attention was used as the graph node attributes. [AUC] Images: 0.782, images + clinical: 0.823 [AUC] Proposed: 0.823, logistic regression: 0.713, attention gated [74]: 0.6390, deep insight [75]: 0.739
18 Li et al [31] H&E images, clinical features 3,990 cases Lymph node metastasis prediction of breast cancer. Operation Attention Attention-based multiple-instance learning (MIL) for the WSI-level representation, with attention coefficients learned from both modalities. [AUC] Clinical: 0.8312, image: 0.7111, clinical and image: 0.8844 [AUC] Proposed: 0.8844, concatenation: 0.8420, gating attention [67]: 0.8570, M3DN [70]: 0.8117
19 Duanmu et al [59] MRI images, genomic data, demographic features 112 patients Response prediction to neoadjuvant chemotherapy in breast cancer. Operation Attention The learned feature vector of the non-image modalities was multiplied channel-wise with the image features at multiple layers (see the channel-wise modulation sketch after the table). [AUC] Image: 0.5758, image and non-image: 0.8035 [AUC] Proposed: 0.8035, concatenation: 0.5871
20 Guan et al [36] CT images, clinical features 553 patients Classification of esophageal fistula risk. Operation Attention Self-attention on the concatenation of learned unimodal features, with all paths concatenated at the end. [AUC] Images: 0.7341, clinical features [76]: 0.8196, images + clinical: 0.9119 [AUC] Proposed: 0.9119, concatenation: 0.8953, Ye et al [77]: 0.7736, Chauhan et al [53]: 0.6885, Yap et al [42]: 0.8123
21 Pölsterl et al [52] MRI images, clinical features. 1,341 patients for diagnosis and 755 patients for prognosis. (Public) Survival prediction and diagnosis of AD. Operation Attention Dynamic affine transform module. [C-index] Images: 0.599, images + clinical: 0.748 [C-index] Proposed: 0.748, FiLM [78]: 0.7012, Duanmu et al [59]: 0.706, concatenation: 0.729
22 Wang et al [79] X-ray images, free-text reports 1) ChestX-ray14 dataset [80]. 2) 900 samples from a hand-labeled dataset. 3) 3,643 samples from OpenI [81]. (Partially public) Classification of thorax diseases. Operation Attention Multi-level attention over learned image and text features. [Weighted accuracy] Text reports: 0.978, images: 0.722, images + text reports: 0.922 (ChestX-ray14). Similar results on the other two datasets.
23 Chen et al [67] H&E images, genomic data (DNA and mRNA) 1) 1,505 samples of 769 patients from TCGA-GBM/LGG. 2) 1,251 samples of 417 patients from TCGA-KIRC [23]. (Public) Survival prediction and grade classification of glioma tumors and renal cell carcinoma. Operation Attention Tensor Fusion Kronecker product of the different modality features, with a gated-attention layer to suppress unimportant features (see the tensor-fusion and gated-attention sketches after the table). [C-index] Images (CNN): 0.792, images (GCN): 0.746, gene: 0.808, images + gene: 0.826 (GBM/LGG). Similar results on the other dataset. [C-index] Proposed: 0.826, Mobadersany et al [21]: 0.781 (p-value < 0.05) (GBM/LGG). Similar results on the other dataset.
24 Wang et al [29] Pathology images, genomic data 345 patients from TCGA [23]. (Public) Survival prediction of breast cancer. Operation Tensor Fusion Inter-modal and intra-modal features produced by bilinear layers. [C-index] Gene: 0.695, images: 0.578, gene + images: 0.723 [C-index] Proposed: 0.723, LASSO-Cox: 0.700, inter-modal features: 0.708, DeepCorrSurv [28]: 0.684, MDNNMD [82]: 0.704, concatenation: 0.703
25 Braman et al [34] T1 and T2 MRI images, genomic data (DNA), clinical features 176 patients from TCGA-GBM/LGG [23] and BraTS [66]. (Public) Survival prediction of brain glioma tumors. Operation Attention Tensor Fusion Extended the fusion method in [67] to four modalities, with an orthogonal loss added to encourage the learning of complementary unimodal features. [C-index] Radiology: 0.718, pathology: 0.715, gene: 0.716, clinical: 0.702, path + clin: 0.690, all: 0.785 [C-index] Proposed: 0.785, pathomic fusion [67]: 0.775, concatenation: 0.76
26 Cao et al [41] fMRI images, clinical features 871 patients from ABIDE [83]. (Public) Classification of ASD versus healthy controls. Graph Operation Node features were composed of image features, while edge weights were calculated from image and non-image features (see the population-graph sketch after the table). [Accuracy] Sites + gender + age + FIQ: 0.7456, sites + age + FIQ: 0.7534, sites + age: 0.7520 [Accuracy] Proposed: 0.737, Parisot et al [40]: 0.704
27 Parisot et al [40] fMRI images, clinical features 1) 871 patients from ABIDE [83]. 2) 675 subjects from ADNI [37]. (Public) Classification of ASD versus healthy controls. Prediction of conversion to AD. Graph Operation Node features were composed of image features, while edge weights were calculated from image and non-image features. [AUC] Image + sex + APOE4: 0.89, image + sex + APOE4 + age: 0.85 (ADNI dataset) [AUC] Proposed: 0.89, GCN: 0.85, MLP (concatenation): 0.74 (ADNI dataset)
28 Chen et al [26] H&E images, genomic data 1)-5) 437, 1,022, 1,011, 515, and 538 patients from TCGA-BLCA, TCGA-BRCA, TCGA-GBMLGG, TCGA-LUAD, and TCGA-UCEC, respectively [23]. (Public) Survival prediction of five tumor types. Operation Attention Co-attention mapping between WSIs and genomic features (see the cross-attention sketch after the table). [C-index] Gene: 0.527, pathology images: 0.614, all: 0.653 (overall prediction over the five tumor types) [C-index] Proposed: 0.653, concatenation: 0.634, bilinear pooling: 0.621 (overall prediction over the five tumor types)
29 Zhou et al [84] PET images, MRI images, genomic data (SNP) 805 patients from ADNI [37] (360 with complete multimodal data). (Public) Classification of AD and its prodromal status. Operation Learned features of every two modalities and of all three modalities were concatenated at the first and second fusion stages, respectively. [Accuracy] MRI + PET + SNP > MRI + PET > MRI > MRI + SNP > PET + SNP > PET > SNP (four-class classification) [Accuracy] Proposed > MKL [85] > SAE [86] (direct concatenation of learned unimodal features)
30 Huang et al [87] CT images, clinical features, and lab test results 1,837 studies from 1,794 patients Classification of the presence of pulmonary embolism. Operation Compared seven kinds of fusion, including early, intermediate and late fusion; late elastic fusion performed best. [AUC] Images: 0.791, clinical and lab tests: 0.911, all: 0.947 [AUC] Early fusion: 0.899, late fusion: 0.947, joint fusion: 0.893
31 Lu et al [69] Pathology images, genomic data 736 patients from TCGA-GBM/LGG [23]. (Public) Survival prediction and grade classification of glioma tumors. Operation Attention Proposed a multimodal transformer encoder for co-attention fusion. [C-index] Images: 0.7385, gene: 0.7979, images + gene: 0.8266 (same trend for the classification task) [C-index] Proposed: 0.8266, pathomic fusion [67]: 0.7994
32 Cai et al [51] Camera/dermatoscopic images, clinical features 1) 10,015 cases from ISIC [43]. (Public) 2) 760 cases from a private dataset. Classification of skin wounds. Operation Attention Two multi-head cross-attention modules to interactively fuse information from images and metadata. [AUC] Images: 0.944, clinical features: 0.964, images + clinical: 0.974 (private dataset) [AUC] Proposed: 0.974, MetaBlock [88]: 0.968, concatenation: 0.964 (private dataset)
33 Jacenkow et al [72] X-ray images, free-text reports 210,538 cases from MIMIC-CXR [89]. (Public) Classification of chest diseases. Attention Fine-tuned unimodally pre-trained BERT models on a multimodal task. [Accuracy] Images: 86.0, text: 85.1, images + text: 87.7 [Accuracy] Proposed: 87.7, attentive [90]: 86.8
34 Li et al [58] X-ray images, free-text reports 1) 222,713 cases from MIMIC-CXR [89]. 2) 3,684 cases from OpenI [81]. (Public) Classification of chest diseases. Attention Used different pre-trained vision-and-language transformers. [AUC] Text: 0.974, image + text: 0.987 (MIMIC-CXR) [AUC] VisualBERT [91, 92]: 0.987, LXMERT [93]: 0.984, UNITER [94]: 0.985, PixelBERT [95]: 0.953 (MIMIC-CXR)
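
The fusion strategies in the "Fusion details" column fall into a few recurring families; the short PyTorch sketches below illustrate them under simplified assumptions. All module names, feature dimensions, encoders and hyperparameters are illustrative assumptions, not the exact architectures of the reviewed studies.

Operation-based fusion (e.g. rows 1-6, 8 and 9) combines learned unimodal feature vectors by concatenation, element-wise summation, or element-wise multiplication. A minimal sketch:

```python
import torch
import torch.nn as nn

class OperationFusion(nn.Module):
    """Fuse two learned unimodal embeddings by concatenation, element-wise
    summation, or element-wise multiplication."""

    def __init__(self, img_dim=512, clin_dim=16, embed_dim=128, mode="concat", n_classes=2):
        super().__init__()
        # Small unimodal encoders; the reviewed studies use CNN/MLP backbones instead.
        self.img_enc = nn.Sequential(nn.Linear(img_dim, embed_dim), nn.ReLU())
        self.clin_enc = nn.Sequential(nn.Linear(clin_dim, embed_dim), nn.ReLU())
        self.mode = mode
        fused_dim = 2 * embed_dim if mode == "concat" else embed_dim
        self.head = nn.Linear(fused_dim, n_classes)

    def forward(self, img_feat, clin_feat):
        zi = self.img_enc(img_feat)    # (B, embed_dim)
        zc = self.clin_enc(clin_feat)  # (B, embed_dim)
        if self.mode == "concat":
            z = torch.cat([zi, zc], dim=1)   # feature concatenation
        elif self.mode == "sum":
            z = zi + zc                      # element-wise summation
        else:
            z = zi * zc                      # element-wise multiplication
        return self.head(z)

# Example with random tensors standing in for image and clinical feature vectors.
model = OperationFusion(mode="concat")
logits = model(torch.randn(4, 512), torch.randn(4, 16))
```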
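Subspace fusion (rows 10-15) keeps an operation-based combination but adds a loss that pulls the unimodal embeddings into a shared subspace. The sketch below uses a cosine-similarity penalty as a stand-in for the CCA-based, hinge, or reconstruction losses used by those studies:

```python
import torch
import torch.nn.functional as F

def subspace_fusion_loss(z_img, z_omic, logits, labels, lam=0.1):
    """Task loss plus a similarity penalty that pulls the two unimodal
    embeddings toward a shared subspace; the cosine penalty is an
    illustrative substitute for the losses in the reviewed studies."""
    task_loss = F.cross_entropy(logits, labels)
    similarity_penalty = 1.0 - F.cosine_similarity(z_img, z_omic, dim=1).mean()
    return task_loss + lam * similarity_penalty

# Example: embeddings and logits would come from unimodal encoders and a
# prediction head built on their combination.
z_img, z_omic = torch.randn(4, 128), torch.randn(4, 128)
logits, labels = torch.randn(4, 2), torch.randint(0, 2, (4,))
loss = subspace_fusion_loss(z_img, z_omic, logits, labels)
```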
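Attention-based fusion often gates the unimodal features before combining them, as in the gated-attention layer of Chen et al [67] (row 23) and the attention-weighted summation of Silva et al [44] (row 7). A minimal sketch of such gating, with the gate functions and dimensions assumed for illustration:

```python
import torch
import torch.nn as nn

class GatedAttentionFusion(nn.Module):
    """Rescale each unimodal embedding by a sigmoid gate computed from the
    concatenation of all embeddings, then concatenate the gated features."""

    def __init__(self, dims=(128, 128), n_classes=2):
        super().__init__()
        total = sum(dims)
        self.gates = nn.ModuleList([nn.Linear(total, d) for d in dims])
        self.head = nn.Linear(total, n_classes)

    def forward(self, feats):
        # feats: list of per-modality embeddings, each of shape (B, dims[m]).
        context = torch.cat(feats, dim=1)  # shared gating context
        gated = [torch.sigmoid(g(context)) * f for g, f in zip(self.gates, feats)]
        return self.head(torch.cat(gated, dim=1))

# Example with two 128-dimensional modality embeddings.
model = GatedAttentionFusion()
out = model([torch.randn(4, 128), torch.randn(4, 128)])
```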
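Rows 19 and 21 instead modulate intermediate image features with the non-image modality, via channel-wise multiplication or a dynamic affine transform. A FiLM-style sketch, with the two linear mappings assumed for illustration:

```python
import torch
import torch.nn as nn

class ChannelwiseModulation(nn.Module):
    """Map non-image features to per-channel scale and shift factors that
    modulate an intermediate image feature map (a FiLM-style affine transform)."""

    def __init__(self, clin_dim=16, channels=64):
        super().__init__()
        self.scale = nn.Linear(clin_dim, channels)
        self.shift = nn.Linear(clin_dim, channels)

    def forward(self, img_map, clin_feat):
        # img_map: (B, C, H, W) feature map; clin_feat: (B, clin_dim) vector.
        s = self.scale(clin_feat).unsqueeze(-1).unsqueeze(-1)
        b = self.shift(clin_feat).unsqueeze(-1).unsqueeze(-1)
        return s * img_map + b

# Example: modulate a 64-channel feature map with a 16-dimensional clinical vector.
mod = ChannelwiseModulation()
out = mod(torch.randn(2, 64, 8, 8), torch.randn(2, 16))
```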
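Tensor fusion (rows 23-25) models multiplicative interactions between modalities through a Kronecker (outer) product of the unimodal embeddings, typically after appending a constant 1 to each so that unimodal terms are retained. A minimal sketch:

```python
import torch
import torch.nn as nn

class TensorFusion(nn.Module):
    """Kronecker (outer) product of two unimodal embeddings, each padded with a
    constant 1 so unimodal terms are preserved alongside bimodal interactions."""

    def __init__(self, d1=32, d2=32, n_out=1):
        super().__init__()
        self.head = nn.Linear((d1 + 1) * (d2 + 1), n_out)

    def forward(self, z1, z2):
        ones = torch.ones(z1.size(0), 1, device=z1.device)
        z1 = torch.cat([z1, ones], dim=1)           # (B, d1 + 1)
        z2 = torch.cat([z2, ones], dim=1)           # (B, d2 + 1)
        fused = torch.einsum("bi,bj->bij", z1, z2)  # batched outer product
        return self.head(fused.flatten(start_dim=1))

# Example with 32-dimensional image and genomic embeddings.
model = TensorFusion()
risk = model(torch.randn(4, 32), torch.randn(4, 32))
```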
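Graph-based fusion (rows 26 and 27) builds a population graph in which each subject is a node carrying image features and edge weights encode similarity of non-image features. A single-step sketch, with the Gaussian edge kernel assumed for illustration:

```python
import torch
import torch.nn as nn

def population_graph_fusion(img_feats, clin_feats, n_classes=2, sigma=1.0):
    """Subjects are graph nodes carrying image features; edge weights come from
    pairwise similarity of non-image (phenotypic) features. One normalized
    graph-convolution step is shown; real models stack several and train end to end."""
    # Gaussian-kernel edge weights from clinical similarity (an illustrative choice).
    dist = torch.cdist(clin_feats, clin_feats)
    adj = torch.exp(-dist ** 2 / (2 * sigma ** 2))
    adj_norm = adj / adj.sum(dim=1, keepdim=True)  # row-normalized adjacency
    # One propagation step mixes each subject's features with its neighbours'.
    proj = nn.Linear(img_feats.size(1), n_classes)
    return proj(adj_norm @ img_feats)

# Example: 10 subjects with 64-dimensional image features and 3 phenotypic variables.
logits = population_graph_fusion(torch.randn(10, 64), torch.randn(10, 3))
```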
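Co-attention and transformer-based fusion (rows 28 and 31-34) lets tokens of one modality attend to tokens of the other, as in the co-attention mapping of Chen et al [26] and the multimodal transformer encoder of Lu et al [69]. A minimal cross-attention sketch using PyTorch's built-in multi-head attention:

```python
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    """Non-image tokens (e.g. genomic embeddings or report tokens) attend to
    image patch tokens; the attended output is pooled for prediction."""

    def __init__(self, dim=128, heads=4, n_out=1):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.head = nn.Linear(dim, n_out)

    def forward(self, query_tokens, img_tokens):
        # query_tokens: (B, T_q, dim) queries; img_tokens: (B, T_i, dim) keys/values.
        fused, _ = self.attn(query_tokens, img_tokens, img_tokens)
        return self.head(fused.mean(dim=1))  # pool the attended tokens

# Example: 8 genomic tokens attending to 100 image patch tokens.
model = CrossAttentionFusion()
out = model(torch.randn(2, 8, 128), torch.randn(2, 100, 128))
```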