Table 5.
Classification Performance in the Task of Distinguishing Malignant From Benign Breast Lesions for the Human-Engineered Radiomics, CNN Feature Extraction, CNN Fine-Tuning, and Fusion Classifiers on Entire Data Set (Both Mass and NME), Mass Lesion Only, and NME Only. The Multiple Comparison Corrections Were Performed Using the Bonferroni–Holm Method
Classifier / comparison | All: AUC [95% CI] | All: p-value for ΔAUC (significance level) [95% CI of ΔAUC] | Mass: AUC [95% CI] | Mass: p-value for ΔAUC (significance level) [95% CI of ΔAUC] | NME: AUC [95% CI] | NME: p-value for ΔAUC (significance level) [95% CI of ΔAUC] |
---|---|---|---|---|---|---|
Human-engineered radiomics (RadHE) | 0.89 [0.8582, 0.9221] | ... | 0.90 [0.8546, 0.9348] | ... | 0.91 [0.8579, 0.9488] | ... |
CNN feature extraction (CNNFE) | 0.85 [0.8158, 0.8903] | ... | 0.90 [0.8520, 0.9307] | ... | 0.90 [0.8423, 0.9466] | ... |
FusionA (RadHE + CNNFE) | 0.91 [0.8732, 0.9352] | ... | 0.94 [0.9032, 0.9648] | ... | 0.94 [0.8808, 0.9650] | ... |
RadHE vs CNNFE | ... | 0.0619 (0.025) [−0.0019, 0.0780] | ... | 0.8722 (0.05) [−0.0426, 0.0502] | ... | 0.7933 (0.05) [−0.0468, 0.0613] |
RadHE vs FusionA | ... | 0.2499 (0.05) [−0.0297, 0.0077] | ... | 0.0057 (0.025) [−0.0637, −0.0109] | ... | 0.1703 (0.025) [−0.0452, 0.0080] |
CNNFE vs FusionA | ... | 0.0002 (0.017) [−0.0735, −0.0228] | ... | 0.0039 (0.017) [−0.0650, −0.0124] | ... | 0.1663 (0.017) [−0.0596, 0.0103] |
Human-engineered radiomics (RadHE) | 0.89 [0.8582, 0.9221] | ... | 0.90 [0.8546, 0.9348] | ... | 0.91 [0.8579, 0.9488] | ... |
CNN fine-tuning (CNNFT) | 0.89 [0.8582, 0.9245] | ... | 0.93 [0.8971, 0.9547] | ... | 0.87 [0.8075, 0.9169] | ... |
FusionB (RadHE + CNNFT) | 0.90 [0.8659, 0.9334] | ... | 0.93 [0.8961, 0.9625] | ... | 0.93 [0.8776, 0.9604] | ... |
RadHE vs CNNFT | ... | 0.9955 (0.05) [−0.0281, 0.0279] | ... | 0.1001 (0.025) [−0.0630, 0.0055] | ... | 0.1469 (0.05) [−0.0145, 0.0968] |
RadHE vs FusionB | ... | 0.1490 (0.017) [−0.0244, 0.0037] | ... | 0.0002 (0.017) [−0.0500, −0.0153] | ... | 0.1259 (0.025) [−0.0437, 0.0054] |
CNNFT vs FusionB | ... | 0.1671 (0.025) [−0.0289, 0.0050] | ... | 0.7357 (0.05) [−0.0235, 0.0166] | ... | 0.0037 (0.017) [−0.1018, −0.0197] |
CNN feature extraction (CNNFE) | 0.85 [0.8158, 0.8903] | ... | 0.90 [0.8520, 0.9307] | ... | 0.90 [0.8423, 0.9466] | ... |
CNN fine-tuning (CNNFT) | 0.89 [0.8582, 0.9245] | ... | 0.93 [0.8971, 0.9547] | ... | 0.87 [0.8075, 0.9169] | ... |
FusionC (CNNFE + CNNFT) | 0.90 [0.8737, 0.9319] | ... | 0.94 [0.9020, 0.9584] | ... | 0.92 [0.8769, 0.9583] | ... |
CNNFE vs CNNFT | ... | 0.0481 (0.025) [−0.0761, −0.0003] | ... | 0.0886 (0.025) [−0.0708, 0.0050] | ... | 0.2441 (0.025) [−0.0237, 0.0933] |
CNNFE vs FusionC | ... | <0.0001 (0.017) [−0.0651, −0.0255] | ... | 0.0006 (0.017) [−0.0534, −0.0145] | ... | 0.2448 (0.05) [−0.0463, 0.0118] |
CNNFT vs FusionC | ... | 0.4302 (0.05) [−0.0286, 0.0122] | ... | 0.7327 (0.05) [−0.0251, 0.0177] | ... | 0.0034 (0.017) [−0.0887, −0.0176] |
Human-engineered radiomics (RadHE) | 0.89 [0.8582, 0.9221] | ... | 0.90 [0.8546, 0.9348] | ... | 0.91 [0.8579, 0.9488] | ... |
CNN feature extraction (CNNFE) | 0.85 [0.8158, 0.8903] | ... | 0.90 [0.8520, 0.9307] | ... | 0.90 [0.8423, 0.9466] | ... |
CNN fine-tuning (CNNFT) | 0.89 [0.8582, 0.9245] | ... | 0.93 [0.8971, 0.9547] | ... | 0.87 [0.8075, 0.9169] | ... |
FusionD (RadHE + CNNFE+CNNFT) | 0.91 [0.8840, 0.9431] | ... | 0.94 [0.9122, 0.9678] | ... | 0.95 [0.9066, 0.9735] | ... |
RadHE vs FusionD | ... | 0.0448 (0.025) [−0.0381, −0.0004] | ... | 0.0018 (0.025) [−0.0697, −0.0160] | ... | 0.0171 (0.025) [−0.0643, −0.0063] |
CNNFE vs FusionD | ... | 0.0001 (0.017) [−0.0842, −0.0285] | ... | 0.0015 (0.017) [−0.0737, −0.0175] | ... | 0.0337 (0.05) [−0.0829, −0.0033] |
CNNFT vs FusionD | ... | 0.0509 (0.05) [−0.0393, 0.0001] | ... | 0.1724 (0.05) [−0.0345, 0.0062] | ... | 0.0004 (0.017) [−0.1186, −0.0344] |
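The significance levels in parentheses (0.017, 0.025, 0.05) are consistent with the standard Bonferroni–Holm step-down procedure applied at α = 0.05 to a family of three pairwise comparisons, where the k-th smallest p-value is tested against α/(3 − k + 1). The sketch below illustrates that procedure; it assumes α = 0.05 and that each block of three comparisons within one column (All, Mass, or NME) forms one family, which is our reading of the thresholds rather than something the table states. The function name `holm_bonferroni` is hypothetical and this is not the authors' code.

```python
from typing import Dict, List, Tuple


def holm_bonferroni(
    p_values: Dict[str, float], alpha: float = 0.05
) -> List[Tuple[str, float, float, bool]]:
    """Hypothetical sketch of the Holm step-down procedure for one family.

    Returns (name, p, threshold, significant) tuples in ascending p-value order.
    The k-th smallest p-value (k = 0, 1, ...) is compared against alpha / (m - k).
    Testing stops at the first p-value that exceeds its threshold; that
    comparison and every later one are declared not significant.
    """
    m = len(p_values)
    ordered = sorted(p_values.items(), key=lambda item: item[1])
    results = []
    still_rejecting = True
    for k, (name, p) in enumerate(ordered):
        threshold = alpha / (m - k)  # 0.05/3, 0.05/2, 0.05/1 for m = 3
        if still_rejecting and p > threshold:
            still_rejecting = False  # step-down stops here
        results.append((name, p, threshold, still_rejecting))
    return results


# Assumed family: the FusionD block, "All" column (p-values copied from the table).
fusion_d_all = {
    "RadHE vs FusionD": 0.0448,
    "CNNFE vs FusionD": 0.0001,
    "CNNFT vs FusionD": 0.0509,
}

for name, p, threshold, significant in holm_bonferroni(fusion_d_all):
    print(f"{name}: p = {p:.4f}, threshold = {threshold:.3f}, significant = {significant}")
```

The printed thresholds (0.017, 0.025, 0.050) match the levels listed for this block. Under this reading, only CNNFE vs FusionD passes its threshold in the All column, because RadHE vs FusionD (p = 0.0448) exceeds 0.025 and the step-down stops there.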