Table 3.
Average test performance (in %) of the models computed across 100 stratified random splits.
| Model | # feat. | sMBF AUC | sMBF Sens. | sMBF Spec. | MFR AUC | MFR Sens. | MFR Spec. | MFC radius AUC | MFC radius Sens. | MFC radius Spec. |
|---|---|---|---|---|---|---|---|---|---|---|
| Global | 1 | 68.3 [67.0, 69.7] | 80.4 | 62.4 | 69.7 [68.4, 71.2] | 72.5 | 69.4 | 70.5 [69.4, 71.8] | 77.4 | 66.0 |
| Regional | 17 | 71.2 [69.6, 72.4] | 78.1 | 65.3 | 71.6 [70.4, 72.9] | 74.3 | 69.8 | 73.4 [72.3, 74.7] | 74.8 | 72.0 |
| Radiomics all | 93 | 67.4 [66.0, 69.0] | 72.3 | 64.7 | 68.7 [67.2, 70.4] | 73.1 | 67.5 | 71.6 [70.0, 73.0] | 72.4 | 71.1 |
| Radiomics intens. | 18 | 73.2 [71.6, 74.5] | 77.1 | 70.2 | 68.4 [66.9, 69.8] | 73.3 | 66.8 | 73.8 [72.4, 75.1] | 76.4 | 67.7 |
| CNN 1 | – | 70.0 [68.5, 71.3] | 80.1 | 64.3 | 70.3 [69.0, 71.8] | 74.4 | 71.1 | 73.6 [72.4, 74.8] | 74.8 | 71.0 |
| CNN 2 | – | 71.3 [69.7, 72.8] | 81.4 | 64.9 | 70.8 [69.4, 72.2] | 77.3 | 66.4 | 73.9 [72.5, 75.3] | 77.5 | 69.0 |
| Clinical + global | 20 | 67.6 [66.3, 68.9] | 77.6 | 61.6 | 68.5 [67.2, 69.8] | 75.2 | 64.6 | 69.1 [68.0, 70.3] | 79.0 | 62.8 |
| Clinical + regional | 36 | 70.9 [69.4, 72.1] | 78.2 | 65.3 | 71.8 [70.6, 73.0] | 76.9 | 68.4 | 72.5 [71.3, 73.8] | 76.5 | 70.0 |
| Clin. + rad. all | 112 | 69.1 [67.7, 70.5] | 72.9 | 67.1 | 70.8 [69.1, 72.3] | 78.0 | 67.1 | 72.8 [71.2, 74.1] | 80.0 | 67.4 |
| Clin. + rad. intens. | 37 | 71.5 [70.2, 72.8] | 76.7 | 65.9 | 69.9 [68.5, 71.3] | 77.9 | 64.5 | 72.7 [71.3, 73.8] | 80.0 | 67.7 |
| Clin. + CNN 1 | – | 68.2 [66.8, 69.4] | 77.7 | 62.4 | 69.3 [68.0, 70.5] | 76.7 | 64.7 | 70.0 [68.9, 71.3] | 78.5 | 64.3 |
| Clin. + CNN 2 | – | 70.4 [69.2, 71.6] | 80.1 | 63.1 | 71.2 [70.0, 72.5] | 78.5 | 66.4 | 72.1 [70.9, 73.3] | 78.5 | 67.7 |
Bootstrap-based 95% confidence intervals are reported for AUC. An LR model based only on the 19 clinical features achieves an AUC of 63.0% [61.5, 64.3], with a sensitivity of 70.6% and a specificity of 62.4%. The top performance is highlighted in bold for each column.
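
For readers who want to reproduce intervals of this kind, the following is a minimal sketch of one common approach: a percentile bootstrap over the per-split AUC values. The source does not specify which bootstrap variant was used, so the resampling scheme, the function name `bootstrap_ci_mean`, the number of resamples, and the synthetic `split_aucs` values are all assumptions made for illustration; none of the numbers below come from the study.

```python
import random
import statistics


def bootstrap_ci_mean(values, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of per-split scores.

    Hypothetical helper: resamples the split-level scores with
    replacement and takes empirical quantiles of the resampled means.
    """
    rng = random.Random(seed)
    n = len(values)
    # Mean of each bootstrap resample, sorted for quantile lookup.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_boot)
    )
    lo = means[int((alpha / 2) * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return statistics.fmean(values), lo, hi


# Synthetic per-split AUCs (in %) standing in for 100 random splits.
# These are placeholder values, NOT data from the table above.
data_rng = random.Random(42)
split_aucs = [data_rng.gauss(70.0, 7.0) for _ in range(100)]

mean_auc, ci_lo, ci_hi = bootstrap_ci_mean(split_aucs)
print(f"{mean_auc:.1f} [{ci_lo:.1f}, {ci_hi:.1f}]")
```

The output follows the same `mean [lower, upper]` layout as the AUC cells in the table. Fixing the seed keeps the interval reproducible across runs.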