Table 1.
Comparison between the supervised CNN and the weakly supervised MIL-CNN model. For comparison, the patient-level performance of both models was evaluated on the 3-split dataset. Bold numbers indicate the highest scores among the MIL-CNN models. All values are the mean ± standard deviation on the test set across 5-fold cross-validation.
Method | Accuracy | Precision | Recall | F-score | AUC |
---|---|---|---|---|---|
MIL-1-split | **0.95±0.05** | **1.00±0.00** | **0.91±0.08** | **0.95±0.04** | **0.95±0.05** |
MIL-2-split | 0.93±0.05 | 0.97±0.03 | **0.91±0.08** | 0.94±0.05 | 0.92±0.06 |
MIL-3-split | **0.95±0.05** | **1.00±0.00** | **0.91±0.08** | **0.95±0.04** | **0.95±0.05** |
CNN-3-split | 0.77±0.09 | 0.90±0.09 | 0.71±0.13 | 0.77±0.11 | 0.79±0.08 |
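Each table entry summarizes five per-fold scores. The sketch below is a minimal, hypothetical illustration of how patient-level binary predictions from each held-out fold could be turned into the "mean ± std" entries above. It is not the authors' evaluation code. The caption does not state the decision threshold, the AUC estimator, or whether the standard deviation uses n or n-1. The sketch therefore assumes a 0.5 threshold on a per-patient score, the rank-based (Mann-Whitney) AUC, and the sample standard deviation (n-1). The function names and the toy data are illustrative only.

```python
from statistics import mean, stdev


def confusion(y_true, y_pred):
    """Count TP, FP, FN, TN for binary labels (1 = positive patient)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def auc(y_true, scores):
    """Rank-based AUC: chance a random positive outscores a random negative (ties = 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fold_metrics(y_true, scores, threshold=0.5):
    """Metrics for one test fold; threshold=0.5 is an assumption, not from the paper."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp, fp, fn, tn = confusion(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "Accuracy": (tp + tn) / len(y_true),
        "Precision": precision,
        "Recall": recall,
        "F-score": f_score,
        "AUC": auc(y_true, scores),
    }


def summarize(folds):
    """folds: list of (y_true, scores) pairs, one per held-out test fold.

    Returns {metric: (mean, sample std)} across folds.
    """
    per_fold = [fold_metrics(y, s) for y, s in folds]
    return {
        k: (mean([m[k] for m in per_fold]), stdev([m[k] for m in per_fold]))
        for k in per_fold[0]
    }


def format_row(name, summary):
    """Render one table row in the 'mean±std' style used in Table 1."""
    cells = " | ".join(f"{m:.2f}±{s:.2f}" for m, s in summary.values())
    return f"{name} | {cells} |"


if __name__ == "__main__":
    # Hypothetical toy folds (patient labels, per-patient scores); not the study's data.
    toy_folds = [
        ([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]),
        ([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]),
    ]
    print(format_row("toy-model", summarize(toy_folds)))
```

Averaging each metric over folds, rather than recomputing it from pooled predictions, is one reason a reported F-score need not equal the harmonic mean of the reported mean precision and mean recall. In the CNN-3-split row, for example, 2 × 0.90 × 0.71 / (0.90 + 0.71) ≈ 0.79, while the table reports 0.77.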