2023 Feb 21;13(4):1342–1354. doi: 10.7150/thno.81784

Table 1.

Comparison between the supervised CNN and the weakly supervised MIL-CNN models. For the comparison, patient-level model performance was evaluated on the 3-split dataset. Bold numbers are the highest scores for the MIL-CNN model. All values are means ± standard deviations on the test dataset under 5-fold cross-validation.

Method	Accuracy	Precision	Recall	F-score	AUC
MIL-1-split	0.95±0.05	1.00±0.00	0.91±0.08	0.95±0.04	0.95±0.05
MIL-2-split	0.93±0.05	0.97±0.03	0.91±0.08	0.94±0.05	0.92±0.06
MIL-3-split	0.95±0.05	1.00±0.00	0.91±0.08	0.95±0.04	0.95±0.05
CNN-3-split	0.77±0.09	0.90±0.09	0.71±0.13	0.77±0.11	0.79±0.08
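
As a rough illustration of how "mean ± standard deviation under 5-fold cross-validation" entries like those in Table 1 can be produced, the sketch below computes patient-level accuracy, precision, recall, and F-score per fold and then aggregates them across folds. It is not the authors' code: the function names `binary_metrics` and `summarize_folds` are hypothetical, the labels are assumed to be binary (1 = positive patient, 0 = negative), and the sample standard deviation is an assumption, since the source does not say which estimator was used. AUC is omitted because it needs predicted scores rather than hard labels.

```python
from statistics import mean, stdev


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F-score for 0/1 patient-level labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
    }


def summarize_folds(fold_results):
    """Aggregate per-fold metrics into (mean, std) pairs.

    fold_results: list of (y_true, y_pred) tuples, one per test fold.
    Assumption: sample standard deviation (n - 1 denominator).
    """
    per_fold = [binary_metrics(t, p) for t, p in fold_results]
    summary = {}
    for name in per_fold[0]:
        values = [m[name] for m in per_fold]
        summary[name] = (mean(values), stdev(values))
    return summary


if __name__ == "__main__":
    # Hypothetical toy folds, not data from the study.
    folds = [
        ([1, 1, 0, 0], [1, 0, 0, 0]),
        ([1, 0], [1, 0]),
    ]
    for name, (m, s) in summarize_folds(folds).items():
        print(f"{name}: {m:.2f}±{s:.2f}")
```

Because each metric is averaged over folds separately, the reported mean F-score need not equal the F-score computed from the mean precision and mean recall in the same row.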