Table 2.
Prediction regions on the baseline cases (Test set 1) for cancer detection and ISUP grading, n (%)
Conformal prediction regions for cancer detection

Confidence | Outcome | Benign (n = 440) | Cancer (n = 354) | All biopsies (n = 794)
---|---|---|---|---
99.9% | Error, n (%) | 0 (0%) | 1 (0%) | 1 (0%)
99.9% | Empty, n (%) | 0 (0%) | 0 (0%) | 0 (0%)
99.9% | Single predictions, n (%) | 315 (72%) | 303 (86%) | 618 (78%)
99.9% | Multiple predictions, n (%) | 125 (28%) | 50 (14%) | 175 (22%)

Conformal prediction regions for ISUP grading

Confidence | Outcome | ISUP 1 (n = 172) | ISUP 2 (n = 62) | ISUP 3 (n = 31) | ISUP 4 (n = 41) | ISUP 5 (n = 48) | All grades (n = 354)
---|---|---|---|---|---|---|---
67% | Error, n (%) | 49 (28%) | 20 (32%) | 7 (23%) | 11 (27%) | 11 (23%) | 98 (28%)
67% | Empty, n (%) | 5 (3%) | 2 (3%) | 0 (0%) | 2 (5%) | 1 (2%) | 10 (3%)
67% | Single predictions, n (%) | 114 (66%) | 20 (32%) | 7 (23%) | 12 (29%) | 21 (44%) | 174 (49%)
67% | Multiple predictions, n (%) | 4 (2%) | 20 (32%) | 17 (55%) | 16 (39%) | 15 (31%) | 72 (20%)
80% | Error, n (%) | 28 (16%) | 16 (26%) | 6 (19%) | 7 (17%) | 5 (10%) | 62 (18%)
80% | Empty, n (%) | 3 (2%) | 2 (3%) | 0 (0%) | 1 (2%) | 1 (2%) | 7 (2%)
80% | Single predictions, n (%) | 97 (56%) | 8 (13%) | 3 (10%) | 6 (15%) | 18 (38%) | 132 (37%)
80% | Multiple predictions, n (%) | 44 (26%) | 36 (58%) | 22 (71%) | 27 (66%) | 24 (50%) | 153 (43%)

Conformal prediction regions for ISUP grading: class-wise confidence levels (85% for ISUP 1, 67% for ISUP 2–5)

Outcome | ISUP 1 (n = 172) | ISUP 2 (n = 62) | ISUP 3 (n = 31) | ISUP 4 (n = 41) | ISUP 5 (n = 48) | All grades (n = 354)
---|---|---|---|---|---|---
Error, n (%) | 20 (12%) | 20 (32%) | 7 (23%) | 12 (29%) | 11 (23%) | 70 (20%)
Empty, n (%) | 2 (1%) | 2 (3%) | 0 (0%) | 1 (2%) | 1 (2%) | 6 (2%)
Single predictions, n (%) | 117 (68%) | 11 (18%) | 7 (23%) | 12 (29%) | 21 (44%) | 168 (47%)
Multiple predictions, n (%) | 33 (19%) | 29 (47%) | 17 (55%) | 16 (39%) | 15 (31%) | 110 (31%)

AI point predictions for cancer detection

Outcome | Benign (n = 440) | Cancer (n = 354) | All biopsies (n = 794)
---|---|---|---
Error, n (%) | 4 (1%) | 10 (3%) | 14 (2%)
Correct, n (%) | 436 (99%) | 344 (97%) | 780 (98%)

AI point predictions for ISUP grading

Outcome | ISUP 1 (n = 172) | ISUP 2 (n = 62) | ISUP 3 (n = 31) | ISUP 4 (n = 41) | ISUP 5 (n = 48) | All grades (n = 354)
---|---|---|---|---|---|---
Error, n (%) | 26 (15%) | 33 (53%) | 19 (61%) | 21 (51%) | 18 (38%) | 117 (33%)
Correct, n (%) | 146 (85%) | 29 (47%) | 12 (39%) | 20 (49%) | 30 (62%) | 237 (67%)

The results are presented both as prediction regions from the conformal predictor and as point predictions from the AI system without the conformal predictor. Cancer detection is reported at a confidence level of 99.9%, and ISUP grading is reported at 67% and 80% confidence levels, as well as with class-wise confidence levels (85% for ISUP 1 and 67% for ISUP 2–5). Labels are included in the prediction region if their confidence is higher than a user-specified desired confidence (e.g., 99.9%). The error is the fraction of true labels not included in the prediction region. A multi-label prediction means that the prediction is uncertain: the model cannot distinguish between several possible class labels at the desired confidence. Empty set predictions are examples where the model could not assign any label, typically meaning that the example was very different from the data the model was trained on. ISUP, International Society of Urological Pathology.
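
The outcome categories in the table can be illustrated with a small Python sketch. It is not the authors' implementation. It assumes the standard conformal rule that a label enters the prediction region when its p-value exceeds 1 minus the confidence level, and it assumes that the four outcome rows are mutually exclusive (in the table they sum to n in every column), so "error" here means a non-empty region that misses the true label. The function names and the p-values are hypothetical.

```python
from typing import Dict, Set, Union


def prediction_region(
    p_values: Dict[str, float],
    confidence: Union[float, Dict[str, float]],
) -> Set[str]:
    """Return the labels whose p-value exceeds 1 - confidence.

    `confidence` is either one level for all labels or a per-label
    mapping (class-wise confidence levels).
    """
    region = set()
    for label, p in p_values.items():
        level = confidence[label] if isinstance(confidence, dict) else confidence
        if p > 1.0 - level:
            region.add(label)
    return region


def categorize(region: Set[str], true_label: str) -> str:
    """Assign one of the table's outcome rows (assumed mutually exclusive)."""
    if not region:
        return "empty"
    if true_label not in region:
        return "error"
    return "single" if len(region) == 1 else "multiple"


# Hypothetical p-values for one biopsy whose true grade is ISUP 1.
p_values = {"ISUP 1": 0.25, "ISUP 2": 0.40, "ISUP 3": 0.10,
            "ISUP 4": 0.02, "ISUP 5": 0.01}

# Class-wise levels as in the table: 85% for ISUP 1, 67% for ISUP 2-5.
class_wise = {"ISUP 1": 0.85, "ISUP 2": 0.67, "ISUP 3": 0.67,
              "ISUP 4": 0.67, "ISUP 5": 0.67}

uniform_region = prediction_region(p_values, 0.67)
class_wise_region = prediction_region(p_values, class_wise)

print(sorted(uniform_region), categorize(uniform_region, "ISUP 1"))
# ['ISUP 2'] error
print(sorted(class_wise_region), categorize(class_wise_region, "ISUP 1"))
# ['ISUP 1', 'ISUP 2'] multiple
```

In this made-up case, raising the confidence level for ISUP 1 alone turns an error into a multiple prediction that contains the true grade. That matches the direction of the change in the ISUP 1 column between the 67% rows and the class-wise rows: fewer errors, more multiple predictions.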