2022 Dec 15;13:7761. doi: 10.1038/s41467-022-34945-8

Table 2.

Prediction regions on the baseline cases (Test set 1) for cancer detection and ISUP grading, n (%)

Confidence		Benign (n = 440)	Cancer (n = 354)	All biopsies (n = 794)
Conformal prediction regions for cancer detection
99.9%	Error, n (%)	0 (0%)	1 (<1%)	1 (<1%)
	Empty, n (%)	0 (0%)	0 (0%)	0 (0%)
	Single predictions, n (%)	315 (72%)	303 (86%)	618 (78%)
	Multiple predictions, n (%)	125 (28%)	50 (14%)	175 (22%)
Conformal prediction regions for ISUP grading
		ISUP 1 (n = 172)	ISUP 2 (n = 62)	ISUP 3 (n = 31)	ISUP 4 (n = 41)	ISUP 5 (n = 48)	All grades (n = 354)
67%	Error, n (%)	49 (28%)	20 (32%)	7 (23%)	11 (27%)	11 (23%)	98 (28%)
	Empty, n (%)	5 (3%)	2 (3%)	0 (0%)	2 (5%)	1 (2%)	10 (3%)
	Single predictions, n (%)	114 (66%)	20 (32%)	7 (23%)	12 (29%)	21 (44%)	174 (49%)
	Multiple predictions, n (%)	4 (2%)	20 (32%)	17 (55%)	16 (39%)	15 (31%)	72 (20%)
80%	Error, n (%)	28 (16%)	16 (26%)	6 (19%)	7 (17%)	5 (10%)	62 (18%)
	Empty, n (%)	3 (2%)	2 (3%)	0 (0%)	1 (2%)	1 (2%)	7 (2%)
	Single predictions, n (%)	97 (56%)	8 (13%)	3 (10%)	6 (15%)	18 (38%)	132 (37%)
	Multiple predictions, n (%)	44 (26%)	36 (58%)	22 (71%)	27 (66%)	24 (50%)	153 (43%)
Conformal prediction regions for ISUP grading: Class-wise confidence levels
Class-wise confidence levels		Confidence 85% for ISUP 1	Confidence 67% for ISUP 2–ISUP 5
		ISUP 1 (n = 172)	ISUP 2 (n = 62)	ISUP 3 (n = 31)	ISUP 4 (n = 41)	ISUP 5 (n = 48)	All grades (n = 354)
	Error, n (%)	20 (12%)	20 (32%)	7 (23%)	12 (29%)	11 (23%)	70 (20%)
	Empty, n (%)	2 (1%)	2 (3%)	0 (0%)	1 (2%)	1 (2%)	6 (2%)
	Single predictions, n (%)	117 (68%)	11 (18%)	7 (23%)	12 (29%)	21 (44%)	168 (47%)
	Multiple predictions, n (%)	33 (19%)	29 (47%)	17 (55%)	16 (39%)	15 (31%)	110 (31%)
AI point predictions for cancer detection
		Benign (n = 440)	Cancer (n = 354)	All biopsies (n = 794)
	Error, n (%)	4 (1%)	10 (3%)	14 (2%)
	Correct, n (%)	436 (99%)	344 (97%)	780 (98%)
AI point predictions for ISUP grading
		ISUP 1 (n = 172)	ISUP 2 (n = 62)	ISUP 3 (n = 31)	ISUP 4 (n = 41)	ISUP 5 (n = 48)	All grades (n = 354)
	Error, n (%)	26 (15%)	33 (53%)	19 (61%)	21 (51%)	18 (38%)	117 (33%)
	Correct, n (%)	146 (85%)	29 (47%)	12 (39%)	20 (49%)	30 (62%)	237 (67%)

The results are presented both as prediction regions produced by the conformal predictor and as point predictions produced by the AI system without the conformal predictor. Cancer detection is reported at a confidence level of 99.9%. ISUP grading is reported at confidence levels of 67% and 80%, and with class-wise confidence levels (85% for ISUP 1 and 67% for ISUP 2–5). A label is included in the prediction region if its conformal p-value exceeds the significance level, i.e., 1 minus the user-specified confidence (e.g., 0.001 at 99.9% confidence). Error is the number (and fraction) of cases whose true label is not included in the prediction region. A multiple-label prediction indicates uncertainty: the model cannot distinguish between several possible class labels at the desired confidence. An empty prediction region means the model could not assign any label, which typically indicates that the case was very different from the data the model was trained on. In each column, the Error, Empty, Single and Multiple counts sum to n. ISUP, International Society of Urological Pathology.
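The caption's rule for building prediction regions can be sketched as a class-conditional (Mondrian) inductive conformal predictor. This is a minimal Python sketch under stated assumptions, not the study's implementation: the class-conditional formulation, the nonconformity measure (1 minus the model's probability for a label), all function names and the toy calibration scores are hypothetical. Each label's p-value is compared with its own significance level, 1 − confidence, so confidence can differ per class, as in the class-wise rows (85% for ISUP 1, 67% for ISUP 2–5). The `outcome` function assumes the four row categories are mutually exclusive, which is consistent with every column of the table summing to n.

```python
from collections import Counter
from typing import Dict, List, Sequence, Set

Label = int  # hypothetical encoding, e.g. 0 = benign, 1 = cancer, or ISUP grades 1-5


def nonconformity(probs: Dict[Label, float], label: Label) -> float:
    """Assumed nonconformity measure: 1 minus the model's probability for `label`."""
    return 1.0 - probs[label]


def mondrian_p_values(cal_scores: Dict[Label, List[float]],
                      probs: Dict[Label, float]) -> Dict[Label, float]:
    """Class-conditional (Mondrian) conformal p-value for every candidate label.

    cal_scores[y] holds the nonconformity scores of calibration cases whose true label is y.
    """
    p_values = {}
    for y in probs:
        score = nonconformity(probs, y)
        cal = cal_scores[y]
        # Calibration cases at least as nonconforming as the test case, +1 for the test case itself.
        n_ge = sum(1 for a in cal if a >= score)
        p_values[y] = (n_ge + 1) / (len(cal) + 1)
    return p_values


def prediction_region(p_values: Dict[Label, float],
                      confidence: Dict[Label, float]) -> Set[Label]:
    """Keep label y if its p-value exceeds the significance level 1 - confidence[y]."""
    return {y for y, p in p_values.items() if p > 1.0 - confidence[y]}


def outcome(region: Set[Label], true_label: Label) -> str:
    """Assign one of the table's row categories (assumed mutually exclusive)."""
    if not region:
        return "empty"
    if true_label not in region:
        return "error"
    return "single" if len(region) == 1 else "multiple"


def tally(outcomes: Sequence[str]) -> Dict[str, str]:
    """Format counts as 'n (%)'; note that Python's round() rounds exact halves to even."""
    counts = Counter(outcomes)
    n = len(outcomes)
    return {k: f"{counts[k]} ({round(100 * counts[k] / n)}%)"
            for k in ("error", "empty", "single", "multiple")}


if __name__ == "__main__":
    # Toy calibration scores (made-up numbers, not from the study).
    cal = {0: [0.05, 0.10, 0.20, 0.40], 1: [0.02, 0.30, 0.50, 0.60]}
    p = mondrian_p_values(cal, {0: 0.7, 1: 0.3})  # p-values 0.4 for label 0, 0.2 for label 1
    print(prediction_region(p, {0: 0.67, 1: 0.67}))  # single-label region
    print(prediction_region(p, {0: 0.90, 1: 0.90}))  # both labels: uncertain
    print(prediction_region(p, {0: 0.67, 1: 0.85}))  # class-wise confidence levels
```

Raising the confidence lowers the significance level, so more labels pass the threshold and regions grow from single to multiple predictions, which matches the shift from Single to Multiple counts between the 67% and 80% blocks. Applied to the Benign column of the cancer-detection block, `tally(["single"] * 315 + ["multiple"] * 125)` reproduces the reported "315 (72%)" and "125 (28%)".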