2025 Oct 13;8:609. doi: 10.1038/s41746-025-01970-y

Table 3.

Classification performance of AI models in all datasets

| Performance | Training set | Validation set | Test 1 | Test 2 | Test 3 |
| --- | --- | --- | --- | --- | --- |
| Classification Task 1 | | | | | |
| Number | n = 999 | n = 250 | n = 173 | n = 289 | n = 124 |
| AUC (95% CI) | 0.946 (0.933, 0.959) | 0.942 (0.914, 0.970) | 0.923 (0.883, 0.962) | 0.855 (0.813, 0.897) | 0.877 (0.811, 0.941) |
| Sensitivity | 0.912 (371/407) | 0.784 (80/102) | 0.812 (56/69) | 0.810 (111/137) | 0.913 (21/23) |
| Specificity | 0.823 (487/592) | 0.960 (142/148) | 0.865 (90/104) | 0.717 (109/152) | 0.753 (76/101) |
| Accuracy | 0.859 (858/999) | 0.888 (222/250) | 0.844 (146/173) | 0.761 (220/289) | 0.782 (97/124) |
| PPV | 0.779 (371/476) | 0.930 (80/86) | 0.800 (56/70) | 0.721 (111/154) | 0.457 (21/46) |
| NPV | 0.931 (487/523) | 0.866 (142/164) | 0.874 (90/103) | 0.807 (109/135) | 0.974 (76/78) |
| Classification Task 2 | | | | | |
| Patients | n = 487 | n = 142 | n = 90 | n = 109 | n = 76 |
| AUC (95% CI) | 0.976 (0.965, 0.987) | 0.858 (0.777, 0.939) | 0.826 (0.735, 0.917) | 0.815 (0.689, 0.942) | 0.903 (0.835, 0.970) |
| Sensitivity | 0.905 (171/189) | 0.684 (26/38) | 0.879 (29/33) | 0.792 (19/24) | 0.692 (18/26) |
| Specificity | 0.936 (279/298) | 0.942 (98/104) | 0.719 (41/57) | 0.929 (79/85) | 0.960 (48/50) |
| Accuracy | 0.924 (450/487) | 0.873 (124/142) | 0.778 (70/90) | 0.899 (98/109) | 0.868 (66/76) |
| PPV | 0.900 (171/190) | 0.813 (26/32) | 0.644 (29/45) | 0.760 (19/25) | 0.900 (18/20) |
| NPV | 0.939 (279/297) | 0.891 (98/110) | 0.911 (41/45) | 0.941 (79/84) | 0.857 (48/56) |
| Classification Task 3 | | | | | |
| Number | n = 487 | n = 142 | n = 90 | n = 109 | n = 76 |
| AUC (95% CI) | 0.963 (0.935, 0.991) | 0.971 (0.945, 0.996) | 0.958 (0.887, 1.000) | 0.960 (0.919, 1.000) | 0.749 (0.494, 1.000) |
| Sensitivity | 0.965 (444/460) | 0.873 (110/126) | 0.951 (77/81) | 0.936 (87/93) | 0.562 (41/73) |
| Specificity | 0.852 (23/27) | 1.000 (16/16) | 0.889 (8/9) | 0.875 (14/16) | 1.000 (3/3) |
| Accuracy | 0.959 (467/487) | 0.887 (126/142) | 0.944 (85/90) | 0.927 (101/109) | 0.579 (44/76) |
| PPV | 0.991 (444/448) | 1.000 (110/110) | 0.987 (77/78) | 0.978 (87/89) | 1.000 (41/41) |
| NPV | 0.590 (23/39) | 0.500 (16/32) | 0.667 (8/12) | 0.700 (14/20) | 0.086 (3/35) |
| Classification Task 4 | | | | | |
| Number | n = 371 | n = 80 | n = 56 | n = 111 | n = 21 |
| AUC (95% CI) | 0.985 (0.977, 0.994) | 0.982 (0.959, 1.000) | 0.988 (0.970, 1.000) | 0.945 (0.898, 0.992) | 0.981 (0.936, 1.000) |
| Sensitivity | 0.943 (182/193) | 0.938 (45/48) | 0.938 (30/32) | 0.922 (47/51) | 0.923 (12/13) |
| Specificity | 0.955 (170/178) | 1.000 (32/32) | 0.958 (23/24) | 0.900 (54/60) | 1.000 (8/8) |
| Accuracy | 0.949 (352/371) | 0.963 (77/80) | 0.946 (53/56) | 0.910 (101/111) | 0.952 (20/21) |
| PPV | 0.958 (182/190) | 1.000 (45/45) | 0.968 (30/31) | 0.887 (47/53) | 1.000 (12/12) |
| NPV | 0.939 (170/181) | 0.914 (32/35) | 0.920 (23/25) | 0.931 (54/58) | 0.889 (8/9) |

Task 1: Classification of mucinous PCN (IPMN and MCN) versus non-mucinous PCN (SCN and SPN); Task 2: Classification of precancerous versus malignant pancreatic mucinous tumors; Task 3: Differentiation of pancreatic IPMN from MCN; and Task 4: Distinction of pancreatic SPN from SCN.

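AUC area under the receiver operating characteristic curve, CI confidence interval, PPV positive predictive value, NPV negative predictive value, PCN pancreatic cystic neoplasm, IPMN intraductal papillary mucinous neoplasm, MCN mucinous cystic neoplasm, SCN serous cystic neoplasm, SPN solid pseudopapillary neoplasm.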
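
The fractions reported in Table 3 can be cross-checked from the four cells of a 2 x 2 confusion matrix. The sketch below is illustrative only and is not the authors' code: the class name `ConfusionCounts` and the function `summarize` are hypothetical. The cell counts are inferred from the Task 1 training-set fractions (371/407 for sensitivity, 487/592 for specificity). The table does not state which class is treated as positive, and the arithmetic does not depend on it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Cells of a 2 x 2 confusion matrix (hypothetical helper type)."""
    tp: int  # positive cases predicted positive
    fn: int  # positive cases predicted negative
    tn: int  # negative cases predicted negative
    fp: int  # negative cases predicted positive


def summarize(c: ConfusionCounts) -> dict:
    """Return the five count-based metrics listed in the table."""
    total = c.tp + c.fn + c.tn + c.fp
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "accuracy": (c.tp + c.tn) / total,
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
    }


# Task 1, training set: cells inferred from the reported fractions
# (371/407 gives fn = 36; 487/592 gives fp = 105).
task1_training = ConfusionCounts(tp=371, fn=36, tn=487, fp=105)
metrics = summarize(task1_training)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts the denominators match the table: 371 + 105 = 476 for PPV, 487 + 36 = 523 for NPV, and 999 cases in total for accuracy.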