Table 3.
Classification performance of the AI models across all datasets
| Metric | Training set | Validation set | Test 1 | Test 2 | Test 3 |
|---|---|---|---|---|---|
| Classification Task 1 | | | | | |
| Number | n = 999 | n = 250 | n = 173 | n = 289 | n = 124 |
| AUC (95%CI) | 0.946(0.933,0.959) | 0.942(0.914,0.970) | 0.923(0.883,0.962) | 0.855(0.813,0.897) | 0.877(0.811,0.941) |
| Sensitivity | 0.912(371/407) | 0.784(80/102) | 0.812(56/69) | 0.810(111/137) | 0.913(21/23) |
| Specificity | 0.823(487/592) | 0.960(142/148) | 0.865(90/104) | 0.717(109/152) | 0.753(76/101) |
| Accuracy | 0.859(858/999) | 0.888(222/250) | 0.844(146/173) | 0.761(220/289) | 0.782(97/124) |
| PPV | 0.779(371/476) | 0.930(80/86) | 0.800(56/70) | 0.721(111/154) | 0.457(21/46) |
| NPV | 0.931(487/523) | 0.866(142/164) | 0.874(90/103) | 0.807(109/135) | 0.974(76/78) |
| Classification Task 2 | | | | | |
| Number | n = 487 | n = 142 | n = 90 | n = 109 | n = 76 |
| AUC (95%CI) | 0.976(0.965,0.987) | 0.858(0.777,0.939) | 0.826(0.735,0.917) | 0.815(0.689,0.942) | 0.903(0.835,0.970) |
| Sensitivity | 0.905(171/189) | 0.684(26/38) | 0.879(29/33) | 0.792(19/24) | 0.692(18/26) |
| Specificity | 0.936(279/298) | 0.942(98/104) | 0.719(41/57) | 0.929(79/85) | 0.960(48/50) |
| Accuracy | 0.924(450/487) | 0.873(124/142) | 0.778(70/90) | 0.899(98/109) | 0.868(66/76) |
| PPV | 0.900(171/190) | 0.813(26/32) | 0.644(29/45) | 0.760(19/25) | 0.900(18/20) |
| NPV | 0.939(279/297) | 0.891(98/110) | 0.911(41/45) | 0.941(79/84) | 0.857(48/56) |
| Classification Task 3 | | | | | |
| Number | n = 487 | n = 142 | n = 90 | n = 109 | n = 76 |
| AUC (95%CI) | 0.963(0.935,0.991) | 0.971(0.945,0.996) | 0.958(0.887,1.000) | 0.960(0.919,1.000) | 0.749(0.494,1.000) |
| Sensitivity | 0.965(444/460) | 0.873(110/126) | 0.951(77/81) | 0.936(87/93) | 0.562(41/73) |
| Specificity | 0.852(23/27) | 1.000(16/16) | 0.889(8/9) | 0.875(14/16) | 1.000(3/3) |
| Accuracy | 0.959(467/487) | 0.887(126/142) | 0.944(85/90) | 0.927(101/109) | 0.579(44/76) |
| PPV | 0.991(444/448) | 1.000(110/110) | 0.987(77/78) | 0.978(87/89) | 1.000(41/41) |
| NPV | 0.590(23/39) | 0.500(16/32) | 0.667(8/12) | 0.700(14/20) | 0.086(3/35) |
| Classification Task 4 | | | | | |
| Number | n = 371 | n = 80 | n = 56 | n = 111 | n = 21 |
| AUC (95%CI) | 0.985(0.977,0.994) | 0.982(0.959,1.000) | 0.988(0.970,1.000) | 0.945(0.898,0.992) | 0.981(0.936,1.000) |
| Sensitivity | 0.943(182/193) | 0.938(45/48) | 0.938(30/32) | 0.922(47/51) | 0.923(12/13) |
| Specificity | 0.955(170/178) | 1.000(32/32) | 0.958(23/24) | 0.900(54/60) | 1.000(8/8) |
| Accuracy | 0.949(352/371) | 0.963(77/80) | 0.946(53/56) | 0.910(101/111) | 0.952(20/21) |
| PPV | 0.958(182/190) | 1.000(45/45) | 0.968(30/31) | 0.887(47/53) | 1.000(12/12) |
| NPV | 0.939(170/181) | 0.914(32/35) | 0.920(23/25) | 0.931(54/58) | 0.889(8/9) |
Task 1: Classification of mucinous PCN (IPMN and MCN) versus non-mucinous PCN (SCN and SPN); Task 2: Classification of precancerous versus malignant pancreatic mucinous tumors; Task 3: Differentiation of pancreatic IPMN from MCN; and Task 4: Distinction of pancreatic SPN from SCN.
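AUC: area under the receiver operating characteristic curve; CI: confidence interval; PPV: positive predictive value; NPV: negative predictive value; PCN: pancreatic cystic neoplasm; IPMN: intraductal papillary mucinous neoplasm; MCN: mucinous cystic neoplasm; SCN: serous cystic neoplasm; SPN: solid pseudopapillary neoplasm.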
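Each count-based cell in the table has the form value(numerator/denominator), so the five metrics in a column can be recomputed from a single 2×2 confusion matrix. The following is a minimal sketch using the standard definitions, not the authors' code. The function names `confusion_metrics` and `format_cell` are hypothetical, and the counts are back-calculated from the Task 1, Test 1 column (TP = 56, FN = 13, TN = 90, FP = 14).

```python
def confusion_metrics(tp, fn, tn, fp):
    """Return each metric as a (numerator, denominator) pair.

    Standard definitions, with tp/fn/tn/fp taken from a 2x2 confusion matrix.
    """
    return {
        "Sensitivity": (tp, tp + fn),               # TP / all actual positives
        "Specificity": (tn, tn + fp),               # TN / all actual negatives
        "Accuracy": (tp + tn, tp + fn + tn + fp),   # correct / all cases
        "PPV": (tp, tp + fp),                       # TP / all predicted positives
        "NPV": (tn, tn + fn),                       # TN / all predicted negatives
    }


def format_cell(num, den):
    """Format a ratio in the table's style, e.g. 0.812(56/69)."""
    return f"{num / den:.3f}({num}/{den})"


# Counts inferred from the Task 1, Test 1 column (an assumption of this sketch).
test1_task1 = confusion_metrics(tp=56, fn=13, tn=90, fp=14)

for name, (num, den) in test1_task1.items():
    print(f"{name}: {format_cell(num, den)}")
```

Running the sketch reproduces the Test 1 entries for Task 1: the four counts fix all five denominators (69, 104, 173, 70, 103), which is a quick way to check any column for internal consistency.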