Table 1.
Comparison of diagnostic performance between the deep learning system and radiologists
| | Accuracy | Sensitivity | Specificity | Positive predictive value | Negative predictive value | AUC | Time for training | Time for testing |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| AI | | | | | | | | |
| AlexNet | 87.4 | 77.3 | 97.1 | 96.3 | 81.8 | 0.87 | 51 min | 9 sec |
| GoogleNet | 85.3 | 74.2 | 95.9 | 94.7 | 80.0 | 0.85 | 3 hr | 11 sec |
| Radiologists | 81.2 | 80.2 | 82.0 | 78.7 | 83.4 | 0.74 | – | – |
| p values | | | | | | | | |
| AI (AlexNet) vs AI (GoogleNet) | 0.3991 | 0.5296 | 0.3390 | 0.1412 | 0.4647 | 0.0511 | | |
| AI (AlexNet) vs Radiologists | 0.0528 | 0.4386 | 0.0445 a | 0.0528 | 0.4386 | <0.0001 b | | |
| AI (GoogleNet) vs Radiologists | 0.2453 | 0.4386 | 0.0424 a | 0.0507 | 0.4386 | 0.0003 b | | |
a: p<0.05 (Mann–Whitney U test).
b: p<0.01 (χ² test).