Table 2.
Performance of Deep Learning–based Automatic Detection Algorithm in the 6 External Validation Datasets
| Performance Measures | Seoul National University Hospital Dataset | Boramae Medical Center Dataset | Kyunghee University Hospital at Gangdong Dataset | Daejeon Eulji Medical Center Dataset | Montgomery Dataset | Shenzhen Dataset |
|---|---|---|---|---|---|---|
| Area under the receiver operating characteristic curve | 0.993 (0.984–1.002) | 0.979 (0.954–1.005) | 1.000 (0.999–1.000) | 1.000 (0.999–1.000) | 0.996 (0.991–1.001) | 0.977 (0.967–0.987) |
| Area under the alternative free-response receiver operating characteristic curve | 0.993 (0.983–1.003) | 0.981 (0.960–1.001) | 0.994 (0.987–1.001) | 1.000 (0.999–1.000) | 0.996 (0.990–1.002) | 0.973 (0.963–0.984) |
| Sensitivity<sub>SEN</sub><sup>a</sup> | 0.952 (0.881–0.987) | 0.943 (0.860–0.984) | 1.000 (0.965–1.000) | 1.000 (0.949–1.000) | 1.000 (0.932–1.000) | 0.947 (0.916–0.969) |
| Specificity<sub>SEN</sub><sup>a</sup> | 1.000 (0.964–1.000) | 0.957 (0.880–0.991) | 0.914 (0.823–0.968) | 0.980 (0.930–0.998) | 0.938 (0.860–0.979) | 0.911 (0.875–0.940) |
| True detection rate<sub>SEN</sub><sup>a</sup> | 0.962 (0.914–0.988) | 0.945 (0.894–0.976) | 1.000 (0.981–1.000) | 1.000 (0.984–1.000) | 1.000 (0.956–1.000) | 0.953 (0.931–0.970) |
| Sensitivity<sub>SPE</sub><sup>a</sup> | 0.843 (0.747–0.914) | 0.900 (0.805–0.959) | 0.990 (0.947–1.000) | 0.986 (0.923–1.000) | 0.846 (0.719–0.931) | 0.841 (0.796–0.879) |
| Specificity<sub>SPE</sub><sup>a</sup> | 1.000 (0.964–1.000) | 1.000 (0.949–1.000) | 1.000 (0.949–1.000) | 1.000 (0.964–1.000) | 1.000 (0.955–1.000) | 0.991 (0.973–0.998) |
| True detection rate<sub>SPE</sub><sup>a</sup> | 0.750 (0.667–0.821) | 0.759 (0.681–0.826) | 0.806 (0.743–0.860) | 0.719 (0.656–0.776) | 0.719 (0.609–0.813) | 0.771 (0.731–0.807) |

<sup>a</sup> Subscript SEN indicates the high-sensitivity cutoff; subscript SPE indicates the high-specificity cutoff.
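
To illustrate how a single set of algorithm outputs yields different sensitivity and specificity values at the two cutoffs, the following is a minimal sketch. The scores, labels, and both threshold values are hypothetical and chosen only for illustration; they are not the study's data or its actual cutoffs, and the function name is likewise an assumption.

```python
def sensitivity_specificity(scores, labels, cutoff):
    """Return (sensitivity, specificity), calling a case positive when score >= cutoff."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y == 1 and s >= cutoff)  # diseased, flagged
    fn = sum(1 for s, y in pairs if y == 1 and s < cutoff)   # diseased, missed
    tn = sum(1 for s, y in pairs if y == 0 and s < cutoff)   # healthy, cleared
    fp = sum(1 for s, y in pairs if y == 0 and s >= cutoff)  # healthy, flagged
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical per-image probability scores and reference labels (1 = disease present).
scores = [0.95, 0.80, 0.40, 0.10, 0.60, 0.30, 0.05, 0.02]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Hypothetical cutoffs: a low threshold favors sensitivity, a high one favors specificity.
HIGH_SENSITIVITY_CUTOFF = 0.08
HIGH_SPECIFICITY_CUTOFF = 0.70

sen_at_sen, spe_at_sen = sensitivity_specificity(scores, labels, HIGH_SENSITIVITY_CUTOFF)
sen_at_spe, spe_at_spe = sensitivity_specificity(scores, labels, HIGH_SPECIFICITY_CUTOFF)

print(f"High-sensitivity cutoff: sensitivity={sen_at_sen:.3f}, specificity={spe_at_sen:.3f}")
print(f"High-specificity cutoff: sensitivity={sen_at_spe:.3f}, specificity={spe_at_spe:.3f}")
```

In this toy example the lower cutoff detects every positive case at the cost of more false positives, whereas the higher cutoff removes the false positives but misses some positive cases, the same trade-off visible between the SEN and SPE rows of the table.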