Table 1.
Model performance (measured by AUC) for two different datasets, architectures and tasks when the data with the highest uncertainty levels are referred for further inspection.
Dataset | Architecture | Task | 100% data AUC | 90% data AUC | 80% data AUC | 70% data AUC |
---|---|---|---|---|---|---|
Kaggle DR | Bayes. CNN | (0) vs (1, 2, 3, 4) | 0.889 (CI: 0.885–0.892) | 0.898 (CI: 0.894–0.902) | 0.908 (CI: 0.904–0.912) | 0.918 (CI: 0.914–0.922) |
Kaggle DR | Bayes. CNN | (0, 1) vs (2, 3, 4) | 0.927 (CI: 0.924–0.930) | 0.938 (CI: 0.935–0.941) | 0.947 (CI: 0.944–0.950) | 0.956 (CI: 0.953–0.959) |
Messidor | Bayes. CNN | (0) vs (1, 2, 3) | 0.936 (CI: 0.922–0.949) | 0.948 (CI: 0.935–0.960) | 0.956 (CI: 0.943–0.968) | 0.968 (CI: 0.956–0.978) |
Messidor | Bayes. CNN | (0, 1) vs (2, 3) | 0.955 (CI: 0.943–0.967) | 0.965 (CI: 0.953–0.975) | 0.973 (CI: 0.962–0.983) | 0.978 (CI: 0.967–0.988) |
Kaggle DR | JFnet | (0) vs (1, 2, 3, 4) | 0.911 (CI: 0.908–0.914) | 0.918 (CI: 0.914–0.921) | 0.925 (CI: 0.921–0.929) | 0.932 (CI: 0.928–0.935) |
Kaggle DR | JFnet | (0, 1) vs (2, 3, 4) | 0.947 (CI: 0.944–0.950) | 0.953 (CI: 0.949–0.956) | 0.954 (CI: 0.951–0.958) | 0.956 (CI: 0.952–0.960) |
Messidor | Single best [43] | (0) vs (1, 2, 3) | 0.936 | — | — | — |
Messidor | Ensemble [43] | (0) vs (1, 2, 3) | 0.989 | — | — | — |
Messidor-2* | CNN [14] | (0, 1) vs (≥ 2) | 0.990 (CI: 0.986–0.995) | — | — | — |
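
The referral procedure behind the 90%, 80% and 70% columns can be sketched roughly as follows: rank the cases by their uncertainty estimate, set aside the most uncertain ones for further inspection, and compute AUC on the remainder. This is a minimal illustration only. The function names `auc` and `auc_after_referral`, the per-case uncertainty values, and the toy data are all assumptions made for this sketch and are not taken from the table or from the models it reports.

```python
def auc(labels, scores):
    """ROC AUC as the probability that a random positive case
    receives a higher score than a random negative case (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_after_referral(labels, scores, uncertainties, retained_fraction):
    """Keep the `retained_fraction` of cases with the lowest uncertainty
    (the rest are treated as referred) and compute AUC on the kept cases."""
    order = sorted(range(len(labels)), key=lambda i: uncertainties[i])
    n_keep = max(1, round(retained_fraction * len(labels)))
    kept = order[:n_keep]
    return auc([labels[i] for i in kept], [scores[i] for i in kept])


# Hypothetical toy data: 1 = diseased, 0 = healthy.
labels        = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
scores        = [0.1, 0.2, 0.3, 0.6, 0.4, 0.9, 0.8, 0.7, 0.35, 0.55]
uncertainties = [0.01, 0.02, 0.05, 0.30, 0.04, 0.01, 0.03, 0.05, 0.25, 0.06]

for fraction in (1.0, 0.9, 0.8):
    value = auc_after_referral(labels, scores, uncertainties, fraction)
    print(f"{fraction:.0%} data retained: AUC = {value:.3f}")
```

In this toy example the two misranked cases are also the most uncertain ones, so AUC rises as they are referred, mirroring the trend across the columns of the table. Whether that happens on real data depends on how well the uncertainty estimate tracks prediction errors.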