Table 3.
Results of the prostate cancer classification experiment on digital pathology images, using different methods. The highest accuracy in each classification task (column) is shown in bold.
| Method | Cancerous vs. benign: accuracy | Cancerous vs. benign: AUC | High-grade vs. low-grade: accuracy | High-grade vs. low-grade: AUC | Percentage of large classification errors |
|---|---|---|---|---|---|
| Single pathologist | 0.80 | 0.78 | 0.65 | 0.61 | 0.07 |
| Majority vote | 0.86 | 0.87 | 0.73 | 0.74 | 0.03 |
| STAPLE | 0.84 | 0.86 | 0.73 | 0.72 | 0.03 |
| STAPLE + iMAE loss | **0.93** | 0.91 | 0.76 | 0.79 | 0.03 |
| Minimum-loss label | 0.88 | 0.88 | **0.80** | **0.82** | 0.03 |
| Annotator confusion estimation | 0.92 | **0.93** | **0.80** | **0.82** | 0.01 |
| STAPLE (3–3) | 0.86 | 0.86 | 0.69 | 0.70 | 0.02 |
| STAPLE + iMAE loss (3–3) | 0.90 | 0.88 | 0.75 | 0.78 | 0.02 |
| Annotator confusion estimation (3–3) | 0.90 | 0.88 | 0.73 | 0.76 | 0.03 |