Table 2.
Quantitative performance evaluation of both networks and the two model ensembles. For better comparability, the segmentation network’s predictions were converted to tile-level predictions beforehand. All performance metrics were computed on the full test dataset, except for the ensemble logistic regression, for which mean values were obtained using iterated 2-fold cross-validation.
Model | Accuracy | AUC | Sensitivity | Specificity |
---|---|---|---|---|
Classification network | 89.9% | 0.963 | 89.8% | 90.0% |
Segmentation network | 85.9% | 0.921 | 93.6% | 85.4% |
Ensemble averaging | 87.1% | 0.959 | 86.4% | 87.8% |
Ensemble logistic regression | 89.7% | 0.960 | 91.1% | 90.0% |
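
As a rough illustration of how the tile-level metrics and the averaging ensemble in Table 2 could be computed, the following minimal sketch averages two per-tile probability lists and derives accuracy, sensitivity, and specificity from them. The function names (`ensemble_average`, `tile_metrics`), the equal weighting of both networks, and the decision threshold of 0.5 are assumptions made for illustration; they are not taken from the study.

```python
from typing import Dict, List, Sequence


def ensemble_average(p_cls: Sequence[float], p_seg: Sequence[float]) -> List[float]:
    """Average two per-tile probability lists element-wise.

    Equal weights for both networks are an assumption of this sketch.
    """
    if len(p_cls) != len(p_seg):
        raise ValueError("both models must score the same tiles")
    return [(a + b) / 2.0 for a, b in zip(p_cls, p_seg)]


def tile_metrics(
    y_true: Sequence[int], p_pos: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Compute accuracy, sensitivity, and specificity for binary tile labels.

    The threshold of 0.5 is an assumed default, not a value from the study.
    """
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, p_pos):
        pred = 1 if p >= threshold else 0
        if y == 1 and pred == 1:
            tp += 1
        elif y == 1 and pred == 0:
            fn += 1
        elif y == 0 and pred == 0:
            tn += 1
        else:
            fp += 1

    def ratio(num: int, den: int) -> float:
        # Return NaN instead of raising when a class is absent.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }


# Hypothetical toy data: four tiles, two positive and two negative.
labels = [1, 1, 0, 0]
p_classification = [0.9, 0.4, 0.2, 0.6]
p_segmentation = [0.7, 0.8, 0.1, 0.2]

single = tile_metrics(labels, p_classification)
averaged = tile_metrics(labels, ensemble_average(p_classification, p_segmentation))
```

In this toy example the averaged probabilities correct the two tiles that the classification scores alone misclassify. A logistic-regression ensemble would instead fit weights for the two probabilities on held-out folds, which is why its metrics in Table 2 are reported as cross-validated means.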