2023 Nov 3;18:121. doi: 10.1186/s13000-023-01407-8

Table 2.

Quantitative performance evaluation of both networks and the two model ensembles. To allow a like-for-like comparison, the segmentation network's predictions were first converted to tile-level predictions. All performance metrics were computed on the full test dataset, except for ensemble logistic regression, for which mean values were obtained using iterated 2-fold cross-validation.

Model	Accuracy	AUC	Sensitivity	Specificity
Classification network	89.9%	0.963	89.8%	90.0%
Segmentation network	85.9%	0.921	93.6%	85.4%
Ensemble averaging	87.1%	0.959	86.4%	87.8%
Ensemble logistic regression	89.7%	0.960	91.1%	90.0%
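
To make the reported quantities concrete, the following is a minimal sketch of how tile-level accuracy, AUC, sensitivity, and specificity can be computed, and of how an averaging ensemble can be formed from two models' tile-level probabilities. It is an illustration under stated assumptions, not the authors' implementation: the function names, the 0.5 decision threshold, the unweighted mean, and the toy probabilities are all hypothetical, and the excerpt does not specify how the segmentation output was reduced to tile level.

```python
from typing import Sequence


def average_ensemble(p_a: Sequence[float], p_b: Sequence[float]) -> list[float]:
    """Unweighted mean of two models' per-tile probabilities (assumed scheme)."""
    if len(p_a) != len(p_b):
        raise ValueError("both models must score the same tiles")
    return [(a + b) / 2 for a, b in zip(p_a, p_b)]


def tile_metrics(probs: Sequence[float], labels: Sequence[int],
                 threshold: float = 0.5) -> dict[str, float]:
    """Accuracy, sensitivity and specificity at an assumed threshold."""
    tp = fp = tn = fn = 0
    for p, y in zip(probs, labels):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    nan = float("nan")
    return {
        "accuracy": (tp + tn) / len(labels) if labels else nan,
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,  # true positive rate
        "specificity": tn / (tn + fp) if (tn + fp) else nan,  # true negative rate
    }


def auc(probs: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a positive tile outscores a negative one.

    Ties count as half a win (Mann-Whitney formulation).
    """
    pos = [p for p, y in zip(probs, labels) if y == 1]
    neg = [p for p, y in zip(probs, labels) if y == 0]
    if not pos or not neg:
        return float("nan")
    wins = sum(1.0 if pp > pn else 0.5 if pp == pn else 0.0
               for pp in pos for pn in neg)
    return wins / (len(pos) * len(neg))


# Toy data only: six tiles, 1 = positive tile, 0 = negative tile.
labels = [1, 1, 1, 0, 0, 0]
cls_probs = [0.9, 0.4, 0.8, 0.2, 0.6, 0.1]  # hypothetical classification output
seg_probs = [0.7, 0.8, 0.6, 0.3, 0.2, 0.4]  # hypothetical tile-level segmentation output
avg_probs = average_ensemble(cls_probs, seg_probs)

for name, probs in [("classification", cls_probs),
                    ("segmentation", seg_probs),
                    ("ensemble averaging", avg_probs)]:
    print(name, tile_metrics(probs, labels), "AUC:", auc(probs, labels))
```

The AUC is threshold-independent, whereas accuracy, sensitivity, and specificity depend on the chosen cut-off. This is why a model can rank second on AUC while trading sensitivity against specificity differently from its neighbours in the table.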