Table 7.
Testing Performance on Dataset B, Based on the Pathology Classifiers Trained on Dataset A
Performance on Dataset B | NM | MH | ME | AMD |
---|---|---|---|---|
AUC | 0.978 | 0.969 | 0.941 | 0.975 |
Best balanced accuracy, % | 95.5 | 97.3 | 90.5 | 95.2 |
The ground truth for this experiment was defined by the consensus of the two experts for both datasets. The consensus covers 96.9%, 95.4%, 88.0%, and 90.5% of the 326 scans in dataset A, and 94.7%, 100%, 90.0%, and 84.7% of the 131 scans in dataset B, for NM, MH, ME, and AMD, respectively. The number of positive cases versus total cases is shown.
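The two metrics reported in the table can be illustrated with a short sketch. This excerpt does not state how "best balanced accuracy" was computed. The code below assumes it is the maximum, over all decision thresholds, of the mean of sensitivity and specificity, and it computes AUC through the rank-based (Mann-Whitney) formulation. The function names and the toy labels and scores are hypothetical and are not taken from the study.

```python
import numpy as np


def auc_score(y_true, scores):
    """AUC as the probability that a positive case outscores a negative one.

    Ties between a positive and a negative score count as 0.5.
    """
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true]
    neg = scores[~y_true]
    # Compare every positive score against every negative score.
    diff = pos[:, None] - neg[None, :]
    wins = np.sum(diff > 0) + 0.5 * np.sum(diff == 0)
    return float(wins / (pos.size * neg.size))


def best_balanced_accuracy(y_true, scores):
    """Assumed definition: max over thresholds of (sensitivity + specificity) / 2.

    Returns the best value and the first threshold that attains it.
    """
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    best, best_thr = 0.0, None
    # Each distinct score is a candidate threshold; predict positive if score >= thr.
    for thr in np.unique(scores):
        pred = scores >= thr
        sensitivity = np.mean(pred[y_true])
        specificity = np.mean(~pred[~y_true])
        balanced = 0.5 * (sensitivity + specificity)
        if balanced > best:
            best, best_thr = float(balanced), float(thr)
    return best, best_thr


# Hypothetical toy data: consensus labels (1 = pathology present) and classifier scores.
labels = [0, 0, 1, 1]
outputs = [0.1, 0.4, 0.35, 0.8]
print(auc_score(labels, outputs))
print(best_balanced_accuracy(labels, outputs))
```

Because the threshold is chosen on the same scans that are scored, a "best" balanced accuracy defined this way is an optimistic summary of performance at a single operating point, whereas AUC does not depend on any threshold.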