2011 Oct 21;52(11):8316–8322. doi: 10.1167/iovs.10-7012

Table 7.

Testing Performance on Dataset B, Based on the Pathology Classifiers Trained on Dataset A

Performance on Dataset B	NM	MH	ME	AMD
AUC	0.978	0.969	0.941	0.975
Best balanced accuracy, %	95.5	97.3	90.5	95.2

The ground truth for this experiment was defined by the consensus from the two experts for both datasets. The consensus includes 96.9%, 95.4%, 88.0%, and 90.5% of 326 scans from dataset A, and 94.7%, 100%, 90.0%, and 84.7% of 131 scans from dataset B, for NM, MH, ME, and AMD, respectively. The number of positive cases versus total cases is shown.
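
For readers who want to reproduce metrics of the kind reported in Table 7, the following is a minimal sketch of how an AUC and a best balanced accuracy can be computed from classifier scores and consensus labels. It rests on assumptions not stated in the table: that each classifier outputs a real-valued score per scan, that labels are coded 1 (pathology present) and 0 (absent), that scans without expert consensus have already been excluded, and that "best balanced accuracy" means the maximum of (sensitivity + specificity) / 2 over all decision thresholds. The function names `roc_auc` and `best_balanced_accuracy` are hypothetical and are not taken from the paper.

```python
from typing import Sequence, Tuple


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case scores higher; tied pairs count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def best_balanced_accuracy(
    scores: Sequence[float], labels: Sequence[int]
) -> Tuple[float, float]:
    """Return (best balanced accuracy, threshold).

    A scan is predicted positive when score >= threshold. Every distinct
    score is tried as a threshold (assumed definition of "best").
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos  # assumes labels are only 0 or 1
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative case")
    best_value, best_threshold = -1.0, float("nan")
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t)
        sensitivity = tp / n_pos
        specificity = tn / n_neg
        balanced = 0.5 * (sensitivity + specificity)
        if balanced > best_value:  # keep the lowest threshold on ties
            best_value, best_threshold = balanced, t
    return best_value, best_threshold
```

Under these assumptions, each pathology column (NM, MH, ME, AMD) would be evaluated separately, passing the scores on dataset B from the classifier trained on dataset A together with the consensus labels for that pathology. Multiplying the returned balanced accuracy by 100 gives a percentage comparable in form to the second row of the table.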