2019 Sep 18;2(4):528–537. doi: 10.1093/jamiaopen/ooz040

Table 3.

Performance of distant labels and classification models using 146 manually reviewed gold standard patients

		Performance measurements^a (95% confidence interval)
		Area under the curve (AUC)	Sensitivity	Specificity	PPV	NPV	F-1 score	Accuracy
Distant labels		NA	0.889 (0.818, 0.957)	0.797 (0.700, 0.889)	0.810 (0.723, 0.899)	0.881 (0.797, 0.952)	0.848 (0.783, 0.908)	0.842 (0.781, 0.904)
Classifier	A (CCR)	0.789 (0.716, 0.861)	0.542 (0.423, 0.662)	0.824 (0.727, 0.899)	0.750 (0.633, 0.863)	0.649 (0.543, 0.744)	0.629 (0.521, 0.726)	0.685 (0.603, 0.760)
	B (NLP)	0.917 (0.868, 0.966)	0.861 (0.778, 0.933)	0.878 (0.800, 0.944)	0.873 (0.794, 0.943)	0.867 (0.783, 0.936)	0.867 (0.800, 0.925)	0.870 (0.815, 0.925)
	C (NLP + CCR)	0.925 (0.880, 0.969)	0.861 (0.778, 0.933)	0.878 (0.800, 0.944)	0.873 (0.794, 0.943)	0.867 (0.783, 0.936)	0.867 (0.800, 0.925)	0.870 (0.815, 0.925)

Note that positive predictive value (PPV), negative predictive value (NPV), F-1 score, and overall accuracy are highly dependent on the prevalence of the condition, which in our gold standard set is 72/146 = 0.49. The actual prevalence of recurrent metastatic breast cancer in our study population is likely to be much lower. However, sensitivity, specificity, and area under the curve (AUC) are intrinsic properties of the classifier and are insensitive to the prevalence of cases.³²,³³
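
The prevalence dependence described in the note can be illustrated with a minimal sketch that applies Bayes' rule to a fixed sensitivity and specificity. The sensitivity and specificity below are the point estimates reported for Classifier B (NLP) in Table 3; the function name `ppv_npv` and the lower prevalence values (0.10 and 0.05) are hypothetical choices made only for illustration and are not estimates from the study.

```python
def ppv_npv(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) implied by Bayes' rule for a given prevalence."""
    # Expected fraction of the population in each confusion-matrix cell
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Point estimates for Classifier B (NLP) from Table 3
SENSITIVITY = 0.861
SPECIFICITY = 0.878

# 72/146 is the gold standard prevalence from the note;
# 0.10 and 0.05 are hypothetical lower prevalences (assumptions).
for prevalence in (72 / 146, 0.10, 0.05):
    ppv, npv = ppv_npv(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"prevalence={prevalence:.3f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

At the gold standard prevalence of 72/146, the sketch gives a PPV of approximately 0.873 and an NPV of approximately 0.867, matching the values reported for Classifier B. As the assumed prevalence falls, PPV drops and NPV rises, while sensitivity and specificity stay fixed by construction.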