Editor:
The May 2010 Radiology article by Dr Carney and colleagues (1) piqued our interest. The authors set cut points to identify underperforming radiologists who might benefit from additional training. These cut points include sensitivity less than 75%; specificity less than 88% or greater than 95%; recall rate (RR) less than 5% or greater than 12%; positive predictive value (PPV) less than 3% or greater than 8%; and cancer detection rate (CDR) less than 2.5 per 1000 interpretations.
It is difficult to see how these criteria could be used with confidence. The suggested cut points contain internal inconsistencies, because PPV is mathematically derived from the other two measures (PPV = CDR/RR). Given the suggested minimum CDR (2.5 per 1000 interpretations) and the suggested maximum RR (12%), the lower bound for PPV would be 2.1%, which falls outside the authors’ acceptable range. Similarly, the authors’ minimum acceptable RR (5%) combined with their maximum acceptable PPV (8%) implies a CDR of 4.0 per 1000 interpretations.
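For concreteness, the two boundary computations can be written out; this is a worked restatement using only the figures above, not new data:

```latex
\[
\mathrm{PPV} = \frac{\mathrm{CDR}}{\mathrm{RR}}
\quad\Longrightarrow\quad
\mathrm{PPV} = \frac{2.5/1000}{0.12} \approx 2.1\%,
\]
\[
\mathrm{CDR} = \mathrm{PPV} \times \mathrm{RR}
= 0.08 \times 0.05 = 0.004
= 4.0 \text{ per } 1000 \text{ interpretations}.
\]
```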
A radiologist with a CDR of 5.0 per 1000 interpretations and an RR of 6% would have a PPV of 8.3%, which exceeds the authors’ upper bound. It is hard to see how additional training would benefit this radiologist. Otten et al (2) found that once RR rises above 5%, CDR levels off, resulting in a disproportionate and undesirable rise in false-positive findings.
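The PPV for this hypothetical radiologist follows directly from the same identity:

```latex
\[
\mathrm{PPV} = \frac{5.0/1000}{0.06} \approx 8.3\% > 8\%.
\]
```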
The authors note that certain combinations of outcomes will yield an RR below the lower bound, though these would not be problematic. This is difficult to reconcile, since a high CDR combined with a low RR always produces a high PPV, which the same criteria would flag (see the worked case below). The authors implicitly acknowledge this yet provide no concrete solutions. Thoughtful approaches to assessing the interrelationships among CDR, RR, and PPV have been published elsewhere (3).
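A boundary case illustrates the point; the figures are hypothetical, chosen to sit at the authors’ own limits. A radiologist with a CDR of 5.0 per 1000 interpretations and an RR of 4% (below the 5% lower bound) necessarily exceeds the PPV ceiling:

```latex
\[
\mathrm{PPV} = \frac{5.0/1000}{0.04} = 12.5\% > 8\%.
\]
```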
Some of the normative data in the study come from radiologists who had interpreted only 100 screening or diagnostic mammograms. Since certification requires reading 960 screening mammograms every 2 years, the relevance of the resulting cut points can be questioned further.
It is not clear whether the indicators are relevant for all patient populations. Factors such as the age of the screened population and screening history (first vs subsequent screening), and not only “high-risk” status, are intimately related to the performance of screening mammography.
Footnotes
Disclosures of Potential Conflicts of Interest: G.P.D. No potential conflicts of interest to disclose. J.O. No potential conflicts of interest to disclose. L.P. No potential conflicts of interest to disclose. D.M. No potential conflicts of interest to disclose. J.C. No potential conflicts of interest to disclose. R.S. No potential conflicts of interest to disclose. N.W. No potential conflicts of interest to disclose.
References
- 1. Carney PA, Sickles EA, Monsees BS, et al. Identifying minimally acceptable interpretive performance criteria for screening mammography. Radiology 2010;255(2):354–361.
- 2. Otten JD, Karssemeijer N, Hendriks JH, et al. Effect of recall rate on earlier screen detection of breast cancers based on the Dutch performance indicators. J Natl Cancer Inst 2005;97(10):748–754.
- 3. Blanks RG, Moss SM, Wallis MG. Monitoring and evaluating the UK National Health Service Breast Screening Programme: evaluating the variation in radiological performance between individual programmes using PPV-referral diagrams. J Med Screen 2001;8(1):24–28.