Letter
Radiology. 2011 Mar;258(3):960–961. doi: 10.1148/radiol.101735

Limitations of Minimally Acceptable Interpretive Performance Criteria for Screening Mammography

Gregory P Doyle, Jay Onysko, Lisa Pogany, Diane Major, Judy Caines, Rene Shumak, Nancy Wadden
PMCID: PMC6939948  PMID: 21339358

Editor

The May 2010 Radiology article by Dr Carney and colleagues (1) piqued our interest. The authors set cut points to identify underperforming radiologists who might benefit from additional training. These include sensitivity less than 75%, specificity less than 88% or greater than 95%, recall rate (RR) less than 5% or greater than 12%, positive predictive value (PPV) less than 3% or greater than 8%, and cancer detection rate (CDR) less than 2.5 per 1000 interpretations.
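Read operationally, these cut points amount to a simple flagging rule. The following is a minimal sketch of that rule as we understand it (the function name and the use of proportions are ours, not the authors'):

```python
# Minimal sketch of the suggested cut points (reference 1), as quoted above.
# Metrics are expressed as proportions; CDR is per interpretation.

def flag_for_review(sensitivity, specificity, recall_rate, ppv, cdr):
    """Return True if any metric falls outside the suggested acceptable range."""
    return (
        sensitivity < 0.75
        or not (0.88 <= specificity <= 0.95)
        or not (0.05 <= recall_rate <= 0.12)
        or not (0.03 <= ppv <= 0.08)
        or cdr < 2.5 / 1000
    )
```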

It is difficult to see how these criteria could be used with confidence. The suggested cut points contain internal inconsistencies: PPV is mathematically derived (PPV = CDR/RR). Given the suggested minimum CDR (2.5 per 1000 interpretations) and the suggested maximum RR (12%), the lower bound for PPV would be 2.1%, which is outside the authors’ acceptable range. Similarly, a CDR of 4.0 per 1000 interpretations would result from the authors’ minimum acceptable RR (5%) and maximum PPV (8%).
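The arithmetic behind both observations can be checked directly from the identity PPV = CDR/RR (a minimal sketch; the helper names are ours):

```python
# PPV = CDR / RR, with CDR given per 1000 interpretations and RR as a percentage.

def implied_ppv(cdr_per_1000, rr_percent):
    return (cdr_per_1000 / 1000) / (rr_percent / 100)

def implied_cdr_per_1000(rr_percent, ppv_percent):
    return (rr_percent / 100) * (ppv_percent / 100) * 1000

# Minimum acceptable CDR (2.5 per 1000) at the maximum acceptable RR (12%):
print(f"{implied_ppv(2.5, 12):.1%}")   # 2.1% -- below the 3% PPV minimum

# Minimum acceptable RR (5%) at the maximum acceptable PPV (8%):
print(implied_cdr_per_1000(5, 8))      # 4.0 per 1000 interpretations
```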

A radiologist with a CDR of 5.0 per 1000 interpretations and an RR of 6% would have a PPV of 8.3%, which exceeds the suggested maximum. It is hard to see how additional training would benefit this radiologist. Otten et al (2) found that with an RR of greater than 5%, the CDR levels off, resulting in a disproportionate and undesirable rise in false-positive findings.
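For completeness, the same identity applied to this hypothetical radiologist (a minimal sketch):

```python
# Hypothetical radiologist from the paragraph above: CDR 5.0 per 1000, RR 6%.
cdr_per_1000, rr_percent = 5.0, 6.0
ppv = (cdr_per_1000 / 1000) / (rr_percent / 100)
print(f"PPV = {ppv:.1%}")   # PPV = 8.3% -- above the suggested 8% upper bound
```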

The authors note that certain combinations of outcomes can yield an RR below the lower bound without being problematic. This is difficult to reconcile, since a high CDR combined with a low RR always produces a high PPV. The authors implicitly acknowledge this, yet provide no concrete solutions. Thoughtful approaches for assessing the interrelationships between CDR, RR, and PPV have been published elsewhere (3).
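One way to visualize these interrelationships, in the spirit of the PPV-referral diagrams described in reference 3 (a minimal sketch based on our reading; the presentation in that paper may differ), is to plot PPV against RR with curves of constant CDR:

```python
import numpy as np
import matplotlib.pyplot as plt

# Plot PPV against recall rate, with curves of constant cancer detection rate,
# so the three metrics can be assessed jointly rather than against
# independent cut points.
rr = np.linspace(0.02, 0.15, 200)          # recall rate as a proportion
for cdr_per_1000 in (2.5, 4.0, 5.0, 7.0):  # iso-CDR curves
    plt.plot(rr * 100, (cdr_per_1000 / 1000) / rr * 100,
             label=f"CDR = {cdr_per_1000}/1000")
plt.xlabel("Recall rate (%)")
plt.ylabel("PPV (%)")
plt.legend()
plt.show()
```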

Some of the normative data in the study come from radiologists who had interpreted only 100 screening or diagnostic mammograms. Since interpreting 960 screening mammograms every 2 years is required for certification, the relevance of the resulting cut points can be questioned further.

It is not clear whether the indicators are relevant for all patient populations. Factors such as the age of the screened population and screening history (first vs subsequent screening), not just high-risk status, are intimately related to the performance of screening mammography.

Footnotes

Disclosures of Potential Conflicts of Interest: G.P.D. No potential conflicts of interest to disclose. J.O. No potential conflicts of interest to disclose. L.P. No potential conflicts of interest to disclose. D.M. No potential conflicts of interest to disclose. J.C. No potential conflicts of interest to disclose. R.S. No potential conflicts of interest to disclose. N.W. No potential conflicts of interest to disclose.

References

  1. Carney PA, Sickles EA, Monsees BS, et al. Identifying minimally acceptable interpretive performance criteria for screening mammography. Radiology 2010;255(2):354–361.
  2. Otten JD, Karssemeijer N, Hendriks JH, et al. Effect of recall rate on earlier screen detection of breast cancers based on the Dutch performance indicators. J Natl Cancer Inst 2005;97(10):748–754.
  3. Blanks RG, Moss SM, Wallis MG. Monitoring and evaluating the UK National Health Service Breast Screening Programme: evaluating the variation in radiological performance between individual programmes using PPV-referral diagrams. J Med Screen 2001;8(1):24–28.
Radiology. 2011 Mar;258(3):961–962. doi: 10.1148/radiol.101735a

Response

Patricia A Carney, Edward A Sickles, Barbara S Monsees, Lawrence W Bassett, Diana L Miglioretti

We greatly appreciate the comments of Dr Doyle and his colleagues. In addressing their first point (ie, difficulty in seeing how the criteria set in our study could be used with confidence because the cut points contain internal inconsistencies), we would like to point out that the cut point(s) for each individual metric were derived separately and were not intended to be so internally consistent that one bound of any given metric combined with a bound of another metric would always result in a within-bounds metric (1). To achieve such internal consistency would not only be an extremely complex endeavor, but would also result in such narrow bounds as to be unattainable by the majority of practicing U.S. radiologists. Rather, the cut points we derived were intended to serve as determinants of whether or not to perform a detailed review of the overall performance of a given radiologist, with the understanding that many radiologists so flagged would likely be determined to have acceptable overall performance.

Regarding the second point, about our cut points being relevant only to radiologists practicing within the United States, we point out that the authors of the letter are all from Canada, a country in which screening mammography is centrally organized and provincially funded. Screening mammography in the United States is neither centrally organized nor fully government funded at the state or national level. Our metrics were derived by radiologists who practice only in the United States and were informed by normative data that come only from U.S. practices. In the United States, the lack of central organization precludes universal high-volume screening, perceived malpractice exposure likely results in much higher RRs than are observed elsewhere, and screening is performed more frequently (often annually) and across a wider range of patient ages (starting at age 40 years, with no upper age limit) than elsewhere.

Lastly, we agree that factors such as the age of the screened population and screening history (first vs subsequent screening), not just high-risk status, are intimately related to the performance of screening mammography, and we addressed this in the Discussion section of our article.

Disclosures of Potential Conflicts of Interest: P.A.C. No potential conflicts of interest to disclose. E.A.S. No potential conflicts of interest to disclose. B.S.M. Financial activities related to the present article: none to disclose. Financial activities not related to the present article: expects less than $2000 for serving on the medical advisory board for Hologic, institution has a National Institutes of Health grant for photoacoustic breast imaging, received an honorarium from University of Alabama Birmingham for speaking. Other relationships: none to disclose. L.W.B. No potential conflicts of interest to disclose. D.L.M. Financial activities related to the present article: institution has grants from National Cancer Institute and American Cancer Society, institution has received travel support from American Cancer Society. Financial activities not related to the present article: institution has grants or grants pending from National Cancer Institute and American Cancer Society. Other relationships: none to disclose.

Reference

  1. Carney PA, Sickles EA, Monsees BS, et al. Identifying minimally acceptable interpretive performance criteria for screening mammography. Radiology 2010;255(2):354–361.
