editorial
Radiology. 2024 Oct 8;313(1):e242141. doi: 10.1148/radiol.242141

A New Paradigm for Reading Screening Mammograms

Lars J Grimm
PMCID: PMC11535870  PMID: 39377683

See also the article by Gommers and Verboom et al in this issue.

Dr Grimm is an associate professor in the Breast Imaging Division in the Department of Radiology at Duke University. His research interests include health services, active monitoring for ductal carcinoma in situ, artificial intelligence, and gender and racial inequality in radiology. He is a fellow of the Society of Breast Imaging, chair of the National Mammography Database, director of the Duke Radiology Clinical Research Unit, and the principal investigator for an NIH R01 grant.


Radiologists are under steadily increasing pressure to read studies faster and more efficiently while maintaining high performance levels. In the United States, there is also a national radiologist workforce shortage, with an inadequate number of graduating trainees available to fill job vacancies. These pressures have led to high levels of physician burnout and turnover, which were exacerbated by the COVID-19 pandemic (1,2). Burnout is prevalent among breast radiologists, but there are opportunities for workflow improvement (2). Screening mammograms are unique among radiology studies due to their large clinical volume, standardization in imaging protocols, and discrete outcomes (eg, recall or no recall). This makes them an attractive target for research, and small improvements in efficiency and performance can yield notable benefits for both patients and radiologists.

In this issue of Radiology, Gommers and Verboom et al (3) assessed novel approaches to prioritizing the order of reading screening mammography examinations to investigate whether they would improve screening performance. Using an enriched dataset of 150 two-dimensional (2D) mammography screening examinations (75 with breast cancer, 75 with normal or benign findings) from the Dutch National Breast Cancer Screening Program, the authors used a multireader multicase retrospective study design to evaluate three different approaches to reading mammography examinations. The first approach mirrored current practice, with the order of mammography examinations generated randomly. In the second approach, mammography examinations were interpreted according to volumetric breast density, from less dense to more dense. The final approach used artificial intelligence (AI) (self-supervised learning) to organize mammography examinations based on whether they “looked similar.” With that approach, mammograms were ordered from most different to most similar in appearance. Thirteen experienced breast radiologists read the 150 mammography examinations three times, once with each of the three approaches, with the approaches presented in a random order. Eye tracking software was used to measure eye movements. The investigators then compared the performance, reading time, and eye fixation metrics among the three approaches.
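The first two reading orders are simple enough to sketch in code. The following is a minimal illustration only, not the authors' implementation: the `Exam` record, its `volumetric_density` field, and the example values are all hypothetical, and the AI similarity ordering is omitted because the editorial does not describe it in enough detail to reproduce.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Exam:
    exam_id: str
    volumetric_density: float  # hypothetical field: percent dense tissue by volume


def random_order(exams, seed=None):
    """Current-practice baseline: return the batch in a random order."""
    rng = random.Random(seed)
    shuffled = list(exams)  # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    return shuffled


def density_order(exams):
    """Density ordering: least dense examination first, most dense last."""
    return sorted(exams, key=lambda e: e.volumetric_density)


# Hypothetical batch with made-up density values.
batch = [Exam("A", 12.4), Exam("B", 3.1), Exam("C", 21.8), Exam("D", 7.6)]

print([e.exam_id for e in density_order(batch)])  # ['B', 'D', 'A', 'C']
```

Both functions return a permutation of the same batch, so only the sequence differs between reading conditions, not the set of cases.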

The authors (3) found that screening performance was better when the screening mammography examinations were read in order of volumetric breast density than when they were read in random order. The readers had significantly better overall performance (area under the receiver operating characteristic curve, 0.93 vs 0.92; P = .009), reduced reading times (median, 24.3 vs 27.9 seconds; P < .001), reduced fixation counts (median, 47 vs 52; P < .001), and reduced fixation times in malignant regions (median, 3.7 vs 4.6 seconds; P < .001). Put simply, the readers were faster, more accurate, and more efficient when screening mammography examinations were ordered by breast density. In contrast, there were no performance benefits for the AI-organized approach versus random shuffling. In fact, the reading time with the AI approach was significantly longer than that with random shuffling (median, 28.4 vs 27.9 seconds; P = .02). The authors thus demonstrated that deliberate ordering could benefit the interpretation of screening mammograms but that not all ordering approaches are better than the current random shuffle standard.

The strategy of starting with low-density studies and transitioning to more dense studies, which are presumably harder to read, was based on a body of work in the neurosciences called visual adaptation (4). In visual adaptation, the sensitivity of readers on a current case is altered by a lingering afterimage from the previous case, which, in the case of steadily increasing breast density, would prime the reader for progressively harder cases. Gommers and Verboom et al (3) should be applauded for taking the findings from one domain and applying them in a novel fashion to another. However, as they noted, there is no reason to assume this is the best ordering approach, as breast density is essentially used as a proxy for difficulty. One could just as easily suppose that starting with the hardest cases first thing in the morning with a fresh cup of coffee and then getting progressively easier cases as fatigue sets in would also be a successful strategy. The study by Gommers and Verboom et al (3) thus represents an important proof of concept that justifies additional research into alternative ordering strategies, including those that use other metrics.

Although the AI-ordering approach did not offer any benefits over the random order, this merely means that this specific ordering approach was not successful. AI has the potential to learn the strengths, weaknesses, and preferences of individual radiologists and could order examinations based on any number of features, including derivatives of volumetric breast density. There is also the possibility of creating personalized reading lists that would maximize individual performance. It is not hard to imagine a future where a radiologist logs into a workstation, indicates they want to read a batch of 50 examinations, and is presented with a populated, ordered list unique to that individual. Of note, AI vendors have been informally pitching modified interpretation lists for years. However, these efforts often focus on triaging worklists to divert studies with specific findings to individuals with greater expertise in certain areas (eg, Dr Smith is the expert on calcifications). In breast imaging, this approach is problematic because breast radiologists must adhere to specific performance metrics (ie, cancer detection rate, recall rate). Changing the pretest probability of malignancy by showing more cases with potentially actionable findings could alter those metrics. This is not a concern, however, if the total pool of cases is unchanged and merely the order in which cases are presented is altered.
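The distinction between triaging and reordering can be made concrete with a small sketch. Everything here is a hypothetical illustration: the five-case pool, the `has_finding` flag, and the triage rule are invented for the example and do not come from the study.

```python
from collections import Counter

# Hypothetical pool of cases: (exam_id, has_finding), where has_finding marks
# a potentially actionable finding.
pool = [("A", False), ("B", True), ("C", False), ("D", False), ("E", True)]


def prevalence(cases):
    """Fraction of cases in a list that carry a potentially actionable finding."""
    return sum(flag for _, flag in cases) / len(cases)


# Reordering: the reader sees exactly the same cases in a different sequence.
reordered = sorted(pool, key=lambda case: case[1])

# Triage (illustrative rule): flagged cases are diverted to one reader, whose
# list is therefore enriched relative to the original pool.
triaged = [case for case in pool if case[1]] + [pool[0]]

# Same multiset of cases, so the reader's case mix is unchanged by reordering.
assert Counter(reordered) == Counter(pool)
assert prevalence(reordered) == prevalence(pool)

# The triaged list has a higher pretest probability than the pool.
assert prevalence(triaged) > prevalence(pool)
```

Because a reordered list is a permutation of the original pool, per-reader metrics such as recall rate are computed over the same cases; a triaged list is a different sample, which is why it can shift those metrics.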

Deliberate ordering of mammograms could reframe the approach to screening mammogram interpretation similar to how batch reading restructured workflows 2 decades ago. In on-demand reading, mammograms are interpreted as they are obtained, often intermixed with images from other modalities. With batch reading, the radiologist waits until a stack of mammograms has accumulated and then interprets them consecutively, ideally in an uninterrupted setting. Compared with on-demand reading, batch reading reduces recall rates while maintaining the cancer detection and false-negative rates (5,6). This favorably shifts the benefits and harms of breast cancer screening and, upon discovery, led many practices to carve out dedicated screening mammography services. This change in workflow is one of the contributing factors to the increasing subspecialization of breast imaging over time. Deliberate ordering of screening mammograms could be introduced into workflows very easily because it dovetails with existing batch-reading strategies.

It remains to be seen what impact deliberate ordering will have when interpreting images from digital breast tomosynthesis (DBT) rather than 2D screening mammography. DBT is gradually replacing 2D mammography in many countries due in part to the higher cancer detection rates and sensitivity in women with denser breasts (7). The presentation of breast tissue is fundamentally different on DBT images versus 2D mammograms due to the ability to scroll through the breast at different depths, so it is unknown whether the performance benefits identified by Gommers and Verboom et al (3) will be amplified, diminished, or unchanged. It may well be that some other metric of breast complexity is more influential. However, there is the potential for much greater efficiency gains with DBT since the average reading time for DBT can be twice as long as that for 2D mammography (8). Logistically, interpretation time is one of the major burdens that high-volume breast imaging practices are dealing with as they transition to DBT. Thus, any improvements would be well received.

Although the study by Gommers and Verboom et al (3) is very promising, there are limitations that must be acknowledged and addressed before ordered interpretations can be adopted into clinical practice. It is well known that the performance of readers can shift in reader studies, especially in an enriched series. The readers in this study were aware of the level of enrichment, which likely also changed their behaviors. In addition, this was a retrospective study and further work in a prospective setting is needed. Finally, the study was performed using breast radiologists who read a median of 11 000 screening mammograms per year (approximately 50 per day), which is many more than most radiologists read in the United States. The performance of radiologists who read a lower volume of mammograms and nonspecialist radiologists may differ from that of the radiologists in this study. None of these limitations is insurmountable, however, and all can be tested to optimize deliberate ordering before adoption into clinical practice.

In conclusion, the results by Gommers and Verboom et al (3) are very promising when viewed as a proof of concept for ordered screening mammogram interpretation. Although significant additional testing and refinement are needed, the findings represent a rare opportunity to improve breast radiologists’ interpretation performance and efficiency and are complementary to existing performance improvement measures, including batch reading and newer AI tools.

Footnotes

Disclosures of conflicts of interest: L.J.G. Grant from the National Institutes of Health and ECOG-ACRIN, consulting fees from Hologic and Medscape Reference, leadership role with the Society of Breast Imaging and the National Mammography Database.

References

  1. Rawson JV, Smetherman D, Rubin E. Short-Term Strategies for Augmenting the National Radiologist Workforce. AJR Am J Roentgenol 2024;222(6):e2430920.
  2. Dodelzon K, Grimm LJ, Plimpton SR, Markovic D, Milch HS. Ongoing Impact of COVID-19 on Breast Radiologists’ Wellness. J Breast Imaging 2023;5(3):287–296.
  3. Gommers JJJ, Verboom SD, Duvivier KM, et al. Enhancing radiologist reading performance by ordering screening mammograms based on characteristics that promote visual adaptation. Radiology 2024;313(1):e240237.
  4. Kohn A. Visual adaptation: physiology, mechanisms, and functional benefits. J Neurophysiol 2007;97(5):3155–3164.
  5. Ghate SV, Soo MS, Baker JA, Walsh R, Gimenez EI, Rosen EL. Comparison of recall and cancer detection rates for immediate versus batch interpretation of screening mammograms. Radiology 2005;235(1):31–35.
  6. Burnside ES, Park JM, Fine JP, Sisney GA. The use of batch reading to improve the performance of screening mammography. AJR Am J Roentgenol 2005;185(3):790–796.
  7. Phi XA, Tagliafico A, Houssami N, Greuter MJW, de Bock GH. Digital breast tomosynthesis for breast cancer screening and diagnosis in women with dense breasts - a systematic review and meta-analysis. BMC Cancer 2018;18(1):380.
  8. Partridge GJW, Darker I, James JJ, et al. How long does it take to read a mammogram? Investigating the reading time of digital breast tomosynthesis and digital mammography. Eur J Radiol 2024;177:111535.

Articles from Radiology are provided here courtesy of Radiological Society of North America
