Radiology: Artificial Intelligence. 2021 Feb 10;3(2):e210009. doi: 10.1148/ryai.2021210009

Should Artificial Intelligence Tell Radiologists Which Study to Read Next?

Stacy D. O’Connor, Manav Bhalla
PMCID: PMC8035575  PMID: 33939773

See also article by O’Neill et al in this issue.

Stacy D. O’Connor, MD, MPH, MMSc, is an associate professor of radiology and surgery at the Medical College of Wisconsin, subspecializing in abdominal imaging/intervention and imaging/clinical informatics. She is the medical director of IT operations for MCW Radiology and chairs the quality committee for Froedtert & MCW Imaging Services. She has a wide range of research interests, including renal cancer, inflammatory bowel disease, critical results communication, clinical decision support, and structured reporting.

Manav Bhalla, MD, is an assistant professor of radiology at the Medical College of Wisconsin. He is subspecialty trained in neuroradiology, vascular and interventional radiology, and abdominal radiology, and his research interests include stroke and dementia imaging. His work focuses primarily on increasing the effectiveness of the initial radiologic examination, which in key conditions can dichotomize the treatment algorithm. Dr Bhalla has twice received the Editor’s Recognition Award, from Radiology and from AJNR.

The reading worklist, which displays studies to be interpreted, is often organized to help radiologists read the most urgent study next. However, few variables are available to determine the urgency of any given examination. Many lists simply use imaging location (eg, emergency, inpatient, outpatient) and the priority assigned by the ordering provider based on a clinical assessment of the patient’s symptoms. Ordering providers also may assign a high priority to an examination that does not require immediate interpretation but must be acquired quickly to accommodate a patient’s other appointments. Although this sort of prioritization helps technologists determine the order in which to perform examinations, its application as a reading priority indicator is limited and error-prone. At some practices, an experienced imaging technologist may alert radiologists to findings noticed while scanning that may need urgent if not immediate attention. Alternatively, the technologist may edit the priority level of the examination while sending it to the radiologists’ reading worklist. This practice, however, is not standard across the country. In this age of quality improvement, it may be prudent to augment these current practices with additional, more sophisticated tools such as artificial intelligence (AI).

A trained AI algorithm can be applied to everyday practice in critical, time-dependent scenarios such as stroke and other acute intracranial processes, including intracranial hemorrhage (ICH) from nonstroke causes. A leading cause of death and disability worldwide, stroke kills approximately 140 000 Americans each year and is responsible for almost one death every 4 minutes (1). The use of CT to distinguish ischemic from hemorrhagic stroke is critical to establish early treatment options and to triage patients to appropriate management. The speed at which this determination is made greatly affects the success of treatment, such that door-to-diagnosis time and radiologist turnaround time (TAT) are included in national benchmarks and in the performance metrics of evolving reimbursement models. The rapidity of CT acquisition can be negated by the absence of appropriate prioritization, especially if stroke or ICH is unexpected or examinations are ordered from less urgent locations, as these studies may have a longer “shelf life,” or wait time, on a radiologist’s reading list.

Examinations can be prioritized appropriately based on specific tags or labels placed at the time of ordering or by request after images have been obtained (2). This in turn expedites interpretation by minimizing an examination’s wait time in the reading worklist. Osborne et al evaluated the impact of tagging studies as “stroke protocol,” which prioritized them above emergent CT examinations. The average TAT for stroke protocol CT was 6.5 minutes, compared with 17.3 minutes for emergent CT examinations. The improvement resulted from decreased “available-to-picked time,” which is combined with “radiologist reading time” to calculate TAT.

Reading priorities can also be set manually by technologists at examination completion using a schema that relies on many different variables. When these priorities are assigned numerical values, radiologists can easily identify the next most urgent study to read. Using this method, Gaskin et al reported significant improvement in the median TAT for the most urgent studies (critical, emergency department/urgent, and inpatient/urgent), but the improvement was smaller (5%) for emergency department studies, presumably because examinations tagged with the emergency department location are by default interpreted first by radiologists (3).
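
To make the idea concrete, here is a minimal sketch of such a numeric priority schema. The tier names, weights, and data structure are hypothetical illustrations of the approach, not the scoring system Gaskin et al actually deployed:

```python
from dataclasses import dataclass, field
from datetime import datetime
import heapq

# Hypothetical numeric reading priorities (smaller = more urgent), loosely
# modeled on the tiers described by Gaskin et al (3); not their actual schema.
PRIORITY = {
    "critical": 1,
    "ed_urgent": 2,
    "inpatient_urgent": 3,
    "ed_routine": 4,
    "inpatient_routine": 5,
    "outpatient": 6,
}

@dataclass(order=True)
class Exam:
    priority: int
    completed_at: datetime                 # ties broken by acquisition time
    accession: str = field(compare=False)

worklist = []

def add_exam(accession: str, tier: str, completed_at: datetime) -> None:
    """Technologist assigns a tier when the examination is completed."""
    heapq.heappush(worklist, Exam(PRIORITY[tier], completed_at, accession))

def next_exam() -> Exam:
    """The radiologist always pops the most urgent, longest-waiting study."""
    return heapq.heappop(worklist)
```

With numeric values, “which study next?” reduces to popping a priority queue rather than scanning the list by eye.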

AI tools have the potential to go beyond variables known before an examination and to incorporate information gleaned from the images themselves to affect radiologists and their workflow. Arbabshirani et al first evaluated the performance of a predictive deep learning model for ICH detection at head CT, then studied its impact on workflow optimization (4). The model had good discriminatory performance (area under the receiver operating characteristic curve, 0.846) for ICH detection and reprioritized 26% of “routine” studies with positive findings to “stat,” resulting in a significantly lower median time to clinical interpretation (19 minutes vs 512 minutes). However, the model’s positive predictive value for ICH was only 64%, leading to many false-positive detections and potentially decreasing acceptance by radiologists.
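
The gap between good discrimination and a modest positive predictive value is largely a matter of disease prevalence. The short sketch below works through the standard PPV formula under assumed sensitivity, specificity, and prevalence values; none of these are operating characteristics reported by Arbabshirani et al:

```python
# Illustrative only: the sensitivity, specificity, and prevalence values are
# assumptions chosen to show how a model with good discrimination can still
# have a modest positive predictive value when disease prevalence is low.
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# At an assumed operating point of 80% sensitivity and 95% specificity:
for prev in (0.02, 0.10, 0.30):
    print(f"prevalence {prev:.0%}: PPV = {ppv(0.80, 0.95, prev):.0%}")
# prevalence 2%: PPV = 25%; prevalence 10%: PPV = 64%; prevalence 30%: PPV = 87%
```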

The article by O’Neill et al in this issue of Radiology: Artificial Intelligence evaluates the impact of a commercially available deep learning algorithm for ICH detection on the TAT of positive examinations, using three phases of implementation (5). These phases focused on reducing the time between an examination appearing on the reading worklist and a radiologist beginning to read it, as this “wait time” composed 90% of TAT for low-priority examinations and 60% of TAT for high-priority examinations, while the read-time component of TAT was not significantly different.

Evolving evidence-based design principles for clinical decision support systems may prove useful in guiding AI developers toward effective methods of notifying radiologists of algorithm results and can help explain the impact of the three phases of ICH detection notification (6). The first phase used an ancillary widget that was prominently displayed on the auxiliary monitor and clearly identified positive examinations. However, it required radiologists to step out of their regular workflow to review the widget and determine whether a study should be read before the one currently at the top of their reading worklist. The second phase flagged positive examinations on the reading worklist with a bright yellow icon, removing the need to review a separate window but still imposing the cognitive burden of deciding whether a flagged study should be read before one higher on the worklist. Neither ancillary widgets nor flagging positive examinations had a significant impact on wait time. The third phase actively reprioritized positive examinations to the top of the list, removing the cognitive burden and fitting well into standard workflow. This reduced the wait time from 15.75 minutes to 12.01 minutes per study.
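
In code, the third-phase behavior amounts to a queue-jumping callback. The sketch below is a minimal illustration under assumed names (on_ai_result, the accession strings); it is not the commercial system’s implementation:

```python
from collections import deque

# A minimal sketch of the third-phase behavior: an AI-positive examination
# jumps to the head of the queue. The function name, callback pattern, and
# accession strings are hypothetical, not the vendor's API.
worklist = deque(["CT-1001", "CT-1002", "CT-1003"])

def on_ai_result(accession: str, ich_detected: bool) -> None:
    """Invoked when the ICH algorithm finishes analyzing an examination."""
    if ich_detected and accession in worklist:
        worklist.remove(accession)      # pull it from its current position...
        worklist.appendleft(accession)  # ...and place it at the top

on_ai_result("CT-1003", ich_detected=True)
print(list(worklist))  # ['CT-1003', 'CT-1001', 'CT-1002']
```

Unlike the widget or the icon, this design requires no decision from the radiologist: the next study opened is, by construction, the flagged one.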

Robust and clinically relevant measurement methods must be identified that can demonstrate the impact of AI tools in a complex medical environment, to justify their expense and establish a return on investment. Because the length of a reading worklist varies and longer lists could increase the time until an examination is selected for interpretation, O’Neill et al borrowed from queueing theory, using a linear model to account for the number of examinations on the reading worklist when evaluating the association between TAT and reprioritization. However, the reading worklist used by radiologists for clinical work included a variety of examinations, whereas the dataset included only unenhanced CT. As a result, the model underestimated queue size and overestimated queue-adjusted wait time. Additionally, examinations performed outside of normal working hours were excluded to reduce confounding.
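
The queue-adjustment idea can be illustrated with an ordinary linear regression of wait time on worklist length plus a reprioritization indicator, so that phases are compared at a common queue size. The data below are simulated and the coefficients invented; this is not the model O’Neill et al actually fit:

```python
import numpy as np

# Simulated illustration of queue adjustment: wait time depends on how many
# examinations are queued ahead, plus an effect of AI reprioritization.
rng = np.random.default_rng(0)
n = 200
queue_len = rng.integers(1, 40, size=n)        # examinations on the worklist
reprioritized = rng.integers(0, 2, size=n)     # 1 = AI moved the exam up
wait = 2 + 0.5 * queue_len - 4 * reprioritized + rng.normal(0, 2, size=n)

# Ordinary least squares with an intercept column.
X = np.column_stack([np.ones(n), queue_len, reprioritized])
beta, *_ = np.linalg.lstsq(X, wait, rcond=None)
print(f"minutes of wait per additional queued exam: {beta[1]:.2f}")
print(f"queue-adjusted effect of reprioritization: {beta[2]:.2f} min")
```

The coefficient on the indicator estimates the reprioritization effect net of queue length, which is the comparison a raw before/after TAT difference cannot provide.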

As reliable AI tools are developed that can appropriately reprioritize examinations with urgent findings, methods to adjudicate priority adjustments from an ensemble of algorithms will be required, especially for worklists that contain examinations from multiple modalities and multiple body parts. For example, should a head CT with ICH (one positive finding) be prioritized above or below a chest radiograph with a pneumothorax and pneumonia (two positive findings)? What if the ICH is larger than on yesterday’s CT? What if it is a tension pneumothorax? A future version of the AI Results framework from Integrating the Healthcare Enterprise (IHE) (7) could incorporate clinical significance data to aid proper relative reprioritization across tools from various vendors.
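
One hypothetical form such adjudication could take is a shared table of clinical-significance weights, with each examination ranked by its single most severe finding rather than by the count of findings. The weights below are invented purely for illustration; the current IHE AI Results profile does not define them:

```python
# Hypothetical clinical-significance weights for adjudicating priority across
# findings reported by different algorithms; values are invented to
# illustrate the adjudication problem, not drawn from any standard.
SEVERITY = {
    "tension_pneumothorax": 10,
    "intracranial_hemorrhage": 9,
    "pneumothorax": 6,
    "pneumonia": 3,
}

def exam_priority(findings):
    # Rank by the single most severe finding rather than the number of
    # findings, so one ICH outranks a pneumothorax plus pneumonia.
    return max((SEVERITY.get(f, 0) for f in findings), default=0)

head_ct = ["intracranial_hemorrhage"]
chest_xr = ["pneumothorax", "pneumonia"]
print(exam_priority(head_ct) > exam_priority(chest_xr))  # True
```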

Ensembles of AI tools also create the risk of pushing false-negative examinations too far down the list, below true- and false-positive examinations. On the basis of the previously assessed and published diagnostic performance of the algorithm used by O’Neill et al, 8% of examinations would be false-positive for ICH and incorrectly prioritized upward, while 1% would be false-negative for ICH and incorrectly assigned a low priority. Dynamic schema may be required to balance AI results with traditional prioritization variables such as patient type (eg, emergency, inpatient, outpatient) and time elapsed since examination acquisition, especially when using algorithms with low accuracy or in conditions with low prevalence.
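
A dynamic schema of this kind might, for example, combine a patient-type base score, a boost for AI-positive studies, and time-based escalation so that a false-negative examination cannot wait indefinitely. All weights in the sketch below are assumptions:

```python
from datetime import datetime, timedelta

# One possible dynamic schema (all weights are assumptions): a false-negative
# examination still rises over time, so it cannot starve indefinitely
# beneath a stream of true- and false-positive ones.
BASE = {"emergency": 50, "inpatient": 30, "outpatient": 10}
AI_BOOST = 40        # applied when an algorithm flags the examination
AGING_PER_MIN = 0.5  # escalation for every minute spent on the worklist

def score(patient_type: str, ai_positive: bool,
          listed_at: datetime, now: datetime) -> float:
    waited = (now - listed_at).total_seconds() / 60
    return (BASE[patient_type]
            + (AI_BOOST if ai_positive else 0)
            + AGING_PER_MIN * waited)

now = datetime.now()
# An AI-negative emergency examination eventually overtakes an AI-positive
# outpatient examination once it has waited long enough:
print(score("emergency", False, now - timedelta(minutes=45), now))  # 72.5
print(score("outpatient", True, now - timedelta(minutes=5), now))   # 52.5
```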

In summary, this study provides an example of the optimal use of information technology in radiology practice, integrating two principal applications of AI: a deep learning algorithm to identify findings and the use of those findings to manage workflow. Technologically augmented triaging systems can be effective in reducing TAT. Future research with this model could include use during nonroutine working hours, inclusion of more heterogeneous examination types on the worklist, and the impact on false-negative examinations. A different perspective on this application could be obtained by evaluating the clinical impact of the increased TAT for examinations assigned relatively lower priority.

Footnotes

Disclosures of Conflicts of Interest: S.D.O. disclosed no relevant relationships. M.B. disclosed no relevant relationships.

References

1. Yang Q, Tong X, Schieb L, et al. Vital Signs: Recent Trends in Stroke Death Rates - United States, 2000-2015. MMWR Morb Mortal Wkly Rep 2017;66(35):933–939.
2. Osborne TF, Grabiel AJ, Clark RH. The Benefit of a Triage System to Expedite Acute Stroke Head Computed Tomography Interpretations. J Stroke Cerebrovasc Dis 2018;27(5):1190–1193.
3. Gaskin CM, Patrie JT, Hanshew MD, Boatman DM, McWey RP. Impact of a Reading Priority Scoring System on the Prioritization of Examination Interpretations. AJR Am J Roentgenol 2016;206(5):1031–1039.
4. Arbabshirani MR, Fornwalt BK, Mongelluzzo GJ, et al. Advanced machine learning in action: identification of intracranial hemorrhage on computed tomography scans of the head with clinical workflow integration. NPJ Digit Med 2018;1:9.
5. O’Neill TJ, Xi Y, Stehel E, et al. Active Reprioritization of the Reading Worklist Using Artificial Intelligence Has a Beneficial Effect on the Turnaround Time for Interpretation of Head CT with Intracranial Hemorrhage. Radiol Artif Intell 2021;3(2):e200024.
6. Miller K, Mosby D, Capan M, et al. Interface, information, interaction: a narrative review of design and functional requirements for clinical decision support. J Am Med Inform Assoc 2018;25(5):585–592.
7. IHE Radiology Technical Committee. IHE Radiology Technical Framework Supplement. https://www.ihe.net/uploadedFiles/Documents/Radiology/IHE_RAD_Suppl_AIR.pdf. Updated July 16, 2020. Accessed January 7, 2021.
