Francine L. Jacobson, MD, MPH, is the medical director for lung cancer screening at Brigham Health, staff radiologist in the division of thoracic imaging at Brigham and Women's Hospital, and assistant professor of radiology at Harvard Medical School with research interests in radiology perception science and medical image analysis.
Elizabeth A. Krupinski, PhD, is professor and vice chair for research in the department of radiology & imaging sciences at Emory University. Her research interests include medical image perception, observer performance, and medical decision making, with the goal of better understanding the perceptual and cognitive mechanisms underlying the interpretation of medical images in order to reduce errors and thereby improve patient care and outcomes.
In the study “Deep Learning Systems for Pneumothorax Detection on Chest Radiographs: A Multicenter External Validation Study,” Thian et al performed a multicenter validation that encompasses a broader range of radiographic techniques than is typical of artificial intelligence (AI) research, including images more representative of those obtained in routine clinical practice (1). From the perspective of a radiologist who reports hundreds of postoperative chest radiographs obtained to exclude pneumothorax every month, pneumothorax detection is a well-chosen target. The frequency of pneumothorax ranges from rare to very common across specific patient groups, which weakens criticisms based on the pretest probability of pneumothorax. It is an independent finding of great importance in critical care settings, and the frequency of imaging in these settings provides some tolerance for missing a small pneumothorax that would not immediately change clinical management. Shifting the primary method for identifying pneumothorax from radiologists’ manual image search to AI image review could be a well-chosen entry point for AI to relieve oversubscribed radiologists of a repetitive task.
Deep learning models are built using neural networks to find patterns in data (2). It is important to validate a model using different image datasets, and many such validations are necessary to create a robust clinical tool. The authors have performed a multi-institutional validation study that can serve as a model for local validation within a multisite institution, confirming adequate performance across an entire medical system. Radiologists should be involved in this process just as they would be involved in system acceptance testing and quality assurance. Validation can provide a strong foundation for a quality improvement project.
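To make the idea of a local validation concrete, the sketch below computes the kind of summary metrics such a study would report. It is a minimal illustration only, assuming a hypothetical file local_validation.csv containing one row per radiograph with a vendor model's pneumothorax probability and a radiologist-adjudicated ground truth label; the column names and the operating threshold are assumptions for illustration, not details from the study by Thian et al.

```python
# Minimal sketch of a local external validation of a pneumothorax model.
# Assumes a hypothetical CSV ("local_validation.csv") with columns
# "model_score" (model probability) and "pneumothorax" (0 or 1, adjudicated).

import pandas as pd
from sklearn.metrics import roc_auc_score, confusion_matrix

df = pd.read_csv("local_validation.csv")
y_true = df["pneumothorax"]
y_score = df["model_score"]

# Area under the ROC curve on the local case mix
auc = roc_auc_score(y_true, y_score)

# Sensitivity and specificity at an assumed operating point
threshold = 0.5  # illustrative; in practice use the vendor-recommended threshold
y_pred = (y_score >= threshold).astype(int)
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)

print(f"AUC: {auc:.3f}  Sensitivity: {sensitivity:.3f}  Specificity: {specificity:.3f}")
```

Reporting these metrics separately for each site or scanner type in the local system is a simple way to detect the performance drift that a single pooled number can hide.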
In the early days of teleradiology, we found unexpected success in detecting pneumothorax in the developing world on chest radiographs captured with a low-resolution digital camera (3). Pneumothorax detection is a very specific task with far less ambiguity than other lung findings across a very broad range of practice scenarios, including trauma in the emergency department, postoperative chest radiographs, and radiographs obtained for management of critically ill patients in the intensive care unit. The process followed by Thian et al provides a guide for clinical radiologists adopting commercial image analysis tools. Performing a local validation study is well within reach for clinical radiologists who do not think of themselves as researchers. Local validation should become more routine in the future, especially as AI moves the observer further from the interpretive process. As we move into the era of AI in radiology, we believe that this type of investigation will become a core element of the scientific study of medical imaging itself. In part, such evaluation is needed because convolutional neural networks and quantitative image analysis methods are able to discover features that radiologists are not able to adequately verbalize. Such an occurrence has precedent in the historical development of medical imaging.
The fundamental imaging sciences of physics and psychophysics are as relevant to AI development and validation as the computer algorithms and models. Methods developed during the 20th century to study medical image perception provided insight into the unverbalized cognitive processes of the radiologist as observer. Eye-tracking studies of the pattern in which the radiologist's eyes focus on a chest radiograph provide a basis for understanding expertise and sources of error. The saccades the eyes make as radiologists search images, and the way that search is transformed into a decision about the likelihood of an abnormality, as examined by Kundel and others (4), are the human equivalent of the black box of a neural network. The detection of signal from noise in medical images is another process that was well documented through experiments performed by Burgess and colleagues years ago (5), which highlighted some of the fundamental limits of the human visual system that AI may provide a way to overcome by “seeing” features humans cannot.
It is important for AI developers to understand this long history of medical image perception research, as there are valuable lessons to be learned. Before we ask a computer to aid the radiologist in an image interpretation task, perhaps we should first ask what fundamental limitations of the human visual and cognitive systems AI can help overcome, and then build models that address those limitations. For example, Thian et al found that their AI model performed better with large than with small pneumothoraces, but so do humans. The model was not affected by the presence of chest tubes, potentially eliminating a known pitfall in pneumothorax detection and providing an improvement over the human observer. These questions matter because the stand-alone performance of AI models is obviously important, especially in studies like that of Thian et al in which a wide variety of images from multiple institutions are used; however, it is only half the story. We need more studies on the impact of AI models on human observer performance. Can they help radiologists find the small pneumothoraces that do not present enough features to draw the eye, and convince the radiologist that something they cannot see well, or at all, is something to be concerned about? The long history of medical image perception research and of studies on the impact of technology on decision-making can help answer these questions.
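As a rough illustration of what such an observer impact study might measure, the sketch below compares radiologists' per-case detections with and without AI assistance on the same positive cases, using McNemar's test for paired binary outcomes. The detection arrays are invented for illustration and do not come from Thian et al or any actual reader study.

```python
# Illustrative sketch (hypothetical data): paired comparison of radiologist
# pneumothorax detection with vs. without AI assistance on the same cases.

import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

# Hypothetical per-case outcomes for positive cases: 1 = pneumothorax detected.
detected_unaided = np.array([1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1])
detected_aided   = np.array([1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1])

# 2x2 table of agreements and disagreements between the two reading conditions.
both         = np.sum((detected_unaided == 1) & (detected_aided == 1))
only_unaided = np.sum((detected_unaided == 1) & (detected_aided == 0))
only_aided   = np.sum((detected_unaided == 0) & (detected_aided == 1))
neither      = np.sum((detected_unaided == 0) & (detected_aided == 0))
table = [[both, only_unaided], [only_aided, neither]]

result = mcnemar(table, exact=True)  # exact test suits small discordant counts
print(f"Unaided sensitivity: {detected_unaided.mean():.2f}")
print(f"AI-aided sensitivity: {detected_aided.mean():.2f}")
print(f"McNemar p-value: {result.pvalue:.3f}")
```

A full reader study would of course also include negative cases, multiple readers, and an analysis such as multireader multicase ROC, but the paired design above captures the essential question of whether AI assistance changes individual decisions.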
Thus, deep learning models will continue to require ongoing validation. Over time, changes in technology will also prove challenging, especially where comparisons are required across technological advances. Radiologists must be included in the iterative process demonstrated in this study by Thian et al when quantitative and deep analytic methods are implemented in local clinical practice. Medical image perception methods must be part of the foundation of AI. Ongoing comparison between human observers and AI will be required to validate the clinical use of AI technology.
Footnotes
Disclosures of Conflicts of Interest: F.L.J. disclosed no relevant relationships. E.A.K. Activities related to the present article: serves on RSNA journals’ Publication Ethics Committee. Activities not related to the present article: disclosed no relevant relationships. Other relationships: disclosed no relevant relationships.
References
1. Thian YL, Ng DW, Patrick JT, et al. Deep learning systems for pneumothorax detection on chest radiographs: a multicenter external validation study. Radiol Artif Intell 2021;3(4):e200190.
2. Dreiseitl S, Ohno-Machado L. Logistic regression and artificial neural network classification models: a methodology review. J Biomed Inform 2002;35(5-6):352–359.
3. Szot A, Jacobson FL, Munn S, et al. Diagnostic accuracy of chest X-rays acquired using a digital camera for low-cost teleradiology. Int J Med Inform 2004;73(1):65–73.
4. Kundel HL. Images, image quality and observer performance: new horizons in radiology lecture. Radiology 1979;132(2):265–271.
5. Kundel HL. History of research in medical image perception. J Am Coll Radiol 2006;3(6):402–408.