Author manuscript; available in PMC 2021 Jun 11.
Published in final edited form as: J Am Coll Radiol. 2020 Nov;17(11):1394–1397. doi: 10.1016/j.jacr.2020.09.039

How to Implement AI in the Clinical Enterprise: Opportunities and Lessons Learned

Yvonne W Lui 1, Krzysztof Geras 2, K Tobias Block 3, Marc Parente 4, Joseph Hood 5, Michael P Recht 6
PMCID: PMC8193627  NIHMSID: NIHMS1707467  PMID: 33153543

INTRODUCTION

Advances in research give credence to the promise that has built up in recent years around artificial intelligence (AI) applications in medical imaging. To fully realize this promise in clinical practice, significant questions still need to be answered. Several have been discussed in detail in previous publications, including scientific questions around developing the best models to address appropriate clinical questions [1], complex regulatory issues that span multiple agencies [2,3], and the creation of successful business models [4,5]. One important topic that has received little attention so far is the practical work required to move an AI algorithm for medical imaging from the laboratory into full-scale clinical deployment. In this article, we address this question and examine the process from internal testing to infrastructure needs and deployment challenges.

TESTING THE MODELS

To deploy AI-based medical imaging tools, whether coming from a vendor or a research group, it is vital for practices to formally test models on their own internal, unseen data. This confirms that the model works as intended on data from your specific practice. How well the model generalizes to your practice will depend on how similar your data are to the data used to build it. For example, tools based on deep neural networks might not work well at your practice if the data used to train them were acquired with different imaging devices or from a population with very different demographics. Furthermore, some models may overfit to the training data and fail to generalize to unseen data. Regardless of the tool's purpose, whether computer-aided diagnosis or workflow management, testing on internal data should be performed.

Ideally, testing should be performed both retrospectively and prospectively. When testing prospectively, it is advisable to run the tool in the background before full deployment. Depending on regulatory requirements for the model, some of these activities may constitute research, in which case institutional review board approval is required. AI research projects may present unique challenges to established research paradigms, such as the sheer number of cases that may be required for inclusion and the perceived controversy that can arise from the mere mention of AI; nevertheless, following established research protocols remains standard practice.
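To make the idea of background ("shadow mode") prospective testing concrete, the following minimal sketch logs a model's output on incoming studies without surfacing it to readers, so that predictions can later be compared against the signed radiology reports. The model object, study attributes, and log layout are hypothetical and not taken from the article.

```python
# Minimal sketch of "shadow mode" prospective testing: the model runs on new
# studies in the background, and its outputs are logged for later comparison
# with the final radiologist reports rather than being shown to readers.
# The model object and study attributes below are hypothetical.
import csv
from datetime import datetime, timezone

def shadow_mode_run(model, incoming_studies, log_path="shadow_log.csv"):
    """Score each incoming study and log the prediction without surfacing it."""
    with open(log_path, "a", newline="") as f:
        writer = csv.writer(f)
        for study in incoming_studies:
            score = model.predict(study.pixel_array)      # model inference only
            writer.writerow([
                study.accession_number,                    # links back to the RIS
                datetime.now(timezone.utc).isoformat(),    # when the model ran
                float(score),                              # output, hidden from readers
            ])
    # Weeks later, the logged scores can be joined against the signed reports
    # to estimate prospective performance before full deployment.
```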

Testing involves several steps, including identifying the appropriate test cohort, data curation, and data labeling. In terms of cohort building, it is important to consider the correct clinical application for a model. For example, an algorithm that is useful for identifying and reprioritizing urgent outpatient radiographs might be rather irrelevant for intensive care unit portable films. The type of AI model being tested and deployed will dictate where and how to perform appropriate testing. Currently, the number and types of applications of AI tools in radiology continue to grow and include diverse areas from lesion detection to segmentation methods, image reconstruction, 3-D visualization tools, and workflow optimization. For all applications, it is important to clearly separate the data used to train, validate, and test the models. Even studies from the same patient should not be included in both training and testing data sets. Finally, you may need to label internal test data using your own experienced radiologists or, if relying on automated report mining for labels, have experts adjudicate any discrepancies.
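As one way to enforce the patient-level separation described above, the sketch below splits a study table by patient identifier so that no patient contributes studies to both the training and test sets. It assumes a hypothetical pandas DataFrame with a patient_id column and uses scikit-learn's GroupShuffleSplit; it is an illustrative sketch, not the authors' pipeline.

```python
# A minimal sketch of a patient-level split, assuming a pandas DataFrame of
# studies with a (hypothetical) "patient_id" column. Grouping by patient
# ensures that studies from the same patient never appear in both the
# training and test sets.
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

def patient_level_split(studies: pd.DataFrame, test_fraction: float = 0.2, seed: int = 0):
    splitter = GroupShuffleSplit(n_splits=1, test_size=test_fraction, random_state=seed)
    train_idx, test_idx = next(splitter.split(studies, groups=studies["patient_id"]))
    return studies.iloc[train_idx], studies.iloc[test_idx]

# Example usage:
# train_df, test_df = patient_level_split(all_studies)
# assert set(train_df["patient_id"]).isdisjoint(test_df["patient_id"])
```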

Many publications report the performance of AI-based tools using the area under the receiver operating characteristic curve, often abbreviated as AUC. Although very popular, this is merely one measure of a model's performance. Other popular alternatives are the area under the precision-recall curve (AUPRC) and the F1 score, which incorporates both precision and recall at a given classification threshold. AUC or AUPRC may or may not be a good representation of a model's true value. One should always consider the particular use case of a model when determining which metric is appropriate and how the reported numbers should be interpreted. Both AUC and AUPRC for a specific model depend strongly on the population on which the model is evaluated. For example, it is possible to achieve very impressive AUC numbers by adding many negative cases to the test set. Therefore, results reported in research articles may not accurately reflect your own experience. In general, AUC is more robust to small sample size than AUPRC, but AUPRC is more robust to class imbalance. F1 is a good alternative to AUC or AUPRC when the desired recall rate is known.

One also should not compare AUC or AUPRC numbers reported in different publications if they were not computed on the same test set. Consideration of such statistical factors is important to interpret model performance and estimate a model’s true value to your clinical enterprise.
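The synthetic example below illustrates the point about test-set composition: the same underlying scores yield a much higher AUC once the test set is flooded with easy negative cases, while AUPRC changes comparatively little in this particular setup. All numbers are synthetic and for illustration only; they do not come from the article.

```python
# Synthetic illustration of how metrics depend on the test population:
# flooding the test set with easy negatives inflates AUC, while AUPRC
# is comparatively stable in this setup.
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score, f1_score

rng = np.random.default_rng(0)

# Original test set: 100 positives and 100 negatives with overlapping scores.
pos_scores = rng.uniform(0.3, 1.0, size=100)
neg_scores = rng.uniform(0.0, 0.7, size=100)
y_true = np.concatenate([np.ones(100), np.zeros(100)]).astype(int)
y_score = np.concatenate([pos_scores, neg_scores])

print("AUC  :", roc_auc_score(y_true, y_score))
print("AUPRC:", average_precision_score(y_true, y_score))
print("F1 at threshold 0.5:", f1_score(y_true, (y_score >= 0.5).astype(int)))

# Same positives, but 2,000 additional "easy" negatives added to the test set.
easy_negs = rng.uniform(0.0, 0.2, size=2000)
y_true_big = np.concatenate([y_true, np.zeros(2000, dtype=int)])
y_score_big = np.concatenate([y_score, easy_negs])

print("AUC  :", roc_auc_score(y_true_big, y_score_big))              # rises toward 1.0
print("AUPRC:", average_precision_score(y_true_big, y_score_big))    # essentially unchanged here
```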

ORCHESTRATION

To ensure routine applicability, data must flow automatically and seamlessly from the acquisition devices through a data management system that dispatches the right studies to the respective AI algorithms, routes them through any needed processing steps, and collects the generated outputs (Fig. 1). Depending on the use case, the final results may be forwarded to various targets such as picture archives, viewing workstations, dictation systems, or worklist managers. It is critical to feed only the correct views and series types to the different AI tools, because incorrect input data will invariably be problematic and cause false results (Fig. 2). In addition, models may require complementary data such as relevant patient demographics, procedure and imaging-site information, and patient clinical history. As machine learning models grow more sophisticated, data curation will likely become more complicated as well, requiring multidimensional queries of the electronic health record and data retrieval from legacy hospital information systems.
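A minimal sketch of the routing decision such an orchestrator must make is shown below: DICOM header fields are inspected and matched against a rule table to decide which model(s), if any, should receive a series, guarding against the kind of misrouting illustrated in Fig. 2. The rule table and model names are hypothetical; a production orchestrator would also handle HL7 data, retries, result collection, and audit logging.

```python
# Sketch of header-based routing: read only the DICOM header and look up
# which AI models, if any, this series should be sent to. Rules and model
# names are hypothetical.
import pydicom

# Hypothetical routing rules: (Modality, BodyPartExamined) -> model endpoints
ROUTING_RULES = {
    ("CT", "HEAD"):  ["hemorrhage_detector"],
    ("CR", "CHEST"): ["pneumothorax_triage"],
    ("MR", "BRAIN"): ["lesion_segmentation", "volumetrics"],
}

def route_study(dicom_path):
    """Return the list of AI models this series should be dispatched to."""
    ds = pydicom.dcmread(dicom_path, stop_before_pixels=True)  # header only
    key = (getattr(ds, "Modality", ""), getattr(ds, "BodyPartExamined", "").upper())
    return ROUTING_RULES.get(key, [])  # unmatched series are not sent to any model
```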

Fig 1.

An artificial intelligence (AI) image data orchestrator (blue) is needed to incorporate AI models into practice. Complex routing must be followed to move studies from the acquisition device to the corresponding AI model(s), with results feeding back into the clinical environment. HL7 = Health Level Seven.

Fig 2.

Saliency heat map from a study in which the result of an outside vendor's artificial intelligence algorithm was discrepant with our radiologist's report. The FDA-approved product is marketed for the detection of intracranial hemorrhage. In this example, caudal images from a CT examination that included the head and neck were routed through the model. The input of unexpected images yields a nonsense result, with an area of concern highlighted in the left axilla (red or orange). IM = image; SE = series.

Different solutions are currently being developed to coordinate radiology data interaction with AI-based tools, including various commercial platforms as well as community-driven open-source systems [6]. It is still unclear which approach will ultimately prove best suited, but all successful systems will need to fulfill the following requirements: the flexibility to allow multiple models to be used in a practice simultaneously, the ability to accommodate both research and vendor models, support for on-premises deployment of AI tools as well as processing through cloud services, the ability to coordinate with existing infrastructure, consistent and reliable performance with uptime monitoring and failure tolerance, and finally, the ability to scale and adapt to the dynamic demands of a clinical practice with ongoing quality assessment [7]. Furthermore, successful integration will span existing practice domains, including IT, clinical informatics, and quality and safety.
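As an illustration of several of these requirements, the sketch below shows a simple model registry that tracks multiple simultaneously deployed models, distinguishes research from vendor tools, and records whether each runs on premises or in the cloud. All entries and fields are hypothetical and do not describe any particular commercial or open-source platform.

```python
# Hypothetical model registry for an orchestration platform: multiple models,
# mixed research/vendor sources, and mixed on-premises/cloud execution.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class RegisteredModel:
    name: str
    source: str           # "vendor" or "research"
    deployment: str       # "on_premises" or "cloud"
    endpoint: str         # where studies are sent for inference
    enabled: bool = True  # lets a model be paused without removing it

MODEL_REGISTRY: List[RegisteredModel] = [
    RegisteredModel("hemorrhage_detector", "vendor", "cloud", "https://vendor.example.com/infer"),
    RegisteredModel("pneumothorax_triage", "research", "on_premises", "http://gpu-node-01:8080/infer"),
    RegisteredModel("noshow_predictor", "research", "on_premises", "http://analytics-host:8081/predict"),
]

def active_models(deployment: Optional[str] = None) -> List[RegisteredModel]:
    """Return enabled models, optionally filtered by deployment type."""
    return [m for m in MODEL_REGISTRY
            if m.enabled and (deployment is None or m.deployment == deployment)]
```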

PROCESSING STEPS

For clinical deployment of AI-based tools, time is a critical component of success. Information that is slow to reach the interpreting radiologist can have negative downstream effects such as delays in care, reader frustration, and incomplete interpretation. Processing time has long been a barrier to the consistent use of semi-automated tools for 3-D visualization. A strength of deep learning–based models is fast inference time: because the computationally heavy training, which might take months for some tasks, is done before deployment, the time needed to process any particular case at test time is often very short. To fully realize this strength, it is important to account for transmission time and for any needed preprocessing and postprocessing steps. Of note, machine learning models do not handle the DICOM standard directly and require conversion to a generic image file format along with extraction of a metadata dictionary. AI models often require a uniform image size for efficient training, and thus clinical studies that vary in resolution may need interpolation or downsampling before the model can perform well. Additionally, some models rely on preprocessing or postprocessing steps such as skull stripping and coregistration. If studies will be sent to a cloud-based platform, deidentification and reidentification will be needed in addition to extra routing and processing steps. The time required for these additional steps can add up and overwhelm the short inference time of the actual AI model. Finally, one should also consider the hardware configuration needed to execute these steps, because different components are optimal for different functions.
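The sketch below illustrates, under assumed formats and a hypothetical target matrix size, the kind of pre-inference work described above (DICOM decoding, metadata extraction, and resampling to a uniform size), with timing around each step so that preprocessing overhead can be compared against model inference time.

```python
# Sketch of pre-inference steps with timing: DICOM -> generic array, metadata
# extraction, and resampling to a fixed (hypothetical) matrix size.
import time
import numpy as np
import pydicom
from scipy.ndimage import zoom

TARGET_SHAPE = (256, 256)  # hypothetical uniform input size expected by the model

def preprocess(dicom_path):
    t0 = time.perf_counter()
    ds = pydicom.dcmread(dicom_path)                    # DICOM -> dataset
    meta = {"StudyInstanceUID": ds.StudyInstanceUID,
            "SeriesDescription": getattr(ds, "SeriesDescription", "")}
    image = ds.pixel_array.astype(np.float32)           # generic array, no DICOM wrapper
    factors = [t / s for t, s in zip(TARGET_SHAPE, image.shape)]
    image = zoom(image, factors, order=1)               # interpolate to uniform size
    return image, meta, time.perf_counter() - t0

# image, meta, prep_seconds = preprocess("study/slice_001.dcm")
# t0 = time.perf_counter(); score = model.predict(image[None]); infer_seconds = time.perf_counter() - t0
# Comparing prep_seconds (plus transmission and any deidentification time) with
# infer_seconds shows where the end-to-end latency actually comes from.
```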

PERSONNEL

Even more important than having the right infrastructure is having the right team. A broad range of expertise needs to be represented: machine learning, software engineering, radiology, institutional IT, institutional review board requirements, legal, and compliance, as well as, possibly, clinical partners from referring specialties. Early and iterative feedback from individuals practicing in relevant areas is critical to understanding how users will interact with the tool in real-life clinical scenarios. Such a multidisciplinary team will help ensure successful deployment (Fig. 3). Practices will need to commit resources for appropriate education and training to form, maintain, and grow the multidisciplinary core required for this work as well as to ensure appropriate backup personnel.

Fig 3.

Example from our department of a multidisciplinary team convened for deployment of a workflow algorithm to predict no-shows and same-day cancellations at outpatient imaging centers. The team included front desk staff, patient communications personnel, the institutional predictive analytics lead, the lead technologist, a clinical radiologist, an electronic health record expert, radiology IT, a data engineer, an IT integration expert, as well as medical center IT (MCIT) and a machine learning (ML) scientist (shown teleconferenced in), seen here presenting findings to the department chair. The composition of artificial intelligence deployment teams will differ depending on the model's application; other potential expertise needed could include legal, compliance, and research.

SUMMARY

In summary, the science driving AI for medical imaging applications is groundbreaking and has the potential to change the current practice of radiology. To realize this change in clinical practice, several facets, from rigorous internal testing to reliable orchestration and the right team, need to be put in place. Ultimately, the best systems will be adaptive ones with the right infrastructure as well as the right expertise. Once implemented, AI-based tools will need ongoing quality measures to ensure continued high performance.

Footnotes

The authors state that they have no conflict of interest related to the material discussed in this article. Dr Lui, Dr Geras, Dr Block, Mr Parente, Mr Hood, and Dr Recht are nonpartner, non–partnership track employees.

Contributor Information

Yvonne W. Lui, Associate Chair for Artificial Intelligence, Department of Radiology, NYU Langone Health / NYU Grossman School of Medicine, New York, New York.

Krzysztof Geras, Department of Radiology, NYU Langone Health / NYU Grossman School of Medicine, New York, New York.

K. Tobias Block, Department of Radiology, NYU Langone Health / NYU Grossman School of Medicine, New York, New York.

Marc Parente, Department of Radiology, NYU Langone Health / NYU Grossman School of Medicine, New York, New York.

Joseph Hood, Department of Radiology, NYU Langone Health / NYU Grossman School of Medicine, New York, New York.

Michael P. Recht, Chair of the Department of Radiology, NYU Langone Health / NYU Grossman School of Medicine, New York, New York.

REFERENCES

1. Driver CN, Bowles BS, Bartholmai BJ, Greenberg-Worisek AJ. Artificial intelligence in radiology: a call for thoughtful application. Clin Transl Sci 2020;13:216–8.
2. Harvey HB, Gowda V. How the FDA regulates AI. Acad Radiol 2020;27:58–61.
3. Pesapane F, Volonte C, Codari M, Sardanelli F. Artificial intelligence as a medical device in radiology: ethical and regulatory issues in Europe and the United States. Insights Imaging 2018;9:745–53.
4. Martin Noguerol T, Paulano-Godino F, Martin-Valdivia MT, Menias CO, Luna A. Strengths, weaknesses, opportunities, and threats analysis of artificial intelligence and machine learning applications in radiology. J Am Coll Radiol 2019;16(9 Pt B):1239–47.
5. Martin-Carreras T, Chen PH. From data to value: how artificial intelligence augments the radiology business to create value. Semin Musculoskelet Radiol 2020;24:65–73.
6. Sohn JH, Chillakuru YR, Lee S, et al. An open-source, vender agnostic hardware and software pipeline for integration of artificial intelligence in radiology workflow. J Digit Imaging 2020;33:1041–6.
7. Dikici E, Bigelow M, Prevedello LM, White RD, Erdal BS. Integrating AI into radiology workflow: levels of research, production, and feedback maturity. J Med Imaging (Bellingham) 2020;7:016502.
