Author manuscript; available in PMC: 2021 May 1.
Published in final edited form as: Anesth Analg. 2020 May;130(5):1115–1118. doi: 10.1213/ANE.0000000000004575

Realistically Integrating Machine Learning Into Clinical Practice: A Road Map of Opportunities, Challenges, and a Potential Future

Ira S Hofer 1, Michael Burns 2, Samir Kendale 3, Jonathan P Wanderer 4
PMCID: PMC7584400  NIHMSID: NIHMS1638233  PMID: 32287118

Recently, interest in machine learning has grown rapidly within the healthcare community. This popularity is due in part to advances in perception tasks such as speech processing and image analysis, which have led to the successful implementation of machine learning across a variety of industries: from the ancient game of Go to Netflix movie suggestions1. The abundance of electronic health record data, coupled with the low costs of computational power and data storage, makes machine learning and medicine well matched. Though healthcare has traditionally been slow to adopt new technologies, the time is approaching when machine learning will deliver on its promise to improve the current and future practice of perioperative medicine. Deploying and operationalizing a machine learning model requires synthesizing knowledge of data processing and model development with knowledge of medicine and clinical workflows. In this article we hope to: elucidate what machine learning is and why it will transform clinical care, discuss what it takes to implement machine learning in clinical care, address current limitations and drawbacks, and ultimately examine what the future of machine learning in healthcare may hold.

A primer on machine learning

Machine learning is a subset of artificial intelligence in which a computer iteratively learns from data without explicit rule-based programming. Machine learning models find patterns within data and apply these patterns to new data to make predictions. Because computers can rapidly process large amounts of electronic data, these models are advantageous when working with large datasets. Where humans can become overwhelmed by increasingly large and complex data inputs, machine learning models often thrive. The recent explosion in computational processing power and the increasing availability of large datasets enable capabilities beyond traditional rule-based modeling, including improved analytical capacity and access to unique, potentially hidden, insights. As these methods become more widely adopted in the medical community, all parties will benefit. Improvements include increased care efficiency, more accurate hospital reimbursement, and better care quality, such as improved disease classification, prediction of complications, and ultimately better outcomes.

Machine learning models have been created to predict an increasing number of clinical outcomes such as diagnoses and mortality, with applications including: C. difficile infection in the inpatient hospital setting2, identification of molecular markers for cancer treatments3, and post-operative surgical outcomes4. Examples of machine learning include a cardiologist using an automated interpretation of an ECG and a radiologist using automated detection of a lung nodule on a chest x-ray. In both of these examples, a machine learning model approximates a trained physician’s diagnosis with high accuracy. To date, models have been successfully constructed to answer a single clinical question using appropriately labeled representative data. The tools to develop these models (using languages such as R and Python) are largely free of charge and openly accessible. However, the data required to train these models and the expertise needed to deploy them remain hurdles to widespread adoption.
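To make the preceding description concrete, the following is a minimal sketch of the model-development workflow using Python's freely available scikit-learn library. The features and labels here are synthetic stand-ins generated for illustration, not a clinical dataset, and the model choice (logistic regression) is an assumption for simplicity.

```python
# Minimal sketch: train a model on labeled data, then evaluate it on
# held-out data. Synthetic data stands in for "appropriately labeled
# representative data" from an electronic health record.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Illustrative cohort: 1000 "patients", 10 features, binary outcome.
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The model learns patterns from the training data without explicit rules...
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# ...and is judged by its discrimination on data it has never seen.
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"Held-out AUC: {auc:.2f}")
```

The code is trivially short, which is precisely the point made above: the tooling is free and accessible, while the curated clinical data and deployment expertise are the scarce resources.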

We are currently in a state of discovery, with most efforts centered around model development. Unfortunately, a substantial number of models fail to progress to clinical deployment. The promise of machine learning applications and their appropriate implementation remain separated, with progress slowed by inexperience, a lack of direction, and the need for multiple complementary skill sets that may not be present in a single institution or team. The focus on model development can overshadow important evaluations of the appropriateness of the question being addressed, scrutiny of the model itself, and technical or political roadblocks to operationalizing results. To be successful, it is important to consider costs, impact, outcomes (including moral and ethical implications), and the path to deployment before projects begin.

Model Implementation

The key step in developing and operationalizing a machine learning model is picking a good problem to target. Ideally, these problems have some important characteristics: they are regarded as being of high clinical or monetary value, they represent areas of medicine or operations where there is consensus by providers on the appropriate management, the necessary data are already available, and the application of computing power has the promise to reduce cognitive burden for the provider. Few problems will have all of these properties. For instance, a system has been described that predicts intraoperative hypotension based on high-fidelity analysis of arterial waveforms5. Thus, using data already available, providers can potentially intervene on this significant pathophysiological state before it develops. However, the appropriate therapy, whether it be administration of volume, delivery of vasopressors, or adjustment of the anesthetic, would still need to be determined.

The next step is to create a team of individuals capable of creating the desired solution6. While abilities and roles may overlap, the team will need clinical expertise, data experts and design help. Clinical expertise requires both a clinical subject matter expert and a clinical champion to assist with implementation. To address data needs, the team will need an individual capable of extracting appropriate datasets from the electronic health record, someone to perform data validation, and an expert in machine learning models. Some individuals may play more than one role. For instance, a clinical subject matter expert who is facile with data extraction and has time for extensive data validation could get much of the work done.

With an addressable problem and a team in place, the next step is to develop a good solution. Machine learning tools must obey the five rights of decision support: providing the right information, to the right person, in the right format, through the right channel, at the right time in the workflow7. Conceptualizing the intervention requires understanding the implications of model performance. For example, one of the authors recently evaluated a model in operation that predicted need for a post-acute care bed at time of discharge, with great performance: AUC 0.87, and sensitivity and specificity of roughly 80%. With approximately 94% of patients discharged to home, however, that translates to a positive predictive value of 24%, and a negative predictive value of 75%8. When used to drive a list of patients expected to have post-acute care needs, it performs poorly: 3 of 4 patients flagged are going home, and 1 in 4 going to post-acute care are missed, resulting in wasted time. Predicting patients going home instead yields a much better result: a positive predictive value of 98%, which case managers can use to reliably identify patients who don’t need to be evaluated, thus saving significant time.
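The prevalence effect in the example above can be worked through directly. The sketch below uses the rounded figures from the text (sensitivity and specificity of 0.80, with roughly 6% of patients discharged to post-acute care); the exact published values differ slightly because the model's true sensitivity and specificity are only "roughly 80%".

```python
# Sketch: how prevalence drives predictive values. Inputs are the rounded
# figures from the discharge-disposition example; the exact published PPV
# and NPV differ slightly from these back-of-envelope results.

def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a binary classifier at a given prevalence."""
    tp = sensitivity * prevalence              # true positives
    fp = (1 - specificity) * (1 - prevalence)  # false positives
    tn = specificity * (1 - prevalence)        # true negatives
    fn = (1 - sensitivity) * prevalence        # false negatives
    return tp / (tp + fp), tn / (tn + fn)

# Flagging the ~6% minority (post-acute care) yields a poor PPV...
ppv, npv = predictive_values(0.80, 0.80, 0.06)
print(f"Predicting post-acute care: PPV={ppv:.0%}, NPV={npv:.0%}")

# ...while flagging the ~94% majority (discharge home) yields a PPV near 98%.
ppv_home, _ = predictive_values(0.80, 0.80, 0.94)
print(f"Predicting discharge home:  PPV={ppv_home:.0%}")
```

The same discrimination (AUC) thus supports two very different interventions, which is why the question asked of the model matters as much as the model itself.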

Lastly, piloting the solution with a limited group of users is critical. This can be done in a production environment by restricting the number of users or areas to which the solution is available, or by releasing the solution into a separate test or support environment for evaluation. Use of a production environment may yield better feedback, but may not always be feasible and may be much more resource intensive. After a successful pilot, the solution is ready for deployment. It is important to embed an evaluation strategy into the roll-out to determine whether the solution was effective. It is often helpful to share the results of the implementation with the end users and to periodically re-evaluate deployed solutions, whose performance may degrade as patient characteristics or practice patterns change over time, making iterative changes as necessary.9 Additionally, solutions may develop malfunctions over time, and solutions no longer working appropriately should either be fixed or decommissioned.10

Challenges to Development and Implementation

There are a number of hurdles that are still present in machine learning development and implementation in anesthesiology. Many of these challenges are similar to those that have plagued outcomes and database researchers for some time.

The first set of drawbacks relates to model development. While the expansion in data availability is beneficial for creating machine learning models, the variety of data sources, as well as the trials of data wrangling, can be formidable. Incompatible data formats from different sources, unusual data structures, and generally messy data can occupy a significant amount of time in the development process11. Similarly, if models are created using a single institution’s data, there still remains the issue of external validation against multiple sources12. Fortunately, using data from large registries may be helpful in addressing these concerns. First, large multicenter data warehouses, such as those built by the Anesthesia Quality Institute, Multicenter Perioperative Outcomes Group, and National Surgical Quality Improvement Program, typically attempt to unify disparate data sources and formats into a common format13. Second, they contain data from multiple institutions. This allows for testing of generalizability by running a given model on data from different institutions. While there are still data acquisition and quality difficulties with large multicenter warehouses, depending on how they are structured and assuming institutions submit quality data, the standardization of data elements is a step in the right direction for healthcare-specific data management.

In order to impact patient care, models must not only be developed but also deployed. Given a lack of interoperability standards, for a model to work it must be able to interface with the data within different EHRs. Each EHR contains varying data structures, creating a significant challenge to model deployment. It is similar to having a recipe in one language and the ingredients in another, and not knowing whether the right ingredients are in the pantry. Unfortunately, competition between EHR companies, between hospitals, and even between silos within an institution limits the drive for interoperability. Recently, the Office of the National Coordinator for Health Information Technology updated its meaningful use criteria to include interoperability standards14. Hopefully this will push the various stakeholders (including those in competition) to restructure systems to improve interoperability. Realistically, true interoperability will take time to perfect, and these efforts require strong collaboration between industry, regulators, administrators, scientists, and, just as importantly, physicians. With more EHRs embracing Fast Healthcare Interoperability Resources (FHIR, a standard for healthcare data to facilitate exchange), there is progress in the right direction, but there remains enough variation to prevent rapid or reliable interoperability15.
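To illustrate what FHIR standardization buys, the sketch below parses a single FHIR Observation resource (here an illustrative systolic blood pressure reading coded with LOINC) using only Python's standard json module. The specific values are invented; a real deployment would retrieve such resources from an EHR's FHIR API rather than a string literal.

```python
# Sketch: a FHIR Observation resource is ordinary JSON with a standardized
# shape, so feature-extraction code written against this structure can, in
# principle, run against any FHIR-conformant EHR. Values are illustrative.
import json

observation_json = """
{
  "resourceType": "Observation",
  "status": "final",
  "code": {
    "coding": [
      {"system": "http://loinc.org", "code": "8480-6",
       "display": "Systolic blood pressure"}
    ]
  },
  "valueQuantity": {"value": 118, "unit": "mmHg"}
}
"""

obs = json.loads(observation_json)
coding = obs["code"]["coding"][0]          # standardized terminology binding
quantity = obs["valueQuantity"]            # standardized value-with-units
print(f'{coding["display"]}: {quantity["value"]} {quantity["unit"]}')
```

It is this shared structure, rather than any single vendor's export format, that lets a model's input pipeline survive a move between institutions.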

Simultaneous with these infrastructure challenges are a variety of additional technical challenges related to model deployment. Until recently, none of the major EHR companies supported machine learning out of the box. Ideally, there would be a machine learning platform within the EHR to easily integrate existing scripts written with common machine learning tools in a manner that would not affect the real time performance of the EHR. The analogy would be the app store for a cell phone. Developers are able to develop software that runs on a third party’s platform (the cell phone manufacturer) in a reliable and predictable way. The cellphone maker publishes clear standards and interoperability rules but has little to do with the actual app development. To date, despite the hype surrounding machine learning, none of the current EHR platforms offer services anywhere near as robust as those which can be created outside EHR platforms. This gap requires developers to export the data from the EHR into another platform in order to run their models, increasing the potential failure points and requiring significant additional infrastructure.

Deployment of machine learning models also has non-technical hurdles. Policies need to be in place to ensure appropriate oversight, specifically delineating who is responsible for the data, model upkeep, and any actionable consequences of model results. The commitment of stakeholders is needed to guarantee sustainable implementation, as is the instituting of appropriate patient privacy and consent policies16. Additionally, there are a number of costs associated with development, whether financial or in time. Neither can be underestimated or reduced in an attempt to economize. Healthcare is a highly complex system and the stakes are high in the event of model failures. “Moving fast and breaking things17”, as has been described in Silicon Valley, is not a viable option in medicine. If a GPS app suggests the wrong route home, the consequence is a longer commute. If a model fails in healthcare, the consequences are life threatening.

Towards the Future

So what does the future hold? While the exact timeline or path to adoption remains ill defined, machine learning will certainly become part of, and alter, medicine. Despite the challenges in predicting the future, answering three key questions may be helpful in understanding the ways that machine learning will impact anesthesiology.

What clinical challenges most lend themselves to machine learning based implementations? Those most successfully approached can be broadly classified into two groups: questions assessing clinical treatment strategies and those assessing clinical risk. The former includes clinical scenarios where best practice is currently unclear. For example, several studies have shown that machine learning models are capable of discriminating which septic patients are most likely to be volume responsive18. Similarly, several recent studies have shown that models can be constructed to help predict perioperative risk19, something that clinicians find notoriously hard to quantify. Thus, there is significant potential for models to help clinicians identify at-risk patients, and ultimately quantify what is currently not quantifiable.

How might we overcome obstacles? Historical lessons from other industries may be of use. In the 1990s, as computers were becoming more relevant, implementation was plagued by issues related to a lack of standardization and interoperability. Computer companies attempted to use proprietary cables, closed software formats, and other similar devices to ensure customer loyalty. Over time, however, a combination of innovation and regulation served to create standards that allowed the computer industry to make hardware and operating systems mostly manufacturer agnostic. The aforementioned FHIR standard likely represents the early beginnings of these changes in medicine. Implementation and workflow are likely the areas where clinicians must be most actively involved in machine learning applications. Anesthesia has historically been deeply involved in informatics20, and the anesthesiologist will have continued involvement in advocating for patients and ensuring that machine learning implementations augment rather than complicate clinical workflow. These efforts, including reducing the increasing burden of data entry, will be critical for the successful advancement and adoption of artificial intelligence in healthcare.

The last question: what might the future of healthcare look like? It is conceivable that in the coming decades many of the repetitive tasks we perform today will become automated. Others have described target-controlled infusion systems whereby an anesthesiologist sets blood pressure goals and the pressor dose is automatically adjusted to keep the pressure within the designated zone. Machine learning has the ability to take this a step further, determining patient-specific blood pressure goals and extrapolating results from previous cases to determine which pressor to use. However, at its core, the practice of medicine will remain largely unchanged. Medicine involves speaking to patients and helping them through what are often the most stressful times of their lives. These higher cognitive tasks, such as understanding a patient’s goals and weighing competing priorities, are unlikely ever to be outsourced to a machine and will likely become a bigger part of the clinical effort of physicians. It is for this reason that the term artificial intelligence is increasingly being recognized as a misnomer. The future is likely one of augmented intelligence, in which computers become indispensable tools that help us care for our patients, and the physician can devote more of their efforts to patient care.
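The closed-loop idea described above can be caricatured in a few lines. The following is a toy sketch, emphatically not a clinical algorithm: a simple proportional controller nudges a hypothetical pressor infusion rate toward a blood pressure goal. All names, values, gains, and dynamics are invented for illustration.

```python
# Toy sketch (not a clinical algorithm) of closed-loop pressor titration:
# a proportional controller adjusts a hypothetical infusion rate toward a
# mean arterial pressure (MAP) goal. All values are invented.

def adjust_infusion(current_map, target_map, rate, gain=0.05, max_rate=10.0):
    """Return a new infusion rate moving MAP toward the target."""
    error = target_map - current_map          # mmHg below (or above) goal
    new_rate = rate + gain * error            # proportional adjustment
    return min(max(new_rate, 0.0), max_rate)  # clamp to a safe dose range

rate = 2.0
for measured_map in [58, 60, 63, 66, 70]:     # simulated readings, goal 65 mmHg
    rate = adjust_infusion(measured_map, 65, rate)
    print(f"MAP {measured_map} mmHg -> infusion rate {rate:.2f}")
```

The machine learning layer envisioned in the text would sit above such a loop, choosing the target and the agent rather than the moment-to-moment dose.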

Glossary:

EHR

Electronic Health Record

Footnotes

Conflict of Interest: Ira Hofer receives funding from Merck Pharmaceuticals. He is also the founder and president of Clarity Healthcare Analytics and has a copyright on software that assists with data extraction from the EHR.

References

  • 1.Saria S, Butte A, Sheikh A. Better medicine through machine learning: What’s real, and what’s artificial? PLOS Medicine 2018;15:e1002721. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Oh J, Makar M, Fusco C, McCaffrey R, Rao K, Ryan E, Washer L, West L, Young V, Guttag J, Hooper D, Shenoy E, Wiens J (2018). A Generalizable, Data-Driven Approach to Predict Daily Risk of Clostridium difficile Infection at Two Large Academic Health Centers Infection Control & Hospital Epidemiology 39(4), 425–433. 10.1017/ice.2018.16 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Lee SI, et al. A machine learning approach to integrate big data for precision medicine in acute myeloid leukemia. Nature Communications 2018;9:42. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Merali ZG, Witiw CD, Badhiwala JH, Wilson JR, Fehlings MG (2019) Using a machine learning approach to predict outcome after surgery for degenerative cervical myelopathy. PLoS ONE 14(4): e0215133 10.1371/journal.pone.0215133 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Hatib F, Jian Z, Buddi S, Lee C, Settels J, Sibert K, Rinehart J, Cannesson M. Machine-learning Algorithm to Predict Hypotension Based on High-fidelity Arterial Pressure Waveform Analysis. Anesthesiology. 2018. October;129(4):663–674 [DOI] [PubMed] [Google Scholar]
  • 6.Ray JM, Ratwani RM, Sinsky CA, Frankel RM, Friedberg MW, Powsner SM, Rosenthal DI, Wachter RM, Melnick ER. Six habits of highly successful health information technology: powerful strategies for design and implementation. J Am Med Inform Assoc. 2019. July 2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Osheroff JA, Teich JA, Levick D et al. Improving Outcomes with Clinical Decision Support: An Implementer’s Guide. 2nd Edition Chicago, IL: HIMSS, 2012: p. 15. [Google Scholar]
  • 9.Yoshida E, Fei S, Bavuso K, Lagor C, Maviglia S. The Value of Monitoring Clinical Decision Support Interventions. Appl Clin Inform. 2018. January;9(1):163–173 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Kassakian SZ, Yackel TR, Gorman PN, Dorr DA. Clinical decisions support malfunctions in a commercial electronic health record. Appl Clin Inform. 2017. September 6;8(3):910–923 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Kruse CS, et al. , Challenges and Opportunities of Big Data in Health Care: A Systematic Review. JMIR Med Inform, 2016. 4(4): p. e38. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Altman DG, et al. , Prognosis and prognostic research: validating a prognostic model. Bmj, 2009. 338: p. b605. [DOI] [PubMed] [Google Scholar]
  • 13.Dutton RP, Registries of the anesthesia quality institute. Int Anesthesiol Clin, 2014. 52(1): p. 1–14. [DOI] [PubMed] [Google Scholar]
  • 14.Connecting Health and Care for the Nation: A Shared Nationwide Interoperability Roadmap.
  • 15.Saripalle R, Runyan C, and Russell M, Using HL7 FHIR to achieve interoperability in patient health record. J Biomed Inform, 2019. 94: p. 103188. [DOI] [PubMed] [Google Scholar]
  • 16.Price WN 2nd and Cohen IG, Privacy in the age of medical big data. Nat Med, 2019. 25(1): p. 37–43. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Zuckerberg Mark. 2009. Interview with Business Insider. https://www.businessinsider.com/mark-zuckerberg-2010-10 [Google Scholar]
  • 18.Komorowski M, Celi LA, Badawi O, Gordon AC, Faisal AA. The Artificial Intelligence Clinician learns optimal treatment strategies for sepsis in intensive care. Nature Medicine 2018:1–11. [DOI] [PubMed] [Google Scholar]
  • 19.Lee CK, Hofer I, Gabel E, Baldi P, Cannesson M. Development and Validation of a Deep Neural Network Model for Prediction of Postoperative In-hospital Mortality. Anesthesiology 2018;129:649–62. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Hofer IS MD, Levin MA, Simpao AF, McCormick PJ, Rothman BS. Anesthesia Informatics Grows Up. Anesthesia & Analgesia 2018;127:18–20. [DOI] [PubMed] [Google Scholar]
