Artificial Intelligence (AI) is moving into every aspect of our lives: selecting the online advertisements that we see, recognizing our voices, showing us the best route home and protecting us from credit card fraud. Machine learning (ML) refers to computer programs that use complex mathematical functions to give computers the ability to perform human-like actions such as problem-solving, object and word recognition, decision-making, and generating predictions about future events. ML commonly analyzes thousands of columns and millions of rows to identify complex non-linear relationships between multiple variables. It offers the potential for deeper insight than traditional biostatistics and also has the potential to personalize treatment and make tailored recommendations for individual patients.
In anesthesia, AI has potential applications in decision support, patient monitoring, drug delivery, ultrasound-guided regional anesthesia, and training. Algorithms can analyze individual patient data in real time to predict critical events and the need for treatment modifications, and to support decision-making. They can identify patterns, predict outcomes, and recommend optimal treatment strategies to the perioperative team. AI is not intended to replace the expertise and judgment of physicians in our specialty but aims to deliver on the five rights of decision support: providing the right information, to the right person, in the right format, through the right channel, at the right time in the workflow.1
A great deal of hype currently surrounds the development of AI tools,2 fueled by the introduction of the Large Language Model ChatGPT and by headlines espousing the future societal dangers of advanced AI. In reality, few anesthesiologists encounter AI-based tools in their current practice, despite experiencing the benefits of AI daily in their lives outside of work. So why is it so difficult to get AI projects to the bedside to help our patients?
Let’s start with data. Medicine generates data in both high volume and high veracity, and it is collected into the electronic medical record (EMR) primarily for documentation and medicolegal purposes. The EMR is rarely optimized for data-based research, and such optimization does not happen by accident: we must design the ways we collect data with future AI use in mind. Contemporary medical practice has become so safe and effective that many adverse outcomes have become rare. While this development is admirable, it also means that few records are available to train ML models that can recognize events where patient safety was threatened. Large data sets are often needed to power high-performing and clinically meaningful models, necessitating data sharing across institutions. These large data sets have the potential to become unwieldy without initiatives such as the Multicenter Perioperative Outcomes Group (MPOG) to improve data collection, storage, and validation. Privacy and data security are major concerns, as AI systems rely on vast amounts of sensitive patient data for training and analysis. Ensuring proper safeguards and adhering to strict data protection regulations are essential to maintain patient confidentiality and trust.
While an expansion in data availability is beneficial for creating ML models, the variety of data sources is formidable. Incompatible data formats from different sources, varied data structures, and messy data can be challenging to sort out. There are no universally agreed data collection standards to support what is known as “interoperability” between data sources. For an AI model to have wide adoption, it must be able to interface with the data within different EMRs. More EMRs are embracing FHIR (Fast Healthcare Interoperability Resources) as a standard for healthcare data to facilitate exchange, but this is far from a perfect solution.
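The appeal of a standard such as FHIR is that the same generic code can read a resource regardless of which EMR produced it. As a minimal illustration, the sketch below parses an abbreviated version of the public FHIR example Patient resource; the field subset shown is chosen for illustration and is far from a complete Patient record:

```python
import json

# Abbreviated copy of the public FHIR R4 example Patient resource.
# Any FHIR-conformant EMR export would use this same structure,
# which is what makes vendor-neutral tooling possible.
fhir_patient = json.loads("""
{
  "resourceType": "Patient",
  "id": "example",
  "name": [{"family": "Chalmers", "given": ["Peter"]}],
  "birthDate": "1974-12-25"
}
""")

# Fields are addressed the same way no matter which system emitted them.
family = fhir_patient["name"][0]["family"]
print(family, fhir_patient["birthDate"])
```

Real-world exchange is messier than this sketch suggests: optional fields, extensions, and vendor-specific profiles are exactly why FHIR is "far from a perfect solution."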
For some potentially wonderful AI tools, we currently don’t even capture the necessary data. ML-targeted drug delivery may improve on current pharmacokinetic and pharmacodynamic models, but we need highly granular information on dose timing and patient physiologic effects. Even with that granularity available, capture and analysis of such data still yields, at best, observational data, with all of the inherent limitations of observational study design. Ideally, for machine learning, we need data that are high quality, clinically relevant, closely related to the population we wish to study, and have few missing values. This just doesn’t happen organically; we will need to do some work before we can let our AI algorithms start to learn. This pre-processing, or data cleaning, is time- and labor-intensive and is therefore costly.
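The pre-processing step described above can be sketched in a few lines. The records, field names, and plausibility thresholds below are invented for illustration, not clinical guidance; real cleaning pipelines involve many more rules than this:

```python
# Minimal sketch of data cleaning before model training.
# Hypothetical records: None marks a missing value; 700 mmHg is an
# artifact (e.g., a transducer being flushed), not a real blood pressure.
raw_records = [
    {"patient_id": 1, "age": 54,   "map_mmhg": 72},
    {"patient_id": 2, "age": None, "map_mmhg": 68},    # missing value
    {"patient_id": 3, "age": 47,   "map_mmhg": 700},   # implausible artifact
    {"patient_id": 4, "age": 61,   "map_mmhg": 88},
]

def is_plausible(rec):
    """Keep only records that are complete and physiologically feasible."""
    if rec["age"] is None or rec["map_mmhg"] is None:
        return False
    return 0 <= rec["age"] <= 120 and 20 <= rec["map_mmhg"] <= 200

clean_records = [r for r in raw_records if is_plausible(r)]
print(len(clean_records))  # 2 of the 4 raw records survive cleaning
```

Even this toy example shows why the work is costly: every exclusion rule encodes a clinical judgment about what counts as a feasible value, which is why clinicians must be involved.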
We also need to be sure to use only features that would be available at the time the model is clinically useful. This is why AI development must be guided by subject matter expert physicians working closely with data scientists to select the appropriate data fields and to examine the feasibility of the values that are fed into the model. Few clinicians can translate between the computational aspects of model building and the clinical insight around the problem to be solved, or know how to integrate a model into the clinical workflow; this leaves a skills gap for model development and implementation. There is also huge market demand for data scientists across multiple industries.
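The point about feature availability can be made concrete. A model intended to run at induction must not be trained on values recorded later in the case, or its offline accuracy will not survive contact with real-time use (a form of data leakage). The feature names and workflow stages below are hypothetical examples:

```python
# Hypothetical catalog mapping each candidate feature to the workflow
# stage at which its value is first recorded in the EMR.
feature_available_at = {
    "preop_creatinine":    "preop_clinic",
    "asa_status":          "preop_clinic",
    "intraop_blood_loss":  "end_of_case",   # only known after surgery
    "pacu_pain_score":     "postop",        # only known after surgery
}

WORKFLOW_ORDER = ["preop_clinic", "induction", "end_of_case", "postop"]

def features_usable_at(stage):
    """Return the features already recorded by the given workflow stage."""
    cutoff = WORKFLOW_ORDER.index(stage)
    return [f for f, s in feature_available_at.items()
            if WORKFLOW_ORDER.index(s) <= cutoff]

print(features_usable_at("induction"))  # only the pre-operative features
```

Screening candidate features this way is exactly the kind of task that requires a clinician's knowledge of when each value actually appears in the record, paired with a data scientist's pipeline.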
When developing our projects, we need to choose the right question. This might be in an area of high clinical or monetary value and should represent clinical or operational areas where there is consensus among providers on the appropriate management, where the necessary data are already available, and where the application of AI promises to reduce cognitive burden and provide useful insight for the provider. What bedside clinicians most value may not match what the non-clinician purchasers and funders of ML tools value, nor what is easy for developers working with existing datasets to produce. ML systems are well situated to recommend evidence-based clinical actions where data exist, with greater perspective than any individual clinician. However, AI lacks the ability to contextualize a clinical decision within the wider care of an individual patient. Therefore, AI systems are better deployed in support of clinician knowledge, rather than as clinician replacement; they are also more likely to be trusted and accepted in this role. The term artificial intelligence is therefore potentially a misnomer. The near future is likely one of augmented intelligence, where computers become indispensable tools that help us care for our patients and allow physicians and other healthcare providers to devote more of their effort to patient care.
Recent high-profile AI safety disasters such as the Boeing 737 Max and Tesla Model S crashes have been attributed to user error arising from a lack of familiarity with automated piloting systems and use outside their intended design. It is therefore important for clinical users to understand a model’s limitations and appropriate use. Trust in any new technology or technique is a massive concern: without it, we simply will not change our practice. Yet machine learning captures relationships between variables in a fashion that is unreadable to the human eye. DARPA, the Defense Advanced Research Projects Agency, is investigating what we as humans require to understand and trust AI systems. The FDA is currently designing procedures to guide premarket review of proposed clinical ML applications.
Researchers and developers must consider the full ethical, bias, and safety implications of deploying these new technologies. As with the introduction of any new treatment, the need for careful appraisal, validation, and monitoring of tools using AI does not stop with implementation. The proposed FDA framework includes a type of “Phase IV” post-marketing surveillance that will be essential to assess model performance. Many models are trained on real-world datasets, that is, the treatment delivered to actual patients by physicians, rather than on a known gold standard of care. There is extensive evidence that the healthcare outcomes of minority groups fall below those expected because of systematic bias. Although AI aims to add no additional bias, if we do not adjust for bias in the training data, it becomes baked into the model.
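One concrete form such post-marketing surveillance could take is a subgroup audit: comparing a deployed model's performance across patient groups to surface bias inherited from the training data. The groups, labels, and predictions below are invented for illustration:

```python
# Minimal sketch of a subgroup performance audit.
# Each tuple is (group, true_label, model_prediction); values are invented.
records = [
    ("A", 1, 1), ("A", 0, 0), ("A", 1, 1), ("A", 0, 1),
    ("B", 1, 0), ("B", 0, 0), ("B", 1, 0), ("B", 0, 1),
]

def accuracy_by_group(rows):
    """Compute per-group accuracy; a large gap between groups flags bias."""
    totals, correct = {}, {}
    for group, truth, pred in rows:
        totals[group] = totals.get(group, 0) + 1
        correct[group] = correct.get(group, 0) + (truth == pred)
    return {g: correct[g] / totals[g] for g in totals}

print(accuracy_by_group(records))  # {'A': 0.75, 'B': 0.25}
```

In this toy data the model performs far worse for group B, the kind of disparity a “Phase IV”-style monitoring program would need to detect and act on.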
As we develop models, we must focus not on what we can achieve with progressive new technologies, but on what we should create, based on sound ethical principles. This will create and maintain public and clinician trust, which is essential to the acceptance of AI as it matures into widespread clinical use. We are currently in a state of discovery, with most efforts centered around model development. The promise of ML applications and their appropriate implementation remain separated, with progression slowed by the need for multiple complementary skill sets that may not be present in a single institution or team. We’re getting there, but it’s slow as we tackle these challenges.
References
- 1. Campbell RJ. The five rights of clinical decision support: CDS tools helpful for meeting meaningful use. Journal of AHIMA. 2013;84(10):42–47 (web version updated February 2016).
- 2. Lonsdale H, Jalali A, Gálvez JA, Ahumada LM, Simpao AF. Artificial Intelligence in Anesthesiology: Hype, Hope, and Hurdles. Anesthesia & Analgesia. 2020;130(5):1111–1113. doi:10.1213/ane.0000000000004751
