Abstract
Machine learning, a subset of artificial intelligence (AI), is a set of computational tools that can be used to enhance provision of clinical care in all areas of medicine. Gastroenterology and hepatology draw on multiple sources of information, including visual findings on endoscopy, radiologic imaging, manometric testing, and genomic, proteomic, and metabolomic data. However, clinical care is complex and requires a thoughtful approach to best deploy AI tools to improve quality of care and bring value to patients and providers. On the operational level, AI-assisted clinical management should consider logistic challenges in care delivery, data management, and algorithmic stewardship. There is still much work to be done on a broader societal level in developing ethical, regulatory, and reimbursement frameworks. A multidisciplinary approach and awareness of AI tools will create a vibrant ecosystem for using AI-assisted tools to guide and enhance clinical practice. From optically enhanced endoscopy to clinical decision support for risk stratification, AI tools will potentially transform our practice by leveraging massive amounts of data to personalize care to the right patient, in the right amount, at the right time.
Keywords: artificial intelligence, digestive system diseases, machine learning
Introduction
Clinical management of gastrointestinal diseases spans the spectrum from acute to ongoing chronic care and draws on multiple types of information, including endoscopic video, radiologic imaging, manometric readings, and genomic data. With recent advances in artificial intelligence (AI) for processing imaging, text, and genomic data, there is great promise for AI-assisted tools to advance the care of patients with gastrointestinal diseases. However, given the complexity of clinical care, there are significant logistic, regulatory, and ethical challenges in determining appropriate and optimal use of the technology.
The potential of artificial intelligence and machine learning in analyzing “big data”
Artificial intelligence is a field that has advanced rapidly in the age of increased computational power, algorithmic sophistication, and availability of data. There is a distinction between general AI and narrow AI. General AI is theoretically identical to human intelligence and is not restricted to specific tasks. General AI currently does not exist, but there are prototypes in natural language processing that appear to be a promising step in that direction (e.g. GPT-3, a language generator released by OpenAI in June 2020).
Machine learning, a subset of AI, is a set of computational tools used in narrow AI applications, where the algorithms are trained to perform well for very specific tasks (e.g. identifying polyps on screening and surveillance colonoscopies). The advantage of machine learning over conventional statistical tools is the ability to analyze “big data,” defined as datasets that are large (volume), complex (variety), and constantly updating (velocity). For medicine, the explosion of available data has been estimated as a doubling every 73 days, and machine learning tools are suited for analyzing the data to be used for diagnostic and prognostic purposes.1
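To make the distinction concrete, a minimal sketch of a narrow-AI workflow is shown below, assuming Python with scikit-learn; the dataset, feature columns, and label are hypothetical placeholders rather than any published model.

```python
# Minimal sketch of a "narrow AI" task: a model trained to perform well
# on one specific prediction problem. The dataset, feature columns, and
# label ("is_adenoma") are hypothetical placeholders.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("polyp_features.csv")        # hypothetical tabular dataset
X = df.drop(columns=["is_adenoma"])
y = df["is_adenoma"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

# The model is useful only for this narrow task; it carries no general
# intelligence beyond the distribution of data it was trained on.
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"Held-out AUC: {auc:.2f}")
```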
Challenges facing implementation of artificial intelligence in medicine
Machine learning is, first and foremost, a tool to be used in clinical care. Like any tool, its purpose must be judiciously and thoughtfully considered prior to its use. Machine learning tools depend heavily on the data used in training and development and may include mathematical and statistical assumptions that are unfamiliar to most clinicians. Logistic challenges can be categorized into understanding the care delivery process, data management, and algorithmic understanding. However, the overall environment for integrating AI-assisted tools needs more development to promote wider uptake. This includes regulatory guidance, standardized payment, and ethical challenges in data privacy, equity, and fairness (Figure 1).
Figure 1. Schematic reflecting core challenges to artificial intelligence in medicine.
Challenge 1: Understanding the care delivery process.
Before any machine learning tool is considered, a deep understanding of the problem and the associated care delivery process is the key to any application of AI to clinical care. The starting point should follow the framework suggested by Isaac Kohane: is the task simple or complex?2,3 The task includes the clinical question and also defines the specific areas of the clinical process that can be optimized; targeting these areas with AI as a tool yields the maximal benefit and value for patient care and provider satisfaction. On a practical level, depending on the specific clinical problem and care process, the type of algorithm can be selected according to the required level of performance and the amount of data available.
There is growing recognition of the critical role of implementation of AI tools into the clinical process, with the goal to “design the best possible care delivery system for a given problem.”4–6 This is usually an iterative process that examines the delivery workflow before, during, and after implementation of the AI tool and focuses on designing and improving user interfaces.
Challenge 2: Data management.
Data management is critical for AI because, as the name machine learning suggests, models must have robust data from which to learn the relevant patterns.
The first challenge that must be addressed is the availability of high-quality data that are readily captured and accessible and can be generated with each iteration.2 The principle “garbage in, garbage out” captures the core concept: high-quality data must be used to train and test algorithmic performance. While advances in algorithmic development may help, the basis of most algorithms rests on the data itself.7 The implications for data management do not end after training and testing; once the algorithms are trained and tested on high-quality data, there should be a pipeline of consistently labeled data that can be used to continuously train the model in the “virtuous cycle” of the data ecosystem. To address this, many health-care systems have worked towards data standardization and interoperability across platforms to ensure that imaging and clinical data can be pooled and used to generate consistent results.
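A minimal sketch of the kind of automated quality checks that might guard the head of such a pipeline, assuming Python with pandas; the file, columns, and thresholds are hypothetical:

```python
# Sketch of automated data-quality checks at the head of a training
# pipeline ("garbage in, garbage out"). File and column names are
# hypothetical.
import pandas as pd

def check_quality(df: pd.DataFrame, label_col: str) -> list:
    issues = []
    # Missingness: flag columns with a large fraction of missing values.
    for col, frac in df.isna().mean().items():
        if frac > 0.10:
            issues.append(f"{col}: {frac:.0%} missing")
    # Label consistency: values outside the expected set suggest
    # inconsistent annotation across sites or labeling iterations.
    unexpected = set(df[label_col].dropna().unique()) - {0, 1}
    if unexpected:
        issues.append(f"unexpected labels: {unexpected}")
    # Duplicates: repeated records can leak between train and test splits.
    n_dup = int(df.duplicated().sum())
    if n_dup:
        issues.append(f"{n_dup} duplicate rows")
    return issues

df = pd.read_csv("gi_bleed_cohort.csv")       # hypothetical dataset
for issue in check_quality(df, label_col="needs_intervention"):
    print("WARN:", issue)
```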
Another challenge is data bias, or errors that can lead to predictions that reinforce incorrect practice patterns and may unintentionally worsen disparities in clinical care. For example, if an algorithm predicting risk of hospital-based intervention for patients presenting with overt gastrointestinal bleeding is trained in a setting where overtransfusion is routine practice, the algorithm may recommend admission for patients who could safely be discharged to outpatient care. Likewise, if a clinical dataset has gaps in data from vulnerable or underserved populations, biased attitudes and practices may propagate into the output of the algorithm.8 Although the potential for bias will always exist, rigorous validation can mitigate its effect on algorithmic performance. Ideally, study designs for validation should include internal validation, external validation, calibration, and appropriate statistical testing that compares model performance with a control. External validation is particularly critical for both clinical and imaging data, because bias can be mitigated when data are pooled from multiple patient populations, centers, contexts, and manufacturers.
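To illustrate these validation steps, the sketch below contrasts internal and external validation with measures of discrimination and calibration, assuming scikit-learn; the cohorts, predictors, and outcome label are hypothetical.

```python
# Sketch of internal versus external validation with discrimination and
# calibration, assuming scikit-learn. The cohorts, predictors, and
# outcome label are hypothetical.
import pandas as pd
from sklearn.calibration import calibration_curve
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss, roc_auc_score

internal = pd.read_csv("center_a.csv")    # development cohort (hypothetical)
external = pd.read_csv("center_b.csv")    # independent cohort (hypothetical)

features = ["age", "hemoglobin", "systolic_bp"]   # hypothetical predictors
model = LogisticRegression().fit(internal[features], internal["outcome"])

for name, cohort in [("internal", internal), ("external", external)]:
    probs = model.predict_proba(cohort[features])[:, 1]
    # Discrimination: does the model rank events above non-events?
    auc = roc_auc_score(cohort["outcome"], probs)
    # Calibration: do predicted probabilities match observed frequencies?
    brier = brier_score_loss(cohort["outcome"], probs)
    obs, pred = calibration_curve(cohort["outcome"], probs, n_bins=5)
    print(f"{name}: AUC={auc:.2f}, Brier={brier:.3f}")
    print("  observed/predicted by bin:",
          [f"{o:.2f}/{p:.2f}" for o, p in zip(obs, pred)])
```

A model that discriminates well internally but calibrates poorly on the external cohort is a typical signature of the population bias described above.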
Prospective studies of deployed AI tools with iterative feedback and monitoring can identify areas of bias that can then be corrected. Clinical trial designs for AI tools have been proposed based on the type of task, including randomized controlled trials, random tandem trials, A/B testing, and experimental designs for quality improvement initiatives.9 Guidelines have recently been published for clinical trial protocols (SPIRIT-AI) and randomized controlled trial reports (CONSORT-AI) involving AI interventions, which can be used to design rigorous, high-quality studies.9,10
Ongoing data maintenance is important to ensure that changes in patient populations or clinical care do not affect performance of the AI-assisted tool. Monitoring performance and retraining should be considered from the beginning, because there may be differences in data trends reflecting differences in patient outcomes due to dataset shift (practices and populations evolve over time), new therapies, or evolving epidemiology. Algorithmic stewardship is a new systems-level concept that includes maintaining an inventory of existing algorithms, conducting regular audits of safety and fairness, and constantly monitoring performance to prevent degradation over time.11,12 As new data are used to update the AI, the algorithms will learn according to the input data and outcomes observed. As the algorithms optimize to learn the patterns of data that predict the desired outcome, performance will either improve through learning the actual predictors or deteriorate due to biases in the data, leading to misclassification. Ongoing expert surveillance is key to determining whether an issue stems from poor-quality input data or algorithmic error. With retraining and careful algorithmic monitoring, the model should have equal if not better performance over time. The burden of maintenance and monitoring should ideally fall not on the medical institutions or users but rather on the third-party vendor, who should take responsibility for setting up the data streams for regular updates, performing checks for algorithmic maintenance, and establishing protocols to investigate when an algorithm misfires.
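As one illustration of this kind of performance monitoring, the sketch below evaluates discrimination over quarterly windows of logged predictions and flags degradation beyond an assumed threshold; the log file, column names, baseline, and alert threshold are all illustrative assumptions.

```python
# Sketch of ongoing performance monitoring for dataset shift. The log
# file, column names, baseline AUC, and alert threshold are illustrative
# assumptions.
import pandas as pd
from sklearn.metrics import roc_auc_score

BASELINE_AUC = 0.85    # performance at deployment (assumed)
ALERT_DROP = 0.05      # degradation that should trigger an audit (assumed)

log = pd.read_csv("prediction_log.csv", parse_dates=["date"])
log = log.set_index("date").sort_index()

# Evaluate discrimination over quarterly windows of deployed predictions.
for quarter, chunk in log.groupby(pd.Grouper(freq="QS")):
    if chunk["outcome"].nunique() < 2:
        continue    # AUC is undefined without both outcome classes
    auc = roc_auc_score(chunk["outcome"], chunk["predicted_prob"])
    flag = "  <-- audit for dataset shift" if BASELINE_AUC - auc > ALERT_DROP else ""
    print(f"{quarter:%Y-%m}: AUC={auc:.2f}{flag}")
```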
Finally, cost-effectiveness should be assessed at both the institutional level and across national and international boundaries to ensure that sustainable value is realized for health-care systems. If the costs of maintaining an AI system outstrip potential savings, scarce resources should be reallocated to other areas. In AI-enhanced screening and surveillance colonoscopies, the AI-enhanced polyp leave-in strategy, compared with a resect-all-polyps strategy, appears to yield savings in multiple health-care systems throughout the world. A recent cost-effectiveness study estimated savings of 18.9% and US$149.2 million in Japan, 6.9% and US$12.3 million in England, 7.6% and US$1.1 million in Norway, and 10.9% and US$85.2 million in the USA.13 However, this study evaluated only one specific AI tool and did not comprehensively evaluate the cost of implementing process changes, including increased time for each procedure and the actual real-world patterns of endoscopists using the tool.
Challenge 3: Algorithmic understanding.
Algorithmic interpretability is particularly important in clinical care, where providers have developed deep expertise that can take into account factors not captured by the machine learning model.11 In gastroenterology, practitioners are specialists in the field and should have the ability to verify system performance. Use of AI tools must consider the balance of power, in particular how AI tools may impinge on the professional authority of clinicians.5 Furthermore, by understanding how a prediction is made, practitioners can assess whether it is generated from actual signal or distorted by confounding variables. Finally, the generated patterns can be tested and integrated into current scientific understanding to advance clinical care more generally.
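One widely used, model-agnostic way to probe what drives a prediction is permutation importance: shuffling each feature and measuring the resulting drop in performance. A minimal sketch, assuming scikit-learn and a hypothetical dataset:

```python
# Sketch of model-agnostic interpretability via permutation importance,
# assuming scikit-learn. The dataset, features, and label are hypothetical.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

df = pd.read_csv("ibd_cohort.csv")                   # hypothetical dataset
X, y = df.drop(columns=["flare"]), df["flare"]       # hypothetical label
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = GradientBoostingClassifier().fit(X_train, y_train)

# Shuffle each feature in turn and measure the drop in held-out
# performance: large drops identify features the model actually relies on.
result = permutation_importance(model, X_test, y_test, n_repeats=10,
                                random_state=0)
for idx in result.importances_mean.argsort()[::-1]:
    print(f"{X.columns[idx]}: {result.importances_mean[idx]:.3f}")
```

A feature whose permutation collapses performance may reflect genuine signal or a confounder, and it is exactly this distinction that clinical expertise can adjudicate.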
Challenge 4: Algorithmic adversarial attacks.
Adversarial attacks that exploit weaknesses in current algorithms by manipulating the input data are an emerging challenge with particular importance for health care.14 Currently, the most active area in the USA involves insurance claims approvals via billing codes, because insurance companies deploy machine learning models to classify certain claims. Of particular interest for gastroenterologists, however, is the potential for visually imperceptible “adversarial noise” added to images, which can cause deep learning models deployed on imaging to misdiagnose or miss pathology. Proposed defences include backups that provide a “fingerprint” of the data, extracted and stored immediately after capture. This fingerprint can then be compared with the image used for analysis to detect data tampering and can also be used to build resiliency into algorithms during real-time deployment.
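A minimal sketch of this fingerprint defence, assuming a cryptographic hash (SHA-256) computed at capture and recomputed before analysis; the file paths are hypothetical:

```python
# Sketch of the "fingerprint" defence: hash each image at capture, store
# the digest, and recompute it before analysis. File paths are hypothetical.
import hashlib
from pathlib import Path

def fingerprint(path: Path) -> str:
    """Return a SHA-256 digest of the raw image bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

# Immediately after capture: store the digest with the study record.
captured_digest = fingerprint(Path("capture/endoscopy_frame_0042.png"))

# At analysis time: recompute on the submitted image and compare.
submitted_digest = fingerprint(Path("incoming/endoscopy_frame_0042.png"))
if submitted_digest != captured_digest:
    raise ValueError("Image does not match captured fingerprint; "
                     "possible tampering, do not run the model")
```

Because any single-pixel change alters the digest, this check detects tampering but cannot identify what was changed; it is a tripwire, not a repair mechanism.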
Challenge 5: Regulatory guidance.
Regulatory guidance is underdeveloped globally. Despite strong efforts at the national, regional, and international levels to develop frameworks that ensure quality control and patient safety, there is still considerable uncertainty regarding the requirements that will be enacted.15 Quality control through regulation is being developed at the international level through the International Medical Device Regulators Forum and at the regional and national levels: the USA has proposed a regulatory framework for Software as a Medical Device through the Food and Drug Administration, the European Union has enacted the General Data Protection Regulation, and the China State Council has issued a development plan for AI.15
Challenge 6: Liability and legal responsibility.
Given the regulatory ambiguity above, liability and reimbursement are not clearly defined. The issue of liability is critical for both firms developing AI-assisted tools and medical provider end-users. If patients experience an adverse event based on clinical decisions made with AI-assisted tools, who is accountable? For a “black box” tool, liability for adverse events to patients based on decisions made using AI-assisted tools may be shouldered by the manufacturer. If the output is sufficiently interpretable, however, liability would likely be borne by the medical provider who made the clinical decision. Reimbursement should ideally compensate whichever party shoulders the risk; however, AI-specific reimbursement is notably absent from national health-care systems and private payors. Currently, there is only one instance of payment specifically for use of an AI tool, through the Centers for Medicare and Medicaid Services under the framework of new technology add-on payments, upcoming for fiscal year 2022. No other system-wide payment mechanism currently exists for AI tools, particularly AI-enhanced endoscopy.
Challenge 7: Ethical issues.
Ethical challenges concern the interaction of these algorithms with human health and the safeguards that should be put in place to mitigate their potential adverse effects. Challenges include defining the role of informed consent in data utilization, maintaining privacy compliance across the spectrum of data users, returning results from analyses using patient data, and addressing equity in algorithmic development. Informed consent is a cornerstone of medical research, recognizing the autonomy of patients and their rights over their medical data. However, a challenge that should be considered in AI is the potential for using the same patient data both for specific conditions (e.g. inflammatory bowel disease) and in aggregate (e.g. if the data are sent for epidemiological purposes).16 Privacy is also challenging to maintain when AI may involve a host of third-party partners, including vendors, software developers, data scientists, and other systems. In particular, the USA requires HIPAA compliance across the spectrum of data users, and thus it is important to consider how to deidentify data and maintain a secure dataflow.16 One specific area where this is important is in considering how and to whom results of the AI tool should be returned. This has implications for shared decision-making between patients and providers, because the findings may affect how the patient thinks about the next step in their treatment plan. By considering who should have access, what should be shared, and which threshold should be used to share specific findings, the patient and provider can use the AI tool to assist in planning further care and avoid miscommunication. Finally, equitable access to both the training and deployment of AI tools should be considered as the technology develops. A recent study found severe disparity in the geographic distribution of deep learning algorithms in clinical applications, with patient cohort data coming predominantly from three states: California, New York, and Massachusetts.17 Representation is key, because patient outcomes and clinical care may vary across geography, ethnicity, and socioeconomic status. If these aspects are considered, modifications can be made to decrease the risk of health inequities, such as reconsidering race correction in clinical algorithms.18
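As a minimal illustration of one piece of a secure dataflow, the sketch below strips direct identifiers and replaces the record key with a salted hash before data are shared; the field names are hypothetical, and full HIPAA Safe Harbor deidentification involves many more identifier categories (dates, geographic detail, and others) than shown here.

```python
# Sketch of basic deidentification before sharing data with third-party
# partners. Field names are hypothetical; full HIPAA Safe Harbor
# deidentification removes 18 identifier categories, not just these.
import hashlib
import pandas as pd

SALT = "replace-with-secret-from-secure-vault"   # never hard-code in practice
DIRECT_IDENTIFIERS = ["name", "address", "phone", "mrn"]

def pseudonymize(patient_id: str) -> str:
    # Salted hash: a stable study key for record linkage that does not
    # expose the medical record number itself.
    return hashlib.sha256((SALT + patient_id).encode()).hexdigest()[:16]

df = pd.read_csv("ibd_registry.csv")             # hypothetical dataset
df["study_id"] = df["mrn"].astype(str).map(pseudonymize)
deidentified = df.drop(columns=DIRECT_IDENTIFIERS)
deidentified.to_csv("ibd_registry_deid.csv", index=False)
```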
The future
Hype for AI is not new; in the history of the field, waves of tremendous enthusiasm gave way to several “AI winters” in the 1970s and 1980s, periods of reduced funding and profound disillusionment. The current hype seems to mirror these historical trends, particularly in claims that deep learning tools outperform clinicians, as noted in a recent systematic review.3
This time, however, things may be different. Clear frameworks are emerging to guide best practice and weigh the claims of machine learning studies, infrastructure for data storage is abundant, and the revolution in computational processing expands the capacity to handle ever-increasing amounts of data. More importantly, open-source machine learning tools have been democratized across disciplines through programming languages with pre-written, readily available software libraries such as TensorFlow, Scikit-Learn, and PyTorch. The rapid emergence of a multidisciplinary approach and awareness of AI tools holds promise for the use of AI-assisted tools to guide and enhance our clinical practice.
Currently, the role envisioned for machine learning tools is primarily to assist clinicians in making decisions for patient care. In the future, it is conceivable that integration of multimodal streams of data could frame the significance of findings, shifting the role of clinicians from gathering and analyzing information to spending time with patients and helping them navigate their disease experience. Eric Topol of Scripps Research Institute, a leading thinker on AI and medicine, has emphasized that even when AI is able to deliver results similar or superior to those of humans, it still cannot replace the human side of empathetically being with patients.19 The opportunity lies in reducing the time burden of collating patient data and performing preliminary analyses, and instead spending that time interpreting results and managing patient expectations to help them cope as they progress through the treatment plan.
From optical biopsies and enhanced routine colonoscopies to selecting the optimal immunomodulator drug for inflammatory bowel disease, AI tools will potentially transform our practice by leveraging massive amounts of data to personalize care to the right patient, in the right amount, at the right time.
Acknowledgment
This work was supported by the NIH training grant T32 DK007017.
References
- 1. Densen P. Challenges and opportunities facing medical education. Trans. Am. Clin. Climatol. Assoc. 2011; 122: 48–58.
- 2. Rajkomar A, Dean J, Kohane I. Machine learning in medicine. N. Engl. J. Med. 2019; 380: 1347–58.
- 3. Nagendran M, Chen Y, Lovejoy CA et al. Artificial intelligence versus clinicians: systematic review of design, reporting standards, and claims of deep learning studies. BMJ 2020; 368: m689.
- 4. Shaw J, Rudzicz F, Jamieson T, Goldfarb A. Artificial intelligence and the implementation challenge. J. Med. Internet Res. 2019; 21: e13659.
- 5. Sendak MP, Ratliff W, Sarro D et al. Real-world integration of a sepsis deep learning technology into routine clinical care: implementation study. JMIR Med. Inform. 2020; 8: e15182.
- 6. Li RC, Asch SM, Shah NH. Developing a delivery science for artificial intelligence in healthcare. npj Digit. Med. 2020; 3: 107.
- 7. Rajkomar A, Oren E, Chen K et al. Scalable and accurate deep learning with electronic health records. npj Digit. Med. 2018; 1: 18.
- 8. Gianfrancesco MA, Tamang S, Yazdany J, Schmajuk G. Potential biases in machine learning algorithms using electronic health record data. JAMA Intern. Med. 2018; 178: 1544–7.
- 9. Rivera SC, Liu X, Chan A-W, Denniston AK, Calvert MJ. Guidelines for clinical trial protocols for interventions involving artificial intelligence: the SPIRIT-AI extension. BMJ 2020; 370: m3210.
- 10. Liu X, Rivera SC, Moher D, Calvert MJ, Denniston AK. Reporting guidelines for clinical trial reports for interventions involving artificial intelligence: the CONSORT-AI extension. BMJ 2020; 370: m3164.
- 11. Kelly CJ, Karthikesalingam A, Suleyman M, Corrado G, King D. Key challenges for delivering clinical impact with artificial intelligence. BMC Med. 2019; 17: 195.
- 12. Eaneff S, Obermeyer Z, Butte AJ. The case for algorithmic stewardship for artificial intelligence and machine learning technologies. JAMA 2020.
- 13. Mori Y, Kudo S-e, East JE et al. Cost savings in colonoscopy with artificial intelligence-aided polyp diagnosis: an add-on analysis of a clinical trial (with video). Gastrointest. Endosc. 2020; 92: 905–11.e1.
- 14. Finlayson SG, Bowers JD, Ito J, Zittrain JL, Beam AL, Kohane IS. Adversarial attacks on medical machine learning. Science 2019; 363: 1287–9.
- 15. He J, Baxter SL, Xu J, Xu J, Zhou X, Zhang K. The practical implementation of artificial intelligence technologies in medicine. Nat. Med. 2019; 25: 30–6.
- 16. Shen FX, Wolf SM, Gonzalez RG, Garwood M. Ethical issues posed by field research using highly portable and cloud-enabled neuroimaging. Neuron 2020; 105: 771–5.
- 17. Kaushal A, Altman R, Langlotz C. Geographic distribution of US cohorts used to train deep learning algorithms. JAMA 2020; 324: 1212–3.
- 18. Vyas DA, Eisenstein LG, Jones DS. Hidden in plain sight—reconsidering the use of race correction in clinical algorithms. N. Engl. J. Med. 2020; 383: 874–82.
- 19. Topol E. Deep Medicine: How Artificial Intelligence Can Make Healthcare Human Again. New York: Basic Books, 2019.
