Abstract
Purpose of review:
Artificial intelligence (AI) is a broad term that pertains to a computer’s ability to mimic and sometimes surpass human intelligence in interpretation of large datasets. The adoption of AI in gastrointestinal motility has been slower compared to other areas such as polyp detection and interpretation of histopathology.
Recent findings:
Within esophageal physiologic testing, AI can automate interpretation of image-based tests, especially high resolution manometry (HRM) and functional luminal imaging probe (FLIP) studies. Basic tasks such as identification of landmarks, determining adequacy of the HRM study, and differentiation of achalasia from non-achalasia patterns are achieved with good accuracy. However, existing AI systems compare AI interpretation to expert analysis rather than to clinical outcomes of management based on AI diagnosis. The use of AI methods is much less advanced within the field of ambulatory reflux monitoring, where challenges exist in assimilation of data from multiple impedance and pH channels. There remains potential for replication of the AI successes within esophageal physiologic testing to HRM of the anorectum, and to innovative and novel methods of evaluating gastric electrical activity and motor function.
Summary:
The use of AI has tremendous potential to improve detection of dysmotility within the esophagus using esophageal physiologic testing, as well as in other regions of the gastrointestinal tract. Eventually, integration of patient presentation, demographics and alternate test results to individual motility test interpretation will improve diagnostic precision and prognostication using AI tools.
Keywords: artificial intelligence, machine learning, high resolution manometry, functional lumen imaging probe
Artificial intelligence (AI) is a broad term that pertains to a computer’s ability to mimic human intelligence. Interest in the application of AI to healthcare has surged in recent years1. The utilization of AI in medicine is extensive, encompassing methodology ranging from common statistical tools (e.g., linear regression), to more complex deep learning methods (e.g., convolutional neural networks, CNN), and even generative models (e.g., ChatGPT, Figure 1). The ability to quickly interrogate vast amounts of data and automatically search and extract complex features makes many AI models ideal for use in healthcare.
Figure 1.

Definitions of terminology used in describing artificial intelligence (AI) methodology. ML: machine learning
Although AI uptake in clinical gastroenterology has been rapid and diverse, including assisted endoscopy, diagnosis, prognostication, and treatment selection, the jury is still out on whether AI truly improves the gastroenterologist’s day-to-day clinical activities2. The most extensively studied, and perhaps most broadly adopted, AI applications in gastroenterology involve computer-aided detection of polyps during colonoscopy, which utilizes deep learning models that interpret images in real time. However, data from randomized controlled studies have been mixed on whether the augmentation of polyp detection is clinically relevant, even as the technology is now commercially available3–5. Although trepidation regarding AI exists among clinicians, studies of AI in clinical gastroenterology continue, and now include AI-assisted evaluation of pathologic samples, especially identification of dysplastic lesions within Barrett’s esophagus, and automated endoscopic scoring of inflammation in inflammatory bowel disease6–9.
While the adoption of AI in gastrointestinal motility has been slower compared to other areas, technological advancements have created opportunities for automated interpretation of high resolution manometry (HRM) and functional lumen imaging probe (FLIP) studies10. Manometric software packages now incorporate algorithms for the automatic identification of anatomic landmarks, along with diagnostic interpretation, with varying degrees of accuracy. The application of AI methods to gastrointestinal motility has primarily focused on esophageal testing and interpretation, as evidenced by recent studies.11, 12 In this review, we discuss current AI applications within esophageal physiologic testing, and categorize the various AI tools available to the clinician in the evaluation, diagnosis, and management of esophageal motility disorders.
WHAT IS ARTIFICIAL INTELLIGENCE?
At its core, AI consists of the use of complex computer programs as ‘intelligent’ interpreters of datasets to identify patterns that may not be apparent to a human or too cumbersome for manual analysis. The identification of data patterns is termed ‘learning’, and therefore ‘machine learning’ is a term that sometimes is used synonymously with AI, although in essence, machine learning is a more complex subset within AI systems (Figure 1). Varying degrees of human input may be required for AI processes. Once the AI system has learnt the patterns within a dataset, the option remains for certain AI systems to continue computing without further human input to identify further relationships within the dataset. Patterns identified by AI are expressed as mathematical models, which can then be utilized to interrogate other new or existing data sets to make clinical predictions.
In addition to large datasets with adequate volume, clinical AI models depend on diverse data containing a wide array of patterns and diagnoses from representative populations. This diversity ensures generalizability of models to unseen data and different patient cohorts. However, a current challenge is the interpretation of non-diverse data, as this can lead to models that inadvertently encode biases into their underlying algorithms.13
Although AI methods have existed since the mid-twentieth century, the combination of increased computing power, abundant data, and innovative self-supervised learning techniques has equipped researchers with novel tools for research and data interpretation.14 However, despite the promises offered by AI, challenges persist. These include a lack of transparency in deep-learning methods and difficulties in both reproducing and auditing results.14 In spite of these, thoughtful applications of AI have the potential to revolutionize clinical medicine.
ARTIFICIAL INTELLIGENCE TOOLS
Machine learning and deep learning methods represent the core methods for clinical AI applications, and are mathematical models designed to interpret data, recognize patterns, and make predictions (Figure 2). Broadly speaking, two forms of machine learning are utilized: supervised and unsupervised.
Figure 2.

Hierarchical organization of common machine learning processes: Machine learning is broadly divided into supervised and unsupervised learning. Supervised learning relies on labeled data for prediction and inference. It is further divided into classification and regression methods. Classification is used for categorical outcomes, while regression is typically used for quantitative outcomes. Several commonly used classification and regression methods are listed. Unsupervised learning identifies patterns and trends in unlabeled data. Unsupervised learning includes clustering methods, of which several commonly used approaches are listed.
Supervised AI methods rely on labeled data to accurately classify or predict outcomes. Data are divided into two or three portions. A training dataset is used to train the algorithm and, where applicable, aid in parameter selection. A validation dataset is then typically used to confirm accuracy and tune the model, after which the resulting algorithm is applied to a held-out test dataset to demonstrate efficacy. The necessity of dividing data in this way means that researchers must reserve a portion of their data for validation, and the proportion held out directly reduces the data available for training. While there is no explicit standard for how much data to reserve for validation, various methods exist to optimize validation while minimizing the loss of training data. K-fold cross-validation is a commonly used approach that divides the entire dataset into K folds, with typical values of K being 5 or 10. In 10-fold cross-validation, the model is trained on 9/10ths of the data, with the remaining 1/10th reserved for validation; this process is repeated 10 times, with a different 1/10th of the data used for validation in each iteration.
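The fold mechanics described above can be sketched in a few lines of Python. This is an illustrative toy, not the implementation used by any of the cited studies; real pipelines (e.g., scikit-learn's `KFold`) typically shuffle the data before splitting.

```python
# Sketch of 10-fold cross-validation: each fold serves once as the
# validation set while the remaining nine folds are used for training.

def k_fold_splits(n_samples, k=10):
    """Yield (train_indices, validation_indices) for each of k folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        held_out = set(validation)
        train = [i for i in range(n_samples) if i not in held_out]
        yield train, validation
        start += size

splits = list(k_fold_splits(100, k=10))
# Every sample appears in exactly one validation fold, and each
# training set holds the remaining 9/10ths of the data.
```

Each study (or swallow) thus contributes to validation exactly once, so no data are permanently sacrificed from training.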
Common supervised methods include regression and classification techniques. Simpler “inflexible” methods like linear and logistic regression excel at inferential analysis, providing easily interpretable parameters and suitability for smaller data sets. In contrast, more complex and “flexible” models, such as neural networks and ensemble methods, are adept at predictive modeling and classification, particularly when dealing with high-dimensional data sets with numerous variables. However, they require large sample sizes and are susceptible to overfitting.
The most advanced supervised models, termed deep learning models, require a substantial volume of data for effective model development. However, even where adequate data exist, errors can arise. For example, repeatedly reusing the same data for training can be tempting, as it increases the volume of training material, but this approach risks the algorithm learning spurious patterns specific to the training data rather than the task at hand, a phenomenon known as ‘overfitting’. Overfit models excel on the training data set but exhibit poor performance on the validation data set.15
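Overfitting can be demonstrated with a deliberately memorizing model. The sketch below is a toy with invented synthetic data: a 1-nearest-neighbor classifier memorizes noisy training labels perfectly, then performs markedly worse on unseen data.

```python
# Toy illustration of overfitting: a 1-nearest-neighbor model memorizes
# its training data (perfect training accuracy) but generalizes poorly
# when the labels contain noise. All data here are synthetic.
import random

random.seed(0)

def make_data(n):
    # True rule: label = 1 if x > 0.5, but 30% of labels are flipped (noise).
    data = []
    for _ in range(n):
        x = random.random()
        y = 1 if x > 0.5 else 0
        if random.random() < 0.3:
            y = 1 - y
        data.append((x, y))
    return data

def predict_1nn(train, x):
    # Memorize: return the label of the single closest training point.
    return min(train, key=lambda p: abs(p[0] - x))[1]

def accuracy(train, eval_set):
    hits = sum(predict_1nn(train, x) == y for x, y in eval_set)
    return hits / len(eval_set)

train, validation = make_data(200), make_data(200)
train_acc = accuracy(train, train)       # perfect: the model memorized the noise
valid_acc = accuracy(train, validation)  # noticeably lower on unseen data
```

The gap between the two accuracies is exactly the symptom described above: excellent performance on the learning set, degraded performance on validation.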
Unsupervised models are, in essence, AI applications that do not require data labeling by humans. These AI models are tasked with identifying patterns and clustering unlabeled data. The machine is given access to a dataset and will look for patterns agnostic of human interpretation. The resultant algorithms can be enlightening and can aid in future investigations, but are often viewed as less ideal for clinical applications, since the factors that led to internal algorithm generation are not obvious and, in many instances, unknown.
Unsupervised methods, including K-means clustering and principal component analysis, are often applied during exploratory data analysis. These methods aid in identifying relationships and unique subpopulations within the data that might not be immediately apparent to a human observer. Additionally, they can assist with condensing data into a smaller number of variables, which is more conducive to subsequent supervised learning analyses.
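The clustering idea can be made concrete with a minimal K-means sketch. This is illustrative only, on invented one-dimensional data; library implementations (e.g., scikit-learn's `KMeans`) add smarter initialization and convergence checks.

```python
# Minimal K-means clustering sketch (unsupervised): group unlabeled points
# into k clusters by alternating an assignment step and a centroid update.

def kmeans(points, centroids, n_iter=10):
    clusters = [[] for _ in centroids]
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid's cluster.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# Two obvious subpopulations hidden in unlabeled 1-D data:
points = [1.0, 1.2, 0.8, 9.0, 9.2, 8.8]
centroids, clusters = kmeans(points, centroids=[0.0, 10.0])
# The centroids converge near 1.0 and 9.0, recovering the two groups
# without any human-provided labels.
```

No labels were supplied, yet the algorithm recovers the two subpopulations, which is precisely the exploratory value described above.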
The choice of the most suitable AI tool depends on the goals of the clinician or researcher. For tasks involving causal inference, classic methods like linear or logistic regression are preferred as they provide parameter estimates with clearly defined interpretations. When it comes to predictive modeling, classic methods can be effective, but more complex tools such as random forests or gradient boosting can also offer valuable insights. For image recognition, deep-learning methods such as neural networks excel, but they come at the cost of reduced transparency in the model.
Another promising aspect of AI innovation in healthcare is the integration and interpretation of high-dimensional multimodal data. This involves the ability to combine a wide range of patient information from medical records with data from motility testing to enhance diagnosis and predict clinical outcomes. AI methods that facilitate this integration include automated feature selection and classification tools. These tools are instrumental in identifying essential predictor variables, streamlining the diagnostic process, and improving the accuracy of clinical outcome predictions.10
ARTIFICIAL INTELLIGENCE IN ESOPHAGEAL PHYSIOLOGIC TESTING
Esophageal physiologic testing, as a field, is well-suited for AI-driven transformation due to the substantial volume of data generated by motility testing and the well-defined classification framework for esophageal motility disorders (Table 1).16, 17 AI also holds the potential to improve decision-making regarding optimal treatment options and predicting patient outcomes.
Table 1.
Artificial Intelligence in Esophagology
| Study | Machine Learning Method | Clinical Value |
|---|---|---|
| High Resolution Manometry (HRM) | | |
| Jungheim et al (2016)18 | Sequence labelling, logistic regression | Determination of normal UES restitution time |
| Rohof et al (2014)24 | Regression (cubic spline) | Identification of key pressure and impedance locations within HRM topography for the automated interpretation of intrabolus pressure and IRP |
| Czako et al (2021)25 | CNN | Detection of failed catheter placement and categorization of IRP values |
| Popa et al (2022)26 | CNN | Classification of HRM Clouse plots according to Chicago classification version 3.0 |
| Surdea-Blaga et al (2022)27 | CNN | Classification of HRM Clouse plots according to Chicago classification version 3.0 |
| Kou et al (2022)10 | CNN, ANN, gradient boosting | Classification of HRM Clouse plots according to Chicago classification version 3.0 |
| Kou et al (2021)28 | ANN (VAE), LDA | Identification of three distinct clusters in HRM amenable to ML classification |
| Kou et al (2022)29 | RNN | Automatic classification of swallow types based on HRM data |
| Functional Lumen Imaging Probe (FLIP) | | |
| Kou et al (2023)23 | CNN | Automated FLIP study interpretation |
| Carlson et al (2021)30 | Decision tree | Prediction of HRM achalasia subtypes from FLIP metrics |
| Schauer et al (2022)31 | Bayesian additive regression trees | Estimated probability of esophageal obstruction from FLIP metrics |
| Ambulatory pH-Impedance Monitoring | | |
| Rogers et al (2021)33 | Decision tree | Automated extraction of baseline impedance and novel outcome prediction metrics for GERD management. |
| Wong et al (2023)34 | CNN | Automatic identification of reflux episodes and PSPW index. |
ANN, artificial neural network; CNN, convolutional neural network; FLIP, functional luminal imaging probe; HRM, high resolution esophageal manometry; IRP, integrated relaxation pressure; LDA, linear discriminant analysis; PSPW, post reflux swallow-induced peristaltic wave; RNN, recurrent neural network; UES, upper esophageal sphincter; VAE, variational autoencoder
Existing automated interpretation of common manometry and reflux monitoring metrics represents a simplistic form of machine learning, where decision tree algorithms identify HRM pressure ramps to recognize the locations of the upper esophageal sphincter (UES), lower esophageal sphincter (LES), and diaphragmatic crural impression18. These pressure ramps have also been utilized to delineate the extents of pressure phenomena and to calculate values for commonly used software metrics, including integrated relaxation pressure (IRP), distal latency (DL), and distal contractile integral (DCI), as well as peristaltic integrity through identification of peristaltic breaks19, 20. Within ambulatory reflux monitoring, total acid exposure time, reflux episode numbers, and mean nocturnal baseline impedance can be extracted from the raw data collected21.
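The sphincter-localization step can be caricatured as a simple decision rule. The sketch below is a hypothetical simplification: sphincters appear as contiguous runs of sensors with elevated resting pressure. The sensor values and the 20 mmHg cutoff are invented for illustration; real software analyzes pressure ramps over time, not a single static profile.

```python
# Hypothetical sketch of threshold-based landmark detection along a
# manometry catheter: high-pressure zones (UES, LES candidates) appear
# as contiguous runs of sensors whose pressure exceeds a cutoff.

def find_high_pressure_zones(pressures_mmhg, cutoff=20.0):
    """Return (start, end) sensor-index runs where pressure exceeds cutoff."""
    zones, start = [], None
    for i, p in enumerate(pressures_mmhg):
        if p > cutoff and start is None:
            start = i
        elif p <= cutoff and start is not None:
            zones.append((start, i - 1))
            start = None
    if start is not None:
        zones.append((start, len(pressures_mmhg) - 1))
    return zones

# Resting pressures from a hypothetical 12-sensor catheter, proximal to distal:
profile = [5, 40, 55, 8, 6, 7, 5, 9, 35, 42, 10, 12]
zones = find_high_pressure_zones(profile)
# Two zones emerge: sensors 1-2 (UES candidate) and 8-9 (LES candidate)
```

Rule-based localization of this kind is what commercial packages automate; the AI studies discussed below move well beyond it.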
Potential areas of AI applications in esophageal motility include settings where increasing diagnostic confidence can augment patient management, such as differentiating conclusive and actionable esophagogastric junction outflow obstruction (EGJOO) from non-actionable artifactual elevation in EGJ outflow metrics. This can be extended to predictive modeling in order to better understand which patients with EGJOO will have spontaneous resolution of their symptoms, which will have persistent symptoms, and which will progress to achalasia.22 Other predictive modeling questions center on selection and response to therapy.
Neural networks may also enhance and automate the interpretation of imaging including timed barium esophagrams, FLIP tracings, and esophageal manometry Clouse plots. A convolutional neural network has already been employed for the interpretation of FLIP studies.23
HIGH RESOLUTION MANOMETRY
Since its inception, traditional HRM has been a target for attempted automated analysis. The majority of these initial endeavors predated the influx of AI methods that have more recently come to the forefront. Early investigators utilized a MATLAB-based extension, termed AIM-plot, to identify key pressure and impedance locations within HRM topography for the automated interpretation of intrabolus pressure and integrated relaxation pressure (IRP).24 They demonstrated excellent reliability compared to manual identification of key metrics.
Within a few years, machine learning techniques were being applied to HRM analysis. Small studies surfaced utilizing standard techniques based on logistic regressions. In one study, investigators used regression modeling to characterize UES landmarks in 15 healthy volunteers, and were able to identify UES restitution times with reasonable correlation scores (≥0.63).18 Using more advanced techniques, European investigators reported machine learning techniques applied to identify failed catheter placement and to automate HRM analysis for Chicago classification.25–27
In a recent study, Czako and colleagues compiled a database comprising just 67 images of failed catheter placement, along with an additional 2370 images of correct probe placement.25 In the same manuscript, the authors also described methodology for the evaluation of IRP values. The images were cropped and centered in pre-processing. The labeled images were then fed into a CNN that had already been trained on thousands of images from ImageNet. On the 15% of images used for testing, accuracy rates above 90% were reported for both the identification of improperly placed catheters and of elevated IRP values. Relying upon similar methodology, in which manual pre-processing removed unnecessary image components, the same investigators trained a CNN to classify HRM plots according to Chicago classification version 3.0, reporting an accuracy of 93% compared to manual identification.26
In a series of papers published during the past 5 years, Kou et al evaluated methods for utilizing AI to make manometric diagnoses.10, 28, 29 The authors first described a process by which an unsupervised model was fed 32,000 manometry images28. A variational autoencoder (VAE) was employed to recognize patterns within manometric images and to act as a feature extractor for future studies. The dataset was marked at the swallow and full-study levels, and images were down-sampled to ensure consistent form (24 seconds, 10 Hz). Following identification of a presumed optimal case from the VAE, linear discriminant analysis (LDA) was applied to the same dataset for dimensionality reduction, using a standard data science package (scikit-learn). Three clusters emerged, consisting of hypercontractile swallows in one; hypomotile (weak and failed) swallows in a second; and normal, fragmented, and premature swallows in a third. For the task of classifying swallows into one of these groups, the model demonstrated accuracy of 0.87 on the test set.
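The down-sampling step, resampling every swallow to a fixed window (24 s at 10 Hz, i.e., 240 time samples) so that each model input has an identical shape, can be sketched as follows. The data here are synthetic, and the real pipeline's interpolation details may differ.

```python
# Sketch of down-sampling a swallow's pressure topography to a fixed
# (time, sensors) shape via linear interpolation along the time axis.
import numpy as np

def downsample_swallow(pressure, target_samples=240):
    """Resample a (time, sensors) pressure array to target_samples rows."""
    n_time, n_sensors = pressure.shape
    old_t = np.linspace(0.0, 1.0, n_time)
    new_t = np.linspace(0.0, 1.0, target_samples)
    # Interpolate each sensor channel independently onto the new time grid.
    return np.column_stack(
        [np.interp(new_t, old_t, pressure[:, s]) for s in range(n_sensors)]
    )

# A hypothetical swallow recorded at 50 Hz for 24 s on 36 sensors:
raw = np.random.default_rng(0).normal(size=(1200, 36))
fixed = downsample_swallow(raw)
# fixed has shape (240, 36), ready to batch into a neural network
```

Normalizing inputs to a consistent shape in this way is what allows thousands of heterogeneous swallow recordings to be fed to a single VAE or CNN.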
The same investigators then applied a multi-stage process for making an HRM diagnosis10, using data from three groups: training, validation, and test, with the test portion comprising 15% of the total cohort. Images were down-sampled, and three individual swallow-level models were assessed, each based upon CNN analysis, with model implementation utilizing well-known libraries (TensorFlow, Keras). These swallow-level evaluations included swallow pressurization, swallow type, and IRP analysis. The swallow-level outcomes were then fed into three candidate study-level models: the first utilized decision trees aimed at mimicking human interpretation; the second utilized an augmented-features model ‘stacked’ upon the rule-based model from the first scheme; and the third utilized an artificial neural network (ANN) model. The investigators then combined the candidates into a blended model by applying Bayesian principles, and demonstrated accuracy of study-level diagnoses of 0.81 in the test set.
The available studies demonstrate that AI methods can be successfully utilized to perform some of the basic tasks involved in HRM interpretation, starting with identification of an adequate study for interpretation and marking of landmarks. Important motor patterns such as achalasia can be distinguished from non-achalasia patterns with reasonable accuracy. However, more work is needed to integrate the patient’s presentation and adjunctive test results to make the diagnosis more specific, as HRM pressure patterns are not always directly predictive of a clinical diagnosis. Further, non-achalasia motor patterns do not necessarily have uniform or consistent management algorithms based on HRM alone, and integrated clinical management prediction, including prognostication of outcome, will be a clinically relevant future goal.
FUNCTIONAL LUMEN IMAGING PROBE
The introduction of functional lumen imaging probe (FLIP), an innovative physiologic testing technique, has opened new possibilities for categorizing disease subtypes and predicting disease progression and therapeutic response.7,8 The utilization of image-driven technology lends itself to artificial intelligence analyses. Additionally, the primary outcomes of the technology are diameter and distensibility index (DI), both of which are acquired in real-time, but can be saved and analyzed using standard AI techniques.
In a cohort of 180 achalasia patients, Carlson et al reported a 78% accuracy in segregating spastic achalasia (type 3) from non-spastic achalasia (types 1 and 2), utilizing a decision-tree-based model obtained through machine learning30. A lower accuracy of 55% was reported for segregating all the achalasia subtypes from each other. FLIP features that predicted each subtype of achalasia consisted of absent contractility and balloon pressure <21 mmHg in type 1; balloon pressure 21–34 mmHg with absent response, or balloon pressure ≥34 mmHg without occluding contractions, repetitive retrograde contractions (RRC), or sustained occluding contractions (SOC), in type 2; and balloon pressure >46 mmHg with presence of an occluding contraction, RRC, or SOC pattern in type 3.
Carlson et al subsequently created a decision support tool using Bayesian additive regression trees (BART), aimed at estimating the probability of obstruction utilizing EGJ distensibility index (DI) and diameter from FLIP. The study cohort consisted of 557 patients who had undergone both FLIP and HRM, and 35 asymptomatic volunteers31. Timed upright barium study, where available, was also used as a baseline marker to define absence of obstruction. The model was assessed on a 20% test group from the original cohort, and demonstrated accuracy of 89–90% for the identification of EGJ obstruction. Accuracy of 89% was reported for the identification of normal vs. not normal and achalasia vs. not achalasia in a cohort of 687 patients and 35 asymptomatic controls23. Of 28 patients with achalasia, none were predicted as normal, and 93% were correctly identified as achalasia by the algorithm.
There remains uncertainty surrounding FLIP patterns that are not completely normal or conclusive for achalasia. Available AI studies within FLIP serve to reinforce achalasia patterns from non-achalasia or normal patterns, but more research is needed to better characterize the latter group, and translate findings into management paradigms. There is a temptation to offer endoscopic or laparoscopic myotomy to all settings where DI is low, but reactive contractions of the distal esophagus and LES may mimic true outflow obstruction patterns. There is high potential for AI methods to help differentiate achalasia-like outflow obstruction amenable to endoscopic myotomy from non-obstructive reactive patterns, especially if clinical history and adjunctive test results can be integrated into the decision model.
AMBULATORY REFLUX MONITORING
Ambulatory reflux monitoring is a time-intensive process, both for the patient and the interpreting physician. Simple automated tools are reliable in extraction of acid exposure time and reflux-symptom association from both catheter-based and prolonged wireless pH monitoring. When impedance is added to pH monitoring, manual review of software-identified reflux events is required21. In addition to acid exposure times and reflux episode numbers, novel metrics such as mean nocturnal baseline impedance (MNBI) and post-reflux swallow-induced peristaltic wave (PSPW) can also be calculated to further delineate whether an individual has features consistent with pathologic reflux32.
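The MNBI calculation itself is simple once stable windows are chosen. The sketch below is a simplification with invented per-minute impedance values: baseline impedance is averaged across three 10-minute nocturnal windows (conventionally taken around 1 AM, 2 AM, and 3 AM), selected to avoid swallows and reflux episodes; actual software works on raw high-frequency signals.

```python
# Simplified sketch of the mean nocturnal baseline impedance (MNBI)
# calculation: average baseline impedance over three stable 10-minute
# nocturnal windows. Data are hypothetical per-minute values in ohms.

def mnbi(impedance_by_minute, window_starts, window_len=10):
    """Mean nocturnal baseline impedance over fixed windows (ohms)."""
    window_means = []
    for start in window_starts:
        window = impedance_by_minute[start:start + window_len]
        window_means.append(sum(window) / len(window))
    # MNBI is the mean of the per-window baseline means.
    return sum(window_means) / len(window_means)

# Hypothetical overnight recording, one value per minute:
overnight = [2100.0] * 60 + [2300.0] * 60 + [2500.0] * 60
value = mnbi(overnight, window_starts=[0, 60, 120])
# (2100 + 2300 + 2500) / 3 = 2300 ohms
```

The hard part, which AI is being recruited for, is not this averaging but the upstream identification of artifact-free windows and true reflux events.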
The first investigation of artificial intelligence in ambulatory reflux monitoring used a complex decision tree analysis (DTA) based upon current pH-impedance analysis paradigms21. The final program consisted of nine layers, and accurately identified 88.5% of events, including reflux episodes, air events, and artifacts33. After exclusion of these events, the authors identified variations in upright and recumbent baseline impedance values. A ratio was generated that was successful in predicting outcomes from GERD management. Wong and colleagues subsequently utilized 7939 impedance events from 106 patients, relying on the Wingate consensus to guide the manual identification of esophageal activity. An 18-layer convolutional neural network built upon a preexisting architecture, termed ResNet-18, was utilized for analysis. The authors reported excellent agreement, with an ICC of 0.965 compared to manual interpretation, and an overall accuracy of 87%, which represented an improvement over current commercially available software34. Accuracy of 82% was reported for the identification of PSPW.
Use of AI methods within reflux monitoring is in its infancy, partly related to the complexity of analysis. pH-impedance systems have multiple channels of data that are not amenable to image-based interpretation, unlike HRM or FLIP. Multiple different events, such as swallows, reflux episodes, belch episodes, meals, and artifacts, can impact interpretation. The presence of liquid vs. gas in the lumen can alter impedance values, and each of these events can change the baseline for varying lengths of time. Pre- and post-processing of data using sophisticated AI and non-AI tools, and perhaps some form of unsupervised learning, may need to be employed for the use of AI in reflux monitoring moving forward.
ARTIFICIAL INTELLIGENCE IN MOTILITY TESTING ELSEWHERE IN THE GUT
Utilization of AI methodology in motor testing from other areas of the gut is not as prevalent as within esophageal physiologic testing. Rather, the predominance of machine learning data from both the small and large intestines has relied on image detection technologies, primarily in the form of CNNs35, 36.
Innovative and novel methods of assessing gastric electrical activity have the potential to involve AI interpretation of data, and may eventually lead to clinical management models37. Although non-physiological machine learning tools have been utilized to characterize interstitial cells of Cajal38, some of the earliest assessments of electrogastrography, which utilized k-nearest neighbors and support vector machines in ferrets, were able to predict the onset of emesis39. Although these early studies utilized implanted leads, a novel technique involving cutaneous electrodes, termed body surface mapping, has shown promise in phenotyping gastric physiology. The technology generates vast amounts of data and is therefore an ideal candidate for advanced augmented analyses. Machine learning has been applied to body surface mapping in the stomach, and optimal predictions regarding wave propagation have been found using Bayesian analysis40. Agrusa and colleagues have also demonstrated that CNNs can be applied to body surface mapping data41. The authors evaluated approximately 6000 images using four sequential 3D convolutional layers, each with 32 filters, followed by two dense layers, and also evaluated the efficacy of linear discriminant analysis. Accuracy rates for both methods exceeded 90% for the determination of normal versus abnormal slow wave propagation.
Pressure topography of anorectal manometry is an ideal target for machine learning applications, since data generation is identical to esophageal manometry. Indeed, deep learning has been utilized to identify dyssynergic defecation patterns42, and gradient boosting has allowed for differentiation between incontinence and obstructive patterns43. In the first study42, the authors evaluated multiple supervised (logistic regression, linear and quadratic discriminant analysis) and unsupervised (Gaussian mixture models) methods, as well as deep learning models on high-definition videos. Dyssynergic patterns were defined based on the anal pressure differential during simulated defecation: where paradoxical contraction was noted, the study was marked as abnormal; where pressure reduction was ≥20%, the study was marked as no dyssynergia; and where reduction was 1–20%, the study was considered ambiguous. A total of 1208 maneuvers from 302 patients were evaluated. After exclusion of ambiguous maneuvers, the majority were labeled dyssynergic (453/762). The area under the curve (AUC) for the deep learning model was 0.91, slightly lower than the traditional methods and a hybrid of the two (hybrid model AUC 0.96); however, the deep learning model was also more likely to mark the ambiguous group as inconclusive42.
In the second study, four separate methods were evaluated for automated anorectal manometry interpretation: random forest, gradient boosting, k-nearest neighbors, and support vector machines43. Of 827 anorectal manometry studies, 493 had defecatory obstruction, while the other 334 were diagnosed with fecal incontinence. A five-fold training strategy was utilized, reserving 10% of data for testing. The gradient boosting model demonstrated the highest accuracy, at 84.2%, in the segregation of obstruction versus incontinence.
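The shape of such a classification pipeline can be sketched as follows. This is a minimal sketch under stated assumptions, not the study's code: two invented numeric features stand in for real anorectal manometry metrics, and 10% of cases are held out for testing, mirroring the study's hold-out proportion.

```python
# Sketch of a gradient boosting classifier separating two diagnostic
# groups, with 10% of studies reserved for testing. Data are synthetic.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
# Two hypothetical feature clusters standing in for "obstruction" (0)
# vs "incontinence" (1) manometry metrics:
X = np.vstack([rng.normal(loc=0.0, scale=1.0, size=(400, 2)),
               rng.normal(loc=3.0, scale=1.0, size=(400, 2))])
y = np.array([0] * 400 + [1] * 400)

# Reserve 10% for testing, stratified to preserve class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=0, stratify=y)

model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)
# Well-separated synthetic clusters yield high hold-out accuracy
```

Real manometry features overlap far more than this toy example, which is why the reported accuracy was 84.2% rather than near-perfect.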
Thus, compared to esophageal physiologic testing, use of AI in gastric and anorectal manometry is in its infancy, but much potential exists in both regions.
CHALLENGES AND CONCERNS
Human input is crucial for meticulous model evaluation and providing experience and insights that might not exist within the training data. Thus, AI shines when used in collaboration with humans, often resulting in better performance than either could achieve alone.
Fears regarding the impact of AI are often exaggerated, although they do have a basis in reality. One significant challenge is our limited understanding of complex deep-learning models, which can make their application to hypothesis testing challenging.2 Moreover, biases within datasets can lead to skewed model parameters, potentially limiting the generalizability of prediction models to the broader patient population and even posing risks.11 Some clinicians also harbor concerns about being replaced by AI models, which often excel in analytical tasks with superior precision and speed. However, it’s essential to recognize that the human element remains invaluable in research and clinical care.
FUTURE DIRECTION
Current attempts at using machine learning have evaluated performance against ‘conclusive’ or ‘true’ diagnoses based on expert review. For instance, in the study using FLIP panometry, three categories were generated: true normal, true achalasia, and neither achalasia nor normal23. Studies using HRM compared Chicago Classification 3.0 diagnoses made by experienced esophagologists based on 10 swallows to image-based pattern recognition, requiring extensive preparation and cropping of images25, 26. Another HRM-based study created deep learning models to identify integrated relaxation pressure classes with the intent to automate the Chicago Classification algorithm27. While simplistic systems of this sort generate confidence in these ‘extreme’ patterns at either end of the diagnostic scale, it is also highly likely that practitioners can make these distinctions with minimal training, raising questions as to the true value of machine learning in these contexts, although such systems can overcome inter-observer variation.
The more innovative applications of machine learning relate to the identification of parameters and metrics hitherto unrecognized in predicting symptoms or disease processes. For instance, FLIP metrics have been utilized to create a ‘virtual disease landscape’ in which the mechanical work of the esophagus was estimated. This generated mechanics-based parameter clusters that corresponded to specific esophageal disorders, with the ability to predict disease progression over time12. Use in other areas of the gut is expected to grow exponentially in upcoming years.
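The clustering step underlying such a landscape can be sketched in miniature. The published work uses mechanics-informed models; the version below substitutes plain k-means on two invented features (an "esophageal work" estimate and an "EGJ distensibility" value), with synthetic patient groups, purely to show how parameter clusters that correspond to phenotypes might emerge without labels.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic "mechanics" features per patient: (work estimate, distensibility).
# Two invented groups standing in for distinct motor phenotypes.
group_a = rng.normal([1.0, 1.0], 0.1, size=(40, 2))
group_b = rng.normal([4.0, 3.0], 0.1, size=(40, 2))
X = np.vstack([group_a, group_b])

def kmeans(X: np.ndarray, k: int = 2, iters: int = 20):
    """Minimal k-means: deterministic init from evenly spaced data points,
    then alternate nearest-center assignment and centroid updates."""
    idx = np.linspace(0, len(X) - 1, k).astype(int)
    centers = X[idx].copy()
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        centers = np.array([X[assign == j].mean(axis=0) for j in range(k)])
    return assign, centers

assign, centers = kmeans(X)
```

In the published approach, the clustered axes are physics-derived quantities rather than raw metrics, which is what lets the resulting map carry mechanistic meaning and track progression over time.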
The ability of AI systems to extract and analyze information from electronic platforms with clinical data can help the clinician predict diagnoses and choose optimal diagnostic tools based on clinical presentation and patient profiles. These computer-aided diagnosis (CAD) systems have been utilized in disorders of the hindgut, and have potential for use in foregut disorders as well44, 45. Chatbots can potentially add to the value of AI systems by providing an interface for natural-language queries that integrates the available literature to provide diagnostic and management direction46. Indeed, natural language processing tools have shown promise in automated note generation across multiple fields47. There is potential for clinical visits to be recorded, key clinical information extracted by a natural language processing algorithm, and the resulting data scanned by deep learning algorithms to improve testing and diagnostic accuracy. In the immediate future, integration of patient demographics and clinical presentation with test results will provide an additional layer of diagnostic precision compared to interpretation of tests alone.
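The extraction step in that pipeline can be illustrated with a toy rule-based stand-in for a clinical NLP system. The transcript, symptom list, and naive negation rule below are all invented for illustration; production systems use trained language models with far more robust negation and context handling.

```python
import re

# Synthetic visit transcript (invented for illustration).
TRANSCRIPT = (
    "Patient reports dysphagia to solids for six months and occasional "
    "regurgitation. Heartburn denied. Plan: high resolution manometry."
)

SYMPTOMS = ["dysphagia", "regurgitation", "heartburn", "chest pain"]

def extract(transcript: str) -> dict:
    """Pull symptom mentions and a test order from free text."""
    text = transcript.lower()
    found = [s for s in SYMPTOMS if s in text]
    # Very crude negation handling: drop symptoms immediately followed by "denied".
    negated = [s for s in found if re.search(rf"{s}\s+denied", text)]
    return {
        "symptoms": [s for s in found if s not in negated],
        "negated": negated,
        "manometry_ordered": "manometry" in text,
    }

result = extract(TRANSCRIPT)
```

Structured output of this kind is what a downstream model would combine with demographics and physiologic test results to refine diagnostic precision.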
Footnotes
No conflicts of interest exist. No writing assistance was obtained.
Disclosures: OF: none; BR: Consulting: Braintree; CPG: Consulting: Medtronic, Diversatek, Braintree; Speaker: Carnot
References
- 1. Weissler EH, Naumann T, Andersson T, et al. The role of machine learning in clinical research: transforming the future of evidence generation. Trials 2021;22:537.
- 2. Ahuja A, Kefalakes H. Clinical Applications of Artificial Intelligence in Gastroenterology: Excitement and Evidence. Gastroenterology 2022;163:341–344. ** A commentary describing terminology utilized in artificial intelligence, and the various areas where use of artificial intelligence has potential to change the practice of clinical gastroenterology.
- 3. Barua I, Vinsard DG, Jodal HC, et al. Artificial intelligence for polyp detection during colonoscopy: a systematic review and meta-analysis. Endoscopy 2021;53:277–284.
- 4. Wang P, Berzin TM, Glissen Brown JR, et al. Real-time automatic detection system increases colonoscopic polyp and adenoma detection rates: a prospective randomised controlled study. Gut 2019;68:1813–1819.
- 5. Ladabaum U, Shepard J, Weng Y, et al. Computer-aided Detection of Polyps Does Not Improve Colonoscopist Performance in a Pragmatic Implementation Trial. Gastroenterology 2023;164:481–483.e6.
- 6. Hashimoto R, Requa J, Dao T, et al. Artificial intelligence using convolutional neural networks for real-time detection of early esophageal neoplasia in Barrett’s esophagus (with video). Gastrointest Endosc 2020;91:1264–1271.e1.
- 7. Ozawa T, Ishihara S, Fujishiro M, et al. Novel computer-assisted diagnosis system for endoscopic disease activity in patients with ulcerative colitis. Gastrointest Endosc 2019;89:416–421.e1.
- 8. Calderaro J, Kather JN. Artificial intelligence-based pathology for gastrointestinal and hepatobiliary cancers. Gut 2021;70:1183–1193.
- 9. Skrede OJ, De Raedt S, Kleppe A, et al. Deep learning for prediction of colorectal cancer outcome: a discovery and validation study. Lancet 2020;395:350–360.
- 10. Kou W, Carlson DA, Baumann AJ, et al. A multi-stage machine learning model for diagnosis of esophageal manometry. Artif Intell Med 2022;124:102233.
- 11. Jell A, Kuttler C, Ostler D, et al. How to Cope with Big Data in Functional Analysis of the Esophagus. Visc Med 2020;36:439–442.
- 12. Halder S, Yamasaki J, Acharya S, et al. Virtual disease landscape using mechanics-informed machine learning: Application to esophageal disorders. Artif Intell Med 2022;134:102435. * Description of the creation of a virtual esophageal disease landscape based on parameters from esophageal testing relating to altered bolus transit in the esophagus, where the underlying physics of esophageal disorders are mapped into the created landscape to facilitate understanding of physiology and pathophysiology.
- 13. Topol EJ. High-performance medicine: the convergence of human and artificial intelligence. Nat Med 2019;25:44–56.
- 14. Hunter DJ, Holmes C. Where Medical Statistics Meets Artificial Intelligence. N Engl J Med 2023;389:1211–1219. ** A commentary comparing older statistical methods to modern artificial intelligence and its ability to extract complex, task-oriented features from large data sets. Challenges and concerns regarding artificial intelligence are also discussed.
- 15. Kernbach JM, Staartjes VE. Foundations of Machine Learning-Based Clinical Prediction Modeling: Part II-Generalization and Overfitting. Acta Neurochir Suppl 2022;134:15–21.
- 16. Carlson DA, Gyawali CP, Khan A, et al. Classifying Esophageal Motility by FLIP Panometry: A Study of 722 Subjects With Manometry. Am J Gastroenterol 2021.
- 17. Yadlapati R, Kahrilas PJ, Fox MR, et al. Esophageal motility disorders on high-resolution manometry: Chicago classification version 4.0©. Neurogastroenterol Motil 2021;33:e14058.
- 18. Jungheim M, Busche A, Miller S, et al. Calculation of upper esophageal sphincter restitution time from high resolution manometry data using machine learning. Physiol Behav 2016;165:413–424.
- 19. Kahrilas PJ, Sifrim D. High-resolution manometry and impedance-pH/manometry: valuable tools in clinical and investigational esophagology. Gastroenterology 2008;135:756–769.
- 20. Gyawali CP, Patel A. Esophageal motor function: technical aspects of manometry. Gastrointest Endosc Clin N Am 2014;24:527–543.
- 21. Gyawali CP, Rogers B, Frazzoni M, et al. Inter-reviewer Variability in Interpretation of pH-Impedance Studies: The Wingate Consensus. Clin Gastroenterol Hepatol 2021;19:1976–1978.e1.
- 22. Beveridge C, Lynch K. Diagnosis and Management of Esophagogastric Junction Outflow Obstruction. Gastroenterol Hepatol (N Y) 2020;16:131–138.
- 23. Kou W, Soni P, Klug MW, et al. An artificial intelligence platform provides an accurate interpretation of esophageal motility from Functional Lumen Imaging Probe Panometry studies. Neurogastroenterol Motil 2023;35:e14549. * Description of an artificial intelligence platform for interpretation of functional lumen imaging probe findings as normal, achalasia, or not-achalasia, with accuracy close to 90% or higher.
- 24. Rohof WO, Myers JC, Estremera FA, et al. Inter- and intra-rater reproducibility of automated and integrated pressure-flow analysis of esophageal pressure-impedance recordings. Neurogastroenterol Motil 2014;26:168–175.
- 25. Czako Z, Surdea-Blaga T, Sebestyen G, et al. Integrated Relaxation Pressure Classification and Probe Positioning Failure Detection in High-Resolution Esophageal Manometry Using Machine Learning. Sensors (Basel) 2021;22.
- 26. Popa SL, Surdea-Blaga T, Dumitrascu DL, et al. Automatic Diagnosis of High-Resolution Esophageal Manometry Using Artificial Intelligence. J Gastrointestin Liver Dis 2022;31:383–389.
- 27. Surdea-Blaga T, Sebestyen G, Czako Z, et al. Automated Chicago Classification for Esophageal Motility Disorder Diagnosis Using Machine Learning. Sensors (Basel) 2022;22.
- 28. Kou W, Carlson DA, Baumann AJ, et al. A deep-learning-based unsupervised model on esophageal manometry using variational autoencoder. Artif Intell Med 2021;112:102006.
- 29. Kou W, Galal GO, Klug MW, et al. Deep learning-based artificial intelligence model for identifying swallow types in esophageal high-resolution manometry. Neurogastroenterol Motil 2022;34:e14290. * Description of identification of swallow types from raw high resolution manometry data using a deep learning artificial intelligence model.
- 30. Carlson DA, Kou W, Rooney KP, et al. Achalasia subtypes can be identified with functional luminal imaging probe (FLIP) panometry using a supervised machine learning process. Neurogastroenterol Motil 2021;33:e13932.
- 31. Schauer JM, Kou W, Prescott JE, et al. Estimating Probability for Esophageal Obstruction: A Diagnostic Decision Support Tool Applying Machine Learning to Functional Lumen Imaging Probe Panometry. J Neurogastroenterol Motil 2022;28:572–579.
- 32. Gyawali CP, Kahrilas PJ, Savarino E, et al. Modern diagnosis of GERD: the Lyon Consensus. Gut 2018;67:1351–1362.
- 33. Rogers B, Samanta S, Ghobadi K, et al. Artificial intelligence automates and augments baseline impedance measurements from pH-impedance studies in gastroesophageal reflux disease. J Gastroenterol 2021;56:34–41.
- 34. Wong MW, Liu MX, Lei WY, et al. Artificial intelligence facilitates measuring reflux episodes and postreflux swallow-induced peristaltic wave index from impedance-pH studies in patients with reflux disease. Neurogastroenterol Motil 2023;35:e14506.
- 35. Soffer S, Klang E, Shimon O, et al. Deep learning for wireless capsule endoscopy: a systematic review and meta-analysis. Gastrointest Endosc 2020;92:831–839.e8.
- 36. Urban G, Tripathi P, Alkayali T, et al. Deep Learning Localizes and Identifies Polyps in Real Time With 96% Accuracy in Screening Colonoscopy. Gastroenterology 2018;155:1069–1078.e8.
- 37. Carson DA, O’Grady G, Du P, et al. Body surface mapping of the stomach: New directions for clinically evaluating gastric electrical activity. Neurogastroenterol Motil 2021;33:e14048.
- 38. Mah SA, Du P, Avci R, et al. Analysis of Regional Variations of the Interstitial Cells of Cajal in the Murine Distal Stomach Informed by Confocal Imaging and Machine Learning Methods. Cell Mol Bioeng 2022;15:193–205.
- 39. Nanivadekar AC, Miller DM, Fulton S, et al. Machine learning prediction of emesis and gastrointestinal state in ferrets. PLoS One 2019;14:e0223279.
- 40. Allegra AB, Gharibans AA, Schamberg GE, et al. Bayesian inverse methods for spatiotemporal characterization of gastric electrical activity from cutaneous multi-electrode recordings. PLoS One 2019;14:e0220315.
- 41. Agrusa AS, Gharibans AA, Allegra AA, et al. A Deep Convolutional Neural Network Approach to Classify Normal and Abnormal Gastric Slow Wave Initiation From the High Resolution Electrogastrogram. IEEE Trans Biomed Eng 2020;67:854–867.
- 42. Levy JJ, Navas CM, Chandra JA, et al. Video-Based Deep Learning to Detect Dyssynergic Defecation with 3D High-Definition Anorectal Manometry. Dig Dis Sci 2023;68:2015–2022.
- 43. Saraiva MM, Pouca MV, Ribeiro T, et al. Artificial Intelligence and Anorectal Manometry: Automatic Detection and Differentiation of Anorectal Motility Patterns-A Proof-of-Concept Study. Clin Transl Gastroenterol 2023;14:e00555.
- 44. Visaggi P, de Bortoli N, Barberio B, et al. Artificial Intelligence in the Diagnosis of Upper Gastrointestinal Diseases. J Clin Gastroenterol 2022;56:23–35. ** A comprehensive review describing the use of artificial intelligence in upper gastrointestinal disorders, including motility disorders.
- 45. Dogan Y, Bor S. Computer-Based Intelligent Solutions for the Diagnosis of Gastroesophageal Reflux Disease Phenotypes and Chicago Classification 3.0. Healthcare (Basel) 2023;11.
- 46. Henson JB, Glissen Brown JR, Lee JP, et al. Evaluation of the Potential Utility of an Artificial Intelligence Chatbot in Gastroesophageal Reflux Disease Management. Am J Gastroenterol 2023. * A description of the use and accuracy of a chatbot as an information resource for patients with symptoms of gastroesophageal reflux disease.
- 47. Sheikhalishahi S, Miotto R, Dudley JT, et al. Natural Language Processing of Clinical Notes on Chronic Diseases: Systematic Review. JMIR Med Inform 2019;7:e12239.
