Abstract
There is a rising interest in use of big data approaches to personalize treatment of inflammatory bowel diseases (IBDs) and to predict and prevent outcomes such as disease flares and therapeutic nonresponse. Machine learning (ML) provides an avenue to identify and quantify features across vast quantities of data to produce novel insights in disease management. In this review, we cover current approaches in ML-driven predictive outcomes modeling for IBD and relate how advances in other fields of medicine may be applied to improve future IBD predictive models. Numerous studies have incorporated clinical, laboratory, or omics data to predict significant outcomes in IBD, including hospitalizations, outpatient corticosteroid use, biologic response, and refractory disease after colectomy, among others, with considerable health care dollars saved as a result. Encouraging results in other fields of medicine support efforts to use ML image analysis—including analysis of histopathology, endoscopy, and radiology—to further advance outcome predictions in IBD. Though obstacles to clinical implementation include technical barriers, bias within data sets, and incongruence between limited data sets preventing model validation in larger cohorts, ML-predictive analytics have the potential to transform the clinical management of IBD. Future directions include the development of models that synthesize all aforementioned approaches to produce more robust predictive metrics.
Keywords: machine learning, artificial intelligence, predictive modeling, inflammatory bowel diseases
Introduction
The turn of the 21st century marked a major leap forward in the sophistication of treatments for inflammatory bowel diseases (IBDs) with the advent of biologic therapies. Compared with the nonspecific immunosuppressive modalities of the prior era, these drugs targeted molecular pathways more specific to the underlying pathogenic mechanisms of inflammation in IBD, resulting in more patients achieving clinical remission.1 However, despite subsequent advances in biologic and other nonsteroidal agents, including gut-selective monoclonal antibodies and small molecules, therapeutic nonresponse and ensuing complications still remain serious roadblocks in disease management.2 Consequently, as in many areas of medicine, there is increasing interest in use of big data approaches in ulcerative colitis (UC) and Crohn’s disease (CD) to better understand their natural histories and predictors of disease development, severity, complications, treatment sensitivities, and more.3 With regards to predictive modeling, the goal is to leverage large data sets to discover new disease subgroups based on predicted disease course and, in turn, design patient-tailored therapeutic plans from individualized clinical, molecular, and imaging data (Figure 1). Multiple clinical trials and biorepositories are in the process of compiling such data sets,4–8 but the challenge remains how to process large volumes of data to create appropriate classifying prediction algorithms.
Figure 1.
Machine learning predictive models can use a variety of data—including clinical, imaging, and omics (molecular) data, among others—to categorize patients based on forecasted treatment sensitivities or disease outcomes which may be used by clinicians to guide management decisions. A classification model is shown. Grey circles, all patients; black box, machine learning model; colored circles, patients categorized by predicted outcome.
Machine learning (ML) can analyze features across vast quantities of data to potentially produce novel disease insights regarding diagnosis, identification of phenotypes, predicting prognosis, and guiding management decisions. In this review, we discuss current approaches in ML-driven predictive outcomes modeling for IBD. It may seem that the lag between discoveries in ML research and clinical implementation is long. Health care has been relatively sluggish to adopt digital age advancements compared with other industries, for many reasons. For example, while many businesses were accelerating into the era of information technology, only 12% of hospitals had electronic health records before passage of the Health Information Technology for Economic and Clinical Health Act in 2009.9 Although 97% of hospitals in the United States had electronic health records by 2017, most systems are disparate between centers and optimized for billing rather than medical research. Lastly, the stakes in health care are higher and thus ML algorithms are subject to more scrutiny before widespread implementation.
In light of recent efforts to amass and expand access to biomedical data, multiple ML approaches have been successful in predicting important disease outcomes in IBD, including hospitalizations, outpatient steroid use,10 biologic response,11,12 and refractory disease after colectomy,13 among others. Some of these algorithms have already been deployed in clinical practice with subsequent savings in health care expenditures to the tune of tens of thousands of dollars per year.14 In addition to existing IBD predictive outcome approaches, we will also highlight efforts using ML in other fields of medicine with regards to how these advances can be applied to IBD research. Machine learning has already started assisting physicians in making diagnosis and management decisions,15,16 and we hope this review will give clinicians the tools to recognize how ML may be applied to predict and improve clinical outcomes for patients with IBD in the present and near future. Although we touch on some additional applications of ML in IBD research such as diagnostics and alternate methods of predictive modeling, the scope of this review is focused on approaches in predictive outcomes modeling in IBD using ML models.
Machine Learning Concepts
Artificial intelligence (AI) refers to computational systems that are able to perform tasks and solve problems that normally require human intelligence and decision-making.17 Machine learning, a subset of AI, enables computer algorithms to improve themselves based on prior experiences, either with or without human supervision.18 Deep learning is a type of ML that can be used to formulate complex feature representations by training artificial neural networks with many hidden layers.19 Neural networks were originally inspired by the biology of the human brain in which interconnected neurons send and receive signals.18 Neurons or nodes within artificial networks contain nonlinear activation functions to control which signals are sent to subsequent layers of the model. The parameters of the model are adjusted during training to optimize an objective function such as prediction confidence.19 Figure 2 depicts an example of a fully connected neural network with 2 hidden layers.
Figure 2.
A neural network with 2 hidden layers. Layer 1 has 2 neurons and layer 2 has 4 neurons. Circles, neurons/nodes.
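To make the layered structure concrete, the following is a minimal sketch of a forward pass through a fully connected network with the layer sizes given in the Figure 2 caption (2 neurons, then 4 neurons). The number of inputs, the choice of ReLU and sigmoid activations, the random weights, and all function names are assumptions made for illustration; no training step is shown.

```python
import math
import random


def relu(x):
    """Nonlinear activation: passes positive signals, blocks negative ones."""
    return max(0.0, x)


def sigmoid(x):
    """Squashes any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs, weights, biases, activation):
    """One fully connected layer: weighted sum plus bias, then activation."""
    return [
        activation(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]


def init_layer(n_in, n_out, rng):
    """Random starting parameters; training would adjust these values."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def forward(features, layers):
    """Feed the input through each layer in turn."""
    signal = features
    for weights, biases, activation in layers:
        signal = dense(signal, weights, biases, activation)
    return signal


rng = random.Random(0)          # fixed seed so the sketch is repeatable
n_inputs = 3                    # hypothetical number of input features
w1, b1 = init_layer(n_inputs, 2, rng)   # hidden layer 1: 2 neurons
w2, b2 = init_layer(2, 4, rng)          # hidden layer 2: 4 neurons
w3, b3 = init_layer(4, 1, rng)          # output layer: 1 neuron
layers = [(w1, b1, relu), (w2, b2, relu), (w3, b3, sigmoid)]

output = forward([0.5, -1.2, 0.3], layers)
print(output)  # a single value between 0 and 1
```

With untrained weights the output is meaningless; the sketch only shows how signals move from layer to layer.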
Machine learning typically follows either a supervised or unsupervised learning approach.18,20 Supervised learning is when an ML model is trained using human-labeled input-output pairs to later determine the outputs for unpaired inputs (eg, annotating features in a pathology slide to teach the model how to identify those features in unannotated slides).21 These algorithms often require large quantities of annotated training data to produce meaningful predictions.22 Supervised learning algorithms include support vector machines (SVMs),23 random forest (RF),24 and convolutional neural networks (CNNs).25 Unsupervised learning is when an ML model identifies distinct features in unlabeled data sets, which may or may not have known significance, to calculate underlying patterns within the data (eg, differentiating normal from diseased tissue in unannotated pathology slides).26,27 Because of the lack of labels, these methods can be more challenging and are often used to elucidate hidden structures within training data sets.28 Unsupervised learning algorithms include k-means clustering,29 autoencoders,30 and principal component analysis.31 Further derivations of these types of learning are being implemented but will not be covered in this review.28
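As an illustration of unsupervised learning, the sketch below runs k-means clustering on a handful of unlabeled one-dimensional values. The data, the starting centroids, and the function name are invented for the example; real applications typically use many features and a library implementation.

```python
def kmeans_1d(values, centroids, n_iter=20):
    """Lloyd's algorithm in one dimension with caller-supplied starting centroids."""
    centroids = list(centroids)
    for _ in range(n_iter):
        # Assignment step: each value joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: each centroid moves to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments have stopped changing
        centroids = new_centroids
    labels = [
        min(range(len(centroids)), key=lambda i: abs(v - centroids[i])) for v in values
    ]
    return centroids, labels


# Six unlabeled measurements of a hypothetical feature.
values = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7]
centroids, labels = kmeans_1d(values, centroids=[0.0, 6.0])
print(centroids, labels)
```

No outcome labels are supplied at any point: the two groups emerge from the values alone, and whether they carry clinical meaning has to be established separately.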
Model selection and evaluation of model performance are 2 areas of interest when applying ML to a data set. Each model carries its own set of assumptions about the data which are, to some extent, violated by real-world data. The choice of algorithm comes down to a balance of minimizing the number of assumptions violated while maximizing the model’s performance.32 Although studies have varied in the metrics used for measuring performance, one of the most widely used metrics for binary classification problems (such as diseased vs normal or survival vs nonsurvival) is the area under the receiver operating characteristic curve (AUROC).33 The AUROC gives an aggregate measure of a model’s ability to distinguish between different classes at all possible classification thresholds, with values closer to 1 indicating better classifiers.34 Other metrics, such as precision, recall, F1-score (the harmonic mean of precision and recall), accuracy, sensitivity, specificity, and mean error metrics, are used depending on the scenario.35
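The metrics named above can be sketched in a few lines. The example computes the AUROC through its rank interpretation (the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case) and then precision, recall, and F1-score at one threshold. The labels, scores, threshold, and function names are made up for illustration.

```python
def auroc(labels, scores):
    """AUROC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def precision_recall_f1(labels, scores, threshold):
    """Threshold the scores, then count true/false positives and false negatives."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical outcomes (1 = event occurred) and model scores for six patients.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]

area = auroc(labels, scores)
precision, recall, f1 = precision_recall_f1(labels, scores, threshold=0.5)
print(area, precision, recall, f1)
```

Here 8 of the 9 positive/negative pairs are ordered correctly, so the AUROC is 8/9 regardless of threshold, whereas precision, recall, and F1 change if the threshold of 0.5 is moved.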
Once trained, ML algorithms are advantageous due to their speed and reproducibility in recognizing patterns from the training data, in addition to the ability to continue improving with increasing exposure to new data.36 Machine learning predictive models for IBD outcomes like disease progression and treatment-specific response have the potential to synthesize data from multiple sources—such as clinical and laboratory data,37 histopathology, radiography, endoscopy, and omics13—to produce robust analytics that could fundamentally change the clinical practice of medicine.15 In this review, we cover how ML-driven metrics have been used to predict outcomes in a variety of diseases and how those approaches may be applied to IBD management.
Laboratory and Clinical Data
Most current ML-based predictive models for IBD use tree-based classifiers such as RF on clinical and laboratory data to forecast future disease progression and treatment response.10–12,14,37,38 Random forest classifiers contain a large number of uncorrelated decision trees that each produces its own prediction output. The final output of the model is the most prevalent output vote from the individual trees (Figure 3). Random forest classifiers reduce overfitting by including many uncorrelated decision trees to generate a prediction, which makes them advantageous for complex classification problems such as using hundreds of variables to predict a future outcome. Random forest can also determine relative input feature importance for strength of correlation with the model’s output, which is advantageous compared with some other ML techniques whose interpretability can be more opaque.24
Figure 3.
Visualization of a hypothetical random forest classifier to predict clinical response to anti-tumor necrosis factor (anti-TNF) drugs in an inflammatory bowel disease patient. The model inputs are run through a multitude of independent decision trees that together make the “forest.” The output of the model is either the average output of the individual trees or the most prevalent output (the one with the most votes).
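The voting step described in the caption can be sketched as follows. This is only the aggregation stage: the three "trees" are hand-written rules standing in for trained decision trees, and every variable name and cut-off is an illustrative placeholder rather than a clinical threshold or a value from any cited study.

```python
from collections import Counter


# Hand-written stand-ins for trained trees; thresholds are placeholders only.
def tree_a(patient):
    return "responder" if patient["albumin_g_dl"] >= 3.5 else "nonresponder"


def tree_b(patient):
    if patient["crp_mg_l"] < 5.0:
        return "responder"
    return "responder" if patient["age_years"] < 40 else "nonresponder"


def tree_c(patient):
    return "responder" if patient["platelets_k_ul"] < 400 else "nonresponder"


def forest_predict(patient, trees):
    """Return the most common vote and the share of trees that cast it."""
    votes = Counter(tree(patient) for tree in trees)
    label, count = votes.most_common(1)[0]
    return label, count / len(trees)


# A hypothetical patient record.
patient = {
    "albumin_g_dl": 3.8,
    "crp_mg_l": 12.0,
    "age_years": 55,
    "platelets_k_ul": 350,
}

label, vote_share = forest_predict(patient, [tree_a, tree_b, tree_c])
print(label, vote_share)
```

In a real random forest each tree is learned from a bootstrap sample of the training data using a random subset of features, which is what keeps the trees weakly correlated; the vote share can be read as a rough confidence for the majority class.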
Waljee et al have developed predictive models using RF to predict several outcomes in IBD, including clinical remission with thiopurines,14 hospitalization and outpatient corticosteroid (CS) use,10 corticosteroid-free remission with vedolizumab,11,12 and long-term ustekinumab response.37 Using inputs of age and laboratory data, they created an ML model to predict clinical response to thiopurine therapy, preferential shunting to 6-methylmercaptopurine, and medical nonadherence. The model predicting remission had an AUROC of 0.79 compared with the standard of care using thiopurine metabolite levels, which had an AUROC of 0.49. The change in the mean number of clinical events in patients with ML-predicted remission included 1.65 fewer steroid prescriptions, 1.05 fewer hospitalizations, and 0.19 fewer surgeries per year. These models were integrated into daily clinical use through the electronic health record in place of traditional send-out thiopurine metabolite measurements, resulting in a reduction in testing expenditures of $75,000 per year at the time of publication. The findings support an additional benefit of ML analytics: increased value of care by extracting high-value information from low-cost medical data routinely acquired in IBD management.14
The same group explored whether hospitalizations or corticosteroid use could be predicted based on inputs of demographic information, history of corticosteroid-sparing immunosuppressants, and longitudinal laboratory data. The AUROC for the longitudinal RF model was 0.85, which increased to 0.87 with the addition of history of IBD-related hospitalizations and outpatient CS prescriptions compared with 0.68 with a logistic regression model. The 5 leading independent risk factors for future hospitalizations or steroid use were age, mean serum albumin, immunosuppressive medication use, and mean and highest platelet counts.10 Next, they investigated whether early initiation of vedolizumab therapy was associated with corticosteroid-free endoscopic remission within 1 year of starting therapy. Remission in UC patients on vedolizumab could be predicted with an AUROC of 0.62 using baseline data alone, and this rose to 0.73 with data through week 6; 59% of patients predicted to be remitters achieved CS-free endoscopic remission compared with 21% of predicted nonremitters.11 Likewise, the model for predicting remission in CD patients on vedolizumab therapy had an AUROC of 0.65 with baseline labs vs 0.75 with labs through week 6 of initiating therapy; 35.8% of predicted remitters achieved remission, in contrast to 6.7% of predicted nonremitters.12
Lastly, Waljee et al used an RF approach to predict long-term ustekinumab response in CD patients, with responders defined as having C-reactive protein (CRP) levels lower than 5 mg/dL beyond week 42 of initiating treatment. An AUROC of 0.78 was achieved using laboratory data through the first 8 weeks of treatment, whereas a simpler model using only the week 6 albumin-to-CRP ratio performed nearly as well, with an AUROC of 0.76. The addition of serum ustekinumab levels was not associated with improvements to any models tested and actually decreased the performance of the week-6 labs model.37 This is consistent with the earlier findings that likely nonresponders can be identified with ML analysis of low-cost routine labs, without expensive drug-level monitoring.
The studies by Waljee et al were among the first ML-predictive models created for IBD and involved relatively large cohorts ranging from several hundred to tens of thousands of patients. Furthermore, they were able to improve prediction of disease outcomes such as treatment response using data from the electronic health record already collected as part of routine IBD management, providing increased prognostic value without increasing procedures or costs for patients. The authors acknowledged some limitations: most of the models have not been validated in external populations and hence their generalizability is unknown. The model to predict hospitalizations, for example, was developed on a Veterans’ Affairs data set, which was >90% male and >70% White, and thus may be subject to biases.10 Furthermore, although the models had strong performance as measured by AUROC, all studies were performed as retrospective analyses. Before ML can guide management decisions, long-term intervention studies will be required to determine what to do based on model predictions, such as a poor patient trajectory.37
Dong et al compared the ability of 5 algorithms—RF, logistic regression, SVM, decision tree, and artificial neural network—to predict the need for surgery in CD patients based on data extracted from the electronic health record. Random forest had the highest AUROC of 0.99 compared with the next best model, logistic regression, which had an AUROC of 0.95. The top predictors of surgery included radiographic evidence of disease progression, presence of fistula or abscess, never receiving biologic therapy, and elevated CRP.38 In another study, an artificial neural network was used to predict seasonality of IBD flares in 901 Chinese patients based on the dates of previous flares as determined by unspecified clinical, radiographic, endoscopic, or histological indications on presentation. The investigators found that relapse in CD patients could be accurately predicted with a mean absolute predicted error of 17.1%, with flares most likely to occur in July and August.39
Although using routine IBD laboratory and clinical information as model inputs is advantageous due to the widespread availability of the data, some investigators are exploring ML analysis of survey responses to predict medical nonadherence.40 Wang et al compared the efficacy of logistic regression, back-propagation neural networks, and SVM using data from questionnaires to predict azathioprine nonadherence among Chinese CD patients. They used RF to select the top questionnaire responses that independently predicted medication nonadherence for input in the SVM model, which included concerns about the necessity of medications, anxiety, depression, and level of education. Support vector machines had the highest performance, with an AUROC of 0.93, followed by the back propagation neural network (0.91) and logistic regression (0.90).40 These methods present an interesting use of RF to help determine the relative feature importance for an SVM model, which traditionally can achieve high classification performance but suffers from lack of interpretability.41 From a clinical perspective, in light of the increased prevalence of anxiety and depression among IBD patients42 and the impact of mind-body interventions in reducing anxiety, subjective pain, and expression of genes related to inflammatory and oxidative stress-related pathways,43 surveys could be an additional approach for construction of ML models to predict medication nonadherence and guide appropriate preventative and follow-up strategies.
Imaging
Although ML image analysis techniques applied to IBD have not yet been used to predict future outcomes, encouraging data in other fields of medicine suggest that powerful predictive metrics can be extracted from imaging in the forms of histopathology, endoscopy, and radiography.
Machine Learning in Histopathology
Histopathological image analysis begins with digitization of whole slide images (WSIs), which are analyzed to extract features or patterns using deep learning methods. In IBD predictive analytics, 2 major areas of interest are broad tissue pattern recognition and cellular quantification techniques, which may be used to identify subtle patterns in histology that are associated with clinical phenotypes.
Tissue pattern recognition
Deep learning image classification models broadly categorize an entire image based on the features within it (eg, differentiating a snowy lake from a sandy beach). Figure 4 depicts a hypothetical example of an image classification model to classify antitumor necrosis factor (anti-TNF) responders and nonresponders. Image classification has been used with great success in diagnostic models, such as to distinguish WSIs of gastric and colonic epithelial tumors from normal tissue with AUROCs ranging from 0.96 to 0.99.44 In IBD, ML-driven histopathological classifying models have not yet been used to predict outcomes but have been able to distinguish between IBD subtypes. Mossotto et al created supervised ML models for distinguishing pediatric IBD subtypes of CD and UC using human-scored endoscopic data only, histological data only, and combined endoscopic with histological data, which yielded classification accuracies of 71%, 76.9%, and 82.7%, respectively. They also created unsupervised models that revealed 4 subgroups of patients based on macro and microscopic dysregulations of the colorectal region which were unrelated to known IBD subtypes.45 The presence of these distinct subgroups suggests there may be currently unknown histological patterns in IBD that differ between patients at diagnosis, although whether these patterns portend varying disease trajectories is yet to be determined.
Figure 4.
A simplified, hypothetical image classification machine learning model to stratify patients as anti-tumor necrosis factor (anti-TNF) drug responders or nonresponders based on histopathological image analysis of diagnostic biopsy. Extracted features from a digitized colon biopsy whole slide image are used as inputs for a neural network with 2 hidden layers to produce the prediction output. Deep learning models have multiple additional hidden layers.
Automatically derived histopathological image recognition has been used in development of prognostication models in cancer patients. Yu et al were able to distinguish the likelihood of short- and long-term survival in lung adenocarcinoma and squamous cell carcinoma patients using a fully automated image-segmentation pipeline. The primary features used by the model to predict survival outcomes were texture of nuclei, decomposition of nuclei, and decomposition of cytoplasm. These features are intriguing because they demonstrate the ability of ML to quantify cellular features, such as texture of nuclei throughout an entire WSI, that could be difficult and time intensive for humans to interpret with interobserver consistency.46 Beck et al used an ML-based method to categorize patches of breast cancer histopathology images as epithelium or stroma and processed them to extract thousands of morphological features. They then used an L1-regularized logistic regression to create a prognostic model based on the processed images and found a strong independent association between stromal features and overall survival. Some of the ML-derived stromal architectural features most strongly correlated with survival have no known biological underpinning, demonstrating that ML analysis can elucidate previously unknown features from biopsies that can predict survival.47 Compared with histological dysregulation of the tumor microenvironment, quantifying the spectrum of inflammatory phenotypes in IBD may or may not be more difficult using ML approaches. Future research could apply these techniques to biopsy specimens in IBD to discover novel prognostic indicators or develop an objective histologic score for grading disease activity and/or severity.
Cellular quantification techniques
Object detection is a subset of ML image classification that identifies distinct objects within an image that are typically labeled with a bounding box. Image segmentation is a more specific type of object detection in which each pixel is given a label, providing the exact location of the object of interest within the image (Figure 5).48 This is particularly useful for tasks such as identifying cells in histopathology biopsy images. The located cells can be image processed to extract information such as cellular counts, sizes, granules, nuclei, mitotic figures, and many other features.
Figure 5.
Examples of image classification, object detection, and segmentation by a machine learning model designed to detect eosinophils in colonic histology hematoxylin and eosin slides. In the left frame (image classification), the model classifies the entire image as containing eosinophils or no eosinophils. In the middle frame (object detection), the individual eosinophils are detected and designated with a bounding box. In the right frame (image segmentation), each pixel in the image is classified as eosinophil or not, with pixels containing eosinophils colored yellow.
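Once a segmentation model has labeled each pixel, cell counts can be derived by grouping neighboring positive pixels. The sketch below assumes a small binary mask (1 = pixel classified as eosinophil) already produced by some model and counts 4-connected groups with a flood fill; the mask and function name are invented for illustration and do not represent any published pipeline.

```python
def count_objects(mask):
    """Count 4-connected groups of 1-pixels in a binary segmentation mask."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] == 1 and not seen[r][c]:
                count += 1  # found a pixel belonging to a new object
                stack = [(r, c)]
                seen[r][c] = True
                while stack:  # flood fill to mark the rest of the object
                    y, x = stack.pop()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (
                            0 <= ny < rows
                            and 0 <= nx < cols
                            and mask[ny][nx] == 1
                            and not seen[ny][nx]
                        ):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
    return count


# Hypothetical pixel-level output of a segmentation model for one field.
mask = [
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0, 1],
    [1, 0, 0, 1, 0, 0],
    [1, 0, 0, 1, 1, 0],
]

n_cells = count_objects(mask)
contains_target = any(any(row) for row in mask)  # image-level classification
print(n_cells, contains_target)
```

Touching cells would merge into one group under this simple rule, which is one reason practical pipelines add steps to separate adjacent objects.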
Tissue eosinophilia and neutrophilia have been implicated in IBD outcomes, with higher tissue eosinophil counts in UC being associated with severity at diagnosis, short-term corticosteroid requirement in children,49 and poor response to medical therapy.50 The phenotype of tissue eosinophilia is also significant, and 1 study found a neutrophil-predominant eosinophilia on diagnostic biopsy correlated with higher likelihood of subsequent disease flares and hospitalizations compared with eosinophil-predominant inflammation, which was associated with longer flare-free survival.51 As a proof-of-concept in a small sample of 29 patients, our research group has found that deep learning image segmentation analysis of colonic UC biopsies can be used for automated eosinophil quantification with a low absolute error of 2 eosinophils per high-powered field (hpf). We also found a possible correlation between tissue eosinophilia on diagnostic biopsy and time to start of biologic therapy and are in the process of increasing patient numbers for more robust analysis. Currently, the use of histopathology to predict disease outcomes in IBD is still in its infancy, but findings from precision oncology support that ML can be used to discover previously unknown biopsy features that forecast future disease trajectory. Automated image analysis of IBD specimens could be incorporated into existing clinical pathways and may inform future management decisions.
Machine Learning in Endoscopy
Utilization of ML for endoscopic IBD surveillance can be approached via analysis of still images, as described previously, and by using video segmentation techniques. Machine learning has already shown promise in areas of gastroenterology (GI) endoscopy, including computer-aided detection and diagnosis of polyps52 and Barrett esophagus.53 A recent study also demonstrated that deep learning can effectively detect small bowel ulceration on video capsule endoscopy in CD with AUROCs between 0.94 and 0.99.54 The performance of supervised deep learning models for grading endoscopic severity in IBD is similar to that of experienced human reviewers, with video segmentation models for UC achieving an AUROC of 0.97 for matching human grading.55
A caveat of traditional ML-based approaches is that the algorithm will incorporate any potential bias found in the training set. Studies have shown that there is significant interobserver variability in assessing endoscopic disease activity with the Mayo score, and as such, it may not be a reliable way of determining disease prognosis and potential treatment options. This effect is seen even amongst experienced gastroenterologists.56,57 Given the subjective nature of the Mayo endoscopic score, the human-graded training set in a supervised learning approach may have significant variability and thus affect the precision of the ML model itself. To combat this, further research is emerging to evaluate the appropriateness and efficacy of more objective unsupervised ML models on determining disease activity in UC and CD.58–60 Yao et al found fully automated unsupervised video analysis was able to correctly distinguish remission vs active disease in 83.7% of videos.61 Researchers in Japan recently developed an ML algorithm that was able to determine endoscopic and histologic remission with 90.1% and 92.9% accuracy, respectively, without the need for mucosal biopsy.62 The continued evolution of unsupervised ML methods could lead to more accurate diagnosis and targeting of therapeutic options in IBD.
Due in part to the current focus on improving and validating endoscopic models for diagnosis and grading severity, image or video segmentation analysis of endoscopy has not yet been shown to robustly predict future IBD-related outcomes. However, RF classifiers have been used to classify pediatric IBD subtypes based on inputs of endoscopic and histologic data as binary or ordinal variables. One model identified 7 key features, 3 histological (granulomas, patchy crypt distortion, and patchy chronic inflammation) and 4 endoscopic (skip lesions, ≥5 small discrete ulcers in the colon, ileitis with a mild cecum, relative patchiness), to discriminate between colonic CD and UC.63 This study provides a potential avenue for how endoscopic findings can be used to classify IBD subtypes that may be correlated with disease outcomes in the future.
Machine Learning in Radiology
The current roles of radiographic contrast imaging in IBD management are (1) to distinguish IBD subtypes at diagnosis; (2) to visualize penetrating or stricturing disease complications; (3) to assess extraintestinal IBD manifestations; and (4) to assess disease activity during flares.64 An emerging role for magnetic resonance imaging (MRI) is as a noninvasive biomarker of treatment response. Recent evidence suggests the concordance of magnetic resonance enterography (MRE) index of activity with mucosal healing on endoscopy,65 and multiple studies have been published on the development of ML models for CD severity assessment in MRI.66,67 Furthermore, MRE parameters such as pre- and post-treatment wall thickness, edema, length of involvement, and change in Clermont score have been associated with antitumor necrosis factor drug response in ileal CD.68 Deep learning could be applied to analyze known features or derive novel patterns from MRE that may indicate future disease trajectory as described previously for histopathological analyses. Deep learning analysis of MRI has been used to predict tumor response to chemotherapy in patients with colorectal cancer liver masses,69 radiotherapy response of metastatic lymph nodes,70 and medical responders among heart failure patients.71 In patients with glioma, deep learning–derived features using SVM were highly associated with survival at 1, 2, 3, and 4 years compared with traditionally used MRI features such as contrast enhancement and tumor size, which had no significant associations.72 Deep learning–derived MRI features have also been combined with clinical characteristics to predict short-, medium-, and long-term survivors with 84% accuracy in amyotrophic lateral sclerosis patients.73 These are just a few examples that suggest the possibility of using ML-derived image features from MRI and combining these with other types of data to predict disease progression and treatment response in IBD.
As of yet, none of the aforementioned ML image analysis techniques have been applied to predict IBD outcomes, and further research in this area is warranted.
OMICS
An abundance of omics data is being collected in large IBD studies74; however, molecular profiles need to be correlated with clinical end points to have significance.75 Omics have been used to differentiate healthy subjects from subjects with IBD,76 and several multiomics approaches have been useful in predicting treatment response.
Zarringhalam et al created an algorithm based on previously identified causal gene relationships that was incorporated into an ML model to predict response to infliximab therapy in UC patients. The model identified interferon gamma, lipopolysaccharide, and tumor necrosis factor as key regulators, with an accuracy of 70% for prediction of treatment response.77 In CD, it has been demonstrated that patients naïve and exposed to antitumor necrosis factor can be completely segregated based on unsupervised hierarchical clustering of ileal whole transcriptome profiles. Furthermore, using transcripts at the time of ileocolectomy from 60 CD patients, an RF classifier identified 30 transcripts that differentiated patients who remained in remission from those who had refractory disease within 4 years following surgery.13 Verstockt et al created a multiomics factor analysis of CD patients by incorporating several layers of omics approaches, including RNA from colon biopsies, proteomics from serum samples, and genotyping data to identify 19 latent factors associated with endoscopic response to ustekinumab therapy. The 10 most dominant omics features predicted endoscopic response at week 24, with an accuracy of 98%. Only 2 of 10 features were associated with baseline fecal calprotectin and 1 of 10 with C-reactive protein, suggesting the algorithm inputs were assessing disease factors related to treatment response beyond baseline inflammation.78
In oncology, Schmauch et al created an ML model that was trained to predict normalized gene expression data from unannotated WSIs of tumor biopsies. They matched WSIs and RNA-sequencing profiles from 28 different cancer types to train a deep learning model. The model was able to predict the spatial expression of various genes throughout a WSI, including the ability to differentiate types of lymphocytes and predict gene expression dysregulation for universal hallmark genes of cancer and cancer type–specific pathways.79 Of note, the ability of the model to distinguish T cells from B cells, despite the known difficulty of differentiating these cell types based on morphometry alone, has profound implications for the utility of deep learning to capture more subtle, currently unknown histopathological patterns.80 These distinct inflammatory phenotypes may portend unique treatment sensitivities and could change the ways we think about a variety of diseases, including IBD.
Machine learning omics-based approaches for predicting IBD outcomes have the potential to be more powerful than any of the aforementioned ML methods due to their high sensitivity. In oncology, ML approaches to transcriptomic and proteomic analyses have been able to distinguish between cellular transcripts at the single-cell level, which could be used to quantify changes in the heterogeneity of tumors throughout a treatment course and detect resistant clonal subpopulations long before they appear on imaging.81 Omics approaches from serum also have the potential to give a more global view of disease compared with a biopsy or histological section, which by nature are slices that could be missing the full extent of disease.82 In IBD, the exact causes of inflammation and mechanisms of clinical nonresponse are unclear. However, it may be possible to detect the presence of genetic alterations leading to therapeutic resistance well before the onset of clinical symptoms, which would provide a useful avenue for monitoring treatment response in IBD patients. Martin et al used single-cell analysis of ileal CD lesions to identify cellular patterns associated with resistance to anti-TNF therapy, and ML techniques could be viable methods to further explore these approaches.83 A summary of the ML predictive modeling articles for IBD to date is shown in Table 1.
Table 1.
Summary of existing machine learning predictive outcomes models for inflammatory bowel diseases.
| Author | Year | Predicted Outcome | Machine Learning Method | Top Predictors |
|---|---|---|---|---|
| Clinical and Laboratory | | | | |
| Dong et al.38 | 2019 | Progression to surgery in Crohn’s Disease | Random Forest | Disease progression on radiography, presence of abscess or fistula, no infliximab use, elevated CRP |
| Peng et al.39 | 2015 | Seasonal onset and relapse of IBD | Artificial Neural Network | Flares in Crohn’s Disease most likely in July and August |
| Waljee et al.14 | 2017 | Corticosteroid-free remission with thiopurines | Random Forest | Hemoglobin, lymphocytes, hematocrit, neutrophils, platelets |
| Waljee et al.10 | 2017 | Hospitalizations; Outpatient corticosteroid use | Random Forest | Age, serum albumin, immunosuppressant use, mean and highest platelet counts |
| Waljee et al.11,12a | 2018 | Corticosteroid-free remission with vedolizumab | Random Forest | Labs through week 6 of treatment initiation |
| Waljee et al.37 | 2019 | Ustekinumab response | Random Forest | Labs through week 8 or week 6 albumin/CRP ratio |
| Wang et al.40 | 2020 | Azathioprine nonadherence | Support Vector Machine + Random Forest | Education level, history of anxiety or depression, concerns about medications |
| Omics | | | | |
| Cushing et al.13 | 2019 | Refractory disease after ileocolectomy | Random Forest | Toll-like receptor, NOD-like receptor, TNF transcriptomics |
| Zarringhalam et al.77 | 2014 | Infliximab response | Penalized Logistic Regression | IFN-gamma, LPS, TNF transcriptomics |
a Two publications in this year explored remission with vedolizumab: one for ulcerative colitis and one for Crohn’s disease.
Ethics and Limitations of Machine Learning Predictive Modeling
Although ML applications for IBD are encouraging, there remain several obstacles to implementation. The strengths of ML lie in its ability to process complicated data sets and perceive patterns to a degree that mimics and may even surpass the ability of humans. The major impediments stem primarily from limited availability of annotated data, especially big data, for model training and validation that will be required for clinical integration of ML.22,36
For construction of ML models, the first barrier is often related to data access. Besides security concerns related to privacy and protected health information, the training data need to be as similar to real-world data as possible for the model to perform well in a clinical context. Supervised learning models require large training sets to produce meaningful insights; to train histopathology-based models, for example, some place a minimum requirement of 10,000 manually annotated slides for adequate real-world model performance.84 Expert human supervisors available for the time-intensive task of annotating or classifying data for model training are often in short supply. IBM’s Watson for Oncology ML algorithm recently came under fire for making erroneous recommendations that at times violated black box warnings, such as recommending use of the drug bevacizumab in a patient with severe bleeding. Despite initial claims that the algorithm was trained on patient records, internal documents revealed much of the training consisted of hypothetical synthetic cases with limited input of real medical data. This underscores the need for large training data sets and robust validation to prevent potential iatrogenic injury by an algorithm.85
In addition to difficulties with obtaining data to build ML models, all existing and future models need to be validated across heterogeneous populations to increase confidence in model generalizability. There have been well-documented issues with cross-sectional validation of ML models.86 Last year, for example, a widely publicized coronavirus disease 2019 (COVID-19) mortality predictive model using 3 biomarkers (lactate dehydrogenase, C-reactive protein, and lymphocyte count) was shown to be irreproducible by 3 groups.87–90 Techniques for expanding labeled training data sets, such as data augmentation or transfer learning, are being explored as ways to potentially circumvent the bottleneck of data acquisition and annotation, but how accurately these techniques mirror real-world medical data without magnifying bias has yet to be established.22 Compared with humans, ML may seem more impartial and objective in its calculations, but whatever biases exist in the data used to train the model can be perpetuated, and even amplified, by the model in actual use. Recent examples from Big Tech demonstrate the pitfalls of training data lacking adequate representation of historically underrepresented groups.91,92 For example, gender identification facial recognition algorithms made by Microsoft, IBM, and Face++ were far more likely to misidentify the gender of darker-skinned women (35%) than lighter-skinned men (1%).93 For IBD, any groups underrepresented in the training data may suffer from lower prediction accuracy, which may in turn influence the success of management decisions.
Racial and socioeconomic disparities in medical research cohorts have long been a challenge, and it will be important to actively take appropriate mitigation measures in the creation of ML training and validation data sets.94 A recent executive order issued by the Biden administration established an equitable data working group to disaggregate all federal data sets by race, ethnicity, gender, disability, income, and other key demographic variables, marking an important step to measure and promote equity in government action.95
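As a small illustration of why disaggregated reporting matters, the sketch below computes a model's error rate overall and separately for each subgroup. The group labels, record layout, function names, and counts are hypothetical and synthetic; they are not drawn from any study cited here.

```python
from collections import defaultdict


def overall_error_rate(records):
    """Fraction of records where the prediction differs from the truth."""
    errors = sum(1 for _, y_true, y_pred in records if y_true != y_pred)
    return errors / len(records)


def error_rate_by_group(records):
    """Error rate computed separately for each subgroup.

    records: list of (group, true_label, predicted_label) tuples.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for group, y_true, y_pred in records:
        totals[group] += 1
        if y_true != y_pred:
            errors[group] += 1
    return {group: errors[group] / totals[group] for group in totals}


# Synthetic example: group "B" makes up only 10% of the data set.
records = (
    [("A", 1, 1)] * 88 + [("A", 1, 0)] * 2 +   # 2 errors in 90 records
    [("B", 1, 1)] * 7 + [("B", 1, 0)] * 3      # 3 errors in 10 records
)

overall = overall_error_rate(records)
by_group = error_rate_by_group(records)
print("overall:", overall)
for group, rate in sorted(by_group.items()):
    print(group, round(rate, 3))
```

With these made-up counts, the aggregate error rate looks low even though the smaller group is misclassified far more often, which is the kind of gap that only becomes visible when performance is reported per subgroup.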
Another concern with construction of training and validation data sets is that they simulate an idealized version of the pathology of interest, with most possible edge cases or confounding conditions excluded. A challenge for real-world implementation of IBD ML models is for the models to distinguish IBD from other, potentially overlapping causes of intestinal inflammation, such as infection, ischemia, and medication side effects.45 Because ML models benefit from high-volume data exposure and often fail in unpredictable ways when exposed to situations that differ from those they were trained on, open-source publication of computational processes and data sets should be encouraged to help mitigate these errors. Furthermore, although many IBD ML studies are completed using locally obtained anonymized research data, creation of large annotated data sets could facilitate training and validation of more robust models. Studies have also varied in the metrics used to evaluate model performance (AUROC, sensitivity, accuracy, etc.) and the definitions for outcomes such as clinical remission. Ideally, open sharing of code and data would allow investigators to pit models from different studies head-to-head on completing the same tasks in the same data sets with the same definitions to truly evaluate their comparative performance. Such approaches through competitions such as ImageNet have been responsible for much of the rapid innovation seen in deep learning over the last decade and could similarly usher forward new breakthroughs in medical artificial intelligence. A possible avenue for this is to leverage emerging national multi-institutional collaborative efforts such as Risk Stratification of Rapid Disease Progression in Children With Crohn’s Disease (RISK)4 and Predicting Response to Standardized Pediatric Colitis Therapy (PROTECT),5 and international efforts including the IBD Plexus,7 Dutch IBD Biobank,6 and UK IBD BioResource,8 among many others.
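The point about inconsistent metrics can be illustrated with a short sketch that scores two hypothetical models on the same synthetic labels using accuracy, sensitivity, and AUROC. The model names, scores, and the 0.5 decision threshold are assumptions chosen only to show that the ranking of models can depend on the metric reported.

```python
def confusion_counts(y_true, y_score, threshold=0.5):
    """Return (tp, fp, tn, fn) after thresholding the predicted scores."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, y_score):
        pred = 1 if s >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def accuracy(y_true, y_score, threshold=0.5):
    tp, fp, tn, fn = confusion_counts(y_true, y_score, threshold)
    return (tp + tn) / (tp + fp + tn + fn)


def sensitivity(y_true, y_score, threshold=0.5):
    tp, _, _, fn = confusion_counts(y_true, y_score, threshold)
    return tp / (tp + fn)


def auroc(y_true, y_score):
    """Probability that a random positive is scored above a random negative
    (ties count as one half); threshold-independent."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Synthetic labels: 3 patients with the outcome, 7 without.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
# Hypothetical model A ranks patients perfectly but is poorly calibrated.
scores_a = [0.45, 0.40, 0.35, 0.30, 0.25, 0.20, 0.15, 0.10, 0.05, 0.02]
# Hypothetical model B ranks less well but crosses the threshold more often.
scores_b = [0.90, 0.80, 0.20, 0.60, 0.10, 0.10, 0.10, 0.10, 0.30, 0.10]

for name, scores in (("A", scores_a), ("B", scores_b)):
    print(name,
          "accuracy", round(accuracy(y_true, scores), 3),
          "sensitivity", round(sensitivity(y_true, scores), 3),
          "AUROC", round(auroc(y_true, scores), 3))
```

In this constructed example, model A has the higher AUROC while model B has the higher accuracy and sensitivity at the chosen threshold, so two studies reporting different metrics could each appear to favor a different model.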
Even after construction and validation of ML models, integration into existing clinical pathways may present additional limitations. Most ML efforts in IBD are still in the initial research phases and would require prospective clinical trials before earning Food and Drug Administration (FDA) approval for clinical deployment. The FDA has issued a statement on steps to advance digital health policies with a proposed regulatory framework for ML-based software as a medical device.96 Developing algorithms and providing proof-of-concept studies would underpin a successful institutional review board submission to conduct clinical trials to prove the safety and efficacy of these systems in a clinical environment.96 Endeavors to conduct such trials in IBD can look to the pipelines of previous success stories, such as the development of an autonomous ML-based system to detect diabetic retinopathy by Abramoff et al.97 Of note, although many models for IBD are built using data already acquired as part of routine IBD diagnosis, some workflow processes will still need to be adjusted. For example, integration of ML histopathological analysis would require all WSIs of colon biopsies to be digitized, which is time intensive. Furthermore, significant heterogeneity exists in staining and biopsy preparation techniques within and between institutions due to varying institutional protocols. Digital color and contrast optimization methods could be a universal way to address this limitation, but a standard has yet to be established.98
Lastly, there will undoubtedly be administrative and regulatory hurdles related to the ethical issues of computers making prognostic or management recommendations that previously could be made only by humans.22 Despite having metrics to evaluate model predictions, in some cases the interpretability of ML can be quite difficult. Machine learning algorithms are often referred to as black boxes, with the opacity of certain techniques making it hard to understand how the model arrived at its conclusions.99 If a model can produce highly accurate predictions about prognosis based on features of imaging that humans do not understand, can we still trust the predictions? Proponents of ML argue that the human brain is also a black box, with many of its processes occurring automatically without our complete understanding. However, if a model makes an erroneous prediction leading to a poor outcome, who is legally and ethically responsible? Considerable efforts are being dedicated to developing methods for interpretation and visualization of model features, which may help increase trust by clinicians and the public.100
Conclusions
The goal of ML in IBD management is not to replace man with a machine. Although ML has the potential to provide insights into disease with greater acuity than human physicians, most ML models in their current forms are only applicable under the specific set of parameters and situations in which they were trained. Far more research and validation would be required for models to be able to independently forecast IBD progression and prescribe appropriate management directives.36 As ML becomes increasingly prevalent, it is important for clinicians to understand when it can be applied, provide expert guidance on development and validation of the models, and interpret the model’s recommendations in the full context of the patient’s clinical scenario. For these reasons, we do not believe that ML will replace physicians in the foreseeable future. Rather, ML will be an additional tool in the armamentarium to support human-led decision-making and delivery of care. Although most of the approaches discussed in this review used one type of data, such as only histology or transcriptomics, future ML predictive models need to be developed that can synthesize multiple data types to produce more robust predictive metrics. The onus is on humanity to develop and train such models for the sake of improving patient outcomes. The field of ML in medicine and IBD is rapidly evolving with new use-cases being developed day by day. We hope this review has provided additional understanding of basic ML concepts and how ML-based predictive analytics can be applied to improve outcomes for patients with IBD.
Funding
Research reported in this publication was supported by the National Institute of Diabetes and Digestive and Kidney Diseases of the National Institutes of Health under award number K23DK117061-01A1 (Syed) and by the Litwin IBD Pioneers program, Crohn’s & Colitis Foundation.
References
- 1. Samaan M, Campbell S, Cunningham G, et al. Biologic therapies for Crohn’s disease: optimising the old and maximising the new. F1000Res. 2019;8:F1000 Faculty Rev-1210.
- 2. Actis GC, Pellicano R, Fagoonee S, Ribaldone DG. History of inflammatory bowel diseases. J Clin Med. 2019;8:1970.
- 3. Seyed Tabib NS, Madgwick M, Sudhakar P, et al. Big data in IBD: big progress for clinical practice. Gut. 2020;69:1520–1532.
- 4. Kugathasan S, Denson LA, Walters TD, et al. Prediction of complicated disease course for children newly diagnosed with Crohn’s disease: a multicentre inception cohort study. Lancet. 2017;389:1710–1718.
- 5. Hyams JS, Davis S, Mack DR, et al. Factors associated with early outcomes following standardised therapy in children with ulcerative colitis (PROTECT): a multicentre inception cohort study. Lancet Gastroenterol Hepatol. 2017;2:855–868.
- 6. Spekhorst LM, Imhann F, Festen EA, et al. Cohort profile: design and first results of the Dutch IBD Biobank: a prospective, nationwide biobank of patients with inflammatory bowel disease. BMJ Open. 2017;7:e016695.
- 7. IBD Plexus: Partnering to Accelerate Science. Published 2021. Updated 2021. Accessed January 10, 2021. https://www.crohnscolitisfoundation.org/research/current-research-initiatives/ibd-plexus
- 8. Parkes M. IBD BioResource: an open-access platform of 25 000 patients to accelerate research in Crohn’s and Colitis. Gut. 2019;68:1537–1540.
- 9. Joseph M. How President Obama Shaped the Future of Digital Health. 2016. Accessed January 16, 2021. https://techcrunch.com/2016/07/27/how-president-obama-shaped-the-future-of-digital-health/
- 10. Waljee AK, Lipson R, Wiitala WL, et al. Predicting hospitalization and outpatient corticosteroid use in inflammatory bowel disease patients using machine learning. Inflamm Bowel Dis. 2017;24:45–53.
- 11. Waljee AK, Liu B, Sauder K, et al. Predicting corticosteroid-free endoscopic remission with vedolizumab in ulcerative colitis. Aliment Pharmacol Ther. 2018;47:763–772.
- 12. Waljee AK, Liu B, Sauder K, et al. Predicting corticosteroid-free biologic remission with vedolizumab in Crohn’s disease. Inflamm Bowel Dis. 2018;24:1185–1192.
- 13. Cushing KC, Mclean R, McDonald KG, et al. Predicting risk of postoperative disease recurrence in Crohn’s disease: patients with indolent Crohn’s disease have distinct whole transcriptome profiles at the time of first surgery. Inflamm Bowel Dis. 2018;25:180–193.
- 14. Waljee AK, Sauder K, Patel A, et al. Machine learning algorithms for objective remission and clinical outcomes with thiopurines. J Crohns Colitis. 2017;11:801–810.
- 15. Krittanawong C. The rise of artificial intelligence and the uncertain future for physicians. Eur J Intern Med. 2018;48:e13–e14.
- 16. Miller DD. The medical AI insurgency: what physicians must know about data to practice with intelligent machines. NPJ Digital Med. 2019;2:62.
- 17. Sung JJ, Stewart CL, Freedman B. Artificial intelligence in health care: preparing for the fifth industrial revolution. Med J Aust. 2020;213:253–255.e251.
- 18. Scott IA, Cook D, Coiera EW, Richards B. Machine learning in clinical practice: prospects and pitfalls. Med J Aust. 2019;211:203–205.e201.
- 19. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521:436–444.
- 20. Rashidi HH, Tran NK, Betts EV, et al. Artificial intelligence and machine learning in pathology: the present landscape of supervised methods. Acad Pathol. 2019;6:2374289519873088–2374289519873088.
- 21. Rajkomar A, Dean J, Kohane I. Machine learning in medicine. N Engl J Med. 2019;380:1347–1358.
- 22. Lundervold AS, Lundervold A. An overview of deep learning in medical imaging focusing on MRI. Zeitschrift für Medizinische Physik. 2019;29:102–127.
- 23. Cortes C, Vapnik V. Support-vector networks. Mach Learn. 1995;20:273–297.
- 24. Breiman L. Random forests. Mach Learn. 2001;45:5–32.
- 25. Esteva A, Robicquet A, Ramsundar B, et al. A guide to deep learning in healthcare. Nat Med. 2019;25:24–29.
- 26. Raza K, Singh NK. A Tour of Unsupervised Deep Learning for Medical Image Analysis. Curr Med Imag. 2021. doi:10.2174/1573405617666210127154257. Epub ahead of print. PMID: 33504314.
- 27. Sari CT, Gunduz-Demir C. Unsupervised feature extraction via deep learning for histopathological classification of colon tissue images. IEEE Trans Med Imaging. 2019;38:1139–1149.
- 28. Komura D, Ishikawa S. Machine learning methods for histopathological image analysis. Comput Struct Biotechnol J. 2018;16:34–42.
- 29. Likas A, Vlassis N, Verbeek JJ. The global k-means clustering algorithm. Pattern Recognit. 2003;36:451–461.
- 30. Baldi P. Autoencoders, unsupervised learning, and deep architectures. In: Proceedings of ICML Workshop on Unsupervised and Transfer Learning; Proceedings of Machine Learning Research; Bellevue, Washington, USA; 2012:37–49.
- 31. Wold S, Esbensen K, Geladi P. Principal component analysis. Chemom Intell Lab Syst. 1987;2:37–52.
- 32. Raschka S. Model evaluation, model selection, and algorithm selection in machine learning. arXiv preprint arXiv:1811.12808. Cornell Tech; 2018.
- 33. Hajian-Tilaki K. Receiver operating characteristic (ROC) curve analysis for medical diagnostic test evaluation. Caspian J Intern Med. 2013;4:627–635.
- 34. Bradley AP. The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognit. 1997;30:1145–1159.
- 35. Dinga R, Penninx BWJH, Veltman DJ, et al. Beyond accuracy: Measures for assessing machine learning models, pitfalls and guidelines. bioRxiv. 2019:743138.
- 36. Topol EJ. High-performance medicine: the convergence of human and artificial intelligence. Nat Med. 2019;25:44–56.
- 37. Waljee AK, Wallace BI, Cohen-Mekelburg S, et al. Development and validation of machine learning models in prediction of remission in patients with moderate to severe Crohn disease. JAMA Network Open. 2019;2:e193721–e193721.
- 38. Dong Y, Xu L, Fan Y, et al. A novel surgical predictive model for Chinese Crohn’s disease patients. Medicine (Baltimore). 2019;98:e17510–e17510.
- 39. Peng JC, Ran ZH, Shen J. Seasonal variation in onset and relapse of IBD and a model to predict the frequency of onset, relapse, and severity of IBD based on artificial neural network. Int J Colorectal Dis. 2015;30:1267–1273.
- 40. Wang L, Fan R, Zhang C, et al. Applying machine learning models to predict medication nonadherence in Crohn’s disease maintenance therapy. Patient Prefer Adherence. 2020;14:917–926.
- 41. Rätsch G, Sonnenburg S, Schäfer C. Learning interpretable SVMs for biological sequence classification. BMC Bioinformatics. 2006;7:S9.
- 42. Choi K, Chun J, Han K, et al. Risk of anxiety and depression in patients with inflammatory bowel disease: a nationwide, population-based study. J Clin Med. 2019;8:654.
- 43. Kuo B, Bhasin M, Jacquart J, et al. Genomic and clinical effects associated with a relaxation response mind-body intervention in patients with irritable bowel syndrome and inflammatory bowel disease. PLoS One. 2015;10:e0123861–e0123861.
- 44. Iizuka O, Kanavati F, Kato K, et al. Deep learning models for histopathological classification of gastric and colonic epithelial tumours. Sci Rep. 2020;10:1504.
- 45. Mossotto E, Ashton JJ, Coelho T, et al. Classification of paediatric inflammatory bowel disease using machine learning. Sci Rep. 2017;7:2427.
- 46. Yu K-H, Zhang C, Berry GJ, et al. Predicting non-small cell lung cancer prognosis by fully automated microscopic pathology image features. Nat Commun. 2016;7:12474.
- 47. Beck AH, Sangoi AR, Leung S, et al. Systematic analysis of breast cancer morphology uncovers stromal features associated with survival. Sci Transl Med. 2011;3:108ra113–108ra113.
- 48. Viitaniemi V, Laaksonen J. Techniques for image classification, object detection and object segmentation. In: Paper presented at: International Conference on Advances in Visual Information Systems Springer; 2008.
- 49. Morgenstern S, Brook E, Rinawi F, et al. Tissue and peripheral eosinophilia as predictors for disease outcome in children with ulcerative colitis. Dig Liver Dis. 2017;49:170–174.
- 50. Zezos P, Patsiaoura K, Nakos A, et al. Severe eosinophilic infiltration in colonic biopsies predicts patients with ulcerative colitis not responding to medical therapy. Colorectal Dis. 2014;16:O420–430.
- 51. Alhmoud T, Gremida A, Colom Steele D, et al. Outcomes of inflammatory bowel disease in patients with eosinophil-predominant colonic inflammation. BMJ Open Gastroenterol. 2020;7:e000373–e000373.
- 52. Wang P, Xiao X, Glissen Brown JR, et al. Development and validation of a deep-learning algorithm for the detection of polyps during colonoscopy. Nat Biomed Eng. 2018;2:741–748.
- 53. de Groof AJ, Struyvenberg MR, van der Putten J, et al. Deep-learning system detects neoplasia in patients with Barrett’s esophagus with higher accuracy than endoscopists in a multistep training and validation study with benchmarking. Gastroenterology. 2020;158:915–929.e914.
- 54. Klang E, Barash Y, Margalit RY, et al. Deep learning algorithms for automated detection of Crohn’s disease ulcers by video capsule endoscopy. Gastrointest Endosc. 2020;91:606–613.e602.
- 55. Stidham RW, Liu W, Bishu S, et al. Performance of a deep learning model vs human reviewers in grading endoscopic disease severity of patients with ulcerative colitis. JAMA Network Open. 2019;2:e193963–e193963.
- 56. Fernandes SR, Pinto J, Marques da Costa P, Correia L. Disagreement among gastroenterologists using the Mayo and Rutgeerts endoscopic scores. Inflamm Bowel Dis. 2018;24:254–260.
- 57. Daperno M, Comberlato M, Bossa F, et al. Inter-observer agreement in endoscopic scoring systems: preliminary report of an ongoing study from the Italian group for inflammatory bowel disease (IG-IBD). Dig Liver Dis. 2014;46:969–973.
- 58. Iacucci M, Daperno M, Lazarev M, et al. Development and reliability of the new endoscopic virtual chromoendoscopy score: the PICaSSO (Paddington International Virtual ChromoendoScopy ScOre) in ulcerative colitis. Gastrointest Endosc. 2017;86:1118–1127.e1115.
- 59. Bossuyt P, Nakase H, Vermeire S, et al. Automatic, computer-aided determination of endoscopic and histological inflammation in patients with mild to moderate ulcerative colitis based on red density. Gut. 2020;69:1778–1786.
- 60. Ozawa T, Ishihara S, Fujishiro M, et al. Novel computer-assisted diagnosis system for endoscopic disease activity in patients with ulcerative colitis. Gastrointest Endosc. 2019;89:416–421.e411.
- 61. Yao H, Najarian K, Gryak J, et al. Fully automated endoscopic disease activity assessment in ulcerative colitis. Gastrointest Endosc. 2021;93:728–736.e1.
- 62. Takenaka K, Ohtsuka K, Fujii T, et al. Development and validation of a deep neural network for accurate evaluation of endoscopic images from patients with ulcerative colitis. Gastroenterology. 2020;158:2150–2157.
- 63. Dhaliwal J, Erdman L, Drysdal E, et al. Accurate classification of pediatric colonic IBD subtype using a random forest machine learning classifier. J Pediatr Gastroenterol Nutr. 2021;72:262–269.
- 64. Kilcoyne A, Kaplan JL, Gee MS. Inflammatory bowel disease imaging: current practice and future directions. World J Gastroenterol. 2016;22:917–932. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 65. Ordás I, Rimola J, Rodríguez S, et al. Accuracy of magnetic resonance enterography in assessing response to therapy and mucosal healing in patients with Crohn’s disease. Gastroenterology. 2014;146:374–382.e371. [DOI] [PubMed] [Google Scholar]
- 66. Puylaert CAJ, Schüffler PJ, Naziroglu RE, et al. Semiautomatic assessment of the terminal ileum and colon in patients with Crohn disease using MRI (the VIGOR++ project). Acad Radiol. 2018; 25: 1038– 1045. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 67. Schüffler PJ, Mahapatra D, Naziroglu RE, et al. Semi–automatic Crohn’s disease severity estimation on MR imaging. In: Paper presented at: 6th MICCAI Workshop on Abdominal Imaging – Computational and Clinical Applications; Springer International Publishing; 2014, 2014. [Google Scholar]
- 68. Gordic S, Bane O, Kihira S, et al. Evaluation of ileal Crohn’s disease response to TNF antagonists: validation of MR enterography for assessing response. Initial results. Eur J Radiol Open. 2020;7:100217. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 69. Zhu HB, Xu D, Ye M, et al. Deep learning–assisted MRI prediction of tumor response to chemotherapy in patients with colorectal liver metastases. Int J Cancer. 2021; 148(7): 1717– 1730. [DOI] [PubMed] [Google Scholar]
- 70. Gurney–Champion OJ, Kieselmann JP, Wong KH, et al. A convolutional neural network for contouring metastatic lymph nodes on diffusion–weighted magnetic resonance images for assessment of radiotherapy response. Phys Imaging Radiat Oncol. 2020;15:1–7. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 71. MacGregor RM, Guo A, Masood MF, et al. Machine learning outcome prediction in dilated cardiomyopathy using regional left ventricular multiparametric strain. Ann Biomed Eng. 2021; 49: 922– 932. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 72. Emblem KE, Due–Tonnessen P, Hald JK, et al. Machine learning in preoperative glioma MRI: survival associations by perfusion–based support vector machine outperforms traditional MRI. J Magn Reson Imaging. 2014;40:47–54. [DOI] [PubMed] [Google Scholar]
- 73. van der Burgh HK, Schmidt R, Westeneng HJ, et al. Deep learning predictions of survival based on MRI in amyotrophic lateral sclerosis. Neuroimage Clin. 2017;13:361–369. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 74. Imhann F, Van der Velde KJ, Barbieri R, et al. The 1000IBD project: multi–omics data of 1000 inflammatory bowel disease patients; data release 1. BMC Gastroenterol. 2019;19:5. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 75. Kumar M, Garand M, Al Khodor S. Integrating omics for a better understanding of inflammatory bowel disease: a step towards personalized medicine. J Transl Med. 2019;17:419. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 76. Khorasani HM, Usefi H, Peña–Castillo L. Detecting ulcerative colitis from colon samples using efficient feature selection and machine learning. Sci Rep. 2020;10:13744. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 77. Zarringhalam K, Enayetallah A, Reddy P, Ziemek D. Robust clinical outcome prediction based on Bayesian analysis of transcriptional profiles and prior causal networks. Bioinformatics. 2014;30:i69–77. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 78. Verstockt B, Sudahakar P, Creyns B, et al. DOP70 An integrated multi-omics biomarker predicting endoscopic response in ustekinumab treated patients with Crohn’s disease. J Crohn’s Colitis. 2019;13:S072–S073.
- 79. Schmauch B, Romagnoni A, Pronier E, et al. A deep learning model to predict RNA-Seq expression of tumours from whole slide images. Nat Commun. 2020;11:3877.
- 80. Strokotov DI, Yurkin MA, Gilev KV, et al. Is there a difference between T- and B-lymphocyte morphology? J Biomed Opt. 2009;14:064036.
- 81. Hu Y, Hase T, Li HP, et al. A machine learning approach for the identification of key markers involved in brain development from single-cell transcriptomic data. BMC Genomics. 2016;17:1025.
- 82. Uniken Venema WT, Voskuil MD, Vila AV, et al. Single-cell RNA sequencing of blood and ileal T cells from patients with Crohn’s disease reveals tissue-specific characteristics and drug targets. Gastroenterology. 2019;156:812–815.e822.
- 83. Martin JC, Chang C, Boschetti G, et al. Single-cell analysis of Crohn’s disease lesions identifies a pathogenic cellular module associated with resistance to anti-TNF therapy. Cell. 2019;178:1493–1508.e1420.
- 84. Campanella G, Hanna MG, Geneslaw L, et al. Clinical-grade computational pathology using weakly supervised deep learning on whole slide images. Nat Med. 2019;25:1301–1309.
- 85. Ross C, Swetlitz I. IBM’s Watson supercomputer recommended ‘unsafe and incorrect’ cancer treatments, internal documents show. 2018. Published July 25, 2018. Accessed February 5, 2021.
- 86. Zech JR, Badgeley MA, Liu M, et al. Variable generalization performance of a deep learning model to detect pneumonia in chest radiographs: a cross-sectional study. PLOS Med. 2018;15:e1002683.
- 87. Yan L, Zhang H-T, Goncalves J, et al. An interpretable mortality prediction model for COVID-19 patients. Nat Mach Intell. 2020;2:283–288.
- 88. Barish M, Bolourani S, Lau LF, et al. External validation demonstrates limited clinical utility of the interpretable mortality prediction model for patients with COVID-19. Nat Mach Intell. 2021;3:25–27.
- 89. Quanjel MJR, van Holten TC, Gunst-van der Vliet PC, et al. Replication of a mortality prediction model in Dutch patients with COVID-19. Nat Mach Intell. 2021;3:23–24.
- 90. Dupuis C, De Montmollin E, Neuville M, et al. Limited applicability of a COVID-19 specific mortality prediction rule to the intensive care setting. Nat Mach Intell. 2021;3:20–22.
- 91. Simonite T. Machines taught by photos learn sexist view of women. 2017. Published August 21, 2017. Accessed January 10, 2021. https://www.wired.com/story/machines-taught-by-photos-learn-a-sexist-view-of-women/
- 92. Vincent J. Google ‘fixed’ its racist algorithm by removing gorillas from its image-labeling tech. 2018. Published January 12, 2018. Accessed January 10, 2021. https://www.theverge.com/2018/1/12/16882408/google-racist-gorillas-photo-recognition-algorithm-ai
- 93. Lohr S. Facial recognition is accurate, if you’re a white guy. The New York Times. 2018. Published February 12, 2018.
- 94. Konkel L. Racial and ethnic disparities in research studies: the challenge of creating more diverse cohorts. Environ Health Perspect. 2015;123:A297–A302.
- 95. Biden JR. Executive order on advancing racial equity and support for underserved communities through the federal government. In: The White House, ed. Office of the Federal Register, National Archives and Records Administration; 2021. https://www.govinfo.gov/app/details/DCPD-202100054/
- 96. Artificial intelligence and machine learning (AI/ML) software as a medical device action plan. In: Food and Drug Administration, ed. United States Food and Drug Administration; 2021. https://www.fda.gov/medical-devices/software-medical-device-samd/artificial-intelligence-and-machine-learning-software-medical-device
- 97. Abràmoff MD, Lavin PT, Birch M, et al. Pivotal trial of an autonomous AI-based diagnostic system for detection of diabetic retinopathy in primary care offices. NPJ Digital Med. 2018;1:39.
- 98. Vahadane A, Peng T, Sethi A, et al. Structure-preserving color normalization and sparse stain separation for histological images. IEEE Trans Med Imaging. 2016;35:1962–1971.
- 99. Schmelzer R. Towards a more transparent AI. 2020. Published May 23, 2020. Accessed January 10, 2021. https://www.forbes.com/sites/cognitiveworld/2020/05/23/towards-a-more-transparent-ai/?sh=5aef3f493d93
- 100. Montavon G, Samek W, Müller K-R. Methods for interpreting and understanding deep neural networks. Digital Signal Process. 2018;73:1–15.





