Abstract
The widespread adoption of electronic health records has resulted in an abundance of imaging and clinical information. New data-processing technologies have the potential to revolutionize the practice of medicine by deriving clinically meaningful insights from large-volume data. Among these techniques is supervised machine learning, in which computer algorithms build self-improving models that learn from labeled data to solve problems. One clinical area of application for supervised machine learning is oncology, where machine learning has been used for cancer diagnosis, staging, and prognostication. This review describes a framework to aid clinicians in understanding and critically evaluating studies that apply supervised machine learning methods. Additionally, we describe current studies applying supervised machine learning techniques to the diagnosis, prognostication, and treatment of cancer, with a focus on gastroenterological cancers and other related pathologies.
Keywords: machine learning, supervised learning, automated diagnosis, artificial intelligence
The widespread adoption of electronic health records has resulted in an abundance of imaging and clinical information. The wealth of data, while providing new opportunities for data-driven medicine, has also resulted in noise that has posed challenges in identifying clinically impactful insights. New data-processing technologies have the potential to revolutionize the practice of medicine by deriving meaningful insights from large-volume data.
Among those techniques is machine learning, a branch of artificial intelligence that solves problems through self-improving models that learn from data instead of following prescribed rules.1 In machine learning, models learn to accurately predict outcomes or classify data from large datasets. There are many conceptual model categories within machine learning. In simple terms, the major paradigms are supervised learning (learning from labeled data), unsupervised learning (finding structure in unlabeled data), and reinforcement learning (learning through trial and error guided by rewards).
Machine learning is the foundation of many technologies, including personalized content recommendation on websites, fraud-detection software, and self-driving cars. Machine learning methods have been applied to medical problems since the late 1960s.2 In recent years, the rise of electronic medical data has fueled renewed interest in medical artificial intelligence research and development. These efforts have produced diagnostic models that have matched and occasionally surpassed the performance of experienced physicians.3,4
In medicine, oncology has been at the forefront of the application of machine learning methods to clinical problems. Within oncology, most machine learning research has focused on using genetic sequencing, clinical imaging, and images of histopathological specimens to diagnose cancer as early as possible in many types of primary cancers.5–7 Machine learning models have also been used to predict the survival or risk of recurrence for patients already diagnosed with cancer.8–10 More recently, machine learning has been used to help clinicians recommend cancer treatment modalities when evidence-driven guidelines are lacking, inconsistent, or impractical.11 For example, the Liver Imaging Reporting and Data System (LI-RADS), which is used to standardize the diagnosis of hepatocellular carcinoma, is complex and requires manual assessment of too many variables to be easily implemented in practice.12 Additionally, machine learning models can efficiently predict responses to various treatment options based on a patient’s individual clinical presentation, helping clinicians determine which treatment course is most appropriate for each patient. These predictions help clinicians make evidence-driven decisions in patient care while also advancing the goals of precision medicine to deliver more tailored treatment recommendations to patients. While machine learning technologies have yet to be incorporated into routine clinical practice, these advances have the potential to improve standards for early diagnosis and disease prognostication.
Given its increasing relevance, physicians must acquire the skills to evaluate these technologies and translate them effectively into practice. This review aims to provide clinicians with a basic understanding of supervised learning, the branch of machine learning most commonly applied to the clinical management of cancer. We first provide a brief overview of supervised machine learning fundamentals and then outline a framework for effectively evaluating studies that use supervised machine learning techniques.
Supervised Machine Learning Fundamentals
Artificial intelligence is a broad term that includes machine learning and refers to the use of algorithms to automate tasks such as problem-solving and pattern recognition. Machine learning refers to algorithms that are self-improving. In other words, explicit rules and logic are not prespecified but rather learned by the algorithm through exposure to data.
Supervised machine learning is a subtype of machine learning that is distinct because it requires labeled input and output data to learn. In supervised learning, the model learns to associate variables from the input data (in machine learning terms, features) with the final outcome. The process of assembling the labeled dataset (training data) often requires manual annotation, and the annotated data serve as the ground truth on which the algorithm's outcome predictions are based.
There are many types of learning models, ranging from a basic logistic regression to random forests, neural networks, and support vector machines. The selection of a learning model is nuanced and depends on the precise clinical question at hand, and we will discuss some basic rationale for model selection here. Similarly, there are many programming languages in which these algorithms can be constructed. Selection of a programming language is primarily based on the computer scientist’s experience and preference and may depend on the availability of specific open-source machine learning libraries with tools (e.g. TensorFlow, Scikit-learn, Keras) that facilitate easy implementation of code.
After learning the associations between input features and the outcome of interest, trained models can accurately predict the outcome in previously unencountered cases.13 For example, if trained on a large dataset of magnetic resonance (MR) images with lesions labeled as malignant or benign, a model could then predict (in machine learning terminology, classify) whether a given lesion is cancerous. This outcome-focused framework is distinct from traditional statistical models, which are often employed with relatively less focus on predictions and a greater emphasis on forms of inference such as hypothesis testing. Common hypotheses include determining effect sizes of a treatment or identifying causal relationships between variables and outcomes. Machine learning is often not the most effective methodology to answer these types of scientific questions.
If the input data do not include all relevant variables, the model may fail to achieve high predictive accuracy. Continuing from the previous example, relevant imaging variables such as size of lesion and degree of contrast uptake and washout must be provided for the model to have enough information to classify lesions as benign or malignant. If less relevant variables (such as organ volume) are specified, the model will learn to minimize the contribution of such features to the model’s predictions. Once a set of features has been identified, the machine learning model uses this information to predict the outcome. By comparing its guess with the actual outcome, the model learns the combination and relative contribution of features that are most predictive of the outcome. Eventually, the model identifies a combination and relative contribution of each feature that can reliably reproduce the outcome.
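To make these ideas concrete, the following sketch trains a supervised classifier on synthetic, labeled data; it is illustrative only and not drawn from any study in this review. The features (lesion size, contrast washout, organ volume) and all numbers are hypothetical, with the labels constructed to depend on the first two features but not the third, mirroring the relevant-versus-irrelevant-feature discussion above.

```python
# Illustrative sketch: a supervised classifier learns to label lesions as
# benign (0) or malignant (1) from hand-crafted features. All feature names,
# distributions, and thresholds here are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
lesion_size = rng.normal(20, 8, n)       # mm; larger lesions more often malignant
washout = rng.normal(0.5, 0.2, n)        # contrast washout index
organ_volume = rng.normal(1500, 300, n)  # irrelevant to the simulated outcome

# Ground-truth labels depend on size and washout, not on organ volume
logits = 0.3 * (lesion_size - 20) + 4 * (washout - 0.5)
y = (logits + rng.normal(0, 1, n) > 0).astype(int)
X = np.column_stack([lesion_size, washout, organ_volume])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)  # learns feature weights
acc = model.score(X_te, y_te)
print(f"held-out accuracy: {acc:.2f}")
```

During training, the model assigns each feature a weight that reflects its contribution to the prediction; the uninformative organ-volume feature receives correspondingly little influence.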
For the specific task of classifying binary events (e.g., identifying a lesion as benign or malignant), several learning models can be used. Most models use input data to find a rule or boundary that separates data into categories of outcomes—a method referred to as discriminative modeling. Examples of discriminative models include logistic regression, support vector machines, and random forest models.14 Generative models, by contrast, such as Naive Bayes, calculate probabilities of encountering various inputs and use these probabilities to predict outcomes.14 In general, discriminative models perform better than generative models on large, well-balanced datasets, whereas generative models can detect and compensate for imbalances that tend to emerge in small training datasets.15
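The contrast between the two model families can be sketched on the same synthetic binary task; the dataset and results below are illustrative, not from the cited comparison.15

```python
# Minimal comparison of a discriminative model (logistic regression) and a
# generative one (Gaussian Naive Bayes) on one synthetic binary problem.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=10, n_informative=5,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

disc = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)  # learns a decision boundary
gen = GaussianNB().fit(X_tr, y_tr)  # models p(features | class), applies Bayes' rule
disc_acc = disc.score(X_te, y_te)
gen_acc = gen.score(X_te, y_te)
print(f"discriminative: {disc_acc:.2f}, generative: {gen_acc:.2f}")
```

Both approaches solve the same classification task; they differ in whether they model the boundary between classes directly or the distribution of inputs within each class.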
When analyzing clinical imaging data, the aforementioned learning models may be too simplistic to produce reliably accurate predictions. A magnetic resonance imaging (MRI) study, for example, consists of a series of images, each of which contains many pixels. Because of this large volume of data, input variables (features) are often too numerous to process, too difficult to quantify, or, in some cases, simply not known. In these cases, artificial neural networks (ANNs) are often used. Inspired by neural networks in the brain, ANNs are multilayered networks that are able to process large amounts of data and capture complex relationships.16
Convolutional neural networks (CNNs) are a type of ANN used in image processing that employs deep learning, a term referring to network architectures with many hidden layers. Deep learning neural networks can make predictions directly from raw image data. For example, a neural network can process an entire MR image as a matrix of pixels and learn to classify lesions as malignant without being explicitly programmed to consider features such as lesion size and enhancement characteristics. Instead, neural networks look for shapes and patterns in the image and identify those that appear to be highly associated with the outcome. Because features are not explicitly identified, CNNs and other deep learning algorithms function as a “black box,” often producing results without any apparent clinical intuition. In other words, they will not easily reveal which image characteristics were extracted from the raw pixel data of an MR image to ultimately determine lesion malignancy. This opacity is a major drawback in translating these algorithms into clinical practice, as medical specialists are hesitant to use technologies that cannot be rationalized according to tangible scientific and medical principles. In response, recent studies have aimed to produce outputs more readily interpretable by clinicians, such as heat maps that identify areas of interest on imaging that clinicians can then evaluate.17 However, without improved clinician understanding of the statistical methods underlying deep learning, the full potential of deep learning models cannot be realized (Fig. 1).
Fig. 1.

Schematic illustrating components and value of machine learning.
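The filtering operation at the heart of a CNN can be illustrated with a short, self-contained sketch (not from any cited study). The 3×3 kernel below is fixed as a simple vertical edge detector purely for illustration; in a real CNN, the filter weights are learned from the training data rather than specified by hand.

```python
# Sketch of the core CNN operation: sliding a small filter over an image to
# produce a feature map that highlights a pattern (here, vertical edges).
import numpy as np

def conv2d(image, kernel):
    """Valid-mode 2D cross-correlation, as used in CNN layers."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

# Toy "image": a bright square region on a dark background
image = np.zeros((8, 8))
image[2:6, 2:6] = 1.0
edge_kernel = np.array([[-1, 0, 1],
                        [-1, 0, 1],
                        [-1, 0, 1]])  # responds to vertical intensity transitions
fmap = conv2d(image, edge_kernel)
print(fmap.shape)  # feature map is smaller than the input image
```

The feature map responds strongly (positively and negatively) at the left and right borders of the bright square, which is exactly the kind of low-level pattern a trained CNN combines across layers to recognize more complex shapes.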
Framework for Evaluating Machine Learning Literature
Evaluating studies using machine learning techniques can be challenging for clinicians without advanced training in statistics and computer science. A simple framework for critically examining such studies and evaluating the clinical translational potential of a proposed model is provided in Fig. 2.
Fig. 2.

Clinical framework for the evaluation of machine learning literature.
Is Machine Learning an Appropriate Solution for This Problem?
The first step in evaluating a study is to determine whether machine learning techniques are appropriate for the medical problem being addressed. The primary purpose of a machine learning model is to accurately predict an outcome. Machine learning is generally inferior to other statistical methods if a study is seeking to test an idea within the context of a particular theory (e.g., what amount of necrosis is most predictive of response to treatment?). In this example, the researcher is imposing a theory that necrosis is an indicator of treatment response. Machine learning, by contrast, uses data to predict an outcome without relying on any existing theories (e.g., how do we predict which patients will respond to a given treatment using pretreatment imaging?).18 In this case, even if necrosis is statistically related to treatment response, the model may prioritize other related features if they produce a better performing model. In other words, understanding the relative contribution of each feature is typically less important in machine learning than achieving superior final model performance.
The clinical management of cancer has many important and elusive outcomes of interest. Commonly studied outcomes include the presence of cancer, staging, survival, recurrence, and treatment response. Accurate prediction of any of these outcomes will benefit patient care, thus presenting many potential applications of machine learning technologies in oncologic care.
Is the Training Cohort Large and Representative?
The performance of a machine learning model depends heavily on the quality and quantity of the data it is trained on. Large amounts of training data are often necessary, particularly for models with many features or for neural networks. In deep learning, it is not uncommon to train models with millions of samples. If the training dataset is too small or too homogeneous, the model will be inaccurate or overfitted. An overfitted model accurately predicts outcomes in the dataset it was trained on but performs poorly on other data. If the training cohort does not include a heterogeneous representation of a disease, the external validity of the model is also likely to suffer. For example, if the training data do not include both well-circumscribed and diffuse tumors on imaging, the model will likely not succeed in diagnosing the wide spectrum of tumors typically encountered in clinical practice.
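Overfitting is easy to demonstrate with a toy experiment on synthetic data (illustrative only): a fully grown decision tree memorizes noisy training labels perfectly yet generalizes worse to held-out patients.

```python
# Toy demonstration of overfitting: an unconstrained decision tree fits the
# training set perfectly, including its label noise, but its held-out
# accuracy is lower. Dataset is synthetic and illustrative.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=20, n_informative=3,
                           flip_y=0.2, random_state=0)  # flip_y injects label noise
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)  # no depth limit
train_acc = tree.score(X_tr, y_tr)
test_acc = tree.score(X_te, y_te)
print(f"train: {train_acc:.2f}, test: {test_acc:.2f}")
```

The gap between training and test accuracy is the overfitting signature to look for when a study reports performance only on its training cohort.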
Some studies rely on large databases of deidentified information, such as the National Cancer Database.19 Others combine multiple similar databases to increase the size of their training cohorts.3 When large cohorts of patient data are unavailable, the existing data can be augmented. One technique is to split an image into multiple patches and use each patch as a data point in the training dataset.11 While data augmentation can increase the size of a training cohort, it does not make the training dataset more representative of the general patient population. Carefully examining the selection process of the training dataset will ensure that clinicians do not overestimate the external validity of a machine learning model; a common mistake is to presume that the proposed model will perform equally well when applied to a patient population that differs significantly from the training set.
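The patch-splitting technique mentioned above can be sketched in a few lines; the image and patch sizes here are arbitrary, and the array stands in for a real scan.

```python
# Sketch of patch-based augmentation: one image is tiled into patches, each
# of which becomes a separate training sample.
import numpy as np

def extract_patches(image, patch_size):
    """Split a 2D image into non-overlapping square patches."""
    h, w = image.shape
    p = patch_size
    patches = [image[i:i+p, j:j+p]
               for i in range(0, h - p + 1, p)
               for j in range(0, w - p + 1, p)]
    return np.stack(patches)

image = np.arange(64 * 64, dtype=float).reshape(64, 64)  # stand-in for a scan
patches = extract_patches(image, 16)
print(patches.shape)  # one 64x64 image yields 16 samples of 16x16 pixels
```

Note that every patch still comes from the same patient, which is why augmentation enlarges a cohort without making it more representative.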
Are the Selected Features Clinically Meaningful?
Using input features commonly captured during routine clinical care will increase the likelihood of a machine learning model ultimately being adopted into patient care. Along these lines, features should be simple to extract from the input data. When considerable manipulation (in machine learning language, preprocessing) of the input data is necessary to derive features, clinical implementation and model reproducibility may be hindered.
Most studies using clinical, imaging, or histological data use commonly measured features in their models, such as mass size, shape, lymph node infiltration, and family history of cancer. Studies employing genomic data often use data sources that are not yet routinely collected in patient care, such as single-nucleotide polymorphism (SNP) sequencing, polyadenylation patterns, and other molecular biomarkers. For example, one study aiming to predict response to therapy for patients with severe ulcerative colitis used microRNA profiles, which are not yet routinely obtained in many institutions.20 Another study accomplished a similar task, predicting tumor response to chemoradiotherapy in esophageal cancer using readily available imaging radiomics features, which in turn improved the relative translational utility of the model.21 As genomic analysis of tumors is incorporated into clinical practice, the translational potential of these studies will grow in tandem. However, the present-day clinical applicability of a proposed model should be considered.
What Methods Were Used to Evaluate the Model Performance?
After training a machine learning model, it is important to evaluate its performance on a different patient cohort. To accomplish this, the original data are often randomly split into three sets: training, validation, and testing sets. Several candidate models are trained on the same training set using different parameters. These models are then tested on the validation set, and the best model is chosen based on its ability to predict outcomes in these previously unencountered data. This process is called cross-validation. The simplest form of cross-validation is the holdout method, in which the data are split only once into training and testing sets, and the mean error on the held-out data is reported. The advantage of the holdout method is that it only needs to be run once, but the resulting performance estimate is subject to higher variance. K-fold cross-validation improves on this method: the data are divided into k subsets (e.g., 5- or 10-fold), and the holdout procedure is repeated k times, with each subset serving once as the test set. The variance of the performance estimate is typically lower than with the holdout method, though the process is computationally more intensive because it is repeated many times. Leave-one-out cross-validation is an extreme version of k-fold cross-validation in which k equals the number of data points in the set; this method is frequently used for small datasets. Cross-validation allows the best combination of features and parameters to be identified, and the model is then retrained with this optimized configuration. Finally, the test set is used to estimate how well the validated model is likely to perform on data it has not yet encountered.
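This workflow, in which candidate models are compared by cross-validation and a held-out test set provides the final performance estimate, can be sketched with scikit-learn on synthetic data (the two candidate models are arbitrary choices for illustration).

```python
# Sketch of model selection via 5-fold cross-validation followed by a final
# evaluation on a held-out test set. Data and candidates are illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_dev, X_test, y_dev, y_test = train_test_split(X, y, test_size=0.2,
                                                random_state=0)

candidates = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "random forest": RandomForestClassifier(random_state=0),
}
# Mean accuracy across the 5 folds guides model selection
cv_scores = {name: cross_val_score(m, X_dev, y_dev, cv=5).mean()
             for name, m in candidates.items()}
best_name = max(cv_scores, key=cv_scores.get)
best = candidates[best_name].fit(X_dev, y_dev)  # retrain on all development data
test_acc = best.score(X_test, y_test)           # final, previously untouched data
print(best_name, f"test accuracy: {test_acc:.2f}")
```

Keeping the test set untouched until this final step is what makes the reported performance an honest estimate rather than an artifact of model selection.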
Model performance is often reported as the true positive rate (sensitivity) and the true negative rate (specificity), representing the model’s ability to classify outcomes correctly. Accuracy is colloquially used to describe model performance in general, but statistical accuracy refers specifically to the proportion of correctly predicted samples among all samples and is itself a commonly reported performance measure. These values may also be represented visually in a receiver operating characteristic (ROC) curve, which plots the true positive rate against the false positive rate across classification thresholds; the area under the curve (AUC) summarizes this trade-off. The AUC ranges from 0 to 1, where a perfect classifier has an AUC of 1. The C-statistic is another term for the area under the ROC curve. F scores are another metric often used to evaluate model performance.22 In simple terms, the F score combines the precision (positive predictive value) and recall (sensitivity) of the model. Of note, the F score does not account for the true negative rate (specificity) of a model, which is an important component of a model’s performance. F scores must therefore be interpreted cautiously.
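The metrics above can all be computed from a model's predicted scores; the labels and scores below are a small hand-made example rather than output from any study in this review.

```python
# Computing sensitivity, specificity, accuracy, F1, and AUC from a model's
# predicted scores. Labels and scores are a small illustrative example.
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score, roc_auc_score

y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_score = np.array([0.9, 0.8, 0.7, 0.3, 0.6, 0.4, 0.2, 0.2, 0.1, 0.1])
y_pred = (y_score >= 0.5).astype(int)  # threshold the scores at 0.5

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)        # true positive rate
specificity = tn / (tn + fp)        # true negative rate
accuracy = (tp + tn) / len(y_true)  # fraction of all correct predictions
f1 = f1_score(y_true, y_pred)       # harmonic mean of precision and recall
auc = roc_auc_score(y_true, y_score)  # threshold-free ranking quality
print(sensitivity, specificity, accuracy, round(f1, 2), round(auc, 2))
```

Note that sensitivity, specificity, accuracy, and the F score all depend on the chosen threshold (0.5 here), whereas the AUC summarizes performance across all thresholds.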
How Does the Final Performance of the Model Compare with Existing Standards?
The end goal of applying machine learning techniques to a medical problem is to produce a clinically translatable tool. When applied to clinical tasks, machine learning model accuracies should always be compared with existing standards to determine whether the models are effective enough to be incorporated into clinical practice. Machine learning models are most valuable when they are consistently more accurate than the current gold standard, whether that is expert physicians or current technology. One excellent study developed a model to classify gastric carcinoma from whole-slide histological images; the authors then compared the performance of their model to both the standard of care and state-of-the-art image-analysis technology in digital histopathology.23 This comparison allowed the researchers to articulate the nuanced superiority and appropriate application of their proposed model.
Such comparisons are often overlooked when the relevant benchmark for a model is a clinician’s determination rather than a technology or analytic method. However, the few studies that have compared model accuracies to physicians report performance equivalent to or slightly higher than that of physician experts. One study reported that radiologists had an average sensitivity of 82.5% and specificity of 96.5% for characterizing various liver lesions from MRI. When applied to the same task, a machine learning model achieved 92% accuracy, 92% sensitivity, and 98% specificity.11 In this study, quantifying and reporting the performance of expert physicians allowed the authors to conclude that their models were comparable to physician assessment, which increases the likelihood that these models can be brought into clinical practice with confidence in their clinical utility.
In the rare instances when a model is compared to physicians, physician performance is typically estimated from small groups. In the aforementioned liver lesion study, validation included only two board-certified radiologists.11 In addition to average radiologist performance, important metrics such as the standard deviation, range, and distribution of physician performance should be reported when possible. These metrics are understandably difficult to capture, but given the heterogeneity of clinician judgment, quantifying the spread of physician performance can only further demonstrate the clinical benefit of a consistent and reproducible model.
Conclusion and Future Directions
While the majority of current research is focused on automated diagnosis and prognosis, an increasing number of studies are applying machine learning to other aspects of cancer management. Machine learning methods have the potential to produce clinical tools that give oncologists greater insight into the likely outcomes of various paths of disease management and treatment. To this end, some early research has been conducted to predict treatment response from tumor traits and clinical imaging.21,24–26 Most importantly, clinical applicability needs to be brought to the forefront of machine learning research in cancer. Further work must be done to compare the performance of machine learning models to general clinician performance to tangibly demonstrate the value of these methods in patient care. A framework must be established to effectively translate learning models into clinical practice. This task involves establishing standards for the development of models to be used in patient care as well as creating interfaces through which clinicians can seamlessly integrate these applications into patient encounters. It is for these reasons that physicians of all types must understand and become involved in machine learning research; a clinical perspective is necessary to realize the potential value of machine learning in the practice of medicine (Appendix 1).
Funding
This review was partially supported by the National Institutes of Health/National Cancer Institute Grant # R01CA206180.
Appendix 1
Period Covered: 2014 to 2019
Keywords used in conjunction with “machine learning,” “supervised learning,” and “deep learning”: hepatitis b, hepatitis c, hepatocellular carcinoma, NASH, NAFLD, polyps, endoscopy, colorectal cancer, colon cancer, liver, pancreatitis, IBD, esophageal cancer, primary biliary cholangitis, gastrointestinal hemorrhage, treatment, outcome, colitis, celiac disease, prediction
Table 1.
Machine learning applications for the diagnosis of gastrointestinal diseases
| Authors | Summary | Primary data type | No. of patients (no. of samplesa) | Model type(s) | Best model performance |
|---|---|---|---|---|---|
| Dong et al6 | Diagnosis of esophageal varices from clinical data | Clinical | 347 | RF | AUC = 0.82 |
| Hamm et al11 | Classifying different types of liver lesions on MRI | Imaging | 494a | NN | Accuracy = 92% Sensitivity = 92% Specificity = 98% |
| Ito et al7 | Diagnosis of cT1b colorectal cancer from endoscopic imaging | Imaging | 190a | NN | AUC = 0.871 Sensitivity = 67.5% Specificity = 89% Accuracy = 81.2% |
| Lee et al27 | Differentiation between normal, ulcerous, and cancerous tissue in endoscopic imaging | Imaging | 787a | NN | AUC = 0.95 |
| Sharma et al23 | Classifying gastric carcinoma and detecting necrosis from histopathology | Histology | 15a | NN | Accuracy = 81.4% |
| Wang et al28 | Staging liver fibrosis in hepatitis B patients from elastography incorporating radiomics | Imaging | 1,990a | NN | AUC = 0.99 |
| Wang et al29 | Real-time detection of polyps during endoscopy with deep learning | Imaging | 28,321a | NN | AUC = 0.984 Sensitivity = 94.38% Specificity = 95.92% |
| Wu et al30 | Prediction of NAFLD with clinical data | Clinical | 577 | RF, naïve Bayes classifier, NN | AUC = 0.925 |
| Yip et al31 | Detecting NAFLD from clinical data | Clinical | 922 | Ridge regression, decision tree, AdaBoost | AUC = 0.88 Sensitivity = 92% Specificity = 90% |
| Zhou et al5 | Assessing chronic inflammation grade of chronic hepatitis B patients incorporating gene expression and clinical data | Genetic, clinical | 122 | PCA, RF, K-nearest neighbor, SVM | AUC = 0.88 |
Abbreviations: MRI, magnetic resonance imaging; NAFLD, nonalcoholic fatty liver disease; AUC, area under the curve; PCA, principal component analysis; RF, random forest; SVM, support vector machine; NN, neural network.
The number of samples derived from patients, not the number of unique patients participating in the study.
Table 2.
Machine learning applications for the prognosis of gastrointestinal diseases
| Authors | Summary | Primary data type | No. of patients (no. of samplesa) | Model type(s) | Best model performance |
|---|---|---|---|---|---|
| Augustin et al32 | Predicting outcome of acute variceal hemorrhage | Clinical | 267 | LR, CART | AUC = 0.83 |
| Cai et al8 | Predicting survival for patients with HCC after hepatectomy | Clinical, imaging | 299 | Bayesian network | 83.22%/51.33% |
| Chen et al33 | Predicting pancreatic neuroendocrine tumor pathology grade and prognosis from Doppler imaging | Imaging | 112 | RF, NN, SVM | AUC = 0.997 |
| Chen et al34 | Predicting the malignant potential of gastrointestinal stromal tumors preoperatively | Imaging | 222 | SVM | AUC = 0.87 |
| Eaton et al35 | Predicting hepatic decompensation in patients with primary biliary cholangitis | Clinical | 784 | Gradient boosting | C-statistic = 0.9 |
| Horie et al36 | Detecting and differentiating between different types of esophageal cancer | Imaging | 481 | NN | Sensitivity = 98% Accuracy = 98% |
| Chen et al20 | Predicting development of cirrhosis in patients with hepatitis C from gene signatures | Genetic, clinical | 574 | Naïve Bayes classifier | AUC = 0.76 |
| Ichimasa et al9 | Predicting lymph node metastasis presence of T1 colorectal cancer from endoscopic imaging | Imaging | 690 | NN | Sensitivity = 100% Specificity = 66% Accuracy = 69% |
| Kim et al10 | Predicting HCC recurrence incorporating radiomic imaging signatures | Imaging | 167 | RF | C-statistic = 0.72 |
| Merath et al37 | Predicting complications after gastrointestinal surgery | Clinical | 15,657 | Decision tree | C-statistic = 0.74 |
| Waljee et al38 | Predicting corticosteroid use and hospitalization time of patients with inflammatory bowel disease | Clinical | 20,368 | RF | AUC = 0.87 |
Abbreviations: HCC, hepatocellular carcinoma; LR, logistic regression; CART, classification and regression tree; RF, random forest; SVM, support vector machine; NN, neural network.
The number of samples derived from patients, not the number of unique patients participating in the study.
Table 3.
Machine learning applied to the treatment of gastrointestinal diseases
| Authors | Summary | Primary data type | No. of patients (no. of samplesa) | Model type(s) | Best model performance |
|---|---|---|---|---|---|
| Abajian et al24 | Predicting treatment response of patients with HCC | Imaging | 36 | RF, LR | Accuracy = 78% Sensitivity = 62.5% Specificity = 82.1% |
| Morilla et al25 | Predicting treatment response of patients with acute severe ulcerative colitis | Genetic, clinical | 76 | NN | Accuracy = 93% AUC = 0.91 |
| Jin et al26 | Prediction of response after chemoradiation for esophageal cancer | Imaging | 94 | SVM, XGBoost | Accuracy = 0.708 AUC = 0.541 |
| Riyahi et al21 | Prediction of pathologic tumor response to chemoradiotherapy in esophageal cancer | Imaging | 20 | SVM-Lasso model | Sensitivity = 94.4% Specificity = 91.8% AUC = 0.94 |
Abbreviations: HCC, hepatocellular carcinoma; LR, logistic regression; RF, random forest; SVM, support vector machine; NN, neural network.
The number of samples derived from patients, not the number of unique patients participating in the study.
Footnotes
Issue Theme Hepato-Pancreato-Biliary & Transplant Surgery; Guest Editor, Koji Hashimoto, MD, PhD
Conflict of Interest
Dr. Chapiro reports grants from the German-Israeli Foundation for Scientific Research and Development, Rolf W. Günther Foundation for Radiological Research, Boston Scientific, Philips Healthcare, and Guerbet, outside the submitted work.
References
- 1.Bishop CM. Pattern Recognition and Machine Learning. New York, NY: Springer; 2006 [Google Scholar]
- 2.Patel VL, Shortliffe EH, Stefanelli M, et al. The coming of age of artificial intelligence in medicine. Artif Intell Med 2009;46(01): 5–17 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3. Esteva A, Kuprel B, Novoa RA, et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature 2017;542(7639):115–118
- 4. Teare P, Fishman M, Benzaquen O, Toledano E, Elnekave E. Malignancy detection on mammography using dual deep convolutional neural networks and genetically discovered false color input enhancement. J Digit Imaging 2017;30(04):499–505
- 5. Zhou W, Ma Y, Zhang J, et al. Predictive model for inflammation grades of chronic hepatitis B: large-scale analysis of clinical parameters and gene expressions. Liver Int 2017;37(11):1632–1641
- 6. Dong TS, Kalani A, Aby ES, et al. Machine learning-based development and validation of a scoring system for screening high-risk esophageal varices. Clin Gastroenterol Hepatol 2019;17(09):1894–1901.e1
- 7. Ito N, Kawahira H, Nakashima H, Uesato M, Miyauchi H, Matsubara H. Endoscopic diagnostic support system for cT1b colorectal cancer using deep learning. Oncology 2019;96(01):44–50
- 8. Cai ZQ, Si SB, Chen C, et al. Analysis of prognostic factors for survival after hepatectomy for hepatocellular carcinoma based on a Bayesian network. PLoS One 2015;10(03):e0120805
- 9. Ichimasa K, Kudo SE, Mori Y, et al. Artificial intelligence may help in predicting the need for additional surgery after endoscopic resection of T1 colorectal cancer. Endoscopy 2018;50(03):230–240
- 10. Kim S, Shin J, Kim DY, Choi GH, Kim MJ, Choi JY. Radiomics on gadoxetic acid-enhanced magnetic resonance imaging for prediction of postoperative early and late recurrence of single hepatocellular carcinoma. Clin Cancer Res 2019;25(13):3847–3855
- 11. Hamm CA, Wang CJ, Savic LJ, et al. Deep learning for liver tumor diagnosis part I: development of a convolutional neural network classifier for multi-phasic MRI. Eur Radiol 2019;29(07):3338–3347
- 12. Siedlikowski ST, Kielar AZ, Ormsby EL, Bijan B, Kagay C. Implementation of LI-RADS into a radiological practice. Abdom Radiol (NY) 2018;43(01):179–184
- 13. Alpaydin E. Introduction to Machine Learning. 3rd ed. Cambridge, MA: MIT Press; 2014
- 14. Hastie T, Tibshirani R, Friedman JH. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd ed. New York, NY: Springer; 2009
- 15. Ng AY, Jordan MI. On discriminative vs. generative classifiers: a comparison of logistic regression and naive Bayes. Paper presented at: Advances in Neural Information Processing Systems; December 9–14, 2002; Vancouver, British Columbia, Canada
- 16. Nielsen MA. Neural Networks and Deep Learning. Determination Press; 2015
- 17. Quellec G, Charrière K, Boudi Y, Cochener B, Lamard M. Deep image mining for diabetic retinopathy screening. Med Image Anal 2017;39:178–193
- 18. Braman NM, Etesami M, Prasanna P, et al. Intratumoral and peritumoral radiomics for the pretreatment prediction of pathological complete response to neoadjuvant chemotherapy based on breast DCE-MRI. Breast Cancer Res 2017;19(01):57
- 19. Kim J, Shin H. Breast cancer survivability prediction using labeled, unlabeled, and pseudo-labeled patient data. J Am Med Inform Assoc 2013;20(04):613–618
- 20. Chen X, Duan Q, Xuan Y, Sun Y, Wu R. Possible pathways used to predict different stages of lung adenocarcinoma. Medicine (Baltimore) 2017;96(17):e6736
- 21. Riyahi S, Choi W, Liu CJ, et al. Quantifying local tumor morphological changes with Jacobian map for prediction of pathologic tumor response to chemo-radiotherapy in locally advanced esophageal cancer. Phys Med Biol 2018;63(14):145020
- 22. Begik O, Oyken M, Cinkilli Alican T, Can T, Erson-Bensan AE. Alternative polyadenylation patterns for novel gene discovery and classification in cancer. Neoplasia 2017;19(07):574–582
- 23. Sharma H, Zerbe N, Klempert I, Hellwich O, Hufnagl P. Deep convolutional neural networks for automatic classification of gastric carcinoma using whole slide images in digital histopathology. Comput Med Imaging Graph 2017;61:2–13
- 24. Abajian A, Murali N, Savic LJ, et al. Predicting treatment response to intra-arterial therapies for hepatocellular carcinoma with the use of supervised machine learning-an artificial intelligence concept. J Vasc Interv Radiol 2018;29(06):850–857.e1
- 25. Morilla I, Uzzan M, Laharie D, et al. Colonic MicroRNA profiles, identified by a deep learning algorithm, that predict responses to therapy of patients with acute severe ulcerative colitis. Clin Gastroenterol Hepatol 2019;17(05):905–913
- 26. Jin X, Zheng X, Chen D, et al. Prediction of response after chemoradiation for esophageal cancer using a combination of dosimetry and CT radiomics. Eur Radiol 2019;29(11):6080–6088
- 27. Lee JH, Kim YJ, Kim YW, et al. Spotting malignancies from gastric endoscopic images using deep learning. Surg Endosc 2019;33(11):3790–3797
- 28. Wang K, Lu X, Zhou H, et al. Deep learning radiomics of shear wave elastography significantly improved diagnostic performance for assessing liver fibrosis in chronic hepatitis B: a prospective multicentre study. Gut 2019;68(04):729–741
- 29. Wang P, Xiao X, Glissen Brown JR, et al. Development and validation of a deep-learning algorithm for the detection of polyps during colonoscopy. Nat Biomed Eng 2018;2(10):741–748
- 30. Wu CC, Yeh WC, Hsu WD, et al. Prediction of fatty liver disease using machine learning algorithms. Comput Methods Programs Biomed 2019;170:23–29
- 31. Yip TC, Ma AJ, Wong VW, et al. Laboratory parameter-based machine learning model for excluding non-alcoholic fatty liver disease (NAFLD) in the general population. Aliment Pharmacol Ther 2017;46(04):447–456
- 32. Augustin S, Muntaner L, Altamirano JT, et al. Predicting early mortality after acute variceal hemorrhage based on classification and regression tree analysis. Clin Gastroenterol Hepatol 2009;7(12):1347–1354
- 33. Chen K, Zhang W, Zhang Z, He Y, Liu Y, Yang X. Simple vascular architecture classification in predicting pancreatic neuroendocrine tumor grade and prognosis. Dig Dis Sci 2018;63(11):3147–3152
- 34. Chen T, Ning Z, Xu L, et al. Radiomics nomogram for predicting the malignant potential of gastrointestinal stromal tumours preoperatively. Eur Radiol 2019;29(03):1074–1082
- 35. Eaton JE, Vesterhus M, McCauley BM, et al. Primary Sclerosing Cholangitis Risk Estimate Tool (PREsTo) predicts outcomes of the disease: a derivation and validation study using machine learning. Hepatology 2018 (e-pub ahead of print). doi: 10.1002/hep.30085
- 36. Horie Y, Yoshio T, Aoyama K, et al. Diagnostic outcomes of esophageal cancer by artificial intelligence using convolutional neural networks. Gastrointest Endosc 2019;89(01):25–32
- 37. Merath K, Hyer JM, Mehta R, et al. Use of machine learning for prediction of patient risk of postoperative complications after liver, pancreatic, and colorectal surgery. J Gastrointest Surg 2019 (e-pub ahead of print). doi: 10.1007/s11605-019-04338-2
- 38. Waljee AK, Lipson R, Wiitala WL, et al. Predicting hospitalization and outpatient corticosteroid use in inflammatory bowel disease patients using machine learning. Inflamm Bowel Dis 2017;24(01):45–53
