Abstract
OBJECTIVES
Machine learning (ML) has great potential, but there are few examples of its implementation improving outcomes. The thoracic surgeon must be aware of pertinent ML literature and how to evaluate this field for the safe translation to patient care. This scoping review provides an introduction to ML applications specific to the thoracic surgeon. We review current applications, limitations and future directions.
METHODS
A search of the PubMed database was conducted with inclusion requirements being the use of an ML algorithm to analyse patient information relevant to a thoracic surgeon and contain sufficient details on the data used, ML methods and results. Twenty-two papers met the criteria and were reviewed using a methodological quality rubric.
RESULTS
ML demonstrated enhanced preoperative test accuracy, earlier pathological diagnosis, selection of therapies to maximize survival and prediction of adverse events and survival after surgery. However, only 4 studies performed external validation. Only 1 demonstrated improved patient outcomes, nearly all failed to perform model calibration and only 1 addressed fairness and bias, with most models not generalizable to different populations. There was also considerable variation in the detail provided to allow for reproducibility.
CONCLUSIONS
There is promise but also challenges for ML in thoracic surgery. The transparency of data and algorithm design and the systemic bias on which models are dependent remain issues to be addressed. Although there has yet to be widespread use in thoracic surgery, it is essential thoracic surgeons be at the forefront of the eventual safe introduction of ML to the clinic and operating room.
Keywords: Artificial intelligence, Machine learning, Prediction, Survival, Complications, Algorithm
INTRODUCTION
Artificial intelligence (AI) is now ubiquitous but has only recently begun to be utilized in health care [1, 2]. Patient data are becoming progressively more complex, providing a robust prospect for more sophisticated tools like AI to analyse this information beyond traditional statistical methods. Machine learning (ML) is a subset of AI, referring to computer algorithms that learn from large amounts of data and improve with experience to make accurate predictions for a given task. ML has shown the potential to augment personalized healthcare delivery, offering opportunities for improved outcomes and reduced costs [2]. This has led to an explosion of publications on ML in medicine. This surge is centred around increases in computational power and the availability of medical ‘big data’ to allow ML algorithms to understand patterns and make predictions using images, speech and text [1, 3–5]. Significant resources are increasingly being poured into AI, with the 2020 global AI in healthcare market size estimated at 4.9 billion US dollars and projected to reach 45.2 billion by 2026 [6].
Despite the hype surrounding ML in medicine, there are limited examples of its implementation leading to improved clinical outcomes. Therefore, as AI becomes more prevalent in the surgical sphere, it is imperative surgeons can critically evaluate these publications to lead the development and eventual implementation of this technology at the point of care [7]. Excellent reviews of ML in cardiac surgery have been published [8, 9], but little has been published specific to general thoracic surgery [10]. This paper provides a novel review of current ML models to address: (i) the current applications and limitations of AI specific to thoracic surgery and (ii) the future of AI in thoracic surgery.
MATERIALS AND METHODS
A literature review was conducted in December of 2020 on PubMed for clinical research papers and review papers published before January 2021. Non-English papers, abstracts without full text and paediatric populations were excluded to improve generalizability.
This search resulted in 347 papers. Additional investigations of PubMed relevant to ML in thoracic surgery were performed to include more recent publications without medical subject heading search tagging. Given this review’s narrative nature and the heterogeneous methods used, a scoping review of ML in thoracic surgery was undertaken, and the PRISMA-ScR checklist was followed (Supplementary Material). In total, 22 manuscripts met inclusion criteria, with the primary requirement focusing on ML algorithms to analyse patient information relevant to a thoracic surgeon. Papers were also required to contain sufficient details on data used for analysis, ML methods and results. The search terms used and paper selection can be found in Supplementary Material, Fig. S1. Selected articles are presented in 5 categories (Fig. 1).
Figure 1:
Categories and papers selected. ML: machine learning; NSCLC: non-small-cell lung cancer; PFT: pulmonary function test; SCLC: small-cell lung cancer.
Each study was assessed by authors K.P.S., D.M., C.T., S.A.M., D.T.R. and L.A.C. The authors developed a rubric for standardized quality assessment, capturing what was agreed to be essential metrics of quality for ML; it was applied independently by each evaluator (Supplementary Material, Table S1 and Fig. 3). The following characteristics were extracted and evaluated from each study: study design, data management, methods, model performance and implementation. A detailed assessment of each selected paper’s data analysis was also performed (Supplementary Material, Table S2).
Figure 3:
Variation in methodology among the reviewed papers. Green represents papers that addressed rubric criteria and red represents those that did not.
RESULTS
Overview of machine learning
ML involves programming computers to ‘learn’ from data and is briefly described here as more detailed reviews have been previously published [1, 7, 9]. Table 1 provides a glossary of standard terms in ML.
Table 1:
Glossary of common machine learning terms
| Term | Definition |
|---|---|
| Algorithm | A mathematical approach applied to data to create a model |
| Bias | From the perspective of data acquisition, this refers to selection bias. Bias can be found when the population under study does not represent the general population, such as most patients being of only one gender or race |
| Fairness | Biases present in model predictions. An example of an unfair model could be unfavourable model predictions based on race or gender resulting from the training data |
| Features | The measured attributes believed to contribute to the prediction |
| Hyperparameter | A parameter of the algorithm set by the researcher before learning |
| Label | The result being predicted |
| Machine Learning | A branch of artificial intelligence focused on how to make machines (computers) learn from experience |
| Model | The mathematical function created by the algorithm from the training data. It can also be called a ‘classifier’ if used to predict labels of different ‘classes’ |
| Optimization | A mathematical process of minimizing the error of the model’s predictions, thereby optimizing the model accuracy |
| Supervised learning | A type of machine learning where the algorithm learns from labelled training data |
| Training data | Data fed to the machine learning algorithm to create the model, which can then be used to predict labels on new data |
| Testing data | Data not used for training, fed to the model to determine the model’s accuracy |
| Unsupervised learning | A type of machine learning where the algorithm learns from unlabelled training data |
| Validation set | A sample of training data not used for training but used as an estimate of model performance. If performance on the validation set is poor, methods to improve performance such as hyperparameter tuning can be done before applying to the formal test set |
While traditional statistical models such as linear or logistic regression seek to describe relationships between variables, ML ultimately seeks to predict, classify or optimize. The ML programme differentiates between the training data (the features) and what we would like to infer (the label). The programme learns a mathematical function from the data and iteratively adjusts it to provide more accurate predictions, classifications or optimizations by minimizing the error of the desired function. ML algorithms also have various settings called hyperparameters that are modified by the scientist, as opposed to the computer, to improve model performance. Common subcategories of ML include computer vision, deep learning, natural language processing (NLP) and reinforcement learning. ML algorithms are supervised if the algorithm is given the labels of the training data or unsupervised if it is not [1]. Figure 2 outlines which papers included in this review were supervised/unsupervised and the algorithms used.
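The supervised workflow described above (features, a label, held-out testing data and researcher-set hyperparameters) can be sketched in a few lines. The data, features and hyperparameter below are entirely synthetic and purely illustrative, not drawn from any reviewed paper:

```python
# Illustrative supervised-learning workflow: features X predict a binary
# label y; the model is evaluated only on data it never saw during training.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))                   # 5 hypothetical clinical features
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)   # synthetic label to be learned

# Hold out testing data that plays no role in model fitting
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# C is a hyperparameter set by the researcher, not learned from the data
model = LogisticRegression(C=1.0).fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)     # accuracy on unseen patients
```

Because the synthetic label is a simple linear function of the features, the model recovers it almost perfectly; real clinical labels are far noisier.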
Figure 2:
Machine learning algorithms utilized in the reviewed papers. Each algorithm has a colour and is represented by the number using that algorithm. LASSO: Least Absolute Shrinkage and Selection Operator; SVM: support vector machine.
It is recommended that ML models be evaluated together with data scientists to ensure sound computer code, mathematical models and accurate interpretation of model results, as many clinicians lack the computer science background needed to assess the methodology properly. A standard approach to the critical evaluation of these models is also recommended, such as the rubric created and used in this paper (Supplementary Material, Table S1 and Fig. 3); others have also been suggested [7, 11].
Machine learning for diagnostic augmentation
ML may be helpful in testing pertinent to the thoracic surgeon through its ability to accurately detect subtle patterns, such as in lung nodule classification [12]. One group applied a deep learning convolutional neural network model, trained on 42 290 low-dose computed tomography (CT) scans from the National Lung Cancer Screening Trial, to a test set of 6716 patients from the same trial to perform localization and lung cancer risk categorization. The algorithm performed as well as radiologists when prior imaging was present; when no previous imaging was available, the model outperformed the radiologists, reducing the false-positive rate by 11% and the false-negative rate by 5%. The model achieved an area under the receiver operating characteristic curve (AUC) of 94.4% for correctly identifying the cancer risk of lesions found on screening CTs [13].
Pulmonary function tests represent a similar opportunity for pattern recognition augmentation. In one study, an ML model was trained to recognize disease patterns and diagnoses on 1430 pulmonary function tests. One hundred twenty pulmonologists independently evaluated the pulmonary function tests of 50 random patients, and the model was tested on these same patients for comparison [14]. The pulmonologists' interpretations matched the American Thoracic Society/European Respiratory Society (ATS/ERS) reference pattern in 74.4% of cases and assigned the correct diagnostic category in 44.6%, with high inter-rater variability. The AI software matched the reference patterns for the same patients in 100% of cases, with a correct diagnostic category assignment of 82%, decisively outperforming the pulmonologists with an average interpretation time of 0.2 s. The stark difference in performance was attributed to the algorithm detecting subtle characteristics difficult for humans to see and avoiding arbitrary cut-off values such as the 0.7 forced expiratory volume in 1 s/forced vital capacity ratio for obstructive disease. The algorithm instead detected a continuum of disease, allowing for more precise disease differentiation.
Automation and enhanced detection of subtle patterns may allow ML models to improve the sensitivity and specificity of diagnostic tests and the accuracy of recommendations for clinical management, especially in centres without pulmonologists or thoracic radiologists. Positron emission tomography/CT and other testing modalities pertinent to the thoracic surgeon may benefit from similar models.
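The diagnostic metrics quoted throughout this section (sensitivity, specificity and AUC) are all derived from a model's predicted probabilities evaluated against known outcomes. A toy sketch with made-up numbers, not from any reviewed study:

```python
# Sensitivity, specificity and AUC from predicted probabilities.
import numpy as np
from sklearn.metrics import roc_auc_score, confusion_matrix

y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])   # 4 cancers, 6 benign
y_prob = np.array([0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.05])
y_pred = (y_prob >= 0.5).astype(int)                  # threshold at 0.5

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)    # true-positive rate: cancers correctly flagged
specificity = tn / (tn + fp)    # true-negative rate: benign correctly cleared
auc = roc_auc_score(y_true, y_prob)  # threshold-free ranking performance
```

Note that sensitivity and specificity depend on the chosen threshold, whereas the AUC summarizes performance across all thresholds, which is why both are usually reported.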
Machine learning for preoperative predictions
A powerful application of ML is the ability to make predictions based on previous patient data. One example involved lung cancer detection using circulating tumour DNA (ctDNA) in patients with lung nodules [15]. Next-generation sequencing was used to detect ctDNA before surgery in 192 consecutive patients with lung lesions on imaging (56 non-cancer, 136 cancer) who underwent surgical resection. An ML algorithm using linear discriminant analysis was then employed to improve the ctDNA mutation analysis. The overall sensitivity of ctDNA alone for detecting lung cancer was 69%, with a 96% specificity. When clinical, biomarker and genomic features were given to the ML model, the sensitivity and specificity improved to 80% and 99%, respectively. Given that most patients present with late-stage lung cancer, earlier detection could meaningfully impact many of these patients’ management.
Another group used a random forest model to predict features associated with nodal positivity in oesophageal cancer to help guide the extent of lymphadenectomy needed for accurate nodal detection [16]. Using 31 variables of data from 5806 oesophagectomy patients from the Worldwide Esophageal Cancer Collaboration (WECC), the model found that higher grade, higher T stage and longer cancers were the strongest predictors of nodal positivity. These same characteristics also predicted the number of positive lymph nodes, with a lesser lymphadenectomy needed to detect nodal disease with these more aggressive cancers. Based on the algorithm, the authors argued that for shorter tumours (<2.5 cm), the extent of lymphadenectomy required to detect nodal positivity needed to be greater than for longer, invasive and poorly differentiated cancers.
Reports have also looked at predicting which medical therapies maximize survival in oesophageal cancer. A study from 2007 used gene expression profiles of 46 patients with oesophageal cancer [21 squamous cell carcinomas (SCC), 25 adenocarcinomas (AC)] to predict response to preoperative chemoradiotherapy (CRT) [17]. An unsupervised hierarchical cluster analysis and a supervised support vector machine (SVM) classified tumour samples based on gene expression. The hierarchical cluster analysis did not classify well, and the SVM did not perform satisfactorily when AC and SCC patients were combined or in the individual AC cohort. The SVM model reached a predictive power of 87% for SCC and, using a 32-gene classifier, identified 45% of SCC patients who would not have benefitted from CRT.
Focusing solely on oesophageal SCC, 1 group evaluated pre-CRT microRNA profiles in 106 patients to establish prediction models for CRT response [18]. MicroRNAs are suggested to be tied to prognosis and regulate SCC cell growth, invasion and migration [19]. Therefore, the authors deployed multiple statistical models from logistic regression to more complex ML models to predict the treatment response using 10 microRNA profiles differentially expressed between responders and non-responders to treatment. The ML model had the best performance, with an accuracy of 87.3%, a specificity of 88.1% and a sensitivity of 83.3% for identifying patients who would be non-responders to CRT. Similarly, the ML model’s subgroup was the only independent factor associated with CRT response in the validation set.
A more recent study used a random forest algorithm to identify the treatment strategy that maximizes an individual patient’s survival for oesophageal cancer [20]. Again using the WECC database, 13 320 patients were included in 4 treatment categories: oesophagectomy alone, neoadjuvant therapy, oesophagectomy and adjuvant therapy, or tri-modality therapy (neoadjuvant therapy, oesophagectomy and adjuvant therapy). Specific neoadjuvant/adjuvant therapies were not discussed. The models used 36 variables, generating survival curves for each patient and predicting counterfactual survival curves for alternative therapies they did not receive but would have been eligible for. The optimal therapy’s predicted outcome was then compared with the actual treatment, yielding the lifetime gained from the optimal treatment. The optimal therapy determined by the model for most patients with oesophageal AC was oesophagectomy alone or neoadjuvant treatment. Not surprisingly, survival was highly variable and depended on the patient’s clinical and cancer features. For example, a 67-year-old white male with a clinical T1N0M0 AC of the oesophagus would be predicted to have a restricted mean survival time of 7.3 years after oesophagectomy and 6.5 years after neoadjuvant therapy. In contrast, a 65-year-old white male with clinical grade 2, T4N1M0 AC has a predicted restricted mean survival time of 4.7 years for oesophagectomy alone and 6 years for neoadjuvant therapy.
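Restricted mean survival time (RMST), the quantity compared across counterfactual therapies above, is simply the area under the survival curve up to a chosen time horizon. A minimal sketch on a hypothetical step-function curve (toy values, not data from the study):

```python
# RMST as the area under a right-continuous step survival curve on [0, tau].
import numpy as np

# Hypothetical Kaplan-Meier-style curve: S(t) drops at these times (years)
times = np.array([0.0, 1.0, 3.0, 5.0, 8.0])
surv  = np.array([1.0, 0.9, 0.7, 0.5, 0.3])   # S(t) just after each drop

def rmst(times, surv, tau):
    """Integrate the step survival curve from 0 to the horizon tau."""
    t = np.clip(times, None, tau)
    widths = np.diff(np.append(t, tau))  # width of each step; last runs to tau
    return float(np.sum(surv * widths))
```

For example, with a 10-year horizon the curve above integrates to 1(1) + 0.9(2) + 0.7(2) + 0.5(3) + 0.3(2) = 6.3 years; comparing RMST under two therapies gives the 'lifetime gained' figure reported by the model.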
Despite many of these models being from single institutions with small sample sizes, they are consistent with national guidelines and represent a more nuanced approach than general guidelines currently offer. Similar models may move us closer towards precision cancer care and personalized medicine; however, generalizability and validation are limited at this time.
Machine learning for postoperative complication prediction
Predicting adverse events after surgery has been a growing topic of ML study. Real-time risk score development for hospitalized patients at risk of deterioration represents one of the few ML applications validated to improve outcomes. An integrated health system developed an automated warning system (advance alert monitor) that identifies patients at risk of deterioration using an ML model [21]. The model was trained on 649 418 hospital admissions, of which 19 153 involved deterioration, and included patients admitted to a general medical-surgical ward or step-down unit. The advance alert monitor creates hourly scores using a logistic regression model on electronic health record (EHR) data such as lab tests, vital signs, neurological status, severity of illness, comorbidities and other indicators such as length of stay. Using these real-time data, the model predicts the risk of unplanned intensive care unit (ICU) transfer or death during hospitalization and provides 12-h warnings. Performance at hospitals where the system was utilized was compared with outcomes at hospitals where it was not active but would have triggered an alert. They found a lower incidence of ICU admission (17.7% vs 20.9%), a shorter length of stay (6.5 vs 7.2 days) and lower 30-day mortality after an event that led to an alert (15.8% vs 20.4%), with a calculated avoidance of 520 deaths per year over the 3.5-year study period.
ML has also been used to predict postoperative complications after lung resection. The earliest study included in this review (2002) retrospectively collected 96 preoperative clinical, laboratory and spirometry variables for consecutive lung-resection patients at a single institution [22]. Three neural networks were trained to predict mortality 30 days postoperatively, and a fourth neural network was trained to predict postoperative complications (pneumonia, respiratory insufficiency, myocardial infarction and sepsis). The model was trained on 113 of 141 patients and tested on the remaining 28 patients. There were 4 deaths in the test group, and 2 of the 3 neural network models predicted these deaths correctly. The neural network trained to estimate major complications classified all 28 test cases correctly.
A similar study from 2003 evaluated 489 patients who underwent resection for non-small-cell lung cancer (NSCLC) [23]. Neural networks were trained on 348 of these patients using various clinical and surgical features to predict the probability of morbidity after surgery. An ensemble of 100 neural networks was tested on the remaining 141 patients, averaging the ensemble results to get the final morbidity prediction. The sensitivity of the neural networks was 67%, with 100% specificity, and an AUC of 98% [23].
A more recent study evaluated a neural network to predict hospital discharge within 24 h. The network was trained on 15 201 patients discharged from a general surgical inpatient floor (thoracic patients included) at a single hospital, validated on 3843 patients and tested on a cohort of 605 patients [24]. Input data were obtained from a database linked to the patients’ EHR, allowing for a wide array of feature types and real-time updates to the model. The features consisted of demographics, surgery information, medications, test results, clinician orders and notes. Clinical milestones were also tracked, such as the transition to oral medication and barriers to discharge. For predicting discharge within 24 h, the model achieved a mean AUC of 0.842, with 22.7% of patients being discharged later than predicted. Evaluation of the model parameter weights revealed that the top barriers to discharge included lack of an oral diet, lack of visiting nurse services, lack of social support and disposition to an inpatient facility, among others. The neural network also outperformed a historical model of median length of stay by surgical procedure type.
One group used ML to predict respiratory failure after lung resection [25]. They used discharge data from the National Inpatient Sample (NIS) database for lobectomy patients from 2015. Supervised ML with random forest without hyperparameter tuning followed by a synthetic minority oversampling technique was employed for model development. Input variables included demographics, comorbidities and post-lobectomy hospital events from 4062 patients, 417 of whom had respiratory failure. Chronic electrolyte imbalance, sepsis, chronic disease burden, acute kidney failure, malnutrition and open operations were the essential model variables predictive of postoperative respiratory failure. The model achieved a sensitivity of 83.3% and a specificity of 94.4% for predicting respiratory failure, though this was reduced to 69.4% and 85.0%, respectively, after an error was discovered in the programming code [26].
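The class imbalance this study faced (417 events among 4062 patients) is commonly handled by oversampling the minority class before training. The sketch below uses simple random oversampling as a dependency-light stand-in for SMOTE (which instead interpolates synthetic minority samples); the features are entirely synthetic:

```python
# Random-forest training on an imbalanced cohort, with minority oversampling.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_major, n_minor = 3645, 417                     # proportions from the study
X = np.vstack([rng.normal(0, 1, (n_major, 4)),   # hypothetical features,
               rng.normal(1, 1, (n_minor, 4))])  # minority shifted slightly
y = np.array([0] * n_major + [1] * n_minor)

# Oversample the minority class until classes balance
# (in practice this must be applied to the TRAINING split only)
idx_minor = np.where(y == 1)[0]
extra = rng.choice(idx_minor, size=n_major - n_minor, replace=True)
X_bal = np.vstack([X, X[extra]])
y_bal = np.append(y, y[extra])

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_bal, y_bal)
```

Without rebalancing, a model can score high accuracy by always predicting 'no respiratory failure'; oversampling forces it to attend to the rare outcome, at the cost of possible overfitting to duplicated cases.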
The same group attempted to predict early readmission after oesophagectomy, identifying risk factors for patients in the National Readmission Database (NRD) [27]. There were 383 patients readmitted within 30 days among 2307 who underwent oesophagectomy during the study timeframe. Similar ML techniques were used, with a random forest and a NearMiss algorithm to increase the model’s sensitivity. Variables most predictive of readmission were severity and mortality scores, pneumonia, anastomotic leak, pyothorax, case urgency and postoperative acute renal failure. The model achieved a sensitivity of 71.7% and a specificity of 51.4%; however, the same coding error as in the respiratory failure study decreased the AUC by 1–20% [28].
These last 2 models containing coding errors allowed a ‘leak’ between training and testing data, essentially allowing the model to train on testing data, falsely overestimating the model’s predictive ability. Coding errors such as this demonstrate the importance of working with data scientists for validation of correct model design and deployment. However, despite the pitfalls for errors, these models show the potential power of ML models to assist in postoperative risk stratification to impact outcomes.
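The kind of leak described above can be reproduced in miniature: letting any data-dependent step (here, feature selection) see the test rows makes a model look predictive even on pure noise, while performing the same step inside each training fold gives an honest estimate. All data below are random:

```python
# Demonstration of train/test leakage via feature selection on pure noise.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 1000))      # 1000 noise features, no real signal
y = rng.integers(0, 2, size=60)      # random labels: nothing to learn

# WRONG: feature selection sees ALL rows, including future test folds
X_leaky = SelectKBest(f_classif, k=10).fit_transform(X, y)
leaky_acc = cross_val_score(LogisticRegression(), X_leaky, y, cv=5).mean()

# RIGHT: selection is refit inside each training fold via a pipeline
pipe = make_pipeline(SelectKBest(f_classif, k=10), LogisticRegression())
honest_acc = cross_val_score(pipe, X, y, cv=5).mean()
```

The leaky estimate is inflated well above chance even though the labels are random, which is exactly why the corrected AUCs in the two studies above dropped once the leak was fixed.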
Machine learning prediction using molecular markers
ML algorithms have also been used with molecular markers to predict survival in postoperative patients for lung and oesophageal cancer. The influence of histone modification on recurrence-free survival after oesophagectomy for SCC was studied using a K-means clustering ML algorithm applied to 237 patients at a single institution [29]. Clusters with poor recurrence-free survival were associated with histone modification in H3K18Ac and H4R3diMe, suggesting that certain histone modifications may represent an independent risk factor for recurrence-free survival in SCC. Clustering has also been employed to identify a CpG island methylator phenotype of resected small-cell lung cancer that led to a poorer prognosis [30].
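K-means, the clustering algorithm applied to the histone-modification profiles above, partitions patients into k groups by proximity in feature space with no outcome labels involved. A toy sketch with two synthetic, well-separated modification profiles (the feature values are invented for illustration):

```python
# Unsupervised K-means clustering of patients by two expression features.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
low  = rng.normal(0.0, 0.3, size=(30, 2))   # e.g. low-modification profile
high = rng.normal(3.0, 0.3, size=(30, 2))   # high-modification profile
X = np.vstack([low, high])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = km.labels_   # cluster assignment per patient, learned without labels
```

The clusters are then compared against outcomes such as recurrence-free survival after the fact, which is how the histone-modification study linked cluster membership to prognosis.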
In an attempt to identify prognostic biomarkers, 1 group looked at protein expression levels of 75 signalling proteins in 384 resected NSCLC tumours [31]. They trained a random forest and SVM-Radial Basis Kernel to identify protein signatures that portended a worse prognosis, identifying a 6-protein signature for AC and a 5-protein signature for SCC of the lung. Using these signatures, the model could identify ‘good’ and ‘poor’ prognosis groups, with a 3-year survival of 96.0% and 37.5% for AC in the good and poor prognosis groups, respectively. A 3-year survival of 72.7% in the ‘good’ group and 15.8% in the ‘poor’ group was found for SCC, further supporting that protein expression levels could represent independent predictors of survival.
Finally, immune cell infiltration has also been evaluated for prognostic significance. Multiple datasets were used to comprise 751 resected lung AC tumours for training a Least Absolute Shrinkage and Selection Operator (LASSO) Cox regression model, creating an immune cell infiltration score that was validated on 418 patients [32]. Based on the model, a lower score had a significantly better prognosis than those with a higher score.
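The LASSO component of such a model applies an L1 penalty that shrinks most coefficients exactly to zero, leaving a sparse signature from which a per-patient score is computed. The sketch below uses scikit-learn's linear Lasso as a stand-in for the Cox variant (the selection behaviour of the penalty is the same) on synthetic data:

```python
# L1-penalized regression selecting a sparse feature signature.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))                    # 20 hypothetical cell fractions
beta = np.zeros(20)
beta[:3] = [2.0, -1.5, 1.0]                       # only 3 features truly matter
y = X @ beta + rng.normal(0, 0.5, size=200)       # synthetic outcome

lasso = Lasso(alpha=0.1).fit(X, y)
selected = np.flatnonzero(lasso.coef_)            # features kept by the penalty
score = X @ lasso.coef_                           # the 'risk score' per patient
```

The penalty strength alpha controls how aggressively coefficients are zeroed; in the published model it is typically chosen by cross-validation.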
These studies demonstrate how ML may assist in improving survival prediction above what the tumour, node and metastasis (TNM) staging system alone can offer with the addition of molecular markers.
Machine learning for postoperative survival prediction
Several studies have looked at using ML to predict survival after resection in lung and oesophageal cancer. Looking at oesophageal SCC, a 2013 study used SVM models to predict distant postoperative metastasis after oesophagectomy [33]. The authors used 4 SVM models: SVM1 included T and N stage; SVM2 included T stage, N stage, tumour length and a marker for cell differentiation; SVM3 included the features of SVM2 plus 12 molecular markers based on the univariate analysis of the training cohort; and SVM4 included all features of SVM2 and 9 molecular markers based on the univariate analysis of the training and validation cohorts. Training occurred on 319 patients, with 164 patients in a validation cohort. SVM4 performed the best, with a sensitivity of 56.6%, a specificity of 97.7% and an accuracy of 78.7% for predicting the risk of distant metastasis after oesophagectomy, potentially aiding in postoperative discussions of the individual benefits of adjuvant therapy.
Concerning lung cancer, a 2011 study developed a model to predict survival in completely resected early-stage NSCLC patients [34]. A decision tree algorithm using 30 patient variables (covering TNM features, clinical variables, spirometry and molecular markers) was used on 482 patients to predict 5-year survival, achieving an AUC of 74%. A similar study aiming to improve survival prediction in NSCLC over the TNM staging system retrospectively evaluated 1981 patients from a thoracic surgery database [35]. A neural network was trained using important variables from univariate and multivariate Cox regression to predict mortality. The authors found that age over 67 and body mass index over 27.6 significantly affected patients’ survival with NSCLC.
More recently, going beyond the more structured ML methods, soft computing via fuzzy and soft sets—thought to be robust on uncertain data—was used to predict 5-year survival after resection for NSCLC [36]. The model was trained on 100 patients using 6 input variables (age, body mass index, chronic obstructive pulmonary disease, forced expiratory volume in 1 s, surgical approach, complications) and resulted in a survival prediction accuracy of 79%.
Finally, a study from 2020 used 16 140 NSCLC patients from the Surveillance, Epidemiology and End Results (SEER) database to train a neural network called DeepSurv to predict lung cancer-specific survival [37]. DeepSurv was then validated on 3228 patients and deployed on an independent dataset of 1182 patients from Shanghai. An additional neural network was trained to output personalized treatment recommendations (lobectomy vs sub-lobectomy) and categorized patients into 2 groups based on the concordance of the treatment recommended and received. DeepSurv outperformed the TNM system for lung cancer-specific survival with a C statistic of 74% vs 70%, respectively. Patients who received the treatment recommended by the neural network had better survival rates than those who did not, specifically those who underwent sublobar resection when lobectomy was advised. The authors created a user-friendly computer interface, allowing for easy input of features preoperatively or postoperatively to guide treatment recommendations and make survival predictions based on different treatments, taking steps towards implementing personalized surgical care in the clinic.
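The C statistic used to compare DeepSurv with the TNM system is the fraction of comparable patient pairs in which the model assigns the higher risk to the patient who dies earlier. A minimal version, handling right-censoring in the simplest way and using toy values:

```python
# Harrell's concordance index (C statistic) for a survival model.
import itertools

# (survival_time_years, event_observed, model_risk) per patient - toy data
patients = [(2.0, 1, 0.9), (5.0, 1, 0.2), (7.0, 0, 0.4), (9.0, 1, 0.6)]

def c_index(patients):
    concordant, comparable = 0.0, 0
    for a, b in itertools.combinations(patients, 2):
        if a[0] > b[0]:          # order so a has the shorter survival time
            a, b = b, a
        if a[1] == 0:            # earlier time censored: pair not comparable
            continue
        comparable += 1
        if a[2] > b[2]:          # higher risk for the earlier death: concordant
            concordant += 1
        elif a[2] == b[2]:       # tied risks conventionally get half credit
            concordant += 0.5
    return concordant / comparable
```

A C statistic of 0.5 is random ranking and 1.0 is perfect; DeepSurv's 74% vs TNM's 70% therefore reflects modestly better risk ordering of patients.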
DISCUSSION
This review demonstrates that ML can enhance the accuracy of preoperative tests, facilitate earlier diagnosis of pathology, identify cancer therapies likely to maximize survival, and predict adverse events and survival after thoracic surgery. However, only one of the models reviewed (the advance alert monitor) has reached clinical implementation. More widespread employment of this technology is likely to emerge first in specialties pioneering AI, such as radiology. How AI-based surgical models should be used, and the legal standing of their predictions, have yet to be explored. Therefore, critical evaluation of this technology by surgeons is needed now, before its eventual introduction to practice.
New models should be evaluated like any novel patient therapy, with the expectation of external validation and prospective validation of results, ideally in a randomized clinical trial followed by Food and Drug Administration (FDA) approval. Of the 130 medical AI devices currently approved by the FDA, only 4 were evaluated prospectively, many without comparison to clinician performance [38]. As future models emerge, the use of rubrics and interpretation with data scientists may prove beneficial. Utilizing our rubric, Fig. 3 demonstrates the variation of the reviewed studies according to rubric criteria. While all articles reported validation strategies, only 4 performed external validation on independent datasets. Only 1 of the papers demonstrated improved patient outcomes, and only 1 addressed potential bias and fairness in the data. Nearly all failed to perform model calibration (comparing the predicted probabilities to the distribution of observed outcomes). Moreover, there was considerable variation in the amount of information provided to allow for reproducibility of results, ranging from cursory supplementary material to an entire GitHub repository. Based on our review, 14 studies included enough detail to reproduce the methods closely, and the majority of the models are not generalizable to different patient populations.
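Model calibration, which nearly all reviewed papers omitted, can be checked by binning predicted probabilities and comparing each bin's mean prediction with its observed event rate. A sketch using synthetic, perfectly calibrated predictions:

```python
# Calibration check: observed event fraction vs mean predicted risk per bin.
import numpy as np
from sklearn.calibration import calibration_curve

rng = np.random.default_rng(0)
p_pred = rng.uniform(0, 1, size=5000)                     # model's predicted risks
y = (rng.uniform(0, 1, size=5000) < p_pred).astype(int)   # outcomes drawn at those risks

frac_observed, mean_predicted = calibration_curve(y, p_pred, n_bins=5)
# A well-calibrated model tracks the diagonal: observed ~ predicted in each bin
max_gap = float(np.max(np.abs(frac_observed - mean_predicted)))
```

A model can have an excellent AUC yet systematically over- or under-state absolute risk, which is precisely what this per-bin comparison exposes and why calibration deserves reporting alongside discrimination.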
Every algorithm is dependent on the data on which it is trained, and statistical bias may arise from small sample sizes, non-representative training cohorts, errors in measurement or heterogeneity of treatment effects. Bias within the training cohort can propagate healthcare disparities, as the historically collected data used to derive these models may underrepresent vulnerable groups, containing very few minority patients or drawing only from one country or continent. Efforts to train algorithms on representative datasets should be undertaken when developing novel ML applications, and global datasets should be a long-term goal to mitigate bias and improve training. Research utilizing medical AI must be held to higher standards that address widespread bias, the lack of external and prospective evaluation, and the lack of data sharing that limits reproducibility.
Looking to the future, AI can explore novel questions in surgery, providing insight into the efficacy of current practice. The proportion of medical practice based on scientific evidence is estimated at 10–20%, and it is impossible to perform randomized trials for each decision a surgeon makes daily [39]. Many choices fall into the ‘art of medicine’, based on training, experience and biases, allowing for significant practice variation [40]. ML can leverage the vast amounts of data readily available in the electronic health record (EHR): ∼150 000 discrete pieces of data are created during a hospitalization, and healthcare institutions worldwide manage an average of 1 million gigabytes of data, more than at any time in history [41]. More robust conclusions might be drawn from these comprehensive clinical data to inform individualized patient care. In contrast, many databases such as the National Cancer Database or the Society of Thoracic Surgeons (STS) database are narrow in scope and scarce in detail. Combining databases such as the Medical Information Mart for Intensive Care (MIMIC) database [42], which contains high-resolution clinical data from over 60 000 ICU admissions, with the STS database may be a way to leverage both clinical and surgical databases. Designing effective models can also be approached in novel ways: datathons are routinely held across the globe on de-identified data, rewarding the highest-performing model and allowing for collaboration beyond one’s institution [43].
One could imagine a patient determined to be at high risk for lung cancer by an ML risk calculation using clinical, molecular and genetic information and recommended to undergo lung cancer screening. An ML pattern recognition algorithm could then assist the radiologist in detecting concerning lung nodules, and another algorithm could help the pathologist reach the correct diagnosis from a biopsy of concerning lesions. An AI patient-specific risk analysis could then guide appropriate medical and surgical treatment based on the likelihood of postoperative complications and overall survival, and then provide tailored surveillance recommendations adaptable in real time as more data are acquired. The operating room may also see more AI, as computer vision technology is already being used to accurately identify objects, surgical instruments and steps of an operation and could serve as a virtual proctor [44]. AI could also assist daily workflows, such as using NLP-enabled dictation for EHR documentation, giving surgeons more time with patients and less paperwork [45]. AI has enormous potential to augment, not replace, what physicians do daily, and it provides an opportunity for industry as well, given the need for seamless software integration that enhances rather than adds to the burden and clutter of the EHR. Figure 4 provides the critical steps one might take to embark on creating a medical AI program.
Figure 4:
Steps for a successful artificial intelligence program. Industry may be useful in Step 3.
Limitations
We acknowledge the limitations of this scoping review, specifically the narrative nature, lack of systematic data analysis and selection bias risk based on the chosen papers.
CONCLUSION
In conclusion, the implementation of AI in thoracic surgery holds much promise to improve preoperative test accuracy, speed pathological diagnosis, identify therapies to maximize cancer survival and predict survival and adverse events after thoracic surgery. ML offers the potential to move us from general guideline-based treatments to those most likely to maximize individual patient outcomes, and new partnerships can be forged between disciplines, departments and across borders to move towards precision patient care by leveraging this technology. However, despite the demonstrated improvement in the predictive abilities of ML models in thoracic surgery, this technology is still in its infancy, and many challenges remain before implementation into daily practice. New ML research must emphasize the reproducibility of algorithm development, the prospective validation of model results and the recognition of the potential bias on which models depend. Thoracic surgeons must therefore be at the forefront, critically evaluating these models for the eventual safe introduction of predictive AI into the clinic and operating room to augment surgical decision-making.
SUPPLEMENTARY MATERIAL
Supplementary material is available at EJCTS online.
Funding
LAC is funded by the National Institute of Health through NIBIB R01 EB017205.
Conflicts of interest: none declared.
Author contributions
Kenneth P. Seastedt: Conceptualization; Data curation; Formal analysis; Methodology; Writing—original draft; Writing—review & editing. Dana Moukheiber: Conceptualization; Data curation; Formal analysis; Methodology; Writing—original draft; Writing—review & editing. Saurabh A. Mahindre: Conceptualization; Data curation; Formal analysis; Methodology; Writing—original draft; Writing—review & editing. Chaitanya Thammineni: Conceptualization; Data curation; Formal analysis; Methodology; Writing—original draft; Writing—review & editing. Darin T. Rosen: Conceptualization; Data curation; Formal analysis; Methodology; Writing—original draft; Writing—review & editing. Ammara A. Watkins: Conceptualization; Data curation; Formal analysis; Methodology; Writing—original draft; Writing—review & editing. Daniel A. Hashimoto: Conceptualization; Data curation; Formal analysis; Methodology; Writing—original draft; Writing—review & editing. Chuong D. Hoang: Conceptualization; Data curation; Formal analysis; Methodology; Writing—original draft; Writing—review & editing. Jacques Kpodonu: Conceptualization; Data curation; Formal analysis; Methodology; Writing—original draft; Writing—review & editing. Leo A. Celi: Conceptualization; Data curation; Formal analysis; Methodology; Writing—original draft; Writing—review & editing.
Reviewer information
European Journal of Cardio-Thoracic Surgery thanks Ilkka Ilonen, Rizwan A. Qureshi, Meinoshin Okumura and the other, anonymous reviewer(s) for their contribution to the peer review process of this article.
ABBREVIATIONS
- AC: Adenocarcinoma
- AI: Artificial intelligence
- AUC: Area under the receiver operating characteristic curve
- CRT: Chemoradiotherapy
- CT: Computed tomography
- ctDNA: Circulating tumour DNA
- ICU: Intensive care unit
- ML: Machine learning
- NLP: Natural language processing
- NSCLC: Non-small-cell lung cancer
- SCC: Squamous cell carcinoma
- SVM: Support vector machine
- TNM: Tumour, node and metastasis
REFERENCES
- 1. Esteva A, Robicquet A, Ramsundar B, Kuleshov V, DePristo M, Chou K et al. A guide to deep learning in healthcare. Nat Med 2019;25:24–9.
- 2. Matheny ME, Whicher D, Thadaney Israni S. Artificial intelligence in health care: a report from the National Academy of Medicine. JAMA 2020;323:509–10.
- 3. Krizhevsky A, Sutskever I, Hinton GE. ImageNet classification with deep convolutional neural networks. In: Proceedings of the 25th International Conference on Neural Information Processing Systems, Vol. 1. Lake Tahoe, NV: Curran Associates Inc., 2012, 1097–105.
- 4. Hirschberg J, Manning CD. Advances in natural language processing. Science 2015;349:261–6.
- 5. Wu J, Yılmaz E, Zhang M, Li H, Tan KC. Deep spiking neural networks for large vocabulary automatic speech recognition. Front Neurosci 2020;14:199.
- 6. Artificial Intelligence in Healthcare Market with Covid-19 Impact Analysis by Offering (Hardware, Software, Services), Technology (Machine Learning, NLP, Context-Aware Computing, Computer Vision), End-Use Application, End User and Region - Global Forecast to 2026. https://www.marketsandmarkets.com/Market-Reports/artificial-intelligence-healthcare-market-54679303.html (February 2021, date last accessed).
- 7. Faes L, Liu X, Wagner SK, Fu DJ, Balaskas K, Sim DA et al. A clinician's guide to artificial intelligence: how to critically appraise machine learning studies. Trans Vis Sci Tech 2020;9:7.
- 8. Baxter RD, Fann JI, DiMaio JM, Lobdell K. Digital health primer for cardiothoracic surgeons. Ann Thorac Surg 2020;110:364–72.
- 9. Kilic A. Artificial intelligence and machine learning in cardiovascular health care. Ann Thorac Surg 2020;109:1323–9.
- 10. Etienne H, Hamdi S, Le Roux M, Camuset J, Khalife-Hocquemiller T, Giol M et al. Artificial intelligence in thoracic surgery: past, present, perspective and limits. Eur Respir Rev 2020;29:200010.
- 11. Matheny ME, Thadaney Israni S, Ahmed M, Whicher D. AI in Health Care: The Hope, the Hype, the Promise, the Peril. Washington, DC: National Academy of Medicine, 2019. https://nam.edu/artificial-intelligence-special-publication (February 2021, date last accessed).
- 12. Uthoff J, Stephens MJ, Newell JD, Hoffman EA, Larson J, Koehn N et al. Machine learning approach for distinguishing malignant and benign lung nodules utilizing standardized perinodular parenchymal features from CT. Med Phys 2019;46:3207–16.
- 13. Ardila D, Kiraly AP, Bharadwaj S, Choi B, Reicher JJ, Peng L et al. End-to-end lung cancer screening with three-dimensional deep learning on low-dose chest computed tomography. Nat Med 2019;25:954–61.
- 14. Topalovic M, Das N, Burgel PR, Daenen M, Derom E, Haenebalcke C et al. Artificial intelligence outperforms pulmonologists in the interpretation of pulmonary function tests. Eur Respir J 2019;53:1801660.
- 15. Peng M, Xie Y, Li X, Qian Y, Tu X, Yao X et al. Resectable lung lesions malignancy assessment and cancer detection by ultra-deep sequencing of targeted gene mutations in plasma cell-free DNA. J Med Genet 2019;56:647–53.
- 16. Rice TW, Ishwaran H, Hofstetter WL, Schipper PH, Kesler KA, Law S et al. Esophageal cancer. Ann Surg 2017;265:122–9.
- 17. Duong C, Greenawalt DM, Kowalczyk A, Ciavarella ML, Raskutti G, Murray WK et al. Pretreatment gene expression profiles can be used to predict response to neoadjuvant chemoradiotherapy in esophageal cancer. Ann Surg Oncol 2007;14:3602–9.
- 18. Wen J, Luo K, Liu H, Liu S, Lin G, Hu Y et al. MiRNA expression analysis of pretreatment biopsies predicts the pathological response of esophageal squamous cell carcinomas to neoadjuvant chemoradiotherapy. Ann Surg 2016;263:942–8.
- 19. Mei LL, Qiu YT, Zhang B, Shi ZZ. MicroRNAs in esophageal squamous cell carcinoma: potential biomarkers and therapeutic targets. Cancer Biomark 2017;19:1–9.
- 20. Rice TW, Lu M, Ishwaran H, Blackstone EH; Worldwide Esophageal Cancer Collaboration Investigators. Precision surgical therapy for adenocarcinoma of the esophagus and esophagogastric junction. J Thorac Oncol 2019;14:2164–75.
- 21. Escobar GJ, Liu VX, Schuler A, Lawson B, Greene JD, Kipnis P. Automated identification of adults at risk for in-hospital clinical deterioration. N Engl J Med 2020;383:1951–60.
- 22. Esteva H, Marchevsky A, Núñez T, Luna C, Esteva M. Neural networks as a prognostic tool of surgical risk in lung resections. Ann Thorac Surg 2002;73:1576–81.
- 23. Santos-García G, Varela G, Novoa N, Jiménez MF. Prediction of postoperative morbidity after lung resection using an artificial neural network ensemble. Artif Intell Med 2004;30:61–9.
- 24. Safavi KC, Khaniyev T, Copenhaver M, Seelen M, Zenteno Langle AC, Zanger J et al. Development and validation of a machine learning model to aid discharge processes for inpatient surgical care. JAMA Netw Open 2019;2:e1917221.
- 25. Bolourani S, Wang P, Patel VM, Manetta F, Lee PC. Predicting respiratory failure after pulmonary lobectomy using machine learning techniques. Surgery 2020;168:743–52.
- 26. Bolourani S, Wang P, Patel VM, Manetta F, Lee PC. Corrigendum to “Predicting respiratory failure after pulmonary lobectomy using machine learning techniques”. Surgery 2020;169:1001.
- 27. Bolourani S, Tayebi MA, Diao L, Wang P, Patel V, Manetta F et al. Using machine learning to predict early readmission following esophagectomy. J Thorac Cardiovasc Surg 2021;161:1926–39.e8.
- 28. Notice of corrections. J Thorac Cardiovasc Surg 2021;161:341.
- 29. Hoseok I, Ko E, Kim Y, Cho EY, Han J, Park J et al. Association of global levels of histone modifications with recurrence-free survival in stage IIB and III esophageal squamous cell carcinomas. Cancer Epidemiol Biomarkers Prev 2010;19:566–73.
- 30. Saito Y, Nagae G, Motoi N, Miyauchi E, Ninomiya H, Uehara H et al. Prognostic significance of CpG island methylator phenotype in surgically resected small cell lung carcinoma. Cancer Sci 2016;107:320–5.
- 31. Jin BF, Yang F, Ying XM, Gong L, Hu SF, Zhao Q et al. Signaling protein signature predicts clinical outcome of non-small-cell lung cancer. BMC Cancer 2018;18:259.
- 32. Yang X, Shi Y, Li M, Lu T, Xi J, Lin Z et al. Identification and validation of an immune cell infiltrating score predicting survival in patients with lung adenocarcinoma. J Transl Med 2019;17:217.
- 33. Yang HX, Feng W, Wei JC, Zeng TS, Li ZD, Zhang LJ et al. Support vector machine-based nomogram predicts postoperative distant metastasis for patients with oesophageal squamous cell carcinoma. Br J Cancer 2013;109:1109–16.
- 34. López-Encuentra A, López-Ríos F, Conde E, García-Luján R, Suárez-Gauthier A, Mañes N et al.; on behalf of the Bronchogenic Carcinoma Cooperative Group of the Spanish Society of Pneumology and Thoracic Surgery (GCCB-S). Composite anatomical-clinical-molecular prognostic model in non-small cell lung cancer. Eur Respir J 2011;37:136–42.
- 35. Poullis M, McShane J, Shaw M, Woolley S, Shackcloth M, Page R et al. Lung cancer staging: a physiological update. Interact CardioVasc Thorac Surg 2012;14:743–9.
- 36. Alcantud JCR, Varela G, Santos-Buitrago B, Santos-García G, Jiménez MF. Analysis of survival for lung cancer resections cases with fuzzy and soft set theory in surgical decision-making. PLoS One 2019;14:e0218283.
- 37. She Y, Jin Z, Wu J, Deng J, Zhang L, Su H et al. Development and validation of a deep learning model for non-small cell lung cancer survival. JAMA Netw Open 2020;3:e205842.
- 38. Wu E, Wu K, Daneshjou R, Ouyang D, Ho DE, Zou J. How medical AI devices are evaluated: limitations and recommendations from an analysis of FDA approvals. Nat Med 2021;27:582–4.
- 39. Darst JR, Newburger JW, Resch S, Rathod RH, Lock JE. Deciding without data. Congenit Heart Dis 2010;5:339–42.
- 40. Celi LA, Fine B, Stone DJ. An awakening in medicine: the partnership of humanity and intelligent machines. Lancet Digit Health 2019;1:e255–57.
- 41. Banks MA. Sizing up big data. Nat Med 2020;26:5–6.
- 42. Saeed M, Villarroel M, Reisner AT, Clifford G, Lehman LW, Moody G et al. Multiparameter Intelligent Monitoring in Intensive Care II: a public-access intensive care unit database. Crit Care Med 2011;39:952–60.
- 43. Aboab J, Celi LA, Charlton P, Feng M, Ghassemi M, Marshall DC et al. A “datathon” model to support cross-disciplinary collaboration. Sci Transl Med 2016;8:333ps8.
- 44. Ward TM, Mascagni P, Ban Y, Rosman G, Padoy N, Meireles O et al. Computer vision in surgery. Surgery 2021;169:1253–6.
- 45. Kaufman DR, Sheehan B, Stetson P, Bhatt AR, Field AI, Patel C et al. Natural language processing-enabled and conventional data capture methods for input to electronic health records: a comparative usability study. JMIR Med Inform 2016;4:e35.