Abstract
Objectives:
The goal of this study was to develop a deep neural network (DNN) for predicting surgical/medical complications and unplanned reoperations following thyroidectomy.
Design, Setting, and Participants:
The 2005–2017 American College of Surgeons National Surgical Quality Improvement Program (ACS-NSQIP) database was queried to extract patients who underwent thyroidectomy. A DNN consisting of 10 layers was developed with an 80:20 breakdown for training and testing.
Main Outcome Measures:
Three primary outcomes of interest, including occurrence of surgical complications, medical complications, and unplanned reoperation were predicted.
Results:
Of the 21,550 patients who underwent thyroidectomy, medical complications, surgical complications, and reoperation occurred in 1,723 (8.0%), 943 (4.38%), and 2,448 (11.36%) patients, respectively. The DNN performed with an AUC-ROC of 0.783 (medical complications), 0.709 (surgical complications), and 0.703 (reoperations). Accuracy, specificity, and negative predictive values of the model for all outcome variables ranged from 78.2% to 97.2%, while sensitivity and positive predictive values ranged from 11.6% to 62.5%. Variables with high permutation importance included sex, inpatient vs outpatient, and ASA class.
Conclusions:
We predicted surgical/medical complications and unplanned reoperation following thyroidectomy via development of a well-performing ML algorithm. We have also developed a web-based application available on mobile devices to demonstrate the predictive capacity of our models in real time.
Keywords: Thyroidectomy, machine learning, artificial neural network, complication, reoperation
Introduction
In the past decade, the incidence of thyroid cancer has been increasing, in contrast to many other head and neck cancers.1 Consequently, thyroidectomy is widely employed, amounting to more than 130,000 cases per year in the United States.2 Though thyroidectomy is generally considered to be a safe procedure, surgical risks such as hypocalcemia and recurrent laryngeal nerve paralysis remain a burden on the current healthcare system, and many investigations have characterized complications and reoperations following thyroidectomy.3,4 Consequently, development of technology that improves the predictive capacity of such post-thyroidectomy events is warranted.
Developments in machine learning (ML) have accelerated over the last decade while annotated medical data have become widely available for academic use. These advancements have allowed for successful application of such technologies to many subspecialties of medicine.5–7 The artificial neural network (ANN), a form of ML, is an algorithm of particular interest because it often outperforms traditional statistical methods when handling large datasets with underlying nonlinear distributions.8 Given that many clinical variables such as treatment outcomes are products of multifactorial causes including risk factors, ANN has the potential to be applied to a wide variety of medical disciplines for its predictive capacity; however, it has yet to be employed for predicting post-surgical outcomes for thyroidectomy.
The goal of this study was to develop an ANN that predicts surgical complications, medical complications, and unplanned reoperations after thyroidectomy. We used the American College of Surgeons National Surgical Quality Improvement Program (ACS-NSQIP) database for training and validating the algorithm. Increased predictivity of these post-surgical outcomes could decrease occurrences of such problems, lowering the risk of undergoing thyroidectomy.
Materials and Methods
This manuscript has been prepared with reference to the STROBE checklist for cohort studies. This study was exempted from Institutional Review Board approval due to the publicly available and de-identified nature of the database. The ACS-NSQIP database, which reports 30-day morbidity and mortality information for various surgical operations, was reviewed retrospectively for the years 2005 to 2017. A total of 21,550 patients undergoing thyroidectomy were identified using current procedural terminology (CPT) codes associated with thyroidectomy (including 60200, 60210, 60212, 60220, 60225, 60240, 60252, 60254, 60260, 60270, and 60271).
We predicted three primary outcomes of interest, including occurrence of surgical complications, medical complications, and unplanned reoperation. Surgical complication was derived from multiple variables, including occurrences of superficial surgical site infections (SSI), deep SSI, organ/space SSI, wound disruption, and blood transfusion within 72 hours. Medical complication included occurrences of pneumonia, unplanned reintubation, urinary tract infection, deep vein thrombosis, renal insufficiency, pulmonary embolism, being on ventilator >48 hours, acute renal failure, cerebrovascular accident with neurological deficit, cardiac arrest requiring cardiopulmonary resuscitation, myocardial infarction, sepsis, or septic shock. For reoperation, we used the “RETURNOR” variable, which indicates unplanned reoperation.
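As an illustration of how such composite outcomes can be derived, the following minimal sketch collapses per-event flags into the three binary labels. The component key names and the assumption that RETURNOR is coded as the string "Yes" are hypothetical stand-ins for illustration, not the actual NSQIP variable coding.

```python
# Hypothetical flag names for the component events listed above (1 = event occurred).
SURGICAL_COMPONENTS = (
    "superficial_ssi", "deep_ssi", "organ_space_ssi",
    "wound_disruption", "transfusion_within_72h",
)
MEDICAL_COMPONENTS = (
    "pneumonia", "unplanned_reintubation", "urinary_tract_infection",
    "deep_vein_thrombosis", "renal_insufficiency", "pulmonary_embolism",
    "ventilator_over_48h", "acute_renal_failure", "cerebrovascular_accident",
    "cardiac_arrest", "myocardial_infarction", "sepsis", "septic_shock",
)


def derive_outcomes(record):
    """Collapse component event flags into the three binary outcome labels."""
    return {
        # A composite outcome is positive if any of its component events occurred.
        "surgical_complication": int(any(record.get(k, 0) for k in SURGICAL_COMPONENTS)),
        "medical_complication": int(any(record.get(k, 0) for k in MEDICAL_COMPONENTS)),
        # Assumption: RETURNOR is coded "Yes" for an unplanned reoperation.
        "reoperation": int(record.get("RETURNOR") == "Yes"),
    }


# Hypothetical patient record with a deep SSI and no other events.
example_patient = {"deep_ssi": 1, "sepsis": 0, "RETURNOR": "No"}
labels = derive_outcomes(example_patient)
```

Missing component flags are treated as absent events here, which is a simplifying choice for the sketch.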
Twenty-one preoperative variables were selected for use as input variables (Supplemental Table 1). Complication-related variables were all removed. The NSQIP database contained a significant amount of missing data. These must be replaced with alternate values in order to be passed through deep learning algorithms. Consequently, we utilized a random forest-based imputation tool, the missForest package, to fill in these missing data. We included all input variables for imputing missing values. MissForest was implemented in the Python programming language via the missingpy library.
Continuous variables were normalized to values between 0 and 1. Categorical features were encoded via one-hot encoding. Both types of features were scaled to a mean of 0 and standard deviation of 1 after separating them into training, validation, and testing datasets. Due to imbalance in the dataset, we conducted over-sampling of positive cases within the training dataset via the Synthetic Minority Oversampling Technique (SMOTE), which increases the number of minority cases through interpolation. New data are synthesized based on a randomly selected point within k nearest neighbors of the minority cases. This prevents neural networks from neglecting minority classes, leading to higher performance.
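The interpolation step of SMOTE can be sketched in a few lines of plain Python. This is a simplified illustration rather than the implementation used in the study: the function name, the toy feature vectors, and the choice of k are all assumptions, and a production pipeline would more likely rely on an established library.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by interpolating between a
    randomly chosen minority point and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding the point itself).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation weight in [0, 1)
        # The synthetic point lies on the segment between base and neighbour.
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Hypothetical, already-scaled training features for the positive (minority) class.
minority_train = [(0.10, 0.20), (0.15, 0.25), (0.20, 0.10), (0.30, 0.30)]
new_points = smote_oversample(minority_train, n_new=6, k=2)
```

Because every synthetic point lies between two real minority cases, oversampling is applied only to the training split, so the validation and test sets still contain only real patients.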
We developed a deep neural network (DNN), a type of ANN, using the Keras open-source library written in the Python programming language. The model was trained independently for classification of three outcome variables: surgical complication, medical complication, and reoperation. A total of 21 variables were input into the model for classification. These input data were passed through 10 neural network layers. The entire dataset (n = 21,550) was divided into training (n = 13,792 [64%]), validation (n = 3,448 [16%]), and testing (n = 4,310 [20%]) datasets. We used multiple performance metrics, including area under the curve of the receiver operating characteristic (AUC-ROC), accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). We further computed the permutation feature importance of individual variables. This method calculates the decrease in a given evaluation metric (AUC-ROC) when the information carried by a single feature is removed from the dataset, which is done by randomly shuffling that feature’s values before making the prediction. We repeated this process for all distinct features. Features with a larger decrease in evaluation scores were deemed more important. The evaluation was conducted using the test dataset. All outputs were computed through the ELI5 Python package.
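The permutation importance procedure can be illustrated with a small self-contained sketch. It is not the ELI5 implementation: the helper names, the toy scoring function standing in for a trained model, and the six-row test set are hypothetical, and the number of repeats is an arbitrary choice.

```python
import random


def auc_roc(y_true, scores):
    """AUC as the probability that a random positive case is scored above a
    random negative case (ties count as 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in AUC-ROC when one feature column at a time is shuffled."""
    rng = random.Random(seed)
    baseline = auc_roc(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the outcome
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - auc_roc(y, [predict(row) for row in X_perm]))
        importances.append(sum(drops) / n_repeats)
    return baseline, importances


# Hypothetical stand-in for a trained model: the score depends only on feature 0.
def toy_predict(row):
    return row[0]


# Hypothetical test set: feature 0 separates the classes, feature 1 is noise.
X_test = [[0.9, 0.3], [0.8, 0.7], [0.7, 0.1], [0.2, 0.6], [0.3, 0.2], [0.1, 0.9]]
y_test = [1, 1, 1, 0, 0, 0]
baseline_auc, importances = permutation_importance(toy_predict, X_test, y_test)
```

In this toy setting, shuffling the feature the model ignores leaves the AUC-ROC unchanged, whereas shuffling the informative feature lowers it. The score reflects how much the model relies on a feature, not the direction of its effect.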
We developed a web-based application to demonstrate the predictive capacity of our three models in real-time. Using TensorFlow.js, a JavaScript interface for machine learning algorithms, our model runs entirely in the browser session of a user’s device to ensure privacy and security of the data. This allows the model to predict results without the need to send information to a server for inference. Additionally, the web interface is available on mobile devices.
Results
A total of 21,550 patients were included in this study. 4,731 (21.95%) patients were male, while 14,494 patients were white (67.26%) and 2,909 (13.50%) were black or African American. The mean age of the cohort was 52.39 ± 15.06 years. Out of a total of 21,550 patients, medical complications and surgical complications occurred in 1,723 (8.0%) patients and 943 (4.38%) patients, respectively, while 2,448 (11.36%) patients had undergone reoperation. The AUC-ROC for predicting occurrence of medical complication in the test dataset was 0.783, along with an accuracy of 0.782, sensitivity of 0.625, specificity of 0.797, PPV of 0.225, and NPV of 0.958. The AUC-ROC for predicting occurrence of surgical complication in the test dataset was 0.709, while accuracy of prediction was 0.816, sensitivity was 0.476, specificity was 0.832, PPV was 0.116, and NPV was 0.972. Finally, the AUC-ROC for predicting occurrence of reoperation in the test dataset was 0.703. Corresponding accuracy of prediction was 0.784, sensitivity was 0.435, specificity was 0.825, PPV was 0.226, and NPV was 0.926. Evaluation statistics and ROC curves for all predictions are depicted in Table 1 and Figure 1.
Table 1.
Model evaluation statistics for predicting occurrence of medical complication, surgical complication, and reoperation (classification threshold = 0.5).
| Trials | AUC-ROC | Accuracy | Sensitivity | Specificity | PPV | NPV |
|---|---|---|---|---|---|---|
| Medical Complication | 0.783 | 0.782 | 0.625 | 0.797 | 0.225 | 0.958 |
| Surgical Complication | 0.709 | 0.816 | 0.476 | 0.832 | 0.116 | 0.972 |
| Reoperation | 0.703 | 0.784 | 0.435 | 0.825 | 0.226 | 0.926 |
AUC-ROC: area under the curve of the receiver operating characteristic; PPV: positive predictive value; NPV: negative predictive value.
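For readers less familiar with these metrics, the sketch below shows how they follow from confusion-matrix counts at a fixed threshold. The counts are hypothetical (a 1,000-patient cohort with 8% prevalence), not the study's test-set results, and treating a probability equal to the threshold as positive is an assumption.

```python
def confusion_counts(y_true, probs, threshold=0.5):
    """Count TP, FP, TN, FN; assumes prob >= threshold is a positive prediction."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, probs):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def classification_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Hypothetical cohort: 1,000 patients, 80 of whom (8%) had the outcome.
metrics = classification_metrics(tp=50, fp=180, tn=740, fn=30)

# Tiny hypothetical example of thresholding predicted probabilities at 0.5.
counts = confusion_counts([1, 1, 0, 0, 0], [0.9, 0.4, 0.6, 0.2, 0.1])
```

With a rare outcome, even a modest false-positive rate yields many more false positives than true positives, so PPV stays low while NPV stays high. The same pattern appears in Table 1.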
Figure 1.

ROC curve for predictions. (1) Medical complication, (2) Surgical complication, (3) Reoperation.
We computed the permutation importance of each variable for predicting each outcome variable. The five most important features for predicting each outcome are listed in Table 2. Though results varied depending on the outcome variable, features such as sex, inpatient vs. outpatient, and ASA class were deemed the most important in making the prediction.
A web application was created to allow users to upload information for their own patient cohort and receive predictive values output by our models (Figure 2). To enhance the user experience for both clinicians and researchers, the web application offers two input styles: single-patient forms and multi-patient spreadsheets. The single-patient form is used to obtain predictions for an individual by filling out a questionnaire covering the input variables. The spreadsheet input style allows users to see the predicted outcomes for multiple individuals by either uploading a spreadsheet of data or entering information about multiple patients into the table provided in the web application. Following this process, users can download output predictions in a spreadsheet format. The website is publicly accessible at the following link: https://headneckml.com/thyroidectomy.html
Table 2.
Top 5 variables considered important for predicting occurrence of medical complication, surgical complication, and reoperation as derived via ELI5.
| Ranking | Medical complication | Surgical complication | Reoperation |
|---|---|---|---|
| 1 | Inpatient/outpatient | Sex | Sex |
| 2 | ASA class | Race | Inpatient/outpatient |
| 3 | Elective surgery | ASA class | Male |
| 4 | Age | Inpatient/outpatient | Elective surgery |
| 5 | Race | Smoking | ASA class |
ASA: American Society of Anesthesiologists.
Figure 2.

User interface for web deployment of predictive models: https://headneckml.com/thyroidectomy.html
Discussion
Although thyroidectomy is considered a safe procedure, it can be associated with different post-surgical issues, ranging from less severe, transient problems to major complications that adversely impact patients’ quality of life. Dysphonia, dysphagia, and other nonspecific upper aerodigestive symptoms may be present in 45% to 80% of patients post-operatively; however, most of these symptoms resolve spontaneously 6-12 months after surgery, or with appropriate early speech and swallowing therapy.9–11 Vocal cord paralysis due to inferior laryngeal nerve injury can significantly impair phonation and cause severe dysphagia and dyspnea. Transient palsies are observed in 3% to 8% of thyroidectomies, with permanent inferior laryngeal nerve injury occurring in 0.2% to 6.6% of cases.12–14 Both temporary and permanent palsies significantly affect patients’ quality of life.15 Postoperative hypoparathyroidism is another common complication in thyroidectomy, with transient symptoms occurring in 19% to 39% of patients and permanent loss of the parathyroid glands in up to 3% of patients.16,17 Although the majority of complications following thyroidectomy are minor and non-invalidating, there is still a potential risk for life-long adverse effects.
Besides the mental and physical burden on patients’ lives, complications and reoperations place a large financial load on the healthcare system, with rehospitalizations estimated to cost upwards of $17 billion annually in avoidable Medicare expenditure.18 Physicians could mitigate the negative effects of postoperative complications on healthcare expenditure and quality of life by using a model that would allow them to predict patients at risk for complications and develop personalized treatment plans. Advancements in ANN have led researchers to employ such technology for prediction of various surgical variables such as readmission, recurrence, length-of-stay, complication, and reoperation among various specialties.19,20 However, there is a dearth of similar literature in regard to thyroidectomy, and to our knowledge, our study is the first to predict occurrences of reoperation, surgical complication, and medical complication following thyroidectomy using ANN.
Logistic regression, a type of ML, is one of the most widely used methodologies for identifying risk factors predictive of medical outcomes such as morbidity and mortality. However, because of its dependence on using linear relationships between independent and dependent variables, logistic regression might lack the necessary flexibility to capture more complex relationships, such as those seen with surgical outcomes. ANN is particularly suited for these scenarios since it requires less previous knowledge regarding the importance of specific input variables and is capable of detecting non-linear relationships between input and output variables.21 Our DNN algorithm achieved an AUC-ROC of 0.703, 0.783, and 0.709 for predicting occurrences of reoperation, medical complication, and surgical complication, respectively. This is comparable to the results of other studies that utilized ANN algorithms to predict postoperative outcomes for shoulder arthroplasty, spinal surgery, and inguinal hernia repair.19,20,22 Although it is debatable which algorithms are most suitable for different analyses, our results indicate that application of a DNN algorithm was successful in predicting postoperative complications and reoperations in thyroidectomy.
We were able to analyze the importance of individual variables by characterizing how the exclusion of each variable affected the algorithm’s ability to effectively predict complications and reoperations. Variables that yielded a higher decrease in algorithm performance upon exclusion were considered more important. We utilized this methodology as it is not currently possible to directly measure variable importance in ANN classifications. Although this permutation method does not provide directionality on how the variable influences the prediction, it provides input on what factors were deemed important by the model in making predictions and may provide clinically significant information. For instance, previous literature shows that sex is a major predictor of certain thyroidectomy complications. Female sex has been shown to be an independent risk factor for postoperative hypocalcemia, and male sex has been shown to increase the risk for post-thyroidectomy hemorrhage.23,24 This supports our results, indicating that variables deemed important through permutation analysis could potentially be monitored in future thyroidectomies to improve patient outcomes. Machine learning explainability and interpretation is a topic that is under extensive investigation, and further research must be conducted in this sphere.
As a proof of concept for the utility of our model in clinical settings, we developed a web-based application with a user-friendly platform. Users can input data for an individual patient or for a cohort of multiple patients and download the predicted values in an organized Excel sheet. The website is also mobile-friendly, allowing for easy use in clinical settings. Furthermore, processed inputs are fed into our models using TensorFlow.js, a JavaScript interface for the TensorFlow machine learning library, so that our models run completely in the browser session on the user’s device. Because neither data processing nor model prediction requires any transfer of data to an external server, the privacy and security of highly sensitive patient data are preserved.
Our study has several limitations, some of which stem from our use of a large national database, the NSQIP, as the training data set for our algorithm. Use of trained reviewers to extract data from medical records, as done for the NSQIP database, can potentially lend itself to error. Previous NSQIP studies have found lower than expected rates of concurrent procedures, indicating that CPT codes may not be correctly documented.25 Furthermore, although specific CPT codes were used to collect a homogeneous cohort of thyroidectomy patients, there still exist limitations due to regional variation in pre- and postoperative patient care, leading to potential unexamined heterogeneity in the cohort. Another significant limitation of our study is the absence of some variables, such as thyroid size, the extent of thyroid disease, lymph node involvement, presence of malignancy, and type of thyroid cancer. These variables are not reported in the NSQIP database and were therefore not included in our prediction model. Future studies should consider a more comprehensive set of predictors (i.e., input variables) in order to provide a better understanding of the factors that contribute to postoperative outcomes. Despite this limitation, our results provide valuable insights into some of the factors predictive of thyroidectomy outcome.
The NSQIP had a significant amount of missing data, which needed to be filled with alternate values before the data could be passed through our algorithm. We used the missForest package to fill these values; however, a more complete data set would have likely improved algorithm performance. Similarly, there was a class imbalance, with a higher number of cases without complications or reoperations. We used the SMOTE technique to oversample the minority positive cases in order to keep our algorithm from neglecting them, although a dataset with a larger minority class sample size would have likely improved the performance of our ANN.
Conclusion
We developed a well-performing DNN algorithm to predict surgical and medical complications and reoperation following thyroidectomy. The model performed at an AUC-ROC of 0.783, 0.709, and 0.703 for predicting medical complications, surgical complications, and reoperation, respectively. Variables with high permutation importance included sex, inpatient vs outpatient, and ASA class. These findings may offer more insight into post-surgical outcomes and allow physicians to make more personalized treatment plans for thyroidectomy patients, although further investigation is needed before clinical implementation.
Supplementary Material
Key Points.
Though thyroidectomy is generally considered to be a safe procedure, surgical risks remain a burden on the current healthcare system.
Physicians could mitigate the negative effects of postoperative complications on healthcare expenditure and quality of life by using an artificial neural network to predict patients at risk for complications.
We developed a machine learning model that used data from a national registry to predict complications after thyroidectomy.
We have also developed an open-source interface (website) where the predictive models can be tested in real time by any user.
Based on the predictions of our algorithm, physicians can make more personalized treatment plans for thyroidectomy patients.
Financial Disclosure:
Mehdi Abouzari was supported by the National Center for Research Resources and the National Center for Advancing Translational Sciences, National Institutes of Health, through Grant TL1TR001415.
Footnotes
Conflicts of Interest: None
Authorship Statement:
KT, KG, YMH, TT, WBA & MA: Conception and design of the study; KT, KG, KHA, PK & KT: Methodology and data analysis; KT, KG, KHA, PK, KT, YMH, TT, WBA & MA: Interpretation of the data; KT: Writing-original draft; KG, KHA, PK, KT, YMH, TT, WBA & MA: Writing-review and editing; YMH, TT, WBA & MA: Supervision; KT, KG, KHA, PK, KT, YMH, TT, WBA & MA: Accountable for all aspects of the work.
Data Availability Statement:
The data that support the findings of this study are openly available in the American College of Surgeons National Surgical Quality Improvement Program (ACS-NSQIP) database at https://www.facs.org/quality-programs/data-and-registries/acs-nsqip/participant-use-data-file/.
References
- 1. Davies L & Gilbert Welch H Increasing incidence of thyroid cancer in the United States, 1973-2002. JAMA 295, 2164–2167 (2006).
- 2. Sosa JA, Hanna JW, Robinson KA & Lanman RB Increases in thyroid nodule fine-needle aspirations, operations, and diagnoses of thyroid cancer in the United States. Surgery 154, 1420–1427 (2013).
- 3. Meltzer C et al. Risk of Complications after Thyroidectomy and Parathyroidectomy: A Case Series with Planned Chart Review. Otolaryngol Head Neck Surg 155, 391–401 (2016).
- 4. Bhattacharyya N & Fried MP Assessment of the morbidity and complications of total thyroidectomy. Arch Otolaryngol Head Neck Surg 128, 389–392 (2002).
- 5. Goyal A et al. Can machine learning algorithms accurately predict discharge to nonhome facility and early unplanned readmissions following spinal fusion? Analysis of a national surgical registry. J Neurosurg Spine 1–11 (2019) doi: 10.3171/2019.3.SPINE181367.
- 6. Merath K et al. Use of Machine Learning for Prediction of Patient Risk of Postoperative Complications After Liver, Pancreatic, and Colorectal Surgery. J Gastrointest Surg 24, 1843–1851 (2020).
- 7. Yamamoto Y et al. Automated acquisition of explainable knowledge from unannotated histopathology images. Nat Commun 10, 5642 (2019).
- 8. Chen JH & Asch SM Machine Learning and Prediction in Medicine — Beyond the Peak of Inflated Expectations. N Engl J Med 376, 2507–2509 (2017).
- 9. Pereira JA, Girvent M, Sancho JJ, Parada C & Sitges-Serra A Prevalence of long-term upper aerodigestive symptoms after uncomplicated bilateral thyroidectomy. Surgery 133, 318–322 (2003).
- 10. Krekeler BN et al. Patient-Reported Dysphagia After Thyroidectomy: A Qualitative Study. JAMA Otolaryngol Head Neck Surg 144, 342–348 (2018).
- 11. Mattioli F et al. The role of early voice therapy in the incidence of motility recovery in unilateral vocal fold paralysis. Logoped Phoniatr Vocol 36, 40–47 (2011).
- 12. Echternach M et al. Laryngeal complications after thyroidectomy: is it always the surgeon? Arch Surg 144, 149–153; discussion 153 (2009).
- 13. Varaldo E, Ansaldo GL, Mascherini M, Cafiero F & Minuto MN Neurological complications in thyroid surgery: a surgical point of view on laryngeal nerves. Front Endocrinol (Lausanne) 5, 108 (2014).
- 14. Stavrakis AI, Ituarte PHG, Ko CY & Yeh MW Surgeon volume as a predictor of outcomes in inpatient and outpatient endocrine surgery. Surgery 142, 887–899; discussion 887-899 (2007).
- 15. Spector BC et al. Quality-of-life assessment in patients with unilateral vocal cord paralysis. Otolaryngol Head Neck Surg 125, 176–182 (2001).
- 16. Orloff LA et al. American Thyroid Association Statement on Postoperative Hypoparathyroidism: Diagnosis, Prevention, and Management in Adults. Thyroid 28, 830–841 (2018).
- 17. Edafe O & Balasubramanian SP Incidence, prevalence and risk factors for post-surgical hypocalcaemia and hypoparathyroidism. Gland Surg 6, S59–S68 (2017).
- 18. Jencks SF, Williams MV & Coleman EA Rehospitalizations among patients in the Medicare fee-for-service program. N Engl J Med 360, 1418–1428 (2009).
- 19. Karhade AV et al. Development of Machine Learning Algorithms for Prediction of 30-Day Mortality After Surgery for Spinal Metastasis. Neurosurgery 85, E83–E91 (2019).
- 20. Biron DR et al. A Novel Machine Learning Model Developed to Assist in Patient Selection for Outpatient Total Shoulder Arthroplasty. J Am Acad Orthop Surg 28, e580–e585 (2020).
- 21. Ayer T et al. Comparison of Logistic Regression and Artificial Neural Network Models in Breast Cancer Risk Estimation. Radiographics 30, 13–22 (2010).
- 22. Gao J, Zagadailov P & Merchant AM The Use of Artificial Neural Network to Predict Surgical Outcomes After Inguinal Hernia Repair. The Journal of surgical research 259, 372–378 (2021).
- 23. Liu J et al. Risk factors for post-thyroidectomy haemorrhage: a meta-analysis. Eur J Endocrinol 176, 591–602 (2017).
- 24. Mehta S, Dhiwakar M & Swaminathan K Outcomes of parathyroid gland identification and autotransplantation during total thyroidectomy. Eur Arch Otorhinolaryngol 277, 2319–2324 (2020).
- 25. Bovenzi CD et al. Reconstructive trends and complications following parotidectomy: incidence and predictors in 11,057 cases. Journal of Otolaryngology - Head & Neck Surgery 48, 64 (2019).
