BMC Endocrine Disorders. 2026 Feb 3;26:75. doi: 10.1186/s12902-026-02164-7

Machine learning-based classification of papillary thyroid carcinoma versus multinodular goiter using preoperative laboratory and cytology data

Salar GolmohammadzadehKhiaban 1, Mehrad Namazee 2, Ali Rahnamaei 3,
PMCID: PMC12958729  PMID: 41634679

Abstract

Background

Thyroid nodules are frequently encountered in clinical practice, with their detection increasing due to advancements in imaging modalities. While most nodules are benign, distinguishing papillary thyroid carcinoma (PTC) from benign entities such as multinodular goiter (MNG) remains a diagnostic challenge. Fine-needle aspiration (FNA) and sonography are standard tools, but their limitations highlight the need for supplementary approaches. This study evaluates the use of machine learning (ML) models to classify PTC versus MNG using routine clinical, laboratory, and cytological data that are available preoperatively, before surgery and final pathology results.

Methods

This retrospective multicenter study included 951 patients who underwent total thyroidectomy between 2020 and 2024. The dataset incorporated demographic data, preoperative sonographic findings, hematologic and thyroid function tests, and FNA cytology results. Five supervised ML algorithms (Logistic Regression, Random Forest, XGBoost, Support Vector Machine [SVM], and K-Nearest Neighbor [KNN]) were trained and validated. Model performance was assessed using accuracy, precision, recall, F1-score, sensitivity, specificity, and area under the receiver operating characteristic curve (AUC-ROC).

Results

The XGBoost model achieved the best overall performance, with an accuracy of 84.4%, precision of 85.3%, and an AUC-ROC of 0.881, together with a sensitivity of 0.714 and a specificity of 0.944. Random Forest also performed well (accuracy: 81.2%) and had the highest AUC-ROC (0.919). Logistic Regression, SVM, and KNN underperformed in comparison. Feature importance analysis revealed that the FNA result, nodule size, and TSH were the most influential predictors.

Conclusion

Machine learning models, particularly XGBoost and Random Forest, show promise in accurately distinguishing between MNG and PTC using routine clinical data. Their integration into preoperative assessment may enhance diagnostic precision, reduce unnecessary procedures, and support personalized surgical decision-making. Further validation in diverse, multicenter cohorts is warranted to confirm generalizability and clinical utility.

Clinical trial number

Not applicable.

Keywords: Thyroid nodule, Papillary thyroid carcinoma (PTC), Machine learning (ML), Fine-needle aspiration (FNA), Multinodular goiter, Classification

Background

Thyroid nodules are highly prevalent clinical findings, and their detection rates increase progressively with the advancement of imaging modalities [1]. Although the majority of these nodules are benign [2], a minority may represent malignant lesions, the most common of which is papillary thyroid carcinoma (PTC) [3]. Accurate preoperative discrimination between malignant and benign thyroid nodules is essential to avoid unnecessary surgery, reduce patient anxiety, and guide appropriate clinical management [4].

The most prevalent benign thyroid condition, known as multinodular goiter (MNG), is characterized by the thyroid gland containing multiple nodules of various sizes and shapes [5]. Despite being the gold standard for the preoperative diagnosis of thyroid nodules, fine-needle aspiration (FNA) cytology can produce non-diagnostic or indeterminate results, especially because of sampling errors and nodule heterogeneity [6]. Currently, molecular tests have improved diagnostic accuracy; however, they do not fully eliminate the risk of misdiagnosis or ensure adequate monitoring [7]. Despite the ready availability of thyroid function tests and other preoperative laboratory parameters, they are not optimally utilized in diagnostic modeling [8].

In this regard, artificial intelligence (AI) and machine learning have emerged as powerful tools for enhancing clinical decision-making [9]. These technologies are particularly good at identifying complex, nonlinear patterns in high-dimensional data, an ability that is especially valuable in the heterogeneous domain of thyroid disease [10, 11]. AI models can aggregate and process diverse inputs, such as preoperative laboratory and cytological findings, to generate predictive information that may be elusive to conventional statistical methods or clinical judgment [12, 13].

Recent years have seen a growing interest in artificial intelligence (AI) and machine learning (ML) approaches for improving thyroid nodule diagnosis. Prior studies, such as those by Buda et al. (2019) [14], have demonstrated promising results using deep learning and radiomic features extracted from ultrasound or cytology images. However, these models often require access to large imaging datasets and specialized computational infrastructure, which can limit their clinical generalizability. In contrast, the present study focuses on developing ML models that integrate routine, readily available preoperative data including laboratory tests, structured sonographic findings, and FNA cytology reports to classify thyroid nodules as benign (MNG) or malignant (PTC). This approach leverages data already collected in standard clinical workflows, thereby increasing accessibility and potential for real-world application.

Moreover, most existing models leave largely unaddressed the possibility of low-cost AI tools built on data already integrated into standard clinical processes [15, 16]. In particular, research is still lacking on machine learning models trained specifically on routine preoperative laboratory and FNA data to differentiate between MNG and PTC. To address this gap, a machine learning model for preoperative stratification of MNG and PTC based on routinely ordered laboratory tests and FNA cytology results was developed and validated. By leveraging AI, this model seeks to improve risk stratification and diagnostic accuracy and to facilitate better surgical decision-making.

Methods

Study design and population

Patients who underwent thyroid surgery between 2020 and 2024 at three major tertiary care centers were included. Inclusion criteria were as follows: patients over 18 years old who underwent thyroid surgery with a confirmed histopathological diagnosis, had available preoperative laboratory test results, and had undergone fine-needle aspiration (FNA) cytology. Patients with incomplete preoperative data, a history of prior thyroid surgery (including completion thyroidectomy), or final histopathology revealing non-papillary histology were excluded. In total, 951 patients met the eligibility criteria, including 408 cases of multinodular goiter (MNG) and 543 cases of papillary thyroid carcinoma (PTC).

Data collection and preprocessing

Data were retrospectively collected from the electronic medical records of Firoozgar and Rasoul Akram Hospitals between 2020 and 2024. Only patients who underwent total thyroidectomy and had complete records for the required variables were included. Cases with missing key features were excluded from the analysis.

The dataset included demographic information (age, gender), sonographic findings (nodule size and location, lymph node status), preoperative laboratory values (complete blood count, thyroid function tests, calcium), preoperative FNA cytology results (benign, malignant, PTC, atypia of undetermined significance [AUS], or non-diagnostic), and the final surgical pathology (benign vs. malignant, tumor type). All hematologic and thyroid function tests were obtained within 2–3 weeks prior to surgery and, when available, before FNA.

The dataset utilized structured sonographic and cytopathology reports rather than raw imaging data, consistent with standard clinical practice. To mitigate operator dependency, data were derived from multiple radiologists and pathologists across two tertiary centers, thereby enhancing heterogeneity and reducing single-operator bias.

Prior to model development, all numeric variables were standardized using z-score normalization. Categorical variables were encoded using one-hot encoding where applicable. Outliers and biologically implausible values were reviewed and removed. To ensure data consistency, all preprocessing steps were performed using reproducible Python-based pipelines.
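The preprocessing described above can be sketched with scikit-learn's `ColumnTransformer`. This is a minimal sketch, not the authors' pipeline: the column names are hypothetical (the study's variable names are not reported) and the demo data are random. Fitting the transformer on training data only is one way to keep test-set statistics from leaking into the z-scores.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical column names standing in for the study's variables.
NUMERIC = ["age", "size_mm", "wbc", "hb", "plt", "pdw", "mpv",
           "tsh", "tt4", "tt3", "calcium"]
CATEGORICAL = ["sex", "lymph_node", "fna_result"]


def build_preprocessor() -> ColumnTransformer:
    """z-score numeric columns and one-hot encode categorical ones."""
    return ColumnTransformer(
        [
            ("num", StandardScaler(), NUMERIC),
            # Categories unseen during fitting are encoded as all zeros.
            ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
        ],
        sparse_threshold=0.0,  # always return a dense array
    )


def make_demo_frame(n: int = 200, seed: int = 0) -> pd.DataFrame:
    """Random stand-in data with the same column layout (not study data)."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame(rng.normal(size=(n, len(NUMERIC))), columns=NUMERIC)
    df["sex"] = rng.choice(["F", "M"], size=n)
    df["lymph_node"] = rng.choice(["yes", "no"], size=n)
    df["fna_result"] = rng.choice(
        ["benign", "malignant", "AUS", "non-diagnostic"], size=n)
    return df


df = make_demo_frame()
X = build_preprocessor().fit_transform(df)
print(X.shape)  # (200, 19): 11 scaled columns + 2 + 2 + 4 one-hot columns
```

In practice the transformer would be fitted on the training split (for example, as the first step of a scikit-learn `Pipeline`) and then only applied to the test split.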

AI model development and evaluation

The objective of model development was to classify patients with papillary thyroid carcinoma versus multinodular goiter using preoperative clinical, laboratory, cytological, and imaging data. Multiple supervised learning algorithms were explored to identify the most effective classifier for this task.

The dataset was randomly split into training (80%) and testing (20%) sets, stratified to preserve the class proportions. Feature selection was performed using both domain knowledge and data-driven techniques, including univariate statistical tests and feature importance scores derived from tree-based models. Stratified sampling was also used during cross-validation so that each fold preserved the overall class proportions, keeping fold-level estimates stable despite the unequal class sizes.
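A stratified 80/20 split with univariate feature selection nested inside stratified 5-fold cross-validation might look like the following. This is an illustrative sketch on synthetic data from `make_classification`, with class weights set to mimic the 408/543 MNG/PTC ratio; `k=10` and the logistic-regression classifier are arbitrary choices, and the tree-based importance step is omitted.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (StratifiedKFold, cross_val_score,
                                     train_test_split)
from sklearn.pipeline import Pipeline

# Synthetic stand-in for the 951-patient cohort (~43% class 0, ~57% class 1).
X, y = make_classification(n_samples=951, n_features=20, n_informative=5,
                           weights=[408 / 951], random_state=0)

# Stratified 80/20 split: both sets keep roughly the overall class ratio.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Selecting features inside the pipeline means each CV fold re-runs the
# univariate F-test on its own training portion only.
pipe = Pipeline([
    ("select", SelectKBest(f_classif, k=10)),
    ("clf", LogisticRegression(max_iter=1000)),
])
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(pipe, X_tr, y_tr, cv=cv, scoring="roc_auc")
print(len(X_tr), len(X_te), scores.round(3))
```

Stratification keeps the class ratio constant across splits and folds; it does not rebalance the classes. Rebalancing would need something else, such as `class_weight="balanced"` on the classifier.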

Several machine learning algorithms were implemented and compared, including:

  • Logistic Regression.

  • Random Forest.

  • Gradient Boosting Machines (XGBoost).

  • Support Vector Machine (SVM).

  • K-Nearest Neighbor (KNN).

Models were developed and tuned in Python using scikit-learn [17] and XGBoost [18], with NumPy [19] for data management and Matplotlib [20] for visualization. Hyperparameters were optimized via grid search with 5-fold cross-validation on the training set. Each model was then evaluated on the independent test set using accuracy, precision, recall, F1-score, sensitivity, specificity, and area under the receiver operating characteristic curve (AUC-ROC).
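The grid-searched comparison of the five classifiers could be organized as below. This is a sketch under stated assumptions: the hyperparameter grids are hypothetical (the searched values are not reported), scikit-learn's `GradientBoostingClassifier` stands in for XGBoost so the example runs without extra packages (`xgboost.XGBClassifier` follows the scikit-learn estimator interface and could occupy the same slot), and tuning for AUC-ROC is an assumed selection criterion.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical grids; GradientBoostingClassifier stands in for XGBoost here.
MODELS = {
    "Logistic Regression": (LogisticRegression(max_iter=1000),
                            {"clf__C": [0.1, 1.0, 10.0]}),
    "Random Forest": (RandomForestClassifier(random_state=0),
                      {"clf__max_depth": [None, 5]}),
    "Gradient boosting (XGBoost stand-in)": (
        GradientBoostingClassifier(random_state=0),
        {"clf__max_depth": [2, 3], "clf__learning_rate": [0.1, 0.3]}),
    "SVM": (SVC(), {"clf__C": [0.1, 1.0, 10.0]}),
    "KNN": (KNeighborsClassifier(), {"clf__n_neighbors": [5, 11, 21]}),
}


def tune_all(X_train, y_train, seed=0):
    """Grid search with stratified 5-fold CV per model; returns fitted searches."""
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    searches = {}
    for name, (clf, grid) in MODELS.items():
        # Scaling inside the pipeline is refit on each fold's training part.
        pipe = Pipeline([("scale", StandardScaler()), ("clf", clf)])
        search = GridSearchCV(pipe, grid, cv=cv, scoring="roc_auc")
        searches[name] = search.fit(X_train, y_train)
    return searches


X, y = make_classification(n_samples=300, n_features=10, random_state=0)
for name, search in tune_all(X, y).items():
    print(f"{name}: best CV AUC {search.best_score_:.3f}, {search.best_params_}")
```

Each fitted search refits the best configuration on the full training set, so `search.predict` or `search.decision_function` can then be applied once to the held-out test set.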

Ethical consideration

This study was conducted in accordance with the Declaration of Helsinki and approved by the Ethics Committees of Firoozgar Hospital and Rasoul Akram Hospital (institutional review board [IRB] number 20-169-2024). The study utilized de-identified retrospective data, and informed consent was waived due to the non-interventional nature of the research. All data handling procedures adhered to institutional and national regulations on data privacy and patient confidentiality.

Results

Patient characteristics

The study cohort included 951 patients (~59% women, ~41% men) with a mean age of 45.85 years (range: 15–78). The average nodule size was 33.02 mm, ranging from 5 mm to 96 mm. Hematologic parameters such as white blood cell count (WBC), hemoglobin (Hb), and platelet count (PLT) had means of 7.97 × 10⁹/L, 13.98 g/dL, and 235.09 × 10⁹/L, respectively. Thyroid function tests showed a mean TSH level of 2.42 µIU/mL and a mean total T4 (TT4) of 9.84 µg/dL. Full summary statistics for clinical and laboratory variables are presented in Table 1, and sonographic and cytological characteristics in Table 2.

Table 1.

Descriptive statistics of patient characteristics

| Variable | Min | Max | Mean | Median | Std Dev |
|---|---|---|---|---|---|
| Age (years) | 15.00 | 78.00 | 45.85 | 45.50 | 13.39 |
| Size (mm) | 5.00 | 96.00 | 33.02 | 30.00 | 18.75 |
| WBC (×10⁹/L) | 3.80 | 19.70 | 7.97 | 7.50 | 2.72 |
| NP (%) | 9.10 | 96.30 | 66.05 | 64.50 | 13.51 |
| NC (×10⁹/L) | 1.40 | 66.40 | 5.72 | 4.60 | 5.48 |
| LP (%) | 2.30 | 61.00 | 26.67 | 28.85 | 11.33 |
| Lymph Count | 0.10 | 29.40 | 2.21 | 1.90 | 2.47 |
| NTL Ratio | 0.53 | 42.00 | 4.13 | 2.17 | 5.58 |
| Hemoglobin (g/dL) | 9.10 | 11.90 | 13.98 | 13.20 | 8.64 |
| PLT (×10⁹/L) | 105.00 | 686.00 | 235.09 | 225.50 | 65.42 |
| RDW (%) | 11.00 | 20.00 | 13.68 | 13.10 | 1.74 |
| PDW (%) | 8.20 | 19.70 | 12.07 | 11.60 | 2.14 |
| MPV (fL) | 1.40 | 12.20 | 9.41 | 9.40 | 1.35 |
| TSH (µIU/mL) | 0.01 | 29.30 | 2.42 | 1.40 | 3.89 |
| TT4 (µg/dL) | 0.10 | 139.00 | 9.84 | 7.90 | 14.16 |
| TT3 (ng/dL) | 0.76 | 242.00 | 93.28 | 105.00 | 59.19 |
| Calcium (mg/dL) | 7.10 | 12.00 | 9.26 | 9.30 | 0.83 |

Shown are the minimum, maximum, mean, median, and standard deviation for demographic, hematologic, and endocrine variables

Table 2.

Sonographic and cytological characteristics of the study population

| Variable | Category | n (%) or mean ± SD |
|---|---|---|
| Nodule size (mm) | Mean size of nodules | 33.22 ± 18.85 |
| Lymph node positivity | Yes / No | Yes (55%) / No (45%) |
| FNA cytology result | Benign / Malignant / AUS / Non-diagnostic | 36.9% / 39.5% / 10.9% / 12.6% |

Continuous variables are presented as mean ± standard deviation (SD), and categorical variables as counts and percentages. Nodule size, lymph node positivity, and fine-needle aspiration (FNA) cytology categories are included as representative features used in model development

Models performance

In this study, five machine learning models were trained and evaluated to classify thyroid pathology using preoperative clinical, laboratory, sonographic, and cytological features. The models included Logistic Regression, Random Forest, XGBoost, Support Vector Machine (SVM), and K-Nearest Neighbors (KNN). The performance of each model was assessed using several metrics, summarized in Table 3. Sensitivity and specificity were additionally computed to evaluate the models' clinical discriminative ability (Table 3).

Table 3.

Performance metrics of five machine learning models for thyroid nodule classification

| Model | Accuracy | Precision | Recall | F1-score | AUC-ROC | Sensitivity | Specificity |
|---|---|---|---|---|---|---|---|
| XGBoost | 0.844 | 0.853 | 0.844 | 0.840 | 0.881 | 0.714 | 0.944 |
| Random Forest | 0.812 | 0.828 | 0.812 | 0.806 | 0.919 | 0.643 | 0.944 |
| Logistic Regression | 0.719 | 0.720 | 0.719 | 0.713 | 0.671 | 0.571 | 0.833 |
| SVM | 0.688 | 0.703 | 0.688 | 0.667 | 0.639 | 0.429 | 0.889 |
| KNN | 0.562 | 0.556 | 0.562 | 0.557 | 0.625 | 0.429 | 0.667 |

The table summarizes accuracy, precision, recall, F1-score, AUC-ROC, sensitivity, and specificity. XGBoost and Random Forest achieved the highest overall performance across most metrics

XGBoost achieved the highest overall performance, with an accuracy of 84.4%, precision of 85.3%, recall of 84.4%, F1-score of 84.0%, and an AUC-ROC of 0.881. It also showed a sensitivity of 0.714 and a high specificity of 0.944, indicating that it classified negative cases very reliably while correctly identifying about 71% of positive cases. Random Forest performed comparably well, with an accuracy of 81.2% and the highest AUC-ROC of 0.919 among all models. It also exhibited strong specificity (0.944), although its sensitivity (0.643) was lower than that of XGBoost (Table 3).
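In Table 3 the Recall column equals the Accuracy column for every model, while Sensitivity differs. That pattern is what support-weighted averaging produces: weighted recall is the sum over classes of (n_c / N) · (TP_c / n_c) = (sum of TP_c) / N, which is exactly accuracy. The sketch below assumes the precision, recall, and F1 columns were computed this way (for example with scikit-learn's `average="weighted"`), which the text does not state. The labels are hypothetical: 14 positive and 18 negative cases with 10 true positives and 1 false positive happen to reproduce every value in the XGBoost row, but the actual test-set counts are not reported.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)


def summarize(y_true, y_pred):
    """Weighted-average metrics plus class-1 sensitivity and specificity."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision (weighted)": precision_score(y_true, y_pred, average="weighted"),
        "recall (weighted)": recall_score(y_true, y_pred, average="weighted"),
        "F1 (weighted)": f1_score(y_true, y_pred, average="weighted"),
        "sensitivity": tp / (tp + fn),  # recall of class 1 only
        "specificity": tn / (tn + fp),  # recall of class 0 only
    }


# Hypothetical labels: 14 positives (10 detected), 18 negatives (17 correct).
y_true = np.array([1] * 14 + [0] * 18)
y_pred = np.array([1] * 10 + [0] * 4 + [0] * 17 + [1] * 1)
metrics = summarize(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name:>20}: {value:.3f}")
# accuracy 0.844, precision 0.853, recall 0.844, F1 0.840,
# sensitivity 0.714, specificity 0.944
```

Under this reading, "recall" in Table 3 averages the sensitivity (0.714) and the specificity (0.944) weighted by class size, which is why it sits between them.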

Logistic Regression showed moderate performance (accuracy: 71.9%, AUC-ROC: 0.671) and a notable imbalance between sensitivity (0.571) and specificity (0.833), suggesting it was more conservative in predicting positive cases.

SVM and KNN underperformed relative to the other classifiers. SVM achieved an accuracy of 68.8% and a low AUC-ROC of 0.639, while KNN showed the lowest overall performance, with an accuracy of 56.2%, an AUC-ROC of 0.625, and similarly poor sensitivity (0.429). A visual comparison of the classifiers' performance is presented in Fig. 1, which shows the ROC curves for all five models. Ensemble methods such as XGBoost and Random Forest demonstrated steeper curves and larger AUC values, reflecting their superior classification performance.

Fig. 1.

ROC curves of five machine learning models used for classifying thyroid nodules. XGBoost and Random Forest demonstrated the highest AUC values (0.881 and 0.919, respectively), indicating superior discriminatory performance compared with the other models. Curves were smoothed using a Gaussian kernel (sigma = 1) for better visualization.
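The figure note mentions Gaussian smoothing with sigma = 1. A minimal sketch of one way to do this, assuming the smoothing is applied to the TPR values along the ROC curve's point sequence (the text does not say exactly what was smoothed): the AUC is computed from the raw, unsmoothed curve, and the smoothed curve is meant for plotting only. Labels and scores are synthetic.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve


def gaussian_smooth(values, sigma=1.0):
    """Convolve a 1-D array with a normalized Gaussian kernel (edge-padded)."""
    radius = max(1, int(round(4 * sigma)))
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-offsets**2 / (2 * sigma**2))
    kernel /= kernel.sum()
    padded = np.pad(values, radius, mode="edge")
    return np.convolve(padded, kernel, mode="valid")  # same length as input


def roc_for_display(y_true, scores, sigma=1.0):
    """Raw ROC points, a smoothed TPR for plotting, and the unsmoothed AUC."""
    fpr, tpr, _ = roc_curve(y_true, scores)
    return fpr, tpr, gaussian_smooth(tpr, sigma), roc_auc_score(y_true, scores)


rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=200)
scores = y_true + rng.normal(size=200)  # synthetic, moderately separable
fpr, tpr, tpr_smooth, auc_value = roc_for_display(y_true, scores)
print(f"AUC = {auc_value:.3f}")
```

Because the kernel is non-negative and normalized, the smoothed TPR stays within [0, 1] and remains non-decreasing, but it no longer starts exactly at (0, 0); that is one reason to report the AUC from the raw curve.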

Feature importance

To gain insight into the contribution of each feature to the model’s predictive performance, we evaluated feature importance using the trained XGBoost classifier. The results are presented in Fig. 2.

Fig. 2.

Feature importance scores derived from the XGBoost model. The plot ranks input variables based on their relative contribution to model predictions. FNA result, nodule size, and TSH were among the most influential features, while demographic and anatomical variables such as sex and contralateral nodule presence showed minimal importance

Among all input variables, the fine needle aspiration (FNA) result was the most influential predictor of thyroid pathology, with an importance score of 0.194, followed by nodule size (0.116), and thyroid-stimulating hormone (TSH) levels (0.078). These were followed by age, platelet distribution width (PDW), mean platelet volume (MPV), and hemoglobin (Hb), each contributing moderate importance to the classification task.

In contrast, features such as sex, bilateral nodule, and contralateral nodule presence had minimal influence on model performance, each with importance values below 0.01. These findings emphasize the dominant role of cytological and imaging characteristics (e.g., FNA result, nodule size), as well as selected laboratory biomarkers, in distinguishing malignant from benign thyroid nodules in our dataset.
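Ranking features by a fitted boosting model's importance scores, as in Fig. 2, can be sketched as follows. scikit-learn's `GradientBoostingClassifier` is used as a stand-in for XGBoost; its `feature_importances_` are impurity-based and normalized to sum to 1. `XGBClassifier` also exposes `feature_importances_`, but the underlying measure depends on its `importance_type` setting, which the text does not report. The feature names and data are hypothetical, and only the first two synthetic columns carry signal.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Hypothetical feature names; with shuffle=False, make_classification places
# the 2 informative columns first and pure-noise columns after them.
FEATURES = ["fna_result", "size_mm", "tsh", "age", "pdw", "sex"]
X, y = make_classification(n_samples=500, n_features=6, n_informative=2,
                           n_redundant=0, n_clusters_per_class=1,
                           shuffle=False, random_state=0)


def ranked_importances(model, names):
    """Return (name, importance) pairs sorted from most to least important."""
    importances = model.feature_importances_
    order = np.argsort(importances)[::-1]
    return [(names[i], float(importances[i])) for i in order]


model = GradientBoostingClassifier(random_state=0).fit(X, y)
for name, score in ranked_importances(model, FEATURES):
    print(f"{name:>10}: {score:.3f}")
```

Impurity-based scores describe how the model used each feature on the training data; permutation importance on held-out data (`sklearn.inspection.permutation_importance`) is a common complementary check.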

To further assess the contribution of FNA to overall model performance, we conducted an additional analysis excluding the FNA variable from the feature set. The accuracy of the XGBoost model decreased from 84.4% (AUC = 0.881) to 76.3% (AUC = 0.801), indicating that while FNA is a major predictive feature, the model retained reasonable discriminative ability based solely on laboratory and sonographic parameters. This finding suggests that the ML framework may still provide clinically useful risk stratification in cases where FNA is unavailable or deferred, potentially reducing the need for unnecessary invasive procedures in low-risk patients.
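The FNA-exclusion analysis amounts to refitting the model on a reduced feature set and comparing test-set AUC. A minimal sketch, assuming the model is retrained without the FNA column(s) rather than merely masking them at prediction time (the text does not specify). `GradientBoostingClassifier` again stands in for XGBoost, and the synthetic data make column 0 (a hypothetical "FNA" feature) strongly predictive. If FNA were one-hot encoded, all of its indicator columns would go in `drop_cols`.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split


def ablation_auc(X, y, drop_cols, seed=0):
    """Test-set AUC with all features vs. with `drop_cols` removed (refit)."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed)
    dropped = set(drop_cols)
    feature_sets = {
        "all features": list(range(X.shape[1])),
        "without dropped": [c for c in range(X.shape[1]) if c not in dropped],
    }
    results = {}
    for label, cols in feature_sets.items():
        model = GradientBoostingClassifier(random_state=seed)
        model.fit(X_tr[:, cols], y_tr)
        proba = model.predict_proba(X_te[:, cols])[:, 1]
        results[label] = roc_auc_score(y_te, proba)
    return results


rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=600)
X = rng.normal(size=(600, 5))
X[:, 0] += 2.0 * y  # hypothetical strong "FNA" signal
X[:, 1] += 1.0 * y  # weaker "laboratory" signal
print(ablation_auc(X, y, drop_cols=[0]))
```

Using the same split and seed for both fits keeps the comparison paired, so the AUC difference reflects the removed feature rather than a different test set.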

Discussion

Thyroid nodules are one of the most common clinical findings, with their prevalence increasing due to advancements in imaging techniques [21]. While the majority of thyroid nodules are benign, distinguishing malignant nodules, particularly PTC, remains a critical clinical challenge [22]. This study presents a machine learning (ML) model that integrates routine preoperative clinical, laboratory, and FNA cytology data to classify thyroid nodules as either benign (MNG) or malignant (PTC), which holds significant potential for improving diagnostic accuracy and patient outcomes.

The diagnosis and management of thyroid nodules heavily rely on imaging modalities such as ultrasound and, when necessary, FNA cytology. However, these methods have limitations, including operator dependency and the risk of non-diagnostic or indeterminate results. Molecular tests, though improving diagnostic accuracy, are often costly and do not fully eliminate the risk of misdiagnosis [22]. Additionally, while thyroid function tests and other laboratory parameters are readily available, they have not been optimally integrated into predictive models for preoperative stratification of thyroid nodules. This study addresses this gap by incorporating preoperative laboratory test results, sonographic findings, and cytological data into an ML framework. Such integration is crucial because it allows for the inclusion of a diverse set of inputs, reflecting the multifactorial nature of thyroid disease. ML algorithms, particularly those suited for high-dimensional data, excel at recognizing complex, nonlinear relationships among variables, providing a more comprehensive and potentially more accurate diagnostic tool than traditional methods. The model’s use of routinely collected clinical data enhances its accessibility and cost-effectiveness, making it a practical alternative to more resource-intensive diagnostic approaches [23].

The performance of the ML models developed in this study was evaluated across several metrics, including accuracy, precision, recall, F1-score, sensitivity, specificity, and AUC-ROC. The XGBoost classifier performed the best among the five models tested, achieving an accuracy of 84.4%, precision of 85.3%, and an AUC-ROC of 0.881. These results indicate that the model distinguishes effectively between MNG and PTC. Its specificity (0.944) was high and its sensitivity (0.714) moderate, meaning that the model correctly classified nearly all benign cases as benign (specificity) while correctly identifying roughly seven in ten malignant cases (sensitivity).

In comparison, Random Forest also performed well, with an accuracy of 81.2% and the highest AUC-ROC of 0.919, though its sensitivity was lower than that of XGBoost. Logistic Regression, Support Vector Machine (SVM), and K-Nearest Neighbor (KNN) underperformed relative to these two models, with Logistic Regression showing a notable imbalance between sensitivity and specificity. The relatively lower performance of SVM and KNN suggests that more complex models like XGBoost and Random Forest are better suited for this classification task, likely due to their ability to handle nonlinear relationships and interactions among features. These findings align with previous studies where ML models, particularly ensemble methods like Random Forest and gradient boosting, have demonstrated robust performance in classifying thyroid nodules based on clinical and imaging data [24]. For instance, a study by Liu et al. (2022) developed an AI model with an AUC-ROC of 0.91 for distinguishing malignant thyroid nodules, further supporting the reliability of ML in this domain [25]. This consistency across studies suggests that machine learning has significant potential to enhance the accuracy and efficiency of thyroid nodule diagnosis.

Although fine-needle aspiration (FNA) remains the current gold standard for preoperative evaluation of thyroid nodules, its diagnostic accuracy is limited, particularly in indeterminate and non-diagnostic categories, with reported accuracy ranging from 65% to 85% in prior studies [26]. In our cohort, the XGBoost model achieved an accuracy of 84.4%, with a sensitivity of 0.714 and a specificity of 0.944, placing its discriminatory performance at the upper end of the range reported for conventional FNA. Importantly, the proposed ML framework is not intended to replace FNA but to serve as a complementary tool, providing additional risk stratification in equivocal cases and potentially guiding the need for further molecular testing or surgical intervention.

A key aspect of this study is the identification of the most important features contributing to the model's performance. Feature importance analysis revealed that the FNA result, nodule size, and TSH levels were the most influential predictors of thyroid pathology. These findings are consistent with clinical knowledge, where cytological features (such as FNA results) and nodule characteristics (such as size) are crucial for evaluating the risk of malignancy. TSH levels, a well-established biomarker for thyroid function, further support the biological relevance of these features in distinguishing benign from malignant nodules [27]. In contrast, demographic and anatomical factors such as sex and the presence of contralateral nodules had minimal impact on model performance. This is important because it demonstrates that the model prioritizes clinically meaningful features while minimizing the influence of less relevant variables. These results suggest that the model's decision-making process closely mirrors clinical reasoning, in which FNA and nodule characteristics are paramount and demographic factors play a secondary role.

The clinical implications of this study are potentially substantial. First, the ability to preoperatively distinguish between MNG and PTC can significantly influence treatment decisions. By identifying patients at higher risk for malignancy, clinicians can make more informed decisions regarding the need for surgery, the extent of the procedure, and postoperative care. For instance, benign nodules may warrant conservative monitoring, while malignant nodules may require more aggressive interventions, including total thyroidectomy and lymph node dissection. Furthermore, reducing unnecessary surgeries and biopsies can alleviate patient anxiety and reduce healthcare costs [24]. As the model utilizes routine clinical data that are already collected as part of standard practice, its implementation would not require significant changes to current clinical workflows, making it a feasible tool for widespread adoption. This approach could also enhance patient care by providing more personalized management based on individual risk profiles, thus improving overall healthcare efficiency.

Beyond its methodological contributions, the clinical applicability of our model is noteworthy. Machine learning–based decision support tools such as ours could complement existing diagnostic pathways by integrating widely available preoperative data to improve risk stratification. In practice, this may help reduce unnecessary surgeries and repeat biopsies in patients with low-risk nodules, while identifying those with higher malignancy risk who would benefit from more aggressive intervention. Importantly, unlike molecular testing or advanced radiomics, which may be costly and unavailable in many regions, our approach relies on routine laboratory values, sonographic findings, and cytology reports, making it broadly accessible and cost-effective. Thus, integration of ML models into preoperative workflows has the potential to enhance personalized care, improve efficiency, and support clinical decision-making, particularly in resource-limited settings.

While the model demonstrates promising results, there are several limitations to consider. The retrospective design and inclusion of only patients who underwent thyroidectomy introduce a potential selection bias, as this cohort represents individuals with nodules already deemed suspicious or indeterminate for malignancy. Consequently, the findings may not fully generalize to the broader population of thyroid nodules encountered in general clinical practice. Furthermore, the reliance on data from a limited number of tertiary centers may restrict external applicability. To enhance the robustness and generalizability of the model, future studies should validate its performance in diverse, multicenter cohorts, ideally including unselected patients presenting with thyroid nodules. The model could also be further improved by integrating additional data modalities such as genomic profiles, advanced imaging-derived features, and patient-reported outcomes, which may provide a more comprehensive understanding of thyroid pathology. Future research should particularly emphasize validating model performance in preoperative triage of indeterminate nodules and exploring real-world clinical integration through AI-assisted diagnostic workflows.

Conclusion

This study highlights the feasibility and efficacy of using machine learning models based on routine preoperative clinical and laboratory data to differentiate between benign and malignant thyroid nodules. The integration of such models into clinical practice could substantially improve preoperative stratification, leading to more personalized and efficient patient care. As healthcare systems increasingly embrace digital tools, AI-driven models hold the potential to significantly improve diagnostic accuracy, reduce unnecessary procedures, and ultimately enhance patient outcomes. Further research and validation are essential to refine these models and ensure their widespread applicability in diverse healthcare settings.

Acknowledgements

Not applicable.

Abbreviations

AI

Artificial Intelligence

AUC-ROC

Area Under the Receiver Operating Characteristic Curve

AUS

Atypia of Undetermined Significance

FNA

Fine-Needle Aspiration

Hb

Hemoglobin

IRB

Institutional Review Board

KNN

K-Nearest Neighbor

LP

Lymphocyte Percentage

MNG

Multinodular Goiter

ML

Machine Learning

MPV

Mean Platelet Volume

NC

Neutrophil Count

NP

Neutrophil Percentage

NTL Ratio

Neutrophil-to-Lymphocyte Ratio

PDW

Platelet Distribution Width

PLT

Platelet Count

PTC

Papillary Thyroid Carcinoma

RDW

Red Cell Distribution Width

SVM

Support Vector Machine

TSH

Thyroid-Stimulating Hormone

TT3

Total Triiodothyronine

TT4

Total Thyroxine

WBC

White Blood Cell Count

XGBoost

Extreme Gradient Boosting

Author contributions

SGK collected and curated the patient data from participating centers and contributed to data preprocessing and statistical analysis. MN performed the machine learning modeling, including algorithm selection, training, and validation, and contributed to data visualization and interpretation. AR supervised the study design, provided clinical oversight and domain expertise in thyroid pathology, and was a major contributor in writing and editing the manuscript. All authors read and approved the final manuscript.

Funding

This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

Data availability

The datasets generated and analyzed during the current study are not publicly available but are available from the corresponding author upon reasonable request.

Code availability

The underlying code used for model development, training, and evaluation in this study is not publicly available but may be made available to qualified researchers upon reasonable request from the corresponding author.

Declarations

Ethics approval and consent to participate

This study was conducted in accordance with the Declaration of Helsinki and approved by the Ethics Committees of Firoozgar Hospital and Rasoul Akram Hospital (institutional review board [IRB] number 20-169-2024). The study utilized de-identified retrospective data, and informed consent was waived due to the non-interventional nature of the research. All data handling procedures adhered to institutional and national regulations on data privacy and patient confidentiality.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  • 1. Durante C, Grani G, Lamartina L, Filetti S, Mandel SJ, Cooper DS. The diagnosis and management of thyroid nodules: a review. JAMA. 2018;319(9):914–24.
  • 2. Marotta V, Bifulco M, Vitale M. Significance of RAS mutations in thyroid benign nodules and non-medullary thyroid cancer. Cancers. 2021;13(15):3785.
  • 3. Alexander EK, Doherty GM, Barletta JA. Management of thyroid nodules. Lancet Diabetes Endocrinol. 2022;10(7):540–8.
  • 4. Grani G, Sponziello M, Filetti S, Durante C. Thyroid nodules: diagnosis and management. Nat Rev Endocrinol. 2024;20(12):715–28.
  • 5. Khan L, Khan I, Jogezai AK, Magsi SS, Khan N, Riaz J. Frequency of thyroid carcinoma in multi nodular goiter. Pakistan J Med Health Sci. 2023;17(02):237.
  • 6. Fresilli D, David E, Pacini P, Del Gaudio G, Dolcetti V, Lucarelli GT, et al. Thyroid nodule characterization: how to assess the malignancy risk. Update of the literature. Diagnostics. 2021;11(8):1374.
  • 7. Yazdaan HE, Jaya F, Sanjna F, Junaid M, Rasool S, Baig A, et al. Advances in thyroid function tests: precision diagnostics and clinical implications. Cureus. 2023;15(11).
  • 8. Soh SB, Aw TC. Laboratory testing in thyroid conditions - pitfalls and clinical utility. Ann Lab Med. 2019;39(1):3–14.
  • 9. Toro-Tobon D, Loor-Torres R, Duran M, Fan JW, Singh Ospina N, Wu Y, et al. Artificial intelligence in thyroidology: a narrative review of the current applications, associated challenges, and future directions. Thyroid. 2023;33(8):903–17.
  • 10. Habchi Y, Himeur Y, Kheddar H, Boukabou A, Atalla S, Chouchane A, et al. AI in thyroid cancer diagnosis: techniques, trends, and future directions. Systems. 2023;11(10):519.
  • 11. Tong W-J, Wu S-H, Cheng M-Q, Huang H, Liang J-Y, Li C-Q, et al. Integration of artificial intelligence decision aids to reduce workload and enhance efficiency in thyroid nodule management. JAMA Netw Open. 2023;6(5):e2313674.
  • 12. Brydges G, Uppal A, Gottumukkala V. Application of machine learning in predicting perioperative outcomes in patients with cancer: a narrative review for clinicians. Curr Oncol. 2024;31(5):2727–47.
  • 13. Hays P. Artificial intelligence in cytopathological applications for cancer: a review of accuracy and analytic validity. Eur J Med Res. 2024;29(1):553.
  • 14. Buda M, Wildman-Tobriner B, Hoang JK, Thayer D, Tessler FN, Middleton WD, et al. Management of thyroid nodules seen on US images: deep learning may match performance of radiologists. Radiology. 2019;292(3):695–701.
  • 15. Singh S, Mohajer B, Wells SA, Garg T, Hanneman K, Takahashi T, et al. Imaging genomics and multiomics: a guide for beginners starting radiomics-based research. Acad Radiol. 2024;31(6):2281–91.
  • 16. Sorrenti S, Dolcetti V, Radzina M, Bellini MI, Frezza F, Munir K, et al. Artificial intelligence for thyroid nodule characterization: where are we standing? Cancers. 2022;14(14):3357.
  • 17. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: machine learning in Python. J Mach Learn Res. 2011;12:2825–30.
  • 18. Chen T, Guestrin C. XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM; 2016.
  • 19. Harris CR, Millman KJ, van der Walt SJ, Gommers R, Virtanen P, Cournapeau D, et al. Array programming with NumPy. Nature. 2020;585:357–62.
  • 20.Hunter JD, Matplotlib. A 2D graphics environment. Comput Sci Eng. 2007;9(3):90–5. [Google Scholar]
  • 21.Ahmed S, Johnson PT, Horton KM, Lai H, Zaheer A, Tsai S, et al. Prevalence of unsuspected thyroid nodules in adults on contrast enhanced 16- and 64-MDCT of the chest. World J Radiol. 2012;4(7):311–7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Dharampal N, Smith K, Harvey A, Paschke R, Rudmik L, Chandarana S. Cost-effectiveness analysis of molecular testing for cytologically indeterminate thyroid nodules. J Otolaryngol Head Neck Surg. 2022;51(1):46. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Thottakkara P, Ozrazgat-Baslanti T, Hupf BB, Rashidi P, Pardalos P, Momcilovic P, et al. Application of machine learning techniques to High-Dimensional clinical data to forecast postoperative complications. PLoS ONE. 2016;11(5):e0155705. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Xiong Z, Shi Y, Zhang Y, Duan S, Ding Y, Zheng Q, et al. Ultrasound radiomics based XGBoost model to differential diagnosis thyroid nodules and unnecessary biopsy rate: individual application of SHapley additive explanations. J Clin Ultrasound. 2024;52(3):305–14. [DOI] [PubMed] [Google Scholar]
  • 25.Gu J, Xie R, Zhao Y, Zhao Z, Xu D, Ding M, et al. A machine learning-based approach to predicting the malignant and metastasis of thyroid cancer. Front Oncol. 2022;12:938292. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Bongiovanni M, Spitale A, Faquin WC, Mazzucchelli L, Baloch ZW. The Bethesda system for reporting thyroid cytopathology: a meta-analysis. Acta Cytol. 2012;56(4):333–9. [DOI] [PubMed] [Google Scholar]
  • 27.Brito JP, Singh-Ospina N, Gionfriddo MR, Maraka S, De Espinosa A, Rodriguez-Gutierrez R, et al. Restricting ultrasound thyroid fine needle aspiration biopsy by nodule size: which tumors are we missing? A population-based study. Endocrine. 2016;51(3):499–505. [DOI] [PMC free article] [PubMed] [Google Scholar]

Data Availability Statement

The datasets generated and analyzed during the current study are not publicly available but can be obtained from the corresponding author upon reasonable request.

The code used for model development, training, and evaluation in this study is not publicly available but may be shared with qualified researchers upon reasonable request to the corresponding author.


Articles from BMC Endocrine Disorders are provided here courtesy of BMC

RESOURCES