Abstract
Artificial intelligence (AI) models in thyroid cancer are often not generalizable due to biased or non‐representative training and validation datasets. We propose a four‐stage roadmap emphasizing representative data collection, inclusion of contextual variables, robust training/validation design, and post‐deployment monitoring to improve fairness and clinical utility.

Keywords: artificial intelligence, data representation, equity, model training, thyroid cancer
1. Introduction
Artificial intelligence (AI) models are increasingly being developed and studied in thyroid cancer care, with applications proposed for diagnosis, risk stratification, treatment planning, and prognosis [1, 2, 3]. However, most of these models remain at the research stage, with limited integration into routine clinical practice [4, 5, 6]. Thyroid cancer incidence and progression are influenced by a spectrum of demographic factors, tumor biology, comorbidities, and environmental exposures [7, 8, 9]. Models trained on datasets that do not capture this variation are therefore limited in both clinical utility and fairness.
2. Discussion
To address the limitations identified across 173 studies of existing AI models for thyroid cancer, a roadmap was developed to outline key considerations across the model development lifecycle (Figure 1) [10]. This framework synthesizes sources of bias observed in the literature and offers a sequential approach to improving representativeness, clinical relevance, and long‐term performance. Each component reflects a stage where unaddressed gaps can translate into downstream errors or inequitable care.
FIGURE 1. Roadmap of factors influencing AI model performance in thyroid cancer. The model development process spans data collection, model training, and deployment. Key considerations include demographic and geographic representation, socioeconomic and environmental variables, clinical characteristics, comorbidities, and careful training–validation set design to reduce bias and concept drift. [Color figure can be viewed in the online issue, which is available at www.laryngoscope.com]
The roadmap begins with epidemiologically representative data collection, which serves as the foundation of model development. Data inputs should reflect the variable geographic distribution of thyroid cancer, including differences in incidence, genetics, and risk factors by age, sex, and ethnicity. For example, BRAF V600E mutations are more common in East Asian populations, while Hispanic and Indigenous patients often present with delayed diagnoses due to socioeconomic and access‐related barriers [11, 12]. Without representative sampling at this stage, models risk poor performance in populations most affected by the disease.
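One way to make the representativeness check at this stage concrete is to compare each subgroup's share of the study cohort with its share of a reference population. The sketch below is illustrative only: the function names, the group labels, and the 0.8 flagging threshold are assumptions introduced here, not part of the roadmap, and the reference shares would need to come from an appropriate epidemiologic source.

```python
from typing import Dict, List


def representation_ratios(cohort_counts: Dict[str, int],
                          reference_shares: Dict[str, float]) -> Dict[str, float]:
    """Return observed cohort share divided by reference share for each group.

    A ratio near 1.0 means the group appears in the cohort about as often as
    in the reference population; a ratio well below 1.0 suggests the group
    is underrepresented.
    """
    total = sum(cohort_counts.values())
    if total == 0:
        raise ValueError("cohort is empty")
    ratios = {}
    for group, ref_share in reference_shares.items():
        if ref_share <= 0:
            raise ValueError(f"reference share for {group!r} must be positive")
        observed_share = cohort_counts.get(group, 0) / total
        ratios[group] = observed_share / ref_share
    return ratios


def underrepresented(ratios: Dict[str, float], threshold: float = 0.8) -> List[str]:
    """List groups whose ratio falls below a (hypothetical) threshold."""
    return sorted(group for group, ratio in ratios.items() if ratio < threshold)


if __name__ == "__main__":
    # Hypothetical counts and reference shares, for illustration only.
    counts = {"group_a": 700, "group_b": 250, "group_c": 50}
    reference = {"group_a": 0.5, "group_b": 0.3, "group_c": 0.2}
    ratios = representation_ratios(counts, reference)
    print(ratios)
    print(underrepresented(ratios))
```

The same comparison can be repeated for age bands, sex, or region; a single summary ratio per group is a coarse screen and does not replace a formal sampling plan.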
Following this, inclusion of patient, tumor, and clinical variables ensures that models are grounded in real‐world care. Frequently reported features, such as age, sex, and tumor size, are important but insufficient in isolation. Variables like comorbidities, environmental exposures, and socioeconomic position influence disease trajectory, diagnostic timing, and access to definitive treatment [13]. Despite their known impact, these factors are rarely integrated into model design. Their exclusion can lead to algorithms that miss subtle but clinically important patterns, especially in underserved or underrepresented subgroups.
The third stage emphasizes training and validation set design. Ensuring balanced sample sizes and demographic diversity across both sets is critical for model generalizability. Many studies reuse the same dataset for both model training and validation, which increases the risk of overfitting and masks performance gaps [14, 15, 16]. Additionally, selection bias is frequently introduced when only high‐confidence or surgically confirmed nodules are included, excluding equivocal lesions that represent a common diagnostic challenge in practice [17, 18, 19]. Without diversity in the training and testing cohorts, models may produce inflated performance metrics.
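The two design points above, a validation set that is disjoint from the training set and performance that is reported per subgroup, can be sketched in a few lines. This is a minimal illustration under stated assumptions: records are plain dictionaries, the field names `group`, `label`, and `pred` are hypothetical, and each record is assumed to be a distinct patient (repeated nodules from one patient would need to be split at the patient level instead).

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def stratified_split(records: List[dict], key: str,
                     val_fraction: float = 0.2,
                     seed: int = 0) -> Tuple[List[dict], List[dict]]:
    """Split records into disjoint training and validation lists.

    Records are shuffled within each stratum (value of `key`) so that both
    sets keep roughly the same subgroup proportions as the full dataset.
    """
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for rec in records:
        by_stratum[rec[key]].append(rec)

    train, val = [], []
    for stratum in sorted(by_stratum):
        items = by_stratum[stratum][:]
        rng.shuffle(items)
        # Keep at least one validation record per stratum when possible.
        n_val = max(1, round(len(items) * val_fraction)) if len(items) > 1 else 0
        val.extend(items[:n_val])
        train.extend(items[n_val:])
    return train, val


def sensitivity_by_group(records: List[dict], key: str) -> Dict[str, Optional[float]]:
    """Sensitivity (recall on label == 1) per subgroup; None if no positives."""
    true_pos = defaultdict(int)
    positives = defaultdict(int)
    groups = set()
    for rec in records:
        groups.add(rec[key])
        if rec["label"] == 1:
            positives[rec[key]] += 1
            if rec["pred"] == 1:
                true_pos[rec[key]] += 1
    return {g: (true_pos[g] / positives[g] if positives[g] else None)
            for g in groups}
```

Reporting sensitivity for each subgroup on the held‐out set, rather than one pooled figure, is what exposes the performance gaps that a reused or homogeneous dataset would mask.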
The final stage addresses deployment and concept drift, a dynamic risk that arises when a model is applied in evolving clinical environments. Changes in disease epidemiology, diagnostic technology, and clinical guidelines can all reduce model accuracy over time. Without ongoing monitoring and periodic recalibration, AI tools may continue to make outdated or inappropriate predictions. This is particularly concerning in thyroid cancer care, where small shifts in risk classification can lead to substantial changes in management decisions, including unnecessary surgery or missed malignancy.
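Post‐deployment monitoring can take many forms; one simple option, shown here purely as an illustration and not as a method prescribed by the roadmap, is to compare the distribution of predicted risk scores at deployment with the distribution seen at validation using the Population Stability Index (PSI). The function name, the ten equal‐width bins, and the smoothing constant are assumptions of this sketch, and scores are assumed to be probabilities in [0, 1].

```python
import math
from typing import List, Sequence


def population_stability_index(expected: Sequence[float],
                               actual: Sequence[float],
                               n_bins: int = 10,
                               eps: float = 1e-6) -> float:
    """PSI between a baseline and a current sample of scores in [0, 1].

    PSI = sum over bins of (a_i - e_i) * ln(a_i / e_i), where e_i and a_i
    are the shares of baseline and current scores falling in bin i.
    A value of 0 means the binned distributions are identical.
    """
    if not expected or not actual:
        raise ValueError("both samples must be non-empty")

    def bin_shares(scores: Sequence[float]) -> List[float]:
        counts = [0] * n_bins
        for s in scores:
            if not 0.0 <= s <= 1.0:
                raise ValueError("scores must lie in [0, 1]")
            # A score of exactly 1.0 is placed in the last bin.
            counts[min(int(s * n_bins), n_bins - 1)] += 1
        # Clamp empty bins to eps so the logarithm stays defined.
        return [max(c / len(scores), eps) for c in counts]

    e = bin_shares(expected)
    a = bin_shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))
```

A rising PSI only signals that the input or score distribution has moved; it does not by itself show that accuracy has fallen, so it is best paired with periodic checks of calibration and subgroup performance against confirmed outcomes. Any alert threshold would need to be chosen and justified locally.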
Each element of the roadmap contributes to building stronger, more equitable, and generalizable AI models. While no single stage is sufficient on its own, each plays a critical role in minimizing bias and improving performance across diverse populations. When data representativeness, variable inclusion, validation, and monitoring are thoughtfully integrated, the result is a more trustworthy and clinically useful model. Conversely, neglecting any one of these components can compromise model accuracy, limit generalizability, and risk reinforcing existing health disparities.
While existing AI reporting guidelines, such as CONSORT‐AI and TRIPOD‐AI, promote transparency and methodological rigor, they do not address the full development lifecycle or explicitly incorporate equity considerations, particularly in the context of thyroid cancer. This roadmap aims to fill that gap by providing a condition‐specific framework focused on bias mitigation and real‐world relevance. However, the roadmap remains a conceptual tool: it is grounded in findings from systematic reviews and has not been prospectively validated. Implementation may be limited by institutional data constraints, lack of harmonized demographic variables, and restricted access to multiethnic datasets. Inclusion of contextual factors such as environmental exposure and socioeconomic status often requires external data linkage, raising privacy and feasibility concerns. Additionally, adaptation may be necessary for other disease contexts with differing epidemiologic or clinical features.
3. Conclusion
Developing clinically useful AI models for thyroid cancer requires attention to data representativeness, variable selection, validation design, and ongoing monitoring. This roadmap provides a structured framework to support models that are both accurate and equitable. By addressing bias across each stage of development, it strengthens clinical relevance and generalizability. As AI applications advance toward clinical translation in thyroid cancer care, applying these principles will be essential to ensure models remain reliable and effective in real‐world practice.
Ethics Statement
The authors have nothing to report.
Conflicts of Interest
The authors declare no conflicts of interest.
Ramchandani R., Guo E., Biglou S. G., et al., “Roadmap for Representative Artificial Intelligence Models for Thyroid Cancer,” The Laryngoscope 136, no. 2 (2026): 528–530, 10.1002/lary.70169.
Funding: The authors received no specific funding for this work.
Level of Evidence: N/A
Data Availability Statement
Data sharing not applicable to this article as no datasets were generated or analysed during the current study.
References
1. Mittermaier M., Raza M. M., and Kvedar J. C., “Bias in AI‐Based Models for Medical Applications: Challenges and Mitigation Strategies,” npj Digital Medicine 6, no. 1 (2023): 113, 10.1038/s41746-023-00858-z.
2. Pizzato M., Li M., Vignat J., et al., “The Epidemiological Landscape of Thyroid Cancer Worldwide: GLOBOCAN Estimates for Incidence and Mortality Rates in 2020,” Lancet Diabetes and Endocrinology 10, no. 4 (2022): 264–272, 10.1016/s2213-8587(22)00035-3.
3. Shank J. B., Are C., and Wenos C. D., “Thyroid Cancer: Global Burden and Trends,” Indian Journal of Surgical Oncology 13, no. 1 (2022): 40–45, 10.1007/s13193-021-01429-y.
4. Patel R. H., Foltz E. A., Witkowski A., and Ludzik J., “Analysis of Artificial Intelligence‐Based Approaches Applied to Non‐Invasive Imaging for Early Detection of Melanoma: A Systematic Review,” Cancers 15, no. 19 (2023): 4694, 10.3390/cancers15194694.
5. Melarkode N., Srinivasan K., Qaisar S. M., and Plawiak P., “AI‐Powered Diagnosis of Skin Cancer: A Contemporary Review, Open Challenges and Future Research Directions,” Cancers 15, no. 4 (2023): 1183, 10.3390/cancers15041183.
6. Guo L. N., Lee M. S., Kassamali B., Mita C., and Nambudiri V. E., “Bias in, Bias out: Underreporting and Underrepresentation of Diverse Skin Types in Machine Learning Research for Skin Cancer Detection—A Scoping Review,” Journal of the American Academy of Dermatology 87, no. 1 (2022): 157–159, 10.1016/j.jaad.2021.06.884.
7. LeClair K., Bell K. J. L., Furuya‐Kanamori L., Doi S. A., Francis D. O., and Davies L., “Evaluation of Gender Inequity in Thyroid Cancer Diagnosis: Differences by Sex in US Thyroid Cancer Incidence Compared With a Meta‐Analysis of Subclinical Thyroid Cancer Rates at Autopsy,” JAMA Internal Medicine 181, no. 10 (2021): 1351–1358, 10.1001/jamainternmed.2021.4804.
8. Moleti M., Sturniolo G., Di Mauro M., Russo M., and Vermiglio F., “Female Reproductive Factors and Differentiated Thyroid Cancer,” Frontiers in Endocrinology 8 (2017): 8, 10.3389/fendo.2017.00111.
9. Magreni A., Bann D. V., Schubart J. R., and Goldenberg D., “The Effects of Race and Ethnicity on Thyroid Cancer Incidence,” JAMA Otolaryngology. Head & Neck Surgery 141, no. 4 (2015): 319–323, 10.1001/jamaoto.2014.3740.
10. Ramchandani R., Guo E., Biglou S., et al., “Representation and Bias in Artificial Intelligence Models for Thyroid Cancer: A Systematic Review,” Thyroid, ahead of print, August 28, 2025, 10.1177/10507256251372175.
11. McTiernan A. M., Weiss N. S., and Daling J. R., “Incidence of Thyroid Cancer in Women in Relation to Reproductive and Hormonal Factors,” American Journal of Epidemiology 120, no. 3 (1984): 423–435, 10.1093/oxfordjournals.aje.a113907.
12. Rahbari R., Zhang L., and Kebebew E., “Thyroid Cancer Gender Disparity,” Future Oncology 6, no. 11 (2010): 1771–1779, 10.2217/fon.10.127.
13. Lee Y. K., Hong N., Park S. H., et al., “The Relationship of Comorbidities to Mortality and Cause of Death in Patients With Differentiated Thyroid Carcinoma,” Scientific Reports 9, no. 1 (2019): 11435, 10.1038/s41598-019-47898-8.
14. Dai F., Yao S., Wang M., et al., “Improving AI Models for Rare Thyroid Cancer Subtype by Text Guided Diffusion Models,” Nature Communications 16, no. 1 (2025): 4449, 10.1038/s41467-025-59478-8.
15. Hanani A. A., Donmez T. B., Kutlu M., and Mansour M., “Predicting Thyroid Cancer Recurrence Using Supervised CatBoost: A SHAP‐Based Explainable AI Approach,” Medicine (Baltimore) 104, no. 22 (2025): e42667, 10.1097/MD.0000000000042667.
16. Moon G., Park J. H., Lee T., and Yoon J. H., “A Machine Learning‐Based Model for Preoperative Assessment and Malignancy Prediction in Patients With Atypia of Undetermined Significance Thyroid Nodules,” Journal of Clinical Medicine 13, no. 24 (2024): 7769, 10.3390/jcm13247769.
17. Kotecka‐Blicharz A., Pfeifer A., Czarniecka A., et al., “Thyroid Nodules With Indeterminate Cytopathology: A Constant Challenge in Everyday Practice. The Effectiveness of Clinical Decisions Using Diagnostic Tools Available in Poland,” Polskie Archiwum Medycyny Wewnętrznej 131, no. 12 (2021), October 11, 10.20452/pamw.16117.
18. Chambara N. and Ying M., “The Diagnostic Efficiency of Ultrasound Computer–Aided Diagnosis in Differentiating Thyroid Nodules: A Systematic Review and Narrative Synthesis,” Cancers 11, no. 11 (2019): 1759, 10.3390/cancers11111759.
19. McQueen A. S. and Bhatia K. S. S., “Thyroid Nodule Ultrasound: Technical Advances and Future Horizons,” Insights Into Imaging 6, no. 2 (2015): 173–188, 10.1007/s13244-015-0398-9.
