[Preprint]. 2023 Oct 16:arXiv:2310.10867v1. [Version 1]

Evolving Horizons in Radiotherapy Auto-Contouring: Distilling Insights, Embracing Data-Centric Frameworks, and Moving Beyond Geometric Quantification

Kareem A Wahid 1,2, Carlos E Cardenas 3, Barbara Marquez 4,5, Tucker J Netherton 5, Benjamin H Kann 6, Laurence E Court 5, Renjie He 2, Mohamed A Naser 2, Amy C Moreno 2, Clifton D Fuller 2, David Fuentes 1,*
PMCID: PMC10614971  PMID: 37904737

Introduction

Historically, clinician-derived contouring of tumors and healthy tissues has been crucial for radiotherapy (RT) planning. In recent years, advances in artificial intelligence (AI), predominantly in deep learning (DL), have rapidly improved automated contouring for RT applications, particularly for routine organs-at-risk1–3. Despite research efforts actively promoting its broader acceptance, clinical adoption of auto-contouring is not yet standard practice.

Notably, within several AI communities, there has been growing enthusiasm to shift from conventional “model-centric” AI approaches (i.e., improving a model while keeping the data fixed) to “data-centric” AI approaches (i.e., improving the data while keeping a model fixed)4. Although balancing both approaches is typically ideal for crafting the optimal solution for specific use cases, most research in RT auto-contouring has prioritized algorithmic modifications aimed at enhancing quantitative contouring performance based on geometric (i.e., structural overlap) indices5, a clear testament to the “model-centric” AI paradigm.

In this editorial, aimed at clinician end-users and multidisciplinary research teams, we distill key insights from contemporary RT auto-contouring algorithmic development to motivate the adoption of data-centric AI frameworks for future research directions that could further facilitate clinical adoption. Of note, the discussion herein draws primarily from literature related to head and neck cancer (HNC), showcasing it as a representative example of a complex disease site. However, these insights apply broadly to auto-contouring across disease sites.

Insight 1: DL auto-contouring algorithms require high-quality training data

The adage “garbage in, garbage out” is often used to describe the importance of providing computational algorithms with high-quality data (i.e., “ground truth”). One particular challenge for RT contouring applications is the absence of a definitive ground truth. In contouring research, ground truth typically refers to a structure delineated by a clinician, preferably with expertise in the relevant disease site. Ideally, this structure should show minimal differences if another expert were to contour it independently (i.e., low interobserver variability), given that the observers desire the same clinical endpoint. Despite increasing guideline recommendations over time6, some structures, such as target volumes, are inherently more subjective than others due to clinical factors and institutional preferences. Notably, the precise definition of ground truth in contouring is debated, as multiple clinically acceptable solutions for a single structure may exist5,7. Building on this context, a tangible manifestation of the “garbage in, garbage out” principle within HNC contouring is exemplified in a study by Henderson et al.8. Their findings revealed that models trained on a small set of consistent contours (i.e., strictly following guidelines) aligned more closely with the ground truth test data than those trained on a vast array of inconsistent contours (Figure 1). This underscores the critical role of consistent, high-quality contours for successful DL auto-contouring training.

Figure 1.

A deep learning model trained with a few highly consistent, i.e., high-quality, contours (green) was more closely aligned to the ground truth test data than a model trained with many inconsistent contours (red) for various head and neck cancer radiotherapy structures. The 95% Hausdorff distance (HD95) (a) and mean distance to agreement (mDTA) (b) were used as geometric performance quantification metrics. Lower values for both metrics indicate better performance. Reprinted from Henderson et al.8.
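For readers who want a concrete sense of how these geometric indices are computed, the following minimal Python sketch (an illustration assuming NumPy and SciPy with non-empty 3D binary masks as inputs, not the implementation used in the cited study) computes the volumetric Dice similarity coefficient, a pooled-surface HD95 variant, and mDTA:

import numpy as np
from scipy.ndimage import binary_erosion, distance_transform_edt

def surface_distances(a, b, spacing=(1.0, 1.0, 1.0)):
    # Surface voxels are the mask minus its erosion.
    a, b = a.astype(bool), b.astype(bool)
    surf_a = a & ~binary_erosion(a)
    surf_b = b & ~binary_erosion(b)
    # Distance maps to each surface, in physical units via voxel spacing.
    dist_to_b = distance_transform_edt(~surf_b, sampling=spacing)
    dist_to_a = distance_transform_edt(~surf_a, sampling=spacing)
    # Pool distances in both directions (a -> b and b -> a).
    return np.concatenate([dist_to_b[surf_a], dist_to_a[surf_b]])

def dice(a, b):
    a, b = a.astype(bool), b.astype(bool)
    return 2.0 * (a & b).sum() / (a.sum() + b.sum())

def hd95(a, b, spacing=(1.0, 1.0, 1.0)):
    return np.percentile(surface_distances(a, b, spacing), 95)

def mdta(a, b, spacing=(1.0, 1.0, 1.0)):
    return surface_distances(a, b, spacing).mean()

Taking the 95th percentile rather than the maximum surface distance makes HD95 robust to a handful of outlier voxels, which is why it is often preferred over the classic Hausdorff distance in contouring evaluation.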

Curating high-quality ground-truth contouring data is costly in terms of dedicated clinician effort. Expert clinicians must meticulously contour structures by hand and, when applicable, carefully consider existing guidelines to reduce interobserver variability. Consensus contouring fusion methods, such as the Simultaneous Truth and Performance Level Estimation (STAPLE) algorithm, allow potentially suboptimal contours (e.g., those deviating from guidelines) to be combined into an improved overall contour structure. Recent work by Lin et al.9 investigated consensus methods across various RT disease sites using an unprecedented number of physician observers and revealed that as few as two to five non-expert contours can approximate expert gold-standard geometric benchmarks (Figure 2). Conceivably, these consensus inputs could be cost-effective alternatives to expert-derived ground truth for DL auto-contouring training. In other words, institutions without access to established experts may still be able to produce high-quality data for algorithmic development.

Figure 2.

Consensus from a limited number of non-expert contours can approximate expert benchmarks. Specific plot is shown for the left parotid gland in a head and neck cancer case using the volumetric Dice similarity coefficient (DSC) as a performance quantification metric. The Simultaneous Truth and Performance Level Estimation (STAPLE) algorithm was used to generate consensus contours. To explore consensus quality dynamics based on the number of non-expert inputs, bootstrap resampling selected random non-expert subsets with replacement to form consensus contours, which were then compared to expert consensus. Each dot represents the median from 100 bootstrap iterations with a 95% confidence interval (shaded area). The black dotted line indicates the median expert DSC interobserver variability (IOV). The gray dotted line indicates DSC performance for the maximum number of non-experts used in the consensus. For this example, three to four non-experts can approximate expert IOV benchmarks. As the number of non-experts in the consensus contour increases, performance generally improves before plateauing. Adapted from Lin et al.9.
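To make the caption’s bootstrap methodology concrete, here is a self-contained Python sketch under stated assumptions: the data are synthetic, and a simple majority vote stands in for STAPLE, which additionally estimates per-observer performance probabilistically.

import numpy as np

rng = np.random.default_rng(42)

def majority_vote(masks):
    # Consensus: a voxel is foreground if more than half of the observers agree.
    return np.mean(masks, axis=0) > 0.5

def dice(a, b):
    a, b = a.astype(bool), b.astype(bool)
    return 2.0 * (a & b).sum() / (a.sum() + b.sum())

def bootstrap_consensus_dice(nonexpert_masks, expert_consensus, n_observers, n_iter=100):
    # Median DSC (with 95% CI) of consensus contours built from random
    # subsets of n_observers non-experts, sampled with replacement.
    scores = []
    for _ in range(n_iter):
        idx = rng.choice(len(nonexpert_masks), size=n_observers, replace=True)
        scores.append(dice(majority_vote(nonexpert_masks[idx]), expert_consensus))
    lo, med, hi = np.percentile(scores, [2.5, 50, 97.5])
    return med, (lo, hi)

# Hypothetical example: 20 noisy non-expert masks around a synthetic structure.
truth = np.zeros((32, 32, 32), dtype=bool)
truth[8:24, 8:24, 8:24] = True
nonexperts = np.array([truth ^ (rng.random(truth.shape) < 0.05) for _ in range(20)])
for k in (2, 3, 5, 10):
    med, ci = bootstrap_consensus_dice(nonexperts, truth, k)
    print(f"{k} non-experts: median DSC={med:.3f}, 95% CI=({ci[0]:.3f}, {ci[1]:.3f})")

As in Figure 2, the consensus DSC in this toy setup generally improves and then plateaus as more observers are added.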

Insight 2: DL auto-contouring models exhibit reasonable quantitative performance with limited data

While natural images (e.g., photographs) are abundant and simple to annotate, medical image contouring data are significantly limited. This has constrained DL contouring research in the medical image domain to much smaller training set sizes than those of its natural image counterparts. Nonetheless, DL auto-contouring models seemingly perform quite well in terms of geometric indices despite limited medical image training set sizes, assuming high-quality data. A study by Fang et al.10 highlighted this phenomenon by showing that most HNC organs-at-risk reach 95% of their maximum possible geometric performance with as few as 40 independent patient samples (Figure 3). Depending on context-specific use cases for certain structures, the appropriate sample sizes may be even smaller. Moreover, the study illustrated diminishing returns in quantitative performance with increasing training set size, noting that performance plateaus or even declines in some instances. Similarly, Yu et al.11 and Weissmann et al.12 demonstrated that small, well-curated datasets can be used to train publicly available models to achieve clinically acceptable results.

Figure 3.

Relatively small training sample sizes are needed to reach high geometric performance for deep learning auto-contouring models. The percentage of the volumetric Dice similarity coefficient (DSC) using different training sample sizes relative to the maximum DSC for individual contour structures is shown in different colors. Most organ-at-risk structures required ~40 patient samples to achieve 95% of the maximum possible performance; notably, lenses and optic nerves required 200 samples to achieve 95% of the maximum possible performance. Reprinted from Fang et al.10.

While DL models were historically labeled as “data hungry”, modern approaches now allow them to perform impressively well even with what might appear to be limited data. In auto-contouring, because training is fundamentally conducted at the scale of voxels, even modest patient populations can provide sufficient datasets for pattern learning. Notably, data-centric pre-processing strategies, such as cropping images to minimize the imbalance between “positive” and “negative” voxels before model training, further enhance this ability in auto-contouring13, as illustrated in the sketch below.
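As a concrete illustration of such a cropping strategy (the function, margin size, and data below are illustrative assumptions rather than a published pipeline), the following Python sketch crops an image/mask pair to the structure’s bounding box plus a safety margin, sharply increasing the fraction of “positive” voxels seen during training:

import numpy as np

def crop_to_structure(image, mask, margin_vox=16):
    # Bounding box of the structure, padded to retain anatomical context
    # and clipped to the volume boundaries.
    coords = np.argwhere(mask)
    lo = np.maximum(coords.min(axis=0) - margin_vox, 0)
    hi = np.minimum(coords.max(axis=0) + margin_vox + 1, mask.shape)
    slices = tuple(slice(l, h) for l, h in zip(lo, hi))
    return image[slices], mask[slices]

# Illustrative numbers: a 16-voxel-wide structure in a 256^3 CT volume.
image = np.random.randn(256, 256, 256).astype(np.float32)
mask = np.zeros(image.shape, dtype=bool)
mask[120:136, 100:116, 80:96] = True
image_c, mask_c = crop_to_structure(image, mask)
print("foreground fraction before:", mask.mean())    # ~0.0002
print("foreground fraction after: ", mask_c.mean())  # ~0.04

Note that at inference time the ground-truth mask is unavailable, so the same region of interest would have to come from elsewhere (e.g., a coarse first-stage model or fixed anatomical landmarks).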

Insight 3: Auto-contouring quantitative performance is saturating

The democratization of science, particularly through open-source tools and data, has justifiably become more prevalent over time. Much of this shift has also influenced the realm of radiotherapy research14 and, by extension, medical image contouring. This has allowed for an increasingly “level” playing field for researchers in terms of algorithmic development. Within contouring, a prime example of the benefits of open-science practices has been the increasing use of U-Net, an effective DL contouring architecture, through standard computational libraries. nnU-Net15, a self-configuring variant of the U-Net architecture, has unmistakably become a de facto standard for many medical image contouring projects. More recently, the publicly available Segment Anything Model, which has been benchmarked on medical imaging data16, has also yielded impressive results with minimal domain-specific training.

Over the past several years, medical image data challenges (i.e., public competitions) have been inundated with U-Net variants15. This surge has seemingly narrowed the gap between ‘state-of-the-art’ and ‘average’ participant performance. In RT contouring, the HECKTOR challenge17, a competition focused on HNC gross tumor volume contouring using PET/CT imaging, stands out as a prime example: state-of-the-art contouring performance has steadily plateaued after median performance crossed expert interobserver variability (Figure 4). Moreover, once a measure of human performance benchmarking has been exceeded (e.g., interobserver variability), the practical benefits of further improving geometric indices become somewhat ambiguous. For particularly noisy contouring targets like tumor volumes, where human agreement on what constitutes an “acceptable” contour is already low, the value of further geometric performance optimization merits reconsideration.

Figure 4.

HEad and neCK TumOR (HECKTOR) contouring performance saturation. Contouring performance measured by volumetric Dice similarity coefficient. Green and blue dots correspond to the top 10% and median tumor contouring performance measured across all participating teams, respectively. The gray dotted line corresponds to a clinician expert interobserver variability benchmark. Data derived from corresponding HECKTOR conference proceedings.

Future perspectives on auto-contouring

From the previous discussion points, it becomes increasingly clear that DL auto-contouring requires data that, perhaps contrary to popular belief, is surprisingly simple to curate. Moreover, given the open-source nature of state-of-the-art DL architectures, training these models is also seemingly straightforward. One could ostensibly collect a relatively small group of non-expert contours and generate consensus data to train a nnU-Net model that delivers reasonable geometric performance. So, is RT auto-contouring effectively a solved problem at this point? Though some facets of contemporary research seem to support this idea, there remain significant avenues of exploration before we can confidently say yes.

Most auto-contouring research has focused on geometric indices (e.g., volumetric Dice) as evaluation criteria5, likely because these indices are commonly embedded within model training schemes. While geometric indices can serve as intuitive adjuncts for roughly gauging clinical acceptability, they are not a panacea. Geometric indices have been found not to correlate strongly with dosimetric or clinical endpoints7,18,19, so their utility in RT is potentially limited. A growing number of studies have started incorporating clinician-derived qualitative scoring evaluations, which may be more closely linked to clinical usability, but these methods may be prone to human bias7. Nonetheless, model-centric AI approaches that chase increasingly diminishing returns in geometric performance by simply tweaking underlying DL architectures may not offer significant clinical benefits. Of note, it is not this editorial’s intention to dissuade researchers from continuing investments in model-centric approaches, but rather to emphasize the importance of assessing whether such endeavors lead to meaningful clinical impact. For example, recent model-centric approaches have demonstrated that state-of-the-art contouring performance can be achieved by intelligently reducing the number of model parameters20, thereby accelerating training and facilitating deployment in resource-constrained settings. Moreover, for challenging tumor-related structures there might still be room for improvement in geometric performance. However, one must ask: would a 1% improvement in the Dice score for, say, a parotid gland contour offer any tangible benefit? The clinical influence of such a change is doubtful. Future research is likely to explore alternative indices for quantification, particularly those that can accurately capture dosimetric impact.

Given the widespread availability of standardized auto-contouring DL architectures, there is a natural inclination for auto-contouring research to transition toward data-centric approaches. Additionally, unlike other industries where vast data repositories exist, medical research is marked by a relative data shortage21, making the pursuit of a data-optimization strategy potentially more fruitful than model-optimization in the current landscape. For instance, areas of data-centric AI such as active learning, in which models iteratively learn through user interaction, could be used to improve performance and minimize contouring time. Notably, interactive contouring has already been shown to be clinically feasible for HNC tumors22 and OARs23. Furthermore, as additional imaging modalities like magnetic resonance imaging become relevant for RT planning24, data-centric AI methods such as domain adaptation and transfer learning (techniques that apply knowledge from one data environment to another) are anticipated to rise in prominence. Illustrating these concepts, Boyd et al.25 adapted a glioma auto-contouring model from an adult to a pediatric population, thereby demonstrating effective translation even in limited data scenarios. Moreover, data-centric techniques could, given appropriate regulatory approval, conceivably be employed in the future to better tailor solutions to specific institutions or user preferences. Recent work by Balagopal et al.26 demonstrated that a pre-trained auto-contouring model could be tailored to particular practice styles with only a limited amount of new data, as sketched in the example below. This challenges the traditional objective of ensuring generalized performance across institutions, emphasizing instead usability for individual entities and highlighting potentially evolving priorities in DL auto-contouring.
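As a minimal sketch of this style of adaptation (not the actual method of Boyd et al. or Balagopal et al.; the toy architecture, hypothetical checkpoint path, and dummy data are all assumptions), the following PyTorch example freezes a pre-trained encoder and fine-tunes only the decoder on a small new dataset:

import torch
import torch.nn as nn

class TinySegNet(nn.Module):
    # Minimal 3D encoder-decoder stand-in for a pre-trained contouring model.
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv3d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv3d(16, 32, 3, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.Conv3d(32, 16, 3, padding=1), nn.ReLU(),
            nn.Conv3d(16, 1, 1),  # per-voxel logit
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = TinySegNet()
# model.load_state_dict(torch.load("pretrained_weights.pt"))  # hypothetical checkpoint

# Freeze the general-purpose feature extractor; adapt only the decoder to the
# new institution's imaging domain or contouring style.
for p in model.encoder.parameters():
    p.requires_grad = False

optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)
loss_fn = nn.BCEWithLogitsLoss()

# A handful of new cases is often the whole point of such adaptation.
images = torch.randn(4, 1, 32, 32, 32)
labels = (torch.rand(4, 1, 32, 32, 32) > 0.9).float()

for epoch in range(5):
    optimizer.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    optimizer.step()
    print(f"epoch {epoch}: loss={loss.item():.4f}")

Freezing most parameters both limits overfitting on the small adaptation set and preserves the generic features learned from the larger source-domain dataset.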

Importantly, literature within AI-augmented decision-making highlights the need to design support tools that align with clinicians’ intended use. Recent evidence has shown that clinicians do not fully capitalize on the potential gains from image-based AI assistance, even when these models consistently outperform experts27. Additionally, the challenge of automation over-reliance is expected to pose problems as users interact with these systems28. This underscores the imperative of increasing research into model uncertainty estimation and explainability methods29. Model uncertainty and explainability will likely become increasingly relevant facets for ensuring clinician trust and engagement when implementing RT auto-contouring tools. Techniques that align model uncertainty with human expectations using data-centric approaches are poised to gain significance; one such technique is sketched below. Furthermore, as we increasingly rely on these models, ensuring they remain unbiased, particularly toward underrepresented or marginalized communities, is paramount. The consequences of biased AI can range from inaccurate predictions to reinforcing systemic inequalities30. Thus, adopting specific data-centric strategies focused on assuring representation and consistent performance will be not just beneficial but a moral imperative.
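One widely used uncertainty-estimation technique that could support such trust-building is Monte Carlo dropout; the Python sketch below (a toy model and dummy data, not a method from the cited works) keeps dropout active at inference and reports the variability across stochastic forward passes as a voxel-wise uncertainty map:

import torch
import torch.nn as nn

class DropoutSegNet(nn.Module):
    # Toy contouring model whose dropout layers stay active at inference.
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(1, 16, 3, padding=1), nn.ReLU(), nn.Dropout3d(0.2),
            nn.Conv3d(16, 16, 3, padding=1), nn.ReLU(), nn.Dropout3d(0.2),
            nn.Conv3d(16, 1, 1),
        )

    def forward(self, x):
        return torch.sigmoid(self.net(x))

@torch.no_grad()
def mc_dropout_predict(model, x, n_samples=20):
    # train() mode enables dropout; no gradients are computed here. The mean
    # over samples is the contour probability; the standard deviation is a
    # voxel-wise uncertainty map a clinician could review before accepting
    # or editing the auto-contour.
    model.train()
    preds = torch.stack([model(x) for _ in range(n_samples)])
    return preds.mean(dim=0), preds.std(dim=0)

model = DropoutSegNet()
ct_patch = torch.randn(1, 1, 32, 32, 32)  # dummy CT patch
prob, uncertainty = mc_dropout_predict(model, ct_patch)
print("mean foreground probability:", prob.mean().item())
print("max voxel uncertainty:", uncertainty.max().item())

High-uncertainty regions could be surfaced to the clinician for targeted review, dovetailing with the active learning and interactive contouring strategies discussed above.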

Conclusion

Model-centric AI has made great strides in RT auto-contouring. Nevertheless, given DL auto-contouring’s facile training characteristics, readily available state-of-the-art architectures, and plateauing geometric performance, it becomes imperative for the auto-contouring community to pivot its focus. Embracing data-centric techniques, such as active learning and transfer learning, and exploring alternative measures of clinical utility, such as dosimetric impact and model uncertainty, could chart the next frontier in auto-contouring and allow for more facile clinical adoption. This shift not only recognizes the evolving needs and challenges of clinicians but also holds the promise of driving more clinically relevant breakthroughs for patients.

Funding Statement:

KAW was supported by an Image Guided Cancer Therapy (IGCT) T32 Training Program Fellowship from T32CA261856. BM was supported by the American Legion Auxiliary Fellowships in Cancer Research and the UTHealth Innovation for Cancer Prevention Research Training Program Pre-doctoral Fellowship (Cancer Prevention and Research Institute of Texas grant #RP210042). BHK receives funding from NIH/NIDCR K08 DE030216. LEC has received funding from Varian Medical Systems, Wellcome Trust, Fund for Innovation in Cancer Informatics, and The Cancer Prevention and Research Institute of Texas. ACM receives unrelated funding and salary support from the NIH National Institute of Dental and Craniofacial Research Exploratory/Developmental Research Grant Program (R21DE031082-01) and Mentored Career Development Award to Promote Diversity (K01DE030524-01A1), and infrastructure support from MD Anderson Cancer Center via the Charles and Daneen Stiefel Center for Head and Neck Cancer Oropharyngeal Cancer Research Program. CDF was supported by P30CA016672. DF was supported by R01CA195524.

Footnotes

Conflicts of Interest: CDF has received travel, speaker honoraria and/or registration fee waivers unrelated to this project from: The American Association for Physicists in Medicine; the University of Alabama-Birmingham; The American Society for Clinical Oncology; The Royal Australian and New Zealand College of Radiologists; The American Society for Radiation Oncology; The Radiological Society of North America; and The European Society for Radiation Oncology. The other authors have no interests to disclose.

Declaration of generative AI and AI-assisted technologies in the writing process: During the preparation of this work, the authors used ChatGPT (GPT-4 architecture; ChatGPT September 25 Version) to improve the grammatical accuracy and semantic structure of portions of the text. After using this tool, the authors reviewed and edited the content as needed and take full responsibility for the content of the publication.

Data availability statement:

Tabular data and Python code used to create the HECKTOR performance saturation figure are available on GitHub (https://github.com/kwahid/autoseg_editorial_code/tree/main).

References

1. Cardenas C. E., Yang J., Anderson B. M., Court L. E. & Brock K. B. Advances in Auto-Segmentation. Semin. Radiat. Oncol. 29, 185–197 (2019).
2. Santoro M., Strolin S., Paolani G., Della Gala G., Bartoloni A., Giacometti C., Ammendolia I., Morganti A. G. & Strigari L. Recent Applications of Artificial Intelligence in Radiotherapy: Where We Are and Beyond. NATO Adv. Sci. Inst. Ser. E Appl. Sci. 12, 3223 (2022).
3. Naqa I. E. in Artificial Intelligence in Radiation Oncology and Biomedical Physics 1–23 (CRC Press, 2023).
4. Hamid O. H. From Model-Centric to Data-Centric AI: A Paradigm Shift or Rather a Complementary Approach? in 2022 8th International Conference on Information Technology Trends (ITT) 196–199 (2022).
5. Mackay K., Bernstein D., Glocker B., Kamnitsas K. & Taylor A. A Review of the Metrics Used to Assess Auto-Contouring Systems in Radiotherapy. Clin. Oncol. 35, 354–369 (2023).
6. Lin D., Lapen K., Sherer M. V., Kantor J., Zhang Z., Boyce L. M., Bosch W., Korenstein D. & Gillespie E. F. A Systematic Review of Contouring Guidelines in Radiation Oncology: Analysis of Frequency, Methodology, and Delivery of Consensus Recommendations. Int. J. Radiat. Oncol. Biol. Phys. 107, 827–835 (2020).
7. Baroudi H., Brock K. K., Cao W., Chen X., Chung C., Court L. E., El Basha M. D., Farhat M., Gay S., Gronberg M. P., Gupta A. C., Hernandez S., Huang K., Jaffray D. A., Lim R., Marquez B., Nealon K., Netherton T. J., Nguyen C. M., Reber B., Rhee D. J., Salazar R. M., Shanker M. D., Sjogreen C., Woodland M., Yang J., Yu C. & Zhao Y. Automated Contouring and Planning in Radiation Therapy: What Is ‘Clinically Acceptable’? Diagnostics 13, 667 (2023).
8. Henderson E. G. A., Vasquez Osorio E. M., van Herk M., Brouwer C. L., Steenbakkers R. J. H. M. & Green A. F. Accurate segmentation of head and neck radiotherapy CT scans with 3D CNNs: consistency is key. Phys. Med. Biol. 68, (2023).
9. Lin D., Wahid K. A., Nelms B. E., He R., Naser M. A., Duke S., Sherer M. V., Christodouleas J. P., Mohamed A. S. R., Cislo M., Murphy J. D., Fuller C. D. & Gillespie E. F. E pluribus unum: prospective acceptability benchmarking from the Contouring Collaborative for Consensus in Radiation Oncology crowdsourced initiative for multiobserver segmentation. J Med Imaging (Bellingham) 10, S11903 (2023).
10. Fang Y., Wang J., Ou X., Ying H., Hu C., Zhang Z. & Hu W. The impact of training sample size on deep learning-based organ auto-segmentation for head-and-neck patients. Phys. Med. Biol. 66, (2021).
11. Yu C., Anakwenze C. P., Zhao Y., Martin R. M., Ludmir E. B., S Niedzielski J., Qureshi A., Das P., Holliday E. B., Raldow A. C., Nguyen C. M., Mumme R. P., Netherton T. J., Rhee D. J., Gay S. S., Yang J., Court L. E. & Cardenas C. E. Multi-organ segmentation of abdominal structures from non-contrast and contrast enhanced CT images. Sci. Rep. 12, 19093 (2022).
12. Weissmann T., Huang Y., Fischer S., Roesch J., Mansoorian S., Ayala Gaona H., Gostian A.-O., Hecht M., Lettmaier S., Deloch L., Frey B., Gaipl U. S., Distel L. V., Maier A., Iro H., Semrau S., Bert C., Fietkau R. & Putz F. Deep learning for automatic head and neck lymph node level delineation provides expert-level accuracy. Front. Oncol. 13, 1115258 (2023).
13. Rodríguez Outeiral R., Bos P., van der Hulst H. J., Al-Mamgani A., Jasperse B., Simões R. & van der Heide U. A. Strategies for tackling the class imbalance problem of oropharyngeal primary tumor segmentation on magnetic resonance imaging. Phys Imaging Radiat Oncol 23, 144–149 (2022).
14. Wahid K. A., Glerean E., Sahlsten J., Jaskari J., Kaski K., Naser M. A., He R., Mohamed A. S. R. & Fuller C. D. Artificial Intelligence for Radiation Oncology Applications Using Public Datasets. Semin. Radiat. Oncol. 32, 400–414 (2022).
15. Isensee F., Jaeger P. F., Kohl S. A. A., Petersen J. & Maier-Hein K. H. nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nat. Methods 18, 203–211 (2021).
16. Mazurowski M. A., Dong H., Gu H., Yang J., Konz N. & Zhang Y. Segment anything model for medical image analysis: An experimental study. Med. Image Anal. 89, 102918 (2023).
17. Andrearczyk V., Oreiller V., Abobakr M., Akhavanallaf A., Balermpas P., Boughdad S., Capriotti L., Castelli J., Cheze Le Rest C., Decazes P., Correia R., El-Habashy D., Elhalawani H., Fuller C. D., Jreige M., Khamis Y., La Greca A., Mohamed A., Naser M., Prior J. O., Ruan S., Tanadini-Lang S., Tankyevych O., Salimi Y., Vallières M., Vera P., Visvikis D., Wahid K., Zaidi H., Hatt M. & Depeursinge A. Overview of the HECKTOR Challenge at MICCAI 2022: Automatic Head and Neck Tumor Segmentation and Outcome Prediction in PET/CT. in Head and Neck Tumor Segmentation and Outcome Prediction 1–30 (Springer Nature Switzerland, 2023).
18. Sherer M. V., Lin D., Elguindi S., Duke S., Tan L.-T., Cacicedo J., Dahele M. & Gillespie E. F. Metrics to evaluate the performance of auto-segmentation for radiation treatment planning: A critical review. Radiother. Oncol. 160, 185–191 (2021).
19. Hosny A., Bitterman D. S., Guthier C. V., Qian J. M., Roberts H., Perni S., Saraf A., Peng L. C., Pashtan I., Ye Z., Kann B. H., Kozono D. E., Christiani D., Catalano P. J., Aerts H. J. W. L. & Mak R. H. Clinical validation of deep learning algorithms for radiotherapy targeting of non-small-cell lung cancer: an observational study. Lancet Digit Health 4, e657–e666 (2022).
20. Celaya A., Actor J. A., Muthusivarajan R., Gates E., Chung C., Schellingerhout D., Riviere B. & Fuentes D. PocketNet: A Smaller Neural Network for Medical Image Analysis. IEEE Trans. Med. Imaging 42, 1172–1184 (2023).
21. Pereira T., Morgado J., Silva F., Pelter M. M., Dias V. R., Barros R., Freitas C., Negrão E., Flor de Lima B., Correia da Silva M., Madureira A. J., Ramos I., Hespanhol V., Costa J. L., Cunha A. & Oliveira H. P. Sharing Biomedical Data: Strengthening AI Development in Healthcare. Healthcare (Basel) 9, (2021).
22. Wei Z., Ren J., Korreman S. S. & Nijkamp J. Towards interactive deep-learning for tumour segmentation in head and neck cancer radiotherapy. Phys Imaging Radiat Oncol 25, 100408 (2023).
23. Rasmussen M. E., Nijkamp J. A., Eriksen J. G. & Korreman S. S. A simple single-cycle interactive strategy to improve deep learning-based segmentation of organs-at-risk in head-and-neck cancer. Phys Imaging Radiat Oncol 26, 100426 (2023).
24. Keall P. J., Brighi C., Glide-Hurst C., Liney G., Liu P. Z. Y., Lydiard S., Paganelli C., Pham T., Shan S., Tree A. C., van der Heide U. A., Waddington D. E. J. & Whelan B. Integrated MRI-guided radiotherapy - opportunities and challenges. Nat. Rev. Clin. Oncol. 19, 458–470 (2022).
25. Boyd A., Ye Z., Prabhu S., Tjong M. C., Zha Y., Zapaishchykova A., Vajapeyam S., Hayat H., Chopra R., Liu K. X., Nabavidazeh A., Resnick A., Mueller S., Haas-Kogan D., Aerts H. J. W. L., Poussaint T. & Kann B. H. Expert-level pediatric brain tumor segmentation in a limited data scenario with stepwise transfer learning. medRxiv (2023). doi:10.1101/2023.06.29.23292048
26. Balagopal A., Nguyen D., Bai T., Dohopolski M., Lin M.-H. & Jiang S. Prior Guided Deep Difference Meta-Learner for Fast Adaptation to Stylized Segmentation. arXiv [cs.CV] (2022). at <http://arxiv.org/abs/2211.10588>
27. Agarwal N., Moehring A., Rajpurkar P. & Salz T. Combining Human Expertise with Artificial Intelligence: Experimental Evidence from Radiology. (2023). doi:10.3386/w31422
28. Pot M., Kieusseyan N. & Prainsack B. Not all biases are bad: equitable and inequitable biases in machine learning and radiology. Insights Imaging 12, 13 (2021).
29. Cui S., Traverso A., Niraula D., Zou J., Luo Y., Owen D., El Naqa I. & Wei L. Interpretable artificial intelligence in radiology and radiation oncology. Br. J. Radiol. 96, 20230142 (2023).
30. Chen I. Y., Pierson E., Rose S., Joshi S., Ferryman K. & Ghassemi M. Ethical Machine Learning in Healthcare. Annu Rev Biomed Data Sci 4, 123–144 (2021).
