[Preprint]. 2024 Dec 9:arXiv:2306.00838v3. Originally published 2023 Jun 1. [Version 3]

The Brain Tumor Segmentation - Metastases (BraTS-METS) Challenge 2023: Brain Metastasis Segmentation on Pre-treatment MRI

Ahmed W Moawad 1,*,α,β, Anastasia Janas 2,*,α,β,δ, Ujjwal Baid 3,*,α,β, Divya Ramakrishnan 2,*,α,β, Rachit Saluja 4,5,*,α,β,γ, Nader Ashraf 6,7,*,α,β,γ, Nazanin Maleki 2,6,*,α,β,δ, Leon Jekel 8,*,α,β, Nikolay Yordanov 9,δ,κ, Pascal Fehringer 10,δ,κ, Athanasios Gkampenis 11,δ,κ, Raisa Amiruddin 6,α,δ, Amirreza Manteghinejad 6,α, Maruf Adewole 12,α, Jake Albrecht 13,α, Udunna Anazodo 12,14,α, Sanjay Aneja 15,α, Syed Muhammad Anwar 16,α, Timothy Bergquist 17,α, Veronica Chiang 18,α, Verena Chung 13,α, Gian Marco Conte 17,α, Farouk Dako 19,α, James Eddy 13,α, Ivan Ezhov 20,α, Nastaran Khalili 21,α, Keyvan Farahani 22,α, Juan Eugenio Iglesias 23,α, Zhifan Jiang 24,α, Elaine Johanson 25,α, Anahita Fathi Kazerooni 21,26,27,α, Florian Kofler 28,α, Kiril Krantchev 2,α,β,ϵ,δ, Dominic LaBella 29,α, Koen Van Leemput 30,α, Hongwei Bran Li 23,α, Marius George Linguraru 16,31,α, Xinyang Liu 24,α, Zeke Meier 32,α, Bjoern H Menze 33,α, Harrison Moy 2,α,β,ϵ, Klara Osenberg 2,α,β, Marie Piraud 34,α, Zachary Reitman 29,α, Russell Takeshi Shinohara 35,α, Chunhao Wang 29,α, Benedikt Wiestler 28,α, Walter Wiggins 36,α, Umber Shafique 37,α,η, Klara Willms 2,β, Arman Avesta 2,38,β, Khaled Bousabarah 39,β,ϵ, Satrajit Chakrabarty 40,41,β, Nicolo Gennaro 42,β, Wolfgang Holler 39,β,ϵ, Manpreet Kaur 43,β,ϵ, Pamela LaMontagne 44,β, MingDe Lin 45,β,ϵ, Jan Lost 46,β,ϵ, Daniel S Marcus 44,β, Ryan Maresca 15,β,ϵ, Sarah Merkaj 47,β,ϵ, Gabriel Cassinelli Pedersen 48,β,ϵ, Marc von Reppert 49,β,ϵ, Aristeidis Sotiras 44,50,β, Oleg Teytelboym 1,β, Niklas Tillmans 51,β,ϵ, Malte Westerhoff 39,β,ϵ, Ayda Youssef 52,β, Devon Godfrey 29,β, Scott Floyd 29,β, Andreas Rauschecker 53,β, Javier Villanueva-Meyer 53,β, Irada Pflüger 54,β, Jaeyoung Cho 54,β, Martin Bendszus 54,β, Gianluca Brugnara 54,β, Justin Cramer 55,η, Gloria J Guzman Perez-Carillo 56,η, Derek R Johnson 17,η, Anthony Kam 57,η, Benjamin Yin Ming Kwan 58,η, Lillian Lai 59,η, Neil U Lall 60,η, Fatima Memon 61,62,63,η, Mark Krycia 61,η, Satya Narayana Patro 64,η, Bojan Petrovic 65,η, Tiffany Y So 66,η, Gerard Thompson 67,68,η, Lei Wu 69,η, E Brooke Schrickel 70,η, Anu Bansal 71,θ, Frederik Barkhof 72,73,θ, Cristina Besada 74,θ, Sammy Chu 69,θ, Jason Druzgal 75,θ, Alexandru Dusoi 76,θ, Luciano Farage 77,θ, Fabricio Feltrin 78,θ, Amy Fong 79,θ, Steve H Fung 80,θ, R Ian Gray 81,θ, Ichiro Ikuta 55,θ, Michael Iv 82,θ, Alida A Postma 83,84,θ, Amit Mahajan 2,θ, David Joyner 75,θ, Chase Krumpelman 42,θ, Laurent Letourneau-Guillon 85,θ, Christie M Lincoln 86,θ, Mate E Maros 87,θ, Elka Miller 88,θ, Fanny Esther A Morón 89,θ, Esther A Nimchinsky 90,θ, Ozkan Ozsarlak 91,θ, Uresh Patel 92,θ, Saurabh Rohatgi 38,θ, Atin Saha 93,94,θ, Anousheh Sayah 95,θ, Eric D Schwartz 96,97,θ, Robert Shih 98,θ, Mark S Shiroishi 99,θ, Juan E Small 100,θ, Manoj Tanwar 101,θ, Jewels Valerie 102,θ, Brent D Weinberg 103,θ, Matthew L White 104,θ, Robert Young 93,θ, Vahe M Zohrabian 105,θ, Aynur Azizova 106,θ, Melanie Maria Theresa Brüßeler 43,β,κ, Mohanad Ghonim 107,κ, Mohamed Ghonim 107,κ, Abdullah Okar 108,κ, Luca Pasquini 93,κ, Yasaman Sharifi 109,κ, Gagandeep Singh 110,κ, Nico Sollmann 111,112,113,κ, Theodora Soumala 11,κ, Mahsa Taherzadeh 114,κ, Philipp Vollmuth 54,115,β,γ, Martha Foltyn-Dumitru 54,β,γ, Ajay Malhotra 2,β,γ, Aly H Abayazeed 82,γ, Francesco Dellepiane 116,γ, Philipp Lohmann 117,118,γ, Víctor M Pérez-García 119,γ, Hesham Elhalawani 120,γ, Maria Correia de Verdier 121,122,γ, Sanaria Al-Rubaiey 123,λ, Rui Duarte Armindo 124,λ, Kholod Ashraf 52,λ, Moamen M Asla 125,λ, Mohamed Badawy 126,λ, Jeroen 
Bisschop 127,λ, Nima Broomand Lomer 128,λ, Jan Bukatz 123,λ, Jim Chen 129,λ, Petra Cimflova 130,λ, Felix Corr 131,λ, Alexis Crawley 132,λ, Lisa Deptula 133,λ, Tasneem Elakhdar 52,λ, Islam H Shawali 52,λ, Shahriar Faghani 17,λ, Alexandra Frick 134,λ, Vaibhav Gulati 135,λ, Muhammad Ammar Haider 136,λ, Fátima Hierro 137,λ, Rasmus Holmboe Dahl 138,λ, Sarah Maria Jacobs 139,λ, Kuang-chun Jim Hsieh 89,λ, Sedat G Kandemirli 59,λ, Katharina Kersting 123,λ, Laura Kida 123,λ, Sofia Kollia 140,λ, Ioannis Koukoulithras 141,λ, Xiao Li 103,λ, Ahmed Abouelatta 52,λ, Aya Mansour 52,λ, Ruxandra-Catrinel Maria-Zamfirescu 123,λ, Marcela Marsiglia 142,λ, Yohana Sarahi Mateo-Camacho 143,λ, Mark McArthur 144,λ, Olivia McDonnell 145,λ, Maire McHugh 146,λ, Mana Moassefi 147,λ, Samah Mostafa Morsi 86,λ, Alexander Munteanu 148,λ, Khanak K Nandolia 149,λ, Syed Raza Naqvi 150,λ, Yalda Nikanpour 151,λ, Mostafa Alnoury 152,λ, Abdullah Mohamed Aly Nouh 153,λ, Francesca Pappafava 154,λ, Markand D Patel 155,λ, Samantha Petrucci 53,λ, Eric Rawie 156,λ, Scott Raymond 157,λ, Borna Roohani 108,λ, Sadeq Sabouhi 158,λ, Laura M Sanchez-Garcia 159,λ, Zoe Shaked 123,λ, Pokhraj P Suthar 160,λ, Talissa Altes 161,λ, Edvin Isufi 161,λ, Yaseen Dhemesh 162,λ, Jaime Gass 161,λ, Jonathan Thacker 161,λ, Abdul Rahman Tarabishy 163,λ, Benjamin Turner 164,λ, Sebastiano Vacca 165,λ, George K Vilanilam 164,λ, Daniel Warren 162,λ, David Weiss 166,λ, Fikadu Worede 6,λ, Sara Yousry 52,λ, Wondwossen Lerebo 6,μ, Alejandro Aristizabal 167,168,π, Alexandros Karargyris 167,π, Hasan Kassem 167,π, Sarthak Pati 3,167,169,π, Micah Sheller 167,170,π, Katherine E Evan Link 171,α,β, Evan Calabrese 172,α,β, Nourel hoda Tahon 161,α,β, Ayman Nada 161,α,β, Yuri S Velichko 42,α,β, Spyridon Bakas 3,37,173,α,β,ϕ, Jeffrey D Rudie 122,174,α,β,η,ϕ, Mariam Aboian 6,α,β,η,ϕ,
PMCID: PMC10312806  PMID: 37396600

Abstract

The translation of AI-generated brain metastases (BM) segmentation into clinical practice relies heavily on diverse, high-quality annotated medical imaging datasets. The BraTS-METS 2023 challenge has gained momentum for testing and benchmarking algorithms using rigorously annotated, internationally compiled, real-world datasets. This study presents the results of the segmentation challenge and characterizes the challenging cases that impacted the performance of the winning algorithms. Untreated brain metastases on standard anatomic MRI sequences (T1, T2, FLAIR, post-gadolinium T1) from eight contributed international datasets were annotated in a stepwise method: pre-segmentation with published U-Net algorithms, refinement by student annotators, review by neuroradiologists, and final approval by expert neuroradiologists. Segmentations were ranked based on lesion-wise Dice and 95% Hausdorff distance (HD95) scores. False positives (FP) and false negatives (FN) were rigorously penalized, receiving a score of 0 for Dice and a fixed penalty of 374 for HD95. The mean scores for the teams were calculated. Eight datasets comprising 1303 studies were annotated, with 402 studies (3076 lesions) released on Synapse as a publicly available dataset for challenge competitors. Additionally, 31 studies (139 lesions) were held out for validation, and 59 studies (218 lesions) were used for testing. Segmentation accuracy was measured as rank across subjects, with the winning team achieving a lesion-wise mean rank of 7.9. The Dice score for the winning team was 0.65 ± 0.25. Common errors among the leading teams included false negatives for small lesions and misregistration of masks in space. The Dice scores and lesion detection rates of all algorithms diminished with decreasing tumor size, particularly for tumors smaller than 100 mm3. In conclusion, algorithms for BM segmentation require further refinement to balance high sensitivity in lesion detection with the minimization of false positives and negatives. The BraTS-METS 2023 challenge successfully curated well-annotated, diverse datasets and identified common errors, facilitating the translation of BM segmentation across varied clinical environments and providing personalized volumetric reports to patients undergoing BM treatment.

Keywords: BraTS, BraTS-METS, Medical image analysis challenge, Brain metastasis, Brain tumor segmentation, Machine learning, Artificial Intelligence

1. Introduction

Brain metastases (BMs) represent the most common malignancy affecting the adult central nervous system (Le Rhun et al., 2021), affecting an estimated 20–40% of patients with systemic cancer (Percy et al., 1972; Tabouret et al., 2012; Posner, 1978; Nayak et al., 2012). Patients commonly have multiple lesions at different stages of treatment; therefore, radiologic evaluation often extends beyond a mere comparison with the most recent scan. In clinical practice, a comprehensive assessment frequently involves reviewing several previous scans to monitor the progression or changes in the metastases over time, which can be laborious and time-consuming (Jekel et al., 2022b; Kaur et al., 2023; Cassinelli Petersen et al., 2022).

The shift toward automated volumetric analysis and lesion organization in evaluating BMs is transformative (Kaur et al., 2023; Ocaña-Tienda et al., 2023), transcending conventional qualitative assessment methods in favor of a personalized and time-efficient approach. Artificial intelligence (AI) based volumetric BMs assessments will not only improve the precision of measurements but also provide high-quality personalized reports of individual treatment response of brain metastases, thus influencing patient outcomes and democratizing access to high-quality care (Pinto-Coelho, 2023; Najjar, 2023; Tang, 2019). By integrating automated volumetric analysis into clinical practice, we can ensure more reliable and consistent measurements, extending these advanced diagnostic capabilities beyond specialized centers to a broader range of healthcare settings. Improved accessibility of personalized reporting is crucial, particularly for patients in regions where such specialized services were previously unavailable, thus broadening the scope of quality care to include more comprehensive and timely monitoring of disease progression and response to treatment.

The intricate task of accurately detecting, segmenting, and assessing BMs is pivotal for devising effective therapeutic strategies and prognostication. However, the efficacy of machine learning algorithms in this realm is inherently tied to the availability and quality of annotated medical imaging datasets (Zhou et al., 2020; Zhang et al., 2020; Xue et al., 2020; Jeong et al., 2024; Grøvik et al., 2020; Dikici et al., 2020, 2022; Charron et al., 2018; Bousabarah et al., 2020). Historically, the scarcity of large-scale, annotated datasets in the medical imaging field has limited the potential of machine learning algorithms. Many researchers find themselves constrained to smaller, local institutional datasets, which limits algorithm generalizability across different institutions (Greenspan et al., 2016). In this context, medical image analysis challenges—competitions to establish accurate segmentation algorithms—have emerged as crucial platforms, facilitating the development, testing, and benchmarking of machine learning algorithms by providing access to extensive, meticulously labeled, multi-center, real-world datasets.

Specifically, the domain of BMs analysis stands to benefit immensely from such collaborative initiatives. The complexities associated with BMs, such as the variability in size, shape, and location of lesions, necessitate sophisticated machine learning approaches that can adapt to the diverse characteristics of these metastatic manifestations (Cho et al., 2021). Moreover, the dynamic nature of BMs, with changes occurring over time and in response to treatment, underscores the need for algorithms capable of longitudinal assessment and multi-lesion segmentation.

The 2023 Brain Tumor Segmentation – Metastases (BraTS-METS) challenge marked a significant shift from previous BraTS challenges, which centered on adult brain diffuse astrocytoma (Zhang et al., 2020; Xue et al., 2020; Jeong et al., 2024). The scope was broadened to encompass a variety of brain tumor entities, thereby addressing the issue of data scarcity and methodological complexities inherent in earlier challenges. This challenge prioritized the segmentation of BMs on pre-treatment MR imaging. The goal of BraTS-METS 2023 was to establish a robust, accurate algorithm for segmenting metastatic lesions of virtually any size on diagnostic magnetic resonance imaging (MRI) using T1-weighted (T1) pre-contrast, T1 post-contrast, T2-weighted (T2), and fluid attenuated inversion recovery (FLAIR) sequences. The resulting standardized auto-segmentation algorithm was made openly accessible, thus facilitating its integration into clinical and research protocols across institutions.

Initially, the intention was to develop an algorithm dedicated to segmenting pre-treatment BMs (Figure 1, Step 1). This algorithm was fine-tuned to delineate the enhancing tumor, peritumoral edema, and necrotic portions of the metastases (Figure 1, Step 2). The ultimate aim was to establish a BMs consortium for future collaborative research (Figure 1, Step 3). This consortium is designed to foster a collaborative research environment, not only for the development of BM imaging algorithms but also for their clinical translation and community education efforts.

Figure 1:

Flow chart outlining the BraTS-METS 2023 vision, beginning with the pre-treatment BMs segmentation during the 2023 ASNR/MICCAI BraTS challenge. In this phase, segmentations were conducted on a select dataset subset to refine the dataset for algorithm development by participants. The dataset is set to expand in subsequent challenges through ongoing annotation of contributed brain MRIs. Future challenges will incorporate datasets with annotated post-treatment BMs, segmentations including the hemorrhagic component of tumors, and non-skull-stripped images to enhance the evaluation of dural-based and osseous metastases. These datasets, coupled with clinical data and patient demographics, will contribute to an inter-institutional BMs consortium, fostering collaborative research and the clinical application of algorithms through partnerships between academia and industry.

2. Background

The standard of care for evaluating BMs includes qualitative assessment of changes in lesion size and number, along with two-dimensional measurements performed manually by radiologists on PACS workstations. In clinical trials, the Response Assessment in Neuro-Oncology Brain Metastases (RANO-BM) guidelines predominantly rely on measuring the unidimensional longest diameter of lesions (Lin et al., 2015). However, these traditional criteria may not fully capture the complex dynamics and morphological changes of BMs over time, particularly given the heterogeneity and irregular growth patterns often associated with these lesions.

Recent advances in MRI technology, particularly the adoption of high-resolution 3D sequences such as T1 magnetization-prepared rapid acquisition gradient-echo, T1 fast spoiled gradient-echo, and T1 three-dimensional high-resolution inversion recovery-prepared fast spoiled gradient-recalled sequences, have significantly enhanced our ability to detect and monitor smaller BMs. The traditional threshold for target lesions, as outlined in the RANO-BM criteria proposed by Lin et al., set the minimum size at 10 mm in longest diameter, visible on two or more axial slices with a 5 mm or less interval (Lin et al., 2015). However, with these advancements in imaging, lesions as small as 1–2 mm can now be reliably detected; yet because of significant inter-rater variability in the measurement of lesions smaller than 5 mm, the consensus criteria still require a lesion of at least 10 mm to be considered measurable disease. The improved reproducibility and low variability of algorithm-based measurements provide a potential for future re-evaluation of standardized assessment criteria to include smaller lesions. Indeed, recent practices have seen a shift towards a 5 mm minimum size threshold, aligning with the capabilities of current MRI technology, as highlighted by Qian et al. (2017).

Integration of automated techniques, such as deep learning algorithms for segmentation and assessment, offers a promising approach to enhance the precision and efficiency of volumetric evaluations, aligning with the requirements of the RANO-BM guidelines (Kanakarajan et al., 2023; Wang et al., 2023a; Yoo et al., 2022). The importance of multi-lesional segmentation and continuous assessment across serial imaging cannot be overstated. Such a comprehensive approach can benefit from the integration of automatic algorithms that are capable of efficiently detecting and segmenting metastases across multiple imaging time points, including pre- and post-treatment scans. The enhanced precision and efficiency of clinical assessments can complement the expertise of radiologists and other clinicians, aiding not only in tracking disease progression and response to treatment but also in identifying new lesions at the earliest possible stage.

Despite the potential benefits, the routine implementation of such automated techniques in clinical settings faces significant hurdles, given the extensive time required and the variability inherent in imaging techniques across different temporal scans. This variability often arises from disparate imaging equipment and the fact that different radiologists may interpret sequential scans for a single patient differently, introducing acquisition heterogeneity and inter-reader variability (Buchner et al., 2023; Mi et al., 2020).

Addressing the detection and segmentation challenges associated with smaller BMs is therefore of paramount importance. Successfully overcoming these challenges will yield targeted algorithms that can be readily translated to, and adopted in, clinical practice, providing a vital resource in the management of BMs.

3. Related Works

While challenges remain in the field of automated BMs segmentation, recent studies are indicative of a promising trajectory toward achieving high levels of automation, consistency, and adaptability in clinical practice (Jekel et al., 2022b; Kanakarajan et al., 2023; Dang et al., 2022; Jekel et al., 2022a; Chen et al., 2023b). Kanakarajan et al. (2023) demonstrated a significant advancement with their development of a fully automated segmentation method for BMs using T1 contrast-enhanced MR images, which could significantly aid in evaluating treatment effects after stereotactic radiosurgery (SRS). Similarly, Buchner et al. (2023) identified core MRI sequences that are essential for reliable automatic BMs segmentation, providing a foundation for standardized imaging protocols and enhancing algorithmic consistency across various clinical settings.

The integration of multi-phase delayed enhanced MR images has been explored by Chen et al. (2023b), who reported improvements in the accuracy of both segmentation and classification of BMs. This approach addressed the critical need for refined diagnostic tools that can adapt to the complex nature of BMs. Furthermore, Ottesen et al. (2023) have extended the capabilities of deep learning algorithms by implementing 2.5D and 3D segmentation techniques on multinational MRI data, enhancing the robustness and adaptability of these systems for diverse clinical environments.

The ongoing development and refinement of these automated segmentation tools are set to revolutionize the way BMs are assessed, bringing about a significant enhancement in the consistency and quality of patient care (Jekel et al., 2022b; Jalalifar et al., 2023). Yoo et al. (2023) underscored the importance of the data domain in self-supervised learning for accurate BMs detection and segmentation. This development points toward the creation of more adaptable and robust systems capable of functioning effectively across a variety of clinical scenarios. Moreover, advancements in the reduction of false positives within automated BMs segmentation underscore the growing feasibility and effectiveness of these technologies, even in diverse clinical environments, cementing their role as invaluable assets in medical imaging (Ghesu et al., 2022; Liew et al., 2023; Ziyaee et al., 2023).

Detecting smaller metastatic lesions, typically ranging from 1 to 2 mm, is pivotal in patient prognosis and treatment planning. Given the increased reliance on SRS (Vogelbaum et al., 2022), accurately identifying the exact number and localization of these small metastases becomes even more critical to ensure effective treatment and minimize the risk of missed targets, which could necessitate additional interventions, cause treatment delays, and increase healthcare costs (Minniti et al., 2011; Schnurman et al., 2022; Chen et al., 2023c). The gross total volume (GTV) of BMs is potentially a critical prognostic indicator, yet its clinical utility remains largely untapped due to the absence of validated volumetric segmentation tools. The considerable effort required to detect and volumetrically segment all lesions, irrespective of size, poses a significant challenge. While existing glioma-focused segmentation algorithms, such as those developed by the Applied Computer Vision Lab and the Division of Medical Image Computing, Germany, have shown promising accuracy for larger metastases as measured by Dice scores, their efficacy diminishes with smaller lesions.

Efforts to release publicly available BM datasets have varied significantly in their criteria and quality, contributing to inconsistencies in algorithm training and validation. Table 1 provides a summary of previously publicly available datasets.

Table 1:

Overview of publicly available datasets for BMs.

Public Dataset Data Publisher Number of cases Difference from BraTS datasets
NYUMets (Oermann et al., 2023) New York University 1,429 patients ▪ Contains post-therapy cases
▪ Not all patients have images
▪ Most cases without segmented BM
BrainMetShare (Grøvik et al., 2020) Stanford University 156 patients ▪ Does not contain T2 sequence
▪ Contains post-therapy cases
▪ Only contains TC subregion
▪ Available in JPEG format
UCSF-BMSR (Rudie et al., 2024) University of California San Francisco 412 patients ▪ Contains synthetic T2 images
▪ Contains post-therapy cases
Brain-TR-GammaKnife (Wang et al., 2023b) University of Mississippi 47 patients ▪ Does not contain T2 images
▪ Contains post-therapy cases
MOLAB (Ocaña-Tienda et al., 2023) University of Castilla-La Mancha 75 patients ▪ Contains post-therapy cases
▪ Recently published
▪ Not all BMs are segmented

The development of a universally accepted, metastasis-specific AI tool represents a considerable gap in the current landscape, posing a barrier to the standard clinical use of GTV assessment for prognostication in patients with BMs. This challenge is compounded by the lack of a comprehensive public dataset, which would facilitate a fair comparison of existing BMs segmentation models. The availability of such a dataset could significantly accelerate progress by enabling researchers to benchmark and refine their models against a standardized dataset, thereby enhancing the reliability and accuracy of AI-powered segmentation tools. Bridging these gaps is essential for advancing the integration of AI in the prognostic evaluation of BMs, ultimately improving patient management and treatment outcomes.

4. Materials & Methods

4.1. Data

The BraTS-METS dataset included retrospectively collected multiparametric MRI (mpMRI) scans from diverse institutions, representing the variability in imaging protocols and equipment reflective of global clinical practices. Inclusion criteria encompassed MRI scans with the presence of untreated BMs with T1 pre-contrast, T1 post-contrast, T2, and FLAIR sequences. Participating institutions had obtained Institutional Review Board and Data Transfer Agreement approvals before contributing data, ensuring compliance with regulatory standards. These scans were then centralized and curated for consistency.

Exclusion criteria included the presence of prior treatment changes, lack of one of the required MRI sequences, or imaging not technically acceptable due to motion or other significant imaging artifacts. The cases where post-treatment changes were noted were reserved for BraTS-METS 2024.

The dataset allocation for the BraTS-METS 2023 challenge adhered to the standard machine learning protocol, with 70% designated for training, 10% for validation, and 20% for testing. Ground truth (GT) labels were provided exclusively for the training set, while the validation set remained unlabeled to ensure integrity in algorithmic evaluation. The testing set was kept hidden from the participants. The use of additional data, whether public or private, was restricted to prevent bias in the algorithmic ranking process. Participants were allowed to reference external datasets only for publication purposes and were required to disclose such usage transparently in their manuscripts, along with results derived from the BraTS-METS 2023 dataset.

4.2. Imaging Data Description

The mpMRI scans included four sequences: non-enhanced T1, post-gadolinium-contrast T1 (T1Gd), T2, and non-enhanced T2-FLAIR, procured from various scanners and protocols. Standardized pre-processing was applied to all the BraTS-METS mpMRI scans. Specifically, the applied pre-processing routines included conversion of the DICOM files to the NIfTI file format, co-registration to the same anatomical template (SRI24) (Rohlfing et al., 2010), resampling to a uniform isotropic resolution (1 mm3), and, finally, skull stripping (Isensee et al., 2019). The pre-processing pipeline was made publicly available through the Cancer Imaging Phenomics Toolkit (CaPTk) (Pati et al., 2020; Rathore et al., 2018) and the Federated Tumor Segmentation (FeTS) tool (Pati et al., 2022). Conversion to the Neuroimaging Informatics Technology Initiative (NIfTI) format stripped the accompanying metadata from the Digital Imaging and Communications in Medicine (DICOM) images and removed all protected health information from the DICOM headers. Furthermore, skull stripping mitigated potential facial reconstruction/recognition of the patient (Greenspan et al., 2016; Cho et al., 2021). The specific approach used for skull stripping was based on a novel deep learning approach that accounts for the brain shape prior and is agnostic to the MRI sequence input (Juluru et al., 2020; Schwarz et al., 2019).
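As a rough illustration of the co-registration and resampling steps (the challenge itself used the CaPTk and FeTS tools, not this code), a minimal SimpleITK sketch with hypothetical file paths might look as follows:

```python
# Illustrative sketch only: the challenge pipeline used CaPTk/FeTS, not this
# code. Rigidly registers a scan to the SRI24 atlas and resamples it onto the
# atlas's 1 mm isotropic grid; skull stripping follows separately in the real
# pipeline using a dedicated deep learning model. File paths are hypothetical.
import SimpleITK as sitk

def register_to_atlas(scan_path: str, atlas_path: str) -> sitk.Image:
    moving = sitk.ReadImage(scan_path, sitk.sitkFloat32)
    atlas = sitk.ReadImage(atlas_path, sitk.sitkFloat32)

    reg = sitk.ImageRegistrationMethod()
    reg.SetMetricAsMattesMutualInformation(32)   # robust across MR contrasts
    reg.SetOptimizerAsRegularStepGradientDescent(1.0, 1e-4, 200)
    reg.SetInitialTransform(
        sitk.CenteredTransformInitializer(atlas, moving, sitk.Euler3DTransform()))
    reg.SetInterpolator(sitk.sitkLinear)
    rigid = reg.Execute(atlas, moving)           # 6-DOF rigid transform

    # Resample onto the SRI24 reference grid (1 mm isotropic)
    return sitk.Resample(moving, atlas, rigid, sitk.sitkLinear, 0.0)

# Example: t1gd_sri24 = register_to_atlas("t1gd.nii.gz", "sri24_atlas.nii.gz")
```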

4.3. Tumor Labels

The annotation of tumor sub-regions aligned with Visually AcceSAble Rembrandt Images (VASARI) feature visibility and encompassed three labels: Gd-enhancing tumor (ET, label 3), surrounding non-enhancing FLAIR hyperintensity (SNFH, label 2), and non-enhancing tumor core (NETC, label 1). ET is described as the enhancing portion of the tumor, characterized by areas of hyperintensity in T1Gd that are brighter than in T1. NETC is identified as the presumed necrotic core of the tumor, evident as a non-enhancing focus surrounded by enhancing tumor. SNFH is defined as the peritumoral edema and tumor-infiltrated tissue, indicated by the abnormal hyperintense signal on T2-FLAIR images, which includes the infiltrative non-enhancing tumor as well as vasogenic edema in the peritumoral region. In previous BraTS challenges, ET was segmented as label 4. However, starting from BraTS 2023, ET has been segmented as label 3 for consistency. The sub-regions are shown in Figure 2.

Figure 2:

Image panels illustrating the annotated tumor sub-regions across various mpMRI scans with segmentations of ET (yellow), SNFH (green), and NETC (red) done on ITK-SNAP.

4.4. Tumor Annotation Protocol

The BraTS initiative, in consultation with domain experts, defined various tumor sub-regions to provide a standardized approach for their assessment and evaluation. However, alternative criteria for delineation could be established, resulting in slightly different tumor sub-regions. To ensure consistency in the GT delineations across various annotators, the following tumor annotation protocol was designed. Structural mpMRI volumes were considered (T1, T1Gd, T2, T2-FLAIR).

The BraTS-METS 2023 challenge focuses on three regions of interest:

  1. Whole Tumor (WT) = Label 1 + Label 2 + Label 3

  2. Tumor Core (TC) = Label 1 + Label 3

  3. Enhancing Tumor (ET) = Label 3

WT describes the complete extent of the disease, encompassing TC and the peritumoral edematous/invaded tissue, typically depicted by the abnormal hyper-intense signal in the T2-FLAIR volume. While the radiologic definition of tumor boundaries, especially in infiltrative tumors such as gliomas, presents a well-known challenge, this is less problematic in BMs, which typically have well-defined borders of the contrast-enhancing portion. In most cases, the boundaries of the contrast-enhancing region of the BM and the surrounding FLAIR hyperintense edema are well defined. One of the major challenges in segmenting BMs lies in the overlap of edema between multiple lesions, which is why the segmentation of ET is separated from WT and treated as distinct entities.
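For concreteness, these composite regions can be derived from a label map with a few lines of NumPy; the sketch below assumes the label convention defined above (1 = NETC, 2 = SNFH, 3 = ET):

```python
# Minimal sketch: boolean masks for the three evaluated regions, assuming the
# BraTS-METS label convention (1 = NETC, 2 = SNFH, 3 = ET).
import numpy as np

def regions_from_labels(seg: np.ndarray) -> dict:
    return {
        "WT": np.isin(seg, [1, 2, 3]),  # whole tumor: all three labels
        "TC": np.isin(seg, [1, 3]),     # tumor core: NETC + ET
        "ET": seg == 3,                 # enhancing tumor only
    }
```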

4.5. Annotation Pipeline

To ensure uniformity in data imaging and tumor labeling, we established a comprehensive annotation pipeline (Figure 3). This pipeline facilitates the development of accurate GT labels and is divided into five key stages: pre-segmentation, annotation refinement, technical quality control (QC), initial approval, and final approval.

Figure 3:

BraTS-METS 2023 annotation pipeline.

4.6. Pre-segmentation

The initial phase involved pre-segmenting imaging volumes using three distinct approaches:

  1. nnU-Net trained on the University of California, San Francisco BMs Stereotactic Radiosurgery (UCSF-BMSR) MRI Dataset (Rudie et al., 2024), which created the ET label; this was fused with predictions of NETC and SNFH from an nnU-Net trained on the pre-treatment BraTS 2021 glioma dataset.

  2. nnU-Net trained on AURORA multicenter study (Kaur et al., 2023), which creates SNFH and tumor core (ET + NETC) labels.

  3. nnU-Net trained on Heidelberg University Hospital dataset (Pflüger et al., 2022), which creates SNFH and tumor core labels.

The label fusion process varied for each label. SNFH (label 2) was fused using the STAPLE fusion algorithm to aggregate the segmentations from each automated segmentation algorithm, accounting for systematic errors (Warfield et al., 2004). ET (label 3) was fused using the minority voting algorithm to aggregate all enhancing tumor voxels identified by the automated segmentation algorithms, due to their varying accuracies in detecting small metastases. NETC (label 1) was only produced by the nnU-Net trained on UCSF-BMSR, as the algorithms trained on the AURORA and Heidelberg datasets only segment TC and SNFH. Therefore, NETC overlays both the ET and SNFH labels.
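A simplified sketch of these two fusion rules on binary masks is shown below. Note that STAPLE proper is an expectation-maximization estimate of per-model reliability (Warfield et al., 2004); the plain majority vote here is only a rough stand-in, while the minority-voting rule for ET reduces to a voxel-wise union:

```python
# Simplified sketch of the fusion rules described above, applied to binary
# masks from the three pre-segmentation models.
import numpy as np

def fuse_et(masks: list) -> np.ndarray:
    # Minority voting for ET: keep every voxel that at least one model labeled
    # enhancing tumor, since the models differ in sensitivity to small lesions.
    return np.logical_or.reduce(masks)

def fuse_snfh_majority(masks: list) -> np.ndarray:
    # Rough stand-in for STAPLE, which instead weights models by estimated
    # reliability: here a voxel is SNFH if a majority of models agree.
    return np.stack(masks).sum(axis=0) > len(masks) / 2
```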

4.7. Annotation Refinement and Initial Approval

All pre-segmentations from the three models, along with fused segmentations, were provided to the annotators. Subtraction images, in which the non-contrast T1 sequence is digitally subtracted from the post-contrast T1 sequence, were also provided to aid in the annotation refinement process. Annotations were performed by a diverse group of more than 150 student annotators and volunteer neuroradiology experts, under the supervision of annotator coordinators (A.J. and K.K.). Cases requiring re-annotation due to incompleteness were identified and returned for correction. During the process of annotation, the trainees participated in group reviews of cases, asked questions, and attended lectures by expert imagers. Completed student annotations were then reviewed by a pool of 52 experienced board-certified attending neuroradiologists (approvers) recruited by the American Society of Neuroradiology, ensuring quality control and uniformity with the SRI24 atlas standards.

Approvers reviewed the volunteer annotations and either approved the case or returned it to students for re-annotation. Additionally, a QC process was implemented, which included removing all random voxels and any voxels outside the brain mask, ensuring all images had the same parameters (space, orientation, and origin) as the SRI24 atlas, and verifying that all segmentations and segmentation masks were present in the folder with the original NIfTI images.
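A minimal sketch of these technical QC checks, assuming nibabel and hypothetical file paths, could look like this:

```python
# Hedged sketch of the QC checks described above: drop voxels outside the
# brain mask and confirm the segmentation's grid matches the SRI24 atlas
# (space, orientation, and origin). Paths and inputs are hypothetical.
import nibabel as nib
import numpy as np

def qc_segmentation(seg_path: str, atlas_path: str,
                    brain_mask: np.ndarray) -> np.ndarray:
    seg, atlas = nib.load(seg_path), nib.load(atlas_path)
    assert seg.shape == atlas.shape, "grid differs from the SRI24 atlas"
    assert np.allclose(seg.affine, atlas.affine), "orientation/origin mismatch"
    data = np.asanyarray(seg.dataobj).copy()
    data[~brain_mask] = 0  # remove stray voxels outside the brain
    return data
```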

4.8. Annotation Final Approval

Following refinement, each case underwent a secondary review by a different board-certified neuroradiologist from the approver pool, ensuring accurate metastasis segmentation and adherence to inclusion criteria. In cases of discrepancy, the second approvers made the necessary changes themselves without reverting to the trainees. Finally, a neuroradiologist (M.A.) with over 6 years of brain tumor expertise conducted a final dataset review, guaranteeing consistency across all annotations.

4.9. Common Errors of Automated Segmentations

Based on observations from previous BraTS challenges, common errors in automated segmentations were identified. The most typical errors in the current challenge included:

  1. Automated algorithms missing small metastases. Enhancing metastasis was fused using the minority voting algorithm to aggregate all enhancing tumor voxels identified by the three algorithms. However, many small metastases were missed and were manually segmented by neuroradiology attendings.

  2. Segmentation of white matter changes from microvascular disease. Peritumoral edema segmentations were checked by neuroradiology attendings and modified.

  3. The segmentation of non-enhancing lesions that have intrinsic T1 hyperintensity. Voxels with intrinsic T1 hyperintensity were manually removed from ET segmentations.

These insights led to specific adjustments in the annotation process to enhance accuracy.

4.10. Performance Evaluation Framework

Participants were offered a baseline approach implemented in the Generally Nuanced Deep Learning Framework (GaNDLF), a modular open-source framework maintained by the MLCommons organization. GaNDLF provides popular network architectures but also allows users to leverage the functionality of other libraries, such as Pillow and MONAI. Submissions were packaged in MLCube containers as described in the instructions provided on the Synapse platform. These submissions were registered to MLCommons' MedPerf, an open federated AI/ML evaluation platform. MedPerf automated the pipeline of running the participants' models on the evaluation datasets of each contributing site's data and calculating evaluation metrics on the resulting predictions. Finally, the Synapse platform retrieved the metrics results from the MedPerf server and ranked them to determine the winner.

Performance evaluation was based on Dice scores and 95% Hausdorff distance (HD95) for individual segmented lesions as defined by the three regions of interest: ET, TC, and WT. Given that BMs are often small, sometimes comprising only a few voxels, it was clinically significant to assess segmentation algorithms based on their capacity to accurately detect and delineate both small and large lesions. Teams were ranked based on a combination of lesion-wise Dice and Hausdorff distance scores across all evaluated test cases. False positives and false negatives were rigorously penalized, receiving a score of 0 for Dice and a fixed penalty of 374 for HD95. This methodical approach was uniformly applied across the three designated tissue classes, with subsequent aggregation of results by taking the mean score for each CaseID within each tissue category.

\[
\text{Lesion-wise Dice} = \frac{\sum_{i}^{L} \text{Dice}_{l_i}}{TP + FN + FP} \tag{1}
\]

\[
\text{Lesion-wise HD95} = \frac{\sum_{i}^{L} \text{HD95}_{l_i}}{TP + FN + FP} \tag{2}
\]

where \(L\) is the total number of GT lesions and \(TP\), \(FP\), and \(FN\) are the numbers of true positive, false positive, and false negative lesions, respectively.

All participants were evaluated and ranked using the same unseen testing data, which was not accessible to them. They were required to upload their containerized method to the evaluation platforms. The final top-ranked teams were announced at the 2023 Medical Image Computing and Computer Assisted Intervention Society (MICCAI) annual meeting, with monetary prizes awarded to the top-ranked teams in both tasks of the challenge.

For this challenge, each team was ranked relative to its competitors for each of the testing subjects, for each evaluated region (i.e., ET, TC, WT), and for each measure (i.e., Dice and Hausdorff). For example, each team was ranked for 59 subjects, for 3 regions, and for 2 metrics, which resulted in 59 × 3 × 2 = 354 individual rankings. The final ranking score (FRS) for each team was then calculated by first averaging across all these individual rankings for each patient (i.e., cumulative rank), and then averaging these cumulative ranks across all patients for each participating team. This ranking scheme has also been adopted in other challenges with satisfactory results, such as the Ischemic Stroke Lesion Segmentation challenge (Maier et al., 2017).
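As an illustration of this scheme (not the official scoring code; the array shapes are hypothetical), the rank aggregation can be sketched as:

```python
# Hedged sketch of the ranking scheme described above. metric_values has shape
# (teams, subjects, measures), where the measures axis spans the 3 regions x 2
# metrics; better_is_lower flags HD95-like measures (False for Dice).
import numpy as np
from scipy.stats import rankdata

def final_ranking_scores(metric_values: np.ndarray,
                         better_is_lower: np.ndarray) -> np.ndarray:
    # Orient every measure so that smaller is better, then rank the teams
    # (axis 0) independently for each subject and measure; rank 1 = best.
    oriented = np.where(better_is_lower, metric_values, -metric_values)
    ranks = rankdata(oriented, method="average", axis=0)
    cumulative = ranks.mean(axis=2)  # cumulative rank per team and subject
    return cumulative.mean(axis=1)   # FRS: mean cumulative rank per team
```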

We then conducted permutation testing to determine the statistical significance of the relative rankings between each pair of teams. This testing reflected differences in performance that exceeded those expected by chance. Specifically, for each team, we started with the list of observed subject-level cumulative ranks, i.e., the actual ranking described above. For each pair of teams, we randomly permuted the cumulative ranks for each subject 100,000 times. For each permutation, we calculated the difference in the FRS between the pair of teams. The proportion of times the difference in FRS calculated from the permuted data exceeded the observed difference in FRS (i.e., using the actual data) indicated the statistical significance of their relative rankings as a p-value. These values were reported in an upper triangular matrix, providing insight into statistically significant differences between each pair of participating teams.
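A paired permutation test of this kind can be sketched as follows, implemented here as a sign-flip of the per-subject rank differences (a simplified illustration, not the study's exact code):

```python
# Simplified illustration of the pairwise permutation test described above:
# swapping two teams' cumulative ranks on a random subset of subjects is
# equivalent to flipping the sign of those subjects' paired rank differences.
import numpy as np

def permutation_p_value(ranks_a: np.ndarray, ranks_b: np.ndarray,
                        n_perm: int = 100_000, seed: int = 0) -> float:
    rng = np.random.default_rng(seed)
    diffs = ranks_a - ranks_b          # per-subject paired rank differences
    observed = abs(diffs.mean())       # observed FRS difference
    signs = rng.choice([-1.0, 1.0], size=(n_perm, diffs.size))
    permuted = np.abs((signs * diffs).mean(axis=1))
    return float((permuted >= observed).mean())
```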

4.11. Analysis

The competition framework encompassed evaluations across three key regions: ET, TC, and WT, utilizing two primary metrics: lesion-wise Dice and lesion-wise HD95. These metrics have been developed primarily to evaluate the performance of models at the level of individual lesions, rather than on a whole-image basis. This approach ensured that our evaluation did not favor models that only captured large lesions, a limitation commonly observed with standard Dice scores. By assessing models on a lesion-by-lesion basis, we gained insights into their ability to segment all sizes of BMs accurately.

To implement this evaluation framework, we first isolated the lesion tissues (i.e., ET, TC, WT). We applied dilation to the GT labels for WT, TC, and ET to gauge each lesion's extent. This technique ensured that, during connected component analysis, small lesions adjacent to a primary lesion were not misclassified as separate entities. It is crucial to note that the GT labels remained unchanged throughout this process. We conducted a 26-connectivity connected component analysis on the predicted labels and compared each component to the corresponding GT label on a component-by-component basis. We calculated the Dice and HD95 scores individually for each lesion (or component), assigning the aforementioned penalties to all false positives and false negatives. Subsequently, we computed the mean score for each specific case.

Acknowledging the variability in lesion significance arising from human error, a volumetric threshold of 2 voxels (2 mm3) was established by an expert panel of clinical radiologists, below which the models' performance on lesions deemed "small/false" was not considered in the evaluation. This approach was primarily adopted to ensure that participants were not unfairly penalized for stray voxels in the GT labels, which may result from human error, or for small lesions unrelated to the pathology central to the challenge. The expert panel also determined the dilation factor, which was uniformly applied for combining lesions in the GT masks. A dilation factor of 1 voxel in 3D space was chosen because BMs can be small, and it is important to avoid combining these small BMs.
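The mechanics of this evaluation (26-connectivity components, 1-voxel dilation for lesion matching, the 2-voxel threshold, and the fixed penalties for false positives and negatives) can be sketched as follows for the Dice metric; this is a simplified illustration, and the official implementation is linked below:

```python
# Simplified sketch of the lesion-wise Dice evaluation described above; the
# HD95 analog assigns the fixed 374 penalty to unmatched lesions instead of 0.
import numpy as np
from scipy import ndimage

def lesion_wise_dice(gt: np.ndarray, pred: np.ndarray, min_vox: int = 2) -> float:
    struct = np.ones((3, 3, 3), dtype=bool)  # 26-connectivity
    gt_lbl, n_gt = ndimage.label(gt, structure=struct)
    pr_lbl, n_pr = ndimage.label(pred, structure=struct)

    scores, matched = [], set()
    for g in range(1, n_gt + 1):
        g_mask = gt_lbl == g
        if g_mask.sum() < min_vox:       # ignore sub-threshold GT components
            continue
        # match predicted components that overlap the 1-voxel-dilated lesion;
        # the GT mask itself stays unchanged for the Dice computation
        dilated = ndimage.binary_dilation(g_mask, structure=struct)
        hits = set(np.unique(pr_lbl[dilated])) - {0}
        if not hits:
            scores.append(0.0)           # false negative penalty
            continue
        matched |= hits
        p_mask = np.isin(pr_lbl, list(hits))
        scores.append(2.0 * (g_mask & p_mask).sum() / (g_mask.sum() + p_mask.sum()))

    scores.extend([0.0] * (n_pr - len(matched)))  # false positive penalties
    return float(np.mean(scores)) if scores else 1.0
```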

The code and detailed information on the lesion-wise evaluation metrics can be found here 1.

4.12. Dataset

Multiple datasets were contributed by individual institutions and were in various stages of annotation and approval (Figure 4).

Figure 4:

Map of institutions that expressed interest in contributing data to the BraTS-METS challenge.

5. Results

5.1. Dataset Sources

Our annotation and approval pipeline, as previously described, was applied to datasets from a variety of institutions, including New York University (NYU), Yale University, Washington University, Cairo University (CairoU), Duke University, and the University of Missouri. The annotated NYU dataset is uniquely hosted on the NYU website (access to the data can be requested by filling out the form)2, separate from the public BraTS repository. As for the UCSF dataset, synthetic T2 images were generated and shared on the UCSF website3. The Stanford University dataset, despite being publicly available, was not incorporated into our primary dataset due to the lack of T2 image sequences. These datasets were available and optional for additional training. For logistical reasons, the UCSF, Stanford, and NYU datasets were excluded from the validation and test phases of our project.

In all, 2712 cases were received from various institutions, of which 1303 cases from eight institutions were reviewed. After 337 cases were excluded, the remaining cases were allocated into the training (n = 402, plus n = 474 optional cases from the UCSF and Stanford datasets, for 876 training cases in total), validation (n = 31), and testing (n = 59) groups (Table 2). All the source institutions were located in the United States, except for one in Egypt.

Table 2:

Dataset sources in the BraTS-METS 2023 challenge. In the training dataset, 474 cases from UCSF and Stanford were included as optional because they did not have original T2-weighted images.

Dataset Source Total cases reviewed Excluded Training Validation Test

Duke 37 0 26 4 7
CairoU 45 10 32 1 2
Missouri 25 3 16 2 4
WashU 40 1 27 4 8
Yale 225 30 137 20 38
NYU* 221 57 164 0 0
UCSF^ 560 236 324 0 0
Stanford^ 150 0 150 0 0

Total 1,303 337 402 (474 optional) 31 59
*

The NYU dataset is part of the official challenge. Because it is hosted on a separate website, it is not included in the validation or test set.

^

UCSF and Stanford datasets are not part of the official challenge. Both datasets are provided as optional training sets.

5.2. Lesion Characteristics

Table 3 provides a detailed overview of lesion count and sizes across the different dataset groups used in the BraTS-METS 2023 challenge. These data demonstrate the variation in lesion count and size across the dataset groups.

Table 3:

Lesion count and sizes for each dataset group.

Dataset Group ET lesion-count (total) ET lesion-count median (IQR) ET lesion-size median (IQR), mm3 WT lesion-count (total) WT lesion-count median (IQR) WT lesion-size median (IQR), mm3

Training* (n = 402) 3076 3 (7) 65 (287) 2618 3 (5) 121 (804)
Validation (n = 31) 139 3 (4) 141 (664) 119 3 (3) 591 (3318)
Testing (n = 59) 218 2 (3) 132 (613) 193 2 (3) 322 (8624)
*

The training group does not include the optional UCSF and Stanford datasets.

5.3. Performance Analysis

Table 4 provides the relative ranking for each team. Team NVAUTO ranked first in the challenge, with an average rank across subjects (lesion-wise mean) of 7.9 and a patient-wise mean of 0.38. Team SY placed second with a patient-wise mean of 0.41 across all patients. The supplementary material depicts the pitfall cases, with figures illustrating the false positives or missed lesions.

Table 4:

Top-performing teams ranking with cumulative ranks across subjects. Lower scores indicate better performance.

Team Name Cumulative ranks across subjects Lesion-wise mean Rank

NVAUTO 466 7.9 1
SY 503 8.5 2
blackbean 571.5 9.7 3
CNMCPMI2023 689 11.7 4
isahajmistry 817 13.8 5
DeepRadOnc 907.5 15.4 6
MIASINTEF 1002 17 7

Figure 5 provides a patient-wise comparison of segmentation performance across the different participating teams. The boxplots reflect the distribution of each team's lesion-wise ranking per patient case across all cases within the test dataset, with lower values signifying better performance. The teams NVAUTO, SY, and blackbean showed notably better median rankings, alongside relatively narrow interquartile ranges (IQR). Conversely, DeepRadOnc displayed a wider IQR.

Figure 5:

BraTS-METS 2023 boxplots of LesionWise ranking across patients for all participating teams on the BraTS 2023 test set (lower is better).

A description of the algorithms used by the top four winning teams is shown in Table 5.

Table 5:

Description of algorithms used by the top 4 winning teams.

Team Name & DL algorithm Description
NVAUTO (SegResNet from MONAI Auto3DSeg) ▪ MONAI native (uses transforms, loaders, losses, networks components of MONAI)
▪ 4-channel input, which is a concatenation of four different MRI scans
▪ Input data is normalized to have zero mean and unit standard deviation for each channel.
▪ Employs random cropping to a fixed size of 224×224×144 voxels
▪ AdamW optimizer with a learning rate of 2e-4 is used in combination with a cosine annealing scheduler
▪ Model is trained for a range of 300 to 1000 epochs, using 5-fold cross-validation
▪ A combined Dice-Focal loss function is utilized for training
▪ Data augmentation techniques include spatial transformations (random rotations, scaling, flips) and intensity modifications (random adjustments to intensity/contrast, addition of noise, and blur)
▪ Code reference: GitHub - MONAI and SegResNetDS
SY (3D TransUNet Model (Chen et al., 2023a)) ▪ 3D nnUNet as the CNN Encoder + Decoder
▪ 12-layer ViT as the Transformer Encoder with ImageNet pretrained weights
▪ A hybrid loss function consisting of pixel-wise cross entropy loss and dice loss
▪ Pre-train the transformer blocks using Masked Autoencoder (He et al., 2022)
▪ Code reference: 3D TransUNet Model
blackbean (STU-Net) ▪ A scalable and transferable version of nnUNet
▪ Larger input patch size: 160 × 160 × 160
▪ Poly decay policy
▪ Code reference: STU-NET and nnUNetV1
CNMCPMI2023 (Label-wise model ensemble approach) ▪ nnU-Net and Swin UNETR CNN + ViT
▪ Outputs of these networks are then subjected to a non-linear function
▪ Processed outputs are combined through model ensembling to create ensembled predictions
▪ Label-wise post-processing is then applied to these ensembled predictions to produce the final predictions for each label

5.4. Detailed Performance by Tumor Entities

Table 6 delineates the comparative performance of each participating team’s Dice scores for each tumor entity (i.e., ET, TC, and WT). The team NVAUTO secured the top rank across all categories, exhibiting a mean Dice score of 0.60 for ET, 0.65 for TC, and 0.62 for WT. Notably, SY and blackbean shared the second rank in the ET segmentation, with a mean of 0.57. Figures 6, 7, and 8 further highlight the lesion-wise Dice scores (shown as panels A) and HD95 (shown as panels B) for each participating team for each tumor entity.

Table 6:

Teams’ Dice scores, reported as mean ± standard deviation (median), and ranking based on individual tumor entities.

Team Name ET Dice score Rank TC Dice score Rank WT Dice score Rank

NVAUTO 0.60 ± 0.24 (0.58) 1 0.65 ± 0.25 (0.60) 1 0.62 ± 0.24 (0.61) 1
SY 0.57 ± 0.28 (0.57) 2 0.62 ± 0.29 (0.64) 2 0.60 ± 0.29 (0.61) 2
blackbean 0.57 ± 0.26 (0.58) 2 0.61 ± 0.28 (0.58) 3 0.57 ± 0.28 (0.57) 4
CNMCPMI2023 0.55 ± 0.28 (0.64) 4 0.60 ± 0.30 (0.69) 4 0.58 ± 0.29 (0.64) 3
isahajmistry 0.49 ± 0.29 (0.44) 5 0.53 ± 0.29 (0.49) 5 0.48 ± 0.27 (0.43) 5
DeepRadOnc 0.39 ± 0.31 (0.39) 6 0.43 ± 0.36 (0.43) 6 0.40 ± 0.31 (0.41) 7
MIASINTEF 0.39 ± 0.29 (0.39) 6 0.43 ± 0.31 (0.44) 6 0.43 ± 0.32 (0.43) 6

Figure 6:

BraTS-METS 2023 boxplots of enhancing tumor Dice scores (A) and 95% Hausdorff distance (HD95) (B) for all participating teams on the BraTS 2023 test set.

Figure 7:

BraTS-METS 2023 boxplots of tumor core Dice scores (A) and 95% Hausdorff distance (HD95) (B) for all participating teams on the BraTS 2023 test set.

Figure 8:

BraTS-METS 2023 boxplots of whole tumor Dice scores (A) and 95% Hausdorff distance (HD95) (B) for all participating teams on the BraTS 2023 test set.

Figure 9 illustrates a comparative evaluation across the three tumor regions of interest where performance of the segmentation models is quantified using three metrics: lesion detection rate, sensitivity, and positive predictive value (PPV). The lesion detection rate was led by NVAUTO with rates of 76% for ET, 78% for TC, and 80% for WT. Closely following were blackbean and SY, with both achieving a 75% detection rate for ET and TC, and 76% and 72% for WT, respectively. In terms of sensitivity, NVAUTO again showed superior performance, with 90% for ET, 91% for TC, and 90% for WT, reflecting a high true positive rate. blackbean and SY exhibited comparably high sensitivity, around 89–90% across tumor entities. PPV results depicted NVAUTO at the forefront with 82% for ET, 84% for TC, and 84% for WT. Following suit, blackbean maintained a PPV of 79% across all tumor entities, and SY showcased a slightly lower yet robust PPV performance with 76%.

Figure 9:

Performance metrics across tumor entities—whole tumor (WT), tumor core (TC), and enhancing tumor (ET).

5.5. Algorithm Sensitivity to Lesion Size

Figure 10 provides insight into the models’ performance in segmenting lesions of different sizes. This was analyzed by calculating a running average within an expanding window of tumor volume, starting with only the smallest tumors and progressively including larger lesions (Kelahan et al., 2022).
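A minimal sketch of this expanding-window computation, with hypothetical per-lesion inputs, is:

```python
# Minimal sketch of the expanding-window analysis described above: sort
# lesions by volume, then take the running mean of a per-lesion metric as
# progressively larger lesions are included.
import numpy as np

def cumulative_average_by_volume(volumes: np.ndarray, scores: np.ndarray):
    order = np.argsort(volumes)  # smallest lesions first
    running = np.cumsum(scores[order]) / np.arange(1, len(scores) + 1)
    return volumes[order], running  # x: lesion volume, y: running mean metric
```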

Figure 10:

BraTS-METS 2023 plot of cumulative average of (A) Dice scores, (B) 95% Hausdorff distance (HD95), and (C) lesion detection rate as a function of increasing lesion volume.

The graphs collectively indicate that segmentation algorithm performance diminishes as tumor size decreases, with all teams facing challenges in maintaining high Dice scores and lesion detection rates for smaller tumors. The HD95 data suggest that algorithms struggled with precision in delineating the contours of smaller lesions, reflected in greater distances from the ground truth, a trend particularly noticeable for tumors less than 100 mm3 in volume. Despite these challenges, NVAUTO consistently outperformed its counterparts.

6. Discussion

The use of machine learning in medical imaging has brought notable improvements in detecting and segmenting BMs. Clinical evaluation of BMs has unique complexity because it requires volumetric measurements and organization of lesions to provide granular details on individual lesion treatment history and to assess treatment response. The presence of BMs is often a prognostic indicator of poor outcome in patients with metastatic disease, significantly changing treatment options and impacting patient survival (Jekel et al., 2022a; Chen et al., 2023b; Ottesen et al., 2023). The 2023 BraTS-METS challenge has significantly driven forward the development of algorithms designed to manage the complex task of BMs segmentation. These algorithms provide clinicians with better tools to measure tumor volumes accurately, which is crucial for both treatment planning and patient outcomes. The varying performance among the participating teams underlines the inherent complexity of tumor segmentation in diverse datasets. This diversity in results particularly highlights the difficulty algorithms face in consistently identifying and accurately segmenting small metastases, which remains a significant hurdle in the literature, in clinical practice, and for BraTS-METS challenge participants. The assessment metric utilized in the BraTS-METS 2023 challenge penalizes false negatives and false positives; this yields overall low Dice coefficients but optimizes for the selection of algorithms that will be easily translated into diverse clinical practices. The performance trends observed in the challenge demonstrate that while some progress has been made, the precise detection of small metastases continues to be the principal challenge, limiting the overall effectiveness of current models. Enhancing the sensitivity and specificity of these models for small lesion detection is crucial, as this would lead to significant improvements in diagnostic accuracy and clinical outcomes. Improving sensitivity to small metastases will likely require both larger sample sizes and novel network architectures or loss functions that focus on lesion-wise detection, as currently employed loss functions are optimized toward voxel-wise performance.

While multiple algorithms have shown promise in accurately segmenting BMs with high Dice scores (Dikici et al., 2020, 2022; Charron et al., 2018; Bousabarah et al., 2020), a critical limitation remains in their ability to detect very small lesions, i.e., under 5 mm in size. Accurately identifying and quantifying every lesion, regardless of size, is paramount for effective therapeutic planning and prognosis assessment. Fairchild et al. (2024) retrospectively investigated BMs that were missed on initial MRIs, despite meeting diagnostic criteria, and were detected only on subsequent imaging in patients undergoing repeat SRS courses (Fairchild et al., 2024). The radiographic evidence of these metastases could often be spotted in earlier scans, suggesting potential for improved early detection and treatment planning. This issue is particularly pronounced for lesions under 3 mm, which may go untreated initially, only to become apparent on future imaging (Fairchild et al., 2023).

The heterogeneity in the appearance of BMs—ranging from multiple small lesions to solitary large lesions with varying degrees of edema—presents unique challenges in their detection and management. Our review of the challenge outcomes shows that Team NVAUTO achieved the highest scores, with a mean lesion-wise Dice score of 0.60 to 0.65 across different tumor entities. While these results place them at the forefront, the scores also highlight that there is considerable potential for further advancements. The close performance of teams like SY and blackbean illustrates the competitive nature of the field and emphasizes the need for ongoing improvements in precision, especially for smaller and more challenging lesions.

It is essential to highlight how various models developed for the 2023 BraTS-METS challenge handled the segmentation of these critical, small lesions. Our analysis of model performance across different lesion sizes revealed significant variations in how these models managed lesion detection and characterization. For instance, NVAUTO exhibited exceptional performance across all lesion sizes, particularly with smaller lesions, surpassing the overall performance of many other models in the challenge. These findings underscore the necessity for continuous improvement in the algorithms' sensitivity to tumor size variations, which is crucial for ensuring that all lesions, particularly the smaller and potentially more elusive ones, are accurately identified and appropriately managed in clinical settings.

In the realm of targeted therapies, such as radiation, precision in lesion segmentation directly influences treatment efficacy, as determining lesion sizes influences SRS dose. For example, lesions up to 20 mm may receive up to 24 Gy, which is adjusted based on the lesion’s diameter to prevent severe neurotoxicity (Shaw et al., 2000). Misidentifying or overlooking even a single small lesion can lead to inadequate treatment coverage, potentially resulting in suboptimal patient outcomes and increased recurrence rates (Kaal et al., 2005; Zindler et al., 2014). This underscores the necessity for advancements in diagnostic imaging techniques and highlights the critical role of machine learning technologies in achieving high precision in BMs detection and segmentation. In turn, these algorithms have the potential to significantly impact treatment response assessments and improve workflow efficiencies in clinical practice.

Accurate detection and precise quantification of lesion volumes are critical for determining patient prognosis. Prior research has shown that the GTV of metastatic disease within the brain significantly impacts patient survival, particularly when deciding between equivalent treatment options such as surgery and radiotherapy (Routman et al., 2018; Krist et al., 2022). This precise volume measurement helps clinicians choose the most appropriate therapeutic approach, ensuring that treatments like SRS or invasive surgical interventions are tailored to the patient's specific disease burden.

The ability to assess the GTV of BMs at diagnosis is crucial for patient outcomes, and accurately tracking changes in lesion volumes and perilesional edema over time is essential for informed decision-making in the post-treatment setting (Jalalifar et al., 2023). Treatments for brain metastatic disease rely on targeted approaches such as SRS, hypofractionated stereotactic radiation therapy (HFSRT), and hippocampal-avoidance whole-brain radiotherapy, with conventional whole-brain radiation therapy (WBRT) used less often because of neurotoxicity concerns. These techniques are particularly beneficial for patients with multiple metastases, even more than 50, and rely heavily on precise volumetric localization of each metastasis (Simon et al., 2022). Unlike WBRT, which uses a 2D plan and does not require detailed localization, SRS and HFSRT involve complex 3D planning to accurately target each lesion. Furthermore, the dynamic nature of these metastases, with some increasing in size transiently before decreasing or resolving and others possibly representing radiation necrosis or recurrence, underscores the need for reliable monitoring of metastasis size relative to treatment timing (Wang et al., 2023a). This ongoing surveillance of the contrast-enhancing component and peritumoral edema is vital to differentiate active disease from treatment effects, thereby guiding the adjustment of therapeutic strategies (Kaur et al., 2023; Jekel et al., 2022a).

A significant challenge in creating large open science datasets involves safeguarding patient privacy and securing sensitive data (Vahdati et al., 2024; Shaw et al., 2024; Wang et al., 2024; Gichoya et al., 2023; Davis et al., 2024). This can be addressed through robust security measures, such as de-identification by skull stripping or defacing of MRI scans to remove identifiable facial features. Moreover, fostering a culture of sharing and collaboration is essential for the broad applicability of these algorithms across institutions. It is vital to balance the promotion of open science with patient safety, as this balance will drive future advancements in medical image analysis. The focus on open science not only broadens access to data but also introduces challenges in data handling and annotation, particularly for complex cases like BMs.
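As a sketch of the masking step only: once a brain mask has been produced by an extraction tool (e.g., the neural-network approach of Isensee et al., 2019, cited in the references), de-identification can amount to zeroing every voxel outside that mask so facial features are discarded. Producing the mask itself is out of scope here, and the code is an illustration, not the challenge's de-identification pipeline.

```python
# Zero out all voxels outside a precomputed brain mask so the saved image
# carries no facial features; the mask comes from a separate extraction tool.
import nibabel as nib
import numpy as np


def apply_brain_mask(image_path: str, mask_path: str, out_path: str) -> None:
    img = nib.load(image_path)
    mask = np.asanyarray(nib.load(mask_path).dataobj) > 0
    data = np.asanyarray(img.dataobj) * mask
    nib.save(nib.Nifti1Image(data, img.affine, img.header), out_path)
```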

In the inaugural 2023 BraTS-METS challenge, a significant hurdle was the preparation of BM datasets with expert-approved lesion annotations. Unlike other brain tumors such as glioblastomas or meningiomas, BMs display significant phenotypic variability and are often characterized by multiple synchronous lesions. This variability and multiplicity greatly complicate the annotation process, extending the time required from a few minutes to several hours depending on the number and complexity of lesions.

To address this, we introduced an innovative educational approach to annotation that not only facilitates the development of high-quality annotated datasets but also serves as a learning platform for annotators. This strategy involves a comprehensive educational series on BM imaging, basic MRI physics, and the principles of open science. The approach emphasizes deliberate practice (Mitchell and Boyer, 2020), in which student annotators engage deeply with the material through practical experience, reinforced by weekly hands-on sessions with experts in brain tumor imaging and a structured curriculum. This method not only accelerates the learning curve but also instills a thorough comprehension of diverse BM presentations, turning the annotation process into a valuable educational experience and creating a rich training resource for future professionals. Additionally, the curriculum includes detailed discussions of other brain abnormalities, such as microvascular white matter damage, microbleeds, and the different stages of hemorrhage, further enriching annotators' understanding and their ability to annotate complex imaging datasets.

While our approach faced challenges due to the heterogeneity of the contributed datasets, this diversity is reflective of real-world clinical environments where algorithms must perform effectively across a wide range of data variations. Many cases were excluded from the analysis due to resection cavities, post-treatment changes, or the absence of brain parenchymal metastases. Inadequate skull stripping sometimes led to the inadvertent removal of metastases or failure to detect them, complicating accurate data interpretation. Furthermore, skull stripping can make it difficult to describe and differentiate dural-based lesions, such as metastases and meningiomas, and limits the evaluation of osseous metastases to the calvarium.

Another source of heterogeneity was differences in data acquisition, patient motion, protocols, slice thickness, and contrast injection timing, all of which can lead to misregistration between sequences. The impact of slice thickness on lesion detectability is particularly important when targeting subcentimeter metastases. For example, the RANO high-grade glioma criteria specify lesion visibility on two contiguous 5 mm thick slices, underscoring the importance of image resolution (Wen et al., 2023). During manual segmentation, challenges arose when matching sequences acquired with varying 2D and 3D techniques, highlighting disparities in slice thickness and voxel size. In some instances, the co-registration of images appeared misaligned, potentially affecting the precision of segmentations. To address some of these issues, all images were standardized by registration to the common SRI24 atlas (Rohlfing et al., 2010), promoting greater uniformity and adherence to the consensus brain tumor imaging protocol. This not only helped mitigate the variation introduced by different imaging protocols but also enhanced the general applicability and effectiveness of the developed algorithms. These limitations contribute to data heterogeneity, which has both positive and negative implications: while it complicates the development of a uniform segmentation algorithm, it also provides the diverse data needed to build generalizable models.
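For intuition, a rigid registration to an atlas can be expressed in a few lines of SimpleITK. The sketch below is an assumed, simplified stand-in for the actual BraTS preprocessing pipeline (which used its own tooling, e.g., CaPTk/FeTS, cited in the references), combining a mutual-information metric with a gradient-descent optimizer.

```python
# Simplified rigid registration of a scan to an atlas (e.g., SRI24) with
# SimpleITK; not the challenge's actual preprocessing pipeline.
import SimpleITK as sitk


def register_to_atlas(moving_path: str, atlas_path: str, out_path: str) -> None:
    atlas = sitk.ReadImage(atlas_path, sitk.sitkFloat32)
    moving = sitk.ReadImage(moving_path, sitk.sitkFloat32)

    reg = sitk.ImageRegistrationMethod()
    reg.SetMetricAsMattesMutualInformation(numberOfHistogramBins=32)
    reg.SetOptimizerAsRegularStepGradientDescent(
        learningRate=1.0, minStep=1e-4, numberOfIterations=200)
    reg.SetOptimizerScalesFromPhysicalShift()
    reg.SetInterpolator(sitk.sitkLinear)
    reg.SetInitialTransform(sitk.CenteredTransformInitializer(
        atlas, moving, sitk.Euler3DTransform(),
        sitk.CenteredTransformInitializerFilter.GEOMETRY))

    transform = reg.Execute(atlas, moving)
    # Resample the moving image onto the atlas grid.
    resampled = sitk.Resample(moving, atlas, transform, sitk.sitkLinear, 0.0)
    sitk.WriteImage(resampled, out_path)
```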

While standardized brain tumor imaging protocols (BTIP) have been proposed and are increasingly used in clinical trials, improving the consistency of image acquisition, significant variability in imaging protocols remains across imaging practices (Ellingson et al., 2021, 2015; Kaufmann et al., 2020). Broader implementation of standardized imaging protocols ensures consistency in the acquisition and interpretation of neuro-oncological images, which is crucial for comparing outcomes across studies and improving the reliability of lesion measurement across institutions.

The complexity of annotating ground truth data for BMs represented yet another challenge in this year's BraTS-METS challenge, largely due to the typically small size of BMs and their frequent occurrence in large numbers within a single scan. Annotator fatigue is a notable concern, as the meticulous nature of the task can lead to errors or oversights. Throughout the annotation process, numerous instances necessitated segmentation revisions, as exemplified by the initial work on the Yale BM dataset by a medical student, which later required refinement by experienced neuroradiologists (Kaur et al., 2023; Cassinelli Petersen et al., 2022; Jekel et al., 2022a; Ramakrishnan et al., 2023). The need for such revisions became particularly apparent when the dataset, along with its segmentations, was integrated into the BraTS challenge and adapted to a new atlas. This process often revealed previously unnoticed small lesions or inaccuracies in the depiction of necrotic tumor portions and peritumoral edema on FLAIR images. These experiences demonstrate the necessity of a robust ground-truth (i.e., reference standard) process that incorporates human-in-the-loop refinement and consensus techniques such as STAPLE to ensure the highest data integrity (Warfield et al., 2004). The iterative nature of these annotations underscores the need for multiple rounds of review to ensure accuracy and for standardized annotation practices that facilitate more efficient data usage. To foster continual improvement and address any discrepancies, we encourage participants to engage actively with the challenge organizers, who are prepared to update and refine the segmentation data as necessary to maintain the integrity and utility of the dataset.
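For readers unfamiliar with STAPLE, the fusion step itself is available in standard tooling; a minimal sketch using SimpleITK's implementation might look like the following, where the 0.5 probability threshold is an assumed, common choice rather than a challenge-mandated one.

```python
# Fuse several raters' binary masks into a consensus with STAPLE
# (Warfield et al., 2004) via SimpleITK; the 0.5 probability threshold
# is an assumed choice.
import SimpleITK as sitk


def staple_consensus(mask_paths: list[str], out_path: str) -> None:
    masks = [sitk.ReadImage(p, sitk.sitkUInt8) for p in mask_paths]
    staple = sitk.STAPLEImageFilter()
    staple.SetForegroundValue(1.0)
    probabilities = staple.Execute(masks)  # per-voxel consensus probability
    consensus = sitk.BinaryThreshold(probabilities, lowerThreshold=0.5,
                                     upperThreshold=1.0, insideValue=1,
                                     outsideValue=0)
    sitk.WriteImage(consensus, out_path)
```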

7. Conclusion

In the inaugural 2023 BraTS-METS challenge, we addressed both technical and practical challenges in establishing datasets, high-quality reference-standard annotations, and assessment metrics for the development and application of machine learning algorithms for BM segmentation by challenge participants. The challenge has highlighted the critical need for algorithms capable of detecting even the smallest lesions, which are often overlooked due to human error or obscured by the limitations of imaging data. This task is complicated by the necessity of balancing the high sensitivity required for detection with the need to minimize false positives that can disrupt clinical workflows. The development of refined segmentation algorithms that effectively balance sensitivity with specificity is therefore essential. By drawing on multi-institutional datasets, the BraTS-METS challenge has been instrumental in advancing these developments, pushing forward the creation of models that are robust and adaptable across varied clinical environments. This approach improves the precision of these algorithms and strengthens their practical applicability, ensuring they can meet the nuanced demands of real-world medical practice. As we continue to refine these technologies, our goal remains to enhance the accuracy of diagnosis and treatment planning, ultimately improving patient management and outcomes in the challenging arena of brain metastasis treatment.

Supplementary Material

Supplement 1

Acknowledgments

The success of any challenge in the medical domain depends upon the quality of well-annotated multi-institutional datasets. We are grateful to all the data contributors, annotators, and approvers for their time and efforts. We are grateful to the institutions that contributed directly and indirectly to resources for the development of the databases. We are also grateful to individual companies that assisted in the development of datasets, such as Visage Imaging in the development of the Yale BM dataset.

S. Bakas and U. Baid conducted part of the work reported in this manuscript at their current affiliation, as well as while they were affiliated with the Center for Artificial Intelligence and Data Science for Integrated Diagnostics (AI2D) and the Center for Biomedical Image Computing and Analytics (CBICA), Perelman School of Medicine at the University of Pennsylvania, Philadelphia.

M. Aboian conducted part of the work reported in this manuscript at her current affiliation, as well as while she was affiliated with Yale University School of Medicine, New Haven, CT.

We thank Victoria Ramirez (Department of Radiology, Children’s Hospital of Philadelphia) for her efforts in reviewing the manuscript.

We thank Ananya Purwar for her technical support in editing the LaTeX formatting for this work.

Funding

Research reported in this publication was partly supported by the National Cancer Institute (NCI) of the National Institutes of Health (NIH) under award numbers U01CA242871 and R21CA259964. The research was also supported by the Yale Department of Radiology and by the Children's Hospital of Philadelphia (CHOP) Department of Radiology. The content of this publication is the sole responsibility of the authors and does not represent the official views of the NIH.

Footnotes

Conflicts of Interest

No conflicts of interest to disclose.

Ethical Standards

The work follows appropriate ethical standards in conducting research and writing the manuscript, following all applicable laws and regulations regarding treatment of animals or human subjects.

Data availability

The data provided for the challenge are available on the Challenge Page Link. All analyses will be shared via Box upon request.

References

  1. Bousabarah Khaled, Ruge Maximilian, Brand Julia-Sarita, Hoevels Mauritius, Rueß Daniel, Borggrefe Jan, Hokamp Nils Große, Visser-Vandewalle Veerle, Maintz David, Treuer Harald, et al. Deep convolutional neural networks for automated segmentation of brain metastases trained on clinical data. Radiation Oncology, 15:1–9, 2020.
  2. Buchner Josef A, Peeken Jan C, Etzel Lucas, Ezhov Ivan, Mayinger Michael, Christ Sebastian M, Brunner Thomas B, Wittig Andrea, Menze Bjoern H, Zimmer Claus, et al. Identifying core MRI sequences for reliable automatic brain metastasis segmentation. Radiotherapy and Oncology, 188:109901, 2023.
  3. Petersen Gabriel Cassinelli, Bousabarah Khaled, Verma Tej, von Reppert Marc, Jekel Leon, Gordem Ayyuce, Jang Benjamin, Merkaj Sara, Fadel Sandra Abi, Owens Randy, et al. Real-time PACS-integrated longitudinal brain metastasis tracking tool provides comprehensive assessment of treatment response to radiosurgery. Neuro-Oncology Advances, 4(1):vdac116, 2022.
  4. Charron Odelin, Lallement Alex, Jarnet Delphine, Noblet Vincent, Clavier Jean-Baptiste, and Meyer Philippe. Automatic detection and segmentation of brain metastases on multimodal MR images with a deep convolutional neural network. Computers in Biology and Medicine, 95:43–54, 2018.
  5. Chen Jieneng, Mei Jieru, Li Xianhang, Lu Yongyi, Yu Qihang, Wei Qingyue, Luo Xiangde, Xie Yutong, Adeli Ehsan, Wang Yan, et al. 3D TransUNet: Advancing medical image segmentation through vision transformers. arXiv preprint arXiv:2310.07781, 2023a.
  6. Chen Mingming, Guo Yujie, Wang Pengcheng, Chen Qi, Bai Lu, Wang Shaobin, Su Ya, Wang Lizhen, and Gong Guanzhong. An effective approach to improve the automatic segmentation and classification accuracy of brain metastasis by combining multi-phase delay enhanced MR images. Journal of Digital Imaging, 36(4):1782–1793, 2023b.
  7. Chen Victor Eric, Kim Minchul, Nelson Nicolas, Kim Inkyu Kevin, and Shi Wenyin. Cost-effectiveness analysis of 3 radiation treatment strategies for patients with multiple brain metastases. Neuro-Oncology Practice, 10(4):344–351, 2023c.
  8. Cho Se Jin, Sunwoo Leonard, Baik Sung Hyun, Bae Yun Jung, Choi Byung Se, and Kim Jae Hyoung. Brain metastasis detection using machine learning: a systematic review and meta-analysis. Neuro-Oncology, 23(2):214–225, 2021.
  9. Dang NP, Noid G, Liang Y, Bovi JA, Bhalla M, and Li A. Automated brain metastasis detection and segmentation using deep-learning method. International Journal of Radiation Oncology, Biology, Physics, 114(3):e50, 2022.
  10. Davis Melissa A, Wu Ona, Ikuta Ichiro, Jordan John E, Johnson Michele H, and Quigley Edward. Understanding bias in artificial intelligence: A practice perspective. American Journal of Neuroradiology, 45(4):371–373, 2024.
  11. Dikici Engin, Ryu John L, Demirer Mutlu, Bigelow Matthew, White Richard D, Slone Wayne, Erdal Barbaros Selnur, and Prevedello Luciano M. Automated brain metastases detection framework for T1-weighted contrast-enhanced 3D MRI. IEEE Journal of Biomedical and Health Informatics, 24(10):2883–2893, 2020.
  12. Dikici Engin, Nguyen Xuan V, Bigelow Matthew, Ryu John L, and Prevedello Luciano M. Advancing brain metastases detection in T1-weighted contrast-enhanced 3D MRI using noisy student-based training. Diagnostics, 12(8):2023, 2022.
  13. Ellingson Benjamin M, Bendszus Martin, Boxerman Jerrold, Barboriak Daniel, Erickson Bradley J, Smits Marion, Nelson Sarah J, Gerstner Elizabeth, Alexander Brian, Goldmacher Gregory, et al. Consensus recommendations for a standardized brain tumor imaging protocol in clinical trials. Neuro-Oncology, 17(9):1188–1198, 2015.
  14. Ellingson Benjamin M, Brown Matthew S, Boxerman Jerrold L, Gerstner Elizabeth R, Kaufmann Timothy J, Cole Patricia E, Bacha Jeffrey A, Leung David, Barone Amy, Colman Howard, et al. Radiographic read paradigms and the roles of the central imaging laboratory in neuro-oncology clinical trials. Neuro-Oncology, 23(2):189–198, 2021.
  15. Fairchild Andrew, Salama Joseph K, Godfrey Devon, Wiggins Walter F, Ackerson Bradley G, Oyekunle Taofik, Niedzwiecki Donna, Fecci Peter E, Kirkpatrick John P, and Floyd Scott R. Incidence and imaging characteristics of difficult to detect retrospectively identified brain metastases in patients receiving repeat courses of stereotactic radiosurgery. Journal of Neuro-Oncology, pages 1–9, 2024.
  16. Fairchild Andrew T, Salama Joseph K, Wiggins Walter F, Ackerson Bradley G, Fecci Peter E, Kirkpatrick John P, Floyd Scott R, and Godfrey Devon J. A deep learning-based computer-aided detection (CAD) system for difficult-to-detect brain metastases. International Journal of Radiation Oncology, Biology, Physics, 115(3):779–793, 2023.
  17. Ghesu Florin C, Georgescu Bogdan, Mansoor Awais, Yoo Youngjin, Neumann Dominik, Patel Pragneshkumar, Vishwanath Reddappagari Suryanarayana, Balter James M, Cao Yue, Grbic Sasa, et al. Contrastive self-supervised learning from 100 million medical images with optional supervision. Journal of Medical Imaging, 9(6):064503, 2022.
  18. Gichoya Judy Wawira, Thomas Kaesha, Celi Leo Anthony, Safdar Nabile, Banerjee Imon, Banja John D, Seyyed-Kalantari Laleh, Trivedi Hari, and Purkayastha Saptarshi. AI pitfalls and what not to do: mitigating bias in AI. The British Journal of Radiology, 96(1150):20230023, 2023.
  19. Greenspan Hayit, Van Ginneken Bram, and Summers Ronald M. Guest editorial deep learning in medical imaging: Overview and future promise of an exciting new technique. IEEE Transactions on Medical Imaging, 35(5):1153–1159, 2016.
  20. Grøvik Endre, Yi Darvin, Iv Michael, Tong Elizabeth, Rubin Daniel, and Zaharchuk Greg. Deep learning enables automatic detection and segmentation of brain metastases on multisequence MRI. Journal of Magnetic Resonance Imaging, 51(1):175–182, 2020.
  21. He Kaiming, Chen Xinlei, Xie Saining, Li Yanghao, Dollár Piotr, and Girshick Ross. Masked autoencoders are scalable vision learners. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16000–16009, 2022.
  22. Isensee Fabian, Schell Marianne, Pflueger Irada, Brugnara Gianluca, Bonekamp David, Neuberger Ulf, Wick Antje, Schlemmer Heinz-Peter, Heiland Sabine, Wick Wolfgang, et al. Automated brain extraction of multisequence MRI using artificial neural networks. Human Brain Mapping, 40(17):4952–4964, 2019.
  23. Jalalifar Seyed Ali, Soliman Hany, Sahgal Arjun, and Sadeghi-Naini Ali. Automatic assessment of stereotactic radiation therapy outcome in brain metastasis using longitudinal segmentation on serial MRI. IEEE Journal of Biomedical and Health Informatics, 2023.
  24. Jekel Leon, Bousabarah Khaled, Lin MingDe, Merkaj Sara, Kaur Manpreet, Avesta Arman, Aneja Sanjay, Omuro Antonio, Chiang Veronica, Scheffler Björn, et al. NIMG-02. PACS-integrated auto-segmentation workflow for brain metastases using nnU-Net. Neuro-Oncology, 24(Supplement 7):vii162, 2022a.
  25. Jekel Leon, Brim Waverly R, von Reppert Marc, Staib Lawrence, Petersen Gabriel Cassinelli, Merkaj Sara, Subramanian Harry, Zeevi Tal, Payabvash Seyedmehdi, Bousabarah Khaled, et al. Machine learning applications for differentiation of glioma from brain metastasis: a systematic review. Cancers, 14(6):1369, 2022b.
  26. Jeong Hana, Park Ji Eun, Kim NakYoung, Yoon Shin-Kyo, and Kim Ho Sung. Deep learning-based detection and quantification of brain metastases on black-blood imaging can provide treatment suggestions: a clinical cohort study. European Radiology, 34(3):2062–2071, 2024.
  27. Juluru Krishna, Siegel Eliot, and Mazura Jan. Identification from MRI with face-recognition software. The New England Journal of Medicine, 382(5):489–490, 2020.
  28. Kaal Evert CA, Niël Charles GJH, and Vecht Charles J. Therapeutic management of brain metastasis. The Lancet Neurology, 4(5):289–298, 2005.
  29. Kanakarajan Hemalatha, De Baene Wouter, Hanssens Patrick, and Sitskoorn Margriet. Fully automated brain metastases segmentation using T1-weighted contrast-enhanced MR images before and after stereotactic radiosurgery. medRxiv, 2023.
  30. Kaufmann Timothy J, Smits Marion, Boxerman Jerrold, Huang Raymond, Barboriak Daniel P, Weller Michael, Chung Caroline, Tsien Christina, Brown Paul D, Shankar Lalitha, et al. Consensus recommendations for a standardized brain tumor imaging protocol for clinical trials in brain metastases. Neuro-Oncology, 22(6):757–772, 2020.
  31. Kaur Manpreet, Petersen Gabriel Cassinelli, Jekel Leon, von Reppert Marc, Varghese Sunitha, de Oliveira Santo Irene Dixe, Avesta Arman, Aneja Sanjay, Omuro Antonio, Chiang Veronica, et al. PACS-integrated tools for peritumoral edema volumetrics provide additional information to RANO-BM-based assessment of lung cancer brain metastases after stereotactic radiotherapy: A pilot study. Cancers, 15(19):4822, 2023.
  32. Kelahan Linda C, Kim Donald, Soliman Moataz, Avery Ryan J, Savas Hatice, Agrawal Rishi, Magnetta Michael, Liu Benjamin P, and Velichko Yuri S. Role of hepatic metastatic lesion size on inter-reader reproducibility of CT-based radiomics features. European Radiology, 32(6):4025–4033, 2022.
  33. Krist David T, Naik Anant, Thompson Charee M, Kwok Susanna S, Janbahan Mika, Olivero William C, and Hassaneen Wael. Management of brain metastasis. Surgical resection versus stereotactic radiotherapy: a meta-analysis. Neuro-Oncology Advances, 4(1):vdac033, 2022.
  34. Le Rhun E, Guckenberger Matthias, Smits Marion, Dummer Reinhard, Bachelot Thomas, Sahm Felix, Galldiks Norbert, de Azambuja Evandro, Berghoff Anna Sophie, Metellus Philippe, et al. EANO–ESMO clinical practice guidelines for diagnosis, treatment and follow-up of patients with brain metastasis from solid tumours. Annals of Oncology, 32(11):1332–1347, 2021.
  35. Liew Andrea, Lee Chun Cheng, Subramaniam Valarmathy, Lan Boon Leong, and Tan Maxine. Gradual self-training via confidence and volume-based domain adaptation for multi-dataset deep learning-based brain metastases detection using nonlocal networks on MRI images. Journal of Magnetic Resonance Imaging, 57(6):1728–1740, 2023.
  36. Lin Nancy U, Lee Eudocia Q, Aoyama Hidefumi, Barani Igor J, Barboriak Daniel P, Baumert Brigitta G, Bendszus Martin, Brown Paul D, Camidge D Ross, Chang Susan M, et al. Response assessment criteria for brain metastases: proposal from the RANO group. The Lancet Oncology, 16(6):e270–e278, 2015.
  37. Maier Oskar, Menze Bjoern H, Von der Gablentz Janina, Häni Levin, Heinrich Mattias P, Liebrand Matthias, Winzeck Stefan, Basit Abdul, Bentley Paul, Chen Liang, et al. ISLES 2015 - a public evaluation benchmark for ischemic stroke lesion segmentation from multispectral MRI. Medical Image Analysis, 35:250–269, 2017.
  38. Mi Honglan, Yuan Mingyuan, Suo Shiteng, Cheng Jiejun, Li Suqin, Duan Shaofeng, and Lu Qing. Impact of different scanners and acquisition parameters on robustness of MR radiomics features based on women's cervix. Scientific Reports, 10(1):20407, 2020.
  39. Minniti Giuseppe, Clarke Enrico, Lanzetta Gaetano, Osti Mattia Falchetto, Trasimeni Guido, Bozzao Alessandro, Romano Andrea, and Enrici Riccardo Maurizi. Stereotactic radiosurgery for brain metastases: analysis of outcome and risk of brain radionecrosis. Radiation Oncology, 6:1–9, 2011.
  40. Mitchell Sally A and Boyer Tanna J. Deliberate practice in medical simulation. 2020.
  41. Najjar Reabal. Redefining radiology: a review of artificial intelligence integration in medical imaging. Diagnostics, 13(17):2760, 2023.
  42. Nayak Lakshmi, Lee Eudocia Quant, and Wen Patrick Y. Epidemiology of brain metastases. Current Oncology Reports, 14:48–54, 2012.
  43. Ocaña-Tienda Beatriz, Pérez-Beteta Julián, Villanueva-García José D, Romero-Rosales José A, Molina-García David, Suter Yannick, Asenjo Beatriz, Albillo David, de Mendivil Ana Ortiz, Pérez-Romasanta Luis A, et al. A comprehensive dataset of annotated brain metastasis MR images with clinical and radiomic data. Scientific Data, 10(1):208, 2023.
  44. Oermann Eric, Link Katherine, Schnurman Zane, Liu Chris, Kwon Young Joon Fred, Jiang Lavender Yao, Nasir-Moin Mustafa, Neifert Sean, Alzate Juan, Bernstein Kenneth, et al. Longitudinal deep neural networks for assessing metastatic brain cancer on a massive open benchmark. 2023.
  45. Ottesen Jon André, Yi Darvin, Tong Elizabeth, Iv Michael, Latysheva Anna, Saxhaug Cathrine, Jacobsen Kari Dolven, Helland Åslaug, Emblem Kyrre Eeg, Rubin Daniel L, et al. 2.5D and 3D segmentation of brain metastases with deep learning on multinational MRI data. Frontiers in Neuroinformatics, 16:1056068, 2023.
  46. Pati Sarthak, Singh Ashish, Rathore Saima, Gastounioti Aimilia, Bergman Mark, Ngo Phuc, Ha Sung Min, Bounias Dimitrios, Minock James, Murphy Grayson, et al. The Cancer Imaging Phenomics Toolkit (CaPTk): technical overview. In Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries: 5th International Workshop, BrainLes 2019, Held in Conjunction with MICCAI 2019, Shenzhen, China, October 17, 2019, Revised Selected Papers, Part II 5, pages 380–394. Springer, 2020.
  47. Pati Sarthak, Baid Ujjwal, Edwards Brandon, Sheller Micah J, Foley Patrick, Reina G Anthony, Thakur Siddhesh, Sako Chiharu, Bilello Michel, Davatzikos Christos, et al. The Federated Tumor Segmentation (FeTS) tool: an open-source solution to further solid tumor research. Physics in Medicine & Biology, 67(20):204002, 2022.
  48. Percy Alan K, Elveback Lila R, Okazaki Haruo, and Kurland Leonard T. Neoplasms of the central nervous system: epidemiologic considerations. Neurology, 22(1):40, 1972.
  49. Pflüger Irada, Wald Tassilo, Isensee Fabian, Schell Marianne, Meredig Hagen, Schlamp Kai, Bernhardt Denise, Brugnara Gianluca, Heußel Claus Peter, Debus Juergen, et al. Automated detection and quantification of brain metastases on clinical MRI data using artificial neural networks. Neuro-Oncology Advances, 4(1):vdac138, 2022.
  50. Pinto-Coelho Luís. How artificial intelligence is shaping medical imaging technology: A survey of innovations and applications. Bioengineering, 10(12):1435, 2023.
  51. Posner JB. Intracranial metastases from systemic cancer. Advances in Neurology, 19:579–592, 1978.
  52. Qian Jack M, Mahajan Amit, Yu James B, Tsiouris A John, Goldberg Sarah B, Kluger Harriet M, and Chiang Veronica LS. Comparing available criteria for measuring brain metastasis response to immunotherapy. Journal of Neuro-Oncology, 132:479–485, 2017.
  53. Ramakrishnan Divya, Jekel Leon, Chadha Saahil, Janas Anastasia, Moy Harrison, Maleki Nazanin, Sala Matthew, Kaur Manpreet, Petersen Gabriel Cassinelli, Merkaj Sara, et al. A large open access dataset of brain metastasis 3D segmentations with clinical and imaging feature information. ArXiv, 2023.
  54. Rathore Saima, Bakas Spyridon, Pati Sarthak, Akbari Hamed, Kalarot Ratheesh, Sridharan Patmaa, Rozycki Martin, Bergman Mark, Tunc Birkan, Verma Ragini, et al. Brain Cancer Imaging Phenomics Toolkit (brain-CaPTk): an interactive platform for quantitative analysis of glioblastoma. In Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries: Third International Workshop, BrainLes 2017, Held in Conjunction with MICCAI 2017, Quebec City, QC, Canada, September 14, 2017, Revised Selected Papers 3, pages 133–145. Springer, 2018.
  55. Rohlfing Torsten, Zahr Natalie M, Sullivan Edith V, and Pfefferbaum Adolf. The SRI24 multichannel atlas of normal adult human brain structure. Human Brain Mapping, 31(5):798–819, 2010.
  56. Routman David M, Bian Shelly X, Diao Kevin, Liu Jonathan L, Yu Cheng, Ye Jason, Zada Gabriel, and Chang Eric L. The growing importance of lesion volume as a prognostic factor in patients with multiple brain metastases treated with stereotactic radiosurgery. Cancer Medicine, 7(3):757–764, 2018.
  57. Rudie Jeffrey D, Saluja Rachit, Weiss David A, Nedelec Pierre, Calabrese Evan, Colby John B, Laguna Benjamin, Mongan John, Braunstein Steve, Hess Christopher P, et al. The University of California San Francisco Brain Metastases Stereotactic Radiosurgery (UCSF-BMSR) MRI dataset. Radiology: Artificial Intelligence, page e230126, 2024.
  58. Schnurman Zane, Mashiach Elad, Link Katherine E, Donahue Bernadine, Sulman Erik, Silverman Joshua, Golfinos John G, Oermann Eric Karl, and Kondziolka Douglas. Causes of death in patients with brain metastases. Neurosurgery, 2022.
  59. Schwarz Christopher G, Kremers Walter K, Therneau Terry M, Sharp Richard R, Gunter Jeffrey L, Vemuri Prashanthi, Arani Arvin, Spychalla Anthony J, Kantarci Kejal, Knopman David S, et al. Identification of anonymous MRI research participants with face-recognition software. New England Journal of Medicine, 381(17):1684–1686, 2019.
  60. Shaw Edward, Scott Charles, Souhami Luis, Dinapoli Robert, Kline Robert, Loeffler Jay, and Farnan Nancy. Single dose radiosurgical treatment of recurrent previously irradiated primary brain tumors and brain metastases: final report of RTOG protocol 90-05. International Journal of Radiation Oncology, Biology, Physics, 47(2):291–298, 2000.
  61. Shaw James, Ali Joseph, Atuire Caesar A, Cheah Phaik Yeong, Español Armando Guio, Gichoya Judy Wawira, Hunt Adrienne, Jjingo Daudi, Littler Katherine, Paolotti Daniela, et al. Research ethics and artificial intelligence for global health: perspectives from the Global Forum on Bioethics in Research. BMC Medical Ethics, 25(1):46, 2024.
  62. Simon Mihály, Papp Judit, Csiki Emese, and Kovács Árpád. Plan quality assessment of fractionated stereotactic radiotherapy treatment plans in patients with brain metastases. Frontiers in Oncology, 12:846609, 2022.
  63. Tabouret Emeline, Chinot Olivier, Metellus Philippe, Tallet Agnes, Viens Patrice, and Goncalves Anthony. Recent trends in epidemiology of brain metastases: an overview. Anticancer Research, 32(11):4655–4662, 2012.
  64. Tang Xiaoli. The role of artificial intelligence in medical imaging research. BJR|Open, 2(1):20190031, 2019.
  65. Vahdati Sanaz, Khosravi Bardia, Mahmoudi Elham, Zhang Kuan, Rouzrokh Pouria, Faghani Shahriar, Moassefi Mana, Tahmasebi Aylin, Andriole Katherine P, Chang Peter, et al. A guideline for open-source tools to make medical imaging data ready for artificial intelligence applications: A Society of Imaging Informatics in Medicine (SIIM) survey. Journal of Imaging Informatics in Medicine, pages 1–10, 2024.
  66. Vogelbaum Michael A, Brown Paul D, Messersmith Hans, Brastianos Priscilla K, Burri Stuart, Cahill Dan, Dunn Ian F, Gaspar Laurie E, Gatson Na Tosha N, Gondi Vinai, et al. Treatment for brain metastases: ASCO-SNO-ASTRO guideline, 2022.
  67. Wang Jen-Yeu, Qu Vera, Hui Caressa, Sandhu Navjot, Mendoza Maria G, Panjwani Neil, Chang Yu-Cheng, Liang Chih-Hung, Lu Jen-Tang, Wang Lei, et al. Stratified assessment of an FDA-cleared deep learning algorithm for automated detection and contouring of metastatic brain tumors in stereotactic radiosurgery. Radiation Oncology, 18(1):61, 2023a.
  68. Wang Ryan, Kuo Po-Chih, Chen Li-Ching, Seastedt Kenneth Patrick, Gichoya Judy Wawira, and Celi Leo Anthony. Drop the shortcuts: image augmentation improves fairness and decreases AI detection of race and other demographics from medical images. EBioMedicine, 102, 2024.
  69. Wang Yibin, Duggar William Neil, Caballero David Michael, Thomas Toms Vengaloor, Adari Neha, Mundra Eswara Kumar, and Wang Haifeng. A brain MRI dataset and baseline evaluations for tumor recurrence prediction after Gamma Knife radiotherapy. Scientific Data, 10(1):785, 2023b.
  70. Warfield Simon K, Zou Kelly H, and Wells William M. Simultaneous truth and performance level estimation (STAPLE): an algorithm for the validation of image segmentation. IEEE Transactions on Medical Imaging, 23(7):903–921, 2004.
  71. Wen Patrick Y, van den Bent Martin, Youssef Gilbert, Cloughesy Timothy F, Ellingson Benjamin M, Weller Michael, Galanis Evanthia, Barboriak Daniel P, de Groot John, Gilbert Mark R, et al. RANO 2.0: update to the response assessment in neuro-oncology criteria for high- and low-grade gliomas in adults. Journal of Clinical Oncology, 41(33):5187–5199, 2023.
  72. Xue Jie, Wang Bao, Ming Yang, Liu Xuejun, Jiang Zekun, Wang Chengwei, Liu Xiyu, Chen Ligang, Qu Jianhua, Xu Shangchen, et al. Deep learning-based detection and segmentation-assisted management of brain metastases. Neuro-Oncology, 22(4):505–514, 2020.
  73. Yoo SK, Kim TH, Kim HJ, Yoon HI, and Kim JS. Deep learning-based automatic detection and segmentation of brain metastases for stereotactic ablative radiotherapy using black-blood magnetic resonance imaging. International Journal of Radiation Oncology, Biology, Physics, 114(3):e558, 2022.
  74. Yoo Youngjin, Zhao Gengyan, Sandu Andreea E, Re Thomas J, Das Jyotipriya, Wang Hesheng, Kim Michelle, Shen Colette, Lee Yueh, Kondziolka Douglas, et al. The importance of data domain on self-supervised learning for brain metastasis detection and segmentation. In Medical Imaging 2023: Computer-Aided Diagnosis, volume 12465, pages 556–562. SPIE, 2023.
  75. Zhang Min, Young Geoffrey S, Chen Huai, Li Jing, Qin Lei, McFaline-Figueroa J Ricardo, Reardon David A, Cao Xinhua, Wu Xian, and Xu Xiaoyin. Deep-learning detection of cancer metastases to the brain on MRI. Journal of Magnetic Resonance Imaging, 52(4):1227–1236, 2020.
  76. Zhou Zijian, Sanders Jeremiah W, Johnson Jason M, Gule-Monroe Maria K, Chen Melissa M, Briere Tina M, Wang Yan, Son Jong Bum, Pagel Mark D, Li Jing, et al. Computer-aided detection of brain metastases in T1-weighted MRI for stereotactic radiosurgery using deep learning single-shot detectors. Radiology, 295(2):407–415, 2020.
  77. Zindler Jaap D, Slotman Ben J, and Lagerwaard Frank J. Patterns of distant brain recurrences after radiosurgery alone for newly diagnosed brain metastases: Implications for salvage therapy. Radiotherapy and Oncology, 112(2):212–216, 2014.
  78. Ziyaee Hamidreza, Cardenas Carlos E, Yeboa D Nana, Li Jing, Ferguson Sherise D, Johnson Jason, Zhou Zijian, Sanders Jeremiah, Mumme Raymond, Court Laurence, et al. Automated brain metastases segmentation with a deep dive into false-positive detection. Advances in Radiation Oncology, 8(1):101085, 2023.
