Abstract
Resection and whole brain radiotherapy (WBRT) are the standards of care for the treatment of patients with brain metastases (BM) but are often associated with cognitive side effects. Stereotactic radiosurgery (SRS) involves a more targeted treatment approach and has been shown to avoid the side effects associated with WBRT. However, SRS requires precise identification and delineation of BM. While many AI algorithms have been developed for this purpose, their clinical adoption has been limited due to poor model performance in the clinical setting. A major reason these algorithms fail to generalize is the limitations of the datasets used to train them. The purpose of this study was to create a large, heterogeneous, annotated BM dataset for training and validation of AI models to improve generalizability. We present a BM dataset of 200 patients with pretreatment T1, T1 post-contrast, T2, and FLAIR MR images. The dataset includes contrast-enhancing and necrotic 3D segmentations on T1 post-contrast and whole tumor (including peritumoral edema) 3D segmentations on FLAIR. Our dataset contains 975 contrast-enhancing lesions, many of which are sub-centimeter, along with clinical and imaging feature information. We used a streamlined approach to database-building leveraging a PACS-integrated segmentation workflow.
Background & Summary
Brain metastases (BM) develop in up to 30–40% of patients with a primary malignancy, particularly those with lung cancer, breast cancer, and melanoma.1,2 Palliative treatment for BM includes resection, whole brain radiotherapy (WBRT), and, more recently, stereotactic radiosurgery (SRS).1 Although WBRT can reduce the neurological symptoms of BM, overall survival has been shown to be decreased in patients with certain risk factors, including older age, lower baseline cognitive performance status, and >3 BM.3,4 SRS provides a more targeted and less toxic approach to BM treatment than WBRT and can be performed when patients present with >10 lesions, although its predominant use is still in the treatment of localized metastatic disease.5,6 In fact, one meta-analysis revealed a significant improvement in performance status and local control in patients treated with WBRT plus SRS compared to WBRT alone.7 Localization and accurate delineation of BM margins are critical for effective SRS treatment.8 In addition, differentiation of BM from high-grade gliomas, such as glioblastoma, can be challenging, and textural analysis of the peritumoral environment on T2/FLAIR sequences can aid in differentiating these tumor types.9
To address the challenge of BM diagnosis and delineation, several artificial intelligence (AI) tools, including machine learning (ML) and deep learning (DL) algorithms, have been developed in the past decade.8,10–14 While many of these algorithms showed promising results in BM diagnosis and auto-segmentation, there is still a large gap in the clinical implementation and adoption of these algorithms.12,15 One reason for this gap is the lack of algorithm generalizability to real-world datasets. Many algorithms are trained and developed on small single-institution datasets that lack the diversity in patient populations and imaging protocols encountered in the clinical setting.12 For example, one meta-analysis of BM algorithms revealed that the average sample size of datasets used to train algorithms was around 150, with half of the studies explicitly including patients with only solitary BM.12 Thus, there is a critical need for large, diverse, and open-access datasets to better train AI algorithms and to challenge AI models to perform accurate assessments on a wide range of patient cases.12 To date, there are only two publicly available BM datasets, both of which contain under 200 patients with pretreatment segmentations solely on T1 post-contrast.16,17
We curated a dataset of 200 patients with a clinical or pathological diagnosis of BM with accompanying clinical and qualitative/quantitative imaging data. In addition to enhancing tumor 3D segmentations, our dataset also provides 3D segmentations of necrotic tumor portions on T1 post-contrast and whole tumor (including peritumoral edema) on FLAIR. Our dataset includes several sub-centimeter contrast-enhancing lesions, which are critical for training algorithms to recognize subtle lesions on imaging. Manual 3D tumor segmentations using a commercially available semi-automatic segmentation tool were performed in a novel workflow directly in a research instance of our PACS (AI Accelerator, Visage Imaging, Inc., San Diego, CA),18 which allowed for the creation and validation of segmentations in an accelerated time frame. We are in the process of making our dataset publicly available with all tumor segmentations (contrast-enhancing, necrotic, and peritumoral edema), standard MRI sequences (T1, T1 post-contrast, T2, and FLAIR), and an Excel file containing clinical information and qualitative/quantitative imaging features. We hope that our dataset contributes to the training and validation of future BM AI algorithms with the goal of their implementation, translation, and adoption in clinical practice for BM diagnosis and treatment.
Methods
Subject characteristics.
Patients were identified by querying the Yale New Haven Hospital (YNHH) database from 2013 to 2021, the YNHH tumor board registry in 2021, and the YNHH Gamma Knife registry from 2017 to 2021. Inclusion criteria were a clinical or pathological diagnosis of brain metastasis confirmed in the electronic medical record and availability of all four pretreatment standard MRI sequences (T1, T1 post-contrast, T2, and FLAIR) without significant motion artifact. A total of 200 patients were included in the dataset. Primary tumor origins were as follows: non-small cell lung cancer (86, 43%), melanoma (41, 20.5%), breast cancer (26, 13%), small cell lung cancer (17, 8.5%), renal cell carcinoma (16, 8%), and gastrointestinal cancers (14, 7%).
Image acquisition.
A summary of all imaging parameters for FLAIR and T1 post-contrast images of the 200 patients can be found in Table 1. The images were obtained on 1-T (4, 2%), 1.5-T (113, 56.5%), and 3-T (83, 41.5%) MRI scanners. Scanner vendors included Siemens (158, 79%), General Electric (31, 15.5%), Philips (7, 3.5%), and Hitachi (4, 2%).
Table 1:
Imaging Parameter | FLAIR | T1 post-contrast |
---|---|---|
Acquisition (n, %) | 2D (193, 96.5%); 3D (5, 2.5%); N/A (2, 1%) | 2D (32, 16%); 3D (166, 83%); N/A (2, 1%) |
Median (range) echo time (msec) | 92.0 (10.0 – 400.0) | 3.1 (1.8 – 26.1) |
Median (range) repetition time (msec) | 9000.0 (1700.0 – 12000.0) | 1900.0 (5.9 – 2619.8) |
Median (range) slice thickness (mm) | 5.0 (1.0 – 5.5) | 1.0 (0.9 – 5.0) |
Median (range) slice spacing (mm) | 5.0 (0.0 – 7.5) | 0.0 (0.0 – 7.0) |
N/A = not available; range = minimum – maximum
Segmentation procedure.
The DICOM studies for all 200 patients were sent from the clinical production PACS (Visage 7, Visage Imaging, Inc., San Diego, CA) to a research instance of our PACS and de-identified upon receipt. To streamline the segmentation workflow, a custom hanging protocol and eight-viewer layout were designed to automatically 3D register and display the relevant MR imaging sequences upon study load.18,19 Manual segmentations were performed by one medical student (L.J.) on the research PACS using a commercially available semi-automatic 3D segmentation tool (Fig 1).18
The segmentations were checked and manually revised as needed by two board-certified neuroradiologists (M.S.A. and F.M.) with more than seven years of clinical experience each. Contrast-enhancing lesions and necrotic portions were segmented on T1 post-contrast (Fig 2A). In total, there were 975 contrast-enhancing lesions among all patients, 285 of which had necrotic components. Whole tumor (including peritumoral edema) was segmented on FLAIR (Fig 2B). A total of 662 lesions had peritumoral edema surrounding the contrast enhancement. Notably, because a 3D registration of the various MR imaging sequences was performed using the custom hanging protocol, the segmentation masks could be accurately copied and pasted between MR imaging sequences.18
Clinical data and anonymization.
Clinical data for all patients were collected from the electronic medical record. They include the following: age at diagnosis, sex, ethnicity, smoking history in pack-years, primary tumor origin, presence of extranodal metastasis, and time to death or last note in the electronic medical record as of July 2022. The following qualitative/quantitative imaging features were included: presence of infratentorial involvement, total number of lesions (contrast-enhancing, necrotic, and peritumoral edema), total volume of all regions (contrast-enhancing, necrotic, and peritumoral edema), ratio of necrotic to contrast-enhancing volume, and ratio of peritumoral edema to contrast-enhancing volume.
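The text does not state how the per-patient lesion counts were derived. As a rough illustration only, one common way to approximate a lesion count from a binary 3D mask is connected-component labeling. The sketch below uses plain Python on a nested-list mask and assumes 26-connectivity; the function name `count_lesions` and the connectivity choice are assumptions for illustration, not part of the described workflow.

```python
from collections import deque
from itertools import product


def count_lesions(mask):
    """Count 26-connected components of nonzero voxels in a 3D nested-list mask."""
    nz, ny, nx = len(mask), len(mask[0]), len(mask[0][0])
    # All 26 neighbor offsets: every combination of -1/0/1 except the origin.
    offsets = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]
    seen = set()
    n_lesions = 0
    for start in product(range(nz), range(ny), range(nx)):
        z, y, x = start
        if not mask[z][y][x] or start in seen:
            continue
        # Unvisited foreground voxel: start a new component and flood-fill it.
        n_lesions += 1
        seen.add(start)
        queue = deque([start])
        while queue:
            cz, cy, cx = queue.popleft()
            for dz, dy, dx in offsets:
                n = (cz + dz, cy + dy, cx + dx)
                inside = 0 <= n[0] < nz and 0 <= n[1] < ny and 0 <= n[2] < nx
                if inside and n not in seen and mask[n[0]][n[1]][n[2]]:
                    seen.add(n)
                    queue.append(n)
    return n_lesions
```

For full-resolution volumes, an array-based routine such as `scipy.ndimage.label` would be far faster. Touching lesions would be merged into one component under this approach, so counts obtained this way may differ from counts made by a human reader.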
De-identification was implemented on the research server and occurred directly upon receipt of the DICOM images from either the PACS production system or the long-term archive. No non-anonymized images were stored on the research server. The de-identification removes/modifies all metadata that have identifiable information according to the DICOM standard PS3.15 2018b Appendix E “Attribute Confidentiality Profiles”. Specifically, the “Basic Profile” combined with the “Clean Descriptors Option”, the “Clean Structured Content Option” and the “Retain Longitudinal Temporal Information with Modified Dates Option” were implemented. The PatientID, Accession number, and StudyInstanceUID were removed and replaced with a computed unique ID that is calculated using hash functions and a hash key. While this process is not reversible, it does guarantee that, if another study for the same patient is sent through the pipeline later, those new objects are assigned to the same patient on the research server, unless the hash key in the pipeline is changed. Likewise, additional images/series for the same study would be assigned to the same de-identified study. The MR images and 3D segmentation masks were exported as NIfTI files from the research server using the Python Visage application program interface (API). All images were skull stripped to maintain patient anonymity prior to publication of the dataset.
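The keyed-hash replacement of identifiers described above can be sketched as follows. The source does not name the hash function, the key handling, or the output format, so HMAC-SHA256, the truncation length, and the function name `pseudonymize` are all assumptions made for illustration. The sketch only shows the property the text relies on: the same input and key always map to the same pseudonym, and changing the key changes the mapping.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, key: bytes, length: int = 16) -> str:
    """Map an identifier to a stable pseudonym using a keyed hash.

    Assumption: HMAC-SHA256 with a secret key; the actual pipeline's
    hash function and ID format are not specified in the text.
    """
    digest = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()
    # Truncate for readability; a shorter ID raises the (small) collision risk.
    return digest[:length]


# Hypothetical key and identifier, for illustration only.
secret_key = b"example-key-kept-off-the-research-server"
first = pseudonymize("MRN0001", secret_key)
later = pseudonymize("MRN0001", secret_key)  # a later study, same patient
```

Because the mapping depends on the secret key, the pseudonym cannot be recomputed without it, and a later study from the same patient lands under the same research ID as long as the key is unchanged. A real DICOM pipeline would also need to emit syntactically valid UIDs for fields such as StudyInstanceUID, which this sketch does not attempt.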
Ethical approval.
The study was conducted according to the guidelines of the Declaration of Helsinki and approved by the Institutional Review Board of Yale University, protocol 2000029055, approved on 10/01/2020.
Data Records
The data records are in the process of being published on The Cancer Imaging Archive (TCIA) collections. Each patient has a total of five associated NIfTI files with four image files of the standard sequences (T1 pre-contrast, T1 post-contrast, T2, and FLAIR) and a fifth segmentation file with combined masks from T1 post-contrast and FLAIR segmentations. The segmentation file has three labels: Label 1 (red) represents tumor necrosis, Label 2 (green) represents peritumoral edema, and Label 3 (blue) represents contrast-enhancing tumor. All sequences and segmentations for each patient were exported from research PACS in NIfTI format, co-registered to the SRI24 anatomical template, resampled to a uniform isotropic resolution (1 mm³), and skull stripped. The dataset also contains one Excel file with clinical and qualitative/quantitative imaging feature information for each patient. The patients are labeled with anonymized identifiers.
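As a minimal sketch of how the three labels might be turned into volumes and volume ratios, the code below counts voxels per label in a flattened label array and multiplies by the voxel volume (1 mm³ after resampling). The function names are hypothetical. Whether the values in the Excel file were computed exactly this way, and whether the edema volume there refers to label 2 alone or to the whole-tumor mask, is not stated in the text, so treat the ratio definitions here as assumptions.

```python
from collections import Counter

# Label values as described for the segmentation file.
NECROSIS, EDEMA, ENHANCING = 1, 2, 3


def label_volumes_mm3(labels, voxel_volume_mm3=1.0):
    """Return {label: volume in mm^3} for the three tumor labels."""
    counts = Counter(int(v) for v in labels)
    return {lab: counts.get(lab, 0) * voxel_volume_mm3
            for lab in (NECROSIS, EDEMA, ENHANCING)}


def volume_ratios(volumes):
    """Necrosis/enhancing and edema/enhancing ratios; None if no enhancing voxels."""
    enhancing = volumes[ENHANCING]
    if enhancing == 0:
        return {"necrosis_to_enhancing": None, "edema_to_enhancing": None}
    return {"necrosis_to_enhancing": volumes[NECROSIS] / enhancing,
            "edema_to_enhancing": volumes[EDEMA] / enhancing}
```

The `labels` argument is any flat iterable of label values. With a third-party NIfTI reader such as nibabel (not mentioned in the text), it could be obtained with something like `nibabel.load(path).get_fdata().ravel()`.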
Technical Validation
All patients had brain metastases and primary tumor of origin confirmed either pathologically or clinically through the electronic medical record. In addition, only patients with high-quality T1, T1 post-contrast, T2, and FLAIR images without significant motion artifacts were included in the final dataset. All segmentations were independently validated by two neuroradiologists (M.S.A. and F.M.) with more than seven years of clinical experience each. After export to NIfTI format, the standard sequences and segmentation files for all patients were opened in ITK-SNAP and adjusted by a neuroradiologist (M.S.A.).
Usage Notes
After completion of the data upload process, the NIfTI files can be downloaded from TCIA public collections (https://www.cancerimagingarchive.net/) and opened in any segmentation platform that supports the NIfTI format.
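For users who want to sanity-check a downloaded file before loading it into a viewer, the sketch below reads a few fields from a NIfTI-1 header using only the Python standard library. It assumes the files follow the NIfTI-1 layout (348-byte header) and may be gzip-compressed; the actual file names and whether the release uses `.nii` or `.nii.gz` are not known from the text, and the function names are hypothetical.

```python
import gzip
import struct


def parse_nifti1_header(raw: bytes) -> dict:
    """Parse a few fields from the first 348 bytes of a NIfTI-1 file."""
    if len(raw) < 348:
        raise ValueError("a NIfTI-1 header needs at least 348 bytes")
    # sizeof_hdr (offset 0) must equal 348; use it to detect byte order.
    for endian in ("<", ">"):
        if struct.unpack_from(endian + "i", raw, 0)[0] == 348:
            break
    else:
        raise ValueError("not a NIfTI-1 header")
    dim = struct.unpack_from(endian + "8h", raw, 40)     # dim[0] = number of dims
    pixdim = struct.unpack_from(endian + "8f", raw, 76)  # pixdim[1:4] = voxel size
    return {
        "shape": tuple(dim[1:1 + dim[0]]),
        "voxel_size": tuple(pixdim[1:4]),  # units are given by xyzt_units
        "datatype": struct.unpack_from(endian + "h", raw, 70)[0],
        "magic": raw[344:348],
    }


def read_nifti1_header(path: str) -> dict:
    """Read the header from a .nii or .nii.gz file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as f:
        return parse_nifti1_header(f.read(348))
```

Given the preprocessing described in Data Records, one would expect every image and its segmentation for a patient to report the same `shape` and a `voxel_size` of 1 mm in each direction; a mismatch would suggest a corrupted or mislabeled download. A dedicated NIfTI library is the better choice for reading the voxel data itself.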
Acknowledgements
The authors would like to thank Yale School of Medicine Department of Radiology and Biomedical Imaging and Yale New Haven Hospital for providing the images and helping to make the data publicly available. Research reported in this publication was partly supported by the National Institutes of Health (NIH) under the award number NIH/NCI:U01CA242871 (S.B.). The content of this publication is solely the responsibility of the authors and does not represent the official views of the NIH.
Competing Interests
M.S.A. has collaborations with Visage Imaging, Inc., Blue Earth Diagnostics, Telix, and AAA. She also has a KL2 TR00186 grant from the NCATS foundation. M.L. is an employee and stockholder of Visage Imaging, Inc., and unrelated to this work, receives funding from NIH/NCI R01 CA206180 and NIH/NCI R01 CA275188. W.H. and M.W. are employees and stockholders of Visage Imaging GmbH. K.B. is an employee of Visage Imaging GmbH. C.K. receives royalties from Primal Pictures 3D Informa, has grant funding from the NIH, and has received the Core Curriculum grant from the American Society of Head and Neck Radiology, all unrelated to this work. The remaining co-authors do not have any competing interests.
References
- 1. Kotecha R, Gondi V, Ahluwalia MS, et al. Recent advances in managing brain metastasis. F1000Res. 2018;7:F1000 Faculty Rev-1772.
- 2. Boire A, Brastianos PK, Garzia L, et al. Brain metastasis. Nat Rev Cancer. 2020;20(1):4–11.
- 3. Buecker R, Hong ZY, Liu XM, et al. Risk factors to identify patients who may not benefit from whole brain irradiation for brain metastases - a single institution analysis. Radiation Oncology. 2019;14(1):41.
- 4. Park YW, Choi D, Park JE, et al. Differentiation of recurrent glioblastoma from radiation necrosis using diffusion radiomics with machine learning model development and external validation. Sci Rep. 2021;11(1):2913.
- 5. Xue J, Kubicek GJ, Grimm J, et al. Biological implications of whole-brain radiotherapy versus stereotactic radiosurgery of multiple brain metastases. J Neurosurg. 2014;121 Suppl:60–68.
- 6. Niranjan A, Monaco E, Flickinger J, et al. Guidelines for Multiple Brain Metastases Radiosurgery. Prog Neurol Surg. 2019;34:100–109.
- 7. Patil CG, Pricola K, Garg SK, et al. Whole brain radiation therapy (WBRT) alone versus WBRT and radiosurgery for the treatment of brain metastases. Cochrane Database Syst Rev. 2010;(6):CD006121.
- 8. Cho SJ, Sunwoo L, Baik SH, et al. Brain metastasis detection using machine learning: a systematic review and meta-analysis. Neuro Oncol. 2021;23(2):214–225.
- 9. Martín-Noguerol T, Mohan S, Santos-Armentia E, et al. Advanced MRI assessment of non-enhancing peritumoral signal abnormality in brain lesions. Eur J Radiol. 2021;143:109900.
- 10. Huang Y, Bert C, Sommer P, et al. Deep learning for brain metastasis detection and segmentation in longitudinal MRI data. Med Phys. 2022;49(9):5773–5786.
- 11. Isensee F, Jaeger PF, Kohl SAA, et al. nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nat Methods. 2021;18(2):203–211.
- 12. Jekel L, Brim WR, von Reppert M, et al. Machine Learning Applications for Differentiation of Glioma from Brain Metastasis-A Systematic Review. Cancers (Basel). 2022;14(6):1369.
- 13. Pflüger I, Wald T, Isensee F, et al. Automated detection and quantification of brain metastases on clinical MRI data using artificial neural networks. Neuro-Oncology Advances. 2022;4(1):vdac138.
- 14. Rudie JD, Rauschecker AM, Bryan RN, et al. Emerging Applications of Artificial Intelligence in Neuro-Oncology. Radiology. 2019;290(3):607–618.
- 15. van Kempen EJ, Post M, Mannil M, et al. Performance of machine learning algorithms for glioma segmentation of brain MRI: a systematic literature review and meta-analysis. Eur Radiol. 2021;31(12):9638–9653.
- 16. Ocaña-Tienda B, Pérez-Beteta J, Villanueva-García JD, et al. A comprehensive dataset of annotated brain metastasis MR images with clinical and radiomic data. Sci Data. 2023;10(1):208.
- 17. BrainMetShare | Center for Artificial Intelligence in Medicine & Imaging. Accessed May 7, 2023. https://aimi.stanford.edu/brainmetshare
- 18. Aboian M, Bousabarah K, Kazarian E, et al. Clinical implementation of artificial intelligence in neuroradiology with development of a novel workflow-efficient picture archiving and communication system-based automated brain tumor segmentation and radiomic feature extraction. Front Neurosci. 2022;16:860208.
- 19. Cassinelli Petersen G, Bousabarah K, Verma T, et al. Real-time PACS-integrated longitudinal brain metastasis tracking tool provides comprehensive assessment of treatment response to radiosurgery. Neurooncol Adv. 2022;4(1):vdac116.