Sci Data. 2024 Feb 29;11:254. doi: 10.1038/s41597-024-03021-9

A large open access dataset of brain metastasis 3D segmentations on MRI with clinical and imaging information

Divya Ramakrishnan 1, Leon Jekel 1,2, Saahil Chadha 1, Anastasia Janas 1,3, Harrison Moy 1,4, Nazanin Maleki 1, Matthew Sala 1,5, Manpreet Kaur 1,6, Gabriel Cassinelli Petersen 1,7, Sara Merkaj 1,8, Marc von Reppert 1,9, Ujjwal Baid 10,11, Spyridon Bakas 10,11, Claudia Kirsch 1,12,13, Melissa Davis 1, Khaled Bousabarah 14, Wolfgang Holler 14, MingDe Lin 1,15, Malte Westerhoff 14, Sanjay Aneja 16,17, Fatima Memon 1, Mariam S Aboian 1
PMCID: PMC10904366  PMID: 38424079

Abstract

Resection and whole brain radiotherapy (WBRT) are standard treatments for brain metastases (BM) but are associated with cognitive side effects. Stereotactic radiosurgery (SRS) uses a targeted approach with fewer side effects than WBRT. SRS requires precise identification and delineation of BM. While artificial intelligence (AI) algorithms have been developed for this, their clinical adoption is limited due to poor model performance in the clinical setting. The limitations of algorithms are often due to the quality of datasets used for training the AI network. The purpose of this study was to create a large, heterogeneous, annotated BM dataset for training and validation of AI models. We present a BM dataset of 200 patients with pretreatment T1, T1 post-contrast, T2, and FLAIR MR images. The dataset includes contrast-enhancing and necrotic 3D segmentations on T1 post-contrast and peritumoral edema 3D segmentations on FLAIR. Our dataset contains 975 contrast-enhancing lesions, many of which are sub-centimeter, along with clinical and imaging information. We used a streamlined approach to database-building through a PACS-integrated segmentation workflow.

Subject terms: CNS cancer, Metastasis

Background & Summary

Brain metastases (BM) develop in up to 30–40% of patients with a primary malignancy, particularly those with lung cancer, breast cancer, and melanoma1,2. Palliative treatment for BM includes resection, whole brain radiotherapy (WBRT), and, more recently, stereotactic radiosurgery (SRS)1. Although WBRT can reduce the neurological symptoms of BM, overall survival has been shown to be lower in patients with certain risk factors, including older age, lower baseline cognitive performance status, and >3 BM3,4. SRS provides a more targeted and less toxic approach to BM treatment than WBRT and can be performed when patients present with >10 lesions, although its predominant use is still in the treatment of localized metastatic disease5,6. In fact, one meta-analysis revealed a significant improvement in performance status and local control in patients treated with WBRT plus SRS compared to WBRT alone7. Localization and accurate delineation of BM margins are critical for effective SRS treatment8. In addition, differentiation of BM from high-grade gliomas, such as glioblastoma, can be challenging, and textural analysis of the peritumoral environment on T2/FLAIR MRI sequences can aid in differentiation of these tumor subtypes9.

To address the challenge of BM diagnosis and delineation, several artificial intelligence (AI) tools, including machine learning (ML) and deep learning (DL) algorithms, have been developed in the past decade8,10–14. While many of these algorithms showed promising results in BM diagnosis and auto-segmentation, there is still a large gap in the clinical implementation and adoption of these algorithms12,15. One reason for this gap is the lack of algorithm generalizability to real-world datasets. Many algorithms are trained and developed on small single-institution hospital datasets that lack the diversity in patient populations and imaging protocols that is often present in the clinical setting12. In fact, one meta-analysis of BM algorithms revealed that the average sample size of datasets used to train algorithms was around 150, with half of the studies explicitly including patients with only solitary BM12. Thus, there is a critical need for large, diverse, and open-access datasets to better train AI algorithms and to challenge AI models to perform accurate assessments on a large breadth of patient cases12. To date, there are only two publicly available BM datasets, both of which contain under 200 patients with pretreatment segmentations solely on T1 post-contrast16,17.

We curated a dataset of 200 patients with a clinical or pathological diagnosis of BM with accompanying clinical and qualitative/quantitative imaging information18. In addition to enhancing tumor 3D segmentations, our dataset also provides 3D segmentations of necrotic tumor portions on T1 post-contrast and peritumoral edema on FLAIR. Our dataset includes several sub-centimeter contrast-enhancing lesions, which are critical for training algorithms to recognize subtle lesions on imaging18. Manual 3D tumor segmentations using a commercially available semi-automatic segmentation tool were performed in a novel workflow directly in a research instance of our PACS (AI Accelerator, Visage Imaging, Inc., San Diego, CA)19, which allowed for the creation and validation of segmentations in an accelerated time frame. Our dataset is publicly available on The Cancer Imaging Archive (TCIA) platform with all tumor segmentations (contrast-enhancing, necrotic, and peritumoral edema), standard MRI sequences (T1, T1 post-contrast, T2, and FLAIR), and an Excel file containing clinical and qualitative/quantitative imaging information18. We hope that our dataset contributes to the training and validation of future BM AI algorithms with the goal of their implementation, translation, and adoption in clinical practice for BM diagnosis and treatment.

Methods

Subject characteristics

Patients were queried from the Yale New Haven Hospital (YNHH) database from 2013 to 2021, the YNHH tumor board registry in 2021, and the YNHH Gamma Knife registry from 2017 to 2021. Inclusion criteria were a clinical or pathological diagnosis of brain metastasis confirmed on the electronic medical record and availability of all four pretreatment standard MRI sequences (T1, T1 post-contrast, T2, and FLAIR) without significant motion artifact. A total of 200 patients were included in the dataset18. The breakdown of primary tumor origin among the 200 patients was as follows: non-small cell lung cancer (86, 43%), melanoma (41, 20.5%), breast cancer (26, 13%), small cell lung cancer (17, 8.5%), renal cell carcinoma (16, 8%), and gastrointestinal cancers (14, 7%).
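As a quick consistency check on the cohort breakdown above, the reported counts can be summed and converted back to percentages. The snippet below is only an illustrative sketch; the variable names are our own and are not part of the dataset or its accompanying spreadsheet.

```python
# Primary tumor counts as reported in the text (out of 200 patients).
primary_tumor_counts = {
    "non-small cell lung cancer": 86,
    "melanoma": 41,
    "breast cancer": 26,
    "small cell lung cancer": 17,
    "renal cell carcinoma": 16,
    "gastrointestinal cancers": 14,
}

# The six categories should account for every patient in the cohort.
total_patients = sum(primary_tumor_counts.values())

# Percentage of the cohort per primary tumor type, rounded to one decimal.
primary_tumor_percent = {
    origin: round(100 * count / total_patients, 1)
    for origin, count in primary_tumor_counts.items()
}

for origin, pct in primary_tumor_percent.items():
    print(f"{origin}: {primary_tumor_counts[origin]} ({pct}%)")
```

The same pattern could be applied to the scanner field strength and vendor counts reported under Image acquisition.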

Image acquisition

A summary of all imaging parameters for FLAIR and T1 post-contrast images of the 200 patients can be found in Table 1. The images were obtained on 1-T (4, 2%), 1.5-T (113, 56.5%), and 3-T (83, 41.5%) MRI scanners. Scanner vendors included Siemens (158, 79%), General Electric (31, 15.5%), Philips (7, 3.5%), and Hitachi (4, 2%).

Table 1.

Summary of imaging parameters for FLAIR and T1 post-contrast sequences.

| Imaging Parameter | FLAIR | T1 post-contrast |
| Acquisition (n, %) | 2D (193, 96.5%) | 2D (32, 16%) |
| | 3D (5, 2.5%) | 3D (166, 83%) |
| | N/A (2, 1%) | N/A (2, 1%) |
| Median (range) echo time (msec) | 92.0 (10.0–400.0) | 3.1 (1.8–26.1) |
| Median (range) repetition time (msec) | 9000.0 (1700.0–12000.0) | 1900.0 (5.9–2619.8) |
| Median (range) slice thickness (mm) | 5.0 (1.0–5.5) | 1.0 (0.9–5.0) |
| Median (range) slice spacing (mm) | 5.0 (0.0–7.5) | 0.0 (0.0–7.0) |

*N/A = not available; range = minimum to maximum.

Segmentation procedure

The DICOM studies for all 200 patients were sent and de-identified from the clinical production (Visage 7, Visage Imaging, Inc., San Diego, CA) to a research instance of our PACS. To streamline the segmentation workflow, a custom hanging protocol and eight-viewer layout were designed to automatically 3D register and display the relevant MR imaging sequences upon study load19,20. Manual segmentations were performed by one medical student (L.J.) on the research PACS using a commercially available semi-automatic 3D segmentation tool, as shown in Fig. 119.

Fig. 1.


Research PACS annotation layout. The T1, T1 post-contrast, FLAIR, and T2 sequences for one patient are displayed on the eight-viewer layout after alignment with the auto-align tool. The PACS interface incorporates a 3D volumetric tool (white circle/rectangle) and displays labeled segmentations for two brain metastases in the display window (red rectangle).

The segmentations were checked and manually revised as needed by two board-certified neuroradiologists (M.S.A. and F.M.) with more than seven years of clinical experience each. Whole tumor (including peritumoral edema) was segmented on FLAIR as shown in Fig. 2a. Whole tumor includes the entirety of the tumor, which appears hyperintense on FLAIR, and includes edema and infiltrative tissue surrounding the contrast-enhancing portion of the tumor. A total of 662 lesions had peritumoral edema surrounding contrast enhancement. Contrast-enhancing lesions and necrotic portions were segmented on T1 post-contrast as shown in Fig. 2b. Contrast-enhancing lesions included those that showed hyperintensity on the T1 post-contrast sequence compared to the T1 sequence. Necrotic portions included regions within contrast-enhancing lesions that were hypointense on T1 post-contrast compared to T1. These regions can also be fluid-filled and appear hyperintense on T2. In total, there were 975 contrast-enhancing lesions among all patients, with 285 lesions having necrotic components. Notably, because a 3D registration of the various MR imaging sequences was performed using the custom hanging protocol, the segmentation masks could be accurately copied and pasted between MR imaging sequences19.

Fig. 2.


PACS-based segmentations and NIfTI masks for one patient. After auto-alignment of FLAIR and T1 post-contrast sequences, segmentation of whole tumor (“Whole2_FLAIR”), including peritumoral edema, was performed on FLAIR (a), and segmentations of contrast-enhancing lesion (“Core2_PGGE”) and corresponding necrotic portions (“Necrosis2_PGGE”) were performed on T1 post-contrast (b). (c) The combined segmentation masks are shown overlaid on the FLAIR sequence in NIfTI format. The green region represents peritumoral edema, the blue region represents contrast-enhancing tumor, and the red region represents necrotic tumor.

Clinical data and anonymization

Clinical data for all patients were collected from the electronic medical record. They include the following: age at diagnosis, sex, ethnicity, smoking history at diagnosis in pack-years, primary tumor origin, presence of extranodal metastasis, and time to death or last note in the electronic medical record as of July 2022. The following qualitative/quantitative imaging features were included: presence of infratentorial involvement, total number of lesions with contrast-enhancement, necrosis, and peritumoral edema, total volume of all regions (contrast-enhancing, necrotic, and peritumoral edema), ratio of necrotic to contrast-enhancing volume, and ratio of peritumoral edema to contrast-enhancing volume.

De-identification was implemented on the research server and occurred directly upon receipt of the DICOM images from either the PACS production system or the long-term archive. No non-anonymized images were stored on the research server. The de-identification removes/modifies all metadata that have identifiable information according to the DICOM standard PS3.15 2018b Appendix E “Attribute Confidentiality Profiles”. Specifically, the “Basic Profile” combined with the “Clean Descriptors Option”, the “Clean Structured Content Option” and the “Retain Longitudinal Temporal Information with Modified Dates Option” were implemented. The PatientID, Accession number, and StudyInstanceUID were removed and replaced with a computed unique ID that is calculated using hash functions and a hash key. While this process is not reversible, it does guarantee that, if another study for the same patient is sent through the pipeline later, those new objects are assigned to the same patient on the research server, unless the hash key in the pipeline is changed. Likewise, additional images/series for the same study would be assigned to the same de-identified study.

The MR images and 3D segmentation masks were exported as NIfTI files from the research server using the Python Visage application program interface (API). The Cancer Imaging Phenomics Toolkit (CaPTk)21 and Federated Tumor Segmentation (FeTS)22 pipelines were used to pre-process all sequences and segmentations for each patient. The pre-processing steps included image co-registration to the SRI24 anatomical template, resampling to a uniform isotropic resolution (1 mm³), and skull stripping to maintain patient anonymity. Both the PACS annotation system and CaPTk toolkit used a rigid registration method for the images.
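The hash-based identifier replacement described in the de-identification step can be illustrated with a keyed hash. The sketch below is not the pipeline's actual implementation: the text states only that hash functions and a hash key were used, so the choice of HMAC-SHA256, the function name `pseudonymize`, the truncation length, and the example identifiers and keys are all assumptions made for illustration.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, hash_key: bytes, length: int = 16) -> str:
    """Map an identifier to a stable pseudonym using a keyed hash.

    HMAC-SHA256 is an assumed choice; the source does not name the algorithm.
    """
    digest = hmac.new(hash_key, identifier.encode("utf-8"), hashlib.sha256)
    # Truncate the hex digest to a shorter ID (length is an arbitrary choice).
    return digest.hexdigest()[:length]


# Hypothetical key and identifiers, for illustration only.
key = b"example-secret-key"
first_study = pseudonymize("MRN-0001", key)
later_study = pseudonymize("MRN-0001", key)        # same patient, sent later
other_patient = pseudonymize("MRN-0002", key)      # different patient
after_key_change = pseudonymize("MRN-0001", b"a-different-key")

print(first_study == later_study)        # same key, same patient: IDs match
print(first_study == other_patient)      # different patients: IDs differ
print(first_study == after_key_change)   # changed key: linkage is lost
```

This mirrors the two properties stated in the text: the mapping cannot be reversed without the key, and repeat submissions for the same patient map to the same de-identified ID unless the key is changed.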

Ethical approval

The study was conducted according to the guidelines of the Declaration of Helsinki and approved by the Institutional Review Board (or Ethics Committee) of Yale University, protocol 2000029055, approved on 10/01/2020. The IRB waived participant consent given data anonymization and approved open publication of the data.

Data Records

The dataset has been deposited to The Cancer Imaging Archive (TCIA)18. Each patient has a total of five associated NIfTI files: four image files of the standard sequences (T1 pre-contrast, T1 post-contrast, T2, and FLAIR) and a fifth segmentation file with combined masks from T1 post-contrast and FLAIR segmentations. The segmentation file has three labels: Label 1 (red) represents tumor necrosis, Label 2 (green) represents peritumoral edema, and Label 3 (blue) represents contrast-enhancing tumor, as shown in Fig. 2c. The dataset also contains one Excel file with clinical and qualitative/quantitative imaging information18. The patients are labeled with anonymized identifiers.
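Given the three-label convention above and the 1 mm³ isotropic resampling described in Methods, per-region volumes and the volume ratios listed among the imaging features can be derived by counting voxels per label. The following is a minimal sketch under those assumptions; the function names are hypothetical, the toy mask is synthetic, and whether the authors' reported contrast-enhancing volume excludes the necrotic core (as it does here) is not stated in the text.

```python
from collections import Counter
from typing import Dict, Iterable, Optional

# Label values as described in the Data Records section.
LABELS = {1: "necrosis", 2: "peritumoral edema", 3: "contrast-enhancing tumor"}


def label_volumes_mm3(voxels: Iterable[int],
                      voxel_volume_mm3: float = 1.0) -> Dict[str, float]:
    """Volume per label from a flattened mask (1 mm^3 voxels by default)."""
    counts = Counter(int(v) for v in voxels)
    return {name: counts.get(value, 0) * voxel_volume_mm3
            for value, name in LABELS.items()}


def volume_ratios(volumes: Dict[str, float]) -> Dict[str, Optional[float]]:
    """Necrosis and edema volumes relative to contrast-enhancing volume."""
    enhancing = volumes["contrast-enhancing tumor"]
    if enhancing == 0:
        # No enhancing tumor in the mask: the ratios are undefined.
        return {"necrosis_to_enhancing": None, "edema_to_enhancing": None}
    return {
        "necrosis_to_enhancing": volumes["necrosis"] / enhancing,
        "edema_to_enhancing": volumes["peritumoral edema"] / enhancing,
    }


# Synthetic flattened mask: 10 background, 2 necrosis, 8 edema, 4 enhancing.
toy_mask = [0] * 10 + [1] * 2 + [2] * 8 + [3] * 4
toy_volumes = label_volumes_mm3(toy_mask)
toy_ratios = volume_ratios(toy_volumes)
print(toy_volumes)
print(toy_ratios)
```

For a real segmentation file, the mask array loaded with a NIfTI reader would be flattened and passed in place of `toy_mask`.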

Technical Validation

All patients had brain metastases and primary tumor of origin confirmed either pathologically or clinically through the electronic medical record. In addition, only patients with high-quality T1, T1 post-contrast, T2, and FLAIR images without significant motion artifacts were included in the final dataset18. All segmentations were independently validated by two neuroradiologists (M.S.A. and F.M.) with more than seven years of clinical experience each. After export to NIfTI format, the standard sequences and segmentation files for all patients were opened in ITK-SNAP. Since all segmentations were combined into one mask per patient during preprocessing, a neuroradiologist (M.S.A.) made additional adjustments to the combined segmentation mask, correcting any over- or under-segmented regions of interest (i.e., tumor necrosis, peritumoral edema, and contrast-enhancing tumor) after opening the segmentation file in ITK-SNAP and aligning it with the standard sequences. A medical student (D.R.) double-checked and adjusted the revised NIfTI segmentation masks and manually counted the number of lesions with contrast enhancement, necrosis, and peritumoral edema for each patient.

Usage Notes

After completion of the data upload process, the NIfTI files can be downloaded from the TCIA (https://www.cancerimagingarchive.net) public collection “Pretreat-MetsToBrain-Masks” (doi: 10.7937/6be1-r748) and opened on segmentation platforms that support the NIfTI format18.
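Before loading downloaded files into a segmentation platform, it can be useful to confirm that each file carries the expected geometry, in particular the 1 mm isotropic spacing described in Methods. The sketch below reads only the fixed 348-byte NIfTI-1 header using the Python standard library; in practice a dedicated NIfTI library would normally be used instead. The function names are hypothetical, the demonstration header is synthetic, and the spacing is assumed to be expressed in millimeters.

```python
import gzip
import struct


def parse_nifti1_header(header: bytes) -> dict:
    """Extract image shape and voxel spacing from a NIfTI-1 header."""
    if len(header) < 348:
        raise ValueError("a NIfTI-1 header is 348 bytes long")
    # sizeof_hdr (int32 at offset 0) equals 348 in the file's byte order.
    for endian in ("<", ">"):
        if struct.unpack_from(endian + "i", header, 0)[0] == 348:
            break
    else:
        raise ValueError("not a NIfTI-1 header")
    dim = struct.unpack_from(endian + "8h", header, 40)     # dim[0] = rank
    pixdim = struct.unpack_from(endian + "8f", header, 76)  # voxel spacing
    ndim = dim[0]
    return {"shape": dim[1:1 + ndim], "spacing": pixdim[1:1 + ndim]}


def read_nifti1_header(path: str) -> dict:
    """Read the header of a .nii or .nii.gz file at the given path."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as handle:
        return parse_nifti1_header(handle.read(348))


def is_isotropic_1mm(info: dict, tol: float = 1e-3) -> bool:
    """True if the first three spacings are 1.0 within a tolerance."""
    return all(abs(s - 1.0) <= tol for s in info["spacing"][:3])


# Synthetic little-endian header for a 4 x 5 x 6 volume with unit spacing.
demo_header = bytearray(348)
struct.pack_into("<i", demo_header, 0, 348)
struct.pack_into("<8h", demo_header, 40, 3, 4, 5, 6, 1, 1, 1, 1)
struct.pack_into("<8f", demo_header, 76, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0)
info = parse_nifti1_header(bytes(demo_header))
print(info["shape"], info["spacing"], is_isotropic_1mm(info))
```

For a downloaded file, `read_nifti1_header` would be called with the local path to the `.nii.gz` file in place of the synthetic header.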

Acknowledgements

The authors would like to thank Yale School of Medicine Department of Radiology and Biomedical Imaging and Yale New Haven Hospital for providing the images and helping to make the data publicly available. Research reported in this publication was partly supported by the National Institutes of Health (NIH) under the award number NIH/NCI:U01CA242871 (S.B.). The content of this publication is solely the responsibility of the authors and does not represent the official views of the NIH.

Author contributions

D.R. – data export and quality control, dataset publication, manuscript preparation. L.J. – database assembly, tumor segmentations, clinical data collection, manuscript revision. S.C. – final lesion volume calculations, manuscript revision. A.J. – manuscript revision. H.M. – tumor segmentations, manuscript revision. N.M. – manuscript revision. M.S. – manuscript revision. M.K. – manuscript revision. G.C.P. – manuscript revision. S.M. – manuscript revision. M.v.R. – manuscript revision. U.B. – manuscript revision. S.B. – manuscript revision. C.K. – manuscript revision. M.D. – manuscript revision. K.B. – image transfer and de-identification, manuscript revision. W.H. – image transfer and de-identification, manuscript revision. M.L. – image transfer and de-identification, manuscript revision. M.W. – image transfer and de-identification, manuscript revision. S.A. – manuscript revision. F.M. – segmentation correction, manuscript revision. M.S.A. – project supervisor, segmentation correction, dataset publication, manuscript revision.

Code availability

The image pre-processing code used to build the dataset can be found at the following link: https://cbica.github.io/CaPTk/preprocessing_brats.html.

Competing interests

C.K. – receives royalties from Primal Pictures 3D Informa, has grant funding from the NIH, and has received the Core Curriculum grant from the American Society of Head and Neck Radiology, all unrelated to this work. K.B. – employee of Visage Imaging GmbH. W.H. – employee and stockholder of Visage Imaging GmbH. M.L. – employee and stockholder of Visage Imaging, Inc., and unrelated to this work, receives funding from NIH/NCI R01 CA206180 and NIH/NCI R01 CA275188. M.W. – employee and stockholder of Visage Imaging GmbH. M.S.A. – has collaborations with Visage Imaging, Inc., Blue Earth Diagnostics, Telix, and AAA. She also has a KL2 TR00186 grant from the NCATS foundation. The remaining co-authors do not have any competing interests.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  • 1. Kotecha R, Gondi V, Ahluwalia MS, Brastianos PK, Mehta MP. Recent advances in managing brain metastasis. F1000Res. 2018;7:F1000 Faculty Rev–1772. doi: 10.12688/f1000research.15903.1.
  • 2. Boire A, Brastianos PK, Garzia L, Valiente M. Brain metastasis. Nat Rev Cancer. 2020;20:4–11. doi: 10.1038/s41568-019-0220-y.
  • 3. Buecker R, et al. Risk factors to identify patients who may not benefit from whole brain irradiation for brain metastases - a single institution analysis. Radiation Oncology. 2019;14:41. doi: 10.1186/s13014-019-1245-9.
  • 4. Park YW, et al. Differentiation of recurrent glioblastoma from radiation necrosis using diffusion radiomics with machine learning model development and external validation. Sci Rep. 2021;11:2913. doi: 10.1038/s41598-021-82467-y.
  • 5. Xue J, et al. Biological implications of whole-brain radiotherapy versus stereotactic radiosurgery of multiple brain metastases. J Neurosurg. 2014;121(Suppl):60–68. doi: 10.3171/2014.7.GKS141229.
  • 6. Niranjan A, Monaco E, Flickinger J, Lunsford LD. Guidelines for multiple brain metastases radiosurgery. Prog Neurol Surg. 2019;34:100–109. doi: 10.1159/000493055.
  • 7. Patil CG, Pricola K, Garg SK, Bryant A, Black KL. Whole brain radiation therapy (WBRT) alone versus WBRT and radiosurgery for the treatment of brain metastases. Cochrane Database Syst Rev. 2010;6:CD006121. doi: 10.1002/14651858.CD006121.pub2.
  • 8. Cho SJ, et al. Brain metastasis detection using machine learning: a systematic review and meta-analysis. Neuro Oncol. 2021;23:214–225. doi: 10.1093/neuonc/noaa232.
  • 9. Martín-Noguerol T, Mohan S, Santos-Armentia E, Cabrera-Zubizarreta A, Luna A. Advanced MRI assessment of non-enhancing peritumoral signal abnormality in brain lesions. Eur J Radiol. 2021;143:109900. doi: 10.1016/j.ejrad.2021.109900.
  • 10. Huang Y, et al. Deep learning for brain metastasis detection and segmentation in longitudinal MRI data. Med Phys. 2022;49:5773–5786. doi: 10.1002/mp.15863.
  • 11. Isensee F, Jaeger PF, Kohl SAA, Petersen J, Maier-Hein KH. nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nat Methods. 2021;18:203–211. doi: 10.1038/s41592-020-01008-z.
  • 12. Jekel L, et al. Machine learning applications for differentiation of glioma from brain metastasis - a systematic review. Cancers (Basel). 2022;14:1369. doi: 10.3390/cancers14061369.
  • 13. Pflüger I, et al. Automated detection and quantification of brain metastases on clinical MRI data using artificial neural networks. Neuro-Oncology Advances. 2022;4:vdac138. doi: 10.1093/noajnl/vdac138.
  • 14. Rudie JD, Rauschecker AM, Bryan RN, Davatzikos C, Mohan S. Emerging applications of artificial intelligence in neuro-oncology. Radiology. 2019;290:607–618. doi: 10.1148/radiol.2018181928.
  • 15. van Kempen EJ, et al. Performance of machine learning algorithms for glioma segmentation of brain MRI: a systematic literature review and meta-analysis. Eur Radiol. 2021;31:9638–9653. doi: 10.1007/s00330-021-08035-0.
  • 16. Ocaña-Tienda B, et al. A comprehensive dataset of annotated brain metastasis MR images with clinical and radiomic data. Sci Data. 2023;10:208. doi: 10.1038/s41597-023-02123-0.
  • 17. BrainMetShare. Center for Artificial Intelligence in Medicine & Imaging. https://aimi.stanford.edu/brainmetshare (2019).
  • 18. Ramakrishnan D. A large open access dataset of brain metastasis 3D segmentations on MRI with clinical and imaging feature information. The Cancer Imaging Archive. 2023. doi: 10.7937/6be1-r748.
  • 19. Aboian M, et al. Clinical implementation of artificial intelligence in neuroradiology with development of a novel workflow-efficient picture archiving and communication system-based automated brain tumor segmentation and radiomic feature extraction. Front Neurosci. 2022;16:860208. doi: 10.3389/fnins.2022.860208.
  • 20. Petersen GC, et al. Real-time PACS-integrated longitudinal brain metastasis tracking tool provides comprehensive assessment of treatment response to radiosurgery. Neurooncol Adv. 2022;4:vdac116.
  • 21. Pati S, et al. The cancer imaging phenomics toolkit (CaPTk): technical overview. Brainlesion. 2020;11993:380–394. doi: 10.1007/978-3-030-46643-5_38.
  • 22. Pati S, et al. The federated tumor segmentation (FeTS) tool: an open-source solution to further solid tumor research. Phys Med Biol. 2022;67.


