Scientific Data. 2022 Jun 14;9:338. doi: 10.1038/s41597-022-01415-1

Enhancing the REMBRANDT MRI collection with expert segmentation labels and quantitative radiomic features

Anousheh Sayah 1,#, Camelia Bencheqroun 2,#, Krithika Bhuvaneshwar 2, Anas Belouali 2, Spyridon Bakas 3, Chiharu Sako 3, Christos Davatzikos 3, Adil Alaoui 2, Subha Madhavan 2, Yuriy Gusev 2
PMCID: PMC9198015  PMID: 35701399

Abstract

Malignancy of the brain and CNS is unfortunately a common diagnosis. A large subset of these lesions tends to be high grade tumors which portend poor prognoses and low survival rates, and are estimated to be the tenth leading cause of death worldwide. The complex nature of the brain tissue environment in which these lesions arise offers a rich opportunity for translational research. Magnetic Resonance Imaging (MRI) can provide a comprehensive view of the abnormal regions in the brain; therefore, its application in translational brain cancer research is considered essential for the diagnosis and monitoring of disease. Recent years have seen rapid growth in the field of radiogenomics, especially in cancer, and scientists have been able to successfully integrate the quantitative data extracted from medical images (also known as radiomics) with genomics to answer new and clinically relevant questions. In this paper, we took raw MRI scans from the publicly available REMBRANDT data collection and performed volumetric segmentation to identify subregions of the brain. Radiomic features were then extracted to represent the MRIs in a quantitative yet summarized format. The resulting dataset now enables further biomedical and integrative data analysis, and is being made public via the NeuroImaging Tools & Resources Collaboratory (NITRC) repository (https://www.nitrc.org/projects/rembrandt_brain/).

Subject terms: Data publication and archiving, Data processing, Image processing


Measurement(s): MRI scans
Technology Type(s): Segmented labels in NIFTI format
Sample Characteristic - Organism: Homo sapiens

Introduction

Brain cancer is a deadly disease with a 5-year survival rate of only about 30% (www.seer.cancer.gov). According to the Global Cancer Observatory (https://gco.iarc.fr/), there were 308,102 cases of cancers of the brain and the central nervous system (CNS) in the world as of 2020 (ref. 1), of which 139,756 were in women and 168,346 were in men1. According to the National Brain Tumor Society, there are more than 120 identified types of brain tumors that are extremely heterogeneous in nature (https://braintumor.org/brain-tumor-information/understanding-brain-tumors/tumor-types/), making brain cancer a complex disease to understand and interpret. In spite of the progress made in treatments of other cancers over the last 20 years, there continue to be only 5 approved drugs to treat brain tumors, and no prognostic advancements for GBM patients have been observed2 (https://braintumor.org/brain-tumor-information/brain-tumor-facts/).

Medical imaging technologies, including magnetic resonance imaging (MRI) and computed tomography (CT), are among the newer technologies increasingly used in translational imaging research3. Due to its complex nature, the brain tissue environment offers a rich opportunity for translational research. MRI can provide a comprehensive view of the abnormal regions in the brain4; therefore, its application in translational brain cancer research is considered essential for the diagnosis, monitoring, and management of the disease3.

In recent years, scientists have been able to integrate the data gleaned from medical images with genomics, and this burgeoning field is called radiogenomics5–7. The imaging data is first converted into a quantitative summarized format through extracted measurements (also known as radiomics) that can be both visual and sub-visual to the naked eye8. These radiomic features allow further extraction of imaging phenotypes that can be integrated with genomics data using machine learning (ML) and artificial intelligence (AI) based algorithms. While many clinical trials are ongoing for new treatments in brain cancer research, there are many opportunities for the development of novel treatment hypotheses using radiogenomics approaches9.

There are several large-scale national collaborations that utilize either brain cancer data or medical imaging related technologies for translational research, including the Brain Science Foundation (https://www.brainsciencefoundation.org/); the endbraincancer (EBC) initiative (https://endbraincancer.org/end-brain-cancer/); the Children’s Brain Tumor Tissue Consortium (CBTTC) (https://www.chop.edu/clinical-trial/cbttc-collection-protocol); the Children’s Brain Tumor Network (https://cbtn.org/about-us); The Cancer Imaging Archive (TCIA)10; and more. However, only a handful of national brain cancer projects include both multi-omics data and medical imaging data. These include The Cancer Genome Atlas (TCGA), which is a large collection of multi-omics data from 22 cancer types including Lower grade gliomas (LGG)11,12 and Glioblastomas (GBM)12,13. The imaging data from the TCGA data collection, along with imaging data from other studies, are housed at the publicly accessible TCIA imaging data repository (https://www.cancerimagingarchive.net/). The National Cancer Institute (NCI) Cancer Research Data Commons (CRDC) provides a cloud-based ecosystem for accessing, visualizing, and analyzing multi-modal imaging data through its public portal. It also allows researchers to connect imaging data to corresponding genomics and proteomics data within the CRDC collections (https://portal.imaging.datacommons.cancer.gov/).

Another initiative that included both omics data and medical images was the REMBRANDT project (REpository for Molecular BRAin Neoplasia DaTa), a joint initiative of the NCI and the National Institute of Neurological Disorders and Stroke (NINDS). This project consisted of a large brain cancer patient-derived dataset that contained clinically annotated data generated through the Glioma Molecular Diagnostic Initiative (GMDI) from 874 glioma specimens comprising 566 gene expression arrays, 834 copy number arrays, and 13,472 clinical phenotype data points. In 2015, the molecular data, including microarray gene expression, copy number, and clinical data, were migrated to the Georgetown Database of Cancer (G-DOC)14,15. This project was managed by our team at Georgetown University; the dataset was made public in 2018 through the publication by Gusev et al.16, and the data were made available via the NCBI Gene Expression Omnibus (GEO) data repository under accession GSE108476 (ref. 17). Among the patients in this REMBRANDT collection, pre-surgical magnetic resonance (MR) multi-sequence images were obtained from 130 patients and are hosted at TCIA18 (https://wiki.cancerimagingarchive.net/display/Public/REMBRANDT).

In this paper, we obtained the raw MRI scans from the publicly available REMBRANDT collection and processed them through a well-known image processing pipeline that is specialized for brain cancer MRI scans. The workflow included automated volumetric segmentation of the MRIs that identified various subregions of the brain, including necrotic core, edema, non-enhancing tumor (NET), enhancing tumor (ET), Gray Matter (GM), White Matter (WM), and Cerebrospinal Fluid (CSF). A Board-Certified radiologist then verified and refined the segmented labels, and radiomic features were extracted from them. This allowed the representation of the MRI scans in a quantitative format, with the intention of enabling further biomedical and integrative data analyses.

This dataset is being made public in the NeuroImaging Tools & Resources Collaboratory (NITRC) repository (https://www.nitrc.org/projects/rembrandt_brain/)19 to allow researchers to perform radiogenomics-based analyses, integrate the imaging data with gene expression and copy number data, and enable new discoveries and hypotheses. Table 1 shows a summary of the REMBRANDT brain cancer collection.

Table 1.

Details of the REMBRANDT brain cancer collection.

| Source | Protocol 1 | Samples | Protocol 2 | Data |
|---|---|---|---|---|
| Rembrandt glioma samples | RNA extraction | 671 patients | Microarray hybridization | GSE108474 (ref. 17) |
| Rembrandt glioma samples | DNA extraction | 263 patients | SNP array hybridization | GSE108475 (ref. 17) |
| Rembrandt glioma samples | MRI scans | 130 patients | Raw MRIs in DICOM format | TCIA (ref. 18) |
| Rembrandt glioma samples | MRI scans | 64 patients | Segmented labels in NIFTI format | NITRC (ref. 19) |

Materials and Methods

Data download

We first downloaded the pre-operative raw MRI scans from the TCIA imaging archive10,20 for all 130 patients, including multiple series for each patient, in DICOM file format21. The board-certified radiologist labeled the MRI scans by modality; the dataset included T1-weighted, T2-weighted, post-contrast T1-weighted (T1-C), and T2 Fluid-Attenuated Inversion Recovery (FLAIR) volumes22.

Data formatting

Some scans had mixed PD and T2 modalities, and had to be separated based on the meta-data in the DICOM file. Only patients that had available MRI data for all four modalities (T1, T2, T1-C and FLAIR) were selected for the next step, which resulted in a set of 72 patients. Figure 1 shows an example of four modalities from the same brain cancer patient.
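The formatting step described above can be sketched as follows. This is a minimal illustration and not the script used for this dataset: it assumes that each slice or series has already been reduced to a small metadata record (for example by a DICOM reader), that interleaved PD and T2 slices can be told apart by echo time, and that the 60 ms cutoff and all field names are hypothetical.

```python
from collections import defaultdict

REQUIRED = {"T1", "T2", "T1-C", "FLAIR"}
TE_CUTOFF_MS = 60.0  # hypothetical threshold: shorter echo -> PD, longer echo -> T2


def split_pd_t2(slices):
    """Separate a mixed PD/T2 series into two lists using each slice's echo time."""
    pd_slices = [s for s in slices if s["echo_time_ms"] < TE_CUTOFF_MS]
    t2_slices = [s for s in slices if s["echo_time_ms"] >= TE_CUTOFF_MS]
    return pd_slices, t2_slices


def patients_with_all_modalities(series_records):
    """Return the sorted IDs of patients that have every required modality."""
    seen = defaultdict(set)
    for rec in series_records:
        seen[rec["patient_id"]].add(rec["modality"])
    return sorted(pid for pid, mods in seen.items() if REQUIRED <= mods)


# Toy records (hypothetical values, not taken from the collection).
mixed = [{"echo_time_ms": 20.0, "slice": 1}, {"echo_time_ms": 90.0, "slice": 1}]
pd_part, t2_part = split_pd_t2(mixed)

records = [
    {"patient_id": "P001", "modality": m} for m in ("T1", "T2", "T1-C", "FLAIR")
] + [{"patient_id": "P002", "modality": m} for m in ("T1", "T2")]
complete = patients_with_all_modalities(records)
```

In this toy example only the patient with all four modalities is retained, mirroring the selection rule that reduced the cohort to 72 patients.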

Fig. 1.


An example of four modalities (T1-weighted, T2-weighted, post-contrast T1-weighted (T1-C), and FLAIR) from the same brain cancer patient (patient# HF1702).

We then applied two different pipelines for the processing of these scans, comprising two popular brain cancer segmentation tools: (a) The first pipeline used the BraTumIA23 tool (Fig. 2A), and (b) the second pipeline used the GLISTRboost24,25 tool (Fig. 2B). Notably, the GLISTRboost based pipeline was top ranked in the International Multimodal Brain Tumor Segmentation challenge 2015 (BraTS’15)26 and uses an Expectation-Maximization (EM)27 framework to automatically map the various sub-regions of the brain scans while accounting for brain deformations caused by the tumor through biophysical growth modelling28. The runner-up for this challenge was the BraTumIA tool which uses a machine learning algorithm23.

Fig. 2.


(A) Segmentation pipeline using the BraTumIA segmentation tool. (B) Segmentation pipeline using the GLISTRboost segmentation tool.

Brain tumor segmentation using BraTumIA

After the raw data was downloaded and formatted, we had MRI scans from 72 patients with four modalities: T1-weighted, T2-weighted, T1-C, and FLAIR. The images were then used as input to the BraTumIA23 tool, which internally performed all processing steps. As a first step, skull stripping was performed using the Insight Toolkit (ITK)29 to generate a brain mask. In the second step, the images were registered, i.e. spatially transformed using ITK, so that the voxels of the various images correspond to one another. The images were then segmented into tumor and healthy tissues using a joint classification-regularization based algorithm. The segmented output labels were in MetaImage (.mha) file format (Fig. 2A).

The Board-Certified radiologist performed verification of the predicted segmented labels. Example segmented labels for a brain cancer patient obtained using the BraTumIA pipeline are shown in Fig. 3.

Fig. 3.


Segmented labels for a brain cancer patient (patient# HF1708) obtained using the BraTumIA pipeline. It shows how the MRI scans look across all four modalities.

Brain tumor segmentation using GLISTRboost

The raw data was downloaded and cleaned in the same manner as for the previous pipeline to obtain MRI scans from 72 patients with four modalities: T1-weighted, T2-weighted, T1-C, and FLAIR. Then, several pre-processing steps were applied. The MRI scans were first re-oriented so that all the images would be in the same Left-Posterior-Superior (LPS) coordinate system (https://www.slicer.org/wiki/Coordinate_systems), a necessary step in order to compare or integrate data obtained from different modalities. The images were then co-registered to the same T1 anatomic template using “Greedy” (github.com/pyushkevich/greedy)30, a CPU-based C++ implementation of the greedy diffeomorphic registration algorithm31. Greedy is integrated into the ITK-SNAP (itksnap.org) segmentation software32,33, as well as the Cancer Imaging Phenomics Toolkit (CaPTk, www.cbica.upenn.edu/captk)34–37. After the co-registration, brain extraction (also known as skull-stripping) was performed using the Brain Mask Generator (BrainMaGe)38,39, which is based on a deep learning segmentation architecture (namely U-Net40) and uses a novel framework that introduces the brain’s shape as a prior, allowing it to be agnostic to the input MRI sequence. BrainMaGe38,39 was used to remove non-cerebral tissues such as the skull, scalp, and dura from the brain images.
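To make the LPS convention concrete: in LPS, the physical x axis increases toward the patient's left, y toward posterior, and z toward superior, so it differs from the RAS convention used by some tools only in the sign of the first two axes. The sketch below illustrates that sign relationship on a point and on a voxel-to-world affine. It is an illustration only; the affine values are hypothetical, and the actual re-orientation in the pipeline operates on image volumes rather than on single points.

```python
def ras_to_lps_point(point):
    """Convert an (x, y, z) physical coordinate from RAS to LPS (or back)."""
    x, y, z = point
    return (-x, -y, z)


def ras_to_lps_affine(affine):
    """Convert a 4x4 voxel-to-world affine (nested lists) from RAS to LPS.

    Negating the first two rows flips the sign of the world x and y outputs.
    """
    return [
        [-v for v in affine[0]],
        [-v for v in affine[1]],
        list(affine[2]),
        list(affine[3]),
    ]


def apply_affine(affine, voxel):
    """Map a voxel index (i, j, k) to world coordinates with a 4x4 affine."""
    i, j, k = voxel
    return tuple(
        row[0] * i + row[1] * j + row[2] * k + row[3] for row in affine[:3]
    )


# Hypothetical 1 mm isotropic image whose origin sits at (-90, -126, -72) in RAS.
ras_affine = [
    [1.0, 0.0, 0.0, -90.0],
    [0.0, 1.0, 0.0, -126.0],
    [0.0, 0.0, 1.0, -72.0],
    [0.0, 0.0, 0.0, 1.0],
]
lps_affine = ras_to_lps_affine(ras_affine)
world_ras = apply_affine(ras_affine, (10, 20, 30))
world_lps = apply_affine(lps_affine, (10, 20, 30))
```

The same voxel maps to coordinates that agree in z and are negated in x and y, which is why images stored under different conventions must be brought into one system before they are compared.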

A step called seeding was then performed by the radiologist. Seeding involved manual tagging of the sub-regions of the brain MRI, including the tumor regions (ET, NET and ED) and the healthy regions (white matter, gray matter, CSF, vessels and cerebellum). Seed points included the center and radius of the tumor, and sample seed points in each sub-region of the brain image. This seeding step enabled the segmentation algorithm to accurately model the intensity distribution (mean and variance) for each tissue class, which allowed the segmentation tool to perform with higher accuracy compared to other segmentation tools. This step was performed using the Cancer Imaging Phenomics Toolkit (CaPTk) software platform34–37. The output of this step included two text files: one with information about the tumor, and another with the sample points in each sub-region. These two files were used as input to the next step in the pipeline.
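The role of the seed points can be illustrated with a simplified sketch: intensities sampled at the seeds give a mean and variance per tissue class, and a new voxel can then be assigned to the class under which its intensity is most likely. This is not the GLISTRboost implementation, which models multiple modalities jointly within an EM framework and a tumor growth model; the single-modality intensities and function names below are hypothetical.

```python
import math
from statistics import mean, pvariance


def fit_class_gaussians(seed_intensities):
    """Estimate (mean, variance) per tissue class from seed-point intensities."""
    return {
        label: (mean(values), pvariance(values))
        for label, values in seed_intensities.items()
    }


def log_likelihood(x, mu, var):
    """Log of the Gaussian density N(x; mu, var)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)


def most_likely_class(x, params):
    """Pick the class whose Gaussian gives intensity x the highest likelihood."""
    return max(params, key=lambda label: log_likelihood(x, *params[label]))


# Hypothetical single-modality intensities sampled at seed points.
seeds = {
    "ET": [200.0, 210.0, 190.0, 205.0],
    "ED": [120.0, 130.0, 125.0, 115.0],
    "CSF": [30.0, 35.0, 25.0, 40.0],
}
params = fit_class_gaussians(seeds)
label = most_likely_class(128.0, params)
```

With these toy values, an intensity of 128 falls closest to the edema seeds and is assigned to ED.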

After these steps were completed, automated volumetric segmentation and registration were performed using GLISTRboost24,25. During the segmentation process, MRI scans from 8 patients had to be filtered out for several reasons, including low quality, very limited coverage, or unreliable results due to irregularities in the input MRI scans. At the end of this pipeline (Fig. 2B), complete segmentation results were successfully obtained for 64 patients. Table 2 shows a summary of the original 130 patients in the REMBRANDT patient cohort before the start of the analysis, and of the 64-patient cohort after completion of the segmentation step.

Table 2.

Summary of the patient cohort in the REMBRANDT brain cancer collection.

Select clinical features of the REMBRANDT dataset, for the 130-patient cohort before filtering and the 64-patient cohort after filtering.

| Clinical feature | Category | Before filtering (130 patients): count | Before filtering: % | After filtering (64 patients): count | After filtering: % |
|---|---|---|---|---|---|
| Age range | 10–14 | 1 | 1% | 1 | 2% |
| | 15–19 | 2 | 2% | 1 | 2% |
| | 20–24 | 3 | 2% | 0 | 0% |
| | 25–29 | 4 | 3% | 3 | 5% |
| | 30–34 | 7 | 5% | 5 | 8% |
| | 35–39 | 13 | 10% | 4 | 6% |
| | 40–44 | 7 | 5% | 3 | 5% |
| | 45–49 | 8 | 6% | 5 | 8% |
| | 50–54 | 11 | 8% | 6 | 9% |
| | 55–59 | 6 | 5% | 3 | 5% |
| | 60–64 | 6 | 5% | 1 | 2% |
| | 65–69 | 3 | 2% | 2 | 3% |
| | 70–74 | 6 | 5% | 3 | 5% |
| | 75–79 | 3 | 2% | 2 | 3% |
| | 85–89 | 1 | 1% | 1 | 2% |
| | NA or blank | 49 | 38% | 24 | 38% |
| Gender | FEMALE | 37 | 28% | 16 | 25% |
| | MALE | 43 | 33% | 24 | 38% |
| | NA or Blank | 50 | 38% | 24 | 38% |
| Disease Type | ASTROCYTOMA | 47 | 36% | 28 | 44% |
| | GBM | 41 | 32% | 18 | 28% |
| | MIXED | 1 | 1% | not listed | not listed |
| | OLIGODENDROGLIOMA | 22 | 17% | 12 | 19% |
| | UNCLASSIFIED | 1 | 1% | not listed | not listed |
| | NA or Blank | 18 | 14% | 6 | 9% |

The output files from this pipeline were in the form of NIfTI files https://nifti.nimh.nih.gov. Figure 4 shows the segmented labels for a brain cancer patient obtained using the GLISTRboost pipeline.

Fig. 4.


Segmented labels for a brain cancer patient (patient# HF1538) obtained using the GLISTRboost pipeline.

Radiomics analysis

Our Board-Certified radiologist found that the BraTumIA algorithm was only effective in the segmentation of one type of cancer, i.e., GBM, whereas the GLISTRboost pipeline produced more accurate segmented labels for all the brain cancer sub-types in this data collection. For this reason, we chose the segmented labels from the GLISTRboost pipeline for the radiomics analysis.

Pyradiomics41, an open-source Python package, was used to extract radiomics features from the segmented labels of the MRI brain scans. The output included a total of 120 features, which describe various properties related to the medical image pixels, including two- and three-dimensional shape, texture, energy and entropy, size and co-occurrence, gray tone differences and more41. Table 3 shows a summary of the different classes of features characterized by pyradiomics42. Supplementary File 1 shows the radiomics features extracted from the REMBRANDT segmented labels from the GLISTRboost pipeline.

Table 3.

Summary of the types of features represented in the pyradiomics numerical output.

| Class of Pyradiomics feature | Number of features |
|---|---|
| First Order Statistics | 19 |
| Shape-based (3D) | 16 |
| Shape-based (2D) | 10 |
| Gray Level Co-occurrence Matrix | 24 |
| Gray Level Run Length Matrix | 16 |
| Gray Level Size Zone Matrix | 16 |
| Neighboring Gray Tone Difference Matrix | 5 |
| Gray Level Dependence Matrix | 14 |
| Total | 120 |
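To give a sense of what the First Order Statistics class in Table 3 measures, the sketch below computes a few such quantities over the voxels selected by a label mask. It is a simplified, standard-library illustration and not the PyRadiomics implementation: the binning used for entropy is a simplification, the bin width of 25 is an arbitrary choice here, and the input values are hypothetical. The PyRadiomics documentation should be consulted for the exact feature definitions.

```python
import math
from collections import Counter


def first_order_features(image, mask, bin_width=25.0):
    """Compute a few first-order statistics over voxels where mask is non-zero.

    `image` and `mask` are flat sequences of equal length. Entropy is taken
    over a histogram with fixed-width bins, in bits.
    """
    roi = [v for v, m in zip(image, mask) if m]
    if not roi:
        raise ValueError("mask selects no voxels")
    n = len(roi)
    mean = sum(roi) / n
    energy = sum(v * v for v in roi)
    variance = sum((v - mean) ** 2 for v in roi) / n
    lowest = min(roi)
    counts = Counter(int((v - lowest) // bin_width) for v in roi)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return {"Mean": mean, "Energy": energy, "Variance": variance, "Entropy": entropy}


# Toy 1D "image" and label mask (hypothetical values).
image = [10.0, 20.0, 60.0, 70.0, 500.0]
mask = [1, 1, 1, 1, 0]
features = first_order_features(image, mask)
```

Only the masked voxels contribute, so the outlying value of 500 outside the label does not affect any of the statistics.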

Applications

Applications for multi-omics analysis

The gene expression and copy number data from this same dataset were made public in 2018 through the publication by Gusev et al.16, and the data were made available via the NCBI Gene Expression Omnibus (GEO) data repository under accession GSE108476 (ref. 17). The medical imaging data in the form of segmented labels, along with the numerical output from radiomics, will now be made public through this publication. This would allow researchers to integrate gene expression, copy number and medical imaging data from the same set of patients. Such multi-omics-based radiogenomics analyses would allow for research and development of novel biomarkers and treatment hypotheses for precision medicine.
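The integration described above depends on matching records from the same patient across data types. A minimal sketch of such a join is shown below; the feature names, values, and the ID `HF9999` are hypothetical, and only the `HF`-style patient identifiers follow the collection.

```python
def join_by_patient(radiomics, expression):
    """Inner-join two {patient_id: {feature: value}} tables on patient ID."""
    shared = sorted(set(radiomics) & set(expression))
    return {pid: {**radiomics[pid], **expression[pid]} for pid in shared}


# Hypothetical values; only the patient-ID style follows the collection.
radiomics = {
    "HF1702": {"ET_volume": 12.5, "ED_volume": 40.1},
    "HF1708": {"ET_volume": 3.2, "ED_volume": 18.7},
}
expression = {
    "HF1708": {"EGFR": 9.8, "IDH1": 7.1},
    "HF9999": {"EGFR": 6.4, "IDH1": 8.0},
}
merged = join_by_patient(radiomics, expression)
```

Only patients present in both tables are kept. If the two tables shared a column name, the expression value would overwrite the radiomics value, so prefixing column names by data type may be advisable in practice.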

Applications for meta-analysis of brain cancer imaging studies

The GLISTRboost segmentation pipeline used in this paper has previously been applied to the MRI scans from TCGA brain cancer (TCGA-GBM and TCGA-LGG) patients, as demonstrated in the Bakas et al.12 publication. Since the same GLISTRboost segmentation pipeline was applied to both the REMBRANDT and the TCGA brain cancer (TCGA-GBM and TCGA-LGG) collections, the two can now be used together for meta-analyses. For instance, the open-source radiomics tool PyRadiomics can be used on both datasets to obtain quantitative radiomics output. The two data collections could then be combined in a meta-analysis to provide a larger sample size for machine learning and AI applications. We believe this is very valuable and enables further biomedical and integrative data analysis. The PyRadiomics output for the REMBRANDT collection and for the TCGA-GBM and TCGA-LGG collections has been made available through this publication as Supplementary File 1 and Supplementary File 2, respectively.

Applications for federated learning approaches in brain cancer imaging studies

Another application is the Federated Tumor Segmentation (FeTS) platform43, which allows training of specific machine learning models by leveraging information gathered from brain cancer datasets residing at collaborating sites without ever exchanging the data. The segmented labels from our REMBRANDT MRI scans are part of this world-wide federation (https://www.fets.ai/), which has enabled very large multi-site machine learning models in an effort to accelerate discovery.
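The general idea of training across sites without exchanging data can be illustrated with sample-weighted federated averaging, in which each site sends only its locally trained model weights and sample count to an aggregator. This is a generic sketch under that assumption; the aggregation actually used by the FeTS platform may differ, and the weights and counts below are hypothetical.

```python
def federated_average(site_updates):
    """Average model weights across sites, weighted by local sample count.

    `site_updates` is a list of (weights, n_samples) pairs, where `weights`
    is a list of floats with the same length at every site.
    """
    total = sum(n for _, n in site_updates)
    if total == 0:
        raise ValueError("no samples reported by any site")
    size = len(site_updates[0][0])
    return [
        sum(w[i] * n for w, n in site_updates) / total for i in range(size)
    ]


# Two hypothetical sites share only weights and sample counts, never images.
updates = [([1.0, 2.0], 30), ([3.0, 6.0], 10)]
global_weights = federated_average(updates)
```

The site with more samples pulls the aggregated weights closer to its own, while no image or label ever leaves either site.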

Summary

In this publication, we took the raw MRI scans from the publicly available REMBRANDT data collection and performed volumetric segmentation to identify various subregions of the brain. Radiomic features were then extracted to represent the MRI scans in numerical format. The gene expression and copy number data from the same REMBRANDT dataset were made public in 2018 through the publication by Gusev et al.16, and the data were made available via the NCBI Gene Expression Omnibus (GEO) data repository under accession GSE108476 (ref. 17). This dataset now enables researchers to further translational research using the medical image data not only on its own, but also in conjunction with the genomics and clinical data.

We believe that making this dataset available to the research community via a public repository provides a unique research opportunity for the biomedical and data science communities. Such combined datasets would allow researchers to conduct integrative analysis of quantitative data from medical images, gene expression and copy number changes, alongside clinical outcomes (overall survival), in this large brain cancer study.

Technical Validation - Radiologist Manual Verification

Our Board-Certified radiologist confirmed that the BraTumIA algorithm was only effective in the segmentation of one type of cancer, GBM. This is mentioned in the BraTumIA manual (https://www.nitrc.org/projects/bratumia), and is due to the fact that the morphology is very different for each cancer subtype; hence the tool worked well only for GBM patients.

The radiologist found that the GLISTRboost algorithm was more effective in the segmentation of the various sub-types of brain cancers in this dataset – Astrocytoma, Oligodendroglioma, and GBM. Manual verification and correction were performed on the segmented labeled output files. By using an additional manual seeding step which provided sample sub-regions as a reference for the algorithm, the GLISTRboost pipeline was able to overcome morphology and other differences in the various sub-types of brain cancers in this dataset.

The verification and corrections were performed using the MRI viewer software MITK44 (https://www.mitk.org/). Figure 5 shows an example of how the manual verification was performed.

Fig. 5.


Illustration of how the Radiologist performed manual verification using patient# HF1538 as an example.

Data Records

We first downloaded the pre-operative raw MRI scans from the TCIA imaging archive for 130 patients. After cleaning, MRI scans from 72 patients with complete data from four modalities were chosen for further processing. Two well-known brain cancer segmentation pipelines were applied to the cleaned dataset: BraTumIA23 and GLISTRboost24. The GLISTRboost24 algorithm was top ranked in the International Multimodal Brain Tumor Image Segmentation challenge 2015 (BraTS’15), and the BraTumIA23 algorithm was the runner-up. After running both the BraTumIA23 and GLISTRboost24 pipelines, it was found that the BraTumIA23 tool was only effective in the segmentation of one type of cancer, GBM. The GLISTRboost24 pipeline was more effective in the segmentation of the various sub-types of brain cancers in this dataset: Astrocytoma, Oligodendroglioma, and GBM.

The segmented labels from the GLISTRboost24 pipeline, along with the manual corrections performed by the radiologist, have been made publicly available through the NeuroImaging Tools & Resources Collaboratory (NITRC) repository19. The gene expression and copy number data from this same dataset were made public in 2018 through the publication by Gusev et al.16, and the data were made available via the NCBI Gene Expression Omnibus (GEO) data repository under accession GSE108476 (ref. 17). Table 1 shows a high-level summary of the REMBRANDT brain cancer collection.

Usage Notes

The Madhavan et al.45 publication that originally described the Rembrandt portal and dataset has enabled numerous analyses and has been cited 366 times so far (as of January 2022). The gene expression and copy number data from the REMBRANDT dataset were made public in 2018 through the publication by Gusev et al.16, and the data were made available via the NCBI Gene Expression Omnibus (GEO) data repository under accession GSE108476 (ref. 17), which has been cited 69 times so far (as of January 2022).

In this publication, we took the raw MRI scans from the REMBRANDT data collection and performed volumetric segmentation to identify various subregions of the brain. Radiomic features were then extracted to represent the MRI scans in a quantitative format. This dataset now enables researchers to integrate gene expression, copy number and medical imaging data from the same set of patients. Such multi-omics-based radiogenomics analyses would allow for research and development of novel biomarkers and treatment hypotheses for precision medicine.

The GLISTRboost segmentation pipeline applied in this manuscript was previously applied to the MRI scans from TCGA brain cancer (TCGA-GBM and TCGA-LGG) patients in the Bakas et al.12 publication. Since imaging data from both the REMBRANDT and the TCGA brain cancer collections were processed with the same segmentation pipeline, the two datasets can now be used in conjunction in a meta-analysis. For example, the TCGA brain cancer dataset could be used as a training set, and the REMBRANDT dataset could be used as an independent testing set in such an analysis. As another example, the open-source radiomics tool PyRadiomics can be applied to both datasets to obtain quantitative radiomics output. Such a meta-analysis approach can provide a larger sample size for machine learning and AI applications. We believe this would be very valuable and would enable further biomedical and integrative data analysis. The PyRadiomics output for the REMBRANDT collection and for the TCGA-GBM and TCGA-LGG collections has been made available through this publication as Supplementary File 1 and Supplementary File 2, respectively.
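The train/test arrangement mentioned above can be sketched as follows, assuming the two radiomics tables have been loaded as dictionaries keyed by case ID. The case IDs `TCGA-A` and `TCGA-B`, the feature values, and the function names are hypothetical; the sketch only shows how the feature sets could be restricted to those shared by both collections before any model is fitted.

```python
def shared_feature_names(*tables):
    """Return the feature names present in every row of every table."""
    names = None
    for table in tables:
        for row in table.values():
            names = set(row) if names is None else names & set(row)
    return sorted(names or [])


def to_matrix(table, feature_names):
    """Turn {case_id: {feature: value}} into (case_ids, rows) in a fixed order."""
    case_ids = sorted(table)
    rows = [[table[cid][name] for name in feature_names] for cid in case_ids]
    return case_ids, rows


# Hypothetical feature tables standing in for the two supplementary files.
tcga = {
    "TCGA-A": {"Energy": 9.0, "Entropy": 1.2, "Sphericity": 0.7},
    "TCGA-B": {"Energy": 4.0, "Entropy": 0.9, "Sphericity": 0.6},
}
rembrandt = {"HF1538": {"Energy": 7.5, "Entropy": 1.1}}

features = shared_feature_names(tcga, rembrandt)
train_ids, X_train = to_matrix(tcga, features)       # training cohort
test_ids, X_test = to_matrix(rembrandt, features)    # independent test cohort
```

Keeping the cohorts separate, rather than pooling and splitting at random, is what makes the second collection an independent test set.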

Another application is the Federated Tumor Segmentation (FeTS) platform43 that allows training specific machine learning models by leveraging information gathered from brain cancer datasets residing in collaborating sites without ever exchanging the data43. The segmented labels from our REMBRANDT MRI scans are part of this world-wide federation https://www.fets.ai/. Such a federated model has enabled very large multi-site machine learning models in an effort to accelerate discovery, and build new advanced machine learning models.

In summary, we believe that making this dataset available to the research community via a public repository provides a unique research opportunity for the biomedical and data science communities. Such combined datasets would allow researchers to conduct integrative analysis of numerical data from medical images, gene expression and copy number changes, alongside clinical outcomes (overall survival), in this large brain cancer study.

Data Privacy

The segmented medical images generated in this manuscript and made public via NITRC are skull stripped and hence do not contain identifiable information.

Supplementary information

Supplementary File 1 (115.4KB, xlsx)
Supplementary File 2 (1.9MB, xlsx)

Acknowledgements

This work was partly funded by the Lombardi Cancer Center support grant (NCI P30 CA51008), and partly supported by the National Cancer Institute (NCI) and the National Institute of Neurological Disorders and Stroke (NINDS) of the National Institutes of Health (NIH), under award numbers NCI: U01CA242871, NCI: U24CA189523, and NINDS: R01NS042645. The content of this publication is solely the responsibility of the authors and does not represent the official views of the NIH.

Author contributions

A.S.: Board-Certified radiologist expertise, performed labeling of MRI scans, seeding, and manual verification of all segmentation results. C.B.: performed the data cleaning, BraTumIA segmentation analysis, and machine learning data analytics. K.B.: project manager, data manager, writing, and editing. A.B.: project design, machine learning team lead, machine learning data analytics. S.B., C.S. and C.D.: data initialization/pre-processing, performing GLISTRboost segmentation analysis. A.A.: project administration and finance. S.M.: microarray and copy number data owner. Y.G.: senior bioinformatics expert, conceptualization and project design, P.I., writing and editing. All authors participated in reviewing the manuscript.

Code availability

The methods and tools applied in this paper are open-source and are detailed in their respective publications and in the Bakas et al.12 publication. The Python code for extracting PyRadiomics features from the REMBRANDT and the TCGA segmented data (Supplementary Files 1 and 2, respectively) is provided at https://github.com/ICBI/rembrandt-mri.

Competing interests

Subha Madhavan is currently employed and is a minor shareholder at AstraZeneca, Gaithersburg, MD, USA.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

These authors contributed equally: Anousheh Sayah, Camelia Bencheqroun.

Change history

7/7/2022

A Correction to this paper has been published: 10.1038/s41597-022-01518-9

Contributor Information

Krithika Bhuvaneshwar, Email: kb472@georgetown.edu.

Yuriy Gusev, Email: yg63@georgetown.edu.

Supplementary information

The online version contains supplementary material available at 10.1038/s41597-022-01415-1.

References

  • 1.Khazaei Z, et al. The association between incidence and mortality of brain cancer and human development index (HDI): an ecological study. BMC Public Health. 2020;20:1696. doi: 10.1186/s12889-020-09838-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Tracy Batchelor, R. N., Tarbell, N. & Weller, M. Oxford Textbook of Neuro-Oncology, 10.1093/med/9780199651870.001.0001 (Oxford University Press, 2017).
  • 3.Vadmal V, et al. MRI image analysis methods and applications: an algorithmic perspective using brain tumors as an exemplar. Neurooncol Adv. 2020;2:vdaa049. doi: 10.1093/noajnl/vdaa049. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Shukla G, et al. Advanced magnetic resonance imaging in glioblastoma: a review. Chin Clin Oncol. 2017;6:40. doi: 10.21037/cco.2017.06.28. [DOI] [PubMed] [Google Scholar]
  • 5.Fathi Kazerooni A, Bakas S, Saligheh Rad H, Davatzikos C. Imaging signatures of glioblastoma molecular characteristics: A radiogenomics review. J Magn Reson Imaging. 2020;52:54–69. doi: 10.1002/jmri.26907.
  • 6.Binder ZA, et al. Epidermal Growth Factor Receptor Extracellular Domain Mutations in Glioblastoma Present Opportunities for Clinical Imaging and Therapeutic Development. Cancer Cell. 2018;34:163–177.e7. doi: 10.1016/j.ccell.2018.06.006.
  • 7.Bakas S, et al. In Vivo Detection of EGFRvIII in Glioblastoma via Perfusion Magnetic Resonance Imaging Signature Consistent with Deep Peritumoral Infiltration: The phi-Index. Clin Cancer Res. 2017;23:4724–4734. doi: 10.1158/1078-0432.CCR-16-1871.
  • 8.Zwanenburg A, et al. The Image Biomarker Standardization Initiative: Standardized Quantitative Radiomics for High-Throughput Image-based Phenotyping. Radiology. 2020;295:328–338. doi: 10.1148/radiol.2020191145.
  • 9.Rathore S, et al. Multi-institutional noninvasive in vivo characterization of IDH, 1p/19q, and EGFRvIII in glioma using neuro-Cancer Imaging Phenomics Toolkit (neuro-CaPTk). Neurooncol Adv. 2020;2:iv22–iv34. doi: 10.1093/noajnl/vdaa128.
  • 10.Clark K, et al. The Cancer Imaging Archive (TCIA): maintaining and operating a public information repository. J Digit Imaging. 2013;26:1045–57. doi: 10.1007/s10278-013-9622-7.
  • 11.Pedano N, 2016. Radiology Data from The Cancer Genome Atlas Low Grade Glioma [TCGA-LGG] collection. The Cancer Imaging Archive.
  • 12.Bakas S, et al. Advancing The Cancer Genome Atlas glioma MRI collections with expert segmentation labels and radiomic features. Sci Data. 2017;4:170117. doi: 10.1038/sdata.2017.117.
  • 13.Scarpace L, 2016. Radiology Data from The Cancer Genome Atlas Glioblastoma Multiforme [TCGA-GBM] collection [Data set]. The Cancer Imaging Archive.
  • 14.Madhavan S, et al. G-DOC: a systems medicine platform for personalized oncology. Neoplasia. 2011;13:771–83. doi: 10.1593/neo.11806.
  • 15.Bhuvaneshwar K, et al. G-DOC Plus - an integrative bioinformatics platform for precision medicine. BMC Bioinformatics. 2016;17:193. doi: 10.1186/s12859-016-1010-0.
  • 16.Gusev Y, et al. The REMBRANDT study, a large collection of genomic data from brain cancer patients. Sci Data. 2018;5:180158. doi: 10.1038/sdata.2018.158.
  • 17.Gusev Y, 2018. The REMBRANDT study – a large collection of genomic data from brain cancer patients (GSE108476).
  • 18.Scarpace L, Flanders AE, Jain R, Mikkelsen T, Andrews DW. 2019. Data From REMBRANDT [Data set]. The Cancer Imaging Archive.
  • 19.Sayah A, 2021. Segmentation Labels for the REMBRANDT brain cancer MRI image collection.
  • 20.Edgar R, Domrachev M, Lash AE. Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res. 2002;30:207–10. doi: 10.1093/nar/30.1.207.
  • 21.Kahn CE, Jr., Carrino JA, Flynn MJ, Peck DJ, Horii SC. DICOM and radiology: past, present, and future. J Am Coll Radiol. 2007;4:652–7. doi: 10.1016/j.jacr.2007.06.004.
  • 22.Currie S, Hoggard N, Craven IJ, Hadjivassiliou M, Wilkinson ID. Understanding MRI: basic MR physics for physicians. Postgrad Med J. 2013;89:209–23. doi: 10.1136/postgradmedj-2012-131342.
  • 23.Abu Khalaf N, Desjardins A, Vredenburgh JJ, Barboriak DP. Repeatability of Automated Image Segmentation with BraTumIA in Patients with Recurrent Glioblastoma. AJNR Am J Neuroradiol. 2021;42:1080–1086. doi: 10.3174/ajnr.A7071.
  • 24.Bakas S, et al. GLISTRboost: Combining Multimodal MRI Segmentation, Registration, and Biophysical Tumor Growth Modeling with Gradient Boosting Machines for Glioma Segmentation. Brainlesion. 2016;9556:144–155. doi: 10.1007/978-3-319-30858-6_1.
  • 25.Zeng K, et al. Segmentation of Gliomas in Pre-operative and Post-operative Multimodal Magnetic Resonance Imaging Volumes Based on a Hybrid Generative-Discriminative Framework. Brainlesion. 2016;10154:184–194. doi: 10.1007/978-3-319-55524-9_18.
  • 26.Menze BH, et al. The Multimodal Brain Tumor Image Segmentation Benchmark (BRATS). IEEE Trans Med Imaging. 2015;34:1993–2024. doi: 10.1109/TMI.2014.2377694.
  • 27.Moon, T. K. The expectation-maximization algorithm. 13 (1996).
  • 28.Mang A, Bakas S, Subramanian S, Davatzikos C, Biros G. Integrated Biophysical Modeling and Image Analysis: Application to Neuro-Oncology. Annu Rev Biomed Eng. 2020;22:309–341. doi: 10.1146/annurev-bioeng-062117-121105.
  • 29.Bauer, S., Fejes, T., Reyes, M. A Skull-Stripping Filter for ITK. The Insight Journal (2012).
  • 30.Yushkevich PA, et al. Fast automatic segmentation of hippocampal subfields and medial temporal lobe subregions in 3 Tesla and 7 Tesla T2-weighted MRI. Alzheimer’s & Dementia: The Journal of the Alzheimer’s Association. 2016;12:P126–P127. doi: 10.1016/j.jalz.2016.06.205.
  • 31.Joshi S, Davis B, Jomier M, Gerig G. Unbiased diffeomorphic atlas construction for computational anatomy. Neuroimage. 2004;23(Suppl 1):S151–60. doi: 10.1016/j.neuroimage.2004.07.068.
  • 32.Yushkevich PA, et al. User-guided 3D active contour segmentation of anatomical structures: Significantly improved efficiency and reliability. NeuroImage. 2006;31:1116–1128. doi: 10.1016/j.neuroimage.2006.01.015.
  • 33.Yushkevich PA, et al. User-Guided Segmentation of Multi-modality Medical Imaging Datasets with ITK-SNAP. Neuroinformatics. 2019;17:83–102. doi: 10.1007/s12021-018-9385-x.
  • 34.Davatzikos C, et al. Cancer imaging phenomics toolkit: quantitative imaging analytics for precision diagnostics and predictive modeling of clinical outcome. J Med Imaging (Bellingham). 2018;5:011018. doi: 10.1117/1.JMI.5.1.011018.
  • 35.Fathi Kazerooni A, et al. Cancer Imaging Phenomics via CaPTk: Multi-Institutional Prediction of Progression-Free Survival and Pattern of Recurrence in Glioblastoma. JCO Clin Cancer Inform. 2020;4:234–244. doi: 10.1200/CCI.19.00121.
  • 36.Pati S, et al. The Cancer Imaging Phenomics Toolkit (CaPTk): Technical Overview. Brainlesion. 2020;11993:380–394. doi: 10.1007/978-3-030-46643-5_38.
  • 37.Rathore S, et al. Brain Cancer Imaging Phenomics Toolkit (brain-CaPTk): An Interactive Platform for Quantitative Analysis of Glioblastoma. In International MICCAI Brainlesion Workshop (2018).
  • 38.Thakur S, et al. Brain extraction on MRI scans in presence of diffuse glioma: Multi-institutional performance evaluation of deep learning methods and robust modality-agnostic training. Neuroimage. 2020;220:117081. doi: 10.1016/j.neuroimage.2020.117081.
  • 39.Thakur SP, et al. Skull-Stripping of Glioblastoma MRI Scans Using 3D Deep Learning. Brainlesion. 2019;11992:57–68. doi: 10.1007/978-3-030-46640-4_6.
  • 40.Ronneberger, O., Fischer, P., & Brox, T. U-net: Convolutional networks for biomedical image segmentation. In International Conference on Medical image computing and computer-assisted intervention 234–241 (Springer, 2015).
  • 41.van Griethuysen JJM, et al. Computational Radiomics System to Decode the Radiographic Phenotype. Cancer Res. 2017;77:e104–e107. doi: 10.1158/0008-5472.CAN-17-0339.
  • 42.Pyradiomics. Radiomic Features. https://pyradiomics.readthedocs.io/en/latest/features.html. (2016)
  • 43.Sheller MJ, et al. Federated learning in medicine: facilitating multi-institutional collaborations without sharing patient data. Sci Rep. 2020;10:12598. doi: 10.1038/s41598-020-69250-1.
  • 44.Nolden M, et al. The Medical Imaging Interaction Toolkit: challenges and advances: 10 years of open-source development. Int J Comput Assist Radiol Surg. 2013;8:607–20. doi: 10.1007/s11548-013-0840-8.
  • 45.Madhavan S, et al. Rembrandt: helping personalized medicine become a reality through integrative translational research. Mol Cancer Res. 2009;7:157–67. doi: 10.1158/1541-7786.MCR-08-0435.

Associated Data

Supplementary Materials

Supplementary File 1 (115.4 KB, xlsx)
Supplementary File 2 (1.9 MB, xlsx)

Data Availability Statement

The methods applied in this paper use open-source tools detailed in their respective publications and in Bakas et al.12. The Python code for extracting PyRadiomics features from the REMBRANDT and TCGA segmented data (Supplementary Files 1 and 2, respectively) is provided at https://github.com/ICBI/rembrandt-mri.


Articles from Scientific Data are provided here courtesy of Nature Publishing Group