Reproducibility analysis of multi-institutional paired expert annotations and radiomic features of the Ivy Glioblastoma Atlas Project (Ivy GAP) dataset

Sarthak Pati; Ruchika Verma; Hamed Akbari; Michel Bilello; Virginia B Hill; Chiharu Sako; Ramon Correa; Niha Beig; Ludovic Venet; Siddhesh Thakur; Prashant Serai; Sung Min Ha; Geri D Blake; Russell Taki Shinohara; Pallavi Tiwari; Spyridon Bakas

doi:10.1002/mp.14556

Author manuscript; available in PMC: 2021 Aug 23.

Published in final edited form as: Med Phys. 2020 Dec 4;47(12):6039–6052. doi: 10.1002/mp.14556


Sarthak Pati¹,²,#, Ruchika Verma³, Hamed Akbari⁴,⁵, Michel Bilello⁶, Virginia B Hill⁷, Chiharu Sako⁸,⁹, Ramon Correa¹⁰, Niha Beig¹¹, Ludovic Venet¹², Siddhesh Thakur¹³, Prashant Serai¹⁴,¹⁵, Sung Min Ha¹⁶, Geri D Blake¹⁷, Russell Taki Shinohara¹⁸,¹⁹, Pallavi Tiwari²⁰,a),†, Spyridon Bakas²¹,²²,²³,‡

¹Center for Biomedical Image Computing and Analytics (CBICA), University of Pennsylvania, Philadelphia, PA 19104, USA

²Department of Radiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA

³Department of Biomedical Engineering, Case Western Reserve University, Cleveland, OH 44106, USA

⁴Center for Biomedical Image Computing and Analytics (CBICA), University of Pennsylvania, Philadelphia, PA 19104, USA

⁵Department of Radiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA

⁶Department of Radiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA

⁷Department of Radiology, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA

⁸Center for Biomedical Image Computing and Analytics (CBICA), University of Pennsylvania, Philadelphia, PA 19104, USA

⁹Department of Radiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA

¹⁰Department of Biomedical Engineering, Case Western Reserve University, Cleveland, OH 44106, USA

¹¹Department of Biomedical Engineering, Case Western Reserve University, Cleveland, OH 44106, USA

¹²Center for Biomedical Image Computing and Analytics (CBICA), University of Pennsylvania, Philadelphia, PA 19104, USA

¹³Center for Biomedical Image Computing and Analytics (CBICA), University of Pennsylvania, Philadelphia, PA 19104, USA

¹⁴Center for Biomedical Image Computing and Analytics (CBICA), University of Pennsylvania, Philadelphia, PA 19104, USA

¹⁵Department of Computer Science and Engineering, The Ohio State University, OH 43210, USA

¹⁶Center for Biomedical Image Computing and Analytics (CBICA), University of Pennsylvania, Philadelphia, PA 19104, USA

¹⁷University of Arkansas for Medical Sciences, Little Rock, AR, USA

¹⁸Center for Biomedical Image Computing and Analytics (CBICA), University of Pennsylvania, Philadelphia, PA 19104, USA

¹⁹Penn Statistical Imaging and Visualization Endeavor (PennSIVE), University of Pennsylvania, Philadelphia, PA 19104, USA

²⁰Department of Biomedical Engineering, Case Western Reserve University, Cleveland, OH 44106, USA

²¹Center for Biomedical Image Computing and Analytics (CBICA), University of Pennsylvania, Philadelphia, PA 19104, USA

²²Department of Radiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA

²³Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA

†Equal senior corresponding author: pxt130@case.edu.

‡Equal senior corresponding author: sbakas@upenn.edu.

a)Author to whom correspondence should be addressed. pxt130@case.edu.

#Contributed equally.

PMCID: PMC8382093 NIHMSID: NIHMS1684160 PMID: 33118182

Abstract

Purpose:

The availability of radiographic magnetic resonance imaging (MRI) scans for the Ivy Glioblastoma Atlas Project (Ivy GAP) has opened up opportunities for development of radiomic markers for prognostic/predictive applications in glioblastoma (GBM). In this work, we address two critical challenges with regard to developing robust radiomic approaches: (a) the lack of availability of reliable segmentation labels for glioblastoma tumor sub-compartments (i.e., enhancing tumor, non-enhancing tumor core, peritumoral edematous/infiltrated tissue) and (b) identifying “reproducible” radiomic features that are robust to segmentation variability across readers/sites.

Acquisition and validation methods:

From TCIA’s Ivy GAP cohort, we obtained a paired set (n = 31) of expert annotations approved by two board-certified neuroradiologists at the Hospital of the University of Pennsylvania (UPenn) and at Case Western Reserve University (CWRU). For these studies, we performed a reproducibility study that assessed the variability in (a) segmentation labels and (b) radiomic features, between these paired annotations. The radiomic variability was assessed on a comprehensive panel of 11 700 radiomic features including intensity, volumetric, morphologic, histogram-based, and textural parameters, extracted for each of the paired sets of annotations. Our results demonstrated (a) a high level of inter-rater agreement (median value of DICE ≥0.8 for all sub-compartments), and (b) ≈24% of the extracted radiomic features being highly correlated (based on Spearman’s rank correlation coefficient) to annotation variations. These robust features largely belonged to morphology (describing shape characteristics), intensity (capturing intensity profile statistics), and COLLAGE (capturing heterogeneity in gradient orientations) feature families.

Data format and usage notes:

We make publicly available on TCIA’s Analysis Results Directory (https://doi.org/10.7937/9j41-7d44), the complete set of (a) multi-institutional expert annotations for the tumor sub-compartments, (b) 11 700 radiomic features, and (c) the associated reproducibility meta-analysis.

Potential applications:

The annotations and the associated meta-data for Ivy GAP are released with the purpose of enabling researchers toward developing image-based biomarkers for prognostic/predictive applications in GBM. © 2020 American Association of Physicists in Medicine [https://doi.org/10.1002/mp.14556]

Keywords: glioblastoma, IvyGAP, MRI, radiomics, reproducibility, segmentation

1. INTRODUCTION

Glioblastoma (GBM) is the most aggressive and heterogeneous brain tumor. Despite multimodal treatment consisting of maximal safe surgical resection, radiation, and chemotherapy, median survival has only slightly improved to approximately 15 months, with less than 10% of patients surviving for over 5 yr.¹ This poor prognosis is largely on account of the underlying disease heterogeneity inherent in GBM tumors, which ultimately leads to treatment resistance, and thus dismal patient outcomes.

Radiographic imaging (i.e., magnetic resonance imaging (MRI)) is the modality of choice for routine clinical diagnosis and response assessment in GBM. Recently, computational analysis of these routine MRI scans, also known as radiomics, has enabled the extraction of quantitative feature attributes that capture textural and morphologic diversity,^2–4 within and outside the enhancing GBM tumor. These radiomic attributes describe subvisual cues reflecting the underlying biological processes of the tumor and its microenvironment, which otherwise are not visually discernible. Radiomic analysis in GBM has also greatly benefited from the availability of large multi-institutional publicly available data repositories, such as The Cancer Imaging Archive (TCIA)⁵ with its Ivy Glioblastoma Atlas Project (Ivy GAP) collection.^6,7 These rich anonymized data repositories have enabled research groups to develop imaging phenotypes toward identifying tumor molecular characteristics,^8–11 predicting overall survival^12–15, and progression-free survival,¹⁶ as well as the location of recurrence¹⁷ and response to chemotherapy.¹⁸ These radiomic approaches have involved capturing radiomic attributes from different tumor sub-compartments including non-enhancing tumor core (NET), enhancing tumor (ET), and peritumoral edematous/infiltrated (ED) regions obtained from multiparametric MRI (mpMRI) scans including native (T1) and gadolinium-enhanced T1-weighted (T1Gd), T2-weighted (T2), and T2-weighted-Fluid-Attenuated Inversion Recovery (FLAIR) scans, for tumor characterization.

However, in order to leverage multi-institutional repositories, such as TCIA’s Ivy GAP, for development of robust radiomic approaches, two key challenges need to be carefully accounted for. First is the lack of availability of reliable segmentation labels of the different GBM tumor sub-compartments (NET, ET, ED).^19,20 Lesion segmentation is a foremost step for downstream radiomic analysis. However, obtaining reliable annotations is a manual, tedious, and time-consuming process. While efforts to make expert-annotated segmentation labels publicly available for other TCIA collections have previously been undertaken by our group,²¹ such labels and the associated metadata are currently missing for the Ivy GAP collection. The second challenge is to account for the variability in radiomic features across segmentation labels obtained from different experts/institutions. Along with radiomic variability with respect to image acquisition protocols, and reconstruction kernels, the variability in radiomic features with respect to segmentation is well recognized in the field.²² While a few studies have recently explored the issue of segmentation variability in radiomic analysis for the TCGA-GBM cohort,^23–25 to our knowledge, none of these studies have comprehensively explored the reproducibility of radiomic features in the context of multi-institutional paired expert annotations.

Toward addressing these challenges, in this work, we have three objectives. First, we investigate the variability in segmentation labels signed off by two experienced board-certified neuroradiologists (M.B. and V.B.H.) performed at two different institutions [University of Pennsylvania (UPenn), and Case Western Reserve University (CWRU), respectively] for the publicly available Ivy GAP collection. Second, we seek to investigate the reproducibility of radiomic features across the set of segmentation labels obtained from the two institutions (CWRU, UPenn). Lastly, the segmentation labels for the three tumor sub-compartments (NET, ET, ED), the corresponding subcompartment-specific radiomic features (including intensity, volumetric, morphologic, histogram-based, textural, and COLLAGE), as well as the associated metadata collected as a part of this study are made publicly available through the TCIA Analysis Results portal (https://doi.org/10.7937/9j41-7d44).²⁶ Our overarching purpose is to (a) provide an online resource of multi-institutional paired segmentation labels for evaluation of segmentation-variability for the publicly available Ivy GAP cohort as well as (b) enable imaging and non-imaging researchers to be able to leverage the Ivy GAP cohort for development of robust and reproducible computational approaches for GBM characterization.

2. ACQUISITION AND VALIDATION METHODS

2.A. Data description

The Ivy Glioblastoma Atlas Project (Ivy GAP)^6,7 is a freely accessible online data resource, comprising a comprehensive cohort of radiological scans (i.e., MRI, CT), digitized tissue pathology slides, and corresponding transcriptomic data of 41 GBM patients.⁶ This data collection is a collaborative effort between the Ben and Catherine Ivy Foundation, the Allen Institute for Brain Science, and the Ben and Catherine Ivy Center for Advanced Brain Tumor Treatment. The radiographic scans for Ivy GAP are available through TCIA (wiki.cancerimagingarchive.net/display/Public/Ivy+GAP), the RNA sequencing data, in situ hybridization, and digitized histology slides, along with corresponding anatomic annotations are available through the Allen Institute (glioblastoma.alleninstitute.org), while the genomic and clinical data are available through the Swedish Institute (ivygap.org).

The retrospectively collected 41 subjects as a part of the Ivy GAP collection were triaged in our work to a total of n = 31 subjects following the inclusion criteria that comprised the availability of (a) the four structural mpMRI scans, that is, T1, T1Gd, T2, and FLAIR, and (b) baseline preoperative timepoint of acquisition (i.e., prior to any instrumentation). We further excluded two subjects (i.e., W32 and W42) on account of obvious registration failures, as illustrated in Fig. 10(b). Finally, one subject (i.e., W50) was also excluded from the analysis of radiomic feature robustness due to an observed disagreement across the two expert readers (M.B and V.B.H) for the annotations corresponding to tumor core.

2.B. Preprocessing

The four structural baseline preoperative mpMRI protocols, that is, T1Gd, T1, T2, and T2-FLAIR (FLAIR), were downloaded from TCIA in DICOM format and converted to the NIfTI format. Different preprocessing pipelines were followed at each institution (UPenn, CWRU), as shown in Fig. 1 and described below.

FIG. 1. — Overall workflow of the present work.

Preprocessing at UPenn.

All four modalities were first placed in a common orientation (the chosen orientation is “LPS” in the radiological convention, which is the same as “RAI” in the neurological convention). Then, to ensure cross-subject consistency, the T1Gd scan for every subject was registered to the SRI24 anatomical atlas space (www.nitrc.org/projects/sri24).²⁷ To facilitate registration, the T1Gd scan was first bias-corrected using the N4 bias correction method²⁸ from ITK, using the Cancer Imaging Phenomics Toolkit (CaPTk),²⁹ and then registered to the SRI24 space using the Greedy registration framework³⁰ (https://github.com/pyushkevich/greedy, available in CaPTk and ITK-SNAP³¹). The generated transformation was then applied to the original T1Gd scan (i.e., prior to bias correction) to ensure minimal loss of signal. Bias correction was used only to drive the registration and was not applied to the output images of the preprocessing pipeline at UPenn, since the group previously reported that this process obliterates the MRI signal, particularly that of the FLAIR modality, and may have a negative impact on the downstream segmentation.²¹ Subsequently, the remaining scans of each subject were registered to the transformed T1Gd scan, resulting in co-registered MRI volumes of 1 mm³ isotropic resolution in the SRI space. The brain was then extracted from all co-registered scans using a pretrained DeepMedic model, available through CaPTk,³² and the resulting brain mask was manually revised when needed, ensuring that the complete abnormal hyperintense FLAIR signal was always included within the brain mask.

Preprocessing at CWRU.

Registration of the T1Gd MRI scan of each Ivy GAP subject to the Montreal Neurological Institute (MNI - http://brainmap.org/training/BrettTransform.html) 1 mm³ isotropic brain atlas³³ was performed using three-dimensional (3D) rigid and affine transformation via 3D Slicer 4.8.³⁴ Furthermore, to account for the resolution variability across studies and protocols, the T1, T2, and FLAIR MRI scans were co-registered with the registered T1Gd sequence to ensure that all MRI sequences are isotropic with 1 mm³ voxels. Following registration, the Swiss skull stripper³⁵ module of 3D Slicer was used to strip the skull across the three MRI protocols (T1Gd, T2, and FLAIR). Every skull-stripped MRI scan was corrected for bias field inhomogeneity using the N4 bias-correction module available in 3D Slicer.²⁸

2.C. Segmentation of tumor sub-compartments

All the tumors included in the Ivy GAP data were segmented at UPenn (M.B.) and CWRU (V.B.H.) following a consistent annotation protocol as defined by the International Brain Tumor Segmentation (BraTS) Challenge.^19–21 The segmentation labels were performed/approved by two expert board-certified neuroradiologists with over 10 yr of experience. The tumor subcompartment labels comprised the ET, NET, and ED. ET is radiographically defined by the hyperintense signal in T1Gd scans not only when compared to T1, but also when compared to “healthy” white matter in T1Gd. NET is typically defined radiographically by hypointense signal in T1Gd scans when compared to the corresponding areas in the T1 scan. The combination of ET and NET describes the bulk of the tumor, which is what is typically resected, and is hereafter referred to as the tumor core (TC). Beyond the boundaries of TC, the complete extent of the disease is typically depicted radiographically as the area enclosed by the abnormal/hyperintense signal in the T2-FLAIR scan. This area, defined as the “whole tumor” (WT), comprises the TC and the ED.

UPenn segmentations.

The expert tumor annotations from UPenn, for the three tumor sub-compartments (i.e., ET, NET, and ED), were a product of a computer-aided segmentation using an in-house software³⁶ followed by manual revisions including corrections for (a) obvious under- or over-segmented sub-compartments, (b) voxels classified as ED within the TC, (c) unclassified voxels within the TC, (d) voxels classified as NET outside the TC, and (e) voxels corresponding to vessels within the ED that were either classified as ED or ET. Finally, contralateral, periventricular, and noncontiguous WT areas with hyperintense signal in the FLAIR scans were considered to represent chronic microvascular changes, or age-associated demyelination, rather than tumor infiltration,³⁷ and hence were excluded from the WT.

CWRU segmentations.

Expert tumor annotations from CWRU for the three tumor sub-compartments (i.e., ET, NET, and ED), were performed manually by a collaborating neuro-radiologist (V.B.H.) with over 10 years of experience in neuroradiology, after carefully considering three structural MRI scans, that is, T1Gd, T2, FLAIR.

2.D. Transforming annotations to a common atlas space

Since the annotations were performed in a different atlas space at each institution (i.e., SRI for UPenn, and MNI for CWRU), they needed to be brought to a common atlas space. To ensure consistency with the BraTS datasets,^19–21 the SRI space was chosen as the reference atlas onto which the MNI labels were transformed. Four different registration solutions were explored for this transformation using Greedy with Normalized Mutual Information called from CaPTk:

1. Register the MNI atlas to the SRI atlas and apply the transformation to the CWRU segmentation labels.
2. Register each MNI-registered T1Gd scan to the SRI atlas and apply the corresponding transformation to the CWRU segmentation labels.
3. Register one MNI-registered T1Gd scan to the SRI atlas and apply the transformation to all CWRU segmentation labels.
4. Register each MNI-registered T1Gd scan to the corresponding SRI-registered T1Gd volume and apply the corresponding transformation to the CWRU segmentation labels.

After generating transformed labels following these four approaches (each using both skull-stripped and non-skull-stripped images and different registration kernels), the best alignment, based on the qualitative assessment of the end results (where we observed notable differences as shown in Fig. 2), was obtained using the last approach on skull-stripped images using Greedy⁶ with normalized mutual information called from CaPTk.^29,38

FIG. 2. — Screenshots of Subject W8, showcasing the various registration transformations between the CWRU and UPenn annotations explored in Section 2.D, with the corresponding overall DICE scores. Green represents the UPenn tumor annotations and red represents the transformed CWRU annotations.

2.E. Radiomic analysis

Following standardization of the segmentation labels across the two institutions, a comprehensive array of 975 unique radiomic features (Table I) were obtained from eight different feature families including intensity-based statistical features (20 descriptors), morphological features (19 descriptors),³⁹ histogram features (503 descriptors), Gray-level co-occurrence matrix (GLCM)⁴⁰ (72 descriptors), Gray-level run-length matrix (GLRLM)^41–44 (90 descriptors), Gray-level size zone matrix (GLSZM) (162 descriptors), Neighborhood gray tone difference matrix (NGTDM) (5 descriptors), and co-occurrence of Local Anisotropic Gradient Orientations (COLLAGE) (104 descriptors). These feature sets were extracted per tumor subcompartment (NET, ET, ED) for every MRI scan (i.e., T1, T1Gd, T2, and FLAIR), for every subject using the same set of input images obtained from UPenn, that is, 11 700 features per patient. Since the preprocessing steps were different across the two institutions, for radiomic comparison we chose a single set of input images, processed using the UPenn pipeline which was consistent with the popular BraTS pipeline,^19–21 to ensure that the feature differences are on account of segmentation variability and not due to the varying pre-processing steps across institutions. 
Our radiomic feature set was extracted using open-source tools comprising the Cancer Imaging Phenomics Toolkit (CaPTk, www.cbica.upenn.edu/captk)²⁹ and a 3D Slicer extension for the COLLAGE feature (https://github.com/ccipd/CoLlAGeSlicerExtension).⁴⁵ CaPTk is an open-source software toolkit, which offers functionalities to extract a wide array of radiomic features compliant with the image biomarker standardization initiative (IBSI),²² and has been extensively used in radiomic analysis studies.^12,14,16,21,44 Similarly, COLLAGE is a new open-source radiomic feature set, which has shown promise in disease prognosis and prediction for different solid tumors including brain, breast, lung, and prostate cancer.^13,18,46,47 Both CaPTk and COLLAGE were configured with a varying set of input parameters during feature extraction, including varying binning values ($B$ ∈ {16,32,64}) for quantization, radii ($R$ ∈ {1,2,3}) around the center voxel under consideration, and window sizes ($w$) of 3 and 5 for computation of COLLAGE features. A complete set of extracted features can be found in the data repository available through TCIA,²⁶ as well as in supplementary documentation.

TABLE I.

Summary of the radiomic features extracted in this study and the associated input parameters.

Feature family	Total features	Description	Parameters
Morphology	19	Geometric properties of the ROI	-
Intensity	20	Intensity distribution within the ROI	-
Histogram	503	Intensity distribution within the ROI after bin quantization	$B$ ∈ {16,32,64,128}
COLLAGE	104	Quantifies heterogeneity of local gradient orientations within $w$	$w$ ∈ {3,5}
GLCM	72	Distribution of discretized intensities of neighboring voxels along all directions within the ROI	$B$ ∈ {16,32,64,128}; $R$ ∈ {1,2,3}
GLRLM	90	Distribution of discretized intensities in all directions across run lengths within the ROI	$B$ ∈ {16,32,64,128}; $R$ ∈ {1,2,3}
GLSZM	162	Number of groups (or zones) of neighboring discretized voxels within the ROI	$B$ ∈ {16,32,64,128}; $R$ ∈ {1,2,3}
NGTDM	5	Number of groups of neighboring discretized voxels within the ROI, within a Chebyshev distance	$B$ ∈ {16,32,64,128}; $R$ ∈ {1,2,3}
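
As a quick consistency check on the numbers quoted in Section 2.E, the per-family counts can be summed and multiplied by the number of sub-compartments and MRI scans. The short Python sketch below is only an illustration of that arithmetic; the variable and function names are ours and are not part of the released data or of CaPTk.

```python
# Per-family descriptor counts, copied from the feature table in Section 2.E.
FAMILY_COUNTS = {
    "Morphology": 19,
    "Intensity": 20,
    "Histogram": 503,
    "COLLAGE": 104,
    "GLCM": 72,
    "GLRLM": 90,
    "GLSZM": 162,
    "NGTDM": 5,
}
SUBCOMPARTMENTS = ("NET", "ET", "ED")
MRI_SCANS = ("T1", "T1Gd", "T2", "FLAIR")


def features_per_subject(family_counts, regions, scans):
    """Unique descriptors x tumor sub-compartments x MRI scans."""
    return sum(family_counts.values()) * len(regions) * len(scans)


unique_features = sum(FAMILY_COUNTS.values())
total_per_subject = features_per_subject(FAMILY_COUNTS, SUBCOMPARTMENTS, MRI_SCANS)
print(unique_features, total_per_subject)  # 975 11700
```

The eight families add up to the 975 unique descriptors, and 975 × 3 × 4 gives the 11 700 features per subject reported in the text.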


2.F. Experimental design

We quantitatively evaluated reproducibility for the Ivy GAP cohort with regard to two distinct endpoints: (a) the inter-reader agreement of the volumetric annotations across the three tumor sub-compartments (NET, ET, ED), and (b) the reproducibility of the extracted radiomic features across the three sub-compartments as well as across four MRI protocols (i.e. T1, T1Gd, T2, and FLAIR), as described below.

Inter-rater Agreement of Volumetric Annotations.

We used four commonly used metrics for semantic segmentation, namely the Dice Similarity Coefficient (DICE), the Hausdorff distance, sensitivity, and specificity, to quantitatively compare the segmentation labels obtained from the two experts (M.B., V.B.H.). For completeness, we performed the analysis twice, first considering the CWRU rater as the ground truth and comparing the UPenn rater against it, and then considering the UPenn rater as the ground truth and comparing the CWRU rater against it; both comparisons were done on a per-voxel basis. Specifically, DICE was used to evaluate the extent of spatial overlap between the two sets of annotations, while sensitivity and specificity, which describe the true positive rate and the true negative rate across the pair of segmentations, were used to assess the overall agreement of the raters. Furthermore, the 95th percentile of the Hausdorff distance was used to measure the distance from the points of one annotation boundary to the nearest point in the other, which is less sensitive to isolated outlier voxels than the maximum distance. Notably, these metrics were estimated for every tumor region, that is, ET, NET, ED, TC, and WT.
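
To make these definitions concrete, the following is a minimal pure-Python sketch of the four metrics on a toy example. It is not the implementation used in this study: masks are represented as sets of voxel coordinates, the Hausdorff distance is computed over all labeled voxels rather than boundary voxels only, and the 95th percentile uses a nearest-rank rule and is taken as the larger of the two directed values. All of these choices, and all function names, are assumptions made for illustration.

```python
import math


def dice(a, b):
    """Dice similarity coefficient between two voxel sets."""
    if not a and not b:
        return 1.0
    return 2.0 * len(a & b) / (len(a) + len(b))


def sensitivity(reference, candidate):
    """True positive rate, treating `reference` as ground truth."""
    return len(reference & candidate) / len(reference)


def specificity(reference, candidate, n_voxels):
    """True negative rate over an image containing `n_voxels` voxels."""
    negatives = n_voxels - len(reference)
    true_negatives = n_voxels - len(reference | candidate)
    return true_negatives / negatives


def percentile(values, q):
    """Nearest-rank percentile (one of several common conventions)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100.0 * len(ordered)))
    return ordered[rank - 1]


def directed_distances(a, b):
    """Distance from every voxel in `a` to its nearest voxel in `b`."""
    return [min(math.dist(p, q) for q in b) for p in a]


def hd95(a, b):
    """Symmetric 95th-percentile Hausdorff distance (brute force)."""
    return max(percentile(directed_distances(a, b), 95),
               percentile(directed_distances(b, a), 95))


# Toy 2D example on a 10 x 10 grid: two 4 x 4 squares shifted by one voxel.
rater_a = {(x, y) for x in range(2, 6) for y in range(2, 6)}
rater_b = {(x, y) for x in range(3, 7) for y in range(2, 6)}
N_VOXELS = 10 * 10

print("DICE:", dice(rater_a, rater_b))                       # 0.75
print("Sensitivity (A as truth):", sensitivity(rater_a, rater_b))  # 0.75
print("Specificity (A as truth):", specificity(rater_a, rater_b, N_VOXELS))
print("HD95:", hd95(rater_a, rater_b))                       # 1.0
```

DICE is symmetric in the two raters, whereas sensitivity and specificity depend on which rater is treated as the ground truth, which is why the analysis was run in both directions.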

Radiomic Feature Robustness.

To assess the robustness of the extracted radiomic features across the two sets of expert annotations, two correlation metrics were considered: the intraclass correlation coefficient (ICC),⁴⁸ which has been extensively used in the literature for assessing segmentation variability,^24,49,50 and Spearman’s rank correlation coefficient.⁵¹ Spearman’s rank correlation coefficient (r_s) is sensitive to nonlinear monotonic relationships when assessing the statistical dependence between the rankings of each feature across the two experts, and hence was used as the method of choice for our analysis. Additionally, we found the ICC(3,1) form defined in Ref. [48] to be applicable to our study design⁵² and therefore also calculated ICC(3,1) for our analysis.
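
The difference between the two measures can be illustrated with a small pure-Python sketch. It is not the analysis code of this study: the function names and toy values are ours, ties are handled with average ranks, and ICC(3,1) is computed from the standard two-way mean squares as (BMS − EMS) / (BMS + (k − 1) EMS) with k = 2 raters.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's r_s: Pearson correlation of the rank-transformed values."""
    return pearson(average_ranks(x), average_ranks(y))


def icc_3_1(x, y):
    """ICC(3,1): two-way mixed effects, consistency, single measurement."""
    n, k = len(x), 2
    grand = (sum(x) + sum(y)) / (n * k)
    row_means = [(a + b) / 2.0 for a, b in zip(x, y)]
    col_means = [sum(x) / n, sum(y) / n]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for v in list(x) + list(y))
    ss_error = ss_total - ss_rows - ss_cols
    bms = ss_rows / (n - 1)               # between-subject mean square
    ems = ss_error / ((n - 1) * (k - 1))  # residual mean square
    return (bms - ems) / (bms + (k - 1) * ems)


# Hypothetical values of one feature for five subjects under two annotations.
rater_1 = [1.0, 2.0, 3.0, 4.0, 5.0]
rater_2_nonlinear = [v ** 2 for v in rater_1]   # monotonic but nonlinear
rater_2_offset = [v + 10.0 for v in rater_1]    # constant rater bias

print(spearman(rater_1, rater_2_nonlinear))  # 1.0
print(icc_3_1(rater_1, rater_2_nonlinear))   # 0.3125
print(icc_3_1(rater_1, rater_2_offset))      # 1.0
```

In this toy case the rankings agree perfectly, so r_s is 1, while ICC(3,1) is penalized by the nonlinear relationship; a constant offset between raters leaves ICC(3,1) at 1 because the consistency form ignores systematic rater bias.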

3. RESULTS

Inter-rater Agreement of Volumetric Annotations.

The inter-reader agreement across different tumor sub-compartments was obtained using 3D volumetric analysis, as illustrated in Fig. 3 and Fig. 10(a). The high overall values of DICE, sensitivity, and specificity, combined with the low 95th percentile Hausdorff distances, demonstrate the high rate of agreement between the UPenn and CWRU raters across the various labels for the included Ivy GAP subjects. Specifically, the composite tumor regions of TC and WT consistently demonstrated the best inter-rater agreement in terms of their spatial overlap, when compared with the individual tumor sub-compartments of ET, NET, and ED. In particular, the agreement for the TC area, which represents the bulk of the mass under consideration for resection, obtained a median DICE > 0.85, followed by WT with a median DICE slightly above 0.8. When observed in tandem with the DICE scores of all tumor sub-compartments, the lower agreement of WT appeared to be driven by the ED region, which had the lowest DICE scores.

The extreme outliers for the NET region (Fig. 3) belonged to cases W26 and W50. W26 shows apparent previous instrumentation (Fig. 4). W50 is another exceptional case, where the annotations of the two expert raters were in disagreement, especially with respect to TC, which was identified in completely different locations (Fig. 5).

FIG. 5. — Screenshots of Subject W50, where the raters disagreed regarding the site of NET and ET (locations with the largest diameter of the non-enhancing part of the tumor are highlighted for each annotation).

Radiomic Feature Robustness.

Fig. 6 and Fig. 11 show the Spearman’s rank correlation coefficients and intraclass correlation coefficients (specifically, ICC(3,1)) obtained for different feature families, across different tumor sub-compartments (i.e., ET, NET, and ED), as well as across the four MRI protocols (T1, T1Gd, T2, and FLAIR). Interestingly, for the ET and NET sub-compartments, we observed consistent patterns across different radiomic feature families, with high correlation values observed for morphology (which also showed the lowest variance), intensity, and COLLAGE features across the four MRI protocols, and highly variable correlation values for the Histogram, GLRLM, and GLSZM feature families (Fig. 6). For the ET region, while intensity tended to have high correlation values, the lowest variance was observed in NGTDM features, across all four MRI protocols.

In order to identify the most correlated features, we used a threshold of ≥0.8 for the correlation coefficient measure across the segmentation set, obtained for every feature. After imposing the threshold, a small percentage (24.3%) of the overall feature set was identified as “reproducible” across the paired segmentation sets, as shown in Figs. 7, 9, and 12. The largest number of robust features was obtained for the morphology feature family across the NET and ET sub-compartments, across the T1, T1Gd, and T2 MRI protocols. For the ET subcompartment, the COLLAGE features were found to have the largest number of robust features for the T1 and FLAIR MRI protocols, while the morphology feature family had a slightly higher percentage of features being picked up for the T1Gd and T2 protocols.
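
The thresholding step can be sketched as follows. This is an illustrative example only: the feature names and correlation values are invented for the demonstration and are not results from this study, and the helper function is hypothetical.

```python
from collections import defaultdict

THRESHOLD = 0.8  # minimum correlation for a feature to count as robust


def robust_percentage_by_family(correlations, threshold=THRESHOLD):
    """Percentage of features per family whose correlation is >= threshold.

    `correlations` maps (family, feature_name) to a correlation coefficient.
    """
    totals = defaultdict(int)
    robust = defaultdict(int)
    for (family, _name), value in correlations.items():
        totals[family] += 1
        if value >= threshold:
            robust[family] += 1
    return {family: 100.0 * robust[family] / totals[family] for family in totals}


# Made-up correlation values, used purely to exercise the function.
toy_correlations = {
    ("Morphology", "feature_a"): 0.95,
    ("Morphology", "feature_b"): 0.85,
    ("GLCM", "feature_c"): 0.40,
    ("GLCM", "feature_d"): 0.82,
    ("GLCM", "feature_e"): 0.79,
    ("GLCM", "feature_f"): 0.10,
}

percentages = robust_percentage_by_family(toy_correlations)
print(percentages)  # {'Morphology': 100.0, 'GLCM': 25.0}
```

Applying the same rule per feature family, sub-compartment, and MRI protocol yields the kind of per-family percentages summarized in Fig. 7.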

FIG. 7. — Thermometer plot highlighting the percentage of robust features across UPenn and CWRU segmentations, (with Spearman’s correlation coefficient of ≥0.8) for the 8 feature families across T1, T1Gd, T2, FLAIR protocols.

Overall, the highest correlations were consistently observed for intensity-based, and COLLAGE features, aside from the morphology feature family. Interestingly, the COLLAGE entropy, sum variance, and energy features were found to be most stable (r_s ≥ 0.8) across all MRI protocols and tumor sub-compartments. In contrast, low correlations were observed for most of the other texture features obtained from GLCM, GLRLM, GLSZM, and NGTDM feature families, across all sub-compartments, as well as feature families.

4. DATA FORMAT AND USAGE NOTES

In accordance with the principles of Findability, Accessibility, Interoperability, and Reusability (FAIR principles),⁵³ all the data and the associated metadata generated as a part of this study are made publicly available through the TCIA’s Analysis Results Directory (https://doi.org/10.7937/9j41-7d44).²⁶ Specifically, the released data comprise (a) the available expert segmentation labels of the various tumor sub-compartments performed at each institution (i.e., 34 subjects segmented at UPenn and 34 subjects segmented at CWRU, for a total of 37 subjects, including 31 paired segmentations performed at both UPenn and CWRU), in the original space in which they were created (i.e., SRI for UPenn and MNI for CWRU), with (b) their corresponding co-registered and skull-stripped structural mpMRI scans (i.e., in SRI for UPenn and in MNI for CWRU), (c) the paired expert segmentation labels that were available for the 31 subjects, all co-registered in the SRI atlas, (d) the corresponding SRI and MNI anatomical atlas files that we employed, (e) the complete set of 11 700 extracted radiomic features per subject, for each of the 31 included subjects, (f) the metadata relating to the metrics we utilized for the evaluation of the inter-rater agreement, (g) the parameters used for the radiomic feature extraction and the correlation analysis results for identifying robust radiomic features, for the 28 subjects, and finally, (h) the specific identified robust/reproducible radiomic features. All image-related files are provided in NIfTI format, while the metadata files are provided in tabular formats (.xlsx and .csv).

5. DISCUSSION

The availability of large data repositories such as TCIA’s Ivy GAP cohort has opened up tremendous possibilities for the use of radiomics (i.e., quantitative feature analysis) in prognostic and predictive applications in GBM tumors. However, in order to develop robust noninvasive image-based markers using TCIA’s Ivy GAP, there are two significant challenges that need to be accounted for: (a) the lack of availability of reliable segmentation labels for the different tumor sub-compartments (NET, ET, and ED) and (b) the identification of “reproducible” radiomic features that are robust to variability in segmentation labels obtained from different institutions. In this study, we sought to address these challenges by (a) evaluating inter-rater agreement in volumetric annotations of tumor sub-compartments obtained from two institutions (UPenn and CWRU), (b) identifying robust/stable radiomic features across the two sets of segmentations obtained from UPenn and CWRU, and (c) publicly releasing the multi-institutional paired expert segmentation labels, the identified robust radiomic features, as well as the associated analysis,²⁶ through TCIA.

Most notable among previous related works, the study of Tixier et al.²⁴ explored the robustness of radiomic features extracted from the TCGA-GBM dataset. However, there are four key differences between the two studies, particularly in terms of the comparative analysis. First and foremost, Tixier et al. compared the radiomic features extracted from a single tumor region, by considering non-enhancing tumor, enhancing tumor, and peritumoral edematous/invaded tissue as a single lesion habitat. Our work, in contrast, provides a more comprehensive comparative analysis following the most widely accepted convention (used by the International BraTS challenge^19–21), wherein we consider (a) each tumor sub-compartment (non-enhancing tumor, enhancing tumor, peritumoral edematous/invaded tissue) separately, (b) the enhancing and non-enhancing tumor as a single “tumor core” region (i.e., the potentially resectable tumor), as well as (c) the union of all three tumor sub-compartments as a single habitat (“whole tumor”). Second, Tixier et al. considered only FLAIR and T1Gd scans, whereas the present study considers all four structural MRI modalities, that is, T1, T1Gd, T2, and FLAIR. Third, the radiomic features considered differ across the two analyses: Tixier et al. evaluated a total of 108 features (extracted using the open-source CERR package⁵⁴), whereas we extracted a total of 11 700 radiomic descriptors from different feature families (Table I) using the open-source packages COLLAGE¹⁵ and CaPTk^29,38. Finally, we performed our statistical analysis based on Spearman’s correlation coefficient.
Spearman’s correlation coefficient is a nonparametric measure of the degree of association between two variables, and unlike ICC⁴⁸ (which was used by Tixier et al.²⁴), it does not require the assumption that the relationship between the variables is linear.⁴⁹ For completeness, we also assessed the ICC(3,1) metric⁴⁸ (Fig. 11) and found the results to be comparable to those obtained using Spearman’s coefficient (Fig. 6), except for the NGTDM feature family, where more features were identified as stable using the ICC measure for T2 and FLAIR than using the Spearman measure.
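The difference between the two measures can be illustrated with a minimal sketch, assuming one feature measured on the same subjects from the two raters' segmentations. The function names and the toy values are hypothetical and are not taken from the study; the ICC(3,1) expression follows the Shrout and Fleiss form (BMS − EMS) / (BMS + (k − 1)·EMS) with k = 2 raters.

```python
from statistics import mean

def average_ranks(values):
    """Return 1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    """Spearman's r_s: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def icc_3_1(x, y):
    """ICC(3,1) for two raters (two-way mixed, single measures, consistency)."""
    x, y = list(x), list(y)
    n, k = len(x), 2
    grand = mean(x + y)
    subject_means = [(a + b) / 2 for a, b in zip(x, y)]
    rater_means = [mean(x), mean(y)]
    ss_total = sum((v - grand) ** 2 for v in x + y)
    ss_subjects = k * sum((m - grand) ** 2 for m in subject_means)
    ss_raters = n * sum((m - grand) ** 2 for m in rater_means)
    ss_error = ss_total - ss_subjects - ss_raters
    bms = ss_subjects / (n - 1)             # between-subjects mean square
    ems = ss_error / ((n - 1) * (k - 1))    # residual mean square
    return (bms - ems) / (bms + (k - 1) * ems)

# Toy values (hypothetical): a monotonic but nonlinear relation between raters.
feature_rater_1 = [1.0, 2.0, 3.0, 4.0, 5.0]
feature_rater_2 = [1.0, 4.0, 9.0, 16.0, 25.0]
print(spearman(feature_rater_1, feature_rater_2))  # 1.0
print(icc_3_1(feature_rater_1, feature_rater_2))   # 0.3125
```

In this toy case the ranking of subjects is identical for both raters, so Spearman's coefficient is 1, while ICC(3,1) is much lower because the values themselves do not agree linearly.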

Our volumetric analysis across the segmentation labels obtained from the two institutions indicated a high level of agreement between the two raters, especially for the TC region, as evidenced by the relatively high values of sensitivity (median value ≥ 0.85) and specificity (median value ≥ 0.95). This agreement is of vital clinical importance, as the TC defines the region that is considered for surgical resection. Similar levels of agreement can be seen for the WT (median sensitivity ≥ 0.85), ET (median sensitivity ≥ 0.8), NET (median sensitivity ≥ 0.7), and ED (median sensitivity ≥ 0.7), with median specificity ≥ 0.9 for all, highlighting the agreement between the two raters. The standard deviation and median values of the evaluation metrics for the inter-rater agreement across the GBM sub-compartments in our work were found to be consistent with previously reported results on other similar TCIA and BraTS studies.^19–21
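A minimal sketch of how such overlap metrics can be computed per region is given below. It assumes a BraTS-style label encoding (1 = NET, 2 = ED, 4 = ET), flattened label maps, and one rater treated as the reference; all three are assumptions made for illustration, and the released label files should be checked before relying on them. The helper names are hypothetical.

```python
# Assumed BraTS-style label values; verify against the released label files.
NET, ED, ET = 1, 2, 4
REGIONS = {
    "NET": {NET},
    "ET": {ET},
    "ED": {ED},
    "TC": {NET, ET},      # tumor core: enhancing plus non-enhancing tumor
    "WT": {NET, ET, ED},  # whole tumor: union of all three sub-compartments
}

def region_mask(labels, region):
    """Binary mask (list of bools) of one region in a flattened label map."""
    wanted = REGIONS[region]
    return [value in wanted for value in labels]

def _ratio(numerator, denominator):
    """Safe division; returns NaN when the metric is undefined."""
    return numerator / denominator if denominator else float("nan")

def agreement(reference, other):
    """Return (Dice, sensitivity, specificity) of `other` against `reference`."""
    tp = sum(r and o for r, o in zip(reference, other))
    fp = sum((not r) and o for r, o in zip(reference, other))
    fn = sum(r and (not o) for r, o in zip(reference, other))
    tn = sum((not r) and (not o) for r, o in zip(reference, other))
    dice = _ratio(2 * tp, 2 * tp + fp + fn)
    sensitivity = _ratio(tp, tp + fn)
    specificity = _ratio(tn, tn + fp)
    return dice, sensitivity, specificity

# Toy flattened label maps (hypothetical), one per rater.
rater_1 = [0, 0, 1, 1, 4, 4, 2, 2, 0, 0]
rater_2 = [0, 0, 1, 4, 4, 2, 2, 2, 0, 0]
for region in ("TC", "WT"):
    print(region, agreement(region_mask(rater_1, region), region_mask(rater_2, region)))
```

Dice is symmetric in the two masks, whereas sensitivity and specificity depend on which rater is taken as the reference, so that choice should be stated alongside any reported values.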

Our results for radiomic feature reproducibility across the pair of segmentation labels identified 24.3% of the 11 700 extracted radiomic features to be robust to segmentation changes across the two sites. A substantial proportion of these selected features belonged to the morphology (describing shape characteristics), intensity (capturing statistics across intensity profiles), and COLLAGE (capturing heterogeneity in local gradient orientations) feature families (Fig. 7 and Fig. 12). The high correlations obtained for the morphology and intensity feature families were likely on account of the high inter-reader agreement observed across the tumor regions, especially across NET and ET. Similarly, the high correlations obtained for the COLLAGE feature family could be attributed to the fact that COLLAGE features are not directly computed on the intensity measurements but are rather derived from the gradient orientations within a local neighborhood window. The gradient orientations seem to be less impacted by the variability in segmentation labels across sites. Further, it was observed that the maximum number of total stable features from these three feature families (r_s ≥ 0.8) belonged to the T1 protocol, followed by T2, FLAIR, and T1Gd, respectively.
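The selection step described above amounts to thresholding the per-feature correlation at r_s ≥ 0.8 and summarizing the result per feature family. The following is a minimal sketch under that reading; the function name, the dictionary layout, and the example coefficients are hypothetical and do not reproduce the study's values.

```python
from collections import defaultdict

THRESHOLD = 0.8  # r_s cut-off quoted in the text

def robust_fraction_by_family(coefficients, threshold=THRESHOLD):
    """Fraction of features per family whose Spearman r_s meets the threshold.

    `coefficients` maps (family, feature_name) -> r_s.
    """
    totals = defaultdict(int)
    robust = defaultdict(int)
    for (family, _name), r_s in coefficients.items():
        totals[family] += 1
        if r_s >= threshold:
            robust[family] += 1
    return {family: robust[family] / totals[family] for family in totals}

# Made-up coefficients for illustration only.
example = {
    ("Morphology", "Elongation"): 0.93,
    ("Morphology", "Flatness"): 0.88,
    ("GLCM", "Contrast"): 0.41,
    ("GLCM", "Homogeneity"): 0.82,
}
print(robust_fraction_by_family(example))

# The reported 24.3% of 11 700 features corresponds to roughly this many features
# (approximate, because the percentage is rounded).
print(round(0.243 * 11700))
```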

Based on our feasibility study, most of the morphological features were found to depend not on the differences in the segmentations themselves, but rather on segmentation characteristics (such as elongation, sphericity, eccentricity, and flatness), which were fairly similar across the two raters and thus robust to per-pixel segmentation variations. Intensity statistics features capture aggregated measures (e.g., mean, median) of the intensity profile of the modalities in the specified tumor compartment and hence were not found to be dependent on local differences in intensities across the two segmentations. Most of the intensity statistics features demonstrated a high degree of correlation between the two raters. Strikingly, the histogram feature family, and by extension, the GLCM, GLRLM, GLSZM, and NGTDM feature families (which are known to capture local image heterogeneity) demonstrated low correlation values across segmentations for the majority of their features. This may be because these features are computed across multiple binning values (16, 32, 64, and 128), thereby making the feature set highly dependent on intensity changes, which may be reflected in lower correlation values across the patients. Additionally, these features include contrast, coarseness, homogeneity, and busyness, which have previously been indicated to present large variations in their correlation values, and therefore may need to be carefully investigated for robustness across segmentations before being employed in radiomic analysis for GBM tumors. Interestingly, while Haralick texture measurements across the GLCM, GLSZM, and NGTDM feature families were sensitive to segmentation variability, COLLAGE texture features, which are also considered measures of local image heterogeneity, demonstrated high correlation measures across all three sub-compartments and MRI protocols.
This may be on account of the fact that COLLAGE computes measurements such as energy/entropy from the local intensity gradients rather than local intensity differences, and is hence more resilient to local differences in image intensities across segmentations. Previous studies⁵⁵ have similarly demonstrated that features driven by entropy and energy exhibit smaller variations under changes in acquisition modes and reconstruction parameters.
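One way to see how binning can couple a histogram-based feature to the segmentation is sketched below. It assumes uniform-width bins spanning the intensity range of the region of interest, which is an assumption made for illustration; the extraction software used in the study may quantize intensities differently. The function names and the toy intensities are hypothetical.

```python
import math

BIN_COUNTS = (16, 32, 64, 128)  # bin counts quoted in the text

def quantize(intensities, n_bins):
    """Map intensities to bins 0..n_bins-1 using uniform bins over the ROI range."""
    lo, hi = min(intensities), max(intensities)
    if hi == lo:
        return [0] * len(intensities)
    width = (hi - lo) / n_bins
    # The maximum value falls on the upper edge, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in intensities]

def histogram_entropy(bins, n_bins):
    """Shannon entropy (in bits) of the histogram of quantized intensities."""
    counts = [0] * n_bins
    for b in bins:
        counts[b] += 1
    total = len(bins)
    return -sum((c / total) * math.log2(c / total) for c in counts if c)

# Toy ROIs (hypothetical): the second includes one extra, brighter boundary voxel.
roi_1 = [100.0, 104.0, 108.0, 112.0, 116.0, 120.0, 124.0, 128.0]
roi_2 = roi_1 + [180.0]
for n_bins in BIN_COUNTS:
    e1 = histogram_entropy(quantize(roi_1, n_bins), n_bins)
    e2 = histogram_entropy(quantize(roi_2, n_bins), n_bins)
    print(n_bins, round(e1, 3), round(e2, 3))
print(sum(roi_1) / len(roi_1), sum(roi_2) / len(roi_2))  # ROI means for comparison
```

Under this assumed scheme, a single boundary voxel changes the intensity range and therefore shifts every bin edge, which is one possible mechanism for the sensitivity discussed above; it is not claimed to be the cause observed in the study.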

It was noted that the brain extraction (also known as skull-stripping) approaches employed across the two sites may have caused issues in the transformation of the respective annotations, due to parts of the head (e.g., eyeballs) that were not removed during skull-stripping. Examples of this issue can be found in the uploaded data for subjects W32 and W42 in the MNI-space annotations created by CWRU. However, even in the cases where registration did not fail, we observed that the tumor segmentation can be affected when part of a tumor or peritumoral area adjacent to the skull is removed during the brain extraction process (Fig. 8). This highlights the need for a robust brain extraction method optimized for pathological brains that could work consistently across modalities and clinical sites.³²

FIG. 8. — Screenshots of Subject W8, showcasing maximal agreement between UPenn and CWRU raters (with regard to the whole tumor). Each image shows the axial slice from all 4 structural modalities in the top row with the annotations of UPenn and CWRU raters in the bottom 2 rows.

Interestingly, during our segmentation analysis, we observed an exceptional case (subject W50), for which the TC was annotated in two completely different locations by the two expert raters, as shown in detail in Fig. 5. It was noted that the CWRU rater had demarcated the center of the ED region (within the superior parietal lobule) as the TC, whereas the UPenn rater had highlighted the edge of the ED, closer to the ventricles (within the more inferior parasagittal precuneus), as the TC. One possible reason for this might be the fact that there are minimally enhancing foci in both these locations in the T1Gd scan, without a distinct central TC. There is also infiltrative non-enhancing or poorly enhancing tumor throughout the abnormal FLAIR hyperintense signal (in a gliomatosis cerebri pattern), which is seen best on T2 through an envelope that is slightly less hyperintense than the rest of the FLAIR hyperintense signal, reflecting highly cellular tumor compatible with the pathologically proven GBM. This case points to the difficulty and variability involved in the task of tumor region delineation, even by experienced clinicians. Another subject of particular interest was W26, where radiologic assessment indicated that it was a non-baseline scan (the points of entry for a resection are visible, Fig. 4). We still included it in our analysis for segmentation agreement as well as radiomic feature analysis because the tumor that was being assessed did not seem to be affected by previous instrumentation.

Our work did have limitations.

Our study was limited to investigating inter-reader agreement and did not consider intra-reader variability across segmentation labels. Further, segmentations were obtained from a single reader per institution. Including more than two raters would have allowed for a consensus analysis. While comprehensive (with over 11 000 radiomic features analyzed), the radiomic variability analysis was limited to 8 feature families. Future work will include interrogating intra-reader as well as multiple inter-reader segmentation variability, and including additional feature families (e.g., Laws, local binary patterns) for radiomic feature variability. We will also consider interrogating the reproducibility of radiomic features across variations in slice thickness, image reconstruction methods, magnetic field strengths, echo times, and repetition times.

6. CONCLUSIONS

Radiomics has recently provided a surrogate mechanism for capturing GBM tumor heterogeneity using routine non-invasive MRI scans.⁵⁶ However, radiomic features are known to be susceptible to variations in annotation protocols across sites. In this work, we presented a feasibility study to (a) evaluate inter-reader agreement obtained for tumor segmentation labels, and (b) identify reproducible radiomic features across variations in tumor segmentations, in a multi-institutional setting, for the TCIA’s⁵ Ivy GAP dataset.⁶ First, we quantified the inter-reader agreement using the most commonly used metrics (DICE, Sensitivity, Specificity, and Hausdorff). Higher values of DICE, Sensitivity, and Specificity, and lower values of Hausdorff, indicate better inter-reader agreement between the two segmented regions. Our results demonstrated that there was a high amount of overall correlation between the two raters for all sub-compartments. Second, our radiomic variability analysis experiment suggested that (a) certain features and feature families such as intensity statistics (mean, median, standard deviation, and kurtosis), morphologic (flatness, elongation, and sphericity), and COLLAGE (statistics of local gradient entropy) may be more robust to variability in segmentation labels obtained from different readers, and (b) the GLCM and GLRLM feature families, which are dependent on local intensity differences, showed lower correlation across features extracted from the segmented tumor regions demarcated by two different raters. While GLCM and GLRLM features have previously been shown to be prognostic of GBM,^8,12,45 our results indicated that most of these features exhibited large variations across the two segmentations (Fig. 6 and Fig. 11), and may need to be carefully investigated for robustness across segmentations for prognostic modeling in GBM tumors.
In contrast, the majority of morphology and intensity statistics-based features seemed to be resilient to segmentation differences across the two readers. We further made the multi-institutional segmentations, as well as the associated metadata collected as part of this analysis, available on the TCIA web portal as a community resource,²⁶ with the purpose of enabling imaging and non-imaging researchers to leverage the Ivy GAP cohort for developing image-based biomarkers for prognosis and prediction of GBM tumors.
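Of the four agreement metrics named above, the Hausdorff distance is the only one for which lower is better, since it measures the worst-case boundary mismatch. A minimal brute-force sketch is given below, assuming each segmentation is represented as a list of voxel coordinates and using the maximum (rather than a percentile) variant; both choices are assumptions, and the function names are hypothetical.

```python
import math

def directed_hausdorff(a, b):
    """Largest distance from any point in `a` to its nearest point in `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)

def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))

# Toy voxel coordinates (hypothetical), in voxel units.
rater_1_voxels = [(0, 0, 0), (1, 0, 0)]
rater_2_voxels = [(0, 0, 0), (4, 0, 0)]
print(hausdorff(rater_1_voxels, rater_2_voxels))  # 3.0
print(hausdorff(rater_1_voxels, rater_1_voxels))  # 0.0
```

Distances here are in voxel units; multiplying coordinates by the voxel spacing gives millimeters. The pairwise search is quadratic in the number of voxels, so it is only suitable for small examples or boundary voxels.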

Supplementary Material

supplemental material

Data S1. Multi-institutional paired expert segmentations and radiomic features of the Ivy GAP dataset.

NIHMS1684160-supplement-supplemental_material.pdf^{(26.9MB, pdf)}

ACKNOWLEDGMENTS

Research reported in this publication was partly supported by the National Institutes of Health (NIH) under award number NCI/ITCR:U01CA242871, as well as by the Department of Defense (DoD) Peer Reviewed Cancer Research Program (W81XWH-18-1-0404), Dana Foundation David Mahoney Neuroimaging Grant, the CCCC Brain Tumor Pilot Award, the CWRU Technology Validation Start-Up Fund (CTP), and The V Foundation Translational Research Award. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH, U.S. Department of Veterans Affairs, the DoD, or the United States Government. Niha Beig is an employee of Tempus Labs, Inc.

Footnotes

SUPPORTING INFORMATION

Additional supporting information may be found online in the Supporting Information section at the end of the article.

Contributor Information

Sarthak Pati, Center for Biomedical Image Computing and Analytics (CBICA), University of Pennsylvania, Philadelphia, PA 19104, USA; Department of Radiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA.

Ruchika Verma, Department of Biomedical Engineering, Case Western Reserve University, Cleveland, OH 44106, USA.

Hamed Akbari, Center for Biomedical Image Computing and Analytics (CBICA), University of Pennsylvania, Philadelphia, PA 19104, USA; Department of Radiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA.

Michel Bilello, Department of Radiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA.

Virginia B. Hill, Department of Radiology, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA.

Chiharu Sako, Center for Biomedical Image Computing and Analytics (CBICA), University of Pennsylvania, Philadelphia, PA 19104, USA; Department of Radiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA.

Ramon Correa, Department of Biomedical Engineering, Case Western Reserve University, Cleveland, OH 44106, USA.

Niha Beig, Department of Biomedical Engineering, Case Western Reserve University, Cleveland, OH 44106, USA.

Ludovic Venet, Center for Biomedical Image Computing and Analytics (CBICA), University of Pennsylvania, Philadelphia, PA 19104, USA.

Siddhesh Thakur, Center for Biomedical Image Computing and Analytics (CBICA), University of Pennsylvania, Philadelphia, PA 19104, USA.

Prashant Serai, Center for Biomedical Image Computing and Analytics (CBICA), University of Pennsylvania, Philadelphia, PA 19104, USA; Department of Computer Science and Engineering, The Ohio State University, OH 43210, USA.

Sung Min Ha, Center for Biomedical Image Computing and Analytics (CBICA), University of Pennsylvania, Philadelphia, PA 19104, USA.

Geri D. Blake, University of Arkansas for Medical Sciences, Little Rock, AR, USA.

Russell Taki Shinohara, Center for Biomedical Image Computing and Analytics (CBICA), University of Pennsylvania, Philadelphia, PA 19104, USA; Penn Statistical Imaging and Visualization Endeavor (PennSIVE), University of Pennsylvania, Philadelphia, PA 19104, USA.

Pallavi Tiwari, Department of Biomedical Engineering, Case Western Reserve University, Cleveland, OH 44106, USA.

Spyridon Bakas, Center for Biomedical Image Computing and Analytics (CBICA), University of Pennsylvania, Philadelphia, PA 19104, USA; Department of Radiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA; Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA.

REFERENCES

1. Ostrom QT, Gittleman H, Fulop J, et al. CBTRUS statistical report: primary brain and central nervous system tumors diagnosed in the United States in 2008–2012. Neuro-oncology. 2015;17:iv1–iv62.
2. Amadasun M, King R. Textural features corresponding to textural properties. IEEE Trans Syst. 1989;19:1264–1274.
3. Jaffe CC. Imaging and genomics: Is there a synergy? Radiology. 2012;264:329–331. PMID: 22821693.
4. Xiao T, Hua W, Li C, Wang S. Glioma grading prediction by exploring radiomics and deep learning features. In: Proceedings of the Third International Symposium on Image Computing and Digital Medicine. 2019:208–213.
5. Clark K, Vendt B, Smith K, et al. The cancer imaging archive (TCIA): maintaining and operating a public information repository. J Dig Imaging. 2013;26:1045–1057.
6. Puchalski RB, Shah N, Miller J, et al. An anatomic transcriptional atlas of human glioblastoma. Science. 2018;360:660–663.
7. Shah N, Feng X, Lankerovich M, Puchalski RB, Keogh B. Data from Ivy GAP. The Cancer Imaging Archive. 2016. 10.7937/K9/TCIA.2016.XLwaN6nL
8. Akbari H, Bakas S, Pisapia JM, et al. In vivo evaluation of EGFRvIII mutation in primary glioblastoma patients via complex multiparametric MRI signature. Neuro-oncology. 2018;20:1068.
9. Bakas S, Akbari H, Pisapia J, et al. In vivo detection of EGFRvIII in glioblastoma via perfusion magnetic resonance imaging signature consistent with deep peritumoral infiltration: the φ-index. Clinical Cancer Res. 2017;23:4724–4734.
10. Ellingson BM, Lai A, Harris RJ, et al. Probabilistic radiographic atlas of glioblastoma phenotypes. Am J Neuroradiol. 2012;34:533.
11. Zinn PO, Majadan B, Sathyan P, et al. Radiogenomic mapping of edema/cellular invasion MRI-phenotypes in glioblastoma multiforme. PloS One. 2011;6:e25451.
12. Bakas S, Shukla G, Akbari H, et al. Overall survival prediction in glioblastoma patients using structural magnetic resonance imaging (MRI): advanced radiomic features may compensate for lack of advanced MRI modalities. Journal of Medical Imaging. 2020;7(3):1.
13. Beig N, Patel J, Prasanna P, et al. Radiogenomic analysis of hypoxia pathway is predictive of overall survival in glioblastoma. Sci Rep. 2018;8:1–11.
14. Macyszyn L, Akbari H, Pisapia JM, et al. Imaging patterns predict patient survival and molecular subtype in glioblastoma via machine learning techniques. Neuro-oncology. 2015;18:417–425.
15. Prasanna P, Patel J, Partovi S, Madabhushi A, Tiwari P. Radiomic features from the peritumoral brain parenchyma on treatment-naive multi-parametric MR imaging predict long versus short-term survival in glioblastoma multiforme: preliminary findings. Eur Radiol. 2017;27:4188–4197.
16. Fathi Kazerooni A, Akbari H, Shukla G, et al. Cancer imaging phenomics via CAPTK: multi-institutional prediction of progression-free survival and pattern of recurrence in glioblastoma. JCO Clin Cancer Inform. 2020;4:234–244.
17. Akbari H, Macyszyn L, Da X, et al. Imaging surrogates of infiltration obtained via multiparametric imaging pattern analysis predict subsequent location of recurrence of glioblastoma. Neurosurgery. 2016;78:572–580.
18. Verma R, Correa R, Hill V, et al. Radiomics of the lesion habitat on pretreatment MRI predicts response to chemo-radiation therapy in glioblastoma. In: Medical Imaging 2019: Computer-Aided Diagnosis. Vol. 10950. International Society for Optics and Photonics; 2019:109500B.
19. Bakas S, Reyes M, Jakab A, et al. Identifying the best machine learning algorithms for brain tumor segmentation, progression assessment, and overall survival prediction in the brats challenge. arXiv preprint arXiv:1811.02629; 2018.
20. Menze BH, Jakab A, Bauer S, et al. The multimodal brain tumor image segmentation benchmark (BRATS). IEEE Trans Med Imaging. 2014;34:1993–2024.
21. Bakas S, Akbari H, Sotiras A, et al. Advancing the cancer genome atlas glioma MRI collections with expert segmentation labels and radiomic features. Sci Data. 2017;4:170117.
22. Zwanenburg A, Vallieres M, Abdalah MA, et al. The image biomarker standardization initiative: standardized quantitative radiomics for high throughput image-based phenotyping. Radiology. 2020;295:191145.
23. Shiri I, Hajianfar G, Sohrabi A, et al. Repeatability of radiomic features in magnetic resonance imaging of glioblastoma: test-retest and image registration analyses. Med Phys. 2020;47:4265.
24. Tixier F, Um H, Young RJ, Veeraraghavan H. Reliability of tumor segmentation in glioblastoma: impact on the robustness of MRI-radiomic features. Med Phys. 2019;46:3582–3591.
25. Um H, Tixier F, Bermudez D, Deasy JO, Young RJ, Veeraraghavan H. Impact of image preprocessing on the scanner dependence of multi-parametric MRI radiomic features and covariate shift in multi-institutional glioblastoma datasets. Phys Med Biol. 2019;64:165011.
26. Pati S, Verma R, Akbari H, et al. Multi-institutional paired expert segmentations and radiomic features of the Ivy GAP dataset. The Cancer Imaging Archive. 2020. 10.7937/9j41-7d44
27. Rohlfing T, Zahr NM, Sullivan EV, Pfefferbaum A. The SRI24 multichannel atlas of normal adult human brain structure. Human Brain Mapping. 2010;31:798–819.
28. Tustison NJ, Avants BB, Cook PA, et al. N4itk: improved N3 bias correction. IEEE Trans Med Imaging. 2010;29:1310–1320.
29. Davatzikos C, Rathore S, Bakas S, et al. Cancer imaging phenomics toolkit: quantitative imaging analytics for precision diagnostics and predictive modeling of clinical outcome. J Med Imaging. 2018;5:011018.
30. Yushkevich PA, Pluta J, Wang H, Wisse LE, Das S, Wolk D. Fast automatic segmentation of hippocampal subfields and medial temporal lobe subregions in 3 tesla and 7 tesla T2-weighted MRI. Alzheimer’s & Dementia. 2016;12:P126–P127.
31. Yushkevich PA, Piven J, Cody Hazlett H, et al. User-guided 3D active contour segmentation of anatomical structures: significantly improved efficiency and reliability. Neuroimage. 2006;31:1116–1128.
32. Thakur SP, Doshi J, Pati S, et al. Skull-stripping of glioblastoma MRI scans using 3D deep learning. In: International MICCAI Brainlesion Workshop. Springer; 2019:57–68.
33. Talairach J. Co-planar stereotaxic atlas of the human brain-3-dimensional proportional system. An Approach to Cerebral Imaging; 1988.
34. Kikinis R, Pieper SD, Vosburgh KG. 3D slicer: a platform for subject-specific image analysis, visualization, and clinical support. In: Intraoperative Imaging and Image-Guided Therapy. Berlin: Springer; 2014:277–289.
35. Bauer S, Fejes T, Reyes M. A Skull-Stripping Filter for ITK. The Insight Journal. 2012. 10.5281/zenodo.811812
36. Bakas S, Zeng K, Sotiras A, et al. Glistrboost: combining multimodal MRI segmentation, registration, and biophysical tumor growth modeling with gradient boosting machines for glioma segmentation. In: BrainLes 2015. Berlin: Springer; 2015:144–155.
37. Haller S, Kövari E, Herrmann FR, et al. Do brain T2/flair white matter hyperintensities correspond to myelin loss in normal aging? A radiologic-neuropathologic correlation study. Acta Neuropathol Commun. 2013;1:14.
38. Rathore S, Bakas S, Pati S, et al. Brain cancer imaging phenomics toolkit (brain-CaPTk): an interactive platform for quantitative analysis of glioblastoma. In: International MICCAI Brainlesion Workshop. Springer; 2017:133–145.
39. Max J. Quantizing for minimum distortion. IRE Trans Inform Theory. 1960;6:7–12.
40. Haralick RM, Shanmugam K, Dinstein IH. Textural features for image classification. IEEE Trans Syst. 1973;SMC-3:610–621.
41. Chu A, Sehgal CM, Greenleaf JF. Use of gray value distribution of run lengths for texture analysis. Pattern Recogn Lett. 1990;11:415–419.
42. Dasarathy BV, Holder EB. Image characterizations based on joint gray level—run length distributions. Pattern Recogn Lett. 1991;12:497–502.
43. Galloway M. Texture analysis using gray level run lengths. Comput Graphics Image Process. 1975;4:172–179.
44. Tang X. Texture information in run-length matrices. IEEE Transactions on Image Processing. 1998;7(11):1602–1609.
45. Prasanna P, Tiwari P, Madabhushi A. Co-occurrence of local anisotropic gradient orientations (collage): distinguishing tumor confounders and molecular subtypes on MRI. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer; 2014:73–80.
46. Braman NM, Etesami M, Prasanna P. Intratumoral and peritumoral radiomics for the pretreatment prediction of pathological complete response to neoadjuvant chemotherapy based on breast DCE-MRI. Breast Cancer Res. 2017;19:57.
47. Shiradkar R, Ghose S, Jambor I, et al. Radiomic features from pretreatment biparametric MRI predict prostate cancer biochemical recurrence: preliminary findings. J Magn Reson Imaging. 2018;48:1626–1636.
48. Shrout PE, Fleiss JL. Intraclass correlations: uses in assessing rater reliability. Psychol Bullet. 1979;86:420.
49. Liu R, Elhalawani H, Radwan Mohamed AS. Stability analysis of CT radiomic features with respect to segmentation variation in oropharyngeal cancer. Clin Translat Radiat Oncol. 2020;21:11–18.
50. Moradmand H, Aghamiri SMR, Ghaderi R. Impact of image preprocessing methods on reproducibility of radiomic features in multimodal magnetic resonance imaging in glioblastoma. J Appl Clin Med Phys. 2020;21:179–190.
51. Rockafellar RT, Wets RJ-B. Variational Analysis. Vol. 317. Berlin: Springer Science & Business Media; 2005.
52. Koo TK, Li MY. A guideline of selecting and reporting intraclass correlation coefficients for reliability research. J Chiropr Med. 2016;15:155–163.
53. Wilkinson MD, Dumontier M, Aalbersberg IJ, et al. The fair guiding principles for scientific data management and stewardship. Sci Data. 2016;3:160018.
54. Apte AP, Iyer A, Crispin-Ortuzar M, et al. Extension of CERR for computational radiomics: a comprehensive matlab platform for reproducible radiomics research. Med Phys. 2018;45:3713–3720.
55. Galavis PE, Hollensen C, Jallow N, Paliwal B, Jeraj R. Variability of textural features in FDG PET images due to different acquisition modes and reconstruction parameters. Acta Oncol. 2010;49:1012–1016.
56. Alic L, Niessen WJ, Veenland JF. Quantification of heterogeneity as a biomarker in tumor imaging: A systematic review. PloS One. 2014;9:e110300.

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

supplemental material

Data S1. Multi-institutional paired expert segmentations and radiomic features of the Ivy GAP dataset.

NIHMS1684160-supplement-supplemental_material.pdf^{(26.9MB, pdf)}

[R1] 1.Ostrom QT, Gittleman H, Fulop J, et al. CBTRUS statistical report: primary brain and central nervous system tumors diagnosed in the United States in 2008–2012. Neuro-oncology. 2015;17:iv1–iv62. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R2] 2.Amadasun M, King R. Textural features corresponding to textural properties. IEEE Trans Syst. 1989;19:1264–1274. [Google Scholar]

[R3] 3.Jaffe CC. Imaging and genomics: Is there a synergy? Radiology. 2012;264:329–331. PMID: 22821693. [DOI] [PubMed] [Google Scholar]

[R4] 4.Xiao T, Hua W, Li C, Wang S. Glioma grading prediction by exploring radiomics and deep learning features. In Proceedings of the Third International Symposium on Image Computing and Digital Medicine. 2019:208–213. [Google Scholar]

[R5] 5.Clark K, Vendt B, Smith K, et al. The cancer imaging archive (TCIA): maintaining and operating a public information repository. J Dig Imaging. 2013;26:1045–1057. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R6] 6.Puchalski RB, Shah N, Miller J, et al. An anatomic transcriptional atlas of human glioblastoma. Science. 2018;360:660–663. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R7] 7.Shah N, Feng X, Lankerovich M, Puchalski RB, Keogh B. Data from Ivy GAP . The Cancer Imaging Archive. 2016. 10.7937/K9/TCIA.2016.XLwaN6nL [DOI] [Google Scholar]

[R8] 8.Akbari H, Bakas S, Pisapia JM, et al. In vivo evaluation of EGFRvIII mutation in primary glioblastoma patients via complex multiparametric MRI signature. Neuro-oncology. 2018;20:1068. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R9] 9.Bakas S, Akbari H, Pisapia J, et al. In vivo detection of EGFRvIII in glioblastoma via perfusion magnetic resonance imaging signature consistent with deep peritumoral infiltration: the φ-index. Clinical Cancer Res. 2017;23:4724–4734. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R10] 10.Ellingson BM, Lai A, Harris RJ, et al. Probabilistic radiographic atlas of glioblastoma phenotypes. Am J Neuroradiol. 2012;34:533. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R11] 11.Zinn PO, Majadan B, Sathyan P, et al. Radiogenomic mapping of edema/cellular invasion MRI-phenotypes in glioblastoma multiforme. PloS One. 2011;6:e25451. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R12] 12. Bakas S, Shukla G, Akbari H, et al. Overall survival prediction in glioblastoma patients using structural magnetic resonance imaging (MRI): advanced radiomic features may compensate for lack of advanced MRI modalities. J Med Imaging. 2020;7(3):1.

[R13] 13. Beig N, Patel J, Prasanna P, et al. Radiogenomic analysis of hypoxia pathway is predictive of overall survival in glioblastoma. Sci Rep. 2018;8:1–11.

[R14] 14. Macyszyn L, Akbari H, Pisapia JM, et al. Imaging patterns predict patient survival and molecular subtype in glioblastoma via machine learning techniques. Neuro-oncology. 2015;18:417–425.

[R15] 15. Prasanna P, Patel J, Partovi S, Madabhushi A, Tiwari P. Radiomic features from the peritumoral brain parenchyma on treatment-naive multi-parametric MR imaging predict long versus short-term survival in glioblastoma multiforme: preliminary findings. Eur Radiol. 2017;27:4188–4197.

[R16] 16. Fathi Kazerooni A, Akbari H, Shukla G, et al. Cancer imaging phenomics via CaPTk: multi-institutional prediction of progression-free survival and pattern of recurrence in glioblastoma. JCO Clin Cancer Inform. 2020;4:234–244.

[R17] 17. Akbari H, Macyszyn L, Da X, et al. Imaging surrogates of infiltration obtained via multiparametric imaging pattern analysis predict subsequent location of recurrence of glioblastoma. Neurosurgery. 2016;78:572–580.

[R18] 18. Verma R, Correa R, Hill V, et al. Radiomics of the lesion habitat on pretreatment MRI predicts response to chemo-radiation therapy in glioblastoma. In: Medical Imaging 2019: Computer-Aided Diagnosis. Vol. 10950. International Society for Optics and Photonics; 2019:109500B.

[R19] 19. Bakas S, Reyes M, Jakab A, et al. Identifying the best machine learning algorithms for brain tumor segmentation, progression assessment, and overall survival prediction in the BRATS challenge. arXiv preprint arXiv:1811.02629; 2018.

[R20] 20. Menze BH, Jakab A, Bauer S, et al. The multimodal brain tumor image segmentation benchmark (BRATS). IEEE Trans Med Imaging. 2014;34:1993–2024.

[R21] 21. Bakas S, Akbari H, Sotiras A, et al. Advancing the Cancer Genome Atlas glioma MRI collections with expert segmentation labels and radiomic features. Sci Data. 2017;4:170117.

[R22] 22. Zwanenburg A, Vallieres M, Abdalah MA, et al. The image biomarker standardization initiative: standardized quantitative radiomics for high throughput image-based phenotyping. Radiology. 2020;295:191145.

[R23] 23. Shiri I, Hajianfar G, Sohrabi A, et al. Repeatability of radiomic features in magnetic resonance imaging of glioblastoma: test-retest and image registration analyses. Med Phys. 2020;47:4265.

[R24] 24. Tixier F, Um H, Young RJ, Veeraraghavan H. Reliability of tumor segmentation in glioblastoma: impact on the robustness of MRI-radiomic features. Med Phys. 2019;46:3582–3591.

[R25] 25. Um H, Tixier F, Bermudez D, Deasy JO, Young RJ, Veeraraghavan H. Impact of image preprocessing on the scanner dependence of multi-parametric MRI radiomic features and covariate shift in multi-institutional glioblastoma datasets. Phys Med Biol. 2019;64:165011.

[R26] 26. Pati S, Verma R, Akbari H, et al. Multi-institutional paired expert segmentations and radiomic features of the Ivy GAP dataset. The Cancer Imaging Archive. 2020. doi:10.7937/9j41-7d44

[R27] 27. Rohlfing T, Zahr NM, Sullivan EV, Pfefferbaum A. The SRI24 multichannel atlas of normal adult human brain structure. Hum Brain Mapp. 2010;31:798–819.

[R28] 28. Tustison NJ, Avants BB, Cook PA, et al. N4ITK: improved N3 bias correction. IEEE Trans Med Imaging. 2010;29:1310–1320.

[R29] 29. Davatzikos C, Rathore S, Bakas S, et al. Cancer imaging phenomics toolkit: quantitative imaging analytics for precision diagnostics and predictive modeling of clinical outcome. J Med Imaging. 2018;5:011018.

[R30] 30. Yushkevich PA, Pluta J, Wang H, Wisse LE, Das S, Wolk D. Fast automatic segmentation of hippocampal subfields and medial temporal lobe subregions in 3 tesla and 7 tesla T2-weighted MRI. Alzheimer’s & Dementia. 2016;12:P126–P127.

[R31] 31. Yushkevich PA, Piven J, Cody Hazlett H, et al. User-guided 3D active contour segmentation of anatomical structures: significantly improved efficiency and reliability. Neuroimage. 2006;31:1116–1128.

[R32] 32. Thakur SP, Doshi J, Pati S, et al. Skull-stripping of glioblastoma MRI scans using 3D deep learning. In: International MICCAI Brainlesion Workshop. Springer; 2019:57–68.

[R33] 33. Talairach J. Co-planar stereotaxic atlas of the human brain-3-dimensional proportional system. An Approach to Cerebral Imaging; 1988.

[R34] 34. Kikinis R, Pieper SD, Vosburgh KG. 3D Slicer: a platform for subject-specific image analysis, visualization, and clinical support. In: Intraoperative Imaging and Image-Guided Therapy. Berlin: Springer; 2014:277–289.

[R35] 35. Bauer S, Fejes T, Reyes M. A Skull-Stripping Filter for ITK. The Insight Journal. 2012. doi:10.5281/zenodo.811812

[R36] 36. Bakas S, Zeng K, Sotiras A, et al. GLISTRboost: combining multimodal MRI segmentation, registration, and biophysical tumor growth modeling with gradient boosting machines for glioma segmentation. In: BrainLes 2015. Berlin: Springer; 2015:144–155.

[R37] 37. Haller S, Kövari E, Herrmann FR, et al. Do brain T2/FLAIR white matter hyperintensities correspond to myelin loss in normal aging? A radiologic-neuropathologic correlation study. Acta Neuropathol Commun. 2013;1:14.

[R38] 38. Rathore S, Bakas S, Pati S, et al. Brain cancer imaging phenomics toolkit (brain-CaPTk): an interactive platform for quantitative analysis of glioblastoma. In: International MICCAI Brainlesion Workshop. Springer; 2017:133–145.

[R39] 39. Max J. Quantizing for minimum distortion. IRE Trans Inform Theory. 1960;6:7–12.

[R40] 40. Haralick RM, Shanmugam K, Dinstein IH. Textural features for image classification. IEEE Trans Syst Man Cybern. 1973;SMC-3:610–621.

[R41] 41. Chu A, Sehgal CM, Greenleaf JF. Use of gray value distribution of run lengths for texture analysis. Pattern Recogn Lett. 1990;11:415–419.

[R42] 42. Dasarathy BV, Holder EB. Image characterizations based on joint gray level–run length distributions. Pattern Recogn Lett. 1991;12:497–502.

[R43] 43. Galloway M. Texture analysis using gray level run lengths. Comput Graphics Image Process. 1975;4:172–179.

[R44] 44. Tang X. Texture information in run-length matrices. IEEE Trans Image Process. 1998;7(11):1602–1609.

[R45] 45. Prasanna P, Tiwari P, Madabhushi A. Co-occurrence of local anisotropic gradient orientations (CoLlAGe): distinguishing tumor confounders and molecular subtypes on MRI. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer; 2014:73–80.

[R46] 46. Braman NM, Etesami M, Prasanna P. Intratumoral and peritumoral radiomics for the pretreatment prediction of pathological complete response to neoadjuvant chemotherapy based on breast DCE-MRI. Breast Cancer Res. 2017;19:57.

[R47] 47. Shiradkar R, Ghose S, Jambor I, et al. Radiomic features from pretreatment biparametric MRI predict prostate cancer biochemical recurrence: preliminary findings. J Magn Reson Imaging. 2018;48:1626–1636.

[R48] 48. Shrout PE, Fleiss JL. Intraclass correlations: uses in assessing rater reliability. Psychol Bull. 1979;86:420.

[R49] 49. Liu R, Elhalawani H, Radwan Mohamed AS. Stability analysis of CT radiomic features with respect to segmentation variation in oropharyngeal cancer. Clin Translat Radiat Oncol. 2020;21:11–18.

[R50] 50. Moradmand H, Aghamiri SMR, Ghaderi R. Impact of image preprocessing methods on reproducibility of radiomic features in multimodal magnetic resonance imaging in glioblastoma. J Appl Clin Med Phys. 2020;21:179–190.

[R51] 51. Rockafellar RT, Wets RJ-B. Variational Analysis. Vol. 317. Berlin: Springer Science & Business Media; 2005.

[R52] 52. Koo TK, Li MY. A guideline of selecting and reporting intraclass correlation coefficients for reliability research. J Chiropr Med. 2016;15:155–163.

[R53] 53. Wilkinson MD, Dumontier M, Aalbersberg IJ, et al. The FAIR guiding principles for scientific data management and stewardship. Sci Data. 2016;3:160018.

[R54] 54. Apte AP, Iyer A, Crispin-Ortuzar M, et al. Extension of CERR for computational radiomics: a comprehensive MATLAB platform for reproducible radiomics research. Med Phys. 2018;45:3713–3720.

[R55] 55. Galavis PE, Hollensen C, Jallow N, Paliwal B, Jeraj R. Variability of textural features in FDG PET images due to different acquisition modes and reconstruction parameters. Acta Oncol. 2010;49:1012–1016.

[R56] 56. Alic L, Niessen WJ, Veenland JF. Quantification of heterogeneity as a biomarker in tumor imaging: a systematic review. PLoS One. 2014;9:e110300.
