See also the article by Hoebel et al in this issue.
Pallavi Tiwari, PhD, is an assistant professor of biomedical engineering and the director of Brain Image Computing Laboratory at Case Western Reserve University. Her research focuses on machine learning for personalized medicine solutions in oncology. Her research has resulted in over 50 publications, 50 abstracts, and three issued patents. Dr Tiwari has been a recipient of several scientific awards, including 100 Women Achievers by Government of India, Crain’s Cleveland Forty Under 40, and J&J WiSTEM2D scholar award. Her research is funded through the National Cancer Institute, Department of Defense, and state and foundation grants.
Ruchika Verma is a PhD candidate in the Brain Image Computing Laboratory at Case Western Reserve University. Her research is focused on developing machine and deep learning algorithms for medical image analysis for precision oncology. Her most recent work has been on reproducibility analysis of radiomic features in glioblastoma tumors across segmentation, site, and scanner variability. She has published her research in leading journals in the field of medical image analysis. She has also co-organized two international competitions on nuclei segmentation, MoNuSeg (2018) and MoNuSAC (2020).
We should be asking, ‘How much of an effect is there?’, not ‘Is there an effect?’
Geoff Cumming
The growing availability of radiologic data (such as the more than 1 million publicly available images on The Cancer Imaging Archive), coupled with increased computational resources, has translated into an exponential increase in the number of artificial intelligence (AI) strategies for disease characterization. Apart from the “data-driven” deep learning revolution, a major focus of AI in radiology in recent years has been on strategies involving radiomic features. Radiomics is defined as the computerized, high-throughput extraction of subvisual features (also known as computational biomarkers) from in vivo imaging to comprehensively characterize pixelwise tumor characteristics that may not be visually appreciable by expert readers. These radiomic attributes go beyond the semantic features of radiologist-derived tumor assessments to capture spiculations and shape- and size-based attributes (quantitatively measuring regular or irregular tumor boundary changes based on three-dimensional topology or morphology) and to quantify intratumoral heterogeneity via pixel-level, gray-level, and textural differences, among others. As such, radiomic analysis often yields thousands of quantifiable imaging characteristics extracted from routine diagnostic MRI and CT scans. In multiple studies (1–3), radiomics approaches have demonstrated tantalizing potential in prognostic and predictive modeling for characterizing disease presence or response to therapy using imaging studies.
A significant advantage of developing and implementing radiomics-based approaches is that they leverage imaging studies that are routinely acquired as part of clinical protocols and hence do not significantly disrupt existing clinical workflows. However, a key bottleneck limiting the clinical utility of radiomic approaches has been their lack of demonstrated ability to generalize across variations in image acquisition protocols, across scanners (from different vendors), and across sites. Sources of variation in MRI acquisition that are known to impact radiomic features include differences in image contrast, voxel resolution, section thickness, image reconstruction method, magnetic field strength, echo time, and repetition time. Notably, at present, most of these strategies are driven by identifying a subset of radiomic features (from a pool of thousands) that optimizes classifier accuracy, without necessarily accounting for the sensitivity of those features to variations in image acquisition. Unfortunately, the sensitivity of radiomic features to image-specific variations has stirred some level of mistrust among radiologists and others in the clinical community regarding the potential of these powerful approaches.
Multiple efforts are currently underway to address some of the practical challenges associated with the generalizability of radiomic features. These include attempts to study radiomic variability across large multisite data (4), controlled test-retest data (5), and phantom studies (6). Most of these attempts so far have pursued a set of radiomic features that are both repeatable and reproducible across different image acquisition, preprocessing, segmentation, and radiomic feature extraction pipelines. Reproducibility refers to radiomic features remaining consistent despite differences in image acquisition across software, sites, and scanners, while repeatability refers to radiomic features remaining consistent across scans obtained multiple times on the same scanner. The most well-known effort in this direction is the Image Biomarker Standardization Initiative (IBSI) (7), which was established to provide best-practice guidelines for standardizing feature extraction pipelines for radiologic scans from different sites and scanners. Additionally, open-source software platforms, such as PyRadiomics and the Cancer Imaging Phenomics Toolkit, have provided the medical imaging community with standardized radiomic feature pipelines for analysis.
The article by Hoebel et al in this issue of Radiology: Artificial Intelligence attempts to address the critical issue of identifying repeatable radiomic features using controlled test-retest data in glioblastoma (GBM) tumors (8). The authors leveraged a unique cohort of 48 studies from two clinical trials, NCT00662506 and NCT00756106, to assess the influence of preprocessing on the repeatability of radiomic features extracted from GBM MRI scans acquired 2–6 days apart using the same 3.0-T scanners. The authors used the PyRadiomics open-source Python package to obtain radiomic features such as shape, intensity, and texture features derived from gray-level co-occurrence matrices (GLCM) from enhancing lesions and T2 abnormal regions between test and retest scans.
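To make concrete what a GLCM-derived texture feature measures, the following minimal Python sketch builds a symmetric, normalized co-occurrence matrix for a single horizontal offset and computes the contrast feature from it. It is a toy illustration under simplifying assumptions (a tiny two-dimensional image already discretized into integer gray levels, one offset only); the function names `glcm_horizontal` and `glcm_contrast` are hypothetical, and this is not the PyRadiomics implementation used by the authors.

```python
def glcm_horizontal(image, levels):
    """Symmetric, normalized GLCM for horizontally adjacent pixel pairs.

    `image` is a list of rows of integer gray levels in range(levels).
    """
    counts = [[0] * levels for _ in range(levels)]
    for row in image:
        for left, right in zip(row, row[1:]):
            counts[left][right] += 1
            counts[right][left] += 1  # symmetric: count each pair in both directions
    total = sum(sum(r) for r in counts)
    if total == 0:
        raise ValueError("image has no horizontal neighbor pairs")
    return [[c / total for c in r] for r in counts]


def glcm_contrast(p):
    """Contrast: sum over (i, j) of p[i][j] * (i - j)^2."""
    n = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))


# Toy images (assumed, for illustration only).
uniform = [[1, 1], [1, 1]]
checkerboard = [[0, 1], [1, 0]]

contrast_uniform = glcm_contrast(glcm_horizontal(uniform, levels=2))
contrast_checker = glcm_contrast(glcm_horizontal(checkerboard, levels=2))
print(contrast_uniform, contrast_checker)  # prints: 0.0 1.0
```

A homogeneous region yields zero contrast, whereas alternating gray levels yield high contrast. Because the matrix is built from discretized intensities, the result depends on how voxel values are mapped to gray levels.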
For both T2-weighted fluid-attenuated inversion recovery (FLAIR) and gadolinium-enhanced T1-weighted scan and rescan MRI sequences, Hoebel et al found that shape features were highly repeatable in comparison with intensity and texture features, as quantified using the intraclass correlation coefficient. Shape features have been shown to be impacted by segmentation variability in multiple studies in the context of GBM, as well as other solid tumors (9). However, in this study, the expert segmentations obtained across test and retest scans appeared to be consistent, which resulted in high repeatability of the derived shape features across test-retest studies. More generally, prior work (9) has demonstrated that segmentation variability can cause differences in radiomic features and should be carefully investigated when assessing radiomic reproducibility for model building. Additionally, the findings suggested that both intensity and texture features demonstrated higher variability than the shape features. These findings are in line with previous repeatability (5) and reproducibility studies (10,11), in which intensity and texture features were found to be sensitive to subtle variations in test-retest scans, as well as to image variations across sites and scanners. Both texture features (describing heterogeneity measures from the gray-level co-occurrence matrix) and intensity features (describing the distribution of intensity values) depend on the actual per-voxel intensity values within the region of interest and may yield substantially different measurements if they are not appropriately normalized in some fashion before downstream analysis.
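For readers less familiar with the intraclass correlation coefficient, the minimal Python sketch below computes one common variant, the one-way random-effects, single-measurement form often written ICC(1,1), for a feature measured on a test scan and a retest scan. The choice of this variant is an assumption made for illustration; the commentary does not specify which form Hoebel et al used, and the function name `icc_one_way` and the toy values are hypothetical.

```python
from statistics import mean


def icc_one_way(measurements):
    """ICC(1,1) for a list of per-subject lists, each holding k repeated values."""
    n = len(measurements)
    k = len(measurements[0]) if n else 0
    if n < 2 or k < 2 or any(len(m) != k for m in measurements):
        raise ValueError("need >= 2 subjects, each with the same k >= 2 measurements")

    grand_mean = mean(v for m in measurements for v in m)
    subject_means = [mean(m) for m in measurements]

    # Between-subject and within-subject mean squares from a one-way ANOVA.
    ms_between = k * sum((sm - grand_mean) ** 2 for sm in subject_means) / (n - 1)
    ms_within = sum(
        (v - sm) ** 2 for m, sm in zip(measurements, subject_means) for v in m
    ) / (n * (k - 1))

    denominator = ms_between + (k - 1) * ms_within
    if denominator == 0:
        raise ValueError("ICC is undefined when all measurements are identical")
    return (ms_between - ms_within) / denominator


# Hypothetical feature values as [test, retest] pairs for three patients.
stable_feature = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]    # identical on rescan
unstable_feature = [[1.0, 3.0], [2.0, 2.0], [3.0, 1.0]]  # patient ranking reverses

icc_stable = icc_one_way(stable_feature)
icc_unstable = icc_one_way(unstable_feature)
print(icc_stable, icc_unstable)
```

Values near 1 indicate that differences between patients dominate scan-to-scan differences within a patient, which is the sense in which a feature is called repeatable; values near or below 0 indicate that scan-to-scan variation swamps any patient-level signal.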
To assess how normalization affects the repeatability of radiomic features, the authors comprehensively evaluated the impact of preprocessing operations such as histogram matching and z score normalization on features extracted from gadolinium-enhanced T1-weighted and T2-weighted FLAIR MRI scans. Such analysis is particularly relevant in the context of MRI because absolute voxel intensities in MRI (unlike in CT) do not have tissue-specific meaning. Their study showed image normalization to be a key step, as it improved the overlap of the region-of-interest intensity histograms between the scan and rescan. However, in a few cases, the authors reported that image normalization had an adverse effect on feature repeatability. Further analysis revealed that the adverse effect in these cases was due to incomplete brain extraction prior to normalization. The improvement in repeatability following this kind of quality assessment highlights the importance of carefully interrogating the data after each preprocessing step to avoid downstream effects on feature repeatability and classifier performance.
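The interplay between z score normalization and brain extraction can be illustrated with a minimal Python sketch. It assumes a toy one-dimensional "image" of four brain voxels plus one bright non-brain voxel, and models the scan-rescan difference as a simple gain and offset change; both are hypothetical simplifications, as is the function name `z_score_normalize`, and none of this reproduces the authors' actual pipeline.

```python
from statistics import mean, pstdev


def z_score_normalize(intensities, mask):
    """Rescale all voxels using the mean and SD of the voxels where mask is True."""
    reference = [v for v, inside in zip(intensities, mask) if inside]
    mu = mean(reference)
    sigma = pstdev(reference)
    if sigma == 0:
        raise ValueError("masked region has zero intensity variance")
    return [(v - mu) / sigma for v in intensities]


def max_abs_difference(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


# Four brain voxels followed by one bright non-brain voxel (assumed values).
scan = [100.0, 120.0, 140.0, 160.0, 400.0]
rescan = [2.0 * v + 30.0 for v in scan]  # hypothetical gain/offset change on rescan

clean_mask = [True, True, True, True, False]  # complete brain extraction
leaky_mask = [True, True, True, True, True]   # non-brain voxel left in the mask

scan_z = z_score_normalize(scan, clean_mask)
rescan_z = z_score_normalize(rescan, clean_mask)
rescan_z_leaky = z_score_normalize(rescan, leaky_mask)

# Compare only the four brain voxels.
diff_clean = max_abs_difference(scan_z[:4], rescan_z[:4])
diff_leaky = max_abs_difference(scan_z[:4], rescan_z_leaky[:4])
print(diff_clean, diff_leaky)
```

With a consistent brain mask, the normalized brain intensities of scan and rescan agree up to floating-point error. When residual non-brain tissue enters the normalization statistics of one scan, the same voxels receive noticeably different normalized values, so any intensity or texture feature computed from them would differ as well.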
Notably, the study demonstrated differences in repeatability trends: features from T2-weighted FLAIR demonstrated greater repeatability than those from T1-weighted MRI scans. The authors posit that the repeatability of features from T1-weighted postcontrast scans was impacted by variations in contrast application and the timing of image acquisition after injection despite the study being performed under relatively controlled conditions and in a clinical trial setting. This finding is significant because most radiomic studies (especially in the context of brain tumors) employ routine gadolinium-enhanced T1-weighted MRI scans for analysis and currently do not explicitly account for such variations in gadolinium-based contrast, which appears to negatively impact reproducibility and repeatability of these models.
Although the current study analyzed GLCM texture features alone, other texture features described by the IBSI include the gray-level run-length matrix (GLRLM), gray-level size-zone matrix (GLSZM), neighboring gray-tone difference matrix (NGTDM), gray-level dependence matrix (GLDM), and Laws energy and Gabor wavelet features, all of which have previously demonstrated prognostic ability for prediction of survival in GBMs and other tumors. Thus, a comprehensive analysis of the aforementioned radiomic features in terms of reproducibility and repeatability will be of widespread interest to enable the clinical utility of prognostic and predictive radiomic models.
Undoubtedly, at this point, the question is not so much whether there exists an “effect” in radiomic features across variability in sites and scanners, but rather how much that effect will impact downstream analysis in prognostic or predictive models. Hence, large concerted efforts from the medical imaging community are needed to rigorously and carefully evaluate radiomic features, not just in terms of their predictive efficacy, but also in terms of repeatability and reproducibility across multisite cohorts (preferably in large retrospective clinical trial cohorts). Crucially, these large multi-institutional retrospective studies will serve as key drivers in gaining trust from major stakeholders such as radiologists, oncologists, and patients. With that trust, radiomic features ultimately will make their way into prospective randomized trials as computational biomarkers to guide treatment decisions. The work by Hoebel et al is certainly one such important step in that direction.
Footnotes
Disclosures of Conflicts of Interest: P.T. Activities related to the present article: research partly supported by the National Institutes of Health under award number NCI/ITCR: 1U01CA248226-01, as well as by the Department of Defense (DoD) Peer Reviewed Cancer Research Program (W81XWH-18-1-0404), Johnson & Johnson Research Scholar Award, Dana Foundation David Mahoney Neuroimaging Grant, the CCCC Brain Tumor Pilot Award, the CWRU Technology Validation Start-Up Fund (CTP), and The V Foundation Translational Research Award. Activities not related to the present article: disclosed no relevant relationships. Other relationships: disclosed no relevant relationships. R.V. disclosed no relevant relationships.
References
- 1. Kickingereder P, Burth S, Wick A, et al. Radiomic profiling of glioblastoma: identifying an imaging predictor of patient survival with improved performance over established clinical and radiologic risk models. Radiology 2016;280(3):880–889.
- 2. Beig N, Bera K, Prasanna P, et al. Radiogenomic-based survival risk stratification of tumor habitat on Gd-T1w MRI is associated with biological processes in glioblastoma. Clin Cancer Res 2020;26(8):1866–1876.
- 3. Khorrami M, Khunger M, Zagouras A, et al. Combination of peri- and intratumoral radiomic features on baseline CT scans predicts response to chemotherapy in lung adenocarcinoma. Radiol Artif Intell 2019;1(2):e180012.
- 4. Khorrami M, Bera K, Leo P, et al. Stable and discriminating radiomic predictor of recurrence in early stage non-small cell lung cancer: multi-site study. Lung Cancer 2020;142:90–97.
- 5. Shiri I, Hajianfar G, Sohrabi A, et al. Repeatability of radiomic features in magnetic resonance imaging of glioblastoma: test-retest and image registration analyses. Med Phys 2020. doi: 10.1002/mp.14368. Published online July 2, 2020.
- 6. Mackin D, Fave X, Zhang L, et al. Measuring computed tomography scanner variability of radiomics features. Invest Radiol 2015;50(11):757–765.
- 7. Zwanenburg A, Vallières M, Abdalah MA, et al. The Image Biomarker Standardization Initiative: standardized quantitative radiomics for high-throughput image-based phenotyping. Radiology 2020;295(2):328–338.
- 8. Hoebel KV, Patel JB, Beers A, et al. Radiomics repeatability pitfalls in a scan-rescan MRI study of glioblastoma. Radiol Artif Intell 2020;3(1):e190199.
- 9. Tixier F, Um H, Young RJ, Veeraraghavan H. Reliability of tumor segmentation in glioblastoma: impact on the robustness of MRI-radiomic features. Med Phys 2019;46(8):3582–3591.
- 10. Um H, Tixier F, Bermudez D, Deasy JO, Young RJ, Veeraraghavan H. Impact of image preprocessing on the scanner dependence of multi-parametric MRI radiomic features and covariate shift in multi-institutional glioblastoma datasets. Phys Med Biol 2019;64(16):165011.
- 11. Chirra P, Leo P, Yim M, et al. Multisite evaluation of radiomic feature reproducibility and discriminability for identifying peripheral zone prostate tumors on MRI. J Med Imaging (Bellingham) 2019;6(2):024502.