See the article by Chang et al. in this issue, pp. 1412–1422.
Identification and delineation of tumor burden on MR images is a critical, yet challenging, function for clinical care and neuro-oncology research. Changes in tumor burden on structural MRI sequences are used as surrogates for efficacy throughout the entire drug development pipeline, from initial preclinical evaluations through clinical tests in human patients. DeepNeuro, the algorithm recently published by Chang et al,1 and other automated approaches to tumor segmentation (see reviews by Gordillo et al,2 Işın et al,3 and Bauer et al4), combined with standardization of image acquisition and knowledgeable interpretation of radiographic changes, hold great promise for accurate, repeatable, efficient, and inexpensive tumor segmentation.
Manual or semi-automated quantification of tumor size by experts using commercial or academic tools is highly variable across platforms5 and is time-consuming, labor intensive, inefficient, and expensive. Expert neuroradiologists in large clinical trials average adjudication rates above 40%,6–8 consistent with adjudication or discrepancy rates between radiologists in other tumor types and between central review and local site calls of progression (~30–40%).9 Difficulty in assessing changes in T2 hyperintense lesions under the Response Assessment in Neuro-Oncology criteria was often cited as the largest reason for adjudication, while changes in contrast enhancement appeared less ambiguous but still relatively difficult to assess. While some investigators have suggested that volumetric segmentation on T1 subtraction maps may replace blinded, adjudicated central reads,10 significant questions remain about the reproducibility of simple segmentation approaches in the presence of artifacts and about the generalization of these threshold-based segmentation techniques to all relevant components of the tumor (eg, enhancing, necrotic, and non-enhancing disease). Automated segmentation and quantitative response assessment using a large set of posttherapeutic data, as demonstrated by DeepNeuro,1 have significant potential to reduce these adjudication rates while allowing investigators and regulators to evaluate potential changes in various aspects of tumor biology.
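To make the limitation concrete, the threshold-based T1 subtraction approach discussed above can be sketched in a few lines. This is a minimal illustrative sketch, not the method of the cited study: it assumes co-registered, intensity-normalized pre- and post-contrast T1-weighted volumes as NumPy arrays, and the threshold value and voxel volume are arbitrary placeholders.

```python
import numpy as np

def t1_subtraction_segmentation(t1_pre, t1_post, threshold=0.1, voxel_volume_mm3=1.0):
    """Segment contrast-enhancing tumor by thresholding a T1 subtraction map.

    Assumes t1_pre and t1_post are co-registered, intensity-normalized
    3D arrays; the threshold is an illustrative placeholder, and a fixed
    threshold like this is exactly what makes the approach artifact-prone.
    """
    subtraction = t1_post - t1_pre      # enhancement appears as positive signal
    mask = subtraction > threshold      # simple fixed threshold
    volume_mm3 = float(mask.sum()) * voxel_volume_mm3
    return mask, volume_mm3

# Toy example: a synthetic 10x10x10 volume with a simulated 3x3x3 enhancing core
pre = np.zeros((10, 10, 10))
post = np.zeros((10, 10, 10))
post[4:7, 4:7, 4:7] = 0.5               # simulated enhancement
mask, vol = t1_subtraction_segmentation(pre, post, threshold=0.1)
print(vol)  # 27 voxels x 1.0 mm^3 -> 27.0
```

Note that a single global threshold cannot distinguish enhancing tumor from necrosis, blood products, or registration artifact, which is the reproducibility concern raised above.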
In addition to standard response assessment, clearly defining regions of interest for both T2 hyperintense lesions (eg, edema, non-enhancing tumor) and contrast-enhancing lesions on postcontrast T1-weighted images is critical for obtaining advanced imaging measurements (diffusion, perfusion, PET, etc.) within areas of active disease. Thus, tumor segmentation itself adds another level of uncertainty beyond the inherent variability in advanced image acquisition and postprocessing. The use of an automated and standardized tumor segmentation algorithm like DeepNeuro could conceivably reduce variability in advanced imaging analyses and increase the repeatability of results across centers and studies.
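The dependence of advanced imaging measurements on the segmentation can be illustrated with a short sketch. This is a generic, hypothetical example (the function name and summary statistics are our own, not from DeepNeuro): any quantitative map (eg, ADC, rCBV, SUV) co-registered to a tumor mask is summarized only over the voxels the mask includes, so any change to the mask changes the measurement.

```python
import numpy as np

def roi_summary(parameter_map, roi_mask):
    """Summarize a quantitative imaging map (eg, ADC, rCBV) within a tumor ROI.

    parameter_map: 3D array of a quantitative map, co-registered to roi_mask.
    roi_mask: boolean 3D array from a (manual or automated) segmentation.
    """
    values = parameter_map[roi_mask]    # boolean-mask indexing selects ROI voxels
    return {
        "n_voxels": int(values.size),
        "mean": float(values.mean()),
        "median": float(np.median(values)),
    }

# Toy example: a synthetic "ADC" map with a distinct value inside a cubic ROI
adc = np.full((8, 8, 8), 0.8)
mask = np.zeros((8, 8, 8), dtype=bool)
mask[2:5, 2:5, 2:5] = True
adc[mask] = 1.2
stats = roi_summary(adc, mask)
print(stats)  # mean and median computed over the 27 ROI voxels only
```

Because every statistic here is conditioned on the mask, segmentation variability propagates directly into the advanced imaging result, which is the point made above.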
The performance of sophisticated algorithms like DeepNeuro should improve greatly through adoption and implementation of the standardized brain tumor imaging protocol (BTIP).11 Introduced in 2015 as a result of joint meetings between the FDA, National Cancer Institute, clinical scientists, imaging experts, pharmaceutical and biotech companies, clinical trial cooperative groups, and patient advocacy groups, BTIP is compliant with American College of Radiology Imaging Network and European Organisation for Research and Treatment of Cancer guidelines and includes the minimum recommended sequences for response assessment using 1.5T and 3T MRI scanners. Because most machine learning algorithms are trained on public datasets obtained from individual institutions or trials that may not be BTIP compliant, they must be designed to generalize across a wide range of acquisition parameters and scanner settings. By streamlining and focusing the range of MR parameters acceptable for use in clinical trials, BTIP compliance should reduce variability in imaging characteristics and thereby improve the performance of automated algorithms.
“We are drowning in data but starved for wisdom.”
–Arianna Huffington, founder of The Huffington Post, on artificial intelligence
More than 200 scientific abstracts, plenary talks, and presentations were devoted to artificial intelligence or machine learning at the 2018 Radiological Society of North America (RSNA) annual meeting. In radiology, artificial intelligence and "deep learning" approaches are changing the way we schedule patients, acquire and postprocess images, measure tumor size, dictate or interpret changes, and charge for radiological services. While DeepNeuro and other techniques with automatic pipelines hold the promise of reducing ambiguity and variability in tumor segmentation, these algorithms are largely based on "expert reads," which may be problematic: experts may not be properly vetted or trained, may have difficulty accurately defining tumor boundaries in diffuse gliomas, and/or may not utilize all approaches known to be useful for defining active disease (eg, T1 subtraction maps, subtle signal intensity changes on T2-weighted images, architectural disruption at the gray/white matter boundary). Additionally, most available automated algorithms have been trained entirely on treatment-naïve tumors,5 a much simpler problem (ie, no postsurgical or posttherapeutic changes) with little tangible clinical impact. It is more meaningful, and admittedly more difficult, to try to differentiate changes in tumor in the posttherapeutic setting or during treatment with experimental therapeutics. While DeepNeuro may not yet have undergone extensive testing in various therapeutic contexts (eg, anti-angiogenic agents, immunotherapies) or in challenging tumor types (eg, gliomatosis cerebri, mixed grade tumors), the authors' attempt to adapt their algorithm to perform automated response assessment during treatment within a clinical trial is a strong, noteworthy step in the right direction. Thus, while promising, artificial intelligence is inherently only as good as the knowledge we are able to provide.
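Agreement between an algorithm and an expert read, or between two expert reads, is conventionally quantified with the Dice similarity coefficient, the standard overlap metric used in segmentation benchmarks such as BraTS.5 A minimal sketch, with an illustrative two-reader toy example of our own construction:

```python
import numpy as np

def dice_coefficient(mask_a, mask_b):
    """Dice similarity coefficient between two binary segmentation masks.

    Ranges from 0 (no overlap) to 1 (identical masks); commonly used to
    compare automated segmentations against expert reference reads.
    """
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2.0 * np.logical_and(a, b).sum() / denom

# Toy example: two "reads" of the same lesion, offset by one voxel
reader1 = np.zeros((10, 10), dtype=bool)
reader2 = np.zeros((10, 10), dtype=bool)
reader1[2:6, 2:6] = True   # 16 voxels
reader2[3:7, 3:7] = True   # 16 voxels, shifted
print(dice_coefficient(reader1, reader2))  # 2*9/(16+16) = 0.5625
```

Even this small one-voxel offset cuts the overlap to roughly half, which gives a sense of how diffuse, ill-defined boundaries in gliomas translate into the discrepancy and adjudication rates discussed above.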
These algorithms cannot yet generate new insight into the intrinsic biology of brain tumors, nor can they begin to unravel the true extent of tumor infiltration into the brain. Perhaps someday, with the addition of autopsy data, this latter goal could become a reality. However, just like humans, these new AI algorithms can adapt, adjust, learn, and "self-correct" to improve performance based on new or improved knowledge.
We are at the dawn of a renaissance in imaging technology in neuro-oncology. Artificial intelligence–based techniques and approaches for automated response assessment will grow in sophistication and complexity in the years and decades to come. Investigators and regulators should be acutely aware of the limitations of this technology, providing the wisdom and context needed to guide its development and implementation in both clinical trials and clinical care for the benefit of brain tumor patients.
References
- 1. Chang K, Beers AL, Bai HX, et al. Automatic assessment of glioma burden: a deep learning algorithm for fully automated volumetric and bi-dimensional measurement. Neuro Oncol. 2019;21(11):1412–1422.
- 2. Gordillo N, Montseny E, Sobrevilla P. State of the art survey on MRI brain tumor segmentation. Magn Reson Imaging. 2013;31(8):1426–1438.
- 3. Işın A, Direkoğlu C, Şah M. Review of MRI-based brain tumor image segmentation using deep learning methods. Procedia Comput Sci. 2016;102:317–324.
- 4. Bauer S, Wiest R, Nolte LP, Reyes M. A survey of MRI-based medical image analysis for brain tumor studies. Phys Med Biol. 2013;58(13):R97–R129.
- 5. Menze BH, Jakab A, Bauer S, et al. The Multimodal Brain Tumor Image Segmentation Benchmark (BRATS). IEEE Trans Med Imaging. 2015;34(10):1993–2024.
- 6. Ford RR, O'Neal M, Moskowitz SC, Fraunberger J. Adjudication rates between readers in blinded independent central review of oncology studies. J Clin Trials. 2016;6(5):289.
- 7. Boxerman JL, Zhang Z, Safriel Y, et al. Early post-bevacizumab progression on contrast-enhanced MRI as a prognostic marker for overall survival in recurrent glioblastoma: results from the ACRIN 6677/RTOG 0625 Central Reader Study. Neuro Oncol. 2013;15(7):945–954.
- 8. Pope WB, Hessel C. Response assessment in neuro-oncology criteria: implementation challenges in multicenter neuro-oncology trials. AJNR Am J Neuroradiol. 2011;32(5):794–797.
- 9. Dodd LE, Korn EL, Freidlin B, et al. Blinded independent central review of progression-free survival in phase III clinical trials: important design element or unnecessary expense? J Clin Oncol. 2008;26(22):3791–3796.
- 10. Schmainda KM, Prah MA, Zhang Z, et al. Quantitative delta T1 (dT1) as a replacement for adjudicated central reader analysis of contrast-enhancing tumor burden: a subanalysis of The American College of Radiology Imaging Network 6677/Radiation Therapy Oncology Group 0625 Multicenter Brain Tumor Trial. AJNR Am J Neuroradiol. 2019;40(7):1132–1139.
- 11. Ellingson BM, Bendszus M, Boxerman J, et al.; Jumpstarting Brain Tumor Drug Development Coalition Imaging Standardization Steering Committee. Consensus recommendations for a standardized Brain Tumor Imaging Protocol in clinical trials. Neuro Oncol. 2015;17(9):1188–1198.