Author manuscript; available in PMC: 2023 Sep 27.
Published in final edited form as: Eur Radiol. 2023 Apr 12;33(9):6582–6591. doi: 10.1007/s00330-023-09583-3

Fast, light, and scalable: harnessing data‑mined line annotations for automated tumor segmentation on brain MRI

Nathaniel C Swinburne 1, Vivek Yadav 1, Krishna Nand Keshava Murthy 1, Pierre Elnajjar 1, Hao‑Hsin Shih 1, Prashanth Kumar Panyam 1, Alice Santilli 1, David C Gutman 1, Luke Pike 2, Nelson S Moss 3, Jacqueline Stone 4, Vaios Hatzoglou 1, Akash Shah 1, Krishna Juluru 1, Sohrab P Shah 5, Andrei I Holodny 1, Robert J Young 1; M.S.K. MIND Consortium
PMCID: PMC10523913  NIHMSID: NIHMS1892697  PMID: 37042979

Abstract

Objectives

While fully supervised learning can yield high-performing segmentation models, the effort required to manually segment large training sets limits practical utility. We investigate whether data-mined line annotations can facilitate brain MRI tumor segmentation model development without requiring manually segmented training data.

Methods

In this retrospective study, a tumor detection model trained using clinical line annotations mined from PACS was leveraged with unsupervised segmentation to generate pseudo-masks of enhancing tumors on T1-weighted post-contrast images (9911 image slices; 3449 adult patients). Baseline segmentation models were trained and employed within a semi-supervised learning (SSL) framework to refine the pseudo-masks. Following each self-refinement cycle, a new model was trained and tested on a held-out set of 319 manually segmented image slices (93 adult patients), with the SSL cycles continuing until Dice score coefficient (DSC) peaked. DSCs were compared using bootstrap resampling.

Utilizing the best-performing models, two inference methods were compared: (1) conventional full-image segmentation, and (2) a hybrid method augmenting full-image segmentation with detection plus image patch segmentation.

Results

Baseline segmentation models achieved DSC of 0.768 (U-Net), 0.831 (Mask R-CNN), and 0.838 (HRNet), improving with self-refinement to 0.798, 0.871, and 0.873 (each p < 0.001), respectively. Hybrid inference outperformed full-image segmentation alone: DSC 0.884 (Mask R-CNN) vs. 0.873 (HRNet), p < 0.001.

Conclusions

Line annotations mined from PACS can be harnessed within an automated pipeline to produce accurate brain MRI tumor segmentation models without manually segmented training data, providing a mechanism to rapidly establish tumor segmentation capabilities across radiology modalities.

Keywords: Deep learning, Magnetic resonance imaging, Brain, Radiology, Neoplasms

Introduction

The application of artificial intelligence (AI) to diagnostic radiology promises to revolutionize patient care by reducing detection errors, increasing accuracy, and improving the non-invasive characterization of disease. In cancer imaging, tumor segmentation represents a more accurate quantification of disease burden as compared with conventional linear measurement used by most radiologists in standard care and oncology clinical trials [1–3]. In contrast to tumor segmentation, linear measurements, which assume that tumors have a regular ellipsoid morphology, are insensitive to subtle changes in tumor size and may be impacted by technical factors like patient positioning in the scanner [4], reducing interobserver agreement [5]. Tumor segmentation is also a prerequisite for radiomic analysis, the application of AI for the non-invasive characterization of tumors to guide optimal treatment [6], and essential to longitudinal analyses that track individual tumor changes over time. Despite its value to both clinical care and research, tumor segmentation is not routinely performed due to the onerous time and effort requirements. Manual segmentation of a single tumor on a brain MRI can take an average of 10 min [7], and one volumetric scan may feature numerous individual tumors.

While AI has transformed the automated processing of non-medical images, the progress of radiology AI has been comparatively slow due to the scarcity of domain experts needed for the time-intensive and generally non-reimbursable process of radiology image labeling. Robust segmentation AI models typically require at least thousands of annotated training images, and deployed models may require additional annotated data for retraining due to limitations of model generalizability [8–11] and target data shift [12–14]. The reliance on manual annotation of radiology data for fully supervised learning thus represents a major challenge and obstacle to the advancement of radiology AI.

There is a need for scalable automated image annotation techniques that overcome the necessity of such large-scale manual efforts, shifting the major burden of effort from the radiologist to the AI models and data scientists, with the radiologist maintaining the critical role of overseeing and ensuring model performance. In a previous study, we described an automated pipeline that data mines clinically generated tumor line measurement annotations in PACS and employs semi-supervised learning (SSL) to generate bounding boxes around unlabeled tumors in the training images, achieving accurate tumor detection on brain MRI [15, 16]. In achieving high-performance using image annotations generated by well-established clinical radiology workflows, this pipeline avoids the manual annotation bottleneck inherent in fully supervised learning and provides a source of continuous annotation data for model retraining.

We hypothesize that this PACS data mining pipeline can be extended to further downstream AI tasks such as tumor segmentation, using SSL to overcome the data labeling bottleneck, facilitating the development of models that will improve disease response assessment and enable scalable “big data” radiomic analyses. Previous radiology deep learning efforts have employed bounding box annotations to accomplish semantic segmentation using pseudo-masks generated automatically via unsupervised segmentation methods [17–20]. While these investigations have utilized manually curated bounding box and image datasets, there remains a need for semantic segmentation methods that achieve high performance without requiring manual curation of the training set, instead leveraging inherently noisy image annotations drawn from real-world clinical radiology workflows. In the current study, we investigate whether the PACS data mining and tumor detection pipeline can be extended using fully automated AI methodologies to achieve accurate segmentation of enhancing tumors on brain MRI without manually segmented training data.

Materials and methods

This retrospective study was approved by the local Institutional Review Board and the need for written informed consent was waived. All data storage and handling were performed in compliance with HIPAA regulations.

Use of the tumor detection model to automatically generate segmentation pseudo‑masks

The tumor detection model was previously trained using clinically annotated T1-weighted post-contrast (T1C+) brain MR images acquired at our institution between January 2012 and December 2017. The detection training pipeline data mines line annotations from PACS, converts lines to bounding boxes, and utilizes an SSL framework to generate bounding boxes around unlabeled tumors in the training images, automatically improving the training dataset. The final model achieved high performance in detecting tumors ≥ 1 cm [15, 16].

In the current investigation, this tumor detection model was utilized to automatically generate baseline tumor segmentation pseudo-masks as follows. Rather than relying on the tumor bounding boxes generated by data mining PACS, which would introduce challenges of data noise and incompletely labeled images, the model was used to detect lesions on the detection training image set, yielding 12,483 lesion bounding boxes on 9911 individual T1C+ image slices from 6236 unique scans (3449 patients). Within the image patch defined by each bounding box, Otsu thresholding [21], a method of automatically binarizing an image into separate foreground and background masks using histogram intensity analysis, was applied to generate an initial approximation of the enhancing tumor. The resultant set of 9911 images and segmentation pseudo-masks comprised the initial dataset (TrainOtsu) used to train the baseline tumor segmentation models.
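To make the pseudo-mask generation step concrete, the sketch below applies Otsu thresholding to the image region defined by a detection bounding box. This is an illustrative NumPy implementation, not the authors' code; the `pseudo_mask` helper, the `(row0, col0, row1, col1)` box convention, and the "foreground = above threshold" assumption (reasonable for enhancing tumor on T1C+) are all ours.

```python
import numpy as np

def otsu_threshold(patch: np.ndarray, nbins: int = 256) -> float:
    """Return the intensity threshold maximizing between-class variance."""
    hist, edges = np.histogram(patch.ravel(), bins=nbins)
    centers = (edges[:-1] + edges[1:]) / 2.0
    w0 = np.cumsum(hist)                      # background pixel counts
    w1 = w0[-1] - w0                          # foreground pixel counts
    s0 = np.cumsum(hist * centers)            # cumulative intensity sums
    mu0 = s0 / np.maximum(w0, 1)              # background mean per cut
    mu1 = (s0[-1] - s0) / np.maximum(w1, 1)   # foreground mean per cut
    between = w0 * w1 * (mu0 - mu1) ** 2      # between-class variance
    return float(centers[np.argmax(between)])

def pseudo_mask(image: np.ndarray, box: tuple) -> np.ndarray:
    """Binarize the bounding-box patch; pixels above the Otsu threshold
    become the enhancing-tumor pseudo-mask, zero elsewhere."""
    r0, c0, r1, c1 = box
    patch = image[r0:r1, c0:c1].astype(float)
    t = otsu_threshold(patch)
    mask = np.zeros(image.shape, dtype=bool)
    mask[r0:r1, c0:c1] = patch > t
    return mask
```

Restricting the thresholding to the detected box is what keeps this usable as a first approximation: within the patch, an enhancing lesion is typically the dominant bright class, so a global histogram split is often adequate even though it would fail on the full image.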

To establish the reference standard test dataset used to score each segmentation model, 319 image slices (from 229 unique scans performed on 93 adult patients) containing enhancing tumors (278 intra-axial and 88 extra-axial tumors) were randomly selected from the held-out test cohort described in [16]. Manual segmentation of all enhancing tumors was performed in ITK-SNAP [22] by coauthor NS (8 years neuroradiology experience). To prevent data leakage, the training and test datasets were defined at the patient level, ensuring that patients with multiple images or scans did not overlap between the two datasets.

Full‑image segmentation model training and pseudo‑mask self‑refinement

Three segmentation neural network architectures were compared: U-Net [23], Mask R-CNN [24], and HRNet [25]. All models were measured for average Dice score coefficient (DSC) using the held-out test set.
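For reference, the DSC used to score every model in this study reduces to a few lines. This toy implementation is illustrative only; the convention that two empty masks score 1.0 is a common choice, not one stated in the paper.

```python
import numpy as np

def dice(pred: np.ndarray, truth: np.ndarray) -> float:
    """Dice score coefficient: 2*|pred AND truth| / (|pred| + |truth|)."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    denom = int(pred.sum()) + int(truth.sum())
    if denom == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2.0 * int((pred & truth).sum()) / denom
```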

To improve the baseline segmentation pseudo-masks through the addition of contextual image data, an automated self-refinement process utilizing the full brain MR images and pseudo-masks was employed for each architecture.

  1. The baseline pseudo-mask training set (TrainOtsu) was employed to train a baseline segmentation model that was used to predict masks on the training images themselves, generating a new proposed segmentation mask set.

    An automated acceptability check (Supplemental information), designed to prevent the erosion and eventual loss of a detected lesion’s segmentation mask during self-refinement, was performed on each image’s proposed new mask, leveraging the bounding boxes previously generated by the detection model. This yielded a new segmentation mask training set, Train1.

  2. The new training set (generically, Trainn) was used to train a new model that was again used to predict new masks on the training images. The acceptability check was again performed for each image and proposed mask, generating a new segmentation training set, Trainn+1.

  3. Step 2 above was repeated until model performance, as scored with the held-out test set, peaked.

After completing the self-refinement process for each architecture, the training set yielding the overall best-performing trained model was selected as the final optimized segmentation dataset (TrainFinal) and used to train final models of each architecture, which were then compared.
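The three-step self-refinement procedure above can be sketched as a generic loop. Here `fit`, `predict`, `accept`, and `score` are placeholder callables standing in for model training, mask prediction on the training images, the automated acceptability check, and held-out DSC scoring; the early-stopping rule (stop when the held-out score no longer improves) mirrors the "continue until DSC peaked" criterion.

```python
def self_refine(masks, fit, predict, accept, score, max_cycles=10):
    """Train on pseudo-masks, re-predict them, and repeat until the
    held-out score peaks; returns the best training set and its score."""
    best_masks, best_dsc = masks, -1.0
    for _ in range(max_cycles):
        model = fit(masks)            # train on current pseudo-mask set
        dsc = score(model)            # evaluate on held-out test set
        if dsc <= best_dsc:           # performance has peaked: stop
            break
        best_masks, best_dsc = masks, dsc
        proposed = predict(model)     # model re-labels its own training images
        # Acceptability check: keep a proposed mask only if it passes
        # (e.g., still covers the detection bounding box); else keep the old one.
        masks = [new if accept(new, old) else old
                 for new, old in zip(proposed, masks)]
    return best_masks, best_dsc
```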

Comparison of conventional versus hybrid segmentation implementations

To investigate whether the tumor detection model has value beyond generating segmentation training data and can also directly augment the full-image segmentation model’s performance, a second, hybrid segmentation inference method was investigated. The hybrid method adds a second arm running in parallel to the full-image segmentation model: the detection model identifies tumors, and the corresponding image patches are passed to a separate image patch segmentation model. The predicted masks output by the two arms are combined in a final reconciliation step that corrects for tumors missed by the full-image segmentation model, maximizing overall segmentation performance (Supplemental information).

To generate the image patch segmentation training set, the training images and TrainFinal segmentation masks were cropped using the bounding boxes provided by the detection model. The cropped images and masks (TrainFinal-patch) were used to train an image patch segmentation model of each architecture.
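Assuming a simple reconciliation rule (the actual rule is in the supplemental material and not reproduced here), the hybrid method's final step might look like the sketch below: wherever the full-image arm produced nothing inside a detected bounding box, the patch arm's mask is pasted in. The function name and box format are illustrative.

```python
import numpy as np

def hybrid_segment(full_mask, boxes, patch_masks):
    """Reconcile the full-image mask with detection-guided patch masks:
    if the full-image model found nothing inside a detected box, paste
    in the patch model's mask for that lesion."""
    out = full_mask.copy()
    for (r0, c0, r1, c1), pm in zip(boxes, patch_masks):
        if not full_mask[r0:r1, c0:c1].any():  # tumor missed by full-image arm
            out[r0:r1, c0:c1] |= pm
    return out
```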

Conventional segmentation models and the best-performing hybrid segmentation method were further compared using Hausdorff distance (95th percentile), volume similarity, and Jaccard similarity coefficient. The impact of tumor size on segmentation performance was evaluated using Pearson correlation coefficient.

The impact of training set size was investigated using the best-performing hybrid segmentation method and conventional segmentation model of the same architecture. For each method, additional models were trained using randomly selected subsets of the full training set (75%, 50%, and 25%) and evaluated using the full test set.

An overview of the training set automated segmentation process is shown in Fig. 1. The full PACS mining, object detection, and segmentation (MODS) development pipeline is shown in Fig. 2.

Fig. 1.

Fig. 1

Overview of the training set automated segmentation process. Beginning with tumor bounding boxes, unsupervised segmentation (Otsu thresholding) is performed on the image regions defined by each box, yielding the baseline segmentation pseudo-masks (TrainOtsu). For each segmentation architecture, mask self-refinement is performed. The self-refined segmentation mask training set yielding the best performing model of any architecture, as assessed using the held-out test set, is selected as the final training set (TrainFinal) for the full-image segmentation models. The images and TrainFinal masks are cropped using the bounding boxes, yielding the training set for the image patch models (TrainFinal-patch)

Fig. 2.

Fig. 2

Overview of the full PACS data mining, object detection, and segmentation (MODS) pipeline extended in the current investigation, with the new steps outlined inside the blue dashed lines. In both the object detection and segmentation steps, semi-supervised learning is used to automatically correct noisy training annotations

Statistical analysis

Statistical tests were conducted to compare the differences in model DSCs and significance levels were derived by bootstrap resampling 2000 times. In each resampling cycle, a sample of 319 images was randomly selected from the full test dataset with replacement and used to score each model. These aggregate results were then used to calculate pair-wise comparison p-values (1-sided).
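The bootstrap comparison described above can be sketched in Python. This is a minimal illustration: `dsc_a` and `dsc_b` are hypothetical per-image Dice arrays for two models on the same 319 test images, and the tie-handling convention (counting resamples where model B fails to beat model A) is an assumption.

```python
import numpy as np

def bootstrap_pvalue(dsc_a, dsc_b, n_boot=2000, seed=0):
    """One-sided p-value that model B outperforms model A, by resampling
    per-image Dice scores with replacement (paired across models)."""
    rng = np.random.default_rng(seed)
    dsc_a, dsc_b = np.asarray(dsc_a), np.asarray(dsc_b)
    n = len(dsc_a)
    wins_for_null = 0
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample the test images
        wins_for_null += dsc_b[idx].mean() <= dsc_a[idx].mean()
    return wins_for_null / n_boot
```

Because the same resampled indices score both models, the comparison is paired, which matches the study design of scoring each model on the identical bootstrap sample.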

Results

The combined segmentation training and test sets included images from 3542 patients (mean age 58.4 ± 17 years; 2014 women and 1528 men) on 6465 unique MRI scans. Patients’ primary malignancy varied widely, with lung adenocarcinoma, glioblastoma, and breast ductal carcinoma being the most common solitary cancer diagnoses. Patient demographics are included in Table 1.

Table 1.

Patient demographics

| | Training group | Testing group | Total |
| --- | --- | --- | --- |
| Patient demographics | | | |
| No. of patients | 3449 | 93 | 3542 |
| Mean age (y) | 58.5 ± 17.0 | 56.4 ± 17.0 | 58.4 ± 17.0 |
| Women/men (ratio) | 1965/1484 (1.32) | 49/44 (1.11) | 2014/1528 (1.32) |
| Primary malignancy | | | |
| Lung adenocarcinoma | 466 (13.5%) | 11 (11.8%) | 477 (13.5%) |
| Glioblastoma | 367 (10.6%) | 12 (12.9%) | 379 (10.7%) |
| Breast ductal carcinoma | 342 (9.9%) | 9 (9.6%) | 351 (9.9%) |
| Melanoma | 131 (3.8%) | 10 (10.8%) | 141 (4.0%) |
| Prostate adenocarcinoma | 87 (2.5%) | 1 (1.1%) | 88 (2.5%) |
| Lung small cell carcinoma | 71 (2.1%) | 5 (5.4%) | 76 (2.1%) |
| Other, multiple, or unavailable | 1985 (57.6%) | 45 (48.3%) | 2030 (57.3%) |
| Number of unique scans | 6236 | 229 | 6465 |
| Image slice thickness (mm)* | | | |
| 5.0 | 9496 (95.8%) | 316 (99.1%) | 9812 (95.9%) |
| 3.0 | 257 (2.6%) | 3 (0.9%) | 260 (2.5%) |
| 4.5 | 98 (1.0%) | N/A | 98 (1.0%) |
| Other | 60 (0.6%) | N/A | 60 (0.6%) |
| Image slices per scanner manufacturer | | | |
| GE Medical Systems | 9877 (99.7%) | 319 (100%) | 10,196 (99.7%) |
| Philips Healthcare | 29 (0.3%) | 0 (0%) | 29 (0.3%) |
| Siemens | 5 (0.05%) | 0 (0%) | 5 (0.05%) |
| Image in-plane resolution (voxels) | | | |
| 256 × 256 | 5424 (54.7%) | 199 (62.3%) | 5623 (55.0%) |
| 512 × 512 | 4453 (44.9%) | 120 (37.6%) | 4573 (44.7%) |
| 320 × 320 | 27 (0.3%) | 0 (0%) | 27 (0.3%) |
| 1024 × 1024 | 7 (0.07%) | 0 (0%) | 7 (0.07%) |
| Mean bounding box length (cm) | 2.18 ± 1.23 | 1.95 ± 1.21 | 2.17 ± 1.23 |
| Mean segmentation area (cm²) | 1.61 ± 3.68** | 1.95 ± 3.32 | 1.62 ± 3.67 |

Abbreviations: y, years; mm, millimeters; cm, centimeters; GE, General Electric

* Image slice thickness was equivalent to inter-slice spacing for all images

** Mean segmentation size is reported using the baseline TrainOtsu dataset

Full‑image segmentation model

The baseline segmentation models trained using the Otsu-generated segmentation pseudo-masks (TrainOtsu) achieved DSC of 0.768 (U-Net), 0.831 (Mask R-CNN), and 0.838 (HRNet). The automated self-refinement method significantly improved performance for each architecture, with each peaking in performance within 10 cycles; maximum DSCs: U-Net 0.798 in 6 cycles (p < 0.001), Mask R-CNN 0.871 in 6 cycles (p < 0.001), and HRNet 0.873 in 7 cycles (p < 0.001). Representative examples of segmentation mask evolution by the self-refinement process are shown in Fig. 3. Changes in DSC for each architecture during self-refinement are included in Fig. 4 and Table S-1.

Fig. 3.

Fig. 3

Examples of pseudo-mask evolution during self-refinement. Examples of baseline pseudo-masks generated automatically using Otsu thresholding in concert with detected tumor bounding boxes (first two columns). The initially noisy Otsu pseudo-masks improve during consecutive self-refinement cycles by the HRNet architecture (third through fifth columns)

Fig. 4.

Fig. 4

Change in DSC of the full image segmentation models during self-refinement, as measured using the held-out test dataset. Each of the 3 architectures peaked in performance within 10 cycles. DSC, Dice score coefficient

The self-refined pseudo-mask dataset yielding the best-performing trained model (HRNet after 7 cycles) was selected as the TrainFinal dataset used to train the final full-image segmentation models of each architecture. These final models attained maximum DSCs of 0.809 (U-Net), 0.871 (Mask R-CNN), and 0.873 (HRNet).

Image patch segmentation model and hybrid segmentation method

The image patch segmentation models trained using the TrainFinal-patch dataset achieved DSCs of 0.737 (U-Net), 0.801 (Mask R-CNN), and 0.796 (HRNet). For each architecture, hybrid inference improved performance over the full-image segmentation model alone: DSC 0.832 (U-Net), 0.884 (Mask R-CNN), and 0.881 (HRNet). Comparing full-image and hybrid segmentation using the best-performing architectures for each approach, hybrid inference significantly outperformed full-image segmentation alone: DSC 0.884 (Mask R-CNN hybrid) vs. 0.873 (HRNet full image), p < 0.001. For the best-performing method (Mask R-CNN hybrid), segmentation performance was greater for metastatic tumors (65 patients, 207 image slices; DSC 0.901) than primary brain tumors (28 patients, 112 image slices; DSC 0.842), and did not significantly correlate with tumor size (DSC and lesion area; Pearson r = − 0.02, p = 0.74).

An example of hybrid segmentation inference is shown in Fig. 5. Comparisons of model performance are included in Tables 2 and S-2.

Fig. 5.

Fig. 5

Example of the hybrid segmentation method’s ability to augment the full-image segmentation model using a parallel arm that combines tumor detection, image cropping, and image patch segmentation. In the final reconciliation step above, the sub-mask of the smaller tumor unrecognized by the full image model is replaced by the output of the image patch segmentation model, improving the final mask

Table 2.

Pair-wise comparison of segmentation model DSCs using bootstrap resampling

| Models compared [DSC (95% CI)] | | p value |
| --- | --- | --- |
| Full-image segmentation model comparisons | | |
| U-Net baseline [0.768 (0.753–0.781)] | U-Net after self-refinement [0.798 (0.784–0.810)] | < 0.001 |
| Mask R-CNN baseline [0.831 (0.816–0.846)] | Mask R-CNN after self-refinement [0.871 (0.854–0.886)] | < 0.001 |
| HRNet baseline [0.838 (0.823–0.854)] | HRNet after self-refinement [0.873 (0.858–0.889)] | < 0.001 |
| Full-image vs. hybrid segmentation comparison | | |
| HRNet after self-refinement [0.873 (0.858–0.889)] | Mask R-CNN hybrid [0.884 (0.868–0.899)] | < 0.001 |

Pair-wise comparison of segmentation model DSCs using bootstrap resampling technique. All baseline models were trained using the TrainOtsu dataset and are compared with the peak model of the same architecture obtained from self-refinement

The best-performing full image segmentation model (HRNet after self-refinement) is compared with the best-performing hybrid method (Mask R-CNN hybrid). The image patch segmentation model used in the hybrid Mask R-CNN method was trained with the TrainFinal-patch dataset derived from TrainFinal

For each comparison, the DSC of the better performing model is in bold text. Bolded p values indicate p < 0.05

CI, confidence interval

Decreasing the training set size lowered performance for both the conventional and hybrid segmentation methods, with the largest absolute decline occurring between the 50% and 25% training subsets (Table S-3). For each training subset, the hybrid method outperformed the conventional (full-image) segmentation model alone.

Discussion

In this investigation, we extended the use of a brain MRI tumor detection model, trained using readily available line annotations mined from PACS, to achieve accurate segmentation of enhancing tumors without manually segmented training data. The use of an unsupervised segmentation method, Otsu thresholding, in concert with the detection model generated a baseline segmentation training set that was automatically improved using a SSL self-refinement process. In providing anatomic context through exposure to large numbers of brain MR images during self-refinement, models iteratively improved at delineating tumor from non-tumor tissue despite noisy initial pseudo-masks, yielding significantly better performing final segmentation models.

Augmenting these full-image segmentation models with a parallel arm consisting of tumor detection followed by image patch segmentation further improved performance slightly, from maximum DSC 0.873 to 0.884 (p < 0.001), with this performance advantage persisting for all training set sizes evaluated. While the hybrid method adds some complexity over full-image segmentation alone, the benefit is attained using the detection model already available and without any additional image annotation burden placed on the radiologist. This performance level compares favorably to the literature describing deep learning methods for segmenting enhancing tumor on brain MRI, with a median published DSC of 0.73 [26] and the winner of the benchmark BraTS 2020 competition attaining DSC of 0.82 [27].

In considering the larger context of the MODS development pipeline extended by this investigation (Fig. 2), which does not require manual annotation of training images beyond the initial PACS data mining step, there are several important advantages over the large-scale manual segmentation efforts used for fully supervised learning. Foremost, the automated generation and self-refinement of segmentation training data is markedly faster, shifting the major annotation burden to the models themselves and limiting the manual annotation required by radiologists to only the comparatively small test set. In our experience, manual segmentation of the test set required approximately 20 h of total effort (3.8 min per image), accomplished by one radiologist over 7 sessions. At the same per-image rate, manual annotation of the training set itself (n = 9911 images) would have required over 625 h (more than 26 full days) of additional radiologist effort, a > 95% time savings afforded by using the MODS pipeline. Second, since MODS is scalable, increasing the size of the segmentation training set (e.g., by an order of magnitude) could be accomplished without any additional manual effort from radiologists. This addresses a critical need for voxel-level image annotation methods that can be feasibly applied to massively large datasets to achieve high statistical power in “big data” radiomic analyses [28]. Third, MODS is lightweight in its modular design, enabling the dynamic rerunning of the full development pipeline to incorporate novel unsupervised segmentation methods or address target data shifts (e.g., due to changes in patient population, scanner parameters, or imaging protocols). Whereas in a fully supervised learning framework a data shift occurring in late-stage development or post-deployment typically requires major manual effort to rework training annotations, the MODS pipeline is designed to “plug into” existing radiology workflows.
This allows it to be easily rerun using newly mined image and line annotation data, supporting the transition of radiology AI from static models to dynamic pipelines and continuous learning.

There are a few potential limitations to accomplishing automated tumor segmentation using the MODS approach. First, by design, this method achieves segmentation by relying on the previous tumor detection capability. While this facilitates overall pipeline optimization by reducing the opaque “black box” nature of a single end-to-end segmentation model, any limitations to the detection model will tend to degrade the performance of the segmentation model as well. Inadequate segmentation performance due to deficiencies inherited from the detection model could be countered through the use of a modestly sized retraining segmentation dataset for segmentation model fine-tuning [29]. Second, while the application of Otsu thresholding to generate the baseline pseudo-masks was successful for enhancing tumors on brain MRI, achieving a high DSC of 0.838 (HRNet) using only these noisy, unrefined training data, this technique would likely be of less value for tumors or imaging modalities with low lesion conspicuity, potentially resulting in initial pseudo-masks and baseline segmentation models that derive limited benefit from self-refinement. In such situations, other unsupervised segmentation techniques [30] could be investigated and formally compared using the held-out test set. Third, the optimal deployment of the hybrid segmentation pipeline within a continuous learning framework may require modifications to radiology information technology architectures to support data routing, clinical workflow integration, and model re-training [31]. However, the short-term costs of these infrastructure modifications would enable substantial long-term efficiency gains from leveraging local, clinically generated image annotations for continuous model improvement.

In conclusion, our investigation affirms the tremendous value of historical line annotation data mined from PACS for facilitating radiology computer vision model development. Applying automated techniques to these mined weak annotations yields tumor detection and segmentation models achieving excellent performance, markedly reducing the manual annotation time and effort required of radiologists as compared with fully supervised learning. The MODS development pipeline could be applied to other radiology imaging modalities, providing a roadmap to rapidly establish and continuously optimize automated tumor detection and segmentation capabilities across the radiology department.

Supplementary Material

Supplemental file

Key Points.

  • A brain MRI tumor detection model trained using clinical line measurement annotations mined from PACS was leveraged to automatically generate tumor segmentation pseudo-masks.

  • An iterative self-refinement process automatically improved pseudo-mask quality, with the best-performing segmentation pipeline achieving a Dice score of 0.884 on a held-out test set.

  • Tumor line measurement annotations generated in routine clinical radiology practice can be harnessed to develop high-performing segmentation models without manually segmented training data, providing a mechanism to rapidly establish tumor segmentation capabilities across radiology modalities.

Methodology.

  • retrospective

  • diagnostic or prognostic study

  • performed at one institution

Acknowledgements

The authors thank Zhigang Zhang, PhD (Associate Attending, Department of Epidemiology-Biostatistics, Memorial Sloan Kettering Cancer Center) for statistics consultation.

MSK MIND Consortium

The members of the MSK MIND Consortium are Sohrab Shah PhD, Jianjiong Gao PhD, Paul Sabbatini MD, Peter D. Stetson MD, Nathaniel Swinburne MD, Nikolaus Schultz PhD, Matthew Hellmann MD, Yulia Lakhman MD, Mithat Gonen PhD, Pedram Razavi MD PhD, Elizabeth Sutton MD, Pegah Khosravi PhD, Kevin Boehm, Rami Vanguri PhD, Justin Jee MD PhD, Karl Pichotta PhD, Christopher Fong PhD, Arfath Pasha, Doori Rose, Essam Elsherif, Andrew Aukerman, Druv Patel, Anika Begum, Elizabeth Zakszewski PhD, Benjamin Gross, John Philip MS, Luke Geneslaw, Robert Pimienta, and Surya Narayana Rangavajhala.

Funding

MSK MIND is supported by Cycle For Survival. This project is also supported by the National Institutes of Health/National Cancer Institute (Cancer Center Support Grant P30 CA008748).

Abbreviations

AI

Artificial intelligence

CNN

Convolutional neural network

DSC

Dice score coefficient

PACS

Picture archiving and communication system

R-CNN

Region-based CNN

T1C+

Post-contrast T1-weighted images

Footnotes

Nathaniel C. Swinburne and Vivek Yadav are co-first authors.

Robert J. Young is the senior author.

The full list of MSK MIND Consortium members is included in the Acknowledgements section.

Supplementary Information The online version contains supplementary material available at https://doi.org/10.1007/s00330-023-09583-3.

Declarations

Guarantor The scientific guarantor of this publication is Nathaniel Swinburne.

Conflict of interest The authors of this manuscript declare no relationships with any companies whose products or services may be related to the subject matter of the article.

Statistics and biometry Zhigang Zhang, PhD (Associate Attending, Department of Epidemiology-Biostatistics, Memorial Sloan Kettering Cancer Center) kindly provided statistical advice for this manuscript.

Informed consent Written informed consent was waived by the Institutional Review Board.

Ethical approval Institutional Review Board approval was obtained.

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


Associated Data

Supplementary Materials

Supplemental file
