Radiology: Artificial Intelligence. 2020 Mar 11;2(2):e190011. doi: 10.1148/ryai.2020190011

Three-Plane–assembled Deep Learning Segmentation of Gliomas

Shaocheng Wu, Hongyang Li, Daniel Quang, Yuanfang Guan
PMCID: PMC7104789  PMID: 32280947

Abstract

Purpose

To design a computational method for automatic brain glioma segmentation of multimodal MRI scans with high efficiency and accuracy.

Materials and Methods

The 2018 Multimodal Brain Tumor Segmentation Challenge (BraTS) dataset was used in this study, consisting of routine clinically acquired preoperative multimodal MRI scans. Three subregions of glioma—the necrotic and nonenhancing tumor core, the peritumoral edema, and the contrast-enhancing tumor—were manually labeled by experienced radiologists. Two-dimensional U-Net models were built using a three-plane–assembled approach to segment three subregions individually (three-region model) or to segment only the whole tumor (WT) region (WT-only model). The term three-plane–assembled means that coronal and sagittal images were generated by reformatting the original axial images. The model performance for each case was evaluated in three classes: enhancing tumor (ET), tumor core (TC), and WT.

Results

On the internal unseen testing dataset split from the 2018 BraTS training dataset, the proposed models achieved mean Sørensen–Dice scores of 0.80, 0.84, and 0.91, respectively, for ET, TC, and WT. On the BraTS validation dataset, the proposed models achieved mean 95% Hausdorff distances of 3.1 mm, 7.0 mm, and 5.0 mm, respectively, for ET, TC, and WT and mean Sørensen–Dice scores of 0.80, 0.83, and 0.91, respectively, for ET, TC, and WT. On the BraTS testing dataset, the proposed models ranked fourth out of 61 teams. The source code is available at https://github.com/GuanLab/Brain_Glioma.

Conclusion

This deep learning method consistently segmented subregions of brain glioma with high accuracy, efficiency, reliability, and generalization ability on screening images from a large population, and it can be efficiently implemented in clinical practice to assist neuro-oncologists or radiologists.

Supplemental material is available for this article.

© RSNA, 2020


Summary

An accurate and fast deep learning approach developed for automatic segmentation of brain glioma on multimodal MRI scans achieved Sørensen–Dice scores of 0.80, 0.83, and 0.91 for enhancing tumor, tumor core, and whole tumor, respectively.

Key Points

  ■ A fast method for automatic segmentation of brain glioma achieved high prediction accuracy on multimodal MRI scans.

  ■ This method included three major components: normalization of the brain region, a three-plane–assembled approach, and modeling across different types of glioma.

  ■ On the 2018 Brain Tumor Segmentation Challenge validation dataset, the proposed models achieved mean Sørensen–Dice scores of 0.80, 0.83, and 0.91 for enhancing tumor, tumor core, and whole tumor, respectively, approaching the radiologist-level benchmark scores of 0.85, 0.86, and 0.91.

Introduction

Accurate segmentation of gliomas on routine MRI scans plays an important role in disease diagnosis, treatment decision, and prognosis (1–3). In current clinical practice, complementary MRI sequences, including native T1-weighted, postcontrast T1-weighted, T2-weighted, and T2 fluid-attenuated inversion recovery (FLAIR), are required to characterize different tissue properties and areas of tumor spread. Current glioma segmentation is mostly performed manually. However, manual delineation is time-consuming, especially for multimodal MRI protocols, and it is highly dependent on the subjective decisions of individual radiologists (4–7). Therefore, computational tools for automatic glioma segmentation are in demand to assist radiologists in interpretation and identification of subtle changes in brain glioma. Nonetheless, the isointense and hypointense regions of glioma and the fuzziness of tumor margins (8) pose great challenges to automatic glioma segmentation. Even in the same MRI sequence, variation exists in ground truth labels applied by various radiologists, further complicating this segmentation task.

Recent developments in deep learning have yielded an opportunity to segment brain glioma at higher precision and resolution. Deep convolutional neural network (CNN) methods have performed well in many high-profile contests (9–11). Compared with conventional machine learning methods, such as support vector machines or random forests, deep CNN approaches do not depend on hand-crafted features but automatically learn a hierarchy of complex features from the data (9,12). Many deep CNN-based approaches have been developed for glioma segmentation. The first-place method in the 2013 Multimodal Brain Tumor Segmentation (BraTS) challenge used two CNN structures with different depths to segment high-grade glioma (HGG) and low-grade glioma (LGG) (9). A multiscale CNN structure was proposed to take advantage of local and global information of multimodal MRI scans (13). A CNN with a pixelwise-weighted loss function improved the accuracy of predictions around the glioma edges (14). The first-place method in the 2017 BraTS challenge ensembled three different network architectures, namely three-dimensional U-Net (15,16), three-dimensional fully convolutional network (17), and DeepMedic (11). However, there is still room to improve the performance and efficiency of these computational approaches to help improve radiologists’ segmentation of glioma.

In this study, we described our algorithm for automatic brain glioma segmentation on multimodal MRI scans and evaluated it on the 2018 BraTS challenge dataset. The 2018 BraTS challenge is a benchmark platform that systematically evaluates computational methods for segmenting brain glioma on the basis of holdout multimodal imaging data (7,18). We used a two-dimensional network architecture to balance the trade-off between efficiency and accuracy. Three major components of our algorithm greatly improved performance: normalization of the brain region, the three-plane–assembled strategy, and modeling across HGG and LGG.

Materials and Methods

Data

The 2018 BraTS dataset consists of three subsets for model training at the initial stage (n = 210 HGG + 75 LGG), validation at the leaderboard stage (n = 66 cases with unknown glioma grade), and testing at the final performance evaluation stage (n = 191 cases with unknown glioma grade). The validation dataset allowed participants to assess their model performance on the fly during the challenge; its segmentation masks were not available to participants. At the final evaluation stage, participants were required to submit their segmentation results within a controlled time window (48 hours) to avoid overfitting. The segmentation masks of the final testing data were also unknown. Because the masks were not available for the validation and testing datasets, we used only the training dataset to perform fivefold cross validation (see the Network Training section later in this article).

The 2018 BraTS dataset was segmented manually by one to four raters following the same annotation protocol, and the ground truth segmentation masks were approved by experienced neuroradiologists (7,19). The ground truth segmentation masks were labeled in three subregions of tumor tissue: necrotic and nonenhancing tumor core (NCR and NET, respectively), peritumoral edema, and contrast-enhancing tumor (ET). All data contain only axial images, with 240 × 240 × 155 voxels per case. Each case contains four three-dimensional multimodal images (T1-weighted, postcontrast T1-weighted, T2-weighted, and T2 FLAIR), which are rigidly aligned, skull stripped, and resampled to 1 × 1 × 1-mm isotropic resolution. These data were homogenized using rigid registration and linear interpolation by the data provider (7). The dataset is accessible online and includes datasets from the Cancer Imaging Archive (20,21).

Preprocessing

To alleviate the heterogeneity of MRI scans and cohort effect (Fig E1 [supplement]), we normalized each imaging sequence of each case independently by subtracting the mean and then dividing by the standard deviation of the brain region. This strategy mapped the heterogeneous samples to the uniform space of normal distribution.
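This normalization step can be sketched as follows, assuming (as in the skull-stripped BraTS data) that background voxels are exactly zero, so that nonzero voxels define the brain region; this is an illustrative sketch rather than the released implementation.

```python
import numpy as np

def normalize_brain(volume: np.ndarray) -> np.ndarray:
    """Z-score one MRI sequence of one case using only voxels inside the brain region."""
    out = volume.astype(np.float32)
    brain = out != 0                                 # brain mask from the skull-stripped volume
    mean, std = out[brain].mean(), out[brain].std()
    out[brain] = (out[brain] - mean) / (std + 1e-8)  # subtract mean, divide by standard deviation
    return out
```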

The information of brain glioma location and severity is encoded in three-dimensional space. Ideally, three-dimensional neural networks are able to capture this three-dimensional information (22–25). In practice, however, graphics processing unit memory and image resolution limit the performance of these methods (26). We circumvented these limitations with a simple yet effective three-plane–assembled approach (Fig 1) that decodes the three-dimensional information while retaining pixel-level long-range information at the original high resolution. The term three-plane–assembled means that we generated 240 × 155 × 240-pixel coronal and sagittal images by reformatting the original 240 × 240 × 155-pixel axial images. These generated images were then zero-padded to 240 × 240 × 240 pixels, and images of all three anatomic planes were used as the input to our model. Because the ground truth segmentation masks were also in the axial plane, they were processed in the same manner as the training dataset to generate the coronal and sagittal segmentation masks.
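A minimal sketch of this reformatting step is shown below; the axis conventions and the exact padding placement are assumptions for illustration, not the authors' released code.

```python
import numpy as np

def three_plane_stacks(axial_volume: np.ndarray, size: int = 240) -> dict:
    """Reformat a (240, 240, 155) axial volume into slice stacks from all three
    anatomic planes, zero-padding every slice to size x size pixels."""
    views = {
        "axial":    np.moveaxis(axial_volume, 2, 0),  # 155 slices of 240 x 240
        "coronal":  np.moveaxis(axial_volume, 1, 0),  # 240 slices of 240 x 155
        "sagittal": axial_volume,                     # 240 slices of 240 x 155 (along axis 0)
    }
    padded = {}
    for name, stack in views.items():
        n, h, w = stack.shape
        canvas = np.zeros((n, size, size), dtype=stack.dtype)
        canvas[:, :h, :w] = stack                     # zero-pad 240 x 155 slices to 240 x 240
        padded[name] = canvas
    return padded
```

The same function can be applied to the axial segmentation masks to obtain matching coronal and sagittal masks.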

Figure 1:

The overview of our three-plane–assembled brain glioma segmentation. In data preprocessing (upper panel), multimodal MR images from all three anatomic planes were generated and then normalization, augmentation, and oversampling were employed. In model training (middle panel), two U-Net–based models were built using the nested training strategy for all three subregions and only the whole tumor region, respectively. In the performance evaluation (lower panel), Sørensen–Dice score, precision-recall curve, and receiver operating characteristic (ROC) curve were used to assess the model performance on all three regions.

Network Architecture

We used a typical U-Net architecture (15) (Fig 2), which accepts two-dimensional images from three-dimensional image sets with four channels (T2 FLAIR, T2-weighted, T1-weighted, and postcontrast T1-weighted) as inputs. We designed two models: The three-region model segmented the three subregions individually, and the whole tumor (WT)–only model segmented only the WT region. Both models included brain region normalization, the three-plane–assembled approach, and modeling across HGG and LGG. In both models, the inputs went through a series of convolutional and pooling layers and were turned into feature maps of smaller size. The resulting feature maps then passed through a series of “upconvolutional” and concatenating layers. Finally, the network produced a segmentation mask either with only one label channel for the WT region (WT-only model) or with three separately labeled channels for the three subregions (three-region model). We also conducted experiments with the AlexNet architecture (27) using the same components as those used for the U-Net architecture. The only difference is that AlexNet does not have the skip connections (concatenation layers) connecting the downsampling path and the upsampling path (Fig E10, A [supplement]).
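A compact PyTorch sketch of such a two-dimensional U-Net is given below for illustration; the depth, filter counts, and output activation are assumptions rather than the exact 10-layer configuration shown in Figure 2.

```python
import torch
import torch.nn as nn

def double_conv(c_in: int, c_out: int) -> nn.Sequential:
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(inplace=True),
    )

class UNet2D(nn.Module):
    def __init__(self, in_channels: int = 4, out_channels: int = 3, base: int = 32):
        super().__init__()
        self.enc1 = double_conv(in_channels, base)
        self.enc2 = double_conv(base, base * 2)
        self.enc3 = double_conv(base * 2, base * 4)
        self.pool = nn.MaxPool2d(2)
        self.up2 = nn.ConvTranspose2d(base * 4, base * 2, 2, stride=2)
        self.dec2 = double_conv(base * 4, base * 2)       # concatenation doubles the channels
        self.up1 = nn.ConvTranspose2d(base * 2, base, 2, stride=2)
        self.dec1 = double_conv(base * 2, base)
        self.head = nn.Conv2d(base, out_channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        e1 = self.enc1(x)                                  # 240 x 240
        e2 = self.enc2(self.pool(e1))                      # 120 x 120
        e3 = self.enc3(self.pool(e2))                      # 60 x 60
        d2 = self.dec2(torch.cat([self.up2(e3), e2], 1))   # skip connection
        d1 = self.dec1(torch.cat([self.up1(d2), e1], 1))   # skip connection
        return torch.sigmoid(self.head(d1))                # per-pixel probabilities

# model = UNet2D(out_channels=3)  # three-region model; use out_channels=1 for the WT-only model
```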

Figure 2:

Three-plane–assembled method and U-Net architecture. A, Multimodal MR images from another two anatomic planes (sagittal and coronal planes) were generated from the original MR images from the axial plane. All MR images from three planes with four channels (T1, T1ce, T2, and T2 FLAIR) were input into the model to obtain segmentation masks with three channels (whole tumor [WT], tumor core, and enhancing tumor). B, U-Net architecture with 10 layers was deployed. The three-channel model output segmentation of three regions and the one-channel model output segmentation of the WT region. T1 = T1-weighted, T1ce = postcontrast T1-weighted, T2 = T2-weighted, T2 FLAIR = T2 fluid-attenuated inversion recovery.

Network Training

Images obtained with four imaging sequences (T2-weighted, T2 FLAIR, T1-weighted, and postcontrast T1-weighted) and three anatomic planes (axial, coronal, and sagittal) from both HGG and LGG cases were input into our model. Fivefold cross validation was deployed to train five models. This process involved splitting the data into five folds, training on four of the folds, testing on the remaining fold, and repeating this process five times so that each fold was used for testing one time. We used a nested training pipeline: Each time, half of the training data were used to train the model, and the other half were used for monitoring training progress and tuning hyperparameters. Our models were trained with randomly sampled two-dimensional images (240 × 240) from three-dimensional image sets with a batch size of 16 for a total of 10 epochs. Training was performed using the Adam optimizer with an initial learning rate of 3 × 10-5. For each model, we repeated this process five times with different splits to refine the model. Performance of the model was assessed by using the Sørensen–Dice score (28) on the testing fold.
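The cross-validation and training setup described above can be sketched as follows. The data loader (make_loader) and the binary cross-entropy loss are hypothetical placeholders, and UNet2D refers to the sketch in the Network Architecture section; the source specifies only the splits, optimizer, learning rate, batch size, and number of epochs.

```python
import numpy as np
import torch
from sklearn.model_selection import KFold

cases = np.arange(285)                                       # 210 HGG + 75 LGG case indices
kfold = KFold(n_splits=5, shuffle=True, random_state=0)
for fold, (trainval_idx, test_idx) in enumerate(kfold.split(cases)):
    rng = np.random.default_rng(fold)
    rng.shuffle(trainval_idx)
    half = len(trainval_idx) // 2
    train_idx, monitor_idx = trainval_idx[:half], trainval_idx[half:]  # nested half/half split

    model = UNet2D(out_channels=3)                           # or out_channels=1 for WT only
    optimizer = torch.optim.Adam(model.parameters(), lr=3e-5)
    criterion = torch.nn.BCELoss()                           # placeholder loss for this sketch
    for epoch in range(10):
        for images, masks in make_loader(train_idx, batch_size=16):  # hypothetical loader
            optimizer.zero_grad()
            loss = criterion(model(images), masks)
            loss.backward()
            optimizer.step()
        # monitor_idx is used to track training progress and tune hyperparameters;
        # the Dice score on test_idx is reported for this fold
```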

Data Augmentation

Because the size of the training dataset was relatively small, we deployed two types of data augmentation. First, images were randomly flipped left-right or up-down, and the corresponding labels were processed in the same manner. Although the local context of each pixel was unchanged after flipping, the resulting orientations of glioma and normal regions differed from the original orientations. Second, to address the extremely imbalanced classification problem (<10% of pixels are positive in an image, and around 35% of images contain positive labels), we oversampled images with positive labels to balance the positive-negative ratio.
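The two augmentation steps can be sketched as follows; the oversampling factor is an illustrative assumption, as the source does not specify the exact positive-negative ratio targeted.

```python
import numpy as np

def random_flip(image: np.ndarray, mask: np.ndarray, rng: np.random.Generator):
    """Apply the same random up-down and left-right flips to an image and its mask."""
    if rng.random() < 0.5:
        image, mask = image[..., ::-1, :], mask[..., ::-1, :]   # up-down flip
    if rng.random() < 0.5:
        image, mask = image[..., :, ::-1], mask[..., :, ::-1]   # left-right flip
    return image.copy(), mask.copy()

def oversample_positive(slice_indices, masks, factor: int = 3):
    """Repeat indices of slices that contain any positive label to rebalance sampling."""
    positive = [i for i in slice_indices if masks[i].any()]
    return list(slice_indices) + positive * (factor - 1)
```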

Evaluation

The three subregions of tumor tissue were considered for evaluation in three classes: ET, tumor core (TC), and WT. Classes were nested such that the largest class, WT, included ET, peritumoral edema, and NCR and NET as its subsets, and TC included ET and NCR and NET as its subsets. The 2018 BraTS challenge evaluated model performance with the Sørensen–Dice score (28) and the 95% Hausdorff distance (29). The Sørensen–Dice score is a metric used for comparing the similarity of two sets of labels:

$$\mathrm{Dice}(A, B) = \frac{2\,\lvert A \cap B \rvert}{\lvert A \rvert + \lvert B \rvert},$$

where A and B are two sets of labels, |A ∩ B| is the intersection area between the two sets, and |A| + |B| is the sum of the areas of A and B (Fig 1).
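A direct implementation of this score on a pair of binary masks is sketched below for illustration.

```python
import numpy as np

def dice_score(pred: np.ndarray, truth: np.ndarray, eps: float = 1e-8) -> float:
    """Sørensen–Dice score between a predicted and a ground truth binary mask."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    intersection = np.logical_and(pred, truth).sum()
    return 2.0 * intersection / (pred.sum() + truth.sum() + eps)
```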

The Hausdorff distance measures the Euclidean distance between two sets of labels:

$$\mathrm{HD}(A, B) = \max\{h(A, B),\, h(B, A)\}$$

and

$$h(A, B) = \max_{a \in A} \min_{b \in B} \lVert a - b \rVert,$$

where A and B are two sets of labels and h(A, B) is the directed distance: for every point a of A, the Euclidean distance to the nearest point b of B is computed, and the largest of these distances is taken. For this challenge, set A represents pixelwise binary labels from radiologists and set B represents predicted labels from a computational model. We also evaluated our model performance using areas under the receiver operating characteristic curve (AUROCs) and areas under the precision-recall curve (AUPRCs) (28,30,31).
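One common way to compute the 95% Hausdorff distance on binary masks is sketched below; whether this matches the challenge evaluator's exact implementation is an assumption.

```python
import numpy as np
from scipy.spatial import cKDTree

def hausdorff95(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """95th-percentile symmetric Hausdorff distance between two binary masks."""
    a = np.argwhere(mask_a)                    # coordinates of labeled pixels in A
    b = np.argwhere(mask_b)
    d_ab, _ = cKDTree(b).query(a)              # nearest-neighbor distances A -> B
    d_ba, _ = cKDTree(a).query(b)              # nearest-neighbor distances B -> A
    return max(np.percentile(d_ab, 95), np.percentile(d_ba, 95))
```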

For each two-dimensional image (240 × 240) from a three-dimensional image set (240 × 240 × 155), our model predicted three segmentation masks, one for each of the three subregions (ET, TC, and WT). Specifically, our model computed three probabilities for each pixel to represent how likely it belonged to each of the three subregions. With the ground truth label, we calculated the Sørensen–Dice score for each subregion in each image. In this way, we obtained 155 Sørensen–Dice scores per subregion. For each subregion, we then calculated the average of the 155 Sørensen–Dice scores, which resulted in three Sørensen–Dice scores for a patient case. Finally, we calculated the average of the Sørensen–Dice scores over all testing cases and obtained three Sørensen–Dice scores representing the performance of our model on the three subregions. The ranking scheme of the 2018 BraTS challenge comprises the rankings of each team relative to its competitors for (a) each of the testing cases, (b) each evaluated region (ET, TC, and WT), and (c) each scoring metric (Sørensen–Dice score and Hausdorff distance). The final ranking score was then calculated by averaging across all these individual rankings and was further normalized by the number of teams.
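This per-case evaluation can be sketched as follows, reusing the dice_score function defined above; the array layout is an assumption for illustration.

```python
import numpy as np

def case_dice(pred_volume: np.ndarray, truth_volume: np.ndarray) -> np.ndarray:
    """pred_volume, truth_volume: (3, 240, 240, 155) binary masks for ET, TC, and WT.
    Returns three per-case scores, each averaged over the 155 axial slices."""
    n_regions, _, _, n_slices = pred_volume.shape
    scores = np.array([[dice_score(pred_volume[r, :, :, k], truth_volume[r, :, :, k])
                        for k in range(n_slices)]
                       for r in range(n_regions)])
    return scores.mean(axis=1)
```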

Experiments

We conducted several experiments to investigate the contribution of each component to our final performance. Specifically, (a) “3planes+HGG+LGG” represents the experiments conducted with three-plane–assembled approach and modeling across HGG and LGG; (b) “3planes+HGG” represents the experiments conducted with three-plane–assembled approach and only HGG cases; (c) “HGG+LGG” represents the experiments conducted with only modeling across HGG and LGG; (d) “3planes+LGG” represents the experiments conducted with three-plane–assembled approach and only LGG cases; and (e) “Only Whole Model” represents the experiments with three-plane–assembled approach and modeling across HGG and LGG only on the WT region.

Results

We reported our model performance using the data provided by the 2018 BraTS Challenge. On our internal unseen testing dataset split from the BraTS training dataset, our proposed models achieved mean Sørensen–Dice scores of 0.80, 0.84, and 0.91, respectively, for ET, TC, and WT, approaching radiologist-level Sørensen–Dice scores of 0.85, 0.86, and 0.91, respectively (7). On the BraTS validation dataset of 66 cases, our proposed models achieved mean 95% Hausdorff distances of 3.1 mm, 7.0 mm, and 5.0 mm, respectively, for ET, TC, and WT and mean Sørensen–Dice scores of 0.80, 0.83, and 0.91, respectively, for ET, TC, and WT. The BraTS independent holdout testing dataset contains 191 cases also with unknown glioma grade and unknown segmentation masks, and it requires participants to submit their segmentation results within a limited controlled time window (48 hours). On the BraTS testing dataset, our proposed models ranked fourth out of 61 teams.

Space Transformation with the Three-Plane–Assembled Approach Improves the Segmentation of Brain Glioma

We deployed a three-plane–assembled approach to decode the three-dimensional information from the three anatomic planes (Fig 2, A). In the three-region model, this approach achieved mean Sørensen–Dice scores of 0.882, 0.839, and 0.798, respectively, for the three regions (WT, TC, and ET) (3planes+HGG+LGG, Fig 3, A) and mean AUROCs of 0.999, 0.999, and 0.999, respectively, at the pixel level (3planes+HGG+LGG, Fig E3 [supplement]). Given the class imbalance, we further calculated the mean AUPRCs, which provide a less inflated measure (Fig 4, C–E). The AUPRCs had mean values of 0.946, 0.924, and 0.888 for the three regions at the pixel level (3planes+HGG+LGG, Fig E2 [supplement]). In contrast, the model without the three-plane–assembled approach led only to corresponding mean Sørensen–Dice scores of 0.853, 0.799, and 0.761 (HGG+LGG in Fig 3, A); mean AUROCs of 0.998, 0.995, and 0.991; and mean AUPRCs of 0.920, 0.892, and 0.859, respectively, for the three regions. This result demonstrated that our three-plane–assembled approach substantially improved model performance by capturing spatial information and increasing the number of training samples. The training and validation losses for the different experiments are shown in Figure 3, B, and Figure E11 (supplement).

Figure 3:

Sørensen–Dice scores, areas under the precision-recall curves (AUPRCs), and learning curves of different experiments. A, Violin plot graphs show the AUPRC and Sørensen–Dice score performances for all experiments in predicting all three regions (whole tumor [WT], tumor core, and enhancing tumor). Specifically, the “3planes+HGG+LGG” model represents the experiments with three-plane–assembled approach and modeling across high-grade glioma (HGG) and low-grade glioma (LGG), the “3planes+HGG” represents the experiments with three-plane–assembled approach and only HGG cases, the “HGG+LGG” represents the experiments with only modeling across HGG and LGG, the “3planes+LGG” represents the experiments with three-plane–assembled approach and only LGG cases, and the “Only Whole Model” represents the experiments with three-plane–assembled approach and modeling across HGG and LGG only on the WT region. The original data of the Sørensen–Dice scores and AUPRCs can be found in Tables E1 and E2 (supplement). B, Learning curves with both training loss and validation loss for four experiments show that models with three planes, HGG, and LGG perform better for three-region and WT predictions.

Figure 4:

A, B, Scatterplots show the positive correlation between performance and tumor volume. Tumor volumes are given in pixels. C–E, Scatterplots show the inflated measure of the areas under the receiver operating characteristic curve (AUROCs) compared with the areas under the precision-recall curve (AUPRCs) in all three regions. Each dot represents one case. Of note, the y-axis of the AUROC ranges from 0.9900 to 1.0000, where the performances of different cases are barely distinguishable; in contrast, the x-axis of the AUPRC clearly separates them. F, Total prediction runtime of our approach for segmenting 1, 10, 30, 60, and 100 cases.

Modeling across HGG and LGG Improves the Segmentation Performance

Besides the three-plane–assembled approach, we combined the cases of both HGG and LGG in the training stage, and this improved the overall performance (3planes+HGG+LGG vs 3planes+HGG and 3planes+LGG, Fig 3, A). The limited number of LGG samples, together with the fact that LGG rarely has necrosis and has a much lower propensity for enhancement, greatly restrained the performance of the model trained solely on LGG (3planes+LGG, Fig 3, A).

Interestingly, the model combining HGG and LGG also outperformed the model trained exclusively on HGG (3planes+HGG, Fig 3, A), which led only to mean Sørensen–Dice scores of 0.863, 0.811, and 0.786 (Fig 5); mean AUROCs of 0.994, 0.988, and 0.991 (Fig E3 [supplement]); and mean AUPRCs of 0.939, 0.916, and 0.861 (Fig E3 [supplement]), respectively, for WT, TC, and ET, compared with the model trained with both HGG and LGG (3planes+HGG+LGG, Fig 3, A). Although a larger number of training samples can boost model performance, this result indicated that the HGG and LGG cases are relatively similar at the image level and complementary to each other in predicting the glioma landscape.

Figure 5:

Prediction examples of different experiments. Prediction examples of both high-grade glioma (HGG) and low-grade glioma (LGG) from four experiments show that models with three planes, HGG, and LGG achieve better delineation of the three regions when compared against the segmentations by radiologists (ground truth). More prediction examples are shown in Figures E5–E9 (supplement).

WT-only Model Further Improves WT Region Segmentation

We conducted further experiments to investigate whether models that separately output segmentations of different regions could improve performance. We found that the model trained only for the WT region achieved higher accuracy, with a mean Sørensen–Dice score of 0.906 (Only Whole Model, Fig 3, A), a mean AUROC of 0.998 (Fig E4, A, right panel [supplement]), and a mean AUPRC of 0.950 (Fig E4, A, left panel [supplement]), whereas the three-region model led to a mean Sørensen–Dice score of 0.882 (3planes+HGG+LGG, Fig 3, A), a mean AUROC of 0.999 (3planes+HGG+LGG, Fig E3 [supplement]), and a mean AUPRC of 0.946 (3planes+HGG+LGG, Fig E2 [supplement]) for the WT region. These results could be attributed to the fact that some cases have smaller or even no labeled TC or ET regions, whereas every case has the WT region labeled. The low quantity of positive labels from the TC and ET regions resulted in less comprehensive segmentation of the corresponding regions.

Prediction Performance Is Partially Related to the Volumes of Brain Glioma

Our models achieved high performance on various evaluation metrics (Fig 3, A; Figs E2, E3, and E4, B–D [supplement]), yet the variance of prediction performance across cases was noticeable (Fig 3, A). We therefore investigated possible factors affecting performance. Interestingly, the prediction performance was partially related to the volume of brain glioma (Fig 4, A, B). Gliomas with a larger volume tended to be easier to predict and yielded slightly higher Sørensen–Dice scores (correlation = 0.317, P < .001) and AUPRCs (correlation = 0.309, P < .001).
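This correlation analysis can be reproduced as sketched below; the arrays are placeholders standing in for the per-case tumor volumes (pixel counts) and per-case Dice scores, not the study's data.

```python
import numpy as np
from scipy.stats import pearsonr

tumor_volumes = np.array([12500, 48000, 31200, 7800])   # placeholder pixel counts per case
dice_scores = np.array([0.78, 0.93, 0.88, 0.71])        # placeholder per-case Dice scores
r, p = pearsonr(tumor_volumes, dice_scores)
print(f"correlation = {r:.3f}, P = {p:.3g}")
```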

U-Net Outperforms AlexNet through Skip Connection

The model based on two-dimensional AlexNet (27) had lower performance compared with our model based on two-dimensional U-Net (Fig E10 [supplement]). U-Net differs from AlexNet only in that U-Net has skip connections with concatenation operations between the downsampling path and the upsampling path (Fig E10, A [supplement]). These skip connections reintroduce local information into the global context during upsampling and prevent the loss of important information in the convolutional operations.

Discussion

In this study, we described our brain tumor segmentation approach based on two-dimensional U-Net, which ranked in fourth place in the 2018 BraTS challenge. It assembled images from three anatomic planes and used four MRI sequences as four input channels. This strategy benefited from additional spatial information of glioma and increased the number of training samples to avoid overfitting. It retained high performance even on a completely new dataset because it normalized independent cases from the highly variable BraTS dataset (7). In addition to this general applicability, our models were computationally efficient, requiring only 5–7 seconds to segment one case (Fig 4, F), whereas previous studies required more prediction time (9–11). More importantly, the BraTS benchmark dataset had large variability between labels from different raters, or between a rater and the consensus fusion (7). Our approach improved on the segmentation from an individual rater and achieved almost the same accuracy as the segmentation from consensus fusion (Fig 3). This performance suggests the scope and potential of our models in assisting radiologists in image interpretation and in identifying subtle changes in brain glioma.

The major aspect contributing to the success of our algorithm was the effort to preserve spatial information of brain glioma through a three-plane–assembled approach in a two-dimensional U-Net. The segmentation mask output from a model trained on only one plane might contain false-positive findings, which are usually located outside of the overall tumor region (Fig 5). For example, a noisy or blurred nontumor region with a round shape may appear similar to a tumor region on the original axial image. After reformatting the original image, the same region on the coronal and sagittal images can be distinct from tumor (eg, no longer a round shape). The three-plane–assembled approach can therefore capture this and correct these false-positive findings with the help of information from other dimensions. In addition, the three-plane–assembled approach increases the number of training samples because one axial image now becomes three images. This further alleviates the common overfitting problem of neural network models trained on relatively small sample sizes. The fact that the WT-only model outperforms the three-region model in terms of WT segmentation can be attributed to the performance evaluation method. Probabilities output from the three-region model with values below 0.5 were removed to further avoid false-positive findings. Pixels around the edges of regions commonly have probabilities lower than 0.5 for all three regions; therefore, these pixels are not assigned to any region even though they actually belong to the WT region. In this way, such pixels lower the WT segmentation performance of the three-region model relative to that of the WT-only model.

Previous methods trained two separate CNNs for HGG and LGG, which led to Sørensen–Dice scores of only 0.75, 0.65, and 0.78, respectively, for the ET, TC, and WT regions in the 2015 BraTS challenge (9). Although the lesions of HGG and LGG are quite different from each other (32,33), the combined training improved the predictions of both HGG and LGG, especially for LGG. The small number of LGG cases may be one reason for the restrained performance of the model trained solely on LGG. Another possibility is that the number of samples with necrosis and ET is limited in LGG cases, because LGG rarely has necrosis and has a much lower propensity for enhancement. Modeling across HGG and LGG is motivated by their similarity at the image level: The aggressive tumor infiltration of HGG may lead to unclear and irregular boundaries, and the intact blood-brain barrier of LGG also makes its boundaries invisible or blurry (34).

Although we achieved high segmentation performance, our approach was not perfect. The number of glioma images used to train the models was relatively small (285 cases, 44 175 images) compared with that of ImageNet, which has more than 14 million images (35). Our method might therefore be limited by the common overfitting problem of neural network models trained on a limited number of cases. This is especially true for the ET region because the number of images with ET labels is even smaller. Our current model was less successful in segmenting the ET region, yet we anticipate that our models can be improved with more training samples in the future. Furthermore, compared with traditional machine learning models, the model training process of our approach required more computational resources, such as a graphics processing unit (Table E3 [supplement]), which cannot be supplied by common laptop or desktop computers. Finally, our two-dimensional approach could not fully exploit the complete three-dimensional information, although we introduced the three-plane–assembled strategy. With more glioma training samples and more powerful computer hardware, three-dimensional approaches could eventually achieve the desired efficiency and accuracy.

Altogether, our study demonstrated the potential and scope of our computational models in brain glioma segmentation. Our neural network models could be easily adapted and retrained with different kinds of glioma datasets, including diffusion or perfusion data. For example, our current model had four input channels of T1-weighted, postcontrast T1-weighted, T2-weighted, and T2 FLAIR. If the diffusion or perfusion data were available, they could be treated as a fifth input channel. The relationship and interaction between these input channels for glioma segmentation will be learned automatically by our model during the training process. In clinical practice, we envision that neuro-oncologists or radiologists can use our models to automatically obtain the glioma segmentation masks and focus on the masked regions together with the original MRI scans for further evaluation based on their expertise. This is especially useful when serial examinations are performed in a patient and the number of scans to be analyzed becomes much larger. Our approach could be used to generate segmentation masks for each examination, allowing radiologists to keep track of the glioma regions and compare differences across examinations.

SUPPLEMENTAL TABLES

Tables E1–E3 (PDF): ryai190011suppa1.pdf

SUPPLEMENTAL FIGURES

Figures E1–E11 (JPG): ryai190011suppf1.jpg through ryai190011suppf11.jpg

Acknowledgments

We acknowledge the organizing committee and data contributors of the 2018 BraTS challenge.

H.L. is supported by the American Heart Association and Amazon Web Services 3.0 Data Grant Portfolio: Artificial Intelligence and Machine Learning Training Grants (19AMTG34850176). Y.G. is supported by the National Science Foundation (NSF-US14-PAF07599; CAREER: On-line Service for Predicting Protein Phosphorylation Dynamics Under Unseen Perturbations), the National Institute of General Medical Sciences (NIGMS R35GM133346-01), the Michael J. Fox Foundation for Parkinson’s Research (17373), and a donation from Amazon.

Disclosures of Conflicts of Interest: S.W. disclosed no relevant relationships. H.L. disclosed no relevant relationships. D.Q. disclosed no relevant relationships. Y.G. Activities related to the present article: disclosed no relevant relationships. Activities not related to the present article: service as scientific advisor/consultant to Eli Lilly, Genentech, and Roche; shareholder of Cleery and Ann Arbor Algorithms; receives research support from Merck KGaA. Other relationships: disclosed no relevant relationships.

Abbreviations:

AUPRC = area under the precision-recall curve
AUROC = area under the receiver operating characteristic curve
BraTS = Brain Tumor Segmentation Challenge
CNN = convolutional neural network
ET = enhancing tumor
FLAIR = fluid-attenuated inversion recovery
HGG = high-grade glioma
LGG = low-grade glioma
NCR = necrotic tumor core
NET = nonenhancing tumor core
TC = tumor core
WT = whole tumor

References

1. Mazzara GP, Velthuizen RP, Pearlman JL, Greenberg HM, Wagner H. Brain tumor target volume determination for radiation treatment planning through automated MRI segmentation. Int J Radiat Oncol Biol Phys 2004;59(1):300–312.
2. Yamahara T, Numa Y, Oishi T, et al. Morphological and flow cytometric analysis of cell infiltration in glioblastoma: a comparison of autopsy brain and neuroimaging. Brain Tumor Pathol 2010;27(2):81–87.
3. Bauer S, Wiest R, Nolte LP, Reyes M. A survey of MRI-based medical image analysis for brain tumor studies. Phys Med Biol 2013;58(13):R97–R129.
4. Huang M, Yang W, Wu Y, Jiang J, Chen W, Feng Q. Brain tumor segmentation based on local independent projection-based classification. IEEE Trans Biomed Eng 2014;61(10):2633–2645.
5. Weltens C, Menten J, Feron M, et al. Interobserver variations in gross tumor volume delineation of brain tumors on computed tomography and impact of magnetic resonance imaging. Radiother Oncol 2001;60(1):49–59.
6. Egger J, Kapur T, Fedorov A, et al. GBM volumetry using the 3D Slicer medical image computing platform. Sci Rep 2013;3(1):1364.
7. Menze BH, Jakab A, Bauer S, et al. The Multimodal Brain Tumor Image Segmentation Benchmark (BRATS). IEEE Trans Med Imaging 2015;34(10):1993–2024.
8. Sachdeva J, Kumar V, Gupta I, Khandelwal N, Ahuja CK. Segmentation, feature extraction, and multiclass brain tumor classification. J Digit Imaging 2013;26(6):1141–1150.
9. Pereira S, Pinto A, Alves V, Silva CA. Brain tumor segmentation using convolutional neural networks in MRI images. IEEE Trans Med Imaging 2016;35(5):1240–1251.
10. Havaei M, Davy A, Warde-Farley D, et al. Brain tumor segmentation with deep neural networks. Med Image Anal 2017;35:18–31.
11. Kamnitsas K, Ledig C, Newcombe VFJ, et al. Efficient multi-scale 3D CNN with fully connected CRF for accurate brain lesion segmentation. Med Image Anal 2017;36:61–78.
12. Jiang YQ, Xiong JH, Li HY, et al. Recognizing basal cell carcinoma on smartphone-captured digital histopathology images with a deep neural network. Br J Dermatol 2019 Apr 24 [Epub ahead of print]. doi: 10.1111/bjd.18026.
13. Havaei M, Dutil F, Pal C, Larochelle H, Jodoin PM. A convolutional neural network approach to brain tumor segmentation. In: Crimi A, Menze B, Maier O, Reyes M, Handels H, eds. Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries. BrainLes 2015. Lecture Notes in Computer Science, vol 9556. Cham, Switzerland: Springer, 2016; 195–208.
14. Randhawa RS, Modi A, Jain P, Warier P. Improving boundary classification for brain tumor segmentation and longitudinal disease progression. In: Crimi A, Menze B, Maier O, Reyes M, Winzeck S, Handels H, eds. Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries. BrainLes 2016. Lecture Notes in Computer Science, vol 10154. Cham, Switzerland: Springer, 2016; 65–74.
15. Ronneberger O, Fischer P, Brox T. U-Net: convolutional networks for biomedical image segmentation. In: Navab N, Hornegger J, Wells W, Frangi A, eds. Medical Image Computing and Computer-Assisted Intervention: MICCAI 2015. Lecture Notes in Computer Science, vol 9351. Cham, Switzerland: Springer, 2015; 234–241.
16. Çiçek Ö, Abdulkadir A, Lienkamp SS, Brox T, Ronneberger O. 3D U-Net: learning dense volumetric segmentation from sparse annotation. In: Ourselin S, Joskowicz L, Sabuncu M, Unal G, Wells W, eds. Medical Image Computing and Computer-Assisted Intervention: MICCAI 2016. Lecture Notes in Computer Science, vol 9901. Cham, Switzerland: Springer, 2016; 424–432.
17. Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, June 7–12, 2015. Piscataway, NJ: IEEE, 2015.
18. Guan Y. Waking up to data challenges. Nat Mach Intell 2019;1:67.
19. Bakas S, Akbari H, Sotiras A, et al. Advancing The Cancer Genome Atlas glioma MRI collections with expert segmentation labels and radiomic features. Sci Data 2017;4(1):170117.
20. Cancer Imaging Archive Public Access. TCGA-GBM. Cancer Imaging Archive. https://wiki.cancerimagingarchive.net/display/Public/TCGA-GBM. Modified February 13, 2019. Accessed April 9, 2019.
21. Cancer Imaging Archive Public Access. TCGA-LGG. Cancer Imaging Archive. https://wiki.cancerimagingarchive.net/display/Public/TCGA-LGG. Modified January 8, 2020. Accessed April 9, 2019.
22. Aganj I, Harisinghani MG, Weissleder R, Fischl B. Unsupervised medical image segmentation based on the local center of mass. Sci Rep 2018;8(1):13012.
23. Kleesiek J, Urban G, Hubert A, et al. Deep MRI brain extraction: a 3D convolutional neural network for skull stripping. Neuroimage 2016;129:460–469.
24. Brosch T, Tang LYW, Yoo Y, Li DK, Traboulsee A, Tam R. Deep 3D convolutional encoder networks with shortcuts for multiscale feature integration applied to multiple sclerosis lesion segmentation. IEEE Trans Med Imaging 2016;35(5):1229–1239.
25. Dou Q, Chen H, Yu L, et al. Automatic detection of cerebral microbleeds from MR images via 3D convolutional neural networks. IEEE Trans Med Imaging 2016;35(5):1182–1195.
26. Smistad E, Falch TL, Bozorgi M, Elster AC, Lindseth F. Medical image segmentation on GPUs: a comprehensive review. Med Image Anal 2015;20(1):1–18.
27. Krizhevsky A, Sutskever I, Hinton GE. ImageNet classification with deep convolutional neural networks. Commun ACM 2017;60(6):84–90.
28. Davis J, Goadrich M. The relationship between Precision-Recall and ROC curves. In: Proceedings of the 23rd International Conference on Machine Learning. New York, NY: ACM, 2006; 233–240.
29. Taha AA, Hanbury A. Metrics for evaluating 3D medical image segmentation: analysis, selection, and tool. BMC Med Imaging 2015;15(1):29.
30. Li H, Quang D, Guan Y. Anchor: trans-cell type prediction of transcription factor binding sites. Genome Res 2019;29(2):281–292.
31. Li H, Li T, Quang D, Guan Y. Network propagation predicts drug synergy in cancers. Cancer Res 2018;78(18):5446–5457.
32. Alcantara Llaguno SR, Parada LF. Cell of origin of glioma: biological and clinical implications. Br J Cancer 2016;115(12):1445–1450.
33. Togao O, Hiwatashi A, Yamashita K, et al. Differentiation of high-grade and low-grade diffuse gliomas by intravoxel incoherent motion MR imaging. Neuro Oncol 2016;18(1):132–141.
34. Dong H, Yang G, Liu F, Mo Y, Guo Y. Automatic brain tumor detection and segmentation using U-Net based fully convolutional networks. In: Valdés Hernández M, González-Castro V, eds. Medical Image Understanding and Analysis. MIUA 2017. Communications in Computer and Information Science, vol 723. Cham, Switzerland: Springer, 2017; 506–517.
35. Deng J, Dong W, Socher R, Li LJ, Li K, Fei-Fei L. ImageNet: a large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2009; 248–255.
