Journal of Medical Imaging. 2019 Oct 4;6(4):044001. doi: 10.1117/1.JMI.6.4.044001

Active learning strategy and hybrid training for infarct segmentation on diffusion MRI with a U-shaped network

Aurélien Olivier a, Olivier Moal a,*, Bertrand Moal a, Fanny Munsch b, Gosuke Okubo b, Igor Sibon c,d, Vincent Dousset b,e, Thomas Tourdias b,e
PMCID: PMC6777650  PMID: 31592439

Abstract.

Automatic and reliable stroke lesion segmentation from diffusion magnetic resonance imaging (MRI) is critical for patient care. Methods using neural networks have been developed, but the rate of false positives limits their use in clinical practice. A training strategy applied to three-dimensional deconvolutional neural networks for stroke lesion segmentation on diffusion MRI was proposed. Infarcts were segmented by experts on diffusion MRI for 929 patients. We divided each database as follows: 60% for a training set, 20% for validation, and 20% for testing. Our hypothesis was a two-phase hybrid learning scheme, in which the network was first trained with whole MRI (regular phase) and then, in a second phase (hybrid phase), alternately with whole MRI and patches. Patches were actively selected from the discrepancy between expert and model segmentation at the beginning of each batch. On the test population, the performances after the regular and hybrid phases were compared. A statistically significant Dice improvement with hybrid training compared with regular training was demonstrated (p<0.01). The mean Dice reached 0.711±0.199. False positives were reduced by almost 30% with hybrid training (p<0.01). Our hybrid training strategy empowered deep neural networks for more accurate infarct segmentations on diffusion MRI.

Keywords: ischemic stroke lesion segmentation, deep learning, diffusion-weighted imaging, fully convolutional networks, patches, active learning

1. Introduction

In Western countries, stroke is the main cause of adult disability, the second cause of dementia after Alzheimer’s disease, and the third cause of death.1 Thrombolysis and, more recently, thrombectomy have transformed stroke outcome.2 Imaging findings are a key determinant for the triage of patients eligible for thrombectomy.3 On magnetic resonance imaging (MRI), diffusion-weighted imaging (DWI) is the major biomarker of ongoing infarct.4 Infarct volume as measured by DWI is a prognosis factor that has recently gained importance in selecting patients who can benefit from revascularization in an extended time window.5,6 A rapid, accurate, and operator-independent method to compute infarct volume from DWI is therefore crucial for patient care.7 Although clinical tools for segmentation are available,8 they rely mostly on a voxel intensity threshold and often require user interaction to correct errors, which is problematic in the context of an emergency or telestroke networks.9

Since 2012, deep convolutional neural networks (CNNs) have caught the interest of researchers because they outperformed state-of-the-art methods on ImageNet,10 an image classification challenge. More recently, methods based on fully CNNs have outperformed state-of-the-art methods on several segmentation challenges in medical imaging: for example, “BRATS,” a brain tumor segmentation challenge based on multimodal MRI scans;11 “Promise12,” a prostate segmentation challenge based on MRI;12 “LiTS,” a liver tumor segmentation challenge based on computed tomography scans; and “ISLES,” an infarct lesion segmentation challenge organized in conjunction with the conference on medical image computing and computer-assisted intervention (MICCAI).13

Such networks have been applied to infarct lesion segmentation on DWI.14,15 The Dice16 is the main metric to measure their performance and the most common cost function to train them. However, the Dice presents limitations: notably, in the case of multiple object segmentation of heterogeneous sizes, as in stroke patients’ MRIs. In such a case, the impact of small false positives over the Dice coefficient can be masked by the presence of larger detected lesions. The number of false positives is the major limit to the deployment of these networks in clinical practice.14,15
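The masking effect described above can be illustrated with a small worked example. This is only a sketch: the voxel counts are hypothetical and chosen for illustration, and `dice` is a helper name introduced here, not something from the original work.

```python
def dice(tp: int, fp: int, fn: int) -> float:
    """Dice coefficient from voxel counts: 2*TP / (2*TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 1.0 if denom == 0 else 2 * tp / denom


# Hypothetical case 1: one large lesion (5000 voxels) segmented perfectly.
large_only = dice(tp=5000, fp=0, fn=0)

# Hypothetical case 2: same large lesion plus a spurious 20-voxel false positive.
large_plus_fp = dice(tp=5000, fp=20, fn=0)

# Hypothetical case 3: a small 20-voxel lesion segmented perfectly,
# plus the same 20-voxel false positive.
small_plus_fp = dice(tp=20, fp=20, fn=0)

print(f"large lesion only:        {large_only:.3f}")
print(f"large lesion + small FP:  {large_plus_fp:.3f}")
print(f"small lesion + small FP:  {small_plus_fp:.3f}")
```

In the second case the false positive lowers the Dice by about 0.002, whereas the same false positive next to a small lesion lowers it by about a third. This is the sense in which a global Dice can hide small false positives.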

For infarct segmentation, different methods have been proposed. Some authors trained networks on whole three-dimensional (3-D) MRIs; a larger context would prevent the detection of false positives. However, other authors have suggested that training a model on patches would overcome Dice limitation.15,17,18 In those methods, the strategy of patch selection was demonstrated to influence model performance.19,20 Yet, to our knowledge, there is no method combining both whole 3-D MRIs and patches to train a single model with patches extracted actively during training.

To automatically segment infarct lesions with a 3-D FCN, our hypothesis was that a two-phase hybrid learning scheme (including regular training with whole MRIs and hybrid training with whole MRIs and patches selected based on discrepancies between provisional model predictions and expert segmentations) could improve model performance and decrease false positives compared with regular training with whole MRIs.

2. Related Work

2.1. Deep Learning for Segmentation

FCNs were first proposed successfully by Long et al.21 on semantic segmentation. In this approach, models directly output a pixelwise segmentation of the target object. The approach presents major advantages compared with conventional patch-based models combined with sliding windows, which provide coarse segmentation and redundant computations. In the past 3 years, numerous improvements have increased FCN performance. First, Ronneberger et al.18 introduced U-net, an encoder–decoder-like architecture using upsampling layers, and skip connections, trained end-to-end on small datasets. Upsampling layers were then replaced with deconvolutional or transposed convolutional layers, allowing finer reconstruction in the decoding phase.22 Additional variations of U-net have since been proposed, such as the addition of atrous convolutional layers to increase the receptive field without learning extra parameters23 or the use of residual connections, inspired by Resnet architecture. These networks were extended to 3-D mainly for medical image analysis such as computed tomography or MRI.24 In addition, although common cost functions were based on a pixelwise cross-entropy, Milletari et al.25 showed better results for segmentation of the prostate in MRI by using the Dice as a cost function, thus reducing the impact of class imbalance common in segmentation tasks.16

2.2. Segmentation of the Infarct Lesion

ISLES is an annual multimodal infarct lesion segmentation challenge organized in conjunction with MICCAI conferences. Each year, a new challenge is proposed using various modalities. The ISLES 2015 challenge aimed to segment subacute infarct lesions from a multimodal acquisition, including DWI, Flair, and T1 and T2 sequences with a database of 64 cases: 28 for training and 36 for testing. The winner, Kamnitsas et al.,17 proposed extracting patches at multiple scales, fed to CNNs.

Patches extracted around lesions increase the frequency of the lesion class against the background, thus reducing class imbalance26 while containing fewer voxels than a whole image. Furthermore, the Dice loss measured locally on patches centered on isolated lesions increases the importance of small lesions during training. Yet, training on whole images creates a larger context for predictions and permits optimization of the model directly on the final target.

Recently, a hybrid approach, training two independent cascaded neural networks, has been proposed.14 The first FCN, EDD-net, was trained with 2-D DWI slices and showed a large number of false positives. The second FCN, MUSCLE-net, was trained with patches centered on the lesions predicted by the first model and was used to refine the segmentation, decreasing the number of false positives by 44%. The authors, Chen et al., explained that their large initial number of false positives could be decreased with a 3-D approach.

More recently, Zhang et al.27 developed a 3-D FCN using dense connected convolutional layers,28 trained with large patches to segment infarct lesions on MRI. They outperformed EDD-net on their own dataset of 247 MRIs but showed worse results when using EDD-net + MUSCLE-net, having overfitting issues when retraining MUSCLE-net.

Additionally, a better selection of patches could improve model performance. Selecting specific batches of data during training has been shown to improve the results of deep learning models in various applications. For image processing, data selection methods were proposed,19,20 selecting training samples at each minibatch of the stochastic gradient descent according to the loss values. Wang et al.29 slightly improved their results on multiorgan segmentation, using a relaxed upper confidence bound to select training data.

Because the potential of training data selection as patches using loss values feeding a single model has not yet been explored for the segmentation of infarct lesions, we propose this approach and analyze its performance.

3. Methods

This section describes an innovative deep neural network training scheme for infarct lesion segmentation on DWI MRI. We propose a two-phase hybrid learning scheme for training a single 3-D FCN with both MRIs and actively extracted patches, aiming to strongly penalize false-positive detections. The model is trained with whole MRIs during the regular phase and alternately on whole MRIs and patches during the hybrid phase. An active patch selection strategy was developed to extract patches centered on false positive detections (FPDs), false negative detections (FNDs), and true positive detections (TPDs). Our architecture, UD-net, is inspired by a U-shaped FCN, integrating at its bottleneck an adaptive max-pooling, conserving the same resolution in the z-axis to reduce the anisotropy of features, and a dilated convolution with residual connections. The performances of the model after the regular phase and after the hybrid phase were compared to evaluate the benefits of the hybrid phase.

3.1. Databases

Three databases were combined. They are described in detail elsewhere,30–32 and their main characteristics are presented in Table 1.

Table 1.

Median age, gender, median NIHSS, average infarct volume, average number of lesions, and mean delay from symptoms to MRI on each database. (n, number of patients).

| | Database 1 (n=83) | Database 2 (n=351) | Database 3 (n=495) |
|---|---|---|---|
| Median age [year (range)] | 64.0 (23 to 87) | 68.0 (29 to 95) | 67 (16 to 94) |
| Male (%) | 59.3 | 64.2 | 64.0 |
| Median NIHSS at presentation (range) | 11 (4 to 20) | 4 (1 to 25) | 1 (0 to 22) |
| Average infarct volume [mL (SD)] | 51.0 (65.5) | 33.9 (60.1) | 8.4 (19.3) |
| Average number of lesions (SD) | 4.6 (5.2) | 7.7 (10.9) | 7.2 (10.7) |
| Mean delay from symptoms to MRI [hours (SD)] | 100 (26) | 57 (17) | (max 72)a |

a The detailed timing from onset to imaging is unknown, but all MRIs were obtained within the first 3 days.

The first cohort study (database-1) consisted of a total of 83 patients with first middle cerebral artery acute ischemic stroke included in a prospective national multicenter observational study. The primary inclusion criteria were men and women, older than 18 years, with a clinical diagnosis of minor-to-severe cerebral infarct (NIHSS scores between 4 and 20) in the left or right middle cerebral artery territory. MRIs were collected at four institutions on 1.5T magnets (Intera and Achieva, Philips Healthcare, Best, Netherlands; and Magnetom Vision, Siemens, Erlangen, Germany) with a standard single-shot spin echo planar imaging diffusion sequence using 24 slices of 5 mm thickness at b=0 and b=1000  s/mm2.

The second cohort study (database 2) consisted of a total of 351 consecutive patients with supratentorial ischemic stroke included in a prospective longitudinal study from June 2012 to February 2015. The primary inclusion criteria were men or women, older than 18 years, with a suspected clinical diagnosis of minor-to-severe supratentorial cerebral infarct (NIHSS scores between 1 and 25) 24 to 72 h after the insult, confirmed on trace DWI. MRIs were collected in one institution on a 3T magnet (Discovery MR750w, GE Medical Systems, Milwaukee, Wisconsin) with a standard spin echo echo-planar imaging diffusion sequence using 38 slices of 4 mm thickness at b=0 and b=1000  s/mm2.

The third cohort (database 3) consisted of a total of 495 patients following the same inclusion criteria as for database 2 and retrospectively collected between May 2016 and September 2017 with 3- to 5-mm diffusion imaging acquired on 1.5T (Achieva Philips Healthcare, Best, Netherlands; and Area, Siemens, Erlangen, Germany) or 3T (Discovery MR750w, GE Medical Systems, Milwaukee, Wisconsin) magnets. MRIs were acquired within a maximum of 72 h after stroke onset.

Infarct lesions were manually segmented on DWI by expert readers with the help of tools based on intensity variation and edge detection (“level tracing effect,” “draw effect,” and “paint effect”) available in 3-D Slicer.33 Apparent diffusion coefficient (ADC) maps were viewed simultaneously with DWI to edit borders only on diffusion lesions with decreased ADCs. This task is tedious; however, we previously demonstrated, on a subset of the population, good inter- and intrareader reliability to assess infarct volume based on DWI with a global intraclass correlation coefficient of 0.96 (95%CI = 0.93 to 0.98).34

We divided each database as follows: 60% of data for a training set, 20% for validation, and 20% for testing.

3.2. Preprocessing and Data Augmentation

DWI MRIs were rescaled to the same voxel size of (1.6, 1.6, and 4 mm) and cropped or padded to reach a [144, 144, 48] size. Normalization of voxel intensities was then applied to set the mean at 0 and the standard deviation at 1 for each DWI. In addition, data augmentation was performed to increase model robustness and reduce overfitting.35 For this purpose, MRIs were randomly transformed at each batch using Gaussian noise, B-spline deformation, and horizontal flips.
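A minimal sketch of the crop-or-pad and intensity normalization steps is given below, assuming NumPy. The resampling to a common voxel size is omitted, and the choice of a centered crop and symmetric zero padding is an assumption made for illustration, since the text does not specify it. The function names are hypothetical.

```python
import numpy as np

TARGET_SHAPE = (144, 144, 48)  # shape stated in the text


def crop_or_pad(volume: np.ndarray, target_shape=TARGET_SHAPE) -> np.ndarray:
    """Center-crop or zero-pad each axis so the volume reaches target_shape.

    Centered cropping and symmetric zero padding are assumptions.
    """
    out = volume
    for axis, target in enumerate(target_shape):
        size = out.shape[axis]
        if size > target:
            start = (size - target) // 2
            out = np.take(out, np.arange(start, start + target), axis=axis)
        elif size < target:
            before = (target - size) // 2
            pad = [(0, 0)] * out.ndim
            pad[axis] = (before, target - size - before)
            out = np.pad(out, pad, mode="constant")
    return out


def zscore(volume: np.ndarray) -> np.ndarray:
    """Set the mean to 0 and the standard deviation to 1 for one DWI volume."""
    volume = volume.astype(np.float32)
    std = float(volume.std())
    return (volume - volume.mean()) / (std if std > 0 else 1.0)


def preprocess(volume: np.ndarray) -> np.ndarray:
    """Crop or pad first, then normalize, following the order given in the text."""
    return zscore(crop_or_pad(volume))


# Usage with a synthetic volume (random values stand in for a resampled DWI).
example = np.random.default_rng(0).normal(loc=100.0, scale=20.0, size=(160, 130, 40))
prepared = preprocess(example)
print(prepared.shape)
```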

3.3. Network Architecture

The network architecture, UD-net, consists of a 3-D U-shaped network, inspired by Ronneberger et al.18 The architecture was made up of eight blocks containing convolutional layers, as shown in Fig. 1. Three down blocks constituted the downsampling part of the architecture, two blocks constituted the bottleneck, and three additional up blocks constituted the upsampling part. The down blocks contained 3×3×3 convolutions followed by batch normalization, a leaky RELU36,37 as the activation function, and max-pooling. Leaky RELU was selected as the activation function to prevent vanishing gradients,36,37 and batch normalization was selected to allow a faster and more stable training of networks.38,39

Fig. 1. (a) Network architecture and (b) blocks from the network architecture.

Considering the difference in voxel size between the x- and y-axes, and the z-axis, the third block of the downsampling part performed an adaptive max-pooling only on the x- and y-axes, thus reducing the anisotropy of features and preserving resolution in the z-axis. The bottleneck of the network enclosed a Conv block and a dilated block aimed at increasing the receptive field.

Up blocks consisted of a 3×3×3 convolution, batch normalization, a leaky RELU, and a deconvolution. In addition, skip connections were implemented by feeding the up blocks with the concatenation of the previous block output and its corresponding down-block output.

In addition, each up block was fed into a 3×3×3×2 convolution, producing intermediary segmentation. Up blocks were then resized to match the original input shape and summed into a final segmentation, similar to that of Kayalibay et al.40 This encourages earlier layers to produce good segmentation and speeds up the training process. The receptive field of UD-net is 70 voxels on the x- and y-axes and 42 voxels on the z-axis. Finally, a Softmax layer was used to generate voxelwise classification probabilities. The network was optimized with the Dice on the Softmax layer output.

3.4. Hybrid Learning Scheme with Whole Images and Patches

3.4.1. Two training phases

Our approach used two different learning phases. During the regular learning phase, UD-net was trained with a batch size of two whole MRIs with stochastic gradient descent, until the model stopped improving on the validation set for 50 epochs (Fig. 2). The weights that obtained the smallest loss function on validation data defined UD-net regular.

Fig. 2. Representation of the two-phase hybrid learning scheme. During the regular learning phase, UD-net is trained on whole MRIs until convergence. In the hybrid phase, UD-net is retrained using both MRI epochs and patch epochs from the best weights of the regular phase.

In the hybrid phase, UD-net regular was retrained alternately on patch epochs and MRI epochs (Fig. 2). During MRI epochs, UD-net was trained with whole MRIs with a batch size of two. During patch epochs, the network was trained with 3-D patches extracted using the active patch selection strategy described in Sec. 3.4.2, with a batch size of four patches. Validation was performed every two epochs, after each MRI epoch. The model was trained until no improvement in the loss function was observed for 25 validation data evaluations. The weights that obtained the smallest loss function during the hybrid learning phase defined UD-net hybrid.
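The alternation and stopping rule of the hybrid phase can be sketched as a simple loop. This is an illustrative outline, not the authors' implementation: the three callbacks, the `max_cycles` safety cap, and the function name are hypothetical, and the stopping rule is read here as 25 consecutive validations without improvement.

```python
def hybrid_phase(train_patch_epoch, train_mri_epoch, validate,
                 patience=25, max_cycles=1000):
    """Alternate one patch epoch and one MRI epoch, validating after each MRI epoch.

    Stops after `patience` consecutive validations without improvement.
    Returns the best validation loss and the cycle at which it was reached.
    """
    best_loss = float("inf")
    best_cycle = None
    stale = 0
    for cycle in range(max_cycles):
        train_patch_epoch()   # patches from active selection, batch size of four
        train_mri_epoch()     # whole MRIs, batch size of two
        loss = validate()     # one validation every two epochs
        if loss < best_loss:
            # A real implementation would also snapshot the model weights here.
            best_loss, best_cycle, stale = loss, cycle, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best_loss, best_cycle


# Usage with stand-in callbacks and a scripted sequence of validation losses.
calls = {"patch": 0, "mri": 0}
scripted_losses = iter([0.5, 0.4, 0.45, 0.41, 0.42])


def fake_patch_epoch():
    calls["patch"] += 1


def fake_mri_epoch():
    calls["mri"] += 1


best_loss, best_cycle = hybrid_phase(
    fake_patch_epoch, fake_mri_epoch, lambda: next(scripted_losses), patience=3
)
print(best_loss, best_cycle, calls)
```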

3.4.2. Active patch selection

During each batch of the patch epochs, two MRIs were selected, and their predicted voxelwise probabilities were inferred using the model's current weights without backpropagation.

Then, patch selection was performed batchwise by comparing expert segmentation and predicted segmentation using the model weights at the beginning of each batch. Finally, patches were fed to UD-net for a complete stochastic gradient descent iteration with the weights update (see Fig. 3).

Fig. 3. Active patch selection: for each MRI of a batch, segmentation is inferred from the actual weights of the model and compared with the expert segmentation; then, patches are selected using active patch selection and fed into the network to update weights with a gradient descent iteration.

On the basis of model output, voxels with a predicted probability greater than or equal to 0.5 were labeled as infarct lesions. From both expert and predicted segmentation, blobs, defined as groups of spatially connected voxels labeled as infarct lesions, were extracted using the scipy label function.41 From the expert segmentation and current model prediction, false positive detections (FPDs), false negative detections (FNDs), and true positive detections (TPDs) were identified to generate patches.

  • TPDs were defined as a predicted segmentation blob for which at least one voxel overlaps with a blob from expert segmentation.

  • FPDs were defined as a predicted segmentation blob for which no voxel overlaps with a blob from expert segmentation.

  • FNDs were defined as an expert segmentation blob for which no voxel overlaps with a blob from predicted segmentation.
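The three definitions above can be sketched with `scipy.ndimage.label`. This is a hedged illustration: the function name `classify_blobs` is hypothetical, and the default face connectivity of `ndimage.label` is an assumption, since the connectivity used is not stated in the text.

```python
import numpy as np
from scipy import ndimage


def classify_blobs(expert_mask, predicted_prob, threshold=0.5):
    """Label blobs and sort them into TPD, FPD, and FND lists of label ids.

    TPD and FPD ids index `pred_labels`; FND ids index `exp_labels`.
    """
    expert_mask = np.asarray(expert_mask).astype(bool)
    predicted_mask = np.asarray(predicted_prob) >= threshold

    # Default structuring element: face-connected voxels (assumption).
    pred_labels, n_pred = ndimage.label(predicted_mask)
    exp_labels, n_exp = ndimage.label(expert_mask)

    tpd, fpd, fnd = [], [], []
    for blob_id in range(1, n_pred + 1):
        blob = pred_labels == blob_id
        # At least one voxel shared with the expert segmentation means TPD.
        (tpd if np.any(blob & expert_mask) else fpd).append(blob_id)
    for blob_id in range(1, n_exp + 1):
        blob = exp_labels == blob_id
        # Expert blob with no predicted voxel at all means FND.
        if not np.any(blob & predicted_mask):
            fnd.append(blob_id)
    return pred_labels, exp_labels, tpd, fpd, fnd


# Usage on a tiny synthetic volume.
expert = np.zeros((10, 10, 4), dtype=bool)
expert[1:3, 1:3, 1] = True      # lesion A, partly detected
expert[6:8, 6:8, 2] = True      # lesion B, missed
prob = np.zeros((10, 10, 4))
prob[2:4, 2:4, 1] = 0.9         # overlaps lesion A
prob[0, 8, 3] = 0.8             # isolated false positive

pred_labels, exp_labels, tpd, fpd, fnd = classify_blobs(expert, prob)
print(len(tpd), len(fpd), len(fnd))
```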

On the basis of Chen et al.’s14 work, we defined a target patch shape of [64, 64, 16] (102.4, 102.4, and 64 mm). First, minimum bounding boxes (mBBs) around FND, FPD, and TPD were generated. To focus on small lesions, mBBs greater than a fixed shape of [64, 64, 16] were not considered. Patches were generated by expanding each mBB to the target voxel shape of [64, 64, 16]. Overlapping patches were merged if their corresponding mBBs could fit within the target patch size, as shown in Fig. 4. Finally, if fewer than four patches could be found based on the previous criteria, two patches of shape [64, 64, 16] were randomly extracted to reach a batch size of four patches.
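The expansion of a minimum bounding box to the target patch shape can be sketched as follows. The text does not say how the patch is positioned around the mBB, so centering it and shifting it back inside the volume are assumptions. `expand_to_patch` is a hypothetical helper, and the merging of overlapping patches and the random fallback are not shown.

```python
def expand_to_patch(bbox, volume_shape, patch_shape=(64, 64, 16)):
    """Expand a bounding box to a fixed patch window.

    bbox: one (start, stop) pair per axis, stop exclusive.
    Returns a list of (start, stop) pairs, or None when the box exceeds
    the patch shape on any axis (such boxes are not considered).
    Assumes the volume is at least as large as the patch on every axis.
    """
    window = []
    for (start, stop), size, target in zip(bbox, volume_shape, patch_shape):
        extent = stop - start
        if extent > target:
            return None
        # Center the patch on the box (assumption), then keep it inside the volume.
        new_start = start - (target - extent) // 2
        new_start = max(0, min(new_start, size - target))
        window.append((new_start, new_start + target))
    return window


# Usage on the preprocessed volume shape [144, 144, 48].
volume_shape = (144, 144, 48)
small_box = ((70, 80), (0, 5), (40, 48))
large_box = ((0, 70), (0, 5), (0, 5))

patch_window = expand_to_patch(small_box, volume_shape)
rejected = expand_to_patch(large_box, volume_shape)
print(patch_window, rejected)
```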

Fig. 4. Active patch extraction representation: (a) axial plane from MRI; (b) false positives (FP) in orange, true positives (TP) in green, false negatives (FN) in red, and mBBs displayed in white; (c) candidate patches displayed in white; and (d) selected patches displayed in white.

3.5. Evaluation Methods

To compare expert and predicted segmentation, the following metrics were calculated for each MRI:

  • Dice.

  • Number and volume of FPDs and FNDs, extracted by the method proposed in Sec. 3.4.2. Volumes were computed by summing the volumes of all FPDs and all FNDs per MRI.

  • Relative volume difference (RVD) between the volumes of predicted and expert segmentation.

For both the UD-net regular and UD-net hybrid results, the mean and the standard deviation of these metrics were expressed for all the test databases and for each test database individually. To compare both models, the differences were evaluated using the paired bilateral t-test with a significance level set at 0.05.

A decision for thrombectomy in an extended time window can be based on the detection of a mismatch between clinical severity and infarct volume with specific thresholds (i.e., 20, 30, and 50 mL).5 For each of these three thresholds, the volume of ischemic infarct per patient was classified as above or below the threshold based on expert segmentation. The accuracy, sensitivity, and specificity of UD-net hybrid to classify the patients similarly to the expert categories were evaluated.
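The threshold-based evaluation can be sketched as a per-patient comparison of expert and predicted volumes. The sketch treats "at or above the threshold" as the positive class, which is an assumption about how boundary values were handled. The function name and the example volumes are hypothetical.

```python
def threshold_agreement(expert_volumes_ml, predicted_volumes_ml, threshold_ml):
    """Sensitivity, specificity, and accuracy of predicted volume categories.

    The positive class is "volume >= threshold_ml" (assumption).
    """
    tp = fp = tn = fn = 0
    for expert, predicted in zip(expert_volumes_ml, predicted_volumes_ml):
        expert_pos = expert >= threshold_ml
        predicted_pos = predicted >= threshold_ml
        if expert_pos and predicted_pos:
            tp += 1
        elif expert_pos:
            fn += 1
        elif predicted_pos:
            fp += 1
        else:
            tn += 1
    total = tp + fp + tn + fn
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
        "accuracy": (tp + tn) / total if total else float("nan"),
    }


# Usage with made-up volumes (mL) for four patients and the 20 mL threshold.
expert_ml = [5.0, 25.0, 60.0, 10.0]
predicted_ml = [6.0, 18.0, 55.0, 22.0]
agreement = threshold_agreement(expert_ml, predicted_ml, threshold_ml=20.0)
print(agreement)
```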

4. Experiments and Results

4.1. Implementation Details

Networks were optimized using Adam,42 a first-order stochastic gradient-based algorithm with a learning rate of 10e−5, until the model did not improve on the validation set for 50 epochs. Optimal weights on validation were obtained at epoch 126 for UD-net regular and at epoch 53 for UD-net hybrid. The network and optimizer were implemented with the Keras framework,43 using a TensorFlow44 backend and an Nvidia Tesla K80 GPU.

4.2. Results

For the test set, the average infarct volume was 11.61  mL±26.45 and the average number of lesions was 6.7±13.2 based on expert segmentation.

On all the test data (see Table 2), the mean Dice obtained with regular training was 0.694±0.201. The mean Dice was significantly improved with the hybrid training, reaching 0.711±0.199 (p<0.01, paired bilateral t-test). When analyzing databases individually, a significant improvement on the Dice after hybrid training was demonstrated for database 3 but not for databases 1 and 2. Database 3 also contains the largest number of cases and showed a smaller average Dice for both regular and hybrid UD-net compared with databases 1 and 2.

Table 2.

Mean and standard deviation for the Dice, number, and volume (in mL) of false positive detection (number of FPD, vol_FPD), number and volume (in mL) of false negative detection (number of FND, vol_FND), relative difference in terms of volume between the prediction and the expert segmentation (RVD) detailed for the different databases and for the UD-net regular and UD-net hybrid models on the test set with associated p-values. Significant p-values (<0.05) are highlighted in bold. All test databases: 184 DWI MRI; test database 1 (DB1): 16 DWI MRI; test database 2 (DB2): 70 DWI MRI; test database 3 (DB3): 98 DWI MRI.

| Metric | Statistic | All: Regular | All: Hybrid | All: p-value | DB1: Regular | DB1: Hybrid | DB1: p-value | DB2: Regular | DB2: Hybrid | DB2: p-value | DB3: Regular | DB3: Hybrid | DB3: p-value |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Dice | Mean | 0.694 | 0.711 | **<0.01** | 0.73 | 0.721 | 0.74 | 0.744 | 0.758 | 0.071 | 0.653 | 0.676 | **<0.01** |
| Dice | Std | 0.201 | 0.199 | | 0.219 | 0.289 | | 0.154 | 0.135 | | 0.22 | 0.214 | |
| Number of FPD | Mean | 3.293 | 2.391 | **<0.01** | 7.375 | 5.063 | **0.03** | 4.143 | 2.986 | **<0.01** | 2.02 | 1.531 | **0.015** |
| Number of FPD | Std | 3.656 | 2.882 | | 3.324 | 3.642 | | 3.979 | 2.956 | | 2.709 | 2.285 | |
| vol_FPD (mL) | Mean | 0.144 | 0.068 | **<0.01** | 1.149 | 0.465 | **0.02** | 0.08 | 0.046 | 0.151 | 0.025 | 0.019 | 0.147 |
| vol_FPD (mL) | Std | 0.527 | 0.198 | | 1.407 | 0.512 | | 0.21 | 0.068 | | 0.044 | 0.037 | |
| Number of FND | Mean | 2.815 | 2.717 | 0.075 | 0.875 | 0.625 | 0.164 | 3.714 | 3.571 | **0.0124** | 2.49 | 2.449 | 0.582 |
| Number of FND | Std | 7.389 | 7.623 | | 1.668 | 1.258 | | 11.283 | 11.707 | | 3.272 | 3.253 | |
| vol_FND (mL) | Mean | 0.058 | 0.056 | 0.556 | 0.078 | 0.043 | 0.281 | 0.074 | 0.073 | 0.912 | 0.044 | 0.045 | 0.773 |
| vol_FND (mL) | Std | 0.16 | 0.168 | | 0.178 | 0.095 | | 0.198 | 0.229 | | 0.124 | 0.119 | |
| RVD | Mean | 0.377 | 0.303 | **0.037** | 0.246 | 0.241 | 0.875 | 0.257 | 0.223 | 0.464 | 0.484 | 0.37 | **0.049** |
| RVD | Std | 0.536 | 0.281 | | 0.261 | 0.322 | | 0.408 | 0.166 | | 0.623 | 0.321 | |

The global number of FPDs decreased by almost 30% (27.4%) between regular training and hybrid training, from a mean value of 3.293±3.656 to a mean value of 2.391±2.882 (p<0.01). Such significant differences were found for each database independently.

The volume of FPDs was also significantly smaller for the hybrid training model considering the entire database. A few illustrative examples highlighting improvement in the FPDs are shown in Fig. 5.

Fig. 5. Axial view of MRIs from the database with expert segmentation as a red outline: (a) MRI, (b) MRI + ground truth, (c) MRI + segmentation with UD-net regular as a red contour (red arrows point to false positives), and (d) MRI + segmentation based on UD-net hybrid as a red contour, with false positives removed.

The number of FNDs decreased nonsignificantly, from 2.815±7.389 to 2.717±7.623, with hybrid training. No significant differences were found for the volume of false negatives.

RVD was significantly smaller for hybrid training considering the entire test set.

The results for predicted infarct volume categorization are presented in Table 3. The accuracies for all volume threshold classifications were at 99%. The sensitivity was between 0.923 and 1. The lowest specificity values were observed for volumes lower than 20 and 50 mL with, respectively, 92% and 86%. For the other cases, the specificity reached 99% and higher.

Table 3.

Sensitivity, specificity, and accuracy of the infarct classification into volume categories: ±20 mL, ±30 mL, and ±50 mL on the test set.

| | <20 mL | ≥20 mL | Total | <30 mL | ≥30 mL | Total | <50 mL | ≥50 mL | Total |
|---|---|---|---|---|---|---|---|---|---|
| Expert segmentation | 158 | 26 | 184 | 165 | 19 | 184 | 172 | 12 | 184 |
| UD-net hybrid | 158 | 26 | 184 | 166 | 18 | 184 | 170 | 14 | 184 |
| Sensitivity | 0.987 | 0.923 | | 1 | 0.947 | | 0.988 | 1 | |
| Specificity | 0.923 | 0.987 | | 0.947 | 1 | | 1 | 0.988 | |
| Accuracy | | | 0.989 | | | 0.995 | | | 0.989 |

5. Discussion and Conclusion

In this article, a two-phase learning scheme for training a single U-shaped FCN for ischemic infarct segmentation was presented. The network was trained with both whole MRIs and patches actively selected during the training. This approach significantly increased the Dice to reach an average of 0.711 compared with a regular training method. But more importantly, it drastically reduced the number of FPDs by about 30% compared with a regular approach based solely on whole MRIs.

The main objective of the hybrid phase was to focus the training of our network on relevant patches, such as those containing FPDs. Although the Dice has shown robust results on several segmentation challenges, the detection success or failure of smaller lesions can be diluted in the presence of larger lesions. The intuition behind this approach is to isolate smaller lesions and increase their impact with a local Dice. Correctly detected regions (TPDs) were still fed to the network to provide successful cases during training, thus preventing the model from overspecializing on such lesions. The network was also trained alternately on whole MRIs, with the idea of preserving a model optimized on the Dice over the whole MRI, which is the final target metric. Our patch approach required prior training in the regular phase to obtain a model with robust segmentations; otherwise, too many candidate patches would have been proposed because of FPDs. The translation invariance of FCNs allowed training on both whole MRIs and patches with the same weights. Compared with Chen et al.’s14 approach, our method has the advantage of using a single model trained end-to-end on whole MRIs and patches, which is easier to train and deploy than a cascaded approach with two models.

Although improvements on FPD were high, the benefits on Dice were more modest, and no significant improvement was observed on FND. For the reasons discussed above, reducing the number of FPDs of small size is not expected to considerably increase the Dice over the whole MRIs.

Even though the relative difference in terms of volume between the prediction and the expert segmentation was significantly different, the absolute difference remains modest (1 mL) compared with the thresholds used to decide upon a thrombectomy. However, by avoiding the detection of several false positive islets, the hybrid method could still be clinically useful. First, even if small, the FPDs would need to be manually corrected, which could be time consuming for the clinician. Second, those FPDs could mislead the diagnosis or the etiological orientation of the patient (several small positive lesions orienting toward a cardio-embolic mechanism), especially if images are not interpreted by neuroradiologists, as is the case in smaller hospitals.

The model was trained and tested on three datasets with MRIs originating from three different vendors (1.5T and 3T), with different sequences and different times from stroke onset. The diversity of the database may offer more robustness to our model for external validation. Dice performance with and without hybrid training was lower for database 3, which is likely linked to the fact that it contained cases with more lesions of smaller size, which are harder to segment.

In the context of infarct volume classification, our approach showed promising results for clinical applicability. The ability to classify infarct volume against the 20, 30, and 50 mL thresholds reached accuracies of 99%. However, this should be further tested with additional cases containing larger infarct volumes, because only 14% of the tested data had a volume greater than 20 mL.

Our model can still be improved in several ways. To distinguish ischemic infarct voxels, a threshold of 0.5 was chosen based on the literature.14,17,27 However, no analysis was performed to evaluate Dice performance with different threshold values. Active patch selection was performed on the CPU, creating a bottleneck that slowed down computations. Furthermore, a new patch extraction strategy could be tested. For example, Guerrero et al.15 proposed the random shifting of patch centers and showed performance improvement in white matter hyperintensity segmentation. Moreover, the hybrid learning phase introduced new hyperparameters that could be tuned, such as the frequency of patch epochs against MRI epochs or the size of the extracted patches. More generally, the hybrid learning strategy should be evaluated with other deep learning architectures, such as Dense-nets, similar to Zhang et al.,27 and more importantly on other segmentation tasks, to confirm its general beneficial impact.

To conclude, an approach to train a 3-D FCN for infarct lesion segmentation in DWI was presented, by establishing a learning phase in which relevant patches are actively selected during training. The method considerably decreases the number of false-positive lesions. We believe that the idea of focusing on relevant regions of the image when training segmentation models could be further investigated, even on other applications such as multiple sclerosis lesions, for which numerous small lesions are to be segmented. The results showed good potential for clinical use as a preliminary screening tool in computer-aided diagnosis systems.

Acknowledgments

This work was supported by public grants from the French “Agence Nationale de la Recherche” within the context of the “Investments for the Future” program, referenced ANR-10-LABX-57 and named “TRAIL” (Translational Research and Advanced Imaging Laboratory, project DEEP-STROKE). “Region Nouvelle Aquitaine” also supported this work. The databases were funded by public grants from the French government (PHRC protocole hospitalier de recherche clinique). T.T. also received financial support from the ARSEP Foundation and from the French “Agence Nationale de la Recherche” within the context of the “Investments for the Future” program, referenced ANR-10-LABX-43 and named BRAIN. The authors would like to thank Enago (www.enago.com) for the English language review.

Biographies

Aurélien Olivier is a data scientist at DESKi. He holds an MSc degree from IMT Atlantique-Mines Nantes, with a specialization in computer science for decision support. He was introduced to deep learning for computer vision during internships at the Kanzaki-Takahashi laboratory and at GE Healthcare prior to joining DESKi.

Olivier Moal is the cofounder and chief research officer of DESKi. He received his MSc degree from Ecole Polytechnique Fédérale de Lausanne. He was introduced to machine learning during the research project of his master’s degree at the University of California, Berkeley. He then specialized in deep learning for medical image analysis and founded DESKi in 2016.

Bertrand Moal is the cofounder and chief executive officer of DESKi. In parallel, he is a medical resident in public health at the Bordeaux University Hospital. He received his MSc degree from ESTP Paris in 2010 and his master’s degree in biomedical engineering from Arts et Métiers ParisTech, and conducted a biomechanics thesis in the medical field in partnership with the NYU Hospital in 2012. He founded DESKi in 2016.

Fanny Munsch holds a PhD in neurosciences and is an engineer in MR image analysis. She is currently a postdoctoral fellow at Beth Israel Deaconess Medical Center, Department of Radiology, Division of MRI Research, where she is working on brain perfusion with arterial spin labeling, inhomogeneous magnetization transfer imaging, and myelin content.

Gosuke Okubo received his MD and PhD degrees from Kyoto University in 2007 and 2017, respectively. He was a postdoctoral researcher at the University of Bordeaux. Currently, he is a clinical fellow at the Department of Radiology, Tenri Hospital in Japan. His interests are in neuroimaging and quantitative image analysis.

Igor Sibon is an MD-PhD in neurology. He completed his medical studies at Bordeaux University, France, and a postdoctoral fellowship at McGill University, Montreal, Canada, from 2004 to 2006. He has been the head of the Stroke Unit at Bordeaux University Hospital since 2009, the codirector of the team “Neuroimaging and Human Cognition” at the INCIA-UMR CNRS-5287 since 2014, and the vice-president of the French Society of Neurovascular Diseases since 2017.

Vincent Dousset is an MD-PhD in radiology and the head of the NeuroImaging Department of Bordeaux University Hospital, France. He is the director of the Laboratory of Excellence TRAIL (Translational Research and Advanced Imaging Laboratory) and of the Institute of BioImaging. His group develops translational approaches from animal models to patients, using in vivo imaging in several neurological disorders.

Thomas Tourdias is an MD-PhD in radiology at the NeuroImaging Department of Bordeaux University Hospital, France. His main research projects are in stroke and multiple sclerosis. In these domains, he uses a strongly translational approach from animal models to the patients thanks to in vivo imaging. His broad research goal is to highlight imaging biomarkers and to validate their biological significance.

Disclosures

Mr. A.O. reports grants from Region Nouvelle Aquitaine and grants from Labex Trail during the conduct of the study, and a salary from DESKi. Mr. O.M. reports grants from Region Nouvelle Aquitaine and grants from Labex Trail during the conduct of the study, and a CTO position at DESKi. Dr. B.M. reports grants from Region Nouvelle Aquitaine and grants from Labex Trail during the conduct of the study and a CEO position at DESKi. Dr. F.M. reports grants from Region Nouvelle Aquitaine and grants from Labex Trail during the conduct of the study. Dr. G.O. reports grants from Region Nouvelle Aquitaine and grants from Labex Trail during the conduct of the study. Professor I.S. reports grants from Region Nouvelle Aquitaine and grants from Labex Trail during the conduct of the study. Professor V.D. reports grants from Region Nouvelle Aquitaine and grants from Labex Trail during the conduct of the study. Professor T.T. reports grants from Region Nouvelle Aquitaine and grants from Labex Trail during the conduct of the study.

References

  • 1. Mozaffarian D., et al., “Heart disease and stroke statistics—2015 update: a report from the American Heart Association,” Circulation 131, e29–e322 (2015). doi: 10.1161/CIR.0000000000000152
  • 2. Goyal M., et al., “Endovascular thrombectomy after large-vessel ischaemic stroke: a meta-analysis of individual patient data from five randomised trials,” Lancet 387, 1723–1731 (2016). doi: 10.1016/S0140-6736(16)00163-X
  • 3. Merino J. G., Warach S., “Imaging of acute stroke,” Nat. Rev. Neurol. 6, 560–571 (2010). doi: 10.1038/nrneurol.2010.129
  • 4. Le Bihan D., Iima M., “Diffusion magnetic resonance imaging: what water tells us about biological tissues,” PLoS Biol. 13, e1002246 (2015). doi: 10.1371/journal.pbio.1002203
  • 5. Nogueira R. G., et al., “Thrombectomy 6 to 24 hours after stroke with a mismatch between deficit and infarct,” N. Engl. J. Med. 378, 11–21 (2018). doi: 10.1056/NEJMoa1706442
  • 6. Albers G. W., et al., “Thrombectomy for stroke at 6 to 16 hours with selection by perfusion imaging,” N. Engl. J. Med. 378, 708–718 (2018). doi: 10.1056/NEJMoa1713973
  • 7. Straka M., Albers G. W., Bammer R., “Real-time diffusion-perfusion mismatch analysis in acute stroke,” J. Magn. Reson. Imaging 32, 1024–1037 (2010). doi: 10.1002/jmri.22338
  • 8. Stanley K. O., “Compositional pattern producing networks: a novel abstraction of development,” Genet. Program. Evolvable Mach. 8, 131–162 (2007). doi: 10.1007/s10710-007-9028-8
  • 9. Gautheron V., et al., “Pretreatment lesional volume impacts clinical outcome and thrombectomy efficacy,” Ann. Neurol. 83, 178–185 (2018). doi: 10.1002/ana.25133
  • 10. Krizhevsky A., Sutskever I., Hinton G. E., “ImageNet classification with deep convolutional neural networks,” in Adv. Neural Inf. Process. Syst. 25, pp. 1097–1105 (2012).
  • 11. Menze B. H., et al., “The multimodal brain tumor image segmentation benchmark (BRATS),” IEEE Trans. Med. Imaging 34, 1993–2024 (2015). doi: 10.1109/TMI.2014.2377694
  • 12. Litjens G., et al., “Evaluation of prostate segmentation algorithms for MRI: the PROMISE12 challenge,” Med. Image Anal. 18, 359–373 (2014). doi: 10.1016/j.media.2013.12.002
  • 13. Maier O., et al., “ISLES 2015—a public evaluation benchmark for ischemic stroke lesion segmentation from multispectral MRI,” Med. Image Anal. 35, 250–269 (2017). doi: 10.1016/j.media.2016.07.009
  • 14. Chen L., Bentley P., Rueckert D., “Fully automatic acute ischemic lesion segmentation in DWI using convolutional neural networks,” NeuroImage Clin. 15, 633–643 (2017). doi: 10.1016/j.nicl.2017.06.016
  • 15. Guerrero R., et al., “White matter hyperintensity and stroke lesion segmentation and differentiation using convolutional neural networks,” NeuroImage Clin. 17, 918–934 (2018). doi: 10.1016/j.nicl.2017.12.022
  • 16. Sudre C. H., et al., “Generalised dice overlap as a deep learning loss function for highly unbalanced segmentations,” Lect. Notes Comput. Sci. 10553, 240–248 (2017). doi: 10.1007/978-3-319-67558-9
  • 17. Kamnitsas K., et al., “Efficient multi-scale 3D CNN with fully connected CRF for accurate brain lesion segmentation,” Med. Image Anal. 36, 61–78 (2017). doi: 10.1016/j.media.2016.10.004
  • 18. Ronneberger O., Fischer P., Brox T., “U-Net: convolutional networks for biomedical image segmentation,” Lect. Notes Comput. Sci. 9351, 234–241 (2015). doi: 10.1007/978-3-319-24574-4
  • 19. Loshchilov I., Hutter F., “Online batch selection for faster training of neural networks,” arXiv:1511.06343, pp. 1–20 (2015).
  • 20. Shrivastava A., Gupta A., Girshick R., “Training region-based object detectors with online hard example mining,” arXiv:1604.03540 (2016).
  • 21. Long J., Shelhamer E., Darrell T., “Fully convolutional networks for semantic segmentation,” in Proc. IEEE Comput. Soc. Conf. Comput. Vision Pattern Recognit., pp. 3431–3440 (2015). doi: 10.1109/CVPR.2015.7298965
  • 22. Noh H., Hong S., Han B., “Learning deconvolution network for semantic segmentation,” in Proc. IEEE Int. Conf. Comput. Vision, pp. 1520–1528 (2015). doi: 10.1109/ICCV.2015.178
  • 23. Chen L.-C., et al., “Rethinking atrous convolution for semantic image segmentation,” arXiv:1706.05587 (2017).
  • 24. Çiçek Ö., et al., “3D U-net: learning dense volumetric segmentation from sparse annotation,” Lect. Notes Comput. Sci. 9901, 424–432 (2016). doi: 10.1007/978-3-319-46723-8
  • 25. Milletari F., Navab N., Ahmadi S.-A., “V-Net: fully convolutional neural networks for volumetric medical image segmentation,” arXiv:1606.04797, pp. 1–11 (2016).
  • 26. Couprie C., et al., “Learning hierarchical features for scene labeling,” IEEE Trans. Pattern Anal. Mach. Intell. 35, 1915–1929 (2013). doi: 10.1109/TPAMI.2012.231
  • 27. Zhang R., et al., “Automatic segmentation of acute ischemic stroke from DWI using 3D fully convolutional DenseNets,” IEEE Trans. Med. Imaging 37, 2149–2160 (2018). doi: 10.1109/TMI.42
  • 28. Huang G., et al., “Densely connected convolutional networks,” in Proc. 30th IEEE Conf. Comput. Vision and Pattern Recognit., pp. 2261–2269 (2017). doi: 10.1109/CVPR.2017.243
  • 29. Wang Y., et al., “Training multi-organ segmentation networks with sample selection by relaxed upper confident bound,” Lect. Notes Comput. Sci. 11073, 434–442 (2018). doi: 10.1007/978-3-030-00937-3
  • 30. Kuchcinski G., et al., “Thalamic alterations remote to infarct appear as focal iron accumulation and impact clinical outcome,” Brain 140, 1932–1946 (2017). doi: 10.1093/brain/awx114
  • 31. Munsch F., et al., “Stroke location is an independent predictor of cognitive outcome,” Stroke 47, 66–73 (2016). doi: 10.1161/STROKEAHA.115.011242
  • 32. Tourdias T., et al., “Final cerebral infarct volume is predictable by MR imaging at 1 week,” Am. J. Neuroradiol. 32, 352–358 (2011). doi: 10.3174/ajnr.A2271
  • 33. Kikinis R., Pieper S. D., Vosburgh K., “3D Slicer: a platform for subject-specific image analysis, visualization, and clinical support,” Intraoperative Imaging Image-Guided Ther. 3(19), 277–289 (2014). doi: 10.1007/978-1-4614-7657-3_19
  • 34. Sibon I., et al., “Inter- and intraobserver reliability of five MRI sequences in the evaluation of the final volume of cerebral infarct,” J. Magn. Reson. Imaging 29, 1280–1284 (2009). doi: 10.1002/jmri.v29:6
  • 35. Dosovitskiy A., et al., “Discriminative unsupervised feature learning with convolutional neural networks,” in Adv. Neural Inf. Process. Syst. 27, pp. 1–13 (2014).
  • 36. Glorot X., Bengio Y., “Understanding the difficulty of training deep feedforward neural networks,” J. Mach. Learn. Res. 9, 249–256 (2010).
  • 37. Wu S., Zhong S., Liu Y., “Deep residual learning for image steganalysis,” Multimedia Tools Appl. 77, 10437–10453 (2018). doi: 10.1007/s11042-017-4440-4
  • 38. Ioffe S., Szegedy C., “Batch normalization: accelerating deep network training by reducing internal covariate shift,” in Proc. 32nd Int. Conf. Mach. Learn. (2015).
  • 39. Santurkar S., et al., “How does batch normalization help optimization? (No, it is not about internal covariate shift),” (2018).
  • 40. Kayalibay B., Jensen G., van der Smagt P., “CNN-based segmentation of medical imaging data” (2017).
  • 41. Jones E., et al., “SciPy: open source scientific tools for Python,” http://www.scipy.org (2001).
  • 42. Kingma D. P., Ba J., “Adam: a method for stochastic optimization,” arXiv:1412.6980, pp. 1–15 (2014).
  • 43. Chollet F., et al., “Keras,” (2015).
  • 44. Abadi M., et al., “TensorFlow: a system for large-scale machine learning,” in Proc. 12th USENIX Conf. Operating Syst. Des. and Implementation (2016).

Articles from Journal of Medical Imaging are provided here courtesy of Society of Photo-Optical Instrumentation Engineers
