Abstract
Purpose
To test the efficacy of lesion segmentation using a deep learning algorithm on non–contrast material–enhanced CT (NCCT) images with synthetic lesions resembling acute infarcts.
Materials and Methods
In this retrospective study, 40 diffusion-weighted imaging (DWI) lesions in patients with acute stroke (median age, 70 years; range, 55–76 years; 25 women; screened between 2008 and 2011) were coregistered to 40 normal NCCT scans (median age, 69 years; range, 62–76 years; 17 women; screened between 2011 and 2017), which produced 640 combinations of DWI-NCCT with and without lesions for training (n = 420), validation (n = 110), and testing (n = 110). The signal intensity on the NCCT scans was depressed by 4 HU (a 13% drop) in the region of the diffusion-weighted lesion. Two U-Net architectures (standard and symmetry aware) were trained with two different training strategies. One was a naive strategy, in which the model started training with random coefficients. The other was a progressive strategy, which started with coefficients derived from a model trained on a dataset with lesions that were depressed by 10 HU. The Dice scores from the two architectures and training strategies were compared from the test dataset.
Results
Dice scores of symmetry-aware U-Nets were 25% higher than those of standard U-Nets (median, 0.65 vs 0.49; P < .001). Use of a progressive training strategy had no clear effect on model performance.
Conclusion
Symmetry-aware U-Nets offer promise for segmentation of acute stroke lesions on NCCT scans.
Keywords: Adults, CT-Quantitative, Stroke
Supplemental material is available for this article.
© RSNA, 2021
Summary
Segmenting subtle hypoattenuated lesions on non–contrast material–enhanced CT scans produced 25% higher Dice scores with symmetry-aware deep learning architectures than with conventional U-Nets.
Key Point
■ Symmetry-aware U-Nets performed better than standard U-Nets for segmentation of simulated lesions on non–contrast material–enhanced CT scans (median Dice, 0.65 vs 0.49; P < .001).
Introduction
Non–contrast material–enhanced CT (NCCT) imaging is the most widely available imaging modality for acute stroke. NCCT scans are routinely ordered and reviewed by emergency department physicians and neurologists caring for patients with acute stroke. Accurate identification of acute stroke lesions on NCCT scans is challenging, particularly for physicians without neuroradiology training. Automated identification of acute stroke lesions on NCCT scans would be useful, but lesion segmentation algorithms that operate on NCCT scans are only beginning to emerge (1).
A major challenge with identification of acute infarct lesions on NCCT scans is their low contrast-to-noise ratios. In the early hours after stroke onset, the infarcted area experiences a drop in attenuation that is dependent on time and the level of flow deficit. Studies in primates have shown that attenuation is depressed by 4–5 HU after 4–6 hours with severe blood flow reduction (2). Given the high level of intrinsic image noise, typically with a standard deviation of 5 HU, the contrast-to-noise level is below one, producing difficult-to-discern lesions (2,3).
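The contrast-to-noise argument above can be checked with simple arithmetic. The following is a minimal sketch, assuming contrast-to-noise is defined as the attenuation drop divided by the noise standard deviation; the function name `contrast_to_noise` is illustrative and not taken from the study.

```python
def contrast_to_noise(depression_hu: float, noise_sd_hu: float) -> float:
    """Ratio of lesion contrast (HU drop) to image noise (HU standard deviation)."""
    if noise_sd_hu <= 0:
        raise ValueError("noise_sd_hu must be positive")
    return depression_hu / noise_sd_hu


# A 4-HU depression against 5 HU of noise, the values quoted in the text.
cnr_subtle = contrast_to_noise(4.0, 5.0)

# A 10-HU depression, the exaggerated level used later for progressive training.
cnr_exaggerated = contrast_to_noise(10.0, 5.0)

print(f"4 HU: {cnr_subtle:.1f}, 10 HU: {cnr_exaggerated:.1f}")
```

Under this definition, the 4-HU lesion falls below a ratio of one, whereas the 10-HU lesion sits well above it.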
Convolutional neural networks (CNNs) are currently solving many segmentation tasks with higher performance and with shorter development time than conventional algorithms. Training of a typical CNN generally requires a large number of images paired with ground truth lesion labels. Such a dataset is difficult to obtain for NCCT because manual lesion segmentation can vary a great deal due to poor lesion contrast, even among highly experienced human readers. The lack of an adequate reference standard poses challenges to both the training of CNNs and the evaluation of their performance.
To address this challenge, we aimed to set up a simulated lesion framework that could be used to efficiently compare different architectures and hyperparameters in a controlled setting and demonstrate how it can be used to select the most promising architectures for subsequent validation with in vivo data. We coregistered diffusion-weighted imaging (DWI) lesion outlines in patients with acute stroke to normal NCCT scans. We then introduced various levels of Hounsfield unit depression in the region of the DWI lesion to create realistic lesions on the NCCT scans with a perfectly defined ground truth. We investigated the effectiveness of two modifications to conventional U-Net methods and hypothesized (a) that a “symmetry-aware” CNN architecture, a method that has shown promise in tumor applications (4), has higher lesion detection performance compared with a standard U-Net architecture and (b) that “progressive training” of the CNN, where the CNN is first trained using exaggerated lesions (starting with 10-HU depression) followed by training on typical lesions (4-HU depression), would yield improved performance.
Materials and Methods
Patients
Forty NCCT scans without an acute stroke lesion were obtained from the clinical picture archiving and communication system by reviewing a consecutive cohort of patients who were screened for stroke (from 2011 to 2017) but were confirmed to have a normal DWI scan within 1 day after the NCCT and were ultimately diagnosed with an alternative condition on the basis of all clinical and radiologic data available to the treating physician. Institutional review board approval was obtained, and consent was waived. See Appendix E1 and E2 (supplement) for acquisition protocol details.
From a separate clinical trial dataset of large vessel occlusion anterior circulation stroke, 40 DWI scans with acute stroke lesion outlines of at least 20 mL were selected at random (5). The focus of the clinical trial was on DWI lesion volumes, whereas we use the DWI outlines for their realistic infarct shape characteristics. DW images were acquired between 2008 and 2011 with institutional review board approval and participant consent. The data were de-identified prior to analysis following Health Insurance Portability and Accountability Act guidelines for de-identification.
Data Partitioning and Preprocessing
The scans were split into training (50%), validation (25%), and test (25%) datasets. The training set consisted of 20 NCCT and 20 DWI scans. All images were downsampled to 128 × 181 in-plane resolution. Each DWI lesion scan was coregistered (using affine, followed by nonlinear registration of DWI to NCCT, SimpleElastix v1.2; https://simpleelastix.github.io/) to each normal NCCT (6), which produced 400 distinct combinations of NCCT scans with a superimposed DWI lesion in the training set. The validation and test sets, each consisting of 10 normal NCCT scans in combination with 10 DW images, yielded 100 DWI-NCCT combinations in each set. In addition, we added the normal NCCT scans without abnormality to the training, validation, and test datasets. As such, the number of cases in each partition was 420 (training), 110 (validation), and 110 (test). There was no overlap in patients between the datasets. We did not perform any geometric distortion correction of the DWI data, as the affine and nonlinear registration ensured a good overall fit to the NCCT scans.
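The case counts above follow from pairing every NCCT scan with every DWI lesion within a partition and then adding the lesion-free NCCT scans. The following is a minimal sketch of that bookkeeping; the identifiers (`ncct_00`, `dwi_00`) and the function names are hypothetical, and the split is taken in list order purely for illustration.

```python
from itertools import product


def partition(ids, n_train, n_val):
    """Split a list of scan identifiers into training, validation, and test parts."""
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]


def build_cases(ncct_ids, dwi_ids):
    """Pair every NCCT with every DWI lesion, then add each NCCT without a lesion."""
    lesion_cases = list(product(ncct_ids, dwi_ids))
    normal_cases = [(ncct_id, None) for ncct_id in ncct_ids]
    return lesion_cases + normal_cases


ncct = [f"ncct_{i:02d}" for i in range(40)]
dwi = [f"dwi_{i:02d}" for i in range(40)]

# 50% / 25% / 25% split, applied separately to NCCT scans and DWI lesions
# so that no patient appears in more than one partition.
ncct_split = partition(ncct, 20, 10)
dwi_split = partition(dwi, 20, 10)

sizes = [len(build_cases(n, d)) for n, d in zip(ncct_split, dwi_split)]
print(sizes)
```

The three sizes correspond to 20 × 20 + 20 training cases and 10 × 10 + 10 cases each for validation and testing.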
Synthesis of the NCCT and Ground Truth Lesion
The signal intensity of each normal NCCT scan was then depressed in the region of the paired DWI lesion, producing a simulated instance of an NCCT scan with a hypodense infarct. The depression was given a more realistic edge by smoothing with a Gaussian 3 × 3 kernel (σ = 0.8 mm), causing a gradual transition to normal tissue rather than a sharp edge. For this study, we used two levels of signal depression: A depression of 4 HU was used to simulate clinically relevant depressions, whereas a depression of 10 HU was used to train the U-Net in a progressive fashion, starting with a dataset with more obvious lesions. The 4-HU depression is typical of the acute phase, and although detection of lower levels of hypodensities is of interest, we used this level as a starting point because it is visually discernible, and we expected that an algorithm could detect it (Fig 1).
Figure 1:
Examples of acquired and simulated images from six different patients show, from top to bottom, normal non–contrast material–enhanced CT scan, ground truth diffusion-weighted imaging (DWI) lesion superimposed, scan from first row but depressed 4 HU in the region of the DWI lesion, scan from first row but depressed 10 HU, and example of symmetry-aware U-Net prediction based on the images with 4-HU depression.
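The lesion synthesis step can be sketched as follows. This is an illustration under stated assumptions, not the study's implementation: it uses plain Python lists to stay self-contained, treats σ = 0.8 as pixel units (that is, it assumes 1-mm isotropic pixels), replicates edge pixels at the image border, and all function names are hypothetical.

```python
import math


def gaussian_kernel_3x3(sigma):
    """Normalized 3 x 3 Gaussian kernel (weights sum to 1)."""
    g = [math.exp(-(d * d) / (2.0 * sigma * sigma)) for d in (-1, 0, 1)]
    total = sum(a * b for a in g for b in g)
    return [[a * b / total for b in g] for a in g]


def smooth_mask(mask, sigma=0.8):
    """Blur a binary lesion mask so that its edge becomes a gradual transition."""
    k = gaussian_kernel_3x3(sigma)
    h, w = len(mask), len(mask[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)  # replicate border pixels
                    xx = min(max(x + dx, 0), w - 1)
                    acc += k[dy + 1][dx + 1] * mask[yy][xx]
            out[y][x] = acc
    return out


def add_synthetic_lesion(ncct_hu, lesion_mask, depression_hu=4.0, sigma=0.8):
    """Subtract the depression from the NCCT slice, weighted by the smoothed mask."""
    weights = smooth_mask(lesion_mask, sigma)
    return [
        [hu - depression_hu * w for hu, w in zip(hu_row, w_row)]
        for hu_row, w_row in zip(ncct_hu, weights)
    ]


# Toy slice: uniform 30 HU with a square lesion mask in the middle.
size = 12
ncct = [[30.0] * size for _ in range(size)]
mask = [[1.0 if 3 <= y <= 8 and 3 <= x <= 8 else 0.0 for x in range(size)]
        for y in range(size)]
lesioned = add_synthetic_lesion(ncct, mask, depression_hu=4.0)
```

In this toy example, the lesion interior is lowered by the full 4 HU, pixels far from the mask are unchanged, and pixels at the mask boundary receive an intermediate depression.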
Networks
We used a publicly available two-dimensional U-Net as well as a symmetry-aware extension of this U-Net based on work by Zhang et al (4,7,8). The idea behind the symmetry-aware U-Net is to add an encoding arm that takes the mirrored brain as input alongside the native brain and then concatenate the channels from both arms at the lowest level. In this way, the network can compare ipsilateral and contralateral tissue, a comparison that the standard U-Net architecture does not explicitly provide. The mirrored brain was created by using nonlinear registration to register the brain to itself across the interhemispheric axis. The nonlinear registration ensures that large-scale asymmetry in ventricle size is corrected for. All registrations were visually verified.
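The two-arm idea can be illustrated with a toy example. The sketch below is a simplification under loud assumptions: a plain left-right flip stands in for the nonlinear self-registration described above, and `toy_encode` (2 × 2 mean pooling) is a hypothetical stand-in for a trained encoder arm. It shows only how native and mirrored features end up side by side as channels at the lowest level.

```python
def mirror_left_right(image):
    """Flip a 2D slice (list of rows) across its vertical midline."""
    return [list(reversed(row)) for row in image]


def toy_encode(image):
    """Hypothetical stand-in for an encoder arm: 2 x 2 mean pooling."""
    h, w = len(image), len(image[0])
    return [
        [
            (image[y][x] + image[y][x + 1]
             + image[y + 1][x] + image[y + 1][x + 1]) / 4.0
            for x in range(0, w - 1, 2)
        ]
        for y in range(0, h - 1, 2)
    ]


def symmetry_aware_bottleneck(image):
    """Encode the native and mirrored slices and stack them as two channels."""
    native = toy_encode(image)
    mirrored = toy_encode(mirror_left_right(image))
    return [native, mirrored]  # channel axis first


# Toy slice: uniform 30 HU with one depressed pixel in the left half.
slice_hu = [[30.0] * 4 for _ in range(4)]
slice_hu[1][0] = 26.0
channels = symmetry_aware_bottleneck(slice_hu)

# At the lesion location, the native channel differs from the mirrored channel,
# which carries the contralateral (normal) tissue at the same position.
difference = channels[0][0][0] - channels[1][0][0]
```

With both channels available at the same spatial position, later layers can learn a side-to-side comparison directly rather than having to infer the midline.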
Training and Augmentation
The learning rate was set to 10e–3. We used a stochastic gradient descent optimizer with decay 10e–6 and momentum 0.3 and trained for 100 epochs. The loss function was composed of Dice similarity coefficient (DSC) combined with binary cross entropy with equal weights. This choice was motivated by work of other groups (9), and empirically the networks did not train well with DSC alone. Please see Appendix E3 (supplement) for details on the loss function and augmentation.
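A combined loss of this kind can be sketched as follows. The exact formulation is given in Appendix E3, which is not reproduced here, so this is only one plausible reading: a soft Dice loss plus mean binary cross entropy, summed without additional weighting. The function names and the smoothing constant `eps` are assumptions.

```python
import math


def soft_dice_loss(pred, target, eps=1e-7):
    """1 minus the soft Dice coefficient between predicted probabilities and labels."""
    intersection = sum(p * t for p, t in zip(pred, target))
    denominator = sum(pred) + sum(target)
    return 1.0 - (2.0 * intersection + eps) / (denominator + eps)


def binary_cross_entropy(pred, target, eps=1e-7):
    """Mean binary cross entropy, with probabilities clipped away from 0 and 1."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(pred)


def combined_loss(pred, target):
    """Equal-weight sum of the Dice and cross-entropy terms (assumed weighting)."""
    return soft_dice_loss(pred, target) + binary_cross_entropy(pred, target)


# Flattened toy example: four pixels, two of which belong to the lesion.
labels = [1.0, 0.0, 1.0, 0.0]
perfect = combined_loss([1.0, 0.0, 1.0, 0.0], labels)
uncertain = combined_loss([0.5, 0.5, 0.5, 0.5], labels)
```

The cross-entropy term contributes a per-pixel gradient even when the predicted lesion does not overlap the ground truth at all, which is one common explanation for why a combined loss trains more reliably than DSC alone.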
Experiments
We performed two distinct experiments. In the first, we compared the performance of a conventional U-Net with the symmetry-aware U-Net in the dataset with a lesion depression of 4 HU. The second experiment was a repeat of the first, but training was initialized using a model that was produced by first training with a depression of 10 HU. For brevity, we use the terms progressively trained network and naive network to indicate whether the network has first been trained at the 10-HU level or is trained de novo.
Statistical Analysis
For all experiments, the best model from the validation epochs was defined as the model with the highest median DSC with a median false-positive volume of less than 10 mL in no-lesion cases. This model was selected and evaluated in the withheld test dataset in terms of DSC and volumes of predicted lesions in cases with no abnormalities. The effect of mirroring and progressive training on DSC and false-positive volumes was assessed using generalized estimating equations, which allow for association testing of each of the effects in our repeated measurements design with nonnormally distributed outcomes. The analysis was performed using IBM SPSS Statistics for Windows, version 26.0. A model with binomial distribution and log as link function was used. A P value less than .05 was considered significant, and all tests were two-sided. Data are shown with 95% CIs. The study was exploratory, and no power calculation was performed.
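The model selection rule described above can be sketched as follows. The data layout (one dictionary per validation epoch) and the function name are hypothetical; only the rule itself, highest median DSC among epochs whose median false-positive volume in no-lesion cases is below 10 mL, comes from the text.

```python
from statistics import median


def select_best_epoch(epochs, max_fp_ml=10.0):
    """Return the epoch with the highest median DSC among eligible epochs.

    Each entry holds per-case DSC values for lesion cases ("dsc") and per-case
    false-positive volumes in mL for no-lesion cases ("fp_ml"). An epoch is
    eligible only if its median false-positive volume is below max_fp_ml.
    Returns None when no epoch is eligible.
    """
    eligible = [e for e in epochs if median(e["fp_ml"]) < max_fp_ml]
    if not eligible:
        return None
    return max(eligible, key=lambda e: median(e["dsc"]))


# Made-up validation summaries, for illustration only.
validation_epochs = [
    {"epoch": 1, "dsc": [0.40, 0.50, 0.60], "fp_ml": [3.0, 4.0, 5.0]},
    {"epoch": 2, "dsc": [0.60, 0.70, 0.80], "fp_ml": [11.0, 12.0, 13.0]},
    {"epoch": 3, "dsc": [0.50, 0.60, 0.70], "fp_ml": [7.0, 8.0, 9.0]},
]
best = select_best_epoch(validation_epochs)
```

In this example, epoch 2 has the highest median DSC but is excluded by the false-positive constraint, so epoch 3 is selected.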
Results
Patient Characteristics
The 40 NCCT scans were obtained from patients who had a median age of 69 years (range, 62–76 years), and 17 were women. The DWI lesions were segmented on MRI scans from patients who had a median age of 70 years (interquartile range [IQR], 55–79 years), and 25 were women. In the training data, the mean DWI lesion volume was 109 mL (IQR, 55–214 mL) with 19 middle cerebral artery distribution lesions and one combined middle-anterior cerebral artery distribution lesion. In the test data, the mean DWI lesion volume was 106 mL (IQR, 73–143 mL) with nine middle cerebral artery distribution lesions and one combined middle-anterior cerebral artery distribution lesion.
Symmetry-aware Compared with the Standard U-Net
When training with depressions of 4 HU, the symmetry-aware U-Net showed a steeper slope and a higher plateau of its validation performance curve compared with the standard U-Net (Fig 2A). The symmetry-aware U-Net had a 25% (95% CI: 19, 31) higher DSC than the standard U-Net (P < .001 in the generalized estimating equations model). The median DSC was 0.65 (IQR, 0.51–0.74) for the symmetry-aware U-Net compared with 0.49 (IQR, 0.27–0.63) for the standard U-Net. The symmetry-aware U-Net produced false-positive volumes that were 51% (95% CI: 38, 70; P < .001) smaller than those for the standard U-Net. The median false-positive volume in normal cases using a symmetry-aware U-Net with naive training was 4 mL (IQR, 3–11 mL).
Figure 2:
(A, B) Validation performance for the symmetry-aware and standard U-Net in (A) 4-HU data and (B) 10-HU data. (C, D) Validation performance as a function of training epoch with naive initialization and initialization based on training with 10-HU data (C) in a symmetry-aware U-Net and (D) in a standard U-Net. DSC = Dice similarity coefficient.
Progressive Training
To assess the value of progressive training, optimal models trained in the 10-HU dataset were selected to initialize the training at 4 HU. Figure 2B shows the training DSC plots for the 10-HU training set. A comparison of the performance of the progressively trained model against the naively trained model is shown in Figure 2C (symmetry-aware U-Net) and 2D (standard U-Net). In the independent test data, progressive training did not show a significant effect on DSC (0.1%; 95% CI: −2, 2; P = .87). Progressive training resulted in false-positive volumes that were 19% (95% CI: 9, 29; P < .001) smaller than without progressive training. Figure 3 shows DSC for all experiments.
Figure 3:

Dice similarity coefficient (DSC) distribution in test data for each experiment. iqr = interquartile range.
Discussion
Accurate identification of the early ischemic core on NCCT scans is a difficult task, which could be aided by automated software solutions. In this study, we evaluated the ability of deep learning networks to automatically segment simulated infarcts on NCCT scans. Our highest-performing model achieved a DSC of 0.65 (IQR, 0.51–0.74). Given that this level of performance was obtained with a perfect ground truth for both training and validation, these performance metrics may be expected on clinical data with similar levels of depressions. In the early hours after stroke onset, the signal depression is often less than 4 HU, and one would expect lower performance in those cases (10).
The use of a symmetry-aware network improved the DSC (median, 0.65 vs 0.49; P < .001). Zhang et al examined network performance as a function of the number of feature maps used in their network and found that a symmetry-aware network invariably converged faster, but once the number of features reached 32 filters in the first U-Net level, they found no added benefit (4). In contrast, our results, also using 32 filters, show a clear improvement with the symmetry-aware network for subtle lesions with 4-HU depression but not for more obvious 10-HU lesions as shown in Figures 2 and 3. We speculate that availability of symmetry information is particularly beneficial in the case of subtle lesions, as is the case with typical acute infarcts on NCCT scans, and less beneficial for lesions that are easily distinguished from normal tissue. For subtle lesions, the contralateral region provides a reasonably matched control region in terms of interfering contrasts, such as patient-specific gray and white matter distribution and beam-hardening effects.
As illustrated in Figure 1, a depression of 4 HU is difficult to discern visually due to high image noise and large inherent gray and white matter contrast. Although the multiscale nature of the U-Net allows flexibility in terms of feature extraction at various scales, our results show that feeding in the mirrored image explicitly leads to higher performance than what the standard U-Net achieves. We speculate that the standard U-Net is either not able to learn the relatively complex operation of flipping across the patient-specific midline or that much more training data are needed.
Progressive training showed no significant effect on DSC. The rationale for progressive training is to drive the model more robustly toward the minimum of the cost function with a higher contrast-to-noise dataset. It is likely that more sophisticated schemes could work better, for example, by training using a mix of 4-HU and 10-HU cases or by adding in depressions in the intermediary range (4–10 HU) in some form of controlled learning schedule.
A major hurdle is the lack of reliable human lesion segmentation of NCCT scans to use for training and validation. To the best of our knowledge, and likely reflecting the difficulty of the task, no studies exist that have analyzed manual interrater DSC in acute stroke cases with subtle hypodensities. A single report on patients with mature NCCT lesions (day 3) reported interrater DSC of 0.79 (11). To avoid using manual segmentation, some investigators have used DWI performed shortly after NCCT as reference standard (12). Although the DWI lesion is the reference standard for the acute infarct, it is not necessarily the standard for the NCCT hypoattenuation, because the modalities are sensitive to different biologic processes of infarction (cytotoxic vs ionic edema) (13). Follow-up infarction in patients with early reperfusion is also a possible choice of reference standard for infarction, but even in the face of irreversible cellular damage, NCCT hypodensities may take hours to develop, so follow-up infarction is likely to overcall the area affected in an acute NCCT scan (10).
Our study had limitations. First, the lesions that we simulated were relatively large (> 20 mL). Segmentation of smaller lesions will likely be more challenging. Second, while we used augmentation of our data to generate 420 training and 220 test and validation cases, an even larger dataset might have resulted in higher performance of our models. Third, we used simulated lesions, which appear realistic but do not perfectly resemble real lesions. For example, our simulated dataset focuses only on the hypoattenuation aspect of acute stroke, which is the principal feature seen on NCCT scans. A more sophisticated approach would also simulate sulcal effacement caused by local displacement of cerebrospinal fluid. Similarly, a more advanced simulation would allow for introduction of motion, as this is frequent in real-life situations. Also, there is heterogeneity in the Hounsfield unit depression of acute infarcts both between patients and within the lesion of a single patient, whereas our simulated lesions were homogeneously depressed by either 4 or 10 HU except for the lesion’s edge. While our simulations are ideally suited to answer our primary research question, namely whether a symmetry-aware network outperforms a conventional network, additional studies are needed to determine the performance of our models on real patient data. A challenge of any clinical validation study will be the determination of a reference standard for lesion segmentation on NCCT scans, an issue that we addressed by using simulated lesions in this study. We therefore believe that a training strategy with simulated data is the most efficient path to developing an optimal segmentation algorithm. Such a strategy has already been shown to be of great value for the detection of DWI-positive stroke lesions (14).
In conclusion, we present a simulation-based test bed for optimization of network architecture and hyperparameter tuning and show that symmetry-aware U-Nets offer promise for segmentation of acute stroke lesions on NCCT scans. Future studies are needed to validate the diagnostic accuracy of such symmetry-aware models on clinical NCCT scans obtained in patients with acute stroke.
Supported in part by National Institutes of Health/National Institute of Neurological Disorders and Stroke grant R01NS075209.
Disclosures of Conflicts of Interest: S.C. Activities related to the present article: author received grant from National Institutes of Health (NIH). Activities not related to the present article: disclosed no relevant relationships. Other relationships: disclosed no relevant relationships. M.M. disclosed no relevant relationships. J.M. disclosed no relevant relationships. C.F. disclosed no relevant relationships. G.W.A. Activities related to the present article: disclosed no relevant relationships. Activities not related to the present article: author is consultant for iSchemaView; author has equity interest in iSchemaView. Other relationships: disclosed no relevant relationships. M.G.L. Activities related to the present article: institution received grant from NIH/ National Institute of Neurological Disorders and Stroke (R01NS075209). Activities not related to the present article: disclosed no relevant relationships. Other relationships: disclosed no relevant relationships.
Abbreviations:
- CNN = convolutional neural network
- DSC = Dice similarity coefficient
- DWI = diffusion-weighted imaging
- IQR = interquartile range
- NCCT = non–contrast material–enhanced CT
References
- 1. Mikhail P, Le MGD, Mair G. Computational Image Analysis of Nonenhanced Computed Tomography for Acute Ischaemic Stroke: A Systematic Review. J Stroke Cerebrovasc Dis 2020;29(5):104715.
- 2. von Kummer R, Weber J. Brain and vascular imaging in acute ischemic stroke: the potential of computed tomography. Neurology 1997;49(5 Suppl 4):S52–S55.
- 3. Dzialowski I, Weber J, Doerfler A, Forsting M, von Kummer R. Brain tissue water uptake after middle cerebral artery occlusion assessed with CT. J Neuroimaging 2004;14(1):42–48.
- 4. Zhang H, Zhu X, Willke TL. Segmenting brain tumors with symmetry. arXiv:1711.06636 [cs.CV] [preprint] http://arxiv.org/abs/1711.06636. Posted November 17, 2017. Accessed April 16, 2020.
- 5. Lansberg MG, Straka M, Kemp S, et al. MRI profile and response to endovascular reperfusion after stroke (DEFUSE 2): a prospective cohort study. Lancet Neurol 2012;11(10):860–867.
- 6. Marstal K, Berendsen F, Staring M, Klein S. SimpleElastix: A user-friendly, multi-lingual library for medical image registration. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Las Vegas, NV, June 26–July 1, 2016. Piscataway, NJ: IEEE, 2016;574–582.
- 7. Ronneberger O, Fischer P, Brox T. U-Net: Convolutional networks for biomedical image segmentation. arXiv:1505.04597 [cs.CV] [preprint] http://arxiv.org/abs/1505.04597. Posted May 18, 2015. Accessed April 15, 2020.
- 8. Antica L. retina-unet. Github. https://github.com/orobix/retina-unet. Published 2020. Accessed December 1, 2020.
- 9. Isensee F, Petersen J, Klein A, et al. nnU-Net: Self-adapting framework for U-Net-based medical image segmentation. arXiv:1809.10486 [cs.CV] [preprint] http://arxiv.org/abs/1809.10486. Posted September 27, 2018. Accessed October 1, 2020.
- 10. Nemoto EM, Mendez O, Kerr ME, et al. CT density changes with rapid onset acute, severe, focal cerebral ischemia in monkeys. Transl Stroke Res 2012;3(3):369–374.
- 11. Vos PC, Biesbroek JM, Weaver NA, Velthuis BK, Viergever MA. Automatic detection and segmentation of ischemic lesions in computed tomography images of stroke patients. In: Novak CL, Aylward S, eds. Proceedings of SPIE: medical imaging 2013—computer-aided diagnosis. Vol 8670. Bellingham, Wash: International Society for Optics and Photonics, 2013;867013.
- 12. Qiu W, Kuang H, Teleg E, et al. Machine learning for detecting early infarction in acute stroke with non-contrast-enhanced CT. Radiology 2020;294(3):638–644.
- 13. von Kummer R, Dzialowski I. Imaging of cerebral ischemic edema and neuronal death. Neuroradiology 2017;59(6):545–553.
- 14. Federau C, Christensen S, Scherrer N, et al. Improved segmentation and detection sensitivity of diffusion-weighted stroke lesions with synthetically enhanced deep learning. Radiol Artif Intell 2020;2(5):e190217.


