Abstract
About 50%–80% of very preterm infants (VPIs) (≤ 32 weeks gestational age) exhibit diffuse white matter abnormality (DWMA) on their MR images at term-equivalent age. It remains unknown if DWMA is associated with developmental impairments, and further study is warranted. To aid in the assessment of DWMA, a deep learning model for DWMA quantification on T2-weighted MR images was developed. This secondary analysis of prospective data was performed with an internal cohort of 98 VPIs (data collected from December 2014 to April 2016) and an external cohort of 28 VPIs (data collected from January 2012 to August 2014) who had already undergone MRI at term-equivalent age. Ground truth DWMA regions were manually annotated by two human experts with the guidance of a prior published semiautomated algorithm. In a twofold cross-validation experiment using the internal cohort of 98 infants, the three-dimensional (3D) ResU-Net model accurately segmented DWMA with a Dice similarity coefficient of 0.907 ± 0.041 (standard deviation) and balanced accuracy of 96.0% ± 2.1, outperforming multiple peer deep learning models. The 3D ResU-Net model that was trained with the whole internal cohort (n = 98) was further tested on an independent external test cohort (n = 28) and achieved a Dice similarity coefficient of 0.877 ± 0.059 and balanced accuracy of 92.3% ± 3.9. The externally validated 3D ResU-Net deep learning model for accurately segmenting DWMA may facilitate the clinical diagnosis of DWMA in VPIs.
Supplemental material is available for this article.
Keywords: Brain/Brain Stem, Convolutional Neural Network (CNN), MR-Imaging, Pediatrics, Segmentation, Supervised learning
© RSNA, 2021
Summary
A deep learning model was developed to segment diffuse white matter abnormality on T2-weighted brain MR images of very preterm infants.
Key Results
■ The proposed three-dimensional (3D) ResU-Net model achieved a Dice similarity coefficient (DSC) of 0.907 for diffuse white matter abnormality (DWMA) segmentation in an internal twofold cross-validation on MR images in 98 very preterm infants (VPIs).
■ The 3D ResU-Net model achieved a DSC of 0.877 for DWMA segmentation in an external evaluation on an independent test dataset consisting of 28 VPIs.
■ The 3D ResU-Net model achieved better DWMA segmentation performance than multiple peer deep learning models.
Introduction
Diffuse white matter abnormality (DWMA) is characterized by diffusely higher signal intensity in white matter at T2-weighted MRI or lower signal intensity in white matter at T1-weighted MRI. It is commonly observed in very preterm infants (VPIs) (≤ 32 weeks gestational age) and has been reported in many previous studies (1–6). About 50%–80% of VPIs exhibit DWMA when undergoing MRI at term-equivalent age (1,7,8). Despite the high prevalence of DWMA, its pathologic and clinical importance continues to be debated. Several neurodevelopmental follow-up studies have revealed that DWMA is predictive of lower developmental scores (3,8–10). Nevertheless, other studies have not observed substantial links between DWMA and neurodevelopmental impairments (4,5,11–13). The limited progress in resolving the controversy over DWMA is partially attributed to the lack of reproducible quantification methods for evaluating DWMA. Subjective and qualitative assessments of DWMA are still the most commonly used strategies (4,6,14); these may produce measurement errors, resulting in poor reliability and reproducibility. Therefore, accurate, objective, and quantitative assessment of DWMA is essential to settle the debate over whether DWMA is a transient benign phenomenon or a pathologic abnormality (6,15,16).
Automated quantification of DWMA in infants remains challenging. Early efforts to develop automatic DWMA quantification algorithms have been limited to using individual white matter voxels of T2-weighted MR images (8,17). Recently, we developed a deep convolutional neural network (CNN) for DWMA quantification in VPIs that considers small image patches in a sliding-window setup to classify each voxel as either DWMA or normal white matter (18). However, the sliding-window setup generated many overlapping patches, causing computational redundancy. Furthermore, the CNN model used only two-dimensional (2D) axial brain images from a three-dimensional (3D) T2-weighted brain MRI, thus achieving suboptimal performance.
In this study, we developed a deep learning model, 3D ResU-Net (19), to automatically segment DWMA by using the 3D spatial information of T2-weighted brain images. The proposed model was evaluated using both internal and external independent cohorts and was compared with multiple alternative deep learning models. Such a validated model could help identify, in a timelier manner, neonates who are more likely to benefit from early intervention therapies.
Materials and Methods
Study Overview and Infant Enrollment
In this study, we first conducted DWMA annotation for deep learning model development (Fig 1, A). Next, a 3D ResU-Net model was developed for DWMA segmentation. Three-dimensional image patches of cerebral tissues (white matter and gray matter) and DWMA ground truth maps were used to train the proposed model in a supervised manner (Fig 1, B).
Figure 1:
Study overview. A, Ground truth diffuse white matter abnormality (DWMA) maps annotation. The T2-weighted brain MR images were annotated for DWMA segmentation by human raters. B, DWMA segmentation using three-dimensional (3D) ResU-Net. Image patches of cerebral tissues (white matter and gray matter) and ground truth DWMA maps were used to train the model in a supervised manner.
The institutional review board of Nationwide Children’s Hospital approved this Health Insurance Portability and Accountability Act–compliant study, and written informed parental consent was obtained for every infant. In this secondary analysis of prospective data, infants who were born at 32 or fewer weeks gestation were eligible for inclusion (from December 2014 to April 2016 for the internal cohort and January 2012 to August 2014 for the external cohort). Infants with known congenital structural central nervous system anomalies, chromosomal anomalies, congenital cyanotic cardiac defects, severe brain injury (which would result in inaccurate brain segmentation), or those with incomplete MRI data were excluded.
These cohorts were reported in a previous study to evaluate the DWMA segmentation of a CNN model. The same cohorts were used in this study to evaluate a 3D ResU-Net on DWMA segmentation.
Image Acquisition
All infants were scanned at term-equivalent age during natural sleep without the use of any sedation. For the internal cohort, anatomic axial 2D fast spin-echo T2-weighted MR images were acquired with a 32-channel head coil using the following sequence parameters: repetition time, 9500 msec; echo time, 147 msec; flip angle, 150°; resolution, 0.9 × 0.9 × 1.1 mm³; scan time, 4.15 minutes. All infants from the internal cohort were imaged with a 3-T Siemens Magnetom Skyra scanner (Siemens Healthineers).
For the external cohort, anatomic axial 2D fast spin-echo proton density and T2-weighted MR images were obtained with an eight-channel head coil using the following sequence parameters: repetition time, 11 000 msec; echo time 1, 14 msec; echo time 2, 185 msec; flip angle, 90°; resolution, 0.35 × 0.35 × 2 mm³; scan time, 5.2 minutes. All infants from the external cohort were imaged with a 3-T GE HDX scanner (GE Healthcare).
Ground Truth DWMA Maps
Three steps were applied to generate ground truth DWMA maps. First, we implemented a previously published atlas-based semiautomated algorithm (17) to generate preliminary DWMA maps. White matter voxels were designated as DWMA if their signal intensity was greater than or equal to α = 1.4 standard deviations above the mean of cerebral tissues (white and gray matter), according to the definition of DWMA in multiple prior studies (8,17,18,20).
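The intensity rule in the first step can be illustrated with a minimal sketch. It covers only the thresholding, not the atlas-based tissue segmentation of the published algorithm (17). The function name `preliminary_dwma_mask`, the use of the population standard deviation, and the toy intensities are assumptions made for illustration and are not taken from the study.

```python
from statistics import mean, pstdev

ALPHA = 1.4  # threshold in standard deviations above the mean, as stated in the text


def preliminary_dwma_mask(wm_intensities, cerebral_intensities, alpha=ALPHA):
    """Flag white matter voxels whose T2 signal is >= mean + alpha * SD.

    The mean and SD are computed over all cerebral tissue (white and gray matter).
    """
    mu = mean(cerebral_intensities)
    sigma = pstdev(cerebral_intensities)  # assumption: population SD
    cutoff = mu + alpha * sigma
    return [value >= cutoff for value in wm_intensities]


# Toy example with made-up intensities (not study data).
gray = [90.0, 95.0, 100.0, 105.0, 110.0]
white = [100.0, 104.0, 108.0, 112.0, 160.0]
mask = preliminary_dwma_mask(white, gray + white)
```

In this toy example only the brightest white matter voxel exceeds the cutoff, so it is the only one flagged.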
Second, under the supervision of an established neonatologist (N.A.P.) with 15 years of experience in neonatal advanced MRI, two neuroimaging experts (H.L. and M.C.) with 10 and 4 years of MR image processing experience, respectively, manually corrected the preliminary DWMA maps to remove any false-positive detections, which typically occur at the gray-white matter interface. Manual correction was not used to adjust the DWMA boundaries defined by the semiautomated algorithm, because human raters cannot reliably distinguish normal from abnormal (especially slightly abnormal) signal intensity by visual inspection.
Third, our two raters discussed with the neonatologist and reached a consensus. The final unified ground truth DWMA maps were then used to train our model in this work.
Development and Evaluation of 3D ResU-Net for DWMA Segmentation
We designed a 3D ResU-Net model with seven convolutional blocks and seven residual blocks. A detailed description of the model architecture is provided in Figure E1 (supplement). We randomly extracted 1000 small 3D image patches (ie, cubes) with a size of 16 × 16 × 16 from the cerebral tissues of each infant's 3D MR images. This not only increased the training sample size but also reduced the dimensions of the training data, and it resulted in a total of 98 000 small image patches for model training and evaluation. We adopted the Dice similarity coefficient (DSC) as a loss function. A mini-batch Adam algorithm (21) was chosen to minimize the loss function. The learning rate was set to 0.05. The number of training epochs was set to 100 with an early stopping mechanism, which would cease the optimization process if 20 consecutive epochs returned the same loss value. Additional details of the model training are provided in Appendix E1 (supplement).
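The patch sampling and the Dice-based loss described above can be sketched as follows. This is an illustration under stated assumptions, not the released implementation: the helper names `sample_patch_origins` and `soft_dice_loss` are hypothetical, centering each cube on a tissue voxel and clamping it to the volume is one plausible sampling scheme, and writing the loss as 1 − DSC with a small smoothing term is a common convention that the text does not spell out.

```python
import random

PATCH_SIZE = 16            # cube edge length, from the text
PATCHES_PER_INFANT = 1000  # from the text


def sample_patch_origins(tissue_voxels, volume_shape, n=PATCHES_PER_INFANT,
                         size=PATCH_SIZE, seed=0):
    """Pick n random cube origins, each centred on a cerebral tissue voxel.

    tissue_voxels: list of (x, y, z) tuples inside the tissue mask.
    Origins are clamped so every cube stays inside the volume (assumption).
    """
    rng = random.Random(seed)
    half = size // 2
    origins = []
    for _ in range(n):
        centre = rng.choice(tissue_voxels)
        origin = tuple(
            min(max(c - half, 0), dim - size)
            for c, dim in zip(centre, volume_shape)
        )
        origins.append(origin)
    return origins


def soft_dice_loss(predicted, target, eps=1e-6):
    """Return 1 - Dice for flattened predicted probabilities and binary labels."""
    intersection = sum(p * t for p, t in zip(predicted, target))
    total = sum(predicted) + sum(target)
    return 1.0 - (2.0 * intersection + eps) / (total + eps)
```

With 1000 origins per infant and 98 infants, this sampling yields the 98 000 patches mentioned above.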
We applied a twofold cross-validation strategy to test the model using the internal cohort (Appendix E1 [supplement]). Then, the 3D ResU-Net model that was trained with the whole internal cohort was further tested on the external test cohort. We applied DSC, balanced accuracy, and lesion-wise sensitivity and precision for the evaluation of infant-wise DWMA segmentation. The lesion-wise sensitivity was calculated as the ratio of the number of correctly detected DWMA regions to the total number of true DWMA regions, while the lesion-wise precision was calculated as the ratio of the number of correctly detected DWMA regions to the total number of detected DWMA regions. We compared the segmentation performance of using the proposed 3D ResU-Net with that of using 2D U-Net (22), deep CNN (18), and deep neural network (DNN) (18). The details of these peer models are also provided in Appendix E1 (supplement).
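A minimal sketch of these evaluation metrics is given below, representing each segmentation as a set of (x, y, z) voxel coordinates. Several choices are assumptions rather than details from the text: regions are defined by 6-connectivity, a region counts as correctly detected if it overlaps the other map by at least one voxel, and both maps are assumed to be nonempty. All function names are hypothetical.

```python
from collections import deque


def dice(pred, truth):
    """Dice similarity coefficient between two sets of DWMA voxel coordinates."""
    if not pred and not truth:
        return 1.0
    return 2.0 * len(pred & truth) / (len(pred) + len(truth))


def balanced_accuracy(pred, truth, all_voxels):
    """Mean of voxel-wise sensitivity and specificity."""
    negatives = all_voxels - truth
    sensitivity = len(pred & truth) / len(truth)
    specificity = len(negatives - pred) / len(negatives)
    return 0.5 * (sensitivity + specificity)


def connected_regions(voxels):
    """Split a voxel set into 6-connected regions using breadth-first search."""
    remaining, regions = set(voxels), []
    steps = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    while remaining:
        seed = remaining.pop()
        region, queue = {seed}, deque([seed])
        while queue:
            x, y, z = queue.popleft()
            for dx, dy, dz in steps:
                neighbour = (x + dx, y + dy, z + dz)
                if neighbour in remaining:
                    remaining.remove(neighbour)
                    region.add(neighbour)
                    queue.append(neighbour)
        regions.append(region)
    return regions


def lesion_wise_scores(pred, truth):
    """Return (sensitivity, precision) counted per region; overlap = detected."""
    true_regions = connected_regions(truth)
    pred_regions = connected_regions(pred)
    detected_true = sum(1 for region in true_regions if region & pred)
    correct_pred = sum(1 for region in pred_regions if region & truth)
    return detected_true / len(true_regions), correct_pred / len(pred_regions)
```

The overlap criterion is deliberately permissive; a stricter rule (for example, a minimum overlap fraction) would lower both lesion-wise scores.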
Statistical Analysis
The two-sided Student t test was used to assess group differences. A P value less than .05 was considered statistically significant. Analyses were performed with the statistical package of Matlab 2019b (MathWorks).
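As an illustration of this test, the sketch below computes a pooled-variance two-sample Student t statistic from summary statistics. It rests on assumptions: the text does not state whether the comparisons were paired or unpaired, the P value here uses a normal approximation rather than the exact t distribution, and the function names are hypothetical. The input values are the internal-cohort DSC means and standard deviations reported for the 3D ResU-Net and 2D U-Net (n = 98 each).

```python
import math


def student_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Pooled-variance two-sample Student t statistic and degrees of freedom."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / standard_error, df


def two_sided_p_normal_approx(t_stat):
    """Two-sided P value from a normal approximation (reasonable for large df)."""
    return math.erfc(abs(t_stat) / math.sqrt(2.0))


# Reported internal-cohort DSC: 3D ResU-Net 0.907 +/- 0.041, 2D U-Net 0.886 +/- 0.052.
t_stat, df = student_t_from_summary(0.907, 0.041, 98, 0.886, 0.052, 98)
p_value = two_sided_p_normal_approx(t_stat)
```

Under these assumptions the statistic is roughly 3.14 with 194 degrees of freedom, which falls below the .05 significance threshold.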
Data and Code Availability
Requests to access the datasets used in this study should be directed to the corresponding author. The source code of the proposed 3D ResU-Net is publicly accessible on GitHub (https://github.com/chen2mg/aicad/tree/main/DWMA_segmentation_3DResUNet).
Results
Infant Characteristics and Ground Truth Annotation
Of the 110 enrolled VPIs in the internal cohort, 11 infants were excluded due to severe cerebellar hemorrhage and ventriculomegaly, and one infant was excluded because of excessive motion artifacts. Two raters separately conducted DWMA annotation on the images from the 98 VPIs (53 male, 45 female) from the internal dataset. Rater 1 had a DSC of 0.735 with the preliminary DWMA map, while rater 2 had a DSC of 0.768 with the preliminary DWMA map. Interrater DSC was 0.943.
Of the 40 enrolled VPIs in the external cohort, seven infants were excluded because of severe cerebellar hemorrhage and ventriculomegaly, and five infants were excluded due to incomplete T2-weighted data. This resulted in 28 infants (14 male, 14 female) within the external cohort. Rater 1 had a DSC of 0.828 with the preliminary DWMA map, and rater 2 had a DSC of 0.796 with the preliminary DWMA map. Interrater DSC was 0.957. Demographic information for the internal and external cohorts is listed in Table 1.
Table 1:
Baseline Demographic Information for Internal and External Datasets
Performance of 3D ResU-Net on DWMA Segmentation
The DWMA segmentation performances of multiple deep learning models are displayed in Table 2. In the internal cross-validation, the DSC of the 3D ResU-Net (0.907 ± 0.041) was higher than that of 2D U-Net (0.886 ± 0.052; P = .002), CNN (0.862 ± 0.056; P < .001), and DNN (0.824 ± 0.063; P < .001). Meanwhile, the 3D ResU-Net exhibited a higher balanced accuracy (96.0% ± 2.1) than 2D U-Net (95.2% ± 2.0; P = .007), CNN (94.3% ± 2.6; P < .001), and DNN (91.3% ± 2.4; P < .001). The developed 3D ResU-Net had a higher lesion-wise sensitivity (94.6% ± 10.3) and precision (91.2% ± 7.5) than both CNN (87.3% ± 15.3; P = .009, 85.3% ± 8.1; P = .004) and DNN (85.4% ± 16.7; P < .001, 83.3% ± 7.5; P < .001) models, but did not significantly outperform 2D U-Net (91.4% ± 13.7; P = .07, 88.6% ± 7.9; P = .05). Example DWMA segmentation maps obtained by the proposed model from 10 different infants are displayed in Figure 2 and Figure E3 (supplement).
Table 2:
Performance Comparison of DWMA Segmentation Using Different Deep Learning Models
Figure 2:
Visualization of diffuse white matter abnormality (DWMA) segmentation (highlighted in yellow) in five very preterm infants. Left column: T2-weighted images; middle column: corresponding images with ground truth DWMA maps; right column: corresponding images with DWMA maps segmented using three-dimensional (3D) ResU-Net.
In the external test dataset, the 3D ResU-Net achieved higher DSC (0.877 ± 0.059) and balanced accuracy (92.3% ± 3.9) than 2D U-Net (0.859 ± 0.051, P = .04; 90.2% ± 3.6, P < .001), CNN (0.846 ± 0.058, P = .004; 88.6% ± 4.7, P < .001), and DNN (0.801 ± 0.065, P < .001; 85.7% ± 5.1, P < .001) models. The lesion-wise sensitivity (98.5% ± 4.9) of the 3D ResU-Net was higher than that of 2D U-Net (94.2% ± 6.2; P < .001), CNN (90.8% ± 8.5; P < .001), and DNN (86.9% ± 8.6; P < .001). Similarly, the lesion-wise precision (94.2% ± 5.9) of the 3D ResU-Net was higher than that of 2D U-Net (91.1% ± 6.2; P = .012), CNN (87.4% ± 6.4; P = .003), and DNN (84.8% ± 6.7; P < .001).
Discussion
This study demonstrated a 3D ResU-Net model that is able to accurately segment DWMA regions on T2-weighted brain MR images so as to automatically quantify DWMA. The model exhibited higher DWMA segmentation performance than other peer deep learning models (18).
Our study makes several contributions to DWMA segmentation in VPIs. First, extensive data annotation was conducted on our datasets by two human raters. Additional manual correction, guided by a neonatologist with long-standing expertise in DWMA research, was further performed on the datasets. Then, unified ground truth DWMA maps were formulated after discussion. Second, the developed 3D ResU-Net demonstrated consistently and significantly better segmentation performance than several other advanced deep learning models (Table 2). Compared with single-voxel-assignment strategies (ie, CNN and DNN) using the sliding-window setup, the U-shape models are able to mitigate computational redundancy by assigning labels to the voxels of the whole patch simultaneously. The use of 3D convolutional filters enabled volumetric segmentation of DWMA by taking advantage of the images’ 3D information, achieving improved performance compared with 2D U-Net. In addition, the residual blocks are designed to mitigate the vanishing gradient and curse of dimensionality problems. These components likely contributed to the high performance of the 3D ResU-Net model. Third, both internal and external evaluations were conducted to test the performance of the proposed 3D ResU-Net model on DWMA segmentation. These evaluation experiments illustrated the model’s generalizability and potential as a clinical computer-aided diagnosis tool.
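The effect of a residual connection on gradient flow can be made concrete with a generic residual block. The notation below is standard and is introduced here for illustration only (x is the block input, F the learned transformation, and L the training loss); it does not describe the exact architecture used in this study.

```latex
y = x + F(x), \qquad
\frac{\partial \mathcal{L}}{\partial x}
  = \frac{\partial \mathcal{L}}{\partial y}
    \left( I + \frac{\partial F}{\partial x} \right)
```

Because of the identity term I, part of the gradient reaches x unchanged even when the derivative of F is small, which is the usual argument for why stacked residual blocks are less prone to vanishing gradients.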
Prior research has devised a set of key quality criteria that influence segmentation, including contrast, intensity deviation, image noise, artifacts, shape distortion, and the presence of pathologic features (23). For images from the same cohort (ie, images with similar quality), the segmentation algorithm therefore tends to favor larger lesion volumes. In this study, we indeed observed a positive trend between DSC and DWMA volume, indicating that cases with larger lesion volumes tend to have a better DSC. More specifically, for the internal cohort (n = 98), the Pearson correlation coefficient between DSC and total DWMA volume was 0.615 (P < .001). For the external cohort (n = 28), the Pearson correlation coefficient was 0.682 (P = .0012).
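The correlation reported here is the standard Pearson coefficient, which can be computed as in the short sketch below. The function name `pearson_r` is hypothetical and the example values are made up for illustration; they are not study data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up per-infant values for illustration only (not study data).
dwma_volumes = [1.0, 2.0, 3.0, 4.0, 5.0]
dsc_values = [0.80, 0.84, 0.86, 0.90, 0.92]
r = pearson_r(dwma_volumes, dsc_values)
```

A value of r near 1 indicates that DSC rises almost linearly with lesion volume, whereas the moderate coefficients reported above indicate a looser positive trend.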
The current work had some limitations. One limitation was that the 3D ResU-Net model was developed and validated on relatively small internal and external datasets; additional larger cohorts from multiple sites would further benefit model development and increase the model’s generalizability. Another limitation was that the 3D ResU-Net in this study was developed to segment DWMA on T2-weighted brain images only. DWMA segmentation may be further improved by using multiparametric MRI data (eg, T1-, T2-, and diffusion-weighted images).
In summary, we developed a 3D ResU-Net for automated and accurate DWMA segmentation on T2-weighted anatomic brain MR images acquired at term-equivalent age in VPIs. The proposed model achieved a DSC of 0.907 in an internal twofold cross-validation and a DSC of 0.877 in an evaluation on an independent, external dataset. A larger study is needed to further validate our approach.
Supported by the National Institutes of Health (grants R01-EB029944, R21-HD094085, R01-NS094200, and R01-NS096037) and a Trustee grant from Cincinnati Children’s Hospital Medical Center.
Disclosures of Conflicts of Interest: H.L. Activities related to the present article: institution and work supported by the National Institutes of Health (NIH) grants R01-EB029944, R21-HD094085, R01-NS094200, and R01-NS096037. Activities not related to the present article: disclosed no relevant relationships. Other relationships: disclosed no relevant relationships. M.C. disclosed no relevant relationships. J.W. disclosed no relevant relationships. V.S.P.I. disclosed no relevant relationships. N.A.P. Activities related to the present article: institution received grant from NIH. Activities not related to the present article: disclosed no relevant relationships. Other relationships: disclosed no relevant relationships. L.H. Activities related to the present article: institution received grant from NIH (R01-EB029944, R21-HD094085, R01-NS094200, and R01-NS096037). Activities not related to the present article: disclosed no relevant relationships. Other relationships: disclosed no relevant relationships.
Abbreviations:
- CNN: convolutional neural network
- DNN: deep neural network
- DSC: Dice similarity coefficient
- DWMA: diffuse white matter abnormality
- VPI: very preterm infant
- 3D: three dimensional
- 2D: two dimensional
References
- 1. Maalouf EF, Duggan PJ, Rutherford MA, et al. Magnetic resonance imaging of the brain in a cohort of extremely preterm infants. J Pediatr 1999;135(3):351–357.
- 2. Inder TE, Anderson NJ, Spencer C, Wells S, Volpe JJ. White matter injury in the premature infant: a comparison between serial cranial sonographic and MR findings at term. AJNR Am J Neuroradiol 2003;24(5):805–809.
- 3. Krishnan ML, Dyet LE, Boardman JP, et al. Relationship between white matter apparent diffusion coefficients in preterm infants at term-equivalent age and developmental outcome at 2 years. Pediatrics 2007;120(3):e604–e609.
- 4. de Bruïne FT, van den Berg-Huysmans AA, Leijser LM, et al. Clinical implications of MR imaging findings in the white matter in very preterm infants: a 2-year follow-up study. Radiology 2011;261(3):899–906.
- 5. Jeon TY, Kim JH, Yoo SY, et al. Neurodevelopmental outcomes in preterm infants: comparison of infants with and without diffuse excessive high signal intensity on MR images at near-term-equivalent age. Radiology 2012;263(2):518–526.
- 6. Hagmann CF, De Vita E, Bainbridge A, et al. T2 at MR imaging is an objective quantitative measure of cerebral white matter signal intensity abnormality in preterm infants at term-equivalent age. Radiology 2009;252(1):209–217.
- 7. Skiöld B, Horsch S, Hallberg B, et al. White matter changes in extremely preterm infants, a population-based diffusion tensor imaging study. Acta Paediatr 2010;99(6):842–849.
- 8. Parikh NA, He L, Bonfante-Mejia E, et al. Automatically quantified diffuse excessive high signal intensity on MRI predicts cognitive development in preterm infants. Pediatr Neurol 2013;49(6):424–430.
- 9. Dyet LE, Kennea N, Counsell SJ, et al. Natural history of brain lesions in extremely preterm infants studied with serial magnetic resonance imaging from birth and neurodevelopmental assessment. Pediatrics 2006;118(2):536–548.
- 10. Iwata S, Nakamura T, Hizume E, et al. Qualitative brain MRI at term and cognitive outcomes at 9 years after very preterm birth. Pediatrics 2012;129(5):e1138–e1147.
- 11. Hart AR, Whitby EH, Clark SJ, Paley MN, Smith MF. Diffusion-weighted imaging of cerebral white matter and the cerebellum following preterm birth. Dev Med Child Neurol 2010;52(7):652–659.
- 12. Cainelli E, Arrigoni F, Vedovelli L. White matter injury and neurodevelopmental disabilities: A cross-disease (dis)connection. Prog Neurobiol 2020;193:101845.
- 13. Rath CP, Desai S, Rao SC, Patole S. Diffuse excessive high signal intensity on term equivalent MRI does not predict disability: a systematic review and meta-analysis. Arch Dis Child Fetal Neonatal Ed 2021;106(1):9–16.
- 14. Hart AR, Smith MF, Rigby AS, Wallis LI, Whitby EH. Appearances of diffuse excessive high signal intensity (DEHSI) on MR imaging following preterm birth. Pediatr Radiol 2010;40(8):1390–1396.
- 15. Keunen K, Kersbergen KJ, Groenendaal F, Isgum I, de Vries LS, Benders MJ. Brain tissue volumes in preterm infants: prematurity, perinatal risk factors and neurodevelopmental outcome: a systematic review. J Matern Fetal Neonatal Med 2012;25(Suppl 1):89–100.
- 16. Mathur AM, Neil JJ, Inder TE. Understanding brain injury and neurodevelopmental disabilities in the preterm infant: the evolving role of advanced magnetic resonance imaging. Semin Perinatol 2010;34(1):57–66.
- 17. He L, Parikh NA. Atlas-guided quantification of white matter signal abnormalities on term-equivalent age MRI in very preterm infants: findings predict language and cognitive development at two years of age. PLoS One 2013;8(12):e85475.
- 18. Li H, Parikh NA, Wang J, et al. Objective and Automated Detection of Diffuse White Matter Abnormality in Preterm Infants Using Deep Convolutional Neural Networks. Front Neurosci 2019;13:610.
- 19. Bhalerao M, Thakur S. Brain Tumor Segmentation Based on 3D Residual U-Net. In: Crimi A, Bakas S, eds. Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries. BrainLes 2019. Lecture Notes in Computer Science, vol 11993. Cham, Switzerland: Springer, 2020; 218–225.
- 20. He L, Parikh NA. Aberrant Executive and Frontoparietal Functional Connectivity in Very Preterm Infants With Diffuse White Matter Abnormalities. Pediatr Neurol 2015;53(4):330–337.
- 21. Kingma DP, Ba J. Adam: A method for stochastic optimization. arXiv:1412.6980 [preprint] https://arxiv.org/abs/1412.6980. Posted December 22, 2014. Accessed June 11, 2018.
- 22. Ronneberger O, Fischer P, Brox T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In: Navab N, Hornegger J, Wells W, Frangi A, eds. Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015. MICCAI 2015. Lecture Notes in Computer Science, vol 9351. Cham, Switzerland: Springer, 2015; 234–241.
- 23. Pednekar GV, Udupa JK, McLaughlin DJ, et al. Image Quality and Segmentation. In: Fei B, Webster RJ III, eds. Proceedings of SPIE: Medical Imaging 2018: Image-Guided Procedures, Robotic Interventions, and Modeling. Vol 10576. Bellingham, Wash: International Society for Optics and Photonics, 2018; 105762N.