Author manuscript; available in PMC: 2024 Dec 1.
Published in final edited form as: Pediatr Neurol. 2023 Sep 7;149:26–31. doi: 10.1016/j.pediatrneurol.2023.09.001

Deep learning to optimize MRI prediction of motor outcomes after HIE

Zachary A Vesoulis 1, Shamik B Trivedi 2, Hallie F Morris 3, Robert C McKinstry 4, Yi Li 5, Amit M Mathur 6, Yvonne W Wu 7
PMCID: PMC10842950  NIHMSID: NIHMS1936829  PMID: 37774643

Abstract

Background:

MRI is the gold standard for outcome prediction after hypoxic-ischemic encephalopathy (HIE). Published scoring systems contain duplicative or conflicting elements.

Methods:

Infants ≥ 36 weeks gestational age (GA) with moderate-severe HIE, therapeutic hypothermia treatment, and T1/T2/DWI imaging were identified. Adverse motor outcome was defined as Bayley-III motor score <85 or Alberta Infant Motor Scale <10th centile at 12–24 months. MRIs were scored using a published scoring system.

Logistic regression (LR) and gradient-boosted deep learning (DL) models quantified the importance of clinical and imaging features. The cohort underwent 80/20 train/test split with five-fold cross validation. Feature selection eliminated low-value features.

Results:

117 infants were identified with mean GA=38.6 weeks, median cord pH=7.01, and median 10-min Apgar=5. Adverse motor outcome was noted in 23/117 (20%).

Putamen/globus pallidus injury on T1, GA, and cord pH were the most informative features. Feature selection improved model accuracy from 79% (48-feature MRI model) to 85% (three-feature model). The three-feature DL model had superior performance to the best LR model (AUC 0.75 vs. 0.69).

Conclusions:

The parsimonious DL model predicted adverse HIE motor outcomes with 85% accuracy using only three features (putamen/globus pallidus injury on T1, GA, and cord pH) and outperformed LR.

Keywords: HIE, MRI, machine learning, outcome, neonatal

INTRODUCTION

Hypoxic-ischemic encephalopathy (HIE) is a clinical syndrome characterized by seizures, altered mental status, and neurodevelopmental sequelae, and is a common and costly cause of infant morbidity and mortality.1–4 HIE occurs following a reduction in fetal blood flow for a variety of reasons and frequently results in consequential brain injury.

HIE brain injury can be characterized by MRI, and most commonly affects the deep gray nuclei and/or white matter in a watershed distribution.5 Over time, several approaches have been developed for systematic evaluation of MR imaging with the aim of predicting adverse neurodevelopmental outcomes.6–10 In general, these strategies divide the brain into anatomical regions (e.g., putamen/globus pallidus, white matter, cerebellum) and hemispheres that are evaluated on different MR sequences (T1, T2, and diffusion weighted imaging [DWI]). By assigning points for increasing severity and extent of injury and calibrating the scale to a validation cohort, these scoring systems can be used to predict subsequent adverse neurodevelopmental outcomes.

The more recent scoring systems of Trivedi et al.9 and Weeke et al.10 have demonstrated good performance in prediction of adverse outcomes with area under the ROC curve (AUC) ranging between 0.720 and 0.989. However, despite the potential clinical value offered by these systems, qualitative and descriptive interpretation remains the primary method of MRI analysis used in clinical practice.

A significant barrier to routine clinical implementation is the complexity of the scoring systems. The Weeke system assigns scores to 13 different anatomic regions across three sequences (T1, T2, DWI) and two 1H-MRS regions of interest for a total of 41 scorable elements. The Trivedi system assigns scores to 16 anatomic regions (8 sites, lateralized to left and right) across the same three sequences for a total of 48 scorable elements. It is likely that many of these elements are duplicative or of minimal additive value when predicting outcome. Additionally, both systems utilize only radiographic data. Clinical factors may provide non-overlapping insight into neurodevelopmental outcomes.

Deep learning (DL), one technique of the broader scope of machine learning approaches, utilizes artificial neural networks to identify salient variables or “features” from a larger pool which, in association, predict outcomes with the greatest accuracy.11 Compared with traditional regression modeling, deep learning is non-linear, robust to multicollinearity, and can be used even in the presence of missing data.

The goal of this project was to leverage the strengths of deep learning by training a model on a cohort of HIE infants with and without adverse motor deficits at one year. Utilizing this approach, we aimed to identify the optimal set of clinical and imaging factors which would produce the greatest predictive accuracy using the smallest number of variables.

METHODS

Study population

This study is a secondary analysis of infants enrolled in one of three existing HIE research cohorts including a single-site historical cohort (WUSM) and two separate multicenter randomized clinical trial cohorts: NEAT (NCT00719407) and NEATO (NCT01913340). Inclusion criteria were gestational age ≥ 36 weeks, perinatal depression and moderate or severe neonatal encephalopathy. Infants were excluded if they were small for gestational age (defined as birth weight < 10th centile), older than 23.5 hours at time of recruitment, or had severe congenital anomaly, microcephaly (defined as head circumference < 10th centile), polycythemia, hypertension, or lack of central or peripheral indwelling catheter. Infants were further excluded if no MRI was obtained (i.e., infants who died) or was not interpretable (e.g., severe motion artifact). Detailed inclusion and exclusion criteria can be found in the Supplemental Appendix. Although this was a multi-institutional cohort, there was a common core treatment approach to HIE, namely 72 hours of therapeutic hypothermia (TH) treatment started within 6 hours of birth, monitoring and treatment of seizures, and a non-contrast MRI including a minimum of T1, T2, and DWI sequences after completion of TH. This secondary analysis study was reviewed and approved by the Washington University Institutional Review Board.

Clinical data

Standard clinical variables were collected including severity of encephalopathy (Sarnat stage) prior to start of therapeutic hypothermia, Apgar score at 10 minutes, worst arterial cord or infant blood gas pH within 1 hour of age, erythropoietin dosing, completed weeks of gestation and birth weight. All infants underwent formal developmental testing between 12 and 24 months using the Alberta Infant Motor Scale (AIMS) or the Bayley Scales of Infant Development, Third Edition. As both scales have in common a validated measurement of motor function, motor impairment was used as the primary outcome. Adverse motor outcome was defined as Bayley-III motor score < 85 or AIMS < 10th centile.
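The outcome definition above can be written as a small rule. The following is a minimal sketch only: the function name, argument names, and the handling of a missing assessment are assumptions made for illustration, while the two cutoffs come from the text.

```python
from typing import Optional

BAYLEY_MOTOR_CUTOFF = 85   # Bayley-III motor score threshold stated in the text
AIMS_CENTILE_CUTOFF = 10   # AIMS centile threshold stated in the text


def adverse_motor_outcome(bayley_motor: Optional[float] = None,
                          aims_centile: Optional[float] = None) -> bool:
    """Return True when an available assessment falls below its cutoff.

    Hypothetical helper: each infant is assumed to have at least one of the
    two assessments recorded.
    """
    if bayley_motor is None and aims_centile is None:
        raise ValueError("at least one motor assessment is required")
    bayley_adverse = bayley_motor is not None and bayley_motor < BAYLEY_MOTOR_CUTOFF
    aims_adverse = aims_centile is not None and aims_centile < AIMS_CENTILE_CUTOFF
    return bayley_adverse or aims_adverse
```

Both comparisons are strict, so a Bayley-III motor score of exactly 85 or an AIMS result at exactly the 10th centile is not counted as adverse.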

MRI scoring

MRI images were examined and scored by three reviewers (AMM, RCM, YL) with extensive experience in the scoring system described by Trivedi et al. Disagreements between reviewers were resolved by consensus. Following this system, a numeric score indicating the severity of injury in each region in each of T1, T2, and DWI was recorded for the left and right hemispheres. The scores were then summed and assigned a severity category based on the total injury score (0=no injury; 1–11=mild injury; 12–32=moderate injury; 33–138=severe injury).
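The mapping from total injury score to severity category can be sketched as a lookup. The function name and the returned labels are illustrative assumptions; the cut points are those listed in the paragraph above.

```python
def severity_category(total_score: int) -> str:
    """Map a total MRI injury score to the categorical severity described above."""
    if not 0 <= total_score <= 138:
        # 138 is the upper end of the severe range given in the text
        raise ValueError("total injury score must be between 0 and 138")
    if total_score == 0:
        return "none"
    if total_score <= 11:
        return "mild"
    if total_score <= 32:
        return "moderate"
    return "severe"
```

For example, the cohort's median total score of 6 falls in the mild category.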

To examine all possible element combinations, new variables were created as the sum of the right and left hemispheres, the sum of the subcortical components, and the sum of the cortical components. Clinical variables included gestational age, birth weight, worst degree of encephalopathy in the first six hours, cord blood gas pH, and 10-minute Apgar score. A list of all 81 candidate variables can be found in Supplementary Table 1.

Developing the machine learning model

We aimed to identify the optimal combination of variables or features which yielded the most accurate final prediction while also minimizing the number of variables, reducing the degree of effort required to score the MRI. Like the approach taken to build multivariate models, a series of models can be constructed for univariate evaluation of a feature to determine its importance or utility in outcome prediction.

For this project, deep learning was implemented using the XGBoost deep learning library version 1.2.1 in a script written in the Python programming language using the Scikit-learn implementation.12,13 Training and validation cohorts were created with an 80/20 training-validation split, sampling infants at random. Optimal values for three parameters (learning rate, number of estimators, decision tree size) in the XGBoost model were derived by grid search. Values evaluated in model tuning are shown in Supplemental Table 2. Optimal parameters were chosen to maximize model performance while minimizing risk of overfitting.
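The split and grid search described above can be sketched without any modeling library. This is an illustration under stated assumptions, not the authors' script: the candidate values in `HYPOTHETICAL_GRID` are invented because Supplemental Table 2 is not reproduced here, and `score_fn` is a hypothetical callback that would train a model with the given parameters and return its validation performance.

```python
import itertools
import random
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Hypothetical candidate values, for illustration only
HYPOTHETICAL_GRID: Dict[str, list] = {
    "learning_rate": [0.05, 0.1, 0.25],
    "n_estimators": [100, 250, 500],
    "max_depth": [2, 3, 4],
}


def train_validation_split(ids: Sequence[int], validation_fraction: float = 0.2,
                           seed: int = 0) -> Tuple[List[int], List[int]]:
    """Randomly assign subjects to training and validation sets."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    n_validation = round(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


def grid_search(grid: Dict[str, list],
                score_fn: Callable[[Dict[str, float]], float]
                ) -> Tuple[Optional[Dict[str, float]], float]:
    """Evaluate every parameter combination and keep the best-scoring one."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)  # higher is assumed to be better
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

With 117 infants, an 80/20 split under this rounding rule yields 94 training and 23 validation subjects. An exhaustive grid of this size means 27 model fits per evaluation round, which is affordable for a cohort this small.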

To best determine feature importance, SHAP (SHapley Additive exPlanations) scores14 were calculated for each feature. SHAP scores represent the contribution of each feature to the overall prediction and are evaluated across two domains: SHAP value and feature value. An ideal predictor will be strongly associated with the outcome (high SHAP value) and be distinctly clustered by feature value (e.g., most cases of low feature value are associated with negative outcome, most cases of high feature value are associated with positive outcome).15
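To make the idea of a per-feature contribution concrete, the sketch below computes exact Shapley values by brute force for a single prediction. It uses the textbook definition with absent features replaced by a baseline value; it is not the tree-specific algorithm of reference 14, and the function and argument names are assumptions for illustration.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, Set


def shapley_values(predict: Callable[[Dict[str, float]], float],
                   instance: Dict[str, float],
                   baseline: Dict[str, float]) -> Dict[str, float]:
    """Exact Shapley value of each feature for one prediction.

    Cost grows exponentially with the number of features, so this is only
    practical for a handful of features.
    """
    features = list(instance)
    n = len(features)

    def value(subset: Set[str]) -> float:
        # Features in the subset take the instance's value, the rest the baseline's
        mixed = {f: (instance[f] if f in subset else baseline[f]) for f in features}
        return predict(mixed)

    contributions = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude f
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                members = set(subset)
                total += weight * (value(members | {f}) - value(members))
        contributions[f] = total
    return contributions
```

The contributions always sum to the difference between the prediction for the instance and the prediction for the baseline, which is the additive property the SHAP name refers to.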

Feature selection, that is, pruning of the model to remove features of low importance, was performed utilizing SHAP scores for optimal model accuracy. For efficient use of the cohort and to reduce risk of overfitting, five-fold cross validation of each model was performed by dividing the overall cohort into five equal parts and performing the training and validation steps so that each of the five parts was used for validation once and training four times.
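The five-fold partitioning described above can be sketched as an index generator. The function name and the fixed random seed are assumptions; when the cohort size is not divisible by five, this sketch lets fold sizes differ by one.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int = 5,
                   seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (training, validation) index lists so each sample validates once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder so fold sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        training = indices[:start] + indices[start + size:]
        yield training, validation
        start += size
```

For a cohort of 117 this gives two folds of 24 and three folds of 23, and every infant appears in exactly one validation fold and four training folds.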

Developing the binary logistic regression model

For this project, logistic regression (LR) was implemented in R version 4.2.0 (R Foundation for Statistical Computing, Vienna, Austria). As with the deep learning model, training and validation cohorts were created with an 80/20 training-validation split, selecting infants at random. Models were fit using a generalized linear model and the binomial family (i.e., binary logistic regression) using the base statistics package. Optimal feature selection was accomplished using the built-in stepwise algorithm in the backwards direction by minimal Akaike information criterion (AIC). In this approach, the model was loaded with the maximal number of variables before sequential removal. At each step, the AIC16, a measure of model error and thus quality, is calculated. Variables continued to be removed until AIC could not be reduced further.
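The backward elimination loop can be sketched generically, with a lower AIC treated as better. This is a Python illustration of the procedure, not the R routine the authors used: `fit_aic` is a hypothetical callback that would fit a logistic regression on the given variables and return its AIC, and `aic` applies the standard definition, twice the number of estimated parameters minus twice the maximized log-likelihood.

```python
from typing import Callable, FrozenSet, Iterable, Tuple


def aic(log_likelihood: float, n_parameters: int) -> float:
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_parameters - 2 * log_likelihood


def backward_stepwise(features: Iterable[str],
                      fit_aic: Callable[[FrozenSet[str]], float]
                      ) -> Tuple[FrozenSet[str], float]:
    """Drop one variable at a time for as long as doing so lowers AIC."""
    current = frozenset(features)
    current_aic = fit_aic(current)
    while current:
        # Score every model that removes exactly one remaining variable
        candidates = [(fit_aic(current - {f}), f) for f in sorted(current)]
        best_aic, dropped = min(candidates)
        if best_aic >= current_aic:
            break  # no single removal reduces AIC any further
        current, current_aic = current - {dropped}, best_aic
    return current, current_aic
```

Because each removal saves a penalty of 2, a variable is dropped whenever it improves the log-likelihood by less than 1.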

As stepwise regression modeling does not account for multicollinearity, an alternative strategy using LASSO regression, which is robust to multicollinearity, was employed. LASSO regression was implemented in R using the glmnet package. Ten-fold cross-validation was performed to identify the lambda value producing the lowest test mean squared error. Optimal feature selection was accomplished using coefficient shrinkage, which selects out poorly predictive factors by setting their coefficients to zero.
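For reference, the L1-penalized logistic regression objective that this kind of LASSO fit minimizes can be written as below. The notation is an assumption introduced here: n infants, p candidate variables, binary outcome y_i, predictor vector x_i, intercept β_0, and penalty weight λ.

```latex
(\hat{\beta}_0, \hat{\beta}) = \arg\min_{\beta_0,\,\beta}
\left\{ -\frac{1}{n} \sum_{i=1}^{n}
\left[ y_i \left( \beta_0 + x_i^{\top}\beta \right)
- \log\left( 1 + e^{\beta_0 + x_i^{\top}\beta} \right) \right]
+ \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert \right\}
```

Larger values of λ push more coefficients to exactly zero, which is how the penalty performs variable selection; cross-validation chooses λ from a sequence of candidate values.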

Statistical approach

Three different logistic regression models were built. First, a “baseline” LR model was built utilizing the categorical severity score of the MRI as originally published9. This model functionally recapitulates the original scoring system and is essentially a validation using new data. The stepwise and LASSO LR models were built using the same strategy; starting with all 76 MRI scoring components and 3 clinical variables (gestational age, birth weight, and encephalopathy severity), features were reduced until AIC was minimized or further shrinkage could not be obtained.

Three different deep learning models were considered. First, as with the baseline LR model, the categorical severity score of the MRI as originally published was used. The second model utilized all 76 MRI scoring components or combinations and all 5 clinical variables (comprehensive model). The third model utilized feature reduction by SHAP scores (parsimonious model).

The performance of all models was assessed using the same metrics: accuracy (ratio of correctly predicted to total observations), precision (ratio of correctly predicted positive observations to total predicted positive observations), recall (ratio of true positives to the sum of the true positives and false negatives), and area under the receiver-operator curve (AUC). Where cross validation was performed, the average of the five cross-validation runs is reported.
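These metrics can be computed from labels and predictions using the standard confusion-matrix definitions, as in the sketch below. The function names are illustrative, returning 0.0 when a denominator is zero is an assumed convention, and the AUC uses the rank-based (Mann-Whitney) formulation with ties counted as one half.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int],
                           y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, and recall for binary labels (1 = adverse outcome)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,  # TP / predicted positives
        "recall": tp / (tp + fn) if tp + fn else 0.0,     # TP / actual positives
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive case scores above a random negative case."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present to compute AUC")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

With an adverse outcome rate near 20%, accuracy alone can look favorable for a model that rarely predicts the adverse class, which is why precision, recall, and AUC are reported alongside it.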

RESULTS

Cohort description

A total of 117 infants were included in the study (21 from NEAT, 39 from NEATO, 57 from WUSM historic cohort). Missing data were minimal with 9683/9711 (99.7%) elements present. The mean gestational age was 38.6 ± 1.6 weeks and the mean birthweight was 3291 ± 608 grams. At the time of therapeutic hypothermia initiation, 82% had moderate encephalopathy (with the remainder having severe encephalopathy), the median 10-minute Apgar score was 5 (range 0–9), and the mean ± SD cord blood gas pH was 7.01 ± 0.17. Thirty-five percent (41/117) of the infants received erythropoietin. In total, 39/117 (33%) of infants underwent AIMS testing while the remaining 78/117 (67%) infants underwent BSID-III testing. Testing was performed at a median chronologic age of 13 months (IQR 12–21). Adverse motor outcomes were identified in 23/117 (20%) of infants.

Neuroimaging findings

All subjects underwent MR imaging including T1, T2 and DWI sequences at a median age of 5 days (IQR 4–7 days). Overall, injury of some degree was noted in 93/117 (79%) of infants. The median total injury score was 6 (range 0–76) with 24/117 (21%) of infants with no injury, 67/117 (57%) with mild injury, 18/117 (15%) with moderate injury, and 8/117 (7%) with severe injury.

Subcortical white matter was the most common site of injury, occurring in 61/117 (52%) of infants, followed by the posterior limb of the internal capsule (42%), cerebral cortex (24%), putamen/globus pallidus (22%), thalamus (21%), cerebellum (11%), and brainstem (6%).

Machine learning model

Grid search evaluation revealed that the optimal value for learning rate was 0.25, maximum tree depth was 3, and number of estimators was 500. The relative value of all possible variables in a machine learning model was evaluated by calculating the SHAP score for each variable for each infant. The performance of each variable was graded by taking the average of the absolute value of all calculated SHAP scores for that variable. Putamen/globus pallidus injury on T1, GA, and cord pH were the highest performing elements (Table 1).

Table 1.

Top 10 performing variables in univariate analysis

Variable                 Mean (abs(SHAP value))
Putamen/GP T1 sum        0.117
Gestational age          0.084
Cord pH                  0.049
10-minute Apgar score    0.047
Total MRI score          0.034
Non-subcortical score    0.030
Putamen/GP DWI sum       0.016
WM T1 sum                0.016
Cerebellum T1 sum        0.015
Birth weight             0.014
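The ranking statistic reported in Table 1, the mean of the absolute SHAP values across infants for each variable, can be sketched as follows. The function name and input layout (one list of per-infant SHAP values per variable) are assumptions for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def rank_by_mean_abs_shap(shap_by_feature: Dict[str, Sequence[float]]
                          ) -> List[Tuple[str, float]]:
    """Return (feature, mean |SHAP|) pairs sorted from most to least important."""
    means = {
        name: sum(abs(v) for v in values) / len(values)
        for name, values in shap_by_feature.items()
    }
    return sorted(means.items(), key=lambda item: item[1], reverse=True)
```

Taking the absolute value before averaging matters: a variable that pushes predictions strongly in both directions would otherwise average toward zero and appear unimportant.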

The baseline DL model trained using the injury severity categorical MRI score had an accuracy of 79% and an AUC of 0.70. The comprehensive DL model, which included all MRI features and clinical factors, led to a slightly lower accuracy score of 78% and a lower AUC of 0.65. In contrast, the parsimonious DL model which underwent feature reduction by SHAP score produced notable improvements with an overall accuracy of 85% and achieved the highest AUC of any model at 0.75. Although false negatives negatively impacted the recall metric for all studied models, the parsimonious DL model was the least affected, with nearly double the recall of the next nearest model.

Logistic regression model

The baseline LR model, trained using injury severity category as the sole predictor, achieved uneven results. The AUC of this model (0.69) was similar to that reported in the original publication9 (0.72), but the model had the lowest accuracy and precision of any model, at 40% and 0.20 respectively. Backwards stepwise logistic regression yielded a final model with 9 components (putamen/globus pallidus DWI left, putamen/globus pallidus T1 left, putamen/globus pallidus T2 sum, thalamus T1 left, thalamus T2 right, WM T1 right, brainstem DWI left, brainstem T1 left, and gestational age), achieving an accuracy of 80% and a precision of 0.40. LASSO regression modeling yielded 4 elements (putamen/globus pallidus T1 right, thalamus DWI right, cortex DWI right, and cerebellum T1 sum) with similar accuracy and recall to the stepwise model (80% and 0.33 respectively), but a lower AUC at 0.54. Performance profiles for all model variations are shown in Table 2.

Table 2.

Model performance comparison

Model                                                       Accuracy  Precision  Recall  AUC
Baseline LR (categorical MRI score)                         40%       0.20       0.20    0.69
Stepwise LR                                                 80%       0.40       0.32    0.62
LASSO regression                                            80%       0.47       0.33    0.54
Baseline ML (categorical MRI score)                         79%       0.30       0.15    0.70
Comprehensive ML (all MRI features and clinical variables)  78%       0.38       0.33    0.65
ML feature selection by SHAP score (reduced model)          85%       0.63       0.57    0.75

DISCUSSION

In this retrospective multicenter cohort study, we demonstrate the utility of two novel strategies to improve MRI-based methods for predicting adverse motor outcome in infants with HIE who received therapeutic hypothermia. First, machine learning can be leveraged to identify a parsimonious combination of salient, highly predictive variables for motor outcome after HIE. Second, the addition of a simple clinical variable further augments the model, yielding a three-factor (putamen/globus pallidus injury on T1, GA, and cord pH) system which identifies adverse motor outcomes with 85% accuracy. Machine learning models outperformed logistic regression and handled data complexities such as missing or highly heterogeneous data.

Methods for systematic scoring of brain MRI in the setting of HIE have evolved since the publication of the first system by Barkovich et al. in 1998, with a trend towards increasing complexity, but also increased performance. The Barkovich system utilized two elements but had only marginal predictive value with an estimated positive predictive value (PPV) of 44% and negative predictive value (NPV) of 57% for neuromotor or cognitive outcomes.6 An expanded scoring system was developed for the 2005 NICHD cooling trial8 which increased the number of scored elements to 12 and had a sizeable increase in predictive power, with a PPV of 70% and NPV of 87% for death or severe disability. The Rutherford scoring system, developed out of the TOBY hypothermia trial17 and published in 2010,7 examined four regions and utilized T1 and T2 sequences for a total of 8 elements. This approach had a PPV of 76% and NPV of 91% for death or severe disability. The Trivedi system9, upon which this project was based, represents a marked increase in the number of scored elements, a total of 48 elements. This system had a PPV of 47% and NPV of 76% for adverse neurodevelopmental disability at 18–24 months. Finally, the most recent scoring system developed by Weeke et al. even further expands the number of elements including T1, T2, and DWI sequences in addition to 1H-MRS for a total of 76 elements. This complexity yields outstanding results, with a PPV of 89% and NPV of 97% for death or impairment at two years. Not surprisingly given their complexities, these HIE MRI scoring approaches remain largely in the realm of research.

In this project, SHAP scores were used to prune variables from the model when their removal resulted in better accuracy. In Figure 1, SHAP scores for individual infants are plotted for the top 10 variables, sorted by feature importance. The putamen/globus pallidus T1 sum was a high performing variable because infants with high scores (shown in purple-red) are also very likely to have adverse motor outcomes, as demonstrated by rightward positioning on the X-axis. This can be compared to the putamen/globus pallidus DWI sum, where infants with adverse motor outcomes also lie to the right, but with much less separation from infants without adverse motor outcomes, who are centered at zero along the X-axis. Therefore, the DWI version of the variable can be dropped from the model, improving performance and simplifying the scoring process.

Figure 1:

Summary plot of SHAP values for individual infants across MR and clinical features. SHAP values are shown on the X-axis, where increasingly positive values increase the odds of the outcome and negative values decrease them. Each point is also colored by the relative value of the measure. GP=globus pallidus; DWI=diffusion weighted imaging; WM=white matter.

A strength of our approach is the use of human scored MRIs. Although there is significant interest in the development of a fully automated machine learning algorithm which recognizes injury in the MRI images without human input, there are significant engineering challenges which must be overcome. The hybrid human-DL strategy employed in this study takes advantage of the strengths of each method while minimizing weaknesses.

The association between motor impairment and injury in part of the basal ganglia and the cerebellum is not unexpected, given the key role both regions play in motor coordination. Not surprisingly, neonatal injury to the globus pallidus, putamen, and cerebellum has been linked to the later development of cerebral palsy.18–22 More unexpected was the stronger association of outcome with injury on T1 sequences than with injury on DWI. Although DWI is the preferred sequence for early identification of injury in the perinatal period, the lack of standardization in MRI timing in this cohort potentially impacts the diagnostic power of DWI. Although DWI may have pseudonormalized for those with later imaging, the T1 signal abnormality will tend to be consistently present. Additionally, in many cases of HIE, there is uncertainty as to the timing of injury; earlier or sub-acute injury prior to delivery would shift the timeframe of DWI changes and the optimal pseudonormalization window. The greater reliability of the T1 marker strengthens its value as a feature and importance to the model. While consistent imaging between 4 and 6 postnatal days is optimal in a controlled research setting, the results of this study are more reflective of real-world practice and are more generalizable.

In this study, we did not find signal changes in the posterior limb of the internal capsule (PLIC) to be associated with adverse motor outcomes. However, unlike the other measured MRI features, where changes in the signal are primarily a function of injury, the strength of the PLIC signal changes with injury and maturation, in opposing directions. This inconsistency undoubtedly led to a reduction in feature importance of the PLIC in favor of gestational age, which has a strong unidirectional relationship. Future inquiries with a larger sample size may permit investigation into a subset of infants where complete PLIC maturation would be expected (e.g., > 38 weeks gestational age) and signal changes are more consistently related to injury.

This study suggests that a machine learning approach can be applied to improving MRI prediction of HIE outcomes and is superior to more conventional logistic regression. However, it is important to acknowledge limitations encountered during the project. Limitations of logistic regression model building significantly influenced the process. While a “baseline” LR model using the injury severity category was easily accomplished, a “comprehensive” model including all 76 MRI components and 5 clinical variables could not be built. The high degree of inter-variable correlation between some of the MRI components leads to unstable coefficient estimates and substantial risk of overfitting. Of greatest collinearity concern is the inclusion of left and right component scores along with summative components. While the summative components could be dropped from this portion of the evaluation (reducing the variable count by 28), the resulting “comprehensive” model is not directly comparable to the machine learning approach.

Backwards stepwise regression faced different limitations. As the stepwise process seeks to maximize information content and minimize variable count, duplicative or highly correlated variables will be removed, thus summative scores could be included in this step. However, stepwise (and LASSO) regression cannot accommodate variables with missing data, as the evaluation process at each step requires equal row counts. While the dataset was nearly complete, two variables (cord pH and 10-minute Apgar) contained missing values in cases where the cord blood gas was not obtained, or the 10-minute Apgar was not assigned. These two variables were dropped from the analysis. Regardless of the technique used, there is a risk of overfitting. Validation on data not used in training is an essential future step.

A more overarching limitation of this study is that only adverse motor outcomes were studied. This was a practical limitation based on the availability of follow-up data in our existing datasets and provides an incomplete picture of neurodevelopmental outcome. It is likely that other regions of the brain will be important for predicting cognitive and language outcomes. A related limitation is the use of mixed outcome measurement instruments, namely the AIMS and BSID-III. Although the selected thresholds both represent adverse motor deficits (< 10th centile and < 85 respectively), the tests are not identical, thus it is possible that some infants may have been misclassified. Finally, our sample size, while comparable to prior HIE MRI studies (ranging between 53 and 173 subjects6–10), is still somewhat small for a machine learning investigation, although we have made efficient use of the cohort by performing five-fold cross validation.

Based on these preliminary data, future studies should examine a broader range of clinical variables, the full gamut of neurodevelopmental outcome domains, and other imaging features such as MR spectroscopy in a large pool of infants to confirm these findings and investigate the inclusion of other factors to further increase model accuracy. We anticipate that other HIE datasets, including those from the HEAL study, will become available in the future, and will serve as ideal sources for validation and expansion into non-motor outcomes.

In conclusion, we have demonstrated that application of machine learning strategies to human scored MRI and clinical datasets can identify the most important MRI and clinical features for predicting adverse motor outcome in newborns with HIE treated with therapeutic hypothermia. The parsimonious DL model trained with our study population predicts adverse motor outcomes with 85% accuracy using only three features (putamen/globus pallidus injury on T1, GA, and cord pH) and outperforms conventional logistic regression. This novel, hybrid approach combines the accessibility of human scored imaging and clinical factors with deep learning optimization. Future studies with larger, more homogeneous samples and complete neurodevelopmental outcomes across all domains are needed to further determine the optimal features for comprehensive outcome prediction and the generalizability of deep learning in this population.

Supplementary Material

Supp Tables
Appendix

Statement of financial support:

  1. NIH Career Development Awards: K23 NS111086 [Vesoulis]

  2. NIH Project Grant U01 NS092764 [Wu]

  3. Thrasher Research Fund [Wu]

Footnotes

Disclosure statement: No authors have financial ties or potential/perceived conflicts to disclose.

Consent: Informed written consent was obtained for all participants prior to any study procedures.

Data availability statement:

Code for the deep learning algorithm is available under GNU General Public License at: https://github.com/zvesoulis/ml_mri. Patient level data is not available due to privacy restrictions.

References:

  1. Badawi N, Kurinczuk JJ, Keogh JM, Alessandri LM, O’Sullivan F, Burton PR, Pemberton PJ, Stanley FJ. Intrapartum risk factors for newborn encephalopathy: the Western Australian case-control study. BMJ. 1998 Dec 5;317:1554–8.
  2. Graham EM, Ruis KA, Hartman AL, Northington FJ, Fox HE. A systematic review of the role of intrapartum hypoxia-ischemia in the causation of neonatal encephalopathy. Am J Obstet Gynecol. 2008 Dec;199(6):587–595.
  3. Perlman JM. Interruption of placental blood flow during labor: potential systemic and cerebral organ consequences. J Pediatr. 2011 Feb;158(2 Suppl):e1–4.
  4. Volpe JJ. Perinatal brain injury: from pathogenesis to neuroprotection. Ment Retard Dev Disabil Res Rev. 2001;7(1):56–64.
  5. de Vries LS, Groenendaal F. Patterns of neonatal hypoxic–ischaemic brain injury. Neuroradiology. 2010 Jun;52(6):555–566.
  6. Barkovich AJ, Hajnal BL, Vigneron D, Sola A, Partridge JC, Allen F, Ferriero DM. Prediction of neuromotor outcome in perinatal asphyxia: evaluation of MR scoring systems. AJNR Am J Neuroradiol. 1998 Jan;19(1):143–149.
  7. Rutherford M, Ramenghi LA, Edwards AD, Brocklehurst P, Halliday H, Levene M, Strohm B, Thoresen M, Whitelaw A, Azzopardi D. Assessment of brain tissue injury after moderate hypothermia in neonates with hypoxic–ischaemic encephalopathy: a nested substudy of a randomised controlled trial. The Lancet Neurology. 2010 Jan;9(1):39–45.
  8. Shankaran S, Laptook AR, Ehrenkranz RA, Tyson JE, McDonald SA, Donovan EF, Fanaroff AA, Poole WK, Wright LL, Higgins RD, Finer NN, Carlo WA, Duara S, Oh W, Cotten CM, Stevenson DK, Stoll BJ, Lemons JA, Guillet R, Jobe AH, National Institute of Child Health and Human Development Neonatal Research Network. Whole-body hypothermia for neonates with hypoxic-ischemic encephalopathy. N Engl J Med. 2005 Oct 13;353(15):1574–1584.
  9. Trivedi SB, Vesoulis ZA, Rao R, Liao SM, Shimony JS, McKinstry RC, Mathur AM. A validated clinical MRI injury scoring system in neonatal hypoxic-ischemic encephalopathy. Pediatr Radiol. 2017 Oct;47(11):1491–1499.
  10. Weeke LC, Groenendaal F, Mudigonda K, Blennow M, Lequin MH, Meiners LC, van Haastert IC, Benders MJ, Hallberg B, de Vries LS. A Novel Magnetic Resonance Imaging Score Predicts Neurodevelopmental Outcome After Perinatal Asphyxia and Therapeutic Hypothermia. J Pediatr. 2018;192:33–40.e2.
  11. Shalev-Shwartz S, Ben-David S. Understanding machine learning: from theory to algorithms. New York, NY, USA: Cambridge University Press; 2014.
  12. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay E. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research. 2011;12:2825–2830.
  13. Géron A. Hands-on machine learning with Scikit-Learn and TensorFlow: concepts, tools, and techniques to build intelligent systems. First edition. Beijing; Boston: O’Reilly Media; 2017.
  14. Lundberg SM, Erion G, Chen H, DeGrave A, Prutkin JM, Nair B, Katz R, Himmelfarb J, Bansal N, Lee SI. From local explanations to global understanding with explainable AI for trees. Nat Mach Intell. 2020 Jan;2(1):56–67.
  15. Lundberg SM, Nair B, Vavilala MS, Horibe M, Eisses MJ, Adams T, Liston DE, Low DKW, Newman SF, Kim J, Lee SI. Explainable machine-learning predictions for the prevention of hypoxaemia during surgery. Nat Biomed Eng. 2018 Oct;2(10):749–760.
  16. Akaike H. A new look at the statistical model identification. IEEE Transactions on Automatic Control. 1974 Dec;19(6):716–723.
  17. Azzopardi DV, Strohm B, Edwards AD, Dyet L, Halliday HL, Juszczak E, Kapellou O, Levene M, Marlow N, Porter E, Thoresen M, Whitelaw A, Brocklehurst P, TOBY Study Group. Moderate hypothermia to treat perinatal asphyxial encephalopathy. N Engl J Med. 2009 Oct 1;361(14):1349–1358.
  18. Reid SM, Dagia CD, Ditchfield MR, Reddihough DS. Grey matter injury patterns in cerebral palsy: associations between structural involvement on MRI and clinical outcomes. Dev Med Child Neurol. 2015 Dec;57(12):1159–1167.
  19. Accardo J, Kammann H, Hoon AH Jr. Neuroimaging in cerebral palsy. The Journal of Pediatrics. 2004 Aug;145(2):S19–S27.
  20. Kułak W, Sobaniec W. Magnetic resonance imaging of the cerebellum and brain stem in children with cerebral palsy. Adv Med Sci. 2007;52 Suppl 1:180–182.
  21. Yin R, Reddihough D, Ditchfield M, Collins K. Magnetic resonance imaging findings in cerebral palsy. Journal of Paediatrics and Child Health. 2000 Apr;36(2):139–144.
  22. Arrigoni F, Peruzzo D, Gagliardi C, Maghini C, Colombo P, Iammarrone FS, Pierpaoli C, Triulzi F, Turconi AC. Whole-Brain DTI Assessment of White Matter Damage in Children with Bilateral Cerebral Palsy: Evidence of Involvement beyond the Primary Target of the Anoxic Insult. American Journal of Neuroradiology. 2016 Jul 1;37(7):1347–1353.
