Abstract
This paper is concerned with predicting the occurrence of periventricular leukomalacia (PVL) using vital data collected over a period of twelve hours after neonatal cardiac surgery. The vital data comprise heart rate (HR), mean arterial pressure (MAP), right atrial pressure (RAP), and oxygen saturation (SpO2). Various features are extracted from the data and then ranked so that an optimal subset of features with the highest discriminative capability can be selected. A decision tree (DT) is then developed for the vital data in order to identify the most important vital measurements. The DT result shows that high-amplitude 20-minute variations and low sample entropy in the data are important factors for the prediction of PVL. Low sample entropy represents a lack of variability in the hemodynamic measurements, and constant blood pressure with small fluctuations is an important indicator of PVL occurrence. Finally, using different time frames of the collected data, we show that the first six hours of data contain sufficient information for PVL occurrence prediction.
I. INTRODUCTION
Newborns with congenital heart disease are at high risk of brain injury and adverse neurodevelopmental outcomes [1], [2]. A study by Miller et al. [3] showed that full-term newborns with congenital heart defects (CHD) have widespread brain abnormalities before they undergo cardiac surgery. The imaging findings of their study are similar to those in premature newborns and may reflect abnormal brain development in utero. Licht et al. [4] showed that before surgery, term infants with hypoplastic left heart syndrome and transposition of the great arteries have brains that are smaller and structurally less mature than expected.
Periventricular leukomalacia (PVL) is a type of brain injury that affects infants. The condition involves the death of small areas of brain tissue around the fluid-filled areas called ventricles. Research has shown a high incidence of PVL both before and after cardiac surgery in neonates [5], [6]. Recently, there has been growing interest in clinical research that aims to understand the progression and pathology of PVL, to develop protocols for the prevention of PVL, and to examine the trends in outcomes of individuals with PVL [7]. The relationship between preoperative cerebral blood flow and preoperative neurologic conditions was studied by Licht et al. [8].
Despite advances in PVL research, no treatments are currently prescribed for PVL; furthermore, clinical investigation of these patients almost always has low accuracy [9]. This is because the origin of PVL and its physiology are still not clearly understood.
A computer-based decision-making tool, also referred to as an “Intelligent Patient Monitoring Tool” or simply the IPM tool, will help caregivers aggregate different types of physiological data and discover hidden knowledge or patterns in the data so that the correct decision can be made quickly.
In our previous paper [10], we showed how a decision tree (DT) helps to highlight the role of blood carbon dioxide content (HCO3 and PaCO2). In this study, we investigate how a DT can help to predict the occurrence of PVL using the vital data. The aim is to identify the most important risk factors based on the classification rules to be extracted. These rules will enable better management of the patient, targeting a reduction of events as well as a reduction of the cost of therapy, since interventions are expected to be restricted to necessary cases only.
II. MATERIALS AND METHODS
A. Data Collection
Data from 44 neonates for a period of 12 hours after neonatal cardiac surgery were collected according to a pre-specified protocol at the Children's Hospital of Philadelphia (CHOP). Subjects of this study are limited to two types of congenital heart disease, hypoplastic left heart syndrome (HLHS) and transposition of the great arteries (TGA), because these two conditions are considered to carry the highest likelihood of postoperative PVL. Clinical and demographic characteristics of the study cohort are shown in Table I.
TABLE I.
Demographic characteristics of the collected data. DHCA is deep hypothermic circulatory arrest time, CPB is cardiopulmonary bypass time, and CCD is cross-clamp duration time.
| Characteristic | Value |
|---|---|
| Male, % | 59 |
| Diagnosis, % HLHS | 55 |
| DHCA time, mean ± SD | 27 ± 26 |
| CPB time, mean ± SD | 102 ± 31 |
| CCD time, mean ± SD | 61 ± 19 |
| PVL, % | 45 |
| Extent, mean ± SD | 94 ± 251 |
For each patient, we collected vital data as well as blood gas measurements. The sampling interval of the vital data varies both between and within patients, ranging from 4 to 17 seconds.
B. Developed Algorithm
Next, we discuss the steps of the algorithm designed for data classification and rule extraction. The patient data collected at the hospital are used to form the pool of features. A mutual-information-based algorithm is then applied to rank the features in the feature pool. After the ordered feature set has been formed, the optimal feature subset that encapsulates the most critical features is selected. Compared to the original feature set, the optimal feature subset is reduced in size; however, because it maximizes the class separability measure, this subset is expected to give higher accuracy in the final prediction. The selected features are then fed to the decision tree (DT). Fig. 1 provides a schematic overview of the proposed algorithm.
Fig. 1.
Schematic of the proposed algorithm.
C. Feature Extraction
Next, the feature pool is developed from the collected set of physiological measurements. The feature pool contains features generated from the vital data and represents the characteristics of these data. Currently, the features derived from the vital measurements include: min, max, mean, variance, skewness, kurtosis, trend, energy of wavelet coefficients in different time frames, and multiscale entropy (based on the sample entropy measure). The minimum and maximum of the data are important because they could potentially be representative of mechanisms that are triggered when a measurement passes critical values. Skewness and kurtosis are the third- and fourth-order statistical moments of a random variable, defined by (1).
$$m_n = E\left[(x - \mu)^n\right] \qquad (1)$$
where $n$ is the order, $\mu$ is the mean value of the data and $E$ is the expected value.
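As an illustration of the statistical features, the following minimal sketch computes the min, max, mean, variance, skewness and kurtosis of one window of a vital sign. The function names and the heart-rate values are hypothetical, and dividing the third and fourth central moments by powers of the standard deviation is an assumed (though common) convention that the text does not state.

```python
import math

def central_moment(x, n):
    """n-th central moment E[(x - mu)^n], estimated as a sample average."""
    mu = sum(x) / len(x)
    return sum((v - mu) ** n for v in x) / len(x)

def summary_features(x):
    """Basic statistical features of one window of a vital sign."""
    var = central_moment(x, 2)
    sd = math.sqrt(var)
    return {
        "min": min(x),
        "max": max(x),
        "mean": sum(x) / len(x),
        "variance": var,
        # Assumption: moments are standardized by sd**n (usual convention).
        "skewness": central_moment(x, 3) / sd ** 3,
        "kurtosis": central_moment(x, 4) / sd ** 4,
    }

# Hypothetical heart-rate samples (beats per minute), for illustration only.
hr = [140, 142, 141, 139, 150, 143, 141, 140]
features = summary_features(hr)
```

A single large excursion, such as the value 150 above, raises both the skewness and the kurtosis of the window.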
For this study we focus on the variation in the data over 1 min, 20 min and 2 hour periods, with the goal of finding the most important time scales in the different waveforms. The energy of the continuous wavelet transform (CWT) coefficients of the vital data at these time scales is a measure of variation in the different time frames.
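The wavelet-energy feature can be sketched as follows. The mother wavelet is not specified in the text, so a Ricker (Mexican-hat) wavelet is assumed here; the signal is assumed to have been resampled to a uniform grid, and scales are expressed in samples. Mapping a period such as 20 min to a scale depends on both choices and is left open. All names are hypothetical.

```python
import math

def ricker(t, a):
    """L2-normalised Ricker (Mexican-hat) wavelet at scale a, at offset t."""
    u = t / a
    norm = 2.0 / (math.sqrt(3.0 * a) * math.pi ** 0.25)
    return norm * (1.0 - u * u) * math.exp(-0.5 * u * u)

def wavelet_energy(x, a):
    """Sum of squared CWT coefficients of x at a single scale a (in samples)."""
    half = int(math.ceil(5 * a))          # wavelet is negligible beyond +/- 5a
    kernel = [ricker(k, a) for k in range(-half, half + 1)]
    energy = 0.0
    for b in range(len(x)):               # one coefficient per time shift b
        coeff = 0.0
        for j, w in enumerate(kernel):
            i = b + j - half
            if 0 <= i < len(x):           # zero-padding outside the record
                coeff += x[i] * w
        energy += coeff * coeff
    return energy

# Hypothetical example: a sinusoid with a 20-sample period.
signal = [math.sin(2 * math.pi * k / 20) for k in range(200)]
energies = {a: wavelet_energy(signal, a) for a in (5, 50)}
```

For this synthetic signal the energy concentrates at the small scale, which is close to the period of the oscillation, and is much lower at the large scale.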
Sample entropy (SampEn) is a measure of signal complexity. For a signal of length N, it is the negative natural logarithm of the conditional probability that two sequences that match within a tolerance r for m points will also match for m+1 points, with self-matches excluded [11]. SampEn has been used in the literature to evaluate the cyclic behavior of heart rate variability (HRV) and blood pressure variability (BPV) [12], [13].
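A minimal sketch of the sample entropy computation, following the definition in [11], is given below. The defaults m = 2 and r = 0.2 times the standard deviation are commonly used values and are assumptions here, not parameters reported in this study.

```python
import math

def sample_entropy(x, m=2, r=None):
    """SampEn(m, r) of sequence x; self-matches are excluded."""
    n = len(x)
    if r is None:
        mean = sum(x) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
        r = 0.2 * sd                      # assumed default tolerance

    def count_matches(length):
        # Count template pairs (i < j) whose Chebyshev distance is <= r.
        count = 0
        for i in range(n - m):
            for j in range(i + 1, n - m):
                if max(abs(x[i + k] - x[j + k]) for k in range(length)) <= r:
                    count += 1
        return count

    b = count_matches(m)                  # matches of length m
    a = count_matches(m + 1)              # matches of length m + 1
    if a == 0 or b == 0:
        return float("inf")               # undefined: no matches found
    return -math.log(a / b)

# A perfectly regular signal has zero sample entropy.
regular = [1, 2] * 20
sampen_regular = sample_entropy(regular, m=2, r=0.5)
```

A low value therefore indicates a regular, predictable signal, which is the sense in which low sample entropy reflects a lack of variability.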
D. Feature Ranking
In this paper we apply the concept of mutual information to rank the features. The mutual information of two random variables is a quantity that measures their mutual dependence. Let $x_i$ be the $i$th feature and $p(x_i)$ be its probability density function. The mutual information between $x_i$ and the class variable $c$ is then defined as:
$$I(x_i; c) = \sum_{k} \int p(x_i, c_k) \log \frac{p(x_i, c_k)}{p(x_i)\, p(c_k)} \, dx_i \qquad (2)$$
where $c_k$ represents the classes, $p(c_k)$ is the probability of class $c_k$, and $p(x_i, c_k)$ is the joint probability distribution of $x_i$ and $c_k$. We approach the feature selection process using mutual information as an optimization problem that seeks a subset $S_{opt}$ of the whole feature set $S$ by maximizing the information content $I(S; c)$ between the feature set and the output. In this paper we use the algorithm proposed by Kappaganthu and Nataraj [14]. They maximize the mutual information of subset $S_i$ using (4):
| (3) |
| (4) |
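To illustrate ranking by mutual information, the sketch below estimates the mutual information between a single feature and the class labels from a histogram and sorts the features by that score. This is a simple univariate ranking written for illustration; it is not the subset-selection algorithm of [14]. The number of bins and all names are assumptions.

```python
import math
from collections import Counter

def mutual_information(feature, labels, bins=4):
    """Histogram estimate of I(feature; class) in nats."""
    lo, hi = min(feature), max(feature)
    width = (hi - lo) / bins or 1.0       # guard against a constant feature
    binned = [min(int((v - lo) / width), bins - 1) for v in feature]
    n = len(feature)
    n_x = Counter(binned)                 # counts per feature bin
    n_c = Counter(labels)                 # counts per class
    n_xc = Counter(zip(binned, labels))   # joint counts
    mi = 0.0
    for (xb, c), cnt in n_xc.items():
        mi += (cnt / n) * math.log(cnt * n / (n_x[xb] * n_c[c]))
    return mi

def rank_features(feature_table, labels):
    """Feature names sorted from most to least informative."""
    scores = {name: mutual_information(vals, labels)
              for name, vals in feature_table.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical toy data: one informative and one uninformative feature.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
table = {
    "informative": [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0],
    "uninformative": [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0],
}
ranking = rank_features(table, labels)
```

With two equally likely classes, a feature that separates them perfectly attains the maximum value of ln 2, whereas a feature independent of the class scores zero.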
E. Class Separability Measure
Now that the feature vector has been formed, the next step is to measure the discriminative capacity of the feature vectors. The class separability measure is defined as the divergence between classes $w_i$ and $w_j$ using the feature vector $x$,
$$d_{ij} = \int \left[ p(x \mid w_i) - p(x \mid w_j) \right] \ln \frac{p(x \mid w_i)}{p(x \mid w_j)} \, dx \qquad (5)$$
where $p(x \mid w_i)$ is the conditional probability of $x$ with respect to $w_i$. An optimal feature subset is the feature subset that maximizes the class separability.
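As a worked illustration of a divergence-based separability measure, the sketch below takes the divergence to be the symmetric Kullback-Leibler form and assumes that a single feature is Gaussian within each class, in which case the integral has a closed form. Both choices are assumptions made for the example, and the function name is hypothetical.

```python
def gaussian_divergence(mu_i, var_i, mu_j, var_j):
    """Symmetric KL divergence between two 1-D Gaussian class densities."""
    # Contribution from unequal variances.
    spread = 0.5 * (var_i / var_j + var_j / var_i - 2.0)
    # Contribution from the separation of the class means.
    shift = 0.5 * (mu_i - mu_j) ** 2 * (1.0 / var_i + 1.0 / var_j)
    return spread + shift

# Identical class densities are not separable at all.
d_same = gaussian_divergence(0.0, 1.0, 0.0, 1.0)
# Unit-variance classes whose means differ by 2.
d_apart = gaussian_divergence(0.0, 1.0, 2.0, 1.0)
```

The divergence is zero when the two class densities coincide and grows as the class means move apart relative to the variances, so a larger value indicates a more discriminative feature.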
III. RESULTS
The primary feature pool has now been formed, the features have been ranked using the mutual information technique, and the optimal feature set has been selected using the class separability measure. Table II shows the 12 features that form the optimal feature set. The table shows that the energy of wavelet coefficients, sample entropy and kurtosis are the most important features for PVL occurrence prediction.
TABLE II.
Optimal subset of the features which will be used for designing the classifier.
| Rank | Feature |
|---|---|
| 1 | 2 hour variations in HR |
| 2 | MAP sample entropy |
| 3 | 20 min variations in HR |
| 4 | Kurtosis HR |
| 5 | 1 min variations in SpO2 |
| 6 | 20 min variations in SpO2 |
| 7 | Kurtosis MAP |
| 8 | 1 min variations in MAP |
| 9 | Min SpO2 |
| 10 | HR sample entropy |
| 11 | 2 hour variations in MAP |
| 12 | RAP sample entropy |
In this step we develop our classifier in order to distinguish PVL patients from healthy subjects. The DT constructed from the optimal feature set to predict the occurrence of PVL is shown in Fig. 2. Investigating this DT shows that high-amplitude 20-minute variations and low sample entropy in the data are important factors for the prediction of PVL. Low sample entropy represents a lack of variability in the hemodynamic measurements, and constant blood pressure with small fluctuations is an important indicator of PVL occurrence.
Fig. 3.
Receiver operating characteristic (ROC) curve (plot of true positive rate vs. false positive rate).
The classification accuracy of the designed classifier is shown using the receiver operating characteristic (ROC) curve. A larger area under the curve indicates higher classification accuracy. The plot shows high classification accuracy, although the relatively small data set prevents the curve from being smooth.
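The construction of an ROC curve and its area can be sketched as follows. The scores and labels are hypothetical and are not taken from the study; the area is computed with the trapezoidal rule.

```python
def roc_curve(scores, labels):
    """Return (fpr, tpr) lists obtained by sweeping a threshold over scores."""
    pos = sum(labels)
    neg = len(labels) - pos
    order = sorted(zip(scores, labels), key=lambda p: -p[0])
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    for k, (s, y) in enumerate(order):
        if y:
            tp += 1
        else:
            fp += 1
        # Emit a point only when the next score differs (handles ties).
        if k + 1 == len(order) or order[k + 1][0] != s:
            fpr.append(fp / neg)
            tpr.append(tp / pos)
    return fpr, tpr

def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2
               for i in range(1, len(fpr)))

# Hypothetical classifier scores (higher means more likely PVL) and labels.
scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 0]
fpr, tpr = roc_curve(scores, labels)
area = auc(fpr, tpr)
```

Each subject contributes one vertical or horizontal step to the curve, which is why a small cohort yields a visibly stepped curve rather than a smooth one.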
In the final step of the work, we would like to know how long data must be collected for timely prediction of PVL. To the best of our knowledge, no study has so far determined the minimum length of data needed after neonatal heart surgery for PVL prediction. To this end, we trained and tested the DT with the first 2, 4, 6 and 8 hours of data and compared the prediction results with those for the complete data length. We used sensitivity, positive predictivity and their geometric mean, F-score, as criteria to evaluate the performance of the classifier. Table III shows that 6 hours of data contain sufficient information for reliable PVL prediction. Another way to interpret this result is that after 6 hours the chance to prevent PVL will decrease significantly. If validated further, this would clearly be very important information for clinical practice.
TABLE III.
Prediction accuracy for different lengths of data. TP: True Positive, FP: False Positive, TN: True Negative, FN: False Negative, Se: Sensitivity, PP: Positive Predictivity.
| DATA | TP | FP | FN | TN | Se | PP | F-Score |
|---|---|---|---|---|---|---|---|
| All | 19 | 2 | 1 | 22 | 95% | 90% | 0.93 |
| 8 Hours | 19 | 2 | 1 | 22 | 95% | 90% | 0.93 |
| 6 Hours | 18 | 1 | 2 | 23 | 90% | 95% | 0.93 |
| 4 Hours | 16 | 4 | 4 | 20 | 80% | 80% | 0.80 |
| 2 Hours | 15 | 4 | 5 | 20 | 75% | 79% | 0.76 |
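The evaluation criteria can be computed directly from the confusion counts, as in the sketch below, which uses the counts of the "All" row of Table III. The text describes the F-score as the geometric mean of sensitivity and positive predictivity; the harmonic mean, which is the more common definition of the F-score, is included for comparison. Values recomputed in this way can differ from the tabulated ones in the last digit. The function name is hypothetical.

```python
import math

def prediction_metrics(tp, fp, fn):
    """Sensitivity, positive predictivity and two ways of combining them."""
    se = tp / (tp + fn)                   # sensitivity (recall)
    pp = tp / (tp + fp)                   # positive predictivity (precision)
    return {
        "Se": se,
        "PP": pp,
        "geometric_mean": math.sqrt(se * pp),
        "harmonic_mean": 2 * se * pp / (se + pp),
    }

# Counts from the "All" row of Table III.
metrics_all = prediction_metrics(tp=19, fp=2, fn=1)
```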
Fig. 2.
Result of forming the DT from the optimal feature set derived from the vital measurements. The tree is pruned at two levels based on Fisher's exact test (FET).
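Fisher's exact test on the 2x2 table formed by a candidate split (branch taken versus PVL outcome) can be sketched as follows. The two-sided p-value sums the probabilities of all tables that are no more likely than the observed one. How the test is used for pruning is not detailed in the text, so the significance level of 0.05 and the counts below are assumptions for illustration only.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(k):
        # Hypergeometric probability of k in the top-left cell.
        return comb(row1, k) * comb(row2, col1 - k) / comb(n, col1)

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(k) for k in range(lo, hi + 1)
               if prob(k) <= p_obs * (1 + 1e-9))

# Hypothetical split: rows are the two branches, columns are PVL / no PVL.
p_value = fisher_exact_two_sided(3, 0, 0, 3)
keep_split = p_value < 0.05               # assumed significance level
```

Under this assumed rule, a split whose p-value is not below the chosen level would be collapsed into a leaf.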
ACKNOWLEDGMENT
The research reported in this paper is supported by a grant from the National Institutes of Health (No. 1 R01 NS 72338 01A1). The authors are very thankful to Dr. Erin Buckley, Mrs. Jennifer M. Lynch, and Peter J. Schwab from CHOP for their endless efforts in data collection.
References
- 1. Wernovsky G, Shillingford AJ, Gaynor JW. Central nervous system outcomes in children with complex congenital heart disease. Current Opinion in Cardiology. 2005;20(2):94–99. doi: 10.1097/01.hco.0000153451.68212.68.
- 2. Blackburn S. Central nervous system vulnerabilities in preterm infants, Part II. The Journal of Perinatal & Neonatal Nursing. 2009;23(2):108–110. doi: 10.1097/JPN.0b013e3181a3924b.
- 3. Miller SP, McQuillen PS, Hamrick S, Xu D, Glidden DV, Charlton N, Karl T, Azakie A, Ferriero DM, Barkovich AJ, Vigneron DB. Abnormal brain development in newborns with congenital heart disease. New England Journal of Medicine. 2007;357(19):1928–1938. doi: 10.1056/NEJMoa067393.
- 4. Licht DJ, Shera DM, Clancy RR, Wernovsky G, Montenegro LM, Nicolson SC, Zimmerman RA, Spray TL, Gaynor JW, Vossough A. Brain maturation is delayed in infants with complex congenital heart defects. The Journal of Thoracic and Cardiovascular Surgery. 2009;137(3):529–537. doi: 10.1016/j.jtcvs.2008.10.025.
- 5. Gaynor JW. Periventricular leukomalacia following neonatal and infant cardiac surgery. Seminars in Thoracic and Cardiovascular Surgery: Pediatric Cardiac Surgery Annual. 2004;7:133–140. doi: 10.1053/j.pcsu.2004.02.007.
- 6. McQuillen PS, Miller SP. Congenital heart disease and brain development. Annals of the New York Academy of Sciences. 2010;1184:68–86. doi: 10.1111/j.1749-6632.2009.05116.x.
- 7. van Haastert IC, de Vries LS, Eijsermans MJC, Jongmans MJ, Helders PJM, Gorter JW. Gross motor functional abilities in preterm-born children with cerebral palsy due to periventricular leukomalacia. Developmental Medicine and Child Neurology. 2008;50(9):684–689. doi: 10.1111/j.1469-8749.2008.03061.x.
- 8. Licht DJ, Wang J, Silvestre DW, Nicolson SC, Montenegro LM, Wernovsky G, Tabbutt S, Durning SM, Shera DM, Gaynor JW, Spray TL, Clancy RR, Zimmerman RA, Detre JA. Preoperative cerebral blood flow is diminished in neonates with severe congenital heart defects. The Journal of Thoracic and Cardiovascular Surgery. 2004;128(6):841–849. doi: 10.1016/j.jtcvs.2004.07.022.
- 9. Chen J, Zimmerman RA, Jarvik GP, Nord AS, Clancy RR, Wernovsky G, Montenegro LM, Hartman DM, Nicolson SC, Spray TL, Gaynor JW, Ichord R. Perioperative stroke in infants undergoing open heart operations for congenital heart disease. Annals of Thoracic Surgery. 2009;88(3):823–829. doi: 10.1016/j.athoracsur.2009.03.030.
- 10. Jalali A, Licht DJ, Nataraj C. Application of decision tree in the prediction of periventricular leukomalacia (PVL) occurrence in neonates after heart surgery. Proceedings IEEE International Conference Engineering in Medicine and Biology Society (EMBC). 2012 Aug;2012:5931–5934. doi: 10.1109/EMBC.2012.6347344.
- 11. Richman JS, Moorman JR. Physiological time-series analysis using approximate entropy and sample entropy. American Journal of Physiology: Heart and Circulatory Physiology. 2000;278(6):H2039–H2049. doi: 10.1152/ajpheart.2000.278.6.H2039.
- 12. Moorman JR, Delos JB, Flower AA, Cao H, Kovatchev BP, Richman JS, Lake DE. Cardiovascular oscillations at the bedside: early diagnosis of neonatal sepsis using heart rate characteristics monitoring. Physiological Measurement. 2011;32(11):1821–1832. doi: 10.1088/0967-3334/32/11/S08.
- 13. Lake DE, Richman JS, Griffin MP, Moorman JR. Sample entropy analysis of neonatal heart rate variability. American Journal of Physiology - Regulatory, Integrative and Comparative Physiology. 2002;283(3):R789–R797. doi: 10.1152/ajpregu.00069.2002.
- 14. Kappaganthu K. An integrative approach for machinery prognostics. Ph.D. dissertation. Villanova University; 2011.



