Abstract
Physiological data are routinely recorded in intensive care, but their use for rapid assessment of illness severity or long-term morbidity prediction has been limited. We developed a physiological assessment score for preterm newborns, akin to an electronic Apgar score, based on standard signals recorded noninvasively on admission to a neonatal intensive care unit. We were able to accurately and reliably estimate the probability of an individual preterm infant’s risk of severe morbidity on the basis of noninvasive measurements. This prediction algorithm was developed with electronically captured physiological time series data from the first 3 hours of life in preterm infants (≤34 weeks gestation, birth weight ≤2000 g). Extraction and integration of the data with state-of-the-art machine learning methods produced a probability score for illness severity, the PhysiScore. PhysiScore was validated on 138 infants with the leave-one-out method to prospectively identify infants at risk of short- and long-term morbidity. PhysiScore provided higher accuracy prediction of overall morbidity (86% sensitive at 96% specificity) than other neonatal scoring systems, including the standard Apgar score. PhysiScore was particularly accurate at identifying infants with high morbidity related to specific complications (infection: 90% at 100%; cardiopulmonary: 96% at 100%). Physiological parameters, particularly short-term variability in respiratory and heart rates, contributed more to morbidity prediction than invasive laboratory studies. Our flexible methodology of individual risk prediction based on automated, rapid, noninvasive measurements can be easily applied to a range of prediction tasks to improve patient care and resource allocation.
Introduction
Early, accurate prediction of a neonate’s morbidity risk is of significant clinical value because it allows for customized medical management. The standard Apgar score has been used for more than 50 years to assess neonatal well-being and the need for further medical management. We aimed to develop a modern tool akin to an “electronic” Apgar assessment that reflects a newborn’s physiological status and is predictive of future illness severity. Such an improvement in neonatal risk stratification may better inform decisions regarding aggressive use of intensive care, need for transport to tertiary centers, and resource allocation, thus potentially reducing the estimated $26 billion per year in U.S. health care costs resulting from preterm birth (1). Gestational age and birth weight are highly predictive of death or disability (2) but do not estimate individual illness severity or morbidity risk (3). These perinatal risk factors, in addition to laboratory measurements, have been incorporated into currently used algorithms for mortality risk assessment of preterm infants (4–6). These algorithms, however, predict mortality rather than morbidity (3). They also rely on invasive testing and require extraction of data from multiple sources to make a risk assessment.
Although it has been recognized that changes in heart rate characteristics (7) or variability (8) can suggest impending illness and death in a range of clinical scenarios, from sepsis (9) in intensive care patients to fetal intolerance of labor (10), the predictive accuracy of a single parameter is limited. Intensive care providers observe multiple physiological signals in real time to assess health, but certain informative patterns may be subtle. To achieve improved accuracy and speed of individual morbidity prediction for preterm neonates, we developed a new probability score (PhysiScore) based on physiological data obtained noninvasively after birth along with gestational age and birth weight. Two recent advances enabled the use of multiple complex physiological signals for this purpose: the digitization of medical records, which allows linking of real-time physiological signals with later outcomes, and the increasing sophistication of machine learning and pattern recognition algorithms, which allows optimization of PhysiScore in an automated, unbiased manner. We evaluated PhysiScore’s use for predicting overall morbidity and mortality, specific risk for infants with infection or cardiovascular and pulmonary complications, and a combination of complications associated with poor long-term neurodevelopment and compared its performance to standard scoring systems in a preterm neonatal cohort.
Results
PhysiScore Development Based on Patient Characteristics and Morbidities
To develop our prediction tool, we studied a total of 138 preterm neonates who were 34 weeks gestational age or less and ≤2000 g in weight, without major congenital malformations; their baseline characteristics and morbidities are shown in Table 1. Mean birth weight was 1367 g at an estimated mean gestational age of 29.8 weeks, placing these infants at significant risk of both short- and long-term complications.
Table 1.
Category | Subcategory | N |
---|---|---|
Subjects | | 138 |
Birth weight, g | | 1367±440 |
Gestational age, wks | | 29.8±3 |
Gender, female | | 68 |
Apgar score at 5 min | | 7±3 |
SGA (≤5th percentile) | | 7 |
Multiple gestation | Total | 46 |
 | Twins | 20 |
 | Triplets | 6 |
Respiratory distress syndrome | | 112 |
Pneumothorax | | 10 |
Bronchopulmonary dysplasia | Total | 29 |
 | NOS* | 2 |
 | Mild | 12 |
 | Moderate | 5 |
 | Severe | 10 |
Pulmonary hemorrhage | | 2 |
Pulmonary hypertension | | 3 |
Acute hemodynamic instability | | 11 |
Retinopathy of prematurity† | Total | 25 |
 | Stage I | 9 |
 | Stage II | 12 |
 | Stage III | 4 |
Intraventricular hemorrhage‡ | Total | 34 |
 | Grade 1 | 19 |
 | Grade 2 | 7 |
 | Grade 3 | 3 |
 | Grade 4 | 5 |
Post hemorrhagic hydrocephalus | | 6 |
Culture positive sepsis | | 11 |
Necrotizing enterocolitis | Total | 8 |
 | Stage 1 | 2 |
 | Stage 2 | 4 |
 | Stage 3 | 2 |
Expired | | 4 |
SGA, small for gestational age; NOS, not otherwise specified.
*Infants with oxygen requirement at 28 days for whom oxygen requirement was not known at 36 weeks post menstrual age.
†ROP is counted by the most severe stage in either eye during the hospitalization.
‡IVH is counted by the most severe grade in either cerebral hemisphere by Papile classification.
Patients were then classified as high morbidity (HM) or low morbidity (LM) on the basis of their illnesses. The HM group was defined as any patient with major complications associated with short- or long-term morbidity. Short-term morbidity complications included culture-positive sepsis, pulmonary hemorrhage, pulmonary hypertension, and acute hemodynamic instability. Long-term morbidity was defined by moderate or severe bronchopulmonary dysplasia (BPD), retinopathy of prematurity (ROP) stage 2 or greater, intraventricular hemorrhage (IVH) grade 3 or 4, and necrotizing enterocolitis (NEC) on the basis of the strong association of these complications with adverse neurodevelopmental outcome. Death was also included in the long-term morbidity group. Most infants in the HM category had short- and long-term complications affecting multiple organ systems. Infants with only common problems of prematurity such as mild respiratory distress syndrome (RDS) and patent ductus arteriosus (PDA) without major complications were classified as LM.
Probabilistic Score for Illness Severity
We developed a method to estimate the probability that an infant would be in the HM category on the basis of physiological signals recorded in the first 3 hours of life plus gestational age and birth weight. This time period was selected for analysis because it is less likely to be confounded by medical interventions and provides prediction early enough in the infant’s life to be useful for planning therapeutic strategy.
First, we processed the physiological signals (heart rate, respiratory rate, and oxygen saturation) that were recorded for all infants for the first 3 hours after birth. Mean values plus baseline and residual variability signals (capturing both short- and long-term variability) were calculated for heart and respiratory rates. Mean oxygen saturation and the ratio of hypoxia (oxygen saturation <85%) to normoxia over the 3-hour span were calculated.
We then defined the probability for illness severity with a logistic function that aggregated individual risk features as
$$
P(\mathrm{HM} \mid v_1, v_2, \ldots, v_n) = \frac{1}{1 + \exp\left(-\left(b + w_0 c + \sum_{i=1}^{n} w_i f(v_i)\right)\right)} \tag{1}
$$

where n was the number of risk factors and c = log [P(HM)/P(LM)] was the a priori log odds ratio. The ith characteristic, vi (physiological parameter, gestational age, or weight), was used to derive a numerical risk feature f(vi) via nonlinear Bayesian modeling (detailed in Materials and Methods). The score parameters b and w (w0, w1, ..., wn) were learned from the training data for use in prospective risk prediction. The parameter wi represents the weight of the contribution of the ith characteristic to the computed probability score, with higher weight characteristics having a greater effect.
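The logistic aggregation described above can be illustrated with a minimal Python sketch. It assumes the standard logistic sign convention (a larger weighted sum gives a higher probability of HM) and treats the weight on the prior term as a parameter; the function name `physiscore` and the argument `prior_weight` are hypothetical and are not taken from the study software.

```python
import math


def physiscore(features, weights, bias, prior_log_odds, prior_weight=1.0):
    """Probability of high morbidity from per-factor risk features.

    features       -- risk features f(v_i), one per patient characteristic
    weights        -- learned weights w_i, same order as `features`
    bias           -- learned offset b
    prior_log_odds -- c = log(P(HM) / P(LM))
    prior_weight   -- weight on the prior term (assumed; hypothetical default)
    """
    z = bias + prior_weight * prior_log_odds
    z += sum(w * f for w, f in zip(weights, features))
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# With no informative features the score falls back to the cohort prior:
# 35 HM and 103 LM infants give P(HM) = 35 / 138.
baseline = physiscore([], [], bias=0.0, prior_log_odds=math.log(35 / 103))
```

With all feature weights at zero the score reduces to the prior, so each nonzero weighted feature can be read as shifting an infant's risk away from the cohort baseline.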
PhysiScore is a probability score that ranges from 0 to 1, with higher scores indicating higher morbidity. PhysiScore is calculated by integrating the following 10 patient characteristics into Eq. 1: mean heart rate, base and residual variability; mean respiratory rate, base and residual variability; mean oxygen saturation and cumulative hypoxia time; gestational age and birth weight. Each of these patient characteristics carries a specific learned weight, as denoted by w in Eq. 1. The receiver operating characteristic (ROC) curve (Fig. 1A) and associated area under the curve (AUC) values (Table 2) show that PhysiScore exhibits good discriminative ability for prediction of morbidity and mortality risk and allow comparison with other risk assessment tools. Specifically, PhysiScore was compared to the Apgar score, long used as an indicator for the base physiological state of the newborn (11), as well as to extensively validated neonatal scoring systems that require invasive laboratory measurements [Score for Neonatal Acute Physiology-II (SNAP-II) (5), SNAP Perinatal Extension-II (SNAPPE-II) (5), and Clinical Risk Index for Babies (CRIB) (6)]. For making predictions with the Apgar score, we constructed a model as in Eq. 1 using the 1- and 5-min Apgar scores as the only two inputs; this combined model outperformed either of the two Apgar scores when used in isolation. PhysiScore (AUC 0.9197) performed well across the entire range of the ROC curve and significantly better (P < 0.003) (12) than all four of the other comparison scores (Table 2). PhysiScore’s largest performance gain occurred in the high-sensitivity/specificity region of the ROC curve. Setting a user-defined threshold based on desired sensitivity and specificity allows optimization for individual settings. For example, in our neonatal intensive care unit (NICU), a threshold of 0.5 achieves sensitivity of 86% at a specificity of 95% for HM as seen in Fig. 1A (inset panel). Alternatively, using a lower threshold would improve sensitivity at the expense of specificity.
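To make the threshold trade-off concrete, the following Python sketch computes sensitivity and specificity at a chosen cutoff and the AUC through its rank interpretation (the probability that a randomly chosen HM infant scores higher than a randomly chosen LM infant). The function names, the convention that a score at or above the threshold counts as a positive call, and the toy scores are illustrative assumptions, not study data.

```python
def sensitivity_specificity(scores, labels, threshold=0.5):
    """Sensitivity and specificity when score >= threshold is called HM.

    labels use 1 for HM and 0 for LM.
    """
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y == 1 and s >= threshold)
    fn = sum(1 for s, y in pairs if y == 1 and s < threshold)
    tn = sum(1 for s, y in pairs if y == 0 and s < threshold)
    fp = sum(1 for s, y in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical scores for three HM (label 1) and three LM (label 0) infants.
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]
at_half = sensitivity_specificity(toy_scores, toy_labels, threshold=0.5)
at_low = sensitivity_specificity(toy_scores, toy_labels, threshold=0.2)
```

On the toy data, lowering the threshold from 0.5 to 0.2 captures every HM infant but misclassifies more LM infants, which is the trade-off a unit would tune for its own setting.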
Table 2.
Prediction task (AUC) | Apgar | SNAP-II | SNAPPE-II | CRIB | PhysiScore |
---|---|---|---|---|---|
Predicting high morbidity | 0.6978 | 0.8298 | 0.8795 | 0.8509 | 0.9151 |
Infection | 0.7412 | 0.8428 | 0.9087 | 0.8956 | 0.9733 |
Cardiopulmonary | 0.7198 | 0.8592 | 0.9336 | 0.9139 | 0.9828 |
We added the values obtained from laboratory tests to determine the magnitude of their contribution to risk prediction beyond the PhysiScore alone (Fig. 1B), incorporating parameters included in standard risk prediction scores (for example, SNAPPE-II): white blood cell count, band neutrophils, hematocrit, platelet count, and initial blood gas measurement of PaO2 (partial pressure of oxygen, arterial), PaCO2 (partial pressure of carbon dioxide, arterial), and pH (if available at <3 hours of age). No additional discriminatory power was achieved, suggesting that laboratory information is largely redundant with the patient’s physiological characteristics.
To further assess performance of PhysiScore, we analyzed prediction performance for major categories of morbidities contained in the HM categorization. Specifically, we extracted two categories: infection – NEC, culture positive sepsis, urinary tract infection, pneumonia (Fig. 1C); and cardiopulmonary complications – BPD, hemodynamic instability, pulmonary hypertension, pulmonary hemorrhage (Fig. 1D). Plotting data from the HM category infants who had a specific complication against all infants in the LM category yields ROC curves for discriminative ability for these independent morbidity categories (Fig. 1C, D). Comparison to SNAPPE-II (the best performing standard score) is also shown; AUCs were calculated for all scoring methods (Table 2) in these specifically defined sets. At a threshold of 0.5, PhysiScore achieves near-perfect performance (infection: 90% sensitivity at 100% specificity, cardiopulmonary: 96% at 100%).
Morbidity is most difficult to predict in patients with isolated IVH, for which all scores exhibit decreased sensitivity. The PhysiScore AUC for any IVH was 0.8092, whereas SNAP-II, SNAPPE-II, and CRIB had AUCs of 0.6761, 0.6924, and 0.7508, respectively. PhysiScore did not identify three infants who had severe IVH (grade 3 or 4) in the absence of any other HM complications. However, most infants who developed IVH can be found on the left side of the ROC, suggesting that PhysiScore offers high sensitivity without significant compromise in specificity (Fig. 2).
Importance of Physiological Features
Ablation analysis (comparison of model performance when different subsets of risk factors are included) was used to examine the contribution of score subcomponents in predicting HM versus LM. As expected, gestation and birth weight alone achieved reasonable predictive performance (AUC 0.8517). However, these two characteristics are not sufficient for individual risk prediction (3). Notably, physiological parameters alone were more predictive than laboratory values alone (AUC, 0.8540 versus 0.7710, respectively). Adding physiological parameters to gestation and birth weight (that is, PhysiScore) increased the AUC to 0.9129, a significantly (P < 0.01) (12) better prediction than gestation and birth weight alone. Adding laboratory values to PhysiScore did not significantly increase the AUC (0.9197), again suggesting that laboratory data are largely redundant with the physiological parameters in morbidity prediction.
Examination of the learned weights (wi in Eq. 1) of individual physiological parameters incorporated into PhysiScore (Fig. 3A) demonstrated that short-term heart and respiratory rate variability make a significant contribution to the value of the PhysiScore, but long-term variability did not. Thus, short-term variability patterns, which are often difficult to see by eye but easily calculated by PhysiScore, carried significant physiological information that long-term variability patterns did not.
Only three categories of commonly obtained physiological measurements were required for PhysiScore: heart rate, respiratory rate and oxygen saturation. From these measures, using Bayesian modeling, individual curves were obtained that convey the probability of high morbidity associated with individually calculated physiological parameters (Fig. 3B).
As expected, a respiratory rate between 35 and 75 breaths per minute had a greater probability of being associated with health, whereas higher or lower rates carried a greater probability of morbidity. A decreased short-term heart rate variability also indicated increased risk, consistent with previous findings linking this parameter to sepsis (9). This visual analysis of the nonlinear relationships seen in Fig. 3B also suggests unexpected associations. Short-term respiratory rate variability, not commonly used as a physiological marker, was associated with increased morbidity risk. Unlike residual heart rate variability, its effect was nonmonotonic. Risk curves describing oxygen saturation suggest, respectively, that risk increases significantly with mean saturations less than 92% and prolonged time spent (>5% total time) at oxygen saturations below 85%. Oxygenation is routinely manipulated by physician intervention, suggesting that intervention failure (for example, the inability to keep saturations in a specific range) that allows desaturations lasting for >5% of total time is associated with higher morbidity risk, a threshold that can now be prospectively assessed in clinical trials.
Discussion
We have developed a risk stratification method that predicts morbidity for individual preterm neonates by integrating multiple continuous physiological signals from the first 3 hours of life. This score is analogous to the Apgar score (11), in that only physiological observations are used to derive morbidity and mortality predictions. However, the use of time series data combined with automated score calculation yields significantly more information about illness severity than is provided by the Apgar score.
Discriminative Capacity
Past efforts have resulted in several illness severity scores that use laboratory studies and other perinatal data to achieve improved discriminative ability over the Apgar score alone. For all of the available neonatal illness scores, much of the discriminative ability comes from gestational age and birth weight. Nevertheless, it is well-recognized that age- and weight-matched neonates may have significantly different morbidity profiles (3). The CRIB score uses logistic regression to define six factors and their relative weights in predicting mortality: birth weight, gestational age, congenital malformation, maximum base deficit in the first 12 hours, plus minimum and maximum FiO2 (fraction of inspired oxygen) in the first 12 hours (6). SNAP-II and SNAPPE-II were both derived from SNAP. SNAP uses 34 factors identified by experts as important in the first 24 hours of life (specific laboratory data, minimum and maximum vital sign values, and other clinical signs). The resulting score correlated well with birth weight, mortality, length of stay, nursing acuity, and physician estimates of mortality, but was complex to calculate (4). Logistic regression performed on the 34 factors in SNAP identified six variables most predictive of mortality that were recorded in the first 12 hours of life (lowest mean blood pressure, lowest core body temperature, lowest serum pH, multiple seizures, urine output, and FiO2/PaO2 ratio); these were retained in SNAP-II. SNAPPE-II is calculated with the same data as SNAP-II, along with the 5-min Apgar score, small for gestational age status, and birth weight. The additional variables present in SNAPPE-II were found to be independent risk factors for mortality (5). None of these scores, however, discriminate morbidity risk as well as PhysiScore, which integrates a small set of continuous physiological measures calculated directly from standard vital sign monitors.
An intriguing aspect of our findings is that PhysiScore provides high-accuracy predictions about morbidity risk from limited initial data (only 3 hours), even when such outcomes manifest days or weeks later (for example, BPD or NEC). PhysiScore gives positive weight to loss of short-term heart rate variability, much in the way that fetal heart rate monitoring uses loss of short-term heart rate variability to predict fetal distress and guide delivery management (13). PhysiScore additionally identifies short-term respiratory variability as having high predictive value, suggesting that further exploration of this factor in other settings might be warranted. Although the precise source of variability loss, whether prenatal or postnatal, is unknown, autonomic dysregulation likely plays a role. Whether short-term variability loss causes morbidity or is simply a marker of illness is not clear at this point.
Unlike fetal heart rate monitoring or heart rate spectral analysis (14) in the neonate, our approach uses multiple physiological parameters to improve accuracy and provide long-term predictions that extend beyond acute risk. Unlike biomarkers, such predictions are made with data that are already being collected in NICUs. Patient oxygenation, heart rate, and respiratory rate can be automatically processed to compute a score, and a predetermined sensitivity/specificity threshold can be used to make morbidity predictions to guide clinical actions, thereby removing the need for end-user expertise. When integrated into a bedside monitor, the algorithm would indicate the statistical likelihood that an individual patient is at high risk of major morbidities, allowing real-time use of the PhysiScore calculation. This method of deployment would effectively provide an automated electronic Apgar score, with significantly higher predictive accuracy regarding neonatal morbidity.
PhysiScore’s ability to assess physiologic disturbances before they can be confounded by medical intervention makes it especially descriptive of initial patient acuity; thus, it is well suited as a tool for quality assessment between NICUs. Identification of a patient’s future risk of developing high-morbidity complications may be particularly useful in primary nurseries, where it could inform decisions regarding aggressive use of intensive care, need for transport to higher levels of care, and resource allocation. Such economic, social, and medical advantages should be evaluated in a large-scale clinical trial.
Technical Considerations
Although we have a relatively small sample size, analysis methods appropriate to small sample sizes (15) were used, and ROC curves were made only for morbidities seen in >10% of our population. Our model, with its automatic factor modeling and selection, requires essentially no parameter tuning, which greatly helps to prevent overfitting in small samples.
In addition, our sample is from a single tertiary care center and was limited to patients born in our institution to ensure that continuous physiological data were available for the first hours of life. Validation in other settings will be required.
Detection of IVH remains elusive in the field of neonatal medicine. Previous work reported that fractal analysis of the original newborn heartbeat may be an early indicator of IVH (14), but yielded no better sensitivity than PhysiScore. It is possible that the underlying pathophysiology of IVH is variable (16), particularly in infants in whom severe IVH is the only morbidity. Although IVH is usually associated with cardiopulmonary instability, recent literature suggests that there may be genetic predisposition to isolated IVH, potentially limiting the role of antecedent physiological signals before large hemorrhages (17). Thus, it is possible that the small number of infants with isolated IVH that were not identified as high risk by PhysiScore represents a distinct subpopulation.
Advanced Computational Techniques in Modern Medical Settings
The use of computer-based techniques to integrate and interpret patterns in patient data to automate morbidity prediction has the potential to improve medical care. The current U.S. governmental mandate to improve electronic health record use and gain economic benefit from using digital data (18) facilitates the use of computer-based tools. Flexible Bayesian modeling with almost no tunable parameters allows our approach to be easily applied to a range of different prediction tasks, allowing use of the highly informative but underused data obtained daily for thousands of acutely ill patients.
Materials and Methods
Ethics Statement
All work was performed under protocol 8312 approved by Stanford’s Panel on Human Subjects. Waiver of Individual Authorization was approved under 45 CFR 164.512(i)(2)(ii)(A),(B),(C) on the basis that the data collection was part of routine care, no intervention or interaction with the patients occurred and the data was processed anonymously.
General Study Strategy
After enrollment, we used a subset of patients (n = 12) to develop physiologic data processing methods. We combined state-of-the-art techniques from machine learning to build our framework that (i) processed these physiological parameters using nonlinear models, (ii) used regularization to do automatic feature selection, and (iii) combined relevant weights using multivariate logistic regression to produce the predictive PhysiScore (physiological features plus birth weight and gestational age). This framework has essentially no tunable parameters. Thus, unlike traditional frameworks that require separate feature selection and modeling steps followed by model testing using data, our framework combined these steps to allow direct testing of the predictive ability of this score on all 138 subjects by the leave-one-out method (15) to prospectively identify infants at high risk of severe complications.
Study Population
Inborn infants admitted to the NICU of Lucile Packard Children’s Hospital from March 2008 to March 2009 were eligible for enrollment. A total of 145 preterm infants met inclusion criteria: gestational age ≤34 completed weeks, birth weight ≤2000 g, and availability of cardiorespiratory (CR) monitor data within the first 3 hours of birth. Seven infants found to have major malformations were subsequently excluded. Thirty-five neonates had HM complications. Of these, 32 had long-term morbidities (moderate or severe BPD, ROP stage 2 or greater, grade 3 or 4 IVH, and/or NEC). Four neonates died after the first 24 hours of life. The remaining 103 preterm neonates had only common problems of prematurity (RDS and/or PDA) and were therefore considered LM. Five infants with a <2-day history of mechanical ventilation for RDS, but no other early complications, were transferred before ROP evaluation and marked as LM.
Outcome Annotation
Electronic medical records, imaging studies, and laboratory values were reviewed by pediatric nurses and verified by a physician. All significant illnesses during the hospitalization were recorded. Morbidities were identified with previously described criteria: BPD (19), ROP (20), NEC (21), and IVH (22). For IVH and ROP, the highest unilateral grade or stage was recorded, respectively. Acute hemodynamic instability was also noted: hypotension (defined as a mean arterial blood pressure less than gestational age or poor perfusion) requiring ≥3 days of pressor support or adrenal insufficiency requiring hydrocortisone.
Physiologic Signal Processing
Time series heart rate, respiratory rate, and oxygen saturation data are collected from all CR monitors. Heart and respiratory rate signals are processed to compute a base and residual signal. The base signal represents a smoothed, long-term trend; it is computed with a moving average window of 10 min. The residual signal is obtained by taking the difference between the original signal and the base signal; it characterizes short-term variability most likely linked to sympathetic function (Fig. 4). The variance features were motivated by analysis using the model in (23) on our preliminary set of 12 patients. For heart and respiratory rates, we compute the base signal mean, base signal variance, and residual signal variance. For the oxygen saturation, we compute the mean and the ratio of the time the oxygen saturation is below 85%.
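The decomposition above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the study's implementation: the moving average is centered and shrinks at the recording edges, the window is given in samples through a hypothetical `half_width` argument (the sampling rate is not stated here, so converting 10 min to samples is left to the caller), and the oxygen saturation feature is read as the fraction of samples below 85%.

```python
from statistics import fmean, pvariance


def moving_average(signal, half_width):
    """Centered moving average over 2 * half_width + 1 samples.

    Near the start and end of the recording the window is truncated
    to the samples that exist (an assumed edge treatment).
    """
    out = []
    for i in range(len(signal)):
        lo = max(0, i - half_width)
        hi = min(len(signal), i + half_width + 1)
        out.append(fmean(signal[lo:hi]))
    return out


def rate_features(signal, half_width):
    """Base mean, base variance, and residual variance of a rate signal."""
    base = moving_average(signal, half_width)
    residual = [x - b for x, b in zip(signal, base)]
    return {
        "base_mean": fmean(base),
        "base_var": pvariance(base),
        "residual_var": pvariance(residual),
    }


def spo2_features(spo2, hypoxia_threshold=85.0):
    """Mean saturation and fraction of samples below the hypoxia threshold."""
    below = sum(1 for s in spo2 if s < hypoxia_threshold)
    return {"mean": fmean(spo2), "frac_below": below / len(spo2)}
```

For a heart rate that alternates between 119 and 121 beats per minute on every sample, nearly all of the variance lands in the residual signal and almost none in the base signal, which is the separation between short-term and long-term variability that the features rely on.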
Statistical Methods
Sensitivity, specificity, AUC, and significance values (12) were computed for all comparisons. All statistical analyses were performed with software developed for this project (available for academic use upon request). We used the leave-one-out method for all evaluations. With this method, predictive accuracy was evaluated for each patient separately. For each patient, we learned the model parameters with the data from all other patients as the training set and evaluated predictive accuracy on the held-out patient. This technique was repeated for each subject, so that each subject’s clinical data were prospectively obtained. This method of performance evaluation is computationally intensive but is a well-established statistical method for measuring performance when the sample set size is limited (15).
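The leave-one-out procedure can be written generically, as in the Python sketch below. The function name and the `fit` and `predict` callables are hypothetical placeholders for whatever model is being evaluated; the only property the sketch is meant to show is that the held-out patient never contributes to the parameters used to score that patient.

```python
def leave_one_out_scores(records, labels, fit, predict):
    """Score every patient with a model trained on all other patients.

    records -- list of per-patient inputs
    labels  -- list of outcomes (1 for HM, 0 for LM), same order
    fit     -- callable(train_records, train_labels) -> model
    predict -- callable(model, record) -> score
    """
    scores = []
    for i in range(len(records)):
        # Training set: everyone except patient i.
        train_records = records[:i] + records[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        model = fit(train_records, train_labels)
        scores.append(predict(model, records[i]))
    return scores
```

Because the model is refit once per patient, the cost grows with the cohort size times the cost of one fit, which is the computational burden noted above.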
Nonlinear Models of Risk Factors
To implement Eq. 1, we must determine how to integrate continuous-valued risk factors, including the physiological measurements, into our risk model. Several approaches exist in the literature. One common approach is to define a “normal” range for a measurement and use a binary indicator whenever the measurement is outside that range. Although this approach can most easily be implemented in a clinical setting, it provides only coarse-grained distinctions derived from extreme values. Another approach is to predetermine a particular representation of the continuous-valued measurement, usually either the feature itself, or a quadratic or logarithmic transformation, as selected by an expert (24, 25).
We used a different approach based on a Bayesian modeling paradigm (26). This approach captures the nonlinear relationships between the risk factor and the outcome and takes into account that the overall behavior of a factor can vary greatly between sickness categories. For each risk factor vi, we separately learned a parametric model of the distribution of observed values in the training set P(vi|C) for each class of patient C (HM and LM). The parametric model is selected and learned with maximum-likelihood estimation (Fig. 5) from the set of long-tailed probability distributions of exponential, Weibull, lognormal, normal, and gamma. Specifically, for each parametric class, we fit the maximum likelihood parameters and then select the parametric class that provides the best (highest likelihood) fit to the data. The log odds ratio of the risk imposed by each factor was incorporated into the model.
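A simplified Python sketch of this fit-and-select step follows. To stay self-contained it covers only the three families with closed-form maximum-likelihood estimates (normal, lognormal, exponential); Weibull and gamma, which need numerical optimization, are omitted. All function names are hypothetical, measurements are assumed to be strictly positive, and each class is assumed to contain at least two distinct values.

```python
import math
from statistics import fmean


def fit_normal(xs):
    """Return the log-density of the maximum-likelihood normal fit."""
    mu = fmean(xs)
    var = fmean([(x - mu) ** 2 for x in xs])
    return lambda x: -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)


def fit_lognormal(xs):
    """Normal fit on log(x), with the change-of-variables term -log(x)."""
    log_density = fit_normal([math.log(x) for x in xs])
    return lambda x: log_density(math.log(x)) - math.log(x)


def fit_exponential(xs):
    """Exponential fit; the maximum-likelihood rate is 1 / mean."""
    rate = 1.0 / fmean(xs)
    return lambda x: math.log(rate) - rate * x


FAMILIES = {
    "normal": fit_normal,
    "lognormal": fit_lognormal,
    "exponential": fit_exponential,
}


def best_fit(xs):
    """Fit every family and keep the one with the highest log-likelihood."""
    best_name, best_logpdf, best_ll = None, None, -math.inf
    for name, fit in FAMILIES.items():
        logpdf = fit(xs)
        ll = sum(logpdf(x) for x in xs)
        if ll > best_ll:
            best_name, best_logpdf, best_ll = name, logpdf, ll
    return best_name, best_logpdf


def risk_feature(v, hm_values, lm_values):
    """Log odds ratio log P(v | HM) / P(v | LM) from class-wise fits."""
    _, hm_logpdf = best_fit(hm_values)
    _, lm_logpdf = best_fit(lm_values)
    return hm_logpdf(v) - lm_logpdf(v)
```

Because the HM and LM classes are fitted separately, they may end up with different distribution families for the same measurement, and the resulting log odds ratio can be nonmonotonic in the measured value.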
An important advantage of our approach is that explicit missing data assumptions can be incorporated. When standard laboratory results (e.g., complete blood count) are not recorded, we assume that they are missing at random and not correlated with outcome. Their contribution is 0 if missing and log P(vi|HM)/P(vi|LM) otherwise. Blood gas measurements, however, are likely obtained only for profoundly ill patients and hence are not missing at random. Thus, for each measurement type i we define mi = 1 if measurement vi is missing and mi = 0 otherwise. We now learn the distribution P(mi|C), the chance that measurement i is missing for each patient category C, and P(vi|C, mi = 0), the distribution of the observed measurements as described above. The factor contribution for measurement i is computed as

$$
f(v_i) =
\begin{cases}
\log \dfrac{P(m_i = 1 \mid \mathrm{HM})}{P(m_i = 1 \mid \mathrm{LM})} & \text{if } m_i = 1, \\[2ex]
\log \dfrac{P(m_i = 0 \mid \mathrm{HM})\, P(v_i \mid \mathrm{HM}, m_i = 0)}{P(m_i = 0 \mid \mathrm{LM})\, P(v_i \mid \mathrm{LM}, m_i = 0)} & \text{if } m_i = 0.
\end{cases} \tag{2}
$$
This formulation allows us to account both for the observed measurement, if present, and for the likelihood that a particular measurement might be taken for patients in different categories.
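The two missing-data treatments can be contrasted in a short Python sketch. The function names are hypothetical, `None` stands in for a measurement that was not taken, and the class-conditional log-densities are passed in as callables; the missingness probabilities would be estimated from the training set and are simply arguments here.

```python
import math


def contribution_missing_at_random(value, hm_logpdf, lm_logpdf):
    """Routine laboratory values: a missing result contributes nothing."""
    if value is None:
        return 0.0
    return hm_logpdf(value) - lm_logpdf(value)


def contribution_informative_missingness(
    value, p_missing_hm, p_missing_lm, hm_logpdf, lm_logpdf
):
    """Measurements such as blood gases, where being taken is itself a signal.

    p_missing_hm, p_missing_lm -- P(m_i = 1 | HM) and P(m_i = 1 | LM)
    """
    if value is None:
        # Only the missingness pattern carries information.
        return math.log(p_missing_hm / p_missing_lm)
    # Evidence from the test having been ordered plus the observed value.
    taken = math.log((1.0 - p_missing_hm) / (1.0 - p_missing_lm))
    return taken + hm_logpdf(value) - lm_logpdf(value)
```

If, for illustration, a blood gas were missing for 20% of HM infants and 80% of LM infants, the absence of a result alone would contribute log(0.25), lowering the estimated risk before any value is seen.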
This approach has additional advantages. Putting all factors in a probabilistic framework provides a comparable representation for different risk factors, allowing them to be placed within a single, integrated model. Utilizing a parametric representation of each continuous measurement alleviates issues arising from data scarcity. Uncovering the dependence between the risk factor and the illness category automatically reduces data requirement by eliminating the need for cross-validation to select the appropriate form. Unlike most previous methods, we utilized different parametric representations for patients in different categories, better capturing disease-induced changes in patient physiology. Finally, we obtained an interpretable visual summary of the likelihood of low patient morbidity over the range of values for each factor (Fig. 3B).
Learning the PhysiScore parameters
To learn the score parameters b and w’s, we maximized the log likelihood of the data in the training set with a ridge penalty as
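$$\max_{b,\,\mathbf{w}} \; \sum_{j} \log P\!\left(C^{(j)} \mid v^{(j)}_1, \ldots, v^{(j)}_n;\; b, \mathbf{w}\right) \;-\; \lambda \sum_{i} w_i^{2} \qquad (3)$$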
The ridge penalty reduces spurious data dependence by enabling automatic factor selection to control model parsimony and prevents overfitting (27, 28). The hyperparameter λ controls the complexity of the selected model and was set to 1.2 in our experiments. This value was selected early in our development using random 70/30 cross-validation splits; experimental analysis showed that the results were not sensitive to the choice of this parameter.
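As a rough illustration of this learning step, the sketch below fits a bias b and weights w by gradient ascent on a ridge-penalized log likelihood. It assumes a logistic model over the per-factor log-odds contributions and leaves the bias unpenalized; the optimizer, step size, iteration count, function names, and toy data are all hypothetical choices, and only λ = 1.2 comes from the text.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(b, w, x):
    """P(HM | factor contributions x) under the assumed logistic model."""
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))

def fit_ridge_logistic(features, labels, lam=1.2, lr=0.05, steps=2000):
    """Maximize sum_j log P(y_j | x_j) - lam * sum_i w_i^2 by gradient ascent.

    features: one list of per-factor log-odds contributions per patient.
    labels:   1 for high morbidity (HM), 0 for low morbidity (LM).
    """
    b = 0.0
    w = [0.0] * len(features[0])
    for _ in range(steps):
        grad_b = 0.0
        grad_w = [-2.0 * lam * wi for wi in w]  # gradient of the ridge penalty
        for x, y in zip(features, labels):
            err = y - predict(b, w, x)          # gradient of the log likelihood
            grad_b += err
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
        b += lr * grad_b
        w = [wi + lr * gi for wi, gi in zip(w, grad_w)]
    return b, w

# Toy data: the first factor separates the classes, the second is noise.
X = [[2.0, 0.5], [1.5, -0.2], [1.0, 0.3], [-1.8, 0.1], [-1.2, -0.4], [-2.2, 0.2]]
y = [1, 1, 1, 0, 0, 0]

b, w = fit_ridge_logistic(X, y)                       # lam = 1.2 as in the text
b_strong, w_strong = fit_ridge_logistic(X, y, lam=10.0)
```

Comparing `w` with `w_strong` shows the effect of the hyperparameter: a larger λ shrinks the weights toward zero, trading fit on the training set for a simpler model.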
Acknowledgments
We thank E. Helfenbein (Philips Healthcare) for help in setting up a monitor data capture system; N. Llerena for data extraction; A. Pasca for figure production; D. Vickrey for insightful discussions; J. Hall, B. Kogut, P. Hartsell, and A. White for data annotation; and P. Prasad and Clinical Technology and Biomedical Engineering at Stanford for technical support.
Funding: S.S. is a Rambus Fellow in Computer Science. A.K.R. is a Susan and Lynn Orr Fellow in Neonatology. This work was supported by the Stanford University Department of Computer Science pilot project funds. The funders had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Footnotes
Author contributions: S.S., D.K., and A.A.P. conceived the study. All authors contributed to study design. S.S. and D.K. designed the methods and performed the analysis. A.K.R. extracted the medical annotations from the medical records. S.S. implemented the model and generated the figures. All authors contributed to the preparation of the manuscript and approved the final version.
Competing interests: The authors declare that they have no competing interests.
References
- 1. Behrman R, Butler A, editors. Preterm Birth: Causes, Consequences, and Prevention. National Academies Press; Washington, DC: 2007.
- 2. Robertson PA, Sniderman SH, Laros RK, Jr, Cowan R, Heilbron D, Goldenberg RL, Iams JD, Creasy RK. Neonatal morbidity according to gestational age and birthweight from five tertiary care centers in the United States, 1983 through 1986. Am J Obstet Gynecol. 1992;166:1629–1641. doi: 10.1016/0002-9378(92)91551-k.
- 3. Tyson JE, Parikh NA, Langer J, Green C, Higgins RD; National Institute of Child Health and Human Development Neonatal Research Network. Intensive care for extreme prematurity: moving beyond gestational age. N Engl J Med. 2008;358:1672–1681. doi: 10.1056/NEJMoa073059.
- 4. Richardson DK, Gray JE, McCormick MC, Workman K, Goldmann DA. Score for Neonatal Acute Physiology: A physiologic severity index for neonatal intensive care. Pediatrics. 1993;91:617–623.
- 5. Richardson DK, Corcoran JD, Escobar GJ, Lee SK. SNAP-II and SNAPPE-II: Simplified newborn illness severity and mortality risk scores. J Pediatr. 2001;138:92–100. doi: 10.1067/mpd.2001.109608.
- 6. The International Neonatal Network. The CRIB (Clinical Risk Index for Babies) score: A tool for assessing initial risk and comparing performance of neonatal intensive care units. Lancet. 1993;342:193–198.
- 7. Schulte-Frohlinde V, Ashkenazy Y, Goldberger AL, Ivanov P, Costa M, Morley-Davies A, Stanley HE, Glass L. Complex patterns of abnormal heartbeats. Phys Rev E Stat Nonlin Soft Matter Phys. 2002;66:031901. doi: 10.1103/PhysRevE.66.031901.
- 8. Tsuji H, Venditti FJ, Jr, Manders ES, Evans JC, Larson MG, Feldman CL, Levy D. Reduced heart rate variability and mortality risk in an elderly cohort. The Framingham Heart Study. Circulation. 1994;90:878–883. doi: 10.1161/01.cir.90.2.878.
- 9. Griffin MP, Lake DE, Moorman JR. Heart rate characteristics and laboratory tests in neonatal sepsis. Pediatrics. 2005;115:937–941. doi: 10.1542/peds.2004-1393.
- 10. Williams KP, Galerneau F. Intrapartum fetal heart rate patterns in the prediction of neonatal acidemia. Am J Obstet Gynecol. 2003;188:820–823. doi: 10.1067/mob.2003.183.
- 11. Casey BM, McIntire DD, Leveno KJ. The continuing value of the Apgar score for the assessment of newborn infants. N Engl J Med. 2001;344:467–471. doi: 10.1056/NEJM200102153440701.
- 12. DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: A nonparametric approach. Biometrics. 1988;44:837–845.
- 13. Williams KP, Galerneau F. Intrapartum influences on cesarean delivery in multiple gestation. Acta Obstet Gynecol Scand. 2003;82:241–245. doi: 10.1034/j.1600-0412.2003.00098.x.
- 14. Tuzcu V, Nas S, Ulusar U, Ugur A, Kaiser JR. Altered heart rhythm dynamics in very low birth weight infants with impending intraventricular hemorrhage. Pediatrics. 2009;123:810–815. doi: 10.1542/peds.2008-0253.
- 15. Rangayyan RM. Biomedical Image Analysis. Biomedical Engineering Series. CRC Press; Boca Raton, FL: 2005.
- 16. McCrea HJ, Ment LR. The diagnosis, management, and postnatal prevention of intraventricular hemorrhage in the preterm neonate. Clin Perinatol. 2008;35:777–792. doi: 10.1016/j.clp.2008.07.014.
- 17. Vannemreddy P, Notarianni C, Yanamandra K, Napper D, Bocchini J. Is an endothelial nitric oxide synthase gene mutation a risk factor in the origin of intraventricular hemorrhage? Neurosurg Focus. 2010;28:E11. doi: 10.3171/2009.10.FOCUS09143.
- 18. The American Recovery and Reinvestment Act of 2009 (Public Law 111-5) official text. Government Institutes/Bernan Press; Lanham, MD: 2009.
- 19. Ehrenkranz RA, Walsh MC, Vohr BR, Jobe AH, Wright LL, Fanaroff AA, Wrage LA, Poole K; National Institutes of Child Health and Human Development Neonatal Research Network. Validation of the National Institutes of Health consensus definition of bronchopulmonary dysplasia. Pediatrics. 2005;116:1353–1360. doi: 10.1542/peds.2005-0249.
- 20. International Committee for the Classification of Retinopathy of Prematurity. The International Classification of Retinopathy of Prematurity revisited. Arch Ophthalmol. 2005;123:991–999. doi: 10.1001/archopht.123.7.991.
- 21. Kliegman RM, Walsh MC. Neonatal necrotizing enterocolitis: Pathogenesis, classification, and spectrum of illness. Curr Probl Pediatr. 1987;17:213–288. doi: 10.1016/0045-9380(87)90031-4.
- 22. Papile LA, Burstein J, Burstein R, Koffler H. Incidence and evolution of subependymal and intraventricular hemorrhage: A study of infants with birth weights less than 1,500 gm. J Pediatr. 1978;92:529–534. doi: 10.1016/s0022-3476(78)80282-0.
- 23. Saria S, Koller D, Penn A. Discovering shared and individual latent structure in multiple time series. 2010. arXiv:1008.2028v1 [stat.ML].
- 24. Whitlock G, Lewington S, Sherliker P, Clarke R, Emberson J, Halsey J, Qizilbash N, Collins R, Peto R. Body-mass index and cause-specific mortality in 900 000 adults: Collaborative analyses of 57 prospective studies. Lancet. 2009;373:1083–1096. doi: 10.1016/S0140-6736(09)60318-4.
- 25. Schnabel RB, Sullivan LM, Levy D, Pencina MJ, Massaro JM, D’Agostino RB, Sr, Newton-Cheh C, Yamamoto JF, Magnani JW, Tadros TM, Kannel WB, Wang TJ, Ellinor PT, Wolf PA, Vasan RS, Benjamin EJ. Development of a risk score for atrial fibrillation (Framingham Heart Study): A community-based cohort study. Lancet. 2009;373:739–745. doi: 10.1016/S0140-6736(09)60443-8.
- 26. Ross SM. Introduction to Probability and Statistics for Engineers and Scientists. 3rd ed. Elsevier Academic Press; Amsterdam: 2004.
- 27. Hastie T, Tibshirani R, Friedman JH. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer; New York: 2001.
- 28. Zhu J, Hastie T. Classification of gene microarrays by penalized logistic regression. Biostatistics. 2004;5:427–443. doi: 10.1093/biostatistics/5.3.427.