Abstract
Background
Medical outcomes of interest to clinicians may have multiple categories. Researchers face several options for risk prediction of such outcomes, including dichotomized logistic regression and multinomial logit regression modeling. We aimed to compare these methods and provide guidance needed for practice.
Methods
We described dichotomized logistic regression, multinomial continuation-ratio logit regression, which is an alternative to standard multinomial logit regression for ordinal outcomes, and logistic competing risks regression. We then applied these methods to develop prediction models of survival and neurodevelopmental outcomes based on the NICHD Extremely Preterm Birth Outcome Tool model. The statistical and practical advantages and flaws of these methods were examined. Both discrimination and calibration of the estimated logistic models of dichotomized outcomes and continuation-ratio logit model were assessed.
Results
The dichotomized logistic models and multinomial continuation-ratio logit model had similar discrimination and calibration in predicting death and survival without neurodevelopmental impairment. But the continuation-ratio logit model had better discrimination and calibration in predicting neurodevelopmental impairment. The sum of predicted probabilities of outcome categories from the dichotomized logistic models could deviate from 100% substantially, ranging from 87.7 to 124.0%, and the dichotomized logistic model of neurodevelopmental impairment greatly overpredicted low risks and underpredicted high risks.
Conclusions
Estimating multiple logistic regression models of dichotomized outcomes may result in poorly calibrated predictions for an outcome with multiple ordinal categories. Multinomial continuation-ratio logit regression produces better calibrated predictions, constrains the sum of predicted probabilities to 100%, and offers simple model interpretation together with the flexibility to include outcome category-specific predictors and random-effect terms for patient heterogeneity by hospital. It also accounts for mutual dependence among the multiple categories and accommodates competing risks.
Keywords: Risk prediction, Multicategory outcome, Discrimination, Calibration, Competing risks, Preterm infant
Background
Multivariable risk prediction models are routinely used by healthcare providers in patient counseling and clinical decision-making. The outcomes of these models are often binary and the algorithm is typically based on logistic regression. While outcomes of many medical conditions can have more than two categories, they may be dichotomized and modeled using logistic regression. For an outcome of death, illness, or illness-free survival, for example, a single category, illness-free survival, or a combined category, death or illness, may be of interest and modeled.
Multinomial logit models can simultaneously predict probabilities of multiple outcome categories and thus have the advantage of avoiding loss of detailed information. But few studies have compared predictive performance of logistic models of dichotomized outcomes and multinomial logit models. Biesheuvel et al. (2008) and Roukema et al. (2008) assessed model discrimination and did not find a meaningful difference [1, 2]. More recently, Van Calster and McLernon et al. (2019) argued that model calibration performance should not be overlooked and poor calibration may make a prediction model clinically useless or even harmful [3]. A study by Edlinger et al. (2022) focused on calibration performance of alternative multinomial models for ordinal outcomes but did not compare with that of logistic models of dichotomized outcomes [4].
In this paper, we describe methods for modeling multicategory outcomes. Using data on mortality and neurological development among extremely preterm infants, we estimate dichotomized logistic regression models, a multinomial continuation-ratio logit model and a logistic competing-risk regression model. We assess the discrimination and calibration performance of the dichotomized logistic regressions and continuation-ratio logit models. We also compare the three modeling methods on some important statistical and practical considerations when choosing a modeling method, such as inclusion of random effects to account for patient heterogeneity by hospital and interpretation of coefficients for predictors.
Methods
We consider model development for risk prediction of an outcome with multiple categories using data on patients from various hospitals. Let Ysi indicate an outcome with J categories of the ith patient in the sth hospital, and Xsi1 – Xsi5, for instance, be five predictor variables selected for inclusion in the model. With data collected on patients in a variety of hospitals, patient heterogeneity by hospital can cause poor predictive performance and random-effect logistic regression was found to yield better calibrated predictions than standard logistic regression [4, 5]. We add hospital random-effect terms in our models to account for hospital-level variation in outcomes.
Dichotomized logistic regression
Let the probability of outcome category j be πsi(j) = Pr{Ysi = j}. A logistic regression model can be separately estimated for each outcome category,
logit{πsi(j)} = β0j + β1jXsi1 + β2jXsi2 + β3jXsi3 + β4jXsi4 + β5jXsi5 + αsj,  j = 1, …, J,
where the intercept β0j and coefficients β1j – β5j are parameters to be estimated. We further assume that the hospital random-effect term αsj follows a Normal distribution with zero mean and a constant variance. A drawback of this method is that the sum of predicted probabilities over all outcome categories for a patient is not constrained to 100%.
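As a minimal numeric sketch of this drawback, the fragment below applies three separately fitted dichotomized models to one infant. All coefficient values here are hypothetical, chosen only for illustration; because nothing links the three models, their predicted probabilities need not sum to 1.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical (intercept, birth-weight, female) coefficients for three
# separately fitted dichotomized logistic models; birth weight is per
# 100 g, centered at 700 g. These values are illustrative only.
coefs = {
    "death":    (-0.4, -0.42, -0.55),
    "ndi":      (-0.9,  0.05, -0.03),
    "ndi_free": (-0.8,  0.37,  0.62),
}

def predict(bw_centered, female):
    return {cat: logistic(b0 + b1 * bw_centered + b2 * female)
            for cat, (b0, b1, b2) in coefs.items()}

p = predict(bw_centered=-0.7, female=1)  # e.g. a 630 g female infant
total = sum(p.values())  # not constrained to equal 1
```

Each probability is valid on its own, but their total drifts away from 100%, which is the behavior documented in the Results.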
Multinomial continuation-ratio logit regression
As an extension of logistic regression, a standard multinomial logit model simultaneously fits J − 1 logit models of each outcome category relative to a fixed reference category and constrains the sum of all predicted probabilities to 100%, that is, πsi(1) + πsi(2) + … + πsi(J) = 1. A limitation of this method is that the use of the same reference category and the inclusion of random-effect terms can make model estimation and interpretation difficult [6].
For an outcome with ordered categories, it is preferable to use alternative forms of multinomial logit models that exploit the ordinal nature of the outcome categories [7]. We model the sequentially defined conditional probability of the jth category given the jth category or higher, πsi(j|Ysi ≥ j) = Pr{Ysi = j | Ysi ≥ j}. The continuation-ratio logit models take the following form,
logit{πsi(j|Ysi ≥ j)} = β0j + β1jXsi1 + β2jXsi2 + β3jXsi3 + β4jXsi4 + β5jXsi5 + αsj,  j = 1, …, J − 1.
We also assume that the random-effect terms (αs1,…,αs(J−1)) jointly follow a multivariate Normal distribution with zero means [8]. Various forms of the variance-covariance matrix may be specified to represent the correlation structure among the continuation-ratio logits. A simple diagonal form, for example, indicates independent random effects.
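The fitted continuation-ratio logits convert back to unconditional category probabilities by sequential multiplication. A stand-alone sketch with J = 3 as in our application (the two linear-predictor values below are arbitrary illustrations, not estimates):

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def category_probs(logit_death, logit_ndi_given_survival):
    """Convert the two continuation-ratio logits into unconditional
    probabilities of death, survival with NDI, and NDI-free survival."""
    p_death = logistic(logit_death)                  # Pr{death}
    p_ndi_cond = logistic(logit_ndi_given_survival)  # Pr{NDI | survival}
    p_ndi = (1.0 - p_death) * p_ndi_cond
    p_ndi_free = (1.0 - p_death) * (1.0 - p_ndi_cond)
    return p_death, p_ndi, p_ndi_free

probs = category_probs(-0.4, 0.1)
# By construction the three probabilities sum to exactly 1.
```

This is why the continuation-ratio model, unlike the dichotomized models, constrains the sum of predicted probabilities to 100% for every patient.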
Logistic competing risks regression
Competing-risk bias has been a concern in dichotomized logistic regression estimation, and composite outcomes combining competing-risk categories, such as illness or death, are commonly used as study endpoints to mitigate this bias [9, 10]. Competing risks regression models are typically developed in the framework of modeling time-to-event data. Let D indicate the competing events of interest, such as death without relapse or relapse from remission, and let Tsi be the time to the earliest occurrence of an event of interest, say D = 1, for patient i in hospital s. The probability of the occurrence by a preset time t is then defined as Fsi1(t) = Pr{Tsi ≤ t, D = 1}. The logistic competing risks regression fits a model of this probability transformed with a logit link function [11],
logit{Fsi1(t)} = β0(t) + β1Xsi1 + β2Xsi2 + β3Xsi3 + β4Xsi4 + β5Xsi5.
A feature of this model is that the coefficients can be interpreted similarly to those in a logistic regression model, in terms of odds ratios. The estimated predictor effects are obtained via generalized estimating equations adapted for censored time-to-event data and adjusted for competing risks. A limitation of this method is that a random hospital effect term is not allowed. The estimated model can produce predictions only at the average level across hospitals, leading to underestimation of high risks and overestimation of low risks [5].
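To illustrate the odds-ratio interpretation, the sketch below shifts the logit of a cumulative incidence F(t) by a coefficient. The baseline risk and odds ratio are hypothetical values, chosen only to be of the same magnitude as those in our results:

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical baseline cumulative incidence of the event by time t,
# and a hypothetical odds ratio for a binary predictor.
F_ref = 0.28
odds_ratio = 1.21

# A one-unit change in the predictor adds log(OR) to the logit of F(t).
F_new = inv_logit(logit(F_ref) + math.log(odds_ratio))  # about 0.32
```

The same exp(coefficient) = odds ratio reading carries over from ordinary logistic regression, which is what makes this model convenient to explain to clinicians.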
Patients, outcome and predictor variables
Our study population consisted of extremely preterm infants born at 22–25 weeks of gestational age and weighing 401 to 1000 g during 2006 and 2012 in 19 hospitals across the U.S. and enrolled at birth into an observational study [12]. Infants who had major congenital anomalies or did not receive potentially life-sustaining treatment were excluded. Only surviving infants who completed follow-up assessments of neurodevelopmental impairment (NDI) at a single timepoint of 22–26 months’ age corrected for prematurity were included [13]. NDI is a comprehensive measure of child development based on structured physical examinations and functional assessments. Informed consents were obtained for all infants at hospitals that required parental consent.
For simplicity, we created an outcome with three ordered categories by decreasing severity, death, survival with NDI, or survival without NDI (NDI-free survival) by the time of the follow-up assessments at 22–26 months’ corrected age, and selected five predictor variables, birth weight and gestational age, sex, singleton birth, and exposure to antenatal corticosteroids. These variables have been previously included in the widely used NICHD Extremely Preterm Birth Outcome Tool model [14, 15].
Model estimation and validation
We fitted three logistic models of dichotomized outcomes of death, NDI and NDI-free survival, and a multinomial continuation-ratio logit model of death and NDI conditional on survival using SAS PROC GLIMMIX [16]. We should note that the original patient-level data file was re-structured such that a patient could have as many as J – 1 records stacked together for the estimation of the continuation-ratio logit model. We used the R package riskRegression to fit the logistic competing risks model [11]. The outcome, time to NDI, was computed as age in days on date of NDI examination or age at death. Infants who died or survived without NDI had censored time to NDI.
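The re-structuring step can be sketched as follows. The variable names and outcome coding (0 = NDI-free survival, 1 = NDI, 2 = death) are hypothetical; the point is that every infant contributes a record for the death-vs-survival logit, and survivors contribute a second record for the NDI-vs-NDI-free logit:

```python
# Stack up to J - 1 records per patient for continuation-ratio estimation.
def stack_records(patients):
    rows = []
    for pt in patients:
        # Level 1: death vs survival, for all infants.
        rows.append({**pt["x"], "level": 1, "event": int(pt["y"] == 2)})
        if pt["y"] != 2:  # survivors only
            # Level 2: NDI vs NDI-free survival.
            rows.append({**pt["x"], "level": 2, "event": int(pt["y"] == 1)})
    return rows

patients = [
    {"x": {"bw": 630, "female": 1}, "y": 2},  # died
    {"x": {"bw": 700, "female": 0}, "y": 1},  # survived with NDI
    {"x": {"bw": 720, "female": 1}, "y": 0},  # survived NDI-free
]
rows = stack_records(patients)
# The infant who died has 1 record; each survivor has 2 (5 rows total).
```

A single binary-logit routine with a level indicator can then be fitted to the stacked file, which is how the continuation-ratio model is estimated in standard mixed-model software.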
We computed the Brier scores and the C-statistics to assess overall prediction accuracy and model discrimination. Four increasingly stringent levels of calibration have been suggested for measuring model calibration: mean ('calibration-in-the-large'), weak, moderate, and strong calibration [3]. We assessed model calibration at the first three levels using means and ranges of predicted probabilities, calibration intercepts and slopes, and calibration plots. We used the bootstrap resampling method to further assess model performance [17, 18]. The predictive models were estimated on 200 random samples drawn with replacement from the original study data and were evaluated both on the bootstrap samples and on the original study data.
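The bootstrap correction described above (apparent performance minus the average optimism across bootstrap refits) can be sketched as below; `fit_and_score` is a hypothetical user-supplied function that refits the model on a bootstrap sample and scores it on both that sample and the original data:

```python
import random

def optimism_corrected(apparent, fit_and_score, data, n_boot=200, seed=1):
    """corrected = apparent - mean(score on bootstrap sample
                                   - score on original data)."""
    rng = random.Random(seed)
    optimism = []
    for _ in range(n_boot):
        boot = [rng.choice(data) for _ in data]  # resample with replacement
        score_boot, score_orig = fit_and_score(boot, data)
        optimism.append(score_boot - score_orig)
    return apparent - sum(optimism) / n_boot

# Toy check with a constant refit (optimism 0.05 on every draw):
corrected = optimism_corrected(0.738, lambda boot, orig: (0.75, 0.70),
                               data=list(range(10)))
# corrected is 0.738 - 0.05 = 0.688
```

In the study itself the refitting and scoring were done with the full estimation procedures; this skeleton only shows the bookkeeping of the correction.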
Results
We obtained data on 3927 infants. Their characteristics by predictor variables and outcome are shown in Table 1. In brief, there were 1584 (40%) deaths, 1104 (28%) infants with NDI and 1239 (32%) surviving infants without NDI. Overall, the mean birth weight was 675 g, 21% of infants were born at 22–23 weeks of gestational age, 47% were female, 74% were singleton births and 85% were exposed to antenatal corticosteroids.
Table 1.
Characteristics of infants by predictors and outcome: n (%) and mean (SD) for birth weight
| Predictor | Death | NDI | NDI-free | Total |
|---|---|---|---|---|
| n (%) | 1,584 (40.3) | 1,104 (28.1) | 1,239 (31.6) | 3,927 (100) |
| Birth weight (grams) | 630.5 (122.7) | 688.3 (116.4) | 718.6 (117.1) | 674.5 (125.1) |
| Gestational age (weeks) | ||||
| 22–23 | 541 (34.2) | 158 (14.3) | 119 (9.6) | 818 (20.8) |
| 24 | 609 (38.5) | 437 (39.6) | 423 (34.1) | 1,469 (37.4) |
| 25 | 434 (27.4) | 509 (46.1) | 697 (56.3) | 1,640 (41.8) |
| Female | 666 (42.1) | 516 (46.7) | 677 (54.6) | 1,859 (47.3) |
| Singleton birth | 1,129 (71.3) | 831 (75.3) | 943 (76.1) | 2,903 (73.9) |
| Antenatal corticosteroids | 1,238 (78.2) | 968 (87.7) | 1,133 (91.4) | 3,339 (85.0) |
Estimated models
The estimated odds ratios for the predictor variables and the variances of the random hospital effects from the three dichotomized logistic models and the multinomial continuation-ratio logit model are presented in Table 2. The predictor variables showed similar effects on death but very different effects on NDI. Notably, antenatal corticosteroid exposure had a significant positive effect on NDI in the dichotomized logistic model but a significant negative effect on NDI conditional on survival in the continuation-ratio logit model. The variance estimates of the random hospital effects were large relative to their standard errors, with ratios greater than two, suggesting significant differences in outcomes among hospitals. Also shown in Table 2 are the estimated odds ratios for the predictor variables from the logistic competing risks regression model of NDI; they were quite close to those from the dichotomized logistic model of NDI.
Table 2.
Estimated odds ratio (95% CI) and variance (SE) of random hospital effect from logistic models of dichotomized outcomes, multinomial continuation-ratio logit model and logistic competing risks model
| Predictors | Logistic: Death | Logistic: NDI | Logistic: NDI-free | Continuation-ratio: Death | Continuation-ratio: NDI if surviving | Competing risks: NDI by follow-up |
|---|---|---|---|---|---|---|
| Birth weight (100 g) | 0.66 (0.62–0.71) | 1.05 (0.98–1.12) | 1.45 (1.36–1.56) | 0.66 (0.62–0.71) | 0.80 (0.74–0.87) | 1.06 (0.98–1.15) |
| Gestational age (weeks) | ||||||
| 22–23 | 2.76 (2.22–3.43) | 0.61 (0.48–0.78) | 0.43 (0.33–0.55) | 2.59 (2.09–3.22) | 1.23 (0.91–1.68) | 0.58 (0.43–0.78) |
| 24 | 1.46 (1.23–1.72) | 0.98 (0.83–1.16) | 0.72 (0.61–0.85) | 1.46 (1.24–1.72) | 1.20 (0.99–1.46) | 0.98 (0.80–1.20) |
| 25 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| Female | 0.57 (0.50–0.66) | 0.97 (0.84–1.12) | 1.85 (1.60–2.15) | 0.58 (0.50–0.67) | 0.66 (0.55–0.78) | 0.96 (0.80–1.16) |
| Singleton birth | 0.86 (0.74–1.01) | 1.05 (0.89–1.24) | 1.13 (0.95–1.34) | 0.85 (0.73–1.00) | 0.97 (0.79–1.18) | 1.03 (0.83–1.29) |
| Antenatal corticosteroids | 0.53 (0.43–0.65) | 1.28 (1.03–1.60) | 1.73 (1.35–2.21) | 0.52 (0.42–0.63) | 0.65 (0.49–0.87) | 1.21 (0.93–1.59) |
| Variance of random effect | 0.165 (0.065) | 0.126 (0.051) | 0.294 (0.110) | 0.147 (0.055) | 0.213 (0.064) | n.a. |
Footnote: Pr(Death) = 1-Pr(NDI)-Pr(NDI-free survival), Pr(NDI) = 1-Pr(Death)-Pr(NDI-free survival), Pr(NDI-free) = 1-Pr(Death)-Pr(NDI),
Pr(NDI if surviving) = 1-Pr(NDI-free if surviving), Pr(NDI by follow-up) = Pr(age of NDI in days < 22–24 months’ corrected age for prematurity)
Model predictive performance
Measures of predictive performance of the estimated dichotomized logistic models and continuation-ratio logit model are compared in Table 3. Because logistic competing risks regression does not allow hospital-specific predictions, we did not include the estimated logistic competing risks model. The nearly identical Brier scores and C-statistics indicate similar overall model validity and discrimination. The large C-statistics (> 0.7) for death and NDI-free survival suggest equally acceptable discrimination, but the lower C-statistics for NDI, 0.623 for the dichotomized logistic model and 0.637 for the continuation-ratio logit model, suggest unsatisfactory discrimination. The means of the predicted probabilities of death, NDI and NDI-free survival are also nearly the same, indicating similar calibration, although the predicted probabilities of NDI from the dichotomized logistic model had a slightly narrower range (8.5–48.8% vs. 6.6–52.1%). A more distinct difference is that the sum of all predicted probabilities from the dichotomized logistic models ranged from 87.7 to 124.0%, whereas the sum from the continuation-ratio logit model equaled 100% for all study infants. All of the calibration intercepts were close to zero, and the calibration slopes for death and NDI-free survival were similarly close to one; the calibration slopes for NDI were slightly larger.
Table 3.
Measures of model predictive performance
| Outcome category | Dichotomized logistic models | Continuation-ratio logit model |
|---|---|---|
| Overall accuracy: Brier score/Bootstrap validation Brier score | ||
| Death | 0.199/0.203 | 0.199/0.202 |
| NDI | 0.194/0.196 | 0.192/0.194 |
| NDI-free survival | 0.186/0.189 | 0.186/0.188 |
| Discrimination: C-statistics (95% CI)/Bootstrap validation C-statistics | ||
| Death | 0.738 (0.722–0.754)/0.729 | 0.738 (0.722–0.753)/0.729 |
| NDI | 0.623 (0.604–0.643)/0.606 | 0.637 (0.618–0.656)/0.619 |
| NDI-free survival | 0.730 (0.714–0.746)/0.720 | 0.730 (0.713–0.746)/0.721 |
| Calibration: mean of predicted probability (range) | ||
| Death | 40.3 (3.7–91.6) | 40.3 (4.0–91.2) |
| NDI | 28.1 (8.5–48.8) | 28.1 (6.6–52.1) |
| NDI-free survival | 31.5 (1.5–81.0) | 31.6 (0.8–78.9) |
| Sum over all categories | 100.0 (87.7–124.0) | 100.0 (100.0–100.0) |
| Calibration intercept/slope | ||
| Death | 0/1.026 | 0.003/1.051 |
| NDI | 0.001/1.133 | 0/1.184 |
| NDI-free survival | 0/1.035 | -0.003/1.076 |
Footnote: Brier score – mean squared difference between observed outcome and predicted probability, 0.25 or greater indicates a worthless model; C-statistics – Area Under the Curve (AUC), 0.5 indicates no discrimination, 0.7 to 0.8 moderate or acceptable, 0.8 or greater excellent; Calibration – agreement between predicted probabilities and observed rates, a calibration intercept of 0 and a slope of 1 indicate perfect calibration; bootstrap validation performance (Brier score or C-statistics) = model performance - average(bootstrap model performance on sample data – bootstrap model performance on original study data)
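The footnote's definitions can be computed directly. This stand-alone sketch uses toy outcome and probability vectors (not study data) to show the Brier score and a rank-based C-statistic:

```python
def brier_score(y, p):
    # Mean squared difference between binary outcome and predicted probability.
    return sum((yi - pi) ** 2 for yi, pi in zip(y, p)) / len(y)

def c_statistic(y, p):
    # Probability that a randomly chosen event case receives a higher
    # predicted probability than a randomly chosen non-event case
    # (ties count as 1/2).
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum((a > b) + 0.5 * (a == b) for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

y = [1, 0, 1, 0, 1]            # toy binary outcomes
p = [0.8, 0.3, 0.6, 0.4, 0.5]  # toy predicted probabilities
b = brier_score(y, p)   # 0.14
c = c_statistic(y, p)   # 1.0: every event ranked above every non-event
```

For the multicategory outcome, each metric is computed separately per outcome category against that category's predicted probability, as in Table 3.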
We further assessed model calibration for NDI by exploiting the fact that the predicted probabilities for each patient from the logistic regression models did not add up to 100%. We divided all the infants into decile groups by the sums of their predicted probabilities and calculated the means of the model-predicted probabilities. Figure 1 shows that the means of the predicted probabilities of NDI from the continuation-ratio logit model tended to track the observed rates more closely, whereas those from the logistic model were much higher than the observed rates at the lower end of the observed rates and much lower at the higher end. We noted that infants in the five groups with the highest observed rates had sums of predicted probabilities less than 100%, and infants in the four groups with the lowest observed rates had sums greater than 100%, and we assessed model calibration among these infants in Fig. 2. The calibration intercept of the continuation-ratio logit model was much closer to zero than that of the dichotomized logistic model: 0.09 vs. 0.21 among infants whose sum of predicted probabilities was less than 100%, and − 0.12 vs. − 0.28 among infants whose sum was greater than 100%. The corresponding calibration intercepts from bootstrap samples had means and ranges of 0.08 (-0.03, 0.18) vs. 0.20 (0.06, 0.29) and − 0.13 (-0.26, 0.02) vs. -0.29 (-0.43, -0.12), respectively.
Fig. 1.
Mean predicted probabilities of neurodevelopmental impairment (NDI) from estimated models and observed rates
Fig. 2.
Calibration plots of predicted probabilities of neurodevelopmental impairment (NDI). E: O - Ratio of predicted to observed, CITL - Calibration-in-the-large (intercept), Slope - Calibration slope, AUC - C-statistic. Results from bootstrap samples: mean (min, max). (A) CITL = 0.08 (-0.03, 0.18), Slope = 1.12 (0.83, 1.49), AUC = 0.62 (0.59, 0.66). (B) CITL = 0.20 (0.06, 0.29), Slope = 1.13 (0.84, 1.45), AUC = 0.62 (0.59, 0.67). (C) CITL=-0.13 (-0.26, 0.02), Slope = 1.14 (0.80, 1.44), AUC = 0.65 (0.59, 0.71). (D) CITL=-0.29 (-0.43, -0.12), Slope = 1.05 (0.71, 1.37), AUC = 0.64 (0.58, 0.70)
Comparison of modeling methods
In addition to predictive performance, there are other important statistical and practical issues to consider in the development of predictive models for multicategory outcomes. We prepared a list of these issues for comparison in Table 4. In general, simplicity in model interpretation facilitates acceptance and usage of a model by clinicians; the sum of all predicted probabilities of outcome categories for each patient should be constrained to 100%; random-effect terms are often needed to accommodate patient heterogeneity by hospital; and flexibility to allow outcome category-specific predictor variables helps avoid statistical overfitting. Because the multiple categories of an outcome often result from the occurrences of competing-risk medical conditions or death, the mutual dependence among the categories should be accommodated. We discuss these issues further in the following section.
Table 4.
Comparison of predictive modeling methods on other statistical and practical issues
| Issues to consider | Dichotomized logistic regression | Continuation-ratio logit regression | Logistic competing risks regression |
|---|---|---|---|
| Interpretation of predictor effects | Odds ratio | Conditional odds ratio dependent on ordered outcome category | Odds ratio |
| Constrains sum of all predicted probabilities to 100% | No | Yes | No |
| Allows inclusion of random-effect terms | Yes | Yes | No |
| Allows inclusion of outcome category-specific predictor variables | Yes | Yes | Yes |
| Accommodates competing risks | No | Yes | Yes |
| Availability in statistical software | SAS, Stata, R | SAS, Stata, R | R |
Discussion
Risk prediction models are important tools in clinical decision-making and prognosis often takes the form of multiple categories. We have compared two commonly used methods for modeling multicategory outcomes, dichotomized logistic regression and multinomial logit regression, in an application of predicting mortality and neurodevelopmental impairment among extremely preterm infants. Because the outcome has three ordinal categories, we used an alternative multinomial logit model, continuation-ratio logit model. Additionally, we illustrated a new method that accommodates competing risks, logistic competing risks regression.
We assessed both discrimination and calibration of the estimated logistic models of dichotomized outcomes and continuation-ratio logit model. Consistent with the findings by Biesheuvel et al. and Roukema et al. [1, 2], our results showed that the logistic models and continuation-ratio logit models had similarly satisfactory discrimination in predicting death and survival without NDI. These models also had similar calibration as measured by the average predicted probabilities and by calibration intercepts and slopes. However, the sum of all predicted probabilities from the logistic models for each infant ranged from 87.7 to 124.0%. We found that the dichotomized logistic model of NDI had slightly smaller C-statistics and, for infants whose sum of all predicted probabilities did not equal 100%, it produced poorly calibrated predictions, underestimating high risks and overestimating low risks.
Because death before assessment is a competing risk for NDI, the predictions from the dichotomized logistic model of NDI were dependent on death. We tried using logistic competing risks regression to mitigate the potential competing-risk bias in the estimation of the predictor effects on NDI. However, the estimated odds ratios for the predictor variables turned out to be very similar to those in the logistic model of NDI due to limitations of the available data: time to diagnosis of NDI could be determined only at one fixed time, 22–26 months' age corrected for prematurity, and only for surviving infants. Nevertheless, logistic competing risks regression could be a useful alternative when the outcome is assessed over time at multiple points.
To facilitate the choice of a predictive modeling method, we compared dichotomized logistic regression, continuation-ratio logit regression and logistic competing risks regression on some important statistical and practical issues. Both dichotomized logistic regression and logistic competing risks regression produce odds ratio estimates for predictor effects but share the flaw that the sum of all predicted probabilities of outcome categories for each patient is not constrained to 100%. Dichotomized logistic regression also has the advantages of allowing for random-effect terms and wide availability of statistical programs for model estimation, but it does not accommodate competing risks. A commonly used method to handle competing-risk bias is to model a composite of combined competing-risk categories. We computed the C-statistics of an estimated logistic model of death or NDI, the inverse outcome of the dichotomized logistic model of NDI-free survival, and found that it had essentially no discrimination in predicting NDI (C-statistic = 0.485) but moderately satisfactory discrimination in predicting death (C-statistic = 0.719). A shortcoming of logistic competing risks regression is that it does not allow the use of random effects and cannot produce hospital-specific predictions when patient heterogeneity by hospital is substantial.
Multinomial continuation-ratio logit regression constrains the sum of all predicted probabilities of outcome categories for each patient to 100% and allows for the inclusion of outcome category-specific predictors and random-effect terms. It is easy to explain the estimated predictor effects and prediction results to clinicians or patients. Our estimated continuation-ratio logit model consisted of an equation for predicting death and an equation for predicting NDI conditional on surviving. This addressed the need of clinicians and patients for separate information on death and impairment, which could be valued differently in their decision about treatment options. We included random-effect terms in both equations to account for patient heterogeneity by hospital and had the flexibility to add predictors specific to NDI in the model. To improve the modest model discrimination in predicting NDI, it will be necessary to include additional predictors of this outcome in the future [19, 20]. Finally, continuation-ratio logit regression has the perhaps unexpected advantage of accommodating competing risks. It has long been known that a multinomial logit model is equivalent to a discrete-time version of the cause-specific proportional hazards model for competing risks [21, 22]. Competing risks are not only of statistical interest, but also can be of substantive clinical interest. In neonatal research, for example, it is of interest to understand how the risk and burden of illness among extremely preterm infants has changed with improved survival [23]. This method can be used to study such emerging issues.
Conclusion
A multicategory outcome is often dichotomized and modeled using logistic regression in studies developing prediction models. Because a single outcome category is often of interest, the shortcomings of this method have not received much attention. This study demonstrates the advantages of the continuation-ratio logit model, an alternative form of multinomial logit models for ordinal outcomes, in predictive performance and other statistical and practical considerations. Although dichotomized logistic regression and multinomial logit regression may yield similar discrimination performance, dichotomized logistic regression does not constrain predicted probabilities of all outcome categories to 100% for a patient and can yield poorly calibrated predictions. To accommodate competing risks among multiple categories, logistic competing risks regression can be useful when patient heterogeneity by clustering is not of concern. Development of study designs to collect needed time-to-event data should be considered.
Acknowledgements
We thank Grier Page, Distinguished Fellow at RTI, for insightful comments on the manuscript.
Author contributions
LL and MAR contributed to the conception of the work and drafted the paper. LL conducted all the data analyses. AD contributed to the acquisition and interpretation of data. GB and AD contributed to the revision of the paper. All the four authors approved the submitted version and agreed to be personally accountable for the author’s own contributions and to ensure that questions related to the accuracy or integrity of any part of the work, even ones in which the author was not personally involved, are appropriately investigated, resolved, and the resolution documented in the literature.
Funding
The National Institutes of Health and the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD) provided grant support for the Neonatal Research Network’s Generic Database and Follow-up Study through cooperative agreements with participating sites and with RTI International (U10 HD36790), which serves as the data coordinating center. We are indebted to our medical and nursing colleagues and the infants and their parents who agreed to take part in these studies. Additional support was provided by the RTI Fellows Program. The funding bodies played no role in the design of the study and collection, analysis, and interpretation of data and in writing the manuscript.
Data availability
The datasets used are available from Dr. Lei Li on reasonable request.
Declarations
Ethics approval and consent to participate
The institutional review board at each hospital approved participation as clinical center. Waiver of consent for enrollment at birth into the observational study was granted at most participating hospitals, but parental consent was required at five hospitals (4 written, 1 oral). Most hospitals required written parental consent for participation in the follow-up study, but five hospitals allowed participation under waiver of consent. Informed consents were obtained for all infants at hospitals that required parental consent. The institutional review board at RTI International approved participation as data coordinating center and the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD) Neonatal Research Network publication committee approved the submission of this study for publication.
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1. Biesheuvel CJ, Vergouwe Y, Steyerberg EW, Grobbee DE, Moons KG. Polytomous logistic regression analysis could be applied more often in diagnostic research. J Clin Epidemiol. 2008;61:125–34.
- 2. Roukema J, van Loenhout RB, Steyerberg EW, Moons KG, Bleeker SE, Moll HA. Polytomous regression did not outperform dichotomous logistic regression in diagnosing serious bacterial infections in febrile children. J Clin Epidemiol. 2008;61:135–41.
- 3. Van Calster B, McLernon DJ, van Smeden M, et al. Calibration: the Achilles heel of predictive analytics. BMC Med. 2019;17(1):230. 10.1186/s12916-019-1466-7.
- 4. Edlinger M, van Smeden M, Alber HF, Wanitschek M, Van Calster B. Risk prediction models for discrete ordinal outcomes: Calibration and the impact of the proportional odds assumption. Stat Med. 2022;41:1334–60.
- 5. Falconieri N, Van Calster B, Timmerman D, Wynants L. Developing risk models for multicenter data using standard logistic regression produced suboptimal predictions: A simulation study. Biom J. 2020;62:932–44.
- 6. Hartzel J, Agresti A, Caffo B. Multinomial logit random effects models. Stat Modelling. 2001;1:81–102.
- 7. Agresti A. Categorical Data Analysis. 2nd ed. Hoboken: John Wiley and Sons; 2002.
- 8. Coull BA, Agresti A. Random effects modeling of multiple binomial responses using the multivariate binomial logit-normal distribution. Biometrics. 2000;56:73–80.
- 9. Paneth N, Gryzbowski M, LaGamma E. Combined Outcomes in Prevention Trials: Rarely A Good Idea. American Epidemiological Society 2014. (http://www.epi.msu.edu/video/paneth/copt/default).
- 10. Manja V, AlBashir S, Guyatt G. Criteria for use of composite endpoints for competing risks – A systematic survey of the literature with recommendations. J Clin Epidemiol. 2017;82:4–11.
- 11. Gerds TA, Scheike TH, Andersen PK. Absolute risk regression for competing risks: interpretation, link functions, and prediction. Stat Med. 2012;31:3921–30.
- 12. Rysavy MA, Li L, Bell EF, et al. Between-hospital variation in treatment and outcomes in extremely preterm infants. New Engl J Med. 2015;372:1801–11.
- 13. Vohr BR, Wright LL, Poole WK, McDonald SA. Neurodevelopmental outcomes of extremely low birth weight infants < 32 weeks' gestation between 1993 and 1998. Pediatrics. 2005;116:635–43.
- 14. Tyson JE, Parikh NA, Langer J, Green C, Higgins RD. Intensive care for extreme prematurity–moving beyond gestational age. New Engl J Med. 2008;358:1672–81.
- 15. Rysavy MA, Horbar JD, Bell EF, et al. Assessment of an Updated Neonatal Research Network Extremely Preterm Birth Outcome Model in the Vermont Oxford Network. JAMA Pediatr. 2020;174:e196294.
- 16. SAS Institute Inc. SAS/STAT® User's Guide. Cary, NC: SAS Institute Inc; 2021.
- 17. Harrell FE, Lee KL, Mark DB. Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy and measuring and reducing errors. Stat Med. 1996;15:361–87.
- 18. Steyerberg EW, Harrell FE, Borsboom GJ, Eijkemans MJ, Vergouwe Y, Habbema JD. Internal validation of predictive models: efficiency of some procedures for logistic regression analysis. J Clin Epidemiol. 2001;54:774–81.
- 19. Marlow N. Keeping up with outcomes for infants born at extremely low gestational ages. JAMA Pediatr. 2015;169:207–8.
- 20. Linsell L, Malouf R, Morris J, Kurinczuk JJ, Marlow N. Prognostic factors for poor cognitive development in children born very preterm or with very low birth weight: A systematic review. JAMA Pediatr. 2015;169:1162–72.
- 21. Prentice RL, Kalbfleisch JD, Peterson AV Jr, Flournoy N, Farewell VT, Breslow NE. The analysis of failure times in the presence of competing risks. Biometrics. 1978;34:541–54.
- 22. Allison PD. Discrete-Time Methods for the Analysis of Event Histories. Sociol Methodol. 1982;13:61–98.
- 23. Younge N, Goldstein R, Bann CM, et al. Survival and neurodevelopmental outcomes among periviable infants. N Engl J Med. 2017;376:617–28.