PLOS Neglected Tropical Diseases. 2022 Oct 12;16(10):e0010789. doi: 10.1371/journal.pntd.0010789

Constructing, validating, and updating machine learning models to predict survival in children with Ebola Virus Disease

Alicia E Genisca 1,#, Kelsey Butler 2,#, Monique Gainey 3, Tzu-Chun Chu 4, Lawrence Huang 5, Eta N Mbong 6, Stephen B Kennedy 7, Razia Laghari 6, Fiston Nganga 6, Rigobert F Muhayangabo 6, Himanshu Vaishnav 8, Shiromi M Perera 9, Moyinoluwa Adeniji 5, Adam C Levine 1,, Ian C Michelow 10,, Andrés Colubri 2,‡,*
Editor: Anita K McElroy11
PMCID: PMC9555640  PMID: 36223331

Abstract

Background

Ebola Virus Disease (EVD) causes high case fatality rates (CFRs) in young children, yet there are limited data focusing on predicting mortality in pediatric patients. Here we present machine learning-derived prognostic models to predict clinical outcomes in children infected with Ebola virus.

Methods

Using retrospective data from the Ebola Data Platform, we investigated children with EVD from the West African EVD outbreak in 2014–2016. Elastic net regularization was used to create a prognostic model for EVD mortality. In addition to external validation with data from the 2018–2020 EVD epidemic in the Democratic Republic of the Congo (DRC), we updated the model using selected serum biomarkers.

Findings

Pediatric EVD mortality was significantly associated with younger age, lower PCR cycle threshold (Ct) values, unexplained bleeding, respiratory distress, bone/muscle pain, anorexia, dysphagia, and diarrhea. These variables were combined to develop the newly described EVD Prognosis in Children (EPiC) predictive model. The area under the receiver operating characteristic curve (AUC) for EPiC was 0.77 (95% CI: 0.74–0.81) in the West Africa derivation dataset and 0.76 (95% CI: 0.64–0.88) in the DRC validation dataset. Updating the model with peak aspartate aminotransferase (AST) or creatine kinase (CK) measured within the first 48 hours after admission increased the AUC to 0.90 (95% CI: 0.77–1.00) and 0.87 (95% CI: 0.74–1.00), respectively.

Conclusion

The novel EPiC prognostic model that incorporates clinical information and commonly used biochemical tests, such as AST and CK, can be used to predict mortality in children with EVD.

Author summary

Although case fatality rates remain high, there are limited data on predicting mortality in children with Ebola Virus Disease (EVD). Furthermore, challenges in predicting EVD outcomes using clinical and laboratory data highlight the need for the development and validation of pediatric predictive models. The novel EVD Prognosis in Children (EPiC) model uses clinical and biochemical information, such as AST and CK, to predict mortality in infected children. Although a few prognostic models or scoring systems have been developed to predict clinical outcomes of EVD, most were limited in geographical and temporal scope, having been derived using data from one location. As such, the EPiC model is the first externally validated model for the prognosis of pediatric EVD using diverse datasets from geographically and temporally separate outbreaks. This model can be easily applied by bedside clinicians to assess pediatric patients at risk for death and help to allocate resources accordingly.

Introduction

With more than 28,000 cases and 11,000 deaths throughout Guinea, Liberia, and Sierra Leone, the 2014–2016 Ebola Virus Disease (EVD) outbreak in West Africa was the largest in history [1]. Within the first nine months of this epidemic, an estimated 13.8% of all EVD infections occurred in children under the age of 15, with an estimated case fatality rate (CFR) of 73.4% [2]. More recent observational data show that CFRs from the West Africa outbreak were highest among children under 5 years of age, with the highest CFR of 89% reported in Guinea [1,3–6]. Similar trends were witnessed in the second largest EVD outbreak, in the Democratic Republic of the Congo (DRC) from 2018–2020. Officially declared an outbreak on August 1, 2018, the tenth EVD outbreak in the DRC had a reported 3,470 EVD cases, with children accounting for more than one third of cases and children under five years of age accounting for one in ten [7,8]. The index case of the 13th EVD outbreak in the DRC, starting in October of 2021, was a child under 5, and more than half of the confirmed cases to date are children [9]. Such findings suggest that young children are especially vulnerable and remain at higher risk of poor outcomes than older children and adults [4–6,10–12].

Several investigators have attempted to identify clinical features associated with mortality among children with EVD [6]. While common themes emerge, there is no consensus on whether certain clinical information may accurately predict outcomes of EVD because signs and symptoms tend to be non-specific [13–15]. Clinical manifestations of EVD, such as vomiting, diarrhea, and fatigue, are similar among both children and adults; however, there are differences in their frequency and severity [15–18]. One retrospective cohort study of children under 5 years of age reported that 25% of the EVD-confirmed patients were afebrile, while in another study children were less likely than adults to report abdominal, chest, muscle, or joint pain [1,6]. The latter finding may reflect the difficulty young children have in reporting subjective symptoms. Furthermore, common laboratory tests are frequently abnormal in children with EVD [19–21]. Such challenges in predicting EVD outcomes using clinical and laboratory data highlight the need for the development and validation of pediatric predictive models.

Ebola virus tends to have a shorter incubation period and cause more rapid disease progression in children. Therefore, developing a pediatric EVD prognostic model is critical to allow clinicians to promptly identify which children may need more intensive monitoring and interventions [6,15]. Such a model could potentially inform clinical practice by allowing clinicians to optimally allocate scarce resources. Although a few prognostic models or scoring systems have been developed to predict clinical outcomes of EVD, they are limited in geographical and temporal scope, having been derived using data from one location [22–25]. More importantly, none of them has been rigorously externally validated, which limits their generalizability and utility, and none is pediatric specific [22–25]. Colubri et al developed and validated prognostic models using data from patients of all ages at multiple treatment sites in Sierra Leone and Liberia [25]. However, data were aggregated in 10-year age bands and, similar to other studies, the model was not independently validated in a distinct region of Africa during a different outbreak [22–25]. To fill this gap, the aim of this study was to develop and externally validate the first pediatric-specific EVD prognostic model using diverse datasets from geographically and temporally separate outbreaks.

Methods

Ethics statement

The Rhode Island Hospital Institutional Review Board provided an exemption from ethical review and informed consent for this secondary analysis of de-identified data as it was not considered human subjects research.

Study design and setting

This study used retrospective data from children presenting to Ebola treatment units (ETUs) in West Africa and the DRC. The West Africa derivation dataset was built from the Infectious Diseases Data Observatory’s (IDDO) Ebola Data Platform (EDP). IDDO’s EDP is the first global data repository for clinical, epidemiological, and laboratory data from patients with EVD during the 2014–2015 West Africa outbreak (specifically Liberia, Guinea and Sierra Leone). The data were provided by the following organizations, which had no role in the conduct of this study: Alliance for International Medical Action (ALIMA), International Medical Corps (IMC), Institute of Tropical Medicine Antwerp (ITM), Médecins Sans Frontières (MSF), University of Oxford, and Save the Children International (SCI) [26–38].

The validation DRC dataset was derived from patients who presented at IMC’s Mangina ETU during the 2018–2020 EVD outbreak in the DRC. The DRC’s eastern provinces of North Kivu and Ituri served as the main catchment area for the Mangina ETU, located in North Kivu.

Participant selection

All patients less than 18 years of age who presented to West African ETUs from June 2014 to October 2015 and to the Mangina ETU from December 2018 to January 2020 with laboratory confirmed EVD were eligible for inclusion in the derivation and validation datasets, respectively. Patients were excluded if they had missing outcome data or if they died on the day of admission to the ETU.

EVD triage and diagnosis

West Africa

Since the data from Liberia, Guinea, and Sierra Leone were provided by several humanitarian aid organizations, triage procedures varied slightly from site to site. All organizations adhered to World Health Organization (WHO) diagnostic criteria and relevant national guidelines [39–42].

DRC

All patients presenting at the IMC’s Mangina ETU were screened by trained clinical staff to ensure they met the clinical case definition for suspected EVD based on WHO and MSF guidelines and in consultation with local health authorities [39–42]. If patients presented with a documented diagnosis of EVD, they were directly admitted to the ward for patients with confirmed disease. Otherwise, patients who met the case definition but had no prior testing were admitted to the ETU’s ward for suspected cases, where they underwent EVD testing. If the patient’s initial test was negative, they remained in the ETU until 72 hours had passed and a second EVD test was performed; if the second test was also negative, they were discharged. Patients with a positive test result were moved to the “confirmed” ward for further management [43,44].

Laboratory methods

All PCR cycle threshold (Ct) values presented in this study are based on RT-PCR of the same Zaire ebolavirus nucleoprotein locus using standardized RNA extraction procedures [43,44]. A Ct greater than 40 was considered negative in all cases.

West Africa

Data were provided by several humanitarian aid organizations and consequently laboratory methods differed slightly among treatment sites.

DRC

DRC’s ETUs received all patients from the surrounding catchment areas, some of whom may have had laboratory confirmed EVD in the community or at another test facility prior to arrival. Patients were diagnosed with EVD with an RT-PCR (GeneXpert) blood assay using plasma. Blood chemistry tests were completed at point of care using Piccolo Amlyte 13, which determined levels of glucose, creatinine (CRE), albumin (ALB), aspartate aminotransferase (AST), alanine aminotransferase (ALT), amylase (AMY), potassium, C-reactive protein (CRP), blood urea nitrogen (BUN), total bilirubin (TBL), creatine kinase (CK), sodium, and calcium.

Descriptive data analysis

If the Shapiro-Wilk test for normality indicated that data were not normally distributed, results are presented with median and interquartile range [IQR] values [45]. Binary symptom variables are presented as incidence in patients who survived or died. Odds ratios and p-values for binary variables were calculated from univariate regression coefficients. For continuous predictor variables, odds ratios are reported for a five-year increase in age and for an increase in Ct by one IQR.
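As a minimal sketch of how an odds ratio for a multi-unit change follows from a logistic regression coefficient, the snippet below rescales a per-unit log-odds coefficient by a chosen increment. The function name and the per-year coefficient are hypothetical; the coefficient is simply chosen so that a five-year increase in age reproduces an odds ratio of 0.55, and it is not a fitted value from the study data.

```python
import math


def scaled_odds_ratio(beta_per_unit: float, delta: float) -> float:
    """Odds ratio for a change of `delta` units, given a per-unit log-odds coefficient."""
    return math.exp(beta_per_unit * delta)


# Hypothetical per-year coefficient, chosen so that +5 years gives OR = 0.55.
beta_age = math.log(0.55) / 5.0

or_5yr_increase = scaled_odds_ratio(beta_age, 5.0)   # odds of death, 5 years older
or_5yr_decrease = scaled_odds_ratio(beta_age, -5.0)  # odds of death, 5 years younger

print(round(or_5yr_increase, 2), round(or_5yr_decrease, 2))
```

The same rescaling applies to Ct when the increment is set to the width of its interquartile range.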

Multiple imputation

In the West Africa data, 10.7% of values were missing for 16 of 18 predictor variables, and multiple imputation was used to address missing data. Details of the imputation protocol are provided in S1 Text, S1 Fig, and S1 Table.

Variable selection

Eighteen candidate predictors including age, sex, Ct value, and 15 other epidemiological and clinical variables based on the current WHO criteria for identifying suspected Ebola cases were selected for inclusion in the model [19, 42]. These variables included fever, headache, respiratory distress (defined as fast respiratory rate; nasal flaring, grunting, intercostal recession and tracheal tug; in-drawing of lower chest wall; central cyanosis of lips and tongue; inability to breastfeed or drink; lethargy), bone or muscle pain, joint pain, conjunctivitis, asthenia, abdominal pain, hiccups, unexplained bleeding, vomiting, diarrhea, nausea, anorexia, or dysphagia [19,42]. The limit for number of candidate predictors for variable selection was set to p < m/15, with m being the limiting sample size equal to the minimum number of observed cases or non-cases [46].

For variable selection, we opted to use Elastic Net regularization, which combines the Lasso and Ridge regression methods and is effective in handling multicollinearity [47]. The variable selection protocol worked as follows: Elastic Net was applied to each imputed dataset, the signs of the coefficients of the binary symptom variables in the resulting models were tallied, and those variables with a percentage of positive model coefficients at or above a given threshold were selected. This selection criterion facilitated the inclusion of groups of correlated predictors and predictors with small but significant effects [48]. The threshold for variable inclusion was set at 100% to exclude variables with weak and/or inconsistent effects (S2 Table).
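The tallying step of this protocol can be sketched as follows. This is an illustrative outline only: it assumes the Elastic Net models have already been fitted elsewhere, the coefficient values are invented, and the function name `select_by_sign_consistency` is hypothetical. A variable is kept when the fraction of imputed-dataset models giving it a positive coefficient reaches the threshold (1.0 corresponds to 100%).

```python
from collections import defaultdict


def select_by_sign_consistency(coef_sets, threshold=1.0):
    """Return variables whose coefficient is positive in at least
    `threshold` (a fraction) of the fitted models."""
    positive_counts = defaultdict(int)
    for coefs in coef_sets:
        for name, value in coefs.items():
            # A bool adds 1 or 0; this also registers names never positive.
            positive_counts[name] += value > 0
    n_models = len(coef_sets)
    return sorted(
        name for name, count in positive_counts.items()
        if count / n_models >= threshold
    )


# Hypothetical coefficients from three imputed datasets (illustrative only).
coef_sets = [
    {"bleeding": 0.41, "diarrhea": 0.12, "headache": -0.30, "fever": 0.00},
    {"bleeding": 0.38, "diarrhea": 0.09, "headache": -0.25, "fever": 0.02},
    {"bleeding": 0.44, "diarrhea": 0.15, "headache": -0.28, "fever": -0.01},
]

selected = select_by_sign_consistency(coef_sets, threshold=1.0)
print(selected)
```

In this toy input, a variable shrunk to zero or flipping sign in any one model is dropped, which is the behavior a 100% threshold is meant to enforce.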

Model development and performance

A saturated model was constructed to serve as a baseline against which to compare the performance of other predictive models. The model included age and Ct value as continuous predictors along with four binary symptom variables selected with the Elastic Net as described above. Bootstrap resampling was used for internal validation. Discrimination was evaluated by optimism-corrected area under the receiver operating characteristic (ROC) curve (AUC) and calibration by a calibration plot comparing predicted with observed probabilities of a binary outcome, in this case survival and death [49]. The ROC curves were generated using the pROC package [50].
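For readers less familiar with the discrimination metric, the AUC can be read as the probability that a randomly chosen patient who died received a higher predicted risk than a randomly chosen survivor. The sketch below computes it by direct pairwise comparison; it is not the pROC implementation and does not perform the bootstrap optimism correction described above. The labels and scores are invented for illustration.

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly;
    ties count as half. Labels are 1 (died) or 0 (survived)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Invented outcomes and predicted risks of death.
labels = [0, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80]
example_auc = auc(labels, scores)
print(example_auc)
```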

External validation

We applied the West Africa derived model to the DRC data comprising 74 cases and evaluated discrimination and calibration with the optimism-corrected AUC and calibration plot. We only used cases with complete data for model validation.

Exploratory data analysis

To further improve model performance, we followed the model recalibration with a previously outlined extension protocol [51]. We sought to add an additional biomarker as a potentially strong predictor that was available in the external validation data but not in the development data. We focused on commonly used biochemistry laboratory values recorded within 48 hours of admission and selected covariates that were found to be significantly correlated with adverse outcome by Spearman’s rank correlation coefficient. We then recalibrated the model and added an additional predictor simultaneously by fitting a new model with the linear predictors of the original model and the additional biomarkers. We did not impute missing data at this step; updated models were fit only on complete cases. Models were evaluated by comparing their AUCs and 95% confidence intervals.
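The structure of such an updated model can be sketched as a logistic model whose inputs are the linear predictor (log-odds) of the original model and the new biomarker. The snippet below shows only the functional form; the intercept, slope, and biomarker coefficient are hypothetical placeholders that would in practice be estimated by fitting on the validation data, and any transformation of the biomarker is left unspecified.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def inv_logit(x: float) -> float:
    """Inverse of the logit function."""
    return 1.0 / (1.0 + math.exp(-x))


def updated_risk(p_original, biomarker, intercept, slope, gamma):
    """Recalibrate the original prediction and extend it with one biomarker.

    intercept and slope recalibrate the original linear predictor;
    gamma is the coefficient of the added biomarker (all hypothetical here).
    """
    linear_predictor = logit(p_original)
    return inv_logit(intercept + slope * linear_predictor + gamma * biomarker)


# With intercept 0, slope 1, and gamma 0 the original risk is returned unchanged.
unchanged = updated_risk(0.30, biomarker=2.0, intercept=0.0, slope=1.0, gamma=0.0)
# A positive (hypothetical) biomarker coefficient raises the predicted risk.
raised = updated_risk(0.30, biomarker=2.0, intercept=0.0, slope=1.0, gamma=0.5)
print(round(unchanged, 3), round(raised, 3))
```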

Results

Baseline characteristics of the West Africa dataset

The West Africa dataset included 579 Ebola-positive patients less than 18 years of age with an overall CFR of 40%. Age was among the strongest predictors of mortality with each five-year decrease in age associated with an increase in the odds of death by more than half (Table 1 and S2 Fig). A Ct value below 21 was associated with higher mortality at all ages. Variables that were associated with significantly increased odds of survival included asthenia/weakness, headache, and abdominal pain (p < 0.02 each). In contrast, the presence of bleeding within the first 48 hours of admission increased the odds of death by almost 70% (p < 0.03). While the geographical distribution of cases within West Africa did not reveal a trend in CFR by location (Fig 1), there was a modest inverse correlation (r = -0.51) between CFR and number of cases at each ETU, suggesting that patients at treatment centers that had larger numbers of cases may have had less lethal outcomes for reasons that have not been determined.

Table 1. Demographic and clinical characteristics of patients in the West Africa derivation cohort.

| Characteristic | Survived, n = 345 | Died, n = 234 | OR (95% CI) a | p-value b |
| --- | --- | --- | --- | --- |
| Demographics | | | | |
| Age (years), median (IQR) | 11 (7, 14) | 6 (3, 13) | 0.55 (0.46–0.65) | <0.001 |
| Male sex, n (%) | 159 (46) | 112 (48) | 1.07 (0.77–1.5) | 0.67 |
| Symptoms c | | | | |
| Asthenia | 297 (88) | 171 (74) | 0.39 (0.25–0.6) | <0.001 |
| Headache | 203 (60) | 93 (42) | 0.48 (0.34–0.68) | <0.001 |
| Abdominal pain | 165 (50) | 78 (36) | 0.57 (0.4–0.81) | 0.002 |
| Bleeding | 43 (14) | 45 (22) | 1.68 (1.06–2.67) | 0.027 |
| Joint pain | 113 (38) | 48 (29) | 0.68 (0.45–1.01) | 0.060 |
| Bone or muscle pain | 121 (37) | 61 (29) | 0.71 (0.48–1.02) | 0.067 |
| Respiratory distress | 26 (7.9) | 27 (13) | 1.69 (0.95–2.99) | 0.071 |
| Vomiting | 206 (61) | 121 (54) | 0.76 (0.54–1.07) | 0.12 |
| Nausea | 172 (60) | 104 (54) | 0.78 (0.54–1.13) | 0.19 |
| Conjunctivitis | 47 (18) | 20 (15) | 0.8 (0.45–1.4) | 0.45 |
| Diarrhea | 187 (56) | 125 (59) | 1.13 (0.8–1.61) | 0.48 |
| Hiccups | 24 (7.3) | 12 (5.7) | 0.77 (0.37–1.55) | 0.48 |
| Fever | 298 (88) | 199 (87) | 0.88 (0.54–1.46) | 0.63 |
| Anorexia | 225 (74) | 128 (75) | 1.10 (0.72–1.70) | 0.67 |
| Swallowing problems | 60 (18) | 40 (19) | 1.01 (0.64–1.57) | 0.97 |
| Ct value, median (IQR) | 26.8 (23.6, 30.8) | 21.7 (18.9, 26.5) | 0.35 (0.23–0.51) | <0.001 |

a: OR is for each 5-year increase in age

b: Bold values denote statistical significance

c: n (%)

Abbreviations: IQR: interquartile range; Ct: cycle threshold; OR: odds ratio; CI: confidence intervals

Fig 1. Map of children with Ebola Virus Disease (EVD).

The map shows the geographical distribution of children with EVD included in triage data from the Ebola Data Platform, collected during the West African EVD outbreak from 2014–2016. Bubble size corresponds to the number of cases reported, and color corresponds to observed case fatality rate. Plotted with the R package tmap [52], using base layer maps in the public domain from the Natural Earth project (https://www.naturalearthdata.com/about/terms-of-use/).

Derivation of clinical prognostic model

The clinical prognostic model, which we refer to as the EVD Prognosis in Children (EPiC) model, included two continuous predictors (age and Ct value) and four binary covariates (bleeding, diarrhea, respiratory distress, dysphagia). The EPiC model showed strong performance upon internal validation with AUC = 0.77 (95% CI: 0.74–0.81).

External validation of clinical prognostic model

The DRC Mangina dataset consisted of 74 children with EVD (S2 Fig and S3 Table). A comparison of the derivation and validation cohorts is given in Table 2. The AUC on the external validation cohort was 0.76 (95% CI: 0.64–0.88) (Fig 2A). To quantitatively assess the predictive value of the EPiC model, we considered the slope and intercept of the linear fit of the calibration data and compared them against the ideal: a slope of 1 and an intercept of 0 (Fig 2B). The slope was 0.89 and the intercept -0.09, indicating that the EPiC model provides a good risk estimation overall, with only a small bias towards overestimating the risk of death for all patients (except for one outlier point corresponding to a single high-risk patient) by approximately 0.1 on average. The confusion matrix and additional performance measures (including false alarm and miss rates) calculated at the optimal prediction cutoff for accuracy (pcutoff = 0.63) are presented in S4A and S4B Table. The pcutoff value of 0.63 is also consistent with the bias of around 0.1 towards risk overestimation observed in the calibration plot. Prior prognostic models [22–25] are not pediatric specific and may use different features, which makes comparisons difficult. However, we were able to apply the minimal (age+CT) model from [25], which was trained on all the patients (pediatric and adult) from the IMC ETUs in the West African EVD outbreak (a subset of the EDP dataset), to the DRC dataset. Its performance is shown in S4 Fig: the AUC was similar at 0.77 (95% CI: 0.65–0.88), but the model was less well calibrated, with a slope of 1.22 and an intercept of -0.27 in the linear fit to the calibration data. This is consistent with previous observations [25] that clinical features make a small contribution to the prediction relative to age and viral load, as can be seen for our model in the ANOVA and odds ratio (OR) charts in S5 Fig. However, the inclusion of selected clinical features does consistently (in our study and in prior studies) result in better calibrated models.
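The choice of a prediction cutoff that maximizes accuracy, and the false alarm and miss rates at that cutoff, can be illustrated with a short sketch. It is not the caret `confusionMatrix()` workflow used for S4 Table. It assumes death is the positive class, takes the false alarm rate as FP/(FP+TN) and the miss rate as FN/(FN+TP), and uses invented outcomes and predicted risks.

```python
def confusion_counts(labels, probs, cutoff):
    """Count TP, FP, TN, FN when predicting death for probs >= cutoff."""
    tp = fp = tn = fn = 0
    for y, p in zip(labels, probs):
        predicted = 1 if p >= cutoff else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def rates(labels, probs, cutoff):
    """Accuracy, false alarm rate, and miss rate at a given cutoff."""
    tp, fp, tn, fn = confusion_counts(labels, probs, cutoff)
    return {
        "accuracy": (tp + tn) / len(labels),
        "false_alarm_rate": fp / (fp + tn) if (fp + tn) else float("nan"),
        "miss_rate": fn / (fn + tp) if (fn + tp) else float("nan"),
    }


def best_cutoff_for_accuracy(labels, probs):
    """Scan observed probabilities as candidate cutoffs; on ties in
    accuracy, the smallest cutoff is returned."""
    candidates = sorted(set(probs))
    return max(candidates, key=lambda c: rates(labels, probs, c)["accuracy"])


# Invented outcomes (1 = died) and predicted risks of death.
labels = [0, 0, 1, 0, 1, 1]
probs = [0.1, 0.3, 0.4, 0.6, 0.7, 0.9]

cutoff = best_cutoff_for_accuracy(labels, probs)
summary = rates(labels, probs, cutoff)
print(cutoff, summary)
```

A cutoff tuned for accuracy treats false alarms and misses as equally costly; a clinical deployment might weight misses more heavily.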

Table 2. Comparison of baseline characteristics in West Africa derivation and DRC validation cohorts.

| Characteristic | Derivation Cohort | Validation Cohort |
| --- | --- | --- |
| Case-fatality rate (n, %) | 234 (40.4) | 22 (29.7) |
| Continuous predictors (median, IQR) | | |
| Age | 10 (5–14) | 5 (1.5–14) |
| Ct value | 25.1 (20.9–29.5) | 19.3 (17.6–26.1) |
| Binary symptoms (n, %) a | | |
| Bleeding | 88 (15.1) | 17 (22.9) |
| Diarrhea | 57 (9.8) | 40 (54.1) |
| Respiratory distress | 9.7 (1.7) | 16 (21.6) |
| Dysphagia | 19 (3.3) | 16 (21.6) |

a: Covariates presented are those included in the EPiC model.

Abbreviations: IQR: interquartile range; Ct: cycle threshold

We sought to improve model performance by recalibrating the intercept and slope of the calibration plot and adding a biomarker to the model that was only available in the DRC data. An analysis of peak laboratory test results measured within the first 48 hours after admission identified three variables each significantly (p < 0.01) correlated with mortality: ALT (r = 0.57), AST (r = 0.56), and CK (r = 0.51). We omitted ALT because it is highly collinear with AST (Pearson correlation = 0.83). Despite limited availability of test results in the validation data (AST: n = 29; CK: n = 33), we used these new variables to build additional models. Models that incorporated an additional predictor outperformed the original EPiC model on the validation data: adding CK as a predictor produced an AUC of 0.87 (95% CI: 0.74–1), while adding AST gave an AUC of 0.90 (95% CI: 0.77–1). We also considered a third model with both AST and CK added as predictors, since the association between these two biomarkers was moderate (Pearson correlation = 0.52), suggesting that they contain some amount of mutually independent information that could be combined to improve the predictions. Indeed, the model with AST and CK yielded a higher AUC of 0.95 (95% CI: 0.86–1). The confusion matrix for this model exhibits an almost perfect discriminative capability, with only 1 misclassification in each outcome category (S5A and S5B Table). However, the sample size for this model was reduced further to n = 23, since it requires patients to have data for both biomarkers. The ROCs and calibration plots for these three models are shown in Fig 3.
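Spearman's rank correlation, used here to screen biomarkers against outcome, is the Pearson correlation computed on ranks, with tied values sharing their average rank. The following is a minimal standard-library sketch of that definition; the input values in the example are invented and do not come from the study data.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values receive the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented values: a monotone but non-linear relationship gives rho = 1.
rho = spearman([1, 2, 3, 4], [1, 8, 27, 64])
print(rho)
```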

Fig 2. Performance characteristics of the prediction model.

Discrimination (A) and calibration (B) plots of the Ebola Virus Disease Prognosis in Children (EPiC) model are shown for the Democratic Republic of the Congo validation dataset. In the discrimination plot, the receiver operating characteristic (ROC) curve is plotted (central black line) together with the 95% confidence interval band (blue shaded area). In the calibration plot, the dots represent the mean estimate of the observed probability for each 10% bin of predicted probability (with probability being risk of death), the vertical lines passing through each dot are the corresponding confidence intervals for the observed probability, the dashed line is the best linear fit passing through the mean values, and the red line is the LOESS curve fitting all the individual observed/predicted pairs in the data.
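The binned calibration points and their linear fit can be sketched as below. This is an illustration under stated assumptions rather than the plotting code behind the figure: it takes the x-coordinate of each dot to be the mean predicted probability within its 10% bin, fits an ordinary least-squares line through the dots, and uses invented data. A perfectly calibrated model would give a slope of 1 and an intercept of 0.

```python
def calibration_points(labels, probs, n_bins=10):
    """Return (mean predicted, mean observed) for each non-empty bin of
    predicted probability; labels are 1 (died) or 0 (survived)."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(labels, probs):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[index].append((p, y))
    points = []
    for members in bins:
        if members:
            mean_pred = sum(p for p, _ in members) / len(members)
            mean_obs = sum(y for _, y in members) / len(members)
            points.append((mean_pred, mean_obs))
    return points


def linear_fit(points):
    """Ordinary least-squares slope and intercept through (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Invented example: two bins, with observed risk above predicted risk.
points = calibration_points([0, 1, 1, 1], [0.25, 0.25, 0.85, 0.85])
slope, intercept = linear_fit(points)
print(points, slope, intercept)
```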

Fig 3. Discrimination and calibration curves.

Area under the receiver operating characteristic curves (AUC) (A, C, E) and calibration curves (B, D, F) of the Ebola Virus Disease Prognosis in Children (EPiC) model are shown with aspartate aminotransferase (AST) (A, B), creatine kinase (CK) (C, D), or both (E, F) as additional predictors for the Democratic Republic of the Congo validation dataset. The interpretation of the plots is the same as in Fig 2.

Discussion

In this study, we derived and externally validated a prognostic model for pediatric EVD. Our model showed that younger age, lower Ct values and bleeding are poor prognostic factors while asthenia, headache and abdominal pain predict better outcomes. A few studies have described key predictors of EVD mortality among children under 18 years of age during the 2014–2015 Sierra Leone outbreak [6,18]. Shah et al reported fever, vomiting, and diarrhea as significant symptoms associated with death in children under 6 years, and Kangbai et al found that males younger than 16 years of age, who had abdominal pain, vomiting, conjunctivitis, and difficulty breathing at admission, had increased odds of dying [6,18]. A similar study in the 2014–2015 Guinea outbreak determined that older children with diarrhea, fever, and hemorrhage were at greater risk of death, while another study during the same outbreak did not report any significant risk factors for mortality among patients under 20 years [4,11]. Such findings illustrate that predicting outcomes for children with EVD presents unique challenges because the epidemiology and complications of EVD in one outbreak may vary from those in another outbreak due to differing health seeking behaviors, viral dynamics, medical interventions, and socioeconomic, cultural, and political contexts.

Updating the EPiC model with certain biochemical tests (AST or CK) improved its performance characteristics by a substantial margin, even though the sample size was small. AST has previously been shown to be significantly elevated in patients with EVD and associated with more severe and fatal disease [23,53,54]. Elevated AST likely reflects not only viral-induced hepatitis but also damage to other cells and end-organs such as red blood cells, pancreas, muscle, or kidneys. To our knowledge, CK has not been previously described as a predictive biomarker for EVD outcomes. These biomarker data can be useful in helping to predict mortality for pediatric patients with EVD. For instance, shock may lead to an increase in AST/ALT and CK. However, the results must be interpreted with caution due to the small sample size.

The EPiC model building approach was based on Elastic Net, a form of regularized regression that has been benchmarked favorably against the more commonly-used stepwise regression [55,56]. Regularized regression is particularly good at retaining explanatory variables while reducing model complexity by removing nuisance variables. Our final EPiC model that emerged from the Elastic Net-based variable selection protocol is parsimonious in its complexity and the included predictors of EVD severity match clinical intuition. Furthermore, we were able to easily extend this protocol to update the model with additional biochemical predictors available in the DRC data. These compelling results suggest that our variable selection and model update protocol could be applied to other similar datasets.

A limitation of our study was the moderate amount of missing data (approximately 10%) for some variables, which highlights the difficulty of collecting data during a humanitarian emergency. Also, some patients may have been given experimental treatment under compassionate use, but such detailed information is not available in the West Africa derivation dataset. Furthermore, only data aggregated by day were available. As such, it was not possible to determine whether a patient died immediately upon arrival or later that day, requiring us to exclude all patients who died within one day of admission from our prediction model. The good overall calibration of our model suggests that such exclusion did not significantly affect the predictions. Additionally, our derivation dataset was collected from several different humanitarian agencies with differing data collection and laboratory procedures. Therefore, the scale of Ct values may vary between laboratories. All Ct values presented in this manuscript were used to derive, validate, and update the models without any normalization to account for the potential differences across the laboratories in the EDP dataset. Rerunning the calculations with normalized Ct values (obtained by subtracting the mean and dividing by the standard deviation at each site) revealed that all AUC values remained the same except for the AUC value on the validation dataset, which was slightly lowered from 0.76 (CI 0.64–0.88) to 0.71 (CI 0.59–0.84). This indicates that the effect of Ct differences across sites is not large, but also that models could be improved if raw Ct data were more consistent or if a more rigorous inter-site normalization protocol could be defined. In addition, our validation dataset is small (74 cases total) due to inclusion of only those cases with complete data, so study results have to be interpreted with caution, particularly those from model updating, which further reduced the sample size. However, these favorable preliminary results provide compelling justification for future prospective studies to investigate the prognostic utility of certain biomarkers for children as well as adults. These biomarkers, which are often part of a standard blood chemistry panel, are more accessible in low resource settings than more expensive testing such as proinflammatory cytokines [57]. Furthermore, collecting symptom information from children is difficult, especially from those who have not developed verbal skills. In fact, upon further testing, we found bone and muscle pain, asthenia, headache, and abdominal pain to be correlated with age, illustrating that children in the pre-verbal age group (defined as <2 years of age) cannot reliably report these symptoms (S3 Fig). Lastly, both settings adhered to WHO treatment guidelines and each country’s respective national guidelines. As such, there may have been slight differences in the treatment protocol between the West Africa derivation cohort and the DRC validation cohort.
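The per-site Ct normalization mentioned in this paragraph (subtracting the site mean and dividing by the site standard deviation) can be sketched as follows. The site names and Ct values are invented, and the use of the sample standard deviation is an assumption, since the text does not specify which estimator was used.

```python
from collections import defaultdict
from statistics import mean, stdev


def normalize_ct_by_site(records):
    """Convert (site, ct) records into (site, z-score) records, where the
    z-score uses the mean and sample standard deviation of that site.
    Each site needs at least two distinct Ct values."""
    by_site = defaultdict(list)
    for site, ct in records:
        by_site[site].append(ct)
    site_stats = {site: (mean(v), stdev(v)) for site, v in by_site.items()}
    return [
        (site, (ct - site_stats[site][0]) / site_stats[site][1])
        for site, ct in records
    ]


# Invented Ct values from two hypothetical sites with shifted scales.
records = [
    ("site_A", 20.0), ("site_A", 24.0), ("site_A", 28.0),
    ("site_B", 18.0), ("site_B", 22.0), ("site_B", 26.0),
]
normalized = normalize_ct_by_site(records)
print(normalized)
```

After this transformation the two hypothetical sites share the same scale, which is the intended effect when laboratories report systematically shifted Ct values.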

In conclusion, the EPiC model is the first externally validated model for the prognosis of pediatric EVD. Pediatric patients with asthenia/weakness, headache, and abdominal pain were more likely to survive, while younger children and children with lower Ct values, bleeding, diarrhea, respiratory distress, or dysphagia were more likely to die from EVD. As Ct value is a strong clinical predictor, rapid molecular tests should be widely available. The addition of routine blood biochemical markers, such as AST and CK, which are usually available, strengthened the model. This model can be easily applied by bedside clinicians to assess pediatric patients at risk for death and help to allocate resources accordingly. In fact, an online calculator has been developed so that clinicians can conveniently use the EPiC model to calculate risk scores, available at: https://kelseymbutler.shinyapps.io/epic-calculator/. Future improvements of this model would result from larger sample sizes with more consistent variable definitions and protocols across sites.

Supporting information

S1 Data. Excel spreadsheet containing all patients less than 18 years of age who presented to the Mangina ETU in DRC from December 2018 to January 2020 with laboratory confirmed EVD.

(XLSX)

S1 Text. Methods appendix on summary and imputation of missing data.

(DOCX)

S1 Fig. Distribution of imputed CT values for patients who presented with or without bleeding upon admission.

(TIF)

S2 Fig. Flowchart of patients excluded in West Africa derivation cohort (A) and DRC validation cohort (B).

(TIF)

S3 Fig. Correlation between bone and muscle pain, asthenia, headache, and abdominal pain with age group.

Age group was defined as pre-verbal (<2 years), early verbal (2-<5 years), verbal (5-<10 years), verbal (10–17 years).

(TIF)

S4 Fig

Receiver operating characteristic (ROC) (A) and calibration (B) plots for the minimal model (age+CT) described in [22], trained on all patients (not pediatric-specific) in the EVD West African dataset from IMC. The intercept and slope of the linear fit to the predicted probabilities are -0.27 and 1.22, respectively.

(TIF)
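The calibration intercept and slope quoted for S4 Fig can be illustrated with a short sketch. The Python below is an illustration under stated assumptions, not the authors' R code: it groups patients into equal-sized bins of predicted risk, computes the observed death rate in each bin, and fits an ordinary least-squares line of observed rate on mean predicted risk. The function name `calibration_line`, the binning scheme, and the default number of bins are hypothetical; the source does not say how its linear fit was computed.

```python
from statistics import mean


def calibration_line(predicted, observed, n_bins=10):
    """Return (intercept, slope) of a least-squares line through the
    per-bin points (mean predicted risk, observed event rate)."""
    n = len(predicted)
    if n != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if n_bins < 2 or n < n_bins:
        raise ValueError("need at least two bins and one patient per bin")

    # Sort patients by predicted risk and cut them into near-equal bins.
    pairs = sorted(zip(predicted, observed))
    xs, ys = [], []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        xs.append(mean([p for p, _ in chunk]))
        ys.append(mean([o for _, o in chunk]))

    # Ordinary least squares for y = intercept + slope * x.
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("predicted risks are identical across bins")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope
```

Under this convention, a perfectly calibrated model gives an intercept near 0 and a slope near 1; a negative intercept with a slope near 1 means the predicted risks are uniformly higher than the observed death rates.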

S5 Fig. Plots showing the importance of the features in the EPiC model.

Analysis of variance chart generated with the anova() function in the rms package, showing a ranking of the features according to their predictive contribution to the model, as measured by the Wald χ² minus d.f. (degrees of freedom) statistic (A). Chart generated with the summary() function in rms, showing the odds ratios for all the features in the model, using interquartile-range odds ratios for continuous features (age and Ct) and simple odds ratios for binary (yes/no) features (B).

(TIF)
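The two kinds of odds ratio mentioned for S5 Fig differ only in the size of the change in the predictor. As a minimal sketch in Python (not the authors' R code), and assuming a fitted logistic-regression coefficient `beta` on the log-odds scale, a binary feature has odds ratio exp(beta), while an interquartile-range odds ratio compares the upper quartile of a continuous feature with its lower quartile, exp(beta × (Q3 − Q1)). The function names and the choice of the "inclusive" quartile method are assumptions made for illustration.

```python
import math
from statistics import quantiles


def binary_odds_ratio(beta):
    """Odds ratio for a yes/no feature: exp(coefficient)."""
    return math.exp(beta)


def iqr_odds_ratio(beta, values):
    """Odds ratio for moving a continuous feature from its lower
    quartile (Q1) to its upper quartile (Q3): exp(beta * (Q3 - Q1))."""
    # The quartile method is an illustrative choice, not taken from the source.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return math.exp(beta * (q3 - q1))
```

Reporting continuous features over their interquartile range puts age (in years) and Ct (in cycles) on a comparable footing, since a one-unit change means something different for each.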

S1 Table. Summary of missing values in West Africa derivation dataset.

(XLSX)

S2 Table. Results of variable selection protocol for categorical variables.

Variables were included only if they were selected in 100% of the models, which includes bleeding, diarrhea, respiratory distress, and swallowing problems.

(XLSX)

S3 Table. Demographic and clinical characteristics of patients in DRC validation cohort.

(XLSX)

S4 Table

Confusion matrix (a) and detailed performance measures (b) of the EPiC model as applied to the DRC validation dataset at the optimal prediction cutoff for accuracy (0.63). Both tables are the result of the confusionMatrix() function in the R package caret (https://cran.r-project.org/web/packages/caret/index.html), with the addition of the false alarm and miss rates to S4B Table, and the removal of a few less commonly used measures (e.g., McNemar’s test P-value and Kappa coefficient).

(XLSX)
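The measures reported in S4 Table follow from four counts. The sketch below, in Python rather than the R code the authors used, assumes that death is the positive class and that a patient is predicted to die when the predicted risk is at or above the cutoff; both are assumptions, as are the function names. It uses the standard definitions: false alarm rate = FP / (FP + TN) = 1 − specificity, and miss rate = FN / (FN + TP) = 1 − sensitivity.

```python
def confusion_counts(probabilities, outcomes, cutoff):
    """Count TP, FP, TN, FN, treating death (1) as the positive class and
    predicting death when the predicted risk is at or above the cutoff."""
    tp = fp = tn = fn = 0
    for p, y in zip(probabilities, outcomes):
        predicted_death = p >= cutoff
        if predicted_death and y == 1:
            tp += 1
        elif predicted_death:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def performance_measures(tp, fp, tn, fn):
    """Derive summary measures; assumes each class has at least one patient."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "false_alarm_rate": 1 - specificity,  # FP / (FP + TN)
        "miss_rate": 1 - sensitivity,         # FN / (FN + TP)
    }
```

Sweeping `cutoff` from 0 to 1 and recording sensitivity and specificity at each value traces out the trade-off between the two; the cutoff that maximizes accuracy is one possible operating point among several.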

S5 Table

Confusion matrix (a) and detailed performance measures (b) of the EPiC model augmented with AST and CK as applied to the DRC validation dataset at the optimal prediction cutoff for accuracy (0.5). Interpretation of these tables is the same as in S4 Table.

(XLSX)

Acknowledgments

The authors would like to thank the International Medical Corps field teams who serve tirelessly to provide excellent care to patients with Ebola Virus Disease, the Advance Clinical and Translational Research (Advance-CTR) at Brown University, and the staff of the Infectious Diseases Data Observatory Ebola Data Platform, without whom this study would not have been possible. The content of this manuscript is solely the responsibility of the authors and does not necessarily represent the views of any governmental bodies or academic organizations.

Data Availability

The Ebola Data Access Committee (DAC) manages and oversees all data access applications for re-use of the West Africa EDP derivation dataset, in accordance with their Data Access Guidelines and Data Transfer Agreement. Access can be requested from the DAC at: https://www.iddo.org/ebola/data-sharing/accessing-data. The DRC validation dataset is provided as a supplementary table (S1 Data). All source code used for model construction, validation, and updating is deposited in the following repository: https://github.com/colabobio/ebola-pediatric-prognostic-model/.

Funding Statement

AEG received funding for data collection from the Rhode Island Foundation [grant number 5222_20200596] and the National Institute of Allergy and Infectious Diseases [grant number R25AI140490]. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. WHO Ebola Response Team. After Ebola in West Africa—Unpredictable Risks, Preventable Epidemics. N Engl J Med. 2016 Aug 11;375(6):587–96. doi: 10.1056/NEJMsr1513109.
  • 2. WHO Ebola Response Team. Ebola virus disease in West Africa—the first 9 months of the epidemic and forward projections. N Engl J Med. 2014 Oct 16;371(16):1481–95. doi: 10.1056/NEJMoa1411100. Epub 2014 Sep 22. PMCID: PMC4235004.
  • 3. Smit MA, Michelow IC, Glavis-Bloom J, Wolfman V, Levine AC. Characteristics and Outcomes of Pediatric Patients With Ebola Virus Disease Admitted to Treatment Units in Liberia and Sierra Leone: A Retrospective Cohort Study. Clin Infect Dis. 2017;64(3):243–249. doi: 10.1093/cid/ciw725.
  • 4. Chérif MS, Koonrungsesomboon N, Kassé D, Cissé SD, Diallo SB, Chérif F, et al. Ebola virus disease in children during the 2014–2015 epidemic in Guinea: a nationwide cohort study. Eur J Pediatr. 2017 Jun;176(6):791–796. doi: 10.1007/s00431-017-2914-z. Epub 2017 Apr 25.
  • 5. Gulland A. Ebola mortality is highest among babies, finds study. BMJ. 2015;350(mar27 10):h1718–h1718. doi: 10.1136/bmj.h1718.
  • 6. Shah T, Greig J, van der Plas LM, Achar J, Caleo G, Squire JS, et al. Inpatient signs and symptoms and factors associated with death in children aged 5 years and younger admitted to two Ebola management centres in Sierra Leone, 2014: a retrospective cohort study. Lancet Glob Heal. 2016;4(7):e495–e501. doi: 10.1016/S2214-109X(16)30097-3.
  • 7. UNICEF. Children account for more than one third of Ebola cases in eastern Democratic Republic of the Congo—UNICEF. Accessed August 18, 2021. https://www.unicef.org/wca/press-releases/children-account-more-one-third-ebola-cases-eastern-democratic-republic-congo-unicef
  • 8. Centers for Disease Control and Prevention. Ebola (Ebola Virus Disease), History of Ebola Outbreaks. Accessed August 19, 2021. https://www.cdc.gov/vhf/ebola/history/chronology.html
  • 9. Ilunga P. “DR Congo declares end to latest Ebola outbreak”. The East African. December 17, 2021. Accessed December 26, 2021. https://www.theeastafrican.co.ke/tea/rest-of-africa/dr-congo-declares-end-to-latest-ebola-outbreak-3655822
  • 10. Mupere E, Kaducu OF, Yoti Z. Ebola haemorrhagic fever among hospitalised children and adolescents in northern Uganda: epidemiologic and clinical observations. Afr Health Sci. 2001 Dec;1(2):60–5. PMCID: PMC2141551.
  • 11. Sow MS, Sow DC, Diallo ML, Kassé D, Sylla K, Camara A, et al. Ebola virus disease in children in Conakry and Coyah Ebola treatment centers and risk factors associated with death. Médecine Mal Infect. 2020;50(7):562–566. doi: 10.1016/j.medmal.2019.12.001.
  • 12. WHO Ebola Response Team. Ebola virus disease among children in West Africa. N Engl J Med. 2015 Mar 26;372(13):1274–7. doi: 10.1056/NEJMc1415318. PMCID: PMC4393247.
  • 13. Peacock G, Uyeki TM, Rasmussen SA. Ebola virus disease and children: what pediatric health care professionals need to know. JAMA Pediatr. 2014;168:1087–1088. doi: 10.1001/jamapediatrics.2014.2835.
  • 14. Editorial. Children’s needs in an Ebola virus disease outbreak. Lancet Child Adolesc Health. 2019;3:55. doi: 10.1016/S2352-4642(18)30409-7.
  • 15. Dixit D, Masumbuko Claude K, Kjaldgaard L, Hawkes MT. Review of Ebola virus disease in children–how far have we come? Paediatr Int Child Health. 2021;41(1):12–27. doi: 10.1080/20469047.2020.1805260.
  • 16. Jacob ST, Crozier I, Fischer WA 2nd, Hewlett A, Kraft CS, de La Vega MA, et al. Ebola virus disease. Nat Rev Dis Primers. 2020;6:13. doi: 10.1038/s41572-020-0147-3.
  • 17. Kortepeter MG, Bausch DG, Bray M. Basic Clinical and Laboratory Features of Filoviral Hemorrhagic Fever. J Infect Dis. 2011;204(suppl_3):S810–S816. doi: 10.1093/infdis/jir299.
  • 18. Kangbai JB, Heumann C, Hoelscher M, Sahr F, Froeschl G. Epidemiological characteristics, clinical manifestations, and treatment outcome of 139 paediatric Ebola patients treated at a Sierra Leone Ebola treatment center. BMC Infect Dis. 2019;19(1):81. doi: 10.1186/s12879-019-3727-7.
  • 19. World Health Organization. (2019). Optimized supportive care for Ebola virus disease clinical management standard operating procedures; [cited 2020 Mar 22]. Available from: https://apps.who.int/iris/handle/10665/325000
  • 20. Fitzgerald F, Awonuga W, Shah T, Youkee D. Ebola response in Sierra Leone: the impact on children. J Infect. 2016;72(Suppl):S6–S12. doi: 10.1016/j.jinf.2016.04.016.
  • 21. Fitzgerald F, Naveed A, Wing K, Gbessay M, Ross JCG, Checchi F, et al. Ebola virus disease in children, Sierra Leone, 2014–2015. Emerg Infect Dis. 2016;22:1769–1777. doi: 10.3201/eid2210.160579.
  • 22. Hartley M-A, Young A, Tran A-M, Okoni-Williams HH, Suma M, Mancuso B, et al. Predicting Ebola Severity: A Clinical Prioritization Score for Ebola Virus Disease. Horby PW, ed. PLoS Negl Trop Dis. 2017;11(2):e0005265. doi: 10.1371/journal.pntd.0005265.
  • 23. Colubri A, Silver T, Fradet T, Retzepi K, Fry B, Sabeti P. Transforming Clinical Data into Actionable Prognosis Models: Machine-Learning Framework and Field-Deployable App to Predict Outcome of Ebola Patients. Churcher TS, ed. PLoS Negl Trop Dis. 2016;10(3):e0004549. doi: 10.1371/journal.pntd.0004549.
  • 24. Kangbai JB, Heumann C, Hoelscher M, Sahr F, Froeschl G. Severity score for predicting in-facility Ebola treatment outcome. Ann Epidemiol. 2020 Sep;49:68–74. doi: 10.1016/j.annepidem.2020.07.017. Epub 2020 Aug 5.
  • 25. Colubri A, Hartley M-A, Siakor M, Wolfman V, Felix A, Sesay T, et al. Machine-learning Prognostic Models from the 2014–16 Ebola Outbreak: Data-harmonization Challenges, Validation Strategies, and mHealth Applications. EClinicalMedicine. 2019;11:54–64. doi: 10.1016/j.eclinm.2019.06.003.
  • 26. Infectious Disease Data Observatory. Ebola—Accessing Data. Accessed August 01, 2021. Available from: https://www.iddo.org/ebola/data-sharing/accessing-data
  • 27. Alliance for International Medical Action (2016): ALIMA Ebola Treatment Centre Database—Nzérékoré, Guinea. Exaptive. (dataset). doi: 10.48688/7vxg-jb68.
  • 28. International Medical Corps (2016): International Medical Corps Ebola Treatment Unit Data from Bong and Margibi (Liberia); Port Loko, Kambia, and Makeni (Sierra Leone). Exaptive. (dataset). doi: 10.48688/8sm4-p926.
  • 29. Institute of Tropical Medicine Antwerp (2019): Evaluation of Convalescent Plasma for Ebola Virus Disease in Guinea. Exaptive. (dataset). doi: 10.48688/vngq-wh04.
  • 30. Médecins Sans Frontières (2019): MSF Ebola Treatment Unit Database—Guéckédou, Guinea. Exaptive. (dataset). doi: 10.48688/t4fq-es18.
  • 31. Médecins Sans Frontières (2019): MSF Ebola Treatment Unit Database—Foya, Liberia. Exaptive. (dataset). doi: 10.48688/x7eh-wb83.
  • 32. Médecins Sans Frontières (2019): MSF Ebola Treatment Unit Database—Donka, Guinea. Exaptive. (dataset). doi: 10.48688/q41r-jc95.
  • 33. Médecins Sans Frontières (2019): MSF Ebola Treatment Unit Database—Freetown, Sierra Leone. Exaptive. (dataset). doi: 10.48688/1wn0-3446.
  • 34. Médecins Sans Frontières (2019): MSF Ebola Treatment Unit Database—Monrovia, Liberia. Exaptive. (dataset). doi: 10.48688/fz2r-np73.
  • 35. Médecins Sans Frontières (2020): MSF Ebola Treatment Unit Databases from Bo Town, Kailahun and Magburaka, Sierra Leone. Exaptive. (dataset). doi: 10.48688/vw7p-pq15.
  • 36. Save the Children International (2019): Save the Children International (SCI) Ebola Treatment Unit Database—Kerry Town, Sierra Leone. Exaptive. (dataset). doi: 10.48688/18g3-d836.
  • 37. University of Oxford (2019): Experimental Treatment of Ebola Virus Disease with Brincidofovir. Exaptive. (dataset). doi: 10.48688/sbny-th82.
  • 38. University of Oxford (2016): Experimental Treatment of Ebola Virus Disease with TKM-130803: A Single-Arm Phase 2 Clinical Trial. Exaptive. (dataset). doi: 10.48688/sbny-th82.
  • 39. Kelly J. Make diagnostic centres a priority for Ebola crisis. Nature 513, 145 (2014). doi: 10.1038/513145a.
  • 40. Dhillon RS, Srikrishna D, Sachs J. Controlling Ebola: next steps. Lancet. 2014 Oct 18;384(9952):1409–11. doi: 10.1016/S0140-6736(14)61696-2.
  • 41. Medecins Sans Frontieres (MSF). Filovirus haemorrhagic fever guideline. Barcelona: MSF; 2008. Available from: http://ebolaalert.org/wp-content/themes/ebolaalert/assets/PDFS/SOPMSFReference.pdf
  • 42. World Health Organization (WHO). Clinical management of patients with viral haemorrhagic fever: a pocket guide for the front-line health worker. Geneva: WHO; 2016. Available from: https://www.who.int/publications/i/item/9789241549608
  • 43. Roshania R, Mallow M, Dunbar N, Mansary D, Shetty P, Lyon T, et al. Successful Implementation of a Multicountry Clinical Surveillance and Data Collection System for Ebola Virus Disease in West Africa: Findings and Lessons Learned. Glob Health Sci Pract. 2016 Sep 29;4(3):394–409. doi: 10.9745/GHSP-D-16-00186. PMCID: PMC5042696.
  • 44. Skrable K, Roshania R, Mallow M, Wolfman V, Siakor M, Levine AC. The natural history of acute Ebola Virus Disease among patients managed in five Ebola treatment units in West Africa: A retrospective cohort study. PLoS Negl Trop Dis. 2017;11(7):e0005700. Published 2017 Jul 19. doi: 10.1371/journal.pntd.0005700.
  • 45. Royston J. P. 1982. “An Extension of Shapiro and Wilk’s W Test for Normality to Large Samples.” Applied Statistics 31 (2): 115–24. doi: 10.2307/2347973.
  • 46. Harrell Frank E. Regression modeling strategies: with applications to linear models, logistic regression, and survival analysis. Vol. 608. New York: Springer, 2001.
  • 47. Friedman J, Hastie T, Tibshirani R. glmnet: Lasso and elastic-net regularized generalized linear models. R package version 1, no. 4 (2009): 1–24.
  • 48. Zou H., Hastie T. "Regularization and variable selection via the elastic net". Journal of the Royal Statistical Society. Series B, Statistical Methodology, 2005;67(2):301–320. doi: 10.1111/j.1467-9868.2005.00503.x.
  • 49. Dahly Darren L. “Evaluating a Logistic Regression Based Prediction Tool in R.” Darrendahly.github.io, 21 Apr. 2019, https://darrendahly.github.io/post/homr/
  • 50. Robin X, Turck N, Hainard A, Tiberti N, Lisacek F, Sanchez JC, et al. (2011) “pROC: an open-source package for R and S+ to analyze and compare ROC curves”. BMC Bioinformatics, 12, 77. doi: 10.1186/1471-2105-12-77.
  • 51. Nieboer D, Vergouwe Y, Ankerst DP, Roobol MJ, Steyerberg EW. "Improving prediction models with new markers: a comparison of updating strategies." BMC medical research methodology 16, no. 1 (2016): 1–10. doi: 10.1186/s12874-016-0231-2.
  • 52. Tennekes M. “tmap: Thematic Maps in R.” Journal of Statistical Software, 2018;84(6), 1–39. doi: 10.18637/jss.v084.i06.
  • 53. Onyango CO, Opoka ML, Ksiazek TG, Formenty P, Ahmed A, Tukei PM, et al. Laboratory diagnosis of Ebola hemorrhagic fever during an outbreak in Yambio, Sudan, 2004. J Infect Dis. 2007 Nov 15;196 Suppl 2:S193–8. doi: 10.1086/520609.
  • 54. WHO Clinical Response Team. Clinical illness and outcomes in patients with Ebola in Sierra Leone. N Engl J Med. 2014 Nov 27;371(22):2092–100. doi: 10.1056/NEJMoa1411680. Epub 2014 Oct 29. PMCID: PMC4318555.
  • 55. Smith G. Step away from stepwise. J Big Data 5, 32 (2018). doi: 10.1186/s40537-018-0143-6.
  • 56. Morozova O., Levina O., Uusküla A., Heimer R. Comparison of subset selection methods in linear regression in the context of health-related quality of life and substance abuse in Russia. BMC Med Res Methodol 15, 71 (2015). doi: 10.1186/s12874-015-0066-2.
  • 57. McElroy AK, Erickson BR, Flietstra TD, Rollin PE, Nichol ST, Towner JS, et al. Biomarker correlates of survival in pediatric patients with Ebola virus disease. Emerg Infect Dis. 2014 Oct;20(10):1683–90. doi: 10.3201/eid2010.140430. PMCID: PMC4193175.
PLoS Negl Trop Dis. doi: 10.1371/journal.pntd.0010789.r001

Decision Letter 0

Camille Lebarbenchon, Anita K McElroy

19 Apr 2022

Dear Dr Colubri,

Thank you very much for submitting your manuscript "Using machine learning to predict survival in children with Ebola Virus Disease" for consideration at PLOS Neglected Tropical Diseases. As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. In light of the reviews (below this email), we would like to invite the resubmission of a significantly-revised version that takes into account the reviewers' comments.

  

We cannot make any decision about publication until we have seen the revised manuscript and your response to the reviewers' comments. Your revised manuscript is also likely to be sent to reviewers for further evaluation.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Please prepare and submit your revised manuscript within 60 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email. Please note that revised manuscripts received after the 60-day due date may require evaluation and peer review similar to newly submitted manuscripts.

Thank you again for your submission. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Anita K. McElroy, MD, PhD

Associate Editor

PLOS Neglected Tropical Diseases

Camille Lebarbenchon

Deputy Editor

PLOS Neglected Tropical Diseases

***********************

Reviewer's Responses to Questions

Key Review Criteria Required for Acceptance?

As you describe the new analyses required for acceptance, please consider the following:

Methods

-Are the objectives of the study clearly articulated with a clear testable hypothesis stated?

-Is the study design appropriate to address the stated objectives?

-Is the population clearly described and appropriate for the hypothesis being tested?

-Is the sample size sufficient to ensure adequate power to address the hypothesis being tested?

-Were correct statistical analysis used to support conclusions?

-Are there concerns about ethical or regulatory requirements being met?

Reviewer #1: -There are several limitations to the methods. Many of the data elements were missing; in fact, 10% of data were missing.

Additionally, there was variety in the manner in which the patients were treated and data were collected. However, given the circumstances, it is difficult to envision how the data could have been collected differently during the epidemic. The authors should mention which of the 18 variables were missing and in what percentages in the Supplemental material.

-It would help if the authors provided more details about the IDDO EDP. Were data collected retrospectively or prospectively?

-suggest providing more details about how information about symptoms were collected

-suggest specifying why a 5 year age gap was used? did the authors discuss dividing age into a categorical variable since a 4 year old is very different from an infant? Perhaps they could do a sensitivity analysis using age groups that are categorical (infant, toddler, etc.). Would also suggest including more details about the number of patients in each categorical age group. The IQR is presented but it would be helpful to have more details

-please cite the reference for lines 156 regarding the Shapiro Wilk test

Reviewer #2: Using machine learning to predict survival in children with Ebola Virus Disease

Alicia E. Genisca, et al.

While Ebola virus disease (EVD) is well known to cause a highly lethal disease in all age groups, it is especially lethal in young pediatric patients. In this report, Genisca and colleagues use a machine learning approach to develop a prognostic model called the EVD Prognosis in Children (EPiC). The initial EPiC model was created using a training dataset based upon pediatric patients in the 2014-16 West Africa EVD outbreak. This resulted in a model that used age, Ct value, bleeding, diarrhea, breathlessness and dysphagia. This initial model performed with an AUC of 0.77 in the training cohort and, when evaluated using a smaller validation cohort from the Democratic Republic of Congo (DRC), showed a similar performance. Further evaluation of the model in the validation cohort found that it tended to overestimate the risk of death. Since this performance was not optimal, two additional laboratory parameters, AST and CK, were investigated to determine whether adding their values to the model would improve performance. They found the predictive AUC was 0.87 to 0.90. Overall, the claims of this paper are supported by the data presented. However, the parameters that were identified were already well known to be linked to poor outcomes, and the predictive model was not translatable into a clinically useful tool that would allow physicians to easily use it to cohort patients based on risks.

Comments:

1) Line 121-123: Patient selection excluded subjects if they were missing outcome data or if they died within 24 hours of admission to the Ebola treatment unit, “as a prediction tool would not be useful for a moribund patient”.

• A flow chart/decision tree for each cohort that shows how many patients were excluded for each reason should be provided as a supplemental figure. This will help assess for any potential biases in their training and validation cohorts.

• This reviewer respectfully disagrees with the assumption that death within 24 hours is always going to be associated with the patient being moribund at the time of admission, or that there might not be a reversible condition that could be addressed if appropriate risk factors were identified early. This is because interventions could be targeted by clinicians to address the risk factors.

2) While the timing of the ALT and CK sample values used was indicated, it was not clear what time point was used to make the determination for the parameters evaluated in the models. That is, was it at admission, if the symptom developed at any point, or something else? Similarly, for Ct, was the value at admission, the peak value, or something else?

3) Were there differences in clinical site PCR testing in terms of the viral genes targeted and specific test that determined the Ct value? If so, how were differences accounted for in the model?

4) In the West African outbreak, some patients were able to receive experimental therapies under compassionate use, expanded access or as part of clinical trials. It would be helpful to see the number of patients in each group that received an experimental therapy, especially monoclonal antibodies.

5) The manuscript notes that the original model overestimated the risk of death in the DRC validation cohort, this could be due to a larger number of patients in the validation cohort receiving effective therapeutics as part of expanded access (EUA) protocols or as part of the PALM randomized clinical trial. Particularly, two of the therapies were shown to be highly effective at reducing mortality. A supplementary table should be provided that shows the complete set of clinical parameters shown in Table 1 for the DRC cohort and that shows the number of patients in the cohort that received an experimental therapy under EUA or the PALM randomized clinical trial.

6) The results section discussing the Derivation of the Clinical Prognostic Model (lines 231-235) states which parameters were used in the model but does not explain to the reader how those were chosen amongst all the significantly different variables shown in Table 1. The manuscript would be improved by adding a few more details on how the machine learning protocol reduced the parameter set to two continuous and four binary covariates.

7) Line 252-253 indicated that ALT was not used in the models and that AST or CK were. This was because ALT was highly correlated with AST. Therefore, all comments and statements in the manuscript that ALT was used should be removed, for example on line 280.

8) Line 312-313. It is not clear from this report how a clinician would use this model to predict risk because there is no online toolkit, clinical scoring system, or something similar that one would utilize to calculate a risk. Also, the formal definitions of the clinical parameters used are not available. For example, what is the formal definition of breathlessness? Is it subjectively determined, or is it an abnormal respiratory rate, or is it having signs of severe respiratory distress, or something else? Thus, the statement about ease of application of the model should be removed.

Minor comments

9) Line 239. Validation AUC is 0.79 on this line and in the abstract it is 0.76.

10) Figure 2B. The figure would be improved by providing a description in the legend or by labels in the panel for what the dashed and red lines are.

Reviewer #3: - It is not clear why there is a need for a new model. As the authors also state in the manuscript, there are existing predictive models. These models were not evaluated using training and validation sets separately. The same datasets could be used to train, test and validate the existing predictive models. At least to justify the need for a new predictive model, the proposed model could be compared to these existing models over the same dataset.

- Bootstrapping is used for model training (or derivation, as the authors denote it in the manuscript). This approach could overestimate the training performance. Why not use K-fold cross validation?

- It will be good to prepare a table that includes all the variables that are used in Elastic net for variable selection. Also, this table could include information about whether these variables are continuous valued or binary.

--------------------

Results

-Does the analysis presented match the analysis plan?

-Are the results clearly and completely presented?

-Are the figures (Tables, Images) of sufficient quality for clarity?

Reviewer #1: The data presented do match the analysis plan and the figures are legible and clear. A few minor suggestions:

-In Table 1, I would add a footnote mentioning that the OR is for each 5 year increase in age

-Suggest a new title for Table 1 reflecting that this title is focused on mortality and predictors of mortality

-The mortality, location, and age of the validation cohort appear to be very different from the original cohort. Do the authors think this might affect their results?

Reviewer #2: see above

Reviewer #3: - It is not clear how many patients were in the training set and how many were in the validation set. Tables 1 and 2 are confusing, as the number of participants used for derivation in Table 2 is 234 while the total number of patients in the external validation set is 74.

- What does the derivation cohort presented in Table 2 mean? Is the derivation cohort from the DRC Mangina dataset used to refine the model trained using the data described in Table 1? If yes, the results do not present an external validation. Evaluation data should not be used in the training.

- The overall description of results needs to be clearer, and clearer descriptions of the tables are needed.

- Why not train and test the existing predictive methods using the same datasets to compare with the proposed method? This will also justify the need for a new predictive model.

- How are the Sensitivity vs Specificity curves obtained? This needs to be explained clearly. Also, what are the probability of false alarm and probability of miss?

--------------------

Conclusions

-Are the conclusions supported by the data presented?

-Are the limitations of analysis clearly described?

-Do the authors discuss how these data can be helpful to advance our understanding of the topic under study?

-Is public health relevance addressed?

Reviewer #1: -the authors did not include treatment type or type of facility in their analysis. Do the authors think this could be a potential limitation of their study?

-why do the authors think that asthenia, headache, and abdominal pain were correlated with better outcomes? could this have to do with the ability of the child to relate their symptoms to a caregiver? in other words a child with a better mental status or who is older may be better able to express those symptoms?

-authors should mention that collecting symptom information in children is difficult and may be a limitation (especially for very young children)

-line 283- the authors should mention specifically that shock may lead to an increase in AST/ALT and CK

-another limitation that the authors should mention is that Ct values vary from assay to assay

-the validation model only looked at 74 cases in a different setting and time frame which is an additional limitation

Reviewer #2: see above

Reviewer #3: (No Response)

--------------------

Editorial and Data Presentation Modifications?

Use this section for editorial suggestions as well as relatively minor modifications of existing data that would enhance clarity. If the only modifications needed are minor and/or editorial, you may wish to recommend “Minor Revision” or “Accept”.

Reviewer #1: (No Response)

Reviewer #2: (No Response)

Reviewer #3: (No Response)

--------------------

Summary and General Comments

Use this section to provide overall comments, discuss strengths/weaknesses of the study, novelty, significance, general execution and scholarship. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. If requesting major revision, please articulate the new experiments that are needed.

Reviewer #1: This is a well written manuscript that provides a model for assessing risk of death in pediatric patients with EVD. This is an important study because it can prove helpful in future epidemics. It would be further strengthened if it included data on therapeutics and care.

Reviewer #2: (No Response)

Reviewer #3: (No Response)

--------------------

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: No

Figure Files:

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, log in and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

Data Requirements:

Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms, etc. For an example see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.

Reproducibility:

To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

PLoS Negl Trop Dis. doi: 10.1371/journal.pntd.0010789.r003

Decision Letter 1

Camille Lebarbenchon, Anita K McElroy

28 Jul 2022

Dear Dr Colubri,

Thank you very much for submitting your manuscript "Constructing, validating, and updating machine learning models to predict survival in children with Ebola Virus Disease" for consideration at PLOS Neglected Tropical Diseases. As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. The reviewers appreciated the attention to an important topic. Based on the reviews, we are likely to accept this manuscript for publication, providing that you modify the manuscript according to the review recommendations.

Please prepare and submit your revised manuscript within 30 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to all review comments, and a description of the changes you have made in the manuscript.

Please note, while forming your response, that if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Thank you again for your submission to our journal. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Anita K. McElroy, MD, PhD

Academic Editor

PLOS Neglected Tropical Diseases

Camille Lebarbenchon

Section Editor

PLOS Neglected Tropical Diseases

***********************

Reviewer's Responses to Questions

Key Review Criteria Required for Acceptance?

As you describe the new analyses required for acceptance, please consider the following:

Methods

-Are the objectives of the study clearly articulated with a clear testable hypothesis stated?

-Is the study design appropriate to address the stated objectives?

-Is the population clearly described and appropriate for the hypothesis being tested?

-Is the sample size sufficient to ensure adequate power to address the hypothesis being tested?

-Were correct statistical analyses used to support conclusions?

-Are there concerns about ethical or regulatory requirements being met?

Reviewer #2: see editorial and data presentation section

Reviewer #3: The authors addressed most of our concerns. However, we still think that a comparison with existing methods would better motivate the need for a new predictive model. The authors claim that the previously developed predictive models are limited in geographic and temporal scope. Our understanding is that the previous models, when trained and tested, did not have access to the datasets that the authors used in this manuscript. When trained and tested on the same dataset, these existing models may outperform the methodology that the authors propose in this paper. We think that the authors need to demonstrate that their proposed predictive model outperforms the existing methods when the same datasets are used. This will provide a better motivation for the proposed methodology. Then, if the proposed method outperforms the existing methods on the same dataset, the methodological differences, and why these differences matter, should be discussed.

--------------------

Results

-Does the analysis presented match the analysis plan?

-Are the results clearly and completely presented?

-Are the figures (Tables, Images) of sufficient quality for clarity?

Reviewer #2: see editorial and data presentation section

Reviewer #3: please see above.

--------------------

Conclusions

-Are the conclusions supported by the data presented?

-Are the limitations of analysis clearly described?

-Do the authors discuss how these data can be helpful to advance our understanding of the topic under study?

-Is public health relevance addressed?

Reviewer #2: see editorial and data presentation section

Reviewer #3: please see above.

--------------------

Editorial and Data Presentation Modifications?

Use this section for editorial suggestions as well as relatively minor modifications of existing data that would enhance clarity. If the only modifications needed are minor and/or editorial, you may wish to recommend “Minor Revision” or “Accept”.

Reviewer #2: If not commented on below, the responses provided by the authors have satisfied this reviewer.

Initial Reviewer comment

Line 121-123: Patient selection excluded subjects if they were missing outcome data or if they died within 24 hours of admission to the Ebola treatment unit, “as a prediction tool would not be useful for a moribund patient”. A flow chart/decision tree for each cohort that shows how many patients were excluded for each reason should be provided as a supplemental figure. This will help assess for any potential biases in their training and validation cohorts. This reviewer respectfully disagrees with the assumption that death within 24 hours is always associated with the patient being moribund at the time of admission or that there might not be a reversible condition that could be addressed if appropriate risk factors were identified early. This is because interventions could be targeted by clinicians to address the risk factors.

○ Thank you for your thoughtful comment. We respectfully disagree with the reviewer. Based on the experience in the field of some of the co-authors of this paper, we believe that inclusion of children who died within the first 24 hours could bias the model because the definitive diagnosis may not be confirmed and Ct values as well as most other laboratory results would not be available within that short timeframe. Furthermore, it may not be practical or meaningful for clinicians to apply such a model when they are focused on resuscitating an unstable moribund patient under challenging circumstances. Also, we constructed the model taking into account that patients may respond well to resuscitative measures (e.g., rehydration, glucose administration, electrolyte supplementation) within the first 24 hours, in which case the model would not be accurate. We concluded that inclusion of these cases may have detracted from the model because of the large amount of missing data. Therefore, taken together, we believe that by excluding unstable moribund children, we present a more robust and clinically meaningful model that is likely to be more accurate and generalizable to other settings. Additionally, per the reviewer’s suggestion, a flow chart has been provided as a supplemental figure detailing how many patients were excluded from each dataset.

Reviewer response:

o As written, the section on participant selection under discussion appears to assume that someone who dies in the first 24 hours is moribund at the time of admission. Could they not become moribund after admission but within the first 24 hours? In that case, having a tool available to predict which ones might become moribund within the first 24 hours might be useful.

That said, I would be satisfied if the authors would rewrite this sentence to be more consistent with the authors response above.

o Thank you for providing the data on excluded patients in each cohort. The DRC validation cohort has ~24% of patients excluded, whereas the West Africa cohort had ~4%. This could influence the outcomes being analyzed. For example, could the overestimation of death in the DRC cohort be because of the higher proportion of patients who died in the first 24 hours being excluded? Some comment should be provided in the manuscript.

Initial Reviewer comment

The results section discussing the Derivation of the Clinical Prognostic Model (lines 231-235) states which parameters were used in the model but does not explain to the reader how those were chosen amongst all the significantly different variables shown in Table 1. The manuscript would be improved by adding a few more details on how the machine learning protocol reduced parameter sets to two continuous and four binary covariates.

○ Thank you for the feedback. The section the Reviewer is referring to is in the Results section of the manuscript. How the final variables were selected for the model is detailed in the Methods section under the “Variable selection” subsection. In short, the binary variable selection protocol is as follows: Elastic Net was applied to each imputed dataset, the signs of the coefficients of the binary symptom variables in the resulting models were tallied, and those variables with the percentage of positive model coefficients above a given threshold were selected (Supplementary Table 2). This selection criterion facilitated the inclusion of groups of correlated predictors and predictors with small but significant effects. The threshold for variable inclusion was set at 100% to exclude variables with weak and/or inconsistent effects. Based on this protocol, any bleeding, dysphagia, breathlessness, and diarrhea were included in the EPiC model. Age and Ct were selected for inclusion in the model as they are highly correlated with poor prognosis in children with EVD (references cited below). [...]
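The tallying step described in this response can be illustrated with a minimal sketch. It is not the authors' code (their analysis is deposited in the repository named in the Data Availability Statement); it assumes the Elastic Net models have already been fitted elsewhere, one per imputed dataset, and that the outcome is coded so that a positive coefficient points toward death. The function name `select_by_sign_consistency` and the coefficient values are hypothetical and purely illustrative.

```python
from typing import Dict, List


def select_by_sign_consistency(
    coef_sets: List[Dict[str, float]],
    threshold: float = 1.0,
) -> List[str]:
    """Return variables whose coefficient is positive in at least
    `threshold` (a fraction between 0 and 1) of the fitted models."""
    if not coef_sets:
        return []
    n_models = len(coef_sets)
    variables = sorted({name for coefs in coef_sets for name in coefs})
    selected = []
    for name in variables:
        # A variable shrunk to zero (or absent) in a model does not count as positive.
        n_positive = sum(1 for coefs in coef_sets if coefs.get(name, 0.0) > 0.0)
        if n_positive / n_models >= threshold:
            selected.append(name)
    return selected


# Hypothetical coefficients from three imputed datasets (made-up numbers).
example_coefs = [
    {"bleeding": 0.41, "diarrhea": 0.12, "headache": -0.20, "cough": 0.05},
    {"bleeding": 0.38, "diarrhea": 0.09, "headache": -0.15, "cough": 0.00},
    {"bleeding": 0.45, "diarrhea": 0.15, "headache": -0.22, "cough": -0.03},
]

# With the 100% threshold, only variables positive in every model survive.
print(select_by_sign_consistency(example_coefs, threshold=1.0))
```

Lowering `threshold` admits variables with less consistent signs, which is the trade-off the response describes when it justifies the 100% setting.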

Reviewer response:

o Thanks for your response. I appreciate that the details are in the methods. The comment was to help the reader of the manuscript better follow the logic without needing to refer back to the methods section. The recommendation stands but it is not a requirement.

Initial Reviewer comment:

● Line 312-313. It is not clear from this report how a clinician would use this model to predict risk because there is not a toolkit online or clinical scoring system or something similar that one would utilize to calculate a risk. Also, the formal definitions of the clinical parameters used are not available. For example, what is the formal definition of breathlessness? Is it subjectively determined, or is it an abnormal respiratory rate, or is it having signs of severe respiratory distress, or something else? Thus, the statement about ease of application of the model should be removed.

○ Thank you for the feedback. Based on your suggestion, we developed an online calculator so that physicians can easily use it to calculate risk scores. This calculator can be found at https://kelseymbutler.shinyapps.io/epic-calculator/ and a link has been included in the text. As stated in the methods section of the manuscript, clinicians at each Ebola treatment center followed the World Health Organization’s guidelines for clinical assessment and definitions of abnormal signs and symptoms; e.g., the WHO manual refers to age-related respiratory rates for respiratory distress. We cited the WHO guidelines in our Reference list (#19 and #42).

Reviewer Response:

o Thank you for providing the online calculator.

o The link for reference 19 needs to be updated.

o While the presence of bleeding, diarrhea and dysphagia can reasonably be pulled from a chart, breathlessness is more subjective. In addition, the WHO documents discussed do not contain a definition for breathlessness. Respiratory distress is defined. However, even if breathlessness was defined in the WHO documents, a reader should not have to go to the WHO guidelines to determine what was meant. Please provide the formal definitions used in your models in the manuscript.

Reviewer #3: (No Response)

--------------------

Summary and General Comments

Use this section to provide overall comments, discuss strengths/weaknesses of the study, novelty, significance, general execution and scholarship. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. If requesting major revision, please articulate the new experiments that are needed.

Reviewer #2: see editorial and data presentation section

Reviewer #3: (No Response)

--------------------

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #2: No

Reviewer #3: No

References

Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article's retracted status in the References list and also include a citation and full reference for the retraction notice.

PLoS Negl Trop Dis. doi: 10.1371/journal.pntd.0010789.r005

Decision Letter 2

Camille Lebarbenchon, Anita K McElroy

5 Sep 2022

Dear Dr Colubri,

We are pleased to inform you that your manuscript 'Constructing, validating, and updating machine learning models to predict survival in children with Ebola Virus Disease' has been provisionally accepted for publication in PLOS Neglected Tropical Diseases.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow-up email. A member of our team will be in touch with a set of requests.

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Neglected Tropical Diseases.

Best regards,

Anita K. McElroy, MD, PhD

Academic Editor

PLOS Neglected Tropical Diseases

Camille Lebarbenchon

Section Editor

PLOS Neglected Tropical Diseases

***********************************************************

PLoS Negl Trop Dis. doi: 10.1371/journal.pntd.0010789.r006

Acceptance letter

Camille Lebarbenchon, Anita K McElroy

28 Sep 2022

Dear Dr Colubri,

We are delighted to inform you that your manuscript, "Constructing, validating, and updating machine learning models to predict survival in children with Ebola Virus Disease," has been formally accepted for publication in PLOS Neglected Tropical Diseases.

We have now passed your article onto the PLOS Production Department who will complete the rest of the publication process. All authors will receive a confirmation email upon publication.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any scientific or type-setting errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript. Note: Proofs for Front Matter articles (Editorial, Viewpoint, Symposium, Review, etc...) are generated on a different schedule and may not be made available as quickly.

Soon after your final files are uploaded, the early version of your manuscript will be published online unless you opted out of this process. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting open-access publishing; we are looking forward to publishing your work in PLOS Neglected Tropical Diseases.

Best regards,

Shaden Kamhawi

co-Editor-in-Chief

PLOS Neglected Tropical Diseases

Paul Brindley

co-Editor-in-Chief

PLOS Neglected Tropical Diseases

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Data. Excel spreadsheet containing all patients less than 18 years of age who presented to the Mangina ETU in DRC from December 2018 to January 2020 with laboratory confirmed EVD.

    (XLSX)

    S1 Text. Methods appendix on summary and imputation of missing data.

    (DOCX)

    S1 Fig. Distribution of imputed CT values for patients who presented with or without bleeding upon admission.

    (TIF)

    S2 Fig. Flowchart of patients excluded in West Africa derivation cohort (A) and DRC validation cohort (B).

    (TIF)

    S3 Fig. Correlation between bone and muscle pain, asthenia, headache, and abdominal pain with age group.

    Age group was defined as pre-verbal (<2 years), early verbal (2-<5 years), verbal (5-<10 years), verbal (10–17 years).

    (TIF)

    S4 Fig. Receiver operating characteristic (ROC) (A) and calibration (B) plots for the minimal model (age+CT) described in [22], trained on all patients (not pediatric-specific) in the EVD West African dataset from IMC.

    The intercept and slope of the linear fit to the predicted probabilities are -0.27 and 1.22, respectively.

    (TIF)

    S5 Fig. Plots showing the importance of the features in the EPiC model.

    Analysis of variance chart generated with the anova() function in the rms package, showing a ranking of the features according to their predictive contribution to the model, as measured by the Wald χ2-d.f. (degrees of freedom) statistic (A). Chart generated with the summary function in rms, showing the odds ratios for all the features in the model, using interquartile-range odds ratios for continuous features (age and CT), and simple odds ratios for binary (yes/no) features (B).

    (TIF)

    S1 Table. Summary of missing values in West Africa derivation dataset.

    (XLSX)

    S2 Table. Results of variable selection protocol for categorical variables.

    Variables were included only if they were selected in 100% of the models, which includes bleeding, diarrhea, respiratory distress, and swallowing problems.

    (XLSX)

    S3 Table. Demographic and clinical characteristics of patients in DRC validation cohort.

    (XLSX)

    S4 Table. Confusion matrix (a) and detailed performance measures (b) of the EPiC model as applied on the validation DRC dataset at the optimal prediction cutoff for accuracy (0.63).

    Both tables are the result of the confusionMatrix() function in the R package caret (https://cran.r-project.org/web/packages/caret/index.html), with the addition of the false alarm and miss rates to S4B Table, and removal of a few measures that are less commonly used (e.g., McNemar’s Test P-Value and Kappa coefficient).

    (XLSX)

    S5 Table. Confusion matrix (a) and detailed performance measures (b) of the EPiC model augmented with AST and CK as applied on the validation DRC dataset at the optimal prediction cutoff for accuracy (0.5).

    Interpretation of these tables is the same as in S4 Table.

    (XLSX)
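For readers unfamiliar with the quantities reported in S4 and S5 Tables, the following minimal sketch shows how a confusion matrix, the derived performance measures, and an accuracy-maximizing cutoff can be computed from predicted probabilities. It is a plain-Python stand-in rather than the authors' R/caret workflow. The function names and example data are hypothetical, the outcome is assumed to be coded 1 for death, and "false alarm rate" and "miss rate" are assumed to follow their usual definitions (FP/(FP+TN) and FN/(FN+TP)).

```python
from typing import Dict, Sequence


def confusion_at_cutoff(
    probs: Sequence[float], outcomes: Sequence[int], cutoff: float
) -> Dict[str, int]:
    """Count TP/FP/TN/FN, predicting death (1) when probability >= cutoff."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for p, y in zip(probs, outcomes):
        predicted = 1 if p >= cutoff else 0
        if predicted == 1 and y == 1:
            counts["tp"] += 1
        elif predicted == 1 and y == 0:
            counts["fp"] += 1
        elif predicted == 0 and y == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts


def _ratio(num: int, den: int) -> float:
    # Undefined ratios (empty denominator) are reported as NaN.
    return num / den if den else float("nan")


def performance(cm: Dict[str, int]) -> Dict[str, float]:
    """Derive summary measures from a confusion matrix."""
    total = sum(cm.values())
    return {
        "accuracy": _ratio(cm["tp"] + cm["tn"], total),
        "sensitivity": _ratio(cm["tp"], cm["tp"] + cm["fn"]),
        "specificity": _ratio(cm["tn"], cm["tn"] + cm["fp"]),
        "false_alarm_rate": _ratio(cm["fp"], cm["fp"] + cm["tn"]),
        "miss_rate": _ratio(cm["fn"], cm["fn"] + cm["tp"]),
    }


def best_accuracy_cutoff(
    probs: Sequence[float], outcomes: Sequence[int], candidates: Sequence[float]
) -> float:
    """Return the candidate cutoff with the highest accuracy (first wins ties)."""
    return max(
        candidates,
        key=lambda c: performance(confusion_at_cutoff(probs, outcomes, c))["accuracy"],
    )


# Made-up predicted probabilities of death and observed outcomes (1 = died).
example_probs = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
example_outcomes = [1, 1, 0, 1, 1, 0, 0, 0]
example_candidates = [i / 10 for i in range(1, 10)]

best = best_accuracy_cutoff(example_probs, example_outcomes, example_candidates)
print(best, performance(confusion_at_cutoff(example_probs, example_outcomes, best)))
```

A cutoff chosen this way is tuned to the dataset at hand, so the measures it produces on that same dataset tend to be optimistic.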

    Attachment

    Submitted filename: Pediatric.EVD.PLOS-NTD.Revised.ReviewerResponse-06-24-22.docx

    Attachment

    Submitted filename: Pediatric.EVD-PLOS-NTD.Revision.Response-08-21-22.docx

    Data Availability Statement

    The Ebola Data Access Committee (DAC) manages and oversees all data access applications for re-use of the West Africa EDP derivation dataset, in accordance with their Data Access Guidelines and Data Transfer Agreement. Access can be requested to the DAC from: https://www.iddo.org/ebola/data-sharing/accessing-data. The DRC validation dataset is provided as a supplementary table (S1 Data). All source code used for model construction, validation, and updating is deposited in the following repository: https://github.com/colabobio/ebola-pediatric-prognostic-model/.


    Articles from PLoS Neglected Tropical Diseases are provided here courtesy of PLOS
