Key Points
Question
What are the best-performing organ dysfunction–based criteria to implement the definition of sepsis and septic shock in children with suspected infection?
Findings
In this international, multicenter, retrospective cohort study including more than 3.6 million pediatric encounters, a novel score, the Phoenix Sepsis Score, was derived and validated to predict mortality in children with suspected or confirmed infection. The new criteria for pediatric sepsis and septic shock based on the score performed better than existing organ dysfunction scores and the International Pediatric Sepsis Consensus Conference criteria.
Meaning
The new data-driven criteria for pediatric sepsis and septic shock based on measures of organ dysfunction had improved performance compared with prior pediatric sepsis criteria.
Abstract
Importance
The Society of Critical Care Medicine Pediatric Sepsis Definition Task Force sought to develop and validate new clinical criteria for pediatric sepsis and septic shock using measures of organ dysfunction through a data-driven approach.
Objective
To derive and validate novel criteria for pediatric sepsis and septic shock across differently resourced settings.
Design, Setting, and Participants
Multicenter, international, retrospective cohort study in 10 health systems in the US, Colombia, Bangladesh, China, and Kenya, 3 of which were used as external validation sites. Data were collected from emergency and inpatient encounters for children (aged <18 years) from 2010 to 2019: 3 049 699 in the development (including derivation and internal validation) set and 581 317 in the external validation set.
Exposure
Stacked regression models to predict mortality in children with suspected infection were derived and validated using the best-performing organ dysfunction subscores from 8 existing scores. The final model was then translated into an integer-based score used to establish binary criteria for sepsis and septic shock.
Main Outcomes and Measures
The primary outcome for all analyses was in-hospital mortality. Model- and integer-based score performance measures included the area under the precision recall curve (AUPRC; primary) and area under the receiver operating characteristic curve (AUROC; secondary). For binary criteria, primary performance measures were positive predictive value and sensitivity.
Results
Among the 172 984 children with suspected infection in the first 24 hours (development set; 1.2% mortality), a 4-organ-system model performed best. The integer version of that model, the Phoenix Sepsis Score, had AUPRCs of 0.23 to 0.38 (95% CI range, 0.20-0.39) and AUROCs of 0.71 to 0.92 (95% CI range, 0.70-0.92) to predict mortality in the validation sets. Using a Phoenix Sepsis Score of 2 points or higher in children with suspected infection as criteria for sepsis and sepsis plus 1 or more cardiovascular point as criteria for septic shock resulted in a higher positive predictive value and higher or similar sensitivity compared with the 2005 International Pediatric Sepsis Consensus Conference (IPSCC) criteria across differently resourced settings.
Conclusions and Relevance
The novel Phoenix sepsis criteria, which were derived and validated using data from higher- and lower-resource settings, had improved performance for the diagnosis of pediatric sepsis and septic shock compared with the existing IPSCC criteria.
This cohort study derives and validates novel criteria for diagnosis of pediatric sepsis and septic shock across high-resource and low-resource international settings.
Introduction
Pediatric sepsis is a major public health problem that causes an estimated 3.3 million deaths annually worldwide.1 However, the current criteria to diagnose pediatric sepsis, which were published in 2005 following the International Pediatric Sepsis Consensus Conference (IPSCC), are outdated, have low specificity, do not allow for risk stratification in both lower- and higher-resource settings, and may be discordant with clinician-based diagnosis.2,3 In 2016, the Sepsis-3 Task Force redefined adult sepsis as life-threatening organ dysfunction in the setting of infection and developed criteria using a large electronic health record (EHR) data set and a data-driven approach.4,5 In 2019, the Society of Critical Care Medicine Pediatric Sepsis Definition Task Force was convened to update the pediatric sepsis definition and criteria. The task force adopted the conceptual definition of pediatric sepsis as suspected infection with life-threatening organ dysfunction and sought to implement the definition using organ dysfunction criteria associated with higher risk of mortality. The goal was to develop criteria that would generalize across differently resourced settings.6
New pediatric sepsis criteria should maximize identification of true-positive cases so that infected children with life-threatening organ dysfunction receive best-practice sepsis care, are appropriately enrolled in clinical studies, and are correctly represented in epidemiological surveillance. Simultaneously, new criteria must minimize false-positive cases so that children are not misdiagnosed with sepsis. This is important to reduce unnecessary use of antimicrobials and other treatments, optimize the efficiency of clinical studies, and avoid overcounting in surveillance. However, it is unclear which measures of organ dysfunction in children have an appropriate balance of sensitivity and positive predictive value (PPV) to achieve these goals and also generalize across differently resourced settings.
One challenge is that there is currently no large, centralized, multicenter, high-granularity database that includes pediatric emergency and inpatient care in differently resourced settings. Additionally, the validation of the existing IPSCC criteria has been limited historically.2,3 To address these gaps, a database was developed and used to derive and validate novel criteria for pediatric sepsis and septic shock based on measures of organ dysfunction in children with suspected infection.
Methods
Overview
The existing organ dysfunction subscores for each organ system that best predicted mortality were first identified and then integrated into models to predict mortality in children with suspected infection. From the best-performing models, an integer-based score (the Phoenix Sepsis Score) was developed (eFigure 1 in Supplement 1). The binary Phoenix sepsis and septic shock criteria were then selected as thresholds of the Phoenix Sepsis Score.
Study Design, Setting, and Population
A retrospective cohort study was performed using EHR data from 10 hospital-based sites in 5 countries. The analysis plan was prespecified in the funding application that supported this work. Six US sites represent higher-resource settings, 5 of which were in the development data set (eFigure 2 in Supplement 1). Data from 1 US site were held out for geographic external validation. Two international sites in Bangladesh and Colombia represent lower-resource settings in the development data set. Additionally, limited EHR and registry data from sites in China7 and Kenya served as lower-resource external validation sites. From each site, all emergency department, inpatient, and intensive care unit (ICU) encounters of children younger than 18 years from 2010-2019 were included, with some sites providing shorter time windows (eTable 1 in Supplement 1). Data from newborns before discharge (birth hospitalizations) and children with a postconceptional age of less than 37 weeks were excluded. Data harmonization, quality assurance, and all analyses were conducted as a reproducible pipeline in a centralized, cloud-based environment (eFigure 2 and eAppendix 1 in Supplement 1). The study was approved with a waiver of consent by a central institutional review board at the University of Colorado, plus separate regulatory approvals at non-US sites.
Outcomes, Definitions, and Main Measures
The primary outcome for all analyses was in-hospital mortality, which was used to assess the likelihood that organ dysfunction in the setting of an infection was life-threatening. The secondary outcome for all analyses was a composite of early death (within 72 hours of presentation to the hospital) or requirement of extracorporeal membrane oxygenation (ECMO) support. This secondary outcome was requested by the task force because early death and ECMO are more likely than in-hospital mortality to be directly associated with sepsis present in the first 24 hours of presentation; in-hospital death can occur later and result from complications during the hospitalization. Also, using ECMO to rescue children with sepsis-associated respiratory and/or cardiac failure could lead to survival of some children who would otherwise die. Suspected infection was defined as receipt of systemic antimicrobials and microbiological testing within the first 24 hours of the encounter. Comorbidities were defined based on the Pediatric Complex Chronic Conditions Classification System,8 and severe malnutrition was defined as a weight-for-age more than 3 SDs below the mean of the World Health Organization standards.9 The systemic inflammatory response syndrome criteria were based on IPSCC criteria.2,3 Because dosing information necessary to calculate the vasoactive-inotropic score was often missing at lower-resource sites, the number of concurrent vasoactive agents was tested as a proxy. The area under the precision recall curve (AUPRC) was used as the primary measure of performance for the organ dysfunction subscores, the stacked regression sepsis model, and the Phoenix Sepsis Score because it is more informative than the area under the receiver operating characteristic curve (AUROC) when analyzing imbalanced data sets (eg, many more survivors than nonsurvivors).
This is particularly important in children with infections given their lower baseline mortality compared with adults.10,11 AUPRCs are best interpreted relative to the baseline event rate, because a random (uninformative) model has an expected AUPRC approximately equal to that rate. If mortality is 1% (0.01) and the model AUPRC is 0.30, the model's AUPRC is 30-fold that of a random model. Because the novel Phoenix sepsis and septic shock criteria represent single, binary thresholds, the primary performance measures used to evaluate them were sensitivity and PPV, which together define a single point on the precision recall curve (precision is PPV and recall is sensitivity). Missing data were imputed using a last-observation-carried-forward approach across physiologically appropriate time windows. See eAppendix 1 in Supplement 1 for details.
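The relationship between AUPRC, the baseline rate, and a single PPV/sensitivity point can be sketched in a few lines. This minimal sketch assumes AUPRC is estimated as average precision with no tied scores (the study's exact estimator is not specified in this section), and the labels and scores are made up for illustration.

```python
# Minimal sketch: AUPRC estimated as average precision (AP), compared with
# the baseline event rate. Assumes no tied scores and at least one event.


def average_precision(labels, scores):
    """Mean of the precision values at the rank of each true positive."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    tp, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            tp += 1
            total += tp / rank  # precision at this recall step
    return total / sum(labels)


def ppv_sensitivity(labels, scores, threshold):
    """One point on the precision recall curve: flag scores >= threshold."""
    flagged = [label for score, label in zip(scores, labels) if score >= threshold]
    tp = sum(flagged)
    ppv = tp / len(flagged) if flagged else float("nan")
    return ppv, tp / sum(labels)


labels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]  # hypothetical: 2 deaths in 10 encounters
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
ap = average_precision(labels, scores)    # (1/1 + 2/3) / 2
baseline = sum(labels) / len(labels)      # approximately the AP of a random ranking
print(round(ap, 4), round(ap / baseline, 2))  # 0.8333 4.17
print(ppv_sensitivity(labels, scores, 0.7))   # PPV 2/3, sensitivity 1.0
```

Dividing the AUPRC by the baseline rate gives the "n-fold better than random" reading used in the text; with a 1% baseline, the same AUPRC would correspond to a much larger fold change.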
Derivation and Validation of the Novel Criteria for Sepsis and Septic Shock
The evaluation of which organ dysfunction subscores best predicted mortality involved all patients with and without suspected infection (eFigures 1-2 in Supplement 1). Then, stacked regression models12,13 were derived and validated to predict mortality using the worst organ dysfunction subscores recorded in the first 24 hours of the encounter among children with suspected infection (eFigures 1-2 in Supplement 1). This approach was used to implement the concept of “an infection with life-threatening organ dysfunction,” which was adopted by the Pediatric Sepsis Definition Task Force as the conceptual definition of sepsis.
The data set was first divided into development (including derivation and internal validation) and external validation sets as described above and shown in eFigure 2 in Supplement 1. From each development site, 25% of the data were held out for internal validation. The remaining three 25% portions of the development data set were used to (1) identify the best-performing criteria for each individual organ dysfunction based on the subscores of 8 existing and previously validated pediatric organ dysfunction criteria in all patients in the development data sets (including patients with suspected infection and those without) (eTable 2 and eFigure 2 in Supplement 1)14,15,16,17,18,19; (2) train and tune stacked regression models using a composite of the best-performing individual organ dysfunction criteria in children with suspected infection12,13; and (3) derive and internally validate the novel sepsis criteria based on the final stacked regression model. Finally, the novel criteria were validated in the external validation sets.
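The site-stratified holdout described above can be sketched as follows. The record format, the random seed, and splitting at the record level are illustrative assumptions; a real pipeline might instead split by patient so that repeat encounters from one child cannot appear in both sets.

```python
import random
from collections import defaultdict


def split_by_site(encounters, holdout_frac=0.25, seed=2019):
    """Hold out `holdout_frac` of each development site's records.

    `encounters` is a list of (site, record_id) pairs; the format and the
    seed are hypothetical. Returns (derivation, internal_validation).
    """
    rng = random.Random(seed)
    by_site = defaultdict(list)
    for site, record_id in encounters:
        by_site[site].append((site, record_id))
    derivation, validation = [], []
    for site in sorted(by_site):  # fixed site order keeps the split reproducible
        rows = by_site[site]
        rng.shuffle(rows)
        n_hold = round(len(rows) * holdout_frac)
        validation.extend(rows[:n_hold])
        derivation.extend(rows[n_hold:])
    return derivation, validation


records = [("site_A", i) for i in range(100)] + [("site_B", i) for i in range(40)]
derivation, validation = split_by_site(records)
print(len(derivation), len(validation))  # 105 35
```

Holding out the same fraction within each site, rather than from the pooled data, keeps every development site represented in the internal validation set in proportion to its size.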
Stacked regression is a model-averaging approach that combines the predictions of multiple models, drawing on the predictive strength of each. The best-performing organ dysfunction subcomponent scores were used as input variables for stacked regression models that also predicted mortality. The stacked regression models took the organ dysfunction subscores as covariates and estimated the regression weights (ie, the relative contribution of each subcomponent's prediction to the overall prediction) in accordance with each subcomponent's predictive power, while maintaining a high degree of interpretability.13 Additional information is available in eAppendix 1 in Supplement 1.
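The two-level idea can be illustrated with a small, self-contained sketch. This is not the study's pipeline: the level-1 learners here are hypothetical single-subscore logistic regressions fit by plain gradient descent, level 2 is a ridge-penalized logistic regression on their log-odds, the data are synthetic, and for brevity the level-1 predictions are in-sample rather than the out-of-fold predictions normally used in stacking.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, l2=0.0, lr=0.1, epochs=800):
    """Batch gradient descent for logistic regression; `l2` adds a ridge
    penalty on the slopes (the intercept is not penalized)."""
    n, p = len(X), len(X[0])
    b, w = 0.0, [0.0] * p
    for _ in range(epochs):
        grad_b, grad_w = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(p):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
    return b, w


def fit_stack(subscores, died, l2=0.01):
    """Level 1: one logistic model per organ subscore (hypothetical base learners).
    Level 2: ridge logistic regression on the level-1 log-odds."""
    level1 = [fit_logistic([[row[k]] for row in subscores], died)
              for k in range(len(subscores[0]))]
    z = [[b + w[0] * row[k] for k, (b, w) in enumerate(level1)] for row in subscores]
    return level1, fit_logistic(z, died, l2=l2)


# Synthetic, hypothetical data: organ 0 drives mortality, organ 1 is noise.
rng = random.Random(7)
X, y = [], []
for _ in range(200):
    a, c = rng.randint(0, 3), rng.randint(0, 2)
    X.append([a, c])
    y.append(1 if rng.random() < sigmoid(-3 + 1.5 * a) else 0)

level1, (b2, w2) = fit_stack(X, y)
print("level-1 slopes:", [round(w[0], 2) for _, w in level1])
print("level-2 weights:", [round(v, 2) for v in w2])
```

Because each level-1 output is already on the log-odds scale, a level-2 weight near 1 means "use this organ's prediction as is," and an organ whose level-1 prediction barely varies contributes little regardless of its weight.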
Ridge, least absolute shrinkage and selection operator (LASSO), and elastic net regularized logistic regression were evaluated as the top-level stacked models. Ten-fold cross-validation was used to select the regularization parameter lambda in the stacked models that minimized deviance for each value of alpha (0 = ridge; 1 = LASSO) (see eAppendix 1 in Supplement 1 for additional information). The best-performing stacked regression models were identified using the AUPRC. In the third step, the components of the final stacked regression model were translated into an integer-based score using a grid search, then its performance was compared with the final stacked model to ensure that the AUPRC remained stable. When measures and models had similar performance, the task force voted on which to choose based on parsimony, data collection burden, and face validity.6 The task force then voted using a modified Delphi process on the thresholds of the score to define sepsis and septic shock and achieve the desired balance of sensitivity and PPV. In the final step, performance of the novel criteria was assessed across validation sets using sensitivity and PPV as primary metrics. Additional information is available in eAppendix 1 and eFigures 1-2 in Supplement 1.
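For reference, the regularized logistic regression family in the preceding paragraph is commonly written in the glmnet parameterization, in which $\alpha$ mixes the ridge and LASSO penalties. The software used is not stated in this section, so this form is an assumption, chosen because its $\alpha$ convention matches the text. Here $y_i = 1$ if encounter $i$ ended in in-hospital death, and $\mathbf{z}_i$ holds the organ-level inputs to the top-level (stacked) model.

```latex
$$
(\hat\beta_0, \hat{\boldsymbol\beta}) = \arg\min_{\beta_0, \boldsymbol\beta}
\left\{ -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \left(\beta_0 + \mathbf{z}_i^\top \boldsymbol\beta\right)
- \log\left(1 + e^{\beta_0 + \mathbf{z}_i^\top \boldsymbol\beta}\right) \right]
+ \lambda \left[ \frac{1 - \alpha}{2} \lVert \boldsymbol\beta \rVert_2^2
+ \alpha \lVert \boldsymbol\beta \rVert_1 \right] \right\}
$$
```

Setting $\alpha = 0$ gives ridge, $\alpha = 1$ gives LASSO, and intermediate values give elastic net; for each $\alpha$, $\lambda$ is the value selected by 10-fold cross-validation to minimize deviance.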
Stratifications and Sensitivity Analyses
During each step, prespecified stratifications and sensitivity analyses were performed to ensure robustness. These included (1) higher-resource vs lower-resource settings, where the higher-resource sites were analyzed together given their overall similarity and the lower-resource sites were analyzed individually given their broader differences in underlying population, resources, and data quality; (2) no known prior comorbidities, to assess criteria performance in children without potential confounding by chronic and/or life-limiting conditions; (3) age groups, to ensure that performance remains appropriate across the pediatric spectrum; (4) ICU admission, given that many children with sepsis receive ICU care; and (5) excluding patients who required operative care, to reduce confounding by mechanical ventilation or vasoactive medications related to receiving anesthesia or undergoing surgery.
Results
Cohort Demographic and Clinical Characteristics
The development set included 3 049 699 emergency department, inpatient, and ICU encounters for children younger than 18 years, of which 172 984 (5.7%) had suspected infection in the first 24 hours (Table 1; eTables 3 and 4 and eFigure 2 in Supplement 1). Of those, 2065 (1.2%) died. The external validation set included 581 317 encounters, of which 45 855 (7.9%) had suspected infection in the first 24 hours. Of those, 540 (1.2%) died (Table 1; eTable 5 in Supplement 1).
Table 1. Characteristics of Pediatric Patient Encounters With Suspected Infection in the First 24 Hoursᵃ
Characteristics | Derivation cohort | Internal validation cohort | External validation cohort |
---|---|---|---|
Encounters, No. | 129 584 | 43 400 | 45 855 |
Resource setting, No. (%) | |||
Higher-resource settings | 108 177 (83.5) | 36 202 (83.4) | 33 020 (72.0) |
Lower-resource settings | 21 407 (16.5) | 7198 (16.6) | 12 835 (28.0) |
Age, median (IQR), y | 3.7 (0.9-9.4) | 3.7 (0.9-9.3) | 2.6 (0.6-7.6) |
Sex, No. (%) | |||
Female | 62 868 (48.5) | 21 041 (48.5) | 22 295 (48.6) |
Male | 66 712 (51.5) | 22 357 (51.5) | 21 555 (47.0) |
Race, No. (%)ᵇ | |||
American Indian or Alaska Native | 109 (0.1) | 21 (<0.1) | 59 (0.1) |
Asian | 5149 (4.0) | 1703 (3.9) | 506 (1.1) |
Black | 22 709 (17.5) | 7512 (17.3) | 7476 (16.3) |
Native Hawaiian or Other Pacific Islander | 105 (0.1) | 31 (0.1) | 70 (0.2) |
White | 57 518 (44.4) | 19 533 (45.0) | 23 545 (51.3) |
Multiple | 22 113 (17.1) | 7343 (16.9) | 277 (0.6) |
Other/unknown | 22 095 (17.1) | 7309 (16.8) | 14 051 (30.6) |
Hispanic or Latino ethnicity, No. (%) | 33 698 (26.0) | 11 457 (26.4) | 55 (0.1) |
Major comorbidities, No. (%) | |||
Technology dependence | 18 951 (17.5) | 6011 (16.6) | 5677 (17.2) |
Severe malnutrition | 13 505 (10.4) | 4478 (10.3) | 3417 (7.5) |
Malignancy | 10 924 (10.1) | 3709 (10.2) | 2950 (8.9) |
Transplant | 3689 (3.4) | 1287 (3.6) | 1573 (4.8) |
Comorbidities per PCCC, No. (%)ᶜ | |||
No known prior comorbidity | 72 291 (66.8) | 24 470 (67.6) | 22 553 (68.3) |
1 PCCC | 9406 (8.7) | 3150 (8.7) | 2580 (7.8) |
≥2 PCCCs | 26 480 (24.5) | 8582 (23.7) | 7887 (23.9) |
Systemic inflammatory response syndrome, No. (%)ᵈ | 56 711 (43.8) | 18 848 (43.4) | 21 436 (46.7) |
Locations visited during encounter (not mutually exclusive), No. (%) | |||
Presented to emergency department | 92 507 (71.6) | 31 092 (71.9) | 26 940 (61.6) |
≥1 Intensive care unit stays | 23 128 (17.9) | 7840 (18.1) | 10 702 (23.4) |
≥1 Operating room visits | 17 604 (13.6) | 6098 (14.1) | 469 (1.1) |
Outcomes, No. (%) | |||
Death | 1538 (1.2) | 527 (1.2) | 540 (1.2) |
Early death or extracorporeal membrane oxygenation | 834 (0.6) | 305 (0.7) | 349 (0.8) |
Abbreviation: PCCC, pediatric complex chronic condition.
ᵃ Table 1 shows site, demographic, care location, comorbidity, and outcome characteristics of those with suspected or confirmed infection in the first 24 hours of the encounter. Data from the 7 development sites are stratified by the 75% derivation cohort vs the 25% internal validation cohort.
ᵇ For race categories, “multiple” indicates that in the electronic health record, a patient’s race was recorded as “multiracial,” “multiple,” or “2 or more races.” “Other/unknown” indicates that a patient’s race was recorded in the electronic health record as “other,” “unknown,” “not specified,” “information not recorded,” “patient declined,” “patient refused,” “refused,” or as a race category unique to a particular international country or region.
ᶜ The PCCC system classifies pediatric chronic diseases using International Classification of Diseases diagnosis and procedure codes and was assessed only at higher-resource sites, where the information was available (percentages for PCCC-related counts are based on higher-resource setting encounters).8 The major comorbidities of technology dependence (eg, requiring gastrostomy, tracheostomy, central line), malignancy, and transplant were defined in the PCCC system. Severe malnutrition was defined as a weight-for-age more than 3 SDs below the mean of the World Health Organization standards and was assessed at all sites.9 Early death is defined as death <72 hours after the beginning of the encounter.
ᵈ Systemic inflammatory response syndrome is assessed using temperature, white blood cell count, heart rate, and respiratory rate, each compared with age-specific thresholds. Criteria are met when ≥2 values are outside the threshold for age, at least 1 of which must be temperature or white blood cell count. See eAppendix 1 in Supplement 1 for additional details.
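The SIRS counting rule above reduces to a few boolean checks. This minimal sketch assumes each of the 4 criteria has already been compared against the age-specific IPSCC thresholds, which are not reproduced here; the function name is hypothetical.

```python
def meets_sirs(temp_abnormal, wbc_abnormal, hr_abnormal, rr_abnormal):
    """IPSCC-style rule: >=2 of 4 criteria abnormal, including temperature or WBC.

    Each flag is assumed to be precomputed against age-specific thresholds.
    """
    n_abnormal = sum([temp_abnormal, wbc_abnormal, hr_abnormal, rr_abnormal])
    return n_abnormal >= 2 and (temp_abnormal or wbc_abnormal)


print(meets_sirs(False, False, True, True))  # False: needs temperature or WBC
print(meets_sirs(True, False, True, False))  # True
```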
Best-Performing Individual Organ Dysfunction Criteria
Organ dysfunction subscore input availability and missingness are shown in eFigure 3, A-H, in Supplement 1. By 24 hours into an encounter, most patients in higher-resource settings had information recorded for pulse oximetry oxygen saturation (Spo2), respiratory support, platelet count, blood pressure, vasoactive agent use, and Glasgow Coma Scale score. Many also had fraction of inspired oxygen (Fio2), lactate, and pupillary reactivity measured. Patients in lower-resource settings were less likely to have available data on lactate, Glasgow Coma Scale, pupillary reactivity, and coagulation studies such as D-dimer and fibrinogen. The best-performing individual organ dysfunction criteria based on the primary measure of AUPRC and task force Delphi process when AUPRCs were similar included cardiovascular (Pediatric Logistic Organ Dysfunction version 2 [PELOD-2] and vasoactive medication count), hematology/coagulation (Disseminated Intravascular Coagulation score), respiratory (pediatric Sequential Organ Failure Assessment [pSOFA]), renal (pSOFA), hepatic (IPSCC), neurologic (PELOD-2), immunologic (Pediatric Organ Dysfunction Information Update Mandate [PODIUM]), and endocrine dysfunction (PODIUM), as shown in eFigure 4 in Supplement 1.
Derivation and Validation of the Stacked Models
The best-performing stacked models included an 8-organ system ridge regression model and a 4-organ system LASSO model (eTable 6 and eFigure 6 in Supplement 1). Overall, AUPRCs and AUROCs were similar between these 2 models (eFigure 7 in Supplement 1). The task force evaluated the 2 models and chose to advance the 4-organ system model because it had similar performance but greater simplicity and lower dependence on laboratory measures. The task force acknowledged that the more comprehensive 8-organ system model may have utility in some circumstances (eg, research). The 4-organ system model included criteria for respiratory (mechanical ventilation, Pao2:Fio2, and Spo2:Fio2 ratios), cardiovascular (mean arterial pressure, lactate level, and vasoactive medications), coagulation (platelet count, international normalized ratio, D-dimer, and fibrinogen), and neurologic (Glasgow Coma Scale and pupillary reaction) dysfunction.
From the Stacked Model to the Phoenix Sepsis Score
The 4-organ system model was translated into an integer-based score, the Phoenix Sepsis Score (Table 2). In doing so, the individual levels were reweighted using a grid search and collapsed into a single level when performance was unaffected (eg, the pSOFA respiratory subscores of 1 and 2 points were collapsed into a single level). Mortality increased with higher score values in both higher- and lower-resource settings (Figure 1 and Figure 2; eFigure 5 in Supplement 1). The Phoenix Sepsis Score had AUPRCs of 0.23 to 0.38 (95% CI range, 0.20-0.39) and AUROCs of 0.71 to 0.92 (95% CI range, 0.70-0.92) to predict mortality in the internal and external validation sets, similar to the stacked sepsis model (Figure 3; eFigures 6-8 in Supplement 1). Compared with the existing IPSCC sepsis score as well as several organ dysfunction scores, the Phoenix Sepsis Score had the highest AUPRC to predict mortality at all validation sites combined, at all higher-resource sites, and at 3 of the 4 lower-resource sites (Figure 3). A notable limitation is that lower-resource sites 2-4 did not record respiratory support, even when a patient received it, which limited the range of the score and likely resulted in lower performance at those sites. Additionally, lower-resource site 2 had no recording of neurologic status, further limiting score range and performance at that site. However, the score at lower-resource site 1 included data for all 4 organ systems. To enable capture of other organ dysfunctions for research or epidemiological purposes, an expanded score based on the 8-organ system model was also developed and named the Phoenix-8 Score (eFigure 9 in Supplement 1).
Table 2. The Phoenix Sepsis Scoreᵃ
Variable | 0 Points | 1 Point | 2 Points | 3 Points |
---|---|---|---|---|
Respiratory (0-3 points) | | | | |
Pao2:Fio2 or Spo2:Fio2 | Pao2:Fio2 ≥400 or Spo2:Fio2 ≥292ᵇ | Pao2:Fio2 <400 and any respiratory supportᶜ or Spo2:Fio2 <292 and any respiratory supportᶜ | Pao2:Fio2 100-200 and IMV or Spo2:Fio2 148-220 and IMV | Pao2:Fio2 <100 and IMV or Spo2:Fio2 <148 and IMV |
Cardiovascular (0-6 points) | | 1 point each (up to 3) for: | 2 points each (up to 6) for: | |
Vasoactive medicationsᵈ | None | 1 | ≥2 | |
Lactateᵉ, mmol/L | <5 | 5-10.9 | ≥11 | |
Mean arterial pressure by age, mm Hgᶠ,ᵍ | | | | |
<1 mo | >30 | 17-30 | <17 | |
1 to 11 mo | >38 | 25-38 | <25 | |
1 to <2 y | >43 | 31-43 | <31 | |
2 to <5 y | >44 | 32-44 | <32 | |
5 to <12 y | >48 | 36-48 | <36 | |
12 to 17 y | >51 | 38-51 | <38 | |
Coagulation (0-2 points)ʰ | | 1 point each (maximum of 2 points) for: | | |
Platelets, × 10³/μL | ≥100 | <100 | | |
International normalized ratio | ≤1.3 | >1.3 | | |
D-dimer, mg/L FEU | ≤2 | >2 | | |
Fibrinogen, mg/dL | ≥100 | <100 | | |
Neurologic (0-2 points)ⁱ | | | | |
Glasgow Coma Scale score and pupils | Glasgow Coma Scale score >10ʲ; pupils reactive | Glasgow Coma Scale score ≤10ʲ | Fixed pupils bilaterally | |
Abbreviations: FEU, fibrinogen equivalent units; Fio2, fraction of inspired oxygen; IMV, invasive mechanical ventilation; Spo2, pulse oximetry oxygen saturation.
ᵃ The Phoenix Sepsis Score may be calculated in the absence of some variables (eg, even if lactate level is not measured and vasoactive medications are not used, a cardiovascular score can still be ascertained using blood pressure). It is expected that laboratory tests and other measurements will be obtained at the discretion of a medical team based on clinical judgment. Unmeasured variables contribute no points to the score.
ᵇ Calculated only when Spo2 is ≤97%.
ᶜ Respiratory dysfunction of 1 point can be assessed in any patient receiving oxygen, high-flow, noninvasive positive pressure, or IMV respiratory support, and includes Pao2:Fio2 <200 and Spo2:Fio2 <220 in children who are not receiving IMV.
ᵈ Vasoactive medications include any dose of epinephrine, norepinephrine, dopamine, dobutamine, milrinone, and/or vasopressin (for shock).
ᵉ Lactate can be arterial or venous. Lactate reference range is 0.5-2.2 mmol/L.
ᶠ Use measured mean arterial pressure preferentially (invasive arterial if available or noninvasive oscillometric), and if measured mean arterial pressure is not available, a calculated mean arterial pressure (⅓ × systolic + ⅔ × diastolic) may be used as an alternative.
ᵍ Age is not adjusted for prematurity, and the criteria do not apply to birth hospitalizations, children with postconceptional age <37 weeks, or those aged ≥18 years.
ʰ Coagulation variable reference ranges: platelets, 150-450 × 10³/μL; D-dimer, <0.5 mg/L FEU; fibrinogen, 180-410 mg/dL. International normalized ratio reference range is based on local reference prothrombin time.
ⁱ The neurologic dysfunction subscore was pragmatically validated in both sedated and nonsedated patients and those with and without IMV support.
ʲ The Glasgow Coma Scale score measures level of consciousness based on verbal, eye, and motor response and ranges from 3 to 15, with a higher score indicating better neurologic function.
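The cut points in Table 2 translate directly into a small scoring function. This is a minimal sketch transcribed from the table and its footnotes; the argument names, the units (Fio2 as a fraction, age in months), and the convention that an unmeasured variable is passed as None and contributes 0 points are assumptions. When both Pao2:Fio2 and Spo2:Fio2 are available, the sketch takes the higher-scoring of the two, which is also an assumption.

```python
# Minimal sketch of the Phoenix Sepsis Score cut points in Table 2.
# Assumptions: argument names, Fio2 as a fraction, age in months, and
# None meaning "not measured" (contributes 0 points).

# MAP bands: (age upper bound in months, exclusive; MAP above the second
# value scores 0; MAP below the third value scores 2; otherwise 1).
MAP_BANDS = [
    (1, 30, 17),    # <1 mo
    (12, 38, 25),   # 1 to 11 mo
    (24, 43, 31),   # 1 to <2 y
    (60, 44, 32),   # 2 to <5 y
    (144, 48, 36),  # 5 to <12 y
    (216, 51, 38),  # 12 to 17 y
]


def respiratory_points(pao2=None, spo2=None, fio2=None, any_support=False, imv=False):
    """0-3 points; the highest score from the available ratio(s) is used."""
    if not (any_support or imv) or not fio2:
        return 0  # every nonzero respiratory level requires respiratory support
    ratios = []
    if pao2 is not None:
        ratios.append((pao2 / fio2, 100, 200, 400))  # Pao2:Fio2 cut points
    if spo2 is not None and spo2 <= 97:              # Spo2:Fio2 only if Spo2 <=97%
        ratios.append((spo2 / fio2, 148, 220, 292))
    best = 0
    for ratio, severe, moderate, mild in ratios:
        if imv and ratio < severe:
            pts = 3
        elif imv and ratio <= moderate:
            pts = 2
        elif ratio < mild:
            pts = 1
        else:
            pts = 0
        best = max(best, pts)
    return best


def cardiovascular_points(n_vasoactive=0, lactate=None, map_mmhg=None, age_months=None):
    """0-6 points: vasoactive count, lactate (mmol/L), and age-banded MAP."""
    pts = 2 if n_vasoactive >= 2 else (1 if n_vasoactive == 1 else 0)
    if lactate is not None:
        pts += 2 if lactate >= 11 else (1 if lactate >= 5 else 0)
    if map_mmhg is not None and age_months is not None:
        for age_limit, normal_above, severe_below in MAP_BANDS:
            if age_months < age_limit:
                if map_mmhg < severe_below:
                    pts += 2
                elif map_mmhg <= normal_above:
                    pts += 1
                break
    return pts


def calculated_map(systolic, diastolic):
    """Fallback when measured MAP is unavailable: 1/3 systolic + 2/3 diastolic."""
    return systolic / 3 + 2 * diastolic / 3


def coagulation_points(platelets=None, inr=None, d_dimer=None, fibrinogen=None):
    """1 point per abnormal value, capped at 2."""
    abnormal = [
        platelets is not None and platelets < 100,    # x 10^3/uL
        inr is not None and inr > 1.3,
        d_dimer is not None and d_dimer > 2,          # mg/L FEU
        fibrinogen is not None and fibrinogen < 100,  # mg/dL
    ]
    return min(2, sum(abnormal))


def neurologic_points(gcs=None, pupils_fixed_bilaterally=False):
    if pupils_fixed_bilaterally:
        return 2
    return 1 if gcs is not None and gcs <= 10 else 0


def phoenix_score(age_months, pao2=None, spo2=None, fio2=None, any_support=False,
                  imv=False, n_vasoactive=0, lactate=None, map_mmhg=None,
                  platelets=None, inr=None, d_dimer=None, fibrinogen=None,
                  gcs=None, pupils_fixed_bilaterally=False):
    sub = {
        "respiratory": respiratory_points(pao2, spo2, fio2, any_support, imv),
        "cardiovascular": cardiovascular_points(n_vasoactive, lactate, map_mmhg, age_months),
        "coagulation": coagulation_points(platelets, inr, d_dimer, fibrinogen),
        "neurologic": neurologic_points(gcs, pupils_fixed_bilaterally),
    }
    sub["total"] = sum(sub.values())
    return sub


# Hypothetical 3-year-old on high-flow oxygen (not IMV); total 4.
print(phoenix_score(age_months=36, spo2=92, fio2=0.5, any_support=True,
                    lactate=6, map_mmhg=40, platelets=80, gcs=14))
```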
From the Phoenix Sepsis Score to the Criteria for Pediatric Sepsis and Septic Shock
The task force chose a Phoenix Sepsis Score of 2 or greater in patients with suspected infection as the new sepsis criteria, and sepsis with 1 or more cardiovascular points as criteria for septic shock. In the development set, children with sepsis in the first 24 hours had 7.1% mortality at the higher-resource sites and 28.5% mortality at the lower-resource sites. Children with sepsis in both higher- and lower-resource settings had a median Phoenix Sepsis Score of 3 points (IQR, 2-4). Children with septic shock in the first 24 hours had 10.8% mortality at the higher-resource sites and 33.5% mortality at the lower-resource sites. The novel criteria had higher PPV and sensitivity that was comparable with or higher than the IPSCC sepsis, severe sepsis, and septic shock criteria across all settings and using the secondary outcome of early death or ECMO (Figure 4; eFigure 10 and eTable 7 in Supplement 1). For example, for the primary outcome of death at the higher-resource sites, the Phoenix sepsis criteria had a PPV of 5.3% to 7.1% (with a baseline mortality of 0.6% to 0.7%) and a sensitivity of 69.2% to 84.4% compared with the IPSCC severe sepsis criteria, which had a PPV of 3.6% to 4.8% and a sensitivity of 58.7% to 70.7%, in the development and external validation sets, respectively. In the derivation and internal validation set of the lower-resource site that had complete data for assessment of the criteria, the Phoenix sepsis criteria had a PPV of 22.2% (baseline mortality rate of 4.1%) and a sensitivity of 81.2% compared with the IPSCC severe sepsis criteria, which had a PPV of 12.7% and a sensitivity of 49.2%.
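Given the 4 subscores, the binary criteria chosen by the task force reduce to two comparisons. A minimal sketch; the function and argument names are hypothetical.

```python
# Sketch of the binary Phoenix criteria applied to precomputed subscores.


def phoenix_criteria(suspected_infection, respiratory, cardiovascular,
                     coagulation, neurologic):
    total = respiratory + cardiovascular + coagulation + neurologic
    sepsis = bool(suspected_infection) and total >= 2
    septic_shock = sepsis and cardiovascular >= 1  # sepsis plus >=1 cardiovascular point
    return {"total": total, "sepsis": sepsis, "septic_shock": septic_shock}


print(phoenix_criteria(True, 1, 0, 1, 0))   # sepsis without septic shock
print(phoenix_criteria(True, 0, 2, 0, 0))   # septic shock
print(phoenix_criteria(False, 3, 2, 0, 0))  # no suspected infection: neither
```

Septic shock is defined as a subset of sepsis, so a child can never meet the shock criteria without also meeting the sepsis criteria.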
Per request of the task force, the concept of organ dysfunction remote to the site of infection was implemented by requiring that those with respiratory or neurologic dysfunction also had 1 or more points in a different organ system. Patients with sepsis who had remote organ dysfunction accounted for 85.2% of sepsis cases and had higher mortality than the whole sepsis cohort: 8% in higher-resource sites and 32.3% in lower-resource sites (eFigure 11 in Supplement 1).
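One literal reading of the remote organ dysfunction rule, applied to children who already meet the Phoenix sepsis criteria, is sketched below. The study's exact operationalization is not spelled out in this paragraph and may differ; the function and key names are hypothetical.

```python
def has_remote_organ_dysfunction(subscores):
    """`subscores` maps the 4 organ systems in Table 2 to points.

    Literal reading (an assumption): a child with respiratory or neurologic
    points must also have >=1 point in a different organ system.
    """
    for organ in ("respiratory", "neurologic"):
        if subscores.get(organ, 0) >= 1:
            if not any(pts >= 1 for name, pts in subscores.items() if name != organ):
                return False
    return True


isolated_resp = {"respiratory": 2, "cardiovascular": 0, "coagulation": 0, "neurologic": 0}
resp_plus_cv = {"respiratory": 1, "cardiovascular": 1, "coagulation": 0, "neurologic": 0}
print(has_remote_organ_dysfunction(isolated_resp))  # False
print(has_remote_organ_dysfunction(resp_plus_cv))   # True
```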
Sensitivity Analyses
Performance of the pediatric sepsis criteria was consistent across age groups, with higher sepsis incidence and mortality in younger age groups, as expected (eTable 8 in Supplement 1). Similarly, the performance was consistent in patients with no known prior comorbidities, those admitted to the ICU, and after excluding patients who underwent surgery (eTable 8 in Supplement 1).
Clinical vignettes for children presenting with sepsis and septic shock and their corresponding Phoenix Sepsis Score data are provided in eAppendix 2 in Supplement 1.
Discussion
New criteria for pediatric sepsis and septic shock were derived and validated by developing and curating a clinical database with more than 3.6 million pediatric hospital encounters at 10 sites in 5 countries. The development data set was built using structured EHR data from an international cohort that was geographically and racially diverse and had widely varying resources, a major strength of this study. A prespecified data-driven approach was used to determine the best-performing organ dysfunction measures in children with suspected infection. An interpretable machine learning approach was used to develop a composite model that was the basis for the new Phoenix Sepsis Score and the new criteria. The new Phoenix criteria for pediatric sepsis and septic shock had higher PPV and comparable or higher sensitivity than the IPSCC criteria for predicting mortality across differently resourced settings. These findings were consistent in multiple sensitivity analyses that included age, absence of prior comorbidities, ICU admission, and surgery.
Comparison With the Adult Sepsis-3 Criteria
The approach used in this study had both similarities with and differences from the derivation of the adult Sepsis-3 criteria.4 Similar to Sepsis-3, the definition of sepsis was implemented as the combination of suspected infection with life-threatening organ dysfunction. Also, existing organ dysfunction scores and a large EHR database were used to develop the new criteria, and in-hospital mortality was the primary outcome. However, there were also several important differences. First, instead of using existing complete organ dysfunction scores (eg, the SOFA score) to derive the new criteria, the best-performing individual organ measures of existing scores were used to develop a novel composite score using stacked regression. Additionally, a database was built that included a geographically and demographically diverse population of children from both higher- and lower-resource settings to maximize generalizability. Furthermore, the performance of the individual organ dysfunction measures, the stacked models, and the Phoenix Sepsis Score was primarily evaluated using the AUPRC, instead of the AUROC, with the goal of maximizing the PPV and sensitivity of the final criteria. The AUPRC is considered a better measure of classification performance for rare events (in this case, deaths) than the AUROC, which can overstate performance when the proportions of events (deaths) and nonevents (survivors) are imbalanced,11,20 an issue that is particularly relevant in children with infections given their lower mortality compared with adults. Finally, this analysis focused on diagnosis of sepsis within the first 24 hours of presentation to a hospital setting, when the majority of pediatric sepsis is diagnosed.21
Leveraging Digital Technology to Develop and Implement the Phoenix Sepsis Score
This approach to the development of the Phoenix Sepsis Score and the criteria for sepsis and septic shock reflects the growing digitization of health care globally.22 Most of the vital signs, laboratory tests, and interventions included in the Phoenix Sepsis Score are routinely collected in most lower-resource settings and in nearly all higher-resource settings, according to the Pediatric Sepsis Definition Task Force’s international survey.23 Because its components are redundant, the Phoenix Sepsis Score is designed to identify children with sepsis accurately even in settings where not all variables are available. With a possible range of 0 to 13 points, there are several ways to reach the 2-point threshold for sepsis diagnosis, as evidenced by the fact that patients with sepsis in both higher- and lower-resource settings had a median Phoenix Sepsis Score of 3 points. This feature was primarily assessed in the data sets from lower-resource settings. For example, platelet count was commonly measured at most sites, whereas coagulation tests (eg, D-dimer and fibrinogen) were less frequently available. At lower-resource site 1, where platelet count was routinely measured but D-dimer and fibrinogen were not, the Phoenix Sepsis Score had excellent performance and the Phoenix sepsis criteria had higher sensitivity and PPV than the IPSCC sepsis and severe sepsis criteria. This redundancy makes the score and criteria readily translatable into EHRs and other digital tools, such as web-based and mobile applications, across differently resourced settings, even when some of the variables are not routinely collected.24 Furthermore, digital implementation of the Phoenix Sepsis Score can enable longitudinal monitoring and provide clinicians and researchers with a tool to stratify sepsis severity.
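The redundancy argument can be made concrete with a small hypothetical Python sketch. It does not compute organ subscores from raw vital signs or laboratory values; it only aggregates subscores assumed to be precomputed. The per-organ maxima (respiratory 0-3, cardiovascular 0-6, coagulation 0-2, neurologic 0-2), the rule that an unmeasured variable contributes 0 points, and the septic shock rule (sepsis plus at least 1 cardiovascular point) are treated here as assumptions drawn from the accompanying consensus criteria article rather than from this section; all function names are illustrative.

```python
from itertools import product
from typing import Iterable, Mapping, Optional, Tuple

# Assumed per-organ maximum points; together they give the 13-point maximum.
MAX_POINTS = {"respiratory": 3, "cardiovascular": 6, "coagulation": 2, "neurologic": 2}
SEPSIS_THRESHOLD = 2


def phoenix_total(subscores: Mapping[str, Optional[int]]) -> int:
    """Sum organ subscores; a missing (None or absent) subscore contributes 0 points."""
    total = 0
    for organ, max_pts in MAX_POINTS.items():
        pts = subscores.get(organ)
        if pts is None:
            continue  # not measured or not recorded: assumed normal (0 points)
        if not 0 <= pts <= max_pts:
            raise ValueError(f"{organ} subscore {pts} outside 0-{max_pts}")
        total += pts
    return total


def phoenix_criteria(subscores: Mapping[str, Optional[int]],
                     suspected_infection: bool) -> Tuple[int, bool, bool]:
    """Return (total score, sepsis, septic shock) under the assumed rules."""
    total = phoenix_total(subscores)
    sepsis = suspected_infection and total >= SEPSIS_THRESHOLD
    septic_shock = sepsis and (subscores.get("cardiovascular") or 0) >= 1
    return total, sepsis, septic_shock


def ways_to_reach_threshold(available: Iterable[str]) -> int:
    """Count subscore combinations over the available organs that meet the sepsis threshold."""
    ranges = [range(MAX_POINTS[organ] + 1) for organ in available]
    return sum(1 for combo in product(*ranges) if sum(combo) >= SEPSIS_THRESHOLD)


# Hypothetical child: coagulation point from platelet count, neurologic assessment not recorded.
child = {"respiratory": 1, "cardiovascular": 0, "coagulation": 1, "neurologic": None}
print(phoenix_criteria(child, suspected_infection=True))            # (2, True, False)
print(ways_to_reach_threshold(MAX_POINTS))                          # 247
print(ways_to_reach_threshold(["respiratory", "cardiovascular"]))   # 25
```

Of the 252 possible combinations of the 4 organ subscores, all but 5 meet the 2-point threshold; with only respiratory and cardiovascular data available, 25 of 28 still do. These are counts of combinations, not of patients, but they show how missing domains reduce, without eliminating, the routes to a sepsis diagnosis.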
Additional considerations for the implementation and use of the Phoenix Sepsis Score and the novel criteria are discussed in the accompanying consensus criteria article.6
Limitations
This study has several limitations. First, retrospective data obtained from EHRs may contain missing values and data entry errors. Although a robust quality assurance and harmonization process was developed and best practices were used to address outliers and missing data, not all errors or missing data can be reconciled. For example, at lower-resource site 2 in the development data set, located in a lower- to middle-income country, respiratory support (eg, mechanical ventilation, Fio2) and neurologic assessments (eg, level of consciousness and pupillary reaction) are performed but not recorded in the clinical information systems, which limits the ability to assess the score and criteria at that site. In contrast, score performance was excellent at lower-resource site 1 and comparable with that at the higher-resource sites, which suggests that the score can perform well in lower-resource environments when these variables are recorded. Second, when deriving the stacked regression models, the Phoenix Sepsis Score, and the new criteria for sepsis and septic shock, a pragmatic approach was intentionally chosen, using the data as recorded during routine care as an indicator of how the criteria would perform in real-world implementations. However, some of the organ dysfunction measures used in the modeling process may have reflected iatrogenic effects or clinician therapeutic choices rather than actual organ dysfunction, such as a lower Glasgow Coma Scale score in a patient receiving sedation or initiation of vasoactive medications in a patient with minimal cardiovascular dysfunction. Future work is needed to determine the effects of these variables and clinician choices on the performance of the criteria.
Third, similar to the Sepsis-3 validation study, unique criteria for patients with chronic organ dysfunction were not developed.4 Fourth, few databases from lower-resource settings were available (a form of data poverty),25 and those used may not be generalizable to every lower-resource environment. Fifth, the data from higher-resource settings came exclusively from tertiary US pediatric centers. Sixth, the data sets from some sites spanned 10 years and may therefore reflect changes in clinical practice over that period.
Conclusions
The novel Phoenix sepsis criteria, which were derived and validated using a large international database of pediatric hospital encounters in higher- and lower-resource settings, had improved performance for the diagnosis of pediatric sepsis and septic shock compared with the existing IPSCC criteria.
Educational Objective: To identify the key insights or developments described in this article.
- What was the primary outcome used in the development and validation of this clinical prediction tool?
A combination of intensive care unit admission, intubation, vasopressor support, extracorporeal membrane oxygenation, or death
Agreement by at least 2 of 3 pediatric intensivists that sepsis was present on postevent chart review
In-hospital mortality
- The authors used stacked regression modeling for the derivation of this new clinical prediction tool. Why did they choose this approach?
Stacked regression allows many models to be used simultaneously, leveraging predictive power while maintaining a high degree of interpretability.
Stacked regression typically yields integer estimates of risk, permitting easy and obvious application in clinical settings.
The large size of this database overwhelmed alternative forms of machine learning, forcing selection of the less computationally intensive stacked regression.
- According to the authors, how did this development of pediatric sepsis criteria differ from the derivation of the adult Sepsis-3 criteria?
Existing organ dysfunction scores and a large electronic health record database were used to develop the new criteria.
The database included a geographically and demographically diverse population from both higher- and lower-resource settings.
The implemented definition of sepsis combined suspected infection with life-threatening organ dysfunction.
References
- 1. Rudd KE, Johnson SC, Agesa KM, et al. Global, regional, and national sepsis incidence and mortality, 1990-2017. Lancet. 2020;395(10219):200-211. doi: 10.1016/S0140-6736(19)32989-7
- 2. Goldstein B, Giroir B, Randolph A, et al. International pediatric sepsis consensus conference: definitions for sepsis and organ dysfunction in pediatrics. Pediatr Crit Care Med. 2005;6(1):2-8. doi: 10.1097/01.PCC.0000149131.72248.E6
- 3. Weiss SL, Fitzgerald JC, Maffei FA, et al. Discordant identification of pediatric severe sepsis by research and clinical definitions in the SPROUT international point prevalence study. Crit Care. 2015;19(1):325. doi: 10.1186/s13054-015-1055-x
- 4. Seymour CW, Liu VX, Iwashyna TJ, et al. Assessment of clinical criteria for sepsis: for the Third International Consensus Definitions for Sepsis and Septic Shock (Sepsis-3). JAMA. 2016;315(8):762-774. doi: 10.1001/jama.2016.0288
- 5. Singer M, Deutschman CS, Seymour CW, et al. The Third International Consensus Definitions for Sepsis and Septic Shock (Sepsis-3). JAMA. 2016;315(8):801-810. doi: 10.1001/jama.2016.0287
- 6. Schlapbach LJ, Watson RS, Sorce LR, et al; Society of Critical Care Medicine Pediatric Sepsis Definition Task Force. International consensus criteria for pediatric sepsis and septic shock. JAMA. Published online January 21, 2024. doi: 10.1001/jama.2024.0179
- 7. Zeng X, Yu G, Lu Y, et al. PIC, a paediatric-specific intensive care database. Sci Data. 2020;7(1):14. doi: 10.1038/s41597-020-0355-4
- 8. Feinstein JA, Russell S, DeWitt PE, et al. R package for pediatric complex chronic condition classification. JAMA Pediatr. 2018;172(6):596-598. doi: 10.1001/jamapediatrics.2018.0256
- 9. World Health Organization. Nutrition for Health and Development. WHO Child Growth Standards: Growth Velocity Based on Weight, Length and Head Circumference: Methods and Development. World Health Organization; 2009.
- 10. Ozenne B, Subtil F, Maucort-Boulch D. The precision-recall curve overcame the optimism of the receiver operating characteristic curve in rare diseases. J Clin Epidemiol. 2015;68(8):855-859. doi: 10.1016/j.jclinepi.2015.02.010
- 11. Saito T, Rehmsmeier M. The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLoS One. 2015;10(3):e0118432. doi: 10.1371/journal.pone.0118432
- 12. Wolpert DH. Stacked generalization. Neural Netw. 1992;5(2):241-259. doi: 10.1016/S0893-6080(05)80023-1
- 13. Breiman L. Stacked regressions. Mach Learn. 1996;24(1):49-64. doi: 10.1007/BF00117832
- 14. Proulx F, Fayon M, Farrell CA, et al. Epidemiology of sepsis and multiple organ dysfunction syndrome in children. Chest. 1996;109(4):1033-1037. doi: 10.1378/chest.109.4.1033
- 15. Matics TJ, Sanchez-Pinto LN. Adaptation and validation of a pediatric Sequential Organ Failure Assessment score and evaluation of the Sepsis-3 definitions in critically ill children. JAMA Pediatr. 2017;171(10):e172352. doi: 10.1001/jamapediatrics.2017.2352
- 16. Leteurtre S, Duhamel A, Salleron J, et al. PELOD-2: an update of the Pediatric Logistic Organ Dysfunction score. Crit Care Med. 2013;41(7):1761-1773. doi: 10.1097/CCM.0b013e31828a2bbd
- 17. Rousseaux J, Grandbastien B, Dorkenoo A, et al. Prognostic value of shock index in children with septic shock. Pediatr Emerg Care. 2013;29(10):1055-1059. doi: 10.1097/PEC.0b013e3182a5c99c
- 18. Khemani RG, Bart RD, Alonzo TA, et al. Disseminated intravascular coagulation score is associated with mortality for children with shock. Intensive Care Med. 2009;35(2):327-333. doi: 10.1007/s00134-008-1280-8
- 19. Haque A, Siddiqui NR, Munir O, et al. Association between vasoactive-inotropic score and mortality in pediatric septic shock. Indian Pediatr. 2015;52(4):311-313. doi: 10.1007/s13312-015-0630-1
- 20. Tharwat A. Classification assessment methods. Appl Comput Inform. 2020;17(1):168-192. doi: 10.1016/j.aci.2018.08.003
- 21. Scott HF, Brilli RJ, Paul R, et al. Evaluating pediatric sepsis definitions designed for electronic health record extraction and multicenter quality improvement. Crit Care Med. 2020;48(10):e916-e926. doi: 10.1097/CCM.0000000000004505
- 22. Wyber R, Vaillancourt S, Perry W, et al. Big data in global health. Bull World Health Organ. 2015;93(3):203-208. doi: 10.2471/BLT.14.139022
- 23. Morin L, Hall M, de Souza D, et al. The current and future state of pediatric sepsis definitions. Pediatrics. 2022;149(6):e2021052565. doi: 10.1542/peds.2021-052565
- 24. Jimenez-Zambrano A, Ritger C, Rebull M, et al. Clinical decision support tools for paediatric sepsis in resource-poor settings. BMJ Open. 2023;13(10):e074458. doi: 10.1136/bmjopen-2023-074458
- 25. Ibrahim H, Liu X, Zariffa N, et al. Health data poverty: an assailable barrier to equitable digital health care. Lancet Digit Health. 2021;3(4):e260-e265. doi: 10.1016/S2589-7500(20)30317-4