JAMA. 2024 Jan 21;331(8):675–686. doi: 10.1001/jama.2024.0196

Development and Validation of the Phoenix Criteria for Pediatric Sepsis and Septic Shock

L Nelson Sanchez-Pinto 1, Tellen D Bennett 2, Peter E DeWitt 3, Seth Russell 3, Margaret N Rebull 3, Blake Martin 2, Samuel Akech 4, David J Albers 5,6, Elizabeth R Alpern 7, Fran Balamuth 8, Melania Bembea 9, Mohammod Jobayer Chisti 10, Idris Evans 11, Christopher M Horvat 11, Juan Camilo Jaramillo-Bustamante 12, Niranjan Kissoon 13, Kusum Menon 14, Halden F Scott 15, Scott L Weiss 16,17, Matthew O Wiens 18,19,20, Jerry J Zimmerman 21, Andrew C Argent 22, Lauren R Sorce 23, Luregn J Schlapbach 24,25, R Scott Watson 26; and the Society of Critical Care Medicine Pediatric Sepsis Definition Task Force, Paolo Biban 27, Enitan Carrol 28, Kathleen Chiotos 29, Claudio Flauzino De Oliveira 30, Mark W Hall 31, David Inwald 32, Paul Ishimine 33, Michael Levin 34, Rakesh Lodha 35, Simon Nadel 36, Satoshi Nakagawa 37, Mark J Peters 38, Adrienne G Randolph 39, Suchitra Ranjit 40, Daniela Carla Souza 41, Pierre Tissieres 42, James L Wynn 43
PMCID: PMC10900964  PMID: 38245897

Key Points

Question

What are the best-performing organ dysfunction–based criteria to implement the definition of sepsis and septic shock in children with suspected infection?

Findings

In this international, multicenter, retrospective cohort study including more than 3.6 million pediatric encounters, a novel score, the Phoenix Sepsis Score, was derived and validated to predict mortality in children with suspected or confirmed infection. The new criteria for pediatric sepsis and septic shock based on the score performed better than existing organ dysfunction scores and the International Pediatric Sepsis Consensus Conference criteria.

Meaning

The new data-driven criteria for pediatric sepsis and septic shock based on measures of organ dysfunction had improved performance compared with prior pediatric sepsis criteria.

Abstract

Importance

The Society of Critical Care Medicine Pediatric Sepsis Definition Task Force sought to develop and validate new clinical criteria for pediatric sepsis and septic shock using measures of organ dysfunction through a data-driven approach.

Objective

To derive and validate novel criteria for pediatric sepsis and septic shock across differently resourced settings.

Design, Setting, and Participants

Multicenter, international, retrospective cohort study in 10 health systems in the US, Colombia, Bangladesh, China, and Kenya, 3 of which were used as external validation sites. Data were collected from emergency and inpatient encounters for children (aged <18 years) from 2010 to 2019: 3 049 699 in the development (including derivation and internal validation) set and 581 317 in the external validation set.

Exposure

Stacked regression models to predict mortality in children with suspected infection were derived and validated using the best-performing organ dysfunction subscores from 8 existing scores. The final model was then translated into an integer-based score used to establish binary criteria for sepsis and septic shock.

Main Outcomes and Measures

The primary outcome for all analyses was in-hospital mortality. Model- and integer-based score performance measures included the area under the precision recall curve (AUPRC; primary) and area under the receiver operating characteristic curve (AUROC; secondary). For binary criteria, primary performance measures were positive predictive value and sensitivity.

Results

Among the 172 984 children with suspected infection in the first 24 hours (development set; 1.2% mortality), a 4-organ-system model performed best. The integer version of that model, the Phoenix Sepsis Score, had AUPRCs of 0.23 to 0.38 (95% CI range, 0.20-0.39) and AUROCs of 0.71 to 0.92 (95% CI range, 0.70-0.92) to predict mortality in the validation sets. Using a Phoenix Sepsis Score of 2 points or higher in children with suspected infection as criteria for sepsis and sepsis plus 1 or more cardiovascular point as criteria for septic shock resulted in a higher positive predictive value and higher or similar sensitivity compared with the 2005 International Pediatric Sepsis Consensus Conference (IPSCC) criteria across differently resourced settings.

Conclusions and Relevance

The novel Phoenix sepsis criteria, which were derived and validated using data from higher- and lower-resource settings, had improved performance for the diagnosis of pediatric sepsis and septic shock compared with the existing IPSCC criteria.


This cohort study derives and validates novel criteria for diagnosis of pediatric sepsis and septic shock across high-resource and low-resource international settings.

Introduction

Pediatric sepsis is a major public health problem that causes an estimated 3.3 million deaths annually worldwide.1 However, the current criteria to diagnose pediatric sepsis, which were published in 2005 following the International Pediatric Sepsis Consensus Conference (IPSCC), are outdated, have low specificity, do not allow for risk stratification in both lower- and higher-resource settings, and may be discordant with clinician-based diagnosis.2,3 In 2016, the Sepsis-3 Task Force redefined adult sepsis as life-threatening organ dysfunction in the setting of infection and developed criteria using a large electronic health record (EHR) data set and a data-driven approach.4,5 In 2019, the Society of Critical Care Medicine Pediatric Sepsis Definition Task Force was convened to update the pediatric sepsis definition and criteria. The task force adopted the conceptual definition of pediatric sepsis as suspected infection with life-threatening organ dysfunction and sought to implement the definition using organ dysfunction criteria associated with higher risk of mortality. The goal was to develop criteria that would generalize across differently resourced settings.6

New pediatric sepsis criteria should maximize identification of true-positive cases so that infected children with life-threatening organ dysfunction receive best-practice sepsis care, are appropriately enrolled in clinical studies, and are correctly represented in epidemiological surveillance. Simultaneously, new criteria must minimize false-positive cases so that children are not misdiagnosed with sepsis. This is important to reduce unnecessary use of antimicrobials and other treatments, optimize the efficiency of clinical studies, and avoid overcounting in surveillance. However, it is unclear which measures of organ dysfunction in children have an appropriate balance of sensitivity and positive predictive value (PPV) to achieve these goals and also generalize across differently resourced settings.

One challenge is that there is currently no large, centralized, multicenter, high-granularity database that includes pediatric emergency and inpatient care in differently resourced settings. Additionally, the validation of the existing IPSCC criteria has been limited historically.2,3 To address these gaps, a database was developed and used to derive and validate novel criteria for pediatric sepsis and septic shock based on measures of organ dysfunction in children with suspected infection.

Methods

Overview

The existing organ dysfunction subscores for each organ system that best predicted mortality were first identified and then integrated into models to predict mortality in children with suspected infection. From the best-performing models, an integer-based score (the Phoenix Sepsis Score) was developed (eFigure 1 in Supplement 1). The binary Phoenix sepsis and septic shock criteria were then selected as thresholds of the Phoenix Sepsis Score.

Study Design, Setting, and Population

A retrospective cohort study was performed using EHR data from 10 hospital-based sites in 5 countries. The analysis plan was prespecified in the funding application that supported this work. Six US sites represent higher-resource settings, 5 of which were in the development data set (eFigure 2 in Supplement 1). Data from 1 US site were held out for geographic external validation. Two international sites in Bangladesh and Colombia represent lower-resource settings in the development data set. Additionally, limited EHR and registry data from sites in China7 and Kenya served as lower-resource external validation sites. From each site, all emergency department, inpatient, and intensive care unit (ICU) encounters of children younger than 18 years from 2010 to 2019 were included, with some sites providing shorter time windows (eTable 1 in Supplement 1). Data from newborns before discharge (birth hospitalizations) and children with a postconceptional age of less than 37 weeks were excluded. Data harmonization, quality assurance, and all analyses were conducted as a reproducible pipeline in a centralized, cloud-based environment (eFigure 2 and eAppendix 1 in Supplement 1). The study was approved with a waiver of consent by a central institutional review board at the University of Colorado, plus separate regulatory approvals at non-US sites.

Outcomes, Definitions, and Main Measures

The primary outcome for all analyses was in-hospital mortality, which was used to assess the likelihood that organ dysfunction in the setting of an infection was life-threatening. The secondary outcome for all analyses was a composite of early death (within 72 hours of presentation to the hospital) or requirement of extracorporeal membrane oxygenation (ECMO) support. This secondary outcome was requested by the task force because early death and ECMO are more likely to be directly associated with sepsis in the first 24 hours of presentation than in-hospital mortality, which can occur later and be the result of complications during the hospitalization. Also, using ECMO to rescue children with sepsis-associated respiratory and/or cardiac failure could lead to survival of some children who would otherwise die.

Suspected infection was defined as receipt of systemic antimicrobials and microbiological testing within the first 24 hours of the encounter. Comorbidities were defined based on the Pediatric Complex Chronic Conditions Classification System,8 and severe malnutrition was defined as a weight-for-age more than 3 SDs below the mean based on standards from the World Health Organization.9 The systemic inflammatory response syndrome criteria were based on IPSCC criteria.2,3 Because dosing information necessary to calculate the vasoactive-inotropic score was often missing at lower-resource sites, the number of concurrent vasoactive agents was tested as a proxy.

The area under the precision recall curve (AUPRC) was used as the primary measure of organ dysfunction subscore, stacked regression sepsis model, and Phoenix Sepsis Score performance because it is more accurate than the area under the receiver operating characteristic (AUROC) curve when analyzing imbalanced data sets (eg, many more survivors than nonsurvivors). This is particularly important in children with infections given their lower baseline mortality compared with adults.10,11 The best way to interpret AUPRCs is to use the baseline rate as reference, because the expected AUPRC of a random model equals the baseline outcome rate. For example, if mortality is 1% (0.01) and the model AUPRC is 0.30, the model has 30-fold higher performance than a random model. Because the novel Phoenix sepsis and septic shock criteria represent single, binary thresholds, the primary performance measures used to evaluate them were sensitivity and PPV, which represent single points on the precision recall curve. Missing data were imputed using a last-observation-carried-forward approach across physiologically appropriate time windows. See eAppendix 1 in Supplement 1 for details.
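The baseline-referenced reading of the AUPRC described above can be written as a one-line calculation. The following is a minimal sketch; the function name `auprc_fold_over_baseline` is hypothetical and not part of the study's pipeline.

```python
def auprc_fold_over_baseline(auprc: float, outcome_rate: float) -> float:
    """Ratio of a model's AUPRC to the baseline outcome rate.

    The baseline rate (eg, mortality) is the reference the text suggests
    for interpreting an AUPRC, so the ratio expresses fold improvement
    over a random model.
    """
    if not 0 < outcome_rate <= 1:
        raise ValueError("outcome_rate must be in (0, 1]")
    if not 0 <= auprc <= 1:
        raise ValueError("auprc must be in [0, 1]")
    return auprc / outcome_rate


# Example from the text: 1% mortality and an AUPRC of 0.30.
fold = auprc_fold_over_baseline(0.30, 0.01)
print(f"{fold:.0f}-fold over a random model")
```

The same AUPRC value therefore means different things in cohorts with different mortality, which is why baseline rates are reported alongside the performance estimates.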

Derivation and Validation of the Novel Criteria for Sepsis and Septic Shock

The evaluation of which organ dysfunction subscores best predicted mortality involved all patients with and without suspected infection (eFigures 1-2 in Supplement 1). Then, stacked regression models12,13 were derived and validated to predict mortality using the worst organ dysfunction subscores recorded in the first 24 hours of the encounter among children with suspected infection (eFigures 1-2 in Supplement 1). This approach was used to implement the concept of “an infection with life-threatening organ dysfunction,” which was adopted by the Pediatric Sepsis Definition Task Force as the conceptual definition of sepsis.

The data set was first divided into development (including derivation and internal validation) and external validation sets as described above and shown in eFigure 2 in Supplement 1. From each development site, 25% were held out for internal validation. The other three 25% portions of the development data set were used to (1) identify the best-performing criteria for each individual organ dysfunction based on the subscores of 8 existing and previously validated pediatric organ dysfunction criteria in all patients in the development data sets (including patients with suspected infection and those without) (eTable 2 and eFigure 2 in Supplement 1)14,15,16,17,18,19; (2) train and tune stacked regression models using a composite of the best-performing individual organ dysfunction criteria in children with suspected infection12,13; and (3) derive and internally validate the novel sepsis criteria based on the final stacked regression model. Finally, the novel criteria were validated in the external validation sets.
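One way to picture the partitioning described above is a per-site random assignment of encounters to 4 equally sized portions, with 1 portion held out for internal validation. This is only an illustrative sketch: the function name, the fold numbering, and the use of a seeded shuffle are assumptions, and the actual procedure is described in eAppendix 1 in Supplement 1.

```python
import random
from collections import defaultdict


def split_by_site(encounters, seed=0):
    """Assign each encounter to 1 of 4 equal portions within its site.

    encounters: iterable of (encounter_id, site) pairs.
    Returns a dict mapping encounter_id to a fold in {0, 1, 2, 3}.
    Treating fold 3 as the 25% internal validation holdout is an
    assumption made for this illustration.
    """
    rng = random.Random(seed)
    by_site = defaultdict(list)
    for encounter_id, site in encounters:
        by_site[site].append(encounter_id)

    assignment = {}
    for site in sorted(by_site):
        ids = sorted(by_site[site])  # fixed order so the seed is reproducible
        rng.shuffle(ids)
        for position, encounter_id in enumerate(ids):
            assignment[encounter_id] = position % 4
    return assignment


# Hypothetical usage with 2 sites of 8 encounters each.
demo = [(f"enc{i}", "site_A" if i < 8 else "site_B") for i in range(16)]
folds = split_by_site(demo, seed=42)
print(sorted(folds.items())[:4])
```

Splitting within each site keeps every site represented in each portion, so the held-out 25% reflects the same mix of settings as the derivation portions.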

Stacked regression is a robust model-averaging approach that allows many models to be used simultaneously, leveraging the best predictive power of each model. The best-performing organ dysfunction subcomponent scores were used as input variables for stacked regression models that also predicted mortality. The stacked regression models took the organ dysfunction subscores as covariates and estimated the regression weights (or the relative contribution of each respective subcomponent’s prediction to the overall prediction) in accordance with each subcomponent’s predictive power, while maintaining a high degree of interpretability.13 Additional information is available in eAppendix 1 in Supplement 1.

Ridge, least absolute shrinkage and selection operator (LASSO), and elastic net regularized logistic regression were evaluated as the top-level stacked models. Ten-fold cross-validation was used to select the regularization parameter lambda in the stacked models that minimized deviance for each value of alpha (0 = ridge; 1 = LASSO) (see eAppendix 1 in Supplement 1 for additional information). The best-performing stacked regression models were identified using the AUPRC. In the third step, the components of the final stacked regression model were translated into an integer-based score using a grid search, then its performance was compared with the final stacked model to ensure that the AUPRC remained stable. When measures and models had similar performance, the task force voted on which to choose based on parsimony, data collection burden, and face validity.6 The task force then voted using a modified Delphi process on the thresholds of the score to define sepsis and septic shock and achieve the desired balance of sensitivity and PPV. In the final step, performance of the novel criteria was assessed across validation sets using sensitivity and PPV as primary metrics. Additional information is available in eAppendix 1 and eFigures 1-2 in Supplement 1.

Stratifications and Sensitivity Analyses

During each step, prespecified stratifications and sensitivity analyses were performed to ensure robustness. These included (1) higher-resource vs lower-resource settings, where the higher-resource sites were analyzed together given their overall similarity and the lower-resource sites were analyzed individually given their broader differences in underlying population, resources, and data quality; (2) no known prior comorbidities, to assess criteria performance in children without potential confounding by chronic and/or life-limiting conditions; (3) age groups, to ensure that performance remains appropriate across the pediatric spectrum; (4) ICU admission, given that many children with sepsis receive ICU care; and (5) excluding patients who required operative care, to reduce confounding by mechanical ventilation or vasoactive medications related to receiving anesthesia or undergoing surgery.

Results

Cohort Demographic and Clinical Characteristics

The development set included 3 049 699 emergency department, inpatient, and ICU encounters for children younger than 18 years, of which 172 984 (5.7%) had suspected infection in the first 24 hours (Table 1; eTables 3 and 4 and eFigure 2 in Supplement 1). Of those, 2065 (1.2%) died. The external validation set included 581 317 encounters, of which 45 855 (7.9%) had suspected infection in the first 24 hours. Of those, 540 (1.2%) died (Table 1; eTable 5 in Supplement 1).

Table 1. Characteristics of Pediatric Patient Encounters With Suspected Infection in the First 24 Hours [a]

| Characteristics | Derivation cohort | Internal validation cohort | External validation cohort |
| --- | --- | --- | --- |
| Encounters, No. | 129 584 | 43 400 | 45 855 |
| Resource setting, No. (%) | | | |
| Higher-resource settings | 108 177 (83.5) | 36 202 (83.4) | 33 020 (72.0) |
| Lower-resource settings | 21 407 (16.5) | 7198 (16.6) | 12 835 (28.0) |
| Age, median (IQR), y | 3.7 (0.9-9.4) | 3.7 (0.9-9.3) | 2.6 (0.6-7.6) |
| Sex, No. (%) | | | |
| Female | 62 868 (48.5) | 21 041 (48.5) | 22 295 (48.6) |
| Male | 66 712 (51.5) | 22 357 (51.5) | 21 555 (47.0) |
| Race, No. (%) [b] | | | |
| American Indian or Alaska Native | 109 (0.1) | 21 (<0.1) | 59 (0.1) |
| Asian | 5149 (4.0) | 1703 (3.9) | 506 (1.1) |
| Black | 22 709 (17.5) | 7512 (17.3) | 7476 (16.3) |
| Native Hawaiian or Other Pacific Islander | 105 (0.1) | 31 (0.1) | 70 (0.2) |
| White | 57 518 (44.4) | 19 533 (45.0) | 23 545 (51.3) |
| Multiple | 22 113 (17.1) | 7343 (16.9) | 277 (0.6) |
| Other/unknown | 22 095 (17.1) | 7309 (16.8) | 14 051 (30.6) |
| Hispanic or Latino ethnicity, No. (%) | 33 698 (26.0) | 11 457 (26.4) | 55 (0.1) |
| Major comorbidities, No. (%) | | | |
| Technology dependence | 18 951 (17.5) | 6011 (16.6) | 5677 (17.2) |
| Severe malnutrition | 13 505 (10.4) | 4478 (10.3) | 3417 (7.5) |
| Malignancy | 10 924 (10.1) | 3709 (10.2) | 2950 (8.9) |
| Transplant | 3689 (3.4) | 1287 (3.6) | 1573 (4.8) |
| Comorbidities per PCCC, No. (%) [c] | | | |
| No known prior comorbidity | 72 291 (66.8) | 24 470 (67.6) | 22 553 (68.3) |
| 1 PCCC | 9406 (8.7) | 3150 (8.7) | 2580 (7.8) |
| ≥2 PCCCs | 26 480 (24.5) | 8582 (23.7) | 7887 (23.9) |
| Systemic inflammatory response syndrome, No. (%) [d] | 56 711 (43.8) | 18 848 (43.4) | 21 436 (46.7) |
| Locations visited during encounter (not mutually exclusive), No. (%) | | | |
| Presented to emergency department | 92 507 (71.6) | 31 092 (71.9) | 26 940 (61.6) |
| ≥1 Intensive care unit stays | 23 128 (17.9) | 7840 (18.1) | 10 702 (23.4) |
| ≥1 Operating room visits | 17 604 (13.6) | 6098 (14.1) | 469 (1.1) |
| Outcomes, No. (%) | | | |
| Death | 1538 (1.2) | 527 (1.2) | 540 (1.2) |
| Early death or extracorporeal membrane oxygenation | 834 (0.6) | 305 (0.7) | 349 (0.8) |

Abbreviation: PCCC, pediatric complex chronic condition.

[a] Table 1 shows site, demographic, care location, comorbidity, and outcome characteristics of those with suspected or confirmed infection in the first 24 hours of the encounter. Data from the 7 development sites are stratified by the 75% derivation cohort vs the 25% internal validation cohort.

[b] For race categories, “multiple” indicates that in the electronic health record, a patient’s race was recorded as “multiracial,” “multiple,” or “2 or more races.” “Other/unknown” indicates that a patient’s race was recorded in the electronic health record as “other,” “unknown,” “not specified,” “information not recorded,” “patient declined,” “patient refused,” “refused,” or as a race category unique to a particular international country or region.

[c] The PCCC system classifies pediatric chronic diseases using International Classification of Diseases diagnosis and procedure codes and was assessed only at higher-resource sites, where the information was available (percentages for PCCC-related counts are based on higher-resource setting encounters).8 The major comorbidities of technology dependence (eg, requiring gastrostomy, tracheostomy, central line), malignancy, and transplant were defined in the PCCC system. Severe malnutrition was defined as a weight-for-age more than 3 SDs below the mean based on standards from the World Health Organization and was assessed at all sites.9 Early death is defined as death <72 hours after the beginning of the encounter.

[d] Systemic inflammatory response syndrome is assessed using temperature, white blood cell count, heart rate, and respiratory rate, with higher values reflecting more inflammation. Criteria are met when ≥2 values are outside the threshold for age, including at least temperature or white blood cell count. See eAppendix 1 in Supplement 1 for additional details.
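The counting rule for systemic inflammatory response syndrome given in the Table 1 footnote can be expressed as a small function. This sketch assumes each input has already been compared against its age-specific threshold (those thresholds are in eAppendix 1 in Supplement 1 and are not reproduced here); the function name `meets_sirs` is hypothetical.

```python
def meets_sirs(temp_abnormal: bool, wbc_abnormal: bool,
               hr_abnormal: bool, rr_abnormal: bool) -> bool:
    """Apply the rule: >=2 abnormal values, at least 1 of which is
    temperature or white blood cell count.

    Each argument states whether that measurement is outside its
    threshold for age; the thresholds themselves are not encoded here.
    """
    abnormal_count = sum([temp_abnormal, wbc_abnormal, hr_abnormal, rr_abnormal])
    return abnormal_count >= 2 and (temp_abnormal or wbc_abnormal)


# Abnormal heart rate and respiratory rate alone do not meet the criteria.
print(meets_sirs(False, False, True, True))  # False
# Abnormal temperature plus abnormal heart rate does.
print(meets_sirs(True, False, True, False))  # True
```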

Best-Performing Individual Organ Dysfunction Criteria

Organ dysfunction subscore input availability and missingness are shown in eFigure 3, A-H, in Supplement 1. By 24 hours into an encounter, most patients in higher-resource settings had information recorded for pulse oximetry oxygen saturation (Spo2), respiratory support, platelet count, blood pressure, vasoactive agent use, and Glasgow Coma Scale score. Many also had fraction of inspired oxygen (Fio2), lactate, and pupillary reactivity measured. Patients in lower-resource settings were less likely to have available data on lactate, Glasgow Coma Scale, pupillary reactivity, and coagulation studies such as D-dimer and fibrinogen. The best-performing individual organ dysfunction criteria based on the primary measure of AUPRC and task force Delphi process when AUPRCs were similar included cardiovascular (Pediatric Logistic Organ Dysfunction version 2 [PELOD-2] and vasoactive medication count), hematology/coagulation (Disseminated Intravascular Coagulation score), respiratory (pediatric Sequential Organ Failure Assessment [pSOFA]), renal (pSOFA), hepatic (IPSCC), neurologic (PELOD-2), immunologic (Pediatric Organ Dysfunction Information Update Mandate [PODIUM]), and endocrine dysfunction (PODIUM), as shown in eFigure 4 in Supplement 1.

Derivation and Validation of the Stacked Models

The best-performing stacked models included an 8-organ system ridge regression model and a 4-organ system LASSO model (eTable 6 and eFigure 6 in Supplement 1). Overall, AUPRCs and AUROCs were similar between these 2 models (eFigure 7 in Supplement 1). The task force evaluated the 2 models and chose to advance the 4-organ system model because it had similar performance but greater simplicity and lower dependence on laboratory measures. The task force acknowledged that the more comprehensive 8-organ system model may have utility in some circumstances (eg, research). The 4-organ system model included criteria for respiratory (mechanical ventilation, Pao2:Fio2, and Spo2:Fio2 ratios), cardiovascular (mean arterial pressure, lactate level, and vasoactive medications), coagulation (platelet count, international normalized ratio, D-dimer, and fibrinogen), and neurologic (Glasgow Coma Scale and pupillary reaction) dysfunction.

From the Stacked Model to the Phoenix Sepsis Score

The 4-organ system model was translated into an integer-based score, the Phoenix Sepsis Score (Table 2). In doing so, the individual levels were reweighted using a grid search and collapsed into a single level when performance was unaffected (eg, the pSOFA respiratory subscores of 1 and 2 points were collapsed into a single level). Mortality increased with higher score values in both higher- and lower-resource settings (Figure 1 and Figure 2; eFigure 5 in Supplement 1). The Phoenix Sepsis Score had AUPRCs of 0.23 to 0.38 (95% CI range, 0.20-0.39) and AUROCs of 0.71 to 0.92 (95% CI range, 0.70-0.92) to predict mortality in the internal and external validation sets, similar to the stacked sepsis model (Figure 3; eFigures 6-8 in Supplement 1). Compared with the existing IPSCC sepsis score as well as several organ dysfunction scores, the Phoenix Sepsis Score had the highest AUPRC to predict mortality at all validation sites combined, at all higher-resource sites, and at 3 of the 4 lower-resource sites (Figure 3). A notable limitation is that lower-resource sites 2-4 did not record respiratory support, even when a patient received it, which limited the range of the score and likely resulted in lower performance at those sites. Additionally, lower-resource site 2 had no recording of neurologic status, further limiting score range and performance at that site. However, the score at lower-resource site 1 included data for all 4 organ systems. To enable capture of other organ dysfunctions for research or epidemiological purposes, an expanded score based on the 8-organ system model was also developed and named the Phoenix-8 Score (eFigure 9 in Supplement 1).

Table 2. The Phoenix Sepsis Score [a]

| | 0 Points | 1 Point | 2 Points | 3 Points |
| --- | --- | --- | --- | --- |
| Respiratory (0-3 points) | | | | |
| | Pao2:Fio2 ≥400 or Spo2:Fio2 ≥292 [b] | Pao2:Fio2 <400 and any respiratory support [c] or Spo2:Fio2 <292 and any respiratory support [c] | Pao2:Fio2 100-200 and IMV or Spo2:Fio2 148-220 and IMV | Pao2:Fio2 <100 and IMV or Spo2:Fio2 <148 and IMV |
| Cardiovascular (0-6 points) | | | | |
| | | 1 point each (up to 3) for: | 2 points each (up to 6) for: | |
| | No vasoactive medications [d] | 1 Vasoactive medication [d] | ≥2 Vasoactive medications [d] | |
| | Lactate <5 mmol/L [e] | Lactate 5-10.9 mmol/L [e] | Lactate ≥11 mmol/L [e] | |
| Mean arterial pressure by age, mm Hg [f,g] | | | | |
| <1 mo | >30 | 17-30 | <17 | |
| 1 to 11 mo | >38 | 25-38 | <25 | |
| 1 to <2 y | >43 | 31-43 | <31 | |
| 2 to <5 y | >44 | 32-44 | <32 | |
| 5 to <12 y | >48 | 36-48 | <36 | |
| 12 to 17 y | >51 | 38-51 | <38 | |
| Coagulation (0-2 points) [h] | | | | |
| | | 1 point each (maximum of 2 points) for: | | |
| | Platelets ≥100 × 10³/μL | Platelets <100 × 10³/μL | | |
| | International normalized ratio ≤1.3 | International normalized ratio >1.3 | | |
| | D-dimer ≤2 mg/L FEU | D-dimer >2 mg/L FEU | | |
| | Fibrinogen ≥100 mg/dL | Fibrinogen <100 mg/dL | | |
| Neurologic (0-2 points) [i] | | | | |
| | Glasgow Coma Scale score >10 [j]; pupils reactive | Glasgow Coma Scale score ≤10 [j] | Fixed pupils bilaterally | |

Abbreviations: FEU, fibrinogen equivalent units; Fio2, fraction of inspired oxygen; IMV, invasive mechanical ventilation; Spo2, pulse oximetry oxygen saturation.

[a] The Phoenix Sepsis Score may be calculated in the absence of some variables (eg, even if lactate level is not measured and vasoactive medications are not used, a cardiovascular score can still be ascertained using blood pressure). It is expected that laboratory tests and other measurements will be obtained at the discretion of a medical team based on clinical judgment. Unmeasured variables contribute no points to the score.

[b] Calculated only when Spo2 is ≤97%.

[c] Respiratory dysfunction of 1 point can be assessed in any patient receiving oxygen, high-flow, noninvasive positive pressure, or IMV respiratory support, and includes Pao2:Fio2 <200 and Spo2:Fio2 <220 in children who are not receiving IMV.

[d] Vasoactive medications include any dose of epinephrine, norepinephrine, dopamine, dobutamine, milrinone, and/or vasopressin (for shock).

[e] Lactate can be arterial or venous. Lactate reference range is 0.5-2.2 mmol/L.

[f] Use measured mean arterial pressure preferentially (invasive arterial if available or noninvasive oscillometric), and if measured mean arterial pressure is not available, a calculated mean arterial pressure (⅓ × systolic + ⅔ × diastolic) may be used as an alternative.

[g] Age is not adjusted for prematurity, and the criteria do not apply to birth hospitalizations, children with postconceptional age <37 weeks, or those aged ≥18 years.

[h] Coagulation variable reference ranges: platelets, 150-450 × 10³/μL; D-dimer, <0.5 mg/L FEU; fibrinogen, 180-410 mg/dL. International normalized ratio reference range is based on local reference prothrombin time.

[i] The neurologic dysfunction subscore was pragmatically validated in both sedated and nonsedated patients and those with and without IMV support.

[j] The Glasgow Coma Scale score measures level of consciousness based on verbal, eye, and motor response and ranges from 3 to 15, with a higher score indicating better neurologic function.
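To make the point assignments in Table 2 concrete, the following is a minimal sketch of a calculator for the 4 subscores. It is an illustration, not the task force's reference implementation: all function and parameter names are hypothetical, Fio2 is assumed to be a fraction (0.21-1.0), and treating the upper bounds of the 2-point respiratory ranges (200 and 220) as inclusive is an assumption based on a literal reading of the table. Unmeasured inputs are passed as `None` and contribute no points, following the first table footnote.

```python
# (upper age bound in months, MAP below which 2 points, MAP above which 0 points)
MAP_BANDS = [(1, 17, 30), (12, 25, 38), (24, 31, 43),
             (60, 32, 44), (144, 36, 48), (216, 38, 51)]


def respiratory(pao2=None, spo2=None, fio2=None, support=False, imv=False):
    """0-3 points. spo2 is a percentage; fio2 is a fraction (assumption)."""
    support = support or imv  # IMV is itself a form of respiratory support
    pf = pao2 / fio2 if pao2 is not None and fio2 else None
    # The Spo2:Fio2 ratio is used only when Spo2 is <=97%.
    sf = spo2 / fio2 if spo2 is not None and fio2 and spo2 <= 97 else None

    def level(ratio, severe, moderate, mild):
        if ratio is None:
            return 0
        if imv and ratio < severe:
            return 3
        if imv and ratio <= moderate:  # inclusive bound is an assumption
            return 2
        if support and ratio < mild:
            return 1
        return 0

    return max(level(pf, 100, 200, 400), level(sf, 148, 220, 292))


def mean_arterial_pressure(systolic, diastolic):
    """Calculated MAP, for use when a measured MAP is not available."""
    return systolic / 3 + 2 * diastolic / 3


def cardiovascular(age_months, vasoactives=0, lactate=None, map_mmhg=None):
    """0-6 points: up to 2 each for vasoactive count, lactate, and MAP for age."""
    if not 0 <= age_months < 216:
        raise ValueError("criteria apply only to children younger than 18 years")
    points = min(vasoactives, 2)
    if lactate is not None:
        points += 2 if lactate >= 11 else 1 if lactate >= 5 else 0
    if map_mmhg is not None:
        for upper_age, low, high in MAP_BANDS:
            if age_months < upper_age:
                points += 2 if map_mmhg < low else 1 if map_mmhg <= high else 0
                break
    return points


def coagulation(platelets=None, inr=None, d_dimer=None, fibrinogen=None):
    """0-2 points: 1 per abnormal value (units as in Table 2), capped at 2."""
    flags = [
        platelets is not None and platelets < 100,
        inr is not None and inr > 1.3,
        d_dimer is not None and d_dimer > 2,
        fibrinogen is not None and fibrinogen < 100,
    ]
    return min(sum(flags), 2)


def neurologic(gcs=None, pupils_fixed_bilaterally=False):
    """0-2 points."""
    if pupils_fixed_bilaterally:
        return 2
    return 1 if gcs is not None and gcs <= 10 else 0


# Hypothetical 30-month-old receiving IMV and 1 vasoactive medication.
total = (respiratory(spo2=92, fio2=0.5, imv=True)
         + cardiovascular(age_months=30, vasoactives=1, lactate=6.0, map_mmhg=40)
         + coagulation(platelets=80, inr=1.1)
         + neurologic(gcs=12))
print(total)  # 2 + 3 + 1 + 0 = 6
```

Keeping each organ system in its own function mirrors the table layout and makes it straightforward to report the cardiovascular subscore separately, which the septic shock criterion requires.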

Figure 1. In-Hospital Mortality Associated With the Phoenix Sepsis Score in Patients in Higher-Resource Settings With Suspected Infection in the First 24 Hours.

This figure shows calibration of the Phoenix Sepsis Score in higher-resource settings (sites with more technological resources, eg, laboratory equipment, ventilators, and kidney replacement therapy devices, to support organ dysfunction). For patients with suspected infection who have each possible integer value of the Phoenix Sepsis Score in the first 24 hours of the encounter, mortality among those at the development, internal validation, and external validation sites is shown. Binomial confidence intervals (whiskers) for the mortality point estimate in each group are also shown.

Figure 2. In-Hospital Mortality Associated With the Phoenix Sepsis Score in Patients in Lower-Resource Settings With Suspected Infection in the First 24 Hours.

This figure shows the calibration of the Phoenix Sepsis Score in lower-resource settings (sites with fewer technological resources to support organ dysfunction). For patients with suspected infection who have each possible integer value of the Phoenix Sepsis Score in the first 24 hours of the encounter, mortality among those at the development, internal validation, and external validation sites is shown. Binomial confidence intervals (whiskers) for the mortality point estimate in each group are also shown. At lower-resource sites, some variables were rarely available (eg, D-dimer and fibrinogen for coagulation dysfunction), even when other variables for the same organ systems were recorded (eg, platelet count and international normalized ratio); thus, the maximum cumulative score achieved at lower-resource sites was 9, instead of the maximum possible of 13.

Figure 3. Mortality Prediction Performance of the Phoenix Sepsis Score and Organ Dysfunction Scores.

IPSCC indicates International Pediatric Sepsis Consensus Conference; PELOD-2, Pediatric Logistic Organ Dysfunction version 2; PODIUM, Pediatric Organ Dysfunction Information Update Mandate; and pSOFA, pediatric Sequential Organ Failure Assessment. This figure compares the performance of the Phoenix Sepsis Score with validated pediatric organ dysfunction scores and criteria to predict mortality in patients with suspected infection in the first 24 hours. Equivalent performance metrics for the secondary outcome, early death or extracorporeal membrane oxygenation, are shown in eFigure 7 in Supplement 1. All types of organ dysfunction are evaluated across their respective full ranges, with higher scores indicating more organ dysfunction burden. The scores for IPSCC, Proulx, and PODIUM are based on the counts of organ dysfunction (eAppendix 1 and eTable 2 in Supplement 1). Performance is presented both quantitatively, with 95% CIs (calculated using a logit transform), and visually, using a color heat map. Shading indicates highest (darkest) to lowest (lightest) in each row. The AUPRC is the area under a curve drawn with sensitivity (also referred to as “recall”) and positive predictive value (also referred to as “precision”) across all potential thresholds for the points in the scores. The AUPRC is a more reliable classifier performance metric than the AUROC when the classes are imbalanced, for example, when mortality is very low, as in this study. The AUROC is the area under a curve drawn with the false-positive rate on the x-axis and the true-positive rate on the y-axis. In this study, it is an indicator of how well a classifier can rank encounters with respect to mortality risk.
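The two metrics described in this legend can be computed for an integer score with a few lines of code. The sketch below uses average precision (a step-wise sum over thresholds) as one common estimator of the AUPRC and the pairwise-ranking formulation of the AUROC; the estimators actually used in the study are described in eAppendix 1 in Supplement 1 and may differ. Function names and the toy data are hypothetical.

```python
def average_precision(scores, died):
    """Step-wise estimate of the area under the precision recall curve.

    At each threshold t, encounters with score >= t are flagged; precision
    (PPV) is weighted by the gain in recall (sensitivity) at that threshold.
    """
    total_deaths = sum(died)
    area, previous_recall = 0.0, 0.0
    for threshold in sorted(set(scores), reverse=True):
        flagged = [d for s, d in zip(scores, died) if s >= threshold]
        true_positives = sum(flagged)
        precision = true_positives / len(flagged)
        recall = true_positives / total_deaths
        area += (recall - previous_recall) * precision
        previous_recall = recall
    return area


def auroc(scores, died):
    """Probability that a nonsurvivor scores higher than a survivor
    (ties count one-half), which equals the area under the ROC curve."""
    nonsurvivors = [s for s, d in zip(scores, died) if d]
    survivors = [s for s, d in zip(scores, died) if not d]
    wins = sum((p > n) + 0.5 * (p == n) for p in nonsurvivors for n in survivors)
    return wins / (len(nonsurvivors) * len(survivors))


# Toy data: 6 encounters, 2 deaths (1 = died, 0 = survived).
toy_scores = [0, 0, 1, 2, 3, 5]
toy_died = [0, 0, 0, 1, 0, 1]
print(round(average_precision(toy_scores, toy_died), 3))  # 0.833
print(auroc(toy_scores, toy_died))  # 0.875
```

The pairwise form of the AUROC makes its ranking interpretation explicit, whereas the precision recall calculation depends directly on how many flagged encounters are deaths and is therefore sensitive to the baseline mortality.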

From the Phoenix Sepsis Score to the Criteria for Pediatric Sepsis and Septic Shock

The task force chose a Phoenix Sepsis Score of 2 or greater in patients with suspected infection as the new sepsis criteria, and sepsis with 1 or more cardiovascular points as criteria for septic shock. In the development set, children with sepsis in the first 24 hours had 7.1% mortality at the higher-resource sites and 28.5% mortality at the lower-resource sites. Children with sepsis in both higher- and lower-resource settings had a median Phoenix Sepsis Score of 3 points (IQR, 2-4). Children with septic shock in the first 24 hours had 10.8% mortality at the higher-resource sites and 33.5% mortality at the lower-resource sites. The novel criteria had higher PPV and sensitivity that was comparable with or higher than the IPSCC sepsis, severe sepsis, and septic shock criteria across all settings and using the secondary outcome of early death or ECMO (Figure 4; eFigure 10 and eTable 7 in Supplement 1). For example, for the primary outcome of death at the higher-resource sites, the Phoenix sepsis criteria had a PPV of 5.3% to 7.1% (with a baseline mortality of 0.6% to 0.7%) and a sensitivity of 69.2% to 84.4% compared with the IPSCC severe sepsis criteria, which had a PPV of 3.6% to 4.8% and a sensitivity of 58.7% to 70.7%, in the development and external validation sets, respectively. In the derivation and internal validation set of the lower-resource site that had complete data for assessment of the criteria, the Phoenix sepsis criteria had a PPV of 22.2% (baseline mortality rate of 4.1%) and a sensitivity of 81.2% compared with the IPSCC severe sepsis criteria, which had a PPV of 12.7% and a sensitivity of 49.2%.
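The binary criteria and their two primary performance measures can be sketched as follows. The function names and the example cohort are hypothetical; the thresholds (score of 2 or greater for sepsis, plus 1 or more cardiovascular points for septic shock) are those stated above.

```python
def classify(suspected_infection: bool, total_score: int,
             cardiovascular_points: int) -> str:
    """Apply the Phoenix thresholds to one encounter."""
    if not suspected_infection or total_score < 2:
        return "criteria not met"
    # Septic shock is sepsis with at least 1 cardiovascular point.
    return "septic shock" if cardiovascular_points >= 1 else "sepsis"


def ppv_and_sensitivity(flagged, died):
    """PPV = deaths among flagged / flagged; sensitivity = deaths among
    flagged / all deaths. Inputs are parallel sequences of booleans."""
    true_positives = sum(1 for f, d in zip(flagged, died) if f and d)
    n_flagged = sum(1 for f in flagged if f)
    n_died = sum(1 for d in died if d)
    if n_flagged == 0 or n_died == 0:
        raise ValueError("need at least 1 flagged encounter and 1 death")
    return true_positives / n_flagged, true_positives / n_died


print(classify(True, 3, 0))   # sepsis
print(classify(True, 4, 2))   # septic shock
print(classify(False, 5, 3))  # criteria not met

# Hypothetical cohort of 10 encounters: 4 flagged, 2 deaths, 1 death flagged.
flagged = [True, True, True, True] + [False] * 6
died = [True, False, False, False, True] + [False] * 5
print(ppv_and_sensitivity(flagged, died))  # (0.25, 0.5)
```

Because PPV is the mortality among encounters meeting the criteria, it should be read against the baseline mortality of the same cohort, as done in the comparisons above.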

Figure 4. Comparison of Sensitivity and PPV of Novel Phoenix Sepsis Criteria With Current IPSCC Sepsis and Severe Sepsis Criteria Across Outcomes and Patient Subgroups in the Internal Validation Sets.

The positive predictive value (PPV, or precision) and sensitivity for the Phoenix vs 2005 International Pediatric Sepsis Consensus Conference (IPSCC) criteria for sepsis in children with suspected infection are shown. The Phoenix sepsis criteria are based on achieving ≥2 points in the Phoenix Sepsis Score among patients with suspected infection in the first 24 hours of an encounter. The IPSCC sepsis and severe sepsis criteria are based on systemic inflammatory response syndrome (SIRS) and IPSCC-based organ dysfunction among patients with suspected infection in the first 24 hours of an encounter. Baseline rates of the outcome in each group (death, or early death or extracorporeal membrane oxygenation [ECMO]) are shown as horizontal dashed lines. 95% CIs are shown as bands from each point in the plane representing that component (eg, CIs for PPV are parallel to the y-axis). Confidence bands that are not visible are narrow enough to be completely hidden by the point. These figures are similar to area under the precision recall curves except at a single threshold for criteria that generate a binary response (eg, yes/no sepsis criteria met) instead of across the range of possible points in the curve (eg, 0-13 points in the Phoenix Sepsis Score; see Figure 3). Better-performing criteria are closer to the top right corner. A trade-off exists between sensitivity and PPV, with more sensitive criteria usually having lower PPV and more specific criteria usually having higher PPV and lower sensitivity. Criteria that are close to the baseline outcome rate have poor predictive value.

a At lower-resource site 2, some Phoenix Sepsis Score and IPSCC data inputs (eg, invasive mechanical ventilation, Glasgow Coma Scale score) are not recorded even when they are performed; thus, assessment of criteria performance is limited. Lower-resource site 1 and all higher-resource sites have inputs for all relevant organ systems in the criteria. Comparison of sepsis criteria in the external validation sites is shown in eFigure 10 in Supplement 1 with similar results. Diagnostic performance measures for this comparison are shown in eTable 7 in Supplement 1.
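To make the two axes of this figure concrete, the sketch below computes sensitivity, PPV, and the baseline outcome rate for a binary criterion from a 2x2 table. The counts are hypothetical and are not taken from the study; the function name `binary_criterion_metrics` is likewise an assumption for illustration.

```python
def binary_criterion_metrics(tp, fp, fn, tn):
    """Sensitivity, PPV, and baseline outcome rate from a 2x2 table."""
    total = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),      # share of outcomes flagged by the criterion
        "ppv": tp / (tp + fp),              # share of flagged encounters with the outcome
        "baseline_rate": (tp + fn) / total  # outcome rate in the whole group
    }


# Hypothetical cohort: 100,000 encounters, 700 deaths, 8,000 encounters
# meeting the criterion, of which 560 ended in death.
metrics = binary_criterion_metrics(tp=560, fp=7440, fn=140, tn=91860)
```

In this made-up example the criterion has a sensitivity of 80% and a PPV of 7%, which is 10 times the baseline rate of 0.7%. A criterion whose PPV sat near the baseline rate would add little predictive value.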

Per request of the task force, the concept of organ dysfunction remote to the site of infection was implemented by requiring that those with respiratory or neurologic dysfunction also had 1 or more points in a different organ system. Patients with sepsis who had remote organ dysfunction accounted for 85.2% of sepsis cases and had higher mortality than the whole sepsis cohort: 8% in higher-resource sites and 32.3% in lower-resource sites (eFigure 11 in Supplement 1).
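One possible reading of this rule is sketched below. It is an interpretation, not the study's implementation: the function name, the dictionary keys, and the handling of patients whose points come only from cardiovascular or coagulation dysfunction (treated here as meeting the requirement, because the stated condition applies only to respiratory or neurologic dysfunction) are assumptions. The input is assumed to describe a patient who already meets the Phoenix sepsis criteria.

```python
def has_remote_organ_dysfunction(points):
    """points maps organ system name -> Phoenix points for a patient with sepsis."""
    systems_with_points = {name for name, value in points.items() if value > 0}
    if not systems_with_points:
        return False
    if systems_with_points & {"respiratory", "neurologic"}:
        # Respiratory or neurologic dysfunction must be accompanied by
        # 1 or more points in a different organ system.
        return len(systems_with_points) >= 2
    # Assumption: points only in cardiovascular and/or coagulation systems
    # are not subject to the additional requirement.
    return True


respiratory_only = {"respiratory": 2, "cardiovascular": 0, "coagulation": 0, "neurologic": 0}
respiratory_plus_cv = {"respiratory": 1, "cardiovascular": 1, "coagulation": 0, "neurologic": 0}
```

Under this reading, the first hypothetical patient has sepsis without remote organ dysfunction and the second has sepsis with remote organ dysfunction.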

Sensitivity Analyses

Performance of the pediatric sepsis criteria was consistent across age groups, with higher sepsis incidence and mortality in younger age groups, as expected (eTable 8 in Supplement 1). Similarly, the performance was consistent in patients with no known prior comorbidities, those admitted to the ICU, and after excluding patients who underwent surgery (eTable 8 in Supplement 1).

Clinical vignettes for children presenting with sepsis and septic shock and their corresponding Phoenix Sepsis Score data are provided in eAppendix 2 in Supplement 1.

Discussion

New criteria for pediatric sepsis and septic shock were derived and validated by developing and curating a clinical database with more than 3.6 million pediatric hospital encounters at 10 sites in 5 countries. The development data set was built using structured EHR data from an international cohort that was geographically and racially diverse and had widely varying resources, a major strength of this study. A prespecified data-driven approach was used to determine the best-performing organ dysfunction measures in children with suspected infection. An interpretable machine learning approach was used to develop a composite model that was the basis for the new Phoenix Sepsis Score and the new criteria. The new Phoenix criteria for pediatric sepsis and septic shock had higher PPV and comparable or higher sensitivity than the IPSCC criteria for predicting mortality across differently resourced settings. These findings were consistent in multiple sensitivity analyses that included age, absence of prior comorbidities, ICU admission, and surgery.

Comparison With the Adult Sepsis-3 Criteria

The approach used in this study had both similarities with and differences from the derivation of the adult Sepsis-3 criteria.4 Similar to Sepsis-3, the definition of sepsis was implemented as the combination of suspected infection with life-threatening organ dysfunction. Also, existing organ dysfunction scores and a large EHR database were used to develop the new criteria and in-hospital mortality was the primary outcome. However, there were also several important differences. First, instead of using existing complete organ dysfunction scores (eg, the SOFA score) to derive the new criteria, the best-performing individual organ measures of existing scores were used to develop a novel composite score using stacked regression. Additionally, a database was built that included a geographically and demographically diverse population of children from both higher- and lower-resource settings to maximize generalizability. Furthermore, the performance of the individual organ dysfunction measures, the stacked models, and the Phoenix Sepsis Score were primarily evaluated using the AUPRC, instead of the AUROC, with the goal of maximizing the PPV and sensitivity of the final criteria. The AUPRC is considered a better measure of classification performance for rare events (in this case, deaths) compared with the AUROC, which can have inflated performance when the proportions of events (deaths) and nonevents (survivors) are imbalanced,11,20 an issue that is particularly relevant in children with infections given their lower mortality compared with adults. Finally, this analysis focused on diagnosis of sepsis within the first 24 hours of presentation to a hospital setting, when the majority of pediatric sepsis is diagnosed.21
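The difference between the AUROC and the AUPRC under class imbalance can be shown with a small, self-contained sketch. The scores and labels below are synthetic and hypothetical, and the helper functions are plain-Python illustrations (the AUPRC is approximated by average precision), not the software used in the study. The same scores are evaluated twice, once with the original set of survivors and once with the survivors replicated 10 times, which lowers the event rate without changing how well the score ranks deaths above survivors.

```python
def auroc(scores, labels):
    """Probability that a random event outranks a random nonevent (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(scores, labels):
    """Step-wise area under the precision-recall curve over distinct thresholds."""
    n_pos = sum(labels)
    ap, tp, fp, prev_recall = 0.0, 0, 0, 0.0
    for threshold in sorted(set(scores), reverse=True):
        for s, y in zip(scores, labels):
            if s == threshold:
                tp += y
                fp += 1 - y
        recall = tp / n_pos
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


# Synthetic integer scores: 4 deaths and 6 survivors.
death_scores = [2, 3, 4, 5]
survivor_scores = [0, 0, 1, 1, 2, 3]

scores_a = death_scores + survivor_scores
labels_a = [1] * len(death_scores) + [0] * len(survivor_scores)

# Same scores, but with 10 times as many survivors (a rarer outcome).
scores_b = death_scores + survivor_scores * 10
labels_b = [1] * len(death_scores) + [0] * (len(survivor_scores) * 10)

auroc_a, auroc_b = auroc(scores_a, labels_a), auroc(scores_b, labels_b)
ap_a, ap_b = average_precision(scores_a, labels_a), average_precision(scores_b, labels_b)
```

In this toy example the AUROC is identical in both cohorts, whereas the average precision falls when the outcome becomes rarer, because each threshold now admits more false positives for the same number of true positives.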

Leveraging Digital Technology to Develop and Implement the Phoenix Sepsis Score

This approach to the development of the Phoenix Sepsis Score and the criteria for sepsis and septic shock is a reflection of the growing digitization of health care globally.22 Most of the vital signs, laboratory tests, and interventions included in the Phoenix Sepsis Score are routinely collected in most lower-resource settings and in nearly all higher-resource settings, according to the Pediatric Sepsis Definition Task Force’s international survey.23 Even in settings where not all variables are available, the Phoenix Sepsis Score is designed to accurately identify children with sepsis. The score functions when not all variables are available because of its redundancy. Because the score has a possible range of 0 to 13 points, there are several ways to achieve the threshold of 2 points for sepsis diagnosis, as evidenced by the fact that patients with sepsis in both higher- and lower-resource settings had a median Phoenix Sepsis Score of 3 points. This feature was primarily assessed in the data sets from lower-resource settings. For example, although platelets were commonly measured at most sites, coagulation tests (eg, D-dimer and fibrinogen) were less frequently available. At lower-resource site 1, where platelet count was routinely measured but coagulation factors such as D-dimer and fibrinogen were not, the Phoenix Sepsis Score had excellent performance and the Phoenix sepsis criteria had higher sensitivity and PPV than the IPSCC sepsis and severe sepsis criteria. This makes the score and criteria readily translatable into EHR and other digital tools, such as web-based and mobile applications across differently resourced settings, even when some of the variables are not routinely collected.24 Furthermore, digital implementation of the Phoenix Sepsis Score can enable longitudinal monitoring and provide clinicians and researchers with a tool to stratify severity of sepsis.
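The redundancy argument can be illustrated by counting how many combinations of organ system points reach the 2-point threshold. The sketch below is illustrative only. It assumes the four Phoenix organ systems with maximum subscores of 3 (respiratory), 6 (cardiovascular), 2 (coagulation), and 2 (neurologic), which sum to the 13-point range stated above, and it assumes that an unavailable subscore contributes 0 points; both the helper names and the treatment of missing inputs are assumptions made for this example.

```python
from itertools import product

# Assumed maximum points per organ system (sum = 13).
MAX_POINTS = {"respiratory": 3, "cardiovascular": 6, "coagulation": 2, "neurologic": 2}
SEPSIS_THRESHOLD = 2


def total_score(subscores):
    """Sum the subscores, counting a missing (None) subscore as 0 points."""
    return sum(value or 0 for value in subscores.values())


def ways_to_reach_threshold(available_systems):
    """Count subscore combinations that reach the threshold using only the listed systems."""
    ranges = [range(MAX_POINTS[name] + 1) for name in available_systems]
    return sum(1 for combo in product(*ranges) if sum(combo) >= SEPSIS_THRESHOLD)


all_systems = ways_to_reach_threshold(["respiratory", "cardiovascular", "coagulation", "neurologic"])
no_coagulation = ways_to_reach_threshold(["respiratory", "cardiovascular", "neurologic"])

# Hypothetical patient at a site where the coagulation inputs are unavailable.
example_total = total_score({"respiratory": 1, "cardiovascular": 1, "coagulation": None, "neurologic": 0})
```

Under these assumptions, many subscore combinations still reach the threshold when one organ system cannot be scored, which is consistent with the score remaining usable where some inputs are not collected.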

Additional considerations for the implementation and use of the Phoenix Sepsis Score and the novel criteria are discussed in the accompanying consensus criteria article.6

Limitations

This study has several limitations. Retrospective data obtained from EHRs may have missing data and data entry errors. In this study, a robust quality assurance and harmonization process was developed and best practices were used to address outliers and missing data. However, not all errors or missing data can be reconciled. For example, at lower-resource site 2 in the development data set, which represents a lower- to middle-income country, respiratory support (eg, mechanical ventilation, Fio2) and neurologic assessments (eg, level of consciousness and pupillary reaction) are performed but not recorded in the clinical information systems. This reduces the ability to assess the score and criteria at that site. In contrast, score performance was excellent at lower-resource site 1 and comparable with the higher-resource sites. This demonstrates the potential for score performance in lower-resource environments when these variables are recorded. Second, when deriving the stacked regression models, the Phoenix Sepsis Score, and the new criteria for sepsis and septic shock, a pragmatic approach was intentionally chosen, using the data as recorded during routine care as an indicator of how the criteria would perform in real-world implementations. However, it is acknowledged that some of the organ dysfunction measures used in the modeling process may not have reflected actual organ dysfunction, but rather were due to iatrogenic effects or clinician therapeutic choices, such as a lower Glasgow Coma Scale score in a patient receiving sedation or initiation of vasoactive medications in a patient with minimal cardiovascular dysfunction. Future work to determine the effects of these variables and clinician choices on the performance of the criteria is needed. 
Third, similar to the Sepsis-3 validation study, unique criteria for patients with chronic organ dysfunction were not developed.4 Fourth, few databases from lower-resource settings were available (a form of data poverty),25 and the ones used may not be generalizable to every low-resource environment. Fifth, the data from higher-resource settings were exclusively from tertiary US pediatric centers. Sixth, the data sets from some of the sites included 10 years of data, possibly including changes in practice during that time frame.

Conclusions

The novel Phoenix sepsis criteria, which were derived and validated using a large international database of pediatric hospital encounters in higher- and lower-resource settings, had improved performance for the diagnosis of pediatric sepsis and septic shock compared with the existing IPSCC criteria.

Educational Objective: To identify the key insights or developments described in this article.

  1. What was the primary outcome used in the development and validation of this clinical prediction tool?

    1. A combination of intensive care unit admission, intubation, vasopressor support, extracorporeal membrane oxygenation, or death

    2. Agreement by at least 2 of 3 pediatric intensivists that sepsis was present on postevent chart review

    3. In-hospital mortality

  2. The authors used stacked regression modeling for the derivation of this new clinical prediction tool. Why did they choose this approach?

    1. Stacked regression allows many models to be used simultaneously, leveraging predictive power while maintaining a high degree of interpretability.

    2. Stacked regression typically yields integer estimates of risk, permitting easy and obvious application in clinical settings.

    3. The large size of this database overwhelmed alternative forms of machine learning, forcing selection of the less computationally intensive stacked regression.

  3. According to the authors, how did this development of pediatric sepsis criteria differ from the derivation of the adult Sepsis-3 criteria?

    1. Existing organ dysfunction scores and a large electronic health record database were used to develop the new criteria.

    2. The database included a geographically and demographically diverse population from both higher- and lower-resource settings.

    3. The implemented definition of sepsis combined suspected infection with life-threatening organ dysfunction.

Supplement 1.

eAppendix 1. Supplemental methods

eTable 1. Site Characteristics

eTable 2. Organ dysfunction scores and criteria used in the study

eFigure 1. Conceptual illustration of how stacked regression was used to develop the sepsis criteria

eFigure 2. Pipeline for data harmonization, data quality, and data analysis (A), and CONSORT-style flow diagram for encounters in the pipeline and the various analyses (B)

eFigure 3A-H. Subscore input availability and missingness among patients with suspected infection in higher resource settings

eFigure 4. Performance of the individual subscores for each organ system based on AUPRC and AUROC to predict mortality

eTable 3. Cohort characteristics of the development set stratified by infection status

eTable 4. Cohort characteristics of the development set stratified by infection status and site

eTable 5. Cohort characteristics of the external validation set stratified by infection status and site

eTable 6. Stacked regression coefficients of the 8-organ system ridge regression model and the 4-organ system LASSO model

eFigure 5. In-hospital mortality associated with the Phoenix Sepsis Score in patients with suspected infection in the first 24 hours at higher resource site 6 (the geographic external validation set)

eFigure 6A-J. AUPRC and AUROC curves for the four-organ system model

eFigure 7. Performance of the Phoenix Sepsis Score and organ dysfunction scores to predict early death or extracorporeal membrane oxygenation

eFigure 8A-B. Performance of the Phoenix Sepsis Score and other sepsis scores (A) and other organ dysfunction scores (B) to predict mortality across all thresholds

eFigure 9. The Phoenix-8 organ dysfunction score

eFigure 10. Sensitivity and Positive Predictive Value of the Phoenix and IPSCC criteria across outcomes and patient subgroups in the external validation sets

eTable 7. Diagnostic performance measures of the sepsis and septic shock criteria in the development set

eTable 8. Diagnostic performance measures of the Phoenix sepsis criteria across sensitivity analyses in the development set

eFigure 11A-B. Venn diagram of sepsis with remote organ dysfunction in the development set

eAppendix 2. Clinical vignettes with calculation of the Phoenix Sepsis Score and the Phoenix Sepsis Criteria

eReferences

jama-e240196-s001.pdf (4.6MB, pdf)

Supplement 2.

Data Sharing Statement

jama-e240196-s002.pdf (95.2KB, pdf)

References

  1. Rudd KE, Johnson SC, Agesa KM, et al. Global, regional, and national sepsis incidence and mortality, 1990-2017. Lancet. 2020;395(10219):200-211. doi: 10.1016/S0140-6736(19)32989-7
  2. Goldstein B, Giroir B, Randolph A, et al. International pediatric sepsis consensus conference: definitions for sepsis and organ dysfunction in pediatrics. Pediatr Crit Care Med. 2005;6(1):2-8. doi: 10.1097/01.PCC.0000149131.72248.E6
  3. Weiss SL, Fitzgerald JC, Maffei FA, et al. Discordant identification of pediatric severe sepsis by research and clinical definitions in the SPROUT international point prevalence study. Crit Care. 2015;19(1):325. doi: 10.1186/s13054-015-1055-x
  4. Seymour CW, Liu VX, Iwashyna TJ, et al. Assessment of clinical criteria for sepsis: for the Third International Consensus Definitions for Sepsis and Septic Shock (Sepsis-3). JAMA. 2016;315(8):762-774. doi: 10.1001/jama.2016.0288
  5. Singer M, Deutschman CS, Seymour CW, et al. The Third International Consensus Definitions for Sepsis and Septic Shock (Sepsis-3). JAMA. 2016;315(8):801-810. doi: 10.1001/jama.2016.0287
  6. Schlapbach LJ, Watson RS, Sorce LR, et al; Society of Critical Care Medicine Pediatric Sepsis Definition Task Force. International consensus criteria for pediatric sepsis and septic shock. JAMA. Published online January 21, 2024. doi: 10.1001/jama.2024.0179
  7. Zeng X, Yu G, Lu Y, et al. PIC, a paediatric-specific intensive care database. Sci Data. 2020;7(1):14. doi: 10.1038/s41597-020-0355-4
  8. Feinstein JA, Russell S, DeWitt PE, et al. R package for pediatric complex chronic condition classification. JAMA Pediatr. 2018;172(6):596-598. doi: 10.1001/jamapediatrics.2018.0256
  9. World Health Organization. Nutrition for Health and Development. WHO Child Growth Standards: Growth Velocity Based on Weight, Length and Head Circumference: Methods and Development. World Health Organization; 2009.
  10. Ozenne B, Subtil F, Maucort-Boulch D. The precision-recall curve overcame the optimism of the receiver operating characteristic curve in rare diseases. J Clin Epidemiol. 2015;68(8):855-859. doi: 10.1016/j.jclinepi.2015.02.010
  11. Saito T, Rehmsmeier M. The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLoS One. 2015;10(3):e0118432. doi: 10.1371/journal.pone.0118432
  12. Wolpert DH. Stacked generalization. Neural Netw. 1992;5(2):241-259. doi: 10.1016/S0893-6080(05)80023-1
  13. Breiman L. Stacked regressions. Mach Learn. 1996;24(1):49-64. doi: 10.1007/BF00117832
  14. Proulx F, Fayon M, Farrell CA, et al. Epidemiology of sepsis and multiple organ dysfunction syndrome in children. Chest. 1996;109(4):1033-1037. doi: 10.1378/chest.109.4.1033
  15. Matics TJ, Sanchez-Pinto LN. Adaptation and validation of a pediatric Sequential Organ Failure Assessment score and evaluation of the Sepsis-3 definitions in critically ill children. JAMA Pediatr. 2017;171(10):e172352. doi: 10.1001/jamapediatrics.2017.2352
  16. Leteurtre S, Duhamel A, Salleron J, et al. PELOD-2: an update of the Pediatric Logistic Organ Dysfunction score. Crit Care Med. 2013;41(7):1761-1773. doi: 10.1097/CCM.0b013e31828a2bbd
  17. Rousseaux J, Grandbastien B, Dorkenoo A, et al. Prognostic value of shock index in children with septic shock. Pediatr Emerg Care. 2013;29(10):1055-1059. doi: 10.1097/PEC.0b013e3182a5c99c
  18. Khemani RG, Bart RD, Alonzo TA, et al. Disseminated intravascular coagulation score is associated with mortality for children with shock. Intensive Care Med. 2009;35(2):327-333. doi: 10.1007/s00134-008-1280-8
  19. Haque A, Siddiqui NR, Munir O, et al. Association between vasoactive-inotropic score and mortality in pediatric septic shock. Indian Pediatr. 2015;52(4):311-313. doi: 10.1007/s13312-015-0630-1
  20. Tharwat A. Classification assessment methods. Appl Comput Inform. 2020;17(1):168-192. doi: 10.1016/j.aci.2018.08.003
  21. Scott HF, Brilli RJ, Paul R, et al. Evaluating pediatric sepsis definitions designed for electronic health record extraction and multicenter quality improvement. Crit Care Med. 2020;48(10):e916-e926. doi: 10.1097/CCM.0000000000004505
  22. Wyber R, Vaillancourt S, Perry W, et al. Big data in global health. Bull World Health Organ. 2015;93(3):203-208. doi: 10.2471/BLT.14.139022
  23. Morin L, Hall M, de Souza D, et al. The current and future state of pediatric sepsis definitions. Pediatrics. 2022;149(6):e2021052565. doi: 10.1542/peds.2021-052565
  24. Jimenez-Zambrano A, Ritger C, Rebull M, et al. Clinical decision support tools for paediatric sepsis in resource-poor settings. BMJ Open. 2023;13(10):e074458. doi: 10.1136/bmjopen-2023-074458
  25. Ibrahim H, Liu X, Zariffa N, et al. Health data poverty: an assailable barrier to equitable digital health care. Lancet Digit Health. 2021;3(4):e260-e265. doi: 10.1016/S2589-7500(20)30317-4

Articles from JAMA are provided here courtesy of American Medical Association
