Development and Validation of the Phoenix Criteria for Pediatric Sepsis and Septic Shock

L Nelson Sanchez-Pinto; Tellen D Bennett; Peter E DeWitt; Seth Russell; Margaret N Rebull; Blake Martin; Samuel Akech; David J Albers; Elizabeth R Alpern; Fran Balamuth; Melania Bembea; Mohammod Jobayer Chisti; Idris Evans; Christopher M Horvat; Juan Camilo Jaramillo-Bustamante; Niranjan Kissoon; Kusum Menon; Halden F Scott; Scott L Weiss; Matthew O Wiens; Jerry J Zimmerman; Andrew C Argent; Lauren R Sorce; Luregn J Schlapbach; R Scott Watson; and the Society of Critical Care Medicine Pediatric Sepsis Definition Task Force; Paolo Biban; Enitan Carrol; Kathleen Chiotos; Claudio Flauzino De Oliveira; Mark W Hall; David Inwald; Paul Ishimine; Michael Levin; Rakesh Lodha; Simon Nadel; Satoshi Nakagawa; Mark J Peters; Adrienne G Randolph; Suchitra Ranjit; Daniela Carla Souza; Pierre Tissieres; James L Wynn

doi:10.1001/jama.2024.0196

JAMA. 2024 Jan 21;331(8):675–686. doi: 10.1001/jama.2024.0196

L Nelson Sanchez-Pinto ¹, Tellen D Bennett ²,✉, Peter E DeWitt ³, Seth Russell ³, Margaret N Rebull ³, Blake Martin ², Samuel Akech ⁴, David J Albers ⁵,⁶, Elizabeth R Alpern ⁷, Fran Balamuth ⁸, Melania Bembea ⁹, Mohammod Jobayer Chisti ¹⁰, Idris Evans ¹¹, Christopher M Horvat ¹¹, Juan Camilo Jaramillo-Bustamante ¹², Niranjan Kissoon ¹³, Kusum Menon ¹⁴, Halden F Scott ¹⁵, Scott L Weiss ¹⁶,¹⁷, Matthew O Wiens ¹⁸,¹⁹,²⁰, Jerry J Zimmerman ²¹, Andrew C Argent ²², Lauren R Sorce ²³, Luregn J Schlapbach ²⁴,²⁵, R Scott Watson ²⁶; and the Society of Critical Care Medicine Pediatric Sepsis Definition Task Force, Paolo Biban ²⁷, Enitan Carrol ²⁸, Kathleen Chiotos ²⁹, Claudio Flauzino De Oliveira ³⁰, Mark W Hall ³¹, David Inwald ³², Paul Ishimine ³³, Michael Levin ³⁴, Rakesh Lodha ³⁵, Simon Nadel ³⁶, Satoshi Nakagawa ³⁷, Mark J Peters ³⁸, Adrienne G Randolph ³⁹, Suchitra Ranjit ⁴⁰, Daniela Carla Souza ⁴¹, Pierre Tissieres ⁴², James L Wynn ⁴³

¹Departments of Pediatrics (Critical Care) and Preventive Medicine (Health and Biomedical Informatics), Northwestern University Feinberg School of Medicine, and Ann and Robert H. Lurie Children’s Hospital of Chicago, Chicago, Illinois

²Departments of Biomedical Informatics and Pediatrics (Critical Care Medicine), University of Colorado School of Medicine, and Children’s Hospital Colorado, Aurora

³Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora

⁴Kenya Medical Research Institute (KEMRI)–Wellcome Trust Research Programme, Nairobi, Kenya

⁵Departments of Biomedical Informatics, Bioengineering, Biostatistics, and Informatics, University of Colorado School of Medicine, Aurora

⁶Department of Biomedical Informatics, Columbia University, New York, New York

⁷Division of Emergency Medicine, Department of Pediatrics, Ann and Robert H. Lurie Children’s Hospital of Chicago, and Northwestern University Feinberg School of Medicine, Chicago, Illinois

⁸Department of Pediatrics, University of Pennsylvania, Perelman School of Medicine and Division of Emergency Medicine, Children’s Hospital of Philadelphia, Philadelphia

⁹Department of Anesthesiology and Critical Care Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland

¹⁰Intensive Care Unit, Dhaka Hospital, Nutrition Research Division, International Centre for Diarrhoeal Disease Research, Dhaka, Bangladesh

¹¹Clinical Research, Investigation, and Systems Modeling of Acute Illness (CRISMA) Center, Department of Critical Care Medicine, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania

¹²Pediatric Intensive Care Unit, Hospital General de Medellín Luz Castro de Gutiérrez and Hospital Pablo Tobón Uribe, and Red Colaborativa Pediátrica de Latinoamérica (LARed Network), Medellín, Colombia

¹³Department of Pediatrics, University of British Columbia, Vancouver, Canada

¹⁴Department of Pediatrics, Children’s Hospital of Eastern Ontario and University of Ottawa, Ottawa, Canada

¹⁵Department of Pediatrics (Pediatric Emergency Medicine), University of Colorado School of Medicine, and Children’s Hospital Colorado, Aurora

¹⁶Division of Critical Care, Department of Pediatrics, Nemours Children’s Health, Wilmington, Delaware

¹⁷Sidney Kimmel Medical College at Thomas Jefferson University, Philadelphia, Pennsylvania

¹⁸Department of Anesthesiology, Pharmacology, and Therapeutics, Faculty of Medicine, University of British Columbia, Vancouver, Canada

¹⁹Institute for Global Health, BC Children’s Hospital, Vancouver, British Columbia, Canada

²⁰Walimu, Kampala, Uganda

²¹Seattle Children’s Hospital and Department of Pediatrics, University of Washington School of Medicine, Seattle

²²Paediatrics and Child Health, University of Cape Town Faculty of Health Sciences, Cape Town, South Africa

²³Department of Pediatrics, Northwestern University Feinberg School of Medicine, and Ann and Robert H. Lurie Children’s Hospital of Chicago, Chicago, Illinois

²⁴Department of Intensive Care and Neonatology, Children’s Research Center, University Children’s Hospital Zurich, University of Zurich, Zurich, Switzerland

²⁵Child Health Research Centre, The University of Queensland, Brisbane, Australia

²⁶Department of Pediatrics, University of Washington, and Center for Child Health, Behavior, and Development and Pediatric Critical Care, Seattle Children’s Hospital, Seattle

²⁷Verona University Hospital, Verona, Italy

²⁸University of Liverpool, Liverpool, England

²⁹Children’s Hospital of Philadelphia, Philadelphia, Pennsylvania

³⁰Associação de Medicina Intensiva Brasileira, São Paulo, Brazil

³¹Nationwide Children’s Hospital, Columbus, Ohio

³²Addenbrooke’s Hospital, Cambridge University Hospital NHS Trust, Cambridge, England

³³University of California, San Diego School of Medicine, La Jolla

³⁴Imperial College London, London, England

³⁵All India Institute of Medical Sciences, New Delhi, India

³⁶St Mary’s Hospital, London, England

³⁷National Center for Child Health and Development, Tokyo, Japan

³⁸University College London Great Ormond Street Institute of Child Health, London, England

³⁹Boston Children’s Hospital, Boston, Massachusetts

⁴⁰Apollo Children’s Hospital, Chennai, India

⁴¹University Hospital of the University of São Paulo, Sao Paulo, Brazil

⁴²Hôpital de Bicêtre, Paris, France

⁴³University of Florida, Gainesville

✉ Corresponding Author: Tellen D. Bennett, MD, MS, Departments of Biomedical Informatics and Pediatrics (Critical Care Medicine), Children’s Hospital Colorado, University of Colorado School of Medicine, 1890 N Revere Ct, Mail Stop 600, Aurora, CO 80045 (tell.bennett@cuanschutz.edu).

Group Information: The members of the Society of Critical Care Medicine Pediatric Sepsis Definition Task Force appear at the end of this article.

Accepted for Publication: January 5, 2024.

Published Online: January 21, 2024. doi:10.1001/jama.2024.0196

The Society of Critical Care Medicine Pediatric Sepsis Definition Task Force Authors: Paolo Biban, MD; Enitan Carrol, MD, MBChB; Kathleen Chiotos, MD; Claudio Flauzino De Oliveira, MD, PhD; Mark W. Hall, MD; David Inwald, MB, MB BChir, PhD; Paul Ishimine, MD; Michael Levin, MD, PhD; Rakesh Lodha, MD; Simon Nadel, MBBS; Satoshi Nakagawa, MD; Mark J. Peters, PhD; Adrienne G. Randolph, MD, MS; Suchitra Ranjit, MD; Daniela Carla Souza, MD; Pierre Tissieres, MD, DSc; James L. Wynn, MD.

Affiliations of The Society of Critical Care Medicine Pediatric Sepsis Definition Task Force Authors: Verona University Hospital, Verona, Italy (Biban); University of Liverpool, Liverpool, England (Carrol); Children’s Hospital of Philadelphia, Philadelphia, Pennsylvania (Chiotos); Associação de Medicina Intensiva Brasileira, São Paulo, Brazil (Flauzino De Oliveira); Nationwide Children’s Hospital, Columbus, Ohio (Hall); Addenbrooke’s Hospital, Cambridge University Hospital NHS Trust, Cambridge, England (Inwald); University of California, San Diego School of Medicine, La Jolla (Ishimine); Imperial College London, London, England (Levin); All India Institute of Medical Sciences, New Delhi, India (Lodha); St Mary’s Hospital, London, England (Nadel); National Center for Child Health and Development, Tokyo, Japan (Nakagawa); University College London Great Ormond Street Institute of Child Health, London, England (Peters); Boston Children’s Hospital, Boston, Massachusetts (Randolph); Apollo Children’s Hospital, Chennai, India (Ranjit); University Hospital of the University of São Paulo, Sao Paulo, Brazil (Souza); Hôpital de Bicêtre, Paris, France (Tissieres); University of Florida, Gainesville (Wynn).

Author Contributions: Drs Sanchez-Pinto and Bennett had full access to all of the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis. Drs Sanchez-Pinto and Bennett contributed equally. Dr DeWitt and Mr Russell contributed equally. Drs Argent, Sorce, Schlapbach, and Watson contributed equally.

Concept and design: Sanchez-Pinto, Bennett, DeWitt, Russell, Rebull, Martin, Akech, Albers, Alpern, Bembea, Chisti, Evans, Horvat, Jaramillo-Bustamante, Kissoon, Menon, Scott, Weiss, Wiens, Zimmerman, Argent, Sorce, Schlapbach, Watson, Carrol, Tissieres.

Acquisition, analysis, or interpretation of data: Sanchez-Pinto, Bennett, DeWitt, Russell, Rebull, Martin, Akech, Albers, Alpern, Bembea, Chisti, Evans, Horvat, Jaramillo-Bustamante, Kissoon, Menon, Scott, Weiss, Wiens, Zimmerman, Argent, Sorce, Schlapbach, Watson, Carrol, Tissieres.

Drafting of the manuscript: Sanchez-Pinto, Bennett, DeWitt, Russell, Rebull, Martin, Akech, Albers, Alpern, Bembea, Chisti, Evans, Horvat, Jaramillo-Bustamante, Kissoon, Menon, Scott, Weiss, Wiens, Zimmerman, Argent, Sorce, Schlapbach, Watson, Carrol, Tissieres.

Critical review of the manuscript for important intellectual content: Sanchez-Pinto, Bennett, DeWitt, Russell, Rebull, Martin, Akech, Albers, Alpern, Bembea, Chisti, Evans, Horvat, Jaramillo-Bustamante, Kissoon, Menon, Scott, Weiss, Wiens, Zimmerman, Argent, Sorce, Schlapbach, Watson, Carrol, Tissieres.

Statistical analysis: Sanchez-Pinto, Bennett, DeWitt, Russell, Rebull, Martin, Akech, Albers, Alpern, Bembea, Chisti, Evans, Horvat, Jaramillo-Bustamante, Kissoon, Menon, Scott, Weiss, Wiens, Zimmerman, Argent, Sorce, Schlapbach, Watson, Carrol, Tissieres.

Obtained funding: Sanchez-Pinto, Bennett, DeWitt, Russell, Rebull, Martin, Akech, Albers, Alpern, Bembea, Chisti, Evans, Horvat, Jaramillo-Bustamante, Kissoon, Menon, Scott, Weiss, Wiens, Zimmerman, Argent, Sorce, Schlapbach, Watson, Carrol, Tissieres.

Administrative, technical, or material support: Sanchez-Pinto, Bennett, DeWitt, Russell, Rebull, Martin, Akech, Albers, Alpern, Bembea, Chisti, Evans, Horvat, Jaramillo-Bustamante, Kissoon, Menon, Scott, Weiss, Wiens, Zimmerman, Argent, Sorce, Schlapbach, Watson, Carrol, Tissieres.

Supervision: Sanchez-Pinto, Bennett, DeWitt, Russell, Rebull, Martin, Akech, Albers, Alpern, Bembea, Chisti, Evans, Horvat, Jaramillo-Bustamante, Kissoon, Menon, Scott, Weiss, Wiens, Zimmerman, Argent, Sorce, Schlapbach, Watson, Carrol, Tissieres.

Conflict of Interest Disclosures: Dr Sanchez-Pinto reported receipt of grants from the National Institutes of Health (NIH)/National Institute of General Medical Sciences outside the submitted work. Dr Bennett reported receipt of grants from the NIH/National Center for Advancing Translational Sciences and NIH/National Heart, Lung, and Blood Institute (NHLBI) outside the submitted work. Dr Martin reported receipt of grants from the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD) and the Thrasher Research Fund outside the submitted work. Dr Balamuth reported receipt of grants from the NIH and various federal and foundation grants to study sepsis and other infectious emergencies outside the submitted work. Dr Bembea reported receipt of grants with funds paid to institution from the NIH/National Institute of Neurological Disorders and Stroke (NINDS), the NIH/NICHD, Grifols, the Department of Defense, and the NIH/NHLBI. Dr Horvat reported grants from the NICHD and NINDS outside the submitted work. Dr Zimmerman reported receipt of grants from Immunexpress and personal fees from Elsevier outside the submitted work. Dr Schlapbach reported receipt of grants from Medical Research Future Funds, the National Health and Medical Research Council, and the Swiss Personalized Health Network outside the submitted work. Dr Randolph reported receipt of grants to the institution from the NIH and the Centers for Disease Control and Prevention; receipt of personal fees from UpToDate; receipt of travel funds for advisory board meetings from Volition Inc, Thermo Fisher, and bioMérieux; and being a scientific advisor for Inotrem outside the submitted work.
Dr Carrol reported being a specialist committee member, scientific advisory board member, and/or panel/group member for the UK National Institute for Health and Care Excellence (NICE) Diagnostics Advisory Committee, the NICE Sepsis Guideline Development Group, BioFire Diagnostics, the NICE Quality Standards Committee for Sepsis, the Surviving Sepsis Campaign Pediatrics Guideline Panel, and the Society of Critical Care Medicine Paediatric Sepsis Definition Task Force and an investigator in the UK National Institute for Health and Care Research (NIHR)–funded studies BATCH, PRONTO, and PEACH and the H2020-funded studies PERFORM and DIAMONDS. Dr Chiotos reported being a member of the Infectious Diseases Society of America Sepsis Taskforce. Dr Hall reported receipt of personal fees for serving as a data and safety monitoring board member from AbbVie and nonfinancial support from Partner Therapeutics and Sobi. Dr Inwald reported receipt of grants from the NIHR as chief investigator of the PRESSURE trial. Dr Peters reported receipt of grants from the NIHR outside the submitted work. Dr Tissieres reported receipt of grants from Baxter and personal fees from Baxter, Sanofi, Thermo Fisher, and Sedana. Dr Wynn reported receipt of personal fees from Sobi. No other disclosures were reported.

Funding/Support: This work was supported by Eunice Kennedy Shriver National Institute of Child Health and Human Development grant R01HD105939 to Drs Sanchez-Pinto and Bennett. Dr Schlapbach received support from the NOMIS Foundation. The Society of Critical Care Medicine provided support to the Pediatric Sepsis Definition Task Force for travel of members, coordination of meetings, and other logistical support.

Role of the Funder/Sponsor: The supporting entities had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; or decision to submit the manuscript for publication.

Disclaimer: Dr Sorce is an elected member of the Executive Committee and serves as president-elect of the Society of Critical Care Medicine for 2023-2024 and president for 2024-2025. The research presented is her own work and does not represent the Society of Critical Care Medicine.

Data Sharing Statement: See Supplement 2.

Additional Contributions: We thank the members of the Society of Critical Care Medicine Pediatric Sepsis Definition Task Force for their contributions to this work. In addition, Kathy Vermoch, MT(ASCP)SM, MPH, Lori Harmon, MBA, CPHQ, and Lynn Retford, CAE, at the Society of Critical Care Medicine provided invaluable committee management assistance throughout this project. Juliane Bubeck Wardenburg, MD, at Washington University in St Louis, also contributed to the task force’s work. No compensation outside of regular salaries was received.

PMCID: PMC10900964 PMID: 38245897

Key Points

Question

What are the best-performing organ dysfunction–based criteria to implement the definition of sepsis and septic shock in children with suspected infection?

Findings

In this international, multicenter, retrospective cohort study including more than 3.6 million pediatric encounters, a novel score, the Phoenix Sepsis Score, was derived and validated to predict mortality in children with suspected or confirmed infection. The new criteria for pediatric sepsis and septic shock based on the score performed better than existing organ dysfunction scores and the International Pediatric Sepsis Consensus Conference criteria.

Meaning

The new data-driven criteria for pediatric sepsis and septic shock based on measures of organ dysfunction had improved performance compared with prior pediatric sepsis criteria.

Abstract

Importance

The Society of Critical Care Medicine Pediatric Sepsis Definition Task Force sought to develop and validate new clinical criteria for pediatric sepsis and septic shock using measures of organ dysfunction through a data-driven approach.

Objective

To derive and validate novel criteria for pediatric sepsis and septic shock across differently resourced settings.

Design, Setting, and Participants

Multicenter, international, retrospective cohort study in 10 health systems in the US, Colombia, Bangladesh, China, and Kenya, 3 of which were used as external validation sites. Data were collected from emergency and inpatient encounters for children (aged <18 years) from 2010 to 2019: 3 049 699 in the development (including derivation and internal validation) set and 581 317 in the external validation set.

Exposure

Stacked regression models to predict mortality in children with suspected infection were derived and validated using the best-performing organ dysfunction subscores from 8 existing scores. The final model was then translated into an integer-based score used to establish binary criteria for sepsis and septic shock.

Main Outcomes and Measures

The primary outcome for all analyses was in-hospital mortality. Model- and integer-based score performance measures included the area under the precision recall curve (AUPRC; primary) and area under the receiver operating characteristic curve (AUROC; secondary). For binary criteria, primary performance measures were positive predictive value and sensitivity.

Results

Among the 172 984 children with suspected infection in the first 24 hours (development set; 1.2% mortality), a 4-organ-system model performed best. The integer version of that model, the Phoenix Sepsis Score, had AUPRCs of 0.23 to 0.38 (95% CI range, 0.20-0.39) and AUROCs of 0.71 to 0.92 (95% CI range, 0.70-0.92) to predict mortality in the validation sets. Using a Phoenix Sepsis Score of 2 points or higher in children with suspected infection as criteria for sepsis and sepsis plus 1 or more cardiovascular point as criteria for septic shock resulted in a higher positive predictive value and higher or similar sensitivity compared with the 2005 International Pediatric Sepsis Consensus Conference (IPSCC) criteria across differently resourced settings.

Conclusions and Relevance

The novel Phoenix sepsis criteria, which were derived and validated using data from higher- and lower-resource settings, had improved performance for the diagnosis of pediatric sepsis and septic shock compared with the existing IPSCC criteria.

This cohort study derives and validates novel criteria for diagnosis of pediatric sepsis and septic shock across high-resource and low-resource international settings.

Introduction

Pediatric sepsis is a major public health problem that causes an estimated 3.3 million deaths annually worldwide.¹ However, the current criteria to diagnose pediatric sepsis, which were published in 2005 following the International Pediatric Sepsis Consensus Conference (IPSCC), are outdated, have low specificity, do not allow for risk stratification in both lower- and higher-resource settings, and may be discordant with clinician-based diagnosis.^2,3 In 2016, the Sepsis-3 Task Force redefined adult sepsis as life-threatening organ dysfunction in the setting of infection and developed criteria using a large electronic health record (EHR) data set and a data-driven approach.^4,5 In 2019, the Society of Critical Care Medicine Pediatric Sepsis Definition Task Force was convened to update the pediatric sepsis definition and criteria. The task force adopted the conceptual definition of pediatric sepsis as suspected infection with life-threatening organ dysfunction and sought to implement the definition using organ dysfunction criteria associated with higher risk of mortality. The goal was to develop criteria that would generalize across differently resourced settings.⁶

New pediatric sepsis criteria should maximize identification of true-positive cases so that infected children with life-threatening organ dysfunction receive best-practice sepsis care, are appropriately enrolled in clinical studies, and are correctly represented in epidemiological surveillance. Simultaneously, new criteria must minimize false-positive cases so that children are not misdiagnosed with sepsis. This is important to reduce unnecessary use of antimicrobials and other treatments, optimize the efficiency of clinical studies, and avoid overcounting in surveillance. However, it is unclear which measures of organ dysfunction in children have an appropriate balance of sensitivity and positive predictive value (PPV) to achieve these goals and also generalize across differently resourced settings.

One challenge is that there is currently no large, centralized, multicenter, high-granularity database that includes pediatric emergency and inpatient care in differently resourced settings. Additionally, the validation of the existing IPSCC criteria has been limited historically.^2,3 To address these gaps, a database was developed and used to derive and validate novel criteria for pediatric sepsis and septic shock based on measures of organ dysfunction in children with suspected infection.

Methods

Overview

The existing organ dysfunction subscores for each organ system that best predicted mortality were first identified and then integrated into models to predict mortality in children with suspected infection. From the best-performing models, an integer-based score (the Phoenix Sepsis Score) was developed (eFigure 1 in Supplement 1). The binary Phoenix sepsis and septic shock criteria were then selected as thresholds of the Phoenix Sepsis Score.
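As a minimal sketch, the thresholds reported in the abstract (a Phoenix Sepsis Score of 2 points or higher in a child with suspected infection for sepsis, and sepsis plus 1 or more cardiovascular point for septic shock) can be written as a short Python function. The function and argument names are illustrative and are not taken from the study's software; the organ-system points are assumed to have been computed elsewhere.

```python
def classify_phoenix(total_score, cardiovascular_points, suspected_infection):
    """Apply the binary Phoenix thresholds to an already-computed score.

    total_score: Phoenix Sepsis Score (sum of organ-system points).
    cardiovascular_points: points contributed by the cardiovascular component.
    suspected_infection: True if the suspected-infection definition is met.
    All names are hypothetical; only the two thresholds come from the article.
    """
    # Sepsis: suspected infection and a score of 2 points or higher.
    sepsis = suspected_infection and total_score >= 2
    # Septic shock: sepsis plus 1 or more cardiovascular point.
    septic_shock = sepsis and cardiovascular_points >= 1
    return {"sepsis": sepsis, "septic_shock": septic_shock}


# Example: 3 total points, 1 of them cardiovascular, with suspected infection.
print(classify_phoenix(3, 1, True))
```

Keeping the septic shock test nested inside the sepsis test mirrors the wording of the criteria: a child cannot meet the septic shock criteria without first meeting the sepsis criteria.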

Study Design, Setting, and Population

A retrospective cohort study was performed using EHR data from 10 hospital-based sites in 5 countries. The analysis plan was prespecified in the funding application that supported this work. Six US sites represent higher-resource settings, 5 of which were in the development data set (eFigure 2 in Supplement 1). Data from 1 US site were held out for geographic external validation. Two international sites in Bangladesh and Colombia represent lower-resource settings in the development data set. Additionally, limited EHR and registry data from sites in China⁷ and Kenya served as lower-resource external validation sites. From each site, all emergency department, inpatient, and intensive care unit (ICU) encounters of children younger than 18 years from 2010 to 2019 were included, with some sites providing shorter time windows (eTable 1 in Supplement 1). Data from newborns before discharge (birth hospitalizations) and children with a postconceptional age of less than 37 weeks were excluded. Data harmonization, quality assurance, and all analyses were conducted as a reproducible pipeline in a centralized, cloud-based environment (eFigure 2 and eAppendix 1 in Supplement 1). The study was approved with a waiver of consent by a central institutional review board at the University of Colorado, plus separate regulatory approvals at non-US sites.

Outcomes, Definitions, and Main Measures

The primary outcome for all analyses was in-hospital mortality, which was used to assess the likelihood that organ dysfunction in the setting of an infection was life-threatening. The secondary outcome for all analyses was a composite of early death (within 72 hours of presentation to the hospital) or requirement of extracorporeal membrane oxygenation (ECMO) support. This secondary outcome was requested by the task force because early death and ECMO are more likely to be directly associated with sepsis in the first 24 hours of presentation than in-hospital mortality, which can occur later and be the result of complications during the hospitalization. Also, using ECMO to rescue children with sepsis-associated respiratory and/or cardiac failure could lead to survival of some children who would otherwise die.

Suspected infection was defined as receipt of systemic antimicrobials and microbiological testing within the first 24 hours of the encounter. Comorbidities were defined based on the Pediatric Complex Chronic Conditions Classification System,⁸ and severe malnutrition was defined as a weight-for-age more than 3 SDs below the mean of the World Health Organization standards.⁹ The systemic inflammatory response syndrome criteria were based on IPSCC criteria.^2,3 Because dosing information necessary to calculate the vasoactive-inotropic score was often missing at lower-resource sites, the number of concurrent vasoactive agents was tested as a proxy.

The area under the precision recall curve (AUPRC) was used as the primary measure of organ dysfunction subscore, stacked regression sepsis model, and Phoenix Sepsis Score performance because it is more accurate than the area under the receiver operating characteristic curve (AUROC) when analyzing imbalanced data sets (eg, many more survivors than nonsurvivors). This is particularly important in children with infections given their lower baseline mortality compared with adults.^10,11 The best way to interpret AUPRCs is to use the baseline rate as reference, because a model that ranks patients at random has an expected AUPRC equal to the baseline rate. If mortality is 1% (0.01) and the model AUPRC is 0.30, the model has 30-fold higher performance than a random model. Because the novel Phoenix sepsis and septic shock criteria represent single, binary thresholds, the primary performance measures used to evaluate them were sensitivity and PPV, which represent single points on the precision recall curve. Missing data were imputed using a last-observation-carried-forward approach across physiologically appropriate time windows. See eAppendix 1 in Supplement 1 for details.
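The performance measures described above can be illustrated with a short, dependency-free Python sketch. It computes average precision (one common step-wise estimate of the AUPRC, assumed here because the estimator used in the study is described only in its supplement), the ratio of an AUPRC to the baseline event rate, and sensitivity and PPV for a single binary threshold. All function names and the toy data are hypothetical.

```python
def average_precision(labels, scores):
    """Step-wise estimate of the area under the precision recall curve.

    labels: 1 for death, 0 for survival; scores: higher means higher risk.
    Tied scores are not handled specially in this sketch.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    total_positives = sum(labels)
    true_positives = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_positives += 1
            # Precision at the rank where this positive case is recovered.
            precision_sum += true_positives / rank
    return precision_sum / total_positives


def lift_over_baseline(auprc, baseline_rate):
    """Ratio of a model's AUPRC to the baseline event rate."""
    return auprc / baseline_rate


def sensitivity_and_ppv(true_pos, false_pos, false_neg):
    """Sensitivity (recall) and PPV (precision) for one binary threshold."""
    sensitivity = true_pos / (true_pos + false_neg)
    ppv = true_pos / (true_pos + false_pos)
    return sensitivity, ppv


# Toy data: 2 deaths among 5 encounters.
toy_ap = average_precision([1, 0, 1, 0, 0], [0.9, 0.8, 0.7, 0.2, 0.1])
# The article's example: 1% mortality and an AUPRC of 0.30.
example_lift = lift_over_baseline(0.30, 0.01)
print(round(toy_ap, 3), round(example_lift, 1))
```

A sensitivity and PPV pair is a single point on the precision recall curve, which is why the binary criteria are evaluated with those two measures rather than with an area.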

Derivation and Validation of the Novel Criteria for Sepsis and Septic Shock

The evaluation of which organ dysfunction subscores best predicted mortality involved all patients with and without suspected infection (eFigures 1-2 in Supplement 1). Then, stacked regression models^12,13 were derived and validated to predict mortality using the worst organ dysfunction subscores recorded in the first 24 hours of the encounter among children with suspected infection (eFigures 1-2 in Supplement 1). This approach was used to implement the concept of “an infection with life-threatening organ dysfunction,” which was adopted by the Pediatric Sepsis Definition Task Force as the conceptual definition of sepsis.

The data set was first divided into development (including derivation and internal validation) and external validation sets as described above and shown in eFigure 2 in Supplement 1. From each development site, 25% were held out for internal validation. The other three 25% portions of the development data set were used to (1) identify the best-performing criteria for each individual organ dysfunction based on the subscores of 8 existing and previously validated pediatric organ dysfunction criteria in all patients in the development data sets (including patients with suspected infection and those without) (eTable 2 and eFigure 2 in Supplement 1)^14,15,16,17,18,19; (2) train and tune stacked regression models using a composite of the best-performing individual organ dysfunction criteria in children with suspected infection^12,13; and (3) derive and internally validate the novel sepsis criteria based on the final stacked regression model. Finally, the novel criteria were validated in the external validation sets.
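The four-way partition of each development site can be sketched as follows. This is an illustration only: the random shuffle, the seed, and the partition names are assumptions, and the actual assignment procedure is described in the supplement rather than in the text above.

```python
import random


def split_site_into_quarters(encounter_ids, seed=0):
    """Split one site's encounters into four roughly equal portions.

    The shuffle-based assignment and the key names are hypothetical;
    the article states only that each portion is 25% of a site's data.
    """
    ids = list(encounter_ids)
    random.Random(seed).shuffle(ids)  # reproducible shuffle for this sketch
    quarters = [ids[i::4] for i in range(4)]
    return {
        "organ_dysfunction_selection": quarters[0],  # step 1
        "stacked_model_training": quarters[1],       # step 2
        "criteria_derivation": quarters[2],          # step 3
        "internal_validation": quarters[3],          # held out
    }


parts = split_site_into_quarters(range(10))
print({name: len(ids) for name, ids in parts.items()})
```

Splitting within each site, rather than across the pooled data, keeps every site represented in every portion.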

Stacked regression is a robust model-averaging approach that allows many models to be used simultaneously, leveraging the best predictive power of each model. The best-performing organ dysfunction subcomponent scores were used as input variables for stacked regression models that also predicted mortality. The stacked regression models took the organ dysfunction subscores as covariates and estimated the regression weights (or the relative contribution of each respective subcomponent’s prediction to the overall prediction) in accordance with each subcomponent’s predictive power, while maintaining a high degree of interpretability.¹³ Additional information is available in eAppendix 1 in Supplement 1.
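To make the idea of the top-level model concrete, the sketch below fits a ridge-penalized logistic regression that takes organ dysfunction subscores as covariates and returns one weight per subscore. It is not the study's implementation: the gradient-descent fitting routine, the penalty strength, the learning rate, and the toy data are all assumptions chosen to keep the example self-contained.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_ridge_logistic(rows, labels, lam=0.01, lr=0.1, epochs=2000):
    """Ridge-penalized logistic regression fitted by batch gradient descent."""
    n, p = len(rows), len(rows[0])
    intercept, weights = 0.0, [0.0] * p
    for _ in range(epochs):
        grad_b, grad_w = 0.0, [0.0] * p
        for x, y in zip(rows, labels):
            linear = intercept + sum(w * v for w, v in zip(weights, x))
            err = sigmoid(linear) - y
            grad_b += err
            for j in range(p):
                grad_w[j] += err * x[j]
        intercept -= lr * grad_b / n  # the intercept is left unpenalized
        # The term lam * w is the gradient of the ridge penalty (lam / 2) * w**2.
        weights = [w - lr * (g / n + lam * w) for w, g in zip(weights, grad_w)]
    return intercept, weights


def predict_risk(intercept, weights, x):
    """Predicted probability of death for one vector of subscores."""
    return sigmoid(intercept + sum(w * v for w, v in zip(weights, x)))


# Made-up worst-in-24-hours subscores for two organ systems (1 = died).
subscores = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 1), (2, 2), (0, 0), (1, 0)]
died = [0, 0, 0, 0, 1, 1, 0, 0]
intercept, weights = fit_ridge_logistic(subscores, died)
print(predict_risk(intercept, weights, (2, 2)) > predict_risk(intercept, weights, (0, 0)))
```

Because each weight multiplies a single subscore, the fitted model stays easy to read, which is the interpretability property the paragraph above refers to.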

Ridge, least absolute shrinkage and selection operator (LASSO), and elastic net regularized logistic regression were evaluated as the top-level stacked models. Ten-fold cross-validation was used to select the regularization parameter lambda in the stacked models that minimized deviance for each value of alpha (0 = ridge; 1 = LASSO) (see eAppendix 1 in Supplement 1 for additional information). The best-performing stacked regression models were identified using the AUPRC. In the third step, the components of the final stacked regression model were translated into an integer-based score using a grid search, then its performance was compared with the final stacked model to ensure that the AUPRC remained stable. When measures and models had similar performance, the task force voted on which to choose based on parsimony, data collection burden, and face validity.⁶ The task force then voted using a modified Delphi process on the thresholds of the score to define sepsis and septic shock and achieve the desired balance of sensitivity and PPV. In the final step, performance of the novel criteria was assessed across validation sets using sensitivity and PPV as primary metrics. Additional information is available in eAppendix 1 and eFigures 1-2 in Supplement 1.
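The regularization and cross-validation setup described at the start of this section can be sketched in a few lines. The penalty below uses the parameterization in which alpha = 0 gives ridge and alpha = 1 gives LASSO, as stated above; the factor of one-half on the squared term follows a common software convention and is an assumption here, as are the function names and the interleaved fold assignment.

```python
def elastic_net_penalty(weights, lam, alpha):
    """Elastic net penalty: alpha = 0 is ridge, alpha = 1 is LASSO."""
    l1 = sum(abs(w) for w in weights)
    l2 = sum(w * w for w in weights)
    return lam * (alpha * l1 + 0.5 * (1.0 - alpha) * l2)


def k_fold_splits(n_rows, k=10):
    """Yield (training indices, held-out indices) for k-fold cross-validation.

    Rows are assigned to folds by interleaving, a hypothetical choice
    made only to keep this sketch deterministic.
    """
    folds = [list(range(start, n_rows, k)) for start in range(k)]
    for held_out in range(k):
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, folds[held_out]


print(elastic_net_penalty([1.0, -2.0], lam=1.0, alpha=0.5))
print(sum(1 for _ in k_fold_splits(25)))
```

In a full tuning loop, a model would be fitted on each training portion for every candidate lambda, and the lambda with the lowest mean held-out deviance would be kept for each value of alpha.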

Stratifications and Sensitivity Analyses

During each step, prespecified stratifications and sensitivity analyses were performed to ensure robustness. These included (1) higher-resource vs lower-resource settings, where the higher-resource sites were analyzed together given their overall similarity and the lower-resource sites were analyzed individually given their broader differences in underlying population, resources, and data quality; (2) no known prior comorbidities, to assess criteria performance in children without potential confounding by chronic and/or life-limiting conditions; (3) age groups, to ensure that performance remains appropriate across the pediatric spectrum; (4) ICU admission, given that many children with sepsis receive ICU care; and (5) excluding patients who required operative care, to reduce confounding by mechanical ventilation or vasoactive medications related to receiving anesthesia or undergoing surgery.

Results

Cohort Demographic and Clinical Characteristics

The development set included 3 049 699 emergency department, inpatient, and ICU encounters for children younger than 18 years, of which 172 984 (5.7%) had suspected infection in the first 24 hours (Table 1; eTables 3 and 4 and eFigure 2 in Supplement 1). Of those, 2065 (1.2%) died. The external validation set included 581 317 encounters, of which 45 855 (7.9%) had suspected infection in the first 24 hours. Of those, 540 (1.2%) died (Table 1; eTable 5 in Supplement 1).

Table 1. Characteristics of Pediatric Patient Encounters With Suspected Infection in the First 24 Hours^a.

Characteristics	Derivation cohort	Internal validation cohort	External validation cohort
Encounters, No.	129 584	43 400	45 855
Resource setting, No. (%)
Higher-resource settings	108 177 (83.5)	36 202 (83.4)	33 020 (72.0)
Lower-resource settings	21 407 (16.5)	7198 (16.6)	12 835 (28.0)
Age, median (IQR), y	3.7 (0.9-9.4)	3.7 (0.9-9.3)	2.6 (0.6-7.6)
Sex, No. (%)
Female	62 868 (48.5)	21 041 (48.5)	22 295 (48.6)
Male	66 712 (51.5)	22 357 (51.5)	21 555 (47.0)
Race, No. (%)^b
American Indian or Alaska Native	109 (0.1)	21 (<0.1)	59 (0.1)
Asian	5149 (4.0)	1703 (3.9)	506 (1.1)
Black	22 709 (17.5)	7512 (17.3)	7476 (16.3)
Native Hawaiian or Other Pacific Islander	105 (0.1)	31 (0.1)	70 (0.2)
White	57 518 (44.4)	19 533 (45.0)	23 545 (51.3)
Multiple	22 113 (17.1)	7343 (16.9)	277 (0.6)
Other/unknown	22 095 (17.1)	7309 (16.8)	14 051 (30.6)
Hispanic or Latino ethnicity, No. (%)	33 698 (26.0)	11 457 (26.4)	55 (0.1)
Major comorbidities, No. (%)
Technology dependence	18 951 (17.5)	6011 (16.6)	5677 (17.2)
Severe malnutrition	13 505 (10.4)	4478 (10.3)	3417 (7.5)
Malignancy	10 924 (10.1)	3709 (10.2)	2950 (8.9)
Transplant	3689 (3.4)	1287 (3.6)	1573 (4.8)
Comorbidities per PCCC, No. (%)^c
No known prior comorbidity	72 291 (66.8)	24 470 (67.6)	22 553 (68.3)
1 PCCC	9406 (8.7)	3150 (8.7)	2580 (7.8)
≥2 PCCCs	26 480 (24.5)	8582 (23.7)	7887 (23.9)
Systemic inflammatory response syndrome, No. (%)^d	56 711 (43.8)	18 848 (43.4)	21 436 (46.7)
Locations visited during encounter (not mutually exclusive), No. (%)
Presented to emergency department	92 507 (71.6)	31 092 (71.9)	26 940 (61.6)
≥1 Intensive care unit stays	23 128 (17.9)	7840 (18.1)	10 702 (23.4)
≥1 Operating room visits	17 604 (13.6)	6098 (14.1)	469 (1.1)
Outcomes, No. (%)
Death	1538 (1.2)	527 (1.2)	540 (1.2)
Early death or extracorporeal membrane oxygenation	834 (0.6)	305 (0.7)	349 (0.8)


Abbreviation: PCCC, pediatric complex chronic condition.

^aTable 1 shows site, demographic, care location, comorbidity, and outcome characteristics of those with suspected or confirmed infection in the first 24 hours of the encounter. Data from the 7 development sites are stratified by the 75% derivation cohort vs the 25% internal validation cohort.

^bFor race categories, “multiple” indicates that in the electronic health record, a patient’s race was recorded as “multiracial,” “multiple,” or “2 or more races.” “Other/unknown” indicates that a patient’s race was recorded in the electronic health record as “other,” “unknown,” “not specified,” “information not recorded,” “patient declined,” “patient refused,” “refused,” or as a race category unique to a particular international country or region.

^cThe PCCC system classifies pediatric chronic diseases using International Classification of Diseases diagnosis and procedure codes and was assessed only at higher-resource sites, where the information was available (percentages for PCCC-related counts are based on higher-resource setting encounters).⁸ The major comorbidities of technology dependence (eg, requiring gastrostomy, tracheostomy, central line), malignancy, and transplant were defined in the PCCC system. Severe malnutrition was defined as a weight-for-age more than 3 SDs below the mean according to World Health Organization standards and was assessed at all sites.⁹ Early death is defined as death <72 hours after the beginning of the encounter.

^dSystemic inflammatory response syndrome is assessed using temperature, white blood cell count, heart rate, and respiratory rate, with higher values reflecting more inflammation. Criteria are met when ≥2 values are outside the threshold for age, including at least temperature or white blood cell count. See eAppendix 1 in Supplement 1 for additional details.

Best-Performing Individual Organ Dysfunction Criteria

Organ dysfunction subscore input availability and missingness are shown in eFigure 3, A-H, in Supplement 1. By 24 hours into an encounter, most patients in higher-resource settings had information recorded for pulse oximetry oxygen saturation (Spo₂), respiratory support, platelet count, blood pressure, vasoactive agent use, and Glasgow Coma Scale score. Many also had fraction of inspired oxygen (Fio₂), lactate, and pupillary reactivity measured. Patients in lower-resource settings were less likely to have available data on lactate, Glasgow Coma Scale, pupillary reactivity, and coagulation studies such as D-dimer and fibrinogen. The best-performing individual organ dysfunction criteria based on the primary measure of AUPRC and task force Delphi process when AUPRCs were similar included cardiovascular (Pediatric Logistic Organ Dysfunction version 2 [PELOD-2] and vasoactive medication count), hematology/coagulation (Disseminated Intravascular Coagulation score), respiratory (pediatric Sequential Organ Failure Assessment [pSOFA]), renal (pSOFA), hepatic (IPSCC), neurologic (PELOD-2), immunologic (Pediatric Organ Dysfunction Information Update Mandate [PODIUM]), and endocrine dysfunction (PODIUM), as shown in eFigure 4 in Supplement 1.

Derivation and Validation of the Stacked Models

The best-performing stacked models included an 8-organ system ridge regression model and a 4-organ system LASSO model (eTable 6 and eFigure 6 in Supplement 1). Overall, AUPRCs and AUROCs were similar between these 2 models (eFigure 7 in Supplement 1). The task force evaluated the 2 models and chose to advance the 4-organ system model because it had similar performance but greater simplicity and lower dependence on laboratory measures. The task force acknowledged that the more comprehensive 8-organ system model may have utility in some circumstances (eg, research). The 4-organ system model included criteria for respiratory (mechanical ventilation, Pao₂:Fio₂, and Spo₂:Fio₂ ratios), cardiovascular (mean arterial pressure, lactate level, and vasoactive medications), coagulation (platelet count, international normalized ratio, D-dimer, and fibrinogen), and neurologic (Glasgow Coma Scale and pupillary reaction) dysfunction.

From the Stacked Model to the Phoenix Sepsis Score

The 4-organ system model was translated into an integer-based score, the Phoenix Sepsis Score (Table 2). In doing so, the individual levels were reweighted using a grid search and collapsed into a single level when performance was unaffected (eg, the pSOFA respiratory subscores of 1 and 2 points were collapsed into a single level). Mortality increased with higher score values in both higher- and lower-resource settings (Figure 1 and Figure 2; eFigure 5 in Supplement 1). The Phoenix Sepsis Score had AUPRCs of 0.23 to 0.38 (95% CI range, 0.20-0.39) and AUROCs of 0.71 to 0.92 (95% CI range, 0.70-0.92) to predict mortality in the internal and external validation sets, similar to the stacked sepsis model (Figure 3; eFigures 6-8 in Supplement 1). Compared with the existing IPSCC sepsis score as well as several organ dysfunction scores, the Phoenix Sepsis Score had the highest AUPRC to predict mortality at all validation sites combined, at all higher-resource sites, and at 3 of the 4 lower-resource sites (Figure 3). A notable limitation is that lower-resource sites 2-4 did not record respiratory support, even when a patient received it, which limited the range of the score and likely resulted in lower performance at those sites. Additionally, lower-resource site 2 had no recording of neurologic status, further limiting score range and performance at that site. However, the score at lower-resource site 1 included data for all 4 organ systems. To enable capture of other organ dysfunctions for research or epidemiological purposes, an expanded score based on the 8-organ system model was also developed and named the Phoenix-8 Score (eFigure 9 in Supplement 1).

Table 2. The Phoenix Sepsis Score^a.

	0 Points	1 Point	2 Points	3 Points
Respiratory (0-3 points)
	Pao₂:Fio₂ ≥400 or Spo₂:Fio₂ ≥292^b	Pao₂:Fio₂ <400 and any respiratory support^c or Spo₂:Fio₂ <292 and any respiratory support^c	Pao₂:Fio₂ 100-200 and IMV or Spo₂:Fio₂ 148-220 and IMV	Pao₂:Fio₂ <100 and IMV or Spo₂:Fio₂ <148 and IMV
Cardiovascular (0-6 points)
		1 point each (up to 3) for:	2 points each (up to 6) for:
	No vasoactive medications^d	1 Vasoactive medication^d	≥2 Vasoactive medications^d
	Lactate <5 mmol/L^e	Lactate 5-10.9 mmol/L^e	Lactate ≥11 mmol/L^e
Mean arterial pressure by age, mm Hg^f,g
<1 mo	>30	17-30	<17
1 to 11 mo	>38	25-38	<25
1 to <2 y	>43	31-43	<31
2 to <5 y	>44	32-44	<32
5 to <12 y	>48	36-48	<36
12 to 17 y	>51	38-51	<38
Coagulation (0-2 points)^h
		1 point each (maximum of 2 points) for:
	Platelets ≥100 × 10³/μL	Platelets <100 × 10³/μL
	International normalized ratio ≤1.3	International normalized ratio >1.3
	D-dimer ≤2 mg/L FEU	D-dimer >2 mg/L FEU
	Fibrinogen ≥100 mg/dL	Fibrinogen <100 mg/dL
Neurologic (0-2 points)^i
	Glasgow Coma Scale score >10^j; pupils reactive	Glasgow Coma Scale score ≤10^j	Fixed pupils bilaterally


Abbreviations: FEU, fibrinogen equivalent units; Fio₂, fraction of inspired oxygen; IMV, invasive mechanical ventilation; Spo₂, pulse oximetry oxygen saturation.

^aThe Phoenix Sepsis Score may be calculated in the absence of some variables (eg, even if lactate level is not measured and vasoactive medications are not used, a cardiovascular score can still be ascertained using blood pressure). It is expected that laboratory tests and other measurements will be obtained at the discretion of a medical team based on clinical judgment. Unmeasured variables contribute no points to the score.

^bCalculated only when Spo₂ is ≤97%.

^cRespiratory dysfunction of 1 point can be assessed in any patient receiving oxygen, high-flow, noninvasive positive pressure, or IMV respiratory support, and includes Pao₂:Fio₂ <200 and Spo₂:Fio₂ <220 in children who are not receiving IMV.

^dVasoactive medications include any dose of epinephrine, norepinephrine, dopamine, dobutamine, milrinone, and/or vasopressin (for shock).

^eLactate can be arterial or venous. Lactate reference range is 0.5-2.2 mmol/L.

^fUse measured mean arterial pressure preferentially (invasive arterial if available or noninvasive oscillometric), and if measured mean arterial pressure is not available, a calculated mean arterial pressure (⅓ × systolic + ⅔ × diastolic) may be used as an alternative.

^gAge is not adjusted for prematurity, and the criteria do not apply to birth hospitalizations, children with postconceptional age <37 weeks, or those aged ≥18 years.

^hCoagulation variable reference ranges: platelets, 150-450 × 10³/μL; D-dimer, <0.5 mg/L FEU; fibrinogen, 180-410 mg/dL. International normalized ratio reference range is based on local reference prothrombin time.

^iThe neurologic dysfunction subscore was pragmatically validated in both sedated and nonsedated patients and those with and without IMV support.

^jThe Glasgow Coma Scale score measures level of consciousness based on verbal, eye, and motor response and ranges from 3 to 15, with a higher score indicating better neurologic function.
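The point assignments in Table 2 and its footnotes can be illustrated with a short calculation. The following Python sketch is not the authors' software; the function and parameter names are hypothetical, ages are assumed to be supplied in months, and the handling of exact boundary values (eg, a Pao₂:Fio₂ ratio of exactly 200 during IMV) is one literal reading of the table.

```python
from typing import Optional

# Age bands in months (exclusive upper bound) with the mean arterial pressure
# (MAP, mm Hg) cutoffs from Table 2: below `low` scores 2, low..high scores 1.
MAP_BANDS = [(1, 17, 30), (12, 25, 38), (24, 31, 43),
             (60, 32, 44), (144, 36, 48), (216, 38, 51)]


def calculated_map(systolic: float, diastolic: float) -> float:
    """Fallback MAP from footnote f: 1/3 x systolic + 2/3 x diastolic."""
    return systolic / 3 + 2 * diastolic / 3


def map_points(age_months: float, map_mmhg: Optional[float]) -> int:
    if map_mmhg is None:  # unmeasured variables contribute no points
        return 0
    for upper_age, low, high in MAP_BANDS:
        if age_months < upper_age:
            return 2 if map_mmhg < low else 1 if map_mmhg <= high else 0
    raise ValueError("criteria do not apply at age >= 18 years")


def respiratory_points(fio2=None, pao2=None, spo2=None,
                       support=False, imv=False) -> int:
    """`support` = any respiratory support; `imv` = invasive ventilation."""
    inf = float("inf")
    pf = pao2 / fio2 if pao2 is not None and fio2 else inf
    # Spo2:Fio2 is used only when Spo2 is <= 97% (footnote b).
    sf = spo2 / fio2 if spo2 is not None and spo2 <= 97 and fio2 else inf
    if imv and (pf < 100 or sf < 148):
        return 3
    if imv and (pf <= 200 or sf <= 220):  # boundary handling is an assumption
        return 2
    if (support or imv) and (pf < 400 or sf < 292):
        return 1
    return 0


def cardiovascular_points(age_months, vasoactives=0, lactate=None,
                          map_mmhg=None) -> int:
    points = min(vasoactives, 2)  # 0, 1, or >=2 vasoactive medications
    if lactate is not None:       # mmol/L
        points += 2 if lactate >= 11 else 1 if lactate >= 5 else 0
    return points + map_points(age_months, map_mmhg)


def coagulation_points(platelets=None, inr=None, d_dimer=None,
                       fibrinogen=None) -> int:
    abnormal = [
        platelets is not None and platelets < 100,    # x 10^3/uL
        inr is not None and inr > 1.3,
        d_dimer is not None and d_dimer > 2,          # mg/L FEU
        fibrinogen is not None and fibrinogen < 100,  # mg/dL
    ]
    return min(2, sum(abnormal))  # 1 point each, maximum of 2


def neurologic_points(gcs=None, pupils_fixed_bilaterally=False) -> int:
    if pupils_fixed_bilaterally:
        return 2
    return 1 if gcs is not None and gcs <= 10 else 0


# Hypothetical 3-year-old (36 months) receiving IMV with 1 vasoactive drug.
example_total = (
    respiratory_points(fio2=0.5, spo2=92, imv=True)
    + cardiovascular_points(36, vasoactives=1, lactate=6, map_mmhg=40)
    + coagulation_points(platelets=80, inr=1.2)
    + neurologic_points(gcs=12)
)
print(example_total)  # prints 6 (respiratory 2, cardiovascular 3, coagulation 1)
```

Missing inputs are passed as `None` and add no points, mirroring footnote a. Any real implementation should be checked against the published table and the clinical vignettes in eAppendix 2 in Supplement 1.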

Figure 1. — This figure shows calibration of the Phoenix Sepsis Score in higher-resource settings (sites with more technological resources, eg, laboratory equipment, ventilators, and kidney replacement therapy devices, to support organ dysfunction). For patients with suspected infection who have each possible integer value of the Phoenix Sepsis Score in the first 24 hours of the encounter, mortality among those at the development, internal validation, and external validation sites is shown. Binomial confidence intervals (whiskers) for the mortality point estimate in each group are also shown.

Figure 2. — This figure shows the calibration of the Phoenix Sepsis Score in lower-resource settings (sites with fewer technological resources to support organ dysfunction). For patients with suspected infection who have each possible integer value of the Phoenix Sepsis Score in the first 24 hours of the encounter, mortality among those at the development, internal validation, and external validation sites is shown. Binomial confidence intervals (whiskers) for the mortality point estimate in each group are also shown. At lower-resource sites, some variables were rarely available (eg, D-dimer and fibrinogen for coagulation dysfunction), even when other variables for the same organ systems were recorded (eg, platelet count and international normalized ratio); thus, the maximum cumulative score achieved at lower-resource sites was 9, instead of the maximum possible of 13.

Figure 3. — IPSCC indicates International Pediatric Sepsis Consensus Conference; PELOD-2, Pediatric Logistic Organ Dysfunction version 2; PODIUM, Pediatric Organ Dysfunction Information Update Mandate; and pSOFA, pediatric Sequential Organ Failure Assessment. This figure compares the performance of the Phoenix Sepsis Score with validated pediatric organ dysfunction scores and criteria to predict mortality in patients with suspected infection in the first 24 hours. Equivalent performance metrics for the secondary outcome, early death or extracorporeal membrane oxygenation, are shown in eFigure 7 in Supplement 1. All types of organ dysfunction are evaluated across their respective full ranges, with higher scores indicating more organ dysfunction burden. The scores for IPSCC, Proulx, and PODIUM are based on the counts of organ dysfunction (eAppendix 1 and eTable 2 in Supplement 1). Performance is presented as both quantitative with 95% CIs (calculated using logit transform), as well as visually using a color heat map. Shading indicates highest (darkest) to lowest (lightest) in each row. The AUPRC is the area under a curve drawn with sensitivity (also referred to as “recall”) and positive predictive value (also referred to as “precision”) across all potential thresholds for the points in the scores. The AUPRC is a more reliable classifier performance metric than the AUROC when the classes are imbalanced, for example, when mortality is very low, as in this study. The AUROC is the area under a curve drawn with the false-positive rate on the x-axis and the true-positive rate on the y-axis. In this study, it is an indicator of how well a classifier can rank encounters with respect to mortality risk.
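The two summary metrics described in this caption can be made concrete with a small calculation. The Python sketch below is illustrative only: the function names are hypothetical, the AUROC is computed as the probability that a randomly chosen death has a higher score than a randomly chosen survivor (ties count as one-half), and the AUPRC uses a step-wise sum over score thresholds, which is one of several conventions and not necessarily the one used in the study.

```python
def auroc(scores, died):
    """Probability that a death outranks a survivor; ties count as 0.5."""
    pos = [s for s, d in zip(scores, died) if d]
    neg = [s for s, d in zip(scores, died) if not d]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auprc_step(scores, died):
    """Step-wise area under the precision (PPV) vs recall (sensitivity) curve."""
    total_deaths = sum(1 for d in died if d)
    area, previous_recall = 0.0, 0.0
    # Walk thresholds from the highest score down; "score >= t" flags a patient.
    for t in sorted(set(scores), reverse=True):
        flagged = [d for s, d in zip(scores, died) if s >= t]
        true_pos = sum(1 for d in flagged if d)
        recall = true_pos / total_deaths
        precision = true_pos / len(flagged)
        area += (recall - previous_recall) * precision
        previous_recall = recall
    return area


# Toy data (not from the study): integer scores and death indicators.
toy_scores = [0, 1, 2, 3]
toy_died = [0, 1, 0, 1]
print(auroc(toy_scores, toy_died))       # 0.75
print(auprc_step(toy_scores, toy_died))  # approximately 0.833
```

With a rare outcome, a score can rank patients well (high AUROC) while most flagged patients still survive (low AUPRC), which is why the caption treats the AUPRC as the more informative metric here.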

From the Phoenix Sepsis Score to the Criteria for Pediatric Sepsis and Septic Shock

The task force chose a Phoenix Sepsis Score of 2 or greater in patients with suspected infection as the new sepsis criteria, and sepsis with 1 or more cardiovascular points as criteria for septic shock. In the development set, children with sepsis in the first 24 hours had 7.1% mortality at the higher-resource sites and 28.5% mortality at the lower-resource sites. Children with sepsis in both higher- and lower-resource settings had a median Phoenix Sepsis Score of 3 points (IQR, 2-4). Children with septic shock in the first 24 hours had 10.8% mortality at the higher-resource sites and 33.5% mortality at the lower-resource sites. The novel criteria had higher PPV and sensitivity that was comparable with or higher than the IPSCC sepsis, severe sepsis, and septic shock criteria across all settings and using the secondary outcome of early death or ECMO (Figure 4; eFigure 10 and eTable 7 in Supplement 1). For example, for the primary outcome of death at the higher-resource sites, the Phoenix sepsis criteria had a PPV of 5.3% to 7.1% (with a baseline mortality of 0.6% to 0.7%) and a sensitivity of 69.2% to 84.4% compared with the IPSCC severe sepsis criteria, which had a PPV of 3.6% to 4.8% and a sensitivity of 58.7% to 70.7%, in the development and external validation sets, respectively. In the derivation and internal validation set of the lower-resource site that had complete data for assessment of the criteria, the Phoenix sepsis criteria had a PPV of 22.2% (baseline mortality rate of 4.1%) and a sensitivity of 81.2% compared with the IPSCC severe sepsis criteria, which had a PPV of 12.7% and a sensitivity of 49.2%.
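The two thresholds chosen by the task force reduce to a simple rule. A minimal Python sketch follows, assuming the total score and the cardiovascular subscore have already been computed; the function and argument names are hypothetical.

```python
def phoenix_criteria(suspected_infection: bool, total_score: int,
                     cardiovascular_points: int) -> dict:
    """Apply the sepsis and septic shock thresholds described in the text."""
    # Sepsis: suspected infection plus a Phoenix Sepsis Score of 2 or greater.
    sepsis = bool(suspected_infection) and total_score >= 2
    # Septic shock: sepsis with 1 or more cardiovascular points.
    septic_shock = sepsis and cardiovascular_points >= 1
    return {"sepsis": sepsis, "septic_shock": septic_shock}


print(phoenix_criteria(True, 3, 1))   # {'sepsis': True, 'septic_shock': True}
print(phoenix_criteria(True, 2, 0))   # {'sepsis': True, 'septic_shock': False}
print(phoenix_criteria(False, 5, 2))  # {'sepsis': False, 'septic_shock': False}
```

A patient without suspected infection does not meet either criterion regardless of score, because the score is only interpreted in the presence of suspected infection.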

Figure 4. — The positive predictive value (PPV, or precision) and sensitivity for the Phoenix vs 2005 International Pediatric Sepsis Consensus Conference (IPSCC) criteria for sepsis in children with suspected infection are shown. The Phoenix sepsis criteria are based on achieving ≥2 points in the Phoenix Sepsis Score among patients with suspected infection in the first 24 hours of an encounter. The IPSCC sepsis and severe sepsis criteria are based on systemic inflammatory response syndrome (SIRS) and IPSCC-based organ dysfunction among patients with suspected infection in the first 24 hours of an encounter. Baseline rates of the outcome in each group (death, or early death or extracorporeal membrane oxygenation [ECMO]) are shown as horizontal dashed lines. 95% CIs are shown as bands from each point in the plane representing that component (eg, CIs for PPV are parallel to the y-axis). Confidence bands that are not visible are narrow enough to be completely hidden by the point. These figures are similar to area under the precision recall curves except at a single threshold for criteria that generate a binary response (eg, yes/no sepsis criteria met) instead of across the range of possible points in the curve (eg, 0-13 points in the Phoenix Sepsis Score; see Figure 3). Better-performing criteria are closer to the top right corner. A trade-off exists between sensitivity and PPV, with more sensitive criteria usually having lower PPV and more specific criteria usually having higher PPV and lower sensitivity. Criteria that are close to the baseline outcome rate have poor predictive value.

^aAt lower-resource site 2, some Phoenix Sepsis Score and IPSCC data inputs (eg, invasive mechanical ventilation, Glasgow Coma Scale score) are not recorded even when they are performed; thus, assessment of criteria performance is limited. Lower-resource site 1 and all higher-resource sites have inputs for all relevant organ systems in the criteria. Comparison of sepsis criteria in the external validation sites is shown in eFigure 10 in Supplement 1 with similar results. Diagnostic performance measures for this comparison are shown in eTable 7 in Supplement 1.
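The sensitivity and PPV plotted in Figure 4 come from a single 2 × 2 classification of criteria status against outcome. The Python sketch below shows that arithmetic on made-up data; the function name and the toy values are illustrative and are not drawn from the study.

```python
def sensitivity_ppv(criteria_met, died):
    """Return (sensitivity, PPV) for a binary criterion vs a binary outcome."""
    pairs = list(zip(criteria_met, died))
    true_pos = sum(1 for c, d in pairs if c and d)
    false_pos = sum(1 for c, d in pairs if c and not d)
    false_neg = sum(1 for c, d in pairs if not c and d)
    deaths = true_pos + false_neg
    flagged = true_pos + false_pos
    sensitivity = true_pos / deaths if deaths else float("nan")
    ppv = true_pos / flagged if flagged else float("nan")
    return sensitivity, ppv


# Toy cohort of 10 encounters: 3 meet the criteria, 2 deaths overall.
toy_met = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
toy_died = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
sens, ppv = sensitivity_ppv(toy_met, toy_died)
baseline = sum(toy_died) / len(toy_died)
print(sens, ppv, baseline)  # sensitivity 0.5, PPV about 0.33, baseline 0.2
```

Comparing the PPV with the baseline outcome rate, as the dashed lines in the figure do, shows how much the criterion enriches for the outcome.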

Per request of the task force, the concept of organ dysfunction remote to the site of infection was implemented by requiring that those with respiratory or neurologic dysfunction also had 1 or more points in a different organ system. Patients with sepsis who had remote organ dysfunction accounted for 85.2% of sepsis cases and had higher mortality than the whole sepsis cohort: 8% in higher-resource sites and 32.3% in lower-resource sites (eFigure 11 in Supplement 1).
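One literal reading of the remote organ dysfunction rule can be sketched in Python. This is an interpretation of the sentence above, not the task force's implementation: it assumes the patient has suspected infection, treats a sepsis case as lacking remote organ dysfunction only when all of its points come from a single system that is respiratory or neurologic, and uses hypothetical names.

```python
def has_remote_organ_dysfunction(respiratory: int, cardiovascular: int,
                                 coagulation: int, neurologic: int) -> bool:
    """Assumed reading: respiratory or neurologic points need a second system."""
    subscores = (respiratory, cardiovascular, coagulation, neurologic)
    if sum(subscores) < 2:
        return False  # sepsis threshold (>= 2 points) not met
    systems_with_points = sum(1 for p in subscores if p > 0)
    if respiratory > 0 or neurologic > 0:
        # Require 1 or more points in a different organ system.
        return systems_with_points >= 2
    return True  # points come only from cardiovascular and/or coagulation


print(has_remote_organ_dysfunction(2, 0, 0, 0))  # False: respiratory only
print(has_remote_organ_dysfunction(1, 1, 0, 0))  # True
print(has_remote_organ_dysfunction(0, 2, 0, 0))  # True
```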

Sensitivity Analyses

Performance of the pediatric sepsis criteria was consistent across age groups, with higher sepsis incidence and mortality in younger age groups, as expected (eTable 8 in Supplement 1). Similarly, the performance was consistent in patients with no known prior comorbidities, those admitted to the ICU, and after excluding patients who underwent surgery (eTable 8 in Supplement 1).

Clinical vignettes for children presenting with sepsis and septic shock and their corresponding Phoenix Sepsis Score data are provided in eAppendix 2 in Supplement 1.

Discussion

New criteria for pediatric sepsis and septic shock were derived and validated by developing and curating a clinical database with more than 3.6 million pediatric hospital encounters at 10 sites in 5 countries. The development data set was built using structured EHR data from an international cohort that was geographically and racially diverse and had widely varying resources, a major strength of this study. A prespecified data-driven approach was used to determine the best-performing organ dysfunction measures in children with suspected infection. An interpretable machine learning approach was used to develop a composite model that was the basis for the new Phoenix Sepsis Score and the new criteria. The new Phoenix criteria for pediatric sepsis and septic shock had higher PPV and comparable or higher sensitivity than the IPSCC criteria for predicting mortality across differently resourced settings. These findings were consistent in multiple sensitivity analyses that included age, absence of prior comorbidities, ICU admission, and surgery.

Comparison With the Adult Sepsis-3 Criteria

The approach used in this study had both similarities with and differences from the derivation of the adult Sepsis-3 criteria.⁴ Similar to Sepsis-3, the definition of sepsis was implemented as the combination of suspected infection with life-threatening organ dysfunction. Also, existing organ dysfunction scores and a large EHR database were used to develop the new criteria and in-hospital mortality was the primary outcome. However, there were also several important differences. First, instead of using existing complete organ dysfunction scores (eg, the SOFA score) to derive the new criteria, the best-performing individual organ measures of existing scores were used to develop a novel composite score using stacked regression. Additionally, a database was built that included a geographically and demographically diverse population of children from both higher- and lower-resource settings to maximize generalizability. Furthermore, the performance of the individual organ dysfunction measures, the stacked models, and the Phoenix Sepsis Score was primarily evaluated using the AUPRC, instead of the AUROC, with the goal of maximizing the PPV and sensitivity of the final criteria. The AUPRC is considered a better measure of classification performance for rare events (in this case, deaths) compared with the AUROC, which can have inflated performance when the proportions of events (deaths) and nonevents (survivors) are imbalanced,¹¹,²⁰ an issue that is particularly relevant in children with infections given their lower mortality compared with adults. Finally, this analysis focused on diagnosis of sepsis within the first 24 hours of presentation to a hospital setting, when the majority of pediatric sepsis is diagnosed.²¹

Leveraging Digital Technology to Develop and Implement the Phoenix Sepsis Score

This approach to the development of the Phoenix Sepsis Score and the criteria for sepsis and septic shock is a reflection of the growing digitization of health care globally.²² Most of the vital signs, laboratory tests, and interventions included in the Phoenix Sepsis Score are routinely collected in most lower-resource settings and in nearly all higher-resource settings, according to the Pediatric Sepsis Definition Task Force’s international survey.²³ Even in settings where not all variables are available, the Phoenix Sepsis Score is designed to accurately identify children with sepsis. The score functions when not all variables are available because of its redundancy. Because the score has a possible range of 0 to 13 points, there are several ways to achieve the threshold of 2 points for sepsis diagnosis, as evidenced by the fact that patients with sepsis in both higher- and lower-resource settings had a median Phoenix Sepsis Score of 3 points. This feature was primarily assessed in the data sets from lower-resource settings. For example, although platelets were commonly measured at most sites, coagulation tests (eg, D-dimer and fibrinogen) were less frequently available. At lower-resource site 1, where platelet count was routinely measured but coagulation factors such as D-dimer and fibrinogen were not, the Phoenix Sepsis Score had excellent performance and the Phoenix sepsis criteria had higher sensitivity and PPV than the IPSCC sepsis and severe sepsis criteria. This makes the score and criteria readily translatable into EHR and other digital tools, such as web-based and mobile applications across differently resourced settings, even when some of the variables are not routinely collected.²⁴ Furthermore, digital implementation of the Phoenix Sepsis Score can enable longitudinal monitoring and provide clinicians and researchers with a tool to stratify severity of sepsis.

Additional considerations for the implementation and use of the Phoenix Sepsis Score and the novel criteria are discussed in the accompanying consensus criteria article.⁶

Limitations

This study has several limitations. First, retrospective data obtained from EHRs may have missing data and data entry errors. In this study, a robust quality assurance and harmonization process was developed and best practices were used to address outliers and missing data. However, not all errors or missing data can be reconciled. For example, at lower-resource site 2 in the development data set, which represents a lower- to middle-income country, respiratory support (eg, mechanical ventilation, Fio₂) and neurologic assessments (eg, level of consciousness and pupillary reaction) are performed but not recorded in the clinical information systems. This reduces the ability to assess the score and criteria at that site. In contrast, score performance was excellent at lower-resource site 1 and comparable with the higher-resource sites. This demonstrates the potential for score performance in lower-resource environments when these variables are recorded. Second, when deriving the stacked regression models, the Phoenix Sepsis Score, and the new criteria for sepsis and septic shock, a pragmatic approach was intentionally chosen, using the data as recorded during routine care as an indicator of how the criteria would perform in real-world implementations. However, it is acknowledged that some of the organ dysfunction measures used in the modeling process may not have reflected actual organ dysfunction, but rather were due to iatrogenic effects or clinician therapeutic choices, such as a lower Glasgow Coma Scale score in a patient receiving sedation or initiation of vasoactive medications in a patient with minimal cardiovascular dysfunction. Future work to determine the effects of these variables and clinician choices on the performance of the criteria is needed. Third, similar to the Sepsis-3 validation study, unique criteria for patients with chronic organ dysfunction were not developed.⁴ Fourth, few databases from lower-resource settings were available (a form of data poverty),²⁵ and the ones used may not be generalizable to every low-resource environment. Fifth, the data from higher-resource settings were exclusively from tertiary US pediatric centers. Sixth, the data sets from some of the sites included 10 years of data, possibly including changes in practice during that time frame.

Conclusions

The novel Phoenix sepsis criteria, which were derived and validated using a large international database of pediatric hospital encounters in higher- and lower-resource settings, had improved performance for the diagnosis of pediatric sepsis and septic shock compared with the existing IPSCC criteria.

Educational Objective: To identify the key insights or developments described in this article.

What was the primary outcome used in the development and validation of this clinical prediction tool?
1. A combination of intensive care unit admission, intubation, vasopressor support, extracorporeal membrane oxygenation, or death
2. Agreement by at least 2 of 3 pediatric intensivists that sepsis was present on postevent chart review
3. In-hospital mortality

The authors used stacked regression modeling for the derivation of this new clinical prediction tool. Why did they choose this approach?
1. Stacked regression allows many models to be used simultaneously, leveraging predictive power while maintaining a high degree of interpretability.
2. Stacked regression typically yields integer estimates of risk, permitting easy and obvious application in clinical settings.
3. The large size of this database overwhelmed alternative forms of machine learning, forcing selection of the less computationally intensive stacked regression.

According to the authors, how did this development of pediatric sepsis criteria differ from the derivation of the adult Sepsis-3 criteria?
1. Existing organ dysfunction scores and a large electronic health record database were used to develop the new criteria.
2. The database included a geographically and demographically diverse population from both higher- and lower-resource settings.
3. The implemented definition of sepsis combined suspected infection with life-threatening organ dysfunction.

Supplement 1.

eAppendix 1. Supplemental methods

eTable 1. Site Characteristics

eTable 2. Organ dysfunction scores and criteria used in the study

eFigure 1. Conceptual illustration of how stacked regression was used to develop the sepsis criteria

eFigure 2. Pipeline for data harmonization, data quality, and data analysis (A), and CONSORT-style flow diagram for encounters in the pipeline and the various analyses (B)

eFigure 3A-H. Subscore input availability and missingness among patients with suspected infection in higher resource settings

eFigure 4. Performance of the individual subscores for each organ system based on AUPRC and AUROC to predict mortality

eTable 3. Cohort characteristics of the development set stratified by infection status

eTable 4. Cohort characteristics of the development set stratified by infection status and site

eTable 5. Cohort characteristics of the external validation set stratified by infection status and site

eTable 6. Stacked regression coefficients of the 8-organ system ridge regression model and the 4-organ system LASSO model

eFigure 5. In-hospital mortality associated with the Phoenix Sepsis Score in patients with suspected infection in the first 24 hours at higher resource site 6 (the geographic external validation set)

eFigure 6A-J. AUPRC and AUROC curves for the four-organ system model

eFigure 7. Performance of the Phoenix Sepsis Score and organ dysfunction scores to predict early death or extracorporeal membrane oxygenation

eFigure 8A-B. Performance of the Phoenix Sepsis Score and other sepsis scores (A) and other organ dysfunction scores (B) to predict mortality across all thresholds

eFigure 9. The Phoenix-8 organ dysfunction score

eFigure 10. Sensitivity and Positive Predictive Value of the Phoenix and IPSCC criteria across outcomes and patient subgroups in the external validation sets

eTable 7. Diagnostic performance measures of the sepsis and septic shock criteria in the development set

eTable 8. Diagnostic performance measures of the Phoenix sepsis criteria across sensitivity analyses in the development set

eFigure 11A-B. Venn diagram of sepsis with remote organ dysfunction in the development set

eAppendix 2. Clinical vignettes with calculation of the Phoenix Sepsis Score and the Phoenix Sepsis Criteria

eReferences


Supplement 2.

Data Sharing Statement


References

1. Rudd KE, Johnson SC, Agesa KM, et al. Global, regional, and national sepsis incidence and mortality, 1990-2017. Lancet. 2020;395(10219):200-211. doi: 10.1016/S0140-6736(19)32989-7
2. Goldstein B, Giroir B, Randolph A, et al. International pediatric sepsis consensus conference: definitions for sepsis and organ dysfunction in pediatrics. Pediatr Crit Care Med. 2005;6(1):2-8. doi: 10.1097/01.PCC.0000149131.72248.E6
3. Weiss SL, Fitzgerald JC, Maffei FA, et al. Discordant identification of pediatric severe sepsis by research and clinical definitions in the SPROUT international point prevalence study. Crit Care. 2015;19(1):325. doi: 10.1186/s13054-015-1055-x
4. Seymour CW, Liu VX, Iwashyna TJ, et al. Assessment of clinical criteria for sepsis: for the Third International Consensus Definitions for Sepsis and Septic Shock (Sepsis-3). JAMA. 2016;315(8):762-774. doi: 10.1001/jama.2016.0288
5. Singer M, Deutschman CS, Seymour CW, et al. The Third International Consensus Definitions for Sepsis and Septic Shock (Sepsis-3). JAMA. 2016;315(8):801-810. doi: 10.1001/jama.2016.0287
6. Schlapbach LJ, Watson RS, Sorce LR, et al.; Society of Critical Care Medicine Pediatric Sepsis Definition Task Force. International consensus criteria for pediatric sepsis and septic shock. JAMA. Published online January 21, 2024. doi: 10.1001/jama.2024.0179
7. Zeng X, Yu G, Lu Y, et al. PIC, a paediatric-specific intensive care database. Sci Data. 2020;7(1):14. doi: 10.1038/s41597-020-0355-4
8. Feinstein JA, Russell S, DeWitt PE, et al. R package for pediatric complex chronic condition classification. JAMA Pediatr. 2018;172(6):596-598. doi: 10.1001/jamapediatrics.2018.0256
9. World Health Organization. Nutrition for Health and Development. WHO Child Growth Standards: Growth Velocity Based on Weight, Length and Head Circumference: Methods and Development. World Health Organization; 2009.
10. Ozenne B, Subtil F, Maucort-Boulch D. The precision-recall curve overcame the optimism of the receiver operating characteristic curve in rare diseases. J Clin Epidemiol. 2015;68(8):855-859. doi: 10.1016/j.jclinepi.2015.02.010
11. Saito T, Rehmsmeier M. The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLoS One. 2015;10(3):e0118432. doi: 10.1371/journal.pone.0118432
12. Wolpert DH. Stacked generalization. Neural Netw. 1992;5(2):241-259. doi: 10.1016/S0893-6080(05)80023-1
13. Breiman L. Stacked regressions. Mach Learn. 1996;24(1):49-64. doi: 10.1007/BF00117832
14. Proulx F, Fayon M, Farrell CA, et al. Epidemiology of sepsis and multiple organ dysfunction syndrome in children. Chest. 1996;109(4):1033-1037. doi: 10.1378/chest.109.4.1033
15. Matics TJ, Sanchez-Pinto LN. Adaptation and validation of a pediatric Sequential Organ Failure Assessment score and evaluation of the Sepsis-3 definitions in critically ill children. JAMA Pediatr. 2017;171(10):e172352. doi: 10.1001/jamapediatrics.2017.2352
16. Leteurtre S, Duhamel A, Salleron J, et al. PELOD-2: an update of the Pediatric Logistic Organ Dysfunction score. Crit Care Med. 2013;41(7):1761-1773. doi: 10.1097/CCM.0b013e31828a2bbd
17. Rousseaux J, Grandbastien B, Dorkenoo A, et al. Prognostic value of shock index in children with septic shock. Pediatr Emerg Care. 2013;29(10):1055-1059. doi: 10.1097/PEC.0b013e3182a5c99c
18. Khemani RG, Bart RD, Alonzo TA, et al. Disseminated intravascular coagulation score is associated with mortality for children with shock. Intensive Care Med. 2009;35(2):327-333. doi: 10.1007/s00134-008-1280-8
19. Haque A, Siddiqui NR, Munir O, et al. Association between vasoactive-inotropic score and mortality in pediatric septic shock. Indian Pediatr. 2015;52(4):311-313. doi: 10.1007/s13312-015-0630-1
20. Tharwat A. Classification assessment methods. Appl Comput Inform. 2020;17(1):168-192. doi: 10.1016/j.aci.2018.08.003
21. Scott HF, Brilli RJ, Paul R, et al. Evaluating pediatric sepsis definitions designed for electronic health record extraction and multicenter quality improvement. Crit Care Med. 2020;48(10):e916-e926. doi: 10.1097/CCM.0000000000004505
22. Wyber R, Vaillancourt S, Perry W, et al. Big data in global health. Bull World Health Organ. 2015;93(3):203-208. doi: 10.2471/BLT.14.139022
23. Morin L, Hall M, de Souza D, et al. The current and future state of pediatric sepsis definitions. Pediatrics. 2022;149(6):e2021052565. doi: 10.1542/peds.2021-052565
24. Jimenez-Zambrano A, Ritger C, Rebull M, et al. Clinical decision support tools for paediatric sepsis in resource-poor settings. BMJ Open. 2023;13(10):e074458. doi: 10.1136/bmjopen-2023-074458
25. Ibrahim H, Liu X, Zariffa N, et al. Health data poverty: an assailable barrier to equitable digital health care. Lancet Digit Health. 2021;3(4):e260-e265. doi: 10.1016/S2589-7500(20)30317-4

Supplementary Materials