A machine-learning approach for decision support and risk stratification of pediatric perioperative patients based on the APRICOT dataset

Geoffrey M Gray; Luis M Ahumada; Mohamed A Rehman; Anna Varughese; Allison M Fernandez; James Fackler; Hannah M Yates; Walid Habre; Nicola Disma; Hannah Lonsdale

doi:10.1111/pan.14694

Author manuscript; available in PMC: 2024 Oct 17.

Published in final edited form as: Paediatr Anaesth. 2023 May 21;33(9):710–719. doi: 10.1111/pan.14694

Geoffrey M Gray ¹, Luis M Ahumada ¹, Mohamed A Rehman ², Anna Varughese ², Allison M Fernandez ², James Fackler ³, Hannah M Yates ², Walid Habre ⁴, Nicola Disma ⁵, Hannah Lonsdale ³,*

¹Center for Pediatric Data Science and Analytics Methodology, Johns Hopkins All Children’s Hospital, St. Petersburg, Florida, USA

²Department of Anesthesia, Pain and Perioperative Medicine, Johns Hopkins All Children’s Hospital, St. Petersburg, Florida, USA

³Department of Anesthesia, Division of Pediatric Anesthesia, Vanderbilt University School of Medicine, Monroe Carell Jr. Children's Hospital at Vanderbilt, Nashville, Tennessee, USA

⁴Department of Anaesthesia, Pharmacology and Intensive Care, University Hospitals of Geneva, Switzerland

⁵Unit for Research & Innovation, Department of Anesthesia, IRCCS Istituto Giannina Gaslini, Genoa, Italy

Corresponding author: Dr. Hannah Lonsdale, Department of Anesthesia, Division of Pediatric Anesthesia, Vanderbilt University School of Medicine, Monroe Carell Jr. Children's Hospital at Vanderbilt, Nashville, TN, hannah.lonsdale@vumc.org

Author contributions

Geoffrey M. Gray helped design the study, process and analyze the data, prepare the manuscript and accepted the final manuscript.

Luis M. Ahumada helped design the study, process and analyze the data, prepare the manuscript and accepted the final manuscript.

Mohamed A. Rehman helped design the study, analyze the data, prepare the manuscript and accepted the final manuscript.

Anna Varughese helped design the study, analyze the data, prepare the manuscript and accepted the final manuscript.

Allison Fernandez helped design the study, analyze the data, prepare the manuscript and accepted the final manuscript.

James Fackler helped design the study, analyze the data, prepare the manuscript and accepted the final manuscript.

Hannah M. Yates helped design the study, analyze the data, prepare the manuscript and accepted the final manuscript.

Walid Habre helped to analyze the data, prepare the manuscript and accepted the final manuscript.

Nicola Disma helped to analyze the data, prepare the manuscript and accepted the final manuscript.

Hannah Lonsdale helped design the study, process and analyze the data, prepare the manuscript and accepted the final manuscript.

PMCID: PMC11485222 NIHMSID: NIHMS2026490 PMID: 37211981

Abstract

Background:

Pediatric anesthesia has evolved to a high level of patient safety, yet a small chance remains for serious perioperative complications, even in those traditionally considered at low risk. In practice, prediction of at-risk patients currently relies on the American Society of Anesthesiologists Physical Status (ASA-PS) score, despite reported inconsistencies with this method.

Aims:

The goal of this study was to develop predictive models that can classify children as low risk for anesthesia at the time of surgical booking and after anesthetic assessment on the procedure day.

Methods:

Our dataset was derived from APRICOT, a prospective observational cohort study conducted by 261 European institutions in 2014 and 2015. We included only the first procedure, ASA-PS classification I to III, and perioperative adverse events not classified as drug errors, reducing the total number of records to 30,325 with an adverse event rate of 4.43%. From this dataset, a stratified train:test split of 70:30 was used to develop predictive machine learning algorithms that could identify children in ASA-PS class I to III at low risk for severe perioperative critical events that included respiratory, cardiac, allergic, and neurological complications.

Results:

Our selected models achieved accuracies of >0.9, areas under the receiver operating characteristic curve (AUROC) of 0.6–0.7, and negative predictive values greater than 95%. Gradient boosting models were the best performing for both the booking phase and the day-of-surgery phase.

Conclusions:

This work demonstrates that prediction of patients at low risk of critical perioperative adverse events (PAEs) can be made on an individual, rather than population-based, level by using machine learning. Our approach yielded two models that accommodate wide clinical variability and, with further development, are potentially generalizable to many surgical centers.

Keywords: Machine learning, Artificial intelligence, Anesthesia, Pediatrics, Preoperative care

INTRODUCTION

Clinical practice has evolved to provide a high level of safety for children undergoing anesthesia. However, a small but clinically significant risk remains for complications that can lead to serious injury or even death. ¹ Clinicians may stratify patient risk with the widely used but subjective American Society of Anesthesiologists Physical Status (ASA-PS) Classification System, although it is not designed for this purpose. ² Most pediatric patients undergoing diagnostic and surgical procedures will be scored as ASA-PS I-III, with patients scored IV-V considered to be at the highest risk of perioperative events. Some guidelines for assignment of pediatric surgical patients to Ambulatory Surgery Centers (ASCs) and District General Hospitals (DGHs) use ASA-PS status I/II in their selection criteria³, but others advise against this and stress individual patient assessment. Some patients assigned ASA-PS III are potentially suitable for ambulatory care in both North America⁴ and Europe⁵,⁶. The decision is influenced by local variables and referral patterns⁷, and these children are still susceptible to critical perioperative adverse events (PAEs) that may require management at specialist in-patient facilities⁴.

Widely used methods to improve risk stratification center around isolated medical conditions such as sleep disordered breathing (the Snoring, Trouble Breathing, and Un-Refreshed (STBUR) questionnaire) ⁸, upper respiratory infections (COLDS score) ⁹, and perioperative mortality (Pediatric Risk Assessment (PRAm) score) ¹⁰,¹¹. These systems are simple to use and have been validated for use in the perioperative period, but their performance is limited by class imbalance: low prevalence of the event of interest and a large preponderance of patients who experienced uneventful anesthesia. Class imbalance produces a model with a low positive predictive value (PPV). Optimizing models instead for negative predictive potential may offer clinical utility. Accurate models that can be personalized to an individual and used easily at the point of care have not been achievable to date with traditional biostatistical methods and tools, partly due to class imbalance. Large, varied, and complete datasets are therefore necessary to yield clinically meaningful results.

The APRICOT (Anaesthesia PRactice In Children Observational Trial) study¹² focused on the incidence of severe perioperative critical events in children across Europe and, in so doing, created the largest dataset of its kind. The APRICOT group identified age, medical history, and physical condition as major risk factors for perioperative events and found that greater years of experience of the most senior anesthesia team member had a beneficial effect.

Machine learning is used increasingly for predicting outcomes in medicine¹³ because it can compute complex nonlinear relationships across multiple variables and generate tailored recommendations for individual patients on a precision medicine basis. ¹⁴ We leveraged the APRICOT dataset with machine learning techniques to develop models that can determine which pediatric patients in ASA-PS class I-III are at low risk for severe perioperative critical events. We identified two time points in the clinical pathway at which individualized risk stratification would have utility. The first of these—the time of surgical booking—would help identify the children who are safest to assign to a secondary care facility or ambulatory surgical center. The second time point—the day of surgery after anesthesia assessment—would assist with choices in care, such as level of supervision from the appropriate seniority of anesthesia provider.

METHODS

APRICOT dataset

APRICOT was a prospective observational multicenter cohort study conducted by 261 institutions in 33 European countries from April 1, 2014, to January 31, 2015. ¹² The dataset includes 30,874 children aged 0-16 years who were undergoing emergency or elective surgery and diagnostic procedures. The primary endpoint used for model development was based on the original APRICOT study. This encompassed the occurrence of “respiratory, cardiac, allergic, or neurological complications requiring immediate intervention and that led (or could have led) to major disability or death”. Drug errors were excluded for this work as they do not relate to underlying patient condition. The dataset comprises 31,127 anesthetic procedures with a critical perioperative adverse event rate of 4.7%. It includes details of patient demographics, preoperative assessment, intraoperative management, postoperative recovery, and, where applicable, management and outcome of the critical event. Table 1 outlines data elements identified as clinically relevant and therefore used in model training. A small subset of patients (188) had multiple procedures. Full details of the original data collection process have been reported previously, ¹² but the original APRICOT authors did not create a risk stratification tool using their dataset. For our study, use of the APRICOT dataset was considered non-human subjects research by the Johns Hopkins All Children’s Hospital Institutional Review Board. We report our findings using the Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD) guidelines. ¹⁵

Table 1.

Name and description of Input Features and the presence of these features in the reduced dataset along with the Pearson Correlation with Perioperative Adverse Events.

Feature	Feature used in model	Description	Pearson Correlation with Target Feature
Wheezing	B, S	History of wheezing in the last 12 months (Yes, No, NA)	0.041
Prematurity	B, S	Was the child premature (less than 37 weeks)? (Yes, No, Unknown)	−0.050
Family smokers	B, S	Do any family members smoke? (Yes, No, NA)	0.003
Anesthetic complications	B, S	History of previous anesthetic complications (Yes, No, NA)	0.034
Medication	B, S	Does the child take medications or natural remedies? (Yes, No, NA)	0.037
Handicap	B, S	Presence of metabolic/genetic disorder or neurological impairment (Yes, No, NA)	0.023
Age	B, S	Age of the patient (years)	−0.062
Sex	B, S	Sex of the patient (Male, Female)	0.007
Weight	B, S	Weight of the patient (kg)	−0.052
Surgical procedure	B, S	Is the procedure surgical? (Yes, No)	0.039
Nonsurgical procedure	B, S	Is the procedure nonsurgical? (Yes, No)	−0.039
Anesthesia type	B, S	General anesthesia or sedation (Sedation, General Anesthesia, None)	0.040
Inpatient/outpatient	B, S	Inpatient or outpatient procedure	0.07
Procedure time scheduled	B, S	Is the procedure scheduled during regular working hours or after hours/weekend? (Regular hours, After-hours/weekend)	−0.011
Asthma	B, S	History of any asthma diagnosis (Yes, No, NA)	0.011
Atopy	B, S	Symptoms of atopy in the last 12 months (Yes, No, NA)	0.016
Allergy	B, S	Any history of allergy (Yes, No, NA)	0.004
Snoring	B, S	Does the child snore? (Yes, No, NA)	0.021
ASA-PS score	S	ASA-PS score for the patient (I, II, III, IV, V)	0.080
Flu/cold	S	History of flu/cold in last 2 weeks (Yes, No, NA)	0.043
Fever	S	Presence of fever in last 24 hours (>38.5°C) (Yes, No, NA)	0.004
Urgency	S	Elective, urgent, or emergency surgery	0.010
Consultation	S	Face-to-face anesthesia consultation within 24 hours of procedure (Yes, No)	0.027
Type of anesthesia provider	S	Degree of specialization and role of provider (anesthesiologist with > 80% pediatric practice, anesthesiologist with 50% - 80% pediatric practice, anesthesiologist with < 50% pediatric practice, anesthesiologist in training, anaesthetic nurse, anaesthetic technician)	0.017
Experience	S	Years of experience of the anesthesia provider in charge of the case	−0.022
Premedication	S	Sedative premedication given (Yes, No)	−0.003
Parental presence at induction	S	Parent present during induction (Yes, No)	−0.039
Monitoring	S	Type of monitoring (Standard (ECG, SpO2, anaesthetic agent, capnography, NIBP, temp), Standard + (arterial central line), Standard ++ (NIRS, EEG derived data), Standard minus (one equipment missing))	−0.029
Induction type	S	Inhalation, intravenous, or intramuscular	−0.012
Airway interface type	S	Facemask, endotracheal tube (ETT), or supraglottic airway (SGAW) (Face Mask, SGAW, ETT, Other, None)	0.005
Fluids	S	Fluids given? (Yes, No)	0.083
Regional Anesthesia	S	Was regional anesthesia used? (Yes, No)	0.001
Ventilator Type	S	Type of ventilator (Spontaneous ventilation, pressure support ventilation, mechanical ventilation)	0.007

B indicates that the feature was used in the booking model; S indicates that it was used in the day-of-surgery model.

ASA-PS: American Society of Anesthesiologists Physical Status.

Statistical methods and outcome

Scikit-learn and Microsoft Azure Machine Learning Studio (Redmond, WA) were used for model training and testing. ¹⁶ We utilized the APRICOT dataset to develop predictive machine learning algorithms that could help clinicians determine low-risk status in children using only the information that would be available at the time of model use. Model development consisted of three main stages: data pre-processing, model training using the training dataset, and model evaluation using the test dataset.

Enhanced technical details of the data processing and model development process can be found in table 1 of the supporting materials.

Data pre-processing

Feature selection

We included only the first procedure in the dataset for each child, only children with ASA-PS scores I-III, and only PAEs not classified as drug errors. This selection reduced the total number of records to 30,325 with an adverse event rate of 4.43% (Fig 1). Two sets of features were selected with input from subject matter experts, to ensure the clinical utility of the model (Table 1). The first set comprised features available at surgical booking. The second set added information available on the day of surgery. A train:test split of 70:30 was assigned in a stratified manner. Stratification is important to preserve the distribution of PAEs when fewer than 5% of records in the dataset have the outcome of interest. ¹⁷ Using a stratified approach preserved the PAE rate (4.43%) in both the training and test sets.
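As an illustration of the stratified split described above, the following is a minimal sketch in plain Python. The function name `stratified_split`, the seed, and the synthetic labels are assumptions made for this example; it is not the authors' pipeline, which used Scikit-learn and Azure Machine Learning Studio.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_fraction=0.3, seed=0):
    """Return (train_idx, test_idx) with every class split in the same proportion."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)                      # randomize within each class
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Synthetic labels for illustration only: 50 events among 1,000 records.
labels = [1] * 50 + [0] * 950
train_idx, test_idx = stratified_split(labels)
train_rate = sum(labels[i] for i in train_idx) / len(train_idx)
test_rate = sum(labels[i] for i in test_idx) / len(test_idx)
print(len(train_idx), len(test_idx), train_rate, test_rate)
```

Because each class is split separately, the event rate is the same in both partitions. In Scikit-learn the same effect is usually obtained by passing `stratify=y` to `train_test_split`.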

Figure 1. Flow chart showing data selection from the original APRICOT dataset.

Data scaling

The presence of different ranges for data can introduce bias into a model, favoring features with larger numerical values. To correct for this bias, data scaling can be used. Two common approaches are Min-Max (Normalization) and standard scaler (Standardization). Standard scaling was chosen to preserve the original normal distributions.
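The two scaling approaches named above can be compared on a small example. This is a sketch only: the weights are hypothetical values, and the helper names `standardize` and `min_max` are introduced here for illustration.

```python
from statistics import mean, pstdev

def standardize(values):
    """Standard scaling: subtract the mean, divide by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def min_max(values):
    """Min-max normalization: map the smallest value to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

weights_kg = [4.0, 12.0, 20.0, 36.0, 68.0]  # hypothetical patient weights
z = standardize(weights_kg)
mm = min_max(weights_kg)
print([round(v, 2) for v in z])
print([round(v, 2) for v in mm])
```

Standard scaling shifts and rescales a feature without changing the shape of its distribution, whereas min-max normalization forces every feature into the same fixed interval.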

Multiple imputation

Missing data were imputed using Multiple Imputation with Chained Equations (MICE), a variation of multiple imputation. ¹⁸ Because each variable is independently iterated, Multiple Imputation with Chained Equations is more efficient than other multiple imputation methods and is generally preferred for large datasets. Additionally, this technique estimates missing values using the existing probability distributions for all available features, providing results superior to those of simpler approaches.
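The chained-equations idea can be sketched with two columns. The example below is a deliberately simplified, deterministic version: it cycles a linear regression over each incomplete column and produces a single completed dataset, whereas full MICE draws imputations from predictive distributions and generates several datasets. The column names, values, and helper functions are hypothetical.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def chained_impute(age, weight, n_iter=10):
    """Deterministic chained-equations imputation for two columns (None = missing)."""
    cols = {"age": list(age), "weight": list(weight)}
    missing = {k: [i for i, v in enumerate(col) if v is None] for k, col in cols.items()}
    # Step 0: fill each gap with the observed mean of its column.
    for k, col in cols.items():
        fill = mean(v for v in col if v is not None)
        for i in missing[k]:
            col[i] = fill
    # Then cycle: regress each column on the other and refresh its imputed cells.
    for _ in range(n_iter):
        for target, predictor in (("weight", "age"), ("age", "weight")):
            observed = [i for i in range(len(cols[target])) if i not in missing[target]]
            a, b = fit_line([cols[predictor][i] for i in observed],
                            [cols[target][i] for i in observed])
            for i in missing[target]:
                cols[target][i] = a + b * cols[predictor][i]
    return cols["age"], cols["weight"]

# Hypothetical records following weight = 2 * age + 8, with one gap per column.
age = [1.0, 2.0, 3.0, 4.0, None]
weight = [10.0, 12.0, None, 16.0, 18.0]
age_filled, weight_filled = chained_impute(age, weight)
print(round(age_filled[4], 2), round(weight_filled[2], 2))
```

Each pass uses the current imputations of one column to improve the other, which is why the method is described as chained.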

Class imbalance

With an incidence rate of 4.7% for severe perioperative critical events, the dataset had a high degree of class imbalance (~20:1). This imbalance can make it difficult for the model to learn the minority class, which is often the class of interest. Therefore, we used the synthetic minority oversampling technique (SMOTE) to reduce the class imbalance. ¹⁷ SMOTE is a data generation technique that creates synthetic data using real data as input. By increasing the proportion of the minority class, algorithms are better able to learn that class. SMOTE was applied to the training set to reduce the majority:minority ratio to 10:1, and a static validation set was used. ¹⁷
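The core of SMOTE is interpolation between a minority-class record and one of its nearest minority-class neighbours. The sketch below illustrates that step in plain Python; the function name `smote`, the neighbour count `k`, the seed, and the data points are assumptions for this example and do not reproduce the study's configuration.

```python
import math
import random

def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority-class neighbours of the chosen point (excluding itself)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic

# Hypothetical two-feature minority records and a 20:1 starting imbalance.
minority = [(1.0, 10.0), (2.0, 12.0), (3.0, 15.0), (4.0, 18.0)]
n_majority = 80
target_ratio = 10  # desired majority:minority ratio after oversampling
n_new = n_majority // target_ratio - len(minority)
synthetic = smote(minority, n_new)
print(n_majority / (len(minority) + len(synthetic)))
```

Every synthetic record lies on a line segment between two real minority records, so oversampling adds plausible variation rather than exact duplicates.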

Model training

Along with synthetic data, several other approaches can improve learning on the minority class. These include the model type (algorithm) and optimization metric (F1 score). The F1 score is the harmonic mean of precision (PPV) and recall (true positive rate). Because the F1 metric is derived from both PPV and recall, both are optimized concurrently during training. Compared with other metrics, this better helps the model learn positive examples. Gradient boosting provided the best performance for both datasets and was chosen for further optimization. Gradient boosting iteratively builds multiple decision trees, known as weak learners, and combines their outputs into a final prediction. After the addition of each weak learner, a weight for that learner is calculated using the residuals. This helps improve performance when class imbalance is high, by more favorably weighting weak learners that perform better on the minority class.
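To make the iterative idea concrete, here is a minimal sketch of gradient boosting with one-split decision trees (stumps) on a single feature. It assumes a squared-error loss and a fixed learning rate, which is a simplification and not the authors' implementation or hyperparameters; all names and data are hypothetical.

```python
def fit_stump(xs, residuals):
    """Best single-threshold split on one feature, minimizing squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean

def gradient_boost(xs, ys, n_rounds=50, learning_rate=0.3):
    """Fit each new stump to the residuals left by the ensemble built so far."""
    base = sum(ys) / len(ys)
    stumps = []
    preds = [base] * len(ys)
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for x, p in zip(xs, preds)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)

# Hypothetical data: the outcome switches from 0 to 1 when the feature exceeds 4.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
model = gradient_boost(xs, ys)
print(round(model(2), 3), round(model(7), 3))
```

Each round corrects part of the error left by the previous rounds, so many weak learners combine into a much stronger predictor.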

Model evaluation

While the F1 score was chosen for optimization of the model during training, multiple metrics were used to evaluate model performance on the test set. These include AUROC, PPV, NPV, accuracy, sensitivity and specificity. Using multiple metrics provides a more complete approach to assessing model generalizability.

Feature importance

Feature ranking identifies elements that contribute the most to model decision-making and is an important step to providing clinical insights and an element of model transparency. We used permutation feature importance to assess the contribution of each feature to the prediction of a perioperative adverse event (PAE). In this approach, each column is randomly shuffled along the rows and the final metric (F1 score) is assessed. The feature ranking score is then determined by taking the difference between the original metric and the newly calculated metric. Positive feature importance scores indicate that the feature improved the prediction. Negative feature importance scores indicate that the feature was detrimental to the prediction. A score of zero indicates no effect on predicting the outcome. For each model, we performed five permutations with different random seeds for the shuffling and used the average for each.
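The permutation procedure described above can be sketched as follows. The model, data, and function names are hypothetical; only the procedure (shuffle one column, recompute the F1 score, subtract from the original score, and average over five seeds) follows the text.

```python
import random

def f1_score(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def permutation_importance(predict, rows, y_true, column, seeds=(0, 1, 2, 3, 4)):
    """Mean drop in F1 after shuffling one column, averaged over several seeds."""
    baseline = f1_score(y_true, [predict(r) for r in rows])
    drops = []
    for seed in seeds:
        shuffled = [r[column] for r in rows]
        random.Random(seed).shuffle(shuffled)
        permuted = [r[:column] + (v,) + r[column + 1:] for r, v in zip(rows, shuffled)]
        drops.append(baseline - f1_score(y_true, [predict(r) for r in permuted]))
    return sum(drops) / len(drops)

# Hypothetical model: flags a record when the first feature exceeds 0.5
# and ignores the second feature entirely.
predict = lambda row: 1 if row[0] > 0.5 else 0
rng = random.Random(42)
rows = [(rng.random(), rng.random()) for _ in range(200)]
y_true = [predict(r) for r in rows]
imp0 = permutation_importance(predict, rows, y_true, 0)
imp1 = permutation_importance(predict, rows, y_true, 1)
print(round(imp0, 3), imp1)
```

The informative feature receives a positive score because shuffling it degrades the F1 score, while the ignored feature scores exactly zero.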

RESULTS

Two machine learning models were produced that identify those children in ASA-PS class I-III who are at low risk for severe PAEs. The receiver operating characteristic curves for each top performing model are shown in Figure 3. The AUROC was 0.618 for the booking model and 0.722 for the day-of-surgery model. Gradient boosting models were the best performing for both the booking phase and the day-of-surgery phase and achieved accuracies of >0.9, AUROCs of 0.6–0.7, and negative predictive values (NPVs) greater than 95%. These models were built using data derived from European practice standards and would likely require additional calibration for use in non-European medical institutions.

Figure 3. Receiver operating characteristic curves for each model. Diagonal gray line indicates a random selection (AUROC 0.5), and the curves above the line represent better performance.

Pearson correlation coefficients were used to quantify feature interdependence for the entire subset of the APRICOT dataset (Figure 2, Supporting Materials 1). Data elements are described in Table 1. Snoring and wheezing were individually positively correlated with asthma, atopy, fever, flu/cold, and family smoking. Snoring and wheezing were also positively correlated with each other. Both asthma and atopy were individually positively correlated with allergy and anesthetic complications. Age and weight were both individually positively correlated with induction type. Urgency and prematurity were negatively correlated with one another. Negative correlations were also identified between fluids and parental presence at induction; snoring and prematurity; wheezing and prematurity; and family smoking and prematurity. Most of these correlations were not strong (<0.7), with only age/weight and surgical/nonsurgical being strongly correlated.

The confusion matrix results (Table 2) clearly show that the models can learn the negative (non-PAE) outcome better than the PAE outcome. Further detail can be found in table 2 of the supporting materials.

Table 2.

Confusion matrices for the time-of-booking and day-of-surgery models

Model / predicted class	PAE (label)	No PAE (label)	Total	Accuracy	Sensitivity	Specificity	PPV	NPV
Booking				95.2%	1.2%	99.6%	11.9%	95.6%
Predicted PAE	5	37	42
Predicted no PAE	398	8658	9056
Total	403	8695	9098
Day of surgery				95.4%	2.7%	99.7%	30.6%	95.7%
Predicted PAE	11	25	36
Predicted no PAE	392	8670	9062
Total	403	8695	9098

PAE: perioperative adverse event; NPV: negative predictive value; PPV: positive predictive value.
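The summary metrics in Table 2 can be recomputed from the four cell counts of each confusion matrix. The sketch below reads the table rows as model predictions and the columns as true labels, an interpretation that reproduces the reported percentages. The F1 value it prints is derived here from those counts and is not reported in the article.

```python
def metrics(tp, fp, fn, tn):
    """Standard confusion-matrix metrics, returned as fractions."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # recall, true positive rate
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),           # precision
        "npv": tn / (tn + fn),
        "f1": 2 * tp / (2 * tp + fp + fn),  # derived here, not reported in Table 2
    }

# Cell counts taken from Table 2 (test set of 9,098 records).
booking = metrics(tp=5, fp=37, fn=398, tn=8658)
day_of_surgery = metrics(tp=11, fp=25, fn=392, tn=8670)

for name, m in (("Booking", booking), ("Day of surgery", day_of_surgery)):
    print(name, {k: f"{100 * v:.1f}%" for k, v in m.items()})
```

The contrast between the very high specificity and NPV and the very low sensitivity shows why the models are framed as tools for identifying low-risk patients rather than for detecting high-risk ones.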

Booking Model

Permutation feature importance scores are shown in Figure 4. For the booking model, the top five predictors were patient age, family history of smoking, procedure time, snoring, and sex. Family smoking, use of any medications, previous anesthetic complications, history of handicap, and anesthesia type were also predictors. Surgical procedure, patient type, wheezing, atopy and asthma were not significant predictors. The most important predictors within standard deviation were procedure time, family history of smoking, patient age, anesthesia type, and snoring. That these well-known predictors carry most weighting for the models may reassure users in the early stages of future model implementation.

Day of Surgery Model

For the day-of-surgery model, patient age, ASA-PS score, anesthesia type, asthma, and nonsurgical procedure were the top five predictors. Urgency, parental presence at induction, procedure time, patient type, weight, and sex were all positive predictors. These findings are consistent with previous studies and indicate that the model is stratifying patients based on clinically relevant factors. This helps to further validate the model’s decision-making. The remaining features had negative feature permutation scores, indicating that they are not important predictors. The most important predictors within standard deviation were ASA-PS score, patient age, anesthesia type, procedure time, and nonsurgical procedure.

DISCUSSION

We developed machine-learning models to prospectively classify ASA-PS I-III pediatric patients aged 0-16 years as low risk for perioperative adverse events (PAEs). ASA-PS I-III was used as a surrogate marker for patients potentially suitable for care at ASCs and DGHs, or by providers who are not fellowship-trained pediatric anesthesiologists. These models were trained using only data that would be available at each of two time points in the clinical workflow—at surgical booking and on the day of surgery. The top performing models achieved accuracies of >0.9, AUROCs of 0.6–0.7, and negative predictive values (NPVs) greater than 95%. Generally, an AUROC of 0.7 is desirable as a benchmark for clinical performance. ¹⁹,²⁰ Because of the high class imbalance, models can improve the optimization metric by favoring predictions of the majority class. This results in a much higher predictive value for the majority class and poses challenges in predicting the minority class.

Comparison to Current Risk Assessment Tools

ASA-PS classification is a standard tool that has been used globally for assessment of anesthesia risk, ²¹ but its reliability in both risk prediction and inter-observer variation has long been questioned. ²² Its poor reliability is more pronounced in children than in adults. ²³⁻²⁵ In addition, ASA-PS does not consider the impact of surgery. ²⁶ No objective model to stratify anesthesia risk in children is in widespread clinical use, ²⁷ although several have been developed.¹² Malviya et al ²⁸ developed the NARCO-SS tool using a “review of systems” approach, relying on clinical judgement rather than objective data analysis. The score places the patient on a four-point risk scale; however, Malviya et al did not report PPV, sensitivity, or specificity of the score as a screening tool for PAEs. Nasr et al²⁹ developed and validated¹¹ the Pediatric Risk Assessment score (PRAm), which uses multivariable logistic regression to predict perioperative mortality in children undergoing noncardiac surgery and achieved an impressive AUROC of 0.943. Because mortality is a specific and rare outcome, it is heavily imbalanced in a dataset. This led to reported PPVs for PRAm scores ≥6 of 2.57% to 6.03%. Faraoni and colleagues³⁰ developed a risk stratification score for children with congenital heart disease undergoing noncardiac surgery and reported an AUROC of 0.831; however, no PPV is reported.

A tool with low PPV has limited clinical utility as a predictor of high-risk patients; however, a tool optimized for high NPV may help identify low-risk patients. The high NPVs of our model demonstrate a potential future clinical utility of the booking model, but its AUROC score of 0.626 currently falls short of the generally accepted performance of 0.7. With an AUROC of 0.722 and predictors that are well established, the day-of-surgery model demonstrates better performance. Our models incorporate more detailed information than is used to assign an ASA-PS score, delivering an objective risk assessment that is further individualized to the patient.

Feature Importance

Feature importance analysis (Figure 4) identifies patient age, ASA-PS score, anesthesia type, asthma and nonsurgical procedure as the strongest global predictors for PAEs in the day-of-surgery model. Among these, anesthesia type may be modifiable at the procedure level and asthma may be optimized. Additionally, given the importance of ASA-PS score for the surgery model, incorporating this into decision-making at the time of booking may be of further interest if the surgical team are appropriately trained in ASA-PS scoring. The feature importance analysis provides the global feature importance for the dataset but does not identify local feature importance values for individual patients. This means that the trends observed are not necessarily applicable to all patient subpopulations. Additionally, feature importance does not identify predictor correlation. For example, modifying the airway management technique may not lead to a change in perioperative risk for every child.

Model Limitations

Accurate reporting of preoperative history, ASA-PS classification, and inclusion of PAEs in the dataset are influenced by the individual anesthesiologist for each case. Errors or exclusions may lead to inaccurate input details, which could affect model performance. The APRICOT PAE criteria also led to the inclusion of adverse events that some anesthesiologists would consider nonsignificant. The inclusion of these “critical events” may produce a prediction that is overly sensitive for the needs of some clinicians, although this problem could be partially overcome by adjusting the model threshold. Each of the records in the APRICOT dataset documents the current working practice of the individual anesthesiologist managing the case. These practices may not align with current best practice, or the practice of an anesthesiologist in a given country, working culture, patient population, or center. Such mismatches may further limit the generalizability of the models to individual institutions; the models should therefore be locally validated before implementation at an institution to ensure that their performance meets clinical requirements. Finally, despite being the largest of its kind, the dataset is also relatively small in machine learning terms, and, despite the use of synthetic data, this further limits model performance.

Effective prediction of high-risk patients (higher PPV) would require a larger, more granular dataset. The APRICOT dataset is valuable because of the scope of data collection. However, the hand-collected data are considered low fidelity when compared with current electronic record-keeping and automated data capture, and do not include important factors such as ethnicity and socioeconomic status, which are known to correlate with perioperative respiratory adverse events³¹. This, in part, contributes to the limitations of model performance. For example, height was missing from 41.6% of patient records and therefore was not included as an input to model development. This prohibited the use of BMI in model development, despite obesity being a common and known risk factor for PAEs. Obese and severely obese children are also frequently under-classified in the ASA-PS system ³² and may inappropriately be allocated to a non-tertiary center for surgery. Additional examples of more granular data that may improve the performance of a predictive model include details of acute symptoms, duration of fasting, liver and pancreatic diseases, diabetes, congenital heart disease, kidney injury, immunosuppression, and preoperative use of corticosteroids.

Future applications

Performance of our models is similar to that of the COLDS score (AUROC 0.69) used to risk-stratify patients with upper respiratory tract infections and could eventually serve in a similar capacity.⁹ Validation against a prospectively collected external dataset of similar patients would demonstrate the extent of model generalizability to new, similar, patient populations. If the population differs significantly from the data used for model training and evaluation, performance could decrease. If the model demonstrates good generalizability, then use of this model could serve as an additional clinical datapoint for traditionally lower risk patients and warrants further study. The model could be integrated into the clinical workflow to serve as decision support,³³ with additional model calibration and revalidation for the specific clinical implementation as necessary.

The time-of-booking model, once refined, could be used as part of the assessment process for the surgical team, to support the allocation of patients to an ambulatory care center for their procedure. Identification of a patient as low risk by the model would provide additional reassurance to the surgical team that booking at an ASC is appropriate. Any patient not assessed as low risk by the model could be highlighted to the surgical team for additional consideration. This may lead to one of three outcomes: book the patient at the ASC as planned, book the patient at a tertiary center, or refer to an anesthesia preoperative service for further guidance.

On the day of surgery, a model output of “low risk” for an individual patient could provide additional reassurance to the perioperative team, informing their decision-making in areas that include degree of supervision of trainees, use of benzodiazepines for pre-medication, choice of airway adjunct, and deep vs. awake extubation.

The long-term aims of model development and implementation therefore might be (1) to provide additional information to support clinician decision-makers to designate children as low risk for treatment at non-tertiary surgical sites, such as ASCs, satellite sites, or DGHs, and (2) to support decision-making on the day of surgery. Several challenges to deploying this model exist. These include differences in population between the patients used to build the model and those present in the pipeline, differences in anesthesia practice (including the data collected preoperatively), and the potential for changing populations over time (data drift).

A further utility of the models is in identification of potential factors for clinical study or practical considerations. Future work can consider individual predictors to further explain model decisions. This will build clinical trust and can be guided by examples such as the United States Food and Drug Administration Artificial Intelligence and Machine Learning in Software as a Medical Device Action Plan.³⁴

Conclusion

This work demonstrates that prediction of patients at low risk of critical PAEs can be made on an individual level by using machine learning. Predictions generated based on individual patient factors can avoid the high variability and potentially misleading general population risk estimates generated by the ASA-PS. Our approach yielded two machine learning models that accommodate wide clinical variability and, with further development, are potentially generalizable to many surgical centers. Future work could focus on developing a more granular risk stratification capable of informing providers of an accurate perioperative risk level for all patients. Future models could be integrated into the clinical workflow.

Supplementary Material

Supporting materials

NIHMS2026490-supplement-Supporting_materials.docx (27.3 KB, docx)

ACKNOWLEDGEMENTS

The team would like to express its thanks to Medical Librarian Pamela Williams, MS, MLS, AHIP, for her expertise and assistance in performing medical literature searches. Claire F. Levine, MS, ELS, Department of Anesthesiology and Critical Care Medicine, Johns Hopkins University School of Medicine, and Yvonne Poindexter, MA, Editor, Department of Anesthesiology, Vanderbilt University Medical Center, edited the text of this article in manuscript form.

Glossary of Terms

APRICOT: Anaesthesia PRactice In Children Observational Trial
ASA: American Society of Anesthesiologists
ASA-PS: American Society of Anesthesiologists Physical Status
AUROC: area under the receiver operating characteristic curve
FN: false negative
FP: false positive
MI: multiple imputation
MICE: Multiple Imputation with Chained Equations
NPV: negative predictive value
PAE: perioperative adverse event
PPV: positive predictive value
SMOTE: synthetic minority oversampling technique
TRIPOD: Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis
ESAIC: European Society of Anaesthesiology and Intensive Care

Footnotes

Disclosures: This study was funded internally by the Department of Anesthesia and Pain Medicine at Johns Hopkins All Children’s Hospital. The APRICOT dataset was used with approval from the APRICOT steering committee and the European Society of Anaesthesiology and Intensive Care (ESAIC) and was originally entirely sponsored by a grant from their clinical trial network. The ESAIC was not involved in this work or in the original study’s data analysis and interpretation.

Conflicts of interest: None

Data availability statement:

The data are not publicly available; requests for access should be made via the European Society of Anaesthesiology and Intensive Care.

REFERENCES

1. Kurth CD, Tyler D, Heitmiller E, Tosone SR, Martin L, Deshpande JK. National pediatric anesthesia safety quality improvement program in the United States. Anesth Analg. 2014;119(1):112–121.
2. Owens WD. American Society of Anesthesiologists Physical Status Classification System is not a risk classification system. Anesthesiology. 2001;94(2):378–378.
3. Hartley B, Powell S, Bew S. Safe delivery of paediatric ENT surgery in the UK: a national strategy 2019. Accessed [2023-02-14]. https://www.entuk.org/news_and_events/news/77/safe_delivery_of_paediatric_ent_surgery_in_the_uk_a_national_strategy/
4. Whippey A, Kostandoff G, Ma HK, Cheng J, Thabane L, Paul J. Predictors of unanticipated admission following ambulatory surgery in the pediatric population: a retrospective case–control study. Paediatr Anaesth. 2016;26(8):831–837.
5. Junger A, Klasen J, Benson M, et al. Factors determining length of stay of surgical day-case patients. Eur J Anaesthesiol. 2001;18(5):314–321.
6. Green Z, Woodman N, McLernon DJ, Engelhardt T. Incidence of paediatric unplanned day-case admissions in the UK and Ireland: a prospective multicentre observational study. Br J Anaesth. 2020;124(4):463–472.
7. Rabbitts JA, Groenewald CB, Moriarty JP, Flick R. Epidemiology of Ambulatory Anesthesia for Children in the United States: 2006 and 1996. Anesth Analg. 2010;111(4):1011–1015.
8. Tait AR, Voepel-Lewis T, Christensen R, O'Brien LM. The STBUR questionnaire for predicting perioperative respiratory adverse events in children at risk for sleep-disordered breathing. Paediatr Anaesth. 2013;23(6):510–516.
9. Lee LK, Bernardo MKL, Grogan TR, Elashoff DA, Ren WHP. Perioperative respiratory adverse event risk assessment in children with upper respiratory tract infection: Validation of the COLDS score. Paediatr Anaesth. 2018;28(11):1007–1014.
10. Nasr VG, DiNardo JA, Faraoni D. Development of a pediatric risk assessment score to predict perioperative mortality in children undergoing noncardiac surgery. Anesth Analg. 2017;124(5):1514–1519.
11. Valencia E, Staffa SJ, Faraoni D, DiNardo JA, Nasr VG. Prospective External Validation of the Pediatric Risk Assessment Score in Predicting Perioperative Mortality in Children Undergoing Noncardiac Surgery. Anesth Analg. 2019;129(4):1014–1020.
12. Habre W, Disma N, Virag K, et al. Incidence of severe critical events in paediatric anaesthesia (APRICOT): a prospective multicentre observational study in 261 hospitals in Europe. Lancet Respir Med. 2017;5(5):412–425.
13. Lonsdale H, Jalali A, Ahumada L, Matava C. Machine learning and artificial intelligence in pediatric research: current state, future prospects, and examples in perioperative and critical care. J Pediatr. 2020;221:S3–S10.
14. Lonsdale H, Gray GM, Ahumada LM, Yates HM, Varughese A, Rehman MA. The Perioperative Human Digital Twin. Anesth Analg. 2022;134(4):885–892.
15. Collins GS, Reitsma JB, Altman DG, Moons KG. Transparent reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD): the TRIPOD statement. J Clin Epidemiol. 2015;68(2):134–143.
16. Pedregosa F, Varoquaux G, Gramfort A, et al. Scikit-learn: Machine learning in Python. J Mach Learn Res. 2011;12(Oct):2825–2830.
17. Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP. SMOTE: Synthetic minority oversampling technique. Journal of Artificial Intelligence Research. 2002;16:321–357.
18. White IR, Royston P, Wood AM. Multiple imputation using chained equations: Issues and guidance for practice. Stat Med. 2011;30(4):377–399.
19. Pierpont GL, Parenti CM. Physician risk assessment and APACHE scores in cardiac care units. Clin Cardiol. 1999;22(5):366–368.
20. Allaudeen N, Schnipper JL, Orav EJ, Wachter RM, Vidyarthi AR. Inability of providers to predict unplanned readmissions. J Gen Intern Med. 2011;26(7):771–776.
21. Saklad M. Grading of patients for surgical procedures. Anesthesiology. 1941;2(3):281–284.
22. Haynes SR, Lawler PG. An assessment of the consistency of ASA physical status classification allocation. Anaesthesia. 1995;50(3):195–199.
23. Jacqueline R, Malviya S, Burke C, Reynolds PI. An assessment of interrater reliability of the ASA physical status classification in pediatric surgical patients. Pediatr Anesth. 2006;16(9):928–931.
24. Burgoyne LL, Smeltzer MP, Pereiras LA, Norris AL, De Armendi AJ. How well do pediatric anesthesiologists agree when assigning ASA physical status classifications to their patients? Pediatr Anesth. 2007;17(10):956–962.
25. Tollinche LE, Yang G, Tan K-S, Borchardt R. Interrater variability in ASA physical status assignment: an analysis in the pediatric cancer setting. J Anesth. 2018;32(2):211–218.
26. Nasr VG, Staffa SJ, Zurakowski D, DiNardo JA, Faraoni D. Pediatric Risk Stratification Is Improved by Integrating Both Patient Comorbidities and Intrinsic Surgical Risk. Anesthesiology. 2019;130(6):971–980.
27. Aplin S, Baines D, De Lima J. Use of the ASA Physical Status Grading System in pediatric practice. Pediatr Anesth. 2007;17(3):216–222.
28. Malviya S, Voepel-Lewis T, Chiravuri SD, et al. Does an objective system-based approach improve assessment of perioperative risk in children? A preliminary evaluation of the ‘NARCO’. Br J Anaesth. 2011;106(3):352–358.
29. Nasr VG, DiNardo JA, Faraoni D. Development of a pediatric risk assessment score to predict perioperative mortality in children undergoing noncardiac surgery. Anesth Analg. 2017;124(5):1514–1519.
30. Faraoni D, Vo D, Nasr VG, DiNardo JA. Development and validation of a risk stratification score for children with congenital heart disease undergoing noncardiac surgery. Anesth Analg. 2016;123(4):824–830.
31. Tariq S, Syed M, Martin T, Zhang X, Schmitz M. Rates of Perioperative Respiratory Adverse Events Among Caucasian and African American Children Undergoing General Anesthesia. Anesth Analg. 2018;127(1):181–187.
32. Burton ZA, Lewis R, Bennett T, et al. Prevalence of PErioperAtive CHildhood obesitY in children undergoing general anaesthesia in the UK: a prospective, multicentre, observational cohort study. Br J Anaesth. 2021;127(6):953–961.
33. Hofer IS, Burns M, Kendale S, Wanderer JP. Realistically integrating machine learning into clinical practice: a road map of opportunities, challenges, and a potential future. Anesth Analg. 2020;130(5):1115.
34. Artificial intelligence and machine learning in software as a medical device action plan. Silver Spring: U.S. Food and Drug Administration. 2019. Accessed [2023-02-14]. https://www.fda.gov/medical-devices/software-medical-device-samd/artificial-intelligence-and-machine-learning-software-medical-device
