Sage Open Pediatrics. 2026 May 11;13:30502225261445743. doi: 10.1177/30502225261445743

Artificial Intelligence-Driven Triage in Pediatric Emergency Departments: Accuracy, Bias, and Impact on Clinical Outcomes: A Narrative Review

Eslam Abady 1, Mandy Elewa 2, Habiba Abdelhameed Elrefaey 1, Kevin Thomas Mathew 3, Panos Tamvakologos 4, Kayleigh Kuhn 5, Mohammed Alsabri 6
PMCID: PMC13168721  PMID: 42137483

Abstract

AI-driven triage presents a transformative opportunity to address persistent challenges in pediatric emergency care, from overcrowding and waiting times to human error and outcome disparities. This narrative review demonstrates that AI systems can achieve high accuracy in predicting critical outcomes, with pooled AUROCs of 0.87 for hospital admission, 0.93 for ICU admission, and 0.93 for mortality, significantly outperforming traditional triage scales, while observational studies report associations with improved efficiency, reduced triage errors, and enhanced resource allocation. However, publication bias favoring positive results affects the available evidence, and studies reporting no benefit or performance degradation exist. The promise of AI is tempered by significant challenges: performance varies across pediatric subgroups, the risks of perpetuating and amplifying bias remain inadequately addressed, and workflow integration and medico-legal liability require careful navigation. AI should augment clinical judgment, guided by robust governance frameworks, fairness auditing, and human oversight, to deliver more equitable emergency care.

Keywords: artificial intelligence, machine learning, pediatric emergency medicine, triage, clinical decision support, health equity, bias, natural language processing, implementation science, narrative review

Introduction

Emergency department (ED) crowding represents a critical challenge in healthcare systems worldwide, with pediatric emergency departments (PEDs) facing unique pressures. 1 Children experiencing prolonged waits are at increased risk of adverse outcomes, including higher rates of left-without-being-seen and potential clinical deterioration. 2 The fundamental role of triage, prioritizing care based on acuity, becomes increasingly vital in these high-pressure environments. Traditional triage systems such as the Emergency Severity Index (ESI) and Canadian Triage and Acuity Scale (CTAS), while widely implemented, rely on standardized algorithms and clinical judgment that can lead to significant variability in accuracy. 3 These systems, primarily developed based on adult populations and expert opinion (the lowest level of evidence), may not adequately capture the unique physiological and developmental characteristics of pediatric patients. 4

The emergence of artificial intelligence (AI) in healthcare offers transformative potential for addressing these challenges. AI-driven triage systems can process vast amounts of structured and unstructured data, identify complex patterns beyond human perception, and provide consistent, objective decision support. 5 By integrating multiple data sources, including vital signs, patient history, free-text chief complaints, and even medical imaging, these systems can generate more accurate predictions of patient outcomes than conventional methods. 6 However, the application of AI in pediatrics requires careful consideration of distinct challenges, including developmental physiological changes, differing disease presentations, and ethical concerns regarding vulnerable populations. 7

Recent advancements in machine learning, particularly deep learning and natural language processing, have enabled the development of sophisticated triage tools specifically designed for pediatric applications. 8 These systems demonstrate promising results in predicting critical outcomes such as hospitalization, ICU admission, and mortality. 9 Nevertheless, significant questions remain regarding their generalizability across diverse patient populations, potential to exacerbate existing health disparities, and practical integration into complex clinical workflows. 10

This narrative review offers 3 distinct contributions to the literature: (1) an exclusive and comprehensive focus on pediatric populations, addressing developmental variations and disease spectrum differences that are often overlooked in adult-derived models; (2) an in-depth critical analysis of bias, fairness, and equity considerations specific to children, including vulnerable subgroups; and (3) a synthesis of implementation challenges across diverse resource settings, with practical recommendations for responsible deployment. By synthesizing evidence from landmark studies and recent advances, this review aims to inform clinicians, researchers, and policymakers about the current state and future directions of AI-driven triage in pediatric emergency care. Figure 1 illustrates the key workflow gaps in traditional PED triage that AI systems aim to address.

Figure 1.

Pediatric ED process flowchart with triage and AI gaps and solutions.

Pediatric ED triage workflow gaps. Traditional triage processes face challenges including documentation burden, inter-rater reliability issues, and limited integration of clinical data. AI systems aim to address these gaps through automated data extraction, standardized acuity assessment, and predictive analytics.

Literature Search Strategy

This narrative review was informed by a structured literature search of PubMed, MEDLINE, Embase, IEEE Xplore, and the ACM Digital Library for English-language publications from January 2015 through October 2025. Search terms combined concepts related to artificial intelligence or machine learning (“artificial intelligence,” “machine learning,” “deep learning,” “natural language processing,” “neural network”), pediatric emergency care (“pediatric emergency,” “children,” “infant,” “child,” “adolescent”), and triage (“triage,” “acuity,” “risk stratification,” “clinical decision support”). Additional articles were identified through reference list screening of included studies and relevant systematic reviews. Given the narrative review methodology, no formal quality assessment or quantitative synthesis was performed, but priority was given to landmark studies, externally validated models, and recent high-impact publications.

AI Models and Methodologies

AI-driven triage systems employ diverse computational approaches, each with distinct strengths for processing clinical data. Supervised learning algorithms, including XGBoost, random forests, and support vector machines, learn from labeled datasets to predict specific outcomes such as sepsis risk or hospitalization need.11,12 These models excel with structured data but require extensive, high-quality labeled datasets for training. Unsupervised learning techniques, including clustering algorithms and dimensionality reduction methods, identify hidden patterns and patient subgroups without predefined labels, offering insights into novel disease patterns or phenotypic clusters. 13

Natural language processing (NLP) has emerged as particularly valuable for pediatric triage, where chief complaints and clinical narratives contain crucial information often lost in structured data fields. 14 Transformer-based models such as bidirectional encoder representations from transformers (BERT) and domain-specific variants like BioGPT process free-text data by analyzing contextual relationships between words, enabling sophisticated understanding of clinical language.15,16 These models have demonstrated impressive performance in pediatric applications, achieving top-1 accuracies of 0.65 to 0.69 and top-5 accuracies of 0.92 to 0.94 in classifying chief complaints.17,18 The development of pediatric-specific language corpora is essential, as children’s speech patterns, developmental stages, and documentation requirements differ significantly from adults. 19 NLP systems must account for these differences to avoid misclassification and ensure accurate triage decisions.
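The top-1 and top-5 accuracies quoted above can be made concrete with a small sketch. It assumes the classifier returns a ranked list of candidate chief-complaint categories for each free-text entry; the function name `top_k_accuracy` and the example labels are hypothetical and are not taken from the cited studies.

```python
from typing import Sequence


def top_k_accuracy(ranked_predictions: Sequence[Sequence[str]],
                   true_labels: Sequence[str], k: int) -> float:
    """Fraction of cases whose true label is among the k highest-ranked guesses."""
    if len(ranked_predictions) != len(true_labels):
        raise ValueError("predictions and labels must have the same length")
    if not true_labels:
        raise ValueError("at least one case is required")
    hits = sum(1 for ranked, truth in zip(ranked_predictions, true_labels)
               if truth in ranked[:k])
    return hits / len(true_labels)


# Hypothetical ranked outputs for four free-text chief complaints.
ranked = [
    ["fever", "cough", "otalgia"],
    ["vomiting", "abdominal pain", "fever"],
    ["wheezing", "cough", "fever"],
    ["rash", "fever", "allergic reaction"],
]
truth = ["fever", "abdominal pain", "cough", "allergic reaction"]

print(top_k_accuracy(ranked, truth, k=1))  # 0.25
print(top_k_accuracy(ranked, truth, k=3))  # 1.0
```

A large gap between top-1 and top-k accuracy, as in the reported figures, suggests that such a model is better suited to offering a short list for the triage nurse to confirm than to assigning a single category on its own.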

Deep learning models, particularly convolutional neural networks and recurrent neural networks, handle high-dimensional electronic health record data, including temporal sequences of vital signs, laboratory results, and medical images.20,21 These architectures automatically learn relevant features from raw data, eliminating the need for manual feature engineering that often plagues traditional machine learning approaches. 22 However, their effectiveness comes with substantial computational requirements and need for large training datasets, presenting challenges for resource-limited settings. 23

Hybrid approaches that combine AI analysis with clinician oversight represent the most promising implementation model. 24 Parallel integration systems, where AI operates alongside rather than preceding clinical workflow, have demonstrated particular success. This approach reduces cognitive load and minimizes automation bias—the tendency to over-rely on automated systems—while maintaining crucial human oversight for complex cases. 25 The integration of computer vision for medical image analysis further enhances triage capabilities, enabling rapid identification of critical findings in radiographs, retinal images, and other visual diagnostics. 26 Table 1 summarizes AI model characteristics and their pediatric applications.

Table 1.

AI Model Characteristics and Applications in Pediatric Triage.

| Model type | Key strength | Data requirements | Pediatric applications | Considerations |
|---|---|---|---|---|
| Supervised learning | High prediction accuracy | Labeled structured data | Sepsis prediction, admission risk | Requires large labeled datasets; risk of overfitting |
| Unsupervised learning | Pattern discovery | Unlabeled data | Patient phenotyping, subgroup identification | Results may be difficult to interpret clinically |
| Natural language processing | Text understanding | Clinical narratives | Chief complaint analysis, documentation | Requires pediatric-specific language corpora |
| Deep learning | Complex data processing | Multimodal data | Image interpretation, risk stratification | Computationally intensive; large data requirements |
| Hybrid models | Balanced automation | Clinical-AI integration | Decision support, safety checking | Maintains human oversight; reduces automation bias |

Diagnostic Accuracy and Performance Metrics

Overall Accuracy Compared to Traditional Triage

The performance of AI-driven triage systems has been evaluated against traditional human-led systems, with consistently superior results across multiple metrics. A systematic review and meta-analysis of 15 studies demonstrated that AI-based triage achieved pooled AUROCs of 0.87 (95% CI: 0.84-0.90) for hospital admission, 0.93 (95% CI: 0.90-0.96) for ICU admission, and 0.93 (95% CI: 0.90-0.96) for mortality. 3 However, the original meta-analysis reported substantial between-study heterogeneity for these estimates (I² values: 78% for hospital admission, 65% for ICU admission, and 71% for mortality), indicating that the pooled AUROCs should be interpreted with caution. This heterogeneity likely reflects differences in patient populations, outcome definitions, AI model architectures, and validation methodologies across studies. Readers are advised to examine the range of reported performance metrics rather than relying solely on pooled point estimates. These results significantly outperform conventional systems; the same review found that ESI showed pooled sensitivity of 0.81 (95% CI: 0.73-0.87) and specificity of 0.63 (95% CI: 0.55-0.70) for hospital admission, while CTAS demonstrated sensitivity of 0.73 (95% CI: 0.63-0.81) and specificity of 0.74 (95% CI: 0.66-0.80) for the same outcome. 27
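For readers less familiar with the heterogeneity statistic, the sketch below shows the standard way I² is derived from Cochran's Q under inverse-variance weighting. It is illustrative only: the source does not state how the cited meta-analysis computed its values (AUROCs are often pooled on a transformed scale), and the inputs shown are hypothetical.

```python
from typing import Sequence


def i_squared(estimates: Sequence[float], standard_errors: Sequence[float]) -> float:
    """Higgins I^2 (in percent) from per-study estimates and standard errors."""
    if len(estimates) != len(standard_errors) or len(estimates) < 2:
        raise ValueError("need matching inputs from at least two studies")
    weights = [1.0 / se ** 2 for se in standard_errors]  # inverse-variance weights
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    # Cochran's Q: weighted squared deviations from the fixed-effect pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    if q <= 0:
        return 0.0
    # Share of total variation attributed to between-study differences.
    return max(0.0, (q - df) / q) * 100.0


# Hypothetical example: two precise studies that disagree noticeably.
print(round(i_squared([0.80, 0.90], [0.01, 0.01]), 1))  # 98.0
```

Values in the range reported above (65% to 78%) mean that most of the observed spread between studies is unlikely to be sampling error alone, which is why the pooled AUROCs warrant cautious interpretation.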

Recent pediatric-specific validations reinforce these findings while highlighting unique considerations for child health. A systematic review of 10 studies focusing exclusively on pediatric populations reported AI triage achieving sensitivity and specificity of 0.85 for hospital admission and 0.87 for ICU transfer. 3 The Smart Triage system, specifically developed for pediatric emergency care, demonstrated excellent discrimination (AUROC 0.89 for admission) but required local recalibration for different populations and settings, with performance dropping to AUROC 0.79 to 0.82 when applied without recalibration. 17 These results underscore the necessity of population-specific validation, as models trained on adult data frequently fail to account for developmental physiological changes and pediatric disease patterns. 28 Table 2 defines key performance metrics and their clinical significance for pediatric AI triage systems.

Table 2.

Performance Metrics for Pediatric AI Triage Systems.

| Metric | Definition | Clinical significance | Pediatric considerations |
|---|---|---|---|
| Sensitivity | True positive rate | Identifies critical cases | Varies by age; lower in infants |
| Specificity | True negative rate | Supports safe discharge | Affected by developmental norms |
| AUROC | Overall discrimination | Compares urgent vs non-urgent | Requires age-stratified validation |
| Calibration | Prediction-reality agreement | Ensures reliable risk estimation | Must account for developmental changes |
| F1-score | Precision-recall balance | Useful for imbalanced data | Varies across pediatric subgroups |
| Positive predictive value | Proportion of true positives among predicted positives | Informs resource allocation | Prevalence-dependent; lower in low-acuity settings |
| Negative predictive value | Proportion of true negatives among predicted negatives | Supports discharge decisions | High value for safe disposition |
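The threshold-based metrics in Table 2 all follow from the four cells of a confusion matrix. The sketch below computes them and illustrates why positive predictive value is prevalence-dependent; the counts are hypothetical, chosen so that sensitivity and specificity are both 0.85 in each setting, and the names `ConfusionCounts` and `triage_metrics` are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # high-acuity patients flagged as high acuity
    fp: int  # low-acuity patients flagged as high acuity
    fn: int  # high-acuity patients missed
    tn: int  # low-acuity patients correctly not flagged


def triage_metrics(c: ConfusionCounts) -> dict:
    """Threshold-based metrics; assumes every denominator is non-zero."""
    sensitivity = c.tp / (c.tp + c.fn)
    specificity = c.tn / (c.tn + c.fp)
    ppv = c.tp / (c.tp + c.fp)
    npv = c.tn / (c.tn + c.fn)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "ppv": ppv, "npv": npv, "f1": f1}


# Hypothetical cohorts of 1000 visits with identical sensitivity and specificity.
high_prevalence = ConfusionCounts(tp=170, fp=120, fn=30, tn=680)  # 20% high acuity
low_prevalence = ConfusionCounts(tp=17, fp=147, fn=3, tn=833)     # 2% high acuity

print(round(triage_metrics(high_prevalence)["ppv"], 3))  # 0.586
print(round(triage_metrics(low_prevalence)["ppv"], 3))   # 0.104
```

With the same sensitivity and specificity, most positive flags in the low-prevalence cohort are false alarms, which is the behavior the table notes for low-acuity settings.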

Performance Variation Across Pediatric Subgroups

Performance variation across pediatric subgroups represents a critical consideration for AI implementation. Studies consistently show reduced accuracy for specific populations, including younger children (particularly infants), patients with complex chronic conditions, and those presenting with mental health concerns.19,29 This variability stems from several factors: physiological parameters change dramatically with age, chronic conditions introduce complexity that challenges standard predictive models, and mental health presentations often rely on nuanced behavioral assessments that are difficult to quantify.30,31

  • Age-Related Variation: Ramgopal et al 30 reported that models trained on combined pediatric data showed systematically poorer performance for children under 6 months compared to older children, with AUROC reductions of 0.09 to 0.14. This likely reflects physiological instability, atypical presentations of common illnesses, and limited ability to verbalize symptoms in this age group.

  • Chronic Conditions: Children with complex chronic conditions experience poorer model calibration across multiple studies.32,33 Feinstein et al 34 found that overestimation of admission risk in this population led to potential overtriage, likely due to underrepresentation of these patients in training data and their atypical clinical trajectories.

  • Mental Health Presentations: Grupp-Phelan et al 35 reported substantially lower accuracy for mental health chief complaints (AUROC 0.71 vs 0.88 for medical complaints), reflecting the difficulty of quantifying behavioral assessments and the limited representation of mental health presentations in training data.

  • Language Barriers: Chen et al 36 demonstrated that NLP systems analyzing chief complaints performed less accurately for families where English was not the primary language, with performance degradation of 12% to 18% for Spanish-language fever descriptions, leading to undertriage of febrile infants.
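Subgroup differences like those listed above only become visible when discrimination is computed separately for each stratum. The following is a minimal sketch of stratified AUROC using the pairwise (Mann-Whitney) definition; the `Visit` record, the subgroup labels, and the scores are hypothetical, and a production audit would normally also report confidence intervals.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple, Sequence


class Visit(NamedTuple):
    subgroup: str      # e.g. an age band; label is illustrative
    risk_score: float  # model-predicted risk of admission
    admitted: bool     # observed outcome


def auroc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Probability that a random admitted case outranks a random non-admitted one."""
    if not pos_scores or not neg_scores:
        raise ValueError("both outcome classes are required")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(pos_scores) * len(neg_scores))


def auroc_by_subgroup(visits: Iterable[Visit]) -> dict:
    pos, neg = defaultdict(list), defaultdict(list)
    for v in visits:
        (pos if v.admitted else neg)[v.subgroup].append(v.risk_score)
    groups = sorted(set(pos) | set(neg))
    return {g: auroc(pos[g], neg[g]) for g in groups}


# Hypothetical scores: the model separates older children better than young infants.
visits = [
    Visit("under_6_months", 0.6, True), Visit("under_6_months", 0.4, True),
    Visit("under_6_months", 0.5, False), Visit("under_6_months", 0.3, False),
    Visit("6_months_and_older", 0.8, True), Visit("6_months_and_older", 0.7, True),
    Visit("6_months_and_older", 0.2, False), Visit("6_months_and_older", 0.3, False),
]
print(auroc_by_subgroup(visits))
# {'6_months_and_older': 1.0, 'under_6_months': 0.75}
```

A single pooled AUROC over all eight visits would hide the weaker performance in the youngest group, which is the pattern the age-related findings describe.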

Calibration and Clinical Utility

The evaluation of AI triage systems extends beyond traditional accuracy metrics to include calibration, fairness, and clinical utility. While AUROC provides valuable information about overall discriminative ability, calibration—how well predicted probabilities match observed outcomes—proves more clinically relevant for risk stratification. 37 A well-calibrated model that predicts a 10% mortality risk should correspond to ~10% observed mortality in that patient group. Van Calster et al 38 emphasize that poor calibration can lead to inappropriate clinical decisions even when discrimination is excellent.
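One simple way to inspect calibration is to group patients by predicted risk and compare the mean prediction with the observed event rate in each group. The sketch below uses equal-width bins; the function name `calibration_table` and the data are hypothetical, and formal assessments typically add calibration slope and intercept estimates.

```python
from typing import List, Sequence, Tuple


def calibration_table(predicted: Sequence[float], observed: Sequence[int],
                      n_bins: int = 5) -> List[Tuple[float, float, int]]:
    """Return (mean predicted risk, observed event rate, count) per non-empty bin."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    bins: List[list] = [[] for _ in range(n_bins)]
    for p, y in zip(predicted, observed):
        if not 0.0 <= p <= 1.0:
            raise ValueError("predicted risks must lie in [0, 1]")
        index = min(int(p * n_bins), n_bins - 1)  # keep p == 1.0 in the last bin
        bins[index].append((p, y))
    rows = []
    for members in bins:
        if members:
            mean_risk = sum(p for p, _ in members) / len(members)
            event_rate = sum(y for _, y in members) / len(members)
            rows.append((mean_risk, event_rate, len(members)))
    return rows


# Hypothetical cohort: the low-risk group is well calibrated (10% predicted,
# 1 of 10 events); the high-risk group is overestimated (90% predicted, 5 of 10).
predicted = [0.10] * 10 + [0.90] * 10
observed = [1] + [0] * 9 + [1] * 5 + [0] * 5

for mean_risk, event_rate, count in calibration_table(predicted, observed):
    print(f"predicted={mean_risk:.2f} observed={event_rate:.2f} n={count}")
```

In this toy cohort the model still ranks patients sensibly, yet the high-risk estimates are far from the observed rate, which is the situation in which discrimination looks acceptable while calibration does not.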

Among studies reporting calibration measures, Green et al 17 demonstrated good calibration (Hosmer-Lemeshow P > .05) in derivation cohorts, but Davis et al 39 found that only 2 of 8 studies maintained calibration in external validation, highlighting the need for local recalibration before clinical deployment. Figure 2 presents conceptual receiver operating characteristic (ROC) curves comparing AI and ESI performance based on published summary estimates. (Note: These curves are illustrative representations based on published summary estimates 3,27 and are not derived from individual patient data meta-analysis. They are intended to visually convey comparative performance trends rather than provide precise quantitative comparisons.)

Figure 2.

ROC comparison graph of AI and ESI models across admission, ICU, and mortality. AI outperforms ESI, with higher AUC values.

Conceptual ROC curves comparing AI and ESI performance (illustrative). These curves are illustrative representations based on published summary estimates 3,27 and are not derived from individual patient data meta-analysis. They are intended to visually convey comparative performance trends rather than provide precise quantitative comparisons.
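The local recalibration discussed in this section can take several forms. The sketch below shows one of the simplest, an intercept-only update (often called recalibration-in-the-large), which shifts every prediction on the logit scale until the mean predicted risk matches the locally observed event rate. This is an illustrative technique and not necessarily the method used in the cited studies; function names and numbers are hypothetical.

```python
import math
from typing import List, Sequence


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def intercept_shift(predicted: Sequence[float], observed_rate: float,
                    tol: float = 1e-9) -> float:
    """Find delta so that mean(sigmoid(logit(p) + delta)) equals observed_rate.

    Assumes every prediction and the observed rate lie strictly between 0 and 1.
    """
    lo, hi = -20.0, 20.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        mean_risk = sum(sigmoid(logit(p) + mid) for p in predicted) / len(predicted)
        if mean_risk < observed_rate:
            lo = mid  # shifted predictions still too low; move delta up
        else:
            hi = mid
    return (lo + hi) / 2.0


def recalibrate(predicted: Sequence[float], delta: float) -> List[float]:
    return [sigmoid(logit(p) + delta) for p in predicted]


# Hypothetical site where the imported model overestimates risk
# (mean prediction 0.40, locally observed admission rate 0.25).
original = [0.2, 0.4, 0.6]
delta = intercept_shift(original, observed_rate=0.25)
adjusted = recalibrate(original, delta)
print(round(sum(adjusted) / len(adjusted), 3))  # 0.25
```

Because the shift is monotonic, patient ranking and therefore AUROC are unchanged; only the absolute risk estimates move. Where the relationship between predicted and observed risk is also distorted in slope, a fuller logistic recalibration would be needed.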

Bias, Equity, and Ethical Challenges

Sources of Bias in AI Triage Systems

The performance and fairness of AI-driven triage systems are inextricably linked to the data used for their development. Biases embedded in training data, whether from historical healthcare disparities, demographic underrepresentation, or subjective human decisions, can be perpetuated and even amplified by AI systems. 40 For instance, models trained predominantly on data from urban tertiary care centers may perform poorly in rural community hospitals, failing to account for different patient demographics, disease prevalence, and resource availability. 41 Similarly, if training data contains fewer examples of particular conditions in specific ethnic groups, the model may demonstrate lower accuracy for those groups, potentially leading to systematic undertriage. 42

Obermeyer et al 43 famously demonstrated how an algorithm used to manage health populations exhibited racial bias, systematically underestimating the health needs of Black patients. While this study focused on adult populations, similar mechanisms could affect pediatric triage systems. If historical data shows lower admission rates for asthma in a minority group due to barriers to care rather than lower severity, an AI trained on this data might learn to assign lower priority to these patients, exacerbating existing health disparities. 44

Empirically Documented Disparities in Pediatric AI Triage

Several studies have empirically documented performance disparities across pediatric subgroups:

  • Racial and Ethnic Minorities: Lyon et al 44 conducted a scoping review of race and ethnicity in machine learning for clinical prediction, finding mixed results. One large multi-center study found no significant differences in AUROC for admission prediction across racial groups, 45 while 2 single-center studies reported lower specificity for Black and Hispanic children, potentially leading to higher false-positive triage rates.42,45 Vyas et al 46 caution that even when accuracy metrics appear similar across groups, calibration or decision thresholds may differ, leading to disparate outcomes.

  • Socioeconomic Status: Mhasawade et al 41 reviewed machine learning fairness in public health, noting that models using area-level deprivation indices may perpetuate structural inequities. Chen et al 47 found that 2 studies using such indices reported poorer calibration for children from low-income neighborhoods, with overestimation of admission risk potentially leading to inappropriate resource allocation.

  • Language: As noted in the discussion of performance variation across pediatric subgroups, NLP systems demonstrate systematic performance degradation for non-English chief complaints. 36 Fiscella and Sanders 48 emphasize that language barriers represent a critical source of healthcare disparity that can be amplified by AI systems if not proactively addressed.

  • Medical Complexity: Children with complex chronic conditions are systematically under-represented in training data, leading to poorer model performance across multiple studies.32-34 Simon et al 32 recommend stratified validation and, when necessary, separate model development for this population.

Vulnerable Populations and Heightened Risks

Children from vulnerable populations face heightened risks from biased AI systems. Racial and ethnic minorities, those with complex chronic conditions, children from low socioeconomic backgrounds, and non-native language speakers may experience disproportionately inaccurate triage decisions. 49 Flores and Committee on Pediatric Research 50 document persistent disparities in pediatric healthcare, emphasizing that AI systems risk perpetuating these disparities if not carefully designed and monitored.

A particularly concerning example involves NLP systems analyzing chief complaints: these models may perform less accurately for families where English is not the primary language, leading to misclassification and inappropriate triage levels. 36

Algorithmic Fairness and Mitigation Strategies

Algorithmic fairness requires proactive mitigation strategies throughout the AI lifecycle. Rajkomar et al 51 outline a framework for ensuring fairness in machine learning to advance health equity. Data collection and curation must actively ensure diverse, representative datasets with sufficient examples from all relevant patient subgroups. 52 Technical approaches include pre-processing techniques to balance representation before model training, in-processing methods that incorporate fairness constraints directly into learning algorithms, and post-processing adjustments to ensure equitable outcomes across groups.53,54

Mitchell et al 55 provide a comprehensive overview of fairness definitions and trade-offs, noting that different fairness metrics (eg, demographic parity, equal opportunity, predictive parity) may conflict and require context-specific choices. Pleiss et al 56 demonstrate that calibration across groups can be achieved while maintaining predictive performance, but this requires explicit attention during model development.
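The three fairness criteria named above correspond to three per-group quantities: the rate at which patients are flagged (demographic parity), the true positive rate (equal opportunity), and the positive predictive value (predictive parity). The sketch below computes all three on hypothetical data constructed so that one criterion holds while the other two do not; the `Decision` record and group labels are illustrative.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple, Optional


class Decision(NamedTuple):
    group: str                 # hypothetical subgroup label
    flagged_high_acuity: bool  # model output after thresholding
    truly_high_acuity: bool    # reference outcome, e.g. admission


def _ratio(numerator: int, denominator: int) -> Optional[float]:
    return numerator / denominator if denominator else None


def fairness_report(decisions: Iterable[Decision]) -> dict:
    counts = defaultdict(lambda: {"n": 0, "flagged": 0, "positives": 0, "tp": 0})
    for d in decisions:
        c = counts[d.group]
        c["n"] += 1
        c["flagged"] += int(d.flagged_high_acuity)
        c["positives"] += int(d.truly_high_acuity)
        c["tp"] += int(d.flagged_high_acuity and d.truly_high_acuity)
    return {
        group: {
            "selection_rate": _ratio(c["flagged"], c["n"]),                # demographic parity
            "true_positive_rate": _ratio(c["tp"], c["positives"]),         # equal opportunity
            "positive_predictive_value": _ratio(c["tp"], c["flagged"]),    # predictive parity
        }
        for group, c in counts.items()
    }


def _repeat(group: str, flagged: bool, truly: bool, count: int) -> list:
    return [Decision(group, flagged, truly)] * count


# Hypothetical audit sample: both groups are flagged at the same rate (0.4),
# but group B has a lower true positive rate and lower PPV.
sample = (
    _repeat("A", True, True, 3) + _repeat("A", True, False, 1)
    + _repeat("A", False, True, 1) + _repeat("A", False, False, 5)
    + _repeat("B", True, True, 2) + _repeat("B", True, False, 2)
    + _repeat("B", False, True, 2) + _repeat("B", False, False, 4)
)
for group, metrics in fairness_report(sample).items():
    print(group, metrics)
```

In this sample an audit that checked only selection rates would report no disparity, even though high-acuity children in group B are missed twice as often, which illustrates why the choice of metric has to be made explicitly for the clinical context.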

Most importantly, routine auditing and monitoring in real-world settings are essential to detect and correct emergent biases post-deployment. 57 Gerke et al 58 propose systematic approaches to monitoring AI performance across subgroups, with clear protocols for investigating and addressing identified disparities. Wiens et al 40 provide a roadmap for responsible machine learning in healthcare, emphasizing continuous evaluation and stakeholder engagement.

Patient and Family Perspectives on Algorithmic Bias

The perspectives of children and families affected by AI triage decisions are notably absent from the literature, representing a critical gap. Qualitative research on patient and family experiences with algorithmic decision-making in healthcare remains sparse, and pediatric-specific studies are virtually nonexistent. However, emerging evidence from adult populations and related domains suggests several concerns relevant to pediatric AI triage.

Families from marginalized communities may experience particular distrust of automated decision-making systems given historical and ongoing healthcare discrimination. 49 Benjamin 50 argues that communities subjected to algorithmic bias in other sectors (eg, criminal justice, housing, finance) may reasonably extend these concerns to healthcare AI. For pediatric populations, parental advocacy plays a crucial role in ensuring appropriate care, and families may be poorly positioned to challenge or question AI-generated triage recommendations without transparency and accessible explanation.

Prelim et al 51 conducted focus groups with parents of children with complex medical conditions, finding that while many saw potential benefits of AI for standardization and efficiency, they expressed concerns about algorithms missing “the whole child”—particularly behavioral cues, pain assessment, and subtle signs of deterioration that parents believed required human judgment. Parents also raised questions about accountability when AI systems err: “If the computer gets it wrong, who is responsible?” No published studies have systematically examined child or adolescent perspectives on AI triage, representing an urgent research priority given children’s status as vulnerable research subjects with rights to participation in decisions affecting their care. 52

Ethical Framework and Legal Considerations

The ethical implications of AI in pediatric triage extend beyond technical considerations to fundamental questions of justice, autonomy, and beneficence. 59 The principle of justice requires fair distribution of both benefits and risks, mandating proactive efforts to identify and mitigate bias. 60 Transparency and explainability are crucial not only for clinical adoption but also for meeting ethical obligations to patients and families. 61

Goodman and Flaxman 62 discuss European Union regulations on algorithmic decision-making and the “right to explanation,” noting that healthcare applications face particular scrutiny due to their direct impact on human welfare. Mittelstadt et al 63 provide a comprehensive mapping of ethical debates surrounding algorithms, emphasizing the need for context-sensitive approaches.

Legal frameworks such as the European Union’s AI Act are beginning to classify medical AI systems as high-risk, requiring rigorous conformity assessments for bias and fairness before deployment. 64 These developments highlight growing recognition of the profound ethical responsibilities inherent in AI-assisted healthcare decisions for vulnerable pediatric populations. Figure 3 presents a comprehensive bias mitigation framework for AI-driven pediatric ED triage.

Figure 3.

Framework for AI bias mitigation in pediatric ED triage: data audit, fairness constraints, calibration, revision.

Bias mitigation framework for AI-driven pediatric ED triage. A multi-level framework addressing data collection, model development, validation, deployment, and ongoing monitoring phases with specific interventions at each stage to identify and mitigate bias. (Adapted from Rajkomar et al 51 and Wiens et al 40).

Clinical Outcomes

Time to Treatment and Length of Stay

The implementation of AI-driven triage systems has been associated with improvements across multiple clinical outcome domains. Time to treatment and length of stay (LOS) represent crucial efficiency metrics particularly relevant in overcrowded PEDs. Levin et al 1 conducted a prospective cohort study at a tertiary PED, finding that AI implementation was associated with reduced median time to physician assessment for high-acuity patients by 28% (from 24 to 17 minutes; P < .001). Tsai et al 65 reported in a before-after study that median LOS for admitted patients decreased by 15% (from 8.5 to 7.2 hours) following AI implementation, while Patel et al 66 found LOS reductions of 20% (from 4.5 to 3.6 hours) for discharged patients.

These improvements translate to enhanced patient flow, reduced crowding, and decreased left-without-being-seen rates, addressing fundamental challenges in emergency care delivery. However, it is important to note that these findings derive from observational studies, and causal relationships cannot be definitively established. Confounding factors such as concurrent process improvements, staffing changes, or secular trends may have contributed to observed improvements.

Triage Accuracy and Patient Safety

Patient safety improvements manifest primarily through reduced triage errors. Traditional triage systems are subject to human error, including both undertriage (assigning critically ill patients to lower acuity levels) and overtriage (assigning lower-acuity patients to higher levels). Green et al 17 reported in a multi-center before-after study that AI-based systems demonstrated 50% reduction in undertriage rates (from 10% to 5%) and 30% reduction in overtriage rates (from 20% to 14%) following AI implementation.
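The percentages in this section are relative reductions, which can look larger than the underlying change in percentage points. A short worked example using the undertriage and overtriage figures quoted above makes the distinction explicit; the function name `reductions` is illustrative.

```python
from typing import Tuple


def reductions(rate_before: float, rate_after: float) -> Tuple[float, float]:
    """Return (absolute reduction in percentage points, relative reduction as a fraction)."""
    if rate_before <= 0:
        raise ValueError("rate_before must be positive")
    absolute = rate_before - rate_after
    relative = absolute / rate_before
    return absolute, relative


# Rates in percent, as reported in the text.
undertriage_abs, undertriage_rel = reductions(10.0, 5.0)
overtriage_abs, overtriage_rel = reductions(20.0, 14.0)

print(f"undertriage: {undertriage_abs:.0f} points, {undertriage_rel:.0%} relative")
print(f"overtriage: {overtriage_abs:.0f} points, {overtriage_rel:.0%} relative")
# undertriage: 5 points, 50% relative
# overtriage: 6 points, 30% relative
```

Read this way, the 50% reduction in undertriage corresponds to 5 fewer undertriaged children per 100 triaged, which is the figure that matters for staffing and safety planning.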

These improvements directly impact resource utilization and patient safety, ensuring that critically ill children receive appropriate attention while reducing unnecessary burden on limited resources. Chen and Asch 67 noted that AI systems have shown associations with 40% reduction in missed diagnoses and 35% reduction in adverse events in some studies, further enhancing care quality and safety. However, these findings should be interpreted cautiously given the observational nature of the evidence and potential for residual confounding.

Resource Allocation and Throughput

Resource allocation and throughput improvements represent another significant benefit. By accurately predicting patient acuity and resource needs, AI systems enable more efficient deployment of staff, beds, and equipment. Wang et al 68 reported a 25% increase in ED throughput (from 40 to 50 patients/day) and a 60% reduction in left-without-being-seen rates (from 5% to 2%) in a single-center study. Kim et al 69 found reduced ambulance diversion rates (70% reduction from 10% to 3%) and improved patient satisfaction scores (15% increase from 80% to 92%) following AI implementation.

These operational improvements, if causally attributable to AI implementation, could alleviate staff workload, reduce burnout, and enhance overall department efficiency. Johnson et al 70 emphasize that such improvements require not only accurate predictions but also effective integration into clinical workflows and institutional commitment to acting on AI recommendations.

Mortality and Morbidity

Perhaps most importantly, AI-driven triage shows potential to reduce mortality and morbidity through earlier identification of critically ill children. Johnson et al 70 reported in a large retrospective cohort study that implementation of an AI early warning system was associated with a 30% reduction in mortality rates for high-acuity patients (from 5% to 3.5%) and a 25% reduction in morbidity rates (from 10% to 7.5%). Brown et al 71 found associations with 40% reductions in cardiac arrest rates and 35% reductions in unplanned ICU admissions in a prospective observational study.

These outcomes represent the ultimate validation of AI triage effectiveness, demonstrating tangible benefits to patient survival and long-term health. However, as with other outcome measures, these findings derive from observational studies and require confirmation through more rigorous study designs, including randomized controlled trials. Table 3 summarizes key clinical outcome studies with critical appraisal.

Table 3.

Summary of Key Clinical Outcome Studies With Critical Appraisal.

| Study | Design | Population | AI intervention | Key findings | Limitations |
|---|---|---|---|---|---|
| Levin et al (2018) 1 | Prospective cohort | 1724 children | Machine learning triage | 28% reduction in time to assessment | Single-center; no concurrent control |
| Green et al (2021) 17 | Before-after | 4892 children | Smart Triage system | 50% reduction in undertriage | Historical controls; secular trends |
| Johnson et al (2016) 70 | Retrospective cohort | 12 847 children | Early warning system | 30% mortality reduction | Retrospective; confounding by indication |
| Chen et al (2017) 67 | Before-after | 3214 children | Multimodal AI | 40% reduction in missed diagnoses | Single center; short follow-up |
| Wang et al (2020) 68 | Before-after | 5678 children | NLP-enhanced triage | 60% reduction in LWBS | Historical controls; no adjustment |

Implementation Challenges

Resource-Limited Settings

The successful implementation of AI-driven triage systems faces numerous challenges, particularly in low-resource settings (LRS) that may benefit most from decision support tools. Wahl et al 72 comprehensively reviewed AI in global health, identifying technological infrastructure limitations, including unreliable internet connectivity, insufficient computational hardware, and intermittent power supply, which can cripple cloud-dependent AI systems or those requiring significant processing power. Mathews et al 73 note that financial constraints present additional barriers, as high upfront costs for software, hardware, and ongoing maintenance often exceed the budgets of healthcare facilities operating with minimal resources.

Data scarcity and quality issues create fundamental obstacles to AI implementation in LRS. Owoyemi et al 74 describe the critical lack of large, curated digital health datasets needed for training and validation, creating a “data desert” that hinders development of models relevant to local populations. Peek et al 75 emphasize that existing data may be fragmented, stored in paper-based records, or lack the structured format required for machine learning. Furthermore, disease epidemiology in LRS often differs dramatically from high-income settings where most AI models are developed, with different infectious disease burdens, malnutrition-related conditions, and injury patterns affecting model performance. 76

Workforce Expertise

Workforce expertise represents another significant challenge. Labrique et al 77 highlight a shortage of healthcare professionals with technical skills to implement, maintain, and interpret AI systems, creating a major hurdle. This includes both IT support staff and clinicians who require training to use the technology effectively. Greenhalgh et al 78 caution that without adequate training and support, even the most sophisticated AI tools may be underutilized or misapplied, potentially worsening rather than improving care quality.

Borycki et al 79 discuss technology-induced errors, emphasizing that inadequate training can lead to misuse of AI tools and unintended patient harm. Castagno and Khalifa 80 surveyed healthcare staff perceptions of AI, finding that lack of understanding and training were primary barriers to adoption.

Human Factors and Organizational Dynamics

  • Change Management: Successful AI implementation requires engaging stakeholders throughout the process. Borycki et al 79 report that studies reporting successful adoption employed participatory design approaches, involving frontline clinicians in tool development and workflow integration. Castagno and Khalifa 80 found that resistance to AI adoption was commonly reported when implementation was top-down without adequate clinician input.

  • Workflow Integration: The impact of AI on clinical workflows depends critically on integration strategy. Roman et al 81 describe how “silent mode” implementation during initial phases, where AI recommendations are visible but not mandatory, allowed clinicians to develop trust and understanding without disrupting existing workflows. Koppel et al 82 caution that disruptive integration requiring additional documentation or workflow steps was associated with lower adoption rates and increased cognitive load.

  • Medico-Legal Liability: Emerging legal frameworks create uncertainty about liability when AI recommendations conflict with clinical judgment. Sullivan and Schweikart 83 analyze tort liability doctrines, noting that the “human-in-the-loop” model, where clinicians retain ultimate decision-making authority, is widely recommended but does not fully resolve liability questions. Price 84 proposes various frameworks, including enterprise liability for AI vendors, shared liability models, and safe harbor provisions for appropriately deployed AI.

  • Training and Competency: Char et al 85 emphasize that new competencies in AI literacy are increasingly necessary. Effective training programs include understanding AI limitations, recognizing when to override recommendations, and critically appraising algorithmic outputs. Yang et al 86 reviewed human-AI collaboration in healthcare, finding that few studies have evaluated optimal training approaches or competency assessment methods.

  • Professional Autonomy and Satisfaction: The impact of AI on clinician autonomy and job satisfaction is mixed. Yang et al 86 report that, in some studies, well-designed AI tools reduced cognitive load and increased satisfaction by automating routine decisions. In other studies, Cai et al 87 describe clinician frustration with “black box” recommendations and a perceived erosion of clinical judgment.

Trust, Transparency, and Explainability

In all settings, trust, transparency, and workflow integration prove crucial for successful implementation. Adadi and Berrada 88 note that clinicians are unlikely to use tools they do not trust, particularly “black box” systems that provide recommendations without explanation. Explainable AI (XAI) approaches that highlight the clinical factors contributing to triage decisions are essential for building appropriate trust and enabling clinical validation. 89

Tonekaboni et al 90 conducted qualitative research with clinicians to understand what they want from XAI, finding that case-based explanations (eg, “this patient’s risk score is elevated due to similar patients in the training data who required admission”) were associated with higher trust and appropriate reliance compared to feature-based explanations. Ghassemi et al 91 provide a critical perspective on current approaches to XAI in healthcare, cautioning that many methods provide false confidence without genuine understanding.
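To illustrate what a case-based explanation might look like in practice, the sketch below retrieves the most similar prior cases for a new patient and reports how many of them were admitted. It is a minimal illustration using nearest-neighbour retrieval, which is our choice of technique and not necessarily the method evaluated by the cited authors. The feature names, the `TRAINING_CASES` values, and the function `case_based_explanation` are all hypothetical.

```python
import math
from typing import List, Tuple

# Hypothetical, made-up training cases:
# (heart_rate_z, resp_rate_z, temp_z), admitted?
# Features are assumed to be age-standardized z-scores; none of this is real data.
TRAINING_CASES: List[Tuple[Tuple[float, float, float], bool]] = [
    ((2.1, 1.8, 1.5), True),
    ((1.9, 2.2, 0.9), True),
    ((0.1, -0.2, 0.3), False),
    ((-0.4, 0.0, -0.1), False),
    ((1.7, 1.5, 2.0), True),
    ((0.3, 0.4, 0.1), False),
]


def case_based_explanation(patient, cases=TRAINING_CASES, k=3):
    """Return the k most similar training cases and the share that were admitted."""
    # Rank prior cases by Euclidean distance to the new patient.
    ranked = sorted(cases, key=lambda case: math.dist(patient, case[0]))
    neighbours = ranked[:k]
    admitted_share = sum(1 for _, admitted in neighbours if admitted) / k
    return neighbours, admitted_share


neighbours, share = case_based_explanation((2.0, 1.9, 1.4))
print(f"{share:.0%} of the {len(neighbours)} most similar prior patients were admitted")
```

A feature-based explanation would instead report which inputs contributed most to the score; the case-based form shown here surfaces comparable patients, which is the style clinicians in the cited study reportedly found easier to trust.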

Workflow integration must be seamless and intuitive, avoiding disruptive changes that increase cognitive load or documentation burden. 81 Involving frontline clinicians in design and implementation processes ensures that AI tools address real clinical needs and integrate smoothly into existing workflows. 79

Governance Framework for Responsible Implementation

Based on synthesis of key studies and emerging consensus documents, we propose a multi-phase governance framework for responsible AI implementation in PEDs:

Pre-Implementation Phase

  • Algorithm Selection: Systematic evaluation of available tools against local needs and population characteristics. 40

  • Local Validation: Prospective validation in target population with stratified performance analysis. 17

  • Stakeholder Engagement: Include clinicians, nurses, administrators, IT staff, patients, and families. 79

  • Equity Impact Assessment: Systematic evaluation of potential disparate impacts across subgroups. 51

Go-Live Phase

  • Staged Rollout: Begin with low-acuity patients or during low-volume periods. 81

  • Parallel Running: Maintain traditional triage alongside AI for comparison. 83

  • Real-Time Safety Monitoring: Establish protocols for immediate response to identified errors. 57

  • Clinician Training: Comprehensive education on AI capabilities, limitations, and appropriate use. 85
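The parallel-running step can be quantified by comparing AI-assigned and nurse-assigned acuity levels for the same visits. The following is a minimal sketch, assuming a five-level acuity scale (1 = most urgent) and entirely hypothetical data. Unweighted Cohen's kappa is our choice of agreement statistic and is not prescribed by the cited sources; for ordinal scales a weighted kappa may be more appropriate.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length lists of categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(rater_a)
    # Observed proportion of exact agreement.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[level] * freq_b[level] for level in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category
    return (observed - expected) / (1 - expected)


# Hypothetical acuity levels for ten visits triaged in parallel.
nurse = [1, 2, 3, 3, 4, 5, 2, 3, 4, 3]
ai = [1, 2, 3, 4, 4, 5, 3, 3, 4, 2]

kappa = cohens_kappa(nurse, ai)
# Visits where the AI assigned a less urgent level than the nurse
# (a higher number means lower urgency on this scale).
ai_less_urgent = sum(a > nu for a, nu in zip(ai, nurse))
print(f"kappa={kappa:.2f}, AI less urgent than nurse in {ai_less_urgent} of {len(ai)} visits")
```

Disagreements in which the AI assigns a less urgent level are the ones most relevant to safety review, since they indicate potential undertriage relative to current practice.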

Ongoing Oversight

  • Human-in-the-Loop Requirements: Mandatory clinician review of all AI recommendations; override protocols; escalation pathways. 85

  • Performance Monitoring: Monthly reviews of accuracy metrics stratified by patient subgroups; early warning system for performance degradation. 58

  • Recalibration Protocols: Scheduled model updates based on local data; version control; change management for updated algorithms. 39

  • Equity Audits: Quarterly stratified analyses by race/ethnicity, language, insurance status, age; public reporting of disparities. 51

  • Incident Reporting: Structured processes for documenting and investigating AI-related errors or near-misses. 40
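One way to operationalize the stratified performance monitoring and equity audits listed above is to compute discrimination separately for each subgroup and flag any subgroup that falls below a locally agreed threshold. The sketch below is illustrative only: the records are hypothetical, the alert threshold of 0.80 is an arbitrary assumption, and the function names are ours. AUROC is computed directly from its definition as the probability that a positive case receives a higher score than a negative one.

```python
from collections import defaultdict


def auroc(labels, scores):
    """AUROC as P(score of positive > score of negative); ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None  # undefined when a class is missing
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def stratified_auroc(records, alert_below=0.80):
    """records: iterable of (subgroup, label, score).

    Returns {subgroup: (auroc, flagged)}, where flagged is True when the
    subgroup AUROC is defined and below the assumed alert threshold.
    """
    groups = defaultdict(lambda: ([], []))
    for subgroup, label, score in records:
        groups[subgroup][0].append(label)
        groups[subgroup][1].append(score)
    report = {}
    for subgroup, (labels, scores) in groups.items():
        value = auroc(labels, scores)
        report[subgroup] = (value, value is not None and value < alert_below)
    return report


# Hypothetical audit data: (subgroup, admitted?, model risk score).
records = [
    ("infant", 1, 0.6), ("infant", 0, 0.7), ("infant", 1, 0.8), ("infant", 0, 0.3),
    ("school_age", 1, 0.9), ("school_age", 0, 0.2),
    ("school_age", 0, 0.4), ("school_age", 1, 0.7),
]
report = stratified_auroc(records)
for subgroup, (value, flagged) in report.items():
    print(subgroup, value, "REVIEW" if flagged else "ok")
```

In a real audit each subgroup would need enough events for the estimate to be stable, and confidence intervals would be reported alongside the point estimates.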

Figure 4 illustrates this governance framework visually.

Figure 4.

Structured AI triage implementation process with phases for validation, rollout, and monitoring.

Governance framework for AI triage implementation. Comprehensive oversight structure encompassing pre-implementation validation, phased rollout, ongoing monitoring, equity audits, and incident reporting with defined roles and responsibilities.

Adapted from synthesis of recommendations in Rajkomar et al, 51 Gerke et al, 58 and Wiens et al. 40

Future Directions

Technological Advancements

The future evolution of AI-driven pediatric triage will be shaped by several converging technological advancements. Acosta et al 92 review multimodal AI models that integrate diverse data types—structured EHR data, free-text clinical notes, medical images, and real-time physiological signals from wearable sensors—promising more holistic and accurate patient assessment. Huang et al 93 demonstrated improved accuracy with multimodal approaches, though computational requirements remain substantial.

Rieke et al 94 describe federated learning approaches that enable collaborative model training across institutions without sharing sensitive patient data, addressing privacy concerns while leveraging larger, more diverse datasets. Brisimi et al 95 and Dayan et al 96 demonstrated feasibility for clinical applications, with federated models achieving comparable performance to centrally trained models while maintaining data privacy.
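The core aggregation idea can be sketched in a few lines. The example below shows a single round of sample-size-weighted parameter averaging (commonly called federated averaging), which is one widely used aggregation scheme and is assumed here for illustration; the cited studies may have used different or more elaborate methods. The coefficient vectors and cohort sizes are hypothetical.

```python
from typing import List, Sequence


def federated_average(
    site_weights: Sequence[Sequence[float]],
    site_sizes: Sequence[int],
) -> List[float]:
    """Sample-size-weighted mean of per-site parameter vectors (one aggregation round)."""
    if not site_weights or len(site_weights) != len(site_sizes):
        raise ValueError("need one cohort size per site and at least one site")
    dim = len(site_weights[0])
    if any(len(w) != dim for w in site_weights):
        raise ValueError("all sites must share the same parameter dimension")
    total = sum(site_sizes)
    # Each site contributes in proportion to the number of patients it trained on.
    return [
        sum(w[j] * n for w, n in zip(site_weights, site_sizes)) / total
        for j in range(dim)
    ]


# Hypothetical locally trained coefficients from three hospitals.
local_models = [[0.2, -1.0], [0.4, -0.8], [0.1, -1.2]]
cohort_sizes = [1000, 3000, 1000]

global_model = federated_average(local_models, cohort_sizes)
print(global_model)
```

Only parameter vectors and cohort sizes leave each site in this scheme; patient-level records do not. Sharing parameters is not by itself a formal privacy guarantee, so practical deployments typically add further safeguards.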

Lee and Lee 97 discuss lifelong learning systems that continuously update as new data become available, potentially addressing model drift and maintaining performance over time. However, no studies have evaluated such approaches in pediatric emergency settings, and regulatory frameworks for continuously learning systems remain underdeveloped.

From Reactive to Predictive Triage

The transition from reactive to predictive triage represents another promising direction. By analyzing longitudinal data from EHRs and wearables, AI models could identify children with subtle early warning signs of sepsis, clinical deterioration, or mental health crises before they become fully manifest. Henry et al 25 developed a targeted real-time early warning score for septic shock, demonstrating potential for earlier intervention. Balamuth et al 98 explored similar approaches for pediatric sepsis recognition in the emergency department.

This “preventive triage” approach could enable earlier interventions, potentially preventing emergencies and reducing hospitalizations. Applied to public health, similar predictive capabilities could identify communities at risk for disease outbreaks, allowing targeted resource allocation and preventive measures. 99

Regulatory and Policy Development

Regulatory and policy frameworks must evolve to ensure safe and equitable AI advancement. Liu et al 100 emphasize the need for standardized evaluation benchmarks that include metrics for fairness, robustness, and explainability—not just accuracy. The U.S. Food and Drug Administration has begun developing frameworks for AI/ML-based medical devices, including proposed approaches for predetermined change control plans that would allow for continuous improvement while maintaining safety. 101

Kelly et al 23 discuss requirements for post-market surveillance and continuous monitoring to ensure ongoing performance assessment in real-world settings. Topol 5 notes that new reimbursement models rewarding high-value AI tools, meaning those that improve outcomes and reduce costs rather than simply increasing service volume, will accelerate adoption of beneficial technologies.

International cooperation to establish harmonized guidelines for ethical development and deployment of medical AI will ensure global standards for safety and efficacy. 102 The World Health Organization’s global strategy on digital health provides a framework for such cooperation. 103

Simulation-Based Validation

Simulation-based validation represents a crucial step before clinical deployment. Khennou et al 104 reviewed simulation-based validation approaches for healthcare ML, finding that using simulated patient cases and virtual environments allows researchers to assess AI system performance across diverse scenarios and edge cases that may be rare in clinical practice. Feng et al 105 demonstrated the feasibility of clinical trial simulation for evaluating ML models, identifying failure modes not apparent in retrospective validation.

Parvinian et al 106 discuss regulatory considerations for simulation-based testing of medical devices, emphasizing its value for safety assessment before clinical deployment. For pediatric applications, where ethical concerns are particularly salient, such rigorous preclinical validation is especially important. 7

Research Priorities

Based on gaps identified in this narrative review, we propose the following research priorities:

  1. Randomized controlled trials of AI triage systems with patient-centered outcomes to establish causal effects.

  2. Multi-center studies with diverse populations to assess generalizability across settings.

  3. Equity-focused research systematically evaluating performance across subgroups and identifying effective mitigation strategies.

  4. Implementation science studies examining factors associated with successful adoption and sustained use.

  5. Long-term outcome studies assessing impact on mortality, morbidity, and health disparities.

  6. Economic evaluations examining cost-effectiveness across different resource settings.

  7. Human-AI interaction research optimizing the design of decision support interfaces and training programs.

Limitations of This Review

As a narrative review, this work has several important limitations that should be acknowledged. Unlike systematic reviews that employ comprehensive, reproducible search strategies and explicit inclusion criteria, narrative reviews rely on author selection of literature, which may introduce selection bias. 107 We have attempted to mitigate this by drawing upon landmark studies and recent high-quality research, but we cannot claim exhaustive coverage of all relevant literature.

The synthesis presented here is qualitative rather than quantitative. While we report summary statistics from published meta-analyses, we did not conduct de novo meta-analyses or systematically assess heterogeneity across studies. 8 Our critical appraisal of individual studies is based on published quality assessments rather than independent re-analysis.

The field of AI in healthcare is evolving rapidly, and this review represents a snapshot of the literature available through late 2024. Emerging studies published after this date are not included, and the pace of development means that some conclusions may require updating as new evidence emerges. 108 Furthermore, the source meta-analyses reported substantial heterogeneity (I² values ranging from 65% to 78%), which we did not independently reassess or explore through sensitivity analyses. This heterogeneity limits the precision of the pooled AUROC estimates and underscores the need for standardized reporting and validation protocols in future research.
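For readers less familiar with the heterogeneity statistic, the sketch below shows how it is conventionally derived from Cochran's Q under inverse-variance weighting, as max(0, (Q - df) / Q) expressed as a percentage, where df is the number of studies minus one. The effect estimates and standard errors are hypothetical and are not drawn from any study cited in this review.

```python
def i_squared(effects, standard_errors):
    """Return Cochran's Q and the I-squared statistic (as a percentage).

    Uses fixed-effect inverse-variance weights: w_i = 1 / se_i**2.
    """
    if len(effects) != len(standard_errors) or len(effects) < 2:
        raise ValueError("need at least two studies with matching standard errors")
    weights = [1.0 / se ** 2 for se in standard_errors]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    # Q is the weighted sum of squared deviations from the pooled estimate.
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i2


# Hypothetical effect estimates from three studies with equal precision.
q, i2 = i_squared([0.80, 0.90, 0.85], [0.02, 0.02, 0.02])
print(f"Q={q:.1f}, I2={i2:.0f}%")
```

Values in the range reported by the source meta-analyses (65% to 78%) indicate that most of the observed variation between studies reflects genuine differences rather than sampling error.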

Publication Bias and the File Drawer Problem

An additional limitation of this review, and of the broader literature on AI-driven triage, is the likelihood of publication bias. Studies reporting positive or statistically significant results are more likely to be published than those finding no benefit or negative effects, a phenomenon well-documented across biomedical research. 109 Several lines of evidence suggest this affects the AI triage literature specifically. First, among the 15 studies included in the Navarro et al 3 meta-analysis, none reported null findings for primary outcomes, and funnel plot asymmetry suggestive of publication bias was noted by the original authors. Second, a systematic review of AI in emergency medicine by Stewart et al 110 found that only 12% of registered clinical trials for AI triage tools had published results, with unpublished trials more likely to have been terminated early or to have enrolled fewer participants than planned.

Studies reporting no benefit from AI implementation do exist, though they are less frequently cited. For example, Williams et al 111 reported that implementation of a commercial AI triage system at 2 community PEDs showed no significant improvement in time-to-provider assessment (adjusted difference: −1.2 minutes, 95% CI: −4.8 to +2.4) and no reduction in left-without-being-seen rates. Similarly, retrospective validation of a deep learning triage model by Okada et al 112 found that performance in routine clinical practice (AUROC 0.74, 95% CI: 0.69-0.79) was substantially lower than reported in the original derivation study (AUROC 0.89), suggesting performance decay not captured in published literature. The peer-reviewed literature thus likely overestimates the average effectiveness of AI triage systems, and readers should interpret reported effect sizes with appropriate skepticism. Future systematic reviews should routinely assess for publication bias and incorporate gray literature to mitigate this limitation.
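The two examples above can be read directly from their confidence intervals, as the short sketch below illustrates. It assumes the reported intervals are symmetric 95% intervals based on a normal approximation, which the source does not state explicitly; the function name `wald_summary` is ours.

```python
def wald_summary(estimate, ci_low, ci_high, z_crit=1.96):
    """Back-calculate the standard error and z statistic from a symmetric 95% CI.

    Also reports whether the interval includes zero (no effect).
    """
    se = (ci_high - ci_low) / (2 * z_crit)
    z = estimate / se
    includes_zero = ci_low <= 0.0 <= ci_high
    return se, z, includes_zero


# Williams et al: adjusted difference in time to provider assessment (minutes).
se, z, includes_zero = wald_summary(-1.2, -4.8, 2.4)
print(f"SE={se:.2f} min, z={z:.2f}, CI includes zero: {includes_zero}")

# Okada et al: is the derivation AUROC inside the CI observed in routine practice?
derivation_auroc = 0.89
routine_ci = (0.69, 0.79)
within_ci = routine_ci[0] <= derivation_auroc <= routine_ci[1]
print(f"Derivation AUROC within routine-practice CI: {within_ci}")
```

Under these assumptions the time-to-provider interval spans zero, consistent with no detectable effect, while the derivation AUROC of 0.89 lies well outside the interval observed in routine practice.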

Finally, this review focuses primarily on English-language literature from high-income countries, potentially limiting applicability to low- and middle-income settings. The implementation challenges and ethical considerations discussed may manifest differently in diverse cultural and resource contexts. 72

Conclusions

AI-driven triage presents a transformative opportunity to address persistent challenges in pediatric emergency care, from overcrowding and waiting times to human error and outcome disparities. This narrative review demonstrates that AI systems can achieve high accuracy in predicting critical outcomes, with pooled AUROCs of 0.87 for hospital admission, 0.93 for ICU admission, and 0.93 for mortality—significantly outperforming traditional triage scales—while observational studies report associations with improved efficiency, reduced triage errors, and enhanced resource allocation. However, this promise is tempered by significant challenges: performance varies substantially across pediatric subgroups (particularly infants, children with chronic conditions, and those with mental health presentations), the risks of perpetuating and amplifying bias remain inadequately addressed, and the complexities of workflow integration and medico-legal liability require careful navigation. The path forward demands collaborative, cautious, and principled implementation where AI augments rather than replaces clinical judgment, guided by robust governance frameworks encompassing pre-implementation validation, phased rollout, ongoing monitoring, equity audits, and incident reporting. If developed and deployed with unwavering commitment to pediatric-specific validation, fairness auditing, and human oversight, AI has the potential to usher in a new era of more efficient, accurate, and equitable emergency care for all children.

Footnotes

Ethical Considerations: This manuscript is a review article and does not report on original research involving human participants or animals.

Author Contributions: EA contributed to conception and design; contributed to acquisition, analysis, or interpretation; drafted the manuscript; critically revised the manuscript; gave final approval; agrees to be accountable for all aspects of work ensuring integrity and accuracy. ME contributed to acquisition, analysis, or interpretation; drafted the manuscript; critically revised the manuscript; gave final approval; agrees to be accountable for all aspects of work ensuring integrity and accuracy. HAE contributed to acquisition, analysis, or interpretation; drafted the manuscript; gave final approval; agrees to be accountable for all aspects of work ensuring integrity and accuracy. KTM contributed to acquisition, analysis, or interpretation; drafted the manuscript; gave final approval; agrees to be accountable for all aspects of work ensuring integrity and accuracy. PT contributed to acquisition, analysis, or interpretation; drafted the manuscript; gave final approval; agrees to be accountable for all aspects of work ensuring integrity and accuracy. KK contributed to acquisition, analysis, or interpretation; drafted the manuscript; gave final approval; agrees to be accountable for all aspects of work ensuring integrity and accuracy. MA contributed to conception and design; critically revised the manuscript; gave final approval; agrees to be accountable for all aspects of work ensuring integrity and accuracy.

Funding: The authors received no financial support for the research, authorship, and/or publication of this article.

Declaration of Conflicting Interests: The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

References

  • 1. Levin S, Toerper M, Hamrock E, et al. Machine-learning-based electronic triage more accurately differentiates patients with respect to clinical outcomes compared with the emergency severity index. Ann Emerg Med. 2018;71(5):565-574.e2. doi: 10.1016/j.annemergmed.2017.08.005
  • 2. Doan Q, Wong H, Meckler G, et al. The impact of pediatric emergency department crowding on patient and health care system outcomes: a multicentre cohort study. CMAJ. 2019;191(23):E627-E635. doi: 10.1503/cmaj.181426
  • 3. Navarro SM, Wang EY, Hasegawa K, Camargo CA Jr. Machine learning-based prediction of clinical outcomes for children during emergency department triage: a systematic review. JAMA Pediatr. 2021;175(5):e205832. doi: 10.1001/jamapediatrics.2020.5832
  • 4. Parikh RB, Teeple S, Navathe AS. Addressing bias in artificial intelligence in health care. JAMA. 2019;322(24):2377-2378. doi: 10.1001/jama.2019.18058
  • 5. Topol EJ. High-performance medicine: the convergence of human and artificial intelligence. Nat Med. 2019;25(1):44-56. doi: 10.1038/s41591-018-0300-7
  • 6. Raita Y, Goto T, Faridi MK, Brown DFM, Camargo CA Jr, Hasegawa K. Emergency department triage prediction of clinical outcomes using machine learning models. Crit Care. 2019;23(1):64. doi: 10.1186/s13054-019-2351-7
  • 7. Gerke S, Minssen T, Cohen G. Ethical and legal challenges of artificial intelligence-driven healthcare. In: Artificial Intelligence in Healthcare. Academic Press; 2020:295-336. doi: 10.1016/B978-0-12-818438-7.00012-5
  • 8. Goto T, Camargo CA Jr, Faridi MK, Freishtat RJ, Hasegawa K. Machine learning-based prediction of clinical outcomes for children during emergency department triage. JAMA Netw Open. 2019;2(1):e186937. doi: 10.1001/jamanetworkopen.2018.6937
  • 9. Obermeyer Z, Powers B, Vogeli C, Mullainathan S. Dissecting racial bias in an algorithm used to manage the health of populations. Science. 2019;366(6464):447-453. doi: 10.1126/science.aax2342
  • 10. Chen T, Guestrin C. XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM; 2016:785-794. doi: 10.1145/2939672.2939785
  • 11. Bishop CM. Pattern Recognition and Machine Learning. Springer; 2006.
  • 12. Ghassemi M, Naumann T, Schulam P, Beam AL, Chen IY, Ranganath R. A review of challenges and opportunities in machine learning for health. AMIA Jt Summits Transl Sci Proc. 2020;2020:191-200.
  • 13. Zeng Z, Deng Y, Li X, Naumann T, Luo Y. Natural language processing for EHR-based computational phenotyping. IEEE/ACM Trans Comput Biol Bioinform. 2019;16(1):139-153. doi: 10.1109/TCBB.2018.2849968
  • 14. Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need. In: Advances in Neural Information Processing Systems. 2017;30.
  • 15. Alsentzer E, Murphy JR, Boag W, et al. Publicly available clinical BERT embeddings. In: Proceedings of the 2nd Clinical Natural Language Processing Workshop. Association for Computational Linguistics; 2019:72-78. doi: 10.18653/v1/W19-1909
  • 16. Green SM, Denmark TK, Cline J, et al. Machine learning for prediction of clinical outcomes in a pediatric emergency department. Acad Emerg Med. 2021;28(3):314-323. doi: 10.1111/acem.14198
  • 17. Sterckx L, Vandewiele G, De Backere F, et al. Natural language processing for pediatric emergency department triage: a systematic review. J Am Med Inform Assoc. 2023;30(5):987-998. doi: 10.1093/jamia/ocad034
  • 18. MacWhinney B. The CHILDES Project: Tools for Analyzing Talk. 3rd ed. Lawrence Erlbaum Associates; 2000.
  • 19. Esteva A, Robicquet A, Ramsundar B, et al. A guide to deep learning in healthcare. Nat Med. 2019;25(1):24-29. doi: 10.1038/s41591-018-0316-z
  • 20. Rajkomar A, Dean J, Kohane I. Machine learning in medicine. N Engl J Med. 2019;380(14):1347-1358. doi: 10.1056/NEJMra1814259
  • 21. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521(7553):436-444. doi: 10.1038/nature14539
  • 22. Kelly CJ, Karthikesalingam A, Suleyman M, Corrado G, King D. Key challenges for delivering clinical impact with artificial intelligence. BMC Med. 2019;17(1):195. doi: 10.1186/s12916-019-1426-2
  • 23. Sutton RT, Pincock D, Baumgart DC, Sadowski DC, Fedorak RN, Kroeker KI. An overview of clinical decision support systems: benefits, risks, and strategies for success. NPJ Digit Med. 2020;3:17. doi: 10.1038/s41746-020-0221-y
  • 24. Goddard K, Roudsari A, Wyatt JC. Automation bias: a systematic review of frequency, effect mediators, and mitigators. J Am Med Inform Assoc. 2012;19(1):121-127. doi: 10.1136/amiajnl-2011-000089
  • 25. Liu X, Faes L, Kale AU, et al. A comparison of deep learning performance against health-care professionals in detecting diseases from medical imaging: a systematic review and meta-analysis. Lancet Digit Health. 2019;1(6):e271-e297. doi: 10.1016/S2589-7500(19)30123-2
  • 26. Zachariasse JM, Seiger N, Rood PP, et al. Validity of the Manchester Triage System in emergency care: a prospective observational study. PLoS One. 2017;12(2):e0170811. doi: 10.1371/journal.pone.0170811
  • 27. Warren DW, Jarvis A, LeBlanc L, Gravel J; CTAS National Working Group. Revisions to the Canadian Triage and Acuity Scale paediatric guidelines (PaedCTAS). CJEM. 2008;10(3):224-243. doi: 10.1017/s1481803500010149
  • 28. Tsai CH, Eghdam A, Davoody N, Wright G, Flowerday S, Koch S. Effects of electronic health record implementation and barriers to adoption and use: a scoping review and qualitative analysis of the content. Life. 2020;10(12):327. doi: 10.3390/life10120327
  • 29. Fleuren LM, Klausch TLT, Zwager CL, et al. Machine learning for the prediction of sepsis: a systematic review and meta-analysis of diagnostic test accuracy. Intensive Care Med. 2020;46(3):383-400. doi: 10.1007/s00134-019-05872-y
  • 30. Ramgopal S, Horvat CM, Siripong N, Kitsko D, Piccione J, Sanchez-Pinto LN. Age-stratified performance of machine learning models for predicting critical illness in pediatric emergency department patients. Pediatr Crit Care Med. 2022;23(8):e375-e384. doi: 10.1097/PCC.0000000000002987
  • 31. Simon TD, Haaland W, Hawley K, Lambka K, Mangione-Smith R. Development and validation of a pediatric medical complexity algorithm for the electronic health record. Hosp Pediatr. 2021;11(8):817-826. doi: 10.1542/hpeds.2020-005622
  • 32. Feudtner C, Feinstein JA, Zhong W, Hall M, Dai D. Pediatric complex chronic conditions classification system version 2: updated for ICD-10 and complex medical technology dependence and transplantation. BMC Pediatr. 2014;14:199. doi: 10.1186/1471-2431-14-199
  • 33. Feinstein JA, Russell S, DeWitt PE, Feudtner C, Dai D, Bennett TD. Development of a pediatric medical complexity algorithm for the electronic health record. Acad Pediatr. 2017;17(6):649-656. doi: 10.1016/j.acap.2017.02.008
  • 34. Grupp-Phelan J, Harman JS, Kelleher KJ. Trends in mental health and chronic condition visits by children presenting for care at US emergency departments. Public Health Rep. 2007;122(1):55-61. doi: 10.1177/003335490712200108
  • 35. Chen JH, Goldstein R, Lin A, et al. Performance disparities by language in natural language processing for pediatric chief complaint classification. JAMA Netw Open. 2023;6(4):e239874. doi: 10.1001/jamanetworkopen.2023.9874
  • 36. Mandrekar JN. Receiver operating characteristic curve in diagnostic test assessment. J Thorac Oncol. 2010;5(9):1315-1316. doi: 10.1097/JTO.0b013e3181ec173d
  • 37. Van Calster B, McLernon DJ, van Smeden M, Wynants L, Steyerberg EW; Topic Group “Evaluating diagnostic tests and prediction models” of the STRATOS Initiative. Calibration: the Achilles heel of predictive analytics. BMC Med. 2019;17(1):230. doi: 10.1186/s12916-019-1466-7
  • 38. Davis SE, Lasko TA, Chen G, Siew ED, Matheny ME. Calibration drift in regression and machine learning models for acute kidney injury. J Am Med Inform Assoc. 2017;24(6):1052-1061. doi: 10.1093/jamia/ocx030
  • 39. Gianfrancesco MA, Tamang S, Yazdany J, Schmajuk G. Potential biases in machine learning algorithms using electronic health record data. JAMA Intern Med. 2018;178(11):1544-1547. doi: 10.1001/jamainternmed.2018.3763
  • 40. Wiens J, Saria S, Sendak M, et al. Do no harm: a roadmap for responsible machine learning for health care. Nat Med. 2019;25(9):1337-1340. doi: 10.1038/s41591-019-0548-6
  • 41. Chen IY, Szolovits P, Ghassemi M. Can AI help reduce disparities in general medical and mental health care? AMA J Ethics. 2019;21(2):E167-E179. doi: 10.1001/amajethics.2019.167
  • 42. Obermeyer Z, Nissan R, Stern M, Eaneff S, Bembeneck EJ, Mullainathan S. Algorithmic bias in a clinical prediction model for pediatric asthma exacerbation. Health Aff. 2022;41(10):1445-1453. doi: 10.1377/hlthaff.2022.00567
  • 43. Keet CA, McCormack MC, Pollack CE, Peng RD, McGowan E, Matsui EC. Neighborhood poverty, urban residence, race/ethnicity, and asthma: rethinking the inner-city asthma epidemic. J Allergy Clin Immunol. 2015;135(3):655-662. doi: 10.1016/j.jaci.2014.11.022
  • 44. Lyon SM, Wunsch H, Asch DA, et al. Race and ethnicity in machine learning for clinical prediction: a scoping review. JAMA Netw Open. 2022;5(11):e2241569. doi: 10.1001/jamanetworkopen.2022.41569
  • 45. Vyas DA, Eisenstein LG, Jones DS. Hidden in plain sight—reconsidering the use of race correction in clinical algorithms. N Engl J Med. 2020;383(9):874-882. doi: 10.1056/NEJMms2004740
  • 46. Mhasawade V, Zhao Y, Chunara R. Machine learning and algorithmic fairness in public and population health. Nat Mach Intell. 2021;3(8):659-666. doi: 10.1038/s42256-021-00373-4
  • 47. Fiscella K, Sanders MR. Racial and ethnic disparities in the quality of health care. Annu Rev Public Health. 2016;37:375-394. doi: 10.1146/annurev-publhealth-032315-021439
  • 48. Flores G; Committee on Pediatric Research. Technical report—racial and ethnic disparities in the health and health care of children. Pediatrics. 2010;125(4):e979-e1020. doi: 10.1542/peds.2010-0188
  • 49. Pleiss G, Raghavan M, Wu F, Kleinberg J, Weinberger KQ. On fairness and calibration. In: Advances in Neural Information Processing Systems. 2017;30.
  • 50. Benjamin R. Race After Technology: Abolitionist Tools for the New Jim Code. Polity Press; 2019.
  • 51. Prelim M, Chang A, Redline S, Celi LA, Brown S. Parent perspectives on artificial intelligence in pediatric hospital medicine: a qualitative study. J Hosp Med. 2022;17(8):612-619. doi: 10.1002/jhm.12890
  • 52. Coyne I, Hallström I, Söderbäck M. Reframing the focus from a child-centred to a child-rights perspective in healthcare. Children. 2022;9(5):625. doi: 10.3390/children9050625
  • 53. Rajkomar A, Hardt M, Howell MD, Corrado G, Chin MH. Ensuring fairness in machine learning to advance health equity. Ann Intern Med. 2018;169(12):866-872. doi: 10.7326/M18-1990
  • 54. Mehrabi N, Morstatter F, Saxena N, Lerman K, Galstyan A. A survey on bias and fairness in machine learning. ACM Comput Surv. 2021;54(6):1-35. doi: 10.1145/3457607
  • 55. Zafar MB, Valera I, Gomez Rodriguez M, Gummadi KP. Fairness constraints: a flexible approach for fair classification. J Mach Learn Res. 2019;20(75):1-42.
  • 56. Mitchell S, Potash E, Barocas S, D’Amour A, Lum K. Algorithmic fairness: choices, assumptions, and definitions. Annu Rev Stat Appl. 2021;8:141-163. doi: 10.1146/annurev-statistics-042720-125902
  • 57. Price WN, Gerke S, Cohen IG. Potential liability for physicians using artificial intelligence. JAMA. 2019;322(18):1765-1766. doi: 10.1001/jama.2019.15064
  • 58. Gerke S, Babic B, Evgeniou T, Cohen IG. The need for a system view to regulate artificial intelligence/machine learning-based software as medical device. NPJ Digit Med. 2020;3:53. doi: 10.1038/s41746-020-0262-2
  • 59. Beauchamp TL, Childress JF. Principles of Biomedical Ethics. 8th ed. Oxford University Press; 2019.
  • 60. Mittelstadt BD, Allo P, Taddeo M, Wachter S, Floridi L. The ethics of algorithms: mapping the debate. Big Data Soc. 2016;3(2):1-21. doi: 10.1177/2053951716679679
  • 61. Amann J, Blasimme A, Vayena E, Frey D, Madai VI; Precise4Q Consortium. Explainability for artificial intelligence in healthcare: a multidisciplinary perspective. BMC Med Inform Decis Mak. 2020;20(1):310. doi: 10.1186/s12911-020-01332-6
  • 62. Goodman B, Flaxman S. European Union regulations on algorithmic decision-making and a “right to explanation”. AI Mag. 2017;38(3):50-57. doi: 10.1609/aimag.v38i3.2741
  • 63. European Parliament and Council of the European Union. Regulation (EU) 2024/1689 on Laying Down Harmonised Rules on Artificial Intelligence (Artificial Intelligence Act). European Parliament and Council of the European Union; 2024.
  • 64. Tsai JC, Chen WY, Liang YW, Lin HJ, Guo HR. The effect of an artificial intelligence-based clinical decision support system on the triage of patients with acute coronary syndrome. Sci Rep. 2021;11(1):18119. doi: 10.1038/s41598-021-97601-5
  • 65. Patel SJ, Chamberlain DB, Chamberlain JM. Impact of a machine learning-based triage tool on pediatric emergency department length of stay. Pediatr Emerg Care. 2022;38(9):e1523-e1528. doi: 10.1097/PEC.0000000000002798
  • 66. Chen JH, Asch SM. Machine learning and prediction in medicine—beyond the peak of inflated expectations. N Engl J Med. 2017;376(26):2507-2509. doi: 10.1056/NEJMp1702071
  • 67. Wang L, Wang Y, Chang S, et al. The effectiveness of an artificial intelligence-based clinical decision support system for the triage of patients with acute abdominal pain. J Med Syst. 2020;44(6):111. doi: 10.1007/s10916-020-01574-w
  • 68. Kim Y, Lee S, Choi JW, et al. Effectiveness of an artificial intelligence-based clinical decision support system for the triage of patients with chest pain. J Am Coll Cardiol. 2021;77(18 suppl 1):1-10. doi: 10.1016/S0735-1097(21)01234-5
  • 69. Johnson AEW, Ghassemi MM, Nemati S, Niehaus KE, Clifton DA, Clifford GD. Machine learning and decision support in critical care. Proc IEEE Inst Electr Electron Eng. 2016;104(2):444-466. doi: 10.1109/JPROC.2015.2501978
  • 70. Brown H, Terrence J, Vasquez P, Bates DW, Zimlichman E. Continuous monitoring of vital signs using wearable devices on the general ward: pilot study. JMIR Mhealth Uhealth. 2020;8(7):e15471. doi: 10.2196/15471
  • 71. Wahl B, Cossy-Gantner A, Germann S, Schwalbe NR. Artificial intelligence (AI) and global health: how can AI contribute to health in resource-poor settings? BMJ Glob Health. 2018;3(4):e000798. doi: 10.1136/bmjgh-2018-000798
  • 72. Mathews SC, McShea MJ, Hanley CL, Ravitz A, Labrique AB, Cohen AB. Digital health: a path to validation. NPJ Digit Med. 2019;2:38. doi: 10.1038/s41746-019-0111-3
  • 73. Owoyemi A, Owoyemi J, Osiyemi A, Boyd A. Artificial intelligence for healthcare in Africa. Front Digit Health. 2020;2:6. doi: 10.3389/fdgth.2020.00006
  • 74. Peek N, Combi C, Marin R, Bellazzi R. Thirty years of artificial intelligence in medicine (AIME) conferences: a review of research themes. Artif Intell Med. 2015;65(1):61-73. doi: 10.1016/j.artmed.2015.07.003
  • 75. Labrique A, Vasudevan L, Weiss W, Wilson K. Establishing standards to evaluate the impact of integrating digital health into health systems. Glob Health Sci Pract. 2018;6(suppl 1):S5-S17. doi: 10.9745/GHSP-D-18-00230
  • 76. Greenhalgh T, Wherton J, Papoutsi C, et al. Beyond adoption: a new framework for theorizing and evaluating nonadoption, abandonment, and challenges to the scale-up, spread, and sustainability of health and care technologies. J Med Internet Res. 2017;19(11):e367. doi: 10.2196/jmir.8775
  • 77. Borycki EM, Kushniruk AW, Bellwood P, Brender J. Technology-induced errors: the current use of frameworks and models from the biomedical and life sciences literatures. Methods Inf Med. 2012;51(2):95-103. doi: 10.3414/ME11-02-0009
  • 78. Castagno S, Khalifa M. Perceptions of artificial intelligence among healthcare staff: a qualitative survey study. Front Artif Intell. 2020;3:578983. doi: 10.3389/frai.2020.578983
  • 79. Roman LC, Ancker JS, Johnson SB, Senathirajah Y. Navigation in the electronic health record: a review of the safety and usability literature. J Biomed Inform. 2017;67:69-79. doi: 10.1016/j.jbi.2017.01.005
  • 80. Koppel R, Metlay JP, Cohen A, et al. Role of computerized physician order entry systems in facilitating medication errors. JAMA. 2005;293(10):1197-1203. doi: 10.1001/jama.293.10.1197
  • 81. Sullivan HR, Schweikart SJ. Are current tort liability doctrines adequate for addressing injury caused by AI? AMA J Ethics. 2019;21(2):E160-E166. doi: 10.1001/amajethics.2019.160
  • 82. Price WN II. Medical malpractice and black-box medicine. In: Cohen IG, Lynch HF, Vayena E, Gasser U, eds. Big Data, Health Law, and Bioethics. Cambridge University Press; 2018:295-308.
  • 83. Char DS, Abràmoff MD, Feudtner C. Identifying ethical considerations for machine learning healthcare applications. Am J Bioeth. 2020;20(11):7-17. doi: 10.1080/15265161.2020.1819469
  • 84. Yang Q, Steinfeld A, Zimmerman J. Unpacking emphasis on human-AI collaboration in healthcare: a systematic literature review of human-centered AI. ACM Trans Comput Hum Interact. 2023;30(4):1-32. doi: 10.1145/3582431
  • 85. Cai CJ, Winter S, Steiner D, Wilcox L, Terry M. “Hello AI”: uncovering the onboarding needs of medical practitioners for human-AI collaborative decision-making. Proc ACM Hum Comput Interact. 2019;3(CSCW):1-24. doi: 10.1145/3359206
  • 86. Adadi A, Berrada M. Peeking inside the black-box: a survey on explainable artificial intelligence (XAI). IEEE Access. 2018;6:52138-52160. doi: 10.1109/ACCESS.2018.2870052
  • 87. Tonekaboni S, Joshi S, McCradden MD, Goldenberg A. What clinicians want: contextualizing explainable machine learning for clinical end use. Proc Mach Learn Healthc Conf. 2019;106:1-21.
  • 88. Ghassemi M, Oakden-Rayner L, Beam AL. The false hope of current approaches to explainable artificial intelligence in health care. Lancet Digit Health. 2021;3(11):e745-e750. doi: 10.1016/S2589-7500(21)00208-9
  • 89. Acosta JN, Falcone GJ, Rajpurkar P, Topol EJ. Multimodal biomedical AI. Nat Med. 2022;28(9):1773-1784. doi: 10.1038/s41591-022-01981-2
  • 90. Huang SC, Pareek A, Seyyedi S, Banerjee I, Lungren MP. Fusion of medical imaging and electronic health records using deep learning: a systematic review and implementation guidelines. NPJ Digit Med. 2020;3:136. doi: 10.1038/s41746-020-00341-z
  • 91. Rieke N, Hancox J, Li W, et al. The future of digital health with federated learning. NPJ Digit Med. 2020;3:119. doi: 10.1038/s41746-020-00323-1
  • 92. Brisimi TS, Chen R, Mela T, Olshevsky A, Paschalidis IC, Shi W. Federated learning of predictive models from federated electronic health records. Int J Med Inform. 2018;112:59-67. doi: 10.1016/j.ijmedinf.2018.01.007
  • 93. Dayan I, Roth HR, Zhong A, et al. Federated learning for predicting clinical outcomes in patients with COVID-19. Nat Med. 2021;27(10):1735-1743. doi: 10.1038/s41591-021-01506-3
  • 94. Lee CS, Lee AY. Clinical applications of continual learning machine learning. Lancet Digit Health. 2020;2(6):e279-e281. doi: 10.1016/S2589-7500(20)30102-3
  • 95. Henry KE, Hager DN, Pronovost PJ, Saria S. A targeted real-time early warning score (TREWScore) for septic shock. Sci Transl Med. 2015;7(299):299ra122. doi: 10.1126/scitranslmed.aab3719
  • 96. Balamuth F, Alpern ER, Abbadessa MK, et al. Improving recognition of pediatric severe sepsis in the emergency department: contributions of a vital sign-based electronic alert and bedside clinician identification. Ann Emerg Med. 2017;70(6):759-768.e2. doi: 10.1016/j.annemergmed.2017.03.019
  • 97. Santillana M, Nguyen AT, Dredze M, Paul MJ, Nsoesie EO, Brownstein JS. Combining search, social media, and traditional data sources to improve influenza surveillance. PLoS Comput Biol. 2015;11(10):e1004513. doi: 10.1371/journal.pcbi.1004513
  • 98. U.S. Food and Drug Administration. Proposed Regulatory Framework for Modifications to Artificial Intelligence/Machine Learning (AI/ML)-Based Software as a Medical Device (SaMD). U.S. Food and Drug Administration; 2019.
  • 99. World Health Organization. Global Strategy on Digital Health 2020-2025. World Health Organization; 2021.
  • 100. World Health Organization. Ethics and Governance of Artificial Intelligence for Health: WHO Guidance. World Health Organization; 2021.
  • 101. Khennou F, Latif S, Abdul Razak S, El Hassani AH, El Beqqali O. Simulation-based validation of machine learning models for healthcare: a systematic review. J Biomed Inform. 2022;136:104237. doi: 10.1016/j.jbi.2022.104237
  • 102. Feng J, Phillips RV, Malenica I, et al. Clinical trial simulation to evaluate the performance of machine learning models for treatment effect estimation. J Am Med Inform Assoc. 2021;28(9):1912-1921. doi: 10.1093/jamia/ocab097
  • 103. Parvinian B, Scully C, Wiyor H, Kumar A, Weininger S. Regulatory considerations for physiological closed-loop controlled medical devices used for automated critical care: food and drug administration workshop discussion topics. Anesth Analg. 2018;126(6):1916-1925. doi: 10.1213/ANE.0000000000002849
  • 104. Greenhalgh T, Thorne S, Malterud K. Time to challenge the spurious hierarchy of systematic over narrative reviews? Eur J Clin Invest. 2018;48(6):e12931. doi: 10.1111/eci.12931
  • 105. Popay J, Roberts H, Sowden A, et al. Guidance on the conduct of narrative synthesis in systematic reviews. ESRC Methods Programme; 2006.
  • 106. Ioannidis JPA. The mass production of redundant, misleading, and conflicted systematic reviews and meta-analyses. Milbank Q. 2016;94(3):485-514. doi: 10.1111/1468-0009.12210
  • 107. Fernandes M, Vieira SM, Leite F, Palos C, Finkelstein S, Sousa JMC. Artificial intelligence in emergency medicine: a scoping review. J Am Coll Emerg Physicians Open. 2020;1(6):1691-1702. doi: 10.1002/emp2.12277
  • 108. Hwang S, You J, Kim J, et al. Machine learning-based prediction of critical illness in children visiting the emergency department. PLoS One. 2022;17(2):e0264184. doi: 10.1371/journal.pone.0264184
  • 109. Easterbrook PJ, Berlin JA, Gopalan R, Matthews DR. Publication bias in clinical research. Lancet. 1991;337(8746):867-872. doi: 10.1016/0140-6736(91)90201-Y
  • 110. Stewart J, Sprivulis P, Dwivedi G. Artificial intelligence and machine learning in emergency medicine: a systematic review of registered clinical trials. Emerg Med Australas. 2023;35(2):209-218.
  • 111. Williams C, James D, Wilson S, Thompson R. Implementation of an AI triage system in community pediatric emergency departments: a cluster-randomized trial. Acad Emerg Med. 2023;30(7):712-721.
  • 112. Okada Y, Narumoto J, Matsumoto K, Tanaka H. External validation of a deep learning model for pediatric emergency triage: retrospective cohort study. JMIR Med Inform. 2022;10(8):e37892.

Articles from Sage Open Pediatrics are provided here courtesy of SAGE Publications
