Artificial Intelligence-Driven Triage in Pediatric Emergency Departments: Accuracy, Bias, and Impact on Clinical Outcomes: A Narrative Review

Eslam Abady; Mandy Elewa; Habiba Abdelhameed Elrefaey; Kevin Thomas Mathew; Panos Tamvakologos; Kayleigh Kuhn; Mohammed Alsabri

doi:10.1177/30502225261445743

2026 May 11;13:30502225261445743.



PMCID: PMC13168721 PMID: 42137483

Abstract

AI-driven triage presents a transformative opportunity to address persistent challenges in pediatric emergency care, from overcrowding and waiting times to human error and outcome disparities. This narrative review finds that AI systems can achieve high accuracy in predicting critical outcomes, with pooled AUROCs of 0.87 for hospital admission, 0.93 for ICU admission, and 0.93 for mortality, significantly outperforming traditional triage scales, while observational studies report associations with improved efficiency, reduced triage errors, and enhanced resource allocation. However, the available evidence is likely affected by publication bias favoring positive results, and some studies report no benefit or performance degradation. The promise of AI is tempered by significant challenges: performance varies across pediatric subgroups, the risks of perpetuating and amplifying bias remain inadequately addressed, and workflow integration and medico-legal liability require careful navigation. AI should augment rather than replace clinical judgment, guided by robust governance frameworks, fairness auditing, and human oversight, to support more equitable emergency care.

Keywords: artificial intelligence, machine learning, pediatric emergency medicine, triage, clinical decision support, health equity, bias, natural language processing, implementation science, narrative review

Introduction

Emergency department (ED) crowding represents a critical challenge in healthcare systems worldwide, with pediatric emergency departments (PEDs) facing unique pressures.¹ Children experiencing prolonged waits are at increased risk of adverse outcomes, including higher rates of leaving without being seen and potential clinical deterioration.² Triage, whose fundamental role is to prioritize care by acuity, becomes increasingly vital in these high-pressure environments. Traditional triage systems such as the Emergency Severity Index (ESI) and Canadian Triage and Acuity Scale (CTAS), while widely implemented, rely on standardized algorithms and clinical judgment that can lead to significant variability in accuracy.³ Because these systems were developed primarily from adult populations and expert opinion (the lowest level of evidence), they may not adequately capture the unique physiological and developmental characteristics of pediatric patients.⁴

The emergence of artificial intelligence (AI) in healthcare offers transformative potential for addressing these challenges. AI-driven triage systems can process large volumes of structured and unstructured data, identify complex patterns that are difficult for clinicians to detect, and provide consistent decision support.⁵ By integrating multiple data sources, including vital signs, patient history, free-text chief complaints, and even medical imaging, these systems can generate more accurate predictions of patient outcomes than conventional methods.⁶ However, the application of AI in pediatrics requires careful consideration of distinct challenges, including developmental physiological changes, differing disease presentations, and ethical concerns regarding vulnerable populations.⁷

Recent advancements in machine learning, particularly deep learning and natural language processing, have enabled the development of sophisticated triage tools specifically designed for pediatric applications.⁸ These systems demonstrate promising results in predicting critical outcomes such as hospitalization, ICU admission, and mortality.⁹ Nevertheless, significant questions remain regarding their generalizability across diverse patient populations, potential to exacerbate existing health disparities, and practical integration into complex clinical workflows.¹⁰

This narrative review offers 3 distinct contributions to the literature: (1) an exclusive and comprehensive focus on pediatric populations, addressing developmental variations and disease spectrum differences that are often overlooked in adult-derived models; (2) an in-depth critical analysis of bias, fairness, and equity considerations specific to children, including vulnerable subgroups; and (3) a synthesis of implementation challenges across diverse resource settings, with practical recommendations for responsible deployment. By synthesizing evidence from landmark studies and recent advances, this review aims to inform clinicians, researchers, and policymakers about the current state and future directions of AI-driven triage in pediatric emergency care. Figure 1 illustrates the key workflow gaps in traditional PED triage that AI systems aim to address.

Figure 1. Pediatric ED triage workflow gaps. Traditional triage processes face challenges including documentation burden, inter-rater reliability issues, and limited integration of clinical data. AI systems aim to address these gaps through automated data extraction, standardized acuity assessment, and predictive analytics.

Literature Search Strategy

This narrative review was informed by a structured literature search of PubMed, MEDLINE, Embase, IEEE Xplore, and the ACM Digital Library for English-language publications from January 2015 through October 2025. Search terms combined concepts related to artificial intelligence or machine learning (“artificial intelligence,” “machine learning,” “deep learning,” “natural language processing,” “neural network”), pediatric emergency care (“pediatric emergency,” “children,” “infant,” “child,” “adolescent”), and triage (“triage,” “acuity,” “risk stratification,” “clinical decision support”). Additional articles were identified through reference list screening of included studies and relevant systematic reviews. Given the narrative review methodology, no formal quality assessment or quantitative synthesis was performed, but priority was given to landmark studies, externally validated models, and recent high-impact publications.

AI Models and Methodologies

AI-driven triage systems employ diverse computational approaches, each with distinct strengths for processing clinical data. Supervised learning algorithms, including XGBoost, random forests, and support vector machines, learn from labeled datasets to predict specific outcomes such as sepsis risk or hospitalization need.^11,12 These models excel with structured data but require extensive, high-quality labeled datasets for training. Unsupervised learning techniques, including clustering algorithms and dimensionality reduction methods, identify hidden patterns and patient subgroups without predefined labels, offering insights into novel disease patterns or phenotypic clusters.¹³
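To make the supervised setup concrete, the sketch below trains a logistic regression (a simpler stand-in for the XGBoost, random forest, and support vector machine models named above) by batch gradient descent on a tiny labeled dataset. The two features (an age-adjusted heart-rate z-score and an oxygen-saturation deficit in percentage points) and the admission labels are hypothetical and invented for illustration; a real triage model would need far more data, external validation, and calibration checks.

```python
import math

def sigmoid(z: float) -> float:
    """Numerically stable logistic function mapping a score to a probability."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of log-loss w.r.t. the linear score is (prediction - label).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(w, b, x):
    """Predicted probability of the positive class (here, hospital admission)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Hypothetical labeled records: [heart-rate z-score, SpO2 deficit]; label 1 = admitted.
X = [[0.1, 0.0], [-0.5, 0.0], [0.3, 1.0], [0.0, 0.5],
     [2.0, 4.0], [1.8, 3.0], [2.5, 5.0], [1.5, 2.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = train_logistic(X, y)
p_low = predict_proba(w, b, [0.0, 0.0])
p_high = predict_proba(w, b, [2.2, 4.0])
```

The model learns only from the labels it is given, which is why the quality and representativeness of labeled training data matter so much for the bias issues discussed later.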

Natural language processing (NLP) has emerged as particularly valuable for pediatric triage, where chief complaints and clinical narratives contain crucial information often lost in structured data fields.¹⁴ Transformer-based models such as bidirectional encoder representations from transformers (BERT) and domain-specific variants like BioGPT process free-text data by analyzing contextual relationships between words, enabling sophisticated understanding of clinical language.^15,16 These models have demonstrated impressive performance in pediatric applications, achieving top-1 accuracies of 0.65 to 0.69 and top-5 accuracies of 0.92 to 0.94 in classifying chief complaints.^17,18 The development of pediatric-specific language corpora is essential, as children’s speech patterns, developmental stages, and documentation requirements differ significantly from adults.¹⁹ NLP systems must account for these differences to avoid misclassification and ensure accurate triage decisions.
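Top-k accuracy, the metric quoted above, counts a chief complaint as correctly classified if its true category appears among the model's k highest-scoring categories, so top-5 accuracy is always at least as high as top-1 accuracy. A minimal sketch, with hypothetical category names and model scores:

```python
def top_k_accuracy(score_dicts, true_labels, k):
    """Share of cases whose true category is among the k highest-scoring categories."""
    if len(score_dicts) != len(true_labels) or not true_labels:
        raise ValueError("need one score dictionary per labeled case")
    hits = 0
    for scores, truth in zip(score_dicts, true_labels):
        # Rank categories from highest to lowest model score.
        ranked = sorted(scores, key=scores.get, reverse=True)
        if truth in ranked[:k]:
            hits += 1
    return hits / len(true_labels)

# Hypothetical model scores for three free-text chief complaints.
scores = [
    {"fever": 0.6, "respiratory": 0.3, "rash": 0.1},
    {"fever": 0.5, "respiratory": 0.4, "rash": 0.1},
    {"fever": 0.2, "respiratory": 0.3, "rash": 0.5},
]
truth = ["fever", "respiratory", "fever"]

top1 = top_k_accuracy(scores, truth, k=1)  # only the first case is correct at rank 1
top2 = top_k_accuracy(scores, truth, k=2)  # the second case is recovered at rank 2
top3 = top_k_accuracy(scores, truth, k=3)  # every true label appears somewhere in the top 3
```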

Deep learning models, particularly convolutional neural networks and recurrent neural networks, handle high-dimensional electronic health record data, including temporal sequences of vital signs, laboratory results, and medical images.^20,21 These architectures learn relevant features directly from raw data, reducing reliance on the manual feature engineering that traditional machine learning approaches often require.²² However, this flexibility comes with substantial computational requirements and a need for large training datasets, presenting challenges for resource-limited settings.²³

Hybrid approaches that combine AI analysis with clinician oversight represent the most promising implementation model.²⁴ Parallel integration systems, in which AI runs alongside the clinical workflow rather than ahead of it, have shown particular success. This approach reduces cognitive load and minimizes automation bias (the tendency to over-rely on automated systems) while maintaining crucial human oversight for complex cases.²⁵ The integration of computer vision for medical image analysis further enhances triage capabilities, enabling rapid identification of critical findings in radiographs, retinal images, and other visual diagnostics.²⁶ Table 1 summarizes AI model characteristics and their pediatric applications.

Table 1.

AI Model Characteristics and Applications in Pediatric Triage.

Model type	Key strength	Data requirements	Pediatric applications	Considerations
Supervised learning	High prediction accuracy	Labeled structured data	Sepsis prediction, admission risk	Requires large labeled datasets; risk of overfitting
Unsupervised learning	Pattern discovery	Unlabeled data	Patient phenotyping, subgroup identification	Results may be difficult to interpret clinically
Natural language processing	Text understanding	Clinical narratives	Chief complaint analysis, documentation	Requires pediatric-specific language corpora
Deep learning	Complex data processing	Multimodal data	Image interpretation, risk stratification	Computationally intensive; large data requirements
Hybrid models	Balanced automation	Clinical-AI integration	Decision support, safety checking	Maintains human oversight; reduces automation bias
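The parallel integration model described above can be sketched as a simple rule: the nurse assigns a triage level independently, the model's suggestion is shown alongside it, and large disagreements are flagged for clinician review without the model ever overwriting the human decision. The 1 to 5 scale, the two-level disagreement threshold, and all names here are hypothetical choices for illustration, not a published system's logic.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ParallelTriageResult:
    final_level: int          # the nurse's level; the model never overwrites it
    model_level: int          # level suggested by the model, shown alongside
    flagged: bool             # True when the two levels differ by >= threshold
    model_more_urgent: bool   # True when the model suggests a more urgent level

def parallel_triage(nurse_level: int, model_level: int, threshold: int = 2) -> ParallelTriageResult:
    """Hypothetical parallel-integration rule on a 1 (most urgent) to 5 (least urgent) scale."""
    for level in (nurse_level, model_level):
        if not 1 <= level <= 5:
            raise ValueError("triage levels must be between 1 and 5")
    return ParallelTriageResult(
        final_level=nurse_level,
        model_level=model_level,
        flagged=abs(nurse_level - model_level) >= threshold,
        model_more_urgent=model_level < nurse_level,
    )

review = parallel_triage(nurse_level=4, model_level=2)  # flagged: possible undertriage
agree = parallel_triage(nurse_level=3, model_level=2)   # one-level gap: not flagged
```

Keeping `final_level` tied to the clinician is the design choice that preserves human oversight; raising the threshold reduces alert burden at the cost of surfacing fewer disagreements.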


Diagnostic Accuracy and Performance Metrics

Overall Accuracy Compared to Traditional Triage

The performance of AI-driven triage systems has been evaluated against traditional human-led systems, with generally superior results across multiple metrics. A systematic review and meta-analysis of 15 studies found that AI-based triage achieved pooled AUROCs of 0.87 (95% CI: 0.84-0.90) for hospital admission, 0.93 (95% CI: 0.90-0.96) for ICU admission, and 0.93 (95% CI: 0.90-0.96) for mortality.³ However, the original meta-analysis reported substantial between-study heterogeneity for these estimates (I² values: 78% for hospital admission, 65% for ICU admission, and 71% for mortality), so the pooled AUROCs should be interpreted with caution. This heterogeneity likely reflects differences in patient populations, outcome definitions, AI model architectures, and validation methodologies across studies, and readers are advised to examine the range of reported performance metrics rather than relying solely on pooled point estimates. These results compare favorably with conventional systems: the same review found that ESI showed pooled sensitivity of 0.81 (95% CI: 0.73-0.87) and specificity of 0.63 (95% CI: 0.55-0.70) for hospital admission, while CTAS demonstrated sensitivity of 0.73 (95% CI: 0.63-0.81) and specificity of 0.74 (95% CI: 0.66-0.80) for the same outcome.²⁷ Because the traditional scales are summarized at a single operating point whereas AUROC summarizes discrimination across all thresholds, the two sets of figures are not directly comparable.
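The I² statistic quoted above expresses the proportion of variability across study estimates that is attributable to between-study differences rather than chance. It is derived from Cochran's Q, the inverse-variance-weighted sum of squared deviations from the pooled estimate, as I² = max(0, (Q - df)/Q), where df is the number of studies minus one. The minimal sketch below uses hypothetical AUROCs and standard errors and, for simplicity, pools on the raw scale with fixed-effect weights; published meta-analyses of AUROC often use a transformed scale and random-effects models instead.

```python
def cochran_q_i2(estimates, std_errors):
    """Fixed-effect inverse-variance pooling with Cochran's Q and I-squared."""
    if len(estimates) != len(std_errors) or len(estimates) < 2:
        raise ValueError("need matching estimates and standard errors for >= 2 studies")
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    # I-squared is truncated at zero when Q falls below its expected value (df).
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, q, i2

# Two hypothetical studies with clearly different AUROCs and equal precision.
pooled, q, i2 = cochran_q_i2([0.80, 0.90], [0.02, 0.02])
print(f"pooled={pooled:.3f}, Q={q:.1f}, I2={i2:.0%}")  # pooled=0.850, Q=12.5, I2=92%
```

Here the pooled value of 0.85 describes neither study well, which is the practical reason a high I² argues for inspecting the range of study-level results.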

Recent pediatric-specific validations reinforce these findings while highlighting unique considerations for child health. A systematic review of 10 studies focusing exclusively on pediatric populations reported AI triage achieving sensitivity and specificity of 0.85 for hospital admission and 0.87 for ICU transfer.³ The Smart Triage system, specifically developed for pediatric emergency care, demonstrated excellent discrimination (AUROC 0.89 for admission) but required local recalibration for different populations and settings, with performance dropping to AUROC 0.79 to 0.82 when applied without recalibration.¹⁷ These results underscore the necessity of population-specific validation, as models trained on adult data frequently fail to account for developmental physiological changes and pediatric disease patterns.²⁸ Table 2 defines key performance metrics and their clinical significance for pediatric AI triage systems.

Table 2.

Performance Metrics for Pediatric AI Triage Systems.

Metric	Definition	Clinical significance	Pediatric considerations
Sensitivity	True positive rate	Identifies critical cases	Varies by age; lower in infants
Specificity	True negative rate	Supports safe discharge	Affected by developmental norms
AUROC	Overall discrimination	Compares urgent vs non-urgent	Requires age-stratified validation
Calibration	Prediction-reality agreement	Ensures reliable risk estimation	Must account for developmental changes
F1-score	Precision-recall balance	Useful for imbalanced data	Varies across pediatric subgroups
Positive predictive value	Proportion of true positives among predicted positives	Informs resource allocation	Prevalence-dependent; lower in low-acuity settings
Negative predictive value	Proportion of true negatives among predicted negatives	Supports discharge decisions	High value for safe disposition
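The threshold-dependent metrics in Table 2 all derive from a 2x2 confusion matrix. The sketch below (the counts and prevalences are hypothetical) computes them, shows via Bayes' rule why PPV depends on prevalence even when sensitivity and specificity are fixed, and computes the area under a ROC curve passing through a single operating point, which equals (sensitivity + specificity)/2. Applied to the pooled ESI admission figures cited earlier (0.81 and 0.63), that area is 0.72; because ESI is a five-level scale, this is only a rough, threshold-specific reference point rather than a true AUROC for the scale.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Table 2 metrics from a 2x2 confusion matrix at one decision threshold."""
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    ppv = tp / (tp + fp)           # positive predictive value (precision)
    npv = tn / (tn + fn)           # negative predictive value
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "ppv": ppv, "npv": npv, "f1": f1}

def ppv_at_prevalence(sensitivity, specificity, prevalence):
    """PPV implied by fixed sensitivity and specificity at a given prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

def single_point_auc(sensitivity, specificity):
    """Area under the ROC curve through one operating point joined to (0,0) and (1,1)."""
    return (sensitivity + specificity) / 2

m = confusion_metrics(tp=80, fp=30, fn=20, tn=70)
# Pooled ESI admission figures at two hypothetical admission prevalences.
ppv_busy = ppv_at_prevalence(0.81, 0.63, prevalence=0.20)       # about 0.35
ppv_low_acuity = ppv_at_prevalence(0.81, 0.63, prevalence=0.05)  # about 0.10
esi_point_auc = single_point_auc(0.81, 0.63)
```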


Performance Variation Across Pediatric Subgroups

Performance variation across pediatric subgroups represents a critical consideration for AI implementation. Studies consistently show reduced accuracy for specific populations, including younger children (particularly infants), patients with complex chronic conditions, and those presenting with mental health concerns.^19,29 This variability stems from several factors: physiological parameters change dramatically with age, chronic conditions introduce complexity that challenges standard predictive models, and mental health presentations often rely on nuanced behavioral assessments that are difficult to quantify.^30,31

Age-Related Variation: Ramgopal et al³⁰ reported that models trained on combined pediatric data showed systematically poorer performance for children under 6 months compared to older children, with AUROC reductions of 0.09 to 0.14. This likely reflects physiological instability, atypical presentations of common illnesses, and limited ability to verbalize symptoms in this age group.
Chronic Conditions: Children with complex chronic conditions experience poorer model calibration across multiple studies.^32,33 Feinstein et al³⁴ found that overestimation of admission risk in this population led to potential overtriage, likely due to underrepresentation of these patients in training data and their atypical clinical trajectories.
Mental Health Presentations: Grupp-Phelan et al³⁵ reported substantially lower accuracy for mental health chief complaints (AUROC 0.71 vs 0.88 for medical complaints), reflecting the difficulty of quantifying behavioral assessments and the limited representation of mental health presentations in training data.
Language Barriers: Chen et al³⁶ demonstrated that NLP systems analyzing chief complaints performed less accurately for families where English was not the primary language, with performance degradation of 12% to 18% for Spanish-language fever descriptions, leading to undertriage of febrile infants.
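Detecting the subgroup gaps described above requires computing discrimination separately within each subgroup rather than only overall. The sketch below computes AUROC as the probability that a randomly chosen child with the outcome receives a higher predicted risk than a randomly chosen child without it (ties count half), stratified by a hypothetical age-group label. The records are invented, and the pairwise loop is quadratic, which is fine for illustration but slow for large cohorts.

```python
from collections import defaultdict

def auroc(scores, labels):
    """AUROC as the probability a random positive outranks a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs both outcome classes in the subgroup")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))

def auroc_by_subgroup(records):
    """records: iterable of (subgroup, predicted_risk, outcome) tuples."""
    groups = defaultdict(lambda: ([], []))
    for group, score, label in records:
        groups[group][0].append(score)
        groups[group][1].append(label)
    return {g: auroc(s, y) for g, (s, y) in groups.items()}

# Hypothetical predictions and outcomes for two age strata.
records = [
    ("under_6_months", 0.6, 1), ("under_6_months", 0.4, 1),
    ("under_6_months", 0.5, 0), ("under_6_months", 0.2, 0),
    ("6_months_plus", 0.9, 1), ("6_months_plus", 0.8, 1),
    ("6_months_plus", 0.3, 0), ("6_months_plus", 0.2, 0),
]
by_group = auroc_by_subgroup(records)
```

A pooled AUROC over all eight records would hide the fact that, in this toy example, discrimination is perfect in older children but only 0.75 in young infants.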

Calibration and Clinical Utility

The evaluation of AI triage systems extends beyond traditional accuracy metrics to include calibration, fairness, and clinical utility. While AUROC provides valuable information about overall discriminative ability, calibration (how well predicted probabilities match observed outcomes) is often more clinically relevant for risk stratification.³⁷ For example, among patients to whom a well-calibrated model assigns a 10% mortality risk, approximately 10% should die. Van Calster et al³⁸ emphasize that poor calibration can lead to inappropriate clinical decisions even when discrimination is excellent.
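The 10%-predicted, 10%-observed idea can be checked directly by grouping patients into risk bins and comparing the mean predicted risk with the observed event rate in each bin. A minimal sketch with equal-width bins and hypothetical data; smoothed calibration curves are a common alternative to fixed bins.

```python
def calibration_table(probs, outcomes, n_bins=5):
    """Group predictions into equal-width risk bins; compare mean predicted vs observed rate."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        # A prediction of exactly 1.0 goes in the top bin rather than overflowing.
        bins[min(int(p * n_bins), n_bins - 1)].append((p, y))
    rows = []
    for i, members in enumerate(bins):
        if members:
            mean_pred = sum(p for p, _ in members) / len(members)
            observed = sum(y for _, y in members) / len(members)
            rows.append((i, len(members), mean_pred, observed))
    return rows

# Hypothetical: 10 children at 10% predicted risk (1 event), 10 at 50% (8 events).
probs = [0.1] * 10 + [0.5] * 10
outcomes = [1] + [0] * 9 + [1] * 8 + [0] * 2
rows = calibration_table(probs, outcomes)
for bin_idx, n, expected, observed in rows:
    print(f"bin {bin_idx}: n={n}, predicted={expected:.2f}, observed={observed:.2f}")
```

In this toy example the low-risk bin is well calibrated, while the higher-risk bin underestimates risk (50% predicted vs 80% observed), a pattern that a single AUROC value would not reveal.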

Among studies reporting calibration measures, Green et al¹⁷ reported no evidence of miscalibration (Hosmer-Lemeshow P > .05, which indicates that miscalibration was not detected rather than confirming good calibration) in derivation cohorts, but Davis et al³⁹ found that only 2 of 8 studies maintained calibration in external validation, highlighting the need for local recalibration before clinical deployment. Figure 2 presents conceptual receiver operating characteristic (ROC) curves comparing AI and ESI performance based on published summary estimates.^3,27

Figure 2. Conceptual ROC curves comparing AI and ESI performance (illustrative). These curves are illustrative representations based on published summary estimates^3,27 and are not derived from individual patient data meta-analysis. They are intended to visually convey comparative performance trends rather than provide precise quantitative comparisons.
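The simplest form of local recalibration mentioned above, often called calibration-in-the-large, shifts every prediction by a constant on the logit scale so that the mean predicted risk matches the local event rate. The source does not specify which recalibration method any cited study used; this is a generic sketch with hypothetical data, using bisection because the mean predicted risk increases monotonically with the shift. Because a constant logit shift preserves the ranking of patients, it leaves AUROC unchanged: correcting miscalibration across the risk range needs at least slope recalibration, and recovering lost discrimination needs model re-fitting.

```python
import math

def _logit(p):
    return math.log(p / (1.0 - p))

def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def recalibrate_intercept(probs, outcomes, tol=1e-10):
    """Find the logit shift that makes mean predicted risk equal the local event rate."""
    target = sum(outcomes) / len(outcomes)
    if not 0.0 < target < 1.0:
        raise ValueError("local data must contain both events and non-events")
    logits = [_logit(p) for p in probs]

    def mean_risk(shift):
        return sum(_sigmoid(z + shift) for z in logits) / len(logits)

    lo, hi = -20.0, 20.0  # assumed search bracket for the shift
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if mean_risk(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def apply_shift(probs, shift):
    """Apply the same logit-scale shift to every prediction."""
    return [_sigmoid(_logit(p) + shift) for p in probs]

# Hypothetical: the model predicts 20% admission risk, but 4 of 10 local children are admitted.
probs = [0.2] * 10
outcomes = [1] * 4 + [0] * 6
shift = recalibrate_intercept(probs, outcomes)
updated = apply_shift(probs, shift)
```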

Bias, Equity, and Ethical Challenges

Sources of Bias in AI Triage Systems

The performance and fairness of AI-driven triage systems are inextricably linked to the data used for their development. Biases embedded in training data, whether from historical healthcare disparities, demographic underrepresentation, or subjective human decisions, can be perpetuated and even amplified by AI systems.⁴⁰ For instance, models trained predominantly on data from urban tertiary care centers may perform poorly in rural community hospitals, failing to account for different patient demographics, disease prevalence, and resource availability.⁴¹ Similarly, if training data contain fewer examples of particular conditions in specific ethnic groups, the model may demonstrate lower accuracy for those groups, potentially leading to systematic undertriage.⁴²

Obermeyer et al⁴³ famously demonstrated that a widely used population health management algorithm exhibited racial bias, systematically underestimating the health needs of Black patients because it used healthcare costs as a proxy for health needs. While this study focused on adult populations, similar mechanisms could affect pediatric triage systems. If historical data show lower admission rates for asthma in a minority group due to barriers to care rather than lower severity, an AI trained on these data might learn to assign lower priority to these patients, exacerbating existing health disparities.⁴⁴

Empirically Documented Disparities in Pediatric AI Triage

Several studies have empirically documented performance disparities across pediatric subgroups:

Racial and Ethnic Minorities: Lyon et al⁴⁴ conducted a scoping review of race and ethnicity in machine learning for clinical prediction, finding mixed results. One large multi-center study found no significant differences in AUROC for admission prediction across racial groups,⁴⁵ while 2 single-center studies reported lower specificity for Black and Hispanic children, potentially leading to higher false-positive triage rates.^42,45 Vyas et al⁴⁶ caution that even when accuracy metrics appear similar across groups, calibration or decision thresholds may differ, leading to disparate outcomes.
Socioeconomic Status: Mhasawade et al⁴¹ reviewed machine learning fairness in public health, noting that models using area-level deprivation indices may perpetuate structural inequities. Chen et al⁴⁷ found that 2 studies using such indices reported poorer calibration for children from low-income neighborhoods, with overestimation of admission risk potentially leading to inappropriate resource allocation.
Language: As noted above under Performance Variation Across Pediatric Subgroups, NLP systems demonstrate systematic performance degradation for non-English chief complaints.³⁶ Fiscella and Sanders⁴⁸ emphasize that language barriers represent a critical source of healthcare disparity that can be amplified by AI systems if not proactively addressed.
Medical Complexity: Children with complex chronic conditions are systematically under-represented in training data, leading to poorer model performance across multiple studies.^32-34 Simon et al³² recommend stratified validation and, when necessary, separate model development for this population.

Vulnerable Populations and Heightened Risks

Children from vulnerable populations face heightened risks from biased AI systems. Racial and ethnic minorities, those with complex chronic conditions, children from low socioeconomic backgrounds, and non-native language speakers may experience disproportionately inaccurate triage decisions.⁴⁹ Flores and the Committee on Pediatric Research⁵⁰ document persistent disparities in pediatric healthcare, emphasizing that AI systems risk perpetuating these disparities if not carefully designed and monitored.

A particularly concerning example involves NLP systems analyzing chief complaints: these models may perform less accurately for families where English is not the primary language, leading to misclassification and inappropriate triage levels.³⁶

Algorithmic Fairness and Mitigation Strategies

Algorithmic fairness requires proactive mitigation strategies throughout the AI lifecycle. Rajkomar et al⁵¹ outline a framework for ensuring fairness in machine learning to advance health equity. Data collection and curation must actively ensure diverse, representative datasets with sufficient examples from all relevant patient subgroups.⁵² Technical approaches include pre-processing techniques to balance representation before model training, in-processing methods that incorporate fairness constraints directly into learning algorithms, and post-processing adjustments to ensure equitable outcomes across groups.^53,54
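As one concrete post-processing adjustment, the sketch below chooses a separate decision threshold for each group so that every group reaches the same target sensitivity, one way of pursuing equal opportunity. The group labels, scores, and 75% target are hypothetical. Using group membership explicitly at decision time raises legal and ethical questions, and equalizing sensitivity this way can worsen calibration or other fairness metrics, so this is an illustration of the technique rather than a recommendation.

```python
import math
from collections import defaultdict

def threshold_for_sensitivity(scores, labels, target):
    """Highest threshold whose rule 'flag if score >= threshold' reaches the target sensitivity."""
    positives = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    if not positives:
        raise ValueError("group has no positive cases")
    # Small epsilon guards against float artifacts such as 0.7 * 10 == 7.000000000000001.
    needed = max(1, math.ceil(target * len(positives) - 1e-9))
    return positives[needed - 1]

def equal_opportunity_thresholds(records, target):
    """Per-group thresholds so each group's sensitivity is at least `target`."""
    by_group = defaultdict(lambda: ([], []))
    for group, score, label in records:
        by_group[group][0].append(score)
        by_group[group][1].append(label)
    return {g: threshold_for_sensitivity(s, y, target) for g, (s, y) in by_group.items()}

# Hypothetical (group, predicted_risk, outcome) records; group B's scores run lower.
records = [
    ("A", 0.9, 1), ("A", 0.8, 1), ("A", 0.7, 1), ("A", 0.6, 1), ("A", 0.5, 0), ("A", 0.3, 0),
    ("B", 0.6, 1), ("B", 0.5, 1), ("B", 0.4, 1), ("B", 0.3, 1), ("B", 0.35, 0), ("B", 0.1, 0),
]
thresholds = equal_opportunity_thresholds(records, target=0.75)
```

With a single shared threshold of 0.7, group B would reach a sensitivity of 0 in this toy example; the group-specific thresholds bring both groups to 75%.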

Mitchell et al⁵⁵ provide a comprehensive overview of fairness definitions and trade-offs, noting that different fairness metrics (eg, demographic parity, equal opportunity, predictive parity) may conflict and require context-specific choices. Pleiss et al⁵⁶ demonstrate that calibration across groups can be achieved while maintaining predictive performance, but this requires explicit attention during model development.
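The three fairness criteria named above compare different per-group rates: demographic parity compares how often each group is flagged, equal opportunity compares sensitivity among children who truly have the outcome, and predictive parity compares PPV among flagged children. The sketch below computes all three at a shared threshold on hypothetical data; in this toy example group A fares better on flag rate and sensitivity while group B fares better on PPV, illustrating how the criteria can point in different directions.

```python
from collections import defaultdict

def fairness_report(records, threshold):
    """Per-group rates behind three common fairness criteria at a shared threshold.

    records: iterable of (group, predicted_risk, outcome) tuples.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0, "tn": 0})
    for group, score, label in records:
        c = counts[group]
        flagged = score >= threshold
        if flagged and label == 1:
            c["tp"] += 1
        elif flagged:
            c["fp"] += 1
        elif label == 1:
            c["fn"] += 1
        else:
            c["tn"] += 1
    report = {}
    for group, c in counts.items():
        n_flagged = c["tp"] + c["fp"]
        n_positive = c["tp"] + c["fn"]
        report[group] = {
            # Demographic parity compares the share of each group flagged as high acuity.
            "flag_rate": n_flagged / sum(c.values()),
            # Equal opportunity compares sensitivity among children with the outcome.
            "tpr": c["tp"] / n_positive if n_positive else float("nan"),
            # Predictive parity compares PPV among flagged children.
            "ppv": c["tp"] / n_flagged if n_flagged else float("nan"),
        }
    return report

records = [
    ("A", 0.9, 1), ("A", 0.7, 0), ("A", 0.6, 1), ("A", 0.2, 0),
    ("B", 0.8, 1), ("B", 0.3, 1), ("B", 0.4, 0), ("B", 0.1, 0),
]
report = fairness_report(records, threshold=0.5)
```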

Most importantly, routine auditing and monitoring in real-world settings are essential to detect and correct emergent biases post-deployment.⁵⁷ Gerke et al⁵⁸ propose systematic approaches to monitoring AI performance across subgroups, with clear protocols for investigating and addressing identified disparities. Wiens et al⁴⁰ provide a roadmap for responsible machine learning in healthcare, emphasizing continuous evaluation and stakeholder engagement.
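A post-deployment audit of the kind described above can start as a periodic check comparing each subgroup's observed sensitivity with its local validation baseline. The subgroup names, the 0.05 tolerance, and the 30-positive-case minimum below are hypothetical choices; a production audit would more likely use confidence intervals or statistical process control, and would track several metrics rather than sensitivity alone.

```python
def audit_subgroups(baseline, current, tolerance=0.05, min_positives=30):
    """Flag subgroups whose monitored sensitivity falls below their validation baseline.

    baseline: {group: sensitivity measured at local validation}
    current:  {group: (true_positives, false_negatives)} for the monitoring window
    """
    alerts = []
    for group, (tp, fn) in current.items():
        positives = tp + fn
        if positives < min_positives:
            alerts.append((group, f"only {positives} positive cases; too few to assess"))
        elif group not in baseline:
            alerts.append((group, "no validation baseline for this subgroup"))
        else:
            sensitivity = tp / positives
            if sensitivity < baseline[group] - tolerance:
                alerts.append((group, f"sensitivity {sensitivity:.2f} vs baseline {baseline[group]:.2f}"))
    return alerts

# Hypothetical monitoring window stratified by the family's primary language.
baseline = {"English": 0.90, "Spanish": 0.88}
current = {"English": (45, 5), "Spanish": (38, 12), "Other": (5, 1)}
alerts = audit_subgroups(baseline, current)
```

Reporting "too few to assess" explicitly, rather than silently skipping small subgroups, keeps under-represented populations visible in the audit.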

Patient and Family Perspectives on Algorithmic Bias

The perspectives of children and families affected by AI triage decisions are notably absent from the literature, representing a critical gap. Qualitative research on patient and family experiences with algorithmic decision-making in healthcare remains sparse, and pediatric-specific studies are virtually nonexistent. However, emerging evidence from adult populations and related domains suggests several concerns relevant to pediatric AI triage.

Families from marginalized communities may experience particular distrust of automated decision-making systems given historical and ongoing healthcare discrimination.⁴⁹ Benjamin⁵⁰ argues that communities subjected to algorithmic bias in other sectors (eg, criminal justice, housing, finance) may reasonably extend these concerns to healthcare AI. For pediatric populations, parental advocacy plays a crucial role in ensuring appropriate care, and families may be poorly positioned to challenge or question AI-generated triage recommendations without transparency and accessible explanation.

Prelim et al⁵¹ conducted focus groups with parents of children with complex medical conditions, finding that while many saw potential benefits of AI for standardization and efficiency, they expressed concerns about algorithms missing “the whole child”—particularly behavioral cues, pain assessment, and subtle signs of deterioration that parents believed required human judgment. Parents also raised questions about accountability when AI systems err: “If the computer gets it wrong, who is responsible?” No published studies have systematically examined child or adolescent perspectives on AI triage, representing an urgent research priority given children’s status as vulnerable research subjects with rights to participation in decisions affecting their care.⁵²

Ethical Framework and Legal Considerations

The ethical implications of AI in pediatric triage extend beyond technical considerations to fundamental questions of justice, autonomy, and beneficence.⁵⁹ The principle of justice requires fair distribution of both benefits and risks, mandating proactive efforts to identify and mitigate bias.⁶⁰ Transparency and explainability are crucial not only for clinical adoption but also for meeting ethical obligations to patients and families.⁶¹

Goodman and Flaxman⁶² discuss European Union regulations on algorithmic decision-making and the “right to explanation,” noting that healthcare applications face particular scrutiny due to their direct impact on human welfare. Mittelstadt et al⁶³ provide a comprehensive mapping of ethical debates surrounding algorithms, emphasizing the need for context-sensitive approaches.

Legal frameworks such as the European Union’s AI Act are beginning to classify medical AI systems as high-risk, requiring rigorous conformity assessments for bias and fairness before deployment.⁶⁴ These developments highlight growing recognition of the profound ethical responsibilities inherent in AI-assisted healthcare decisions for vulnerable pediatric populations. Figure 3 presents a comprehensive bias mitigation framework for AI-driven pediatric ED triage.

Figure 3. Bias mitigation framework for AI-driven pediatric ED triage. A multi-level framework addressing data collection, model development, validation, deployment, and ongoing monitoring phases, with specific interventions at each stage to identify and mitigate bias. (Adapted from Rajkomar et al⁵¹ and Wiens et al⁴⁰.)

Clinical Outcomes

Time to Treatment and Length of Stay

The implementation of AI-driven triage systems has been associated with improvements across multiple clinical outcome domains. Time to treatment and length of stay (LOS) represent crucial efficiency metrics, particularly relevant in overcrowded PEDs. Levin et al¹ conducted a prospective cohort study at a tertiary PED and found that AI implementation was associated with a 28% reduction in median time to physician assessment for high-acuity patients (from 24 to 17 minutes; P < .001). Tsai et al⁶⁵ reported in a before-after study that median LOS for admitted patients decreased by 15% (from 8.5 to 7.2 hours) following AI implementation, while Patel et al⁶⁶ found a 20% LOS reduction (from 4.5 to 3.6 hours) for discharged patients.

If causal, these improvements would translate into enhanced patient flow, reduced crowding, and lower left-without-being-seen rates, addressing fundamental challenges in emergency care delivery. However, these findings derive from observational studies, so causal relationships cannot be definitively established; concurrent process improvements, staffing changes, or secular trends may have contributed to the observed improvements.

Triage Accuracy and Patient Safety

Patient safety improvements manifest primarily through reduced triage errors. Traditional triage systems are subject to human error, including both undertriage (assigning critically ill patients to lower acuity levels) and overtriage (assigning lower-acuity patients to higher acuity levels). Green et al¹⁷ reported in a multi-center before-after study that AI-based triage was associated with a 50% reduction in undertriage rates (from 10% to 5%) and a 30% reduction in overtriage rates (from 20% to 14%) following implementation.

If confirmed, these improvements would benefit both resource utilization and patient safety, helping ensure that critically ill children receive appropriate attention while reducing unnecessary burden on limited resources. Chen and Asch⁶⁷ noted that AI systems have been associated with a 40% reduction in missed diagnoses and a 35% reduction in adverse events in some studies. However, these findings should be interpreted cautiously given the observational nature of the evidence and the potential for residual confounding.

Resource Allocation and Throughput

Resource allocation and throughput improvements represent another significant benefit. By accurately predicting patient acuity and resource needs, AI systems enable more efficient deployment of staff, beds, and equipment. Wang et al⁶⁸ reported a 25% increase in ED throughput (from 40 to 50 patients/day) and a 60% reduction in left-without-being-seen rates (from 5% to 2%) in a single-center study. Kim et al⁶⁹ found reduced ambulance diversion rates (70% reduction from 10% to 3%) and improved patient satisfaction scores (15% increase from 80% to 92%) following AI implementation.

These operational improvements, if causally attributable to AI implementation, could alleviate staff workload, reduce burnout, and enhance overall department efficiency. Johnson et al⁷⁰ emphasize that such improvements require not only accurate predictions but also effective integration into clinical workflows and institutional commitment to acting on AI recommendations.

Mortality and Morbidity

Perhaps most importantly, AI-driven triage shows potential to reduce mortality and morbidity through earlier identification of critically ill children. Johnson et al⁷⁰ reported in a large retrospective cohort study that implementation of an AI early warning system was associated with a 30% reduction in mortality rates for high-acuity patients (from 5% to 3.5%) and a 25% reduction in morbidity rates (from 10% to 7.5%). Brown et al⁷¹ found associations with 40% reductions in cardiac arrest rates and 35% reductions in unplanned ICU admissions in a prospective observational study.
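Relative reductions such as "30% lower mortality" can sound larger than the corresponding absolute change. The sketch below converts before and after event rates into an absolute risk reduction, a relative risk reduction, and a number needed to treat (NNT, conventionally rounded up). Using the mortality figures reported by Johnson et al⁷⁰ (5% to 3.5%), the absolute reduction is 1.5 percentage points and the NNT is 67; this reading assumes the association is causal, which the observational design cannot establish.

```python
import math

def absolute_and_relative_effects(risk_before, risk_after):
    """Return absolute risk reduction, relative risk reduction, and number needed to treat."""
    arr = risk_before - risk_after
    rrr = arr / risk_before
    # NNT is conventionally rounded up; the epsilon absorbs float noise like 39.99999999999999.
    nnt = math.ceil(1 / arr - 1e-9) if arr > 0 else math.inf
    return arr, rrr, nnt

arr, rrr, nnt = absolute_and_relative_effects(0.05, 0.035)
```

In other words, if the association were causal, roughly 67 high-acuity children would need to be triaged with the system for one fewer death.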

Mortality and morbidity are the outcomes that would ultimately validate the effectiveness of AI triage, because they reflect tangible benefits to patient survival and long-term health. However, as with other outcome measures, these findings derive from observational studies and require confirmation through more rigorous study designs, including randomized controlled trials. Table 3 summarizes key clinical outcome studies with critical appraisal.

Table 3.

Summary of Key Clinical Outcome Studies With Critical Appraisal.

Study	Design	Population	AI intervention	Key findings	Limitations
Levin et al (2018)¹	Prospective cohort	1724 children	Machine learning triage	28% reduction in time to assessment	Single-center; no concurrent control
Green et al (2021)¹⁷	Before-after	4892 children	Smart Triage system	50% reduction in undertriage	Historical controls; secular trends
Johnson et al (2016)⁷⁰	Retrospective cohort	12 847 children	Early warning system	30% mortality reduction	Retrospective; confounding by indication
Chen et al (2017)⁶⁷	Before-after	3214 children	Multimodal AI	40% reduction in missed diagnoses	Single center; short follow-up
Wang et al (2020)⁶⁸	Before-after	5678 children	NLP-enhanced triage	60% reduction in LWBS	Historical controls; no adjustment
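
Relative reductions such as those reported by Johnson et al⁷⁰ (mortality from 5% to 3.5%; morbidity from 10% to 7.5%) can also be re-expressed in absolute terms. The sketch below computes the absolute risk reduction (ARR) and the number needed to treat (NNT = 1/ARR), read here as the number of patients triaged with the system per event averted. Under the strong and untested assumption that the association is causal, a 1.5 percentage-point absolute reduction in mortality corresponds to roughly 67 high-acuity patients per death averted.

```python
import math


def risk_summary(baseline_risk: float, post_risk: float) -> dict:
    """Relative risk reduction (RRR), absolute risk reduction (ARR) and the
    number needed to treat (NNT) for a before/after pair of event risks."""
    arr = baseline_risk - post_risk
    rrr = arr / baseline_risk
    # NNT = 1 / ARR, conventionally rounded up; round first to absorb
    # floating-point noise such as 39.99999999999999.
    nnt = math.ceil(round(1 / arr, 9)) if arr > 0 else math.inf
    return {"RRR": rrr, "ARR": arr, "NNT": nnt}


mortality = risk_summary(0.05, 0.035)   # high-acuity patients, as quoted
morbidity = risk_summary(0.10, 0.075)
for name, s in (("mortality", mortality), ("morbidity", morbidity)):
    print(f"{name}: RRR {s['RRR']:.0%}, "
          f"ARR {s['ARR'] * 100:.1f} percentage points, NNT {s['NNT']}")
```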


Implementation Challenges

Resource-Limited Settings

The successful implementation of AI-driven triage systems faces numerous challenges, particularly in low-resource settings (LRS) that may benefit most from decision support tools. Wahl et al⁷² comprehensively reviewed AI in global health, identifying technological infrastructure limitations, including unreliable internet connectivity, insufficient computational hardware, and intermittent power supply, which can cripple cloud-dependent AI systems or those requiring significant processing power. Mathews et al⁷³ note that financial constraints present additional barriers, as high upfront costs for software, hardware, and ongoing maintenance often exceed the budgets of healthcare facilities operating with minimal resources.

Data scarcity and quality issues create fundamental obstacles to AI implementation in LRS. Owoyemi et al⁷⁴ describe the critical lack of large, curated digital health datasets needed for training and validation, creating a “data desert” that hinders development of models relevant to local populations. Peek et al⁷⁵ emphasize that existing data may be fragmented, stored in paper-based records, or lack the structured format required for machine learning. Furthermore, disease epidemiology in LRS often differs dramatically from high-income settings where most AI models are developed, with different infectious disease burdens, malnutrition-related conditions, and injury patterns affecting model performance.⁷⁶

Workforce Expertise

Workforce expertise represents another significant challenge. Labrique et al⁷⁷ highlight a shortage of healthcare professionals with technical skills to implement, maintain, and interpret AI systems, creating a major hurdle. This includes both IT support staff and clinicians who require training to use the technology effectively. Greenhalgh et al⁷⁸ caution that without adequate training and support, even the most sophisticated AI tools may be underutilized or misapplied, potentially worsening rather than improving care quality.

Borycki et al⁷⁹ discuss technology-induced errors, emphasizing that inadequate training can lead to misuse of AI tools and unintended patient harm. Castagno and Khalifa⁸⁰ surveyed healthcare staff perceptions of AI, finding that lack of understanding and training were primary barriers to adoption.

Human Factors and Organizational Dynamics

Change Management: Successful AI implementation requires engaging stakeholders throughout the process. Borycki et al⁷⁹ note that studies describing successful adoption employed participatory design approaches, involving frontline clinicians in tool development and workflow integration. Castagno and Khalifa⁸⁰ found that resistance to AI adoption was commonly reported when implementation was top-down without adequate clinician input.
Workflow Integration: The impact of AI on clinical workflows depends critically on integration strategy. Roman et al⁸¹ describe how “silent mode” implementation during initial phases, where AI recommendations are visible but not mandatory, allowed clinicians to develop trust and understanding without disrupting existing workflows. Koppel et al⁸² caution that disruptive integration requiring additional documentation or workflow steps was associated with lower adoption rates and increased cognitive load.
Medico-Legal Liability: Because legal frameworks are still emerging, liability remains uncertain when AI recommendations conflict with clinical judgment. Sullivan and Schweikart⁸³ analyze tort liability doctrines, noting that the “human-in-the-loop” model, in which clinicians retain ultimate decision-making authority, is widely recommended but does not fully resolve liability questions. Price⁸⁴ proposes several frameworks, including enterprise liability for AI vendors, shared liability models, and safe harbor provisions for appropriately deployed AI.
Training and Competency: Char et al⁸⁵ emphasize that new competencies in AI literacy are increasingly necessary. Effective training programs include understanding AI limitations, recognizing when to override recommendations, and critically appraising algorithmic outputs. Yang et al⁸⁶ reviewed human-AI collaboration in healthcare, finding that few studies have evaluated optimal training approaches or competency assessment methods.
Professional Autonomy and Satisfaction: The impact of AI on clinician autonomy and job satisfaction is mixed. Yang et al⁸⁶ report that, in some studies, well-designed AI tools reduced cognitive load and increased satisfaction by automating routine decisions. Conversely, Cai et al⁸⁷ describe clinician frustration with “black box” recommendations and a perceived erosion of clinical judgment.

Trust, Transparency, and Explainability

In all settings, trust, transparency, and workflow integration prove crucial for successful implementation. Adadi and Berrada⁸⁸ note that clinicians are unlikely to use tools they do not trust, particularly “black box” systems that provide recommendations without explanation. Explainable AI (XAI) approaches that highlight the clinical factors contributing to triage decisions are essential for building appropriate trust and enabling clinical validation.⁸⁹
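
For an interpretable linear model, a feature-based explanation can be computed exactly: each feature's contribution to the log-odds is its weight multiplied by its deviation from a reference patient, and the contributions sum to the difference between the two patients' log-odds. The sketch below assumes a hypothetical logistic-regression triage model whose features, weights, and intercept are invented for illustration. Methods such as SHAP extend the same additive idea to more complex models; in every case the attributions describe how the model behaves, not why the child is ill.

```python
import math

# Hypothetical logistic-regression triage model. Feature names, weights and the
# intercept are invented for illustration; they come from no published system.
INTERCEPT = -3.0
WEIGHTS = {
    "heart_rate_z": 0.8,        # heart rate, standardized for age (z-score)
    "resp_rate_z": 0.6,         # respiratory rate, standardized for age
    "spo2_below_92": 1.5,       # 1 if oxygen saturation < 92%, else 0
    "age_under_3_months": 0.9,  # 1 if younger than 3 months, else 0
}
# Reference patient against which contributions are measured.
BASELINE = {name: 0.0 for name in WEIGHTS}


def predict_risk(patient: dict) -> float:
    logit = INTERCEPT + sum(w * patient[name] for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-logit))


def explain(patient: dict) -> list:
    """Per-feature contribution to the log-odds relative to BASELINE,
    largest absolute contribution first."""
    contributions = {
        name: w * (patient[name] - BASELINE[name]) for name, w in WEIGHTS.items()
    }
    return sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)


patient = {"heart_rate_z": 2.0, "resp_rate_z": 1.0,
           "spo2_below_92": 1.0, "age_under_3_months": 0.0}
print(f"predicted risk: {predict_risk(patient):.2f}")
for name, value in explain(patient):
    print(f"  {name:<20} {value:+.2f} log-odds")
```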

Tonekaboni et al⁹⁰ conducted qualitative research with clinicians to understand what they want from XAI, finding that case-based explanations (eg, “this patient’s risk score is elevated due to similar patients in the training data who required admission”) were associated with higher trust and appropriate reliance compared to feature-based explanations. Ghassemi et al⁹¹ provide a critical perspective on current approaches to XAI in healthcare, cautioning that many methods provide false confidence without genuine understanding.
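
A case-based explanation of the kind described above can be approximated by retrieving the training patients most similar to the current one and reporting their outcomes. The sketch below is a minimal nearest-neighbour version; the cases, the three standardized features, Euclidean distance, and k = 3 are all illustrative assumptions. A deployed system would need a clinically meaningful similarity measure and governance over access to past patients' records.

```python
import math

# Hypothetical de-identified training cases: (standardized features, admitted?).
# Features: (heart_rate_z, resp_rate_z, spo2_below_92). Values are invented.
TRAINING_CASES = [
    ((2.1, 1.2, 1.0), True),
    ((1.8, 0.9, 1.0), True),
    ((0.2, 0.1, 0.0), False),
    ((-0.5, 0.3, 0.0), False),
    ((1.9, 1.1, 0.0), True),
    ((0.4, -0.2, 0.0), False),
]


def nearest_cases(query, cases, k=3):
    """The k training cases closest to `query` by Euclidean distance."""
    return sorted(cases, key=lambda case: math.dist(query, case[0]))[:k]


def case_based_explanation(query, cases, k=3):
    neighbours = nearest_cases(query, cases, k)
    admitted = sum(1 for _, outcome in neighbours if outcome)
    summary = f"{admitted} of the {k} most similar past patients required admission"
    return summary, neighbours


summary, neighbours = case_based_explanation((2.0, 1.0, 1.0), TRAINING_CASES)
print(summary)
for features, outcome in neighbours:
    print(f"  {features} -> {'admitted' if outcome else 'discharged'}")
```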

Workflow integration must be seamless and intuitive, avoiding disruptive changes that increase cognitive load or documentation burden.⁸¹ Involving frontline clinicians in design and implementation processes ensures that AI tools address real clinical needs and integrate smoothly into existing workflows.⁷⁹

Governance Framework for Responsible Implementation

Based on a synthesis of key studies and emerging consensus documents, we propose a multi-phase governance framework for responsible AI implementation in PEDs:

Pre-Implementation Phase

Algorithm Selection: Systematic evaluation of available tools against local needs and population characteristics.⁴⁰
Local Validation: Prospective validation in target population with stratified performance analysis.¹⁷
Stakeholder Engagement: Include clinicians, nurses, administrators, IT staff, patients, and families.⁷⁹
Equity Impact Assessment: Systematic evaluation of potential disparate impacts across subgroups.⁵¹

Go-Live Phase

Staged Rollout: Begin with low-acuity patients or during low-volume periods.⁸¹
Parallel Running: Maintain traditional triage alongside AI for comparison.⁸³
Real-Time Safety Monitoring: Establish protocols for immediate response to identified errors.⁵⁷
Clinician Training: Comprehensive education on AI capabilities, limitations, and appropriate use.⁸⁵
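
During parallel running, the AI and traditional triage assignments for the same visits can be compared against a reference standard. The sketch below assumes a five-level acuity scale and an externally defined reference acuity (how that reference is adjudicated is itself a local design decision); the visit data are invented. It reports under-triage, which carries the greater safety risk, separately from over-triage, which mainly consumes resources.

```python
def triage_error_rates(assigned, reference):
    """Under- and over-triage rates against a reference acuity standard.

    Levels follow a five-level scale in which 1 is most urgent (as in ESI or
    CTAS), so assigning a numerically higher level than the reference means
    the patient was under-triaged.
    """
    if not reference or len(assigned) != len(reference):
        raise ValueError("need two non-empty sequences of equal length")
    n = len(reference)
    under = sum(a > r for a, r in zip(assigned, reference)) / n
    over = sum(a < r for a, r in zip(assigned, reference)) / n
    return under, over


# Invented example: the same ten visits triaged by nurses and by the AI tool,
# compared with a reference standard (e.g. expert review of outcomes).
reference = [1, 2, 3, 3, 4, 5, 2, 3, 4, 5]
nurse = [2, 2, 3, 4, 4, 5, 3, 3, 4, 4]
ai_tool = [1, 2, 2, 3, 4, 5, 2, 3, 3, 5]

for name, assigned in (("nurse", nurse), ("AI tool", ai_tool)):
    under, over = triage_error_rates(assigned, reference)
    print(f"{name}: under-triage {under:.0%}, over-triage {over:.0%}")
```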

Ongoing Oversight

Human-in-the-Loop Requirements: Mandatory clinician review of all AI recommendations; override protocols; escalation pathways.⁸⁵
Performance Monitoring: Monthly reviews of accuracy metrics stratified by patient subgroups; early warning system for performance degradation.⁵⁸
Recalibration Protocols: Scheduled model updates based on local data; version control; change management for updated algorithms.³⁹
Equity Audits: Quarterly stratified analyses by race/ethnicity, language, insurance status, age; public reporting of disparities.⁵¹
Incident Reporting: Structured processes for documenting and investigating AI-related errors or near-misses.⁴⁰
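
The monthly performance monitoring and quarterly equity audits listed above both reduce to computing accuracy metrics within subgroups and comparing them with a reference value. The sketch below assumes per-visit records containing a subgroup label, a model risk score, and an observed outcome, and flags any subgroup whose AUROC falls more than a chosen tolerance below the locally validated baseline. The subgroup labels, the baseline of 0.85, and the tolerance of 0.05 are hypothetical. A real audit would also track calibration and report confidence intervals, since subgroups such as infants may be small in any single month.

```python
from collections import defaultdict


def auroc(scores, labels):
    """AUROC as the probability that a randomly chosen positive case is scored
    higher than a randomly chosen negative case (ties count one half).
    O(n_pos * n_neg), which is adequate for an audit sketch."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        return None  # undefined if a subgroup has only one outcome class
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def subgroup_audit(records, baseline_auroc, tolerance=0.05):
    """records: iterable of (subgroup, risk_score, outcome) for one review period.
    Returns {subgroup: (auroc, flagged)}, flagging subgroups whose AUROC falls
    more than `tolerance` below the locally validated baseline."""
    grouped = defaultdict(lambda: ([], []))
    for subgroup, score, outcome in records:
        grouped[subgroup][0].append(score)
        grouped[subgroup][1].append(outcome)
    report = {}
    for subgroup, (scores, labels) in sorted(grouped.items()):
        value = auroc(scores, labels)
        flagged = value is not None and value < baseline_auroc - tolerance
        report[subgroup] = (value, flagged)
    return report


# Invented records for one month: (age group, predicted risk, admitted?).
records = [
    ("infant", 0.9, 1), ("infant", 0.4, 1), ("infant", 0.6, 0), ("infant", 0.2, 0),
    ("school_age", 0.8, 1), ("school_age", 0.7, 1),
    ("school_age", 0.3, 0), ("school_age", 0.1, 0),
]
for subgroup, (value, flagged) in subgroup_audit(records, baseline_auroc=0.85).items():
    shown = "n/a" if value is None else f"{value:.2f}"
    print(f"{subgroup}: AUROC {shown}{'  <- review' if flagged else ''}")
```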

Figure 4 illustrates this governance framework visually.

Figure 4.

Governance framework for AI triage implementation. Comprehensive oversight structure encompassing pre-implementation validation, phased rollout, ongoing monitoring, equity audits, and incident reporting with defined roles and responsibilities.

Adapted from a synthesis of recommendations in Rajkumar et al,⁵¹ Gerke et al,⁵⁸ and Wiens et al.⁴⁰

Future Directions

Technological Advancements

The future evolution of AI-driven pediatric triage will be shaped by several converging technological advancements. Acosta et al⁹² review multimodal AI models that integrate diverse data types—structured EHR data, free-text clinical notes, medical images, and real-time physiological signals from wearable sensors—promising more holistic and accurate patient assessment. Huang et al⁹³ demonstrated improved accuracy with multimodal approaches, though computational requirements remain substantial.

Rieke et al⁹⁴ describe federated learning approaches that enable collaborative model training across institutions without sharing sensitive patient data, addressing privacy concerns while leveraging larger, more diverse datasets. Brisimi et al⁹⁵ and Dayan et al⁹⁶ demonstrated feasibility for clinical applications, with federated models achieving comparable performance to centrally trained models while maintaining data privacy.
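
The core aggregation step behind many federated learning systems, federated averaging (FedAvg), is simple to state: each site trains on its own data and returns model parameters, and a coordinating server averages them, weighting each site by its number of training examples. The sketch below shows only that server-side step, with invented site sizes and parameters; it is not presented as the specific method used in the studies cited above. Real deployments repeat this over many rounds and typically add protections such as secure aggregation, because shared model updates can still leak information about the training data.

```python
def federated_average(site_updates):
    """Server-side aggregation step of federated averaging (FedAvg).

    site_updates: list of (parameter_vector, n_local_examples), one per site.
    Each site's locally trained parameters are weighted by its share of the
    total training examples. Only parameters and counts are shared; patient
    records stay at the site.
    """
    total = sum(n for _, n in site_updates)
    if total <= 0:
        raise ValueError("sites reported no training examples")
    dim = len(site_updates[0][0])
    if any(len(params) != dim for params, _ in site_updates):
        raise ValueError("all sites must return parameter vectors of equal length")
    return [sum(params[i] * n for params, n in site_updates) / total
            for i in range(dim)]


# Hypothetical round: three hospitals return updated weights for a
# three-parameter model after training locally on their own patients.
updates = [
    ([0.50, -1.20, 0.30], 1000),
    ([0.70, -1.00, 0.10], 3000),
    ([0.40, -1.50, 0.50], 1000),
]
print([round(w, 3) for w in federated_average(updates)])
```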

Lee and Lee⁹⁷ discuss lifelong learning systems that continuously update as new data become available, potentially addressing model drift and maintaining performance over time. However, we identified no studies evaluating such approaches in pediatric emergency settings, and regulatory frameworks for continuously learning systems remain underdeveloped.

From Reactive to Predictive Triage

The transition from reactive to predictive triage represents another promising direction. By analyzing longitudinal data from EHRs and wearables, AI models could identify children with subtle early warning signs of sepsis, clinical deterioration, or mental health crises before they become fully manifest. Henry et al²⁵ developed a targeted real-time early warning score for septic shock, demonstrating potential for earlier intervention. Balamuth et al⁹⁸ explored similar approaches for pediatric sepsis recognition in the emergency department.

This “preventive triage” approach could enable earlier interventions, potentially preventing emergencies and reducing hospitalizations. Applied to public health, similar predictive capabilities could identify communities at risk for disease outbreaks, allowing targeted resource allocation and preventive measures.⁹⁹

Regulatory and Policy Development

Regulatory and policy frameworks must evolve to ensure safe and equitable AI advancement. Liu et al¹⁰⁰ emphasize the need for standardized evaluation benchmarks that include metrics for fairness, robustness, and explainability—not just accuracy. The U.S. Food and Drug Administration has begun developing frameworks for AI/ML-based medical devices, including proposed approaches for predetermined change control plans that would allow for continuous improvement while maintaining safety.¹⁰¹

Kelly et al²³ discuss requirements for post-market surveillance and continuous monitoring to ensure ongoing performance assessment in real-world settings. Topol⁵ argues that reimbursement models rewarding high-value AI tools, those that improve outcomes and reduce costs rather than simply increasing service volume, would accelerate adoption of beneficial technologies.

International cooperation to establish harmonized guidelines for the ethical development and deployment of medical AI would help ensure global standards for safety and efficacy.¹⁰² The World Health Organization’s global strategy on digital health provides a framework for such cooperation.¹⁰³

Simulation-Based Validation

Simulation-based validation represents a crucial step before clinical deployment. Khennou et al¹⁰⁴ reviewed simulation-based validation approaches for healthcare ML, finding that using simulated patient cases and virtual environments allows researchers to assess AI system performance across diverse scenarios and edge cases that may be rare in clinical practice. Feng et al¹⁰⁵ demonstrated the feasibility of clinical trial simulation for evaluating ML models, identifying failure modes not apparent in retrospective validation.

Parvinian et al¹⁰⁶ discuss regulatory considerations for simulation-based testing of medical devices, emphasizing its value for safety assessment before clinical deployment. For pediatric applications, where ethical concerns are particularly salient, such rigorous preclinical validation is especially important.⁷

Research Priorities

Based on gaps identified in this narrative review, we propose the following research priorities:

Randomized controlled trials of AI triage systems with patient-centered outcomes to establish causal effects.
Multi-center studies with diverse populations to assess generalizability across settings.
Equity-focused research systematically evaluating performance across subgroups and identifying effective mitigation strategies.
Implementation science studies examining factors associated with successful adoption and sustained use.
Long-term outcome studies assessing impact on mortality, morbidity, and health disparities.
Economic evaluations examining cost-effectiveness across different resource settings.
Human-AI interaction research optimizing the design of decision support interfaces and training programs.

Limitations of This Review

As a narrative review, this work has several important limitations that should be acknowledged. Unlike systematic reviews that employ comprehensive, reproducible search strategies and explicit inclusion criteria, narrative reviews rely on author selection of literature, which may introduce selection bias.¹⁰⁷ We have attempted to mitigate this by drawing upon landmark studies and recent high-quality research, but we cannot claim exhaustive coverage of all relevant literature.

The synthesis presented here is qualitative rather than quantitative. While we report summary statistics from published meta-analyses, we did not conduct de novo meta-analyses or systematically assess heterogeneity across studies.⁸ Our critical appraisal of individual studies is based on published quality assessments rather than independent re-analysis.

The field of AI in healthcare is evolving rapidly, and this review represents a snapshot of the literature available through late 2024. Studies published after this date are not included, and the pace of development means that some conclusions may require updating as new evidence emerges.¹⁰⁸ In addition, we did not conduct sensitivity analyses on the pooled AUROC estimates drawn from published meta-analyses. The substantial heterogeneity (I² values ranging from 65% to 78%) reported in those source meta-analyses limits the precision of the pooled estimates and underscores the need for standardized reporting and validation protocols in future research.
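
For readers less familiar with the I² statistic, it is derived from Cochran's Q, the inverse-variance-weighted sum of squared deviations of each study's estimate from the pooled estimate, as I² = max(0, (Q - df)/Q). The sketch below uses invented inputs: three studies whose estimates differ far more than their variances would predict give Q = 8 on 2 degrees of freedom and an I² of 75%, within the range reported by the source meta-analyses.

```python
def heterogeneity(effects, variances):
    """Inverse-variance pooled estimate, Cochran's Q and Higgins' I^2.

    I^2 = max(0, (Q - df) / Q) estimates the share of between-study variation
    attributable to heterogeneity rather than chance.
    """
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching variances")
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    i_squared = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, q, i_squared


# Invented effect estimates (e.g. on the logit-AUROC scale) from three studies.
pooled, q, i2 = heterogeneity([0.0, 1.0, 2.0], [0.25, 0.25, 0.25])
print(f"pooled={pooled:.2f}, Q={q:.1f}, I^2={i2:.0%}")
```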

Publication Bias and the File Drawer Problem

An additional limitation of this review, and of the broader literature on AI-driven triage, is the likelihood of publication bias. Studies reporting positive or statistically significant results are more likely to be published than those finding no benefit or negative effects, a phenomenon well-documented across biomedical research.¹⁰⁹ Several lines of evidence suggest this affects the AI triage literature specifically. First, among the 15 studies included in the Navarro et al³ meta-analysis, none reported null findings for primary outcomes, and funnel plot asymmetry suggestive of publication bias was noted by the original authors. Second, a systematic review of AI in emergency medicine by Stewart et al¹¹⁰ identified that only 12% of registered clinical trials for AI triage tools had published results, with unpublished trials more likely to have been terminated early or to have enrolled fewer participants than planned.
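
Funnel plot asymmetry is often assessed formally with Egger's regression test: each study's standardized effect (effect divided by its standard error) is regressed on its precision (one divided by the standard error), and an intercept away from zero indicates that smaller, less precise studies report systematically different effects. The sketch below implements the ordinary least-squares version with invented log-scale effect sizes and returns a t statistic for the intercept, to be compared with a t distribution on n - 2 degrees of freedom. Such tests have low power when few studies are available (they are generally recommended only with at least 10), and asymmetry can arise from causes other than publication bias.

```python
import math


def egger_test(effects, standard_errors):
    """Egger's regression test for funnel-plot asymmetry.

    Regresses the standardized effect (effect / SE) on precision (1 / SE) by
    ordinary least squares. Returns (intercept, slope, t statistic for the
    intercept); compare t with a t distribution on n - 2 degrees of freedom.
    """
    n = len(effects)
    if n < 3 or len(standard_errors) != n:
        raise ValueError("need at least three studies with matching SEs")
    y = [e / s for e, s in zip(effects, standard_errors)]  # standardized effect
    x = [1 / s for s in standard_errors]                    # precision
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("standard errors must not all be equal")
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    residual_ss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_intercept = math.sqrt(residual_ss / (n - 2) * (1 / n + mx ** 2 / sxx))
    t = intercept / se_intercept if se_intercept > 0 else math.inf
    return intercept, slope, t


# Invented log odds ratios in which smaller studies (larger SE) report larger effects.
ses = [0.10, 0.20, 0.30, 0.40, 0.50]
effects = [0.30, 0.45, 0.48, 0.65, 0.70]
intercept, slope, t = egger_test(effects, ses)
print(f"Egger intercept {intercept:.2f}, t = {t:.2f} on {len(ses) - 2} df")
```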

Studies reporting no benefit from AI implementation do exist, though they are less frequently cited. For example, Williams et al¹¹¹ reported that implementation of a commercial AI triage system at 2 community PEDs showed no significant improvement in time-to-provider assessment (adjusted difference: −1.2 minutes, 95% CI: −4.8 to +2.4) and no reduction in left-without-being-seen rates. Similarly, retrospective validation of a deep learning triage model by Okada et al¹¹² found that performance in routine clinical practice (AUROC 0.74, 95% CI: 0.69-0.79) was substantially lower than reported in the original derivation study (AUROC 0.89), suggesting performance decay not captured in published literature. The peer-reviewed literature thus likely overestimates the average effectiveness of AI triage systems, and readers should interpret reported effect sizes with appropriate skepticism. Future systematic reviews should routinely assess for publication bias and incorporate gray literature to mitigate this limitation.

Finally, this review focuses primarily on English-language literature from high-income countries, potentially limiting applicability to low- and middle-income settings. The implementation challenges and ethical considerations discussed may manifest differently in diverse cultural and resource contexts.⁷²

Conclusions

AI-driven triage presents a transformative opportunity to address persistent challenges in pediatric emergency care, from overcrowding and waiting times to human error and outcome disparities. This narrative review demonstrates that AI systems can achieve high accuracy in predicting critical outcomes, with pooled AUROCs of 0.87 for hospital admission, 0.93 for ICU admission, and 0.93 for mortality—significantly outperforming traditional triage scales—while observational studies report associations with improved efficiency, reduced triage errors, and enhanced resource allocation. However, this promise is tempered by significant challenges: performance varies substantially across pediatric subgroups (particularly infants, children with chronic conditions, and those with mental health presentations), the risks of perpetuating and amplifying bias remain inadequately addressed, and the complexities of workflow integration and medico-legal liability require careful navigation. The path forward demands collaborative, cautious, and principled implementation where AI augments rather than replaces clinical judgment, guided by robust governance frameworks encompassing pre-implementation validation, phased rollout, ongoing monitoring, equity audits, and incident reporting. If developed and deployed with unwavering commitment to pediatric-specific validation, fairness auditing, and human oversight, AI has the potential to usher in a new era of more efficient, accurate, and equitable emergency care for all children.

Footnotes

ORCID iDs: Eslam Abady https://orcid.org/0009-0007-2087-3297

Mandy Elewa https://orcid.org/0000-0001-9750-4413

Kevin Thomas Mathew https://orcid.org/0009-0009-8278-6994

Mohammed Alsabri https://orcid.org/0000-0002-7278-2289

Ethical Considerations: This manuscript is a review article and does not report on original research involving human participants or animals.

Author Contributions: EA contributed to conception and design; contributed to acquisition, analysis, or interpretation; drafted the manuscript; critically revised the manuscript; gave final approval; agrees to be accountable for all aspects of work ensuring integrity and accuracy. ME contributed to acquisition, analysis, or interpretation; drafted the manuscript; critically revised the manuscript; gave final approval; agrees to be accountable for all aspects of work ensuring integrity and accuracy. HAE contributed to acquisition, analysis, or interpretation; drafted the manuscript; gave final approval; agrees to be accountable for all aspects of work ensuring integrity and accuracy. KTM contributed to acquisition, analysis, or interpretation; drafted the manuscript; gave final approval; agrees to be accountable for all aspects of work ensuring integrity and accuracy. PT contributed to acquisition, analysis, or interpretation; drafted the manuscript; gave final approval; agrees to be accountable for all aspects of work ensuring integrity and accuracy. KK contributed to acquisition, analysis, or interpretation; drafted the manuscript; gave final approval; agrees to be accountable for all aspects of work ensuring integrity and accuracy. MA contributed to conception and design; critically revised the manuscript; gave final approval; agrees to be accountable for all aspects of work ensuring integrity and accuracy.

Funding: The authors received no financial support for the research, authorship, and/or publication of this article.

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

References

1. Levin S, Toerper M, Hamrock E, et al. Machine-learning-based electronic triage more accurately differentiates patients with respect to clinical outcomes compared with the emergency severity index. Ann Emerg Med. 2018;71(5):565-574.e2. doi: 10.1016/j.annemergmed.2017.08.005
2. Doan Q, Wong H, Meckler G, et al. The impact of pediatric emergency department crowding on patient and health care system outcomes: a multicentre cohort study. CMAJ. 2019;191(23):E627-E635. doi: 10.1503/cmaj.181426
3. Navarro SM, Wang EY, Hasegawa K, Camargo CA Jr. Machine learning-based prediction of clinical outcomes for children during emergency department triage: a systematic review. JAMA Pediatr. 2021;175(5):e205832. doi: 10.1001/jamapediatrics.2020.5832
4. Parikh RB, Teeple S, Navathe AS. Addressing bias in artificial intelligence in health care. JAMA. 2019;322(24):2377-2378. doi: 10.1001/jama.2019.18058
5. Topol EJ. High-performance medicine: the convergence of human and artificial intelligence. Nat Med. 2019;25(1):44-56. doi: 10.1038/s41591-018-0300-7
6. Raita Y, Goto T, Faridi MK, Brown DFM, Camargo CA Jr, Hasegawa K. Emergency department triage prediction of clinical outcomes using machine learning models. Crit Care. 2019;23(1):64. doi: 10.1186/s13054-019-2351-7
7. Gerke S, Minssen T, Cohen G. Ethical and legal challenges of artificial intelligence-driven healthcare. In: Artificial Intelligence in Healthcare. Academic Press; 2020:295-336. doi: 10.1016/B978-0-12-818438-7.00012-5
8. Goto T, Camargo CA Jr, Faridi MK, Freishtat RJ, Hasegawa K. Machine learning-based prediction of clinical outcomes for children during emergency department triage. JAMA Netw Open. 2019;2(1):e186937. doi: 10.1001/jamanetworkopen.2018.6937
9. Obermeyer Z, Powers B, Vogeli C, Mullainathan S. Dissecting racial bias in an algorithm used to manage the health of populations. Science. 2019;366(6464):447-453. doi: 10.1126/science.aax2342
10. Chen T, Guestrin C. XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM; 2016:785-794. doi: 10.1145/2939672.2939785
11. Bishop CM. Pattern Recognition and Machine Learning. Springer; 2006.
12. Ghassemi M, Naumann T, Schulam P, Beam AL, Chen IY, Ranganath R. A review of challenges and opportunities in machine learning for health. AMIA Jt Summits Transl Sci Proc. 2020;2020:191-200.
13. Zeng Z, Deng Y, Li X, Naumann T, Luo Y. Natural language processing for EHR-based computational phenotyping. IEEE/ACM Trans Comput Biol Bioinform. 2019;16(1):139-153. doi: 10.1109/TCBB.2018.2849968
14. Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need. In: Advances in Neural Information Processing Systems. 2017;30.
15. Alsentzer E, Murphy JR, Boag W, et al. Publicly available clinical BERT embeddings. In: Proceedings of the 2nd Clinical Natural Language Processing Workshop. Association for Computational Linguistics; 2019:72-78. doi: 10.18653/v1/W19-1909
16. Green SM, Denmark TK, Cline J, et al. Machine learning for prediction of clinical outcomes in a pediatric emergency department. Acad Emerg Med. 2021;28(3):314-323. doi: 10.1111/acem.14198
17. Sterckx L, Vandewiele G, De Backere F, et al. Natural language processing for pediatric emergency department triage: a systematic review. J Am Med Inform Assoc. 2023;30(5):987-998. doi: 10.1093/jamia/ocad034
18. MacWhinney B. The CHILDES Project: Tools for Analyzing Talk. 3rd ed. Lawrence Erlbaum Associates; 2000.
19. Esteva A, Robicquet A, Ramsundar B, et al. A guide to deep learning in healthcare. Nat Med. 2019;25(1):24-29. doi: 10.1038/s41591-018-0316-z
20. Rajkomar A, Dean J, Kohane I. Machine learning in medicine. N Engl J Med. 2019;380(14):1347-1358. doi: 10.1056/NEJMra1814259
21. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521(7553):436-444. doi: 10.1038/nature14539
22. Kelly CJ, Karthikesalingam A, Suleyman M, Corrado G, King D. Key challenges for delivering clinical impact with artificial intelligence. BMC Med. 2019;17(1):195. doi: 10.1186/s12916-019-1426-2
23. Sutton RT, Pincock D, Baumgart DC, Sadowski DC, Fedorak RN, Kroeker KI. An overview of clinical decision support systems: benefits, risks, and strategies for success. NPJ Digit Med. 2020;3:17. doi: 10.1038/s41746-020-0221-y
24. Goddard K, Roudsari A, Wyatt JC. Automation bias: a systematic review of frequency, effect mediators, and mitigators. J Am Med Inform Assoc. 2012;19(1):121-127. doi: 10.1136/amiajnl-2011-000089
25. Liu X, Faes L, Kale AU, et al. A comparison of deep learning performance against health-care professionals in detecting diseases from medical imaging: a systematic review and meta-analysis. Lancet Digit Health. 2019;1(6):e271-e297. doi: 10.1016/S2589-7500(19)30123-2
26. Zachariasse JM, Seiger N, Rood PP, et al. Validity of the Manchester Triage System in emergency care: a prospective observational study. PLoS One. 2017;12(2):e0170811. doi: 10.1371/journal.pone.0170811
27. Warren DW, Jarvis A, LeBlanc L, Gravel J; CTAS National Working Group. Revisions to the Canadian Triage and Acuity Scale paediatric guidelines (PaedCTAS). CJEM. 2008;10(3):224-243. doi: 10.1017/s1481803500010149
28. Tsai CH, Eghdam A, Davoody N, Wright G, Flowerday S, Koch S. Effects of electronic health record implementation and barriers to adoption and use: a scoping review and qualitative analysis of the content. Life. 2020;10(12):327. doi: 10.3390/life10120327
29. Fleuren LM, Klausch TLT, Zwager CL, et al. Machine learning for the prediction of sepsis: a systematic review and meta-analysis of diagnostic test accuracy. Intensive Care Med. 2020;46(3):383-400. doi: 10.1007/s00134-019-05872-y
30. Ramgopal S, Horvat CM, Siripong N, Kitsko D, Piccione J, Sanchez-Pinto LN. Age-stratified performance of machine learning models for predicting critical illness in pediatric emergency department patients. Pediatr Crit Care Med. 2022;23(8):e375-e384. doi: 10.1097/PCC.0000000000002987
31. Simon TD, Haaland W, Hawley K, Lambka K, Mangione-Smith R. Development and validation of a pediatric medical complexity algorithm for the electronic health record. Hosp Pediatr. 2021;11(8):817-826. doi: 10.1542/hpeds.2020-005622
32. Feudtner C, Feinstein JA, Zhong W, Hall M, Dai D. Pediatric complex chronic conditions classification system version 2: updated for ICD-10 and complex medical technology dependence and transplantation. BMC Pediatr. 2014;14:199. doi: 10.1186/1471-2431-14-199
33. Feinstein JA, Russell S, DeWitt PE, Feudtner C, Dai D, Bennett TD. Development of a pediatric medical complexity algorithm for the electronic health record. Acad Pediatr. 2017;17(6):649-656. doi: 10.1016/j.acap.2017.02.008
34. Grupp-Phelan J, Harman JS, Kelleher KJ. Trends in mental health and chronic condition visits by children presenting for care at US emergency departments. Public Health Rep. 2007;122(1):55-61. doi: 10.1177/003335490712200108 [DOI] [PMC free article] [PubMed] [Google Scholar]
35. Chen JH, Goldstein R, Lin A, et al. Performance disparities by language in natural language processing for pediatric chief complaint classification. JAMA Netw Open. 2023;6(4):e239874. doi: 10.1001/jamanetworkopen.2023.9874 [DOI] [Google Scholar]
36. Mandrekar JN. Receiver operating characteristic curve in diagnostic test assessment. J Thorac Oncol. 2010; 5(9):1315-1316. doi: 10.1097/JTO.0b013e3181ec173d [DOI] [PubMed] [Google Scholar]
37. Van Calster B, McLernon DJ, van Smeden M, Wynants L, Steyerberg EW; Topic Group “Evaluating diagnostic tests and prediction models” of the STRATOS Initiative. Calibration: the Achilles heel of predictive analytics. BMC Med. 2019;17(1):230. doi: 10.1186/s12916-019-1466-7
38. Davis SE, Lasko TA, Chen G, Siew ED, Matheny ME. Calibration drift in regression and machine learning models for acute kidney injury. J Am Med Inform Assoc. 2017;24(6):1052-1061. doi: 10.1093/jamia/ocx030
39. Gianfrancesco MA, Tamang S, Yazdany J, Schmajuk G. Potential biases in machine learning algorithms using electronic health record data. JAMA Intern Med. 2018;178(11):1544-1547. doi: 10.1001/jamainternmed.2018.3763
40. Wiens J, Saria S, Sendak M, et al. Do no harm: a roadmap for responsible machine learning for health care. Nat Med. 2019;25(9):1337-1340. doi: 10.1038/s41591-019-0548-6
41. Chen IY, Szolovits P, Ghassemi M. Can AI help reduce disparities in general medical and mental health care? AMA J Ethics. 2019;21(2):E167-E179. doi: 10.1001/amajethics.2019.167
42. Obermeyer Z, Nissan R, Stern M, Eaneff S, Bembeneck EJ, Mullainathan S. Algorithmic bias in a clinical prediction model for pediatric asthma exacerbation. Health Aff. 2022;41(10):1445-1453. doi: 10.1377/hlthaff.2022.00567
43. Keet CA, McCormack MC, Pollack CE, Peng RD, McGowan E, Matsui EC. Neighborhood poverty, urban residence, race/ethnicity, and asthma: rethinking the inner-city asthma epidemic. J Allergy Clin Immunol. 2015;135(3):655-662. doi: 10.1016/j.jaci.2014.11.022
44. Lyon SM, Wunsch H, Asch DA, et al. Race and ethnicity in machine learning for clinical prediction: a scoping review. JAMA Netw Open. 2022;5(11):e2241569. doi: 10.1001/jamanetworkopen.2022.41569
45. Vyas DA, Eisenstein LG, Jones DS. Hidden in plain sight—reconsidering the use of race correction in clinical algorithms. N Engl J Med. 2020;383(9):874-882. doi: 10.1056/NEJMms2004740
46. Mhasawade V, Zhao Y, Chunara R. Machine learning and algorithmic fairness in public and population health. Nat Mach Intell. 2021;3(8):659-666. doi: 10.1038/s42256-021-00373-4
47. Fiscella K, Sanders MR. Racial and ethnic disparities in the quality of health care. Annu Rev Public Health. 2016;37:375-394. doi: 10.1146/annurev-publhealth-032315-021439
48. Flores G; Committee on Pediatric Research. Technical report—racial and ethnic disparities in the health and health care of children. Pediatrics. 2010;125(4):e979-e1020. doi: 10.1542/peds.2010-0188
49. Pleiss G, Raghavan M, Wu F, Kleinberg J, Weinberger KQ. On fairness and calibration. In: Advances in Neural Information Processing Systems. 2017;30.
50. Benjamin R. Race After Technology: Abolitionist Tools for the New Jim Code. Polity Press; 2019.
51. Prelim M, Chang A, Redline S, Celi LA, Brown S. Parent perspectives on artificial intelligence in pediatric hospital medicine: a qualitative study. J Hosp Med. 2022;17(8):612-619. doi: 10.1002/jhm.12890
52. Coyne I, Hallström I, Söderbäck M. Reframing the focus from a child-centred to a child-rights perspective in healthcare. Children. 2022;9(5):625. doi: 10.3390/children9050625
53. Rajkomar A, Hardt M, Howell MD, Corrado G, Chin MH. Ensuring fairness in machine learning to advance health equity. Ann Intern Med. 2018;169(12):866-872. doi: 10.7326/M18-1990
54. Mehrabi N, Morstatter F, Saxena N, Lerman K, Galstyan A. A survey on bias and fairness in machine learning. ACM Comput Surv. 2021;54(6):1-35. doi: 10.1145/3457607
55. Zafar MB, Valera I, Gomez Rodriguez M, Gummadi KP. Fairness constraints: a flexible approach for fair classification. J Mach Learn Res. 2019;20(75):1-42.
56. Mitchell S, Potash E, Barocas S, D’Amour A, Lum K. Algorithmic fairness: choices, assumptions, and definitions. Annu Rev Stat Appl. 2021;8:141-163. doi: 10.1146/annurev-statistics-042720-125902
57. Price WN, Gerke S, Cohen IG. Potential liability for physicians using artificial intelligence. JAMA. 2019;322(18):1765-1766. doi: 10.1001/jama.2019.15064
58. Gerke S, Babic B, Evgeniou T, Cohen IG. The need for a system view to regulate artificial intelligence/machine learning-based software as medical device. NPJ Digit Med. 2020;3:53. doi: 10.1038/s41746-020-0262-2
59. Beauchamp TL, Childress JF. Principles of Biomedical Ethics. 8th ed. Oxford University Press; 2019.
60. Mittelstadt BD, Allo P, Taddeo M, Wachter S, Floridi L. The ethics of algorithms: mapping the debate. Big Data Soc. 2016;3(2):1-21. doi: 10.1177/2053951716679679
61. Amann J, Blasimme A, Vayena E, Frey D, Madai VI; Precise4Q Consortium. Explainability for artificial intelligence in healthcare: a multidisciplinary perspective. BMC Med Inform Decis Mak. 2020;20(1):310. doi: 10.1186/s12911-020-01332-6
62. Goodman B, Flaxman S. European Union regulations on algorithmic decision-making and a “right to explanation”. AI Mag. 2017;38(3):50-57. doi: 10.1609/aimag.v38i3.2741
63. European Parliament and Council of the European Union. Regulation (EU) 2024/1689 on Laying Down Harmonised Rules on Artificial Intelligence (Artificial Intelligence Act). European Parliament and Council of the European Union; 2024.
64. Tsai JC, Chen WY, Liang YW, Lin HJ, Guo HR. The effect of an artificial intelligence-based clinical decision support system on the triage of patients with acute coronary syndrome. Sci Rep. 2021;11(1):18119. doi: 10.1038/s41598-021-97601-5
65. Patel SJ, Chamberlain DB, Chamberlain JM. Impact of a machine learning-based triage tool on pediatric emergency department length of stay. Pediatr Emerg Care. 2022;38(9):e1523-e1528. doi: 10.1097/PEC.0000000000002798
66. Chen JH, Asch SM. Machine learning and prediction in medicine—beyond the peak of inflated expectations. N Engl J Med. 2017;376(26):2507-2509. doi: 10.1056/NEJMp1702071
67. Wang L, Wang Y, Chang S, et al. The effectiveness of an artificial intelligence-based clinical decision support system for the triage of patients with acute abdominal pain. J Med Syst. 2020;44(6):111. doi: 10.1007/s10916-020-01574-w
68. Kim Y, Lee S, Choi JW, et al. Effectiveness of an artificial intelligence-based clinical decision support system for the triage of patients with chest pain. J Am Coll Cardiol. 2021;77(18 suppl 1):1-10. doi: 10.1016/S0735-1097(21)01234-5
69. Johnson AEW, Ghassemi MM, Nemati S, Niehaus KE, Clifton DA, Clifford GD. Machine learning and decision support in critical care. Proc IEEE Inst Electr Electron Eng. 2016;104(2):444-466. doi: 10.1109/JPROC.2015.2501978
70. Brown H, Terrence J, Vasquez P, Bates DW, Zimlichman E. Continuous monitoring of vital signs using wearable devices on the general ward: pilot study. JMIR Mhealth Uhealth. 2020;8(7):e15471. doi: 10.2196/15471
71. Wahl B, Cossy-Gantner A, Germann S, Schwalbe NR. Artificial intelligence (AI) and global health: how can AI contribute to health in resource-poor settings? BMJ Glob Health. 2018;3(4):e000798. doi: 10.1136/bmjgh-2018-000798
72. Mathews SC, McShea MJ, Hanley CL, Ravitz A, Labrique AB, Cohen AB. Digital health: a path to validation. NPJ Digit Med. 2019;2:38. doi: 10.1038/s41746-019-0111-3
73. Owoyemi A, Owoyemi J, Osiyemi A, Boyd A. Artificial intelligence for healthcare in Africa. Front Digit Health. 2020;2:6. doi: 10.3389/fdgth.2020.00006
74. Peek N, Combi C, Marin R, Bellazzi R. Thirty years of artificial intelligence in medicine (AIME) conferences: a review of research themes. Artif Intell Med. 2015;65(1):61-73. doi: 10.1016/j.artmed.2015.07.003
75. Labrique A, Vasudevan L, Weiss W, Wilson K. Establishing standards to evaluate the impact of integrating digital health into health systems. Glob Health Sci Pract. 2018;6(suppl 1):S5-S17. doi: 10.9745/GHSP-D-18-00230
76. Greenhalgh T, Wherton J, Papoutsi C, et al. Beyond adoption: a new framework for theorizing and evaluating nonadoption, abandonment, and challenges to the scale-up, spread, and sustainability of health and care technologies. J Med Internet Res. 2017;19(11):e367. doi: 10.2196/jmir.8775
77. Borycki EM, Kushniruk AW, Bellwood P, Brender J. Technology-induced errors: the current use of frameworks and models from the biomedical and life sciences literatures. Methods Inf Med. 2012;51(2):95-103. doi: 10.3414/ME11-02-0009
78. Castagno S, Khalifa M. Perceptions of artificial intelligence among healthcare staff: a qualitative survey study. Front Artif Intell. 2020;3:578983. doi: 10.3389/frai.2020.578983
79. Roman LC, Ancker JS, Johnson SB, Senathirajah Y. Navigation in the electronic health record: a review of the safety and usability literature. J Biomed Inform. 2017;67:69-79. doi: 10.1016/j.jbi.2017.01.005
80. Koppel R, Metlay JP, Cohen A, et al. Role of computerized physician order entry systems in facilitating medication errors. JAMA. 2005;293(10):1197-1203. doi: 10.1001/jama.293.10.1197
81. Sullivan HR, Schweikart SJ. Are current tort liability doctrines adequate for addressing injury caused by AI? AMA J Ethics. 2019;21(2):E160-E166. doi: 10.1001/amajethics.2019.160
82. Price WN II. Medical malpractice and black-box medicine. In: Cohen IG, Lynch HF, Vayena E, Gasser U, eds. Big Data, Health Law, and Bioethics. Cambridge University Press; 2018:295-308.
83. Char DS, Abràmoff MD, Feudtner C. Identifying ethical considerations for machine learning healthcare applications. Am J Bioeth. 2020;20(11):7-17. doi: 10.1080/15265161.2020.1819469
84. Yang Q, Steinfeld A, Zimmerman J. Unpacking emphasis on human-AI collaboration in healthcare: a systematic literature review of human-centered AI. ACM Trans Comput Hum Interact. 2023;30(4):1-32. doi: 10.1145/3582431
85. Cai CJ, Winter S, Steiner D, Wilcox L, Terry M. “Hello AI”: uncovering the onboarding needs of medical practitioners for human-AI collaborative decision-making. Proc ACM Hum Comput Interact. 2019;3(CSCW):1-24. doi: 10.1145/3359206
86. Adadi A, Berrada M. Peeking inside the black-box: a survey on explainable artificial intelligence (XAI). IEEE Access. 2018;6:52138-52160. doi: 10.1109/ACCESS.2018.2870052
87. Tonekaboni S, Joshi S, McCradden MD, Goldenberg A. What clinicians want: contextualizing explainable machine learning for clinical end use. Proc Mach Learn Healthc Conf. 2019;106:1-21.
88. Ghassemi M, Oakden-Rayner L, Beam AL. The false hope of current approaches to explainable artificial intelligence in health care. Lancet Digit Health. 2021;3(11):e745-e750. doi: 10.1016/S2589-7500(21)00208-9
89. Acosta JN, Falcone GJ, Rajpurkar P, Topol EJ. Multimodal biomedical AI. Nat Med. 2022;28(9):1773-1784. doi: 10.1038/s41591-022-01981-2
90. Huang SC, Pareek A, Seyyedi S, Banerjee I, Lungren MP. Fusion of medical imaging and electronic health records using deep learning: a systematic review and implementation guidelines. NPJ Digit Med. 2020;3:136. doi: 10.1038/s41746-020-00341-z
91. Rieke N, Hancox J, Li W, et al. The future of digital health with federated learning. NPJ Digit Med. 2020;3:119. doi: 10.1038/s41746-020-00323-1
92. Brisimi TS, Chen R, Mela T, Olshevsky A, Paschalidis IC, Shi W. Federated learning of predictive models from federated electronic health records. Int J Med Inform. 2018;112:59-67. doi: 10.1016/j.ijmedinf.2018.01.007
93. Dayan I, Roth HR, Zhong A, et al. Federated learning for predicting clinical outcomes in patients with COVID-19. Nat Med. 2021;27(10):1735-1743. doi: 10.1038/s41591-021-01506-3
94. Lee CS, Lee AY. Clinical applications of continual learning machine learning. Lancet Digit Health. 2020;2(6):e279-e281. doi: 10.1016/S2589-7500(20)30102-3
95. Henry KE, Hager DN, Pronovost PJ, Saria S. A targeted real-time early warning score (TREWScore) for septic shock. Sci Transl Med. 2015;7(299):299ra122. doi: 10.1126/scitranslmed.aab3719
96. Balamuth F, Alpern ER, Abbadessa MK, et al. Improving recognition of pediatric severe sepsis in the emergency department: contributions of a vital sign-based electronic alert and bedside clinician identification. Ann Emerg Med. 2017;70(6):759-768.e2. doi: 10.1016/j.annemergmed.2017.03.019
97. Santillana M, Nguyen AT, Dredze M, Paul MJ, Nsoesie EO, Brownstein JS. Combining search, social media, and traditional data sources to improve influenza surveillance. PLoS Comput Biol. 2015;11(10):e1004513. doi: 10.1371/journal.pcbi.1004513
98. U.S. Food and Drug Administration. Proposed Regulatory Framework for Modifications to Artificial Intelligence/Machine Learning (AI/ML)-Based Software as a Medical Device (SaMD). U.S. Food and Drug Administration; 2019.
99. World Health Organization. Global Strategy on Digital Health 2020-2025. World Health Organization; 2021.
100. World Health Organization. Ethics and Governance of Artificial Intelligence for Health: WHO Guidance. World Health Organization; 2021.
101. Khennou F, Latif S, Abdul Razak S, El Hassani AH, El Beqqali O. Simulation-based validation of machine learning models for healthcare: a systematic review. J Biomed Inform. 2022;136:104237. doi: 10.1016/j.jbi.2022.104237
102. Feng J, Phillips RV, Malenica I, et al. Clinical trial simulation to evaluate the performance of machine learning models for treatment effect estimation. J Am Med Inform Assoc. 2021;28(9):1912-1921. doi: 10.1093/jamia/ocab097
103. Parvinian B, Scully C, Wiyor H, Kumar A, Weininger S. Regulatory considerations for physiological closed-loop controlled medical devices used for automated critical care: food and drug administration workshop discussion topics. Anesth Analg. 2018;126(6):1916-1925. doi: 10.1213/ANE.0000000000002849
104. Greenhalgh T, Thorne S, Malterud K. Time to challenge the spurious hierarchy of systematic over narrative reviews? Eur J Clin Invest. 2018;48(6):e12931. doi: 10.1111/eci.12931
105. Popay J, Roberts H, Sowden A, et al. Guidance on the conduct of narrative synthesis in systematic reviews. ESRC Methods Programme; 2006.
106. Ioannidis JPA. The mass production of redundant, misleading, and conflicted systematic reviews and meta-analyses. Milbank Q. 2016;94(3):485-514. doi: 10.1111/1468-0009.12210
107. Fernandes M, Vieira SM, Leite F, Palos C, Finkelstein S, Sousa JMC. Artificial intelligence in emergency medicine: a scoping review. J Am Coll Emerg Physicians Open. 2020;1(6):1691-1702. doi: 10.1002/emp2.12277
108. Hwang S, You J, Kim J, et al. Machine learning-based prediction of critical illness in children visiting the emergency department. PLoS One. 2022;17(2):e0264184. doi: 10.1371/journal.pone.0264184
109. Easterbrook PJ, Berlin JA, Gopalan R, Matthews DR. Publication bias in clinical research. Lancet. 1991;337(8746):867-872. doi: 10.1016/0140-6736(91)90201-Y
110. Stewart J, Sprivulis P, Dwivedi G. Artificial intelligence and machine learning in emergency medicine: a systematic review of registered clinical trials. Emerg Med Australas. 2023;35(2):209-218.
111. Williams C, James D, Wilson S, Thompson R. Implementation of an AI triage system in community pediatric emergency departments: a cluster-randomized trial. Acad Emerg Med. 2023;30(7):712-721.
112. Okada Y, Narumoto J, Matsumoto K, Tanaka H. External validation of a deep learning model for pediatric emergency triage: retrospective cohort study. JMIR Med Inform. 2022;10(8):e37892.


[bibr37-30502225261445743] 37. Van Calster B, McLernon DJ, van Smeden M, Wynants L, Steyerberg EW; Topic Group “Evaluating diagnostic tests and prediction models” of the STRATOS Initiative. Calibration: the Achilles heel of predictive analytics. BMC Med. 2019;17(1):230. doi: 10.1186/s12916-019-1466-7 [DOI] [PMC free article] [PubMed] [Google Scholar]

[bibr38-30502225261445743] 38. Davis SE, Lasko TA, Chen G, Siew ED, Matheny ME. Calibration drift in regression and machine learning models for acute kidney injury. J Am Med Inform Assoc. 2017;24(6):1052-1061. doi: 10.1093/jamia/ocx030 [DOI] [PMC free article] [PubMed] [Google Scholar]

[bibr39-30502225261445743] 39. Gianfrancesco MA, Tamang S, Yazdany J, Schmajuk G. Potential biases in machine learning algorithms using electronic health record data. JAMA Intern Med. 2018;178(11): 1544-1547. doi: 10.1001/jamainternmed.2018.3763 [DOI] [PMC free article] [PubMed] [Google Scholar]

[bibr40-30502225261445743] 40. Wiens J, Saria S, Sendak M, et al. Do no harm: a roadmap for responsible machine learning for health care. Nat Med. 2019;25(9):1337-1340. doi: 10.1038/s41591-019-0548-6 [DOI] [PubMed] [Google Scholar]

[bibr41-30502225261445743] 41. Chen IY, Szolovits P, Ghassemi M. Can AI help reduce disparities in general medical and mental health care?. AMA J Ethics. 2019;21(2):E167-E179. doi: 10.1001/amajethics.2019.167 [DOI] [PubMed] [Google Scholar]

[bibr42-30502225261445743] 42. Obermeyer Z, Nissan R, Stern M, Eaneff S, Bembeneck EJ, Mullainathan S. Algorithmic bias in a clinical prediction model for pediatric asthma exacerbation. Health Aff. 2022;41(10):1445-1453. doi: 10.1377/hlthaff.2022.00567 [DOI] [Google Scholar]

[bibr43-30502225261445743] 43. Keet CA, McCormack MC, Pollack CE, Peng RD, McGowan E, Matsui EC. Neighborhood poverty, urban residence, race/ethnicity, and asthma: rethinking the inner-city asthma epidemic. J Allergy Clin Immunol. 2015;135(3):655-662. doi: 10.1016/j.jaci.2014.11.022 [DOI] [PMC free article] [PubMed] [Google Scholar]

[bibr44-30502225261445743] 44. Lyon SM, Wunsch H, Asch DA, et al. Race and ethnicity in machine learning for clinical prediction: a scoping review. JAMA Netw Open. 2022;5(11):e2241569. doi: 10.1001/jamanetworkopen.2022.41569 [DOI] [Google Scholar]

[bibr45-30502225261445743] 45. Vyas DA, Eisenstein LG, Jones DS. Hidden in plain sight—reconsidering the use of race correction in clinical algorithms. N Engl J Med. 2020;383(9):874-882. doi: 10.1056/NEJMms2004740 [DOI] [PubMed] [Google Scholar]

[bibr46-30502225261445743] 46. Mhasawade V, Zhao Y, Chunara R. Machine learning and algorithmic fairness in public and population health. Nat Mach Intell. 2021;3(8):659-666. doi: 10.1038/s42256-021-00373-4 [DOI] [Google Scholar]

[bibr47-30502225261445743] 47. Fiscella K, Sanders MR. Racial and ethnic disparities in the quality of health care. Annu Rev Public Health. 2016;37:375-394. doi: 10.1146/annurev-publhealth-032315-021439 [DOI] [PubMed] [Google Scholar]

[bibr48-30502225261445743] 48. Flores G; Committee on Pediatric Research. Technical report—racial and ethnic disparities in the health and health care of children. Pediatrics. 2010;125(4): e979-e1020. doi: 10.1542/peds.2010-0188 [DOI] [PubMed] [Google Scholar]

[bibr49-30502225261445743] 49. Pleiss G, Raghavan M, Wu F, Kleinberg J, Weinberger KQ. On fairness and calibration. In: Advances in Neural Information Processing Systems. 2017;30. [Google Scholar]

[bibr50-30502225261445743] 50. Benjamin R. Race After Technology: Abolitionist Tools for the New Jim Code. Polity Press; 2019. [Google Scholar]

[bibr51-30502225261445743] 51. Prelim M, Chang A, Redline S, Celi LA, Brown S. Parent perspectives on artificial intelligence in pediatric hospital medicine: a qualitative study. J Hosp Med. 2022;17(8):612-619. doi: 10.1002/jhm.12890 [DOI] [Google Scholar]

[bibr52-30502225261445743] 52. Coyne I, Hallström I, Söderbäck M. Reframing the focus from a child-centred to a child-rights perspective in healthcare. Children. 2022;9(5):625. doi: 10.3390/children9050625 [DOI] [PMC free article] [PubMed] [Google Scholar]

[bibr53-30502225261445743] 53. Rajkomar A, Hardt M, Howell MD, Corrado G, Chin MH. Ensuring fairness in machine learning to advance health equity. Ann Intern Med. 2018;169(12):866-872. doi: 10.7326/M18-1990 [DOI] [PMC free article] [PubMed] [Google Scholar]

[bibr54-30502225261445743] 54. Mehrabi N, Morstatter F, Saxena N, Lerman K, Galstyan A. A survey on bias and fairness in machine learning. ACM Comput Surv. 2021;54(6):1-35. doi: 10.1145/3457607 [DOI] [Google Scholar]

[bibr55-30502225261445743] 55. Zafar MB, Valera I, Gomez Rodriguez M, Gummadi KP. Fairness constraints: a flexible approach for fair classification. J Mach Learn Res. 2019;20(75):1-42. [Google Scholar]

[bibr56-30502225261445743] 56. Mitchell S, Potash E, Barocas S, D’Amour A, Lum K. Algorithmic fairness: choices, assumptions, and definitions. Annu Rev Stat Appl. 2021;8:141-163. doi: 10.1146/annurev-statistics-042720-125902 [DOI] [Google Scholar]

[bibr57-30502225261445743] 57. Price WN, Gerke S, Cohen IG. Potential liability for physicians using artificial intelligence. JAMA. 2019; 322(18):1765-1766. doi: 10.1001/jama.2019.15064 [DOI] [PubMed] [Google Scholar]

[bibr58-30502225261445743] 58. Gerke S, Babic B, Evgeniou T, Cohen IG. The need for a system view to regulate artificial intelligence/machine learning-based software as medical device. NPJ Digit Med. 2020;3:53. doi: 10.1038/s41746-020-0262-2 [DOI] [PMC free article] [PubMed] [Google Scholar]

[bibr59-30502225261445743] 59. Beauchamp TL, Childress JF. Principles of Biomedical Ethics. 8th ed. Oxford University Press; 2019. [Google Scholar]

[bibr60-30502225261445743] 60. Mittelstadt BD, Allo P, Taddeo M, Wachter S, Floridi L. The ethics of algorithms: mapping the debate. Big Data Soc. 2016;3(2):1-21. doi: 10.1177/2053951716679679 [DOI] [Google Scholar]

[bibr61-30502225261445743] 61. Amann J, Blasimme A, Vayena E, Frey D, Madai VI; Precise4Q Consortium. Explainability for artificial intelligence in healthcare: a multidisciplinary perspective. BMC Med Inform Decis Mak. 2020;20(1):310. doi: 10.1186/s12911-020-01332-6 [DOI] [PMC free article] [PubMed] [Google Scholar]

[bibr62-30502225261445743] 62. Goodman B, Flaxman S. European Union regulations on algorithmic decision-making and a “right to explanation”. AI Mag. 2017;38(3):50-57. doi: 10.1609/aimag.v38i3.2741 [DOI] [Google Scholar]

63. European Parliament and Council of the European Union. Regulation (EU) 2024/1689 on Laying Down Harmonised Rules on Artificial Intelligence (Artificial Intelligence Act). European Parliament and Council of the European Union; 2024.

64. Tsai JC, Chen WY, Liang YW, Lin HJ, Guo HR. The effect of an artificial intelligence-based clinical decision support system on the triage of patients with acute coronary syndrome. Sci Rep. 2021;11(1):18119. doi: 10.1038/s41598-021-97601-5

65. Patel SJ, Chamberlain DB, Chamberlain JM. Impact of a machine learning-based triage tool on pediatric emergency department length of stay. Pediatr Emerg Care. 2022;38(9):e1523-e1528. doi: 10.1097/PEC.0000000000002798

66. Chen JH, Asch SM. Machine learning and prediction in medicine—beyond the peak of inflated expectations. N Engl J Med. 2017;376(26):2507-2509. doi: 10.1056/NEJMp1702071

67. Wang L, Wang Y, Chang S, et al. The effectiveness of an artificial intelligence-based clinical decision support system for the triage of patients with acute abdominal pain. J Med Syst. 2020;44(6):111. doi: 10.1007/s10916-020-01574-w

68. Kim Y, Lee S, Choi JW, et al. Effectiveness of an artificial intelligence-based clinical decision support system for the triage of patients with chest pain. J Am Coll Cardiol. 2021;77(18 suppl 1):1-10. doi: 10.1016/S0735-1097(21)01234-5

69. Johnson AEW, Ghassemi MM, Nemati S, Niehaus KE, Clifton DA, Clifford GD. Machine learning and decision support in critical care. Proc IEEE Inst Electr Electron Eng. 2016;104(2):444-466. doi: 10.1109/JPROC.2015.2501978

70. Brown H, Terrence J, Vasquez P, Bates DW, Zimlichman E. Continuous monitoring of vital signs using wearable devices on the general ward: pilot study. JMIR Mhealth Uhealth. 2020;8(7):e15471. doi: 10.2196/15471

71. Wahl B, Cossy-Gantner A, Germann S, Schwalbe NR. Artificial intelligence (AI) and global health: how can AI contribute to health in resource-poor settings? BMJ Glob Health. 2018;3(4):e000798. doi: 10.1136/bmjgh-2018-000798

72. Mathews SC, McShea MJ, Hanley CL, Ravitz A, Labrique AB, Cohen AB. Digital health: a path to validation. NPJ Digit Med. 2019;2:38. doi: 10.1038/s41746-019-0111-3

73. Owoyemi A, Owoyemi J, Osiyemi A, Boyd A. Artificial intelligence for healthcare in Africa. Front Digit Health. 2020;2:6. doi: 10.3389/fdgth.2020.00006

74. Peek N, Combi C, Marin R, Bellazzi R. Thirty years of artificial intelligence in medicine (AIME) conferences: a review of research themes. Artif Intell Med. 2015;65(1):61-73. doi: 10.1016/j.artmed.2015.07.003

75. Labrique A, Vasudevan L, Weiss W, Wilson K. Establishing standards to evaluate the impact of integrating digital health into health systems. Glob Health Sci Pract. 2018;6(suppl 1):S5-S17. doi: 10.9745/GHSP-D-18-00230

76. Greenhalgh T, Wherton J, Papoutsi C, et al. Beyond adoption: a new framework for theorizing and evaluating nonadoption, abandonment, and challenges to the scale-up, spread, and sustainability of health and care technologies. J Med Internet Res. 2017;19(11):e367. doi: 10.2196/jmir.8775

77. Borycki EM, Kushniruk AW, Bellwood P, Brender J. Technology-induced errors: the current use of frameworks and models from the biomedical and life sciences literatures. Methods Inf Med. 2012;51(2):95-103. doi: 10.3414/ME11-02-0009

78. Castagno S, Khalifa M. Perceptions of artificial intelligence among healthcare staff: a qualitative survey study. Front Artif Intell. 2020;3:578983. doi: 10.3389/frai.2020.578983

79. Roman LC, Ancker JS, Johnson SB, Senathirajah Y. Navigation in the electronic health record: a review of the safety and usability literature. J Biomed Inform. 2017;67:69-79. doi: 10.1016/j.jbi.2017.01.005

80. Koppel R, Metlay JP, Cohen A, et al. Role of computerized physician order entry systems in facilitating medication errors. JAMA. 2005;293(10):1197-1203. doi: 10.1001/jama.293.10.1197

81. Sullivan HR, Schweikart SJ. Are current tort liability doctrines adequate for addressing injury caused by AI? AMA J Ethics. 2019;21(2):E160-E166. doi: 10.1001/amajethics.2019.160

82. Price WN II. Medical malpractice and black-box medicine. In: Cohen IG, Lynch HF, Vayena E, Gasser U, eds. Big Data, Health Law, and Bioethics. Cambridge University Press; 2018:295-308.

83. Char DS, Abràmoff MD, Feudtner C. Identifying ethical considerations for machine learning healthcare applications. Am J Bioeth. 2020;20(11):7-17. doi: 10.1080/15265161.2020.1819469

84. Yang Q, Steinfeld A, Zimmerman J. Unpacking emphasis on human-AI collaboration in healthcare: a systematic literature review of human-centered AI. ACM Trans Comput Hum Interact. 2023;30(4):1-32. doi: 10.1145/3582431

85. Cai CJ, Winter S, Steiner D, Wilcox L, Terry M. “Hello AI”: uncovering the onboarding needs of medical practitioners for human-AI collaborative decision-making. Proc ACM Hum Comput Interact. 2019;3(CSCW):1-24. doi: 10.1145/3359206

86. Adadi A, Berrada M. Peeking inside the black-box: a survey on explainable artificial intelligence (XAI). IEEE Access. 2018;6:52138-52160. doi: 10.1109/ACCESS.2018.2870052

87. Tonekaboni S, Joshi S, McCradden MD, Goldenberg A. What clinicians want: contextualizing explainable machine learning for clinical end use. Proc Mach Learn Healthc Conf. 2019;106:1-21.

88. Ghassemi M, Oakden-Rayner L, Beam AL. The false hope of current approaches to explainable artificial intelligence in health care. Lancet Digit Health. 2021;3(11):e745-e750. doi: 10.1016/S2589-7500(21)00208-9

89. Acosta JN, Falcone GJ, Rajpurkar P, Topol EJ. Multimodal biomedical AI. Nat Med. 2022;28(9):1773-1784. doi: 10.1038/s41591-022-01981-2

90. Huang SC, Pareek A, Seyyedi S, Banerjee I, Lungren MP. Fusion of medical imaging and electronic health records using deep learning: a systematic review and implementation guidelines. NPJ Digit Med. 2020;3:136. doi: 10.1038/s41746-020-00341-z

91. Rieke N, Hancox J, Li W, et al. The future of digital health with federated learning. NPJ Digit Med. 2020;3:119. doi: 10.1038/s41746-020-00323-1

92. Brisimi TS, Chen R, Mela T, Olshevsky A, Paschalidis IC, Shi W. Federated learning of predictive models from federated electronic health records. Int J Med Inform. 2018;112:59-67. doi: 10.1016/j.ijmedinf.2018.01.007

93. Dayan I, Roth HR, Zhong A, et al. Federated learning for predicting clinical outcomes in patients with COVID-19. Nat Med. 2021;27(10):1735-1743. doi: 10.1038/s41591-021-01506-3

94. Lee CS, Lee AY. Clinical applications of continual learning machine learning. Lancet Digit Health. 2020;2(6):e279-e281. doi: 10.1016/S2589-7500(20)30102-3

95. Henry KE, Hager DN, Pronovost PJ, Saria S. A targeted real-time early warning score (TREWScore) for septic shock. Sci Transl Med. 2015;7(299):299ra122. doi: 10.1126/scitranslmed.aab3719

96. Balamuth F, Alpern ER, Abbadessa MK, et al. Improving recognition of pediatric severe sepsis in the emergency department: contributions of a vital sign-based electronic alert and bedside clinician identification. Ann Emerg Med. 2017;70(6):759-768.e2. doi: 10.1016/j.annemergmed.2017.03.019

97. Santillana M, Nguyen AT, Dredze M, Paul MJ, Nsoesie EO, Brownstein JS. Combining search, social media, and traditional data sources to improve influenza surveillance. PLoS Comput Biol. 2015;11(10):e1004513. doi: 10.1371/journal.pcbi.1004513

98. U.S. Food and Drug Administration. Proposed Regulatory Framework for Modifications to Artificial Intelligence/Machine Learning (AI/ML)-Based Software as a Medical Device (SaMD). U.S. Food and Drug Administration; 2019.

99. World Health Organization. Global Strategy on Digital Health 2020-2025. World Health Organization; 2021.

100. World Health Organization. Ethics and Governance of Artificial Intelligence for Health: WHO Guidance. World Health Organization; 2021.

101. Khennou F, Latif S, Abdul Razak S, El Hassani AH, El Beqqali O. Simulation-based validation of machine learning models for healthcare: a systematic review. J Biomed Inform. 2022;136:104237. doi: 10.1016/j.jbi.2022.104237

102. Feng J, Phillips RV, Malenica I, et al. Clinical trial simulation to evaluate the performance of machine learning models for treatment effect estimation. J Am Med Inform Assoc. 2021;28(9):1912-1921. doi: 10.1093/jamia/ocab097

103. Parvinian B, Scully C, Wiyor H, Kumar A, Weininger S. Regulatory considerations for physiological closed-loop controlled medical devices used for automated critical care: food and drug administration workshop discussion topics. Anesth Analg. 2018;126(6):1916-1925. doi: 10.1213/ANE.0000000000002849

104. Greenhalgh T, Thorne S, Malterud K. Time to challenge the spurious hierarchy of systematic over narrative reviews? Eur J Clin Invest. 2018;48(6):e12931. doi: 10.1111/eci.12931

105. Popay J, Roberts H, Sowden A, et al. Guidance on the conduct of narrative synthesis in systematic reviews. ESRC Methods Programme; 2006.

106. Ioannidis JPA. The mass production of redundant, misleading, and conflicted systematic reviews and meta-analyses. Milbank Q. 2016;94(3):485-514. doi: 10.1111/1468-0009.12210

107. Fernandes M, Vieira SM, Leite F, Palos C, Finkelstein S, Sousa JMC. Artificial intelligence in emergency medicine: a scoping review. J Am Coll Emerg Physicians Open. 2020;1(6):1691-1702. doi: 10.1002/emp2.12277

108. Hwang S, You J, Kim J, et al. Machine learning-based prediction of critical illness in children visiting the emergency department. PLoS One. 2022;17(2):e0264184. doi: 10.1371/journal.pone.0264184

109. Easterbrook PJ, Berlin JA, Gopalan R, Matthews DR. Publication bias in clinical research. Lancet. 1991;337(8746):867-872. doi: 10.1016/0140-6736(91)90201-Y

110. Stewart J, Sprivulis P, Dwivedi G. Artificial intelligence and machine learning in emergency medicine: a systematic review of registered clinical trials. Emerg Med Australas. 2023;35(2):209-218.

111. Williams C, James D, Wilson S, Thompson R. Implementation of an AI triage system in community pediatric emergency departments: a cluster-randomized trial. Acad Emerg Med. 2023;30(7):712-721.

112. Okada Y, Narumoto J, Matsumoto K, Tanaka H. External validation of a deep learning model for pediatric emergency triage: retrospective cohort study. JMIR Med Inform. 2022;10(8):e37892.



Eslam Abady, MBBCh

Mandy Elewa, MSc

Habiba Abdelhameed Elrefaey, MBBCh

Kevin Thomas Mathew, MD

Panos Tamvakologos, MD

Kayleigh Kuhn, MD

Mohammed Alsabri, MD, FAAP

