Translational Pediatrics. 2026 Feb 4;15(2):56. doi: 10.21037/tp-2025-667

Applications of artificial intelligence in pediatric general surgery: a systematic review

Bin Zhang 1, Pushu Wang 1, Yang Song 1, Yanwei Su 2, Yaqi Zhu 2, Yuqi Wang 2, Jinjin Guo 2, Wenjin Wang 1,, Jixin Yang 1,
PMCID: PMC12969160  PMID: 41810201

Abstract

Background

Artificial intelligence (AI) technologies are increasingly being applied in pediatric surgery. By using machine learning (ML) to analyze clinical case data, models for disease diagnosis and prognosis prediction can be developed. This study aims to explore whether AI can effectively process massive amounts of medical data, extract key information, and assist doctors in disease diagnosis, surgical plan selection, and prognosis assessment.

Methods

The protocol of this study was registered with PROSPERO (CRD420251184780). We searched PubMed, Web of Science, and Scopus for studies published between February 2016 and June 2025 focusing on AI applications in pediatric appendicitis, intussusception, Hirschsprung’s disease (HD), necrotizing enterocolitis (NEC), and biliary atresia (BA). The PRISMA and Synthesis Without Meta-analysis (SWiM) reporting guidelines were followed.

Results

Models integrating multimodal data (such as clinical data, laboratory markers, and imaging) generally outperformed those utilizing single data sources. Some models performed at a level comparable to or exceeding that of experienced specialists in diagnosis, improving the diagnostic accuracy of junior physicians. Most included studies were retrospective with single-center designs, resulting in a generally high risk of bias.

Conclusions

Current research has demonstrated AI’s potential to improve diagnostic accuracy, optimize treatment decisions, and enhance patient outcomes, while improvements are needed in areas such as bias risk control, model interpretability, and data quality. More high-quality, multicenter prospective studies are required to fully realize the comprehensive clinical translation of AI technology in pediatric surgery.

Keywords: Artificial intelligence (AI), machine learning (ML), deep learning (DL), diagnosis, pediatric


Highlight box.

Key findings

• Artificial intelligence (AI) can improve the diagnostic efficacy of pediatric surgeons.

What is known and what is new?

• AI and machine learning (ML) are capable of processing massive volumes of medical data and extracting key information.

• AI and ML can provide assistance to physicians in aspects such as disease diagnosis, surgical plan selection, and prognosis assessment.

What is the implication, and what should change now?

• This report can enhance clinicians’ trust in AI and ML. However, to fully realize the comprehensive clinical translation of AI technology in the field of pediatric surgery, more high-quality, multicenter prospective studies are needed.

Introduction

Artificial intelligence (AI) is a broad technical discipline that aims to enable computer systems to simulate, extend, and expand human intelligent behaviors such as learning, reasoning, judgment, and decision-making. Through learning and analysis, an AI-based computer system can automatically recognize the patterns and regularities contained in data and use them to make predictions and decisions. Machine learning (ML) is a major branch of AI that focuses on enabling computers to improve their performance by learning from data (1,2). ML algorithms learn the features and patterns of a substantial training data set and then identify those patterns in new data for prediction and classification. Key ML algorithms include decision trees, support vector machines, neural networks, and random forests (1,3,4). By integrating diverse clinical case data on pediatric surgical digestive tract diseases, ML algorithms can be used to construct disease diagnosis models and therapeutic efficacy prediction models. These models can aid doctors in diagnosing diseases and developing effective treatment plans.

Deep learning (DL) is a subset of ML (5). Originating from artificial neural networks, it learns complex feature representations from large-scale data by constructing deep, nested neural network models. DL models possess powerful feature extraction and pattern recognition capabilities, demonstrating excellent performance in tasks such as image recognition, speech recognition, and natural language processing (6,7). Convolutional neural networks, in particular, can effectively process image data (1,5). To date, the application of DL in pediatric gastrointestinal diseases has primarily focused on medical image analysis. By processing abdominal computed tomography (CT) and magnetic resonance imaging (MRI) images of children, it assists physicians in detecting diseases, characterizing lesions, and improving diagnostic accuracy (8). However, its limitations are also significant in practical applications. On the one hand, the “black-box” nature of DL models makes their decision-making processes difficult to interpret intuitively. When a model identifies a lesion area, it cannot clearly explain the basis for its judgment, which fosters physicians’ distrust of its output and hinders the clinical promotion of the technology. This lack of interpretability is a significant barrier to clinical adoption, as the demand for explainable AI (XAI) in medicine grows to meet regulatory standards and build clinician confidence (9,10). On the other hand, the pediatric digestive system changes markedly with age. Existing models often fail to fully account for these age-specific anatomical and physiological variations (11). This oversight results in substantial fluctuations in diagnostic accuracy across age groups and makes it difficult to achieve stable and reliable diagnostic outcomes.

The purpose of this study is to systematically review the clinical value of AI technology in pediatric surgical diseases, including disease diagnosis, surgical risk assessment, and intelligent monitoring and management of postoperative rehabilitation. Through in-depth analysis of the application scenarios and technical advantages of AI in these settings, and an objective examination of the practical challenges it faces in terms of data quality, model reliability, and ethical norms, this review aims to provide a theoretical basis for improving diagnostic accuracy in pediatric surgery, ensuring surgical safety, and optimizing treatment outcomes. We present this article in accordance with the PRISMA reporting checklist (available at https://tp.amegroups.com/article/view/10.21037/tp-2025-667/rc).

Methods

The protocol of this study was registered with PROSPERO (CRD420251184780). The conduct and reporting of this study adhered to the Synthesis Without Meta-analysis (SWiM) guidelines (12). A meta-analysis was not performed due to substantial heterogeneity in intervention targets, study populations, outcome measures, and implementation formats, which precluded a meaningful quantitative synthesis. Specifically, the included studies differed considerably in their choice of study populations and outcome measurement instruments. Therefore, this study adopted a combination of narrative synthesis and reporting of effect size ranges and medians in accordance with the recommendations by Campbell et al. (12). Since all included studies reported the area under the curve (AUC) as a consistent and comparable performance metric, the use of effect size ranges and medians aligns with best practice as endorsed by the SWiM guidelines.

Search strategy

We conducted a search across multiple databases, including PubMed, Web of Science, and Scopus, with the search time frame ranging from February 2016 to June 2025. The search terms were composed of various combinations of terms such as “machine learning”, “artificial intelligence”, “Hirschsprung’s disease”, “necrotizing enterocolitis”, “intussusception”, “pediatric appendicitis”, and “biliary atresia”.

Inclusion and exclusion criteria

We considered relevant studies that used ML techniques in diseases such as pediatric appendicitis, intussusception, Hirschsprung’s disease (HD), necrotizing enterocolitis (NEC), and biliary atresia (BA). Studies published before February 2016, abstracts, and non-English language studies were excluded. Additionally, subjects older than 18 years of age were excluded from the analysis.

Screening and selection

Two independent reviewers conducted the screening in two phases: title/abstract screening and full-text screening. Any discrepancies or uncertainties were resolved through discussion to reach a consensus. After removing duplicate documents, review articles and conference papers were excluded in the initial screening process, thus forming a refined pool of potential research literature. Subsequently, the remaining full-text articles were evaluated, and studies not directly related to ML applications were excluded. The finally selected studies included those that met the inclusion criteria and were published between February 2016 and June 2025.

Data extraction

Data extraction followed a structured approach, focusing on author, year, study design, objectives, modeling approach, data source, diagnostic gold standard, AUC for the model, AUC for the clinician, and risk of bias assessment. The extracted data were synthesized narratively. From the perspective of pediatric surgeons, key themes and findings regarding the role of AI technologies in disease diagnosis, treatment option selection, and prognostic evaluation were highlighted and comprehensively summarized.

Grouping rationale and data synthesis

The 36 included studies were categorized into three groups based on their clinical objectives: disease diagnosis, surgical decision-making, and prognosis assessment. Within each group, studies were further subgrouped by disease type. The primary outcome measure for synthesis was the model’s discriminative performance, with the AUC serving as the unified effect indicator. For the limited number of studies that did not report AUC but provided alternative metrics such as accuracy, these data were utilized for supplementary descriptive analysis.

Data standardization

The performance of all models was uniformly measured and reported using the AUC. As a standardized metric, the AUC ranges from 0.5 (indicating no discriminatory power) to 1.0 (representing perfect discriminatory ability), ensuring comparability of results across different studies.
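Concretely, the AUC equals the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case. A minimal sketch of this rank-based interpretation (the scores are illustrative, not data from the included studies):

```python
import numpy as np

def auc_score(y_true, y_score):
    """AUC via the Mann-Whitney U statistic: the probability that a
    randomly chosen positive case is scored above a random negative one."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    # Count pairwise "wins" of positives over negatives; ties count half.
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

# A non-discriminating model scores ~0.5; a perfect one scores 1.0.
print(auc_score([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```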

Assessment of credibility across studies and sensitivity analysis

Evidence with high credibility was derived from studies with prospective designs, multicenter validation, or utilization of publicly available datasets. Evidence with low credibility originated from single-center, retrospective studies, those with small sample sizes, or studies exhibiting risks of overfitting (e.g., AUC =1.000). Greater weight was assigned to high-credibility evidence when drawing conclusions. The robustness of the synthesized findings was evaluated through subgroup analyses, which compared differences in model performance across various diseases.

Assessment of bias risk

The risk of bias for all included studies was evaluated by two independent reviewers using the Quality Assessment Tool for Diagnostic Accuracy Studies-2 (QUADAS-2) (13). This tool is specifically designed to assess the quality of primary diagnostic accuracy studies. It consists of four domains covering patient selection, index test, reference standard, and flow and timing.

Results

The search yielded 308 unique papers published between February 2016 and June 2025. We further reviewed these papers, excluded review articles and conference papers, and retained only English-language studies involving pediatric populations (under 18 years old), leaving 46 studies. After full-text screening, we eliminated 10 papers unrelated to ML applications. Ultimately, 36 studies were analyzed, covering five major pediatric surgical diseases: appendicitis (14-25), BA (26-35), intussusception (36-39), NEC (40-48), and HD (49). These studies employed various ML and DL methods, primarily focusing on disease diagnosis, surgical decision-making, and prognosis assessment. Figure 1 displays the PRISMA flowchart.

Figure 1.

Figure 1

Literature search flow diagram based on PRISMA. ML, machine learning.

Tables S1-S3 summarize the key characteristics of the included studies, with a focus on the following aspects: research objectives, modeling approach, study population and sample size, diagnostic gold standard, AUC for the model, AUC for the clinician, and risk of bias assessment of the study. Table S4 presents the results of quantitative synthesis conducted in accordance with the SWiM guidelines. Specifically, Table S1 concentrates on predictive models for the diagnosis of five diseases developed based on clinical data; Table S2 evaluates the application of AI in surgical decision-making and risk assessment; and Table S3 is dedicated to the application of AI in postoperative recovery monitoring and the prediction of long-term outcomes. Table S4 summarizes the diagnostic, surgical decision-making, and prognostic prediction performance of AI models across different pediatric surgical diseases.

Disease diagnosis

In disease diagnosis, most studies utilized supervised ML and DL methods, integrating clinical, laboratory, and imaging data to construct predictive models. For pediatric appendicitis, the APPSTACK ensemble model proposed by Chadaga et al. (21) achieved an AUC of 0.96; the AiPAD decision tree model by Shikha and Kasem (23) achieved an accuracy of 97.1%, outperforming traditional ultrasound and CT reports. For BA, the extreme gradient boosting (XGBoost) model by Choi et al. (26), when combined with imaging data, achieved an AUC close to 0.999; the AI system for ultrasound images developed by Zhou et al. (34) achieved an AUC of 0.924 in external validation, surpassing the performance of most junior and some senior radiologists. For NEC, the multimodal DL model (imaging and laboratory) by Cui et al. (40) achieved an AUC of 0.92, comparable to pediatricians with 10 years of experience. For intussusception and HD, the YOLOv5 model by Kim et al. (39) achieved an average precision (AP) value of 0.952 for detecting intussusception in ultrasound images; Huang et al. (49) combined barium enema morphological features to predict short-segment HD, achieving an AUC of 0.93.

Surgical decision-making

In surgical decision-making and risk assessment, AI models demonstrated strong predictive capabilities, particularly in distinguishing the need for surgery and predicting complications. For appendicitis, the random forest model by Males et al. (24) achieved an AUC of 0.94 in discriminating complex appendicitis, significantly outperforming the traditional Appendicitis Inflammatory Response (AIR) score. For acute appendicitis, ML models effectively reduced the negative appendectomy rate and distinguished between complex and non-complex cases, with AUCs as high as 0.96. For predicting surgery in NEC, the combined XGBoost model (clinical and radiomics) by Li et al. (43) achieved an AUC of 0.959 in predicting the need for surgery in neonates without absolute surgical indications; the ResNet18 model by Wu et al. (48), based on bedside X-ray images, predicted the need for surgery with an AUC of 0.876. In surgical risk stratification, the ensemble model constructed by Kim et al. (41) using nationwide data predicted surgical NEC in very low birth weight (VLBW) infants with an AUC of 0.721. For BA, multimodal AI systems by Ma et al. (28) demonstrated high accuracy (AUC =0.98) in identifying cases requiring surgical intervention.

Prognosis assessment

AI also shows potential in postoperative recovery monitoring and long-term outcome prediction.

• Post-Kasai portoenterostomy prognosis for BA: Caruso et al. (30) used various ML models to predict postoperative stability in BA patients, achieving a maximum AUC of 1.00.

• Postoperative complication monitoring: Ghomrawi et al. (25) used wearable device (Fitbit) data combined with a random forest model to predict abnormal recovery after appendectomy, with AUCs of 0.80 (complex cases) and 0.70 (simple cases).

• Liver transplant risk prediction: Nyholm et al. (32) analyzed liver biopsy images using a convolutional neural network, and the extracted histological features were significantly correlated with liver transplantation/death risk.

SWiM

Across all diseases, AI models demonstrated strong to exceptional diagnostic performance, with median AUCs ranging from 0.92 to 0.98. The highest performance was observed in BA. For both diagnosis and prognosis prediction, BA-related models had the highest median AUCs, supported by multiple high-quality multicenter and prospective validation studies, making it the most promising area for clinical implementation of AI in pediatric surgery. In appendicitis and NEC, AI models performed excellently (median AUC >0.85) in distinguishing disease severity and predicting the need for surgery, demonstrating significant potential as clinical decision support tools. Compared to diagnosis, there are fewer studies on prognosis prediction, and model performance is more variable (median AUC 0.75–0.90). This is a promising direction that requires more high-quality data and prospective research. Comprehensive analysis suggests that models integrating multimodal data (e.g., clinical data, laboratory markers, and imaging) generally outperform models using a single data source.
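The synthesis described above — reporting ranges and medians of per-study AUCs grouped by disease, per the SWiM recommendations — can be sketched as follows. The AUC values here are illustrative placeholders, not the review's extracted data:

```python
import statistics

# Hypothetical per-study AUCs grouped by disease (illustrative values only).
aucs_by_disease = {
    "appendicitis": [0.94, 0.96, 0.90],
    "biliary_atresia": [0.924, 0.98, 0.999],
}

def summarize(group):
    """Effect-size summary used in synthesis without meta-analysis:
    the median AUC and the min-max range across studies."""
    return {"median": statistics.median(group),
            "range": (min(group), max(group))}

for disease, values in aucs_by_disease.items():
    print(disease, summarize(values))
```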

Risk of bias

Most studies attempted to incorporate explainability tools such as SHapley Additive exPlanations (SHAP), local interpretable model-agnostic explanations (LIME), and gradient-weighted class activation mapping (Grad-CAM) to enhance model transparency. However, the risk of bias assessment (QUADAS-2) indicated that some studies exhibited a high risk of patient selection bias due to their single-center, retrospective designs and exclusion criteria (40,49). Studies utilizing public datasets or conducting prospective multicenter validation demonstrated a lower risk of bias. Some studies lacked clear descriptions of blinding and model comparison methods, which affected the reliability of their conclusions (22,34).

Results of sensitivity analysis

Subgroup analysis by disease type: significant variations in model performance were observed across different diseases. Specifically, diagnostic models for BA demonstrated the most outstanding performance (median AUC: 0.98), significantly surpassing models for other disease types. This indicates that disease-specific characteristics substantially influence AI model performance.

Results of credibility assessment across studies

Using the QUADAS-2 tool, the credibility of evidence was graded as follows:

• High-credibility evidence primarily originated from studies with prospective designs [e.g., Zhou et al., 2024 (34)], multicenter validation cohorts, or use of publicly available datasets. These studies demonstrated robust model performance, with AUC values ranging between 0.77 and 0.974, indicating good clinical generalizability.

• Low-credibility evidence was mainly identified in single-center retrospective studies [e.g., Wang et al., 2025 (31)], studies with insufficient sample sizes (n<50), and studies exhibiting risks of overfitting (AUC =1.000). Although some studies reported exceptionally high AUC values (1.000), their methodological limitations significantly undermined the credibility of the evidence.

Discussion

This systematic review aims to evaluate the application and performance of AI technologies in processing clinical and imaging data to construct diagnostic models, assist surgical decision-making, and predict prognoses for pediatric diseases. The findings indicate that AI models demonstrate excellent performance across three key domains: disease diagnosis, surgical decision-making, and prognosis assessment. AI performs remarkably well in disease diagnosis. When integrating multimodal data such as imaging and laboratory results, AI exhibits predictive capabilities rivaling and sometimes surpassing those of experienced pediatricians and radiologists. This conclusion aligns with the findings of systematic reviews by Liu et al. (50) and Aggarwal et al. (51), suggesting that DL algorithms have the potential to diagnose disease and can exceed radiologists in certain scenarios. In a prospective study, the multimodal DL model proposed by Ma et al. (28) achieved an AUC of 0.9870 in internal testing and 0.9740 in external validation, performing comparably to pediatricians with ten years of experience. Furthermore, the diagnostic AUC of less experienced radiologists could be improved to 0.9006 by using this model. Nevertheless, the development and integration of AI in pediatric surgery face challenges, including scarce pediatric imaging datasets and algorithm heterogeneity. Multiple reviews indicate that limited sample sizes and lack of methodological standardization are key factors limiting the generalizability of AI models (52,53). Furthermore, existing models fail to adequately account for the anatomical structure and pathophysiology among different age groups of children when analyzing medical images, which may compromise diagnostic accuracy (54,55).

AI also has potential for surgical decision-making and prognosis prediction, helping to reduce unnecessary surgeries, optimize surgical timing, and assess risk. For example, the random forest model developed by Males et al. (24) achieved an AUC of 0.94 in discriminating complex appendicitis, significantly outperforming the traditional AIR score. Similarly, the XGBoost model developed by Li et al. (43) which integrated clinical features and radiomics data, demonstrated outstanding performance (AUC =0.959) in predicting the necessity for surgery in children with NEC, providing a more objective basis for surgical decision-making. However, a common challenge for such models is limited generalizability, meaning models that perform excellently on one dataset may exhibit significantly degraded performance when applied to other medical centers (56). In postoperative recovery monitoring and long-term prognosis assessment, AI technology also shows unique advantages, such as using wearable devices (e.g., Fitbit) for real-time monitoring of postoperative recovery status (25). Although the prospects for these technologies are promising, their widespread adoption faces challenges. Maintaining high device sensitivity in dynamic or complex environments may lead to high false alarm rates, causing alarm fatigue and undermining clinicians’ confidence (57).

The “black-box” nature of AI models remains a major obstacle to their clinical adoption. Insufficient model interpretability may weaken clinicians’ trust in AI results, thereby hindering the integration of AI into clinical environments. Most literature included in this review uses XAI techniques such as SHAP, LIME, and Grad-CAM to enhance model transparency, aligning with the current academic emphasis on “Trustworthy AI” (9,10). However, existing XAI methods still face specific challenges in pediatrics: age-related physiological changes limit model generalizability, clinicians’ demand for causal mechanisms surpasses that for correlation explanations, and standardized methods for effectively translating AI-generated explanations in clinician-patient communication are lacking (11).

Most studies included in this review demonstrate that the diagnostic performance of AI algorithms is comparable or superior to that of clinicians. However, these studies are predominantly single-center and retrospective, whereas prospective studies are relatively scarce. This status quo may lead clinicians to maintain reservations regarding the actual efficacy of AI models. Furthermore, this systematic review identified inconsistencies in the reporting standards, outcome measures, and validation methods used for AI model evaluation. Therefore, it is necessary to adopt a comprehensive set of performance evaluation metrics, including optimal threshold, sensitivity, specificity, positive predictive value, and negative predictive value. These metrics can facilitate a more nuanced presentation of model performance, aiding clinicians in understanding model outputs and trusting them.
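The metrics listed above all follow directly from a 2×2 confusion matrix at a chosen decision threshold; a minimal sketch (the counts are hypothetical, for illustration only):

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard diagnostic-accuracy metrics from a 2x2 confusion matrix:
    tp/fp/tn/fn = true/false positives and negatives at a fixed threshold."""
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }

# Example: 90 true positives, 10 false positives, 80 true negatives,
# 20 false negatives.
print(diagnostic_metrics(tp=90, fp=10, tn=80, fn=20))
```

Reporting this full set alongside the AUC shows how a model trades missed cases against false alarms, which a single summary statistic cannot convey.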

There are several limitations in this systematic review. First, most included studies were retrospective and single-center, lacking multicenter external validation, which makes it difficult to accurately assess the differences in diagnostic performance between AI and clinicians in real-world clinical settings. The scarcity of rigorously validated prospective studies may lead to overestimation of model diagnostic accuracy. Second, significant heterogeneity existed among the included studies regarding the AI technologies used, the targeted surgical disease types, and the predicted outcomes, rendering meta-analysis infeasible. Third, this review relied on manual data extraction, potentially leading to the exclusion of studies with poor data quality or incomplete reporting. Finally, disease and outcome definitions were inconsistent across studies; further heterogeneity arose from the algorithms used, the dependence on manually extracted data, and the limited clinical relevance of some approaches, such as the use of X-rays for diagnosing intussusception.

Future research should focus on conducting large-scale, prospective, multicenter clinical studies employing standardized image acquisition protocols, diverse patient populations, and high-quality annotated datasets to enhance model generalizability. Furthermore, efforts should be intensified to strengthen external validation of models and their prospective testing within real clinical workflows. This includes comparing AI model performance with radiologists, assessing treatment response in longitudinal imaging data, and integrating AI tools into clinical trial designs. The application of XAI frameworks should be expanded and standardized to improve model interpretability and clinician trust. Researchers should also adopt standardized reporting guidelines such as TRIPOD-AI to promote transparency, reproducibility, and comparability across studies.

Conclusions

AI technology offers guidance for diagnosis, surgical decision-making, and postoperative management of pediatric surgical diseases. Current research has demonstrated AI’s potential to improve diagnostic accuracy, optimize treatment decisions, and enhance patient outcomes, while improvements are needed in areas such as bias risk control, model interpretability, and data quality. More high-quality, multicenter prospective studies are required to fully realize the comprehensive clinical translation of AI technology in pediatric surgery.

Supplementary

The article’s supplementary files are as follows:

tp-15-02-56-rc.pdf (372.2KB, pdf)
DOI: 10.21037/tp-2025-667
tp-15-02-56-coif.pdf (1.2MB, pdf)
DOI: 10.21037/tp-2025-667

Acknowledgments

We would like to thank Ms. Pan Ruochen for her revision suggestions.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

Footnotes

Reporting Checklist: The authors have completed the PRISMA reporting checklist. Available at https://tp.amegroups.com/article/view/10.21037/tp-2025-667/rc

Funding: This study was supported by the Hubei Provincial Natural Science Foundation (grant No. 2022CFB134).

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://tp.amegroups.com/article/view/10.21037/tp-2025-667/coif). All authors report that this study was supported by the Hubei Provincial Natural Science Foundation (grant No. 2022CFB134). The authors have no other conflicts of interest to declare.

References

1. Greener JG, Kandathil SM, Moffat L, et al. A guide to machine learning for biologists. Nat Rev Mol Cell Biol 2022;23:40-55. doi: 10.1038/s41580-021-00407-0
2. Mintz Y, Brodie R. Introduction to artificial intelligence in medicine. Minim Invasive Ther Allied Technol 2019;28:73-81. doi: 10.1080/13645706.2019.1575882
3. Handelman GS, Kok HK, Chandra RV, et al. eDoctor: machine learning and the future of medicine. J Intern Med 2018;284:603-19. doi: 10.1111/joim.12822
4. Domaratzki M, Kidane B. Deus ex machina? Demystifying rather than deifying machine learning. J Thorac Cardiovasc Surg 2022;163:1131-1137.e4. doi: 10.1016/j.jtcvs.2021.02.095
5. Choi RY, Coyner AS, Kalpathy-Cramer J, et al. Introduction to Machine Learning, Neural Networks, and Deep Learning. Transl Vis Sci Technol 2020;9:14. doi: 10.1167/tvst.9.2.14
6. van der Velden BHM, Kuijf HJ, Gilhuijs KGA, et al. Explainable artificial intelligence (XAI) in deep learning-based medical image analysis. Med Image Anal 2022;79:102470. doi: 10.1016/j.media.2022.102470
7. Zhou T, Cheng Q, Lu H, et al. Deep learning methods for medical image fusion: A review. Comput Biol Med 2023;160:106959. doi: 10.1016/j.compbiomed.2023.106959
8. Dillman JR, Somasundaram E, Brady SL, et al. Current and emerging artificial intelligence applications for pediatric abdominal imaging. Pediatr Radiol 2022;52:2139-48. doi: 10.1007/s00247-021-05057-0
9. Antoniadi AM, Du Y, Guendouz Y, et al. Current Challenges and Future Opportunities for XAI in Machine Learning-Based Clinical Decision Support Systems: A Systematic Review. Appl Sci (Basel) 2021;11:5088.
10. Amann J, Blasimme A, Vayena E, et al. Explainability for artificial intelligence in healthcare: a multidisciplinary perspective. BMC Med Inform Decis Mak 2020;20:310. doi: 10.1186/s12911-020-01332-6
11. Salih AM, Menegaz G, Pillay T, et al. Explainable Artificial Intelligence in Paediatric: Challenges for the Future. Health Sci Rep 2024;7:e70271. doi: 10.1002/hsr2.70271
12. Campbell M, McKenzie JE, Sowden A, et al. Synthesis without meta-analysis (SWiM) in systematic reviews: reporting guideline. BMJ 2020;368:l6890. doi: 10.1136/bmj.l6890
13. Lee J, Mulder F, Leeflang M, et al. QUAPAS: An Adaptation of the QUADAS-2 Tool to Assess Prognostic Accuracy Studies. Ann Intern Med 2022;175:1010-8. doi: 10.7326/M22-0276
14. Reismann J, Romualdi A, Kiss N, et al. Diagnosis and classification of pediatric acute appendicitis by artificial intelligence methods: An investigator-independent approach. PLoS One 2019;14:e0222030. doi: 10.1371/journal.pone.0222030
15. Erman A, Ferreira J, Ashour WA, et al. Machine-learning-assisted Preoperative Prediction of Pediatric Appendicitis Severity. J Pediatr Surg 2025;60:162151. doi: 10.1016/j.jpedsurg.2024.162151
16. Stiel C, Elrod J, Klinke M, et al. The Modified Heidelberg and the AI Appendicitis Score Are Superior to Current Scores in Predicting Appendicitis in Children: A Two-Center Cohort Study. Front Pediatr 2020;8:592892. doi: 10.3389/fped.2020.592892
17. Aydin E, Türkmen İU, Namli G, et al. A novel and simple machine learning algorithm for preoperative diagnosis of acute appendicitis in children. Pediatr Surg Int 2020;36:735-42. doi: 10.1007/s00383-020-04655-7
18. Alramadhan MM, Al Khatib HS, Murphy JR, et al. Using Artificial Neural Networks to Predict Intra-Abdominal Abscess Risk Post-Appendectomy. Ann Surg Open 2022;3:e168. doi: 10.1097/AS9.0000000000000168
19. Marcinkevičs R, Reis Wolfertstetter P, Klimiene U, et al. Interpretable and intervenable ultrasonography-based machine learning models for pediatric appendicitis. Med Image Anal 2024;91:103042. doi: 10.1016/j.media.2023.103042
20. Maffezzoni D, Barbierato E, Gatti A. Data-Driven Diagnostics for Pediatric Appendicitis: Machine Learning to Minimize Misdiagnoses and Unnecessary Surgeries. Future Internet 2025;17:147.
21. Chadaga K, Khanna V, Prabhu S, et al. An interpretable and transparent machine learning framework for appendicitis detection in pediatric patients. Sci Rep 2024;14:24454. doi: 10.1038/s41598-024-75896-y
22. Navaei M, Doogchi Z, Gholami F, et al. Leveraging Machine Learning for Pediatric Appendicitis Diagnosis: A Retrospective Study Integrating Clinical, Laboratory, and Imaging Data. Health Sci Rep 2025;8:e70756. doi: 10.1002/hsr2.70756
23. Shikha A, Kasem A. The Development and Validation of Artificial Intelligence Pediatric Appendicitis Decision-Tree for Children 0 to 12 Years Old. Eur J Pediatr Surg 2023;33:395-402. doi: 10.1055/a-1946-0157
24. Males I, Boban Z, Kumric M, et al. Applying an explainable machine learning model might reduce the number of negative appendectomies in pediatric patients with a high probability of acute appendicitis. Sci Rep 2024;14:12772. doi: 10.1038/s41598-024-63513-x
25. Ghomrawi HMK, O'Brien MK, Carter M, et al. Applying machine learning to consumer wearable data for the early detection of complications after pediatric appendectomy. NPJ Digit Med 2023;6:148. doi: 10.1038/s41746-023-00890-z
  • 26.Choi HJ, Kim YE, Namgoong JM, et al. Development and Validation of a Machine Learning-Based Prediction Model for Detection of Biliary Atresia. Gastro Hep Adv 2023;2:778-87. 10.1016/j.gastha.2023.05.002 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Zhao Y, Wang A, Wang D, et al. Development of a diagnostic model for biliary atresia based on MMP7 and serological tests using machine learning. Pediatr Surg Int 2024;40:203. 10.1007/s00383-024-05740-x [DOI] [PubMed] [Google Scholar]
  • 28.Ma Y, Yang Y, Du Y, et al. Development of an artificial intelligence-based multimodal diagnostic system for early detection of biliary atresia. BMC Med 2025;23:127. 10.1186/s12916-025-03962-x [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Sun Y, Dai S, Shen Z, et al. Gamma-glutamyl transpeptidase has different efficacy on biliary atresia diagnosis in different hospital patient groups: an application of machine learning approach. Pediatr Surg Int 2022;38:1131-41. 10.1007/s00383-022-05148-5 [DOI] [PubMed] [Google Scholar]
  • 30.Caruso M, Ricciardi C, Delli Paoli G, et al. Machine Learning Evaluation of Biliary Atresia Patients to Predict Long-Term Outcome after the Kasai Procedure. Bioengineering (Basel) 2021;8:152. 10.3390/bioengineering8110152 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Wang D, Sun J, Jin Y, et al. Application of machine learning in constructing a diagnostic model for neonatal biliary atresia. Pediatr Investig 2025;9:361-71. 10.1002/ped4.70009 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Nyholm I, Sjöblom N, Pihlajoki M, et al. Deep learning quantification reveals a fundamental prognostic role for ductular reaction in biliary atresia. Hepatol Commun 2023;7:e0333. 10.1097/HC9.0000000000000333 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Zhou W, Yang Y, Yu C, et al. Ensembled deep learning model outperforms human experts in diagnosing biliary atresia from sonographic gallbladder images. Nat Commun 2021;12:1259. 10.1038/s41467-021-21466-z [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Zhou W, Ye Z, Huang G, et al. Interpretable artificial intelligence-based app assists inexperienced radiologists in diagnosing biliary atresia from sonographic gallbladder images. BMC Med 2024;22:29. 10.1186/s12916-024-03247-9 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Duan X, Yang L, Zhu W, et al. Is the diagnostic model based on convolutional neural network superior to pediatric radiologists in the ultrasonic diagnosis of biliary atresia? Front Med (Lausanne) 2023;10:1308338. 10.3389/fmed.2023.1308338 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Kwon G, Ryu J, Oh J, et al. Deep learning algorithms for detecting and visualising intussusception on plain abdominal radiography in children: a retrospective multicenter study. Sci Rep 2020;10:17582. 10.1038/s41598-020-74653-1 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Chen X, You G, Chen Q, et al. Development and evaluation of an artificial intelligence system for children intussusception diagnosis using ultrasound images. iScience 2023;26:106456. 10.1016/j.isci.2023.106456 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Qian YF, Guo WL. Development and validation of a deep learning algorithm for prediction of pediatric recurrent intussusception in ultrasound images and radiographs. BMC Med Imaging 2025;25:67. 10.1186/s12880-025-01582-8 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Kim SW, Cheon JE, Choi YH, et al. Feasibility of a deep learning artificial intelligence model for the diagnosis of pediatric ileocolic intussusception with grayscale ultrasonography. Ultrasonography 2024;43:57-67. 10.14366/usg.23153 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Cui K, Changrong S, Maomin Y, et al. Development of an artificial intelligence-based multimodal model for assisting in the diagnosis of necrotizing enterocolitis in newborns: a retrospective study. Front Pediatr 2024;12:1388320. 10.3389/fped.2024.1388320 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Kim SH, Oh YJ, Son J, et al. Machine learning-based analysis for prediction of surgical necrotizing enterocolitis in very low birth weight infants using perinatal factors: a nationwide cohort study. Eur J Pediatr 2024;183:2743-51. 10.1007/s00431-024-05505-7 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42.Cho H, Lee EH, Lee KS, et al. Machine learning-based risk factor analysis of necrotizing enterocolitis in very low birth weight infants. Sci Rep 2022;12:21407. 10.1038/s41598-022-25746-6 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43.Li Y, Wu K, Yang H, et al. Surgical prediction of neonatal necrotizing enterocolitis based on radiomics and clinical information. Abdom Radiol (NY) 2024;49:1020-30. 10.1007/s00261-023-04157-9 [DOI] [PubMed] [Google Scholar]
  • 44.Lure AC, Du X, Black EW, et al. Using machine learning analysis to assist in differentiating between necrotizing enterocolitis and spontaneous intestinal perforation: A novel predictive analytic tool. J Pediatr Surg 2021;56:1703-10. 10.1016/j.jpedsurg.2020.11.008 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45.Verhoeven R, Kupers T, Brunsch CL, et al. Using Vital Signs for the Early Prediction of Necrotizing Enterocolitis in Preterm Neonates with Machine Learning. Children (Basel) 2024;11:1452. 10.3390/children11121452 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46.Gao WJ, Pei YY, Liang HY, et al. Multimodal AI System for the Rapid Diagnosis and Surgical Prediction of Necrotizing Enterocolitis. IEEE Access 2021;9:51050-51064.
  • 47.Weller JH, Scheese D, Tragesser C, et al. Artificial Intelligence vs. Doctors: Diagnosing Necrotizing Enterocolitis on Abdominal Radiographs. J Pediatr Surg 2024;59:161592. 10.1016/j.jpedsurg.2024.06.001 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48.Wu Z, Zhuo R, Liu X, et al. Enhancing surgical decision-making in NEC with ResNet18: a deep learning approach to predict the need for surgery through x-ray image analysis. Front Pediatr 2024;12:1405780. 10.3389/fped.2024.1405780 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49.Huang SG, Qian XS, Cheng Y, et al. Machine learning-based quantitative analysis of barium enema and clinical features for early diagnosis of short-segment Hirschsprung disease in neonate. J Pediatr Surg 2021;56:1711-7. 10.1016/j.jpedsurg.2021.05.006 [DOI] [PubMed] [Google Scholar]
  • 50.Liu X, Faes L, Kale AU, et al. A comparison of deep learning performance against health-care professionals in detecting diseases from medical imaging: a systematic review and meta-analysis. Lancet Digit Health 2019;1:e271-97. 10.1016/S2589-7500(19)30123-2 [DOI] [PubMed] [Google Scholar]
  • 51.Aggarwal R, Sounderajah V, Martin G, et al. Diagnostic accuracy of deep learning in medical imaging: a systematic review and meta-analysis. NPJ Digit Med 2021;4:65. 10.1038/s41746-021-00438-z [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52.Dalboni da Rocha JL, Lai J, Pandey P, et al. Artificial Intelligence for Neuroimaging in Pediatric Cancer. Cancers (Basel) 2025;17:622. 10.3390/cancers17111778 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 53.Yearley AG, Blitz SE, Patel RV, et al. Machine Learning in the Classification of Pediatric Posterior Fossa Tumors: A Systematic Review. Cancers (Basel) 2022;14:5608. 10.3390/cancers14225608 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 54.Ramgopal S, Sanchez-Pinto LN, Horvat CM, et al. Artificial intelligence-based clinical decision support in pediatrics. Pediatr Res 2023;93:334-41. 10.1038/s41390-022-02226-1 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 55.Straus Takahashi M, Donnelly LF, Siala S. Artificial intelligence: a primer for pediatric radiologists. Pediatr Radiol 2024;54:2127-42. 10.1007/s00247-024-06098-x [DOI] [PubMed] [Google Scholar]
  • 56.Nagendran M, Chen Y, Lovejoy CA, et al. Artificial intelligence versus clinicians: systematic review of design, reporting standards, and claims of deep learning studies. BMJ 2020;368:m689. 10.1136/bmj.m689 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 57.Khan MM, Shah N. AI-driven wearable sensors for postoperative monitoring in surgical patients: A systematic review. Comput Biol Med 2025;196:110783. 10.1016/j.compbiomed.2025.110783 [DOI] [PubMed] [Google Scholar]

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    The article’s supplementary files are as follows:

    tp-15-02-56-rc.pdf (372.2KB, pdf)
    DOI: 10.21037/tp-2025-667
    tp-15-02-56-coif.pdf (1.2MB, pdf)
    DOI: 10.21037/tp-2025-667

    Articles from Translational Pediatrics are provided here courtesy of AME Publications
