Brain & Spine. 2025 Nov 10;5:105866. doi: 10.1016/j.bas.2025.105866

Diagnostic performance and clinical applications of artificial intelligence for intracranial bleeding detection: A meta-analysis

Mustafa S Alhasan a,b,d, Ahmed Y Azzam c, Ayman S Alhasan a, Arjun Kalyanpur d, Omar A Alharthi a, Mohammad Khalil e, Adam Dmytriw f,g, Muhammed Amir Essibayi h,i, Fabricio Feltrin j, James Milburn k,l
PMCID: PMC12657341  PMID: 41321766

Abstract

Introduction

Intracranial hemorrhage (ICH) is a neurological emergency with high mortality rates requiring timely diagnosis. While computed tomography (CT) remains the gold standard, diagnostic accuracy varies with radiologist experience and workload. This systematic review and meta-analysis aims to evaluate the diagnostic performance of AI algorithms in detecting ICH on CT imaging and to explore key considerations for their clinical implementation in emergency and teleradiology settings.

Methods

We conducted a systematic review and meta-analysis following PRISMA-DTA guidelines, searching seven databases up to May 2025. Studies evaluating AI diagnostic accuracy for ICH detection on non-contrast CT scans were included. Quality assessment used QUADAS-2 criteria. Pooled estimates were calculated using random-effects models, with subgroup analyses by algorithm architecture and ICH subtype.

Results

A total of 45 studies met the inclusion criteria, comprising 29 research algorithm evaluations (n = 185,847 patients) and 16 studies of commercial AI system implementations (n = 94,523 patients). Research algorithms demonstrated a pooled sensitivity of 0.890 (95 % CI: 0.839–0.942) and specificity of 0.926 (95 % CI: 0.899–0.954). Commercial AI systems exhibited slightly superior performance, with sensitivity of 0.899 (95 % CI: 0.858–0.940) and specificity of 0.951 (95 % CI: 0.928–0.974). Diagnostic accuracy varied notably across ICH subtypes, with epidural hemorrhage presenting the greatest detection challenge (difficulty score: 0.251). Among algorithmic designs, convolutional recurrent neural networks (CNN-RNNs) demonstrated the highest diagnostic performance. In real-world clinical implementation, AI integration demonstrated substantial workflow improvements: door-to-treatment decision time reduced by 26 % (92 → 68 min), critical case notification time decreased by 57 % (75 → 32 min), and triage accuracy improved by 8 percentage points (86 % → 94 %), directly impacting patient care pathways. Despite a 7–8 % relative sensitivity reduction compared to benchmark settings, these clinical benefits were consistent across implementations.

Conclusions

AI algorithms demonstrate strong diagnostic performance in detecting ICH, with commercial systems demonstrating superior specificity compared to research models. Despite notable performance gaps in detecting certain hemorrhage subtypes, particularly epidural hemorrhage, the clinical benefits of AI integration, including improved workflow efficiency and reduced time to treatment decisions, are substantial. Future research should prioritize prospective validation and the development of algorithms tailored to enhance detection across challenging ICH subtypes.

Keywords: Artificial intelligence, Deep learning, Intracranial hemorrhage, Computed tomography, Diagnostic accuracy, Neuroimaging

Highlights

  • AI algorithms achieve 89–90 % sensitivity and 93–95 % specificity for detecting brain bleeding on CT scans, matching or exceeding human radiologist performance.

  • AI struggles most with epidural hemorrhage detection (75 % sensitivity) but excels at detecting intraparenchymal bleeding (95 % sensitivity).

  • AI implementation reduces door-to-treatment decision time by 26 % and critical case notification time by 57 % in real-world clinical settings.
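The headline workflow figures follow from simple before/after arithmetic on the pooled implementation data; a minimal Python sketch reproducing them (the helper name `relative_reduction` is illustrative, not from the paper):

```python
def relative_reduction(before, after):
    """Percent reduction from a pre-AI baseline to a post-AI value."""
    return 100.0 * (before - after) / before

# Workflow metrics reported across the pooled clinical-implementation studies
door_to_decision = relative_reduction(92, 68)   # minutes, pre vs. post AI
notification_time = relative_reduction(75, 32)  # minutes, pre vs. post AI
triage_accuracy_gain = 94 - 86                  # percentage points

print(f"Door-to-treatment decision: {door_to_decision:.0f}% reduction")
print(f"Critical case notification: {notification_time:.0f}% reduction")
print(f"Triage accuracy: +{triage_accuracy_gain} percentage points")
```

Note that the triage figure is an absolute gain in percentage points (86 % to 94 %), whereas the two time metrics are relative reductions.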

1. Introduction

Intracranial hemorrhage (ICH) is a neurological emergency associated with high morbidity and mortality, occurring in approximately 25 cases per 100,000 persons annually and accounting for nearly two million stroke cases worldwide (Wang et al., 2022; Hurford et al., 2020). Timely and accurate diagnosis is critical, as outcomes improve significantly with early intervention, especially within the first hours after onset (Mun and Hinman, 2022). Computed tomography (CT) is the first-line gold-standard imaging modality for ICH detection owing to its availability, rapid acquisition time, and high sensitivity for acute bleeding (Romero and Rojas-Serrano, 2023). However, interpreting head CT scans requires specialized expertise, and diagnostic accuracy can vary with individual radiologist experience, workload, and fatigue. These challenges are compounded by increasing imaging volumes and workforce shortages in many healthcare systems (Yeo et al., 2023).

Artificial intelligence (AI) modalities, including both machine learning and deep learning algorithms, have emerged as promising tools to augment radiological practice in the detection of intracranial hemorrhage (Kundisch et al., 2021). AI-powered systems can assist in analyzing imaging data, identifying hemorrhagic patterns, reducing interpretation time, and potentially improving diagnostic accuracy. In recent years, there has been a proliferation of studies evaluating various AI algorithms for ICH detection, with reported sensitivities and specificities often exceeding 90 %. However, substantial variability exists in algorithmic architectures, validation methodologies, and performance metrics across different hemorrhage subtypes (Babi et al., 2025).

Beyond diagnostic accuracy, the clinical value of AI systems depends critically on their impact on time-sensitive workflows. In neurosurgical emergencies, delays in ICH detection directly correlate with adverse patient outcomes, with each hour of delay associated with increased mortality and disability. Key clinical implementation questions include: How do AI systems affect door-to-treatment decision times? What is their role in emergency department triage? How do they integrate with existing radiology workflows? How do predictive values vary across different clinical settings and patient populations? This meta-analysis aims to investigate these questions alongside traditional diagnostic accuracy metrics.

Despite the evidence from prior studies and the expanding body of literature, several key knowledge gaps remain that hinder more targeted and effective clinical implementation (Babi et al., 2025). First, the comparative performance of different algorithmic architectures across various ICH subtypes remains inadequately clarified. Second, the translation gap between benchmark dataset performance and real-world clinical effectiveness has not been thoroughly quantified. Third, the clinical implications of algorithm performance for specific applications remain poorly documented. Additionally, the temporal evolution of AI capabilities in the context of ICH detection has yet to be comprehensively characterized (Ai et al., 2024).

To address these gaps, this systematic review and meta-analysis aimed to answer four specific research questions. First, what is the overall diagnostic performance of AI algorithms for ICH detection, and how do research algorithms compare to commercial systems? Second, how does diagnostic accuracy vary across ICH subtypes, and which hemorrhage types pose the greatest detection challenges? Third, what is the performance gap between benchmark dataset evaluations and real-world clinical implementation? Fourth, what is the quantifiable impact of AI implementation on clinical workflow metrics, including door-to-treatment decision time and triage accuracy? By addressing these questions, we provide evidence-based guidance for clinical implementation and identify priorities for future algorithm development.

2. Methods

2.1. Study design and search strategy

We conducted our study in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses for Diagnostic Test Accuracy Studies (PRISMA-DTA) guidelines (McInnes et al., 2018). A comprehensive literature search was performed across seven databases (PubMed/MEDLINE, EMBASE, Web of Science, Scopus, Cochrane Library, CENTRAL, and Google Scholar), covering publications up to May 29, 2025. The search strategy combined Medical Subject Headings (MeSH) and free-text keywords related to artificial intelligence, machine learning, deep learning, intracranial hemorrhage, and diagnostic accuracy. In addition, we manually screened the reference lists of included studies and relevant reviews to identify further eligible articles.

Search terms were customized to capture records involving artificial intelligence, intracranial hemorrhage, and diagnostic performance. For artificial intelligence, the search included terms such as: artificial intelligence, machine learning, deep learning, neural network, convolutional neural network (CNN), deep neural network (DNN), computer vision, computer-assisted, automated detection, algorithm, computer-aided, AI, ML, DL, transfer learning, and supervised learning. For intracranial hemorrhage, terms included: intracranial hemorrhage, brain hemorrhage, cerebral hemorrhage, ICH, intraparenchymal hemorrhage (IPH), subarachnoid hemorrhage (SAH), subdural hemorrhage (SDH), epidural hemorrhage (EDH), intraventricular hemorrhage (IVH), intracerebral hemorrhage, cerebral bleeding, and brain bleeding. For diagnostic performance, search terms included: diagnosis, detect, identify, recognize, characterize, classify, classification, accuracy, sensitivity, specificity, receiver operating characteristic (ROC), area under the curve (AUC), precision, recall, F1 score, diagnostic performance, and CT scan.

2.2. Eligibility criteria and study selection

We included studies that evaluated the diagnostic accuracy of AI algorithms for detecting ICH on non-contrast CT scans, using radiologist reports or consensus readings as the reference standard. Studies were considered eligible if they reported sufficient data to calculate sensitivity and specificity, or if these metrics were provided directly in an extractable format. We excluded studies that focused exclusively on magnetic resonance imaging (MRI), contrast-enhanced CT, or that evaluated only post-treatment hemorrhage or hemorrhage quantification without detection. Conference abstracts were also excluded.

2.3. Data extraction and quality assessment

The extracted data from eligible studies included publication details (authors, year, country), study characteristics (design, sample size, ICH subtypes evaluated), AI algorithm specifications (architecture type, training methodology), validation approach (internal or external), and diagnostic performance metrics (sensitivity, specificity, AUC, and accuracy). For studies reporting algorithm performance by ICH subtype or comparing multiple models, we also extracted subtype-specific and algorithm-specific performance metrics. The methodological quality of included studies was assessed using the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool, which evaluates the risk of bias across four domains: patient selection, index test, reference standard, and flow and timing.

2.4. Data synthesis and statistical analysis

We calculated pooled estimates of sensitivity, specificity, and AUC using a random-effects model to account for inter-study heterogeneity. For studies that reported results from multiple algorithms or across different ICH subtypes, we performed separate meta-analyses stratified by algorithm type and hemorrhage subtype. Ninety-five percent confidence intervals (CIs) were calculated for all pooled estimates. Heterogeneity was assessed using the I² statistic, with thresholds of 25 %, 50 %, and 75 % indicating low, moderate, and high heterogeneity, respectively. Publication bias was evaluated through visual inspection of funnel plot asymmetry.

We performed several subgroup analyses to explore sources of heterogeneity and address key research objectives: (1) comparison of algorithm architectures (deep learning versus traditional machine learning); (2) focus on specific ICH subtypes; (3) benchmark dataset performance versus real-world clinical performance; (4) data source comparison (single-center versus multi-center studies); and (5) temporal trends based on publication year. For ICH subtypes, we calculated a “detection difficulty score” (1 − sensitivity) to quantify the relative difficulty of detecting each hemorrhage subtype. For algorithm–subtype interactions, we developed a performance matrix to evaluate diagnostic metrics across different combinations and identify optimal algorithms for specific subtypes. Meta-regression was conducted to assess the influence of study-level covariates on diagnostic accuracy. All statistical analyses were performed using RStudio with R version 4.4.2 (R Foundation for Statistical Computing, Vienna, Austria) and the “mada,” “metafor,” and “meta” packages.
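The pooling itself was done in R with the mada/metafor packages; as an illustration of the underlying DerSimonian-Laird random-effects method, a minimal Python sketch is shown below (the study estimates and variances are hypothetical logit-sensitivities, not data from this meta-analysis):

```python
import math

def dersimonian_laird(estimates, variances):
    """DerSimonian-Laird random-effects pooling of per-study estimates.

    Returns the pooled estimate, its 95 % CI, and the I^2 statistic (%).
    """
    k = len(estimates)
    w = [1.0 / v for v in variances]                       # fixed-effect weights
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)                     # between-study variance
    w_star = [1.0 / (v + tau2) for v in variances]         # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    i2 = max(0.0, (q - (k - 1)) / q) * 100 if q > 0 else 0.0
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se), i2

# Three hypothetical studies, on the logit scale
pooled, ci, i2 = dersimonian_laird([2.1, 2.4, 1.8], [0.04, 0.09, 0.06])
sens = 1 / (1 + math.exp(-pooled))  # back-transform logit to a proportion
```

Working on the logit scale and back-transforming keeps the pooled proportion and its CI inside (0, 1), which is the standard approach for pooling sensitivities and specificities.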

3. Results

3.1. Study selection and characteristics

Our literature search, conducted from inception to May 29, 2025, identified a total of 45 studies that met the inclusion criteria for this systematic review and meta-analysis (Fig. 1). These comprised 29 studies focused on research algorithm development and validation, and 16 studies evaluating the implementation of commercial AI systems. The included studies originated from diverse geographic regions, including North America, Europe, and the Asia-Pacific (Table 1).

Fig. 1. PRISMA flowchart of the study selection process.

Table 1.

Baseline characteristics and demographics of included studies.

Author, Year Country Study Design Sample Size Algorithm Type ICH Subtypes Data Source Sensitivity/Specificity AUC Validation Method
Research Algorithm Development and Validation Studies:
Schmitt et al., 2022 (Schmitt et al., 2022) Germany Retrospective 78 CNN ICH Single center 0.91/0.89 0.90 Internal
Phaphuangwittayakul et al., 2022 (Phaphuangwittayakul et al., 2022) China Retrospective 458 CNN ICH, EDH, SDH, IPH Single center 0.96/0.97 Internal
Hopkins et al., 2022 (Hopkins et al., 2022) USA Prospective 112,695 DNN ICH Single center 0.98/0.99 0.99 External
Seyam et al., 2022 (Seyam et al., 2022b) Switzerland Prospective 431 DL ICH Single center 0.87/0.94 Internal
Altuve and Pérez, 2022 (Altuve and Pérez, 2022) Venezuela Retrospective 100 ResNet-18 ICH Single center 0.96/0.96 Internal
Tang et al., 2022 (Tang et al., 2022) China Retrospective 5 CNN ICH Single center 0.92/0.88 Internal
Cortes-Ferre et al., 2022 (Cortés-Ferre et al., 2023b) Spain Retrospective 3497 DL ICH Single center 0.91/0.94 0.98 Internal
Kau et al., 2022 (Kau et al., 2022) Austria Retrospective 2139 DL ICH Single center 0.68/0.97 Internal
Tharek et al., 2022 (Tharek et al., 2022) Malaysia Retrospective 102 CNN ICH Single center 0.97/0.93 Internal
Abe et al., 2022 (Abe et al., 2022) Japan Retrospective 259 XGBoost ICH Single center 0.74/0.75 0.80 Internal
Trevisi et al., 2022 (Trevisi et al., 2022) Italy Retrospective 259 RF ICH Multiple centers 0.78/0.86 0.93 Internal
Uchida et al., 2022 (Uchida et al., 2022) Japan Prospective 2734 LR, RF, XGBoost ICH, SAH Multiple centers 0.43/0.92∗ 0.82∗ External
Alis et al., 2022 (Alis et al., 2022) Turkey Retrospective 121,436 CNN-RNN ICH, IPH, IVH, SAH, SDH, EDH Multiple centers 0.96/0.96 0.96 Internal
Rao et al., 2022 (Rao et al., 2022) India Retrospective 2288 Multiple∗∗ ICH Single center 0.99/1.00∗∗∗ 1.00∗∗∗ Internal
Zhou et al., 2022 (Zhou et al., 2022) China Retrospective 5088 ResNet-18, DenseNet-121 EDH, IVH, CPH, SAH, SDH Single center 0.98/0.88† Internal
Salehinejad et al., 2021 (Salehinejad et al., 2021) Canada Retrospective 2428 SE-ResNeXt EDH, SDH, SAH, IVH, IPH Single center †† †† External
McLouth et al., 2021 (McLouth et al., 2021) USA Retrospective 255 DL ICH Multiple centers 0.98/0.86 Internal
Wang et al., 2021 (Wang et al., 2021) China Retrospective 216 2D-CNN ICH, EDH, IPH, IVH, SAH, SDH Multiple centers 0.95/0.94 0.99 Internal
Voter et al., 2021 (Voter et al., 2021) USA Retrospective 396 DSS (DL) ICH Multiple centers 0.92/0.98 Internal
Kumaravel et al., 2021 (Kumaravel et al., 2021) India Retrospective 295 AlexNet variants ICH Multiple centers 0.99/0.99∗∗∗ 1.00∗∗∗ Internal
Danilov et al., 2020 (Danilov et al., 2020) Russia Retrospective 320 ResNeXT EDH, SDH, SAH, IVH, IPH Single center †† †† Internal
Ye et al., 2019 (Ye et al., 2019) China Retrospective 8097 CNN-RNN ICH, CPH, SAH, IVH, SDH, EDH Multiple centers 0.99/0.99 1.00 External
Lee et al., 2019 (Lee et al., 2019) USA Retrospective/Prospective 4396 DCNNs ICH, IPH, IVH, SDH, EDH, SAH Single center 0.98/0.95‡ 0.99‡ External
Kuo et al., 2019 (Kuo et al., 2019) USA Retrospective 3266 CNN ICH Single center 1.00/0.90 External
Chang et al., 2018 (Chang et al., 2018) USA Retrospective/Prospective 9448 Hybrid 3D/2D CNN ICH Single center 0.97/0.98‡ 0.98‡ External
Chilamkurthy et al., 2018 (Chilamkurthy et al., 2018) India Retrospective 2022 ResNet 18 ICH, IPH, IVH, SAH, EDH, SDH Multiple centers †† External
Arbabshirani et al., 2018 (Arbabshirani et al., 2018) USA Retrospective 12,484 R-CNN ICH Multiple centers 0.70/0.87 0.85 Internal
Grewal et al., 2018 (Grewal et al., 2018) USA Retrospective 67 CNN ICH Multiple centers 0.88/0.73 0.82 Internal
Majumdar et al., 2018 (Majumdar et al., 2018) USA Retrospective 22 CNN (U-Net) ICH Single center 0.82/0.98 Internal
Commercial AI Systems in Clinical Implementation:
Heit et al., 2021 (Heit et al., 2021) USA Retrospective 308 NCCT CNN (Hybrid 2D-3D) ICH Multiple centers 0.956/0.953 Internal
O'Neill et al., 2021 (O'Neill et al., 2021) USA Retrospective ∼6700 exams Machine Learning ICH Single center NR Internal
Davis et al., 2022 (Davis et al., 2022) USA Retrospective ∼50,000 scans CNN ICH Multiple centers 0.95/0.99 0.98 Internal
Petry et al., 2022 (Petry et al., 2022) USA Retrospective 9552 ICH encounters Deep Learning ICH Single center NR Internal
Ginat, 2020 (Ginat, 2020) USA Prospective 2011 scans CNN ICH Single center 0.887/0.942 Internal
Buls et al., 2021 (Buls et al., 2021) Belgium Retrospective 500 NCCT CNN ICH Single center 0.84/0.94 Internal
Savage et al., 2024 (Savage et al., 2024) USA Prospective 9954 scans (7371 pts) AI Triage System ICH Single center 0.878/0.943 Internal
Bark et al., 2024 (Bark et al., 2024) Sweden Retrospective 2306 patients CNN (3D) ICH, EDH, SAH, SDH, IPH Single center NR (PPV 0.823) Internal
Warman et al., 2024 (Warman et al., 2024) USA Retrospective 532 NCCT Deep Learning ICH, SAH, EDH, IPH Dataset 0.985/0.822 Internal
Neves et al., 2023 (Neves et al., 2023) USA Retrospective 510 NCCT (271 pts) Deep Learning ICH, EDH, SAH, SDH, IPH Single center 0.975/1.00 0.996 External
Nada et al., 2024 (Nada et al., 2024) USA Prospective 5600 NCCT CNN ICH, IPH, IVH, SAH, EDH, SDH Single center 0.89/0.96 0.954 Internal
Rava et al., 2021 (Rava et al., 2021) USA Retrospective 302 patients Machine Learning ICH, IPH, IVH, SDH, SAH Multiple centers 0.93/0.93 Internal
Vacek et al., 2024 (Vacek et al., 2024) UK Retrospective 628 patients AI software ICH Multiple centers NR Internal
Roshan et al., 2024 (Roshan et al., 2024) USA Retrospective 4203 NCCT reports AI ICH, IPH, SAH, SDH, IVH Single center 0.85/0.98 Internal
McLouth et al., 2021 (McLouth et al., 2021) USA Retrospective 814 NCCT scans Deep Learning ICH, IPH, IVH, EDH/SDH, SAH Multiple centers 0.914/0.975 Internal
Ginat, 2021 (Ginat, 2021) USA Retrospective 8723 scans CNN ICH Single center 0.884/0.961 Internal

Notes: ∗Values reported for LR algorithm; ∗∗Multiple includes VGG-16, GoogleNet, ResNet-50, and Custom ensemble; ∗∗∗Best performing algorithm in the study; †Values for ResNet-18 for EDH subtype; ††Study reported subtype-specific metrics only; ‡Values for retrospective cohort; NR = Not Reported; PPV = Positive Predictive Value.

The research algorithm studies encompassed a total sample size of 185,847 patients, with individual study sizes ranging from 5 to 112,695 participants. Most of these studies employed retrospective designs (79.3 %), while the remainder were prospective. The commercial AI system implementation studies evaluated 16 distinct proprietary systems, with a combined sample of 94,523 patients and clinical encounters.

3.2. Overall diagnostic performance

The pooled analysis revealed significant differences in diagnostic performance between research algorithms and commercial AI systems for overall ICH detection (Fig. 2). Research algorithms demonstrated a pooled sensitivity of 0.890 (95 % CI: 0.839–0.942) and specificity of 0.926 (95 % CI: 0.899–0.954), with an AUC of 0.930 (95 % CI: 0.891–0.969). In comparison, commercial AI systems showed a slightly higher sensitivity of 0.899 (95 % CI: 0.858–0.940) and notably higher specificity of 0.951 (95 % CI: 0.928–0.974), reflecting enhanced overall diagnostic accuracy (Table 2).

Fig. 2. ROC curve for the diagnostic performance of AI in ICH detection.

Table 2.

Diagnostic performance by ICH subtype.

ICH Subtype, then for each of Research Algorithms and Commercial AI Systems: Studies, Sensitivity (95 % CI), Specificity (95 % CI), Detection Difficulty Score∗
Any ICH (overall) 26 0.890 (0.839–0.942) 0.926 (0.899–0.954) 0.110 12 0.899 (0.858–0.940) 0.951 (0.928–0.974) 0.101
EDH 9 0.749 (0.588–0.909) 0.964 (0.937–0.990) 0.251 4 0.845 (0.732–0.958) 0.972 (0.945–0.999) 0.155
SDH 9 0.868 (0.781–0.955) 0.939 (0.908–0.970) 0.132 5 0.835 (0.762–0.908) 0.946 (0.912–0.980) 0.165
IPH 7 0.909 (0.853–0.964) 0.966 (0.947–0.984) 0.091 6 0.948 (0.924–0.972) 0.971 (0.951–0.991) 0.052
IVH 8 0.882 (0.826–0.939) 0.966 (0.946–0.987) 0.118 4 0.884 (0.810–0.958) 0.973 (0.960–0.986) 0.116
SAH 8 0.799 (0.701–0.897) 0.932 (0.897–0.966) 0.201 6 0.836 (0.767–0.905) 0.943 (0.912–0.974) 0.164
CPH 2 0.860 (0.777–0.943) 0.870 (0.815–0.925) 0.140 0

Notes: ∗Detection Difficulty Score = 1 - Sensitivity; higher scores indicate greater detection difficulty.

Analysis of detection difficulty scores, calculated as 1 − sensitivity, showed that overall ICH detection posed relatively low difficulty for AI systems, with scores of 0.110 for research algorithms and 0.101 for commercial systems. These findings suggest that both categories perform well in general hemorrhage detection. However, substantial heterogeneity was observed among individual studies, with reported sensitivity values ranging from 0.43 to 1.00 across the included investigations.
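The subtype difficulty ranking follows directly from the 1 − sensitivity definition; a short Python sketch recomputing the scores from the Table 2 research-algorithm sensitivities:

```python
# Pooled research-algorithm sensitivities by ICH subtype (Table 2)
sensitivity = {
    "Any ICH": 0.890, "EDH": 0.749, "SDH": 0.868,
    "IPH": 0.909, "IVH": 0.882, "SAH": 0.799, "CPH": 0.860,
}

# Detection difficulty score = 1 - sensitivity (higher = harder to detect)
difficulty = {subtype: round(1 - s, 3) for subtype, s in sensitivity.items()}

hardest = max(difficulty, key=difficulty.get)   # EDH, score 0.251
easiest = min(difficulty, key=difficulty.get)   # IPH, score 0.091
```

This reproduces the reported ordering: EDH is the most difficult subtype (0.251) and IPH the least difficult (0.091).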

3.3. Performance by ICH subtype

Subtype-specific analysis revealed significant variation in diagnostic performance across different hemorrhage categories (Table 2, Fig. 3). Among research algorithms, IPH demonstrated the highest sensitivity at 0.909 (95 % CI: 0.853–0.964) and a specificity of 0.966 (95 % CI: 0.947–0.984), corresponding to the lowest detection difficulty score of 0.091. This was followed closely by IVH, which achieved a sensitivity of 0.882 (95 % CI: 0.826–0.939) and specificity of 0.966 (95 % CI: 0.946–0.987).

Fig. 3. Clinical decision support applications for ICH detection.

SDH demonstrated strong diagnostic performance, with a pooled sensitivity of 0.868 (95 % CI: 0.781–0.955) and specificity of 0.939 (95 % CI: 0.908–0.970). In contrast, epidural hemorrhage (EDH) posed the greatest diagnostic challenge among all subtypes, with a sensitivity of only 0.749 (95 % CI: 0.588–0.909), resulting in the highest detection difficulty score of 0.251. SAH showed intermediate performance, with a sensitivity of 0.799 (95 % CI: 0.701–0.897) and a corresponding detection difficulty score of 0.201.

Commercial AI systems demonstrated a similar pattern of subtype-specific performance, with notable improvements over research algorithms in certain categories. IPH detection showed the most consistent and robust performance, with a sensitivity of 0.948 (95 % CI: 0.924–0.972) and the lowest detection difficulty score of 0.052. Commercial systems also showed particular strength in EDH detection, achieving a sensitivity of 0.845 (95 % CI: 0.732–0.958), representing an improvement over research algorithms. Nevertheless, EDH remained the most challenging subtype overall.

3.4. Algorithm architecture performance comparison

The comparative analysis of different algorithmic approaches revealed significant performance variations across architectural designs (Table 3). Among research algorithms, CNN-RNN hybrid architectures demonstrated superior performance with pooled sensitivity of 0.977 (95 % CI: 0.959–0.995) and specificity of 0.974 (95 % CI: 0.952–0.996), achieving the highest AUC of 0.980 (95 % CI: 0.953–1.000). ResNet variants also showed excellent performance with sensitivity of 0.957 (95 % CI: 0.939–0.975) and specificity of 0.962 (95 % CI: 0.944–0.980).

Table 3.

Algorithm architecture performance comparison.

Algorithm, then for each of Research Studies and Commercial Implementation: Studies, Sensitivity (95 % CI), Specificity (95 % CI)
Deep Learning (overall) 20 0.916 (0.878–0.954) 0.931 (0.904–0.958) 11 0.907 (0.871–0.943) 0.951 (0.923–0.979)
CNN (various) 11 0.914 (0.865–0.964) 0.913 (0.871–0.954) 6 0.894 (0.859–0.929) 0.945 (0.913–0.977)
CNN-RNN 2 0.977 (0.959–0.995) 0.974 (0.952–0.996) 0
ResNet variants 4 0.957 (0.939–0.975) 0.962 (0.944–0.980) 0
Deep Learning (unspecified) 5 0.873 (0.785–0.962) 0.937 (0.900–0.973) 5 0.919 (0.871–0.967) 0.957 (0.921–0.993)
Machine Learning Algorithms (overall) 4 0.877 (0.759–0.995) 0.900 (0.800–1.000) 2 0.943 (0.913–0.973) 0.940 (0.910–0.970)
AI Triage Systems 0 3 0.882 (0.856–0.908) 0.947 (0.929–0.965)
Hybrid CNN (2D/3D) 1 0.971 (0.971–0.971) 0.975 (0.975–0.975) 1 0.956 (0.956–0.956) 0.953 (0.953–0.953)
Ensemble Techniques 2 0.963 (0.921–1.000) 0.971 (0.941–1.000) 0

Notes: Commercial AI implementations often use proprietary architectures where exact algorithmic details are not fully disclosed. AI Triage Systems represent commercial platforms specifically designed for clinical workflow integration.

Traditional machine learning algorithms showed more variable performance. Random Forest achieved a sensitivity of 0.775 and specificity of 0.863, while XGBoost reported a sensitivity of 0.740 and specificity of 0.749. Ensemble techniques, although represented by fewer studies, yielded promising results, with a pooled sensitivity of 0.963 (95 % CI: 0.921–1.000) and specificity of 0.971 (95 % CI: 0.941–1.000).

For commercial implementations, deep learning architectures showed pooled sensitivity of 0.907 (95 % CI: 0.871–0.943) and specificity of 0.951 (95 % CI: 0.923–0.979). AI triage systems, specifically designed for clinical workflow integration, demonstrated sensitivity of 0.882 (95 % CI: 0.856–0.908) and specificity of 0.947 (95 % CI: 0.929–0.965), reflecting their optimization for clinical decision-making rather than pure diagnostic accuracy.

3.5. Algorithm-subtype performance matrix analysis

The detailed algorithm-subtype performance matrix revealed distinct patterns of algorithmic strengths across different hemorrhage types (Table 4). CNN-RNN architectures excelled in overall ICH detection with sensitivity/specificity of 0.977/0.974 but showed variable subtype performance, with EDH detection being particularly challenging at 0.702/0.990. ResNet variants demonstrated consistent performance across subtypes, with notably high IPH detection (0.961/0.986) representing their optimal application (see Table 5).

Table 4.

Algorithm-subtype performance matrix.

Algorithm Overall ICH EDH SDH IPH IVH SAH Best Subtype Performance
Research Algorithms:
CNN-RNN 0.977/0.974 0.702/0.990 0.871/0.931 0.826/0.975 0.854/0.966 0.803/0.900 Overall ICH (Sensitivity)
ResNet variants 0.976/0.990 0.732/0.959 0.924/0.957 0.961/0.986 0.927/0.966 0.837/0.965 IPH (Sensitivity)
2D-CNN 0.950/0.944 0.974/0.940 0.946/0.932 0.965/0.959 0.975/0.974 0.940/0.942 IVH (Sensitivity)
Deep Learning (unspecified) 0.873/0.937 N/A N/A N/A N/A N/A Overall ICH only
Random Forest 0.775/0.863 N/A N/A N/A N/A N/A Overall ICH only
XGBoost 0.740/0.749 N/A N/A N/A N/A N/A Overall ICH only
Hybrid 3D/2D CNN 0.971/0.975 N/A N/A N/A N/A N/A Overall ICH only
Commercial AI Systems:
Caire ICH (Neves et al., 2023) 0.975/1.000 1.000/NR 0.982/NR 0.973/NR NR/NR 0.958/NR EDH (Sensitivity)
CINA ICH (McLouth et al., 2021) 0.914/0.975 0.943† 0.943† 0.929 1.000 0.899 IVH (Sensitivity)
Viz.ai ICH (Roshan et al., 2024) 0.850/0.980 NR/NR 0.830/NR 0.940/NR 0.440/NR 0.790/NR IPH (Sensitivity)
Aidoc (Nada et al., 2024) 0.890/0.960 0.907/NR 0.872/NR 0.950/NR 0.894/NR 0.896/NR IPH (Sensitivity)
AUTOStroke ICH (Rava et al., 2021) 0.930/0.930 NR/NR 0.893/NR 0.951/NR 0.913/NR 0.897/NR IPH (Sensitivity)
Aidoc (Kau et al., 2022) 0.682/0.968 NR/NR NR/NR NR/NR NR/NR NR/NR Overall ICH only
Aidoc (Seyam, 2022) 0.872/0.939 NR/NR 0.692/NR NR/NR 0.971/NR 0.774/NR IVH (Sensitivity)

Notes: Format: Sensitivity/Specificity; NR = Not Reported; †EDH and SDH were reported together as “Extra-axial” hemorrhage in McLouth et al. (2021) (CINA). Commercial AI systems generally demonstrate higher sensitivity for IPH and IVH compared to other subtypes, similar to research algorithms.

Table 5.

Commercial AI system implementation characteristics.

Vendor/System Regulatory Status Technical Integration Workflow Integration Turn-around Time Alert Mechanism Target Use Case Clinical Setting
Aidoc ICH FDA 510(k) 2018 PACS/Cloud-based Parallel reading 3.9 min (mean) Critical findings notification Triage/prioritization Emergency/Stroke centers
Viz.ai ICH FDA 510(k) 2020 Cloud-based Parallel reading 5.6 min (median) Mobile notification Triage/stroke workflow Comprehensive stroke centers
RAPID ICH FDA 510(k) 2020 PACS/Cloud-based Parallel reading 2–5 min Email/mobile notification Triage/volumetric analysis Stroke centers
Qure.ai qER FDA 510(k) 2022 Cloud-based Parallel reading 4.2 min (median) PACS integration alert Triage/prioritization Emergency departments
GE Healthcare FDA 510(k) 2022 Workstation integration Sequential reading 1–3 min Worklist prioritization Diagnostic support Academic hospitals
Siemens Healthineers AI-Rad FDA 510(k) 2023 Scanner/PACS integration Parallel reading <2 min Worklist flag Diagnostic assistance Multi-site healthcare systems
Canon Medical CE Mark 2022 Scanner integration Sequential reading 3.7 min (mean) PACS notification Diagnostic support Emergency/Radiology departments
Brainomix e-CTA CE Mark 2021 Cloud-based Parallel reading 5–10 min Email notification Multi-hemorrhage assessment Stroke units
MaxQ AI ACCIPIO FDA 510(k) 2018 PACS integration Parallel reading 2.9 min (median) Critical findings worklist Triage/rule-out Emergency departments
Zebra Medical ICH FDA 510(k) 2020 Cloud-based Parallel reading 3.3 min (mean) Email/PACS notification Triage/prioritization Teleradiology services
RapidAI ICH FDA 510(k) 2020 Cloud-based Parallel reading 2–4 min Mobile/email alert Volumetric quantification Comprehensive stroke centers
Infervision InferRead CE Mark 2019 Cloud/on-premise Parallel reading 3.0 min (mean) PACS integration Triage/prioritization Emergency departments

Notes: Regulatory status includes initial approval dates; turn-around time represents the interval from image acquisition to AI result availability; integration methods reflect predominant deployment approaches. Data compiled from published implementation studies, vendor information, and regulatory databases.

Two-dimensional CNN architectures demonstrated strong performance in detecting IVH, achieving a sensitivity of 0.975 and specificity of 0.974, making them the preferred architecture for this specific subtype. Commercial AI systems showed evidence of subtype-specific optimization, with several systems displaying superior capabilities for IPH detection. The Caire ICH system stood out, achieving perfect sensitivity for EDH (1.000) while maintaining high overall performance, with a sensitivity and specificity of 0.975 and 1.000, respectively.

The analysis revealed that commercial systems generally maintained more consistent performance across subtypes compared to research algorithms, likely reflecting their development with larger, more diverse datasets and more extensive clinical validation processes. However, research algorithms occasionally achieved superior performance in specific subtypes, particularly when optimized for targeted applications.

3.6. Benchmark vs. real-world performance

An important finding of our study was the consistent performance gap between controlled validation studies and real-world clinical implementation (Supplementary Table 1). For research algorithms, the transition from benchmark to real-world settings resulted in a mean sensitivity decrease of 0.066 (7.0 % relative decrease), while specificity showed minimal change (−0.020, representing a 2.2 % relative decrease). The AUC remained stable across settings, indicating maintained discriminative ability despite the sensitivity reduction.

Commercial AI systems exhibited a similar, though slightly more pronounced, performance decline when transitioning from validation to clinical implementation. Sensitivity decreased by 0.077, representing an 8.1 % relative reduction. However, these systems maintained specificity more effectively, with only a 0.032 decrease (3.3 % relative decline). The performance gap was most significant in EDH detection, where commercial systems experienced a sensitivity drop of 0.134, corresponding to a 14.1 % relative decrease in real-world settings.

Subtype-specific analysis revealed that IPH and IVH detection were least affected by implementation challenges, maintaining relatively stable performance across validation and real-world settings. In contrast, EDH and SDH detection exhibited the greatest performance degradation, with sensitivity reductions exceeding 10 % for both research algorithms and commercial systems in clinical environments.

3.7. Multi-dimensional performance analysis of commercial systems

The multi-dimensional performance radar analysis (Fig. 4) provided insight into the balanced capabilities of leading commercial AI systems across six key dimensions: diagnostic sensitivity, diagnostic specificity, processing speed, workflow integration, time-to-treatment impact, and subtype detection capabilities. RapidAI ICH demonstrated the most balanced overall performance profile, with consistently high scores across all dimensions (sensitivity: 91 %, specificity: 97 %, processing speed: 88 %, workflow integration: 86 %, time-to-treatment impact: 89 %, subtype detection: 85 %).

Fig. 4.

Multidimensional performance analysis. The values in the table on the right side of the figure correspond, from left to right, to diagnostic sensitivity, diagnostic specificity, processing speed, workflow integration, time-to-treatment impact, and subtype detection.

Viz.ai ICH demonstrated exceptional specificity (98 %) and a strong impact on time-to-treatment decision-making (91 %), but showed a relatively lower processing speed score (74 %) alongside a workflow integration score of 89 %. RAPID ICH achieved the highest processing speed score (91 %) and strong workflow integration (84 %), despite more moderate diagnostic performance metrics. These findings indicate that no single system excelled across all evaluated dimensions, emphasizing the importance of selecting AI solutions based on specific clinical priorities and workflow needs.

Aidoc ICH demonstrated strong subtype detection capabilities (95 %) and excellent diagnostic specificity (96 %), making it especially suitable for structured hemorrhage screening applications. MaxQ AI ACCIPIO and Brainomix e-CTA showed more moderate but well-balanced performance profiles, with particular strengths in workflow integration and processing speed, respectively.

3.8. Real-world implementation metrics and clinical impact

Beyond traditional diagnostic accuracy measures, the analysis of real-world implementation revealed considerable variation in practical performance metrics (Table 6). False positive rates ranged from 3.2 % (GE Healthcare) to 8.3 % (Zebra Medical ICH), while false negative rates varied from 7.8 % (RAPID ICH) to 15.0 % (Viz.ai ICH). Technical failure rates remained relatively low across all systems, ranging from 1.9 % to 5.2 %, indicating strong technical reliability in clinical environments.

Table 6.

Real-world performance metrics beyond accuracy.

System False Positive Rate (%) False Negative Rate (%) Technical Failure Rate (%) User Override Frequency (%) Implementation Challenges Time-to-Treatment Impact (min) Radiologist Confidence Impact
Aidoc ICH 5.8 (3.2–8.4) 11.2 (8.5–13.9) 2.7 17.3 Integration with legacy PACS −7.5 Increased in 78 % of cases
Viz.ai ICH 3.9 (2.6–5.2) 15.0 (12.3–17.7) 4.1 21.6 Network connectivity issues −12.3 Increased in 65 % of cases
RAPID ICH 7.2 (5.9–8.5) 7.8 (6.1–9.5) 3.3 14.7 User training requirements −8.4 Increased in 71 % of cases
Qure.ai qER 6.4 (4.3–8.5) 9.3 (7.2–11.4) 2.9 18.2 Internet bandwidth limitations −6.8 Increased in 74 % of cases
MaxQ AI ACCIPIO 4.7 (3.1–6.3) 10.6 (8.3–12.9) 3.8 16.5 Alert fatigue −9.2 Increased in 67 % of cases
Brainomix e-CTA 5.1 (3.7–6.5) 8.7 (6.9–10.5) 4.2 15.3 Interoperability challenges −7.6 Increased in 69 % of cases
Zebra Medical ICH 8.3 (6.7–9.9) 9.1 (7.5–10.7) 2.1 22.7 IT security protocols −5.3 Increased in 62 % of cases
RapidAI ICH 6.1 (4.5–7.7) 8.5 (6.3–10.7) 1.9 13.4 Workflow integration complexity −11.7 Increased in 76 % of cases
GE Healthcare 3.2 (1.8–4.6) 12.4 (10.1–14.7) 2.6 19.1 Version update management −6.9 Increased in 70 % of cases
Siemens AI-Rad 4.5 (2.9–6.1) 10.8 (8.7–12.9) 3.5 17.8 Staff training requirements −8.5 Increased in 68 % of cases
Infervision 7.7 (5.9–9.5) 7.9 (6.1–9.7) 5.2 20.3 Language localization issues −6.1 Increased in 61 % of cases

Notes: False positive/negative rates from clinical implementation studies; Technical failure rate includes processing errors and non-diagnostic results; User override frequency represents cases where radiologists disagreed with AI findings; Time-to-treatment impact shows reduction in minutes from image acquisition to treatment decision with AI implementation compared to pre-implementation baseline; Radiologist confidence impact based on post-implementation surveys.

User override frequency, representing cases where radiologists disagreed with AI findings, ranged from 13.4 % (RapidAI ICH) to 22.7 % (Zebra Medical ICH), suggesting significant variation in clinical acceptance and trust. These differences may partially reflect the chronological evolution of algorithm development. Earlier systems, such as Zebra Medical's, may have been trained on smaller or less diverse datasets, resulting in lower diagnostic reliability and reduced user confidence. In contrast, more recent systems like RapidAI have likely benefited from ongoing optimization and access to larger, more representative training data, which may explain their lower override rates. Implementation challenges were consistently reported across systems, with common issues including PACS integration difficulties, network connectivity problems, staff training requirements, and alert fatigue management.

The time-to-treatment impact analysis demonstrated universally positive effects, with all systems reducing decision-making time by 5.3–12.3 min compared to traditional workflows. Viz.ai ICH achieved the greatest time reduction (−12.3 min), followed by RapidAI ICH (−11.7 min). Radiologist confidence showed consistent improvement across all systems, with 61 %–78 % of radiologists reporting increased confidence in their diagnostic decisions when using AI assistance.

3.9. Clinical workflow impact

The clinical workflow analysis (Fig. 5) demonstrated significant improvements in patient care pathways with AI implementation. Traditional radiology workflows showed an average door-to-treatment decision time of 92 min, with significant delays in critical case prioritization due to manual triage processes. The analysis revealed that five critical cases were consistently mis-triaged in traditional workflows, leading to delayed treatment decisions.

Fig. 5.

Clinical workflow impact analysis.

AI-augmented workflows reduced the average door-to-treatment decision time to 68 min, representing a 26 % improvement. More significantly, door-to-notification time for critical cases decreased from 75 min to 32 min, achieving a 57 % reduction. The AI systems demonstrated high accuracy in patient triage, with only two critical cases mis-triaged compared to five in traditional workflows, representing an 8 % improvement in critical case prioritization accuracy.

The workflow analysis revealed that AI systems processed an average of 38 patients as AI-positive (35 true positives, three false positives) and 62 patients as AI-negative (60 true negatives, two false negatives), demonstrating excellent negative predictive value and effective workflow streamlining. The integration of AI triage reduced radiologist interpretation time for critical cases from an average of 12 min to 10 min, while maintaining diagnostic accuracy and improving report generation efficiency.
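The reported triage counts fully determine the standard diagnostic metrics. As a quick check, a minimal Python sketch recomputes them from the 35/3/60/2 confusion matrix given above:

```python
def confusion_metrics(tp, fp, tn, fn):
    """Standard diagnostic metrics from a 2x2 confusion matrix."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

# Counts from the AI-augmented workflow analysis:
# 38 AI-positive (35 TP, 3 FP), 62 AI-negative (60 TN, 2 FN)
metrics = confusion_metrics(tp=35, fp=3, tn=60, fn=2)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

On these counts, the negative predictive value works out to roughly 0.97, consistent with the rule-out utility described above.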

3.10. Risk of bias assessment

The risk of bias assessment using the QUADAS-2 tool revealed generally high methodological quality across included studies (Supplementary Table 2). Among research algorithm studies, 65.5 % demonstrated low overall risk of bias, with the majority of concerns relating to patient selection methods and unclear index test conduct. Commercial AI system studies showed slightly higher methodological rigor, with 75 % classified as low risk of bias, reflecting more standardized evaluation protocols and larger sample sizes.

The most common sources of bias included unclear patient selection criteria (31 % of studies), lack of external validation (24 % of studies), and inadequate description of reference standard interpretation (18 % of studies). Studies with high risk of bias were mostly early-phase research algorithm development studies with small sample sizes and limited validation protocols.

3.11. Predictive values across clinical settings

Supplementary Table 3 presents the calculated predictive values across clinically relevant prevalence scenarios. Both research and commercial algorithms demonstrated excellent negative predictive values (NPV ≥92.3 %) across all prevalence settings, supporting their utility for ICH rule-out applications. However, positive predictive value (PPV) varied significantly with prevalence, ranging from 49.6 % in low-prevalence emergency departments to 89.1 % in high-risk trauma populations for the best-performing systems. Commercial algorithms were found to outperform research algorithms in PPV across all scenarios (+9.1 to +11.2 percentage points), translating to fewer false positive alerts in clinical workflows.
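The prevalence dependence of PPV and NPV follows directly from Bayes' theorem. A minimal illustrative sketch in Python, using the pooled commercial-system estimates from this analysis (sensitivity 0.899, specificity 0.951) and three representative prevalence scenarios as assumed inputs:

```python
def predictive_values(sensitivity, specificity, prevalence):
    """PPV and NPV via Bayes' theorem for a given disease prevalence."""
    tp = sensitivity * prevalence
    fp = (1 - specificity) * (1 - prevalence)
    tn = specificity * (1 - prevalence)
    fn = (1 - sensitivity) * prevalence
    return tp / (tp + fp), tn / (tn + fn)

# Pooled commercial-system estimates from this meta-analysis
sens, spec = 0.899, 0.951
# Illustrative prevalences: low-prevalence ED, mixed ED, high-risk trauma
for prev in (0.05, 0.15, 0.35):
    ppv, npv = predictive_values(sens, spec, prev)
    print(f"prevalence {prev:.0%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

Note how PPV roughly doubles between the low- and high-prevalence settings, while NPV remains above 0.94 throughout, mirroring the pattern reported in Supplementary Table 3.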

4. Discussion

4.1. Principal findings

Our meta-analysis of 45 studies demonstrates that AI algorithms achieve strong diagnostic performance for ICH detection, with pooled sensitivity of 0.890 and specificity of 0.926 for research algorithms, and slightly superior performance for commercial systems (sensitivity 0.899, specificity 0.951). These metrics translate into reliable diagnostic tools that can augment radiological practice; however, we found significant performance variation across ICH subtypes (Savage et al., 2024). Epidural hematoma was the most challenging subtype (detection difficulty score 0.251), while IPH demonstrated the highest detection rates (difficulty scores of 0.091 for research algorithms and 0.052 for commercial systems).

The benchmark-to-implementation performance gap of 7.0–8.1 % sensitivity reduction represents a consistent finding across both algorithm categories, highlighting the importance of real-world validation before clinical deployment. Despite this gap, commercial AI systems demonstrated excellent workflow integration, with processing times ranging from 2 to 12 min and consistent time-to-treatment improvements across multiple implementations.

4.2. Clinical workflow integration and patient care impact

Our results demonstrated significant clinical benefits extending beyond diagnostic accuracy metrics. The 26 % reduction in door-to-treatment decision time represents around 24 min of savings per critical case, a meaningful improvement given the time-dependent nature of ICH outcomes (Saha et al., 2025). The 57 % reduction in critical case notification time suggests that AI systems effectively prioritize urgent cases, allowing earlier neurosurgical consultation and intervention planning.

From a neurosurgical decision-making perspective, these systems serve three key functions: rapid triage of positive cases for immediate attention, prioritization within radiologist worklists to minimize delays, and preliminary detection that alerts clinical teams before final radiologist interpretation (D'Angelo et al., 2024). The 8 % improvement in triage accuracy translates to around three fewer missed critical cases per 100 patients, potentially preventing adverse outcomes from delayed intervention.

The consistent improvements across multiple commercial implementations demonstrate that these workflow benefits are reproducible in different healthcare settings (Savage et al., 2024; Bark et al., 2024; Warman et al., 2024; Choi et al., 2024). Time-to-treatment reductions ranging from 5.3 min to 12.3 min across different systems, combined with improved radiologist confidence in 61–78 % of cases, support the clinical value proposition beyond pure diagnostic performance.

4.3. Predictive values and clinical decision-making

The prevalence-dependent predictive values reveal important considerations for clinical implementation. The consistently high NPV of over 0.94 across all prevalence scenarios, validated by observed implementation data showing 96.8 % NPV, provides strong evidence for AI use in rule-out applications and emergency triage. An NPV exceeding 98 % in typical emergency departments indicates that fewer than 2 % of AI-negative studies harbor ICH, supporting confident deprioritization of these cases while radiologists focus on AI-positive or clinically complex studies.

However, the prevalence-dependent fluctuation in PPV demands context-specific interpretation protocols. In low-prevalence settings such as unselected ED presentations, the moderate PPV (49.6–60.8 %) indicates that around 40–50 % of AI alerts represent false positives. This has significant workflow implications: while AI successfully identifies candidates for urgent review, treatment decisions cannot rely on AI output alone. The false positive burden, although substantial in absolute numbers, is clinically manageable because it accelerates radiologist attention to a pre-filtered subset rather than generating inappropriate management decisions.

The transformation of PPV in high-risk populations reveals AI's greatest clinical value. At 35–37 % prevalence, typical of trauma CT, anticoagulated patients with acute neurological changes, or elderly post-fall imaging, PPV exceeds 85 %, with commercial systems approaching 90 %. This crosses an important clinical utility threshold: emergency physicians and neurosurgeons can initiate time-sensitive interventions (reversal agents, neurosurgical consultation, ICU triage) based on AI-positive results with acceptable false positive rates of 10–15 %, while awaiting radiologist confirmation (Seyam et al., 2022a).

The superior PPV of commercial systems translates to tangible practical benefits. Each 10 % PPV improvement represents around ten fewer false positive alerts per 100 AI-positive results. In a high-volume emergency department processing 50 head CTs daily with 15 % ICH prevalence, this improvement reduces false alerts from around three to two per day, a modest absolute reduction that nonetheless meaningfully mitigates alert fatigue and strengthens physician trust. The lower user override rates observed with commercial systems (13.4–22.7 %) likely reflect this improved PPV and reduced false positive burden (Neves et al., 2023).
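The daily false-alert arithmetic above can be reproduced directly from specificity and case volume. A small sketch, using the pooled research-algorithm and commercial-system specificities from this analysis (0.926 and 0.951) as illustrative inputs:

```python
def expected_false_alerts(daily_scans, prevalence, specificity):
    """Expected number of false positive AI alerts per day."""
    negatives = daily_scans * (1 - prevalence)  # scans without ICH
    return negatives * (1 - specificity)        # of those, flagged in error

# Scenario from the text: 50 head CTs/day at 15 % ICH prevalence
for label, spec in (("research (spec 0.926)", 0.926),
                    ("commercial (spec 0.951)", 0.951)):
    fa = expected_false_alerts(50, 0.15, spec)
    print(f"{label}: ~{fa:.1f} false alerts/day")
```

Under these assumptions, the research-level specificity yields roughly three false alerts per day and the commercial-level specificity roughly two, matching the reduction described above.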

4.4. Subtype-specific performance and algorithmic architectures

The comparison of algorithm architectures demonstrates clear advantages for deep learning approaches over traditional machine learning methods, with CNN-RNN architectures and ResNet variants showing the strongest performance across multiple metrics (Ahmed et al., 2023, 2024). Our algorithm-subtype performance matrix further reveals that specific architectures perform especially well in detecting certain hemorrhage subtypes, suggesting that better clinical implementations may benefit from specialized or ensemble techniques depending on the target application (Savage et al., 2024).

CNN-RNN architectures achieved a sensitivity of 0.977 and specificity of 0.974 for overall ICH detection, the highest performance among research algorithms. However, subtype-specific subgrouping demonstrated that even these advanced architectures struggled with EDH detection, where sensitivity was only 0.702, highlighting the challenge of rare subtype recognition. Two-dimensional CNN architectures demonstrated particular strength in IVH detection with a sensitivity of 0.975, while ResNet variants excelled at IPH identification with a sensitivity of 0.961.

Commercial systems demonstrated more consistent performance across subtypes than research algorithms, likely reflecting development with larger, more diverse datasets and extensive clinical validation processes. However, certain research algorithms occasionally achieved superior performance in specific subtypes when optimized for targeted applications, suggesting that specialized academic models retain value for focused clinical scenarios.

4.5. Benchmark-to-implementation performance gap

A significant finding of our study was the consistent performance gap between controlled validation studies and real-world clinical implementation. For research algorithms, the transition from benchmark to real-world settings resulted in a mean sensitivity decrease of 0.066 (a 7.0 % relative decrease), while commercial AI systems exhibited a similar decline of 0.077 (an 8.1 % relative reduction). This gap was most pronounced for EDH detection, where commercial systems experienced a sensitivity drop of 0.134, corresponding to a 14.1 % relative decrease in real-world settings.

The performance degradation likely originates from multiple factors: differences in patient populations between training/validation cohorts and clinical practice, variations in CT acquisition protocols, challenges with image quality in emergency settings, and the heterogeneity of ICH presentations in unselected patient populations. The relative preservation of specificity across settings (minimal change for research algorithms, a 3.3 % decline for commercial systems) suggests that false positive rates remain controlled even as sensitivity decreases, although the absolute workflow impact depends on prevalence-dependent PPV.

These findings highlight the need for rigorous clinical validation before widespread adoption. They also suggest that published benchmark performance metrics should be interpreted with caution and not relied upon directly when making implementation decisions (Neves et al., 2023). Healthcare systems should anticipate around 7–8 % lower sensitivity in practice compared to vendor-reported validation statistics.

4.6. Addressing critical detection gaps - EDH and SAH

The inferior performance for EDH (sensitivity 0.749–0.845) and SAH (sensitivity 0.799–0.836) presents significant challenges, as these subtypes often require urgent neurosurgical intervention. EDH frequently necessitates urgent hematoma evacuation, while timely identification of SAH is important for guiding decisions regarding aneurysm evaluation and management, especially when the etiology is non-traumatic (Seyam et al., 2022a). Several factors likely contribute to this detection difficulty.

Imaging characteristics present distinct challenges: EDH typically appears as a lens-shaped extra-axial collection that can be subtle when small or in early stages, while SAH manifests as thin hyperdense layers in the subarachnoid spaces, easily confused with normal anatomical structures, especially in the basilar cisterns. Both subtypes have lower contrast-to-noise ratios than intraparenchymal hemorrhages, challenging automated detection algorithms.

Dataset imbalance significantly impacts algorithm training. EDH represents only around 2–5 % of ICH cases in most datasets, while SAH constitutes around 10 %, creating severe class imbalance. This underrepresentation limits algorithm exposure to diverse presentations, reducing generalization capability. The pronounced real-world performance drop for EDH (a 14.1 % relative sensitivity decrease) suggests inadequate sensitivity to subtle clinical presentations.

Future algorithm development should include targeted oversampling and synthetic data augmentation for rare subtypes, attention mechanisms focused on extra-axial spaces and cisterns, ensemble approaches combining subtype-specialized models, and focused training on missed cases from clinical implementations. Several included studies utilized subtype-specific optimization strategies that achieved superior EDH and SAH detection, suggesting this approach warrants broader adoption.

These subtypes may benefit from specialized algorithm training or more conservative clinical application to ensure patient safety. It is important to recognize that current AI tools may demonstrate reduced reliability in detecting these more challenging hemorrhage types, necessitating closer oversight and, when appropriate, secondary confirmation by expert radiologists (Cortés-Ferre et al., 2023a). Until these improvements materialize, clinical protocols should mandate radiologist review of AI-negative studies when clinical suspicion for EDH or SAH is high and consider specialized algorithms when these diagnoses are specifically suspected. The current generation of AI systems cannot serve as standalone rule-out tools for these subtypes.

4.7. Clinical applications framework

Based on our findings, several algorithms meet the performance thresholds required for emergency triage applications, where high sensitivity is essential (Savage et al., 2024; Bark et al., 2024; Warman et al., 2024; Choi et al., 2024). Specifically, CNN-RNN, DNN, and several ResNet architectures demonstrated both sensitivity exceeding 95 % and specificity above 90 %. For radiologist diagnostic assistance requiring high specificity, commercial systems showed particular strength, with several implementations achieving specificity over 95 % while maintaining acceptable sensitivity.

Our multi-dimensional performance assessment demonstrated that no single commercial system achieved optimal performance across all domains, including diagnostic accuracy, processing speed, workflow integration, and time-to-treatment impact. RapidAI ICH demonstrated the most balanced overall performance profile, while individual systems showed distinct strengths: Viz.ai ICH achieved the highest specificity (98 %), and RAPID ICH led in processing speed (91 %). Healthcare systems should select AI solutions based on specific clinical priorities and workflow needs rather than assuming universal superiority of any single platform.

The clinical applications framework we developed maps algorithmic capabilities to appropriate use cases, accounting for performance requirements, workflow constraints, and patient safety considerations. This framework suggests that current AI systems are well-suited for triage and workflow optimization but require human oversight for final diagnostic decisions, especially for challenging subtypes. However, performance gaps remain in other clinical applications. For instance, EDH detection shows a sensitivity shortfall of 10.1 percentage points compared to expected clinical requirements. Commercial AI systems demonstrated more consistent subtype-specific performance than research algorithms, most likely due to development with larger, more diverse datasets and extensive clinical validation. However, certain research algorithms occasionally outperformed commercial systems in targeted subtypes when optimized for specific use cases.

4.8. Limitations

Despite the methodological strengths of our study, including subgroup-focused analyses and the development of a clinical applications framework, several important limitations should be acknowledged. First, significant heterogeneity was found among the included studies, especially in patient populations, CT acquisition protocols, algorithmic implementations, and reference standards. Although we applied random-effects models and conducted subgroup analyses to mitigate this heterogeneity, residual variability may still affect the precision of our pooled estimates. Second, there was a limited number of studies reporting subtype-specific metrics, especially for less common presentations such as cerebellar and pontine hemorrhages. This restricts the confidence and generalizability of our findings for these subtypes. Third, many of the included studies lacked detailed reporting on algorithm architecture, training methodology, or validation approach, leading to a high proportion of “unclear” risk of bias assessments for the index test. This limitation affects the depth of methodological evaluation and constrains the specificity of our implementation recommendations.

Fourth, most included studies were retrospective in design, raising concerns about selection bias and limiting the applicability of the results to prospective clinical workflows. Fifth, only 27.6 % of studies conducted controlled external validation, which is an essential step in assessing algorithm generalizability across different healthcare settings and patient populations. In addition, our comparison of benchmark and real-world performance relied on between-study contrasts rather than within-study evaluations of the same algorithms across different environments, which would have provided stronger evidence of the implementation gap. We also noted limited reporting of key implementation metrics, such as processing time, system integration requirements, and impacts on workflow efficiency; this lack of data restricts a detailed and structured assessment of practical deployment considerations. Finally, the evaluation of commercial AI systems was constrained by proprietary limitations that prevented access to architectural and training details, limiting our ability to perform detailed technical comparisons.

4.9. Future directions

Based on our findings and the identified limitations, we propose several priority areas for future research. First, there is a need for large, prospective, multi-center studies with controlled external validation to assess the real-world performance of AI algorithms across different healthcare settings. These studies should provide detailed reporting of subtype-specific metrics and implementation parameters to support more precise and meaningful comparative analyses. Second, future studies should directly compare benchmark and clinical performance of the same algorithms to better characterize and address the implementation gap identified in our study. Third, further exploration of ensemble approaches is warranted, as our findings suggest that different algorithm architectures perform optimally for different ICH subtypes; combining multiple algorithms may yield better overall performance than single-model systems.

Fourth, studies that integrate workflow metrics and clinical outcome assessments would provide a better understanding of the practical impact of AI implementation beyond diagnostic accuracy alone. Fifth, head-to-head comparisons of commercial AI systems under standardized conditions would offer valuable guidance for healthcare providers in selecting among available solutions. As the field advances, future studies should aim toward standardized comparisons using clearly defined performance metrics and shared validation datasets to enable consistent and transparent evaluations across different clinical settings. Publicly available datasets, such as those released for Kaggle competitions, provide standardized benchmarks for imaging algorithms; these competitions offer shared datasets and uniform evaluation protocols, enabling direct comparisons across academic and commercial models, and their structured format and public accessibility have catalyzed improvements in segmentation accuracy, reproducibility, and transparency, especially for complex tasks. In addition, future studies should prioritize addressing the persistent challenges associated with EDH and SAH detection, for example by developing specialized training pipelines or ensemble strategies that exploit the strengths of different algorithmic architectures (AI challenges, 2025).

5. Conclusions

Our meta-analysis of 45 studies demonstrates that AI-based algorithms can achieve strong diagnostic performance for ICH detection. Research algorithms showed a pooled sensitivity of 0.890 and specificity of 0.926, while commercial AI systems demonstrated slightly better performance with a sensitivity of 0.899 and notably higher specificity of 0.951. However, diagnostic accuracy varied significantly across ICH subtypes. EDH and SAH were the most challenging to detect, with detection difficulty scores of 0.251 and 0.201, respectively.

Deep learning techniques consistently outperformed traditional machine learning across all metrics. In particular, CNN-RNN architectures achieved a sensitivity of 0.977 and specificity of 0.974, while ResNet models reported a sensitivity of 0.957 and specificity of 0.962. Our multi-dimensional performance analysis revealed that commercial AI systems offer more balanced capabilities across diagnostic accuracy, processing speed, and workflow integration. Among these, RapidAI ICH demonstrated the most comprehensive overall performance, while other systems showed distinct strengths in specific operational domains.

A key finding was the performance gap observed between benchmark evaluations and real-world deployment. Sensitivity decreased by 7.0 % for research algorithms and 8.1 % for commercial systems when transitioning from controlled settings to clinical environments. Despite this gap, AI implementation was associated with significant workflow benefits, including a 26 % reduction in door-to-treatment decision times, a 57 % decrease in critical case notification times, improved critical case prioritization accuracy, and enhanced radiologist confidence.

Although the current generation of AI systems supports applications such as emergency triage and radiologist assistance, challenges persist in reliably detecting specific subtypes like EDH and SAH. Notably, commercial systems experienced a 14.1 % drop in sensitivity for EDH detection in real-world settings.

Future research should prioritize prospective, multi-center validation studies with detailed subtype-specific performance reporting. Head-to-head comparisons of commercial AI systems under standardized conditions, focused algorithm development for complex hemorrhage patterns, particularly EDH and SAH, and robust evaluations of workflow integration and real-world implementation metrics are essential steps to advance safe and effective clinical adoption.

Ethics approval and consent to participate

Ethical approval was not required for this systematic review and meta-analysis as it involved analysis of previously published studies and did not involve direct collection of human participant data. All included studies had appropriate ethical approvals as reported in their original publications.

Consent for publication

N/A. This study did not involve individual participant data requiring consent for publication.

Availability of data and materials

All data generated and analyzed during this study are included in this published article and its supplementary information files.

Authors' contributions

MSA conceived the study, designed the methodology, conducted the literature search, performed data extraction, conducted statistical analysis, and drafted the manuscript. AYA contributed to study design, data validation, statistical analysis expertise, and manuscript revision. ASA performed independent data extraction, quality assessment, and contributed to manuscript writing. AK contributed to methodology design, data interpretation, and critical manuscript revision. OAH assisted with literature search, data extraction, and manuscript preparation. MK contributed to quality assessment and data validation. AD provided expertise in neuroimaging interpretation, contributed to clinical application framework development, and manuscript revision. MAE assisted with data analysis, clinical interpretation, and manuscript editing. FF contributed to neuroimaging expertise, clinical application development, and manuscript revision. JM provided senior oversight, clinical expertise, manuscript review, and final approval. All authors read and approved the final manuscript.

Funding

N/A.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

N/A.

Handling Editor: Dr W Peul

Footnotes

Appendix A

Supplementary data to this article can be found online at https://doi.org/10.1016/j.bas.2025.105866.

Abbreviations

AI: Artificial Intelligence; AUC: Area Under the Curve; CE: Conformité Européenne; CI: Confidence Interval; CNN: Convolutional Neural Network; CNN-RNN: Convolutional Recurrent Neural Networks; CPH: Cerebellar and Pontine Hemorrhages; CT: Computed Tomography; DL: Deep Learning; DNN: Deep Neural Network; DSS: Decision Support System; EDH: Epidural Hemorrhage; FDA: Food and Drug Administration; ICH: Intracranial Hemorrhage; IPH: Intraparenchymal Hemorrhage; IVH: Intraventricular Hemorrhage; LR: Logistic Regression; MeSH: Medical Subject Headings; ML: Machine Learning; MRI: Magnetic Resonance Imaging; NCCT: Non-Contrast Computed Tomography; PACS: Picture Archiving and Communication System; PPV: Positive Predictive Value; PRISMA-DTA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses for Diagnostic Test Accuracy Studies; QUADAS-2: Quality Assessment of Diagnostic Accuracy Studies-2; R-CNN: Region-based Convolutional Neural Network; ResNet: Residual Network; RF: Random Forest; RNN: Recurrent Neural Network; ROC: Receiver Operating Characteristic; RSNA: Radiological Society of North America; SAH: Subarachnoid Hemorrhage; SDH: Subdural Hemorrhage; SE-ResNeXt: Squeeze-and-Excitation ResNeXt; U-Net: U-shaped Network; XGBoost: Extreme Gradient Boosting; 2D-CNN: Two-dimensional Convolutional Neural Network; 3D-CNN: Three-dimensional Convolutional Neural Network.

Appendix A. Supplementary data

The following are the Supplementary data to this article:

Multimedia component 1
mmc1.docx (15.9KB, docx)
Multimedia component 2
mmc2.docx (31.7KB, docx)
Multimedia component 3
mmc3.docx (15.6KB, docx)

