Abstract
Background
Accurately predicting hematoma enlargement (HE) is crucial for improving the prognosis of patients with cerebral haemorrhage. Artificial intelligence (AI) is a potentially reliable assistant for medical image recognition. This study systematically reviews medical imaging articles on the predictive performance of AI in HE.
Materials and methods
We retrieved relevant studies published before October 2024 from the Embase, Institute of Electrical and Electronics Engineers (IEEE), PubMed, Web of Science, and Cochrane Library databases. Eligible studies were diagnostic tests in which an artificial intelligence model trained on CT images predicted hematoma enlargement, and which reported 2 × 2 contingency tables or provided the sensitivity (SE) and specificity (SP) needed to calculate them. Two reviewers independently screened the retrieved citations and extracted data. The methodological quality of studies was assessed using QUADAS-AI, and the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) were followed to ensure standardised reporting. Subgroup analyses were performed based on sample size, risk of bias, year of publication, ratio of training set to test set, and number of centres involved.
Results
Thirty-six articles were included in the qualitative analysis of this systematic review, of which 23 had sufficient information for quantitative analysis. Of these 23 articles, 7 used deep learning (DL) and 16 used machine learning (ML). The pooled SE and SP of ML were 78% (95% CI: 69–85%) and 85% (78–90%), respectively, with an AUC of 0.89 (0.86–0.91). The pooled SE and SP of DL were 87% (95% CI: 80–92%) and 75% (67–81%), respectively, with an AUC of 0.88 (0.85–0.91). In the subgroup analysis, sensitivity was 0.77 (0.62–0.91) when the ratio of the training set to the test set was 7:3 (p = 0.03). Specificity was higher in studies with a sample size of 200 or more, at 0.83 (0.75–0.92), p = 0.02; in studies with a risk of bias in the study design, at 0.83 (0.76–0.89), p = 0.02; in articles published before 2021, at 0.84 (0.77–0.90); and in studies using data from a single research centre, at 0.85 (0.80–0.91), p < 0.001.
Conclusions
Artificial intelligence algorithms based on imaging have shown good performance in predicting HE.
Keywords: Artificial intelligence, medical imaging, meta-analysis, hematoma expansion, intracerebral hematoma
Introduction
Spontaneous intracerebral haemorrhage (SICH) is a sub-type of stroke in which blood vessels in the brain rupture, leading to the accumulation of blood in the brain parenchyma [1]. It is a devastating disease characterised by a dangerous onset, rapid progression, and high mortality and disability rates [2]. About one-fifth of patients with cerebral haemorrhage experience hematoma enlargement (HE) within the first 3–6 h after symptoms appear, which may lead to a poor prognosis [3–5]. Accurately predicting HE in the early stage of cerebral haemorrhage and taking timely and reasonable treatment measures play a very important role in the prognosis of patients [6]. Computed tomography (CT) of the head is the preferred examination method for diagnosing cerebral haemorrhage [7], but the accuracy of image interpretation may vary with the clinical experience and skill of the reading doctor.
Treatment methods that can effectively improve patients’ functional prognosis are lacking. A key challenge when patients experience cerebral haemorrhage is therefore to quickly identify those at high risk of hematoma enlargement.
In recent years, artificial intelligence (AI) has been increasingly used in medical imaging to assist disease diagnosis [8–10]. It can significantly improve the classification, segmentation and synthesis of images, improve the accuracy of diagnostic results, reduce missed diagnoses and misdiagnoses caused by differences in doctors’ clinical experience and the uneven distribution of medical resources, and improve the prognosis of patients. Research has shown that artificial intelligence can assist in predicting brain hematoma enlargement from CT images [11–14].
The studies of artificial intelligence assisted diagnosis of HE conducted so far differ in their selected populations and artificial intelligence algorithms [15]. To provide higher-level evidence for guiding clinical practice, this study systematically reviewed published data on the diagnostic performance of artificial intelligence for HE prediction and pooled them quantitatively in a meta-analysis.
Methods
Protocol registration and study design
The study was registered in PROSPERO (CRD42023485662). The meta-analysis followed the PRISMA [16] (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) and AMSTAR 2 [17] (Assessing the methodological quality of systematic reviews) guidelines.
Retrieval strategy and eligibility criteria
We systematically searched Embase, Institute of Electrical and Electronics Engineers (IEEE), PubMed, Web of Science, and Cochrane Library for studies of artificial intelligence algorithms that predict hematoma expansion from medical imaging. The search covered records published up to October 2024. The search keywords included ‘cerebral haemorrhage’, ‘artificial intelligence’, ‘hematoma enlargement’, ‘machine learning’, ‘deep learning’ and ‘Radiomics’. The PubMed search string was: (((((Cerebral haemorrhage[Title/Abstract]) OR (intracerebral haemorrhage[Title/Abstract])) OR (Haemorrhagic stroke[Title/Abstract])) OR (intracerebral haematoma[Title/Abstract])) AND (((Hematoma Expansion[Title/Abstract]) OR (hematoma enlargement[Title/Abstract])) OR (haematoma expansion[Title/Abstract]))) AND (((((Artificial Intelligence[Title/Abstract]) OR (Machine Learning[Title/Abstract])) OR (Deep Learning[Title/Abstract])) OR (Computational Intelligence[Title/Abstract])) OR (Radiomics[Title/Abstract])). The retrieval strategy used for each database is provided in Supplemental Table 1.
Inclusion criteria: (1) diagnostic tests using artificial intelligence models to predict hematoma enlargement; (2) artificial intelligence models trained on CT images; (3) hematoma enlargement was defined; (4) diagnostic effectiveness was analyzed or evaluated; (5) 2 × 2 contingency tables were reported, or sensitivity (SE) and specificity (SP) were provided for their calculation; (6) published in English. Exclusion criteria: (1) duplicate publications; (2) no gold standard results for validation; (3) reviews, comments, letters, case reports, clinical trial protocols, and meeting documents.
Two authors (WJ-F and WW) independently screened the titles and abstracts and then the full texts. Disagreements were resolved through mutual consultation. During full-text review, the reasons for excluding each article were recorded. Data were then extracted from the selected articles, and the two authors (WJ-F and WW) examined any discrepancies to ensure the accuracy of the database.
Data extraction
Diagnostic accuracy data, including true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN), were extracted directly into contingency tables. If these data were not provided, the 2 × 2 contingency table was calculated from the reported sensitivity and specificity. Supplementary Table 2 summarizes the contingency tables extracted from the included studies.
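The back-calculation of a 2 × 2 table from sensitivity and specificity can be sketched as follows. This is a minimal illustration, not the procedure used in this review: the function name `contingency_from_se_sp` and the example numbers are hypothetical, and the sketch assumes that the numbers of patients with and without hematoma enlargement are reported alongside SE and SP.

```python
def contingency_from_se_sp(sensitivity, specificity, n_positive, n_negative):
    """Rebuild TP, FP, FN and TN counts from reported SE/SP and group sizes.

    n_positive: number of patients with hematoma enlargement (TP + FN)
    n_negative: number of patients without hematoma enlargement (TN + FP)
    """
    tp = round(sensitivity * n_positive)   # SE = TP / (TP + FN)
    fn = n_positive - tp
    tn = round(specificity * n_negative)   # SP = TN / (TN + FP)
    fp = n_negative - tn
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


# Hypothetical study: 50 patients with HE, 150 without, SE = 0.80, SP = 0.90
table = contingency_from_se_sp(0.80, 0.90, 50, 150)
print(table)
```

Because published SE and SP values are rounded, counts rebuilt in this way may differ from the true counts by one or two patients.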
Study quality assessment
All selected studies were assessed for quality using the Quality Assessment of Diagnostic Accuracy Studies-AI (QUADAS-AI) [18–20] criteria. This new evaluation tool is an artificial intelligence specific extension of QUADAS-2 [21] and QUADAS-C [22], providing researchers with a framework to evaluate the risk of bias and applicability when reviewing the accuracy of diagnostic tests centred on AI. RevMan software (version 5.4) was used to evaluate the methodological quality of the included studies.
Statistical analysis
Meta-analysis was carried out with Stata software (version 17), and the effect sizes entered into the meta-analysis were the SE and SP of each AI model. The built-in calculator of RevMan software (version 5.4) was used to compute the 2 × 2 (fourfold) tables from the extracted data. Training set data were extracted preferentially; for studies that did not provide training set data, the test set data were extracted. If a study had only one validation set, the validation set data were extracted. Heterogeneity testing was performed on the results for the same indicator across studies. A fixed effects model was used to pool the data if there was no statistical heterogeneity (p > 0.05). If there was statistical heterogeneity (p < 0.05), a random effects model was used and an attempt was made to identify the source of heterogeneity. A summary receiver operating characteristic (SROC) curve was constructed to calculate the AUC value. When different artificial intelligence models were tested in the same paper, the proposed model with the best accuracy was used for the meta-analysis. Funnel plots and regression tests were used to estimate the risk of publication bias.
Subgroup analyses were performed by: (1) sample size (≥200 vs <200); (2) risk of bias (risk group vs no-risk group); (3) data source (single-centre vs multi-centre); (4) ratio of training set to validation set (7:3 vs others, where the ‘others’ subgroup mainly includes 8:2 and studies that did not divide the dataset according to a fixed proportion); and (5) year of publication (before 2021 vs after 2021).
The two-sided probability of a Type I error for all other statistical analyses was 0.05 (α = 0.05).
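The heterogeneity testing and random effects pooling described above can be illustrated with a simplified sketch. It is an assumption-laden stand-in, not the Stata routine used in this study: it pools a single proportion (for example sensitivity) on the logit scale with the DerSimonian-Laird estimator, whereas diagnostic meta-analysis software usually models SE and SP jointly. The function name `pool_logit_proportions` and the input counts are hypothetical.

```python
import math


def pool_logit_proportions(events, totals):
    """DerSimonian-Laird random effects pooling of proportions on the logit scale.

    events: per-study numerators (e.g. TP when pooling sensitivity)
    totals: per-study denominators (e.g. TP + FN)
    Returns (pooled proportion, Cochran's Q, I^2 in percent).
    """
    y, v = [], []
    for e, n in zip(events, totals):
        # 0.5 continuity correction keeps the logit finite at 0% or 100%
        a, b = e + 0.5, n - e + 0.5
        y.append(math.log(a / b))
        v.append(1 / a + 1 / b)

    # Fixed effects (inverse variance) estimate and Cochran's Q
    w = [1 / vi for vi in v]
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    df = len(y) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    # Between-study variance tau^2 (method of moments)
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random effects weights add tau^2 to each within-study variance
    w_re = [1 / (vi + tau2) for vi in v]
    pooled_logit = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    pooled = 1 / (1 + math.exp(-pooled_logit))
    return pooled, q, i2


# Hypothetical TP counts and numbers of HE patients from four studies
pooled, q, i2 = pool_logit_proportions([40, 18, 90, 25], [50, 30, 100, 60])
print(f"pooled sensitivity = {pooled:.2f}, Q = {q:.2f}, I2 = {i2:.1f}%")
```

When the studies agree closely, tau² shrinks to zero and the random effects estimate coincides with the fixed effects estimate.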
Results
Study selection and characteristics of eligible studies
In the initial search, a total of 290 records were retrieved, 141 duplicates were removed, and 63 studies were excluded after screening of titles and abstracts. Finally, 36 articles were included in the qualitative analysis of this systematic review [15,20,23–56], of which 23 had sufficient information for quantitative analysis (Figure 1). All included studies trained their models on retrospectively collected data, and two of them validated their models on prospective data. Among the 23 studies in the quantitative analysis, 7 used deep learning (DL) and 16 used machine learning (ML). Table 1 shows the detailed characteristics of these studies.
Figure 1.
PRISMA flowchart of study selection.
Table 1.
Participant demographics for the 36 included studies.
| Author | Year | Country | Mean or median age, HE group (SD; IQR) | Mean or median age, NHE group (SD; IQR) | N | Training set:Validation set | First CT time (h) | Second CT time (h) | Algorithm architecture | Artificial intelligence type | External validation | HE definition |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Lu et al. [23] | 2024 | China | 68.7(53.0–89.0) | 64.4(50.0–85.0) | 280 | – | 24 | 72 | CNN | DL | No | 6 mL/33% |
| Ma et al. [24] | 2022 | China | 57.0(48.0–64.0) | 59.0(50.8–68.0) | 253 | 8:2 | 6 | 48 | CNN | DL | No | 12.5 mL/33% |
| Yu et al. [25] | 2024 | China | 59.7 ± 12.7 (overall) | | 555 | 7:3 | Within 24 h | | LR, KNN, SVM, DT, RF, LDA, QDA, NB | ML | Yes | 6 mL/33% |
| Li et al. [26] | 2024 | China | 66.6 ± 13.0 | 59.1 ± 13.5 | 182 | 4:1 | 6 | 36 | LR | ML | No | 6 mL/33% |
| Tang et al. [27] | 2022 | China | 67.3 ± 12.4 | 62.1 ± 15.1 | 223 | – | 6 | 24 | CNN, SVM, MLP, NB, DT, RF | DL and ML | No | 6 mL/33% |
| Li et al. [28] | 2024 | China | 59.28 ± 13.51 | 60.12 ± 12.41 | 582 | 8:2 | 6 | 24 | SVM, RF, LR, KNN, XGBoost | ML | No | 6 mL/33% |
| Zhou et al. [29] | 2021 | China | 62.8 ± 13.7 | 60.4 ± 13.1 | 232 | 7:3 | 6 | 24 | KNN, LR, GBT, NB, RF | ML | No | 6 mL/33% |
| Song et al. [15] | 2021 | China | 62.0(51.0–71.0) | 61.0(51.0–69.0) | 261 | 7:3 | 6 | 24 | LR | ML | No | 6 mL/33% |
| Li et al. [30] | 2021 | China | 59.0 ± 12.0 (overall) | | 181 | 7:3 | 6 | 72 | LR | ML | No | 6 mL/33% |
| Arooshi Kumar et al. [31] | 2024 | America | 63(53.0–72.0) | 62.0(52.0–71.0) | 924 | 7:3 | 4.5 | 24 | ANN, DT, Gradient boost, RF, SVM, LR | DL and ML | No | 6 mL/33% |
| Dai et al. [32] | 2023 | China | 63.13 ± 11.53 (overall) | | 130 | 7:3 | 24 | 24 | LR | ML | No | 6 mL/33% |
| Du et al. [33] | 2024 | China | 59.6 (41.9–77.0) | 59.5 (45.8–73.1) | 604 | – | 24 | 24 | XGBoost, SVM, RF, LR | ML | No | 6 mL |
| Bo et al. [34] | 2023 | China | 59.53 ± 13.67 | 59.4 ± 11.17 | 408 | 8:2 | 6 | 72 | SVM | ML | No | 6 mL/33% |
| Teng et al. [35] | 2021a | China | 48.5(38.3–62.8) | 60.0(53.0–65.3) | 118 | – | 6 | 24 ± 3 | CNN | DL | Yes | 6 mL |
| Zhu et al. [36] | 2021a | China | 60.0(51.5–68.0) | 62.0(50.0–70.0) | 314 | 7:3 | 6 | 72 | SVM | ML | No | 6 mL/33% |
| Chen et al. [37] | 2021a | China | 60.9 ± 12.5 | 61.3 ± 12.7 | 864 | 7.5:2.5 | 6 | 72 | LR | ML | No | 6 mL/33% |
| Duan et al. [38] | 2021a | China | 55.8(13.1) | 59.4(12.2) | 108 | 7:3 | 6 | 24 | SVM, DT, CIT, RF, KNN, BPNet, Bayes | ML | No | 12.5 mL/33% |
| Zhong et al. [39] | 2021a | China | 63.3 ± 12.0 (overall) | | 77 | – | 8 | 20–24 | CNN | DL | No | 6 mL/33% |
| Guo et al. [40] | 2022a | China | 64.5 ± 13.2 (overall) | | 102 | – | 6 | 24 | CNN | DL | Yes | 12.5 mL/33% |
| Zhou et al. [41] | 2024a | China | 59.39 ± 14.02 | 61.11 ± 12.98 | 236 | 8:2 | 6 | 24 | U-net | DL | Yes | 6 mL/33% |
| Wang et al. [42] | 2024a | China | 59.5 ± 11.9 (overall) | | 180 | – | 24 | 72 | nnU-Net | DL | Yes | 6 mL/33% |
| Satoru Tanioka et al. [43] | 2022a | Japan | 76.0(67.0–84.0) | 71.0(62.0–82.0) | 351 | – | 24 | 30 | KNN, LR, SVM, RF, XGBoost | ML | Yes | 6 mL/33% |
| Chen et al. [44] | 2024a | China | – | – | 307 | 7:3 | 6 | 72 | L1 logit, DT, SVM, AdaBoost | ML | No | 6 mL/33% |
| Li et al. [45] | 2021a | China | 57.8 ± 12.6 | 58.9 ± 13.4 | 288 | 7:3 | 6 | 24 | LR | ML | No | 6 mL/33% |
| Xu et al. [46] | 2022a | China | 61.5 ± 13.77 | 58.51 ± 12.09 | 235 | – | 6 | 24 | LR | ML | No | 6 mL/33% |
| Xie et al. [47] | 2020a | China | 61.7 ± 16.7 | 62.8 ± 13.8 | 177 | 7:3 | 6 | 24 | LR | ML | No | 6 mL/33% |
| Lee et al. [48] | 2024a | Korea | 58 ± 14 | 63 ± 14 | 569 | 8:2 | – | 24 | LR | ML | No | 6 mL/33% |
| Tang et al. [49] | 2022a | China | 66.8 ± 10.4 | 61.4 ± 13.7 | 223 | – | 0 | 6 | KNN, DRN | DL | No | 10% |
| Zhu et al. [50] | 2021a | China | 61.5(53.0–73.8) | 60.0(51.0–69.0) | 626 | 7:3 | 6 | 72 | LASSO, SVM | ML | No | 2 mL |
| Cheng et al. [51] | 2021a | China | 63.1 ± 12.5 | 56.95 ± 11.8 | 140 | – | 6 | 48 | DCNN, MLP | ML | No | 12.5 mL/33% |
| Satoru Tanioka et al. [52] | 2024a | Japan | 72.0 (62.0–82.0) (overall) | | 106 | – | 24 | 30 | CNN, KNN | DL and ML | Yes | 6 mL/33% |
| Liu et al. [53] | 2019a | China | 61.0 ± 12.9 | 61.7 ± 12.8 | 1262 | 8:2 | 6 | 72 | SVM | ML | Yes | 6 mL/33% |
| Li et al. [54] | 2019a | China | 57.3(11.4) | 59.1(12.8) | 167 | – | 6 | 24 | SVM | ML | No | 6 mL/30% |
| Ma et al. [55] | 2019a | China | 53.0(44.0–59.0) | 55.0(49.0–65.0) | 254 | – | 6 | 48 | LASSO | ML | Yes | 12.5 mL/33% |
| Wu et al. [56] | 2024a | China | 63.14 ± 11.97 | 60.73 ± 13.09 | 204 | 8:2 | 24 | 24 | NB, LR, SVM, KNN, RF, ET, XGBoost, LightGBM, MLP, AdaBoost, GradientBoosting | ML | No | 6 mL/33% |
| Tan et al. [57] | 2019a | America | 65.8(16.2) (overall) | | 107 | – | 2.5 | 48 | NB | ML | Yes | 3 cm³/25% |
a Studies (n = 23) included in the meta-analysis. Abbreviations: deep learning (DL), machine learning (ML), convolutional neural network (CNN), logistic regression (LR), k-nearest neighbors (KNN), support vector machines (SVM), decision trees (DT), random forests (RF), linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), naive Bayes (NB), multilayer perceptron (MLP), extreme gradient boosting (XGBoost), gradient boosting tree (GBT), artificial neural network (ANN), backpropagation neural network (BPNet), U-shaped network (U-Net), deep residual network (DRN) and deep convolutional neural network (DCNN).
Pooled performance of AI algorithms
The meta-analysis of the 23 studies showed significant heterogeneity among the included studies, with I² = 90.34% (p < 0.01) for SE and I² = 94.55% (p < 0.01) for SP (Figure 2). Random effects models were therefore used, and the pooled results indicate that artificial intelligence algorithms perform well in predicting hematoma enlargement from medical imaging.
Figure 2.
Forest plot of studies included in the meta-analysis.
The SROC curves of 23 included studies are shown in Figure 3. The SE and SP of DL are 87% (95% CI: 80–92%) and 75% (67–81%), respectively, and the AUC of all DL algorithms is 0.88 (0.85–0.91). The SE and SP of ML are 78% (95% CI: 69–85%) and 85% (78–90%), respectively, and the AUC of all ML algorithms is 0.89 (0.86–0.91).
Figure 3.
SROC curve of studies (A: deep learning SROC curves of the meta-analysis; B: machine learning SROC curves of the meta-analysis).
Subgroup meta-analyses
The subgroup analysis results are shown in Table 2. For sensitivity, the way the dataset was split affected the performance of the artificial intelligence model: sensitivity was 0.77 (0.62–0.91) when the ratio of the training set to the test set was 7:3 and 0.83 (0.77–0.89) for other ratios (p = 0.03). For specificity, that is, the ability of the model to correctly identify patients without HE, differences were found by sample size, risk of bias in the study design, year of publication, and whether the data came from multiple centres. Specificity was 0.83 (0.75–0.92) in studies with a sample size of 200 or more and 0.81 (0.74–0.88) in studies with fewer than 200 (p = 0.02). It was also higher in the group with a risk of bias in the study design, at 0.83 (0.76–0.89), p = 0.02; in articles published before 2021, at 0.84 (0.77–0.90); and in studies using data from a single research centre, at 0.85 (0.80–0.91), p < 0.001.
Table 2.
Summary estimation of the pooled performance of artificial intelligence in image-based hematoma enlargement detection.
| Subgroup | No. of studies | Sensitivity (95% CI) | P value (sensitivity) | I² for sensitivity, % (95% CI) | Specificity (95% CI) | P value (specificity) | I² for specificity, % (95% CI) |
|---|---|---|---|---|---|---|---|
| Sample size | | | | | | | |
| ≥200 | 8 | 0.84(0.75–0.92) | 0.08 | 96.91(95.72–98.10) | 0.83(0.75–0.92) | 0.02* | 97.71(96.91–98.51) |
| <200 | 15 | 0.80(0.72–0.88) | | 51.78(23.35–80.22) | 0.81(0.74–0.88) | | 86.99(81.51–92.48) |
| Risk of bias | | | | | | | |
| Have risk | 14 | 0.82(0.75–0.89) | 0.05 | 93.64(91.39–95.90) | 0.83(0.76–0.89) | 0.02* | 96.49(95.45–97.52) |
| No risk | 9 | 0.80(0.70–0.90) | | 70.32(49.94–90.69) | 0.81(0.72–0.90) | | 78.47(64.82–92.12) |
| Year of publication | | | | | | | |
| After 2021 | 11 | 0.86(0.79–0.92) | 0.16 | 60.87(35.09–86.66) | 0.80(0.72–0.88) | <0.00* | 76.72(63.18–90.27) |
| Before 2021 | 12 | 0.77(0.68–0.86) | | 93.59(91.11–96.06) | 0.84(0.77–0.90) | | 96.71(92.19–97.75) |
| Dataset scale | | | | | | | |
| 7:3 | 5 | 0.77(0.62–0.91) | 0.03* | 96.67(94.94–98.40) | 0.89(0.81–0.96) | 0.25 | 97.15(95.75–98.56) |
| Others | 18 | 0.83(0.77–0.89) | | 72.24(59.14–85.34) | 0.80(0.74–0.86) | | 93.67(91.71–95.64) |
| Data | | | | | | | |
| Multi-centre data | 9 | 0.87(0.80–0.94) | 0.19 | 52.53(16.66–88.40) | 0.76(0.66–0.85) | <0.00* | 76.00(60.36–92.64) |
| Single centre data | 14 | 0.78(0.70–0.86) | | 93.54(91.24–95.85) | 0.85(0.80–0.91) | | 96.98(96.13–97.83) |
*p < 0.05.
Quality assessment
The quality of the included studies was assessed with QUADAS-AI, and the detailed evaluation results are shown in Figure 4. The risk of bias for the index test was high or unclear in 14 studies, because these studies lacked sufficient external evaluation. Further analysis showed that the sensitivity and specificity of the included studies were significantly heterogeneous: the sensitivity ranged from 0.18 (0.09–0.31) to 0.94 (0.72–0.92), and the specificity ranged from 0.55 (0.51–0.58) to 0.99 (0.95–1.00).
Figure 4.
QUADAS-AI summary plot.
Publication bias analysis
Deeks’ method was used to analyze publication bias among the included articles. The results showed no significant publication bias (p > 0.05) and are presented in Figure 5.
Figure 5.
Publication bias (the funnel plot suggested that there was no publication bias).
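The regression behind a Deeks-type funnel plot can be sketched as below. This is a minimal illustration under stated assumptions, not the Stata output of this study: the log diagnostic odds ratio is regressed on the inverse square root of the effective sample size (ESS), with ESS as the weight, and a slope far from zero suggests asymmetry. The function name `deeks_regression` and the 2 × 2 tables are hypothetical, and the sketch returns the t statistic of the slope rather than a p value so that it needs only the standard library.

```python
import math


def deeks_regression(tables):
    """Weighted regression of ln(DOR) on 1/sqrt(ESS), weighted by ESS.

    tables: list of (TP, FP, FN, TN) tuples, one per study (at least three).
    Returns (slope, t statistic of the slope with n - 2 degrees of freedom).
    """
    x, y, w = [], [], []
    for tp, fp, fn, tn in tables:
        # 0.5 continuity correction keeps ln(DOR) finite when a cell is zero
        ln_dor = math.log(((tp + 0.5) * (tn + 0.5)) / ((fp + 0.5) * (fn + 0.5)))
        n_pos, n_neg = tp + fn, fp + tn
        ess = 4 * n_pos * n_neg / (n_pos + n_neg)  # effective sample size
        x.append(1 / math.sqrt(ess))
        y.append(ln_dor)
        w.append(ess)

    # Weighted least squares for a straight line
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Standard error of the slope from the weighted residual sum of squares
    rss = sum(wi * (yi - intercept - slope * xi) ** 2 for wi, xi, yi in zip(w, x, y))
    se_slope = math.sqrt(rss / (len(x) - 2) / sxx)
    return slope, slope / se_slope


# Hypothetical 2 x 2 tables (TP, FP, FN, TN) from five studies
hypothetical_tables = [
    (40, 15, 10, 135),
    (18, 20, 12, 100),
    (90, 30, 10, 170),
    (25, 8, 35, 92),
    (30, 5, 5, 60),
]
slope, t_stat = deeks_regression(hypothetical_tables)
print(f"slope = {slope:.2f}, t = {t_stat:.2f}")
```

The t statistic would then be compared with a t distribution on n − 2 degrees of freedom to obtain the p value.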
Discussion
This systematic review and meta-analysis evaluated the role of AI in predicting hematoma expansion from medical images, and the results showed that the combined diagnostic performance of deep learning models and machine learning models was excellent, highlighting the clinical potential of AI in hematoma prognosis.
In recent years, artificial intelligence has been increasingly applied to medical imaging. Deep learning algorithms in particular can process and recognize complex medical image features and, as a means of imaging-assisted diagnosis, offer high accuracy and sensitivity. As far as we know, this is the first systematic review and meta-analysis to use the AUC to focus specifically on the performance of artificial intelligence systems in predicting hematoma enlargement.
To ensure the rigor of the study, the diagnostic review guidelines [58] were strictly followed, and comprehensive literature searches were conducted in medical and engineering databases. By pooling the original studies, we found that artificial intelligence algorithms perform well in identifying hematoma enlargement from medical imaging. This study also analyzed six subgroups, including different sample sizes, model types, publication years, literature quality, and data sources.
Based on the above meta-analysis, we found that the pooled sensitivity of deep learning models was higher than that of machine learning models (87% vs 78%), while the AUC values were similar (0.88 vs 0.89). The higher sensitivity may be due to the multi-layer neural networks that deep learning builds for feature extraction and data representation, which allow it to learn higher-level feature representations automatically and to process and analyze data efficiently. The specificity of machine learning was higher than that of deep learning (85% vs 75%). A possible reason is that machine learning usually relies on carefully designed feature engineering to extract meaningful features from the input data, which gives it a strong ability to correctly identify patients without the disease. Analysis of the included studies suggests that the advantages of deep learning lie in its ability to automatically extract features, strong small sample optimization capabilities, and high end-to-end processing efficiency, making it suitable for image segmentation and complex pattern recognition [23]. The advantages of machine learning lie in its relatively flexible data requirements, strong interpretability, and good multi-centre generalization ability, making it suitable for scenarios that require the integration of clinical knowledge [24]. This suggests that in practical applications, DL has greater potential when the amount of data is large and rapid automated processing is required, whereas ML is more reliable when transparent decision-making or diverse data sources are needed. The future trend may be a combination of both (such as DL for feature extraction and ML for interpretation of results) to balance performance and interpretability.
Subgroup analysis identified potential sources of heterogeneity between studies. For specificity, year of publication was one source of heterogeneity. This may reflect the continuous development of artificial intelligence in recent years and the gradual maturation of deep learning applications after 2021 [12,59], especially research based on CNN models. Recent studies have also tended to use different, multi-centre datasets to evaluate how well results extrapolate, which resulted in a slight decrease in specificity, although it remained at a relatively high level overall.
With the continuous emergence of research on artificial intelligence assisted diagnosis and treatment, the number of related reviews has also increased, providing strong evidence for doctors to develop personalized diagnosis and treatment plans and guidelines. However, most studies do not apply quality evaluation standards specific to artificial intelligence. This study used the adjusted QUADAS-AI evaluation tool to strictly rate the quality of the studies and their risk of bias. Sounderajah et al. [19] pointed out that an AI-specific risk of bias assessment tool is needed because existing tools do not cover the specific terminology generated by AI diagnostic test research and do not consider other issues that arise in AI research, such as dataset settings and sources of bias. Using such a tool is a strength of this systematic review and will better guide future related research [20].
Although artificial intelligence currently exhibits significant advantages, there are still some urgent problems to be solved in its current applications:
Firstly, currently published studies have small sample sizes and lack sufficient external validation. Adequate data are the core component supporting the development of artificial intelligence models. Training on a large amount of information, extracting high-throughput features, and accurately constructing predictive models can better assist doctors in diagnosis and treatment. In the relevant research, small sample sizes and the lack of sufficient external validation are the main bottlenecks in training any artificial intelligence model [20,60].
Secondly, the indicators used to evaluate the performance of artificial intelligence prediction models have not been unified across published studies. Of the 36 articles that met our inclusion standards, only 23 provided the SE and SP required for pooling. These indicators are key to reporting the diagnostic performance of artificial intelligence: combined with the number of patients with and without hematoma enlargement reported in a study, they yield TP, TN, FP, and FN, from which contingency tables can be constructed. Other studies selected indicators commonly used in computer science research, such as accuracy, Dice ratio, F1 score, and recall rate, which cannot evaluate the diagnostic effectiveness of a model as comprehensively as the indicators commonly used in diagnostic research [20,24].
While conducting our research, we identified a study that employed the C-index to assess the role of machine learning in predicting hematoma enlargement [61]. That study evaluated its dataset using the C-index, sensitivity, and specificity. In contrast, our study centres on the AUC as the key metric. Renowned as a standard metric for binary classification models, AUC comprehensively captures a model’s discriminatory power across various classification thresholds. C-index is mainly used for risk stratification in survival analysis. When dealing with classification problems, the C-index lacks the interpretability of AUC. AUC offers a more straightforward representation of the trade-off between true positives and false positives. This makes it easier for clinicians to understand the model’s predictive performance for hematoma enlargement, thereby facilitating its implementation in real-world diagnostic and treatment settings.
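The reading of the AUC as discriminatory power across all thresholds can be made concrete with a small sketch. It uses hypothetical patient-level risk scores, which is an assumption for illustration only: the AUC values pooled in this review come from SROC curves, not from individual scores. The function name `auc_by_pairs` is likewise hypothetical.

```python
def auc_by_pairs(scores_pos, scores_neg):
    """AUC as the share of (HE, non-HE) pairs in which the HE patient scores higher.

    Ties count as half a correctly ranked pair.
    """
    concordant = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(scores_pos) * len(scores_neg))


# Hypothetical model outputs for 3 patients with HE and 4 without
he_scores = [0.9, 0.8, 0.4]
non_he_scores = [0.7, 0.3, 0.2, 0.1]
auc = auc_by_pairs(he_scores, non_he_scores)
print(f"AUC = {auc:.3f}")
```

Here 11 of the 12 possible pairs are ranked correctly, so the AUC is 11/12, or about 0.917, whichever classification threshold is later chosen.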
Thirdly, most of the included studies drew their data from a single centre, and the data are relatively limited. These studies typically divide the data of that centre into training, validation, and testing sets. The performance of the artificial intelligence models can therefore only be tested through internal validation, and the lack of sufficient external validation may lead to results that overestimate real-world performance, reducing the model’s generalizability [20]. Studies containing external validation confirm this. The results of internal validation are often higher than those of external validation, possibly because data from the same dataset are homogeneous, so the model is more likely to classify them correctly. To improve artificial intelligence models, it is necessary to establish regional and multi-centre data centres.
Finally, only a small number of studies have included prospective data for artificial intelligence model construction and evaluation, while the majority of included studies are based on retrospective data, with patient information sourced from hospital medical records. Prospective studies provide stronger evidence, and it is hoped that more of them will emerge in the future.
This study inevitably has some limitations. Because only reports with 2 × 2 contingency tables or with sensitivity and specificity were included, the number of included studies was limited, as some studies reported other performance indicators (such as the F1 score) but not the required ones. In addition, focusing solely on sensitivity and specificity may not be sufficient to comprehensively evaluate the performance of artificial intelligence models in predicting hematoma enlargement, especially when data are imbalanced or a model’s ranking ability needs to be evaluated, which may affect the comprehensiveness of the results.
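The point about imbalanced data can be illustrated by converting sensitivity and specificity into predictive values at a given prevalence. The sketch below is illustrative only: it plugs in the pooled DL estimates reported above (SE 0.87, SP 0.75) and assumes a prevalence of 20%, taken from the "about one-fifth" figure in the Introduction; the function name `predictive_values` is hypothetical and the output is not a result of this meta-analysis.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Positive and negative predictive value from SE, SP and prevalence."""
    tp = sensitivity * prevalence               # expected share of true positives
    fn = (1 - sensitivity) * prevalence         # expected share of false negatives
    tn = specificity * (1 - prevalence)         # expected share of true negatives
    fp = (1 - specificity) * (1 - prevalence)   # expected share of false positives
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return ppv, npv


# Pooled DL estimates from this review, with an assumed HE prevalence of 20%
ppv, npv = predictive_values(0.87, 0.75, 0.20)
print(f"PPV = {ppv:.2f}, NPV = {npv:.2f}")
```

Under these assumptions the positive predictive value is about 0.47 and the negative predictive value about 0.96, which shows how a model with good sensitivity and specificity can still produce many false alarms when HE is the minority class.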
Conclusion
Artificial intelligence algorithms based on imaging have shown good performance in predicting HE. In the future, larger multicentre datasets should be used to train artificial intelligence models, which would help AI predict hematoma expansion more accurately, support telemedicine, and compensate for medical problems caused by the imbalanced distribution of medical resources.
Supplementary Material
Funding Statement
This work was supported by Natural Science Foundation of Liaoning Province (Grant number 2024-MS-144). The fund is mainly used for research design, data collection, analysis, and interpretation.
Consent form
Not applicable.
Disclosure statement
No potential conflict of interest was reported by the authors.
Ethical approval
Not applicable.
Data availability statement
The data of this study are from the original research that can be obtained publicly. The author confirms that the data supporting the results of this study are provided in the article and its supplementary materials.