Summary
Background
Epilepsy in children and adolescents impairs cognitive development and quality of life, so early risk identification is needed to improve outcomes. Yet current predictive models have yielded inconsistent results, which calls for a thorough evaluation of their accuracy and effectiveness to guide future research and inform evidence-based clinical strategies. This review aimed to integrate existing research findings on epilepsy prediction models for children and adolescents.
Methods
China National Knowledge Infrastructure, Wanfang Database, SinoMed, China Science and Technology Journal Database, PubMed, Embase, CINAHL, and Web of Science were searched from inception to August 31, 2025. The Prediction Model Risk of Bias Assessment Tool was used to assess the risk of bias and applicability. The areas under the curve (AUC) with 95% confidence intervals were pooled using random-effects meta-analysis. The study was registered with PROSPERO (CRD42025637913).
Findings
A total of 27 studies were included in this review, 16 of which were conducted in China. Twenty-five studies were at high risk of bias. The pooled AUC was 0.794 (95% CI: 0.747–0.840) for 14 training models and 0.726 (95% CI: 0.659–0.792) for 17 validation models. Models combining clinical features with EEG outperformed those incorporating MRI in both training (0.855 vs 0.725) and validation (0.743 vs 0.655). Non-machine learning models outperformed machine learning models (training: 0.838 vs 0.717; validation: 0.778 vs 0.654), although the overlapping 95% CIs in the validation sets suggest that this difference may not be statistically significant. External validation yielded a higher AUC (0.807) than internal validation (0.634), albeit with extreme heterogeneity (I² = 90.93%).
Interpretation
Current research shows an uneven regional distribution. Models based on clinical features + EEG warrant further exploration. Predictor selection relies predominantly on univariate analysis and lacks standardized, rigorous methodology. Most studies carry a high risk of bias and are rarely validated, which limits their practical applicability. Validating existing models is crucial for identifying their flaws and improving future research.
Funding
Natural Science Foundation of Hunan Province (grant No. 2024JJ8254).
Keywords: Epilepsy, Prediction models, Children, Adolescents, Systematic review and meta-analysis
Research in context.
Evidence before this study
According to the 2021 Global Burden of Disease Study, 51.7 million people worldwide have epilepsy, with an age-standardized prevalence of 658 per 100,000. Among children and adolescents, epilepsy ranks among the top three causes of disability-adjusted life years, at a rate of 185.1 per 100,000. Early intervention and meticulous management can effectively control disease progression and improve patients' quality of life. Given epilepsy's substantial impact on children and adolescents and the potential of predictive models to improve their care and outcomes, these models warrant greater attention. We searched PubMed and China National Knowledge Infrastructure using the terms “Epilepsy” [MeSH] AND “model” [Title/Abstract] AND “Review” [Publication type]. Epilepsy prediction is still at an exploratory stage. In recent years, many studies have developed models for pediatric and adolescent epilepsy to aid diagnosis, predict seizures, analyze prognosis, and assess treatment efficacy. However, few have been applied in clinical practice. It is therefore essential to evaluate the effectiveness, strengths, and weaknesses of existing models to provide references and evidence for future monitoring of epilepsy prognosis in children and adolescents.
Added value of this study
This systematic review and meta-analysis included 27 studies from 7 countries. The pooled AUC for 14 training models was 0.794 (95% CI: 0.747–0.840), while for 17 validation models, it was 0.726 (95% CI: 0.659–0.792). Predictor combinations revealed that clinical features paired with electroencephalography (EEG) consistently outperformed more complex models that included magnetic resonance imaging (MRI), suggesting potential overfitting from excessive features. Notably, clinical-only models, despite high variability, achieved the highest validation AUC (0.846), highlighting the need for standardized clinical parameter selection. Additionally, non-machine learning models, especially logistic regression, outperformed machine learning models in both training and validation, likely due to limited sample sizes for complex algorithms and the greater interpretability of traditional methods in clinical settings. Our study also found that, beyond general clinical information, more studies are now using objective data like EEG and MRI features as predictors, though most lack external validation, casting doubt on their practical effectiveness.
Implications of all the available evidence
Current studies tend to focus on repeatedly identifying predictors and developing new models rather than validating and refining existing ones. There is therefore a pressing need for external validation of these models to support their clinical application. Meanwhile, many studies have underscored the critical role of objective data in epilepsy prediction. Our findings suggest that models combining clinical features with EEG deserve further investigation. Although persistently high heterogeneity across subgroups highlights challenges in methodological consistency and dataset diversity, these results collectively stress the urgent need for region-specific model calibration, judicious predictor selection, and multicenter collaborations following standardized protocols to enhance the generalizability of pediatric epilepsy prediction.
Introduction
Epilepsy is a chronic neurological disease that affects approximately 51.7 million people globally1,2 and causes 140,000 deaths annually, more than 80% of which occur in low- and middle-income countries.2,3 It is one of the most prevalent neurological disorders among children and adolescents. It is characterized by abnormal, excessive discharge of brain neurons, which leads to recurrent, paroxysmal, and transient disturbances of central nervous system function and significantly affects quality of life and cognitive development.4 According to incomplete statistics, the prevalence of epilepsy among children is approximately 0.34%. Epilepsy can cause significant nervous system damage, such as impaired autonomous learning, delayed brain development, and cognitive decline.5,6 Studies have shown that the overall risk of death in children with epilepsy is about ten times that of the general population.7 In addition, epilepsy type, age, gender, comorbidities, and potential drug interactions are critical factors in selecting appropriate treatment. Bone health risks associated with antiepileptic drugs and weight gain are essential considerations for children and adolescents. Psychiatric comorbidities, particularly depression and anxiety, may profoundly affect seizure control and overall quality of life.8,9
Epilepsy currently remains incurable in the sense that no definitive treatment can permanently halt all seizures.10 It is well established that early intervention and meticulous management can significantly control disease progression and improve quality of life for individuals with epilepsy. Compared with adults, children and adolescents experience epilepsy within a dynamic neurodevelopmental context, in which seizures and their consequences interact with rapidly maturing brain circuits. In children, accurate seizure forecasting could also enable age-adjusted interventions, such as adjusting antiseizure medication dosing during growth spurts or scheduling school activities around predicted high-risk periods. Consequently, a growing number of studies focus on children and adolescents.4,11
A predictive model is a statistical model that combines multiple variables (such as clinical indicators, biochemical markers, and imaging data) to forecast the occurrence of specific outcomes. In recent years, numerous studies have developed predictive models for epilepsy in children and adolescents.1,11 These models serve multiple purposes: facilitating epilepsy diagnosis, predicting seizure occurrence, analyzing prognosis, and estimating the efficacy of epilepsy treatment. For children and adolescents gradually transitioning to adult care, prediction models might also identify those requiring closer monitoring or specialized services. The aim of this study was to synthesize existing research on epilepsy prediction models for children and adolescents through a systematic review and meta-analysis. Our objectives were to 1) evaluate the predictive performance of these models and 2) provide constructive recommendations for developing developmentally informed, clinically impactful predictive tools.
Methods
This review is reported according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement. The review protocol was registered prospectively and published in the International Prospective Register of Systematic Reviews (PROSPERO; registration number CRD42025637913). At a peer reviewer's suggestion, the search period was subsequently extended to August 31, 2025.
Search strategy
A comprehensive literature search was conducted from database inception to August 31, 2025, in China National Knowledge Infrastructure (CNKI), Wanfang Database, SinoMed, China Science and Technology Journal Database (VIP), PubMed, Embase, CINAHL, and Web of Science. Search terms included keywords such as “epilepsy,” “prediction models,” “children,” and “adolescents.”
Inclusion and exclusion criteria
Studies were included if they met the following criteria: (1) the study population comprised children and adolescents with epilepsy aged ≤18 years; (2) the study objective was to develop predictive models for epilepsy, each including at least two predictors and evaluated using metrics such as the area under the curve (AUC); (3) the study was original research published in a peer-reviewed journal, with designs such as prospective cohort, retrospective cohort, or case–control studies; (4) the prediction outcomes focused primarily on epilepsy diagnosis and epileptic seizures, and studies predicting epilepsy prognosis were also included if their evaluated outcomes included epileptic seizures; and (5) the article was published in Chinese or English. Studies were excluded if they (1) were duplicate publications; (2) described only research protocols, methodological frameworks, or recruitment plans without empirical data analysis, including studies in active recruitment without preliminary results; (3) were conference summaries, abstracts, letters, communications, comments, or similar; or (4) had no retrievable full text despite email requests to the authors.
Study screening
Two authors independently screened the studies. First, duplicates were removed. Second, the titles and abstracts of the remaining studies were screened to judge whether they aligned with the aim of this review. Third, full texts were reviewed against the inclusion and exclusion criteria, and their reference lists were examined to identify other potentially relevant studies. Disagreements were resolved by consensus in a group discussion involving three authors.
Data extraction
Two reviewers independently extracted data from the included studies using a standardized data extraction form. The form included information on the study characteristics (e.g., authors, year of publication, country, study design, sample size), population characteristics (e.g., study population, age range), model characteristics (e.g., model type, input variables), and model performance (e.g., sensitivity, specificity, area under the receiver operating characteristic curve).
Quality assessment
The methodological quality of the included studies was assessed using the Prediction Model Risk of Bias Assessment Tool (PROBAST) checklist.12 Studies employing machine learning were evaluated using the updated PROBAST-AI checklist.13 PROBAST covers four domains: participants, predictors, outcome, and analysis. Each item is answered “yes”, “probably yes”, “no”, “probably no”, or “no information”. A domain is rated as high risk of bias if any of its items is answered “no” or “probably no”, and the overall risk of bias is rated low only if all domains are at low risk. First, two researchers who had received evidence-based training jointly discussed the PROBAST items to establish unified evaluation criteria. Second, five of the included studies were randomly selected and evaluated jointly by the research team so that all members applied the criteria consistently in practice. Finally, two researchers independently assessed risk of bias and applicability for each study. The Kappa values between the two evaluators ranged from 0.467 to 0.717 for risk of bias and were all 1 for applicability across the three domains (see Supplementary Table S1). Inconsistent evaluations were resolved by consensus within the research team.
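The Kappa statistic measures inter-rater agreement beyond what chance alone would produce. As an illustration, here is a minimal sketch of Cohen's kappa, the usual two-rater statistic (assumed here, since the text reports only “Kappa values”). It computes the observed agreement p_o and the chance agreement p_e from each rater's marginal label frequencies, then returns (p_o − p_e)/(1 − p_e). The function name and example ratings are hypothetical and are not the review's data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items (illustrative sketch)."""
    if not rater_a or len(rater_a) != len(rater_b):
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label for every item; kappa is
        # undefined here, and returning 1.0 is a convention of this sketch.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical risk-of-bias ratings for ten studies
rater_1 = ["high"] * 6 + ["low"] * 4
rater_2 = ["high"] * 5 + ["low"] * 5
print(cohens_kappa(rater_1, rater_2))
```

In this toy example the raters agree on 9 of 10 studies (p_o = 0.9), chance agreement is 0.5, so kappa is 0.8.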
Data synthesis and statistical analysis
A narrative synthesis was conducted to summarize the findings of the included studies. All statistical analyses were performed using Comprehensive Meta Analysis Software v3.0 (CMA)14,15 and R software (version 4.0.2).
Following previous studies,16,17 we conducted a meta-analysis of AUC values and their 95% confidence intervals (CIs), which quantifies model performance via the AUC and its uncertainty via the 95% CI, enabling a comprehensive assessment of model stability.18 The AUC is the probability that a model assigns a higher predicted score to a randomly selected positive case than to a randomly selected negative case, so it can be used directly as a point estimate. Because the AUC does not depend on a specific classification threshold, it reflects a model's overall discriminatory power more stably, and the accompanying 95% CI quantifies the uncertainty of the estimate. A training model is developed and optimized on a training dataset, whereas a validation model is assessed on a separate validation dataset (internal or external) to evaluate its generalizability and guard against overfitting. Meta-analyzing training and validation AUCs separately distinguishes a model's fitting capacity (performance on development data) from its predictive capacity (generalizability to independent data), thereby providing a more reliable evidence base for clinical translation, research quality evaluation, and model optimization. We therefore pooled the AUC values and their 95% CIs separately for the training and validation sets. Similar pooled AUCs in the training and validation sets imply that the models generalize well; conversely, a notably higher pooled AUC in the training set than in the validation set may signal overfitting that requires further examination and improvement.
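The probabilistic definition of the AUC can be computed directly by comparing every positive case with every negative case. The sketch below is illustrative only: the function name and scores are hypothetical, and the included studies may have computed their AUCs differently (for example, from the trapezoidal ROC curve, which gives the same value for these inputs). Ties count as one half, which makes the result equal to the Mann-Whitney U statistic divided by the number of positive-negative pairs.

```python
def empirical_auc(pos_scores, neg_scores):
    """P(score of a random positive > score of a random negative), ties counted as 1/2."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0   # positive ranked above negative
            elif p == q:
                wins += 0.5   # tie contributes half a win
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical predicted risks for children with and without the outcome
print(empirical_auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.4]))  # 7.5 wins out of 9 pairs
```

The pairwise loop is O(n_pos × n_neg), which is acceptable for illustration; rank-based formulas scale better for large samples.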
CMA accepts a point estimate together with its confidence limits and uses them to compute the standard error. When both the lower and upper limits are entered, the program computes a standard error from each limit and checks that the two estimates are comparable; if they are, it uses their average, and otherwise it flags an error. Because predictive models may be influenced by sample size, disease incidence, or measurement error, the reported lower and upper limits of the 95% CI can deviate slightly from symmetry. In such cases, an acceptable discrepancy can be defined by setting a “fudge factor” (an arbitrary adjustment or correction factor applied to account for uncertainties or biases), so that computations proceed within reasonable error margins and are not interrupted by trivial differences.14,19 In our analysis, confidence intervals were treated as symmetric around the point estimate, and a fudge factor was specified to accommodate such minor discrepancies.
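The consistency check described above can be sketched as follows. This mimics the described logic but is not CMA's internal code: the function name is hypothetical, and the absolute tolerance of 0.01 stands in for a fudge factor whose actual value the text does not report. Each confidence limit yields its own standard error, (point − lower)/z and (upper − point)/z, and the two are averaged only if they agree within the tolerance.

```python
from statistics import NormalDist


def se_from_ci(point, lower, upper, level=0.95, tolerance=0.01):
    """Back-calculate a standard error from a point estimate and its CI limits.

    `tolerance` is a hypothetical absolute "fudge factor" for how far the two
    limit-based SE estimates may differ before the entry is rejected.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% CI
    se_lower = (point - lower) / z
    se_upper = (upper - point) / z
    if abs(se_lower - se_upper) > tolerance:
        raise ValueError(f"asymmetric CI: SE {se_lower:.4f} vs {se_upper:.4f}")
    return (se_lower + se_upper) / 2


# AUC 0.76 (95% CI 0.70–0.83) from Table 1: slightly asymmetric but accepted
print(round(se_from_ci(0.76, 0.70, 0.83), 4))
```

Under this hypothetical tolerance, an entry such as AUC 0.72 (95% CI 0.64–0.82) would be flagged, because its two limit-based standard errors differ by about 0.0102.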
The inverse variance weighting method was used in the meta-analysis.18,20 Estimates with high variance may contain more random error or outliers; by down-weighting them, inverse variance weighting reduces the influence of noisy or highly variable data on the pooled result. Because the precision of study-level AUCs varies with factors such as sample size and design, weighting by inverse variance gives more precise estimates greater influence, yielding a pooled estimate closer to the true value. Heterogeneity was assessed using the I² index, with values of 25%, 50%, and 75% indicating low, moderate, and high heterogeneity, respectively. A random-effects model was adopted if I² was greater than 50%, and a fixed-effect model if I² was less than or equal to 50%.21 Meta-regression with the Knapp-Hartung adjustment and maximum likelihood estimation was used to explore sources of heterogeneity, with publication year, sample size, and study design as moderators. The Knapp-Hartung adjustment yields more reliable inferences about the relationships between moderators and effect size by correcting for possible underestimation of the variance of the regression coefficients. Maximum likelihood estimation obtains the meta-regression parameters by maximizing the likelihood of the observed data. In addition, comparing subgroups defined by predictors, model types, and validation methods provided an indirect indication of the sources of heterogeneity.
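The pooling rule above can be sketched in a few lines: weights are 1/SE², Cochran's Q measures dispersion around the fixed-effect mean, I² = max(0, (Q − df)/Q) × 100, and the model switches to random effects when I² exceeds 50%. The between-study variance τ² is estimated here with the DerSimonian-Laird moment estimator; this is an assumption, because the text does not name the estimator, and the function name is hypothetical. The sketch does not cover the Knapp-Hartung meta-regression.

```python
import math
from statistics import NormalDist


def pool_auc(aucs, ses, level=0.95, i2_threshold=50.0):
    """Inverse-variance pooling of AUCs with the I^2 > 50% rule for model choice.

    tau^2 uses the DerSimonian-Laird estimator (an assumption of this sketch).
    """
    k = len(aucs)
    if k < 2 or k != len(ses):
        raise ValueError("need at least two AUCs with matching standard errors")
    w = [1 / se**2 for se in ses]  # fixed-effect weights: 1 / variance
    sum_w = sum(w)
    mu_fixed = sum(wi * y for wi, y in zip(w, aucs)) / sum_w
    q = sum(wi * (y - mu_fixed) ** 2 for wi, y in zip(w, aucs))  # Cochran's Q
    df = k - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    tau2, model = 0.0, "fixed"
    if i2 > i2_threshold:
        c = sum_w - sum(wi**2 for wi in w) / sum_w
        tau2 = max(0.0, (q - df) / c)           # DerSimonian-Laird tau^2
        w = [1 / (se**2 + tau2) for se in ses]  # random-effects weights
        model = "random"
    mu = sum(wi * y for wi, y in zip(w, aucs)) / sum(w)
    se_mu = math.sqrt(1 / sum(w))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return {"model": model, "auc": mu, "ci": (mu - z * se_mu, mu + z * se_mu),
            "i2": i2, "tau2": tau2}


# Two hypothetical, clearly discordant AUCs trigger the random-effects branch
print(pool_auc([0.60, 0.90], [0.03, 0.03]))
```

With these toy inputs Q is about 50 on 1 degree of freedom, so I² is about 98% and the random-effects weights widen the confidence interval around the pooled AUC of 0.75.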
Egger's test was used to detect publication bias, with p > 0.05 taken to indicate no significant evidence of publication bias.22
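Egger's test regresses each study's standardized effect (effect/SE) on its precision (1/SE) and asks whether the intercept departs from zero, which would indicate funnel-plot asymmetry. The sketch below is a plain ordinary-least-squares version for illustration, not CMA's implementation; the function name and toy inputs are hypothetical. It returns the intercept, its standard error, the t statistic, and the degrees of freedom (k − 2).

```python
import math


def egger_test(effects, ses):
    """Egger's regression: (effect / SE) regressed on (1 / SE) by OLS.

    Returns (intercept, SE of intercept, t statistic, df). Requires at least
    three studies, unequal SEs, and a non-perfect fit.
    """
    k = len(effects)
    if k < 3 or k != len(ses):
        raise ValueError("need at least three studies with matching SEs")
    x = [1 / s for s in ses]                   # precision
    y = [e / s for e, s in zip(effects, ses)]  # standardized effect
    x_bar, y_bar = sum(x) / k, sum(y) / k
    sxx = sum((xi - x_bar) ** 2 for xi in x)   # zero if all SEs are equal
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sigma2 = rss / (k - 2)                     # residual variance on k - 2 df
    se_intercept = math.sqrt(sigma2 * (1 / k + x_bar**2 / sxx))
    return intercept, se_intercept, intercept / se_intercept, k - 2


# Toy inputs chosen so the fitted intercept is 1.0
print(egger_test([2.0, 1.5, 5 / 3, 1.25], [1.0, 0.5, 1 / 3, 0.25]))
```

A two-sided p-value follows from a t distribution with the returned degrees of freedom, for example `2 * scipy.stats.t.sf(abs(t), df)` if SciPy is available.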
The sensitivity analysis used the leave-one-out method to assess the robustness of the results.23 Some models were developed on the same dataset; although models built with different algorithms differ in their variable associations and coefficients, it was still necessary to examine whether this shared data unduly influenced the results. We therefore used the “metafor” and “robumeta” R packages to conduct robust variance estimation,24,25 with dataset and model incorporated as stratification factors to examine their potential interaction effects. Because the studies are independent of each other but some models share the same dataset, we applied hierarchical effects weights25 to calculate the adjusted pooled AUC.
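The leave-one-out procedure re-pools the AUC k times, each time omitting one model, so that an estimate that shifts markedly when a single model is dropped points to an influential model. The sketch below assumes DerSimonian-Laird random-effects pooling (not stated in the text), uses hypothetical function names and labels, and does not reproduce the robust variance estimation, which was run with the R packages named above.

```python
def dl_pool(aucs, ses):
    """DerSimonian-Laird random-effects pooled estimate (assumed estimator)."""
    w = [1 / s**2 for s in ses]
    sum_w = sum(w)
    mu_fixed = sum(wi * y for wi, y in zip(w, aucs)) / sum_w
    q = sum(wi * (y - mu_fixed) ** 2 for wi, y in zip(w, aucs))
    c = sum_w - sum(wi**2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (len(aucs) - 1)) / c) if c > 0 else 0.0
    w_re = [1 / (s**2 + tau2) for s in ses]
    return sum(wi * y for wi, y in zip(w_re, aucs)) / sum(w_re)


def leave_one_out(aucs, ses, labels):
    """Map each label to the pooled AUC obtained with that model omitted."""
    if len(aucs) < 3 or not (len(aucs) == len(ses) == len(labels)):
        raise ValueError("need at least three models with matching SEs and labels")
    return {
        label: dl_pool(aucs[:i] + aucs[i + 1:], ses[:i] + ses[i + 1:])
        for i, label in enumerate(labels)
    }


# Hypothetical models; dropping the outlying model "C" moves the estimate most
print(leave_one_out([0.70, 0.72, 0.95], [0.03, 0.03, 0.03], ["A", "B", "C"]))
```

Comparing each leave-one-out estimate with the full pooled AUC shows how much any single model drives the result.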
Role of the funding source
The funder of the study had no role in study design, data collection, data analysis, data interpretation, or writing of the report.
Results
Study selection
The study screening flowchart is shown in Fig. 1. Database searches initially identified 5040 studies. After 181 full-text articles were assessed for eligibility, 27 studies were included in the final analysis.
Fig. 1.
Flowchart of study selection process.
Study characteristics
Sixteen studies originated from China, three from Canada, two each from the USA, the UK, and the Netherlands, and one each from India and Israel. Nineteen studies were retrospective, seven were prospective, and one was a mixed cohort study including both retrospective and prospective data. A total of 56 models were constructed; 48 reported results on the training dataset and 28 on the validation dataset, and only 20 models (35.71%) reported results for both. Most models were designed to predict epileptic seizures. Two studies aimed to predict seizure recurrence after discontinuation of antiseizure medication in children with epilepsy, one aimed to identify candidates for resective epilepsy surgery earlier in the disease course, and another focused on predicting valproic acid-induced dyslipidemia. Traditional logistic regression remained predominant, employed in 18 of 27 studies (66.67%), while machine learning algorithms (e.g., support vector machine, random forest, and XGBoost) gained traction. Variable selection methods across the 27 studies showed a dual paradigm. Traditional statistical approaches dominated, with univariate analysis the most frequent method (12 studies), followed by stepwise regression (4 studies) and recursive feature elimination (4 studies). Machine learning techniques, namely the LASSO algorithm (2 studies), random forest variable importance (1 study), and classification tree-based selection (1 study), were applied in four studies. Notably, 4 studies applied multiple variable selection strategies sequentially. The predictors in the final models consisted mainly of three components: clinical features (e.g., seizure type and etiology), electrophysiological data (EEG abnormalities), and neuroimaging findings (MRI lesions), with growing emphasis on treatment-related factors (e.g., antiepileptic drug response).
The characteristics of these studies are detailed in Table 1.
Table 1.
Characteristics of the included studies.
| Study | Country | Study design | Sample size | Study population | Age | Prediction outcome | Variable selection | Included variables | Model | Training | Validation |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Arts et al., 199926 | Netherlands | Prospective cohort study | 466 | Children with epilepsy | 1 month to 16 years | Childhood epilepsy with a poor short-term outcome | Stepwise regression | Number of seizures before intake, seizure type, etiology, preexisting neurologic signs, number of seizures in first 6 months after intake, 3 months remission in first 6 months follow-up, EEG at intake, EEG at 6 months after intake | Logistic regression model | SE: 29%, SP: 89%, PPV: 66%, NPV: 79% | / |
| Huang et al., 201427 | China | Prospective cohort study | 649 | Children with epilepsy | <12 years | Drug-resistant epilepsy (6 months after diagnosis) | Univariate analysis | Clinical characteristics (gender, family history of seizures, preterm delivery) | Logistic regression model | AUC: 0.52 | / |
|  |  |  |  |  |  |  |  | Neurological physical abnormality, abnormal neuroimaging, abnormal EEG, febrile convulsion, aura, age at onset, duration of seizures, number of seizures before diagnosis, partial epilepsy |  | AUC: 0.77 |  |
|  |  |  |  |  |  |  |  | Neurological physical abnormality, age at onset of epilepsy under 1 year, more than 10 seizures before diagnosis, partial epilepsy |  | AUC: 0.78 |  |
|  |  |  |  |  |  |  |  | Neurological physical abnormality, more than 10 seizures before diagnosis, partial epilepsy |  | AUC: 0.76 |  |
|  |  |  |  |  |  |  |  | Age at onset of epilepsy under 1 year, more than 10 seizures before diagnosis, partial epilepsy |  | AUC: 0.76 |  |
| van Diessen et al., 201828 | Netherlands | Retrospective study | Training: 451 Validation: 187 | Children who visited the outpatient department for diagnostic workup related to 1 or more paroxysmal events | Training: 5.9 years Validation: 7.8 years | Presence or absence of epilepsy | Backward selection using the Akaike information criterion | Clinical characteristics (sex, age of first seizure, event description, medical history) | Logistic regression model | / | External validation: AUC: 0.67 (95% CI 0.59–0.74) |
|  |  |  |  |  |  |  |  | EEG |  |  | External validation: AUC: 0.82 (95% CI 0.77–0.88) |
|  |  |  |  |  |  |  |  | Clinical characteristics (sex, age of first seizure, event description, medical history), EEG |  |  | External validation: AUC: 0.86 (95% CI 0.80–0.92), SE: 0.62 (95% CI 0.51–0.72), SP: 0.96 (95% CI 0.90–0.99), PPV: 0.93 (95% CI 0.83–0.97), NPV: 0.76 (95% CI 0.70–0.80) |
| Sansevere et al., 201929 | USA | Retrospective study | 210 | Neonates in the Neonatal Intensive Care Unit | / | Seizure occurrence | Stepwise regression using the Akaike information criterion | Clinical features (sex, postconceptional age, EEG indication, and disorders and therapies associated with a high risk for seizures) | Cox proportional hazard regression model | AUC: 0.66 (95% CI 0.59–0.74), SE: 58.9%, SP: 66.4%, PPV: 48.3%, NPV: 75.2% | / |
|  |  |  |  |  |  |  |  | EEG |  | AUC: 0.76 (95% CI 0.70–0.83), SE: 86.3%, SP: 56.2%, PPV: 51.2%, NPV: 88.5% |  |
|  |  |  |  |  |  |  |  | Clinical features (sex, postconceptional age, EEG indication, and disorders and therapies associated with a high risk for seizures), EEG |  | AUC: 0.83 (95% CI 0.78–0.88), SE: 83.6%, SP: 68.6%, PPV: 58.7%, NPV: 88.7% |  |
| Chen, 202030 | China | Retrospective case–control study | Training: 273 Validation: 131 | Children with epilepsy | 7 years | Refractory epilepsy | Univariate analysis | Cause of disease, change in seizure type, EEG after half a year of treatment, the effect of initial medication | Logistic regression model | AUC: 0.94, SE: 0.91, SP: 0.83 | Internal validation (Ten-fold cross-validation): AUC: 0.93, SE: 0.92, SP: 0.83 |
|  |  |  |  |  |  |  | Automatic variable selection via classification tree model | Initial response to antiepileptic drugs, etiology, pre-treatment brain MRI findings, and pre-treatment psychomotor development | Classification tree model | AUC: 0.93, SE: 0.79, SP: 0.96 | Internal validation (Ten-fold cross-validation): AUC: 0.92, SE: 0.95, SP: 0.76 |
| Latzer et al., 202031 | Israel | Retrospective study | 118 | Children with both cerebral palsy and epilepsy | 0–17 years | Drug-resistant epilepsy | Stepwise backward elimination | Birth weight, gestational age, low APGAR score at 1 min, low APGAR score at 5 min, resuscitation at delivery, hypoxic-ischemic encephalopathy, gross brain malformation, microcephaly, younger age of seizure onset, neonatal seizures, focal-onset epilepsy, focal slowing on EEG | Logistic regression model | AUC: 0.80 | / |
|  |  |  |  |  |  |  |  | Low APGAR score at 5 min, neonatal seizures, focal-onset epilepsy, focal slowing on EEG |  | AUC: 0.84 |  |
| Jing et al., 202132 | China | Retrospective study | 138 | Children with active epilepsy | 4 months to 12 years | Valproic acid treatment effect | / | Compliance with nursing instructions, course of disease, frequency of illness within one year, valproic acid concentration reaching standard | Logistic regression model | AUC: 0.90 (95% CI: 0.86–0.95) | Internal validation (Bootstrap resampling method): C-index: 0.90 |
| Andreas et al., 202233 | UK | Multicenter retrospective cohort study | Training: 743 Validation 1: 203 Validation 2: 72 | Patients with Dravet syndrome and patients with GEFS+ carrying pathogenic SCN1A variants | >24 months | SCN1A-related epilepsies | / | SCN1A genetic score, age at seizure onset | Generalized linear model | AUC: 0.89 (95% CI: 0.86–0.92) | External validation 1: AUC: 0.94 (95% CI: 0.91–0.97); External validation 2: AUC: 0.92 (95% CI: 0.82–1.00); C-index: 0.38 (95% CI: 0.28–0.49) |
| Azriel et al., 202234 | Canada | Retrospective study | Training: 111 Validation: 55 | Children with a reduced level of consciousness in the pediatric intensive care unit | 5.1 years | Epileptic seizures | Recursive feature elimination | Clinical features | Random forest model | AUC: 0.95 | Internal validation (5-fold cross-validation): AUC: 0.85 |
|  |  |  |  |  |  |  |  | Age, heart rate variability, ECG features |  | AUC: 0.94 | Internal validation (5-fold cross-validation): AUC: 0.84 |
|  |  |  |  |  |  |  |  | Clinical features, heart rate variability, ECG features |  | AUC: 0.96 | Internal validation (5-fold cross-validation): AUC: 0.87 |
| Chen et al., 202235 | China | Retrospective study | 328 | Japanese encephalitis children | ≤18 years | Epilepsy after Japanese encephalitis | LASSO algorithm | Seizure number >5, status epilepticus, coma | Logistic regression model | AUC: 0.89 (95% CI 0.87–0.97) | / |
| Geng & Chen, 202236 | China | Retrospective study | Training: 679 Validation: 347 | Children with epilepsy | / | Drug-resistant epilepsy | Univariate analysis | Onset age, >20 pretreatment seizures, etiology, developmental and epileptic encephalopathy, neurological abnormalities, status epilepticus, focal seizure | Logistic regression model | AUC: 0.92, C-index: 0.92 (95% CI 0.92–0.93) | Internal validation (Bootstrap resampling method): AUC: 0.91, C-index: 0.91 (95% CI 0.90–0.91), AC: 0.87 (95% CI 0.88–0.90), SE: 0.93, SP: 0.76; External validation: AC: 0.87 (95% CI 0.84–0.90), SE: 0.93, SP: 0.76 |
| Ma et al., 202237 | China | Prospective cohort study | Training: 70 Validation: 18 | Children with drug-resistant epilepsy | ≤16 years | Postoperative seizure | Filter method with the F-score and wrapper method with recursive feature elimination | Clinical features | Support vector machine linear model | AUC: 0.610, AC: 51.4%, PR: 60% | / |
|  |  |  |  |  |  |  |  | EEG synchronization features |  | AUC: 0.741, AC: 61.4%, PR: 67.5% | / |
|  |  |  |  |  |  |  |  | Clinical features, EEG synchronization features |  | AUC: 0.766, AC: 75.7%, PR: 80.8% | External validation: AUC: 0.774, AC: 61.1% |
| Yossofzai et al., 202238 | Canada | Retrospective multicenter cohort study | Training: 641 Validation: 160 | Children who were treated with epilepsy surgery | ≤18 years | Seizure outcome after pediatric epilepsy surgery | Univariate analysis | Number of antiseizure medications, MRI lesion, age at seizure onset, surgery type, vEEG | XGBoost model | AUC: 0.73 (95% CI: 0.69–0.77) | External validation: AUC: 0.74 (95% CI: 0.66–0.82), SE: 0.87 (95% CI: 0.81–0.94), SP: 0.58 (95% CI: 0.47–0.71), PPV: 0.77 (95% CI: 0.72–0.82), NPV: 0.75 (95% CI: 0.64–0.86) |
|  |  |  |  |  |  |  |  |  | Logistic regression model | AUC: 0.72 (95% CI: 0.68–0.76) | External validation: AUC: 0.72 (95% CI: 0.63–0.80), SE: 0.72 (95% CI: 0.63–0.80), SP: 0.66 (95% CI: 0.53–0.77), PPV: 0.77 (95% CI: 0.71–0.84), NPV: 0.60 (95% CI: 0.52–0.70) |
| Zhao et al., 202239 | China | Retrospective study | 103 | Children with epilepsy and rare tuberous sclerosis complex | 0–190 months | Drug treatment outcome (seizures for at least 1 year) | Univariate analysis | Clinical features, MRI features, EEG features | Combining 6 models (decision tree, random forest, support vector machine, Naive Bayes, logistic regression and multilayer perceptron) | AUC: 0.812 | / |
|  |  |  |  |  |  |  |  | Clinical features, MRI features (without MRI lesion quantity), EEG features |  | AUC: 0.795 |  |
|  |  |  |  |  |  |  |  | Clinical features, MRI features (without MRI lesion lobe), EEG features |  | AUC: 0.554 |  |
|  |  |  |  |  |  |  |  | Clinical features, MRI features (without MRI lesion type), EEG features |  | AUC: 0.471 |  |
|  |  |  |  |  |  |  |  | Clinical features, EEG features |  | AUC: 0.444 |  |
| Eriksson et al., 202340 | UK | Retrospective study | 797 | Children underwent epilepsy surgery | / | Postoperative seizure outcome | Univariate analysis | handedness, educational status, genetic findings, age of epilepsy onset, history of infantile spasms, spasms at time of preoperative evaluation, number of seizure types at time of preoperative evaluation, total number of antiseizure medications trialed, MRI bilaterality, MRI diagnosis, type of surgery performed, lobe operated on, histopathology diagnosis | Logistic regression model | AUC: 0.72 (95% CI: 0.64–0.82), AC: 72% (95% CI: 68%–75%) | / |
| Multilayer perceptron model | AUC: 0.70 (95% CI: 0.63–0.82), AC: 71% (95% CI: 67%–74%) | ||||||||||
| XGBoost model | AUC: 0.70 (95% CI: 0.62–0.83), AC: 71% (95% CI: 68%–75%) | ||||||||||
| Hu et al., 202341 | China | Retrospective study | Training: 75 Validation: 30 | Children with tuberous sclerosis complex–related epilepsy | >1 year | Drug treatment efficacy | Spearman correlation and LASSO algorithm | Age of onset, infantile spasms, epileptiform discharge in left parietooccipital area of EEG, ASM numbers, gene mutation type, radiomic features | Combining 11 machine learning models | AUC: 0.96, SE: 0.97, SP: 0.86 | Internal validation (Ten-fold cross-validation): AUC: 0.94, SE: 0.94, SP: 0.84; External validation: AUC: 0.854, SE: 0.75, SP: 0.83 |
| Panda et al., 202342 | India | Prospective study | 161 | Children with neurocysticercosis | 4–18 years | Seizure recurrence | Univariate analysis | Presence of epileptiform abnormalities in EEG, more than 5 NCC lesions, the presence of perilesional edema exceeding 2 cm, a history of a cluster of seizures before presentation | Logistic regression model | AUC: 0.89 (95% CI: 0.81–0.95) | / |
| Wu et al., 202343 | China | Retrospective study | 97 | Children with epilepsy secondary to focal cortical dysplasia who had undergone resection surgery | / | Postoperative seizure outcomes | / | Detectable lesion on MRI, temporal focal cortical dysplasia, complete resection, focal cortical dysplasia type, risk of persistent seizure at 0.5-year, risk of persistent seizure at 1-year, risk of persistent seizure at 2-year | Cox proportional hazard regression model | C-index: 0.88 (95% CI: 0.85–0.95) | / |
| Sun et al., 202344 | China | Retrospective study | Training: 184 Validation: 71 | Children with encephalitis | 3–12 years | Epilepsy after encephalitis | Univariate analysis | The level of hemoglobin and globulin, the proportion of patients with fever or epilepsy frequency ≥10/day, the value of EEG S/F in forehead, temporal, and occipital region | Logistic regression model | AUC: 0.835 (95% CI: 0.745–0.925), SE: 79.3%, SP: 82.6% | External validation: AUC: 0.712 (95% CI: 0.469–0.956), SE: 87.7%, SP: 90.0% |
| Yossofzai et al., 202345 | Canada | Multicenter retrospective and prospective observational cohort study | Training: 219 Validation: 49 | Children with drug-resistant epilepsy | ≤18 years | Seizure-free outcome following MR-guided laser interstitial thermal therapy | / | Clinical features (age at MR-guided laser interstitial thermal therapy, sex, age at seizure onset, preoperative seizure frequency, number of ASMs, seizure type, and whether the child underwent prior open epilepsy surgery), diagnostic features (type of MRI-identified lesion, lesion size, and vEEG concordance to ablation site), ablation features (type of MR-guided laser interstitial thermal therapy and site of ablation) | Gradient-boosting machine model | AUC: 0.67 | Internal validation (Ten-fold cross-validation): AUC: 0.67 (95% CI 0.50–0.82); SE: 0.71 (95% CI 0.47–0.88); SP: 0.66 (95% CI 0.50–0.81); PPV: 0.52 (95% CI 0.38–0.68); NPV: 0.81 (95% CI 0.68–0.93) |
| Logistic regression model | AUC: 0.63 | Internal validation (Ten-fold cross-validation): AUC: 0.58 (95% CI 0.33–0.86); SE: 0.41 (95% CI 0.18–0.65); SP: 0.84 (95% CI 0.72–0.97); PPV: 0.58 (95% CI 0.33–0.86); NPV: 0.73 (95% CI 0.65–0.83) | |||||||||
| Random forest model | AUC: 0.66 | Internal validation (Ten-fold cross-validation): AUC: 0.61 (95% CI 0.43–0.77); SE: 0.35 (95% CI 0.12–0.59); SP: 0.93 (95% CI 0.84–1.00); PPV: 0.75 (95% CI 0.44–1.00); NPV: 0.73 (95% CI 0.67–0.81) | |||||||||
| Support vector machine model | AUC: 0.64 | Internal validation (Ten-fold cross-validation): AUC: 0.62 (95% CI 0.44–0.78); SE: 0.41 (95% CI 0.18–0.65); SP: 0.81 (95% CI 0.69–0.94); PPV: 0.54 (95% CI 0.30–0.80); NPV: 0.72 (95% CI 0.64–0.82) | |||||||||
| Neural network model | AUC: 0.66 | Internal validation (Ten-fold cross-validation): AUC: 0.58 (95% CI 0.41–0.73); SE: 0.82 (95% CI 0.65–1.00); SP: 0.38 (95% CI 0.22–0.56); PPV: 0.41 (95% CI 0.33–0.50); NPV: 0.80 (95% CI 0.62–1.00) | |||||||||
| Univariate analysis | Age at seizure onset, preoperative seizure frequency, number of ASMs, vEEG concordance, and lesion size | Gradient-boosting machine model | / | Internal validation (Ten-fold cross-validation): AUC: 0.68 (95% CI 0.51–0.84); SE: 0.53 (95% CI 0.29–0.76); SP: 0.84 (95% CI 0.72–0.97); PPV: 0.64 (95% CI 0.44–0.89); NPV: 0.77 (95% CI 0.69–0.88) | |||||||
| Logistic regression model | Internal validation (Ten-fold cross-validation): AUC: 0.65 (95% CI 0.47–0.81); SE: 0.29 (95% CI 0.13–0.53); SP: 1.00 (95% CI 0.89–1.00); PPV: 1.00 (95% CI 0.57–1.00); NPV: 0.73 (95% CI 0.68–0.80) | ||||||||||
| Boruta and recursive feature elimination | vEEG concordance, sex, lesion size, number of ASMs, MRI-identified lesion, and preoperative seizure frequency | Gradient-boosting machine model | / | Internal validation (Ten-fold cross-validation): AUC: 0.64 (95% CI 0.47–0.80); SE: 0.94 (95% CI 0.82–0.94); SP: 0.34 (95% CI 0.19–0.50); PPV: 0.43 (95% CI 0.37–0.51); NPV: 0.91 (95% CI 0.75–1.00) | |||||||
| Logistic regression model | Internal validation (Ten-fold cross-validation): AUC: 0.65 (95% CI 0.47–0.81); SE: 0.29 (95% CI 0.13–0.53); SP: 1.00 (95% CI 0.89–1.00); PPV: 1.00 (95% CI 0.57–1.00); NPV: 0.73 (95% CI 0.68–0.80) | ||||||||||
| Cheng et al., 202446 | China | Prospective cohort study | 65 | Pediatric patients with drug-resistant epilepsy | ≤16 years | Vagus nerve stimulation efficacy | A wrapper method of recursive feature elimination | Clinical features (duration of epilepsy, BMI, unknown type of seizure) and EEG signals (nodal features in C3, T5, P3, O1, HHSE feature) | Linear Support Vector Machine model | AUC: 0.839, AC: 81.5%, PR: 80.1% | / |
| Li et al., 202447 | China | Retrospective study | 296 | Children with febrile seizures | 3 months-3 years | Epileptic seizures | Univariate analysis | Age of first seizure, number of first seizures, complex febrile seizures, EEG, family history of epilepsy | Logistic regression model | AUC: 0.896 (95% CI: 0.851–0.940) | / |
| Liang et al., 202448 | China | Retrospective cohort study | 157 | Children with epilepsy | ≤16 years | Valproic acid induced dyslipidemia | Univariate analysis | DBIL, duration of medication, ALB, BMI, AST | Logistic regression model | AUC: 0.777 (95% CI: 0.706–0.849), SE: 0.733, SP: 0.746 | / |
| Wissel et al., 202449 | USA | Multicenter, prospective, longitudinal cohort study | Training: 5285 Validation: 1633 | Children with epilepsy | Training: 13.4 years Validation: 12.5 years | Resective epilepsy surgical candidates earlier in the disease course | Correlation-based filter and variable importance measures from a random forest | Electronic health record (EEG and MRI reports, neurology notes, clinical features) | Random forest model | AUC: 0.95 (95% CI: 0.93–0.96) | External validation: AUC: 0.91 (95% CI: 0.87–0.94), SE: 0.93 (95% CI 0.80–0.98), SP: 0.71 (95% CI 0.69–0.74), PPV: 0.08 (95% CI 0.05–0.10), NPV: 1.00 (95% CI 0.99–1.00) |
| Zhang et al., 202450 | China | Retrospective study | 126 | Children with drug-resistant epilepsy | 5.55 years | Epilepsy prognosis | Univariate analysis | Age at onset, etiology, frequency of attacks before treatment, cognitive disability | Logistic regression model | AUC: 0.828 (95% CI: 0.754–0.902), SE: 72.22%, SP: 84.72% | / |
| Dai et al., 2025a,51 | China | Prospective cohort study | 341 | Children with epilepsy | 7.01 years | Recurrence after drug withdrawal | Meta-analysis | Intellectual disability, abnormal neurological examination or motor deficit, history of febrile seizures, only focal onset seizures, overall number of ASM used, duration of epilepsy ≥3 years, abnormal EEG at the start of ASM tapering, abnormal EEG after ASM tapering, and age at first seizure ≥10 years | β coefficients derived from log-transformed pooled relative risks to establish weighted scores in the predictive model | / | AUC: 0.850 (95% CI: 0.810–0.910), AC: 0.770 (95% CI: 0.720–0.810), SE: 0.740 (95% CI: 0.680–0.800), SP: 0.820 (95% CI: 0.750–0.890) |
| Kuang et al., 202552 | China | Retrospective cohort study | Training: 212 Validation: 106 | Children with epilepsy | ≤15 years | Recurrence after drug withdrawal | / | Clinical characteristics (age at onset, high frequency of epilepsy before treatment, timing of drug withdrawal, drug withdrawal duration, drug number) and abnormal vEEG before drug withdrawal | Logistic regression model | AUC: 0.865 (95% CI 0.812–0.908), SE: 79.37%, SP: 88.59% | AUC: 0.857 (95% CI 0.775–0.917), SE: 76.67%, SP: 85.53% |
AC: accuracy; ALB: serum albumin; ASM: antiseizure medication; AST: aspartate aminotransferase; AUC: area under the curve; BMI: body mass index; C-index: concordance index; CI: confidence interval; DBIL: direct bilirubin; ECG: electrocardiogram; EEG: electroencephalography; HHSE: Hilbert-Huang spectral entropy; LASSO: least absolute shrinkage and selection operator; MR: magnetic resonance; MRI: magnetic resonance imaging; NCC: neurocysticercosis; NPV: negative predictive value; PPV: positive predictive value; PR: precision; SE: sensitivity; SP: specificity; vEEG: video-electroencephalography; /: not reported.
Model development was based on meta-analysis of patient data from cohort studies.
Study assessment
The assessment results for the included studies are detailed in Table 2. In total, 25 studies were at high risk of bias. Among the included studies, 19 employed a retrospective design. Retrospective studies often lack blinding, so prior knowledge of outcomes may influence the assessment of predictors; this form of bias is particularly prevalent in diagnostic model development. In addition, the statistical methods in most studies were limited: predictors were mainly selected by univariate analysis, evaluation focused exclusively on predictive performance while calibration was neglected, and only 12 studies used cross-validation or an external validation set to verify the model. Where calibration was assessed, calibration curves and the Hosmer–Lemeshow test were the main methods. In the nine studies employing machine learning, the main risks identified were: (1) a lack of detailed inclusion and exclusion criteria; (2) small sample sizes in some studies, with no assessment of whether the sample size was adequate; and (3) no description of missing-data handling in most studies, with direct exclusion of samples with missing values being the most common approach among those that did report it.
Table 2.
The results of the quality assessment of included studies.a
| Study | ROB: Participants | ROB: Predictors | ROB: Outcome | ROB: Analysis | Applicability: Participants | Applicability: Predictors | Applicability: Outcome | Overall: ROB | Overall: Applicability |
|---|---|---|---|---|---|---|---|---|---|
| Arts et al., 199926 | + | + | ? | – | + | + | + | – | + |
| Huang et al., 201427 | + | + | + | – | + | + | + | – | + |
| van Diessen et al., 201828 | – | + | + | – | + | + | + | – | + |
| Sansevere et al., 201929 | – | – | + | – | + | + | + | – | + |
| Chen, 202030 | – | – | + | – | + | + | + | – | + |
| Latzer et al., 202031 | – | – | + | – | + | + | + | – | + |
| Jing et al., 202132 | – | – | ? | – | + | + | + | – | + |
| Andreas et al., 202233 | – | + | + | – | + | + | + | – | + |
| Azriel et al., 202234: Model development | – | – | + | – | + | + | + | – | + |
| Azriel et al., 202234: Model evaluation | – | – | + | – | + | + | + | – | + |
| Chen et al., 202235 | – | – | + | – | + | + | + | – | + |
| Geng & Chen, 202236 | – | – | + | – | + | + | + | – | + |
| Ma et al., 202237: Model development | + | + | + | – | + | + | + | – | + |
| Ma et al., 202237: Model evaluation | + | + | + | – | + | + | + | – | + |
| Yossofzai et al., 202238: Model development | + | + | + | – | + | + | + | – | + |
| Yossofzai et al., 202238: Model evaluation | + | + | + | – | + | + | + | – | + |
| Zhao et al., 202239: Model development | + | ? | + | – | + | + | + | – | + |
| Zhao et al., 202239: Model evaluation | / | / | / | / | / | / | / | / | / |
| Eriksson et al., 202340: Model development | – | + | ? | + | + | + | + | – | + |
| Eriksson et al., 202340: Model evaluation | / | / | / | / | / | / | / | / | / |
| Hu et al., 202341: Model development | + | – | + | – | + | + | + | – | + |
| Hu et al., 202341: Model evaluation | + | – | + | – | + | + | + | – | + |
| Panda et al., 202342 | + | ? | + | – | + | + | + | – | + |
| Wu et al., 202343 | – | – | + | – | + | + | + | – | + |
| Sun et al., 202344 | – | – | ? | – | + | + | + | – | + |
| Yossofzai et al., 202345: Model development | + | + | + | – | + | + | + | – | + |
| Yossofzai et al., 202345: Model evaluation | + | + | + | – | + | + | + | – | + |
| Cheng et al., 202446: Model development | + | + | + | – | + | + | + | – | + |
| Cheng et al., 202446: Model evaluation | / | / | / | / | / | / | / | / | / |
| Li et al., 202447 | – | – | ? | – | + | + | + | – | + |
| Liang et al., 202448 | – | – | + | – | + | + | + | – | + |
| Wissel et al., 202449: Model development | + | + | + | + | + | + | + | + | + |
| Wissel et al., 202449: Model evaluation | + | + | + | + | + | + | + | + | + |
| Zhang et al., 202450 | – | – | + | – | + | + | + | – | + |
| Dai et al., 202551: Model development | + | + | + | + | + | + | + | + | + |
| Dai et al., 202551: Model evaluation | + | + | + | + | + | + | + | + | + |
| Kuang et al., 202552: Model development | – | – | + | – | + | + | + | – | + |
| Kuang et al., 202552: Model evaluation | – | – | + | – | + | + | + | – | + |
ROB = risk of bias. + indicates low ROB/low concern regarding applicability; – indicates high ROB/high concern regarding applicability; ? indicates unclear ROB/unclear concern regarding applicability; / indicates that the relevant results were not reported in the literature, so evaluation was not possible.
The Prediction model Risk of Bias Assessment Tool (PROBAST) was used to assess the quality of the included studies.
Meta-analysis
Insufficient reporting of model development details in the included studies limited the number of models that could be pooled. For the meta-analysis, the fudge factor was set to 2, specifically to allow more models to be included in the pooled analysis. Nevertheless, two models35,36 could not be pooled because of extreme asymmetry in their 95% CIs. For example, in the training model developed by Geng & Chen (2022),36 the AUC was 0.920 and the lower limit of its 95% CI was also 0.920. Ultimately, for the prediction of seizure occurrence, 9 studies (14 models) based on the training dataset and 5 studies (17 models) based on the validation dataset were included in the meta-analysis (Figs. 2 and 3). These models were included because they reported both the AUC and its 95% CI.
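A minimal sketch can show how an AUC's standard error is recovered from a reported 95% CI and why a degenerate interval, such as one whose lower limit equals the point estimate, cannot enter the pooled analysis. It works on the raw AUC scale. The review does not state exactly how the fudge factor was applied, so the rule used here is a hypothetical stand-in: an interval is rejected when its longer half-width exceeds `fudge` times its shorter one. The function name `se_from_ci` is illustrative.

```python
from typing import Optional

Z95 = 1.959964  # two-sided 95% standard normal quantile


def se_from_ci(auc: float, lower: float, upper: float, fudge: float = 2.0) -> Optional[float]:
    """Approximate the SE of an AUC from its reported 95% CI.

    Returns None when the interval cannot be used. The asymmetry rule based on
    `fudge` is a hypothetical illustration, not the review's documented method.
    """
    if upper <= lower or not lower <= auc <= upper:
        return None  # malformed interval
    below, above = auc - lower, upper - auc
    shorter, longer = min(below, above), max(below, above)
    if shorter == 0 or longer / shorter > fudge:
        return None  # degenerate or too asymmetric to treat as a normal-theory CI
    return (upper - lower) / (2 * Z95)


if __name__ == "__main__":
    # Sun et al. training model: AUC 0.835 (95% CI: 0.745-0.925), a symmetric interval
    print(round(se_from_ci(0.835, 0.745, 0.925), 4))
    # AUC 0.920 with lower limit 0.920; the upper limit is arbitrary here
    print(se_from_ci(0.920, 0.920, 0.950))
```

The first call prints an SE of about 0.0459. The second prints `None`, because a zero-width lower half-interval carries no usable information about sampling variance.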
Fig. 2.
Forest plot of the random effects meta-analysis of pooled AUC estimates for 14 training models. Note: Parentheses following each study denote distinct models derived from that study. Refer to Table 1 for details.
Fig. 3.
Forest plot of the random effects meta-analysis of pooled AUC estimates for 17 validation models. Note: Parentheses following each study denote distinct models derived from that study. Refer to Table 1 for details.
Training model
Using a random-effects model, the pooled AUC of the 14 training models was 0.794 (95% CI: 0.747–0.840) (Fig. 2). The leave-one-out analysis showed that no single study notably influenced the overall effect size, suggesting that the result is robust (Supplementary eFig. S1). Supplementary Table S2 (training model) showed between-dataset variation in AUC (σ = 0.073) but no detectable variation in the dataset-by-model interaction (σ = 0), indicating that different models developed on the same dataset achieved consistent AUCs. This supported pooling with hierarchical effects weights based on datasets. With robust variance estimation, the adjusted pooled AUC remained 0.794, with a wider 95% CI of 0.727–0.860 (Supplementary eFig. S2).
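The random-effects pooling and leave-one-out steps described above can be sketched with the DerSimonian–Laird estimator. This is a simplified illustration. It treats every model as independent and does not implement the dataset-level hierarchical weighting or the robust variance estimation used for the adjusted estimates. The input AUCs are hypothetical and are not the values behind Fig. 2.

```python
import math
from typing import List, Tuple

Z95 = 1.959964  # two-sided 95% standard normal quantile


def dersimonian_laird(est: List[float], se: List[float]) -> Tuple[float, float, float, float]:
    """Random-effects pooled estimate, its 95% CI, and I^2 (%), via DerSimonian-Laird."""
    w = [1 / s**2 for s in se]                        # inverse-variance (fixed-effect) weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, est)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, est))   # Cochran's Q
    df = len(est) - 1
    c = sw - sum(wi**2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0   # between-model variance
    w_re = [1 / (s**2 + tau2) for s in se]            # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, est)) / sum(w_re)
    se_pooled = math.sqrt(1 / sum(w_re))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, pooled - Z95 * se_pooled, pooled + Z95 * se_pooled, i2


def leave_one_out(est: List[float], se: List[float]) -> List[Tuple[float, float, float, float]]:
    """Re-pool after dropping each model in turn (sensitivity analysis)."""
    return [dersimonian_laird(est[:i] + est[i + 1:], se[:i] + se[i + 1:]) for i in range(len(est))]


if __name__ == "__main__":
    aucs = [0.70, 0.75, 0.80, 0.85, 0.90]   # hypothetical AUCs
    ses = [0.03] * 5                          # hypothetical standard errors
    pooled, lo, hi, i2 = dersimonian_laird(aucs, ses)
    print(f"pooled AUC {pooled:.3f} ({lo:.3f}-{hi:.3f}), I2 = {i2:.1f}%")
    for dropped, res in zip(aucs, leave_one_out(aucs, ses)):
        print(f"without {dropped:.2f}: pooled AUC {res[0]:.3f}")
```

With these made-up inputs, the pooled AUC is 0.800 and I² is about 86%. A leave-one-out run is judged robust when none of the re-pooled estimates moves materially away from the full estimate.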
The I² value was 89.93%, indicating a high degree of heterogeneity among the studies. The intercept of Egger's test was −3.252 (95% CI: −8.123, 1.619), with a t-value of 1.455 (P = 0.086), suggesting no significant publication bias. Meta-regression indicated that publication year, sample size, and study design did not contribute significantly to the heterogeneity (Supplementary Table S3A).
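Egger's regression test regresses each model's standardized AUC (AUC/SE) on its precision (1/SE). An intercept far from zero indicates small-study asymmetry. The minimal ordinary-least-squares sketch below returns the intercept and its t-statistic, which should be compared with a t distribution on k − 2 degrees of freedom (for example, using a statistics library). It leaves out the p-value to stay dependency-free. The function and variable names are illustrative.

```python
import math
from typing import List, Tuple


def egger_test(est: List[float], se: List[float]) -> Tuple[float, float, float]:
    """Egger's regression: (est/se) = b0 + b1 * (1/se) + error. Returns (b0, b1, t for b0)."""
    k = len(est)
    if k < 3:
        raise ValueError("Egger's test needs at least three estimates")
    x = [1 / s for s in se]                        # precision
    z = [e / s for e, s in zip(est, se)]           # standardized effect
    x_bar, z_bar = sum(x) / k, sum(z) / k
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z)) / sxx
    b0 = z_bar - b1 * x_bar                        # intercept = asymmetry estimate
    rss = sum((zi - b0 - b1 * xi) ** 2 for xi, zi in zip(x, z))
    sigma2 = rss / (k - 2)                         # residual variance
    se_b0 = math.sqrt(sigma2 * (1 / k + x_bar**2 / sxx))
    return b0, b1, b0 / se_b0
```

In this parameterization, the slope estimates the underlying effect and the intercept captures asymmetry. This is why the test centres on the intercept's t-statistic rather than on the slope.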
Validation model
The pooled AUC of the 17 validation models, calculated using a random-effects model, was 0.726 (95% CI: 0.659, 0.792) (Fig. 3). The leave-one-out sensitivity analysis showed that no individual study significantly affected the overall effect size, indicating that the findings are robust (Supplementary eFig. S3). Supplementary Table S2 (validation model) showed between-dataset variation in AUC (σ = 0.113) but no detectable variation in the dataset-by-model interaction (σ = 0), indicating that different models developed on the same dataset achieved consistent AUCs. This supported pooling with hierarchical effects weights based on datasets. With robust variance estimation, the adjusted pooled AUC was 0.725, with a wider 95% CI of 0.613–0.837 (Supplementary eFig. S4).
The I² value was 88.17%, indicating a high degree of heterogeneity among the studies. Furthermore, the intercept from Egger's test was −4.102 (95% CI: −5.606, −2.597), with a t-value of 5.811 (P < 0.001), which together suggest the presence of publication bias. Meta-regression suggested that sample size was a source of heterogeneity (β = 0.001; P = 0.031) (Supplementary Table S3B). In the leave-one-out analysis, excluding the logistic regression model from van Diessen et al., 2018,28 reduced the I² value to 72.96%, whereas removing any other model did not markedly change I².
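A meta-regression like the one on sample size can be sketched as weighted least squares of AUC on a study-level covariate, using weights 1/(SE² + τ²). In this simplified sketch, τ² is supplied by the caller (for example, a residual between-study variance estimated elsewhere) rather than estimated internally. The slope's standard error is the usual model-based one. The function name and any inputs are hypothetical, and the sketch does not reproduce the reported β.

```python
import math
from typing import List, Tuple


def meta_regression(est: List[float], se: List[float], covariate: List[float],
                    tau2: float = 0.0) -> Tuple[float, float, float]:
    """Weighted least squares of est on one covariate. Returns (intercept, slope, slope SE)."""
    w = [1 / (s**2 + tau2) for s in se]            # random-effects weights
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, covariate)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, est)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, covariate))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, covariate, est))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope, math.sqrt(1 / sxx)    # model-based SE, no residual rescaling
```

Dividing the slope by its SE gives a z-statistic for whether the covariate explains part of the between-model variation. Small slopes per unit of sample size (as with β = 0.001) can still be significant because sample sizes span a wide range.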
Subgroup analysis
Regarding the types of included predictors, in the training dataset, the pooled AUC for the multimodal combination of clinical features + EEG + MRI was 0.725 (95% CI: 0.624–0.826), for the combination of clinical features + EEG it was 0.855 (95% CI: 0.769–0.942), and for clinical features alone it was 0.796 (95% CI: 0.741–0.851). In the validation dataset, the pooled AUC for the multimodal combination of clinical features + EEG + MRI was 0.655 (95% CI: 0.570–0.740), for the combination of clinical features + EEG it was 0.743 (95% CI: 0.615–0.871), and for clinical features alone it was the highest at 0.846 (95% CI: 0.721–0.970). In both the training and validation datasets, heterogeneity among models using only clinical features was extremely high (I² = 89.01% and 95.42%, respectively).
In the training set, the non-machine learning models (AUC = 0.838, 95% CI: 0.800–0.876) outperformed the machine learning models (AUC = 0.717, 95% CI: 0.662–0.771). Similarly, in the validation set, non-machine learning models showed better predictive performance than machine learning models, with AUCs of 0.778 (95% CI: 0.705–0.851) and 0.654 (95% CI: 0.561–0.747), respectively. However, the 95% CIs overlapped, so this difference should be interpreted with caution. Notably, substantial heterogeneity was observed among non-machine learning models in both datasets (I² = 83.82% and 89.44%, respectively). Logistic regression was the most common traditional model, with a pooled AUC of 0.829 (95% CI: 0.765, 0.893) in the training dataset and 0.740 (95% CI: 0.681, 0.800) in the validation dataset. Heterogeneity was again substantial in both datasets (I² = 89.24% and 73.59%, respectively).
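Overlap of two 95% CIs is a conservative check: two estimates can have overlapping intervals and still differ at the 5% level. A minimal sketch of a direct comparison follows. It assumes that the two pooled AUCs are independent and approximately normal, with SEs recoverable from symmetric 95% CIs. These assumptions ignore any studies shared between subgroups and the dataset-level clustering, and the function names are illustrative.

```python
import math
from typing import Tuple

Z95 = 1.959964  # two-sided 95% standard normal quantile


def se_from_ci(lower: float, upper: float) -> float:
    """SE implied by a symmetric normal-theory 95% CI."""
    return (upper - lower) / (2 * Z95)


def compare_pooled_aucs(auc_a: float, ci_a: Tuple[float, float],
                        auc_b: float, ci_b: Tuple[float, float]) -> Tuple[float, float]:
    """z-test for the difference between two independent pooled AUCs. Returns (z, two-sided P)."""
    se_diff = math.hypot(se_from_ci(*ci_a), se_from_ci(*ci_b))
    z = (auc_a - auc_b) / se_diff
    p = math.erfc(abs(z) / math.sqrt(2))   # equals 2 * (1 - Phi(|z|))
    return z, p


if __name__ == "__main__":
    # Validation-set subgroups reported above: non-machine learning vs machine learning
    z, p = compare_pooled_aucs(0.778, (0.705, 0.851), 0.654, (0.561, 0.747))
    print(f"z = {z:.2f}, P = {p:.3f}")
```

Under these simplifying assumptions, the validation-set comparison gives z ≈ 2.06 (P ≈ 0.04). Overlapping intervals alone therefore do not settle the question. Because the independence assumption may not hold here, a formal subgroup test within the meta-analytic model would be the more appropriate check.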
Additionally, external validation yielded a significantly higher pooled AUC (0.807, 95% CI: 0.737–0.878) than internal validation (0.634, 95% CI: 0.552–0.716), although heterogeneity was extremely high (I² = 90.93%). The results of the subgroup analyses are detailed in Supplementary Tables S4 and S5.
Discussion
This systematic review and meta-analysis included 27 studies. It revealed a growing number of epilepsy prediction models for children and adolescents, although most of these models are based on data from Chinese patients. Despite medical advances, clinicians still find it difficult to identify pediatric patients with epilepsy who are at high risk of frequent seizures, which limits the effectiveness of personalized treatment. Prediction models have emerged as a promising solution. To the best of our knowledge, none of the models retrieved by our comprehensive literature search has yet been widely adopted in clinical settings, but their development represents a critical step forward in epilepsy management.
For the 14 training models, the initial pooled AUC was 0.794 (95% CI: 0.747–0.840). After adjustment using robust variance estimation, the AUC remained 0.794, but the 95% CI widened to 0.727–0.860. For the 17 validation models, the pooled AUC shifted from 0.726 (95% CI: 0.659–0.792) to 0.725 (95% CI: 0.613–0.837) after adjustment. AUC values above 0.80 are generally considered clinically useful, whereas values below 0.80, particularly those below 0.75, are considered to have limited clinical utility.53,54 The wider 95% CI for the validation models likewise suggests that, on the whole, the included models are not very stable.18,53 This may be related to small validation samples. For instance, in the studies by Yossofzai et al., 202345 and Sun et al., 2023,44 both validation samples comprised fewer than 100 cases, and the lower limits of the confidence intervals for their models fell below 0.5. The robust variance estimation results indicate that, for both training and validation models, differences in performance stem mainly from the datasets, because different models built on the same dataset perform consistently. Future efforts in developing prediction models should therefore focus on constructing higher-quality datasets.
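The effect of small validation samples on AUC intervals can be illustrated with the Hanley & McNeil (1982) approximation to the standard error of an AUC. This approximation depends on the numbers of patients with and without the outcome. The event counts below are hypothetical because the event splits of the included validation samples are not given here, and the function name is illustrative.

```python
import math


def hanley_mcneil_se(auc: float, n_events: int, n_nonevents: int) -> float:
    """Hanley-McNeil (1982) standard error of an AUC."""
    q1 = auc / (2 - auc)          # P(two random events both rank above one non-event)
    q2 = 2 * auc**2 / (1 + auc)   # P(one random event ranks above two non-events)
    var = (auc * (1 - auc)
           + (n_events - 1) * (q1 - auc**2)
           + (n_nonevents - 1) * (q2 - auc**2)) / (n_events * n_nonevents)
    return math.sqrt(var)


if __name__ == "__main__":
    for events, nonevents in [(10, 30), (15, 35), (150, 350)]:
        se = hanley_mcneil_se(0.70, events, nonevents)
        lower, upper = 0.70 - 1.96 * se, 0.70 + 1.96 * se
        print(f"n={events + nonevents}: AUC 0.70, approx. 95% CI {lower:.3f}-{upper:.3f}")
```

Under these hypothetical splits, an AUC of 0.70 estimated in 40 patients has a lower 95% limit of about 0.50, which is no better than chance. The same AUC in 500 patients has an interval roughly three times narrower.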
Regarding predictor combinations, clinical features + EEG consistently outperformed clinical features + EEG + MRI in both the training (AUC 0.855 vs 0.725) and validation (AUC 0.743 vs 0.655) datasets, possibly reflecting overfitting when too many features are included. Future research could therefore focus on models integrating clinical features and EEG. Notably, clinical-only models showed the highest validation AUC (0.846) despite extreme heterogeneity (I² = 95.42%), underscoring the need for standardized selection of clinical parameters.
For the validation models, the 95% CIs remained relatively wide both before and after adjustment, indicating that, on the whole, the models' predictive performance remained somewhat unstable. Subgroup analysis revealed that external validation yielded a significantly higher AUC (0.807) than internal validation (0.634). Because external validation typically yields lower estimates than internal validation, this finding may reflect differences between the models and datasets that underwent each type of validation rather than reduced overfitting alone. The extreme heterogeneity (I² = 90.93%) also points to inconsistencies in the quality and diversity of external datasets. These findings collectively emphasize the importance of region-specific model calibration, parsimonious predictor selection, and rigorous external validation in pediatric epilepsy prediction research. Future studies should prioritize multicenter collaborations with standardized protocols to address current limitations in generalizability and methodological consistency.
Non-machine learning models (particularly logistic regression) also outperformed machine learning models in both the training (AUC 0.838 vs 0.717) and validation (0.778 vs 0.654) settings. This discrepancy may reflect sample sizes that are insufficient for complex algorithms. Traditional methods also offer greater interpretability in clinical settings. The persistently high heterogeneity in both non-machine learning subgroups (I² > 83%) suggests variability in variable selection and model calibration.
According to the PROBAST checklist, only two studies were judged to be at low risk of bias and the rest were at high risk of bias, which limits the practical utility of these prediction models. First, retrospective data have limitations owing to difficulties in controlling confounding, a high risk of selection bias, and ambiguous temporal relationships, all of which may bias the model. Retrospective datasets may include only complete cases or specific populations, leading to overestimation (e.g., healthy user effect) or underestimation (e.g., missing key confounders) of the AUC. Second, univariate predictor selection may overlook interactions or nonlinear relationships among variables, leading to overfitting and a subsequent decline in AUC when the model is applied to independent datasets. Moreover, in small-sample studies this approach risks amplifying spurious associations, thereby producing artificially inflated AUC estimates. Third, only three studies imputed missing values, and the others used complete cases. van Diessen et al. (2018)28 used single imputation to avoid bias toward complete cases. Yossofzai et al. (2022, 2023)38,45 examined missingness patterns and used the R package missForest, which performs well with categorical data and imputes accurately. Complete datasets preserve data authenticity and yield reliable results, but they are hard to obtain. Deleting records with missing data reduces the sample size and discards information, harming model generalization. Imputation retains the sample and makes better use of the data, but values imputed by inappropriate methods can introduce bias, distort relationships between variables, and lower prediction accuracy.
Fourth, ten included studies32,33,35,36,41,42,43,47,48,49 reported calibration assessment, mainly using calibration curves and the Hosmer–Lemeshow test, but the latter has drawbacks. Although a significant result indicates a non-random discrepancy between predicted and observed event rates, it reveals neither the size of that discrepancy nor whether it differs between low- and high-risk patients. Moreover, in large samples even tiny, clinically unimportant differences can reach statistical significance, making the test misleading. Calibration curves, which visually compare predicted and observed risks, are therefore the preferred way to assess calibration.54 Fifth, some models lack validation, which is consistent with previous research findings.16,55 This suggests that research focuses more on repeatedly identifying predictors and building new models than on validating and improving existing ones. Model validation is key to clinical use. Furthermore, most model reports lack sufficient detail for others to perform external validation, which prevents the models from being used in clinical guidelines or practice. In addition, most included studies constructed and validated their models in immediate succession: the model was built on the training dataset and then immediately validated on the validation dataset. Future research should therefore pay more attention to temporal issues, because model performance typically degrades over time owing to changes in case mix and temporal drift.
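The contrast drawn above between the Hosmer–Lemeshow test and calibration curves can be made concrete with a small, dependency-free sketch. The sketch groups patients into risk deciles, which are the same bins a grouped calibration curve plots, and computes the Hosmer–Lemeshow statistic. It then applies the statistic to a hypothetical cohort in which every predicted risk is 5 percentage points too low. The cohort, the function names, and the even-degrees-of-freedom chi-square shortcut are illustrative assumptions.

```python
import math
from typing import List, Tuple


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail chi-square probability; this closed form is valid only for even df."""
    if df <= 0 or df % 2:
        raise ValueError("this sketch supports positive even degrees of freedom only")
    half, term, total = x / 2, 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k          # builds (x/2)^k / k!
        total += term
    return math.exp(-half) * total


def calibration_bins(pred: List[float], obs: List[int], groups: int = 10) -> List[Tuple[float, int, int]]:
    """Sort by predicted risk and split into equal-size groups.

    Returns (mean predicted risk, observed events, group size) per group; plotting
    mean predicted risk against events / size gives a grouped calibration curve.
    """
    pairs = sorted(zip(pred, obs))
    n = len(pairs)
    bins = []
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if chunk:
            bins.append((sum(p for p, _ in chunk) / len(chunk), sum(y for _, y in chunk), len(chunk)))
    return bins


def hosmer_lemeshow(pred: List[float], obs: List[int], groups: int = 10) -> Tuple[float, float]:
    """Hosmer-Lemeshow statistic and p-value (df = number of groups - 2)."""
    bins = calibration_bins(pred, obs, groups)
    stat = sum((events - m * p_bar) ** 2 / (m * p_bar * (1 - p_bar)) for p_bar, events, m in bins)
    return stat, chi2_sf_even_df(stat, len(bins) - 2)


def synthetic_cohort(per_level: int, shift: float) -> Tuple[List[float], List[int]]:
    """Hypothetical cohort: predicted risks 0.05, 0.15, ..., 0.95; true risk = predicted + shift."""
    pred, obs = [], []
    for k in range(10):
        p = (2 * k + 1) / 20
        events = round(min(p + shift, 1.0) * per_level)
        pred += [p] * per_level
        obs += [1] * events + [0] * (per_level - events)
    return pred, obs


if __name__ == "__main__":
    for per_level in (20, 2000):   # total n = 200 and 20,000
        stat, p_value = hosmer_lemeshow(*synthetic_cohort(per_level, 0.05))
        print(f"n={10 * per_level}: HL = {stat:.1f}, P = {p_value:.3g}")
```

Under these assumptions, the same 5-point miscalibration gives HL ≈ 4.3 (P ≈ 0.83) with 200 patients but HL ≈ 427 (P far below 0.001) with 20,000 patients. This illustrates the sample-size sensitivity described above. The grouped points from `calibration_bins`, by contrast, show the size of the gap directly.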
Furthermore, during model evaluation we observed inadequate adherence to the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement in several articles,56 and only seven studies (25.93%) mentioned following it.33,36,38,42,45,49,51 This lack of transparency introduces uncertainty and potential risk of bias into both the models and this meta-analysis.57,58 For instance, missing information on the handling of missing data, calibration methods, or subgroup analyses may lead to biased effect estimates or overestimation of model generalizability. Similarly, incomplete disclosure of model validation undermines confidence in the robustness of the reported performance metrics. These reporting deficiencies not only limit reproducibility but also compromise the validity of cross-study comparisons. Future research should prioritize transparent reporting aligned with the TRIPOD statement to mitigate these risks and strengthen the evidence base for clinical decision-making.
Our systematic review also identified valuable insights for future research. Model predictors are not limited to patients' subjective information; many studies have focused on objective data such as EEG and MRI features.28,29,39 First, objective data are concrete, measurable metrics that are less affected by self-reporting bias. This is crucial in pediatric settings, where children may not be able to describe their symptoms well. Second, objective data enable the creation of standardized models that apply across different groups and settings. Pediatric care spans great diversity in age, developmental stage, and underlying health conditions. Standardized models based on objective data can be adapted across healthcare settings, from large children's hospitals to community clinics, supporting uniformly high-quality care. Lastly, objective data help integrate prediction models into clinical decision-making by offering evidence-based support to pediatric healthcare workers. In the high-pressure, fast-moving environment of pediatric care, reliable data-driven predictions help clinicians make better-informed decisions on diagnosis, treatment, and prognosis. However, there is no consensus on objective indicators, and predictors differ greatly among models. Many studies still select predictors by univariate analysis, and future research should further explore how objective indicators are selected.
This review has some limitations. First, most of the included studies were conducted in China, which may restrict the generalizability of the findings to Western populations. Second, because of inadequate reporting in the included studies, only a limited number of models could be incorporated into our meta-analysis, which may have prevented further exploration of the sources of between-study heterogeneity. Nonetheless, these limitations did not impede the evaluation of the models and, to some extent, mirror the methodological and reporting issues we identified. Third, this review included only studies published in English and Chinese, so findings from studies published in other major languages were excluded. Lastly, most existing models overemphasize discriminative ability, such as the AUC. Because calibration metrics were reported inconsistently and sometimes incompletely across the included studies, we could not pool calibration indicators for existing models. High discriminative power alone can be misleading, because it does not guarantee the reliable risk predictions essential for clinical decisions. The AUC also has limitations, such as insensitivity to changes in absolute risk. Our meta-analysis, which incorporated the AUC and its 95% CI, partly addresses these issues by offering a more comprehensive evaluation of model performance and accounting for the variability of estimates. Another notable aspect is that no established minimum sample size criteria were applied to the included studies. Given that most of them had relatively small samples, this may have allowed small validation studies, which are likely to yield unreliable performance estimates.
In total, 14 training and 17 validation models were summarized. Most studies were at high risk of bias, and most models lacked external validation. Current research shows an uneven regional distribution. Models based on clinical features + EEG warrant further exploration. Current models focus too much on boosting discriminative power (e.g., AUC) and neglect accurate probability estimation, which makes their current value for practitioners, policymakers, and guideline developers uncertain. It is time to validate the existing models so as to better identify their deficiencies and provide solid support for future research.
Contributors
Yuan Luo, Xiaoni Chai and Yunchen Li accessed and verified all of the data in the study and took responsibility for the integrity of the data and the accuracy of the data analysis. Yuan Luo and Xiaoni Chai contributed equally as co-first authors.
Conceptualization: Yuan Luo and Yunchen Li.
Data curation: Yuan Luo and Xiaoni Chai.
Formal analysis: Yuan Luo and Xiaoni Chai.
Funding acquisition: Yunchen Li.
Methodology: Yuan Luo, Xiaoni Chai and Yunchen Li.
Project administration: Yuan Luo.
Resources: Yuan Luo, Xiaoni Chai and Yunchen Li.
Software: Yuan Luo.
Supervision: Yunchen Li.
Validation: Yunchen Li.
Visualization: Yuan Luo.
Writing—original draft: Yuan Luo and Xiaoni Chai.
Writing—review & editing: Yunchen Li.
Data sharing statement
This study was based on previously published data and is therefore available in the original studies. The study protocol was registered at PROSPERO under the registration ID: CRD42025637913.
Declaration of interests
All authors declare that they have no competing interests.
Acknowledgements
None.
Footnotes
Translation: For the translation of the abstract, see the Supplementary Materials section.
Supplementary data related to this article can be found at https://doi.org/10.1016/j.eclinm.2025.103602.
Contributor Information
Yuan Luo, Email: luoyuan0609@qq.com.
Xiaoni Chai, Email: 850235346@qq.com.
Yunchen Li, Email: lhlihuan@csu.edu.cn.
Appendix A. Supplementary data
The leave-one-out results for 14 training models.
Forest plot of the pooled AUC estimates for 14 training models based on robust variance estimation.
The leave-one-out results for 17 validation models.
Forest plot of the pooled AUC estimates for 17 validation models based on robust variance estimation.
References
1. Shehryar S., Lara J. Predictive models of epilepsy outcomes. Curr Opin Neurol. 2024;37(2):115–120. doi: 10.1097/WCO.0000000000001241.
2. GBD 2016 Epilepsy Collaborators. Global, regional, and national burden of epilepsy, 1990-2021: a systematic analysis for the Global Burden of Disease Study 2021. Lancet Public Health. 2025;10(3):e203–e227. doi: 10.1016/S2468-2667(24)00302-5.
3. Singh G., Sander J.W. The global burden of epilepsy report: implications for low- and middle-income countries. Epilepsy Behav. 2020;105. doi: 10.1016/j.yebeh.2020.106949.
4. Specchio N., Wirrell E.C., Scheffer I.E., et al. International league against epilepsy classification and definition of epilepsy syndromes with onset in childhood: position paper by the ILAE task force on nosology and definitions. Epilepsia. 2022;63(6):1398–1442. doi: 10.1111/epi.17241.
5. Ali S., Stanley J., Davis S., et al. Epidemiology of treated epilepsy in New Zealand children: a focus on ethnicity. Neurology. 2021;97(19):e1933–e1941. doi: 10.1212/WNL.0000000000012784.
6. Choi S.A., Lee H., Kim K., et al. Mortality, disability, and prognostic factors of status epilepticus: a nationwide population-based retrospective cohort study. Neurology. 2022;99(13):e1393–e1401. doi: 10.1212/WNL.0000000000200912.
7. Donner E.J., Camfield P., Brooks L., et al. Understanding death in children with epilepsy. Pediatr Neurol. 2017;70:7–15. doi: 10.1016/j.pediatrneurol.2017.01.011.
8. Ali A. Global health: epilepsy. Semin Neurol. 2018;38(2):191–199. doi: 10.1055/s-0038-1646947.
9. Plevin D., Smith N. Assessment and management of depression and anxiety in children and adolescents with epilepsy. Behav Neurol. 2019;2019. doi: 10.1155/2019/2571368.
10. Wirrell E.C., Riney K., Specchio N., Zuberi S.M. How have the recent updated epilepsy classifications impacted on diagnosis and treatment? Expert Rev Neurother. 2023;23(11):969–980. doi: 10.1080/14737175.2023.2254937.
11. Ratcliffe C., Pradeep V., Marson A., Keller S.S., Bonnett L.J. Clinical prediction models for treatment outcomes in newly diagnosed epilepsy: a systematic review. Epilepsia. 2024;65(7):1811–1846. doi: 10.1111/epi.17994.
12. Wolff R.F., Moons K., Riley R.D., et al. PROBAST: a tool to assess the risk of bias and applicability of prediction model studies. Ann Intern Med. 2019;170(1):51–58. doi: 10.7326/M18-1376.
13. Moons K.G.M., Damen J.A.A., Kaul T., et al. PROBAST+AI: an updated quality, risk of bias, and applicability assessment tool for prediction models using regression or artificial intelligence methods. BMJ. 2025;388. doi: 10.1136/bmj-2024-082505.
14. Borenstein M., Hedges L.V., Higgins J.P.T., Rothstein H.R. Introduction to meta-analysis. 2nd ed. Oxford: John Wiley & Sons; 2021.
15. Brüggemann P., Rajguru K. Comprehensive meta-analysis (CMA) 3.0: a software review. J Market Anal. 2022;10:425–429.
16. Fu H., Hou D., Xu R., et al. Risk prediction models for deep venous thrombosis in patients with acute stroke: a systematic review and meta-analysis. Int J Nurs Stud. 2024;149. doi: 10.1016/j.ijnurstu.2023.104623.
17. Xie Q., Wang X., Pei J., et al. Machine learning-based prediction models for delirium: a systematic review and meta-analysis. J Am Med Dir Assoc. 2022;23(10):1655–1668.e6. doi: 10.1016/j.jamda.2022.06.020.
18. Debray T.P., Damen J.A., Snell K.I., et al. A guide to systematic review and meta-analysis of prediction model performance. BMJ. 2017;356. doi: 10.1136/bmj.i6460.
19. Borenstein M. Comprehensive meta-analysis software. In: Systematic reviews in health research: meta-analysis in context. 2022. pp. 535–548.
20. Damen J., Moons K., van Smeden M., Hooft L. How to conduct a systematic review and meta-analysis of prognostic model studies. Clin Microbiol Infect. 2023;29(4):434–440. doi: 10.1016/j.cmi.2022.07.019.
21. Higgins J.P., Thompson S.G., Deeks J.J., Altman D.G. Measuring inconsistency in meta-analyses. BMJ. 2003;327(7414):557–560. doi: 10.1136/bmj.327.7414.557.
22. Egger M., Davey Smith G., Schneider M., Minder C. Bias in meta-analysis detected by a simple, graphical test. BMJ. 1997;315(7109):629–634. doi: 10.1136/bmj.315.7109.629.
23. Bae H., Shin H., Ji H.G., et al. App-based interventions for moderate to severe depression: a systematic review and meta-analysis. JAMA Netw Open. 2023;6(11). doi: 10.1001/jamanetworkopen.2023.44120.
24. Fisher Z., Tipton E. Robumeta: an R-package for robust variance estimation in meta-analysis. arXiv. 2015.
25. Tanner-Smith E.E., Tipton E., Polanin J.R. Handling complex meta-analytic data structures using robust variance estimates: a tutorial in R. J Dev Life Course Criminol. 2016;2(1):85–112.
26. Arts W., Geerts A.T., Brouwer O.F., et al. The early prognosis of epilepsy in childhood: the prediction of a poor outcome. The Dutch study of epilepsy in childhood. Epilepsia. 1999;40(6):726–734. doi: 10.1111/j.1528-1157.1999.tb00770.x.
27. Huang L., Li S., He D., Bao W., Li L. A predictive risk model for medical intractability in epilepsy. Epilepsy Behav. 2014;37:282–286. doi: 10.1016/j.yebeh.2014.07.002.
28. van Diessen E., Lamberink H.J., Otte W.M., et al. A prediction model to determine childhood epilepsy after 1 or more paroxysmal events. Pediatrics. 2018;142(6). doi: 10.1542/peds.2018-0931.
29. Sansevere A.J., Kapur K., Peters J.M., et al. Seizure prediction models in the neonatal intensive care unit. J Clin Neurophysiol. 2019;36(3):186–194. doi: 10.1097/WNP.0000000000000574.
30. Chen X. Development and validation of early predictive model for refractory epilepsy in childhood. 2020.
31. Latzer I.T., Blumovich A., Sagi L., Uliel-Sibony S., Fattal-Valevski A. Prediction of drug-resistant epilepsy in children with cerebral palsy. J Child Neurol. 2020;35(3):187–194. doi: 10.1177/0883073819883157.
32. Jing Q., Yu X., Huang Y., Cai M., He F. Establishment of a risk prediction model for valproic acid treatment effect in children with active epilepsy and related nursing measures investigation. Chin J Birth Health Heredity. 2021;29(12):1777–1781.
33. Brunklaus A., Pérez-Palma E., Ghanty I., et al. Development and validation of a prediction model for early diagnosis of SCN1A-Related epilepsies. Neurology. 2022;98(11):e1163–e1174. doi: 10.1212/WNL.0000000000200028.
34. Azriel R., Hahn C.D., De Cooman T., et al. Machine learning to support triage of children at risk for epileptic seizures in the pediatric intensive care unit. Physiol Meas. 2022;43(9). doi: 10.1088/1361-6579/ac8ccd.
35. Chen D., Peng X., Cheng H., et al. Risk factors and a predictive model for the development of epilepsy after Japanese encephalitis. Seizure. 2022;99:105–112. doi: 10.1016/j.seizure.2022.05.017.
36. Geng H., Chen X. Development and validation of a nomogram for the early prediction of drug resistance in children with epilepsy. Front Pediatr. 2022;10. doi: 10.3389/fped.2022.905177.
37. Ma J., Wang Z., Cheng T., et al. A prediction model integrating synchronization biomarkers and clinical features to identify responders to vagus nerve stimulation among pediatric patients with drug-resistant epilepsy. CNS Neurosci Ther. 2022;28(11):1838–1848. doi: 10.1111/cns.13923.
38. Yossofzai O., Fallah A., Maniquis C., et al. Development and validation of machine learning models for prediction of seizure outcome after pediatric epilepsy surgery. Epilepsia. 2022;63(8):1956–1969. doi: 10.1111/epi.17320.
39. Zhao X., Jiang D., Hu Z., et al. Machine learning and statistic analysis to predict drug treatment outcome in pediatric epilepsy patients with Tuberous sclerosis complex. Epilepsy Res. 2022;188. doi: 10.1016/j.eplepsyres.2022.107040.
40. Eriksson M.H.H., Ripart M., Piper R.J.J., et al. Predicting seizure outcome after epilepsy surgery: do we need more complex models, larger samples, or better data? Epilepsia. 2023;64(8):2014–2026. doi: 10.1111/epi.17637.
41. Hu Z., Jiang D., Zhao X., et al. Predicting drug treatment outcomes in childrens with tuberous sclerosis complex–related epilepsy: a clinical radiomics study. AJNR Am J Neuroradiol. 2023;44(7):853–860. doi: 10.3174/ajnr.A7911.
42. Panda P.K., Elwadhi A., Gupta D., et al. Development and validation of a predictive model assessing the risk of seizure recurrence in children with neurocysticercosis. Epilepsy Res. 2023;197. doi: 10.1016/j.eplepsyres.2023.107239.
43. Wu Y., Zhang Z., Liang P., Li L., Zhai X. Development and validation of a nomogram for predicting seizure outcomes after epilepsy surgery for children with focal cortical dysplasia. Turk Neurosurg. 2023;33(4):683–690. doi: 10.5137/1019-5149.JTN.40718-22.2.
44. Sun X., Zhao J., Guo C., Zhu X. Early prediction of epilepsy after encephalitis in childhood based on EEG and clinical features. Emerg Med Int. 2023;2023. doi: 10.1155/2023/8862598.
45. Yossofzai O., Stone S.S.D., Madsen J.R., et al. Machine learning models for predicting seizure outcome after MR-guided laser interstitial thermal therapy in children. J Neurosurg Pediatr. 2023;32(6):739–749. doi: 10.3171/2023.8.PEDS23240.
46. Cheng T., Hu Y., Qin X., et al. A predictive model combining connectomics and entropy biomarkers to discriminate long-term vagus nerve stimulation efficacy for pediatric patients with drug-resistant epilepsy. CNS Neurosci Ther. 2024;30(7). doi: 10.1111/cns.14751.
47. Li Q., Liu H., Dong G. Risk factors for epileptic seizures after febrile seizures in children and construction of a nomogram prediction model. Chin J Gen Pract. 2024;22(1).
48. Liang T., Lin C., Ning H., et al. Pre-treatment risk predictors of valproic acid-induced dyslipidemia in pediatric patients with epilepsy. Front Pharmacol. 2024;15. doi: 10.3389/fphar.2024.1349043.
49. Wissel B.D., Greiner H.M., Glauser T.A., et al. Early identification of candidates for epilepsy surgery: a multicenter, machine learning, prospective validation study. Neurology. 2024;102(4):e208048. doi: 10.1212/WNL.0000000000208048.
50. Zhang H., Ceng H., Chen X., Sun Z., Jiang S. Prognostic risk factors of drug-resistant epilepsy in children and construction and evaluntion of regression prediction model. J Brain Nerv Dis. 2024;32(7):453–458.
51. Dai K., Tang D., Bao L., et al. Development and validation of a predictive model for seizure recurrence following discontinuation of antiseizure medication in children with epilepsy: a systematic review and meta-analysis, and prospective cohort study. eClinicalMedicine. 2025;82. doi: 10.1016/j.eclinm.2025.103154.
52. Kuang M., Cao H., Chen H., Li W., Wang X. Prediction model establishment and efficacy evaluation of recurrence in children with epilepsy after drug withdrawal based on clinical characteristics and video electroencephalography. J Pediatr Pharm. 2025;31(5):6–11.
53. Çorbacıoğlu Ş.K., Aksel G. Receiver operating characteristic curve analysis in diagnostic accuracy studies: a guide to interpreting the area under the curve value. Turk J Emerg Med. 2023;23(4):195–198. doi: 10.4103/tjem.tjem_182_23.
54. Alba A.C., Agoritsas T., Walsh M., et al. Discrimination and calibration of clinical prediction models: users' guides to the medical literature. JAMA. 2017;318(14):1377–1384. doi: 10.1001/jama.2017.12126.
55. Damen J.A.A.G., Hooft L., Schuit E., et al. Prediction models for cardiovascular disease risk in the general population: systematic review. BMJ. 2016;353. doi: 10.1136/bmj.i2416.
56. Collins G.S., Reitsma J.B., Altman D.G., Moons K.G. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. BMJ. 2015;350. doi: 10.1136/bmj.g7594.
57. Wynants L., Van Calster B., Collins G.S., et al. Prediction models for diagnosis and prognosis of covid-19: systematic review and critical appraisal. BMJ. 2020;369. doi: 10.1136/bmj.m1328.
58. Whittle R., Peat G., Belcher J., Collins G.S., Riley R.D. Measurement error and timing of predictor values for multivariable risk prediction models are poorly reported. J Clin Epidemiol. 2018;102:38–49. doi: 10.1016/j.jclinepi.2018.05.008.