Abstract
Background
Accurate mortality prediction following transcatheter aortic valve implantation (TAVI) is essential for mitigating risk, shared decision-making and periprocedural planning. Surgical risk models have demonstrated modest discriminative value for patients undergoing TAVI and are typically poorly calibrated, with incremental improvements seen in TAVI-specific models. Machine learning (ML) models offer an alternative approach to risk stratification that may improve predictive accuracy.
Methods
PubMed, EMBASE, Web of Science and Cochrane databases were searched until 16 December 2023 for studies comparing ML models with traditional statistical methods for event prediction after TAVI. The primary outcome was comparative discrimination measured by C-statistics with 95% CIs between ML models and traditional methods in estimating the risk of all-cause mortality at 30 days and 1 year.
Results
Nine studies were included (29 608 patients). The summary C-statistic of the top performing ML models was 0.79 (95% CI 0.71 to 0.86), compared with 0.68 (95% CI 0.61 to 0.76) for traditional methods. The difference in C-statistic between all ML models and traditional methods was 0.11 (p<0.00001). Of the nine studies, two studies provided externally validated models and three studies reported calibration. The Prediction Model Risk of Bias Assessment Tool demonstrated a high risk of bias for all studies.
Conclusion
ML models outperformed traditional risk scores in the discrimination of all-cause mortality following TAVI. While integration of ML algorithms into electronic healthcare systems may improve periprocedural risk stratification, immediate implementation in the clinical setting remains uncertain. Further research is required to overcome methodological and validation limitations.
Keywords: Aortic Valve Stenosis, Transcatheter Aortic Valve Replacement, Meta-Analysis, Systematic Reviews as Topic, Translational Medical Research
WHAT IS ALREADY KNOWN ON THIS TOPIC
Traditional surgical risk models showed modest accuracy in predicting mortality following transcatheter aortic valve implantation (TAVI), with only slight improvements in TAVI-specific models. We hypothesised that machine learning (ML) models may offer improved predictive accuracy.
WHAT THIS STUDY ADDS
This study confirms that ML models outperform traditional risk scores in predicting all-cause mortality post-TAVI, offering superior discrimination. It provides a detailed comparison of ML versus traditional methods, highlighting key strengths and limitations.
HOW THIS STUDY MIGHT AFFECT RESEARCH, PRACTICE OR POLICY
These findings may lead to wider, prospective external validation of ML models for risk prediction after TAVI, with the view to integrate within clinical systems for improved personalised risk assessment and resource allocation. The results may also influence future clinical guidelines and policymaking regarding the use of ML in healthcare.
Introduction
Transcatheter aortic valve implantation (TAVI) presents a minimally invasive alternative to surgical aortic valve replacement (SAVR) and cardiopulmonary bypass. The indication for TAVI began with patients at prohibitive risk for SAVR for severe symptomatic aortic stenosis1 and has gradually expanded as TAVI technology has improved and longitudinal outcomes post-TAVI are reported. Contemporaneous guidelines now recommend TAVI as a viable alternative to SAVR in patients with aortic stenosis over age 65 who have an indication for valve replacement.2 3 Central to this process is a multidisciplinary assessment, including individualised risk stratification.3 As such, accurate prediction of risk in TAVI procedures is essential for enabling shared decision-making, identifying high-risk patients and periprocedural planning.3
The traditional approach to choosing TAVI versus SAVR has included risk modelling for SAVR; frequently used is The Society of Thoracic Surgeons Short-Term/Operative Risk Calculator (STS-PROM).2 3 The STS-PROM achieved a reasonably high level of discrimination for operative mortality with a C-statistic of 0.775.4 Relying on backwards-selected logistic regression with minimal terms for interaction, it offers computational simplicity and interpretability. Eight outcomes of interest in valvular disease use a total of 59 variables, with some variables not universally incorporated in all models. Periodically, the model is recalibrated to adjust to temporal trends in complication rates.5 For patients who ultimately undergo TAVI, the calculated score demonstrates modest discriminative capacity to offer patients and the treating team a relative risk of 30-day operative mortality, but it is poorly calibrated to the risk profile of the TAVI procedure.6 For risk stratification in TAVI procedures, there are comparatively less data available. One of the largest TAVI registries is the US-based STS-American College of Cardiology Transcatheter Valve Therapy Registry (STS-ACC), which collected outcomes on a total of 276 316 procedures since its establishment in 2011.7 During the last decade, several traditional TAVI-specific risk prediction models have been validated for in-hospital or 30-day all-cause mortality. Among these are ACC-TAVI, FRANCE-2 and GAVS.8,10 To date, these models have demonstrated a moderate discriminative ability.
Risk model development adheres to a standardised framework involving model fitting using a derivation dataset, with internal validation on a ‘held out’ dataset, followed by evaluation on an external dataset to assess model performance. Several metrics are collectively employed to determine a risk model’s effectiveness, including discrimination, calibration and newer measures such as risk reclassification analysis. Risk models can also be designed to streamline the inclusion of new data and facilitate regular refinement/calibration. Machine learning (ML) is a form of artificial intelligence (AI) and has been hypothesised to improve risk stratification. Supervised ML trains models on pairs of predictors and known outcomes. These models typically allow for very large parameterisation spaces, which increases the risk of overfitting, so ML models must undergo a validation process with assessment of accuracy, discrimination and calibration.11 When compared with traditional statistical modelling for binary outcomes (eg, logistic regression) in cardiovascular medicine, supervised ML has demonstrated inconsistent results.12
Recent meta-analyses have highlighted the potential clinical utility of ML models for risk prediction after TAVI, amidst inconsistent results and methodology.13 14 We hypothesised that ML models may provide superior discrimination of all-cause mortality when compared with traditional risk scores. Thus, the aim of this current study was to consolidate the existing evidence comparing discriminative accuracy of ML methods with traditional risk scores for risk prediction after TAVI. As a secondary aim, we evaluated methodological quality, clinical utility and validation practices of these models within the context of current data governance and reporting standards.
Methods
This systematic review and meta-analysis is reported according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) and Meta-analysis Of Observational Studies in Epidemiology statement guidelines.15 16 This review is registered with PROSPERO (CRD42023495584).
Search terms
The search strategy was conducted across PubMed, Embase, Web of Science and Cochrane databases up to 16 December 2023. The search was restricted to studies from the year 2000 onward, reflecting a focus on contemporary studies employing advanced ML algorithms. We conducted a literature search in databases using the following MeSH terms and keywords in the title or abstract: ‘Transcatheter Aortic Valve Implantation,’ ‘TAVR,’ ‘Aortic Valve Stenosis’ and related terms for aortic stenosis. We also included a range of terms to capture studies involving machine learning and artificial intelligence, such as ‘machine learning,’ ‘neural networks,’ ‘random forests,’ ‘decision trees,’ ‘knowledge representation,’ ‘computer vision systems,’ ‘computer reasoning,’ ‘natural language processing,’ ‘connectionist models,’ ‘expert systems’ and variations of these terms with prefixes including ‘artificial,’ ‘computer*,’ ‘machine,’ ‘deep,’ ‘transfer,’ ‘hierarchical’ and their associations with ‘intelligence,’ ‘learning’ and ‘reasoning’. Further details regarding the search strategy can be found in online supplemental appendix 1.
Inclusion and exclusion criteria
We considered all studies that evaluated the risk of all-cause mortality after an initial admission for TAVI using both ML algorithms and conventional risk scores. Studies were excluded if they did not assess ML and traditional risk score models on the same dataset or if their predictive models lacked all-cause mortality as an outcome. Additionally, we excluded review articles, abstracts without available full texts and non-English studies. No restrictions were placed on minimum follow-up duration.
Outcomes
The primary outcome was comparative discrimination of all-cause mortality in-hospital, at 30 days or at 1 year. The C-statistic (area under the receiver-operating characteristic curve) was selected as the discrimination metric.17
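As a minimal illustration of the metric, the C-statistic for a binary outcome can be read as the proportion of event/non-event patient pairs in which the patient who died was assigned the higher predicted risk, with ties counted as half. The Python sketch below uses hypothetical predicted risks and outcomes; the function name and data are illustrative assumptions and are not taken from any included study.

```python
from itertools import product


def c_statistic(risks, outcomes):
    """Concordance probability for a binary outcome (1 = died, 0 = survived).

    Counts the share of (event, non-event) pairs in which the event case
    has the higher predicted risk; tied risks contribute 0.5.
    """
    events = [r for r, y in zip(risks, outcomes) if y == 1]
    non_events = [r for r, y in zip(risks, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for e, n in product(events, non_events):
        if e > n:
            concordant += 1.0
        elif e == n:
            concordant += 0.5
    return concordant / (len(events) * len(non_events))


# Hypothetical predicted mortality risks and observed outcomes
risks = [0.05, 0.10, 0.20, 0.40, 0.70]
died = [0, 0, 1, 0, 1]
print(round(c_statistic(risks, died), 3))
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of patients who died from those who survived.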
Data collection and quality assessment
Four reviewers (CM, DM, JG and SS) screened all titles and abstracts independently, with all candidate articles screened by at least two reviewers. Any conflicts were reviewed by the inclusion of two separate reviewers (AG and AZ) and resolved by consensus. This was performed with a free-to-use web application (Rayyan, Qatar Computing Research Institute, Ar-Rayyan, Qatar).18 The selected articles then underwent a full-text review conducted independently by four reviewers (CM, DM, JG and SS). Data from the selected studies were collected using a standardised extraction template that had undergone pilot testing. Four independent reviewers (CM, DM, JG and SS) performed the data extraction, and disagreements were resolved either through discussion or by consulting a third reviewer (AZ). Methodological quality was assessed by six reviewers (AZ, CM, DM, JG, SS and BM) using the Prediction Model Risk of Bias Assessment Tool (PROBAST). Any disagreements regarding classifications were resolved through consensus among the reviewers.
Statistical analysis
A random-effects model was used to account for variations in populations and outcomes. Studies were included in the meta-analysis if they reported or permitted the calculation of C-statistics with 95% CIs for both ML and traditional risk score models. Statistical significance of pooled C-statistics was determined, with a threshold set at p<0.05. Similar to previous meta-analyses,12 to avoid unit-of-analysis errors, only the best-performing model was included in the meta-analysis when multiple ML or traditional models were evaluated from the same dataset. Consistent with prior approaches, heterogeneity was not formally assessed and was presumed to be due to variability in baseline demographics, sample size and model features.12 19 We calculated prediction intervals using the Hartung-Knapp-Sidik-Jonkman (HKSJ) method to mitigate the risk of type 1 error. A narrative assessment of calibration was conducted due to inconsistency in reporting metrics.20 21 Similar to previous approaches, ML algorithms were classified into six categories: logistic regression-based algorithms, random forests, support vector machines, gradient boosted trees, neural networks and other mixed ensemble ML methods. Based on the number of covariates included, ML models were further grouped as low (traditional cardiovascular risk factors), moderate (including non-traditional lab results) or high (incorporating biobank or imaging data), following the methodology by Liu et al22 and Zaka et al.12
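The pooling approach described above can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the authors' analysis code: it pools on the raw C-statistic scale, back-calculates standard errors from the reported 95% CIs, uses the DerSimonian-Laird estimator for the between-study variance, and takes the Student's t critical values as caller-supplied inputs (2.306 and 2.365 for 8 and 7 degrees of freedom). The input values are the ML rows of table 3; the function names are hypothetical, and the output is not expected to reproduce the published estimates exactly.

```python
import math

Z_975 = 1.959964  # 97.5th percentile of the standard normal distribution


def se_from_ci(lower, upper):
    """Approximate a standard error from a symmetric 95% CI."""
    return (upper - lower) / (2 * Z_975)


def pool_random_effects(estimates, ses, t_km1, t_km2):
    """Random-effects pooling with HKSJ confidence and prediction intervals.

    t_km1 and t_km2 are the 97.5th percentiles of Student's t with
    k-1 and k-2 degrees of freedom (assumed to be supplied by the caller).
    """
    k = len(estimates)
    if k < 3:
        raise ValueError("need at least three studies")
    w = [1 / se**2 for se in ses]
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    # Cochran's Q and the DerSimonian-Laird between-study variance
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)
    # Random-effects weights and pooled estimate
    w_re = [1 / (se**2 + tau2) for se in ses]
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    # HKSJ variance of the pooled estimate
    resid = sum(wi * (yi - pooled) ** 2 for wi, yi in zip(w_re, estimates))
    var_hksj = resid / ((k - 1) * sum(w_re))
    ci_half = t_km1 * math.sqrt(var_hksj)
    pi_half = t_km2 * math.sqrt(tau2 + var_hksj)
    return {
        "pooled": pooled,
        "tau2": tau2,
        "ci": (pooled - ci_half, pooled + ci_half),
        "pi": (pooled - pi_half, pooled + pi_half),
    }


# ML rows of table 3: C-statistic with 95% CI bounds
ml_rows = [
    (0.72, 0.68, 0.76), (0.71, 0.64, 0.78), (0.97, 0.95, 0.99),
    (0.92, 0.89, 0.95), (0.66, 0.56, 0.76), (0.82, 0.78, 0.86),
    (0.75, 0.72, 0.78), (0.71, 0.60, 0.82), (0.77, 0.73, 0.81),
]
c_stats = [row[0] for row in ml_rows]
ses = [se_from_ci(row[1], row[2]) for row in ml_rows]
result = pool_random_effects(c_stats, ses, t_km1=2.306, t_km2=2.365)
print(result)
```

The prediction interval adds the between-study variance to the uncertainty of the pooled estimate, so it is wider than the confidence interval whenever the studies are heterogeneous.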
Results
Search results
After eliminating duplicate records, the literature search identified 1476 unique studies. Of these, nine studies met the inclusion criteria for this systematic review and meta-analysis (refer to figure 1). The reasons for excluding full-text studies are outlined in detail in online supplemental materials.
Figure 1. PRISMA flow chart. PRISMA, Preferred Reporting Items for Systematic Reviews and Meta-Analyses; TAVI, transcatheter aortic valve implantation.
Included studies
A total of nine studies published between 2019 and 2023 were included in this meta-analysis (see table 1). The cohort sizes varied, ranging between 129 and 10 883 patients. Two studies reported externally validated models. All nine studies provided measures of discrimination for both ML models and traditional risk scores. Calibration was reported in four studies through a combination of calibration plots (n=3) and Brier scores (n=1), while six studies did not include any calibration metrics.
Table 1. Characteristics of included studies.
| Study | Registry | Design | Outcome | N | Machine learning model | Covariates | Calibration | Dataset partition (training, internal validation, external validation) | Traditional risk model |
| Agasthi47 2021 | Mayo Clinic database | Retrospective Cohort Study | 1-year mortality | N=1055 | Gradient Boosting Machine | High covariate status including 163 covariates for outcome prediction, highlighting the top 20, which include Resting Cardiac Power Index, preprocedural haemoglobin, and systolic blood pressure as leading indicators. | Not reported | Internal Validation | TAVI2-SCORE |
| Al-Farra48 2022 | Netherlands Heart Registration | Retrospective Cohort Study | 30-day mortality | N=9144 | Logistic Regression (LR) with MICE (Multiple Imputation by Chained Equations) | High covariate status including age, serum creatinine, left ventricular ejection fraction (%), body surface area (m2), NYHA class, procedure acuity, chronic lung disease, critical preoperative state, diabetes mellitus without medication, TAVI access route (transfemoral, subclavian, transapical or direct aortic) | LR with MICE: Brier score: 0.039; Brier-skill score: 0.001; calibration-intercept: −0.03; calibration-slope: 1.00 | Internal Validation | ACC-TAVI |
| Gomes49 2021 | Department of Internal Medicine III, University of Heidelberg. | Retrospective Cohort Study (Single centre) | In-hospital mortality | N=451 | Neural network (NN); support vector machine; random forest | High covariate status including baseline comorbidities, echocardiogram and CT findings, laboratory results, ECG readings, valve specifics, functional class, intraprocedural metrics and clinical outcomes | Not reported | Internal validation | STS risk score |
| Hernandez-Suarez50 2019 | National Inpatient Sample (NIS) TAVR database | Retrospective Cohort Study | In-hospital mortality | N=10 883 | LR; artificial neural network (artificial NN); Naïve Bayes; random forest (RF) | High covariate status including demographics, lifestyle, cardiovascular history, comorbidities, and socioeconomic status, along with hospital characteristics and TAVR procedural approaches | Not reported | Internal validation | NIS TAVR |
| Hoffman51 2020 | Department of Cardiology, Centre of Internal Medicine, Goethe University Frankfurt, Germany | Prospective, Observational Cohort Study | 1-year mortality | N=129 | XGBoost | Moderate covariate status including 25 covariates such as STS score, frailty, various pre-TAVI and post-TAVI biomarkers such as immune cell percentages, albumin, red cell distribution width, ejection fraction, and inflammation markers | Not reported | Internal validation | Linear Predictor Score |
| Kwiecinski31 2023 | Medical University of Warsaw and the Cardiovascular Institute, Hospital Clinico San Carlos, Madrid (validation cohort). | Retrospective Cohort Study | 1-year mortality | N=604 | XGBoost | High covariate status including pre- and post-TAVI patient data including demographics, medical history, cardiac function, and procedural outcomes | XGBoost calibration plot provided shows good calibration with the observed 1-year risk of all-cause mortality. In the external validation, 77 (91%) events occurred in patients with the ML score within the top two deciles. | External validation of 823 patients was performed on unseen data from two other independent high-volume TAVI centres (validation cohort). | OBSERVANT |
| Lertsanguansinchai45 2023 | King Chulalongkorn Memorial Hospital, Bangkok, Thailand | Retrospective Cohort Study (Single centre) | 1-year mortality | N=178 | | High covariate status including height, chronic lung disease, STS score, preoperative left ventricular ejection fraction (LVEF), age, preoperative Left Ventricular Outflow Tract Velocity-Time Integral (LVOT). For the 1-year mortality model: preoperative LVEF, STS score, heart rate, systolic blood pressure, home oxygen use, serum creatinine level, preoperative LVOT maximum velocity (LVOT Vmax) | Not reported | Internal validation | CoreValve |
| Leha32 2023 | German Aortic Valve Registry 2011–2017 | Retrospective Cohort Study | 30-day mortality | N=6693 | TRIMpost | High covariate status including 58 covariates with most influential variables including duration of intervention, fluoroscopy, serum creatinine, weight, age, contrast load, LVEF, AVA, iliac artery diameter, PA pressure, aortic valve annulus dimensions, Pmax post TAVI, Ppeak to leak | Calibration curves were generated comparing the estimated and the observed proportion of events in the deciles of the risk scores. TRIM intercept 0.59 (95% CI 0.03 to 1.10) and slope 2.71 (95% CI 2.32 to 3.10). STS intercept −0.67 (95% CI −1.26 to 0.1) and slope 1.08 (95% CI 0.86 to 1.3) | External validation | STS score |
| Penso52 2021 | Centro Cardiologico Monzino IRCCS in Milan, Italy | Retrospective Cohort Study | 5-year mortality | N=471 | RF; XGBoost; multilayer perceptron; additionally: LR | High covariate status including 83 preprocedural variables encompassing demographics, cardiovascular status, comorbidities, medication use, and detailed echocardiographic data | Not reported | Internal validation | EuroSCORE II |
ACC, American College of Cardiology; AVA, aortic valve area; NIS, National Inpatient Sample; NYHA, New York Heart Association; OBSERVANT, Observatoire Francais du Remplacement Valvulaire Aortique par Voie Percutanée; PA, pulmonary artery; STS, Society of Thoracic Surgeons; TAVI, transcatheter aortic valve implantation; TAVR, transcatheter aortic valve replacement; TRIMpost, Transcatheter Risk of Mortality Post-Operative; XGBoost, Extreme Gradient Boosting.
A total of 16 models were evaluated. These included random forests (n=5), neural networks (n=2), logistic regression-based approaches (n=2), support vector machines (n=1), gradient-boosted trees (n=4) and other classical ML methods (n=2). The number of covariates incorporated into these models ranged from 6 to 163. Among the traditional risk scores, the CoreValve score (n=3), TAVI2-SCORE (n=3), FRANCE-2 score (n=3) and Observatoire Francais du Remplacement Valvulaire Aortique par Voie Percutanée score (n=3) were most frequently reported. Further details on the characteristics of the included studies are provided in table 1.
Risk of bias
Assessment using the PROBAST tool revealed that all 16 models had a high overall risk of bias. The primary concerns were related to handling of missing data, inadequate reporting of calibration metrics and insufficient detailing of predictor weighting. A comprehensive risk of bias evaluation is provided in online supplemental appendix 3.
Primary outcome: discrimination of included studies
A total of nine studies, reporting algorithms trained and validated on 29 608 participants, contributed to the meta-analysis of model discrimination (see figure 1). Among the evaluated outcomes, three studies focused on all-cause mortality at 1 year, three studies examined in-hospital mortality, three studies analysed 30-day all-cause mortality and one study evaluated both 30-day and 1-year all-cause mortality. All included C-statistics from the ML model and traditional risk scores were calculated from internal validation performances. Where reported, external validation performances were also included.
The leading ML models achieved a C-statistic of 0.79 (95% CI 0.71 to 0.86), outperforming the highest-scoring traditional methods, which had a C-statistic of 0.68 (95% CI 0.61 to 0.76) (figure 2). The observed difference in C-statistics between all ML approaches and traditional techniques was 0.11, which was statistically significant (p<0.00001) (figure 3). To examine the influence of larger studies on the results, a fixed-effects model was also applied. This approach yielded a pooled C-statistic of 0.86 (95% CI 0.85 to 0.87) for the top-performing ML models, compared with 0.77 (95% CI 0.76 to 0.78) for the best traditional methods. As shown in table 2, the top three performing ML models were support vector machines, with a C-statistic of 0.94 (95% CI 0.91 to 0.97); neural network models, with a C-statistic of 0.91 (95% CI 0.80 to 1.01); and random forest models, with a C-statistic of 0.86 (95% CI 0.73 to 0.99) (test for subgroup differences, p<0.00001). C-statistics and prediction intervals for ML models are summarised in table 3.
Figure 2. Summary C-statistic (95% CI) for top-performing ML models versus top-performing traditional risk score for all-cause mortality. ACC, American College of Cardiology; GBM, Gradient Boosting Machine; LR, logistic Regression; ML, machine learning; MLP, multilayer perceptron; NIS, National Inpatient Sample; OBSERVANT, Observatoire Francais du Remplacement Valvulaire Aortique par Voie Percutanée; RF, random forest; STS, Society of Thoracic Surgeons; TAVI, transcatheter aortic valve implantation; TAVR, transcatheter aortic valve replacement; XGBoost, Extreme Gradient Boosting.
Figure 3. Central graphical abstract. PROBAST, Prediction Model Risk of Bias Assessment Tool; TRIPOD, Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis.
Table 2. Subgroup analysis of discriminative performance by ML subalgorithm.
| Subgroup analysis (number of included studies) | ML summary C-statistic | P value |
| Random forest (n=4) | 0.84 (0.71–0.96) | <0.00001 |
| Gradient Boosted Tree (n=4) | 0.74 (0.68–0.80) | <0.00001 |
| Neural network (n=2) | 0.91 (0.80–1.01) | <0.00001 |
| Logistic regression (n=2) | 0.82 (0.61–1.02) | <0.00001 |
| Support vector machines (n=1) | 0.94 (0.91–0.97) | <0.00001 |
| Other ML methods (n=2) | 0.80 (0.69–0.91) | <0.00001 |
ML, machine learning.
Table 3. Comparison of ML models and traditional risk scores in discrimination (C-statistics with 95% CI and prediction intervals).
| Model | C-statistic | 95% CI (lower, upper) | Prediction interval (lower, upper) |
| Agasthi 2021 GBM | 0.72 | (0.68, 0.76) | (0.71, 0.73) |
| Agasthi 2021 TAVI2-SCORE | 0.56 | (0.50, 0.62) | (0.55, 0.57) |
| Al-Farra 2022 LR MICE | 0.71 | (0.64, 0.78) | (0.69, 0.73) |
| Al-Farra 2022 ACC-TAVI | 0.64 | (0.61, 0.67) | (0.63, 0.65) |
| Gomes 2021 RF | 0.97 | (0.95, 0.99) | (0.97, 0.97) |
| Gomes 2021 STS risk score | 0.64 | (0.62, 0.66) | (0.64, 0.65) |
| Hernandez-Suarez 2019 LR | 0.92 | (0.89, 0.95) | (0.91, 0.93) |
| Hernandez-Suarez 2019 NIS TAVR | 0.92 | (0.88, 0.96) | (0.91, 0.93) |
| Hoffman 2020 XGBoost | 0.66 | (0.56, 0.76) | (0.64, 0.68) |
| Hoffman 2020 Linear Predictor Score | 0.88 | (0.78, 0.99) | (0.86, 0.90) |
| Kwiecinski 2023 XGBoost | 0.82 | (0.78, 0.86) | (0.81, 0.83) |
| Kwiecinski 2023 OBSERVANT | 0.59 | (0.54, 0.64) | (0.58, 0.60) |
| Leha 2023 TRIMpost | 0.75 | (0.72, 0.78) | (0.74, 0.76) |
| Leha 2023 STS score | 0.67 | (0.63, 0.71) | (0.66, 0.68) |
| Lertsanguansinchai 2023 DT+RF | 0.71 | (0.60, 0.82) | (0.68, 0.74) |
| Lertsanguansinchai 2023 CoreValve Score | 0.68 | (0.57, 0.79) | (0.66, 0.70) |
| Penso 2021 MLP | 0.77 | (0.73, 0.81) | (0.76, 0.78) |
| Penso 2021 EuroSCORE II | 0.60 | (0.55, 0.65) | (0.59, 0.61) |
ACC, American College of Cardiology; C-statistic, concordance statistic; GBM, Gradient Boosting Machine; MICE, Multiple Imputation by Chained Equations; ML, machine learning; MLP, multilayer perceptron; NIS, National Inpatient Sample; OBSERVANT, Observatoire Francais du Remplacement Valvulaire Aortique par Voie Percutanée; RF, random forest; STS, Society of Thoracic Surgeons; TAVI, transcatheter aortic valve implantation; TAVR, transcatheter aortic valve replacement; TRIMpost, Transcatheter Risk of Mortality Post-Operative; XGBoost, Extreme Gradient Boosting.
Calibration of included studies
Calibration was reported in three of the nine included studies. One study reported calibration of the ML model only. One study demonstrated calibration was equally good in both ML and traditional risk score models.
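For readers less familiar with the calibration metrics named in table 1, the Brier score is the mean squared difference between predicted risk and observed outcome, and the Brier skill score compares it against a reference model that assigns every patient the observed event rate. The Python sketch below illustrates both on hypothetical data; the function names and values are assumptions for illustration and do not reproduce any included study's results.

```python
def brier_score(probs, outcomes):
    """Mean squared error between predicted risks and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def brier_skill_score(probs, outcomes):
    """1 - BS / BS_ref, where the reference predicts the observed event rate."""
    event_rate = sum(outcomes) / len(outcomes)
    reference = brier_score([event_rate] * len(outcomes), outcomes)
    if reference == 0:
        raise ValueError("outcomes must include both events and non-events")
    return 1 - brier_score(probs, outcomes) / reference


# Hypothetical predicted risks and observed outcomes (1 = died)
probs = [0.1, 0.2, 0.8, 0.3]
died = [0, 0, 1, 0]
print(brier_score(probs, died), brier_skill_score(probs, died))
```

Lower Brier scores indicate better overall accuracy of the predicted risks, while a Brier skill score near zero indicates little improvement over predicting the baseline event rate for every patient.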
Subgroup analysis: machine-learning subtypes and model covariates
All ML algorithm subtypes outperformed traditional risk scores in model discrimination with statistical significance (table 2). Eight out of nine studies were identified as high covariate while one study was identified as moderate covariate. Due to the predominance of the studies in the high covariate group, subgroup analysis could not be performed.
Discussion
This study evaluated the performance of ML models versus traditional methods in predicting mortality risk among TAVI patients. The findings revealed that ML approaches achieved better discrimination than traditional risk scores, with a pooled C-statistic that was 0.11 higher for all-cause mortality at 30 days and 1 year post-TAVI. However, it is worth noting that only two ML models underwent external validation and all included studies were at high risk of bias (figure 3).
The conventional method of deciding between TAVI and SAVR often involves using STS-PROM or EuroSCORE II. These scores are readily interpretable owing to their minimal interaction terms and are periodically recalibrated to current trends.4 Notably, both scores, though indicative of relative risk, are not calibrated specifically to TAVI patients and their diverse risk profile.23 24 TAVI-specific risk stratification data are less extensive; models like ACC-TAVI, FRANCE-2 and GAVS show moderate discriminative ability but have not been disseminated throughout clinical guidelines, in the setting of early commitment to TAVI by operators and evolving guidelines that favour TAVI for intermediate and low-risk patients.7 25,29 Patient-specific risk estimates have been hypothesised to enable tailored post-TAVI monitoring and more efficient resource distribution.30,32
This systematic review and meta-analysis complements and extends the results of previous analyses.13 14 A previous meta-analysis by Sazzad et al14 reported superior discrimination of all-cause mortality with ML models at varying time points after TAVI (30 days, 1 year and 5 years). The current review is distinguished by (a) inclusion of prediction intervals using the HKSJ method, (b) inclusion of an extra eligible study and (c) adherence to standardised reporting guidelines including CHARMS, Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD) and PROBAST guidelines for evaluation of methodological quality and risk of bias. These features not only improve the accuracy of the evidence but also provide a crucial assessment of clinical utility, model performance, standardised reporting and validation practices in response to the exponential increase in recent ML applications and the challenges associated with broader validation and clinical integration.
In the current review, despite ML models significantly outperforming traditional methods, included studies exhibited methodological limitations. In alignment with previous research, only three studies reported calibration, and the variability in calibration metrics underscores the absence of standardised reporting practices.2022 33,36 None of the included studies complied with the TRIPOD reporting framework, pointing to its limited application in AI-focused research and underscoring the pressing need for revised and more relevant guidelines.37 38 Despite limited external validations across the included studies, notably, both Kwiecinski et al’s and Leha et al’s models achieved high accuracy in external validations, with the former being tested in two independent clinical centres and the latter across the Swiss TAVI registry, incorporating pre and postprocedural data.31 32 Recent evaluations of ML models for TAVI patients have highlighted further methodological challenges, particularly in model selection and variable inclusion, impacting the models’ generalisability and applicability to clinical practice. Addressing class imbalance is crucial, with different approaches like cost-sensitive learning potentially offering more balanced model sensitivity without extensive data manipulation.39 Furthermore, while the inclusion of numerous predictive variables can enhance model accuracy, it is essential to ensure that the added complexity does not compromise model interpretability and robustness. This calls for a judicious selection of features, prioritising those that provide significant predictive power while minimising the risk of overfitting and enhancing model stability.
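One simple form of the cost-sensitive learning mentioned above is to weight each patient's contribution to the training loss inversely to the frequency of their outcome class, so that the relatively rare deaths are not swamped by survivors. The Python sketch below is illustrative only: the "balanced" weighting heuristic (n divided by twice the class count), the function names and the data are assumptions and do not describe the approach of any included study.

```python
import math


def balanced_class_weights(outcomes):
    """Weight each class by n / (2 * class count) so both classes
    carry equal total weight in the loss."""
    n = len(outcomes)
    n_events = sum(outcomes)
    n_non_events = n - n_events
    if n_events == 0 or n_non_events == 0:
        raise ValueError("outcomes must include both classes")
    return {0: n / (2 * n_non_events), 1: n / (2 * n_events)}


def weighted_log_loss(probs, outcomes, weights, eps=1e-12):
    """Class-weighted binary cross-entropy, normalised by total weight."""
    total, weight_sum = 0.0, 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1 - eps)  # guard against log(0)
        w = weights[y]
        total += -w * (y * math.log(p) + (1 - y) * math.log(1 - p))
        weight_sum += w
    return total / weight_sum


# Hypothetical cohort with a 10% event rate and a model that
# predicts the same low risk for every patient
died = [0] * 9 + [1]
probs = [0.1] * 10
weights = balanced_class_weights(died)
print(weights, weighted_log_loss(probs, died, weights))
```

Under this weighting, a model that assigns a low risk to the single patient who died is penalised more heavily than it would be under an unweighted loss, without resampling or otherwise altering the data.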
The results of this study align with the hypothesis that ML models provide enhanced discrimination in predicting all-cause mortality among TAVI patients. Importantly, many models evaluated in the current review included variables that are readily available from electronic medical records. Although challenges in methodology hinder their immediate implementation in clinical settings, ML models hold promise for future use, particularly due to their capacity to process real-time ‘big data’ and seamlessly integrate with electronic medical record systems.30 40 Moreover, traditional risk scores developed from SAVR patient data may inadequately reflect the comorbidities affecting the TAVI patient, and few models offer risk stratification beyond 30 days for TAVI recipients.8 9 41 By generating individualised risk estimates, ML models have the potential to facilitate more personalised consent and follow-up for those at the highest risk of adverse events after the index procedure.31 32 Accurate knowledge of a patient’s mortality or readmission risk can allow clinicians to anticipate patient trajectories and personalise treatment plans. This precision in prediction not only facilitates early interventions but also enhances the allocation of healthcare resources, ensuring that high-risk patients receive appropriate care in a timely manner. For instance, by identifying patients with a high 1-year risk of all-cause mortality, insurers can deploy targeted interventions such as home nurse visits, intensive education and early outpatient follow-up, supporting optimal management, cost-effective use of resources and lower readmission rates. An accurate mortality risk prediction tool could be helpful when an individual patient’s risk is uncertain.
Furthermore, ‘all-cause’ mortality (as opposed to cardiovascular death) serves as an appropriate outcome for risk prediction models in patients undergoing TAVI, offering a holistic view of mortality risk that encompasses all potential causes. Risk prediction enhances rather than replaces clinical judgement by providing additional data that can be integrated into decision-making processes. Predicting TAVI outcomes remains challenging due to the homogeneous nature of patients when assessed using conventional predictors.42 While many included studies found no single metric to be a strong predictor of mortality, ML models can provide a relative ranked contribution of variables, enhancing interpretability.43 ML’s capacity to handle various data types, including imaging, allows for more comprehensive outcome prediction and is accessible to clinicians preprocedure or predischarge.31 44 45 Notably, models in the current study were predominantly rated high for covariate inclusion, encompassing patient comorbidities, hospitalisation acuity, socioeconomic factors and periprocedural clinical and echocardiographic data. While ML models present a significant advancement in risk prediction accuracy, their clinical adoption is currently hindered by data governance and regulatory compliance barriers.
The evidence standards required for integrating ML into clinical practice remain undefined, sparking debate over whether early adoption risks inefficiencies or whether delayed implementation forfeits potential patient benefits. Establishing standardised guidelines and performance measures could facilitate integration and advance development. These findings highlight the critical need for more external validation in medical ML research. For TAVI algorithms to reach implementation and improve patient outcomes, international collaboration enabling secure data exchange and regulatory frameworks permitting wider validation and algorithm sharing are necessary.
Limitations
In line with existing ML research, this study faces notable limitations. There is often insufficient disclosure regarding the specifics of ML algorithms, including neural network structures, training periods, data sets and hyperparameter selections. The diversity in how ML models are trained leads to difficulties in interpretability and, when coupled with inconsistencies in internal validation, baseline demographics, covariate inclusion, outcome definitions and reporting metrics, further limits the generalisability of these findings. The paucity of external validation limits immediate clinical application and means that overfitting cannot be excluded. As such, despite its recognised innovative potential, ML has not yet been incorporated into formal clinical risk stratification guidelines.46
Conclusion
ML models demonstrated superior discriminatory performance compared with traditional risk scores in predicting all-cause mortality following TAVI. However, their clinical integration remains limited by methodological challenges, underscoring the need for further validation and standardised performance evaluation.
Footnotes
Funding: The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.
Provenance and peer review: Not commissioned; externally peer reviewed.
Patient consent for publication: Not applicable.
Data availability free text: The data underlying this article are available in the article and in its online supplementary material.
Data availability statement
Data are available in a public, open access repository.
References
- 1. Leon MB, Smith CR, Mack M, et al. Transcatheter aortic-valve implantation for aortic stenosis in patients who cannot undergo surgery. N Engl J Med. 2010;363:1597–607. doi: 10.1056/NEJMoa1008232.
- 2. Otto CM, Nishimura RA, Bonow RO, et al. 2020 ACC/AHA Guideline for the Management of Patients With Valvular Heart Disease. J Am Coll Cardiol. 2021;77:e25–197. doi: 10.1016/j.jacc.2020.11.018.
- 3. Vahanian A, Beyersdorf F, Praz F, et al. 2021 ESC/EACTS Guidelines for the management of valvular heart disease: Developed by the Task Force for the management of valvular heart disease of the European Society of Cardiology (ESC) and the European Association for Cardio-Thoracic Surgery (EACTS). Eur Heart J. 2021;43:561–632. doi: 10.1093/eurheartj/ehab395.
- 4. O’Brien SM, Feng L, He X, et al. The Society of Thoracic Surgeons 2018 Adult Cardiac Surgery Risk Models: Part 2—Statistical Methods and Results. Ann Thorac Surg. 2018;105:1419–28. doi: 10.1016/j.athoracsur.2018.03.003.
- 5. Shahian DM, Jacobs JP, Badhwar V, et al. The Society of Thoracic Surgeons 2018 Adult Cardiac Surgery Risk Models: Part 1—Background, Design Considerations, and Model Development. Ann Thorac Surg. 2018;105:1411–8. doi: 10.1016/j.athoracsur.2018.03.002.
- 6. Beohar N, Whisenant B, Kirtane AJ, et al. The relative performance characteristics of the logistic European System for Cardiac Operative Risk Evaluation score and the Society of Thoracic Surgeons score in the Placement of Aortic Transcatheter Valves trial. J Thorac Cardiovasc Surg. 2014;148:2830–7. doi: 10.1016/j.jtcvs.2014.04.006.
- 7. Carroll JD, Mack MJ, Vemulapalli S, et al. STS-ACC TVT Registry of Transcatheter Aortic Valve Replacement. J Am Coll Cardiol. 2020;76:2492–516. doi: 10.1016/j.jacc.2020.09.595.
- 8. Edwards FH, Cohen DJ, O’Brien SM, et al. Development and Validation of a Risk Prediction Model for In-Hospital Mortality After Transcatheter Aortic Valve Replacement. JAMA Cardiol. 2016;1:46–52. doi: 10.1001/jamacardio.2015.0326.
- 9. Iung B, Laouénan C, Himbert D, et al. Predictive factors of early mortality after transcatheter aortic valve implantation: individual risk assessment using a simple score. Heart. 2014;100:1016–23. doi: 10.1136/heartjnl-2013-305314.
- 10. Kötting J, Schiller W, Beckmann A, et al. German Aortic Valve Score: a new scoring system for prediction of mortality related to aortic valve procedures in adults. Eur J Cardiothorac Surg. 2013;43:971–7. doi: 10.1093/ejcts/ezt114.
- 11. Collins GS, Moons KGM, Dhiman P, et al. TRIPOD+AI statement: updated guidance for reporting clinical prediction models that use regression or machine learning methods. BMJ. 2024;385:e078378. doi: 10.1136/bmj-2023-078378.
- 12. Zaka A, Mutahar D, Gorcilov J, et al. Machine learning approaches for risk prediction after percutaneous coronary intervention: a systematic review and meta-analysis. Eur Heart J Digit Health. 2024. doi: 10.1093/ehjdh/ztae074.
- 13. Jacquemyn X, Van Onsem E, Dufendach K, et al. Machine-learning approaches for risk prediction in transcatheter aortic valve implantation: Systematic review and meta-analysis. J Thorac Cardiovasc Surg. 2024:38815806. doi: 10.1016/j.jtcvs.2024.05.017.
- 14. Sazzad F, Ler AAL, Furqan MS, et al. Harnessing the power of artificial intelligence in predicting all-cause mortality in transcatheter aortic valve replacement: a systematic review and meta-analysis. Front Cardiovasc Med. 2024;11:1343210. doi: 10.3389/fcvm.2024.1343210.
- 15. Page MJ, McKenzie JE, Bossuyt PM, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ. 2021;372:n71. doi: 10.1136/bmj.n71.
- 16. Stroup DF, Berlin JA, Morton SC, et al. Meta-analysis of observational studies in epidemiology: a proposal for reporting. Meta-analysis Of Observational Studies in Epidemiology (MOOSE) group. JAMA. 2000;283:2008–12. doi: 10.1001/jama.283.15.2008.
- 17. Uno H, Cai T, Pencina MJ, et al. On the C-statistics for evaluating overall adequacy of risk prediction procedures with censored survival data. Stat Med. 2011;30:1105–17. doi: 10.1002/sim.4154.
- 18. Ouzzani M, Hammady H, Fedorowicz Z, et al. Rayyan-a web and mobile app for systematic reviews. Syst Rev. 2016;5:210. doi: 10.1186/s13643-016-0384-4.
- 19. Debray TP, Damen JA, Riley RD, et al. A framework for meta-analysis of prediction model studies with binary and time-to-event outcomes. Stat Methods Med Res. 2019;28:2768–86. doi: 10.1177/0962280218785504.
- 20. Assel M, Sjoberg DD, Vickers AJ. The Brier score does not evaluate the clinical utility of diagnostic tests or prediction models. Diagn Progn Res. 2017;1:19. doi: 10.1186/s41512-017-0020-3.
- 21. Redelmeier DA, Bloch DA, Hickam DH. Assessing predictive accuracy: how to compare Brier scores. J Clin Epidemiol. 1991;44:1141–6. doi: 10.1016/0895-4356(91)90146-z.
- 22. Liu W, Laranjo L, Klimis H, et al. Machine-learning versus traditional approaches for atherosclerotic cardiovascular risk prognostication in primary prevention cohorts: a systematic review and meta-analysis. Eur Heart J Qual Care Clin Outcomes. 2023;9:310–22. doi: 10.1093/ehjqcco/qcad017.
- 23. Kumar A, Sato K, Narayanswami J, et al. Current Society of Thoracic Surgeons Model Reclassifies Mortality Risk in Patients Undergoing Transcatheter Aortic Valve Replacement. Circ Cardiovasc Interv. 2018;11:e006664. doi: 10.1161/CIRCINTERVENTIONS.118.006664.
- 24. Rogers T, Koifman E, Patel N, et al. Society of Thoracic Surgeons Score Variance Results in Risk Reclassification of Patients Undergoing Transcatheter Aortic Valve Replacement. JAMA Cardiol. 2017;2:455–6. doi: 10.1001/jamacardio.2016.4132.
- 25. Lee ZX, Elangovan S, Anderson R, et al. Short- and medium-term survival after TAVI: Clinical predictors and the role of the FRANCE-2 score. Int J Cardiol Heart Vasc. 2020;31:100657. doi: 10.1016/j.ijcha.2020.100657.
- 26. Schiller W, Barnewold L, Kazmaier T, et al. The German Aortic Valve Score II. Eur J Cardiothorac Surg. 2017;52:881–7. doi: 10.1093/ejcts/ezx282.
- 27. Nishimura RA, Otto CM, Bonow RO, et al. 2017 AHA/ACC Focused Update of the 2014 AHA/ACC Guideline for the Management of Patients With Valvular Heart Disease: A Report of the American College of Cardiology/American Heart Association Task Force on Clinical Practice Guidelines. Circulation. 2017;135:28298458. doi: 10.1161/CIR.0000000000000503.
- 28. Mack MJ, Leon MB, Thourani VH, et al. Transcatheter Aortic-Valve Replacement with a Balloon-Expandable Valve in Low-Risk Patients. N Engl J Med. 2019;380:1695–705. doi: 10.1056/NEJMoa1814052.
- 29. Leon MB, Smith CR, Mack MJ, et al. Transcatheter or Surgical Aortic-Valve Replacement in Intermediate-Risk Patients. N Engl J Med. 2016;374:1609–20. doi: 10.1056/NEJMoa1514616.
- 30. Krittanawong C, Zhang H, Wang Z, et al. Artificial Intelligence in Precision Cardiovascular Medicine. J Am Coll Cardiol. 2017;69:2657–64. doi: 10.1016/j.jacc.2017.03.571.
- 31. Kwiecinski J, Dabrowski M, Nombela-Franco L, et al. Machine learning for prediction of all-cause mortality after transcatheter aortic valve implantation. Eur Heart J Qual Care Clin Outcomes. 2023;9:768–77. doi: 10.1093/ehjqcco/qcad002.
- 32. Leha A, Huber C, Friede T, et al. Development and validation of explainable machine learning models for risk of mortality in transcatheter aortic valve implantation: TAVI risk machine scores. Eur Heart J Digit Health. 2023;4:225–35. doi: 10.1093/ehjdh/ztad021.
- 33. Benedetto U, Dimagli A, Sinha S, et al. Machine learning improves mortality risk prediction after cardiac surgery: Systematic review and meta-analysis. J Thorac Cardiovasc Surg. 2022;163:2075–87. doi: 10.1016/j.jtcvs.2020.07.105.
- 34. Christodoulou E, Ma J, Collins GS, et al. A systematic review shows no performance benefit of machine learning over logistic regression for clinical prediction models. J Clin Epidemiol. 2019;110:12–22. doi: 10.1016/j.jclinepi.2019.02.004.
- 35. Hamatani Y, Nishi H, Iguchi M, et al. Machine Learning Risk Prediction for Incident Heart Failure in Patients With Atrial Fibrillation. JACC Asia. 2022;2:706–16. doi: 10.1016/j.jacasi.2022.07.007.
- 36. Van Calster B, McLernon DJ, van Smeden M, et al. Calibration: the Achilles heel of predictive analytics. BMC Med. 2019;17:230. doi: 10.1186/s12916-019-1466-7.
- 37. Collins GS, Reitsma JB, Altman DG, et al. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. BMJ. 2015;350:g7594. doi: 10.1136/bmj.g7594.
- 38. Collins GS, Dhiman P, Andaur Navarro CL, et al. Protocol for development of a reporting guideline (TRIPOD-AI) and risk of bias tool (PROBAST-AI) for diagnostic and prognostic prediction model studies based on artificial intelligence. BMJ Open. 2021;11:e048008. doi: 10.1136/bmjopen-2020-048008.
- 39. Mienye ID, Sun Y. Performance analysis of cost-sensitive learning methods with application to imbalanced medical data. Inform Med Unlocked. 2021;25:100690. doi: 10.1016/j.imu.2021.100690.
- 40. Kia B, Mendes A, Parnami A, et al. Nonlinear dynamics based machine learning: Utilizing dynamics-based flexibility of nonlinear circuits to implement different functions. PLoS One. 2020;15:e0228534. doi: 10.1371/journal.pone.0228534.
- 41. Capodanno D, Barbanti M, Tamburino C, et al. A Simple Risk Tool (the OBSERVANT Score) for Prediction of 30-Day Mortality After Transcatheter Aortic Valve Replacement. Am J Cardiol. 2014;113:1851–8. doi: 10.1016/j.amjcard.2014.03.014.
- 42. Martinsson A, Nielsen SJ, Milojevic M, et al. Life Expectancy After Surgical Aortic Valve Replacement. J Am Coll Cardiol. 2021;78:2147–57. doi: 10.1016/j.jacc.2021.09.861.
- 43. Vollmer S, Mateen BA, Bohner G, et al. Machine learning and artificial intelligence research for patient benefit: 20 critical questions on transparency, replicability, ethics, and effectiveness. BMJ. 2020;368:l6927. doi: 10.1136/bmj.l6927.
- 44. Pieszko K, Shanbhag AD, Singh A, et al. Time and event-specific deep learning for personalized risk assessment after cardiac perfusion imaging. NPJ Digit Med. 2023;6:78. doi: 10.1038/s41746-023-00806-x.
- 45. Lertsanguansinchai P, Chokesuwattanaskul R, Petchlorlian A, et al. Machine learning-based predictive risk models for 30-day and 1-year mortality in severe aortic stenosis patients undergoing transcatheter aortic valve implantation. Int J Cardiol. 2023;374:20–6. doi: 10.1016/j.ijcard.2022.12.023.
- 46. Otto CM, Nishimura RA, Bonow RO, et al. 2020 ACC/AHA Guideline for the Management of Patients With Valvular Heart Disease: Executive Summary: A Report of the American College of Cardiology/American Heart Association Joint Committee on Clinical Practice Guidelines. Circulation. 2021;143:33332149. doi: 10.1161/CIR.0000000000000932.
- 47. Agasthi P, Ashraf H, Pujari SH, et al. Artificial Intelligence Trumps TAVI2-SCORE and CoreValve Score in Predicting 1-Year Mortality Post-Transcatheter Aortic Valve Replacement. Cardiovasc Revasc Med. 2021;24:33–41. doi: 10.1016/j.carrev.2020.08.010.
- 48. Al-Farra H, Ravelli ACJ, Henriques JPS, et al. Development and validation of a prediction model for early mortality after transcatheter aortic valve implantation (TAVI) based on the Netherlands Heart Registration (NHR): The TAVI-NHR risk model. Catheter Cardiovasc Interv. 2022;100:879–89. doi: 10.1002/ccd.30398.
- 49. Gomes B, Pilz M, Reich C, et al. Machine learning-based risk prediction of intrahospital clinical outcomes in patients undergoing TAVI. Clin Res Cardiol. 2021;110:343–56. doi: 10.1007/s00392-020-01691-0.
- 50. Hernandez-Suarez DF, Kim Y, Villablanca P, et al. Machine Learning Prediction Models for In-Hospital Mortality After Transcatheter Aortic Valve Replacement. JACC Cardiovasc Interv. 2019;12:1328–38. doi: 10.1016/j.jcin.2019.06.013.
- 51. Hoffmann J, Mas-Peiro S, Berkowitsch A, et al. Inflammatory signatures are associated with increased mortality after transfemoral transcatheter aortic valve implantation. ESC Heart Fail. 2020;7:2597–610. doi: 10.1002/ehf2.12837.
- 52. Penso M, Pepi M, Fusini L, et al. Predicting Long-Term Mortality in TAVI Patients Using Machine Learning Techniques. J Cardiovasc Dev Dis. 2021;8:44. doi: 10.3390/jcdd8040044.