Abstract
Background
Pulmonary hypertension (PH) is characterized by elevated pulmonary pressures and right ventricular strain. Pulmonary arterial hypertension (PAH), a subtype, has a poor prognosis, especially when diagnosis is delayed. Artificial intelligence (AI) methods, including machine learning (ML) and deep learning (DL), offer potential for non-invasive prediction and risk stratification.
Objective
This systematic review assesses ML and DL applications for non-invasive diagnosis, classification, and prognostication in PH and PAH, with emphasis on methodological quality and clinical applicability.
Methods
A PRISMA-guided search identified studies using ML or DL on non-invasive clinical, imaging, or biomarker data, including omics and laboratory parameters. Study characteristics and heterogeneity were synthesized using the SWiM framework. Risk of bias was assessed using PROBAST+AI across participant selection, predictors, outcomes, and analysis.
Results
Fifty-three studies were included. Most used clinical, echocardiographic, imaging, or molecular data. AUC values ranged from 0.71 to 1.00. DL approaches, especially convolutional neural networks, were increasingly applied but seldom externally validated. Nine studies were multicenter, four were prospective, one combined retrospective and prospective cohorts, and none were randomized controlled trials; the rest were retrospective single-center studies. In 15 studies, right heart catheterization was either not performed or not clearly reported. SWiM analysis showed substantial heterogeneity in study design and outcome definitions. According to PROBAST+AI, 44 studies (83%) had low risk of bias, though applicability concerns were common.
Conclusion
ML and DL models show promise for PH and PAH diagnosis and prognosis, but limitations in subclass differentiation, methodological transparency, and validation must be addressed in future research.
Supplementary Information
The online version contains supplementary material available at 10.1186/s40001-025-03557-5.
Keywords: Pulmonary hypertension, Pulmonary arterial hypertension, Artificial intelligence, Machine learning, Deep learning, Diagnostic and prognostic prediction models
Introduction
Pulmonary hypertension (PH) is a progressive, life-limiting condition defined by a mean pulmonary arterial pressure (mPAP) above 20 mmHg at rest, as confirmed by right heart catheterization (RHC) [1]. PH encompasses a spectrum of entities with distinct etiologies, pathophysiologies, and therapeutic implications. The international classification system endorsed by the World Symposium on PH and reaffirmed in the 2022 ESC/ERS guidelines subdivides PH into five groups, including pulmonary arterial hypertension (PAH), PH due to left heart or lung disease, chronic thromboembolic PH (CTEPH), and multifactorial forms [1]. Among these, PAH is a rare but severe vascular disorder characterized by progressive remodeling of the pulmonary arteries and increased pulmonary vascular resistance, often leading to right heart failure and systemic complications [1, 2].
Despite advances in targeted pharmacologic therapy and structured follow-up, survival remains limited, and many patients present at an advanced stage due to significant diagnostic delays [3–6]. These delays, largely attributable to non-specific symptoms such as dyspnea, fatigue, and reduced exercise tolerance [1, 5], average approximately 2.5 years and are associated with increased mortality and higher healthcare utilization [6].
Echocardiography remains the first-line screening tool, though its interpretation is prone to interobserver variability [1, 7], whereas RHC provides definitive diagnosis but is invasive and less feasible for large-scale screening [1, 8].
In this context, artificial intelligence (AI) has emerged as a promising tool to enhance early detection, improve risk stratification, and support clinical decision-making in PH [9–11]. Machine learning (ML) and deep learning (DL) algorithms can identify complex, nonlinear relationships within high-dimensional datasets, enabling earlier and more accurate recognition of disease patterns [9, 12, 13]. Recent studies have applied AI to detect PH using chest radiographs, electrocardiograms, and multimodal combinations of clinical and diagnostic data, with performance in some cases comparable to or exceeding that of physicians [14–17]. ML models trained on real-world electronic health records (EHRs) have shown potential to identify at-risk patients before clinical diagnosis, possibly helping to reduce diagnostic delay [18].
Beyond detection, AI has been used to predict disease severity, treatment response, and clinical outcomes based on diverse data inputs, including clinical parameters, imaging results, laboratory values, and omics data, often relying on multimodal architectures that approximate complex clinical reasoning [12, 16, 19].
However, existing studies vary considerably in methodological quality and clinical applicability [9, 12], and it remains uncertain whether models trained on mixed PH populations can generalize across subtypes [12, 20]. This systematic review synthesizes the current evidence on ML and DL applications in PH, focusing on modeled PH subtypes, data modalities, algorithmic techniques, reported outcomes, and methodological rigor, including validation strategies and overfitting control. Given that nuanced diagnostic distinctions between PH subtypes carry significant therapeutic implications, precise attribution is essential in AI research on PH to ensure clinically meaningful translation [1]. This review aims to provide an evidence-based overview and to guide future research at the intersection of PH phenotyping and advanced AI methodologies.
Methods
This systematic review was conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2020 guidelines [21] and was prospectively registered in the International Prospective Register of Systematic Reviews (PROSPERO; registration number: CRD420251074202) [22]. The review focused on ML and DL applications in PH, emphasizing diagnostic, phenotypic, and prognostic use cases. Methodological aspects such as data types, algorithmic approaches, validation strategies, and subtype attribution were systematically assessed.
A comprehensive literature search was carried out in two sources: MEDLINE via PubMed and Google Scholar, the latter used to identify additional relevant studies. The complete search strategy, including specific Medical Subject Headings (MeSH) terms, free-text keywords, Boolean operators, and field specifications, is provided in Supplementary Table S1. The strategy was designed to identify peer-reviewed original research articles applying AI methods in PH. MeSH terms and free-text keywords such as “pulmonary hypertension”, “pulmonary arterial hypertension”, “machine learning”, “deep learning”, “artificial intelligence”, “diagnosis”, “phenotyping”, “prognosis”, “prediction”, “non-invasive”, “risk stratification”, “survival”, “mortality”, “electrocardiography”, “echocardiography”, “chest X-ray”, “computed tomography”, “magnetic resonance imaging”, and “electronic health records” were used. Boolean operators (“AND”, “OR”) were applied to combine related terms, and searches were conducted within titles, abstracts, and MeSH terms. The same search strategy was applied across both databases, with minor syntax adjustments for Google Scholar, which also indexes full-text information. The search was restricted to English-language publications published since 2016 to reflect the emergence of modern ML and DL approaches in PH research. Only studies involving human subjects were included. In addition to the database searches, reference lists of the electronically identified articles were screened manually, yielding six additional studies. The strategy was refined through iterative testing, adjusting keyword groupings and Boolean logic until all previously known eligible publications were consistently retrieved by the final search.
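The Boolean block structure described above can be illustrated with a short Python sketch. This is purely illustrative: the term lists are abridged from those given in the text, and the complete, database-specific strategy is the one provided in Supplementary Table S1.

```python
# Illustrative reconstruction of the Boolean block logic (abridged term
# lists; the complete strategy appears in Supplementary Table S1).
condition_terms = ['"pulmonary hypertension"', '"pulmonary arterial hypertension"']
method_terms = ['"machine learning"', '"deep learning"', '"artificial intelligence"']
task_terms = ['"diagnosis"', '"prognosis"', '"prediction"', '"risk stratification"']

def or_block(terms):
    # Synonyms within one concept block are OR-combined ...
    return "(" + " OR ".join(terms) + ")"

# ... and the concept blocks are then AND-combined into the final query.
query = " AND ".join(or_block(t) for t in (condition_terms, method_terms, task_terms))
print(query)
```

Grouping synonyms with OR before intersecting concepts with AND is what keeps recall high (any synonym suffices) while restricting results to the intersection of disease, method, and task.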
Studies were included if they met the predefined eligibility criteria. Specifically, studies were eligible if they (i) applied ML or DL techniques to predict or classify PH; (ii) used non-invasive data as input features, such as clinical parameters, laboratory results, electrocardiographic measurements, echocardiography, chest imaging, or other routinely collected non-invasive modalities; (iii) reported quantitative predictive performance metrics for diagnostic, classification, or prognostic tasks; and (iv) provided sufficient methodological detail to allow an assessment of model development, validation, and reproducibility.
Studies were excluded if they (i) were reviews, editorials, conference abstracts, or case reports; (ii) exclusively relied on invasive input features, such as hemodynamic parameters from RHC; or (iii) did not report relevant outcome metrics for model performance.
Two reviewers independently screened titles and abstracts for eligibility. Full-text articles of potentially eligible studies were then reviewed in detail. Disagreements were resolved by discussion and, if necessary, by consulting a third reviewer. The same two reviewers independently extracted data from each included study using a pre-defined data extraction form. Extracted information comprised authorship, year of publication, study location, population characteristics, PH subgroup investigated, data type and source, model type and structure, clinical objective (diagnostic classification, phenotypic differentiation, or prognostic prediction), model performance metrics including area under the receiver operating characteristic curve (AUC), and methods used for validation.
To systematically assess the risk of bias, methodological quality, and applicability of model development and evaluation, the updated Prediction Model Risk of Bias Assessment Tool for Artificial Intelligence (PROBAST+AI) was applied to all included studies [23]. The tool is tailored to AI-based prediction models and evaluates the risk of bias across four domains: participants, predictors, outcome, and analysis. It includes additional signaling questions tailored to ML workflows, addressing aspects such as model calibration, resampling methods, data leakage, and model explainability. To address domain-specific concerns in PH, the assessment was extended to evaluate whether studies applied guideline-based hemodynamic criteria for cohort labeling and adhered to consistent subtype classification according to current recommendations [1]. This approach ensured a robust appraisal of AI model quality in relation to the underlying ground truth and the respective PH populations.
All steps of study selection, data extraction, and quality appraisal were conducted in accordance with established best practices for systematic reviews of prediction model studies. Due to substantial methodological heterogeneity in model types, input features, and clinical endpoints, no meta-analysis was performed. Instead, findings were synthesized systematically. Studies were grouped according to clinical objective, algorithmic approach, outcome parameters, performance metrics, validation strategy, and PH subtype, wherever classification was possible based on the reported data and in accordance with the current ESC/ERS guidelines and Nice classification [1]. This synthesis approach followed the Synthesis Without Meta-analysis (SWiM) guideline to ensure transparent and structured reporting in the absence of a meta-analysis [24]. Study characteristics were extracted and systematically tabulated, incorporating relevant clinical and modeling features.
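For readers less familiar with the performance metrics extracted here, the following minimal, self-contained Python sketch (illustrative only, not drawn from any included study) shows how the two elements most frequently reported by the included models, the AUC and stratified k-fold cross-validation splits, can be computed from first principles:

```python
import random

def roc_auc(y_true, y_score):
    """AUC as the Mann-Whitney probability that a randomly chosen
    positive case receives a higher score than a random negative one."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def stratified_kfold(y, k, seed=0):
    """Yield (train, test) index lists, preserving class balance per fold."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in set(y):
        idx = [i for i, y_i in enumerate(y) if y_i == label]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    for j in range(k):
        test = folds[j]
        train = [i for f in folds[:j] + folds[j + 1:] for i in f]
        yield train, test

# A perfectly separating score yields AUC = 1.0; chance level is ~0.5.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # prints 0.75
```

Framing the AUC as a rank statistic makes explicit why it is threshold-independent, and stratification explains why per-fold estimates remain stable even in the highly imbalanced cohorts (e.g., rare-disease screening) reviewed here.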
Results
Study selection and characteristics
A total of 53 studies met the predefined eligibility criteria and were included in this systematic review. The study selection process is illustrated in the PRISMA flow chart (Fig. 1). The corresponding PRISMA 2020 Checklist detailing adherence to reporting standards is provided in Supplementary Table S2.
Fig. 1.
PRISMA flowchart for study selection. This flowchart illustrates the study selection process in accordance with the PRISMA 2020 guidelines [21]. A total of 472 records were identified through database searches (MEDLINE via PubMed and Google Scholar), and an additional six studies were identified through manual reference searching of the previously selected studies. After removing 142 duplicates, 336 records remained for title and abstract screening. Of these, 263 were excluded based on predefined eligibility criteria. Seventy-two full-text articles were assessed for eligibility, of which 19 articles were excluded. Ultimately, 53 studies were included in the final qualitative synthesis. Reasons for exclusion at each stage are detailed in the flowchart
Results were synthesized narratively and organized by clinical objective, algorithmic approach, and outcome parameters in line with SWiM guidance. Study-level characteristics such as study design, sample size, PH classification, input data types, clinical objectives, validation strategies, model types, and performance metrics are systematically presented in Table 1. The table also summarizes outcome definitions, use of prognostic modeling, and key strengths and limitations for each study. Risk of bias and applicability concerns were assessed using PROBAST+AI and are detailed in Supplementary Table S3, structured by predefined domains.
Table 1.
Study characteristics and performance metrics
| Author (year) | PH group | Study group | Study design | Sample size | Key findings | Outcome measures | Prognostic model used | Model type | Diagnosis method | Validation method | Strengths and limitations |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Alabed et al. (2022) [25] | PAH | MPCA-based ML for Mortality Prediction in PAH | Retrospective cohort | 723 patients | MPCA-based features from CMR significantly improved 1-year mortality prediction (c-index: 0.83 vs. 0.71 with REVEAL) | c-index, ROC-AUC, Kaplan–Meier survival | MPCA, CMR features, REVEAL score | ML | RHC | Tenfold Cross-validation, Internal validation | Strengths: Transparent, clinically interpretable; Limitations: Retrospective, single center, no external validation |
| Anand et al. (2024) [26] | PH | ML for PH Diagnosis Using Echo | Retrospective cohort | 7853 patients | XGBoost model for PH detection achieved high AUC (0.83) and sensitivity (88%) with specificity (54%) | AUC, Accuracy, Sensitivity, Specificity, PPV, NPV | XGBoost | ML | RHC | Fivefold Cross-validation, Internal validation | Strengths: Large cohort, no need for TR jet velocity; Limitations: Retrospective, High PH prevalence in cohort, model performance drop in testing data |
| Aras et al. (2023) [27] | PAH, PH | DL ECG Detection of PH | Retrospective cohort | 24,470 patients | CNN model achieved high AUC for detecting PH (AUC: 0.89), sensitivity (0.79), and specificity (0.84). For pre-capillary PH, the model performed excellently (AUC: 0.91). For PAH, AUC was 0.88 | AUC, Sensitivity, Specificity, PPV, NPV, F1-Score | CNN | DL | RHC or Echo, RHC (subgroup) | Internal validation | Strengths: Large cohort, early detection capability (up to 2 years before diagnosis), potential for widespread clinical use with remote monitoring; Limitations: Retrospective design, misclassification potential for some PH subtypes due to broad inclusion criterion (TR-Velocity > 3.4 m/s), dependent on quality of ECG data |
| Argiento et al. (2024) [28] | PAH | ML for PAH Prediction | Retrospective cohort | 226 patients | Developed an ML algorithm for identifying PAH from anamnesis and non-invasive data. AUC of 83%, accuracy of 74% | AUC, Sensitivity, Specificity | Elastic-Net Regularized Generalized Linear Model | ML | RHC | Threefold Cross-validation, Internal validation | Strengths: Focus on high-risk populations, robust ML model. Limitations: Retrospective, Single center, unbalanced sample, smaller dataset |
| Bauer et al. (2021) [29] | PAH | ML for PAH Prediction Using Proteomics Data | Retrospective cohort | 157 | ML models using proteomics data showed significant potential in predicting PAH with high AUC, sensitivity, and specificity, outperforming traditional biomarkers | AUC, Sensitivity, Specificity | Random Forest | ML | RHC | Tenfold Cross-validation, Internal validation | Strengths: Use of proteomics for early detection, potential for personalized treatment; Limitations: Retrospective, sample size, and lack of external validation |
| Bordag et al. (2023) [30] | PAH, PH (left heart disease), PH (lung disease), CTEPH | ML for PH Prediction Using Lipidomics | Retrospective cohort | 233 patients | ML models using lipidomics identified diagnostic and prognostic biomarkers, with predictive potential (AUC 0.82–0.90) for PH | AUC, Sensitivity, Specificity | Random Forest, XGBoost | ML | RHC | Sevenfold cross-validation, External Validation | Strengths: Novel lipidomics approach, high diagnostic accuracy; Limitations: Small sample size, single center, potential bias in prognostic scores with mixed PH groups |
| Chettrit et al. (2019) [31] | COPD-related PH | DL System for PH Risk Stratification using chest CT | Retrospective cohort | 1285 chest CT studies | The DL model automated the measurement of pulmonary artery (PA) and aorta (Ao) diameters to assess the PA-to-Ao ratio, showing significant potential for PH risk stratification with high Pearson correlation (93% for Ao, 92% for PA) | Pearson correlation, Sensitivity, Specificity, PPV | CNN | DL | Diagnosis based on clinical criteria (RHC, Echo, and other diagnostic tests) | Cross-validation, Internal validation | Strengths: Fully automated, accurate measurements, high specificity for screening; Limitations: Retrospective design, reliance on contrast-enhanced CT scans, potential bias with mixed PH groups |
| Dawes et al. (2017) [66] | PH | ML of 3D Right Ventricular Motion for survival prediction in PH | Prospective cohort | 256 patients | Survival prediction improved with 3D right ventricular motion data. Model provided better prediction than conventional clinical measures (AUC: 0.73 vs. 0.60, P < 0.001) | AUC, Sensitivity, Survival Time | Principal Component Analysis, Supervised Learning | ML | RHC | Eightfold Cross-validation | Strengths: Incorporates 3D motion, better survival prediction, prospective design; Limitations: Retrospective data analysis |
| Diller et al. (2022) [32] | PAH | DL Framework for Detection and Prognostication of PAH | Retrospective cohort | 450 patients | DL model achieved 97.6% accuracy and 100% sensitivity in detecting PAH. It also provided prognostic insights with non-inferior performance compared to expert echocardiography | Sensitivity, Specificity, AUC, Cox-proportional hazard models | CNN-based segmentation and feature extraction; prognostic modelling via multivariable Cox regression | DL | RHC | Internal validation | Strengths: High accuracy, expert-level prediction, provides prognostic data; Limitations: Limited to expert center data, small number of normal controls, retrospective design |
| DuBrock et al. (2024) [33] | PAH | DL algorithm for early detection of PH based on 12-lead ECG | Retrospective cohort | 39,823 PH-likely patients, 219,404 controls | DL model achieved high accuracy for detecting PH, with an AUC of 0.92 at Mayo Clinic, and 0.88 at VUMC. The model was capable of predicting PH up to 5 years prior to diagnosis | AUC, Sensitivity, Specificity, PPV, NPV | CNN | DL | RHC or Echo | Internal validation, External validation (VUMC) | Strengths: High performance, early detection capability; Limitations: Retrospective design, reliance on RHC and TRV measurements, potential bias due to using echo and RHC for cohort definition |
| Duo et al. (2022) [34] | PAH | Gene expression-based diagnostic signature for PAH | Retrospective cohort | 73 PAH samples, 36 normal samples | A diagnostic signature (PDS) for PAH was constructed from key genes identified via WGCNA and LASSO. ROC analysis showed AUCs of 0.948 and 0.945 in two independent cohorts | Sensitivity, Specificity, AUC | LASSO | ML | RHC | External validation (GSE113439) | Strengths: High accuracy, identification of key biomarkers and immune landscape; Limitations: Limited sample size, experimental validation needed |
| Dwivedi et al. (2024) [35] | PAH and PH-LD | AI model for lung fibrosis quantification and survival prediction | Retrospective cohort | 521 patients | AI-quantified lung fibrosis on CT pulmonary angiograms was associated with increased mortality risk (C-index: 0.76). Combining AI with radiologic scoring improved survival prediction | C-index, Mortality | AI-based DL Model | DL | RHC | External validation | Strengths: AI model for accurate fibrosis quantification; Limitations: Retrospective study, reliance on external validation, potential bias from image acquisition variability |
| Errington et al. (2023) [36] | PAH | miRNA expression-based ML model for PAH diagnosis | Retrospective cohort | 107 patients | ML model based on miRNA expression showed high diagnostic accuracy (AUC: 0.85) for PAH and provided potential biomarkers for prognosis | AUC, Sensitivity, Specificity, PPV, NPV | SVM, Random Forest, LASSO, XGBoost, Ensemble, Rpart | ML | RHC | Tenfold Cross-validation, External validation | Strengths: High diagnostic accuracy, identification of miRNA biomarkers for PAH; Limitations: Limited external validation, retrospective design |
| Fortmeier et al. (2022) [37] | PH | XGBoost model for mPAP prediction | Retrospective cohort | 116 patients | XGBoost model based on echocardiographic parameters was able to predict mPAP and associated with 2-year all-cause mortality (HR 2.4) | Pearson correlation, survival | XGBoost | ML | RHC | Internal and external validation | Strengths: Cohort with both RHC and echo data; Limitations: Small sample size, retrospective design |
| Gawlitza et al. (2024) [38] | CTEPH | ML-based feature identification for hemodynamic endpoint prediction using CT | Retrospective cohort | 127 patients | The random forest model achieved AUC of 0.82 for mPAP prediction and 0.74 for PA SaO2 prediction, using quantitative and qualitative CT features | AUC, Sensitivity, Specificity, PPV, NPV | Random Forest | ML | RHC | Cross-validation, internal validation | Strengths: Non-invasive risk stratification using CT features; Limitations: Small cohort, Retrospective design |
| Imai et al. (2024) [14] | PAH | DL algorithm for PAH detection using CXR | Retrospective cohort | 145 PAH patients, 260 controls | The DL model (ResNet50) achieved AUC of 0.988 for PAH detection using CXR images, outperforming experienced doctors (AUC 0.945) | AUC, Sensitivity, Specificity | ResNet50 | DL | RHC | Fourfold cross-validation, Internal validation | Strengths: High diagnostic accuracy, non-invasive and cost-effective; Limitations: Small sample size, single center study, potential image quality variability, retrospective design |
| Kanwar et al. (2020) [41] | PAH | Bayesian network (PHORA) for PAH risk stratification | Retrospective cohort | 3515 patients from the REVEAL registry | The PHORA Bayesian network model achieved AUC of 0.80 for 1-year survival, outperforming the REVEAL 2.0 model (AUC 0.76). It was validated externally in two registries with an AUC of 0.74 and 0.80 | AUC, Sensitivity, Specificity, NPV, PPV | Bayesian network (Tree-augmented Naïve Bayes—TAN) | ML | RHC | Internal validation (REVEAL registry), External validation (COMPERA, PHSANZ) | Strengths: Improved discriminatory ability, can handle missing data, validated in multiple cohorts; Limitations: Survival bias, missing data in registries |
| Kheyfets et al. (2023) [68] | PAH | Random forest model for PAH survival prediction using clinical and biomarker data | Prospective cohort | 167 PAH patients | The random forest model predicted 4-year survival risk with AUC 0.94 (internal validation) and AUC 0.81 (external validation). It identified novel biomarkers such as IL-2, IL-9, and 6MWD as significant predictors of risk | AUC, Sensitivity, Specificity | Random Forest | ML | RHC | Internal validation (Stanford cohort), External validation (Sheffield cohort) | Strengths: Novel approach combining clinical and biomarker data for personalized PAH prognostication, prospective design; Limitations: Relatively small cohort, biomarker data from a single center |
| Kiely et al. (2019) [42] | iPAH | Predictive model based on HCRU to identify patients at risk for iPAH | Retrospective cohort | 709 iPAH patients and 2,812,458 non-iPAH patients | The Gradient Boosting Trees model achieved 99.99% specificity and 14.10% sensitivity, identifying 100 iPAH cases among 969 flagged patients | Sensitivity, Specificity, PPV, NPV | Gradient Boosting Trees (XGBoost) | ML | RHC | Fivefold cross-validation, internal validation | Strengths: Cost-effective, real-world data-based model for rare disease screening; Limitations: Low sensitivity, narrow scope of healthcare data used |
| Kogan et al. (2023) [18] | PAH, CTEPH, and other PH types | XGBoost model for early PH detection using EHR data | Retrospective cohort | 115,822 patients, 11,279,478 controls | The XGBoost model achieved AUC 0.92 for PH prediction. The model also predicted PH subgroups (PAH: 0.79–0.90 AUC, CTEPH: 0.87–0.96 AUC) | AUC, Sensitivity, PPV | XGBoost | ML | Echo or RHC | Threefold cross-validation, internal validation | Strengths: Large cohort, uses real-world EHR data; Limitations: PH diagnosis not uniformly confirmed by RHC, potential bias from coding algorithms, retrospective design |
| Kusunose et al. (2020) [45] | PH | DL model for PH detection using CXR | Retrospective cohort | 900 patients | CNN achieved AUC of 0.71 for PH detection using CXR images, improving significantly compared to human observers | AUC, NPV | CNN | DL | RHC | Tenfold cross-validation, internal and external validation | Strengths: AI-driven approach with CXR, non-invasive screening; Limitations: Moderate accuracy, single-center |
| Leha et al. (2019) [47] | PAH, PH due to left heart disease, PH due to lung disease and hypoxia, PH with unclear and multi-factorial | ML for PH prediction using echo | Retrospective cohort | 90 patients (68 with confirmed PH, 22 without PH) | AUC for SVM: 0.83, Random Forest Regression: 0.87 for predicting PH from echo | AUC, Sensitivity, Specificity, PPV, NPV | SVM, Random Forest, Lasso, boosted classification trees | ML | RHC | Threefold cross-validation, Internal validation | Strengths: High AUC, broad echocardiographic data; Limitations: Small cohort, retrospective design |
| Liao et al. (2023) [69] | PH due to left heart disease, PAH, CTED | ML for PH detection using echo | Retrospective cohort | 346 patients | The ML model achieved AUC 0.945 in internal validation and AUC 0.950 in external validation for predicting PH from echocardiographic images | AUC | Linear regression, LightGBM, CatBoost | ML | RHC | Cross-validation (50%), Internal and external validation | Strengths: High AUC, robust model for PH detection from echocardiographic images, external validation; Limitations: relatively small sample size, no data on ethnic diversity, Possible bias in image quality, retrospective design |
| Lungu et al. (2016) [49] | PAH, PH due to left heart disease, PH due to lung disease and hypoxia, CTEPH, PH with unclear and multi-factorial | MRI-based ML model for PH detection | Retrospective cohort | 72 patients | The ML model achieved 92% accuracy in diagnosing PH using MRI-derived parameters and decision tree analysis | AUC, Sensitivity, Specificity, PPV, NPV | Random Forest Classification | ML | RHC | Leave-one-out cross-validation | Strengths: Non-invasive diagnostic tool, high accuracy with MRI; Limitations: Small sample size, lack of internal or external validation dataset, retrospective design, single center |
| Matsunaga et al. (2024) [50] | CTEPH | ML models for predicting mPAP in CTEPH | Retrospective cohort | 136 patients | The linear regression model achieved the highest R2 value of 0.388. Models including age, BNP, TRPG, CXR performed better than traditional methods using TRPG alone | R2, RMSE, MAE | Linear regression, Decision Tree, SVR, KNN, Random Forest, XGBoost | ML | RHC | Internal validation | Strengths: Multiple model types tested, multivariable model increases prediction accuracy; Limitations: Small sample size, no external validation |
| Murayama et al. (2024) [51] | PAH, CTEPH | DL model for RVEF estimation using 2D echo | Retrospective cohort | 93 patients | The DL model (3D-ResNet50) predicted RVEF with a mean absolute error of 7.67% and showed AUC 0.84 for detecting severe RV dysfunction | AUC, Mean absolute error | 3D-ResNet50 CNN model | DL | RHC | Fivefold cross-validation, Internal validation | Strengths: Automated tool for RVEF prediction, high diagnostic accuracy, Echo-based; Limitations: Small sample size, retrospective design, proportional error observed |
| Nemati et al. (2024) [20] | PH | ML model for PH detection using orthogonal voltage gradient (OVG) and photoplethysmographic (PPG) signals | Retrospective cohort | 488 patients | AUC of 0.93, sensitivity of 87%, specificity of 83% for PH detection using non-invasive sensors (OVG & PPG signals) | AUC, Sensitivity, Specificity | Elastic Net, Random Forest | ML | RHC | Out-of-fold cross-validation, Internal validation | Strengths: Non-invasive, point-of-care, high sensitivity and specificity, generalizable; Limitations: Relies on specific features, limited to point-of-care application, retrospective design, still requires further validation |
| Ong et al. (2020) [52] | PH, PAH | Claims-based ML model for PH detection using EHR and Medicare claims | Retrospective cohort | 550 patients | ML models outperformed rule-based algorithms for identifying PH in administrative claims, achieving an AUC of 0.88 | AUC, Sensitivity, Specificity, PPV, NPV | Lasso, Random Forest, Gradient boosting machine | ML | RHC | Tenfold cross-validation, Internal validation, Bootstrap | Strengths: High performance, real-world healthcare data, multicenter design; Limitations: Relies on administrative claims, no full external validation, retrospective design |
| Priya et al. (2021) [53] | PAH, PH due to left heart disease, PH due to lung disease and hypoxia, CTEPH, PH with unclear and multi-factorial | Cardiac MRI-based radiomics for PH detection using texture features | Retrospective cohort | 72 patients (42 PH, 30 controls) | The radiomics-based model achieved AUC 0.862 for PH detection and AUC 0.918 for PH patients with preserved LVEF in subgroup analysis | AUC, Sensitivity, Specificity, Accuracy | MLP, Random Forest, SVM, Elastic Net, Ridge | ML | RHC | Fivefold cross-validation | Strengths: Non-invasive, good diagnostic performance; Limitations: Small sample size, no external validation, single institution retrospective design |
| Priya et al. (2021) [54] | PAH, PH due to left heart disease, PH due to lung disease and hypoxia, CTEPH, PH with unclear and multi-factorial | Cardiac MRI-derived radiomics with DAFIT for PH detection | Retrospective cohort | 82 patients (42 PH, 40 controls) | DAFIT model with combined LV and RV masks performed with AUC 0.958, outperforming other models and showing superior predictive performance in PH detection | AUC | Linear, logistic, ridge, elastic net, and LASSO regression, Neural network, SVM, MLP, Random Forest, generalized boosted regression model | ML | RHC | Fivefold cross-validation | Strengths: High AUC, non-invasive, Data augmentation approach improves reproducibility; Limitations: Small sample size, Lack of external validation, limited PH subgroup, retrospective design |
| Schuler et al. (2022) [56] | PAH | ML model using ICD-9/10 codes, RHC, and PAH medication for PAH prediction | Retrospective cohort | 194 PAH patients and 786 controls | ML algorithm achieved sensitivity 0.88, specificity 0.93, PPV 0.89, NPV 0.92 in identifying PAH using administrative claims data | Sensitivity, Specificity, PPV, NPV, AUC | Random Forest, XGBoost, Elastic Net | ML | RHC | Tenfold cross-validation, Internal and external validation | Strengths: High sensitivity and specificity, External validation, non-invasive administrative data use; Limitations: Relies on administrative data, retrospective design |
| Shikhare et al. (2022) [57] | CTEPH | ML-based algorithm for right-to-left ventricle ratio (dRV/dLV) prediction from CTPA | Retrospective cohort | 125 patients | ML-based algorithm performed well with a strong correlation of r = 0.96 between predicted and manual dRV/dLV, associated with long ICU length of stay | AUC, Sensitivity, ICU length of stay | Neural Networks, CNN | ML | RHC | – | Strengths: High correlation with manual measurements, predictive of ICU length of stay, non-invasive; Limitations: 20% algorithm failure, small cohort, no validation on test set, single center, retrospective design |
| Suvon et al. (2023) [58] | PAH | Multimodal learning for mortality prediction using EHR, echo, and MRI data | Retrospective cohort | 2563 patients | The multimodal model combined numerical imaging features, categorical features, and textual features from EHR data, achieving AUC 0.89 for one-year mortality prediction | AUC, Sensitivity, Specificity, PPV, NPV | Bidirectional Encoder Representations from Transformers (BERT), MLP | ML | RHC | Tenfold cross-validation, Internal validation | Strengths: Multimodal approach, high AUC, utilizes real-world data; Limitations: Missing data, small class imbalance, retrospective design |
| Sweatt et al. (2019) [70] | PAH | Immune phenotypes classification using proteomic profiles | Prospective obser-vational | 385 patients (discovery: 281, validation: 104) | Identified 4 immune clusters with distinct cytokine profiles using unsupervised ML, which correlated with clinical outcomes and 5-year survival | Survival Rate, Kaplan–Meier estimates, Cytokine levels | Consensus Clustering, Partial Correlation Networks | ML | RHC | External validation | Strengths: Unsupervised phenotyping, identifies immune phenotypes, links to prognosis, prospective multicenter design, external validation; Limitations: One-time point sampling, no dynamic monitoring |
| Swift et al. (2020) [59] | PAH | Tensor-based ML for CMR feature extraction to predict PAH | Retro-spective cohort | 220 patients (150 with PAH, 70 with no PH) | Tensor-based ML approach showed AUC = 0.92 for PAH diagnosis using CMR data, identifying new diagnostic features | AUC, Sensitivity, Specificity, PPV, NPV | Tensor-based ML, Multilinear Subspace Learning (MPCA) | ML | RHC | Tenfold cross-validation | Strengths: High diagnostic accuracy, innovative approach using CMR data; Limitations: Small sample size, requires CMR, single center, no external validation retrospective design |
| Swinnen et al. (2023) [60] | PAH vs. PH due to left heart disease (PH-LHD) | Differentiation of PAH from PH-LHD using ML on noninvasive data | Retro-spective cohort | 344 patients | Random Forest-based model showed sensitivity of 64% and 100% specificity for PH-LHD detection; outperforming the Jacobs score | AUC, Sensitivity, Specificity, PPV, NPV | Random Forest, Logistic Regression | ML | RHC | Tenfold cross-validation, Internal validation | Strengths: Highly specific model, non-invasive approach to differentiate PAH vs PH-LHD; Limitations: Retrospective design, single center |
| Zhang et al. (2023) [63] | PAH, PH due to left heart disease, PH due to lung disease and hypoxia, CTEPH, PH with unclear and multi-factorial | ML-based PAP prediction from CTPA | Retro-spective cohort | 55 patients | Developed ML model using CTPA for the automatic evaluation of PAP. Achieved good consistency between predicted and manual measurements for mPAP, sPAP, dPAP | Intraclass correlation coefficient (ICC), AUC, mPAP, sPAP, dPAP, TPR | XGBoost, SVM, CatBoost | ML, DL | RHC | Tenfold cross-validation | Strengths: Accurate PAP prediction and segmentation via CTPA; Limitations: Small sample size, retrospective design |
| Zhao et al. (2025) [71] | PH (pre- and postcapillary) | Multimodal DL for PH detection from EHR, echo, and CXR | Prospective and retro-spective design | 2451 patients | Developed MMF-PH model integrating CXR, ECG, echo, and clinical data; outperformed Echo in PH screening with higher specificity and NPV across datasets | Accuracy, Precision, Sensitivity, Specificity, NPV, F1, AUROC, AUPRC | Multimodal DL (MMF-PH) | DL | RHC | Internal and external validation | Strengths: Robust diagnostic accuracy, non-invasive PH screening, multicenter design, partially prospective design; Limitations: Small external validation group, PH subtypes were not comprehensively classified, overfitting potential |
This table summarizes the characteristics of the studies included in the systematic review. The “Author (year)” column lists the lead author and the publication year of each study. The “PH group” column identifies the PH subtype studied. The “Study group” column provides a brief description of the study’s focus. “Study design” describes the methodology used in each study, whether retrospective or prospective. “Sample size” lists the number of participants included in each study. The “Key findings” column highlights the primary outcomes or findings. “Outcome measures” refers to the specific performance metrics used in each study. “Prognostic model used” details the type of model applied for prediction or prognosis. The “Model type” column specifies whether the model was ML or DL. “Diagnosis method” outlines the diagnostic methods used for PH in the study. “Validation method” describes the validation strategy used. Finally, the “Strengths and limitations” column provides insights into the strengths and weaknesses of each study
The majority of studies were retrospective (48 studies, 90.6%) [14–18, 20, 25–65] and single-center (44 studies, 83.0%) [14–16, 18, 20, 25–40, 42–45, 47, 49–51, 53–63, 66–69] in design. Nine studies (17.0%) were conducted across multiple centers [17, 41, 46, 48, 52, 64, 65, 70, 71], four (7.5%) were designed prospectively [66–68, 70], and one (1.9%) included both retrospective and prospective cohorts [71]. All included studies were published between 2016 and 2025. No randomized controlled trials were identified. Study populations included patients with either PAH or broader PH, with varying degrees of diagnostic certainty and subtype attribution (see Table 1 for full study-level details).
The primary clinical objectives varied across studies: 47 studies (88.7%) addressed diagnostic classification [14–18, 20, 26–56, 59, 61–65, 67, 69–71], nine studies (17.0%) aimed at prognostic prediction [25, 31, 32, 35, 41, 57, 58, 66, 68], and one study (1.9%) focused on phenotypic subgroup differentiation [60]. There was some overlap, as several studies pursued more than one objective. Data sources were heterogeneous and included clinical variables, echocardiographic data, electrocardiograms (ECG), chest imaging [chest X-ray (CXR), computed tomography (CT), magnetic resonance imaging (MRI)], laboratory parameters, and omics-based inputs. The latter were employed in seven studies (13.2%), including four based on proteomic or transcriptomic data [29, 34, 36, 70], two on radiomic features [53, 54], and one on lipidomics [30] (see Table 1).
Algorithmic approaches and input modalities
Among the 53 included studies, 32 (60.4%) employed ML models such as random forests, support vector machines or gradient boosting. DL models, particularly convolutional neural networks (CNNs), were applied in 18 studies (34.0%), mainly for image-based classification tasks involving CXR, CT scans, echocardiographic images, MRI, or ECG data. Three studies (5.7%) combined ML and DL methods [17, 63, 64]. An increasing number of studies adopted multimodal frameworks that integrated structured clinical data with unstructured sources such as imaging or free-text reports. Input features differed considerably between studies. While most studies relied on clinical, imaging, and echocardiographic data, seven studies (13.2%) incorporated ECG-derived parameters [16, 17, 27, 33, 46, 48, 71]. Eight studies (15.1%) used biomarker data [29, 34, 36, 39, 61, 62, 68, 70]. Data preprocessing strategies, feature selection methods, and hyperparameter tuning procedures were reported inconsistently (Tables 1, 2).
Table 2.
Study characteristics and performance metrics—studies without explicit RHC confirmation for diagnosis
| Author (year) | PH group | Study group | Study design | Sample size | Key findings | Outcome measures | Prognostic model used | Model type | Diagnosis method | Validation method | Strengths and limitations |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Guo et al. (2025) [67] | PH | DL model based on phonocardiograms for PH screening | Prospective cohort study | 985 patients | The model achieved an AUC of 0.79 for detecting elevated PASP ≥ 40 mm Hg, with sensitivity of 0.73 and specificity of 0.74. Performance was better when using a per-patient approach (AUC 0.82) | AUC, Sensitivity, Specificity, PPV, NPV | CNN | DL | Echo | Fivefold cross-validation, internal validation | Strengths: Non-invasive, low-cost screening tool for PH using a digital stethoscope, prospective design; Limitations: Echocardiographic PASP used as ground truth instead of RHC |
| Han et al. (2024) [15] | PAH-CHD (pulmonary arterial hypertension in congenital heart disease) | AI model based on chest radiographs (CXR) for PAH-CHD diagnosis | Retrospective study | 3255 radiographs | AI model achieved AUC 0.948 for CHD diagnosis and AUC 0.778 for PAH-CHD detection. With AI assistance, radiologists’ performance improved significantly for both diagnoses | AUC, Sensitivity, Specificity, Accuracy, F1 Score | ResNet18 (DL) | DL | Echo (CHD diagnosis), clinical reports | Fivefold cross-validation, internal validation cohort | Strengths: Non-invasive, easy-to-perform CXR with AI assistance; Limitations: Single center, PAH diagnosed by echo, small sample for specific CHD types, retrospective design |
| Hu et al. (2023) [39] | PAH | ML-based biomarker identification for PAH | Retrospective cohort | 3 lung tissue samples from PAH patients | Identification of gene biomarkers that reliably distinguished PAH from controls | AUC | Gradient boosting decision tree | ML | Bioinformatics analysis, gene expression data from public datasets | Fivefold cross-validation, external dataset validation (GSE53408) | Strengths: Comprehensive bioinformatics approach and experimental validation; Limitations: Small sample size, potential overfitting due to limited data |
| Hyde et al. (2023) [40] | PAH | Claims-based ML algorithm for PAH identification | Retrospective cohort | 1339 PAH and 4222 non-PAH patients | The random forest model distinguished PAH from non-PAH patients with AUC of 0.84 for 6 months prior to diagnosis, showing promising early identification capability | AUC, Recall, Precision, Accuracy | Random Forest | ML | Claims data-based (ICD-10 codes for PAH or PH, outpatient claims) | Fivefold cross-validation, internal validation | Strengths: Claims data-based approach for early PAH (or PH) identification, real-world evidence; Limitations: Potential biases in claims data (PAH vs PH), missing data |
| Kishikawa et al. (2025) [17] | PH | Ensemble learning model for PH detection using ECG, CXR, and BNP | Retrospective cohort | 71,826 ECG data points, 4718 CXR data points, 4718 BNP data points | AUC 0.872 for ensemble model; improves cardiologists’ detection accuracy for PH from 65 to 74% using ECG, CXR, and BNP data | AUC, Sensitivity, Specificity, Accuracy, PPV, NPV | Ensemble learning model | ML, DL | Echo | Internal validation | Strengths: Multimodal model, multicenter design, improves accuracy in detecting PH; Limitations: Only cardiologists tested; small cohort; unspecific patient population; only echocardiographic diagnosis without subtype classification, potentially limiting treatment decisions, retrospective design |
| Kivrak et al. (2023) [43] | PAH, PH due to left heart disease, PH due to lung disease and hypoxia, PH with unclear and multi-factorial mechanism, and non-PH | AI-based classification of PH using chest X-ray images | Retrospective cohort | 6642 X-ray images from 2005 patients | The DL model (EfficientNetb0) achieved accuracy of 86.14% and AUC of 0.945 for PH detection | Accuracy, Recall, Precision, F1 Score, AUC | EfficientNetb0, SVM | DL | CXR and clinical findings | Internal validation | Strengths: High performance with CXR for PH classification; Limitations: Unbalanced dataset, retrospective design, black-box AI, no reliable PH diagnosis (no RHC) |
| Kusunose et al. (2022) [44] | Exercise-induced PH | DL model for PH detection using CXR | Retrospective cohort | 142 patients | The DL model achieved an AUC of 0.71, adding predictive value over clinical and echocardiographic parameters at rest and improving AUC from 0.65 to 0.74 | AUC | DL (Capsule Network with residual blocks) | DL | AI model | Tenfold cross-validation | Strengths: Non-invasive detection of exercise-induced PH using CXR and AI; Limitations: Small cohort, no RHC for diagnosis, black-box nature of the model, retrospective design |
| Kwon et al. (2020) [46] | PH | DL model for PH prediction using ECG | Retrospective cohort | 38,241 patients (including 4096 PH patients) | The AI algorithm achieved AUC of 0.859 (internal validation) and 0.902 (external validation) | AUC, Sensitivity, NPV, PPV | DL (ensemble neural network, CNN) | DL | Echo | Internal and external validation | Strengths: High accuracy using ECG data for PH detection, multicenter cohort, external validation; Limitations: No RHC for PH confirmation, potential bias from data imbalances |
| Liu et al. (2025) [48] | PH | DL model combining ECG and CXR for elevated PAP detection | Retrospective cohort | 85,193 patients from Hospital A, 16,736 patients from Hospital B | The DL model achieved AUC 0.8644 in internal validation and AUC 0.8734 in external validation for detecting elevated PAP using a combination of ECG and CXR. It also predicted future left ventricular dysfunction and cardiovascular mortality | AUC, Sensitivity, Specificity, PPV, NPV, Hazard Ratio | CNN, XGBoost | DL | Echo | Internal and external validation | Strengths: High diagnostic accuracy and NPV, integrates ECG and CXR for early PH detection, external validation, multicenter design; Limitations: No RHC confirmation, retrospective design |
| Liu et al. (2022) [16] | PH | AI model using ECG and Echo for PH detection | Retrospective cohort | 41,097 patients | The AI model achieved AUC 0.88 for elevated PAP detection and predicted cardiovascular mortality. It outperformed conventional ECG diagnosis by cardiologists | AUC, Sensitivity, Specificity, Accuracy, Hazard Ratio | Neural network | DL | Echo | Tenfold cross-validation, internal and external validation | Strengths: Good diagnostic accuracy (AUC 0.88), robust prediction for cardiovascular mortality, validated externally, large sample size; Limitations: No RHC confirmation, retrospective design, possible cohort-related biases |
| Ragnarsdottir et al. (2024) [55] | PH in newborns | Echo-based multi-view DL for predicting and classifying PH | Retrospective cohort | 270 newborns | Explainable multi-view DL model for predicting and classifying PH severity with F1-score of 0.84 for severity and 0.92 for binary detection. Results demonstrated that multi-view and spatio-temporal analysis significantly improved prediction | F1-score, AUROC, Accuracy, Recall, Precision | CNN | DL | Echo | Tenfold cross-validation, internal validation | Strengths: First automated PH severity prediction in newborns using echo, explainable model, high performance metrics; Limitations: Data imbalance, limited to newborns, retrospective design |
| Yang et al. (2024) [61] | PAH and PH (unclear) | Gene expression data from 65 samples (41 PAH, 24 controls) from GEO datasets GSE113439 and GSE15197 were used for PAH prediction | Retrospective study | 274 (unclear) | Lasso combined with linear discriminant analysis achieved the best feature selection performance (AUC = 0.741); the resulting diagnostic model based on selected hub genes reached an AUC of 0.87 | AUC | 113 ML algorithms | ML | Unclear | Cross-validation | Strengths: High AUC, well-selected biomarkers; Limitations: Small sample size, unclear use of RHC for diagnosis in dataset, lack of diverse validation datasets, retrospective design, several methodological limitations |
| Zeng et al. (2021) [62] | PAH | Identification of biomarkers and immune infiltration analysis in IPAH using bioinformatics | Retrospective cohort | 74 patients | Identified HBB, RNASE2, S100A9, and IL1R2 as biomarkers with high diagnostic value (AUC = 1) for IPAH detection. Immune infiltration differences noted between IPAH and controls | AUC, Sensitivity, Specificity, ROC curve | SVM-recursive feature elimination, Lasso | ML | Unclear | Tenfold cross-validation, external validation | Strengths: Accurate biomarkers, immune infiltration analysis; Limitations: Small dataset, relies on bioinformatics datasets, no real-time monitoring, unclear use of RHC for diagnosis in dataset, retrospective design |
| Zhao et al. (2024) [64] | CTEPH | Automated CTEPH detection using non-contrast CT scans | Retrospective cohort | 300 patients | Developed a cascaded network with multiple instance learning using non-contrast CT scans, achieving an AUC of 0.807 and sensitivity of 0.795 in detecting CTEPH | AUC, Sensitivity, Specificity, Accuracy | ResNet-18 CNN | ML, DL | CTEPH diagnosis based on MSKCC Q-SPECT/CT and Modified PIOPED II criteria | Fivefold cross-validation, external validation | Strengths: Non-invasive approach with no additional annotations required, high diagnostic accuracy, multicenter design; Limitations: External validation is limited, as the second cohort included only healthy subjects |
| Zou et al. (2020) [65] | PH | DL-based PH detection and PASP prediction from CXR | Retrospective cohort | 762 patients | DL approach using frontal CXR to screen for PH with high AUC (0.970) on internal test, 0.967 on external test | AUC, Sensitivity, Specificity, PPV, NPV, MAE | InceptionV3, Xception, ResNet50 | DL | Echo | Eightfold cross-validation, internal and external validation | Strengths: High diagnostic accuracy, multicenter design, external validation; Limitations: Small sample size for external validation, overfitting potential, PH diagnosis based on Echo without RHC confirmation |
This table summarizes the characteristics and performance metrics of the 15 studies that either did not perform RHC, did not explicitly report its use, or replaced it with echocardiography alone for diagnostic confirmation. These studies are presented separately because the absence of invasive confirmation represents a methodological limitation that may affect the diagnostic ground truth. Columns include “Author (year)” (lead author and publication year), “PH group” (pulmonary hypertension subtype studied), “Study group” (study focus), “Study design” (retrospective or prospective), “Sample size,” “Key findings,” “Outcome measures,” “Prognostic model used,” “Model type” (ML or DL), “Diagnosis method,” “Validation method,” and “Strengths and limitations,” summarizing key methodological aspects and performance outcomes
Diagnostic and classification models
Model performance and validation
Reported model performance varied according to study objective, input modality and algorithmic approach. For example, Imai et al. [14] developed a DL model based on CXR images, achieving an AUC of 0.988 with a sensitivity of 0.93 and specificity of 0.98, outperforming experienced physicians in detecting PAH [14]. Similarly, DuBrock et al. [33] demonstrated that an ECG-based CNN could predict PH up to five years before clinical diagnosis (AUC 0.92 at diagnosis, remaining ≥0.80 up to 18 months pre-diagnosis) across two independent cohorts, highlighting the potential of AI for early, non-invasive screening and disease detection [72]. AUC values ranged from 0.71 to 1.00 across both ML and DL models [32, 44, 45, 62]. Diagnostic model performance varied substantially across input domains. CXR and CT-based models generally achieved moderate AUCs (for example, CXR: 0.71 in Kusunose et al. 2020/2022 [44, 45]; CT for CTEPH detection: 0.81 in Zhao et al. 2024 [64]), whereas the best-performing CXR algorithms reached very high accuracy (Imai 2024 0.988 [14]; Zou 2020 0.970/0.967 internal/external [65]). ECG-based models consistently performed in the high range, typically 0.86–0.92 (Kwon 2020 0.859/0.902 [46]; DuBrock 2024 0.92/0.88 [33], with predictive ability up to five years before diagnosis). Echocardiography-based ML also showed strong discrimination (Liao 2023 0.945/0.950 internal/external [69]). Claims/EHR-based approaches yielded high to very high AUCs (Ong 2020 0.88 [52]; Kogan 2023 0.92 [18]). Biomarker and omics studies reported exceptionally high AUCs in smaller, homogeneous cohorts (Duo 2022 0.948/0.945 [34]; Zeng 2021 AUC = 1.00 [62]), although their generalizability remains limited. Finally, multimodal models that integrated imaging and clinical data (for example, Zhao et al. 2025 [71]) achieved consistently high performance, likely reflecting the richer feature space (see Tables 1, 2).
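As a brief technical aside, every AUC quoted above has the same probabilistic meaning: the probability that the model scores a randomly chosen PH case higher than a randomly chosen control, which is equivalent to the normalized Mann–Whitney U statistic. A minimal, dependency-free sketch with invented scores (not data from any included study) illustrates the computation:

```python
def auc(scores_pos, scores_neg):
    """Empirical AUC: P(score_pos > score_neg), with ties counted as 1/2."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical model outputs for 4 PH cases and 4 controls
ph = [0.9, 0.8, 0.7, 0.4]
controls = [0.6, 0.3, 0.2, 0.1]
print(auc(ph, controls))  # 0.9375: one control outranks one case
```

An AUC of 1.00, as reported in some small biomarker cohorts, simply means every case outranked every control in that sample, which is far easier to achieve in small, homogeneous datasets than in broad screening populations.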
External validation was performed in 20 studies (37.7%), in some cases combined with internal validation on a held-out test set. Cross-validation was the most commonly applied strategy, frequently supplemented by a separate internal test split.
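To illustrate the dominant strategy, k-fold cross-validation partitions a cohort into k folds so that every patient serves exactly once as test data while the model is retrained k times on the remainder; it estimates internal performance but cannot substitute for validation on an external cohort. A schematic, dependency-free sketch (toy indices, not tied to any included study):

```python
def kfold_indices(n, k):
    """Yield (train, test) index lists for k roughly equal, disjoint folds."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i not in set(test)]
        yield train, test
        start += size

# 10 patients, fivefold CV: each patient appears in exactly one test fold
tested = []
for train, test in kfold_indices(10, 5):
    assert len(train) == 8 and len(test) == 2
    tested.extend(test)
assert sorted(tested) == list(range(10))
```

In practice, preprocessing and feature selection must be refit inside each training fold; performing them once on the full cohort, a pattern not always excludable from the included reports, leaks test information and inflates cross-validated AUCs.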
Calibration metrics and decision curve analyses were rarely reported across studies.
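For illustration, the calibration reporting that is largely absent amounts to comparing predicted risks with observed event frequencies within probability bins: a model can discriminate well (high AUC) yet systematically over- or underestimate risk. A dependency-free sketch of such a reliability table, using invented predictions:

```python
def calibration_bins(y_prob, y_true, n_bins=4):
    """Group predictions into equal-width probability bins and compare the
    mean predicted probability with the observed event rate in each bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(y_prob, y_true):
        i = min(int(p * n_bins), n_bins - 1)  # clamp p == 1.0 into last bin
        bins[i].append((p, y))
    table = []
    for i, members in enumerate(bins):
        if not members:
            continue
        mean_pred = sum(p for p, _ in members) / len(members)
        obs_rate = sum(y for _, y in members) / len(members)
        table.append((i, round(mean_pred, 3), round(obs_rate, 3), len(members)))
    return table

# Invented predicted risks and outcomes (1 = event)
probs = [0.05, 0.10, 0.30, 0.40, 0.60, 0.70, 0.85, 0.95]
outcomes = [0, 0, 0, 1, 1, 0, 1, 1]
for row in calibration_bins(probs, outcomes):
    print(row)  # (bin, mean predicted, observed rate, n)
```

A well-calibrated model shows mean predicted probability close to the observed rate in every bin; large gaps, even alongside a high AUC, argue against using the raw model output for clinical risk communication.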
Risk of bias and applicability
Risk of bias and applicability were assessed using the PROBAST+AI tool. For each domain, both methodological quality and clinical applicability were independently rated as low, moderate, or high. Based on this assessment, 44 studies (83.0%) were classified as having a low overall risk of bias. However, moderate applicability concerns were frequently identified. These were mainly related to non-representative patient populations, insufficient detail on predictor definitions and measurement, non-guideline-conforming diagnostic criteria for PH (including inconsistent use of RHC), and limited generalizability of imaging-based models to broader clinical settings.
In 15 studies (28.3%) [15–17, 39, 40, 43, 44, 46, 48, 55, 61, 62, 64, 65, 67], RHC, the diagnostic gold standard for PH, was either not performed, not explicitly reported, or replaced by echocardiography alone for diagnostic confirmation. This raised applicability concerns regarding the validity and consistency of case definitions across these studies. Subtype attribution in accordance with ESC/ERS guidelines and the Nice classification [1] was clearly reported in 38 studies (71.7%), while the remaining studies used heterogeneous PH definitions, partly without clear diagnostic specification or consistent delineation according to guideline-based criteria [1]. As RHC is not routinely performed in all patients with imaging signs of right heart strain in clinical practice, these studies nonetheless provide valuable insights into clinical and echocardiography-based AI applications in suspected PH. For completeness of the review’s evidence base, these studies were retained but are presented separately in Table 2 to maintain a clear distinction between reference standards in the main analysis. A detailed study-level assessment of bias and applicability is provided in Supplementary Table S3.
Prognostic and predictive models
Among the studies reviewed, nine focused on prognostic modeling, specifically aimed at predicting outcomes such as mortality [25, 31, 32, 35, 41, 57, 58, 66, 68]. While these studies leveraged imaging data and clinical endpoints, they were rarely externally validated or prospectively tested.
Alabed et al. (2022) applied a cardiac MRI-based multilinear principal component analysis (MPCA) approach to identify prognostic features across the cardiac cycle, improving 1-year mortality prediction in PAH compared with the REVEAL score (c-index 0.76 vs. 0.71) while maintaining interpretability through visualization of high-risk myocardial regions [25]. Kheyfets et al. (2023) developed a random forest model in PAH, integrating clinical, hemodynamic, and biomarker data, and achieved excellent internal discrimination (AUC 0.94) and robust external validation (AUC 0.81) for 4-year survival prediction, illustrating the potential of explainable, individualized AI-based risk assessment [68].
Prognostic models demonstrated moderate to high discriminatory ability, depending on input modality and outcome definition. Early CMR-based motion models performed at the lower bound (Dawes 2017, AUC 0.73 [66]), while registry-based Bayesian networks showed intermediate accuracy (Kanwar 2020, AUC 0.80; external 0.74–0.80 [41]). Imaging-rich or multimodal approaches achieved higher performance, with the CMR-based MPCA model by Alabed 2022 improving 1-year mortality prediction in PAH (c-index 0.83 vs. 0.71 REVEAL) [25], and the random-forest model by Kheyfets 2023 reaching AUCs of 0.94 (internal) and 0.81 (external) [68]. Further prognostic applications, such as AI-quantified fibrosis in CT (Dwivedi 2024, c-index 0.76 [35]) or multimodal EHR-based survival prediction (Suvon 2023, AUC 0.89 [58]), also demonstrated strong predictive accuracy. These results collectively highlight that greater data richness and more precise labeling enhance prognostic power (see Table 1).
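For readers less familiar with the c-index used by several of these prognostic studies: it extends the AUC to censored time-to-event data, counting the fraction of comparable patient pairs in which the model assigns the higher risk to the patient who fails earlier. A brute-force sketch of Harrell's concordance index with invented follow-up data:

```python
def concordance_index(time, event, risk):
    """Harrell's C: among comparable pairs (the earlier time has an observed
    event), the fraction where the earlier-failing patient has higher risk."""
    conc = comp = 0.0
    n = len(time)
    for i in range(n):
        for j in range(n):
            # Pair is comparable only if i has an observed event before time[j]
            if event[i] and time[i] < time[j]:
                comp += 1
                if risk[i] > risk[j]:
                    conc += 1
                elif risk[i] == risk[j]:
                    conc += 0.5
    return conc / comp

# Toy cohort: follow-up years, event indicator (1 = death), model risk score
t = [1.0, 2.0, 3.0, 4.0]
e = [1, 1, 0, 1]
r = [0.9, 0.7, 0.2, 0.4]
print(concordance_index(t, e, r))  # 1.0: risk ordering matches outcomes
```

Because censored patients (here, patient 3) only contribute to pairs where the comparator's event precedes their censoring time, the c-index remains interpretable under incomplete follow-up, which is why survival studies report it instead of a plain AUC.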
Discussion
Despite growing enthusiasm for AI in PH research, the clinical translation of ML and DL models remains limited. The observed variation in reported AUCs reflects the heterogeneity of data sources and study objectives. CXR- and CT-based models generally achieved moderate accuracy, ECG-based algorithms showed consistently higher performance, and multimodal or MRI-based prognostic models achieved the highest results, albeit often in smaller, more homogeneous cohorts. These differences likely stem from variations in data richness, label quality (RHC-confirmed vs. surrogate definitions), and cohort heterogeneity, underscoring the need for standardized endpoints, transparent reporting, and robust external validation in future studies (see Tables 1, 2).
To our knowledge, this is the first systematic review to provide a structured quality assessment of 53 studies addressing non-invasive diagnosis, phenotypic classification, and prognostication in PH. By synthesizing heterogeneous approaches using the SWiM framework and evaluating methodological rigor via the PROBAST+AI tool, we identified key limitations that currently hinder clinical implementation.
Study cohorts and diagnosis methodology
A key limitation identified across many studies is the lack of clear differentiation between PH subgroups, particularly with respect to the current ESC/ERS classification [1]. While studies such as Swinnen et al. (2023) explicitly aimed to distinguish PAH from post-capillary PH due to left heart disease (PH-LHD) [60], this distinction was not rigorously addressed in other studies, despite its clinical importance. Differentiating between PH Groups 1 to 5 is essential, as these entities differ markedly in pathophysiology, therapeutic implications, and clinical outcomes [1]. The omission of this distinction in a considerable number of studies underscores a persistent gap between clinical priorities and prevailing practices in AI research, thereby limiting the utility of ML and DL models that do not account for the multifaceted nature of PH.
Regarding diagnostic methodology, most studies employed RHC [1]. When performed in experienced centers, RHC offers high diagnostic specificity and precision with an acceptable risk profile [8, 73]. Across 15 studies, RHC was not performed, not clearly reported, or substituted by echocardiographic assessment [15–17, 39, 40, 43, 44, 46, 48, 55, 61, 62, 64, 65, 67]. Despite its wider availability, echocardiography is insufficient for definitive diagnosis according to guideline-based algorithms [1]. To account for this methodological heterogeneity, studies that did not explicitly report RHC confirmation or used echocardiography alone for diagnosis were analyzed separately (Table 2). Although the absence of invasive confirmation limits diagnostic ground truth, these studies remain relevant as they reflect real-world clinical practice, where RHC is not routinely performed in all patients with imaging signs of right heart strain. Moreover, they provide complementary insights into the development and validation of non-invasive AI models for screening or triage applications. Nevertheless, the use of inconsistent diagnostic standards introduces potential bias, limits comparability, and hinders both model performance and clinical translation. In addition, the decision to perform RHC may itself introduce selection bias, as patient selection for invasive confirmation often differs across PH subgroups and disease stages.
Model performance: machine learning vs. deep learning
The studies included in this review predominantly employed supervised ML models such as support vector machines (SVM), random forests, and gradient boosting machines (GBM) for structured clinical data. These models demonstrated promising discriminatory performance, with AUC values ranging from 0.73 to 1.00 [62, 66]. However, most were developed for binary classification tasks, such as distinguishing PH from healthy controls or identifying at-risk individuals. While such classifiers may support initial screening, they fall short of addressing dynamic clinical needs, including the prediction of long-term outcomes, therapeutic response, or continuous hemodynamic parameters. The limited adoption of regression-based ML approaches restricts the clinical applicability of current models for longitudinal monitoring and individualized prognostic assessment.
In contrast, DL approaches, primarily based on CNNs, were mainly applied to imaging data. A considerable number of studies focused specifically on CXR analysis [14, 15, 43–45, 48, 65]. These studies demonstrated the feasibility of using DL for non-invasive detection of PH and PAH, with reported AUC values ranging from 0.71 to 1.00 [32, 44, 45]. The use of widely available CXR data underscores the potential of DL models for scalable, non-invasive screening in PH. However, several limitations remain. The opaque nature of CNN-based models challenges clinical acceptance [74, 75]. None of the included studies appear to have employed explainable AI (XAI) techniques such as saliency maps, Gradient-weighted Class Activation Mapping (Grad-CAM), or layer-wise relevance propagation.
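As one concrete illustration of the simplest such technique, occlusion-based saliency masks image regions one patch at a time and records how much the model's output drops, yielding a heat map of the regions the network relies on. A framework-free sketch with a toy stand-in for a CNN (purely illustrative; no model from the included studies is reproduced here):

```python
def occlusion_map(predict, image, patch=2):
    """Occlusion sensitivity: output drop when each patch is zeroed out."""
    h, w = len(image), len(image[0])
    base = predict(image)
    heat = [[0.0] * w for _ in range(h)]
    for r in range(0, h, patch):
        for c in range(0, w, patch):
            occluded = [row[:] for row in image]
            for rr in range(r, min(r + patch, h)):
                for cc in range(c, min(c + patch, w)):
                    occluded[rr][cc] = 0.0
            drop = base - predict(occluded)
            for rr in range(r, min(r + patch, h)):
                for cc in range(c, min(c + patch, w)):
                    heat[rr][cc] = drop
    return heat

# Toy "model": responds only to the top-left quadrant of a 4x4 image
def toy_predict(img):
    return sum(img[r][c] for r in range(2) for c in range(2))

img = [[1.0] * 4 for _ in range(4)]
heat = occlusion_map(toy_predict, img)
print(heat[0][0], heat[3][3])  # 4.0 0.0: only the top-left patch matters
```

Applied to a CXR classifier, such a map would reveal whether the network attends to plausibly relevant structures (e.g., the central pulmonary arteries or cardiac silhouette) or to confounders such as support devices or laterality markers.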
Prognostic models and long-term outcomes
Although several studies have explored AI-based prognostic modeling in pulmonary hypertension, most remain limited by small sample sizes, lack of external validation, and insufficient reporting of model interpretability. Many models did not adequately quantify the relevance of input features, and clinical transparency was often insufficiently addressed. This lack of interpretability hampers clinical applicability, as clinicians require explainable and actionable outputs to inform patient management and therapeutic decisions [76].
Despite these challenges, AI-driven prognostic tools hold considerable promise for advancing PH management by enabling earlier risk stratification and personalized treatment strategies [11].
Data heterogeneity and multi-modal integration
A major limitation across the studies was the heterogeneity of data sources. The studies included in this review utilized various combinations of data types. This diversity complicates direct comparisons between studies and the identification of reproducible predictors. Notably, only a small number of studies integrated multiple data modalities, such as imaging and clinical data, to potentially enhance predictive performance [17, 58, 71]. Only seven studies incorporated advanced data types such as proteomics, transcriptomics, lipidomics, or radiomics [29, 30, 34, 36, 53, 54, 70].
Validation methods and generalizability
A critical limitation across the studies was the insufficient attention to model validation. While most studies relied solely on internal cross-validation, only a limited number employed independent hold-out sets or external datasets. Notably, Zhao et al. (2025) and Bordag et al. (2023) highlighted the value of external validation, demonstrating robust model performance across distinct cohorts and datasets [30, 64]. However, the lack of consistent external validation across the studies included in this review raises concerns about the generalizability of the proposed ML models. Small sample sizes, particularly from single-center cohorts, increase the risk of overfitting and limit broader applicability. Moreover, none of the studies fully met the methodological and reporting standards assessed using the PROBAST+AI tool [23], reflecting persistent gaps in model transparency, reproducibility, and bias control. While several approaches demonstrate considerable innovation, many remain at an early proof-of-concept stage. Robust external validation and prospective multicenter testing are essential to address these concerns [77, 78].
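The optimism of apparent (resubstitution) performance, and hence the need for untouched hold-out or external data, can be made concrete with a deliberately overfitting model. Everything below is a stdlib-only toy (a 1-nearest-neighbour memorizer on a weakly informative simulated biomarker), not a reanalysis of any included study.

```python
import random

def auc(pos, neg):
    """Mann-Whitney estimate of the area under the ROC curve."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def nn_score(x, train):
    """1-nearest-neighbour 'risk score': label of the closest training point.
    It memorises the training set, so it looks perfect on its own data."""
    return min(train, key=lambda t: abs(t[0] - x))[1]

random.seed(1)
# Toy cohort: a single weakly informative biomarker (cases shifted slightly).
data = [(random.gauss(0.3 * y, 1.0), y) for y in (0, 1) for _ in range(150)]
random.shuffle(data)
train, test = data[:150], data[150:]

def eval_auc(fit_sample, sample):
    pos = [nn_score(x, fit_sample) for x, y in sample if y == 1]
    neg = [nn_score(x, fit_sample) for x, y in sample if y == 0]
    return auc(pos, neg)

apparent = eval_auc(train, train)   # resubstitution: wildly optimistic
holdout  = eval_auc(train, test)    # untouched split: closer to the truth
print(apparent, holdout)
```

The memorizer scores an apparent AUC of 1.0 but far less on the held-out split; external validation on a distinct cohort probes the same failure mode one level further, across sites and case mix rather than across random splits.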
Model interpretability and ethical considerations
A key barrier to the clinical adoption of AI models is their limited interpretability and transparency. While metrics such as AUC and accuracy are important indicators of model performance, ML and DL applications in healthcare must also be comprehensible to clinicians and transparent in their decision-making logic. Techniques from the field of XAI, including SHapley Additive exPlanations (SHAP) and Local Interpretable Model-agnostic Explanations (LIME), are critical for fostering trust and enabling the meaningful integration of model outputs into clinical workflows [79–81].
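SHAP approximates the game-theoretic Shapley value of each input feature. For a small model this quantity can be computed exactly, which makes the underlying idea concrete. The risk function and feature names below (TRV, BNP, age as binary flags) are hypothetical, chosen only to show how an interaction term is split between features and how attributions sum to the prediction.

```python
from itertools import combinations
from math import factorial

# Hypothetical risk model over three binary inputs (illustrative only):
# elevated TRV and BNP interact; age adds a small independent amount.
def risk(trv, bnp, age):
    return 0.4 * trv + 0.3 * bnp + 0.2 * trv * bnp + 0.1 * age

FEATURES = ("trv", "bnp", "age")
BASELINE = {"trv": 0, "bnp": 0, "age": 0}   # reference patient

def value(subset, patient):
    """Model output with features in `subset` at the patient's values
    and the remaining features at the baseline."""
    args = {f: (patient[f] if f in subset else BASELINE[f]) for f in FEATURES}
    return risk(**args)

def shapley(patient):
    """Exact Shapley values: weighted marginal contributions over all
    feature subsets (feasible only for small feature counts)."""
    n = len(FEATURES)
    phi = {}
    for f in FEATURES:
        others = [g for g in FEATURES if g != f]
        total = 0.0
        for r in range(len(others) + 1):
            for s in combinations(others, r):
                w = factorial(len(s)) * factorial(n - len(s) - 1) / factorial(n)
                total += w * (value(set(s) | {f}, patient) - value(set(s), patient))
        phi[f] = total
    return phi

patient = {"trv": 1, "bnp": 1, "age": 1}
phi = shapley(patient)
# Efficiency property: attributions sum to risk(patient) - risk(baseline).
print(phi, sum(phi.values()))
```

The interaction term (0.2) is split equally between TRV and BNP, giving attributions of 0.5, 0.4, and 0.1 that sum to the prediction difference of 1.0; SHAP libraries estimate the same quantity efficiently for models with many features.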
In addition to technical concerns, ethical challenges such as data privacy, algorithmic fairness, and bias against underrepresented populations are often insufficiently addressed. Such bias may arise from imbalanced datasets, unrepresentative training populations, or opaque model development processes. These challenges require collaborative strategies, including representative data selection and transparent model auditing [82]. A comprehensive taxonomy of bias sources and fairness strategies highlights the persistent risk of discriminatory outcomes if fairness is not explicitly addressed throughout the AI development process [83]. Addressing these issues is essential for the responsible and equitable implementation of AI in clinical care.
Trust in AI is not established by validation metrics alone but rather emerges throughout the development process. Winter and Carusi (2022) demonstrated that validation and trust are co-constructed iteratively through continuous interaction between algorithm developers and clinical users. Their study on AI-assisted early diagnosis of PH emphasized how crucial steps such as data curation, label refinement, and the choice of benchmarks are shaped collaboratively, often through tacit, practice-oriented input that is not captured in formal reporting [84]. In this sense, validation is not a static technical endpoint but an evolving process embedded in clinical workflows. Acknowledging and integrating these collaborative dynamics may be essential for developing AI systems that are robust, interpretable, and clinically acceptable.
Summary
This systematic review analyzed 53 studies applying ML and DL to PH, focusing on non-invasive models for diagnosis, classification, and prognostication. Key aspects included model inputs, algorithm types, validation strategies, and subgroup differentiation. While ML- and DL-based approaches demonstrated promising accuracy, limited external validation, methodological heterogeneity, and the common failure to address subgroup-specific analyses continue to constrain clinical applicability.
Strengths and limitations
This systematic review provides a comprehensive and methodologically rigorous synthesis of current ML and DL applications in PH research. One key strength lies in the structured evaluation of studies based on clinical intent, model design, and phenotypic focus, which allows for a differentiated assessment of algorithmic potential across the PH spectrum. Additional strengths include the prospective registration in PROSPERO (CRD420251074202) [22], the consistent application of transparent inclusion criteria, and adherence to PRISMA methodology [21]. Study quality and reporting were critically appraised using the recently updated PROBAST+AI tool [23], specifically developed for assessing ML-based prediction models. The synthesis and reporting were also guided by the SWiM guideline, which supports transparent evidence presentation in the absence of formal meta-analysis due to methodological heterogeneity [24]. Another strength is the review’s focus on PH subgroup differentiation, which addresses a clinically important but often overlooked aspect in PH research.
Several limitations should be acknowledged. First, the number of eligible studies remains limited, reflecting the early stage of AI application in this field. Second, substantial heterogeneity in input features, outcome definitions, and performance metrics limited comparability. Third, the lack of access to source code, model parameters, or detailed preprocessing steps in most studies hindered transparency and reproducibility, which impacts the robustness of the review’s findings.
Future directions
To advance the clinical utility of AI in PH, future research should prioritize phenotypically precise model development across all subgroups defined by the current classification [1]. Substantial differences in pathophysiology, therapeutic response, and prognosis between PAH and other forms of PH necessitate subgroup-specific algorithms trained and validated on clearly stratified patient populations. Rigorous model validation must become standard practice, including not only internal cross-validation and independent hold-out testing, but also external validation to ensure broader applicability. Given the relative rarity of PH and its subtypes, collaborative multicenter registries and federated learning approaches may help overcome current limitations in sample size and data diversity. To increase model transparency and foster clinician trust, explainability techniques such as SHAP values, attention mechanisms, or class activation mapping (CAM) should be routinely implemented and clearly reported. By mitigating the black-box nature of AI models, these tools enhance clinical interpretability and help identify biologically plausible predictors, similar to the feature selection process in methods like Least Absolute Shrinkage and Selection Operator (LASSO). Furthermore, integrating structured clinical and imaging data with unstructured modalities such as free-text reports or waveform data holds promise for improving model performance and robustness. Clinical implementation of AI models in PH should complement rather than replace established clinical workflows, with particular attention to interoperability with EHRs and prospective validation. In addition, future research should aim to systematically consider health economic implications, for example by evaluating whether AI-based tools can contribute to more efficient diagnostic pathways or resource allocation. 
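As a concrete counterpart to the LASSO analogy above, the sketch below implements coordinate-descent LASSO with soft-thresholding on simulated data, showing how uninformative predictors are shrunk to zero. The data, penalty, and coefficients are illustrative only and do not reflect any included study.

```python
import random

def lasso_cd(X, y, lam, iters=200):
    """Minimise (1/2n)*||y - Xw||^2 + lam*||w||_1 by cyclic coordinate
    descent with soft-thresholding; uninformative coefficients go to zero."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(iters):
        for j in range(p):
            # Correlation of feature j with the partial residual (excluding j)
            rho = sum(X[i][j] * (y[i] - sum(X[i][k] * w[k]
                      for k in range(p) if k != j)) for i in range(n)) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            # Soft-thresholding update
            if rho > lam:
                w[j] = (rho - lam) / z
            elif rho < -lam:
                w[j] = (rho + lam) / z
            else:
                w[j] = 0.0
    return w

random.seed(2)
# Simulated data: 5 candidate predictors, only the first two truly matter.
X = [[random.gauss(0, 1) for _ in range(5)] for _ in range(100)]
y = [2.0 * row[0] - 1.5 * row[1] + random.gauss(0, 0.1) for row in X]

w = lasso_cd(X, y, lam=0.2)
# Coefficients for the three noise predictors are shrunk to (near) zero.
print([round(c, 2) for c in w])
```

The same sparsity that makes LASSO useful for selecting biologically plausible predictors is what post hoc attribution methods such as SHAP try to recover for models that lack built-in feature selection.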
Close collaboration between clinicians, data scientists, health economists, software engineers, and regulatory bodies is essential to ensure that future AI applications meet the standards of safety, transparency, and clinical and health economic relevance required for real-world adoption.
Conclusions
AI holds considerable promise to support earlier diagnosis, individualized risk assessment, and data-informed therapeutic decision-making in PH. Current ML and DL models show encouraging performance in diagnostic and prognostic applications based on non-invasive clinical and imaging data. However, progress toward clinical translation remains limited by small sample sizes, single-center designs, methodological heterogeneity, and the lack of external validation and standardized subgroup phenotyping aligned with current ESC/ERS guidelines. Future research should prioritize harmonized development and reporting practices, transparent diagnostic labeling, and robust multicenter validation to enable safe and effective integration of AI tools into clinical care.
Questions for future research
How can AI models in PH be trained on data strictly aligned with current clinical and hemodynamic definitions?
What methodological approaches can enhance the generalizability of AI tools across PH subgroups and clinical environments?
Can explainable AI increase transparency and foster clinical acceptance in PH applications?
How can AI support early risk stratification and individualized therapy guidance, especially in PAH?
How can AI tools in PH be prospectively validated through real-world, multicenter study designs?
Acknowledgements
We gratefully acknowledge Linda Stein for her valuable administrative assistance throughout the preparation of this systematic review.
Abbreviations
- AI
Artificial intelligence
- Ao
Aorta
- AUC
Area under the curve
- AUROC
Area under the receiver operating characteristic curve
- AUPRC
Area under the precision recall curve
- BERT
Bidirectional encoder representations from transformers
- BNP
Brain natriuretic peptide
- CatBoost
Categorical boosting
- CMR
Cardiovascular magnetic resonance
- CNN
Convolutional neural network
- CT
Computed tomography
- CTEPH
Chronic thromboembolic pulmonary hypertension
- CTPA
CT pulmonary angiography
- CXR
Chest X-ray
- DAFIT
Data-augmented feature integration technique
- DL
Deep learning
- Echo
Echocardiography
- ECG
Electrocardiogram
- EHR
Electronic health record
- F1-Score
Harmonic mean of precision and recall
- GSE
Gene expression omnibus series
- HR
Hazard ratio
- ICD-9/10
International classification of diseases, ninth/tenth revision
- IL-2
Interleukin-2
- IL-9
Interleukin-9
- LASSO
Least absolute shrinkage and selection operator
- LV
Left ventricle
- MAE
Mean absolute error
- ML
Machine learning
- MLP
Multilayer perceptron
- MPCA
Multilinear principal component analysis
- MRI
Magnetic resonance imaging
- mPAP
Mean pulmonary arterial pressure
- MSKCC
Memorial Sloan Kettering Cancer Center
- NPV
Negative predictive value
- PA
Pulmonary artery
- PAH
Pulmonary arterial hypertension
- PAP
Pulmonary arterial pressure
- PASP
Pulmonary arterial systolic pressure
- PH
Pulmonary hypertension
- PH-LD
Pulmonary hypertension due to lung disease
- PH-LHD
Pulmonary hypertension due to left heart disease
- PIOPED
Prospective investigation of pulmonary embolism diagnosis
- PPG
Photoplethysmography
- PPV
Positive predictive value
- PRISMA
Preferred reporting items for systematic reviews and meta-analyses
- PROBAST+AI
Prediction model risk of bias assessment tool for artificial intelligence
- R2
Coefficient of determination
- RHC
Right heart catheterization
- ROC
Receiver operating characteristic
- RV
Right ventricle
- RVEF
Right ventricular ejection fraction
- SVM
Support vector machine
- sPAP
Systolic pulmonary arterial pressure
- SVR
Support vector regression
- TAN
Tree-augmented naïve Bayes
- TPR
True positive rate
- TRPG
Tricuspid regurgitant pressure gradient
- TRV
Tricuspid regurgitation velocity
- VUMC
Vanderbilt University Medical Center
- WGCNA
Weighted gene co-expression network analysis
- XGBoost
Extreme gradient boosting
Author contributions
Author contributions TK conceived and designed the review, performed the literature search and data extraction, conducted the quality assessment, prepared all figures and tables, and drafted the manuscript. MK assisted with data organization and supported manuscript preparation. CH contributed to the economic framing and critically revised the manuscript. SS provided conceptual guidance and acted as academic supervisor of the project. All authors read and approved the final version of the manuscript.
Funding
Open Access funding enabled and organized by Projekt DEAL. This research was conducted without any specific grant from funding agencies in the public, commercial, or not-for-profit sectors. No external funding was received for the conduct, analysis, or reporting of this study.
Data availability
This systematic review is based on publicly available data from previously published studies. As no original data were collected or generated, no new datasets are available. All relevant data from the included studies are cited in the manuscript and summarized in the main text and supplementary tables. The corresponding review protocol was prospectively registered and is publicly available in PROSPERO (registration number: CRD420251074202). No analytic code was generated, as data synthesis was conducted narratively following the SWiM (Synthesis Without Meta-analysis) approach. A pre-defined data extraction form was used but is not publicly available; it can be obtained upon reasonable request from the corresponding author. Further inquiries regarding specific studies or data can also be directed to the corresponding author.
Declarations
Ethics approval and consent to participate
Not applicable. This article is a systematic review of previously published studies and does not involve any new studies with human participants or animals performed by any of the authors. Therefore, ethical approval and informed consent were not required.
Competing interests
TK has received speaker honoraria for lectures from Janssen unrelated to the present work. MK, CH, SS: Nothing to disclose.
Footnotes
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1.Humbert M, et al. 2022 ESC/ERS Guidelines for the diagnosis and treatment of pulmonary hypertension. Eur Heart J. 2022;43:3618–731. 10.1093/eurheartj/ehac237. [DOI] [PubMed] [Google Scholar]
- 2.Rosenkranz S, Howard LS, Gomberg-Maitland M, Hoeper MM. Systemic consequences of pulmonary hypertension and right-sided heart failure. Circulation. 2020;141:678–93. 10.1161/circulationaha.116.022362. [DOI] [PubMed] [Google Scholar]
- 3.Small M, Perchenet L, Bennett A, Linder J. The diagnostic journey of pulmonary arterial hypertension patients: results from a multinational real-world survey. Ther Adv Respir Dis. 2024;18:17534666231218886. 10.1177/17534666231218886. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Deshwal H, Weinstein T, Sulica R. Advances in the management of pulmonary arterial hypertension. J Investig Med. 2021;69:1270–80. 10.1136/jim-2021-002027. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Frost A, et al. Diagnosis of pulmonary hypertension. Eur Respir J. 2019. 10.1183/13993003.01904-2018. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Weatherald J, Humbert M. The ‘great wait’ for diagnosis in pulmonary arterial hypertension. Respirology. 2020;25:790–2. 10.1111/resp.13814. [DOI] [PubMed] [Google Scholar]
- 7.Patton DM, Enzevaie A, Day A, Sanfilippo A, Johri AM. A quality control exercise in the echo laboratory: reduction in inter-observer variability in the interpretation of pulmonary hypertension. Echocardiography. 2017;34:1882–7. 10.1111/echo.13712. [DOI] [PubMed] [Google Scholar]
- 8.Rosenkranz S, Preston IR. Right heart catheterisation: best practice and pitfalls in pulmonary hypertension. Eur Respir Rev. 2015;24:642–52. 10.1183/16000617.0062-2015. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9.Fadilah A, Putri VYS, Puling I, Willyanto SE. Assessing the precision of machine learning for diagnosing pulmonary arterial hypertension: a systematic review and meta-analysis of diagnostic accuracy studies. Front Cardiovasc Med. 2024;11:1422327. 10.3389/fcvm.2024.1422327. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Hardacre CJ, et al. Diagnostic test accuracy of artificial intelligence analysis of cross-sectional imaging in pulmonary hypertension: a systematic literature review. Br J Radiol. 2021. 10.1259/bjr.20210332. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.Attaripour Esfahani S, et al. A comprehensive review of Artificial Intelligence (AI) applications in pulmonary hypertension (PH). Medicina Kaunas. 2025. 10.3390/medicina61010085. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Rhodes CJ, Sweatt AJ, Maron BA. Harnessing big data to advance treatment and understanding of pulmonary hypertension. Circ Res. 2022;130:1423–44. 10.1161/circresaha.121.319969. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Tchuente Foguem G, Teguede Keleko A. Artificial intelligence applied in pulmonary hypertension: a bibliometric analysis. AI Ethics. 2023;3:1063–93. 10.1007/s43681-023-00267-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Imai S, et al. Artificial intelligence-based model for predicting pulmonary arterial hypertension on chest x-ray images. BMC Pulm Med. 2024;24:101. 10.1186/s12890-024-02891-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Han PL, et al. Artificial intelligence-assisted diagnosis of congenital heart disease and associated pulmonary arterial hypertension from chest radiographs: a multi-reader multi-case study. Eur J Radiol. 2024;171:111277. 10.1016/j.ejrad.2023.111277. [DOI] [PubMed] [Google Scholar]
- 16.Liu CM, et al. Artificial Intelligence-enabled electrocardiogram improves the diagnosis and prediction of mortality in patients with pulmonary hypertension. JACC Asia. 2022;2:258–70. 10.1016/j.jacasi.2022.02.008. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Kishikawa R, et al. An ensemble learning model for detection of pulmonary hypertension using electrocardiogram, chest X-ray, and brain natriuretic peptide. Eur Heart J Digit Health. 2025;6:209–17. 10.1093/ehjdh/ztae097. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Kogan E, et al. A machine learning approach to identifying patients with pulmonary hypertension using real-world electronic health records. Int J Cardiol. 2023;374:95–9. 10.1016/j.ijcard.2022.12.016. [DOI] [PubMed] [Google Scholar]
- 19.Sharkey MJ, Checkley EW, Swift AJ. Applications of artificial intelligence in computed tomography imaging for phenotyping pulmonary hypertension. Curr Opin Pulm Med. 2024;30:464–72. 10.1097/mcp.0000000000001103. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20.Nemati N, et al. Pulmonary hypertension detection non-invasively at point-of-care using a machine-learned algorithm. Diagnostics. 2024;14:897. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.Page MJ, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ. 2021;372:n71. 10.1136/bmj.n71. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.Schiavo JH. PROSPERO: an international register of systematic review protocols. Med Ref Serv Q. 2019;38:171–80. 10.1080/02763869.2019.1588072. [DOI] [PubMed] [Google Scholar]
- 23.Moons KGM, et al. PROBAST+AI: an updated quality, risk of bias, and applicability assessment tool for prediction models using regression or artificial intelligence methods. BMJ. 2025;388:e082505. 10.1136/bmj-2024-082505. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24.Campbell M, et al. Synthesis without meta-analysis (SWiM) in systematic reviews: reporting guideline. BMJ. 2020;368:l6890. 10.1136/bmj.l6890. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 25.Alabed S, et al. Machine learning cardiac-MRI features predict mortality in newly diagnosed pulmonary arterial hypertension. Eur Heart J Digit Health. 2022;3:265–75. 10.1093/ehjdh/ztac022. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26.Anand V, et al. Machine learning for diagnosis of pulmonary hypertension by echocardiography. Mayo Clin Proc. 2024;99:260–70. 10.1016/j.mayocp.2023.05.006. [DOI] [PubMed] [Google Scholar]
- 27.Aras MA, et al. Electrocardiogram detection of pulmonary hypertension using deep learning. J Card Fail. 2023;29:1017–28. 10.1016/j.cardfail.2022.12.016. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28.Argiento P, et al. A pulmonary hypertension targeted algorithm to improve referral to right heart catheterization: a machine learning approach. Comput Struct Biotechnol J. 2024;24:746–53. 10.1016/j.csbj.2024.11.031. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 29.Bauer Y, et al. Identifying early pulmonary arterial hypertension biomarkers in systemic sclerosis: machine learning on proteomics from the DETECT cohort. Eur Respir J. 2021;57:2002591. 10.1183/13993003.02591-2020. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 30.Bordag N, et al. Lipidomics for diagnosis and prognosis of pulmonary hypertension. medRxiv. 2023. 10.1101/2023.05.17.23289772. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 31.Chettrit D, Bregman Amitai O, Tamir I, Bar A, Elnekave E. PHT-bot: a deep learning based system for automatic risk stratification of COPD patients based upon signs of pulmonary hypertension. In: Proc SPIE Medical Imaging, vol. 10950. SPIE; 2019. arXiv:1905.11773
- 32.Diller G-P, et al. A framework of deep learning networks provides expert-level accuracy for the detection and prognostication of pulmonary arterial hypertension. Eur Heart J Cardiovasc Imaging. 2022;23:1447–56. 10.1093/ehjci/jeac147. [DOI] [PubMed] [Google Scholar]
- 33.DuBrock HM, et al. An electrocardiogram-based AI algorithm for early detection of pulmonary hypertension. Eur Respir J. 2024;64:2400192. 10.1183/13993003.00192-2024. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 34.Duo M, et al. Construction of a diagnostic signature and immune landscape of pulmonary arterial hypertension. Front Cardiovasc Med. 2022. 10.3389/fcvm.2022.940894. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 35.Dwivedi K, et al. Improving prognostication in pulmonary hypertension using AI-quantified fibrosis and radiologic severity scoring at baseline CT. Radiology. 2024;310:e231718. 10.1148/radiol.231718. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 36.Errington N, et al. A diagnostic miRNA signature for pulmonary arterial hypertension using a consensus machine learning approach. EBioMedicine. 2021. 10.1016/j.ebiom.2021.103444. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 37.Fortmeier V, et al. Solving the pulmonary hypertension paradox in patients with severe tricuspid regurgitation by employing artificial intelligence. JACC Cardiovasc Interv. 2022;15:381–94. 10.1016/j.jcin.2021.12.043. [DOI] [PubMed] [Google Scholar]
- 38.Gawlitza J, et al. Machine learning assisted feature identification and prediction of hemodynamic endpoints using computed tomography in patients with CTEPH. Int J Cardiovasc Imaging. 2024;40:569–77. 10.1007/s10554-023-03026-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 39.Hu H, et al. Identification of potential biomarkers for group I pulmonary hypertension based on machine learning and bioinformatics analysis. Int J Mol Sci. 2023. 10.3390/ijms24098050. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 40.Hyde B, et al. A claims-based, machine-learning algorithm to identify patients with pulmonary arterial hypertension. Pulm Circ. 2023;13:e12237. 10.1002/pul2.12237. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 41.Kanwar MK, et al. Risk stratification in pulmonary arterial hypertension using Bayesian analysis. Eur Respir J. 2020;56:2000008. 10.1183/13993003.00008-2020. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 42.Kiely DG, et al. Utilising artificial intelligence to determine patients at risk of a rare disease: idiopathic pulmonary arterial hypertension. Pulm Circ. 2019;9:2045894019890549. 10.1177/2045894019890549. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 43.Kıvrak T, et al. Pulmonary hypertension classification using artificial intelligence and chest X-ray: ATA AI STUDY-1. medRxiv. 2023. 10.1101/2023.04.14.23288561
- 44.Kusunose K, et al. Deep learning for detection of exercise-induced pulmonary hypertension using chest x-ray images. Front Cardiovasc Med. 2022. 10.3389/fcvm.2022.891703. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 45.Kusunose K, Hirata Y, Tsuji T, Kotoku Ji, Sata M. Deep learning to predict elevated pulmonary artery pressure in patients with suspected pulmonary hypertension using standard chest x ray. Sci Rep. 2020;10:19311. 10.1038/s41598-020-76359-w. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 46.Kwon J-M, et al. Artificial intelligence for early prediction of pulmonary hypertension using electrocardiography. J Heart Lung Transplant. 2020;39:805–14. 10.1016/j.healun.2020.04.009. [DOI] [PubMed] [Google Scholar]
- 47.Leha A, et al. A machine learning approach for the prediction of pulmonary hypertension. PLoS ONE. 2019;14:e0224453. 10.1371/journal.pone.0224453. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 48.Liu P-Y, et al. A deep-learning-enabled electrocardiogram and chest X-ray for detecting pulmonary arterial hypertension. J Imaging Inf Med. 2025;38:747–56. 10.1007/s10278-024-01225-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 49.Lungu A, et al. Diagnosis of pulmonary hypertension from magnetic resonance imaging-based computational models and decision tree analysis. Pulm Circ. 2016;6:181–90. 10.1086/686020. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 50.Matsunaga T, et al. Development and web deployment of prediction model for pulmonary arterial pressure in chronic thromboembolic pulmonary hypertension using machine learning. PLoS ONE. 2024;19:e0300716. 10.1371/journal.pone.0300716. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 51.Murayama M, et al. Deep learning to assess right ventricular ejection fraction from two-dimensional echocardiograms in precapillary pulmonary hypertension. Echocardiography. 2024;41:e15812. 10.1111/echo.15812. [DOI] [PubMed] [Google Scholar]
- 52.Ong MS, et al. Claims‐based algorithms for identifying patients with pulmonary hypertension: a comparison of decision rules and machine‐learning approaches. J Am Heart Assoc. 2020;9:e016648. 10.1161/JAHA.120.016648. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 53.Priya S, et al. Radiomics detection of pulmonary hypertension via texture-based assessments of cardiac MRI: a machine-learning model comparison—cardiac MRI radiomics in pulmonary hypertension. J Clin Med. 2021;10:1921. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 54.Priya S, et al. Radiomics side experiments and DAFIT approach in identifying pulmonary hypertension using Cardiac MRI derived radiomics based machine learning models. Sci Rep. 2021;11:12686. 10.1038/s41598-021-92155-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 55.Ragnarsdottir H, et al. Deep learning based prediction of pulmonary hypertension in newborns using echocardiograms. Int J Comput Vis. 2024;132:2567–84. 10.1007/s11263-024-01996-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 56.Schuler KP, et al. An algorithm to identify cases of pulmonary arterial hypertension from the electronic medical record. Respir Res. 2022;23:138. 10.1186/s12931-022-02055-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 57.Shikhare S, et al. Right-to-left ventricle ratio determined by machine learning algorithms on CT pulmonary angiography images predicts prolonged ICU length of stay in operated chronic thromboembolic pulmonary hypertension. Br J Radiol. 2022;95:20210722. 10.1259/bjr.20210722. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 58.Suvon M, Tripathi P, Alabed S, Swift A, Lu H. Multimodal learning for predicting mortality in patients with pulmonary arterial hypertension. 2022.
- 59.Swift AJ, et al. A machine learning cardiac magnetic resonance approach to extract disease features and automate pulmonary arterial hypertension diagnosis. Eur Heart J Cardiovasc Imaging. 2020;22:236–45. 10.1093/ehjci/jeaa001. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 60.Swinnen K, et al. Machine learning to differentiate pulmonary hypertension due to left heart disease from pulmonary arterial hypertension. ERJ Open Res. 2023. 10.1183/23120541.00229-2023. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 61.Yang J, Chen S, Chen K, Wu J, Yuan H. Exploring IRGs as a biomarker of pulmonary hypertension using multiple machine learning algorithms. Diagnostics. 2024. 10.3390/diagnostics14212398. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 62.Zeng H, Liu X, Zhang Y. Identification of potential biomarkers and immune infiltration characteristics in idiopathic pulmonary arterial hypertension using bioinformatics analysis. Front Cardiovasc Med. 2021. 10.3389/fcvm.2021.624714. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 63.Zhang N, et al. Machine learning based on computed tomography pulmonary angiography in evaluating pulmonary artery pressure in patients with pulmonary hypertension. J Clin Med. 2023;12:1297. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 64.Zhao M, et al. Non-contrasted computed tomography (NCCT) based chronic thromboembolic pulmonary hypertension (CTEPH) automatic diagnosis using cascaded network with multiple instance learning. Phys Med Biol. 2024;69:185011. 10.1088/1361-6560/ad7455. [DOI] [PubMed] [Google Scholar]
- 65.Zou X-L, et al. A promising approach for screening pulmonary hypertension based on frontal chest radiographs using deep learning: a retrospective study. PLoS ONE. 2020;15:e0236378. 10.1371/journal.pone.0236378. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 66.Dawes TJW, et al. Machine learning of three-dimensional right ventricular motion enables outcome prediction in pulmonary hypertension: a cardiac MR imaging study. Radiology. 2017;283:381–90. 10.1148/radiol.2016161315. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 67.Guo L, et al. Development and evaluation of a deep learning-based pulmonary hypertension screening algorithm using a digital stethoscope. J Am Heart Assoc. 2025;14:e036882. 10.1161/jaha.124.036882. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 68.Kheyfets VO, et al. Computational platform for doctor-artificial intelligence cooperation in pulmonary arterial hypertension prognostication: a pilot study. ERJ Open Res. 2023. 10.1183/23120541.00484-2022. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 69.Liao Z, et al. Automatic echocardiographic evaluation of the probability of pulmonary hypertension using machine learning. Pulm Circ. 2023;13:e12272. 10.1002/pul2.12272. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 70.Sweatt AJ, et al. Discovery of distinct immune phenotypes using machine learning in pulmonary arterial hypertension. Circ Res. 2019;124:904–19. 10.1161/circresaha.118.313911. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 71.Zhao W, et al. Development and validation of multimodal deep learning algorithms for detecting pulmonary hypertension. NPJ Digit Med. 2025;8:198. 10.1038/s41746-025-01593-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 72.Dubrock HM, et al. Use of machine-learning models to identify clinical features in patients with pulmonary arterial hypertension associated with a future clinical worsening event. Chest. 2023;164:A5931–2. 10.1016/j.chest.2023.07.3821. [Google Scholar]
- 73.Hoeper MM, et al. Complications of right heart catheterization procedures in patients with pulmonary hypertension in experienced centers. J Am Coll Cardiol. 2006;48:2546–52. 10.1016/j.jacc.2006.07.061. [DOI] [PubMed] [Google Scholar]
- 74.Salih A, et al. Explainable artificial intelligence and cardiac imaging: toward more interpretable models. Circ Cardiovasc Imaging. 2023;16:e014519. 10.1161/circimaging.122.014519. [DOI] [PubMed] [Google Scholar]
- 75.Marey A, et al. Explainability, transparency and black box challenges of AI in radiology: impact on patient care in cardiovascular radiology. Egypt J Radiol Nucl Med. 2024;55:183. 10.1186/s43055-024-01356-2. [Google Scholar]
- 76.Tonekaboni S, Joshi S, McCradden MD, Goldenberg A. What clinicians want: contextualizing explainable machine learning for clinical end use. In: Proceedings of the 4th Machine Learning for Healthcare Conference. PMLR; 2019. vol. 106, p. 359–80.
- 77.Goto S, Ozawa H. The importance of external validation for neural network models. JACC Adv. 2023;2:100610. 10.1016/j.jacadv.2023.100610. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 78.Cabitza F, et al. The importance of being external. Methodological insights for the external validation of machine learning models in medicine. Comput Methods Programs Biomed. 2021;208:106288. 10.1016/j.cmpb.2021.106288. [DOI] [PubMed] [Google Scholar]
- 79.Hassija V, et al. Interpreting black-box models: a review on explainable artificial intelligence. Cogn Comput. 2024;16:45–74. 10.1007/s12559-023-10179-8. [Google Scholar]
- 80.Ribeiro MT, Singh S, Guestrin C. "Why should I trust you?": explaining the predictions of any classifier. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. San Francisco, CA: Association for Computing Machinery; 2016. p. 1135–1144.
- 81.Lundberg SM, Lee S-I. A unified approach to interpreting model predictions. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach, CA: Curran Associates Inc.; 2017. p. 4768–4777.
- 82.Ueda D, et al. Fairness of artificial intelligence in healthcare: review and recommendations. Jpn J Radiol. 2024;42:3–15. 10.1007/s11604-023-01474-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 83.Mehrabi N, Morstatter F, Saxena N, Lerman K, Galstyan A. A survey on bias and fairness in machine learning. ACM Comput Surv. 2021;54:115. 10.1145/3457607. [Google Scholar]
- 84.Winter P, Carusi A. 'If you're going to trust the machine, then that trust has got to be based on something': validation and the co-constitution of trust in developing artificial intelligence (AI) for the early diagnosis of pulmonary hypertension (PH). Sci Technol Stud. 2022;35:58–77. 10.23987/sts.102198. [Google Scholar]
Associated Data
Data Availability Statement
This systematic review is based on publicly available data from previously published studies. As no original data were collected or generated, no new datasets are available. All relevant data from the included studies are cited in the manuscript and summarized in the main text and supplementary tables. The corresponding review protocol was prospectively registered and is publicly available in PROSPERO (registration number: CRD420251074202). No analytic code was generated, as data synthesis was conducted narratively following the SWiM (Synthesis Without Meta-analysis) approach. A pre-defined data extraction form was used but is not publicly available; it can be obtained upon reasonable request from the corresponding author. Further inquiries regarding specific studies or data can also be directed to the corresponding author.