Abstract
Background
Pulmonary hypertension (PH) is a progressive vascular disorder where early diagnosis is critical for improving patient outcomes. While right heart catheterization (RHC) remains the gold standard for diagnosis, its invasive nature often leads to delayed PH detection. This study aimed to develop a machine learning-based predictive model incorporating MRI-derived parameters to facilitate early PH diagnosis.
Methods
In this retrospective study, after data filtering, 323 participants (161 RHC-confirmed PH patients and 162 controls) who underwent cardiac MRI at Zhongnan Hospital of Wuhan University between January 2021 and May 2024 were enrolled for model development, with a 7:3 split for training and internal validation. An additional external validation cohort (48 PH cases and 16 controls) was collected from June 2024 to June 2025. We analyzed 27 MRI parameters reflecting cardiac structure/function, 60 laboratory biomarkers (including NT-proBNP and D-dimer), and basic demographic information (age, sex). Key MRI features were selected via recursive feature elimination (RFE), followed by comparative evaluation of multiple machine learning models (XGBoost, logistic regression, etc.) to identify optimal predictors. SHAP analysis was used to quantify variable importance, and a random forest was used to select significant laboratory biomarkers. The final integrated model combined MRI and laboratory predictors, with performance assessed via Receiver Operating Characteristic (ROC) curves, calibration plots, and decision curve analysis (DCA). A nomogram and web-based calculator were developed for clinical implementation.
Results
Fifteen MRI parameters showed strong PH association: pulmonary artery diameter (PA), right ventricular end-diastolic volume (REDV), left ventricular end-diastolic volume (LEDV), left ventricular end-systolic volume (LESV), right cardiac index (RCI), right ventricular ejection fraction (RVEF), left atrial anteroposterior diameter (LAAPD), left cardiac output (LCO), right ventricular stroke volume (RSV), left ventricular stroke volume (LSV), left ventricular stroke volume index (LSVI), left ventricular lateral wall thickness (LVLWT), left basal interventricular septal thickness (LIVST), ascending aortic diameter (AAD), and descending aortic diameter (DAD). SHAP analysis identified PA, REDV, and LEDV as top predictors. The MRI-derived model demonstrated excellent discriminative ability across all metrics (AUC, precision-recall, specificity-sensitivity). Key laboratory predictors included BUN, GGT, TBIL, and D-dimer. The combined model achieved AUCs of 0.999 (training), 0.944 (internal validation), and 0.897 (external validation), with excellent calibration. For enhanced clinical utility, we have deployed the developed PH prediction model as a web-based calculator (https://jianghx.shinyapps.io/PH_prediction_MRIIndex/) to facilitate early diagnosis of pulmonary hypertension.
Conclusion
Our study developed a high-performance PH prediction model integrating cardiac MRI and laboratory biomarkers, demonstrating robust diagnostic accuracy that could enable earlier PH detection while circumventing RHC-related diagnostic delays.
Clinical trial number
Not applicable.
Keywords: Pulmonary hypertension, Magnetic resonance imaging, Machine learning, Risk prediction
Introduction
Pulmonary hypertension (PH) represents a severe, progressive vascular disorder characterized by elevated pulmonary arterial pressure, increased pulmonary vascular resistance, and eventual right ventricular failure [1]. Epidemiological studies estimate a global prevalence of approximately 1%, rising to 10% in individuals over 65 years [2]. The disease carries significant morbidity and mortality burdens, with untreated cases demonstrating a median survival under three years [3]. Early detection is of paramount importance for implementing timely interventions and improving clinical outcomes.
Current diagnostic approaches employ both invasive and non-invasive modalities. While right heart catheterization (RHC) remains the diagnostic gold standard for hemodynamic assessment [4], its invasive nature limits utility for routine screening. Echocardiography serves as the primary non-invasive screening tool, providing estimates of pulmonary pressures and right ventricular function [5]. However, this modality exhibits several limitations: operator-dependent variability, restricted accuracy in patients with poor acoustic windows, reliance on indirect pathological markers, and limited capability for detecting subtle structural abnormalities like focal fibrosis [6]. In addition, many other factors can affect the accuracy of echocardiography in estimating pulmonary artery pressure, such as right heart failure, severe tricuspid regurgitation, female sex, cardiac arrhythmias, systemic hypertension, and diuretic therapy [7, 8]. Therefore, developing alternative predictive models for PH is of significant importance for improving non-invasive diagnosis.
Cardiac magnetic resonance imaging (MRI) offers distinct advantages over echocardiography, including superior spatial resolution and tissue characterization. MRI enables non-invasive quantification of myocardial fibrosis (via late gadolinium enhancement), edema (T2 mapping), and iron deposition, while providing three-dimensional assessment of ventricular structure and function with minimal operator dependence [9, 10]. This technique has emerged as a reference standard for comprehensive cardiovascular evaluation in PH.
Laboratory biomarkers complement imaging findings by reflecting the underlying pathophysiological processes. The integration of imaging and biochemical data promises enhanced diagnostic accuracy through multidimensional assessment of PH manifestations.
The challenge lies in effectively synthesizing these complex, multidimensional datasets. Contemporary bioinformatics and machine learning approaches provide powerful solutions for feature selection, algorithm optimization, and predictive model development [11, 12]. These techniques enable the creation of interpretable models capable of accurate risk stratification, early detection, and personalized management.
This study presents a novel machine learning framework integrating comprehensive MRI parameters and laboratory biomarkers to develop a predictive model for PH. Through rigorous evaluation including Receiver Operating Characteristic (ROC) analysis, decision curve analysis (DCA), and calibration curves, coupled with both internal and external validation, we demonstrate robust model performance. The implementation of this model as a clinically accessible nomogram and online calculator provides a practical tool for non-invasive PH risk assessment, offering new possibilities for early diagnosis and intervention.
Materials and methods
Data collection
This study retrospectively enrolled PH patients diagnosed by RHC (mean pulmonary artery pressure > 20 mmHg) who underwent cardiac MRI at Zhongnan Hospital of Wuhan University between January 2021 and May 2024 as the case group, while collecting control subjects without a PH diagnosis and with normal cardiac structure/function on MRI. An external validation cohort consisting of PH patients and controls meeting the same criteria was prospectively collected from June 2024 to June 2025. For both groups, we collected demographic information (age, sex, etc.), cardiac MRI parameters assessing cardiac morphology and function, and laboratory biomarkers.
MRI parameters
Left atrial anterior-posterior diameter (LAAPD), left ventricular transverse diameter (LVTD), left basal interventricular septal thickness (LIVST), left ventricular lateral wall thickness (LVLWT), pulmonary artery diameter (PA), ascending aorta diameter (AAD), descending aorta diameter (DAD), left ventricular ejection fraction (LVEF), left ventricular end-diastolic volume (LEDV), left ventricular end-systolic volume (LESV), left ventricular stroke volume (LSV), left ventricular mass (LVM), left cardiac output (LCO), left cardiac index (LCI), left ventricular end-diastolic volume index (LEDVI), left ventricular end-systolic volume index (LESVI), left ventricular stroke volume index (LSVI), left ventricular mass index (LMI), right ventricular ejection fraction (RVEF), right ventricular end-diastolic volume (REDV), right ventricular end-systolic volume (RESV), right ventricular stroke volume (RSV), right cardiac output (RCO), right cardiac index (RCI), right ventricular end-diastolic volume index (REDVI), right ventricular end-systolic volume index (RESVI), and right ventricular stroke volume index (RSVI).
Laboratory biomarkers
White blood cell count (WBC), red blood cell count (RBC), hemoglobin (Hb), platelet count (PLT), neutrophil percentage (NEUT%), lymphocyte percentage (LYMPH%), monocyte percentage (MONO%), eosinophil percentage (EOS%), basophil percentage (BASO%), neutrophil absolute count (NEUT), lymphocyte absolute count (LYMPH), monocyte absolute count (MONO), eosinophil absolute count (EOS), basophil absolute count (BASO), hematocrit (HCT), mean corpuscular volume (MCV), mean corpuscular hemoglobin (MCH), mean corpuscular hemoglobin concentration (MCHC), red cell distribution width (RDW-CV), mean platelet volume (MPV), prothrombin time (PT), international normalized ratio (INR), prothrombin time activity (PTA), activated partial thromboplastin time (APTT), thrombin time (TT), fibrinogen (FIB), D-dimer (D-Dimer), N-terminal pro-B-type natriuretic peptide (NT-proBNP), alanine aminotransferase (ALT), aspartate aminotransferase (AST), AST/ALT ratio (AST/ALT), total bilirubin (TBIL), direct bilirubin (DBIL), indirect bilirubin (IDBIL), total protein (TP), albumin (ALB), globulin (GLB), albumin/globulin ratio (A/G ratio), gamma-glutamyl transferase (GGT), alkaline phosphatase (ALP), total bile acids (TBA), blood urea nitrogen (BUN), creatinine (Cr), uric acid (UA), carbon dioxide (CO₂), serum cystatin C (Cys-C), potassium (K⁺), sodium (Na⁺), chloride (Cl⁻), calcium (Ca²⁺), magnesium (Mg²⁺), phosphorus (P), total cholesterol (TC), triglycerides (TG), high-density lipoprotein cholesterol (HDL-C), low-density lipoprotein cholesterol (LDL-C), small dense LDL (sdLDL), lipoprotein(a) (Lp(a)), free fatty acids (FFA), phospholipids (PL), free triiodothyronine (FT3), free thyroxine (FT4), and thyroid-stimulating hormone (TSH). Immunological profiling included anti-cardiolipin antibodies (IgM, IgG, IgA), complement components (C3, C4), immunoglobulins (IgG, IgA, IgM, IgE), anti-streptolysin O (ASO), rheumatoid factor (RF), and rheumatoid factor antibodies (IgM, IgG, IgA).
Blood gas analysis provided measures of acid-base balance and oxygenation status: pH, partial pressure of oxygen (PaO₂), oxygen saturation (SaO₂), partial pressure of carbon dioxide (PaCO₂), temperature, lactate, standard bicarbonate (HCO₃⁻), actual base excess (ABE), standard base excess (SBE), and anion gap (AG).
Methodological framework for predictive model development
Data Preprocessing: Variables and samples with > 15% missing values were excluded. Highly correlated variables (Pearson’s r > 0.75) were removed to mitigate multicollinearity. Missing values were imputed using the K-nearest neighbors (KNN) algorithm. Continuous variables were standardized (z-score normalization), while categorical variables were converted to dummy variables.
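The missingness filter and z-score step described above can be sketched in a few lines. This is a minimal illustration in plain Python, not the authors' R code: the function names and the toy values are hypothetical, and the correlation filter and KNN imputation are omitted for brevity.

```python
from statistics import mean, stdev

def drop_sparse_columns(rows, max_missing=0.15):
    """Keep only columns whose fraction of missing (None) values is <= max_missing."""
    n = len(rows)
    keep = [c for c in rows[0]
            if sum(r[c] is None for r in rows) / n <= max_missing]
    return [{c: r[c] for c in keep} for r in rows]

def zscore(values):
    """Standardize observed values to mean 0 and sample SD 1; None stays None."""
    observed = [v for v in values if v is not None]
    mu, sd = mean(observed), stdev(observed)
    return [None if v is None else (v - mu) / sd for v in values]

# Illustrative toy records (hypothetical, not study data).
rows = [
    {"PA": 23.0, "REDV": 114.0, "NT_proBNP": 88.3},
    {"PA": 30.0, "REDV": 175.0, "NT_proBNP": None},
    {"PA": 25.4, "REDV": None,  "NT_proBNP": None},
    {"PA": 31.2, "REDV": 204.0, "NT_proBNP": 1370.0},
    {"PA": 23.1, "REDV": 128.0, "NT_proBNP": None},
    {"PA": 27.1, "REDV": 161.0, "NT_proBNP": None},
    {"PA": 22.0, "REDV": 133.0, "NT_proBNP": 512.0},
    {"PA": 35.0, "REDV": 250.0, "NT_proBNP": None},
]

filtered = drop_sparse_columns(rows)          # NT_proBNP exceeds 15% missing
pa_z = zscore([r["PA"] for r in filtered])    # complete column
redv_z = zscore([r["REDV"] for r in filtered])  # one value left for imputation
```

In this sketch a column with one missing value out of eight (12.5%) survives the filter and would then be passed to the imputation step.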
Data Partitioning: The dataset was stratified by outcome and randomly split into training (70%) and testing (30%) sets.
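A stratified 70/30 partition of this kind can be sketched as follows. This is an illustration only: the function name, the seed, and the rounding rule are assumptions, and the exact per-class counts produced by the authors' R workflow may differ.

```python
import random
from collections import defaultdict

def stratified_split(labels, train_fraction=0.7, seed=0):
    """Return (train_idx, test_idx), splitting each outcome class separately
    so that both sets keep roughly the original class ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = round(train_fraction * len(indices))
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return sorted(train), sorted(test)

# Class sizes taken from the development cohort: 161 PH (1) and 162 controls (0).
labels = [1] * 161 + [0] * 162
train_idx, test_idx = stratified_split(labels)
```

Splitting within each class, rather than over the pooled sample, keeps the PH-to-control ratio nearly identical in the training and testing sets.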
Feature Selection: Recursive feature elimination (RFE) with random forest importance scoring was implemented. To enhance robustness, repeated k-fold cross-validation (10 folds, 3 repeats) was employed during RFE. The selected MRI parameters constituted the MRI index.
Model Training & Comparison: Multiple machine learning algorithms were evaluated: regularized models (logistic regression, LASSO, Elastic Net); tree-based models (decision tree, random forest, XGBoost, GBM); and other classifiers (SVM with a radial basis kernel, KNN, Naïve Bayes). Model optimization was performed through 10-fold cross-validation with the area under the ROC curve (AUC) as the primary metric. XGBoost demonstrated superior discriminative ability (highest AUC) and was selected as the final algorithm.
Model Evaluation: Performance metrics (AUC, accuracy, sensitivity, specificity) were calculated for training, testing, and external validation sets. Diagnostic plots included ROC curves and confusion matrices generated via ‘ROC’ and ‘caret’ packages.
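For readers less familiar with these metrics, the sketch below computes them from predicted probabilities in plain Python. It is illustrative only: the 0.5 threshold, the function names, and the toy predictions are assumptions, and the study itself used R packages for this step.

```python
def classification_metrics(y_true, y_prob, threshold=0.5):
    """Return (sensitivity, specificity, accuracy) at a probability threshold."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp / (tp + fn), tn / (tn + fp), (tp + tn) / len(y_true)

def auc(y_true, y_prob):
    """AUC as the probability that a random case scores higher than a random
    control (ties count as one half)."""
    cases = [p for y, p in zip(y_true, y_prob) if y == 1]
    controls = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in cases for b in controls)
    return wins / (len(cases) * len(controls))

# Hypothetical predictions for three PH cases and three controls.
y_true = [1, 1, 1, 0, 0, 0]
y_prob = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
sensitivity, specificity, accuracy = classification_metrics(y_true, y_prob)
area = auc(y_true, y_prob)
```

Unlike sensitivity and specificity, the AUC does not depend on the chosen threshold, which is why it is suited to comparing models across datasets.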
Interpretability Analysis: SHAP (SHapley Additive exPlanations) values were computed using Python’s SHAP library to quantify feature contributions to predictions. This provided granular insights into how individual MRI parameters influenced model outputs.
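To make the quantity behind SHAP concrete, the sketch below computes exact Shapley values for a toy scoring function by enumerating every feature ordering. This is not the procedure used in the study: the SHAP library relies on much more efficient algorithms, and the toy model, its coefficients, and the input values here are hypothetical.

```python
from itertools import permutations

def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict(x) relative to predict(baseline).
    Features not yet 'switched on' keep their baseline value."""
    names = list(x)
    totals = {name: 0.0 for name in names}
    orders = list(permutations(names))
    for order in orders:
        current = dict(baseline)
        previous = predict(current)
        for name in order:
            current[name] = x[name]          # switch this feature on
            value = predict(current)
            totals[name] += value - previous  # its marginal contribution
            previous = value
    return {name: total / len(orders) for name, total in totals.items()}

def toy_score(f):
    # Hypothetical score with one interaction term (PA x LEDV).
    return 0.05 * f["PA"] + 0.01 * f["REDV"] + 0.0001 * f["PA"] * f["LEDV"]

x = {"PA": 30.0, "REDV": 175.0, "LEDV": 184.0}         # patient of interest
baseline = {"PA": 23.0, "REDV": 114.0, "LEDV": 136.0}  # reference values
phi = shapley_values(toy_score, x, baseline)
```

The contributions always sum to the difference between the prediction for the patient and the prediction at the baseline, which is the additivity property that makes SHAP values readable as per-feature shares of a prediction.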
Biomarker Selection: Random forest-based screening was applied to demographic variables (age, sex) and laboratory biomarkers to identify potential PH predictors, with importance scores derived from permutation testing.
Nomogram Development: The final integrated model combined the MRI index with selected biomarkers. A penalized logistic regression model with L2 regularization was constructed, based on which a clinical nomogram was developed. Model discrimination was validated through ROC analysis across all datasets.
Clinical Utility Assessment: Decision curve analysis quantified net benefit across probability thresholds, comparing the biomarker-only model with the final combined model.
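Net benefit in decision curve analysis is conventionally defined as TP/n − (FP/n) × pt/(1 − pt), where pt is the threshold probability. The sketch below applies that standard definition to toy data; the function names and predictions are hypothetical and do not reproduce the study's analysis.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating everyone whose predicted risk is >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)

def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: classify every participant as positive."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

# Hypothetical predictions for three PH cases and three controls.
y_true = [1, 1, 1, 0, 0, 0]
y_prob = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
nb_model = net_benefit(y_true, y_prob, threshold=0.5)
nb_all = net_benefit_treat_all(y_true, threshold=0.5)
```

A decision curve repeats this calculation over a range of thresholds; a model is clinically useful at a given threshold when its net benefit exceeds both the treat-all value and zero (the treat-none strategy).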
Clinical Implementation: The nomogram was deployed as an interactive web application using R Shiny, enabling real-time risk stratification in clinical practice.
Statistical analysis
This analytical pipeline adhered to TRIPOD guidelines for transparent reporting of multivariable prediction models. Data preprocessing, data partitioning, feature selection, model training and comparison, model evaluation, biomarker selection, nomogram development, clinical utility assessment, and clinical implementation were performed in R (v4.2.2); SHAP analysis was performed in Python (v3.9).
Results
Baseline characteristics and statistical analysis between PH and control groups
The data analysis framework of this study is presented in Fig. 1. The cohort for model development and internal validation consisted of 161 PH patients and 162 matched controls. Differences emerged between groups across multiple clinical parameters (Table 1). The PH cohort demonstrated significantly enlarged cardiac dimensions, including a larger left atrial diameter (41.9 ± 12.0 vs. 33.4 ± 9.23 mm, p < 0.001) and pulmonary artery dilation (31.2 ± 8.61 vs. 23.1 ± 3.92 mm, p < 0.001). Impaired cardiac function was evidenced by reduced LVEF (39.2 ± 20.4% vs. 53.9 ± 13.6%, p < 0.001) alongside elevated end-diastolic and end-systolic ventricular volumes (all p < 0.05; Table 1). Hematological and biochemical profiling revealed a distinct PH signature: lower platelet counts (184 ± 65.1 vs. 216 ± 68.2 × 10^9/L, p < 0.001), elevated coagulation markers (D-dimer: 892 ± 2990 vs. 218 ± 474 ng/mL, p = 0.006), and significantly increased NT-proBNP (3160 ± 4510 vs. 702 ± 2770 pg/mL, p < 0.001) and hepatic/kidney function markers (all p ≤ 0.001).
Fig. 1. Flow chart of data analysis in this study
Table 1. Baseline characteristics and intergroup differences between PH patients and controls
| Characteristic | Normal (n = 162) | PH (n = 161) | Overall (n = 323) | p value |
|---|---|---|---|---|
| Sex | ||||
| Male | 95 (58.6%) | 92 (57.1%) | 187 (57.9%) | NA |
| Female | 67 (41.4%) | 69 (42.9%) | 136 (42.1%) | |
| Missing | 0 (0%) | 0 (0%) | 0 (0%) | |
| Age, years | ||||
| Mean (SD) | 42.9 (17.7) | 46.8 (17.4) | 44.8 (17.6) | 0.046 |
| Median [Min, Max] | 42.3 [9.68, 79.0] | 49.0 [6.00, 85.0] | 46.0 [6.00, 85.0] | |
| Missing | 0 (0%) | 1 (0.6%) | 1 (0.3%) | |
| Left atrial anterior-posterior diameter (LAAPD), mm | ||||
| Mean (SD) | 33.4 (9.23) | 41.9 (12.0) | 37.6 (11.5) | < 0.001 |
| Median [Min, Max] | 31.0 [17.0, 79.0] | 42.0 [18.0, 82.0] | 35.0 [17.0, 82.0] | |
| Missing | 0 (0%) | 2 (1.2%) | 2 (0.6%) | |
| Left ventricular transverse diameter (LVTD), mm | ||||
| Mean (SD) | 50.5 (9.30) | 58.2 (18.0) | 54.3 (14.8) | < 0.001 |
| Median [Min, Max] | 49.0 [33.0, 90.0] | 54.0 [26.0, 106] | 50.0 [26.0, 106] | |
| Missing | 0 (0%) | 1 (0.6%) | 1 (0.3%) | |
| Left basal interventricular septal thickness (LIVST), mm | ||||
| Mean (SD) | 8.97 (2.49) | 9.29 (3.16) | 9.13 (2.85) | 0.357 |
| Median [Min, Max] | 8.50 [5.50, 21.5] | 8.50 [4.00, 21.5] | 8.50 [4.00, 21.5] | |
| Missing | 26 (16.0%) | 21 (13.0%) | 47 (14.6%) | |
| Left ventricular lateral wall thickness (LVLWT), mm | ||||
| Mean (SD) | 6.43 (1.58) | 6.72 (2.20) | 6.58 (1.91) | 0.222 |
| Median [Min, Max] | 6.00 [4.00, 13.0] | 6.50 [3.00, 14.5] | 6.25 [3.00, 14.5] | |
| Missing | 25 (15.4%) | 26 (16.1%) | 51 (15.8%) | |
| Pulmonary artery diameter (PA), mm | ||||
| Mean (SD) | 23.1 (3.92) | 31.2 (8.61) | 27.1 (7.81) | < 0.001 |
| Median [Min, Max] | 23.0 [15.0, 37.0] | 30.0 [14.0, 64.0] | 25.4 [14.0, 64.0] | |
| Missing | 0 (0%) | 2 (1.2%) | 2 (0.6%) | |
| Ascending aorta diameter (AAD), mm | ||||
| Mean (SD) | 29.4 (5.49) | 29.9 (6.82) | 29.7 (6.19) | 0.504 |
| Median [Min, Max] | 29.0 [20.0, 46.0] | 31.0 [10.0, 46.0] | 30.0 [10.0, 46.0] | |
| Missing | 0 (0%) | 1 (0.6%) | 1 (0.3%) | |
| Descending aorta diameter (DAD), mm | ||||
| Mean (SD) | 21.2 (3.84) | 20.9 (4.52) | 21.0 (4.19) | 0.6 |
| Median [Min, Max] | 21.0 [11.0, 32.0] | 21.0 [7.00, 39.0] | 21.0 [7.00, 39.0] | |
| Missing | 0 (0%) | 1 (0.6%) | 1 (0.3%) | |
| Left ventricular ejection fraction (LVEF), % | ||||
| Mean (SD) | 53.9 (13.6) | 39.2 (20.4) | 46.6 (18.8) | < 0.001 |
| Median [Min, Max] | 57.3 [10.0, 86.6] | 39.0 [9.67, 77.4] | 54.2 [9.67, 86.6] | |
| Left ventricular end-diastolic volume (LEDV), mL | ||||
| Mean (SD) | 156 (78.8) | 239 (163) | 197 (134) | < 0.001 |
| Median [Min, Max] | 136 [62.4, 601] | 184 [52.9, 872] | 144 [52.9, 872] | |
| Left ventricular end-systolic volume (LESV), mL | ||||
| Mean (SD) | 106 (349) | 170 (157) | 138 (273) | 0.035 |
| Median [Min, Max] | 55.6 [10.9, 4430] | 107 [18.0, 783] | 63.5 [10.9, 4430] | |
| Left ventricular stroke volume (LSV), mL | ||||
| Mean (SD) | 77.2 (23.6) | 69.4 (27.8) | 73.4 (26.0) | 0.007 |
| Median [Min, Max] | 73.8 [22.7, 185] | 67.0 [12.3, 192] | 70.4 [12.3, 192] | |
| Left ventricular mass (LVM), g | ||||
| Mean (SD) | 97.1 (41.1) | 126 (60.9) | 112 (53.7) | < 0.001 |
| Median [Min, Max] | 90.0 [28.2, 285] | 110 [20.5, 371] | 98.0 [20.5, 371] | |
| Missing | 0 (0%) | 5 (3.1%) | 5 (1.5%) | |
| Left cardiac output (LCO), L/min | ||||
| Mean (SD) | 5.57 (1.71) | 5.19 (2.21) | 5.38 (1.98) | 0.086 |
| Median [Min, Max] | 5.26 [2.22, 11.9] | 4.74 [1.01, 16.6] | 5.05 [1.01, 16.6] | |
| Left cardiac index (LCI), L/min/m2 | ||||
| Mean (SD) | 3.17 (0.888) | 3.10 (1.36) | 3.14 (1.15) | 0.584 |
| Median [Min, Max] | 3.02 [1.43, 7.99] | 2.99 [0.540, 11.2] | 3.00 [0.540, 11.2] | |
| Left ventricular end-diastolic volume index (LEDVI), mL/m2 | ||||
| Mean (SD) | 89.9 (44.7) | 138 (84.4) | 114 (71.6) | < 0.001 |
| Median [Min, Max] | 78.0 [49.9, 376] | 106 [40.4, 412] | 84.9 [40.4, 412] | |
| Left ventricular end-systolic volume index (LESVI), mL/m2 | ||||
| Mean (SD) | 45.5 (39.5) | 97.3 (83.0) | 71.3 (69.8) | < 0.001 |
| Median [Min, Max] | 32.6 [6.97, 273] | 60.3 [15.5, 363] | 37.1 [6.97, 363] | |
| Left ventricular stroke volume index (LSVI), mL/m2 | ||||
| Mean (SD) | 44.5 (12.7) | 44.9 (47.0) | 44.7 (34.4) | 0.931 |
| Median [Min, Max] | 43.2 [13.7, 103] | 40.3 [6.50, 599] | 41.7 [6.50, 599] | |
| Left ventricular mass index (LMI), g/m2 | ||||
| Mean (SD) | 56.2 (21.5) | 72.8 (29.6) | 64.3 (27.1) | < 0.001 |
| Median [Min, Max] | 51.9 [23.5, 168] | 67.2 [5.20, 196] | 56.1 [5.20, 196] | |
| Missing | 0 (0%) | 6 (3.7%) | 6 (1.9%) | |
| Right ventricular ejection fraction (RVEF), % | ||||
| Mean (SD) | 45.3 (14.0) | 37.0 (15.8) | 41.7 (15.4) | < 0.001 |
| Median [Min, Max] | 47.9 [9.00, 72.8] | 35.3 [10.3, 70.2] | 42.6 [9.00, 72.8] | |
| Missing | 0 (0%) | 36 (22.4%) | 36 (11.1%) | |
| Right ventricular end-diastolic volume (REDV), mL | ||||
| Mean (SD) | 128 (55.4) | 204 (104) | 161 (88.5) | < 0.001 |
| Median [Min, Max] | 114 [59.0, 454] | 175 [59.9, 578] | 133 [59.0, 578] | |
| Missing | 0 (0%) | 37 (23.0%) | 37 (11.5%) | |
| Right ventricular end-systolic volume (RESV), mL | ||||
| Mean (SD) | 73.9 (47.7) | 134 (87.4) | 99.9 (74.0) | < 0.001 |
| Median [Min, Max] | 60.2 [21.4, 308] | 111 [21.5, 456] | 72.6 [21.4, 456] | |
| Missing | 0 (0%) | 37 (23.0%) | 37 (11.5%) | |
| Right ventricular stroke volume (RSV), mL | ||||
| Mean (SD) | 56.7 (24.3) | 70.0 (42.0) | 62.5 (33.7) | 0.002 |
| Median [Min, Max] | 54.6 [9.28, 201] | 60.3 [16.3, 302] | 57.0 [9.28, 302] | |
| Missing | 0 (0%) | 37 (23.0%) | 37 (11.5%) | |
| Right cardiac output (RCO), L/min | ||||
| Mean (SD) | 4.41 (4.80) | 5.29 (2.99) | 4.79 (4.13) | 0.059 |
| Median [Min, Max] | 3.93 [0.630, 60.0] | 4.67 [1.09, 20.8] | 4.10 [0.630, 60.0] | |
| Missing | 0 (0%) | 38 (23.6%) | 38 (11.8%) | |
| Right cardiac index (RCI), L/min/m2 | ||||
| Mean (SD) | 3.20 (11.0) | 3.21 (1.99) | 3.20 (8.38) | 0.992 |
| Median [Min, Max] | 2.23 [0.480, 142] | 2.78 [0.560, 16.2] | 2.47 [0.480, 142] | |
| Missing | 0 (0%) | 38 (23.6%) | 38 (11.8%) | |
| Right ventricular end-diastolic volume index (REDVI), mL/m2 | ||||
| Mean (SD) | 73.1 (29.4) | 125 (68.2) | 95.3 (56.0) | < 0.001 |
| Median [Min, Max] | 67.6 [5.03, 244] | 105 [32.9, 445] | 78.5 [5.03, 445] | |
| Missing | 0 (0%) | 38 (23.6%) | 38 (11.8%) | |
| Right ventricular end-systolic volume index (RESVI), mL/m2 | ||||
| Mean (SD) | 41.5 (24.8) | 78.7 (49.2) | 57.6 (41.6) | < 0.001 |
| Median [Min, Max] | 35.1 [2.45, 173] | 63.1 [13.1, 228] | 42.7 [2.45, 228] | |
| Missing | 0 (0%) | 38 (23.6%) | 38 (11.8%) | |
| Right ventricular stroke volume index (RSVI), mL/m2 | ||||
| Mean (SD) | 32.6 (13.7) | 42.7 (28.2) | 36.9 (21.7) | < 0.001 |
| Median [Min, Max] | 31.7 [6.35, 127] | 35.9 [8.37, 235] | 32.8 [6.35, 235] | |
| Missing | 0 (0%) | 40 (24.8%) | 40 (12.4%) | |
| WBC, 10^9/L | ||||
| Mean (SD) | 6.07 (2.05) | 6.12 (2.04) | 6.10 (2.04) | 0.837 |
| Median [Min, Max] | 5.80 [2.30, 16.7] | 5.80 [1.60, 17.5] | 5.80 [1.60, 17.5] | |
| Missing | 6 (3.7%) | 0 (0%) | 6 (1.9%) | |
| RBC, 10^12/L | ||||
| Mean (SD) | 4.55 (0.716) | 4.60 (0.939) | 4.58 (0.836) | 0.588 |
| Median [Min, Max] | 4.55 [2.01, 8.33] | 4.44 [1.77, 8.33] | 4.51 [1.77, 8.33] | |
| Missing | 6 (3.7%) | 0 (0%) | 6 (1.9%) | |
| Hb, g/L | ||||
| Mean (SD) | 134 (21.4) | 136 (25.3) | 135 (23.4) | 0.436 |
| Median [Min, Max] | 135 [65.8, 208] | 135 [60.0, 238] | 135 [60.0, 238] | |
| Missing | 6 (3.7%) | 0 (0%) | 6 (1.9%) | |
| PLT, 10^9/L | ||||
| Mean (SD) | 216 (68.2) | 184 (65.1) | 200 (68.4) | < 0.001 |
| Median [Min, Max] | 218 [15.0, 425] | 182 [40.0, 460] | 197 [15.0, 460] | |
| Missing | 6 (3.7%) | 0 (0%) | 6 (1.9%) | |
| Neutrophil percentage, % | ||||
| Mean (SD) | 60.7 (9.85) | 62.4 (10.8) | 61.6 (10.4) | 0.131 |
| Median [Min, Max] | 60.8 [39.4, 95.8] | 63.7 [32.8, 88.1] | 62.1 [32.8, 95.8] | |
| Missing | 6 (3.7%) | 0 (0%) | 6 (1.9%) | |
| Lymphocyte percentage, % | ||||
| Mean (SD) | 29.4 (8.73) | 26.8 (9.88) | 28.1 (9.41) | 0.013 |
| Median [Min, Max] | 29.1 [2.50, 47.8] | 26.1 [5.30, 53.4] | 27.5 [2.50, 53.4] | |
| Missing | 6 (3.7%) | 0 (0%) | 6 (1.9%) | |
| Monocyte percentage, % | ||||
| Mean (SD) | 7.61 (2.24) | 8.10 (2.40) | 7.86 (2.33) | 0.062 |
| Median [Min, Max] | 7.20 [1.60, 15.9] | 7.70 [3.50, 18.2] | 7.50 [1.60, 18.2] | |
| Missing | 6 (3.7%) | 0 (0%) | 6 (1.9%) | |
| Eosinophil percentage, % | ||||
| Mean (SD) | 1.71 (1.55) | 1.80 (1.71) | 1.76 (1.63) | 0.637 |
| Median [Min, Max] | 1.20 [0, 9.00] | 1.30 [0, 8.70] | 1.20 [0, 9.00] | |
| Missing | 6 (3.7%) | 1 (0.6%) | 7 (2.2%) | |
| Basophil percentage, % | ||||
| Mean (SD) | 0.583 (0.319) | 0.659 (0.427) | 0.622 (0.379) | 0.074 |
| Median [Min, Max] | 0.500 [0, 1.70] | 0.600 [0, 2.60] | 0.500 [0, 2.60] | |
| Missing | 6 (3.7%) | 0 (0%) | 6 (1.9%) | |
| Neutrophil absolute value, 10^9/L | ||||
| Mean (SD) | 3.77 (1.79) | 3.89 (1.70) | 3.83 (1.74) | 0.558 |
| Median [Min, Max] | 3.43 [1.20, 16.0] | 3.60 [0.800, 12.9] | 3.50 [0.800, 16.0] | |
| Missing | 6 (3.7%) | 0 (0%) | 6 (1.9%) | |
| Lymphocyte absolute value, 10^9/L | ||||
| Mean (SD) | 1.72 (0.620) | 1.58 (0.657) | 1.65 (0.642) | 0.057 |
| Median [Min, Max] | 1.68 [0.400, 3.90] | 1.50 [0.400, 3.81] | 1.60 [0.400, 3.90] | |
| Missing | 6 (3.7%) | 0 (0%) | 6 (1.9%) | |
| Monocyte absolute value, 10^9/L | ||||
| Mean (SD) | 0.449 (0.165) | 0.491 (0.240) | 0.470 (0.207) | 0.073 |
| Median [Min, Max] | 0.400 [0.100, 1.00] | 0.450 [0.130, 2.00] | 0.420 [0.100, 2.00] | |
| Missing | 6 (3.7%) | 0 (0%) | 6 (1.9%) | |
| Eosinophil absolute value, 10^9/L | ||||
| Mean (SD) | 0.103 (0.111) | 0.120 (0.252) | 0.111 (0.196) | 0.432 |
| Median [Min, Max] | 0.100 [0, 0.740] | 0.100 [0, 3.04] | 0.100 [0, 3.04] | |
| Missing | 6 (3.7%) | 0 (0%) | 6 (1.9%) | |
| Basophil absolute value, 10^9/L | ||||
| Mean (SD) | 0.0216 (0.0325) | 0.0307 (0.0377) | 0.0262 (0.0355) | 0.021 |
| Median [Min, Max] | 0 [0, 0.100] | 0.0100 [0, 0.110] | 0 [0, 0.110] | |
| Missing | 6 (3.7%) | 0 (0%) | 6 (1.9%) | |
| HCT, % | ||||
| Mean (SD) | 40.6 (6.18) | 41.7 (7.76) | 41.2 (7.04) | 0.146 |
| Median [Min, Max] | 40.8 [20.4, 66.9] | 41.4 [17.6, 71.8] | 41.0 [17.6, 71.8] | |
| Missing | 6 (3.7%) | 0 (0%) | 6 (1.9%) | |
| Mean corpuscular volume (MCV), fL | ||||
| Mean (SD) | 89.6 (7.23) | 91.2 (7.17) | 90.5 (7.24) | 0.046 |
| Median [Min, Max] | 90.9 [65.8, 107] | 91.9 [65.8, 109] | 91.6 [65.8, 109] | |
| Missing | 6 (3.7%) | 0 (0%) | 6 (1.9%) | |
| Mean corpuscular hemoglobin (MCH), pg | ||||
| Mean (SD) | 29.6 (2.98) | 29.8 (2.90) | 29.7 (2.93) | 0.525 |
| Median [Min, Max] | 30.2 [20.1, 37.4] | 30.4 [19.5, 35.4] | 30.3 [19.5, 37.4] | |
| Missing | 6 (3.7%) | 0 (0%) | 6 (1.9%) | |
| Mean corpuscular hemoglobin concentration (MCHC), g/L | ||||
| Mean (SD) | 330 (11.5) | 327 (13.4) | 329 (12.6) | 0.011 |
| Median [Min, Max] | 332 [288, 360] | 330 [257, 359] | 330 [257, 360] | |
| Missing | 6 (3.7%) | 0 (0%) | 6 (1.9%) | |
| Red cell distribution width (RDW-CV), % | ||||
| Mean (SD) | 14.1 (2.36) | 14.9 (2.47) | 14.5 (2.45) | 0.005 |
| Median [Min, Max] | 13.3 [11.7, 28.8] | 14.3 [11.7, 28.8] | 13.8 [11.7, 28.8] | |
| Missing | 17 (10.5%) | 5 (3.1%) | 22 (6.8%) | |
| Mean platelet volume, fL | ||||
| Mean (SD) | 9.37 (1.36) | 9.73 (1.33) | 9.55 (1.36) | 0.018 |
| Median [Min, Max] | 9.10 [6.90, 13.1] | 9.55 [7.10, 13.6] | 9.30 [6.90, 13.6] | |
| Missing | 6 (3.7%) | 1 (0.6%) | 7 (2.2%) | |
| PT, s | ||||
| Mean (SD) | 11.9 (1.08) | 13.8 (5.81) | 12.9 (4.32) | < 0.001 |
| Median [Min, Max] | 11.8 [9.80, 16.2] | 12.4 [9.50, 70.1] | 12.1 [9.50, 70.1] | |
| Missing | 10 (6.2%) | 1 (0.6%) | 11 (3.4%) | |
| INR | ||||
| Mean (SD) | 1.09 (0.0984) | 1.26 (0.512) | 1.18 (0.382) | < 0.001 |
| Median [Min, Max] | 1.08 [0.900, 1.49] | 1.14 [0.870, 6.08] | 1.11 [0.870, 6.08] | |
| Missing | 10 (6.2%) | 1 (0.6%) | 11 (3.4%) | |
| Prothrombin time activity, % | ||||
| Mean (SD) | 89.6 (12.0) | 79.0 (19.9) | 84.2 (17.3) | < 0.001 |
| Median [Min, Max] | 89.5 [54.0, 120] | 81.5 [10.0, 127] | 86.0 [10.0, 127] | |
| Missing | 10 (6.2%) | 1 (0.6%) | 11 (3.4%) | |
| APTT, s | ||||
| Mean (SD) | 31.6 (4.09) | 32.9 (5.82) | 32.3 (5.08) | 0.032 |
| Median [Min, Max] | 31.6 [19.5, 50.4] | 31.9 [21.1, 75.4] | 31.8 [19.5, 75.4] | |
| Missing | 10 (6.2%) | 1 (0.6%) | 11 (3.4%) | |
| TT, s | ||||
| Mean (SD) | 14.7 (1.37) | 15.0 (1.67) | 14.8 (1.54) | 0.057 |
| Median [Min, Max] | 14.4 [12.1, 19.9] | 14.8 [11.3, 25.4] | 14.6 [11.3, 25.4] | |
| Missing | 11 (6.8%) | 1 (0.6%) | 12 (3.7%) | |
| Fibrinogen content, mg/dL | ||||
| Mean (SD) | 297 (67.1) | 296 (72.6) | 297 (69.9) | 0.891 |
| Median [Min, Max] | 286 [172, 474] | 289 [163, 536] | 289 [163, 536] | |
| Missing | 14 (8.6%) | 2 (1.2%) | 16 (5.0%) | |
| D-dimer, ng/mL | ||||
| Mean (SD) | 218 (474) | 892 (2990) | 571 (2210) | 0.006 |
| Median [Min, Max] | 97.0 [0, 4140] | 202 [20.0, 23100] | 129 [0, 23100] | |
| Missing | 17 (10.5%) | 2 (1.2%) | 19 (5.9%) | |
| NT-proBNP, pg/mL | ||||
| Mean (SD) | 702 (2770) | 3160 (4510) | 2300 (4150) | < 0.001 |
| Median [Min, Max] | 88.3 [12.5, 19900] | 1370 [17.1, 23100] | 512 [12.5, 23100] | |
| Missing | 84 (51.9%) | 17 (10.6%) | 101 (31.3%) | |
| ALT, U/L | ||||
| Mean (SD) | 24.7 (27.3) | 57.5 (252) | 41.1 (179) | 0.103 |
| Median [Min, Max] | 17.0 [4.00, 273] | 21.5 [1.00, 2350] | 19.0 [1.00, 2350] | |
| Missing | 1 (0.6%) | 1 (0.6%) | 2 (0.6%) | |
| AST, U/L | ||||
| Mean (SD) | 26.2 (40.3) | 63.1 (335) | 44.6 (238) | 0.168 |
| Median [Min, Max] | 20.0 [10.0, 507] | 24.5 [10.0, 3950] | 22.0 [10.0, 3950] | |
| Missing | 1 (0.6%) | 1 (0.6%) | 2 (0.6%) | |
| AST/ALT | ||||
| Mean (SD) | 1.27 (0.621) | 1.29 (1.13) | 1.28 (0.913) | 0.815 |
| Median [Min, Max] | 1.18 [0.350, 4.40] | 1.14 [0.330, 14.0] | 1.15 [0.330, 14.0] | |
| Missing | 1 (0.6%) | 1 (0.6%) | 2 (0.6%) | |
| TBIL, µmol/L | ||||
| Mean (SD) | 14.7 (15.0) | 20.9 (18.1) | 17.8 (16.9) | 0.001 |
| Median [Min, Max] | 12.4 [2.50, 173] | 16.1 [4.10, 173] | 14.5 [2.50, 173] | |
| Missing | 1 (0.6%) | 1 (0.6%) | 2 (0.6%) | |
| DBIL, µmol/L | ||||
| Mean (SD) | 3.25 (3.56) | 6.35 (7.51) | 4.79 (6.06) | < 0.001 |
| Median [Min, Max] | 2.60 [0.400, 41.3] | 4.20 [1.00, 53.0] | 3.10 [0.400, 53.0] | |
| Missing | 1 (0.6%) | 1 (0.6%) | 2 (0.6%) | |
| IDBIL, µmol/L | ||||
| Mean (SD) | 11.5 (11.7) | 14.5 (12.7) | 13.0 (12.3) | 0.025 |
| Median [Min, Max] | 9.80 [1.10, 131] | 11.7 [2.10, 131] | 10.7 [1.10, 131] | |
| Missing | 1 (0.6%) | 1 (0.6%) | 2 (0.6%) | |
| Total Protein, g/L | ||||
| Mean (SD) | 68.2 (5.97) | 66.7 (7.07) | 67.4 (6.57) | 0.034 |
| Median [Min, Max] | 68.4 [45.7, 84.1] | 67.1 [37.4, 82.6] | 67.5 [37.4, 84.1] | |
| Missing | 1 (0.6%) | 1 (0.6%) | 2 (0.6%) | |
| Alb, g/L | ||||
| Mean (SD) | 40.3 (4.39) | 38.7 (4.74) | 39.5 (4.63) | 0.002 |
| Median [Min, Max] | 40.3 [19.3, 51.2] | 38.7 [22.0, 48.8] | 39.5 [19.3, 51.2] | |
| Missing | 1 (0.6%) | 1 (0.6%) | 2 (0.6%) | |
| Glb, g/L | ||||
| Mean (SD) | 27.9 (4.43) | 28.0 (5.02) | 28.0 (4.72) | 0.933 |
| Median [Min, Max] | 27.7 [18.4, 39.6] | 27.6 [15.4, 45.3] | 27.6 [15.4, 45.3] | |
| Missing | 1 (0.6%) | 1 (0.6%) | 2 (0.6%) | |
| Alb/Glb | ||||
| Mean (SD) | 1.48 (0.292) | 1.42 (0.292) | 1.45 (0.293) | 0.087 |
| Median [Min, Max] | 1.44 [0.720, 2.29] | 1.41 [0.700, 2.60] | 1.42 [0.700, 2.60] | |
| Missing | 1 (0.6%) | 1 (0.6%) | 2 (0.6%) | |
| GGT, U/L | ||||
| Mean (SD) | 31.1 (27.4) | 57.4 (63.9) | 44.2 (50.7) | < 0.001 |
| Median [Min, Max] | 21.0 [6.00, 158] | 39.5 [8.00, 439] | 27.0 [6.00, 439] | |
| Missing | 1 (0.6%) | 1 (0.6%) | 2 (0.6%) | |
| ALP, U/L | ||||
| Mean (SD) | 82.2 (48.3) | 84.0 (46.8) | 83.1 (47.5) | 0.736 |
| Median [Min, Max] | 72.0 [26.0, 336] | 74.0 [10.0, 342] | 73.0 [10.0, 342] | |
| Missing | 1 (0.6%) | 1 (0.6%) | 2 (0.6%) | |
| TBA, µmol/L | ||||
| Mean (SD) | 5.82 (12.4) | 10.7 (22.6) | 8.26 (18.4) | 0.017 |
| Median [Min, Max] | 3.40 [0.400, 112] | 4.50 [0.200, 163] | 4.00 [0.200, 163] | |
| Missing | 1 (0.6%) | 1 (0.6%) | 2 (0.6%) | |
| BUN, mmol/L | ||||
| Mean (SD) | 5.22 (1.85) | 7.12 (3.69) | 6.12 (3.02) | < 0.001 |
| Median [Min, Max] | 4.93 [2.35, 15.0] | 6.30 [2.32, 32.3] | 5.32 [2.32, 32.3] | |
| Missing | 1 (0.6%) | 16 (9.9%) | 17 (5.3%) | |
| Creatinine, µmol/L | ||||
| Mean (SD) | 68.9 (17.7) | 85.7 (43.8) | 76.8 (33.8) | < 0.001 |
| Median [Min, Max] | 68.4 [32.6, 123] | 77.5 [34.1, 451] | 72.4 [32.6, 451] | |
| Missing | 1 (0.6%) | 16 (9.9%) | 17 (5.3%) | |
| Uric acid, µmol/L | ||||
| Mean (SD) | 369 (125) | 436 (167) | 401 (150) | < 0.001 |
| Median [Min, Max] | 358 [145, 977] | 406 [145, 1000] | 374 [145, 1000] | |
| Missing | 1 (0.6%) | 16 (9.9%) | 17 (5.3%) | |
| Carbon dioxide, mmol/L | ||||
| Mean (SD) | 25.3 (3.97) | 23.9 (4.00) | 24.6 (4.04) | 0.003 |
| Median [Min, Max] | 25.0 [17.0, 35.5] | 23.7 [15.1, 38.3] | 24.3 [15.1, 38.3] | |
| Missing | 1 (0.6%) | 16 (9.9%) | 17 (5.3%) | |
| Serum cystatin C, mg/L | ||||
| Mean (SD) | 0.849 (0.225) | 1.19 (0.563) | 1.01 (0.453) | < 0.001 |
| Median [Min, Max] | 0.820 [0.420, 1.71] | 1.06 [0.530, 5.23] | 0.920 [0.420, 5.23] | |
| Missing | 9 (5.6%) | 22 (13.7%) | 31 (9.6%) | |
| Potassium, mmol/L | ||||
| Mean (SD) | 4.00 (0.407) | 4.01 (0.475) | 4.00 (0.440) | 0.833 |
| Median [Min, Max] | 3.94 [3.02, 6.39] | 4.01 [2.68, 5.37] | 3.98 [2.68, 6.39] | |
| Missing | 0 (0%) | 17 (10.6%) | 17 (5.3%) | |
| Sodium, mmol/L | ||||
| Mean (SD) | 140 (2.33) | 140 (3.04) | 140 (2.68) | 0.904 |
| Median [Min, Max] | 140 [129, 147] | 140 [129, 147] | 140 [129, 147] | |
| Missing | 0 (0%) | 16 (9.9%) | 16 (5.0%) | |
| Chlorine, mmol/L | ||||
| Mean (SD) | 105 (3.05) | 105 (4.25) | 105 (3.66) | 0.377 |
| Median [Min, Max] | 106 [84.4, 115] | 105 [84.4, 115] | 105 [84.4, 115] | |
| Missing | 0 (0%) | 16 (9.9%) | 16 (5.0%) | |
| Calcium, mmol/L | ||||
| Mean (SD) | 2.30 (0.128) | 2.27 (0.141) | 2.29 (0.135) | 0.012 |
| Median [Min, Max] | 2.31 [1.77, 2.61] | 2.26 [1.79, 2.55] | 2.29 [1.77, 2.61] | |
| Missing | 0 (0%) | 16 (9.9%) | 16 (5.0%) | |
| Magnesium, mmol/L | ||||
| Mean (SD) | 0.878 (0.113) | 0.857 (0.107) | 0.868 (0.111) | 0.11 |
| Median [Min, Max] | 0.870 [0.570, 1.40] | 0.860 [0.520, 1.08] | 0.860 [0.520, 1.40] | |
| Missing | 6 (3.7%) | 21 (13.0%) | 27 (8.4%) | |
| Phosphorus, mmol/L | ||||
| Mean (SD) | 1.21 (0.220) | 1.26 (0.264) | 1.23 (0.242) | 0.148 |
| Median [Min, Max] | 1.20 [0.580, 2.09] | 1.21 [0.620, 2.11] | 1.21 [0.580, 2.11] | |
| Missing | 0 (0%) | 16 (9.9%) | 16 (5.0%) | |
| Total cholesterol, mmol/L | ||||
| Mean (SD) | 4.24 (0.928) | 4.06 (1.11) | 4.15 (1.02) | 0.152 |
| Median [Min, Max] | 4.22 [2.26, 7.11] | 3.95 [1.02, 7.09] | 4.11 [1.02, 7.11] | |
| Missing | 17 (10.5%) | 34 (21.1%) | 51 (15.8%) | |
| Triglyceride, mmol/L | ||||
| Mean (SD) | 1.43 (0.945) | 1.52 (0.954) | 1.47 (0.948) | 0.44 |
| Median [Min, Max] | 1.13 [0.270, 5.20] | 1.24 [0.520, 6.79] | 1.18 [0.270, 6.79] | |
| Missing | 17 (10.5%) | 34 (21.1%) | 51 (15.8%) | |
| High density lipoprotein, mmol/L | ||||
| Mean (SD) | 1.13 (0.269) | 1.05 (0.317) | 1.09 (0.294) | 0.04 |
| Median [Min, Max] | 1.07 [0.530, 1.96] | 1.04 [0.250, 2.02] | 1.06 [0.250, 2.02] | |
| Missing | 18 (11.1%) | 34 (21.1%) | 52 (16.1%) | |
| Low density lipoprotein, mmol/L | ||||
| Mean (SD) | 2.58 (0.818) | 2.47 (0.924) | 2.53 (0.869) | 0.297 |
| Median [Min, Max] | 2.49 [0.510, 5.20] | 2.40 [0.520, 5.39] | 2.45 [0.510, 5.39] | |
| Missing | 18 (11.1%) | 34 (21.1%) | 52 (16.1%) | |
| Small low density lipoprotein, mmol/L | ||||
| Mean (SD) | 0.888 (0.418) | 0.897 (0.493) | 0.892 (0.454) | 0.871 |
| Median [Min, Max] | 0.825 [0.140, 2.00] | 0.810 [0.160, 2.63] | 0.820 [0.140, 2.63] | |
| Missing | 20 (12.3%) | 34 (21.1%) | 54 (16.7%) | |
| Lipoprotein a, mg/dL | ||||
| Mean (SD) | 131 (153) | 167 (188) | 148 (171) | 0.092 |
| Median [Min, Max] | 80.1 [2.70, 989] | 103 [0.800, 1300] | 90.9 [0.800, 1300] | |
| Missing | 19 (11.7%) | 34 (21.1%) | 53 (16.4%) | |
| Free fatty acid, mmol/L | ||||
| Mean (SD) | 421 (218) | 474 (336) | 446 (281) | 0.13 |
| Median [Min, Max] | 379 [10.6, 1010] | 406 [10.6, 1840] | 389 [10.6, 1840] | |
| Missing | 20 (12.3%) | 34 (21.1%) | 54 (16.7%) | |
| Phospholipid, mmol/L | ||||
| Mean (SD) | 2.24 (0.382) | 2.21 (0.479) | 2.23 (0.430) | 0.692 |
| Median [Min, Max] | 2.25 [1.51, 3.37] | 2.19 [0.940, 3.65] | 2.21 [0.940, 3.65] | |
| Missing | 20 (12.3%) | 34 (21.1%) | 54 (16.7%) | |
Development and validation of a machine learning-based MRI index for PH prediction
Key cardiac magnetic resonance imaging (MRI) features were selected using recursive feature elimination (RFE) to construct the MRI index (Fig. 2A). A comparison of multiple machine learning models showed that the XGBoost model achieved excellent predictive accuracy (Fig. 2B). The diagnostic performance of the MRI index in the training set is shown as a confusion matrix and a scatter plot of correctly and incorrectly classified samples (Fig. 2C and D); the same evaluation in the internal validation cohort showed consistently excellent classification ability (Fig. 2E and F). The ROC curve in the training set indicated excellent discriminative performance, with an area under the curve (AUC) of 0.995 (Fig. 2G). Precision and the sensitivity-specificity trade-off are further visualized with the precision-recall curve and the specificity-sensitivity curve, respectively (Fig. 2H and I), and the relationship between classification accuracy and decision threshold is also depicted (Fig. 2J). In the internal validation set, the ROC curve yielded an AUC of 0.941 (Fig. 2K), with the precision-recall, specificity-sensitivity, and accuracy-threshold curves providing complementary information on model behavior and threshold optimization (Fig. 2L–N).
Fig. 2.
Development and validation of the MRI index for PH prediction. (A) Key MRI features selected by recursive feature elimination (RFE) to construct the MRI index. (B) Comparative performance of machine learning models for PH prediction. (C) Confusion matrix of the MRI index for PH prediction in the training set. (D) Scatter plot of correctly and incorrectly classified samples by the MRI index in the training set. (E) Confusion matrix of the MRI index for PH prediction in the internal validation set. (F) Scatter plot of correctly and incorrectly classified samples by the MRI index in the internal validation set. (G) Receiver operating characteristic (ROC) curve of the MRI index in the training set. (H) Precision-recall curve of the MRI index in the training set. (I) Specificity-sensitivity trade-off curve of the MRI index in the training set. (J) Accuracy-threshold relationship of the MRI index in the training set. (K) ROC curve of the MRI index in the internal validation set. (L) Precision-recall curve of the MRI index in the internal validation set. (M) Specificity-sensitivity trade-off curve of the MRI index in the internal validation set. (N) Accuracy-threshold relationship of the MRI index in the internal validation set
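The metrics summarized in Fig. 2 (AUC, confusion matrix, accuracy at a threshold) can be illustrated with a minimal sketch. It assumes the MRI index is available as a score between 0 and 1 per participant; the function names and the toy data are hypothetical and are not taken from the study.

```python
from typing import Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random PH case outscores a random control.

    A tie between a case and a control counts as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def confusion_at(
    labels: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN) when score >= threshold is classified as PH."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted_ph = s >= threshold
        if predicted_ph and y == 1:
            tp += 1
        elif predicted_ph and y == 0:
            fp += 1
        elif y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


# Toy data invented for illustration only (1 = PH, 0 = control).
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]

auc = roc_auc(toy_labels, toy_scores)
tp, fp, tn, fn = confusion_at(toy_labels, toy_scores, threshold=0.5)
accuracy = (tp + tn) / len(toy_labels)
print(f"AUC={auc:.4f} TP={tp} FP={fp} TN={tn} FN={fn} accuracy={accuracy:.2f}")
```

Repeating `confusion_at` over a grid of thresholds gives the accuracy-threshold and specificity-sensitivity curves; the pairwise AUC loop is quadratic in sample size, which is acceptable for cohorts of a few hundred participants.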
Interpretable machine learning: SHAP analysis of the MRI index
The relative contribution of each cardiac MRI parameter is shown as a feature importance ranking derived from mean absolute SHAP values. The most influential features included pulmonary artery diameter (PA), right ventricular ejection fraction (RVEF), and right cardiac index (RCI) (Fig. 3A). In the beeswarm plot, red and blue indicate high and low feature values, respectively, and positive or negative SHAP values correspond to higher or lower predicted PH risk. The plot shows how extreme PA/RCI values (red) drive PH risk predictions, while preserved RVEF (blue) exerts a protective effect (Fig. 3B). The heatmap displays feature values (rows) across samples (columns), clustered by similarity in SHAP-driven patterns; darker hues highlight extreme values of key features (e.g., PA, REDV) associated with PH subgroups (Fig. 3C). The waterfall plot explains the model’s output for a representative high-risk PH case. The key drivers elevating the prediction score (f(x) = 2.01 vs. baseline E[f(x)] = -0.078) were PA dilation (+1.589), increased REDV (+1.886), and elevated left cardiac output (LCO, +2.244), while RVEF showed a small protective effect (-0.037) (Fig. 3D).
Fig. 3.
SHAP analysis of the PH prediction model. (A) Feature importance shows PA, RVEF, and REDV as top predictors. (B) Beeswarm plot reveals how feature values influence predictions (red = high, blue = low). (C) Heatmap clusters samples by similar SHAP patterns, highlighting key features like PA and REDV. (D) Waterfall plot explains a high-risk case: PA dilation (+1.589) and increased REDV (+1.889) raised PH risk, while higher RVEF (-0.037) was protective
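A waterfall plot relies on the additivity of SHAP values: the model output for one case equals the baseline plus the sum of all per-feature contributions. The sketch below applies that identity to the numbers quoted in the text for the representative case. It assumes that the four named contributions are on the same scale as f(x), and, for the final step only, that f(x) is a log-odds margin (the usual default for tree-based classifiers, but not stated in the source).

```python
import math

baseline = -0.078  # E[f(x)] as reported for the waterfall plot
f_x = 2.01         # model output for the representative high-risk case

# Contributions named in the text; the remaining features are not listed there.
named = {"PA": 1.589, "REDV": 1.886, "LCO": 2.244, "RVEF": -0.037}
named_sum = sum(named.values())

# Additivity: f(x) = baseline + sum of SHAP values over ALL features,
# so the features not named in the text must account for the remainder.
unlisted_sum = f_x - baseline - named_sum


def sigmoid(z: float) -> float:
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


# Only meaningful under the log-odds assumption stated above.
probability = sigmoid(f_x)

print(f"named contributions sum to {named_sum:+.3f}")
print(f"unlisted features sum to   {unlisted_sum:+.3f}")
print(f"implied PH probability     {probability:.3f}")
```

Under these assumptions the named contributions add up to more than the net shift from baseline, so the features not listed in the text would jointly contribute a negative amount for this case.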
Multimodal integration: combined MRI index and laboratory biomarker prediction model
Feature importance analysis identified key laboratory predictors (BUN, γGGT, TBIL, and D-dimer) through the random forest mean decrease in Gini index (Fig. 4A), and these were subsequently integrated with the MRI index. DCA of the biomarker-only model demonstrated its clinical net benefit across a range of probability thresholds (Fig. 4B). The black line (“None”) represents the net benefit of treating no patients, while the gray line (“All”) indicates the net benefit of treating all patients. Standardized net benefit analysis of the combined model (MRI index plus selected laboratory biomarkers) showed superior clinical utility compared to the base model, particularly at intermediate threshold probabilities (20–60%) (Fig. 4C). The final PH prediction model is presented as a nomogram incorporating both the MRI index and the selected laboratory markers (BUN, γGGT, TBIL, D-dimer). Each variable is assigned points proportional to its predictive weight, with total points corresponding to individualized PH risk on the bottom scale (Fig. 4D).
Fig. 4.
Development of the integrated PH prediction model combining laboratory biomarkers with MRI index and its clinical presentation. (A) Random forest selection of key predictive biomarkers from comprehensive laboratory tests. (B) Decision curve analysis demonstrating the clinical net benefit of biomarker-based PH prediction. (C) Enhanced predictive performance shown by decision curve analysis of the combined MRI-biomarker model. (D) Nomogram visualization of the final integrated prediction model incorporating both MRI parameters and selected biomarkers for point-of-care risk assessment
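The quantities plotted in a decision curve can be written down in a few lines. This is a minimal sketch of the standard net benefit formula, not the authors' analysis code; the function names and toy data are hypothetical.

```python
from typing import Sequence


def net_benefit(labels: Sequence[int], probs: Sequence[float], pt: float) -> float:
    """Net benefit of treating everyone whose predicted risk is >= pt.

    NB = TP/n - FP/n * pt / (1 - pt), where pt is the threshold probability.
    """
    if not 0.0 < pt < 1.0:
        raise ValueError("threshold probability must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probs) if p >= pt and y == 1)
    fp = sum(1 for y, p in zip(labels, probs) if p >= pt and y == 0)
    return tp / n - (fp / n) * (pt / (1.0 - pt))


def net_benefit_treat_all(labels: Sequence[int], pt: float) -> float:
    """Reference curve "All": every participant is treated as PH."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1.0 - prevalence) * (pt / (1.0 - pt))


def standardized_net_benefit(
    labels: Sequence[int], probs: Sequence[float], pt: float
) -> float:
    """Net benefit divided by prevalence, so a perfect model scores 1."""
    prevalence = sum(labels) / len(labels)
    return net_benefit(labels, probs, pt) / prevalence


# Toy data invented for illustration only (1 = PH, 0 = control).
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_probs = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]

for pt in (0.2, 0.4, 0.6):
    model_nb = net_benefit(toy_labels, toy_probs, pt)
    all_nb = net_benefit_treat_all(toy_labels, pt)
    # The reference curve "None" is 0 at every threshold.
    print(f"pt={pt:.1f} model={model_nb:.3f} all={all_nb:.3f} none=0.000")
```

A model is clinically useful at a given threshold when its curve lies above both reference curves, which is the comparison described for the 20–60% range.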
Clinical translation: validation and web calculator implementation
An independent external validation dataset, consisting of 48 PH patients and 16 controls, was collected separately from the training and internal validation sets (Table 2). To further validate the nomogram-based PH prediction model, we constructed calibration curves and ROC curves in the training set, internal validation set, and external validation set. Calibration curves demonstrated good agreement between predicted probabilities and observed outcomes in the training set (Fig. 5A), internal validation set (70:30 split) (Fig. 5B), and external validation set (Fig. 5C). For discrimination, the AUC was 0.999 in the training set (Fig. 5D), 0.944 in the internal validation set (Fig. 5E), and 0.897 in the external validation set (Fig. 5F). Given this predictive performance, we developed a web-based nomogram calculator (https://jianghx.shinyapps.io/PH_prediction_MRIIndex/) (Fig. 6A and B) to facilitate clinical implementation and promote widespread application of the model.
Table 2.
Baseline characteristics and intergroup comparisons in the external validation cohort
| | Normal (n = 16) | PH (n = 48) | Overall (n = 64) | P value |
|---|---|---|---|---|
| Sex | ||||
| Male | 10 (62.5%) | 27 (56.3%) | 37 (57.8%) | NA |
| Female | 6 (37.5%) | 21 (43.8%) | 27 (42.2%) | |
| Missing | 0 (0%) | 0 (0%) | 0 (0%) | |
| Age, year | ||||
| Mean (SD) | 54.5 (13.6) | 49.7 (16.7) | 50.9 (16.0) | 0.26 |
| Median [Min, Max] | 56.5 [16.0, 78.0] | 54.0 [3.00, 79.0] | 55.5 [3.00, 79.0] | |
| Left atrial anterior-posterior diameter (LAAPD), mm | ||||
| Mean (SD) | 42.9 (10.7) | 44.3 (11.2) | 43.9 (11.0) | 0.653 |
| Median [Min, Max] | 39.0 [26.0, 61.0] | 44.0 [26.0, 84.0] | 43.5 [26.0, 84.0] | |
| Left ventricular transverse diameter (LVTD), mm | ||||
| Mean (SD) | 51.7 (6.89) | 59.4 (14.4) | 57.5 (13.3) | 0.006 |
| Median [Min, Max] | 52.0 [40.0, 71.0] | 60.0 [32.0, 91.0] | 54.0 [32.0, 91.0] | |
| Left basal interventricular septal thickness (LIVST), mm | ||||
| Mean (SD) | 9.13 (2.13) | 11.2 (4.13) | 10.6 (3.81) | 0.014 |
| Median [Min, Max] | 9.00 [5.50, 13.0] | 9.50 [4.50, 24.0] | 9.50 [4.50, 24.0] | |
| Missing | 0 (0%) | 2 (4.2%) | 2 (3.1%) | |
| Left ventricular lateral wall thickness (LVLWT), mm | ||||
| Mean (SD) | 6.88 (1.28) | 6.76 (1.67) | 6.79 (1.57) | 0.771 |
| Median [Min, Max] | 6.75 [5.50, 9.50] | 7.00 [3.00, 10.5] | 7.00 [3.00, 10.5] | |
| Missing | 0 (0%) | 3 (6.3%) | 3 (4.7%) | |
| Pulmonary artery diameter (PA), mm | ||||
| Mean (SD) | 25.9 (4.55) | 27.6 (6.09) | 27.2 (5.76) | 0.255 |
| Median [Min, Max] | 26.0 [17.0, 36.0] | 28.0 [17.0, 45.0] | 26.0 [17.0, 45.0] | |
| Ascending aorta diameter (AAD), mm | ||||
| Mean (SD) | 33.8 (5.50) | 31.2 (6.34) | 31.8 (6.20) | 0.131 |
| Median [Min, Max] | 34.5 [21.0, 43.0] | 31.0 [12.0, 48.0] | 32.0 [12.0, 48.0] | |
| Descending aorta diameter (DAD), mm | ||||
| Mean (SD) | 22.8 (3.09) | 21.4 (3.91) | 21.7 (3.75) | 0.154 |
| Median [Min, Max] | 23.0 [17.0, 29.0] | 21.0 [9.00, 31.0] | 22.0 [9.00, 31.0] | |
| Left ventricular ejection fraction (LVEF), % | ||||
| Mean (SD) | 48.5 (16.8) | 40.2 (20.8) | 42.3 (20.1) | 0.116 |
| Median [Min, Max] | 57.3 [17.1, 70.3] | 35.0 [14.0, 75.5] | 43.8 [14.0, 75.5] | |
| Missing | 0 (0%) | 1 (2.1%) | 1 (1.6%) | |
| Left ventricular end-diastolic volume (LEDV), mL | ||||
| Mean (SD) | 149 (49.9) | 229 (135) | 209 (124) | 0.001 |
| Median [Min, Max] | 142 [74.1, 262] | 192 [32.2, 788] | 168 [32.2, 788] | |
| Missing | 0 (0%) | 1 (2.1%) | 1 (1.6%) | |
| Left ventricular end-systolic volume (LESV), mL | ||||
| Mean (SD) | 78.9 (43.8) | 157 (132) | 137 (121) | < 0.001 |
| Median [Min, Max] | 64.7 [30.1, 196] | 101 [15.2, 669] | 89.6 [15.2, 669] | |
| Missing | 0 (0%) | 1 (2.1%) | 1 (1.6%) | |
| Left ventricular stroke volume (LSV), mL | ||||
| Mean (SD) | 70.5 (29.8) | 88.4 (88.6) | 83.9 (78.1) | 0.235 |
| Median [Min, Max] | 68.8 [18.8, 144] | 78.0 [17.0, 657] | 76.7 [17.0, 657] | |
| Missing | 0 (0%) | 1 (2.1%) | 1 (1.6%) | |
| Left ventricular mass (LVM), g | ||||
| Mean (SD) | 90.9 (23.2) | 133 (56.5) | 123 (53.3) | < 0.001 |
| Median [Min, Max] | 89.9 [56.7, 132] | 129 [20.9, 286] | 110 [20.9, 286] | |
| Missing | 0 (0%) | 1 (2.1%) | 1 (1.6%) | |
| Left cardiac output (LCO), L/min | ||||
| Mean (SD) | 5.08 (1.53) | 5.50 (2.15) | 5.39 (2.01) | 0.396 |
| Median [Min, Max] | 4.95 [2.33, 7.89] | 5.21 [1.43, 13.9] | 5.20 [1.43, 13.9] | |
| Missing | 0 (0%) | 1 (2.1%) | 1 (1.6%) | |
| Left cardiac index (LCI), L/min/m2 | ||||
| Mean (SD) | 2.81 (0.778) | 3.24 (1.05) | 3.13 (0.999) | 0.089 |
| Median [Min, Max] | 2.54 [1.58, 4.37] | 3.00 [1.83, 6.76] | 2.90 [1.58, 6.76] | |
| Missing | 0 (0%) | 1 (2.1%) | 1 (1.6%) | |
| Left ventricular end-diastolic volume index (LEDVI), ml/m2 | ||||
| Mean (SD) | 83.1 (26.8) | 133 (68.7) | 120 (64.4) | < 0.001 |
| Median [Min, Max] | 77.1 [46.7, 151] | 121 [2.20, 393] | 92.9 [2.20, 393] | |
| Missing | 0 (0%) | 1 (2.1%) | 1 (1.6%) | |
| Left ventricular end-systolic volume index (LESVI), ml/m2 | ||||
| Mean (SD) | 44.0 (24.3) | 91.7 (67.1) | 79.6 (62.6) | < 0.001 |
| Median [Min, Max] | 33.2 [18.9, 113] | 89.7 [17.7, 334] | 51.4 [17.7, 334] | |
| Missing | 0 (0%) | 1 (2.1%) | 1 (1.6%) | |
| Left ventricular stroke volume index (LSVI), ml/m2 | ||||
| Mean (SD) | 39.1 (16.1) | 56.5 (82.1) | 52.1 (71.5) | 0.173 |
| Median [Min, Max] | 39.9 [12.8, 79.8] | 44.9 [23.2, 599] | 43.6 [12.8, 599] | |
| Missing | 0 (0%) | 1 (2.1%) | 1 (1.6%) | |
| Left mass index (LMI), g/m2 | ||||
| Mean (SD) | 52.4 (12.3) | 78.1 (27.5) | 71.6 (26.9) | < 0.001 |
| Median [Min, Max] | 53.1 [35.7, 77.7] | 78.4 [27.6, 162] | 70.1 [27.6, 162] | |
| Missing | 0 (0%) | 1 (2.1%) | 1 (1.6%) | |
| Right ventricular ejection fraction (RVEF), % | ||||
| Mean (SD) | 39.7 (18.2) | 39.1 (16.1) | 39.3 (16.5) | 0.921 |
| Median [Min, Max] | 37.3 [10.6, 66.8] | 35.5 [11.7, 68.2] | 35.6 [10.6, 68.2] | |
| Missing | 3 (18.8%) | 10 (20.8%) | 13 (20.3%) | |
| Right ventricular end-diastolic volume (REDV), mL | ||||
| Mean (SD) | 136 (31.9) | 171 (97.3) | 162 (86.5) | 0.064 |
| Median [Min, Max] | 149 [84.7, 191] | 165 [31.2, 516] | 151 [31.2, 516] | |
| Missing | 3 (18.8%) | 10 (20.8%) | 13 (20.3%) | |
| Right ventricular end-systolic volume (RESV), mL | ||||
| Mean (SD) | 80.0 (23.8) | 113 (87.4) | 105 (77.5) | 0.038 |
| Median [Min, Max] | 75.9 [35.5, 129] | 113 [14.0, 456] | 85.7 [14.0, 456] | |
| Missing | 3 (18.8%) | 10 (20.8%) | 13 (20.3%) | |
| Right ventricular stroke volume (RSV), mL | ||||
| Mean (SD) | 56.3 (30.8) | 57.1 (22.7) | 56.9 (24.7) | 0.929 |
| Median [Min, Max] | 48.7 [13.1, 93.7] | 55.4 [17.2, 104] | 51.3 [13.1, 104] | |
| Missing | 3 (18.8%) | 10 (20.8%) | 13 (20.3%) | |
| Right cardiac output (RCO), L/min | ||||
| Mean (SD) | 3.92 (1.83) | 4.19 (1.82) | 4.12 (1.81) | 0.65 |
| Median [Min, Max] | 3.26 [1.47, 6.50] | 3.95 [1.26, 10.5] | 3.92 [1.26, 10.5] | |
| Missing | 3 (18.8%) | 10 (20.8%) | 13 (20.3%) | |
| Right cardiac index (RCI), L/min/m2 | ||||
| Mean (SD) | 2.27 (1.11) | 2.45 (0.852) | 2.41 (0.917) | 0.595 |
| Median [Min, Max] | 1.88 [0.750, 4.54] | 2.38 [0.880, 5.10] | 2.35 [0.750, 5.10] | |
| Missing | 3 (18.8%) | 10 (20.8%) | 13 (20.3%) | |
| Right ventricular end-diastolic volume index (REDVI), ml/m2 | ||||
| Mean (SD) | 77.5 (15.5) | 97.6 (46.7) | 92.5 (41.9) | 0.025 |
| Median [Min, Max] | 78.7 [50.0, 106] | 93.5 [36.7, 258] | 83.1 [36.7, 258] | |
| Missing | 3 (18.8%) | 10 (20.8%) | 13 (20.3%) | |
| Right ventricular end-systolic volume index (RESVI), ml/m2 | ||||
| Mean (SD) | 45.1 (11.0) | 63.8 (43.5) | 59.0 (38.7) | 0.019 |
| Median [Min, Max] | 41.9 [25.7, 65.2] | 62.6 [15.6, 228] | 53.7 [15.6, 228] | |
| Missing | 3 (18.8%) | 10 (20.8%) | 13 (20.3%) | |
| Right ventricular stroke volume index (RSVI), ml/m2 | ||||
| Mean (SD) | 32.5 (17.9) | 33.8 (12.5) | 33.4 (13.9) | 0.808 |
| Median [Min, Max] | 28.1 [7.74, 53.6] | 32.9 [13.3, 69.5] | 32.3 [7.74, 69.5] | |
| Missing | 3 (18.8%) | 10 (20.8%) | 13 (20.3%) | |
| WBC, 10^9/L | ||||
| Mean (SD) | 7.16 (3.69) | 7.69 (2.96) | 7.55 (3.14) | 0.606 |
| Median [Min, Max] | 6.30 [2.17, 14.7] | 7.35 [2.90, 15.6] | 7.00 [2.17, 15.6] | |
| Missing | 0 (0%) | 2 (4.2%) | 2 (3.1%) | |
| RBC, 10^12/L | ||||
| Mean (SD) | 4.13 (0.890) | 4.14 (1.07) | 4.14 (1.02) | 0.968 |
| Median [Min, Max] | 3.96 [2.81, 5.70] | 3.98 [2.44, 8.83] | 3.98 [2.44, 8.83] | |
| Missing | 0 (0%) | 2 (4.2%) | 2 (3.1%) | |
| Hb, g/L | ||||
| Mean (SD) | 126 (29.7) | 123 (25.9) | 124 (26.7) | 0.747 |
| Median [Min, Max] | 121 [78.1, 184] | 121 [70.0, 187] | 121 [70.0, 187] | |
| Missing | 0 (0%) | 2 (4.2%) | 2 (3.1%) | |
| Plt, 10^9/L | ||||
| Mean (SD) | 176 (63.9) | 187 (77.0) | 184 (73.5) | 0.579 |
| Median [Min, Max] | 165 [90.0, 335] | 191 [28.0, 447] | 178 [28.0, 447] | |
| Missing | 0 (0%) | 2 (4.2%) | 2 (3.1%) | |
| Neutrophilic granulocyte percentage, % | ||||
| Mean (SD) | 65.8 (16.4) | 68.9 (11.5) | 68.1 (12.9) | 0.487 |
| Median [Min, Max] | 66.8 [31.8, 89.9] | 67.8 [45.4, 90.1] | 67.8 [31.8, 90.1] | |
| Missing | 0 (0%) | 2 (4.2%) | 2 (3.1%) | |
| Lymphocyte percentage, % | ||||
| Mean (SD) | 23.4 (14.0) | 21.0 (9.68) | 21.6 (10.9) | 0.525 |
| Median [Min, Max] | 22.8 [4.10, 53.3] | 20.0 [2.20, 41.9] | 20.2 [2.20, 53.3] | |
| Missing | 0 (0%) | 2 (4.2%) | 2 (3.1%) | |
| Monocyte percentage, % | ||||
| Mean (SD) | 8.61 (3.12) | 7.85 (2.55) | 8.04 (2.70) | 0.389 |
| Median [Min, Max] | 8.55 [2.20, 13.5] | 7.70 [1.50, 14.1] | 7.85 [1.50, 14.1] | |
| Missing | 0 (0%) | 2 (4.2%) | 2 (3.1%) | |
| Eosinophil percentage, % | ||||
| Mean (SD) | 1.74 (1.90) | 1.73 (1.96) | 1.74 (1.92) | 0.996 |
| Median [Min, Max] | 1.70 [0, 8.10] | 1.15 [0, 9.50] | 1.30 [0, 9.50] | |
| Missing | 0 (0%) | 2 (4.2%) | 2 (3.1%) | |
| Basophil percentage, % | ||||
| Mean (SD) | 0.450 (0.443) | 0.487 (0.343) | 0.477 (0.368) | 0.764 |
| Median [Min, Max] | 0.300 [0.100, 1.90] | 0.400 [0.100, 2.00] | 0.400 [0.100, 2.00] | |
| Missing | 0 (0%) | 2 (4.2%) | 2 (3.1%) | |
| Neutrophil absolute value, 10^9/L | ||||
| Mean (SD) | 5.13 (3.51) | 5.50 (2.83) | 5.40 (3.00) | 0.706 |
| Median [Min, Max] | 4.27 [0.690, 12.7] | 4.65 [1.70, 12.6] | 4.37 [0.690, 12.7] | |
| Missing | 0 (0%) | 2 (4.2%) | 2 (3.1%) | |
| Lymphocyte absolute value, 10^9/L | ||||
| Mean (SD) | 1.32 (0.526) | 1.47 (0.696) | 1.43 (0.655) | 0.385 |
| Median [Min, Max] | 1.34 [0.380, 2.25] | 1.40 [0.300, 3.20] | 1.40 [0.300, 3.20] | |
| Missing | 0 (0%) | 2 (4.2%) | 2 (3.1%) | |
| Monocyte absolute value, 10^9/L | ||||
| Mean (SD) | 0.578 (0.330) | 0.579 (0.253) | 0.579 (0.272) | 0.991 |
| Median [Min, Max] | 0.470 [0.200, 1.30] | 0.555 [0.100, 1.32] | 0.520 [0.100, 1.32] | |
| Missing | 0 (0%) | 2 (4.2%) | 2 (3.1%) | |
| Eosinophil absolute value, 10^9/L | ||||
| Mean (SD) | 0.0988 (0.0973) | 0.108 (0.129) | 0.106 (0.121) | 0.765 |
| Median [Min, Max] | 0.0900 [0, 0.330] | 0.100 [0, 0.600] | 0.100 [0, 0.600] | |
| Missing | 0 (0%) | 2 (4.2%) | 2 (3.1%) | |
| Basophil absolute value, 10^9/L | ||||
| Mean (SD) | 0.0238 (0.0285) | 0.0187 (0.0310) | 0.0200 (0.0302) | 0.555 |
| Median [Min, Max] | 0.0200 [0, 0.100] | 0 [0, 0.100] | 0.00500 [0, 0.100] | |
| Missing | 0 (0%) | 2 (4.2%) | 2 (3.1%) | |
| HCT, % | ||||
| Mean (SD) | 37.6 (8.78) | 37.2 (7.71) | 37.3 (7.92) | 0.855 |
| Median [Min, Max] | 36.7 [23.6, 55.1] | 36.5 [22.6, 57.2] | 36.5 [22.6, 57.2] | |
| Missing | 0 (0%) | 2 (4.2%) | 2 (3.1%) | |
| Mean RBC volume, fL | ||||
| Mean (SD) | 90.8 (3.96) | 90.6 (7.18) | 90.7 (6.47) | 0.903 |
| Median [Min, Max] | 90.8 [83.0, 97.2] | 92.0 [64.7, 103] | 91.6 [64.7, 103] | |
| Missing | 0 (0%) | 2 (4.2%) | 2 (3.1%) | |
| Average hemoglobin content, pg | ||||
| Mean (SD) | 30.5 (1.63) | 30.2 (2.89) | 30.2 (2.62) | 0.592 |
| Median [Min, Max] | 30.9 [27.5, 33.3] | 30.8 [20.0, 34.4] | 30.8 [20.0, 34.4] | |
| Missing | 0 (0%) | 2 (4.2%) | 2 (3.1%) | |
| Average hemoglobin concentration, g/L | ||||
| Mean (SD) | 335 (7.21) | 332 (11.6) | 333 (10.7) | 0.232 |
| Median [Min, Max] | 333 [324, 348] | 334 [297, 358] | 333 [297, 358] | |
| Missing | 0 (0%) | 2 (4.2%) | 2 (3.1%) | |
| Erythrocyte distribution width CV, % | ||||
| Mean (SD) | 14.8 (2.52) | 39.2 (77.2) | 32.9 (67.1) | 0.038 |
| Median [Min, Max] | 14.6 [11.8, 23.1] | 8.45 [0, 341] | 13.1 [0, 341] | |
| Missing | 0 (0%) | 2 (4.2%) | 2 (3.1%) | |
| Mean platelet volume, fL | ||||
| Mean (SD) | 9.24 (1.42) | 9.14 (1.40) | 9.17 (1.40) | 0.808 |
| Median [Min, Max] | 9.00 [7.10, 11.6] | 8.90 [7.10, 14.1] | 9.00 [7.10, 14.1] | |
| Missing | 0 (0%) | 3 (6.3%) | 3 (4.7%) | |
| PT, s | ||||
| Mean (SD) | 13.6 (3.27) | 12.6 (2.40) | 12.9 (2.68) | 0.263 |
| Median [Min, Max] | 12.7 [10.8, 20.9] | 11.8 [10.4, 21.5] | 12.0 [10.4, 21.5] | |
| Missing | 0 (0%) | 6 (12.5%) | 6 (9.4%) | |
| INR | ||||
| Mean (SD) | 1.27 (0.324) | 1.16 (0.228) | 1.19 (0.259) | 0.255 |
| Median [Min, Max] | 1.17 [0.990, 2.02] | 1.09 [0.950, 1.99] | 1.10 [0.950, 2.02] | |
| Missing | 0 (0%) | 6 (12.5%) | 6 (9.4%) | |
| Prothrombin time activity, % | ||||
| Mean (SD) | 76.4 (20.3) | 84.1 (16.7) | 82.0 (17.9) | 0.192 |
| Median [Min, Max] | 79.5 [40.0, 102] | 88.0 [39.0, 107] | 86.0 [39.0, 107] | |
| Missing | 0 (0%) | 6 (12.5%) | 6 (9.4%) | |
| APTT, s | ||||
| Mean (SD) | 30.4 (3.56) | 32.6 (5.08) | 32.0 (4.78) | 0.079 |
| Median [Min, Max] | 31.2 [24.2, 36.5] | 31.9 [23.8, 47.4] | 31.9 [23.8, 47.4] | |
| Missing | 0 (0%) | 6 (12.5%) | 6 (9.4%) | |
| TT, s | ||||
| Mean (SD) | 15.4 (2.10) | 14.9 (3.19) | 15.0 (2.92) | 0.481 |
| Median [Min, Max] | 15.1 [12.9, 21.5] | 14.3 [11.1, 30.0] | 14.5 [11.1, 30.0] | |
| Missing | 0 (0%) | 6 (12.5%) | 6 (9.4%) | |
| Fibrinogen content, mg/dL | ||||
| Mean (SD) | 314 (126) | 333 (116) | 327 (118) | 0.609 |
| Median [Min, Max] | 287 [189, 726] | 311 [109, 748] | 306 [109, 748] | |
| Missing | 0 (0%) | 6 (12.5%) | 6 (9.4%) | |
| D-dimer, ng/mL | ||||
| Mean (SD) | 1210 (1740) | 1570 (3010) | 1470 (2710) | 0.571 |
| Median [Min, Max] | 544 [142, 7120] | 494 [105, 16500] | 494 [105, 16500] | |
| Missing | 0 (0%) | 5 (10.4%) | 5 (7.8%) | |
| NTProBNP, pg/mL | ||||
| Mean (SD) | 1300 (2300) | 3750 (4920) | 3120 (4520) | 0.016 |
| Median [Min, Max] | 121 [23.9, 6590] | 2370 [126, 26500] | 1790 [23.9, 26500] | |
| Missing | 2 (12.5%) | 7 (14.6%) | 9 (14.1%) | |
| ALT, U/L | ||||
| Mean (SD) | 83.4 (195) | 37.1 (42.1) | 49.2 (106) | 0.359 |
| Median [Min, Max] | 23.5 [11.0, 799] | 23.0 [6.00, 212] | 23.0 [6.00, 799] | |
| Missing | 0 (0%) | 3 (6.3%) | 3 (4.7%) | |
| AST, U/L | ||||
| Mean (SD) | 72.3 (90.6) | 63.6 (76.8) | 65.9 (80.0) | 0.734 |
| Median [Min, Max] | 32.0 [14.0, 314] | 29.0 [11.0, 292] | 30.0 [11.0, 314] | |
| Missing | 0 (0%) | 3 (6.3%) | 3 (4.7%) | |
| AST/ALT | ||||
| Mean (SD) | 1.86 (1.76) | 2.40 (3.29) | 2.25 (2.96) | 0.417 |
| Median [Min, Max] | 1.11 [0.390, 6.96] | 1.25 [0.290, 15.2] | 1.24 [0.290, 15.2] | |
| Missing | 0 (0%) | 3 (6.3%) | 3 (4.7%) | |
| TBIL, µmol/L | ||||
| Mean (SD) | 20.0 (11.1) | 25.2 (42.7) | 23.8 (37.0) | 0.459 |
| Median [Min, Max] | 18.7 [4.70, 49.3] | 15.3 [5.70, 285] | 15.8 [4.70, 285] | |
| Missing | 0 (0%) | 3 (6.3%) | 3 (4.7%) | |
| DBIL, µmol/L | ||||
| Mean (SD) | 6.37 (3.50) | 12.0 (33.5) | 10.5 (28.9) | 0.272 |
| Median [Min, Max] | 5.75 [1.30, 15.3] | 4.80 [1.00, 226] | 5.10 [1.00, 226] | |
| Missing | 0 (0%) | 3 (6.3%) | 3 (4.7%) | |
| IDBIL, µmol/L | ||||
| Mean (SD) | 13.7 (7.89) | 13.2 (10.9) | 13.3 (10.1) | 0.853 |
| Median [Min, Max] | 11.4 [3.40, 34.0] | 9.60 [1.90, 59.5] | 10.0 [1.90, 59.5] | |
| Missing | 0 (0%) | 3 (6.3%) | 3 (4.7%) | |
| Total Protein, g/L | ||||
| Mean (SD) | 64.4 (10.9) | 66.7 (8.33) | 66.1 (9.05) | 0.458 |
| Median [Min, Max] | 64.7 [42.4, 80.8] | 66.9 [50.7, 90.3] | 66.0 [42.4, 90.3] | |
| Missing | 0 (0%) | 3 (6.3%) | 3 (4.7%) | |
| Alb, g/L | ||||
| Mean (SD) | 37.6 (6.57) | 38.8 (5.35) | 38.5 (5.66) | 0.538 |
| Median [Min, Max] | 37.4 [25.6, 47.1] | 38.9 [26.1, 48.7] | 38.6 [25.6, 48.7] | |
| Missing | 0 (0%) | 3 (6.3%) | 3 (4.7%) | |
| Glb, g/L | ||||
| Mean (SD) | 26.8 (5.64) | 27.9 (5.45) | 27.6 (5.47) | 0.495 |
| Median [Min, Max] | 27.4 [16.8, 35.8] | 27.8 [17.9, 42.2] | 27.8 [16.8, 42.2] | |
| Missing | 0 (0%) | 3 (6.3%) | 3 (4.7%) | |
| Alb/Glb | ||||
| Mean (SD) | 1.43 (0.253) | 1.43 (0.332) | 1.43 (0.311) | 0.992 |
| Median [Min, Max] | 1.44 [1.04, 1.93] | 1.37 [0.730, 2.47] | 1.39 [0.730, 2.47] | |
| Missing | 0 (0%) | 3 (6.3%) | 3 (4.7%) | |
| γGGT, U/L | ||||
| Mean (SD) | 84.4 (210) | 51.4 (56.2) | 60.0 (117) | 0.544 |
| Median [Min, Max] | 31.0 [6.00, 871] | 32.0 [7.00, 285] | 31.0 [6.00, 871] | |
| Missing | 0 (0%) | 3 (6.3%) | 3 (4.7%) | |
| ALP, U/L | ||||
| Mean (SD) | 79.7 (47.0) | 80.7 (48.8) | 80.4 (47.9) | 0.944 |
| Median [Min, Max] | 66.5 [34.0, 225] | 64.0 [24.0, 247] | 64.0 [24.0, 247] | |
| Missing | 0 (0%) | 3 (6.3%) | 3 (4.7%) | |
| TBA, µmol/L | ||||
| Mean (SD) | 7.92 (11.3) | 13.2 (36.9) | 11.8 (32.1) | 0.4 |
| Median [Min, Max] | 4.70 [0.400, 46.3] | 3.10 [0, 201] | 3.95 [0, 201] | |
| Missing | 0 (0%) | 4 (8.3%) | 4 (6.3%) | |
| BUN, mmol/L | ||||
| Mean (SD) | 7.46 (2.45) | 6.64 (2.58) | 6.90 (2.55) | 0.287 |
| Median [Min, Max] | 7.11 [3.40, 11.6] | 5.83 [3.10, 11.9] | 6.29 [3.10, 11.9] | |
| Missing | 0 (0%) | 13 (27.1%) | 13 (20.3%) | |
| Creatinine, µmol/L | ||||
| Mean (SD) | 84.9 (33.1) | 110 (128) | 102 (108) | 0.284 |
| Median [Min, Max] | 80.0 [48.9, 153] | 81.6 [32.0, 658] | 80.9 [32.0, 658] | |
| Missing | 0 (0%) | 13 (27.1%) | 13 (20.3%) | |
| Uric acid, µmol/L | ||||
| Mean (SD) | 344 (99.3) | 348 (97.6) | 347 (97.2) | 0.895 |
| Median [Min, Max] | 355 [156, 486] | 331 [198, 664] | 340 [156, 664] | |
| Missing | 0 (0%) | 13 (27.1%) | 13 (20.3%) | |
| Carbon dioxide, mmol/L | ||||
| Mean (SD) | 21.6 (2.65) | 20.4 (2.62) | 20.8 (2.66) | 0.159 |
| Median [Min, Max] | 22.2 [15.4, 24.4] | 20.1 [15.8, 29.6] | 20.4 [15.4, 29.6] | |
| Missing | 0 (0%) | 13 (27.1%) | 13 (20.3%) | |
| Serum cystatin C, mg/L | ||||
| Mean (SD) | 1.50 (0.656) | 1.88 (1.93) | 1.77 (1.62) | 0.61 |
| Median [Min, Max] | 1.29 [1.02, 2.42] | 1.15 [1.03, 6.96] | 1.15 [1.02, 6.96] | |
| Missing | 12 (75.0%) | 39 (81.3%) | 51 (79.7%) | |
| Potassium, mmol/L | ||||
| Mean (SD) | 4.19 (0.433) | 4.22 (0.402) | 4.21 (0.408) | 0.759 |
| Median [Min, Max] | 4.19 [3.69, 5.17] | 4.18 [3.46, 4.95] | 4.18 [3.46, 5.17] | |
| Missing | 0 (0%) | 11 (22.9%) | 11 (17.2%) | |
| Sodium, mmol/L | ||||
| Mean (SD) | 139 (3.01) | 140 (3.78) | 140 (3.54) | 0.782 |
| Median [Min, Max] | 139 [134, 144] | 140 [132, 148] | 140 [132, 148] | |
| Missing | 0 (0%) | 11 (22.9%) | 11 (17.2%) | |
| Chlorine, mmol/L | ||||
| Mean (SD) | 103 (4.78) | 105 (4.14) | 105 (4.41) | 0.127 |
| Median [Min, Max] | 105 [91.0, 109] | 105 [97.6, 114] | 105 [91.0, 114] | |
| Missing | 0 (0%) | 11 (22.9%) | 11 (17.2%) | |
| Calcium, mmol/L | ||||
| Mean (SD) | 2.24 (0.146) | 2.19 (0.156) | 2.21 (0.154) | 0.251 |
| Median [Min, Max] | 2.25 [1.99, 2.48] | 2.16 [1.86, 2.61] | 2.18 [1.86, 2.61] | |
| Missing | 0 (0%) | 11 (22.9%) | 11 (17.2%) | |
| Magnesium, mmol/L | ||||
| Mean (SD) | 0.968 (0.244) | 0.905 (0.179) | 0.925 (0.202) | 0.382 |
| Median [Min, Max] | 0.910 [0.600, 1.53] | 0.870 [0.680, 1.46] | 0.870 [0.600, 1.53] | |
| Missing | 1 (6.3%) | 16 (33.3%) | 17 (26.6%) | |
| Phosphorus, mmol/L | ||||
| Mean (SD) | 1.13 (0.323) | 1.14 (0.247) | 1.14 (0.268) | 0.904 |
| Median [Min, Max] | 1.16 [0.440, 1.63] | 1.13 [0.500, 1.59] | 1.15 [0.440, 1.63] | |
| Missing | 1 (6.3%) | 11 (22.9%) | 12 (18.8%) | |
| Total cholesterol, mmol/L | ||||
| Mean (SD) | 3.69 (0.826) | 3.89 (1.18) | 3.84 (1.10) | 0.494 |
| Median [Min, Max] | 3.34 [2.68, 5.97] | 3.79 [0.280, 7.16] | 3.73 [0.280, 7.16] | |
| Missing | 2 (12.5%) | 8 (16.7%) | 10 (15.6%) | |
| Triglyceride, mmol/L | ||||
| Mean (SD) | 1.06 (0.273) | 1.31 (0.969) | 1.24 (0.850) | 0.14 |
| Median [Min, Max] | 1.08 [0.420, 1.58] | 1.09 [0.170, 6.25] | 1.09 [0.170, 6.25] | |
| Missing | 2 (12.5%) | 8 (16.7%) | 10 (15.6%) | |
| High density lipoprotein, mmol/L | ||||
| Mean (SD) | 1.13 (0.242) | 0.995 (0.315) | 1.03 (0.301) | 0.116 |
| Median [Min, Max] | 1.06 [0.770, 1.61] | 0.950 [0.110, 1.90] | 0.990 [0.110, 1.90] | |
| Missing | 2 (12.5%) | 8 (16.7%) | 10 (15.6%) | |
| Low density lipoprotein, mmol/L | ||||
| Mean (SD) | 2.23 (0.626) | 2.39 (0.976) | 2.35 (0.896) | 0.49 |
| Median [Min, Max] | 2.16 [1.38, 3.68] | 2.39 [0.120, 5.33] | 2.36 [0.120, 5.33] | |
| Missing | 2 (12.5%) | 8 (16.7%) | 10 (15.6%) | |
| Small low density lipoprotein, mmol/L | ||||
| Mean (SD) | 0.585 (0.292) | 0.718 (0.566) | 0.684 (0.510) | 0.269 |
| Median [Min, Max] | 0.505 [0.240, 1.29] | 0.520 [0.0300, 2.80] | 0.520 [0.0300, 2.80] | |
| Missing | 2 (12.5%) | 8 (16.7%) | 10 (15.6%) | |
| Lipoprotein a, mg/dL | ||||
| Mean (SD) | 27.0 (21.9) | 70.3 (83.7) | 56.5 (72.3) | 0.06 |
| Median [Min, Max] | 23.1 [5.00, 59.8] | 39.5 [7.20, 311] | 28.9 [5.00, 311] | |
| Missing | 8 (50.0%) | 31 (64.6%) | 39 (60.9%) | |
| Free fatty acid, mmol/L | ||||
| Mean (SD) | 426 (213) | 511 (324) | 489 (300) | 0.274 |
| Median [Min, Max] | 438 [97.0, 897] | 467 [57.8, 1230] | 452 [57.8, 1230] | |
| Missing | 2 (12.5%) | 8 (16.7%) | 10 (15.6%) | |
| Phospholipid, mmol/L | ||||
| Mean (SD) | 2.12 (0.357) | 2.16 (0.555) | 2.15 (0.508) | 0.755 |
| Median [Min, Max] | 2.07 [1.51, 3.06] | 2.12 [0.290, 3.30] | 2.12 [0.290, 3.30] | |
| Missing | 2 (12.5%) | 8 (16.7%) | 10 (15.6%) | |
Fig. 5.
Performance of the machine learning-based predictive model for PH. (A) Calibration curve of the nomogram in the training set, (B) Calibration curve of the nomogram in the internal validation set, (C) Calibration curve of the nomogram in external validation set, (D) ROC curve of the nomogram in the training set, (E) ROC curve of the nomogram in the internal validation set, (F) ROC curve of the nomogram in the external validation set
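A calibration curve compares predicted probabilities with observed event rates. The sketch below uses simple equal-width binning to show the idea; it is an assumption-laden simplification, since the source does not state how its calibration curves were computed (smoothed or bootstrap-corrected curves are common alternatives). Function names and toy data are hypothetical.

```python
from typing import List, Sequence, Tuple


def calibration_bins(
    labels: Sequence[int], probs: Sequence[float], n_bins: int = 5
) -> List[Tuple[float, float, int]]:
    """Group predictions into equal-width probability bins.

    Returns one (mean predicted probability, observed PH rate, count)
    tuple per non-empty bin, ordered from low to high predicted risk.
    """
    bins: List[List[Tuple[int, float]]] = [[] for _ in range(n_bins)]
    for y, p in zip(labels, probs):
        # A prediction of exactly 1.0 is placed in the last bin.
        index = min(int(p * n_bins), n_bins - 1)
        bins[index].append((y, p))
    summary = []
    for members in bins:
        if members:
            mean_pred = sum(p for _, p in members) / len(members)
            observed = sum(y for y, _ in members) / len(members)
            summary.append((mean_pred, observed, len(members)))
    return summary


# Toy data invented for illustration only (1 = PH, 0 = control).
toy_labels = [0, 0, 0, 1, 1, 0, 1, 1]
toy_probs = [0.10, 0.15, 0.30, 0.35, 0.55, 0.70, 0.90, 0.95]

for mean_pred, observed, count in calibration_bins(toy_labels, toy_probs, n_bins=2):
    print(f"predicted={mean_pred:.3f} observed={observed:.3f} n={count}")
```

Points close to the diagonal (predicted equal to observed) indicate good calibration. With a small external cohort such as 64 participants, each bin holds few cases, so binned estimates are noisy.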
Fig. 6.
Web-based calculator for the nomogram-derived PH prediction model. (A) Interface screenshot displaying the calculated PH probability based on MRI index parameters, accompanied by SHAP value interpretation of key imaging features. (B) Interactive nomogram implementation showing real-time risk calculation with detailed visualization of model components (MRI parameters and laboratory biomarkers) and their respective point contributions
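The point scale of a nomogram can be sketched for a logistic model as follows. The coefficients, intercept, and value ranges below are hypothetical placeholders chosen for illustration; the source does not report the fitted model, and only two of its predictors are shown.

```python
import math
from typing import Dict, Tuple

# Hypothetical logistic model: name -> (coefficient, axis minimum, axis maximum).
INTERCEPT = -4.0
TERMS: Dict[str, Tuple[float, float, float]] = {
    "MRI_index": (6.0, 0.0, 1.0),
    "BUN": (0.2, 2.0, 12.0),
}


def points_per_unit() -> float:
    """Scale so the predictor with the widest effect span covers 0 to 100 points."""
    widest = max(abs(beta) * (high - low) for beta, low, high in TERMS.values())
    return 100.0 / widest


def nomogram_points(values: Dict[str, float]) -> Dict[str, float]:
    """Points per predictor; 0 points sits at the lowest-risk end of each axis."""
    scale = points_per_unit()
    points = {}
    for name, (beta, low, high) in TERMS.items():
        reference = low if beta >= 0 else high
        points[name] = beta * (values[name] - reference) * scale
    return points


def predicted_probability(values: Dict[str, float]) -> float:
    """Logistic probability that the total-points axis is a relabeling of."""
    linear = INTERCEPT + sum(beta * values[name] for name, (beta, _, _) in TERMS.items())
    return 1.0 / (1.0 + math.exp(-linear))


patient = {"MRI_index": 1.0, "BUN": 2.0}  # invented example input
points = nomogram_points(patient)
print(points, round(sum(points.values()), 1), round(predicted_probability(patient), 3))
```

Because total points are a linear rescaling of the model's linear predictor, the bottom risk axis of a nomogram is a relabeling of the same logistic curve that a web calculator evaluates directly.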
The advantages of our model over traditional PH models
In comparison to established clinical-echocardiographic models, such as those recommended by international guidelines, and the extensively validated REVEAL 2.0 clinical calculator, our proposed machine learning model represents a significant methodological and performance advancement for pulmonary hypertension assessment (Table 3). Traditional models rely primarily on operator-dependent echocardiography and a limited set of clinical variables, and REVEAL 2.0 integrates broader clinical and biomarker data but omits advanced imaging. Our approach instead fuses gold-standard cardiac MRI, which provides reproducible 3D volumetric and functional quantification of the right ventricle, with a comprehensive panel of 60 laboratory biomarkers. This integration of high-fidelity imaging and detailed biochemical profiling, processed through an optimized machine learning framework, yields superior discriminatory performance, as evidenced by an area under the curve (AUC) of 0.944 upon internal validation and 0.897 in a prospective, temporally distinct external cohort. Beyond its validated accuracy, the model offers clinical transparency via SHAP-based interpretability, identifying the key MRI-derived features and biomarkers that drive predictions. Furthermore, it has been translated into an interactive, publicly accessible web application, facilitating point-of-care risk stratification. Although constraints regarding MRI availability and the need for validation in broader, multi-ethnic populations remain, our model provides a clinically deployable tool that combines cardiac MRI, systemic biomarkers, and machine learning to enhance diagnostic precision in pulmonary hypertension.
Table 3.
Comparison between our PH prediction model and traditional PH prediction models
| Comparison Dimension | Traditional Clinical/Echo Models (e.g., ESC/ERS guidelines, echo indices) | REVEAL 2.0 Score (Clinical Calculator) | Our Proposed Model (MRI + Biomarkers + ML) |
|---|---|---|---|
| Core Data Modality | Primarily clinical signs, symptoms, basic echocardiography. | Comprehensive clinical variables, echocardiography, & biomarkers (NT-proBNP). | Multimodal Integration: 2 demographic data + 27 cardiac MRI parameters + 60 laboratory biomarkers. |
| Imaging Basis | Echocardiography (operator-dependent, limited windows). | Echocardiography. | Cardiac MRI: Superior 3D volumetric, functional, and structural quantification (gold-standard for RV assessment). |
| Key Variables / Features | TR jet velocity, RV size/function, NT-proBNP, functional class. | Demographics, clinical findings, echo data, hemodynamics, biomarkers. | 15 MRI-derived features + 4 key biomarkers |
| Reported Performance (AUC - Diagnostic) | Moderate, used more for screening/ probability estimation. | High prognostic value; diagnostic AUCs typically < 0.90. | Training: 0.999; Internal Validation: 0.944; External Validation: 0.897. |
| Validation Rigor | Widely validated in cohorts but based on inherent modality limitations. | Extensively validated for prognosis in large registries. | Prospective External Validation Cohort (temporally distinct) confirming robustness. |
| Clinical Translation | Deeply integrated into guidelines. | Web/calculator available. | Deployed as an interactive web calculator (Shiny app) for immediate clinical use. |
| Key Advantages | Accessibility, speed, guideline endorsement. | Comprehensive clinical integration, strong prognostic evidence. | 1. Multimodal Data Fusion: Combines best-in-class imaging (MRI) with systemic biochemistry. 2. Superior & Validated Performance: Excellent AUC sustained in external validation. 3. Actionable Transparency: SHAP provides clinically explainable insights. 4. Clinical Readiness: Web tool enables real-time risk assessment. |
| Main Limitations | Echo limitations (operator dependency, accuracy issues). | Does not incorporate advanced imaging (MRI/CT). | MRI cost/availability, model requires further validation in diverse populations. |
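To make the AUC values compared in Table 3 concrete, the following minimal sketch computes an AUC as the probability that a randomly chosen PH case receives a higher predicted score than a randomly chosen control (ties counted as one half). The scores below are hypothetical and are not taken from our cohorts; the function name `auc_from_scores` is introduced only for illustration.

```python
def auc_from_scores(scores_pos, scores_neg):
    """AUC as the fraction of (case, control) pairs ranked correctly.

    A pair counts 1 when the case scores higher than the control,
    0.5 when the two scores are tied, and 0 otherwise.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical predicted PH probabilities (illustration only).
ph_scores = [0.91, 0.80, 0.62, 0.55]
control_scores = [0.70, 0.30, 0.20, 0.10]

# 14 of the 16 case-control pairs are ranked correctly.
print(auc_from_scores(ph_scores, control_scores))  # 0.875
```

Under this reading, an AUC of 0.897 means that roughly nine out of ten case-control pairs are ordered correctly by the model.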
Discussion
Our study developed and validated a novel, multidimensional diagnostic model for PH that integrates MRI-derived features with laboratory biomarkers through advanced machine learning techniques. The results indicate significant improvements in predictive performance, interpretability, and clinical utility compared to traditional diagnostic methods. By integrating structural cardiac parameters and biochemical markers, this approach offers a comprehensive and accurate tool for early PH diagnosis, advancing the precision medicine paradigm.
A notable strength of our study is that all cases in the PH group were verified by RHC, with a mean pulmonary artery pressure > 20 mmHg, and had complete cardiac MRI data. All individuals in the control group also had complete cardiac MRI data, with all cardiac function parameters within normal ranges. In other words, our case group was diagnosed with PH by the gold standard, while our control group was confirmed by cardiac MRI to have no structural or functional abnormalities of the heart, which minimizes the risk of false positives and false negatives in the group labels. Although these stringent inclusion criteria limited the number of cases in our study, they greatly enhanced the accuracy and clinical applicability of our data and model.
Although ultrasound is currently more widely used than MRI in patients with PH, it has notable limitations as previously mentioned, including operator dependency and other sources of variability. MRI has emerged as a cornerstone in the non-invasive assessment of PH, offering clear advantages over traditional echocardiography. MRI’s superior spatial resolution and tissue characterization capabilities enable precise quantification of structural and functional cardiac abnormalities, such as PA dilation and right ventricular dysfunction, which are hallmarks of PH [1, 9]. Unlike echocardiography, which suffers from operator-dependent variability and limited accuracy in patients with poor acoustic windows [5], MRI provides reproducible, three-dimensional assessments of ventricular volumes, ejection fractions, and vascular dimensions with minimal operator dependence. Our findings align with prior studies demonstrating MRI’s role as a reference standard for evaluating right ventricular function and pulmonary vascular remodeling in PH [9, 13]. The high discriminative power of our MRI index (AUC 0.999 in training, 0.944 in internal validation) underscores MRI’s potential to transform PH diagnostics by enabling earlier detection and potentially reducing reliance on RHC, which is not only invasive but also carries certain contraindications and demands greater medical infrastructure and technical expertise [4]. However, MRI’s limitations, including high cost, limited availability in resource-constrained settings, and contraindications in patients with certain implants, must be acknowledged [14].
The integration of machine learning with multidimensional datasets, as demonstrated in our study, represents a significant advancement in PH clinical prediction models. By combining MRI parameters with laboratory biomarkers, our model achieved exceptional diagnostic performance (AUCs of 0.999, 0.944, and 0.897 in the training, internal validation, and external validation sets, respectively), surpassing traditional risk assessment tools like the REVEAL 2.0 score [15]. Machine learning approaches, such as the XGBoost algorithm used here, excel in handling high-dimensional data and capturing non-linear relationships between predictors, which are often missed by conventional statistical methods [12]. Our model’s interpretability, enhanced by SHAP analysis, provides clinicians with granular insights into how specific features, such as PA diameter and REDV, drive PH risk predictions. This transparency is critical for clinical adoption, as it bridges the gap between complex algorithms and actionable medical decision-making [16]. Previous studies have explored machine learning in PH for risk stratification and prognosis prediction, but few have integrated imaging and biochemical data as comprehensively as our approach [11, 17]. These studies either constructed predictive models for PH using only partial imaging data [18, 19], or used only some imaging and laboratory indicators [19], with each article presenting no more than thirty cumulative statistical variables. In contrast, our research incorporates all clinically collectible imaging and laboratory indicators, amounting to eighty-nine cumulative statistical variables. Therefore, we have reason to believe that our study encompasses more comprehensive variables and provides a more accurate and reliable model. The deployment of our model as a web-based calculator (https://jianghx.shinyapps.io/PH_prediction_MRIIndex/) further enhances its clinical utility, enabling real-time risk assessment at the point of care. This aligns with the growing trend of translating predictive models into practical tools for precision medicine in PH [20]. Nonetheless, the generalizability of such models across diverse populations and healthcare settings requires further validation, particularly in underrepresented groups [21].
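The SHAP attributions discussed above rest on Shapley values, which divide the difference between a patient's prediction and a reference prediction among the input features. The sketch below computes exact Shapley values for a toy three-feature score by enumerating feature coalitions. It is an illustration under stated assumptions, not the procedure used for our XGBoost model: `toy_score`, its coefficients, and the patient and reference values are all hypothetical, and absent features are replaced by a single reference vector rather than averaged over a background dataset as tree-based SHAP implementations typically do.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample.

    Features outside a coalition are set to their baseline value;
    each feature's marginal contribution is averaged over all coalitions.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = [x[j] if (j in subset or j == i) else baseline[j]
                          for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j]
                             for j in range(n)]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def toy_score(features):
    """Hypothetical risk score with one interaction term (not our model)."""
    pa, redv, ledv = features
    return 0.08 * pa + 0.01 * redv - 0.005 * ledv + 0.0002 * pa * redv


patient = [34.0, 180.0, 110.0]    # hypothetical PA (mm), REDV (mL), LEDV (mL)
reference = [25.0, 140.0, 130.0]  # hypothetical reference values

phi = shapley_values(toy_score, patient, reference)
for name, value in zip(["PA", "REDV", "LEDV"], phi):
    print(f"{name}: {value:+.3f}")

# Contributions sum to the gap between patient and reference predictions.
print(abs(sum(phi) - (toy_score(patient) - toy_score(reference))) < 1e-9)
```

The final check illustrates the additivity property that makes such attributions readable at the bedside: the per-feature contributions add up to the difference between the individual prediction and the reference prediction.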
The 15 MRI-derived parameters in our model, including pulmonary artery diameter (PA), right/left end-diastolic volumes (REDV/LEDV), left end-systolic volumes (LESV), right cardiac index (RCI), right ventricular ejection fraction (RVEF), left atrial anteroposterior diameter (LAAPD), left cardiac output (LCO), right/left stroke volumes (RSV/LSV), left stroke volume index (LSVI), left ventricular lateral wall thickness (LVLWT), left basal interventricular septal thickness (LIVST), and ascending/descending aortic diameters (AAD/DAD), capture critical pathophysiological changes in PH. PA dilation reflects increased pulmonary vascular resistance, a hallmark of PH, while elevated REDV and reduced RVEF indicate right ventricular strain and dysfunction [22, 23]. Parameters like LSV and LCO provide insights into left heart involvement, which can be both a cause of PH and a manifestation of the disease. Additionally, structural metrics such as LVLWT, AAD, and DAD highlight vascular and myocardial remodeling [24, 25]. SHAP analysis identified PA, REDV, and LEDV as top predictors, emphasizing their diagnostic significance. These findings are consistent with studies showing that MRI parameters correlate strongly with PH severity and prognosis [10, 17]. By quantifying these features, our model enhances non-invasive PH detection, offering a comprehensive view of cardiac and vascular pathology.
In addition to the aforementioned 15 MRI parameters, our study identified BUN, γGGT, TBIL, and D-dimer as key laboratory biomarkers for PH, providing complementary value to the cardiac MRI-derived features. These biomarkers reflect multisystem pathophysiological processes in PH, including renal dysfunction, hepatic impairment, and prothrombotic states. BUN, indicative of renal dysfunction, is reported to be increased in the serum of patients with PH and associated with poor prognosis [26, 27]. Similarly, TBIL and γGGT, markers of hepatic dysfunction, are elevated in PH due to right ventricular failure causing hepatic congestion [28]. Moreover, studies have shown that individuals living at high altitudes exhibit elevated plasma bilirubin levels, which may serve as a predictive factor for COPD-associated pulmonary hypertension (COPD-PH) [29], while γGGT can serve as a predictor of poor prognosis in PH [30]. D-dimer, a fibrin degradation product, is a well-known indicator of pulmonary embolism (PE), which is one of the causes of PH. Therefore, D-dimer is often associated with acute PE or chronic thromboembolic PH (CTEPH) [31]. Furthermore, studies have shown that D-dimer serves as an independent and significant predictor of prognosis in patients with CTEPH [32]. Our random forest-based feature selection confirmed the predictive importance of these biomarkers, consistent with prior studies linking them to PH severity and prognosis. By integrating these biomarkers with MRI parameters, our model captures a broader spectrum of PH-related pathophysiology, enhancing diagnostic accuracy. However, the specificity of these biomarkers to PH is limited, as they may be elevated in other conditions, such as liver disease or systemic inflammation, necessitating careful interpretation in clinical contexts.
Despite their promise, clinical prediction models for PH also face several challenges. First, the reliance on MRI data, while highly informative, limits applicability in settings with restricted access to advanced imaging or in patients with contraindications, such as claustrophobia or metallic implants [33]. Second, machine learning models, including ours, require large, high-quality datasets for training and validation, which may not be readily available in all regions, particularly in low-resource settings [21]. Third, the external validation cohort in our study (48 PH cases, 16 controls) was relatively small, potentially limiting the robustness of generalizability assessments [34]. Finally, the implementation of web-based tools, while innovative, requires robust cybersecurity measures and user training to ensure effective adoption in clinical practice [35]. Future studies should focus on validating our model in larger, more diverse cohorts and integrating additional data sources, such as genetic or proteomic markers, to further enhance predictive accuracy. Additionally, cost-effectiveness analyses are needed to evaluate the economic feasibility of widespread MRI-based screening in PH.
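The effect of the small external cohort on the precision of the reported AUC can be illustrated with the Hanley-McNeil approximation for the standard error of an AUC. This is a rough sketch under stated assumptions: the approximation is chosen here only for illustration (it is not necessarily the method behind any interval reported in this study), and the function name `hanley_mcneil_se` is hypothetical. The inputs are the external validation AUC (0.897) and cohort sizes (48 PH cases, 16 controls) stated above.

```python
from math import sqrt


def hanley_mcneil_se(auc, n_pos, n_neg):
    """Approximate standard error of an AUC (Hanley-McNeil formula)."""
    q1 = auc / (2.0 - auc)              # two cases vs. one control
    q2 = 2.0 * auc ** 2 / (1.0 + auc)   # one case vs. two controls
    variance = (
        auc * (1.0 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)
    return sqrt(variance)


auc_external = 0.897
se = hanley_mcneil_se(auc_external, n_pos=48, n_neg=16)

# Normal-approximation 95% interval, capped at 1.0.
lower = auc_external - 1.96 * se
upper = min(1.0, auc_external + 1.96 * se)
print(f"SE = {se:.3f}, approximate 95% CI = ({lower:.3f}, {upper:.3f})")

# Same AUC with a tenfold larger cohort, for comparison.
se_large = hanley_mcneil_se(auc_external, n_pos=480, n_neg=160)
print(f"SE with 480 cases and 160 controls = {se_large:.3f}")
```

With only 16 controls, the approximate interval spans well over ten percentage points of AUC, which is why validation in larger cohorts is needed before firm conclusions about generalizability can be drawn.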
Conclusion
In summary, our study presents a new diagnostic framework for PH, integrating MRI-derived features and laboratory biomarkers through advanced machine-learning techniques. The model exhibits exceptional predictive accuracy, interpretability, and clinical utility, paving the way for improved early diagnosis and personalized treatment strategies. By seeking to address some limitations of traditional diagnostic methods through the integration of advanced technologies, our approach may contribute to the ongoing development of PH diagnostics and could inform future clinical practice.
Abbreviations
- PH: Pulmonary hypertension
- RHC: Right heart catheterization
- AUC: Area under the curve
- LAAPD: Left atrial anterior-posterior diameter
- LVTD: Left ventricular transverse diameter
- LIVST: Left basal interventricular septal thickness
- LVLWT: Left ventricular lateral wall thickness
- PA: Pulmonary artery diameter
- AAD: Ascending aorta diameter
- DAD: Descending aorta diameter
- LVEF: Left ventricular ejection fraction
- LVEDV: Left ventricular end-diastolic volume
- LVESV: Left ventricular end-systolic volume
- LVSV: Left ventricular stroke volume
- LVM: Left ventricular mass
- LCO: Left cardiac output
- LCI: Left cardiac index
- LVEDVI: Left ventricular end-diastolic volume index
- LVESVI: Left ventricular end-systolic volume index
- LVSVI: Left ventricular stroke volume index
- LVMI: Left ventricular mass index
- RVEF: Right ventricular ejection fraction
- REDV: Right ventricular end-diastolic volume
- RVESV: Right ventricular end-systolic volume
- RVSV: Right ventricular stroke volume
- RCO: Right cardiac output
- RCI: Right cardiac index
- REDVI: Right ventricular end-diastolic volume index
- RVESVI: Right ventricular end-systolic volume index
- RVSVI: Right ventricular stroke volume index
- RFE: Recursive feature elimination
- DCA: Decision curve analysis
- ROC: Receiver Operating Characteristic
- MRI: Magnetic resonance imaging
- KNN: K-nearest neighbors
- SHAP: SHapley Additive exPlanations
- COPD-PH: COPD-associated pulmonary hypertension
- PE: Pulmonary embolism
- CTEPH: Chronic thromboembolic PH
Author contributions
Qian Cheng, Fan Yang, Changrong Wu, and Dan Xiao contributed to data curation, formal analysis, validation, visualization, and writing the original draft. Xiaojun Hao participated in data collection and organization. Hongxia Jiang provided resources, supervised the project, acquired funding, administered the project, reviewed and edited the manuscript, and verified the underlying data. All authors have read and approved the final version of the manuscript.
Funding
This study was funded by the following programs: The National Natural Science Foundation of China (No. 82300078), the Fundamental Research Funds for the Central Universities (No. 2042023kf0059), Wuhan University Clinical Medicine + Youth Supporting Program (No. 413000557), Science and Technology Innovation Cultivation Funding of Zhongnan Hospital of Wuhan University (No. CXPY2023060).
Data availability
To protect patient privacy, the data will be available only upon reasonable request to the corresponding author.
Declarations
Ethics approval and consent to participate
The study was approved by the Research Ethics Commission of Zhongnan Hospital of Wuhan University in accordance with the Declaration of Helsinki, and the requirement for informed consent was waived by the Ethics Commission (Approval No.: 2023185).
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Qian Cheng, Fan Yang, Changrong Wu and Dan Xiao contributed equally to this work.
References
- 1. Simonneau G, Montani D, Celermajer DS, Denton CP, Gatzoulis MA, Krowka M, et al. Haemodynamic definitions and updated clinical classification of pulmonary hypertension. Eur Respir J. 2019;53(1).
- 2. Humbert M, Kovacs G, Hoeper MM, Badagliacca R, Berger RMF, Brida M, et al. 2022 ESC/ERS Guidelines for the diagnosis and treatment of pulmonary hypertension. Eur Heart J. 2022;43(38):3618–731.
- 3. Murthy S, Benza R. The Evolution of Risk Assessment in Pulmonary Arterial Hypertension. Methodist Debakey Cardiovasc J. 2021;17(2):134–44.
- 4. Gonzalez-Hermosillo LM, Cueto-Robledo G, Roldan-Valadez E, Graniel-Palafox LE, Garcia-Cesar M, Torres-Rojas MB, et al. Right Heart Catheterization (RHC): A Comprehensive Review of Provocation Tests and Hepatic Hemodynamics in Patients With Pulmonary Hypertension (PH). Curr Probl Cardiol. 2022;47(12):101351.
- 5. Champion HC, Michelakis ED, Hassoun PM. Comprehensive invasive and noninvasive approach to the right ventricle-pulmonary circulation unit: state of the art and clinical and research implications. Circulation. 2009;120(11):992–1007.
- 6. Keir GJ, Wort SJ, Kokosi M, George PM, Walsh SLF, Jacob J, et al. Pulmonary hypertension in interstitial lung disease: Limitations of echocardiography compared to cardiac catheterization. Respirology. 2018;23(7):687–94.
- 7. Finkelhor RS, Lewis SA, Pillai D. Limitations and strengths of doppler/echo pulmonary artery systolic pressure-right heart catheterization correlations: a systematic literature review. Echocardiography. 2015;32(1):10–8.
- 8. Sonaglioni A, Cassandro R, Luisi F, Ferrante D, Nicolosi GL, Lombardo M, et al. Correlation Between Doppler Echocardiography and Right Heart Catheterisation-Derived Systolic and Mean Pulmonary Artery Pressures: Determinants of Discrepancies Between the Two Methods. Heart Lung Circ. 2021;30(5):656–64.
- 9. Aryal SR, Sharifov OF, Lloyd SG. Emerging role of cardiovascular magnetic resonance imaging in the management of pulmonary hypertension. Eur Respir Rev. 2020;29:156.
- 10. Wessels JN, de Man FS, Vonk Noordegraaf A. The use of magnetic resonance imaging in pulmonary hypertension: why are we still waiting? Eur Respir Rev. 2020;29:156.
- 11. van der Bijl P, Bax JJ. Using deep learning to diagnose pulmonary hypertension. Eur Heart J Cardiovasc Imaging. 2022;23(11):1457–8.
- 12. Zitnik M, Nguyen F, Wang B, Leskovec J, Goldenberg A, Hoffman MM. Machine Learning for Integrating Data in Biology and Medicine: Principles, Practice, and Opportunities. Inf Fusion. 2019;50:71–91.
- 13. Kovacs G, Reiter G, Reiter U, Rienmuller R, Peacock A, Olschewski H. The emerging role of magnetic resonance imaging in the diagnosis and management of pulmonary hypertension. Respiration. 2008;76(4):458–70.
- 14. Wang TKM, Ayoub C, Chetrit M, Kwon DH, Jellis CL, Cremer PC, et al. Cardiac Magnetic Resonance Imaging Techniques and Applications for Pericardial Diseases. Circ Cardiovasc Imaging. 2022;15(7):e014283.
- 15. Benza RL, Gomberg-Maitland M, Elliott CG, Farber HW, Foreman AJ, Frost AE, et al. Predicting Survival in Patients With Pulmonary Arterial Hypertension: The REVEAL Risk Score Calculator 2.0 and Comparison With ESC/ERS-Based Risk Assessment Strategies. Chest. 2019;156(2):323–37.
- 16. Lundberg SM, Erion G, Chen H, DeGrave A, Prutkin JM, Nair B, et al. From Local Explanations to Global Understanding with Explainable AI for Trees. Nat Mach Intell. 2020;2(1):56–67.
- 17. Dawes TJW, de Marvao A, Shi W, Fletcher T, Watson GMJ, Wharton J, et al. Machine Learning of Three-dimensional Right Ventricular Motion Enables Outcome Prediction in Pulmonary Hypertension: A Cardiac MR Imaging Study. Radiology. 2017;283(2):381–90.
- 18. D’Alto M, Romeo E, Argiento P, D’Andrea A, Vanderpool R, Correra A, et al. Accuracy and precision of echocardiography versus right heart catheterization for the assessment of pulmonary hypertension. Int J Cardiol. 2013;168(4):4058–62.
- 19. Kim WR, Krowka MJ, Plevak DJ, Lee J, Rettke SR, Frantz RP, et al. Accuracy of Doppler echocardiography in the assessment of pulmonary hypertension in liver transplant candidates. Liver Transpl. 2000;6(4):453–8.
- 20. Topol EJ. High-performance medicine: the convergence of human and artificial intelligence. Nat Med. 2019;25(1):44–56.
- 21. Rajkomar A, Hardt M, Howell MD, Corrado G, Chin MH. Ensuring Fairness in Machine Learning to Advance Health Equity. Ann Intern Med. 2018;169(12):866–72.
- 22. Mohamed Hoesein FA, Besselink T, Pompe E, Oudijk EJ, de Graaf EA, Kwakkel-van Erp JM, et al. Accuracy of CT Pulmonary Artery Diameter for Pulmonary Hypertension in End-Stage COPD. Lung. 2016;194(5):813–9.
- 23. Fowler ED, Drinkhill MJ, Stones R, White E. Diastolic dysfunction in pulmonary artery hypertension: Creatine kinase and the potential therapeutic benefit of beta-blockers. Clin Exp Pharmacol Physiol. 2018;45(4):384–9.
- 24. Meyer M. Left ventricular atrophy in pulmonary arterial hypertension: a sinister dexter conundrum. J Am Coll Cardiol. 2014;64(1):38–40.
- 25. Al-Omary MS, Sugito S, Boyle AJ, Sverdlov AL, Collins NJ. Pulmonary Hypertension Due to Left Heart Disease: Diagnosis, Pathophysiology, and Therapy. Hypertension. 2020;75(6):1397–408.
- 26. Hu B, Xu G, Jin X, Chen D, Qian X, Li W, et al. Novel Prognostic Predictor for Primary Pulmonary Hypertension: Focus on Blood Urea Nitrogen. Front Cardiovasc Med. 2021;8:724179.
- 27. Zhang S, Gao L, Zhao Z, Zhao Q, Yang T, Zeng Q, et al. Blood urea nitrogen to serum albumin ratio as a new indicator of disease severity and prognosis in idiopathic pulmonary artery hypertension. Respir Med. 2024;227:107643.
- 28. Scott JV, Moutchia J, McClelland RL, Al-Naamani N, Weinberg E, Palevsky HI, et al. Novel Liver Injury Phenotypes and Outcomes in Clinical Trial Participants with Pulmonary Hypertension. Am J Respir Crit Care Med. 2024;210(8):1045–56.
- 29. Wang L, Wang F, Tuo Y, Wan H, Luo F. Clinical characteristics and predictors of pulmonary hypertension in chronic obstructive pulmonary disease at different altitudes. BMC Pulm Med. 2023;23(1):127.
- 30. Yogeswaran A, Tello K, Lund J, Klose H, Harbaum L, Sommer N, et al. Risk assessment in pulmonary hypertension based on routinely measured laboratory parameters. J Heart Lung Transpl. 2022;41(3):400–10.
- 31. Kearon C, de Wit K, Parpia S, Schulman S, Afilalo M, Hirsch A, et al. Diagnosis of Pulmonary Embolism with d-Dimer Adjusted to Clinical Probability. N Engl J Med. 2019;381(22):2125–34.
- 32. Skoro-Sajer N, Gerges C, Gerges M, Panzenbock A, Jakowitsch J, Kurz A, et al. Usefulness of thrombosis and inflammation biomarkers in chronic thromboembolic pulmonary hypertension-sampling plasma and surgical specimens. J Heart Lung Transpl. 2018;37(9):1067–74.
- 33. Lee EM, Ibrahim EH, Dudek N, Lu JC, Kalia V, Runge M, et al. Improving MR Image Quality in Patients with Metallic Implants. Radiographics. 2021;41(4):E126–37.
- 34. Collins GS, Reitsma JB, Altman DG, Moons KG. Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD): the TRIPOD Statement. Br J Surg. 2015;102(3):148–58.
- 35. Kruse CS, Frederick B, Jacobson T, Monticone DK. Cybersecurity in healthcare: A systematic review of modern threats and trends. Technol Health Care. 2017;25(1):1–10.