ABSTRACT
Pulmonary hypertension (PH) is a common complication in patients with chronic kidney disease (CKD) and is associated with high mortality. Early detection and proper management may improve outcomes in high‐risk patients. This study aimed to develop a simple and effective model for screening PH risk in this population. We retrospectively screened 1082 CKD patients. Feature selection was performed using the least absolute shrinkage and selection operator, univariate and multivariate logistic regression (LR). Nomograms were developed for PH risk assessment. The discriminative ability was estimated by the area under the receiver operating characteristic curve (AUROC), and the accuracy was assessed with a Brier score. Models were validated externally by calculating their performance on a validation cohort. Eight machine learning models were developed, and their performance was evaluated. Decision curve analysis and clinical impact curve were used to assess the model's clinical usefulness. A total of 440 patients were included in the analysis, with 308 in the development cohort and 132 in the validation cohort. The final nomogram included five variables as follows: haemoglobin, gamma‐glutamyl transferase, triglycerides, coronary heart disease and NT‐proBNP. The AUROC of the model was 0.772 (95% CI: 0.731–0.806). External validation confirmed the model's good performance, with an AUROC of 0.782 (95% CI: 0.696–0.854). Among the eight machine learning models, LR showed the best performance. We developed a machine learning model based on clinical and biochemical features to assess PH risk in CKD patients. It enables early detection and risk stratification during follow‐up.
Keywords: chronic kidney disease, machine learning, pulmonary hypertension, risk model
1. Introduction
Pulmonary hypertension (PH) is common in patients with chronic kidney disease (CKD) or renal failure. Its prevalence ranges from 20% to 25% across different CKD stages [1, 2], and can reach up to 78% in patients referred for right heart catheterization (RHC) [3]. PH is defined as mean pulmonary arterial pressure (mPAP) of ≥25 mmHg, measured by RHC [4]. Although RHC is the gold standard for diagnosing PH and its subtypes, most studies in CKD populations rely on transthoracic echocardiography to estimate PH severity [5]. The most frequently used echocardiographic criterion for suspected PH is a pulmonary artery systolic pressure (PASP) ≥35–40 mmHg [6, 7].
Despite the high prevalence and mortality associated with PH, there are currently no targeted treatments available [2]. PH often goes undiagnosed in early stages because it causes few symptoms [8]. Early identification may help slow disease progression and improve outcomes.
Although the prevalence of PH increases with CKD progression, there is currently no consensus or standardized guideline regarding the timing for echocardiographic screening [3]. In the United States, PH screening is recommended only for patients on dialysis for more than 2 years, as part of pre‐transplant evaluation. In contrast, European and Canadian guidelines do not include PH screening in transplant assessments [9, 10, 11]. This highlights the need to identify high‐risk patients who may benefit from further screening.
Machine learning techniques have gained increasing attention in clinical risk prediction due to their capacity to capture complex, non‐linear relationships within high‐dimensional datasets [12]. Compared with conventional regression‐based approaches, machine learning models can offer enhanced predictive performance, particularly in heterogeneous populations such as patients with CKD [13]. Although machine learning has shown potential in predicting cardiovascular risk, its use in identifying PH risk in CKD remains limited.
This study aimed to develop a simple and effective model for guiding the timing of echocardiographic screening in patients with CKD, based on their risk of PH.
2. Materials and Methods
2.1. Data Collection and Study Population
This study was approved by the Ethics Committee of the First Affiliated Hospital of Xi'an Jiaotong University (No. XJTU1AF2024LSYY‐467). All procedures involving human participants were conducted in accordance with applicable institutional guidelines and regulations, as well as the principles outlined in the Declaration of Helsinki. A total of 1946 patients who were first identified or recorded as having CKD and hospitalized at the First Affiliated Hospital of Xi'an Jiaotong University from January 2020 to December 2022 were screened for inclusion. Demographic, clinical and echocardiographic data were collected through the hospital's electronic medical record system. Echocardiographic parameters, including PASP and cardiac chamber dimensions, were obtained from standardized electronic echocardiography reports. All variables were defined according to established clinical diagnostic criteria. Diabetes mellitus was defined as a documented diagnosis, a fasting plasma glucose ≥7.0 mmol/L or use of anti‐diabetic medication. Coronary heart disease (CHD) was defined as a history of myocardial infarction, coronary artery disease or prior coronary revascularization. CKD stages were classified according to the KDIGO guidelines based on eGFR values [14]. PH was defined as PASP ≥40 mmHg, measured by transthoracic echocardiography, in line with published criteria and local hospital standards [6, 15].
The inclusion criterion was a confirmed clinical diagnosis of CKD [14]. Exclusion criteria were as follows: (a) age <18 or >75 years; (b) PH identified prior to the confirmed diagnosis of CKD; (c) CKD stage 1, 2 or 3a; (d) baseline data with more than 20% missing values and (e) no echocardiography during hospitalization. After applying these criteria, 1082 patients were included. Random 1:1 case matching was then performed between patients with PH and those without PH, based on age, sex and CKD stage. After matching, 220 pairs (440 patients) were included in the final analysis, as shown in Figure 1. The patients were randomly assigned to development and validation cohorts in a 7:3 ratio using a random number method.
FIGURE 1.

Flow chart of the study participants. CKD, chronic kidney disease; PH, pulmonary hypertension.
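The 1:1 matching step described in Section 2.1 can be illustrated with a minimal sketch. The text states only that matching was random, 1:1 and based on age, sex and CKD stage; the greedy strategy, the 5‐year age caliper, the record layout and the function name `match_one_to_one` are assumptions made for illustration and are not the authors' procedure.

```python
import random

def match_one_to_one(cases, controls, age_caliper=5, seed=0):
    """Greedy 1:1 matching on sex, CKD stage and age (within a caliper).

    Each record is a dict with keys 'id', 'age', 'sex' and 'stage'.
    The caliper width and the greedy strategy are illustrative assumptions.
    """
    rng = random.Random(seed)
    available = list(controls)
    rng.shuffle(available)  # random order, so eligible controls are picked at random
    pairs = []
    for case in cases:
        for i, ctrl in enumerate(available):
            same_stratum = (ctrl["sex"] == case["sex"]
                            and ctrl["stage"] == case["stage"])
            if same_stratum and abs(ctrl["age"] - case["age"]) <= age_caliper:
                pairs.append((case["id"], ctrl["id"]))
                del available[i]  # each control is used at most once
                break
    return pairs

# Hypothetical records: only k2 shares sex and stage and lies within the caliper.
cases = [{"id": "c1", "age": 50, "sex": "M", "stage": "4"}]
controls = [
    {"id": "k1", "age": 70, "sex": "M", "stage": "4"},
    {"id": "k2", "age": 52, "sex": "M", "stage": "4"},
    {"id": "k3", "age": 50, "sex": "F", "stage": "4"},
]
pairs = match_one_to_one(cases, controls)
```

Cases without an eligible control remain unmatched in this sketch, which is one way a matched cohort can end up smaller than the number of available cases.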
2.2. Laboratory and Demographic Data Collection
Demographic and clinical data for each patient were recorded, including age, sex, body mass index (BMI), medical history, family history of kidney disease, CKD stage, renal replacement therapy status, aetiology of CKD, heart failure classification and electrocardiogram (ECG) findings. Laboratory tests performed on patients included complete blood count, liver function tests, lipid profile, coagulation profile, kidney function tests, N‐terminal pro B‐type natriuretic peptide (NT‐proBNP) and cardiac troponin T (TnT). NT‐proBNP levels were categorized as normal or elevated [16]. All blood samples were collected in the morning after overnight fasting and before dialysis at the First Affiliated Hospital of Xi'an Jiaotong University. The sample collection process followed routine clinical procedures to ensure the reliability and consistency of the data.
2.3. Echocardiographic Data
Echocardiography was performed using the following equipment: the LOGIQ E9 system (GE Healthcare, Milwaukee, WI, USA) and the Philips CX50 system (Philips Medical Devices Group, Netherlands) equipped with 1–5‐MHz cardiac probe. Echocardiographic data were collected for all CKD patients, including measurements of PASP, left ventricular ejection fraction (LVEF) and diameters of the ascending aorta (AAO), aortic root (AOR), right ventricular outflow tract (RVOT), main pulmonary artery (MPA), right ventricular anteroposterior diameter (RVAP), left atrial anteroposterior diameter (LAAP), left ventricular end‐systolic diameter (LVDs) and left ventricular end‐diastolic diameter (LVDd). In addition, the presence of pericardial effusion, tricuspid regurgitation and mitral regurgitation was assessed. All echocardiographic results were double‐checked by the same group of ultrasound specialists at the First Affiliated Hospital of Xi'an Jiaotong University to ensure consistency and accuracy.
2.4. Statistical Analysis
Statistical analyses were performed using R (version 4.2.3; https://www.R‐project.org) and Python language (version 3.7.0; https://www.python.org). Sample size calculation for a binary outcome model followed a previously established approach [17]. Based on this approach, the minimum required sample size was estimated to be 246. For handling missing data, samples with more than 20% missing values were excluded from the dataset. For those with less than 20% missing data, multiple imputation was used.
Normality of continuous variables was evaluated using the Kolmogorov–Smirnov test, and homogeneity of variance was assessed with Levene's test. Multicollinearity among variables was examined via the variance inflation factor. For group comparisons, independent t‐tests were used for normally distributed variables, and Mann–Whitney U tests for non‐normally distributed variables. Categorical variables were compared using the chi‐square test or Fisher's exact test, as appropriate. Continuous data are reported as mean (standard deviation, SD) for normal distributions and as median (interquartile range, IQR) for non‐normal distributions. Categorical data are presented as counts and percentages. A two‐sided p value less than 0.05 was considered indicative of statistical significance.
To build and validate the prediction model, the dataset was randomly split into a development cohort (n = 308) and an independent validation cohort (n = 132) in a 70:30 ratio. Initially, candidate features were selected based on clinical expertise and prior research. Least absolute shrinkage and selection operator (LASSO) regression with 10‐fold cross‐validation, univariate logistic regression (LR) and multivariate LR were applied to further refine the selection of relevant features, and the selected features were used to build the model [18]. In the multivariate LR analysis, the regression coefficients (β) and odds ratios (OR) with two‐sided 95% confidence intervals (CIs) for each feature were calculated. Subsequently, based on the fitted multivariate LR model, a nomogram was developed. Discrimination was evaluated using the ROC curve or Harrell's concordance index (C‐index) along with its 95% CI. Model performance was also externally validated in the validation cohort. The Brier Score was used to assess the accuracy of the model's probability predictions, with lower scores indicating better predictive performance [19]. The precision–recall (PR) curve was used to evaluate the model's performance in identifying positive cases. Finally, the clinical usefulness of the model was evaluated using the decision curve analysis (DCA) and clinical impact curve (CIC) [20].
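For reference, the LASSO‐penalized logistic regression used for feature selection can be written in its standard textbook form. The symbols below (n patients, k candidate features, penalty weight λ chosen by cross‐validation) are generic notation introduced here, not notation taken from the study.

```latex
% Logistic model: predicted probability of PH for patient i
p_i = \frac{1}{1 + \exp\!\left(-\left(\beta_0 + \sum_{j=1}^{k} \beta_j x_{ij}\right)\right)}

% LASSO estimate: penalized negative log-likelihood
(\hat{\beta}_0, \hat{\beta}) = \arg\min_{\beta_0,\,\beta}
\left\{ -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i \log p_i + (1 - y_i) \log (1 - p_i) \right]
+ \lambda \sum_{j=1}^{k} \lvert \beta_j \rvert \right\}
```

As λ increases, more coefficients are shrunk exactly to zero, which is why the features retained are those with non‐zero coefficients.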
Based on the selected features, eight machine learning models were used for further development and validation, including the eXtreme Gradient Boosting (XGBoost) algorithm, LR, Random Forest (RF), Adaptive Boosting (AdaBoost), Gaussian Naive Bayes (GNB), Multi‐layer Perceptron (MLP), Support Vector Machine (SVM) and K‐Nearest Neighbour (KNN). Hyperparameter tuning of the models was performed using nested cross‐validation on the development cohort. Model performance was evaluated using the previously mentioned methods. SHapley Additive exPlanations (SHAP) were employed to interpret the models and to show the relative importance of each feature [21].
3. Results
3.1. Demographic and Clinical Data
A total of 220 CKD patients with PH were included, with a PH prevalence of 20.33% (220/1082). Patient ages ranged from 18 to 75 years, with a median age of 51 years (IQR: 37.00–62.00). Among the patients, 85 were female (38.6%) and 135 were male (61.4%).
As shown in Table S1, there were significant differences between the PH and non‐PH groups across a range of clinical parameters. Haematological assessments revealed that red blood cell count (RBC), haemoglobin (Hb) and platelet count (PLT) were all significantly different, with p values less than 0.001, 0.001 and 0.008, respectively. Additionally, white blood cell count (WBC) differed with a p value of 0.026.
Liver function tests also differed significantly between groups. Alanine aminotransferase (ALT) had a p value of 0.03, gamma‐glutamyl transferase (GGT) and albumin (ALB) were both highly significant (p < 0.001), and direct bilirubin (DBIL) showed notable differences (p = 0.017). Lipid profiles also differed significantly; total cholesterol and triglycerides showed significant differences, with p values of 0.006 and less than 0.001, respectively.
Coagulation markers, namely D‐dimer and fibrin degradation products (FDPs), both differed significantly, with p values less than 0.001. Furthermore, key biomarkers such as DBIL, NT‐proBNP and TnT differed significantly, all with p values less than 0.001.
Historical medical data indicated significant differences in the prevalence of diabetes mellitus, CHD and WHO heart function classification, with p values of 0.015, 0.002 and less than 0.001, respectively. Electrocardiographic findings showed significant differences in the prevalence of atrial fibrillation and supraventricular premature beats, with p values less than 0.001 and 0.003, respectively.
Echocardiographic assessments revealed significant differences in PASP, LVEF and additional measurements such as atrial and ventricular dimensions and the presence of pericardial effusion, all showing p values less than 0.001. Tricuspid and mitral regurgitation also differed significantly, both with p values less than 0.001.
3.2. Feature Selection
In the development cohort, based on intergroup differences and clinical relevance, a total of 25 candidate features were included in the LASSO regression. After LASSO selection, the 25 features were reduced to nine non‐zero coefficient features (shown in Figures 2a and 2b). These features included NT‐proBNP, pericardial effusion, LAAP diameter, RVAP diameter, haemoglobin, BUN, triglyceride, DBIL and FDP. Subsequently, univariate and multivariate LR models were used to further evaluate the prognostic value of these nine variables. Ultimately, RVAP diameter, LAAP diameter, triglyceride and pericardial effusion were incorporated into the final combined model (C_U model).
FIGURE 2.

Features were selected using the LASSO regression. (a) and (b) show the C_U model, where LASSO regression was used to select informative features through 10‐fold cross‐validation. The model feature selection is based on the minimum distance of the SE. (c) and (d) show the C model. LASSO, the least absolute shrinkage and selection operator.
Considering that echocardiography is not routinely performed during regular follow‐up visits, a separate clinical model (C model) was developed by excluding echocardiographic features. This model can help identify high‐risk patients during routine follow‐up who may require further screening. The same feature selection process was applied. LASSO regression identified 10 variables with non‐zero coefficients, among which five (haemoglobin, GGT, triglyceride, CHD and NT‐proBNP) were incorporated into the final C model after further LR analysis (Tables 1 and 2), considering both statistical and clinical relevance.
TABLE 1.
Univariable and multivariable logistic analysis of PH risk.
| Features | Univariable OR (95% CI) | p (univariable) | Multivariable OR (95% CI) | p (multivariable) |
|---|---|---|---|---|
| CHD | 2.45 (1.32–4.55) | 0.005 | 2.06 (1.03–4.12) | 0.041 |
| NT‐proBNP | 8.33 (3.39–20.47) | <0.001 | 5.25 (2.03–13.57) | <0.001 |
| Atrial fibrillation | 0.14 (0.03–0.61) | 0.009 | 0.31 (0.07–1.50) | 0.147 |
| RBC (10^12/L) | 0.56 (0.41–0.76) | <0.001 | 0.92 (0.64–1.30) | 0.629 |
| Haemoglobin (g/L) | 0.98 (0.97–0.99) | <0.001 | 0.98 (0.97–1.00) | 0.051 |
| GGT (U/L) | 1.01 (1.00–1.02) | 0.014 | 1.01 (1.00–1.02) | 0.052 |
| Triglyceride (mmol/L) | 0.52 (0.39–0.68) | <0.001 | 0.57 (0.43–0.77) | <0.001 |
| FDP (mg/L) | 1.08 (1.03–1.15) | 0.004 | 1.03 (0.97–1.08) | 0.34 |
| DBIL (µmol/L) | 1.05 (1.00–1.10) | 0.051 | | |
| TnT (ng/mL) | 1.26 (0.83–1.91) | 0.278 | | |
Abbreviations: CHD, coronary heart disease; DBIL, direct bilirubin; FDP, fibrin degradation product; GGT, γ‐glutamyl transferase; NT‐proBNP, N‐terminal pro B‐type natriuretic peptide; RBC, red blood cell; TnT, troponin T.
TABLE 2.
Logistic regression analysis for a clinical model of PH risk.
| Predictor | Estimate | SE | Z | p | OR | 95% CI lower | 95% CI upper |
|---|---|---|---|---|---|---|---|
| (Intercept) | 0.56 | 0.71 | 0.79 | 0.431 | 1.75 | 0.42 | 6.98 |
| Haemoglobin (g/L) | −0.02 | 0.01 | −3.16 | 0.002 | 0.98 | 0.97 | 0.99 |
| GGT (U/L) | 0.01 | 0.00 | 2.43 | 0.015 | 1.01 | 1.00 | 1.02 |
| Triglyceride (mmol/L) | −0.60 | 0.15 | −3.92 | <0.001 | 0.55 | 0.40 | 0.73 |
| CHD | 0.78 | 0.35 | 2.24 | 0.025 | 2.18 | 1.12 | 4.42 |
| NT‐proBNP | 1.79 | 0.48 | 3.76 | <0.001 | 6.03 | 2.52 | 16.85 |
Abbreviations: CHD, coronary heart disease; GGT, γ‐glutamyl transferase; NT‐proBNP, N‐terminal pro B‐type natriuretic peptide.
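To make the link between Table 2 and an individual risk estimate concrete, the following minimal sketch evaluates the logistic model with the rounded estimates from the table. The coding of CHD and NT‐proBNP as 0/1 indicators, the function name `predicted_ph_risk` and the example patient are assumptions for illustration. Because the published estimates are rounded to two decimals, the output only approximates the fitted model and should not be used clinically.

```python
import math

# Rounded estimates copied from Table 2 (log-odds scale).
COEF = {
    "intercept": 0.56,
    "haemoglobin": -0.02,   # per g/L
    "ggt": 0.01,            # per U/L
    "triglyceride": -0.60,  # per mmol/L
    "chd": 0.78,            # assumed coding: 1 = present, 0 = absent
    "nt_probnp": 1.79,      # assumed coding: 1 = elevated, 0 = normal
}

def predicted_ph_risk(haemoglobin, ggt, triglyceride, chd, nt_probnp):
    """Return the predicted probability of PH from the linear predictor."""
    linear_predictor = (COEF["intercept"]
                        + COEF["haemoglobin"] * haemoglobin
                        + COEF["ggt"] * ggt
                        + COEF["triglyceride"] * triglyceride
                        + COEF["chd"] * chd
                        + COEF["nt_probnp"] * nt_probnp)
    return 1.0 / (1.0 + math.exp(-linear_predictor))

# Hypothetical patient: Hb 90 g/L, GGT 30 U/L, triglyceride 1.2 mmol/L,
# CHD present, elevated NT-proBNP.
risk = predicted_ph_risk(90, 30, 1.2, 1, 1)

# Exponentiating a coefficient gives the corresponding odds ratio.
odds_ratio_chd = math.exp(COEF["chd"])
```

For this hypothetical patient the sketch returns a probability of roughly 0.71, and exponentiating the CHD estimate reproduces the odds ratio of about 2.18 listed in the table.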
3.3. Development of Model
Based on the results of feature selection, three multivariate LR models were constructed:
C_U model, which included both clinical and echocardiographic features;
C model, including only clinical features;
U model, including only echocardiographic features (selected from the C_U model).
These models were developed to compare predictive performance and assess the contribution of different feature sets.
Since the goal of this study was to identify high‐risk patients using clinical information and support decisions on when to perform echocardiography, the clinical‐only (C) model was selected for nomogram development (Figure 3c).
FIGURE 3.

Nomogram and model performance evaluation for risk prediction models. Receiver‐operating characteristic curves for the training cohort (a) and validation cohort (b). (c) Nomogram model for predicting the risk of PH in CKD patients. The nomogram is used to predict the probability of PH by assigning points to each predictor feature based on its value. The total points are then calculated by summing the individual points for each feature, and the corresponding probability of PH is determined from the total points. CHD, coronary heart disease; CKD, chronic kidney disease; GGT, γ‐glutamyl transferase; NT‐proBNP, N‐terminal pro B‐type natriuretic peptide; PH, pulmonary hypertension.
3.4. Model Assessment and Validation
The areas under the receiver operating characteristic curve (AUROC) for the C, U and C_U models were 0.772 (95% CI: 0.721–0.826), 0.835 (95% CI: 0.788–0.878) and 0.853 (95% CI: 0.813–0.890), respectively (shown in Figure 3a). In the validation cohort, the AUROCs for the three models were 0.782 (95% CI: 0.696–0.854), 0.869 (95% CI: 0.816–0.923) and 0.876 (95% CI: 0.813–0.923), respectively (shown in Figure 3b).
Youden index values for the three models were as follows: 0.453 (95% CI: 0.340–0.611) for the C model, 0.594 (95% CI: 0.511–0.741) for the U model and 0.590 (95% CI: 0.501–0.733) for the C_U model.
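The Youden index is defined as J = sensitivity + specificity − 1, and the cut‐off that maximizes J is commonly taken as the optimal threshold. The sketch below shows that search on hypothetical labels and predicted probabilities; the function name and the toy data are illustrative and are not taken from the study.

```python
def youden_optimal_cutoff(y_true, y_prob):
    """Return (cutoff, J) maximising J = sensitivity + specificity - 1.

    A patient is classified as positive when the predicted probability
    is greater than or equal to the cutoff.
    """
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    best_cutoff, best_j = None, -1.0
    for t in sorted(set(y_prob)):
        tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= t)
        tn = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p < t)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:  # ties keep the lowest cutoff
            best_cutoff, best_j = t, j
    return best_cutoff, best_j

# Hypothetical data: 1 = PH, 0 = no PH.
cutoff, j = youden_optimal_cutoff([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

When several cut‐offs tie on J, this sketch keeps the lowest one, which favours sensitivity; other tie‐breaking rules are equally defensible.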
The DCA curves for the three models are shown in Figure 4a. Compared with the “treat‐all” and “treat‐none” strategies, all models provided greater net clinical benefit across a range of threshold probabilities.
FIGURE 4.

Clinical decision curve (a) and clinical impact curve (b) for model performance evaluation.
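Decision curve analysis compares strategies by their net benefit at a threshold probability pt, using the standard definition NB = TP/n − (FP/n) × pt/(1 − pt). The sketch below computes this quantity for a model and for the "treat‐all" strategy; "treat‐none" has a net benefit of zero by definition. Function names and data are hypothetical.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    weight = threshold / (1.0 - threshold)  # harm of one false positive
    return tp / n - (fp / n) * weight

def net_benefit_treat_all(y_true, threshold):
    """Net benefit when every patient is treated as positive."""
    prevalence = sum(y_true) / len(y_true)
    weight = threshold / (1.0 - threshold)
    return prevalence - (1.0 - prevalence) * weight

# Hypothetical data evaluated at a threshold probability of 0.5.
y = [1, 1, 0, 0]
p = [0.9, 0.6, 0.7, 0.2]
nb_model = net_benefit(y, p, 0.5)
nb_all = net_benefit_treat_all(y, 0.5)
```

Plotting these values over a range of thresholds yields the decision curve; a model is useful at a given threshold when its net benefit exceeds both reference strategies.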
Furthermore, for the C model, the Brier score was 0.192 in the development cohort and 0.197 in the validation cohort, indicating that the accuracy of the predicted probabilities was consistent across both datasets. The Hosmer–Lemeshow goodness‐of‐fit (GOF) test yielded a statistic of 4.638 (p = 0.795), indicating no evidence of poor fit. Together, these results suggest that the C model showed stable fit and predictive ability in both the development and validation cohorts. Based on the Youden index, the optimal probability cut‐off for identifying high‐risk patients was 46.8%. Patients with predicted probabilities above this threshold should be prioritized for further screening or clinical intervention. The CIC of the C model shows the expected benefit under different risk thresholds. As shown in Figure 4b, when the threshold exceeds approximately 0.6, the number of true positives closely aligns with the number of predicted positives, suggesting strong clinical predictive value.
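The Brier score is the mean squared difference between predicted probabilities and observed outcomes (0 or 1). A minimal sketch with hypothetical data follows; the function name is illustrative.

```python
def brier_score(y_true, y_prob):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

# Hypothetical predictions for one patient with PH and one without.
example = brier_score([1, 0], [0.8, 0.3])

# Reference point: a constant prediction of 0.5 always scores 0.25.
uninformative = brier_score([1, 0, 1, 0], [0.5, 0.5, 0.5, 0.5])
```

A score of 0 indicates perfect predictions, and a constant prediction of 0.5 yields 0.25, which gives a useful reference point when reading the values reported for the C model.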
3.5. Machine Learning
Based on the selected features, eight machine learning models were developed to predict the risk of PH in patients with CKD. The models were evaluated using AUROC, Brier scores, PR curves and DCA.
The PR curves show that LR exhibited the best performance on the validation set, with an average precision (AP) of 0.742, indicating a good balance between precision and recall (shown in Figures 5a and 5b). Although RF performed the best on the training set (AP = 0.902), its performance on the validation set was lower (AP = 0.722). AdaBoost also performed relatively well on the validation set (AP = 0.726), whereas XGBoost showed good performance on the training set but declined on the validation set (AP = 0.709).
FIGURE 5.

Precision–recall curves for eight machine learning models in the training set (a) and validation set (b). Random Forest performed the best on the training set, while Logistic Regression showed the best performance on the validation set. GNB, GaussianNB; KNN, K‐nearest neighbour; MLP, multi‐layer perceptron; SVM, support vector machine; XGBoost, eXtreme Gradient Boosting.
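Average precision summarizes a PR curve as the sum of precision values weighted by the increase in recall at each ranked positive. The sketch below implements this step‐wise definition on hypothetical scores; it assumes there are no tied scores, and the source does not state which software routine was used to compute the reported AP values.

```python
def average_precision(y_true, y_score):
    """Step-wise average precision; assumes no tied scores."""
    ranked = sorted(zip(y_score, y_true), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(y_true)
    tp = 0
    ap = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            tp += 1
            # precision at this rank, weighted by the recall step 1 / n_pos
            ap += (tp / rank) / n_pos
    return ap

# Hypothetical data: positives are ranked first and third.
ap = average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])
```

In this toy example the two positives contribute precisions of 1 and 2/3, so the average precision is 5/6.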
The performance metrics of the models in the training and validation sets are shown in Tables 3 and 4. Figure 6 shows the ROC curves for the machine learning models on both sets. On the training set, RF and KNN achieved the highest AUCs (0.906 and 0.893, respectively; Table 3), and XGBoost reached 0.813 (95% CI: 0.756–0.869). On the validation set, however, their AUCs dropped to 0.738, 0.720 and 0.721, respectively (Table 4), likely due to overfitting. In contrast, LR performed the best on the validation set, with an AUC of 0.766 (95% CI: 0.703–0.828) that was comparable to its training‐set AUC (0.775, 95% CI: 0.715–0.836). This stability suggests that LR may be the most reliable model. Further analysis of the LR model in the testing set yielded an AUC of 0.742 (95% CI: 0.657–0.828), with an accuracy of 59.1%, a sensitivity of 50%, a specificity of 70% and an F1 score of 0.571.
TABLE 3.
Performance metrics for eight models in the training set.
| Model | AUC (95% CI) | Accuracy (95% CI) | Sensitivity (95% CI) | Specificity (95% CI) | F1 score (95% CI) |
|---|---|---|---|---|---|
| XGBoost | 0.813 (0.756–0.869) | 0.745 (0.745–0.745) | 0.709 (0.638–0.780) | 0.782 (0.711–0.853) | 0.735 (0.716–0.755) |
| Logistic | 0.775 (0.715–0.836) | 0.705 (0.705–0.705) | 0.723 (0.411–1.000) | 0.686 (0.375–0.998) | 0.703 (0.611–0.794) |
| RandomForest | 0.906 (0.866–0.946) | 0.823 (0.814–0.832) | 0.768 (0.741–0.795) | 0.877 (0.868–0.886) | 0.812 (0.799–0.825) |
| AdaBoost | 0.795 (0.736–0.853) | 0.745 (0.737–0.754) | 0.755 (0.576–0.933) | 0.736 (0.576–0.897) | 0.745 (0.694–0.797) |
| GNB | 0.757 (0.694–0.820) | 0.686 (0.677–0.695) | 0.723 (0.447–0.999) | 0.650 (0.392–0.908) | 0.691 (0.603–0.780) |
| MLP | 0.708 (0.641–0.776) | 0.648 (0.634–0.661) | 0.682 (0.628–0.735) | 0.614 (0.533–0.694) | 0.659 (0.650–0.668) |
| SVM | 0.774 (0.712–0.835) | 0.716 (0.703–0.729) | 0.750 (0.420–1.000) | 0.682 (0.379–0.985) | 0.717 (0.618–0.817) |
| KNN | 0.893 (0.850–0.935) | 0.823 (0.814–0.832) | 0.718 (0.700–0.736) | 0.927 (0.892–0.963) | 0.802 (0.798–0.806) |
Abbreviations: GNB, GaussianNB; KNN, K‐nearest neighbour; MLP, multi‐layer perceptron; SVM, support vector machine; XGBoost, eXtreme Gradient Boosting.
TABLE 4.
Performance metrics for eight models in the validation set.
| Model | AUC (95% CI) | Accuracy (95% CI) | Sensitivity (95% CI) | Specificity (95% CI) | F1 score (95% CI) |
|---|---|---|---|---|---|
| XGBoost | 0.721 (0.654–0.788) | 0.648 (0.608–0.688) | 0.614 (0.337–0.890) | 0.682 (0.486–0.878) | 0.627 (0.494–0.760) |
| Logistic | 0.766 (0.703–0.828) | 0.695 (0.695–0.695) | 0.718 (0.504–0.932) | 0.673 (0.459–0.887) | 0.699 (0.636–0.762) |
| RandomForest | 0.738 (0.672–0.804) | 0.639 (0.572–0.705) | 0.491 (0.277–0.705) | 0.786 (0.706–0.867) | 0.569 (0.416–0.722) |
| AdaBoost | 0.743 (0.678–0.807) | 0.689 (0.657–0.720) | 0.682 (0.646–0.717) | 0.695 (0.597–0.793) | 0.687 (0.676–0.697) |
| GNB | 0.749 (0.685–0.813) | 0.680 (0.675–0.684) | 0.718 (0.629–0.807) | 0.641 (0.543–0.739) | 0.691 (0.667–0.714) |
| MLP | 0.710 (0.643–0.778) | 0.645 (0.645–0.645) | 0.673 (0.637–0.708) | 0.618 (0.583–0.654) | 0.655 (0.643–0.667) |
| SVM | 0.749 (0.685–0.813) | 0.666 (0.644–0.688) | 0.691 (0.406–0.976) | 0.641 (0.400–0.881) | 0.667 (0.560–0.774) |
| KNN | 0.720 (0.654–0.786) | 0.645 (0.592–0.699) | 0.432 (0.298–0.565) | 0.859 (0.832–0.886) | 0.546 (0.432–0.660) |
Abbreviations: GNB, GaussianNB; KNN, K‐nearest neighbour; MLP, multi‐layer perceptron; SVM, support vector machine; XGBoost, eXtreme Gradient Boosting.
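The threshold‐dependent metrics in Tables 3 and 4 all derive from the confusion matrix. The sketch below shows the standard definitions using hypothetical counts; the function name and the numbers are illustrative and do not reproduce any row of the tables.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity and F1 from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # recall: share of PH cases detected
    specificity = tn / (tn + fp)   # share of non-PH cases correctly ruled out
    precision = tp / (tp + fp)     # share of predicted positives that are PH
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }

# Hypothetical counts for 20 patients (10 with PH, 10 without).
metrics = classification_metrics(tp=5, fp=3, tn=7, fn=5)
```

Because F1 depends on precision, it varies with the class balance of the evaluated set even when sensitivity and specificity stay fixed.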
FIGURE 6.

ROC curves for the training set (a) and validation set (b), and forest plot of AUC scores (c) for eight machine learning models. XGBoost performed well on the training set but showed a marked decline in performance on the validation set. Logistic regression exhibited the best performance on the validation set. GNB, GaussianNB; KNN, K‐nearest neighbour; MLP, multi‐layer perceptron; SVM, support vector machine; XGBoost, eXtreme Gradient Boosting.
Calibration curves of different models are shown in Figure 7a. Among them, XGBoost, LR and SVM demonstrated good calibration, with predicted probabilities closely aligned with the ideal calibration line. Figure 7b shows the DCA results. All eight models provided better clinical net benefit across various thresholds compared to the “Treat None” and “Treat All” strategies, highlighting their practical value in decision‐making.
FIGURE 7.

Calibration curve (a) and decision curve analysis (b) for eight machine learning models. GNB, GaussianNB; KNN, K‐nearest neighbour; MLP, multi‐layer perceptron; SVM, support vector machine; XGBoost, eXtreme Gradient Boosting.
SHAP analysis was applied to interpret the model. Figure 8a ranks predictors by their mean absolute SHAP value, and Figure 8b depicts how individual feature values shift the predicted risk: positive SHAP values (right of zero) raise the predicted probability of PH, whereas negative values (left of zero) lower it.
FIGURE 8.

SHAP‐based analysis of the logistic regression model. (a) Features ranked by mean absolute SHAP value, reflecting their overall contribution to the model. (b) SHAP summary plot illustrating the direction and magnitude of each feature's impact on the predicted probability of pulmonary hypertension. Positive SHAP values indicate features that increase the predicted probability of pulmonary hypertension, whereas negative values indicate a protective effect; larger absolute values denote a stronger influence. CHD, coronary heart disease; GGT, γ‐glutamyl transferase; NT‐proBNP, N‐terminal pro B‐type natriuretic peptide; SHAP, SHapley Additive exPlanations.
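For a linear model such as logistic regression, SHAP values have a closed form on the log‐odds scale when features are treated as independent: the contribution of feature j is its coefficient multiplied by the difference between the patient's value and the background mean. The sketch below illustrates this with two rounded coefficients from Table 2 and a hypothetical two‐patient background set. The source does not state which SHAP explainer or output scale was used, so this is an illustration of the idea and not a reproduction of Figure 8.

```python
def linear_shap_values(coefs, x, background):
    """SHAP values on the log-odds scale for a linear model.

    Assumes independent features: phi_j = beta_j * (x_j - mean_j),
    where mean_j is the mean of feature j in the background data.
    """
    means = {
        name: sum(row[name] for row in background) / len(background)
        for name in coefs
    }
    return {name: coefs[name] * (x[name] - means[name]) for name in coefs}

# Two rounded coefficients from Table 2; background rows are hypothetical.
coefs = {"triglyceride": -0.60, "nt_probnp": 1.79}
background = [
    {"triglyceride": 1.0, "nt_probnp": 0},
    {"triglyceride": 2.0, "nt_probnp": 1},
]
patient = {"triglyceride": 1.0, "nt_probnp": 1}
phi = linear_shap_values(coefs, patient, background)
```

The contributions sum to the difference between the patient's linear predictor and the average linear predictor of the background set, which is the additivity property that SHAP summary plots rely on.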
4. Discussion
In this study, we developed a model to assess the risk of PH in CKD patients. The model incorporates several features, including haemoglobin, GGT, triglycerides, CHD and NT‐proBNP. Additionally, the LR model performed well, demonstrating good discriminative ability, accuracy and clinical usefulness in both the training and validation sets. According to our model, when the predicted risk exceeds 46.8%, patients may benefit from further screening and clinical intervention.
This study found that the prevalence of PH in patients with CKD was 20.33%. Two meta‐analyses reported PH prevalence rates of 23% and 33% in patients with CKD stages 1–5 or kidney failure on dialysis [22, 23]. Epidemiological data obtained through RHC indicated a PH prevalence of 68% in patients with CKD stages 3–5, while the prevalence of PH in kidney failure patients listed for transplantation ranged from 59% to 78% [5, 24, 25]. These findings suggest that PH is relatively common in the CKD population.
Several earlier studies have identified factors associated with PH in patients with CKD, including race (e.g., Black individuals), comorbidities (such as chronic obstructive pulmonary disease and cardiovascular disease), history of dialysis and cardiac dysfunction [26, 27, 28]. In our study, we identified cardiac function‐related risk factors, specifically CHD and elevated NT‐proBNP.
Interestingly, we observed that triglyceride levels in PH‐CKD patients were lower than those in non‐PH CKD patients, a finding consistent with previous studies [29]. This may be related to the nutritional status of the patients. Low haemoglobin levels are a potential risk factor for PH, possibly because they reduce the oxygen transport capacity of the blood. This reduction leads to hypoxemia, which subsequently increases heart rate and cardiac output and induces pulmonary vasoconstriction, ultimately contributing to the development of PH [8, 30]. Although limited studies have examined GGT in patients with both PH and CKD, recent evidence suggests that it has prognostic value in PH [31]. Elevated GGT levels are thought to reflect hepatic congestion secondary to right heart dysfunction, as well as systemic oxidative stress. The latter is a key shared pathophysiological mechanism linking PH and CKD [32, 33, 34]. Therefore, GGT may serve as an integrated biomarker, although its precise role requires further investigation.
This study has several advantages. We used machine learning algorithms to develop a risk prediction model for PH‐CKD. These techniques help clinicians identify key predictors and better understand the underlying mechanisms of disease development [35]. We employed more stringent feature selection methods, including LASSO and univariate and multivariate LR, enabling us to develop a more streamlined model that is better suited for rapid clinical screening. Compared to previous studies, the number of predictive features in our model was significantly reduced, yet its performance remained comparable [30]. To date, there is no clear recommendation on when echocardiographic screening for PH should be conducted in patients with CKD [3]. Our model requires only clinical and biochemical features to assess the risk of PH in CKD patients, making it suitable for use in the follow‐up of CKD patients. When a patient's risk exceeds the cut‐off value, further echocardiographic examinations can be conducted for confirmation.
This study has several limitations. First, the cohort size in this study was relatively small. Although the sample size met the requirements of the power calculation, validating the model in larger patient populations would be beneficial. Second, patients without echocardiographic data or with missing key variables were excluded from the analysis. No formal comparison was made between included and excluded individuals, which may introduce selection bias. Third, because the present analysis was retrospective and not all eligible CKD patients underwent standardized echocardiography and uniform laboratory testing, a prospective study in which these assessments are performed in every participant is needed to confirm and extend our findings.
5. Conclusion
We used machine learning algorithms to develop a model that draws on clinical and biochemical features to assess PH risk in individual CKD patients. The model performs well and provides a practical tool for PH risk assessment during follow‐up. Patients whose predicted risk exceeds the cut‐off value may benefit from further diagnostic evaluation.
Author Contributions
Wen Gu: conceptualization, formal analysis, methodology, validation, writing – original draft, writing – review and editing. Lingling Li: conceptualization, data curation, investigation, resources, writing – original draft. Ashfaq Ahmad: data curation, investigation, resources. Jing Lv: funding acquisition, project administration, supervision. Songling Zhang: data curation, resources. Yajuan Du: data curation, resources, visualization. Jite Shi: project administration, software, funding acquisition. Yiming Ding: data curation, resources. Ting Liu: data curation, resources. Fenling Fan: conceptualization, funding acquisition, project administration, supervision, writing – review and editing. All authors reviewed and approved the final version of the manuscript and agree to be accountable for all aspects of the work.
Ethics Statement
This study was approved by the Ethics Committee of the First Affiliated Hospital of Xi'an Jiaotong University (application ID: XJTU1AF2024LSYY‐467). All methods were performed in accordance with the relevant guidelines and regulations or the Declaration of Helsinki.
Consent
All subjects provided written informed consent after the experimental procedures were fully explained.
Conflicts of Interest
The authors declare no conflicts of interest.
Supporting information
Supporting Table 1: Comparison of features between the PH and Non‐PH groups.
Acknowledgments
We are grateful to the well‐trained echocardiography experts at the First Affiliated Hospital of Xi'an Jiaotong University, who are involved in several international and national clinical trials.
Gu W., Li L., Ahmad A., et al. “A Machine Learning–Based Model to Estimate the Risk of Pulmonary Hypertension in Chronic Kidney Disease Patients.” The Journal of Clinical Hypertension 27, no. 9 (2025): e70132. 10.1111/jch.70132
Wen Gu and Lingling Li contributed equally to this work.
Funding: This work was supported by the National Natural Science Foundation of China (Grant No. 82270057), the Clinical Research Award of the First Affiliated Hospital of Xi'an Jiaotong University (Grant No. XJTU1AF‐CRF‐2019‐010), the Fundamental Project Plan in Shaanxi Province (Grant No. 2020JM‐364) and the Key Research and Development Plan of Xianyang City (Grant No. L2023‐ZDYF‐SF‐020).
Data Availability Statement
The datasets generated for this study are available on request to the corresponding author.
References
- 1. Bolignano D., Lennartz S., Leonardis D., et al., “High Estimated Pulmonary Artery Systolic Pressure Predicts Adverse Cardiovascular Outcomes in Stage 2–4 Chronic Kidney Disease,” Kidney International 88, no. 1 (2015): 130–136.
- 2. Navaneethan S. D., Roy J., Tao K., et al., “Prevalence, Predictors, and Outcomes of Pulmonary Hypertension in CKD,” Journal of the American Society of Nephrology 27, no. 3 (2016): 877–886.
- 3. Zeder K., Siew E. D., Kovacs G., Brittain E. L., and Maron B. A., “Pulmonary Hypertension and Chronic Kidney Disease: Prevalence, Pathophysiology and Outcomes,” Nature Reviews Nephrology 20, no. 11 (2024): 742–754.
- 4. Hoeper M. M., Bogaard H. J., Condliffe R., et al., “Definitions and Diagnosis of Pulmonary Hypertension,” Journal of the American College of Cardiology 62, no. S25 (2013): D42–D50.
- 5. Edmonston D. L., Parikh K. S., Rajagopal S., et al., “Pulmonary Hypertension Subtypes and Mortality in CKD,” American Journal of Kidney Diseases 75, no. 5 (2020): 713–724.
- 6. McQuillan B. M., Picard M. H., Leavitt M., and Weyman A. E., “Clinical Correlates and Reference Intervals for Pulmonary Artery Systolic Pressure Among Echocardiographically Normal Subjects,” Circulation 104, no. 23 (2001): 2797–2802.
- 7. Humbert M., Kovacs G., Hoeper M. M., et al., “2022 ESC/ERS Guidelines for the Diagnosis and Treatment of Pulmonary Hypertension,” European Respiratory Journal 61, no. 1 (2023): 2200879.
- 8. Bolignano D., Rastelli S., Agarwal R., et al., “Pulmonary Hypertension in CKD,” American Journal of Kidney Diseases 61, no. 4 (2013): 612–622.
- 9. Knoll G., Cockfield S., Blydt‐Hansen T., et al., “Canadian Society of Transplantation: Consensus Guidelines on Eligibility for Kidney Transplantation,” CMAJ 173, no. 10 (2005): S1–S25.
- 10. Abramowicz D., Cochat P., Claas F. H., et al., “European Renal Best Practice Guideline on Kidney Donor and Recipient Evaluation and Perioperative Care,” Nephrology, Dialysis, Transplantation 30, no. 11 (2015): 1790–1797.
- 11. Chadban S. J., Ahn C., Axelrod D. A., et al., “KDIGO Clinical Practice Guideline on the Evaluation and Management of Candidates for Kidney Transplantation,” Transplantation 104, no. 4S1 (2020): S11–S103.
- 12. Rajkomar A., Dean J., and Kohane I., “Machine Learning in Medicine,” New England Journal of Medicine 380, no. 14 (2019): 1347–1358.
- 13. Tomašev N., Glorot X., Rae J. W., et al., “A Clinically Applicable Approach to Continuous Prediction of Future Acute Kidney Injury,” Nature 572, no. 7767 (2019): 116–119.
- 14. Inker L. A., Astor B. C., Fox C. H., et al., “KDOQI US Commentary on the 2012 KDIGO Clinical Practice Guideline for the Evaluation and Management of CKD,” American Journal of Kidney Diseases 63, no. 5 (2014): 713–735.
- 15. Li M., Tang M., Zhao C., et al., “Prognostic Potential of Pulmonary Hypertension in Patients With Hematologic Malignancy,” Advances in Therapy 40, no. 11 (2023): 4792–4804.
- 16. Rørth R., Jhund P. S., Yilmaz M. B., et al., “Comparison of BNP and NT‐proBNP in Patients With Heart Failure and Reduced Ejection Fraction,” Circulation: Heart Failure 13, no. 2 (2020): e006541.
- 17. Peduzzi P., Concato J., Kemper E., Holford T. R., and Feinstein A. R., “A Simulation Study of the Number of Events per Variable in Logistic Regression Analysis,” Journal of Clinical Epidemiology 49, no. 12 (1996): 1373–1379.
- 18. Tibshirani R., “The Lasso Method for Variable Selection in the Cox Model,” Statistics in Medicine 16, no. 4 (1997): 385–395.
- 19. Rufibach K., “Use of Brier Score to Assess Binary Predictions,” Journal of Clinical Epidemiology 63, no. 8 (2010): 938–939. author reply 39.
- 20. Zhang Z., Rousson V., Lee W. C., et al., “Decision Curve Analysis: A Technical Note,” Annals of Translational Medicine 6, no. 15 (2018): 308.
- 21. Lundberg S. M., Erion G., Chen H., et al., “From Local Explanations to Global Understanding With Explainable AI for Trees,” Nature Machine Intelligence 2, no. 1 (2020): 56–67.
- 22. Tang M., Batty J. A., Lin C., Fan X., Chan K. E., and Kalim S., “Pulmonary Hypertension, Mortality, and Cardiovascular Disease in CKD and ESRD Patients: A Systematic Review and Meta‐Analysis,” American Journal of Kidney Diseases 72, no. 1 (2018): 75–83.
- 23. Bolignano D., Pisano A., Coppolino G., Tripepi G. L., and D'Arrigo G., “Pulmonary Hypertension Predicts Adverse Outcomes in Renal Patients: A Systematic Review and Meta‐Analysis,” Therapeutic Apheresis and Dialysis: Official Peer‐Reviewed Journal of the International Society for Apheresis, the Japanese Society for Apheresis, the Japanese Society for Dialysis Therapy 23, no. 4 (2019): 369–384.
- 24. Pabst S., Hammerstingl C., Hundt F., et al., “Pulmonary Hypertension in Patients With Chronic Kidney Disease on Dialysis and Without Dialysis: Results of the PEPPER‐Study,” PLoS One 7, no. 4 (2012): e35310.
- 25. Wolfe J. D., Hickey G. W., Althouse A. D., et al., “Pulmonary Vascular Resistance Determines Mortality in End‐Stage Renal Disease Patients With Pulmonary Hypertension,” Clinical Transplantation 32, no. 6 (2018): e13270.
- 26. Shang W., Li Y., Ren Y., Li W., Wei H., and Dong J., “Prevalence of Pulmonary Hypertension in Patients With Chronic Kidney Disease Without Dialysis: A Meta‐Analysis,” International Urology and Nephrology 50, no. 8 (2018): 1497–1504.
- 27. Wang N., Guo Z., Gong X., Kang S., Cui Z., and Yuan Y., “A Nomogram for Predicting the Risk of Pulmonary Hypertension for Patients With Chronic Obstructive Pulmonary Disease,” International Journal of General Medicine 15 (2022): 5751–5762.
- 28. Lin C., Ge Q., Wang L., Zeng P., Huang M., and Li D., “Predictors, Prevalence and Prognostic Role of Pulmonary Hypertension in Patients With Chronic Kidney Disease: A Systematic Review and Meta‐Analysis,” Renal Failure 46, no. 2 (2024): 2368082.
- 29. Genctoy G., Arikan S., and Eldem O., “Pulmonary Hypertension Associates With Malnutrition and Body Composition Hemodialysis Patients,” Renal Failure 37, no. 2 (2015): 273–279.
- 30. Hu Y., Wang X., Xiao S., et al., “Development and Validation of a Risk Nomogram Model for Predicting Pulmonary Hypertension in Patients With Stage 3–5 Chronic Kidney Disease,” International Urology and Nephrology 55, no. 5 (2023): 1353–1363.
- 31. Yogeswaran A., Tello K., Lund J., et al., “Risk Assessment in Pulmonary Hypertension Based on Routinely Measured Laboratory Parameters,” Journal of Heart and Lung Transplantation 41, no. 3 (2022): 400–410.
- 32. Whitfield J. B., “Gamma Glutamyl Transferase,” Critical Reviews in Clinical Laboratory Sciences 38, no. 4 (2001): 263–355.
- 33. Förstermann U. and Sessa W. C., “Nitric Oxide Synthases: Regulation and Function,” European Heart Journal 33, no. 7 (2012): 829–837. 37a‐37d.
- 34. Nickel N. P., Galura G. M., Zuckerman M. J., et al., “Liver Abnormalities in Pulmonary Arterial Hypertension,” Pulmonary Circulation 11, no. 4 (2021): 20458940211054304.
- 35. Ngiam K. Y. and Khor I. W., “Big Data and Machine Learning Algorithms for Health‐Care Delivery,” Lancet Oncology 20, no. 5 (2019): e262–e273.
