A Multimodal Prediction Model for Diagnosing Pulmonary Hypertension in Systemic Sclerosis

Justin K Lui; Kari R Gillmeyer; Ruchika A Sangani; Robert J Smyth; Deepa M Gopal; Marcin A Trojanowski; Andreea M Bujor; Renda Soylemez Wiener; Michael P LaValley; Elizabeth S Klings

doi:10.1002/acr.24969

Author manuscript; available in PMC: 2024 Jul 1.

Published in final edited form as: Arthritis Care Res (Hoboken). 2023 Jan 18;75(7):1462–1468. doi: 10.1002/acr.24969

PMCID: PMC9732142 NIHMSID: NIHMS1814768 PMID: 35678779

Abstract

Objective:

Diagnosis of pulmonary hypertension (PH) in systemic sclerosis (SSc) requires invasive right heart catheterization (RHC), and referral for RHC is often prompted by an elevated estimated pulmonary artery systolic pressure on screening echocardiography. However, because echocardiography has poor specificity, more patients undergo RHC than necessary, exposing them to potentially avoidable procedural risks. Improved prediction models for PH in SSc may better inform decisions about RHC in these patients.

Methods:

We conducted a retrospective study of 130 patients with SSc, 66 (50.8%) of whom were diagnosed with PH by RHC. Using data from pulmonary function testing, electrocardiography, echocardiography, and computed tomography, we developed and compared three models predicting the presence of PH: 1) random forest; 2) classification and regression tree; and 3) logistic regression. For each model, we generated receiver operating characteristic curves and calculated sensitivity and specificity. We internally validated the models using a train-test split of the data.

Results:

The random forest model performed best with an area under the curve of 0.92 (95% CI: 0.83, 1.00), sensitivity of 0.95 (95% CI: 0.75, 1.00), and specificity of 0.80 (95% CI: 0.56, 0.94). The two most important variables in our random forest model were pulmonary artery diameter on chest computed tomography and diffusing capacity for carbon monoxide on pulmonary function testing.

Conclusions:

In patients with SSc, a random forest model can aid in the detection of PH with high sensitivity and specificity, and may allow for better patient selection for RHC, thereby minimizing patient risk.

Keywords: Machine learning, receiver operating characteristic

Introduction

Pulmonary hypertension (PH) affects 8 to 12% of patients with systemic sclerosis (SSc) and is a leading cause of morbidity and mortality [1, 2]. Diagnosis of PH requires measurement of a mean pulmonary artery pressure (mPAP) > 20 mmHg by invasive right heart catheterization (RHC) [3]. The decision to proceed with RHC often hinges on an estimated pulmonary artery systolic pressure (ePASP) on noninvasive echocardiography that exceeds a threshold of 35 to 40 mmHg [4, 5]. However, reported estimates of the sensitivity and specificity of ePASP for diagnosing PH vary widely, making it an inadequate screening tool on its own [5]. Its sensitivity of 0.88 (95% CI: 0.84, 0.92) allows false negatives that can delay the diagnosis of PH, and its specificity of 0.56 (95% CI: 0.46, 0.66) produces a high false-positive rate, resulting in more RHCs than necessary, with increased costs and potentially avoidable complications. Assessment of ePASP also requires a sufficient tricuspid regurgitant jet velocity, which is absent in about a quarter of patients [6]. Furthermore, ePASP depends on technique and on certain underlying pathologies: it may be underestimated because of Doppler beam misalignment or severe tricuspid regurgitation, and overestimated in pulmonic stenosis, with excessive Doppler gain, in high cardiac output states, and with arrhythmias [7]. Indeed, ePASP has been inaccurate in nearly half of patients [8] and is considered an unreliable screening tool for PH.
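To make the consequence of a low specificity concrete, the Python sketch below applies the sensitivity (0.88) and specificity (0.56) quoted above to a hypothetical screened cohort. The cohort size of 1,000 and the PH prevalence of 10% (chosen from within the 8 to 12% range above) are assumptions for illustration only, and `screening_outcomes` is a hypothetical helper name.

```python
def screening_outcomes(n, prevalence, sensitivity, specificity):
    """Expected confusion-matrix counts when a screening test is applied to n patients."""
    diseased = n * prevalence
    healthy = n - diseased
    tp = diseased * sensitivity   # patients with PH correctly flagged
    fn = diseased - tp            # patients with PH missed
    tn = healthy * specificity    # patients without PH correctly not flagged
    fp = healthy - tn             # patients without PH flagged anyway
    referred = tp + fp            # assumption: every flagged patient is referred for RHC
    return {
        "tp": tp, "fn": fn, "fp": fp, "tn": tn,
        "referred": referred,
        "ppv": tp / referred,     # probability that a referred patient truly has PH
        "npv": tn / (tn + fn),
    }


# Assumed cohort: 1,000 patients with 10% prevalence; accuracy figures from the text above.
res = screening_outcomes(1000, 0.10, 0.88, 0.56)
for key in ("tp", "fn", "fp", "referred"):
    print(key, round(res[key]))
print("PPV", round(res["ppv"], 2))
```

Under these assumptions, roughly 484 of 1,000 patients would be referred for RHC, about 396 of whom (82%) would not have PH, while about 12 patients with PH would still be missed.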

An alternative screening approach is a multimodal strategy that combines commonly available, noninvasively obtained variables in a prediction model for PH. For example, Schreiber et al. combined oximetry and diffusing capacity for carbon monoxide (D_LCO) on pulmonary function testing (PFT) in a multivariable linear regression model to derive a predicted mPAP [9]. Using a predicted mPAP threshold of 25 mmHg, the model had a sensitivity of 0.90 and a specificity of 0.29 for the diagnosis of PH [9]. Similarly, the two-step DETECT algorithm applied multivariable logistic regression to data from PFTs, serum markers, electrocardiography (ECG), and echocardiography [1]. Although the algorithm had a high sensitivity of 0.96, its low specificity of 0.48 resulted in a substantial rate of RHC referral, as high as 62% among patients with SSc [10]. Thus, like ePASP on echocardiography, current multimodal prediction models have an unacceptably high false-positive rate.

Machine learning algorithms have recently been applied to improve the diagnosis of cardiovascular diseases [11]. Decision trees, including the classification and regression tree (CART), are among the most common machine learning approaches. Their predictions can be further improved by aggregating many decision trees into an ensemble learning method termed a random forest [12]. Random forests are recognized for their accuracy and have performed well even in settings in which the number of variables exceeds the number of observations [12], making them well suited to the small datasets typical of rare diseases such as SSc. Herein, we explored the use of a random forest model to predict PH in patients with SSc using multimodal data from: 1) PFT; 2) ECG; 3) echocardiography; and 4) computed tomography (CT), and compared its performance characteristics with those of a logistic regression model and a single CART model. We hypothesized that a random forest approach would predict PH in patients with SSc with high sensitivity, specificity, and accuracy.
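The ensemble idea can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the 'randomForest' package: each "tree" here is a single-split stump chosen by Gini impurity, whereas a random forest grows deep trees and redraws its `mtry` candidate variables at every split (for a one-split stump, per-split and per-tree sampling coincide). It does show the two ingredients that distinguish a random forest from a single tree, namely bootstrap resampling of patients and random selection of candidate variables, with predictions aggregated by vote. The data are synthetic and all function names are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels (0 = no PH, 1 = PH)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def majority(labels):
    """Most common label (ties broken by first occurrence)."""
    return Counter(labels).most_common(1)[0][0]


def best_stump(X, y, features):
    """Single split (feature, threshold) among `features` minimizing weighted Gini impurity."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [label for row, label in zip(X, y) if row[f] <= t]
            right = [label for row, label in zip(X, y) if row[f] > t]
            if not right:  # threshold at the maximum value leaves the right side empty
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t, majority(left), majority(right))
    return best


def fit_forest(X, y, ntree=100, mtry=1, seed=0):
    """Bagged stumps: each sees a bootstrap sample and a random subset of mtry features."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    forest = []
    for _ in range(ntree):
        idx = [rng.randrange(n) for _ in range(n)]  # sample n patients with replacement
        features = rng.sample(range(p), mtry)       # candidate variables for this split
        stump = best_stump([X[i] for i in idx], [y[i] for i in idx], features)
        if stump is not None:                       # skip samples with no valid split
            forest.append(stump[1:])
    return forest


def predict_proba(forest, row):
    """Fraction of stumps voting for class 1 (PH)."""
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return sum(votes) / len(votes)


# Synthetic data: column 0 is informative, column 1 is noise.
X = [[24, 5], [25, 3], [26, 8], [27, 1], [29, 6], [30, 2], [31, 7], [33, 4]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
forest = fit_forest(X, y, ntree=200, mtry=1)
print(predict_proba(forest, [34, 4]), predict_proba(forest, [22, 4]))
```

With `mtry` equal to the total number of variables, the procedure reduces to plain bagging; a smaller `mtry` decorrelates the trees, which is the motivation for random forests.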

Methods

Study Population

We conducted a retrospective, single-center study of patients with SSc enrolled in the Scleroderma Center of Research Translation (CORT) clinical registry database at Boston Medical Center/Boston University School of Medicine. Patients provided consent for inclusion in this longitudinal clinical registry at the time of enrollment. The study was approved by the Institutional Review Board at Boston University School of Medicine. We included patients with SSc who underwent the following assessments between January 1, 2003 and May 31, 2021: 1) PFT; 2) ECG; 3) echocardiography; and 4) CT chest imaging, with each evaluation performed within 3 years of the others. This interval was selected to mirror usual clinical practice, as many patients did not have all four assessments completed simultaneously; the mean time interval between the four assessments was 11.5 months, and 81 (62.3%) patients had all four assessments performed within 1 year of each other. We defined PH as a resting mPAP > 20 mmHg on RHC [3]. We did not incorporate pulmonary vascular resistance (PVR) into our inclusion criteria because we recognized that hemodynamics may still be abnormal in patients with a PVR < 3 Wood units. For patients with SSc-PH, we included assessments performed from 3 years before to 1 month after RHC, selecting the assessments closest in time to the RHC. The mean time interval between RHC and assessments was 9.9 months, and 42 (63.6%) patients had all assessments performed within 1 year of RHC.
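The timing rules above can be expressed as a small eligibility check. This Python sketch is hypothetical: the function names, the modality labels, and the day-count approximations (3 years as 1,095 days, 1 month as 30 days) are assumptions, not the study's code. It restricts each modality to the window from 3 years before to 1 month after RHC, keeps the assessment closest to the RHC date, and checks that the selected assessments lie within 3 years of each other.

```python
from datetime import date, timedelta

# Assumptions: "3 years" approximated as 3 * 365 days, "1 month" as 30 days.
THREE_YEARS = timedelta(days=3 * 365)
ONE_MONTH = timedelta(days=30)


def within_window(dates, window=THREE_YEARS):
    """True if every pair of assessment dates is at most `window` apart."""
    return max(dates) - min(dates) <= window


def select_for_ph(candidates, rhc_date):
    """For each modality, keep dates from 3 years before to 1 month after RHC and pick
    the one closest to RHC. Returns None if any modality has no eligible assessment."""
    selected = {}
    for modality, dates in candidates.items():
        eligible = [d for d in dates if rhc_date - THREE_YEARS <= d <= rhc_date + ONE_MONTH]
        if not eligible:
            return None
        selected[modality] = min(eligible, key=lambda d: abs(d - rhc_date))
    return selected


# Hypothetical patient with SSc-PH.
rhc = date(2019, 6, 1)
candidates = {
    "PFT": [date(2015, 1, 10), date(2018, 11, 3), date(2019, 6, 20)],
    "ECG": [date(2019, 2, 1)],
    "Echocardiography": [date(2019, 5, 15)],
    "CT": [date(2017, 8, 30)],
}
chosen = select_for_ph(candidates, rhc)
print(chosen, within_window(list(chosen.values())))
```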

Variable Selection

We included the following variables in the prediction models: 1) PFT variables, including forced vital capacity (FVC) and D_LCO (as individual variables and, separately, as the FVC/D_LCO ratio); 2) ECG variables, including left axis deviation (LAD), right axis deviation (RAD), and right bundle branch block (RBBB) [13]; 3) echocardiography variables, including aortic regurgitation, mitral regurgitation, tricuspid regurgitation, pulmonic regurgitation, left ventricular ejection fraction (LVEF), left ventricular diastolic dysfunction, left ventricular hypertrophy, left atrial (LA) dilation, right atrial (RA) dilation, right ventricular (RV) dilation, RV dysfunction, and pericardial effusion; and 4) pulmonary artery (PA) diameter on CT chest imaging, measured at the level of the bifurcation of the main PA in the axial plane [14].

Statistical Analysis and Evaluation of Models

R version 4.1.2 (R Foundation for Statistical Computing, Vienna, Austria) was used for all analyses. First, we randomly divided the data into a training split of 90 (~70%) patients and a test split of 40 (~30%) patients, similar to previous studies [11], and developed our prediction models using only the training split. We constructed the random forest using the ‘randomForest’ package, in which variable importance was determined by the mean decrease in the Gini impurity index, a measure of how mixed the two outcome classes are within a node [15]. To determine mtry, the number of variables randomly sampled as candidates at each split, and ntree, the number of trees to grow, we applied a grid search, varying mtry from 2 to 10 and ntree from 1000 to 10000. The optimal mtry and ntree were selected based on model performance in the test split of the data. For the minimum size of terminal nodes, we used the package defaults of 1 for classification and 5 for regression. For model comparison, we generated a single CART with the ‘tree’ package, which used the same variables and applied the default splitting criterion of deviance to split each node into separate groups. We used the default values for the minimum number of observations in either child node (5) and for the smallest allowed node size (10). We then developed a logistic regression model with bidirectional stepwise variable selection using the Akaike Information Criterion (AIC). We internally validated all models in the test split of the data using the area under the receiver operating characteristic curve (AUC). We calculated 95% confidence intervals (CI) of the AUC from the default of 2000 bootstrap samples of the test split with the ‘pROC’ package. Lastly, we determined the sensitivity, specificity, and accuracy of each model against the reference standard of mPAP > 20 mmHg on RHC.
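The AUC used for internal validation equals the probability that a randomly chosen patient with PH receives a higher predicted score than a randomly chosen patient without PH, with ties counted as one half (the Mann-Whitney interpretation). The Python sketch below computes it directly and attaches a percentile confidence interval from 2000 bootstrap resamples, resampling patients with and without PH separately (stratified). It is an independent sketch in the spirit of the bootstrap procedure described above, not pROC's implementation, and the scores in the example are hypothetical.

```python
import random


def auc(pos_scores, neg_scores):
    """Mann-Whitney AUC: share of (PH, no-PH) pairs ranked correctly; ties count one half."""
    wins = 0.0
    for sp in pos_scores:
        for sn in neg_scores:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def bootstrap_auc_ci(pos_scores, neg_scores, n_boot=2000, alpha=0.05, seed=1):
    """Percentile CI for the AUC; cases and controls are resampled separately (stratified)."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        boot_pos = [rng.choice(pos_scores) for _ in pos_scores]
        boot_neg = [rng.choice(neg_scores) for _ in neg_scores]
        stats.append(auc(boot_pos, boot_neg))
    stats.sort()
    lower = stats[round(alpha / 2 * n_boot)]
    upper = stats[round((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical predicted probabilities for test-split patients with and without PH.
ph_scores = [0.91, 0.84, 0.77, 0.66, 0.58, 0.43]
no_ph_scores = [0.62, 0.35, 0.30, 0.22, 0.15, 0.08]
print(auc(ph_scores, no_ph_scores), bootstrap_auc_ci(ph_scores, no_ph_scores))
```

Stratified resampling keeps the number of patients with and without PH fixed in every resample, so each bootstrap AUC is computed on the same case mix as the original test split.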

Results

Study Population and Clinical Characteristics

Of 584 patients with SSc in the Scleroderma CORT clinical registry with assessments between January 1, 2003 and May 31, 2021, 130 met the inclusion criteria for the study (Figure 1). Of these, 86 (66.2%) underwent RHC, and 66 (50.8% of the cohort) were diagnosed with SSc-PH. At the time of RHC, 34 (51.5%) were classified as pre-capillary PH, 10 (15.2%) as post-capillary PH, and 4 (6.1%) as combined pre- and post-capillary PH; the remaining 18 (27.3%) were unclassified. Patient demographics and baseline characteristics are summarized in Table 1. Compared with patients without PH, those with PH were more frequently male (25.8% vs. 10.9%). There were no differences in age, diagnosis of diffuse cutaneous SSc, or tobacco use between those with and without PH. Patients with PH were more likely than those without PH to have a B-type natriuretic peptide (BNP) ≥ 100 pg/mL. Patients with PH had a lower FVC (69.8% vs. 79.9% predicted) and D_LCO (44.3% vs. 58.9% predicted) than those without. On ECG and echocardiography, patients with PH had a greater frequency of cardiac axis deviation, valvular abnormalities, and both left- and right-sided cardiac structural abnormalities. The tricuspid regurgitant jet velocity was insufficient to estimate ePASP in 32.8% of patients without PH and 21.9% of patients with PH. Although ePASP was higher among those with PH than among those without (48.3 vs. 28.6 mmHg), only 57.6% of patients with PH had an ePASP > 35 mmHg, and 45.5% had an ePASP > 40 mmHg. Radiologically, patients with PH had a larger PA diameter than those without (31.1 vs. 26.3 mm).

Figure 1. Study Population Flow Diagram. Abbreviations: CT: Computed tomography; ECG: Electrocardiography; PFT: Pulmonary function testing; PH: Pulmonary hypertension; SSc: Systemic sclerosis.

Table 1.

Demographics and Clinical Features of Patients with SSc, by Presence of PH.

Variable	SSc Only (N = 64)	SSc-PH (N = 66)

Demographics
Age, years (mean ± SD)	53.4 ± 10.8	57.7 ± 11.0
Male Sex, n (%)	7 (10.9%)	17 (25.8%)
Diffuse Cutaneous SSc, n (%)	29 (45.3%)	29 (43.9%)
Former or Current Smoker, n (%)	33 (51.6%)	36 (54.5%)
PH Classification
Pre-capillary PH, n (%)	0	34 (51.5%)
Post-capillary PH, n (%)	0	10 (15.2%)
Combined pre- and post-capillary PH, n (%)	0	4 (6.1%)
BNP Measurements, n (%)#
BNP < 100 pg/mL	28 (82.4%)	33 (51.6%)
BNP ≥ 100 pg/mL and < 400 pg/mL	6 (17.6%)	19 (29.7%)
BNP ≥ 400 pg/mL	0	12 (18.8%)
Pulmonary Function Testing
FVC, % predicted (mean ± SD)	79.9 ± 19.2	69.8 ± 17.6
D_LCO, % predicted (mean ± SD)	58.9 ± 18.4	44.3 ± 16.0
FVC/D_LCO Ratio (mean ± SD)	1.5 ± 0.4	1.7 ± 0.7
Electrocardiography
Left Axis Deviation, n (%)	4 (6.3%)	12 (18.2%)
Right Axis Deviation, n (%)	0	4 (6.1%)
Right Bundle Branch Block, n (%)	4 (6.3%)	5 (7.6%)
Echocardiography
Aortic Regurgitation, n (%)	2 (3.1%)	11 (16.7%)
Mitral Regurgitation, n (%)	15 (23.4%)	20 (30.3%)
Tricuspid Regurgitation, n (%)	26 (40.6%)	37 (56.1%)
Pulmonic Regurgitation, n (%)	5 (7.8%)	15 (22.7%)
LA Dilation, n (%)	9 (14.1%)	25 (37.9%)
LVEF, % (mean ± SD)	62.2 ± 4.8	60.7 ± 9.5
LV Diastolic Dysfunction, n (%)	6 (9.4%)	21 (31.8%)
LV Hypertrophy, n (%)	8 (12.5%)	19 (28.8%)
RA Dilation, n (%)	1 (1.6%)	15 (22.7%)
RV Dilation, n (%)	2 (3.1%)	24 (36.4%)
RV Dysfunction, n (%)	0	10 (15.2%)
Pericardial Effusion, n (%)	16 (25.0%)	21 (31.8%)
Insufficient TR Jet for ePASP, n (%)	21 (32.8%)	14 (21.9%)
ePASP > 35 mmHg, n (%)	6 (9.4%)	38 (57.6%)
ePASP > 40 mmHg, n (%)	2 (3.1%)	30 (45.5%)
ePASP, mmHg (mean ± SD)*	28.6 ± 6.9	48.3 ± 16.2
Chest CT Findings
PA Diameter, mm (mean ± SD)	26.3 ± 3.6	31.1 ± 5.5

Abbreviations: BNP: B-type natriuretic peptide; CT: Computed tomography; D_LCO: Diffusing capacity for carbon monoxide; ePASP: Estimated pulmonary artery systolic pressure; FVC: Forced vital capacity; LA: Left atrial; LV: Left ventricular; LVEF: Left ventricular ejection fraction; PA: Pulmonary artery; PH: Pulmonary hypertension; RA: Right atrial; RV: Right ventricular; SSc: Systemic sclerosis; TR: Tricuspid regurgitant.

#: For BNP, SSc Only: N = 34; SSc-PH: N = 64.

*: For ePASP, SSc Only: N = 43; SSc-PH: N = 51.

Comparison of Model Performance

As summarized in Figure 2, the random forest had an AUC of 0.92, a sensitivity of 0.95, and a specificity of 0.80 with an ntree of 1000 and an mtry of 2. Its accuracy was 0.88. Based on the mean decrease in the Gini impurity index (Figure 3), the most important variable in the random forest was PA diameter on CT, followed by D_LCO and FVC on PFT and by LVEF, RV dilation, and LA dilation on echocardiography. The least important variables in the prediction model were LAD, RBBB, and RAD on ECG. When the FVC/D_LCO ratio was used instead of the individual FVC and D_LCO measurements, the AUC was 0.84, the sensitivity 0.85, the specificity 0.60, and the accuracy 0.73.
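Although the confusion matrices are not reported, every test-split metric above is consistent with a test split of 20 patients with PH and 20 without; this is an inference, not a statement from the study. Under that assumption, the Python sketch below recomputes sensitivity, specificity, and accuracy from the implied counts, and computes exact (Clopper-Pearson) binomial intervals by bisection, which reproduce the random forest intervals reported in the Abstract (0.75 to 1.00 and 0.56 to 0.94). The interval method is likewise an assumption, since it is not stated in the Methods.

```python
from math import comb

# Counts implied by the reported metrics, ASSUMING 20 PH and 20 non-PH patients in the test split.
# Tuple order: (true positives, false negatives, true negatives, false positives)
IMPLIED_COUNTS = {
    "Random forest": (19, 1, 16, 4),
    "Random forest (FVC/D_LCO ratio)": (17, 3, 12, 8),
    "CART": (11, 9, 14, 6),
    "Logistic regression": (17, 3, 14, 6),
}


def metrics(tp, fn, tn, fp):
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided CI for x successes in n trials, found by bisection."""
    def root(f):
        # f is increasing in p on [0, 1]; bisect for f(p) = 0
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = (lo + hi) / 2
            if f(mid) < 0:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound solves P(X >= x | p) = alpha/2; upper bound solves P(X <= x | p) = alpha/2.
    lower = 0.0 if x == 0 else root(lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2)
    upper = 1.0 if x == n else root(lambda p: alpha / 2 - binom_cdf(x, n, p))
    return lower, upper


for model, counts in IMPLIED_COUNTS.items():
    print(model, {k: round(v, 3) for k, v in metrics(*counts).items()})

print("RF sensitivity CI:", clopper_pearson(19, 20))
print("RF specificity CI:", clopper_pearson(16, 20))
```

The implied accuracies (0.875, 0.725, 0.625, and 0.775) round to the reported 0.88, 0.73, 0.63, and 0.78.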

Figure 2. Receiver Operating Characteristic Curve for Prediction Models.

Figure 3. Variable Importance in the Random Forest Model. Abbreviations: PADiam: Pulmonary artery diameter; D_LCO: Diffusing capacity for carbon monoxide; FVC: Forced vital capacity; LVEF: Left ventricular ejection fraction; RVdil: Right ventricular dilation; LAdil: Left atrial dilation; AR: Aortic regurgitation; LVdias: Left ventricular diastolic dysfunction; RAdil: Right atrial dilation; PR: Pulmonic regurgitation; TR: Tricuspid regurgitation; LVH: Left ventricular hypertrophy; Pericard: Pericardial effusion; MR: Mitral regurgitation; RVdys: Right ventricular dysfunction; LAD: Left axis deviation; RBBB: Right bundle branch block; RAD: Right axis deviation.

With a single CART model, the AUC was 0.73, the sensitivity 0.55, the specificity 0.70, and the accuracy 0.63. All variables included in the random forest were available to the CART model, but the algorithm used only the following 6 to make splits: 1) PA diameter; 2) D_LCO; 3) FVC; 4) LVEF; 5) RV dilation; and 6) mitral regurgitation (Figure 4). Similarly, all variables included in the random forest were assessed in the logistic regression model (see Supplemental Table 1), but bidirectional stepwise variable selection by AIC retained the following 6 in the final prediction model: 1) PA diameter; 2) D_LCO; 3) RV dilation; 4) LA dilation; 5) aortic regurgitation; and 6) tricuspid regurgitation (Table 2). The logistic regression model had an AUC of 0.85, a sensitivity of 0.85, a specificity of 0.70, and an accuracy of 0.78.

Figure 4. Classification and Regression Tree Model. Abbreviations: D_LCO: Diffusing capacity for carbon monoxide; FVC: Forced vital capacity; LVEF: Left ventricular ejection fraction; PA: Pulmonary artery; RV: Right ventricular. At each split, an observation goes to the left branch if the condition for the variable, as illustrated in the legend, is satisfied. The CART model has 10 terminal nodes, with the sample size for the classification of each node shown in parentheses. In the training split of 90 patients, the residual mean deviance of the CART model was 0.65, and the misclassification error rate was 0.16.
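For readers unfamiliar with the ‘tree’ package summary, the deviance of a classification tree is, to our understanding, −2 times the sum of n_ik log(p_ik) over terminal nodes i and classes k, where n_ik is the number of patients of class k in node i and p_ik their proportion in that node; the residual mean deviance divides this total by the number of observations minus the number of terminal nodes. On that reading, the values above imply a total deviance of about 0.65 × (90 − 10) = 52 and about 0.16 × 90 ≈ 14 misclassified training patients. The Python sketch below computes both quantities for hypothetical node counts; they are not the nodes of Figure 4, and the function names are illustrative.

```python
from math import log


def tree_deviance(nodes):
    """-2 * sum over terminal nodes i and classes k of n_ik * log(n_ik / n_i); 0*log(0) = 0."""
    total = 0.0
    for counts in nodes:
        n_i = sum(counts)
        total += sum(c * log(c / n_i) for c in counts if c > 0)
    return -2.0 * total


def summarize_tree(nodes):
    """Residual mean deviance, deviance / (n - terminal nodes), and misclassification rate."""
    n = sum(sum(counts) for counts in nodes)
    # Each terminal node predicts its majority class; the minority is misclassified.
    misclassified = sum(sum(counts) - max(counts) for counts in nodes)
    return tree_deviance(nodes) / (n - len(nodes)), misclassified / n


# Hypothetical terminal nodes as (no PH, PH) counts; NOT the nodes of Figure 4.
nodes = [(10, 0), (8, 2), (3, 7), (0, 12)]
print(summarize_tree(nodes))
```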

Table 2.

Logistic Regression Coefficients of the Parsimonious Model

Variable	β Estimate	Standard Error	p-value	OR (95% CI)
(Intercept)	−3.95	2.49	0.11	0.02 (0, 2.57)
D_LCO	−0.05	0.02	0.01	0.95 (0.91, 0.98)
Aortic Regurgitation	2.45	1.19	0.04	11.53 (1.12, 118.53)
Tricuspid Regurgitation	−0.90	0.63	0.15	0.40 (0.12, 1.39)
LA Dilation	1.79	0.72	0.01	5.98 (1.45, 24.60)
RV Dilation	2.16	0.98	0.03	8.68 (1.26, 59.70)
PA Diameter	0.22	0.08	0.01	1.25 (1.06, 1.47)


Abbreviations: CI: Confidence interval; D_LCO: Diffusing capacity for carbon monoxide; LA: Left atrial; OR: Odds ratio; PA: Pulmonary artery; RV: Right ventricular.
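Table 2 can be read as a prediction equation: the log-odds of PH equal the intercept plus each β multiplied by the patient's value, each OR equals exp(β), and an approximate Wald 95% CI is exp(β ± 1.96 × SE). The Python sketch below applies these relationships to the rounded Table 2 values. Because β and SE are rounded to two decimals, recomputed ORs and intervals can differ slightly from the table (for example, exp(2.45) ≈ 11.59 vs. 11.53 reported for aortic regurgitation), and the table's intervals may have been computed by a different method. The variable units (D_LCO in % predicted, PA diameter in mm, other findings coded 1 if present and 0 if absent), the dictionary keys, and the example patient are assumptions; this is not a validated calculator.

```python
from math import exp

# (beta, standard error) as printed in Table 2, rounded to two decimals.
TABLE2 = {
    "intercept": (-3.95, 2.49),
    "dlco_pct_pred": (-0.05, 0.02),          # assumed unit: % predicted
    "aortic_regurgitation": (2.45, 1.19),    # assumed coding: 1 = present, 0 = absent
    "tricuspid_regurgitation": (-0.90, 0.63),
    "la_dilation": (1.79, 0.72),
    "rv_dilation": (2.16, 0.98),
    "pa_diameter_mm": (0.22, 0.08),          # assumed unit: mm
}


def odds_ratio_wald(beta, se, z=1.96):
    """OR = exp(beta) with an approximate Wald 95% CI exp(beta -/+ z * SE)."""
    return exp(beta), exp(beta - z * se), exp(beta + z * se)


def predicted_probability(patient):
    """Inverse logit of the linear predictor built from the Table 2 coefficients."""
    log_odds = TABLE2["intercept"][0]
    log_odds += sum(TABLE2[name][0] * value for name, value in patient.items())
    return 1.0 / (1.0 + exp(-log_odds))


for name, (beta, se) in TABLE2.items():
    if name != "intercept":
        print(name, [round(v, 2) for v in odds_ratio_wald(beta, se)])

# Hypothetical patient: D_LCO 45% predicted, PA diameter 31 mm, no echocardiographic findings.
example = {
    "dlco_pct_pred": 45, "pa_diameter_mm": 31,
    "aortic_regurgitation": 0, "tricuspid_regurgitation": 0,
    "la_dilation": 0, "rv_dilation": 0,
}
print(round(predicted_probability(example), 2))
```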

Discussion

As PH is a major cause of morbidity and mortality in SSc [2], timely diagnosis and initiation of therapy are essential to improve symptoms and outcomes. Currently, diagnosis requires an invasive RHC, which is often pursued on the basis of an unreliable ePASP obtained on echocardiography [4]. Poor specificity can result in more referrals for RHC than necessary, exposing patients to the risks of an invasive procedure that might otherwise have been avoided, whereas poor sensitivity may lead to a missed or delayed diagnosis of PH. Furthermore, in the current study, the tricuspid regurgitant jet velocity was insufficient to estimate ePASP in 21.9% of patients with SSc-PH and 26.9% of the entire SSc cohort. Using retrospectively collected data from a single specialist referral center for SSc, our random forest model predicted PH with high sensitivity, specificity, and accuracy.

The most important predictors common to the random forest, CART, and logistic regression models were PA diameter, D_LCO, and RV dilation. Based on the Gini impurity index, the most important variable in the random forest model was PA diameter, consistent with prior studies demonstrating that PA diameter is increased even in patients with mild PH (mPAP of 21 to 24 mmHg) [14]. D_LCO can be reduced in both PH and interstitial lung disease (ILD), particularly as each becomes more severe, which limits its ability to discriminate between the two conditions. Interestingly, both FVC and D_LCO were predictors of PH in the random forest and CART models, suggesting that ILD may be an important contributor to PH (Group 3 PH). Furthermore, when the FVC/D_LCO ratio was used in the random forest model instead of the individual FVC and D_LCO measurements, the model performed worse, likely because collapsing two variables into a single ratio discards information that each carries on its own. Although ECG findings have been reported as prognostic indicators of all-cause mortality in patients with SSc-PH [13], they were not important predictors of PH in the present study. The finding of an insufficient tricuspid regurgitant jet velocity in more than a quarter of patients is notable and consistent with previous findings [6]. These patients represent a diagnostic dilemma, and the best course of action is often unclear. An accurate prediction model based on commonly available noninvasive studies may add vital information in this scenario and help the clinician decide whether to follow these patients clinically or to proceed directly to RHC.

Our study has limitations. First, our prediction models were created using retrospective data from a single specialist referral center. These data were also dictated by clinical practice, leading to inconsistencies in the timing of diagnostic studies and to missing data. We attempted to standardize the timing of studies by requiring that PFT, ECG, echocardiography, and CT chest imaging be performed no more than 3 years apart. This window was chosen to accommodate four assessments (i.e., PFT, ECG, echocardiography, and CT imaging) performed sequentially, each separated from the next by a maximum of one year. Certainly, SSc can progress during this interval, and repeated testing within the 3-year window could show marked differences in recorded values. However, most patients had all four assessments performed within 1 year of each other. Additionally, for patients with PH, we used only the studies closest in time prior to PH diagnosis or within 1 month following PH diagnosis. Second, our model did not differentiate between PH subtypes (i.e., pre-capillary, post-capillary, and combined pre- and post-capillary PH) because of the small size of our dataset from a rare disease population. Additionally, the 18 patients with unclassified PH likely represented either a cohort with high cardiac output (related to coexistent anemia, cirrhosis, etc.) or, alternatively, patients with post-capillary PH who had been diuresed to euvolemia at the time of RHC. Future work using larger cohorts of patients with SSc will be needed to create individual prediction models for each type of PH. Third, BNP was measured in only 53.1% of patients with SSc only, compared with 97.0% of patients with SSc-PH, suggesting that it was not routinely tested unless there was a strong suspicion of PH.
Therefore, although BNP measurements were key in the DETECT algorithm, which was derived from a prospective study with standardized assessments, we did not incorporate BNP into our prediction model, because excluding patients without BNP measurements could have skewed our sampling structure. Furthermore, among patients with SSc who were not diagnosed with PH, only 20 (31.3%) underwent RHC, so some of these patients may have had undiagnosed PH. However, including only patients who had undergone RHC would have biased the data, because only patients with a suspicion of PH would have been referred for RHC. Finally, whereas existing clinical models [9, 10] were based on a diagnostic criterion of mPAP ≥ 25 mmHg on RHC, our prediction model was based on the more recent definition of PH as an mPAP > 20 mmHg on RHC. Hence, the performance of our model cannot be directly compared with that of existing clinical models using our single-center dataset. Moreover, some variables in existing models are not entirely routine (e.g., serum urate), whereas our prediction model, built from retrospective data, relies on more commonly accessible noninvasive assessments. This is, in fact, one of the strengths of our random forest approach, as it could potentially be integrated into a point-of-care decision aid to help clinicians decide whether to refer patients with SSc for RHC.

Conclusion

We developed a multimodal prediction model using a random forest approach that demonstrated high sensitivity, specificity, and accuracy in the detection of PH among patients with SSc, with PA diameter and D_LCO as the strongest predictors of PH. This model could potentially be used to better risk-stratify patients for PH and thereby improve patient selection for RHC, which would reduce procedural risks for patients and potentially decrease costs for healthcare systems. Future work will need to externally validate our findings in other datasets and in a prospective, time-controlled manner before implementation in clinical practice.

Supplementary Material

tS1

NIHMS1814768-supplement-tS1.docx (15.1 KB, docx)

Significance and Innovations.

A multimodal prediction model using a random forest approach demonstrated high sensitivity, specificity, and accuracy in the detection of pulmonary hypertension among patients with systemic sclerosis.
Pulmonary artery diameter and diffusing capacity for carbon monoxide are important predictors of pulmonary hypertension in systemic sclerosis.
This novel approach may help risk-stratify patients for pulmonary hypertension and improve patient selection for right heart catheterization, particularly when estimated pulmonary artery pressures on echocardiography are unavailable.

Funding:

J. K. L. is supported by NIH/NHLBI 1F32HL156614-01. K. R. G. is supported by NIH/NHLBI 5F32HL149236-02 and a Parker B. Francis Fellowship Award. R. J. S. is supported by the International Lung Cancer Foundation John Fishers Legacy Fellowship grant number 55208708. D. M. G. is supported by the American Heart Association: FTF17FTF33670369. A. M. B. is supported by the Tobé and Stephen E. Malawista, MD, Endowment in Academic Rheumatology, Rheumatology Research Foundation grant number 55207931 and NIH/NHLBI 1R01HL155955-01A1. R. S. W. is supported in part by resources from the VA Boston Healthcare system. M.P.L. is supported by NIH/NIAMS CCCR P30 AR072571. E. S. K. is supported by NIH/NHLBI 1UG3 HL143192-01A1, NCATS 2UL1TR001430-05A1, and HRSA U1EMC27864-08-00 and receives research support from Bayer, Forma Therapeutics, Novartis, and United Therapeutics.

Footnotes

Conflict of Interest: E. S. K. receives research support from Bayer, Forma Therapeutics, Novartis, and United Therapeutics. She has served as a consultant for Bluebird Bio, Omeros Corporation, and CSL Behring on clinical trials related to sickle cell disease (no conflict with the present work).

Disclaimer: The views presented here do not necessarily reflect the views of the Department of Veterans Affairs or the US government.

References

1. Coghlan JG, Wolf M, Distler O, et al. Incidence of pulmonary hypertension and determining factors in patients with systemic sclerosis. Eur Respir J. 2018;51(4):1701197.
2. Steen VD, Medsger TA. Changes in causes of death in systemic sclerosis, 1972–2002. Ann Rheum Dis. 2007;66(7):940–4.
3. Simonneau G, Montani D, Celermajer DS, et al. Haemodynamic definitions and updated clinical classification of pulmonary hypertension. Eur Respir J. 2019;53(1).
4. Rudski LG, Lai WW, Afilalo J, et al. Guidelines for the echocardiographic assessment of the right heart in adults: a report from the American Society of Echocardiography endorsed by the European Association of Echocardiography, a registered branch of the European Society of Cardiology, and the Canadian Society of Echocardiography. J Am Soc Echocardiogr. 2010;23(7):685–713; quiz 786–8.
5. Taleb M, Khuder S, Tinkel J, Khouri SJ. The diagnostic accuracy of Doppler echocardiography in assessment of pulmonary artery systolic pressure: a meta-analysis. Echocardiography. 2013;30(3):258–65.
6. Yared K, Noseworthy P, Weyman AE, McCabe E, Picard MH, Baggish AL. Pulmonary artery acceleration time provides an accurate estimate of systolic pulmonary arterial pressure during transthoracic echocardiography. J Am Soc Echocardiogr. 2011;24(6):687–92.
7. Rich JD. Counterpoint: Can Doppler Echocardiography Estimates of Pulmonary Artery Systolic Pressures Be Relied Upon to Accurately Make the Diagnosis of Pulmonary Hypertension? No. Chest. 2013;143(6):1536–9.
8. Rich JD, Shah SJ, Swamy RS, Kamp A, Rich S. Inaccuracy of Doppler echocardiographic estimates of pulmonary artery pressures in patients with pulmonary hypertension: implications for clinical practice. Chest. 2011;139(5):988–93.
9. Schreiber BE, Valerio CJ, Keir GJ, et al. Improving the detection of pulmonary hypertension in systemic sclerosis using pulmonary function tests. Arthritis Rheum. 2011;63(11):3531–9.
10. Coghlan JG, Denton CP, Grünig E, et al. Evidence-based detection of pulmonary arterial hypertension in systemic sclerosis: the DETECT study. Ann Rheum Dis. 2014;73(7):1340–9.
11. Aryal S, Alimadadi A, Manandhar I, Joe B, Cheng X. Machine Learning Strategy for Gut Microbiome-Based Diagnostic Screening of Cardiovascular Disease. Hypertension. 2020;76(5):1555–62.
12. Biau G, Scornet E. A random forest guided tour. TEST. 2016;25:197–227.
13. Lui JK, Sangani RA, Chen CA, et al. The Prognostic Value of Cardiac Axis Deviation in Systemic Sclerosis-related Pulmonary Hypertension. Arthritis Care Res (Hoboken). 2021 Jun 3. doi: 10.1002/acr.24724. Online ahead of print.
14. Lange TJ, Dornia C, Stiefel J, et al. Increased Pulmonary Artery Diameter on Chest Computed Tomography Can Predict Borderline Pulmonary Hypertension. Pulm Circ. 2013;3(2):363–8.
15. Loh WY. Fifty Years of Classification and Regression Trees. Int Stat Rev. 2014;82(3):329–48.

