eBioMedicine. 2023 Sep 7;96:104795. doi: 10.1016/j.ebiom.2023.104795

Treatment response to spironolactone in patients with heart failure with preserved ejection fraction: a machine learning-based analysis of two randomized controlled trials

Karl-Patrik Kresoja a,b,g, Matthias Unterhuber a,b,g, Rolf Wachter c,d, Karl-Philipp Rommel a,b, Christian Besler a,b, Sanjiv Shah e, Holger Thiele a,b, Frank Edelmann f, Philipp Lurz a,b
PMCID: PMC10498181  PMID: 37689023

Summary

Background

Whether there is a subset of patients with heart failure with preserved ejection fraction (HFpEF) that benefit from spironolactone therapy is unclear. We applied a machine learning approach to identify responders and non-responders to spironolactone among patients with HFpEF in two large randomized clinical trials.

Methods

Using a reiterative cluster allocating permutation approach, patients from the derivation cohort (Aldo-DHF) were classified according to their treatment response to spironolactone with respect to improvement in E/e’. Heterogeneous features of response (‘responders’ and ‘non-responders’) were characterized by an extreme gradient boosting (XGBoost) algorithm. XGBoost was used to predict treatment response in the validation cohort (TOPCAT). The primary endpoint of the validation cohort was a combined endpoint of cardiovascular mortality, aborted cardiac arrest, or heart failure hospitalization. Patients with missing variables for the XGBoost model were excluded from the validation analysis.

Findings

Out of 422 patients from the derivation cohort, reiterative cluster allocating permutation identified 159 patients (38%) as spironolactone responders, in whom E/e’ significantly improved (p = 0.005). Within the validation cohort (n = 525) spironolactone treatment significantly reduced the occurrence of the primary outcome among responders (n = 185, p log rank = 0.008), but not among patients in the non-responder group (n = 340, p log rank = 0.52).

Interpretation

Machine learning approaches might aid in identifying HFpEF patients who are likely to show a favorable therapeutic response to spironolactone.

Funding

See Acknowledgements section at the end of the manuscript.

Keywords: Machine learning, Heart failure with preserved ejection fraction, Spironolactone


Research in context.

Evidence before this study

The therapeutic value of spironolactone treatment for patients with heart failure with preserved ejection fraction (HFpEF) is debated, and randomized clinical trials have provided conflicting data. The heterogeneous characteristics of HFpEF patients have been suggested as a possible reason for these contradictory results, yet conventional statistical approaches are unable to model heterogeneous patient characteristics and treatment responses.

Added value of this study

Modern machine learning algorithms were used to identify individual treatment response to spironolactone by modelling heterogeneous patient characteristics. This led to a significant reduction of the number needed to treat and showed consistency in a validation cohort, where patients with a predicted favourable response to spironolactone had a reduction in a composite outcome of heart failure hospitalization and death.

Implications of all the available evidence

Machine learning approaches might help to reduce the number of patients unnecessarily exposed to medical therapy and might improve the efficiency of future randomized clinical trials.

Introduction

Heart failure with preserved ejection fraction (HFpEF) is a heterogeneous disease.1 This heterogeneity is often held responsible for the slow progress in the search for an effective HFpEF treatment,2,3,4 and accordingly current guidelines offer no specific recommendations on the treatment of HFpEF.5

Mineralocorticoid-receptor antagonists, such as spironolactone, have shown promising results in studies focused on functional improvements in patients with HFpEF,6,7 but these improvements have not translated into reduced morbidity or mortality.2 To account for heterogeneity in clinical trials, outcome-directed subgroup analyses have been performed, but they do not reflect individual patient heterogeneity because, given the limitations of conventional statistical models, they usually process only dichotomous data. Clustering represents an approach to account for heterogeneity and provides groups with similar clinical characteristics or phenotypes, but it is often limited to simple correlations8 and cannot account for therapeutic responses. In contrast, outcome-directed machine learning (ML) algorithms combine the positive features of the aforementioned methods: they can model more complex interactions of variables, can account for individual patient heterogeneity and have been shown to outperform classical statistical models for prediction of mortality when analysing complex data.9,10

We therefore designed and applied a ML approach to model complex interactions of features for identification of HFpEF patients responding to spironolactone in two randomized controlled trials, the ALDOsterone receptor blockade in Diastolic Heart Failure (Aldo-DHF)6 and Treatment of Preserved Cardiac Function Heart Failure with an Aldosterone Antagonist (TOPCAT) trials as derivation and validation cohorts.

Methods

The aim of the current study was to identify responders and non-responders to spironolactone therapy among patients with HFpEF. We first performed a reiterative cluster allocating permutation analysis to identify responders to spironolactone with respect to the primary endpoint of Aldo-DHF, which was change in E/e’. Subsequently, a decision tree algorithm-based model (extreme gradient boosting—XGBoost) was trained to identify characteristics of responders vs. non-responders as previously defined by the permutation analysis. In a next step, this algorithm was used to identify responders among the TOPCAT cohort (validation cohort). The prognostic value with respect to the primary endpoint of the TOPCAT trial (composite of death from cardiovascular causes, aborted cardiac arrest, or hospitalization for the management of heart failure) between treatment allocations (spironolactone vs. placebo) within the cohort of patients identified as responders and non-responders by the XGBoost algorithm was then tested.

Study cohort–derivation

Aldo-DHF (NCT00108251) is a prospective, randomized, placebo-controlled, double-blind multicenter trial, which assessed the efficacy of spironolactone treatment in HFpEF patients. The trial design and results have been published previously and can be found summarized in the Supplementary Material.6

Eligible patients were randomly assigned to receive either spironolactone (25 mg/d) or matching placebo. The co-primary endpoints were changes in E/e’ (the ratio of peak early transmitral ventricular filling velocity to early diastolic tissue Doppler velocity as an echocardiographic estimate of left ventricular filling pressure) and peak exercise capacity at twelve months.

Study cohort—validation and sensitivity analysis

The TOPCAT trial (NCT00094302) data were obtained through the publicly available National Heart, Lung, and Blood Institute (NHLBI) BioLINCC data repository. This manuscript was prepared using TOPCAT Research Materials obtained from the NHLBI Biologic Specimen and Data Repository Information Coordinating Center and does not necessarily reflect the opinions or views of the TOPCAT investigators or the NHLBI. TOPCAT was a multicenter, international, randomized, double blind, placebo-controlled trial that tested the efficacy and safety of the mineralocorticoid receptor antagonist spironolactone compared with placebo on cardiovascular (CV) morbidity and mortality in 3445 patients. The trial design and results have been published previously and can be found summarized in the Supplementary Material.2

Eligible participants were randomly assigned to receive either spironolactone or placebo in a 1:1 ratio. The primary outcome was a composite of death from cardiovascular causes, aborted cardiac arrest, or hospitalization for the management of heart failure.

Patients with at least one missing variable that precluded application of the XGBoost model were excluded from the validation cohort. A sensitivity analysis was conducted in which missing data were imputed using the expectation-maximization method; its results are reported separately.

Model building workflow

Fig. 1 shows the model building workflow, which is described in detail below.

Fig. 1.


Model building workflow. The figure displays the model building workflow used to identify responders to spironolactone treatment. Steps 1 to 3 represent the steps of the reiterative cluster allocating permutation. In the first step, patients are randomly grouped into pairs with respect to their initial treatment regime (i.e. spironolactone or placebo). This produces a large number of samples within all possible combinations. In the second step, odds ratios are calculated with regard to the primary outcome (i.e. improvement in E/e’) in order to identify response. In the third step, the groups are sorted according to their within-group baseline differences (variance). The group with the lowest variance yet highest respective odds ratio is labelled as responders and the other group as non-responders. Finally, a supervised machine learning approach, in this case an XGBoost model, is used to identify features of the responder group. Of note, any other supervised machine learning approach could be used for this step. The features extracted by the supervised machine learning approach can then be applied to new cohorts and individual patients to predict their respective treatment response.

Reiterative cluster allocating permutation—derivation cohort

We aimed to identify patients with response vs. non-response to spironolactone treatment. A supervised reiterative cluster allocating permutation approach was used; it is summarized here and described in more detail below. The main goal of the first two steps was to identify patients who diverged as much as possible with regard to their therapeutic response (i.e. reduction in E/e’). In short, this approach repeatedly shuffles patients (treated or not) into groups to identify the groups with the strongest response regarding the defined outcome. The main limitation of these steps is that they do not account for differences in baseline characteristics (variance), which might explain the observed differences in response. Therefore, a third step was introduced to choose the grouping with the best trade-off between the difference in response and the similarity of baseline characteristics. A pseudo-random number (“seed”) was set to guarantee reproducibility.

In detail: In a first step, patients were randomly assigned to one of two clusters labelled “A” or “B”. Each cluster contained 50% patients treated with spironolactone and 50% treated with placebo.

In a second step, binary logistic regression was performed within each of the two clusters to estimate the odds ratio (OR) of E/e' improvement at 12 months of follow-up for patients receiving spironolactone compared with placebo. To identify clusters with differing response to spironolactone, one cluster had to have an OR significantly >1 (to define response, e.g. odds ratio 1.4; 95% confidence interval 1.1, 1.7), and the other had to have an OR significantly <1 for the prediction of E/e’ improvement. If these requirements were not met, both clusters were discarded and the first step was repeated with a new seed chosen for permutation. This reiterative cluster allocating permutation was performed more than 10^6 times. On average, these steps had to be repeated 1666 times to find one combination that met the predefined requirements. Cluster combinations with significantly different ORs proceeded to the third step.

In the third step, differences within the clusters at baseline that might have caused differences in outcomes had to be eliminated. All identified clusters were sorted with respect to their within-cluster variance, according to a gap score. Because each iteration used a different seed, incremented from a pre-specified starting seed, the procedure had two advantages: a) the results are reproducible every time the clustering algorithm is run; b) computing performance is optimized, because reiterations with the same combinations as before are not re-calculated. For all further analyses the cluster combination with the minimal within-cluster variance was chosen; the cluster with an OR significantly >1 was labelled “responder cluster” and the cluster with an OR significantly <1 was labelled “non-responder cluster”. By using these optimization functions, the highest homogeneity within the clusters (i.e. the smallest variance in baseline characteristics) and the largest differences between clusters (i.e. the largest variance between clusters in baseline characteristics) can be achieved. This further guarantees that the identified clusters are not found by chance, but indeed represent homogeneous clusters that differ in their clinical characteristics. Therefore, the responder cluster contained patients treated with spironolactone who showed a favourable response and a matching number of patients who received placebo. Conversely, the non-responder cluster contained patients treated with spironolactone who did not show a favourable response and a matching number of patients who received placebo.
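The three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the patient fields (`treated`, `improved`, `baseline`), the random split fraction, and the use of summed per-feature variance in place of the gap score are all hypothetical choices. The odds ratio is computed from the 2×2 table with a Wald interval, which coincides with a logistic regression on a single binary predictor, and a single seeded random stream stands in for the per-iteration seeds described in the text.

```python
import math
import random
from statistics import pvariance


def odds_ratio_ci(events_trt, n_trt, events_ctl, n_ctl, z=1.96):
    """Odds ratio (treated vs. placebo) with a Wald 95% confidence interval."""
    a, b = events_trt, n_trt - events_trt      # treated: improved / not improved
    c, d = events_ctl, n_ctl - events_ctl      # placebo: improved / not improved
    if min(a, b, c, d) == 0:                   # continuity correction for empty cells
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


def split_once(patients, rng):
    """Step 1: random allocation to two clusters, keeping both arms in each."""
    treated = [p for p in patients if p["treated"]]
    placebo = [p for p in patients if not p["treated"]]
    rng.shuffle(treated)
    rng.shuffle(placebo)
    frac = rng.uniform(0.3, 0.7)               # assumed: cluster sizing is not reported
    kt, kp = round(frac * len(treated)), round(frac * len(placebo))
    return treated[:kt] + placebo[:kp], treated[kt:] + placebo[kp:]


def cluster_or(cluster):
    """Step 2: odds ratio of improvement within one cluster."""
    trt = [p for p in cluster if p["treated"]]
    ctl = [p for p in cluster if not p["treated"]]
    return odds_ratio_ci(sum(p["improved"] for p in trt), len(trt),
                         sum(p["improved"] for p in ctl), len(ctl))


def within_variance(cluster):
    """Step 3 stand-in: summed per-feature variance of baseline values."""
    n_features = len(cluster[0]["baseline"])
    return sum(pvariance([p["baseline"][j] for p in cluster])
               for j in range(n_features))


def search(patients, n_iter, seed=0):
    """Repeat steps 1-2, keep accepted splits, return the most homogeneous one."""
    rng = random.Random(seed)                  # fixed seed for reproducibility
    best = None
    for _ in range(n_iter):
        a, b = split_once(patients, rng)
        (_, lo_a, hi_a), (_, lo_b, hi_b) = cluster_or(a), cluster_or(b)
        if lo_a > 1 and hi_b < 1:              # A responds, B does not
            responders, non_responders = a, b
        elif lo_b > 1 and hi_a < 1:            # B responds, A does not
            responders, non_responders = b, a
        else:
            continue                           # requirements not met: discard
        score = within_variance(responders) + within_variance(non_responders)
        if best is None or score < best[0]:
            best = (score, responders, non_responders)
    return best
```

`search` returns `None` when no split meets the odds-ratio requirements within `n_iter` attempts. Given that the text reports an average of 1666 repetitions per accepted combination, this is the expected outcome for small iteration counts.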

XGBoost machine learning—derivation cohort

An XGBoost model11 was trained to learn the specific features of individuals grouped as responders vs. non-responders as described and defined above. Only the 10 most influential variables were included in the final model: BMI, brain natriuretic peptide [BNP], haematocrit, haemoglobin, potassium level, creatinine, white blood cell count, pulsed-wave Doppler-derived mitral valve E-wave velocity, A-wave velocity (set to “0” in case of atrial fibrillation) and LVEF. To achieve a high performance in identifying subgroups that could benefit most from spironolactone treatment, the tendency of modern ML algorithms such as XGBoost to keep variance and bias low was exploited to achieve a high specificity of the model. This was done primarily to identify patients who were highly likely to respond favourably to spironolactone therapy and to defer those with low to intermediate probabilities of response. A high-depth XGBoost model with 1000 learning rounds, a learning rate of 10^−5 and a binary logistic outcome was applied.
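The reported settings map onto standard XGBoost parameters roughly as follows. This is a hedged sketch rather than the authors' code: the feature key names are hypothetical, and the tree depth is not reported in the text, so `max_depth = 10` is only a placeholder for "high-depth". The `train` helper needs the `xgboost` and `numpy` packages and is not called here.

```python
# The ten features named in the text; the key names are hypothetical.
FEATURES = [
    "bmi", "bnp_pct_of_cutoff", "haematocrit", "haemoglobin", "potassium",
    "creatinine", "wbc", "e_wave", "a_wave", "lvef",
]

PARAMS = {
    "objective": "binary:logistic",  # binary logistic outcome (from the text)
    "eta": 1e-5,                     # learning rate 10^-5 (from the text)
    "max_depth": 10,                 # assumed placeholder: depth is not reported
}
NUM_BOOST_ROUND = 1000               # 1000 learning rounds (from the text)


def feature_row(patient):
    """Return the features in model order, with A-wave set to 0 in atrial fibrillation."""
    row = dict(patient)
    if row.get("atrial_fibrillation"):
        row["a_wave"] = 0.0
    return [row[name] for name in FEATURES]


def train(rows, labels):
    """Fit the booster on responder (1) vs. non-responder (0) labels."""
    import numpy as np
    import xgboost as xgb

    dtrain = xgb.DMatrix(np.asarray(rows, dtype=float), label=labels,
                         feature_names=FEATURES)
    return xgb.train(PARAMS, dtrain, num_boost_round=NUM_BOOST_ROUND)
```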

Validation and sensitivity analysis—validation cohort

To test the validity and the performance of the trained model, it was applied to the independent TOPCAT cohort to identify responder and non-responder groups with respect to their corresponding primary outcome.

To achieve high reliability of the external validation analysis, patients with any missing variable required by the XGBoost model were excluded from the external validation cohort. In a further sensitivity analysis, all patients from the TOPCAT trial were included and missing variables relevant to the XGBoost model were imputed using the expectation-maximization method. This sensitivity analysis was then repeated after exclusion of patients initially randomized in Georgia or Russia.
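The complete-case rule used for the primary validation analysis can be sketched as a simple filter. The field names and the representation of missing values (`None` or NaN) are assumptions for illustration; the expectation-maximization imputation used in the sensitivity analysis is not reproduced here.

```python
import math


def is_missing(value):
    """Treat None and NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def complete_cases(patients, required):
    """Keep only patients with every required model variable present."""
    return [p for p in patients
            if not any(is_missing(p.get(key)) for key in required)]


# Share of TOPCAT retained by this rule, using the counts reported in the paper.
retained_fraction = 525 / 3445
```

With the reported counts, about 15% of the TOPCAT cohort remains after the filter, which is why the imputed sensitivity analysis was added.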

Ethics

This is a retrospective analysis of two randomised controlled trials; the University of Leipzig ethical committee waived the need for additional approval for the present study. For both ALDO-DHF6 and TOPCAT2 the protocol and amendments were approved by the institutional review board at each participating center, and the trials were conducted in accordance with the principles of the Declaration of Helsinki, Good Clinical Practice guidelines, and local and national regulations. Written informed consent was provided by all patients before any study-related procedures were performed.

Statistical analysis

Data are presented as median with the corresponding 25th and 75th percentiles (interquartile range, IQR). Categorical variables are expressed as absolute numbers with percentages. Chi-square or Fisher's exact test was used to evaluate the association between categorical variables, and the independent-samples Wilcoxon test was used for comparison of continuous variables. Two-sided p-values ≤0.05 were considered statistically significant. Data were analyzed using R 4.1.1 (The R Foundation for Statistical Computing, Vienna, Austria).

Role of funders

No funding was received for this specific retrospective analysis. No funders from the original trials had any role in study design, data collection, data analyses, interpretation, or writing of report.

Results

Study cohorts

Baseline characteristics of the individuals from the derivation and validation cohort are presented in Table 1. The study flow chart is presented in Fig. 2. The derivation cohort consisted of 422 patients (213 spironolactone, 209 placebo treatment). The validation cohort consisted of 525 patients (264 spironolactone, 261 placebo treatment), as 2920 patients from the validation cohort were excluded due to missing variables. The sensitivity analysis expanded the validation cohort to all 3445 patients of the TOPCAT cohort by imputing missing values.

Table 1.

Baseline characteristics.

Characteristic | ALDO-DHF (n = 422) | TOPCAT (n = 3445) | p-value
Age | 66 (59, 71) | 69 (61, 76) | <0.001
Female sex | 221 (52%) | 1775 (52%) | 0.7
BMI [kg/m2] | 29 (26, 31) | 31 (27, 36) | <0.001
NYHA-class | | | <0.001
I & II | 0 (0%) | 2303 (67%) |
III & IV | 422 (100%) | 1136 (33%) |
Comorbidities
Coronary artery disease | 165 (39%) | 813 (24%) | <0.001
Arterial hypertension | 387 (92%) | 3147 (91%) | 0.8
Hyperlipoproteinemia | 255 (60%) | 2073 (60%) | >0.9
Diabetes mellitus | 70 (17%) | 1118 (32%) | <0.001
Chronic obstructive pulmonary disease | 14 (3.3%) | 403 (12%) | <0.001
Atrial fibrillation | 66 (16%) | 1214 (35%) | <0.001
Lab values
Haemoglobin, mmol/l | 8.57 (8.07, 9.13) | 8.20 (7.51, 8.94) | <0.001
Haematocrit, % | 40 (38, 43) | 40 (37, 43) | 0.001
Missing, no | | |
Potassium, mmol/l | 4.20 (3.90, 4.40) | 4.30 (4.00, 4.60) | <0.001
Missing, no | | |
Glomerular filtration rate, ml/min/1.73 m2 | 73 (61, 86) | 65 (54, 79) | <0.001
Missing, no | | |
BNP/NT-proBNP, % above HFpEF cut-off | 140 (70, 228) | 443 (168, 2003) | <0.001
Missing, no | | 2022 |
Echocardiography
Left ventricular ejection fraction, % | 67 (62, 73) | 60 (60, 60) | <0.001
Mitral valve doppler E, cm/s | 71 (60, 83) | 84 (65, 107) | 0.025
Mitral valve doppler A, cm/s | 83 (71, 94) | 73 (56, 91) | <0.001
Tissue velocity e' lateral, cm/s | 8.00 (6.60, 9.80) | 7.60 (5.87, 9.78) | <0.001
Tissue velocity a' lateral, cm/s | 10.8 (9.0, 12.4) | 8.3 (6.3, 10.3) | <0.001
Mitral valve doppler E/e’ | 11.9 (10.3, 14.0) | 14.7 (10.5, 18.7) | <0.001

Abbreviations: BMI, body mass index; NYHA, New York Heart Association; BNP, brain natriuretic peptide.

Natriuretic peptides are given as % difference of their respective pathological cut-off which was <125 ng/l for NT-proBNP and <50 ng/l for BNP.

Fisher's exact test was used to evaluate the association between categorical variables, and Wilcoxon test was used for comparison of continuous variables.

Fig. 2.


Study flow-chart. Abbreviations: XGBoost, extreme gradient boosting.

Derivation cohort responder identification and features

Using reiterative cluster allocating permutation, we identified 159 responders and matching placebo patients (38%) and 263 (62%) non-responders and matching placebo patients with respect to a beneficial therapeutic response to spironolactone treatment in terms of E/e’ improvement. Baseline characteristics of responders and non-responders are displayed in Supplementary Tables S1 and S2; notably, there were no significant differences between patients allocated to the responder and non-responder groups. Of note, within the responder group, potassium and e’ lateral were higher at comparable levels of E/e’ in patients receiving spironolactone as compared to placebo.

Responders treated with spironolactone showed a significantly greater E/e’ improvement (from 12.5 (IQR 10.4, 14.8) to 11.6 (IQR 9.7, 13.8), p = 0.005 [Wilcoxon test]) from baseline to follow-up as compared to patients treated by placebo, in whom an increase in E/e' was observed (from 11.5 (IQR 9.7, 13.2) to 12.6 (IQR 10.8, 16.2), p < 0.001 [Wilcoxon test]).

Among non-responders, there was no significant change in E/e' from baseline to follow-up in either treatment allocation group (spironolactone: from 11.7 (IQR 10.2, 13.7) to 11.4 (IQR 9.6, 14.2), p = 0.23; placebo: from 12.1 (IQR 10.5, 14.0) to 12.8 (IQR 10.6, 15.2), p = 0.18 [Wilcoxon test]).

Using the aforementioned definition of response and non-response, an XGBoost algorithm was established based on the 10 most important features that influenced responder and non-responder status, which are displayed in Supplementary Figure S1. In the derivation cohort, an area under the curve (AUC) of 0.87 was achieved for discriminating between patients who did or did not improve in E/e' ratio at follow-up.
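For readers less familiar with the AUC, it can be read as the probability that a randomly chosen patient with the outcome receives a higher predicted score than a randomly chosen patient without it. The sketch below computes it by pairwise comparison. The scores and labels are toy values for illustration, not study data.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: three of the four positive/negative pairs are ranked correctly.
toy_auc = auc([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0])
```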

Validation cohort

Among the 525 patients of the validation cohort, the XGBoost algorithm identified 185 (35%) patients to be responders and 340 (65%) to be non-responders. Baseline characteristics of patients stratified according to predicted treatment response are displayed in Supplementary Table S3. While natriuretic peptides, haematocrit and haemoglobin values were higher among responders, potassium levels were lower as compared to non-responder patients. Responders treated with spironolactone had lower rates of hyperlipoproteinemia and diabetes as compared to patients receiving placebo (Table 2).

Table 2.

Baseline characteristics of the validation cohort stratified according to proposed state of response or non-response and allocation treatment.

Characteristic; Non-responder: Spironolactone N = 171, Placebo N = 169, p-value; Responder: Spironolactone N = 94, Placebo N = 91, p-value
Age 69 (61, 78) 68 (61, 77) 0.4 68 (61, 76) 71 (63, 78) 0.11
Female sex 103 (60%) 103 (61%) 46 (49%) 40 (44%) 0.5
BMI [kg/m2] 32 (28, 37) 31 (27, 37) 0.5 31 (28, 37) 32 (28, 36) 0.9
NYHA-class 0.4
I & II 106 (62%) 109 (65%) 64 (69%) 57 (63%)
III & IV 65 (38%) 58 (35%) 29 (31%) 34 (37%)
Missing 0 2 1 0
Comorbidities
Coronary artery disease 50 (29%) 62 (37%) 0.13 29 (31%) 28 (31%) >0.9
Missing 1
Arterial hypertension 152 (89%) 156 (93%) 0.2 90 (96%) 86 (95%) 0.7
Missing 0 1
Hyperlipoproteinemia 124 (73%) 120 (71%) 0.8 58 (62%) 73 (80%) 0.006
Missing 0 1
Diabetes mellitus 76 (44%) 76 (45%) 0.9 26 (28%) 40 (44%) 0.021
Missing 0 1
Chronic obstructive pulmonary disease 27 (16%) 23 (14%) 0.6 16 (17%) 10 (11%) 0.2
Missing 0 1
Atrial fibrillation 39 (23%) 34 (20%) 0.6 24 (26%) 23 (25%) >0.9
Missing 0 1
Lab values
Haemoglobin, mmol/l 7.70 (7.14, 8.54) 7.45 (6.89, 8.20) 0.052 8.57 (8.14, 9.11) 8.45 (8.07, 8.91) 0.4
Haematocrit, % 37 (34, 41) 36 (34, 40) 0.093 41 (39, 44) 41 (39, 43) 0.4
Potassium, mmol/l 4.40 (4.00, 4.60) 4.30 (4.10, 4.60) 0.8 4.00 (3.70, 4.38) 4.10 (3.80, 4.30) 0.5
Glomerular filtration rate, ml/min/1.73m2 62 (51, 77) 62 (50, 80) 0.7 71 (58, 82) 64 (53, 81) 0.3
NT-proBNP, pg/ml 380 (150, 865) 226 (120, 701) 0.2 325 (175, 593) 365 (210, 700) 0.5
Missing 96 87 25 30
Echocardiography
Left ventricular ejection fraction, % 60 (56, 64) 62 (57, 66) 0.2 63 (57, 68) 61 (58, 65) 0.3
Mitral valve doppler E, cm/s 76 (60, 98) 80 (62, 100) 0.6 77 (61, 96) 78 (59, 101) >0.9
Mitral valve doppler e' lateral, cm/s 8.24 (6.05, 10.08) 8.32 (6.92, 10.10) 0.3 8.6 (5.8, 10.5) 8.8 (6.5, 10.7) 0.8
Missing 51 42 22 42 51
Mitral valve doppler a' lateral, cm/s 7.06 (5.65, 9.11) 7.06 (5.57, 8.93) 0.7 7.20 (5.65, 8.81) 7.05 (5.49, 9.53) 0.7
Missing 48 42 22 41
Mitral valve doppler A, cm/s 71 (58, 88) 76 (61, 93) 0.079 75 (55, 91) 64 (54, 89) 0.2
Missing
Mitral valve doppler E/e’ 15 (10, 18) 15 (11, 19) 0.4 14 (9, 19) 15 (11, 20) 0.5
Missing 55 41 26 39
Country of Origin p = 0.3 p = 0.6
USA 85 (50%) 97 (57%) 59 (63%) 55 (60%)
Canada 21 (12%) 11 (6.5%) 8 (8.5%) 7 (7.7%)
Russia 47 (27%) 47 (28%) 24 (26%) 26 (29%)
Georgia 0 0 0 0
Brazil 14 (8.2%) 12 (7.1%) 1 (1.1%) 3 (3.3%)
Argentina 4 (2.3%) 2 (1.2%) 2 (2.1%) 0 (0%)

Abbreviations: BMI, body mass index; NYHA, New York Heart Association; BNP, brain natriuretic peptide.

Fisher's exact test was used to evaluate the association between categorical variables, and Wilcoxon test was used for comparison of continuous variables. Statistically significant values are indicated in bold (p < 0.05).

The primary outcome, a composite of death from cardiovascular causes, aborted cardiac arrest, or hospitalization for the management of heart failure, occurred in 137 (26%) patients with a median follow-up of 2.9 (IQR 1.7–4.0) years in the overall cohort. Overall, 40 (22%) patients of the responder group met the primary endpoint, 13 (14%) in the spironolactone treatment arm and 27 (30%) in the placebo arm. In the non-responder group 97 (29%) patients met the primary endpoint, 48 (28%) in the spironolactone arm and 49 (29%) in the placebo arm (Supplementary Figure S2). As shown in the Central Illustration, spironolactone treatment significantly reduced the occurrence of the primary outcome among responders (hazard ratio [HR] 0.42, 95% CI 0.22, 0.78; p = 0.008 [Cox-regression]), but not among patients in the non-responder group (HR 0.88, 95% CI 0.59, 1.31; p = 0.52 [Cox-regression]). This effect among responders was mainly driven by a reduction in mortality (p-log-rank = 0.028), while heart failure hospitalization only showed a non-significant trend (p-log-rank = 0.085). Responder patients showed a significantly lower number needed to treat at 4 years of 5 as compared to 33 among the non-responders. Notably, Fig. 3 shows the relative feature importance for individual patients, underlining the variable weighting of different features of the model, and Supplementary Figure S3 shows a bee-swarm SHAP value plot for the whole validation cohort.
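As a worked illustration of the number needed to treat (NNT), the reciprocal of the absolute risk reduction, the crude event counts above can be plugged in directly. Note the assumption: this uses raw proportions over the whole follow-up, whereas the NNT values of 5 and 33 reported in the text refer to 4 years and presumably derive from time-to-event estimates, so the crude figures will not match them exactly.

```python
import math


def number_needed_to_treat(events_ctl, n_ctl, events_trt, n_trt):
    """NNT = 1 / absolute risk reduction (infinite if there is no reduction)."""
    arr = events_ctl / n_ctl - events_trt / n_trt
    return 1 / arr if arr > 0 else math.inf


# Crude counts from the text; arm sizes from Table 2.
nnt_responders = number_needed_to_treat(27, 91, 13, 94)
nnt_non_responders = number_needed_to_treat(49, 169, 48, 171)
```

The crude estimates, roughly 6 for responders and above 100 for non-responders, point in the same direction as the reported 4-year values.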

Central Illustration.


Study overview of model derivation and validation. The figure presents the main findings of the study, including the relevance of the identified responder features, the number of patients predicted to be responders and non-responders in the validation cohort, and their respective outcomes. Abbreviations: NNT, number needed to treat.

Fig. 3.


Examples of patient individual feature importance. For better understanding of the variability of the relationships, the 10 variables of four individuals of the cohort with different response allocations were plotted. The x-axis shows, for every individual, the cumulative relative effect of each variable on the allocation, with features favouring response (positive values, blue) and features favouring non-response (negative values, red). Notably, this figure illustrates how differently variables with seemingly comparable values might be interpreted by the model: an E-wave of 0.5 m/s in patient number one is seen as a factor highly predisposing the patient to be a responder, whereas an E-wave of 0.6 m/s in patient number two suggests a higher chance of non-response. This underlines the strength of decision tree-based algorithms, where variables can be interpreted beyond mere linear assumptions and their influence on the model changes according to accompanying covariables. Abbreviations: BMI, body mass index; BNP, brain natriuretic peptide; HCT, haematocrit; K+, potassium; Hb, haemoglobin; WBC, white blood cell count; A-Wave transmitral E-Wave on continuous wave doppler echocardiography; LV-EF, left ventricular ejection fraction. Natriuretic peptides were entered into the model as % difference from their respective pathological cut-off, which was <125 ng/l for NT-proBNP and <50 ng/l for BNP.

Sensitivity analysis

Baseline characteristics of the sensitivity cohort with imputed data are shown in Supplementary Tables S4 and S5. Overall, 1320 (38%) patients were identified as responders and 2125 (62%) as non-responders. The number of imputed variables is displayed in Supplementary Table S6. Encouragingly, the main results were consistent, with a higher benefit for patients treated with spironolactone as compared to placebo in the responder group, while non-responders still did not show any improvement with spironolactone therapy. The overall treatment effect size was lower than in the validation cohort (Supplementary Figure S4). Furthermore, when patients allocated to treatment in Russia or Georgia were excluded from the sensitivity analysis, results similar to those of the validation cohort were retained (Supplementary Figure S5).

Discussion

Using data from two prospective randomized clinical trials, an ML approach was able to identify HFpEF patients who showed a markedly positive response to spironolactone treatment in terms of filling pressure estimates12 and clinical endpoints. Conversely, the algorithm was also able to identify a fraction of HFpEF patients who are unlikely to benefit from spironolactone treatment in terms of clinically relevant outcomes. The proposed ML approach has the potential to change clinical practice for HFpEF specifically, and even more so for precision medicine in general, by leveraging complex decision tree-based algorithms.

As HFpEF is a syndrome characterized by heterogeneous features,1,13 it might be an exemplary cardiovascular disease in which to investigate and ultimately apply an ML strategy accounting for heterogeneous features. Our algorithm was based on functional improvement (i.e. E/e’ change) in the Aldo-DHF trial. The focus on this surrogate endpoint was necessary due to a short follow-up of only 12 months and a low event rate. Nevertheless, the algorithm was validated in an independent cohort on all-cause mortality and heart failure hospitalizations and showed good performance in identifying patients who are likely to benefit from spironolactone in the validation analysis. The finding is in line with previous data in which elevated filling pressures were linked to adverse outcomes,14 and improving filling pressures has been proposed as a prognostically relevant approach among HFpEF patients.15,16

There is still an unmet clinical need for an effective therapy in HFpEF patients. Two of the most promising trials in HFpEF patients, the TOPCAT (mineralocorticoid receptor antagonist)2 and the PARAGON-HF trial (angiotensin receptor-neprilysin inhibitor),3 failed to show statistical significance for their primary endpoint, although E/e' was reduced by angiotensin receptor-neprilysin inhibition in comparison to different RAS blockade treatments.4 Importantly, out of 34 predefined subgroups in TOPCAT and PARAGON-HF, only 3 showed positive results (8.8% positive results at a given alpha error of 5%). Subgroup analyses that account for only one variable might fail to capture heterogeneous treatment effects. The incremental value of ML approaches over conventional analyses lies in their ability to model more complex interactions between variables. This inherently limits plausibility, but the factors predicting a response to therapy are possibly defined by many different features, likely in a non-linear way. As shown, ML can stratify patients according to their response to medical therapy, translating complex features into simple, interpretable dichotomous categories, i.e. responders and non-responders. While it is tempting to speculate on the relative implications of the features identified by the ML model, it is not reasonable to do so. The predictive power of such algorithms is derived from an individualized prediction model, in which the same feature might be considered beneficial or harmful in different patients. For example, E-wave was the most important feature of the model. One might argue that a higher E-wave, as an indicator of elevated filling pressures,12,17 is an obvious predictor of spironolactone treatment response. However, Fig. 3 shows that in ML models, the effect of a specific variable is modified by its covariables. Therefore, while a high E-wave might in fact be a predictor of response in some patients, it might be a predictor of non-response in others. The interpretability of results is sacrificed for the sake of an improved and, foremost, individualized prediction of treatment response and therefore possibly optimized patient selection. This is a typical feature of ML approaches but likely better reflects reality, where causal relationships are usually complex and multifactorial. Importantly, such a model can therefore hardly be expressed by conventional cluster analyses and even exceeds their potential, at the cost of reduced interpretability.18
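
The idea that one feature can point in opposite directions depending on its covariables can be illustrated with a toy, hand-written decision tree. This is a minimal sketch and not the published XGBoost model: the split point of 90 cm/s, the binary `covariate_high` flag, and the leaf scores are hypothetical values chosen only for illustration.

```python
# Toy illustration of effect modification in a tree-based model.
# All thresholds and leaf scores are hypothetical; this is NOT the published model.


def toy_tree_score(e_wave_cm_s: float, covariate_high: bool) -> float:
    """Return a toy score; positive favours 'responder', negative 'non-responder'."""
    high_e_wave = e_wave_cm_s > 90.0  # hypothetical split point
    if covariate_high:
        # In this branch a high E-wave pushes towards response.
        return 1.0 if high_e_wave else -1.0
    # In the other branch the same high E-wave pushes towards non-response.
    return -1.0 if high_e_wave else 1.0


def e_wave_effect(covariate_high: bool, low: float = 70.0, high: float = 110.0) -> float:
    """Change in toy score when E-wave rises from `low` to `high`, covariate held fixed."""
    return toy_tree_score(high, covariate_high) - toy_tree_score(low, covariate_high)


if __name__ == "__main__":
    for flag in (True, False):
        print(f"covariate_high={flag}: effect of higher E-wave = {e_wave_effect(flag):+.1f}")
```

In this toy tree the effect of a higher E-wave is positive in one branch and negative in the other, so it cancels out when both branches are averaged with equal weight. A subgroup analysis on E-wave alone would therefore see no effect, even though the variable is informative in every individual patient.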

Beyond its use in HFpEF, the presented approach has the potential to allow a next step in precision medicine. The advent of comprehensive ML algorithms and their increasing availability have given rise to a large number of newly developed risk prediction scores9,19 as well as phenotyping approaches1,8,20,21 in cardiovascular medicine. Beyond an optimized and individualized risk prediction, precision medicine requires the determination of a patient's individual response to therapy. Using the presented reiterative cluster allocating permutation to first identify therapy responders, followed by a supervised ML strategy such as XGBoost to define complex cohort characteristics, a precision medicine treatment algorithm can be established for every treatment for which randomized clinical trials of relevant size have been performed. This improved selection of specific patient populations in accordance with their treatment response may have two large advantages: 1) for subsequent randomized trials it may decrease the sample size required to observe a significant treatment effect; and 2) in clinical practice the number of patients exposed to interventions from which they are unlikely to derive benefit may be vastly reduced.18,22
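
The first advantage, a smaller required sample size, can be made concrete with a rough calculation. The sketch below uses the standard normal-approximation formula for comparing two proportions, which is a simplification, since the validation endpoint in this study was time-to-event. The event rates and the responder fraction are hypothetical inputs for illustration only and are not results of this analysis.

```python
# Rough sample-size comparison: all-comers trial vs. trial enriched for predicted responders.
# Simplifying assumption: binary endpoint with the two-proportion normal approximation.
# All event rates and the responder fraction are hypothetical.
import math
from statistics import NormalDist


def n_per_arm(p_control: float, p_treated: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Patients per arm needed to detect p_control vs p_treated (two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p_control - p_treated) ** 2)


def diluted_event_rate(p_control: float, p_treated_responder: float, responder_fraction: float) -> float:
    """Treated-arm event rate in all-comers if only responders benefit from treatment."""
    return responder_fraction * p_treated_responder + (1 - responder_fraction) * p_control


if __name__ == "__main__":
    p_control = 0.20            # hypothetical event rate without treatment
    p_treated_responder = 0.12  # hypothetical event rate in treated responders
    responder_fraction = 0.35   # hypothetical share of responders among all-comers

    p_all = diluted_event_rate(p_control, p_treated_responder, responder_fraction)
    print("enriched trial, n per arm:  ", n_per_arm(p_control, p_treated_responder))
    print("all-comers trial, n per arm:", n_per_arm(p_control, p_all))
```

Under these assumed inputs the treatment effect in an all-comers trial is diluted by non-responders, so many more patients per arm are needed than in a trial restricted to predicted responders. How large the difference is in practice depends on how accurately responders can be identified, which this sketch does not model.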

Strengths and limitations

The study population used in the development and validation of the model was obtained from two randomized clinical trials, and the results may not be applicable to a broader population in clinical practice. The initial validation cohort comprised only 15% of patients from the TOPCAT cohort, but results were retained in a sensitivity analysis that included patients with missing baseline variables.

In the derivation cohort, the improvement in E/e' was employed as a surrogate marker for therapeutic response. In the validation cohort, however, the endpoint was a composite of cardiovascular mortality, aborted cardiac arrest, and heart failure hospitalization. The model's performance could potentially have been enhanced if the derivation had been conducted using the same outcome variables as the validation cohort. As limited data availability hampered this approach in our study, this will need to be determined in future investigations.

There has been some discussion as to whether patients from Russia and Georgia actually received the allocated treatment in the TOPCAT trial,23 which we addressed by performing a subgroup analysis that excluded those patients and retained the study's main results. The application of the model to the entire TOPCAT population showed a lower treatment effect size compared to the validation cohort. However, after exclusion of patients from Russia and Georgia, a similar effect size could be demonstrated in the validation and sensitivity cohorts.24

Other potentially useful data such as imaging,25 multiomics,9,26 and environmental factors could further improve prediction but were not available for this analysis. On the other hand, the model provided by the current study works well with readily available clinical data, is made freely available for clinical practice on request, and might easily be implemented in daily practice. The role of the derived algorithm in a prospective approach is yet to be determined.

Conclusions

Using ML, we derived and validated an algorithm that uses readily available clinical data to identify individual HFpEF patients who are likely to respond favorably to spironolactone treatment when compared to placebo. This may significantly reduce the number needed to treat and might pave the way for future precision medicine approaches in CV medicine and beyond. Larger prospective studies are needed to apply the current findings and approaches in primary and secondary interventions and to assess their prospective and additive value.

Contributors

All authors read and approved the final version of the manuscript. The individual contributions were as follows:

Karl-Patrik Kresoja verification of the underlying data, conceptualisation, formal analysis, investigation, methodology, project administration, visualisation, writing—original draft.

Matthias Unterhuber verification of the underlying data, conceptualisation, formal analysis, investigation, methodology, project administration, visualisation, writing—original draft.

Rolf Wachter writing—review & editing.

Karl-Philipp Rommel writing—review & editing.

Christian Besler writing—review & editing.

Sanjiv Shah writing—review & editing.

Holger Thiele writing—review & editing.

Frank Edelmann writing—review & editing.

Philipp Lurz writing—review & editing, supervision.

Data sharing statement

Data from the Aldo-DHF study will be made available on reasonable request by contacting the primary investigators.

The data of the TOPCAT study were kindly provided by the NHLBI and will be made available on request by filling out the required forms at: https://biolincc.nhlbi.nih.gov/home/.

Declaration of interests

Karl-Patrik Kresoja: travel grants from Edwards Lifesciences.

Karl-Philipp Rommel: Personal and unrestricted research grant “excellence fellowship” from Else-Kröner-Fresenius-Stiftung, Bad Homburg, Germany.

Philipp Lurz: Institutional grants from Abbott Vascular, ReCor and Edwards Lifesciences.

Rolf Wachter: Personal fees from Daiichi Sankyo, Gilead, Novartis, Pfizer, Pharmacosmos, and Servier; grants and personal fees from Boehringer Ingelheim, CVRx, and Medtronic; grants from Bundesministerium für Bildung und Forschung, the European Union, and Deutsche Forschungsgemeinschaft.

Frank Edelmann received honoraria as a consultant or speaker from AstraZeneca, Bayer, Merck, MSD, Boehringer Ingelheim, Novartis, Pfizer, Pharmacosmos, Vifor Pharma and Servier.

Matthias Unterhuber, Christian Besler, Sanjiv Shah and Holger Thiele have nothing to disclose.

Acknowledgements

The Aldo-DHF study was supported by the German Competence Network of Heart Failure. Aldo-DHF was funded by the Federal Ministry of Education and Research Grant 01GI0205 (clinical trial program Aldo-DHF [FKZ 01KG0506]). The University of Göttingen was the formal sponsor. NCT00108251.

The TOPCAT trial was funded by the National Heart, Lung, and Blood Institute; TOPCAT ClinicalTrials.gov number, NCT00094302.

Footnotes

Appendix A

Supplementary data related to this article can be found at https://doi.org/10.1016/j.ebiom.2023.104795.

References

  • 1. Shah S.J., Katz D.H., Selvaraj S., et al. Phenomapping for novel classification of heart failure with preserved ejection fraction. Circulation. 2015;131:269–279. doi: 10.1161/CIRCULATIONAHA.114.010637.
  • 2. Pitt B., Pfeffer M.A., Assmann S.F., et al. Spironolactone for heart failure with preserved ejection fraction. N Engl J Med. 2014;370:1383–1392. doi: 10.1056/NEJMoa1313731.
  • 3. Solomon S.D., McMurray J.J.V., Anand I.S., et al. Angiotensin-neprilysin inhibition in heart failure with preserved ejection fraction. N Engl J Med. 2019;381:1609–1620. doi: 10.1056/NEJMoa1908655.
  • 4. Pieske B., Wachter R., Shah S.J., et al. Effect of sacubitril/valsartan vs standard medical therapies on plasma NT-proBNP concentration and submaximal exercise capacity in patients with heart failure and preserved ejection fraction: the PARALLAX randomized clinical trial. JAMA. 2021;326:1919–1929. doi: 10.1001/jama.2021.18463.
  • 5. McDonagh T.A., Metra M., Adamo M., et al. 2021 ESC Guidelines for the diagnosis and treatment of acute and chronic heart failure. Eur Heart J. 2021;42:3599–3726. doi: 10.1093/eurheartj/ehab368.
  • 6. Edelmann F., Wachter R., Schmidt A.G., et al. Effect of spironolactone on diastolic function and exercise capacity in patients with heart failure with preserved ejection fraction: the Aldo-DHF randomized controlled trial. JAMA. 2013;309:781–791. doi: 10.1001/jama.2013.905.
  • 7. Mottram P.M., Haluska B., Leano R., Cowley D., Stowasser M., Marwick T.H. Effect of aldosterone antagonism on myocardial dysfunction in hypertensive patients with diastolic heart failure. Circulation. 2004;110:558–565. doi: 10.1161/01.CIR.0000138680.89536.A9.
  • 8. Karwath A., Bunting K.V., Gill S.K., et al. Redefining β-blocker response in heart failure patients with sinus rhythm and atrial fibrillation: a machine learning cluster analysis. Lancet. 2021;398:1427–1435. doi: 10.1016/S0140-6736(21)01638-X.
  • 9. Unterhuber M., Kresoja K.-P., Rommel K.-P., et al. Proteomics-enabled deep learning machine algorithms can enhance prediction of mortality. J Am Coll Cardiol. 2021;78:1621–1631. doi: 10.1016/j.jacc.2021.08.018.
  • 10. Liu Q., Huang S., Desautels D., McManus K.J., Murphy L., Hu P. Development and validation of a prognostic 15-gene signature for stratifying HER2+/ER+ breast cancer. Comput Struct Biotechnol J. 2023;21:2940–2949. doi: 10.1016/j.csbj.2023.05.002.
  • 11. Chen T., Guestrin C. In: Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining. Krishnapuram B., Shah M., Smola A., Aggarwal C., Shen D., Rastogi R., editors. ACM; New York, NY, USA: 2016. XGBoost; pp. 785–794.
  • 12. Ommen S.R., Nishimura R.A., Appleton C.P., et al. Clinical utility of Doppler echocardiography and tissue Doppler imaging in the estimation of left ventricular filling pressures: a comparative simultaneous Doppler-catheterization study. Circulation. 2000;102:1788–1794. doi: 10.1161/01.cir.102.15.1788.
  • 13. Braunwald E. Heart failure with preserved ejection fraction: a stepchild no more! Eur Heart J. 2021;42:3900–4001. doi: 10.1093/eurheartj/ehab601.
  • 14. Dorfs S., Zeh W., Hochholzer W., et al. Pulmonary capillary wedge pressure during exercise and long-term mortality in patients with suspected heart failure with preserved ejection fraction. Eur Heart J. 2014;35:3103–3112. doi: 10.1093/eurheartj/ehu315.
  • 15. Abraham W.T., Stevenson L.W., Bourge R.C., Lindenfeld J.A., Bauman J.G., Adamson P.B. Sustained efficacy of pulmonary artery pressure to guide adjustment of chronic heart failure therapy: complete follow-up results from the CHAMPION randomised trial. Lancet. 2016;387:453–461. doi: 10.1016/S0140-6736(15)00723-0.
  • 16. Adamson P.B., Abraham W.T., Bourge R.C., et al. Wireless pulmonary artery pressure monitoring guides management to reduce decompensation in heart failure with preserved ejection fraction. Circ Heart Fail. 2014;7:935–944. doi: 10.1161/CIRCHEARTFAILURE.113.001229.
  • 17. Nagueh S.F., Smiseth O.A., Appleton C.P., et al. Recommendations for the evaluation of left ventricular diastolic function by echocardiography: an update from the American society of echocardiography and the European association of cardiovascular imaging. J Am Soc Echocardiogr. 2016;29:277–314. doi: 10.1016/j.echo.2016.01.011.
  • 18. Weissler E.H., Naumann T., Andersson T., et al. The role of machine learning in clinical research: transforming the future of evidence generation. Trials. 2021;22:537. doi: 10.1186/s13063-021-05489-x.
  • 19. Angraal S., Mortazavi B.J., Gupta A., et al. Machine learning prediction of mortality and hospitalization in heart failure with preserved ejection fraction. JACC Heart Fail. 2020;8:12–21. doi: 10.1016/j.jchf.2019.06.013.
  • 20. Segar M.W., Patel K.V., Ayers C., et al. Phenomapping of patients with heart failure with preserved ejection fraction using machine learning-based unsupervised cluster analysis. Eur J Heart Fail. 2020;22:148–158. doi: 10.1002/ejhf.1621.
  • 21. Woolley R.J., Ceelen D., Ouwerkerk W., et al. Machine learning based on biomarker profiles identifies distinct subgroups of heart failure with preserved ejection fraction. Eur J Heart Fail. 2021;23:983–991. doi: 10.1002/ejhf.2144.
  • 22. Schork N.J. Personalized medicine: time for one-person trials. Nature. 2015;520:609–611. doi: 10.1038/520609a.
  • 23. de Denus S., O'Meara E., Desai A.S., et al. Spironolactone metabolites in TOPCAT–new insights into regional variation. N Engl J Med. 2017;376:1690–1692. doi: 10.1056/NEJMc1612601.
  • 24. Pfeffer M.A., Claggett B. Behind the scenes of TOPCAT — bending to inform. NEJM Evidence. 2022;1. doi: 10.1056/EVIDctcs2100007.
  • 25. Rommel K.-P., von Roeder M., Latuscynski K., et al. Extracellular volume fraction for characterization of patients with heart failure and preserved ejection fraction. J Am Coll Cardiol. 2016;67:1815–1825. doi: 10.1016/j.jacc.2016.02.018.
  • 26. Urbich M., Globe G., Pantiri K., et al. A systematic review of medical costs associated with heart failure in the USA (2014-2020). Pharmacoeconomics. 2020;38:1219–1236. doi: 10.1007/s40273-020-00952-0.

Articles from eBioMedicine are provided here courtesy of Elsevier
