Abstract
Background
The 2017 guidelines of the American College of Cardiology and the American Heart Association propose substantial changes to hypertension management. The guidelines lower the blood pressure threshold defining hypertension and promote more aggressive treatments. Thus, more individuals are now classified as hypertensive and as a result, medication usage may become more extensive. An inevitable byproduct of greater medication use is higher incidence of adverse effects. Here, we examined these issues by developing models that predict both cardiovascular events and other adverse events based on the treatment chosen and other patient’s data.
Methods and results
We used data from the SPRINT trial to produce patient-specific predictions of the risks for adverse cardiovascular or kidney outcomes. Unlike prior models, we used both the baseline characteristics collected upon recruitment and the longitudinal data during the follow-up. Importantly, our cardiovascular predictor outperformed extant models on SPRINT participants, achieving AUC = 0.765, and was validated with good performance in an independent cohort of the ACCORD trial.
Conclusions
Our study illustrates the importance of including longitudinal data for assessing personalized risk and provides means for recommending personalized treatment decisions.
Introduction
The hypertension treatment guidelines published in 2017 redefine thresholds for aggressive treatment of patients with hypertension and high risk of cardiovascular (CV) events [1]. An inevitable consequence of greater pharmacotherapy is an increase in adverse events [2]. The new guidelines are based on the conclusions of the SPRINT study [3], a large clinical trial aimed at determining the benefit of aggressive vs. standard treatment for lowering the risk of CV events. The design and recommendations raise important questions regarding personalized treatments. To what extent can we predict CV events by using multiple clinical parameters available upon recruitment? Can longitudinal data improve such predictions? Can we predict adverse events that may arise from aggressive treatment? A possible starting point for answering these questions is the development of predictive models that can distinguish between patients more likely to benefit from the aggressive treatment and those who are at higher risk to experience adverse events.
The SPRINT study data can be utilized to address these questions. SPRINT measured the effect of intensive vs. standard treatment arms for reduction in systolic blood pressure (SBP) in 9,361 non-diabetic patients with high risk of CV events. Comparing the response of the two arms, the study showed a significant decrease in the number of CV events in the intensive treatment group, along with an increased risk of some adverse events, most prominent of which were kidney-related outcomes. These data were subsequently released as part of the SPRINT Challenge [4]; and several studies utilized it, some in combination with other data, to address questions related to personalized prognosis [5–7]. In particular, Patel et al. [8] and Basu et al. [9] presented predictors for CV and adverse events using logistic regression and Cox proportional hazards regression (CoxPH) [10], respectively, using only the data collected upon participant entry to the study.
The SPRINT data is of two types: characteristics taken from each patient on the day of enrollment (baseline data) and follow-up blood pressure measurements taken at periodic visits (longitudinal data). Here, we used the SPRINT data to provide novel predictive models combining baseline and longitudinal data and examined whether they outperform extant methods. Using supervised methods, we compared multivariable models that predict the risk of CV outcomes and kidney-related events based on baseline data only, or utilizing all data (baseline and longitudinal). Understanding the value of longitudinal data to prediction models is timely, as use of such data will rapidly increase in the near future owing to digital health appliances. Such appliances allow patients to measure their blood pressure at home [11], making time-series data available to the physician at the point of treatment decision and even on the first visit.
Methods
The SPRINT data set has three components for each patient: baseline characteristics collected at the day of enrollment, longitudinal measurements taken over time, and clinical outcomes. There are 20 baseline parameters, describing demographic and medical information. The longitudinal data are systolic and diastolic blood pressure (BP) values measured at periodic visits (every 3 months) for each subject (mean number of visits per patient: 14.46, SD: 4.13. 135,205 visits in total). We used these time series to extract summary statistics for each subject. Using summary statistics of longitudinal data to characterize follow-up response is a well-established tool for medical applications [10]. S1 Table shows of the baseline and longitudinal features and their statistics in both study arms; the main SPRINT study outcomes are summarized in S2 Table. Here, we focused on two main outcomes: (1) the study’s primary outcome, namely CV events (myocardial infarction, acute coronary syndrome, stroke, heart failure and death from CV causes) and (2) kidney-related outcomes (novel Chronic Kidney Disease (CKD) for non-CKD patients at baseline, and Acute Kidney Injury or renal failure (AKI)). Patients with CKD at baseline were excluded from our prediction of novel CKD. Unlike previous methods that predicted all Serious Adverse Events (SAEs), we focused on the prediction of kidney-related outcomes only. Since kidney-related events were the most statistically significant adverse events in the SPRINT study, we reasoned that controlling those was most critical for personalized prognosis.
We constructed two prediction models. The baseline model utilizes as candidate predictors for model construction all baseline features (see S1 Table). Two constructed variables combining two baseline features, the ratio of HDL to total cholesterol and the ratio of eGFR to serum creatinine, were added based on prior knowledge. The longitudinal model uses as additional candidate predictors summary statistics extracted from longitudinal measurements as follows. For each visit, we computed the pulse pressure (PP, namely SBP minus DBP) and then summarized the series of PP values as the following features: the difference between the maximum and minimum, the mean, the slope of a linearly fitted curve, its R squared statistic and its F statistic. A linear curve is the simplest representation of the BP trend and is less prone to overfitting than higher rank curves. The R and F statistics reflect the strength of the trend. To avoid information leakage in cases where a BP measurement was taken close to the event, we followed a retrospective study design: we excluded measurements from the last t months (using t = 6 or 12) prior to the event or to the end of follow-up (S1 Fig). Patients with less than three measurements up to t months prior to the event/censoring were excluded from the longitudinal model (n = 435 (4.6%) and 634 (6.7%) for t = 6 and t = 12, respectively). All our predictors are multivariable, i.e., they use multiple features for prediction. In contrast, the SPRINT study is akin to a predictor that uses a single parameter (group assignment at recruitment) to predict the outcome.
For each individual, we used Cox proportional hazard regression on the baseline data to predict the risk for an outcome, and Logistic Regression on baseline and longitudinal data to predict the probability of the outcome. Final models were derived using lasso regularization [12,13] to account for collinearity between candidate predictors and to get compact, interpretable models. To evaluate the predictions, we used 10-fold cross-validation with internal sampling (see S1 File for details) and estimated the resulting ROC AUC (a measure equivalent to the C-statistic [14] used in other studies). The process was repeated 50 times to obtain the distribution of AUC scores. We compared the mean performance of our models with two extant methods, Patel et al. [8] and Basu et al. [9]. We evaluated the difference between the AUC scores using a simple t-test. For comparison with the method of Basu et al. we used a two-sample t-test as we have the equivalent distribution of AUC scores of their method. For Patel et al. we used the simple one-sample t-test as we only had the mean performance (without the variance across the cross-validation folds). Comparison was for prediction of CV events only, since the other two methods predicted SAE and not kidney-related events. To evaluate the quality of predicting kidney-related outcomes we compared our models with a univariate model that utilizes the treatment arm only (analogous to the original SPRINT study).
Treatment recommendation—A simulation
How can our approach help the physician decide on aggressive vs. standard treatment given the risks for CV and adverse events? This decision is not trivial since aggressive treatment may decrease CV risk but induce higher risk for adverse events and vice versa [2,3]. One way to overcome this difficulty is to use both risks when deciding on the right treatment. To demonstrate this approach, we developed a method that recommends aggressive treatment to subjects with high risk for CV events but also keeps the AKI risk low. We studied this problem when the adverse event was AKI and did not include CKD as it was not defined for all SPRINT patients). We trained a logistic regression model using the predicted CV and AKI risks as covariates. A threshold value θ is set, and subjects with regression value above θ were assigned to the intensive treatment (in that sense, θ weighs the relative severity of CV vs. AKI events). See S1 File for more details.
External validation on ACCORD data
We validated our models on the independent ACCORD dataset [15]. The ACCORD trial had similar goals to SPRINT but focused exclusively on type-2 diabetes mellitus patients. ACCORD was different from SPRINT in three additional aspects that are relevant for our analysis: (1) the set of CV events that defined a primary CV outcome. We addressed this by aggregating the same secondary CV outcomes of ACCORD as described by SPRINT (non fatal MI, non fatal stoke, coronary heart disease, heart failure and death from CV cause) to construct a matched composition for a primary CV outcome, (2) Since the definitions of kidney events were different in ACCORD, we validated only the CV models. (3) Finally, as the ACCORD trial did not report the 10-year Framingham Risk score it was excluded from this analysis.
In order to validate our CV models on the ACCORD data, we derived baseline and longitudinal models trained using internal sampling (see S1 File) from the entire SPRINT dataset, recalibrated them by adjusting the intercepts in order to match the overall event rates in ACCORD [14] and used them to predict CV events for all ACCORD participants. As in the SPRINT validation, patients with less than three BP measurements up to t months prior to the event/censoring were excluded from the prediction by the longitudinal model (n = 233 (4.9%), 346 (7.3%) for t = 6 and t = 12 respectively).
Results
The results using only baseline variables are shown in Fig 1A. For all outcomes, multivariable prediction was far better (the univariate prediction using the treatment arm variate achieved 0.52 ROC AUC for CV outcome). This is expected, since multivariable methods leverage more information about the background risk of an individual. Multivariable prediction of primary CV outcome was slightly lower than for the models suggested in [8,9]. The advantage of the Basu and Patel models probably stems from the fact that they utilized multiple nonlinear combinations of variables, whereas our models used only two interactions between variables.
Fig 1B shows that our longitudinal model predicts the probability of the primary CV outcome better: Using the longitudinal data up to t = 6 and t = 12 months prior to a primary CV outcome event, we obtained a marked improvement compared to the multivariable baseline analysis and outperformed the models in [8,9] (p<2.4E-30, Student’s t-test, p<0.01 using Chebyshev’s inequality and avoiding the normal assumption). We emphasize that the primary outcome predicted was identical for all methods. Reassuringly, when ranking subjects by our predicted CV probability, we achieved hazard ratio of 9.171 between the first and third tertile of the cohort (Fig 1B). All our models showed good calibration when plotting the observed vs. predicted risk percentiles with modest overestimation on the top risk deciles (S2–S6 Figs). Notably, the most important features for the prediction of CV outcomes were the summary statistics extracted from longitudinal data and prior CV events (S3 Table). For predicting kidney-related outcomes, the most important features included known kidney markers such as urinary albumin to creatinine ratio, along with other general risk factors like the pulse pressure. For a full list of important features and their coefficients see S3–S5 Tables. The longitudinal model did not improve prediction of kidney related outcomes.
Fig 2 shows the estimated risks of CV, CKD and AKI outcomes for all included individuals according to our final models. These plots give an overview of the distribution of risks and event probabilities calculated for the study patients. Both plots show a reasonable separation between the study arms. More importantly, they illustrate that actual events tend to cluster in areas where our models predict greater likelihood for an event (i.e., higher x coordinate values for CV outcomes and higher y values for kidney events).
Treatment recommendation
We applied our recommendation system (see Methods) in a simulated scenario of assignment of SPRINT participants, based only on their pre-trial data. Our method partitioned the participants into two groups: Those with recommended intensive treatment (RI) and those with recommended standard treatment (RS). The RI group contained 4,246 patients (45.3%, S6 Table). Notably, the hazard ratio for CV events between RI and RS patients was 1.341 (1.136–1.583 95% CI, p-value = 0.002). Hence, patients for whom our system recommended intensive treatment were indeed at higher risk for CV events than patients who were recommended standard treatment. Moreover, the hazard ratio for AKI events between RI and RS patients was 0.676 (0.539–0.848 95% CI, p-value = 0.018). In other words, RI patients were at lower risk for AKI. This clearly demonstrates the ability of our recommendation system to identify patients who would benefit from intensive treatment and are less likely to experience AKI. A possible explanation for the success of our system could have been that the RI group is biased, as it includes more patients who were truly given intensive treatment. We ruled out this possibility by showing that the recommendations were not biased towards any treatment arm of the SPRINT study (S7 Table).
This analysis suggests that our models have the power to recommend treatment with reduced risks for both CV and AKI to the specific patient. Assigning treatment according to multiple variables is non-trivial, especially when multiple risks must be weighed, and the exact weight of each risk cannot be easily assessed. In our case, the choice of the threshold θ indirectly reflects the relative weights of each risk (see S10 Fig). While in our implementation we chose θ based on the training group (S1 File), one can modify θ to reflect personal severity assumptions. This analysis is retrospective and additional trials are needed in order to evaluate its accuracy and efficiency.
External validation on ACCORD data
Fig 3A shows the performance of our SPRINT-derived longitudinal models in predicting CV events on the independent cohort of the ACCORD study. Performance was high, with only a modest decrease as compared to the SPRINT validation. Decent calibration was achieved when plotting the observed vs. predicted values for CV risk percentiles (S7–S9 Figs). Notably, our longitudinal models outperformed the models in [8,9] for predicting CV events on the ACCORD data (p<7.7E-37, Student’s t-test, p<0.01 using Chebyshev’s inequality and avoiding the normal assumption). Again, we stress that the predicted primary outcome was the same for all methods. When ranking the ACCORD patients by their predicted risk, the hazard ratio between top and bottom tertile was 5.999 (4.65–7.73995% CI, P-value < 5.5*10^-9) (Fig 3B).
Discussion
Our analysis highlights the value of longitudinal treatment response data for accurate prediction of CV events. Such analysis incorporates variables measured later than the baseline variables and reflects the treatment response. Thus, they offer direct information not used by extant models. While seemingly trivial, integration of longitudinal data into such prediction models has proven to be challenging [16]. To the best of our knowledge, our study is the first to leverage longitudinal blood pressure measurements to significantly improve CV risk prediction. By quantifying simultaneously the CV benefit and the kidney-related risks, our predictions can assist clinicians in treatment decisions. We demonstrated such usage here by developing a tool that predicts both risks for a patient considered for aggressive treatment. We show that it successfully discriminates between patients who would benefit from such treatment and those who are more likely to be harmed by it.
The SPRINT randomized controlled trial (RCT) showed a significant effect of aggressive treatment for hypertension on CV events, but observed that the treatment also increased the risk of renal complications. It is important to note that the main goal of an RCT, by definition, is to provide an estimate for one causal factor. In other words, the randomization is used to separate the effect of the treatment from other factors such that the causal effect of the treatment on the outcome would be identifiable [17]. However, the task of predicting the risk for a specific subject, the focus of precision medicine, is fundamentally different from the goal of an RCT. Such prediction should consider as many presumed causal risk factors (or at least proxies of these factors) as possible when making the assessment. Naturally, once the SPRINT RCT showed the significance of the treatment, this variable should be added into the model, as was recapitulated in our results. It is therefore not surprising that our analysis and those of others easily improved upon a univariate model akin to that used in the SPRINT study.
In this study, we adopted a retrospective study design that utilized data from the SPRINT trial to train powerful predictive models that rely on both baseline and longitudinal follow-up data, for personalized prognosis of cardiovascular patients. While our analysis is limited due to the retrospective nature of the data, our careful validations illustrate the usefulness of integrating regularly collected longitudinal data into prediction models. Recently, Pool et al.[16] examined whether incorporating cumulative SBP measures that summarize SBP levels collected over time improve atherosclerotic cardiovascular disease prediction. Their results show only modest improvement by incorporating cumulative SBP in the prediction as compared to a single SBP measure. In contrast, we presented a marked improvement by introducing novel features derived from the longitudinal data. Possible explanations of our better result is the fact that we make a more complex use of the response to treatment. Notably, our results show a clear improvement trend in prediction quality: multivariable models outperform univariate models, and multivariable models that use baseline and longitudinal features improve over models that incorporate baseline data only. Taken together, our main contribution is illustrating how longitudinal data can contribute to more precise risk estimation.
Our models performed well in internal validation on SPRINT dataset. The validation on the independent ACCORD dataset further confirmed the usefulness and generalizability of the CV outcome predictor. In an additional evaluation of the model, we tested it in cross validation on each treatment arm of the SPRINT study separately, in order to rule out a possible concern that the treatment assignment significantly confounded our results. The results (S11 Fig) when testing each group separately still outperform the baseline model, with a modest decrease in performance. Some decrease is expected, due to reduced sample size and exclusion of the treatment type feature from the model (an important feature for CV prediction, S3 Table). Nevertheless, this experiment demonstrated that inclusion of longitudinal data in the model robustly improved predictive power.
Several other machine learning studies utilized data from SPRINT and demonstrated the advantage of multivariable predictors [8,9]. Our study is the first to utilize the longitudinal data and show that using these data significantly improves risk prediction. Our longitudinal models clearly illustrate that blood pressure measurements taken well ahead of the primary outcome contribute to prediction accuracy. This observation was corroborated in our results in two ways: 1) features of the longitudinal data had higher importance in our models than the baseline characteristics, and 2) longitudinal models that use only BP measurements outperformed all baseline models, each of which utilizing 15–20 features (Fig 1B). More generally, our results suggest that longitudinal data, which is expected to be available more broadly and at denser sampling frequency owing to new technologies (e.g. wearable devices), should become a standard tool for assessing and improving treatment policies. The advantage of using the longitudinal data is probably in part because the data carry information about the response to the treatment, which is a downstream effect of the treatment arm and a more direct risk factor.
Our study has some major limitations. First, we carried out a retrospective analysis. A randomized trial is needed to precisely estimate the effect of the new recommendation system and rule out post-randomization biases in the predictors. Second, as explained above, we decided to consider only kidney related SAEs, because of the results of the SPRINT study. Larger sample sizes in future studies may enable other predictions. In prediction of all SAE, extant models outperformed our baseline model (S12 Fig). Third, individual patient adherence to the assigned treatments may have introduced bias to our results. The fact that we cannot control for such a confounding effect is a limitation of this analysis. Finally, our main external validation was performed using the ACCORD dataset. Our results clearly showed the generalizability of our model, but the exact estimates of the hazard ratios should be interpreted with caution, as the ACCORD trial focused on diabetes subjects. In addition, since both SPRINT and ACCORD contained exclusively patients from the USA, additional validation on cohorts from other countries is needed to demonstrate the robustness of the predictors.
R scripts for reproducing all our results are available from https://github.com/Shamir-Lab/SPRINT
Supporting information
Acknowledgments
We thank Idan Nurick for helpful comments.
Data Availability
R scripts for reproducing all results shown in the manuscript are available from https://github.com/Shamir-Lab/SPRINT. The study utilizes data collected during the SPRINT and ACCORD trials. This data is owned by a third party and cannot be transferred by us due to privacy reasons imposed by the owner. It can be acquired from BioLINCC (contact via https://biolincc.nhlbi.nih.gov/studies/sprint/) for researchers who meet the criteria for access to confidential data.
Funding Statement
Study supported in part by the Israel Science Foundation (ISF) as part of the ISF-NSFC joint program (grant 2193/15), by ISF grant 1339/18, and by Len Blavatnik and the Blavatnik Family foundation (to RS) and a fellowship from the Edmond J. Safra Center for Bioinformatics at Tel Aviv University (to GD).
References
- 1.Whelton PK, Carey RM, Aronow WS, Ovbiagele B, Casey DE, Smith SC, et al. 2017 ACC / AHA / AAPA / ABC / ACPM / AGS / APhA / ASH / ASPC / NMA / PCNA Guideline for the Prevention, Detection, Evaluation, and Management of High Blood Pressure in Adults. Hypertension. 2017;71: 85–87. [Google Scholar]
- 2.Ioannidis JPA. Diagnosis and treatment of hypertension in the 2017 ACC/AHA guidelines and in the real world. J Am Med Assoc. 2018;319: 115–116. 10.1001/jama.2017.19672 [DOI] [PubMed] [Google Scholar]
- 3.The SPRINT Research Group. A randomized trial of intensive versus standard blood-pressure control. N Engl J Med. 2016;36: 140–141. [Google Scholar]
- 4.Burns NS, Miller PW. Learning What We Didn’t Know—The SPRINT Data Analysis Challenge. N Engl J Med. 2017;376: 2205–2207. 10.1056/NEJMp1705323 [DOI] [PubMed] [Google Scholar]
- 5.Huang C, Dhruva SS, Coppi AC, Warner F, Li SX, Lin H, et al. Systolic blood pressure response in SPRINT (Systolic Blood Pressure Intervention Trial) and ACCORD (Action to Control Cardiovascular Risk in Diabetes): A possible explanation for discordant trial results. J Am Heart Assoc. 2017;6 10.1161/JAHA.117.007509 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Mezue K, Goyal A, Pressman GS, Horrow JC, Rangaswami J. Blood Pressure Variability Predicts Adverse Events and Cardiovascular Outcomes in Chronic Kidney Disease: A Post-Hoc Analysis of the SPRINT Trial. Am J Hypertens. 2018;31: 48–52. 10.1093/ajh/hpx128 [DOI] [PubMed] [Google Scholar]
- 7.Huesch MD. Serious Adverse Events Among SPRINT Trial Participants Taking Statins at Baseline. Drugs R D. 2017;17: 623–629. 10.1007/s40268-017-0213-9 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Patel KK, Arnold S V., Chan PS, Tang Y, Pokharel Y, Jones PG, et al. Personalizing the Intensity of Blood Pressure Control Modeling the Heterogeneity of Risks and Benefits From SPRINT (Systolic Blood Pressure Intervention Trial). Circ Cardiovasc Qual Outcomes. 2017;10 10.1161/circoutcomes.117.003624 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9.Basu S, Sussman JB, Rigdon J, Steimle L, Denton BT, Hayward RA. Benefit and harm of intensive blood pressure treatment: Derivation and validation of risk models using data from the SPRINT and ACCORD trials. PLoS Med. 2017;14: 1–26. 10.1371/journal.pmed.1002410 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Cox DR. Regression models and life-tables. J R Stat Soc Ser B. 1972;34: 187–202. [Google Scholar]
- 11.Chandrasekhar A, Kim C-S, Naji M, Natarajan K, Hahn J-O, Mukkamala R. Smartphone-based blood pressure monitoring via the oscillometric finger-pressing method. Sci Transl Med. 2018;10: 1–12. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Simon N, Friedman J, Hastie T, Tibshirani R. Regularization paths for Cox’s proportional hazards model via coordinate descent. J Stat Softw. 2011;39: 1. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Friedman J, Hastie T, Tibshirani R. Regularization paths for generalized linear models via coordinate descent. J Stat Softw. 2010;33: 1. [PMC free article] [PubMed] [Google Scholar]
- 14.Steyerberg EW. Clinical prediction models: a practical approach to development, validation, and updating. Springer Science & Business Media; 2008. [Google Scholar]
- 15.The ACCORD Study Group. Effects of intensive blood pressure control in type 2 diabetes mellitus. N Engl J Med. 2010;362: 1575–1585. 10.1056/NEJMoa1001286 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Pool LR, Ning H, Wilkins J, Lloyd-Jones DM, Allen NB. Use of Long-term Cumulative Blood Pressure in Cardiovascular Risk Prediction Models. JAMA Cardiol. 2018;3: 1096–1100. 10.1001/jamacardio.2018.2763 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Pearl J. Causality. Cambridge university press; 2009. [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.
Supplementary Materials
Data Availability Statement
R scripts for reproducing all results shown in the manuscript are available from https://github.com/Shamir-Lab/SPRINT. The study utilizes data collected during the SPRINT and ACCORD trials. This data is owned by a third party and cannot be transferred by us due to privacy reasons imposed by the owner. It can be acquired from BioLINCC (contact via https://biolincc.nhlbi.nih.gov/studies/sprint/) for researchers who meet the criteria for access to confidential data.