AMIA Annual Symposium Proceedings. 2008;2008:369–373.

Predicting Hemodialysis Mortality Utilizing Blood Pressure Trends

Ronilda Lacson 1
PMCID: PMC2655936  PMID: 18999118

Abstract

Background:

Mean Systolic Blood Pressure (SBP) is a predictor of mortality in hemodialysis (HD) patients. The hypothesis is that transforming SBP measurements to reflect trends would improve the quality of predictions.

Method:

Data consisted of 4,500 patients from a dialysis provider in the US with at least six months follow-up. Relative Difference in Percentage yielded six transformed variables, representing SBP trends. Models were constructed using Support Vector Machine (SVM). Results were compared to a baseline model utilizing six-month mean SBP. All models included age, gender, race, diabetes, vintage, and BMI. Pooling of repeated observations incorporated all repeated observations in a generalized person-month approach.

Results:

The AUC for the model using transformed variables on unseen data was 0.70, compared to 0.63 for the baseline model (p<0.00001). The AUC was 0.69 when modeling a pooled data set.

Conclusion:

The use of SBP trends significantly improved mortality prediction in HD patients.

Introduction

There were approximately 500,000 patients with End Stage Renal Disease in the United States in 2005, and this population continues to grow.1 The majority of patients, numbering over 315,000, are treated by hemodialysis. Hemodialysis patients have 10 to 20 times higher mortality risk than that of the general population.2 Risk factors for mortality include poor nutrition, diabetes, anemia and dyslipidemia, among many others.3;4 The role of blood pressure in cardiovascular morbidity and mortality in hemodialysis patients has not been fully elucidated. There appears to be increased risk for both individuals with abnormally high and low blood pressures in this population.5 Blood pressure is routinely measured at each hemodialysis session, making it easy to monitor, and is responsive to several types of therapy.

Numerous publications address the role of blood pressure as a predictor for mortality in dialysis patients.3–5 In most of these publications, each individual’s blood pressure is summarized by a single value (e.g. mean, median) of a fixed measure (e.g. systolic, diastolic, pulse pressure). Blood pressure measurements are typically aggregated over a week, a month or a quarter (baseline period) and used to predict subsequent mortality over a fixed follow-up period. Such a summary often fails to capture blood pressure variability and trends over time, both of which are readily accessible because these measurements are obtained thrice weekly as part of the routine protocol for hemodialysis therapy.

Apart from baseline blood pressure, a recent study noted that a decrease in systolic blood pressure (SBP) over time is associated with mortality.6 Specifically, patients who exhibited a greater decrease in SBP during a one-year period were more likely to die, compared to patients who survived. Moreover, for patients with low normal blood pressure (SBP < 120 mmHg) at baseline, those who survived exhibited an increase in SBP, while patients who died remained within this low blood pressure category. Such findings motivate utilizing sequential SBP measurements and trends over time in predicting mortality in this population.

Hemodialysis patients typically have 12–14 predialysis SBP measurements recorded every month. Simply including each individual SBP measurement in the model will greatly increase the number of independent variables and will not take into account variability and changes over time; hence the need for data transformation. Data transformation is almost always necessary for processing time-series signals. This process ranges from geometric transformations, as used in image processing (e.g. enlargement, rotation), to Fourier transformations, as used in speech processing. Logarithmic transformations, differencing, and normalization techniques are also frequently used. The transformed data are then used with a selected algorithm for automatic analysis, compression, or feature extraction.

Several algorithms have been developed for use in repeated measurements of data. Auto-Regressive Integrated Moving Average is mainly used for forecasting time-series data, and has been used in medicine (e.g. forecasting seasonal admissions for asthma7). This class of models is used for forecasting time-series data that can be stationarized (i.e. the mean, variance, and autocorrelations are considered constant over time) by “transforms” such as differencing and logging. Signal processing algorithms using time-varying data, on the other hand, have been used in image processing and speech recognition.8 Traditional methods include hidden Markov models and discrete wavelet transforms. In addition, for financial forecasting, Support Vector Machines (SVM) have been employed to forecast prices of stocks and bonds using previous prices.9 In the latter approach, selecting a forecasting horizon is an essential first step in predicting time-series data. From a clinical perspective, this period should be of sufficient length to allow enough time for intervention to avert an undesirable outcome. From the prediction perspective, the period should be short enough because trends in data do not persist for too long.

The current study tested an algorithm for data transformation, similar to that used for financial forecasting, and used it in conjunction with an SVM classifier.9 The hypothesis is that appropriately transformed sequential blood pressure measurements would significantly improve the prediction of mortality in hemodialysis patients. In particular, the study had three major contributions. First, it devised a framework for transforming time-varying data (e.g. SBP) to a representation that is suitable for clinical prediction algorithms. Such transformed data remained concise while reflecting trends over time. Second, it implemented a machine learning method that could predict hemodialysis mortality using the transformed data. Model performance was compared with models that used only traditional variables for clinical prediction, including the use of a simple mean to represent baseline blood pressure measurement. Third, data were augmented by using pooled person-month data in lieu of individual patients to further refine the models. This technique, previously used with Logistic Regression for analysis of Framingham data, was shown to be comparable to time-dependent Cox regression analysis.10

Methods

A. Study Population and Setting

Data were obtained from Fresenius Medical Care North America, the largest provider of hemodialysis therapy in the United States. A five percent (5%) random sample was obtained from approximately 90,000 in-center prevalent hemodialysis patients who received treatment from October 1, 2003 to March 31, 2004, and survived the entire 6-month period (N=4,500). Patients’ pre-dialysis systolic blood pressure and survival information were collected until December 31, 2004. This data set was randomly divided into two parts: 3,500 cases for the training data set, and 1,000 cases for the test data set. This division of data was maintained for the pooled data analysis described in Part C below. Patient data were de-identified in compliance with HIPAA requirements. Demographic variables include age (as of January 1, 2004), race, gender, presence or absence of diabetes mellitus (DM), length of time since starting dialysis, which is also known as dialysis ‘vintage’ (in days, as of January 1, 2004), height, and baseline weight. Body mass index (BMI) was calculated as weight (in kg) divided by the square of height (in m). A priori, the investigators empirically selected an observation period of six months to predict mortality for the succeeding month. Mean SBP over six months was utilized as a continuous independent variable in the baseline model, shown in Table 1, below.

Table 1.

Variables used in the analysis

| Variable Name | Descriptive statistics |
|---|---|
| Age | 61.4 (mean) ± 14.8 (s.d.) |
| Gender | 53% male, 47% female |
| Race | 49% Caucasian, 41% African-American, 10% others |
| DM | 53% had DM |
| Dialysis Vintage | 1,287 days (mean) |
| Body Mass Index (BMI) | 30 kg/m² (mean) |
| Mean SBP (6 mo.) | 153 mmHg (mean) ± 9 mmHg (s.d.) |

B. Data Transformation and Analysis

The investigator averaged the data into monthly mean SBP. The study then utilized a simple but succinct approach to measure trend over time known as relative difference in percentage (RDP), as shown in Table 2.11

Table 2:

Additional SBP Variables. [SBP(i) refers to mean SBP at month i. MA_6m is the mean-adjusted SBP at month i]

| Variable | Calculation |
|---|---|
| RDP1 | [(SBP(i) - SBP(i - 1)) / SBP(i - 1)] * 100 |
| RDP2 | [(SBP(i - 1) - SBP(i - 2)) / SBP(i - 2)] * 100 |
| RDP3 | [(SBP(i - 2) - SBP(i - 3)) / SBP(i - 3)] * 100 |
| RDP4 | [(SBP(i - 3) - SBP(i - 4)) / SBP(i - 4)] * 100 |
| RDP5 | [(SBP(i - 4) - SBP(i - 5)) / SBP(i - 5)] * 100 |
| MA_6m | SBP(i) - (average SBP over the last 6 months) |

This transformation has the added advantage of normalizing the data distribution of transformed data, thus potentially increasing the models’ predictive accuracy. This process yielded five transformed variables representing the RDP for each successive two months, and a sixth variable was also included, representing a six-month moving average subtracted from the latest month’s mean SBP (also shown in Table 2). The differencing transformation of the last month’s SBP was intended to reflect the relative magnitude of change over time.
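The calculations in Table 2 can be sketched in a few lines of Python. This is an illustrative sketch, not the study's code: the function name `sbp_trend_features` and the example values are hypothetical, and it assumes that the six-month average used for MA_6m includes the latest month.

```python
from statistics import mean


def sbp_trend_features(monthly_sbp):
    """Compute RDP1..RDP5 and MA_6m from six monthly mean SBP values.

    monthly_sbp is ordered oldest first, so monthly_sbp[-1] is SBP(i).
    """
    if len(monthly_sbp) != 6:
        raise ValueError("expected six monthly mean SBP values")

    # Relative difference in percentage for each pair of consecutive months,
    # expressed relative to the earlier month of the pair.
    rdp_oldest_first = [
        (later - earlier) / earlier * 100
        for earlier, later in zip(monthly_sbp, monthly_sbp[1:])
    ]

    # Table 2 numbers RDP1 as the most recent pair, so reverse the order.
    features = {
        f"RDP{k}": value
        for k, value in enumerate(reversed(rdp_oldest_first), start=1)
    }

    # Assumption: the six-month average includes the latest month.
    features["MA_6m"] = monthly_sbp[-1] - mean(monthly_sbp)
    return features


# Hypothetical patient whose monthly mean SBP falls from 150 to 126 mmHg.
example = sbp_trend_features([150, 150, 144, 144, 135, 126])
```

For this hypothetical patient the most recent changes (RDP1 and RDP2) are negative and MA_6m is negative, which is the pattern of a declining SBP trend that the transformed variables are meant to expose.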

Support Vector Machine

Models were constructed using a Support Vector Machine (SVM), using the LibSVM implementation (version 2.85).12 Kernel and parameter optimization were performed using the following analytical framework. A randomly selected subset (N=1,000 patients) from the 3,500-patient training set was set aside as a development set. Linear, Radial Basis Function (RBF) and polynomial kernels were applied to fit an SVM using the remaining 2,500 patients’ data from the training set. For each of the kernels, the corresponding parameters were optimized to obtain the best AUC. For example, two tuning parameters were optimized for the RBF kernel: γ, which controls the width of the kernel function, and C, the penalty parameter for the error terms. The three top models were further evaluated with five-fold cross-validation to select the best model using the baseline variables as well as the transformed variables. Optimization was based on the AUC. None of the models, during the development of the optimal SVM, used any information from the original test set (N=1,000 patients), as described in Part A.
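To make the role of γ concrete, the sketch below evaluates the RBF kernel in the form used by LIBSVM, K(x, y) = exp(-γ‖x - y‖²), and enumerates a grid of (C, γ) candidates. The helper names and the grid values are hypothetical and chosen only for illustration; the paper does not report the grid that was searched, and the SVM fitting step itself is not shown.

```python
import math


def rbf_kernel(x, y, gamma):
    """RBF kernel K(x, y) = exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def parameter_grid(c_values, gamma_values):
    """List every (C, gamma) pair that would be scored on the development set."""
    return [(c, g) for c in c_values for g in gamma_values]


# Hypothetical candidate values, not the grid used in the study.
grid = parameter_grid([1, 2, 4, 8, 32], [0.01, 0.02, 0.06, 0.12])

# Two hypothetical (already scaled) feature vectors.
patient_a = [0.2, -1.0, 0.5]
patient_b = [0.9, -0.4, 0.1]

# A larger gamma gives a narrower kernel, so the same pair looks less similar.
wide = rbf_kernel(patient_a, patient_b, gamma=0.01)
narrow = rbf_kernel(patient_a, patient_b, gamma=0.12)
```

In a full search, each pair in `grid` would be used to fit an SVM on the 2,500 fitting patients and scored by AUC on the 1,000-patient development set.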

C. Evaluation

The best performing SVM using baseline variables was trained using all the combined data in the training set. This model was then applied to the test set. The AUC was calculated, including a 95% confidence interval. Similarly, the best performing SVM incorporating the transformed variables reflecting SBP trend was also applied to the test set and the corresponding AUC was calculated. The AUCs of the SVM using baseline variables and the SVM using the transformed variables (on the test data) were compared using the ROCKIT implementation of the binned binormal method.13
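As a reminder of what the AUC measures, the sketch below computes it as the fraction of (died, survived) patient pairs in which the patient who died received the higher model score, with ties counted as one half. This is the empirical rank-based estimate, which is a different estimator from the binned binormal method in ROCKIT used in the study; the function name and example scores are hypothetical.

```python
def empirical_auc(scores, labels):
    """Empirical AUC: probability that a death outranks a survivor.

    scores are model outputs (higher means higher predicted risk);
    labels are 1 for patients who died and 0 for survivors.
    """
    died = [s for s, y in zip(scores, labels) if y == 1]
    survived = [s for s, y in zip(scores, labels) if y == 0]
    if not died or not survived:
        raise ValueError("need at least one patient in each outcome class")

    concordant = 0.0
    for d in died:
        for s in survived:
            if d > s:
                concordant += 1.0
            elif d == s:
                concordant += 0.5  # ties count as half
    return concordant / (len(died) * len(survived))


# Hypothetical scores for four patients, two of whom died.
auc_value = empirical_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect ranking of deaths above survivors.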

Pooled Data

In order to incorporate all repeated observations in a generalized person-month approach, the data were further analyzed using the pooling of repeated observations (PRO) method, frequently used in the Framingham study cohort.10 Treating each month as a follow up interval, the PRO method pools observations over all intervals to examine the development of the clinical outcome. The pooled person-month data were assembled for this analysis, as described in Figure 1.

Figure 1. Description of PRO method

Analysis of this restructured data set was performed using the same method as for the regular non-pooled data set. The PRO method was applied to the original training data set (N=3,500), as well as to the testing data set (N=1,000), yielding 32,000 person-months from the training data and 9,000 person-months from the test data for analysis. Figure 2 illustrates how incorporating pooled repeated observations substantially increases the amount of analyzable data.

Figure 2. Difference between pooled and non-pooled data sets for a hypothetical set of two patients, A and B. (An refers to research data for patient A at visit n. Shade indicates outcome where blue = “alive” and white = “died.”)
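The person-month restructuring can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's implementation: the record layout, the function name `pool_person_months`, and the example patients are hypothetical, and it assumes that each record pairs a six-month observation window with the outcome in the month that follows it.

```python
def pool_person_months(patients, window=6):
    """Expand each patient into one record per observed follow-up month.

    Each patient is a dict with:
      'id'           patient identifier
      'monthly_sbp'  monthly mean SBP, oldest first (months 1, 2, ...)
      'death_month'  1-based month of death, or None if the patient survived
    """
    records = []
    for patient in patients:
        sbp = patient["monthly_sbp"]
        death_month = patient["death_month"]
        # 'end' is the 1-based index of the last month in the window.
        for end in range(window, len(sbp) + 1):
            target_month = end + 1
            if target_month == death_month:
                died = True
            elif target_month <= len(sbp):
                # Assumption: an SBP value in the target month means alive.
                died = False
            else:
                continue  # outcome in the target month was not observed
            records.append({
                "id": patient["id"],
                "sbp_window": sbp[end - window:end],
                "target_month": target_month,
                "died": died,
            })
            if died:
                break  # no person-months after death
    return records


# Two hypothetical patients: A survives eight months, B dies in month 8.
patient_a = {"id": "A", "death_month": None,
             "monthly_sbp": [150, 148, 149, 151, 147, 146, 145, 144]}
patient_b = {"id": "B", "death_month": 8,
             "monthly_sbp": [140, 138, 135, 130, 128, 120, 115]}
pooled = pool_person_months([patient_a, patient_b])
```

Here two patients yield four person-month records, one of which carries the outcome "died". Each `sbp_window` can then be passed through the same transformation used for the non-pooled data.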

Results

Demographic Data

Descriptive statistics are shown in Table 1. The mean age of the study population was 61 years (N=4,500). Fifty-three percent were male and 53% had DM. Approximately 49% were white and 41% were black.

The mean hemodialysis vintage was 1,287 days and the mean BMI was 30. The average SBP was 150 mmHg with a range from 70 to 290 mmHg. The mortality rate over one year was 14%.

SVM Results

The three best models applied on the training data set, with results from both the development set and five-fold cross validation, are shown in Table 3. The first three rows represent results for the three best performing baseline models, while the last three rows represent results for the SVM incorporating transformed SBP values. The simple linear and RBF kernels achieved more accurate performance in this clinical domain with moderate number of variables, in comparison to more complex polynomial kernels. The SVM showed stable performance using the development data set when compared to the results for five-fold cross-validation. The performances of the SVM models augmented with transformed SBP values were consistently better than those of the baseline models (p values for the differences were < 0.00001).

Table 3:

Results on training non-pooled data (Boldface represents best models, n/a=not applicable)

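| Model | Kernel | C | γ | AUC (Dev.) | AUC (5-CV) |
|---|---|---|---|---|---|
| Baseline | RBF | 32 | 0.12 | 0.66 | 0.68 |
| Baseline | RBF | 4 | 0.06 | 0.66 | 0.66 |
| Baseline | RBF | 4 | 0.01 | 0.66 | 0.65 |
| Transformed Variable Model | RBF | 32 | 0.02 | 0.71 | 0.71 |
| Transformed Variable Model | RBF | 32 | 0.01 | 0.71 | 0.71 |
| Transformed Variable Model | Linear | 8 | n/a | 0.71 | 0.71 |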

Results from the pooled data analyses are shown in Table 4. Similarly, the augmented SVMs consistently demonstrated better discriminatory performance.

Table 4:

Results on training pooled data (Boldface represents best models, n/a=not applicable)

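| Model | Kernel | C | γ | AUC (Dev.) | AUC (5-CV) |
|---|---|---|---|---|---|
| Baseline | RBF | 8 | 0.01 | 0.69 | 0.66 |
| Baseline | Linear | 1 | n/a | 0.69 | 0.66 |
| Baseline | Linear | 8 | n/a | 0.69 | 0.66 |
| Transformed Variable Model | Linear | 1 | n/a | 0.73 | 0.70 |
| Transformed Variable Model | RBF | 2 | 0.01 | 0.73 | 0.70 |
| Transformed Variable Model | RBF | 2 | 0.02 | 0.73 | 0.70 |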

The comparison of AUCs using the binned binormal method in the previously unseen test set is shown in Table 5, using the best trained SVM for the baseline model and the transformed-variable model (both highlighted in boldface in Table 3). As shown by the non-overlapping 95% confidence intervals for both pooled and non-pooled data sets, the improvements in AUC for the SVM utilizing the transformed SBP variables were highly statistically significant (p<0.00001).

Table 5:

Comparison of AUCs on test set

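| Data Set | Best Baseline Model: AUC | Best Baseline Model: 95% CI | Best Transformed Variable Model: AUC | Best Transformed Variable Model: 95% CI | p-value |
|---|---|---|---|---|---|
| Regular | 0.63 | 0.60, 0.65 | 0.70 | 0.67, 0.72 | <0.00001 |
| Pooled | 0.63 | 0.61, 0.64 | 0.69 | 0.67, 0.70 | <0.00001 |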

Discussion

The current study confirms that SBP is an important predictor of mortality in hemodialysis patients. Moreover, this study shows that use of data transformation to reflect SBP trend over time significantly improves the utility of such information, which could be better exploited since the data are routinely monitored and recorded every visit. Simply using more data (e.g. trivially adding all recorded SBPs into the model) is not sufficient. To show that the improvement in discrimination was not due primarily to the increase in the number of variables, the investigator attempted to further augment the model by adding the mean SBP for every month to the best model and, as expected, there was no significant effect (data not shown). However, the relative amount of change in SBP over time has been shown to significantly predict mortality.

A practical application of this model will include a continued monitoring process for alerting clinicians when there is deterioration in a patient’s status that portends some clinical outcome, thus providing an opportunity for intervention. The use of SBP for predicting mortality in the hemodialysis population is important for three reasons: (1) it is frequently monitored and recorded, which makes it readily available for data analysis; (2) it is clinically significant because of its role in human physiology and as a risk factor for disease and mortality; and (3) it is actionable, such that medications and fluid adjustment can modify it directly. This study was not designed to differentiate whether or not SBP is a causative factor in hemodialysis mortality, or simply an associated finding that reflects impending clinical outcome. However, it suggests that whenever a recorded SBP is showing a decreasing trend, which would be predictive of mortality in some subsets of patients, it would be prudent to more closely study that patient’s clinical status and consider potential interventions.

This study is a first step towards predicting hemodialysis mortality using data transformations reflecting trends in SBP. Results consistently indicated that the use of SBP trends, reflected in the data transformation, significantly predicted mortality, evident even with the relatively small data set (N=4,500). This work can be extended primarily in two directions. First, the use of multiple time granularities (instead of just one month) may further improve the model and increase the forecasting time horizon, thus allowing more time for interventions. Second, the algorithm can be applied to other continuously monitored time-varying data in the clinical domain. In hemodialysis patients for whom clinical information is routinely collected, it is prudent to examine the role of weight and laboratory parameters, such as albumin and CRP, in relation to clinical outcome. Further experimentation with other expressive machine learning algorithms that more effectively utilize readily available continuously monitored time-varying clinical data would also be extremely valuable.

Acknowledgments

This research is supported by the National Library of Medicine Grants R01 1LM009520-01 and 2T15LM007092-16. I would like to acknowledge use of proprietary patient data from Fresenius Medical Care, North America. I thank Drs. Michael Lazarus, Raymond Hakim, Eduardo Lacson and Lucila Ohno-Machado for reviewing the manuscript and for their helpful suggestions and comments.

References

  • 1. USRDS. The 2007 USRDS Annual Data Report (ADR). 2007. http://www.usrds.org/adr.htm.
  • 2. Foley RN, Parfrey PS, Sarnak MJ. Clinical epidemiology of cardiovascular disease in chronic renal disease. Am J Kidney Dis. 1998;32(5 Suppl 3):S112–S119. doi: 10.1053/ajkd.1998.v32.pm9820470.
  • 3. Bradbury BD, Fissell RB, Albert JM, Anthony MS, Critchlow CW, Pisoni RL, et al. Predictors of early mortality among incident US hemodialysis patients in the Dialysis Outcomes and Practice Patterns Study (DOPPS). Clin J Am Soc Nephrol. 2007;2(1):89–99. doi: 10.2215/CJN.01170905.
  • 4. Levin A, Foley RN. Cardiovascular disease in chronic renal insufficiency. Am J Kidney Dis. 2000;36(6 Suppl 3):S24–S30. doi: 10.1053/ajkd.2000.19928.
  • 5. Lacson E, Jr, Lazarus JM. The association between blood pressure and mortality in ESRD-not different from the general population? Semin Dial. 2007;20(6):510–517. doi: 10.1111/j.1525-139X.2007.00339.x.
  • 6. Li Z, Lacson E, Jr, Lowrie EG, Ofsthun NJ, Kuhlmann MK, Lazarus JM, et al. The epidemiology of systolic blood pressure and death risk in hemodialysis patients. Am J Kidney Dis. 2006;48(4):606–615. doi: 10.1053/j.ajkd.2006.07.005.
  • 7. Chen CH, Xirasagar S, Lin HC. Seasonality in adult asthma admissions, air pollutant levels, and climate: a population-based study. J Asthma. 2006;43(4):287–292. doi: 10.1080/02770900600622935.
  • 8. Jobert M, Tismer C, Poiseau E, Schulz H. Wavelets-a new tool in sleep biosignal analysis. J Sleep Res. 1994;3(4):223–232. doi: 10.1111/j.1365-2869.1994.tb00135.x.
  • 9. Cao LJ, Tay FH. Support vector machine with adaptive parameters in financial time series forecasting. IEEE Trans Neural Netw. 2003;14(6):1506–1518. doi: 10.1109/TNN.2003.820556.
  • 10. D'Agostino RB, Lee ML, Belanger AJ, Cupples LA, Anderson K, Kannel WB. Relation of pooled logistic regression to time dependent Cox regression analysis: the Framingham Heart Study. Stat Med. 1990;9(12):1501–1515. doi: 10.1002/sim.4780091214.
  • 11. Thomason M. The practitioner methods and tool. J Comput Intell in Finance. 1999;7(3):36–45.
  • 12. Chang C-C, Lin CJ. LIBSVM: A library for support vector machines. 2001. http://www.csie.ntu.edu.tw/~cjlin/libsvm.
  • 13. Kurt Rossman Laboratories for Radiologic Image Research. ROCKIT: ROC analysis software. 2008. http://www-radiology.uchicago.edu/krl/KRL_ROC/software_index6.htm.

Articles from AMIA Annual Symposium Proceedings are provided here courtesy of American Medical Informatics Association
