Diabetes Care. 2008 Dec;31(12):2301–2306. doi: 10.2337/dc08-1047

Predicting 6-Year Mortality Risk in Patients With Type 2 Diabetes

Brian J Wells 1, Anil Jain 2, Susana Arrigain 1, Changhong Yu 1, Wayne A Rosenkrans Jr 3, Michael W Kattan 1
PMCID: PMC2584185  PMID: 18809629

Abstract

OBJECTIVE—The objective of this study was to create a tool that predicts the risk of mortality in patients with type 2 diabetes.

RESEARCH DESIGN AND METHODS—This study was based on a cohort of 33,067 patients with type 2 diabetes identified in the Cleveland Clinic electronic health record (EHR) who were initially prescribed a single oral hypoglycemic agent between 1998 and 2006. Mortality was determined in the EHR and the Social Security Death Index. A Cox proportional hazards regression model was created using medication class and 20 other predictor variables chosen for their association with mortality. A prediction tool was created using the Cox model coefficients. The tool was internally validated using repeated, random subsets of the cohort, which were not used to create the prediction model.

RESULTS—Follow-up in the cohort ranged from 1 day to 8.2 years (median 28.6 months), and 3,661 deaths were observed. The prediction tool had a concordance index (i.e., c statistic) of 0.752.

CONCLUSIONS—We successfully created a tool that accurately predicts mortality risk in patients with type 2 diabetes. The incorporation of medications into mortality predictions in patients with type 2 diabetes should improve treatment decisions.


Patients with type 2 diabetes have an increased risk of mortality (1–4), and tools for predicting overall mortality in diabetic patients are lacking. Tools for predicting the risk of cardiovascular disease (CVD) have been created and assist physicians in the prevention of CVD among patients with type 2 diabetes (5–8). However, these tools do not consider the risk of all-cause mortality. Preventing a myocardial infarction is important but not if the patient dies prematurely from kidney disease. Treating each disease in isolation is a strategy that fails to consider the overall effect on the patient. Scores have been created to adjust for diabetes severity and other comorbid conditions when mortality is studied (9,10). However, these indexes do not provide overall mortality predictions and therefore are of limited use in routine clinical practice. Furthermore, existing prediction tools do not adjust for a patient's current therapy. Specific diabetes medications may decrease the risk of one complication while increasing the risk of other outcomes. Tools are needed to help clinicians tailor therapy to individual patients to minimize mortality risk on the basis of characteristics of the patient, his or her disease, and the available treatment options. The purpose of this study was to create a mortality risk calculator for patients with type 2 diabetes that can be used to aid in treatment decisions.

RESEARCH DESIGN AND METHODS

This study was conducted on a retrospective cohort of patients with type 2 diabetes whose data were collected for clinical and administrative purposes in the electronic health record (EHR) at Cleveland Clinic. The cohort began in 1998, and follow-up data were obtained through 2006.

Eligibility criteria at baseline

Baseline was defined as the date of the first prescription for a qualifying oral antidiabetic agent in an eligible patient. Because we were interested in patients with type 2 diabetes, the cohort was limited to patients at least 18 years of age with a diagnosis of diabetes. Patients were classified as having diabetes if they had a single diagnosis of diabetes in the “History” or “Problem List” sections of their chart. To reduce the chance of misclassification due to “rule out” diagnoses, we required two occurrences of “diabetes” for patients identified from the “encounter diagnosis” section. The following ICD-9 codes were used to identify patients with diabetes: 250–250.99, 357.2, 362.01, 362.02, 366.41, and 648.01–648.04.
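The identification rule above can be illustrated with a short sketch. It is only an illustration under stated assumptions: the function names are hypothetical, codes are compared numerically, and "two occurrences" is read as two matching entries in the encounter diagnosis list; the study's actual extraction logic is not described in the text.

```python
from decimal import Decimal, InvalidOperation

# Inclusive ICD-9 ranges quoted in the text; single codes are stored as (code, code).
DIABETES_ICD9_RANGES = [
    ("250", "250.99"),
    ("357.2", "357.2"),
    ("362.01", "362.01"),
    ("362.02", "362.02"),
    ("366.41", "366.41"),
    ("648.01", "648.04"),
]


def is_diabetes_code(code: str) -> bool:
    """Return True if an ICD-9 code falls inside any of the listed ranges."""
    try:
        value = Decimal(code)
    except InvalidOperation:
        # Non-numeric codes (for example V or E codes) are never in the list.
        return False
    return any(Decimal(lo) <= value <= Decimal(hi) for lo, hi in DIABETES_ICD9_RANGES)


def has_diabetes(history_or_problem_list, encounter_diagnoses) -> bool:
    """One hit in History/Problem List, or at least two hits in encounter diagnoses."""
    if any(is_diabetes_code(code) for code in history_or_problem_list):
        return True
    encounter_hits = sum(is_diabetes_code(code) for code in encounter_diagnoses)
    return encounter_hits >= 2


print(has_diabetes([], ["250.00"]))            # a single encounter code is not enough
print(has_diabetes([], ["250.00", "250.02"]))  # two encounter codes qualify
print(has_diabetes(["357.2"], []))             # one Problem List entry qualifies
```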

The cohort was further limited to patients who were prescribed exactly one of the four most common types of oral hypoglycemic agents: sulfonylureas (SFUs), meglitinides (MEGs), biguanides (BIGs), or thiazolidinediones (TZDs). Patients prescribed α-glucosidase inhibitors were excluded because of an inadequate sample (n = 149). Patients with prescriptions for multiple oral agents at baseline were excluded because of the substantial number of possible two- and three-drug combinations (10 in total).
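The figure of 10 is consistent with counting every two- and three-drug combination of the four included classes. A quick check, assuming that reading of the number is the intended one:

```python
from itertools import combinations

# The four oral agent classes kept in the cohort.
classes = ["SFU", "MEG", "BIG", "TZD"]

pairs = list(combinations(classes, 2))    # two-drug combinations
triples = list(combinations(classes, 3))  # three-drug combinations
total = len(pairs) + len(triples)

print(len(pairs), len(triples), total)  # 6 4 10
```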

Exclusions

Polycystic ovarian syndrome is sometimes treated with a biguanide and could be confused with diabetes. Noninsulin injectable medications are used infrequently in the treatment of type 2 diabetes, and the patients receiving these medications may be fundamentally different. Therefore, patients with polycystic ovarian syndrome (ICD-9 256.4) and patients prescribed noninsulin injectable diabetes medications at baseline were excluded.

Outcome

Mortality was determined in the EHR and with linkage to the Social Security Death Index (SSDI). Patients classified as deceased per the SSDI or the EHR but who continued to have vital statistics entered into the EHR were counted as alive.

Predictor variables

The following variables were included in the model because of their independent associations with mortality in the literature: estimated glomerular filtration rate (GFR), A1C, BMI, systolic blood pressure (SBP), diastolic blood pressure (DBP), HDL and LDL cholesterol, triglycerides, history of congestive heart failure (CHF), history of coronary heart disease, smoking status, use of concomitant medications (insulin, ACE inhibitor/angiotensin receptor blocker [ARB], aspirin, clopidogrel, or lipid-lowering drug), new diabetes, sex, race, age, and oral medication class. Values for all of the predictor variables were extracted from the EHR. GFR was calculated from serum creatinine using the simplified equation from the Modification of Diet in Renal Disease Study Group (11). BMI was calculated using the standard equation (weight in kilograms divided by the square of height in meters). The baseline date was defined as the date of the first prescription for an oral hypoglycemic agent in a qualified individual. The baseline value for the predictor variables was defined as the value on the baseline date or the most recent historical value. If no historical lab or vital value was available then the value closest to the baseline date up to 21 days into the future was defined as the baseline. Patients were considered to be newly diabetic if they had been seen before their baseline date by either an endocrinologist or primary care physician at Cleveland Clinic and did not have a diagnosis of diabetes entered in the EHR at that time.
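The derived predictors and the baseline-value rule described above can be sketched as follows. This is a minimal illustration, not the study's code: the function names are hypothetical, the four-variable MDRD equation is written with the constant 186 (an assumption about which version of the simplified equation was applied), and a future value is taken to mean the earliest measurement within 21 days after baseline.

```python
from datetime import date, timedelta


def egfr_mdrd(serum_creatinine_mg_dl, age_years, female, black):
    """Simplified (four-variable) MDRD estimate; constant 186 is an assumption."""
    gfr = 186.0 * serum_creatinine_mg_dl ** -1.154 * age_years ** -0.203
    if female:
        gfr *= 0.742
    if black:
        gfr *= 1.210
    return gfr


def bmi(weight_kg, height_m):
    """Weight in kilograms divided by the square of height in meters."""
    return weight_kg / height_m ** 2


def baseline_value(measurements, baseline, max_days_ahead=21):
    """Pick the baseline value from a list of (date, value) pairs.

    Prefer the most recent value on or before the baseline date; otherwise
    fall back to the value closest to baseline within the look-ahead window.
    """
    past = [(d, v) for d, v in measurements if d <= baseline]
    if past:
        return max(past, key=lambda m: m[0])[1]
    window_end = baseline + timedelta(days=max_days_ahead)
    future = [(d, v) for d, v in measurements if baseline < d <= window_end]
    if future:
        return min(future, key=lambda m: m[0])[1]
    return None  # left missing; the study imputed missing values


a1c_history = [(date(2000, 1, 1), 7.0), (date(2000, 3, 1), 8.0)]
print(baseline_value(a1c_history, date(2000, 2, 1)))
print(round(bmi(100.0, 2.0), 1))
print(round(egfr_mdrd(1.0, 50, female=False, black=False), 1))
```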

Interactions

A limited number of interactions were explored for inclusion in the predictive model but for parsimony were only included if they were statistically significant (P < 0.05). The following interactions were investigated in the model:

Medication class × GFR and medication class × age.

The medication class interactions with GFR and age were included because of the precautions advised for use of biguanides in older adults and in patients with renal dysfunction (12).

Sex × race.

The interaction for sex and race was explored because of the higher risk of coronary heart disease seen in black women (13).

Medication class × CHF.

The interaction between medication class and heart failure was explored because of the precautions advised for using TZDs and BIGs in patients with CHF (12).

Statistical methods

The analytic dataset was built using SAS, version 9.1. Missing values were imputed using the Multiple Imputation by Chained Equations (MICE) package, version 1.16, for R (14). Imputation was performed to maximize the available information and to reduce the potential bias introduced by deleting incomplete records. The imputation was performed using regression techniques that included all baseline patients and all baseline variables as predictors and without knowledge of the outcome. Linearity assumptions for the ordinal and continuous variables were relaxed by fitting the model using restricted cubic splines.
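For readers unfamiliar with restricted cubic splines, the sketch below builds the standard truncated-power basis for a single predictor. It is an illustration only: the knot locations are hypothetical (the article does not report the number or position of knots), the function name is made up, and the terms are left unscaled, whereas statistical packages often rescale them.

```python
def rcs_basis(x, knots):
    """Restricted cubic spline basis for one value x and k sorted knots.

    Returns [x, X_2, ..., X_{k-1}]: one linear term plus k - 2 nonlinear
    terms that are constrained to be linear beyond the outer knots.
    """
    if len(knots) < 3:
        raise ValueError("at least three knots are required")
    t_last, t_prev = knots[-1], knots[-2]
    gap = t_last - t_prev

    def cube_plus(u):
        # Truncated cubic: u^3 when u is positive, otherwise 0.
        return u ** 3 if u > 0 else 0.0

    terms = [float(x)]
    for t_j in knots[:-2]:
        terms.append(
            cube_plus(x - t_j)
            - cube_plus(x - t_prev) * (t_last - t_j) / gap
            + cube_plus(x - t_last) * (t_prev - t_j) / gap
        )
    return terms


# Hypothetical knots for LDL cholesterol in mg/dl.
ldl_knots = [50.0, 100.0, 150.0, 200.0]
print(rcs_basis(120.0, ldl_knots))
```

Each basis column would then enter the Cox model as its own covariate, which lets a predictor such as LDL cholesterol take a curved (for example U-shaped) relationship with the log hazard.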

A Cox proportional hazards regression model was created with the predictor variables and interactions listed above with time to death as the outcome. Statistical analyses were performed using R for Unix, version 2.3.1 (15). The coefficients from the fitted Cox model were also used to develop an interactive Web-based tool that calculates the predictions automatically. The Web-based calculator is available from http://www.clinicriskcalculators.org.
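To show how Cox coefficients become an individual prediction, the sketch below applies the usual proportional hazards relationship, in which predicted survival equals the baseline survival raised to the power exp(linear predictor). Every number and name here is hypothetical; the actual coefficients are in the article's online appendix, and covariates are assumed to be expressed relative to the reference patient whose survival is the baseline.

```python
import math


def linear_predictor(coefficients, covariates):
    """Sum of coefficient times covariate value over all model terms."""
    return sum(coefficients[name] * covariates[name] for name in coefficients)


def survival_probability(baseline_survival, lp):
    """Cox prediction: S(t | x) = S0(t) ** exp(lp)."""
    return baseline_survival ** math.exp(lp)


# Hypothetical values, for illustration only.
coefficients = {"age_per_decade": 0.5, "current_smoker": 0.4}
patient = {"age_per_decade": 1.0, "current_smoker": 1}
baseline_6_year_survival = 0.90

lp = linear_predictor(coefficients, patient)
print(round(survival_probability(baseline_6_year_survival, lp), 3))
```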

Tenfold cross-validation was used to validate our modeling approach. With this method, the data are randomly partitioned into 10 equal segments. One segment (10% of the data) is extracted before the prediction model is fitted. The model is fitted with the remaining 90% of the data, and the prediction accuracy is evaluated on the outcomes observed in the 10% subsample. The process is repeated (10 times) until each segment of the data has been used to assess the prediction accuracy. A calibration curve was created by plotting the quintiles of the predicted probabilities against the observed estimates for the entire cohort. The model was also evaluated through the use of the concordance index (i.e., c statistic). In this process, the model is graded on its ability to differentiate between all possible discordant pairs of patients. (For example, patient 1 dies after 6 months in the cohort. Patient 2 dies after 3 years in the cohort. Does the predictive model correctly show that patient 1 had a higher risk of death?) Concordance indexes can vary between 0.5 (chance) and 1.0 (perfect prediction).
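The two ideas in this paragraph, the random tenfold partition and the concordance index, can be sketched in a few lines. This is a simplified illustration rather than the authors' implementation: the function names are hypothetical, pairs with tied follow-up times are skipped, and tied predictions count as half concordant.

```python
import random


def concordance_index(times, events, risks):
    """Concordance index for censored survival data.

    events[i] is True if patient i died and False if censored.
    risks[i] is the predicted risk (higher means earlier death expected).
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is usable when patient i is known to die before
            # patient j's last recorded follow-up time.
            if events[i] and times[i] < times[j]:
                usable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


def ten_fold_indices(n, k=10, seed=0):
    """Randomly partition range(n) into k folds of near-equal size."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    return [order[fold::k] for fold in range(k)]


# The example from the text: patient 1 dies at 0.5 years, patient 2 at 3 years.
times = [0.5, 3.0]
events = [True, True]
print(concordance_index(times, events, risks=[0.8, 0.2]))  # 1.0
print(concordance_index(times, events, risks=[0.2, 0.8]))  # 0.0

folds = ten_fold_indices(25)
print([len(fold) for fold in folds])
```

In a cross-validation loop, each fold in turn would serve as the held-out 10% while the model is refitted on the other nine folds.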

RESULTS

The final cohort had a sample size of 33,067 patients. Table 1 shows the characteristics of the cohort at baseline by drug class. Overall, the patients were predominantly white with a similar proportion of men and women. Patients taking BIGs were younger, had less heart failure, and were more likely to have newly diagnosed diabetes compared with patients taking the other drugs. Patients taking TZDs were the most likely to be using insulin, whereas patients taking MEGs had the lowest levels of LDL cholesterol and triglycerides.

Table 1.

Baseline characteristics of patients by drug class after imputation

| Characteristic | BIG | MEG | SFU | TZD | Missing* |
|---|---|---|---|---|---|
| n | 14,708 | 773 | 12,606 | 4,980 | |
| Continuous variables | | | | | |
| Age (years) | 57.8 ± 13.7 | 66.4 ± 13.3 | 66.4 ± 13.2 | 61.9 ± 12.9 | 0 (0) |
| BMI (kg/m2) | 33.5 ± 7.5 | 30.3 ± 6.9 | 31.1 ± 6.9 | 33.4 ± 7.8 | 13,986 (42.3) |
| LDL (mg/dl) | 110.1 ± 39.0 | 94.2 ± 36.2 | 107.4 ± 39.2 | 107.1 ± 40.7 | 17,347 (52.5) |
| HDL (mg/dl) | 46.9 ± 14.0 | 49.4 ± 16.3 | 46.0 ± 14.4 | 46.2 ± 14.1 | 16,653 (50.4) |
| Triglycerides (mg/dl) | 203.6 ± 229.6 | 169.8 ± 116.5 | 202.9 ± 211.6 | 207.0 ± 214.1 | 16,861 (51.0) |
| Categorical variables | | | | | |
| Male sex | 6,733 (45.8) | 418 (54.1) | 6,961 (55.2) | 2,600 (52.2) | 2 (<0.01) |
| Caucasian | 11,198 (76.1) | 647 (83.7) | 9,844 (78.1) | 4,073 (81.8) | 1,175 (3.6) |
| Heart failure | 431 (2.9) | 97 (12.5) | 1,030 (8.2) | 255 (5.1) | NA |
| Insulin | 1,934 (13.1) | 214 (27.7) | 1,371 (10.9) | 1,568 (31.5) | NA |
| Aspirin | 3,566 (24.2) | 243 (31.4) | 3,171 (25.2) | 1,325 (26.6) | NA |
| Clopidogrel (Plavix) | 929 (6.3) | 98 (12.7) | 1,059 (8.4) | 516 (10.4) | NA |
| ACE/ARB | 7,286 (49.5) | 443 (57.3) | 6,699 (53.1) | 2,921 (58.7) | NA |
| Cholesterol medication | 7,098 (48.3) | 409 (52.9) | 5,630 (44.7) | 2,911 (58.5) | NA |
| New diabetic | 4,578 (31.1) | 37 (4.8) | 1,002 (7.9) | 399 (8.0) | NA |
| Heart disease | 1,533 (10.4) | 147 (19.0) | 1,791 (14.2) | 688 (13.8) | NA |
| Current smokers | 2,310 (15.7) | 74 (9.6) | 1,795 (14.2) | 757 (15.2) | 8,195 (24.8) |
| History of stroke or transient ischemic attack | 591 (4.0) | 64 (8.3) | 715 (5.7) | 264 (5.3) | NA |
| GFR <60 ml/min | 1,875 (12.7) | 293 (37.9) | 3,501 (27.8) | 1,329 (26.7) | 10,702 (32.4) |

Data are means ± SD or n (%). n = 33,067. All of the baseline characteristics showed significant differences when stratified by drug class.

*Number of missing values before imputation. NA, not applicable.

Cumulative mortality by drug class is shown in Fig. 1. There were 1,958 patients followed for at least 5 years. The median length of follow-up was 28.6 months (range 1 day–8.2 years). The numbers of deaths by drug class were 799, 135, 2,220, and 507 for BIGs, MEGs, SFUs, and TZDs, respectively.

Figure 1.

Mortality curve over time.

The interaction between race and sex was not found to be statistically significant and was removed from the model for parsimony. The other interactions (medication × GFR, medication × age, and medication × CHF) were all statistically significant. The medication × CHF interaction was the least significant of these interactions and was removed from the final model to reduce the size of the final nomogram. Figure 2A and B shows the paper-based nomogram for predicting 6-year survival (model coefficients are available in the supplemental table in an online appendix, available at http://dx.doi.org/10.2337/dc08-1047).

Figure 2.

Nomogram for predicting 6-year probability of survival. Instructions for using the nomogram are as follows. Estimate the patient's GFR from his or her most recent serum creatinine level. In A, locate the patient's age on the axis that corresponds to the baseline medication and GFR, and draw a line straight upward to the Points axis to determine the number of points contributed by age. Repeat this process for the other variables in the model. Sum the points achieved for each predictor in A. Repeat this process in B. Sum the points obtained in both parts of the nomogram, and find this total on the Total Points axis at the bottom of B. Draw a straight line down from the Total Points axis to determine the probability of 6-year survival.

An important point to note about nomograms is how a U-shaped relationship is displayed. In this nomogram, for instance, the LDL cholesterol predictor variable has a U-shaped relationship with the probability of survival. This is presented in the nomogram by having the direct relationship on one side of the scale and the inverse relationship on the other side of the scale. LDL cholesterol levels from 150 down to 0 are shown under the scale and have a direct relationship with survival, whereas LDL cholesterol values from 150 to 450 are shown on the top of the scale and have an inverse relationship with survival. In other words, a patient with an LDL cholesterol of exactly 150 has the highest probability of survival, and as the LDL cholesterol goes up or down from 150, the risk of mortality increases.

An example of use of the nomogram is the following. A 50-year-old man with type 2 diabetes presents today for his first visit at Cleveland Clinic. The physician caring for the patient wants to know the risk of mortality for this specific patient over the next 6 years if he or she prescribes a BIG. Here are the characteristics for this patient along with the calculation using the survival nomogram: age 50 years, taking BIG, GFR 60 ml/min (18 points); A1C 10.0% (3 points); BMI 35 kg/m2 (0 points); systolic blood pressure 140 mmHg (0 points); diastolic blood pressure 80 mmHg (3 points); HDL cholesterol 35 mg/dl (6 points); LDL cholesterol 100 mg/dl (1 point); triglycerides 200 mg/dl (1 point); male sex (3 points); Caucasian (7 points); no heart disease (0 points); no heart failure (0 points); no smoking (0 points); no insulin (0 points); no ACE/ARBs (5 points); not newly diabetic (10 points); aspirin, yes (0 points); no clopidogrel (0 points); and no lipid-lowering drugs (0 points). Total points = 57. Probability of 6-year survival ∼0.94. Nomogram calculator available online from http://www.clinicriskcalculators.org.
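The arithmetic in the worked example can be checked directly. The snippet below only re-adds the points quoted in the legend; mapping the total to a survival probability still requires reading the Total Points axis of the nomogram (or the online calculator), which is not reproduced here.

```python
# Points quoted in the worked example for the 50-year-old patient.
nomogram_points = {
    "age 50 years, taking BIG, GFR 60 ml/min": 18,
    "A1C 10.0%": 3,
    "BMI 35 kg/m2": 0,
    "systolic blood pressure 140 mmHg": 0,
    "diastolic blood pressure 80 mmHg": 3,
    "HDL cholesterol 35 mg/dl": 6,
    "LDL cholesterol 100 mg/dl": 1,
    "triglycerides 200 mg/dl": 1,
    "male sex": 3,
    "Caucasian": 7,
    "no heart disease": 0,
    "no heart failure": 0,
    "no smoking": 0,
    "no insulin": 0,
    "no ACE/ARBs": 5,
    "not newly diabetic": 10,
    "aspirin, yes": 0,
    "no clopidogrel": 0,
    "no lipid-lowering drugs": 0,
}

total_points = sum(nomogram_points.values())
print(total_points)  # 57
```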

The BIGs were associated with the lowest risk of mortality in younger patients. Because of the interactions of medication with GFR and with age, the medication associated with the highest probability of survival varied according to individual patient characteristics. As expected, smoking and high levels of A1C were associated with lower survival. LDL had a U-shaped relationship with mortality. The lowest risk of mortality was associated with an LDL of 150 mg/dl, and risk increased at higher or lower LDL levels.

Figure 3 shows the calibration curve for the mortality prediction. The predicted survival was quite accurate at all quintiles of mortality risk. The concordance index for the survival prediction was 0.752, which indicates that, among all possible discordant patient pairs, the nomogram correctly identified the patient at higher risk 75.2% of the time.

Figure 3.

Validation of the survival prediction. Vertical bars represent the 95% CIs by quintile. The 45° line represents a perfect prediction.
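A quintile-based calibration check of the kind plotted in Fig. 3 can be sketched as follows. This is an assumption-laden illustration: the article does not state how the observed estimates were computed, so a Kaplan-Meier estimate within each quintile is used here, the function names are hypothetical, and confidence intervals are omitted.

```python
def kaplan_meier_survival(times, events, horizon):
    """Kaplan-Meier estimate of the probability of surviving past `horizon`."""
    survival = 1.0
    death_times = sorted({t for t, e in zip(times, events) if e and t <= horizon})
    for t in death_times:
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if e and u == t)
        survival *= 1.0 - deaths / at_risk
    return survival


def calibration_by_quintile(predicted, times, events, horizon):
    """Return (mean predicted survival, observed survival) for each quintile.

    Patients are sorted by predicted survival and split into five groups;
    a well-calibrated model gives pairs that lie close to the 45-degree line.
    """
    order = sorted(range(len(predicted)), key=lambda i: predicted[i])
    n = len(order)
    if n < 5:
        raise ValueError("at least five patients are required")
    rows = []
    for q in range(5):
        group = order[q * n // 5:(q + 1) * n // 5]
        mean_predicted = sum(predicted[i] for i in group) / len(group)
        observed = kaplan_meier_survival(
            [times[i] for i in group], [events[i] for i in group], horizon
        )
        rows.append((mean_predicted, observed))
    return rows


# Tiny made-up example: follow-up in years, True marks a death.
example_times = [1.0, 2.0, 3.0, 4.0]
example_events = [True, False, True, False]
print(kaplan_meier_survival(example_times, example_events, horizon=5.0))  # 0.375
```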

CONCLUSIONS

The prediction tool created in this study was accurate in predicting 6-year mortality risk among patients with type 2 diabetes. The c statistic of 0.752 indicates good discrimination ability of this tool, and the calibration curve shows that the prediction does not significantly overestimate or underestimate risk. If the current prediction tool performs well in other cohorts of patients with type 2 diabetes, it offers clinicians a tool for tailoring antidiabetes treatments with the aim of improving survival among patients with type 2 diabetes.

We are not aware of other tools designed to predict mortality in patients with type 2 diabetes in the clinical setting. It is hoped that the recently published Diabetes Complications Severity Index (DCSI) will improve the ability to adjust for the severity of type 2 diabetes in future regression models predicting mortality (9). However, the DCSI is not a stand-alone prediction tool and is designed to be used as a covariate in a larger model. We did not have all of the predictors necessary to add the DCSI score to our model.

The widespread use of this tool may be limited by the number of predictor variables that we included in the model and the requirement of GFR. We did not attempt to reduce the model using variable selection techniques (e.g., stepwise regression) because the omission of insignificant predictors tends to harm predictive accuracy (16). We hope that the online version of the calculator will make the model more user-friendly by calculating GFR automatically from serum creatinine and eliminating the need for the paper nomogram. The online calculator is available as mentioned earlier under statistical methods.

This study has several other weaknesses that should be discussed. First, there is the potential for misclassification bias surrounding the designation of the baseline oral medication. Prescriptions outside Cleveland Clinic are unknown, and we did not assess for medication changes within our system. However, in a separate analysis of this cohort, we found that ∼75% of patients were treated with the same oral diabetes medication throughout their time in the cohort. A second weakness of the study involves the substantial amount of missing data for some predictor variables. However, the imputation techniques used should help to limit the potential bias caused by simply eliminating incomplete records. Third, we were unable to accurately define the exact duration of type 2 diabetes. Patients whose diabetes was diagnosed on their first visit to Cleveland Clinic will not be recognized as newly diabetic. However, duration of diabetes is probably captured by the presence or absence of comorbidities and other variables such as GFR. Diabetes complications may be the most accurate reflection of diabetes duration because the disease frequently goes undetected for years after it first appears (17). Young et al. (9) found that duration of diabetes was not independently predictive of mortality. In addition, the presence of comorbid conditions such as heart failure and stroke may be underdocumented in the EHR. However, there is no reason to suggest that the documentation would vary according to drug class. To maximize the capture of baseline conditions, we included information beyond documentation of ICD-9 codes. For instance, patients with documentation of a coronary revascularization procedure were considered to have a history of heart disease. Despite any possible lack of documentation in the EHR, the model performed very well. A final weakness could be the loss to follow-up that is inevitable in this type of study. Fortunately, we were able to link participants with the SSDI, which should capture deaths of patients regardless of loss to follow-up at Cleveland Clinic.

The major strengths of the study include the large sample size and cohort design. The good prediction accuracy obtained through the cross-validation of the model suggests strong internal validity of these results. The Web-based version of this calculator provides the tool in an easily accessible format for clinical use. The current model requires external validation before the applicability of this model to other patient populations will be known.

Supplementary Material

Online-Only Appendix
dc08-1047_index.html (804B, html)

Acknowledgments

This research was funded by a grant from AstraZeneca, Wilmington, DE.

Published ahead of print at http://care.diabetesjournals.org on 22 September 2008.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked “advertisement” in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

References

  • 1. Kannel WB, McGee DL: Diabetes and glucose tolerance as risk factors for cardiovascular disease: the Framingham study. Diabetes Care 2:120–126, 1979
  • 2. Morgan CL, Currie CJ, Peters JR: Relationship between diabetes and mortality: a population study using record linkage. Diabetes Care 23:1103–1107, 2000
  • 3. Morgan CL, Currie CJ, Stott NC, Smithers M, Butler CC, Peters JR: The prevalence of multiple diabetes-related complications. Diabet Med 17:146–151, 2000
  • 4. Moss SE, Klein R, Klein BE: Cause-specific mortality in a population-based study of diabetes. Am J Public Health 81:1158–1162, 1991
  • 5. Wilson PW, D'Agostino RB, Levy D, Belanger AM, Silbershatz H, Kannel WB: Prediction of coronary heart disease using risk factor categories. Circulation 97:1837–1847, 1998
  • 6. Lee ET, Howard BV, Wang W, Welty TK, Galloway JM, Best LG, Fabsitz RR, Zhang Y, Yeh J, Devereux RB: Prediction of coronary heart disease in a population with high prevalence of diabetes and albuminuria: the Strong Heart Study. Circulation 113:2897–2905, 2006
  • 7. Stevens RJ, Kothari V, Adler AI, Stratton IM, United Kingdom Prospective Diabetes Study (UKPDS) Group: The UKPDS risk engine: a model for the risk of coronary heart disease in type II diabetes (UKPDS 56). Clin Sci (Lond) 101:671–679, 2001
  • 8. Conroy RM, Pyorala K, Fitzgerald AP, Sans S, Menotti A, De Backer G, De Bacquer D, Ducimetiere P, Jousilahti P, Keil U, Njolstad I, Oganov RG, Thomsen T, Tunstall-Pedoe H, Tverdal A, Wedel H, Whincup P, Wilhelmsen L, Graham IM, SCORE project group: Estimation of ten-year risk of fatal cardiovascular disease in Europe: the SCORE project. Eur Heart J 24:987–1003, 2003
  • 9. Young BA, Lin E, Von Korff M, Simon G, Ciechanowski P, Ludman EJ, Everson-Stewart S, Kinder L, Oliver M, Boyko EJ, Katon WJ: Diabetes complications severity index and risk of mortality, hospitalization, and healthcare utilization. Am J Manag Care 14:15–23, 2008
  • 10. Charlson ME, Pompei P, Ales KL, MacKenzie CR: A new method of classifying prognostic comorbidity in longitudinal studies: development and validation. J Chronic Dis 40:373–383, 1987
  • 11. National Kidney Foundation: K/DOQI clinical practice guidelines for chronic kidney disease: evaluation, classification, and stratification. Am J Kidney Dis 39:S1–266, 2002
  • 12. Institute for Clinical Systems Improvement: Management of type 2 diabetes mellitus [article online], 2006. Available from http://www.icsi.org/diabetes_mellitus_type_2/management_of_type_2_diabetes_mellitus_9.html. Accessed 23 January 2008
  • 13. Gillum RF, Mussolino ME, Madans JH: Coronary heart disease incidence and survival in African-American women and men: the NHANES I epidemiologic follow-up study. Ann Intern Med 127:111–118, 1997
  • 14. Van Buuren S, Oudshoorn CGM: MICE: multivariate imputation by chained equations: R package version 1.16. Vienna, R Foundation for Statistical Computing, 2007
  • 15. R Development Core Team: R: a language and environment for statistical computing. 2006
  • 16. Harrell FE Jr, Lee KL, Mark DB: Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat Med 15:361–387, 1996
  • 17. Harris MI, Eastman RC: Early detection of undiagnosed diabetes mellitus: a US perspective. Diabetes Metab Res Rev 16:230–236, 2000

Articles from Diabetes Care are provided here courtesy of American Diabetes Association
