Canadian Urological Association Journal. 2017 Jun;11(6):167–171. doi: 10.5489/cuaj.4569

The value of complementing administrative data with abstracted information on smoking and obesity: A study in kidney cancer

Madhur Nayan 1, Robert J Hamilton 1, Antonio Finelli 1, Peter C Austin 2,3, Girish S Kulkarni 1, David N Juurlink 4
PMCID: PMC5472460  PMID: 28652873

Abstract

Introduction

Variables, such as smoking and obesity, are rarely available in administrative databases. We explored the added value of including these data in an administrative database study evaluating the association of statin use with survival in kidney cancer.

Methods

We linked administrative data with chart-abstracted data on smoking and obesity for 808 patients undergoing nephrectomy for kidney cancer. Base models consisted of variables from administrative databases (age, sex, year of surgery, and different measures of comorbidity [to compare their sensitivity to smoking and obesity data]); extended models added chart-abstracted data. We compared coefficients for statin use with overall (OS) and cancer-specific survival (CSS), and used the c-statistic and net reclassification improvement (NRI) to compare predictions of five-year survival obtained from Cox proportional hazards models.

Results

The coefficient for statin use changed minimally following addition of abstracted data (<6% for OS, <2% for CSS). Base models performed similarly for OS, with c-statistics of 0.75 (95% confidence interval [CI] 0.72–0.79) for Charlson score and 0.73 (95% CI 0.69–0.78) for Johns Hopkins Aggregated Diagnosis Groups score. After including abstracted data, c-statistics modestly improved (change <0.02); CSS demonstrated similar findings. NRIs were 0.210 (95% CI 0.062–0.297) and 0.186 (−0.031–0.387) when using the Charlson score, and 0.207 (0.068–0.287) and 0.197 (0.007–0.399) when using the Aggregated Diagnosis Groups score, for OS and CSS, respectively.

Conclusions

The inclusion of data on smoking and obesity marginally influences survival models in kidney cancer studies using administrative data.

Introduction

Administrative databases are being increasingly used in clinical research over institutional clinical databases, as they provide large sample sizes, have strong external generalizability, and can have comprehensive information on followup;1 however, variables such as smoking and obesity are rarely available in administrative databases, but could be available in institutional databases from chart review. Studies using administrative data often note this as a limitation2,3 and a measure of comorbidity is often used to account for smoking and obesity, since these factors influence various aspects of health and may relate to a person’s overall health status.4 It remains unknown whether the addition of smoking and obesity data would markedly improve risk prediction compared to a model using a measure of comorbidity derived from administrative data. If including these variables is important for risk prediction, researchers should make efforts to obtain these data to improve the reliability of their results. Conversely, if little value is added, this additional exercise may not be worthwhile, given the costs and time associated with the process.

We explore this concept using as an example a cohort study evaluating the association of statin use with kidney cancer survival. Statins are commonly prescribed lipid-lowering medications that have recently gained interest in the oncology community based on studies showing that their use is associated with improved survival outcomes in various malignancies.5,6 In a large, population-based cohort study using administrative data, we recently demonstrated that statin use at the time of diagnosis was not significantly associated with cancer-specific (CSS) or overall survival (OS) in kidney cancer patients (Nayan et al. Manuscript in progress). We used a comorbidity score for risk adjustment, but we were unable to control for smoking and obesity, as these factors were not available in the databases used, despite previous studies demonstrating their independent associations with survival in kidney cancer.7,8 Considering that statin users may be more likely to be smokers and obese,9–11 being unable to control for these factors may have undermined our ability to demonstrate an association between statin use and survival in kidney cancer.

Therefore, using this example, the objectives of this study were to investigate the sensitivity of the association of statin use with survival after including data on smoking and obesity, and to evaluate the added value of including these data to predict survival in models using information from administrative data.

Methods

Patients and institutional data abstraction

After obtaining research ethics board approval, we retrospectively identified patients who underwent nephrectomy for unilateral, non-metastatic, sporadic kidney cancer between January 2000 and March 2015 using our institutional database (eKidney, University Health Network, Toronto). Although we have previously included patients undergoing nephrectomy after this date,12–14 comorbidity information in administrative data was only available until March 2015.

Using electronic chart review, we abstracted information on smoking status, body mass index (BMI), and statin use at the time of surgery. We used the patient’s provincial health card number to link our institutional data with administrative data. The abstracted health card numbers were encrypted to allow anonymized analyses.

Administrative data sources

We obtained hospitalization data from the Canadian Institute for Health Information Discharge Abstract Database (CIHI-DAD). We used the Ontario Health Insurance Plan (OHIP) database to identify claims for physician services, and used the National Ambulatory Care Reporting System (NACRS) to obtain information on patient visits to hospital and community-based ambulatory care facilities. We obtained basic demographic data and date of death from the Registered Persons Database. We obtained cause of death information from the Ontario Cancer Registry. Details regarding the databases used and their validity have been provided elsewhere.3

Measures of comorbidity

To compare whether the added value of smoking and obesity data varied by the measure of comorbidity used, we evaluated two measures that were available in our administrative databases, namely the Johns Hopkins Aggregated Diagnosis Groups (ADG) score and the Charlson score.

Details of the Johns Hopkins Adjusted Clinical Group System have been described previously;15 briefly, this system uses the International Classification of Diseases (ICD) diagnosis and procedural codes from both inpatient and outpatient claims to assign each ICD code to one of 32 diagnostic clusters, referred to as ADGs. We used a weighted score of the individual 32 ADG categories, herein referred to as the ADG score, which was previously developed in the general adult Ontario population and shown to predict survival.16 For our study, data from inpatient and outpatient claims were derived from CIHI-DAD, NACRS, and OHIP.

In contrast, the Charlson score estimated from our administrative data only considers inpatient claims, and therefore only used CIHI-DAD, and is a weighted sum of predefined chronic conditions in the Charlson Comorbidity Index (CCI).17 Both the ADG and Charlson score were estimated from claims in the five years prior to date of nephrectomy.

Outcome assessment

The outcomes of interest were overall and kidney cancer-specific survival. For each outcome, patients were followed from the date of surgery until their date of last contact with health services, death, or the end of the study period (December 31, 2011 for cancer-specific mortality and November 30, 2016 for all-cause mortality), whichever occurred first. These dates were based on the most recent update of the database for each outcome.

Statistical analysis

All measures of comorbidity and BMI were modelled as continuous variables, while smoking status was categorized into never smoked, former smoker, and current smoker. Covariates in the model derived from administrative data included age at surgery, sex, and year of surgery.

We compared various nested Cox proportional hazards models to evaluate the association of statin use with OS and CSS. We used two base models that included only variables available in administrative data (base model 1: age, sex, year of surgery, and the Charlson score; base model 2: age, sex, year of surgery, and the ADG score). The extended models included these variables, as well as smoking status and BMI.

The regression coefficients for statin use obtained from each model were compared to determine how sensitive this coefficient was to the inclusion of smoking and BMI data. Furthermore, each model was evaluated for its accuracy in predicting five-year survival through assessments of discrimination (c-statistic). Finally, we used the continuous (category-free) net-reclassification improvement (NRI) measure18–20 to determine the utility of the added institutional data to the administrative data to predict five-year survival. As a continuous measure, NRI is an average weighted estimate of the correct reclassification in the extended model among events and non-events.20,21 This is obtained by summing the NRI from each component (event and non-event) and dividing by two. Values for NRI range from −1 to 1, where 1 indicates that the extended model correctly reclassified all events into a higher predicted risk estimate and all non-events into a lower predicted risk estimate, while a value of −1 represents the opposite. For OS, we obtained predicted probabilities using a Kaplan-Meier approach, while for CSS, we used the cumulative incidence function22 due to the competing risk of death from other causes.
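To make the NRI definition above concrete, the following is a minimal Python sketch rather than the SAS macro used in the study. It assumes a simplified setting in which every patient's five-year status is known (no censoring or competing risks), so it does not reproduce the Kaplan-Meier or cumulative incidence adjustments described above. The function name and the toy risk values are hypothetical.

```python
def continuous_nri(base_risk, extended_risk, event):
    """Category-free NRI, halved so that it ranges from -1 to 1.

    base_risk / extended_risk: predicted five-year risks from each model.
    event: 1 if the patient had the event within five years, else 0.
    Assumption: event status is fully observed (no censoring).
    """
    up_e = down_e = n_e = 0   # counts among events
    up_n = down_n = n_n = 0   # counts among non-events
    for b, x, e in zip(base_risk, extended_risk, event):
        if e:
            n_e += 1
            up_e += x > b
            down_e += x < b
        else:
            n_n += 1
            up_n += x > b
            down_n += x < b
    if n_e == 0 or n_n == 0:
        raise ValueError("need at least one event and one non-event")
    nri_events = (up_e - down_e) / n_e        # events should move up
    nri_nonevents = (down_n - up_n) / n_n     # non-events should move down
    return (nri_events + nri_nonevents) / 2


# Toy example: both events move up; one non-event moves down, one moves up.
base = [0.2, 0.4, 0.3, 0.5]
extended = [0.3, 0.5, 0.2, 0.6]
status = [1, 1, 0, 0]
print(continuous_nri(base, extended, status))  # 0.5
```

In this toy example the event component is 1 and the non-event component is 0, giving an NRI of 0.5 after halving.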

We used a macro21 to estimate discrimination and NRI, and 95% confidence intervals (CI) were estimated using 100 bootstrap samples. All statistical analyses were performed using SAS (version 9.4; SAS Institute, Cary, NC, U.S.).
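As an illustration of how a bootstrap confidence interval with 100 resamples can be obtained, the sketch below uses the percentile method. The text does not state which bootstrap interval the macro computes, so the percentile method, the function name, and the toy data are all assumptions made for illustration.

```python
import random
from statistics import mean


def bootstrap_ci(sample, statistic, n_boot=100, alpha=0.05, seed=0):
    """Percentile bootstrap interval (an assumed method, for illustration).

    Resamples `sample` with replacement n_boot times, applies `statistic`
    to each resample, and returns the empirical alpha/2 and 1 - alpha/2
    quantiles of the resulting estimates.
    """
    rng = random.Random(seed)
    n = len(sample)
    estimates = sorted(
        statistic([sample[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    lo_idx = int((alpha / 2) * (n_boot - 1))
    hi_idx = int((1 - alpha / 2) * (n_boot - 1))
    return estimates[lo_idx], estimates[hi_idx]


# Toy data (hypothetical values), with the mean as the statistic.
toy = [27.5, 24.5, 30.8, 28.2, 25.3, 31.4, 27.2, 24.2, 30.5]
low, high = bootstrap_ci(toy, mean)
print(low, high)
```

In practice the statistic would be the c-statistic or NRI recomputed on each resampled cohort. With only 100 resamples the interval endpoints are themselves somewhat noisy.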

Results

Cohort characteristics

In our institutional database, we identified 905 patients who met inclusion criteria between January 1, 2000 and March 31, 2015. Of these, complete data on smoking status and BMI were available in 840 patients, 808 (96.2%) of whom were successfully linked with administrative databases. Cohort characteristics are shown in Table 1. At the time of surgery, 239 (29.6%) patients used statins. Median followup from date of nephrectomy was 6.07 years (interquartile range [IQR] 3.45–9.07) for patients who did not die; there were 161 deaths from any cause and 41 due to kidney cancer.

Table 1.

Cohort characteristics

| Characteristic | Total | Statin non-user | Statin user |
|---|---|---|---|
| Male gender, n (%) | 527 (65.2) | 352 (61.9) | 175 (73.2) |
| Age, median (IQR) | 59 (50–68) | 56 (47–65) | 66 (56–72) |
| Statin use at surgery, n (%) | 239 (29.6) | 0 (0) | 239 (100) |
| BMI, median (IQR) | 27.5 (24.5–30.8) | 27.2 (24.2–30.5) | 28.2 (25.3–31.4) |
| Smoking status, n (%) | | | |
| Never smoked | 348 (43.1) | 255 (44.8) | 93 (38.9) |
| Former smoker | 346 (42.8) | 225 (39.5) | 121 (50.6) |
| Currently smoking | 114 (14.1) | 89 (15.6) | 25 (10.5) |
| Year of surgery, n (%) | | | |
| 2000–2003 | 68 (8.4) | 35 (6.2) | 9 (3.8) |
| 2004–2006 | 145 (18.0) | 80 (14.1) | 22 (9.2) |
| 2007–2009 | 214 (26.5) | 153 (26.9) | 61 (25.5) |
| 2010–2012 | 213 (26.4) | 128 (22.5) | 74 (31.0) |
| 2013–2015 | 168 (20.8) | 173 (30.4) | 73 (30.5) |
| Johns Hopkins ADG score, median (IQR) | 29 (20–36) | 27 (19–34) | 32 (24–41) |
| Charlson Comorbidity Index, median (IQR) | 0 (0–1) | 0 (0–0) | 0 (0–2) |
| Died of kidney cancer, n (%)* | 41 (7.3) | 31 (5.45) | 10 (4.2) |
| Died of any cause, n (%) | 161 (19.3) | 106 (18.6) | 57 (23.8) |

*562 patients with cause of death data available until December 31, 2011.

ADG: aggregated diagnosis group; BMI: body mass index; IQR: interquartile range.

Estimate of statin use on survival

In the base models, the hazard ratios for statin use with OS were 0.87 and 0.90, when using the Charlson and ADG score, respectively (Table 2). In the extended models, the respective hazard ratios were 0.92 and 0.93, corresponding to relative changes of 5.7% and 3.3%.

Table 2.

Comparison of nested models determining the hazard ratio of statin use on survival

| Measure of comorbidity | Base model: HR (95% CI) | Base model: c-statistic | Extended model: HR (95% CI) | Extended model: c-statistic | Change in c-statistic |
|---|---|---|---|---|---|
| Overall survival | | | | | |
| Charlson score | 0.87 (0.63–1.22) | 0.748 (0.720–0.792) | 0.92 (0.66–1.28) | 0.761 (0.733–0.806) | 0.013 (0.004–0.030) |
| Johns Hopkins ADG score | 0.90 (0.65–1.26) | 0.734 (0.694–0.782) | 0.93 (0.66–1.29) | 0.748 (0.719–0.797) | 0.014 (0.005–0.034) |
| Cancer-specific survival | | | | | |
| Charlson score | 0.63 (0.30–1.31) | 0.765 (0.716–0.842) | 0.63 (0.30–1.31) | 0.778 (0.747–0.856) | 0.012 (0.004–0.069) |
| Johns Hopkins ADG score | 0.62 (0.30–1.29) | 0.765 (0.715–0.838) | 0.61 (0.29–1.27) | 0.781 (0.741–0.854) | 0.016 (0.006–0.069) |

Based on predicting five-year survival.

ADG: aggregated diagnosis groups; CI: confidence interval; HR: hazard ratio.

For CSS, the hazard ratios for statin use in the base models were 0.63 and 0.62, when using the Charlson and ADG score, respectively (Table 2). In the extended models, the respective hazard ratios were 0.63 and 0.61, corresponding to relative changes of 0.5% and 1.6%.
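The relative changes reported for OS and CSS can be checked approximately from the rounded hazard ratios in Table 2. The short sketch below assumes the relative change is the absolute difference in hazard ratio divided by the base-model hazard ratio; this definition is inferred from the reported numbers rather than stated in the text, and the function name is hypothetical.

```python
def relative_change_pct(base_hr, extended_hr):
    """Absolute change in hazard ratio as a percentage of the base value.

    Assumed definition, inferred from the reported figures.
    """
    return abs(extended_hr - base_hr) / base_hr * 100


# Rounded hazard ratios from Table 2: (base model, extended model).
reported = {
    "OS, Charlson": (0.87, 0.92),
    "OS, ADG": (0.90, 0.93),
    "CSS, Charlson": (0.63, 0.63),
    "CSS, ADG": (0.62, 0.61),
}
for label, (base_hr, extended_hr) in reported.items():
    print(label, round(relative_change_pct(base_hr, extended_hr), 1))
```

This gives 5.7, 3.3, and 1.6 for the OS Charlson, OS ADG, and CSS ADG models, matching the reported values. The CSS Charlson model gives 0.0 because both published hazard ratios round to 0.63; the reported 0.5% presumably reflects the unrounded estimates.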

Comparison of model discrimination

The base model with the highest discriminative ability for the prediction of five-year OS used the Charlson score (Table 2, c-statistic=0.748, 95% CI 0.720–0.792), followed by the ADG score (c-statistic=0.734, 95% CI 0.694–0.782). After including smoking and BMI data, the discriminative abilities of the extended models were 0.761 (95% CI 0.733–0.806) and 0.748 (95% CI 0.719–0.797), respectively, corresponding to improvements in the c-statistic of 0.013 (95% CI 0.004–0.030) and 0.014 (95% CI 0.005–0.034).

For five-year CSS, the base models using either the Charlson (Table 2, c-statistic=0.765, 95% CI 0.716–0.842) or ADG score (c-statistic=0.765, 95% CI 0.715–0.838) provided similar discriminative ability. After including smoking and BMI data, the discriminative ability of the respective extended models was 0.778 (95% CI 0.747–0.856) and 0.781 (95% CI 0.741–0.854), corresponding to improvements in the c-statistic of 0.012 (95% CI 0.004–0.069) and 0.016 (95% CI 0.006–0.069).
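For readers less familiar with the c-statistic, the following minimal Python sketch shows how discrimination can be computed for a binary five-year outcome. It assumes fully observed outcomes (no censoring) and is therefore a simplification of the survival-based estimate produced by the macro used in this study. The function name and toy values are hypothetical.

```python
def c_statistic(predicted_risk, event):
    """Probability that a random event has a higher predicted risk than a
    random non-event, with ties counted as one half.

    Assumption: five-year event status is known for every patient.
    """
    event_risks = [r for r, e in zip(predicted_risk, event) if e]
    nonevent_risks = [r for r, e in zip(predicted_risk, event) if not e]
    if not event_risks or not nonevent_risks:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    for risk_event in event_risks:
        for risk_nonevent in nonevent_risks:
            if risk_event > risk_nonevent:
                score += 1.0      # concordant pair
            elif risk_event == risk_nonevent:
                score += 0.5      # tied pair
    return score / (len(event_risks) * len(nonevent_risks))


# Toy example: three of the four event/non-event pairs are concordant.
print(c_statistic([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
```

A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of events from non-events.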

Net-reclassification improvement

For five-year OS, the NRIs were 0.210 (95% CI 0.062–0.297) and 0.207 (0.068–0.287) when using the Charlson and ADG score, respectively.

For five-year CSS, the NRIs were 0.186 (−0.031–0.387) and 0.197 (0.007–0.399) when using the Charlson and ADG score, respectively.

Discussion

We evaluated the influence of adding data on smoking and obesity to models predicting OS and CSS in administrative data for patients undergoing nephrectomy for kidney cancer. Our primary observation is that the estimates of the association of a prespecified exposure with the outcome remained relatively consistent across models. Furthermore, models using administrative data only, but with different measures of comorbidity, had relatively similar abilities to predict survival, and this predictive ability was not markedly enhanced after including abstracted smoking and obesity data. Finally, few patients were correctly reclassified in terms of their predicted risk when including abstracted smoking and obesity data.

We found that statin recipients were more likely to have any history of smoking and a higher BMI, as observed by others.9–11 While several studies have shown that both of these factors are significantly associated with survival in kidney cancer,7,8,23–25 other studies found no such association;26–28 however, many of those studies showing a significant association for obesity with survival did not control for smoking,23–25 and studies demonstrating an association for smoking with survival did not control for obesity.29,30 Therefore, the independent prognostic importance of smoking and obesity remains unknown. One potential explanation for the present findings is that smoking and obesity have little, if any, impact on survival following nephrectomy for kidney cancer. Indeed, the three nomograms31–33 widely used to predict survival in localized kidney cancer incorporate various combinations of variables related to tumour characteristics, comorbidity burden, and symptoms, rather than information on smoking and obesity. Moreover, statistical significance of a predictor does not necessarily imply substantial improvement in model performance with its inclusion.18,21

Another possible explanation for our findings is that while smoking and BMI data may be associated with survival in kidney cancer, these factors contribute minimal prognostic information beyond available measures of comorbidity. Indeed, smoking and obesity relate to an individual’s overall health status.4 Some of the previous studies suggesting that smoking and obesity are significantly associated with survival in kidney cancer did not control for comorbidity burden.23,25,29 It may be the case that, after accounting for comorbidity burden, the independent associations of smoking and obesity on survival are largely extinguished.

Our study is strengthened by the use of two metrics to evaluate model performance: the c-statistic and NRI. Although an improvement in the c-statistic has been argued as the first criterion to evaluate improvement in predictive accuracy,18 it is unknown what value constitutes a model that adequately predicts risk, it is difficult to interpret clinically, and its value may change minimally despite the addition of important predictors to the model.18,21 In contrast, the NRI allows us to determine whether the predicted risk is correctly reclassified by the addition of a predictor to the model, facilitating clinical interpretation.18 However, as a continuous measure, the NRI does not provide information on the magnitude of change in predicted risk following the inclusion of additional predictors. While categorization of risk can facilitate the interpretation of whether including additional predictors would guide clinical decision-making,20 no accepted categories of risk exist in kidney cancer. Nonetheless, the results from our model discrimination and NRI analysis both suggested marginal improvement with the addition of smoking and obesity data.

Some limitations of our study merit emphasis. First, it involved a modest number of patients from a single institution; however, the purpose of this study was to investigate the added value of abstracted data to administrative data, rather than to estimate an association dependent on statistical power. Furthermore, it is not expected that smoking and obesity would differentially influence the statistical model of patients at one institution relative to others. Second, we could not account for changes in smoking and obesity status that may have occurred during followup. This information is seldom available and, to the best of our knowledge, has not been accounted for previously; it would have to be modelled as time-dependent covariates. Although the focus of this study was to investigate the sensitivity of the estimate of effect of statin use on survival, there is currently no known method of estimating the c-statistic or NRI in the presence of time-dependent covariates. It remains unknown whether accounting for changes to smoking status and BMI over time might considerably change the estimated association of statin use with survival or the prediction of survival in kidney cancer. Finally, tumour characteristics are important prognostic variables in kidney cancer and were not included in the current study. Though this study focused on smoking and obesity data, further studies are needed to evaluate the potential value of including tumour characteristics on improving survival risk predictions when using administrative data in kidney cancer. Despite these limitations, our study is the first to investigate the importance of including abstracted smoking and obesity data in kidney cancer survival models and suggests that the results from studies in kidney cancer using administrative data are not strongly influenced by the omission of smoking or obesity data.

Conclusion

Our study used an example in kidney cancer to investigate the added value of including information on smoking and obesity to administrative data and found that their inclusion did not meaningfully influence the measures of association of interest or the model’s ability to predict survival. These results suggest that the interpretations of results from studies using administrative data in kidney cancer are not sensitive to the omission of data on smoking and obesity.

Footnotes

See related commentary on page 172

Competing interests: Dr. Hamilton has been an advisor for Abbvie, Astellas, Bayer, and Janssen; and participated in a clinical trial supported by Janssen. Dr. Austin participated in a clinical trial supported by Allergan. The remaining authors report no competing personal or financial interests.

This paper has been peer-reviewed.

References

