Cancer Causes Control. Author manuscript; available in PMC: 2018 Jan 31.
Published in final edited form as: Cancer Causes Control. 2016 Jan 21;27(3):391–401. doi: 10.1007/s10552-016-0715-8

Validation of self-reported comorbidity status of breast cancer patients with medical records: the California Breast Cancer Survivorship Consortium (CBCSC)

Cheryl Vigen 1, Marilyn L Kwan 2, Esther M John 3, Scarlett Lin Gomez 3, Theresa H M Keegan 3, Yani Lu 4, Salma Shariff-Marco 3, Kristine R Monroe 1, Allison W Kurian 5, Iona Cheng 3, Bette J Caan 2, Valerie S Lee 2, Janise M Roh 2, Leslie Bernstein 4, Richard Sposto 6, Anna H Wu 7
PMCID: PMC5792190  NIHMSID: NIHMS935527  PMID: 26797455

Abstract

Purpose

To compare information from self-report and electronic medical records for four common comorbidities (diabetes, hypertension, myocardial infarction, and other heart diseases).

Methods

We pooled data from two multiethnic studies (one case–control and one survivor cohort) enrolling 1,936 women diagnosed with breast cancer, who were members of Kaiser Permanente Northern California.

Results

Concordance varied by comorbidity; kappa values ranged from 0.50 for other heart diseases to 0.87 for diabetes. Sensitivities of self-reported comorbidities against the medical record were generally similar for racial/ethnic minorities and non-Hispanic Whites (sensitivity for hypertension was higher among minorities) and did not vary by age, neighborhood socioeconomic status, or education. Women with a longer history of comorbidity or who took medications for the comorbidity were more likely to report the condition. Hazard ratios for all-cause mortality were not consistently affected by source of comorbidity information; the hazard ratio was lower for diabetes, but higher for the other comorbidities, when medical record rather than self-reported data were used. Model fit was similar or better when medical record rather than self-reported data were used.

Conclusions

Comorbidities are increasingly recognized to influence the survival of patients with breast or other cancers. Potential effects of misclassification of comorbidity status should be considered in the interpretation of research results.

Keywords: Breast cancer, Comorbidity, Concordance, Misclassification, Sensitivity and specificity, Survival

Introduction

Evidence has accumulated showing that comorbid conditions influence survival after a breast cancer diagnosis [1–12]. Comorbidity data are commonly derived from self-report and medical records. Although both sources may be subject to error, medical records are generally considered to be a more reliable source of comorbidity information than self-report [13–18]. Previous studies have explored the accuracy of self-reported comorbidities compared to medical records, but they were limited in sample size and racial/ethnic diversity [13–22]. Attention to whether misclassification of comorbidity status differs by race/ethnicity, however, is necessary to avoid the substantial bias that can occur when misclassification of covariates differs, not by disease status, but instead by exposure classification [23]. Furthermore, the accuracy of self-reports has been assessed in general population and disease-specific cohorts, but no study has specifically examined the accuracy of comorbidities reported by breast cancer patients.

For 1,936 women with breast cancer who were members of the Kaiser Permanente Northern California (KPNC) health plan at the time of their breast cancer diagnosis and who are part of the California Breast Cancer Survivorship Consortium (CBCSC), we compared the comorbidity status obtained by self-report (in-person interviews or self-administered questionnaires) with that found in electronic medical records (EMR) for four common comorbidities available in our studies. The specific comorbidities we examined were diabetes, hypertension, myocardial infarction (MI), and other heart diseases, representing a selected subset of the comorbidities that are of interest in breast cancer survival. We explored whether discrepancies between the two sources of comorbidity information differed by demographic characteristics such as age, race/ethnicity, and neighborhood socioeconomic status (SES), and by comorbidity characteristics such as time since diagnosis and treatment of the comorbidity. We also examined the impact of source of comorbidity information on hazard ratios (HR) for all-cause mortality in multivariable models.

Methods

Study population and comorbidity data collection

This analysis included a subset of women diagnosed with breast cancer who are part of the CBCSC, which harmonized and pooled existing questionnaire data from six studies of breast cancer to explore racial/ethnic disparities in breast cancer survival [12, 24–26]. Two of these studies, the San Francisco Bay Area Breast Cancer Study (SFBCS) [27] and the Life After Cancer Epidemiology (LACE) study [28], enrolled participants who were members of Kaiser Permanente Northern California (KPNC) and collected self-reported information on selected comorbidities. Individual studies received Institutional Review Board (IRB) approval from their respective institution(s) to participate in this collaboration, and IRB approval permitting the use of California Cancer Registry (CCR) data was also obtained from the State of California Committee for the Protection of Human Subjects.

The SFBCS is a population-based case–control study of breast cancer in which participants, who were enrolled during the years 1999–2003, were interviewed using a structured questionnaire. A subset of study participants (those enrolled later in the recruitment period) was asked whether a doctor had ever diagnosed specific comorbidities (diabetes, hypertension, heart disease) before the breast cancer diagnosis, and whether they were currently taking any medication for the condition. Women who were KPNC members were included in this analysis.

LACE participants were KPNC members and breast cancer survivors enrolled in the cohort study during 2000–2002, within 39 months (mean, 22 months) of breast cancer diagnosis. The LACE baseline questionnaire assessed history (as of the date of interview) of the four conditions of interest (diabetes, hypertension, MI, other heart diseases), as well as use of insulin injections, oral hypoglycemic medications, diuretics, blood pressure medications, and other medications for heart problems. Participants were asked two questions about diabetes and its treatment: whether they had diabetes requiring insulin (yes/no) and whether they had diabetes not requiring insulin (yes/no). Women were coded as having diabetes if they replied “yes” to either question. The presence of other heart diseases was determined by a positive response to “Other heart-related problems (not specified above).”
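A minimal sketch of the LACE diabetes coding rule described above. The function name is hypothetical, and the treatment of incomplete answers (returning `None` when neither answer is “yes” and at least one is missing) is an assumption; the text specifies only that a “yes” to either question counts as diabetes.

```python
from typing import Optional


def lace_diabetes_status(requires_insulin: Optional[bool],
                         not_requiring_insulin: Optional[bool]) -> Optional[bool]:
    """Combine the two LACE diabetes questions into one self-reported status.

    A "yes" (True) to either question codes the woman as having diabetes.
    Assumption: if neither answer is "yes" and at least one is missing (None),
    the combined status is treated as missing.
    """
    if requires_insulin is True or not_requiring_insulin is True:
        return True
    if requires_insulin is False and not_requiring_insulin is False:
        return False
    return None  # hypothetical handling of incomplete responses


print(lace_diabetes_status(False, True))  # True
```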

Kaiser Permanente Medical Records (KPMR)

Through linkage with the CCR, we identified 896 SFBCS and 1,731 LACE participants who were diagnosed with breast cancer at a KPNC hospital. We then limited breast cancer diagnoses to those recorded from 1997 and onward, given that KPNC electronic data capture began consistently in 1996, thus allowing for capture of at least 1 year of comorbidity data before breast cancer diagnosis. As a result, 1,936 participants in SFBCS (n = 327) and LACE (n = 1,609) were included in this analysis.

The KPMR is supported by the Virtual Data Warehouse (VDW) of the former Health Maintenance Organization Research Network (now called the Health Care Systems Research Network) which uses input from EMR and insurance data to create research-quality databases [29]. For the eligible cohort, we searched the KPMR for first diagnosis of the four comorbidities of interest from at least one year prior to the patient’s date of breast cancer diagnosis, using the following ICD-9 codes: diabetes: 249.0–249.91, 250.0–250.93; hypertension: 401.0–405; MI: 410; and other heart diseases: 411–414, 415–417, 420–429, 390–392, 393–398, 746.9. These codes enumerated for other heart diseases include atherosclerosis, heart failure, and congenital anomalies, among others. Along with the diagnostic code, we obtained the date associated with the first diagnosis of the comorbidity in the KPMR. We also extracted medications (prescriptions filled) for diabetes (insulin, sulfonylureas, and others) and hypertension (diuretics, angiotensin-converting enzyme inhibitors, calcium channel blockers, and others) from the KPNC VDW outpatient pharmacy database.
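The KPMR search can be sketched as a lookup from ICD-9 codes to the four comorbidity groups, followed by taking the earliest qualifying date and checking it against a reference date. This is an illustrative sketch, not the VDW extraction code: it assumes codes are stored as dotted strings such as `"250.02"`, reads the listed ranges at the three-digit category level (with 746.9 matched as a single code), and uses hypothetical function names and example records.

```python
from datetime import date
from typing import Iterable, Optional, Tuple

# ICD-9 ranges from the text, read at the 3-digit category level (assumption).
# 746.9 is matched as a specific code rather than a whole category.
CATEGORY_RANGES = {
    "diabetes": [(249, 250)],
    "hypertension": [(401, 405)],
    "mi": [(410, 410)],
    "other_heart": [(390, 398), (411, 417), (420, 429)],
}


def classify_icd9(code: str) -> Optional[str]:
    """Map a dotted ICD-9 code (e.g. '250.02') to a comorbidity group, or None."""
    code = code.strip()
    if code == "746.9":
        return "other_heart"
    head = code.split(".")[0]
    if not head.isdigit():  # V and E codes fall outside all four groups
        return None
    category = int(head)
    for group, ranges in CATEGORY_RANGES.items():
        if any(lo <= category <= hi for lo, hi in ranges):
            return group
    return None


def first_diagnosis(events: Iterable[Tuple[str, date]], group: str) -> Optional[date]:
    """Earliest date on which any code in the group appears in the record."""
    dates = [d for code, d in events if classify_icd9(code) == group]
    return min(dates) if dates else None


def kpmr_positive(events, group: str, reference: date) -> bool:
    """True if the group's first diagnosis falls on or before the reference date."""
    first = first_diagnosis(events, group)
    return first is not None and first <= reference


# Hypothetical coded encounters for one woman: (ICD-9 code, encounter date).
events = [("401.9", date(2001, 5, 1)), ("401.1", date(1998, 3, 2)),
          ("250.00", date(2003, 1, 1))]
print(first_diagnosis(events, "hypertension"))              # 1998-03-02
print(kpmr_positive(events, "diabetes", date(2002, 6, 1)))  # False
```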

California Cancer Registry (CCR) data

Data on patient demographic and tumor characteristics were available from the CCR, including age at diagnosis, American Joint Committee on Cancer (AJCC) stage, tumor size, grade, estrogen receptor (ER) and progesterone receptor (PR) status, number of positive nodes, prior cancer diagnoses, treatment (surgery, chemotherapy, radiation), neighborhood SES, and marital status. Study participants were linked to the CCR to obtain vital status as of 31 December 2010 and, among those who died, date of death.

Statistical analysis

Concordance

We defined the concordance (reference) date as the date for which the comorbidity status was asked in the study questionnaire (the year of breast cancer diagnosis for SFBCS and the date of interview for LACE). Comorbidity status in the KPMR was positive if the comorbidity ICD-9 code was found in the patient’s EMR on or prior to the concordance date. Sensitivity, specificity, and kappa statistics were calculated using these criteria. P values for differences in sensitivity and specificity between groups defined by dichotomized demographic variables [e.g., race/ethnicity (non-Hispanic White vs. all other), age (<60 vs. ≥60 years)] were calculated using Fisher’s exact tests that compared the frequencies of correct/incorrect reporting between the two groups among participants with the condition (for sensitivity) or without the condition (for specificity). The KPMR was treated as the “gold standard” in sensitivity and specificity calculations. Although EMRs may also contain errors, we refer to a woman with a comorbidity diagnosis in her KPMR in the appropriate time frame as having the comorbidity, and to a woman answering positively to the comorbidity question in the study questionnaire as reporting the comorbidity. Kappa is a measure of inter-observer agreement, with a value of 0 indicating agreement no better than chance and a value of 1.0 indicating perfect agreement [30]. Kappa values in excess of 0.80 are commonly considered to be excellent.
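The three concordance measures can be computed from the 2 × 2 cross-tabulation of questionnaire response by KPMR status, with the KPMR as the reference. The sketch below (class and method names are illustrative) uses Cohen’s kappa, κ = (p_o − p_e)/(1 − p_e), where p_o is the observed agreement and p_e is the agreement expected from the table margins. Applied to the diabetes counts for all women in Table 2, it reproduces the reported sensitivity, specificity, and kappa.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Questionnaire response (rows) by KPMR status (columns), non-missing only."""
    a: int  # reported yes, KPMR yes (true positive)
    b: int  # reported yes, KPMR no  (false positive)
    c: int  # reported no,  KPMR yes (false negative)
    d: int  # reported no,  KPMR no  (true negative)

    @property
    def n(self) -> int:
        return self.a + self.b + self.c + self.d

    def sensitivity(self) -> float:
        return self.a / (self.a + self.c)

    def specificity(self) -> float:
        return self.d / (self.b + self.d)

    def kappa(self) -> float:
        n = self.n
        p_observed = (self.a + self.d) / n
        # Chance agreement from the questionnaire (row) and KPMR (column) margins.
        p_expected = ((self.a + self.b) * (self.a + self.c)
                      + (self.c + self.d) * (self.b + self.d)) / n ** 2
        return (p_observed - p_expected) / (1 - p_expected)


diabetes = TwoByTwo(a=164, b=8, c=34, d=1683)  # Table 2, total column
print(round(diabetes.sensitivity(), 3), round(diabetes.specificity(), 3),
      round(diabetes.kappa(), 2))  # 0.828 0.995 0.87
```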

We stratified measures of concordance by race/ethnicity (non-Hispanic Whites vs. all others), age at interview (<60 vs. ≥60 years), neighborhood SES [block-group-level composite index based on statewide quintiles [31] collapsed into low (quintiles 1–3) or high (quintiles 4, 5)], and education (less than college graduate vs. college graduate).
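The subgroup comparisons described above rely on Fisher’s exact test applied to 2 × 2 tables of correct versus incorrect reporting by group. A dependency-free sketch of the two-sided test follows. It uses the common convention of summing the probabilities of all tables with the observed margins that are no more likely than the observed table; `scipy.stats.fisher_exact` implements the same test if SciPy is available. The function name is illustrative.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the observed
    row and column totals that is no more likely than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x: int) -> float:
        # P(top-left cell = x) given the fixed margins.
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    observed = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # A small relative tolerance keeps floating-point ties in the sum.
    total = sum(p for p in map(prob, range(lo, hi + 1))
                if p <= observed * (1 + 1e-7))
    return min(1.0, total)


# Among women with hypertension in the KPMR (Table 2): correct vs. incorrect
# reports in non-Hispanic Whites (420, 127) and in all others (207, 29).
print(fisher_exact_two_sided(420, 127, 207, 29))
```

With these counts the function returns a p value well below 0.01. It is an unstratified test, so it does not reproduce the study-adjusted p value given in the Results.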

Reasons for discordance

We explored reasons for false negatives (i.e., women having a comorbidity according to the KPMR but not reporting it) by considering two factors: the lag time between the KPMR comorbidity diagnosis and self-report, and medication use according to the KPMR. We first evaluated whether some of the discordance could be explained by recently diagnosed women not yet being certain of their disease status. Although medication use was not considered when determining comorbidity status from either the KPMR or self-report, we explored whether some women who were not treated with medications might have reported that they did not have the comorbidity, or, conversely, whether women whose disease was controlled by medication might have responded that they did not have the comorbidity.
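The lag-time comparison can be sketched as grouping KPMR-confirmed cases by time from first KPMR diagnosis to the questionnaire reference date and computing the percentage who reported the condition in each group. The 365-day cutoff for “1 year”, the record layout, the example records, and the function names are assumptions for illustration; the same tally could be repeated with medication use (yes/no) as the grouping variable.

```python
from datetime import date


def lag_group(first_dx: date, reference: date) -> str:
    """Classify time from first KPMR diagnosis to the reference date.
    Assumption: '1 year' is taken as 365 days."""
    return "<=1 year" if (reference - first_dx).days <= 365 else ">1 year"


def percent_reporting(records):
    """records: (first_dx, reference_date, reported_yes) tuples for women whose
    KPMR shows the comorbidity. Returns % reporting it, by lag group."""
    counts = {"<=1 year": [0, 0], ">1 year": [0, 0]}  # [reported, total]
    for first_dx, reference, reported in records:
        tally = counts[lag_group(first_dx, reference)]
        tally[0] += int(reported)
        tally[1] += 1
    return {group: 100 * r / t for group, (r, t) in counts.items() if t}


# Hypothetical records, for illustration only.
example = [
    (date(2001, 1, 15), date(2001, 6, 1), False),
    (date(2001, 3, 2), date(2001, 6, 1), True),
    (date(1998, 4, 20), date(2001, 6, 1), True),
]
print(percent_reporting(example))  # {'<=1 year': 50.0, '>1 year': 100.0}
```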

Associations of comorbidities with all-cause mortality by source of comorbidity information

We evaluated the effect of source of comorbidity information on associations of comorbidities with all-cause mortality. Specifically, for each comorbidity, we compared hazard ratio (HR) estimates based on self-report versus EMR. We used Cox proportional hazards regression models to estimate HRs and 95 % confidence intervals (CIs), stratifying by study (LACE or SFBCS) and adjusting for age at breast cancer diagnosis [age and ln(age)], AJCC stage (I, II, III, IV, or unknown), tumor size (<1, 1 to <5, or ≥5 cm), grade (I, II, III/IV, or unknown), ER/PR status (ER+/PR+, ER+/PR−, ER−/PR+, ER−/PR−, unknown), number of positive nodes (0, ≥1, or unknown), prior cancer (yes or no), chemotherapy (yes, no, or unknown), breast surgery (none, mastectomy, lumpectomy, or other), age at first birth (nulliparous, <20, 20–24, 25–29, 30–34, or ≥35), alcohol consumption (none, ≤2 drinks/week, >2 drinks/week, or unknown), smoking (never, past <1 pack/day, past >1 pack/day, current ≤1 pack/day, current >1 pack/day, or unknown), education (<high school, high school, some college, college graduate, or unknown), marital status (single, married, separated/divorced, widowed, or unknown), neighborhood SES (statewide quintile or unknown), race/ethnicity (non-Latina White, African American, Latina, Asian American, or other), nativity (USA, other, or unknown), and body mass index (BMI) (<18.5, 18.5–24.9, 25.0–29.9, 30.0–34.9, 35.0–39.9, ≥40 kg/m², or unknown). For each comorbidity, we used the attained age model (attained age as the time scale) for all-cause mortality to evaluate the effect of the comorbidity data source on the HR estimates for the comorbidity and for the other covariates contained in each of the models. Model fit was assessed using the Akaike information criterion (AIC), which penalizes the model log-likelihood by the number of estimated parameters; lower values indicate better fit.

All data were analyzed using SAS for Windows, version 9.4 (SAS Institute, Cary, N.C.). All p values are two-sided.
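The analyses were run in SAS. Purely to illustrate what an attained-age, study-stratified Cox model and its AIC involve, the Python sketch below fits a single binary covariate by Newton-Raphson on the partial likelihood. Assumptions not stated in the text: women enter the risk set at their age at breast cancer diagnosis (delayed entry) and leave at death or censoring, ties are handled with the Breslow approximation, and the records are hypothetical. The published models also included the covariates listed above.

```python
import math

# Each record: (stratum, entry_age, exit_age, died, x); x is a 0/1 comorbidity flag.


def risk_set(records, stratum, t):
    """Covariate values of women in the same stratum under observation at age t."""
    return [x for s, entry, exit_age, _, x in records
            if s == stratum and entry < t <= exit_age]


def partial_loglik(beta, records):
    """Stratified Breslow partial log-likelihood on the attained-age scale."""
    ll = 0.0
    for s, _, t, died, x in records:
        if died:
            at_risk = risk_set(records, s, t)
            ll += beta * x - math.log(sum(math.exp(beta * xj) for xj in at_risk))
    return ll


def fit_one_covariate(records, iters=25):
    """Newton-Raphson for the single log hazard ratio beta."""
    beta = 0.0
    for _ in range(iters):
        score = info = 0.0
        for s, _, t, died, x in records:
            if not died:
                continue
            at_risk = risk_set(records, s, t)
            w = [math.exp(beta * xj) for xj in at_risk]
            s0 = sum(w)
            s1 = sum(wi * xj for wi, xj in zip(w, at_risk))
            s2 = sum(wi * xj * xj for wi, xj in zip(w, at_risk))
            score += x - s1 / s0
            info += s2 / s0 - (s1 / s0) ** 2
        beta += score / info
    return beta


def aic(loglik, n_params):
    """Akaike information criterion; lower values indicate better fit."""
    return 2 * n_params - 2 * loglik


records = [  # hypothetical: (study, age at diagnosis, age at exit, died, comorbidity)
    ("LACE", 50, 60, True, 1), ("LACE", 52, 70, False, 1),
    ("LACE", 48, 65, True, 0), ("LACE", 55, 62, True, 1),
    ("LACE", 45, 75, False, 0), ("LACE", 58, 68, True, 0),
    ("SFBCS", 40, 55, True, 1), ("SFBCS", 42, 66, False, 0),
    ("SFBCS", 47, 58, True, 0), ("SFBCS", 44, 61, False, 1),
]
beta_hat = fit_one_covariate(records)
print(f"HR = {math.exp(beta_hat):.2f}, "
      f"AIC = {aic(partial_loglik(beta_hat, records), 1):.2f}")
```

Comparing two such fits that differ only in the source of the comorbidity flag, the one with the lower AIC fits better, which is how the AIC rows of Table 5 are read.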

Results

The mean age at interview of women in this validation study was 60.4 (standard deviation, SD 11.0) years (Table 1). Most women (71 %) were non-Hispanic White, and 33 % were college graduates. Between 1,609 and 1,936 women with breast cancer were included in the analyses of each of the four comorbidities we studied.

Table 1.

Demographic characteristics of CBCSC Comorbidity Validation Study, 1997–2010

Total (n = 1,936) LACE (n = 1,609) SFBCS (n = 327)
Age at interview, mean (SD), years 60.4 (11.0) 61.0 (10.9) 57.6 (11.0)
Education, n (%)
 Less than high school 171 (8.8) 83 (5.2) 88 (26.9)
 High school graduate 446 (23.0) 360 (22.4) 86 (26.3)
 Some college 679 (35.1) 585 (36.4) 94 (28.8)
 College graduate 633 (32.7) 574 (35.7) 59 (18.0)
 Unknown 7 (0.4) 7 (0.4) 0 (0)
Race/ethnicity, n (%)
 Non-Hispanic White 1,370 (70.8) 1,327 (82.5) 43 (13.1)
 African American 133 (6.9) 72 (4.5) 61 (18.7)
 Hispanic 292 (15.1) 69 (4.3) 223 (68.2)
 Asian American 103 (5.3) 103 (6.4) 0 (0)
 Other 38 (2.0) 38 (2.4) 0 (0)
Number of breast cancer patients asked comorbidity question
 Diabetes 1,936 1,609 327
 Hypertension 1,936 1,609 327
 Myocardial infarction 1,609 1,609 0
 Other heart disease 1,768 1,609 159

CBCSC California Breast Cancer Survivorship Consortium, LACE Life After Cancer Epidemiology, SFBCS San Francisco Bay Area Breast Cancer Study

Specificity was high for diabetes, hypertension, MI, and other heart diseases, ranging from 96.0 % to 99.5 % (Table 2). Sensitivity ranged from 48.0 % (other heart diseases) to 90.5 % (MI). Specificity was similar in non-Hispanic Whites and other race/ethnicities, but sensitivity for hypertension was higher in racial/ethnic minorities than in non-Hispanic Whites (unadjusted p = 0.0004, adjusted for study p = 0.02). Kappa statistics ranged from 0.50 for other heart diseases to 0.87 for diabetes. Data within the other race/ethnicity category were stratified into African American, Hispanic, Asian American, and others, and results are also shown in Table 2. The small cell sizes, however, limit the interpretation of these subgroup results.

Table 2.

Concordance of CBCSC Comorbidity Validation Study, 1997–2010

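Columns: Kaiser Permanente medical record (gold standard) status, Yes and No, for Total (n = 1,936), Non-Hispanic Whites (n = 1,370), All others (n = 566)b, and, within all others by race/ethnicity, African American, Hispanic, Asian American, and Others. In the sensitivity/specificity rows, the fourth entry is the P valuea for non-Hispanic Whites vs. all others.
Yes No Yes No Yes No Yes No Yes No Yes No Yes No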
Questionnaire response
Diabetes
 Yes 164 8 89 4 75 4 25 1 39 3 6 0 5 0
 No 34 1,683 22 1,219 12 464 3 103 3 241 6 89 0 31
 Missing 11 36 8 28 3 8 1 0 2 4 0 2 0 2
 Kappa (95 % CI) (among non-missing) .87 (.84,.91) .86 (.81,.91) .89 (.83,.94) .91 (.82,1.00) .92 (.85,.98) .64 (.38,.90) 1.000
 Sensitivity/specificity (among non-missing) .828/.995 .802/.997 .862/.991 0.23/0.34 .893/.990 .929/.988 .500/1.000 1.000/1.000
Hypertension
 Yes 627 14 420 8 207 6 68 1 95 3 34 0 10 2
 No 156 1,108 127 789 29 319 8 55 12 180 8 60 1 24
 Missing 16 15 13 13 3 2 1 0 2 0 0 1 0 1
 Kappa (95 % CI) (among non-missing) .81 (.78,.84) .78 (.75,.82) .87 (.83,.91) .86 (.78,.95) .89 (.83,.94) .83 (.72,.94) .81 (.61,1.00)
 Sensitivity/specificity (among non-missing) .801/.988 .768/.990 .877/.982 0.0004/0.25 .895/.982 .888/.984 .810/1.000 .909/.923
Myocardial infarction
 Yes 19 17 16 13 3 4 1 2 1 1 1 1 0 0
 No 2 1,518 2 1,255 0 263 0 66 0 61 0 100 0 36
 Missing 2 51 2 39 0 12 0 3 0 6 0 1 0 2
 Kappa (95 % CI) (among non-missing) .66 (.52,.80) .68 (.52,.83) .59 (.23,.95) .49 (−.11,1.00) .66 (.04,1.00) .66 (.04,1.00)
 Sensitivity/specificity (among non-missing) .905/.989 .889/.990 1.000/0.985 1.00/0.52 1.000/.971 1.000/.984 1.000/.990
Other heart diseases
 Yes 131 58 107 45 24 13 8 2 7 6 6 3 3 2
 No 142 1,388 115 1,066 27 322 16 103 5 100 6 87 0 32
 Missing 12 37 10 27 2 10 1 3 1 5 0 1 0 1
 Kappa (95 % CI) (among non-missing) .50 (.44,.56) .51 (.44,.57) .49 (.35,.62) .41 (.19,.62) .51 (.26,.76) .52 (.25,.80) .72 (.36,1.00)
 Sensitivity/specificity (among non-missing) .480/.960 .482/.960 .471/.961 1.00/1.00 .333/.981 .583/.943 .500/.967 1.000/.941

CBCSC California Breast Cancer Survivorship Consortium, CI confidence interval

a P value for differences in sensitivity and specificity between non-Hispanic Whites and all others, among women with a non-missing questionnaire response (Fisher’s exact test)

b All others included African Americans, Hispanics, Asian Americans, and others

When analyses were stratified by age at diagnosis or interview (Table 3), we found that for each comorbidity, sensitivity was similar in women <60 years and those ≥60 years of age; however, specificity for MI was higher in younger women than in older women (1.00 and 0.98, respectively, p = 0.0003). We considered additional cut points for age (65, 70, and 75 years), and results were similar (data not shown). For each comorbidity, sensitivity and specificity did not differ by neighborhood SES or education, except that specificity for MI was higher in the high (quintile 4 or 5) than in the low SES group (0.99 and 0.98, respectively, p = 0.01) and in college graduates than in those with lower education (0.99 and 0.98, respectively, p = 0.01).

Table 3.

Sensitivity and specificity by age, neighborhood SES, study, and education in CBCSC Comorbidity Validation Study, 1997–2010

Diabetes Hypertension Myocardial infarction Other heart disease
Age at interview <60 years
n with comorbidity 57 223 3 60
 Sensitivity 0.86 0.78 1.00 0.42
n without comorbidity 871 709 729 757
 Specificity 1.00 0.99 1.00 0.93
Age at interview ≥60 years
n with comorbidity 141 560 18 213
 Sensitivity 0.82 0.81 0.89 0.50
n without comorbidity 820 413 806 689
 Specificity 1.00 0.99 0.98 0.96
P values for difference by age group
 Sensitivity 0.54 0.37 1.00 0.31
 Specificity 1.00 0.59 0.0003 0.79
Neighborhood SES quintiles 1–3 (low)
n with comorbidity 88 271 9 97
 Sensitivity 0.85 0.80 0.89 0.55
n without comorbidity 533 354 463 441
 Specificity 0.99 0.99 0.98 0.97
Neighborhood SES quintiles 4–5 (high)
n with comorbidity 105 488 12 166
 Sensitivity 0.81 0.80 0.92 0.43
n without comorbidity 1,096 723 1,010 950
 Specificity 1.00 0.99 0.99 0.96
P values for difference by SES
 Sensitivity 0.45 0.85 1.00 0.10
 Specificity 0.45 0.77 0.01 0.47
<College graduate
n with comorbidity 97 279 7 94
 Sensitivity 0.87 0.80 1.00 0.52
n without comorbidity 504 326 416 397
 Specificity 1.00 0.98 0.98 0.97
≥College graduate
n with comorbidity 99 500 13 179
 Sensitivity 0.79 0.80 0.85 0.46
n without comorbidity 1,184 795 1,115 1,046
 Specificity 0.99 0.99 0.99 0.96
P values for difference by education
 Sensitivity 0.19 1.00 0.52 0.37
 Specificity 0.45 0.25 0.01 0.45

CBCSC California Breast Cancer Survivorship Consortium, MI myocardial infarction, SES socioeconomic status

We explored possible explanations for the false positives (i.e., “yes” according to questionnaire response but “no” according to KPMR; Table 2). For diabetes, there were eight false positives; three of these women reported diabetes not requiring insulin, and one woman was taking diabetes medication per her KPMR, although no diabetes diagnosis was present. Of the 14 women who were false positives for hypertension, one reported taking diuretic and antihypertensive medication, yet neither of these medications was found in her KPMR. Another woman was taking antihypertensive medication per her KPMR. There were 17 women who were false positives for MI. Sixteen of these women were positive for ischemic heart disease in their KPMR. Five women, including four who had ischemic heart disease, reported a date for the MI that was prior to their date of enrollment in KPNC. There were 58 false positives for other heart diseases. Among these women, four had diabetes, 29 had hypertension, and one had an MI in her KPMR; because some women had more than one of these conditions, together they potentially account for 29 (50 %) of these false positives. No explanation for the other discrepancies could be found.

We also evaluated reasons for false negatives, i.e., women who were positive for a comorbidity per their KPMR but responded negatively on the questionnaire (Table 4). Women who were taking medications for diabetes or hypertension according to their KPMR were significantly more likely to report the comorbidity than women who were positive for the disease but not taking medication (p ≤ 0.0001 for both diabetes and hypertension). Women who had been diagnosed with diabetes or hypertension more than 1 year prior to their interview date were more likely than women who had been diagnosed more recently to report that comorbidity on the study questionnaire (85 % vs. 68 % for diabetes, p = 0.03; 82 % vs. 58 % for hypertension, p = 0.0002). Only two women did not report an MI that was shown in their KPMR.

Table 4.

Covariates of correct versus incorrect self-report among women with confirmed comorbidity in KPMR in the CBCSC Comorbidity Validation Study, 1997–2010

Columns: time between interview and comorbidity diagnosis (≤1 year, n and %; >1 year, n and %; P value) and medication use per KPMR (yes, n and %; no, n and %; P value)
Questionnaire response
Diabetes
 Yes 19 68 145 85 .03 158 92 6 22 <.0001
 No 9 32 25 15 13 8 21 78
Hypertension
 Yes 31 58 596 82 .0002 495 83 132 69 .0001
 No 22 42 134 18 98 17 58 31
Other heart diseases
 Yes 8 33 123 49 .13
 No 16 67 126 51

CBCSC California Breast Cancer Survivorship Consortium, KPMR Kaiser Permanente medical record

In Table 5, we show for each comorbidity the HRs for all-cause mortality derived from models where all specifications are the same except the source of information on comorbidity (self-report vs. KPMR). The HR estimates for the comorbidities were not consistently affected by source of comorbidity information. For diabetes, the HR estimates based on self-report were higher than those based on the KPMR, but for the other comorbidities, HRs based on self-report were lower. Regardless of how diabetes status was determined, diabetes was a significant risk factor for all-cause mortality among women with breast cancer; the risk was higher when self-reported questionnaire data (vs. KPMR data) were used, but the 95 % CIs were overlapping. Hypertension and other heart diseases were associated with statistically significantly increased HRs when the KPMR data were used (HR = 1.55 for hypertension, HR = 1.51 for other heart diseases), but these associations were not statistically significant when self-reported data were used (HR = 1.22 for hypertension; HR = 1.07 for other heart diseases). HRs for MI were nonsignificantly elevated regardless of data source.

Table 5.

Hazard ratios (95 % CI) for race/ethnicity and comorbidities by source of comorbidity data in the CBCSC Comorbidity Validation Study, 1997–2010

Columns: study questionnaire, HRa (95 % CI); KPMR, HR (95 % CI)
n deaths + censored 1,936 1,936
Diabetes model (n = 1,936)
 No 1.00 1.00
 Yes 1.65 1.20, 2.25 1.44 1.07, 1.95
 Missing 0.70 0.34, 1.45
 Akaike information criterionb 3,938 3,940
Hypertension model (n = 1,936)
 No 1.00 1.00
 Yes 1.22 0.96, 1.54 1.55 1.22, 1.96
 Missing 0.93 0.37, 2.31
 Akaike information criterion 3,945 3,933
Myocardial infarction model (n = 1,609)
 No 1.00 1.00
 Yes 1.40 0.79, 2.49 1.73 0.82, 3.66
 Missing 0.44 0.20, 0.96
 Akaike information criterion 3,485 3,488
Other heart diseases model (n = 1,768)
 No 1.00 1.00
 Yes 1.07 0.77, 1.49 1.51 1.17, 1.96
 Missing 0.82 0.43, 1.55
 Akaike information criterion 3,736 3,725

CBCSC California Breast Cancer Survivorship Consortium, CI confidence interval, HR hazard ratio, KPMR Kaiser Permanente medical record

a Cox proportional hazards regression model for all-cause mortality using attained age as the time metric, stratified by study, and adjusted for age, ln(age), AJCC stage, differentiation, ER/PR, nodes, tumor size, prior tumor, chemotherapy, surgery, age at first birth, alcohol, education, marital status, neighborhood socioeconomic status, race/ethnicity, smoking, nativity, and BMI

b Lower values of the Akaike information criterion indicate better model fit

We compared the HR estimates and p values obtained using self-report versus KPMR for all covariates in each of the comorbidity models. The source of comorbidity data did not substantially affect the HR estimate for any of the covariates in any of the comorbidity models. For each categorical covariate (we did not include the two continuous covariates [age and ln(age)] because their HRs are dependent on units of measurement), we calculated the absolute value of the difference between the HR in the model using the self-reported comorbidity data and the HR using the KPMR data. The mean (SD) difference was 0.04 (0.06).

Model fit was similar for diabetes and MI regardless of data source, but was better for hypertension and other heart diseases when KPMR data were used (see AIC, Table 5).

Discussion

In this analysis, we investigated a selected number of major comorbidities that are of interest in breast cancer survival. We compared self-reported to EMR-ascertained comorbidity status in a multiethnic population of breast cancer patients and found that concordance varied by comorbidity, with generally excellent specificity but weaker sensitivity. When statistically significant differences in sensitivity or specificity were found between subgroups defined by race/ethnicity, age, neighborhood SES, or education, the group more likely to have the comorbidity was more likely to report the condition, suggesting that there may be some confusion regarding borderline cases. For example, women who know that they are at high risk for a condition may be more likely to interpret borderline results as definitive diagnoses. We also found that model fit for survival was somewhat superior when the EMR rather than self-report was used to determine comorbidity, but that HR estimates for the other model covariates were similar regardless of data source.

Data discrepancies between self-report and the EMR may have occurred for several reasons, including misunderstanding of the questions, such as the relevant reference date. Additionally, women taking medication may not have reported the condition because it was managed by medication, or women not requiring medication may not have reported a condition because it was borderline. There were 34 false negatives for diabetes, and 62 % of these women were not taking diabetes medications according to their KPMR, indicating that they may have had relatively mild disease. There were 156 false negatives for hypertension; yet, according to the KPMR, 63 % were taking medication commonly prescribed for hypertension. This could indicate either a limitation in the EMR or that women were taking these medications for reasons other than hypertension. Sensitivity was less than 50 % for other heart diseases, most likely because the question was less specific and it was ambiguous which conditions belonged in this category. Although the ICD-9 codes selected to indicate other heart diseases are unambiguous, women may not have reported some conditions, particularly those that were mild. Our data were not detailed enough to determine concordance by specific heart disease (such as atherosclerosis or heart failure), nor to determine whether a condition was serious or mild.

Errors may also exist within medical record data, and some may argue that the EMR should not be considered the gold standard. In this study, we found similar or improved model fit for the KPMR data compared to self-report. Thus, even though we have no way of determining which data source was more correct, comorbidity data from the KPMR predicted survival after breast cancer diagnosis at least as well as self-reported data. The accuracy and completeness of EMRs in general are likely to increase in the future; however, increased accuracy does not necessarily equate to increased prognostic value. For example, a participant may choose not to report a comorbidity that was diagnosed long ago and has since resolved with lifestyle modifications. While the comorbidity may be found in a sufficiently complete EMR, it may be unrelated to survival. Thus, attention to the extraction of relevant EMR data will be necessary to obtain the best possible models.

Our results are generally consistent with those from previous studies that have investigated comorbidity concordance between participant responses and medical records (Table 6) [13–22]. Our finding of high concordance for diabetes (kappa = 0.87) agrees with findings from 10 previous studies (mean kappa unweighted by sample size = 0.85, range 0.75–0.97). Our data also show high concordance for hypertension (kappa = 0.81), which is compatible with most previous studies. Our concordance for MI is in the middle of a wide range of kappas found in six previous studies, and, similar to three previous studies, we found modest kappas for other heart diseases. There is less consistent information on whether concordance differs by age, education, or race/ethnicity. Within our limited data by specific race/ethnicity groups, we found little evidence for concordance differences by race/ethnicity. The largest previous study [16] found generally similar concordance by age and education, whereas other studies found situations where concordance was worse for those of older age or lower education [14, 15, 18–20].

Table 6.

Kappa values for agreement between participant reports and medical records in current and prior reports

Study n % Male % White % > High school education Population Kappa values

Diabetes Hypertension MI Other heart disease
Current analyses 1,936 0 70.8 68.2 California, breast cancer .87 .81 .66 .50
Midthjell (1992) [21] 507 n/a ~100 % n/a Norwegian municipality .97
Haapanen (1997) [18] 587 ~50 % ~100 % n/a Finnish sample cohort .75 .77 .79
Martin (2000) [14] 599 49–50 n/a n/a Colorado, managed care .81 .65
Sangha (2003) [13] 170 45 82 50 Massachusetts, medical or surgical inpatients .90 .50
Okura (2004) [15] 2,037 48.2 ~96 % 60.4 Minnesota, ≥45 years of age .76 .75 .80 .46
Skinner (2005) [22] 402 100 88.7 35 Massachusetts, veterans .84 .70
Merkin (2007) [17] 965 54 57 37 19 US states, end-stage renal disease .93 .19 .55
Baena-Diez (2008) [16] 3,329 45.2 ~100 % 6.6 Spain, primary care .79 .82 .91
Corser (2008) [19] 525 63.6 84.4 43.8 Michigan, acute coronary syndrome .80 .63
Eze-Nliam (2012) [20] 117 56.4 74.2 33.8 Maryland, acute coronary syndrome .94 .73 .31 .37

n/a not available, MI myocardial infarction
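The diabetes summary quoted in the text (mean kappa 0.85, range 0.75–0.97) can be checked against Table 6. Because the flattened table does not mark which column each value belongs to, the sketch below assumes that the first kappa in each prior-study row is the diabetes value; this assumption reproduces the stated mean and range exactly. For comparison, it also computes a sample-size-weighted mean, which the paper does not report. The dictionary and function names are illustrative.

```python
# (sample size, diabetes kappa) for the 10 prior studies in Table 6.
# Assumption: the first kappa listed in each row is the diabetes value.
PRIOR_DIABETES = {
    "Midthjell 1992": (507, 0.97),
    "Haapanen 1997": (587, 0.75),
    "Martin 2000": (599, 0.81),
    "Sangha 2003": (170, 0.90),
    "Okura 2004": (2037, 0.76),
    "Skinner 2005": (402, 0.84),
    "Merkin 2007": (965, 0.93),
    "Baena-Diez 2008": (3329, 0.79),
    "Corser 2008": (525, 0.80),
    "Eze-Nliam 2012": (117, 0.94),
}


def summarize(studies: dict[str, tuple[int, float]]) -> dict[str, float]:
    """Unweighted mean, sample-size-weighted mean, and range of kappa."""
    sizes = [n for n, _ in studies.values()]
    kappas = [k for _, k in studies.values()]
    return {
        "unweighted_mean": sum(kappas) / len(kappas),
        "weighted_mean": sum(n * k for n, k in studies.values()) / sum(sizes),
        "min": min(kappas),
        "max": max(kappas),
    }


summary = summarize(PRIOR_DIABETES)
for key, value in summary.items():
    print(f"{key}: {value:.2f}")
```

Under that assumption the weighted mean is about 0.81, a little lower than the unweighted 0.85, largely because the biggest study (Baena-Diez, n = 3,329) reported a diabetes kappa of 0.79.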

The real test of whether data from different sources are sufficiently concordant, however, is the effect any misclassification has on risk estimates. Diabetes was a statistically significant risk factor for overall mortality regardless of the source of comorbidity data, but the HR was greater when self-reported data were used. In contrast, hypertension was significantly associated with overall mortality when KPMR data were used, whereas a smaller, nonsignificant increase in risk was found with self-reported data. HR estimates for hypertension, MI, and other heart diseases were all higher when KPMR data were used. Attenuation of HR estimates toward the null is the expected consequence of non-differential misclassification; however, it may also be true that women with false-negative comorbidity status have mortality risks exceeding those of true positives.
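The attenuation argument can be made concrete with a standard expected-value calculation. This is a sketch, not the analysis performed in this study: if a binary comorbidity is classified with sensitivity Se and specificity Sp independently of outcome (non-differential misclassification), the group labelled "yes" mixes true and false positives and the group labelled "no" mixes true and false negatives, which dilutes the observed contrast. For simplicity the sketch uses a risk ratio over a fixed follow-up period as a stand-in for the HR; the function name and all input values are hypothetical.

```python
def observed_risk_ratio(prevalence: float, risk_unexposed: float, true_rr: float,
                        sens: float, spec: float) -> float:
    """Expected risk ratio when a binary comorbidity is measured with the given
    sensitivity and specificity, independently of outcome (non-differential)."""
    if not 0 < prevalence < 1:
        raise ValueError("prevalence must lie strictly between 0 and 1")
    if not (0 < sens <= 1 and 0 < spec <= 1):
        raise ValueError("sensitivity and specificity must lie in (0, 1]")
    risk_exposed = risk_unexposed * true_rr
    # Population fractions in each cell of truth x classification.
    tp = prevalence * sens
    fp = (1 - prevalence) * (1 - spec)
    fn = prevalence * (1 - sens)
    tn = (1 - prevalence) * spec
    # Outcome risk among those classified as having / not having the comorbidity.
    risk_classified_yes = (tp * risk_exposed + fp * risk_unexposed) / (tp + fp)
    risk_classified_no = (fn * risk_exposed + tn * risk_unexposed) / (fn + tn)
    return risk_classified_yes / risk_classified_no


# Hypothetical scenario: 20 % prevalence, 10 % baseline risk, true RR = 2.
for se in (1.0, 0.9, 0.8, 0.6):
    rr = observed_risk_ratio(0.20, 0.10, 2.0, sens=se, spec=0.98)
    print(f"sensitivity={se:.1f}: observed RR = {rr:.2f}")
```

With perfect classification the function returns the true ratio of 2; as sensitivity falls the observed ratio moves toward 1 (about 1.82 at Se = 0.8, Sp = 0.98). The sketch deliberately does not model the alternative mentioned above, in which false negatives carry higher risk than true positives, because that is a form of differential error whose direction depends on details not given here.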

Although comorbidities could confound the relationships between many of our model covariates and survival, the source of comorbidity data had little effect on the HR estimates for the other covariates in our all-cause mortality models. This is not surprising, given that comorbidity was not a strong confounder in these models regardless of data source. In future studies where comorbidity is expected to be an important confounder, and especially when the comorbidity itself is the exposure of interest, special attention is warranted to obtain the best possible data.

The increasing availability of EMR systems that can address specific research questions in a timely and comprehensive manner may allow less reliance on questionnaire-based data in the future. Nevertheless, many studies still require specific or diverse populations that are not covered by a single medical record system. Comparing results from studies that rely on self-report with those based on medical records will therefore remain informative. Our results suggest that differences in the accuracy of self-reported comorbidities across demographic groups are unlikely to hinder comparisons of the association between comorbidity and survival.

Although these analyses concern the use of data for research purposes, the potential for errors affecting medical treatment must also be considered. The presence of a comorbidity may affect the course of treatment for the comorbidity itself or for breast cancer [12]; discordance between patient reports and the medical record may therefore indicate a need for better coordination of care among multiple providers and better communication with the patient.

This study’s strengths include its large sample size, the ability to compare self-reported comorbidities with the EMR, and the low percentage (<3 %) of missing comorbidity data among women asked about these conditions. We documented concordance for four common comorbidities and explored factors associated with discrepancies, including race/ethnicity, age, neighborhood SES, education, time since comorbidity diagnosis, and use of medication for the condition. In addition, to our knowledge, this is the first study to examine the effect of comorbidity data source on model fit or parameter estimates.
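AIC appears in the abbreviations list, which suggests it was the criterion used to compare model fit. Since AIC = 2k − 2 ln L, and two models that differ only in the source of the comorbidity indicators have the same number of parameters k, the comparison reduces to which model attains the higher (partial) log-likelihood. The sketch below uses made-up log-likelihoods and a hypothetical parameter count purely to show the arithmetic; it does not reproduce the study's models or results.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion, 2k - 2 ln L; lower values indicate better fit."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical partial log-likelihoods for two Cox models with identical
# covariates except for the source of the comorbidity indicators.
models = {
    "self-report": (-4210.6, 14),
    "medical record": (-4203.1, 14),
}
scores = {name: aic(ll, k) for name, (ll, k) in models.items()}
best = min(scores, key=scores.get)
for name, score in scores.items():
    print(f"{name}: AIC = {score:.1f}")
print(f"lower AIC: {best}")
```

Because k is equal, the AIC difference here is simply twice the log-likelihood difference (15.0 with these invented numbers); when the competing models have different numbers of parameters, the 2k penalty no longer cancels.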

Although our sample size was large, some variation in concordance by demographic and other characteristics may have been missed because of small subgroup sizes. Another limitation of this and similar studies is that concordance is undoubtedly affected by the precise wording of the comorbidity questions and the manner in which they are asked. Thus, our results may not generalize to studies that used different questionnaire wording. We were also unable to determine the extent to which borderline cases accounted for the discordances. Nevertheless, we have highlighted several issues that will apply when other researchers develop questionnaires and evaluate the results obtained with them.

In conclusion, an EMR, when available and accessible, is likely to be the best source of comorbidity data, whether comorbidity is the primary risk factor or a covariate in a study of breast cancer survival. Self-reported data can provide good results, especially when comorbidities are used as covariates in multivariable models. Potential effects of misclassification of comorbidity status should be considered when research results are interpreted, although large differences in concordance across demographic groups seem unlikely.

Acknowledgments

This work was supported by the California Breast Cancer Research Program (CBCRP) (Grants 16ZB-8001 to AHW, 16ZB-8002 to SLG, 16ZB-8003 to LB, 16ZB-8004 to MLK, 16ZB-8005 to KRM). The Asian American Breast Cancer Study was supported by CBCRP Grants 1RB-0287, 3 PB-0120, and 5 PB-0018. The San Francisco Bay Area Breast Cancer Study was supported by National Cancer Institute Grants R01 CA63446 and R01 CA77305; by the US Department of Defense (DOD) Grant DAMD17-96-1-6071; and by the CBCRP Grants 4JB-1106 and 7 PB-0068. The Women’s CARE Study was funded by the National Institute of Child Health and Human Development (NICHD), through a contract with USC (N01-HD-3-3175), and the California Teachers Study was funded by the California Breast Cancer Act of 1993; National Cancer Institute grants (R01 CA77398 and K05 CA136967 to LB); and the California Breast Cancer Research Fund (contract 97-10500). The Multiethnic Cohort Study was supported by National Cancer Institute Grants R01 CA54281, R37CA54281, and UM1 CA164973. The Life After Cancer Epidemiology Study is supported by National Cancer Institute Grant R01 CA129059. Clinical and tumor characteristics and mortality data were obtained from the California Cancer Registry. The collection of cancer incidence data used in this study was supported by the California Department of Public Health as part of the statewide cancer reporting program mandated by California Health and Safety Code Sect. 103885; the National Cancer Institute’s Surveillance, Epidemiology and End Results Program under Contract HHSN261201000140C awarded to the Cancer Prevention Institute of California, Contract HHSN261201000035C awarded to the University of Southern California, and Contract HHSN261201000034C awarded to the Public Health Institute; and the Centers for Disease Control and Prevention’s National Program of Cancer Registries, under Agreement U58DP003862-01 awarded to the California Department of Public Health. 
The ideas and opinions expressed herein are those of the author(s), and endorsement by the State of California, Department of Public Health, the National Cancer Institute, and the Centers for Disease Control and Prevention or their Contractors and Subcontractors is not intended nor should be inferred.

Abbreviations

AIC: Akaike information criterion
AJCC: American Joint Committee on Cancer
CBCSC: California Breast Cancer Survivorship Consortium
CCR: California Cancer Registry
CI: Confidence interval
EMR: Electronic medical record
ER: Estrogen receptor
HR: Hazard ratio
KPMR: Kaiser Permanente medical record
KPNC: Kaiser Permanente Northern California
LACE: Life After Cancer Epidemiology
MI: Myocardial infarction
PR: Progesterone receptor
SES: Socioeconomic status
SFBCS: San Francisco Bay Area Breast Cancer Study

Footnotes

Compliance with ethical standards

Conflict of interest The authors declare that they have no conflict of interest.

Ethical standard All procedures performed in studies involving human participants were in accordance with the ethical standards of the Institutional and/or National Research Committee and with the 1964 Declaration of Helsinki and its later amendments or comparable ethical standards.

References

1. Patnaik JL, Byers T, DiGuiseppi C, Denberg TD, Dabelea D. The influence of comorbidities on overall survival among older women diagnosed with breast cancer. J Natl Cancer Inst. 2011;103(14):1101–1111. doi: 10.1093/jnci/djr188.
2. Land LH, Dalton SO, Jorgensen TL, Ewertz M. Comorbidity and survival after early breast cancer. A review. Crit Rev Oncol Hematol. 2012;81(2):196–205. doi: 10.1016/j.critrevonc.2011.03.001.
3. Tammemagi CM, Nerenz D, Neslund-Dudas C, Feldkamp C, Nathanson D. Comorbidity and survival disparities among black and white patients with breast cancer. JAMA. 2005;294(14):1765–1772. doi: 10.1001/jama.294.14.1765.
4. Braithwaite D, Tammemagi CM, Moore DH, Ozanne EM, Hiatt RA, Belkora J, West DW, Satariano WA, Liebman M, Esserman L. Hypertension is an independent predictor of survival disparity between African-American and white breast cancer patients. Int J Cancer. 2009;124(5):1213–1219. doi: 10.1002/ijc.24054.
5. Du W, Simon MS. Racial disparities in treatment and survival of women with stage I–III breast cancer at a large academic medical center in metropolitan Detroit. Breast Cancer Res Treat. 2005;91(3):243–248. doi: 10.1007/s10549-005-0324-9.
6. Lipscombe LL, Goodwin PJ, Zinman B, McLaughlin JR, Hux JE. The impact of diabetes on survival following breast cancer. Breast Cancer Res Treat. 2008;109(2):389–395. doi: 10.1007/s10549-007-9654-0.
7. Louwman WJ, Janssen-Heijnen ML, Houterman S, Voogd AC, van der Sangen MJ, Nieuwenhuijzen GA, Coebergh JW. Less extensive treatment and inferior prognosis for breast cancer patient with comorbidity: a population-based study. Eur J Cancer. 2005;41(5):779–785. doi: 10.1016/j.ejca.2004.12.025.
8. Yancik R, Wesley MN, Ries LA, Havlik RJ, Edwards BK, Yates JW. Effect of age and comorbidity in postmenopausal breast cancer patients aged 55 years and older. JAMA. 2001;285(7):885–892. doi: 10.1001/jama.285.7.885.
9. Patterson RE, Flatt SW, Saquib N, Rock CL, Caan BJ, Parker BA, Laughlin GA, Erickson K, Thomson CA, Bardwell WA, Hajek RA, Pierce JP. Medical comorbidities predict mortality in women with a history of early stage breast cancer. Breast Cancer Res Treat. 2010;122(3):859–865. doi: 10.1007/s10549-010-0732-3.
10. Patnaik JL, Byers T, DiGuiseppi C, Dabelea D, Denberg TD. Cardiovascular disease competes with breast cancer as the leading cause of death for older females diagnosed with breast cancer: a retrospective cohort study. Breast Cancer Res. 2011;13(3):R64. doi: 10.1186/bcr2901.
11. Nechuta S, Lu W, Zheng Y, Cai H, Bao PP, Gu K, Zheng W, Shu XO. Comorbidities and breast cancer survival: a report from the Shanghai Breast Cancer Survival Study. Breast Cancer Res Treat. 2013;139(1):227–235. doi: 10.1007/s10549-013-2521-2.
12. Wu AH, Kurian AW, Kwan ML, John EM, Lu Y, Keegan TH, Gomez SL, Cheng I, Shariff-Marco S, Caan BJ, Lee VS, Sullivan-Halley J, Tseng CC, Bernstein L, Sposto R, Vigen C. Diabetes and other comorbidities in breast cancer survival by race/ethnicity: the California Breast Cancer Survivorship Consortium (CBCSC). Cancer Epidemiol Biomarkers Prev. 2015;24(2):361–368. doi: 10.1158/1055-9965.EPI-14-1140.
13. Sangha O, Stucki G, Liang MH, Fossel AH, Katz JN. The self-administered comorbidity questionnaire: a new method to assess comorbidity for clinical and health services research. Arthritis Care Res. 2003;49(2):156–163. doi: 10.1002/art.10993.
14. Martin LM, Leff M, Calonge N, Garrett C, Nelson DE. Validation of self-reported chronic conditions and health services in a managed care population. Am J Prev Med. 2000;18(3):215–218. doi: 10.1016/s0749-3797(99)00158-0.
15. Okura Y, Urban LH, Mahoney DW, Jacobsen SJ, Rodeheffer RJ. Agreement between self-report questionnaires and medical record data was substantial for diabetes, hypertension, myocardial infarction and stroke but not for heart failure. J Clin Epidemiol. 2004;57(10):1096–1103. doi: 10.1016/j.jclinepi.2004.04.005.
16. Baena-Díez JM, Alzamora-Sas MT, Grau M, Subirana I, Vila J, Torán P, García-Navarro Y, Bermúdez-Chillida N, Alegre-Basagaña J, Viozquez-Meia M. Validity of the MONICA cardiovascular questionnaire compared with clinical records. Gac Sanit. 2009;23(6):519–525. doi: 10.1016/j.gaceta.2009.01.009.
17. Merkin SS, Cavanaugh K, Longenecker JC, Fink NE, Levey AS, Powe NR. Agreement of self-reported comorbid conditions with medical and physician reports varied by disease among end-stage renal disease patients. J Clin Epidemiol. 2007;60(6):634–642. doi: 10.1016/j.jclinepi.2006.09.003.
18. Haapanen N, Miilunpalo S, Pasanen M, Oja P, Vuori I. Agreement between questionnaire data and medical records of chronic diseases in middle-aged and elderly Finnish men and women. Am J Epidemiol. 1997;145(8):762–769. doi: 10.1093/aje/145.8.762.
19. Corser W, Sikorskii A, Olomu A, Stommel M, Proden C, Holmes-Rovner M. Concordance between comorbidity data from patient self-report interviews and medical record documentation. BMC Health Serv Res. 2008;8(1):85. doi: 10.1186/1472-6963-8-85.
20. Eze-Nliam C, Cain K, Bond K, Forlenza K, Jankowski R, Magyar-Russell G, Yenokyan G, Ziegelstein RC. Discrepancies between the medical record and the reports of patients with acute coronary syndrome regarding important aspects of the medical history. BMC Health Serv Res. 2012;12(1):78. doi: 10.1186/1472-6963-12-78.
21. Midthjell K, Holmen J, Bjorndal A, Lund-Larsen G. Is questionnaire information valid in the study of a chronic disease such as diabetes? The Nord-Trondelag diabetes study. J Epidemiol Commun Health. 1992;46(5):537–542. doi: 10.1136/jech.46.5.537.
22. Skinner KM, Miller DR, Lincoln E, Lee A, Kazis LE. Concordance between respondent self-reports and medical records for chronic conditions: experience from the Veterans Health Study. J Ambulat Care Manag. 2005;28(2):102–110. doi: 10.1097/00004479-200504000-00002.
23. Walker AM, Lanes SF. Misclassification of covariates. Stat Med. 1991;10(8):1181–1196. doi: 10.1002/sim.4780100803.
24. Kwan ML, John EM, Caan BJ, Lee VS, Bernstein L, Cheng I, Gomez SL, Henderson BE, Keegan TH, Kurian AW, Lu Y, Monroe KR, Roh JM, Shariff-Marco S, Sposto R, Vigen C, Wu AH. Obesity and mortality after breast cancer by race/ethnicity: the California Breast Cancer Survivorship Consortium. Am J Epidemiol. 2014;179(1):95–111. doi: 10.1093/aje/kwt233.
25. Lu Y, John EM, Sullivan-Halley J, Vigen C, Gomez SL, Kwan ML, Caan BJ, Lee VS, Roh JM, Shariff-Marco S, Keegan TH, Kurian AW, Monroe KR, Cheng I, Sposto R, Wu AH, Bernstein L. History of recreational physical activity and survival after breast cancer: the California Breast Cancer Survivorship Consortium. Am J Epidemiol. 2015;181(12):944–955. doi: 10.1093/aje/kwu466.
26. Wu AH, Gomez SL, Vigen C, Kwan ML, Keegan TH, Lu Y, Shariff-Marco S, Monroe KR, Kurian AW, Cheng I, Caan BJ, Lee VS, Roh JM, Sullivan-Halley J, Henderson BE, Bernstein L, John EM, Sposto R. The California Breast Cancer Survivorship Consortium (CBCSC): prognostic factors associated with racial/ethnic differences in breast cancer survival. Cancer Causes Control. 2013;24(10):1821–1836. doi: 10.1007/s10552-013-0260-7.
27. John EM, Phipps AI, Davis A, Koo J. Migration history, acculturation, and breast cancer risk in Hispanic women. Cancer Epidemiol Biomark Prev. 2005;14(12):2905–2913. doi: 10.1158/1055-9965.EPI-05-0483.
28. Caan B, Sternfeld B, Gunderson E, Coates A, Quesenberry C, Slattery ML. Life After Cancer Epidemiology (LACE) Study: a cohort of early stage breast cancer survivors (United States). Cancer Causes Control. 2005;16(5):545–556. doi: 10.1007/s10552-004-8340-3.
29. Ross M, Tyler R, Ng M, Jeffrey S, Mark C, Hart M, John F. The HMO research network virtual data warehouse: a public data model to support collaboration. eGEMs. 2014;2(1):2. doi: 10.13063/2327-9214.1049.
30. Viera AJ, Garrett JM. Understanding interobserver agreement: the kappa statistic. Fam Med. 2005;37(5):360–363.
31. Yost K, Perkins C, Cohen R, Morris C, Wright W. Socioeconomic status and breast cancer incidence in California for different race/ethnic groups. Cancer Causes Control. 2001;12(8):703–711. doi: 10.1023/a:1011240019516.
