Abstract
Background:
We sought to determine whether body mass index (BMI) can be accurately identified in epidemiologic studies using claims databases.
Materials and methods:
Using the Mass General Brigham (MGB) Research Patient Data Repository-Medicare linked database, we identified a cohort of patients with a BMI measurement from January 1 to June 30, 2014 or January 1 to June 30, 2016, to capture both the ICD-9 and ICD-10 eras. Patients were divided into two groups, with or without an obesity-related ICD code in the 6 months before or after the BMI measurement date. We created two binary measures, the first for composite overweight, obesity, or severe obesity (BMI≥25 kg/m2), and the second for obesity or severe obesity (BMI≥30 kg/m2). We calculated accuracy measures [sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV)] for each obesity category for the overall cohort, and stratified by type 2 diabetes (T2D) and ICD-code era.
Results:
The cohort included 73,644 patients with a BMI measurement in 2014 or 2016, of whom 16,280 had an obesity-related ICD-code. The specificity of obesity-related ICD-codes (ICD-9 and ICD-10) was 99.7% for underweight/normal weight, 97.4% for overweight, 99.7% for obese and 98.9% for severe obese. For binary categories capturing BMI≥25 kg/m2 and BMI≥30 kg/m2, specificity was 98.2% and 97.0%, and PPV was 97.3% and 86.9%, respectively. Sensitivity was low overall (<40%). Patients with T2D and those coded in the ICD-10 era had higher sensitivity, PPV and NPV.
Conclusion:
Obesity-related ICD-codes can accurately identify patients with obesity in epidemiologic studies using claims databases.
INTRODUCTION
Given the high prevalence of obesity worldwide, and the established association with several outcomes, including type 2 diabetes, cardiovascular disease and mortality,1,2 the correct identification of obese patients in non-randomized studies is critical to establish the study population or for confounding adjustment. In real-world evidence (RWE) research, some studies rely on direct measurement of obesity from electronic health record (EHR) databases, whereas insurance claims-based studies must rely on ICD-9 or ICD-10 codes to identify it. However, obesity as measured by ICD codes tends to be greatly underreported,3 and the accuracy of ICD-defined body mass index (BMI) categories remains unclear.
Previous studies have explored the validity of claims-based codes for obesity, broadly defined as BMI ≥ 30 kg/m2, showing sensitivity ranging from 2.7 to 7.8%, specificity from 97 to 99%, positive predictive value (PPV) from 65 to 92% and negative predictive value (NPV) from 74 to 99%.3–5 These accuracy measures appeared to vary depending on the population examined, with better performance in patients undergoing surgery, particularly bariatric surgery,6 and in patients with diabetes.5 However, no studies have examined the accuracy of finer strata of BMI as defined by ICD-9 or ICD-10 codes.
The accurate characterization of different levels of obesity in non-randomized research is critical to correctly identify target populations for the assessment of the use and the effects of interventions for obesity, to investigate effect measure modification by obesity, and to reduce potential residual confounding by unmeasured BMI. Therefore, our objective was to establish the measurement characteristics of obesity-related ICD-9 and ICD-10 codes in Medicare claims data to determine whether BMI categories can be accurately identified and studied in healthcare utilization databases.
MATERIALS AND METHODS
Data sources
We leveraged data from the Mass General Brigham (MGB) Research Patient Data Repository (RPDR)7 linked to Medicare fee-for-service (FFS) Parts A, B, and D. The RPDR contains longitudinal EHR data for all patients who receive care at an MGB academic medical center or its affiliated provider network in the Boston metropolitan area. It contains information on BMI, blood pressure, smoking status, and laboratory and radiology test results. The EHR data of about 550,000 Medicare beneficiaries were linked with Medicare claims via beneficiary number, date of birth, and sex, with a success rate of 99.2%.8 Medicare is a U.S. federal health insurance program providing medical and prescription drug coverage to individuals aged 65 years and older and to younger individuals with disabilities. The Medicare program currently covers approximately 50 million Americans. The Medicare FFS claims database includes longitudinal, individual-level data on healthcare utilization, inpatient and outpatient diagnoses, diagnostic tests and procedures, and filled pharmacy prescriptions. These data are widely used to study real-world drug effectiveness and safety.8–10
Study population
Within the EHR-Medicare linked database, we identified a cohort of patients with an available BMI measurement from January 1, 2014 to June 30, 2014 or from January 1, 2016 to June 30, 2016 (2014 and 2016 were selected in order to compare the ICD-9 and ICD-10 coding eras) (Supplementary Figure 1). The cohort included patients with at least 1 year of continuous enrollment before and after the recorded BMI measure (index date) in the EHR data. Patients were excluded if they had an implausible measure of BMI (BMI<12 kg/m2 or BMI>70 kg/m2) or if they were pregnant in the 6 months prior to cohort entry. Among the remaining patients, we identified those with and without an obesity-related ICD code in Medicare in the 6 months prior to or after the index date. If more than one obesity-related ICD code was available, the one most proximal to the index date was selected.
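The selection of the most proximal obesity-related code around the index date can be illustrated with a minimal Python sketch. The function name, the approximation of "6 months" as 183 days, the tie-breaking rule (first claim listed wins), and the example claims are all assumptions made for illustration; they are not taken from the study's SAS programs.

```python
from datetime import date

WINDOW_DAYS = 183  # assumption: "6 months" approximated as 183 days


def most_proximal_code(index_date, coded_claims, window_days=WINDOW_DAYS):
    """Return the (claim_date, code) pair closest to index_date within the
    window, or None if no obesity-related code falls inside it."""
    in_window = [
        (claim_date, code)
        for claim_date, code in coded_claims
        if abs((claim_date - index_date).days) <= window_days
    ]
    if not in_window:
        return None
    # min() keeps the first claim in case of a tie (an arbitrary choice here)
    return min(in_window, key=lambda c: abs((c[0] - index_date).days))


# Hypothetical claims for one patient with a BMI measured on 2016-03-15
claims = [
    (date(2016, 1, 10), "Z68.30"),  # 65 days before the index date
    (date(2016, 4, 2), "E66.9"),    # 18 days after the index date
    (date(2015, 3, 1), "Z68.41"),   # more than 6 months before: ignored
]
selected = most_proximal_code(date(2016, 3, 15), claims)
```

In this hypothetical example the claim dated 18 days after the index date is retained, and a patient with no claim inside the window would be assigned to the group without an obesity-related ICD code.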
Obesity classification from EHR data
Obesity categories were determined based on WHO recommendations11 for obesity classification and included the following: underweight or normal weight: BMI < 25 kg/m2; overweight: BMI 25–29.9 kg/m2; obese: BMI 30–39.9 kg/m2; and severe obese: BMI ≥ 40 kg/m2. We also created broad categories for overweight, obesity, or severe obesity, i.e., BMI ≥ 25 kg/m2, and for obesity or severe obesity, i.e., BMI ≥ 30 kg/m2.
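As an illustration, the EHR-based classification above can be written as a small Python function. This is a sketch only: the function names are hypothetical, half-open intervals are assumed so that values such as 29.95 kg/m2 are not left unclassified, and BMI values outside the plausible range used for cohort entry (12 to 70 kg/m2) return no category.

```python
def bmi_category(bmi):
    """Map a BMI value (kg/m2) to one of the four study categories."""
    if bmi < 12 or bmi > 70:
        return None  # implausible value, excluded from the cohort
    if bmi < 25:
        return "underweight or normal weight"
    if bmi < 30:
        return "overweight"
    if bmi < 40:
        return "obese"
    return "severe obese"


def binary_flags(bmi):
    """Return the two broad binary measures: (BMI >= 25, BMI >= 30)."""
    return bmi >= 25, bmi >= 30


example = [bmi_category(b) for b in (22.4, 27.5, 33.2, 41.0)]
```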
Obesity classification from claims data
In both ICD-9 and ICD-10 classification systems, obesity-related codes can be found under the section “Other Metabolic Disorders and Immunity Disorders” (ICD-9: 278.xx; ICD-10: E66.xx), or “Factors influencing health status and contact with health services” (ICD-9: V85.xx; ICD-10: Z68.xx). These codes were divided into the following categories: underweight or normal weight; overweight; obese; severe obese; obese or severe obese; overweight, obese, or severe obese; and unspecified obesity. See Table 1 for classifications of obesity-related ICD-9 and ICD-10 codes.
Table 1.
Category | ICD-9 | ICD-10 |
---|---|---|
Underweight or normal weight | V85.0, V85.1 | Z68.1, Z68.20, Z68.21, Z68.22, Z68.23, Z68.24 |
Overweight | V85.2, V85.21, V85.22, V85.23, V85.24, V85.25, 278.02 | Z68.25, Z68.26, Z68.27, Z68.28, Z68.29, E66.3 |
Obese | V85.3, V85.30, V85.31, V85.32, V85.33, V85.34, V85.35, V85.36, V85.37, V85.38, V85.39 | E66.0, Z68.30, Z68.31, Z68.32, Z68.33, Z68.34, Z68.35, Z68.36, Z68.37, Z68.38, Z68.39 |
Severe obese | 278.01, V85.4, V85.40, V85.41, V85.42, V85.43, V85.44, V85.45 | E66.01, E66.2, Z68.4, Z68.41, Z68.42, Z68.43, Z68.44, Z68.45 |
Obese or severe obese | V85.3, V85.30, V85.31, V85.32, V85.33, V85.34, V85.35, V85.36, V85.37, V85.38, V85.39, 278.01, V85.4, V85.40, V85.41, V85.42, V85.43, V85.44, V85.45, 278.0, 278.03, 278.00 | E66.0, Z68.30, Z68.31, Z68.32, Z68.33, Z68.34, Z68.35, Z68.36, Z68.37, Z68.38, Z68.39, E66.01, E66.2, Z68.4, Z68.41, Z68.42, Z68.43, Z68.44, Z68.45, E66.09, E66.9 |
Overweight, obese, or severe obese | V85.2, V85.21, V85.22, V85.23, V85.24, V85.25, 278.02, V85.3, V85.30, V85.31, V85.32, V85.33, V85.34, V85.35, V85.36, V85.37, V85.38, V85.39, 278.01, V85.4, V85.40, V85.41, V85.42, V85.43, V85.44, V85.45, 278.0, 278.03, 278.00 | Z68.25, Z68.26, Z68.27, Z68.28, Z68.29, E66.3, E66.0, Z68.30, Z68.31, Z68.32, Z68.33, Z68.34, Z68.35, Z68.36, Z68.37, Z68.38, Z68.39, E66.01, E66.2, Z68.4, Z68.41, Z68.42, Z68.43, Z68.44, Z68.45, E66.09, E66.9 |
Unspecified obesity | 278.0, 278.03, 278.00 | E66.09, E66.9 |
Non-obese | V85.0, V85.1, V85.2, V85.21, V85.22, V85.23, V85.24, V85.25, 278.02 | Z68.1, Z68.20, Z68.21, Z68.22, Z68.23, Z68.24, Z68.25, Z68.26, Z68.27, Z68.28, Z68.29, E66.3 |
Non-overweight + non-obese | V85.0, V85.1 | Z68.1, Z68.20, Z68.21, Z68.22, Z68.23, Z68.24 |
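The mapping in Table 1 can be expressed as a lookup table. The Python sketch below covers only the ICD-10 column and assumes that codes are stored with the decimal point and are matched exactly; the dictionary and function names are hypothetical, and the ICD-9 column could be handled in the same way.

```python
# ICD-10 codes per category, transcribed from Table 1
ICD10_CATEGORIES = {
    "underweight or normal weight": ["Z68.1"] + [f"Z68.{n}" for n in range(20, 25)],
    "overweight": [f"Z68.{n}" for n in range(25, 30)] + ["E66.3"],
    "obese": ["E66.0"] + [f"Z68.{n}" for n in range(30, 40)],
    "severe obese": ["E66.01", "E66.2", "Z68.4"] + [f"Z68.{n}" for n in range(41, 46)],
    "unspecified obesity": ["E66.09", "E66.9"],
}

# Invert to a code -> category lookup
CODE_TO_CATEGORY = {
    code: category
    for category, codes in ICD10_CATEGORIES.items()
    for code in codes
}


def classify_icd10(code):
    """Return the Table 1 category for an ICD-10 code, or None if not listed."""
    return CODE_TO_CATEGORY.get(code.strip().upper())


def is_obese_or_severe(code):
    """Broad category 'obese or severe obese', including unspecified obesity."""
    return classify_icd10(code) in {"obese", "severe obese", "unspecified obesity"}


def is_overweight_or_above(code):
    """Broad category 'overweight, obese, or severe obese'."""
    return classify_icd10(code) in {
        "overweight", "obese", "severe obese", "unspecified obesity"
    }
```

Unspecified obesity codes contribute only to the two broad categories, mirroring the corresponding rows of Table 1.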
Statistical analysis
Using descriptive statistics, we explored characteristics of the study population, and compared patients with and without obesity-related ICD-codes, overall and within obesity categories (underweight or normal weight: BMI < 25 kg/m2; overweight: BMI 25–29.9 kg/m2; obese: BMI 30–39.9 kg/m2; severe obese: BMI ≥ 40 kg/m2; overweight, obese, or severe obese: BMI ≥ 25 kg/m2; and obese or severe obese: BMI ≥ 30 kg/m2), using standardized differences.12 Patient characteristics were also stratified by ICD code era and age (<65 vs. ≥65). We assessed the prevalence of each obesity category in the cohort, and among those with and without an obesity-related ICD code. We calculated sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) for each obesity category. In addition, we calculated accuracy measures for various subgroups: by presence of a diagnosis of type 2 diabetes, by index date year (2014 vs. 2016), and by age (<65 vs. ≥65). Analyses were conducted using SAS and Microsoft Excel.
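For a given obesity category, the four accuracy measures follow from a 2×2 table that cross-classifies the ICD-based category against the EHR-based BMI category (the reference standard). The Python sketch below uses the standard definitions; the function name and the counts are hypothetical and serve only to show the arithmetic.

```python
def accuracy_measures(tp, fp, fn, tn):
    """Accuracy of an ICD-based category against the EHR BMI category.

    tp: ICD code for the category present, BMI in the category
    fp: ICD code for the category present, BMI outside the category
    fn: no ICD code for the category, BMI in the category
    tn: no ICD code for the category, BMI outside the category
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Hypothetical counts, not taken from the study
m = accuracy_measures(tp=40, fp=10, fn=60, tn=890)
```

With these illustrative counts, sensitivity is 40%, specificity about 98.9%, PPV 80%, and NPV about 93.7%.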
RESULTS
We identified a cohort of 282,264 patients with a BMI measure between January 1, 2014 and June 30, 2014 or between January 1, 2016 and June 30, 2016 (Figure 1). A total of 73,644 had sufficient enrollment, a BMI measure within the plausible range, and no pregnancy diagnoses in the 6 months prior to cohort entry date. Overall, patients were on average 71.1 years of age, 41% were male, and 87% were white. The mean BMI was 28.4 kg/m2 (Table 2).
Table 2.
Characteristics | Total | Patients with ICD code | Patients without ICD code | Stand. Diff. |
---|---|---|---|---|
Total | 73,644 | 16,280 | 57,364 | |
Days before index date, mean (SD) | 45.9 (53.7) | 45.9 (53.7) | - | |
Days after index date, mean (SD) | 71.5 (51.9) | 71.5 (51.9) | - | |
Index year: 2014, n(%) | 36,518 (49.6) | 5,920 (36.4) | 30,598 (53.3) | 0.34 |
Index year: 2016, n(%) | 37,126 (50.4) | 10,360 (63.6) | 26,766 (46.7) | −0.34 |
Demographics | ||||
Age, mean years (SD) | 71.1 (11.1) | 67.5 (12.0) | 72.1 (10.7) | 0.40 |
Male, n(%) | 30,275 (41.1) | 6,700 (41.2) | 23,575 (41.1) | 0.00 |
White, n(%) | 64,099 (87.0) | 13,663 (83.9) | 50,436 (87.9) | 0.12 |
Non-white, n(%) | 9,545 (13.0) | 2,617 (16.1) | 6,928 (12.1) | −0.12 |
Comorbidities | ||||
BMI, median (IQR) | 27.5 (24.2 to 31.5) | 32.6 (28.5 to 37.3) | 26.6 (23.6 to 29.8) | −0.10 |
BMI, mean (SD) | 28.4 (6.1) | 33.2 (7.4) | 27.0 (4.9) | −0.99 |
Smoking, n(%) | 9,205 (12.5) | 3,237 (19.9) | 5,968 (10.4) | −0.27 |
Drug abuse, n(%) | 1,858 (2.5) | 735 (4.5) | 1,123 (2.0) | −0.14 |
Alcohol abuse, n(%) | 1,889 (2.6) | 684 (4.2) | 1,205 (2.1) | −0.12 |
Hypertension, n(%) | 42,345 (57.5) | 11,218 (68.9) | 31,127 (54.3) | −0.30 |
Hyperlipidemia, n(%) | 34,994 (47.5) | 8,994 (55.2) | 26,000 (45.3) | 0.20 |
Type 2 Diabetes, n(%) | 9,282 (12.6) | 3,999 (24.6) | 5,283 (9.2) | −0.42 |
Cardiovascular disease, n(%) | 17,450 (23.7) | 4,980 (30.6) | 12,470 (21.7) | −0.20 |
Heart failure, n (%) | 6,373 (8.7) | 2,270 (13.9) | 4,103 (7.2) | −0.22 |
Hypertension, n (%) | 42,345 (57.5) | 11,218 (68.9) | 31,127 (54.3) | −0.30 |
Ischemic heart disease, n (%) | 12,768 (17.3) | 3,565 (21.9) | 9,203 (16.0) | −0.15 |
Stroke, n (%) | 4,378 (5.9) | 1,107 (6.8) | 3,271 (5.7) | −0.05 |
TIA, n (%) | 1,377 (1.9) | 355 (2.2) | 1,022 (1.8) | −0.03 |
PVD, n (%) | 4,233 (5.7) | 1,226 (7.5) | 3,007 (5.2) | −0.09 |
Other cerebrovascular disease, n (%) | 1,271 (1.7) | 331 (2.0) | 940 (1.6) | −0.03 |
COPD, n(%) | 6,659 (9.0) | 2,063 (12.7) | 4,596 (8.0) | −0.15 |
Asthma, n(%) | 6,155 (8.4) | 2,250 (13.8) | 3,905 (6.8) | −0.23 |
Sleep apnea, n (%) | 4,776 (6.5) | 2,492 (15.3) | 2,284 (4.0) | −0.39 |
GERD, n (%) | 12,461 (16.9) | 3,789 (23.3) | 8,672 (15.1) | −0.21 |
Osteoarthritis, n (%) | 15,293 (20.8) | 4,308 (26.5) | 10,985 (19.1) | −0.18 |
NAFLD or NASH, n (%) | 1,293 (1.8) | 633 (3.9) | 660 (1.2) | −0.17 |
Bariatric surgery, n(%) | 135 (0.2) | 88 (0.5) | 47 (0.1) | −0.07 |
Malignant cancer, n(%) | 18,010 (24.5) | 3,746 (23.0) | 14,264 (24.9) | 0.04 |
Depression, n (%) | 12,305 (16.7) | 3,937 (24.2) | 8,368 (14.6) | −0.24 |
Combined comorbidity score, mean (SD) | 1.7 (2.6) | 2.3 (3.0) | 1.5 (2.4) | −0.29 |
Frailty, n(%) | ||||
<0.1 | 6,871 (9.3) | 853 (5.2) | 6,018 (10.5) | 0.20 |
0.1-<0.2 | 55,426 (75.3) | 11,562 (71.0) | 43,864 (76.5) | 0.13 |
>=0.2 | 11,347 (15.4) | 3,865 (23.7) | 7,482 (13.0) | −0.28 |
Medications | ||||
Antihypertensives, n(%) | 34,744 (47.2) | 8,592 (52.8) | 26,152 (45.6) | 0.14 |
Beta blocker, n(%) | 27,752 (37.7) | 6,808 (41.8) | 20,944 (36.5) | −0.11 |
CCB, n(%) | 16,787 (22.8) | 4,185 (25.7) | 12,602 (22.0) | −0.09 |
Diuretics, n(%) | 11,063 (15.0) | 3,681 (22.6) | 7,382 (12.9) | 0.26 |
Nitrates, n(%) | 3,374 (4.6) | 1,049 (6.4) | 2,325 (4.1) | −0.10 |
Other hypertension drugs, n(%) | 5,200 (7.1) | 1,597 (9.8) | 3,603 (6.3) | −0.13 |
Statins, n(%) | 37,159 (50.5) | 8,999 (55.3) | 28,160 (49.1) | −0.12 |
Non-insulin Antidiabetic drugs, n(%) | 9,602 (13.0) | 3,558 (21.9) | 6,044 (10.5) | −0.31 |
Insulin, n (%) | 2,828 (3.8) | 1,282 (7.9) | 1,546 (2.7) | −0.23 |
Antiobesity medications, n(%) | 121 (0.2) | 46 (0.3) | 75 (0.1) | −0.04 |
Anticoagulants, n(%) | 9,154 (12.4) | 2,571 (15.8) | 6,583 (11.5) | −0.13 |
Antiplatelets, n(%) | 3,573 (4.9) | 963 (5.9) | 2,610 (4.6) | −0.06 |
COPD medications, n(%) | 16,121 (21.9) | 4,527 (27.8) | 11,594 (20.2) | −0.18 |
PPI or H2 blocker, n (%) | 22,851 (31.0) | 6,450 (39.6) | 16,401 (28.6) | 0.23 |
Opioids, n (%) | 14,174 (19.2) | 4,325 (26.6) | 9,849 (17.2) | −0.23 |
NSAIDS, n (%) | 10,280 (14.0) | 2,971 (18.2) | 7,309 (12.7) | −0.15 |
Antidepressant, n(%) | 22,032 (29.9) | 6,253 (38.4) | 15,779 (27.5) | −0.23 |
Healthcare utilization | ||||
Hospitalizations, mean (SD) | 0.2 (0.6) | 0.4 (0.8) | 0.2 (0.5) | −0.30 |
ED visits, mean (SD) | 0.5 (1.1) | 0.7 (1.5) | 0.4 (0.9) | −0.24 |
Outpatient visits, mean (SD) | 5.5 (4.8) | 6.7 (5.2) | 5.2 (4.6) | −0.31 |
Number of drugs, mean (SD) | 7.4 (4.8) | 9.3 (5.5) | 6.8 (4.4) | −0.50 |
Abbreviations: standard deviation (SD), primary care physician (PCP), standardized difference (Stand. Diff.), body mass index (BMI), International classification of diseases (ICD), chronic obstructive pulmonary disease (COPD), non-alcoholic fatty-liver disease (NAFLD), nonalcoholic steatohepatitis (NASH), gastroesophageal reflux disease (GERD), calcium-channel blockers (CCB), nonsteroidal anti-inflammatory drugs (NSAIDS), proton-pump inhibitors (PPI), emergency department (ED)
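The standardized differences in Table 2 can be approximated from the reported summary statistics using the usual formulas for continuous and binary variables.12 The Python sketch below is illustrative: the function names are hypothetical, and the subtraction order (patients without a code minus patients with a code) is an assumption chosen to match the sign reported for the age and type 2 diabetes rows.

```python
from math import sqrt


def std_diff_continuous(mean_a, sd_a, mean_b, sd_b):
    """Standardized difference for a continuous variable (group a minus group b)."""
    return (mean_a - mean_b) / sqrt((sd_a ** 2 + sd_b ** 2) / 2)


def std_diff_binary(p_a, p_b):
    """Standardized difference for a binary variable, with proportions in [0, 1]."""
    return (p_a - p_b) / sqrt((p_a * (1 - p_a) + p_b * (1 - p_b)) / 2)


# Age: 72.1 (SD 10.7) without a code vs. 67.5 (SD 12.0) with a code
age = std_diff_continuous(72.1, 10.7, 67.5, 12.0)

# Type 2 diabetes: 9.2% without a code vs. 24.6% with a code
t2d = std_diff_binary(0.092, 0.246)
```

Rounded to two decimals, these give 0.40 for age and −0.42 for type 2 diabetes, in agreement with the corresponding rows of Table 2.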
Among these patients, we identified 16,280 (22.1%) with an obesity-related ICD code within a time window of 6 months before or after the BMI measure (Table 2). Overall, compared to patients without an obesity-related ICD code, patients with an obesity ICD code were younger (67.5 vs. 72.1 years), more likely to be of non-white race (16.1% vs. 12.1%), had a higher average BMI (33.2 vs. 27.0), and were more likely to be smokers (19.9% vs. 10.4%) and to have type 2 diabetes (24.6% vs. 9.2%) and cardiovascular disease (30.6% vs. 21.7%) at baseline. Patients with an obesity ICD code were also more likely to receive various medications compared to those without such ICD codes (mean number of drugs 9.3 vs. 6.8), including antihypertensives, glucose-lowering agents, anticoagulants, and antidepressants. Finally, a greater proportion of patients with an obesity ICD code had a high frailty score (≥ 0.2: 23.7% vs. 13.0%), compared to those without an ICD code. These differences remained after stratification by obesity category (Supplemental Table 1). Within each obesity category, the percentage of patients without an obesity-related ICD code was highest for the underweight/normal weight (91.7%) and overweight (86.6%) groups, moderate for the obese group (60.4%) and lowest for patients in the severe obese category (25.2%).
Table 3 summarizes the accuracy of obesity-related ICD codes within the entire cohort and among patients with type 2 diabetes. The underweight and normal weight, overweight, and obese categories had the lowest sensitivity (between 5.2% and 7.9%), which was slightly higher among patients with type 2 diabetes. Specificity was generally high (≥ 97.0% in the overall cohort and ≥ 91.8% among patients with type 2 diabetes). Severe obesity had higher sensitivity (30.9%) and NPV (96.7%) together with high specificity (98.9%), but had the lowest PPV (58.6%).
Table 3:
BMI category (BMI in kg/m2) | All patients: Sensitivity % | All patients: Specificity % | All patients: PPV % | All patients: NPV % | Patients with type 2 diabetes: Sensitivity % | Patients with type 2 diabetes: Specificity % | Patients with type 2 diabetes: PPV % | Patients with type 2 diabetes: NPV % |
---|---|---|---|---|---|---|---|---|
Underweight and normal weight: BMI <25 | 6.5 | 99.7 | 90.0 | 70.7 | 12.7 | 99.6 | 85.7 | 85.5 |
Overweight: BMI 25–29.9 | 7.9 | 97.4 | 63.5 | 64.9 | 13.6 | 96.4 | 63.8 | 70.6 |
Obese: BMI 30–39.9 | 5.2 | 99.7 | 86.3 | 72.7 | 8.5 | 99.1 | 87.0 | 72.6 |
Severe obese: BMI ≥ 40 | 30.9 | 98.9 | 58.6 | 96.7 | 39.8 | 97.2 | 64.2 | 92.8 |
Obese or severe obese: BMI ≥ 30 * | 40.4 | 97.0 | 86.9 | 76.8 | 57.7 | 91.8 | 88.4 | 66.7 |
Overweight, obese or severe obese: BMI ≥ 25 * | 27.9 | 98.2 | 97.3 | 37.6 | 47.5 | 94.6 | 97.9 | 25.9 |
Abbreviations: body mass index (BMI), International classification of diseases (ICD), positive predictive value (PPV), negative predictive value (NPV)
* Categories for obesity include ICD codes for unspecified obesity
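Unlike sensitivity and specificity, PPV and NPV depend on how common a category is in the cohort, which is one possible reason why a category can combine high specificity with a comparatively low PPV. The Python sketch below applies Bayes' rule to the sensitivity and specificity reported for severe obesity in the overall cohort; the two prevalence values are hypothetical and are not estimates from the study.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) implied by sensitivity, specificity and prevalence."""
    tp = sensitivity * prevalence
    fp = (1 - specificity) * (1 - prevalence)
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


# Sensitivity and specificity for severe obesity from Table 3 (all patients),
# evaluated at two hypothetical prevalences of the category
ppv_low, npv_low = predictive_values(0.309, 0.989, prevalence=0.05)
ppv_high, npv_high = predictive_values(0.309, 0.989, prevalence=0.30)
```

Under these assumptions, the same sensitivity and specificity yield a PPV of roughly 60% when the category is rare and above 90% when it is common, while the NPV moves in the opposite direction.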
When assessing the prevalence of BMI categories stratified by the presence of an ICD code for BMI or obesity, we observed that, while the distribution of BMI among patients without an ICD code resembled that of the total cohort, the distribution of BMI among patients with an ICD code was not representative of the distribution in the total cohort (Figure 2).
Overall, we observed higher sensitivity, PPV and NPV with ICD-10 codes compared to ICD-9 codes, while specificity remained stable (Supplemental Table 2). Similarly, in age-stratified analyses, sensitivity, PPV and NPV were generally higher among patients younger than 65 years, and specificity was similar for both age groups (Supplemental Table 3).
DISCUSSION
Using Medicare claims linked to EHR data, we found that weight-related ICD codes are substantially underreported, particularly in non-obese patients. However, the specificity and PPV of available ICD codes for BMI or obesity were high, allowing accurate identification of patients within different BMI categories. This makes ICD-based weight categories useful for identifying a study population and for sub-group categorization in claims data studies.
Consistent with our results, previous validation studies have reported low sensitivity and generally higher specificity and PPV within various populations, with respect to broader definitions of obesity.3,5,6 A study conducted within a Medicare population linked to NHANES data found that obesity-related diagnosis codes had low sensitivity but high specificity, thus concluding that the identification of obesity is accurate within the Medicare database.5 However, the study only assessed a broad definition of obesity (i.e., BMI ≥ 30) and was limited to earlier years, from 1999 to 2004, and was therefore unable to compare accuracy between the ICD-9 and ICD-10 eras.
Our findings suggest that a higher proportion of inaccurate obesity-related diagnosis codes occurred in the ICD-9 era. In addition, we observed a higher sensitivity in the ICD-10 compared to the ICD-9 coding era. Consistent with our findings, Ammann et al. also observed a higher sensitivity in the ICD-10 era. This increase in sensitivity indicates a better capture of obesity following the shift to ICD-10 coding. In fact, ICD-10 coding is more prescriptive than ICD-9 coding, such that the physician is required to enter certain codes for billing purposes, which likely encourages physicians to enter codes for obesity. This shift may also be due to the rising prevalence of obesity in the population, resulting in physicians’ increased willingness to document obesity.3 In contrast, Ammann et al. did not observe a higher PPV associated with ICD-10 coding, likely due to differences in care approaches of inherently different populations (age 20 and over in Optum compared to 65 and over in Medicare).3 In a subsequent study, Ammann et al. reported that while the obesity-related codes remained greatly underreported, the PPVs were high among patients undergoing bariatric surgery, total knee arthroplasty, cardiac ablation and hernia repair.6
We showed that the accuracy of obesity-related ICD codes was higher with increasing BMI. Two other studies have reported similar findings. First, a Danish study observed a higher PPV for severe obesity versus obesity overall (90% vs. 87.6%).4 However, they also observed a high proportion of underreporting in patients with overweight and obesity diagnoses and did not report other accuracy measures such as sensitivity, specificity and NPV. Second, a Canadian study reported low sensitivity (7.75%), but high measures of specificity (99%), NPV (80.8%) and PPV (65.9%) for the diagnosis of obesity, which increased as BMI increased.13 In addition, they observed a higher PPV of an obesity diagnosis in women, and in patients with certain health conditions associated with obesity, including diabetes and hypertension.
Accurately identifying obesity in research is of high importance given the already high and increasing prevalence of obesity worldwide and the established association of obesity with several highly researched outcomes, including type 2 diabetes,14 hypertension,15 cardiovascular disease,16 venous thrombosis,17 sleep apnea, multiple cancer types,18 and mortality.19 This further highlights the importance of correctly capturing obesity for the identification of target populations with obesity or for confounding adjustment. However, direct measurements of BMI are not readily available in claims-based databases, which rely only on obesity-related ICD-9 and ICD-10 diagnosis codes that tend to be underreported. The significant underreporting of obesity-related ICD codes may be due to the requirement that these codes be accompanied by an associated and reportable diagnosis, instead of being entered as a standalone code.20 In this setting, the presence of conditions that require higher healthcare utilization may be associated with better reporting, in line with our findings of better reporting in patients with type 2 diabetes.21
There is consensus in the literature that the prevalence of obesity cannot be estimated using ICD codes for obesity due to the extensive underreporting, which is also supported by our findings.3,5 Therefore, studies that require a representative distribution of obesity, e.g., for obesity surveillance purposes, should not rely on ICD codes. One exception to note is when focusing on highly specific populations, such as patients who recently underwent bariatric surgery, who have been shown to have a higher likelihood of receiving BMI or obesity diagnosis codes.6
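The effect of underreporting on prevalence estimates can be illustrated with a short calculation. The proportion of patients carrying a code equals sensitivity × true prevalence plus (1 − specificity) × (1 − true prevalence). The Python sketch below uses the sensitivity and specificity reported in Table 3 for BMI ≥ 30 in the overall cohort; the true prevalence of 35% is a hypothetical value chosen for illustration.

```python
def coded_prevalence(true_prevalence, sensitivity, specificity):
    """Share of patients who would carry the code, given the true prevalence."""
    true_positives = sensitivity * true_prevalence
    false_positives = (1 - specificity) * (1 - true_prevalence)
    return true_positives + false_positives


# Table 3 values for BMI >= 30 (all patients); the true prevalence is hypothetical
apparent = coded_prevalence(0.35, sensitivity=0.404, specificity=0.970)
```

Under these assumptions, about 16% of patients would carry an obesity code although 35% have a BMI ≥ 30, so a code-based estimate would understate prevalence by more than half.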
Obesity-related ICD codes are accurate enough to be reliably used for characterizing patients as obese, as suggested by our findings and supported by other studies,3,5 allowing for the correct identification of target populations to assess the use and the effects of interventions for obesity and to investigate effect measure modification by obesity. They may also reduce potential residual confounding by BMI, but only within a cohort with a high prevalence of obesity-related ICD codes. Our study also shows that the accurate characterization of patients’ weight can be obtained not only as a binary obesity categorization but also at a finer level, including normal weight, overweight, obese and severe obese.
Our study has limitations. First, while the classification of ICD codes into categories of obesity was done by expert consensus, certain ICD codes represented obesity conditions that were not specific to precise BMI categories, which may result in some level of misclassification. Second, although we selected the obesity-related ICD code most proximal in time to the BMI measure, there may still be inconsistency between the BMI measure and the ICD code if a significant change in weight occurred in that time period. However, we observed an average of 46 to 71 days between the index date and the occurrence of the ICD code, which is a short period of time, unlikely to be sufficient for meaningful changes in weight to happen. Third, the calculation of specificity and NPV is based on the assumption that patients without an ICD code represent individuals who would not have qualified to receive that code, and that patients with a BMI measure represent all individuals who would have received an obesity-related code. Given these assumptions, our accuracy measures of NPV and specificity may be overestimated; however, this is less likely in categories of more severe obesity and in the cohort of patients with type 2 diabetes. Fourth, many patients were excluded due to insufficient enrollment. This is likely because we identified study participants based on a diagnosis code, which may have resulted in a slightly larger loss of patients compared to studies identifying participants based on a filled medication prescription, because patients who have prescription drug coverage in Medicare may be more likely to be continuously enrolled. However, we do not believe that this exclusion criterion would undermine the generalizability of our findings, as we do not expect the use of obesity-related codes to be associated with continuous enrollment. Fifth, these results are only generalizable to patients who are eligible for Medicare coverage, which generally includes individuals aged 65 years and older.
CONCLUSION
The specificity of obesity-related ICD codes was high (>97%) for all categories of obesity, indicating that patients who received an obesity-related ICD code were accurately identified in these categories. Claims-based ICD codes could be used to correctly identify target populations of patients with obesity for the evaluation of healthcare interventions, and to investigate effect measure modification by obesity. They could potentially reduce residual confounding by BMI, though not in a study population with significant levels of missingness in obesity-related codes. In contrast, given their low sensitivity, claims-based ICD codes should not be used in studies aiming to assess the prevalence or distribution of obesity at a general population level.
Supplementary Material
ACKNOWLEDGMENTS
Funding:
This study was funded by the Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA. Dr. Patorno was supported by a career development grant (K08AG055670) from the National Institute on Aging. Dr. Kim is supported by the NIH (K24-AR078959).
Role of the Funder/Sponsor:
The funding sources had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.
Footnotes
Conflict of Interest Disclosures: Dr. Patorno is co-investigator of an investigator-initiated grant to the Brigham and Women’s Hospital from Boehringer-Ingelheim, not directly related to the topic of the submitted work. Dr. Schneeweiss is the principal investigator of investigator-initiated grants to the Brigham and Women’s Hospital from Boehringer Ingelheim unrelated to the topic of this study. He is a consultant to Aetion Inc, a software manufacturer of which he owns equity. His interests were declared, reviewed, and approved by the Brigham and Women’s Hospital and Mass General Brigham (MGB) in accordance with their institutional compliance policies. Dr. Kim received research grants to the BWH from Pfizer, AbbVie, Roche, and Bristol-Myers Squibb for unrelated studies. All other authors have no conflicts of interest to disclose.
REFERENCES
- 1. Pani LN, Nathan DM, Grant RW. Clinical predictors of disease progression and medication initiation in untreated patients with type 2 diabetes and A1C less than 7%. Diabetes Care 2008;31(3):386–390.
- 2. Khan SS, Ning H, Wilkins JT, et al. Association of Body Mass Index With Lifetime Risk of Cardiovascular Disease and Compression of Morbidity. JAMA Cardiol 2018;3(4):280–287.
- 3. Ammann EM, Kalsekar I, Yoo A, Johnston SS. Validation of body mass index (BMI)-related ICD-9-CM and ICD-10-CM administrative diagnosis codes recorded in US claims data. Pharmacoepidemiol Drug Saf 2018;27(10):1092–1100.
- 4. Gribsholt SB, Pedersen L, Richelsen B, Thomsen RW. Validity of ICD-10 diagnoses of overweight and obesity in Danish hospitals. Clin Epidemiol 2019;11:845–854.
- 5. Lloyd JT, Blackwell SA, Wei II, Howell BL, Shrank WH. Validity of a Claims-Based Diagnosis of Obesity Among Medicare Beneficiaries. Eval Health Prof 2015;38(4):508–517.
- 6. Ammann EM, Kalsekar I, Yoo A, et al. Assessment of obesity prevalence and validity of obesity diagnoses coded in claims data for selected surgical populations: A retrospective, observational study. Medicine (Baltimore) 2019;98(29):e16438.
- 7. Nalichowski R, Keogh D, Chueh HC, Murphy SN. Calculating the benefits of a Research Patient Data Repository. AMIA Annu Symp Proc 2006:1044.
- 8. Lin KJ, Singer DE, Glynn RJ, et al. Prediction Score for Anticoagulation Control Quality Among Older Adults. J Am Heart Assoc 2017;6(10).
- 9. Patorno E, Najafzadeh M, Pawar A, et al. The EMPagliflozin compaRative effectIveness and SafEty (EMPRISE) study programme: Design and exposure accrual for an evaluation of empagliflozin in routine clinical care. Endocrinol Diabetes Metab 2020;3(1):e00103.
- 10. Patorno E, Pawar A, Franklin JM, et al. Empagliflozin and the Risk of Heart Failure Hospitalization in Routine Clinical Care. Circulation 2019;139(25):2822–2830.
- 11. WHO. Obesity: Preventing and Managing the Global Epidemic. Geneva, Switzerland: WHO; 2000.
- 12. Austin PC. Balance diagnostics for comparing the distribution of baseline covariates between treatment groups in propensity-score matched samples. Stat Med 2009;28(25):3083–3107.
- 13. Martin BJ, Chen G, Graham M, Quan H. Coding of obesity in administrative hospital discharge abstract data: accuracy and impact for future research studies. BMC Health Serv Res 2014;14:70.
- 14. Knowler WC, Barrett-Connor E, Fowler SE, et al. Reduction in the incidence of type 2 diabetes with lifestyle intervention or metformin. N Engl J Med 2002;346(6):393–403.
- 15. Semlitsch T, Jeitler K, Berghold A, et al. Long-term effects of weight-reducing diets in people with hypertension. Cochrane Database Syst Rev 2016;3:CD008274.
- 16. Aune D, Sen A, Norat T, et al. Body Mass Index, Abdominal Fatness, and Heart Failure Incidence and Mortality: A Systematic Review and Dose-Response Meta-Analysis of Prospective Studies. Circulation 2016;133(7):639–649.
- 17. Ageno W, Becattini C, Brighton T, Selby R, Kamphuisen PW. Cardiovascular risk factors and venous thromboembolism: a meta-analysis. Circulation 2008;117(1):93–102.
- 18. Kyrgiou M, Kalliala I, Markozannes G, et al. Adiposity and cancer at major anatomical sites: umbrella review of the literature. BMJ 2017;356:j477.
- 19. Haslam DW, James WP. Obesity. Lancet 2005;366(9492):1197–1209.
- 20. US National Center for Health Statistics. ICD-10-CM Official Guidelines for Coding and Reporting FY 2021 (October 1, 2020 – September 30, 2021). Available at https://www.who.int/classifications/icd/factsheet/en/. Accessed April 27, 2021. www.cdc.gov/nchs/icd/icd10cm.htm.
- 21. Juarez DT, Tan C, Davis J, Mau M. Factors Affecting Sustained Medication Adherence and Its Impact on Health Care Utilization in Patients with Diabetes. J Pharm Health Serv Res 2013;4(2):89–94.