Abstract
Background:
Prior validation studies of claims-based definitions of chronic kidney disease (CKD) using ICD-9 codes reported overall low sensitivity, high specificity, and variable but reasonable PPV. No studies to date have evaluated the accuracy of ICD-10 codes to identify a US patient population with CKD.
Methods:
We assessed the accuracy of claims-based algorithms to identify adults with CKD Stages 3–5 compared with laboratory values in a subset (~40%) of a US commercial insurance claims database (Optum’s de-identified Clinformatics® Data Mart Database). We calculated the positive predictive value (PPV) of one or two ICD-9 (2012–2014) or ICD-10 (2016–2018) codes for CKD compared with a lab-based estimated glomerular filtration rate (eGFR) occurring within prespecified windows (±90 days, ±180 days, ±365 days) of the ICD-based CKD code(s).
Results:
The study population ranged between 104 774 and 161 305 patients (ICD-9 cohorts) and between 285 520 and 373 220 patients (ICD-10 cohorts). The mean age was 74.4 years (ICD-9) and 75.6 years (ICD-10) and the median eGFR was 48 ml/min/1.73 m2. The algorithm of two CKD codes compared with a lab value ±90 days of the first code achieved the highest PPV (PPV 86.36% [ICD-9] and 86.07% [ICD-10]). Overall, ICD-10 based codes had comparable PPVs to ICD-9 based codes and all ICD-10 based algorithms had PPVs >80%. The algorithm of one CKD code compared with laboratory value ±180 days maintained the PPV above 80% but still retained a large number of patients (PPV 80.32% [ICD-9] and 81.56% [ICD-10]).
Conclusion:
An ICD-10-based definition of CKD identified with sufficient accuracy a patient population with CKD Stages 3–5. Our findings suggest that claims databases could be used for future real-world research studies in patients with CKD Stages 3–5.
Keywords: chronic kidney disease, claims data, validation study
1 |. INTRODUCTION
Chronic kidney disease (CKD) is a major public health burden that affects 14.9% of the US adult population,1 and ~7% of US adults have moderate to severe CKD (Stages 3–5).2 From 2002 to 2016, the burden of CKD in the US increased faster than that of other non-communicable diseases, with diabetes and hypertension remaining the leading causes.3 With the aging of the US population, the incidence4 and prevalence of CKD are expected to continue to increase.5 CKD is associated with poorer quality of life,6,7 high morbidity,5,8 hospitalizations,9,10 cost,1 and mortality.3,8,11 In 2018, Medicare fee-for-service spending for beneficiaries with CKD exceeded $81 billion, representing over 22% of all Medicare spending.1
There has been growing interest in using real-world data and evidence to inform clinical care,12 especially in nephrology.13 Healthcare utilization databases can help answer important clinical questions on the comparative effectiveness of therapeutics14 in real-world patients who are underrepresented in clinical trials, such as patients with moderate to severe CKD, a complex patient population15 with a large comorbidity and medication burden.6,16 Studies using real-world data are also well suited to evaluating drug safety because they include patients with more diverse ages, comorbidities, and baseline risks for outcomes. This makes real-world studies a cost-effective strategy to generate post-marketing data, to understand the benefits and harms of medications, and to monitor for rare or unanticipated adverse events in understudied populations.17
Accurately identifying a patient population with moderate to severe CKD in claims data would therefore facilitate these types of studies. Several prior validation studies using claims data18–24 examined the use of International Classification of Diseases, Ninth Revision (ICD-9) codes for CKD and demonstrated overall low sensitivity, high specificity, and variable but reasonable positive predictive values (PPVs). However, these studies used different algorithms and study populations, which could have contributed to the variability in the PPVs. Moreover, none of the prior validation studies was conducted in the US after the introduction in 2015 of International Classification of Diseases, 10th Revision (ICD-10) codes, a coding system with more digits that was designed to be more specific and to reflect disease severity more accurately than the ICD-9 codes. The purpose of this validation study was therefore to assess the accuracy of identifying patients with CKD Stages 3–5 using ICD-9 or ICD-10 based diagnosis codes in a large, national commercial claims database in the US.
2 |. METHODS
2.1 |. Data source
The study used data from Optum’s de-identified Clinformatics® Data Mart Database from 2012–2014 (for ICD-9-based codes) and 2016–2018 (for ICD-10-based codes). This database contains demographic information, inpatient and outpatient diagnoses, procedures, provider visits, hospitalizations, pharmacy dispensing records, and health plan enrollment status for each enrolled patient. In addition to claims data, outpatient laboratory test results are available for a subset of beneficiaries (~40%) through linkage with national laboratory (lab) test provider chains. This work was approved by the Mass General Brigham Institutional Review Board.
2.2 |. CKD diagnosis algorithms
We used one or two ICD-9/10 codes to identify patients with CKD Stages 3–5 (see Table S1.1). For algorithms using two ICD codes, we required the second ICD code to occur between 7 and 180 days after the first code. Because the US transitioned from ICD-9 to ICD-10 in 2015, we defined two separate time periods for this validation study: 2012–2014 for the ICD-9 system and 2016–2018 for the ICD-10 system.
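The two-code rule can be illustrated with a minimal Python sketch. The function name, the inclusive treatment of the 7-day and 180-day bounds, and the choice of the earliest code as the "first code" are assumptions made for illustration; they are not taken from the study's actual programs.

```python
from datetime import date


def first_qualifying_pair(code_dates, min_gap_days=7, max_gap_days=180):
    """Return (first_code, second_code) dates if a second CKD code falls
    7-180 days after the first code, otherwise None.

    Assumptions (hypothetical, for illustration only): the "first code" is the
    earliest CKD code date supplied, and both bounds are inclusive.
    """
    dates = sorted(set(code_dates))
    if not dates:
        return None
    first = dates[0]
    for later in dates[1:]:
        gap = (later - first).days
        if min_gap_days <= gap <= max_gap_days:
            return first, later
    return None


# A code 4 days after the first does not qualify, but one 60 days after does.
pair = first_qualifying_pair([date(2016, 1, 1), date(2016, 1, 5), date(2016, 3, 1)])
print(pair)
```

A patient whose only repeat code falls outside the 7 to 180 day interval would satisfy the one-code algorithms but not the two-code algorithms.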
From the available lab data, we calculated the estimated glomerular filtration rate (eGFR) using a modified chronic kidney disease epidemiology collaboration (CKD-EPI) equation25 that does not include race (as below) as well as the modification of diet in renal disease (MDRD) study equation.26
Females:
Males:
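The sex-specific equation bodies are not reproduced in this text. As a rough sketch only, the code below assumes that the modified equation is the 2009 CKD-EPI creatinine equation with the race coefficient omitted, and that the MDRD equation is the four-variable, IDMS-traceable form. Both are assumptions about which published forms were used; the function names and the optional `black` argument are hypothetical.

```python
def egfr_ckd_epi_2009_no_race(scr_mg_dl, age_years, female):
    """2009 CKD-EPI creatinine equation with the race term dropped (assumed form).

    Returns eGFR in ml/min/1.73 m2.
    """
    kappa = 0.7 if female else 0.9
    alpha = -0.329 if female else -0.411
    ratio = scr_mg_dl / kappa
    egfr = (
        141.0
        * min(ratio, 1.0) ** alpha
        * max(ratio, 1.0) ** -1.209
        * 0.993 ** age_years
    )
    return egfr * 1.018 if female else egfr


def egfr_mdrd(scr_mg_dl, age_years, female, black=False):
    """Four-variable MDRD study equation for IDMS-traceable creatinine.

    Whether the race coefficient was applied in the study is not stated here,
    so `black` defaults to False as a placeholder assumption.
    """
    egfr = 175.0 * scr_mg_dl ** -1.154 * age_years ** -0.203
    if female:
        egfr *= 0.742
    if black:
        egfr *= 1.212
    return egfr


# Illustrative patient: 75-year-old woman with serum creatinine 1.28 mg/dl.
print(round(egfr_ckd_epi_2009_no_race(1.28, 75, female=True), 1))
print(round(egfr_mdrd(1.28, 75, female=True), 1))
```

Under these assumed equations, both estimates for the illustrative patient fall below 60 ml/min/1.73 m2, the range corresponding to CKD Stage 3 and beyond.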
2.3 |. Study cohort selection
Patients were required to have at least one ICD-9 or ICD-10 code for CKD Stages 3–5. To minimize misclassification and missing data, we required patients to be aged ≥18 years and continuously enrolled for at least 12 months before and after the first CKD diagnosis (2012–2014 [ICD-9 cohort] and 2016–2018 [ICD-10 cohort]). Patients could be included in both time periods. Patients with diagnosis codes for acute kidney injury (see Table S1.2) or who were missing demographic information necessary to calculate an eGFR value (age or gender) were excluded from the study (see Figure 1).
FIGURE 1.

Study cohort selection for the algorithm of one CKD claims diagnosis and lab value within 180 days of the claims-based diagnosis. The assessment period excluded the year 2015 due to the switch from ICD-9-based coding to ICD-10-based coding. *Diagnosis codes for CKD Stages 3, 4, and 5, and for acute kidney injury, can be found in Tables S1.1 and S1.2, respectively. **For patients with two serum creatinine laboratory values in the period before and after the claims-based CKD diagnosis that occurred within the same number of months from the diagnosis, if these laboratory values had an absolute difference greater than 1.0 mg/dl, then the patient was excluded from the analysis.
In addition, patients were required to have at least one serum creatinine lab value within pre-defined time windows around the time of the CKD diagnosis (±90 days, ±180 days, or ±365 days). For patients with ≥2 serum creatinine lab values in the pre-defined time window, the value closest to the CKD diagnosis was used in the analysis. Patients with two serum creatinine lab values that occurred within an equal number of months from the ICD code were excluded from the analysis if these values had an absolute difference greater than 1.0 mg/dl.
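The lab-selection rule described above can be sketched as follows. This is an illustration under stated assumptions rather than the study's code: distance from the diagnosis is measured in days instead of months, the function and parameter names are hypothetical, and the handling of equidistant values that differ by 1.0 mg/dl or less (keeping the earlier-dated value) is a placeholder, since the text does not specify it.

```python
from datetime import date


def select_creatinine(dx_date, labs, window_days=180, max_tie_diff=1.0):
    """Pick the serum creatinine value closest to the CKD diagnosis date.

    labs: list of (lab_date, scr_mg_dl) tuples.
    Returns the selected value, or None when no lab falls in the window or
    when equidistant values differ by more than max_tie_diff (patient excluded).
    Assumption: ties are judged on days from diagnosis, not months.
    """
    in_window = [
        (abs((lab_date - dx_date).days), lab_date, scr)
        for lab_date, scr in labs
        if abs((lab_date - dx_date).days) <= window_days
    ]
    if not in_window:
        return None
    in_window.sort(key=lambda item: (item[0], item[1]))
    closest_gap = in_window[0][0]
    tied_values = [scr for gap, _, scr in in_window if gap == closest_gap]
    if len(tied_values) >= 2 and max(tied_values) - min(tied_values) > max_tie_diff:
        return None  # discordant equidistant values: exclude the patient
    # Placeholder choice: among acceptable ties, keep the earlier-dated value.
    return in_window[0][2]


dx = date(2017, 6, 1)
print(select_creatinine(dx, [(date(2017, 5, 1), 1.3), (date(2017, 8, 30), 1.6)]))
```

Changing `window_days` to 90 or 365 corresponds to the other two lab windows evaluated in the study.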
2.4 |. Analysis to assess the accuracy of the CKD algorithms
We used the calculated eGFR as the gold standard to validate the claims-based CKD algorithms. We compared six combinations of ICD based codes for the diagnosis of CKD with calculated eGFRs within different time windows of the ICD based diagnosis: (1) one CKD diagnosis and lab value ±90 days of the diagnosis code; (2) two CKD diagnoses and lab value ±90 days of the first diagnosis code; (3) one CKD diagnosis and lab value ±180 days of the diagnosis code (primary algorithm); (4) two CKD diagnoses and lab value ±180 days of the first diagnosis code; (5) one CKD diagnosis and lab value ±365 days of the diagnosis code; and (6) two CKD diagnoses and lab value ±365 days of the first diagnosis code.
For each of the six algorithms, the PPV was calculated as the percentage of patients who met the eGFR based definition of CKD among the patients identified with CKD by the ICD-based definitions. We used an eGFR definition based on both the CKD-EPI equation (excluding race) and the MDRD equation. We calculated separate PPVs for the algorithms using ICD-9 codes (2012–2014) and ICD-10 codes (2016–2018). Patient characteristics for the ICD-9 and ICD-10 cohorts were assessed during the one-year baseline period prior to the first ICD diagnosis code. We also assessed the accuracy of the CKD diagnosis in subgroups by age (<65 years, 65–74 years, ≥75 years), gender (female, male), comorbid conditions of diabetes or hypertension, and nephrologist visit during the 1-year baseline period (yes/no) in both the ICD-9 (2012–2014) and ICD-10 (2016–2018) time periods. We calculated 95% confidence intervals (CI) for the PPVs by using the normal approximation of the binomial distribution.
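The PPV and its normal-approximation (Wald) confidence interval can be computed as in the short sketch below. The function names are hypothetical, and the eGFR threshold of 60 ml/min/1.73 m2 for CKD Stages 3–5 is an assumption based on the conventional staging cut-off rather than a value quoted from the methods.

```python
import math


def ppv_from_egfr(egfr_values, threshold=60.0):
    """Share of claims-identified patients whose lab-based eGFR confirms CKD.

    Assumption: eGFR below `threshold` (ml/min/1.73 m2) defines Stages 3-5.
    """
    confirmed = sum(1 for egfr in egfr_values if egfr < threshold)
    return confirmed / len(egfr_values)


def ppv_wald_ci(ppv, n, z=1.959964):
    """95% CI for a proportion using the normal approximation to the binomial."""
    se = math.sqrt(ppv * (1.0 - ppv) / n)
    return ppv - z * se, ppv + z * se


# Toy cohort of five claims-identified patients; four have eGFR < 60.
print(ppv_from_egfr([48.1, 37.9, 57.5, 72.0, 55.3]))

# Primary algorithm, ICD-9 cohort, CKD-EPI comparison: PPV 80.32%, N = 155 730.
lower, upper = ppv_wald_ci(0.8032, 155730)
print(round(100 * lower, 2), round(100 * upper, 2))
```

Applied to the reported PPV and sample size for the primary algorithm in the ICD-9 cohort, the interval rounds to 80.12 and 80.52, which agrees with the corresponding row of Table 2.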
3 |. RESULTS
3.1 |. Study cohort characteristics
After applying the exclusion criteria, the study population ranged between 104 774 and 161 305 patients for the validation assessment of CKD Stages 3–5 using ICD-9 codes and between 285 520 and 373 220 patients for the validation assessment using ICD-10 codes. Figure 1 illustrates the selection of the study cohort for the algorithm comparing one CKD claims-based code to eGFR calculated from lab value ±180 days of the claims diagnosis (primary algorithm). The baseline characteristics of the study cohort are listed in Table 1. The mean age was 74.4 years (ICD-9 cohort) and 75.6 years (ICD-10 cohort). Over half were female (55.6% [ICD-9 cohort] and 55.4% [ICD-10 cohort]) and almost half had diabetes mellitus (47.7% [ICD-9 cohort] and 49.8% [ICD-10 cohort]). The median eGFR calculated by the CKD-EPI equation (without race) and the MDRD equation remained relatively stable across the ICD-9 and ICD-10 cohorts at 48 ml/min/1.73 m2.
TABLE 1.
Baseline characteristics of study cohort, 2012–2014, and 2016–2018
| Characteristic | ICD-9 period (2012–2014) | ICD-10 period (2016–2018) |
|---|---|---|
| Number of patients | N = 155 730 | N = 386 331 |
| Demographics | | |
| Age (years), mean (SD) | 74.4 (9.5) | 75.6 (9.2) |
| Age 18–64 years, n (%) | 20 890 (13.4) | 38 607 (10.0) |
| Age 65–74 years, n (%) | 44 547 (28.6) | 122 628 (31.7) |
| Age ≥75 years, n (%) | 90 293 (58.0) | 225 096 (58.3) |
| Female, n (%) | 86 637 (55.6) | 213 943 (55.4) |
| Black, n (%) | 16 863 (10.8) | 46 444 (12.0) |
| Comorbidities | | |
| Diabetes mellitus, n (%) | 74 318 (47.7) | 192 500 (49.8) |
| Hypertension, n (%) | 140 621 (90.3) | 352 155 (91.2) |
| Heart failure, n (%) | 28 042 (18.0) | 75 124 (19.4) |
| CCI, mean (SD) | 3.74 (2.24) | 4.32 (2.56) |
| Medications | | |
| ACEI/ARBs, n (%) | 99 175 (63.7) | 236 022 (61.1) |
| Diuretics, n (%) | 59 086 (37.9) | 136 916 (35.4) |
| Other antihypertensives, n (%) | 101 535 (65.2) | 242 491 (62.8) |
| Insulin, n (%) | 19 737 (12.7) | 47 640 (12.3) |
| Non-insulin antidiabetic medications, n (%) | 40 477 (26.0) | 106 967 (27.7) |
| Statins, n (%) | 93 700 (60.2) | 235 532 (61.0) |
| NSAIDs, n (%) | 24 149 (15.5) | 58 163 (15.1) |
| Serum creatinine and eGFR calculations | | |
| No. of serum creatinine labs ordered, mean (SD) | 2.3 (2.0) | 2.7 (2.2) |
| Serum creatinine, mg/dl, median [IQR] | 1.28 [1.06, 1.56] | 1.27 [1.06, 1.54] |
| eGFR by CKD-EPI, ml/min/1.73 m2, median [IQR] | 48.12 [37.91, 57.54] | 48.30 [38.51, 57.13] |
| eGFR by MDRD, ml/min/1.73 m2, median [IQR] | 48.33 [38.55, 57.30] | 48.69 [39.38, 57.14] |
| Healthcare utilization | | |
| No. of office visits, mean (SD) | 11.7 (8.9) | 11.2 (8.1) |
| No. of hospitalizations, mean (SD) | 0.2 (0.6) | 0.2 (0.6) |
| Any hospitalization, n (%) | 21 427 (13.8) | 53 544 (13.9) |
Abbreviations: ACEI, angiotensin-converting-enzyme inhibitor; ARBs, angiotensin II receptor blockers; CCI, combined comorbidity index; CKD-EPI, chronic kidney disease epidemiology collaboration; eGFR, estimated glomerular filtration rate; MDRD, the modification of diet in renal disease; Other antihypertensives, calcium channel blockers and beta blockers.
3.2 |. Accuracy of the CKD diagnoses in claims data
When we compared the PPVs of the six algorithms for CKD in the ICD-9 and ICD-10 cohorts, we found that all algorithms had PPVs >80% except for the algorithm that used one CKD diagnosis code and one lab value ±365 days for the ICD-9 cohort (Table 2). The ICD-10 cohort for this algorithm had a PPV of 80.63%. For algorithms using two ICD codes, the PPV remained >80% when we required the second ICD code to occur between 30 and 180 days after the first code. The highest PPV was achieved using the algorithm of two CKD diagnosis codes with a lab value ±90 days of the first diagnosis code (PPV 86.36% [ICD-9 cohort] and PPV 86.07% [ICD-10 cohort]), followed by the algorithm of two CKD diagnosis codes with a lab value ±180 days (PPV 85.30% [ICD-9 cohort] and PPV 85.56% [ICD-10 cohort]). However, these groups also had the lowest number of patients (two CKD diagnoses and lab value ±90 days: N = 104 774 [ICD-9 cohort] and N = 285 520 [ICD-10 cohort]; two CKD diagnoses and lab value ±180 days: N = 110 540 [ICD-9 cohort] and N = 286 745 [ICD-10 cohort]) compared with the number of remaining patients when using the other four algorithms. The algorithm that maintained the PPV above 80% while still retaining a large number of patients was the one using one CKD diagnosis code compared with a lab value ±180 days of the ICD code. The PPV using this algorithm was 80.32% for the ICD-9 cohort (N = 155 730) and 81.56% for the ICD-10 cohort (N = 386 331).
TABLE 2.
Positive predictive values comparing algorithms of ICD-9 and ICD-10 claims-based diagnoses of CKD Stages 3–5 to eGFR based on laboratory values
| Algorithm | ICD-9 period 2012–2014 | ICD-10 period 2016–2018 |
|---|---|---|
| 1 CKD diagnosis, lab ±90 days | N = 146 739 | N = 380 117 |
| PPV, % (95% CI), comparison to eGFR (CKD-EPI) | 81.24 (81.04, 81.44) | 82.00 (81.87, 82.12) |
| PPV, % (95% CI), comparison to eGFR (MDRD) | 81.89 (81.70, 82.09) | 82.03 (81.91, 82.15) |
| 2 CKD diagnoses, lab ±90 days | N = 104 774 | N = 285 520 |
| PPV, % (95% CI), comparison to eGFR (CKD-EPI) | 86.36 (86.15, 86.57) | 86.07 (85.94, 86.20) |
| PPV, % (95% CI), comparison to eGFR (MDRD) | 86.67 (86.47, 86.88) | 85.93 (85.80, 86.06) |
| 1 CKD diagnosis, lab ±180 days | N = 155 730 | N = 386 331 |
| PPV, % (95% CI), comparison to eGFR (CKD-EPI) | 80.32 (80.12, 80.52) | 81.56 (81.43, 81.68) |
| PPV, % (95% CI), comparison to eGFR (MDRD) | 80.91 (80.71, 81.10) | 81.46 (81.34, 81.59) |
| 2 CKD diagnoses, lab ±180 days | N = 110 540 | N = 286 745 |
| PPV, % (95% CI), comparison to eGFR (CKD-EPI) | 85.30 (85.09, 85.51) | 85.56 (85.43, 85.69) |
| PPV, % (95% CI), comparison to eGFR (MDRD) | 85.60 (85.40, 85.81) | 85.30 (85.17, 85.43) |
| 1 CKD diagnosis, lab ±365 days | N = 161 305 | N = 373 220 |
| PPV, % (95% CI), comparison to eGFR (CKD-EPI) | 78.80 (78.61, 79.00) | 80.63 (80.50, 80.76) |
| PPV, % (95% CI), comparison to eGFR (MDRD) | 79.21 (79.01, 79.41) | 80.40 (80.27, 80.52) |
| 2 CKD diagnoses, lab ±365 days | N = 111 329 | N = 270 553 |
| PPV, % (95% CI), comparison to eGFR (CKD-EPI) | 84.06 (83.84, 84.27) | 84.81 (84.67, 84.94) |
| PPV, % (95% CI), comparison to eGFR (MDRD) | 84.12 (83.91, 84.34) | 84.41 (84.27, 84.54) |
Note: Lab assessment window is described by the X-days lab label. For example, the 90-days lab label describes an analysis in which patients had a creatinine laboratory value 90 days before or after their claims-based CKD diagnosis.
Abbreviations: CKD-EPI, chronic kidney disease epidemiology collaboration; MDRD, the modification of diet in renal disease.
3.3 |. Additional analysis
When we calculated the PPV for patients with CKD Stages 4 and 5 only, we found the PPV to be higher, but the sample size was substantially decreased. We also examined the PPV for the CKD algorithm for patients with exactly 1 CKD diagnosis available, exactly 2 CKD diagnoses available, and ≥3 CKD diagnoses available during the 180-day lab window. For the algorithm using 1 CKD diagnosis in the 180-day lab window, the PPV was 74.9%–77.7% for patients with exactly 1 CKD diagnosis available, 77.0%–79.9% for patients with exactly 2 CKD diagnoses available, and 83.1%–84.0% for patients with ≥3 CKD diagnosis codes available within the 180-day lab window. For patients with exactly 2 or ≥3 CKD diagnoses available in the 180-day lab window, the PPV remained >80% for the algorithms using 2 CKD diagnoses and a lab value within 180 days. We also examined the consistency of coding among patients who had at least 2 ICD codes, with one code occurring in the first 180 days and another code occurring during days 181–365. The majority of patients had at least one code during days 181–365 that was for the same stage of CKD as the first code occurring in the first 180 days (98.4%–98.7% for CKD Stage 3, 87.0%–87.9% for CKD Stage 4, and 83.6%–85.9% for CKD Stage 5).
3.4 |. Subgroup analysis
We applied the CKD algorithms across subgroups of patients by age, gender, comorbidity status (diabetes or hypertension), and the presence of a nephrology visit during the baseline period. When we applied the algorithm of one CKD diagnosis compared with a lab value ±180 days of the claims-based diagnosis, the PPV was highest for older adults ≥75 years (PPV 84.88% [ICD-10]) compared with adults 65–74 years (PPV 79.29% [ICD-10]) and <65 years (PPV 70.62% [ICD-10]) (Figure 2). The algorithm performed slightly better for the ICD-10 cohort than for the ICD-9 cohort. This algorithm performed better in females (PPV 82.24% [ICD-10]) than in males (PPV 80.70% [ICD-10]) (Figure 2). The algorithm performed with a PPV >80% for patients with diabetes (PPV 81.83% [ICD-10]) and hypertension (PPV 82.25% [ICD-10]), with a slightly higher PPV for the ICD-10 cohort than for the ICD-9 cohort (Figure 3). The highest PPVs were among patients who had at least one nephrologist visit during the baseline period (PPV 88.18% [ICD-10]) (Figure 3).
FIGURE 2.

PPV of algorithm using one claims-based diagnosis of CKD compared with eGFR from laboratory value within a 180-day lab window, stratified by age and gender
FIGURE 3.

PPV of algorithm using one claims-based diagnosis of CKD compared with eGFR from laboratory value within a 180-day lab window, stratified by the comorbid conditions of diabetes or hypertension, and by nephrologist visit
4 |. DISCUSSION
In this study, we assessed the accuracy of identifying a cohort of patients with moderate to severe CKD (Stages 3–5) using ICD-based claims codes compared with a lab-based eGFR in a national US claims database. We found that one ICD-9/10 based code for CKD had sufficient PPV (>80%) to accurately identify a patient population with CKD Stages 3–5. Overall, ICD-10 based codes had PPVs comparable to those of ICD-9 based codes. While the use of two CKD claims codes led to a higher PPV, this algorithm retained fewer patients than the use of one claims code. We found that the PPV also varied depending on the time window during which the eGFR was compared with the claims-based diagnosis code: the PPV was higher when the laboratory value was closer in time to the ICD code date.
There have been several prior validation studies of a diagnosis of CKD using claims data.18–24 In general, studies reported variable but overall poor sensitivity and high specificity with claims-based definitions of CKD,18–23,27 which suggests that claims codes underestimate the number of patients with CKD and thus may be problematic if used as an outcome variable or for disease surveillance purposes. However, the acceptable PPVs suggest that claims codes for CKD can be used to identify with sufficient accuracy a patient population with CKD. As noted by Winkelmayer et al.20 in their validation study of ICD-9 claims-based algorithms of CKD, if the intent of a study is to create a cohort of patients with CKD from claims data, then the algorithm must have a high PPV. Their study in Medicare claims data found excellent internal validity, but unclear generalizability.
One systematic review reported that PPVs from prior studies were reasonable but variable (median 78%, range 29%–100%),19 but this review encompassed heterogeneous studies, including ones based on patients on dialysis as well as studies using medical chart review as the reference standard, which can underreport diagnoses of CKD.28 In addition to the different study populations (including patients with end-stage kidney disease) and reference standards (e.g., medical chart review vs. laboratory values), the variable PPVs could also have resulted from the different algorithms used to define CKD and the different reference time windows used to compare the claims codes with the lab values.
Several of the prior validation studies used lab values to calculate an eGFR as the reference standard. In a validation study conducted in elderly Medicare beneficiaries (n = 1852) who were hospitalized for myocardial infarction, the PPVs were generally high (85.7% to 97.5%),20 suggesting that Medicare claims data could be used with sufficient accuracy to identify patients with CKD. Among 6982 participants in the Reasons for Geographic and Racial Differences in Stroke (REGARDS) Study who also had Medicare fee-for-service coverage, the PPV for the Medicare claims-based definition of CKD was 75.6% when compared with the research study definition of CKD.23 Of note, the claims-based definition of CKD could occur during a window period of 2 years prior to the occurrence of the research study-based definition of CKD. Another study among veterans with diabetes (n = 263 730) found that expanding the algorithm to include more renal-related codes (from 9 codes to 79 codes) improved the sensitivity of diagnosing CKD, but decreased the PPV (94.2% to 74.1%).21 In another study that reported a lower PPV comparing two physician claims or one hospitalization to a calculated eGFR, the PPV for the claims code was 60.1%, but this was based on a comparison to a lab value within a 2-year period.24
There have been several studies examining ICD-10 codes in other countries, but none in the US. A Canadian study22 (n = 123 499) that compared ICD-10 codes for CKD with lab values found low sensitivity and high specificity for an ICD-10 based definition of CKD. The PPV was 65.4% for CKD, but increased to 85.2% for CKD Stage 3 and beyond (eGFR <60 ml/min/1.73 m2). In an Australian study27 (n = 325) comparing ICD-10 codes with chart review, the ICD-10 based definition had low sensitivity and high specificity. Of note, the reference standard in that study was CKD defined in the medical chart, rather than a calculated eGFR.
In our study, when we evaluated the PPV among subgroups by age, gender, history of hypertension or diabetes, or prior nephrologist visit, we found that the PPV was higher for older adults ≥75 years, and females. Since the PPV depends on the underlying prevalence of the disease in a population, the better performance of claims-based definitions of CKD in older adults could be related to the higher prevalence of CKD in this subgroup. This suggests that the identification of older adults with CKD might be more accurate in claims data that predominantly includes older adults (e.g., Medicare). The highest PPV was obtained for patients who had at least one nephrologist visit during the 1-year baseline period prior to the first ICD diagnosis code. The PPV was higher for ICD-10-based diagnoses compared with ICD-9-based diagnoses. Given the adoption of ICD-10 codes in the US since 2015, the slightly higher PPVs for ICD-10-based definitions of CKD compared with ICD-9-based codes suggest that the current ICD-10-based codes could better identify a patient population with CKD Stages 3–5.
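The dependence of PPV on prevalence follows from Bayes' rule and can be made concrete with a small sketch. The sensitivity, specificity, and prevalence values below are hypothetical numbers chosen only to show the direction of the effect; they are not estimates from this study, which did not measure sensitivity, specificity, or prevalence.

```python
def ppv_from_prevalence(sensitivity, specificity, prevalence):
    """PPV implied by sensitivity, specificity, and disease prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical claims definition: sensitivity 30%, specificity 95%.
for prevalence in (0.10, 0.40):
    print(prevalence, round(ppv_from_prevalence(0.30, 0.95, prevalence), 2))
```

With the same hypothetical sensitivity and specificity, raising prevalence from 10% to 40% raises the PPV from 0.40 to 0.80, which is the mechanism proposed above for the better performance in older adults.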
There are several limitations to our study. First, since the goal of our study was to evaluate the accuracy of identifying a cohort of patients with moderate to severe CKD (Stages 3–5) using claims-based definitions, we only calculated the PPV and did not calculate the sensitivity, specificity, or negative predictive value. Thus, our approach cannot be used to estimate the incidence or prevalence of CKD or for disease surveillance in this patient population, since it likely underestimates the number of patients with CKD. Given that prior studies reported low sensitivity and high specificity using claims-based definitions of CKD, we expect that many patients with CKD were not included in our cohort. Nonetheless, our study found that the PPV was sufficiently high to create a cohort of patients with moderate to severe CKD using national claims data. Second, the PPV depends on the underlying prevalence of CKD in the population, and we did not have information on the underlying prevalence. Third, since our study purpose was to identify a patient population with moderate to severe CKD (rather than evaluate the performance characteristics of claims-based CKD definitions as an outcome), we evaluated the PPV for patients with CKD Stages 3–5 and did not evaluate the PPV for identifying patients within individual CKD stages. The performance characteristics of claims-based definitions of the individual CKD stages remain an area for future research. Fourth, reliance on eGFR as the gold standard to estimate kidney function could have introduced bias, both because muscle mass and other factors can influence serum creatinine and eGFR and because we relied on a single measure of eGFR. Since the laboratory data come from commercial laboratories, the available lab results are also mainly from routine outpatient testing. Moreover, our use of an estimating equation for eGFR that did not include race could have introduced misclassification of eGFR.29 Fifth, we did not have albuminuria values in our data set and thus could not perform a validation study that compared the claims-based diagnosis against a definition of CKD that included albuminuria. Sixth, our study only included patients with stable CKD, since we excluded patients with a history of AKI and those with kidney disease progression (serum creatinine difference >1.0 mg/dl between the two lab values). Lastly, the generalizability of this claims-based approach to all patients with CKD is unclear, and we do not assume that patients who are identified using claims-based diagnosis codes are of the same clinical phenotype as patients identified purely by lab-based or medical record based definitions of CKD.23,30,31 Two studies found that patients with CKD identified through claims had a higher risk of end-stage kidney disease and mortality.23,31
In conclusion, claims-based algorithms for moderate to severe CKD are sufficiently accurate to identify a cohort of patients with CKD Stages 3–5 in claims data, within which real-world research questions can be investigated. Future studies are needed to understand how patients with CKD identified with claims-based algorithms may differ from those identified through other algorithms, and therefore how generalizable findings from claims-identified cohorts are. Nonetheless, our findings suggest that claims databases could be used for future real-world research studies in patients with CKD Stages 3–5.
Key Points.
We assessed the accuracy of claims-based algorithms to identify adults with CKD Stages 3–5 compared with laboratory values in a subset of a US commercial insurance claims database.
Overall, ICD-10 based codes had comparable PPVs to ICD-9 based codes and all ICD-10 based algorithms had PPVs >80%.
An ICD-10 claims-based definition of CKD identified with sufficient accuracy a patient population with CKD Stages 3–5.
Our findings suggest that claims databases could be used for future real-world research studies in patients with CKD Stages 3–5.
Funding information
This study was funded by the Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA. Dr. Min Zhuo is supported by a NIH NIDDK T32 award DK007199. Dr. Elisabetta Patorno is supported by a career development grant K08AG055670 from the National Institute on Aging.
CONFLICT OF INTEREST
Seoyoung C. Kim has received research grants to the Brigham and Women’s Hospital from Pfizer, Roche, AbbVie and Bristol-Myers Squibb for unrelated studies. She is in part supported by NIH/NIAMS-K24-AR078959. Dr. Elisabetta Patorno is investigator of a research grant to the Brigham and Women’s Hospital from Boehringer Ingelheim, not related to the topic of the submitted work. The rest of the authors declare no conflict of interest.
Footnotes
ETHICS STATEMENT
This work was approved by the Mass General Brigham Institutional Review Board.
SUPPORTING INFORMATION
Additional supporting information may be found in the online version of the article at the publisher’s website.
REFERENCES
- 1.United States Renal Data System. USRDS Annual Data Report: Epidemiology of Kidney Disease in the United States. National Institutes of Health, National Institute of Diabetes and Digestive and Kidney Diseases; 2020. https://adr.usrds.org/2020 [Google Scholar]
- 2.Myers OB, Pankratz VS, Norris KC, Vassalotti JA, Unruh ML, Argyropoulos C. Surveillance of CKD epidemiology in the US - a joint analysis of NHANES and KEEP. Sci Rep. 2018;8(1):15900. doi: 10.1038/s41598-018-34233-w [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Bowe B, Xie Y, Li T, et al. Changes in the US burden of chronic kidney disease from 2002 to 2016: an analysis of the global burden of disease study. JAMA Netw Open. 2018;1(7):e184412. doi: 10.1001/jamanetworkopen.2018.4412 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Grams ME, Chow EK, Segev DL, Coresh J. Lifetime incidence of CKD stages 3–5 in the United States. Am J Kidney Dis. 2013;62(2):245–252. doi: 10.1053/j.ajkd.2013.03.009 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Levey AS, Coresh J. Chronic kidney disease. Lancet. 2012;379(9811):165–180. doi: 10.1016/S0140-6736(11)60178-5 [DOI] [PubMed] [Google Scholar]
- 6.Stengel B, Metzger M, Combe C, et al. Risk profile, quality of life and care of patients with moderate and advanced CKD: the French CKD-REIN cohort study. Nephrol Dial Transplant. 2019;34:277–286. doi: 10.1093/ndt/gfy058 [DOI] [PubMed] [Google Scholar]
- 7.Porter AC, Lash JP, Xie D, et al. Predictors and outcomes of health-related quality of life in adults with CKD. Clin J Am Soc Nephrol. 2016; 11(7):1154–1162. doi: 10.2215/CJN.09990915 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Tonelli M, Wiebe N, Guthrie B, et al. Comorbidity as a driver of adverse outcomes in people with chronic kidney disease. Kidney Int. 2015;88(4):859–866. doi: 10.1038/ki.2015.228 [DOI] [PubMed] [Google Scholar]
- 9.Wong E, Ballew SH, Daya N, et al. Hospitalization risk among older adults with chronic kidney disease. Am J Nephrol. 2019;50(3):212–220. doi: 10.1159/000501539 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10. Ishigami J, Grams ME, Chang AR, Carrero JJ, Coresh J, Matsushita K. CKD and risk for hospitalization with infection: the Atherosclerosis Risk in Communities (ARIC) study. Am J Kidney Dis. 2017;69(6):752–761. doi: 10.1053/j.ajkd.2016.09.018
- 11. Chen TK, Knicely DH, Grams ME. Chronic kidney disease diagnosis and management: a review. JAMA. 2019;322(13):1294–1304. doi: 10.1001/jama.2019.14745
- 12. Corrigan-Curay J, Sacks L, Woodcock J. Real-world evidence and real-world data for evaluating drug safety and effectiveness. JAMA. 2018;320(9):867–868. doi: 10.1001/jama.2018.10136
- 13. Thompson AM, Southworth MR. Real world data and evidence: support for drug approval: applications to kidney diseases. Clin J Am Soc Nephrol. 2019;14(10):1531–1532. doi: 10.2215/CJN.02790319
- 14. Schneeweiss S. Developments in post-marketing comparative effectiveness research. Clin Pharmacol Ther. 2007;82(2):143–156. doi: 10.1038/sj.clpt.6100249
- 15. Tonelli M, Wiebe N, Manns BJ, et al. Comparison of the complexity of patients seen by different medical subspecialists in a universal health care system. JAMA Netw Open. 2018;1(7):e184852. doi: 10.1001/jamanetworkopen.2018.4852
- 16. Bailie GR, Eisele G, Liu L, et al. Patterns of medication use in the RRI-CKD study: focus on medications with cardiovascular effects. Nephrol Dial Transplant. 2005;20(6):1110–1115. doi: 10.1093/ndt/gfh771
- 17. Schneeweiss S, Avorn J. A review of uses of health care utilization databases for epidemiologic research on therapeutics. J Clin Epidemiol. 2005;58(4):323–337. doi: 10.1016/j.jclinepi.2004.10.012
- 18. Grams ME, Plantinga LC, Hedgeman E, et al. Validation of CKD and related conditions in existing data sets: a systematic review. Am J Kidney Dis. 2011;57(1):44–54. doi: 10.1053/j.ajkd.2010.05.013
- 19. Vlasschaert ME, Bejaimal SA, Hackam DG, et al. Validity of administrative database coding for kidney disease: a systematic review. Am J Kidney Dis. 2011;57(1):29–43. doi: 10.1053/j.ajkd.2010.08.031
- 20. Winkelmayer WC, Schneeweiss S, Mogun H, Patrick AR, Avorn J, Solomon DH. Identification of individuals with CKD from Medicare claims data: a validation study. Am J Kidney Dis. 2005;46(2):225–232. doi: 10.1053/j.ajkd.2005.04.029
- 21. Kern EF, Maney M, Miller DR, et al. Failure of ICD-9-CM codes to identify patients with comorbid chronic kidney disease in diabetes. Health Serv Res. 2006;41(2):564–580. doi: 10.1111/j.1475-6773.2005.00482.x
- 22. Fleet JL, Dixon SN, Shariff SZ, et al. Detecting chronic kidney disease in population-based administrative databases using an algorithm of hospital encounter and physician claim codes. BMC Nephrol. 2013;14:81. doi: 10.1186/1471-2369-14-81
- 23. Muntner P, Gutierrez OM, Zhao H, et al. Validation study of Medicare claims to identify older US adults with CKD using the Reasons for Geographic and Racial Differences in Stroke (REGARDS) study. Am J Kidney Dis. 2015;65(2):249–258. doi: 10.1053/j.ajkd.2014.07.012
- 24. Ronksley PE, Tonelli M, Quan H, et al. Validating a case definition for chronic kidney disease using administrative data. Nephrol Dial Transplant. 2012;27(5):1826–1831. doi: 10.1093/ndt/gfr598
- 25. Levey AS, Stevens LA, Schmid CH, et al. A new equation to estimate glomerular filtration rate. Ann Intern Med. 2009;150(9):604–612. doi: 10.7326/0003-4819-150-9-200905050-00006
- 26. Levey AS, Coresh J, Greene T, et al. Using standardized serum creatinine values in the Modification of Diet in Renal Disease study equation for estimating glomerular filtration rate. Ann Intern Med. 2006;145(4):247–254. doi: 10.7326/0003-4819-145-4-200608150-00004
- 27. Ko S, Venkatesan S, Nand K, Levidiotis V, Nelson C, Janus E. International statistical classification of diseases and related health problems coding underestimates the incidence and prevalence of acute kidney injury and chronic kidney disease in general medical patients. Intern Med J. 2018;48(3):310–315. doi: 10.1111/imj.13729
- 28. Samal L, Linder JA, Bates DW, Wright A. Electronic problem list documentation of chronic kidney disease and quality of care. BMC Nephrol. 2014;15:70. doi: 10.1186/1471-2369-15-70
- 29. Hsu C-y, Yang W, Parikh RV, et al. Race, genetic ancestry, and estimating kidney function in CKD. N Engl J Med. 2021;385:1750–1760. doi: 10.1056/NEJMoa2103753
- 30. Smith DH, Shetterly S, Flory J, et al. Diagnosis-based cohort augmentation using laboratory results data: the case of chronic kidney disease. Pharmacoepidemiol Drug Saf. 2018;27(8):872–877. doi: 10.1002/pds.4583
- 31. Vestergaard SV, Christiansen CF, Thomsen RW, Birn H, Heide-Jørgensen U. Identification of patients with CKD in medical databases: a comparison of different algorithms. Clin J Am Soc Nephrol. 2021;16:543–551. doi: 10.2215/cjn.15691020
