Abstract
Background
To facilitate inflammatory bowel disease (IBD) research in the United States, we developed and validated claims-based definitions to identify incident and prevalent IBD diagnoses using administrative healthcare claims data among multiple payers.
Methods
We used data from Medicare, Medicaid, and the HealthCore Integrated Research Database (Anthem commercial and Medicare Advantage claims). The gold standard for validation was review of medical records. We evaluated 1 incidence and 4 prevalence algorithms based on a combination of International Classification of Diseases codes, National Drug Codes, and Current Procedural Terminology codes. The claims-based incident diagnosis date needed to be within ±90 days of that recorded in the medical record to be valid.
Results
We reviewed 111 charts of patients with a potentially incident diagnosis. The positive predictive value (PPV) of the claims algorithm was 91% (95% confidence interval [CI], 81%-97%). We reviewed 332 charts to validate prevalent case definition algorithms. The PPV was 94% (95% CI, 86%-98%) for ≥2 IBD diagnoses and presence of prescriptions for IBD medications, 92% (95% CI, 85%-97%) for ≥2 diagnoses without any medications, 78% (95% CI, 67%-87%) for a single diagnosis and presence of an IBD medication, and 35% (95% CI, 25%-46%) for 1 physician diagnosis and no IBD medications.
Conclusions
Through a combination of diagnosis, procedural, and medication codes in insurance claims data, we were able to identify incident and prevalent IBD cases with high accuracy. These algorithms can be useful for the ascertainment of IBD cases in future studies.
Keywords: IBD, validation, epidemiology, ICD, observational study
Introduction
Accurately defining the epidemiology of inflammatory bowel diseases (IBDs) in the United States has proved difficult, as IBD is not a reportable diagnosis and comprehensive, population-based tracking registries do not exist.1 Administrative healthcare databases are commonly used in chronic disease epidemiology and typically use International Classification of Diseases codes to identify patients with specific diseases. However, the accuracy of identifying IBD cases using International Classification of Diseases codes remains unclear.2
Several IBD case definition algorithms have been developed for administrative healthcare databases in North America,3-5 but none capture the full spectrum of health insurance plans in the United States. Additionally, prior research has focused on algorithms to identify prevalent diagnoses; the suitability of claims data to identify new diagnoses is unknown. Thus, we developed new claims-based algorithms to identify incident and prevalent IBD in administrative healthcare databases using a combination of diagnosis codes, procedures, and medications, and validated these new algorithms within commercial and government-funded insurance plans.
Methods
Database
The study utilized administrative claims data from Medicare, Medicaid, and the HealthCore Integrated Research Database (HIRD) (Anthem commercial and Medicare Advantage claims). Medicare is a federal government–funded insurance plan for adults 65 years of age or older, those with end-stage renal disease, and disabled persons. We used a linked cohort of Medicare beneficiaries who were enrolled in a fee-for-service Medicare plan that was the primary payer for at least 1 encounter within the University of North Carolina healthcare system from 2014 to 2017. Medicaid is a state government–run insurance plan for children and adults meeting criteria for low income. We utilized Medicaid data from Pennsylvania from 1999 to 2012. HIRD includes patients with commercial plans (ie, <65 years of age) and those with managed Medicare plans through Anthem. We used claims data from HIRD beneficiaries from 2014 to 2018. While the structure of each dataset varied, common elements included enrollment dates, demographics, diagnosis and procedure codes from outpatient and inpatient encounters, physician specialty, and nonhospital prescription drug dispensing.
Incident case definition algorithms
We empirically developed several potential algorithms. The core algorithm included (1) ≥2 IBD diagnosis codes by a gastroenterologist or surgeon within 1 year; (2) colonoscopy, sigmoidoscopy, capsule endoscopy, or surgery within 6 weeks prior to, or on the same day as, the first IBD diagnosis; (3) prescriptions for IBD medications within 90 days after the first diagnosis; and (4) a minimum of 1 year of insurance coverage without any prescriptions for IBD medication prior to the first IBD diagnosis code. The date of the lower endoscopy or surgery was considered the index date of diagnosis. We used information from medications dispensed within the 90-day period following the index date to stratify the core algorithm into high- and lower-probability groups based on medications commonly used as first-line or later therapy.
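The 4 core criteria above can be sketched in code. The following is a minimal illustration, not the authors' implementation: the `PatientClaims` structure and its field names are hypothetical, the diagnosis list is assumed to contain only IBD codes from a gastroenterologist or surgeon, enrollment is assumed to be continuous from `enrollment_start`, the 2 diagnoses are assumed to fall on distinct dates, and the earliest qualifying procedure is taken as the index date when several fall in the window. The surgery-without-medication exception described for the high-probability algorithm is not handled here.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class PatientClaims:
    """Hypothetical per-patient summary of claims (all field names are assumptions)."""
    enrollment_start: date        # start of continuous coverage
    ibd_dx_dates: List[date]      # IBD codes from a gastroenterologist or surgeon
    procedure_dates: List[date]   # colonoscopy, sigmoidoscopy, capsule endoscopy, or surgery
    ibd_rx_dates: List[date]      # dispensings of any IBD medication


def incident_index_date(p: PatientClaims) -> Optional[date]:
    """Return the index date if all 4 core criteria are met, else None."""
    dx = sorted(set(p.ibd_dx_dates))
    # (1) at least 2 diagnosis dates, the second within 1 year of the first
    if len(dx) < 2 or dx[1] - dx[0] > timedelta(days=365):
        return None
    first_dx = dx[0]
    # (2) qualifying procedure within 6 weeks before, or on the day of, the first diagnosis
    window_start = first_dx - timedelta(weeks=6)
    procs = sorted(d for d in p.procedure_dates if window_start <= d <= first_dx)
    if not procs:
        return None
    # (3) IBD medication dispensed within 90 days after the first diagnosis
    rx_end = first_dx + timedelta(days=90)
    if not any(first_dx <= d <= rx_end for d in p.ibd_rx_dates):
        return None
    # (4) at least 1 year of prior coverage with no earlier IBD medication
    if p.enrollment_start > first_dx - timedelta(days=365):
        return None
    if any(d < first_dx for d in p.ibd_rx_dates):
        return None
    # The procedure date, not the diagnosis code date, serves as the index date.
    return procs[0]
```

Here a patient enrolled in January 2015 with a colonoscopy on February 20, 2017, diagnosis codes on March 10 and June 1, 2017, and a first dispensing on March 15, 2017, would receive February 20, 2017, as the index date.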
High-probability algorithm
Patients were included in the high-probability algorithm if they had a first prescription for steroids (oral or rectal) and/or mesalamine (oral or rectal), sulfasalazine, olsalazine, balsalazide, adalimumab, infliximab, golimumab, or certolizumab within 90 days following the index date. Patients were also included in the high-probability algorithm if they were not prescribed IBD medications but had a bowel resection surgery on their index date.
Lower-probability algorithm
Patients were included in the lower-probability group if they met the incident definition listed previously but did not meet the high-probability therapy–based definition. See the Supplemental Materials for additional details.
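The split between the 2 groups can be illustrated with a short sketch. It assumes the patient has already met the core incident definition, that medications are supplied as lowercase generic names (with `"steroid"` as a hypothetical placeholder for any oral or rectal steroid), and that any listed medication dispensed in the 90-day window qualifies. The function and constant names are illustrative only; the full rules are in the Supplemental Materials.

```python
from typing import Iterable

# Medications named in the high-probability definition; "steroid" is a
# placeholder label for oral or rectal steroids (an assumption of this sketch).
HIGH_PROBABILITY_MEDS = frozenset({
    "steroid", "mesalamine", "sulfasalazine", "olsalazine", "balsalazide",
    "adalimumab", "infliximab", "golimumab", "certolizumab",
})


def probability_stratum(meds_within_90_days: Iterable[str],
                        resection_on_index_date: bool) -> str:
    """Classify a patient who already meets the core incident definition."""
    meds = {m.strip().lower() for m in meds_within_90_days}
    # Any listed first-line medication in the 90 days after the index date
    if meds & HIGH_PROBABILITY_MEDS:
        return "high"
    # No IBD medication at all, but a bowel resection on the index date
    if not meds and resection_on_index_date:
        return "high"
    # Everyone else who met the core definition
    return "lower"
```

Under these assumptions, a patient dispensed only a medication outside the list (for example, `"azathioprine"`) would be assigned to the lower-probability group.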
We created several additional variations of the high-probability algorithm (Appendix Table 1). Using data from HIRD to identify potentially incident diagnoses, we performed 2 rounds of chart reviews. In phase 1 (algorithm refinement), we estimated the accuracy of 5 proposed algorithms. We reviewed charts of 50 patients (10 for each algorithm). The positive predictive value (PPV) ranged from 10% to 80% using a ±182-day window to define accuracy (Appendix Table 2). As a result, we added video capsule endoscopy to the list of diagnostic tests or surgeries used to establish a new diagnosis of IBD. In phase 2 (algorithm validation), we reviewed the charts of 113 additional patients, of whom 111 had usable data (Appendix Figure 1).
Prevalent case definition algorithms
We generated 4 different claims-based algorithms to identify patients with prevalent IBD, drawing on the prior published literature. There were slight variations in the algorithms based on the data source, as noted in Appendix Table 3. See the Supplemental Materials for additional details. Definitions of prevalence were validated using chart reviews of patients who received care across the United States (HIRD commercial claims), at the University of North Carolina (Medicare claims, years 2014-2017 only), or at the University of Pennsylvania (Medicaid claims). We used stratified random sampling from the 3 data sources to select approximately equal numbers of patients using each of the 4 mutually exclusive definitions of prevalence. After excluding medical records that were inadequate to determine IBD diagnosis, we reviewed 332 charts: 132 from HIRD, 100 from Medicare, and 100 from Medicaid claims.
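As a rough illustration of how the 4 mutually exclusive definitions and the stratified sampling fit together, the sketch below assigns each patient to one definition from a count of IBD diagnoses and a medication flag, then draws an equal number of patients per definition. The function names, labels, and input tuple layout are hypothetical, and the source-specific variations in Appendix Table 3 (such as provider specialty requirements) are not modeled.

```python
import random
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def prevalence_definition(n_ibd_dx: int, has_ibd_rx: bool) -> Optional[str]:
    """Map a patient to one of 4 mutually exclusive prevalence definitions."""
    if n_ibd_dx < 1:
        return None  # no IBD diagnosis code, so not a candidate case
    dx_part = "2+ diagnoses" if n_ibd_dx >= 2 else "1 diagnosis"
    rx_part = "IBD medication" if has_ibd_rx else "no IBD medication"
    return f"{dx_part}, {rx_part}"


def stratified_sample(patients: Iterable[Tuple[str, int, bool]],
                      per_stratum: int,
                      seed: int = 0) -> Dict[str, List[str]]:
    """Sample up to `per_stratum` patient IDs from each definition."""
    strata: Dict[str, List[str]] = defaultdict(list)
    for patient_id, n_dx, has_rx in patients:
        label = prevalence_definition(n_dx, has_rx)
        if label is not None:
            strata[label].append(patient_id)
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return {label: rng.sample(ids, min(per_stratum, len(ids)))
            for label, ids in strata.items()}
```

Sampling within each definition, rather than from the pooled population, keeps the rarer definitions from being crowded out and gives each PPV estimate a similar denominator.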
Statistical Methods
We computed PPV and corresponding binomial 95% confidence intervals (CIs). For incidence, we required that the IBD diagnosis be correct and occur within ±90 days of that recorded in the medical record. For prevalence validation analyses, we pooled summary data from patients from all data sources to estimate the overall PPV for different definitions of prevalence. We were not able to provide exact numbers of true and false positives from each database, as the Centers for Medicare & Medicaid Services has rules on small cell suppression.
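The PPV is the number of algorithm-identified cases confirmed by chart review divided by the number of charts reviewed. The sketch below shows one way to compute it with a binomial CI, together with the ±90-day date agreement check. It assumes the exact (Clopper-Pearson) interval, which the text does not specify, and solves for the bounds by bisection using only the standard library; all function names are illustrative.

```python
from datetime import date
from math import comb


def date_agrees(claims_date: date, chart_date: date, window_days: int = 90) -> bool:
    """True if the claims-based date is within +/- window_days of the chart date."""
    return abs((claims_date - chart_date).days) <= window_days


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve(f, target: float, increasing: bool, tol: float = 1e-9) -> float:
    """Bisection on [0, 1] for a monotone function f with f(p) = target."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def ppv_exact_ci(true_pos: int, n: int, alpha: float = 0.05):
    """Return (PPV, lower, upper) using the Clopper-Pearson interval (assumed method)."""
    ppv = true_pos / n
    # Lower bound: p at which P(X >= true_pos) equals alpha / 2
    lower = 0.0 if true_pos == 0 else _solve(
        lambda p: 1 - binom_cdf(true_pos - 1, n, p), alpha / 2, increasing=True)
    # Upper bound: p at which P(X <= true_pos) equals alpha / 2
    upper = 1.0 if true_pos == n else _solve(
        lambda p: binom_cdf(true_pos, n, p), alpha / 2, increasing=False)
    return ppv, lower, upper


if __name__ == "__main__":
    # Counts for the high-probability incident algorithm (52 of 57 charts)
    ppv, lower, upper = ppv_exact_ci(52, 57)
    print(f"PPV {ppv:.1%} (95% CI, {lower:.1%}-{upper:.1%})")
```

If a different interval method was used (for example, Wilson), the bounds would differ slightly from those produced here.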
Results
Validating IBD incident case definition algorithms in HIRD data
Amount of time within a health plan has previously been shown to be important for identifying new diagnoses.6 Among the patients selected for review, the median time from enrollment to first IBD diagnosis was 54.9 (interquartile range, 27.0-106.1) months. The PPV was 91% (95% CI, 81%-97%) for the high-probability algorithm, 85% (95% CI, 65%-96%) for the lower-probability algorithm, and 73% (95% CI, 52%-88%) for the high-probability algorithm without the requirement for lower endoscopy or surgery within 6 weeks of the first IBD diagnosis (Table 1).
Table 1.
PPV of the claims-based incident algorithms to identify the date of incident IBD diagnosis within 90 days of that recorded in the medical records using HIRD data.
| Algorithm | Number of charts reviewed | Correctly identified incidence date | PPV (%) | 95% CI (%) |
|---|---|---|---|---|
| High probability | 57 | 52 | 91 | 81-97 |
| Lower probability | 26 | 22 | 85 | 65-96 |
| High probability with longer lag time between lower endoscopy or surgery and first IBD diagnosis | 26 | 19 | 73 | 52-88 |
Abbreviations: CI, confidence interval; HIRD, HealthCore Integrated Research Database; IBD, inflammatory bowel disease; PPV, positive predictive value.
Validating IBD prevalent case definition algorithms in Anthem, Medicare, and Medicaid data
The estimated PPV of the prevalence algorithms, as assessed in all data sources, is summarized in Table 2. Among the patients selected for review in HIRD data, the median time from enrollment to first IBD diagnosis was 56.2 (interquartile range, 25.0-105.0) months. The PPV was 94% (95% CI, 86%-98%) for ≥2 IBD diagnoses and presence of a prescription for an IBD medication, 92% (95% CI, 85%-97%) for ≥2 diagnoses without any medications, 78% (95% CI, 67%-87%) for a single diagnosis and presence of an IBD medication, and 35% (95% CI, 25%-46%) for 1 physician diagnosis and no IBD medications. The PPV for most definitions was the highest in the HIRD data.
Table 2.
Pooled PPV of the prevalent claims-based algorithms to identify a diagnosis of IBD in the medical records using Medicare, Medicaid, and HIRD data.
| Algorithm | Number of charts reviewed | Correctly identified IBD diagnosis | PPV (%) | 95% CI (%) | Range of PPV across data sources (%) |
|---|---|---|---|---|---|
| 2 or more IBD diagnoses and receipt of IBD medication | 79 | 74 | 94 | 86-98 | 83-100 |
| 2 or more IBD diagnoses and no IBD medications | 93 | 86 | 92 | 85-97 | 88-95 |
| 1 IBD diagnosis and receipt of IBD medication | 77 | 60 | 78 | 67-87 | 64-85 |
| 1 IBD diagnosis and no IBD medications | 83 | 29 | 35 | 25-46 | 22-64 |
Abbreviations: CI, confidence interval; HIRD, HealthCore Integrated Research Database; IBD, inflammatory bowel disease; PPV, positive predictive value.
Discussion
Using 3 different administrative databases, we validated incident and prevalent IBD case definition algorithms and determined the PPV for incident IBD within 90 days to exceed 90%. We validated incidence with a 1-year minimum lookback, but the median time from enrollment to first IBD diagnosis was more than 4.5 years (54.9 months). Thus, we recommend a longer lookback period when using this algorithm to improve specificity. The highest algorithm PPV for prevalent IBD was 94% when requiring ≥2 IBD diagnoses (with 1 or more by a gastroenterologist or surgeon when using Medicare or commercial HIRD) and presence of IBD medication prescription. This definition yielded a 100% PPV in the HIRD commercial claims data. These case-finding algorithms can be applied to other studies using claims data.
Our study has a few limitations. For Medicare and Medicaid, we only had records from 2 academic health systems and may have missed diagnoses made elsewhere, thereby underestimating the PPV. Indeed, PPVs for prevalence were highest in the HIRD data, for which records were accessed at multiple healthcare systems. A novel aspect of this study was the development of the incident algorithm using commercial claims data. However, there were too few incident patients in our academic medical centers’ Medicare and Medicaid data to test the PPV. While our gold standard was a provider diagnosis confirmed by chart review, it is possible that patients had a condition, such as an infectious colitis, that mimicked IBD. By chance, our incidence validation did not include sufficient cases with surgery followed by no medications to assess the accuracy of this component of the definition. We did not determine the ability to distinguish between IBD subtypes. The generalizability of these results to other claims databases (within or outside of the United States) and to Medicare and Medicaid claims outside of academic medical centers is unknown. However, most billing and coding practices in the United States are not health plan specific, and our results were generally similar between databases despite slight differences attributable to variations across types of claims data.
Conclusions
Through a combination of diagnostic codes, procedure codes, and IBD medications, we developed IBD case definition algorithms capable of identifying incident and prevalent cases with high accuracy. These algorithms can be used in future epidemiology studies.
Contributor Information
Ghadeer K Dawwas, Center for Clinical Epidemiology and Biostatistics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
Alexandra Weiss, Division of Gastroenterology and Hepatology, University of Pennsylvania, Philadelphia, PA, USA.
Brad D Constant, Division of Gastroenterology, Hepatology, and Nutrition, Children’s Hospital of Philadelphia, Philadelphia, PA, USA.
Lauren E Parlett, Carelon Research, Wilmington, DE, USA.
Kevin Haynes, Janssen Research and Development, LLC, Titusville, NJ, USA.
Jeff Yufeng Yang, Center for Pharmacoepidemiology, Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.
Colleen Brensinger, Center for Clinical Epidemiology and Biostatistics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
Qufei Wu, Center for Clinical Epidemiology and Biostatistics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
Virginia Pate, Center for Pharmacoepidemiology, Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.
Michele Jonsson Funk, Center for Pharmacoepidemiology, Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.
Douglas E Schaubel, Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
Andres Hurtado-Lorenzo, Crohn’s & Colitis Foundation, New York, NY, USA.
Michael David Kappelman, Department of Pediatrics, Division of Pediatric Gastroenterology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.
James D Lewis, Center for Clinical Epidemiology and Biostatistics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA; Division of Gastroenterology and Hepatology, University of Pennsylvania, Philadelphia, PA, USA; Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
Funding
This work was supported by the Centers for Disease Control and Prevention (U01-DP006369) and the National Institutes of Health (P30-DK050306, UL1TR002489).
Conflict of Interest
G.K.D. has received funding from the American Society of Hematology and National Institutes of Health. L.E.P. is an employee of Elevance Health; she has received funding from Sanofi for an unrelated study. J.D.L. has consulted or served on the advisory board for Eli Lilly and Company, Samsung Bioepis, UCB, Bristol-Myers Squibb, Nestlé Health Science, Merck, Celgene, Janssen Pharmaceuticals, Bridge Biotherapeutics, Entasis Therapeutics, AbbVie, Pfizer, Gilead, Galapagos, Sanofi, Arena Pharmaceuticals, Protagonist Therapeutics, Amgen, and Scipher Medicine; has received research funding from Nestlé Health Science, Takeda, Janssen Pharmaceuticals, and AbbVie; has received educational grants from Takeda and Janssen; has performed legal work on behalf of generic manufacturers of ranitidine, including L. Perrigo Company, Glenmark Pharmaceuticals Inc, Amneal Pharmaceuticals LLC, Aurobindo Pharma USA, Dr. Reddy’s Laboratories, Novitium Pharma, Ranbaxy Inc, Sun Pharmaceutical Industries, Strides Pharma, and Wockhardt USA LLC; and owns stock in Dark Canyon Labs. M.J.-F.’s employer (Center for Pharmacoepidemiology, Department of Epidemiology, University of North Carolina at Chapel Hill) has collaborative agreements with AbbVie, Astellas, Boehringer Ingelheim, GlaxoSmithKline, Sarepta, Takeda, and UCB Biosciences, and she has received salary support to serve as Center Director. M.D.K. has consulted for AbbVie and Lilly; is a shareholder in Johnson & Johnson; and has received research support from Janssen and AbbVie. The other authors report no potential conflicts of interest.
References
- 1. Long MD, Hutfless S, Kappelman MD, et al. Challenges in designing a national surveillance program for inflammatory bowel disease in the United States. Inflamm Bowel Dis. 2014;20(2):398-415.
- 2. Johnson EK, Nelson CP. Values and pitfalls of the use of administrative databases for outcomes assessment. J Urol. 2013;190(1):17-18.
- 3. Herrinton LJ, Liu L, Lewis JD, Griffin PM, Allison J. Incidence and prevalence of inflammatory bowel disease in a Northern California managed care organization, 1996-2002. Am J Gastroenterol. 2008;103(8):1998-2006.
- 4. Bernstein CN, Wajda A, Svenson LW, et al. The epidemiology of inflammatory bowel disease in Canada: a population-based study. Am J Gastroenterol. 2006;101(7):1559-1568.
- 5. Rezaie A, Quan H, Fedorak RN, Panaccione R, Hilsden RJ. Development and validation of an administrative case definition for inflammatory bowel diseases. Can J Gastroenterol. 2012;26(10):711-717.
- 6. Lewis JD, Bilker WB, Weinstein RB, Strom BL. The relationship between time since registration and measured incidence rates in the General Practice Research Database. Pharmacoepidemiol Drug Saf. 2005;14(7):443-451.
