Author manuscript; available in PMC: 2019 Dec 12.
Published in final edited form as: Dig Dis Sci. 2014 May 10;59(10):2406–2410. doi: 10.1007/s10620-014-3174-7

Accuracy of diagnostic codes for identifying patients with ulcerative colitis and Crohn’s disease in the Veterans Affairs Health Care System.

JK Hou 1,2, M Tan 2, RW Stidham 3, J Colozzi 4, D Adams 4, H El-Serag 1,2, AK Waljee 3,4
PMCID: PMC6907154  NIHMSID: NIHMS1059345  PMID: 24817338

Abstract

Background

International Classification of Diseases-9 (ICD-9) codes are useful in clinical research; however, the validity of ICD-9 codes for identifying patients with inflammatory bowel disease (IBD) across multiple centers in the Veterans Affairs Health Care System (VA) has not been established. Our aim was to determine the accuracy of ICD-9 codes for Crohn’s disease (CD) and ulcerative colitis (UC) in the VA.

Methods

Patients with a diagnosis of IBD during 1999–2009 were identified by at least one ICD-9 code for CD (555.x) or UC (556.x) at the Houston and Ann Arbor VA Medical Centers and confirmed by chart review. A diagnosis of CD, UC, and IBD, unspecified (IBDU) was determined based on structured review of data in the VA medical records. Positive predictive values (PPV) were calculated for the codes using previously published ICD-9 algorithms.

Results

A total of 1,871 patients were identified with ICD-9 codes for IBD. Of these patients, 1,298 (69 %) were confirmed to have IBD: 541 with CD (41 %), 707 with UC (55 %), and 50 with IBDU (4 %). An algorithm requiring 2 or more codes, with at least one from an outpatient encounter, improved the PPV (0.83 and 0.89 for CD and UC, respectively) compared with a single-code algorithm (PPV 0.59 and 0.66, respectively).

Conclusion

Single ICD-9 codes are inadequate to accurately identify IBD patients; however, ICD-9 code algorithms can identify patients with UC or CD with a high positive predictive value. The algorithm requiring 2 codes, at least 1 of them from an outpatient encounter, had a high PPV and a low miss rate.


Inflammatory bowel disease (IBD) affects more than 1.4 million patients in the United States, at an estimated cost of $6.3 billion annually [1]. Several factors account for the economic impact of the disease, including often lifelong and expensive medical treatment, hospitalization and surgery, and disability, all in a condition that frequently affects individuals in early adulthood and middle age [1, 2, 3]. Because IBD is a low-prevalence condition, administrative datasets are increasingly used to study disease outcomes and to assess the impact of therapies on those outcomes.

The Veterans Affairs Health Care System (VA) administrative datasets have been used to examine the clinical epidemiology of several disorders. The VA datasets include over 7 million veterans and house clinical, pharmacy, and diagnostic data, including International Classification of Diseases-9 (ICD-9) codes. With growing recognition of the large population of VA users with inflammatory bowel disease (IBD), correctly identifying these patients for research is vital. Validated algorithms for identifying IBD patients in the VA are essential to define an IBD cohort accurately and consistently for further study.

The validity of ICD-9 codes for identifying IBD patients in the VA has been evaluated in only one prior study, performed at a single center [4]. However, the accuracy of ICD-9 code algorithms may vary by facility because of differences in how encounters are coded, so single-center results may not generalize to the entire national cohort of VA users with IBD. In addition, the previously reported algorithm used age, gender, and race to identify cases, which may limit evaluation of the effects of age, gender, and race among VA users with IBD. We hypothesized that a case definition using ICD-9 codes alone can accurately identify VA users with IBD. Therefore, the aim of our study was to determine the accuracy of ICD-9 codes for Crohn’s disease (CD) and ulcerative colitis (UC) in the VA.

Study Setting

This retrospective cohort study was conducted at the Michael E. DeBakey Veterans Affairs Medical Center (MEDVAMC), Houston, TX, and the Ann Arbor Veterans Affairs Health Care System (AAVA). The Institutional Review Boards of both facilities approved this study.

Data Source and Case Identification

We tested the validity of IBD-related ICD-9 codes recorded at the two facilities described above. We estimated the validity of the ICD-9 codes using several algorithms that included at least one ICD-9 code for UC (556.x) or CD (555.x) in either inpatient (IP) or outpatient (OP) encounters, and compared the results with the diagnosis derived from a comprehensive manual review of medical records in the Computerized Patient Record System (CPRS). The manual chart review was performed by the investigators (MT and JC) using standardized data extraction forms. A diagnosis of CD, UC, or IBD, unspecified (IBDU) was determined based on review of the clinical history and the endoscopy, radiology, and pathology reports in the VA medical records.

Statistical Analyses

We calculated the positive predictive value (PPV) and the miss rate (defined below) using five algorithms based on individual ICD-9 codes and combinations of codes for the diagnosis of CD, UC, IBDU, or IBD overall. The algorithms were as follows:

  • 1 code: at least one code for CD (555.x) or UC (556.x) from either an outpatient or an inpatient encounter.
  • 5 codes: at least 5 codes for CD or UC from any type of encounter.
  • 2 codes, at least 1 OP: at least 2 codes for CD or UC, with at least one encounter occurring in an outpatient setting.
  • 2 OP or 1 IP: either at least 2 outpatient codes for CD or UC, or at least one inpatient code for CD or UC.
  • 2 OP and 1 IP: at least 2 outpatient codes for CD or UC in addition to at least one inpatient code for CD or UC.

Patients who met the above criteria but had both UC and CD codes were classified as IBDU.
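The five case definitions can be read as simple counting rules over a patient's coded encounters. The Python sketch below is illustrative only and rests on several assumptions not stated in the study: each encounter record carries exactly one ICD-9 code and a setting label ("OP" or "IP"), CD and UC codes are pooled when counting, and a patient who meets a definition with both CD and UC codes is labelled IBDU. The `Encounter` class and the `meets`/`classify` helpers are hypothetical names, not part of the VA datasets or the authors' SAS/Stata code.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


# Hypothetical record layout: one ICD-9 code per encounter, tagged with its setting.
@dataclass(frozen=True)
class Encounter:
    setting: str  # "OP" (outpatient) or "IP" (inpatient)
    icd9: str     # e.g. "555.9" (CD) or "556.6" (UC)


def ibd_type(icd9: str) -> Optional[str]:
    """Map an ICD-9 code to "CD" (555.x), "UC" (556.x), or None for non-IBD codes."""
    if icd9.startswith("555"):
        return "CD"
    if icd9.startswith("556"):
        return "UC"
    return None


def meets(algorithm: str, encounters: List[Encounter]) -> bool:
    """Return True if the encounters satisfy the named case definition.

    Assumption: CD and UC codes are pooled when counting codes.
    """
    ibd = [e for e in encounters if ibd_type(e.icd9) is not None]
    n_op = sum(1 for e in ibd if e.setting == "OP")
    n_ip = sum(1 for e in ibd if e.setting == "IP")
    rules = {
        "1 code": len(ibd) >= 1,
        "5 codes": len(ibd) >= 5,
        "2 codes, at least 1 OP": len(ibd) >= 2 and n_op >= 1,
        "2 OP or 1 IP": n_op >= 2 or n_ip >= 1,
        "2 OP and 1 IP": n_op >= 2 and n_ip >= 1,
    }
    return rules[algorithm]  # raises KeyError for an unknown algorithm name


def classify(algorithm: str, encounters: Iterable[Encounter]) -> Optional[str]:
    """Return "CD", "UC", or "IBDU" if the definition is met, otherwise None."""
    encounters = list(encounters)
    if not meets(algorithm, encounters):
        return None
    types = {ibd_type(e.icd9) for e in encounters} - {None}
    # Patients with both CD and UC codes are labelled IBDU, mirroring the study rule.
    return "IBDU" if len(types) == 2 else types.pop()


if __name__ == "__main__":
    patient = [Encounter("IP", "555.1")]
    for name in ("1 code", "2 codes, at least 1 OP", "2 OP or 1 IP"):
        print(name, "->", classify(name, patient))
```

Under these assumptions, a patient with a single inpatient 555.1 code is classified as CD by the 1 code and 2 OP or 1 IP definitions but is not captured by the 2 codes, at least 1 OP definition.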

The PPV is the probability that a veteran with an IBD ICD-9 code actually has IBD; it was calculated as the number of IBD diagnoses confirmed on manual chart review divided by the number of cases identified by the ICD-9-based classification algorithm. The miss rate was calculated as the number of chart-confirmed IBD cases that the coding algorithm classified as negative, divided by the total number of truly positive (chart-confirmed) cases.

$$\text{Positive predictive value} = \frac{\#\,\text{true positives}}{\#\,\text{true positives} + \#\,\text{false positives}}$$

$$\text{Miss rate} = \frac{\#\,\text{false negatives}}{\#\,\text{true positives} + \#\,\text{false negatives}}$$
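As a worked illustration of the two formulas, the sketch below tallies true positives, false positives, and false negatives from paired (algorithm flag, chart-review result) records and then computes PPV and miss rate. The function names are hypothetical. The example reuses counts reported in the Results: under the 1 code algorithm all 1,871 coded patients are flagged and 1,298 were confirmed on chart review, giving a PPV of about 0.69, which matches the combined any-IBD value in Table 2. Because the study cohort consisted only of patients with at least one IBD code, no confirmed patient can be missed by that algorithm, so its miss rate is 0.

```python
from typing import Iterable, Tuple


def confusion_counts(pairs: Iterable[Tuple[bool, bool]]) -> Tuple[int, int, int]:
    """Tally (true positives, false positives, false negatives).

    Each pair is (flagged_by_algorithm, confirmed_by_chart_review).
    True negatives are not needed for PPV or miss rate, so they are ignored.
    """
    tp = fp = fn = 0
    for flagged, confirmed in pairs:
        if flagged and confirmed:
            tp += 1
        elif flagged:
            fp += 1
        elif confirmed:
            fn += 1
    return tp, fp, fn


def ppv(tp: int, fp: int) -> float:
    """Positive predictive value: TP / (TP + FP)."""
    return tp / (tp + fp)


def miss_rate(tp: int, fn: int) -> float:
    """Miss rate: FN / (TP + FN)."""
    return fn / (tp + fn)


if __name__ == "__main__":
    # 1 code algorithm, any IBD type: all 1,871 coded patients are flagged;
    # 1,298 were confirmed on chart review, the remaining 573 were not.
    pairs = [(True, True)] * 1298 + [(True, False)] * 573
    tp, fp, fn = confusion_counts(pairs)
    print(round(ppv(tp, fp), 2), miss_rate(tp, fn))  # prints: 0.69 0.0
```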

All statistical analyses were performed using SAS 9.2 (SAS Institute, Cary, NC, USA) and Stata 11.2 (StataCorp, College Station, TX, USA).

RESULTS

Study Population

From 1999 to 2009, we identified a total of 1,871 patients at the two VA sites who had at least one ICD-9 code for CD or UC. Of these, 1,298 were confirmed on chart review to have IBD: 541 with CD, 707 with UC, and 50 with IBDU (Table 1). The cohort of patients with confirmed IBD was 74 % white and 92 % male, and the mean age at the first VA IBD encounter was 54.8 years. Further details of UC and CD distribution are shown in Table 1.

Table 1.

Demographics and disease characteristics of veterans with IBD at the two study sites

Characteristic                        AAVA        MEDVAMC     p value
IBD, n                                622         676
  UC, n                               325         382         0.124
  CD, n                               258         283         0.888
  IBDU, n                             39          11          0.000
Gender, % men                         95          90          0.001
Age at first VA IBD encounter, years
  Mean                                56.2        53.3        0.000
  Range                               20–88       18–87
Race, n (%)
  White                               498 (80)    465 (69)    0.000
  Black                               32 (5)      115 (17)    0.000
  Other                               92 (15)     96 (14)     0.763
UC distribution, n (%)
  Proctitis                           38 (12)     21 (5)      0.003
  Left sided                          56 (17)     50 (13)     0.124
  Pancolitis                          178 (55)    170 (45)    0.007
  Missing                             53 (16)     141 (37)    0.000
CD distribution, n (%)
  Ileal                               69 (27)     41 (15)     0.000
  Ileo-colonic                        74 (29)     99 (35)     0.117
  Colonic                             96 (37)     71 (25)     0.002
  Missing                             19 (7)      72 (25)     0.000

MEDVAMC Michael E. DeBakey VA Medical Center, AAVA Ann Arbor VA, IBD inflammatory bowel disease, UC ulcerative colitis, CD Crohn’s disease, IBDU IBD unspecified

Performance of ICD-9 Code Algorithms for CD

The 1 code algorithm did not accurately identify patients with CD (Table 2): in the combined cohort, its PPV was only 0.60. The 5 codes and 2 OP and 1 IP algorithms had the highest PPV (both 0.91); however, they also had high miss rates of 0.48 and 0.78, respectively. The 2 codes, at least 1 OP and the 2 OP or 1 IP algorithms performed similarly, with PPVs of 0.84 and 0.82 and miss rates of 0.31 and 0.29, respectively.

Table 2.

Performance of ICD-9 code algorithms for IBD

                          Crohn’s disease     Ulcerative colitis  IBD, unspecified    IBD, any type
Site / algorithm          PPV     Miss rate   PPV     Miss rate   PPV     Miss rate   PPV     Miss rate
Combined
  1 code                  0.60    0.21        0.67    0.10        0.06    0.74        0.69    0
  5 codes                 0.91    0.48        0.94    0.46        0.06    0.74        0.94    0.34
  2 codes, at least 1 OP  0.84    0.31        0.91    0.23        0.06    0.74        0.91    0.13
  2 OP or 1 IP            0.82    0.29        0.83    0.22        0.06    0.74        0.87    0.10
  2 OP and 1 IP           0.91    0.78        0.94    0.81        0.06    0.74        0.92    0.66
MEDVAMC
  1 code                  0.48    0.27        0.61    0.12        0.03    0.64        0.61    0
  5 codes                 0.87    0.49        0.91    0.43        0.03    0.64        0.90    0.28
  2 codes, at least 1 OP  0.79    0.34        0.88    0.26        0.03    0.64        0.86    0.12
  2 OP or 1 IP            0.76    0.33        0.79    0.24        0.03    0.64        0.81    0.10
  2 OP and 1 IP           0.85    0.80        0.92    0.79        0.03    0.64        0.89    0.62
AAVA
  1 code                  0.78    0.14        0.75    0.07        0.13    0.77        0.82    0
  5 codes                 0.95    0.48        0.98    0.49        0.13    0.77        0.99    0.40
  2 codes, at least 1 OP  0.91    0.28        0.94    0.21        0.13    0.77        0.97    0.14
  2 OP or 1 IP            0.87    0.25        0.87    0.19        0.13    0.77        0.94    0.11
  2 OP and 1 IP           0.97    0.76        0.96    0.84        0.13    0.77        0.97    0.71

MEDVAMC Michael E. DeBakey VA Medical Center, AAVA Ann Arbor VA, IBD inflammatory bowel disease, OP outpatient, IP inpatient, PPV positive predictive value

The accuracy of the algorithms for CD varied somewhat by site. The PPV of every algorithm was higher at AAVA, with the greatest difference for the 1 code algorithm (PPV 0.78 at AAVA vs 0.48 at MEDVAMC). Miss rates were similar between the two sites, again with the greatest difference for the 1 code algorithm. However, the relative performance of the algorithms was consistent between sites, and the 2 codes, at least 1 OP algorithm had a high PPV and a low miss rate at both.

Performance of ICD-9 Code Algorithms for UC

The 1 code algorithm did not accurately identify patients with UC: in the combined cohort, its PPV was only 0.67. The 5 codes and 2 OP and 1 IP algorithms had the highest PPV (both 0.94); however, they also had high miss rates of 0.46 and 0.81, respectively. The 2 codes, at least 1 OP algorithm had a higher PPV than the 2 OP or 1 IP algorithm (0.91 vs 0.83) with a similar miss rate (0.23 vs 0.22).

The performance of the case-finding algorithms for UC was similar at the two sites. The greatest difference was in the PPV of the 1 code algorithm (14 percentage points); differences in PPV for the other algorithms were less than 10 percentage points, and miss rates were also similar. Again, the 2 codes, at least 1 OP algorithm had a high PPV and a low miss rate at both sites.
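The site comparisons for CD and UC are simple differences between the per-site PPVs in Table 2. The Python sketch below transcribes those values (the data layout and the `site_differences` helper are our own illustrative choices, not the authors' analysis code) and computes the AAVA minus MEDVAMC difference in whole percentage points for each algorithm.

```python
from typing import Dict, List, Tuple

ALGORITHMS = [
    "1 code",
    "5 codes",
    "2 codes, at least 1 OP",
    "2 OP or 1 IP",
    "2 OP and 1 IP",
]

# PPVs transcribed from Table 2, listed in the same order as ALGORITHMS.
PPV: Dict[Tuple[str, str], List[float]] = {
    ("CD", "AAVA"): [0.78, 0.95, 0.91, 0.87, 0.97],
    ("CD", "MEDVAMC"): [0.48, 0.87, 0.79, 0.76, 0.85],
    ("UC", "AAVA"): [0.75, 0.98, 0.94, 0.87, 0.96],
    ("UC", "MEDVAMC"): [0.61, 0.91, 0.88, 0.79, 0.92],
}


def site_differences(disease: str) -> Dict[str, int]:
    """PPV at AAVA minus PPV at MEDVAMC, rounded to whole percentage points."""
    aava = PPV[(disease, "AAVA")]
    medvamc = PPV[(disease, "MEDVAMC")]
    return {
        name: round(100 * (a - m))
        for name, a, m in zip(ALGORITHMS, aava, medvamc)
    }


if __name__ == "__main__":
    for disease in ("CD", "UC"):
        diffs = site_differences(disease)
        largest = max(diffs, key=diffs.get)
        print(disease, diffs, "largest:", largest)
```

For CD the differences come out as 30, 8, 12, 11, and 12 percentage points, and for UC as 14, 7, 6, 8, and 4, so in both diseases the 1 code algorithm shows the largest gap between sites.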

Performance of ICD-9 Code Algorithms for IBDU

None of the algorithms accurately identified patients with IBDU. In the combined cohort, all 5 case-finding algorithms had a very low PPV (0.06) and the same miss rate (0.74).

Performance of ICD-9 Code Algorithms for any IBD type

With the exception of the 1 code algorithm, the case-finding algorithms accurately identified patients with any type of IBD, with PPVs of 0.87 or greater. Miss rates were lowest for the 2 codes, at least 1 OP and the 2 OP or 1 IP algorithms (0.13 and 0.10, respectively).

For any IBD type, all 5 algorithms had a higher PPV at AAVA than at MEDVAMC; however, most also had higher miss rates at AAVA. Again, the greatest difference in PPV was for the 1 code algorithm (0.82 at AAVA vs 0.61 at MEDVAMC in Table 2). The 2 codes, at least 1 OP algorithm had a high PPV and a low miss rate at both sites.

DISCUSSION

This is the largest study to validate ICD-9 codes as identifiers of IBD in the VA. With the exception of the 1 code algorithm, all tested case-finding algorithms had a high PPV for identifying patients with IBD (PPV 0.87–0.94). For CD and UC, the 5 codes and 2 OP and 1 IP algorithms demonstrated a high PPV but very high miss rates. The 2 codes, at least 1 OP and the 2 OP or 1 IP algorithms performed similarly, although the former had a higher PPV for UC than the latter. None of the tested algorithms accurately identified patients with IBDU.

The VA Health Care System has great potential for IBD research for several reasons. The cohort of VA users diagnosed with IBD represents one of the largest IBD cohorts identified, and the racially diverse IBD population in the VA provides an opportunity to study racial disparities in IBD [5, 6]. The comprehensive electronic medical record of the VA allows medical record review in addition to access to administrative datasets. Lastly, nationally shared electronic medical records offer an opportunity to implement and test IBD-related quality improvement interventions that may also apply to non-VA practices, and to develop risk prediction tools. However, accurate methods for identifying IBD patients in the national VA administrative datasets have not been clearly defined or uniformly applied.

Our study builds upon a previously published ICD-9 code validation study performed at a single VA center [4]. That study demonstrated that potential ICD-9 codes for IBD other than 555.x and 556.x (i.e., non-infectious colitis [558.9], intestinal fistula [569.81], presence of ileostomy [V44.2], and presence of colostomy [V44.3]) did not accurately identify patients with IBD. In that study, CD codes had a reported PPV between 88 and 100 %, whereas UC codes were less accurate, with a PPV between 0 and 93 %. The authors proposed a diagnostic algorithm for UC that combined age, gender, and race to improve the PPV to 88 %. Our PPV estimates for IBD in the VA administrative datasets are similar to those for other chronic diseases in the VA. For example, algorithms using two ICD-9 codes have been validated in the VA for gout, post-traumatic stress disorder, and alcoholic liver disease, with PPVs of 86, 82, and 83 %, respectively [7, 8, 9]. Among diabetic patients in the VA, single ICD-9 codes for stroke and acute MI have also been validated, with PPVs of 81 and 89.7 %, respectively [10]. In the present work, we evaluated alternative case-finding algorithms that do not use age or race, because including them may bias age- and race-related research questions. We therefore tested variations of previously published ICD-9-based non-VA IBD case-finding algorithms and non-IBD VA case-finding algorithms [5, 7, 11].

The accuracy of the case-finding algorithms varied by site. During the study period, neither site had a dedicated IBD clinic; therefore, coding practices likely reflect general coding practices in the VA. Differences in ICD-9 coding between sites may reflect differences in clinician documentation, site-specific coder accuracy, or patient presentation. Despite these differences, the relative performance of each case-finding algorithm was similar at both sites, suggesting that these algorithms may be generalizable within the VA. We also observed that single codes performed differently than in the prior code validation study [4]. There are several potential reasons for this discrepancy. The current study spans 10 years, compared with 4 years in the prior Houston study, and extends 1 year before and 5 years after that study's period, so coding patterns may have changed. Disease classification may also differ between the studies, although inter-observer variation alone would not be expected to produce such a large difference. Chart review in the current study also allowed an additional 5 years of follow-up for patients in the original study. Because the VA IBD cohort is older than most non-VA IBD cohorts (mean age 54.8 years), alternative diagnoses to IBD, such as ischemic colitis, may become apparent with longer follow-up. The follow-up of older patients initially classified as having IBD is an interesting question that deserves further study.

This study has several limitations. The study was retrospective, and past coding practices may not reflect current or future coding practices. In addition, both AAVA and MEDVAMC are relatively large, academically affiliated VA medical centers whose coding practices may not reflect those of other VA centers. However, at the time of the study, neither site had IBD-specific clinics, and IBD patients were managed in the general GI clinics. Further studies may be required to evaluate the accuracy of ICD-9 code algorithms in a national sample of VA patients with IBD. No algorithm accurately identified patients with IBDU; the total number of IBDU patients was low, and most had ICD-9 codes for either UC or CD. Methods other than ICD-9 codes will be required to identify patients with IBDU. We did not calculate the negative predictive value for the absence of ICD-9 codes for CD or UC, because the prior study by Thirumurthi et al. showed a very high negative predictive value (99 %) for the absence of a single ICD-9 code for IBD [4]. Despite these limitations, this study has numerous strengths, including the use of two separate VA sites, a large sample of IBD cases, the evaluation of 5 separate IBD case-finding algorithms, and access to full medical records for IBD case validation.

In summary, IBD and IBD type (with the exception of IBDU) can be accurately identified in the VA using ICD-9-based case-finding algorithms with a high positive predictive value. The performance of ICD-9 code algorithms for IBD varies by facility, but their relative performance is similar, suggesting that they may be generalizable across the VA. Observed miss rates might be reduced by developing non-ICD-9-based case-finding algorithms. We recommend the 2 codes, at least 1 OP algorithm, which demonstrated a high PPV and a low miss rate at both sites.

Acknowledgments

Jason Hou has received research funding from Aptalis Pharmaceuticals. The research reported here was supported in part by the American College of Gastroenterology Junior Faculty Development Award to J. K. Hou; the VA HSR&D Center for Innovations in Quality, Effectiveness and Safety (#CIN 13–413) at the Michael E. DeBakey VA Medical Center, Houston, TX; a VA HSR&D CDA-2 Career Development Award to A. Waljee; and a Crohn’s and Colitis Foundation of America Career Development Award (3775) to R. Stidham.

Footnotes

Conflict of interest

Jason Hou has served as a speaker for UCB Pharmaceuticals and Abbvie Pharmaceuticals. The other authors declare that they have no conflict of interest.

REFERENCES

1. Kappelman MD, Rifas-Shiman SL, Porter CQ, et al. Direct health care costs of Crohn’s disease and ulcerative colitis in US children and adults. Gastroenterology. 2008;135:1907–1913.
2. Bernstein CN, Loftus EV Jr, Ng SC, et al. Hospitalisations and surgery in Crohn’s disease. Gut. 2012;61:622–629.
3. Peyrin-Biroulet L, Loftus EV Jr, Colombel JF, et al. The natural history of adult Crohn’s disease in population-based cohorts. Am J Gastroenterol. 2010;105:289–297.
4. Thirumurthi S, Chowdhury R, Richardson P, et al. Validation of ICD-9-CM diagnostic codes for inflammatory bowel disease among veterans. Dig Dis Sci. 2010;55:2592–2598.
5. Hou JK, Kramer JR, Richardson P, et al. Risk of colorectal cancer among Caucasian and African American veterans with ulcerative colitis. Inflamm Bowel Dis. 2012;18:1011–1017.
6. Hou JK, Kramer JR, Richardson P, et al. The incidence and prevalence of inflammatory bowel disease among U.S. veterans: a national cohort study. Inflamm Bowel Dis. 2013;19:1059–1064.
7. Kramer JR, Davila JA, Miller ED, et al. The validity of viral hepatitis and chronic liver disease diagnoses in Veterans Affairs administrative databases. Aliment Pharmacol Ther. 2008;27:274–282.
8. Singh JA. Veterans Affairs databases are accurate for gout-related health care utilization: a validation study. Arthritis Res Ther. 2013;15:R224. Epub 2013 Dec 31.
9. Gravely AA, Cutting A, Nugent S, Grill J, Carlson K, Spoont M. Validity of PTSD diagnoses in VA administrative data: comparison of VA administrative PTSD diagnoses to self-reported PTSD Checklist scores. J Rehabil Res Dev. 2011;48:21–30.
10. Niesner K, Murff HJ, Griffin MR, Wasserman B, Greevy R, Grijalva CG, Roumie CL. Validation of VA administrative data algorithms for identifying cardiovascular disease hospitalization. Epidemiology. 2013;24:334–335.
11. Bernstein CN, Blanchard JF, Rawsthorne P, et al. Epidemiology of Crohn’s disease and ulcerative colitis in a central Canadian province: a population-based study. Am J Epidemiol. 1999;149:916–924.
