Canadian Journal of Gastroenterology. 2012 Oct;26(10):711–717. doi: 10.1155/2012/278495

Development and validation of an administrative case definition for inflammatory bowel diseases

Ali Rezaie 1, Hude Quan 2, Richard N Fedorak 3, Remo Panaccione 1, Robert J Hilsden 1
PMCID: PMC3472911  PMID: 23061064

Abstract

BACKGROUND:

A population-based database of inflammatory bowel disease (IBD) patients is invaluable to explore and monitor the epidemiology and outcome of the disease. In this context, an accurate and validated population-based case definition for IBD becomes critical for researchers and health care providers.

METHODS:

IBD and non-IBD individuals were identified through an endoscopy database in a western Canadian health region (Calgary Health Region, Calgary, Alberta). Subsequently, using a novel algorithm, a series of case definitions were developed to capture IBD cases in the administrative databases. In the second stage of the study, the criteria were validated in the Capital Health Region (Edmonton, Alberta).

RESULTS:

A total of 150 IBD case definitions were developed using 1399 IBD patients and 15,439 controls in the development phase. In the validation phase, 318,382 endoscopic procedures were searched and 5201 IBD patients were identified. After consideration of sensitivity, specificity and temporal stability of each validated case definition, a diagnosis of IBD was assigned to individuals who experienced at least two hospitalizations or had four physician claims, or two medical contacts in the Ambulatory Care Classification System database with an IBD diagnostic code within a two-year period (specificity 99.8%; sensitivity 83.4%; positive predictive value 97.4%; negative predictive value 98.5%). An alternative case definition was developed for regions without access to the Ambulatory Care Classification System database. A novel scoring system was developed that detected Crohn disease and ulcerative colitis patients with a specificity of >99% and a sensitivity of 99.1% and 86.3%, respectively.

CONCLUSION:

Through a robust methodology, a reproducible set of criteria to capture IBD patients through administrative databases was developed. The methodology may be used to develop similar administrative definitions for chronic diseases.

Keywords: Administrative database, Crohn disease, Epidemiology, Inflammatory bowel diseases, Population-based data, Ulcerative colitis


Crohn disease (CD) and ulcerative colitis (UC), known as inflammatory bowel diseases (IBD), are chronic relapsing inflammatory conditions of the gastrointestinal tract with an unknown etiology. CD and UC share several pathological and clinical presentations; in fact, they may be clinically and histopathologically indistinguishable (ie, IBD unspecified [IBDU]) (1). IBD often occurs early in life and requires continuous treatment, which may involve major surgery. IBD patients have a lower quality of life and contend with a variety of IBD-related morbidities (2).

The incidence and prevalence rates of IBD are dynamic and differ enormously according to geographical location (3). In North America, prevalence rates range from 37.5 to 249 cases per 100,000 persons for UC, and from 26 to 319 cases per 100,000 persons for CD. However, the generalizability of these studies is considered to be limited because of their highly selected populations (4–9).

A population-based database of IBD patients is invaluable to explore and monitor epidemiology and to quantify the burden and outcomes of the disease. Therefore, researchers and health care providers have devoted significant attention to population-based administrative databases. Previously, Bernstein et al (10) used the provincial inpatient hospital discharge abstracts and the physician service claims to describe the epidemiology of IBD in the central Canadian province of Manitoba (population 1.2 million). As the largest population-based IBD database in North America at the present time, this database has significantly enhanced our knowledge of the epidemiology of IBD. However, the lack of an independent population-based data source to verify or exclude the diagnosis of IBD may have compromised the accuracy of this particular database.

The purpose of the present study was to develop and validate an administrative ‘case definition’ for IBD. We identified IBD and non-IBD individuals through an endoscopy database in a western Canadian health region (Calgary Health Region [CHR], Calgary, Alberta). Subsequently, through a novel algorithm, we developed a series of case definitions to capture IBD cases in the administrative databases. In the second stage of the study, we validated our criteria in a separate Canadian health region (Capital Health Region [Edmonton, Alberta]). The study represents the first independently validated administrative case definition for IBD.

METHODS

Setting

Alberta is a western province of Canada with a population of approximately 3.7 million. To receive universal health insurance coverage, Alberta residents must register with the Alberta Health Care Insurance Plan. At the time of the study, administration of the health system in Alberta was provided by health regions, with the largest health regions being the CHR and the Capital Health Region (Edmonton) with populations of 1.2 million and 1.1 million, respectively. The initial phase of the study was conducted in the CHR while the validation phase was completed in the Capital Health Region.

Institutional ethics approvals were obtained from the Ethics Review Boards of the Universities of Alberta (Edmonton) and Calgary (Calgary).

Governmental administrative databases

Four Alberta health administrative databases were used:

  1. Health Care Insurance Plan Registry (HCIPR): Because registration is mandatory, the HCIPR houses demographic information for all residents of Alberta. Each resident is represented by a unique lifetime personal health number (PHN).

  2. Physician billing claims (PC) database: The Alberta government is the sole payer of the submitted physician claims and maintains an electronic database for this purpose. Each claim includes the PHN, date of service, up to three diagnostic codes (International Classification of Diseases, Ninth Revision, ICD-9-CM) and one service tariff code.

  3. Hospital Discharge Abstract Database (DAD): The DAD is collected from each patient’s chart at the time of discharge (acute care, chronic care and rehabilitation) to reflect diagnoses and procedures performed during the period of hospitalization. Up to 16 diagnoses are coded in ICD-9-CM until fiscal year (FY) 2002/2003, when a transition was made to the International Statistical Classification of Diseases and related Health Problems, Tenth Revision, Canadian modification (ICD-10-CA). Data also include patient demographics, PHN, admission and discharge dates.

  4. Ambulatory Care Classification System (ACCS) database: The ACCS collects information on facility-based ambulatory care (ie, day surgeries and procedures, and emergency visits). Recorded data consist of patient demographic data, service dates, procedure codes, six ICD-9 diagnostic codes until FY 2001/2002 and 10 ICD-10-CA diagnostic codes from FY 2002/2003.

Endoscopy databases

Two separate endoscopy databases were used in the present study:

  • Physicians in the tertiary care centres of the CHR record all endoscopic procedures in an electronic endoscopy database (Endopro; Pentax Inc, USA). Each record contains patient demographics, a brief medical history and physical examination data. This report also includes the indication for the procedure, endoscopic findings, impressions and recommendations.

  • In the Capital Health Region, all of the gastrointestinal endoscopic procedures performed in tertiary or secondary care centres are abstracted in an electronic database. Each record includes patient demographics, PHN, indication for the procedure and an ICD-coded diagnosis.

Development of the administrative case definition for IBD

From May 2000 to March 2004, endoscopic procedure reports in the CHR were digitally and manually searched for IBD-related findings, indication or diagnosis. In addition, structured reviews were conducted on selected outpatient and inpatient medical charts of patients who underwent endoscopy in this time period.

Patients with an indication or diagnosis of IBD on their endoscopy report or patients who met Lennard-Jones criteria (11) on chart reviews were considered to be IBD. Patients were classified as possible IBD if the indication for endoscopy was suspicious for IBD (eg, rectal bleeding); endoscopic findings were compatible but not specific to IBD (eg, nonspecific inflammation); or IBD was considered as a possible diagnosis at the end of the procedure. The remaining subjects were classified as non-IBD.

Using patients’ names and demographics, PHNs for each individual were captured in the HCIPR by CHR personnel. Subsequently, available PHNs were linked to the PC, DAD and ACCS databases.

IBD case ascertainment in administrative data

A total of 100 case definitions were developed depending on the number of physician claims (two to six claims), ACCS encounters (two or three encounters) and hospitalizations (one or two) with a diagnosis of IBD (ie, ICD-9 codes of 555 or 556; and ICD-10 codes of K50 or K51) in various time frames (one to five years). Each time frame ended with an administrative contact and did not necessarily fall within calendar or fiscal years. For this purpose, a time-floating computerized algorithm was developed (11).
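The time-floating idea can be illustrated with a short sketch. The Python below is not the study's algorithm, whose implementation is not reproduced here. It assumes that a definition is met when any one database reaches its minimum number of IBD-coded contacts inside a window running from one contact to a later one, and that a year can be approximated as 365.25 days. The function names and the 'hospitalizations-claims-ACCS-years' code string are illustrative.

```python
from datetime import date, timedelta

def reaches_threshold(contact_dates, minimum, years):
    """Return True if `minimum` contacts fall inside any window of `years` years.

    The window floats: it is anchored on the contacts themselves rather than
    on calendar or fiscal years.
    """
    window = timedelta(days=round(365.25 * years))  # assumption: year length in days
    dates = sorted(contact_dates)
    # Check every run of `minimum` consecutive contacts.
    for i in range(len(dates) - minimum + 1):
        if dates[i + minimum - 1] - dates[i] <= window:
            return True
    return False

def meets_definition(code, dad, pc, accs):
    """Apply a definition written as 'hospitalizations-claims-ACCS-years'.

    `dad`, `pc` and `accs` are lists of dates of IBD-coded contacts.
    Meeting the threshold in any one database is treated as sufficient.
    """
    min_dad, min_pc, min_accs, years = (int(part) for part in code.split("-"))
    return (reaches_threshold(dad, min_dad, years)
            or reaches_threshold(pc, min_pc, years)
            or reaches_threshold(accs, min_accs, years))

# Hypothetical patient: four physician claims spread over 23 months.
claims = [date(2000, 1, 1), date(2000, 6, 1), date(2001, 3, 1), date(2001, 12, 1)]
print(meets_definition("2-4-2-2", dad=[], pc=claims, accs=[]))
```

Because the window is tied to the contact dates, new administrative data can be appended and the same check rerun without redefining fixed periods.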

Fifty additional case definitions were developed without the incorporation of the ACCS database because the ACCS or similar databases of outpatient encounters are not currently available in several Canadian provinces.
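As a check on these counts, the parameter grid can be enumerated directly. The sketch below is illustrative only: the variable names are assumptions, and the grid for the definitions without ACCS is inferred by dropping the ACCS dimension from the ranges stated above.

```python
from itertools import product

hospitalizations = (1, 2)
physician_claims = range(2, 7)   # two to six claims
accs_encounters = (2, 3)
window_years = range(1, 6)       # one to five years

# Definitions that use all three databases.
with_accs = [f"{h}-{p}-{a}-{y}" for h, p, a, y in
             product(hospitalizations, physician_claims, accs_encounters, window_years)]

# Definitions that omit the ACCS database.
without_accs = [f"{h}-{p}-{y}" for h, p, y in
                product(hospitalizations, physician_claims, window_years)]

print(len(with_accs), len(without_accs))
```

Under these ranges the two grids contain 100 and 50 definitions, respectively, for 150 in total.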

Using medical chart reviews and the endoscopic database as the ‘gold standard’ diagnostic tool, diagnostic characteristics (sensitivity, specificity, positive predictive value and negative predictive value with exact 95% CIs) of each administrative case definition were determined using 2×2 contingency tables.

Validation of the administrative case definition for IBD

The Capital Health Region’s endoscopy database was searched from FY 1997/1998 through FY 2006/2007. Patients with an indication and diagnosis of IBD for their endoscopic procedure(s) were classified as IBD. Because there was no ICD code for ‘normal’ or ‘healthy’, capturing non-IBD subjects was not possible.

PHNs were linked to the PC, DAD and ACCS databases (FY 1995/1996 to FY 2006/2007) and IBD-related ICD codes were extracted. Each of the case definitions created in the development phase was examined. Using the endoscopic database as the ‘gold standard’ test, the sensitivity of each case definition was calculated with exact 95% CIs. Sensitivity was calculated with 10 years of administrative data (FY 1995/1996 to FY 2005/2006) to match the development phase (FY 1994/1995 to FY 2004/2005). Given the almost perfect specificity of preferable case definitions and lack of a population-based non-IBD cohort, validation of specificity was not feasible.

Additional administrative data from FY 2005/2006 and FY 2006/2007 were used to assess the stability of sensitivity of each case definition over 11 and 12 years.

Statistical analysis

The exact two-sample binomial test was used to compare the sensitivity of each case definition in development versus the validation phase. Assessment of the temporal stability of each case definition was also performed using the same test.

An α level of 0.1, instead of 0.05, was chosen to further decrease the possibility of accepting the null hypothesis by chance. All statistical analyses were performed using STATA 10.0 (Stata Corporation, USA).
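The exact procedure run in STATA is not spelled out here. As one plausible stand-in, the sketch below uses a two-sided Fisher's exact test to compare two sensitivities; this choice is an assumption, as are the function names. The development counts (1167 of 1399) are those reported in the Results for the final definition, while the validation count (3858 of 4637) is back-calculated from the reported 83.20% and is likewise an assumption.

```python
from math import exp, lgamma

def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)

def fisher_exact_two_sided(x1, n1, x2, n2):
    """Two-sided Fisher's exact P value for x1/n1 versus x2/n2."""
    successes = x1 + x2
    total = n1 + n2

    def log_prob(a):
        # Hypergeometric probability of `a` successes in group 1.
        return (log_comb(n1, a) + log_comb(n2, successes - a)
                - log_comb(total, successes))

    observed = log_prob(x1)
    lowest = max(0, successes - n2)
    highest = min(n1, successes)
    # Sum the probabilities of all tables no more likely than the observed one.
    p_value = sum(exp(log_prob(a)) for a in range(lowest, highest + 1)
                  if log_prob(a) <= observed + 1e-9)
    return min(1.0, p_value)

p = fisher_exact_two_sided(1167, 1399, 3858, 4637)
print(f"P = {p:.3f}; sensitivities differ at alpha 0.1: {p < 0.1}")
```

With the α level of 0.1 used in the study, a P value above 0.1 would be read as statistically equal sensitivity in the two phases.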

Selection of the final case definition

In order of importance, selection of the final administrative case definition was founded on the following:

  • Statistically equal sensitivity in the development and validation phase (ie, reproducibility).

  • Stability of sensitivity over time. As administrative data accumulate over time, a proportion of non-IBD subjects who incorrectly received IBD diagnostic codes will be misclassified as IBD. This will lead to a false increment in sensitivity and, conversely, a decrease in specificity. A case definition with unstable characteristics is not suitable for surveillance of administrative data due to the lack of reliability.

  • A very high specificity. IBD is a rare disease; therefore, a minimal false-positive rate is essential for a population-based diagnostic tool.

  • A reasonably high sensitivity. While a very high sensitivity is ideal, pursuing it would sacrifice specificity, which takes priority in the selection process. A sensitivity >75% was sought.

Distinction between UC and CD patients

To identify definite CD and UC patients, the IBD patients who were captured with the final case definition were evaluated in the development phase. Corresponding endoscopy report(s) for each patient were reviewed for indication, findings, or impression of UC or CD. Patients were categorized as IBDU if the review of the endoscopy report(s) failed to clearly differentiate UC from CD.

Within a 10-year time period (FY 1994/1995 to FY 2004/2005), health care contacts of UC and CD patients in the ACCS, DAD and PC databases were enumerated. Within each administrative database (ie, ACCS, DAD or PC), patients were given a score of +1 for each ICD diagnostic code for UC and a score of −1 for each ICD code for CD. Subsequently, for every patient, a cumulative score was calculated in each of the three databases. Considering various diagnostic cut-off values and different weights for each database, multiple combinations of these three scores were evaluated. The final scoring system classifies the subjects as CD or UC, or as IBDU if the score does not reach the cut-off values for diagnosis of CD and UC. Selection of the final scoring system was founded on high specificity and sensitivity rates to detect CD and UC patients.
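The scoring rule can be sketched as follows. This is an illustration rather than the study's code. The function names are hypothetical, the mapping of codes to diseases follows the standard ICD assignment (555 and K50 for CD, 556 and K51 for UC), and the default weights and cut-off are example values drawn from the candidate systems and cut-offs reported in the Results.

```python
CD_PREFIXES = ("555", "K50")   # Crohn disease in ICD-9 and ICD-10
UC_PREFIXES = ("556", "K51")   # ulcerative colitis in ICD-9 and ICD-10

def database_score(codes):
    """Cumulative score in one database: +1 per UC code, -1 per CD code."""
    score = 0
    for code in codes:
        if code.startswith(UC_PREFIXES):
            score += 1
        elif code.startswith(CD_PREFIXES):
            score -= 1
    return score

def classify(pc_codes, accs_codes, dad_codes,
             weights=(0.4, 1.0, 1.0), cutoff=2):
    """Combine weighted database scores and apply symmetric cut-offs.

    `weights` are applied to the PC, ACCS and DAD scores in that order;
    both `weights` and `cutoff` are example values, not fixed constants.
    """
    total = (weights[0] * database_score(pc_codes)
             + weights[1] * database_score(accs_codes)
             + weights[2] * database_score(dad_codes))
    if total > cutoff:
        return "UC"
    if total < -cutoff:
        return "CD"
    return "IBDU"

# Hypothetical patient with three CD-coded hospitalizations.
print(classify(pc_codes=[], accs_codes=[], dad_codes=["K50.1"] * 3))
```

Down-weighting physician claims relative to hospital and ambulatory records reflects the idea that a single claim code is weaker evidence than a coded admission.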

The coding system for indication of endoscopic procedures on IBD patients in the Capital Health Region does not differentiate UC from CD; hence, performance assessment of the scoring system was not feasible in the Capital Health Region. The various phases of the study are shown in Figure 1.

Figure 1)

Flow chart describing different phases of the study. ACCS Ambulatory Care Classification System; CD Crohn disease; DAD Hospital Discharge Abstract Database; HCIPR Health Care Insurance Plan Registry; IBD Inflammatory bowel disease; IBDU IBD unspecified; PC Physician claims; PHN Personal health number; UC Ulcerative colitis

RESULTS

Development phase

A manual and computerized search was conducted on 23,527 endoscopy reports of 21,193 patients. Of 21,193 patients, 17,699 (83.5%) were assigned a valid PHN through the linkage to HCIPR. Inability to assign a PHN to 16.5% of the individuals was mainly due to the procedures performed on non-CHR residents and missing demographic data in the endoscopy database. Based on the endoscopic data and 186 selected chart reviews, 17,699 individuals were categorized as non-IBD (n=15,439), possible IBD (n=861) and IBD patients (n=1399). Because the charts were not randomly selected, performance characteristics of endoscopic data in the diagnosis of IBD could not be calculated.

The PHNs of IBD and non-IBD patients were linked to the DAD, PC and ACCS databases for FY 1994/1995 through FY 2004/2005.

In total, 28,785 hospitalization records were found for 8632 patients. Sixty-one per cent of IBD patients experienced at least one hospital admission (median number of hospitalizations = 2), of which 75.4% had an ICD code for IBD. Non-IBD patients had a hospitalization rate of 50.4%, of which 0.08% had an ICD code for IBD.

Overall, 2,386,552 physician claims were found for the cohort of IBD and non-IBD patients. A total of 92.2% of IBD patients had at least one physician claim (median number of claims = 13), of which 95.0% had an ICD code for IBD. Moreover, 97.3% of non-IBD patients had a claim in the PC database, of which 3.8% had an ICD code for IBD.

In total, 240,142 contacts were found in the ACCS database. A total of 97.5% of IBD patients had at least one contact (median number of contacts = 2), of which 85.4% had an ICD code for IBD; 95.6% of non-IBD patients had a contact in the ACCS database, of which 0.6% had an ICD code for IBD. The individual performance of each administrative database is summarized in Table 1.

TABLE 1.

Individual performance of administrative databases during the period from 1995 to 2004

Database  IBD contacts   IBD (n=1399)   Non-IBD (n=15,439)
ACCS      None           247 (17.7)     15,382 (99.6)
ACCS      At least one   1152 (82.3)    57 (0.4)
ACCS      At least two   868 (62.0)     10 (0.06)
PC        None           169 (12.1)     14,915 (96.6)
PC        At least one   1230 (87.9)    524 (3.4)
PC        At least two   1197 (85.6)    169 (1.1)
DAD       None           755 (54.0)     15,426 (99.9)
DAD       At least one   644 (46.0)     13 (0.08)
DAD       At least two   476 (34.0)     4 (0.03)

Data presented as n (%). ACCS Ambulatory Care Classification System; DAD Hospital Discharge Abstract Database; IBD Inflammatory bowel disease; PC Physician billing claims

Using the endoscopy database and chart reviews as the gold standard, the characteristics of 150 case definitions were calculated in the cohort of IBD and non-IBD patients (13).

Validation phase

From FY 1996/1997 through FY 2006/2007, 318,382 endoscopic procedures were performed on 170,443 individuals in the Capital Health Region. Using the indications and diagnoses of the procedures, 5433 IBD patients were identified. The PHNs of patients were linked to the DAD, PC and ACCS databases from FY 1994/1995 through FY 2006/2007. Of 5433 patients, 5201 had a record in Alberta administrative databases and 4637 had a record between FY 1994/1995 and FY 2004/2005.

In total, 21,760 individuals with 12,554, 54,963 and 14,162 health care contacts with a diagnosis of IBD were identified within the DAD, ACCS and PC databases, respectively. The median number of health care contacts could not be calculated because only IBD-related ICD codes were extracted from the administrative databases.

Each of the previously developed 150 case definitions was applied to the IBD cohort and statistical comparisons of case definition characteristics in the validation and development phases were made. Fifteen case definitions that included the ACCS database and 14 case definitions that did not incorporate the ACCS database had statistically similar sensitivity in the validation phase (Tables 2 and 3). Temporal stability of the sensitivity of each validated case definition was also measured.

TABLE 2.

Characteristics of favourable validated case definitions

Case          Sensitivity          Specificity          Positive predictive  Negative predictive  Validated            P
definition*                                             value                value                sensitivity
1-3-2-2       88.85 (88.07–90.45)  99.58 (99.46–99.68)  95.31 (93.71–96.14)  98.99 (98.83–99.15)  89.91 (89.00–90.76)  0.255
1-3-3-2       86.71 (84.81–88.44)  99.60 (99.49–99.69)  95.14 (93.81–96.25)  98.81 (98.62–98.97)  86.39 (85.37–87.37)  0.764
1-3-3-3       87.78 (85.95–89.45)  99.59 (99.47–99.68)  95.05 (93.72–96.17)  98.90 (98.72–99.06)  87.97 (87.00–88.89)  0.849
1-3-3-4       87.99 (86.17–89.65)  99.57 (99.45–99.66)  94.84 (93.49–95.98)  98.92 (98.74–99.08)  88.59 (87.64–89.49)  0.538
1-4-3-2       81.70 (79.57–83.70)  99.78 (99.69–99.85)  97.11 (95.99–97.99)  98.37 (98.15–99.56)  82.90 (81.78–83.97)  0.300
1-4-3-3       83.56 (81.51–85.47)  99.77 (99.68–99.84)  97.01 (95.89–97.90)  98.53 (98.33–98.71)  85.08 (84.02–86.09)  0.167
2-3-2-2       88.13 (86.32–89.78)  99.62 (99.51–99.71)  95.43 (94.15–96.51)  98.93 (98.76–99.09)  86.76 (85.74–87.72)  0.179
2-3-2-3       88.92 (87.16–90.52)  99.61 (99.50–99.70)  95.40 (94.12–96.47)  99.00 (99.83–99.15)  88.31 (87.35–89.22)  0.532
2-3-2-4       89.14 (87.39–90.72)  99.59 (99.47–99.68)  95.12 (93.81–96.22)  99.02 (98.85–99.17)  89.45 (88.53–90.32)  0.734
2-4-2-2       83.42 (81.36–85.33)  99.80 (99.72–99.86)  97.41 (96.35–98.24)  98.52 (98.32–98.70)  83.20 (82.09–84.26)  0.849
2-4-2-3       84.78 (82.78–86.62)  99.79 (99.71–99.86)  97.37 (96.31–98.20)  98.64 (98.44–98.81)  85.44 (84.40–86.45)  0.536
2-4-2-4       85.28 (83.31–87.09)  99.78 (99.69–99.85)  97.23 (96.15–98.07)  98.68 (98.49–98.85)  86.97 (85.97–87.93)  0.102
2-5-3-2       74.77 (72.41–77.03)  99.90 (99.84–99.95)  98.59 (97.68–99.21)  97.76 (97.52–97.99)  72.91 (71.61–74.19)  0.169
2-5-3-3       77.06 (74.46–79.23)  99.89 (99.83–99.94)  98.54 (97.64–99.16)  97.96 (97.73–98.18)  76.67 (75.42–77.88)  0.763
2-5-3-4       78.84 (76.61–80.96)  99.89 (99.83–99.94)  98.57 (97.69–99.18)  98.12 (97.89–98.32)  78.63 (77.42–79.80)  0.864

Data presented as % (95% CI) unless otherwise indicated. The final definition is 2-4-2-2.

*The first digit represents the minimum number of hospitalization(s) required for an individual to be considered an inflammatory bowel disease patient. Similarly, the second and the third digits represent the minimum number of health care contacts in the physician claims and Ambulatory Care Classification System databases, respectively. The fourth digit denotes the maximum time (in years) for the definition to be fulfilled

TABLE 3.

Characteristics of validated case definitions using hospital discharge abstract and physician claim databases

Case          Sensitivity          Specificity          Positive predictive  Negative predictive  Validated            P
definition*                                             value                value                sensitivity
1-3-5*        84.99 (83.01–86.82)  99.59 (99.47–99.68)  94.89 (93.52–96.05)  98.65 (98.46–98.83)  83.17 (82.07–84.25)  0.220
1-4-2         77.98 (75.72–80.13)  99.80 (99.72–99.86)  97.24 (96.10–98.12)  98.04 (97.81–98.25)  77.03 (75.79–78.24)  0.457
1-4-3         80.13 (77.94–82.19)  99.77 (99.70–99.85)  97.14 (96.01–98.02)  98.23 (98.01–98.43)  78.71 (77.51–79.89)  0.255
1-4-4         80.70 (78.53–82.74)  99.78 (99.69–99.85)  97.08 (95.94–97.97)  98.29 (98.06–98.48)  79.64 (78.45–80.79)  0.387
1-4-5         81.06 (78.90–83.08)  99.77 (99.69–99.84)  97.01 (95.86–97.91)  98.31 (98.09–98.51)  80.18 (79.00–81.32)  0.469
1-5-1         67.19 (64.66–69.65)  99.90 (99.83–99.94)  98.33 (97.30–99.04)  97.11 (96.84–97.36)  68.28 (66.92–69.62)  0.445
1-5-2         72.41 (69.99–74.74)  99.88 (99.81–99.93)  98.16 (97.14–98.89)  97.56 (97.31–97.79)  73.45 (72.16–74.72)  0.439
1-5-3         74.55 (72.19–76.82)  99.86 (99.79–99.92)  98.03 (97.00–98.77)  97.74 (97.50–97.97)  75.48 (74.22–76.71)  0.481
1-5-4         76.20 (73.88–78.41)  99.86 (99.79–99.92)  98.07 (97.06–98.80)  97.89 (97.65–98.11)  76.86 (75.62–78.07)  0.607
1-5-5         76.84 (74.54–79.03)  99.86 (99.79–99.92)  98.08 (97.09–98.81)  97.94 (97.71–98.16)  77.68 (76.45–78.87)  0.510
1-6-2         68.19 (65.68–70.63)  99.89 (99.83–99.94)  98.35 (97.34–99.05)  97.20 (96.93–97.45)  69.70 (68.35–71.02)  0.283
1-6-3         70.34 (67.87–72.72)  99.89 (99.82–99.94)  98.30 (97.30–99.01)  97.38 (97.12–97.62)  72.44 (71.13–73.72)  0.125
1-6-4         72.05 (69.62–74.39)  99.88 (99.82–99.93)  98.25 (97.24–98.96)  97.53 (97.27–97.76)  73.82 (72.53–75.08)  0.190
1-6-5         73.27 (70.87–75.57)  99.88 (99.82–99.93)  98.27 (97.29–98.97)  97.63 (97.38–97.86)  74.68 (73.40–75.93)  0.288

Data presented as % (95% CI) unless otherwise indicated. The final definition is 1-4-2.

*The first and second digits represent the minimum number of hospitalization(s) and physician claims, respectively, required for an individual to be considered an inflammatory bowel disease patient. The third digit denotes the maximum time (in years) for the definition to be fulfilled

Selection of the final case definition

After consideration of sensitivity, specificity and temporal stability of each validated case definition, a diagnosis of IBD was assigned to individuals who experienced at least two hospitalizations or four physician claims, or two contacts in the ACCS database with an IBD diagnostic code within a two-year period (definition 2-4-2-2). A 2×2 contingency table and stability analysis for the final definition are shown in Table 4 and Figure 2.

TABLE 4.

Contingency 2×2 table yielded for the final administrative definition (ie, 2-4-2-2)

                                            Administrative definition
Endoscopy + chart reviews (gold standard)   IBD     Non-IBD
IBD                                         1167    232
Non-IBD                                     31      15,408

IBD Inflammatory bowel disease
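The performance figures for the final definition can be recomputed from the counts in Table 4. The sketch below is illustrative: it assumes that the exact 95% CIs are Clopper-Pearson intervals, which it finds by bisection on the binomial tail, and all function names are hypothetical.

```python
from math import exp, lgamma, log

def binom_pmf(k, n, p):
    """Binomial probability P(X = k), computed in log space to avoid overflow."""
    if p <= 0.0:
        return 1.0 if k == 0 else 0.0
    if p >= 1.0:
        return 1.0 if k == n else 0.0
    log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(p) + (n - k) * log(1.0 - p))
    return exp(log_pmf)

def upper_tail(x, n, p):
    """P(X >= x) for X ~ Binomial(n, p), summing whichever tail is shorter."""
    if x <= 0:
        return 1.0
    if x > n:
        return 0.0
    if n - x < x:
        return sum(binom_pmf(i, n, p) for i in range(x, n + 1))
    return 1.0 - sum(binom_pmf(i, n, p) for i in range(x))

def exact_ci(x, n, alpha=0.05):
    """Clopper-Pearson interval for x successes in n trials, found by bisection."""
    def solve(target, start):
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2.0
            # upper_tail rises with p, so move the lower edge up while below target.
            if upper_tail(start, n, mid) < target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2.0
    lower = 0.0 if x == 0 else solve(alpha / 2.0, x)
    upper = 1.0 if x == n else solve(1.0 - alpha / 2.0, x + 1)
    return lower, upper

# Table 4 counts: rows are the gold standard, columns the administrative definition.
tp, fn, fp, tn = 1167, 232, 31, 15408

metrics = {
    "sensitivity": (tp, tp + fn),
    "specificity": (tn, tn + fp),
    "positive predictive value": (tp, tp + fp),
    "negative predictive value": (tn, tn + fn),
}
for name, (x, n) in metrics.items():
    low, high = exact_ci(x, n)
    print(f"{name}: {100 * x / n:.2f} ({100 * low:.2f} to {100 * high:.2f})")
```

The point estimates agree with the 2-4-2-2 row of Table 2 (83.42, 99.80, 97.41 and 98.52), and the intervals should land close to the reported ones if the Clopper-Pearson assumption holds.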

Figure 2)

Stability of the sensitivity of the final case definition (2-4-2-2). FY Fiscal year; NS Not significant

Sensitivity of the final case definition for females and males was 82.56% (95% CI 80.94% to 84.17%) and 83.86% (95% CI 82.42% to 85.30%), respectively.

There was an inverse association between the age of the patients at the time of health care contact and the sensitivity of the final case definition. Sensitivity for the age groups <18, 18 to 39, 40 to 59 and >60 years was 85.76% (95% CI 81.41% to 89.42%), 78.78% (95% CI 77.11% to 80.38%), 75.51% (95% CI 73.68% to 77.27%) and 75.23% (95% CI 71.71% to 78.52%), respectively.

Selection of the final case definition without incorporation of the ACCS database

It was proposed that individuals with at least one hospitalization or four physician claims with an IBD diagnostic code within a two-year period should be considered as IBD (Figure 3).

Figure 3)

Stability of the sensitivity of the final case definition without incorporation of the Ambulatory Care Classification System database (1-4-2). FY Fiscal year; NS Not significant

Sensitivity of the final case definition for females and males was 78.48% (95% CI 76.87% to 80.09%) and 75.51% (95% CI 73.68% to 77.33%), respectively. Sensitivity for the age groups <18, 18 to 39, 40 to 59 and >60 years was 81.96% (95% CI 77.27% to 86.04%), 76.89% (95% CI 75.17% to 78.53%), 73.64% (95% CI 7.76% to 75.45%) and 76.47% (95% CI 73.00% to 79.69%), respectively.

Development of a scoring system to classify IBD patients as UC or CD

Of 1399 definite IBD patients in the development phase, 1167 could be captured with the final case definition. From May 2000 to March 2004, these patients underwent 2457 endoscopic procedures. Through manual and computerized searches of endoscopic reports, 430 (36.8%) and 342 (29.3%) patients could be classified as definite CD or definite UC, respectively. Indication, findings and impression of endoscopic report(s) were inadequate to classify 365 (31.3%) patients as definite UC or definite CD. The PHNs of CD and UC patients were linked to the ACCS, DAD and PC databases (FY 1994/1995 to FY 2004/2005).

The characteristics of various scoring systems were calculated. The performance characteristics of selected scoring systems are shown in Table 5. Cut-off values of ‘greater than +2’ (for UC patients) and ‘less than −2’ (for CD patients) were found to elicit an optimal balance between specificity and sensitivity to detect UC and CD patients. Patients with a score equal to or between −2 and +2 were classified as IBDU.

TABLE 5.

Characteristics of favourable scoring systems to differentiate Crohn disease (CD) and ulcerative colitis (UC)

Scoring system§   Specificity             Sensitivity             IBDU‡
                  UC*         CD†         UC*         CD†
½PC+ACCS+DAD      99.42±0.4   98.84±0.5   84.8±1.9    93.49±1.2   7.9±1.0
0.4PC+ACCS+DAD    99.71±0.3   99.07±0.5   86.26±1.9   93.49±1.2   8.3±1.0
⅓PC+DAD+ACCS      99.71±0.3   99.07±0.5   85.66±1.9   93.26±1.2   9.3±1.1
¼PC+DAD+ACCS      99.71±0.3   99.30±0.4   77.78±2.24  95.12±1.0   12.56±1.2
½PC+⅔ACCS+DAD     99.12±0.5   99.07±0.5   86.84±1.8   91.86±1.3   9.3±1.1
⅓PC+DAD+⅔ACCS     99.71±0.3   99.30±0.4   89.82±1.6   89.77±1.5   13.86±1.2

Data presented as % ± SD. Final definition indicated in bold.

*UC: Patients with a cumulative score >+2 are classified as UC; †CD: Patients with a cumulative score <−2 are classified as CD; ‡IBDU: Percentage ±1 SD of patients with a score equal to or between −2 and +2; §Ratios represent the weight assigned to each medical contact for the corresponding database. ACCS Ambulatory Care Classification System; DAD Hospital Discharge Abstract Database; IBDU Inflammatory bowel disease unspecified; PC Physician billing claims

DISCUSSION

A large proportion of IBD patients are diagnosed and managed in outpatient clinics. Only 60% of IBD patients are managed by gastroenterologists (10); the present study showed that less than one-half of IBD patients are hospitalized within an 11-year period. Therefore, the epidemiology of IBD will be under-represented if patients are captured only through hospitals and outpatient gastroenterology clinics. Health care administrative databases are invaluable to investigate the population-based epidemiology of IBD and provide an effective tool for ongoing disease surveillance. Canada has a universal public health care insurance program that provides access to comprehensive coverage for inpatient and outpatient physician services. This is in contrast to the administrative databases in the United States, which only cover the populations they serve.

Identification of the initial cohort from an independent source is a critical factor to accurately estimate the sensitivity of an administrative case definition. For this reason, we identified IBD and non-IBD patients from an endoscopy database. In addition, we integrated medical chart reviews as another independent source to enhance the accuracy of disease status. Some of the signs and symptoms of IBD are shared by other diseases; therefore, rather than a binary classification of our patients into IBD and non-IBD, we implemented a third group as ‘possible IBD’. Defining a ‘possible IBD’ category decreases the number of false positives and false negatives in IBD and non-IBD groups, respectively.

In practice, it is not uncommon for a patient’s diagnosis of UC to change to CD or vice versa (12). Moreover, patients with IBDU, with no specific ICD code, may arbitrarily be assigned CD or UC codes. Separate definitions for CD or UC may fail to identify or may misclassify these two groups of patients; therefore, we initially developed a definition to detect IBD patients (ie, CD, UC and IBDU patients together) in the administrative databases. Following the identification of an IBD patient, we used a novel cumulative scoring system to categorize patients into CD, UC or IBDU. While this methodology categorized CD and UC patients as IBDU in approximately 8% of cases, this deficiency is compensated with high specificity rates (>99%) to detect CD and UC patients. Highly specific cohorts of CD or UC patients will enhance the quality and accuracy of population-based research, particularly genetic and environmental interaction studies.

No single administrative database could satisfactorily identify an accurate number of IBD patients. For example, 12.1% of IBD patients did not have an IBD claim in the PC database within a 10-year period (Table 1). In addition, in the same time period, 54.0% of IBD patients did not experience a hospitalization. A study from Ontario suggested that 8.6% of the physicians who see IBD patients do not use specific ICD-9 codes (13). In this context, in addition to commonly used DAD and PC databases, we incorporated the ACCS database, which is of significant value for IBD patients who repeatedly undergo endoscopic procedures for diagnosis, evaluation of disease activity and cancer screening. The ACCS database enabled us to identify 28.1% of IBD patients who failed to be captured using the DAD and PC databases. The National Ambulatory Care Reporting System that is being implemented by the Canadian Institute for Health Information shares similar content and function with ACCS and may be used as an alternative.

The value of each administrative database to identify IBD and non-IBD cases was not homogeneous. Whereas hospitalization with a diagnostic code of IBD correlated correctly with the gold standard 98.0% of the time, the corresponding figure for one IBD claim in the PC database was only 70.0%. Therefore, in the development of our IBD case definition and our scoring system to differentiate CD and UC, different weights were applied to the IBD health care contacts in the DAD, PC and ACCS databases.
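These two figures appear to correspond to the positive predictive value of having at least one IBD-coded contact, which can be recomputed from the counts in Table 1. The sketch below is an illustration under that assumption; the variable and function names are hypothetical.

```python
# (IBD patients, non-IBD patients) with at least one IBD-coded contact, from Table 1.
at_least_one_contact = {
    "DAD": (644, 13),
    "PC": (1230, 524),
    "ACCS": (1152, 57),
}

def positive_predictive_value(true_pos, false_pos):
    """Percentage of coded individuals who are IBD by the gold standard."""
    return 100.0 * true_pos / (true_pos + false_pos)

ppv = {name: round(positive_predictive_value(tp, fp), 1)
       for name, (tp, fp) in at_least_one_contact.items()}
print(ppv)
```

Under this reading, the DAD value comes out at about 98.0% and the PC value at about 70.1%, close to the figures quoted above, which supports weighting a hospitalization more heavily than a single physician claim.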

To enhance the reproducibility of an administrative definition, the currently acceptable method is to fractionate the data into shorter fixed periods. For instance, Hux et al (14) suggested division of a 10-year period into two-year phases to develop an administrative definition for diabetes in Ontario. We used a novel algorithm to enumerate ICD codes in multiple time periods to define IBD cases with optimal generalizability. Each time period starts with a health care contact and does not necessarily fall within a fiscal or calendar year. This method may have the advantage of capturing IBD flares by identifying peaks of health care contacts. Moreover, it facilitates future updates of the IBD database with ongoing acquisition of administrative data.

The difference among the test characteristics of a large number of case definitions in the development phase was minimal, making it difficult to select the final case definition. This finding was also observed in a study involving pediatric IBD patients (15). Statistical methods, such as Youden’s index (16), aim for a balance between sensitivity and specificity; however, in population-based case definitions for rare diseases, a higher weight should be given to the specificity rate to minimize the number of false positives (9). The ideal method to compare multiple case definitions is through external validation with an independent source of diseased and nondiseased individuals. In Alberta, we have a unique opportunity with two distinct health regions with similar populations (ie, Capital Health Region [Edmonton] and the CHR). In addition, both health regions benefit from endoscopy databases independently collected from the DAD, ACCS and PC databases that enabled us to successfully validate the final case definitions.
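The reason a balanced statistic can be insufficient for a rare disease may be easier to see with numbers. The sketch below computes Youden's index (sensitivity + specificity − 1) for two hypothetical case definitions and the number of false positives each would be expected to generate; the test characteristics, the prevalence of 0.5% and the population size are illustrative assumptions, not results from the present study.

```python
def youden_index(sensitivity, specificity):
    """Youden's J: weighs sensitivity and specificity equally."""
    return sensitivity + specificity - 1.0


def expected_false_positives(specificity, prevalence, population):
    """Nondiseased individuals expected to be flagged as cases."""
    return (1.0 - specificity) * (1.0 - prevalence) * population


# Two hypothetical definitions with (sensitivity, specificity).
definition_a = (0.90, 0.95)
definition_b = (0.86, 0.99)

j_a = youden_index(*definition_a)
j_b = youden_index(*definition_b)

# Assumed population of one million with an assumed prevalence of 0.5%.
fp_a = expected_false_positives(definition_a[1], 0.005, 1_000_000)
fp_b = expected_false_positives(definition_b[1], 0.005, 1_000_000)
```

Both hypothetical definitions have essentially the same Youden's index, yet the less specific one yields about five times as many false positives, which in a low-prevalence setting can outnumber the true cases.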

To our knowledge, no attention has been devoted to the temporal stability of the test characteristics of administrative case definitions. Applying a case definition to a time period longer than the one in which it was developed may disproportionately increase the number of false positives and decrease the number of false negatives. Consequently, the sensitivity of the case definition will increase and its specificity will decrease, leading to inaccurate interpretation of administrative data. To avoid such systematic error, through a novel approach, we examined the stability of case definitions based on 10, 11 and 12 years of data (Figures 2 and 3).

The sensitivity of the case definition is higher in younger age groups. A similar pattern has been observed in other administrative databases (15). This finding has been attributed to closer follow-up in pediatric patients and also to the presence of other comorbidities in older age groups. For example, an elderly patient with IBD who presents with cardiac chest pain might not receive an IBD code during his health care contact.

A potential limitation of the use of administrative databases is the accuracy of the collected data (17). However, coding accuracy of administrative databases in Alberta has been validated and data have been used extensively in epidemiological and outcome studies (18–21). Inherently, surgical/procedural and administrative databases capture more severe cases of disease (22). Similarly, moderate and severe cases of IBD may be over-represented in our cohort. This may significantly affect the generalizability of the results.

The disease status of the majority of our patients was determined through the endoscopy reports. Diagnosis of IBD may be recorded prematurely before the pathological confirmation of the disease. In the development phase, 89.9% of the endoscopic procedures performed on ‘definite IBD’ patients had an indication of IBD. In the validation phase, diagnosis of IBD was only assigned to patients whose endoscopy reports had an indication of IBD. Therefore, given the diagnosis of IBD in the indication of the procedure, the majority of our patients were known to have IBD. However, misclassification of disease status remains a potential systematic error in our study.

In the development phase, we linked the HCIPR with an endoscopy database using a combination of personal identifiers such as date of birth, sex and name. The ‘deterministic record linkage’ method is inferior to the use of unique identifiers (ie, PHN). However, in a recent study conducted in Alberta, the correct linkage rate was estimated to be 96.9% (23).
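For readers unfamiliar with deterministic linkage, the following sketch shows the general technique of exact matching on a combination of personal identifiers. It is an illustration only: the record layout, field names and normalization rules are hypothetical and do not describe the actual HCIPR or endoscopy database schemas.

```python
def link_key(record):
    """Build a comparison key from name, date of birth and sex (assumed fields)."""
    return (
        record["name"].strip().lower(),
        record["dob"],
        record["sex"].strip().upper(),
    )


def deterministic_link(registry, endoscopy):
    """Return (endoscopy id, registry id) pairs for unambiguous exact matches."""
    index = {}
    for rec in registry:
        index.setdefault(link_key(rec), []).append(rec)
    links = []
    for rec in endoscopy:
        matches = index.get(link_key(rec), [])
        # Keys shared by several registry records are left unlinked,
        # because an exact match cannot tell those individuals apart.
        if len(matches) == 1:
            links.append((rec["id"], matches[0]["id"]))
    return links
```

A misspelled name or a mistyped date of birth produces a missed link under this scheme, which is why linkage on a unique identifier such as the PHN is preferred when it is available.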

To improve the specificity of our case definition, we did not include the ‘possible IBD’ cohort in the development phase. However, a subset of ‘possible IBD’ patients who have IBD do not necessarily have the same clinical and coding characteristics as those in the ‘definite IBD’ cohort. This may have introduced a selection bias in the study.

In the validation phase, patients were considered to have IBD only if they underwent endoscopies with both an indication and a diagnosis of IBD. Because these patients underwent an endoscopic procedure with an indication of IBD, in addition to receiving a final diagnosis of IBD, we speculated that the accuracy of this strategy should be very high. Objective assessment of this strategy was not performed because structured outpatient and inpatient chart reviews were not feasible on a large random sample of patients scattered over a vast geographical region.

In the development phase, patients were identified through an endoscopy database; therefore, the prevalence of IBD in our cohort was higher than in the general population. Our reported positive predictive value, which is influenced by disease prevalence, is likely overestimated and, in contrast, negative predictive value is likely to be underestimated. This limitation does not apply to the sensitivity and specificity rates, which are independent of prevalence.
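The dependence of the predictive values on prevalence follows directly from Bayes' theorem and can be made concrete with a short calculation. In the sketch below, the sensitivity and specificity are those reported in the abstract for the final case definition; the two prevalence values are illustrative assumptions chosen to represent an endoscopy-based cohort and a general population, respectively.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value: true positives over all positives."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def npv(sensitivity, specificity, prevalence):
    """Negative predictive value: true negatives over all negatives."""
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


SENSITIVITY = 0.834  # reported in the abstract
SPECIFICITY = 0.998  # reported in the abstract

# Assumed prevalences: enriched cohort versus general population.
for prevalence in (0.08, 0.005):
    print(
        f"prevalence={prevalence:.3f} "
        f"PPV={ppv(SENSITIVITY, SPECIFICITY, prevalence):.3f} "
        f"NPV={npv(SENSITIVITY, SPECIFICITY, prevalence):.3f}"
    )
```

With sensitivity and specificity held fixed, lowering the prevalence reduces the positive predictive value and raises the negative predictive value, which is the direction of bias described above.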

CONCLUSION

The present study demonstrated that IBD patients can be accurately identified in administrative databases. Through a novel methodology, we have developed a reproducible set of criteria to capture IBD patients through administrative databases and classify them as UC, CD or IBDU. Application of this case definition to the administrative databases in Alberta will result in the formation of the largest IBD database in North America. Our methodology might be used to develop similar administrative definitions for chronic diseases and also to establish population-based IBD databases.

Footnotes

DISCLOSURES: The authors have no financial disclosures or conflicts of interest to declare.

REFERENCES

1. Romano C, Famiani A, Gallizzi R, Comito D, Ferrau’ V, Rossi P. Indeterminate colitis: A distinctive clinical pattern of inflammatory bowel disease in children. Pediatrics. 2008;122:e1278–81. doi: 10.1542/peds.2008-2306.
2. Rubin GP, Hungin AP, Kelly PJ, Ling J. Inflammatory bowel disease: Epidemiology and management in an English general practice population. Aliment Pharmacol Ther. 2000;14:1553–9. doi: 10.1046/j.1365-2036.2000.00886.x.
3. Loftus EV Jr. Clinical epidemiology of inflammatory bowel disease: Incidence, prevalence, and environmental influences. Gastroenterology. 2004;126:1504–17. doi: 10.1053/j.gastro.2004.01.063.
4. Pinchbeck BR, Imes S, Dinwoodie A, Thomson AB. Discriminant function analysis to calculate a Crohn’s activity group scale to predict future inactive or active disease. J Clin Gastroenterol. 1988;10:498–504. doi: 10.1097/00004836-198810000-00006.
5. Loftus CG, Loftus EV Jr, Harmsen WS, et al. Update on the incidence and prevalence of Crohn’s disease and ulcerative colitis in Olmsted County, Minnesota, 1940–2000. Inflamm Bowel Dis. 2007;13:254–61. doi: 10.1002/ibd.20029.
6. Kurata JH, Kantor-Fish S, Frankl H, Godby P, Vadheim CM. Crohn’s disease among ethnic groups in a large health maintenance organization. Gastroenterology. 1992;102:1940–8. doi: 10.1016/0016-5085(92)90317-r.
7. Stowe SP, Redmond SR, Stormont JM, et al. An epidemiologic study of inflammatory bowel disease in Rochester, New York. Hospital incidence. Gastroenterology. 1990;98:104–10. doi: 10.1016/0016-5085(90)91297-j.
8. Hiatt RA, Kaufman L. Epidemiology of inflammatory bowel disease in a defined northern California population. West J Med. 1988;149:541–6.
9. Bernstein CN, Wajda A, Svenson LW, et al. The epidemiology of inflammatory bowel disease in Canada: A population-based study. Am J Gastroenterol. 2006;101:1559–68. doi: 10.1111/j.1572-0241.2006.00603.x.
10. Bernstein CN, Blanchard JF, Rawsthorne P, Wajda A. Epidemiology of Crohn’s disease and ulcerative colitis in a central Canadian province: A population-based study. Am J Epidemiol. 1999;149:916–24. doi: 10.1093/oxfordjournals.aje.a009735.
11. Rezaie A. Development of an ICD coding definition for inflammatory bowel disease (Master’s thesis). Alberta: University of Calgary; 2007. <www.dspace.ucalgary.ca/bitstream/1880//3/Rezaie_MSc_2007_Med.pdf>.
12. Pimentel M, Chang M, Chow EJ, et al. Identification of a prodromal period in Crohn’s disease but not ulcerative colitis. Am J Gastroenterol. 2000;95:3458–62. doi: 10.1111/j.1572-0241.2000.03361.x.
13. Farrokhyar F, McHugh K, Irvine EJ. Self-reported awareness and use of the International Classification of Diseases coding of inflammatory bowel disease services by Ontario physicians. Can J Gastroenterol. 2002;16:519–26. doi: 10.1155/2002/619574.
14. Hux JE, Ivis F, Flintoft V, Bica A. Diabetes in Ontario: Determination of prevalence and incidence using a validated administrative data algorithm. Diabetes Care. 2002;25:512–6. doi: 10.2337/diacare.25.3.512.
15. Benchimol EI, Guttmann A, Griffiths AM, et al. Increasing incidence of paediatric inflammatory bowel disease in Ontario, Canada: Evidence from health administrative data. Gut. 2009;58:1490–7. doi: 10.1136/gut.2009.188383.
16. Youden WJ. Index for rating diagnostic tests. Cancer. 1950;3:32–5. doi: 10.1002/1097-0142(1950)3:1<32::aid-cncr2820030106>3.0.co;2-3.
17. Studney DR, Hakstian AR. A comparison of medical record with billing diagnostic information associated with ambulatory medical care. Am J Public Health. 1981;71:145–9. doi: 10.2105/ajph.71.2.145.
18. Patten SB, Svenson LW, White CM, Khaled SM, Metz LM. Affective disorders in motor neuron disease: A population-based study. Neuroepidemiology. 2007;28:1–7. doi: 10.1159/000097849.
19. Hauck LJ, White C, Feasby TE, Zochodne DW, Svenson LW, Hill MD. Incidence of Guillain-Barre syndrome in Alberta, Canada: An administrative data study. J Neurol Neurosurg Psychiatry. 2008;79:318–20. doi: 10.1136/jnnp.2007.118810.
20. Gao S, Manns BJ, Culleton BF, et al. Access to health care among status Aboriginal people with chronic kidney disease. CMAJ. 2008;179:1007–12. doi: 10.1503/cmaj.080063.
21. Sadowski DC, Ackah F, Jiang B, Svenson LW. Achalasia: Incidence, prevalence and survival. A population-based study. Neurogastroenterol Motil. 2010;22:e256–61. doi: 10.1111/j.1365-2982.2010.01511.x.
22. Chini F, Pezzotti P, Orzella L, Borgia P, Guasticchi G. Can we use the pharmacy data to estimate the prevalence of chronic conditions? A comparison of multiple data sources. BMC Public Health. 2011;11:688. doi: 10.1186/1471-2458-11-688.
23. Li B, Quan H, Fong A, Lu M. Assessing record linkage between health care and Vital Statistics databases using deterministic methods. BMC Health Serv Res. 2006;6:48. doi: 10.1186/1472-6963-6-48.

Articles from Canadian Journal of Gastroenterology are provided here courtesy of Wiley
