Annals of the American Thoracic Society. 2017 Jun;14(6):880–887. doi: 10.1513/AnnalsATS.201610-764OC

Code-based Diagnostic Algorithms for Idiopathic Pulmonary Fibrosis. Case Validation and Improvement

Brett Ley 1, Thomas Urbania 2, Gail Husson 3, Eric Vittinghoff 4, David R Brush 5, Mark D Eisner 6, Carlos Iribarren 3, Harold R Collard 1
PMCID: PMC5566307  PMID: 28355518

Abstract

Rationale: Population-based studies of idiopathic pulmonary fibrosis (IPF) in the United States have been limited by reliance on diagnostic code–based algorithms that lack clinical validation.

Objectives: To validate a well-accepted International Classification of Diseases, Ninth Revision, code–based algorithm for IPF using patient-level information and to develop a modified algorithm for IPF with enhanced predictive value.

Methods: The traditional IPF algorithm was used to identify potential cases of IPF in the Kaiser Permanente Northern California adult population from 2000 to 2014. Incidence and prevalence were determined overall and by age, sex, and race/ethnicity. A validation subset of cases (n = 150) underwent expert medical record and chest computed tomography review. A modified IPF algorithm was then derived and validated to optimize positive predictive value.

Results: From 2000 to 2014, the traditional IPF algorithm identified 2,608 cases among 5,389,627 at-risk adults in the Kaiser Permanente Northern California population. Annual incidence was 6.8/100,000 person-years (95% confidence interval [CI], 6.1–7.7) and was higher in patients with older age, male sex, and white race. The positive predictive value of the IPF algorithm was only 42.2% (95% CI, 30.6 to 54.6%); sensitivity was 55.6% (95% CI, 21.2 to 86.3%). The corrected incidence was estimated at 5.6/100,000 person-years (95% CI, 2.6–10.3). A modified IPF algorithm had improved positive predictive value but reduced sensitivity compared with the traditional algorithm.

Conclusions: A well-accepted International Classification of Diseases, Ninth Revision, code–based IPF algorithm performs poorly, falsely classifying many non-IPF cases as IPF and missing a substantial proportion of IPF cases. A modification of the IPF algorithm may be useful for future population-based studies of IPF.

Keywords: idiopathic pulmonary fibrosis, incidence, prevalence


Idiopathic pulmonary fibrosis (IPF) is a progressive fibrotic lung disease of unknown cause affecting older adults (1). IPF is considered an uncommon disease, but accurate estimates of its incidence and prevalence have been difficult to determine for multiple reasons. Clinically, the diagnosis of IPF is complex, and diagnostic criteria have evolved over the last 15 years, which may lead to diagnostic misclassification. Identifying cases in large administrative databases is further complicated by nonrepresentative diagnostic codes for IPF and potential for varying coding practices among providers.

Authors of a recent systematic review estimated incidence rates for IPF of 3–9 per 100,000 person-years in the United States and Europe, but the methodology of underlying epidemiologic studies varied widely (2). In the United States, estimates of IPF incidence and prevalence are based primarily on studies conducted in administrative databases that have applied diagnostic code–based algorithms that lacked patient-level case validation (3, 4). A recent study conducted with a large private insurance database suggested that these algorithms have poor positive predictive value (PPV) for identifying IPF on the basis of expert medical record review of a small subset of cases (5). Although this study suggested that IPF occurrence rates might be lower than previously estimated using diagnostic code–based algorithms, the researchers were unable to evaluate how many cases such algorithms may miss. Further, the study’s case validation procedure was limited by lack of computed tomographic (CT) image review—a crucial piece of information in the diagnostic classification of IPF.

In the age of large administrative databases and electronic medical records (EMRs), there is rich opportunity to conduct population-based studies in IPF. Kaiser Permanente Northern California (KPNC) provides one such opportunity with a large (over 3 million active members) and ethnically diverse member population. Therefore, the objectives of this study were to describe the epidemiology of IPF in the KPNC population, to determine the accuracy of the most commonly used International Classification of Diseases, Ninth Revision (ICD-9), code–based algorithm for identifying IPF cases, and to develop a modified algorithm for IPF with enhanced predictive value.

Methods

Institutional review boards at the University of California, San Francisco (approval 14-15449), and the KPNC Division of Research (approval CN-15-2126-H) approved the study protocol.

Study Population

The source population was the KPNC member population from January 1, 2000, through December 31, 2014. January 1, 2000, was selected as the study start date to include patients diagnosed after publication of the first international consensus guidelines on the diagnosis of IPF (6). To identify patients with IPF, an algorithm based on the ICD-9 and the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM), was used. This algorithm is similar or equivalent to that used in previous studies of IPF in the United States (3–5, 7). It required individuals to be over 18 years of age and to have at least one claim for a specific diagnostic code for IPF, either ICD-9 code 516.3 or ICD-9-CM code 516.31. Of these, cases were then excluded for any one claim for an alternative diagnostic code associated with interstitial lung disease (ILD) occurring on or after the date of the last claim for a specific IPF code (see Table E1 in the online supplement). We refer to this algorithm as the IPF algorithm.

A second algorithm was used to broadly screen for all individuals diagnosed with ILD of unknown etiology (i.e., idiopathic interstitial pneumonia [IIP]) and thus gain improved sensitivity for IPF cases. This “IIP algorithm” included all patients over the age of 50 years (to improve specificity) with at least one claim for ICD-9 codes 515, 516.3, or 516.31. ICD-9 code 515 was included in this algorithm to improve sensitivity; it is widely used by clinicians for patients with any form of ILD, and many patients with IPF may receive only this nonspecific diagnostic code. Cases were excluded for specific alternative diagnostic claims for other ILDs (see Table E1).
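The two case-finding rules described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the claim layout (a list of date and code pairs per member), the function names, and the choice to evaluate age at the first qualifying claim are assumptions. The exclusion set is a hypothetical stand-in, because the actual list is in Table E1 of the online supplement, and the text does not state the timing of exclusion claims for the IIP algorithm, so any-time exclusion is assumed there.

```python
from datetime import date

IPF_CODES = {"516.3", "516.31"}          # specific IPF codes named in the text
IIP_CODES = {"515", "516.3", "516.31"}   # broader screen, adds nonspecific code 515
# Hypothetical stand-in: the real exclusion list is in Table E1 (online supplement).
ALTERNATIVE_ILD_CODES = {"135", "710.1", "714.0"}


def age_on(birth_date, when):
    """Age in whole years on the given date."""
    before_birthday = (when.month, when.day) < (birth_date.month, birth_date.day)
    return when.year - birth_date.year - int(before_birthday)


def meets_ipf_algorithm(birth_date, claims):
    """claims: list of (date, icd9_code) pairs for one member."""
    ipf_dates = [d for d, code in claims if code in IPF_CODES]
    # Assumption: "over 18 years" is evaluated at the first IPF claim.
    if not ipf_dates or age_on(birth_date, min(ipf_dates)) <= 18:
        return False
    last_ipf = max(ipf_dates)
    # Excluded if an alternative ILD code appears on or after the last IPF claim.
    return not any(
        code in ALTERNATIVE_ILD_CODES and d >= last_ipf for d, code in claims
    )


def meets_iip_algorithm(birth_date, claims):
    """Broader screen for idiopathic ILD, restricted to members over 50."""
    iip_dates = [d for d, code in claims if code in IIP_CODES]
    if not iip_dates or age_on(birth_date, min(iip_dates)) <= 50:
        return False
    # Assumption: exclusion applies to alternative ILD claims at any time.
    return not any(code in ALTERNATIVE_ILD_CODES for _, code in claims)
```

Under these assumptions, a member with a single code 515 claim is picked up by the IIP screen but not by the IPF algorithm, which is the gap the broader screen was designed to expose.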

Algorithm Validation

The case validation procedure was restricted to new cases identified after 2008, when KPNC’s EMR system (EPIC Systems, Verona, WI) was widely introduced, to increase the availability of high-quality clinical information for medical record review. Two random samples of 75 cases were drawn from among those identified by the IPF algorithm, and one random sample of 75 cases was drawn from among those identified by the IIP algorithm. Cases with missing or inaccessible CT scans were excluded.

The case validation procedure involved a three-part process including structured medical record review by an expert ILD clinician (B.L.), chest CT image review by an expert ILD radiologist (T.U.), and case adjudication by two expert ILD clinicians (B.L. and H.R.C.). The structured medical record review was performed using the EMR and a clinical data collection form. Data extracted included age at diagnosis, sex, treating clinician’s diagnosis, pulmonary function test results, biopsy results, bronchoscopy results, smoking history, other exposure history (e.g., asbestos, radiation, molds, birds), medications, comorbidities, and autoimmune serologies.

Cases were then clinically categorized as likely IPF, unlikely IPF, not IPF, or insufficient information (see Figure E1). An expert chest radiologist (T.U.) reviewed all CT scan images, blinded to patients’ clinical information and the primary radiologist’s interpretation, to determine whether ILD was present and whether the ILD met established criteria for definite usual interstitial pneumonia (UIP) pattern, possible UIP pattern, or inconsistent with UIP pattern (1). Finally, using a prespecified case adjudication algorithm that incorporates clinical and radiologic review (see Figure E2), two expert ILD clinicians (B.L. and H.R.C.) applied a consensus diagnosis of IPF, unclassifiable pulmonary fibrosis, or not IPF, with consensus achieved on all cases.

Statistical Analysis

The annual incidence of the IPF algorithm was calculated as the number of new cases meeting criteria for the IPF algorithm in each calendar year divided by the total midyear adult member population (i.e., person-year). Annual cumulative prevalence was calculated as the total number of current, living members meeting criteria for the IPF algorithm in the current or any previous calendar year divided by the total midyear adult member population. Incidence rates and incidence rate ratios were also determined by year and demographic categories, including age, sex, and racial/ethnic groups. Person-years at risk was calculated as the sum of the at-risk midyear member population, overall and by year and demographic categories, from 2001 to 2014.
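The incidence and prevalence definitions above reduce to simple ratios. The sketch below shows the arithmetic with hypothetical counts (the yearly case counts and midyear populations are invented for illustration and are not study data); the function names are likewise assumptions.

```python
def average_annual_incidence(new_cases, midyear_population):
    """Cases per 100,000 person-years; both arguments map year -> count."""
    # Person-years at risk are approximated by summing midyear populations.
    person_years = sum(midyear_population[year] for year in new_cases)
    return 100_000 * sum(new_cases.values()) / person_years


def annual_prevalence(living_cases, midyear_population):
    """Cumulative prevalence per 100,000 persons for a single year."""
    return 100_000 * living_cases / midyear_population


def incidence_rate_ratio(rate, reference_rate):
    """Ratio of a subgroup's incidence rate to the reference group's rate."""
    return rate / reference_rate


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
new_cases = {2012: 150, 2013: 170, 2014: 160}
midyear_population = {2012: 2_400_000, 2013: 2_500_000, 2014: 2_600_000}

incidence = average_annual_incidence(new_cases, midyear_population)
prevalence_2014 = annual_prevalence(390, midyear_population[2014])
print(f"{incidence:.1f} per 100,000 person-years")   # 6.4
print(f"{prevalence_2014:.1f} per 100,000 persons")   # 15.0
```

Note that incidence is expressed per person-years while prevalence is expressed per persons, because prevalence is a proportion of the population at a point in time rather than a rate.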

The PPV and binomial 95% confidence interval (CI) were determined for each algorithm and outcome definition. Sensitivity and binomial 95% CI of the IPF algorithm were calculated using the IIP algorithm validation sample. Agreement between the ILD expert diagnosis and the treating clinician’s diagnosis was determined using the κ-statistic with 95% CI. In the primary analysis, patients adjudicated as IPF were considered cases. For sensitivity analyses, cases were alternatively defined by (1) patients adjudicated as IPF or unclassifiable, (2) a chart diagnosis of IPF from the treating clinician, or (3) the presence of any ILD on a chest CT scan. PPV-corrected incidence and prevalence were calculated by multiplying point estimates by the corresponding PPV; 95% CIs were calculated by extremes of the rectangular CI bounds for incidence/prevalence and PPV.
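For readers who want to reproduce the binomial intervals, the following sketch computes an exact (Clopper-Pearson) interval using only the standard library. That the "binomial 95% CI" is the exact interval is an assumption; the text does not name the method. The counts passed in (30 of 71, 9 of 75, and 5 of 9) are those reported in the Results.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def exact_binomial_ci(x, n, alpha=0.05):
    """Clopper-Pearson interval for x successes in n trials."""

    def solve(f):
        # Bisection for the root of a function that increases with p on [0, 1].
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            if f(mid) < 0:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound: smallest p for which P(X >= x) reaches alpha / 2.
    lower = 0.0 if x == 0 else solve(lambda p: 1 - binom_cdf(x - 1, n, p) - alpha / 2)
    # Upper bound: largest p for which P(X <= x) is still alpha / 2.
    upper = 1.0 if x == n else solve(lambda p: alpha / 2 - binom_cdf(x, n, p))
    return lower, upper


for label, x, n in [("PPV, IPF sample 1", 30, 71),
                    ("PPV, IIP sample", 9, 75),
                    ("Sensitivity", 5, 9)]:
    low, high = exact_binomial_ci(x, n)
    print(f"{label}: {100 * x / n:.1f}% ({100 * low:.1f} to {100 * high:.1f})")
```

With 5 of 9 cases, this interval works out to roughly 21.2 to 86.3%, matching the sensitivity interval reported in the Results and illustrating how wide the interval becomes with so few cases.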

To improve upon the PPV of the IPF algorithm (and develop a “modified IPF algorithm”), we considered additional predictor variables that would be available in administrative databases, including age at first IPF claim (code 516.3 or 516.31), sex, hospitalization with discharge diagnosis for IPF, any code 515 claim prior to the first IPF claim, any code 515 claim after the first IPF claim, two or more IPF claims at least 1 month apart, a diagnostic claim for chronic obstructive pulmonary disease after the first IPF claim, a procedure code for lung biopsy (see Table E2) prior to the first IPF claim, a procedure code for chest computed tomography (see Table E2) prior to the first IPF claim, and antinuclear antibody or rheumatoid factor measurement prior to the first IPF claim. Using the IPF algorithm validation set 1, these candidate variables were considered in a least absolute shrinkage and selection operator (LASSO) model to predict an adjudicated diagnosis of IPF. Variables selected by the LASSO procedure were then validated in the IPF algorithm validation set 2 as a “modified IPF algorithm.”
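To illustrate how a LASSO model selects variables, the sketch below fits an L1-penalized logistic regression by proximal gradient descent on a toy data set. It is not the authors' procedure: the text does not state the software, the penalty value, or how the penalty was tuned, and the eight rows, the feature names, the penalty of 0.1, and the step size are all hypothetical.

```python
from math import exp


def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))


def soft_threshold(value, threshold):
    """Proximal step for the L1 penalty: shrink toward zero, snap small values to 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_l1_logistic(rows, labels, penalty, step=0.5, iterations=1000):
    """Minimise mean log-loss + penalty * sum(|weights|); returns (intercept, weights)."""
    n, k = len(rows), len(rows[0])
    intercept, weights = 0.0, [0.0] * k
    for _ in range(iterations):
        residuals = [
            sigmoid(intercept + sum(w * x for w, x in zip(weights, row))) - y
            for row, y in zip(rows, labels)
        ]
        intercept -= step * sum(residuals) / n  # the intercept is not penalised
        for j in range(k):
            gradient = sum(r * row[j] for r, row in zip(residuals, rows)) / n
            weights[j] = soft_threshold(weights[j] - step * gradient, step * penalty)
    return intercept, weights


# Toy data (hypothetical): column 0 tracks the outcome, column 1 is unrelated to it.
feature_names = ["two_claims_1mo_apart", "unrelated_flag"]
rows = [[1, 1], [1, 0], [1, 1], [1, 0], [0, 1], [0, 0], [0, 1], [0, 0]]
adjudicated_ipf = [1, 1, 1, 1, 0, 0, 0, 0]

intercept, weights = fit_l1_logistic(rows, adjudicated_ipf, penalty=0.1)
selected = [name for name, w in zip(feature_names, weights) if w != 0.0]
print(selected)
```

The penalty drives the coefficient of the uninformative column to exactly zero, so only the informative variable is "selected". The same mechanism underlies the use of LASSO to pick candidate predictors from a derivation set before testing them in a separate validation set.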

Results

Characteristics, Incidence, and Prevalence of the IPF Algorithm

From 2000 to 2014, the IPF algorithm identified 2,608 patients among 5,389,627 at-risk adult KPNC members (Figure 1). The mean age of identified patients was 72.6 years (SD, ±11.6 yr); 55% were male; and the majority (67%) were non-Hispanic white individuals (Table 1). The distribution of age, sex, and race/ethnicity was comparable for the cohort restricted to 2008–2014, from which validation samples were drawn. The average annual incidence of the IPF algorithm was 6.8 per 100,000 person-years (95% CI, 6.6–7.1). The average annual cumulative prevalence was 14.5 per 100,000 persons (95% CI, 14.1–14.9). Incidence was higher with older age, for men than for women, and for non-Hispanic white persons than for those in other racial/ethnic groups (Table 2). There were no clear increasing or decreasing trends in incidence or prevalence over the study period (see Figure E3). The more inclusive IIP algorithm had an estimated average annual incidence of 47.0 per 100,000 person-years (95% CI, 46.2–47.7) and an average cumulative prevalence of 87.9 per 100,000 persons (95% CI, 86.9–88.9).

Figure 1.

Flowchart of patient selection. ICD-9 = International Classification of Diseases, Ninth Revision; ICD-9-CM = International Classification of Diseases, Ninth Revision, Clinical Modification; IPF = idiopathic pulmonary fibrosis; KPNC = Kaiser Permanente Northern California.

Table 1.

Characteristics of patients identified by diagnostic code-based algorithms and randomly selected samples used for algorithm validation

| Characteristic | IPF algorithm*: all subjects, 2000–2014 (n = 2,608) | IPF algorithm*: all subjects, 2008–2014 (n = 993) | IPF algorithm*: validation sample 1, 2008–2014 (n = 75) | IPF algorithm*: validation sample 2, 2008–2014 (n = 75) | IIP algorithm†: all subjects, 2000–2014 (n = 16,731) | IIP algorithm†: validation sample, 2000–2014 (n = 75) |
| --- | --- | --- | --- | --- | --- | --- |
| Mean age, yr (SD) | 72.6 (11.6) | 72.6 (11.8) | 73.7 (10.6) | 80.0 (9.5) | 74.6 (10.8) | 73.2 (10.9) |
| Age group, yr, n (%) | | | | | | |
| <55 | 199 (8%) | 73 (7%) | 3 (4%) | 1 (1%) | 885 (5%) | 2 (3%) |
| 55–59 | 138 (5%) | 58 (6%) | 4 (5%) | 4 (5%) | 915 (5%) | 8 (11%) |
| 60–64 | 239 (9%) | 102 (10%) | 10 (13%) | 2 (3%) | 1,483 (9%) | 9 (12%) |
| 65–69 | 344 (13%) | 137 (14%) | 12 (16%) | 2 (3%) | 2,033 (12%) | 8 (11%) |
| 70–74 | 431 (16%) | 152 (15%) | 11 (15%) | 9 (12%) | 2,412 (14%) | 11 (15%) |
| 75–79 | 467 (18%) | 159 (16%) | 14 (19%) | 10 (13%) | 2,836 (17%) | 13 (17%) |
| >80 | 790 (30%) | 312 (31%) | 21 (28%) | 47 (63%) | 6,167 (37%) | 24 (32%) |
| Sex, n (%) | | | | | | |
| Male | 1,446 (55%) | 574 (58%) | 39 (52%) | 49 (65%) | 8,149 (49%) | 30 (40%) |
| Female | 1,161 (45%) | 419 (42%) | 36 (48%) | 26 (35%) | 8,582 (51%) | 45 (60%) |
| Race, n (%) | | | | | | |
| White | 1,735 (67%) | 617 (62%) | 42 (56%) | 59 (79%) | 11,359 (68%) | 50 (67%) |
| Black | 140 (5%) | 50 (5%) | 3 (4%) | 2 (3%) | 950 (6%) | 4 (5%) |
| Asian | 180 (7%) | 98 (10%) | 8 (11%) | 7 (9%) | 1,414 (8%) | 9 (12%) |
| Hispanic | 371 (14%) | 156 (16%) | 14 (19%) | 6 (8%) | 1,885 (11%) | 7 (9%) |
| Other | 182 (7%) | 72 (7%) | 8 (11%) | 1 (1%) | 1,123 (7%) | 5 (7%) |

Definition of abbreviations: ICD-9 = International Classification of Diseases, Ninth Revision; ICD-9-CM = International Classification of Diseases, Ninth Revision, Clinical Modification; IIP = idiopathic interstitial pneumonia; IPF = idiopathic pulmonary fibrosis.

* The IPF algorithm requires being older than 18 years of age, having at least one diagnostic claim for ICD-9 code 516.3 or ICD-9-CM code 516.31, and exclusion of alternative interstitial lung disease diagnostic claims.

† The IIP algorithm requires being older than 50 years of age; having at least one claim for the ICD-9 codes 515, 516.3, or 516.31; and exclusion of alternative interstitial lung disease diagnostic claims.

Table 2.

Raw and positive predictive value–corrected incidence and prevalence using two case-finding algorithms from 2001 to 2014 in the Kaiser Permanente Northern California member population

| Overall Estimates | Annual Incidence* (per 100,000 Person-Years) | 95% CI | Average Prevalence* (per 100,000 Persons) | 95% CI |
| --- | --- | --- | --- | --- |
| IPF algorithm† | 6.8 | 6.6–7.1 | 14.5 | 14.1–14.9 |
| PPV-corrected IPF algorithm‡ | 3.1 | 2.4–3.8 | 6.6 | 5.2–8.0 |
| IIP algorithm§ | 47.0 | 46.2–47.7 | 87.9 | 86.9–88.9 |
| PPV-corrected IIP algorithm‖ | 5.6 | 2.6–10.3 | 10.5 | 4.9–19.2 |

| IPF algorithm†, subgroups | Annual Incidence* (per 100,000 Person-Years) | 95% CI | Incidence Rate Ratio | 95% CI |
| --- | --- | --- | --- | --- |
| Age, yr | | | | |
| <55 | 0.7 | 0.6–0.9 | Reference | Reference |
| 55–59 | 5.1 | 4.3–6.1 | 7.1 | 3.7–13.7 |
| 60–64 | 8.8 | 7.7–10.1 | 12.6 | 6.7–23.7 |
| 65–69 | 16.1 | 14.4–18.0 | 22.2 | 11.9–41.4 |
| 70–74 | 26.4 | 23.8–29.2 | 36.0 | 19.4–66.8 |
| 75–79 | 37.2 | 33.7–41.0 | 50.6 | 27.3–93.8 |
| >80 | 50.5 | 46.9–54.4 | 70.7 | 38.3–130.7 |
| Sex | | | | |
| Female | 5.8 | 5.4–6.1 | Reference | Reference |
| Male | 8.1 | 7.6–8.5 | 1.39 | 1.04–1.84 |
| Race | | | | |
| White | 9.7 | 9.2–10.2 | Reference | Reference |
| Black | 5.8 | 4.8–6.8 | 0.60 | 0.45–0.79 |
| Asian | 3.5 | 3.0–4.1 | 0.36 | 0.26–0.50 |
| Hispanic | 6.6 | 5.9–7.4 | 0.70 | 0.54–0.91 |
| Other | 2.7 | 2.3–3.1 | 0.28 | 0.20–0.40 |

Definition of abbreviations: CI = confidence interval; ICD-9 = International Classification of Diseases, Ninth Revision; ICD-9-CM = International Classification of Diseases, Ninth Revision, Clinical Modification; IIP = idiopathic interstitial pneumonia; IPF = idiopathic pulmonary fibrosis; PPV = positive predictive value.

* Annual incidence was calculated as the number of new cases meeting criteria for the algorithm in each calendar year divided by the total midyear adult member population of each year (i.e., person-years). Annual cumulative prevalence was calculated as the total number of current, living members meeting criteria for the algorithm in the current or any previous calendar year divided by the total midyear adult member population.

† The IPF algorithm requires being older than 18 years of age, having at least one diagnostic claim for ICD-9 code 516.3 or ICD-9-CM code 516.31, and exclusion of alternative interstitial lung disease diagnostic claims.

‡ Overall PPV from the two combined IPF algorithm validation sets (n = 141) using the adjudication diagnosis to define positive cases (PPV, 45.4%; 95% CI, 37.0–54.0).

§ The IIP algorithm requires being older than 50 years of age; having at least one diagnostic claim for ICD-9 code 516.3, ICD-9-CM code 516.31, or ICD-9 code 515; and exclusion of alternative interstitial lung disease diagnostic claims.

‖ PPV from the IIP validation set (n = 75) using the adjudication diagnosis to define cases (PPV, 12.0%; 95% CI, 5.6–21.6).
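The PPV-corrected rows of Table 2 can be reproduced from the raw estimates and the PPVs given in the footnotes. The sketch below follows the approach described in the Statistical Analysis section: the point estimate is multiplied by the PPV, and the interval is formed from the products of the lower bounds and of the upper bounds. The function name and argument layout are assumptions made for illustration.

```python
def ppv_corrected(estimate, estimate_ci, ppv, ppv_ci):
    """Scale a raw rate by the PPV; combine interval extremes for the bounds."""
    corrected = estimate * ppv
    lower = estimate_ci[0] * ppv_ci[0]   # lowest rate times lowest PPV
    upper = estimate_ci[1] * ppv_ci[1]   # highest rate times highest PPV
    return corrected, lower, upper


def rounded(values):
    return tuple(round(v, 1) for v in values)


# Inputs taken from Table 2 and its footnotes.
ipf_incidence = ppv_corrected(6.8, (6.6, 7.1), 0.454, (0.370, 0.540))
iip_incidence = ppv_corrected(47.0, (46.2, 47.7), 0.120, (0.056, 0.216))
ipf_prevalence = ppv_corrected(14.5, (14.1, 14.9), 0.454, (0.370, 0.540))
iip_prevalence = ppv_corrected(87.9, (86.9, 88.9), 0.120, (0.056, 0.216))

print(rounded(ipf_incidence))    # (3.1, 2.4, 3.8)
print(rounded(iip_incidence))    # (5.6, 2.6, 10.3)
print(rounded(ipf_prevalence))   # (6.6, 5.2, 8.0)
print(rounded(iip_prevalence))   # (10.5, 4.9, 19.2)
```

Because the bounds combine the extremes of both intervals, the resulting interval is conservative. This explains why the corrected IIP estimate has such a wide range despite the narrow interval around the raw rate.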

Algorithm Validation

In the first validation sample of 75 patients identified by the IPF algorithm, 30 patients were confirmed as IPF, 11 patients were considered unclassifiable, and 30 patients were classified as not having IPF (see Table E3). In four patients, insufficient data were available for validation. Thus, the PPV for the IPF algorithm was 42.2% (95% CI, 30.6 to 54.6%) (Table 3).

Table 3.

Positive predictive value of the idiopathic pulmonary fibrosis algorithm for identifying cases with an adjudicated diagnosis of idiopathic pulmonary fibrosis, with sensitivity analyses for alternative case classification methods

| Case Definition | PPV of IPF Algorithm, % (95% CI): Sample 1 (n = 71) | PPV of IPF Algorithm, % (95% CI): Sample 2 (n = 70) |
| --- | --- | --- |
| Primary analysis | | |
| Adjudicated diagnosis of IPF | 42.2 (30.6–54.6) | 48.6 (36.4–60.8) |
| Sensitivity analyses | | |
| Adjudicated diagnosis of IPF or unclassifiable | 57.7 (45.4–69.4) | 65.7 (53.4–76.7) |
| Treating clinician chart diagnosis | 49.3 (37.2–61.4) | 47.1 (35.0–59.4) |
| Any ILD based on expert CT image review | 76.1 (64.4–85.4) | 85.7 (75.3–92.9) |

Definition of abbreviations: CI = confidence interval; CT = computed tomographic; ILD = interstitial lung disease; IPF = idiopathic pulmonary fibrosis; PPV = positive predictive value.

In sensitivity analysis, the PPV varied by the applied case definition: 57.7% (95% CI, 45.4 to 69.4%) for cases adjudicated as confirmed IPF or unclassifiable, 49.3% (95% CI, 37.2 to 61.4%) for cases based on the treating physician’s chart diagnosis, and 76.1% (95% CI, 64.4 to 85.4%) for cases defined by the presence of any ILD on a CT scan. Results were similar in the second IPF algorithm validation sample. The observed agreement between the treating clinician and ILD expert diagnosis was 84.5%, with a κ of 0.69 (95% CI, 0.52–0.86). In the sample of 75 patients identified by the broader IIP algorithm, 9 patients were confirmed as IPF (PPV, 12%; 95% CI, 5.6–21.6). Of these nine patients, five also met criteria for the IPF algorithm, giving an estimated sensitivity for the IPF algorithm of 55.6% (95% CI, 21.2 to 86.3%) among cases with an idiopathic ILD as defined by the IIP algorithm. Using these PPV estimates, the average PPV-corrected incidence rates are 3.1 per 100,000 person-years (95% CI, 2.4–3.8) using the IPF algorithm and 5.6 per 100,000 person-years (95% CI, 2.6–10.3) using the IIP algorithm (Table 2).
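The agreement statistic can be illustrated with Cohen's κ for a 2×2 table. The paper does not report the underlying cross-tabulation, so the counts below are back-calculated and should be read as an assumption: they presume agreement was assessed in the first validation sample (n = 71), with 35 chart diagnoses of IPF (49.3%), 30 adjudicated diagnoses of IPF (42.2%), and 60 concordant cases (84.5%).

```python
def cohens_kappa(both_yes, first_only, second_only, both_no):
    """Return (observed agreement, kappa) for two raters and a yes/no outcome."""
    n = both_yes + first_only + second_only + both_no
    observed = (both_yes + both_no) / n
    first_yes = both_yes + first_only
    second_yes = both_yes + second_only
    # Agreement expected by chance, from each rater's marginal proportions.
    expected = (first_yes * second_yes + (n - first_yes) * (n - second_yes)) / n**2
    return observed, (observed - expected) / (1 - expected)


# Back-calculated counts (assumption, not reported in the paper):
# clinician and expert both IPF, clinician only, expert only, neither.
observed, kappa = cohens_kappa(both_yes=27, first_only=8, second_only=3, both_no=33)
print(f"observed agreement {100 * observed:.1f}%, kappa {kappa:.2f}")
```

Under these assumed counts the sketch gives 84.5% agreement and a κ of about 0.69, consistent with the reported values. The κ is lower than the raw agreement because it discounts the roughly 50% agreement that would be expected by chance alone.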

Modified IPF Algorithm

The LASSO procedure selected three additional variables predictive of an adjudicated diagnosis of IPF: (1) two or more code 516.3 or 516.31 claims at least 1 month apart, (2) chest computed tomography prior to the first code 516.3 or 516.31 claim, and (3) any code 515 claim after the first code 516.3 or 516.31 claim. In the validation cohort, only the first two variables just mentioned remained predictive. In both the derivation and validation cohorts, all cases were at least 50 years of age, and no cases with an ICD-9 code for rheumatoid arthritis (code 714.0) received an adjudication diagnosis of IPF. Therefore, we proposed a modified IPF algorithm that included (1) age 50 years or older, (2) at least two code 516.3 or 516.31 claims at least 1 month apart, (3) a chest computed tomography procedure code prior to the first code 516.3 or 516.31 claim, and (4) exclusion for claims with alternative ILD diagnoses (including ICD-9 code 714.0) on or after the first code 516.3 or 516.31 claim (Table 4).

Table 4.

Criteria for the modified idiopathic pulmonary fibrosis algorithm

| Criteria | Notes |
| --- | --- |
| Inclusion criteria | |
| Age ≥50 yr | At time of first claim for either ICD-9 code 516.3 or ICD-9-CM code 516.31 |
| At least two IPF diagnostic claims | At least two claims for either ICD-9 code 516.3 or ICD-9-CM code 516.31 at least 1 mo apart |
| Chest CT procedure claim* | Any chest CT procedure code prior to the first diagnostic claim for IPF |
| Exclusion criteria | |
| Any diagnostic claim for an alternative ILD diagnosis† | Any claims for alternative ILD codes occurring on or after the first claim for IPF |

Definition of abbreviations: CT = computed tomography; ICD-9 = International Classification of Diseases, Ninth Revision; ICD-9-CM = International Classification of Diseases, Ninth Revision, Clinical Modification; ILD = interstitial lung disease; IPF = idiopathic pulmonary fibrosis.

* Chest CT procedure codes: ICD-9-CM 87.41 and Current Procedural Terminology, 4th Edition, codes 71250, 71260, and 71270.

† Exclusionary ICD-9 codes for alternative ILD diagnoses: 135, 237.7, 272.7, 277.3, 277.8, 446.21, 446.4, 495, 500–505, 506.4, 508.1, 508.8, 516.0, 516.1, 516.32–516.37, 516.2, 516.8, 516.9, 517.0, 517.2, 517.8, 518.3, 555, 710.0, 710.0–710.4, 714.0, 714.81, 720, and 759.5.
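The criteria in Table 4 translate directly into a claims filter. The sketch below is one possible reading, not the authors' code. The data layout (lists of date and code pairs), the function names, the use of 30 days for "1 month," and the rule that a subcode of a listed three-digit category (for example, 495.9 under 495) counts as an exclusion are assumptions.

```python
from datetime import date, timedelta

IPF_CODES = {"516.3", "516.31"}
CHEST_CT_CODES = {"87.41", "71250", "71260", "71270"}
# Exclusion codes from the Table 4 footnote, with ranges written out.
EXCLUSION_CODES = {
    "135", "237.7", "272.7", "277.3", "277.8", "446.21", "446.4", "495",
    "500", "501", "502", "503", "504", "505", "506.4", "508.1", "508.8",
    "516.0", "516.1", "516.2", "516.32", "516.33", "516.34", "516.35",
    "516.36", "516.37", "516.8", "516.9", "517.0", "517.2", "517.8", "518.3",
    "555", "710.0", "710.1", "710.2", "710.3", "710.4", "714.0", "714.81",
    "720", "759.5",
}


def age_on(birth_date, when):
    """Age in whole years on the given date."""
    before_birthday = (when.month, when.day) < (birth_date.month, birth_date.day)
    return when.year - birth_date.year - int(before_birthday)


def is_alternative_ild(code):
    """Exact match, or a subcode of a listed three-digit category (assumption)."""
    return code in EXCLUSION_CODES or code.split(".")[0] in EXCLUSION_CODES


def meets_modified_ipf_algorithm(birth_date, diagnoses, procedures):
    """diagnoses and procedures: lists of (date, code) pairs for one member."""
    ipf_dates = sorted(d for d, code in diagnoses if code in IPF_CODES)
    if not ipf_dates:
        return False
    first_ipf = ipf_dates[0]
    if age_on(birth_date, first_ipf) < 50:
        return False
    # Two claims at least 1 month apart (assumption: 1 month = 30 days).
    if ipf_dates[-1] - first_ipf < timedelta(days=30):
        return False
    # A chest CT procedure code must precede the first IPF claim.
    if not any(code in CHEST_CT_CODES and d < first_ipf for d, code in procedures):
        return False
    # Alternative ILD claims on or after the first IPF claim exclude the case.
    return not any(is_alternative_ild(code) and d >= first_ipf for d, code in diagnoses)
```

Compared with the original IPF algorithm, the exclusion window here starts at the first IPF claim rather than the last, and a rheumatoid arthritis claim (code 714.0) in that window is sufficient to exclude a case.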

The PPV of the modified IPF algorithm for an ILD expert diagnosis of IPF was improved compared with the original IPF algorithm: 70.4% (95% CI, 49.8 to 86.2%) and 61.8% (95% CI, 43.6 to 77.8%) in the derivation and validation cohorts, respectively (Figure 2, Table 5). The PPV was further improved in sensitivity analyses. However, the sensitivity of the modified IPF algorithm for IPF was further reduced compared with the original IPF algorithm.

Figure 2.

Diagram demonstrating misclassification of cases by the idiopathic pulmonary fibrosis (IPF) algorithm compared with the modified IPF algorithm. PPV = positive predictive value.

Table 5.

Positive predictive value of the modified idiopathic pulmonary fibrosis algorithm for identifying cases with an interstitial lung disease expert diagnosis of idiopathic pulmonary fibrosis, with sensitivity analyses for alternative case classification methods

| Case Definition | PPV of Modified IPF Algorithm, % (95% CI): Sample 1 (n = 71) | PPV of Modified IPF Algorithm, % (95% CI): Sample 2 (n = 70) |
| --- | --- | --- |
| Primary analysis | | |
| Adjudicated diagnosis of IPF | 70.4 (49.8–86.2) | 61.8 (43.6–77.8) |
| Sensitivity analyses | | |
| Adjudicated diagnosis of IPF or unclassifiable | 85.2 (66.3–95.8) | 85.3 (68.9–95.0) |
| Treating clinician chart diagnosis | 77.8 (57.7–91.4) | 61.8 (43.6–77.8) |
| Any ILD based on expert CT image review | 88.9 (70.8–97.6) | 100 (89.7–100) |

Definition of abbreviations: CI = confidence interval; CT = computed tomographic; ILD = interstitial lung disease; IPF = idiopathic pulmonary fibrosis; PPV = positive predictive value.

Discussion

The KPNC population appears to be representative of traditionally used claims or third-party data cohorts with regard to IPF epidemiology. The estimated incidence of IPF based on a commonly used case-finding algorithm in the KPNC population is 6.8 per 100,000 person-years, within the range reported in a recent systematic review (3–9 per 100,000 person-years) (2). This incidence rate is in the lower range of estimates from prior U.S. studies (which ranged from 6.8 to 93.7 per 100,000 person-years) (3–5, 7, 8).

We speculate that this may be due to a younger and more racially diverse patient population. Similar to findings in other cohorts, we found that IPF incidence increases with older age, is higher in men than in women, and is highest among whites compared with other racial/ethnic groups (3, 7–9).

Importantly, our results suggest that this code-based algorithm may not generate accurate estimates of IPF incidence and prevalence, given its poor PPV and poor sensitivity. PPV-corrected estimates based on this algorithm would therefore be expected to substantially underestimate the true incidence of the disease (PPV-corrected incidence 3.1 per 100,000 person-years), as suggested by the PPV-corrected estimate derived from our more inclusive screening IIP algorithm (PPV-corrected incidence 5.6 per 100,000 person-years).

Our study is the first to investigate the epidemiology of IPF in a Kaiser Permanente population. We found that the most commonly accepted ICD-9 code–based algorithm for identifying IPF in administrative databases in the United States (termed the IPF algorithm) demonstrates similar incidence and demographic trends as in prior studies, but that it performs poorly, with a high degree of diagnostic misclassification. We also found that by making a few simple, empirically derived changes to the IPF algorithm, a modified IPF algorithm demonstrated substantially improved PPV. This internally validated, modified IPF algorithm may be useful for conducting future population-based studies of IPF in the KPNC member population.

Case validation is an essential component of coding-based algorithms, as demonstrated by our results. Over half of the patients identified as having IPF by the code-based IPF algorithm did not have IPF on case review. Alarmingly, whereas half of the misclassified cases had an alternative ILD diagnosis, the other half had no clinical or radiologic evidence of ILD at all. These findings are consistent with those recently reported in a large private insurance claims database (where the PPV was 54%) (5). In contrast to this prior study, which validated cases by medical record review only, our study incorporated information from an expert chest radiologist’s interpretation of chest CT image pattern. This is a critical difference because differentiation of IPF from other forms of ILD often is based primarily on CT image pattern, which requires expert chest radiologist review (10, 11). Incorporating CT image review provided greater validity to our predictive estimates and allowed for more detailed characterization of the misclassified cases.

We can only speculate about the reasons for the poor PPV of the IPF algorithm. It seems likely to be due to a combination of misdiagnosis at the clinical level and miscoding at the administrative level. Importantly, to our knowledge, our study is the first to demonstrate that the classification problems extend beyond poor PPV to sensitivity. Using our broad-based IIP algorithm, which was designed to capture the larger idiopathic ILD population, it appears that approximately half of IPF cases may not be captured by the IPF algorithm. However, the precision of this estimate is poor, owing to the small number of false-negative cases identified by our broader screening. Future research into the reasons for misdiagnosis is needed to develop strategies for improved coding and coding-based case identification.

Our internally validated, modified IPF algorithm performed substantially better than the original IPF algorithm by incorporating the following two additional variables: having two or more diagnostic claims for IPF (code 516.3 or 516.31) at least 1 month apart and having a chest computed tomography claim prior to the first diagnostic claim for IPF. Indeed, the large majority of cases detected by the modified IPF algorithm had either IPF or unclassifiable pulmonary fibrosis on expert adjudication (PPV, 85%).

Inclusion of patients adjudicated as unclassifiable in the case definition may be reasonable because many of these cases are believed to have IPF (12). In this cohort, the main reason for the inability to classify these patients was the lack of a surgical lung biopsy in the setting of a nondefinitive CT scan. There are emerging data supporting the high prevalence of IPF in these patients with “possible IPF” (13–15). We believe the modified IPF algorithm will be useful for population-based studies of IPF in the KPNC population (and likely in the larger population as well) that require high diagnostic certainty, such as studies of disease behavior, disease outcomes, and health care use.

Limitations

There are several limitations of our study to consider. We validated the modified IPF algorithm for application in the KPNC population. Whether this algorithm will perform similarly well in other administrative cohorts is unknown and requires external validation before broader use.

Our results apply only to the ICD-9 era. With the recent change in diagnostic coding to the International Classification of Diseases, 10th Revision (ICD-10), in the United States, similar methods will be needed to validate an IPF algorithm using the ICD-10 coding system. Whether the more specific codes provided by the ICD-10 system will allow for improved case classification of IPF requires further study.

There are inherent limitations of insurance claims data, some of which are highlighted in this article (e.g., case misclassification), thus limiting confidence in the results. KPNC is a dynamic cohort (i.e., members freely enter or exit the system). If there was differential in- or outmigration of cases compared with the larger member population, this could have biased our estimates.

Our case adjudication process was based on retrospective case and CT scan review. Data not available in these sources therefore could have led to case misclassification. For example, we excluded cases with a missing CT scan from the samples drawn for validation of the IPF algorithm. If these cases were systematically different from the remaining sample, we could have over- or underestimated PPVs for the IPF algorithm by up to 3.4%.

Because of feasibility issues, only one expert chest radiologist interpreted CT scans. Considering the variation in CT scan pattern interpretation among radiologists (10), our results could have been biased by systematic differences in interpretation compared with the average radiologist.

Conclusions

We demonstrate that a well-accepted ICD-9 code–based algorithm for identifying IPF is inaccurate; it both includes many patients who do not have IPF and likely misses a substantial proportion of patients who do have IPF. An empirically derived and internally validated modified IPF algorithm more reliably identifies patients with IPF and could be a useful tool for future population-based studies of IPF. Finally, our study demonstrates the rich opportunity for population-based studies of IPF in integrated health care delivery systems.

Acknowledgments

The authors thank Elwyn and Jennifer Berlekamp for their generous donation, which made this work possible.

Footnotes

The research reported in this publication was supported by National Heart, Lung, and Blood Institute (NHLBI) grants F32 HL124895 (B.L.) and K24 HL127131 (H.R.C.) as well as by a generous donation from Elwyn and Jennifer Berlekamp. The views expressed in this article do not communicate the official position of the National Institutes of Health, the NHLBI, or our private donors.

Author Contributions: B.L. and H.R.C.: conceived of the study; B.L., H.R.C., M.D.E., and C.I.: designed the study; B.L., G.H., and T.U.: collected data; B.L., G.H., C.I., and E.V.: performed statistical analyses; all authors: participated in result review and interpretation; B.L. and H.R.C.: wrote the first draft of the manuscript; and all authors: contributed to review and revision of the manuscript and approved of the final version of the manuscript for submission.

This article has an online supplement, which is accessible from this issue’s table of contents at www.atsjournals.org.

Author disclosures are available with the text of this article at www.atsjournals.org.

References

  • 1. Raghu G, Collard HR, Egan JJ, Martinez FJ, Behr J, Brown KK, Colby TV, Cordier JF, Flaherty KR, Lasky JA, et al.; ATS/ERS/JRS/ALAT Committee on Idiopathic Pulmonary Fibrosis. An official ATS/ERS/JRS/ALAT statement: idiopathic pulmonary fibrosis: evidence-based guidelines for diagnosis and management. Am J Respir Crit Care Med. 2011;183:788–824. doi: 10.1164/rccm.2009-040GL.
  • 2. Hutchinson J, Fogarty A, Hubbard R, McKeever T. Global incidence and mortality of idiopathic pulmonary fibrosis: a systematic review. Eur Respir J. 2015;46:795–806. doi: 10.1183/09031936.00185114.
  • 3. Raghu G, Chen SY, Yeh WS, Maroni B, Li Q, Lee YC, Collard HR. Idiopathic pulmonary fibrosis in US Medicare beneficiaries aged 65 years and older: incidence, prevalence, and survival, 2001–11. Lancet Respir Med. 2014;2:566–572. [Published erratum appears in Lancet Respir Med 2014;2:e12.]
  • 4. Raghu G, Weycker D, Edelsberg J, Bradford WZ, Oster G. Incidence and prevalence of idiopathic pulmonary fibrosis. Am J Respir Crit Care Med. 2006;174:810–816. doi: 10.1164/rccm.200602-163OC.
  • 5. Esposito DB, Lanes S, Donneyong M, Holick CN, Lasky JA, Lederer D, Nathan SD, O’Quinn S, Parker J, Tran TN. Idiopathic pulmonary fibrosis in United States automated claims: incidence, prevalence, and algorithm validation. Am J Respir Crit Care Med. 2015;192:1200–1207. doi: 10.1164/rccm.201504-0818OC.
  • 6. American Thoracic Society. Idiopathic pulmonary fibrosis: diagnosis and treatment: international consensus statement. This joint statement of the American Thoracic Society (ATS) and the European Respiratory Society (ERS) was adopted by the ATS Board of Directors, July 1999 and by the ERS Executive Committee, October 1999. Am J Respir Crit Care Med. 2000;161:646–664. doi: 10.1164/ajrccm.161.2.ats3-00.
  • 7. Raghu G, Chen SY, Hou Q, Yeh WS, Collard HR. Incidence and prevalence of idiopathic pulmonary fibrosis in US adults 18–64 years old. Eur Respir J. 2016;48:179–186. doi: 10.1183/13993003.01653-2015.
  • 8. Fernández Pérez ER, Daniels CE, Schroeder DR, St Sauver J, Hartman TE, Bartholmai BJ, Yi ES, Ryu JH. Incidence, prevalence, and clinical course of idiopathic pulmonary fibrosis: a population-based study. Chest. 2010;137:129–137. doi: 10.1378/chest.09-1002.
  • 9. Navaratnam V, Fleming KM, West J, Smith CJ, Jenkins RG, Fogarty A, Hubbard RB. The rising incidence of idiopathic pulmonary fibrosis in the U.K. Thorax. 2011;66:462–467. doi: 10.1136/thx.2010.148031.
  • 10. Flaherty KR, Andrei AC, King TE Jr, Raghu G, Colby TV, Wells A, Bassily N, Brown K, du Bois R, Flint A, et al. Idiopathic interstitial pneumonia: do community and academic physicians agree on diagnosis? Am J Respir Crit Care Med. 2007;175:1054–1060. doi: 10.1164/rccm.200606-833OC.
  • 11. Travis WD, Costabel U, Hansell DM, King TE Jr, Lynch DA, Nicholson AG, Ryerson CJ, Ryu JH, Selman M, Wells AU, et al.; ATS/ERS Committee on Idiopathic Interstitial Pneumonias. An official American Thoracic Society/European Respiratory Society statement: update of the international multidisciplinary classification of the idiopathic interstitial pneumonias. Am J Respir Crit Care Med. 2013;188:733–748. doi: 10.1164/rccm.201308-1483ST.
  • 12. Ryerson CJ, Urbania TH, Richeldi L, Mooney JJ, Lee JS, Jones KD, Elicker BM, Koth LL, King TE Jr, Wolters PJ, et al. Prevalence and prognosis of unclassifiable interstitial lung disease. Eur Respir J. 2013;42:750–757. doi: 10.1183/09031936.00131912.
  • 13. Raghu G, Lynch D, Godwin JD, Webb R, Colby TV, Leslie KO, Behr J, Brown KK, Egan JJ, Flaherty KR, et al. Diagnosis of idiopathic pulmonary fibrosis with high-resolution CT in patients with little or no radiological evidence of honeycombing: secondary analysis of a randomised, controlled trial. Lancet Respir Med. 2014;2:277–284. doi: 10.1016/S2213-2600(14)70011-6.
  • 14. Chung JH, Chawla A, Peljto AL, Cool CD, Groshong SD, Talbert JL, McKean DF, Brown KK, Fingerlin TE, Schwarz MI, et al. CT scan findings of probable usual interstitial pneumonitis have a high predictive value for histologic usual interstitial pneumonitis. Chest. 2015;147:450–459. doi: 10.1378/chest.14-0976.
  • 15. Yagihashi K, Huckleberry J, Colby TV, Tazelaar HD, Zach J, Sundaram B, Pipavath S, Schwarz MI, Lynch DA; Idiopathic Pulmonary Fibrosis Clinical Research Network (IPFnet). Radiologic-pathologic discordance in biopsy-proven usual interstitial pneumonia. Eur Respir J. 2016;47:1189–1197. doi: 10.1183/13993003.01680-2015.

Articles from Annals of the American Thoracic Society are provided here courtesy of American Thoracic Society
