Author manuscript; available in PMC: 2012 Jul 1.
Published in final edited form as: Pharmacoepidemiol Drug Saf. 2011 May 12;20(7):709–713. doi: 10.1002/pds.2157

Accuracy of identifying neutropenia diagnoses in outpatient claims data

Seo Young Kim 1,2, Daniel H Solomon 1,2, Jun Liu 1, Chun-Lan Chang 3, Gregory W Daniel 3, Sebastian Schneeweiss 1
PMCID: PMC3142869  NIHMSID: NIHMS306623  PMID: 21567653

Abstract

Purpose

Diagnosis codes have been valid tools to identify severe neutropenia leading to hospitalization in claims data, but no data exist on the accuracy of outpatient diagnosis of neutropenia. We examined the validity and accuracy of claims-based algorithms to identify neutropenia from outpatient visits.

Methods

Adults with outpatient diagnosis of neutropenia in the HealthCore Integrated Research Database ™ were identified by several algorithms using a combination of ICD-9 codes and drug use data. We calculated sensitivity, specificity, positive predictive value (PPV) and negative predictive value of these algorithms using outpatient laboratory data within 3 months of the diagnosis as the gold standard to ascertain cases of mild (absolute neutrophil count (ANC) <1,500 /μL) and severe (ANC <500 /μL) neutropenia.

Results

Among 95,742 eligible subjects, 867 patients were identified with any ICD-9 codes for neutropenia. This algorithm had high specificity (99%), but low sensitivity (9%) and PPV (18%) for mild neutropenia. Among the subjects identified with the ICD-9 288.0 (N=203), sensitivity was 4% and PPV was 33%. Specificity and PPV of the algorithm that combined any ICD-9 codes for neutropenia with dispensing of pegfilgrastim or filgrastim were 100% and 56% for mild neutropenia, respectively. Sensitivity was 1%. All algorithms had slightly higher sensitivity, but lower PPV for severe neutropenia.

Conclusions

Use of ICD-9 codes for neutropenia in combination with drug use data did not appear to accurately identify outpatient diagnosis of neutropenia without using laboratory results, but it may be useful in determining the absence of neutropenia in claims data.

Keywords: neutropenia, validation studies, diagnosis code, International Classification of Diseases

Introduction

Numerous cytotoxic and non-chemotherapy drugs, such as β-lactam antibiotics, antithyroid drugs, antipsychotic drugs, non-steroidal anti-inflammatory drugs, anti-arrhythmics, and sulfa-based drugs, are known to cause hematologic toxicity, including neutropenia.1-5 In severe neutropenia or agranulocytosis, defined as an absolute neutrophil count (ANC) below 500/μL, patients can develop serious infections leading to hospitalization and potentially death.2, 5 The overall age- and sex-adjusted incidence rates of neutropenia and agranulocytosis based on inpatient claims data in the U.S. were 44.5 and 9.0 per million per year, respectively.6 The management of drug-induced neutropenia includes immediate discontinuation of any potential causative drugs, prevention of infections with antibiotics, and use of granulocyte colony-stimulating factors (G-CSF) such as pegfilgrastim or filgrastim in severe cases.

Large health care utilization databases are useful for studying rare drug-associated adverse events, such as neutropenia, because of their size and access to prescription drug utilization data.7 Because most pharmacoepidemiologic studies that use claims data rely mainly on diagnosis codes to define their outcomes, there is a potential for misclassifying outcomes such as neutropenia. Therefore, validating the outcomes with medical records and laboratory data is often needed to minimize outcome misclassification bias.7 Although the International Classification of Diseases, 9th Revision, Clinical Modification (ICD-9-CM) code 288.0 for neutropenia has a high positive predictive value (PPV) of 97% in a hospital-based setting,6, 8 the accuracy of neutropenia diagnoses in claims data derived from outpatient visits is unknown. We developed several algorithms for diagnoses of neutropenia from outpatient visits using a combination of diagnosis codes and drug dispensing data in a commercial health care utilization database and validated them with outpatient laboratory data.

Methods

Data Source

We used health care claims data from the HealthCore Integrated Research Database (HIRD™) for the period January 1, 2005 through June 30, 2008. This database contained longitudinal claims information including medical diagnoses, procedures, hospitalizations, physician visits, and pharmacy dispensing records on more than 28 million subscribers, with medical and pharmacy coverage, from 14 Blue Cross/Blue Shield health plans across the United States. Results for outpatient laboratory tests, including complete blood counts with differentials, were available in the subset of beneficiaries whose tests were processed by two nationally operating lab test providers. This work was approved by the Institutional Review Board at Partners HealthCare. Data Use Agreements were in place with HealthCore, Inc. prior to initiating this study.

Neutropenia Diagnosis Algorithms

We developed several algorithms to identify subjects with outpatient diagnoses of neutropenia: A) subjects with any outpatient claims with the ICD-9 code 288.00, B) subjects with any outpatient claims with the ICD-9 codes for neutropenia (see Table 1 for the list of ICD-9 codes), and C) subjects with any outpatient claims with the ICD-9 codes for neutropenia combined with one or more outpatient drug dispensings for G-CSF (pegfilgrastim or filgrastim). Use of G-CSF was identified by the National Drug Code, a unique identifier for human drug products, in the outpatient prescription claims during the study period.

Table 1. ICD-9 codes for neutropenia.

288.00 Unspecified neutropenia
288.03 Drug-induced neutropenia
288.09 Other neutropenia
288.5 Decreased white blood cell count
288.50 Unspecified leukocytopenia
288.59 Other decreased white blood cell count
288.8 Other specified disease of white blood cells
288.9 Unspecified disease of white blood cells
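
The three algorithm definitions can be sketched in a few lines of Python. This is only an illustration of the logic described above, not the study's SAS code: the function name `classify_subject` and its inputs (a list of outpatient diagnosis codes and a count of G-CSF dispensings per subject) are hypothetical, and the code set is copied from Table 1.

```python
# Illustrative sketch only; the input layout is hypothetical, not the HIRD schema.
# ICD-9 codes for neutropenia as listed in Table 1.
NEUTROPENIA_CODES = {
    "288.00", "288.03", "288.09", "288.5",
    "288.50", "288.59", "288.8", "288.9",
}

def classify_subject(outpatient_dx_codes, gcsf_dispensings):
    """Return the set of algorithm labels (A, B, C) that a subject satisfies.

    outpatient_dx_codes: iterable of ICD-9 code strings from outpatient claims.
    gcsf_dispensings: number of outpatient pegfilgrastim/filgrastim dispensings.
    """
    codes = set(outpatient_dx_codes)
    labels = set()
    if "288.00" in codes:
        labels.add("A")          # Algorithm A: ICD-9 288.00 only
    if codes & NEUTROPENIA_CODES:
        labels.add("B")          # Algorithm B: any code from Table 1
        if gcsf_dispensings >= 1:
            labels.add("C")      # Algorithm C: B plus at least one G-CSF dispensing
    return labels
```

Under these definitions every subject meeting algorithm A or C also meets algorithm B, which agrees with algorithm B identifying the largest group in the Results.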

Cohort Selection

Subjects were required to have at least 12 months of continuous health plan eligibility before entering the initial study cohort. Subjects with claims for solid tumors, hematologic malignancies, myelodysplastic syndrome, human immunodeficiency virus infection, or chemotherapy were excluded from the study cohort. Only subjects who had available laboratory data for the ANCs were included. Each algorithm was then applied to identify subjects with outpatient diagnosis of neutropenia. Figure 1 displays our cohort selection process.

Figure 1. Selection of the study cohort.

Claims data derived from outpatient visits in the HealthCore Integrated Research Database (HIRD) between January 1, 2005 and June 30, 2008 were used to identify cases of neutropenia based on several algorithms. Subjects with a diagnosis of solid tumor, hematologic malignancies, human immunodeficiency virus (HIV) infection, or history of chemotherapy were excluded. Only patients with data on absolute neutrophil counts were included in the analysis.

Algorithm Validation

We used outpatient laboratory data within 3 months before or after the neutropenia diagnosis as the gold standard to validate the claims-based algorithms. Mild neutropenia was defined as having at least one ANC less than 1,500/μL and severe neutropenia as having at least one ANC less than 500/μL. We calculated the positive predictive value (PPV) as the percentage of patients who met the laboratory definitions of neutropenia, mild and severe, among those identified by the respective algorithms. The negative predictive value (NPV) of the algorithms was calculated as the percentage of patients not identified by the algorithms who did not have neutropenia based on the laboratory data. 95% confidence intervals (CIs) of the PPV, NPV, sensitivity, and specificity were calculated by using the normal approximation of the binomial distribution. All analyses were done using SAS 9.1 Statistical Software (SAS Institute Inc., Cary, NC).
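
As a minimal sketch of these calculations (in Python rather than the SAS used in the study), the four accuracy measures and their normal-approximation CIs can be computed from a 2×2 table of algorithm result versus laboratory result. The function names and the example counts below are hypothetical and are not the study's data; clipping the interval to the 0-1 range is a choice made here for illustration.

```python
import math

def proportion_ci(successes, total, z=1.96):
    """Proportion with a normal-approximation (Wald) 95% CI."""
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    # Clip to [0, 1]; the Wald interval can overshoot near the boundaries.
    return p, max(0.0, p - half_width), min(1.0, p + half_width)

def meets_lab_definitions(anc_values):
    """Apply the laboratory gold standard to a subject's ANC results (cells/μL)."""
    lowest = min(anc_values)
    # Severe cases (< 500) also satisfy the mild definition (< 1,500).
    return {"mild": lowest < 1500, "severe": lowest < 500}

def accuracy(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV and NPV, each as (estimate, lower, upper)."""
    return {
        "sensitivity": proportion_ci(tp, tp + fn),
        "specificity": proportion_ci(tn, tn + fp),
        "ppv": proportion_ci(tp, tp + fp),
        "npv": proportion_ci(tn, tn + fn),
    }

# Hypothetical counts for illustration only, NOT the study's data.
metrics = accuracy(tp=10, fp=10, fn=90, tn=890)
for name, (estimate, lower, upper) in metrics.items():
    print(f"{name}: {100 * estimate:.0f}% ({100 * lower:.0f}-{100 * upper:.0f})")
```

The Wald interval is simple but behaves poorly when a proportion is close to 0 or 1 or when the denominator is small, which is worth keeping in mind when reading intervals for small groups.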

Results

A total of 730,874 subjects had at least 12 months of continuous health plan eligibility in the HIRD™. Of those, 63,191 subjects who had one or more claims for solid tumors, hematologic malignancies, myelodysplastic syndrome, human immunodeficiency virus infection, or chemotherapy were excluded. The claims-based algorithms were run in the cohort of 95,742 eligible subjects who had laboratory data on ANCs (Figure 1). Of those, 867 (0.9%) patients were identified by algorithm B, 203 (0.2%) patients by algorithm A, and 18 (0.02%) patients by algorithm C. The baseline characteristics of the study population identified by each algorithm are presented in Table 2. The mean age of patients in each group ranged between 46.1 and 49.4 years and 72.2% to 78.9% were female. Table 3 summarizes the accuracy of each algorithm for outpatient diagnosis of neutropenia. All the algorithms had both specificity and NPV around 100%, but very low sensitivity. For mild neutropenia, PPV of the algorithms ranged from 18% to 56%. For severe neutropenia, PPV was lower, ranging from 3% to 22%.

Table 2. Baseline characteristics of study cohort identified by the algorithms*.

Characteristic                            Algorithm A   Algorithm B   Algorithm C
N                                         203           867           18
Age, mean (SD)                            46.9 (12.4)   46.1 (12.6)   49.4 (10.9)
Female, n (%)                             158 (77.8)    684 (78.9)    13 (72.2)
Number of outpatient visits, mean (SD)    5.7 (6.6)     6.5 (7.4)     7.2 (6.6)
Number of prescription drugs, mean (SD)   5.8 (6.4)     6.9 (7.0)     8.1 (4.9)
Emergency room visit, n (%)               25 (12.3)     158 (18.2)    7 (38.9)
Hospitalization, n (%)                    84 (10.8)     468 (12.9)    8 (18.2)

* Patient characteristics were assessed during 365 days prior to diagnosis of neutropenia.

Algorithm A: subjects with outpatient claims for ICD-9 code 288.00; Algorithm B: subjects with any outpatient claims for neutropenia; Algorithm C: subjects with any outpatient claims for neutropenia combined with a drug dispensing for G-CSF.

SD: standard deviation

Table 3. Accuracy of outpatient diagnoses for neutropenia.

Algorithm                                                                Sensitivity (95% CI)   Specificity (95% CI)   PPV (95% CI)   NPV (95% CI)

Mild neutropenia (ANC < 1,500/μL)
A. ICD-9 288.00                                                          4 (3-5)                100 (100-100)          33 (27-40)     98 (98-98)
B. Any ICD-9 codes for neutropenia                                       9 (78-10)              99 (99-99)             18 (15-20)     98 (98-98)
C. Any ICD-9 codes for neutropenia combined with a dispensing of G-CSF   1 (0-1)                100 (100-100)          56 (33-79)     98 (98-98)

Severe neutropenia (ANC < 500/μL)
A. ICD-9 288.00                                                          16 (7-25)              100 (100-100)          5 (2-8)        100 (100-100)
B. Any ICD-9 codes for neutropenia                                       35 (23-46)             99 (99-99)             3 (2-4)        100 (100-100)
C. Any ICD-9 codes for neutropenia combined with a dispensing of G-CSF   6 (0-12)               100 (100-100)          22 (3-41)      100 (100-100)

Numbers are presented in percentage.

ANC: absolute neutrophil count; G-CSF: Granulocyte colony-stimulating factor; CI: confidence intervals; PPV: positive predictive value; NPV: negative predictive value; ICD-9: International Classification of Diseases, 9th revision

Discussion

Using claims data from outpatient visits in a commercial health care utilization database with laboratory test results, our study showed that outpatient diagnosis codes for neutropenia were rare and underused, and that having at least one ICD-9 code for neutropenia, with or without use of G-CSF, did not appear to accurately identify outpatient diagnoses of neutropenia.

In previous validation studies, the PPV of ICD-9 code 288.0 was 97% for inpatient diagnoses of neutropenia.8, 9 Several limitations could partially explain the suboptimal performance of the claims-based algorithms in this study. All the algorithms had poor sensitivity for both mild and severe neutropenia. Because this study focused on outpatient diagnoses, neutropenia could have been missed unless it was symptomatic or highly suspected by clinicians. It is also possible that clinicians do not necessarily record a code for neutropenia, even when the clinical diagnosis is present, if other health conditions are related to the visit. The prevalence of neutropenia was probably lower in the outpatient population than in the inpatient setting. A lower PPV was therefore expected, as the PPV is directly related to disease prevalence.10 We also did not have access to all the longitudinal laboratory data that patients might have had at other outpatient laboratory facilities, leading to incomplete ascertainment of cases. A total of 424,795 subjects had one or more complete blood counts with or without differentials ordered by a physician at any time during the study period. However, only 95,742 subjects had ANC results available to us. About 27% of subjects identified with algorithm A had a blood test done at a laboratory facility to which we had access. Patients with severe neutropenia requiring therapy with G-CSF are often hospitalized for further treatment; it is therefore not surprising that algorithm C, which combined diagnosis codes with use of G-CSF, identified only a small number of outpatients (N=18) with neutropenia.
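
The dependence of PPV on prevalence follows from Bayes' rule and can be illustrated with a short Python sketch. The sensitivity and specificity used here are the rounded Table 3 values for algorithm B (mild neutropenia); the two prevalence values are hypothetical and chosen only to show the direction of the effect, and holding sensitivity and specificity fixed across settings is itself an assumption.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value from sensitivity, specificity and prevalence."""
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)

# Rounded Table 3 values for algorithm B; prevalences are hypothetical.
low_prevalence_ppv = ppv(0.09, 0.99, prevalence=0.02)
high_prevalence_ppv = ppv(0.09, 0.99, prevalence=0.20)
print(f"PPV at 2% prevalence:  {low_prevalence_ppv:.2f}")
print(f"PPV at 20% prevalence: {high_prevalence_ppv:.2f}")
```

With accuracy held fixed, the same algorithm yields a much lower PPV in the low-prevalence scenario, because false positives from the large disease-free group outnumber the true positives.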

Table 4 illustrates the impact of neutropenia outcome misclassification on estimates of relative risk (RR) and risk difference (RD) in a hypothetical cohort study assessing the effect of a specific drug on neutropenia using claims-based algorithms. Suppose there are 5,000 patients with neutropenia in a total study population of 100,000 and the true RR of developing neutropenia in subjects treated with a certain drug is 2.0. Because algorithm B (any ICD-9 code for neutropenia) has a sensitivity of 9% and a specificity of 99%, only 1,400 patients (5,000 × sensitivity + 95,000 × [1 − specificity]) would be identified as having neutropenia by algorithm B.11 The estimated RR and RD in this hypothetical study using algorithm B to identify neutropenia would be 1.27 and 0.004, respectively. In other words, estimates of both RR and RD would be substantially biased towards the null due to non-differential misclassification of the outcome. The impact of bias due to outcome misclassification will be even greater if the disease is less prevalent. Therefore, studies assessing the risk of neutropenia based solely on the ICD-9 code in claims data should be interpreted with caution.
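
The arithmetic behind this hypothetical example can be sketched in Python. The sketch assumes a risk of 0.05 among the unexposed and 0.10 among the exposed, which is one reading of the true RR of 2.0 and true RD of 0.05 in Table 4; the text does not state how the exposure groups were set up, so these inputs and the function names are assumptions.

```python
def observed_risk(true_risk, sensitivity, specificity):
    """Apparent risk when the outcome is measured with imperfect accuracy."""
    return sensitivity * true_risk + (1 - specificity) * (1 - true_risk)

def biased_estimates(risk_unexposed, true_rr, sensitivity, specificity):
    """Observed RR and RD under non-differential outcome misclassification."""
    risk_exposed = risk_unexposed * true_rr
    obs_exposed = observed_risk(risk_exposed, sensitivity, specificity)
    obs_unexposed = observed_risk(risk_unexposed, sensitivity, specificity)
    return obs_exposed / obs_unexposed, obs_exposed - obs_unexposed

# Subjects flagged by algorithm B among 5,000 true cases and 95,000 non-cases.
flagged = 5000 * 0.09 + 95000 * (1 - 0.99)
print(f"Flagged by algorithm B: {flagged:.0f}")

# Assumed baseline risk of 0.05 in the unexposed (see lead-in).
rr_b, rd_b = biased_estimates(0.05, 2.0, sensitivity=0.09, specificity=0.99)
rr_a, rd_a = biased_estimates(0.05, 2.0, sensitivity=0.04, specificity=1.00)
print(f"Algorithm B: RR = {rr_b:.2f}, RD = {rd_b:.4f}")
print(f"Algorithm A: RR = {rr_a:.2f}, RD = {rd_a:.4f}")
```

Under these assumptions the sketch reproduces the RD estimates in Table 4 and an unbiased RR when specificity is 100%. For algorithm B it gives an RR of about 1.29 with the rounded inputs, close to but not identical to the 1.27 in Table 4; the difference may reflect unrounded sensitivity and specificity in the original calculation, which cannot be confirmed from the text.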

Table 4. The impact of neutropenia outcome misclassification on estimates of relative risk and risk difference in a hypothetical cohort study assessing the effect of a specific drug on neutropenia using claims-based algorithms.

Algorithm                                                                Sensitivity (%)   Specificity (%)   True RR   RR estimates*   True RD   RD estimates*
A. ICD-9 288.00                                                          4                 100               2.0       2.00            0.05      0.002
B. Any ICD-9 codes for neutropenia                                       9                 99                2.0       1.27            0.05      0.004
C. Any ICD-9 codes for neutropenia combined with a dispensing of G-CSF   1                 100               2.0       2.00            0.05      0.0005

Prevalence of neutropenia in the study population was assumed to be 0.05.

RR= relative risk; RD= risk difference; G-CSF: Granulocyte colony-stimulating factor; ICD-9: International Classification of Diseases, 9th revision

* RR and RD were estimated using a claims-based algorithm for outcome identification.

This study highlights the need for validation studies that compare diagnosis code-based outcomes in claims data with medical records to assess the potential impact of misclassification bias. Using large longitudinal health care utilization data linked to both outpatient prescription drug use and laboratory test results, we were able to calculate not only PPV and NPV, but also sensitivity and specificity of various algorithms to define outpatient diagnosis of neutropenia. Given the high specificity and NPV close to 100% (Table 3), these algorithms may be useful in a pharmacoepidemiologic study that needs to verify the absence of outpatient neutropenia in patients exposed to a certain drug, or as a screening tool for potential neutropenia outcomes that need additional adjudication.

Key Points.

The diagnosis of neutropenia in an outpatient setting was rare and having at least one International Classification of Diseases, 9th Revision, diagnosis code for neutropenia with or without use of granulocyte colony-stimulating factor did not accurately identify diagnosis of neutropenia in outpatient claims data.

Acknowledgments

This study was supported by the National Institutes of Health (NIH) K24 (AR055989) grant.

Dr. Kim is supported by the NIH (T32 AR055885 and now K23 AR059677) grant.

Dr. Solomon is supported by the NIH (K24 AR055989, P60 AR047782, R21 DE018750, and R01 AR056215) grants. Dr. Solomon has received research support from Abbott Immunology, Amgen, and support for an educational course from Bristol Myers Squibb. He has unpaid roles in two drug trials sponsored by Pfizer.

Dr. Schneeweiss is principal investigator of the Brigham and Women's Hospital DEcIDE Center on Comparative Effectiveness Research, funded by the Agency for Healthcare Research and Quality, and of the Harvard-Brigham Drug Safety and Risk Management Research Contract, funded by the US Food and Drug Administration. In the past year, Dr Schneeweiss was a paid member of the scientific advisory board of HealthCore and has received consulting fees from WHISCON. Dr. Schneeweiss has received an Investigator-initiated research grant from Pfizer, Inc.

Footnotes

Conflict of Interest: Dr. Solomon has received research support from Abbott Immunology, Amgen, and support for an educational course from Bristol Myers Squibb. He has unpaid roles in two drug trials sponsored by Pfizer. In the past year, Dr Schneeweiss was a paid member of the scientific advisory board of HealthCore and has received consulting fees from WHISCON. Dr. Schneeweiss has received an Investigator-initiated research grant from Pfizer, Inc.

References

1. Mintzer D, Billet S, Chmielewski L. Drug-induced hematologic syndromes. Adv Hematol. 2009. doi: 10.1155/2009/495863. Epub.
2. Andersohn F, Konzen C, Garbe E. Systematic review: agranulocytosis induced by nonchemotherapy drugs. Ann Intern Med. 2008;146(9):657–665. doi: 10.7326/0003-4819-146-9-200705010-00009.
3. Andrès E, Maloisel F. Idiosyncratic drug-induced agranulocytosis or acute neutropenia. Curr Opin Hematol. 2008;15(1):15–21. doi: 10.1097/MOH.0b013e3282f15fb9.
4. Garbe E. Non-chemotherapy drug-induced agranulocytosis. Expert Opin Drug Saf. 2007;6(3):323–335. doi: 10.1517/14740338.6.3.323.
5. Andrès E, Kurtz J, Maloisel F. Nonchemotherapy drug-induced agranulocytosis: experience of the Strasbourg teaching hospital (1985-2000) and review of the literature. Clin Lab Haematol. 2002;24(2):99–106. doi: 10.1046/j.1365-2257.2002.00437.x.
6. Strom B, Carson J, Schinnar R, Snyder E, Shaw M. Descriptive epidemiology of agranulocytosis. Arch Intern Med. 1992;152(7):1475–1480.
7. Schneeweiss S, Avorn J. A review of uses of health care utilization databases for epidemiologic research on therapeutics. J Clin Epidemiol. 2005;58(4):323–337. doi: 10.1016/j.jclinepi.2004.10.012.
8. Strom B. Data validity issues in using claims data. Pharmacoepidemiol Drug Saf. 2001 Aug-Sep;10(5):389–392. doi: 10.1002/pds.610.
9. Strom B, Carson J, Schinnar R, Shaw M. Is cimetidine associated with neutropenia? Am J Med. 1995 Sep;99(3):282–290. doi: 10.1016/s0002-9343(99)80161-2.
10. Altman D, Bland J. Diagnostic tests 2: Predictive values. BMJ. 1994;309(6947):102. doi: 10.1136/bmj.309.6947.102.
11. Rothman KJ, Greenland S, Lash TL. Validity in epidemiologic studies. In: Rothman KJ, Greenland S, Lash TL, editors. Modern Epidemiology. 3rd. Philadelphia: Lippincott Williams & Wilkins; 2008. pp. 137–145.
