Abstract
Background:
International Classification of Diseases (ICD) code algorithms are routinely used to estimate the frequency of illicit injection drug use (IDU)-associated hospitalizations in administrative health datasets despite a lack of evidence regarding their validity. We aimed to measure the sensitivity and specificity of ICD code algorithms used to estimate the prevalence of current/recent IDU among infective endocarditis (IE) hospitalizations without a reference standard.
Methods:
We reviewed medical records of 321 patients aged 18–64 years from an urban academic hospital with an IE diagnosis between 2007 and 2017. Diagnostic tests for IDU included self-reported IDU in medical records; a drug use, abuse and dependence (UAD) ICD algorithm; a Hepatitis C Virus (HCV) ICD algorithm; and a combination drug UAD/HCV ICD algorithm. Sensitivity, specificity and the misclassification error (ME)-adjusted IDU prevalence were estimated using Bayesian latent class models.
Results:
The combination algorithm had the highest sensitivity and lowest specificity. Sensitivity increased in the ICD-10 period for the drug UAD algorithm compared to the ICD-9 period. The ME-adjusted current/recent IDU prevalence estimated using the drug UAD and HCV algorithms was 23% (95% Bayesian credible interval: 16%, 31%). The unadjusted prevalence estimate from the drug UAD algorithm underestimated the ME-adjusted prevalence, while the combination algorithm overestimated it.
Conclusion:
The validity of ICD code algorithms for IDU among IE hospitalizations is imperfect and differs between ICD-9 and ICD-10. Commonly used ICD-based algorithms could lead to substantially biased prevalence estimates in IDU-associated hospitalizations when using administrative health data.
Keywords: validation studies, endocarditis, substance abuse, intravenous, International Classification of Diseases, diagnostic error, Bayesian analysis
1. Introduction
Worldwide, an estimated 15.6 million persons (95% uncertainty interval: 10.2 million, 23.7 million) injected drugs in 2015. (Degenhardt et al., 2017) Non-medical injection drug use (IDU) can lead not only to devastating communicable infections such as HIV and Hepatitis B and C (Degenhardt et al., 2017; Lansky et al., 2014) but also to costly and potentially fatal bacterial or fungal infections such as infective endocarditis (IE). IE is an infection of the heart chambers or valves that can be acquired through non-sterile drug use behaviors such as sharing drug use equipment and administering injections if an organism is present on the skin, the injection equipment, or in the drug itself. (Gordon and Lowy, 2005; Wurcel et al., 2015) The epidemiology of IDU-associated IE is challenging to estimate due to the lack of surveillance for IDU and IE and the sensitive nature of non-medical IDU. Accurate estimation of IDU requires a massive effort (e.g., Degenhardt et al. (2017)). Yet clinicians and researchers alike may wish to monitor the frequency of healthcare encounters for IDU-IE at a single institution or at the state or national level to estimate healthcare costs and the need for and impact of public health interventions to reduce morbidity and mortality from IDU. Trends in healthcare encounters for IDU-IE can be estimated efficiently using administrative health databases (e.g., hospitalization discharge data), and available evidence suggests that hospitalizations for IDU-IE have increased in North America in parallel with the twenty-first century opioid crisis. (Cooper et al., 2007; Weir et al., 2019; Wurcel et al., 2016) However, in such studies, identifying IDU-IE hospitalizations relied on International Classification of Diseases (ICD) codes because most administrative health datasets lack detailed information regarding recent IDU. (Cooper et al., 2007; Tookes et al., 2015; Weir et al., 2019; Wurcel et al., 2016)
While ICD algorithms for classifying IE cases have demonstrated high sensitivity and nearly perfect specificity when validated against results of the Modified Duke Criteria, (Tan et al., 2016; Toyoda et al., 2017) no ICD code specifically for drug use via injection exists. Consequently, prior studies of IDU-IE used ICD codes for conditions associated with IDU as surrogates (e.g., ICD codes for opioid and illicit drug use, abuse or dependence [UAD] or acute and chronic Hepatitis C Virus [HCV]). (Cooper et al., 2007; Tookes et al., 2015; Weir et al., 2019; Wurcel et al., 2016) However, drug UAD ICD codes do not specify the route of drug administration (e.g., injection, inhalation, oral ingestion), and so this surrogate misclassifies users of non-injectable drugs as persons who inject drugs. Using the HCV ICD algorithm to identify IDU can lead to a large number of false-positive cases because, while IDU is a major risk factor for HCV, non-IDU causes of HCV are also important. (Girometti et al., 2019; Ramiere et al., 2019)
To validate the ability of ICD codes to identify IDU, prior studies used self-reported IDU status abstracted from medical records or disease surveillance reports as the reference standard. (Ball et al., 2017; Janjua et al., 2018) These studies produced suboptimal point estimates of sensitivity (49–73% and 59–66%) and specificity (79–92% and 79–95%) for the drug UAD ICD codes and HCV ICD codes, respectively. (Ball et al., 2017; Janjua et al., 2018) However, self-reported IDU is very likely to be an imperfect reference standard due to stigma and fear of legal consequences which may prevent a user of injection drugs from reporting this behavior. The goal of this study was to estimate the sensitivity and specificity of ICD code algorithms and the misclassification error (ME)-adjusted prevalence of IDU among IE hospitalizations in the absence of a reference standard using Bayesian latent class models (BLCM).
2. Methods
2.1. Study design
This was a cross-sectional analysis using retrospective data and followed the Standards for the Reporting of Diagnostic Accuracy Studies that use Bayesian Latent Class Models (STARD-BLCM) (Appendix A). (Kostoulas et al., 2017) The University of Oklahoma Health Sciences Center Institutional Review Board approved this research.
2.2. Source and study populations
The source population consisted of inpatient hospitalizations admitted to a 400-bed urban academic hospital in the Midwestern US between January 1, 2007 and December 31, 2017. The target population was consecutive IE inpatient hospitalizations among source population members aged 18–64 years at admission. These potentially eligible hospitalizations were excluded if no medical record was available for review. An IE hospitalization was defined as a hospitalization discharge record with one or more ICD codes for IE in any diagnostic position (Appendix B). These IE algorithms had been previously validated against the Modified Duke Criteria. (Tan et al., 2016; Toyoda et al., 2017) ICD Ninth Revision (ICD-9)- and Tenth Revision (ICD-10)-based definitions for IE were used to identify patients discharged from 1/1/2007 to 9/30/2015 and from 10/1/2015 to 12/31/2017, respectively (Appendix B). Eligible patients were excluded from the final study population if missing or inconclusive reference standard results were found in the medical records.
2.3. Tests for IDU
2.3.1. ICD code algorithms for IDU
ICD codes included in our algorithms are listed in Table B1 of Appendix B. We began with three ICD algorithms validated in a cross-sectional study (Ball et al., 2017) of ICD-10 algorithms for IDU-IE using self-reported IDU status and results of the Modified Duke Criteria (Li et al., 2000) for IE abstracted from medical records as the reference standard: (1) at least one ICD diagnosis code for drug UAD with opioids or illicit drugs (“drug UAD algorithm”); (2) at least one ICD diagnosis code for acute or chronic HCV (“HCV algorithm”); (3) a code from the drug UAD algorithm or the HCV algorithm (“combination algorithm”). (Ball et al., 2017) Two modifications were made to these algorithms for our study. First, ICD-10 codes were translated to their ICD-9 code equivalents using forward and backward mapping with General Equivalence Mappings for validation during the period that ICD-9 codes were in use in the study hospital. (The National Bureau of Economic Research, 2016) Second, we added ICD-9 and ICD-10 codes for unspecified HCV with and without hepatic coma to the HCV algorithm, similar to what had been done in another study identifying IDU-IE using US administrative health data. (Wurcel et al., 2016) Only data from the current hospitalization (i.e., the hospitalization for which the ICD code for IE was selected) was abstracted.
2.3.2. Self-reported IDU status from medical records
Medical record abstraction was conducted by the first author (KM), who received electronic medical record software (MEDITECH) training and was mentored by a co-author (DD), a physician providing care for IE patients. IDU status was abstracted through manual review of electronic medical records for the current hospitalization only (i.e., the hospitalization for which the ICD code for IE was selected) using a medical record abstraction form developed and pilot tested for this study (Appendix C). Reports of IDU and illicit drug use status by family members or friends were used for patients unable to self-report their status (e.g., due to intubation). If self-reported status differed from that reported by a family member or friend (e.g., patient reported positive yet family reported negative), the positive status was assumed for analyses. When the first author (KM) required assistance interpreting clinical documentation, the records were reviewed with a co-author (DD). Both reviewers (KM and DD) were blinded to the ICD codes selected for the hospitalization during medical record abstraction. For all analyses, hospitalizations were dichotomized as follows:
- “IE with current/recent IDU”: Documentation of positive IDU history with last exposure reported to be less than or equal to six months ago
- “IE without current/recent IDU”: Documentation of negative IDU history or non-recent IDU (i.e., no IDU within six months), or documentation of a negative illicit drug use history and a missing IDU history
Hospitalizations with missing or indeterminate results were excluded from analyses.
2.4. Statistical analyses
2.4.1. Unadjusted prevalence estimates
The unadjusted current/recent IDU prevalence was calculated as the number testing positive for each ICD algorithm during the study period divided by the total study population during the period.
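As a minimal sketch of this calculation, the snippet below computes the crude proportion and, following the noninformative beta (1,1) prior noted under Table 3, a posterior median and 95% credible interval by Monte Carlo. The function names and the count of 60 positives are hypothetical and chosen only for illustration; they are not the study's counts.

```python
import random
from statistics import median

def unadjusted_prevalence(n_positive, n_total):
    """Crude proportion of hospitalizations flagged by an ICD algorithm."""
    return n_positive / n_total

def beta_posterior_summary(n_positive, n_total, n_draws=20000, seed=7):
    """Posterior median and 95% interval under a beta(1,1) prior.

    With a beta(1,1) prior and binomial data, the posterior is
    beta(1 + positives, 1 + negatives); it is summarized here by sampling.
    """
    rng = random.Random(seed)
    draws = sorted(
        rng.betavariate(1 + n_positive, 1 + n_total - n_positive)
        for _ in range(n_draws)
    )
    lower = draws[int(0.025 * n_draws)]
    upper = draws[int(0.975 * n_draws) - 1]
    return median(draws), lower, upper

# Hypothetical count: 60 algorithm-positive hospitalizations out of 321.
crude = unadjusted_prevalence(60, 321)
post_median, lower, upper = beta_posterior_summary(60, 321)
print(f"crude={crude:.3f}, median={post_median:.3f}, 95% BCI=({lower:.3f}, {upper:.3f})")
```

With a sample of this size the posterior median sits very close to the crude proportion, so the prior has little influence on the unadjusted estimate.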
2.4.2. Misclassification error-adjusted validity and prevalence estimates
Two BLCM, stratified by ICD revision period, were used to estimate the validity of self-reported IDU status and the ICD-based algorithms, none of which was considered a reference standard. BLCM have been used extensively to estimate the prevalence of an outcome and the validity of multiple diagnostic tests in the absence of a reference standard. (Carabin et al., 2015; Jafarzadeh and Felson, 2018; Jafarzadeh et al., 2016a; Jafarzadeh et al., 2016b; Joseph et al., 1995; Lasry et al., 2018; Schumacher et al., 2016) In the first model (“model 1”), we analyzed three tests for current/recent IDU: the drug UAD algorithm, the HCV algorithm, and the self-reported IDU status. (Joseph et al., 1995) A second model (“model 2”) was developed considering only two diagnostic tests, that is, the combination ICD algorithm and the self-reported status. We assumed conditional independence among all tests in the main analysis. Conditional independence means that, conditional on the true IDU status of the subject being tested, the result of any one test is not affected by the result of any other test. For example, when assuming conditional independence, we consider that, among subjects with the same true IDU status, a positive result from the drug UAD ICD algorithm makes a positive result from the HCV ICD algorithm neither more nor less likely. However, we also considered a situation where the assumption of conditional independence may not be met. A person truly IDU-positive who admits to injection drug use will likely have a note regarding IDU status in the medical record, and this note will likely influence the ICD coding for UAD (i.e., a positive self-reported IDU status would make it more likely that an ICD code for drug UAD would be selected for the hospital discharge record), creating a dependence between the UAD ICD algorithm and the self-reported IDU status. We therefore conducted a sensitivity analysis assuming conditional dependence between these two tests (see 2.4.5).
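Under conditional independence, the probability of observing a given pattern of results across the three tests in model 1 can be written as a two-component mixture over the unobserved true IDU status. The notation below is ours, introduced for illustration: $\pi$ is the true prevalence, $Se_j$ and $Sp_j$ are the sensitivity and specificity of test $j$, and $t_j \in \{0, 1\}$ is its result. It follows the general form used in latent class models such as Joseph et al. (1995) and is not taken from the authors' model code.

```latex
\[
P(T_1 = t_1, T_2 = t_2, T_3 = t_3)
  = \pi \prod_{j=1}^{3} Se_j^{\,t_j} (1 - Se_j)^{1 - t_j}
  + (1 - \pi) \prod_{j=1}^{3} (1 - Sp_j)^{t_j} Sp_j^{\,1 - t_j}
\]
```

The first term is the contribution of truly IDU-positive hospitalizations and the second that of truly IDU-negative ones; the products factor test by test only because of the conditional independence assumption.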
Bayesian analyses were conducted using JAGS, version 4.3.0 in R version 3.5.1 and WinBUGS version 1.4.3. Two MCMC chains were run, and all inferences were based on 100,000 iterations after a burn-in of 10,000 iterations. Lack of convergence of the Markov chains was assessed with the Gelman-Rubin statistic and diagnostic plots of MCMC chains.
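The fitting procedure can be sketched with a small hand-written Gibbs sampler. This is an illustration in Python, not the authors' JAGS or WinBUGS code. It assumes three conditionally independent tests, runs two chains, and checks convergence with a basic Gelman-Rubin statistic. The data are simulated, all function names are hypothetical, and the uniform priors on medical record accuracy are replaced by hypothetical beta priors (beta(8, 2) and beta(19, 1)) so that every update stays conjugate.

```python
import random
from itertools import product
from statistics import mean, median, variance

def simulate_counts(n, prev, se, sp, seed=0):
    """Simulate cross-classified test results (synthetic, not the study data)."""
    rng = random.Random(seed)
    counts = {pat: 0 for pat in product((0, 1), repeat=len(se))}
    for _ in range(n):
        positive = rng.random() < prev
        pat = tuple(int(rng.random() < (se[j] if positive else 1 - sp[j]))
                    for j in range(len(se)))
        counts[pat] += 1
    return counts

def gibbs_blcm(counts, se_prior, sp_prior, n_iter=1500, burn_in=300, seed=1):
    """Return post-burn-in prevalence draws (beta(1,1) prior on prevalence)."""
    rng = random.Random(seed)
    k = len(se_prior)
    n_total = sum(counts.values())
    prev, se, sp = 0.2, [0.7] * k, [0.9] * k  # arbitrary starting values
    draws = []
    for it in range(n_iter):
        # Step 1: impute the number of truly positive subjects per pattern.
        true_pos = {}
        for pat, n in counts.items():
            like_pos, like_neg = prev, 1.0 - prev
            for j, y in enumerate(pat):
                like_pos *= se[j] if y else 1.0 - se[j]
                like_neg *= 1.0 - sp[j] if y else sp[j]
            p = like_pos / (like_pos + like_neg)
            true_pos[pat] = sum(rng.random() < p for _ in range(n))
        n_pos = sum(true_pos.values())
        n_neg = n_total - n_pos
        # Step 2: conjugate beta updates given the imputed true status.
        prev = rng.betavariate(1 + n_pos, 1 + n_neg)
        for j in range(k):
            tp = sum(z for pat, z in true_pos.items() if pat[j] == 1)
            tn = sum(counts[pat] - z for pat, z in true_pos.items() if pat[j] == 0)
            se[j] = rng.betavariate(se_prior[j][0] + tp, se_prior[j][1] + n_pos - tp)
            sp[j] = rng.betavariate(sp_prior[j][0] + tn, sp_prior[j][1] + n_neg - tn)
        if it >= burn_in:
            draws.append(prev)
    return draws

def gelman_rubin(chains):
    """Basic potential scale reduction factor for equal-length chains."""
    n = len(chains[0])
    within = mean(variance(c) for c in chains)
    between = n * variance([mean(c) for c in chains])
    return (((n - 1) / n * within + between / n) / within) ** 0.5

# Test order: drug UAD algorithm, HCV algorithm, medical record status.
# The first two priors are prior set 1 from Table 1; the third is hypothetical.
se_prior = [(29.337, 27.158), (36.537, 22.781), (8.0, 2.0)]
sp_prior = [(31.888, 6.451), (31.342, 5.534), (19.0, 1.0)]
counts = simulate_counts(1000, 0.25, [0.60, 0.60, 0.80], [0.95, 0.85, 0.98])
chains = [gibbs_blcm(counts, se_prior, sp_prior, seed=s) for s in (1, 2)]
posterior_median = median(chains[0] + chains[1])
r_hat = gelman_rubin(chains)
print(f"median prevalence={posterior_median:.3f}, R-hat={r_hat:.3f}")
```

Values of the Gelman-Rubin statistic close to 1 suggest the chains have mixed. The chain length here is far shorter than the 100,000 iterations used in the study and is kept small only so the sketch runs quickly.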
2.4.3. Priors
In a Bayesian approach, probability distributions are used to reflect the current knowledge (i.e., from previous studies or expert opinion) regarding model parameters of interest (here, the sensitivity and specificity of the different ICD algorithms and of the medical records as well as the true IDU prevalence). Such probability distributions are called priors. These priors are updated with the observed data (i.e., our data for this study) to obtain posterior probability distributions for the model parameters of interest.
We first conducted analyses with a first set of priors (“priors set 1”) and subsequently reran analyses with two other sets of priors (“priors set 2” and “priors set 3”). In Bayesian analyses, several sets of priors are used to assess the robustness of the posterior results to the use of alternative a priori beliefs (i.e., priors) regarding the performance of the algorithms and medical records. Here, we used three sets of priors, each explained below.
2.4.3.1. Priors for IDU prevalence
We assumed a beta (1,1) prior in all models.
2.4.3.2. Priors for ICD code algorithms
For prior set 1, a mode and lower limit in between the point estimates and confidence interval lower bounds reported for the two study populations in Ball et al. (2017) were selected for each parameter (Table 1). Because our study differed from Ball et al. in that we did not collect data on results of the Modified Duke Criteria, for prior set 2, we added 10 percentage points to the mode as we hypothesized the removal of the Modified Duke Criteria from Ball et al.’s reference standard would have increased the sensitivity and specificity of the algorithms. This is because hospitalizations with a self-reported positive IDU status and a negative result from the Modified Duke Criteria would be counted as reference standard positive instead of reference standard negative. Because reference standard positives may be more likely to have positive index tests than negative index tests, without the Modified Duke Criteria in the reference standard, the number of true positives would increase more than the number of false negatives, and the number of false positives would decrease more than the number of true negatives. To allow for even greater uncertainty, for prior set 3, we specified more diffuse priors as uniform distributions. Priors with uniform distributions allow the current data to have more influence on the posterior estimates (i.e., the current data have more freedom to vary) compared to more restrictive informative priors specified as, for example, beta distributions (e.g., prior sets 1 and 2). The lower bounds of prior set 3 were guided by Ball et al.’s results but adjusted downward to account for between-study variation and Ball et al.’s use of the Modified Duke Criteria in the reference standard. These priors with uniform distributions were applied to both ICD periods because data needed to specify more informative priors (e.g., from Ball et al. or other studies) were not available for the ICD-9 period.
Table 1.
Priors Adopted in Bayesian Analyses to Estimate the Unadjusted and Misclassification Error-Adjusted Prevalence of Current/Recent IDU and the Validity of Medical Record Status and ICD Algorithms among IE Hospitalizations, 2007–2017
| Classification Method | Parameter | Prior Set 1 | Prior Set 2 | Prior Set 3 |
|---|---|---|---|---|
| Medical Record IDU Status | Sensitivity | uniform (0.6, 1) | Same as Primary | uniform (0.5, 1) |
| Medical Record IDU Status | Specificity | uniform (0.9, 1) | Same as Primary | uniform (0.8, 1) |
| Drug ICD Code Algorithm | Sensitivity | Mode=52%, 97.5% of the probability density is located above 39%; beta (29.337, 27.158) | Mode=62%, 97.5% of the probability density is located above 39%; beta (11.997, 7.740) | uniform (0.3, 1) |
| Drug ICD Code Algorithm | Specificity | Mode=85%, 97.5% of the probability density is located above 70%; beta (31.888, 6.451) | Mode=95%, 97.5% of the probability density is located above 70%; beta (13.640, 1.665) | uniform (0.5, 1) |
| HCV ICD Code Algorithm | Sensitivity | Mode=62%, 97.5% of the probability density is located above 49%; beta (36.537, 22.781) | Mode=72%, 97.5% of the probability density is located above 49%; beta (14.433, 6.224) | uniform (0.1, 1) |
| HCV ICD Code Algorithm | Specificity | Mode=87%, 97.5% of the probability density is located above 72%; beta (31.342, 5.534) | Mode=97%, 97.5% of the probability density is located above 72%; beta (13.371, 1.383) | uniform (0.5, 1) |
| Combination Drug/HCV ICD Code Algorithm | Sensitivity | Mode=77%, 97.5% of the probability density is located above 65%; beta (48.164, 15.088) | Mode=87%, 97.5% of the probability density is located above 65%; beta (17.122, 3.409) | uniform (0.5, 1) |
| Combination Drug/HCV ICD Code Algorithm | Specificity | Mode=80%, 97.5% of the probability density is located above 63%; beta (26.204, 7.301) | Mode=90%, 97.5% of the probability density is located above 63%; beta (12.476, 2.275) | uniform (0.4, 1) |
Abbreviations: HCV, Hepatitis C Virus; ICD, International Classification of Diseases; IDU, injection drug use
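The beta priors in Table 1 are described by a mode and a lower limit above which 97.5% of the probability density lies. The sketch below shows one way such a description can be turned into beta parameters; it is not the authors' procedure, which the text does not specify. It fixes the mode, then bisects on the concentration until the 2.5th percentile matches the lower limit. The function names are hypothetical and the beta CDF is approximated with Simpson's rule, which is adequate when both parameters exceed 1.

```python
import math

def beta_cdf(x, a, b, steps=2000):
    """Approximate the beta(a, b) CDF at x with Simpson's rule (a, b > 1)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)

    def pdf(t):
        if t <= 0.0 or t >= 1.0:
            return 0.0
        return math.exp(log_norm + (a - 1) * math.log(t) + (b - 1) * math.log(1 - t))

    h = x / steps
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return total * h / 3

def beta_from_mode_and_lower(mode, lower, tail=0.025):
    """Find beta(a, b) with the given mode and P(X < lower) = tail.

    Writing a = 1 + mode * s and b = 1 + (1 - mode) * s keeps the mode fixed
    for any concentration s > 0; a larger s narrows the distribution and
    shrinks the lower tail, so s can be found by bisection.
    """
    lo, hi = 0.0, 5000.0
    for _ in range(60):
        s = (lo + hi) / 2
        a, b = 1 + mode * s, 1 + (1 - mode) * s
        if beta_cdf(lower, a, b) > tail:
            lo = s  # too diffuse: increase concentration
        else:
            hi = s  # too narrow: decrease concentration
    return a, b

# Drug algorithm sensitivity, prior set 1: mode 52%, lower limit 39%.
a_drug, b_drug = beta_from_mode_and_lower(0.52, 0.39)
print(f"beta({a_drug:.3f}, {b_drug:.3f})")
```

The result can be compared against the beta (29.337, 27.158) entry in Table 1. Small differences would be expected if the authors used a different numerical routine.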
2.4.3.3. Priors for self-reported IDU status
Prior selection was loosely guided by results of Weatherby et al. (1994). This study’s sensitivity and specificity estimates were not directly applicable to our study because the authors assessed the validity of self-reported IDU among individuals who had already admitted to using illicit drugs in the community and used a reference standard different from ours. Hence, for prior sets 1 and 2, we specified a uniform (0.6, 1.0) prior for sensitivity, whose lower bound is below the sensitivity estimates reported in that study. We decreased the lower bounds for both sensitivity and specificity in prior set 3 (Table 1).
2.4.4. Bias estimation
The difference between the unadjusted IDU prevalence and the ME-adjusted IDU prevalence was first calculated to estimate bias. This difference was subsequently divided by the corresponding ME-adjusted prevalence result to estimate a bias ratio.
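The two bias measures can be sketched as follows. The function names are hypothetical. For simplicity the example applies them to the posterior medians reported in Table 3, whereas the study summarized the full posterior distributions of these quantities; `summarize_bias` shows the per-draw version, assuming paired lists of posterior draws are available.

```python
from statistics import median

def bias_difference(unadjusted, adjusted):
    """Unadjusted prevalence minus ME-adjusted prevalence."""
    return unadjusted - adjusted

def bias_ratio(unadjusted, adjusted):
    """Bias difference divided by the ME-adjusted prevalence."""
    return (unadjusted - adjusted) / adjusted

def summarize_bias(unadjusted_draws, adjusted_draws):
    """Posterior medians of both measures from paired posterior draws."""
    pairs = list(zip(unadjusted_draws, adjusted_draws))
    diffs = [bias_difference(u, a) for u, a in pairs]
    ratios = [bias_ratio(u, a) for u, a in pairs]
    return median(diffs), median(ratios)

# Drug UAD algorithm, prior set 1 (medians from Table 3, in percent).
drug_diff = bias_difference(18.2, 22.8)
drug_ratio = bias_ratio(18.2, 22.8)
print(round(drug_diff, 1), round(drug_ratio, 1))
```

A negative value on either scale indicates that the ICD algorithm underestimates the ME-adjusted prevalence, and a positive value indicates overestimation.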
2.4.5. Sensitivity analysis
We ran model 1 with the primary and alternative priors assuming conditional dependence (Dendukuri and Joseph, 2001) between the self-reported IDU status and the drug UAD algorithm, as a positive medical record IDU status could increase the likelihood that a drug UAD ICD code is selected as part of the discharge summary. Appendix D displays conditional covariances and the ME-adjusted IDU prevalence calculated assuming conditional dependence. Assuming conditional dependence led primarily to decreased lower bounds of the ME-adjusted IDU prevalence; however, the upper bound was also occasionally affected. Because the assumption of conditional dependence did not make any practical difference to our interpretations, we reported results assuming conditional independence.
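One common way to express conditional dependence between two tests is to add a covariance term to their joint probabilities within each true-status group. The sketch below illustrates that idea. It is an assumption about the general form of such models, not a reproduction of the specification in Appendix D; the function name and the covariance value of 0.05 are hypothetical.

```python
def joint_probs_given_status(p1, p2, cov=0.0):
    """Joint result probabilities for two tests within one true-status group.

    p1 and p2 are each test's probability of a positive result in that group
    (the sensitivities among true positives, or 1 - specificity among true
    negatives). cov = 0 recovers conditional independence.
    """
    lower = max(-p1 * p2, -(1 - p1) * (1 - p2))
    upper = min(p1, p2) - p1 * p2
    if not lower <= cov <= upper:
        raise ValueError(f"cov must lie in [{lower:.4f}, {upper:.4f}]")
    return {
        (1, 1): p1 * p2 + cov,
        (1, 0): p1 * (1 - p2) - cov,
        (0, 1): (1 - p1) * p2 - cov,
        (0, 0): (1 - p1) * (1 - p2) + cov,
    }

# Among true positives: medical record sensitivity 0.770 and drug UAD
# sensitivity 0.574 (Table 2, ICD-9, prior set 1), hypothetical cov of 0.05.
independent = joint_probs_given_status(0.770, 0.574)
dependent = joint_probs_given_status(0.770, 0.574, cov=0.05)
print(independent[(1, 1)], dependent[(1, 1)])
```

A positive covariance raises the probability that both tests agree and lowers the probability that they disagree, while leaving each test's marginal sensitivity unchanged.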
3. Results
3.1. Flow of participants
Of the 411 potentially eligible patients, 35 (8.5%) were excluded due to unavailable medical records in the earlier years of the study period (2007–2009) (Figure 1). Of the remaining 376 eligible IE hospitalizations, 41 (11%) were excluded due to missing IDU status documentation, and 14 were excluded due to inconclusive reference standard results (i.e., a positive illicit drug use status with no documentation on IDU). The final study population consisted of 321 hospitalizations.
Figure 1.

Standards for Reporting Diagnostic Accuracy (STARD) Flow Diagram, Validity of ICD codes for Injection Drug Use-Associated Infective Endocarditis Hospitalizations, 2007–2017
Abbreviations: HCV, Hepatitis C Virus; ICD, International Classification of Diseases
Note: A potentially eligible hospitalization was defined as an IE inpatient visit among a patient aged 18–64 years and admitted between January 1, 2007 and December 31, 2017. These potentially eligible hospitalizations were excluded if no medical record was available for review. Eligible patients were then excluded from the final study population if missing or inconclusive reference standard results were found in the medical records.
3.2. Characteristics of the study population
Among the final study population (N=321), a majority were male (60.4%) and of white race (69.2%). More than half were aged 35–54 years (51.1%), followed by ages 55–64 (27.1%) and 18–34 (21.8%). Nineteen percent had documented self-reported current/recent IDU in their medical record (95% CI: 14.7%, 23.3%). Many patients reported using more than one drug; methamphetamine (70.5%) was most commonly used, followed by cocaine (27.9%) and opioids other than heroin (27.9%).
3.3. Posterior sensitivity and specificity stratified by ICD period
3.3.1. Model 1
The drug UAD algorithm demonstrated similar sensitivity compared to the HCV algorithm during the ICD-9 period using prior sets 1 and 2, but superior and more precise specificity with all priors (e.g., with prior set 1, drug UAD median=94.5%, 95% Bayesian credible interval [BCI]: 90.2%, 97.5%, and HCV median=85.6%, 95% BCI: 79.7%, 90.7%). Using the tertiary prior, the drug UAD had higher sensitivity than the HCV algorithm (medians=66.7% and 58.5%, respectively). During the ICD-10 period, the drug UAD algorithm outperformed the HCV algorithm in terms of sensitivity and specificity across the three sets of priors. The distribution of the drug UAD ICD-10 sensitivity shifted towards higher values as the priors were relaxed, resulting in the highest median value of 83.1% (95% BCI: 62.4%, 97.7%) with the tertiary prior (Figure 2). For HCV sensitivity however, the opposite was observed in that adoption of more diffuse tertiary priors shifted the posterior distribution toward lower values, but this was also accompanied by an increase in uncertainty.
Figure 2.

Prior and Posterior Distributions of the Sensitivity of the Drug UAD ICD-10 Algorithm
Abbreviations: ICD, International Classification of Diseases; UAD, use, abuse or dependence
The medical record (i.e., documentation on self-reported IDU status) demonstrated moderate sensitivity (median=77.0%, 95% BCI: 62.0%, 94.8%) and nearly perfect specificity (median=98.7%, 95% BCI: 94.8%, 100%) to classify patients into their IDU status using prior set 1 during the ICD-9 period (Table 2). Results were fairly robust when assessed using the two alternative sets of priors. During the ICD-10 period, the medians and 95% BCIs for sensitivity were considerably higher with medians ranging from 88.3% to 90.4%.
Table 2.
Posterior Medians and 95% Bayesian Credible Intervals, Misclassification Error-Adjusted Estimates of the Sensitivity and Specificity of Methods to Classify Current/Recent IDU among IE Hospitalizations Assuming No Reference Standard, Stratified by ICD Revision
ICD-9 = ICD Ninth Revision Period (Discharged January 1, 2007 - September 30, 2015) (n=217); ICD-10 = ICD Tenth Revision Period (Discharged October 1, 2015 - December 31, 2017) (n=104)
| Model | Classification Method | Parameter | ICD-9: Prior Set 1 | ICD-9: Prior Set 2 | ICD-9: Prior Set 3 | ICD-10: Prior Set 1 | ICD-10: Prior Set 2 | ICD-10: Prior Set 3 |
|---|---|---|---|---|---|---|---|---|
| 1 | | Misclassification Error-Adjusted Prevalence | 22.3 (15.6, 30.1) | 23.3 (16.6, 30.9) | 23.4 (16.4, 31.4) | 22.9 (14.8, 33.0) | 23.3 (15.1, 33.3) | 22.6 (14.4, 32.5) |
| 1 | Medical Record IDU Status | Sensitivity | 77.0 (62.0, 94.8) | 73.0 (61.2, 88.7) | 72.2 (55.6, 88.5) | 90.4 (67.7, 99.6) | 88.4 (66.8, 99.3) | 88.3 (66.2, 99.4) |
| 1 | Medical Record IDU Status | Specificity | 98.7 (94.8, 100) | 98.5 (94.4, 99.9) | 98.4 (93.9, 99.9) | 98.6 (93.3, 100) | 98.6 (93.5, 99.9) | 98.0 (91.5, 99.9) |
| 1 | Drug ICD Code Algorithm | Sensitivity | 57.4 (47.2, 67.4) | 64.9 (51.5, 77.6) | 66.7 (49.9, 83.4) | 60.0 (48.8, 70.6) | 72.2 (57.1, 84.8) | 83.1 (62.4, 97.7) |
| 1 | Drug ICD Code Algorithm | Specificity | 94.5 (90.2, 97.5) | 97.9 (94.0, 99.7) | 98.5 (94.2, 99.9) | 91.9 (85.9, 96.1) | 95.9 (89.9, 99.2) | 96.5 (89.7, 99.8) |
| 1 | HCV ICD Code Algorithm | Sensitivity | 60.6 (50.7, 70.0) | 62.9 (50.4, 74.5) | 58.5 (43.6, 72.6) | 57.5 (46.6, 68.0) | 57.4 (42.4, 71.8) | 44.4 (25.0, 64.7) |
| 1 | HCV ICD Code Algorithm | Specificity | 85.6 (79.7, 90.7) | 86.8 (80.7, 92.0) | 85.7 (79.2, 91.3) | 85.3 (78.0, 91.2) | 86.2 (78.0, 92.6) | 83.7 (74.3, 90.9) |
| 2 | | Misclassification Error-Adjusted Prevalence | 21.6 (14.5, 31.4) | 22.7 (14.4, 33.3) | 23.5 (14.1, 36.2) | 25.4 (15.9, 37.9) | 27.3 (16.6, 40.3) | 27.9 (16.5, 41.8) |
| 2 | Medical Record IDU Status | Sensitivity | 80.3 (61.4, 98.9) | 74.7 (60.6, 98.3) | 70.3 (51.1, 98.1) | 81.9 (61.7, 99.0) | 76.2 (60.9, 98.3) | 73.6 (52.2, 98.4) |
| 2 | Medical Record IDU Status | Specificity | 98.9 (95.1, 100) | 98.3 (94.2, 99.9) | 97.8 (93.5, 99.9) | 98.6 (93.4, 100) | 98.5 (93.3, 99.9) | 98.3 (92.7, 99.9) |
| 2 | Combination ICD Code Algorithm | Sensitivity | 81.2 (72.7, 88.2) | 89.2 (78.5, 96.6) | 91.9 (77.2, 99.6) | 81.3 (72.1, 88.7) | 90.9 (79.5, 97.4) | 94.8 (79.7, 99.8) |
| 2 | Combination ICD Code Algorithm | Specificity | 82.3 (75.5, 89.3) | 85.0 (76.9, 93.9) | 86.1 (76.2, 98.3) | 82.4 (73.6, 90.3) | 86.6 (75.9, 96.3) | 87.3 (74.2, 99.1) |
Abbreviations: HCV, Hepatitis C Virus; ICD, International Classification of Diseases; IDU, injection drug use; IE, infective endocarditis
3.3.2. Model 2
The combination ICD algorithm median sensitivity estimates were around 81% with prior set 1 during both ICD periods. Secondary and tertiary priors produced higher estimates in both ICD revisions, with the highest estimate being obtained in ICD-10 with prior set 3 (median=94.8%, 95% BCI: 79.7%, 99.8%). The specificity of the combination algorithm was suboptimal in ICD-9 (with prior set 1: median=82.3%, 95% BCI: 75.5%, 89.3%) and varied minimally between ICD revisions or across the different sets of priors. The medical record posterior 95% BCIs for sensitivity were generally wider in model 2 than model 1, resulting in some variation in the estimated median sensitivity.
3.4. Estimates of current/recent IDU prevalence from 2007–2017
For the entire study period, the unadjusted current/recent IDU prevalence estimates were lowest using the drug UAD algorithm (median=18.2%, 95% BCI: 14.3%, 22.6%) and highest using the combination algorithm (median=33.4%, 95% BCI: 28.4%, 38.7%) (Table 3). Using prior set 1 in model 1, the ME-adjusted current/recent IDU prevalence was 22.8% (95% BCI: 15.9%, 30.5%), which is higher than the unadjusted estimate using the drug UAD algorithm and similar to the unadjusted estimate using the HCV algorithm. Estimates produced by model 2 were slightly higher. Posterior estimates were robust to different priors.
Table 3.
Posterior Medians and 95% Bayesian Credible Intervals, Unadjusted and Misclassification Error-Adjusted Estimates of the Current/Recent IDU Prevalence among IE Hospitalizations, 2007–2017 (N=321)
| Model | ICD Code Algorithm | Noninformative Prior (a) | Prior Set 1 | Prior Set 2 | Prior Set 3 |
|---|---|---|---|---|---|
| 1 | Drug ICD Code Algorithm | 18.2 (14.3, 22.6) | 22.8 (15.9, 30.5) | 23.6 (16.8, 31.3) | 23.4 (16.5, 31.3) |
| 1 | HCV ICD Code Algorithm | 23.8 (19.4, 28.6) | | | |
| 2 | Combination Drug/HCV ICD Code Algorithm | 33.4 (28.4, 38.7) | 23.3 (15.8, 32.3) | 24.5 (16.5, 34.0) | 25.5 (16.4, 36.4) |
Abbreviations: HCV, Hepatitis C Virus; ICD, International Classification of Diseases; IDU, injection drug use; IE, infective endocarditis
(a) beta (1,1)
3.5. Bias estimates
The drug UAD algorithm evaluated in model 1 was more likely to underestimate the ME-adjusted current/recent IDU prevalence as shown by the skewness of the bias difference and ratio distributions toward negative values (Table 4). In contrast, the combination ICD algorithm was more likely to overestimate the ME-adjusted current/recent IDU prevalence. Using the HCV algorithm alone was similarly likely to overestimate or underestimate (as shown by the distribution of the credible intervals) the ME-adjusted current/recent IDU prevalence.
Table 4.
Posterior Medians and 95% Bayesian Credible Intervals for the Differences in the Unadjusted Current/Recent IDU Prevalence According to Each ICD Algorithm and the Misclassification Error-Adjusted Current/Recent IDU Prevalence, Presented as Bias Differences and Bias Ratios, 2007–2017
| ICD Algorithm | Bias Difference (a): Prior Set 1 | Bias Difference (a): Prior Set 2 | Bias Difference (a): Prior Set 3 | Bias Ratio (b): Prior Set 1 | Bias Ratio (b): Prior Set 2 | Bias Ratio (b): Prior Set 3 |
|---|---|---|---|---|---|---|
| Drug UAD algorithm | −4.6 (−13.3, 3.6) | −5.3 (−14.0, 2.8) | −5.2 (−14.1, 3.1) | −0.2 (−0.5, 0.2) | −0.2 (−0.5, 0.2) | −0.2 (−0.5, 0.2) |
| HCV algorithm | 1.0 (−7.9, 9.4) | 0.2 (−8.7, 8.6) | 0.3 (−8.7, 8.9) | 0.04 (−0.3, 0.6) | 0.01 (−0.3, 0.5) | 0.02 (−0.3, 0.5) |
| Combination algorithm | 10.1 (−0.2, 19.3) | 8.9 (−1.8, 18.5) | 7.9 (−4.1, 18.4) | 0.4 (−0.006, 1.2) | 0.4 (−0.06, 1.1) | 0.3 (−0.1, 1.1) |
Abbreviations: HCV, Hepatitis C Virus; ICD, International Classification of Diseases; IDU, injection drug use; ME, misclassification error; UAD, use, abuse or dependence
(a) Bias difference = unadjusted prevalence minus ME-adjusted prevalence
(b) Bias ratio = (unadjusted prevalence minus ME-adjusted prevalence) divided by ME-adjusted prevalence
4. Discussion
This study was the first to validate multiple methods of current/recent IDU classification among IE hospitalizations assuming that no measurement method is perfect. Understanding how well ICD codes and notes on medical records can classify IDU status is critical given the increasing reliance on hospitalization data to estimate trends in IDU-associated health consequences. One of the primary findings was that ICD algorithm validity varies across ICD revision periods, particularly for the drug UAD algorithm when allowed more freedom to vary using prior set 3, with the drug UAD sensitivity considerably higher in ICD-10 compared to the ICD-9 period. This is not surprising given the increased number of drug ICD codes in ICD-10 relative to ICD-9. Our inability to detect this increase in sensitivity using a prior based on Ball et al. (2017) indicates that the relatively low sensitivities estimated in that study were not well suited to our setting, likely due to the inclusion of the Modified Duke Criteria in Ball et al.’s reference standard. As explained in the methods section, we relaxed the prior based on this study and used diffuse priors in two alternative models, and this allowed us to detect (although not fully capture) the increase in performance of the algorithms during the ICD-10 period. With the upcoming transition to ICD-11, it is important to understand changes in ICD revisions as researchers continue to monitor trends using these data.
The combination algorithm outperformed the single-category drug UAD and HCV algorithms during the ICD-9 and ICD-10 periods in terms of sensitivity, but demonstrated the lowest specificity. Although HCV codes varied minimally between ICD-9 and ICD-10, the sensitivity of the HCV algorithm diminished in ICD-10, especially when a diffuse prior was used. This reduction might be the result of more routine HCV screening and diagnosis among all IE hospitalizations (i.e., beyond those who appear at risk of IDU such as baby boomers as per the 2012 screening recommendations of the CDC (Smith et al., 2012)) during more recent years as practitioners gained awareness of HCV. The decrease in sensitivity is unlikely to be a result of changes in HCV codes as little variation exists between HCV codes available in ICD-9 versus ICD-10 (Appendix B).
Failure to adjust for the imperfect nature of the drug UAD algorithm would most likely have led to underestimating the prevalence of current/recent IDU in our study population. In contrast, the unadjusted combination algorithm would have overestimated the ME-adjusted current/recent IDU prevalence. Together, these findings suggest that prior research relying on a combination algorithm without adjusting for misclassification error (Wurcel et al., 2016) may have overestimated the prevalence of IDU among US IE hospitalizations, while research relying only on a drug UAD algorithm (Cooper et al., 2007) may have underestimated the number of IDU-IE hospitalizations, particularly when ICD-9 data were used. Which ICD algorithm to recommend for future research depends on the underlying true prevalence of IDU in the study population as well as on the health care setting. The true prevalence influences the magnitude of the under- or overestimation introduced by the imperfect sensitivity and specificity of each algorithm, while the setting can affect the sensitivity and specificity themselves, because how well IDU is captured by ICD codes differs across health care systems. Therefore, we recommend that future studies use an approach similar to ours to obtain adjusted estimates of IDU prevalence, especially when data from both ICD-9 and ICD-10 are used. If only ICD-10 data are used, our results suggest the drug UAD algorithm may perform reasonably well when a Bayesian latent class approach is not possible.
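The way true prevalence, sensitivity and specificity jointly determine the direction and size of the bias can be illustrated with the standard relationship between true and apparent prevalence for an imperfect binary classifier. The sketch below is illustrative only: the sensitivity and specificity pairs are hypothetical and are not estimates from this study, and the function and label names are assumptions made for the example. Only the 23% prevalence echoes the ME-adjusted estimate reported here.

```python
def apparent_prevalence(true_prev, sensitivity, specificity):
    """Expected proportion classified positive by an imperfect test.

    True positives contribute sensitivity * true_prev; false positives
    contribute (1 - specificity) * (1 - true_prev).
    """
    return sensitivity * true_prev + (1.0 - specificity) * (1.0 - true_prev)


# Hypothetical (sensitivity, specificity) pairs, chosen only to show the
# two directions of bias; they are NOT the estimates from this study.
hypothetical_algorithms = {
    "low sensitivity, high specificity": (0.60, 0.98),
    "high sensitivity, low specificity": (0.90, 0.80),
}

true_prev = 0.23  # ME-adjusted prevalence reported in this study

for label, (se, sp) in hypothetical_algorithms.items():
    observed = apparent_prevalence(true_prev, se, sp)
    direction = "underestimates" if observed < true_prev else "overestimates"
    print(f"{label}: apparent prevalence {observed:.3f} {direction} {true_prev:.2f}")
```

Under these assumed values, missed cases dominate for the first algorithm and false positives dominate for the second, so the unadjusted estimates fall on opposite sides of the true prevalence.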
This is the first study to validate ICD codes for current/recent IDU using US data. We believe the results may be of interest not only to researchers who wish to estimate trends in the prevalence of IDU among IE hospitalizations on a large scale (e.g., at the national level), but also to clinicians, who may wish to estimate ME-adjusted time trends in IDU-IE at their practice or institution using administrative health data, which would be more efficient than conducting individual medical record reviews. One strength of our study is that we estimated the validity of ICD algorithms separately for ICD-9 and ICD-10 rather than producing only combined estimates, as was done in prior validation research. (Janjua et al., 2018) Our posterior estimates of ICD algorithm sensitivity and specificity can be adopted as priors in similar analyses of other populations; however, alternative priors should also be considered because our findings, based on data from a single Midwestern US facility, have limited generalizability. We also improved on prior validation research of ICD algorithms for IDU (Ball et al., 2017) by incorporating the estimated error of self-reported IDU status from the medical record into the analysis. It is almost certain that the sensitivity and specificity of self-reported IDU status documented in medical records vary across hospitals and time periods, and therefore our estimates should not be generalized to other populations. However, to our knowledge, our results represent the first estimates of the sensitivity and specificity of self-reported IDU status in medical records and can be used as a starting point in either frequentist or Bayesian analyses to adjust for misclassification of IDU.
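As one example of a simple frequentist adjustment, the Rogan-Gladen estimator inverts the relationship between apparent and true prevalence given assumed values of sensitivity and specificity. The following is a minimal sketch and not the Bayesian latent class method used in this study; the function name and all input values are hypothetical, and the point estimate ignores uncertainty in the sensitivity and specificity themselves.

```python
def rogan_gladen(apparent_prev, sensitivity, specificity):
    """Misclassification-adjusted prevalence (Rogan-Gladen point estimate).

    adjusted = (apparent + specificity - 1) / (sensitivity + specificity - 1)
    The result is truncated to the interval [0, 1].
    """
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        # The test is no better than chance, so the estimator is undefined.
        raise ValueError("sensitivity + specificity must exceed 1")
    estimate = (apparent_prev + specificity - 1.0) / denom
    return min(1.0, max(0.0, estimate))


# Hypothetical inputs for illustration only (not estimates from this study).
observed = 0.1534          # proportion flagged as IDU by an ICD algorithm
se, sp = 0.60, 0.98        # assumed sensitivity and specificity

adjusted = rogan_gladen(observed, se, sp)
print(f"unadjusted: {observed:.3f}, adjusted: {adjusted:.3f}")
```

Unlike this point estimate, a Bayesian analysis would place prior distributions on sensitivity and specificity and propagate their uncertainty into the credible interval for prevalence.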
Reliance on the self-reported status of a stigmatized and illegal behavior as a reference standard without misclassification error adjustment will likely produce biased estimates of sensitivity and specificity, and consequently of the frequency of that behavior. Future studies should examine how the validity of self-reported IDU status varies across geographic regions, as it will likely be affected by regional characteristics tied to hospital location (e.g., levels of stigma, most common drugs used). Furthermore, future research should consider abstracting information on IDU status from records six months before and after the IE hospitalization, as was done in Miller and Polgreen (2019), to assess the impact on validity.
A limitation of our study is that we used ICD codes to identify our study population of IE patients rather than verifying each case through extensive medical record review. However, IE misclassification is likely to be minimal since the IE ICD-9 and ICD-10 code algorithms we used have demonstrated high sensitivity and specificity when validated against the Modified Duke Criteria. (Tan et al., 2016; Toyoda et al., 2017) As with any Bayesian analysis, our results depend on the priors used for the different models. Presenting results under different priors allows readers to decide which set best reflects their beliefs. Moreover, models run with diffuse priors rely mostly on the data at hand to generate the results, at the expense of some precision. Additionally, the relatively small sample size obtained after stratifying by ICD period limited the precision of our posterior estimates.
The use of ICD code algorithms to identify cases of IDU in large studies based on administrative data is becoming more common as the opioid crisis continues. We found that the misclassification-adjusted prevalence of current/recent IDU was approximately 23% between 2007 and 2017 among a population of IE inpatients in a Midwestern US hospital. Such misclassification-adjusted estimates of current/recent IDU prevalence are better suited than unadjusted estimates to inform clinical practice and policy at the local, state, and national levels.
Supplementary Material
References
- Ball LJ, Sherazi A, Laczko D, Gupta K, Koivu S, Weir MA, Mele T, Tirona R, McCormick JK, Silverman M, 2017. Validation of an Algorithm to Identify Infective Endocarditis in People Who Inject Drugs. Med Care.
- Carabin H, Mc GS, Sahlu I, Tarafder MR, Joseph L, BB DEA, Balolong E, Olveda R, 2015. Schistosoma japonicum in Samar, the Philippines: infection in dogs and rats as a possible risk factor for human infection. Epidemiol Infect 143, 1767–1776.
- Cooper HL, Brady JE, Ciccarone D, Tempalski B, Gostnell K, Friedman SR, 2007. Nationwide increase in the number of hospitalizations for illicit injection drug use-related infective endocarditis. Clin Infect Dis 45, 1200–1203.
- Degenhardt L, Peacock A, Colledge S, Leung J, Grebely J, Vickerman P, Stone J, Cunningham EB, Trickey A, Dumchev K, Lynskey M, Griffiths P, Mattick RP, Hickman M, Larney S, 2017. Global prevalence of injecting drug use and sociodemographic characteristics and prevalence of HIV, HBV, and HCV in people who inject drugs: a multistage systematic review. Lancet Glob Health 5, e1192–e1207.
- Dendukuri N, Joseph L, 2001. Bayesian approaches to modeling the conditional dependence between multiple diagnostic tests. Biometrics 57, 158–167.
- Girometti N, Devitt E, Phillips J, Nelson M, Whitlock G, 2019. High rates of unprotected anal sex and use of generic direct-acting antivirals in a cohort of MSM with acute HCV infection. J Viral Hepat 26, 627–634.
- Gordon RJ, Lowy FD, 2005. Bacterial infections in drug users. N Engl J Med 353, 1945–1954.
- Jafarzadeh SR, Felson DT, 2018. Updated Estimates Suggest a Much Higher Prevalence of Arthritis in United States Adults Than Previous Ones. Arthritis Rheumatol 70, 185–192.
- Jafarzadeh SR, Thomas BS, Gill J, Fraser VJ, Marschall J, Warren DK, 2016a. Sepsis surveillance from administrative data in the absence of a perfect verification. Ann Epidemiol 26, 717–722.e711.
- Jafarzadeh SR, Warren DK, Nickel KB, Wallace AE, Fraser VJ, Olsen MA, 2016b. Bayesian estimation of the accuracy of ICD-9-CM- and CPT-4-based algorithms to identify cholecystectomy procedures in administrative data without a reference standard. Pharmacoepidemiol Drug Saf 25, 263–268.
- Janjua NZ, Islam N, Kuo M, Yu A, Wong S, Butt ZA, Gilbert M, Buxton J, Chapinal N, Samji H, Chong M, Alvarez M, Wong J, Tyndall MW, Krajden M, 2018. Identifying injection drug use and estimating population size of people who inject drugs using healthcare administrative datasets. Int J Drug Policy 55, 31–39.
- Joseph L, Gyorkos TW, Coupal L, 1995. Bayesian estimation of disease prevalence and the parameters of diagnostic tests in the absence of a gold standard. Am J Epidemiol 141, 263–272.
- Kostoulas P, Nielsen SS, Branscum AJ, Johnson WO, Dendukuri N, Dhand NK, Toft N, Gardner IA, 2017. STARD-BLCM: Standards for the Reporting of Diagnostic accuracy studies that use Bayesian Latent Class Models. Prev Vet Med 138, 37–47.
- Lansky A, Finlayson T, Johnson C, Holtzman D, Wejnert C, Mitsch A, Gust D, Chen R, Mizuno Y, Crepaz N, 2014. Estimating the number of persons who inject drugs in the United States by meta-analysis to calculate national rates of HIV and hepatitis C virus infections. PLoS One 9, e97596.
- Lasry O, Dendukuri N, Marcoux J, Buckeridge DL, 2018. Accuracy of administrative health data for surveillance of traumatic brain injury: a Bayesian latent class analysis. Epidemiology.
- Li JS, Sexton DJ, Mick N, Nettles R, Fowler VG Jr., Ryan T, Bashore T, Corey GR, 2000. Proposed modifications to the Duke criteria for the diagnosis of infective endocarditis. Clin Infect Dis 30, 633–638.
- Miller AC, Polgreen PM, 2019. Many Opportunities to Record, Diagnose, or Treat Injection Drug-related Infections Are Missed: A Population-based Cohort Study of Inpatient and Emergency Department Settings. Clin Infect Dis 68, 1166–1175.
- Ramiere C, Charre C, Miailhes P, Bailly F, Radenne S, Uhres AC, Brochier C, Godinot M, Chiarello P, Pradat P, Cotte L, 2019. Patterns of HCV transmission in HIV-infected and HIV-negative men having sex with men. Clin Infect Dis.
- Schumacher SG, van Smeden M, Dendukuri N, Joseph L, Nicol MP, Pai M, Zar HJ, 2016. Diagnostic Test Accuracy in Childhood Pulmonary Tuberculosis: A Bayesian Latent Class Analysis. Am J Epidemiol 184, 690–700.
- Smith BD, Morgan RL, Beckett GA, Falck-Ytter Y, Holtzman D, Ward JW, 2012. Hepatitis C virus testing of persons born during 1945–1965: recommendations from the Centers for Disease Control and Prevention. Ann Intern Med 157, 817–822.
- Tan C, Hansen M, Cohen G, Boyle K, Daneman N, Adhikari NK, 2016. Accuracy of administrative data for identification of patients with infective endocarditis. Int J Cardiol 224, 162–164.
- The National Bureau of Economic Research, 2016. CMS’ ICD-9-CM to and from ICD-10-CM and ICD-10-PCS Crosswalk or General Equivalence Mappings. Accessed on March 29, 2019 from https://www.nber.org/data/icd9-icd-10-cm-and-pcs-crosswalk-general-equivalence-mapping.html.
- Tookes H, Diaz C, Li H, Khalid R, Doblecki-Lewis S, 2015. A Cost Analysis of Hospitalizations for Infections Related to Injection Drug Use at a County Safety-Net Hospital in Miami, Florida. PLoS One 10, e0129360.
- Toyoda N, Chikwe J, Itagaki S, Gelijns AC, Adams DH, Egorova NN, 2017. Trends in Infective Endocarditis in California and New York State, 1998–2013. JAMA 317, 1652–1660.
- Weatherby NL, Needle R, Cesari H, Booth R, McCoy CB, Watters JK, Williams M, Chitwood DD, 1994. Validity of self-reported drug use among injection drug users and crack cocaine users recruited through street outreach. Evaluation and Program Planning 17, 347–355.
- Weir MA, Slater J, Jandoc R, Koivu S, Garg AX, Silverman M, 2019. The risk of infective endocarditis among people who inject drugs: a retrospective, population-based time series analysis. CMAJ 191, E93–E99.
- Wurcel AG, Anderson JE, Chui KK, Skinner S, Knox TA, Snydman DR, Stopka TJ, 2016. Increasing Infectious Endocarditis Admissions Among Young People Who Inject Drugs. Open Forum Infect Dis 3, ofw157.
- Wurcel AG, Merchant EA, Clark RP, Stone DR, 2015. Emerging and Underrecognized Complications of Illicit Drug Use. Clin Infect Dis 61, 1840–1849.
