Author manuscript; available in PMC: 2020 Sep 1.
Published in final edited form as: HIV Med. 2019 May 26;20(8):567–570. doi: 10.1111/hiv.12757

Validation of HIV cohort identification using automated clinical data in the Department of Veterans Affairs

JR Kramer 1,2,3, C Hartman 1, DL White 1,2,3,4, K Royse 1,2, P Richardson 1,2, AP Thrift 1,2,3, S Raychaudhury 1, R Desiderio 1, D Sanchez 1, E Chiao 1,2,3
PMCID: PMC6687524  NIHMSID: NIHMS1025883  PMID: 31131549

Abstract

Objectives:

The US Department of Veterans Affairs (VA) is the largest integrated healthcare provider for HIV patients in the US. VA data for HIV-specific clinical and quality improvement research is an important resource. We sought to determine the accuracy of using the VA Corporate Data Warehouse (CDW), a fully automated medical records database for all VA users nationally, to identify HIV patients compared with a gold-standard VA HIV Clinical Case Registry (CCR).

Methods:

We assessed the test performance characteristics of each of our CDW criteria-based algorithms (presence of 1, 2, or all the following: diagnostic codes for HIV, positive HIV laboratory tests, or prescription for HIV medication) by calculating their sensitivity (proportion of HIV+ from CCR accurately detected as HIV+ by CDW algorithm) and positive predictive value (PPV; proportion of patients identified by CDW algorithm accurately classified as HIV+ from CCR).

Results:

We found that using a CDW algorithm requiring 2 of 3 HIV diagnostic criteria yielded a high sensitivity (95.2%) with very little tradeoff in positive predictive value (93.5%).

Conclusion:

A two diagnostic criteria-based algorithm can be utilized to accurately identify HIV cohorts seen in the nationwide VA healthcare system.

Keywords: HIV, Department of Veterans Affairs, cohort, validation, epidemiology

INTRODUCTION

The U.S. Department of Veterans Affairs (VA) was at the forefront of implementing electronic medical records, and its large and automated clinical and administrative databases have become useful tools for quality assessment as well as research.(1) In addition, the VA is the largest single integrated healthcare provider for HIV patients in the US.(2) Thus, VA data remain an important resource for HIV-specific clinical research. As HIV patients are living longer due to improvements in antiretroviral therapy,(3) chronic diseases such as heart disease, diabetes, and cancer are becoming an increasing burden for HIV patients.(4) Large sample sizes and detailed clinical data make the VA an ideal setting in which to study the incidence and outcomes of HIV-associated comorbidities. In order to address these research needs, the VA developed an adjudicated VA HIV Clinical Case Registry (CCR)(5) that had been utilized to conduct large-scale HIV-related epidemiologic and clinical research. In recent years, the VA has released extensive individual-level electronic medical record (EMR) data via the Corporate Data Warehouse (CDW) to conduct research; therefore, while the CCR continues to warehouse data, its data have been decommissioned from use by researchers. For this study, we sought to determine the accuracy of using the CDW to identify all VA Veteran users with HIV infection as compared to the CCR as the gold standard.

METHODS

Data sources

The VA HIV CCR, which serves as the “gold standard” for this study, is a disease specific registry maintained by the VA that was designed to collect data on all HIV-infected veterans in care at the VA.(5) A local CCR Coordinator manually enters HIV patients into the registry and then their clinical information including laboratory, pharmacy and encounter data is automatically extracted from the EMR. Our latest HIV CCR data extract included HIV patients identified between 01/01/1985–12/31/2010.

The study cohort was identified from the VA CDW, which is a comprehensive automated VA database that auto-extracts from the EMR all inpatient and outpatient encounters, laboratory information, pharmacy utilization and vital status information on all VA users from 10/01/1999 [fiscal year (FY) 2000] to present day. CDW is updated in real-time. The use of these data for this study was approved by the Institutional Review Board for the Baylor College of Medicine and the Research and Development Committee of the Michael E. DeBakey VA Medical Center in Houston, TX.

Cohort Identification

Our CCR HIV data extract consisted of VA patients who were tested and confirmed to be HIV positive by local CCR Coordinators any time prior to 1/1/2011. There were 66,991 unique patients in the initial CCR cohort. As reported previously by this research group, we excluded 6,762 patients from the cohort because of no or poor documentation of vital statistics, such as an incorrect date of birth or death, and/or poor or no documentation of a positive laboratory test for HIV.(6) We also excluded an additional 21,197 patients who died or were lost to follow-up prior to FY 2000 (inception date of CDW), and an additional 1,823 patients with no CD4 lab value during follow-up; the resulting analytic cohort included 37,209 patients.
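The exclusion steps above can be checked with simple arithmetic. The following is a minimal sketch using only the counts reported in this paragraph; the variable names and the short reason labels are illustrative, not taken from the study code.

```python
# Cohort attrition as reported in the text (labels are paraphrased for illustration).
initial_ccr = 66_991
exclusions = {
    "poor or missing vital statistics / HIV lab documentation": 6_762,
    "died or lost to follow-up before FY 2000": 21_197,
    "no CD4 lab value during follow-up": 1_823,
}

remaining = initial_ccr
for reason, n_excluded in exclusions.items():
    remaining -= n_excluded
    print(f"excluded {n_excluded:>6,} ({reason}); remaining: {remaining:,}")

# Final size of the CCR analytic cohort after all three exclusion steps.
analytic_cohort = remaining
```

The running total ends at the 37,209 patients stated as the analytic cohort.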

Our CDW-based HIV cohort included patients who had met at least one of three HIV-specific criteria: 1) positive HIV lab tests, 2) evidence of having received HIV-specific medications, or 3) ICD-9 diagnostic codes for HIV. The laboratory criteria included patients with at least 1 positive HIV antibody test by ELISA or Western blot; tested for HIV viral load (any: +/−/indeterminate); or tested for CD4+ count. The treatment criteria included patients with at least one prescription in inpatient or outpatient pharmacy records for HIV antiretroviral therapy (ART). The diagnosis criteria included any inpatient or outpatient encounter with an ICD-9 code for HIV (042 or V08). These patients had to have the first of at least one of their HIV diagnostic criteria after 10/1/1999 (FY 2000) and before 12/31/2010 (end date of our CCR gold standard cohort) or they were excluded from the cohort. If a patient had one or two additional criteria after 12/31/2010, those criteria were not counted, but the patient was retained in our CDW cohort.
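One way to read the counting rule above is sketched below. This is not the study's code: the `PatientEvidence` record, its field names, and the choice to treat both window boundaries as inclusive are assumptions made for illustration. Each field holds the earliest date on which the patient met that criterion, if ever.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

WINDOW_START = date(1999, 10, 1)   # start of FY 2000 (CDW inception)
WINDOW_END = date(2010, 12, 31)    # end date of the CCR gold-standard extract


@dataclass
class PatientEvidence:
    """Earliest date each criterion was met, or None if never met (hypothetical layout)."""
    first_lab: Optional[date] = None    # laboratory criterion
    first_art: Optional[date] = None    # antiretroviral prescription criterion
    first_icd9: Optional[date] = None   # ICD-9 042 or V08 encounter criterion


def count_criteria(p: PatientEvidence) -> int:
    """Number of criteria (0 to 3) first met inside the study window.

    Criteria first met after WINDOW_END are not counted; a patient with a
    count of 0 would not enter the CDW cohort.
    """
    dates = (p.first_lab, p.first_art, p.first_icd9)
    return sum(1 for d in dates if d is not None and WINDOW_START <= d <= WINDOW_END)


# Example: lab evidence in 2003, ICD-9 code only in 2011 (outside the window).
example = PatientEvidence(first_lab=date(2003, 5, 1), first_icd9=date(2011, 2, 1))
print(count_criteria(example))
```

In this example the patient is retained in the CDW cohort with one counted criterion, since the later ICD-9 code falls outside the window.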

Data Analysis

After restricting both cohorts to patients with any indication of HIV identified from the same time frame (10/1/1999–12/31/2010), we created CDW algorithms based on the number of a priori specified HIV diagnostic criteria (from 1–3) present for each HIV patient. We classified patients who were HIV+ based on one of the CDW algorithms and were also identified as HIV+ in the gold-standard CCR as true positives (TP), and classified patients positive only based on CDW algorithms as false positives (FP). We classified patients in the CCR but not identified by the CDW algorithm as false negatives (FN). We assessed the test performance characteristics of each of our CDW criteria-based algorithms by calculating their sensitivity (proportion of HIV+ from CCR accurately detected as HIV+ by CDW algorithm) and positive predictive value (PPV; proportion of patients identified by CDW algorithm accurately classified as HIV+ from CCR). We also calculated 95% confidence intervals for each proportion.
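The classification described above amounts to set operations on patient identifiers. The sketch below uses made-up identifiers and hypothetical function names (`classify`, `sensitivity`, `ppv`) purely to illustrate the definitions of TP, FP, and FN; it is not the study's implementation.

```python
from typing import Set, Tuple


def classify(ccr_ids: Set[int], cdw_ids: Set[int]) -> Tuple[Set[int], Set[int], Set[int]]:
    """Split patients into true positives, false positives, and false negatives.

    ccr_ids: patients in the gold-standard registry.
    cdw_ids: patients flagged by one CDW algorithm (e.g. 2 or more criteria).
    """
    tp = ccr_ids & cdw_ids   # flagged by the algorithm and in the registry
    fp = cdw_ids - ccr_ids   # flagged by the algorithm only
    fn = ccr_ids - cdw_ids   # in the registry but not flagged
    return tp, fp, fn


def sensitivity(n_tp: int, n_fn: int) -> float:
    """Proportion of registry patients detected by the algorithm."""
    return n_tp / (n_tp + n_fn)


def ppv(n_tp: int, n_fp: int) -> float:
    """Proportion of algorithm-flagged patients who are in the registry."""
    return n_tp / (n_tp + n_fp)


# Toy data: four registry patients, three flagged patients, two in common.
tp, fp, fn = classify(ccr_ids={1, 2, 3, 4}, cdw_ids={3, 4, 5})
print(sensitivity(len(tp), len(fn)), ppv(len(tp), len(fp)))
```

Note that true negatives never appear in these formulas, which is why sensitivity and PPV can be computed without enumerating patients who have no indication of HIV.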

RESULTS

There were 65,235 patients identified with at least one HIV diagnostic criterion from the CDW. After merging with the CCR cohort there were 66,562 total patients in the combined CCR and CDW cohorts. Of these, 1,327 had no CDW diagnostic criteria but were in the CCR cohort (i.e., were FN), 65,235 had 1 or more CDW diagnostic criteria, 37,896 had 2 or more criteria, and 31,994 had 3 criteria (Table 1).

Table 1.

Sensitivity and positive predictive value for CDW algorithm using CCR cohort as gold standard to identify HIV-infected patients using VA healthcare nationwide (2000–2010).

Sensitivity: proportion of HIV+ from CCR accurately detected as HIV+ by CDW algorithm. Positive predictive value: proportion of patients identified by CDW algorithm accurately classified as HIV+ from CCR.

| CDW algorithm | In CCR & CDW (TP) | Not in CDW / in CCR (FN) | In CCR (TP+FN) | Sensitivity, TP/(TP+FN) (95% CI) | Not in CCR / in CDW (FP) | In CDW (TP+FP) | Positive predictive value, TP/(TP+FP) (95% CI) |
|---|---|---|---|---|---|---|---|
| 1 or more CDW criteria for HIV diagnosis | 35,882 | 1,327 | 37,209 | 96.4% (96.2%–96.6%) | 29,353 | 65,235 | 55.0% (54.6%–55.4%) |
| 2 or more CDW criteria for HIV diagnosis | 35,424 | 1,785 | 37,209 | 95.2% (95.0%–95.4%) | 2,472 | 37,896 | 93.5% (93.2%–93.7%) |
| 3 CDW criteria for HIV diagnosis | 31,388 | 5,821 | 37,209 | 84.4% (84.0%–84.7%) | 606 | 31,994 | 98.1% (97.9%–98.3%) |

Abbreviations: CCR-Clinical Case Registry, CDW-Corporate Data Warehouse, CI-confidence interval, FN-false negative, FP-false positive, TP-true positive

CDW criteria include: positive HIV lab tests; HIV-specific medications; ICD-9 diagnostic codes for HIV

Of the 37,209 patients identified in the CCR, 35,882 were also identified in the CDW cohort using an algorithm of at least 1 positive diagnostic criterion in the CDW. This resulted in a sensitivity of 96.4%. When using 2 of the 3 diagnostic criteria, the number of TP dropped only slightly to 35,424, resulting in a sensitivity of 95.2%. Finally, using 3 of 3 criteria resulted in fewer TP (N=31,388), with a sensitivity of 84.4%.

There were 29,353 CDW patients with at least 1 HIV diagnostic criterion in the CDW who were not in the CCR cohort; these were considered FP. Therefore, using 1 of the 3 diagnostic criteria resulted in a PPV of 55.0%. However, the number of FP dropped to 2,472 veterans when using 2 of the 3 HIV diagnostic criteria, resulting in a PPV of 93.5%. Finally, there were only 606 FP that had all 3 criteria, resulting in a PPV of 98.1%.
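The sensitivity and PPV figures above can be recomputed from the reported TP and FP counts. The sketch below does so and attaches a normal-approximation (Wald) 95% interval; the interval method is an assumption, since the text does not state which method was used, so the bounds may differ slightly from the published ones. The function names and the `TABLE_1` dictionary are illustrative.

```python
import math

CCR_TOTAL = 37_209  # TP + FN, identical for every algorithm

# Reported (TP, FP) counts for each CDW algorithm.
TABLE_1 = {
    "1 or more criteria": (35_882, 29_353),
    "2 or more criteria": (35_424, 2_472),
    "all 3 criteria": (31_388, 606),
}


def proportion_ci(successes: int, total: int, z: float = 1.96):
    """Point estimate with a Wald interval (assumed method, not confirmed by the source)."""
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


def summarize(tp: int, fp: int, ccr_total: int = CCR_TOTAL) -> dict:
    """Sensitivity uses all registry patients; PPV uses all algorithm-flagged patients."""
    return {
        "sensitivity": proportion_ci(tp, ccr_total),
        "ppv": proportion_ci(tp, tp + fp),
    }


results = {label: summarize(tp, fp) for label, (tp, fp) in TABLE_1.items()}
for label, stats in results.items():
    for name, (p, lo, hi) in stats.items():
        print(f"{label:<20} {name:<12} {100 * p:.1f}% ({100 * lo:.1f}%, {100 * hi:.1f}%)")
```

The point estimates round to the percentages reported in the text for all three algorithms.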

We determined that using the CDW algorithm requiring 2 of 3 HIV diagnostic criteria yielded a high sensitivity (95.2%) with very little tradeoff in PPV (93.5%). This CDW-identified cohort of 37,896 HIV infected patients had a mean age of 47.2 years (SD=10), 97.1% were male, 37.6% white, 53.3% African-American, and 5.2% Hispanic.

DISCUSSION

We found that using 2 out of 3 HIV diagnostic criteria (labs, ART treatment, and ICD codes) was a highly accurate algorithm for identifying, from the VA’s automated EMR-based CDW databases, patients with confirmed HIV infection who use VA healthcare facilities nationwide. The use of 2 criteria provided a high sensitivity without a considerable sacrifice in PPV; it resulted in an algorithm that missed only a small proportion of HIV-infected patients while ensuring that those classified as HIV-infected were HIV-infected as defined by the gold standard HIV CCR cohort.

Fultz et al.(7) modified an HIV ICD-9 code-based algorithm utilizing VA claims data from 1998–2003 and compared it to the VA HIV CCR as the gold standard. They reported that use of one code to identify HIV had a sensitivity of 93%, but a PPV of only 69%. When requiring two outpatient codes or one inpatient code, the PPV improved to 88% with minimal loss in sensitivity (90%). However, their algorithm did not use the clinical data (e.g., HIV laboratory data and ART treatment) that are now available in the CDW. With the addition of laboratory and pharmacy data, we increased the PPV and sensitivity even further, to 93.5% and 95.2%, respectively.

Limitations of this study include uncertainty about whether these results would be generalizable to healthcare systems outside the VA. In particular, many electronic medical records-based healthcare systems may have differing amounts of access to laboratory and pharmacy data. Furthermore, although our algorithm identified HIV-infected patients with good sensitivity and PPV, due to the age of our CCR cohort we were only able to examine the effect of ICD-9 codes and not the currently used ICD-10 codes (implemented in October 2015 in the VA). It is also possible that some of the FP in the study are true HIV cases that were missed by the CCR. Many of the FP patients with only one criterion likely received a single errant HIV ICD code, perhaps while being screened for HIV. However, some of the patients with 2 or more criteria in CDW could indeed have HIV but were missed by the CCR. If true, this would increase the PPV even further. Finally, it was not possible to include the over 10 million VA patients (from years 2000–2010) who do not have HIV (i.e., all the true negatives) in the analysis because a large proportion of veterans never received an HIV test; therefore, we were unable to calculate specificity and negative predictive value. However, we believe these values would also be quite high in this population.

In conclusion, we have demonstrated that a two diagnostic criteria-based algorithm can be utilized to accurately identify HIV cohorts seen in the nationwide VA healthcare system. The ability to precisely identify large and contemporary HIV-infected cohorts will enable many future quality improvement and clinical epidemiology research studies using VA clinical data including novel examination of the long term impact of HIV, aging, and chronic long term medications on HIV-related outcomes including co-morbidities and other health specific outcomes.

Acknowledgements:

The opinions stated here are those of the authors and do not necessarily represent the views of the National Institutes of Health or the U.S. Department of Veterans Affairs.

Funding: This research was supported by an NIH Provocative Questions (PQ3) research grant from the National Institutes of Health (#CA206476–03; PI: Dr. Chiao) and the VA Health Services Research Center for Innovations in Quality, Effectiveness and Safety (#CIN 13–413). Dr. White receives salary support from the Department of Veterans Affairs (#CX001430). Dr. Thrift was supported by the Creative and Novel Ideas in HIV Research (CNIHR) Program through a supplement to the University of Alabama at Birmingham (UAB) Center for AIDS Research funding (P30 #AI027767). This funding was made possible by collaborative efforts of the Office of AIDS Research, the National Institute of Allergy and Infectious Diseases, and the International AIDS Society.

Footnotes

Conflict of Interest: The authors declare no conflict of interest. The U.S. Department of Veterans Affairs and National Institutes of Health played no role in the analyses or interpretation of findings or in the decision to publish these findings.

REFERENCES

  • 1. Justice AC, Erdos J, Brandt C, Conigliaro J, Tierney W, Bryant K. The Veterans Affairs Healthcare System: A unique laboratory for observational and interventional research. Medical Care 2006; 44(8 Suppl 2): S7–12.
  • 2. Phillips B, Mole L, Backus L, Halloran J, Chang S. Caring for Veterans with HIV Disease-Fiscal Year 2002. Palo Alto, CA, Center for Quality Management in Public Health Veterans Health Administration Department of Veterans Affairs 2003: 1–68.
  • 3. Lodi S, Phillips A, Logan R, et al. Comparative effectiveness of immediate antiretroviral therapy versus CD4-based initiation in HIV-positive individuals in high-income countries: observational cohort study. Lancet HIV 2015. August; 2(8):e335–43.
  • 4. Vandenhende MA, Roussillon C, Henard S, et al. Cancer-Related Causes of Death among HIV-Infected Patients in France in 2010: Evolution since 2000. PloS one 2015; 10(6): e0129550.
  • 5. Backus L, Mole L, Chang S, Deyton L. The Immunology Case Registry. Journal of clinical epidemiology 2001; 54 Suppl 1: S12–5.
  • 6. Mbang PA, Kowalkowski MA, Amirian ES, et al. Association between Time on Protease Inhibitors and the Incidence of Squamous Cell Carcinoma of the Anus among U.S. Male Veterans. PloS one 2015; 10(12): e0142966.
  • 7. Fultz SL, Skanderson M, Mole LA, et al. Development and verification of a “virtual” cohort using the National VA Health Information System. Medical Care 2006; 44(8): S25–S30.
