Abstract
This pilot study assessed the feasibility of computer-assisted electronic medical record (EMR) abstraction to ascertain coronary heart disease (CHD) event hospitalizations. We included a sample of 87 hospitalization records from participants at the University of North Carolina (UNC) site of the Women's Interagency HIV Study (WIHS) and in the UNC Center for AIDS Research (CFAR) HIV Clinical Cohort who were hospitalized within the UNC Healthcare System from July 2004 to July 2015. We compared a computer algorithm utilizing diagnosis/procedure codes, medications, and cardiac enzyme levels to adjudicate CHD events [myocardial infarction (MI)/coronary revascularization] from the EMR to standardized manual chart adjudication. Of 87 hospitalizations, 42 were classified as definite, 25 probable, and 20 non-CHD events by manual chart adjudication. A computer algorithm requiring presence of ≥1 CHD-related International Classification of Diseases, 9th Revision (ICD-9)/Current Procedural Terminology (CPT) code correctly identified 24 of 42 definite (57%), 29 of 67 probable/definite CHD (43%), and 95% of non-CHD events; additionally requiring clinically defined cardiac enzyme levels or administration of MI-related medications correctly identified 55%, 42%, and 95% of such events, respectively. Requiring any one of the ICD-9/CPT or cardiac enzyme criteria correctly identified 98% of definite, 97% of probable/definite CHD, and 85% of non-CHD events. Challenges included difficulty matching hospitalization dates, incomplete diagnosis code data, and multiple field names/locations of laboratory/medication data. Computer algorithms comprising only ICD-9/CPT codes failed to identify a sizable proportion of CHD events. Using a less restrictive algorithm yielded fewer missed events but increased the false-positive rate. Despite potential benefits of EMR-based research, there remain several challenges to fully computerized adjudication of CHD events.
Keywords: HIV, coronary heart disease, computerized event adjudication, electronic medical record, computer phenotype
Introduction
In the era of effective antiretroviral therapy, non-AIDS related diseases, in particular coronary heart disease (CHD), contribute significantly to morbidity and mortality among HIV-infected adults. As such, determination of clearly defined clinical outcomes for CHD and other non-AIDS conditions in long-term cohorts is imperative.
Efforts to accurately identify CHD events in HIV studies have revealed limitations of relying on diagnosis codes alone to ascertain these outcomes.1 The International Network for Strategic Initiatives in Global HIV Trials (INSIGHT) team, in a retrospective review of serious non-AIDS outcomes in a clinical trial, evaluated 83 reports of acute myocardial infarction (MI) and found that 30% did not meet criteria for acute MI after medical record adjudication.1 Similarly, Crane et al. identified 294 definite/probable MIs by manual chart review and adjudication of MI events in the Center for AIDS Research (CFAR) Network of Integrated Clinical Systems (CNICS) cohort and reported that only 44% of those events had a clinical diagnosis code of MI and 78% had an elevated troponin concentration that met study-defined MI criteria.2
Manual review and adjudication of all medical records in large cohorts provides better assurance of accuracy, but is often logistically difficult and time consuming. The use of electronic medical records (EMRs) has been a paradigm shift in how hospitalization data are stored and accessed. This has led to interest in using EMRs in clinical research to provide more efficient electronically executable methods of medical record review. Development of phenotype algorithms using EMR data has been explored as a screening method for identification of individuals with specific diagnoses for study recruitment and as a tool for increasing efficiency of outcome adjudication in clinical research.1,3 The potential for facilitation of research has been heralded as a major benefit of widespread use of EMRs.
The Electronic Medical Records and Genomics (eMERGE) network is a national network funded and developed by the National Human Genome Research Institute (NHGRI) that aims to foster collaborative efforts to develop electronic phenotypes for use in large-scale genetic research.3
Although the use of computerized methods of EMR review for ascertainment of outcomes is encouraged by funding institutions and research networks, electronic-only methods of outcome adjudication have not yet been widely investigated in large HIV studies.
The Women's Interagency HIV Study (WIHS) is an ongoing prospective cohort study of HIV-seropositive women and HIV-seronegative women at risk for HIV infection, established in 1993 to investigate the epidemiology, treatment outcomes, and comorbidities of HIV infection. Four sites in the southern United States, including the University of North Carolina (UNC), were added in January 2013.4 For multicenter studies such as the WIHS, the ability to use a single standardized computer algorithm to identify outcomes across multiple study sites would be an important advance, offering the possibility of significantly decreasing workload for individual sites and increasing efficiency. A key factor determining the feasibility of adopting such methods will be the ability to maintain accuracy of outcome adjudication.
This article describes a single-center pilot study designed to obtain preliminary data on development and testing of an automated computer algorithm including data elements from multiple domains for use in ascertainment of CHD events (MI and coronary revascularization).
Materials and Methods
Study population
After obtaining approval from our institutional review board, we conducted this retrospective study at the UNC School of Medicine (SOM), reviewing hospitalization records from two observational cohorts, the UNC site of the WIHS, and the UNC CFAR HIV Clinical Cohort (UCHCC). The study population for this feasibility pilot study included all hospitalization records for HIV-infected WIHS participants at this site who reported hospitalization at a hospital within the UNC Healthcare System between October 2013 (date of first hospitalization for a UNC WIHS participant) and July 2015, as well as hospitalization records of UCHCC participants who had been hospitalized at a UNC-affiliated hospital between July 2004 and July 2015 and whose records had previously been adjudicated for occurrence of CHD events.
The current EMR used by UNC Hospitals, EpicCare (Epic Systems Corporation, Verona, WI), was launched in 2014 and includes legacy data from the previously used Web-based Clinical Information System (WebCIS) dating back to July 2004; therefore, July 2004 was chosen as the earliest date for inclusion in this study. The UCHCC is an ongoing large observational clinical cohort that includes all HIV-seropositive adults receiving care at the UNC Infectious Diseases Clinic since 1996 who provided written informed consent, as has been described previously.5,6
Definition of CHD events
Hospitalization for a CHD event was defined as any hospitalization during which the participant had a documented MI or coronary revascularization procedure. MI was classified as definite, probable, or “not MI” based on criteria adapted from the Universal Definition of MI and Multi-Ethnic Study of Atherosclerosis (MESA) that were used for adjudication of CHD events in the UCHCC.7,8 Classification was based on the following: symptoms (chest pain/tightness/pressure not attributed to clear noncardiac cause), elevated cardiac enzymes (with abnormal defined as greater than twice the upper limit of normal), electrocardiogram (ECG) criteria, or by imaging studies showing new loss of viable myocardium/new regional wall motion abnormality.
Based on MESA criteria, events were classified as “definite MI” in the presence of documented cardiac pain and (1) evolution of a major Q wave regardless of cardiac enzyme results or (2) evolution of ST elevation/new left bundle branch block (LBBB)/ST-T depression or inversion/minor Q waves and abnormal cardiac enzymes or (3) a single ECG with a major Q wave and abnormal cardiac enzymes; or in the absence of documented cardiac pain if the following were present: (1) evolution of a major Q wave or (2) evolution of ST elevation/new LBBB and abnormal cardiac enzymes.
Events in which cardiac pain was documented and there were any of the above-described ECG changes other than evolution of a major Q wave were defined as “probable MI” if cardiac enzymes were equivocal (value between normal and twice the upper limit of normal), and as “not MI” if cardiac enzymes were normal. In the absence of cardiac pain, events that did not meet criteria for definite MI were classified as “probable MI” if there was evolution of ST elevation/new LBBB and cardiac enzymes were equivocal, or if one of the other ECG findings was present with abnormal cardiac enzymes. Events in which the ECG was normal and cardiac enzymes were abnormal were classified as “probable MI” and all other events were classified as “not MI.” Coronary revascularization events were defined as occurrence of coronary artery bypass graft, percutaneous transluminal coronary angioplasty, or other percutaneous coronary revascularization intervention (e.g., atherectomy).
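The classification rules described above can be summarized as a small decision function. The following Python sketch is illustrative only and is not the adjudication procedure or software used in the study, which relied on clinician review. The names `Ecg`, `enzyme_category`, and `classify_mi` are hypothetical, the ECG findings are collapsed into five assumed categories, and a value exactly equal to a cutoff is assigned to the lower category by assumption.

```python
from enum import Enum


class Ecg(Enum):
    """Assumed grouping of the ECG findings named in the text."""
    EVOLVING_MAJOR_Q = 1          # evolution of a major Q wave
    EVOLVING_ST_OR_NEW_LBBB = 2   # evolution of ST elevation or new LBBB
    ST_T_OR_MINOR_Q = 3           # ST-T depression/inversion or minor Q waves
    SINGLE_MAJOR_Q = 4            # single ECG with a major Q wave
    NORMAL = 5


def enzyme_category(value, upper_limit_normal):
    """Abnormal: more than twice the upper limit of normal (ULN).
    Equivocal: above the ULN but not more than twice the ULN."""
    if value > 2 * upper_limit_normal:
        return "abnormal"
    if value > upper_limit_normal:
        return "equivocal"
    return "normal"


def classify_mi(cardiac_pain, ecg, enzymes):
    """Return 'definite MI', 'probable MI', or 'not MI'."""
    # Evolution of a major Q wave is definite with or without cardiac pain.
    if ecg is Ecg.EVOLVING_MAJOR_Q:
        return "definite MI"
    # Normal ECG: probable only when enzymes are abnormal.
    if ecg is Ecg.NORMAL:
        return "probable MI" if enzymes == "abnormal" else "not MI"
    if cardiac_pain:
        # Any remaining ECG change with documented cardiac pain.
        if enzymes == "abnormal":
            return "definite MI"
        return "probable MI" if enzymes == "equivocal" else "not MI"
    # No documented cardiac pain.
    if ecg is Ecg.EVOLVING_ST_OR_NEW_LBBB:
        if enzymes == "abnormal":
            return "definite MI"
        return "probable MI" if enzymes == "equivocal" else "not MI"
    # Other ECG findings without pain require abnormal enzymes.
    return "probable MI" if enzymes == "abnormal" else "not MI"
```

For example, under these assumptions `classify_mi(False, Ecg.ST_T_OR_MINOR_Q, "abnormal")` evaluates to "probable MI", since ST-T changes without cardiac pain do not meet the criteria for definite MI.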
Manual chart abstraction
In the UCHCC, medical chart reviews done at enrollment and prospectively at 6-month intervals obtain data on clinical diagnoses, medications (antiretroviral and all other medications), laboratory and other testing, and treatment during any hospitalizations reported by the participant. Because the UCHCC is part of CNICS, the manual chart abstraction and event adjudication for UCHCC hospitalizations follow the same protocol as in CNICS.2 Potential MI events are identified for review based on clinical diagnosis of MI and/or elevated cardiac enzyme levels [any troponin or creatine kinase muscle/brain (CK-MB) result above the upper limit of normal].
Trained research assistants had previously conducted standardized manual chart abstraction and two clinician reviewers had adjudicated CHD events independently for all hospitalizations included in this study, with additional review by a third clinician if the initial reviewers disagreed. We conducted manual medical record review of hospitalizations reported by WIHS participants during the study period using the same standardized chart abstraction and adjudication methods. Data manually abstracted from the hospitalization record included emergency department (ED) notes, admission and progress notes (including cardiology consults), medications, CK-MB and troponin results, ECGs, stress tests/cardiac imaging and cardiac catheterization reports, and cardiac surgery/revascularization procedure reports.
Development of computer algorithms
We developed computer algorithms using structured data elements from domains available in Epic to query the EMR with the goal of identifying hospitalization records that met criteria for protocol-specified definitions of an MI or coronary revascularization event during a hospitalization or ED visit. Data elements eligible for inclusion in the algorithms were (1) International Classification of Diseases, 9th Revision (ICD-9) and Current Procedural Terminology (CPT) codes denoting MI and closely related diagnoses that have been validated in previously published studies (given in Table 2)2,6,9; (2) medications commonly used in treatment of MI or in conjunction with coronary revascularization: nitroglycerin [sublingual and intravenous (IV)], heparin (continuous IV infusion), statins, beta-blockers, antiplatelet agents (including aspirin 325 mg), and thrombolytic medications; and (3) cardiac enzyme levels classified as abnormal by standard clinically defined thresholds (CK-MB >6.0, Troponin T >0.029, Troponin I >0.034).
Table 2.
Algorithm A: One or more of the following ICD-9 or CPT codes: 410.**, 412.**, 411.**, 413.**, 414.**, 429.7, V45.81, V45.82, 36.01, 36.02, 36.03, 36.05, 36.09, 36.10–36.19, 33140, 33533–33536, 33510–33523, 33530, 33533–33536, 92920–92921, 92924–92925, 92928–92929, 92933–92934, 92937–92938, 92941, 92943–92944, 92980–92982, 92984, 92995–92996, 92974

Manual chart review and adjudication | Algorithm: CHD event, n (%) | Algorithm: not CHD event, n (%) | Total |
---|---|---|---|
Definite MI/revascularization | 24 (57) | 18 (43) | 42 |
Probable MI | 5 (20) | 20 (80) | 25 |
Not MI/revascularization | 1 (5) | 19 (95) | 20 |
Total | 30 | 57 | 87 |

κ (definite or probable MI) = 0.23 (95% CI, 0.11–0.36)a
Algorithm B: One or more of the following ICD-9 or CPT codes: 410.**, 412.**, 411.**, 413.**, 414.**, 429.7, V45.81, V45.82, 36.01, 36.02, 36.03, 36.05, 36.09, 36.10–36.19, 33140, 33533–33536, 33510–33523, 33530, 33533–33536, 92920–92921, 92924–92925, 92928–92929, 92933–92934, 92937–92938, 92941, 92943–92944, 92980–92982, 92984, 92995–92996, 92974, where one or more of the following laboratory value thresholds were reached: CK-MB >6.0, Troponin T >0.029, Troponin I >0.034; or one or more of the study-specified CHD-related medications were administered

Manual chart review and adjudication | Algorithm: CHD event, n (%) | Algorithm: not CHD event, n (%) | Total |
---|---|---|---|
Definite MI/revascularization | 23 (54.8) | 19 (45.2) | 42 |
Probable MI | 5 (20) | 20 (80) | 25 |
Not MI/revascularization | 1 (5) | 19 (95) | 20 |
Total | 29 | 58 | 87 |

κ (definite or probable MI) = 0.22 (95% CI, 0.10–0.34)a
Algorithm C: One or more of the following ICD-9 or CPT codes: 410.**, 412.**, 411.**, 413.**, 414.**, 429.7, V45.81, V45.82, 36.01, 36.02, 36.03, 36.05, 36.09, 36.10–36.19, 33140, 33533–33536, 33510–33523, 33530, 33533–33536, 92920–92921, 92924–92925, 92928–92929, 92933–92934, 92937–92938, 92941, 92943–92944, 92980–92982, 92984, 92995–92996, 92974; or one or more of the following laboratory value thresholds were reached: CK-MB >6.0, Troponin T >0.029, Troponin I >0.034

Manual chart review and adjudication | Algorithm: CHD event, n (%) | Algorithm: not CHD event, n (%) | Total |
---|---|---|---|
Definite MI/revascularization | 41 (97.6) | 1 (2.4) | 42 |
Probable MI | 24 (96) | 1 (4.0) | 25 |
Not MI/revascularization | 3 (15) | 17 (85) | 20 |
Total | 68 | 19 | 87 |

κ (definite or probable MI) = 0.83 (95% CI, 0.69–0.97)a
a Kappa coefficient, definite and probable MI categories combined.
CHD, coronary heart disease; CI, confidence interval; CK-MB, creatine kinase-muscle/brain; CPT, Current Procedural Terminology; ICD-9, International Classification of Diseases, 9th Revision; MI, myocardial infarction.
Since ECG reports, exercise stress tests, and myocardial perfusion test results are stored as unstructured data in Epic, those data could not be queried electronically and, therefore, were not included in our algorithm. Similarly, presence/absence of cardiac pain could not be queried from free text. Because MESA criteria for probable MI depend on cardiac pain and ECG findings, it was not possible to create an algorithm to identify probable MI, and the algorithm classified events only as “MI” or “not MI.” Data retrieval from the EMR was performed by trained biomedical informatics data analysts working in the Carolina Data Warehouse for Health, a central data repository containing clinical and administrative data from the UNC Health Care system. Statistical analyses were performed in SAS version 9.4 (SAS Institute, Inc., Cary, NC).
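The logic of the three algorithms defined in Table 2 can be expressed as simple predicates over structured data. The Python sketch below is a hedged illustration, not the query code run against the Carolina Data Warehouse for Health. The `HospitalizationData` record and its field names are hypothetical, code matching is reduced to a single assumed boolean flag, the enzyme inputs are assumed to be the highest values recorded during the stay, and the cutoffs are those stated in the text (which gives no units).

```python
from dataclasses import dataclass
from typing import Optional

# Cutoffs as stated in the Methods; the text does not give units.
CK_MB_CUTOFF = 6.0
TROPONIN_T_CUTOFF = 0.029
TROPONIN_I_CUTOFF = 0.034


@dataclass
class HospitalizationData:
    """Hypothetical structured summary of one hospitalization."""
    has_chd_code: bool                       # >=1 study-specified ICD-9/CPT code
    peak_ck_mb: Optional[float] = None       # None when no result was retrieved
    peak_troponin_t: Optional[float] = None
    peak_troponin_i: Optional[float] = None
    chd_medication_given: bool = False       # >=1 study-specified medication


def enzymes_elevated(h: HospitalizationData) -> bool:
    """True when any available enzyme result exceeds its cutoff."""
    pairs = [
        (h.peak_ck_mb, CK_MB_CUTOFF),
        (h.peak_troponin_t, TROPONIN_T_CUTOFF),
        (h.peak_troponin_i, TROPONIN_I_CUTOFF),
    ]
    return any(value is not None and value > cutoff for value, cutoff in pairs)


def algorithm_a(h: HospitalizationData) -> bool:
    """Codes only."""
    return h.has_chd_code


def algorithm_b(h: HospitalizationData) -> bool:
    """Codes plus either elevated enzymes or a CHD-related medication."""
    return h.has_chd_code and (enzymes_elevated(h) or h.chd_medication_given)


def algorithm_c(h: HospitalizationData) -> bool:
    """Codes or elevated enzymes."""
    return h.has_chd_code or enzymes_elevated(h)
```

A record with an elevated troponin I but no qualifying code, such as `HospitalizationData(has_chd_code=False, peak_troponin_i=0.5)`, is flagged only by `algorithm_c`, which illustrates why the least restrictive definition misses fewer events.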
Results
Hospitalization record retrieval
Of 123 hospitalizations within the UNC Health Care system for which manual chart abstraction and event adjudication had been performed (102 from UCHCC and 21 from WIHS), 87 were retrievable from Epic using the medical record number (MRN) and hospitalization dates. Among study subjects for whom medical records were retrieved, median age at time of hospitalization was 48.4 years [interquartile range (IQR) 41.1–56.4], median CD4+ cell count was 392 (IQR 147–641), and 58.6% had HIV-1 RNA below the limit of detection (Table 1). Thirty-six records could not be retrieved electronically, 45% of which were events that had been classified as CHD events by manual adjudication.
Table 1.
Characteristic | WIHS (n = 16) | CFAR (n = 71) | Total (n = 87) |
---|---|---|---|
Age in years, median (IQR) | 38.8 (35.5–46.1) | 50.3 (43.1–57.6) | 48.4 (41.1–56.4) |
Race, n (%) | |||
White | 1 (6.3) | 14 (19.7) | 15 (17.2) |
Black | 12 (75.0) | 50 (70.4) | 62 (71.3) |
Other | 3 (18.8) | 7 (9.9) | 10 (11.5) |
CD4+ cell count, cells/mm3, median (IQR) | 747 (424–935) | 307 (126–544) | 392 (147–641) |
HIV-1 RNA below limit of detection, n (%) | 11 (68.8) | 40 (56.3) | 51 (58.6) |
Combination ART, n (%) | 12 (75.0) | 49 (69.0) | 61 (70.1) |
History of clinical AIDS, n (%) | 2 (12.5) | 36 (50.7) | 38 (43.7) |
Year of event, n (%) | |||
2004 | 0 (0.0) | 3 (4.2) | 3 (3.4) |
2005 | 0 (0.0) | 10 (14.1) | 10 (11.5) |
2006 | 0 (0.0) | 15 (21.1) | 15 (17.2) |
2007 | 0 (0.0) | 5 (7.0) | 5 (5.5) |
2008 | 0 (0.0) | 6 (8.5) | 6 (6.9) |
2009 | 0 (0.0) | 9 (12.7) | 9 (10.3) |
2010 | 0 (0.0) | 6 (8.5) | 6 (6.9) |
2011 | 0 (0.0) | 4 (5.6) | 4 (4.6) |
2012 | 0 (0.0) | 8 (11.3) | 8 (9.2) |
2013 | 0 (0.0) | 4 (5.6) | 4 (4.6) |
2014 | 5 (31.3) | 1 (1.4) | 6 (6.9) |
2015 | 11 (68.8) | 0 (0.0) | 11 (12.6) |
Fatal event, n (%) | 0 (0.0) | 0 (0.0) | 0 (0.0) |
CD4+ cell count, HIV-1 RNA, combination ART use, and history of AIDS reflect most recently available data at time of the event.
ART, antiretroviral therapy; CFAR, Center for AIDS Research; IQR, interquartile range; WIHS, Women's Interagency HIV Study.
We encountered several obstacles to retrieval of hospitalization records. Although hospitalizations sourced from the WebCIS legacy system were queryable in Epic, in some cases inconsistencies in handling of trailing digits precluded standardized conversion of the legacy MRN to the current Epic MRN. Legacy system records from some UNC-affiliated hospitals had been scanned into Epic and were, therefore, not searchable by electronic methods. In addition, Epic classifies ED visits as separate encounters, distinct from the resulting hospital admission, which led to mismatched hospitalization dates when the patient was admitted to the hospital on the day after the ED visit. For example, for a patient who presented to the ED on January 1, 2009, and was admitted to the hospital unit on January 2, 2009, the manual chart abstraction used the ED visit date (January 1, 2009) as the hospitalization date, whereas the Epic EMR indicated January 2, 2009.
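One possible way to reduce the date mismatch described above is to accept an EMR admission date that falls on, or shortly after, the manually abstracted date. The Python sketch below illustrates that idea under stated assumptions; it was not part of this study, and the function name `matches_admission` and the one-day default tolerance are hypothetical choices.

```python
from datetime import date, timedelta


def matches_admission(abstracted_date: date,
                      emr_admit_date: date,
                      tolerance_days: int = 1) -> bool:
    """Return True when the EMR admission date equals the manually
    abstracted date or follows it by at most `tolerance_days`,
    as happens when an ED visit precedes the inpatient admission."""
    delta = emr_admit_date - abstracted_date
    return timedelta(0) <= delta <= timedelta(days=tolerance_days)


# The example from the text: ED visit on January 1, admission on January 2.
same_event = matches_admission(date(2009, 1, 1), date(2009, 1, 2))
```

A wider tolerance would recover more records but could also link unrelated encounters, so any such window would need to be checked against manually matched records.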
Data abstraction challenges
All hospitalization records were queried for the following data: primary diagnosis code, study-specified diagnosis codes linked to MI/coronary revascularization, troponin and CK-MB results, and generic and trade names of study-specified MI/coronary revascularization-related medications. As with record retrieval, there were several challenges in electronic abstraction of structured data from hospitalization records. In Epic, each inpatient procedure is coded as a separate encounter, thus data queries had to capture all procedure codes performed during the date range of the hospitalization (ED arrival/hospital admission to discharge) to ensure retrieval of complete CPT code data. Diagnosis code data from the legacy system were not all stored in the same fields as diagnosis codes that originated in Epic, so that not all legacy diagnostic codes could be reliably retrieved electronically.
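Because each inpatient procedure is stored as its own encounter, procedure codes have to be pooled across every encounter that falls within the hospitalization window. The Python sketch below shows one way this pooling could be done; the `Encounter` record, its field names, and the example values are hypothetical and do not reflect the Epic data model or the queries used in this study.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Set, Tuple


@dataclass(frozen=True)
class Encounter:
    """Hypothetical, simplified encounter record."""
    mrn: str
    encounter_date: date
    cpt_codes: Tuple[str, ...]


def cpt_codes_for_stay(encounters: Iterable[Encounter],
                       mrn: str,
                       start: date,
                       end: date) -> Set[str]:
    """Pool CPT codes from all encounters for one patient dated between
    ED arrival/hospital admission (start) and discharge (end), inclusive."""
    codes: Set[str] = set()
    for enc in encounters:
        if enc.mrn == mrn and start <= enc.encounter_date <= end:
            codes.update(enc.cpt_codes)
    return codes


# Illustrative data only: two procedures recorded as separate encounters.
example = [
    Encounter("A1", date(2014, 3, 1), ("92928",)),
    Encounter("A1", date(2014, 3, 2), ("92941",)),
    Encounter("A1", date(2014, 5, 9), ("33533",)),   # outside the stay
    Encounter("B2", date(2014, 3, 1), ("92920",)),   # different patient
]
stay_codes = cpt_codes_for_stay(example, "A1", date(2014, 3, 1), date(2014, 3, 4))
```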
In addition, laboratory test names in Epic may differ depending on the patient's location at the time that the test is ordered, leading to missing values for test result fields if the test name and ordering department were not correctly matched. Medication data are stored in multiple different fields in the EMR, some of which include only medication name but not dose or route of administration, which were important criteria for inclusion of some medications (e.g., aspirin and heparin) in our algorithm.
Assessment of concordance of computer algorithms with results of manual event adjudication
Manual chart abstraction and event adjudication were used as the gold standard for this study and classified 42 of 87 hospitalizations (48%) as definite CHD events, 25 (29%) as probable CHD events, and 20 (23%) as non-CHD events. The 20 possible CHD hospitalizations that were ultimately determined to be non-CHD events by manual adjudication were hospitalizations for cardiac-related illnesses (e.g., arrhythmias) that did not meet study criteria for a CHD event, for cardiac procedures that did not meet study criteria for a CHD event (i.e., did not include cardiac revascularization), or for noncardiac illnesses (e.g., chest pain due to GI etiology).
We constructed a computer algorithm requiring the presence of ≥1 specified ICD-9 code for MI, obstructive coronary artery disease, or MI-related complication (e.g., ventricular fibrillation secondary to MI, myocardial rupture secondary to MI) or CPT code denoting coronary revascularization (Algorithm A). This algorithm correctly identified 24 of 42 definite CHD events (57%), 29 of 67 probable/definite CHD events (43%), and 19 of 20 (95%) non-CHD events (Table 2, Algorithm A). A computer algorithm that required either presence of elevated CK-MB or troponin levels or documented administration of MI-related medications in addition to ≥1 of the above-specified ICD-9/CPT codes correctly identified 55%, 42%, and 95% of such events, respectively (Table 2, Algorithm B). A less restrictive computer algorithm requiring the presence of only one of the ICD-9/CPT code or cardiac enzyme criteria correctly identified 98% of definite, 97% of probable/definite, and 85% of non-CHD events (Table 2, Algorithm C).
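The proportions reported above, together with the kappa coefficients in Table 2, can be reproduced from the cell counts with definite and probable events combined. The Python sketch below is an illustrative recalculation rather than the SAS code used for the analysis; the function name `agreement` is hypothetical, and confidence intervals are not computed.

```python
def agreement(tp, fn, fp, tn):
    """Sensitivity, specificity, and Cohen's kappa for a 2x2 table.

    tp: manual CHD event, algorithm CHD event
    fn: manual CHD event, algorithm not CHD event
    fp: manual non-CHD event, algorithm CHD event
    tn: manual non-CHD event, algorithm not CHD event
    """
    n = tp + fn + fp + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    observed = (tp + tn) / n
    # Chance agreement from the row and column totals.
    expected = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / n ** 2
    kappa = (observed - expected) / (1 - expected)
    return sensitivity, specificity, kappa


# Counts from Table 2, definite and probable rows summed: (tp, fn, fp, tn).
TABLE_2 = {
    "A": (29, 38, 1, 19),
    "B": (28, 39, 1, 19),
    "C": (65, 2, 3, 17),
}

for name, counts in TABLE_2.items():
    sens, spec, kappa = agreement(*counts)
    print(f"Algorithm {name}: sensitivity {sens:.0%}, "
          f"specificity {spec:.0%}, kappa {kappa:.2f}")
```

Here "correctly identified" probable/definite events corresponds to sensitivity and correctly identified non-CHD events to specificity, so the loop reproduces the 43%, 42%, and 97% figures and the kappa values of 0.23, 0.22, and 0.83.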
Discussion
Increasing use of EMRs in inpatient and outpatient clinical settings has tremendous potential to improve efficiency of clinical outcome ascertainment from the medical record. All public and private health care providers and other eligible professionals are now incentivized by the Centers for Medicare & Medicaid Services to demonstrate meaningful use of an EMR.10 As of 2015, 96% of all hospitals and 86.9% of office-based physicians possessed an EMR.11,12 This pilot study assessed the feasibility of computer-assisted EMR abstraction for ascertainment of CHD events among 87 hospitalization records.
A computer algorithm requiring presence of ≥1 CHD-related ICD-9/CPT code correctly identified 57% of 42 definite, 43% of 67 probable/definite CHD, and 95% of non-CHD events; additionally requiring clinically defined cardiac enzyme levels or administration of MI-related medications correctly identified 55%, 42%, and 95% of such events, respectively. Less restrictive criteria requiring any one of the ICD-9/CPT code or cardiac enzyme criteria improved the algorithm's performance, with correct identification of 98% of definite, 97% of probable/definite CHD, and 85% of non-CHD events. However, we encountered substantial challenges to fully computerized adjudication of CHD events.
Use of EMRs has overcome a significant obstacle to comprehensive hospitalization review by increasing ease of access to medical records. Furthermore, availability of information in the electronic format has the potential to allow automated retrieval of clinical and administrative information by query of data that have been entered into structured fields. These structured data can then be used individually or as part of an algorithm to identify occurrences of the disease or clinical outcome of interest in the EMR. Such algorithms, when designed to independently identify clinical conditions or diseases, have been termed electronic phenotype extraction algorithms, or computable phenotypes.13 Electronic phenotype extraction algorithms, used to extract data for clinical care and disease surveillance, are being increasingly employed to facilitate identification of diseases in patient cohorts in biomedical research and have demonstrated reproducible results when implemented across different institutions and EMR systems.14–18
EMR phenotype development and testing is a multistep process, involving first the identification of characteristics of the disease of interest that can be electronically extracted from the medical record, followed by construction of the algorithm, algorithm testing, and assessment of precision of the algorithm compared with the gold standard of manual chart review. This pilot study identified several difficulties in implementing a computer algorithm for CHD event adjudication at our medical center. These obstacles affected multiple stages of the process, starting with retrieval of records from the EMR and encompassing diagnosis codes, medication data, and laboratory test results.
Our results may not be generalizable to samples of hospitalization records that have not been selected by the same cardiac enzyme/clinical diagnosis criteria as used in the UCHCC. Generalizability may also be limited by the fact that our study represents a single-center experience. However, Epic, the EMR used by UNC Healthcare, is the most widely used EMR in acute care hospitals, accounting for more than one-quarter of EMRs in this setting.19
Our analysis reviewed hospitalizations that occurred during a period when ICD-9 codes were in use. More recently, the International Statistical Classification of Diseases and Related Health Problems, Tenth Revision (ICD-10), has been adopted for coding and billing nationally.20 The ICD-10 distinguishes diagnoses and procedures in much greater detail than ICD-9, and has thus considerably expanded the number of codes in use for cardiac and other clinical diagnoses. This has the potential to allow for more accurate clinical and billing practice but may present several challenges for clinical research. Given the complexity and detail of ICD-10 codes, reliability of coded diagnosis data relies heavily on accurate selection of codes by clinicians.
In accordance with the Third Universal Definition of MI, the ICD-10 has been updated to include new acute MI codes differentiating between Type 1 MI (due to coronary artery plaque disruption and subsequent decreased blood flow) and Type 2 MI (acute imbalance between myocardial oxygen supply and demand due to another condition) as well as additional codes for other MI subtypes, which, although improving accuracy of diagnosis, further complicates selection of correct codes for MI.21–23 In addition, the transition from ICD-9 to ICD-10 codes presents particular challenges in analysis and interpretation of research studies conducted across a time span that includes both ICD-9 and ICD-10 codes, since only a small minority of ICD-10 codes can be linked directly (1-to-1) to ICD-9 codes.21 As underscored by Khera et al. in a recent publication, studies that include event adjudication across such time spans will, therefore, need to include rigorous reporting of validation status of the ICD-10 codes used and the process for linking ICD-9 and ICD-10 codes, as well as an assessment of ways in which temporal changes may be related to transition between the two coding systems.21
In addition to complexities related to coding, automated adjudication of CHD events is further challenged by the need to assess key diagnostic criteria, such as presence of Q waves on the ECG or wall motion defects on cardiac imaging, which are stored as free text in many EMRs and cannot, therefore, be queried by computerized methods designed to search only structured data fields. Incorporation of natural language processing (NLP) to extract data from clinical notes as well as cardiac electrophysiology and imaging reports has the potential to address this deficiency. Utilizing a named-entity recognition system to identify relevant nouns within a string of text, NLP is able to extract content from free text, incorporating logic to determine whether content meets specified criteria.
NLP has been used in epidemiology and clinical research as an adjunct to manual chart abstraction and a tool within algorithms for electronic record review. Carrell et al. evaluated use of NLP to identify breast cancer recurrence from EMR clinical note data, finding that the NLP-based system correctly identified 92% of recurrences with a specificity of 96%.24 Murff et al. studied NLP as a tool for identifying postoperative complications, including MI.25 In a cross-sectional study conducted in the Veterans' Health Administration, NLP correctly identified 91% (95% confidence interval, 78%–97%) of postoperative MIs.19 Adopting NLP for use in research studies does, however, require an upfront investment of time for development of rules or for machine learning, which may be the rate-limiting steps in the implementation of this modality.
Although hospital EMRs may have existing research capability, optimization for computerized detection and adjudication of clinical outcomes will require considerable investment of time and collaboration with institutional information technology and bioinformatics professionals. The overarching challenge will be to ensure completeness of data, which is essential to ensuring validity of event adjudication in clinical research. For health care systems with multiple clinical care sites, adoption of uniform coding and mapping of clinical data in the EMR across the system will be needed.
In multicenter studies, such as the WIHS, this challenge is further magnified since it is very likely that billing practices, which influence the final diagnosis codes linked to a hospitalization, will differ considerably in different hospital systems. Devising approaches to overcome some of these limitations will require improvements in research study design and processes for retrieving and linking data from the EMR. Collaboration with data analysts with EMR-specific expertise during the early stages of study development will be crucial to ensure that investigators understand the limitations of the electronic data and that data retrieval will provide data that are both accurate and comprehensive with respect to the research question. For adjudication of diagnoses, like MI, which require assessment of test results over time, it will be very important to develop a standard process for ensuring that data are retrieved from all encounters that occur during a single hospitalization event (e.g., linking troponin levels and ECG results from the ED to the related hospital admission, which may exist as a separate encounter in the EMR).
Despite these challenges, investment in the development of automated methods of chart review to facilitate more efficient ascertainment of outcomes is of growing importance in HIV longitudinal research studies and other fields of study related to chronic disease. Successful systems will need to incorporate methods that allow electronic review of both structured and unstructured data, tailored to achieve high sensitivity to avoid missing outcomes, and a high level of specificity to improve efficiency. In future studies, our research group will explore use of Algorithm C, which performed best in this pilot study, for investigation of EMR adjudication of CHD events in a larger sample, after developing measures to address some of the limitations already discussed.
Acknowledgments
Data in this article were collected in part by the WIHS and in part by the University of North Carolina Center for AIDS Research (CFAR). The contents of this publication are solely the responsibility of the authors and do not represent the official views of the National Institutes of Health (NIH). WIHS (principal investigators): UAB-MS WIHS (Mirjam-Colette Kempf and Deborah Konkle-Parker), U01-AI-103401; Atlanta WIHS (Ighovwerha Ofotokun and Gina Wingood), U01-AI-103408; Bronx WIHS (Kathryn Anastos and Anjali Sharma), U01-AI-035004; Brooklyn WIHS (Howard Minkoff and Deborah Gustafson), U01-AI-031834; Chicago WIHS (Mardge Cohen and Audrey French), U01-AI-034993; Metropolitan Washington WIHS (Seble Kassaye), U01-AI-034994; Miami WIHS (Margaret Fischl and Lisa Metsch), U01-AI-103397; UNC WIHS (Adaora Adimora), U01-AI-103390; Connie Wofsy Women's HIV Study, Northern California (Ruth Greenblatt, Bradley Aouizerat, and Phyllis Tien), U01-AI-034989; WIHS Data Management and Analysis Center (Stephen Gange and Elizabeth Golub), U01-AI-042590; Southern California WIHS (Joel Milam), U01-HD-032632 (WIHS I–WIHS IV).
Author Disclosure Statement
No competing financial interests exist.
Funding Information
This research was supported, in part, by the Center for AIDS Research (CFAR) Network of Integrated Clinical Systems (CNICS), National Institutes of Health (NIH) grant award R24-AI-067039, and by the Carolina Data Warehouse for Health within the NC Translational and Clinical Sciences (NC TraCS) Institute, which is supported by the National Center for Advancing Translational Sciences (NCATS), NIH grant award UL1-TR-002489.
The WIHS is funded primarily by the National Institute of Allergy and Infectious Diseases (NIAID), with additional cofunding from the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD), the National Cancer Institute (NCI), the National Institute on Drug Abuse (NIDA), and the National Institute of Mental Health (NIMH). Targeted supplemental funding for specific projects is also provided by the National Institute of Dental and Craniofacial Research (NIDCR), the National Institute on Alcohol Abuse and Alcoholism (NIAAA), the National Institute on Deafness and Other Communication Disorders (NIDCD), and the NIH Office of Research on Women's Health. WIHS data collection is also supported by UL1-TR000004 (UCSF CTSA), UL1-TR000454 (Atlanta CTSA), P30-AI-050410 (UNC CFAR), and P30-AI-027767 (UAB CFAR).