2021 Oct 21;157:104622. doi: 10.1016/j.ijmedinf.2021.104622

Table 4.

Recommendations for avoiding discrepancies between manual and automated data extraction. Issues were generally categorized into one of three groups: human error, ETL or mapping error, and abstraction-query mismatch. Suggestions for future implementations are provided for each issue.

| Category | Common issue | Suggestions for improvement |
| --- | --- | --- |
| Human error | Manual abstractors overlook desired information. | Provide instructions with as complete a list as possible of inclusion examples. Rapid extraction efforts limited the time available for data collectors to meticulously consider all possible responses. |
| Human error | Manual abstractors include inappropriate information (false positives). | Explain complex drug classes, confusing content, and any foreseeable misconceptions (with examples); otherwise, reported data will reflect this confusion. |
| ETL/mapping error | Missing data | Increase use of health information exchanges (HIEs) and foster adoption of HIE data into research repositories. Encourage more robust data capture within the EHR (e.g., recording end dates of medication orders). Incorporate, where possible, medication fill data into research repositories. |
| ETL/mapping error | Local errors | Actively surveil and address mapping issues by hardcoding placeholder reference terminology (e.g., RxNorm code 2284718) in ETL code for investigational agents (e.g., remdesivir). |
| ETL/mapping error | Patient identifier inconsistency | Work with existing health information management teams within the clinical informatics domain to address observed issues in identity management. |
| ETL/mapping error | Cross-institutional differences | Ensure that differences between EHR systems at subsites of large hospital systems are properly addressed before incorporating their data. Implement “sanity checks” on mappings to identify errors before pushing new data. |
| Abstraction-query mismatch | Mismatch between query and data format | Develop imputation logic for extending repeated standing orders into a continuous drug exposure variable. Develop abstraction instructions that either consider the logic of automated processes or clarify inclusion criteria. |
| Abstraction-query mismatch | Complex instructions | Minimize conditional chart review instructions where possible and ensure contingent logic is necessary to yield clinically meaningful data. Minimize ambiguity by providing lists of acceptable responses and examples of chart review in ambiguous conditions. |
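
The suggestion to extend repeated standing orders into a continuous drug exposure variable can be illustrated with a minimal sketch. It is not the authors' implementation. The `Order` record, the `exposure_episodes` function, and the `default_days` and `max_gap_days` parameters are hypothetical names chosen for illustration, and the rule used here (impute a missing end date from a default duration, then merge orders that overlap or fall on consecutive days) is only one plausible way to do it.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional, Tuple


@dataclass
class Order:
    """One medication order for a single patient and drug (hypothetical record)."""
    start: date
    end: Optional[date] = None  # end dates are often not recorded in the EHR


def exposure_episodes(
    orders: List[Order],
    default_days: int = 1,   # assumed duration when the end date is missing
    max_gap_days: int = 1,   # largest start-minus-previous-end gap still merged
) -> List[Tuple[date, date]]:
    """Merge repeated orders into continuous exposure episodes.

    Dates are inclusive. With max_gap_days=1, an order starting the day
    after the previous one ends is treated as the same episode.
    """
    intervals = []
    for order in orders:
        end = order.end
        if end is None:
            # Impute the missing end date from the assumed default duration.
            end = order.start + timedelta(days=default_days - 1)
        # Guard against an end date recorded before the start date.
        intervals.append((order.start, max(end, order.start)))

    intervals.sort()
    episodes: List[Tuple[date, date]] = []
    for start, end in intervals:
        if episodes and (start - episodes[-1][1]).days <= max_gap_days:
            # Overlapping or adjacent: extend the current episode.
            prev_start, prev_end = episodes[-1]
            episodes[-1] = (prev_start, max(prev_end, end))
        else:
            episodes.append((start, end))
    return episodes


# Daily standing orders on 1-2 March, a multi-day order on 3-5 March,
# and a separate single-day order on 10 March.
orders = [
    Order(date(2021, 3, 1)),
    Order(date(2021, 3, 2)),
    Order(date(2021, 3, 3), date(2021, 3, 5)),
    Order(date(2021, 3, 10)),
]
episodes = exposure_episodes(orders)
# episodes holds two spans: 1-5 March and 10 March.
```

Both parameters are clinical judgment calls: a longer `max_gap_days` bridges missed doses into one exposure, while a shorter one reports them as separate episodes. Whichever rule is chosen, it should match the inclusion criteria given to manual abstractors.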