Table 2.
Frequent problems that stand in the way of EHR use for translational research.
Problems Observed in Certain EHR Instances | Characteristics of These Problems |
---|---|
EHR data are inaccurate | EHR data contain too many errors, ranging from carelessly written text and copy-and-paste errors to clinical coding errors [8]. |
EHR data are difficult or impossible to interpret | Complete interpretation of EHR content requires hidden contexts to be made explicit. |
EHR data tell an incomplete story | Missing data elements abound. |
EHR data across organizations are inconsistent | Patients are treated in many places, and different organizations’ EHRs typically record different information on the same patient. Even within the same organization and with the same EHR system, data on a given patient may be inconsistent because of different installed features or different levels of training. |
EHR data are tilted toward the needs of billing and administration | Much of the data derives from the coding of diagnoses and procedures for billing and exhibits even greater inaccuracy than EHR data proper. This is particularly problematic where insurance companies require a specific diagnosis to be present in order to pay for specific procedures, medications, or other treatments. |
EHR data include too much free text | The vast majority of the patient’s story is told in narrative text, but natural language processing (NLP) technology is still too far from perfect for routine use. NLP is complex to set up, use, and maintain. To work well, NLP solutions need to be tailored to document types, domains, and sometimes specific healthcare providers. The quality of NLP solutions critically depends on large training resources, which are expensive to create and are often not available due to privacy concerns. |
EHR data lack provenance | EHRs are constantly fed by external information systems (e.g., lab systems, connected devices), but they do not always indicate the provenance (source systems and organizations) of these data. |
EHR data are too coarse-grained for research | Coding for billing is at the level of diagnosis categories, not fine-grained diagnoses. Besides causing a loss of information, this means that heterogeneous sets of diseases or procedures are identified by the same code. |
EHR data are derived from clinical care and lack the granularity needed for research purposes | Clinical care rarely matches the level of rigor in measurement, calibration, and data collection that is required for clinical research. |
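
Two of the problems in Table 2, missing data elements and missing provenance, can be made concrete with a small check. The following is only a minimal sketch under stated assumptions: the `Observation` record, its field names, and the sample values are hypothetical and do not correspond to any particular EHR system or data standard.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Observation:
    """One EHR data element; all field names here are hypothetical."""
    patient_id: str
    code: str                      # e.g. a lab or diagnosis code
    value: Optional[float]         # None marks a missing data element
    source_system: Optional[str]   # None marks unknown provenance


def quality_summary(observations):
    """Return the share of observations lacking a value or a source system."""
    total = len(observations)
    if total == 0:
        return {"missing_value": 0.0, "missing_provenance": 0.0}
    missing_value = sum(o.value is None for o in observations)
    missing_source = sum(o.source_system is None for o in observations)
    return {
        "missing_value": missing_value / total,
        "missing_provenance": missing_source / total,
    }


# Made-up sample records, for illustration only.
records = [
    Observation("p1", "glucose", 5.4, "lab_system_a"),
    Observation("p1", "hba1c", None, "lab_system_a"),
    Observation("p2", "glucose", 6.1, None),
    Observation("p2", "weight", 82.0, "connected_scale"),
]
summary = quality_summary(records)
print(summary)
```

A summary of this kind only quantifies how often a value or a source is absent. It does not address the other problems listed in the table, such as inaccurate coding or content that cannot be interpreted without its clinical context.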