Assessment of Data Quality Variability across Two EHR Systems through a Case Study of Post-Surgical Complications

Sunyang Fu; Andrew Wen; Gavin M Schaeferle; Patrick M Wilson; Gabriel Demuth; Xiaoyang Ruan; Sijia Liu; Curtis Storlie; Hongfang Liu

2022 May 23;2022:196–205.


PMCID: PMC9285181 PMID: 35854735

Abstract

Translation of predictive modeling algorithms into routine clinical care workflows faces challenges in the form of varying data quality-related issues caused by the heterogeneity of electronic health record (EHR) systems. To better understand these issues, we retrospectively assessed and compared the variability of data produced from two different EHR systems. We considered three dimensions of data quality in the context of EHR-based predictive modeling for three distinct translational stages: model development (data completeness), model deployment (data variability), and model implementation (data timeliness). The case study was conducted based on predicting post-surgical complications using both structured and unstructured data. Our study found a consistent level of data completeness but high syntactic and moderate-to-high semantic variability across the two EHR systems; the quality of data is context-specific and closely related to the documentation workflow and the functionality of each EHR system.

Introduction

The rapid adoption of electronic health record (EHR) systems incentivized by the Health Information Technology for Economic and Clinical Health (HITECH) Act of 2009 has enabled the digital transformation of clinical data and large-scale data-driven research^1,2. In particular, the longitudinal, voluminous, and dense data offered by the EHR fuels the development of modern machine learning (ML) techniques for predicting disease trajectories and health outcomes, offering unique opportunities for real-time clinical decision support, risk management, and personalized patient monitoring^3,4. In alignment with the vision of evidence-based care and precision medicine, medical decisions can be tailored to the individual patient by leveraging predictive models trained on longitudinal EHR data⁵. One well-known example is the Dual Antiplatelet Therapy (DAPT) study, which used multiple predictive models to estimate the risk of ischemic events and bleeding and to identify unique clinical factors, providing data-driven insights for optimizing patient treatment⁶. More recently, Dikilitas et al leveraged both EHR and eMERGE data to derive risk prediction models for coronary heart disease based on three major racial and ethnic cohorts⁷. Leveraging the power of unstructured data, Oliwa et al developed a predictive model from EHR-based clinical notes to identify patients who are at risk for falling out of HIV care⁸.

Despite the increasing volume of research related to clinical predictive modeling, the translation of prediction algorithms into routine clinical care remains challenging^9,10. A recent study by Wong et al published in JAMA Internal Medicine evaluated the Epic sepsis model on a large-scale cohort of 27,697 patients. The observed model performance (AUC, 0.63) was substantially lower than the performance reported in the internal documentation (AUC, 0.76-0.83)¹¹. Further analysis revealed that the primary issue for the Epic sepsis model was not the degradation of performance itself, but rather the direct deployment of the model without a proper definition of EHR data elements, implementation transparency, detailed instructions on how the model can be used, customized, and interpreted, or best practices for dealing with data quality issues in the context of the clinical problem that the model is designed for.

These findings support the intuitive idea that models cannot be trusted without a good understanding of the data being fed into them. Consequently, the validity and portability of predictive models depend on the data from which they are derived¹⁰. EHR data is known to suffer from several data quality issues^12,13. Unlike data prospectively collected in a controlled environment such as a clinical trial, EHR data comes from systems primarily designed for patient care, and documentation patterns can easily be affected by numerous contextual factors, including clinical setting (e.g., ICU vs. non-ICU), human factors (e.g., varying levels of medical expertise and training), patient characteristics, and practice guidelines (e.g., whether to document incidental findings)^14-16. Furthermore, the EHR system itself has a significant impact on the form and format of clinical data. Built-in documentation functionality such as templates, copy and paste, auto-documentation, and transcription can affect the EHR-specific syntactic and semantic definition of any data contained therein^17,18. Data quality issues can therefore be further exacerbated when working with different EHR systems. An evaluation study by Madigan et al. found that 40% of results from ten different clinical databases vary significantly in terms of data heterogeneity, which measures the variability of information quality and semantic definition across heterogeneous data sources¹⁴. In the context of predictive modeling, if models are trained on data that cannot be reproduced because of a high level of variability, the models may suffer from poor portability and generalizability.

To help further investigate and quantify the data quality issue caused by the heterogeneous EHR systems in the context of predictive modeling, we retrospectively assessed the variability of data generated from two different EHR systems. As EHR system functionality and information documentation patterns are deeply embedded within the clinical workflow and practice, the quality of data needs to be examined for the given context (e.g., clinical setting, data documentation environment, and use cases). We considered three dimensions of data quality in the context of EHR-based predictive modeling for three distinct translational phases: model development (data completeness), model deployment (data variability), and model implementation (data timeliness). The data quality-related measurements were implemented in a real-world study of predicting post-surgical complications (PSC) that comprised a wide range of clinical modalities collected from three stages of surgery (pre-operative, intra-operative, and post-operative). To the best of our knowledge, this is the first study that compares data heterogeneity of two EHR systems using the case matching design. We believe the pragmatic informatics methods presented by the study can be considered as potential data quality assessment methods for the implementation and translation of future predictive models.

Methods and Materials

Study Setting This study was approved by the Mayo Clinic Institutional Review Board. In May 2018, Mayo Clinic completed a large EHR migration and workflow standardization. The effort for the Mayo Clinic Rochester campus included the conversion of the GE Centricity/LastWord EHR system (Centricity) to the Epic EHR system (Epic). This migration offers an ideal scenario for studying the difference between two EHR systems, because conducting the entire study within a single institution mitigates confounding from inter-institutional variation¹⁴. In addition, we used a case-matching design to account for potential confounders contributed by patient population variation¹⁹. Age was grouped into bins with a fixed width of 5 years, and we performed exact matching on age bin, sex, and type of surgery (Table 1). Two study cohorts with colorectal surgery as the primary procedure performed at Mayo Clinic Rochester were retrospectively constructed. Each cohort contains a total of 811 patients.

Table 1.

Matching Criteria of Two EHRs

	GE Centricity (Pre-migration)	Epic (Post-migration)
Study Period	2017-01-01 - 2018-01-01	2019-01-01 - 2020-01-01
Total matched patients	811	811
Matching criteria	Age, sex, type of surgery (CPT: 44140, 44141, 44143-44147, 44150-44153, 44155-44158, 44160, 44204-44208, 44210-44212, 45110-45114, 45116, 45119, 45123, 45395, 45397)²⁰
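
The case-matching step described above can be sketched in a few lines. This is a minimal illustration only, not the study's implementation: the record fields (`id`, `age`, `sex`, `cpt`), the helper names, the choice to start bins at age 0, and the use of a single CPT code as the "type of surgery" key are all assumptions made for the example.

```python
from collections import defaultdict

def age_bin(age, width=5):
    """Map an age in years to the lower bound of its fixed-width bin."""
    return (age // width) * width

def exact_match(cohort_a, cohort_b):
    """Pair patients 1:1 when age bin, sex, and surgery code all agree."""
    pool = defaultdict(list)
    for patient in cohort_b:
        key = (age_bin(patient["age"]), patient["sex"], patient["cpt"])
        pool[key].append(patient)
    pairs = []
    for patient in cohort_a:
        key = (age_bin(patient["age"]), patient["sex"], patient["cpt"])
        if pool[key]:
            # Each candidate is used at most once (matching without replacement).
            pairs.append((patient["id"], pool[key].pop(0)["id"]))
    return pairs

# Hypothetical records, invented for illustration.
centricity = [
    {"id": "C1", "age": 63, "sex": "F", "cpt": "44140"},
    {"id": "C2", "age": 47, "sex": "M", "cpt": "44204"},
]
epic = [
    {"id": "E1", "age": 61, "sex": "F", "cpt": "44140"},
    {"id": "E2", "age": 52, "sex": "M", "cpt": "44204"},
]

pairs = exact_match(centricity, epic)
print(pairs)
```

In this toy input only the first pair falls into the same age bin (60 to 64), so the second Centricity patient stays unmatched; unmatched patients would simply be excluded from both cohorts.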


Study Variables All study participants are part of the ACS National Surgical Quality Improvement Program (ACS NSQIP®) at the Mayo Clinic Rochester campus. The program conducts a monthly evaluation of a sample (approximately 20%) of the colon and rectal surgery (CRS) practice based on a standard procedure sampling methodology²¹. The NSQIP variables are defined by the ACS NSQIP abstraction guidelines and can be grouped into three stages: pre-operative, intra-operative, and post-operative (including post-hospitalization) (Figure 1). The key variables include patient demographics, comorbidities, preoperative labs (90 days before surgery), clinical and intraoperative elements, and postoperative occurrences/complications for 30 days after surgery.

Data collection Definitions of data collection and abstraction were standardized and aggregated with 18 other participating institutions across the nation²². The structured data consists of 102 unique variables falling under the categories of demographic data, patient-provided information (PPI), symptoms, comorbidities, physiologic measurements, laboratory tests, observational assessments, and operative factors. The data was retrieved from the Mayo Unified Data Platform (UDP) using an R-based application programming interface (API). The UDP is an enterprise data warehouse that loads data directly from the Mayo Clinic EHR. Patient comorbidities were identified from ICD-9 and ICD-10 codes recorded within one year of surgery. PPI was measured and collected at the time of admission. Symptoms, physiologic values, laboratory tests, and observational factors were abstracted from two weeks before surgery until the start of surgery. In addition to the 102 variables abstracted from the EHR, another 16 were generated using NLP as a service²³, a Mayo Clinic internal natural language processing (NLP) platform for extracting medical information from unstructured text. This system was developed on the open-source NLP framework MedTaggerIE²⁴. In total, 118 variables were created for the final data set.

Measurements of EHR Variability To examine the potential variability of data quality caused by two EHR systems, we consider three dimensions of data quality: data completeness, data variability (syntactic and semantic), and data timeliness, as listed in Table 2.

Table 2.

Definitions of Data Quality Dimensions in the Context of EHRs

Dimensions	Definition
Data completeness	A record contains all observations, all desired types of data, and a specified frequency of data over time^25,26.
Data variability (syntactic)	The structure (or syntax) of data^15,27.
Data variability (semantic)	The meaning (or semantics) of data^15,27.
Data timeliness	The measurement of time expectation for accessibility and availability of data²⁸.


Data Completeness As suggested by Juran and Weiskopf et al, data completeness needs to be judged in context, in terms of fitness for use^25,26. We used the NSQIP as the reference standard to assess data completeness. Each clinical encounter was defined as a colorectal surgery period using the surgery operation date as the index date (Figure 1). The completeness of data is measured by the presence of a reference standard given all observations made about a patient²⁵. We used the rate of missing (RoM), the proportion of patients for whom a variable is absent, to quantify the presence of information frequently found in CRS patients. A missing event for pre-operative variables was defined as the absence of the information within 90 days prior to the surgery index date. The perioperative duration was calculated using the admit and discharge dates. The duration for post-operative variables was 30 days after the surgery. To further understand and measure the RoM variation, we organized the variables by seven unique data sources: admit, discharge, and transfer status (ADT), billing code, patient demographics, vital signs, laboratory result, clinical note, and surgery information. Each variable was also assigned to one of three stages (pre-operative, intra-operative, and post-operative). McNemar’s test was performed to determine whether data completeness differed significantly between Centricity and Epic.
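
To make the completeness measure concrete, the sketch below computes the RoM for one variable in each cohort and applies McNemar's test to the matched pairs. It is illustrative only: the values are invented, `None` is assumed to encode a missing observation, and the exact (binomial) form of McNemar's test is an assumption, since the text does not state which variant was used.

```python
from math import comb

def rate_of_missing(values):
    """Fraction of patients for whom the variable is absent (None)."""
    return sum(v is None for v in values) / len(values)

def mcnemar_exact(flag_pairs):
    """Two-sided exact McNemar test on matched (missing_a, missing_b) flags."""
    b = sum(1 for a, e in flag_pairs if a and not e)  # missing only in system A
    c = sum(1 for a, e in flag_pairs if e and not a)  # missing only in system B
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, no evidence of a difference
    # Under H0 the discordant pairs split 50/50: binomial tail, doubled.
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical serum albumin values for ten matched pairs (None = missing).
centricity = [None, None, None, None, None, None, 3.9, 4.1, None, 4.0]
epic = [4.2, 3.8, 4.0, 4.4, 3.6, 4.1, 3.9, 4.0, None, 4.3]

rom_centricity = rate_of_missing(centricity)
rom_epic = rate_of_missing(epic)
flags = [(a is None, e is None) for a, e in zip(centricity, epic)]
p_value = mcnemar_exact(flags)
print(rom_centricity, rom_epic, p_value)
```

Only discordant pairs (missing in one system but present in the other) contribute to the test, which is why a paired test suits the case-matching design.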

Figure 1. — Study Timeline and Example Variables in Three Stages of Colon and Rectal Surgery

Data Variability The Health Level Seven (HL7) messages of unstructured clinical notes within one month before and after the surgery date were retrieved for the two matched cohorts. The HL7 Clinical Document Architecture (CDA) is a standard XML format for the syntactic representation of clinical documents based on the Reference Information Model (RIM)²⁹. The general structure of a CDA document comprises 1) a document header with metadata such as document date, document creator, and service location, and 2) narrative text in the body of the document. Based on the definition proposed by Elkin et al and Sohn et al, syntactic variability was examined by comparing the meta-structure and documentation sections of the HL7 messages and by calculating corpus statistics^15,27. The following metrics were considered: tokens/section, tokens/document, tokens/patient, sections/document, sections/patient, and documents/patient¹⁵. Statistically significant differences between the two EHR systems were determined using the Wilcoxon signed-rank test.
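
The six corpus metrics listed above reduce to simple counts once each note has been split into sections. The following sketch assumes a hypothetical pre-parsed representation (patient id, document id, and a mapping from section name to text) and whitespace tokenization; neither is taken from the study, which worked from HL7 CDA messages.

```python
from collections import defaultdict
from statistics import median

# Hypothetical parsed notes: (patient_id, document_id, {section name: text}).
notes = [
    ("P1", "D1", {"Chief Complaint": "abdominal pain",
                  "Impression/Report/Plan": "plan for colectomy next week"}),
    ("P1", "D2", {"Physical Examination": "abdomen soft non tender"}),
    ("P2", "D3", {"Impression/Report/Plan": "recovering well"}),
]

def corpus_statistics(notes):
    """Return the median of each corpus metric over a list of parsed notes."""
    tokens_per_section, tokens_per_document, sections_per_document = [], [], []
    per_patient = defaultdict(lambda: {"tokens": 0, "sections": 0, "documents": 0})
    for patient_id, _document_id, sections in notes:
        # Whitespace tokenization is an assumption, not the study's tokenizer.
        counts = [len(text.split()) for text in sections.values()]
        tokens_per_section.extend(counts)
        tokens_per_document.append(sum(counts))
        sections_per_document.append(len(counts))
        totals = per_patient[patient_id]
        totals["tokens"] += sum(counts)
        totals["sections"] += len(counts)
        totals["documents"] += 1
    patients = list(per_patient.values())
    return {
        "tokens/section": median(tokens_per_section),
        "tokens/document": median(tokens_per_document),
        "tokens/patient": median(p["tokens"] for p in patients),
        "sections/document": median(sections_per_document),
        "sections/patient": median(p["sections"] for p in patients),
        "documents/patient": median(p["documents"] for p in patients),
    }

stats = corpus_statistics(notes)
for name, value in stats.items():
    print(name, value)
```

Running the same function on each cohort yields the paired per-patient values that a signed-rank test would then compare.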

Semantic variability was examined by comparing the number of PSC concepts per patient across the two EHR systems. The PSC concepts were extracted by an existing NLP algorithm³⁰. Since the original algorithm was developed and evaluated using the Centricity data only, we conducted corpus annotation and NLP refinement on 100 patients with roughly 1200 Epic clinical notes (within one month before and after the surgery index date). Corpus annotation is the process of marking interpretative linguistic and predefined clinical concepts in text. The annotation was conducted by the same annotator (DMI) who participated in the previous study and had gone through training and consensus development. The same annotation guideline, annotation software (MAE), and schema were applied. The 100 patients were randomly split into a training set of 50 and a test set of 50. The out-of-box (i.e., directly applied with no refinement) precision, recall, and F1-score of the NLP algorithm were 0.72, 0.84, and 0.79, respectively. After refinement on the training data, the final F1-score on the test data was 0.92. The two versions of the NLP algorithm were applied separately to the two cohorts (Centricity and Epic).

Furthermore, we assessed the variability of semantic textual similarity (STS) of the positive mentions extracted by the NLP algorithm. We measured sentence-pair textual similarity as the average of three surface lexical similarities: the string-matching algorithm proposed by Ratcliff and Obershelp, the cosine similarity of the two sentences' word vectors, and the Levenshtein distance^31-33. The method was used and evaluated in the 2018 BioCreative/OHNLP clinical semantic textual similarity challenge³². A sentence pair was considered highly similar when the average score was greater than or equal to 0.40. Based on the concept distribution, we examined the textual similarity of the two most frequent concepts, Anemia and Abscess, and the two least frequent concepts, Purulent Drain and Wound Infection (with a minimum of 50 sentence pairs per section). We calculated both intra-EHR similarity (i.e., comparison within the same EHR system) and inter-EHR similarity between Epic and Centricity. The distributions of the unique clinical expressions were visualized using histograms.
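
A rough sketch of such an averaged lexical similarity is given below. Several details are assumptions rather than the study's implementation: Python's `difflib.SequenceMatcher` is used as a Ratcliff/Obershelp-style matcher, the word vectors are plain term-count vectors, and the Levenshtein distance is converted to a similarity by dividing by the longer string's length. The function names and the example sentences are hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher
from math import sqrt

HIGH_SIMILARITY = 0.40  # threshold stated in the text

def ratcliff_obershelp(a, b):
    """Gestalt pattern matching ratio in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()

def cosine_bow(a, b):
    """Cosine similarity of term-count vectors (an assumed word-vector form)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = sqrt(sum(v * v for v in va.values())) * sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0

def levenshtein(a, b):
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def levenshtein_similarity(a, b):
    """Edit distance rescaled to [0, 1]; this normalization is an assumption."""
    longest = max(len(a), len(b))
    return 1.0 - levenshtein(a, b) / longest if longest else 1.0

def sts_score(a, b):
    """Unweighted mean of the three surface similarities."""
    return (ratcliff_obershelp(a, b) + cosine_bow(a, b) + levenshtein_similarity(a, b)) / 3

# Hypothetical sentence pair for illustration.
s1 = "acute blood loss anemia"
s2 = "anemia due to acute blood loss"
score = sts_score(s1, s2)
print(round(score, 3), score >= HIGH_SIMILARITY)
```

Averaging character-level and token-level measures makes the score less sensitive to any single artifact, such as word reordering, which penalizes edit distance but not the term-count cosine.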

Data Timeliness With the move toward real-time clinical decision support and prospective risk detection, information timeliness becomes an important quality criterion, since the timeliness of a model depends on the timeliness of its data. Data timeliness was defined as the time expectation of whether information is accessible for each patient encounter²⁸. The analysis of data timeliness for structured data focused on lab variables because of their high prevalence and importance to the prediction of PSC³⁴. We retrieved the lab result record date (i.e., the date when a record was loaded into the source system) and compared it with the patient encounter date. For unstructured data, we retrospectively collected and measured the time spent on the documentation of clinical notes for each patient visit (i.e., a comparison of the note date in the source system with the encounter date). To simplify the measure, we defined timely information as data that can be accessed within 24 hours of a CRS-related clinical encounter.
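
The 24-hour rule can be expressed directly as a comparison of two timestamps. The sketch below is an illustration under stated assumptions: the timestamps are invented, the function names are hypothetical, and treating a delay of exactly 24 hours as timely is a choice made for the example.

```python
from datetime import datetime, timedelta

TIMELY_WINDOW = timedelta(hours=24)

def is_timely(encounter_time, record_time, window=TIMELY_WINDOW):
    """True when the record became available within the window after the encounter."""
    return record_time - encounter_time <= window

def delayed_rate(records):
    """Share of (encounter_time, record_time) pairs documented after the window."""
    delayed = sum(not is_timely(enc, rec) for enc, rec in records)
    return delayed / len(records)

# Hypothetical (encounter, note) timestamps.
records = [
    (datetime(2019, 3, 1, 9, 0), datetime(2019, 3, 1, 15, 30)),   # 6.5 h later
    (datetime(2019, 3, 2, 8, 0), datetime(2019, 3, 3, 8, 0)),     # exactly 24 h
    (datetime(2019, 3, 4, 10, 0), datetime(2019, 3, 6, 10, 0)),   # 48 h later
    (datetime(2019, 3, 5, 13, 0), datetime(2019, 3, 5, 13, 45)),  # 45 min later
]

rate = delayed_rate(records)
print(rate)
```

The delayed documentation rate is then the number of delayed encounters divided by the total number of encounters in the cohort.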

Results

Data completeness The overall comparison of RoM across the two EHRs is illustrated in Figure 2. Among the 118 variables studied, the median RoM was 0.011 (1st quartile 0, 3rd quartile 0.71) for Centricity and 0.007 (1st quartile 0, 3rd quartile 0.665) for Epic. We observed a higher RoM among the intraoperative variables (green dots) than among the postoperative variables (red dots). No significant pattern was found in the comparison of measurement and temporal variables. Little or no difference was found for either highly complete variables (RoM < 0.1) or highly incomplete variables (RoM > 0.85). In contrast, variables with RoM between 0.1 and 0.85 varied widely across the two EHRs (Figure 2, Area of High Heterogeneity).

The aggregated comparison of RoM by operation stage and data source is provided in Table 3. No significant difference was found for the data collected in the intraoperative and postoperative stages. However, there was a significant difference for lab-related variables and a moderate difference for demographic variables. Based on McNemar’s test, variables with a significant difference in RoM included serum albumin (p<0.001), BUN (p<0.001), bilirubin (p<0.001), alkaline phosphatase (p<0.001), C. Diff (p<0.001), transfer status (p=0.002), and International Normalized Ratio (INR) of PT values (p=0.012).

Table 3.

Comparison of Information Completeness by Operation Stage and Data Source

Stage	Original Data Source	Total No. of Variables	No. of Significant Variables (%)
Preoperative	ADT	4	0 (0)
Preoperative	Billing (D)	16	0 (0)
Preoperative	CN	26	1 (4)
Preoperative	Demo	6	2 (33)
Preoperative	Lab	36	16 (44)
Preoperative	Vitals	4	0 (0)
Intraoperative	Billing (P)	1	0 (0)
Intraoperative	Surgery	11	1 (9)
Postoperative	ADT	10	0 (0)
Postoperative	Billing (P)	3	0 (0)


*Variables were organized by the stage of surgery and summarized by the original data source. Statistically significant difference was determined by a paired McNemar’s test of data completeness between Centricity and Epic; a significant variable was defined as one with p<0.05. Abbreviations: ADT: admit, discharge, and transfer status; CN: clinical notes; Demo: patient demographics; Billing (D): diagnosis code; Billing (P): procedure code.

Data Variability The comparison of the corpus statistics between Centricity and Epic is provided in Table 4. We observed a larger total number of clinical documents, tokens, and sections in Epic than in Centricity. Consistent with these larger totals, the numbers of sections/patient and documents/patient were higher for Epic than for Centricity. On the other hand, the median number of tokens/patient was lower for Epic than for Centricity, even though the total numbers of documents and tokens were higher. Based on the Wilcoxon signed-rank test, all six per-unit corpus statistics in Table 4 differed significantly between the two EHRs. Overall, it is evident that the two systems organize clinical documents differently.

Table 4.

Corpus statistics of GE Centricity and Epic

Original data source	Centricity	Epic	p-value
No. of patients	811	811
No. of documents	18,648	30,476
No. of tokens (Total)	8,273,327	11,383,088
No. of sections (Total)	94,645	116,399
No. of sections (Unique)	64	47
No. of tokens/section, median (IQR)	29 (95)	64 (90)	<0.001
No. of tokens/document, median (IQR)	243 (440)	229 (328)	<0.001
No. of tokens/patient, median (IQR)	35,927 (42,828)	26,744 (36,367)	<0.001
No. of sections/document, median (IQR)	4 (5)	3 (5)	0.0012
No. of sections/patient, median (IQR)	81 (69)	94 (92)	<0.001
No. of documents/patient, median (IQR)	15 (14)	26 (26)	<0.001


*IQR: interquartile range. Statistically significant difference was determined by paired Wilcoxon signed-rank test.

Based on the semantic mapping and analysis of the document sections across the two EHRs, there was a high similarity of clinical document sections across the two systems. Among the total 94,645 sections in Centricity, the top three sections were “Impression/Report/Plan” (14,203), “Chief Complaint/Reason for Visit” (7,106), and “Physical Examination” (5,853). The three most prevalent sections for Epic were “Impression/Report/Plan” (20,392), “Procedure Information” (8,267), and “Physical Examination” (6,628). Among the top 15 sections, 9 sections were matched, including ‘Impression/Report/Plan’, ‘Chief Complaint/Reason for Visit’, ‘Physical Examination’, ‘History of Present Illness’, ‘Vital Signs’, ‘Subjective’, ‘Diagnosis’, ‘Procedure Information’, and ‘Social History’.

The summary statistics of the number of PSC-related clinical concepts extracted by NLP are provided in Table 5. Overall, the concept/document and concept/patient ratios for Epic (blue and orange columns) were lower for all concept types and significantly lower for Anemia, Abscess, Cellulitis, and Painful incision; this pattern indicates that Centricity has potentially higher semantic breadth²⁵ in the context of PSC.

Table 5.

Semantic Concept Distribution of Two EHRs


Based on the concept distribution from Table 5, we further examined the textual similarity of the two most frequent concepts, anemia and abscess, and the two least frequent concepts, purulent drain and wound infection. Figure 3 presents the summarized textual similarity scores for the intra- and inter-EHR comparisons. Since the document section plays an important role in contextual information, this analysis was further stratified by document section. Compared with Epic, the intra-EHR textual similarity of Centricity was higher for all disease categories and the majority of the sections. On the other hand, Epic yielded a substantially higher similarity under the ‘Secondary Diagnosis’ section. Across sections, the similarity difference was substantial for ‘Diagnosis’, ‘Past Medical/Surgical History’, ‘Secondary Diagnosis’, and ‘History of Present Illness’. For most clinical concepts and document sections, there was no substantial drop in inter-EHR similarity.

Figure 3. — Comparison of the Textual Similarity between Centricity and Epic

*High similarity: greater than or equal to 0.40. Abbreviations: Intra-C: Intra-Centricity; Intra-E: Intra-Epic; Inter-E-C: Inter-Epic-Centricity.

The distribution of the unique clinical expressions related to abscess and anemia (Figure 4) revealed two opposite patterns: Epic has more standardized language for abscess, whereas the language in Centricity is lengthy and descriptive. On the other hand, the language representing anemia was more variable in Epic than in Centricity. For example, the expression “Anemia Posthemorrhagic Acute (Blood Loss Anemia)” was repeatedly documented, accounting for more than 30% of the total sample. These varying similarity patterns affirm that the characteristics and patterns of data are context-dependent.

Data timeliness No delay of information was found in the structured lab variables across the two EHR systems. Among the 811 patients who had CRS under the Centricity EHR, there were a total of 1855 visits, with 1673 instances of on-time documentation and 182 instances of delayed documentation. For the 811 matched patients under the Epic EHR, there were a total of 3260 encounters, of which 44 had a documentation delay. The delayed documentation rates for the Centricity and Epic cohorts were 0.098 and 0.013, respectively.

Discussion

The translation of predictive modeling algorithms into routine clinical care faces challenges in the form of various data quality issues caused by the heterogeneity of EHR systems. To better understand this barrier, we retrospectively assessed the variability of data from two EHR systems in the context of PSC. We discovered a consistent level of data completeness across the two EHR systems, with the exception of lab data. To further understand Epic’s significant improvement in capturing lab data in the context of CRS, we investigated the workflow difference between the two EHRs. We learned that after the EHR migration, there was a change in how laboratory tests can be ordered. The migration enabled primary providers to order laboratory tests directly through the Epic EHR, which may explain the lower RoM. Conversely, the corpus statistics suggested high syntactic variation. There was also a moderate-to-high difference in the semantic type and frequency of document sections. Textual similarity revealed a consistent pattern for roughly half of the concept-section pairs. High language variation was found for the sections of Secondary Diagnosis (Abscess) and Past Medical/Surgical History (Anemia). The timeliness of clinical note documentation improved in Epic compared with Centricity. This improvement suggests a potentially higher use of auto- or assisted documentation in Epic. However, confirmation of this finding requires additional on-site evaluation, which we have left to a future study.

The validity and reliability of clinical data are crucial for the development of robust, safe, and scalable predictive models. However, data work is often viewed as the least incentivized aspect of ML research³⁵. Dealing with data can indeed be challenging; for example, data curation and wrangling can be time-consuming and tedious, especially in the context of secondary use of EHRs, where researchers have limited control over how the data is documented and standardized. On the other hand, because latent factors (e.g., variant patterns of documentation) may introduce systematic bias and measurement error, a solid understanding of how data is documented, defined, and collected is required prior to the adoption of any predictive model relying upon it. Solid data understanding can promote a good data curation plan and solutions for mitigating potential biases or confounders prior to model development and re-deployment. The transparency of information documentation has direct implications for the explainability, implementability, and ultimately the trustworthiness of the models derived from the data. Based on our case study of CRS patients, the EHR system plays an important role in how data is documented, defined, and organized. When a model is to be translated to care practice or deployed to a different environment, a proper data quality assessment needs to be conducted, including a comparison of data characteristics and variability between the destination environment and the development environment.

Our investigation confirmed that the quality of data needs to be viewed in the context in which the data is generated and documented. For example, Figure 3 shows high similarity patterns in the two EHR systems under different contextual factors (disease-section combinations). Chart review confirmed that expressions with high textual similarity were associated with the use of documentation templates. Although the use of templates may enhance documentation standardization, the clinician’s reasoning process may be lost. The direct implication for machine learning models may be a varying level of contextual knowledge loss³⁶. The varying results from the analysis also strongly indicate that proper model re-training, refinement, and re-evaluation are needed.

Our study has several limitations. Since the study was conducted on a single-case scenario, the generalizability of the findings is limited by the scope of the study. We aim to expand our investigation on multiple different institutions with diverse case scenarios as part of future work. Furthermore, we plan to leverage qualitative methods to study the workflow of data documentation and transformation across multiple EHR systems.

Conclusion

To better understand the potential data heterogeneity caused by different EHR systems, we proposed and applied a standardized set of informatics methods to retrospectively assess the variability of data quality contributed by two EHR systems. We discovered a varying level of data quality across the two EHR systems; the quality of data is context-specific and closely related to the documentation workflow and the functionality of individual EHR systems. We recommend that data understanding be incentivized equally with model development.

Acknowledgment

We gratefully thank Donna M Ihrke for performing corpus annotation, and the Mayo Foundation for Medical Education and NIH (#R01EB019403, U01TR002062-04) for supporting this project.


References

1. Gelijns AC, Gabriel SE. Looking beyond translation--integrating clinical research with medical practice. N Engl J Med. 2012;366(18):1659–61. doi: 10.1056/NEJMp1201850.
2. Arnold Milstein M. Code red and blue--safely limiting health care's GDP footprint. The New England journal of medicine. 2013;368(1):1. doi: 10.1056/NEJMp1211374.
3. Bennett CC, Doub TW, Selove R. EHRs connect research and practice: Where predictive modeling, artificial intelligence, and clinical decision support intersect. Health Policy and Technology. 2012;1(2):105–14.
4. Wu G, Yang P, Xie Y, Woodruff HC, Rao X, Guiot J, et al. Development of a clinical decision support system for severity risk prediction and triage of COVID-19 patients at hospital admission: an international multicentre study. European Respiratory Journal. 2020;56(2). doi: 10.1183/13993003.01104-2020.
5. Pencina MJ, Peterson ED. Moving from clinical trials to precision medicine: the role for predictive modeling. Jama. 2016;315(16):1713–4. doi: 10.1001/jama.2016.4839.
6. Yeh RW, Secemsky EA, Kereiakes DJ, Normand S-LT, Gershlick AH, Cohen DJ, et al. Development and validation of a prediction rule for benefit and harm of dual antiplatelet therapy beyond 1 year after percutaneous coronary intervention. Jama. 2016;315(16):1735–49. doi: 10.1001/jama.2016.3775.
7. Dikilitas O, Schaid DJ, Kosel ML, Carroll RJ, Chute CG, Denny JA, et al. Predictive utility of polygenic risk scores for coronary heart disease in three major racial and ethnic groups. The American Journal of Human Genetics. 2020;106(5):707–16. doi: 10.1016/j.ajhg.2020.04.002.
8. Oliwa T, Furner B, Schmitt J, Schneider J, Ridgway JP. Development of a predictive model for retention in HIV care using natural language processing of clinical notes. Journal of the American Medical Informatics Association. 2021;28(1):104–12. doi: 10.1093/jamia/ocaa220.
9. Caspers J. Translation of predictive modeling and AI into clinics: a question of trust. European Radiology. 2021:1-2.
10. Paxton C, Niculescu-Mizil A, Saria S, editors. Developing predictive models using electronic medical records: challenges and pitfalls. AMIA Annual Symposium Proceedings; 2013: American Medical Informatics Association.
11. Wong A, Otles E, Donnelly JP, Krumm A, McCullough J, DeTroyer-Cooley O, et al. External Validation of a Widely Implemented Proprietary Sepsis Prediction Model in Hospitalized Patients. JAMA Internal Medicine. 2021.
12. Weiskopf NG, Weng C. Methods and dimensions of electronic health record data quality assessment: enabling reuse for clinical research. Journal of the American Medical Informatics Association. 2013;20(1):144–51. doi: 10.1136/amiajnl-2011-000681.
13. Botsis T, Hartvigsen G, Chen F, Weng C. Secondary use of EHR: data quality issues and informatics opportunities. Summit on Translational Bioinformatics. 2010;2010:1.
14. Fu S, Leung LY, Raulli A-O, Kallmes DF, Kinsman KA, Nelson KB, et al. Assessment of the impact of EHR heterogeneity for clinical research through a case study of silent brain infarction. BMC medical informatics and decision making. 2020;20(1):1–12. doi: 10.1186/s12911-020-1072-9.
15. Sohn S, Wang Y, Wi C-I, Krusemark EA, Ryu E, Ali MH, et al. Clinical documentation variations and NLP system portability: a case study in asthma birth cohorts across institutions. Journal of the American Medical Informatics Association. 2018;25(3):353–9. doi: 10.1093/jamia/ocx138.
16. Cohen GR, Friedman CP, Ryan AM, Richardson CR, Adler-Milstein J. Variation in physicians’ electronic health record documentation and potential patient harm from that variation. Journal of general internal medicine. 2019;34(11):2355–67. doi: 10.1007/s11606-019-05025-3.
17. Zhang R, Pakhomov S, McInnes BT, Melton GB, editors. Evaluating measures of redundancy in clinical texts. AMIA annual symposium proceedings; 2011: American Medical Informatics Association.
18. Thaker VV, Lee F, Bottino CJ, Perry CL, Holm IA, Hirschhorn JN, et al. Impact of an electronic template on documentation of obesity in a primary care clinic. Clinical pediatrics. 2016;55(12):1152–9. doi: 10.1177/0009922815621331.
19.Mann C. Observational research methods. Research design II: cohort, cross sectional, and case-control studies. Emergency medicine journal. 2003;20(1):54–60. doi: 10.1136/emj.20.1.54. [DOI] [PMC free article] [PubMed] [Google Scholar]
20.Cima R, Dankbar E, Lovely J, Pendlimari R, Aronhalt K, Nehring S, et al. Colorectal surgery surgical site infection reduction program: a national surgical quality improvement program–driven multidisciplinary single-institution experience. Journal of the American College of Surgeons. 2013;216(1):23–33. doi: 10.1016/j.jamcollsurg.2012.09.009. [DOI] [PubMed] [Google Scholar]
21.Ko CY, Hall BL, Hart AJ, Cohen ME, Hoyt DB. The American college of surgeons national surgical quality improvement program: achieving better and safer surgery. The Joint Commission Journal on Quality and Patient Safety. 2015;41(5):199. doi: 10.1016/s1553-7250(15)41026-8. .AP1. [DOI] [PubMed] [Google Scholar]
22.Ingraham AM, Richards KE, Hall BL, Ko CY. Quality improvement in surgery: the American College of Surgeons national surgical quality improvement program approach. Advances in surgery. 2010;44(1):251–67. doi: 10.1016/j.yasu.2010.05.003. [DOI] [PubMed] [Google Scholar]
23.Wen A, Fu S, Moon S, El Wazir M, Rosenbaum A, Kaggal VC, et al. Desiderata for delivering NLP to accelerate healthcare AI advancement and a Mayo Clinic NLP-as-a-service implementation. NPJ digital medicine. 2019;2(1):1–7. doi: 10.1038/s41746-019-0208-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
24.Liu H, Bielinski SJ, Sohn S, Murphy S, Wagholikar KB, Jonnalagadda SR, et al. An information extraction framework for cohort identification using electronic health records. AMIA Summits on Translational Science Proceedings. 2013;2013:149. [PMC free article] [PubMed] [Google Scholar]
25.Weiskopf NG, Hripcsak G, Swaminathan S, Weng C. Defining and measuring completeness of electronic health records for secondary use. Journal of biomedical informatics. 2013;46(5):830–6. doi: 10.1016/j.jbi.2013.06.010. [DOI] [PMC free article] [PubMed] [Google Scholar]
26.Juran JM, De Feo JA. 2010. Juran's quality handbook: the complete guide to performance excellence: McGraw-Hill Education.
27.Elkin PL, Froehling D, Bauer BA, Wahner-Roedler D, Rosenbloom S, Bailey K, et al. , editors. Aequus communis sententia: defining levels of interoperability. Medinfo 2007: Proceedings of the 12th World Congress on Health (Medical) Informatics; Building Sustainable Health Systems; 2007: IOS Press. [PubMed]
28.David L. 2009. Master data management. Morgan Kaufmann Press: San Francisco, CA, USA.
29.Dolin RH, Alschuler L, Beebe C, Biron PV, Boyer SL, Essin D, et al. The HL7 clinical document architecture. Journal of the American Medical Informatics Association. 2001;8(6):552–69. doi: 10.1136/jamia.2001.0080552. [DOI] [PMC free article] [PubMed] [Google Scholar]
30.Sohn S, Larson DW, Habermann EB, Naessens JM, Alabbad JY, Liu H. Detection of clinically important colorectal surgical site infection using Bayesian network. Journal of Surgical Research. 2017;209:168–73. doi: 10.1016/j.jss.2016.09.058. [DOI] [PMC free article] [PubMed] [Google Scholar]
31.Black PE. Ratcliff/Obershelp pattern recognition. Dictionary of algorithms and data structures. 2004;17.
32.Wang Y, Afzal N, Fu S, Wang L, Shen F, Rastegar-Mojarad M, et al. MedSTS: a resource for clinical semantic textual similarity. Language Resources and Evaluation. 2020;54(1):57–72. [Google Scholar]
33.Singhal A. Modern information retrieval: A brief overview. IEEE Data Eng Bull. 2001;24(4):35–43. [Google Scholar]
34.Chen D, Afzal N, Sohn S, Habermann EB, Naessens JM, Larson DW, et al. Postoperative bleeding risk prediction for patients undergoing colorectal surgery. Surgery. 2018;164(6):1209–16. doi: 10.1016/j.surg.2018.05.043. [DOI] [PMC free article] [PubMed] [Google Scholar]
35.Sambasivan N, Kapania S, Highfill H, Akrong D, Paritosh P, Aroyo LM. 2021. , editors. “Everyone wants to do the model work, not the data work”: Data Cascades in High-Stakes AI. proceedings of the 2021 CHI Conference on Human Factors in Computing Systems.
36.Leung LY, Fu S, Luetmer PH, Kallmes DF, Madan N, Weinstein G, et al. Agreement between neuroimages and reports for natural language processing-based detection of silent brain infarcts and white matter disease. BMC neurology. 2021;21(1):1–5. doi: 10.1186/s12883-021-02221-9. [DOI] [PMC free article] [PubMed] [Google Scholar]



