PLOS One. 2026 Jan 9;21(1):e0340287. doi: 10.1371/journal.pone.0340287

Assessment of the integrity of real-time electronic health record data used in clinical research

Jessica Liu 1, Sameer Pandya 2, Andreas Coppi 3,4, H Patrick Young 2, Harlan M Krumholz 3,4,5, Wade L Schulz 2, Guannan Gong 1,*
Editor: Sreeram V Ramagopalan 6
PMCID: PMC12788664  PMID: 41511976

Abstract

Background

Near real-time electronic health record (EHR) data offers significant potential for secondary use in research, operations, and clinical care, yet challenges remain in ensuring data quality and stability. While prior studies have assessed retrospective EHR datasets, few have systematically examined the integrity of real-time data for research readiness.

Methods

We developed an automated benchmarking pipeline to evaluate the stability and completeness of real-time EHR data from the Yale New Haven Health clinical data warehouse, transformed into the OMOP common data model. Twenty-nine weekly snapshots of the EHR collected from July to November 2024 and twenty-two daily snapshots collected from April to May 2025 were analyzed. Benchmarks focused on (1) clinical actions such as patient additions, deletions, and merges; (2) changes in demographic variables (date of birth, gender, race, ethnicity); and (3) stability of discharge information (time and status). A synthetic dataset derived from MIMIC-III was used to validate the benchmarking code prior to large-scale analyses.

Results

Benchmarking revealed frequent updates due to clinical actions and demographic corrections across consecutive snapshots. Demographic changes were most frequently related to race and ethnicity, highlighting potential workflow and data entry inconsistencies. Discharge time and status values demonstrated instability for several days post-encounter, typically reaching a stable state within 4–7 days. These findings indicate that while near real-time EHR data provide valuable insights, the timing of data stabilization is critical for accurate secondary use.

Conclusions

This study demonstrates the feasibility of automated benchmarking to assess the integrity of real-time EHR data and identify when such data become analysis ready. Our findings highlight key challenges for secondary use of dynamic clinical data and provide an automated framework that can be applied across health systems to support high-quality research, surveillance, and clinical trial readiness.

Introduction

Near real-time healthcare data has the potential for broad applications beyond direct interactions between patients and clinicians. The secondary use of electronic health record (EHR) and other real-world data (RWD), such as administrative claims data, disease registries, and personal health data collected through in-home medical devices or mobile apps, has rapidly increased [1–10]. Adopting near real-time clinical data analytics can be beneficial from clinical, operational, and research perspectives: it can reduce costs and duplicate procedures, enable early detection of deteriorating or high-risk conditions, decrease patient waiting time, and support more personalized treatment that improves outcomes.

However, administrative healthcare data, such as claims data and mortality data, typically experiences lags from at least 90 days to a year or more before becoming usable for analysis in clinical research [11]. Moreover, these data may only represent a “snapshot” of patients rather than a longitudinal assessment regarding cause and effect [6]. Information extracted from the EHR has the potential to provide near real-time access to a more complete dataset than can be provided from other real-world data sources [12,13].

Still, there are notable challenges in the use of EHR data, including ensuring data quality, bias detection, data access, and information delivery [10,14–17], as well as delayed, incomplete, and erroneous data capture caused by omissions during documentation at the time of service delivery [13,18–24]. The design focus of the EHR has been transactional because of its historic focus on billing [25] and its primary use in daily clinical care workflows; analytical use of real-time EHR data in clinical research is considered only a secondary use case. Previous work has proposed that consistent and standardized methods for describing, assessing, and reporting data quality (DQ) findings could help secondary data users and consumers better understand the potential impact of DQ on reusing data and interpreting findings. Kahn et al. [26,27] introduced a DQ assessment framework for EHR data with three categories: Conformance, which focuses strictly on the agreement of values with technical specifications; Completeness, which focuses on the absence of data for a variable; and Plausibility, which focuses on the reasonableness or correctness of data. However, most studies have conducted DQ assessment on retrospective EHR data sets [28–30], treating EHR data as static entities requiring retrospective quality control rather than dynamic systems requiring temporal validation. To ensure high-quality analysis and to better characterize and understand the implications of real-time EHR data and system use, four critical questions must be addressed: (1) What kind of patient information was entered and updated? (2) How often was information updated? (3) When and how would the updated information flow into a computational platform for analysis? (4) How can we identify when near real-time EHR data has reached a “stable” status for analysis?
As near real-time EHR data constantly changes during clinical workflows and is derived from data aggregation, EHR data reflects what was recorded into the systems but may not accurately reflect a patient’s status.

In this study, we assessed the completeness of real-time EHR data, i.e., whether real-time EHR data has reached a stabilized stage and is ready to use for further analytics, by comparing multiple snapshots of the real-time EHR data throughout a defined timeframe. We aimed to identify the changes and consistency of EHR patient data over time. We characterized EHR data for three use cases: (1) duplicate patient registration; (2) incorrectly documented patient demographic information; and (3) incorrectly documented discharge information for discharged encounters. Our findings highlight the feasibility of applying an automated benchmarking pipeline to determine when real-time clinical data from the EHR are in an analysis-ready state for several use cases.

Methods

Overview

We conducted a retrospective study to assess the integrity of EHR data in a real-time operational environment by benchmarking 22 consecutive daily extracts and 29 weekly extracts from the Yale New Haven Health (YNHH) clinical data warehouse (Epic Caboodle). These daily snapshots were continuously transformed into the Observational Medical Outcomes Partnership (OMOP) common data model (CDM) [31] using the YNHH computational health platform (CHP), which maintains a daily-updated data pipeline with current clinical data [32]. As a DQ study based on existing and deidentified data, this work was not classified as human subjects research and did not require Institutional Review Board approval.

Data sources

There were two types of datasets used in the study. One was a synthetic testing dataset designed to emulate the behaviors of EHR data and to validate the benchmarking code. The second set was the larger benchmarking data set extracted from the YNHH healthcare system.

We created our testing data set from the Medical Information Mart for Intensive Care III database version 1.4 (MIMIC III v1.4) for the study. MIMIC-III is a publicly available, single-center critical care database containing medical care information on 46,520 patients who were admitted between 2001–2012 to various ICUs of Beth Israel Deaconess Medical Center in Boston, Massachusetts [33]. All the MIMIC tables were transformed into the OMOP CDM through the Extract-Transform-Load (ETL) process [34].

The benchmarking data set contained daily and weekly extractions of the YNHH clinical data warehouse transformed into the OMOP CDM.

Data analysis and statistical approaches

Source datasets were stored as parquet format files on Hadoop Distributed File System (HDFS) of the CHP Spark cluster. Data extraction and data analysis were done with custom PySpark scripts using Apache Spark (v2.3.2) [35]. Benchmarking results were stored as CSV files on HDFS of CHP. All study-specific scripts were reviewed by an independent data scientist.

We analyzed two distinct sets of EHR snapshots that differed in temporal coverage and sampling frequency. First, we used 29 weekly snapshots collected between July 30 and November 13, 2024, which represented the most complete and temporally continuous series available; these snapshots were used for the analyses presented in Figs 3 and 4. To assess whether the trends observed in 2024 persisted into the following year, we additionally analyzed 22 consecutive daily snapshots obtained between April 12 and May 3, 2025. For the daily series, the first snapshot (April 12, 2025) served as the baseline. We verified that baseline selection did not influence results by re-running the benchmarking code using alternative baseline dates and observing comparable outputs. Analyses using the daily data are reported in the Supporting information and were used for the inpatient/outpatient BM-3 analysis, whereas all other visualized analyses rely on the weekly 2024 snapshots. Summary statistics were computed, with the median and interquartile range (IQR) of patient counts reported.
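As a minimal sketch of the summary statistics described above, the median and IQR of per-snapshot counts could be computed as follows. The counts are made-up illustrative values rather than study data, the function name is hypothetical, and the quartile convention (the default "exclusive" method of `statistics.quantiles`) is an assumption, since the text does not state which convention was used.

```python
from statistics import median, quantiles

def median_iqr(counts):
    """Return (median, IQR) for a list of per-snapshot patient counts."""
    # quantiles(..., n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, _, q3 = quantiles(counts, n=4)
    return median(counts), q3 - q1

# Hypothetical daily counts of newly added patients (illustrative only).
new_patients_per_day = [700, 750, 800, 850, 900, 950, 1000]
med, iqr = median_iqr(new_patients_per_day)
print(f"median={med}, IQR={iqr}")
```

Different quartile conventions give slightly different IQR values on small samples, so the convention should be reported alongside the statistic.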

Fig 3. Categorized EHR transactions between consecutive weekly snapshots. Snapshots taken from July 30 to November 13, 2024.

Fig 4. Categorized demographics changes between consecutive weekly snapshots. Snapshots taken from July 30 to November 13, 2024.

Three benchmarking assessments were performed: (1) a pre-specified analysis of database-level changes in patient records (referred to as “clinical actions”) between consecutive snapshots including patient additions, deletions, merges, and demographic changes; (2) a post-hoc analysis of specific demographic information updates among patients with changes, conducted to investigate the drivers of demographic instability identified in BM-1; and (3) a pre-specified analysis of stabilization timing for discharge time and status in baseline encounters. The benchmarking framework is illustrated in Fig 1.

Fig 1. The assessment categories for each benchmarking target.

For Benchmark 1 (BM-1), we assessed five categories of clinical actions that occurred between two snapshots (either consecutive snapshots or each snapshot compared to the baseline snapshot). These clinical actions encompass both prospective changes, such as newly added patients entering the healthcare system, and retrospective corrections, including patient ID updates, merged duplicate records, and deleted records. Benchmarking these changes helps us better understand the evolving source patient population and the data quality improvement processes in our dataset. We used the combination of gender, date of birth (DOB), race, and ethnicity to identify the same patient in two different snapshots. This was based on two considerations: (1) patient ID was not reliable; and (2) a single demographic field might be updated, but the chance of the whole combination being updated was comparatively low. Following the observation of frequent demographic updates in BM-1, we performed a post-hoc analysis, Benchmark 2 (BM-2), to further assess changes in demographic information, i.e., DOB, gender, race, and ethnicity. While this information is usually considered stable for a patient, frequent changes can indicate mis-entered information or systematic data corrections. Benchmark 3 (BM-3) focused on data entries for discharge time and status, as this information should be recorded consistently and concurrently in the clinical workflow but often is not.
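The BM-1 comparison can be illustrated with a small, simplified sketch in plain Python. It is not the study's PySpark implementation: the function name, the category labels, and in particular the rules used here to separate ID changes, merges, and deletions are assumptions made for illustration. Each snapshot is modeled as a mapping from patient ID to a (gender, DOB, race, ethnicity) tuple, and the tuple is used to re-identify patients whose ID is missing from the later snapshot.

```python
from collections import Counter

def classify_actions(old, new):
    """Classify changes between two snapshots.

    old, new: dict mapping person_id -> (gender, dob, race, ethnicity).
    Returns a Counter over five illustrative category labels.
    """
    counts = Counter()
    shared = old.keys() & new.keys()
    old_only = old.keys() - new.keys()
    new_only = new.keys() - old.keys()

    # Same ID in both snapshots but a different tuple: demographics update.
    for pid in shared:
        if old[pid] != new[pid]:
            counts["demographics_updated"] += 1

    # Demographic tuples of records whose ID disappeared.
    vanished = Counter(old[pid] for pid in old_only)

    # A new ID whose tuple matches a vanished record is treated as an
    # ID change (assumption); otherwise it is a newly added patient.
    for pid in new_only:
        demo = new[pid]
        if vanished[demo] > 0:
            vanished[demo] -= 1
            counts["id_changed"] += 1
        else:
            counts["added"] += 1

    # Remaining vanished records: "merged" if a surviving ID carries the
    # same tuple (assumption), otherwise "deleted".
    surviving = {new[pid] for pid in shared}
    for demo, n in vanished.items():
        if n:
            counts["merged" if demo in surviving else "deleted"] += n
    return counts

# Hypothetical snapshots (illustrative values only).
old = {
    1: ("F", "1980-01-01", "Unknown", "Not Hispanic"),
    2: ("M", "1975-05-05", "Asian", "Not Hispanic"),
    3: ("M", "1975-05-05", "Asian", "Not Hispanic"),  # duplicate of patient 2
    4: ("F", "1990-02-02", "Black", "Hispanic"),
    5: ("M", "2000-03-03", "White", "Hispanic"),
}
new = {
    1: ("F", "1980-01-01", "White", "Not Hispanic"),  # race corrected
    2: ("M", "1975-05-05", "Asian", "Not Hispanic"),  # patient 3 merged here
    40: ("F", "1990-02-02", "Black", "Hispanic"),     # patient 4 under a new ID
    7: ("F", "2010-07-07", "White", "Not Hispanic"),  # new patient
}
result = classify_actions(old, new)
print(dict(result))
```

Because distinct patients can share the same demographic combination in a large population, a matching rule of this kind can mislabel records; the sketch only conveys the shape of the comparison.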

Process of preparing the testing data set

To validate the automated EHR data extraction pipeline, we simulated various clinical actions (a playbook) based on the MIMIC data to generate the testing dataset. For BM-1, the clinical actions included adding new patients with different IDs (AD), updating existing patients’ demographic information (IR), updating existing patients’ IDs (IC), deleting existing patients (DL), and merging existing patients (DM). The extraction and analytical scripts caught all mocked events (sensitivity 100%). The pipeline is illustrated in Fig 2, with similar validation simulations completed for BM-2 and BM-3. Source code to reproduce each analysis is included in our repository (https://github.com/ComputationalHealth/patient-merge) and is provided under a permissive open-source license (MIT License).
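The validation step above amounts to checking that every mocked event in the playbook is reported by the benchmark. A minimal sketch of that check follows; the function name, the (action code, patient ID) representation, and the example IDs are hypothetical, and the detector output is hard-coded here rather than produced by the actual benchmarking scripts.

```python
def sensitivity(injected, detected):
    """Fraction of injected (mocked) events that the benchmark reported."""
    injected = set(injected)
    if not injected:
        raise ValueError("playbook is empty")
    return len(injected & set(detected)) / len(injected)

# Hypothetical playbook: (action code, person_id) pairs applied to a baseline.
playbook = [("AD", 9001), ("IR", 17), ("IC", 42), ("DL", 58), ("DM", 73)]

# Hypothetical detector output; a real run would come from the benchmark.
detected = [("AD", 9001), ("IR", 17), ("IC", 42), ("DL", 58), ("DM", 73)]

score = sensitivity(playbook, detected)
print(f"sensitivity = {score:.0%}")
```

Sensitivity alone does not show whether the benchmark reports events that were never injected, so a fuller validation would also compare the detected set against the playbook for false positives.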

Fig 2. Pipeline for preparing testing dataset and validating analytics code.

Results

From July 30 to November 13, 2024, weekly snapshots of the person and visit_occurrence tables of the OMOP CDM were saved and used for analysis. We also analyzed 22 daily snapshots obtained between April 12 and May 3, 2025, during which the YNHH system EHR included a median of 2,403,201 unique patients and 135,271,445 encounters. Detailed statistics are listed in S1 Table.

BM-1: Categorized clinical actions detected between consecutive snapshots

The number of transactions per category was calculated for each snapshot in comparison to the previous snapshot. Between weekly snapshots from July 30 to November 13, 2024, most transactions involved new patients, followed by ID changes and demographics changes (Fig 3). This pattern was consistent between daily snapshots from April 12 to May 3, 2025, during which the median number of new patients was 817 (IQR: 332), updated patient IDs was 642 (IQR: 990), updated demographics information was 488 (IQR: 133), deleted patients was 118 (IQR: 237), and merged patient IDs was 70 (IQR: 36). The daily breakdown of the number of patients in each of the clinical action categories is shown in S2 Table.

BM-2: Categorized demographics changes detected between consecutive snapshots

The post-hoc analysis of specific demographic changes (BM-2) identified frequent updates to gender, DOB, race, and ethnicity. During the 2024 study period, these modifications were non-uniformly distributed, with peak activity occurring in August and October (Fig 4). The number of patients with changes in each demographics change category was calculated between each of the snapshots, i.e., comparing that day’s or week’s snapshot to the previous snapshot. From most to least, the median number of patients with detected EHR changes made between snapshots to race was 411 (IQR: 105), ethnicity was 119 (IQR: 48), date of birth was 15 (IQR: 19), and gender was 6 (IQR: 6). The detailed number of patients falling in each of the categories for each daily snapshot is provided in S3 Table.
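A per-field comparison of this kind can be sketched as follows. This is an illustration in plain Python rather than the study's PySpark code; the function name and the dictionary layout are hypothetical, and the sketch assumes that patients are paired across snapshots by a shared ID.

```python
from collections import Counter

FIELDS = ("gender", "dob", "race", "ethnicity")

def count_field_changes(old, new):
    """Count patients whose value changed, per demographic field.

    old, new: dict mapping person_id -> dict of field name -> value.
    Only IDs present in both snapshots are compared.
    """
    changes = Counter()
    for pid in old.keys() & new.keys():
        for field in FIELDS:
            if old[pid][field] != new[pid][field]:
                changes[field] += 1
    return changes

# Hypothetical snapshots (illustrative values only).
old = {
    1: {"gender": "F", "dob": "1980-01-01", "race": "Unknown", "ethnicity": "Unknown"},
    2: {"gender": "M", "dob": "1975-05-05", "race": "Asian", "ethnicity": "Not Hispanic"},
    3: {"gender": "F", "dob": "1990-02-02", "race": "Black", "ethnicity": "Hispanic"},
}
new = {
    1: {"gender": "F", "dob": "1980-01-01", "race": "White", "ethnicity": "Not Hispanic"},
    2: {"gender": "M", "dob": "1975-05-15", "race": "Asian", "ethnicity": "Not Hispanic"},
    3: {"gender": "F", "dob": "1990-02-02", "race": "Black", "ethnicity": "Hispanic"},
    4: {"gender": "M", "dob": "2001-09-09", "race": "White", "ethnicity": "Hispanic"},
}
changes = count_field_changes(old, new)
print(dict(changes))
```

A patient with several changed fields is counted once in each affected category, so the per-field counts can sum to more than the number of patients with any change.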

BM-3: Information updates on discharge encounters

When patients are discharged, both discharge time and discharge status should be entered into the EHR to reflect the real-time status of patients. However, occasionally either discharge time or discharge status is updated several days after the discharge event. To assess the delay before discharge information reaches a stable state, we evaluated two types of patient cohorts based on their baseline snapshot data: (1) patients with a discharge time in their discharge encounter; and (2) patients without a discharge time but with a discharge status in their discharge encounter. We compared consecutive daily snapshots to ensure sufficient granularity, i.e., 22 snapshots from April 12, 2025, to May 3, 2025. Assigning the first snapshot on April 12, 2025, as the baseline, we compared all subsequent snapshots to the baseline. On April 12, 2025, there were 994,405 patients with inpatient encounters and 1,648,863 patients with outpatient encounters in the EHR. The number of patients whose discharge status, discharge time, or both were updated was calculated for each subsequent snapshot. For all patients with inpatient encounters in the baseline snapshot, 517 patients had only their discharge status changed, 287 patients had only their discharge time changed, and 10 patients had both discharge status and discharge time changed by the end of the observation period (May 3, 2025). For patients with outpatient encounters in the baseline snapshot, 6,484 patients had only their discharge status changed, 996 patients had only their discharge time changed, and 144 patients had both discharge status and discharge time changed. The discharge status and discharge time information stabilized in approximately 4–7 days, though we observed a spike in discharge status changes among patients with outpatient encounters on May 2, 2025 (Fig 5).
The daily counts of patients for each of the discharge change categories are provided for both inpatient (S4 Table) and outpatient cohorts (S5 Table).
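The baseline comparison used for BM-3 can be sketched in a few lines of plain Python. The function names, the encounter-keyed dictionaries, and the definition of the stabilization day (the first day after which the cumulative count of changed records stops growing) are assumptions made for illustration and are not taken from the study code; the numbers are invented.

```python
def classify_discharge_changes(baseline, snapshot):
    """Compare discharge fields in a later snapshot against the baseline.

    baseline, snapshot: dict mapping encounter_id ->
    (discharge_time, discharge_status). Encounters missing from the
    later snapshot are skipped.
    """
    counts = {"status_only": 0, "time_only": 0, "both": 0}
    for enc_id, (base_time, base_status) in baseline.items():
        if enc_id not in snapshot:
            continue
        time, status = snapshot[enc_id]
        time_changed = time != base_time
        status_changed = status != base_status
        if time_changed and status_changed:
            counts["both"] += 1
        elif status_changed:
            counts["status_only"] += 1
        elif time_changed:
            counts["time_only"] += 1
    return counts

def stabilization_day(cumulative_changes):
    """Index of the first day from which the cumulative count stays constant."""
    if not cumulative_changes:
        raise ValueError("no snapshots to compare")
    day = len(cumulative_changes) - 1
    # Walk backwards while the previous day already equals the final value.
    while day > 0 and cumulative_changes[day - 1] == cumulative_changes[-1]:
        day -= 1
    return day

# Hypothetical baseline and later snapshot (illustrative values only).
baseline = {
    "e1": ("2025-04-11T10:00", "Home"),
    "e2": ("2025-04-11T12:30", "Home"),
    "e3": ("2025-04-11T15:45", "Skilled nursing facility"),
    "e4": ("2025-04-11T18:00", "Home"),
}
later = {
    "e1": ("2025-04-11T10:00", "Home"),
    "e2": ("2025-04-11T12:30", "Home health care"),
    "e3": ("2025-04-11T16:10", "Skilled nursing facility"),
    "e4": ("2025-04-11T19:20", "Expired"),
}
counts = classify_discharge_changes(baseline, later)
stable_from = stabilization_day([0, 5, 9, 12, 13, 13, 13])
print(counts, stable_from)
```

With this definition a late batch update, such as the spike on May 2, 2025, would push the computed stabilization day to the end of the series, so in practice a tolerance threshold or a fixed observation window may be preferable.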

Fig 5. The number of patients with discharge information changes, including discharge time, discharge status, or both, for patients who during the baseline snapshot on April 12, 2025, had a discharge time recorded in their discharge encounter. (A) Patients with inpatient encounters. (B) Patients with outpatient encounters. Discharge information was compared for each day between April 12 and May 3, 2025, with the discharge information in the baseline snapshot.

Discussion

This study systematically evaluated real-time EHR data stability and identified critical patterns that affect research readiness. Our results demonstrate three gaps that static quality assessment cannot address. First, we identified continuous clinical actions, including frequent patient additions, ID changes, deletions, merges, and demographic modifications, that occur between consecutive daily and weekly snapshots, revealing that EHR data undergo constant real-time updates that static tools cannot capture. Second, demographic changes occur predominantly in race and ethnicity fields, indicating workflow inconsistencies that compromise data reliability. Race and ethnicity data at YNHH are self-assigned by patients, yet frequent updates could reflect incomplete initial collection during emergency visits that is completed in subsequent encounters, retrospective standardization to OMOP vocabulary by administrative staff, delayed patient consent after initially declining to provide this information, and reconciliation processes when patients provide different responses across YNHH facilities. Third, our discharge encounter analysis revealed that both discharge status and time required 4–7 days to stabilize post-encounter. Across the study period, more than 700 inpatient cases and 7,000 outpatient cases exhibited changes to discharge information. We also observed a sharp increase in the number of outpatient encounters with updated discharge status on May 2, 2025 (from roughly 2,200 to 6,300 cases), which may indicate a scheduled, monthly update to discharge records. Another explanation is that certain outpatient encounters may undergo delayed finalization (e.g., completion of provider documentation, coding review, or end-of-month reconciliation workflows), leading to a large batch of discharge updates being released simultaneously.
These results demonstrate that while real-time EHR data offer valuable research opportunities, data quality depends on understanding stabilization timeframes and implementing appropriate validation methods. Our benchmarking framework provides healthcare systems and researchers with evidence-based guidance for determining when dynamic clinical data become analysis-ready, addressing a critical gap in real-time clinical research.

Implementing automated real-time EHR benchmarking systems faces significant operational challenges, not only in addressing data heterogeneity and inconsistency but also in integrating teams of health IT experts, clinical informaticists, healthcare providers, and administrative staff to ensure enterprise-wide data quality and proper interpretation of observational results [21,36]. However, our benchmarking pipeline demonstrates that these challenges can be systematically addressed through validated automated approaches: we successfully used a synthetic MIMIC-III dataset to establish ground truth validation for our benchmark procedures, identified specific stabilization patterns (4–7-day periods for discharge information), and quantified demographic change frequencies that reveal workflow inconsistencies requiring targeted interventions. Therefore, while implementation complexity remains a barrier, our results provide the evidence-based framework necessary for healthcare systems to deploy automated benchmarking that delivers actionable insights to researchers and leadership, enabling informed decisions about data readiness timing and quality assurance in real-time clinical data applications.

This study has several limitations. First, as benchmarking data were collected from a single healthcare system, external validation across multiple healthcare systems would strengthen these findings. However, the temporal delays in discharge information stabilization and demographic field inconsistencies arise from how real-time Epic Caboodle data are mapped to OMOP’s standardized schemas—a process common to any institution implementing OMOP CDM. Therefore, the data stability characteristics we observed are likely tied to the OMOP CDM structure and transformation process rather than YNHH-specific system configurations. Furthermore, the pipeline in this study was designed to efficiently perform the same analysis at different healthcare systems – particularly systems with data already mapped to OMOP CDM, which has been shown to improve data quality and anomaly detection [28]. Second, the benchmarking was focused on only three use cases. Although these use cases were validated by the clinical research team as typical clinical scenarios, future work will expand this evaluation to encompass a broader range of clinical scenarios and research questions. Finally, we did not assess how to automatically integrate these insights—such as stabilization timeframes for data readiness—into researcher workflows and data aggregation processes. This integration represents an important area for future development to maximize the practical utility of our approach.

Despite the potential challenges related to the use of real-time clinical data described here, it remains a new, valuable, and rapidly evolving field. The presented approach can be used to investigate the time it takes for other EHR information, e.g., laboratory results or data on treatment responses, to stabilize and reflect the actual patient status. This strategy has been adopted in one of our recent studies of clinical characteristics and outcomes of SARS-CoV-2 infection [2]. Proper assessment and validation of real-time clinical datasets will enhance the reliability and impact of research across disease surveillance, risk detection, outcome studies, and clinical trials. As healthcare systems increasingly adopt real-time data analytics, establishing standardized benchmarking approaches will be critical for ensuring data quality and supporting evidence-based clinical decision-making at scale.

Supporting information

S1 Table. Statistics on number of patients and associated encounters for daily snapshots between April 12 and May 3, 2025.

(DOCX)

S2 Table. Number of patients with EHR clinical actions taken between consecutive daily snapshots from April 12, 2025, to May 3, 2025.

(DOCX)

S3 Table. Number of patients with EHR demographics change performed between consecutive daily snapshots from April 12, 2025, to May 3, 2025.

(DOCX)

S4 Table. Number of patients with inpatient encounters whose discharge information was changed from that in the baseline snapshot of April 12, 2025.

(DOCX)

S5 Table. Number of patients with outpatient encounters whose discharge information was changed from that in the baseline snapshot of April 12, 2025.

(DOCX)


Data Availability

Data and source code to reproduce each analysis are available in our repository (https://github.com/ComputationalHealth/patient-merge) under a permissive open-source license (MIT License).

Funding Statement

The author(s) received no specific funding for this work.

References

1. Dagliati A, Malovini A, Tibollo V, Bellazzi R. Health informatics and EHR to support clinical research in the COVID-19 pandemic: An overview. Brief Bioinform. 2021;22(2):812–22. doi: 10.1093/bib/bbaa418
2. Schulz WL, Durant TJS, Torre CJ Jr, Hsiao AL, Krumholz HM. Agile health care analytics: enabling real-time disease surveillance with a computational health platform. J Med Internet Res. 2020;22(5):e18707. doi: 10.2196/18707
3. Moore JH, Barnett I, Boland MR, Chen Y, Demiris G, Gonzalez-Hernandez G, et al. Ideas for how informaticians can get involved with COVID-19 research. BioData Min. 2020;13:3. doi: 10.1186/s13040-020-00213-y
4. Haimovich A, Ravindra NG, Stoytchev S, Young HP, Wilson FP, van Dijk D, et al. Development and validation of the COVID-19 severity index (CSI): a prognostic tool for early respiratory decompensation. 2020 [cited 2025 Sep 7]. p. 2020.05.07.20094573. Available from: https://www.medrxiv.org/content/10.1101/2020.05.07.20094573v2
5. Zou KH, Li JZ, Imperato J, Potkar CN, Sethi N, Edwards J, et al. Harnessing real-world data for regulatory use and applying innovative applications. J Multidiscip Healthc. 2020;13:671–9. doi: 10.2147/JMDH.S262776
6. Katkade VB, Sanders KN, Zou KH. Real world data: An opportunity to supplement existing evidence for the use of long-established medicines in health care decision making. J Multidiscip Healthc. 2018;11:295–304. doi: 10.2147/JMDH.S160029
7. Seeger JD, Nunes A, Loughlin AM. Using RWE research to extend clinical trials in diabetes: An example with implications for the future. Diabetes Obes Metab. 2020;22 Suppl 3(Suppl 3):35–44. doi: 10.1111/dom.14021
8. Booth CM, Karim S, Mackillop WJ. Real-world data: towards achieving the achievable in cancer care. Nat Rev Clin Oncol. 2019;16(5):312–25. doi: 10.1038/s41571-019-0167-7
9. Sherman RE, Anderson SA, Dal Pan GJ, Gray GW, Gross T, Hunter NL, et al. Real-world evidence - what is it and what can it tell us? N Engl J Med. 2016;375(23):2293–7. doi: 10.1056/NEJMsb1609216
10. Rudrapatna VA, Butte AJ. Opportunities and challenges in using real-world data for health care. J Clin Invest. 2020;130:565–74.
11. Majumder MS, Rose S. Health care claims data may be useful for COVID-19 research despite significant limitations. Health Affairs. 2020. doi: 10.1377/hblog20201001.977332
12. Jollis JG, Ancukiewicz M, DeLong ER, Pryor DB, Muhlbaier LH, Mark DB. Discordance of databases designed for claims payment versus clinical information systems. Implications for outcomes research. Ann Intern Med. 1993;119(8):844–50. doi: 10.7326/0003-4819-119-8-199310150-00011
13. Hartzema AG, Racoosin JA, MaCurdy TE, Gibbs JM, Kelman JA. Utilizing Medicare claims data for real‐time drug safety evaluations: Is it feasible? Pharmacoepidemiol Drug Saf. 2011;20(7):684–8. doi: 10.1002/pds.2143
14. Parsons A, McCullough C, Wang J, Shih S. Validity of electronic health record-derived quality measurement for performance monitoring. J Am Med Inform Assoc. 2012;19(4):604–9. doi: 10.1136/amiajnl-2011-000557
15. Rusanov A, Weiskopf NG, Wang S, Weng C. Hidden in plain sight: Bias towards sick patients when sampling patients with sufficient electronic health record data for research. BMC Med Inform Decis Mak. 2014;14:51. doi: 10.1186/1472-6947-14-51
16. Hersh WR, Weiner MG, Embi PJ, Logan JR, Payne PRO, Bernstam EV, et al. Caveats for the use of operational electronic health record data in comparative effectiveness research. Medical Care. 2013;51:S30.
17. Johnson SG, Speedie S, Simon G, Kumar V, Westra BL. A data quality ontology for the secondary use of EHR data. AMIA Annu Symp Proc. 2015:1937–46.
18. Greene SK, Kulldorff M, Lewis EM, Li R, Yin R, Weintraub ES, et al. Near real-time surveillance for influenza vaccine safety: Proof-of-concept in the Vaccine Safety Datalink Project. Am J Epidemiol. 2010;171(2):177–88. doi: 10.1093/aje/kwp345
19. Brown JS, Moore KM, Braun MM, Ziyadeh N, Chan KA, Lee GM, et al. Active influenza vaccine safety surveillance: potential within a healthcare claims environment. Med Care. 2009;47(12):1251–7. doi: 10.1097/MLR.0b013e3181b58b5c
20. Khela H, Khalil J, Daxon N, Neilson Z, Shahrokhi T, Chung P, et al. Real world challenges in maintaining data integrity in electronic health records in a cancer program. Tech Innov Patient Support Radiat Oncol. 2024;29:100233. doi: 10.1016/j.tipsro.2023.100233
21. Sarwar T, Seifollahi S, Chan J, Zhang X, Aksakalli V, Hudson I, et al. The secondary use of electronic health records for data mining: Data characteristics and challenges. ACM Comput Surv. 2022;55(2):1–40. doi: 10.1145/3490234
22. Schorer AE, Moldwin R, Koskimaki J, Bernstam EV, Venepalli NK, Miller RS, et al. Chasm between cancer quality measures and electronic health record data quality. JCO Clin Cancer Inform. 2022;6:e2100128. doi: 10.1200/CCI.21.00128
23. Getzen E, Ungar L, Mowery D, Jiang X, Long Q. Mining for equitable health: Assessing the impact of missing data in electronic health records. J Biomed Inform. 2023;139:104269. doi: 10.1016/j.jbi.2022.104269
24. Brundin-Mather R, Soo A, Zuege DJ, Niven DJ, Fiest K, Doig CJ, et al. Secondary EMR data for quality improvement and research: A comparison of manual and electronic data collection from an integrated critical care electronic medical record system. J Crit Care. 2018;47:295–301. doi: 10.1016/j.jcrc.2018.07.021
25. Holmes JH, Beinlich J, Boland MR, Bowles KH, Chen Y, Cook TS, et al. Why Is the Electronic Health Record So Challenging for Research and Clinical Care? Methods Inf Med. 2021;60(1–02):32–48. doi: 10.1055/s-0041-1731784
26. Kahn MG, Ranade D. The impact of electronic medical records data sources on an adverse drug event quality measure. J Am Med Inform Assoc. 2010;17(2):185–91. doi: 10.1136/jamia.2009.002451
27. Kahn MG, Callahan TJ, Barnard J, Bauck AE, Brown J, Davidson BN, et al. A harmonized data quality assessment terminology and framework for the secondary use of electronic health record data. EGEMS (Wash DC). 2016;4(1):1244. doi: 10.13063/2327-9214.1244
28. Ramakrishnaiah Y, Macesic N, Webb GI, Peleg AY, Tyagi S. EHR-QC: A streamlined pipeline for automated electronic health records standardisation and preprocessing to predict clinical outcomes. J Biomed Inform. 2023;147:104509. doi: 10.1016/j.jbi.2023.104509
29. Ozonze O, Scott PJ, Hopgood AA. Automating electronic health record data quality assessment. J Med Syst. 2023;47(1):23. doi: 10.1007/s10916-022-01892-2
30. Heumos L, Ehmele P, Treis T, Upmeier zu Belzen J, Roellin E, May L, et al. An open-source framework for end-to-end analysis of electronic health record data. Nat Med. 2024;30:3369–80.
31. Voss EA, Makadia R, Matcho A, Ma Q, Knoll C, Schuemie M. Feasibility and utility of applications of the common data model to multiple, disparate observational health databases. J Am Med Inform Assoc. 2015;22:553–64.
32. McPadden J, Durant TJ, Bunch DR, Coppi A, Price N, Rodgerson K. Health care and precision medicine research: Analysis of a scalable data science platform. J Med Internet Res. 2019;21:e13043.
33. Johnson AEW, Pollard TJ, Shen L, Lehman LH, Feng M, Ghassemi M. MIMIC-III, a freely accessible critical care database. Sci Data. 2016;3:160035.
34. Hripcsak G, Duke JD, Shah NH, Reich CG, Huser V, Schuemie MJ, et al. Observational health data sciences and informatics (OHDSI): Opportunities for observational researchers. Studies Health Technol Inf. IOS Press. 2015. doi: 10.3233/978-1-61499-564-7-574
35. Apache Spark™ - Unified Engine for Large-Scale Data Analytics. https://spark.apache.org/. Accessed 2023 October 1.
36. Tsai CH, Eghdam A, Davoody N, Wright G, Flowerday S, Koch S. Effects of electronic health record implementation and barriers to adoption and use: A scoping review and qualitative analysis of the content. Life (Basel). 2020;10(12):327. doi: 10.3390/life10120327

Decision Letter 0

Sreeram V Ramagopalan

24 Oct 2025

Dear Dr. Gong,

Please submit your revised manuscript by Dec 08 2025 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org . When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols . Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols .

We look forward to receiving your revised manuscript.

Kind regards,

Sreeram V. Ramagopalan

Academic Editor

PLOS ONE

Journal Requirements:

1. When submitting your revision, we need you to address these additional requirements.

Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Thank you for stating the following in the Competing Interests section:

Harlan Krumholz works under contract with the Centers for Medicare & Medicaid Services to support quality measurement programs; was a recipient of a research grant, through Yale, from Medtronic and the U.S. Food and Drug Administration to develop methods for post-market surveillance of medical devices; was a recipient of a research grant with Medtronic and is the recipient of a research grant from Johnson & Johnson, through Yale University, to support clinical trial data sharing; was a recipient of a research agreement, through Yale University, from the Shenzhen Center for Health Information for work to advance intelligent disease prevention and health promotion; collaborates with the National Center for Cardiovascular Diseases in Beijing; receives payment from the Arnold & Porter Law Firm for work related to the Sanofi clopidogrel litigation, from the Ben C. Martin Law Firm for work related to the Cook Celect IVC filter litigation, and from the Siegfried and Jensen Law Firm for work related to Vioxx litigation; chairs a Cardiac Scientific Advisory Board for UnitedHealth; was a participant/participant representative of the IBM Watson Health Life Sciences Board; is a member of the Advisory Board for Element Science, the Advisory Board for Facebook, and the Physician Advisory Board for Aetna; and is the co-founder of HugoHealth, a personal health information platform, and co-founder of Refactor Health, an AI-augmented data management platform for healthcare.

Wade Schulz is an investigator for a research agreement, through Yale University, from the Shenzhen Center for Health Information for work to advance intelligent disease prevention and health promotion; collaborates with the National Center for Cardiovascular Diseases in Beijing; is a technical consultant to HugoHealth, a personal health information platform, and cofounder of Refactor Health, an AI-augmented data management platform for healthcare; is a consultant for Interpace Diagnostics Group, a molecular diagnostics company.

Guannan Gong is the founder of CtrlTrial Inc., an AI-augmented patient screening platform for clinical trials.

We note that one or more of the authors are employed by a commercial company.

a. Please provide an amended Funding Statement declaring this commercial affiliation, as well as a statement regarding the Role of Funders in your study. If the funding organization did not play a role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript and only provided financial support in the form of authors' salaries and/or research materials, please review your statements relating to the author contributions, and ensure you have specifically and accurately indicated the role(s) that these authors had in your study. You can update author roles in the Author Contributions section of the online submission form.

Please also include the following statement within your amended Funding Statement.

“The funder provided support in the form of salaries for authors, but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The specific roles of these authors are articulated in the ‘author contributions’ section.”

If your commercial affiliation did play a role in your study, please state and explain this role within your updated Funding Statement.

b. Please also provide an updated Competing Interests Statement declaring this commercial affiliation along with any other relevant declarations relating to employment, consultancy, patents, products in development, or marketed products, etc.

Within your Competing Interests Statement, please confirm that this commercial affiliation does not alter your adherence to all PLOS ONE policies on sharing data and materials by including the following statement: “This does not alter our adherence to PLOS ONE policies on sharing data and materials.” (as detailed online in our guide for authors http://journals.plos.org/plosone/s/competing-interests). If this adherence statement is not accurate and there are restrictions on sharing of data and/or materials, please state these. Please note that we cannot proceed with consideration of your article until this information has been declared.

Please include both an updated Funding Statement and Competing Interests Statement in your cover letter. We will change the online submission form on your behalf.

3. Thank you for uploading your study's underlying data set. Unfortunately, the repository you have noted in your Data Availability statement does not qualify as an acceptable data repository according to PLOS's standards.

At this time, please upload the minimal data set necessary to replicate your study's findings to a stable, public repository (such as figshare or Dryad) and provide us with the relevant URLs, DOIs, or accession numbers that may be used to access these data. For a list of recommended repositories and additional information on PLOS standards for data deposition, please see https://journals.plos.org/plosone/s/recommended-repositories.

4. Please include captions for your Supporting Information files at the end of your manuscript, and update any in-text citations to match accordingly. Please see our Supporting Information guidelines for more information: http://journals.plos.org/plosone/s/supporting-information.

5. If the reviewer comments include a recommendation to cite specific previously published works, please review and evaluate these publications to determine whether they are relevant and should be cited. There is no requirement to cite these works unless the editor has indicated otherwise.

Reviewers' comments:

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

Reviewer #1: No

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: I Don't Know

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

Reviewer #1: No

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

Reviewer #1: Yes

**********

Reviewer #1: Liu and colleagues report their analysis of the integrity of real-time electronic health record (EHR) data for use in clinical research, in which they assess the stability and completeness of the recorded information in the Yale New Haven Health (YNHH) clinical data warehouse and conclude that a time lag of 4–7 days is required for records to stabilise to ensure validity for research. The work is interesting and has merit for other researchers, who can use the framework to assess the integrity of other local EHR systems. However, I do feel the findings presented are likely to be very specific to the YNHH context, and thus the external validity of the results is uncertain.

Major comments

1. The methods section would benefit from further description of the YNHH dataset, e.g. which patients are included, the sector of healthcare, etc. How do records enter the YNHH dataset? Does each clinical encounter generate a new record? Who inputs the data?

2. The methods section would benefit from revision to make it clearer which samples of data are used in the analysis. Currently the methods refer only to data between April and May 2025; however, the axes of figures 3-5 show data spanning July to November 2024, and the results section also describes additional methods relating to data in November 2024. It is not fully clear to me why the dates would change for these different assessments, so this should be spelled out. It would also be helpful to know whether there was any rationale for the choice of baseline dates and why 26 snapshots were used.

3. The results section is very descriptive and contains few statistical results, which are instead included in an appendix. I would strongly suggest adding more summary statistics to support assertions being made.

4. Conclusions: “Our findings indicate that real-time EHR data require 4–7 days for discharge information stabilization and systematic monitoring of demographic field consistency to ensure research validity.” I would suggest revising this statement slightly to acknowledge that this is a finding within the YNHH dataset, as external validity is not proven.

Minor

5. The first benchmark assessment relates to ‘clinical actions such as additions, deletions, and merges’. I would suggest defining exactly what ‘clinical actions’ means, as it is not entirely clear to me. I am assuming it refers to specific clinical events or healthcare encounters that have happened, and that the additions, deletions or merges occur when an update has been made either to add a new event or to reflect more accurately one that has taken place?

6. Race and ethnicity data were shown to be frequently updated. Do you have any further insight into how race/ethnicity data are collected? Are they self-assigned or healthcare provider-assigned categories? The latter tend to be more prone to data quality issues, so specifying this would be informative.

7. Careful review is needed to check that all acronyms are spelled out at first use, e.g. BM-1; consider whether it even needs to be abbreviated.

8. Figures 3 and 4: I am curious to understand whether there are any explanations for the apparent spike in patient numbers on 09/27/24.

**********

Do you want your identity to be public for this peer review?

Reviewer #1: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

To ensure your figures meet our technical requirements, please review our figure guidelines: https://journals.plos.org/plosone/s/figures 

You may also use PLOS’s free figure tool, NAAS, to help you prepare publication quality figures: https://journals.plos.org/plosone/s/figures#loc-tools-for-figure-preparation. 

NAAS will assess whether your figures meet our technical requirements by comparing each figure against our figure specifications.

PLoS One. 2026 Jan 9;21(1):e0340287. doi: 10.1371/journal.pone.0340287.r002

Author response to Decision Letter 1


4 Dec 2025

Thank you for your review and comments. Please see our specific responses in the attached file.

Attachment

Submitted filename: EHRQuality_ResponsetoReviewers.docx

pone.0340287.s007.docx (41.9KB, docx)

Decision Letter 1

Sreeram V Ramagopalan

16 Dec 2025

Dear Dr. Gong,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

plosone@plos.org

  • A letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols . Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols .

We look forward to receiving your revised manuscript.

Kind regards,

Sreeram V. Ramagopalan

Academic Editor

PLOS One

Journal Requirements:

1. If the reviewer comments include a recommendation to cite specific previously published works, please review and evaluate these publications to determine whether they are relevant and should be cited. There is no requirement to cite these works unless the editor has indicated otherwise. 

2. Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

Reviewer #1: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

Reviewer #1: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

Reviewer #1: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

Reviewer #1: Yes

**********

Reviewer #1: Liu and colleagues have responded positively to all prior comments, and the manuscript and its reporting are much strengthened by the revisions. One minor remaining comment relates to description of methods lingering in the results section. For BM-2, it is suggested that this deeper analysis was conducted due to the findings of BM-1. It would be helpful to state this earlier in the methods section, and ideally to differentiate which analyses were pre-specified and which were post-hoc, which it sounds like BM-2 may have been.

**********

Do you want your identity to be public for this peer review?

Reviewer #1: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

To ensure your figures meet our technical requirements, please review our figure guidelines: https://journals.plos.org/plosone/s/figures 

You may also use PLOS’s free figure tool, NAAS, to help you prepare publication quality figures: https://journals.plos.org/plosone/s/figures#loc-tools-for-figure-preparation. 

NAAS will assess whether your figures meet our technical requirements by comparing each figure against our figure specifications.

PLoS One. 2026 Jan 9;21(1):e0340287. doi: 10.1371/journal.pone.0340287.r004

Author response to Decision Letter 2


17 Dec 2025

We thank the reviewer for their continued feedback and support. In response to the last comment, we have updated the Methods section to explicitly categorize BM-1 and BM-3 as pre-specified analyses, while identifying BM-2 as a post-hoc investigation prompted by the demographic instability observed in BM-1. We also relocated the methodological rationale for BM-2 from the Results to the Methods to ensure a clear separation of study design and findings. The Results section now opens directly with the quantitative findings regarding demographic updates in 2024. These revisions enhance the manuscript's clarity and adhere to the distinction between planned and exploratory analyses.

Attachment

Submitted filename: EHRQuality_ResponsetoReviewers_v2.docx

pone.0340287.s008.docx (14.4KB, docx)

Decision Letter 2

Sreeram V Ramagopalan

18 Dec 2025

Assessment of the Integrity of Real-Time Electronic Health Record Data used in Clinical Research

PONE-D-25-49671R2

Dear Dr. Gong,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice will be generated when your article is formally accepted. Please note, if your institution has a publishing partnership with PLOS and your article meets the relevant criteria, all or part of your publication costs will be covered. Please make sure your user information is up-to-date by logging into Editorial Manager and clicking the ‘Update My Information’ link at the top of the page. For questions related to billing, please contact billing support.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Sreeram V. Ramagopalan

Academic Editor

PLOS One

Additional Editor Comments (optional):

Reviewers' comments:

Acceptance letter

Sreeram V Ramagopalan

PONE-D-25-49671R2

PLOS One

Dear Dr. Gong,

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS One. Congratulations! Your manuscript is now being handed over to our production team.

At this stage, our production department will prepare your paper for publication. This includes ensuring the following:

* All references, tables, and figures are properly cited

* All relevant supporting information is included in the manuscript submission

* There are no issues that prevent the paper from being properly typeset

You will receive further instructions from the production team, including instructions on how to review your proof when it is ready. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few days to review your paper and let you know the next and final steps.

Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

You will receive an invoice from PLOS for your publication fee after your manuscript has reached the completed accept phase. If you receive an email requesting payment before acceptance or for any other service, this may be a phishing scheme. Learn how to identify phishing emails and protect your accounts at https://explore.plos.org/phishing.

If we can help with anything else, please email us at customercare@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Sreeram V. Ramagopalan

Academic Editor

PLOS One

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Table. Statistics on the number of patients and associated encounters for daily snapshots between April 12 and May 31, 2025.

    (DOCX)

    pone.0340287.s001.docx (13.4KB, docx)
    S2 Table. Number of patients with EHR clinical actions taken between consecutive daily snapshots from April 12, 2025, to May 3, 2025.

    (DOCX)

    pone.0340287.s002.docx (13.3KB, docx)
    S3 Table. Number of patients with EHR demographics change performed between consecutive daily snapshots from April 12, 2025, to May 3, 2025.

    (DOCX)

    pone.0340287.s003.docx (12.8KB, docx)
    S4 Table. Number of patients with inpatient encounters whose discharge information was changed from that in the baseline snapshot of April 12, 2025.

    (DOCX)

    pone.0340287.s004.docx (12.6KB, docx)
    S5 Table. Number of patients with outpatient encounters whose discharge information was changed from that in the baseline snapshot of April 12, 2025.

    (DOCX)

    pone.0340287.s005.docx (14.7KB, docx)
    Attachment

    Submitted filename: EHRQuality_ResponsetoReviewers.docx

    pone.0340287.s007.docx (41.9KB, docx)
    Attachment

    Submitted filename: EHRQuality_ResponsetoReviewers_v2.docx

    pone.0340287.s008.docx (14.4KB, docx)

    Data Availability Statement

    Data and source code to reproduce each analysis are included in our repository (https://github.com/ComputationalHealth/patient-merge) and are provided under a permissive open-source license (MIT License).


    Articles from PLOS One are provided here courtesy of PLOS
