Abstract
Introduction
Clinical trial data is still predominantly manually entered by site staff into Electronic Data Capture (EDC) systems. This process of abstracting and manually transcribing patient data is time-consuming, inefficient and error prone. Electronic Health Record to Electronic Data Capture (EHR-To-EDC) technologies that digitize this process could reduce these inefficiencies.
Objectives
This study measured the impact of EHR-To-EDC technology on the data entry workflow of clinical trial data managers. The primary objective was to compare the speed and accuracy of the EHR-To-EDC enabled data entry method to the traditional, manual method. The secondary objective was to measure end user satisfaction.
Materials and Methods
Five data managers, ranging in experience from 9 months to over 2 years, were assigned an investigator-initiated, Memorial Sloan Kettering-sponsored oncology study within their disease area of expertise. Each data manager performed one hour of manual data entry and, a week later, one hour of data entry using IgniteData’s EHR-To-EDC solution, Archer, on a predetermined set of patients, timepoints and data domains (labs, vitals). The data entered into the EDC were compared side-by-side and used to evaluate the speed and accuracy of the EHR-To-EDC enabled method versus traditional, manual data entry. A user satisfaction survey using a 5-point Likert scale was used to collect feedback regarding the selected platform’s learnability, ease of use, perceived time savings, perceived efficiency, and preference over the manual method.
Results
The EHR-To-EDC method resulted in 58% more data entered versus the manual method (difference, 1745 data points; manual, 3023 data points; EHR-To-EDC, 4768 data points). The number of data entry errors was reduced by 99% (manual, 100 data points; EHR-To-EDC, 1 data point). Regarding user satisfaction, data managers on average agreed or strongly agreed that the EHR-To-EDC workflow was easy to learn (5/5), was easy to use (4.6/5), saved time (5/5), was more efficient (4.8/5), and was preferable to the manual entry workflow (4/5).
Conclusion
EHR-To-EDC enabled data entry increases data manager productivity, reduces errors and is preferred by data managers over manual data entry.
Keywords: EHR-To-EDC, electronic data capture, data managers, technologies, clinical research, throughput
Introduction
Clinical trials are continuing to grow in complexity across all phases and in most therapeutic areas.1 This increase in complexity is correlated with longer study timelines and puts added pressure on sites already facing rising costs and staff shortages. Technologies that reduce redundant, error prone tasks can help sites combat the effects of growing study complexity.
In the last decade, several technological advancements have been made to support EHR system optimization and EHR data extraction tools. Yet the adoption of Electronic Health Record to Electronic Data Capture (EHR-To-EDC) within clinical research workflows remains a global challenge.2,3 Concerns that EHR-To-EDC implementations could inadvertently increase the burden on sites and clinical trial participants, as well as technology challenges, have been cited as contributing factors. EHR-To-EDC technologies, which leverage healthcare data standards such as Health Level 7 (HL7®) Fast Healthcare Interoperability Resources (FHIR®) and terminology standards such as Logical Observation Identifiers, Names, and Codes (LOINC), enable site staff to safely and securely transfer trial participant data electronically from the EHR to an EDC. In previous studies, EHR-To-EDC enabled data entry was shown to be more cost effective and less error prone, and to reduce time spent by 37%, when compared to traditional manual transcription.4–7 However, there are no time-controlled studies in the literature comparing the 2 workflows under identical, real-world conditions.
With the industry struggling to bridge this interoperability gap, we conducted this within-subjects study to directly compare EHR-To-EDC enabled data transfers with traditional, manual data entry. This study aims to reinforce the benefits of electronic data transfer and provide a more holistic site perspective with data regarding end user feedback and satisfaction. To our knowledge, this is the first time an analysis such as this has been performed using real-world phase I/II oncology clinical trial data.
Objective
The primary objectives of this study were to compare the speed and accuracy of the EHR-To-EDC enabled data entry method versus traditional, manual data entry. The secondary objective was to measure user satisfaction.
Methods
Setting, Study, and eCRF Selection
Memorial Sloan Kettering (MSK) is a high-volume National Cancer Institute (NCI)-designated Comprehensive Cancer Center. We compared the EHR-To-EDC data entry workflow to manual data entry by running both workflows side-by-side on a predetermined set of patients, timepoints and electronic case report forms (eCRFs) in a time-controlled setting. Five MSK investigator-initiated oncology clinical trials were selected to be a part of this study (Table SI). Trials varied across 5 clinical disease areas and were chosen based on the following requirements: they contained a high volume of labs and vitals data, had at least one monitoring event, and contained eCRFs that were compatible with the EHR-To-EDC platform. Studies were required to have at least 20 subjects enrolled to ensure there would be enough data to enter within the given timeframe. To control variability across trials, the data entry tasks focused on three common eCRFs within the labs and vitals data domains: complete blood count, comprehensive metabolic panel, and vital signs.
System Requirements
This study involved three disparate systems: a homegrown EHR-like system, the EHR-To-EDC technology, and the EDC. To minimize setup time, the data entry scope was kept consistent across the five trials by focusing on three common eCRFs: complete blood count, comprehensive metabolic panel, and vital signs. Granting user access and configuring each system took approximately 2-4 hours per system per trial and was performed by experienced system analysts.
A proprietary EHR-like system developed by the site was used in place of the EHR. This was mainly due to lack of HL7® FHIR® capability on the hospital’s EHR, which was a requirement for the EHR-To-EDC platform. The EHR-like system is built on top of a data warehouse that aggregates data from upstream clinical systems and uses HL7® FHIR®, the widely used healthcare data standard to expose participant data via RESTful API. The system allows data managers to navigate to a subject on study and launch the EHR-To-EDC technology within the subject’s context.
The EHR-To-EDC technology used in this study was Archer, developed in 2019 by IgniteData. Archer is an EHR and EDC agnostic middleware tool that leverages HL7® FHIR® and standard terminology concepts such as LOINC to enable data managers to electronically transfer participant data from site to sponsor. Archer requires a one-time initial setup on the EHR and EDC, as well as study level configuration to accommodate each trial’s unique database structures. Technical resources are required for the initial EHR and EDC setup. Resources are also needed to define the data requirements that are ultimately configured into Archer by the vendor for each trial. Archer configuration was validated prior to all data entry activities to minimize the risk of system-related errors. The Medidata Rave EDC system was used as the EDC platform. A replicate of each trial’s database structure was created in an isolated environment of the EDC to ensure data entered for the purposes of this study did not conflict with live study activities. Different sites were created within each study database to distinguish data entered manually versus using the EHR-To-EDC technology. Subjects were created in the EDC prior to data entry activities so that data managers could focus solely on the entry of labs and vitals.
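To make the role of HL7 FHIR and LOINC in this kind of transfer more concrete, the following is a minimal sketch of how a FHIR Observation carrying a LOINC code could be matched to an eCRF field. It is not Archer's implementation, which the article does not describe. The eCRF field names (`CBC_HGB`, `VS_HR`), the mapping table, the function name `map_observation`, and the example values are all hypothetical; only the FHIR Observation element names and the two LOINC codes are taken from the published standards.

```python
from typing import Optional

LOINC_SYSTEM = "http://loinc.org"

# Hypothetical study-level configuration: LOINC code -> eCRF field name.
LOINC_TO_ECRF_FIELD = {
    "718-7": "CBC_HGB",  # Hemoglobin [Mass/volume] in Blood
    "8867-4": "VS_HR",   # Heart rate
}


def map_observation(observation: dict) -> Optional[dict]:
    """Return an eCRF-ready record, or None if no configured LOINC code is found."""
    for coding in observation.get("code", {}).get("coding", []):
        code = coding.get("code")
        if coding.get("system") == LOINC_SYSTEM and code in LOINC_TO_ECRF_FIELD:
            quantity = observation.get("valueQuantity", {})
            return {
                "field": LOINC_TO_ECRF_FIELD[code],
                "value": quantity.get("value"),
                "unit": quantity.get("unit"),
                "collected": observation.get("effectiveDateTime"),
            }
    return None


# Made-up example resources, for illustration only (not study data).
coded = {
    "resourceType": "Observation",
    "status": "final",
    "code": {"coding": [{"system": LOINC_SYSTEM, "code": "718-7"}]},
    "valueQuantity": {"value": 13.2, "unit": "g/dL"},
    "effectiveDateTime": "2024-01-15T08:30:00Z",
}
uncoded = {
    "resourceType": "Observation",
    "status": "final",
    "code": {"text": "Hgb"},  # free-text label only, no LOINC coding
    "valueQuantity": {"value": 13.2, "unit": "g/dL"},
}

print(map_observation(coded))
print(map_observation(uncoded))
```

In this sketch, the observation without a LOINC coding cannot be matched to any eCRF field, which illustrates why such tools depend on structured, coded source data.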
Participant Selection and Training
Five data managers from five different disease areas were selected from MSK’s clinical research operations unit to participate in this study. Participants ranged in experience from 9 months to over 2 years and were required to be entering data for at least one unrelated clinical trial as part of their daily responsibilities. Each participant was assigned one clinical trial within their disease area of expertise for this study. Since participants typically perform data entry as part of their day-to-day responsibilities, EDC system access and training were completed prior to this study. A 60-minute interactive EHR-To-EDC training was scheduled 3-5 days prior to each EHR-To-EDC session to ensure data managers were familiar with the workflow and had access to the EHR-like and EHR-To-EDC systems. Participants were provided with training materials detailing the end-to-end workflow; however, no information regarding subjects, timepoints or eCRFs was shared prior to any data entry activities.
Data Entry Activities
Two virtual data entry sessions (via Teams) were conducted for each of the five clinical trials: one manual and one using the EHR-To-EDC platform, Archer. For each trial, these sessions were scheduled one week apart, with the manual session occurring first. Sessions lasted 90 minutes and were structured as follows: welcome and instructions (15 minutes); data entry task, either manual or EHR-To-EDC (60 minutes); and closing remarks and survey feedback (15 minutes). Instructions were provided both verbally and in writing. During each session, participants were instructed to remain connected to the call in case of issues and to confirm that they were in a quiet, distraction-free environment in which they could focus solely on the data entry task. They were free to work using their typical virtual setup, with either one or two monitors. Time was allocated for participants to log into the permitted EHR and EDC systems before the data entry task, and access to the study in the EDC was granted while on the call. A list of patients, timepoints and eCRFs was provided, and participants were instructed to enter as much data as possible, in the order provided, within the one-hour timeframe. The eCRFs used in this study were limited to complete blood count, comprehensive metabolic panel, and vital signs. Moderators kept track of time to ensure participants used the full 60 minutes for data entry, and remained available in case questions or issues arose during the session. Satisfaction surveys were distributed at the close of both the manual and electronic sessions to gather near-real-time feedback.
Data Collection and Analysis
After both the manual and EHR-To-EDC data entry tasks were performed, data was exported from the database and two analysts performed a side-by-side comparison to evaluate the total number of data points and the number of errors. Date, time and test result were counted as discrete data points. Discrepancies between the manual and EHR-To-EDC methods were highlighted and verified against source documents. Errors were defined as instances in which incorrect data was entered in the EDC. In our primary analysis, errors were counted at the case, field, or data point level, with one error per incorrect user action. However, in the EHR-To-EDC workflow, associating a date with an incorrect visit can impact multiple eCRFs and fields. To be conservative, we conducted a sensitivity analysis comparing the date error counted as one error with the same error counted as all 50 fields that would have been associated with the erroneous date.
We calculated error rates as the number of adjudicated errors divided by the number of fields inspected, with exact 95% confidence intervals (CIs) computed using the method of Clopper and Pearson,8 which is based on the binomial distribution. To compare the manual and EHR-To-EDC methods while adjusting for managers’ experience, we fitted a mixed-effects logistic regression model with a random intercept for study. From this model, we estimated the odds ratio of error occurrence for the manual method versus EHR-To-EDC, and for manager experience of more than 2 years versus 9-12 months. These statistical analyses were performed using the R functions binom.test and glmer.9 A sensitivity analysis was conducted by fitting the same model to data in which one field error (for example, the eCRF date) was replaced with 50 field errors (the subsequent eCRF errors resulting from that error) in Study 4.
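For readers who want to see how an exact (Clopper-Pearson) interval is obtained, the sketch below computes one from the binomial distribution using only the Python standard library. The authors used R's binom.test; this is an independent illustration, not their code. The helper names `binom_cdf` and `clopper_pearson` and the bisection approach are assumptions made for this example, and the inputs are error and field counts reported in Table 2.

```python
import math


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), with terms computed in log space."""
    if k < 0:
        return 0.0
    if k >= n or p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    total = 0.0
    for i in range(k + 1):
        log_term = (
            math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log1p(-p)
        )
        total += math.exp(log_term)
    return min(total, 1.0)


def _bisect_root(f, iterations: int = 100) -> float:
    """Root of a function that increases from negative to positive on [0, 1]."""
    low, high = 0.0, 1.0
    for _ in range(iterations):
        mid = (low + high) / 2.0
        if f(mid) < 0.0:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0


def clopper_pearson(errors: int, fields: int, conf: float = 0.95):
    """Exact two-sided CI for a binomial proportion (errors out of fields)."""
    alpha = 1.0 - conf
    # Lower limit: the p at which P(X >= errors) equals alpha / 2.
    if errors == 0:
        lower = 0.0
    else:
        lower = _bisect_root(
            lambda p: (1.0 - binom_cdf(errors - 1, fields, p)) - alpha / 2.0
        )
    # Upper limit: the p at which P(X <= errors) equals alpha / 2.
    if errors == fields:
        upper = 1.0
    else:
        upper = _bisect_root(lambda p: alpha / 2.0 - binom_cdf(errors, fields, p))
    return lower, upper


# Error and field counts taken from Table 2 (matched data points).
for label, errors, fields in [
    ("Study 1, manual", 10, 460),
    ("Study 5, manual", 39, 386),
    ("Study 5, EHR-To-EDC", 0, 386),
]:
    low, high = clopper_pearson(errors, fields)
    print(f"{label}: rate={errors / fields:.3f}, 95% CI=({low:.3f}, {high:.3f})")
```

If the sketch is correct, the printed intervals should land close to the corresponding matched-data-point rows of Table 2, up to rounding.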
A User Satisfaction Survey using a 5-point Likert scale (1 = strongly disagree, 2 = disagree, 3 = neutral, 4 = agree, 5 = strongly agree) was used to collect feedback on the EHR-To-EDC workflow. The survey assessed the Archer platform’s: (1) learnability, (2) ease of use, (3) perceived time and effort (TE) savings, (4) perceived efficiency, and (5) preference versus manual workflows. Surveys were sent to all data managers directly after performing the electronic transfer session.
Results
Populating eCRFs Electronically Shows Significant Increase in Productivity
Across the five studies, 3023 data points were entered manually and 4768 data points were entered using the EHR-To-EDC method. Overall, the EHR-To-EDC method resulted in 1745 more data points, or 58% more data entered, than the manual method (Table 1).
Table 1.
Number of data points entered per study and overall.
| Study | Manual | EHR-To-EDC | Difference (%) |
|---|---|---|---|
| Study 1 | 460 | 859 | 87% |
| Study 2 | 720 | 1059 | 47% |
| Study 3 | 526 | 749 | 42% |
| Study 4 | 931 | 1300 | 40% |
| Study 5 | 386 | 801 | 108% |
| Total | 3023 | 4768 | 58% |
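The percentages in Table 1 follow from the raw counts as the relative increase of EHR-To-EDC over manual entry. The short sketch below recomputes them; the counts are copied from Table 1, while the function name `percent_increase` and the dictionary layout are illustrative choices rather than anything taken from the study's analysis code.

```python
# Data points entered per study, copied from Table 1: (manual, EHR-To-EDC).
counts = {
    "Study 1": (460, 859),
    "Study 2": (720, 1059),
    "Study 3": (526, 749),
    "Study 4": (931, 1300),
    "Study 5": (386, 801),
}


def percent_increase(manual: int, ehr_to_edc: int) -> float:
    """Relative increase of EHR-To-EDC over manual entry, in percent."""
    return 100.0 * (ehr_to_edc - manual) / manual


# Per-study increases, rounded to whole percentages as in Table 1.
per_study = {name: round(percent_increase(m, e)) for name, (m, e) in counts.items()}

# Pooled totals across the five studies.
total_manual = sum(m for m, _ in counts.values())
total_ehr = sum(e for _, e in counts.values())
overall = round(percent_increase(total_manual, total_ehr))

for name, pct in per_study.items():
    print(f"{name}: {pct}%")
print(
    f"Total: manual={total_manual}, EHR-To-EDC={total_ehr}, "
    f"difference={total_ehr - total_manual} ({overall}%)"
)
```

Note that the overall figure is computed from the pooled totals, not as the mean of the five per-study percentages.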
Data Quality and Accuracy
Based on the event-level analysis, the number of errors was reduced by 99% (manual, 100 data points; EHR-To-EDC, 1 data point) (Table 2). The one error identified in the EHR-To-EDC workflow was due to associating a date with an incorrect visit, which affected a total of 3 eCRFs and 50 fields. In the sensitivity analysis, the use of the incorrect timepoint was reclassified as 3 errors at the eCRF level and 50 errors at the field level (Table 3).
Table 2.
Number of data points entered incorrectly per study and overall.
| Field count type | Field count | Manual errors | Manual error rate (95% CI)ᵃ | EHR-To-EDC errors | EHR-To-EDC error rate (95% CI)ᵃ |
|---|---|---|---|---|---|
| Study 1 | | | | | |
| All data points | 859 | 10 | | 0 | 0 (0, 0.004) |
| Matched data points | 460 | 10 | 0.022 (0.01, 0.04) | 0 | 0 (0, 0.008) |
| Study 2 | | | | | |
| All data points | 1059 | 33 | | 0 | 0 (0, 0.003) |
| Matched data points | 720 | 33 | 0.046 (0.032, 0.064) | 0 | 0 (0, 0.005) |
| Study 3 | | | | | |
| All data points | 749 | 15 | | 0 | 0 (0, 0.005) |
| Matched data points | 526 | 15 | 0.029 (0.016, 0.047) | 0 | 0 (0, 0.007) |
| Study 4 | | | | | |
| All data points | 1300 | 3 | | 1 | 0.001 (0, 0.004) |
| Matched data points | 931 | 3 | 0.003 (0.001, 0.009) | 0 | 0 (0, 0.004) |
| Study 5 | | | | | |
| All data points | 801 | 39 | | 0 | 0 (0, 0.005) |
| Matched data points | 386 | 39 | 0.101 (0.073, 0.136) | 0 | 0 (0, 0.01) |
ᵃ Error rates are shown as proportions (errors divided by field count) with 95% confidence intervals.
Table 3.
Sensitivity analysis showing the number of cascading errors resulting from transferring data to an incorrect timepoint in Study 4.
| | Manual errors | EHR-To-EDC errors (per eCRF) | EHR-To-EDC errors (per field) |
|---|---|---|---|
| Total Errors | 3 | 3 | 50 |
Logistic regression revealed statistically significant differences in error occurrence by the method of data entry and manager experience. First, the odds of error occurrence were substantially higher with the manual method than with the EHR-To-EDC method (OR = 182.9, 95% CI: 40.7-3223.3, P < .001) (Table SII). Second, managers with more than 2 years of experience had 5.4 times the odds of an error occurring (95% CI: 1.3-23.9, P = .005) compared with those with 9-12 months of experience. In the sensitivity analysis, the odds of error occurrence remained higher with the manual method than with the EHR-To-EDC method (OR = 3.3, 95% CI: 2.3-4.7, P < .001), while the association with manager experience was not statistically significant (Table SIII).
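As a rough plausibility check on the direction and size of these estimates, a crude pooled odds ratio can be computed directly from the published totals. This is not the mixed-effects model the authors fitted: it ignores the random intercept for study and the adjustment for manager experience, so it is not expected to reproduce the reported odds ratios exactly. The choice of denominators (3023 manually entered fields and 4768 EHR-To-EDC fields) is an assumption made for this sketch, and the function name `odds_ratio` is illustrative.

```python
def odds_ratio(errors_a: int, fields_a: int, errors_b: int, fields_b: int) -> float:
    """Crude odds ratio of error occurrence, group A relative to group B."""
    odds_a = errors_a / (fields_a - errors_a)
    odds_b = errors_b / (fields_b - errors_b)
    return odds_a / odds_b


# Pooled totals from Tables 1 and 2 (assumed denominators, see lead-in).
MANUAL_ERRORS, MANUAL_FIELDS = 100, 3023
EHR_FIELDS = 4768

# Primary analysis: the Study 4 date error counted as 1 error.
primary = odds_ratio(MANUAL_ERRORS, MANUAL_FIELDS, 1, EHR_FIELDS)

# Sensitivity analysis: the same error counted as 50 field-level errors.
sensitivity = odds_ratio(MANUAL_ERRORS, MANUAL_FIELDS, 50, EHR_FIELDS)

print(f"Crude OR, primary analysis: {primary:.1f}")
print(f"Crude OR, sensitivity analysis: {sensitivity:.2f}")
```

Both crude values are of the same order as the model-based estimates above, which is consistent with the large change between the primary and sensitivity analyses being driven by how the single date error is counted.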
Data Managers Prefer Using Virtual Assistant Over Manual Entry
Surveys had a 100% response rate (Table 4). All users strongly agreed that Archer was easy to learn (5.0/5.0) and less time consuming than manual entry (5.0/5.0). On average, users also agreed that it was easy to use (4.6/5.0), was more efficient (4.8/5.0), and was preferable to the manual method (4.0/5.0).
Table 4.
User satisfaction results (rating scale, 1 = strongly disagree, 2 = disagree, 3 = neutral, 4 = agree, 5 = strongly agree).
| Survey statement | Average rating (n = 5) |
|---|---|
| Archer’s electronic transfer workflow was easy for me to learn. | 5 |
| It was easy to transfer data to the EDC using Archer. | 4.6 |
| Transferring data through Archer is less time consuming than entering data manually. | 5 |
| Transferring data through Archer is more efficient than entering data manually. | 4.8 |
| I prefer transferring data electronically with Archer over manual data entry. | 4 |
Discussion
A key finding of our research was an increase in data manager productivity. On average, data entry rates using EHR-To-EDC were 58% higher compared to manual data entry. By eliminating manual entry, reducing transcription errors, and standardizing the data entry method, data managers were able to enter data faster using EHR-To-EDC, and the data was of higher quality. Another notable finding from this study was the consistent user satisfaction with the EHR-To-EDC technology. Our survey showed that data managers clearly preferred using this method over the current manual data entry process. The users found the EHR-To-EDC technology platform easy to learn and easy to use, and reported that it saved them time and improved their overall efficiency. Two users noted that there was an initial learning curve, but they were able to appreciate the gain in adopting the EHR-To-EDC method.
Although there is significant promise in using the EHR-To-EDC technology, there are limitations. The method described in this article is reliant on access to structured data in the EHR. There is also a reliance on the data having a standardized unique identifier, such as LOINC codes, which in this case were used to uniquely identify the specific lab tests needed for each study. Data lacking standardized terminology concepts do not meet the technical requirements for EHR-To-EDC transfer; in the absence of standard codes, such as LOINC codes for lab tests, consistent interpretation, automated (and sometimes even manual) mapping, and exchange of information between systems cannot be achieved. Thus, sites with EHRs utilizing structured and standardized clinical data elements are likely to see similar improvements to those noted here, while sites and studies relying more heavily on unstructured clinical data, for example, clinical notes, are less likely to see much benefit with current EHR-To-EDC technologies. Additionally, our study was observational in nature, was performed at a single site, and used a single EHR-To-EDC technology application. Although this is one of the larger studies conducted to date, the work reported here involved only five studies and five abstracters. Other EHR-To-EDC technologies and other contexts may yield varying results.
Despite the current limitations, there are significant opportunities for leveraging unstructured data within EHR systems. Unstructured data, such as clinical notes, pathology reports and radiology interpretations, contain rich, contextual information that will increase the amount of data available for transfer. Data most often found in unstructured clinical documentation usually require the most time for manual abstraction and are also more prone to data quality issues. While advances in Artificial Intelligence and machine learning technologies, such as Large Language Models, are beginning to streamline these processes, more work is needed to ensure that consistent, high-quality data is generated and ready for EHR-To-EDC extraction.
Conclusion
In comparing EHR-To-EDC to the traditional manual data entry method, this study showed that EHR-To-EDC can streamline clinical trial data entry processes by increasing the rate of data entry and improving overall efficiency. EHR-To-EDC enabled data entry increased productivity, reduced errors and was preferred by data managers over manual data entry methods. The evaluated EHR-To-EDC application demonstrated a notable decrease in time consumed by data collection and an increase in accuracy. While our findings are not generalizable to other EHR-To-EDC vendors and types of data beyond the labs and vitals investigated herein, these results add to a growing body of evidence in which EHR-To-EDC technology has performed similarly in all reported cases.
Contributor Information
Anna Patruno, Memorial Sloan Kettering Cancer Center, New York, NY 10065, United States.
Michael-Owen Panzarella, Memorial Sloan Kettering Cancer Center, New York, NY 10065, United States.
Michael Buckley, Memorial Sloan Kettering Cancer Center, New York, NY 10065, United States.
Milena Silverman, Memorial Sloan Kettering Cancer Center, New York, NY 10065, United States.
Evelyn Salazar, Memorial Sloan Kettering Cancer Center, New York, NY 10065, United States.
Renata Panchal, Memorial Sloan Kettering Cancer Center, New York, NY 10065, United States.
Joseph Lengfellner, Memorial Sloan Kettering Cancer Center, New York, NY 10065, United States.
Alexia Iasonos, Memorial Sloan Kettering Cancer Center, New York, NY 10065, United States.
Maryam Garza, University of Texas Health San Antonio, San Antonio, TX 78229, United States.
Byeong Yeob Choi, University of Texas Health San Antonio, San Antonio, TX 78229, United States.
Meredith Zozus, University of Texas Health San Antonio, San Antonio, TX 78229, United States.
Stephanie Terzulli, Memorial Sloan Kettering Cancer Center, New York, NY 10065, United States.
Paul Sabbatini, Memorial Sloan Kettering Cancer Center and Weill Cornell Medical College, New York, NY 10065, United States.
Author contributions
Anna Patruno (Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Resources, Supervision, Validation, Visualization, Writing—original draft, Writing—review & editing), Michael-Owen Panzarella (Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Writing—original draft, Writing—review & editing), Michael Buckley (Formal analysis, Methodology, Project administration, Resources, Supervision, Writing—original draft, Writing—review & editing), Milena Silverman (Data curation, Formal analysis, Software, Validation), Evelyn Salazar (Formal analysis, Validation), Renata Panchal (Conceptualization, Investigation, Methodology, Project administration, Writing—original draft, Writing—review & editing), Joseph Lengfellner (Conceptualization, Investigation, Methodology, Project administration, Resources, Software, Writing—review & editing), Alexia Iasonos (Conceptualization, Formal analysis, Writing—review & editing), Maryam Garza (Formal analysis, Writing—review & editing), Byeong Yeob Choi (Formal analysis, Writing—review & editing), Meredith Zozus (Formal analysis, Writing—review & editing), Stephanie Terzulli (Conceptualization, Funding acquisition, Resources, Supervision, Writing—review & editing), and Paul Sabbatini (Funding acquisition, Resources, Writing—review & editing)
Supplementary material
Supplementary material is available at JAMIA Open online.
Funding
This research was funded in part through the NIH/NCI Cancer Center Support Grant P30 CA008748. The analysis was partially supported through a grant from the Burroughs Wellcome Fund.
Conflicts of interest
The authors declare no competing interests. All authors have no direct financial interest in IgniteData. Memorial Sloan Kettering has institutional financial interests related to IgniteData.
Data availability
The data underlying this article will be shared on reasonable request to the corresponding author.
References
1. Markey N, Howitt B, El-Mansouri I, Schwartzenberg C, Kotova O, Meier C. Clinical trials are becoming more complex: a machine learning analysis of data from over 16,000 trials. Sci Rep. 2024;14:3514.
2. Parab AA, Mehta P, Vattikola A, et al. Accelerating the adoption of eSource in clinical research: a TransCelerate point of view. Ther Innov Regul Sci. 2020;54:1141-1151.
3. Negro-Calduch E, Azzopardi-Muscat N, Krishnamurthy RS, Novillo-Ortiz D. Technological progress in electronic health record system optimization: systematic review of systematic literature reviews. Int J Med Inform. 2021;152:104507.
4. Eisenstein EL, Garza MY, Rocca M, Gordon GS, Zozus M. eSource-enabled vs traditional clinical trial data collection methods: a site-level economic analysis. Stud Health Technol Inform. 2020;270:961-965.
5. Buckley M, Vattikola A, Maniar R, Dai H. Direct data extraction and exchange of local labs for clinical research protocols: a partnership with sites, biopharmaceutical firms, and clinical research organizations. J Soc Clin Data Manage. 2021;1:1-5. doi:10.47912/jscdm.21
6. Nordo AH, Eisenstein EL, Hawley J, et al. A comparative effectiveness study of eSource used for data capture for a clinical research registry. Int J Med Inform. 2017;103:89-94.
7. Ammour N, Griffon N, Djadi-Prat J, et al. TransFAIR study: a European multicentre experimental comparison of EHR2EDC technology to the usual manual method for eCRF data collection. BMJ Health Care Inform. 2023;30:e100602.
8. Clopper CJ, Pearson ES. The use of confidence or fiducial limits illustrated in the case of the binomial. Biometrika. 1934;26:404-413. doi:10.1093/biomet/26.4.404
9. Bates D, Mächler M, Bolker B, Walker S. Fitting linear mixed-effects models using lme4. J Stat Softw. 2015;67:1-48. doi:10.18637/jss.v067.i01
