Abstract
Background
Electronic health record (EHR) data are anticipated to inform the development of health policy systems across countries and furnish valuable insights for the advancement of health and medical technology. As the current paradigm of clinical research is shifting toward data centricity, the utilization of health care data is increasingly emphasized.
Objective
We aimed to review the literature on clinical data quality management and define a process for ensuring the quality management of clinical data, especially in the secondary utilization of data.
Methods
A systematic review of PubMed articles from 2010 to October 2023 was conducted. A total of 82,346 articles were retrieved and screened based on the inclusion and exclusion criteria, narrowing the number of articles to 851 after title and abstract review. Articles focusing on clinical data quality management life cycles, assessment methods, and tools were selected.
Results
We reviewed 105 papers describing the clinical data quality management process. This process is based on a 4-stage life cycle: planning, construction, operation, and utilization. The most frequently used dimensions were completeness, plausibility, concordance, security, currency, and interoperability.
Conclusions
Given the importance of the secondary use of EHR data, standardized quality control methods and automation are necessary. This study proposes a process to standardize data quality management and develop a data quality assessment system.
Keywords: clinical research informatics, data quality, data accuracy, electronic health records, frameworks, quality of health care
Introduction
As data continue to accumulate, the question of how to use neglected data has received increasing attention. In particular, the need for quality control in the use of electronic health record (EHR) data has been emphasized. EHR data are expected to facilitate the development of national health policy systems and provide useful information for improving public health and medical technology [1]. As the current clinical research paradigm shifts to one of data centricity, the use of EHR data has increasingly been emphasized [2].
The quality of research using EHR data depends on the quality of the data generated, which is a major research limitation. EHR data are essential in preclinical research, which is conducted to study the future course of diseases and to draft policies. Therefore, integrated data must be usable seamlessly and must incorporate different types of data. Currently, various methods for integrated data management are being developed [3-9], but quality control standards are set differently for each data type, and discussions in this regard are challenging because of the nature of EHR data [10-13].
Although research into EHR data quality management is actively underway, a gold standard for assessing data quality remains absent. Inconsistencies in data formats and terminology, a lack of standardization, security issues, and challenges in processing large-scale data persist as major obstacles to establishing standardized EHR data management practices [14,15]. Another critical challenge in EHR data management is achieving consistency across data sets from different hospitals and health care systems [16]. The variability in data collection methods and formats among institutions complicates the integration of data sets, undermining the reproducibility and reliability of research [17].
The consistent quality of EHR data is a critical factor in the performance of data analytics. Meeting data quality standards requires a management system that is appropriate for each stage of the data life cycle [18,19]. However, no standardized approach is available to assess the quality of EHR data [14]. For accurate and consistent research on EHR data, common data models (CDMs) such as the Observational Medical Outcomes Partnership CDM and Sentinel CDM are being built [20,21]. However, CDMs are evaluated individually depending on their type [22-24].
The quality of clinical data depends on the quality of the source data from which they are built, and this dependence is another major research limitation. A data quality management process defines the basic principles of data management and enables accurate, consistent control of data quality [25]. Data can be regarded as high quality only when they are managed throughout the entire process of operation and use rather than built piecemeal.
This study aimed to examine the importance of clinical data quality management and to describe a life cycle–based clinical data quality management process. Accordingly, we reviewed the existing literature on EHRs and clinical data quality and then considered the guidelines for the predefined clinical data quality management stages of planning, construction, operation, and utilization [26].
Methods
Definition of the Clinical Data Life Cycle
In the context of systematic data quality management, we defined the life cycle of clinical data quality management as the set of quality management activities for health care data that span a series of steps from data construction to operation and use [26].
Literature Review on Data Quality
We aimed to identify articles that extensively discussed the generation and quality of EHR data. In this study, an EHR refers to all electronically stored records of patient health information, encompassing both electronic medical records and personal health records. To conduct the literature review, we followed the methods of previous studies that closely reviewed the literature on EHR data quality [14,27-29]. A PubMed literature search was conducted by the first author in October 2023. The search used text words and Medical Subject Headings such as “data quality,” “data accuracy,” “quality indicators,” “quality of health care,” “quality control,” and combinations of these terms (Textbox 1). The literature search was limited to articles published in English.
Textbox 1. Search terms.
'quality[ti]' AND (‘data quality’ OR ‘data accuracy’ OR ‘Quality of Health Care’ OR ‘Quality Indicators’ OR ‘quality control’) AND (EHR OR electronic medical record OR computerized medical record OR medical records systems, computerized [mh]) AND English[lang] NOT (review OR Clinical Trial OR Documents OR Books)
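For readers who want to rerun the Textbox 1 search, the sketch below builds a request URL for NCBI's E-utilities ESearch service. It is not part of the original method. It assumes that the mixed curly and straight quotation marks in the query are meant as PubMed phrase quotes (normalized here to double quotes), and it expresses the 2010 to October 2023 window with the ESearch `mindate`, `maxdate`, and `datetype` parameters, whereas the review applied the date limit as an exclusion criterion.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Query transcribed from Textbox 1; quotes normalized to double quotes (an assumption).
QUERY = (
    'quality[ti] AND ("data quality" OR "data accuracy" OR "Quality of Health Care" '
    'OR "Quality Indicators" OR "quality control") AND (EHR OR electronic medical record '
    'OR computerized medical record OR medical records systems, computerized[mh]) '
    'AND English[lang] NOT (review OR Clinical Trial OR Documents OR Books)'
)


def build_esearch_url(query: str, mindate: str = "2010", maxdate: str = "2023/10") -> str:
    """Return an ESearch URL for PubMed restricted by publication date."""
    params = {
        "db": "pubmed",
        "term": query,
        "datetype": "pdat",   # filter on publication date
        "mindate": mindate,
        "maxdate": maxdate,
        "retmode": "json",
        "retmax": 0,          # only the hit count is needed, not the IDs
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"
```

Fetching the URL (for example, with `urllib.request.urlopen`) returns JSON whose `esearchresult.count` field holds the number of hits; because PubMed is updated continuously, the count will not necessarily match the 82,346 records retrieved in October 2023.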
A total of 82,346 articles were retrieved from PubMed. To select articles suitable for our research purpose, we referred to previous studies and applied the inclusion and exclusion criteria listed in Textbox 2 [14,27-29]. The studies were evaluated for their relevance to the assessment and management of EHR data quality by applying these criteria to their titles and abstracts. This screening was conducted by an author with a degree in public health (DA) and cross-checked by another author specializing in health informatics (MS) to minimize bias; disagreements in study selection were resolved through thorough discussion. A total of 851 articles were selected after this first review. In the second review, the first author manually reviewed all articles to confirm that they met the criteria. Subsequently, all papers related to data quality were selected and classified based on the following 4 keywords: “data quality,” “EHR assessment,” “treatment quality,” and “hospital quality.”
Textbox 2. Inclusion and exclusion criteria.
Inclusion criteria
Original research using data quality assessment methods
Focus on data derived from electronic health records or related systems
Exclusion criteria
Guidelines limited to one medical area (eg, cardiology) without generalization to other areas
Review papers
Guidance aimed at governing bodies
Published before 2010
Papers not in the English language
No full text available
Not a paper on data quality issues
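Several of the Textbox 2 criteria can be applied mechanically once each record's metadata are known. The sketch below encodes them as a filter over a hypothetical `Record` type; the field names are illustrative, and the judgment-based criteria (relevance to data quality, EHR origin, single-specialty guidelines) are represented only as boolean flags that a human reviewer would set.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Record:
    """Hypothetical metadata for one retrieved article (field names are illustrative)."""
    pmid: str
    year: int
    language: str
    publication_type: str   # e.g. "original", "review", "guidance"
    has_full_text: bool
    on_data_quality: bool   # set by a reviewer after reading the title and abstract
    uses_ehr_data: bool     # set by a reviewer after reading the title and abstract


# Publication types excluded by Textbox 2 (review papers, guidance for governing bodies).
EXCLUDED_TYPES = {"review", "guidance"}


def meets_criteria(r: Record) -> bool:
    """Return True if the record passes the criteria encoded here."""
    return (
        r.on_data_quality
        and r.uses_ehr_data
        and r.year >= 2010
        and r.language == "English"
        and r.has_full_text
        and r.publication_type not in EXCLUDED_TYPES
    )


def screen(records: List[Record]) -> List[str]:
    """Return the PMIDs of records that pass screening, in input order."""
    return [r.pmid for r in records if meets_criteria(r)]
```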
To focus on data quality management for clinical data analysis, we reviewed the full text of each article containing 2 of the 4 keywords, namely “data quality” and “EHR assessment.” In this process, we also reviewed 13 relevant guidelines on medical data quality. Ultimately, 105 studies were included.
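The keyword-based selection step can be sketched as a tagging function followed by a set-inclusion test. The authors classified articles manually; the case-insensitive substring match used here is an assumption made for illustration, and the article IDs and abstracts in the usage test are invented.

```python
from typing import Dict, List, Set

KEYWORDS = ("data quality", "EHR assessment", "treatment quality", "hospital quality")
# Articles must carry both of these keywords to proceed to full-text review.
FULL_TEXT_KEYWORDS = {"data quality", "EHR assessment"}


def tag_keywords(text: str) -> Set[str]:
    """Return the keywords that occur in the text (case-insensitive substring match)."""
    lowered = text.lower()
    return {kw for kw in KEYWORDS if kw.lower() in lowered}


def select_for_full_text(abstracts: Dict[str, str]) -> List[str]:
    """Return the sorted IDs of articles tagged with both full-text keywords."""
    return sorted(
        pid for pid, text in abstracts.items()
        if FULL_TEXT_KEYWORDS <= tag_keywords(text)
    )
```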
For each article, we described the category, definition of data quality, data quality management methods, and quality control procedures. The literature categories included the main perspectives, research methods, and research findings. For efficiency, we reviewed the articles by classifying them into the following 4 topics: “framework,” “quality measures,” “quality tool,” and “interview.” Framework papers included articles addressing general procedures for data quality, while papers on quality measures included those involving data evaluation. Articles on quality tools included those that developed data evaluation tools, while interview articles included those that evaluated data based on the opinions of experts in actual hospital settings.
We abstracted the general methods and procedures for data quality management based on data life cycle and evaluation methods in each paper. To establish standards for the data life cycle, we analyzed the literature related to data frameworks and identified ways to construct data quality management procedures. The data quality evaluation criteria, quality evaluation methods, data types, and vocabulary used in each article were also collected. The content of the articles was then repeatedly reviewed to define their quality control dimensions.
To organize the overall data quality assessment methodology, we reviewed the literature that mentioned the data life cycle; however, finding articles offering a clear definition was difficult. Data quality must be consistently defined [30]. The literature shows how clinical data are constructed and evaluated according to different processes. Studies have been conducted to define methods for evaluating data; however, the series of processes through which data are generated and used has not been considered. We realized that consistent data quality management could be implemented by identifying and defining the data characteristics highlighted in the literature. Our study attempted to define a set of processes through which data are constructed, operated, and used through a literature review and to include all commonly occurring concepts. We then reviewed all articles to collect data on the use of the newly defined processes and dimensions.
Results
Data Quality Assessment Framework Based on the Clinical Data Life Cycle
Data quality can be defined as “the level that can continuously meet the various activity purposes or satisfaction of users using data” [31]. Data quality management refers to a set of activities that ensure data quality. With the goal of developing and implementing high-quality data, data quality management encompasses all data-related management activities, from data creation to use [26].
Figure 1 illustrates the life cycle of clinical data and defines the data quality management methods according to the life cycle stage. We used the clinical data life cycle, which consists of the planning, construction, operation, and utilization stages [26]. In producing high-quality data, data must be managed according to the data life cycle and governance principles [26].
Figure 1.
Life cycle of clinical data quality management (DQM). CDW: common data warehouse; DB: database.
We established the definitions for each clinical data life cycle stage by reviewing the literature (Table 1). The literature included in the review often described the data life cycle for improving hospital EHR quality, quality measurement, and clinical decision support [32-36].
Table 1.
Defining the life cycle of clinical data quality management.
Life cycle stage | Definition | References |
Planning stage | Defining data standards based on the direction of data and creating a clear strategy for establishing quality management activities | [6,32,33,37] |
Construction stage | Considering the characteristics among data sets, collecting data, and proceeding with overall data construction and management that reflect clinical attributes | [32,37-43] |
Operation stage | Conducting data quality assessments on the constructed data and reviewing them from various angles and perspectives | [32,33,37,44,45] |
Utilization stage | Sharing the outcomes of data quality validation, implementing data quality enhancement activities, and recalibrating the overall data quality | [32,33,37,46] |
Planning Stage
In the planning stage of data quality management, key issues such as the data to be generated and their documentation and organization, storage and security, stewardship, and accessibility for reuse and sharing are considered [47]. Developing a data management plan should involve describing how data will be handled throughout the life of the project and after completion and establishing principles that are easy to implement [48].
Construction Stage
The construction stage, also called the big data life cycle stage [25] (Figure 1), is where quality control is carried out. It consists of 4 substages: data collection, data cleaning, data labeling, and data learning. The tasks to be performed differ at each substage; for example, data quality control standards must be established and reflected in the data collection substage.
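The 4 construction substages can be read as a pipeline in which each substage transforms the data and then applies a quality gate, so that data failing a gate are not passed on. The sketch below is a minimal illustration under stated assumptions: the field names (`id`, `code`, `label`), the labeling rule, and the checks are hypothetical, and "learning" is represented only by a placeholder check that the labeled set contains more than one class.

```python
from typing import Callable, Dict, List, Tuple

Records = List[Dict[str, object]]
Check = Callable[[Records], List[str]]
# One substage: (name, transform applied to the records, quality check returning issues).
Stage = Tuple[str, Callable[[Records], Records], Check]


def missing_fields(required: List[str]) -> Check:
    """Build a check that reports every required field absent from a record."""
    return lambda rs: [f"row {i}: missing {f}" for i, r in enumerate(rs) for f in required if f not in r]


def identity(rs: Records) -> Records:
    return rs


def clean(rs: Records) -> Records:
    # Normalize code formatting: strip whitespace and upper-case.
    return [dict(r, code=str(r["code"]).strip().upper()) for r in rs]


def label(rs: Records) -> Records:
    # Hypothetical rule: ICD-10 E11.x codes are labeled as type 2 diabetes.
    return [dict(r, label="diabetes" if str(r["code"]).startswith("E11") else "other") for r in rs]


STAGES: List[Stage] = [
    ("collection", identity, missing_fields(["id", "code"])),
    ("cleaning", clean, lambda rs: [] if len({r["id"] for r in rs}) == len(rs) else ["duplicate ids"]),
    ("labeling", label, missing_fields(["label"])),
    ("learning", identity, lambda rs: [] if len({r["label"] for r in rs}) > 1 else ["only one label class"]),
]


def run_construction(records: Records, stages: List[Stage]) -> Tuple[Records, Dict[str, List[str]]]:
    """Run substages in order; stop at the first substage whose check reports issues."""
    for name, transform, check in stages:
        records = transform(records)
        issues = check(records)
        if issues:
            return records, {name: issues}
    return records, {}
```

Stopping at the first failing gate mirrors the point that standards set at collection must be met before later substages build on the data.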
Operation Stage
Managing constructed data is the most active phase of data quality management. When building quality data, quality control must be implemented starting from the planning stage. However, not all data are built with quality control in mind from the planning stage. In data quality management, the operation stage therefore involves activities to diagnose and improve the quality of the data loaded in data construction projects.
Utilization Stage
The main users of public medical data are public institutions and research institutes. Data quality management organizations must continuously implement improvements to provide high-quality data by adhering to the requirements of both data providers and consumers. Moreover, data must be continuously and accurately managed to provide high-quality medical services [9]. Accordingly, a support system must be institutionalized to continuously communicate with researchers on the use of medical data, and a foundation such as medical data standards must be established to ensure the uninterrupted provision of high-quality data.
Proposed Data Framework Based on the Clinical Data Life Cycle
In our literature review, we found one commonality: all stages are interrelated, which underscores the need to manage data from a holistic, life cycle perspective [26]. The plan-do-study-act (PDSA) cycle, which was mentioned in most of the articles we reviewed, is primarily used for short-term processes, such as data construction or operation [33,38,46]. Because it is mainly used in the data construction stage, the PDSA cycle could not be applied in our study. The clinical data life cycle proposed in this study is instead designed to manage data comprehensively from a governance perspective. Its stages are organically linked, so that improvements can be reapplied after EHR data planning, construction, and secondary use. A set of procedures, such as a data framework, provides an environment in which researchers can understand data, identify quality issues, and address them effectively [49]. As data significantly influence research outcomes, they must be meaningfully evaluated and managed throughout their life cycle [30]. Some studies did not consider data from a life cycle perspective [34,35,50-52]; nevertheless, they considered the ecological use of data and the impact of data on hospital treatment processes [34,35]. Thus, data operations are organically linked, reflecting the interplay between different stages.
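The difference between a short PDSA loop inside one stage and the governance-level life cycle can be illustrated with a small state model. The sketch below is an assumption-laden illustration, not the authors' implementation: the 4 stages form a ring in which utilization leads back to planning, and findings recorded at utilization are carried into the next planning stage so that improvements are reapplied.

```python
from enum import Enum
from typing import List


class Stage(Enum):
    PLANNING = 1
    CONSTRUCTION = 2
    OPERATION = 3
    UTILIZATION = 4


def next_stage(stage: Stage) -> Stage:
    """Return the next stage; utilization wraps around to planning."""
    members = list(Stage)
    return members[(members.index(stage) + 1) % len(members)]


def run_cycle(findings_by_round: List[List[str]]) -> List[str]:
    """Trace stages over several rounds, carrying utilization findings into the next planning stage."""
    log: List[str] = []
    backlog: List[str] = []
    stage = Stage.PLANNING
    for findings in findings_by_round:
        for _ in Stage:  # one full pass through the 4 stages per round
            if stage is Stage.PLANNING:
                log.append(f"planning (reapplying: {', '.join(backlog) or 'none'})")
                backlog = []
            elif stage is Stage.UTILIZATION:
                backlog.extend(findings)  # feedback from users of the data
                log.append("utilization")
            else:
                log.append(stage.name.lower())
            stage = next_stage(stage)
    return log
```

For example, `run_cycle([["missing units"], []])` logs the second planning stage as `planning (reapplying: missing units)`.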
Dimensions of the Data Life Cycle and Clinical Data Quality Management
The set of reviewed papers comprised 44 papers on data frameworks, 32 papers on quality measures, 20 papers on quality tools, and 9 papers on interviews (Figures 2 and 3; Multimedia Appendix 1). Completeness was the most commonly used indicator, appearing in 94 papers (Table 2 and Table 3). Research using data quality dimensions can be classified according to the stage of the clinical data life cycle, with the greatest amount of research occurring in the construction and operation stages (Table 3).
Figure 2.
Diagram of the literature review process for clinical data quality management.
Figure 3.
PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) 2020 flow diagram.
Table 2.
Definitions of the life cycle of clinical data quality management and dimensions of data quality.
Dimension | Definition | Synonyms |
Completeness | Assessing the extent to which data have been fully constructed in accordance with their characteristics and intended design | Completeness, correctness, conformance, incompleteness, consistency |
Plausibility | Degree of reliability in data values and the significance of the associated information | Accuracy, consistency, relevance |
Concordance | The extent to which data can be stored in accordance with their characteristics based on standards | Structure, standardization |
Security | The extent to which data are trustworthy and accessible only to authorized users | Security, availability, confidentiality, representation, trustworthiness |
Currency | The extent to which data can be provided promptly when needed | Currency, timeliness, currentness |
Interoperability | The degree to which data operation is flexible, providing a sufficient and useful level of information that satisfies users | Availability, manageability, variability |
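Because the reviewed articles use many different terms, a lookup from terms to dimensions helps when coding articles against Table 2. The sketch below transcribes the synonyms from Table 2 and builds a reverse index; it also shows that some terms (for example, "consistency" and "availability") are listed under more than one dimension, so such terms have to be resolved in context rather than mapped automatically.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Synonyms transcribed from Table 2.
SYNONYMS: Dict[str, List[str]] = {
    "completeness": ["completeness", "correctness", "conformance", "incompleteness", "consistency"],
    "plausibility": ["accuracy", "consistency", "relevance"],
    "concordance": ["structure", "standardization"],
    "security": ["security", "availability", "confidentiality", "representation", "trustworthiness"],
    "currency": ["currency", "timeliness", "currentness"],
    "interoperability": ["availability", "manageability", "variability"],
}


def build_index(synonyms: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Map each lower-cased term to the set of dimensions it is listed under."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for dimension, terms in synonyms.items():
        index[dimension].add(dimension)
        for term in terms:
            index[term.lower()].add(dimension)
    return dict(index)


def ambiguous_terms(index: Dict[str, Set[str]]) -> List[str]:
    """Return terms that Table 2 lists under more than one dimension."""
    return sorted(t for t, dims in index.items() if len(dims) > 1)
```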
Table 3.
Life cycle of clinical data quality management and dimensions of data quality.
Dimension | Planning stage (n=69): mentions, n (%)a | Planning stage: articles | Construction stage (n=99): mentions, n (%)a | Construction stage: articles | Operation stage (n=95): mentions, n (%)a | Operation stage: articles | Utilization stage (n=72): mentions, n (%)a | Utilization stage: articles |
Completeness (n=107) | 22 (20.6) | [6,7,18,19,25,32,33,43,45,53-65] | 34 (31.8) | [7,9,18,19,25,32,39,40,43,45,53-77] | 30 (28) | [7,9,15,19,25,32,34,43,49,50,55-57,59,63,65,70,72,75,76,78-85] | 21 (19.6) | [7,16,18,19,22,25,32,43,49,50,56,63,65-67,75,78,84-87] |
Plausibility (n=72) | 19 (26.4) | [6,7,11,17-19,25,32,33,43,45,51,54,56,61,63-65,88] | 25 (34.7) | [7,9,11,17,19,22,25,43,45,51,54-56,61,63-66,68-70,75,76,88,89] | 26 (36.1) | [7,9,11,15,17,19,25,43,45,46,49,51,56,63,65,70,75,76,79-83,88,90,91] | 19 (26.4) | [7,11,16-19,25,32,43,46,49,56,65,66,75,86,88,90] |
Concordance (n=81) | 18 (22.2) | [6,7,17-19,25,32,33,43,45,51,56,57,59,62,63,65] | 22 (27.2) | [7,9,17,19,25,43-45,51,55-57,59,62,63,65,67,70,75,76] | 23 (28.4) | [7,9,17,19,25,32,43,44,49,51,56,57,59,63,65,70,75,76,79,80,85,90] | 18 (22.2) | [7,16-19,25,32,43,49,56,63,65,67,75,85,86,90] |
Security (n=33) | 8 (24.2) | [17,19,25,32,45,51,58,60,63] | 9 (27.3) | [17,19,25,45,51,58,60,63,89,91] | 7 (21.2) | [17,19,25,51,63,79,90,91] | 9 (27.3) | [16,17,19,25,32,63,86,87,90] |
Currency (n=42) | 9 (21.4) | [33,43,45,52,54,57,62,63,92] | 14 (33.3) | [11,15,17,43,52,55,57,62,63,67,71,72,92,93] | 10 (23.8) | [11,15,17,43,55,57,63,72,79,85,93] | 9 (21.4) | [11,16,17,32,43,57,63,67,85,86] |
Interoperability (n=35) | 7 (20) | [17,33,35,36,62,63,94] | 8 (22.9) | [17,35,36,46,55,63,74,79,80,94] | 10 (28.6) | [17,35,36,46,55,63,74,79,80,94] | 10 (28.6) | [17,32,35,36,46,63,75,86,90,94] |
a Distribution of each dimension across the stages of the clinical data life cycle (planning, construction, operation, and utilization), calculated as a proportion of each dimension’s total.
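As the footnote states, each percentage in Table 3 is a stage's mentions divided by the dimension's total, that is, share(d, s) = 100 × m(d, s) / Σ m(d, s′) over the 4 stages s′. The snippet below recomputes 2 rows whose stage counts sum to the totals shown (completeness, 107; security, 33) and reproduces the published percentages after rounding to 1 decimal place.

```python
from typing import Dict


def stage_shares(mentions: Dict[str, int]) -> Dict[str, float]:
    """Percentage of a dimension's mentions falling in each stage, rounded to 1 decimal."""
    total = sum(mentions.values())
    return {stage: round(100 * n / total, 1) for stage, n in mentions.items()}


# Stage counts transcribed from Table 3.
completeness = {"planning": 22, "construction": 34, "operation": 30, "utilization": 21}
security = {"planning": 8, "construction": 9, "operation": 7, "utilization": 9}

print(stage_shares(completeness))
# {'planning': 20.6, 'construction': 31.8, 'operation': 28.0, 'utilization': 19.6}
```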
Completeness
Completeness was mainly used in the construction or operation stage and was used as an indicator for EHR evaluation [66,67], data quality system development [7,53,78], data recognition [17], and comparative evaluation [50]. The related terms used in the articles included correctness, conformance, incompleteness, and consistency.
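A completeness indicator is commonly computed as the share of records in which a field is populated. The sketch below is a minimal example with hypothetical field names; its `blank_is_missing` switch illustrates how the definition chosen (here, whether blank strings count as missing) changes the resulting ratio for the same data.

```python
from typing import Dict, List


def completeness(records: List[dict], fields: List[str], blank_is_missing: bool = True) -> Dict[str, float]:
    """Share of records (0-1) in which each field is present and non-null.

    With blank_is_missing=True, empty or whitespace-only strings also count as missing.
    """
    def filled(value: object) -> bool:
        if value is None:  # absent keys arrive here as None via dict.get
            return False
        if blank_is_missing and isinstance(value, str) and not value.strip():
            return False
        return True

    n = len(records)
    return {f: (sum(filled(r.get(f)) for r in records) / n if n else 0.0) for f in fields}
```

With the rows `[{"sex": "F", "dob": "1980-01-01"}, {"sex": " ", "dob": None}, {"sex": "M"}]`, the completeness of `sex` is 2/3 under the strict definition but 1.0 when blanks are accepted.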
Plausibility
Plausibility was the second most frequently used indicator, with 72 references mentioning it. It was often used in data evaluation during the operation phase of the data life cycle. It was mainly mentioned in the literature on data tool development [54,55], framework presentation [45,68], data measurement [69], and data quality assessment [7,66,89].
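Plausibility checks typically ask whether values are believable, both individually and in relation to one another. The sketch below shows 2 common forms, a value-range check and a temporal-order check. The ranges and field names are hypothetical placeholders chosen for illustration; in practice, limits would have to be set with clinical experts.

```python
from datetime import date
from typing import Dict, List, Tuple

# Hypothetical plausible ranges, for illustration only.
PLAUSIBLE_RANGES: Dict[str, Tuple[float, float]] = {
    "heart_rate": (20, 250),
    "body_temp_c": (30.0, 43.0),
}


def range_violations(records: List[dict], ranges: Dict[str, Tuple[float, float]] = PLAUSIBLE_RANGES) -> List[Tuple[int, str, float]]:
    """Return (row, field, value) for recorded values outside their plausible range."""
    flags = []
    for i, r in enumerate(records):
        for field, (low, high) in ranges.items():
            value = r.get(field)
            if value is not None and not (low <= value <= high):
                flags.append((i, field, value))
    return flags


def temporal_violations(records: List[dict]) -> List[int]:
    """Return rows where discharge precedes admission, an implausible order of events."""
    return [i for i, r in enumerate(records) if r["discharge"] < r["admission"]]
```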
Concordance
Similar to completeness and plausibility, concordance was most frequently mentioned in the construction and operation stages. Concordance can be considered an indicator of whether the characteristics of different data are appropriately expressed and stored according to standards. Concordance was mentioned in the studies that developed, experimented with, and evaluated quality management tools [9,51,54,56,57,70,90]. The related terms mentioned in the articles included structure and standardization.
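A concordance check asks whether stored values follow an agreed standard. The sketch below tests 2 hypothetical fields: a diagnosis code against a simplified ICD-10-style format (a letter, 2 digits, and an optional dot followed by up to 4 characters) and a sex field against an assumed local value set. The pattern checks structure only, not whether a code actually exists in the classification.

```python
import re
from typing import List, Tuple

# Simplified structural pattern for ICD-10-style codes; format only, not code existence.
ICD10_FORMAT = re.compile(r"[A-Z][0-9]{2}(\.[0-9A-Z]{1,4})?")
SEX_VALUE_SET = {"F", "M", "U"}  # hypothetical local standard value set


def concordance(records: List[dict]) -> Tuple[float, List[int]]:
    """Return the share of conforming records and the indices of non-conforming ones."""
    bad = []
    for i, r in enumerate(records):
        code = str(r.get("dx_code") or "")
        if not ICD10_FORMAT.fullmatch(code) or r.get("sex") not in SEX_VALUE_SET:
            bad.append(i)
    rate = 1 - len(bad) / len(records) if records else 0.0
    return rate, bad
```

Here an ICD-9-style code such as `250.00` and a free-text value such as `female` would both be flagged as non-conforming.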
Security
As EHR data are sensitive, great attention must be paid to ethical issues and data leakage. Therefore, the security of EHR data is crucial. In contrast to the aforementioned 3 indicators, which reflect the completeness of data, security was most frequently mentioned in the construction and utilization stages. The related terms mentioned in the articles included availability, confidentiality, representation, and trustworthiness.
Currency
Currency was mentioned most often during the data construction stage. In particular, the availability of data must be determined during data construction, as having readily available data is critical for the research process. The terms representing currency included timeliness and currentness.
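One simple way to quantify currency is the lag between a clinical event and the time its record becomes available for use. The sketch below assumes hypothetical `event` and `loaded` date fields and a 7-day threshold chosen only for illustration; the acceptable lag depends on the intended use of the data.

```python
from datetime import date
from typing import List


def timeliness(records: List[dict], max_lag_days: int = 7) -> float:
    """Share of records loaded within max_lag_days of the clinical event date.

    The 7-day default is a hypothetical threshold, not a recommended standard.
    """
    if not records:
        return 0.0
    on_time = sum((r["loaded"] - r["event"]).days <= max_lag_days for r in records)
    return on_time / len(records)
```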
Interoperability
The most cited limitation of EHR data is the difficulty with linking data between hospitals. By combining and sharing data already in use, more resources can be utilized. The indicator representing this relation is interoperability. The literature review in this study revealed a strong emphasis on interoperability, but it was not mentioned in articles defining other data quality indicators.
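Linking data between hospitals usually requires mapping each site's local codes to a shared vocabulary before records can be pooled. The sketch below uses invented local lab codes and concept names purely for illustration; a common data model such as the Observational Medical Outcomes Partnership CDM would supply standard concepts instead. Unmapped codes are returned separately, since mapping coverage is itself an interoperability concern.

```python
from typing import Dict, List, Tuple

# Hypothetical local lab codes at 2 hospitals mapped to shared concept names.
LOCAL_TO_COMMON: Dict[str, Dict[str, str]] = {
    "hospital_a": {"GLU01": "glucose", "HBA1C": "hba1c"},
    "hospital_b": {"LAB-778": "glucose", "LAB-912": "hba1c"},
}


def harmonize(site: str, rows: List[dict]) -> Tuple[List[dict], List[str]]:
    """Map a site's local codes to common concepts; return mapped rows and unmapped codes."""
    mapping = LOCAL_TO_COMMON[site]
    mapped, unmapped = [], []
    for r in rows:
        concept = mapping.get(r["code"])
        if concept is None:
            unmapped.append(r["code"])
        else:
            mapped.append({"site": site, "concept": concept, "value": r["value"]})
    return mapped, unmapped
```

After harmonization, glucose results from both sites can be selected with a single filter on `concept == "glucose"`.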
Discussion
Principal Findings
This study reviewed the existing literature, focusing on the importance of quality management from the EHR data life cycle perspective. Accordingly, an EHR data life cycle framework was defined, and 6 quality indicators were identified.
Data quality ensures the validity of research findings and provides information to demonstrate the appropriateness of EHR data use [49]. In this study, we identified the requirements for each stage of the data life cycle, including cycle-specific objectives, tasks, and evaluation metrics, to determine the validity of data. Data quality is a fundamental element for determining whether data have been constructed for their intended purpose [95]. Quality management must be applied at every stage of data processing to ensure that all data are reliable and appropriately handled [96].
The metrics identified in this study were frequently mentioned in the literature. Currency and interoperability, as categorized in this study, differ from the indicators proposed in previous studies. An accurate definition of each dimension is essential for data quality: even for completeness alone, the computed completeness ratio can vary depending on the type of data or the purpose for which quality is defined [28,86]. Dimensions have been developed to clearly define and automatically measure data [45]. Currency and interoperability metrics are not entirely new; they were mentioned repeatedly in various studies [33,43,45,52,54,57,62,63,92]. Currency describes how current the data are [63] and is primarily used for temporal information when representing the lifetime of data [16]. Temporal factors exert a significant effect on research results. In addition, currency should be considered when visualizing data quality results [42].
This study proposes a total of 6 data quality dimensions based on a comprehensive literature review. These indicators are not universally applicable across all data sets; additional dimensions may be warranted depending on specific conditions (Multimedia Appendix 2). For instance, bias can emerge based on data construction or the research environment. Addressing bias is crucial and has been emphasized in numerous studies on data quality [14,16,83]. In this regard, assessing task relevance is vital to verify that the constructed data meet their intended objectives and are effective for their purpose [45]. Furthermore, if data are integrated from multiple sources rather than generated from a single system, it is critical to evaluate consistency across data sets using the variability dimension [57]. In clinical settings, the validity and reliability of data are fundamental to the development of safe and accurate predictive models [57]. It is also necessary to assess usability to confirm that researchers in clinical environments can use data both effectively and efficiently [7,42] (Multimedia Appendix 3). Before using and measuring any data quality dimension, the purpose and research objectives of the data must be thoroughly understood, and the indicators must be selected accordingly. Systematic data quality assessments are essential at each phase of the data life cycle to ensure comprehensive data utilization. Each dimension can play a vital role in ensuring data accuracy, reliability, and efficiency, thereby enhancing the reproducibility and validity of the research. Developing a well-defined data quality plan minimizes unnecessary processes and costs and directly enhances data transparency and trustworthiness.
The majority of discussions on the quality of EHR data have centered on 3 key areas: conformance, plausibility, and completeness [6,42,49,70,86]. However, the actual quality of data can vary significantly depending on the measurement methods and management strategies used, due to factors such as the type and volume of data, data construction environment, characteristics of the disease, and type of system in which the data are generated [75,94]. A substantial body of research has proposed and developed a multitude of indicators. Through a comprehensive review of the literature, we identified that dimensions such as accuracy, consistency, completeness, and currency are closely interrelated according to data characteristics. Additionally, these indicators may vary in relevance depending on the data life cycle stage. Many studies, however, have overlooked these aspects. Recognizing the interdependence between dimensions while accounting for the unique characteristics of the data is crucial to establishing high-quality data.
To ensure effective data quality management, simplified data guidelines that can be easily applied must be considered. Data quality management frameworks and guidelines are currently being developed in a data-specific manner [12,18,19,25,65]. From the data life cycle perspective, data quality management must be coordinated from a governance perspective throughout the entire life cycle. Because many different types of data exist, more diverse data quality management methodologies must be developed to actively manage their quality [97]. Meanwhile, ensuring that data are usable and consistent requires clearly targeted and planned quality control procedures [48]. To keep data linkage scalable, quality control of integrated data using standardized procedures should be implemented from the planning stage [98].
In our study, we emphasized the importance of interoperability in the use of EHR data. EHRs help researchers conduct studies involving large amounts of data at a low cost [99] and facilitate the analysis of health information from thousands of individuals. Ideally, EHRs should be accurate and complete because they contain all health records [100]; however, EHR data face numerous quality issues [4,101]. In addition, the use of different EHR systems across hospitals and the heterogeneity of data limit interoperability. Limited interoperability and inconsistent data exchange across settings are significant barriers to quality improvement [102]. The interoperability of EHRs with medical data is becoming increasingly valuable because it can greatly increase data availability and directly stimulate research. EHR systems can efficiently support data structuring and the reporting of quality measurement results, with a substantial impact on patients and their time [102]. Interoperability among EHR systems refers to the linking of data, which improves data usability. Therefore, regulating the data structure or transfer standards between systems is essential to improve data quality and interoperability.
Considerable effort has been made to improve the quality of EHRs, including the development of automated data quality assessment systems [9,42,103], the organization of quality indicator events, and the development of metrics. Data must be sufficiently flexible to be used for multiple purposes. Moreover, data must be managed according to user needs, and data quality diagnoses must be made according to the users' purposes. To produce high-quality data, the data must be thoroughly examined from a data life cycle perspective, starting from data construction, to ensure that data standards are well established and applied, data are consistently secured, and errors are minimized [104].
Establishing criteria for data quality is critical because the data sources for research questions represent a major determinant of research outcomes. Several factors necessitate the establishment of data quality standards. First, the types of data required vary according to the research topic, and data types and structure are significantly diverse. In addition, medical practices and health care systems vary widely worldwide, and their differences can affect the relevance of data to research questions [12]. Data must be managed continuously and accurately to provide high-quality medical services [9]. Consequently, the perspectives for measuring the level of data quality must be defined, and the criteria for what should be measured must be established [25].
Investing in EHR data quality management improves clinical outcomes [34]. As hospital resources are limited, data preprocessing and quality assessment must be automated to avoid wasting resources. Many hospital researchers have focused on automating data quality assessment [3,6,8,9,59,77,105]. However, automation across all data sets lacks a unified standard, and different tools have been developed for different data types and languages. Given the diverse criteria and forms of EHR data, such approaches are not pragmatic [14]. Accurately defining the domains and task ontologies for measuring data quality in the automation process is critical [45,59]. Various methodologies and quality criteria have been identified [29]. Nevertheless, flexible tools that consider interoperability must be developed, and existing methodologies must be used to create a unified automation tool [14].
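One way to move toward automated yet flexible assessment is to register dimension checks in a common interface and let each project select the dimensions and thresholds that fit its purpose. The sketch below is a hypothetical illustration of that design, not a unified tool: the 2 registered checks, the field names (`dx_code`, `age`), and the thresholds are assumptions, and new checks for other dimensions or data types can be added with the same decorator.

```python
from typing import Callable, Dict, List, Tuple

Check = Callable[[List[dict]], float]
REGISTRY: Dict[str, Check] = {}


def register(dimension: str) -> Callable[[Check], Check]:
    """Decorator that records a check function under a dimension name."""
    def decorator(fn: Check) -> Check:
        REGISTRY[dimension] = fn
        return fn
    return decorator


@register("completeness")
def dx_completeness(rows: List[dict]) -> float:
    # Share of rows with a non-null diagnosis code (hypothetical field).
    return sum(r.get("dx_code") is not None for r in rows) / len(rows) if rows else 0.0


@register("plausibility")
def age_plausibility(rows: List[dict]) -> float:
    # Share of recorded ages within a hypothetical 0-120 range.
    ages = [r["age"] for r in rows if r.get("age") is not None]
    return sum(0 <= a <= 120 for a in ages) / len(ages) if ages else 0.0


def assess(rows: List[dict], dimensions: List[str], thresholds: Dict[str, float]) -> Dict[str, Tuple[float, bool]]:
    """Run the selected checks and compare each score with its threshold."""
    report = {}
    for d in dimensions:
        score = REGISTRY[d](rows)
        report[d] = (round(score, 3), score >= thresholds[d])
    return report
```

Keeping the checks behind one interface means the same runner can be reused across data sets, while the choice of dimensions and thresholds stays with the research purpose.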
Limitations
Our literature review has several limitations that need to be considered. First, the literature selection was conducted solely by the first author, which may introduce subjectivity to the process and result in classifications that other reviewers might not agree with. Although cross-review efforts were made, the lack of a multireviewer approach may limit the generalizability of the findings. Second, in this study, we conducted the literature search using only one database. Due to the use of a single source, there may be a risk of missing other relevant studies. However, prior to conducting our study, we performed the same search in other databases and observed similar results to those obtained from PubMed, the database ultimately used in this research. Third, the quality dimensions identified in this review, derived solely from existing literature, have not been validated by clinical experts. The absence of expert validation may limit the practical applicability of these dimensions in clinical settings, indicating a need for further expert review.
Conclusion
As the value of EHR data increases, so does the demand for high-quality data. Standardized quality management and automated data quality assessment are necessary to produce high-quality data and improve their usability. Focusing on the secondary use of EHR data, this study reviews the existing literature and redefines quality management indicators from a data life cycle perspective. Because data quality assessment methods based on a data life cycle perspective have not yet been developed, future work should focus on developing data quality assessment systems, with an emphasis on standardized frameworks and tools that account for the specific characteristics of the data.
Acknowledgments
We attest that there was no use of GenAI technology in the generation of text, figures, or other informational content of this manuscript.
This research was supported by a grant from the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health & Welfare, Republic of Korea (grant number: RS-2022-KH125153), and by the Gachon University research fund of 2023 (GCU-202400550001).
Abbreviations
- CDM: common data model
- DQM: data quality management
- EHR: electronic health record
- PDSA: plan-do-study-act
Data Search List.
Additional Quality Dimension.
Term of Data Quality Management.
PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) 2020 checklist.
Data Availability
The data supporting this article are available upon request from the corresponding author.
Footnotes
Conflicts of Interest: None disclosed.
References
- 1. Kwon T, Jeong Y, Lee D. Standardization and quality evaluation of health and medical big data. Osong: Korea Health Industry Development Institute. 2019. Nov 25, [2023-10-25]. https://www.khiss.go.kr/board/view?pageNum=1&rowCnt=10&no1=314&linkId=175501&menuId=MENU00305&schType=0&schText=&boardStyle=&categoryId=&continent=&schStartChar=&schEndChar=&country=
- 2. Choi MS, Lee SH. Current status and issues of data management plan in Korea. The Journal of the Korea Contents Association. 2020;20:220–229. doi: 10.5392/JKCA.2020.20.06.220.
- 3. Tute E, Scheffner I, Marschollek M. A method for interoperable knowledge-based data quality assessment. BMC Med Inform Decis Mak. 2021 Mar 09;21(1):93. doi: 10.1186/s12911-021-01458-1. https://bmcmedinformdecismak.biomedcentral.com/articles/10.1186/s12911-021-01458-1
- 4. Weiskopf NG, Khan FJ, Woodcock D, Dorr DA, Cigarroa JE, Cohen AM. A mixed methods task analysis of the implementation and validation of EHR-based clinical quality measures. AMIA Annu Symp Proc. 2016;2016:1229–1237. https://europepmc.org/abstract/MED/28269920
- 5. Devine EB, Van Eaton E, Zadworny ME, Symons R, Devlin A, Yanez D, Yetisgen M, Keyloun KR, Capurro D, Alfonso-Cristancho R, Flum DR, Tarczy-Hornoch P. Automating electronic clinical data capture for quality improvement and research: the CERTAIN validation project of real world evidence. EGEMS (Wash DC). 2018 May 22;6(1):8. doi: 10.5334/egems.211. https://europepmc.org/abstract/MED/29881766
- 6. Khare R, Utidjian L, Ruth B, Kahn M, Burrows E, Marsolo K, Patibandla N, Razzaghi H, Colvin R, Ranade D, Kitzmiller M, Eckrich D, Bailey LC. A longitudinal analysis of data quality in a large pediatric data research network. J Am Med Inform Assoc. 2017 Nov 01;24(6):1072–1079. doi: 10.1093/jamia/ocx033. https://europepmc.org/abstract/MED/28398525
- 7. Kapsner LA, Mang JM, Mate S, Seuchter SA, Vengadeswaran A, Bathelt F, Deppenwiese N, Kadioglu D, Kraska D, Prokosch H. Linking a consortium-wide data quality assessment tool with the MIRACUM metadata repository. Appl Clin Inform. 2021 Aug 25;12(4):826–835. doi: 10.1055/s-0041-1733847. https://europepmc.org/abstract/MED/34433217
- 8. Chiang J, Lin J, Yang C. Automated evaluation of electronic discharge notes to assess quality of care for cardiovascular diseases using Medical Language Extraction and Encoding System (MedLEE). J Am Med Inform Assoc. 2010 May 01;17(3):245–52. doi: 10.1136/jamia.2009.000182. https://europepmc.org/abstract/MED/20442141
- 9. Mang JM, Seuchter SA, Gulden C, Schild S, Kraska D, Prokosch H, Kapsner LA. DQAgui: a graphical user interface for the MIRACUM data quality assessment tool. BMC Med Inform Decis Mak. 2022 Aug 11;22(1):213. doi: 10.1186/s12911-022-01961-z. https://bmcmedinformdecismak.biomedcentral.com/articles/10.1186/s12911-022-01961-z
- 10. Mohamed Y, Song X, McMahon TM, Sahil S, Zozus M, Wang Z, Waitman LR. Tailoring rule-based data quality assessment to the Patient-Centered Outcomes Research Network (PCORnet) common data model (CDM). AMIA Annu Symp Proc. 2022;2022:775–784. https://europepmc.org/abstract/MED/37128433
- 11. Davoudi S, Dooling J, Glondys B, Jones T, Kadlec L, Overgaard S, Ruben K, Wendicke A. Data quality management model (Updated). J AHIMA. 2015 Oct;86(10):62–5.
- 12. Real-World Data: Assessing Electronic Health Records and Medical Claims Data To Support Regulatory Decision-Making for Drug and Biological Products. Food and Drug Administration. 2024. Jul, [2025-03-06]. https://www.fda.gov/regulatory-information/search-fda-guidance-documents/real-world-data-assessing-electronic-health-records-and-medical-claims-data-support-regulatory
- 13. Madnick SE, Wang RY, Lee YW, Zhu H. Overview and framework for data and information quality research. J. Data and Information Quality. 2009 Jun;1(1):1–22. doi: 10.1145/1515693.1516680.
- 14. Lewis AE, Weiskopf N, Abrams ZB, Foraker R, Lai AM, Payne PRO, Gupta A. Electronic health record data quality assessment and tools: a systematic review. J Am Med Inform Assoc. 2023 Sep 25;30(10):1730–1740. doi: 10.1093/jamia/ocad120. https://europepmc.org/abstract/MED/37390812
- 15. Johnson S, Speedie S, Simon G, Kumar V, Westra B. Application of an ontology for characterizing data quality for a secondary use of EHR data. Appl Clin Inform. 2017 Dec 16;07(01):69–88. doi: 10.4338/aci-2015-08-ra-0107.
- 16. Kahn MG, Raebel MA, Glanz JM, Riedlinger K, Steiner JF. A pragmatic framework for single-site and multisite data quality assessment in electronic health record-based clinical research. Med Care. 2012 Jul;50 Suppl:S21–9. doi: 10.1097/MLR.0b013e318257dd67. https://europepmc.org/abstract/MED/22692254
- 17. Callahan T, Barnard J, Helmkamp L, Maertens J, Kahn M. Reporting data quality assessment results: identifying individual and organizational barriers and solutions. EGEMS (Wash DC). 2017 Sep 04;5(1):16. doi: 10.5334/egems.214. https://europepmc.org/abstract/MED/29881736
- 18. Public data preventive quality management guide. Sejong, South Korea: Ministry of the Interior and Safety; 2021. Mar 18.
- 19. Public Data Quality Management Manual ver 2.0. Ministry of Interior And Safety. 2018. Jan 15, [2025-03-24]. https://www.data.go.kr/en/bbs/rcr/selectRecsroom.do?pageIndex=1&originId=PDS_0000000000000516
- 20. Data Quality Review and Characterization Programs. Sentinel Initiative. 2024. Jun 06, [2025-03-06]. https://www.sentinelinitiative.org/methods-data-tools/sentinel-common-data-model/data-quality-review-and-characterization-programs
- 21. Makadia R, Ryan PB. Transforming the premier perspective hospital database into the Observational Medical Outcomes Partnership (OMOP) common data model. EGEMS (Wash DC). 2014;2(1):1110. doi: 10.13063/2327-9214.1110. https://europepmc.org/abstract/MED/25848597
- 22. Huser V, DeFalco FJ, Schuemie M, Ryan PB, Shang N, Velez M, Park RW, Boyce RD, Duke J, Khare R, Utidjian L, Bailey C. Multisite evaluation of a data quality tool for patient-level clinical data sets. EGEMS (Wash DC). 2016;4(1):1239. doi: 10.13063/2327-9214.1239. https://europepmc.org/abstract/MED/28154833
- 23. Estiri H, Stephens KA, Klann JG, Murphy SN. Exploring completeness in clinical data research networks with DQe-c. J Am Med Inform Assoc. 2018 Jan 01;25(1):17–24. doi: 10.1093/jamia/ocx109. https://europepmc.org/abstract/MED/29069394
- 24. Bian J, Lyu T, Loiacono A, Viramontes TM, Lipori G, Guo Y, Wu Y, Prosperi M, George TJ, Harle CA, Shenkman EA, Hogan W. Assessing the practice of data quality evaluation in a national clinical data research network through a systematic scoping review in the era of real-world data. J Am Med Inform Assoc. 2020 Dec 09;27(12):1999–2010. doi: 10.1093/jamia/ocaa245. https://europepmc.org/abstract/MED/33166397
- 25. Lee Y, Son K, Yoo S, Kim E, Co . platform DoB, editor. Daegu: National Information Society Agency; 2022. Jun 2, Big Data Platform and Center Data Quality Management Guide; pp. 7–143.
- 26. Lee S, Roh G, Kim J, Ho Lee Young, Woo H, Lee S. Effective data quality management for electronic medical record data using SMART DATA. Int J Med Inform. 2023 Dec;180:105262. doi: 10.1016/j.ijmedinf.2023.105262. https://doi.org/10.1016/j.ijmedinf.2023.105262
- 27. Arts DGT, De Keizer NF, Scheffer G. Defining and improving data quality in medical registries: a literature review, case study, and generic framework. J Am Med Inform Assoc. 2002;9(6):600–11. doi: 10.1197/jamia.m1087. https://europepmc.org/abstract/MED/12386111
- 28. Weiskopf NG, Weng C. Methods and dimensions of electronic health record data quality assessment: enabling reuse for clinical research. J Am Med Inform Assoc. 2013 Jan 01;20(1):144–51. doi: 10.1136/amiajnl-2011-000681. https://europepmc.org/abstract/MED/22733976
- 29. de Hond AAH, Leeuwenberg AM, Hooft L, Kant IMJ, Nijman SWJ, van Os HJA, Aardoom JJ, Debray TPA, Schuit E, van Smeden M, Reitsma JB, Steyerberg EW, Chavannes NH, Moons KGM. Guidelines and quality criteria for artificial intelligence-based prediction models in healthcare: a scoping review. NPJ Digit Med. 2022 Jan 10;5(1):2. doi: 10.1038/s41746-021-00549-7. https://doi.org/10.1038/s41746-021-00549-7
- 30. Liaw S, Guo JGN, Ansari S, Jonnagaddala J, Godinho MA, Borelli AJ, de Lusignan S, Capurro D, Liyanage H, Bhattal N, Bennett V, Chan J, Kahn MG. Quality assessment of real-world data repositories across the data life cycle: a literature review. J Am Med Inform Assoc. 2021 Jul 14;28(7):1591–1599. doi: 10.1093/jamia/ocaa340. https://europepmc.org/abstract/MED/33496785
- 31. English L. Information quality management: The next frontier. ProQuest. 2001. [2023-10-25]. https://www.proquest.com/openview/5ee454e132571ff609fe90509f73abfa/1?cbl=39817&pq-origsite=gscholar
- 32. Winter A, Takabayashi K, Jahn F, Kimura E, Engelbrecht R, Haux R, Honda M, Hübner U, Inoue S, Kohl C, Matsumoto T, Matsumura Y, Miyo K, Nakashima N, Prokosch H, Staemmler M. Quality requirements for electronic health record systems. Methods Inf Med. 2018 Jan 31;56(S 01):e92–e104. doi: 10.3414/me17-05-0002.
- 33. Jedwab RM, Franco M, Owen D, Ingram A, Redley B, Dobroff N. Improving the quality of electronic medical record documentation: development of a compliance and quality program. Appl Clin Inform. 2022 Aug 07;13(4):836–844. doi: 10.1055/s-0042-1756369. http://www.thieme-connect.com/DOI/DOI?10.1055/s-0042-1756369
- 34. Damberg CL, Shortell SM, Raube K, Gillies RR, Rittenhouse D, McCurdy RK, Casalino LP, Adams J. Relationship between quality improvement processes and clinical performance. Am J Manag Care. 2010 Aug;16(8):601–6. https://www.ajmc.com/pubMed.php?pii=12694
- 35. Ramirez A, Sulieman L, Schlueter D, Halvorson A, Qian J, Ratsimbazafy F, Loperena R, Mayo K, Basford M, Deflaux N, Muthuraman KN, Natarajan K, Kho A, Xu H, Wilkins C, Anton-Culver H, Boerwinkle E, Cicek M, Clark CR, Cohn E, Ohno-Machado L, Schully SD, Ahmedani BK, Argos M, Cronin RM, O'Donnell C, Fouad M, Goldstein DB, Greenland P, Hebbring SJ, Karlson EW, Khatri P, Korf B, Smoller JW, Sodeke S, Wilbanks J, Hentges J, Mockrin S, Lunt C, Devaney SA, Gebo K, Denny JC, Carroll RJ, Glazer D, Harris PA, Hripcsak G, Philippakis A, Roden DM, All of Us Research Program. The "All of Us" Research Program: data quality, utility, and diversity. Patterns (N Y). 2022 Aug 12;3(8):100570. doi: 10.1016/j.patter.2022.100570. https://linkinghub.elsevier.com/retrieve/pii/S2666-3899(22)00181-7
- 36. Lin Y, Staes CJ, Shields DE, Kandula V, Welch BM, Kawamoto K. Design, development, and initial evaluation of a terminology for clinical decision support and electronic clinical quality measurement. AMIA Annu Symp Proc. 2015;2015:843–51. https://europepmc.org/abstract/MED/26958220
- 37. Chelico JD, Wilcox AB, Vawdrey DK, Kuperman GJ. Designing a clinical data warehouse architecture to support quality improvement initiatives. AMIA Annu Symp Proc. 2016;2016:381–390. https://europepmc.org/abstract/MED/28269833
- 38. Knight AW, Szucs C, Dhillon M, Lembke T, Mitchell C. The eCollaborative: using a quality improvement collaborative to implement the National eHealth Record System in Australian primary care practices. Int J Qual Health Care. 2014 Aug 12;26(4):411–7. doi: 10.1093/intqhc/mzu059. https://europepmc.org/abstract/MED/24925685
- 39. Randall SM, Ferrante AM, Boyd JH, Semmens JB. The effect of data cleaning on record linkage quality. BMC Med Inform Decis Mak. 2013 Jun 05;13:64. doi: 10.1186/1472-6947-13-64. https://bmcmedinformdecismak.biomedcentral.com/articles/10.1186/1472-6947-13-64
- 40. Schepens MHJ, Trompert AC, van Hooff ML, van der Velde E, Kallewaard M, Verberk-Jonkers IJAM, Cense HA, Somford DM, Repping S, Tromp SC, Wouters MWJM. Using existing clinical information models for Dutch quality registries to reuse data and follow COUMT paradigm. Appl Clin Inform. 2023 Mar;14(2):326–336. doi: 10.1055/s-0043-1767681. http://www.thieme-connect.com/DOI/DOI?10.1055/s-0043-1767681
- 41. Kukhareva PV, Kawamoto K, Shields DE, Barfuss DT, Halley AM, Tippetts TJ, Warner PB, Bray BE, Staes CJ. Clinical Decision Support-based Quality Measurement (CDS-QM) framework: prototype implementation, evaluation, and future directions. AMIA Annu Symp Proc. 2014;2014:825–34. https://europepmc.org/abstract/MED/25954389
- 42. Engel N, Wang H, Jiang X, Lau CY, Patterson J, Acharya N, Beaton M, Sulieman L, Pavinkurve N, Natarajan K. EHR data quality assessment tools and issue reporting workflows for the 'All of Us' research program clinical data research network. AMIA Jt Summits Transl Sci Proc. 2022;2022:186–195. https://europepmc.org/abstract/MED/35854725
- 43. Diaz-Garelli J, Bernstam EV, Lee M, Hwang KO, Rahbar MH, Johnson TR. DataGauge: a practical process for systematically designing and implementing quality assessments of repurposed clinical data. EGEMS (Wash DC). 2019 Jul 25;7(1):32. doi: 10.5334/egems.286. https://europepmc.org/abstract/MED/31367649
- 44. Fife CE, Walker D, Thomson B. Electronic health records, registries, and quality measures: What? Why? How? Adv Wound Care (New Rochelle). 2013 Dec;2(10):598–604. doi: 10.1089/wound.2013.0476. https://europepmc.org/abstract/MED/24761335
- 45. Johnson SG, Speedie S, Simon G, Kumar V, Westra BL. A data quality ontology for the secondary use of EHR data. AMIA Annu Symp Proc. 2015;2015:1937–46. https://europepmc.org/abstract/MED/26958293
- 46. Carroll A, Johnson D. Know it when you see it: identifying and using special cause variation for quality improvement. Hosp Pediatr. 2020 Nov;10(11):e8–e10. doi: 10.1542/hpeds.2020-002303. https://europepmc.org/abstract/MED/33051243
- 47. Holles JH, Schmidt L. Graduate Research Data Management Course Content: Teaching the Data Management Plan (DMP). ASEE Annual Conference & Exposition; June 23-27, 2018; Salt Lake City, UT. 2018.
- 48. Michener WK. Ten simple rules for creating a good data management plan. PLoS Comput Biol. 2015 Oct 22;11(10):e1004525. doi: 10.1371/journal.pcbi.1004525. https://dx.plos.org/10.1371/journal.pcbi.1004525
- 49. Huang Y, Voorham J, Haaijer-Ruskamp FM. Using primary care electronic health record data for comparative effectiveness research: experience of data quality assessment and preprocessing in The Netherlands. J Comp Eff Res. 2016 Jul;5(4):345–54. doi: 10.2217/cer-2015-0022. https://www.becarispublishing.com/doi/10.2217/cer-2015-0022?url_ver=Z39.88-2003&rfr_id=ori:rid:crossref.org&rfr_dat=cr_pub0pubmed
- 50. Bell EJ, Takhar SS, Beloff JR, Schuur JD, Landman AB. Information technology improves emergency department patient discharge instructions completeness and performance on a national quality measure: a quasi-experimental study. Appl Clin Inform. 2013;4(4):499–514. doi: 10.4338/ACI-2013-07-RA-0046. https://europepmc.org/abstract/MED/24454578
- 51. Rahbar MH, Gonzales NR, Ardjomand-Hessabi M, Tahanan A, Sline MR, Peng H, Pandurengan R, Vahidy FS, Tanksley JD, Delano AA, Malazarte RM, Choi EE, Savitz SI, Grotta JC. The University of Texas Houston Stroke Registry (UTHSR): implementation of enhanced data quality assurance procedures improves data quality. BMC Neurol. 2013 Jun 15;13(1):61. doi: 10.1186/1471-2377-13-61. https://bmcneurol.biomedcentral.com/articles/10.1186/1471-2377-13-61
- 52. Kvale M, Hesselson S, Hoffmann T, Cao Y, Chan D, Connell S, Croen LA, Dispensa BP, Eshragh J, Finn A, Gollub J, Iribarren C, Jorgenson E, Kushi LH, Lao R, Lu Y, Ludwig D, Mathauda GK, McGuire WB, Mei G, Miles S, Mittman M, Patil M, Quesenberry CP, Ranatunga D, Rowell S, Sadler M, Sakoda LC, Shapero M, Shen L, Shenoy T, Smethurst D, Somkin CP, Van Den Eeden SK, Walter L, Wan E, Webster T, Whitmer RA, Wong S, Zau C, Zhan Y, Schaefer C, Kwok PY, Risch N. Genotyping informatics and quality control for 100,000 subjects in the genetic epidemiology research on adult health and aging (GERA) cohort. Genetics. 2015 Aug;200(4):1051–60. doi: 10.1534/genetics.115.178905. https://europepmc.org/abstract/MED/26092718
- 53. Van Batavia JP, Weiss DA, Long CJ, Madison J, McCarthy G, Plachter N, Zderic SA. Using structured data entry systems in the electronic medical record to collect clinical data for quality and research: Can we efficiently serve multiple needs for complex patients with spina bifida? PRM. 2018 Dec 13;11(4):303–309. doi: 10.3233/prm-170525.
- 54. Johnson S, Speedie S, Simon G, Kumar V, Westra B. Quantifying the effect of data quality on the validity of an eMeasure. Appl Clin Inform. 2017 Dec 14;08(04):1012–1021. doi: 10.4338/aci-2017-03-ra-0042.
- 55. Wang Z, Talburt JR, Wu N, Dagtas S, Zozus MN. A rule-based data quality assessment system for electronic health record data. Appl Clin Inform. 2020 Aug 23;11(4):622–634. doi: 10.1055/s-0040-1715567. http://www.thieme-connect.com/DOI/DOI?10.1055/s-0040-1715567
- 56. Razzaghi H, Greenberg J, Bailey L. Developing a systematic approach to assessing data quality in secondary use of clinical data based on intended use. Learn Health Syst. 2022 Jan;6(1):e10264. doi: 10.1002/lrh2.10264. https://europepmc.org/abstract/MED/35036548
- 57. Fu S, Wen A, Schaeferle GM, Wilson PM, Demuth G, Ruan X, Liu S, Storlie C, Liu H. Assessment of data quality variability across two EHR systems through a case study of post-surgical complications. AMIA Jt Summits Transl Sci Proc. 2022;2022:196–205. https://europepmc.org/abstract/MED/35854735
- 58. Joukes E, de Keizer NF, de Bruijne MC, Abu-Hanna A, Cornet R. Impact of electronic versus paper-based recording before EHR implementation on health care professionals' perceptions of EHR use, data quality, and data reuse. Appl Clin Inform. 2019 Mar;10(2):199–209. doi: 10.1055/s-0039-1681054. https://europepmc.org/abstract/MED/30895574
- 59. Wu Q, Ganz C, Li L. Data quality control for electronic pathology reporting. J Registry Manag. 2022;49(3):95–96. https://europepmc.org/abstract/MED/37260925
- 60. Devine EB, Capurro D, van Eaton E, Alfonso-Cristancho R, Devlin A, Yanez ND, Yetisgen-Yildiz M, Flum DR, Tarczy-Hornoch P. Preparing electronic clinical data for quality improvement and comparative effectiveness research: the SCOAP CERTAIN automation and validation project. EGEMS (Wash DC). 2013;1(1):1025. doi: 10.13063/2327-9214.1025. https://europepmc.org/abstract/MED/25848565
- 61. Huser V, Williams ND, Mayer CS. Linking provider specialty and outpatient diagnoses in Medicare claims data: data quality implications. Appl Clin Inform. 2021 Aug;12(4):729–736. doi: 10.1055/s-0041-1732404. http://www.thieme-connect.com/DOI/DOI?10.1055/s-0041-1732404
- 62. Skyttberg N, Vicente J, Chen R, Blomqvist H, Koch S. How to improve vital sign data quality for use in clinical decision support systems? A qualitative study in nine Swedish emergency departments. BMC Med Inform Decis Mak. 2016 Jun 04;16:61. doi: 10.1186/s12911-016-0305-4. https://bmcmedinformdecismak.biomedcentral.com/articles/10.1186/s12911-016-0305-4
- 63. Fadahunsi KP, Wark PA, Mastellos N, Neves AL, Gallagher J, Majeed A, Webster A, Smith A, Choo-Kang B, Leon C, Edwards C, O'Shea C, Heitz E, Kayode OV, Nash M, Kowalski M, Jiwani M, O'Callaghan ME, Zary N, Henderson N, Chavannes NH, Čivljak R, Olubiyi OA, Mahapatra P, Panday RN, Oriji SO, Fox TE, Faint V, Car J. Assessment of clinical information quality in digital health technologies: international eDelphi study. J Med Internet Res. 2022 Dec 06;24(12):e41889. doi: 10.2196/41889. https://www.jmir.org/2022/12/e41889/
- 64. Garies S, Cummings M, Forst B, McBrien K, Soos B, Taylor M, Drummond N, Manca D, Duerksen K, Quan H, Williamson T. Achieving quality primary care data: a description of the Canadian Primary Care Sentinel Surveillance Network data capture, extraction, and processing in Alberta. Int J Popul Data Sci. 2019 Jul 29;4(2):1132. doi: 10.23889/ijpds.v4i2.1132. https://europepmc.org/abstract/MED/34095540
- 65. Agency N. Division ID. editor. Daegu: National Information Society Agency; 2022. Mar 14, Data Quality Management Guidelines v2.0 for Artificial Intelligence Learning - Quality Management Guide.
- 66. Garies S, McBrien K, Quan H, Manca D, Drummond N, Williamson T. A data quality assessment to inform hypertension surveillance using primary care electronic medical record data from Alberta, Canada. BMC Public Health. 2021 Feb 02;21(1):264. doi: 10.1186/s12889-021-10295-w. https://bmcpublichealth.biomedcentral.com/articles/10.1186/s12889-021-10295-w
- 67. Afshar AS, Li Y, Chen Z, Chen Y, Lee JH, Irani D, Crank A, Singh D, Kanter M, Faraday N, Kharrazi H. An exploratory data quality analysis of time series physiologic signals using a large-scale intensive care unit database. JAMIA Open. 2021 Jul;4(3):ooab057. doi: 10.1093/jamiaopen/ooab057. https://europepmc.org/abstract/MED/34350392
- 68. Winnenburg R, Bodenreider O. Metrics for assessing the quality of value sets in clinical quality measures. AMIA Annu Symp Proc. 2013;2013:1497–505. https://europepmc.org/abstract/MED/24551422
- 69. Gadde MA, Wang Z, Zozus M, Talburt JB, Greer ML. Rules based data quality assessment on claims database. Stud Health Technol Inform. 2020 Jun 26;272:350–353. doi: 10.3233/SHTI200567. https://europepmc.org/abstract/MED/32604674
- 70. Rogers JR, Callahan TJ, Kang T, Bauck A, Khare R, Brown JS, Kahn MG, Weng C. A data element-function conceptual model for data quality checks. EGEMS (Wash DC). 2019 Apr 23;7(1):17. doi: 10.5334/egems.289. https://europepmc.org/abstract/MED/31065558
- 71. Li M, Cai H, Zhi Y, Fu Z, Duan H, Lu X. A configurable method for clinical quality measurement through electronic health records based on openEHR and CQL. BMC Med Inform Decis Mak. 2022 Feb 10;22(1):37. doi: 10.1186/s12911-022-01763-3. https://bmcmedinformdecismak.biomedcentral.com/articles/10.1186/s12911-022-01763-3
- 72. Keegan N, Vasselman S, Barnett E, Nweji B, Carbone E, Blum A, Morris MJ, Rathkopf DE, Slovin SF, Danila DC, Autio KA, Scher HI, Kantoff PW, Abida W, Stopsack KH. Clinical annotations for prostate cancer research: defining data elements, creating a reproducible analytical pipeline, and assessing data quality. Prostate. 2022 Aug;82(11):1107–1116. doi: 10.1002/pros.24363. https://europepmc.org/abstract/MED/35538298
- 73.Dixon BE, Siegel JA, Oemig TV, Grannis SJ. Electronic health information quality challenges and interventions to improve public health surveillance data and practice. Public Health Rep. 2013;128(6):546–53. doi: 10.1177/003335491312800614. https://europepmc.org/abstract/MED/24179266 . [DOI] [PMC free article] [PubMed] [Google Scholar]
- 74.Cusick MM, Sholle ET, Davila MA, Kabariti J, Cole CL, Campion TR. A method to improve availability and quality of patient race data in an electronic health record system. Appl Clin Inform. 2020 Oct;11(5):785–791. doi: 10.1055/s-0040-1718756. http://www.thieme-connect.com/DOI/DOI?10.1055/s-0040-1718756 . [DOI] [PMC free article] [PubMed] [Google Scholar]
- 75.Arvidsson E, Dijkstra R, Klemenc-Ketiš Zalika. Measuring quality in primary healthcare - opportunities and weaknesses. Zdr Varst. 2019 Sep;58(3):101–103. doi: 10.2478/sjph-2019-0013. https://europepmc.org/abstract/MED/31275436 .sjph-2019-0013 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 76.Lewis J, Stephens J, Musick B, Brown S, Malateste K, Ha Dao Ostinelli Cam, Maxwell Nicola, Jayathilake Karu, Shi Qiuhu, Brazier Ellen, Kariminia Azar, Hogan Brenna, Duda Stephany N, on the behalf of leDEA The IeDEA harmonist data toolkit: a data quality and data sharing solution for a global HIV research consortium. J Biomed Inform. 2022 Jul;131:104110. doi: 10.1016/j.jbi.2022.104110. https://boris.unibe.ch/id/eprint/170555 .S1532-0464(22)00126-5 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 77.Paulsen A, Overgaard S, Lauritsen JM. Quality of data entry using single entry, double entry and automated forms processing--an example based on a study of patient-reported outcomes. PLoS One. 2012 Apr 6;7(4):e35087. doi: 10.1371/journal.pone.0035087. https://dx.plos.org/10.1371/journal.pone.0035087 .PONE-D-11-15903 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 78.Johnson SG, Pruinelli L, Hoff A, Kumar V, Simon GJ, Steinbach M, Westra BL. A framework for visualizing data quality for predictive models and clinical quality measures. AMIA Jt Summits Transl Sci Proc. 2019;2019:630–638. https://europepmc.org/abstract/MED/31259018 . [PMC free article] [PubMed] [Google Scholar]
- 79.Dentler K, Numans ME, ten Teije A, Cornet R, de Keizer NF. Formalization and computation of quality measures based on electronic medical records. J Am Med Inform Assoc. 2014 Mar 01;21(2):285–91. doi: 10.1136/amiajnl-2013-001921. https://europepmc.org/abstract/MED/24192317 .amiajnl-2013-001921 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 80.Kahn M, Ranade D. The impact of electronic medical records data sources on an adverse drug event quality measure. J Am Med Inform Assoc. 2010;17(2):185–91. doi: 10.1136/jamia.2009.002451. https://europepmc.org/abstract/MED/20190062 .17/2/185 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 81.Zakim D, Brandberg H, El Amrani S, Hultgren A, Stathakarou N, Nifakos S, Kahan T, Spaak J, Koch S, Sundberg CJ. Computerized history-taking improves data quality for clinical decision-making-Comparison of EHR and computer-acquired history data in patients with chest pain. PLoS One. 2021;16(9):e0257677. doi: 10.1371/journal.pone.0257677. https://dx.plos.org/10.1371/journal.pone.0257677 .PONE-D-21-11324 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 82.Thuraisingam S, Chondros P, Dowsey MM, Spelman T, Garies S, Choong PF, Gunn J, Manski-Nankervis J. Assessing the suitability of general practice electronic health records for clinical prediction model development: a data quality assessment. BMC Med Inform Decis Mak. 2021 Oct 30;21(1):297. doi: 10.1186/s12911-021-01669-6. https://bmcmedinformdecismak.biomedcentral.com/articles/10.1186/s12911-021-01669-6 .10.1186/s12911-021-01669-6 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 83.Wang H, Belitskaya-Levy I, Wu F, Lee JS, Shih M, Tsao PS, Lu Y. A statistical quality assessment method for longitudinal observations in electronic health record data with an application to the VA million veteran program. BMC Med Inform Decis Mak. 2021 Oct 20;21(1):1. doi: 10.1186/S12911-021-01643-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 84.Wang J, Cha J, Sebek K, McCullough C, Parsons A, Singer J, Shih SC. Factors related to clinical quality improvement for small practices using an EHR. Health Serv Res. 2014 Dec;49(6):1729–46. doi: 10.1111/1475-6773.12243. https://europepmc.org/abstract/MED/25287906.
- 85.Kiogou SD, Chi C, Zhang R, Ma S, Adam TJ. Clinical data cohort quality improvement: the case of the medication data in the University of Minnesota's clinical data repository. AMIA Jt Summits Transl Sci Proc. 2022;2022:293–302. https://europepmc.org/abstract/MED/35854717.
- 86.Alwhaibi M, Balkhi B, Alshammari T, AlQahtani N, Mahmoud M, Almetwazi M, Ata S, Basyoni M, Alhawassi T. Measuring the quality and completeness of medication-related information derived from hospital electronic health records database. Saudi Pharm J. 2019 May;27(4):502–506. doi: 10.1016/j.jsps.2019.01.013. https://linkinghub.elsevier.com/retrieve/pii/S1319-0164(19)30012-X.
- 87.Tak YW, Han JH, Park YJ, Kim D, Oh JS, Lee Y. Examining final-administered medication as a measure of data quality: a comparative analysis of death data with the Central Cancer Registry in Republic of Korea. Cancers (Basel). 2023 Jun 27;15(13):3371. doi: 10.3390/cancers15133371. https://www.mdpi.com/resolver?pii=cancers15133371.
- 88.Upadhyay S, Hu H. A qualitative analysis of the impact of electronic health records (EHR) on healthcare quality and safety: clinicians' lived experiences. Health Serv Insights. 2022;15:11786329211070722. doi: 10.1177/11786329211070722. https://www.nia.or.kr/site/nia_kor/ex/bbs/View.do?cbIdx=26537&bcIdx=24250.
- 89.Barkhuysen P, de Grauw W, Akkermans R, Donkers J, Schers H, Biermans M. Is the quality of data in an electronic medical record sufficient for assessing the quality of primary care? J Am Med Inform Assoc. 2014;21(4):692–8. doi: 10.1136/amiajnl-2012-001479. https://europepmc.org/abstract/MED/24145818.
- 90.Fu S, Wen A, Pagali S, Zong N, St Sauver J, Sohn S, Fan J, Liu H. The implication of latent information quality to the reproducibility of secondary use of electronic health records. Stud Health Technol Inform. 2022 Jun 06;290:173–177. doi: 10.3233/SHTI220055. https://europepmc.org/abstract/MED/35672994.
- 91.Dy S, Lorenz K, O'Neill S, Asch S, Walling A, Tisnado D, Antonio AL, Malin JL. Cancer Quality-ASSIST supportive oncology quality indicator set: feasibility, reliability, and validity testing. Cancer. 2010 Jul 01;116(13):3267–75. doi: 10.1002/cncr.25109. https://onlinelibrary.wiley.com/doi/10.1002/cncr.25109.
- 92.Greiver M, Drummond N, Birtwhistle R, Queenan J, Lambert-Lanning A, Jackson D. Using EMRs to fuel quality improvement. Can Fam Physician. 2015 Jan;61(1):92, e68–9. http://www.cfp.ca/cgi/pmidlookup?view=long&pmid=25609529.
- 93.Capurro D, Yetisgen M, van Eaton E, Black R, Tarczy-Hornoch P. Availability of structured and unstructured clinical data for comparative effectiveness research and quality improvement: a multisite assessment. EGEMS (Wash DC). 2014;2(1):1079. doi: 10.13063/2327-9214.1079. https://europepmc.org/abstract/MED/25848594.
- 94.Romero L, Carneiro PB, Riley C, Clark H, Uy R, Park M, Mawokomatanda T, Bombard JM, Hinckley A, Skapik J. Building capacity of community health centers to overcome data challenges with the development of an agile COVID-19 public health registry: a multistate quality improvement effort. J Am Med Inform Assoc. 2021 Dec 28;29(1):80–88. doi: 10.1093/jamia/ocab233. https://europepmc.org/abstract/MED/34648005.
- 95.Merino J, Caballero I, Rivas B, Serrano M, Piattini M. A data quality in use model for big data. Future Gener Comput Syst. 2016 Oct;63:123–130. doi: 10.1016/j.future.2015.11.024. https://doi.org/10.1016/j.future.2015.11.024.
- 96.E6(R2) Good Clinical Practice: Integrated Addendum to ICH E6(R1). Food and Drug Administration. 2018 Mar [accessed 2025-03-06]. https://www.fda.gov/media/93884/download.
- 97.Colditz RR, Conrad C, Wehrmann T, Schmidt M, Dech S. TiSeG: a flexible software tool for time-series generation of MODIS data utilizing the quality assessment science data set. IEEE Trans Geosci Remote Sens. 2008 Oct;46(10):3296–3308. doi: 10.1109/tgrs.2008.921412.
- 98.Final NIH Policy for Data Management and Sharing. National Institutes of Health. 2020 Oct 29 [accessed 2025-03-06]. https://grants.nih.gov/grants/guide/notice-files/NOT-OD-21-013.html.
- 99.Häyrinen K, Saranto K, Nykänen P. Definition, structure, content, use and impacts of electronic health records: a review of the research literature. Int J Med Inform. 2008 May;77(5):291–304. doi: 10.1016/j.ijmedinf.2007.09.001. https://doi.org/10.1016/j.ijmedinf.2007.09.001.
- 100.Miran S, Nelson S, Redd D, Zeng-Treitler Q. Using multivariate long short-term memory neural network to detect aberrant signals in health data for quality assurance. Int J Med Inform. 2021 Mar;147:104368. doi: 10.1016/j.ijmedinf.2020.104368. https://europepmc.org/abstract/MED/33401168.
- 101.Botsis T, Hartvigsen G, Chen F, Weng C. Secondary use of EHR: data quality issues and informatics opportunities. Summit Transl Bioinform. 2010 Mar 01;2010:1–5. https://europepmc.org/abstract/MED/21347133.
- 102.Torda P, Tinoco A. Achieving the promise of electronic health record-enabled quality measurement: a measure developer's perspective. EGEMS (Wash DC). 2013;1(2):1031. doi: 10.13063/2327-9214.1031. https://europepmc.org/abstract/MED/25848574.
- 103.co . SW Computing Industry Source Technology Development Project 3rd Year Final Report. Korea: Ministry of Science and ICT; 2020. Feb 14, Bigdata Quality Evaluation Tool Development; pp. 1–283.
- 104.Gong X, Shroff N. Incentivizing Truthful Data Quality for Quality-Aware Mobile Data Crowdsourcing. Eighteenth ACM International Symposium on Mobile Ad Hoc Networking and Computing; June 26-29, 2018; Los Angeles, CA. 2018. https://scienceon.kisti.re.kr/srch/selectPORSrchReport.do?cn=TRKO202100007277.
- 105.Amoah AO, Amirfar S, Silfen SL, Singer J, Wang JJ. Applied use of composite quality measures for EHR-enabled practices. EGEMS (Wash DC). 2015;3(1):1118. doi: 10.13063/2327-9214.1118. https://europepmc.org/abstract/MED/26290881.
Supplementary Materials
Data Search List.
Additional Quality Dimension.
Term of Data Quality Management.
PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) 2020 checklist.
Data Availability Statement
The data supporting this article are available upon request from the corresponding author.