2024 Jan 19;8:e2300046. doi: 10.1200/CCI.23.00046

TABLE 1.

Data Quality Dimensions in Flatiron Health RWD in Comparison With Published Frameworks

| Data Quality Dimension | Frameworks and Guidance | Definition |
| --- | --- | --- |
| Relevance (availability, sufficiency, representativeness) | Flatiron Health RWD | Availability of critical variables (exposure, outcomes, covariates) and sufficient numbers of representative patients within the appropriate time period to address a given use case |
| | EMA | Extent to which a data set presents data elements useful to answer a research question |
| | | Extensiveness, including coverage: amount of information available with respect to what exists in the real world, whether it is within the capture process or not |
| | NICE | Determined by whether (1) the data provide sufficient information to produce robust and relevant results and (2) results are generalizable to patients in the NHS |
| | FDA | Availability of key data elements (exposure, outcomes, covariates) and sufficient numbers of representative patients for the study |
| | Duke-Margolis | Assessment of whether the data adequately address the applicable regulatory question or requirement, in part or in whole. Includes whether the data capture relevant information on exposures, outcomes, and covariates, and whether the data are generalizable |
| | PCORI | Contextual data quality features are described as entailing unique contextual or task-specific data quality requirements |
| Reliability | Flatiron Health RWD | Degree to which the data represent the clinical concept intended, inclusive of data accuracy, completeness, provenance, and timeliness |
| | EMA | The dimension that covers how closely the data reflect what they are designed to measure. It covers how correct and trustworthy the data are |
| | NICE | The ability to get the same or similar result each time a study is repeated with a different population or group |
| | FDA | Data accuracy, completeness, provenance, and traceability |
| | Duke-Margolis | Considers whether the data adequately represent the underlying medical concepts they are intended to represent; encompasses data accrual and data quality control (data assurance) |
| | PCORI | Intrinsic features of data values are described as features of quality that involve only the data values “in their own right” without reference to external requirements or tasks |
| Accuracy | Flatiron Health RWD | Closeness of agreement between the measured value and the true value of what is intended to be measured |
| | EMA | Amount of discrepancy between data and reality |
| | | Precision: degree of approximation by which data represent reality |
| | NICE | How closely the data resemble reality |
| | FDA | Closeness of agreement between the measured value and the true value of what is intended to be measured |
| | | Validation: the process of establishing that a method is sound or that data are correctly measured, usually according to a reference standard |
| | Duke-Margolis | Assessment of the validity, reliability, and robustness of a data field |
| | PCORI | Not defined; concepts of plausibility, conformance, and consistency are described as alternatives |
| Conformance | Flatiron Health RWD | Compliance of data values with internal relational, formatting, or computational definitions or internal or external standards |
| | EMA | Assesses coherence toward a specific reference or data model |
| | NICE | Whether the recording of data elements is consistent with the data source specifications |
| | FDA | Data congruence with standardized types, sizes, and formats |
| | Duke-Margolis | Congruence with standardized types, sizes, and formats; how compliant the data are with internal relational, formatting, or computational definitions or standards |
| | PCORI | Compliance of the representation of data against internal or external formatting, relational, or computational definitions. Data values align to specified standards and formats |
| Plausibility | Flatiron Health RWD | Believability or truthfulness of data values |
| | EMA | Likelihood of some information being true; a proxy to detect errors |
| | NICE | Not defined |
| | FDA | The believability or truthfulness of data values |
| | Duke-Margolis | Recorded values are logically believable given data source and expert opinion |
| | PCORI | Believability of data values (uniqueness, atemporal, temporal plausibility) |
| Consistency | Flatiron Health RWD | Stability of a data value within a data set or across linked data sets or over time |
| | EMA | Coherence: how different parts of overall data sets are consistent in their representation and meaning. Subdimensions include format coherence, structural coherence, semantic coherence, and uniqueness |
| | | Uniqueness: same information is not duplicated but appears in the data set once |
| | NICE | Agreement in patient status in records across the data sources |
| | FDA | Included as part of the definition of data integrity: completeness, consistency, and accuracy of data |
| | Duke-Margolis | Stability of a data value within a data set or across linked data sets |
| | PCORI | Consistency is included as a subcategory of plausibility and conformance |
| Completeness | Flatiron Health RWD | Presence of data values (data value frequencies, without reference to actual values themselves) |
| | EMA | Extensiveness, including completeness: amount of information available with respect to total information that could be available, given the capture process and data format |
| | NICE | Percentage of records without missing data at a given time point |
| | FDA | The “presence of the necessary data” |
| | Duke-Margolis | Measure of recorded data present within a defined data field and/or data set |
| | | The frequencies of data attributes present in a data set without reference to data values |
| | PCORI | Frequencies of data attributes present in a data set, without reference to data values |
| Provenance | Flatiron Health RWD | An audit trail that accounts for the origin of a piece of data (in a database, document, or repository) together with an explanation of how and why it got to the present place |
| | EMA | Not defined |
| | NICE | Describes the ability to trace the origin of data and identify how it has been altered and transformed throughout its lifecycle. It provides an understanding of the trustworthiness or reliability of a data source |
| | FDA | An audit trail that “accounts for the origin of a piece of data (in a database, document, or repository) together with an explanation of how and why it got to the present place” |
| | | Traceability: permits an understanding of the relationships between the analysis results (tables, listings, and figures in the study report), analysis data sets, tabulation data sets, and source data |
| | Duke-Margolis | Origin of the data, sometimes including a chronologic record of data custodians and transformations |
| | | Traceability: ability to record changes to location, ownership, and values |
| | | Data accrual: the process by which data are collected and aggregated (includes provenance) |
| | | Data lineage: the history of all data transformations (eg, recoding or modifying variables) |
| | PCORI | Not defined |
| Timeliness | Flatiron Health RWD | Data are collected and curated with acceptable recency such that the data set represents reality during the period of coverage |
| | EMA | Availability of data at the right time for regulatory decision making, that in turn entails that data are collected and made available within an acceptable time |
| | | Currency: considers freshness of the data, eg, current and immediately useful |
| | | Lateness: aspect of data being captured later than expected corresponding to reality |
| | NICE | Lag time between data collection and availability for research |
| | FDA | Not defined |
| | Duke-Margolis | Longitudinality: condition of data indexed by time/interval of exposure and outcome time |
| | PCORI | Not defined |

NOTE. Duke-Margolis definitions are synthesized from both the August 2019 and October 2018 white papers.23,24

Abbreviations: EMA, European Medicines Agency; FDA, US Food and Drug Administration; NHS, National Health Service; NICE, National Institute for Health and Care Excellence; PCORI, Patient-Centered Outcomes Research Institute; RWD, real-world data.
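
As an illustration only, three of the dimensions defined in Table 1 (completeness, conformance, and plausibility) can be expressed as simple checks on a data set. The sketch below is a minimal example under stated assumptions: the records, field names, and the choice of ISO 8601 dates as the format standard are hypothetical and are not drawn from Flatiron Health RWD or from any of the cited frameworks.

```python
from datetime import date

# Hypothetical toy records; field names and values are illustrative assumptions.
records = [
    {"patient_id": "p1", "birth_date": "1950-03-14", "diagnosis_date": "2019-06-01"},
    {"patient_id": "p2", "birth_date": "1962-11-30", "diagnosis_date": None},
    {"patient_id": "p3", "birth_date": "1975-13-01", "diagnosis_date": "2020-02-10"},
    {"patient_id": "p4", "birth_date": "1980-07-22", "diagnosis_date": "1979-01-05"},
]


def parse_iso(value):
    """Return a date for a valid YYYY-MM-DD string, otherwise None."""
    try:
        return date.fromisoformat(value)
    except (TypeError, ValueError):
        return None


def completeness(rows, field):
    """Fraction of rows where the field is present; values are not inspected."""
    return sum(r.get(field) is not None for r in rows) / len(rows)


def conformance(rows, field):
    """Fraction of non-missing values that parse under the assumed date format."""
    present = [r[field] for r in rows if r.get(field) is not None]
    return sum(parse_iso(v) is not None for v in present) / len(present)


def plausibility(rows):
    """Among rows with two valid dates, fraction where diagnosis is not before birth."""
    pairs = [(parse_iso(r["birth_date"]), parse_iso(r["diagnosis_date"])) for r in rows]
    checked = [(b, d) for b, d in pairs if b is not None and d is not None]
    return sum(d >= b for b, d in checked) / len(checked)


print(completeness(records, "diagnosis_date"))  # 0.75 (p2 is missing)
print(conformance(records, "birth_date"))       # 0.75 (p3 has month 13)
print(plausibility(records))                    # 0.5 (p4 diagnosed before birth)
```

Each check is kept separate because the definitions are separate: completeness counts presence without looking at values, conformance tests values against a format, and plausibility tests whether conformant values are believable together.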