Skip to main content
. 2024 Jan 19;8:e2300046. doi: 10.1200/CCI.23.00046

TABLE A2.

Sample of Quality Processes for Completeness, Provenance, and Timeliness in Flatiron Health Real-World Data

Dimension Subcategory Description Example Quality Process
Completeness Site level Site-level completeness is assessed across selected variables, with thresholds set based on variable criticality and clinical or internal benchmarks Completeness thresholds for critical laboratory data range from 40% (eg, lactate dehydrogenase) to 90% (eg, hematocrit and hemoglobin)
Completeness targets on the basis of median site scores (eg, Route of medication administration documentation rate is >70%)
Completeness targets for critical variables (eg, birth year, sex) are >92%
Patient level Patient-level completeness is assessed by verification checks designed to identify and improve potentially incomplete data on the basis of clinical or data model expectations Patients with a line of therapy change without a corresponding progression event are reviewed for complete capture of progression
Patients who have received a PI3K inhibitor but do not have a PIK3CA test are reviewed for complete capture of biomarker test data
Variable level Variable-level completeness is assessed across a selected variable in a data set after curation, across sites, with thresholds set based on variable criticality and clinical or internal benchmarks Completeness of smoking status, which is expected to be frequently captured in the EHR patient chart, has an expected completeness of >95%
Field level Field-level completeness is assessed by verification checks designed to identify and improve potentially incomplete data on the basis of clinical or data model expectations Abstracted treatment start dates containing a year but missing month or day are re-reviewed for more complete data capture
Required fields as per the data model, such as diagnosis, are prompted to be completed during data curation before submitting data with quality controls
Provenance Data collection Information about data sources, setting and time period of collection, and timing of extracts Data elements can be traced to specific site, setting, extraction date, and source documentation
Distributions of data source site (community cancer centers, academic medical centers), geographic areas, and patient populations are made available
Processing Information about the steps to curate and transform the source data Abstractor username, policy version, and timestamp are logged for curation from unstructured data
Version of the data standard used for mapping and mapping decisions are stored
Data changes over time are logged with an audit trail
Data and quality management Documentation of processes for data and quality management Data management plans are available and version controlled
Training records for staff handling data are logged and retained
Data verification checks are version controlled, with records of flagging and resolutions
Timeliness Recency-based thresholds Percent of patients with a value within a given window of time Percentage of nondeceased patients with a medication administration in the 90 days before data cutoff
Data refresh cadence Frequency with which incremental documentation is curated within the data set Structured data feeds are refreshed every 24 hours

Abbreviations: EHR, electronic health record; PI3K, phosphoinositide 3-kinases; PIK3CA, phosphatidylinositol-4,5-bisphosphate 3-kinase, catalytic subunit alpha.