TABLE A2.
Dimension | Subcategory | Description | Example Quality Process |
---|---|---|---|
Completeness | Site level | Site-level completeness is assessed across selected variables, with thresholds set based on variable criticality and clinical or internal benchmarks | Completeness thresholds for critical laboratory data range from 40% (eg, lactate dehydrogenase) to 90% (eg, hematocrit and hemoglobin) |
Completeness targets on the basis of median site scores (eg, Route of medication administration documentation rate is >70%) | |||
Completeness targets for critical variables (eg, birth year, sex) are >92% | |||
Patient level | Patient-level completeness is assessed by verification checks designed to identify and improve potentially incomplete data on the basis of clinical or data model expectations | Patients with a line of therapy change without a corresponding progression event are reviewed for complete capture of progression | |
Patients who have received a PI3K inhibitor but do not have a PIK3CA test are reviewed for complete capture of biomarker test data | |||
Variable level | Variable-level completeness is assessed across a selected variable in a data set after curation, across sites, with thresholds set based on variable criticality and clinical or internal benchmarks | Completeness of smoking status, which is expected to be frequently captured in the EHR patient chart, has an expected completeness of >95% | |
Field level | Field-level completeness is assessed by verification checks designed to identify and improve potentially incomplete data on the basis of clinical or data model expectations | Abstracted treatment start dates containing a year but missing month or day are re-reviewed for more complete data capture | |
Required fields as per the data model, such as diagnosis, are prompted to be completed during data curation before submitting data with quality controls | |||
Provenance | Data collection | Information about data sources, setting and time period of collection, and timing of extracts | Data elements can be traced to specific site, setting, extraction date, and source documentation Distributions of data source site (community cancer centers, academic medical centers), geographic areas, and patient populations are made available |
Processing | Information about the steps to curate and transform the source data | Abstractor username, policy version, and timestamp are logged for curation from unstructured data Version of the data standard used for mapping and mapping decisions are stored Data changes over time are logged with an audit trail |
|
Data and quality management | Documentation of processes for data and quality management | Data management plans are available and version controlled Training records for staff handling data are logged and retained Data verification checks are version controlled, with records of flagging and resolutions |
|
Timeliness | Recency-based thresholds | Percent of patients with a value within a given window of time | Percentage of nondeceased patients with a medication administration in the 90 days before data cutoff |
Data refresh cadence | Frequency with which incremental documentation is curated within the data set | Structured data feeds are refreshed every 24 hours |
Abbreviations: EHR, electronic health record; PI3K, phosphoinositide 3-kinases; PIK3CA, phosphatidylinositol-4,5-bisphosphate 3-kinase, catalytic subunit alpha.