2024 Jan 19;8:e2300046. doi: 10.1200/CCI.23.00046

TABLE 3.

Sample of Verification Checks in Flatiron Health RWD

| Category | Subcategory | Description | Example Verification Check |
| --- | --- | --- | --- |
| Conformance | Value conformance | Data values conform to internal formatting constraints | Dates are recorded as YYYY-MM-DD |
| | | Data values conform to allowable values or ranges | Stage is abstracted from unstructured documents into structured categories aligned to AJCC terminology |
| | Relational conformance | Data values conform to relational constraints | Patients with documentation of real-world response events also have documented treatment data |
| | | Unique (key) data values are not duplicated | Duplicate records for the same patient across multiple clinic sites are merged into a single record |
| | | Changes to the data model or data model versioning | Changes to the data model are tracked, and only inputs that match the current data model at the time of entry are allowed |
| | Computational conformance | Computed values conform to computational or programming specifications | Human-abstracted group stage and group stage calculated from abstracted T, N, M components, when available, are identical |
| Plausibility | Uniqueness plausibility | Data values that identify a single object are not duplicated | Biomarker tests are not captured in duplicate when there are multiple references to the same event in documentation |
| | Atemporal plausibility | Data values and distributions agree with an internal measurement or local knowledge (overlaps with indirect benchmarking) | First-line treatment regimens, as defined according to line of therapy business rules, reflect expected clinical practice as described by NCCN guidelines |
| | | Data values and distributions for independent measurements of the same or related facts agree | Date of treatment discontinuation for disease progression is in close proximity to date of progression documented on imaging |
| | | Logical constraints between values agree with local or common knowledge (includes "expected" missingness) | Patients receiving TRK inhibitor therapy have documentation of an NTRK fusion |
| | | Biologic plausibility of different values is in agreement with local or common knowledge | Coexistence of EGFR and KRAS mutations is rare |
| | | Values of repeated measurement of the same fact show expected variability | Time between repeated response assessments is generally aligned to intervals recommended by NCCN guidelines; however, shorter and longer intervals are also present in line with real-world practice patterns |
| | Temporal plausibility | Observed or derived values conform to expected temporal properties | Initial diagnosis date precedes metastatic diagnosis date for patients whose cancer stage at initial diagnosis is nonadvanced |
| | | Sequences of values that represent state transitions conform to expected properties | Real-world progression events are followed by a logical clinical event, such as change in treatment, referral to hospice, or death |
| | | Measures of data value density against a time-oriented denominator are expected on the basis of internal or common knowledge | PD-L1 testing events become more frequent after approval of a therapy for which PD-L1 positivity is required by indication |
| Consistency | Cross-field consistency | Data are consistent across multiple fields or data sources | Patients documented as having brain metastases at initial diagnosis are also identified as having stage IV disease |
| | Temporal consistency | Data from recurring or refreshed databases are consistent over time | Frequency of PSA values within a given site shows minimal month-over-month variation |
| | Agreement | Duplicate capture of the same data point by different processes or individuals results in the same values | Two abstractors agree on the discontinuation date and reason for discontinuation of the same drug |
| | Reproducibility | Repeat use of operational data capture algorithms will result in the same or similar results | Performance of a smoking status variable leads to a consistent extracted result each time it is used on the same or similar tasks |

NOTE. Modified from Kahn et al (reference 25).

Abbreviations: AJCC, American Joint Committee on Cancer; EGFR, epidermal growth factor receptor; NCCN, National Comprehensive Cancer Network; NTRK, neurotrophic tyrosine receptor kinase; PD-L1, programmed death ligand 1; PSA, prostate-specific antigen; TRK, tropomyosin receptor kinase.
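
As an illustration only, several of the example checks in Table 3 can be expressed as simple record-level rules. The sketch below is not Flatiron Health's implementation. The record layout, every field name (`initial_dx_date`, `metastatic_dx_date`, `stage_at_initial_dx`, `brain_mets_at_initial_dx`, `received_trk_inhibitor`, `ntrk_fusion_documented`), and the simplification that stages I to III count as "nonadvanced" are assumptions made for the example. It covers one value conformance check (YYYY-MM-DD dates), one temporal plausibility check (initial diagnosis precedes metastatic diagnosis), one cross-field consistency check (brain metastases imply stage IV), and one logical-constraint check (TRK inhibitor therapy implies a documented NTRK fusion).

```python
import re
from datetime import datetime

# Strict YYYY-MM-DD shape; strptime alone would also accept "2020-3-1".
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")

# Assumption for this sketch only: stages I-III are treated as nonadvanced.
NONADVANCED_STAGES = {"I", "II", "III"}


def parse_iso_date(value):
    """Return a date if value is a valid YYYY-MM-DD string, else None."""
    if not isinstance(value, str) or not ISO_DATE.fullmatch(value):
        return None
    try:
        return datetime.strptime(value, "%Y-%m-%d").date()
    except ValueError:  # right shape, impossible date (e.g. 2021-02-30)
        return None


def verify(record):
    """Return the names of the checks that a (hypothetical) record fails."""
    failures = []

    # Value conformance: dates are recorded as YYYY-MM-DD.
    initial = parse_iso_date(record.get("initial_dx_date"))
    if initial is None:
        failures.append("value_conformance:initial_dx_date")
    met_raw = record.get("metastatic_dx_date")  # may legitimately be absent
    metastatic = parse_iso_date(met_raw) if met_raw is not None else None
    if met_raw is not None and metastatic is None:
        failures.append("value_conformance:metastatic_dx_date")

    # Temporal plausibility: initial diagnosis precedes metastatic diagnosis
    # for patients who were nonadvanced at initial diagnosis.
    stage = record.get("stage_at_initial_dx")
    if stage in NONADVANCED_STAGES and initial and metastatic:
        if not initial < metastatic:
            failures.append("temporal_plausibility:dx_order")

    # Cross-field consistency: brain metastases at diagnosis imply stage IV.
    if record.get("brain_mets_at_initial_dx") and stage != "IV":
        failures.append("cross_field_consistency:brain_mets_stage")

    # Logical constraint: TRK inhibitor therapy implies an NTRK fusion.
    if record.get("received_trk_inhibitor") and not record.get("ntrk_fusion_documented"):
        failures.append("atemporal_plausibility:trk_without_ntrk")

    return failures


clean = {
    "initial_dx_date": "2020-03-01",
    "metastatic_dx_date": "2021-06-15",
    "stage_at_initial_dx": "II",
    "brain_mets_at_initial_dx": False,
    "received_trk_inhibitor": True,
    "ntrk_fusion_documented": True,
}
flagged = {
    "initial_dx_date": "03/01/2020",
    "metastatic_dx_date": "2019-01-01",
    "stage_at_initial_dx": "II",
    "brain_mets_at_initial_dx": True,
    "received_trk_inhibitor": True,
    "ntrk_fusion_documented": False,
}

for name, rec in (("clean", clean), ("flagged", flagged)):
    print(name, verify(rec))
```

A failed check in this sketch marks a record for review; it does not by itself establish that the data are wrong. Several rows of the table (for example, the rarity of coexisting EGFR and KRAS mutations) describe expectations about distributions rather than hard per-record rules, and those would need cohort-level summaries instead of the per-record logic shown here.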