2021 May 3;6(1):e10264. doi: 10.1002/lrh2.10264

TABLE 1.

Central elements of semantic DQA conceptual model

Phase: Semantic DQ Design Principles

Construct: Clinical Data Factors
Definition:
  • Expresses clinical concept for which data quality (DQ) must be measured
  • Considers the ways in which underlying workflow affects potential variables
  • Connects clinical concepts and data provenance
Examples:
  • Hypertension can be measured through diagnoses, medications (prescriptions or administration of antihypertensives), or blood pressure measurements in EHR data

Construct: Analytic Uses
Definition:
  • Weighs the impact of the clinical concept undergoing DQ assessment
  • Considers the scope: how widely the DQ check will be implemented
Examples:
  • Main exposure variables or outcomes may be more important than minor covariates

Construct: DQ Principles
Definition:
  • Addresses the combination of established DQ theory with current needs
  • Develops roadmap to determine appropriate DQ method
  • Focuses the results of variable testing
Examples:
  • Benchmarking hypertension metrics across institutions for face validity requires a different set of tools than attempting to use external sources to test the plausibility of blood pressure values
  • Common DQ principles include outlier detection, completeness of records, variable concordance, and plausible distribution of facts

Phase: Semantic DQ Practice

Construct: Representation
Definition:
  • Translates clinical concepts to data-adapted variable definitions
Examples:
  • More precise clinical definitions should be considered; eg, hypertension defined as use of antihypertensives may be important to measure specificity, and hypertension defined as a series of blood pressure measurements allows more flexibility in analytic modeling

Construct: Assessment Lenses
Definition:
  • Supplies specific assessments to evaluate the validity of variables
Examples:
  • Common lenses to consider in clinical research are epidemiology, diagnoses, clinical care, and health care utilization

Construct: DQ Methods
Definition:
  • Applies statistical or descriptive methods to evaluate DQ principles
Examples:
  • Methods can range from simple (eg, proportions or frequency distributions) to complex (eg, PCA, clustering, or other machine learning)
  • Results can be categorical or can rely on visualization
  • Thresholds for acceptable DQ can be pre-determined or part of the applied methodology

Note: Green elements address development of clinical content for testing, while blue rows address application of DQA testing methods.
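
As an illustration of how the simpler DQ methods in Table 1 might be applied to the hypertension example, the following is a minimal sketch in Python, not the authors' implementation. It computes a completeness proportion, flags implausible blood pressure values, measures concordance between alternative hypertension definitions, and maps a result to a categorical outcome using a pre-determined threshold. The records are synthetic, and the field names, the plausibility bounds (50 to 300 mmHg), the 140 mmHg cutoff, the two-reading rule, and the 0.9 threshold are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Patient:
    """Synthetic record; field names are hypothetical, not a real EHR schema."""
    patient_id: str
    has_htn_diagnosis: bool        # diagnosis-based definition
    on_antihypertensive: bool      # medication-based definition
    systolic_readings: List[int]   # blood pressure measurements, mmHg


# Illustrative values chosen for this sketch only.
PLAUSIBLE_SYSTOLIC = (50, 300)   # assumed plausibility bounds, mmHg
ELEVATED_SYSTOLIC = 140          # assumed cutoff for an elevated reading, mmHg
MIN_ELEVATED_READINGS = 2        # "a series of" readings, assumed to mean >= 2


def completeness(patients: List[Patient]) -> float:
    """Proportion of patients with at least one systolic reading."""
    return sum(1 for p in patients if p.systolic_readings) / len(patients)


def implausible_readings(patients: List[Patient]) -> List[Tuple[str, int]]:
    """(patient_id, value) pairs that fall outside the plausible range."""
    low, high = PLAUSIBLE_SYSTOLIC
    return [
        (p.patient_id, v)
        for p in patients
        for v in p.systolic_readings
        if not low <= v <= high
    ]


def htn_by_diagnosis(p: Patient) -> bool:
    return p.has_htn_diagnosis


def htn_by_medication(p: Patient) -> bool:
    return p.on_antihypertensive


def htn_by_bp(p: Patient) -> bool:
    """Hypertension defined as a series of plausible, elevated readings."""
    low, high = PLAUSIBLE_SYSTOLIC
    elevated = [
        v for v in p.systolic_readings
        if low <= v <= high and v >= ELEVATED_SYSTOLIC
    ]
    return len(elevated) >= MIN_ELEVATED_READINGS


def concordance(
    patients: List[Patient],
    definition_a: Callable[[Patient], bool],
    definition_b: Callable[[Patient], bool],
) -> float:
    """Share of patients on whom two variable definitions agree."""
    agree = sum(1 for p in patients if definition_a(p) == definition_b(p))
    return agree / len(patients)


def grade(value: float, threshold: float) -> str:
    """Categorical result against a pre-determined threshold."""
    return "pass" if value >= threshold else "review"


patients = [
    Patient("p1", True, True, [152, 148, 160]),
    Patient("p2", False, False, [118, 122]),
    Patient("p3", True, False, []),
    Patient("p4", False, True, [145, 900, 150]),
]

print("completeness:", completeness(patients))
print("implausible:", implausible_readings(patients))
print("dx vs BP concordance:", concordance(patients, htn_by_diagnosis, htn_by_bp))
print("completeness grade:", grade(completeness(patients), 0.9))
```

Passing the definitions as functions keeps the Representation step (how hypertension is defined) separate from the DQ Methods step (how agreement is measured), so the same concordance check can be reused for any pair of definitions.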