
Table 2.

Overview of all data quality assessment methods with definitions.

Assessment M^a | Assessment technique in reviews | Explanation
M1 Linkages—other data sets
  • Percentage of eligible population included in the data set.

M2 Comparison of distributions
  • Difference in means and other statistics.

M3 Case duplication
  • Number and percentage of cases with >1 record.

M4 Completeness of variables
  • Percentage of cases with complete observations of each variable.

M5 Completeness of cases
  • Percentage of cases with complete observations for all variables.
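
For concreteness, a minimal sketch of how M4 and M5 can be computed, assuming a pandas data frame stands in for the data set (the column names and values are invented for illustration):

```python
import pandas as pd

# Hypothetical registry extract; the column names are illustrative only.
df = pd.DataFrame({
    "patient_id": [1, 2, 3, 4],
    "diagnosis_date": ["2020-01-05", None, "2020-03-19", "2020-04-02"],
    "morphology_code": ["8140/3", "8070/3", None, "8500/3"],
})

# M4, completeness of variables: percentage of cases with a non-missing
# observation of each variable.
per_variable = df.notna().mean() * 100

# M5, completeness of cases: percentage of cases with complete
# observations for all variables.
complete_cases = df.notna().all(axis=1).mean() * 100

print(per_variable)                              # patient_id 100.0, others 75.0
print(f"Complete cases: {complete_cases:.1f}%")  # 50.0%
```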

M6 Distribution comparison
  • Distributions or summary statistics of aggregated data from the data set are compared with the expected distributions for the clinical concepts of interest.

M7 Gold standard
  • A data set drawn from another source or multiple sources is used as a gold standard.

M8 Historic data methods
  • Stability of incidence rates over time.

  • Comparison of incidence rates in different populations.

  • Shape of age-specific curves.

  • Incidence rates of childhood cancers.

M9 M:I^b
  • Comparing the number of deaths, sourced independently from the registry, with the number of new cases recorded for a specific period.
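
A worked sketch of the M9 check, assuming death counts sourced independently of the registry and incidence counts for the same period and population (all figures are invented):

```python
def mortality_incidence_ratio(deaths: int, new_cases: int) -> float:
    """M9: deaths sourced independently of the registry divided by
    new cases registered over the same period and population."""
    return deaths / new_cases

# Illustrative figures: 450 deaths from vital statistics against
# 1,200 newly registered cases in the same year.
ratio = mortality_incidence_ratio(450, 1200)
print(f"M:I = {ratio:.2f}")  # 0.38; a ratio well above the level expected
# from known survival would suggest under-registration of new cases.
```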

M10 Number of sources and notifications per case
  • Using many sources reduces the possibility of diagnoses going unreported, thus increasing the completeness of cases.

M11 Capture-recapture method
  • A statistical method using multiple independent samples to estimate the size of an entire population.
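
One common instantiation of M11 is the two-source Chapman estimator; the sketch below assumes two independent case-finding sources, and the counts are invented:

```python
def chapman_estimate(n1: int, n2: int, m: int) -> float:
    """Two-source capture-recapture (Chapman estimator).
    n1, n2: cases found by each source; m: cases found by both.
    Returns the estimated total number of cases in the population."""
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1

# Illustrative counts: 800 cases via hospital records, 650 via pathology
# reports, 500 captured by both sources.
n_hat = chapman_estimate(800, 650, 500)
observed = 800 + 650 - 500  # distinct cases actually registered
print(f"Estimated total: {n_hat:.0f}")          # ~1040
print(f"Completeness: {observed / n_hat:.1%}")  # ~91.4%
```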

M12 Death certificate method
  • This method requires that death certificate cases can be explicitly identified in the data set and uses the M:I ratio to estimate the proportion of initially unregistered cases.

M13 Histological verification of diagnosis
  • The percentage of cases morphologically verified is a measure of the completeness of the diagnostic information.

M14 Independent case ascertainment
  • Rescreening the sources used to detect any cases missed during the registration process.

M15 Data element agreement
  • Two or more elements within a data set are compared to check if they report the same or compatible information.

M16 Data source agreement
  • Data from the data set are cross-referenced with another source to check for agreement.

M17 Conformance check
  • Check the uniqueness of objects that should not be duplicated; the data set's agreement with prespecified or additional structural constraints; and the agreement of object concepts, formats, and granularity between ≥2 data sources.

M18 Element presence
  • A determination is made as to whether desired or expected data elements are present.

M19 Not specified
  • Number of consistent values and number of total values.

M20 International standards for classification and coding
  • For example, for neoplasms, the International Classification of Diseases for Oncology provides coding of topography, morphology, behavior, and grade.

M21 Incidence rate
  • Not specified

M22 Multiple primaries
  • The extent to which a distinction is made between new cases and those that represent an extension or recurrence of an existing one.

M23 Incidental diagnosis
  • Screening aims to detect cases that are asymptomatic.

  • Autopsy diagnosis without any suspicion of the disease before death.

M24 Not specified
  • 1 − ratio of violations of a specific consistency type to the total number of consistency checks.
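
Read as a score, the M24 measure is one minus the violation ratio; a minimal sketch with invented counts:

```python
def consistency_score(violations: int, checks: int) -> float:
    """M24: 1 minus the ratio of violations of a specific consistency
    type to the total number of consistency checks performed."""
    return 1 - violations / checks

# Illustrative: 12 date-ordering violations across 4,000 checks.
print(f"Score: {consistency_score(12, 4000):.4f}")  # 0.9970
```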

M25 Validity check
  • Data in the data set are assessed using various techniques that determine whether the values “make sense.”

M26 Reabstracting and recoding
  • Reabstracting describes the process of independently reabstracting records from a given source, coding the data, and comparing the abstracted and coded data with the information recorded in the database. For each reabstracted data item, the auditor’s codes are compared with the original codes to identify discrepancies.

  • Recoding involves independently reassigning codes to abstracted text information and evaluating the level of agreement with records already in the database.

M27 Missing information
  • The proportion of registered cases with unknown values for various data items.

M28 Internal consistency
  • The proportion of registered cases whose values for related data items are mutually inconsistent.

M29 Domain check
  • Proportion of observations outside plausible range (%).
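
A minimal sketch of an M29 domain check, assuming a numeric variable with a prespecified plausible range (the range and values are illustrative):

```python
import pandas as pd

# Illustrative observations and plausible range for age at diagnosis.
ages = pd.Series([34, 58, 61, 142, -3, 77])
low, high = 0, 120

# M29: proportion of observations outside the plausible range (%).
outside = ~ages.between(low, high)
print(f"Out of range: {outside.mean() * 100:.1f}%")  # 33.3%
```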

M30 Interrater variability
  • Proportion of observations in agreement (%).

  • Kappa statistics.
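
A minimal sketch of both M30 measures, percent agreement and Cohen's kappa, for two raters coding the same cases (the rater data are invented; a statistics library would typically be used in practice):

```python
from collections import Counter

def percent_agreement(r1, r2):
    """M30: proportion of observations on which two raters agree (%)."""
    return sum(a == b for a, b in zip(r1, r2)) / len(r1) * 100

def cohens_kappa(r1, r2):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(r1)
    p_obs = sum(a == b for a, b in zip(r1, r2)) / n
    c1, c2 = Counter(r1), Counter(r2)
    p_exp = sum(c1[k] * c2[k] for k in set(c1) | set(c2)) / n ** 2
    return (p_obs - p_exp) / (1 - p_exp)

# Illustrative topography codes assigned independently by two abstractors.
rater1 = ["C50", "C34", "C50", "C18", "C34", "C50"]
rater2 = ["C50", "C34", "C18", "C18", "C50", "C50"]
print(f"Agreement: {percent_agreement(rater1, rater2):.1f}%")  # 66.7%
print(f"Kappa: {cohens_kappa(rater1, rater2):.2f}")            # 0.48
```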

M31 Log review
  • Information on the actual data entry practices (eg, dates, times, and edits) is examined.

M32 Syntactic accuracy
  • Not specified.

M33 Log review
  • Information on the actual data entry practices (eg, dates, times, and edits) is examined.

  • Time at which data are stored in the system.

  • Time of last update.

  • User survey.

M34 Not specified
  • Ratio: number of reports sent on time divided by total reports.

M35 Not specified
  • Ratio: number of data values divided by the overall number of values.

M36 Time to availability
  • The interval between date of diagnosis (or date of incidence) and the date the case was available in the registry or data set.

M37 Security analyses
  • Analyses of access reports.

M38 Not specified
  • Descriptive qualitative measures based on group interviews and interpreted using grounded theory.

^a M: method.

^b M:I: mortality:incidence ratio.