. 2017 Sep 7;46(5):1699–1710. doi: 10.1093/ije/dyx177

Box 1. Summary of approaches to evaluating linkage quality

| | Using a gold standard dataset to quantify false matches and missed matches | Comparing characteristics of linked and unlinked data to identify potential sources of bias | Sensitivity analyses to evaluate how sensitive results are to changes in linkage procedure |
| --- | --- | --- | --- |
| Purpose | To quantify errors (missed matches and false matches) | To identify subgroups of records that are more prone to linkage error and are potential sources of bias | To assess the extent to which results of interest may vary depending on different levels of error, and the direction of likely bias |
| Strengths | Easily interpretable; allows linkage error to be fully measured | Straightforward to implement and easily interpretable | Straightforward to implement |
| Limitations | Representative gold standard data are rarely available | Cannot be applied if systematic differences are expected between linked and unlinked records (e.g. if linking to death register) | Results may be difficult to interpret as false matches and missed matches may impact on results in opposing or compounding ways |
| Technical requirements | A representative group of records for which true match status is known; data linker capacity to perform evaluation (researchers rarely have access to gold standard data) | A linkage design where all records in at least one file are expected to link; provision of record-level or aggregate characteristics of unlinked records to researchers | Provision of information on the strength of the match (e.g. deterministic rule or probabilistic match weight) |
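
As an illustration of the first and third approaches in Box 1, the following minimal Python sketch computes a false-match rate and a missed-match rate against a gold standard, then repeats the calculation at several match-weight thresholds as a simple sensitivity analysis. It is a sketch under stated assumptions rather than a method taken from the article: the `Pair` structure, the `linkage_errors` function, the toy data and the thresholds are all hypothetical. The false-match rate is assumed here to be the proportion of accepted links that are not true matches, and the missed-match rate the proportion of true matches that were not linked; other definitions are in use.

```python
from typing import Iterable, NamedTuple, Tuple


class Pair(NamedTuple):
    """A candidate record pair (hypothetical structure for illustration)."""
    weight: float      # probabilistic match weight assigned by the linkage
    true_match: bool   # true match status taken from the gold standard


def linkage_errors(pairs: Iterable[Pair], threshold: float) -> Tuple[float, float]:
    """Return (false_match_rate, missed_match_rate) at a given threshold.

    Pairs with weight >= threshold are treated as links.
    Assumed definitions:
      false-match rate  = false links / all links made
      missed-match rate = unlinked true matches / all true matches
    """
    pairs = list(pairs)
    linked = [p for p in pairs if p.weight >= threshold]
    false_links = sum(1 for p in linked if not p.true_match)
    true_total = sum(1 for p in pairs if p.true_match)
    missed = sum(1 for p in pairs if p.true_match and p.weight < threshold)

    false_match_rate = false_links / len(linked) if linked else 0.0
    missed_match_rate = missed / true_total if true_total else 0.0
    return false_match_rate, missed_match_rate


# Invented toy gold standard data, for illustration only.
gold = [
    Pair(12.0, True),
    Pair(9.5, True),
    Pair(8.0, False),
    Pair(6.0, True),
    Pair(3.0, False),
    Pair(1.0, False),
]

# Sensitivity analysis: vary the threshold and observe how the errors trade off.
for threshold in (10.0, 7.0, 2.0):
    fmr, mmr = linkage_errors(gold, threshold)
    print(f"threshold={threshold:>4}: false-match rate={fmr:.2f}, "
          f"missed-match rate={mmr:.2f}")
```

In this toy example, lowering the threshold reduces missed matches at the cost of more false matches. In a real study the analysis of interest would be re-estimated at each threshold, since, as Box 1 notes, the two error types may affect results in opposing or compounding ways.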