2022 Jun 22;19(4):452–463. doi: 10.1177/17407745221087469

Table 3.

Most common definitions for anonymisation, de-identification and pseudonymisation.

Pseudonymisation
• Attributes are replaced with pseudonyms on a one-to-one correspondence
• It is never an effective means of anonymisation
• A security-enhancing measure
• Pseudonyms bear no relation to the patient details
• Preferably reversible
De-identification
• Stripping datasets of patient-identifying variables as per either:
 ○ HIPAA 18 items ‘Safe Harbor’ method (US)
 ○ Hrynaszkiewicz et al. 28 items of personal and clinical information (Europe)
Anonymisation
• Any given record lacks any individuality, distinction or recognisability
• Can potentially distort data
• The link with the original dataset should be destroyed
• Set at a level to reach acceptable risk, but binary in law
Most common definitions for data manipulation techniques (a)
Suppression (removal, elimination) Recoding (grouping, masking, replacement, generalisation, blurring, aggregation) Recalculation Perturbation
• Delete outliers
• Delete free-text
• Delete high-risk variables
• Delete high-risk records
• Keep first three digits of postcode
• Categorise age (18–40 and ≥40)
• Show age instead of DOB
• Show study day relative to randomisation day, instead of date (e.g. day 7)
• When dates are important they are presented offset
• Add random noise to variables
• Replace data with simulated random values
• Data shuffling
• Rounding of variables

HIPAA: Health Insurance Portability and Accountability Act.

(a) Tuck et al. 45 and Tudur-Smith et al. 48 mentioned the removal of superfluous data (e.g. deletion of data such as audit trails) to supplement data manipulation techniques.
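
As an illustration only, several of the techniques listed in Table 3 can be sketched in a few lines of Python using the standard library. Every function name, field name and cut-off below is a hypothetical choice made for this sketch and is not taken from the reviewed guidance. In particular, the age bands (18 to 40, over 40) and the convention that the randomisation day is day 0 are assumptions, and the sketch says nothing about whether the resulting data would meet any legal standard of anonymisation.

```python
import random
import secrets
from datetime import date


def make_pseudonymiser():
    """Pseudonymisation: one-to-one random pseudonyms unrelated to patient details.

    Returns the mapping function and the lookup table. Keeping the table
    (stored separately) is what makes the pseudonymisation reversible.
    """
    lookup = {}

    def pseudonymise(identifier):
        if identifier not in lookup:
            lookup[identifier] = secrets.token_hex(8)  # random, not derived from the input
        return lookup[identifier]

    return pseudonymise, lookup


def suppress(record, high_risk_fields):
    """Suppression: delete high-risk variables (e.g. free text) from a record."""
    return {k: v for k, v in record.items() if k not in high_risk_fields}


def recode_postcode(postcode):
    """Recoding: keep only the first three characters of a postcode."""
    return postcode.replace(" ", "")[:3]


def recode_age(age):
    """Recoding: categorise age. Bands are an assumption for this sketch."""
    if age < 18:
        raise ValueError("sketch assumes adult participants")
    return "18-40" if age <= 40 else ">40"


def age_at(dob, reference):
    """Recalculation: show age in whole years instead of date of birth."""
    had_birthday = (reference.month, reference.day) >= (dob.month, dob.day)
    return reference.year - dob.year - (0 if had_birthday else 1)


def study_day(event, randomisation):
    """Recalculation: study day relative to randomisation (assumed to be day 0)."""
    return (event - randomisation).days


def add_noise(value, sd, rng):
    """Perturbation: add zero-mean Gaussian noise to a numeric variable."""
    return value + rng.gauss(0, sd)


def shuffle_column(values, rng):
    """Perturbation: data shuffling, i.e. permute one variable across records."""
    shuffled = list(values)
    rng.shuffle(shuffled)
    return shuffled
```

For example, study_day(date(2021, 3, 8), date(2021, 3, 1)) gives 7, matching the "day 7" illustration in the table. Passing an explicit random.Random instance to the perturbation helpers keeps a run reproducible for auditing, although a fixed seed would itself need protecting, since anyone holding it could regenerate the same noise.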