Table 3.

| Pseudonymisation | De-identification | Anonymisation |
|---|---|---|
| • Attributes are replaced with pseudonyms on a one-to-one correspondence • Never an effective means of anonymisation • A security-enhancing measure • Pseudonyms bear no relation to the patient details • Preferably reversible | • Stripping datasets of patient-identifying variables according to either: ○ the HIPAA 'Safe Harbor' method, 18 items (US) ○ Hrynaszkiewicz et al., 28 items of personal and clinical information (Europe) | • Any given record lacks individuality, distinction or recognisability • Can potentially distort data • The link with the original dataset should be destroyed • Set at a level that reaches an acceptable risk, but binary in law |

Most common definitions for data manipulation techniques^a

| Suppression (removal, elimination) | Recoding (grouping, masking, replacement, generalisation, blurring, aggregation) | Recalculation | Perturbation |
|---|---|---|---|
| • Delete outliers • Delete free text • Delete high-risk variables • Delete high-risk records | • Keep first three digits of postcode • Categorise age (18–40 and ≥40) | • Show age instead of DOB • Show study day relative to randomisation day instead of date (e.g. day 7) • When dates are important, present them offset | • Add random noise to variables • Replace data with simulated random values • Data shuffling • Rounding of variables |

HIPAA: Health Insurance Portability and Accountability Act; DOB: date of birth.
^a Tuck et al. 45 and Tudur-Smith et al. 48 also mentioned removing superfluous data (e.g. deleting audit trails) to supplement these data manipulation techniques.
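The pseudonymisation column can be illustrated with a minimal sketch, assuming identifiers are plain strings and that a random token per patient is acceptable. The hypothetical `Pseudonymiser` class replaces each identifier with a random pseudonym on a one-to-one basis and keeps a separate key table so the mapping stays reversible. The class name, the `P-` prefix, the token length and the `MRN-…` identifiers are illustrative assumptions, not taken from the source.

```python
import secrets


class Pseudonymiser:
    """Hypothetical helper: one-to-one, reversible replacement of identifiers.

    Pseudonyms are random tokens, so they bear no relation to the patient
    details. The reverse table is the "key"; in practice it would be held
    separately from any shared dataset.
    """

    def __init__(self, token_bytes: int = 8) -> None:
        self._token_bytes = token_bytes
        self._forward: dict[str, str] = {}  # identifier -> pseudonym
        self._reverse: dict[str, str] = {}  # pseudonym -> identifier (the key table)

    def pseudonymise(self, identifier: str) -> str:
        # The same patient always receives the same pseudonym.
        if identifier in self._forward:
            return self._forward[identifier]
        # Draw random tokens until an unused one is found, keeping the mapping one-to-one.
        while True:
            token = "P-" + secrets.token_hex(self._token_bytes)
            if token not in self._reverse:
                break
        self._forward[identifier] = token
        self._reverse[token] = identifier
        return token

    def reidentify(self, pseudonym: str) -> str:
        # Reversal is only possible for whoever holds the key table.
        return self._reverse[pseudonym]


if __name__ == "__main__":
    p = Pseudonymiser()
    raw = [("MRN-0001", 128), ("MRN-0002", 141)]  # hypothetical (identifier, systolic BP) pairs
    shared = [{"patient": p.pseudonymise(mrn), "sbp": sbp} for mrn, sbp in raw]
    print(shared)
```

Because the tokens are random rather than derived from the identifier, they carry no patient information. The key table still allows re-identification, however, which is why the table treats pseudonymisation as a security measure and not as anonymisation.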
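The four data manipulation techniques can be applied together to a single toy record. This is a minimal sketch under stated assumptions: the field names, the high-risk variable list, computing age at randomisation, counting randomisation day as day 0 and the noise level (SD of 1 kg) are all hypothetical choices. The table's example age bands share the boundary 40, so here age 40 is assigned to the upper band, and an adult population is assumed.

```python
import random
from datetime import date, timedelta

# Hypothetical high-risk variables; a real list would come from a risk assessment.
HIGH_RISK_VARIABLES = {"name", "free_text_notes"}


def suppress(record: dict) -> dict:
    """Suppression: return a copy without free text and other high-risk variables."""
    return {k: v for k, v in record.items() if k not in HIGH_RISK_VARIABLES}


def recode_postcode(postcode: str) -> str:
    """Recoding: keep only the first three characters of the postcode."""
    return postcode.replace(" ", "")[:3]


def recode_age(age: int) -> str:
    """Recoding: collapse exact age into bands (assumes age >= 18; 40 goes to the upper band)."""
    return "18-39" if age < 40 else ">=40"


def age_in_years(dob: date, on: date) -> int:
    """Recalculation: completed years of age on a reference date, replacing the DOB."""
    return on.year - dob.year - ((on.month, on.day) < (dob.month, dob.day))


def study_day(event: date, randomisation: date) -> int:
    """Recalculation: days since randomisation (randomisation day counted as day 0 here)."""
    return (event - randomisation).days


def offset_date(d: date, offset_days: int) -> date:
    """Recalculation: shift a date by a per-patient offset that is not released."""
    return d + timedelta(days=offset_days)


def add_noise(value: float, sd: float, rng: random.Random, ndigits: int = 0) -> float:
    """Perturbation: add Gaussian noise, then round to `ndigits` decimal places."""
    return round(value + rng.gauss(0.0, sd), ndigits)


def shuffle_column(records: list, key: str, rng: random.Random) -> list:
    """Perturbation: permute one variable across records, detaching it from the rest of each record."""
    values = [r[key] for r in records]
    rng.shuffle(values)
    return [{**r, key: v} for r, v in zip(records, values)]


def transform(record: dict, rng: random.Random) -> dict:
    """Apply suppression, recoding, recalculation and perturbation to one record."""
    out = suppress(record)  # new dict, so the input record is left unchanged
    dob = out.pop("dob")
    randomised = out.pop("randomisation_date")
    event = out.pop("event_date")
    out["age_band"] = recode_age(age_in_years(dob, randomised))
    out["postcode"] = recode_postcode(out["postcode"])
    out["event_study_day"] = study_day(event, randomised)
    out["weight_kg"] = add_noise(out["weight_kg"], sd=1.0, rng=rng)
    return out


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed only so the demo is repeatable
    toy = {
        "name": "Jane Doe",
        "free_text_notes": "Lives next to the clinic",
        "postcode": "SW1A 1AA",
        "dob": date(1980, 6, 15),
        "randomisation_date": date(2020, 6, 14),
        "event_date": date(2020, 6, 21),
        "weight_kg": 72.4,
    }
    print(transform(toy, rng))
```

In this toy record the event falls on study day 7 and the exact age (39 at randomisation) is released only as a band. Whether a given combination of these steps reaches an acceptable risk depends on the dataset and is not something the sketch can decide.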