Linkage variables used to generate tokens
Token 1
Patient last name
Patient first initial
Patient gender
Patient DOB
Token 2
Patient last name (Soundex)
Patient first name (Soundex)
Patient gender
Patient DOB
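Token 2 uses the Soundex codes of the names in place of the exact names. As a result, spelling variants such as "Robert" and "Rupert" (both R163) still yield the same token. The sketch below assumes standard American Soundex and also builds the plaintext inputs for both tokens. Several details are assumptions for illustration only, because the table does not say how the software combines or hashes the fields:

- the field order;
- the `|` delimiter;
- the date of birth passed as `YYYYMMDD` digits;
- the keyed SHA-256 hash in the hypothetical `hash_token`, with a hypothetical secret `key`.

```python
import hashlib
import hmac
from typing import Tuple

# Standard American Soundex digit groups; A, E, I, O, U, Y, H and W are not coded.
_SOUNDEX_GROUPS = {
    "BFPV": "1",
    "CGJKQSXZ": "2",
    "DT": "3",
    "L": "4",
    "MN": "5",
    "R": "6",
}
_SOUNDEX_CODES = {ch: digit for group, digit in _SOUNDEX_GROUPS.items() for ch in group}


def soundex(name: str) -> str:
    """Return the 4-character American Soundex code of `name` ("" if it has no A-Z letters)."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    # The first letter's own code suppresses an identical code on the next letter.
    prev = _SOUNDEX_CODES.get(first, "")
    for ch in letters[1:]:
        code = _SOUNDEX_CODES.get(ch, "")
        if code:
            if code != prev:  # collapse adjacent letters that share a code
                digits.append(code)
            prev = code
        elif ch not in "HW":
            prev = ""  # vowels (and Y) separate duplicate codes; H and W do not
    return (first + "".join(digits) + "000")[:4]


def token_inputs(last: str, first: str, gender: str, dob: str) -> Tuple[str, str]:
    """Hypothetical plaintext layout for Token 1 and Token 2 (fields assumed already cleaned)."""
    token1 = "|".join([last, first[:1], gender, dob])
    token2 = "|".join([soundex(last), soundex(first), gender, dob])
    return token1, token2


def hash_token(plaintext: str, key: bytes) -> str:
    """Assumed keyed hash (HMAC-SHA256); the real hashing scheme is not given in the table."""
    return hmac.new(key, plaintext.encode("utf-8"), hashlib.sha256).hexdigest()
```

For example, `token_inputs("SMITH", "JOHN", "M", "19800102")` returns `("SMITH|J|M|19800102", "S530|J500|M|19800102")`. A keyed hash is shown rather than a bare hash so that tokens cannot be reproduced without the secret. This is a design assumption, not a documented property of the software.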
Cleaning and pre-processing of linkage variables
The de-identification software applies a series of validators and cleaners before a token is generated.
Validators
First name and last name: each must be longer than one character
Gender: must be M or F, or male or female (case-insensitive)
If any field fails a validation test, the token is not created
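The validation rules above amount to an all-or-nothing check. The following is a minimal sketch with hypothetical function names. It checks only the rules listed in the table; no date-of-birth validation is described there, so none is attempted. It also assumes the length test is applied to each value as supplied.

```python
from typing import List

# Accepted gender values from the table, compared case-insensitively.
VALID_GENDERS = {"m", "f", "male", "female"}


def validation_failures(first_name: str, last_name: str, gender: str) -> List[str]:
    """Return the names of the fields that fail validation (empty list if all pass)."""
    failures = []
    if len(first_name) <= 1:  # names must be longer than one character
        failures.append("first_name")
    if len(last_name) <= 1:
        failures.append("last_name")
    if gender.lower() not in VALID_GENDERS:
        failures.append("gender")
    return failures


def token_allowed(first_name: str, last_name: str, gender: str) -> bool:
    """A token is created only when every validated field passes."""
    return not validation_failures(first_name, last_name, gender)
```

Returning the list of failing fields, rather than a bare boolean, is a convenience for logging why a record produced no token.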
Cleaners
Removes all non-alphabetic characters. Alphabetic characters are A–Z and a–z
Removes all non-numeric characters. Numeric characters are 0–9
Removes all characters that are not a number (0–9) or letter (A–Z, a–z)
Capitalizes all alphabetic characters (a–z → A–Z)
DOB: date of birth.
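Each cleaner listed above can be written as a one-line character filter. In this minimal sketch the function names are hypothetical. So is the assignment of cleaners to fields in `clean_name` and `clean_dob`, because the table lists the cleaners but does not say which field each one is applied to. Uppercasing uses a translation table that maps exactly a–z to A–Z, matching the table's description.

```python
import re
import string

# Translation table that maps only ASCII a-z to A-Z, leaving every other character unchanged.
_ASCII_UPPER = str.maketrans(string.ascii_lowercase, string.ascii_uppercase)


def keep_alphabetic(value: str) -> str:
    """Remove every character that is not A-Z or a-z."""
    return re.sub(r"[^A-Za-z]", "", value)


def keep_numeric(value: str) -> str:
    """Remove every character that is not 0-9."""
    return re.sub(r"[^0-9]", "", value)


def keep_alphanumeric(value: str) -> str:
    """Remove every character that is not a digit (0-9) or a letter (A-Z, a-z)."""
    return re.sub(r"[^A-Za-z0-9]", "", value)


def uppercase_ascii(value: str) -> str:
    """Convert a-z to A-Z."""
    return value.translate(_ASCII_UPPER)


# Hypothetical field-level pipelines built from the cleaners above.
def clean_name(value: str) -> str:
    return uppercase_ascii(keep_alphabetic(value))


def clean_dob(value: str) -> str:
    return keep_numeric(value)
```

For example, `clean_name("  de la Cruz ")` returns `"DELACRUZ"` and `clean_dob("1980-01-02")` returns `"19800102"`.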