Table 1.
Principles to assist experts in the determination of the identifiability of health information
Principle | Description | Examples |
---|---|---|
Replication | Prioritize health information features into levels of risk according to the chance that they will consistently occur in relation to the individual | Low: Results of a patient’s blood glucose level test will vary. High: Demographics of a patient (e.g., birthdate) are relatively static. |
Resource availability | Determine which external resources contain the patients’ identifiers and the replicable features in the health information, as well as who is permitted access to these resources | Low: The results of laboratory reports are not often disclosed with identity beyond healthcare environments. High: Patient identity and demographics are often in public resources, such as vital records (birth, death, and marriage registries). |
Distinguishability | Determine the extent to which the subject’s data can be distinguished if health data is disseminated | Low: It has been estimated that the combination of Year of Birth, Gender, and 3-Digit ZIP Code is unique for approximately 0.04% of residents in the United States (Sweeney, 2007). This means that very few residents could be identified through this combination of data alone. High: It has been estimated that the combination of a patient’s Date of Birth, Gender, and 5-Digit ZIP Code is unique for over 50% of residents in the United States (Golle, 2006; Sweeney, 2002a, b). This means that over half of US residents could be uniquely described just with these three data elements. |
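
The distinguishability principle can be illustrated with a minimal sketch that measures what share of records is unique on a given combination of features. It is not the method used in the cited studies. The records, the field names (`dob`, `gender`, `zip5`), and the helpers `unique_fraction` and `generalize` are hypothetical and chosen only for illustration.

```python
from collections import Counter


def unique_fraction(records, quasi_identifiers):
    """Return the share of records whose combination of the given fields is unique."""
    if not records:
        return 0.0
    combos = [tuple(r[k] for k in quasi_identifiers) for r in records]
    counts = Counter(combos)
    unique = sum(1 for c in combos if counts[c] == 1)
    return unique / len(records)


def generalize(record):
    """Coarsen a record: full date of birth -> year, 5-digit ZIP -> first 3 digits."""
    return {
        "yob": record["dob"][:4],
        "gender": record["gender"],
        "zip3": record["zip5"][:3],
    }


# Hypothetical toy records, not real data.
RECORDS = [
    {"dob": "1970-03-14", "gender": "F", "zip5": "37203"},
    {"dob": "1970-11-02", "gender": "F", "zip5": "37205"},
    {"dob": "1970-03-14", "gender": "M", "zip5": "37203"},
    {"dob": "1985-07-21", "gender": "M", "zip5": "37212"},
    {"dob": "1985-01-09", "gender": "M", "zip5": "37215"},
]

# Fine-grained features: every record in the toy set is unique.
fine = unique_fraction(RECORDS, ["dob", "gender", "zip5"])

# Coarsened features: most records now share their combination with another.
coarse = unique_fraction([generalize(r) for r in RECORDS], ["yob", "gender", "zip3"])

print(f"unique on DOB/gender/ZIP5: {fine:.0%}")
print(f"unique on YOB/gender/ZIP3: {coarse:.0%}")
```

On this toy set, coarsening the date of birth and ZIP code lowers the share of unique records, which mirrors the direction of the Low and High examples in the table. The population-level percentages cited there come from the referenced studies, not from a calculation of this kind on a small sample.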