2012 Jul 6;20(2):342–348. doi: 10.1136/amiajnl-2012-001034

Table 2.

Automated de-identification system performance by identifier type in the average-density identifier corpus consisting of 50 family practice progress notes (22 525 words)

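| PHI type (A) | N PHI instances in corpus (B) | N PHI instances replaced by system (C) | System PHI recall* (D) | N residual PHI instances in corpus (E) | Reasonable opportunity to test HIPS? (F) |
|---|---|---|---|---|---|
| HIPAA PHI | | | | | |
| Pat. name | 59 | 27 | 46% | 32 | Yes |
| Age | 50 | 50 | 100% | 0 | No |
| Phone # | 3 | 3 | 100% | 0 | No |
| Address | 3 | 0 | 0% | 3 | No |
| Date | 228 | 194 | 85% | 34 | Yes |
| MRN | 0 | 0 | NA | 0 | No |
| Acct. # | 0 | 0 | NA | 0 | No |
| Other ID #s | 0 | 0 | NA | 0 | No |
| ALL HIPAA | 343 | 274 | 80% | 69 | |
| OTHER PHI | | | | | |
| MD name | 53 | 4 | 8% | 49 | No |
| Org. name | 63 | 2 | 3% | 61 | No |
| ALL OTHER | 116 | 6 | 5% | 110 | |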
* A suboptimal training set was used to degrade system recall, thereby increasing residual PHI for experimental purposes.

Criteria for inclusion in the detection experiment were system recall (col. D) ≥ ∼0.5 and N residual instances (col. E) ≥ ∼10.
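
The arithmetic behind columns D to F can be checked with a short sketch. It assumes that recall is the number of replaced instances divided by the number of instances in the corpus, and that residual instances are the difference between the two; both assumptions reproduce the values in the table. The numeric cut-offs `MIN_RECALL` and `MIN_RESIDUAL` are illustrative readings of the approximate criteria "≥ ∼0.5" and "≥ ∼10" and are not stated in the source, as are all function and variable names.

```python
# Counts copied from Table 2: (PHI type, N instances in corpus, N replaced by system).
ROWS = [
    ("Pat. name", 59, 27),
    ("Age", 50, 50),
    ("Phone #", 3, 3),
    ("Address", 3, 0),
    ("Date", 228, 194),
    ("MRN", 0, 0),
    ("Acct. #", 0, 0),
    ("Other ID #s", 0, 0),
    ("MD name", 53, 4),
    ("Org. name", 63, 2),
]

# ASSUMPTION: the footnote gives only approximate thresholds ("≥ ∼0.5", "≥ ∼10").
# 0.45 is a hypothetical tolerance chosen so that a recall of 46% counts as "about 0.5".
MIN_RECALL = 0.45
MIN_RESIDUAL = 10


def recall(total, replaced):
    """Column D: share of PHI instances replaced; None where the table shows NA."""
    return replaced / total if total else None


def residual(total, replaced):
    """Column E: PHI instances left in the corpus after de-identification."""
    return total - replaced


def eligible(total, replaced):
    """Column F under the assumed thresholds: enough recall and enough residual PHI."""
    r = recall(total, replaced)
    return (
        r is not None
        and r >= MIN_RECALL
        and residual(total, replaced) >= MIN_RESIDUAL
    )


eligible_types = [name for name, total, replaced in ROWS if eligible(total, replaced)]
```

Under these assumed thresholds, `eligible_types` contains only the two identifier types marked "Yes" in column F, patient names and dates. A stricter reading of "∼0.5" (for example exactly 0.5) would exclude patient names, so the tolerance matters.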