2012 Jul 6;20(2):342–348. doi: 10.1136/amiajnl-2012-001034

Table 1.

Automated de-identification system performance by identifier type in the high-density identifier corpus consisting of 31 oncology progress notes (15 512 words)

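| Identifier type (A) | Instances in the corpus (B) | Instances replaced by the system (C) | System recall* (D) | Residual identifier instances in corpus (E) | Reasonable opportunity to test HIPS? (F) |
| --- | --- | --- | --- | --- | --- |
| HIPAA PHI | | | | | |
| Pat. name | 35 | 29 | 83% | 6 | Yes |
| Age | 86 | 79 | 92% | 7 | Yes |
| Phone # | 2 | 0 | 0% | 2 | No |
| Address | 6 | 4 | 67% | 2 | No |
| Date | 180 | 163 | 91% | 17 | Yes |
| MRN | 3 | 0 | NA | 3 | No |
| Acct. # | 1 | 0 | NA | 1 | No |
| Other ID #s | 10 | 1 | NA | 9 | No |
| Subtotal | 323 | 276 | 85% | 47 | |
| OTHER PHI | | | | | |
| MD name | 82 | 73 | 89% | 9 | Yes |
| Org. name | 27 | 7 | 26% | 20 | No |
| Subtotal | 109 | 80 | 73% | 29 | |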
* A suboptimal training set was used to degrade system recall, thereby increasing residual PHI for experimental purposes.

We defined a reasonable opportunity to test the HIPS approach as recall (col. D) ≥∼0.5 and N residual PHI instances in the corpus (col. E) ≥∼10.
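The recall and residual columns of Table 1 follow directly from the instance counts, and the short sketch below recomputes them as a consistency check. It assumes recall is the number of instances replaced divided by the number of instances in the corpus, rounded to a whole percentage, and that residual instances are the difference between the two counts. The function names `recall_pct` and `subtotal` are illustrative and not from the article. The sketch computes a percentage for every row, including the three ID-number rows for which the table reports NA, and it does not encode the "reasonable opportunity" criterion because its thresholds are stated only approximately (≥∼).

```python
# Counts transcribed from Table 1:
# (identifier type, instances in corpus, instances replaced by the system)
HIPAA_PHI = [
    ("Pat. name", 35, 29),
    ("Age", 86, 79),
    ("Phone #", 2, 0),
    ("Address", 6, 4),
    ("Date", 180, 163),
    ("MRN", 3, 0),          # table reports recall as NA for this row
    ("Acct. #", 1, 0),      # table reports recall as NA for this row
    ("Other ID #s", 10, 1), # table reports recall as NA for this row
]
OTHER_PHI = [
    ("MD name", 82, 73),
    ("Org. name", 27, 7),
]


def recall_pct(instances, replaced):
    """Recall as a whole-number percentage (assumed: replaced / instances)."""
    return round(100 * replaced / instances)


def subtotal(rows):
    """Return (instances, replaced, recall %, residual) summed over the rows."""
    instances = sum(n for _, n, _ in rows)
    replaced = sum(r for _, _, r in rows)
    return instances, replaced, recall_pct(instances, replaced), instances - replaced


for label, rows in (("HIPAA PHI", HIPAA_PHI), ("OTHER PHI", OTHER_PHI)):
    for name, n, r in rows:
        print(f"{name:12s} recall={recall_pct(n, r):3d}%  residual={n - r}")
    print(label, "subtotal:", subtotal(rows))
```

Under these assumptions the recomputed subtotals match the table: 323 instances, 276 replaced, 85% recall, and 47 residual for HIPAA PHI, and 109, 80, 73%, and 29 for other PHI.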