Table 1.
Identifier type | Instances in the corpus | Instances replaced by the system | System recall* | Residual identifier instances in the corpus | Reasonable opportunity to test HIPS?† |
A | B | C | D | E | F |
HIPAA PHI | | | | | |
Pat. name | 35 | 29 | 83% | 6 | Yes |
Age | 86 | 79 | 92% | 7 | Yes |
Phone # | 2 | 0 | 0% | 2 | No |
Address | 6 | 4 | 67% | 2 | No |
Date | 180 | 163 | 91% | 17 | Yes |
MRN | 3 | 0 | NA | 3 | No |
Acct. # | 1 | 0 | NA | 1 | No |
Other ID #s | 10 | 1 | NA | 9 | No |
Subtotal | 323 | 276 | 85% | 47 | |
Other PHI | | | | | |
MD name | 82 | 73 | 89% | 9 | Yes |
Org. name | 27 | 7 | 26% | 20 | No |
Subtotal | 109 | 80 | 73% | 29 | |
*System recall was deliberately degraded by using a suboptimal training set, which increased the number of residual PHI instances available for the experiments.
†We defined a reasonable opportunity to test the HIPS approach as a system recall (col. D) of roughly 0.5 (50%) or higher together with roughly 10 or more residual PHI instances in the corpus (col. E).
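As a consistency check on the arithmetic in Table 1, the short Python sketch below recomputes columns D and E from columns B and C. It assumes that recall is C/B rounded half up to the nearest whole percent and that the residual count is B minus C; the counts are transcribed from the table, while the names `recall_pct` and `summarize` are illustrative only. The table reports NA rather than a percentage for MRN, Acct. # and Other ID #s, whereas this sketch prints the raw ratio for those rows. The subtotals include every row and reproduce the reported 85% (276/323) and 73% (80/109).

```python
# Counts transcribed from Table 1: (identifier type, B = instances in corpus, C = instances replaced)
HIPAA_PHI = [
    ("Pat. name", 35, 29),
    ("Age", 86, 79),
    ("Phone #", 2, 0),
    ("Address", 6, 4),
    ("Date", 180, 163),
    ("MRN", 3, 0),
    ("Acct. #", 1, 0),
    ("Other ID #s", 10, 1),
]
OTHER_PHI = [
    ("MD name", 82, 73),
    ("Org. name", 27, 7),
]


def recall_pct(found: int, total: int) -> int:
    """Recall (col. D) as a whole percentage, C / B rounded half up, using integer arithmetic."""
    return (200 * found + total) // (2 * total)


def summarize(rows):
    """Return (name, B, C, D, E) for each row plus a subtotal row, where E = B - C."""
    out = [(name, b, c, recall_pct(c, b), b - c) for name, b, c in rows]
    total_b = sum(b for _, b, _ in rows)
    total_c = sum(c for _, _, c in rows)
    out.append(("Subtotal", total_b, total_c,
                recall_pct(total_c, total_b), total_b - total_c))
    return out


if __name__ == "__main__":
    # Note: the raw ratio is printed even for rows the table marks as NA.
    for group in (HIPAA_PHI, OTHER_PHI):
        for name, b, c, d, e in summarize(group):
            print(f"{name:<12} B={b:>3} C={c:>3} D={d:>3}% E={e:>3}")
```

Integer rounding is used instead of the built-in `round()`, which rounds exact halves to the nearest even number; none of the ratios here fall exactly on a half, so either choice gives the same percentages.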