Table 1.
| VHA PHI categories | MIT deid | One-step CRF | BoB rules | BoB CRF | BoB rules+CRF | BoB full | BoB full |
|---|---|---|---|---|---|---|---|
| | R | R | R | R | R | R | P |
| Patient Name | 0.590 | 0.949 | 0.972 | 0.953 | 0.992 | 0.980 | 0.707* |
| Relative Name | 0.600 | 0.920 | 0.960 | 0.960 | 0.960 | 0.920 | |
| Healthcare Provider Name | 0.319 | 0.898 | 0.920 | 0.898 | 0.963 | 0.943 | |
| Other Person Name | 0.111 | 0.667 | 1 | 0.667 | 1 | 0.888 | |
| Street City | 0.828 | 0.802 | 0.962 | 0.872 | 0.974 | 0.943 | 0.679 |
| State Country | 0.689 | 0.824 | 0.953 | 0.757 | 0.973 | 0.878 | 0.751 |
| Deployment | 0.057 | 0.887 | 1 | – | 1 | 0.887 | 0.859 |
| ZIP Code | 1 | 1 | 1 | – | 1 | 1 | 1 |
| Healthcare Units | 0.008 | 0.732 | 0.832 | 0.755 | 0.914 | 0.811 | 0.836 |
| Other Organizations | 0.033 | 0.483 | 0.824 | 0.549 | 0.912 | 0.725 | 0.578 |
| Date | 0.399 | 0.892 | 0.963 | 0.917 | 0.977 | 0.971 | 0.934 |
| Age>89 | 0.250 | 0.500 | 1 | – | 1 | 1 | 0.8 |
| Phone Number | 0.494 | 0.835 | 0.989 | – | 0.989 | 0.956 | 1 |
| Electronic Address | 1 | 0.500 | 1 | – | 1 | 1 | 1 |
| SSN | 1 | 0.407 | 1 | – | 1 | 1 | 0.964 |
| Other ID Number | 0.117 | 0.822 | 0.978 | – | 0.978 | 0.917 | 0.831 |
| Overall macro-averaged | 0.468 | 0.757 | 0.960 | – | 0.977 | 0.926 | 0.841 |
| Overall micro-averaged | | | | | | | |
| Precision | 0.311 | 0.920 | 0.362 | – | 0.346 | 0.836 | |
| Recall | 0.350 | 0.842 | 0.928 | – | 0.961 | 0.922 | |
| F1 measure | 0.329 | 0.879 | 0.521 | – | 0.509 | 0.877 | |
| F2 measure | 0.341 | 0.856 | 0.707 | – | 0.709 | 0.904 | |
*BoB annotates all person names as one PHI category.
CRF, conditional random fields; P, precision; PHI, protected health information; R, recall; VHA, Veterans Health Administration.
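
The micro-averaged F1 and F2 rows can be checked against the precision and recall rows. The sketch below assumes the standard F-beta definition, F_beta = (1 + beta^2) · P · R / (beta^2 · P + R), which the table itself does not spell out. The helper name `f_beta` and the `micro` dictionary are illustrative only; the numbers are copied from the table.

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (standard F-beta, assumed here)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Micro-averaged (precision, recall) pairs copied from Table 1.
# BoB CRF is omitted because the table reports no value for it.
micro = {
    "MIT deid": (0.311, 0.350),
    "One-step CRF": (0.920, 0.842),
    "BoB rules": (0.362, 0.928),
    "BoB rules+CRF": (0.346, 0.961),
    "BoB full": (0.836, 0.922),
}

# F1 weights precision and recall equally; F2 weights recall more heavily.
scores = {
    name: (f_beta(p, r, beta=1.0), f_beta(p, r, beta=2.0))
    for name, (p, r) in micro.items()
}

for name, (f1, f2) in scores.items():
    print(f"{name:14s} F1={f1:.3f} F2={f2:.3f}")
```

Under this assumption the recomputed values agree with the reported F1 and F2 rows to within about 0.001, a gap consistent with the inputs having been rounded to three decimals. Because F2 favors recall, the high-recall, low-precision configurations (BoB rules, BoB rules+CRF) score noticeably better on F2 than on F1.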