Table 2.
VHA PHI categories | MIT deid | One-step CRF | BoB rules | BoB CRF | BoB rules+CRF | BoB full | BoB full |
---|---|---|---|---|---|---|---|
Measure | R | R | R | R | R | R | P |
Patient Name | 0.724 | 0.956 | 0.977 | 0.962 | 0.994 | 0.985 | 0.642* |
Relative Name | 0.909 | 0.939 | 0.970 | 0.970 | 0.970 | 0.939 | |
Healthcare Provider Name | 0.747 | 0.925 | 0.938 | 0.916 | 0.965 | 0.943 | |
Other Person Name | 0.867 | 0.800 | 1 | 0.800 | 1 | 0.933 | |
Street City | 0.765 | 0.728 | 0.878 | 0.798 | 0.929 | 0.887 | 0.682 |
State Country | 0.656 | 0.812 | 0.944 | 0.744 | 0.956 | 0.869 | 0.839 |
Deployment | 0.177 | 0.859 | 0.934 | – | 0.934 | 0.869 | 0.915 |
ZIP Code | 1 | 1 | 1 | – | 1 | 1 | 1 |
Healthcare Units | 0.080 | 0.716 | 0.834 | 0.748 | 0.902 | 0.798 | 0.779 |
Other Organizations | 0.098 | 0.503 | 0.798 | 0.596 | 0.880 | 0.721 | 0.606 |
Date | 0.617 | 0.922 | 0.972 | 0.938 | 0.978 | 0.972 | 0.935 |
Age>89 | 0.250 | 0.500 | 1 | – | 1 | 1 | 0.8 |
Phone Number | 0.565 | 0.810 | 0.991 | – | 0.991 | 0.939 | 1 |
Electronic Address | 1 | 0.500 | 1 | – | 1 | 1 | 1 |
SSN | 1 | 0.407 | 1 | – | 1 | 1 | 0.964 |
Other ID number | 0.094 | 0.855 | 0.983 | – | 0.983 | 0.936 | 0.82 |
Overall macro-averaged | 0.597 | 0.764 | 0.951 | – | 0.968 | 0.925 | 0.845 |
Overall micro-averaged | | | | | | | |
Precision | 0.734 | 0.931 | 0.420 | – | 0.392 | 0.815 | |
Recall | 0.489 | 0.859 | 0.933 | – | 0.957 | 0.921 | |
F1 measure | 0.587 | 0.893 | 0.579 | – | 0.556 | 0.864 | |
F2 measure | 0.524 | 0.872 | 0.749 | – | 0.743 | 0.897 | |
*BoB annotates all person names as one PHI category, so this precision value covers the four person-name rows together.
CRF, conditional random fields; P, precision; PHI, protected health information; R, recall; VHA, Veterans Health Administration.
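The summary rows of Table 2 can be checked against the reported precision and recall values. The following is a minimal sketch, assuming the standard F-beta definition, F_β = (1 + β²)·P·R / (β²·P + R), and assuming that the macro-average is the unweighted mean over the 16 PHI categories; the table does not state either formula. The names `f_beta`, `macro_average`, `MICRO`, and `MIT_DEID_RECALL` are illustrative only.

```python
from statistics import mean


def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Standard F-beta score; beta > 1 weights recall more than precision."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def macro_average(values):
    """Unweighted mean over categories (assumed definition of macro-average)."""
    return mean(values)


# Micro-averaged (precision, recall) pairs copied from Table 2.
# BoB CRF is omitted because the table reports no overall values for it.
MICRO = {
    "MIT deid": (0.734, 0.489),
    "One-step CRF": (0.931, 0.859),
    "BoB rules": (0.420, 0.933),
    "BoB rules+CRF": (0.392, 0.957),
    "BoB full": (0.815, 0.921),
}

# Per-category recall for MIT deid, copied from Table 2 in row order.
MIT_DEID_RECALL = [
    0.724, 0.909, 0.747, 0.867, 0.765, 0.656, 0.177, 1.0,
    0.080, 0.098, 0.617, 0.250, 0.565, 1.0, 1.0, 0.094,
]

if __name__ == "__main__":
    for system, (p, r) in MICRO.items():
        f1 = f_beta(p, r)
        f2 = f_beta(p, r, beta=2.0)
        print(f"{system}: F1={f1:.3f} F2={f2:.3f}")
    print(f"MIT deid macro-averaged recall: {macro_average(MIT_DEID_RECALL):.3f}")
```

Under these assumptions the recomputed values agree with the table to within 0.001. A few differ in the last digit, presumably because the published precision and recall are themselves rounded. The F2 measure weights recall more heavily than precision, which is why BoB rules+CRF (high recall, low precision) scores much higher on F2 than on F1.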