Table 1. Number of tokens per category in each corpus.
Category | Random corpus | Ambiguous corpus | Out-of-vocabulary corpus | Authentic corpus | Challenge corpus |
---|---|---|---|---|---|
Non-PHI | 17,874 | 19,275 | 17,875 | 112,669 | 444,127 |
Patient | 1,048 | 1,047 | 1,037 | 294 | 1,737 |
Doctor | 311 | 311 | 302 | 738 | 7,697 |
Location | 24 | 24 | 24 | 88 | 518 |
Hospital | 600 | 600 | 404 | 656 | 5,204 |
Date | 735 | 736 | 735 | 1,953 | 7,651 |
ID | 36 | 36 | 36 | 482 | 5,110 |
Phone | 39 | 39 | 39 | 32 | 271 |