Harbin Institute of Technology | OpenNLP, CRF++ | Regular expressions for tokenization | CRF: lexical, syntactic, orthographic | |
Harbin Institute of Technology Shenzhen Graduate School | MedEx | Regular expressions for categories such as PHONE, FAX, MEDICAL RECORD, EMAIL and IPADDR | CRFs: bag-of-words; part-of-speech (POS) tags; combinations of tokens and POS tags; sentence information; affixes; orthographical features; word shapes; section information; dictionary features | |
Kaiser Permanente | MIST, Stanford NER | Regular expressions for categories such as PHONE, EMAIL, ZIP | MIST, Stanford NER; features not mentioned | Personal de-id corpus |
LIMSI-CNRS | TreeTagger, MEDINA toolkit | Rules to correct output of CRF | CRF: surface features, morpho-syntactic, semantic, distributional | |
UNIMAN | Pre-processing: cTAKES and GATE | JAPE system: orthographic, pattern, contextual, entity | CRF: lexical, orthographic, semantic, positional | Dictionaries collected from Wikipedia, GATE, and deid |
Newfoundland | Python packages NumPy and SciPy | | Non-parametric Bayesian hidden Markov model: token, word token, number token | |
Nottingham | Pre-processing; CRF++ | Yes, for categories such as FAX, EMAIL, DEVICE, BIOID | CRF: word-token, context, orthographic, sentence-level, task-specific | Self-compiled dictionary |
San Marcos | | Used for all categories of PHI | | |