Author manuscript; available in PMC: 2016 Aug 18.
Published in final edited form as: J Biomed Inform. 2015 Jul 28;58(Suppl):S11–S19. doi: 10.1016/j.jbi.2015.06.007

Table 1.

Overview of tools, rules, machine learning, and external resources used in systems.

| Team name | Tools | Rules and features | Machine learning systems and features | External resources |
| --- | --- | --- | --- | --- |
| Harbin Institute of Technology | OpenNLP, CRF++ | Regular expressions for tokenization | CRF: lexical, syntactic, orthographic | |
| Harbin Institute of Technology Shenzhen Graduate School | MedEx | Regular expressions for categories such as PHONE, FAX, MEDICAL RECORD, EMAIL and IPADDR | CRFs: bag-of-words; part-of-speech (POS) tags; combinations of tokens and POS tags; sentence information; affixes; orthographical features; word shapes; section information; dictionary features | |
| Kaiser Permanente | MIST, Stanford NER | Regular expressions for categories such as PHONE, EMAIL, ZIP | MIST, Stanford NER; features not mentioned | Personal de-id corpus |
| LIMSI-CNRS | Tree Tagger, MEDINA toolkit | Rules to correct output of CRF | CRF: surface features, morpho-syntactic, semantic, distributional | |
| UNIMAN | Pre-processing: cTAKES and GATE | JAPE system: orthographic, pattern, contextual, entity | CRF: lexical, orthographic, semantic, positional | Dictionaries collected from Wikipedia, GATE, and deid |
| Newfoundland | Python packages NumPy and SciPy | | Non-parametric Bayesian Hidden Markov Model: token, word token, number token | |
| Nottingham | Pre-processing; CRF++ | Yes, for categories such as FAX, EMAIL, DEVICE, BIOID | CRF: word-token, context, orthographic, sentence-level, task-specific self-compiled dictionary | |
| San Marcos | | Used for all categories of PHI | | |