Author manuscript; available in PMC: 2016 Aug 18.

Published in final edited form as: J Biomed Inform. 2015 Jul 28;58(Suppl):S11–S19. doi: 10.1016/j.jbi.2015.06.007

Table 1.

Overview of tools, rules, machine learning, and external resources used in systems.

Team name	Tools	Rules and features	Machine learning systems and features	External resources
Harbin Institute of Technology	OpenNLP, CRF++	Regular expressions for tokenization	CRF: lexical, syntactic, orthographic
Harbin Institute of Technology Shenzhen Graduate School	MedEx	Regular expressions for categories such as PHONE, FAX, MEDICAL RECORD, EMAIL, and IPADDR	CRF: bag-of-words; part-of-speech (POS) tags; combinations of tokens and POS tags; sentence information; affixes; orthographic features; word shapes; section information; dictionary features
Kaiser Permanente	MIST, Stanford NER	Regular expressions for categories such as PHONE, EMAIL, and ZIP	MIST, Stanford NER; features not reported	Personal de-id corpus
LIMSI-CNRS	TreeTagger, MEDINA toolkit	Rules to correct CRF output	CRF: surface, morpho-syntactic, semantic, and distributional features
UNIMAN	Pre-processing: cTAKES and GATE	JAPE system: orthographic, pattern, contextual, entity	CRF: lexical, orthographic, semantic, positional	Dictionaries collected from Wikipedia, GATE, and deid
Newfoundland	Python packages NumPy and SciPy		Non-parametric Bayesian hidden Markov model: token, word token, number token
Nottingham	Pre-processing; CRF++	Rules for categories such as FAX, EMAIL, DEVICE, and BIOID	CRF: word token, context, orthographic, sentence-level, task-specific	Self-compiled dictionary
San Marcos		Rules used for all PHI categories
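Several teams in Table 1 paired a learned model with hand-written regular expressions for highly formatted PHI categories such as PHONE, EMAIL, ZIP, and IPADDR. The Python sketch below illustrates what such a rule layer might look like, under stated assumptions: the patterns in `PHI_PATTERNS` and the helpers `find_phi` and `redact` are hypothetical, far simpler than any team's actual rules, and overlapping matches are resolved by keeping the earliest, then longest, span.

```python
import re
from typing import List, Tuple

# Hypothetical, deliberately simple patterns for a few PHI categories named in
# Table 1. Illustrative only; not any participating team's actual rules.
PHI_PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "IPADDR": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "PHONE": re.compile(r"\(?\b\d{3}\)?[-.\s]\d{3}[-.]\d{4}\b"),
    "ZIP": re.compile(r"\b\d{5}(?:-\d{4})?\b"),
}

Span = Tuple[int, int, str, str]  # (start, end, label, matched text)


def find_phi(text: str) -> List[Span]:
    """Run every pattern, then keep non-overlapping spans (earliest, then longest)."""
    candidates: List[Span] = []
    for label, pattern in PHI_PATTERNS.items():
        for m in pattern.finditer(text):
            candidates.append((m.start(), m.end(), label, m.group()))
    # Sort by start offset; for equal starts, prefer the longer match.
    candidates.sort(key=lambda s: (s[0], -(s[1] - s[0])))
    kept: List[Span] = []
    last_end = 0
    for span in candidates:
        if span[0] >= last_end:  # skip anything overlapping an accepted span
            kept.append(span)
            last_end = span[1]
    return kept


def redact(text: str) -> str:
    """Replace each detected span with a [LABEL] placeholder."""
    pieces, last = [], 0
    for start, end, label, _ in find_phi(text):
        pieces.append(text[last:start])
        pieces.append(f"[{label}]")
        last = end
    pieces.append(text[last:])
    return "".join(pieces)


print(redact("Call (617) 555-1234 or email jdoe@example.org; host 10.0.0.1, ZIP 02115."))
# Call [PHONE] or email [EMAIL]; host [IPADDR], ZIP [ZIP].
```

In a hybrid system of the kind listed in Table 1, spans found this way would typically be merged with, or used to override, the statistical tagger's output for the same categories.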
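Most CRF-based entries in Table 1 describe their inputs in terms of lexical, orthographic, affix, word-shape, and context features. A minimal Python sketch of such per-token feature extraction follows. The feature names, the one-token context window, and the helpers `word_shape`, `token_features`, and `sentence_features` are assumptions made for illustration, not the feature set of any listed system. A list of per-token feature dictionaries like this is the kind of input that dictionary-based CRF wrappers typically consume; CRF++, by contrast, reads column-formatted training files and expands features through its own templates.

```python
from typing import Dict, List, Union

Feature = Union[str, bool]


def word_shape(token: str) -> str:
    """Map uppercase->X, lowercase->x, digit->d, keep punctuation, merge repeats."""
    shape: List[str] = []
    for ch in token:
        if ch.isupper():
            c = "X"
        elif ch.islower():
            c = "x"
        elif ch.isdigit():
            c = "d"
        else:
            c = ch  # punctuation such as '/', '-', '.' is kept as-is
        if not shape or shape[-1] != c:
            shape.append(c)
    return "".join(shape)


def token_features(tokens: List[str], i: int, window: int = 1) -> Dict[str, Feature]:
    """Hypothetical lexical, orthographic, affix, shape, and context features for tokens[i]."""
    tok = tokens[i]
    feats: Dict[str, Feature] = {
        "w": tok.lower(),                           # lexical
        "is_title": tok.istitle(),                  # orthographic
        "is_upper": tok.isupper(),
        "has_digit": any(c.isdigit() for c in tok),
        "prefix3": tok[:3].lower(),                 # affixes
        "suffix3": tok[-3:].lower(),
        "shape": word_shape(tok),                   # word shape
    }
    # Surrounding words, padded with sentence-boundary markers.
    for off in range(-window, window + 1):
        if off == 0:
            continue
        j = i + off
        if j < 0:
            feats[f"w[{off:+d}]"] = "<BOS>"
        elif j >= len(tokens):
            feats[f"w[{off:+d}]"] = "<EOS>"
        else:
            feats[f"w[{off:+d}]"] = tokens[j].lower()
    return feats


def sentence_features(tokens: List[str]) -> List[Dict[str, Feature]]:
    """One feature dictionary per token, in sentence order."""
    return [token_features(tokens, i) for i in range(len(tokens))]


feats = sentence_features(["Seen", "by", "Dr.", "Smith", "on", "02/14/2012"])
print(feats[3]["shape"], feats[3]["w[-1]"], feats[5]["shape"])
# Xx dr. d/d/d
```

Dictionary (gazetteer) lookups or section headers, which some teams report using, would fit this scheme as additional keys in the same per-token dictionary.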