Table 2.
Details on the chemical NER tools in terms of training sets, databases to which the entities are normalized, classes of chemicals addressed, and tokenization methods
NER tool | Training set | Databases | Classes | Tokenization method |
---|---|---|---|---|
tmChem [24] | CHEMDNER corpus at BioCreative IV (training and development sets) | CHEBI, MESH | SYSTEMATIC, FORMULA, FAMILY, TRIVIAL, IDENTIFIER, MULTIPLE, ABBREVIATION | Tokenization at every non-letter, non-digit character, at number-letter changes, and where a lowercase letter is followed by an uppercase letter |
ChemSpot [13] | A subset of SCAI Corpus [29] containing only IUPAC | ChemIDplus, CHEBI, CAS NUMBER, PubChem, InChI, DrugBank, KEGG, Human Metabolome, MESH | SYSTEMATIC, FORMULA, FAMILY, TRIVIAL, IDENTIFIER, MULTIPLE, ABBREVIATION | Tokenization at every non-letter, non-digit character and at number-letter changes |
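
The two tokenization rules in the last column of Table 2 can be illustrated with a minimal Python sketch. It is written only from the descriptions in the table and is not the actual tmChem or ChemSpot code. The function name `tokenize`, the `split_case` flag, the restriction to ASCII letters and digits, and the choice to discard whitespace rather than emit it as a token are all assumptions made for illustration.

```python
import re

# Hypothetical pattern: a run of letters, a run of digits, or one single
# character that is neither a letter, a digit, nor whitespace.
# Assumption: ASCII only; whitespace separates tokens and is dropped.
_TOKEN = re.compile(r"[A-Za-z]+|[0-9]+|[^A-Za-z0-9\s]")

# Zero-width boundary between a lowercase and an uppercase letter.
_CASE_CHANGE = re.compile(r"(?<=[a-z])(?=[A-Z])")


def tokenize(text, split_case=False):
    """Split text at non-alphanumeric characters and letter-digit changes.

    With split_case=True, letter runs are additionally split where a
    lowercase letter is followed by an uppercase letter (the extra rule
    listed for tmChem in Table 2).
    """
    tokens = _TOKEN.findall(text)
    if not split_case:
        return tokens
    result = []
    for token in tokens:
        # re.split on a zero-width pattern requires Python 3.7 or later.
        result.extend(_CASE_CHANGE.split(token))
    return result


if __name__ == "__main__":
    print(tokenize("1,2-dichloroethane"))
    print(tokenize("NaCl 2mg"))
    print(tokenize("NaCl 2mg", split_case=True))
```

Under these assumptions, the only difference between the two settings is the case-change rule: a string such as `NaCl` stays one token when `split_case` is off and is split into `Na` and `Cl` when it is on.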