2016 Oct 28;8:59. doi: 10.1186/s13321-016-0172-0

Table 2.

Details on the chemical NER tools in terms of training sets, databases to which the entities are normalized, classes of chemicals addressed, and tokenization methods

| NER tool | Training set | Databases | Classes | Tokenization method |
| --- | --- | --- | --- | --- |
| tmChem [24] | CHEMDNER corpus at BioCreative IV (training and development sets) | CHEBI, MESH | SYSTEMATIC, FORMULA, FAMILY, TRIVIAL, IDENTIFIER, MULTIPLE, ABBREVIATION | Tokenization at every non-letter, non-digit character, at number-letter changes, and where a lowercase letter is followed by an uppercase letter |
| ChemSpot [13] | A subset of SCAI Corpus [29] containing only IUPAC | ChemIDplus, CHEBI, CAS NUMBER, PubChem, InChI, DrugBank, KEGG, Human Metabolome, MESH | SYSTEMATIC, FORMULA, FAMILY, TRIVIAL, IDENTIFIER, MULTIPLE, ABBREVIATION | Tokenization at every non-letter, non-digit character and at number-letter changes |
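The two tokenization methods listed in Table 2 differ only in the extra lowercase-to-uppercase split used by tmChem. The sketch below illustrates that difference under stated assumptions; it is not the code of either tool. The function name `tokenize`, the flag `split_case`, and both pattern names are hypothetical. It also assumes that "letter" and "digit" mean ASCII characters, that whitespace is discarded, and that every other non-alphanumeric character becomes a token of its own; the table does not specify any of these details.

```python
import re

# Assumption: only ASCII letters and digits count as "letter" and "digit".
# Basic rule: a run of letters, a run of digits, or one single
# non-alphanumeric, non-whitespace character forms a token. This breaks the
# text at every non-letter/non-digit character and at number-letter changes.
BASIC_PATTERN = re.compile(r"[A-Za-z]+|[0-9]+|[^A-Za-z0-9\s]")

# Extended rule: as above, but a letter run is also cut wherever a lowercase
# letter is directly followed by an uppercase letter.
CASE_PATTERN = re.compile(r"[A-Z]*[a-z]+|[A-Z]+|[0-9]+|[^A-Za-z0-9\s]")


def tokenize(text, split_case=False):
    """Return the tokens of `text` (hypothetical helper, illustration only).

    split_case=False approximates the rule listed for ChemSpot;
    split_case=True adds the lowercase/uppercase split listed for tmChem.
    """
    pattern = CASE_PATTERN if split_case else BASIC_PATTERN
    return pattern.findall(text)
```

Under these assumptions, `tokenize("NaCl 5mg, CaCO3")` returns `['NaCl', '5', 'mg', ',', 'CaCO', '3']`, while `tokenize("NaCl 5mg, CaCO3", split_case=True)` returns `['Na', 'Cl', '5', 'mg', ',', 'Ca', 'CO', '3']`. The finer split separates element symbols inside formulas, which is one plausible reason to add the case rule.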