J Cheminform. 2016 Oct 28;8:59. doi: 10.1186/s13321-016-0172-0

Table 2.

Details on the chemical NER tools in terms of training sets, databases to which the entities are normalized, classes of chemicals addressed, and tokenization methods

NER tool	Training set	Databases	Classes	Tokenization method
tmChem [24]	CHEMDNER corpus at BioCreative IV (training and development sets)	CHEBI, MESH	SYSTEMATIC, FORMULA, FAMILY, TRIVIAL, IDENTIFIER, MULTIPLE, ABBREVIATION	Tokenization at every non-letter, non-digit character, at number-letter changes, and where a lowercase letter is followed by an uppercase letter
ChemSpot [13]	A subset of the SCAI Corpus [29] containing only IUPAC entities	ChemIDplus, CHEBI, CAS Number, PubChem, InChI, DrugBank, KEGG, Human Metabolome, MESH	SYSTEMATIC, FORMULA, FAMILY, TRIVIAL, IDENTIFIER, MULTIPLE, ABBREVIATION	Tokenization at every non-letter, non-digit character and at number-letter changes
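
The two tokenization methods in Table 2 differ only in the case-change rule, which a short sketch can make concrete. The Python below is an illustrative approximation written from the table's wording, not code taken from tmChem or ChemSpot. The function names, the ASCII-only character classes, and the choice to drop whitespace instead of emitting it as tokens are all assumptions.

```python
import re

# Assumption: ASCII letters and digits only; whitespace is dropped.
# These patterns approximate the rules described in Table 2 and are
# not the tools' actual implementations.

# Rule shared by both tools: split at every non-letter, non-digit
# character and at every number-letter change.
_CHEMSPOT_STYLE = re.compile(r"[A-Za-z]+|[0-9]+|[^A-Za-z0-9\s]")

# Extra tmChem rule: also split where a lowercase letter is followed
# by an uppercase letter. A letter token is therefore optional
# uppercase letters followed by lowercase letters, or uppercase only.
_TMCHEM_STYLE = re.compile(r"[A-Z]*[a-z]+|[A-Z]+|[0-9]+|[^A-Za-z0-9\s]")


def tokenize_chemspot_style(text):
    """Hypothetical helper: tokens under the ChemSpot-style rules."""
    return _CHEMSPOT_STYLE.findall(text)


def tokenize_tmchem_style(text):
    """Hypothetical helper: tokens under the tmChem-style rules."""
    return _TMCHEM_STYLE.findall(text)


if __name__ == "__main__":
    print(tokenize_chemspot_style("CaCl2"))  # ['CaCl', '2']
    print(tokenize_tmchem_style("CaCl2"))    # ['Ca', 'Cl', '2']
    print(tokenize_tmchem_style("1,2-dichloroethane"))
    # ['1', ',', '2', '-', 'dichloroethane']
```

Under these assumptions, the finer tmChem-style split separates element symbols in formulas such as CaCl2, while the ChemSpot-style split keeps the letter run together.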