. 2014 Apr 28;6:17. doi: 10.1186/1758-2946-6-17

Table 2.

Chemical text corpora for evaluating and training the NER applications

| Corpus | Class of named entities | Reference | Availability |
| --- | --- | --- | --- |
| IUPAC training corpus | IUPAC names | [2] | http://www.scai.fraunhofer.de/chem-corpora.html |
| SCAI | All chemical names | [17] | http://www.scai.fraunhofer.de/chem-corpora.html |
| PubMed corpus | Compounds, reagents, chemical adjectives, enzymes and prefixes | [18] | Not available |
| Sciborg corpus | All chemical names | [18] | Not available |
| GENIA corpus | Biological entities, plus some chemical entities | [19] | http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA |
| European Patent Office and ChEBI | All chemical names | [20] | http://chebi.cvs.sourceforge.net/viewvc/chebi/chapati/patentsGoldStandard |
| CHEMDNER corpus | Chemical compounds and drugs | [21] | http://www.biocreative.org/tasks/biocreative-iv/chemdner/ |