2020 Aug 28;8:673. doi: 10.3389/fcell.2020.00673

Table 1.

Benchmark corpora used for analyzing BioNER systems.

| Corpus | Year | Text type | Training data type | Data size |
|---|---|---|---|---|
| ChEBI (Shardlow et al., 2018) | 2018 | Abstracts/Full text | Chemical Entities of Biological Interest | 199 abstracts/100 full texts (15,000 mentions) |
| CHEMDNER (Krallinger et al., 2015) | 2015 | PubMed abstracts | Chemicals and Drugs | 10,000 (84,355 entity mentions) |
| NCBI Disease (Dogan et al., 2014) | 2014 | PubMed abstracts | Diseases | 793 (6,892 disease mentions) |
| CRAFT (Bada et al., 2012) | 2012 | Full text | Cell Type, Chemical Entities of Biological Interest, NCBI Taxonomy, Protein, Sequence, Gene, DNA, RNA | 97 (140,000 annotations) |
| AnEM (Ohta et al., 2012) | 2012 | Abstracts/Full text | Pathology, Anatomical Structures/Substances | 500 (3,000 mentions) |
| NaCTeM Metabolite and Enzyme (Nobata et al., 2011) | 2011 | Medline abstracts | Metabolites and Enzymes | 296 |
| LINNAEUS (Gerner et al., 2010) | 2010 | Full-text documents | Species | 100 |
| GENETAG (Tanabe et al., 2005) | 2005 | Sentences | Gene, Protein | 20,000 sentences |
| JNLPBA (Huang et al., 2019) | 2004 | Abstracts | DNA, RNA, Protein, Cell Type, Cell Line | 2,000 (+404 test set) |
| GENIA (Kim et al., 2003) | 2003 | PubMed abstracts | DNA, RNA, Protein, Cells, Tissue, Anatomy, Organisms, Chemicals | 2,000 |