BMC Bioinformatics. 2009 Jul 17;10:223. doi: 10.1186/1471-2105-10-223

Table 8.

Comparison of performance and applicability of different NER systems on BioCreative 2 test set

| System or authors | Precision | Recall | F-score | # of features | Tagging complexity | Availability |
|---|---|---|---|---|---|---|
| CRF 1 (ABNER+) | 87.30 | 80.68 | 83.86 | 171,251 | LM | N |
| CRF 2 (ABNER++) | 87.39 | 81.96 | 84.59 | 355,461 | LM | N |
| Dictionary | 90.37 | 82.40 | 86.20 | 0 | Trie | Y |
| Dictionary + CRF 2 | 90.52 | 87.63 | 89.05 | 355,609 | LM | Y |
| BANNER [26,27] | 88.66 | 84.32 | 86.43 | 500,876 | LM + POS tagger | Y |
| Ando [5] (1st in BioCreative 2) | 88.48 | 85.97 | 87.21 | -- | 2 × LM + POS tagger + syntactic parser | N |
| Hsu et al. [10] | 88.95 | 87.65 | 88.30 | 8 × 5,059,368 | 8 × LM + POS tagger | N |

In the 6th column, 'LM' and 'Trie' refer to the time complexities of a linear model and of a Trie-based dictionary match, respectively. The 'Dictionary' method requires no features once the dictionary is constructed. For Ando's system, the number of features is not reported in the paper [5]. Since the systems in the last two rows use classifier combination, their tagging complexities and feature counts are multiplied by the number of sub-models.
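To make the 'Trie' entry in the complexity column concrete, the following is a minimal sketch of a token-level trie used for longest-match dictionary tagging. It is illustrative only: the class and function names, the greedy left-to-right matching policy, and the B-GENE/I-GENE label scheme are assumptions, not details taken from the paper.

```python
class TrieNode:
    """One node of a token-level trie; children are keyed by token string."""
    def __init__(self):
        self.children = {}
        self.is_entry = False  # True if a dictionary entry ends at this node

def build_trie(entries):
    """Build a trie from dictionary entries, each given as a list of tokens."""
    root = TrieNode()
    for entry in entries:
        node = root
        for token in entry:
            node = node.children.setdefault(token, TrieNode())
        node.is_entry = True
    return root

def longest_match(root, tokens, start):
    """Return the end index (exclusive) of the longest dictionary entry
    starting at `start`, or None if no entry matches there."""
    node, end = root, None
    for i in range(start, len(tokens)):
        node = node.children.get(tokens[i])
        if node is None:
            break
        if node.is_entry:
            end = i + 1
    return end

def tag(root, tokens):
    """Greedy left-to-right tagging: label tokens covered by a match."""
    labels = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        end = longest_match(root, tokens, i)
        if end is not None:
            labels[i] = "B-GENE"
            for j in range(i + 1, end):
                labels[j] = "I-GENE"
            i = end  # skip past the matched span
        else:
            i += 1
    return labels
```

Each lookup is linear in the length of the matched span, which is why the dictionary tagger needs no learned features: the trie itself encodes the name list, and tagging reduces to a dictionary scan.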