2009 Mar-Apr;16(2):247–255. doi: 10.1197/jamia.M2844

Table 2. Recognition Performance Over the Training Corpus of the BioCreAtIvE II GM Corpus, N = 18,265

Software        Notes                                    Precision  Recall  F-Measure
--------------  ---------------------------------------  ---------  ------  ---------
BioThesaurus    With all mapping                         0.2253     0.8654  0.3576
                With all mapping + false-positive list   0.5000     0.8541  0.6308
                Above w/longest first mapping            0.6100     0.8378  0.7059
ABNER           First-order CRF model                    0.8324     0.7246  0.7753
                With post-processing module              0.8361     0.7493  0.7901
LingPipe        CharLmRescoring with 36-gram             0.7637     0.8204  0.7910
                With post-processing module              0.7661     0.8364  0.7997
MEMM (MALLET)   Second-order MEMM                        0.8432     0.8044  0.8233
                With post-processing module              0.8412     0.8175  0.8291
CRF (MALLET)    Without BioThesaurus                     0.8621     0.7765  0.8170
                Without post-processing                  0.8718     0.8133  0.8415
                Without POS                              0.8717     0.8138  0.8417
                Without UMLS                             0.8660     0.8187  0.8417
                Without false-positive list              0.8772     0.8109  0.8428
                With longest first mapping               0.8673     0.8212  0.8436
                The best configuration                   0.8714     0.8261  0.8481
BioTagger-GM    Combination of four systems              0.8658     0.8717  0.8687

GM = gene mention; MEMM = maximum entropy Markov model; CRF = conditional random field; POS = part of speech; UMLS = Unified Medical Language System.

Reported numbers are averages of performance measures in 5×2-fold cross-validation tests.
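
As a reading aid, the sketch below recomputes the F-measure for a few rows of Table 2 from their precision and recall. It assumes that "F-Measure" here means the balanced F-measure, F = 2PR / (P + R), which the table itself does not state; the names `f_measure`, `ROWS`, and `gaps` are illustrative only. Because each reported value is an average over cross-validation runs, and the average of per-run F-measures need not equal the F-measure of the averaged precision and recall, small differences from the table are to be expected.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: the harmonic mean of precision and recall.

    Assumption: the table's "F-Measure" is this balanced form.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F-measure), copied from Table 2.
ROWS = {
    "BioThesaurus, all mapping + false-positive list": (0.5000, 0.8541, 0.6308),
    "ABNER, first-order CRF model": (0.8324, 0.7246, 0.7753),
    "CRF (MALLET), best configuration": (0.8714, 0.8261, 0.8481),
    "BioTagger-GM, combination of four systems": (0.8658, 0.8717, 0.8687),
}


def gaps(rows):
    """Absolute difference between recomputed and reported F-measure per row."""
    return {name: abs(f_measure(p, r) - f) for name, (p, r, f) in rows.items()}


if __name__ == "__main__":
    for name, gap in gaps(ROWS).items():
        print(f"{name}: |recomputed - reported| = {gap:.4f}")
```

For the rows chosen here, the recomputed values agree with the reported ones to within 0.001, which is consistent with the balanced-F assumption.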