2009 Mar-Apr;16(2):247–255. doi: 10.1197/jamia.M2844

Table 2. Recognition Performance on the Training Set of the BioCreAtIvE II GM Corpus, N = 18,265

Software	Notes	Precision	Recall	F-Measure
BioThesaurus	With all mapping	0.2253	0.8654	0.3576
	With all mapping + false-positive list	0.5000	0.8541	0.6308
	Above w/longest first mapping	0.6100	0.8378	0.7059
ABNER	First-order CRF model	0.8324	0.7246	0.7753
	With post-processing module	0.8361	0.7493	0.7901
LingPipe	CharLmRescoring with 36-gram	0.7637	0.8204	0.7910
	With post-processing module	0.7661	0.8364	0.7997
MEMM (MALLET)	Second-order MEMM	0.8432	0.8044	0.8233
	With post-processing module	0.8412	0.8175	0.8291
CRF (MALLET)	Without BioThesaurus	0.8621	0.7765	0.8170
	Without post-processing	0.8718	0.8133	0.8415
	Without POS	0.8717	0.8138	0.8417
	Without UMLS	0.8660	0.8187	0.8417
	Without false-positive list	0.8772	0.8109	0.8428
	With longest first mapping	0.8673	0.8212	0.8436
	The best configuration	0.8714	0.8261	0.8481
BioTagger-GM	Combination of four systems	0.8658	0.8717	0.8687

GM = gene mention; MEMM = maximum entropy Markov model; CRF = conditional random field.

Reported numbers are averages of performance measures in 5×2-fold cross-validation tests.
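
As an aid to reading the table, the following is a minimal sketch of how a balanced F-measure can be recomputed from a precision/recall pair, and of how a 5×2-fold cross-validation schedule might be generated. It assumes the F-measure column is the harmonic mean of precision and recall (F1), which is consistent with the values shown. The function names `f_measure` and `five_by_two_splits`, the seed, and the random halving are illustrative assumptions, not the authors' code, and the exact splitting procedure is not stated in this table.

```python
import random


def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def five_by_two_splits(n_items: int, seed: int = 0):
    """Yield (train, test) index lists for 5 repetitions of 2-fold CV.

    Hypothetical helper: each repetition shuffles the indices, cuts them
    in half, and uses each half once for training and once for testing,
    giving 10 train/test runs in total.
    """
    rng = random.Random(seed)
    indices = list(range(n_items))
    for _ in range(5):
        rng.shuffle(indices)
        half = n_items // 2
        first, second = sorted(indices[:half]), sorted(indices[half:])
        yield first, second
        yield second, first


# Recompute F from the precision and recall in the BioTagger-GM row.
print(round(f_measure(0.8658, 0.8717), 4))  # 0.8687
```

Recomputing a row this way may differ from the printed F-measure in the last digit or two (for example, the first ABNER row gives about 0.7748 against the reported 0.7753). One plausible reason, offered here as an assumption, is that each column is averaged over the ten cross-validation runs separately, and the average of per-run F-measures need not equal the F-measure of the averaged precision and recall.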