2012 Aug 7;12:88. doi: 10.1186/1472-6947-12-88

Table 3.

Performance of freetext matching algorithm and MetaMap on test sets

Algorithm	FMA	FMA	MetaMap	MetaMap
Vocabulary	Read/OXMIS	Read/OXMIS	Read/OXMIS	Full Read
Test set	Death	General	General	General
Number of texts	1000	1000	1000	1000
Number of words	7534	25981	25981	25981
Positive diagnoses detected in free text
True positives	683	346	286	273
False positives	11	32	126	18
False negatives	52	101	161	174
Precision, %	98.4 (97.2, 99.2)	91.5 (88.3, 94.1)	69.4 (64.7, 73.8)	93.8 (90.4, 96.3)
Recall, %	92.9 (90.8, 94.7)	77.4 (73.2, 81.2)	64.0 (59.3, 68.4)	61.1 (56.4, 65.6)
F-score	0.96	0.84	0.67	0.74
Strictly defined precision for positive diagnoses (best term and correct attribute)
Number strictly correct	625	315	260	247
Precision strict, %	90.1 (87.6, 92.2)	83.3 (79.2, 86.9)	63.1 (58.2, 67.8)	84.9 (80.2, 88.8)
Precision of non-diagnosis positive concepts
True positives	84	304	295	453
False positives	2	22	55	41
Precision, %	97.7 (91.9, 99.7)	93.3 (90.0, 95.7)	84.3 (80.0, 87.9)	91.7 (88.9, 94.0)
Overall precision of positive concepts detected (diagnostic and non-diagnostic)
True positives	767	650	581	726
False positives	13	54	181	59
Precision, %	98.3 (97.2, 99.1)	92.3 (90.1, 94.2)	76.2 (73.1, 79.2)	92.5 (90.4, 94.2)
Precision of negative concepts detected
True positives	5	57	0	92
False positives	5	18	0	33
Precision, %	50.0 (18.7, 81.3)	76.0 (64.7, 85.1)		73.6 (65.0, 81.1)
Texts for which algorithm suggested a better Read term than the original term
Percentage of texts	0	1.2	0.5	0.6
Dates and durations
True positives	116	96
False positives	15	10
False negatives	25	22
Precision, %	88.5 (81.8, 93.4)	90.6 (83.3, 95.4)
Recall, %	82.3 (74.9, 88.2)	81.4 (73.1, 87.9)
F-score	0.85	0.86
Test results and quantitative measurements
True positives		105
False positives		11
False negatives		18
Precision, %		90.5 (83.7, 95.2)
Recall, %		85.4 (77.9, 91.1)
F-score		0.89

Comparison of precision (positive predictive value) and recall (sensitivity) of the Freetext Matching Algorithm (FMA) and MetaMap against the gold standard of manual review, for two test sets: ‘General’, a random sample of 500 texts from cases and 500 from controls in a study on coronary artery disease; and ‘Death’, a random sample of 1000 texts associated with Read terms for death or suicide in 2001.
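The derived rows in the table (precision, recall, F-score) follow from the true positive, false positive and false negative counts. The sketch below shows one way to reproduce them. The function names are illustrative and not taken from the paper. The table does not state how the 95% intervals were computed; the sketch assumes exact binomial (Clopper-Pearson) intervals, which is an assumption, although it matches the 5/10 negative-concept row.

```python
from math import comb


def precision(tp: int, fp: int) -> float:
    """Positive predictive value: TP / (TP + FP)."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Sensitivity: TP / (TP + FN)."""
    return tp / (tp + fn)


def f_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall, i.e. 2TP / (2TP + FP + FN)."""
    return 2 * tp / (2 * tp + fp + fn)


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p); adequate for counts of the size in Table 3."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def exact_ci(successes: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Clopper-Pearson interval for a proportion (ASSUMED method, not stated in the table)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need 0 <= successes <= n and n > 0")

    def solve(k: int, target: float) -> float:
        # binom_cdf(k, n, p) decreases as p grows, so bisect on p.
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            if binom_cdf(k, n, mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound: the p at which P(X >= successes) equals alpha / 2.
    lower = 0.0 if successes == 0 else solve(successes - 1, 1 - alpha / 2)
    # Upper bound: the p at which P(X <= successes) equals alpha / 2.
    upper = 1.0 if successes == n else solve(successes, alpha / 2)
    return lower, upper
```

For the FMA 'General' column (346 true positives, 32 false positives, 101 false negatives), these functions give a precision of 91.5%, a recall of 77.4% and an F-score of 0.84, as in the table. For the negative-concept row of the 'Death' set (5 correct out of 10), `exact_ci(5, 10)` gives approximately (18.7%, 81.3%).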