. 2013 Nov 16;2013:721–730.

Table 1.

Performance metrics for various training set sizes and active learning

Training set size	Performance metric	Deterministic Avg	Deterministic Stdev	FIE Avg	FIE Stdev	Probabilistic Avg	Probabilistic Stdev
0 (Baseline)	Match recall	0.54		0.12		0.20
	Match precision	1.0		1.0		1.0
	Match errors (FP)	0.0		0.0		0.0
	UnMatch errors (FN)	0.0		0.0		0.0
	Manual review	11.6%		49.6%		10.5%
2,000	Match recall	0.79	0.20	0.94	0.05	0.74	0.16
	Match precision	0.986	0.25	0.971	0.05	0.986	0.21
	Match errors (FP)	7.0	6.2	16.9	8.2	6.2	5.1
	UnMatch errors (FN)	13.2	5.4	9.4	3.7	7.4	3.9
	Manual review	1.9%	2.0%	0.4%	0.4%	2.0%	1.4%
4,000	Match recall	0.93	0.05	0.92	0.03	0.62	0.07
	Match precision	0.982	0.05	0.983	0.03	0.994	0.11
	Match errors (FP)	9.9	3.7	9.3	3.8	2.4	1.8
	UnMatch errors (FN)	8.3	5.7	5.6	4.4	7.0	2.9
	Manual review	0.7%	0.5%	0.7%	0.2%	2.5%	0.5%
6,000	Match recall	0.81	0.09	0.79	0.05	0.66	0.08
	Match precision	0.990	0.11	0.990	0.06	0.993	0.13
	Match errors (FP)	4.7	2.2	4.7	2.0	2.8	1.8
	UnMatch errors (FN)	5.4	2.6	1.6	1.6	5.4	2.0
	Manual review	1.9%	0.4%	1.8%	0.3%	2.3%	0.6%
8,000	Match recall	0.81	0.05	0.83	0.06	0.62	0.07
	Match precision	0.994	0.07	0.991	0.07	0.993	0.11
	Match errors (FP)	3.1	1.2	4.5	2.0	2.6	0.8
	UnMatch errors (FN)	4.2	2.5	0.9	0.8	4.4	2.6
	Manual review	1.7%	0.2%	1.6%	0.3%	2.6%	0.5%
10,000	Match recall	0.70	0.01	0.76	0.02	0.59	0.03
	Match precision	0.997	0.00	1.0	0.03	0.980	0.06
	Match errors (FP)	1.5	0.5	0.0	0.0	7.5	0.8
	UnMatch errors (FN)	1.6	0.5	0.1	0.3	1.2	0.4
	Manual review	2.5%	0.0%	1.9%	0.1%	3.5%	0.2%
Active learning 25 iterations (approx. 3,100)	Match recall	0.70	0.01	0.75	0.01	0.59	0.0
	Match precision	0.996	0.01	0.993	0.02	0.997	0.00
	Match errors (FP)	1.8	0.4	3.0	0.7	1.0	0.0
	UnMatch errors (FN)	1.4	0.9	0.0	0.0	4	0.0
	Manual review	2.5%	0.05%	2.1%	0.13%	2.8%	0.01%

The two-threshold algorithms were constructed to minimize both classification errors and the size of the manual review set. We therefore report three quantities (Figure 1): false positives (FP), the errors among cases classified as duplicates; false negatives (FN), the errors among cases classified as non-duplicates; and the size of the manual review set as a percentage of the 10,000 record-pair test set. We also report the familiar metrics of recall and precision for duplicate records.
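The reported quantities can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function name `evaluate_two_threshold`, the score scale, and the threshold values are hypothetical. The sketch assumes that pairs scoring at or above the upper threshold are classified as duplicates, pairs at or below the lower threshold as non-duplicates, and everything in between is sent to manual review. It also assumes that recall counts only automatically matched duplicates in the numerator, so duplicates left in the review set lower recall without counting as FN errors.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class ReviewMetrics:
    false_positives: int      # non-duplicates automatically classified as duplicates
    false_negatives: int      # duplicates automatically classified as non-duplicates
    manual_review_pct: float  # share of all pairs left between the two thresholds
    match_recall: float
    match_precision: float


def evaluate_two_threshold(
    scored_pairs: Iterable[Tuple[float, bool]],
    lower: float,
    upper: float,
) -> ReviewMetrics:
    """Score a two-threshold classifier on (score, is_duplicate) pairs.

    Hypothetical helper: the thresholds and score scale are assumptions,
    not values taken from the paper.
    """
    tp = fp = fn = review = total = true_duplicates = 0
    for score, is_duplicate in scored_pairs:
        total += 1
        if is_duplicate:
            true_duplicates += 1
        if score >= upper:          # automatic match
            if is_duplicate:
                tp += 1
            else:
                fp += 1
        elif score <= lower:        # automatic non-match
            if is_duplicate:
                fn += 1
        else:                       # uncertain: send to manual review
            review += 1

    auto_matches = tp + fp
    return ReviewMetrics(
        false_positives=fp,
        false_negatives=fn,
        manual_review_pct=100.0 * review / total if total else 0.0,
        # Assumption: only automatically matched duplicates count as recalled.
        match_recall=tp / true_duplicates if true_duplicates else 0.0,
        # Convention chosen here: with no automatic matches, precision is 1.0.
        match_precision=tp / auto_matches if auto_matches else 1.0,
    )


if __name__ == "__main__":
    # Toy data for illustration only; not drawn from the study.
    toy_pairs = [
        (0.95, True), (0.90, False), (0.50, True), (0.40, False),
        (0.10, False), (0.05, True), (0.99, True), (0.02, False),
    ]
    print(evaluate_two_threshold(toy_pairs, lower=0.2, upper=0.8))
```

Under these assumptions, widening the gap between the two thresholds trades automatic errors for a larger manual review set, which is the trade-off Table 1 tracks across training set sizes.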