. 2013 Nov 16;2013:721–730.

Table 1.

Performance metrics for various training set sizes and active learning

Training set size
  Performance metric      Deterministic     FIE               Probabilistic
                          Avg      Stdev    Avg      Stdev    Avg      Stdev

0 (Baseline)
  Match recall            0.54              0.12              0.20
  Match precision         1.0               1.0               1.0
  Match errors (FP)       0.0               0.0               0.0
  UnMatch errors (FN)     0.0               0.0               0.0
  Manual review           11.6%             49.6%             10.5%

2,000
  Match recall            0.79     0.20     0.94     0.05     0.74     0.16
  Match precision         0.986    0.25     0.971    0.05     0.986    0.21
  Match errors (FP)       7.0      6.2      16.9     8.2      6.2      5.1
  UnMatch errors (FN)     13.2     5.4      9.4      3.7      7.4      3.9
  Manual review           1.9%     2.0%     0.4%     0.4%     2.0%     1.4%

4,000
  Match recall            0.93     0.05     0.92     0.03     0.62     0.07
  Match precision         0.982    0.05     0.983    0.03     0.994    0.11
  Match errors (FP)       9.9      3.7      9.3      3.8      2.4      1.8
  UnMatch errors (FN)     8.3      5.7      5.6      4.4      7.0      2.9
  Manual review           0.7%     0.5%     0.7%     0.2%     2.5%     0.5%

6,000
  Match recall            0.81     0.09     0.79     0.05     0.66     0.08
  Match precision         0.990    0.11     0.990    0.06     0.993    0.13
  Match errors (FP)       4.7      2.2      4.7      2.0      2.8      1.8
  UnMatch errors (FN)     5.4      2.6      1.6      1.6      5.4      2.0
  Manual review           1.9%     0.4%     1.8%     0.3%     2.3%     0.6%

8,000
  Match recall            0.81     0.05     0.83     0.06     0.62     0.07
  Match precision         0.994    0.07     0.991    0.07     0.993    0.11
  Match errors (FP)       3.1      1.2      4.5      2.0      2.6      0.8
  UnMatch errors (FN)     4.2      2.5      0.9      0.8      4.4      2.6
  Manual review           1.7%     0.2%     1.6%     0.3%     2.6%     0.5%

10,000
  Match recall            0.70     0.01     0.76     0.02     0.59     0.03
  Match precision         0.997    0.00     1.0      0.03     0.980    0.06
  Match errors (FP)       1.5      0.5      0.0      0.0      7.5      0.8
  UnMatch errors (FN)     1.6      0.5      0.1      0.3      1.2      0.4
  Manual review           2.5%     0.0%     1.9%     0.1%     3.5%     0.2%

Active learning, 25 iterations (approx. 3,100)
  Match recall            0.70     0.01     0.75     0.01     0.59     0.0
  Match precision         0.996    0.01     0.993    0.02     0.997    0.00
  Match errors (FP)       1.8      0.4      3.0      0.7      1.0      0.0
  UnMatch errors (FN)     1.4      0.9      0.0      0.0      4        0.0
  Manual review           2.5%     0.05%    2.1%     0.13%    2.8%     0.01%

Two-threshold algorithms were constructed to minimize both classification errors and the size of the manual review set. We therefore report errors as false positives (FP) among cases classified as duplicates and false negatives (FN) among cases classified as non-duplicates, together with the size of the manual review set as a percentage of the 10,000 record pair test set (Figure 1). We also report the familiar metrics of recall and precision for duplicate records.
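The reporting scheme in this note can be illustrated with a minimal Python sketch. It is not the implementation used in the study: the function name `evaluate_two_threshold`, the similarity-score scale, the threshold values, and the boundary conventions (scores at or above the upper threshold are auto-matched, scores below the lower threshold are auto-unmatched) are all assumptions made for illustration. The sketch also assumes that recall counts only automatically matched duplicates against all true duplicates, so duplicates left in the manual review set lower recall without counting as FN errors.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class TwoThresholdMetrics:
    false_positives: int      # non-duplicates auto-classified as duplicates
    false_negatives: int      # duplicates auto-classified as non-duplicates
    manual_review_pct: float  # share of pairs falling between the thresholds
    match_recall: float       # auto-matched duplicates / all true duplicates
    match_precision: float    # auto-matched duplicates / all auto-matched pairs


def evaluate_two_threshold(
    scores: Sequence[float],
    labels: Sequence[bool],
    lower: float,
    upper: float,
) -> TwoThresholdMetrics:
    """Score a two-threshold classifier (hypothetical helper, for illustration).

    scores: one similarity score per record pair (higher = more alike).
    labels: True where the pair is a true duplicate.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if lower > upper:
        raise ValueError("lower threshold must not exceed upper threshold")

    tp = fp = fn = review = 0
    for score, is_duplicate in zip(scores, labels):
        if score >= upper:        # automatically classified as duplicate
            if is_duplicate:
                tp += 1
            else:
                fp += 1
        elif score < lower:       # automatically classified as non-duplicate
            if is_duplicate:
                fn += 1
        else:                     # uncertain region: sent to manual review
            review += 1

    total_pairs = len(scores)
    total_duplicates = sum(1 for is_duplicate in labels if is_duplicate)
    return TwoThresholdMetrics(
        false_positives=fp,
        false_negatives=fn,
        manual_review_pct=100.0 * review / total_pairs if total_pairs else 0.0,
        match_recall=tp / total_duplicates if total_duplicates else 0.0,
        match_precision=tp / (tp + fp) if (tp + fp) else 0.0,
    )


# Toy data (invented for this sketch, not taken from the study).
example_scores = [0.95, 0.90, 0.85, 0.50, 0.45, 0.10, 0.05, 0.02]
example_labels = [True, True, False, True, False, False, True, False]
example_metrics = evaluate_two_threshold(
    example_scores, example_labels, lower=0.2, upper=0.8
)
print(example_metrics)
```

Moving the two thresholds closer together shrinks the manual review set but pushes more uncertain pairs into the automatic classes, which is the trade-off between review burden and FP/FN errors that the table summarizes.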