Table 1.
Performance metrics for various training set sizes and active learning
| Training set size | Performance metric | Deterministic Avg | Deterministic Stdev | FIE Avg | FIE Stdev | Probabilistic Avg | Probabilistic Stdev |
|---|---|---|---|---|---|---|---|
| 0 (Baseline) | Match recall | 0.54 | | 0.12 | | 0.20 | |
| | Match precision | 1.0 | | 1.0 | | 1.0 | |
| | Match errors (FP) | 0.0 | | 0.0 | | 0.0 | |
| | UnMatch errors (FN) | 0.0 | | 0.0 | | 0.0 | |
| | Manual review | 11.6% | | 49.6% | | 10.5% | |
| 2,000 | Match recall | 0.79 | 0.20 | 0.94 | 0.05 | 0.74 | 0.16 |
| | Match precision | 0.986 | 0.25 | 0.971 | 0.05 | 0.986 | 0.21 |
| | Match errors (FP) | 7.0 | 6.2 | 16.9 | 8.2 | 6.2 | 5.1 |
| | UnMatch errors (FN) | 13.2 | 5.4 | 9.4 | 3.7 | 7.4 | 3.9 |
| | Manual review | 1.9% | 2.0% | 0.4% | 0.4% | 2.0% | 1.4% |
| 4,000 | Match recall | 0.93 | 0.05 | 0.92 | 0.03 | 0.62 | 0.07 |
| | Match precision | 0.982 | 0.05 | 0.983 | 0.03 | 0.994 | 0.11 |
| | Match errors (FP) | 9.9 | 3.7 | 9.3 | 3.8 | 2.4 | 1.8 |
| | UnMatch errors (FN) | 8.3 | 5.7 | 5.6 | 4.4 | 7.0 | 2.9 |
| | Manual review | 0.7% | 0.5% | 0.7% | 0.2% | 2.5% | 0.5% |
| 6,000 | Match recall | 0.81 | 0.09 | 0.79 | 0.05 | 0.66 | 0.08 |
| | Match precision | 0.990 | 0.11 | 0.990 | 0.06 | 0.993 | 0.13 |
| | Match errors (FP) | 4.7 | 2.2 | 4.7 | 2.0 | 2.8 | 1.8 |
| | UnMatch errors (FN) | 5.4 | 2.6 | 1.6 | 1.6 | 5.4 | 2.0 |
| | Manual review | 1.9% | 0.4% | 1.8% | 0.3% | 2.3% | 0.6% |
| 8,000 | Match recall | 0.81 | 0.05 | 0.83 | 0.06 | 0.62 | 0.07 |
| | Match precision | 0.994 | 0.07 | 0.991 | 0.07 | 0.993 | 0.11 |
| | Match errors (FP) | 3.1 | 1.2 | 4.5 | 2.0 | 2.6 | 0.8 |
| | UnMatch errors (FN) | 4.2 | 2.5 | 0.9 | 0.8 | 4.4 | 2.6 |
| | Manual review | 1.7% | 0.2% | 1.6% | 0.3% | 2.6% | 0.5% |
| 10,000 | Match recall | 0.70 | 0.01 | 0.76 | 0.02 | 0.59 | 0.03 |
| | Match precision | 0.997 | 0.00 | 1.0 | 0.03 | 0.980 | 0.06 |
| | Match errors (FP) | 1.5 | 0.5 | 0.0 | 0.0 | 7.5 | 0.8 |
| | UnMatch errors (FN) | 1.6 | 0.5 | 0.1 | 0.3 | 1.2 | 0.4 |
| | Manual review | 2.5% | 0.0% | 1.9% | 0.1% | 3.5% | 0.2% |
| Active learning, 25 iterations (approx. 3,100) | Match recall | 0.70 | 0.01 | 0.75 | 0.01 | 0.59 | 0.0 |
| | Match precision | 0.996 | 0.01 | 0.993 | 0.02 | 0.997 | 0.00 |
| | Match errors (FP) | 1.8 | 0.4 | 3.0 | 0.7 | 1.0 | 0.0 |
| | UnMatch errors (FN) | 1.4 | 0.9 | 0.0 | 0.0 | 4 | 0.0 |
| | Manual review | 2.5% | 0.05% | 2.1% | 0.13% | 2.8% | 0.01% |
The two-threshold algorithms were constructed to minimize both errors and the size of the manual review set. We therefore report error rates as false positives (FP) among cases classified as duplicates, false negatives (FN) among cases classified as non-duplicates, and the size of the manual review set as a percentage of the 10,000 record-pair test set (Figure 1). We also report the familiar metrics of recall and precision for duplicate records.
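
As an illustration of how these quantities relate, the following is a minimal Python sketch of a two-threshold evaluation. It is not the study's implementation. The function name `evaluate_two_threshold`, the example scores, and the threshold values are hypothetical. The sketch also assumes that pairs scoring at or above the upper threshold are classified as duplicates, pairs at or below the lower threshold as non-duplicates, and everything in between goes to manual review. Recall is assumed here to be computed over all true duplicates, so duplicates sent to manual review lower recall without counting as errors.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class TwoThresholdReport:
    false_positives: int      # non-duplicates automatically classified as duplicates
    false_negatives: int      # duplicates automatically classified as non-duplicates
    manual_review_pct: float  # share of all pairs left between the two thresholds
    recall: float             # auto-matched duplicates / all true duplicates (assumed)
    precision: float          # auto-matched duplicates / all automatic matches


def evaluate_two_threshold(scores: Sequence[float],
                           is_duplicate: Sequence[bool],
                           lower: float,
                           upper: float) -> TwoThresholdReport:
    """Score a two-threshold classifier against known labels (hypothetical helper)."""
    tp = fp = fn = review = 0
    for score, truth in zip(scores, is_duplicate):
        if score >= upper:        # automatically classified as duplicate
            if truth:
                tp += 1
            else:
                fp += 1
        elif score <= lower:      # automatically classified as non-duplicate
            if truth:
                fn += 1
        else:                     # uncertain: sent to manual review
            review += 1

    total_duplicates = sum(is_duplicate)
    nan = float("nan")
    return TwoThresholdReport(
        false_positives=fp,
        false_negatives=fn,
        manual_review_pct=100.0 * review / len(scores) if len(scores) else nan,
        recall=tp / total_duplicates if total_duplicates else nan,
        precision=tp / (tp + fp) if (tp + fp) else nan,
    )


# Made-up scores and labels, for illustration only.
example_scores = [0.95, 0.90, 0.55, 0.50, 0.10, 0.05, 0.92, 0.08]
example_labels = [True, True, True, False, False, False, False, True]
report = evaluate_two_threshold(example_scores, example_labels, lower=0.2, upper=0.8)
print(report)
```

Under these assumptions, widening the gap between the two thresholds moves pairs out of the automatic classes and into manual review, which tends to reduce FP and FN counts at the cost of a larger review set and lower recall.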