Table 3.
Performance comparison of the baseline and machine learning–optimized matching configurations in SantéMPI in the held-out evaluation sets, for the detection of definite linkages not needing manual review.
Data set | Sensitivity (%; 95% CI) | Patients, n | Correctly predicted linkages (ground-truth linkages) |
FEBRL1^a | 98.0 (95.0-100.0) | 100 | 98 (100) |
FEBRL2 | 96.6 (93.9-98.6) | 800 | 196 (203) |
FEBRL3 | 94.9 (93.0-96.5) | 400 | 558 (588) |
FEBRL4 | 98.3 (97.5-99.1) | 1000 | 983 (1000) |
Hawaii | 96.6 (93.7-98.9) | 200 | 168 (174) |
^a FEBRL: Freely Extensible Biomedical Record Linkage.
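The sensitivity column follows from the two counts in the last column: correctly predicted linkages divided by ground-truth linkages. The short Python sketch below recomputes the point estimates from the counts in Table 3 as a consistency check. The names `counts` and `sensitivity_pct` are illustrative only and are not part of SantéMPI. The 95% CIs are not recomputed, because this table does not state the method used to derive them.

```python
# Counts copied from Table 3:
# (correctly predicted linkages, ground-truth linkages) per data set.
counts = {
    "FEBRL1": (98, 100),
    "FEBRL2": (196, 203),
    "FEBRL3": (558, 588),
    "FEBRL4": (983, 1000),
    "Hawaii": (168, 174),
}


def sensitivity_pct(correct: int, truth: int) -> float:
    """Sensitivity as a percentage: correct linkages / ground-truth linkages."""
    if truth <= 0:
        raise ValueError("ground-truth linkage count must be positive")
    return 100.0 * correct / truth


# Round to one decimal place, matching the precision reported in the table.
sensitivities = {
    name: round(sensitivity_pct(correct, truth), 1)
    for name, (correct, truth) in counts.items()
}

for name, value in sensitivities.items():
    print(f"{name}: {value:.1f}%")
```

Each recomputed value agrees with the reported sensitivity to one decimal place (for example, 196/203 gives 96.6% for FEBRL2).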