Table 1.
Description of the data used for evaluating the configuration optimization approach.
Data set | Description | Original records, n | Duplicates, n |
FEBRL1^a | Distributed with the FEBRL package. | 500 | 500 (1 per original) |
FEBRL2 | Distributed with the FEBRL package. | 4000 | 1000 (maximum 5 per original) |
FEBRL3 | Distributed with the FEBRL package. | 2000 | 3000 (maximum 5 per original) |
FEBRL4 | Distributed with the FEBRL package. | 5000 | 5000 (1 per original) |
Hawaii | Constructed with the FEBRL data set generator using a number of Hawaii-specific data sources. | 1000 | 1000 (maximum 5 per original) |
^a FEBRL: Freely Extensible Biomedical Record Linkage.