2016 Aug 16;50:49. doi: 10.1590/S1518-8787.2016050006327

Table 5. Characteristics of records with discordant classification by deterministic and probabilistic record linkage.

| Record characteristics | Pair by deterministic and non-pair by probabilistic linkage (N = 1,285) | | Pair by probabilistic and non-pair by deterministic linkage (N = 527) | |
| | Median | CI95% | Median | CI95% |
| Score | - | - | 24.2 | 20.3–32.3 |
| | n | % | n | % |
| Missing sex value | 0 | 0 | 0 | 0 |
| Missing name value | 0 | 0 | 0 | 0 |
| Missing mother’s name value | 68 | 5.3 | 76 | 14.4 |
| Missing date of birth value | 81 | 6.3 | 58 | 11.0 |
| Missing address value | 7 | 0.5 | 20 | 3.8 |
| Combined: unknown mother’s name; or unknown date of birth; or unknown address | 129 | 10.0 | 141 | 26.7 |

| Link characteristics | Pair by deterministic and non-pair by probabilistic linkage (a,b) (N = 733) | | Pair by probabilistic and non-pair by deterministic linkage (a,b) (N = 293) | |
| | n | % | n | % |
| Difference in sex | 115 | 15.7 | 0 | 0 |
| Similarity measure for name lower than 70.0% | 160 | 25.9 | 0 | 0 |
| Similarity measure for mother’s name lower than 70.0% | 147 | 26.6 | 36 | 16.6 |
| Similarity measure for date of birth lower than 70.0% | 45 | 8.3 | 25 | 10.6 |
| | Median | CI95% | Median | CI95% |
| Similarity measure for name × 100 | 81.6 | 69.4–94.4 | 100 | 95.0–100 |
| Similarity measure for mother’s name × 100 | 91.3 | 68.2–100 | 94.4 | 85.7–100 |
| Similarity measure for date of birth | 100 | 10.0–100 | 87.5 | 75.0–100 |

a Calculating the Levenshtein distance and assessing the difference in sex required comparing records within the group of records belonging to the same patient. Discordant records were compared as follows: (i) when the two discordant records were identified by only one of the techniques, the calculation was done between them; (ii) when the records were identified by both techniques and only one of them was missed by one of the techniques, the calculation was done by comparing the unidentified record with the highest-scoring record in the group.

b For this calculation, records that were blocked by sex or had missing information for one of the variables were excluded.
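
The similarity measures and the comparison rule in footnote a can be illustrated with a minimal Python sketch. The table does not state how the Levenshtein distance was normalized, so the formula used here, 100 × (1 − distance / length of the longer string), is an assumption, as are the function names and the `score` field; this is not the authors' implementation.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity_pct(a: str, b: str) -> float:
    """Assumed normalization: 100 * (1 - distance / longer length)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0  # two empty strings are treated as identical
    return 100.0 * (1 - levenshtein(a, b) / longest)


def comparison_partner(group: list) -> dict:
    """Rule (ii) of footnote a: the record with the highest score in the group.

    Each record is a dict with a hypothetical 'score' key.
    """
    return max(group, key=lambda record: record["score"])
```

Under this assumed normalization, two eight-character dates of birth that differ in one digit, such as `similarity_pct("19800315", "19800316")`, give 87.5, and identical strings give 100.0.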