Table 7. The percentage of alterations made to Dogslife, SAVSNET, Banfield and CLOSER data with simulated duplications and 1% simulated errors using the NLME-A data cleaning method.
| Step of algorithm | Description of step | Dogslife weights | Dogslife heights | SAVSNET | Banfield | CLOSER |
|---|---|---|---|---|---|---|
| STEP 1 | Remove identical duplications | 12.52 | 11.42 | 10.21 | 0.671 | 7.183 |
| STEP 2 | Remove similar duplications | 1.193 | 4.716 | 0.886 | 0.097 | 0.119 |
| STEP 3 | Replace outliers with the closest correction to the measurement prediction | | | | | |
| | Transpose | 0.025 | 0.382 | 0.000 | 0.000 | 0.060 |
| | /10 | 0.108 | 0.000 | 0.006 | 0.011 | 0.167 |
| | /100 | 0.018 | 0.000 | 0.004 | 0.000 | 0.132 |
| | /1000 | 0.002 | 0.000 | 0.000 | 0.000 | 0.041 |
| | x10 | 0.051 | 0.143 | 0.008 | 0.040 | 0.025 |
| | x100 | 0.000 | 0.000 | 0.000 | 0.000 | 0.021 |
| | x1000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.022 |
| | -100 | 0.025 | 0.004 | 0.000 | 0.000 | 0.016 |
| | -1000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.038 |
| | +100 | 0.000 | 0.000 | 0.000 | 0.000 | 0.001 |
| | +1000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| | x metric | 0.124 | 1.960 | 0.032 | 0.080 | 0.100 |
| | x imperial | 0.081 | 0.200 | 0.056 | 0.046 | 0.045 |
| STEP 4 | Remove outliers that jump in size | 0.219 | 0.603 | 0.142 | 0.138 | 0.214 |
| STEP 5 | Remove implausible entries | 0.005 | 0.018 | 0.000 | 0.017 | 0.041 |
| Total duplicates removed | | 13.71 | 16.14 | 11.10 | 0.768 | 7.301 |
| Total errors removed | | 0.659 | 3.309 | 0.249 | 0.332 | 0.924 |
Simulated errors comprised 50% random errors and 50% fixed errors. Random errors were simulated values between 0.0001 and 500. Fixed errors consisted of multiplying or dividing measurements by 10, 100 or 1000, adding 100 or 1000, converting measurements to metric or imperial units, or transposing the number.
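The note describes how errors were injected, and Step 3 of the table corrects an outlier by choosing the candidate correction whose result lies closest to the measurement prediction. The Python sketch below illustrates both ideas under assumptions the source does not state: random errors are drawn uniformly, fixed error types are chosen with equal probability, metric/imperial conversion uses the pound/kilogram factor, "transpose" means swapping the first two digits, and the outlier has already been flagged and a prediction (for example from the fitted NLME model) is supplied. The names `simulate_error`, `closest_correction`, `FIXED_ERRORS` and `CORRECTIONS` are hypothetical; this is not the NLME-A implementation, and it omits any acceptance threshold the real method may apply.

```python
import random

# Assumed unit pair (pounds <-> kilograms); the source does not give the factor.
LB_TO_KG = 0.45359237
KG_TO_LB = 1 / LB_TO_KG


def transpose(x: float) -> float:
    """Hypothetical reading of 'transpose': swap the first two digits."""
    chars = list(f"{x:.4f}")  # fixed-point text, so there are always >= 2 digits
    idx = [i for i, c in enumerate(chars) if c.isdigit()]
    chars[idx[0]], chars[idx[1]] = chars[idx[1]], chars[idx[0]]
    return float("".join(chars))


# Fixed error types listed in the table note.
FIXED_ERRORS = {
    "x10": lambda x: x * 10,
    "x100": lambda x: x * 100,
    "x1000": lambda x: x * 1000,
    "/10": lambda x: x / 10,
    "/100": lambda x: x / 100,
    "/1000": lambda x: x / 1000,
    "+100": lambda x: x + 100,
    "+1000": lambda x: x + 1000,
    "to metric": lambda x: x * LB_TO_KG,
    "to imperial": lambda x: x * KG_TO_LB,
    "transpose": transpose,
}

# Candidate corrections, named as in the STEP 3 rows of the table.
CORRECTIONS = {
    "Transpose": transpose,
    "/10": lambda x: x / 10,
    "/100": lambda x: x / 100,
    "/1000": lambda x: x / 1000,
    "x10": lambda x: x * 10,
    "x100": lambda x: x * 100,
    "x1000": lambda x: x * 1000,
    "-100": lambda x: x - 100,
    "-1000": lambda x: x - 1000,
    "+100": lambda x: x + 100,
    "+1000": lambda x: x + 1000,
    "x metric": lambda x: x * LB_TO_KG,
    "x imperial": lambda x: x * KG_TO_LB,
}


def simulate_error(value: float, rng: random.Random) -> float:
    """Corrupt one measurement: 50% random error, 50% fixed error."""
    if rng.random() < 0.5:
        # Assumed uniform draw; the note gives only the range.
        return rng.uniform(0.0001, 500)
    return rng.choice(list(FIXED_ERRORS.values()))(value)


def closest_correction(outlier: float, prediction: float) -> tuple[str, float]:
    """Apply every candidate correction and keep the one nearest the prediction."""
    return min(
        ((name, fix(outlier)) for name, fix in CORRECTIONS.items()),
        key=lambda item: abs(item[1] - prediction),
    )


if __name__ == "__main__":
    corrupted = FIXED_ERRORS["x10"](25.0)          # a weight entered 10x too large
    print(closest_correction(corrupted, 24.0))     # ('/10', 25.0)
    print(simulate_error(25.0, random.Random(1)))  # one simulated erroneous value
```

Selecting the correction by minimum absolute distance to the prediction means the choice depends entirely on the quality of that prediction; when two corrections land similarly close, this sketch simply keeps whichever `min` encounters first.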