2020 Jan 24;15(1):e0228154. doi: 10.1371/journal.pone.0228154

Table 7. The percentage of alterations made to Dogslife, SAVSNET, Banfield and CLOSER data with simulated duplications and 1% simulated errors using the NLME-A data cleaning method.

| Step of algorithm | Description of step | Dogslife weights | Dogslife heights | SAVSNET | Banfield | CLOSER |
|---|---|---|---|---|---|---|
| STEP 1 | Remove identical duplications | 12.52 | 11.42 | 10.21 | 0.671 | 7.183 |
| STEP 2 | Remove similar duplications | 1.193 | 4.716 | 0.886 | 0.097 | 0.119 |
| STEP 3 | Replace outliers with the closest correction to the measurement prediction: | | | | | |
| | Transpose | 0.025 | 0.382 | 0.000 | 0.000 | 0.060 |
| | /10 | 0.108 | 0.000 | 0.006 | 0.011 | 0.167 |
| | /100 | 0.018 | 0.000 | 0.004 | 0.000 | 0.132 |
| | /1000 | 0.002 | 0.000 | 0.000 | 0.000 | 0.041 |
| | x10 | 0.051 | 0.143 | 0.008 | 0.040 | 0.025 |
| | x100 | 0.000 | 0.000 | 0.000 | 0.000 | 0.021 |
| | x1000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.022 |
| | -100 | 0.025 | 0.004 | 0.000 | 0.000 | 0.016 |
| | -1000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.038 |
| | +100 | 0.000 | 0.000 | 0.000 | 0.000 | 0.001 |
| | +1000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| | x metric | 0.124 | 1.960 | 0.032 | 0.080 | 0.100 |
| | x imperial | 0.081 | 0.200 | 0.056 | 0.046 | 0.045 |
| STEP 4 | Remove outliers that jump in size | 0.219 | 0.603 | 0.142 | 0.138 | 0.214 |
| STEP 5 | Remove implausible entries | 0.005 | 0.018 | 0.000 | 0.017 | 0.041 |
| | Total duplicates removed | 13.71 | 16.14 | 11.10 | 0.768 | 7.301 |
| | Total errors removed | 0.659 | 3.309 | 0.249 | 0.332 | 0.924 |
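
As an illustration of Step 3 in Table 7, the following is a minimal sketch of how a "closest correction" might be chosen. It is not the authors' implementation. The function names, the `tolerance` parameter, the reading of "Transpose" as swapping the tens and units digits, and the use of a pound/kilogram factor for the "x metric" and "x imperial" rows are all assumptions made for illustration.

```python
import math

KG_PER_LB = 0.45359237  # assumed unit pair (lb <-> kg); heights would need cm/in


def transpose_digits(value):
    """Swap the tens and units digits, e.g. 52.0 -> 25.0 (assumed meaning of 'Transpose')."""
    whole = int(abs(value))
    frac = abs(value) - whole
    tens, units = (whole // 10) % 10, whole % 10
    swapped = whole - (10 * tens + units) + (10 * units + tens)
    return math.copysign(swapped + frac, value)


def candidate_corrections(value):
    """Return one candidate value per Step 3 row label in Table 7."""
    return {
        "Transpose": transpose_digits(value),
        "/10": value / 10,
        "/100": value / 100,
        "/1000": value / 1000,
        "x10": value * 10,
        "x100": value * 100,
        "x1000": value * 1000,
        "-100": value - 100,
        "-1000": value - 1000,
        "+100": value + 100,
        "+1000": value + 1000,
        "x metric": value * KG_PER_LB,    # assumed: recorded in lb, converted to kg
        "x imperial": value / KG_PER_LB,  # assumed: recorded in kg, converted to lb
    }


def closest_correction(value, predicted, tolerance):
    """Pick the candidate nearest the prediction; None if none lies within tolerance."""
    label, corrected = min(
        candidate_corrections(value).items(),
        key=lambda item: abs(item[1] - predicted),
    )
    if abs(corrected - predicted) > tolerance:
        return None  # hypothetical rule: leave the outlier for the later steps
    return label, corrected
```

For example, `closest_correction(250.0, predicted=25.3, tolerance=1.0)` returns `("/10", 25.0)`, whereas a value with no plausible correction returns `None` and would be left for Steps 4 and 5.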

Simulated errors comprised 50% random errors and 50% fixed errors. Random errors were simulated values between 0.0001 and 500. Fixed errors were created by multiplying or dividing measurements by 10, 100 or 1000, adding 100 or 1000, converting to metric or imperial units, or transposing the digits of the number.
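
The error simulation described in this note could be sketched roughly as follows. This is an illustrative sketch only, not the authors' code. It assumes random errors are drawn uniformly, that the 50/50 split is applied by alternating between the two error types, that unit conversion uses a pound/kilogram factor, and that transposition swaps the tens and units digits. The names `simulate_errors`, `FIXED_ERRORS` and `transpose` are hypothetical.

```python
import random

KG_PER_LB = 0.45359237  # assumed unit pair (lb <-> kg)


def transpose(value):
    """Swap the tens and units digits of a non-negative value (assumed meaning)."""
    whole = int(value)
    tens, units = (whole // 10) % 10, whole % 10
    return value - (10 * tens + units) + (10 * units + tens)


# Fixed manipulations listed in the note above.
FIXED_ERRORS = [
    lambda v: v * 10, lambda v: v * 100, lambda v: v * 1000,
    lambda v: v / 10, lambda v: v / 100, lambda v: v / 1000,
    lambda v: v + 100, lambda v: v + 1000,
    lambda v: v * KG_PER_LB,  # convert to metric (assumed lb -> kg)
    lambda v: v / KG_PER_LB,  # convert to imperial (assumed kg -> lb)
    transpose,
]


def simulate_errors(values, error_rate=0.01, seed=0):
    """Corrupt about error_rate of the values; half random errors, half fixed errors."""
    rng = random.Random(seed)
    corrupted = list(values)
    n_errors = round(len(values) * error_rate)
    error_idx = rng.sample(range(len(values)), n_errors)
    for k, i in enumerate(error_idx):
        if k % 2 == 0:
            # Random error: uniform draw is an assumption; the range is from the note.
            corrupted[i] = rng.uniform(0.0001, 500)
        else:
            # Fixed error: apply one of the listed manipulations to the true value.
            corrupted[i] = rng.choice(FIXED_ERRORS)(corrupted[i])
    return corrupted, sorted(error_idx)
```

Returning the corrupted indices alongside the data makes it possible to check afterwards which simulated errors a cleaning method altered.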