Table 6. Mean preservation of data (PD), sensitivity, specificity and convergence rate, across different rates and types of simulated errors and duplications, for uncleaned data, de-duplicated data, and data cleaned with five data cleaning approaches with and without our algorithm (A), for longitudinal growth measurements from CLOSER data.
Method | Sensitivity (%) | Specificity (%) | PD (%) | Convergence rate (%) |
---|---|---|---|---|
Uncleaned | 0.00 | 100.00 | 100.00 | 100.00 |
De-duplicated | 0.93 | 99.34 | 92.70 | 100.00 |
GCO | 57.01 | 99.34 | 87.38 | 100.00 |
GCO-A | 59.85 | 99.87 | 92.70 | 100.00 |
SZCO | 11.02 | 99.33 | 92.28 | 100.00 |
SZCO-A | 60.39 | 99.57 | 87.78 | 100.00 |
TZCO | 25.08 | 99.29 | 92.04 | 100.00 |
TZCO-A | 64.78 | 99.56 | 87.93 | 100.00 |
NLR | 54.03 | 99.33 | 87.62 | 100.00 |
NLR-A | 71.35 | 99.94 | 91.47 | 100.00 |
NLME | 80.03 | 99.46 | 88.96 | 76.36 |
NLME-A | 87.71 | 99.88 | 91.76 | 76.36 |
Errors were simulated for 0%, 0.1%, 0.2%, 0.5%, 1%, 2%, 5%, 10%, 20% and 50% of the data. Random errors were simulated between the values of 0.0001 and 500 and made up 0%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90% and 100% of the overall errors, with fixed errors making up the remaining percentage. Fixed errors consisted of multiplying or dividing measurements by 10, 100 and 1000, adding 100 and 1000, converting between metric and imperial units, and transposing the number. The preservation of data (PD) is the percentage of the original data that was preserved. Sensitivity was calculated as the mean percentage of simulated (true-positive) measurement errors that were correctly identified. Specificity was calculated as the mean percentage of non-simulated (true-negative) measurements that were correctly identified. The convergence rate was calculated as the mean percentage of times a method was able to execute correctly.
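The metric definitions in the footnote can be illustrated for a single simulation run with a minimal sketch. It is not the authors' implementation. It assumes that each measurement carries a boolean label saying whether it is a simulated error, and that a cleaning method returns a boolean flag per measurement, with flagged measurements removed. Duplicated records are ignored. The names `cleaning_metrics`, `convergence_rate`, `is_error` and `flagged` are hypothetical. The table would then report the mean of these per-run values over all simulated scenarios.

```python
import math
from typing import Dict, Sequence


def cleaning_metrics(is_error: Sequence[bool], flagged: Sequence[bool]) -> Dict[str, float]:
    """Per-run sensitivity, specificity and PD, all in percent.

    is_error[i] -- True if measurement i is a simulated error (assumed label)
    flagged[i]  -- True if the cleaning method flagged/removed measurement i
    """
    if len(is_error) != len(flagged):
        raise ValueError("is_error and flagged must have the same length")

    n = len(is_error)
    n_err = sum(bool(e) for e in is_error)   # simulated errors
    n_ok = n - n_err                         # non-simulated measurements
    tp = sum(1 for e, f in zip(is_error, flagged) if e and f)          # errors caught
    tn = sum(1 for e, f in zip(is_error, flagged) if not e and not f)  # valid kept
    kept = n - sum(bool(f) for f in flagged)

    nan = float("nan")  # undefined when the denominator is zero (e.g. 0% error rate)
    return {
        "sensitivity": 100.0 * tp / n_err if n_err else nan,
        "specificity": 100.0 * tn / n_ok if n_ok else nan,
        "pd": 100.0 * kept / n if n else nan,
    }


def convergence_rate(run_succeeded: Sequence[bool]) -> float:
    """Percentage of runs in which the method executed without failing."""
    if not run_succeeded:
        return float("nan")
    return 100.0 * sum(bool(s) for s in run_succeeded) / len(run_succeeded)


# Toy run: 10 measurements, 2 simulated errors, one caught, one valid value removed.
is_error = [True, True] + [False] * 8
flagged = [True, False, True] + [False] * 7
result = cleaning_metrics(is_error, flagged)
print(result)
```

In this toy run, one of the two simulated errors is flagged, one of the eight valid measurements is wrongly removed, and eight of the ten measurements are kept. Under this sketch, a method that flags nothing keeps specificity and PD at 100%, which matches the pattern of the "Uncleaned" row.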