Table 4. Mean, standard deviation, preservation of data (PD), sensitivity and specificity for five data cleaning approaches, each with and without an algorithm (A), compared with uncleaned longitudinal growth measurements in CLOSER data, with and without simulated duplications and 1% errors.
Method | Mean ± SD | PD (%) | Sensitivity (%) | Specificity (%) |
---|---|---|---|---|
Pre-cleaned without simulations | 41.03 ± 30.20 | N/A | N/A | N/A |
Uncleaned with simulations | 63.61 ± 1135.70 | 100.00 | 0.00 | 100.00 |
GCO | 41.34 ± 30.86 | 92.15 | 56.38 | 99.96 |
GCO-A | 41.43 ± 30.82 | 92.70 | 59.85 | 99.99 |
SZCO | 43.39 ± 57.54 | 92.64 | 6.10 | 99.96 |
SZCO-A | 41.32 ± 30.79 | 92.27 | 60.05 | 99.99 |
TZCO | 42.61 ± 47.84 | 92.56 | 14.71 | 99.95 |
TZCO-A | 41.28 ± 30.77 | 92.33 | 61.52 | 99.99 |
NLR | 41.09 ± 30.39 | 92.18 | 53.51 | 99.95 |
NLR-A | 41.10 ± 30.35 | 92.57 | 71.93 | 100.00 |
NLME | 40.94 ± 30.16 | 91.64 | 86.00 | 99.75 |
NLME-A | 40.96 ± 30.17 | 92.45 | 90.55 | 99.85 |
Duplications were simulated by randomly selecting 2.5% of the data and duplicating it once, then randomly selecting a further 2.5% of the data and duplicating it twice. Simulated errors consisted of 50% random errors and 50% fixed errors. Random errors were simulated as values between 0.0001 and 500. Fixed errors were created by multiplying or dividing measurements by 10, 100 or 1000, adding 100 or 1000, converting between metric and imperial units, or transposing digits. Mean ± SD is the mean plus or minus the standard deviation of the growth measurements. Preservation of data (PD) is the percentage of the original data that was preserved. Sensitivity was calculated as the mean percentage of simulated errors (true positives) that were correctly identified. Specificity was calculated as the mean percentage of non-simulated measurements (true negatives) that were correctly identified.
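The sensitivity, specificity and PD definitions in the note above can be illustrated with a minimal Python sketch for a single simulation run. The function name `cleaning_metrics` and its two boolean inputs are hypothetical and are not taken from the study's code. The sketch also assumes that PD is the share of rows left unflagged, and it ignores duplicate removal and the averaging over repeated simulations implied by "mean percentage".

```python
from typing import Dict, Sequence


def cleaning_metrics(
    is_simulated_error: Sequence[bool], is_flagged: Sequence[bool]
) -> Dict[str, float]:
    """Return sensitivity, specificity and preserved share, all in percent.

    is_simulated_error[i] is True if measurement i was a simulated error.
    is_flagged[i] is True if the cleaning method flagged measurement i.
    Both names are illustrative assumptions, not part of the original study.
    """
    if len(is_simulated_error) != len(is_flagged):
        raise ValueError("inputs must have the same length")
    if not is_flagged:
        raise ValueError("inputs must not be empty")

    pairs = list(zip(is_simulated_error, is_flagged))
    tp = sum(1 for err, flag in pairs if err and flag)          # errors caught
    fn = sum(1 for err, flag in pairs if err and not flag)      # errors missed
    tn = sum(1 for err, flag in pairs if not err and not flag)  # valid, kept
    fp = sum(1 for err, flag in pairs if not err and flag)      # valid, flagged

    nan = float("nan")
    sensitivity = 100 * tp / (tp + fn) if (tp + fn) else nan
    specificity = 100 * tn / (tn + fp) if (tn + fp) else nan
    # Assumption: "preserved" means not flagged by the cleaning method.
    preserved = 100 * (tn + fn) / len(pairs)

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "preserved": preserved,
    }
```

With this definition, a method that flags nothing yields 0% sensitivity, 100% specificity and 100% preservation, which is the pattern shown in the "Uncleaned with simulations" row.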