2020 Jan 24;15(1):e0228154. doi: 10.1371/journal.pone.0228154

Table 4. The mean, standard deviation, preservation of data (PD), sensitivity and specificity of five data cleaning approaches with and without an algorithm (A) compared to uncleaned longitudinal growth measurements in CLOSER data with and without simulated duplications and 1% errors.

| Method | Mean ± SD | PD (%) | Sensitivity (%) | Specificity (%) |
|---|---|---|---|---|
| Pre-cleaned without simulations | 41.03 ± 30.20 | n/a | n/a | n/a |
| Uncleaned with simulations | 63.61 ± 1135.70 | 100.00 | 0.00 | 100.00 |
| GCO | 41.34 ± 30.86 | 92.15 | 56.38 | 99.96 |
| GCO-A | 41.43 ± 30.82 | 92.70 | 59.85 | 99.99 |
| SZCO | 43.39 ± 57.54 | 92.64 | 6.10 | 99.96 |
| SZCO-A | 41.32 ± 30.79 | 92.27 | 60.05 | 99.99 |
| TZCO | 42.61 ± 47.84 | 92.56 | 14.71 | 99.95 |
| TZCO-A | 41.28 ± 30.77 | 92.33 | 61.52 | 99.99 |
| NLR | 41.09 ± 30.39 | 92.18 | 53.51 | 99.95 |
| NLR-A | 41.10 ± 30.35 | 92.57 | 71.93 | 100.00 |
| NLME | 40.94 ± 30.16 | 91.64 | 86.00 | 99.75 |
| NLME-A | 40.96 ± 30.17 | 92.45 | 90.55 | 99.85 |

Duplications were simulated by randomly selecting 2.5% of the data and duplicating it once, and then randomly selecting a further 2.5% of the data and duplicating it twice. Simulated errors consisted of 50% random errors and 50% fixed errors. Random errors were simulated as values between 0.0001 and 500. Fixed errors were created by manipulating measurements: multiplying or dividing by 10, 100 or 1000; adding 100 or 1000; converting between metric and imperial units; and transposing the number. Mean ± SD is the mean plus or minus the standard deviation of the growth measurements. Preservation of data (PD) is the percentage of the original data that was preserved. Sensitivity was calculated as the mean percentage of simulated (true-positive) measurement errors that were correctly identified. Specificity was calculated as the mean percentage of non-simulated (true-negative) measurements that were correctly identified as valid.
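The simulation and scoring described in the footnote can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: all function names are hypothetical; the two duplicated 2.5% subsets are assumed to be disjoint; measurements are assumed to be lengths in centimetres, so the metric/imperial errors are modelled as cm/inch conversions (`CM_PER_INCH`); "transposing the number" is assumed to mean swapping the first two digits; and PD is assumed to be retained measurements as a percentage of the original count.

```python
import random

# Assumption: measurements are lengths in centimetres, so the metric/imperial
# fixed errors are modelled as cm <-> inch conversions (1 inch = 2.54 cm).
CM_PER_INCH = 2.54


def simulate_duplications(records, frac=0.025, seed=0):
    """Duplicate `frac` of the records once and a further, disjoint `frac` twice."""
    rng = random.Random(seed)
    k = round(frac * len(records))
    chosen = rng.sample(range(len(records)), 2 * k)  # sampled without replacement
    once, twice = chosen[:k], chosen[k:]
    out = list(records)
    out += [records[i] for i in once]                     # one extra copy each
    out += [records[i] for i in twice for _ in range(2)]  # two extra copies each
    return out


def transpose_digits(x):
    """Hypothetical reading of 'transposing the number': swap the first two digits."""
    chars = list(f"{x:.2f}")  # always has at least three digits, e.g. "0.00"
    pos = [i for i, c in enumerate(chars) if c.isdigit()]
    i, j = pos[0], pos[1]
    chars[i], chars[j] = chars[j], chars[i]
    return float("".join(chars))


# Fixed-error manipulations listed in the footnote (unit handling is assumed).
FIXED_ERRORS = [
    lambda x: x * 10, lambda x: x * 100, lambda x: x * 1000,
    lambda x: x / 10, lambda x: x / 100, lambda x: x / 1000,
    lambda x: x + 100, lambda x: x + 1000,
    lambda x: x / CM_PER_INCH,  # cm value expressed in inches
    lambda x: x * CM_PER_INCH,  # cm value wrongly treated as inches and converted
    transpose_digits,
]


def simulate_errors(values, frac=0.01, seed=0):
    """Corrupt `frac` of the values: half random errors, half fixed errors.

    Returns the corrupted values and the set of corrupted indices (the true
    positives). A fixed error can occasionally leave a value unchanged
    (e.g. transposing 11.50), but its index is still counted as an error.
    """
    rng = random.Random(seed)
    out = list(values)
    idx = rng.sample(range(len(values)), round(frac * len(values)))
    n_random = len(idx) // 2
    for i in idx[:n_random]:
        out[i] = rng.uniform(0.0001, 500)          # random error
    for i in idx[n_random:]:
        out[i] = rng.choice(FIXED_ERRORS)(out[i])  # fixed error
    return out, set(idx)


def sensitivity_specificity(n_total, flagged, true_errors):
    """Percent of simulated errors flagged, and percent of clean values not flagged."""
    flagged, true_errors = set(flagged), set(true_errors)
    negatives = n_total - len(true_errors)
    true_pos = len(flagged & true_errors)
    false_pos = len(flagged - true_errors)
    sensitivity = 100 * true_pos / len(true_errors)
    specificity = 100 * (negatives - false_pos) / negatives
    return sensitivity, specificity


def preservation_of_data(n_retained, n_original):
    """Assumed reading of PD: retained measurements as a percentage of the original."""
    return 100 * n_retained / n_original
```

As a sanity check, leaving the data uncleaned (nothing flagged, everything retained) gives a sensitivity of 0%, a specificity of 100% and a PD of 100% under these definitions, which agrees with the "Uncleaned with simulations" row. The footnote reports mean percentages; averaging these scores over several seeded runs is one plausible way to obtain such means, but that is an assumption here.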