Table 3.
Comparison of Strategies for Handling Sporadic Missing Data Across Methods
Method | Missing Data Handling | Euclidean Distance: 2nd [1st, 3rd] Quartile | Distance (km): 2nd [1st, 3rd] Quartile |
---|---|---|---|
PCA | filled | 2.89 [1.71, 4.56] | 253.4 [152.9, 375.9] |
PCA | imputed | 2.88 [1.68, 4.50] | 253.8 [150.0, 373.8] |
SMARTPCA with regression (5) | with missing | 2.78 [1.60, 4.44] | 241.1 [147.3, 366.6] |
SMARTPCA with regression (5) | imputed | 2.74 [1.62, 4.33] | 237.5 [146.4, 363.0] |
SPA | filled | 2.86 [1.63, 4.48] | 248.9 [148.0, 371.3] |
SPA | ignored | 2.87 [1.62, 4.49] | 248.7 [147.1, 371.0] |
SPA | imputed | 2.88 [1.65, 4.44] | 249.1 [148.4, 366.2] |
LOCO-LD (window length = 10) | ignored | 2.46 [1.43, 3.86] | 221.5 [127.1, 320.9] |
LOCO-LD | imputed | 2.42 [1.36, 3.70] | 211.2 [124.7, 313.8] |
“Filled” denotes replacing the missing entries with the mean genotype value for that variant. “Imputed” denotes imputing the missing entries with BEAGLE. “With missing” for SMARTPCA denotes running the software directly on data that contain missing entries. “Ignored” denotes leaving the missing entries out of the computation; this option is available only for the model-based approaches, SPA and LOCO-LD. Reported error measures are the same as in Table 1.
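As an illustration of the “filled” strategy, the following is a minimal sketch of per-variant mean filling. It assumes genotypes are stored as a list of rows (one per individual, one column per variant) with `None` marking a missing entry; the function name, the data layout, and the fallback of 0.0 for a variant with no observed entries are assumptions made for this example, not details taken from the methods compared in the table.

```python
def fill_missing_with_variant_mean(genotypes):
    """Replace missing entries (None) with the mean observed genotype of that variant.

    genotypes: list of rows, one row per individual and one column per variant.
    Returns a new list of rows; the input is not modified.
    """
    n_variants = len(genotypes[0]) if genotypes else 0

    # Mean genotype per variant, computed over observed entries only.
    variant_means = []
    for j in range(n_variants):
        observed = [row[j] for row in genotypes if row[j] is not None]
        # Assumption for this sketch: a variant with no observed entries gets 0.0.
        variant_means.append(sum(observed) / len(observed) if observed else 0.0)

    # Copy each row, substituting the variant mean where the entry is missing.
    return [
        [variant_means[j] if value is None else value for j, value in enumerate(row)]
        for row in genotypes
    ]


# Three individuals, two variants; two entries are missing.
example = [[0, None], [2, 1], [None, 1]]
print(fill_missing_with_variant_mean(example))
# → [[0, 1.0], [2, 1], [1.0, 1]]
```

Mean filling keeps every individual and variant in the analysis but pulls the filled entries toward the population average, which is one reason it can differ from the “imputed” and “ignored” options.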