2013 Jun 6;92(6):882–894. doi: 10.1016/j.ajhg.2013.04.023

Table 3.

Comparison of Different Strategies for Handling Sporadic Missing Data for the Different Methods

Method                        Missing data  Euclidean Distance: 2nd [1st, 3rd] Quartile  Distance: 2nd [1st, 3rd] Quartile (km)
PCA                           filled        2.89 [1.71, 4.56]                            253.4 [152.9, 375.9]
PCA                           imputed       2.88 [1.68, 4.50]                            253.8 [150.0, 373.8]
SMARTPCA with regression (5)  with missing  2.78 [1.60, 4.44]                            241.1 [147.3, 366.6]
SMARTPCA with regression (5)  imputed       2.74 [1.62, 4.33]                            237.5 [146.4, 363.0]
SPA                           filled        2.86 [1.63, 4.48]                            248.9 [148.0, 371.3]
SPA                           ignored       2.87 [1.62, 4.49]                            248.7 [147.1, 371.0]
SPA                           imputed       2.88 [1.65, 4.44]                            249.1 [148.4, 366.2]
LOCO-LD (window length = 10)  ignored       2.46 [1.43, 3.86]                            221.5 [127.1, 320.9]
LOCO-LD                       imputed       2.42 [1.36, 3.70]                            211.2 [124.7, 313.8]

“Filled” denotes replacing the missing entries with the mean genotype value for that variant. “Imputed” denotes using BEAGLE for imputing the missing entries. “With missing” for SMARTPCA denotes running the software on data with missing entries. “Ignored” denotes leaving the missing entries out of the computation; this option is available only for the model-based approaches, SPA and LOCO-LD. Reported error measures are the same as in Table 1.
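
The "filled" strategy described above can be illustrated with a minimal sketch. It is not the authors' code. The function name `fill_with_variant_mean`, the use of `None` to mark a missing genotype, the rows-as-individuals layout, and the 0.0 fallback for a variant with no observed calls are all assumptions made for illustration.

```python
from typing import List, Optional

# Assumption of this sketch: None marks a missing genotype call.
Genotype = Optional[float]


def fill_with_variant_mean(genotypes: List[List[Genotype]]) -> List[List[float]]:
    """Replace each missing entry with the mean observed value for that variant.

    Rows are individuals and columns are variants (an assumed layout).
    """
    if not genotypes:
        return []
    n_variants = len(genotypes[0])
    means = []
    for j in range(n_variants):
        observed = [row[j] for row in genotypes if row[j] is not None]
        # A variant with no observed calls has no mean; 0.0 is an arbitrary fallback.
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [
        [means[j] if value is None else float(value) for j, value in enumerate(row)]
        for row in genotypes
    ]


# Toy data: 3 individuals by 3 variants, genotypes coded as allele counts.
example = [
    [0, 1, None],
    [2, None, 1],
    [1, 1, 2],
]
filled = fill_with_variant_mean(example)
print(filled)
```

In this toy input, the missing entry in the second variant is replaced by the mean of the two observed values for that variant, and likewise for the third variant. Imputation with BEAGLE, by contrast, is a separate procedure that this sketch does not attempt to reproduce.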