2013 Jun 6;92(6):882–894. doi: 10.1016/j.ajhg.2013.04.023

Table 3.

Comparison of Different Strategies for Handling Sporadic Missing Data for the Different Methods

Method	Missing	Euclidean Distance: 2nd [1st, 3rd] Quartile	Distance (km): 2nd [1st, 3rd] Quartile
PCA	filled	2.89 [1.71, 4.56]	253.4 [152.9, 375.9]
PCA	imputed	2.88 [1.68, 4.50]	253.8 [150.0, 373.8]
SMARTPCA with regression (5)	with missing	2.78 [1.60, 4.44]	241.1 [147.3, 366.6]
SMARTPCA with regression (5)	imputed	2.74 [1.62, 4.33]	237.5 [146.4, 363.0]
SPA	filled	2.86 [1.63, 4.48]	248.9 [148.0, 371.3]
SPA	ignored	2.87 [1.62, 4.49]	248.7 [147.1, 371.0]
SPA	imputed	2.88 [1.65, 4.44]	249.1 [148.4, 366.2]
LOCO-LD (window length = 10)	ignored	2.46 [1.43, 3.86]	221.5 [127.1, 320.9]
LOCO-LD	imputed	2.42 [1.36, 3.70]	211.2 [124.7, 313.8]

“Filled” denotes replacing the missing entries with the mean genotype value for that variant. “Imputed” denotes using BEAGLE for imputing the missing entries. “With missing” for SMARTPCA denotes running the software on data with missing entries. “Ignored” denotes leaving the missing entries out of the computation; this option is available only for the model-based approaches, SPA and LOCO-LD. Reported error measures are the same as in Table 1.
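
The "filled" strategy described above can be illustrated with a minimal sketch. This is not the authors' code: the function name `fill_with_variant_mean`, the use of `None` to mark a missing entry, and the rows-as-individuals, columns-as-variants layout are all assumptions made for illustration, as is the choice to fill a variant with no observed genotypes with 0.0.

```python
from typing import List, Optional

# Assumed layout: rows are individuals, columns are variants,
# and None marks a sporadically missing genotype.
Genotypes = List[List[Optional[float]]]


def fill_with_variant_mean(genotypes: Genotypes) -> List[List[float]]:
    """Replace each missing entry with the mean observed genotype
    value at the same variant (column)."""
    n_variants = len(genotypes[0]) if genotypes else 0
    means = []
    for j in range(n_variants):
        observed = [row[j] for row in genotypes if row[j] is not None]
        # Assumption: a variant with no observed genotypes is filled with 0.0.
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [
        [means[j] if value is None else float(value) for j, value in enumerate(row)]
        for row in genotypes
    ]


if __name__ == "__main__":
    example = [[0, 2, None], [1, None, 2], [None, 2, 1]]
    for row in fill_with_variant_mean(example):
        print(row)
```

Mean filling needs no reference panel or phasing, which is what makes it the simplest of the listed options. The "imputed" rows instead rely on BEAGLE, and "ignored" requires a model-based method that can leave missing entries out of the computation.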