2021 Jul 21;37(24):4756–4763. doi: 10.1093/bioinformatics/btab489

Table 2.

Summary of real and simulated datasets used in our experiments

Dataset       Total SNP    Typed SNP   Sample   Ref Haps     Miss %
Sim 10K       62 704       22 879      1000     10 000       0.5
Sim 100K      80 029       23 594      1000     100 000      0.5
Sim 1M        97 750       23 092      1000     1 000 000    0.5
1000G chr10   3 968 020    192 683     52       4904         0.1
1000G chr20   1 802 261    96 083      52       4904         0.1
HRC chr10     1 809 068    191 210     1000     52 330       0.1
HRC chr20     829 265      95 414      1000     52 330       0.1

Note: For the 1000G and HRC datasets, the typed SNPs are those also present on the Infinium Omni5-4 Kit. Miss % is the percentage of typed SNPs randomly masked to mimic missing data. For ancestry estimation, we used the 50 000 most ancestry-informative SNPs in each 1000G chromosome.
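
As an illustration of the Miss % column, the following is a minimal sketch of random masking, not the procedure actually used in the experiments. It assumes that masking is applied independently to each entry of a samples-by-typed-SNPs genotype matrix; the function name `mask_typed_genotypes`, the toy matrix, and the seeds are all hypothetical.

```python
import numpy as np

def mask_typed_genotypes(genotypes, miss_percent, seed=0):
    """Return a float copy of `genotypes` with roughly `miss_percent`
    percent of its entries set to NaN, plus the boolean mask used.

    Assumption: each entry is hidden independently with probability
    miss_percent / 100 (entry-wise masking)."""
    rng = np.random.default_rng(seed)
    masked = genotypes.astype(float)  # astype returns a new array
    hide = rng.random(masked.shape) < miss_percent / 100.0
    masked[hide] = np.nan
    return masked, hide

# Hypothetical toy data: 1000 samples x 200 typed SNPs, genotypes coded 0/1/2.
toy = np.random.default_rng(1).integers(0, 3, size=(1000, 200))
masked, hide = mask_typed_genotypes(toy, miss_percent=0.5)
print(f"fraction masked: {hide.mean():.4f}")
```

Returning the mask alongside the masked matrix makes it possible to compare imputed values against the hidden true genotypes afterwards.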