Table 1.
Subjects in the extreme “tail” of the rare-het and rare-hom count distributions of two typical sample sets
SubjectID (a) | RHH (b), all SNPs: hets | RHH (b), all SNPs: homs | RHH (b), 1 Mb apart: hets | RHH (b), 1 Mb apart: homs | Genotype confidence (c): all 500K | Genotype confidence (c): rare hets | “Ethnic” het counts (d): YRI | “Ethnic” het counts (d): CHB | Other method: WTCCC PC-MDS (e) | Other method: PLINK Z-score (f) |
---|---|---|---|---|---|---|---|---|---|---|
Sample set A (UKBS controls) | ||||||||||
A1-1-1-1 | 9063 | 386 | 2003 | 229 | 0.06 | 0.04 | 356 | 19 | YES | −21.4 |
A2-6-2-5 | 3040 | 6 | 787 | 6 | 0.04 | 0.04 | 164 | 14 | YES | −5.5 |
A3-9-6-7 | 2090 | 1 | 617 | 1 | 0.04 | 0.04 | 111 | 5 | YES | −3.2 |
A4-3-3-3 | 1528 | 48 | 678 | 32 | 0.04 | 0.04 | 40 | 66 | YES | −14.5 |
A5-4-4-2 | 1315 | 46 | 659 | 38 | 0.03 | 0.03 | 50 | 64 | YES | −13.4 |
A6-2-5-2 | 1314 | 51 | 634 | 38 | 0.03 | 0.02 | 50 | 65 | YES | −13.8 |
A7-5-8-4 | 949 | 7 | 329 | 7 | 0.04 | 0.04 | 62 | 12 | YES | −2.7 |
A8-9-13-7 | 810 | 1 | 223 | 1 | 0.03 | 0.03 | 30 | 6 | | −1.2 |
A9-10-9-8 | 704 | 0 | 257 | 0 | 0.03 | 0.04 | 40 | 6 | YES | −2.0 |
A10-10-7-8 | 569 | 0 | 438 | 0 | 0.06 | 0.20 | 23 | 6 | | −1.4 |
A11-9-10-7 | 524 | 1 | 248 | 1 | 0.05 | 0.04 | 35 | 22 | YES | −3.7 |
A12-8-12-6 | 446 | 2 | 236 | 2 | 0.03 | 0.02 | 33 | 17 | YES | −7.2 |
A13-7-11-6 | 406 | 3 | 244 | 2 | 0.05 | 0.05 | 38 | 16 | YES | −6.1 |
A14-10-25-8 | 394 | 0 | 117 | 0 | 0.04 | 0.04 | 26 | 3 | | −0.9 |
A15-9-15-7 | 361 | 1 | 175 | 1 | 0.03 | 0.04 | 21 | 19 | YES | −4.2 |
A16-10-15-8 | 337 | 0 | 175 | 0 | 0.03 | 0.03 | 27 | 18 | YES | −5.7 |
A17-9-18-7 | 311 | 1 | 154 | 1 | 0.04 | 0.06 | 30 | 4 | | −1.9 |
A18-10-20-8 | 306 | 0 | 151 | 0 | 0.05 | 0.04 | 25 | 8 | | −2.0 |
A19-9-30-7 | 276 | 1 | 85 | 1 | 0.04 | 0.05 | 23 | 0 | | −0.4 |
A20-9-24-7 | 262 | 1 | 118 | 1 | 0.04 | 0.05 | 16 | 2 | | −1.6 |
A20-10-26-8 | 262 | 0 | 111 | 0 | 0.03 | 0.04 | 26 | 1 | | −1.1 |
A21-10-22-8 | 259 | 0 | 120 | 0 | 0.04 | 0.05 | 21 | 3 | | −2.2 |
A22-10-21-8 | 258 | 0 | 138 | 0 | 0.04 | 0.06 | 22 | 7 | YES | −4.7 |
A23-9-28-7 | 251 | 1 | 107 | 1 | 0.05 | 0.07 | 21 | 7 | | −0.6 |
A24-10-26-8 | 243 | 0 | 111 | 0 | 0.03 | 0.02 | 22 | 6 | | −1.0 |
Sample set B (58BC controls) | ||||||||||
B1-3-1-1 | 2802 | 4 | 814 | 4 | 0.04 | 0.03 | 114 | 33 | YES | −6.0 |
B2-5-2-3 | 2221 | 2 | 556 | 2 | 0.03 | 0.03 | 102 | 9 | YES | −4.2 |
B3-4-4-2 | 1525 | 3 | 404 | 3 | 0.03 | 0.03 | 72 | 5 | YES | −2.4 |
B4-6-5-4 | 1068 | 1 | 317 | 1 | 0.06 | 0.05 | 49 | 3 | YES | −2.5 |
B5-4-12-2 | 600 | 3 | 180 | 3 | 0.04 | 0.05 | 33 | 2 | | −1.4 |
B6-7-3-5 | 507 | 0 | 411 | 0 | 0.07 | 0.22 | 24 | 7 | | −2.1 |
B7-7-18-5 | 488 | 0 | 137 | 0 | 0.03 | 0.03 | 35 | 1 | | −1.1 |
B8-6-16-4 | 451 | 1 | 141 | 1 | 0.04 | 0.04 | 25 | 2 | | −0.9 |
B9-7-11-5 | 442 | 0 | 185 | 0 | 0.07 | 0.07 | 22 | 9 | | −1.5 |
B10-5-8-3 | 432 | 2 | 224 | 2 | 0.06 | 0.06 | 39 | 17 | YES | −2.5 |
B11-2-17-2 | 413 | 7 | 139 | 3 | 0.02 | 0.04 | 34 | 2 | | −4.1 |
B12-7-15-5 | 384 | 0 | 151 | 0 | 0.06 | 0.07 | 24 | 3 | | −0.5 |
B13-7-13-5 | 383 | 0 | 161 | 0 | 0.06 | 0.08 | 25 | 1 | | −1.1 |
B14-7-22-5 | 352 | 0 | 108 | 0 | 0.03 | 0.03 | 22 | 1 | | −0.8 |
B15-7-17-5 | 318 | 0 | 139 | 0 | 0.05 | 0.08 | 19 | 3 | | −1.4 |
B16-1-18-1 | 313 | 8 | 122 | 4 | 0.03 | 0.03 | 28 | 5 | | −3.7 |
B17-7-15-5 | 298 | 0 | 151 | 0 | 0.03 | 0.03 | 20 | 5 | | −2.7 |
B18-6-24-4 | 294 | 1 | 98 | 1 | 0.03 | 0.04 | 14 | 6 | | −0.3 |
B19-7-14-5 | 282 | 0 | 157 | 0 | 0.05 | 0.06 | 24 | 8 | | −2.3 |
B20-7-7-5 | 269 | 0 | 233 | 0 | 0.07 | 0.19 | 14 | 7 | | −0.7 |
B21-7-19-5 | 257 | 0 | 121 | 0 | 0.03 | 0.03 | 25 | 3 | | −2.1 |
B22-7-25-5 | 255 | 0 | 96 | 0 | 0.03 | 0.03 | 23 | 2 | | −1.2 |
B23-7-6-5 | 252 | 0 | 234 | 0 | 0.05 | 0.19 | 14 | 4 | | −1.2 |
B24-7-20-5 | 248 | 0 | 118 | 0 | 0.03 | 0.03 | 19 | 5 | | −2.3 |
B24-7-26-5 | 248 | 0 | 88 | 0 | 0.04 | 0.04 | 15 | 5 | YES | −1.8 |
a. Subjects are sorted from highest to lowest rare-het counts for “All SNPs” (column 2); subjectID is the sample set followed by the count rank in columns 2, 3, 4 and 5.
b. Counts from all Affy500K SNPs or “thinned” to derive only from SNPs at least 1 Mb apart. Counts exceeding the permutation-derived threshold are bold and underlined (signifying p<0.001) or only underlined (signifying p<0.05).
c. Mean BRLMM confidence for subject genotypes at all Affy500K SNPs and at all rare hets. Mean rare-het confidence above 0.1 is in bold italics to indicate doubtful genotype accuracy and a likely false-positive ethnic outlier.
d. Counts of heterozygotes at “ethnic” SNPs at least 1 Mb apart. Statistically excess counts (p<0.001 or p<0.05) are denoted by bold and underline as in footnote b.
e. Subject identified by WTCCC as having “non-Caucasian ancestry” based on PC-MDS analysis (12).
f. Lowest PLINK Z-score from the 1st through 10th nearest-neighbor distributions. Z-scores are bold and underlined if statistically significant (Z<−4.0).
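
The two numeric cut-offs quoted in the footnotes (mean rare-het confidence above 0.1 in footnote c, and Z<−4.0 in footnote f) can be applied to the tabulated values directly. The following is only an illustrative sketch, not the authors' code: the names `flag_row`, `flag_rows` and `ROWS` are hypothetical, and the five rows are copied from Table 1 as examples.

```python
# Cut-offs quoted in footnotes c and f of Table 1.
CONF_THRESHOLD = 0.1   # mean rare-het BRLMM confidence above this is doubtful
Z_THRESHOLD = -4.0     # lowest PLINK Z-score below this is significant

# (subject ID, mean rare-het confidence, lowest PLINK Z-score), copied from Table 1.
ROWS = [
    ("A1-1-1-1", 0.04, -21.4),
    ("A10-10-7-8", 0.20, -1.4),
    ("A22-10-21-8", 0.06, -4.7),
    ("B6-7-3-5", 0.22, -2.1),
    ("B11-2-17-2", 0.04, -4.1),
]


def flag_row(rare_het_conf, plink_z):
    """Apply the two footnote cut-offs to a single subject (hypothetical helper)."""
    return {
        "doubtful_genotypes": rare_het_conf > CONF_THRESHOLD,
        "plink_outlier": plink_z < Z_THRESHOLD,
    }


def flag_rows(rows):
    """Map each subject ID to its pair of flags."""
    return {subject: flag_row(conf, z) for subject, conf, z in rows}


if __name__ == "__main__":
    for subject, flags in flag_rows(ROWS).items():
        print(subject, flags)
```

In this sketch a subject such as A10-10-7-8 is flagged for doubtful genotype accuracy but not as a PLINK outlier, while B11-2-17-2 passes the Z-score cut-off without a confidence flag.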