Table 2. Comparison of reference genomic sequence datasets for mapping captured reads.
Values are numbers of SNPs by allele frequency (AF) bin.

Reference | Min. variant reads | SNP type | 0.05–0.1 | 0.1–0.2 | 0.2–0.3 | 0.3–0.4 | 0.4–0.5 | 0.5–0.6 | 0.6–0.7 | 0.7–0.8 | 0.8–0.9 | 0.9–1.0 | Total | Total (AF>0.1) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
CSS | 3 | EMS | 427 | 163 | 64 | 56 | 46 | 30 | 23 | 19 | 18 | 257 | 1103 | 676 |
CSS | 3 | NON-EMS | 294 | 113 | 28 | 11 | 1 | 1 | 0 | 1 | 0 | 4 | 453 | 159 |
CSS | 8 | EMS | 13 | 22 | 31 | 38 | 35 | 26 | 20 | 18 | 17 | 246 | 466 | 453 |
CSS | 8 | NON-EMS | 16 | 3 | 3 | 1 | 0 | 0 | 0 | 0 | 0 | 4 | 27 | 11 |
Ensembl | 3 | EMS | 923 | 419 | 127 | 74 | 53 | 27 | 26 | 16 | 18 | 253 | 1936 | 1013 |
Ensembl | 3 | NON-EMS | 660 | 349 | 91 | 32 | 3 | 1 | 0 | 1 | 0 | 3 | 1140 | 480 |
Ensembl | 8 | EMS | 24 | 33 | 34 | 43 | 34 | 22 | 23 | 16 | 17 | 242 | 488 | 464 |
Ensembl | 8 | NON-EMS | 17 | 1 | 5 | 1 | 0 | 0 | 0 | 1 | 0 | 3 | 28 | 11 |
Ensembl-RM | 3 | EMS | 426 | 163 | 61 | 54 | 47 | 22 | 27 | 17 | 16 | 231 | 1064 | 638 |
Ensembl-RM | 3 | NON-EMS | 279 | 130 | 31 | 8 | 2 | 0 | 1 | 1 | 0 | 2 | 454 | 175 |
Ensembl-RM | 8 | EMS | 17 | 25 | 32 | 40 | 37 | 19 | 24 | 17 | 16 | 221 | 448 | 431 |
Ensembl-RM | 8 | NON-EMS | 12 | 9 | 2 | 1 | 0 | 0 | 0 | 0 | 0 | 2 | 26 | 3 |

Reads were mapped with Novoalign using parameter t = 60, equivalent to a mismatch setting of approximately 2. Novoalign's hard-clipping option was used with a base quality threshold of 15. Reads with a mapping score below 20 were removed. The references used were the full IWGSC chromosome arm survey sequence (“CSS”), the Ensembl v21 subset of CSS (“Ensembl”), or a repeat-masked version of the latter (“Ensembl-RM”). The minimum total read coverage was 8, the minimum SNP read coverage was 3 or 8 (the “Min. variant reads” column), and the minimum SNP base quality was 20.
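
The coverage thresholds and AF bins described above can be illustrated with a minimal Python sketch. It is not the pipeline used to produce Table 2. The function names (`af_bin`, `count_snps`) and the input format, a list of `(variant_reads, total_reads)` pairs per candidate site, are hypothetical. It assumes AF is the fraction of reads supporting the variant and that bins are half-open, with the last bin including AF = 1.0. The mapping-score and base-quality filters are assumed to have been applied upstream.

```python
from collections import Counter

# AF bin edges in percent, matching the table columns.
EDGES = [5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
LABELS = ["0.05–0.1", "0.1–0.2", "0.2–0.3", "0.3–0.4", "0.4–0.5",
          "0.5–0.6", "0.6–0.7", "0.7–0.8", "0.8–0.9", "0.9–1.0"]


def af_bin(variant_reads, total_reads):
    """Return the AF bin label for a site, or None if AF < 0.05."""
    for lo, hi, label in zip(EDGES, EDGES[1:], LABELS):
        # Integer comparison avoids floating-point rounding at bin edges.
        # Assumption: bins are half-open [lo, hi); the last also holds AF = 1.0.
        in_bin = lo * total_reads <= 100 * variant_reads < hi * total_reads
        if in_bin or (hi == 100 and variant_reads == total_reads):
            return label
    return None


def count_snps(sites, min_total=8, min_variant=3):
    """Count sites per AF bin after applying the two coverage thresholds."""
    counts = Counter()
    for variant_reads, total_reads in sites:
        # Skip sites below the minimum total or minimum variant read coverage.
        if total_reads < min_total or variant_reads < min_variant:
            continue
        label = af_bin(variant_reads, total_reads)
        if label is not None:
            counts[label] += 1
    return counts


# Hypothetical example sites: (variant_reads, total_reads).
sites = [(3, 40), (3, 10), (8, 8), (2, 20), (9, 10), (3, 7)]
relaxed = count_snps(sites, min_variant=3)
strict = count_snps(sites, min_variant=8)
```

Under these assumptions, raising `min_variant` from 3 to 8 drops the low-AF sites while keeping the near-homozygous ones, which matches the pattern in the table. The "Total (AF>0.1)" column corresponds to the row total minus the 0.05–0.1 bin.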