Table 1. Performance of tranches from Arabidopsis WGS sequence data.
Tranche, % | Sensitivity, % | Positive Predictive Value, % |
---|---|---|
100.0 | 99.9 | 93.7 |
99.9 | 99.3 | 95.4 |
99.0 | 94.9 | 99.2 |
97.5 | 92.0 | 99.3 |
95.0 | 89.3 | 99.4 |
75.0 | 54.3 | 99.6 |
Sensitivity and positive predictive value were calculated for multiple tranches of recalibrated variants from Arabidopsis WGS data. Sensitivity was calculated using variants found in Sanger sequence data from Nordborg et al. (2005); positive predictive value was estimated using variants found in both the Sanger sequence data and Gramene (build 43) (Table S4). For simplicity, the tranche percentage corresponds to both the SNP and the indel tranche. These values are not generally applicable to other RIG analyses and should not be taken as representative of how tranches in other analyses will behave; tranches should be chosen based on the reliability of the variants designated as truth for VQSR. WGS, whole-genome sequencing; SNP, single-nucleotide polymorphism; RIG, Recalibration and Interrelation of genomic sequence data with the GATK; VQSLOD, logarithm of odds ratio that a variant is real vs. not under the trained Gaussian mixture model; VQSR, Variant Quality Score Recalibration.
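
As a minimal illustration of the two metrics reported in Table 1, the sketch below computes sensitivity (the percentage of truth variants recovered in a call set) and positive predictive value (the percentage of calls confirmed by the truth set) from sets of variant keys. The function names, the `(chromosome, position, ref, alt)` key layout, and the toy data are assumptions made for illustration only; they are not the authors' pipeline or the data behind Table 1. The sketch also assumes the call set has already been restricted to regions covered by the truth data.

```python
def sensitivity(called, truth):
    """Percentage of truth variants that also appear in the call set."""
    if not truth:
        raise ValueError("truth set is empty")
    return 100.0 * len(called & truth) / len(truth)


def positive_predictive_value(called, truth):
    """Percentage of called variants that are confirmed by the truth set.

    Assumes `called` is already limited to regions the truth data covers.
    """
    if not called:
        raise ValueError("call set is empty")
    return 100.0 * len(called & truth) / len(called)


# Hypothetical toy data: variants keyed by (chromosome, position, ref, alt).
truth = {
    ("chr1", 100, "A", "T"),
    ("chr1", 200, "G", "C"),
    ("chr2", 50, "T", "G"),
    ("chr2", 75, "C", "A"),
}
called = {
    ("chr1", 100, "A", "T"),   # confirmed
    ("chr1", 200, "G", "C"),   # confirmed
    ("chr2", 50, "T", "G"),    # confirmed
    ("chr1", 300, "C", "G"),   # not in truth set
    ("chr2", 90, "A", "G"),    # not in truth set
}

print(f"Sensitivity: {sensitivity(called, truth):.1f}%")            # 75.0%
print(f"PPV: {positive_predictive_value(called, truth):.1f}%")      # 60.0%
```

Applying the same two functions to the call set retained at each tranche threshold would yield one row per tranche, with sensitivity typically falling and positive predictive value rising as the tranche becomes more stringent.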