Table 2.
Effects of data preprocessing on SNP calling accuracy
Site discovery:
Call set (QUAL ≥ 50) | No. SNPs (All) | No. SNPs (Known) | No. SNPs (Novel) | dbSNP% | Ti/Tv ratio (Known) | Ti/Tv ratio (Novel) |
---|---|---|---|---|---|---|
raw | 640946 | 499377 | 141569 | 77.91% | 2.19 | 1.65 |
filterY | 630641 | 490722 | 139919 | 77.81% | 2.19 | 1.65 |
trim | 651391 | 502951 | 148440 | 77.21% | 2.18 | 1.58 |
filterY&trim | 640487 | 493741 | 146746 | 77.08% | 2.18 | 1.58 |
raw: no preprocessing; filterY: reads that fail the Illumina chastity filter removed; trim: low-quality tails trimmed from reads with the BWA parameter (-q 15); filterY&trim: reads that fail the Illumina chastity filter removed and low-quality tails trimmed. SNPs were called jointly for the five samples with GATK, using bases with base quality ≥ 20 and reads with mapping quality ≥ 20. Only sites with QUAL ≥ 50 were considered potentially variable sites.
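
The counts in Table 2 can be cross-checked with a few lines of arithmetic. The sketch below assumes that dbSNP% is the share of called SNPs already present in dbSNP (Known / All); the table does not state this definition. The function names and the 0.01 percentage-point tolerance are illustrative choices, not part of the original analysis.

```python
# Counts copied from Table 2: (All, Known, Novel, reported dbSNP%).
TABLE2 = {
    "raw": (640946, 499377, 141569, 77.91),
    "filterY": (630641, 490722, 139919, 77.81),
    "trim": (651391, 502951, 148440, 77.21),
    "filterY&trim": (640487, 493741, 146746, 77.08),
}


def dbsnp_percent(known, total):
    """Assumed definition: percentage of all called SNPs that are known."""
    return 100.0 * known / total


def check_row(all_snps, known, novel, reported_pct, tol=0.01):
    """Return True if Known + Novel == All and the recomputed dbSNP%
    is within `tol` percentage points of the reported value."""
    counts_ok = (known + novel) == all_snps
    pct_ok = abs(dbsnp_percent(known, all_snps) - reported_pct) <= tol
    return counts_ok and pct_ok


# Map each call set to whether its row is internally consistent.
results = {name: check_row(*row) for name, row in TABLE2.items()}

for name, ok in results.items():
    all_snps, known, _, reported = TABLE2[name]
    print(f"{name}: recomputed {dbsnp_percent(known, all_snps):.3f}%, "
          f"reported {reported}%, consistent={ok}")
```

Under this assumed definition, every row satisfies Known + Novel = All, and the recomputed percentages agree with the reported ones to within 0.01 percentage points.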