Table 1.
Minimal differences between Picard, SAMTools, and no duplicate removal
| Subset | Total Variants | Ti/Tv Ratio | % Variants in dbSNP | Avg. Population Frequency | % Protein Changing Variants |
|---|---|---|---|---|---|
| All Picard | 16354497 | 2.14 | 72.05 | 0.21 | 0.40 |
| All SAMTools | 16250761 | 2.14 | 71.86 | 0.22 | 0.40 |
| All No Dups | 16494672 | 2.14 | 71.30 | 0.21 | 0.40 |
| P-Value | <2.60e-16 | 1.00 | 0.99 | 0.99 | 1.00 |
| Common to all three | 15688522 | 2.15 | 80.18 | 0.22 | 0.41 |
| Unique to Picard | 307486 | 1.92 | 66.27 | 0.16 | 0.33 |
| Unique to SAMTools | 150474 | 1.80 | 69.59 | 0.19 | 0.26 |
| Unique to No Dups | 398248 | 1.95 | 54.07 | 0.16 | 0.34 |
| Unique to Picard/SAMTools | 181176 | 1.97 | 73.86 | 0.22 | 0.33 |
| Unique to Picard/No Dups | 177313 | 2.07 | 65.30 | 0.21 | 0.31 |
| Unique to SAMTools/No Dups | 230589 | 1.73 | 52.17 | 0.23 | 0.24 |
| P-Value (comparing Unique rows) | <2.60e-16 | 1.00 | 0.32 | 0.84 | 1.00 |
Here we present metrics for each portion of the Venn diagram (Fig. 2): the total number of variants, the transition/transversion (Ti/Tv) ratio, the percentage of variants present in dbSNP (variants absent from dbSNP are considered novel), the average population frequency, and the percentage of variants that change the protein product. The top part of the table reports variant characteristics for all variants called after duplicate removal with Picard, after duplicate removal with SAMTools, or without duplicate removal. These three datasets are referred to as Picard, SAMTools, and No Dups, respectively. Population frequencies are based on the 1000 Genomes Project. dbSNP variants refer to build 138. Protein-changing variants are missense SNVs or frameshifting InDels. We performed a Chi-square goodness-of-fit test for significant differences amongst the values in each column. Two tests were performed for each column: (1) comparing the values for all variants in each main dataset (“All Picard”, “All SAMTools”, and “All No Dups”); and (2) comparing the values for variants across all “Unique” groups. The number of variants differed significantly across groups in both tests, but none of the other measures were significantly different.
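The arithmetic behind the "Total Variants" column can be sketched in a few lines of Python. This is an illustrative reconstruction, not the authors' analysis code. It first checks that each "All" total equals the sum of the Venn regions containing that dataset, then computes a Chi-square goodness-of-fit statistic for the three totals. The null hypothesis of equal expected counts is an assumption, since the caption does not state which expected distribution was used. The helper names `total_for` and `chi_square_gof` are hypothetical.

```python
import math

# Counts copied from the "Total Variants" column of Table 1.
common = 15688522
unique = {"Picard": 307486, "SAMTools": 150474, "No Dups": 398248}
shared = {
    ("Picard", "SAMTools"): 181176,
    ("Picard", "No Dups"): 177313,
    ("SAMTools", "No Dups"): 230589,
}


def total_for(dataset):
    """Sum every Venn region that contains the given dataset."""
    pairwise = sum(n for pair, n in shared.items() if dataset in pair)
    return common + unique[dataset] + pairwise


def chi_square_gof(observed):
    """Chi-square goodness-of-fit statistic against equal expected counts.

    Equal expected counts are an assumption made for this sketch.
    """
    expected = sum(observed) / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)


totals = {name: total_for(name) for name in unique}

chi2 = chi_square_gof(list(totals.values()))
# With three groups there are 2 degrees of freedom, and for df = 2 the
# chi-square survival function has the closed form exp(-x / 2).
p_value = math.exp(-chi2 / 2)

print(totals)
print(f"chi2 = {chi2:.1f}, p = {p_value:.3g}")
```

Under these assumptions the reconstructed totals match the "All Picard", "All SAMTools", and "All No Dups" rows, and the p-value for the variant counts falls far below the reported threshold of 2.60e-16. The closed-form p-value applies only to the three-group comparison. The six "Unique" rows have 5 degrees of freedom and would need a general chi-square survival function, for example from a statistics library.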