2016 Jul 25;17(Suppl 7):239. doi: 10.1186/s12859-016-1097-3

Table 1.

Minimal differences between Picard, SAMTools, and no duplicate removal

| Subset | Total Variants | Ti/Tv Ratio | % Variants in dbSNP | Avg. Population Frequency | % Protein Changing Variants |
|---|---|---|---|---|---|
| All Picard | 16354497 | 2.14 | 72.05 | 0.21 | 0.40 |
| All SAMTools | 16250761 | 2.14 | 71.86 | 0.22 | 0.40 |
| All No Dups | 16494672 | 2.14 | 71.30 | 0.21 | 0.40 |
| P-Value | <2.60e-16 | 1.00 | 0.99 | 0.99 | 1.00 |
| Common to all three | 15688522 | 2.15 | 80.18 | 0.22 | 0.41 |
| Unique to Picard | 307486 | 1.92 | 66.27 | 0.16 | 0.33 |
| Unique to SAMTools | 150474 | 1.80 | 69.59 | 0.19 | 0.26 |
| Unique to No Dups | 398248 | 1.95 | 54.07 | 0.16 | 0.34 |
| Unique to Picard/SAMTools | 181176 | 1.97 | 73.86 | 0.22 | 0.33 |
| Unique to Picard/No Dups | 177313 | 2.07 | 65.30 | 0.21 | 0.31 |
| Unique to SAMTools/No Dups | 230589 | 1.73 | 52.17 | 0.23 | 0.24 |
| P-Value (comparing Unique rows) | <2.60e-16 | 1.00 | 0.32 | 0.84 | 1.00 |
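
As a consistency check on the counts above, each "All" total should equal the sum of the Venn regions that include that dataset. The following is a minimal sketch of that arithmetic in Python. The counts are copied from Table 1, while the names `regions`, `reported_totals`, and `total_for` are illustrative and do not come from the original analysis.

```python
# Variant counts per Venn region, copied from Table 1.
# Each key is the set of datasets that share the variants in that region.
regions = {
    frozenset({"Picard", "SAMTools", "No Dups"}): 15688522,
    frozenset({"Picard"}): 307486,
    frozenset({"SAMTools"}): 150474,
    frozenset({"No Dups"}): 398248,
    frozenset({"Picard", "SAMTools"}): 181176,
    frozenset({"Picard", "No Dups"}): 177313,
    frozenset({"SAMTools", "No Dups"}): 230589,
}

# "All ..." totals as reported in the top part of Table 1.
reported_totals = {
    "Picard": 16354497,
    "SAMTools": 16250761,
    "No Dups": 16494672,
}


def total_for(dataset):
    """Sum every Venn region whose membership includes the given dataset."""
    return sum(count for members, count in regions.items() if dataset in members)


for name, reported in reported_totals.items():
    computed = total_for(name)
    print(f"{name}: computed={computed} reported={reported} match={computed == reported}")
```

With the values in the table, all three computed totals agree with the reported ones.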

Here we present metrics from each portion of the Venn diagram (Fig. 2): total number of variants, transition/transversion (Ti/Tv) ratio, percentage of variants in dbSNP, average population frequency, and percentage of variants that change the protein product. The top part of the table reports these characteristics for all variants called after duplicate removal with Picard, after duplicate removal with SAMTools, or with no duplicate removal; these datasets are referred to as Picard, SAMTools, and No Dups, respectively. Population frequencies are based on the 1000 Genomes Project. dbSNP variants refer to build 138, and any variant not present in dbSNP is considered novel. Protein-changing variants are missense SNVs or frameshifting InDels. We performed a chi-square goodness-of-fit test for significant differences among the values in each column. Two tests were performed for each column: (1) comparing the values for all variants in each main dataset (“All Picard”, “All SAMTools”, and “All No Dups”); and (2) comparing the values for variants across all “Unique” groups. The number of variants differed significantly across groups, but none of the other measures did.
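
To make the test described above concrete, here is a minimal sketch of a chi-square goodness-of-fit calculation in Python using only the standard library. It assumes the expected value for every group is the mean of the observed values (a uniform null), and it takes the raw table entries as input. The exact inputs and software used for the published p-values are not stated here, so the results need not match the table digit for digit. The function names `chi2_sf` and `goodness_of_fit` are hypothetical helpers written for this illustration.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable, integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-y) times a finite sum of y**i / i!.
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= y / i
            total += term
        return min(1.0, math.exp(-y) * total)
    # Odd df: start from the df = 1 case (erfc), then add the recurrence terms.
    total = math.erfc(math.sqrt(y))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-y) * y ** (i - 0.5) / math.gamma(i + 0.5)
    return min(1.0, total)


def goodness_of_fit(observed):
    """Chi-square goodness-of-fit against equal expected values (assumption)."""
    expected = sum(observed) / len(observed)
    stat = sum((o - expected) ** 2 / expected for o in observed)
    return stat, chi2_sf(stat, len(observed) - 1)


# Columns copied from Table 1.
all_totals = [16354497, 16250761, 16494672]      # All Picard, SAMTools, No Dups
all_titv = [2.14, 2.14, 2.14]                    # Ti/Tv ratio for the same rows
unique_totals = [307486, 150474, 398248, 181176, 177313, 230589]  # "Unique" rows

for label, values in [
    ("Total variants (All rows)", all_totals),
    ("Ti/Tv ratio (All rows)", all_titv),
    ("Total variants (Unique rows)", unique_totals),
]:
    stat, p = goodness_of_fit(values)
    print(f"{label}: chi2={stat:.4g}, df={len(values) - 1}, p={p:.3g}")
```

Under these assumptions the variant counts give a very large statistic and a p-value far below 2.60e-16, while the identical Ti/Tv ratios give a statistic of essentially zero and a p-value of 1, in line with the pattern reported in the table.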