2016 Jul 25;17(Suppl 7):239. doi: 10.1186/s12859-016-1097-3

Table 1.

Minimal differences between Picard, SAMTools, and no duplicate removal

Subset	Total Variants	Ti/Tv Ratio	% Variants in dbSNP	Avg. Population Frequency	% Protein Changing Variants
All Picard	16354497	2.14	72.05	0.21	0.40
All SAMTools	16250761	2.14	71.86	0.22	0.40
All No Dups	16494672	2.14	71.30	0.21	0.40
P-Value	<2.60e-16	1.00	0.99	0.99	1.00
Common to all three	15688522	2.15	80.18	0.22	0.41
Unique to Picard	307486	1.92	66.27	0.16	0.33
Unique to SAMTools	150474	1.80	69.59	0.19	0.26
Unique to No Dups	398248	1.95	54.07	0.16	0.34
Unique to Picard/SAMTools	181176	1.97	73.86	0.22	0.33
Unique to Picard/No Dups	177313	2.07	65.30	0.21	0.31
Unique to SAMTools/No Dups	230589	1.73	52.17	0.23	0.24
P-Value (comparing Unique rows)	<2.60e-16	1.00	0.32	0.84	1.00
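
The region counts in the table can be cross-checked against the "All" rows with simple arithmetic. The sketch below assumes that each "Unique to A/B" row counts variants shared by exactly those two datasets and absent from the third; the names `REGIONS`, `REPORTED_TOTALS`, and `dataset_total` are illustrative and are not from the original analysis. Under that reading, the regions containing each dataset sum to the reported total for that dataset.

```python
# Venn-region counts copied from Table 1.
# Assumption: a "Unique to A/B" row holds variants found in exactly A and B.
REGIONS = {
    frozenset({"Picard", "SAMTools", "No Dups"}): 15688522,
    frozenset({"Picard"}): 307486,
    frozenset({"SAMTools"}): 150474,
    frozenset({"No Dups"}): 398248,
    frozenset({"Picard", "SAMTools"}): 181176,
    frozenset({"Picard", "No Dups"}): 177313,
    frozenset({"SAMTools", "No Dups"}): 230589,
}

# "Total Variants" values from the "All ..." rows of Table 1.
REPORTED_TOTALS = {
    "Picard": 16354497,
    "SAMTools": 16250761,
    "No Dups": 16494672,
}


def dataset_total(name, regions=REGIONS):
    """Sum every Venn region whose membership includes the named dataset."""
    return sum(count for members, count in regions.items() if name in members)


for name, reported in REPORTED_TOTALS.items():
    computed = dataset_total(name)
    print(f"{name}: computed={computed} reported={reported} "
          f"match={computed == reported}")
```
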

Here we present metrics from each portion of the Venn diagram (Fig. 2), including the total number of variants, transition/transversion (Ti/Tv) ratio, percentage of variants in dbSNP, average population frequency, and percentage of variants that change the protein product. In the top part of the table, variant characteristics are reported for all variants resulting from duplicate removal using Picard or SAMTools, or from no duplicate removal. Variants from the dataset processed using Picard are referred to as Picard, those processed using SAMTools as SAMTools, and those from the dataset without duplicate removal as No Dups. Population frequencies are based on the 1000 Genomes Project; dbSNP variants refer to build 138, and any variant not present in dbSNP is considered novel; protein-changing variants are missense SNVs or frameshifting InDels. We performed a chi-square goodness-of-fit test to check for significant differences among the values in each column. Two tests were performed for each column: (1) comparing the values for all variants in each main dataset (“All Picard”, “All SAMTools”, and “All No Dups”); and (2) comparing the values for variants across all “Unique” groups. There was a significant difference in the number of variants across groups, but none of the other measures were significantly different.
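
As a rough illustration of the first test on the "Total Variants" column, the sketch below runs a chi-square goodness-of-fit test on the three "All" totals. It assumes the null hypothesis is equal counts across the three datasets, which the legend does not state explicitly, so it may not match the authors' exact procedure. The function names are illustrative. With three groups there are 2 degrees of freedom, for which the chi-square survival function has the closed form exp(-x/2), so no statistics library is needed.

```python
import math


def chi_square_gof_uniform(observed):
    """Chi-square statistic against equal expected counts (an assumption)."""
    expected = sum(observed) / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)


def chi2_sf_df2(statistic):
    """Upper-tail probability of a chi-square variable with exactly 2 df."""
    return math.exp(-statistic / 2.0)


# "Total Variants" for All Picard, All SAMTools, All No Dups (Table 1).
totals = [16354497, 16250761, 16494672]

statistic = chi_square_gof_uniform(totals)
p_value = chi2_sf_df2(statistic)  # 3 groups -> 2 degrees of freedom

print(f"chi-square statistic: {statistic:.1f}")
print(f"p-value below 2.60e-16: {p_value < 2.60e-16}")
```

Because the counts are in the tens of millions, even differences of under 1% between datasets produce a very large statistic, which is consistent with the table reporting a significant difference for the number of variants but not for the other columns.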