Hum Mutat. 2015 Oct 26;37(1):28–35. doi: 10.1002/humu.22911

Table 4.

Comparing Performance with Previously Published Results and Testing All Methods with the New Multi‐Method Benchmark Dataset

| Method | Sensitivity (previously published) | Specificity (previously published) | Sensitivity (multi‐method benchmark) | Specificity (multi‐method benchmark) | Balanced accuracy (multi‐method benchmark) |
| --- | --- | --- | --- | --- | --- |
| **In‐frame** |  |  |  |  |  |
| VEST‐indel | 0.90^a | 0.90^a | 0.81 | 0.96 | 0.88 |
| SIFT‐indel | 0.81 | 0.82 | 0.86 | 0.76 | 0.81 |
| DDIG‐in | 0.89 | N/A | 0.78 | 0.91 | 0.84 |
| PROVEAN | 0.93/0.96 | 0.80/0.68 | 0.95 | 0.80 | 0.88 |
| CADD | N/A | N/A | 0.74 | 0.88 | 0.81 |
| **Frameshift** |  |  |  |  |  |
| VEST‐indel | 0.83^a | 0.88^a | 0.85 | 0.95 | 0.90 |
| SIFT‐indel | 0.90 | 0.78 | 0.94 | 0.25 | 0.59 |
| DDIG‐in | 0.86 | 0.72 | 0.75 | 0.80 | 0.77 |
| CADD | N/A | N/A | 0.98 | 0.05 | 0.52 |

Previously published sensitivity and specificity are based on the authors' cross‐validation experiments. PROVEAN does not use cross‐validation, so its reported numbers come from validation‐set experiments performed separately for insertion and deletion variants. N/A, not applicable. Published results for the DDIG‐in in‐frame classifier do not include specificity; the authors instead report an accuracy (not balanced accuracy) of 0.84 and a precision of 0.81. The authors of CADD did not report performance on indels separately.

^a Results from Table 1 are included here for comparison. The multi‐method benchmark set consisted of pathogenic examples from the Human Gene Mutation Database (release 2014.4) and benign examples from 1000 Genomes Phase 3 (minor allele frequency in the African ancestry population ≥ 0.1).
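
Balanced accuracy is the unweighted mean of sensitivity and specificity, which keeps methods comparable even when a benchmark set is imbalanced between pathogenic and benign examples; note how SIFT‐indel's frameshift specificity of 0.25 pulls its balanced accuracy down to 0.59 despite a sensitivity of 0.94. A minimal sketch of the computation from confusion‐matrix counts follows; the counts are hypothetical, chosen only to reproduce one row of the table, and do not reflect the actual benchmark composition.

```python
def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> tuple[float, float, float]:
    """Sensitivity, specificity, and balanced accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)      # fraction of pathogenic variants predicted pathogenic
    specificity = tn / (tn + fp)      # fraction of benign variants predicted benign
    balanced_accuracy = (sensitivity + specificity) / 2  # unweighted mean of the two rates
    return sensitivity, specificity, balanced_accuracy

# Hypothetical counts that reproduce the VEST-indel frameshift row:
# sensitivity 0.85, specificity 0.95, balanced accuracy 0.90.
sens, spec, bal_acc = classification_metrics(tp=85, fn=15, tn=95, fp=5)
print(f"sensitivity={sens:.2f}  specificity={spec:.2f}  balanced_accuracy={bal_acc:.2f}")
# sensitivity=0.85  specificity=0.95  balanced_accuracy=0.90
```

This is also why the footnote distinguishes DDIG‐in's self‐reported accuracy from balanced accuracy: plain accuracy weights every example equally and can mask a weak specificity on an imbalanced set, whereas balanced accuracy weights the two classes equally.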