Table 1.
| Software | Supernat et al., 2018 (59)¹ | Bian et al., 2018 (60) | Pei et al., 2020 (61) | Kumaran et al., 2019 (62) | Wang et al., 2020 (63)⁴ | Wang et al., 2020 (63)⁵ | Hofmann et al., 2017 (64) |
|---|---|---|---|---|---|---|---|
| DeepVariant | #1 | #1 | | | | | |
| GATK MuTect2 | #1 | #1 | #2² | #2 | #3⁷ | | |
| SpeedSeq | #1 | | | | | | |
| TNscope | #1² | | | | | | |
| MuSE | #2 | #1 | #2 | | | | |
| Strelka | #1 | #1 | | | | | |
| LoFreq | #1 | #2 | | | | | |
| JointSNVMix2 | #1 | | | | | | |
| SAMtools | #2 | #4 | | | | | |
| MuTect | #3 | #2 | #2 | | | | |
| DeepSNV | #2 | | | | | | |
| NeuSomatic | #3² | | | | | | |
| SomaticSniper | #3 | #3 | | | | | |
| GATK UnifiedGenotyper | #3 | | | | | | |
| GATK HaplotypeCaller | #3 | #4 | | | | | |
| VarDict | #4 | #3 | #4⁸ | | | | |
| VarScan | #3 | | | | | | |
| FreeBayes | #4 | | | | | | |
| VarScan2 | #4³ | #4 | | | | | |
| Strelka2 | #4³ | | | | | | |
| TNseq⁶ | #4 | | | | | | |
Benchmarking papers from before 2017 were excluded because they typically compared outdated software versions or tools that are no longer maintained. Numbers (and, in the original, colors) indicate the relative ranking within each paper, from one (green, highest) through two (yellow) and three (orange) to four (red, lowest).
¹These rankings are based on 30× data; on 15× data, the performance advantage of DeepVariant was even larger.
²At 20% tumor purity.
³Good performance at high purity, but poor performance for low-purity samples.
⁴Results based on DREAM WGS datasets as ground truth.
⁵Results based on WES and deep-sequencing spike-in studies.
⁶Software is not free.
⁷High performance at low variant allele frequency (VAF), low performance at high VAF.
⁸High sensitivity, but with a very high false-positive rate.
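To make the trade-off in footnote 8 concrete, the sketch below computes sensitivity and false discovery rate from a confusion-matrix tally. The counts are made up for illustration and do not come from any of the benchmarked papers; they simply show how a caller can recover nearly all true variants while still reporting many false positives.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true variants that the caller recovered (recall)."""
    return tp / (tp + fn)

def false_discovery_rate(tp: int, fp: int) -> float:
    """Fraction of the caller's reported variants that are wrong."""
    return fp / (tp + fp)

# Hypothetical caller: finds almost every true variant, but at the cost
# of a large number of spurious calls (the VarDict pattern in footnote 8).
tp, fn, fp = 980, 20, 700
print(f"sensitivity          = {sensitivity(tp, fn):.2f}")           # 0.98
print(f"false discovery rate = {false_discovery_rate(tp, fp):.2f}")  # 0.42
```

A ranking based on sensitivity alone would favor such a caller; benchmarks therefore also report precision-related metrics, which is why the same tool can rank first in one paper and last in another.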