Table 1:
Package | Reference | Available on | Language | Method | Synthetic: Accuracy | Synthetic: nMCC | Synthetic: F1 | Synthetic: TPR | Synthetic: TNR | WGS: Accuracy | WGS: nMCC | WGS: F1 | WGS: TPR | WGS: TNR |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Katdetectr | [21] | Bioconductor | R | Changepoint analysis (PELT) | 0.99 | 0.98 | 0.97 | 0.94 | 0.99 | 0.99 | 0.92 | 0.83 | 0.91 | 0.99 |
SeqKat | [13] | CRAN | R | Sliding window/exact binomial test | 0.84 | 0.54 | 0.02 | 0.93 | 0.84 | 0.99 | 0.85 | 0.69 | 0.59 | 0.99 |
MafTools | [10] | Bioconductor | R | Sliding window/piecewise constant fit (PCF) | 0.74 | 0.53 | 0.01 | 0.96 | 0.74 | 0.99 | 0.85 | 0.66 | 0.93 | 0.99 |
SigProfilerClusters | [14] | GitHub | Python | Model sample-specific IMD cutoff | 0.65 | 0.52 | 0.01 | 0.88 | 0.65 | 0.99 | 0.84 | 0.68 | 0.66 | 0.99 |
ClusteredMutations | [11] | CRAN | R | Anti-Robinson matrix | 0.70 | 0.53 | 0.01 | 0.99 | 0.74 | 0.99 | 0.83 | 0.61 | 0.99 | 0.99 |
Kataegis | [12] | GitHub | R | Piecewise constant fit (PCF) | 0.99 | 0.80 | 0.52 | 0.36 | 0.99 | 0.99 | 0.56 | 0.03 | 0.02 | 0.99 |
Summary of all evaluated kataegis detection packages and their performance metrics for kataegis classification on 1,024 synthetic samples and 507 a priori labeled whole-genome sequenced (WGS) samples. Abbreviations: nMCC, normalized Matthews correlation coefficient; F1, F1-score; TPR, true-positive rate; TNR, true-negative rate; PELT, pruned exact linear time; PCF, piecewise constant fit; IMD, intermutation distance.
Note: Highest value per column is underscored.
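
For readers less familiar with the metrics in Table 1, the following is a minimal sketch of how they can be derived from a binary confusion matrix. It assumes the usual normalization nMCC = (MCC + 1) / 2, which rescales MCC from [-1, 1] to [0, 1]. The function name `classification_metrics` and the example counts are hypothetical and are not taken from the evaluated datasets.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, nMCC, F1, TPR, and TNR from confusion-matrix counts.

    Assumption: nMCC is taken as (MCC + 1) / 2, the usual rescaling of
    MCC from [-1, 1] to [0, 1].
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    # True-positive rate (sensitivity) and true-negative rate (specificity).
    tpr = tp / (tp + fn) if (tp + fn) else float("nan")
    tnr = tn / (tn + fp) if (tn + fp) else float("nan")
    # F1 is the harmonic mean of precision and recall.
    f1_denom = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denom if f1_denom else float("nan")
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc_denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_denom if mcc_denom else 0.0
    nmcc = (mcc + 1) / 2
    return {"accuracy": accuracy, "nMCC": nmcc, "F1": f1, "TPR": tpr, "TNR": tnr}


# Hypothetical, strongly imbalanced counts (not data from the study):
# few positives, many negatives, and a moderate number of false positives.
example = classification_metrics(tp=90, fp=1000, tn=9000, fn=10)
for name, value in example.items():
    print(f"{name}: {value:.2f}")
```

With imbalanced classes such as these, accuracy, TPR, and TNR can all be high while F1 stays low, because false positives outnumber the few true positives. This is one reason to read the F1 and nMCC columns alongside accuracy.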