2023 Oct 17;12:giad081. doi: 10.1093/gigascience/giad081

Table 1: Summary and performance of kataegis detection packages

| Package | Reference | Available on | Language | Method | Synthetic: Accuracy | Synthetic: nMCC | Synthetic: F1 | Synthetic: TPR | Synthetic: TNR | WGS: Accuracy | WGS: nMCC | WGS: F1 | WGS: TPR | WGS: TNR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Katdetectr | [21] | Bioconductor | R | Changepoint analysis (PELT) | 0.99 | 0.98 | 0.97 | 0.94 | 0.99 | 0.99 | 0.92 | 0.83 | 0.91 | 0.99 |
| SeqKat | [13] | CRAN | R | Sliding window/exact binomial test | 0.84 | 0.54 | 0.02 | 0.93 | 0.84 | 0.99 | 0.85 | 0.69 | 0.59 | 0.99 |
| MafTools | [10] | Bioconductor | R | Sliding window/piecewise constant fit (PCF) | 0.74 | 0.53 | 0.01 | 0.96 | 0.74 | 0.99 | 0.85 | 0.66 | 0.93 | 0.99 |
| SigProfilerClusters | [14] | GitHub | Python | Model sample-specific IMD cutoff | 0.65 | 0.52 | 0.01 | 0.88 | 0.65 | 0.99 | 0.84 | 0.68 | 0.66 | 0.99 |
| ClusteredMutations | [11] | CRAN | R | Anti-Robinson matrix | 0.70 | 0.53 | 0.01 | 0.99 | 0.74 | 0.99 | 0.83 | 0.61 | 0.99 | 0.99 |
| Kataegis | [12] | GitHub | R | Piecewise constant fit (PCF) | 0.99 | 0.80 | 0.52 | 0.36 | 0.99 | 0.99 | 0.56 | 0.03 | 0.02 | 0.99 |

Summary information for all evaluated kataegis detection packages and their performance metrics for kataegis classification on 1,024 synthetic samples and 507 a priori labeled whole-genome sequenced (WGS) samples.

Abbreviations: nMCC, normalized Matthews correlation coefficient; F1, F1-score; TPR, true-positive rate; TNR, true-negative rate; PELT, pruned exact linear time; PCF, piecewise constant fit; IMD, intermutation distance.
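For readers unfamiliar with the reported metrics, the following minimal sketch shows how accuracy, nMCC, F1, TPR, and TNR could be computed from a binary confusion matrix. It is an illustration, not the evaluation code used for the table: the function name `classification_metrics` and the example counts are hypothetical, and the normalization nMCC = (MCC + 1) / 2 is assumed here as the usual way of rescaling MCC from [-1, 1] to [0, 1].

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Return accuracy, nMCC, F1, TPR and TNR for binary confusion counts.

    Hypothetical helper for illustration only.
    """
    total = tp + fp + tn + fn
    nan = float("nan")

    accuracy = (tp + tn) / total if total else nan
    tpr = tp / (tp + fn) if (tp + fn) else nan  # sensitivity / recall
    tnr = tn / (tn + fp) if (tn + fp) else nan  # specificity
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else nan

    # Matthews correlation coefficient; set to 0 when the denominator is 0,
    # a common convention (assumption, not taken from the source).
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    # Assumed normalization: map MCC from [-1, 1] onto [0, 1].
    nmcc = (mcc + 1) / 2

    return {"accuracy": accuracy, "nMCC": nmcc, "F1": f1, "TPR": tpr, "TNR": tnr}


if __name__ == "__main__":
    # Made-up counts: 10 positives and 90 negatives.
    for name, value in classification_metrics(tp=8, fp=2, tn=88, fn=2).items():
        print(f"{name}: {value:.2f}")
```

Because accuracy and TNR count true negatives while F1 ignores them, a classifier evaluated on data where negatives dominate can score high on accuracy and TNR and still have a low F1.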

Note: Highest value per column is underscored.