a BAD calling with Bayesian changepoint identification applied to variant calls detected on chr2 and chr6 in K562 ENCODE data (ENCBS725WFV). X-axis: chromosome position, bp. Y-axis: the allelic imbalance of individual SNVs. Horizontal green lines (ground level of the plots) indicate results of the initial stage of the algorithm: the detection of SNV-free regions, including deletions and telomeric and centromeric segments. Horizontal light-blue lines: predicted BAD. Orange dashes: “ground truth” BAD according to the COSMIC data (when available). b
Y-axis: SNV-level Kendall $\tau_b$ rank correlation between the predicted BAD and the “ground truth” BAD (COSMIC data). Each of the 516 points denotes a particular group of related data sets of the same series (ENCODE biosample or GEO GSE ID) and the same cell type. X-axis: the number of SNV calls in a particular group of related data sets. Only SNVs falling into regions of known BAD (present in the COSMIC data) are considered; SNVs recurring in several data sets are counted only once. c, d Receiver operating characteristic and precision-recall curves for predicted BAD maps used as binary classifiers of individual SNVs according to BAD vs the “ground truth” COSMIC data. To plot each curve, the score $S = L(\mathrm{BAD} = x) - \max_{y \neq x} L(\mathrm{BAD} = y)$, where $L$ denotes log-likelihood, was used as the prediction score for thresholding.
Colored circles denote the values obtained with the final BAD maps, where a particular BAD value was assigned to each segment according to the maximum posterior. Regions with BAD of 1, 3/2, 2, and 3 contain more than 97% of all candidate ASB variants. SNP single-nucleotide polymorphism, SNV single-nucleotide variant, AD allelic dosage, BAD background allelic dosage, TPR true positive rate, FPR false positive rate.
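The prediction score used for the curves in panels c and d can be illustrated with a minimal sketch. It assumes the per-SNV log-likelihoods for each candidate BAD value are already available as a matrix; the function name `bad_margin_scores`, the list of candidate BAD values, and the numbers are hypothetical and are not taken from the authors' implementation.

```python
import numpy as np

def bad_margin_scores(loglik):
    """Return S = L(BAD = x) - max_{y != x} L(BAD = y) for every row and every x.

    loglik: array of shape (n_items, n_bad_states) with log-likelihoods.
    """
    loglik = np.asarray(loglik, dtype=float)
    scores = np.empty_like(loglik)
    for x in range(loglik.shape[1]):
        # Best log-likelihood among all competing BAD values y != x.
        best_other = np.delete(loglik, x, axis=1).max(axis=1)
        scores[:, x] = loglik[:, x] - best_other
    return scores

# Hypothetical candidate BAD values (columns) and made-up log-likelihoods (rows).
bad_states = [1.0, 1.5, 2.0, 3.0]
loglik = np.array([
    [-10.0, -12.0, -15.0, -20.0],
    [-30.0, -25.0, -24.0, -28.0],
])

scores = bad_margin_scores(loglik)

# Binary call "BAD = 2" at an illustrative threshold of 0; sweeping this
# threshold is what would trace out a ROC or precision-recall curve.
is_bad_2 = scores[:, bad_states.index(2.0)] >= 0.0
```

A positive score in column x means that BAD = x has the highest log-likelihood for that item, so thresholding at 0 matches a maximum-likelihood call; note that the final maps in the figure are assigned by maximum posterior, which can differ when the prior is not flat.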