Extended Data Fig. 7. Internal and external metric correlations.
a. SpliceAI scores are significantly higher for depleted and enriched synonymous/intron variants than for unchanged synonymous/intron variants (two-sided Mann-Whitney-Wilcoxon test, ****p<0.0001, **p<0.01; depleted and enriched synonymous vs unchanged synonymous, p<2.2e-16 and p=0.009, respectively; depleted and enriched intron vs unchanged intron, p<2.2e-16 and p=2.9e-05, respectively). For non-splice region missense variants, SpliceAI scores of depleted and enriched variants are not significantly different from those of unchanged variants (p=0.15 and p=0.11, respectively; two-sided Mann-Whitney-Wilcoxon test). Boxes show the interquartile range, horizontal lines the median of the maximum SpliceAI score and whiskers the maximum and minimum values; outliers are shown as points. b. Average functional score for 4,619 ‘control’ variants (3,993 missense, 188 stop-gained and 438 synonymous) generated using redundant codons (snvre LFC mean) created in VaLiAnT13, compared with the average functional score for the same variant generated by an SNV, coloured by SNV classification. Pearson’s correlation coefficient R and two-sided t-test p value are shown. c. SGE classifications used as standards to compare in silico predictors: of 8,470 non-splice region missense variants, 6,334 unchanged and 1,839 depleted variants were used (297 enriched variants were excluded). EVE, CADD and PolyPhen-2 predicted SGE classifications with 79.5%, 77.9% and 76.7% accuracy, respectively. d-g. Bar charts show variants by classification for 8,470 missense variants. h. Strongly depleted variants show earlier depletion than most weakly depleted variants, as observed by LFC D4 D10 FDR. i. Known9,24 BAP1 developmental variants show strong and weak depletion (c.1308A>G and c.2153G>A are unchanged). Functional score (bar) and DESeq2-calculated standard error (+/- error bars) from 3 biological replicates. j. Age of cancer onset for 256 carriers of BAP1 germline variants reported in a clinical analysis of 181 carrier families26. Carriers of strongly and weakly depleted variants show no difference in age of onset.
Carriers of variants in either depleted category have an earlier age of onset than carriers of unchanged variants (two-sided Dunn’s test with Benjamini-Hochberg FDR correction, ****q=5.91e-05, **q=0.0074, ns q=0.17). Boxes show the interquartile range, horizontal lines the median age of onset and whiskers the maximum and minimum values; outliers are shown as points. k. Top, germline cancer variants26 by primary diagnosis site (where the tumour site had >5 associated variants), coloured by functional classification. Bottom, MSK-IMPACT43 somatic variants by cancer type (where the cancer type had >5 associated variants). Strongly and weakly depleted classifications are distributed across cancer sites and types.