Author manuscript; available in PMC: 2016 Jun 21.

Published in final edited form as: Nat Genet. 2015 Dec 21;48(2):117–125. doi: 10.1038/ng.3471

Identification of significantly mutated regions (SMRs) in 20 cancer types across a broad spectrum of functional elements. (a) Pan-cancer distribution of mutation types in n = 3,078,482 somatic single-nucleotide variant (SNV) calls. (b) Exons and exon-proximal domains (±1,000 bp) were scanned for clusters of somatic mutations (orange, DBSCAN). The distance parameter ε is dynamically defined as the average distance between mutated positions (d_p) within the domain of size d_s. Clusters (green) are divided if sub-clusters with higher mutation densities (P < 0.05, binomial test) are found in a second-pass analysis, with ε defined as the average distance between mutated positions (c_p) within the cluster of size c_s (see Online Methods for density scoring and FDR calculation). (c) Per-cancer mutation frequency and density scores of discovered SMRs (color-coded by type and labelled by associated gene). The distributions of density scores in evaluated regions and of SMR region types are shown in the middle and bottom insets, respectively. Dashed lines indicate the minimum, median, and maximum density score thresholds at 5% FDR. The “Exon*” label refers to coding exons and non-coding genes. (d) Number of SMRs with FDR ≤ 5% and mutation frequency ≥ 2% per cancer type. Gray bars indicate SMRs with FDR ≤ 5% but mutation frequency < 2%. (e) SMR size distribution. (f) Concordance between SMRs discovered using background models derived from whole-genome (WGS-based) or whole-exome (WES-based) sequencing. (g) Categories with a significant fold change in mutation type representation between SMR-associated and input mutations are denoted (*; P < 0.01). (h) Distribution of the number of mutations per sample in SMRs (blue) and in 58 recurrently altered non-coding regions (green)²⁰.
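
The two-pass clustering in panel (b) can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions: ε is read as the domain (or cluster) size divided by the number of distinct mutated positions; DBSCAN is replaced by its one-dimensional special case with a minimum of two points per cluster, which reduces to chaining sorted positions whose gaps do not exceed ε; and the binomial test is taken to be a one-sided upper-tail test in which each mutation falls in the sub-cluster with probability equal to the sub-cluster's share of the parent cluster's length. The function names and the example positions are hypothetical; the Online Methods give the actual density scoring and FDR procedure.

```python
from math import comb


def dynamic_epsilon(positions, region_size):
    """Assumed reading of the caption: region size / number of distinct mutated positions."""
    n = len(set(positions))
    if n == 0:
        raise ValueError("no mutated positions in region")
    return region_size / n


def chain_clusters(positions, eps):
    """1D density clustering: maximal runs of sorted positions with gaps <= eps.

    Equivalent to DBSCAN on a line with min_samples=2; isolated positions
    are treated as noise and dropped.
    """
    clusters, current = [], []
    for p in sorted(set(positions)):
        if current and p - current[-1] > eps:
            if len(current) >= 2:
                clusters.append(current)
            current = []
        current.append(p)
    if len(current) >= 2:
        clusters.append(current)
    return clusters


def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def split_if_denser(cluster, alpha=0.05):
    """Second pass: re-cluster with a cluster-specific eps and keep the
    sub-clusters that are significantly denser than the parent (assumed test)."""
    size = cluster[-1] - cluster[0] + 1
    eps = dynamic_epsilon(cluster, size)
    kept = []
    for sub in chain_clusters(cluster, eps):
        if len(sub) == len(cluster):
            continue  # no actual split
        share = (sub[-1] - sub[0] + 1) / size
        if binom_upper_tail(len(sub), len(cluster), share) < alpha:
            kept.append(sub)
    return kept or [cluster]


# Hypothetical mutated positions in a 2,000 bp domain.
positions = [100, 105, 110, 118, 900, 1500, 1504, 1509]
eps = dynamic_epsilon(positions, 2000)
first_pass = chain_clusters(positions, eps)
second_pass = [sub for c in first_pass for sub in split_if_denser(c)]
```

Restricting the sketch to a minimum cluster size of two keeps it dependency-free; a general DBSCAN with a larger minimum would need an explicit core-point check.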