Bioinformatics. 2020 Apr 13;36(12):3637–3644. doi: 10.1093/bioinformatics/btaa242

© The Author(s) 2020. Published by Oxford University Press.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

Fig. 2. Scoring distributions for SNVs in the non-coding datasets show differences between germline (1000 Genomes) and rare somatic (COSMIC, r = 1) examples. The features that discriminate most clearly between germline and somatic variants are conservation scores (top) and the somatic mutation frequency within a local region (bottom). Conservation scores do not yield the kind of discrimination typically seen when comparing pathogenic or oncogenic mutants with presumed benign variants; however, PhyloP scores suggest that putative somatic passenger variants are more closely associated with highly conserved regions (lower scores indicate greater conservation) than benign germline variants are (top). The same pattern holds for other conservation scores, but the distinction is less clear (Supplementary Fig. S2). Somatic variants also appear to reside in regions with higher mutation tolerance, as measured by the number of somatic variants found within a region of 1000 positions (bottom). For each subplot, the probability that the two distributions come from the same underlying distribution is bounded above by $10^{-18}$, so the differences are statistically significant.
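
The local mutation-frequency feature in the bottom panel (the number of somatic variants found within a region of 1000 positions) could be computed along the following lines. This is a minimal sketch, not the authors' implementation: the function name `local_somatic_count`, the choice of a window centered on the query position, and the assumption that all positions are sorted and lie on one chromosome are all illustrative assumptions that the caption does not specify.

```python
from bisect import bisect_left, bisect_right


def local_somatic_count(query_pos, somatic_positions, window=1000):
    """Count somatic variants in a window of `window` positions around query_pos.

    Assumptions (not stated in the caption):
      * the window is centered on the query, covering
        [query_pos - window // 2, query_pos + window // 2 - 1];
      * somatic_positions is a sorted list of integer coordinates
        on the same chromosome as query_pos.
    """
    half = window // 2
    start = query_pos - half          # first position inside the window
    end = query_pos + half - 1        # last position inside the window
    # Binary search for the slice of sorted positions that falls in [start, end].
    lo = bisect_left(somatic_positions, start)
    hi = bisect_right(somatic_positions, end)
    return hi - lo


# Hypothetical somatic variant coordinates, for illustration only.
somatic = [100, 450, 900, 1400, 1499, 1500, 5000]
counts = [local_somatic_count(p, somatic) for p in (100, 1000, 5000, 10000)]
```

Binary search keeps each lookup logarithmic in the number of catalogued variants, which matters when scoring many query positions against a large somatic catalogue.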