2015 Nov 20;11(11):e1004583. doi: 10.1371/journal.pcbi.1004583

Fig 2. Construction of the Somatic Mutation (SOM) model for liver cancer.


A. Relative density of somatic mutations from whole genome sequences of 88 liver tumors [11], associated with different genome features (see Methods for feature details). Mutation density is normalized so that the whole-genome average has a mutation density of 1. PC gene: protein coding gene; CDS: coding sequence; Exon.P, Intron.P, Exon.L and Intron.L are exons and introns of protein coding genes and lncRNAs, respectively; CR: conserved region; DNase: DNase I hypersensitive site; ECS: evolutionarily conserved structure; ncExon: non-coding exon; PC gene.HE, LncRNA.HE, PC gene.LE and LncRNA.LE are highly expressed and lowly expressed protein coding genes and lncRNAs; PC gene.early, LncRNA.early, PC gene.late and LncRNA.late are early- and late-replicating protein coding genes and lncRNAs; cTFBS: conserved transcription factor binding site; RR H, RR L, GC H, GC L, DNA.met H and DNA.met L are 1-Kb windows with high recombination rate (> 4.0), low recombination rate (< 0.5), high GC content (GC content > 50%), low GC content (GC content < 30%), high DNA methylation (average value > 0.7245) and low DNA methylation (average value < 0.4062), respectively. Blue and red dotted lines: baselines showing average values for CDS and intergenic regions, respectively. B. Feature importance as measured by IncNodePurity. Only features that passed feature selection are shown. C. Distribution of SOM scores for neutral SNPs and for clinical variants from two databases of disease-causing variants, ClinVar and HGMD. Neutral SNPs here are SNPs from the 1000 Genomes Project with allele frequency higher than 0.01. SOM scores predicted by the random forest model were divided by the number of patients. D. Correlation of SOM score with densities of disease-causing variants. Genome positions were sorted by SOM score and split into 100-Mb intervals. The plots show the average SOM score and the density of disease-causing variants for each interval. The purple dotted line shows the cutoff used to define a low SOM score in subsequent analyses.
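The arithmetic behind panels A, C and D can be sketched as below. This is a minimal illustration, not the authors' pipeline. All function and parameter names (`relative_density`, `per_patient_score`, `binned_averages`, `bin_size`) are hypothetical, and the binning simply groups a fixed number of score-sorted positions per interval.

```python
from typing import Dict, List, Sequence, Tuple


def relative_density(
    feature_counts: Dict[str, Tuple[int, int]],
    genome_mutations: int,
    genome_length: int,
) -> Dict[str, float]:
    """Panel A: mutation density per feature, scaled so the genome average is 1.

    feature_counts maps a feature name to (mutations in feature, bases in feature).
    """
    genome_rate = genome_mutations / genome_length
    return {
        name: (mutations / bases) / genome_rate
        for name, (mutations, bases) in feature_counts.items()
    }


def per_patient_score(raw_score: float, n_patients: int) -> float:
    """Panel C: divide a predicted score by the number of patients."""
    return raw_score / n_patients


def binned_averages(
    scores: Sequence[float],
    disease_flags: Sequence[int],
    bin_size: int,
) -> List[Tuple[float, float]]:
    """Panel D: sort positions by score, split them into consecutive bins of
    bin_size positions, and return (mean score, disease-variant density) per bin.

    disease_flags[i] is 1 if position i carries a disease-causing variant, else 0.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    result = []
    for start in range(0, len(order), bin_size):
        idx = order[start:start + bin_size]
        mean_score = sum(scores[i] for i in idx) / len(idx)
        density = sum(disease_flags[i] for i in idx) / len(idx)
        result.append((mean_score, density))
    return result
```

With this normalization, a feature with relative density below 1 is mutated less often than the genome average, which is how the blue (CDS) and red (intergenic) baselines in panel A can be read.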