Sci Rep. 2015 Jan 20;5:7857. doi: 10.1038/srep07857

Table 1. Multi-step filter selects cancer drivers in genomics datasets with high mutational burden.

| Step | Logic | Program | Input | Output | Reference files |
| --- | --- | --- | --- | --- | --- |
| i | Cohort selection | TCGA Portal, CGHub, Gene Torrent | tcga_patient_list.txt | wes.bam | |
| ii | Mutation call | MuTect 1.1.4 [28] | wes.bam | coverage.wig, call_stats.txt | HG19, COSMIC_54.vcf, dbsnp_132.vcf |
| iii | Recurrence, evolutionary conservation | MutSig 2.0 [29] | patient.maf, coverage.wig | covariates.txt | HG19/.maf |
| iv | Correction for background mutation rate | InVEx 1.0.1 [9] | coverage.wig, covariates.txt | significant_mutation_burden.txt, qq.png | HG19/.maf, PPH2, nucleotide_classes_HG19.txt, COSMIC, genePeptideFile_HG19 |
| v | Mutation signature, UV-induced damage | Text editor | patient.maf | sorted_transitions.maf | nucleotide_classes_HG19.txt, uv_transitions.txt [10] |
| vi | Structure-activity relationship | SWISS-MODEL 8.05, TMpred, 25.0 | structure.pdb | model.pdb, tm_model.txt | |
| vii | Pathway enrichment, mutual exclusivity | GSEA 2.1.0 [12], MEMo 1.1 [11] | tcga_patient_list.txt, scna.txt, amp_del_gene.txt, covariates.txt, coverage.wig | modules.txt | |
| viii | Recurrence within PAN-Cancer TCGA | TCGA | covariate_target.txt, patient.maf | covariate.maf | |

Somatic mutations of driver genes are called after i) cohort selection, ii) mapping of the human genome and patient-specific somatic references, iii) assessment of recurrence and evolutionary conservation, and iv) correction for the basal mutation rate, based on the frequency of mutations in introns vs. exons. This first set of filters, i)-iv), is necessary and sufficient to identify statistically significantly enriched somatic mutations of driver genes in any dataset with high mutational burden. In a genome-wide sequencing experiment that aims to find cancer drivers, an additional level of filters, v)-viii), is advantageous. The relevance of mutations is assessed by v) nucleotide signature, vi) structure-activity relationship, vii) pathway enrichment and mutual exclusivity with known cancer drivers, and viii) recurrence in other cancer tissues.
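
As an illustration of the nucleotide-signature filter in step v), the following is a minimal Python sketch, not the authors' procedure (the table lists a text editor for this step). It assumes a tab-separated MAF-like input with the standard `Reference_Allele` and `Tumor_Seq_Allele2` columns, folds each single-nucleotide substitution onto the pyrimidine strand, and reports the fraction of C>T transitions, the class typically associated with UV-induced damage. The function names, the toy records, and the gene labels are hypothetical, and the sketch ignores the dipyrimidine sequence context that a full UV-signature analysis would consider.

```python
import csv
import io
from collections import Counter

# Complement table used to fold substitutions onto the pyrimidine strand.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def substitution_class(ref, alt):
    """Return the substitution as seen from a pyrimidine (C or T) reference base."""
    ref, alt = ref.upper(), alt.upper()
    if ref in ("A", "G"):
        # A purine reference is reported via its complementary strand,
        # so G>A is counted as C>T and A>G as T>C.
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def uv_fraction(maf_text):
    """Count substitution classes in MAF-like text and return (counts, C>T fraction)."""
    reader = csv.DictReader(io.StringIO(maf_text), delimiter="\t")
    counts = Counter()
    for row in reader:
        ref = row["Reference_Allele"].upper()
        alt = row["Tumor_Seq_Allele2"].upper()
        # Keep single-nucleotide substitutions only; skip indels and no-ops.
        if ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
            continue
        counts[substitution_class(ref, alt)] += 1
    total = sum(counts.values())
    return counts, (counts["C>T"] / total if total else 0.0)


# Toy records (hypothetical, for illustration only).
TOY_MAF = "\n".join([
    "Hugo_Symbol\tReference_Allele\tTumor_Seq_Allele2",
    "GENE_A\tC\tT",
    "GENE_B\tG\tA",
    "GENE_C\tC\tT",
    "GENE_D\tA\tT",
    "GENE_E\t-\tA",
])

counts, fraction = uv_fraction(TOY_MAF)
print(dict(counts), fraction)
```

In this toy input, the G>A record is counted as C>T after strand folding and the insertion record is skipped. Any cutoff on the resulting fraction for calling a sample or gene UV-dominated would be an additional assumption that the table does not specify.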