2020 Jun 5;5:26. doi: 10.1038/s41525-020-0133-4

Table 1.

Details of sequence, epigenetic and structural features that can be included in the MutSpot model.

| Feature | Feature detail | Rationale | Source |
| --- | --- | --- | --- |
| Sequence context (SNVs) | Identity of mutated base (A/T or C/G). Trinucleotide and pentanucleotide contexts centered at the mutated base, and 1 bp and 2 bp left and right flanks of the mutated base. | Sequence context is a major covariate of mutation probability. Although previous studies typically considered trinucleotide contexts, mutation rates could be affected by wider sequence contexts [25]. | Computed from mutation data |
| Sequence context (indels) | Presence of poly-A/T or poly-C/G sequences longer than 5 bp at the indel site. | Long mononucleotide repeats could lead to artifacts in indel calling. | Computed from mutation data |
| TF-binding profiles | ChIP-Seq peak profiles of 132 TFs and 1 meta profile including peaks of all TFs from ENCODE cell lines. | TF-binding sites have elevated mutation rates in certain cancers due to impaired nucleotide excision repair. | Zerbino et al. [26] |
| Replication timing | Mean replication timing profile of 13 ENCODE cell lines. | Replication timing is inversely correlated with mutation probability. | Hansen et al. [27] |
| APOBEC editing sites | Predicted APOBEC editing sites. | Elevated mutation rates at APOBEC editing sites could lead to the formation of passenger hotspots. | Buisson et al. [28], Table S2 |
| Local mutation rate | Mutation rate of 100 kb nonoverlapping genomic bins. | To correct for additional unexplained regional variation in mutation rates. | Computed from mutation data |
| Individual mutation count | Mutation burden of individual tumors. | To account for intertumor heterogeneity. | Computed from mutation data |
| Tissue-specific epigenetic profile | Chromatin accessibility and modification profiles from matched tissue/cell type. | Epigenetic profiles from the cell of origin better predict the mutational landscape of tumors [13]. | Supplied by the user |
| COSMIC mutation signatures | Proportion of mutations contributed by a specific mutation signature for each tumor. | To further correct for specific mutational processes in the tumor cohort. | Supplied by the user |
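
The three features listed as "Computed from mutation data" that depend only on sequence and position can be illustrated with a short sketch. This is not MutSpot's implementation (MutSpot is distributed as an R package); the function names `snv_context_features`, `homopolymer_class` and `local_mutation_rate` are hypothetical, positions are assumed to be 0-based, and the local rate is assumed here to be mutations per base pair in each bin, which the table does not specify.

```python
from collections import Counter


def snv_context_features(seq, pos):
    """Sequence-context features for an SNV at 0-based index `pos` of `seq`.

    Illustrative only: returns the base class, the tri- and pentanucleotide
    contexts centered on the mutated base, and the 1 bp / 2 bp flanks.
    """
    if pos < 2 or pos + 2 >= len(seq):
        raise ValueError("need 2 bp of flanking sequence on each side")
    seq = seq.upper()
    base = seq[pos]
    if base not in "ACGT":
        raise ValueError(f"unexpected base {base!r}")
    return {
        "base_class": "A/T" if base in "AT" else "C/G",
        "trinucleotide": seq[pos - 1:pos + 2],
        "pentanucleotide": seq[pos - 2:pos + 3],
        "left_1bp": seq[pos - 1],
        "right_1bp": seq[pos + 1],
        "left_2bp": seq[pos - 2:pos],
        "right_2bp": seq[pos + 1:pos + 3],
    }


def homopolymer_class(seq, pos, min_len=6):
    """Classify the mononucleotide run covering `pos`.

    "Longer than 5 bp" is read as a run of at least 6 identical bases.
    Returns "poly-A/T", "poly-C/G", or None if the run is too short.
    """
    seq = seq.upper()
    base = seq[pos]
    start = pos
    while start > 0 and seq[start - 1] == base:
        start -= 1
    end = pos
    while end + 1 < len(seq) and seq[end + 1] == base:
        end += 1
    if end - start + 1 < min_len:
        return None
    if base in "AT":
        return "poly-A/T"
    if base in "CG":
        return "poly-C/G"
    return None


def local_mutation_rate(positions, chrom_length, bin_size=100_000):
    """Mutations per bp in nonoverlapping bins along one chromosome.

    Assumption: the rate is the pooled mutation count in a bin divided by
    the bin width; the last bin may be shorter than `bin_size`.
    """
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    counts = Counter(p // bin_size for p in positions)
    rates = []
    for i in range(n_bins):
        width = min(bin_size, chrom_length - i * bin_size)
        rates.append(counts[i] / width)
    return rates
```

For example, `snv_context_features("GGACGTTTTTTTCA", 3)` describes a C/G site in the trinucleotide context `ACG`, and `homopolymer_class("GGACGTTTTTTTCA", 8)` flags the 7 bp poly-T run as `"poly-A/T"`. In a regression setting these values would be encoded as covariates alongside the epigenetic and user-supplied features in the table.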