Table 3.
Pearson’s linear correlation coefficient between the vIP of the wild-type nucleotide motifs and their normalized SBS frequency in the different gene regions
| X | Doublets XN | Triplets NXN | Quadruplets NXNN | Quintuplets NNXNN | Sextuplets NNXNNN |
|---|---|---|---|---|---|
| Missense mutations in exons | |||||
| All | -0.39 | -0.39 | -0.35 (<0.01) | -0.34 | -0.31 |
| G | -0.23 | -0.54 | -0.36 | -0.32 | -0.32 |
| C | -0.81 | -0.71 (<0.005) | -0.59 | -0.54 | -0.48 |
| A | 0.77 | 0.55 | 0.39 | 0.36 | 0.23 |
| T | -0.06 | 0.18 | -0.05 | -0.03 | -0.07 |
| Synonymous mutations in exons | |||||
| All | -0.48 | -0.42 | -0.39 | -0.38 | -0.34 |
| G | -0.65 | -0.67 (<0.005) | -0.50 (<0.00005) | -0.42 | -0.33 |
| C | -0.80 | -0.68 (<0.05) | -0.54 | -0.53 | -0.50 |
| A | -0.59 | 0.02 | -0.09 | 0.00 | -0.05 |
| T | -0.87 | 0.00 | -0.21 | -0.17 (<0.05) | -0.13 (<0.0005) |
| Mutations in introns | |||||
| All | -0.39 | -0.42 (<0.005) | -0.38 | -0.36 | -0.33 |
| G | -0.27 | -0.51 (<0.05) | -0.36 | -0.31 | -0.23 |
| C | -0.78 | -0.70 (<0.05) | -0.61 | -0.55 | -0.54 |
| A | 0.80 | 0.67 (<0.05) | 0.45 (<0.01) | 0.22 (<0.005) | 0.09 |
| T | 0.07 | 0.41 | 0.34 | -0.09 | -0.11 |
| Mutations in UTRs | |||||
| All | -0.43 | -0.39 (<0.05) | -0.35 (<0.005) | -0.34 | -0.32 |
| G | -0.18 | -0.47 | -0.31 | -0.25 (<0.01) | -0.16 (<0.0001) |
| C | -0.79 | -0.71 (<0.05) | -0.60 (<0.0001) | -0.57 | -0.53 |
| A | 0.60 | 0.64 (<0.01) | 0.54 | 0.49 | 0.28 |
| T | -0.13 | 0.33 | -0.02 | 0.00 | -0.04 |
X indicates the position of the mutated nucleobase and N any base. Statistically significant correlation coefficients, for which the null hypothesis is rejected, are underlined, with the P-values below α=0.05 reported in parentheses. Note that the increase in sample size from doublet to sextuplet motifs also contributes to the increase in the statistical significance of the correlation.
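The effect of sample size on significance can be illustrated with a minimal sketch. It assumes that significance is assessed with the usual t statistic for Pearson's coefficient, t = r·sqrt((n − 2)/(1 − r²)), which the table does not state, and that every possible motif is observed, so that a fixed X with k free positions N gives 4^k motifs. The function names and the value r = −0.5 are hypothetical and are not taken from the table.

```python
import math


def pearson_r(xs, ys):
    """Pearson's linear correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def t_statistic(r, n):
    """t statistic for the null hypothesis of no linear correlation.

    Assumes the standard test with n - 2 degrees of freedom; requires |r| < 1.
    """
    return r * math.sqrt((n - 2) / (1.0 - r * r))


def motifs_per_base(n_free):
    """Number of distinct motifs around a fixed X with n_free positions N.

    Assumes all 4**n_free motifs are present in the data.
    """
    return 4 ** n_free


# Hypothetical coefficient, held constant while the motif length grows.
r = -0.5

# k = 1 (doublets XN) up to k = 5 (sextuplets NNXNNN).
t_values = {k: t_statistic(r, motifs_per_base(k)) for k in range(1, 6)}

for k, t in t_values.items():
    print(f"free positions={k}  n={motifs_per_base(k):5d}  t={t:8.3f}")
```

With r held fixed, |t| grows with the number of motifs, so the same coefficient corresponds to a smaller P-value for longer motifs. This is consistent with the table, where coefficients of smaller magnitude reach significance for quintuplets and sextuplets.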