2020 Jan 27;11:527. doi: 10.1038/s41467-020-14404-y

Fig. 1. UTR pG4 sequences are under heightened selective pressure.


a Schematic depicting a folded RNA parallel G-quadruplex with the accompanying canonical pG4 sequence. b Reduction in the frequency of variants affecting guanine tracts (G-tracts) within UTR pG4-forming sequences compared with non-pG4 G-tracts matched by transcript-level constraint. rG4 G-tracts are those within UTR pG4 that have evidence of secondary structure formation by rG4-seq. Asterisks denote P << 2.2 × 10^−16 by Fisher’s exact test. c Reduction in the number of observed polymorphic sites compared with expectation in 5′ and 3′ UTR pG4-forming G-tracts, using a nucleotide substitution model based on local sequence context (permuted P < 1 × 10^−4 in all G-tracts compared with matched non-pG4 UTR sequences). Error bars represent the bootstrapped 90% confidence interval for the ratio of observed vs. expected substitutions within each pG4 region. Red line and shaded region represent the observed vs. expected number of substitutions in non-pG4 UTR sequences matched by transcript-level constraint and its 90% confidence interval, respectively. Gray dashed line represents an observed vs. expected ratio of 1:1. d Mutability-adjusted proportion of singletons (MAPS) for each set of variants affecting trinucleotide guanines within the meta-pG4 sequence motif. Central-position guanines consistently show the highest MAPS scores (are the most constrained) compared with non-pG4 UTR variants (permuted P < 1 × 10^−4) across all contexts. Error bars represent the 5th and 95th percentiles of the bootstrap permutations for each variant class. Purple, orange, and gray dashed lines represent MAPS scores for Ensembl-predicted high-impact coding (predicted loss-of-function), missense, and synonymous mutations, respectively. Source data for b–d are provided as a Source Data file.
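As an illustration of the kind of interval shown in panel c, the following is a minimal sketch of a percentile bootstrap for an observed vs. expected ratio. It is not the authors' pipeline: the function name `bootstrap_ratio_ci`, the per-site inputs, and the choice to resample individual sites with replacement are all assumptions made for this example, since the caption does not state the resampling unit or the form of the substitution model's output.

```python
import random


def bootstrap_ratio_ci(observed, expected, n_boot=2000, level=0.90, seed=0):
    """Percentile bootstrap CI for sum(observed) / sum(expected).

    Hypothetical inputs (assumptions, not taken from the paper):
      observed[i] -- 1 if site i is polymorphic, else 0
      expected[i] -- model-predicted probability that site i is polymorphic
    """
    if len(observed) != len(expected) or not observed:
        raise ValueError("observed and expected must be non-empty and equal length")
    if sum(expected) <= 0:
        raise ValueError("expected values must sum to a positive number")

    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(observed)
    point = sum(observed) / sum(expected)

    ratios = []
    for _ in range(n_boot):
        # Assumed resampling unit: individual sites, drawn with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        exp_sum = sum(expected[i] for i in idx)
        if exp_sum > 0:
            ratios.append(sum(observed[i] for i in idx) / exp_sum)

    ratios.sort()
    alpha = (1.0 - level) / 2.0  # 0.05 in each tail for a 90% interval
    lower = ratios[int(alpha * (len(ratios) - 1))]
    upper = ratios[int((1.0 - alpha) * (len(ratios) - 1))]
    return point, lower, upper


if __name__ == "__main__":
    # Toy data: 200 sites, 60 observed polymorphic, 100 expected under the model.
    obs = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0] * 20
    exp = [0.5] * 200
    print(bootstrap_ratio_ci(obs, exp))
```

Under these assumptions, an interval lying entirely below 1 corresponds to fewer polymorphic sites than the model expects, which is how the caption's depletion in pG4 G-tracts would appear relative to the gray dashed 1:1 line.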