(A) Protein-coding genes with high coding divergence (defined by amino acid identity between chimpanzee and human) generally have higher variability than genes with low coding divergence. The distribution of chimpanzee dispersion estimates is plotted as the empirical cumulative distribution function (ECDF) for genes in the top and bottom deciles of percent identity. (B) Same as (A), but with coding divergence defined by the ratio of non-synonymous to synonymous substitution rates (dN/dS) across mammals. (C) Loss-of-function tolerant (LoF tolerant) genes, defined by pLI score (Lek et al., 2016), generally have higher variability than loss-of-function intolerant (LoF intolerant) genes. (D) TATA box genes generally show higher variability. The P-values and ρ correlation coefficients provided for (A) and (B) represent Spearman correlation across all quantiles, not only the upper and lower deciles; only those deciles are plotted so that the panels can be read in the same way as (C) and (D), where the P-values provided represent a two-sided Mann-Whitney U test. (E) Gene set enrichment analysis of genes ordered by chimpanzee dispersion estimates. To save space, only the three most enriched significant categories at the top and at the bottom (adjusted P-value < 0.05) are shown for each ontology set.