Protein regions with high fractions of rare variants are believed to be more sensitive to sequence variants than other regions, which would explain why variants in such regions occur infrequently in the population.
(A and C) Distributions for rare (low-DAF) non-synonymous SNVs (taken from the 1000 Genomes dataset) in which the critical residues are defined to be the surface-critical (A) and interior-critical (C) residues.
(B and D) Distributions for rare (low-MAF) non-synonymous SNVs (taken from the ExAC dataset) in which the critical residues are defined to be the surface-critical (B) and interior-critical (D) residues. Across the thresholds used to define rarity, structures in which the fraction of rare variants is higher in critical residues outnumber structures in which it is higher in non-critical residues. Cases in which the fraction is equal in both categories are not shown. We consider only structures in which at least one critical and at least one non-critical residue coincide with a non-synonymous SNV.
(A), (B), (C), and (D) represent data from 31, 90, 32, and 84 structures, respectively.
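The per-structure comparison described in this caption can be sketched as follows. This is a minimal illustration and not the authors' pipeline. The function name `compare_rare_fractions`, the input layout (one `(structure_id, is_critical, allele_frequency)` tuple per non-synonymous SNV), and the use of a strict `<` against the rarity threshold are all assumptions made for the example.

```python
from collections import defaultdict


def compare_rare_fractions(variants, threshold):
    """Tally structures by which residue class has the higher rare-variant fraction.

    variants  : iterable of (structure_id, is_critical, allele_frequency) tuples,
                one per non-synonymous SNV (hypothetical layout).
    threshold : a variant is treated as rare if its frequency is below this value
                (assumed convention).
    """
    # counts[structure][is_critical] = [number of rare variants, number of variants]
    counts = defaultdict(lambda: {True: [0, 0], False: [0, 0]})
    for structure_id, is_critical, freq in variants:
        cell = counts[structure_id][is_critical]
        cell[0] += freq < threshold
        cell[1] += 1

    tally = {"critical_higher": 0, "non_critical_higher": 0, "ties": 0}
    for per_class in counts.values():
        rare_c, total_c = per_class[True]
        rare_n, total_n = per_class[False]
        # Keep only structures with variants in both residue classes.
        if total_c == 0 or total_n == 0:
            continue
        # Compare rare_c/total_c with rare_n/total_n by cross-multiplication,
        # which avoids floating-point rounding when detecting ties.
        lhs, rhs = rare_c * total_n, rare_n * total_c
        if lhs > rhs:
            tally["critical_higher"] += 1
        elif lhs < rhs:
            tally["non_critical_higher"] += 1
        else:
            tally["ties"] += 1  # reported separately; the figure omits these
    return tally
```

Calling the function once per rarity threshold gives one pair of counts per threshold, which is the quantity each panel compares.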