Figure 2. The ML-based pathogenicity predictor Rhapsody accounts for structural dynamics, yields highly accurate predictions, and can be used to generate in silico saturation mutagenesis heatmaps.
(a) Descriptors used in Rhapsody: sequence (conservation (entropy), position-specific independent counts (PSIC) and change in PSIC (ΔPSIC), amino acid substitution (Blosum62), mutual information (MI)), structure (SASA), and structural dynamics (mean-square fluctuations (MSF) of the mutated residue, propensity to serve as effector or sensor of allosteric signals (see previous work [26]), and mechanical stiffness). The bars display the percent contribution of these descriptors to the trained classifier. Five bars are displayed for each descriptor: the first (dark blue) refers to the entire set, and the other four correspond to subsets of proteins of different sizes, with numbers of residues in the ranges [150–249] (orange), [250–361] (gray), [362–520] (yellow), and [521–3636] (light blue). The corresponding percent contributions of the different features are listed in the light blue box. (b) Prediction performance based on different metrics. (c) In silico saturation mutagenesis heatmap. The heatmap shows pathogenicity probabilities (see the scale bar on the right) evaluated for all 19 substitutions (ordinate) at each residue position (abscissa), shown here for a 100-residue segment of p53. Structural and dynamic features are based on the tetrameric structure (PDB ID: 3KMD) [27]. The curves underneath are the averages over all 19 substitutions for each residue, predicted by Rhapsody (red dots), PolyPhen-2 (dark blue) [28], and EVMutation (green) [19]. The Pearson correlation coefficient (PCC) between the Rhapsody results and those of each of the other two predictors is around 0.74, whereas that between PolyPhen-2 and EVMutation is 0.58. (d) Color-coded pathogenicity results for the p53 monomer. Mutations at sites colored red are highly likely to be pathogenic. A few such residues are labeled. These residues are reported in ClinVar as pathogenic (green spheres), likely pathogenic (olive spheres), or unknown (green sticks).
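The per-residue curves and the PCC values in panel (c) can be illustrated with a minimal sketch. It assumes a heatmap stored as one dictionary per residue, mapping each of the 19 non-wild-type amino acids to a predicted pathogenicity probability. The function names (`residue_averages`, `pearson`, `toy_heatmap`), the short sequence, and the random scores are hypothetical placeholders. They are not the Rhapsody API and not data from the figure.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def residue_averages(heatmap, sequence):
    """Average the predicted probability over the 19 substitutions per residue.

    heatmap[i][aa] is the predicted pathogenicity probability for replacing
    residue i (wild type sequence[i]) with amino acid aa.
    """
    averages = []
    for wild_type, row in zip(sequence, heatmap):
        scores = [row[aa] for aa in AMINO_ACIDS if aa != wild_type]
        averages.append(sum(scores) / len(scores))
    return averages


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def toy_heatmap(sequence, seed):
    """Hypothetical stand-in for a predictor: random scores in [0, 1)."""
    rng = random.Random(seed)
    return [
        {aa: rng.random() for aa in AMINO_ACIDS if aa != wild_type}
        for wild_type in sequence
    ]


# Illustrative sequence only; real input would be the p53 segment in panel (c).
sequence = "MEEPQSDPSV"
profile_a = residue_averages(toy_heatmap(sequence, seed=1), sequence)
profile_b = residue_averages(toy_heatmap(sequence, seed=2), sequence)
print(f"PCC between the two toy profiles: {pearson(profile_a, profile_b):.2f}")
```

With real predictions, each predictor's heatmap would replace `toy_heatmap`, and `pearson` would be applied to each pair of averaged profiles to obtain values comparable to those quoted in the caption.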