Figure 3. Significance of equilibrium dynamics and variation of physicochemical attributes across ensembles of conformers in developing ML-predictors of stability.
(a) A gradient boosting regressor (scikit-learn; n_estimators = 1500, subsample = 0.7, max_depth = 5, max_features = 7) trained on dynamics-based attributes: ESSA scores for the wild-type (wt) and substituted residues and their difference; MSFs in the softest 2% and in the stiffest 5% of modes; allosteric signaling sensitivity and effectiveness; and mechanical stiffness. The regressor yielded a PCC of 0.61 and an RMSE of 1.24 kcal/mol on a widely benchmarked test set of 350 SAVs. (b) Residue N46 of hen egg-white lysozyme exhibits widely varying SASA, from 0.04 to 0.86, in two intrinsically accessible conformations. (c) Distribution of the maximum difference in solvent accessibility; (d) distribution of hydrophobic packing density (the ratio of the number of hydrophobic residues within a 5 Å radius sphere centered on the mutated residue to the total number of residues within the same sphere); (e) distribution of the changes in the number of hydrogen bonds near the investigated residue; (f) contribution of the indicated dynamics-based attributes (abscissa labels) and the 39 additional attributes from PROFOUND to foldability prediction.
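The regressor in panel (a) can be approximated with scikit-learn's `GradientBoostingRegressor` using the hyperparameters listed in the caption. The sketch below is not the authors' pipeline: the function name `fit_and_score` is hypothetical, and it assumes a feature matrix with one column per dynamics-based attribute (eight columns if each listed attribute is a single scalar, which `max_features = 7` requires to be at least seven). PCC is computed with `np.corrcoef` and RMSE as the square root of the mean squared error.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error


def fit_and_score(X_train, y_train, X_test, y_test, seed=0):
    """Hypothetical helper: fit a GBR with the caption's hyperparameters
    and report Pearson correlation (PCC) and RMSE on a held-out set."""
    model = GradientBoostingRegressor(
        n_estimators=1500,  # number of boosting stages
        subsample=0.7,      # fraction of samples drawn for each tree
        max_depth=5,        # depth of each regression tree
        max_features=7,     # features considered per split (needs >= 7 columns)
        random_state=seed,
    )
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    pcc = float(np.corrcoef(y_test, pred)[0, 1])
    rmse = float(np.sqrt(mean_squared_error(y_test, pred)))
    return model, pcc, rmse
```

With real data, `y` would hold the experimental stability changes (kcal/mol) and the held-out rows would correspond to the benchmark SAVs; on synthetic data the function simply returns a fitted model plus the two metrics.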
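The hydrophobic packing density in panel (d) is a simple ratio, which can be illustrated as follows. This is a minimal sketch under stated assumptions: each residue is represented by a single coordinate (for example its Cα atom), the hydrophobic residue set `HYDROPHOBIC` is a hypothetical choice because the caption does not define it, and the mutated residue itself is excluded from both counts. The function name `hydrophobic_packing_density` is likewise hypothetical.

```python
import numpy as np

# Hypothetical hydrophobic set; the caption does not specify which residues count.
HYDROPHOBIC = {"ALA", "VAL", "LEU", "ILE", "MET", "PHE", "TRP", "PRO", "CYS"}


def hydrophobic_packing_density(coords, resnames, center_idx, radius=5.0):
    """Ratio of hydrophobic residues to all residues within `radius` (Å)
    of the residue at `center_idx`, using one coordinate per residue."""
    coords = np.asarray(coords, dtype=float)
    dist = np.linalg.norm(coords - coords[center_idx], axis=1)
    in_sphere = dist <= radius
    in_sphere[center_idx] = False  # assumption: do not count the mutated residue
    n_total = int(in_sphere.sum())
    if n_total == 0:
        return 0.0
    n_hydrophobic = sum(
        1 for i in np.flatnonzero(in_sphere) if resnames[i] in HYDROPHOBIC
    )
    return n_hydrophobic / n_total


# Toy example: a LEU at 3 Å and a SER at 4 Å fall inside the sphere,
# a VAL at 6 Å does not, so the density is 1 / 2.
coords = [[0, 0, 0], [3, 0, 0], [0, 4, 0], [6, 0, 0]]
resnames = ["ASN", "LEU", "SER", "VAL"]
density = hydrophobic_packing_density(coords, resnames, center_idx=0)
```

Computing this value for every conformer in an ensemble yields a distribution like the one shown in panel (d).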