2024 Sep 25;634(8035):995–1003. doi: 10.1038/s41586-024-07966-0

Extended Data Fig. 5. Saturation combinatorial mutagenesis of a protein surface patch.


a, 3D structure of GRB2-SH3 (PDB: 2VWF) indicating the four residues targeted for saturation combinatorial mutagenesis (orange, library 2) and the GAB2 ligand (blue). See also Extended Data Fig. 4. b, Scatter plots showing the reproducibility of fitness estimates from triplicate AbundancePCA experiments. Pearson’s r is indicated in red. Rep., biological replicate. c, Histogram showing the number of observed aa variants at increasing Hamming distances from the wild type; the x axis is shared with panel d. d, Violin plot showing distributions of AbundancePCA growth rates inferred from deep sequencing data versus the number of aa substitutions. The percentage of folded protein variants (predicted fraction of folded molecules > 0.5) is shown at each Hamming distance from the wild type. e, Nonlinear relationship (global epistasis) between observed AbundancePCA fitness and changes in free energy of folding. The thermodynamic model fit is shown in red. f, Performance of the energy model including all first-order and second-order genetic interaction (energetic coupling) terms. g, Distributions of folding free energy changes (ΔΔG, grey) and pairwise energetic couplings (ΔΔΔG, red). h, Comparison of the model-inferred single aa substitution free energy changes with previously reported estimates from GRB2-SH3 ddPCA data (ref. 23). Pearson’s r is shown. i, Box plots showing the relationship between folding coupling energy strength and minimal inter-residue side-chain heavy-atom distance. Boxes are coloured by inter-residue distance. Spearman’s ρ is shown for all couplings (n = 2,166 second-order coefficients), as well as for the weighted mean per residue pair (n = 6 residue pairs). j, Relationship between folding coupling energy strength and linear sequence (backbone) distance in number of residues. Boxes are coloured as in panel i. For box plots in panels i and j: centre line, median; box limits, upper and lower quartiles; whiskers, 1.5× interquartile range; n = 2,166 second-order coefficients.
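As a rough illustration of the quantities named in panels d to g and i, the following Python sketch shows how a two-state thermodynamic model could map an additive folding free energy (wild-type ΔG plus first-order ΔΔG and pairwise ΔΔΔG terms) to a fraction of folded molecules, and how the number of second-order coefficients follows from the library design. It is not the authors' implementation. The function names, the temperature, the mutation encoding as (position, amino acid) tuples and the assumption that all 19 substitutions are modelled at every targeted position are illustrative choices that do not come from the caption.

```python
import math
from itertools import combinations

R_KCAL = 1.98720425864083e-3  # gas constant in kcal/(mol*K)
ASSUMED_T = 303.0             # hypothetical temperature in kelvin, not from the caption


def fraction_folded(delta_g, temperature=ASSUMED_T):
    """Two-state Boltzmann fraction folded for a folding free energy in kcal/mol.

    Negative delta_g means a stable fold; delta_g = 0 gives exactly 0.5.
    """
    return 1.0 / (1.0 + math.exp(delta_g / (R_KCAL * temperature)))


def genotype_delta_g(wt_delta_g, singles, couplings, mutations):
    """Add first-order (ddG) and pairwise (dddG) terms to the wild-type dG.

    mutations: iterable of (position, amino_acid) tuples (hypothetical encoding).
    singles:   dict mapping a mutation to its ddG.
    couplings: dict mapping a sorted pair of mutations to its dddG.
    Terms missing from either dict are treated as zero.
    """
    muts = sorted(mutations)
    total = wt_delta_g + sum(singles.get(m, 0.0) for m in muts)
    total += sum(couplings.get(pair, 0.0) for pair in combinations(muts, 2))
    return total


def n_second_order_terms(n_positions, n_substitutions=19):
    """Count pairwise coefficients: residue pairs times substitution pairs."""
    return math.comb(n_positions, 2) * n_substitutions ** 2
```

Under these assumptions, `n_second_order_terms(4)` evaluates to 6 × 19² = 2,166, which agrees with the n = 2,166 second-order coefficients and n = 6 residue pairs reported for panels i and j. A variant would be counted as folded in the sense of panel d when `fraction_folded(...)` exceeds 0.5, that is, when its total folding free energy is below zero.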