a, Violin plot showing distributions of simulated AbundancePCA growth rates (assuming additivity of individual inferred folding free energy changes; ref. 23) versus the number of random aa substitutions (n = 100,000). Violins are scaled to have the same maximum width. b, DMS data, energy model and algorithm used to select a set of single aa substitutions for combinatorial mutagenesis. A shallow double-mutant library of GRB2-SH3 protein variants was assayed by AbundancePCA (see panel c) and BindingPCA (see Fig. 4b; in combination referred to as ddPCA), followed by energy modelling to infer single aa substitution free energy changes of folding and binding (ref. 23). We used this model together with a greedy algorithm to select a set of 34 single aa substitutions that, when combined, would simultaneously maximize both the predicted AbundancePCA and BindingPCA growth rates, that is, preserve both fold and function. The 3D structure of GRB2-SH3 (PDB: 2VWF), indicating the 34 combinatorially mutated residues (orange) and the GAB2 ligand (blue), is shown on the right. c, Overview of AbundancePCA on the protein of interest (GRB2-SH3; ref. 23). yes, yeast growth; no, yeast growth defect; DHF, dihydrofolate; THF, tetrahydrofolate. d, Scatter plots showing the reproducibility of fitness estimates from triplicate AbundancePCA experiments. Pearson’s r is indicated in red. Rep., biological replicate. e, Histogram showing the number of observed aa variants at increasing Hamming distances from the wild type (denoted by WT); the x axis is shared with panel f. f, Violin plot showing distributions of AbundancePCA growth rates inferred from deep sequencing data versus the number of aa substitutions. In panels a and f, the percentage of folded protein variants (predicted fraction of folded molecules > 0.5) is shown at each Hamming distance from the wild type.
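As an illustration of the additivity assumption behind panel a, the following minimal Python sketch sums single-substitution folding free energy changes and converts the total into a fraction of folded molecules using a two-state Boltzmann model, then reports the percentage of simulated variants with a fraction above 0.5. It is a sketch under stated assumptions, not the analysis code used for the figure: the function names, the wild-type folding energy, the temperature and the toy pool of free energy changes are all hypothetical, and drawing changes from a flat pool ignores the constraint that a variant carries at most one substitution per position.

```python
import math
import random

RT = 0.593  # kcal/mol at roughly 298 K (assumed temperature)


def fraction_folded(dg_fold, rt=RT):
    """Two-state model: fraction of folded molecules for a folding
    free energy dg_fold (kcal/mol; negative values favour the fold)."""
    x = dg_fold / rt
    if x > 700:  # avoid overflow in exp() for strongly destabilized variants
        return 0.0
    return 1.0 / (1.0 + math.exp(x))


def percent_folded(ddg_pool, n_subs, dg_wt, n_samples=10_000, seed=0):
    """Percentage of simulated variants with fraction folded > 0.5.

    Each variant combines n_subs free energy changes drawn without
    replacement from ddg_pool; the changes are assumed to be additive.
    """
    rng = random.Random(seed)
    folded = 0
    for _ in range(n_samples):
        dg = dg_wt + sum(rng.sample(ddg_pool, n_subs))
        if fraction_folded(dg) > 0.5:
            folded += 1
    return 100.0 * folded / n_samples


if __name__ == "__main__":
    # Toy data only: hypothetical free energy changes (kcal/mol),
    # mostly destabilizing. Real values would come from the energy model.
    toy_rng = random.Random(1)
    toy_pool = [toy_rng.gauss(1.0, 1.0) for _ in range(500)]
    for n in (0, 1, 2, 5, 10, 20):
        pct = percent_folded(toy_pool, n, dg_wt=-3.0)
        print(f"{n:2d} substitutions: {pct:.1f}% predicted folded")
```

Under this model the fraction folded exceeds 0.5 exactly when the summed folding free energy is negative, so the reported percentage is the share of simulated variants whose additive destabilization does not exceed the assumed stability of the wild type.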