Fig. 3. Considerations for experimental ΔΔG dataset generation with respect to ML predictiveness.
a,b, Model performance with varying combined training and validation dataset size (datasets: Synthetic_FoldX_ΔΔG_{580-450000}; Supplementary Table 1) (a) and dataset diversity (datasets: Synthetic_FoldX_ΔΔG_100000_randomly_sampled, Synthetic_FoldX_ΔΔG_100000_{sequence/substitution_type/substitution_distribution}_{min/max}; Supplementary Table 1) (b). For b, we considered three aspects of diversity: antibody CDR sequence identity, amino acid substitution type frequency and the distribution of mutated positions in the complex.
