2024 Jul 29;15:6392. doi: 10.1038/s41467-024-50698-y

Fig. 4. MODIFY library improves the performance of machine learning-guided directed evolution (MLDE) on GB1.


a An in silico MLDE experiment was simulated on the GB1 landscape. An ML model was trained to predict the sequence-fitness relationship, using the variant sequences in the MODIFY library and their experimentally characterized fitness as training data. The trained model was then applied to prioritize novel fitness-enhanced variants. b–f t-SNE visualization of the library sequences in the GB1 fitness landscape. Variants from each library (NNK, FoldX, FuncLib, Exploitation, and MODIFY) are colored red. arb. unit, arbitrary unit. g Stratified bar plots of library sequences by fitness range: (WT, Max], fitness higher than the wild type; (0, WT], fitness lower than the wild type but above 0; {0}, zero fitness; with stop codon, variants containing stop codons. h–j Performance of ML models trained on fitness-labeled sequences from each library: the mean fitness (h), the maximum fitness (i), and the recall of the top 100 variants (j) among the top K predictions, shown as a function of K. Curves and error bands represent the mean ± SEM over 25 independent repetitions.
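The stratification in panel g and the top-K metrics in panels h–j can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' evaluation code. The function names `fitness_bin`, `top_k_metrics`, and `mean_sem` are hypothetical. Predictions are assumed to come from any trained model as a dict mapping variant to predicted fitness over whatever candidate pool is being ranked. Recall is assumed to be the fraction of the `n_top` (here 100) truly fittest variants that appear among the top K predictions. SEM is assumed to be the sample standard deviation divided by the square root of the number of repetitions.

```python
from statistics import mean, stdev


def fitness_bin(fitness, wt_fitness, has_stop_codon):
    """Assign a variant to one of the four strata of panel g.

    Assumes fitness values are non-negative, as on the GB1 landscape.
    """
    if has_stop_codon:
        return "with stop codon"
    if fitness > wt_fitness:
        return "(WT, Max]"
    if fitness > 0:
        return "(0, WT]"
    return "{0}"


def top_k_metrics(predicted, true_fitness, k, n_top=100):
    """Rank candidates by predicted fitness and score the top k (k >= 1).

    predicted: dict variant -> model-predicted fitness (candidate pool is an assumption)
    true_fitness: dict variant -> measured fitness on the landscape
    Returns (mean true fitness, max true fitness, recall of the n_top fittest variants).
    """
    # Top-k variants by predicted fitness (ties keep dict insertion order).
    chosen = sorted(predicted, key=predicted.get, reverse=True)[:k]
    selected = [true_fitness[v] for v in chosen]
    # The n_top variants with the highest measured fitness.
    true_top = set(sorted(true_fitness, key=true_fitness.get, reverse=True)[:n_top])
    recall = len(true_top.intersection(chosen)) / len(true_top)
    return mean(selected), max(selected), recall


def mean_sem(values):
    """Mean and standard error of the mean across independent repetitions."""
    return mean(values), stdev(values) / len(values) ** 0.5


if __name__ == "__main__":
    # Toy landscape with five hypothetical variants.
    true_fit = {"A": 0.0, "B": 0.5, "C": 1.2, "D": 2.0, "E": 0.9}
    preds = {"A": 0.1, "B": 0.4, "C": 1.0, "D": 1.5, "E": 0.2}
    print(top_k_metrics(preds, true_fit, k=2, n_top=2))
```

In a full simulation, `top_k_metrics` would be evaluated for each K and each of the 25 repetitions (for example, different training subsets or random seeds), and `mean_sem` would then summarize each point on the curve.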