2020 Sep 25;11:4880. doi: 10.1038/s41467-020-17910-1

Fig. 4. Machine learning-guided predictive engineering of tryptophan metabolism.

a, b Learning curves for the ART and EVOLVE algorithms, respectively, showing the mean absolute error (MAE) from model training and testing as a function of the number of genotypes in the dataset. Shaded areas represent 95% confidence intervals based on ten random samples of the given number of genotypes (n). Blue curves indicate the MAE calculated on the whole dataset (train), while red curves indicate the cross-validated MAE, obtained by training the models on 80% of the data and testing their predictions against measurements for the remaining 20% (test). c, d Promoter distributions for the 30 recommendations of the exploitative (ART) and explorative (EVOLVE) approaches, respectively. The order and colors of the promoters correspond to those in Fig. 1c. e, f Cross-validated predictions vs. the average measured GFP synthesis rate for the exploitative (ART) and explorative (EVOLVE) approaches, respectively. Data are shown for library and control strains (gray markers; green markers show the platform strain expressing ARO4K229L and TRP2S65R,S76L) and for recommended strains (blue markers; orange markers show recommendations that overlap between the two approaches). R-squared values are for cross-validated predictions over the whole dataset (not only training set data). MFI, mean fluorescence intensity. Source data are provided as a Source data file.
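
The resampling scheme described for panels a and b can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: a one-variable least-squares fit stands in for the ART and EVOLVE models, the data are synthetic, a single 80/20 split is drawn per sample, and the 95% interval uses a normal approximation (the caption does not state how the intervals were computed). All function names are hypothetical.

```python
import math
import random
import statistics


def make_synthetic(n, noise=0.5, seed=0):
    """Hypothetical stand-in data: y = 2x + 1 plus Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    return [(x, 2.0 * x + 1.0 + rng.gauss(0.0, noise)) for x in xs]


def fit_linear(xs, ys):
    """Least-squares line y = a*x + b (placeholder for the real models)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return a, my - a * mx


def mae(model, xs, ys):
    """Mean absolute error of the fitted line on (xs, ys)."""
    a, b = model
    return statistics.fmean([abs(a * x + b - y) for x, y in zip(xs, ys)])


def summarize(errors):
    """Mean and 95% half-width (normal approximation, an assumption)."""
    half_width = 1.96 * statistics.stdev(errors) / math.sqrt(len(errors))
    return statistics.fmean(errors), half_width


def learning_curve(data, sizes, repeats=10, test_frac=0.2, seed=0):
    """For each dataset size n, draw `repeats` random samples and record
    the train MAE (fit and score on the whole sample) and the test MAE
    (fit on 80%, score on the held-out 20%)."""
    rng = random.Random(seed)
    curve = {}
    for n in sizes:
        train_err, test_err = [], []
        for _ in range(repeats):
            sample = rng.sample(data, n)  # random order, so slicing is a random split
            xs, ys = zip(*sample)
            train_err.append(mae(fit_linear(xs, ys), xs, ys))
            k = max(1, round(n * test_frac))
            held_x, held_y = zip(*sample[:k])
            kept_x, kept_y = zip(*sample[k:])
            test_err.append(mae(fit_linear(kept_x, kept_y), held_x, held_y))
        curve[n] = {"train": summarize(train_err), "test": summarize(test_err)}
    return curve


if __name__ == "__main__":
    result = learning_curve(make_synthetic(200), sizes=[10, 25, 50, 100])
    for n, stats in result.items():
        print(n, stats["train"], stats["test"])
```

Plotting the `train` and `test` means against n, with the half-widths as shaded bands, would give curves of the same kind as the blue and red traces in panels a and b.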