Fig. 3. Using enzyme and reaction information combined leads to improved kcat predictions.
a Coefficients of determination R² for models with different inputs. b Mean squared errors (MSE) on log10-scale. Boxplots summarize the results of the 5-fold cross-validation (CV) on the training set with the best sets of hyperparameters; blue dots show the results on the test set for the optimized models trained on the whole training set. We used 2× the interquartile range for the whiskers, the boxes extend from the lower to the upper quartile values, and the red horizontal lines display the median of the data points. Model performances are plotted for the models with structural reaction fingerprints (str. FP), difference reaction fingerprints (diff. FP), differential reaction fingerprints (DRFP), ESM-1b vectors (ESM-1b), task-specific ESM-1b vectors (ESM-1bESP), and both enzyme and reaction information (ESM-1bESP + DRFP). Source data are provided as a Source Data file.
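The two metrics and the boxplot convention in this caption can be illustrated with a minimal Python sketch. It assumes that MSE and R² are computed on log10-transformed kcat values and that the whiskers follow the matplotlib convention (`boxplot(whis=2.0)`), where each whisker ends at the most extreme data point lying within 2× the interquartile range beyond the box. The function names `log10_metrics` and `box_stats` are hypothetical and not taken from the authors' code.

```python
import numpy as np


def log10_metrics(kcat_true, kcat_pred):
    """Return (MSE, R^2) computed on log10-transformed kcat values."""
    y = np.log10(np.asarray(kcat_true, dtype=float))
    y_hat = np.log10(np.asarray(kcat_pred, dtype=float))
    residuals = y - y_hat
    mse = np.mean(residuals ** 2)
    # R^2 = 1 - SS_res / SS_tot, with SS_tot taken around the mean of the true values
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return mse, r2


def box_stats(values, whis=2.0):
    """Return (Q1, median, Q3, lower whisker, upper whisker) for a boxplot.

    Assumption: whisker limits are Q1 - whis*IQR and Q3 + whis*IQR, and each
    whisker ends at the most extreme data point inside its limit (matplotlib style).
    """
    v = np.asarray(values, dtype=float)
    q1, med, q3 = np.percentile(v, [25, 50, 75])
    iqr = q3 - q1
    low_limit = q1 - whis * iqr
    high_limit = q3 + whis * iqr
    low_whisker = v[v >= low_limit].min()
    high_whisker = v[v <= high_limit].max()
    return q1, med, q3, low_whisker, high_whisker


if __name__ == "__main__":
    # Illustrative values only, not data from the figure
    print(log10_metrics([10, 100, 1000], [100, 100, 100]))
    print(box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))
```

In the second call, the value 100 lies beyond Q3 + 2×IQR, so the upper whisker stops at 9 and 100 would be drawn as an outlier.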
