2025 Jan 16;16:714. doi: 10.1038/s41467-025-55987-8

Table 1.

Summary of encodings of protein sequences, models, and acquisition functions tested in this work

Encoding | Dimension per residue | Description
AAIndex | 4 | Continuous, fixed amino acid descriptors
Georgiev (ref. 71) | 19 | Continuous, fixed amino acid descriptors
One-hot | 20 | Categorical (identity of the amino acid)
ESM2 (ref. 33) | 1280 | Learned embedding from a protein language model (ESM2 with 650 million parameters)

Model | Bayesian? | Deep learning? | Description
Boosting Ensemble | N | N | An ensemble of 5 boosting models
Gaussian Process (GP) | Y | N | A distribution over continuous functions, described by a posterior
DNN Ensemble | N | Y | An ensemble of 5 multilayer perceptrons (deep neural networks, DNNs)
Deep Kernel Learning (DKL) (ref. 29) | Y | Y | A GP on the last layer of a deep neural network

Acquisition function | Deterministic? | Description
Greedy | Y | Acquires the candidate sequence with the maximum posterior mean
Upper Confidence Bound (UCB) | Y | Acquires the candidate sequence with the maximum upper bound of a confidence interval from the posterior (interval width tuned by a hyperparameter)
Thompson Sampling (TS) | N | Acquires the maximizer of a random function sampled from the posterior
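The one-hot encoding in Table 1 uses 20 dimensions per residue, so a sequence of length L becomes a vector of length 20 × L. The sketch below is a minimal illustration under stated assumptions, not the paper's code: the alphabetical one-letter ordering in `AMINO_ACIDS` and the function name `onehot_encode` are hypothetical choices.

```python
import numpy as np

# The 20 canonical amino acids. This alphabetical ordering is an assumption
# for illustration, not necessarily the ordering used in the paper.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_TO_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def onehot_encode(sequence: str) -> np.ndarray:
    """Encode a protein sequence as a flat one-hot vector of length 20 * len(sequence)."""
    encoding = np.zeros((len(sequence), len(AMINO_ACIDS)), dtype=np.float32)
    for position, residue in enumerate(sequence):
        # Exactly one entry per residue is set to 1.
        encoding[position, AA_TO_INDEX[residue]] = 1.0
    # Flatten residue by residue, so each site contributes a 20-dimensional block.
    return encoding.reshape(-1)


if __name__ == "__main__":
    x = onehot_encode("ACDW")
    print(x.shape)  # (80,)
```

The per-residue blocks are concatenated, which is what makes the "dimension per residue" column additive over sequence length; the other encodings in the table would replace each 20-dimensional block with a 4-, 19-, or 1280-dimensional one.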
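The two ensemble models in Table 1 are not Bayesian, but the spread of predictions across their 5 members can serve as an uncertainty estimate. As a hedged sketch of that idea only, the code below swaps in closed-form ridge regression for the boosting models or MLPs and assumes each member is fit on a bootstrap resample; both choices, and the names `fit_ridge` and `ensemble_predict`, are hypothetical and not taken from the paper.

```python
import numpy as np


def fit_ridge(x, y, alpha=1.0):
    # Closed-form ridge regression; a bias column is appended and,
    # for simplicity, regularized along with the other weights.
    x1 = np.hstack([x, np.ones((len(x), 1))])
    return np.linalg.solve(x1.T @ x1 + alpha * np.eye(x1.shape[1]), x1.T @ y)


def predict_ridge(w, x):
    x1 = np.hstack([x, np.ones((len(x), 1))])
    return x1 @ w


def ensemble_predict(x_train, y_train, x_test, n_members=5, seed=0):
    """Mean and standard deviation of predictions from n_members bootstrap-fit models."""
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(n_members):
        # Resample training points with replacement so members disagree
        # where the data are sparse.
        idx = rng.integers(0, len(x_train), size=len(x_train))
        w = fit_ridge(x_train[idx], y_train[idx])
        preds.append(predict_ridge(w, x_test))
    preds = np.stack(preds)  # shape (n_members, n_test)
    return preds.mean(axis=0), preds.std(axis=0)
```

The returned mean and standard deviation can be fed to greedy or UCB acquisition in place of a GP posterior; because members are independent fits rather than posterior samples, the result is a heuristic uncertainty, consistent with the "Bayesian? N" entry.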
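Table 1 describes the GP only as a distribution over functions described by a posterior. For concreteness, here is a minimal sketch of exact GP regression computing that posterior (mean and covariance) at candidate points. The squared-exponential (RBF) kernel, its fixed hyperparameters, the zero prior mean, and the function names are all assumptions for illustration; the paper's kernel and training procedure may differ.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    # Squared Euclidean distance between every row of a and every row of b.
    sq_dist = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return variance * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / lengthscale**2)


def gp_posterior(x_train, y_train, x_test, lengthscale=1.0, variance=1.0, noise=1e-2):
    """Exact GP regression with a zero prior mean: posterior mean and covariance at x_test."""
    k_tt = rbf_kernel(x_train, x_train, lengthscale, variance) + noise * np.eye(len(x_train))
    k_ts = rbf_kernel(x_train, x_test, lengthscale, variance)
    k_ss = rbf_kernel(x_test, x_test, lengthscale, variance)
    # Cholesky factor L with L @ L.T = k_tt, used for stable linear solves.
    chol = np.linalg.cholesky(k_tt)
    # alpha = k_tt^{-1} y, via two triangular systems.
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_ts.T @ alpha
    # v.T @ v equals k_st @ k_tt^{-1} @ k_ts.
    v = np.linalg.solve(chol, k_ts)
    cov = k_ss - v.T @ v
    return mean, cov
```

Near training points the posterior variance shrinks toward the noise level, while far from the data the posterior reverts to the prior (mean 0, variance `variance`). That behavior is what UCB and Thompson sampling exploit. DKL, per the table, would apply the same GP to the last-layer features of a neural network instead of the raw encoding.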
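The three acquisition functions in Table 1 differ only in the score they maximize over candidates, given a posterior mean vector and covariance matrix. The sketch below selects a single candidate. The UCB form `mean + kappa * std`, with `kappa` standing in for the tuning hyperparameter, and the function names are assumptions; the paper may parameterize UCB differently and acquire batches rather than single sequences.

```python
import numpy as np


def greedy(mean):
    # Deterministic: pick the candidate with the highest posterior mean.
    return int(np.argmax(mean))


def ucb(mean, cov, kappa=2.0):
    # Deterministic: pick the candidate with the highest upper confidence bound,
    # mean + kappa * std, where kappa controls the interval width.
    std = np.sqrt(np.clip(np.diag(cov), 0.0, None))
    return int(np.argmax(mean + kappa * std))


def thompson(mean, cov, rng):
    # Stochastic: draw one joint sample of the function values at all
    # candidates from the posterior, then pick that sample's maximizer.
    sample = rng.multivariate_normal(mean, cov)
    return int(np.argmax(sample))


if __name__ == "__main__":
    mean = np.array([1.0, 0.9, 0.2])
    cov = np.diag([0.01, 0.5, 0.01])
    print(greedy(mean), ucb(mean, cov, kappa=2.0))  # 0 1
    print(thompson(mean, cov, np.random.default_rng(0)))
```

In this toy example greedy picks the highest-mean candidate (index 0), while UCB prefers the slightly lower but much more uncertain candidate (index 1). Setting `kappa=0` reduces UCB to greedy. Sampling jointly from the full covariance, rather than from independent marginals, is what makes Thompson sampling draw a coherent random function, as the table describes.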