2024 Feb 22;15:1639. doi: 10.1038/s41467-024-45621-4

Fig. 3. CoVES, an unsupervised approach to learn residue mutation preferences from structural microenvironments, can predict variant effects and generatively design functional and diverse sequences.

a Schematic of the CoVES workflow: First, an equivariant graph neural network (refs. 23, 31) is used to predict amino acid preferences from the structural environment around a particular residue. The mutation preferences for each residue are converted into log probabilities and summed to predict combinatorial variant effects. Finally, these scores can be used to design combinatorial variants at the desired residue positions by sampling with the Boltzmann energy function. b Spearman correlation coefficients between unsupervised model scores and observed combinatorial variant effects. c Serial dilution growth assay of CoVES-designed antitoxin variants in the presence of wild-type toxin. d, e CoVES-designed antitoxin sequences evaluated for their predicted functionality and diversity. Generated sequences (n = 91 unique sequences, temperature = 1.5) from CoVES are evaluated for their predicted growth rate effects using the supervised surrogate fitness function (d) and for their number of mutations with respect to the wild-type antitoxin (e). f Comparison of sequences generated using CoVES vs. other state-of-the-art sequence design models, in terms of the fraction of generated antitoxin sequences predicted to be functional and their average number of mutations, as the sampling temperature is varied. Each dot represents a collection of sampled sequences summarized by their average number of mutations and their predicted fraction of functional sequences. Random library measurements are shown in dark gray, and energy-based Boltzmann sampling from the supervised per-residue logistic regression model trained on the observed variant data is shown in light gray. Lines represent polynomial fits. Source data are provided as a Source Data file.
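The scoring and sampling steps described in panel a can be illustrated with a minimal Python sketch. It is not the authors' implementation: the per-site logits, the residue positions (10 and 14), the three-letter alphabets, and the function names are all hypothetical, and the logits stand in for whatever the graph neural network would output. The sketch assumes the additive score stated in the caption, under which a Boltzmann distribution p(s) ∝ exp(score(s)/T) factorizes into independent per-site distributions, so each position can be sampled separately.

```python
import math
import random


def log_softmax(logits):
    """Convert raw per-site preferences into log probabilities."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def score_variant(log_probs, variant):
    """Combinatorial variant score: sum of per-site log probabilities."""
    return sum(log_probs[pos][aa] for pos, aa in variant.items())


def sample_variant(log_probs, temperature, rng):
    """Draw one variant from p(s) proportional to exp(score(s) / temperature).

    Because the score is a sum over sites, each site is sampled independently.
    """
    variant = {}
    for pos, site in log_probs.items():
        aas = list(site)
        top = max(site.values())  # subtract the max for numerical stability
        weights = [math.exp((site[aa] - top) / temperature) for aa in aas]
        variant[pos] = rng.choices(aas, weights=weights, k=1)[0]
    return variant


# Hypothetical logits for two designed positions (illustrative values only).
site_logits = {
    10: {"A": 2.0, "D": 0.5, "K": -1.0},
    14: {"L": 1.5, "V": 1.0, "E": -2.0},
}
log_probs = {
    pos: dict(zip(site, log_softmax(list(site.values()))))
    for pos, site in site_logits.items()
}

rng = random.Random(0)
designs = [sample_variant(log_probs, temperature=1.5, rng=rng) for _ in range(5)]
scores = [score_variant(log_probs, d) for d in designs]
```

Raising the temperature flattens each per-site distribution, which yields more mutations per sampled sequence at the cost of lower scores; lowering it concentrates sampling on the most preferred residue at each site. This is the trade-off between diversity and predicted function that panel f explores.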