
2022 Apr 15;23:98. doi: 10.1186/s13059-022-02661-7


© The Author(s) 2022

Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


Fig. 3. Analysis of DMS data for protein GB1. MAVE-NN was used to infer a latent phenotype model, consisting of an additive G-P map and a GE measurement process with a heteroscedastic skewed-t noise model, from the DMS data of Olson et al. [8]. All 530,737 pairwise variants reported for positions 2 to 56 of the GB1 domain were analyzed. Data were split 90:5:5 into training, validation, and test sets.

(a) The inferred additive G-P map parameters. Gray dots indicate wildtype residues. Amino acids are ordered as in Olson et al. [8].

(b) GE plot showing measurements versus predicted latent phenotype values for 5000 randomly selected test set sequences (blue dots), alongside the inferred nonlinearity (solid orange line) and the 95% PI (dotted orange lines) of the noise model. Gray line indicates the latent phenotype value of the wildtype sequence.

(c) Measurements plotted against $\hat{y}$ predictions for these same sequences. Dotted lines indicate the 95% PI of the noise model. Gray line indicates the wildtype value of $\hat{y}$. Uncertainty in the value of R² reflects standard error.

(d) Corresponding information metrics computed during model training (using training data) or for the final model (using test data). The uncertainties in these estimates are very small, roughly the width of the plotted lines. Gray shaded area indicates allowed values for intrinsic information based on the upper and lower bounds estimated as described in "Methods."

(e–g) Test set predictions (blue dots) and GE nonlinearities (orange lines) for models trained using subsets of the GB1 data containing all single mutants and 50,000 (e), 5000 (f), or 500 (g) double mutants. The GE nonlinearity from panel b is shown for reference (yellow-green lines).

DMS: deep mutational scanning; GB1: protein G domain B1; GE: global epistasis; G-P: genotype-phenotype; PI: prediction interval
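As a reading aid for the caption, here is a minimal NumPy sketch of two ingredients it names: an additive G-P map, which assigns each sequence a latent phenotype as a constant plus one parameter per position and residue, and a random 90:5:5 split into training, validation, and test sets. It does not use MAVE-NN itself. The function names, the alphabetical residue ordering (the figure follows the ordering of Olson et al.), the rounding rule for set sizes, and the toy parameters are all assumptions made for illustration, and the noise model and GE nonlinearity are left out.

```python
import numpy as np

# Assumption: alphabetical amino-acid ordering, not the ordering of Olson et al.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(seq, alphabet=ALPHABET):
    """Return an (L, C) one-hot matrix for a sequence of length L."""
    idx = np.array([alphabet.index(c) for c in seq])
    x = np.zeros((len(seq), len(alphabet)))
    x[np.arange(len(seq)), idx] = 1.0
    return x


def additive_phi(seq, theta_0, theta_lc):
    """Additive G-P map: phi = theta_0 + sum over positions l of theta_lc[l, c_l]."""
    return theta_0 + float(np.sum(theta_lc * one_hot(seq)))


def split_indices(n, fractions=(0.90, 0.05, 0.05), seed=0):
    """Randomly partition range(n) into train/validation/test index arrays.

    Train and validation sizes are rounded; the test set takes the remainder.
    """
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    n_train = int(round(fractions[0] * n))
    n_val = int(round(fractions[1] * n))
    return perm[:n_train], perm[n_train:n_train + n_val], perm[n_train + n_val:]


if __name__ == "__main__":
    # Toy parameters for a hypothetical 3-residue sequence (not fitted values).
    theta = np.zeros((3, len(ALPHABET)))
    theta[0, ALPHABET.index("A")] = 1.5
    theta[2, ALPHABET.index("C")] = -0.5
    print(additive_phi("AGC", theta_0=0.25, theta_lc=theta))

    train, val, test = split_indices(1000)
    print(len(train), len(val), len(test))
```

In the model described by the caption, the latent phenotype from the additive map is then passed through the inferred GE nonlinearity to give the prediction $\hat{y}$, and the skewed-t noise model supplies the prediction intervals shown in panels b and c.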